Illustration by Jørgen Stamp digitalbevaring.dk CC BY 2.5 Denmark

 

What is cloud computing?

 

Cloud Computing is a term that encompasses a wide range of use cases and implementation models. In essence, a computing ‘cloud’ is a large shared pool of computing resources including data storage. When someone needs additional computing power, they are simply able to check this out of the pool without much (often any) manual effort on the part of the IT team, which reduces costs and significantly shortens the time needed to start using new computing resources. Most of these ‘clouds’ are run on the public Internet by well-known companies like Amazon and Google. Some larger organisations have also found value in running private clouds inside their own data centres, where similar economies of scale begin to apply.

The generally accepted characteristics of a typical cloud service may be defined as computers and data storage which are:

  • Available when required (‘on demand’), without the need for lengthy procurement and configuration processes;
  • Available on standard networks such as the Internet, without special requirements for obscure or proprietary networking, protocols, or hardware;
  • Able to offer additional capacity as demand increases, and less as demand falls (‘elastic’);
  • Billed according to actual use, so that customers pay only for the storage and computing capacity they consume.

 

Cloud computing and digital preservation

 

Cloud computing can offer several benefits:

  • The flexibility of the cloud allows relatively rapid and low-cost testing and piloting of emerging service providers. There are already some pilot activities with these cloud services and opportunities for shared learning across the community;
  • There is now much greater flexibility and more options in deployment of cloud storage services and therefore greater relevance to archival repositories compared to earlier years (see Public, Community, Private and Hybrid clouds);
  • There are potential cost savings from easier procurement and economies of scale, particularly for smaller repositories. These are important at a time of financial pressures;
  • Cloud services can provide easy, automated replication to multiple locations, which is essential for business recovery planning, and access to professionally managed digital storage; in addition, specialist providers can add access to other dedicated tools, procedures, workflows and service agreements tailored to digital preservation requirements.

 

Cloud service models and service providers

 

There are four different cloud service models:

  • Public – Commercial services hosted in large data centres around the world, accessible over public networks to anyone with the means to pay.
  • Private – Large organisations create their own cloud by virtualising large sets of physical servers inside their own data centres.
  • Hybrid – Combines aspects of public and private cloud, typically to handle large fluctuations in demand, or to satisfy different security requirements.
  • Community – Architecturally, it may be effectively the same as a public cloud service, but optimised for a particular group of users to which access is restricted.

There are currently two classes of cloud service provider: generalists offering cloud storage (Amazon, Rackspace, Google, etc), and specialist companies that address additional specific digital preservation requirements and functions (see Resources and case studies for examples).

 

Positives

 

  • Cloud services can provide easy, automated replication to multiple locations and access to professionally managed digital storage and integrity checking. As a result, bit preservation (durability) of digital information can be at least as good as (or better than) what can be achieved locally;
  • Archives can add access to dedicated tools, procedures, workflow and service agreements, tailored for digital preservation requirements via specialist vendors;
  • There are potential cost savings from easier procurement and economies of scale, particularly for smaller archives;
  • The flexibility of the cloud allows relatively rapid and low-cost testing and piloting of providers;
  • There is much greater flexibility and more options in deployment of cloud services and therefore greater relevance to archives compared to earlier years. In particular private cloud or hybrid cloud implementations can address security concerns over storage of more sensitive material perhaps considered unsuitable for public cloud;
  • Exit strategies can be put in place to address archival concerns over provider stability and longevity or other change risks, for example by synchronising content across two cloud service providers, by synchronising an external cloud with local internal storage, or by agreeing an escrow copy held independently by a trusted third party;
  • There are already some pilot activities with these cloud services and opportunities for shared learning across the community.
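
The integrity checking and two-copy exit strategy mentioned above can be illustrated with a small sketch. It is not any provider's API: it assumes the two copies are reachable as ordinary directories (for example a local store and a mounted or downloaded cloud copy), and the function names `sha256_of`, `build_manifest` and `compare_copies` are hypothetical names chosen for this illustration. It builds a checksum manifest for each copy and reports files that are missing or that differ.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copies(
    manifest_a: dict[str, str], manifest_b: dict[str, str]
) -> dict[str, list[str]]:
    """Report files missing from either copy and files whose checksums differ."""
    names_a, names_b = set(manifest_a), set(manifest_b)
    return {
        "missing_from_b": sorted(names_a - names_b),
        "missing_from_a": sorted(names_b - names_a),
        "mismatched": sorted(
            name
            for name in names_a & names_b
            if manifest_a[name] != manifest_b[name]
        ),
    }
```

Run on a schedule, a comparison of this kind gives an archive its own evidence that both copies remain complete and identical, independent of whatever integrity checks each provider performs internally.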

 

Negatives

 

  • The Cloud is designed for flexibility and rapid change. Archives however are long-term. Cloud storage and service contracts need careful management through time to meet archive needs. Data held in archives must be expected to be both preserved and accessible beyond the commercial lifespan of any current technology or service provider;
  • Cloud can be cheaper, but it often requires organisations to think differently about the way their budgets are managed. Managing IT service vendors and contracts also calls for different skills, which may involve retraining or recruitment costs;
  • Public cloud services tend to bill each month for capacity that has actually been consumed. As a result it can be difficult to budget ahead, or to accurately predict the amount of data likely to be uploaded, stored, or downloaded (however some vendors can invoice you for an annual subscription based on volume);
  • As with any form of outsourcing, it is important that archives exercise due diligence in assessing and controlling the risks of cloud storage. Ensure that any legal requirements and obligations relating to third party rights in, or over, the data to be stored will be met. These may relate to management, preservation or access, and may have been placed upon archives and their parent organisations by their donors and funders via contracts and agreements or via legislation by Government;
  • Use of cloud services will require archives to consider copyright-related questions including: who currently owns the copyright; whether additional licence permissions may be required; what permissions the cloud provider will need to provide the service; whether the cloud provider is able to use the data for their own purposes; and which party will own the rights in any data or works created from the original data;
  • Use of cloud services may raise data security issues where the relevant data is ‘personal data’ (e.g. data that permits the identification of a living individual). These include determining responsibility for securing data and auditing providers, as well as questions about the location of processing and the extent to which risks incurred by automation of service provision can be addressed by contract;
  • The legal elements of the relationship between an archive and a cloud service provider or providers (e.g. terms of service contracts and service level agreements) must be well defined and meet your requirements. This can be challenging as many cloud providers have standard SLAs and contracts to achieve commodity pricing and have limited flexibility on negotiating terms;
  • Explicit provision must be made for pre-defined exit strategies and effective testing, monitoring and audit procedures.
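
The budgeting difficulty with monthly usage-based billing noted above can be made concrete with a rough projection. This is a minimal sketch under stated assumptions: the `Rates` class, the function names and all prices are hypothetical, and real provider tariffs are usually tiered and include other charges (such as request fees) that are ignored here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical per-unit prices; real provider tariffs will differ."""

    storage_per_gb_month: float
    download_per_gb: float
    upload_per_gb: float = 0.0


def monthly_cost(
    stored_gb: float, uploaded_gb: float, downloaded_gb: float, rates: Rates
) -> float:
    """Cost of one month: capacity held plus data moved in and out."""
    return (
        stored_gb * rates.storage_per_gb_month
        + uploaded_gb * rates.upload_per_gb
        + downloaded_gb * rates.download_per_gb
    )


def project_year(
    initial_gb: float,
    monthly_ingest_gb: float,
    monthly_download_gb: float,
    rates: Rates,
) -> list[float]:
    """Project twelve monthly bills for a collection that grows by ingest."""
    costs = []
    stored = initial_gb
    for _ in range(12):
        stored += monthly_ingest_gb  # the collection grows before billing
        costs.append(
            monthly_cost(stored, monthly_ingest_gb, monthly_download_gb, rates)
        )
    return costs
```

Varying the ingest and download assumptions shows how quickly the annual total moves, which is one reason some archives prefer an annual subscription based on volume.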

 

Conclusions

 

The term "cloud" can encompass a wide range of implementation models for digital preservation services. There is much that can be learnt from organisations that have already piloted or moved to use of the cloud. For example, several archives have been able to address the most widely held concerns over cloud services and find ways to successfully integrate cloud storage into their digital preservation activities. Others are using cloud-based services for all or part of their other digital preservation functions, such as preservation planning. Ultimately, procuring cloud services is similar to procuring any IT: you have to manage and address risks as you would for any other part of your IT infrastructure.

 

Resources

The National Archives Guidance on Cloud Storage and Digital Preservation (2nd Edition 2015)

http://www.nationalarchives.gov.uk/documents/CloudStorage-Guidance_March-2015.pdf

This guidance explores how cloud storage in digital preservation is developing, emerging options and good practice, together with requirements and standards that archives should consider. Sections focussing on services, legal issues, and five linked case studies, are provided. Sources of further advice and guidance are also included. (39 pages).

Aitken, B, McCann, P, McHugh, A and Miller, K, 2012, Digital Curation and the Cloud, DCC

http://www.dcc.ac.uk/sites/default/files/documents/Curation-in-the-Cloud_master_final.pdf

This 2012 report focused on the use of cloud services for research data curation. It provided some definitions of cloud computing and examined a number of cloud approaches open to HE institutions in 2012. (30 pages).

Anderson, S, 2014, Feet On The Ground: A Practical Approach To The Cloud. Nine Things To Consider When Assessing Cloud Storage, AV Preserve

http://www.avpreserve.com/wp-content/uploads/2014/02/AssessingCloudStorage.pdf

A white paper on cloud services, divided into nine topics and questions to ask. Vendor profiles against these nine topics are available. (7 pages).

Brown, A and Fryer, C, 2014, Achieving Sustainable Digital Preservation in the Cloud

http://www.girona.cat/web/ica2014/ponents/textos/id87.pdf

This paper describes how Parliament is using the cloud as part of its digital repository infrastructure. 2014 (10 pages).

Digital Preservation Specialist Cloud Service Providers

ArchivesDirect

http://archivesdirect.org

ArchivesDirect features a hosted instance of Archivematica with storage via DuraCloud in secure, replicated Amazon S3 and Amazon Glacier storage.

Arkivum

http://arkivum.com

Arkivum's Archive as a Service provides a fully-managed and secure service for long-term data retention with online access and a guarantee of data integrity that's part of its Service Level Agreement and backed by worldwide insurance.

DuraCloud

http://www.duracloud.org

DuraCloud is a managed service from DuraSpace. It provides support and tools that automatically copy content onto several different cloud storage providers and ensure that all copies of the content remain synchronized. See also ArchivesDirect for its joint service with Archivematica.

LIBNOVA

https://www.libnova.com/

LIBSAFE is an integrated digital preservation platform, based on microservices and plug-ins, that includes a pre-ingestion module, full OAIS conformance and a public access interface. It can be obtained as a software license to be deployed on the organization’s infrastructure or as a fully managed cloud service, running in any of the Amazon AWS regions worldwide.

Preservica

https://preservica.com/digital-archive-software/products-editions

Preservica Cloud Edition is a fully cloud-hosted, OAIS-compliant digital preservation platform that also includes public access and discovery, allowing you to safely share your archive or collection.

David Rosenthal's blog

http://blog.dshr.org/

Contains a number of posts on the economics of cloud computing.

 

Case studies

The National Archives case study: Archives & Records Council Wales Digital Preservation Working Group

http://www.nationalarchives.gov.uk/documents/Cloud-Storage-casestudy_Wales_2015.pdf

This case study discusses the experience of a cross-sectoral working group of Welsh archives cooperating to test a range of systems and service deployments in a proof of concept for cloud archiving. It explains the organisational context, the varied nature of their digital preservation requirements and approaches, and their experience with selecting, deploying and testing digital preservation in the cloud. The case study examined the open source Archivematica software with Microsoft's Windows Azure, Archivematica with CloudSigma, and Preservica Cloud Edition; the group has also begun testing Archivematica with Arkivum 100. January 2015 (10 pages).

The National Archives case study: Tate Gallery

http://www.nationalarchives.gov.uk/documents/Cloud-Storage-casestudy_Tate_Gallery_2015.pdf

This case study discusses the experience of developing a shared digital archive for the Tate's four physical locations powered by a commercial storage system from Arkivum. It explains the organisational context, the nature of their digital preservation requirements and approaches, and their rationale for selecting Arkivum's on-premise solution, "Arkivum/OnSite" in preference to any cloud-based offerings. It concludes with the key lessons learned, and discusses plans for future development. January 2015 (7 pages).

The National Archives case study: Dorset History Centre

http://www.nationalarchives.gov.uk/documents/Cloud-Storage-case-study_Dorset_2015_%281%29.pdf

This case study covers the Dorset History Centre, a local government archive service. It explains the organisational context of the archive, the nature of its digital preservation requirements and approaches, its two-year pilot project using Preservica Cloud Edition (a cloud-based digital preservation service), the archive's technical infrastructure, and the business case and funding for the pilot. It concludes with the key lessons learnt and future plans. January 2015 (9 pages).

The National Archives case study: Parliamentary Archives

http://www.nationalarchives.gov.uk/documents/Cloud-Storage-casestudy_Parliament_2015.pdf

This case study covers the Parliamentary Archives and their experience of procuring via the G-Cloud framework. For extra resilience, and as an exit strategy, they have selected two cloud service providers with different underlying storage infrastructures. It is an example of an archive using a hybrid set of storage solutions (part public cloud and part locally installed) for digital preservation: the archive has a locally installed preservation system (Preservica Enterprise Edition) that is integrated with both cloud and local storage, and it stores sensitive material locally rather than in the cloud. January 2015 (6 pages).

The National Archives case study: Bodleian Library, University of Oxford

http://www.nationalarchives.gov.uk/documents/Cloud-storage-casestudy_Oxford_2015.pdf

This case study covers the Bodleian Library and the University of Oxford, and the provision of a "private cloud" local infrastructure for its digital collections including digitised books, images and multimedia, research data, and catalogues. It explains the organisational context, the nature of its digital preservation requirements and approaches, its storage services, technical infrastructure, and the business case and funding. It concludes with the key lessons they have learnt and future plans. January 2015 (6 pages).

King's College London Kindura Project

http://link.springer.com/article/10.1186%2F2192-113X-2-13

The Kindura project, led by King's College London and funded by Jisc, sought to pilot the use of a hybrid cloud for research data management. It used DuraCloud to broker between storage or compute resources supplied by external cloud services, shared services, or in-house services. An earlier case study prepared by Jisc exists, but the link here is to a more recent open-access article on the project.

University of Illinois Archives 2011 evaluation of Archivematica

http://e-records.chrisprom.com/evaluating-open-source-digital-preservation-systems-a-case-study-2/

Angela Jordan describes a 2011 evaluation by the University of Illinois Archives of Archivematica—an open-source, OAIS Reference Model-compliant digital preservation system. Because Archivematica was then in its alpha stages, working with this system was a way to explore what the system offered in relation to the needs of the University Archives, as well as provide input to the developers as they continued to refine the software for production release.