Illustration by Jørgen Stamp digitalbevaring.dk CC BY 2.5 Denmark
Introduction
This section covers the emerging practice of using IT storage systems for digital preservation. It deals with generic issues; more specific issues associated with cloud storage are addressed separately (see Cloud services section). The traditional practice of preserving digital media, for example legacy items within existing collections, is also covered elsewhere (see Legacy media section). Many organisations will have mixed strategies or will be in the process of transitioning from one to the other.
The use of storage technology for digital preservation has changed dramatically over the last twenty years. Previously, the norm was to store digital materials on discrete media items, e.g. individual CDs, tapes, etc., which were then migrated periodically to address degradation and obsolescence. Today, it has become more common practice to use resilient IT storage systems for the increasingly large volumes of digital material that need to be preserved and, perhaps more importantly, that need to be easily and quickly retrievable in a culture of online access. In this way, digital material has become decoupled from the underlying mechanism of its storage. With this comes the benefit of allowing different preservation activities to be handled independently.
Resilient storage systems
A resilient IT storage system consists of storage media contained within a server that provides built-in resilience to various failure modes through redundancy and recovery. For example, a storage system might be hard disk drives in a Redundant Array of Independent Disks (RAID), data tapes in a tape library, or a combination of storage types in a Hierarchical Storage Management system (HSM). It can include onsite storage and/or remote cloud storage and automated replication of digital materials across multiple sites and systems.
These systems will still become obsolete over time and digital materials should be migrated regularly between storage systems as they become obsolete. Migration between storage systems is separate to migration between file formats and can be handled largely as an IT issue, with the proviso that proper oversight is employed to ensure preservation requirements are met. The upside is that the use of IT systems for data storage can provide much faster access, a more scalable solution, easier management, and ultimately lower costs especially at scale.
It is critical to understand the difference between standard IT storage solutions and the additional needs of long-term preservation. It is essential to be able to explain these differences to your IT department or storage service provider and to be able to specify these requirements when procuring a system or service. Standard storage systems are designed for digital objects that are in active use. While backup procedures are usually included, they generally do not meet the more stringent requirements to ensure long-term preservation of digital materials. Backup and digital preservation are not the same thing and many IT departments or experts may not appreciate this. Preservation storage systems require a higher level of geographic redundancy, stronger disaster recovery, longer-term planning, and most importantly active monitoring of data integrity in order to detect unwanted changes such as file corruption or loss.
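The active monitoring of data integrity described above is commonly done by recording a checksum for each file and re-computing it at intervals. The following is a minimal sketch of that idea, not a production tool. The function names, the choice of SHA-256, and the in-memory manifest are assumptions made for illustration; a real system would store the manifest separately from the data it describes.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the current state of root against a previously stored manifest."""
    report = {"ok": [], "changed": [], "missing": []}
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file():
            report["missing"].append(rel_path)    # file loss
        elif sha256_of(target) != expected:
            report["changed"].append(rel_path)    # corruption or unwanted change
        else:
            report["ok"].append(rel_path)
    return report
```

A routine backup would typically not run a check like `check_fixity` on a schedule, which is one practical difference between backup and preservation storage.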
There are many ways of meeting requirements for preservation storage and these will vary in scale and complexity depending on organisational context. It will be necessary to assess in-house resources and consider out-sourcing and cloud storage options. The approach taken will often depend on the size and complexity of the collection and the resources that are available within the organisation. It is possible to meet preservation storage requirements with a basic set-up, but as a collection increases in size it will be necessary to address issues such as scalability and automation.
Principles for using IT storage systems for digital preservation
The following represent principles that should be employed when designing or selecting storage systems for preservation.
1. Redundancy and diversity
2. Fixity, monitoring, repair
3. Technology and vendor watch, risk assessment, and proactive migrations
4. Consolidation, simplicity, documentation, provenance and audit trails
Storage reliability
When looking at storage solutions, either onsite or cloud, the question arises of how reliable they are and, if something goes wrong, what that means in terms of data loss. Manufacturers will typically quote statistics such as reliability, durability, failure rates and error rates.
For example, this might take the form of a cloud service being designed for 99.999% durability, a Bit Error Rate (BER) of 1 in 10^16 when reading data from a data tape, or a Mean Time Between Failure (MTBF) of 1M hours for a hard drive. These numbers are then used to calculate further measures of reliability. For example, a Mean Time To Data Loss (MTTDL) of 1,000 years might be asserted when hard drives are used in a RAID6 array.
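To make figures like these more concrete, the sketch below converts an MTBF into a yearly failure probability and a bit error rate into an expected number of bad bits. It assumes a constant failure rate (an exponential model) and takes the tape bit error rate as 1 in 10^16; both are simplifications chosen for illustration, and the function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def annual_failure_probability(mtbf_hours: float) -> float:
    """Probability that one device fails within a year, assuming a constant failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def expected_failures(device_count: int, mtbf_hours: float) -> float:
    """Expected number of device failures per year across a fleet of identical devices."""
    return device_count * annual_failure_probability(mtbf_hours)


def expected_bit_errors(bytes_read: float, bit_error_rate: float) -> float:
    """Expected number of erroneous bits when reading the given volume of data."""
    return bytes_read * 8 * bit_error_rate


# An MTBF of 1M hours implies just under 0.9% of drives failing per year.
drive_afr = annual_failure_probability(1_000_000)

# Reading one petabyte (10**15 bytes) at a BER of 1 in 10**16 gives 0.8 expected bit errors.
tape_errors = expected_bit_errors(1e15, 1e-16)
```

Under this model a quoted MTBF of 1M hours does not mean that a drive lasts a million hours; it means that in a fleet of 1,000 such drives roughly nine would be expected to fail each year.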
These numbers can be hard to understand, and they need to be interpreted with great care when attempting to estimate ‘how safe’ a given storage solution will be.
There has already been substantial work on how to describe, measure and predict storage reliability, including from a digital preservation perspective. This is a complex topic that it is not possible to cover fully in this Handbook. Some example references for further reading are Greenan et al (2010), Rosenthal (2010) and Elerath (2009). Several important considerations come out of this work:
- IT storage technology is in general remarkably reliable for what it does. Failures are relatively rare events, but they do and will happen. The temptation is to assume that, because a particular type of failure hasn't been experienced at an individual level, storage technology is in general more reliable than it really is. This is a dangerous position. For example, many people will have hard drives that have worked perfectly well for years and years, but the reality is that, on average, up to 5-15% of hard drives actually fail within one year (Backblaze, 2014), (Pinheiro et al, 2007).
- Because failures and errors are relatively rare events, reliability statistics from vendors are typically based on models and simulations and not on long-term observations of what actually happens in practice. For example, if a manufacturer says that the shelf life of media is 30 years then it's not because they have actually tested media over that time period. Likewise, if a vendor estimates the MTTDL is 1,000 years then they clearly haven't built a system and tested it for anywhere near that length of time. Therefore, statistics should be interpreted as best estimates from vendors of how a system might behave in practice, but it may not actually turn out that way. For example, field studies have suggested that manufacturer estimates of reliability can be over-optimistic (Jiang et al, 2008).
- The likelihood of data loss increases dramatically when correlations are taken into account. Correlations are where parts of a system, or different copies of the digital material, can't be considered as independent. If there is a problem with one part of the system or copy of the digital material then there is likely to be a problem with another part or copy. Examples include a manufacturing fault affecting all the hard drives in a storage server, software or firmware bugs systemically corrupting digital material, failure by an organisation to regularly test its backups, or failure to isolate or decouple storage systems so that if one copy of the digital material is accidentally deleted then all the other copies don't get deleted too. These correlation factors can be far more significant than the specific failure modes covered by reliability statistics.
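The effect of correlation can be illustrated with a deliberately simple model. The sketch below assumes each copy is lost in a given year with the same probability, and adds a single "common cause" event that destroys every copy at once. The model, the parameter names and the example numbers are all assumptions for illustration, not measured values.

```python
def loss_probability_independent(per_copy_loss: float, copies: int) -> float:
    """Chance that every copy is lost when copies fail independently of each other."""
    return per_copy_loss ** copies


def loss_probability_correlated(per_copy_loss: float, copies: int, common_cause: float) -> float:
    """Chance that every copy is lost when a shared event can destroy all copies at once.

    common_cause is the probability of such an event, for example a firmware bug
    or a deletion that propagates to every mirror.
    """
    independent = loss_probability_independent(per_copy_loss, copies)
    return common_cause + (1.0 - common_cause) * independent


# Three copies, each with a 5% yearly chance of loss.
independent = loss_probability_independent(0.05, 3)        # 0.000125
correlated = loss_probability_correlated(0.05, 3, 0.01)    # 0.01012375
```

With these illustrative numbers, a 1% chance of a shared failure makes total loss about 80 times more likely than the independent calculation suggests, and adding further copies barely helps unless the shared cause is removed.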
These findings and observations result in the following recommendations:
- Plan for failures to happen in IT storage solutions no matter how cleverly designed by the manufacturer. Failure rates in practice may well be higher than manufacturer statistics suggest.
- Data loss can be caused by failure to put in place proper processes and procedures around the use of IT storage as well as from the storage technology itself. Proper risk assessment is the way to identify and manage these problems.
- The best strategy remains to create multiple independent copies of digital material in different locations and to store them using different technologies where possible. This should include a process of actively and regularly checking data integrity of all the copies so problems can be detected no matter why and where they might occur. In this way, risks are both minimised and spread, and reliance isn't placed on any particular storage technology or service being completely error free.
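The last recommendation, checking every copy and correcting problems wherever they occur, might be sketched as follows. This assumes a SHA-256 digest was recorded when the material was first ingested and that each copy is reachable as a file path; the function names are hypothetical and the whole-file read is only suitable for small examples.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file (whole-file read, fine for a sketch)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def audit_copies(copies: list[Path], expected_digest: str) -> tuple[list[Path], list[Path]]:
    """Split copies into those matching the recorded digest and those missing or altered."""
    good, bad = [], []
    for copy in copies:
        if copy.is_file() and sha256_of(copy) == expected_digest:
            good.append(copy)
        else:
            bad.append(copy)
    return good, bad


def repair_copies(copies: list[Path], expected_digest: str) -> list[Path]:
    """Overwrite each damaged copy from a verified good one; return the repaired paths."""
    good, bad = audit_copies(copies, expected_digest)
    if not good:
        raise RuntimeError("no intact copy left to repair from")
    for damaged in bad:
        damaged.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(good[0], damaged)
        # Re-check after writing so that a failed repair is not silently accepted.
        if sha256_of(damaged) != expected_digest:
            raise RuntimeError(f"repair of {damaged} could not be verified")
    return bad
```

Repair only works while at least one copy still matches the recorded digest, which is why the checks need to run regularly rather than only when a problem is suspected.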
Multi-copy storage strategies
Digital storage technologies present several risks to long-term preservation of digital objects. These risks can be reduced by using a digital storage strategy that involves one or more storage systems and at least two copies of the data.
Good practice is for a storage strategy to have the following characteristics:
(a) multiple independent copies exist of the digital materials
(b) these copies are geographically separated into different locations
(c) the copies use different storage technologies
(d) the copies use a combination of online and offline storage techniques
(e) storage is actively monitored to ensure any problems are detected and corrected quickly.
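As a rough illustration, characteristics (a) to (e) can be written down as a checklist and applied to a description of where copies are held. The `Copy` record and its fields below are hypothetical, and the check for (a) only counts copies: whether they are genuinely independent needs human judgement that a script like this cannot supply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    location: str      # a site identifier, e.g. "site-1"
    technology: str    # e.g. "disk", "tape", "cloud"
    online: bool       # False for offline copies such as shelved tape
    monitored: bool    # True if integrity checks run regularly on this copy


def unmet_characteristics(copies: list[Copy]) -> list[str]:
    """Return the letters (a)-(e) of the good-practice characteristics a plan does not meet."""
    unmet = []
    if len(copies) < 2:
        unmet.append("a")    # fewer than two copies
    if len({c.location for c in copies}) < 2:
        unmet.append("b")    # all copies in one location
    if len({c.technology for c in copies}) < 2:
        unmet.append("c")    # all copies on one storage technology
    if len({c.online for c in copies}) < 2:
        unmet.append("d")    # no mix of online and offline storage
    if not copies or not all(c.monitored for c in copies):
        unmet.append("e")    # at least one copy is not actively monitored
    return unmet


plan = [
    Copy(location="site-1", technology="disk", online=True, monitored=True),
    Copy(location="site-2", technology="tape", online=False, monitored=True),
]
gaps = unmet_characteristics(plan)  # an empty list: every check passes
```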
A digital storage strategy can be implemented in a staged way, starting with a basic level of protection and access to digital content and moving on towards a more automated and scalable approach that gives a higher level of data safety and security.
Risks to digital content come from a range of sources and a digital storage strategy helps balance the cost of digital storage with the reduction of those risks. Example risks to consider include fire, flood, failure to instigate or follow proper processes or procedure, malicious attack, media degradation, and obsolescence of storage systems and technologies. The principal risks and means of addressing or mitigating them are often addressed in an organisation's business continuity planning (see Risk and change management).
It is important to realise that many examples of content loss are not necessarily due to technical faults with storage technology (although it is important to recognise that these do happen), but can come from human error, lack of budget or planning of storage migrations, or a failure to regularly check and correct failures that might occur.
In a world that is increasingly using networked systems and technologies for digital storage, there is a role for an offline copy of digital materials. This can provide a 'fire break' against problems with online systems that can automatically propagate between locations, e.g. deletion of a file in one location that automatically deletes a mirrored copy at another site.
Making more than one copy of the digital materials is fundamental to achieving a basic level of data safety. Using different types of storage for each copy helps spread the risk and ensure that a problem with one technology doesn't affect the others. The way each copy is stored can be adjusted to achieve an acceptable overall level of cost, risk and complexity. For example, one copy might be held using an online storage server for fast access and one copy might be on data tape in deep archive for low cost and relatively high safety.
This Handbook follows the National Digital Stewardship Alliance (NDSA) preservation levels (NDSA, 2013) below in recommending four levels at which digital preservation can be supported through storage and geographic redundancy. We make the additional recommendation of using a combination of online and offline copies to achieve a good combination of data access and data safety:
Level | Approach | Risks addressed and benefits achieved
1 | |
2 | |
3 | |
4 | |
Managing storage system obsolescence and risks
The use of storage technologies and solutions needs careful planning and management to be an effective approach to supporting digital preservation. If done properly, the result can be very good levels of data safety, rapid access to content when needed, and costs that are both low and predictable.
IT storage technologies can fail or cause data corruption and the lifetime of media and systems is typically short, for example 3-5 years, which means solutions become obsolete quickly and migration is needed to avoid digital materials becoming at risk. Migration in this context means moving data off an old storage system and onto a new storage system. The digital material itself does not change but the storage solution does. An IT department or storage service provider will think of migration at the storage level. This is in contrast to file format migration where the file format will change, but the way that the files are stored doesn't change.
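A storage-level migration of this kind can be sketched as a copy followed by a check that nothing changed in transit. The example below assumes both the old and new systems are mounted as ordinary directories, which will not hold for every storage product; the function names are hypothetical and the whole-file read is only suitable for small examples.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file (whole-file read, fine for a sketch)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def migrate_storage(old_root: Path, new_root: Path) -> dict[str, str]:
    """Copy every file from old_root to new_root and verify that it arrived unchanged.

    Returns a mapping of relative path to digest, usable as an audit record.
    The source is left in place so that it can be retired only after review.
    """
    migrated = {}
    for source in sorted(old_root.rglob("*")):
        if not source.is_file():
            continue
        rel_path = source.relative_to(old_root)
        target = new_root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        digest_before = sha256_of(source)
        shutil.copy2(source, target)  # copy2 also carries over timestamps where possible
        if sha256_of(target) != digest_before:
            raise RuntimeError(f"checksum mismatch after copying {rel_path}")
        migrated[rel_path.as_posix()] = digest_before
    return migrated
```

The file contents and formats are untouched here; only the storage location changes, which is what distinguishes this from file format migration.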
Resources
NDSA Levels of Preservation
http://www.digitalpreservation.gov/ndsa/activities/levels.html (2013)
https://ndsa.org//publications/levels-of-digital-preservation/ (Version 2.0, 2018)
The National Digital Stewardship Alliance (NDSA) "Levels of Digital Preservation" are a tiered set of recommendations for how organizations should begin to build or enhance their digital preservation activities. It is intended to be a relatively easy-to-use set of guidelines useful not only for those just beginning to think about preserving their digital assets, but also for institutions planning the next steps in enhancing their existing digital preservation systems and workflows. It is not designed to assess the robustness of digital preservation programs as a whole since it does not cover such things as policies, staffing, or organizational support.
These are some of the more notable digital preservation storage systems and storage system/service providers. There is a wide range of commodity IT storage vendors, as well as specialist digital preservation service providers that can provide onsite or cloud storage (see also Cloud services). These specialists typically support other preservation functions in addition to storage.
Arkivum
Digital Preservation Network
DSpace
ePrints
Fedora
iRods
LOCKSS
OCLC Digital Archive CONTENTdm
http://www.oclc.org/digital-archive.en.html
Portico
http://www.portico.org/digital-preservation/
Preservica
Rosetta
https://www.exlibrisgroup.com/products/rosetta-digital-asset-management-and-preservation/
Community Owned digital Preservation Tool Registry COPTR
http://coptr.digipres.org/Main_Page
Although focussing principally on tools, the COPTR registry also covers a range of storage systems and services. It acts primarily as a finding and evaluation tool to help practitioners find the tools they need to preserve digital data. COPTR captures basic, factual details about a tool, what it does, how to find more information (relevant URLs) and references to user experiences with the tool.
DSHR's Blog
David Rosenthal is a computer scientist and chief scientist for the LOCKSS project. His blog frequently covers computer storage development and trends and implications for digital preservation.
Case studies
The National Archives case study: Bodleian Library, University of Oxford
http://www.nationalarchives.gov.uk/documents/archives/case-study-oxford.pdf
This case study covers the Bodleian Library and the University of Oxford, and the provision of a "private cloud" local infrastructure for its digital collections including digitised books, images and multimedia, research data, and catalogues. It explains the organisational context, the nature of its digital preservation requirements and approaches, its storage services, technical infrastructure, and the business case and funding. It concludes with the key lessons they have learnt and future plans. January 2015 (4 pages).
The National Archives case study: Parliamentary Archives
http://www.nationalarchives.gov.uk/documents/archives/case-study-parliament.pdf
This case study covers the Parliamentary Archives. It is an example of an archive using a hybrid set of storage solutions (part-public cloud and part-locally installed) for digital preservation as the archive has a locally installed preservation system (Preservica Enterprise Edition) which is integrated with cloud and local storage and is storing sensitive material locally, not in the cloud. January 2015 (4 pages).
The National Archives case study: Tate Gallery
http://www.nationalarchives.gov.uk/documents/archives/case-study-tate-gallery.pdf
This case study discusses the experience of developing a shared digital archive for the Tate's four physical locations powered by a commercial storage system from Arkivum. It explains the organisational context, the nature of their digital preservation requirements and approaches, and their rationale for selecting Arkivum's on-premise solution, "Arkivum/OnSite" in preference to any cloud-based offerings. It concludes with the key lessons learned, and discusses plans for future development. January 2015 (4 pages).
References
Backblaze, 2014. Hard Drive Reliability Update – Sep 2014. Backblaze. [blog] Available: https://www.backblaze.com/blog/hard-drive-reliability-update-september-2014/
Elerath, J., 2009. Hard-Disk Drives: The Good, the Bad, and the Ugly. Communications of the ACM. 52 (6), 38-45. Available: doi:10.1145/1516046.1516059. http://cacm.acm.org/magazines/2009/6/28493-hard-disk-drives-the-good-the-bad-and-the-ugly/fulltext
Greenan, K.M., Plank, J.S. & Wylie, J.J., 2010. Mean time to meaningless: MTTDL, Markov models, and storage system reliability. Proceedings of the 2nd USENIX conference on Hot topics in storage and file systems. Available: https://www.usenix.org/legacy/event/hotstorage10/tech/full_papers/Greenan.pdf
Jiang, W. et al., 2008. Are Disks the Dominant Contributor for Storage Failures? A Comprehensive Study of Storage Subsystem Failure Characteristics. Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST '08). Available: http://www.usenix.org/events/fast08/tech/jiang.html
NDSA, 2013. The NDSA Levels of Digital Preservation: An Explanation and Uses, version 1 2013. National Digital Stewardship Alliance. Available: http://www.digitalpreservation.gov/ndsa/working_groups/documents/NDSA_Levels_Archiving_2013.pdf
Pinheiro, E., Weber, W-D. & Barroso, L.A., 2007. Failure Trends in a Large Disk Drive Population. Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST '07). Available: http://static.googleusercontent.com/media/research.google.com/en//archive/disk_failures.pdf
Rosenthal, D.S.H., 2010. Bit Preservation: A Solved Problem? The International Journal of Digital Curation. 5 (1) Stanford University Libraries, CA. Available: http://www.ijdc.net/index.php/ijdc/article/view/151/224