Daniel Greenberg is Product Manager for Digital Resources Management at Ex Libris.
The theme for World Digital Preservation Day this year is “Breaking Down Barriers,” highlighting how digital preservation supports expanding horizons for libraries and other institutions. For national libraries, this expansion comes with some unique considerations and best practices.
But let’s take a step back and look at the basic challenges national libraries face in managing preservation of their digital assets.
Managing preservation at scale
Digitally preserved archives, by their nature, tend to grow with time. This is especially true for national libraries, which generally have a mandate to preserve broadly defined cultural heritage content in perpetuity. Historical data is rarely deleted, and the types of content collected diversify over time. Where digital archives once housed only digitized books and manuscripts, they now must contend with preserving content like social media, audio recordings, video clips, native formats, websites, and so on.
For some institutions, that can add up to digital preservation of more than 5 petabytes of data. To put it another way, some national libraries are managing archives of over 1 billion metadata-rich files – and more will be joining them in the near future. Scalability and the capability to handle massive volumes of data are therefore increasingly crucial for digital preservation.
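To get a feel for those numbers, here is a rough back-of-the-envelope calculation. It assumes the two round figures above describe the same archive and uses decimal petabytes; the 1 GB/s throughput is a purely hypothetical value chosen for illustration, not a measured one.

```python
# Back-of-the-envelope scale estimate using the round figures from the text.
PETABYTE = 10**15          # decimal petabyte, in bytes
SECONDS_PER_DAY = 86_400


def average_file_size_bytes(total_bytes: int, file_count: int) -> float:
    """Mean size of one file in the archive."""
    return total_bytes / file_count


def days_to_process(total_bytes: int, bytes_per_second: float) -> float:
    """Days needed to read the whole archive once at a sustained rate."""
    return total_bytes / bytes_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    total = 5 * PETABYTE
    avg = average_file_size_bytes(total, 1_000_000_000)
    print(f"average file size: {avg / 10**6:.1f} MB")
    # Hypothetical sustained throughput of 1 GB/s (an assumption).
    print(f"days for one full pass: {days_to_process(total, 10**9):.1f}")
```

Even under these generous assumptions, a single full pass over the archive takes weeks, which is why any repository-wide operation has to be planned with scale in mind.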
Sometimes, large sets of files in the digital archive are preserved in a format or on a system that is at risk of sunsetting (or that is simply less efficient). In such cases, national libraries must be able to convert and migrate hundreds of thousands of affected files efficiently and completely.
In addition to managing content already in their digital archive, national libraries often have to rapidly ingest large numbers of new files and massive amounts of data. This throughput can include batches of documents, entire collections, digitized daily journals, multiple large multimedia files, and more. The process must be lossless (no information may be lost along the way) and timely, to avoid creating both ingest logjams and patron frustration.
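One common way to check that ingest is lossless is a fixity check: compute a checksum before the transfer and compare it with a checksum of the stored copy. The sketch below illustrates the idea with Python's standard library. The function names and the flat directory standing in for a repository are assumptions made for this example; they do not describe how any particular preservation system implements ingest.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest_with_fixity(source: Path, repository_dir: Path) -> str:
    """Copy a file into a (hypothetical) repository folder and verify it.

    Returns the checksum on success; removes the bad copy and raises
    OSError if the stored bytes differ from the source.
    """
    repository_dir.mkdir(parents=True, exist_ok=True)
    expected = sha256_of(source)
    target = repository_dir / source.name
    shutil.copy2(source, target)
    actual = sha256_of(target)
    if actual != expected:
        target.unlink()
        raise OSError(f"fixity check failed for {source.name}")
    return actual
```

Storing the returned checksum alongside the file also allows the same comparison to be repeated later, to detect silent corruption over time.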
The endgame is sharing
The goal of digital preservation in national libraries is, ultimately, to share the valued products of the nation’s cultural heritage with the general public, across borders and with various research institutions. The challenge is to catalog and index such a huge volume of data, making it easily searchable and widely available. In addition, an effective indexing methodology will allow the digital preservation system to seamlessly run services encompassing all of its content.
Another important way in which national libraries can break down barriers and safely share their digital archives is third-party certification. Whether governmental, regional or international, trusted digital repository certificates indicate to partners, users and funders that the preserved collections are being effectively managed and safeguarded in accordance with protocols and standards. Having been deemed reliable, secure and durable by an external authority, the archived data is more likely to be shared over a longer period of time, ensuring its lasting value.
Finally, national libraries should cultivate relationships both internationally and domestically. This will allow them to share knowledge, best practices and tools related to digital preservation, and to lay the groundwork for potential collaborations.
What we’ve learned thus far
As Product Manager with Ex Libris, I have had the privilege of seeing how Rosetta – our unique end-to-end solution for secure management, preservation and delivery of born-digital and digitized assets – addresses all the digital preservation challenges faced by national libraries. That is why a growing community of national and state libraries around the world, alongside many other archives, museums, academic libraries and institutions, uses Rosetta to preserve its most culturally important digital assets.
Rosetta is currently serving institutions with hundreds of millions of files in multiple formats, preserved in dark archives or open to the public. Any user can easily upload content to the preservation repository on demand, while terabytes of data are also added automatically every day, without human intervention. In part, this is thanks to Rosetta’s advanced parallelization algorithm for managing simultaneous deposit of massive amounts of data.
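To illustrate the general idea of parallel deposit (and not Rosetta’s own algorithm, which is not described here), the following sketch uses a thread pool from Python’s standard library to deposit several items at once and to record failures per item rather than aborting the batch. The names deposit_in_parallel and fake_deposit are hypothetical stand-ins invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def deposit_in_parallel(
    items: Iterable[str],
    deposit_one: Callable[[str], str],
    max_workers: int = 8,
) -> Dict[str, str]:
    """Run deposit_one for every item concurrently.

    A failure in one item is recorded against that item only, so the
    rest of the batch still completes.
    """
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(deposit_one, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                results[item] = future.result()
            except Exception as error:  # keep going; report per item
                results[item] = f"failed: {error}"
    return results


def fake_deposit(item: str) -> str:
    """Hypothetical stand-in for a real deposit call."""
    if item.endswith(".tmp"):
        raise ValueError("unsupported file")
    return "stored"


if __name__ == "__main__":
    print(deposit_in_parallel(["a.pdf", "b.wav", "c.tmp"], fake_deposit))
```

Threads suit this kind of work because deposit is typically dominated by network and disk waits; a production system would also need queuing, retries and back-pressure, which this sketch leaves out.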
Archived items in Rosetta are indexed with SolrCloud, a flexible, distributed and state-of-the-art indexing framework. Advanced preservation planning ensures long-term access to the digital archives, in accordance with the latest PREMIS guidelines (created in collaboration with major institutions around the globe). In addition, many institutions in our user community have already achieved CoreTrustSeal (CTS) and nestor Seal certifications.
The Rosetta user community, which includes top digital preservation experts, is a rich, global forum for exploring new and innovative directions in digital asset management. Less experienced institutions are able to benefit from the experience of their colleagues at other libraries and incrementally expand their usage of the system over time.
Finally, we see it as our responsibility to provide proactive support to national libraries using Rosetta as they devote resources to preserving a rapidly growing volume of digital assets. To that end, we have set up a unique test environment simulating preservation processes for over 1 billion diverse, data-rich files. We are identifying the vulnerabilities, pain points and risks at that scale now, so our customers won’t ever have to.