Peter Zhou is Director and Assistant University Librarian at the University of California, Berkeley.
Over the past decade, I have spoken frequently at conferences on both sides of the Pacific, on digital information management and digital preservation, and I have just as frequently encountered academic leaders, librarians, and information specialists working under the misconception that digitization somehow equals digital preservation.
To many, converting print or analog content to a digital format and transferring the converted content to a disk, server, or other storage device is an exercise in digital preservation. I usually point out that digital conversion makes content digital, but it cannot and will not guarantee that the digitized content will be preserved for an indefinite period to come, since the new format may become old, obsolete, or unusable within a few years. Beyond obsolescence, there are the problems of format reconciliation, checksums, error correction, data storage, and data migration, all of which are critical components of a robust digital preservation operation. By simply storing the digital content and doing nothing else, one misses all of those vital steps.
Storing digital content is not the same as preserving digital content. The former relocates content to a storage device but does not ensure that the stored content can be used and reused in the indefinite future. The latter seeks to deliver the content in its original state years from now. It is the first step toward permanently preserving knowledge and information stored in digital format. A trusted digital preservation program, together with established standards and preservation best practices, is essential to the success of any digital enterprise.
The Digital Dunhuang Project, for which I have acted as consultant, provides an exemplary illustration of the complete life cycle of digital preservation. The objective of the project is the long-term preservation of cultural heritage of inestimable value, together with a platform for sharing all digital assets generated in the course of preservation at this world heritage site. Its digital preservation processes include creating checksums, validating files, and extracting technical metadata upon ingest; monitoring file format obsolescence; migrating file formats as well as content; and tracking and copying files to LTO tapes, among others. This dynamic chain of actions and tasks allows the project to establish the means and tools for preserving massive amounts of data in perpetuity. Digital Dunhuang has shed light on what is needed to preserve the vast amount of digital assets in libraries and museums around the globe.
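To make the checksum step concrete, here is a minimal sketch in Python of what recording checksums at ingest and auditing them later might look like. It is an illustration only, not the Digital Dunhuang Project's actual tooling: the function names (`sha256_of`, `build_manifest`, `audit`), the choice of SHA-256, and the in-memory manifest are all assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Hypothetical ingest step: record a checksum for every file under root."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, str]:
    """Hypothetical later audit: flag files that are missing or have changed."""
    problems = {}
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file():
            problems[name] = "missing"
        elif sha256_of(target) != expected:
            problems[name] = "changed"
    return problems
```

Merely storing the files would stop after the copy. The preservation step is the repeated call to `audit` over the years, which is what reveals silent corruption or loss while a good copy still exists elsewhere.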
Figure 1. Digitizing the Dunhuang Caves. Image courtesy of the Dunhuang Foundation.
All those actions have a single purpose: to preserve the knowledge and digital heritage we humans create for many years to come. In the present day, we can still examine books and manuscripts that were created centuries ago. But we do not yet know whether the digital books and documents created today, or converted from print content, will endure for even a hundred years. The only way to achieve the goal of digital preservation is to develop scientific methods for preserving all forms of digital assets through robust systems, mechanisms, and platforms. The first step towards that goal, however, is clearly a better, and broader, understanding of the distinction between digitization and digital preservation.