Illustration by Jørgen Stamp digitalbevaring.dk CC BY 2.5 Denmark
Acronyms and initials are a feature of any specialised discipline. In an emerging discipline such as digital preservation, a further difficulty is the lack of a precise and definitive taxonomy of terms: different communities use the same terms in different ways, which can make effective communication problematic. The following working set of definitions and acronyms is used throughout the Handbook and the DPC Technology Watch Reports and website. It is intended to assist in the use of the Handbook as a practical tool.
Access As defined in the Handbook, access is assumed to mean continued, ongoing usability of a digital resource, retaining all qualities of authenticity, accuracy and functionality deemed to be essential for the purposes for which the digital material was created and/or acquired.
ADS Archaeology Data Service. A UK based service active in digital preservation. http://ads.ahds.ac.uk
AIP Archival Information Package. An Information Package, consisting of the Content Information and the associated Preservation Description Information (PDI), which is preserved within an OAIS (OAIS term).
AMIA Association of Moving Image Archivists, an organisation active in the field of moving image archiving. http://www.amianet.org
ARC Container format for websites devised by the Internet Archive, superseded by WARC.
ASCII American Standard Code for Information Interchange, standard for electronic text. https://en.wikipedia.org/wiki/ASCII
Authentication A mechanism which attempts to establish the authenticity of digital materials at a particular point in time. Digital signatures are one example.
Authenticity The digital material is what it purports to be. In the case of electronic records, it refers to the trustworthiness of the electronic record as a record. In the case of "born digital" and digitised materials, it refers to the fact that whatever is being cited is the same as it was when it was first created unless the accompanying metadata indicates any changes. Confidence in the authenticity of digital materials over time is particularly crucial owing to the ease with which alterations can be made.
Bit A bit is the basic unit of information in computing. It can have only one of two values, commonly represented as 0 or 1. The two values can be interpreted as any two-valued attribute (yes/no, on/off, etc.).
Bit Preservation A term used to denote a very basic level of preservation of a digital resource as it was submitted (literally, preservation of the bits forming a digital resource). It may include maintaining onsite and offsite backup copies, virus checking, fixity checking, and periodic refreshment to new storage media. Bit preservation is not digital preservation, but it does provide one building block for the more complete set of digital preservation practices and processes that ensure the survival of digital content and also its usability, display, context and interpretation over time.
Born-Digital Digital materials which are not intended to have an analogue equivalent, either as the originating source or as a result of conversion to analogue form. This term has been used in the Handbook to differentiate them from 1) digital materials which have been created as a result of converting analogue originals; and 2) digital materials which may have originated from a digital source but have been printed to paper, e.g. some electronic records.
BWF Broadcast Wave Format, the European Broadcasting Union standard for a WAV file with additional metadata. http://www.digitalpreservation.gov/formats/fdd/fdd000003.shtml
Byte (B) A unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable unit of memory in many computer architectures.
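The Bit and Byte entries can be illustrated with a few lines of Python (used here only as an example language). The sketch encodes a single character as ASCII, which uses one byte per character, and prints that byte's value and its eight bits.

```python
# Encode one character as ASCII: one character becomes one byte.
text = "A"
data = text.encode("ascii")
print(len(data))  # 1

# A byte is eight bits: show its integer value and its bit pattern.
value = data[0]
bits = format(value, "08b")
print(value, bits)  # 65 01000001

# ASCII is a 7-bit code, so the highest bit of an ASCII byte is always 0.
assert bits[0] == "0"
```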
CCSDS Consultative Committee for Space Data Systems, the body responsible for the OAIS Reference Model. http://public.ccsds.org/default.aspx
Chain of Custody A key concept in forensics whereby the custody and provenance of digital hardware, media and files are safeguarded through, for example, the appointment of evidence custodians. The Digital Evidence Bag (DEB) is designed to hold, in digital form and alongside the evidential digital objects, provenance metadata that can be updated as required: a concept that is familiar to digital preservation practitioners.
Checksum A numerical signature derived from the contents of a file. Used to compare copies and detect alteration: identical files always produce the same checksum, so differing checksums show that two copies differ.
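A minimal Python sketch of comparing copies by checksum, using the standard-library hashlib module. SHA-256 is chosen here purely as an example algorithm; it is an assumption of the sketch, not a Handbook requirement.

```python
import hashlib

def checksum(data: bytes) -> str:
    """Return the SHA-256 checksum of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

original = b"Digital preservation handbook"
copy = bytes(original)                      # an exact copy
altered = b"Digital preservation handbooK"  # one character changed

print(checksum(original) == checksum(copy))     # True
print(checksum(original) == checksum(altered))  # False
```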
CLIR Council on Library and Information Resources. US based organisation active in digital preservation. http://www.clir.org
CNI Coalition for Networked Information. US based organisation active in digital preservation. http://www.cni.org
Continuing Access refers to the right of a subscriber to an electronic publication and their users to have on-going permanent access to electronic materials which have already been leased and paid for by the subscriber from a publisher. It is a term used, along with its synonyms perpetual access and post-cancellation access, in the information industry to describe the ability to retain access to electronic materials by the subscriber/licensee after the contractual licensing agreement with the publisher/licensor for those materials has ended, whatever the reason for the cessation. It may also cover, as appropriate, arrangements for digital preservation needed to guarantee some elements of continuing access.
COPTR Community Owned digital Preservation Tool Registry, hosted by the Open Preservation Foundation. http://coptr.digipres.org
Crawl The act of browsing the web automatically and methodically to index or download content and other data from the web. The software to do this is often called a web crawler.
Dark Archive is an archive that cannot be accessed by any current users but may be accessible at future dates subject to the occurrence of specific pre-defined events ('trigger event'). Access to the data is either limited to a few set individuals or completely restricted to all.
DCC Digital Curation Centre. A UK based organisation active in digital preservation. http://www.dcc.ac.uk
DDI Data Documentation Initiative. A de facto international metadata standard for describing data from the social, behavioral, and economic sciences. http://www.icpsr.umich.edu/DDI
Designated Community an identified group of potential consumers who should be able to understand a particular set of information from an archive. These consumers may consist of multiple communities, are designated by the archive, and may change over time (OAIS term).
Digital Archiving This term is used very differently across sectors. The library and archiving communities often use it interchangeably with digital preservation. Computing professionals tend to use digital archiving to mean the process of backup and ongoing maintenance, as opposed to strategies for long-term digital preservation. It is the former, richer definition, as defined under Digital Preservation, that has been used throughout this Handbook.
Digital Forensics The application of scientific and technical methods and tools to the preservation, collection, validation, identification, analysis, interpretation, documentation and presentation of digital information derived after-the-fact from digital sources.
Dim Archive provides bit preservation for the content plus digital preservation planning and actions for long-term perpetual access, and also limited current access (perhaps limited to on-site users or previous subscribers post-cancellation, etc.).
DigCurV Digital Curator Vocational Education Europe. A project funded by the European Commission to establish a curriculum framework for vocational training in digital curation. http://www.digcurv.gla.ac.uk/
Digital Materials A broad term encompassing digital surrogates created as a result of converting analogue materials to digital form (digitisation), "born digital" materials for which there has never been and is never intended to be an analogue equivalent, and digital records.
Digital Preservation Refers to the series of managed activities necessary to ensure continued access to digital materials for as long as necessary. Digital preservation is defined very broadly for the purposes of this Handbook and refers to all of the actions required to maintain access to digital materials beyond the limits of media failure or technological and organisational change. Those materials may be records created during the day-to-day business of an organisation; "born-digital" materials created for a specific purpose (e.g. teaching resources); or the products of digitisation projects. This Handbook specifically excludes the potential use of digital technology to preserve the original artefacts through digitisation. See also Digitisation definition below.
- Short-term preservation - Access to digital materials either for a defined period of time while use is predicted, but not extending beyond the foreseeable future, and/or until they become inaccessible because of changes in technology.
- Medium-term preservation - Continued access to digital materials beyond changes in technology for a defined period of time but not indefinitely.
- Long-term preservation - Continued access to digital materials, or at least to the information contained in them, indefinitely.
Digital Preservation Management Workshop and Tutorial An intensive training workshop and online tutorial developed and maintained by Cornell University Library, 2003-2006; extended and maintained by ICPSR, 2007-2012; and now extended and maintained by MIT Libraries, 2012 onwards. http://www.dpworkshop.org/dpm-eng/eng_index.html
Digital Publications "Born digital" objects which have been released for public access and either made available or distributed free of charge or for a fee. They may consist of networked publications, available over a communications network or physical format publications which are distributed on formats such as floppy or optical disks. They may also be either static or dynamic.
Digital Records See Electronic Records
Digital Resources See Digital Materials
Digitisation The process of creating digital files by scanning or otherwise converting analogue materials. The resulting digital copy, or digital surrogate, is then classed as digital material and is subject to the same broad challenges of preserving access as "born digital" materials.
DIP Dissemination Information Package. An Information Package, derived from one or more Archival Information Packages (AIPs), and sent by Archives to the Consumer in response to a request to the OAIS (OAIS term).
DLF Digital Library Federation. A US based organisation active in digital preservation. http://www.diglib.org
Documentation The information provided by a creator and the repository that is sufficient to establish a resource's provenance, history and context, and to enable its use by others. See also Metadata.
DOI Digital Object Identifier. A technical and organisational infrastructure for the registration and use of persistent identifiers widely used in digital publications and for research data. The DOI system was created by the International DOI Foundation and was adopted as International Standard ISO 26324 in 2012. http://www.doi.org
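A DOI name consists of a prefix beginning with "10.", a forward slash, and a suffix chosen by the registrant, and it can be resolved through the doi.org proxy. The Python sketch below (the function names are hypothetical) splits a DOI into its two parts and builds a resolver URL. It checks only this basic shape, not whether the DOI is actually registered.

```python
from urllib.parse import quote

def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI name into (prefix, suffix) at the first '/'."""
    prefix, sep, suffix = doi.strip().partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI name: {doi!r}")
    return prefix, suffix

def resolver_url(doi: str) -> str:
    """Build a URL that resolves the DOI via the doi.org proxy."""
    prefix, suffix = split_doi(doi)
    # Percent-encode characters that are unsafe in a URL path, keeping '/'.
    return "https://doi.org/" + quote(f"{prefix}/{suffix}", safe="/")

print(split_doi("10.1000/182"))     # ('10.1000', '182')
print(resolver_url("10.1000/182"))  # https://doi.org/10.1000/182
```

Because the split happens at the first slash only, suffixes that themselves contain slashes are kept intact.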
DPC Digital Preservation Coalition. A UK and Ireland based organisation active in digital preservation and responsible for the Digital Preservation Handbook. http://www.dpconline.org
DPTP Digital Preservation Training Programme, an intensive training course run by the University of London Computer Centre. http://dptp.org/
DRAMBORA Digital Repository Audit Methodology Based on Risk Assessment. A set of risk assessment tools developed by the Digital Curation Centre. http://www.dcc.ac.uk/resources/repository-audit-and-assessment/drambora
DROID A file profiling tool developed and distributed by The National Archives (TNA) to identify file formats. Based on PRONOM. http://www.nationalarchives.gov.uk/information-management/manage-information/policy-process/digital-continuity/file-profiling-tool-droid/
Electronic Records Records created digitally in the day-to-day business of the organisation and assigned formal status by the organisation. They may include for example, word processing documents, emails, databases, or intranet web pages.
Emulation A means of overcoming technological obsolescence of hardware and software by developing techniques for imitating obsolete systems on future generations of computers.
Escrow A widespread legal practice of depositing content or software source code with a third party. Escrow takes place in a contractual relationship, formalised in an escrow agreement, between at least three parties: the provider, the customer, and the third party providing the escrow service.
FIAF International Federation of Film Archives, an association of the world's leading film archives. http://www.fiafnet.org
FIAT International Federation of Television Archives, a professional association for those engaged in the preservation and exploitation of broadcast archives. http://fiatifta.org
File Format A file format is a standard way that information is encoded for storage in a computer file. It tells the computer how to display, print, process and save the information. It is dictated by the application program which created the file and the operating system under which it was created and stored. Some file formats are designed for very particular types of data; others can act as a container for different types. A particular file format is often indicated by a file name extension, typically of three or four letters, that identifies the format, although the extension is only a naming convention and does not guarantee what the file actually contains. http://en.wikipedia.org/wiki/File_format
Fixity Check a method for ensuring the integrity of a file and verifying that it has not been altered or corrupted. During transfer, an archive may run a fixity check to ensure a transmitted file has not been altered en route. Within the archive, fixity checking is used to ensure that digital files have not been altered or corrupted. It is most often accomplished by computing checksums such as MD5, SHA-1 or SHA-256 for a file and comparing them to a stored value. http://en.wikipedia.org/wiki/File_Fixity
GIF Graphics Interchange Format, a bitmap image format that uses lossless (LZW) compression but is limited to a palette of 256 colours, so converting images with more colours to GIF can lose colour information. http://en.wikipedia.org/wiki/GIF
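Tools such as DROID identify formats from the bytes inside a file rather than trusting its extension. The Python sketch below illustrates the idea with a handful of well-known signatures ("magic numbers") at the start of a file. It is a greatly simplified assumption-based example: the short signature list is hand-picked rather than taken from PRONOM, whose much richer signatures DROID actually uses, and the function name is hypothetical.

```python
# A few well-known signatures found at the very start of files.
# Illustrative only; this list is not PRONOM data.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"PK\x03\x04", "ZIP container"),
]

def identify(header: bytes) -> str:
    """Return a format name for the first bytes of a file, or 'unknown'."""
    for signature, name in SIGNATURES:
        if header.startswith(signature):
            return name
    return "unknown"

# The extension suggests a Word document, but the content is a PDF.
filename, header = "report.doc", b"%PDF-1.4\n"
print(filename, "->", identify(header))  # report.doc -> PDF
```

Note that a ZIP signature only says "container": formats such as DOCX and ODF are ZIP packages, which is why real identification tools look deeper than the first few bytes.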
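A fixity check against stored values can be sketched in Python as follows. The in-memory manifest (a mapping from relative file path to a SHA-256 value recorded earlier, for example at ingest) and the function names are hypothetical; a repository would keep such values in its own systems or in manifest files.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA-256 checksum, reading in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def fixity_check(manifest: dict[str, str], root: Path) -> dict[str, str]:
    """Compare each file's current checksum with its stored value.

    Returns "ok", "altered" or "missing" for every path in the manifest.
    """
    results = {}
    for relative_path, stored in manifest.items():
        path = root / relative_path
        if not path.is_file():
            results[relative_path] = "missing"
        elif sha256_of_file(path) == stored:
            results[relative_path] = "ok"
        else:
            results[relative_path] = "altered"
    return results

# Usage: record a checksum, then check before and after a change.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "report.txt").write_bytes(b"original content")
    manifest = {"report.txt": sha256_of_file(root / "report.txt"),
                "gone.txt": "0" * 64}
    before = fixity_check(manifest, root)
    (root / "report.txt").write_bytes(b"changed content")
    after = fixity_check(manifest, root)

print(before)  # {'report.txt': 'ok', 'gone.txt': 'missing'}
print(after)   # {'report.txt': 'altered', 'gone.txt': 'missing'}
```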
Gigabyte (GB) A unit of digital information often used to describe data or data storage size, equates to approximately 1,000 Megabytes (MB).
GIS Geographical Information System, a system that processes mapping and data together.
HTML Hypertext Markup Language, a format used to present text and other information on the World Wide Web. Since 1996, versions of the HTML specification have been maintained by the World Wide Web Consortium (W3C). http://en.wikipedia.org/wiki/HTML
IASA International Association of Sound and Audiovisual Archives, an association for archives that preserve recorded sound and audiovisual documents. http://www.iasa-web.org
IIPC The International Internet Preservation Consortium. http://www.netpreserve.org
Information Assurance An aspect of digital security, specifically directed at ensuring that the quality of the information is demonstrably safeguarded, that it has not been tampered with or accessed inappropriately.
Ingest the process of turning a Submission Information Package (SIP) into an Archival Information Package (AIP), i.e. putting data into a digital archive (OAIS term).
InterPARES project International Research on Permanent Authentic Records in Electronic Systems. http://www.interpares.org
ISO International Organization for Standardization. http://www.iso.org/iso/home.html
JHOVE2 A characterisation tool for digital objects. Characterisation comprises four elements: identifying the object's format; validating that the object conforms to its format's technical norms; extracting technical metadata from the object; and assessing whether the object should be accepted into a repository, based on policies set by the curator. https://bitbucket.org/jhove2/main/wiki/Home
JPEG Joint Photographic Experts Group, a committee that oversees international standards for compression and processing of digital photographs. The majority of JPEG formats are lossy. http://www.jpeg.org/
JPEG 2000 a successor to the JPEG standard, based on wavelet compression, which supports both lossless and lossy compression.
Kilobyte (KB) A unit of digital information often used to describe data or data storage size, equates to approximately 1,000 bytes (B).
Life-cycle Management Records management practices have established life-cycle management for many years, for both paper and electronic records. The major implication for the life-cycle management of digital resources, whatever their form or function, is the need actively to manage the resource at each stage of its life-cycle, to recognise the inter-dependencies between stages, and to commence preservation activities as early as practicable. This represents a major difference from most traditional preservation, where management is largely passive until detailed conservation work is required, typically many years after creation and rarely, if ever, involving the creator. There is an active and inter-linked life-cycle to digital resources which has prompted many to promote the term "continuum" to distinguish it from the more traditional and linear flow of the life-cycle for traditional analogue materials. We have used the term life-cycle to apply to this pro-active concept of preservation management for digital materials.
Lossless Compression A mechanism for reducing file sizes that retains all original data.
Lossy Compression A mechanism for reducing file sizes that typically discards data.
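The difference between the two compression entries can be seen in a short Python sketch. The lossless half uses the standard-library zlib module (DEFLATE), whose output decompresses to exactly the original bytes. The lossy half is only a toy example chosen for illustration: it discards the low four bits of each 8-bit sample, information that cannot be recovered. Real lossy codecs such as JPEG are far more sophisticated.

```python
import zlib

# Lossless: zlib (DEFLATE) compression; decompression restores every byte.
original = b"digital preservation " * 100  # repetitive sample data, 2,100 bytes
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
print(len(original), len(compressed), restored == original)

# Lossy (toy example): keep only the high four bits of each 8-bit sample.
samples = bytes([13, 200, 201, 77, 78, 255])
quantised = bytes(b & 0xF0 for b in samples)
print(list(quantised))       # [0, 192, 192, 64, 64, 240]
print(quantised == samples)  # False: the discarded bits cannot be recovered
```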
LOTAR (LOng Term Archiving and Retrieval) a digital preservation standard for 3D CAD models and product data management information developed by LOTAR International, an industrial consortium of aerospace and defence companies from the US and Europe. http://www.lotar-international.org
Megabyte (MB) A unit of digital information often used to describe data or data storage size, equates to approximately 1,000 Kilobytes (KB).
Metadata Information which describes significant aspects of a resource. Most discussion to date has tended to emphasise metadata for the purposes of resource discovery. The emphasis in this Handbook is on what metadata are required successfully to manage and preserve digital materials over time and which will assist in ensuring essential contextual, historical, and technical information are preserved along with the digital object. The PREMIS Data Dictionary for Preservation Metadata has become a key de facto standard in digital preservation.
Migration A means of overcoming technological obsolescence by transferring digital resources from one hardware/software generation to the next. The purpose of migration is to preserve the intellectual content of digital objects and to retain the ability for clients to retrieve, display, and otherwise use them in the face of constantly changing technology. Migration differs from the refreshing of storage media in that it is not always possible to make an exact digital copy or replicate original features and appearance and still maintain the compatibility of the resource with the new generation of technology.
MIME Multipurpose Internet Mail Extensions. A protocol for including non-ASCII information in email messages. Email software typically includes interpreters that convert MIME content to and from its native format, as necessary. http://en.wikipedia.org/wiki/MIME
MPEG Moving Picture Experts Group. A committee responsible for the development of international standards for compression, decompression, processing, and coded representation of moving pictures, audio and their combination. http://www.mpeg.org
NCDD The Netherlands Coalition for Digital Preservation. http://www.ncdd.nl/en/
NDSA National Digital Stewardship Alliance, a US based organisation active in digital preservation. http://www.digitalpreservation.gov/ndsa/
NESTOR The German competence network for digital preservation. http://www.langzeitarchivierung.de/Subsites/nestor/EN/Home/home_node.html/
Open Archival Information System (OAIS) An Archive, consisting of an organization, which may be part of a larger organization, of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community. It meets a set of responsibilities, as defined in section 4 of the OAIS standard, that allows an OAIS Archive to be distinguished from other uses of the term ‘Archive’. The term ‘Open’ in OAIS is used to imply that the OAIS standards are developed in open forums, and it does not imply that access to the Archive is unrestricted. The OAIS abbreviation is also commonly used to refer to the Open Archival Information System reference model standard which defined the term. The standard is a conceptual framework describing the environment, functional components, and information objects associated with a system responsible for the long-term preservation of information. As a reference model, its primary purpose is to provide a common set of concepts and definitions that can assist discussion across sectors and professional groups and facilitate the specification of archives and digital preservation systems. It has a very basic set of conformance requirements that should be seen as minimalist. OAIS was first approved as ISO Standard 14721 in 2002 and a second edition was published in 2012. Although produced under the leadership of the Consultative Committee for Space Data Systems (CCSDS), it had major input from libraries and archives.
OPF Open Preservation Foundation, formerly the Open Planets Foundation. http://openpreservation.org
PAIMAS Space Data and Information Transfer Systems - Producer-Archive Interface - Methodology Abstract Standard. This ISO 20652:2006 standard covers the first stages of the ingest process defined by OAIS reference model. http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=39577
PDF Portable Document Format, a set of formats and open standards for producing and sharing electronic documents, originally developed by Adobe Systems. The original page description format has been elaborated over successive versions to enable the embedding of complex objects such as image, audio and moving image files, hyperlinks, embedded XML metadata, and updatable forms. Specifications for the various versions and profiles of the format are now maintained by the International Organization for Standardization (ISO). http://www.adobe.com/uk/products/acrobat/adobepdf.html
PDF/A Versions of the PDF standard intended for archival use. http://www.aiim.org/Research-and-Publications/Standards/Committees/PDFA
PDI Preservation Description Information. The information which is necessary for adequate preservation of the Content Information and which can be categorized as Provenance, Reference, Fixity, Context, and Access Rights Information (OAIS term).
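How the OAIS terms SIP, Ingest, AIP and PDI fit together can be sketched in Python. This is a deliberate simplification under stated assumptions: the class, field and function names are hypothetical, Content Information is reduced to raw bytes (in OAIS it also includes Representation Information), and each of the five PDI categories is held as a plain value.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class SIP:
    """Submission Information Package as delivered by a Producer (simplified)."""
    content: bytes
    producer: str
    description: str

@dataclass
class PDI:
    """Preservation Description Information, one field per OAIS category."""
    provenance: list[str] = field(default_factory=list)
    reference: str = ""
    fixity: str = ""
    context: str = ""
    access_rights: str = ""

@dataclass
class AIP:
    """Archival Information Package: Content Information plus its PDI."""
    content: bytes
    pdi: PDI

def ingest(sip: SIP, identifier: str, rights: str) -> AIP:
    """Turn a SIP into an AIP by recording PDI alongside the content."""
    pdi = PDI(
        provenance=[f"Submitted by {sip.producer}"],
        reference=identifier,
        fixity="sha256:" + hashlib.sha256(sip.content).hexdigest(),
        context=sip.description,
        access_rights=rights,
    )
    return AIP(content=sip.content, pdi=pdi)

sip = SIP(content=b"id,value\n1,42\n", producer="Example Producer",
          description="Survey dataset")
aip = ingest(sip, identifier="hypothetical-id-0001", rights="Open access")
print(aip.pdi.reference, aip.pdi.fixity[:15])
```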
Perpetual Access see Continuing Access.
Petabyte (PB) A unit of digital information often used to describe data or data storage size, equates to approximately 1,000 Terabytes (TB).
PIN Pérennisation des Informations Numériques, the French national interest group for digital preservation. http://pin.association-aristote.fr/doku.php
Post-cancellation Access see Continuing Access.
PREMIS Preservation Metadata: Implementation Strategies. A de facto standard for digital preservation metadata. http://www.loc.gov/standards/premis/
PRONOM A database of file formats, software products and other technical components required to support long-term access to electronic records and other digital objects of cultural, historical or business value. Used with DROID. http://apps.nationalarchives.gov.uk/PRONOM/Default.aspx
PST Personal Storage Table, a file format (with the .pst extension) for local 'personal stores' written by the program Microsoft Outlook. PST files contain email messages and calendar entries in a proprietary but publicly documented format, and they may be found on the local or networked drives of email end users. Several tools can read and migrate PST files to other formats. http://en.wikipedia.org/wiki/Personal_Storage_Table
Reformatting Copying information content from one storage medium to a different storage medium (media reformatting) or converting from one file format to a different file format (file re-formatting).
Refreshing Copying information content from one storage medium to another medium of the same type.
Sandbox Containment A secure computing environment for running novel, unattested or experimental code or changes in code, including potentially malicious code. The environment is self-contained with tightly controlled resources and is characteristically virtual.
SGML Standard Generalized Markup Language, an ISO standard for how to specify a document markup language or tag set. http://en.wikipedia.org/wiki/Standard_Generalized_Markup_Language
Significant properties Characteristics of digital and intellectual objects that must be preserved over time in order to ensure the continued accessibility, usability and meaning of the objects and their capacity to be accepted as (evidence of) what they purport to be. http://www.significantproperties.org.uk
SIP Submission Information Package. An Information Package that is delivered by the Producer to the OAIS for use in the construction or update of one or more Archival Information Packages (AIPs) and/or the associated Descriptive Information (OAIS term).
SMPTE Society of Motion Picture and Television Engineers, a professional organisation and technical standards body for television and motion picture. https://www.smpte.org
TDR Trusted Digital Repository. A trusted digital repository has been defined as having “a mission to provide reliable, long-term access to managed digital resources to its designated community, now and into the future”. The TDR must include the following seven attributes: compliance with the reference model for an Open Archival Information System (OAIS), administrative responsibility, organizational viability, financial sustainability, technological and procedural suitability, system security, and procedural accountability. The concept has been an important one particularly in relation to certification of digital repositories.
Terabyte (TB) A unit of digital information often used to describe data or data storage size, equates to approximately 1,000 Gigabytes (GB).
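The unit entries in this glossary each give a decimal value (multiples of 1,000); they say "approximately" because the same abbreviations are also widely used for multiples of 1,024. This Python sketch (the function name is hypothetical) formats a byte count either way, using the IEC binary prefixes KiB, MiB and so on for the 1,024-based units.

```python
def human_readable(num_bytes: int, binary: bool = False) -> str:
    """Express a byte count in the largest convenient unit.

    Decimal units step by 1,000 (KB, MB, GB, TB, PB); binary units
    step by 1,024 (KiB, MiB, GiB, TiB, PiB).
    """
    base = 1024 if binary else 1000
    if binary:
        units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    else:
        units = ["B", "KB", "MB", "GB", "TB", "PB"]
    value = float(num_bytes)
    unit_index = 0
    # Divide until the value is below one step or the largest unit is reached.
    while value >= base and unit_index < len(units) - 1:
        value /= base
        unit_index += 1
    return f"{value:.2f} {units[unit_index]}"

print(human_readable(1_500_000_000))               # 1.50 GB
print(human_readable(1_500_000_000, binary=True))  # 1.40 GiB
```

The same 1.5 billion bytes thus appear as 1.50 GB or 1.40 GiB depending on the convention, which is why storage sizes reported by different systems can disagree.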
Three-Legged Stool A conceptual approach to digital preservation that suggests a fully implemented and viable preservation programme addresses organisational issues, technological concerns, and funding questions, balancing them like a three-legged stool. Developed as part of the Digital Preservation Management Workshop and Tutorial.
TIFF Tagged Image File Format, a common image format, typically uncompressed or losslessly compressed. http://en.wikipedia.org/wiki/Tagged_Image_File_Format
TRAC Trusted Repository Audit and Certification, toolkit for auditing a digital repository. http://www.crl.edu/sites/default/files/d6/attachments/pages/trac_0.pdf
Trigger Event A term used when specific conditions relating to an electronic publication and its continued delivery to users are met. If the publication is no longer available to users from the publisher or any other source, for any of a variety of reasons, a trigger event is said to have occurred. A trigger event can initiate access for users via an archive where the electronic publication may be digitally preserved.
UKWA UK Web Archive. http://www.webarchive.org.uk/ukwa/
WARC The WARC (Web ARChive) format is a container format for archived websites, also known as ISO 28500:2009. It is a revision of the Internet Archive's ARC File Format used to store web crawls harvested from the World Wide Web. http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=44717
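To show what the WARC container looks like, the Python sketch below builds one minimal, uncompressed WARC/1.0 record by hand: a version line, named header fields, a blank line, the content block, and two CRLF sequences ending the record. It is a simplified illustration, not a conforming writer: it omits fields such as WARC-Payload-Digest, does not apply the per-record gzip compression commonly used for .warc.gz files, and the function name and example URI are hypothetical. Production web archiving relies on established tools, for example the Python warcio library.

```python
import uuid
from datetime import datetime, timezone

def warc_resource_record(target_uri: str, content: bytes, content_type: str) -> bytes:
    """Build a single uncompressed WARC/1.0 'resource' record (simplified sketch)."""
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(content))),
    ]
    # Version line, one "Name: value" line per field, then a blank line.
    head = "WARC/1.0\r\n" + "".join(f"{name}: {value}\r\n" for name, value in headers) + "\r\n"
    # The content block is followed by two CRLF sequences that end the record.
    return head.encode("utf-8") + content + b"\r\n\r\n"

record = warc_resource_record("http://example.com/", b"<html>hello</html>", "text/html")
print(record.decode("utf-8"))
```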
WAV Waveform Audio File Format, a widely used file wrapper for audio; see BWF (Broadcast Wave Format) for the professional variant. http://en.wikipedia.org/wiki/WAV
Writeblockers Tools that prevent an examination computer system from writing or altering a collection or subject hard drive or other digital media object. Hardware writeblockers are generally regarded as more reliable than software writeblockers.
XML Extensible Markup Language, a widely used standard (derived from SGML), for representing structured information, including documents, data, configuration, books, and transactions. It is maintained by the World Wide Web Consortium (W3C). http://www.w3.org/XML/