Illustration by Jørgen Stamp digitalbevaring.dk CC BY 2.5 Denmark
Introduction
This section provides guidance on the use of persistent identifiers for digital objects and digital preservation. Other types of persistent identifier schemes exist e.g. for individuals or institutions.
A persistent identifier is a long-lasting reference to a digital resource. Typically it has two components: a unique identifier, and a service that locates the resource over time even when its location changes. The first helps to ensure the provenance of a digital resource (that it is what it purports to be), whilst the second ensures that the identifier resolves to the correct current location.
Persistent identifiers thus aim to solve the problem of persistent access to cited resources, particularly in the academic literature. All too often, web addresses (links) fail to take you to the resource you expect. This can be for technological reasons such as server failure, but human-created failures are more common. Organisations transfer journals to new publishers, reorganise their websites, or lose interest in older content, leading to broken links when you try to access a resource. This is frustrating for users, but the consequences can be serious if the linked resource is essential for legal, medical or scientific reasons.
Persistent identifiers can also be used 'behind-the-scenes' within a repository to manage some of the challenges in cataloguing and describing, or providing intellectual control and access to born-digital materials.
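The two components described above, a unique identifier and a service that tracks the resource's current location, can be pictured as a simple lookup table. The following is a minimal sketch only; the class name Resolver, the identifier "example:report-001" and the example.org URLs are all hypothetical and do not correspond to any real scheme.

```python
class Resolver:
    """Toy resolution service: maps stable identifiers to current locations."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, url):
        # An identifier is assigned once and never reused for another resource.
        if identifier in self._locations:
            raise ValueError(f"identifier already assigned: {identifier}")
        self._locations[identifier] = url

    def move(self, identifier, new_url):
        # Only the stored location changes; the identifier stays the same.
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_url

    def resolve(self, identifier):
        return self._locations[identifier]


resolver = Resolver()
resolver.register("example:report-001", "https://old.example.org/reports/1.pdf")
resolver.move("example:report-001", "https://new.example.org/archive/report-001.pdf")
print(resolver.resolve("example:report-001"))
```

A citation that quotes the identifier keeps working after the move, but only because someone updated the table. That maintenance step is the human service element that every scheme depends on.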
Schemes
Since the problem of identifier persistence is created by humans, the solution also has to involve people and services, not just technologies. There are several persistent identifier schemes, and all require a human service element to maintain their resolution systems. The main persistent identifier schemes currently in use are detailed below.
Digital Object Identifier (DOI)
DOIs are digital identifiers for objects (whether digital, physical or abstract) which can be assigned by organisations that are members of one of the DOI Registration Agencies; the two best known are CrossRef, for journal articles and some other scholarly publications, and DataCite, for a wide range of data objects. As well as the object identifier, DOI has a system infrastructure to ensure a URL resolves to the correct location for that object.
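As an illustration of how a DOI is turned into a resolvable URL, the sketch below normalises a few common ways of writing a DOI and prefixes the result with the public resolver at https://doi.org/. The function names normalise_doi and doi_to_url are hypothetical helpers, and the validity check is deliberately loose (it only looks for the "10." prefix and a slash), so it should not be read as a full DOI syntax validator. The example DOI is the one cited in the Resources list of this section.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver

# Prefixes that are often found in front of a bare DOI name.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(value):
    """Strip a leading resolver URL or 'doi:' label and return the bare DOI name."""
    doi = value.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Loose sanity check only: DOI names start with "10." and contain a slash.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI name: {value!r}")
    return doi


def doi_to_url(value):
    """Build a resolver URL, percent-encoding characters that are unsafe in a URL path."""
    return DOI_RESOLVER + quote(normalise_doi(value), safe="/")


print(doi_to_url("http://dx.doi.org/10.7207/twr13-04"))
# https://doi.org/10.7207/twr13-04
```

Requesting the resulting URL is what triggers the redirect to the object's current location; the sketch stops short of making that network request.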
Handle
Handles are unique and persistent identifiers for Internet resources, with a central registry to resolve URLs to the current location. Each Handle identifies a single resource, and the organisation which created or now maintains the resource. The Handle system also underpins the technical infrastructure of DOIs, which are a special type of Handle.
Archival Resource Key (ARK)
ARK is an identifier scheme conceived by the California Digital Library (CDL), aiming to identify objects in a persistent way. The scheme was designed on the basis that persistence "is purely a matter of service and is neither inherent in an object nor conferred on it by a particular naming syntax".
Persistent Uniform Resource Locator (PURL)
PURLs are URLs which redirect to the location of the requested web resource using standard HTTP status codes. A PURL is thus a permanent web address which contains the command to redirect to another page, one which can change over time.
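The redirect behaviour described above can be sketched without a real web server. In this hedged example the redirect table, the path "/docs/handbook" and the example.org targets are hypothetical, and the choice of status 302 (a temporary redirect) is an assumption for illustration; an actual PURL service may use other redirect codes.

```python
# Hypothetical redirect table: PURL path -> current target URL.
PURL_TABLE = {
    "/docs/handbook": "https://www.example.org/2015/handbook.html",
}


def purl_response(path, table=PURL_TABLE):
    """Return (status_code, headers) for a request to a PURL path."""
    target = table.get(path)
    if target is None:
        # Unknown PURL: nothing to redirect to.
        return 404, {}
    # Known PURL: tell the client where the resource currently lives.
    return 302, {"Location": target}


status, headers = purl_response("/docs/handbook")
print(status, headers["Location"])

# When the resource moves, only the table entry is edited; the PURL is unchanged.
PURL_TABLE["/docs/handbook"] = "https://archive.example.org/handbook/"
status, headers = purl_response("/docs/handbook")
print(status, headers["Location"])
```

The published address never changes; the maintainer's only task is to keep the target in the table up to date.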
Uniform Resource Name (URN)
URNs are persistent, location-independent identifiers, allowing the simple mapping of namespaces into a single URN namespace. The existence of such a Uniform Resource Identifier does not imply availability of the identified resource, but such URIs are required to remain globally unique and persistent, even when the resource ceases to exist or becomes unavailable. The URN term is now deprecated except in the very narrow sense of a formal namespace for expressing a Uniform Resource Identifier.
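A URN has the general shape urn:NID:NSS, where NID is the namespace identifier and NSS is the namespace-specific string. The sketch below splits a URN into those two parts. It is a simplification: the pattern checks only the basic NID form (2 to 32 letters, digits or hyphens, not starting or ending with a hyphen) and accepts any non-empty NSS, so it does not implement the full syntax of the URN specification. The function name parse_urn is a hypothetical helper.

```python
import re

# Simplified pattern: "urn:" + namespace identifier (NID) + ":" + namespace-specific string (NSS).
URN_PATTERN = re.compile(
    r"^urn:(?P<nid>[A-Za-z0-9][A-Za-z0-9-]{0,30}[A-Za-z0-9]):(?P<nss>.+)$",
    re.IGNORECASE,
)


def parse_urn(urn):
    """Split a URN into (nid, nss); the NID is case-insensitive, so it is lowercased."""
    match = URN_PATTERN.match(urn)
    if match is None:
        raise ValueError(f"not a URN: {urn!r}")
    return match.group("nid").lower(), match.group("nss")


print(parse_urn("URN:ISBN:0451450523"))
# ('isbn', '0451450523')
```

Parsing a URN says nothing about where the resource is; as the text notes, a URN can remain valid and unique even when the resource it names is unavailable.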
Choosing a Persistent Identifier Scheme
There needs to be a social contract to maintain the persistence of the resolution service - either by the organisation hosting the digital resource, a trusted third party or a combination of the two. Each scheme has its own advantages and constraints but it is worth considering the following when deciding on a persistent identifier strategy or approach:
Advantages
- Critically important in helping to establish the authenticity of a resource.
- Provides access to a resource even if its location changes.
- Overcomes the problems caused by the impermanent nature of URLs.
- Allows interoperability between collections.
Disadvantages
- There is no single system accepted by all, though DOIs are very well established and widely deployed.
- There may be costs to establishing or using a resolver service.
- Dependence on ongoing maintenance of the persistent identifier system.
Conclusions
Persistent identifiers need to be supported by enduring services and are not just unique strings of alphanumeric characters assigned to a digital resource. They have become particularly important for research data and e-journal articles (see the content specific preservation section on e-Journals) and are a significant part of the long-term infrastructure for digital preservation of research. For link rot affecting more general web pages, and solutions that harness web archives to address it, see the content specific preservation section on Web-archiving.
Resources
Persistent identifiers - an overview. TWR Technology Watch Review
http://www.metadaten-twr.org/2010/10/13/persistent-identifiers-an-overview/
This article by Juha Hakala (2010) describes five persistent identifier systems (ARK, DOI, PURL, URN and XRI) and compares their functionality against the cool URIs. The aim is to provide an overview, not to give any kind of ranking of these systems.
Preservation, trust and continuing access for e-Journals DPC technology watch report
http://dx.doi.org/10.7207/twr13-04
This 2013 report by Neil Beagrie discusses current developments and issues which libraries, publishers, intermediaries and service providers are facing in the area of digital preservation, trust and continuing access for e-journals. It includes generic lessons and recommendations on outsourcing and trust of interest to the wider digital preservation community and covers relevant legal, economic and service issues as well as technology. (49 pages).
Persistent Identifiers in the Publication and Citation of Scientific Data
Presentation by Jens Klump, German Research Centre for Geosciences (GFZ) on the DFG STD-DOI project, which details the background and reasoning behind the foundation of DataCite. 2009. (47 pages).
DCC Briefing Paper: Persistent Identifiers
http://www.dcc.ac.uk/resources/briefing-papers/introduction-curation/persistent-identifiers
This 2006 paper by Joy Davidson discusses how progress in defining the nature and functional requirements for identifier systems is hindered by a lack of shared agreement on what identifiers should actually do; simply provide a globally or locally unique name for a digital or analogue resource, or incorporate associated services such as resolution and metadata binding. The application and maintenance of identifiers forms just one part of an overall digital preservation strategy; in order to offer any guarantees of persistence in the long or short-term they need institutional commitment and clearly defined roles and responsibilities. (2 pages)
ARK
http://www.cdlib.org/services/uc3/arkspec.pdf
CrossRef
DataCite
DOI
Handle
Perma.CC
PURL
http://archive.org/services/purl/
URN
http://tools.ietf.org/html/rfc3986
Case studies
DCC case study: Assigning digital object identifiers to research data at the University of Bristol
http://www.dcc.ac.uk/resources/persistent-identifiers
The University of Bristol runs a dedicated research data repository as part of their Research Data Service. They are using the DataCite service at the British Library to assign digital object identifiers (DOIs) to research datasets in order to provide unique and perpetual identifiers for data, to allow easy citation and discoverability. The Bristol Research Data Service provides guidance on how to use the identifiers to cite data and is developing appropriate policies to monitor usage. 2004. (4 pages).
Links that Last
http://www.dpconline.org/events/previous-events/925-links-that-last
This DPC briefing day in July 2012 introduced the topics of persistent identifiers and linked data, and discussed the practical implications of both approaches to digital preservation. It considered the viability of services that offer persistent identifiers and what these offer in the context of preservation; reviewed recent developments in linked data, considering how such data sets might be preserved; and by introducing these two parallel topics it went on to consider whether both approaches can feasibly be linked to create a new class of robust linked data. A series of presentations including case studies are linked from the provisional programme.
Comments
RFC 8141 is the revised URN syntax specification, which also describes the URN namespace registration mechanisms formerly described in RFC 3406. RFC 3986 is the URI syntax specification, which is not informative from a URN point of view.