On Friday last week, our latest #DPClinic chat delved into the topic of persistent identifiers (PIDs). As I remarked at the start of the session, persistent identifiers pop up as an example of accepted good practice in DPC RAM, our Rapid Assessment Model. They are mentioned at the managed level of the metadata section with the example “Persistent unique identifiers are assigned and maintained for digital content.”
Whilst this is something we consider to be good practice and encourage people to do, we don’t actually tell people how, and we understand that applying persistent identifiers can be a genuine hurdle or sticking point for some. From my (admittedly fairly limited) understanding of the state of play, it appears to be a complicated landscape and, as with most areas of digital preservation, there are many different ways of approaching the challenge.
Questions about applying and using persistent identifiers for digital content were raised at our RAM Jam event in December last year, so we decided it might be a good topic for a longer discussion at one of our #DPClinic events.
It was encouraging to see such a good mix of people attending the event and to be part of a lively and interesting discussion. We covered a lot of ground in 45 minutes, including the following topics and questions:
- There is a whole range of types of persistent identifier out there and this is perhaps a confusing landscape for those who are new to this area. Our discussions touched on many of them - DOI, ARK, ROR, handles and more. But how do you decide which type to use? It appears to depend on the organisational context, the purpose and the existing infrastructure.
- At what level of granularity should you assign persistent identifiers? To every file, every version, or to the top-level intellectual entity? Again, it seems that practices differ between the domains in which persistent identifiers are used.
- What if the content you are assigning identifiers to isn’t fixed (for example if changes are made or different versions are produced after the identifier has been assigned)? This was noted as a headache for more than one participant.
- What to do when you have to migrate systems? How to ensure persistent identifiers remain persistent and how to set up redirects? Is the solution sustainable?
- The technical infrastructure for minting and resolving persistent identifiers - the pros and cons of doing this in-house or through an existing service.
- Who archives the persistent identifiers and associated metadata themselves? Can we be sure this is being appropriately looked after? In some cases yes, but do we know if this is the case across the board?
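To make the migration question above a little more concrete, here is a minimal sketch in Python of the general idea behind a resolver: the identifier stays fixed, and only the location it points to changes when content moves between systems. This was not presented in the session and does not describe any particular PID service; the `PidResolver` class, the identifier string and the example.org URLs are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PidResolver:
    """Toy in-memory resolver (hypothetical, for illustration only).

    Maps a stable identifier to the current location of the content.
    """

    _targets: Dict[str, str] = field(default_factory=dict)

    def mint(self, pid: str, url: str) -> None:
        """Assign a new identifier; refuse to reuse an existing one."""
        if pid in self._targets:
            raise ValueError(f"{pid} has already been assigned")
        self._targets[pid] = url

    def retarget(self, pid: str, new_url: str) -> None:
        """Point an existing identifier at a new location, e.g. after migration."""
        if pid not in self._targets:
            raise KeyError(pid)
        self._targets[pid] = new_url

    def resolve(self, pid: str) -> str:
        """Return the current location for an identifier."""
        return self._targets[pid]


if __name__ == "__main__":
    resolver = PidResolver()
    # Hypothetical identifier and URLs, not real services.
    resolver.mint("example-pid-0001", "https://old-repository.example.org/items/42")
    # After a system migration only the mapping changes, not the identifier.
    resolver.retarget("example-pid-0001", "https://new-repository.example.org/objects/abc")
    print(resolver.resolve("example-pid-0001"))
```

In practice this mapping would live in a database or an external resolving service rather than in memory, which is where the questions above about sustainability, and about who looks after the identifier records themselves, come in.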
Several useful links were shared throughout the conversation and these are copied below for reference:
- The Towards a National Collection research programme in the UK has been looking at PIDs for heritage collections - read about the project here, and in particular, there is lots of useful information here.
- The University of Westminster has been working in this area and provides information for their own researchers here, and a recent blog post on the DPC blog from Holly Ranger discusses ongoing work to investigate the use of PIDs for practice-based research.
- The Digital Preservation Handbook was mentioned as a good place to find a basic introduction to the topic.
- The National Gallery (UK) was mentioned as an interesting case study. They are starting to use a middleware system called CIIM to mint PIDs for their collection items and connect across several systems. This approach is summarised in an article.
Conversation turned to what the DPC can do to help members make progress in this area, and we have some ideas to consider. Thanks to everyone who came along and shared their thoughts and questions in this session. Hope to see you again next time!