Evelyn McLellan is President of Artefactual Systems and a member of the PREMIS Editorial Committee.
It has been a busy couple of years for the PREMIS Editorial Committee. Since June 2015, when we released version 3.0 of the PREMIS Data Dictionary, we have been updating and releasing supporting documentation, such as the revised Guidelines for using PREMIS with METS and Understanding PREMIS, and enhancing the preservation vocabularies, particularly the eventType vocabulary.
Perhaps the biggest undertaking, however, has been the preparation of a new OWL ontology by a working group that includes some members of the Editorial Committee plus external Linked Data experts and preservation practitioners. This is a work in progress and we are hoping to release a draft soon for a period of public review and feedback.
The new ontology is not just an update of the previous version but rather a re-imagining of how to express preservation metadata as Linked Data. Where the old ontology is essentially a one-to-one mapping between semantic units in the Data Dictionary and classes or properties, the new one draws liberally from other ontologies, linking PREMIS entities and concepts to external classes and properties and simply reusing those external classes and properties whenever possible. The result is a relatively lightweight ontology which avoids duplication with other efforts and which can easily be extended by implementers.
A case in point is file format, one of the essential pieces of information a repository needs to capture about an ingested digital object. The old ontology created Format, FormatDesignation and FormatRegistry classes and added nine properties, all of which, taken together, faithfully correspond to semantic units in the Data Dictionary but ignore external classes and properties which might also fit the bill. In the new draft ontology, format is expressed through a dct:format property pointing to an instance of the dct:FileFormat class, with skos:closeMatch and skos:exactMatch pointing to registries such as PRONOM in place of the previously used hasFormatRegistry, hasFormatRegistryKey and hasFormatRegistryName. The example below shows the use of these external terms. Note the addition of an EBUCore property to establish mimetype, something that is not explicitly supported by a semantic unit in the Data Dictionary; implementers can make their own decisions about additional terms to incorporate from established ontologies to flesh out their preservation metadata profiles.
<file1> a premis:File ;
    ebucore:hasMimeType "application/pdf" ;
    dct:format <pdfa1bformat> .

<pdfa1bformat> a dct:FileFormat ;
    foaf:name "Acrobat PDF/A - Portable Document Format" ;
    premis:hasVersion "1b" ;
    skos:exactMatch <http://www.nationalarchives.gov.uk/pronom/fmt/354> ;
    skos:closeMatch <https://www.loc.gov/preservation/digital/formats/fdd/fdd000318.shtml> .
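For implementers who generate this kind of description programmatically, the pattern can be sketched in a few lines of Python. This is only an illustration, not part of the draft ontology: the function name, its parameters and the choice to treat a PRONOM identifier as an exact match are all assumptions made for the example, and the prefixes (dct, foaf, premis, skos) are assumed to be declared elsewhere in the output file.

```python
# Illustrative sketch only: format_to_turtle and its parameters are
# hypothetical names, not part of PREMIS or of any library.
PRONOM_BASE = "http://www.nationalarchives.gov.uk/pronom/"


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def format_to_turtle(subject, name, version, pronom_puid=None, close_matches=()):
    """Return Turtle text describing one dct:FileFormat node.

    Assumes the dct, foaf, premis and skos prefixes are declared elsewhere.
    """
    statements = [
        "a dct:FileFormat",
        f'foaf:name "{escape_literal(name)}"',
        f'premis:hasVersion "{escape_literal(version)}"',
    ]
    if pronom_puid:
        # Assumption: a PRONOM identifier such as "fmt/354" is an exact match.
        statements.append(f"skos:exactMatch <{PRONOM_BASE}{pronom_puid}>")
    for uri in close_matches:
        # Looser registry links are recorded as close matches.
        statements.append(f"skos:closeMatch <{uri}>")
    return f"<{subject}> " + " ;\n    ".join(statements) + " ."


print(format_to_turtle(
    "pdfa1bformat",
    "Acrobat PDF/A - Portable Document Format",
    "1b",
    pronom_puid="fmt/354",
    close_matches=[
        "https://www.loc.gov/preservation/digital/formats/fdd/fdd000318.shtml",
    ],
))
```

Whether a given registry entry deserves skos:exactMatch or skos:closeMatch is a judgement for the implementer; the sketch simply makes that choice visible as two separate arguments.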
Of course, this being Linked Data, wherever possible the ontology uses URIs to incorporate terms in the preservation vocabularies hosted at LC’s Linked Data Service for Authorities and Vocabularies. For example, the terms in the eventType vocabulary are used to declare sub-classes of Event. The relationships between the ontology and the vocabularies need to be made explicit, and we have identified changes needed in those vocabularies, including new vocabularies and new terms in existing ones. When we release the draft ontology we will be making recommendations to enhance and extend them. For example, we propose creating a new eventOutcome vocabulary with terms such as pass, fail, positive, tentative, etc., to allow users to capture values as URIs rather than literals (plain text).
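To make the difference between literals and URIs concrete, here is a minimal Python sketch of how a repository might normalise free-text outcomes into vocabulary terms. The eventOutcome vocabulary is only a proposal at this stage, so the base URI below is a placeholder under example.org, and the function name and the set of terms (limited to the four mentioned above) are assumptions for illustration.

```python
# Illustrative sketch only. The base URI is a placeholder: the eventOutcome
# vocabulary is proposed, not published, so no real URI is assumed here.
EVENT_OUTCOME_BASE = "http://example.org/vocabulary/eventOutcome/"

# Only the four terms named in the proposal; a real vocabulary may differ.
KNOWN_OUTCOMES = {"pass", "fail", "positive", "tentative"}


def outcome_to_uri(literal):
    """Map a plain-text outcome such as ' Pass ' to a vocabulary URI."""
    term = literal.strip().lower()
    if term not in KNOWN_OUTCOMES:
        # Refuse to mint URIs for terms the vocabulary does not define.
        raise ValueError(f"unknown event outcome: {literal!r}")
    return EVENT_OUTCOME_BASE + term


print(outcome_to_uri("Pass"))
```

Rejecting unknown values rather than passing them through is a design choice in this sketch: it keeps misspelled or local terms from silently becoming URIs that no vocabulary defines.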
Linked Data is complicated and writing the ontology has been challenging, to say the least (Rights nearly killed us), so we plan to help PREMIS implementers understand and use the new ontology with diagrams, examples, slides and other supporting resources. We also plan to deliver a series of webinars that provide the opportunity for live questions and feedback.
The ontology work has afforded the working group a fresh view of PREMIS. The group has stepped beyond the boundaries of the Data Dictionary and looked at the wider metadata environment in which it sits. The benefits of linking to standards and vocabularies, and of taking advantage of accumulated expertise from other areas, are considerable. We believe this linked data version of PREMIS is a strong step forward for the digital preservation community and offers a holistic view of how best to manage digital assets in the long term.
We hope the community will actively engage in learning about the ontology and making suggestions for changes or improvements. To help fulfill William Kilbride’s call to ensure that the work we do “remains geared to the needs of the growing and diverse community”, we will be undertaking consultation before the final version is minted. We would love your feedback and comments.
Our thanks to the members of the ontology working group who have tirelessly taken on a significant piece of work. PREMIS is a community standard and lives only because of the community. You can get involved through the listserv [pig@loc.gov] or email the Editorial Committee [Peter.McKinney@dia.govt.nz].