Filling the Digital Preservation Gap is exploring how we can apply digital preservation tools to research data management. Funders now expect Higher Education Institutions to retain research data in a usable form for longer periods of time. Some funders mandate rolling retention periods: the Engineering and Physical Sciences Research Council (EPSRC), for example, states that data should be retained for ten years from the date of last access [1]. The Natural Environment Research Council (NERC) states that data from projects of major importance may need to be retained for 20 years or longer [2], and the Science and Technology Facilities Council (STFC) expects data that cannot be remeasured to be retained indefinitely [3]. When data must be retained for such long periods, preservation becomes increasingly important.
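Because a rolling retention period restarts with each access, its end date cannot be fixed at the point of deposit. The following is a minimal Python sketch of an EPSRC-style rolling rule. It assumes a simplified reading of the policy: retention runs for ten calendar years from the most recent recorded access, and the deposit date counts as the first access. The function names are hypothetical, and actual funder policies carry further conditions.

```python
from datetime import date
from typing import Sequence


def add_years(d: date, years: int) -> date:
    """Shift a date by whole calendar years.

    29 February has no counterpart in a non-leap year, so it is mapped
    to 28 February in that case.
    """
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def rolling_retention_end(access_dates: Sequence[date], years: int = 10) -> date:
    """Earliest review date under a simplified rolling retention rule.

    Retention runs for `years` from the most recent access, so every new
    access pushes the end date further into the future. Assumption: the
    deposit date is treated as the first 'access'.
    """
    if not access_dates:
        raise ValueError("at least one access (or deposit) date is required")
    return add_years(max(access_dates), years)


deposited = date(2015, 3, 1)
print(rolling_retention_end([deposited]))                     # 2025-03-01
print(rolling_retention_end([deposited, date(2019, 6, 12)]))  # 2029-06-12
```

A single request in 2019 moves the review date from 2025 to 2029. For a repository, this means access events need to be logged and kept alongside the data itself.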
[Image: The project team from the Universities of York and Hull]
Many Higher Education Institutions have been working on processes and systems to manage and provide access to research data, but few have actively used digital preservation tools to help curate the data for the longer term. The invisible and somewhat intangible nature of digital preservation means that resources tend to gravitate towards more visible and immediate areas of need, such as data deposit, storage and access.
The project team were keen to investigate Archivematica, an open source preservation system, and to assess its potential for preserving research data. A freely available tool that allows institutions to automate many digital preservation processes and activities could offer a pragmatic solution to the problem of preserving research data – particularly given how few resources most institutions have for this work. We were inspired by the concept of ‘Parsimonious Preservation’, a phrase coined by Tim Gollins [4], which suggests that sustainable steps towards digital preservation can be taken using free tools and automated processes.
In the first phase, the project team explored whether Archivematica had potential for use in this context. Both project teams installed Archivematica locally for testing, and this was complemented by wider research into its capabilities and discussion with the user community. A further strand of work investigated the nature of the research data we would be looking to preserve. A detailed phase one report summarised our findings and included an accessible set of FAQs to help inform others about the need for digital preservation and the suitability of Archivematica. We concluded that Archivematica had the potential to be used in this context. We also highlighted several areas where it did not fully meet our requirements and would benefit from further development.
A second phase of the project aimed to address some of these gaps by sponsoring the development of Archivematica in six discrete areas. We worked with Artefactual Systems (the lead developers of Archivematica) to specify our requirements and test the resulting code. The sponsored developments were designed specifically to address the nature of research data (its size and diversity) and integration with other systems (for example repository and reporting systems). We also aimed to reduce some of the barriers to uptake by improving the available documentation, specifically regarding tools for setting up automated workflows. These enhancements and resources will be made available to all Archivematica users and have value for use cases beyond research data management.
During phase two, the project team also drew up detailed implementation plans to inform work in a future phase of the project. We recognised that simply recommending Archivematica was not enough, because other practitioners would want to know exactly how we would approach the implementation. For example, how would Archivematica integrate with other systems, and how would the workflow be configured?
One of the important themes running through the project is the nature of research data and how we can use available tools and registries to identify such a diverse set of file formats. This strand of the project is relevant to many other digital preservation practitioners working with different types of data and is a key concern for our community. We have discussed with The National Archives how to extend the coverage of their technical registry (PRONOM) to include research data file formats. We have also been considering workflows within digital preservation tools such as Archivematica and how we can encourage the community to engage with this problem in a practical way.
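The following minimal Python sketch makes the identification problem concrete. It shows signature ("magic number") matching, the basic idea behind format identification tools such as DROID, which draws its signatures from PRONOM. The sketch rests on some important assumptions. The signature table is a hypothetical handful of well-known file headers, not PRONOM data. Only offset 0 is checked, and the function names are illustrative. Files that match nothing are tallied by extension, which gives one rough way to surface research data formats that a registry does not yet cover.

```python
import sys
from collections import Counter
from pathlib import Path
from typing import Iterable, List, Optional, Tuple

# Illustrative signatures only: a tiny hand-made table, NOT PRONOM data.
# Real identification tools match far more patterns, at varying offsets
# (an HDF5 superblock, for instance, may also start at offset 512, 1024, ...).
SIGNATURES: List[Tuple[bytes, str]] = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\x89HDF\r\n\x1a\n", "HDF5 (also the basis of NetCDF-4)"),
    (b"CDF\x01", "NetCDF classic"),
    (b"CDF\x02", "NetCDF 64-bit offset"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify(header: bytes) -> Optional[str]:
    """Return the name of the first signature the header starts with, else None."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def identify_file(path: Path) -> Optional[str]:
    """Read the first 16 bytes (enough for every signature above) and match them."""
    with path.open("rb") as f:
        return identify(f.read(16))


def unidentified_report(paths: Iterable[Path]) -> Counter:
    """Count unidentified files by extension: candidates for new registry entries."""
    counts: Counter = Counter()
    for p in paths:
        if identify_file(p) is None:
            counts[p.suffix.lower() or "(no extension)"] += 1
    return counts


if __name__ == "__main__":
    # Usage: pass file paths on the command line.
    print(unidentified_report(Path(arg) for arg in sys.argv[1:]))
```

Grouping unknown files by extension is a crude heuristic, since extensions can be missing or misleading. Even so, it quickly produces a shortlist of formats worth investigating or proposing for inclusion in a registry.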
A key element of this project has been advocacy and awareness raising. We have been keen to talk to others about the project and disseminate our findings as widely as possible. Alongside the publication of our reports at the end of each phase of work, we have also spread the word through a number of different channels.
The project team have presented at several conferences and meetings. To reach a wide range of people, we have chosen events aimed at a variety of audiences, including the digital preservation community, archivists, research data managers, librarians and those working with particular technologies.
We have also maintained a project webpage to promote the project and have published numerous blog posts, which have been read by an international audience. The blog has allowed us to share progress in a more dynamic and immediate way as the project moves forward. It has also helped to highlight some of the thought processes we have gone through as we consider Archivematica and the nature of research data.
1 https://www.epsrc.ac.uk/about/standards/researchdata/expectations/
2 http://www.nerc.ac.uk/research/sites/data/policy/datapolicyguidance/
3 http://www.stfc.ac.uk/funding/researchgrants/datamanagementplan/
4 http://www.nationalarchives.gov.uk/documents/informationmanagement/parsimoniouspreservation.pdf
Find out more:
- Project webpage: http://www.york.ac.uk/borthwick/projects/archivematica/
- Project blog: http://digital-archiving.blogspot.co.uk/