Jaana Pinnick is Research Data and Digital Preservation Manager at the British Geological Survey (BGS).
We started to develop digital preservation capabilities at BGS in 2016 by exploring the initial requirements and writing a preservation policy to guide the future work. This blog describes the progress we have made so far.
Background
BGS is an approved Place of Deposit under the Public Records Act and is committed to looking after certain geoscience data in its care “in perpetuity”. Its National Geoscience Data Centre (NGDC) makes most of its data openly available under the Open Government Licence. BGS also has legal obligations to manage some types of data. The UKRI Data Policy requires data with acknowledged long-term value to be preserved and made available for future research; however, the NGDC expects to retain most of its geoscience data for longer than the ten years stipulated in the policy.
Geoscience data covers numerous data types from geochemical, seismic, oil and gas, and geophysical data to data about rocks, minerals, sediments, soils, land contamination, natural resources, erosion, and many more. It includes GIS and geospatial data, represented in various geographic coordinate systems, and vector and raster data stored as layers and used in mapping. Our designated user community stretches from local and national government to industry, manufacturing, construction, transport, research and academia to the general public. Geoscience data helps create new products and knowledge and answer pressing scientific questions; it is used in decision-making and to build infrastructure or risk models, to innovate, and to trade onwards.
Managing this diversity requires the use of standards and the normalisation of data and metadata. The BGS discovery metadata service means that data deposited with the NGDC is discoverable via the NERC Data Catalogue Service, the UK Government Data.gov.uk platform and the European Commission INSPIRE Geoportal. Over the last three years we have enhanced the long-term continuity of our data by introducing digital preservation thinking within the organisation.
Preservation challenges
The usual digital preservation challenges apply to geoscience data, but there are some additional ones too. Collecting deep borehole data to a depth of many kilometres costs tens of millions of pounds and is too expensive to repeat. Seismic data originating from earthquakes is another example of unique data. Geoscience data has long validity and underpins future research, so appraisal is key to building reliable and reusable datasets. Where legacy data lacks terms and conditions or spatial attributes, it cannot be reused, and any science built on it would not rest on valid conclusions. Recovery of data from magnetic tapes, which are prevalent in some geoscience sub-disciplines, is resource-intensive. New ways of capturing data using sensor and monitor networks will result in a data deluge, magnifying data description and storage issues.
But digital preservation is not just about tools and technologies; it is also about the people doing it. Strategic thinking is required to ensure our digital skills are developed and maintained to cover all aspects of the data lifecycle. We need to manage the change from analogue to digital thinking amongst our staff. Building services from various funding sources while amalgamating stakeholder requirements is not straightforward either.
There is always the option of doing nothing and simply storing the data. That would risk more data becoming unusable, leading to the loss of capability to answer research questions quickly, fully and accurately (BGS relies on this ability to conduct its core business), as well as the loss of reputation as a national archival organisation. The usability of unique geoscience datasets would be endangered and the potential for new science and data product development hindered.
Progress so far
We have developed an online Data Deposit Portal to standardise the ingestion of digital data and metadata, and we guide data depositors to use open and acceptable file formats. Continuing process improvement and automation help alleviate the impact of growing data volumes, and the use of DOIs for datasets provides persistent links to data.
Our digital preservation policy was published and launched in 2017. Writing a policy may sound a bit daunting, but there are many available online to learn from. We reviewed policies from several public sector organisations, including the British Library, The National Archives and the Parliamentary Archives, as well as several UK and European universities and digital libraries. The DPC Handbook includes excellent guidance on establishing organisational preservation strategy and policy, and the DPC also runs topical workshops to support its members.
We wanted to make our policy a brief, high-level document. To begin with, we defined the scope and objectives of the policy and listed some benefits of preservation for the organisation. We then outlined the preservation framework as well as our initial requirements and business drivers, and defined roles and required resources. To introduce non-information managers (read: scientists) to the subject, we described some key concepts for functional preservation. We published our policy on the NGDC website to assure data depositors of the long-term security of their data, and we will review it every three years.
A preservation policy is a good start but it does not fully justify the need to spend our limited resources on digital preservation. This is the function of another document, the business case. It demonstrates to the executive the need for and the benefits of digital preservation, listing objectives, challenges and benefits of the work. Our business case outlined the development of a modular preservation programme to show the need for resources and highlighted the cost of doing nothing and the risk of losing valuable data assets. We are not planning to purchase a commercial software solution; instead, we will enhance our existing data management capabilities and infrastructure and use and develop our staff skills.
In 2017 we also undertook a project to become a trustworthy digital repository. We wanted to become accredited to build stakeholder confidence in our data centre and to benchmark our processes against recognised standards. For this we selected the CoreTrustSeal certification, which is a peer-reviewed self-assessment. The process took us about 18 months, although it might have been quicker if we didn’t have so many other obligations and projects on the go. We were able to demonstrate compliance with many of the requirements, although we had to make sure everything was formally documented and publicly available. We received the accreditation in February 2018, becoming the first UK repository to do so.
Other work already in progress includes plans to design a PREMIS metadata element for inclusion within our existing BGS Discovery Metadata schema, which is held and managed as a relational database. We have trialled the creation and capture of checksum values for some of our datasets, and are looking into adding fixity checks to our ingest and appraisal workflows. This will strengthen data integrity and authenticity in line with our data centre accreditation. As part of raising awareness, BGS took part in the International Digital Preservation Day in 2017, using the event as the formal launch of the preservation policy.
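For readers less familiar with checksums and fixity checking, the idea can be sketched in a few lines of Python. This is an illustration only and not the BGS implementation: the function names, the choice of SHA-256 and the in-memory manifest are all assumptions made for the example. A checksum is recorded for every file when a dataset is ingested, and a later fixity check recomputes the checksums and reports anything that has changed, gone missing or appeared unexpectedly.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(dataset_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to the dataset folder) to its checksum.

    Hypothetical helper: in practice the manifest would be stored
    alongside the dataset's metadata at ingest.
    """
    return {
        path.relative_to(dataset_dir).as_posix(): sha256_of(path)
        for path in sorted(dataset_dir.rglob("*"))
        if path.is_file()
    }


def check_fixity(dataset_dir: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files on disk with the checksums recorded in the manifest."""
    current = build_manifest(dataset_dir)
    return {
        # Files whose content no longer matches the recorded checksum.
        "changed": sorted(
            p for p in manifest if p in current and current[p] != manifest[p]
        ),
        # Files listed in the manifest that are no longer on disk.
        "missing": sorted(p for p in manifest if p not in current),
        # Files on disk that were never recorded in the manifest.
        "unexpected": sorted(p for p in current if p not in manifest),
    }
```

Calling build_manifest on a dataset folder at ingest and check_fixity on the same folder at a later date gives a simple report; three empty lists mean the dataset is unchanged. Reading files in chunks keeps memory use modest even for very large files.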
We have adopted the “Parsimonious Preservation” philosophy described by Tim Gollins in his paper “Putting Parsimonious Preservation into Practice”, with the overriding message being “Don’t panic!” This we intend to follow to the letter. In particular I agree with Tim’s statement “A more imminent threat is poor capture and inability to achieve safe and secure storage of the original material”.
Future work
The third building block for our preservation programme will be a strategic action plan based on the policy and the business case. This document is currently being written and will be modified as we find out what is realistic and what works for us. The selected option of building a modular programme will enable us to develop our capabilities and add preservation functionality incrementally as resources allow and risks are identified.
To inform the strategy, BGS will carry out a capability assessment and a risk assessment using appropriate tools (e.g. the DPCMM© and the SPOT model). The initial results from the DPCMM© tool, together with the NDSA Levels of Digital Preservation, have highlighted areas where we need to make a concentrated effort to improve, but before getting down to business we will do a data asset survey and create and populate a digital asset register. We have looked at some existing asset register examples (TNA, UK Parliamentary Archives) and started to develop our own template modelled for geoscience research and geospatial data.
It is important to have clarity about the repository’s user requirements and to agree on realistic goals for the digital preservation programme. In an era of ever-increasing data volumes, BGS will need to decide what needs to be preserved permanently and for what purpose. There needs to be a clear prioritisation of both key datasets and the resources dedicated to managing them, including providing staff training as required. Collaboration with other organisations will assist in sharing lessons learned and resources, and in increasing general awareness amongst the stakeholders.
Summary
Our preservation policy defined the scope and the objectives of our modular preservation programme. We supported this with a business case aimed at senior management, to describe and justify the need for undertaking the work and using the necessary resources. The third step will be to consider how we achieve our objectives in practice using a strategic action plan and a digital asset register. The outcomes of risk and capability assessments will feed back into the strategy, helping prioritise and select key activities and directing resources where our digital content is most at risk. All this will support our corporate data management objectives, which are to maximise the long-term accessibility of digital geoscience data, to support innovation and economic growth using that data, and to raise awareness of and build up skills in digital preservation and research data management.