Jenny Hunt is the Digital Archivist for the National Records of Scotland Digital Records Unit.
Like many other institutions worldwide, National Records of Scotland (NRS) is grappling with digital preservation and with finding a solution that will allow the Keeper of the Records of Scotland to carry out his duty to preserve the records of Scotland into the future in a digital age. Our Digital Preservation Programme is currently working on a new digital infrastructure on which to build its repository – so how are we preserving archival digital material in the meantime?
NRS first received archival born-digital records in 1998. Since then we have amassed almost 1.07 million digital files in over 130 accessions. We began a formal process of quarantining accessions in the late 2000s, copying each transfer on to a standalone PC with virus-checking software that was updated weekly, and scanning it once a week for a period of four weeks. A duplicate of the accession was then produced and sent to off-site storage.
After some years of attempting to develop an in-house digital repository solution, NRS established the Digital Records Unit, which for the first time meant that dedicated archival staff could put their minds to the problem. We knew that a fully functioning digital preservation solution lay some way off, so in the meantime we began to look at how we could put together processes that would at least allow us to safely and securely process and store the accessions we had and continue to accept new ones. So how does this interim solution – as we call it – work?[i]
Digital records selected for permanent preservation are transferred by depositors on encrypted USB drives supplied by NRS. On receipt, an accession record is created in our Calm catalogue and the drive is then scanned for viruses on a standalone PC (with write-blocker) whose anti-virus software is manually updated twice per week. If this initial scan is successful, the contents of the drive are copied to the quarantine PC, where they remain for a four-week period, with scans taking place each time updates are installed. Once this quarantine period has successfully completed, the copied records are retained on the PC as a back-up until the accession has been transferred to the storage server.
The drive is then attached (again via write-blocker) to a second standalone PC, where DROID is used to create a profile of the files present along with MD5 checksum hashes. The DROID outputs are used to analyse the types of files present in our collections and help to inform preservation planning. We are currently in the process of defining lists of file formats which we can and cannot accept for permanent preservation.
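As an illustration of how a profile export might feed that kind of analysis, here is a minimal Python sketch that tallies files by format identifier from a CSV export. It is not our actual workflow: the column names (`TYPE`, `PUID`) are assumptions about how the export is laid out and should be checked against a real file, and the sample rows are invented for the example.

```python
import csv
import io
from collections import Counter


def tally_formats(csv_file, key_column="PUID"):
    """Count file rows per format identifier in a profile CSV export."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Folder rows carry no format identification, so leave them out.
        if (row.get("TYPE") or "").strip().lower() == "folder":
            continue
        # Files with no identifier are grouped under one label.
        counts[row.get(key_column) or "unidentified"] += 1
    return counts


# Invented sample data standing in for a real export.
sample = io.StringIO(
    "NAME,TYPE,PUID,MD5_HASH\n"
    "minutes,Folder,,\n"
    "report.pdf,File,fmt/18,aaa\n"
    "letter.doc,File,fmt/40,bbb\n"
    "annex.pdf,File,fmt/18,ccc\n"
    "notes.xyz,File,,ddd\n"
)

format_counts = tally_formats(sample)
for identifier, count in format_counts.most_common():
    print(identifier, count)
```

A tally like this makes it easy to see which formats dominate an accession and how many files could not be identified at all.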
We have a number of methods we can use to check the completeness of a transfer, depending on its size and the supporting information supplied by depositors. For small accessions a manual comparison of content with a handlist will suffice; for larger accessions we have used MS Excel processes to compare lists of filenames.
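The filename comparison we do in Excel could equally be scripted. The sketch below is a hypothetical Python equivalent, not the process we use: it assumes the depositor's handlist and the list of transferred files are both available as lists of relative paths, and it treats paths as case-insensitive, as Windows does. The example paths are invented.

```python
def normalise(path):
    """Make paths comparable: forward slashes, no stray spaces, case-insensitive."""
    return path.replace("\\", "/").strip().casefold()


def compare_file_lists(handlist, transferred):
    """Return (missing, unexpected) paths when comparing a handlist to a transfer."""
    expected = {normalise(p) for p in handlist}
    received = {normalise(p) for p in transferred}
    missing = sorted(expected - received)      # listed by the depositor, not received
    unexpected = sorted(received - expected)   # received, but not on the handlist
    return missing, unexpected


# Invented example lists.
handlist = ["Minutes\\2001-01.doc", "Minutes\\2001-02.doc", "Reports\\annual.pdf"]
transferred = ["minutes/2001-01.doc", "Reports/annual.pdf", "Reports/Thumbs.db"]

missing, unexpected = compare_file_lists(handlist, transferred)
print("Missing:", missing)
print("Unexpected:", unexpected)
```

Anything reported as missing or unexpected would still need a person to look at it and, where necessary, go back to the depositor.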
Once all checks are complete, we can transfer the accession to our interim storage space. This is a secure partition of a storage area network (SAN) which we currently share with another business area within NRS, but which is tightly controlled and subject to the normal organisational IT processes such as back-up and refresh. It does mean having to transport myself and the records to a different building in the NRS estate, and because of the nature of the other business area’s work we can only access the SAN outside normal nine-to-five working hours. These are the two main drawbacks to the solution. However, the small number of accessions which we receive at the moment means it is workable for now.
We use the Windows Robocopy utility to copy the files to the server. This gives us a log of the copy process, which is added to the metadata for the accession and which can be checked for any errors that might have occurred. It is a more robust copy process than simply copying and pasting in Windows Explorer, and it has allowed me to learn some command line coding, which I am discovering is in fact a core digital preservation skill! So far the only issue we have found with Robocopy is that it doesn’t like filenames which have been truncated because of the 8-character filename rule of a past Windows operating system, but we have very few instances of this in our collections and can work around them.
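For readers who have not used Robocopy, the sketch below shows roughly what a logged copy could look like when wrapped in a small Python script. It illustrates the general approach rather than the exact command we run: the choice of switches, the retry settings and the function names are all assumptions made for the example, and any paths passed in would be your own.

```python
import subprocess


def build_robocopy_command(source, destination, log_path, retries=2, wait_seconds=5):
    """Assemble a Robocopy command line as a list of arguments."""
    return [
        "robocopy", source, destination,
        "/E",                  # copy subfolders, including empty ones
        "/COPY:DAT",           # copy file data, attributes and timestamps
        "/DCOPY:T",            # keep folder timestamps as well
        f"/R:{retries}",       # retries on a failed copy
        f"/W:{wait_seconds}",  # seconds to wait between retries
        "/NP",                 # keep percentage progress out of the log
        f"/LOG:{log_path}",    # write the log to a file for the accession metadata
    ]


def copy_succeeded(exit_code):
    """Robocopy exit codes of 8 or above mean at least one file or folder failed."""
    return 0 <= exit_code < 8


def run_copy(source, destination, log_path):
    """Run the copy on a Windows machine and report whether it completed cleanly."""
    result = subprocess.run(build_robocopy_command(source, destination, log_path))
    return copy_succeeded(result.returncode)
```

Note that Robocopy does not use zero as its only success code, so a script that treats every non-zero exit code as an error would raise false alarms. The log file remains the place to look for the detail of what happened.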
We have a second copy of DROID on the SAN which we use to profile the server copy of the data, and from this we can compare the checksums to ensure that the files have not changed while being copied over. Checksum comparison using Excel, straightforward on the surface, has thrown up some issues with the copy processes we have used, for example: the 8-character truncation described above, the proliferation of thumbs.db files, and the alteration of diacritics in filenames.
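To make the comparison concrete, here is a minimal Python sketch of the same idea. It is a hypothetical alternative to our Excel process, and it rests on two assumptions: that each manifest is available as a mapping of relative path to checksum, and that the altered diacritics are differences in Unicode composition, which may not be the whole story in practice. The checksum values are invented.

```python
import unicodedata


def prepare(manifest, ignore_names=("thumbs.db",)):
    """Normalise paths and drop system files before comparing."""
    prepared = {}
    for path, checksum in manifest.items():
        path = path.replace("\\", "/")
        if path.rsplit("/", 1)[-1].lower() in ignore_names:
            continue  # created by Windows, not part of the accession
        # Use one Unicode form so accented names match across copies.
        prepared[unicodedata.normalize("NFC", path)] = checksum.lower()
    return prepared


def compare_manifests(transfer, server):
    """Report files that are missing, extra or changed on the server copy."""
    a, b = prepare(transfer), prepare(server)
    return {
        "missing_on_server": sorted(a.keys() - b.keys()),
        "extra_on_server": sorted(b.keys() - a.keys()),
        "changed": sorted(p for p in a.keys() & b.keys() if a[p] != b[p]),
    }


# Invented manifests: one accented name stored in two Unicode forms,
# one file whose checksum differs, and a stray thumbs.db on the server.
transfer = {"docs/caf\u00e9.txt": "AAA111", "docs/b.txt": "bbb222"}
server = {
    "docs/cafe\u0301.txt": "aaa111",
    "docs/b.txt": "ccc333",
    "docs/Thumbs.db": "ddd444",
}

report = compare_manifests(transfer, server)
print(report)
```

Ignoring thumbs.db and normalising names keeps the report focused on genuine differences, but both choices hide information, so they are worth recording alongside the results.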
We have been thinking a bit more about how and where we use DROID. For instance, while we require a method of generating checksums on the server so that we can compare them with the transfer copy, we don’t need to repeat the file identification task. In fact, keeping the versions of DROID and its signature files in sync in two locations can be tricky and can lead to mismatches. So we are in the process of identifying something simpler which will just do the checksumming.
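A checksum-only tool need not be complicated. As a rough indication of what "something simpler" could look like, the Python sketch below walks a folder and records an MD5 checksum for every file. It is an illustration only, not the tool we have chosen, and the function names are made up for the example.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 checksum of one file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its MD5 checksum."""
    manifest = {}
    for folder, _subfolders, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(folder, filename)
            relative = os.path.relpath(full_path, root).replace(os.sep, "/")
            manifest[relative] = md5_of_file(full_path)
    return manifest
```

Running `build_manifest` against the transfer drive and again against the server copy would give two manifests that can be compared directly, with no file identification step involved.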
The interim solution is simple and inexpensive, but crucially it allows NRS to take safe custody of digital records from depositors, to perform basic pre-ingest processes on them, to store them securely and to monitor their fixity – good sturdy bit-level preservation. The modular, manual nature of the interim solution allows for easy modification where required. For instance, we are due to move to a one-off virus scanning procedure using two separate anti-virus products – one signature-based, the other heuristic – on the quarantine PC, and we need to introduce a method for depositors to generate their own checksums and supply them with the transferred records. We have started to introduce BitCurator to our suite and use its tools now and then as required, for instance to produce images of transfer media. At the moment we are using the shell terminal to explore a transfer created with a Linux file system.
The interim solution also provides a wonderful opportunity to learn and understand what lies behind the various processes and how they interact, in a way that would not be possible if they were more automated. As the rate of accession increases over time the limitations of the manual system will show, but it will be phased out once a more technically sophisticated solution is implemented, having served its purpose well.
[i] Note that NRS currently uses Microsoft Windows.