Data Tapes and Boilerplates

Also in this section
Blog Topics

Latest Comments

An Unexpected Gift
- Niamh Murphy 3 months ago
  
  This is fantastic! Thank you so much, Andy! Merry Christmas!
The unsung digital preservation story arc in the Star Wars galaxy
- Euan Cochrane 4 months ago
  
  This is great and it reminds me of an old post from 2017 after Rogue One came out. Jon Tilbury at ...
Đáp ứng Thách thức Định dạng Tệp - Liên minh Bảo tồn Kỹ thuật số
- Andrew Jackson 4 months ago
  
  Yes, I agree, there's likely to be many more. The goal here is just to establish some kind of credible ...

Archives

DPC Blog RSS Feed

Also in this section

Steve Daly

Last updated on 28 April 2021

Steve Daly is Head of Technology for BBC Archives.

At the BBC Archive we deal with a large variety of different multi-media assets, from physical video and audio tapes through to disk and cloud-based storage. Our highest-quality, uncompressed master video files are generally stored on LTO data tape, with copies in multiple locations.

This means we’re managing tens of thousands of data tapes and, whilst all these are heavily tracked in our warehouse and data management systems, not all the other data tapes that come into our possession have been looked-after quite so well.

Where we receive or inherit tapes created outside our own operations it can sometimes be difficult to be sure of the exact provenance of the contents and how best to read and understand them. We want to make sure we’re not creating issues like this for future users of our own data tapes, so we take a few additional steps to ensure the long-term life of our content.

README

Firstly, for every single data tape we write we make sure that we store a README file on it.

This is a simple text file which describes how the content on the tape came to exist; why it was written to tape; and any information we want to pass on to future users of the content.

For example, we might explain that some content is only being written as a temporary backup of some work-in-progress, or instead that these are the final archive masters of a particular collection. And we might add detail of technical commands needed to process or use the files in future.

Although all this knowledge is also held in our database systems, from our own experience we worry that over a long time-period the tapes could become separated from the data about them so we always try to ensure they are self-describing.

Many of our tapes are written by automated systems, for example where we migrate between generations of data tape (e.g. LTO3 to LTO7) and we take some further steps in these cases;

Boilerplate

We write a ‘boilerplate’ text file to each tape with a full history of the project that created it, including as much detail as we can and writing in a way that doesn’t assume any current knowledge of our operations. We use Markdown for this, to combine universal readability with some optional nice formatting.

bbc1

Files

We list all the file-types on each tape and explain where the files came from and their purpose

bbc2

Format Documentation

For the more esoteric formats, we include the complete file format specification (in PDF-A of course)

bbc3 bbc4

Data

We include a complete data set (as CSV) of the whole project, not just for the files on the tape, so that you can see where this subset sits in context.

bbc5

Source Code

The BBC are heavy users and creators of open-source software, and on each tape we put:

A complete copy of the source code which was used to create the files on the tape
Source code for software that could be used to replay and re-use these files

bbc6

Videos

Being a content-creation organisation, we thought it appropriate to make some videos about the project – interviewing a selection of people involved to explain the background and the workflows and processes involved.

bbc7

As a reminder, all of the additional material I describe here is added to every single data tape we make. It adds about 1GB to each 6000GB tape we make, so only an 0.02% overhead.

We currently write LTFS tapes for situations like this so are comfortable with future readability but for more complex format (e.g. tar) we make sure that a valid restore command is included with each tape. I’ve seen another organisation laser-etch this command onto each of their tapes, but we haven’t gone that far (yet).

I like to think of this like the plaque on the side of the Pioneer spacecraft.

bbc8

If one of these data tapes was separated from the herd, or if some future lifeform stumbled across these tapes, we hope that they would be well-placed to get the very best out of them.

Add comment

An Unexpected Gift

The unsung digital preservation story arc in the Star Wars galaxy

Đáp ứng Thách thức Định dạng Tệp - Liên minh Bảo tồn Kỹ thuật số

Steve Daly