Richard Higgins

Last updated on 19 September 2018

Richard Higgins works in Durham University Library's Archives and Special Collections, where he has worked on developing cataloguing, repository and discovery systems.


My suspicion of backups dates back to the first job in which I used a computer as more than an entry terminal and developed a deeper involvement in how it was used for the business.

There were two systems in the basement: one DEC VAX/VMS and XENIX based with tape backup, the other a Mannesmann Kienzle the size of a washing machine, with backup disks like a car’s spare wheel that you had to lower into the top. These were in turn surrounded by wide-carriage dot-matrix printers that printed out every transaction on green-lined continuous paper. As they were accounting systems, the printouts were all filed as part of the accepted procedure of the time. We used a rotating set of tapes to back up until, one day, things went wrong.

So out came the backup tape, system restore and … system not there at all. The tape turned out to be as good as blank and we were in a worse situation than before. Fortunately the previous backup tape did work, but by now our system was two weeks behind. At this point the rigorous printing out of all transactions became useful and we were able to manually reconstruct and re-enter the data. Re-keying that burnt into my soul a distrust of the complacent backup.

Roll forward a few decades and we are establishing a digital repository here. The storage end starts to accumulate a few terabytes of content, and keeping copies of everything on an ever-increasing set of external hard disks (which also fail as soon as they realise they have got enough of your difficult-to-replace data to cause real pain) becomes less practicable. We have a rolling checksum checker in the background, keeping an eye on the files in the repository, but what happens when it finds a problem? As can often be the case in organisations, we have no direct access to the backups, much less supervision of the process. There is disk mirroring as the first stage, but that is less appropriate for long-term storage that is being added to but seldom edited. Although the data is stored in two separate locations, can the mirroring process itself not mean that data corruption or ransomware-style attacks hit both locations before you are aware of the problem? Snapshots, again, are useful for day-to-day data, but we discovered that one area only had two days’ worth of snapshots, meaning that if your data goes wrong on a Friday it is too late to do anything about it by Monday.
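By way of illustration, a background fixity check of this kind amounts to little more than recomputing each file's checksum and comparing it with a stored manifest. The sketch below is a minimal, hypothetical version; the JSON manifest layout, the paths and the choice of SHA-256 are assumptions for the example rather than a description of our actual setup.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 so large objects need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(storage_root, manifest_path):
    """Compare every file listed in a JSON manifest ({relative path: checksum})
    against what is currently on disk, reporting missing or altered files."""
    manifest = json.loads(Path(manifest_path).read_text())
    problems = []
    for rel_path, expected in manifest.items():
        target = Path(storage_root) / rel_path
        if not target.exists():
            problems.append((rel_path, "missing"))
        elif sha256_of(target) != expected:
            problems.append((rel_path, "checksum mismatch"))
    return problems


if __name__ == "__main__":
    # Hypothetical locations for the storage area and its manifest.
    for rel_path, problem in check_fixity("/repository/storage", "manifest.json"):
        print(f"{problem}: {rel_path}")
```

A check like this can tell you that a file has gone bad; the harder question in what follows is what you can do about it if you cannot reach a trustworthy backup.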

Considering my early experience again, I also ask: how do we know that our backup works until we need to use it? As the repository grows and fills all the disk space that can be made available, it becomes increasingly difficult to find an equal amount of space in which to restore that data to see whether it is all there. It is even more difficult to find out whether what has been stored still works as a repository, since that means replicating the cloud of interconnected virtual servers and storage silos now required. We have been able to restore the repository, empty of data, to establish that it works as an empty repository would, and to verify that randomly selected (small) subsets of data can be restored. Most backup routines can verify each file as they proceed, although often at a time cost which means that, beyond a certain repository size, the backup will simply run continuously. The idea that after a disaster we can just spin that repository back up again is of course naïve – the unpredictable nature of “disasters” is such that your repository will probably be a very low priority in the great scheme of things. However, it would be reassuring to know that, once you have reached the head of the queue, something meaningful emerges from your contingency plans and that you don’t have to go back to paper and re-key it all again (hoping nobody remembers that there had been born-digital stuff).
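To give a sense of what that kind of spot check involves, the sketch below picks a random sample of files from the live repository and confirms that a restored copy holds each one with an identical checksum. The directory paths, sample size and use of SHA-256 are illustrative assumptions, not a record of our procedure.

```python
import hashlib
import random
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Same streaming checksum helper as in the earlier sketch."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restored_sample(repository_root, restore_root, sample_size=50, seed=None):
    """Pick a random subset of files from the live repository and check that the
    restored copy contains each one with an identical checksum."""
    rng = random.Random(seed)
    originals = [p for p in Path(repository_root).rglob("*") if p.is_file()]
    sample = rng.sample(originals, min(sample_size, len(originals)))
    failures = []
    for original in sample:
        restored = Path(restore_root) / original.relative_to(repository_root)
        if not restored.exists():
            failures.append((original, "not present in the restore"))
        elif sha256_of(original) != sha256_of(restored):
            failures.append((original, "differs from the original"))
    return failures
```

An empty failure list is encouraging rather than conclusive: it only speaks for the files you happened to sample, which is exactly why the doubt described above never quite goes away.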

