Jeanne Kramer-Smyth

Last updated on 3 November 2021

Jeanne Kramer-Smyth is Digital Archivist with the World Bank Group Archives


There are so many cliches about how hard it is to get from “almost done” to “done”. Sayings like “the last 10% of the work takes 90% of the time” and “the last mile is the longest” can feel all too real. The barrier I am most obsessed with these days is the “last mile” between transfer and ingest of digital records.

At the World Bank Group Archives’ digital preservation program, we are diligent in our efforts to work on all the traditional components of archival work with digital records, including appraisal, transfer, ingest, description, preservation, and access. Each component presents its own challenges, but getting from transfer (which I define as “bringing digital records into archival custody”) to ingest (which I define as “moving digital records into the Digital Vault, our digital preservation platform”) often feels like a larger challenge than all the others.

This is not the case for scenarios in which we have created a direct pipeline from a World Bank system that creates born-digital records of enduring value into our Digital Vault. The integration work done during development of such a pipeline must identify and address all pre-ingest issues. While this means that the integration work can be quite involved and require a fair amount of time, energy, and technical skill – it also means that once completed, the ingest of the source records can be done smoothly and repeatably. This is one of the benefits of working in an archival function within the very organization that is creating the records: we are very close to the beginning of the records’ lifecycle.

The step of copying records held by other teams from their storage area into our temporary storage areas has been fairly easy. In some cases, we have arranged for a regular schedule for transferring copies of records into our care. We use an inexpensive, readily available tool with an easy user interface to facilitate copying and verifying the files.
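For readers who want to see what “copying and verifying” involves, here is a minimal Python sketch. It is not the tool described above; it only illustrates the general technique of copying each file and then comparing checksums of the source and the copy. The function names and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source_dir: Path, target_dir: Path) -> List[Path]:
    """Copy every file under source_dir into target_dir.

    Returns the source files whose copies do not match by checksum.
    An empty list means every copy verified.
    """
    mismatches = []
    # Build the full file list first so the walk is not affected by the copying.
    sources = sorted(p for p in source_dir.rglob("*") if p.is_file())
    for source in sources:
        target = target_dir / source.relative_to(source_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        if sha256_of(source) != sha256_of(target):
            mismatches.append(source)
    return mismatches
```

Calling `copy_and_verify(Path("source"), Path("staging"))` with two hypothetical folder names would return an empty list when every file arrived intact.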

The pain point we are struggling with is creating a stable, repeatable process for performing what I think of as the “housekeeping” steps before records can be ingested. Here are a few examples of the questions and activities we face in these steps:

  • Are the files we are about to copy in the right state? For example, one set of moving image records we transfer on a regular basis consists of recordings of video conferences. These files must be validated to ensure their names have been updated from generated IDs to the name of the meeting recorded.

  • Are there files that need to be deleted before ingest? For example, we decided to preserve a set of audio files that had been used on a website. We chose not to preserve over 2,000 files that were used to bookmark a specific location in each audio file for playing on the website.

  • Do the files need to be reorganized or the folders renamed to ensure easier understanding of the records in the future? An example of this is renaming a folder from “January 2021” to “01-2021”. This permits the monthly folders to sort chronologically within the “2021” parent folder.

  • Where in the Digital Vault’s hierarchy should the records be ingested?
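Several of the housekeeping checks in this list lend themselves to small scripts. The Python sketch below is illustrative only and does not describe our actual process. The pattern used to spot a “generated ID” file name, the `.bookmark` suffix in the usage note, and all function names are hypothetical; each record type would need its own rules. The functions default to a dry run so that nothing is deleted or renamed until the planned changes have been reviewed.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# Assumption: a generated ID is a long run of hex digits and hyphens.
GENERATED_ID = re.compile(r"[0-9a-fA-F-]{16,}")

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def files_with_generated_names(folder: Path) -> List[Path]:
    """List files whose names still look like generated IDs."""
    return sorted(p for p in folder.rglob("*")
                  if p.is_file() and GENERATED_ID.fullmatch(p.stem))


def delete_by_suffix(folder: Path, suffix: str, dry_run: bool = True) -> List[Path]:
    """List files with the given suffix; delete them only if dry_run is False."""
    matches = sorted(p for p in folder.rglob("*")
                     if p.is_file() and p.suffix.lower() == suffix.lower())
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


def sortable_month_name(name: str) -> Optional[str]:
    """Turn 'January 2021' into '01-2021'; return None for any other name."""
    match = re.fullmatch(r"([A-Za-z]+) (\d{4})", name.strip())
    if match is None:
        return None
    month, year = match.groups()
    if month.capitalize() not in MONTHS:
        return None
    return f"{MONTHS.index(month.capitalize()) + 1:02d}-{year}"


def rename_month_folders(parent: Path, dry_run: bool = True) -> List[Tuple[str, str]]:
    """Plan (old, new) renames for month folders; apply them if dry_run is False."""
    planned = []
    for folder in sorted(p for p in parent.iterdir() if p.is_dir()):
        new_name = sortable_month_name(folder.name)
        if new_name is not None:
            planned.append((folder.name, new_name))
            if not dry_run:
                folder.rename(folder.with_name(new_name))
    return planned
```

A typical session might call `rename_month_folders(Path("2021"))` to review the planned renames, then repeat the call with `dry_run=False` to apply them. The same pattern applies to `delete_by_suffix(Path("audio"), ".bookmark")`, where both the folder and the suffix are placeholders.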

I thoroughly document these steps to make this work sustainable, with the expectation that we may need a separate document for each broad type of records we accept for ingest. Many of these tasks require comfort working with a command line interface rather than a graphical user interface, a skill that not all of our staff members have. We expect comfort with these housekeeping steps to improve over time, but I am also on the lookout for ways to streamline and automate whenever possible.

It may sound like the simplest of steps, but I expect these hands-on command line tasks, which can vary with each record type, to be an ongoing challenge.

Comments   

#1 Carol Mulligan 2021-11-15 13:20
This is very interesting! In this scenario, you mention audio and video files as the two "born-digital" sources. Do you include any others, such as emails?