Bish Bash Backup: A Blog about converting metadata to PREMIS

Bish Bash Backup is a student blog about preservation metadata. It reports on a project to create a simple but innovative tool for implementing PREMIS in XML by automatically extracting metadata and converting it into a PREMIS-compliant format.

The project was carried out over the 10 weeks of the Digital Curation module, in class and during independent study time, and is linked to both the class intended learning outcomes and the personal learning outcomes identified by students in week 1. This student blog received a distinction and the highest grade in the cohort. It demonstrates a precise grasp of preservation metadata as a sustainability and professional issue, and it was executed to an extremely high standard across all the criteria assessed (communication, contextualization, analysis and synthesis, presentation), as well as to a professional standard beyond the course requirements. A key challenge of this assessed work is to balance technical and practical aspects with appropriate detail and tone of communication, while situating the work clearly in a wider body of professional discourse and practice and maintaining academic rigour. This blog achieves this with great clarity and attention to detail in both content and visual communication.

The Bish Bash Backup blog first reflects on the significance of preservation metadata, then considers different schemas (PREMIS, METS, TOTEM) in order to set the scene for PREMIS implementation. One of the class activities during the module was to manually build a simple PREMIS XML (eXtensible Markup Language) schema using Excel and Oxygen (a free XML editor), mapping metadata from other sources, such as the file identification tool DROID, with reference to the PREMIS Data Dictionary. The aim was to gain a deeper understanding of metadata mapping and basic skills in using an XML editor to transform data. This project also used XML as the basis to explore PREMIS implementation, reflecting particularly on the utility of automation given the “data deluge” and the complexity of working with such metadata schemas manually. The blog also reflects on the high bar (and sometimes cost) of getting to grips with larger digital repository systems that support PREMIS, as well as the limitations of other “small” tools aimed at generating PREMIS-compliant metadata. The blog therefore identifies a gap in standalone “medium” tools for generating detailed PREMIS XML from a target file.
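
The kind of metadata mapping described above can be illustrated with a small crosswalk. The following is only a sketch under stated assumptions: the column names follow a typical DROID CSV export, the `CROSSWALK` table and the `map_droid_row` helper are hypothetical names invented for this example, and the sample row is illustrative data rather than output from the class activity or the project.

```python
import csv
import io

# Hypothetical crosswalk: DROID CSV column -> PREMIS semantic unit path.
# The paths follow the PREMIS Data Dictionary naming; the table itself is
# an assumption made for this example.
CROSSWALK = {
    "NAME": "originalName",
    "SIZE": "objectCharacteristics/size",
    "SHA256_HASH": "objectCharacteristics/fixity/messageDigest",
    "FORMAT_NAME": "objectCharacteristics/format/formatDesignation/formatName",
    "FORMAT_VERSION": "objectCharacteristics/format/formatDesignation/formatVersion",
    "PUID": "objectCharacteristics/format/formatRegistry/formatRegistryKey",
}


def map_droid_row(row):
    """Return {premis_unit: value} for every mapped, non-empty DROID column."""
    return {
        CROSSWALK[column]: value
        for column, value in row.items()
        if column in CROSSWALK and value
    }


# Illustrative sample only: one row in the style of a DROID CSV export.
# The hash column is left empty to show that blank values are skipped.
SAMPLE_CSV = '''"NAME","SIZE","SHA256_HASH","PUID","FORMAT_NAME","FORMAT_VERSION"
"report.pdf","1024","","fmt/18","Acrobat PDF 1.4 - Portable Document Format","1.4"
'''

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
mapped = map_droid_row(rows[0])
for unit, value in mapped.items():
    print(f"{unit}: {value}")
```

Keeping the crosswalk as plain data separates the mapping decisions, which come from reading the Data Dictionary, from the code that applies them.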

Figure 1: Logo for Bish Bash Backup Blog. Image: Alex Habgood.

The resulting prototype, “Premissh”, was developed for the Linux command line: “a BASH script that can automatically generate PREMIS XML for an input file(s).” The tool uses the third-party applications DROID and ExifTool to extract the necessary metadata from the input file, and this is converted to PREMIS XML using an XSLT (eXtensible Stylesheet Language Transformations) script. The code functions successfully, and the blog reflects on the limitations of the prototype and areas for continued development, e.g., the ability to change which metadata elements are included in the output metadata file.
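
To make the extract-then-convert idea concrete, here is a minimal sketch in Python. It is not Premissh's code: it swaps the BASH and XSLT pipeline for the Python standard library, computes only size and a SHA-256 digest itself, and takes the format name and PRONOM identifier as arguments standing in for what DROID would report. The function name `build_premis_object`, the `local` identifier type, and the sample values are all assumptions made for illustration. `ET.indent` requires Python 3.9 or later.

```python
import hashlib
import os
import tempfile
import xml.etree.ElementTree as ET

PREMIS_NS = "http://www.loc.gov/premis/v3"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("premis", PREMIS_NS)
ET.register_namespace("xsi", XSI_NS)


def _sub(parent, name, text=None):
    """Append a PREMIS-namespaced child element, optionally with text."""
    element = ET.SubElement(parent, f"{{{PREMIS_NS}}}{name}")
    if text is not None:
        element.text = text
    return element


def build_premis_object(path, format_name, puid):
    """Build a minimal PREMIS file object (hypothetical helper).

    format_name and puid stand in for identification results that a tool
    such as DROID would supply; they are not detected here.
    """
    with open(path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    size = os.path.getsize(path)
    name = os.path.basename(path)

    root = ET.Element(f"{{{PREMIS_NS}}}premis", {"version": "3.0"})
    obj = _sub(root, "object")
    obj.set(f"{{{XSI_NS}}}type", "premis:file")

    identifier = _sub(obj, "objectIdentifier")
    _sub(identifier, "objectIdentifierType", "local")  # assumed identifier scheme
    _sub(identifier, "objectIdentifierValue", name)

    characteristics = _sub(obj, "objectCharacteristics")
    fixity = _sub(characteristics, "fixity")
    _sub(fixity, "messageDigestAlgorithm", "SHA-256")
    _sub(fixity, "messageDigest", digest)
    _sub(characteristics, "size", str(size))

    fmt = _sub(characteristics, "format")
    designation = _sub(fmt, "formatDesignation")
    _sub(designation, "formatName", format_name)
    registry = _sub(fmt, "formatRegistry")
    _sub(registry, "formatRegistryName", "PRONOM")
    _sub(registry, "formatRegistryKey", puid)

    _sub(obj, "originalName", name)
    return root


# Demonstration on a throwaway file; the format values are sample inputs.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "wb") as handle:
        handle.write(b"hello premis\n")
    root = build_premis_object(sample, "Plain Text File", "x-fmt/111")

ET.indent(root)  # pretty-print; available from Python 3.9
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The sketch covers only a handful of semantic units and does not validate the result against the PREMIS schema; a tool like the one described would draw on far richer extracted metadata.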

Figure 2: Excerpt of code, from an early version of Premissh. Image: Alex Habgood.

Crucially, the blog reflects on whether digital curators need to learn to code and where a project like this sits within that context. While the student concludes that not all digital curators need to code, they reflect on how the development of their own coding skills “opened up new avenues… to solve problems with digital technology” and grew their “knowledge of the underlying functionality of digital technologies”, which ultimately could serve as a generative way for digital curation professionals to improve their digital literacy.

Figure 3: GitHub page for Premissh where the code is publicly accessible for others to use. Image: Alex Habgood.

The development of a novel tool, going beyond learning and critically examining an existing tool, exceeds the requirements of the blog assessment and directly contributes to professional practice. The code was also developed to be FOSS (free and open source), and the source code has been made publicly available without restrictions via GitHub, underlining the extensibility of the work. Additionally, it was shared on social media with the aim of eliciting feedback from fellow professionals and identifying areas for future improvement. It was well received and generated insights both for the blog itself and for the future development of the tool. Again, this testing and deployment went beyond the elementary requirements of the assessment to bring in professional collaboration.
