Taryn Ellis

Last updated on 9 November 2023

Taryn Ellis is Digital Preservation Technical Analyst at the State Library of South Australia, a member of National and State Libraries Australia (NSLA). She attended the iPRES 2023 Conference with support from the DPC Career Development Fund, which is funded by DPC Supporters.


As a relative newcomer to the field of digital preservation (so much to learn, so much to do!), I was keen to attend the iPRES 2023 panel “Policies, Risks and Strategies: A File Format Debate”, chaired by Sam Alloing from the National Library of the Netherlands (KBNL).

For many of us working in digital preservation, when we join an organisation the file format policy is either (a) one of the documents we are given, perhaps with the caveat that it needs to be updated, or (b) something we are expected to develop, because this is good practice. Such a policy may offer guidance on what material the organisation should accept, on when and how accepted material should be processed for preservation, or on both. This panel provided an opportunity to examine the purpose of this approach to managing collections in more detail, and prompted us to consider alternative language and models.

The background to this discussion can be found in a 2022 DPC blog post by Paul Wheatley, in which he reflects on the publication of the International Comparison of Recommended File Formats, a resource that brings together information on “accepted and preferred file formats” from a range of institutions.

The panel opened with each participant outlining how their organisation frames policy on file formats and the context in which such assessments are used. Of the six organisations represented on the panel, three have recommended file format policies, two do not (placing greater emphasis on strategies and action plans), and one presents a breakdown of the level of preservation it currently provides for each format. None of the institutions refused material on the basis of format. Several interesting themes also emerged during the panel, more than any brief blog can capture, so please do watch the recording!

One of these themes was language: how defining formats as “acceptable” can be interpreted, and the impact that interpretation can have on our collections. Some of the panellists noted that it could be understood to mean that some formats shouldn’t be deposited or collected, that migration of “unacceptable” formats is always necessary, or that “acceptable” formats don’t require any further analysis.

These interpretations can be detrimental in a number of ways. By proscribing a format, an organisation may fail to acquire key elements of a data set or collection. Depositors may not offer some material at all, or may even change file extensions to match “accepted” formats (thereby increasing the workload of under-resourced preservationists!).

File migration is not always a reliable or straightforward process. By placing the onus on producers or depositors to do this work, we risk receiving damaged or incomplete files, or not receiving the material at all. If migration is mandated by policy, the costs in staff time and storage (if the original is also kept) may actually impede more meaningful work, and at scale migration has an environmental impact we can ill afford. And, as the panel reminded us, migration is hard, data loss can happen, and we all make mistakes.

It is also risky to assume that format alone can determine how a file should be managed, rather than taking a more nuanced approach that considers the specific properties of each file. For example, problematic characteristics of an “accepted” format could be overlooked, or a producer may pass over more robust or resilient format choices.

Every organisation will use file format policies differently: as a guide to depositors, as a means of tracking its own resources and capabilities, as a strategy for maintaining a consistent internal approach to preservation challenges, or as a way of sharing hard-won knowledge with other agencies. Whatever the purpose, it is worth considering alternative methods of communicating these intentions, especially when poor communication could restrict or impair collections.

The National Library of the Netherlands (KBNL) has developed an approach that may help avoid some of the pitfalls outlined above. Rather than ranking file formats, it describes the level of preservation it is currently able to provide for each format, based on the organisation’s knowledge of that format. There are three knowledge levels:

Level 1 indicates that the format can only be stored, meaning that the file will receive bit-level preservation and its ongoing integrity will be ensured. Level 2 formats are those that can be identified and assigned a PRONOM PUID. Level 3 formats are considered to be known, i.e. well understood; these will be validated and have their technical metadata extracted. When outlining this system, Sam explained that the organisation’s goal is one day to reach the point where all its holdings are at Level 3, but with more than 1,000 file extensions in the collection, they recognise that this is a lot of work!
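To make the three levels a little more concrete, here is a minimal Python sketch of how a repository might record the level it can currently offer for a format. It is an illustration only, not KBNL's implementation: the names `KnowledgeLevel`, `knowledge_level` and `share_known` are hypothetical, and the inputs (whether a PRONOM PUID was assigned, and whether the organisation has validation and metadata-extraction tooling for the format) are assumptions about how the levels could be decided.

```python
from enum import IntEnum
from typing import List, Optional


class KnowledgeLevel(IntEnum):
    """Hypothetical encoding of the three knowledge levels."""
    STORED = 1      # bit-level preservation and integrity checking only
    IDENTIFIED = 2  # format can be identified and assigned a PRONOM PUID
    KNOWN = 3       # well understood: validated, technical metadata extracted


def knowledge_level(puid: Optional[str], can_validate: bool,
                    can_extract_metadata: bool) -> KnowledgeLevel:
    """Return the level an organisation can currently offer for one format.

    Assumption: a format only reaches Level 3 if it has been identified AND
    the organisation can both validate it and extract technical metadata.
    """
    if puid is None:
        return KnowledgeLevel.STORED
    if can_validate and can_extract_metadata:
        return KnowledgeLevel.KNOWN
    return KnowledgeLevel.IDENTIFIED


def share_known(levels: List[KnowledgeLevel]) -> float:
    """Fraction of formats already at Level 3 (the long-term goal)."""
    if not levels:
        return 0.0
    return sum(1 for lvl in levels if lvl is KnowledgeLevel.KNOWN) / len(levels)


# Example inputs; "fmt/43" is used purely as an example PUID value.
print(knowledge_level(None, False, False).name)      # STORED
print(knowledge_level("fmt/43", False, False).name)  # IDENTIFIED
print(knowledge_level("fmt/43", True, True).name)    # KNOWN
```

Because the levels describe what the organisation can currently do rather than whether a format is "acceptable", the same format can move from Level 2 to Level 3 when new tooling or knowledge becomes available, without any change to what is collected.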

This kind of transparency acknowledges the reality that creators of cultural artefacts will always use the tools available to them. It also communicates to the broader community that as custodians of cultural heritage, despite the many challenges, it is our responsibility to continue to innovate and adapt.


Acknowledgements 

The Career Development Fund is sponsored by the DPC’s Supporters, who recognise the benefit of, and seek to support, a connected and trained digital preservation workforce. We gratefully acknowledge their financial support to this programme and ask applicants to acknowledge that support in any communications that result. At the time of writing, the Career Development Fund is supported by Arkivum, Artefactual Systems Inc., boxxe, Evolved Binary, Ex Libris, Iron Mountain, Libnova, Max Communications, Preservica, and Simon P Wilson. A full list of supporters is available online.

 

Comments

Sam Alloing
1 year ago
Hi Taryn,

Thanks for writing your takeaways from the panel. Really interesting to read!

Sam