Introduction
Making sense of the contents of files – especially large numbers of files in diverse formats – is a persistent and ubiquitous challenge for those undertaking digital preservation. Understanding how information is packaged, transmitted and processed is essential for ensuring that it can remain intelligible through time. Understanding the intricacies of files can be a daunting and exasperating task, but it is also an obvious candidate for informed collaboration. Numerous initiatives in the last decade have designed, developed and refined tools and registries that help us to understand the behaviour of files, and some of these are now plugged directly into the architectures of repositories and archives.
This DPC ‘Day of Action’ will introduce a range of recent initiatives in this domain and it will provide a focussed burst of activity which will be of benefit to all with an interest in digital preservation. Participants will be invited to bring problem files with them, and to work with experts in the field to catalogue problems and develop tools to help characterise and manage them. Participants will:
- Be updated on a range of recent activities in file characterisation and format registries
- Have an opportunity to support the development of file format registries
- Be shown how to develop and supply signature information for characterisation
- Encourage collaboration on shared challenges in managing diverse or ‘problem’ files
- Contribute to a wide ranging discussion about strategic needs
Who should come?
This day will be of interest to:
- Collections managers, librarians, curators and archivists in all institutions
- Tools developers and policy makers in digital preservation
- Innovators and researchers in information technology and computing science
- Vendors and providers of services for preservation, records management and forensics
- Innovators, vendors and commentators on digital preservation and cognate fields
- Analysts seeking to develop tools and approaches for information management
What should you bring?
This day of action will let participants 'bring their problems with them'. As far as possible it will provide the tools or ideas to fix them on the spot. At the very least it will enable participants to share challenges and solutions in such a way as to get a range of people working on them at the same time. Participants are therefore encouraged to bring a laptop with them, and to think ahead of time about what problems they would like to share. Ideas for what to bring include:
- Examples of files they can't identify the format of
- Examples of files that are consistently characterised wrongly
- Examples of files where the characterisation is accurate but so imprecise as to inhibit effective preservation
- Examples of concrete digital preservation challenges which become evident after characterisation (including images of bad files or files that have become corrupted etc)
- Files that exhibit digital preservation challenges
- Examples of exotic digital files
- Examples of file formats that wrap contents of multiple file types
As far as possible these should be sharable under a Creative Commons licence (CC0), and participants should avoid bringing files that may compromise data security regimes or present any reputational risks.
In addition, participants with solutions on offer are encouraged to bring:
- new working file signatures which need testing or refinement
- new implementations of characterisation tools
- tools that identify the extent to which a file conforms to a given standard
- tools that make it possible to render non-conformant files in mainstream browsers or applications
- information about physical characteristics of file formats
- information about the behaviour of files under known conditions
Outline Programme
1000 Registration and Coffee
1030 Welcome and introductions - William Kilbride, DPC
1040 The nature of the problem - Chris Rusbridge
1100 Recent Developments with PRONOM and DROID - David Clipsham, National Archives
1115 CRISP - Maureen Pennock, British Library
1130 Crowd-solving the file format problem - Paul Wheatley, Leeds University Library
1145 Discussion
1200 First parallel session
- Contributing to Collaborative Initiatives (Paul Wheatley and Maureen Pennock)
- Developing and sharing file signatures for PRONOM / DROID (David Clipsham)
1300 Lunch
1345 Second parallel session
- Deploying Tools for Characterisation (Carl Wilson)
- Developing and sharing file signatures for PRONOM / DROID, part II (David Clipsham)
1500 Tea and Coffee
1530 Panel session and discussion: who does what and why
1645 Wrap-up and thanks - William Kilbride, DPC