Neil Jefferies is Research and Development Project Manager for the University of Oxford's Bodleian Libraries Research & Learning Services.
I feel that there is an increasing disconnect between the digital artifacts that we capture and the mechanisms used to create the knowledge that they embody. This contributes to some of the difficulties with preserving these born-digital materials in a way that effectively retains their meaning.
The seed for this train of thought has been my involvement in the Cultures of Knowledge project and the accompanying online resource Early Modern Letters Online (EMLO). The aim of the project is to use digital methods to reassemble and interpret the correspondence networks of the early modern period (roughly 1550-1750). This period is interesting in that it saw the emergence of a significant social network across Europe and the associated Empires, enabled by the development of postal services and increased population mobility. This resulted in an explosion of intellectual activity that laid the foundations for the Enlightenment and established patterns for scientific discourse that persist until the present day - for example, the foundation of The Royal Society and the publication of the first scientific journals.
While working on this project, two things struck me that seem pertinent to Digital Preservation:
Firstly, EMLO started life as a digitised card catalogue of letters, but its subsequent development was in the hands of scholars rather than librarians so that they could shape it to their needs. What developed (and is still developing) was a much richer resource, including people, places, organisations and events with annotations, commentaries and links to other resources. It proved so useful that it has assimilated over a hundred smaller catalogues from across Europe. What is interesting is that all these additional entities are composed almost entirely of metadata, and they also begin to form an overarching framework of historical narrative around the letters. This rich metadata framework serves both to aid discovery and to contextualise, and thus inform the meaning of, the discovered material.
Secondly, there is a distinct similarity between the literary forms of journal entries (in the sense of personal diaries), letters (the primary mode of formal discourse), and published papers and pamphlets. Indeed, this epistolary (letter) form was even used for some novels at the time, and collections of scientific papers came to be called journals. In many cases, there was an expectation that all these materials would be read by others, so some care was taken in their construction, and letter writers would make and keep copies of their correspondence. To a large extent, this state of affairs persisted well into the 20th century.
When we consider born-digital materials, however, the systems used for discourse and the construction of knowledge are increasingly divergent from published, and thus archived, forms. Discourse has become:
- Multi-modal - taking place over email and any number of fora and discussion platforms. Reconstructing a train of discussion without prior knowledge of these channels can be a significant challenge.
- Ephemeral - frequently there is little permanence to the content of many of these channels. Automatic deletion of material to keep costs down is standard practice.
- Dispersed - the internet provides ready access to background context by maintaining discussion threads, easy communication with creators, and other linked resources. The meaning of an individual digital artefact is often intimately dependent on this near-omnipresent penumbra of supporting information.
Consequently, the construction of published papers, reports etc. has become a distinct and laborious activity rather than a continuation of the discursive process. However, in most cases, they will still be read and evaluated by those with access to an external contextual framework. There is, therefore, little immediate benefit to putting effort into ensuring that documents contain the necessary contextual information to be meaningful as standalone entities in the long term. The crisis of reproducibility in some scientific disciplines is, to some extent, symptomatic of this lack of completeness in individual artefacts.
To summarise…
- A contextual framework is essential for discovering and giving meaning to the artefacts of discourse (digital or otherwise) that survive.
- Historians (and digital forensic specialists) spend a lot of time reconstructing these frameworks from surviving artefacts.
- The nature of, and disconnect between, the discursive and published domains in the digital world potentially make this even more difficult in the future since the artefacts contain less intrinsic contextual information.
There does not seem to be a single, clear-cut solution to this issue, but I would be very interested to discuss this further. Several approaches spring to mind…
- Accepting that we are unlikely to be able to influence the construction of these digital objects significantly, can we capture better contextual metadata?
- Can we change what is considered an acceptable form for archiving and preservation, aiming to capture one that is closer to the actual mechanisms of discourse (whatever that may be)?
- In any case, because the contextual framework is dispersed, any effort is likely to involve multiple organisations and technologies. Siloed, centralised approaches to preservation and archiving would need to be evaluated.
- Human behaviour can be hard to change, so machine approaches to context extraction or metadata generation would seem to be obvious avenues to investigate.