David Underdown and Leontien Talboom work at The National Archives UK


Designated Communities, Representation Information and Knowledge Bases in a Wiki World 

Thanks to Andy Jackson of the British Library's web archiving team for partial inspiration for the title of this post (the rest of the conversation is also largely relevant to it).

Introduction

Most of us in the digital preservation field are familiar with the Open Archival Information System (OAIS) model. After nearly two decades this model has become a backbone for many of our architectures, certification and protocols. Terms such as AIP, SIP and DIP are in common use and the first sighting of the OAIS diagram at a conference is frequently remarked on. The model has given us a common language to communicate our digital preservation needs with. But how many of us have actually read and engaged with the model further than the most common terms from it? We would like to admit that the first time we read the guidelines of the OAIS model was at the start of this year, even though we have been using terms from the model throughout both our digital archiving careers.

Here at The National Archives we have (like many archives) used the OAIS reference model to guide our thinking on digital preservation. This blog post will not focus on the overall OAIS model, but on the concept of the Designated Community. Determining the Designated Community is one of the responsibilities that an OAIS archive should undertake, and the model defines the term as follows:

‘An identified group of potential Consumers who should be able to understand a particular set of information. The Designated Community may be composed of multiple user communities. A Designated Community is defined by the Archive and this definition may change over time.’ (CCSDS 2012)

This all sounds easy enough: an archive needs a group of users, a Designated Community, for whom it archives material and makes it accessible. However, the model does not give any further information on how an archive should determine the appropriate Designated Community. In her 2016 article 'The power of imaginary users: Designated communities in the OAIS reference model', Rhiannon Bettivia concludes, after a number of interviews with OAIS authors, that the ‘...Designated Community must be someone, whether real or not, and it cannot be everybody’. Defining it as everybody would imply collecting an unfeasible amount of Representation Information, or placing an unreasonable expectation on the Knowledge Base required of such a broad community in order for its members to independently understand the preserved digital information.

However, the Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC) appears to contradict that conclusion:

‘A repository may have a single, generalized “designated community” (e.g., every citizen of a country), while other repositories may have several, distinct designated communities with highly specialized needs, each requiring different functionality or support from the repository.’

Either way, neither of these two definitions aids an archive in determining its Designated Community. This is not as much of a problem for archives serving a small specialized user group, but it is a problem for institutions serving several user groups or even the general public. The National Archives is a good example of this, having a statutory duty to serve the general public (and having, at the heart of our new strategy, the desire to be an Inclusive Archive). But how do we square this with the implied need in OAIS to collect large amounts of Representation Information to ensure users can then independently understand the files we make available? Even smaller repositories that are opening up their archives through the World Wide Web are starting to serve a wider audience. How does this fit with the Designated Community expected by the OAIS model? David Giaretta has suggested that we should not worry about this: he states that if an archive is unable to define its Designated Community in a way that will make the data accessible to the right users, depositors will not entrust their digital objects to the archive and will go elsewhere. But this assumes an open market where depositors can choose an archive. It does not work for institutions such as The National Archives, where government bodies are obliged to deposit their data with us and there is no going elsewhere.

And then we have processable digital data, such as 3D models, GIS maps and video games, where it can be unclear what the significant properties of the files are, and where those properties could differ for user groups with different Knowledge Bases and skills (see Bettivia). For example, in the case of video games, some users will be interested in the code, while others are more interested in the game play, or even the wider game experience (as explored by Jerome McDonough and James Newman), making it very unclear which properties should be preserved for the users of the archive.

In this blog post we review the current methods used for determining Designated Communities and look at published examples of Designated Community definitions. We then open a discussion of the Designated Community concept and suggest a number of possible ways forward.

Defining the Designated Community

As discussed above, whilst the OAIS model and TRAC give a definition for Designated Community, they do not help an archive in determining its own. For most archives, defining a Designated Community does not become a reality until they apply for certification or accreditation. Many of the certification requirements, including CoreTrustSeal and Trusted Digital Repository, relate back to the Designated Community and emphasise the importance of having one. However, as Edward Corrado has emphasised, they do not offer any help in defining it either.

This does not mean that an archive without certification is unable to serve its Designated Community. The OAIS model introduced the term to make archives aware of who they are making material available for. However, the model is conceptual in nature and lacks a practical implementation. This disconnect between the conceptual model and practice forces national institutions to artificially exclude certain users from their Designated Community, as they cannot state ‘everybody’.

Over the last decade, however, several approaches have been proposed to aid in the process of determining and defining appropriate Designated Communities. Tarvo Kärberg composes a Designated Community using automated user observations in archive user profiling, and Anita Locher uses the Delphi expert judgment elicitation method to propose user groups in her chapter 'Characterizing Potential User Groups for Versioned Geodata' in Service-Oriented Mapping: Changing Paradigm in Map Production and Geoinformation Management. However, there is no evidence of any archive taking one of these approaches in practice.

Despite this lack of guidance, many institutions have published their defined Designated Community. As of June 2019, two archives have been formally accredited under ISO 16363, six under TRAC, and 93 hold the CoreTrustSeal or the earlier Data Seal of Approval (CoreTrustSeal 2019). Five other published definitions from institutions have been found (Canada's York University; Indiana University; McMaster University; NASA Space Science Data Coordinated Archive; and Academic Preservation Trust). In addition, the US Council of State Archivists offers advice on formulating a Designated Community, and steps to ensure an archive is working with it appropriately.

Illustrative world map showing the distribution of repositories holding the CoreTrustSeal and Data Seal of Approval certifications

Few, even among those assessed for ISO 16363, make a formal link between the defined Designated Community and the expected Knowledge Base and required Representation Information. Many definitions of Designated Community are highly similar to each other. This is sometimes explicitly acknowledged, as when York University cites the ScholarsPortal definition of Designated Community as the inspiration for its own. In other cases it probably arises because several institutions are part of networks such as CLARIN and therefore serve very similar groups of users, just based at different institutions or in different countries.

This suggests that the concept of a Designated Community remains one of the least developed aspects of the OAIS model, with many archives being unsure how to articulate their Designated Community clearly. It seems that archives are currently making a minimal attempt at the definition in order to satisfy the requirements of certification or accreditation schemes and may not recognise the value of defining the Designated Community as part of a wider user engagement strategy. 

The most robust descriptions of Designated Community are those which explicitly describe the Representation Information being maintained by the archive, or the assumed Knowledge Base of the Designated Community. It is noticeable that there are only three such definitions, and none are for institutions that regard themselves as having a wide public remit. In fact, all specifically disclaim any real support for general public access. The first of these is CLOCKSS, which states:

‘No Consumers "interact with [CLOCKSS] services to find preserved information of interest and to access that information in detail."’

They go on to say that all the material they hold is designed to be accessed via a web browser, and that this is sufficient Representation Information due to the nature of the World Wide Web and the fact that it is largely backwards compatible by design.

The next institution is the NASA Space Science Data Coordinated Archive (NSSDCA), which says: 

‘Many NSSDCA holdings are also available to the general public. In general, the [Representation Information] associated with the [...] AIP is designed to make the data usable by a scientist with the appropriate college-level education or equivalent. [...], the general public is not considered to be part of the Designated Community for these data. NSSDCA staff is not generally available to support use of the data by individuals without the appropriate background.’

Finally, the United States Government Publishing Office says:

‘The Designated Community for the system includes staff in Federal depository libraries, the United States Senate, the House of Representatives, the Administrative Office of the United States Courts, and the Office of the Federal Register. Members of the Designated Community are familiar with the organizations, documents, publications, and processes of the legislative, executive, and judicial branches of the United States Federal Government. The Designated Community is able to access content information from the system and render it electronically.’

Discussion

So, following the OAIS model leads us to conclude that we cannot meaningfully define ‘everybody’ as our Designated Community. There is an added complexity on top of this with the emergence of born-digital processable data in the archives, which can be represented in several different ways, according to the needs and interests of the user. The OAIS model and certifications based on this model ask archives to determine their Designated Community, but do not give any further guidance on this.

This lack of guidance appears to have led to something of a “tickbox culture” around the OAIS concept of the Designated Community. Archives only define theirs when forced to by accreditation processes, and do not really engage with the implication within the reference model that the choice of Designated Communities has consequences (and trade-offs) for the Representation Information that the archive is collecting, or the Knowledge Base it is assuming of its researchers. Nor does it seem that the accreditation processes are pushing archives on this aspect. However, is the OAIS model being too hard on archives here? We do not expect every visitor to be able to “independently understand” the content of paper and parchment records. We know that they may have to develop language and palaeography skills, plus an understanding of the wider historical context of a record, and specific domain knowledge to interpret terms used within a given record. Even looking beyond these published examples, the wider archiving community is struggling to make data accessible to its growing audiences. But how should we be tackling this problem, especially as the users of archives are changing and their expectations are growing? They expect interfaces that are easy to navigate, and searches that return relevant results in seconds. Also, as noted by Martien De Vletter, users are asking more quantitative questions, which can encompass whole collections.

The National Archives has acknowledged this change in the behaviour of users, and our Digital Strategy now distinguishes between ‘readers’, users who view individual records in a more traditional manner (even if interacting with digital objects), and ‘data users’, who want to work with records as data for computational analysis, enabling broader work with records (we also know that we need to work on our data model to enable this type of access). The National Archives has taken the first step in making data available for the ‘data users’ by providing an Application Programming Interface (API) to access the metadata of the records held by The National Archives. This does not currently give the ‘data users’ access to the actual digital objects, but it is a start in recognising different uses of the archives and catalogue (David has given a worked example of API use, building on a research question initially addressed by colleagues through more traditional means).

The National Archives is not the only institution to have recognised a change in user behaviour. The Netherlands Institute for Sound and Vision (Beeld en Geluid) has also acknowledged that its users not only want different metadata, but also different environments, as they work with data in different ways. York University has also noted the presence of data users.

Determining a Designated Community by looking at the different usage of the archives instead of the user groups could open up a better way of linking the Designated Community to a Knowledge Base and the right Representation Information. However, there is another group of users, the ‘digitally curious’. This group would fall in between the ‘readers’ and ‘data users’. As described by David Nicholas and David Clark in their chapter "Finding Stuff" in "Is Digital Different?", these users are aware of the possibilities that computational techniques have to offer, such as the metadata API that The National Archives hosts, but are not able to execute these tasks themselves. Most of these users also have limited retrieval skills for digital data: many are untutored or relative novices in online search (Nicholas and Clark 2015), and do not have the skills to perform more complicated computational tasks or easy access to a computing infrastructure that would assist them.
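To make the link between usage, Knowledge Base and Representation Information a little more concrete, here is a minimal sketch in Python. It is purely illustrative and not anything The National Archives runs: the class, the function and every topic name (for example "using an API" or "data model") are hypothetical assumptions chosen for the example. The idea is simply that, for each kind of usage, whatever a user needs to understand but cannot be assumed to know already is what the archive would have to supply or point to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageProfile:
    """A way of using the archive, rather than a named user group."""
    name: str
    knowledge_base: frozenset  # what we assume these users already know
    needs: frozenset           # what this kind of use requires them to understand

    def gap(self) -> frozenset:
        """Topics the archive would need to cover for this usage."""
        return self.needs - self.knowledge_base


def gaps_by_topic(profiles):
    """Map each missing topic to the usage profiles that are missing it."""
    topics = {}
    for profile in profiles:
        for topic in sorted(profile.gap()):
            topics.setdefault(topic, []).append(profile.name)
    return topics


# Hypothetical profiles and topics, invented for illustration only.
PROFILES = [
    UsageProfile(
        name="readers",
        knowledge_base=frozenset({"web browser"}),
        needs=frozenset({"web browser", "historical context"}),
    ),
    UsageProfile(
        name="data users",
        knowledge_base=frozenset({"web browser", "programming", "using an API"}),
        needs=frozenset({"web browser", "programming", "using an API", "data model"}),
    ),
    UsageProfile(
        name="digitally curious",
        knowledge_base=frozenset({"web browser"}),
        needs=frozenset({"web browser", "using an API", "data model"}),
    ),
]

if __name__ == "__main__":
    for topic, names in sorted(gaps_by_topic(PROFILES).items()):
        print(f"{topic}: {', '.join(names)}")
```

Grouping the gaps by topic rather than by user shows which pieces of documentation or support would serve more than one kind of usage at once, which is where effort is likely to go furthest.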

But how should we serve these three different types of users? We believe that it may be beneficial to put an infrastructure in place where archives work together on a shared Knowledge Base, saving each of us from creating a distinct Knowledge Base ourselves. An example of such a project is CLARIAH (and particularly the MediaSuite), which provides a single platform for browsing audio-visual content from different Dutch institutions. This platform also provides notebooks and possibilities to compute over the data, along with guidelines on how this should be done. The EaaSI project is another good example of building shared infrastructure. It also makes use of existing infrastructure outside archives to help with the question of Representation Information and Knowledge Bases, building on existing data in Wikidata to help provide suitable emulated environments for rendering older file formats. But who should be educating the digitally curious? Is it down to us as the digital archives, or is it something that we should start expecting from our users?

Screenshot from the CLARIAH MediaSuite to demonstrate possible shared infrastructure approaches

We do not suggest any solution to the problem of serving ‘everybody’ when not being able to create an indefinite number of distinct Knowledge Bases and sets of Representation Information. But we hope that, by approaching it in a slightly different way, we will be able to serve a wider and more inclusive audience. We also hope that archivists will challenge the term and the overall OAIS model more in the future. The model was created to aid us in digital preservation, but that does not mean that we have to take everything in it literally, word for word.

Comments   

#1 David Underdown 2019-07-08 16:05
On Twitter, the Software Preservation Network has pointed out (https://twitter.com/SoftPresNetwork/status/1148259324571070464) the personas being developed as part of the Collections as Data project https://collectionsasdata.github.io/personas/
#2 Jenny Mitcham 2019-07-11 12:57
A great and thought-provoking articulation of the problem associated with the OAIS term 'designated community'. Thanks for sharing your thoughts both! I used to like the idea of 'designated community' ...but that was when I worked at the Archaeology Data Service where the community was relatively easy to define. Since then I have had many misgivings about how useful the concept is, particularly when serving a broad community. The fact that certification/accreditation is often built on this concept further muddies the waters in my mind.

However, I do believe that considering users and potential use cases should be a key part of what we do - particularly when carrying out preservation planning activities or enabling access to a particular collection of digital objects. This to me feels much more practical and useful than defining a broad designated community to cover an extensive and varied archival collection.
#3 David Underdown 2019-07-12 15:41
Thanks Jenny,

The emphasis on Agile development techniques at The National Archives means we're getting very used to thinking in terms of user needs.

It seemed to us that the accreditation processes are in practice largely ignoring the Designated Community aspect, and certainly the linkage that the OAIS model implies with the Representation Information that an archive is acquiring to support the records themselves. Also I think the main stumbling block is actually the idea of making digital objects "independently understandable" and including an actual understanding of what the digital object is actually communicating in addition to being able to find appropriate software to render it. We don't really expect readers of analogue materials to necessarily come with (for example) fully developed paleography skills and knowledge of probate procedures in the 19th century.