No guide to computational access would be complete without discussion of some of the ethical issues which should be considered when applying these techniques. This section provides a summary of some of the key considerations and signposts further sources of information.

The digital preservation community has inherited and further developed a sophisticated understanding of the ethics surrounding the provision of access to heritage collections, and that understanding is having to become more sophisticated still as access becomes digital. Some of the challenges around the ethics of access, with a particular focus on approaches and tools that have been applied at the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS), are summarized in ‘Exploring ethical considerations for providing access to digital heritage collections’.

The provision of computational access will require a similar evolution in understanding for several reasons. Firstly, computational access, as defined in this guidance, generally involves the use of algorithms or other computational methods, some of which are not commonly understood or easily explainable. This makes it harder to maintain transparency and accountability in their use, and hence trust in both the methods and the conclusions and outcomes that arise from them. Arguably, we should for this reason be circumspect in adopting such techniques for our own (digital preservation) purposes.

Secondly, a common factor in the use of algorithms and other computational methods is the desire to work at scale: to process larger amounts of data at one time than was previously possible using more traditional methods, and to combine disparate datasets to create a clearer picture. The bigger the scale, the bigger the potential for harm and unexpected outcomes, particularly when the data being worked on are about people. This potential for harm is heightened still further by the fact that the regulatory environment for the use of algorithms and computational methods (in all contexts, not just for the provision of computational access to digital heritage collections) is itself at an early stage of development, with discussion ongoing and no widely held consensus on the topic. Some of these questions are discussed in ‘Algorithmic accountability for the public sector’.

In large part, then, there is an element of ‘watch this space’. However, a first step could be to enhance our documentation of any ‘sets’ of material or data to which computational access is anticipated or offered. Within the AI and data science communities, increased attention is being paid to dataset documentation; see, for example, Datasheets for Datasets – Microsoft Research or the work by Eun Seo Jo and Timnit Gebru from the machine learning community. While some of the questions asked as prompts to documentation may seem odd to those used to describing digital assets for non-computational use, they highlight the information that users need in order to use the material safely. Similarly, projects such as The Data Nutrition Project and Data Hazards flag up some of the already known dangers for those wishing to analyze data using computational means, and a blog from the Archives Hub describes some of the challenges of understanding the effectiveness of, and bias within, the tools.
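As an illustration of what enhanced documentation might look like in practice, the sketch below records answers against the seven question groups used in Datasheets for Datasets (motivation, composition, collection process, preprocessing, uses, distribution and maintenance) and reports which groups are still undocumented. It is a minimal sketch rather than an established tool: the `Datasheet` class, its field names and the example collection are hypothetical, and `preprocessing` is shorthand for the group covering preprocessing, cleaning and labeling.

```python
from dataclasses import dataclass, field

# Question groups as we understand them from "Datasheets for Datasets".
# The identifiers themselves are our own shorthand, not an official schema.
DATASHEET_SECTIONS = (
    "motivation",
    "composition",
    "collection_process",
    "preprocessing",
    "uses",
    "distribution",
    "maintenance",
)


@dataclass
class Datasheet:
    """Hypothetical container for the documentation of one 'set' of material."""

    title: str
    answers: dict = field(default_factory=dict)

    def missing_sections(self):
        """Return the question groups that have no (non-blank) answer yet."""
        return [
            section
            for section in DATASHEET_SECTIONS
            if not self.answers.get(section, "").strip()
        ]


# Hypothetical example: a collection documented only in part.
sheet = Datasheet(
    title="Example digitized correspondence set (hypothetical)",
    answers={
        "motivation": "Assembled to support research on 20th-century letters.",
        "composition": "Scanned letters with OCR text; includes personal names.",
    },
)

print(sheet.missing_sections())
```

A check of this kind could be run before a set is offered for computational access, so that gaps in the documentation are visible to practitioners and users alike.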

When providing access to collections, improving the documentation we offer, so that users can work with the material more safely, would seem to be the least we, as practitioners, can do. It would provide users with a context for the material and help them to make an informed and ethically responsible choice as to how to use it. Although what users ultimately do with this material is their responsibility, there is perhaps still a debate to be had about whether, in light of the increased potential for harm and the nascent regulatory environment, it is our responsibility to police the uses to which our collections are put even more stringently than in the past. Should we prohibit certain types of use? And if so, how do we design systems and procedures to prevent them? A good way to get started is to provide terms of use, discussed in the approaches section of this guide.

