Evanthia Samaras

Last updated on 20 April 2021

Evanthia Samaras is the VERS Senior Officer at Public Record Office Victoria. She attended IDCC 2021 with support from the DPC’s Career Development Fund, which is funded by DPC Supporters.


Recently my workplace, Public Record Office Victoria (PROV), joined the DPC. This partnership made me eligible to apply for one of the DPC’s Career Development Fund grants to attend the 16th International Digital Curation Conference (IDCC21) on Monday 19 April 2021. Without this generous grant I would not have been able to attend the conference, and I would like to extend my gratitude to the DPC for providing me with this valuable learning opportunity. In this blog post, I share some of my learnings from IDCC21, focusing on the digital preservation content from the online conference.

Why attend the conference? 

I have worked as a government archivist in Melbourne, Australia since 2014. In my current role at PROV, I am responsible for supporting the Victorian Electronic Records Strategy (VERS), which is about ensuring the creation, capture and preservation of authentic, complete and meaningful digital records by the Victorian public sector.

Most of my digital archiving work to date with Victorian public offices has been with common digital record formats such as PDF and MS Office files. However, the Victorian Government also creates and shares datasets to deliver policies and services for Victorian citizens. To be able to provide advice on the preservation of Victorian Government datasets, I feel it is prudent to develop an understanding of data curation.

To date I have had little exposure to this field beyond my own PhD research, which was a qualitative study. The theme for IDCC21 was ‘Data quality and data limitations: working towards equality through data curation’, so attending offered an excellent opportunity to build my knowledge of data curation, including its concepts, frameworks, tools, challenges, best practices and more!

The digital preservation session

With the time zone difference, the conference started in the evening for me in Australia, requiring me to stay up rather late (and drink too much coffee!). Conveniently, following the fantastic keynote from Te Taka Keegan on Māori data sovereignty, the first session I attended, a set of three lightning talks, was about digital preservation!


Digipres talk 1: File identification tools for curating research data formats: not just for preservation anymore! 

First up was Rebecca Dickson from the Council of Prairie and Pacific University Libraries (COPPUL). Rebecca’s talk was about a project currently underway in Canada to assess file format identification tools for data curation. To undertake the project, the team first needed to create a test dataset containing a variety of data types to run the tools on. Preliminary findings showed that the DROID and Siegfried tools were very good at identifying common file types such as Adobe PDF and MS Office files, but struggled with tabular data and code. In addition, many data files use a .dat extension, which the tools often failed to identify. Rebecca also noted that these data files are not a well-documented file format and have no PRONOM record to match against. In response, during the Q&A a delegate suggested holding a sprint event to add research data formats to PRONOM – a great idea!


Digipres talk 2: Forgotten FE 

Next was Paul Stokes from Jisc, who presented on the state of data in the Further Education (FE) sector. He said that FE institutions are often overlooked when it comes to data management and preservation. Paul described the FE data system landscape as a “can of worms” for various reasons: for instance, it evolved organically without clear planning and is rife with mergers and acquisitions. Paul then explained that FE institutions need to keep a lot of data, often for long periods of time. However, retention periods are not widely understood, and the reasons for retention are not always valid or explained. As a result, data is often kept “just in case”. Overall, the FE landscape has numerous pain points, including:

  • Multiple disconnected systems, many of which are becoming obsolete

  • Little or no back-up or export facilities (some systems remain online indefinitely)

  • Exponentially increasing data storage requirements

  • Retention obligations that are difficult to meet

  • Lack of available FE data standards to provide guidance.

Digipres talk 3: Re-use, the ultimate test for sustainability 

The final lightning talk in the digital preservation session was from Barbara Sierman of Digitalpreservation.nl. Barbara’s talk argued that digital preservation should examine user satisfaction, because this is the ultimate test of the sustainability of data. Barbara began by outlining the importance of open science in supporting scientific integrity, transparency, open access publishing, and the sharing and re-use of data. She then introduced the FAIR principles and explained how FAIR data practices have been fostered in Europe through the FAIRsFAIR project and its associated tools: F-UJI (an automated data assessment tool) and FAIR-Aware (an online tool that helps researchers and data managers assess their datasets against FAIR before uploading them to a repository). Next, she introduced the TRUST principles, which originated from the Reference Model for an Open Archival Information System (OAIS). Barbara noted that while we have FAIR and TRUST, the users are missing and the concept of re-use is out of scope.


Figure 1: Re-use is missing from the digital preservation of data (adapted from Sierman’s presentation slide)

Barbara argued that digital preservation needs to pay attention to both current and future users, and that we should collect more than just figures on data re-use. In particular, we should examine user satisfaction, as this is the ultimate test of the sustainability of the data. Digital preservation processes can then be steered by users’ experiences of both successful and unsuccessful re-use of data.

Reflections on the conference

Overall, I found the conference experience overwhelming (so many terms to learn!) but eye-opening. It was interesting to learn about the challenges of digitally preserving data, and about the wide range of projects, tools, workflows and studies underway around the world to strengthen data curation practices and outcomes. While I still have much to learn, attending IDCC21 was a fantastic starting point for developing my knowledge and skills in data curation. Thanks again to the DPC for making this happen!

