Digital preservation for research datasets

Antonio Guillermo Martínez Largo

Last updated on 7 June 2021

Antonio Guillermo Martínez is the CEO and founder of LIBNOVA and is based in Madrid, Spain.

This blog post is also available in Spanish below.

Last year, in our guest blog post for the DPC, we wrote about “Augmenting the community, lowering the risk internationally” and noted that individual digital preservation problems can often be solved by looking at the experience of the community. This year the theme for World Digital Preservation Day is “Digits: For Good”, and we want to focus on the digital preservation of research datasets.

Let's look back. LIBNOVA's promise from the beginning has been to provide the community with the most advanced digital preservation platform, and we are achieving it step by step.

A few years ago, we created LIBNOVA RESEARCH LABS to coordinate the company's lines of research in technological innovation. At the same time, we have been doing market research to understand the needs of different sectors and the differences between them (e.g., cultural heritage vs research). Finally, last year, the confluence of these two paths led to the development and launch of a ground-breaking research data management and preservation tool.

But what have we learned along the way?

Research data challenges

During our market study, we gathered feedback from more than 50 research organizations. These are the most widespread reasons why they do not properly preserve research data:

There is no unified view of research data, because it resides on many dispersed platforms during its lifecycle, each chosen for its functionality, protocols and features.
Digital preservation is addressed (if at all) at the end of the project, when the next project is on everybody’s mind, resources are scarce, and the data is dispersed over a myriad of platforms.
In many projects the data structures and software are cutting edge, and no two projects are the same, so there is no effective way to standardize formats and data structures.
Often the code and the data are not kept together, so representation information is lost.

Our own challenges as researchers

As a research organization we also have our own concerns. The main ones are the following:

We need to be confident about how research data is managed and protected for the whole data lifecycle.
We need to provide the best available tools to our researchers, carefully balancing resources across research projects, while asking: how much is this going to cost?
We are concerned about data volumes and platform scalability.

Thoughts and reflections on digital preservation for research datasets

These are the main insights we have reached in these early years of research and feedback on digital preservation for research datasets:

If we focus on archiving only at the end of the project, most of the information is already lost:

Preservation should start BEFORE the project starts, by providing a platform that researchers can use during the whole project lifecycle as the “only” place to keep things.
Researchers work together (even from different institutions), so they need a place to share content.
All of the above must be done while also taking care of the “integrity chain”.
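
As a minimal sketch of what taking care of an integrity chain could look like in practice, the Python below records a SHA-256 checksum for each deposited file and links each record to the previous one, so that a later change to a file or to the log itself can be detected. The function names (`append_record`, `verify_chain`) and the record layout are assumptions made for illustration only; they do not describe LIBNOVA's platform or any particular standard.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def append_record(chain: list, name: str, content: bytes) -> dict:
    """Append a fixity record for one file, linked to the previous record."""
    previous = chain[-1]["record_hash"] if chain else ""
    record = {"name": name, "file_hash": sha256_hex(content), "previous": previous}
    # Hash the record itself (hypothetical layout) so the log is tamper-evident.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["record_hash"] = sha256_hex(payload)
    chain.append(record)
    return record


def verify_chain(chain: list, files: dict) -> bool:
    """Check every file against its recorded hash and every link in the chain."""
    previous = ""
    for record in chain:
        body = {key: record[key] for key in ("name", "file_hash", "previous")}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        if record["previous"] != previous:
            return False  # a record was removed or reordered
        if record["record_hash"] != sha256_hex(payload):
            return False  # the record itself was edited
        if record["name"] not in files:
            return False  # the file is missing
        if sha256_hex(files[record["name"]]) != record["file_hash"]:
            return False  # the file content changed
        previous = record["record_hash"]
    return True


if __name__ == "__main__":
    # In-memory stand-ins for files deposited during a project.
    files = {"results.csv": b"a,b\n1,2\n", "analysis.py": b"print('hello')\n"}
    chain = []
    for name, content in files.items():
        append_record(chain, name, content)
    print(verify_chain(chain, files))  # True while nothing has changed
    files["results.csv"] = b"a,b\n1,3\n"
    print(verify_chain(chain, files))  # False after the data is altered
```

Recording checksums from the first deposit, rather than at the end of the project, is what allows the chain to cover the whole lifecycle.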

For research data, the code is usually the data’s representation information:

It is important to preserve the code together with the data, because reproducibility is usually needed in the long term.
ISO 16363 and OAIS alignment is crucial.

Create easy ways to provide metadata, including the possibility of creating the Representation Information Network for the content:

The approach must be flexible enough to accommodate different disciplines.
But it must also be standards-based, to improve accessibility.
How much metadata? At least consider the FAIR and TRUST principles.
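
To make the idea of an easy way to provide metadata more concrete, here is a small Python sketch of a metadata record that also points to the code needed to interpret the data, together with a helper that reports which fields are still empty. The class name, the field names and the choice of fields are all assumptions for illustration; they are not taken from the FAIR or TRUST principles, from OAIS, or from any LIBNOVA product.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata record for one research dataset."""
    identifier: str = ""  # e.g. a persistent identifier
    title: str = ""
    creators: list = field(default_factory=list)
    license: str = ""
    data_format: str = ""
    # Representation information: the code needed to interpret the data.
    interpreted_by: list = field(default_factory=list)


def missing_fields(record: DatasetRecord) -> list:
    """Return the names of the fields that are still empty."""
    return [name for name, value in asdict(record).items() if not value]


if __name__ == "__main__":
    # Placeholder values only; none of them refer to a real dataset.
    record = DatasetRecord(identifier="example-id-001", title="Sample survey data")
    print(missing_fields(record))
```

A check like this could run each time a researcher deposits content, so that gaps in the metadata surface during the project rather than at the end.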

When thinking about the digital preservation of research data, we have to remember that we are not only preserving digits: this data may be the key to much future research. That is why we have to take all the necessary precautions throughout the process.

La preservación digital de los datos de investigación (research datasets)

El año pasado, en nuestro post como invitados en el blog de la DPC, escribimos sobre “Aumentar la comunidad para disminuir el riesgo, de forma internacional” (se puede leer en inglés aquí: Augmenting the community, lowering the risk internationally) y comentamos que muchas veces la solución a los problemas individuales relacionados con la preservación digital se encuentra mirando a la comunidad, y buscando si otra institución ya ha pasado por lo mismo. Este año, el tema central del Día Mundial de la Preservación Digital es “Digits: For Good” (Datos: Para siempre) y nosotros hemos querido centrarnos en la preservación digital de los datos de investigación.

Pero echemos la vista atrás, la promesa de LIBNOVA siempre ha sido proporcionar a la comunidad la plataforma de preservación digital más avanzada. Y lo estamos consiguiendo paso a paso.

Hace unos años, creamos LIBNOVA RESEARCH LABS, el departamento desde el que se coordinan las líneas de investigación a seguir en innovación tecnológica dentro de la compañía. Al mismo tiempo, hemos estado haciendo un análisis de mercado para entender las necesidades de los distintos sectores y las peculiaridades de cada uno (por ejemplo, entre patrimonio cultural y departamentos de investigación). Finalmente, la confluencia de estas dos vías de trabajo ha dado como resultado el desarrollo y lanzamiento de una herramienta de gestión y preservación de datos de investigación innovadora y vanguardista que presentamos el año pasado.

Pero, ¿qué hemos aprendido por el camino?

Los desafíos de los datos de investigación

Durante nuestro estudio de mercado, hemos intercambiado comentarios e información con más de 50 organizaciones de investigación. Las razones más extendidas entre las instituciones por las que no están preservando adecuadamente sus datos de investigación son las siguientes:

Falta una visión unificada de los datos de investigación, ya que están dispersos en múltiples plataformas a lo largo de su ciclo de vida, atendiendo a su funcionalidad, los protocolos o procedimientos a seguir y sus principales características.

La preservación digital se aborda (si se aborda) al final del proyecto, cuando todos están pensando ya en el siguiente proyecto, los recursos son escasos, y los datos están dispersos en infinidad de plataformas.
En muchos proyectos tanto las estructuras de datos como el software empleado son de vanguardia (se están inventando) y además ningún proyecto de investigación es igual a otro, por lo tanto no hay una manera eficaz de normalizar o estandarizar los formatos y las estructuras de datos.
A menudo, el código y los datos no están juntos, por lo que se pierde información relevante.

Nuestros propios desafíos como investigadores

Como organización de investigación, también tenemos nuestras propias preocupaciones, que se podrían resumir en los siguientes puntos:

Necesitamos tener confianza en cómo se gestionan y se protegen los datos de investigación durante todo el ciclo de vida de los datos.
Necesitamos proporcionar las mejores herramientas disponibles a nuestros investigadores, equilibrando cuidadosamente los recursos disponibles en el proyecto de investigación, preguntándonos: ¿cuánto va a costar esto?
Nos preocupa también la gestión de grandes volúmenes de datos y la escalabilidad de la plataforma.

Pensamientos y reflexiones sobre la preservación digital de datos de investigación

Estas son las principales conclusiones a las que hemos llegado en estos primeros años de investigación y retroalimentación sobre la preservación digital para los conjuntos de datos de investigación:

Si nos preocupamos por el trabajo de archivo o de archivado al final del proyecto, la mayor parte de la información ya se habrá perdido:

Esta labor debería comenzar ANTES de comenzar el proyecto, proporcionando una plataforma que los investigadores utilicen durante todo el ciclo de vida del proyecto como el “único” sitio para guardar los datos.
Los investigadores trabajan juntos (incluso desde diferentes instituciones), así que necesitan un sitio para compartir el contenido.
Además de lo anterior, también tenemos que tener en cuenta y cuidar la “cadena de integridad”.

En el caso de los datos de investigación, el código normalmente es información representativa de los datos:

Por eso es importante preservar el código junto con los datos, ya que la reproducibilidad suele ser necesaria a largo plazo.
Es crucial estar alineado con el Modelo OAIS y la ISO 16363.

Es necesario crear formas sencillas de proporcionar metadatos, incluyendo la posibilidad de crear la Red de Información de Representación (Representation Information Network) para el contenido:

Ha de ser flexible para dar cabida a diferentes disciplinas.
Basado en estándares para mejorar la accesibilidad.
Y ¿cuántos metadatos? Se han de considerar al menos los principios FAIR y TRUST.

Cuando pensamos en la preservación digital de datos de investigación tenemos que pensar que no solo estamos preservando dígitos, sino que estos datos pueden ser la clave de muchas investigaciones futuras. Por eso, debemos tomar todas las precauciones que sean necesarias durante todo el proceso, incluso desde antes de que comience el proyecto.
