Research Data Published through Repositories

Vulnerable

Research data published through digital repositories or other service providers with specialist skills to manage the data and an ongoing commitment to ensure preservation.

Digital Species: Research Outputs

Trend in 2022:

Reduced risk (material improvement)

Consensus Decision

Added to List: 2019

Trend in 2023:

Reduced risk (material improvement)

Previously: Vulnerable

Imminence of Action

Action is recommended within three years, detailed assessment within one year.

Significance of Loss

The loss of tools, data or services within this group would impact on many people and sectors.

Effort to Preserve | Inevitability

It would require a small effort to preserve materials in this group, requiring the application of proven tools and techniques.

Examples

Recognized data repositories in specialist disciplines; institutional data repositories in subject specialist centres and partnerships.

‘Endangered’ in the Presence of Aggravating Conditions

Lack of long-term commitment; lack of user community; lack of visibility to potential depositors; lack of institutional commitment; insufficient documentation.

‘Lower Risk’ in the Presence of Good Practice

Certification and documented good practice; effective documentation requirements for depositors; proven financial sustainability; skilled staff, including the professionalisation of disciplinary and general data stewardship as a clear career option; participation in the digital preservation community; research data management training offered by repositories and research funders to depositors, in particular new career researchers.

2023 Review

This entry was added as a separate entry in 2019. It had first been introduced in 2017 under ‘Published research outputs,’ though without explicit reference to the capacity of the repository infrastructure. In 2019, the Jury split that entry into a range of contexts for research outputs, including this one. It was classified as Vulnerable: publishing research data through a well-founded repository, one with the capacity and commitment to ensure preservation and with capability maintained through its own professional development activities, makes for a 'lower risk' outcome for research data.

The 2021 Jury agreed with this classification but commented on the improvements and initiatives towards the preservation of research data and outputs, leading to a trend towards reduced risk.

The 2022 Taskforce agreed on a trend towards reduced risk based on material improvements over the last year that have not only offered examples of good research data management and preservation practices but also suggest a significant shift towards a culture of change and collaboration across different research communities and stakeholders. These include (but are not limited to) improvements and initiatives by the European Open Science Cloud (EOSC), Science Europe, Research Data Alliance (RDA), Digital Curation Centre (DCC) and related projects on the preservation of research data and outputs.

The 2023 Council agreed with the Vulnerable classification and noted a trend towards reduced risk due to increasing research data management and engagement activity by libraries, which should result in an increasing number of datasets being deposited. The Council also noted that it would be useful to see empirical data on deposit trends to assess this.

Additional Comments

A key consideration with this entry is whether the data repository is integrated with a preservation system to facilitate long term access and usability of datasets.

The loss of tools, data or services within this group would impact on people and sectors around the world, particularly those involved with reproducibility and those wishing to use the datasets for further research.

Although there have been improvements in current practice, policies and workflows, there is still a significant corpus of information that was deposited before these improvements came into force. It is unlikely that there will be the time, will or resources to bring this information up to current standards.

Creating additional preservation metadata for research data holdings may help render data more robust in the long term where using a preservation system is not an option. With an emphasis on environmental sustainability, some repositories hesitate to mandate additional copies of large datasets, which may be in the region of hundreds of terabytes, as this adds to both storage cost and carbon footprint, especially when capturing and preserving the research methodology would enable the dataset to be recreated.
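One low-cost way to add preservation metadata where no preservation system is available is to record basic technical facts about each file in a dataset. The sketch below is illustrative only: the function names (`sha256_of`, `describe_dataset`) and the record layout are assumptions made for this example, not a recognized schema such as PREMIS, and a repository would adapt the fields to its own requirements.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_dataset(root: Path) -> dict:
    """Build a simple (hypothetical) metadata record for every file under root."""
    files = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        files.append({
            "path": path.relative_to(root).as_posix(),
            "size_bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    return {
        "dataset": root.name,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "file_count": len(files),
        "files": files,
    }
```

Calling `describe_dataset(Path("my_dataset"))` (a hypothetical directory) returns a dictionary that can be serialized with `json.dumps` and stored alongside the data. A record like this does not replace additional copies, but it gives later stewards a baseline against which to check that files are still complete and unchanged.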

Published Research Data Appended to Journal Articles

Endangered

Closed research data sets produced and documented in accordance with good practice and simply appended to a journal article or transferred to a repository that does not have sufficient subject-matter expertise or funding commitment to ensure reliable or ongoing preservation for the long term.

Digital Species: Research Outputs

Trend in 2022:

Reduced risk (material improvement)

Consensus Decision

Added to List: 2019

Trend in 2023:

Reduced risk (material improvement)

Previously: Endangered

Imminence of Action

Action is recommended within three years, detailed assessment within one year.

Significance of Loss

The loss of tools, data or services within this group would impact on people and sectors around the world.

Effort to Preserve | Inevitability

It would require a small effort to preserve materials in this group going forward, requiring the application of proven tools and techniques.

Examples

Data sets added to papers in repositories that are designed primarily for papers; electronic journals offering data sets without obvious preservation capacity; institutional repositories servicing highly complex scientific data sets with insufficient subject-matter expertise.

‘Endangered’ in the Presence of Aggravating Conditions

Unstable funding or revenues; poorly designed migration or normalization processes; poorly formed ingest and quality assurance procedures; rapid churn of staff; incoherent patterns of subject matter; lack of domain knowledge; no or very small numbers of users; weak or absent collecting policy; deposit to ensure minimal compliance with funder mandate; limited or dysfunctional data management planning.

‘Lower Risk’ in the Presence of Good Practice

Clear preservation planning; repository development roadmap; ability to transfer collections or share metadata with subject repositories or portals; strong user base; demonstrable re-use of data; clear collecting policy; data management planning early in the data lifecycle.

2023 Review

This entry was added in 2019. It had first been introduced in 2017 under 'Published Research Outputs,' though without explicit reference to research data appended to journal articles. In 2019, the Jury split that entry into a range of contexts for research outputs, including this one and ‘Research Data Published through Repositories’. The entry draws attention to services that take upon themselves commitments to preserve research data, but which may not deliver on those promises through lack of capability.

The 2021 Jury agreed with the Endangered classification but commented on the improvements and initiatives towards preservation of research data outputs, with good practice documentation and replication in this space (e.g., collaborations with publishers and repositories, LOCKSS, CLOCKSS, etc.). For these reasons, the 2021 trend was towards reduced risk.

The 2022 Taskforce agreed on a trend towards reduced risk based on material improvements over the last year that have not only offered examples of good research data management and preservation practices but also suggest a significant shift towards a culture of change and collaboration across different research communities and stakeholders. These include (but are not limited to) improvements and initiatives by the European Open Science Cloud (EOSC), Science Europe, Research Data Alliance (RDA), Digital Curation Centre (DCC) and related projects on the preservation of research data and outputs.

The 2023 Council changed the classification from Endangered to Vulnerable. Many, if not most, libraries at higher education institutions (HEIs) that produce research are doing more in terms of research data management, and activities in this area are growing and scaling up. Given this increased focus, it was recommended that the classification change to Vulnerable.

Additional Comments

Research data is complex and has specific requirements for documentation which may only be known to subject matter experts. However well intended, it is risky for institutions to attempt to replicate that level of expertise across all the domains within the institution, and it can be hard for smaller publishers to make commitments to sustain data in the long term.

The loss of tools, data or services within this group would impact on people and sectors around the world, particularly those involved with reproducibility and those wishing to use the datasets for further research.

Although there have been improvements in current practice, policies and workflows, there is still a significant corpus of information that was deposited before these improvements came into force. It is unlikely that there will be the time, will or resources to bring this information up to current standards.

UK funders, e.g., the UKRI-NERC Environmental Data Service, are educating researchers about data policies that mandate depositing master and raw data at the funder's disciplinary repository. These repositories have strong expertise in the research discipline, ensuring data and metadata standardization and quality assurance. Any copies of datasets published in journal articles or similar are considered secondary copies and do not comply with data policy, so an institute that attempts to use journal outputs as its funder-acknowledged datasets risks its future research funding.

The significance and impact of this entry specifically depends on whether it is the only copy of the dataset in existence, or whether there is another copy hosted in a data repository.

Case Studies or Examples:

  • The FAIRsharing Collaboration with DataCite and Publishers. See McQuilton, P., Sansone, S.A., Cousijn, H., Cannon, M., Chan, W.M., Carnevale, I., Cranston, I., Edmunds, S., Everitt, N. and Ganley, E., (2019) ‘FAIRsharing Collaboration with DataCite and Publishers: Data Repository Selection, Criteria That Matter’. Available at: https://doi.org/10.17605/OSF.IO/N9QJ7

  • Resources and research outputs from the Enhancing Services to Preserve New Forms of Scholarship project, which examined a variety of enhanced eBooks and identified which features can be preserved at scale using tools currently available. Of particular note are the published guidelines for preserving new forms of scholarship. See Greenberg, J., Hanson, K., & Verhoff, D. (2021) ‘Guidelines for Preserving New Forms of Scholarship’ NYU Libraries. Available at: https://doi.org/10.33682/221c-b2xj.

  • The work by the Centre pour la Communication Scientifique Directe (CCSD) of France and the Confederation of Open Access Repositories (COAR) in creating a preprint repository directory, which has been relevant to building a user community. See Centre pour la Communication Scientifique Directe (CCSD) of France and the Confederation of Open Access Repositories (COAR) (n.d.) ‘Directory of Open Access Preprint Repositories’. Available at: https://doapr.coar-repositories.org/ [accessed 24 October 2023]

Cloud Storage

Vulnerable

Materials routinely copied or backed up to an independently managed, off-site data storage facility and able to be restored under contractual terms.

Digital Species: Cloud, Integrated Storage

Trend in 2022:

No change

Consensus Decision

Added to List: 2019

Trend in 2023:

No change

Previously: Vulnerable

Imminence of Action

Action is recommended as required, with periodic review every five years.

Significance of Loss

The loss of tools, data or services within this group would impact on many people and sectors around the world.

Effort to Preserve | Inevitability

Loss seems likely: by the time tools or techniques have been developed the material will likely have been lost.

Examples

Remote network storage provided by a third-party service under contract, such as Dropbox, Amazon, Microsoft Azure, Dell EMC, Google Cloud Platform, Google Drive, IBM, Intel, Rackspace, Iron Mountain, SAP, and others.

‘Endangered’ in the Presence of Aggravating Conditions

Encryption; lack of routine maintenance; lack of storage replication; over-dependence on a single supplier; insufficient documentation; lack of local alternative; political or commercial instability; overly aggressive compression; poor information security; lack of transparent integrity-checking; lack of strategic investment; lack of migration plan; lack of exit strategy; unenforceable penalties; unstable pricing; unpredictable removal costs.

‘Lower Risk’ in the Presence of Good Practice

Backup to different technology; backup to diverse locations; documentation of assets; integrity checking; preservation planning; export functionality; resilient to hacking; version control; resilient funding; technology watch; enforceable contract; disaster planning and documentation; stable pricing; budgeted removal costs.
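Two of the practices listed above, backup to diverse locations and integrity checking, reinforce each other: copies held in different places can be compared to detect silent loss or corruption. The following is a minimal sketch under stated assumptions. The function name `compare_replicas`, the location labels and the manifest layout (a mapping of relative path to checksum for each location) are all hypothetical, and how the checksums are obtained from a given cloud provider is outside the scope of the example.

```python
from collections import Counter


def compare_replicas(manifests: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Compare per-location checksum manifests.

    `manifests` maps a location name to {relative_path: checksum}.
    Returns {relative_path: [problems]} for files that are missing at a
    location or whose checksum differs from the most common value.
    """
    problems: dict[str, list[str]] = {}
    all_paths = set().union(*(m.keys() for m in manifests.values()))
    for path in sorted(all_paths):
        seen = {loc: m.get(path) for loc, m in manifests.items()}
        present = [c for c in seen.values() if c is not None]
        # With only two copies a disagreement is a tie: the check can show
        # that the copies differ, but not which of them is the good one.
        majority, _ = Counter(present).most_common(1)[0]
        issues = []
        for loc, checksum in sorted(seen.items()):
            if checksum is None:
                issues.append(f"missing at {loc}")
            elif checksum != majority:
                issues.append(f"checksum differs at {loc}")
        if issues:
            problems[path] = issues
    return problems
```

An empty result means every file is present everywhere with matching checksums. Keeping at least three independent copies is what allows a majority to identify the damaged one, which is one reason replication across different technologies and locations appears among the good practices.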

2023 Review

This entry was added in 2019 to ensure that the range of media storage is properly assessed and presented. The 2021 Jury noted increased risk in light of greater reliance on the cloud and localized disruptions to cloud services over the pandemic. A 2021 trend towards greater risk was based on the wider (global) dependence on these services, especially Google Drive, for recordkeeping and business workflows. The impact of loss increases with more reliance on cloud services leading to greater risk; however, this should not deter people from using cloud storage. The 2022 review agreed with this assessment but noted no significant increase in trend for 2022.

The 2023 Council review recommended this entry be moved to a new higher-level Cloud species as the previous Integrated Storage species worked less well (for hardware technologies). The Council agreed that the previous Vulnerable classification should stand, with the overall risks remaining on the same basis as before so long as there are safeguards in place. However, the Council noted that these safeguards may not in all cases be sufficient to address existing risks. They noted how some governments may cut off the internet in times of unrest, with a disastrous effect on access to cloud-based resources, and raised questions about the feasibility of recovering material after a major cloud vendor fails or after malicious acts. For these materials, the significance of loss and the effort to preserve are much greater, with potential for a trend towards greater risk if existing safeguards are lost.

Additional Comments

The history of digital preservation suggests that the risk of vendors going out of business or shutting down services is the key issue here, over and above any specific technical solutions or risks.

Case Studies or Examples:

  • Case of a cloud storage provider whose clients suffered major data loss due to a fire in its data centre. The clients who suffered most were those that had not included geographically redundant storage in their contract with the storage provider, as this was more expensive. See Rosemain, M. and Satter, R. (2021) ‘Millions of websites offline after fire at French cloud services firm’, Reuters. Available at: https://www.reuters.com/article/us-france-ovh-fire-idUSKBN2B20NU [accessed 24 October 2023]

  • Case of a fired credit union employee accessing the financial institution's computer systems without authorization and destroying over 21 gigabytes of data via remote network storage. See Gatlan, S. (2021) ‘Fired NY credit union employee nukes 21 GB of data in revenge’, BleepingComputer. Available at: https://www.bleepingcomputer.com/news/security/fired-ny-credit-union-employee-nukes-21gb-of-data-in-revenge [accessed 24 October 2023]

  • Preliminary findings from The National Archives (UK) on digital services and carbon emissions in the heritage sector, which touch on the cloud and cloud storage. They write: “If we are looking for areas where significant carbon reductions could be made quickly, they are not to be found here. The evidence is that hosting digital services on site results in more carbon emissions than a sensibly located (i.e., in a territory with a high proportion of electricity generated from renewables) cloud host and that, where it might be felt that migrating services simply migrates emissions from scope 2 to scope 3, in practice cloud providers can offer the same storage and compute with lower emissions. Amazon in particular reports its view of the carbon ‘saved’ by using its services rather than your own, but these are estimates and should not be regarded as robust.” See The National Archives (UK) (2023) ‘Digital Services and carbon emissions in the heritage sector: some preliminary findings’. Available at: https://www.nationalarchives.gov.uk/archives-sector/digital-services-and-carbon-emissions-in-the-heritage-sector-some-preliminary-findings/ [accessed 24 October 2023]

Current Hard Disk Technologies

Vulnerable

Materials saved to storage devices with a variety of underlying magnetic or solid-state (flash) technologies that are hardwired into a computer still under warranty or supported: typically hard disks that are less than five years old.

Digital Species: Integrated Storage

Trend in 2022:

No change

Consensus Decision

Added to List: 2019

Trend in 2023:

No change

Previously: Vulnerable

Imminence of Action

Action is recommended within five years, detailed assessment within three years.

Significance of Loss

The loss of tools, data or services within this group would impact on many people and sectors.

Effort to Preserve | Inevitability 

Loss of material in this group could be entirely avoidable if provided the means to deploy proven tools and techniques.

Examples

Direct Attached Storage (DAS) such as magnetic or solid-state drives integrated into individual laptops or workstations and into smaller scale storage facilities.

‘Endangered’ in the Presence of Aggravating Conditions

Encryption; poor handling; poor storage; lack of consistent replication; failure of external dependencies (e.g., suppliers, security); political or commercial interference; failure of internal dependencies (e.g., power supply, disk controller); overly aggressive compression; poor information security; lack of integrity-checking; lack of strategic investment; lack of warranty; unenforceable warranty.

‘Lower Risk’ in the Presence of Good Practice

Backup to different technology; backup to diverse locations; documentation of assets; integrity checking; preservation planning; refreshment planning; export functionality; resilient to hacking; selection and appraisal criteria; version control; resilient funding; technology watch; enforceable warranty; disaster planning.

2023 Review

This entry was added in 2019 to ensure that the range of media storage is properly assessed and presented. It was reviewed in 2021, when a trend towards greater risk was noted in light of the continued shift towards reliance on cloud storage, with computers increasingly dropping hard disks in favour of solid-state storage and commercial motivations leading to less support. The 2022 review noted no further change in trend, towards either greater or reduced risk.

The 2023 Council agreed with the current Vulnerable classification, with overall risks remaining on the same basis as before (no change to the trend), while also noting a slight decrease in the effort needed to preserve and the imminence of action required when compared to the 2021 Jury review.

Additional Comments

As people increasingly select other storage methods, such as cloud, they are less likely to maintain existing content on portable hard disks, which means the portable hard disks are more likely to be overlooked or ignored (e.g., left in drawers) rather than checked and refreshed. There are also indications of increasing prevalence of soldered-in flash storage which cannot easily be accessed in the case of device failure.

Case Studies or Examples:

  • Some new technologies like shingling, HAMR/MAMR and multiple actuators have given HDD technology (and, more importantly for preservation, interfaces such as SATA and SAS) a new lease on life. Nevertheless, the writing is on the wall as flash and related technologies move to NVMe and CXL interfaces. See Mellor, C. (2023) ‘Pure: No more hard drives will be sold after 2028’, Blocks & Files. Available at: https://blocksandfiles.com/2023/05/09/pure-no-more-hard-drives-2028/ [accessed 24 October 2023]

  • SSDs can be remarkably sensitive to storage conditions when unpowered. See Cox, A. (2013) ‘JEDEC SSD Specifications Explained’, JC-64.8. Available at: https://www.jedec.org/sites/default/files/Alvin_Cox%20%5bCompatibility%20Mode%5d_0.pdf [accessed 24 October 2023]

Recently Commissioned or Completed Media Art

Vulnerable

Media art currently displayed in a gallery or in the process of being displayed.

Digital Species: Media Art

Trend in 2022:

No change

Consensus Decision

Added to List: 2019

Trend in 2023:

No change

Previously: Vulnerable

Imminence of Action

Action is recommended within twelve months, detailed assessment is a priority.

Significance of Loss

The loss of tools, data or services within this group would impact on many people and sectors.

Effort to Preserve | Inevitability

It would require a small effort to preserve materials in this group, requiring the application of proven tools and techniques.

Examples

Media art recently acquired by galleries that utilize specific hardware and software in order to be accessed or exhibited.

‘Endangered’ in the Presence of Aggravating Conditions

Lack of documentation to enable maintenance; lack of clarity with respect to intellectual property; complex interdependencies on specific hardware, software or operating systems; lack of capacity in the gallery or workshop; lack of strategic investment; complex external dependencies; lack of documentation about artist intent.

‘Lower Risk’ in the Presence of Good Practice

Strong documentation; clarity of preservation path and ensuing responsibilities; proven preservation plan; capacity of workshop to support artwork at de-installation; capacity of gallery to conserve after de-installation; capacity of gallery to re-install work.

2023 Review

This entry was added as a separate entry in 2019. It had first been introduced in 2017 under ‘Media Art’ with particular reference to historical media art. It was added to give greater specificity to the recommendations and to represent works commissioned in the last five years, where there is a reasonable expectation that documentation has been produced or could still be obtained.

While the 2020 Jury found no change in trend, the 2021 Jury discussed how prospects for long-term preservation depend entirely on whether the artwork is collected post-commission and by an organization with the resources to care for it. They agreed that the classification remains Vulnerable but with a trend towards greater risk because the imminence of action is time-sensitive: it requires working with the artist to obtain documentation about their work and what it needs before it is too late. Furthermore, there remains a vulnerability for smaller museums and others that do not take the preservation of media art as seriously.

The 2023 Council agreed with the Vulnerable classification, with overall risks remaining on the same basis as before (no change to the trend), although it noted a change in the imminence of action from three years to twelve months.

Additional Comments

By the time digital art, time-based media, etc., has entered into the permanent care of a stewarding institution, many of its technologies are already end-of-life or unsupported, or the hardware components have deteriorated. Often the expertise to maintain these many interacting components sits outside the host organization, with a technical supplier to the gallery, and this is in itself vulnerable to business change. Although there are a few exceptions, there is a need for greater capacity within the museum and gallery sector to address the challenges. There have been new initiatives for guidance and examples of institutions taking wider sectoral responsibility for standards, which have helped with the effort to preserve, such as the Matters in Media Art information resource and guidance.

Media artworks are often made with a network of knowledge that can be precarious. Documentation around production processes can be minimal, so acting quickly with known processes can gather information before the knowledge and people networks start to disperse. This can mean that production environments and associated workflows are preserved alongside the media.

Some artworks specifically leverage the limitations and characteristics of the systems that they incorporate, often in unusual ways. Such works can be hard to migrate or emulate accurately.

Case Studies or Examples:

  • Resources and outputs from the Preserving and Sharing Born Digital and Hybrid Objects From and Across The National Collection project. See V&A Research Projects (n.d.) ‘Preserving and Sharing Born Digital and Hybrid Objects’. Available at: https://www.vam.ac.uk/research/projects/preserving-and-sharing-born-digital-and-hybrid-objects [accessed 24 October 2023].

  • This includes decision model work around acquisition of complex collections such as born digital and hybrid art. See Ensom, T. and McConnachie, S. (2022) ‘Preserving and sharing born-digital and hybrid objects from and across the National Collection’, Decision Model Report: March 2022. Available at: http://doi.org/10.5281/zenodo.7097489

  • Matters in Media Art (n.d.) ‘Guidelines for the care of media artworks’. Available at: http://mattersinmediaart.org/ [accessed 24 October 2023]

See also:

  • NEW MEDIA MUSEUMS: Creating Framework for Preserving and Collecting Media Arts in V4, initiated by the Olomouc Museum of Art as a joint international platform for sharing experience with building and maintaining collections of new media artworks across different types of institutions. The aim of the project is to find workable methods for heritage institutions to build and maintain collections of media arts, which are necessary for safeguarding this area for the benefit of society. See Central European Art Database (2021) ‘NEW MEDIA MUSEUMS: Creating Framework for Preserving and Collecting Media Arts in V4’. Available at: http://cead.space/Detail/projects/3797 [accessed 24 October 2023]

  • The Collaborative Infrastructure for sustainable access to digital art LIMA project, to prevent the loss of digital artworks and to commonly develop the knowledge to preserve these works in a sustainable way. The project ‘Infrastructure sustainable accessibility digital art’ invests in research, training, knowledge sharing and conservation to prevent the loss of both digital artworks and the knowledge to preserve them. See LIMA (n.d.) ‘Collaborative infrastructure for sustainable access to digital art’. Available at: https://www.li-ma.nl/lima/article/collaborative-infrastructure-sustainable-access-digital-art [accessed 24 October 2023]

PDF

Vulnerable

Documents presented in PDF (Portable Document Format) (ISO 32000-1 and ISO 32000-2) and other data wrapped inside them, covering all variants and versions, including PDF/A.

Digital Species: Formats

Trend in 2022:

No change

Consensus Decision

Added to List: 2017

Trend in 2023:

No change

Previously: Vulnerable/Endangered

Imminence of Action

Action is recommended as required, with periodic review every five years.

Significance of Loss

The loss of tools, data or services within this group would impact on people and sectors around the world.

Effort to Preserve | Inevitability 

It would require a small effort to preserve materials in this group, requiring the application of proven tools and techniques.

Examples

Documents stored offline, or online in repositories or EDRMS, including reports, agendas, minutes, correspondence, contracts, essays, articles, or research papers; PDF 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7 and 2.0; PDF/A, PDF/X and PDF/E.

‘Endangered’ in the Presence of Aggravating Conditions

Loss of context; loss of authenticity or integrity; external dependencies; poor storage; lack of understanding; significant diversity of data; poorly developed digitization specifications; lack of integrity checking; poorly developed migration or normalization specifications; lack of virus control; poor storage or replication; lack of validation at the point of creation; encryption.

‘Lower Risk’ in the Presence of Good Practice

Well-managed data infrastructure; preservation planning; authenticity managed; use of persistent identifiers; reduction of dependencies; application of records management standards; recognition of preservation requirements beyond formats; strategic investment in digital preservation; preservation roadmap; participation in digital preservation community; format validation; version control.

2023 Review

The PDF entry was added in 2017 and was split into two entries, ‘PDF/A’ and ‘PDF other than PDF/A’, in 2019 to emphasize the different threats faced by different types of PDF.

The 2021 Jury agreed with this decision and noted that the trends for both the ‘PDF other than PDF/A’ entry and the ‘PDF/A’ entry were towards reduced risk.

The 2023 Council recommended merging the two previously split entries (‘PDF/A’ and ‘PDF other than PDF/A’). After reviewing the two entries separately, they found more similarities than differences between the two and indeed across all types of PDF (not just PDF/A). Due to the level of commercial and open-source tools that are available to assist preservation, the risk of loss is less persistent than previously suggested. Therefore a Vulnerable classification is appropriate for all PDF formats as a whole.

Additional Comments

There is a lot of material produced and kept in PDF. Some of it is authoritative, in other words, the only available copy, while some of it is not. However, if it is the only copy and it is lost, it can have an impact on a lot of people.

The challenge in evaluating the significance and impact of the loss of PDFs is that they are quite often a surrogate of something else, whether a digitized record or a Word document, etc. Whether or not that record is retained may be a factor. We should also be considering PDF Portfolios, which are an extension of PDF 1.7. Portfolios contain embedded files and can include text documents, spreadsheets, PowerPoints, emails and Computer Aided Design (CAD) drawings.

Vulnerability also depends on whether the PDF file actually conforms to the PDF/A standard. The risk arises from a combination of 1) files not conforming to the standard and 2) collection managers assuming that a file is resilient simply because it purports to be a PDF/A. This risk lies less with the format and more with understanding and experience in data management. Moreover, materials embedded in or attached to PDF/A-2 and PDF/A-3 may be at risk.
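The gap between purporting to be PDF/A and conforming to it can be made concrete. A PDF/A file declares itself through `pdfaid:part` and `pdfaid:conformance` properties in its XMP metadata, and that declaration is easy to read but proves nothing about conformance. The sketch below, with the hypothetical function name `pdfa_claim`, only reports what a file claims. It assumes the XMP packet is stored uncompressed and does a plain byte search rather than parsing the PDF, so it is a triage aid and not a validator.

```python
import re

# Matches the header comment, e.g. "%PDF-1.7".
HEADER = re.compile(rb"%PDF-(\d\.\d)")
# XMP may express the property as an attribute (pdfaid:part="2")
# or as an element (<pdfaid:part>2</pdfaid:part>); \x27 is a single quote.
PDFA_PART = re.compile(rb'pdfaid:part\s*(?:=\s*["\x27]|>)\s*(\d)')
PDFA_CONF = re.compile(rb'pdfaid:conformance\s*(?:=\s*["\x27]|>)\s*([A-Za-z])')


def pdfa_claim(data: bytes) -> dict:
    """Report the PDF version and any PDF/A claim found in the raw bytes.

    This detects a *claim* only. It does not check fonts, colour spaces,
    embedded files or anything else that the standard actually requires.
    """
    header = HEADER.search(data[:1024])
    part = PDFA_PART.search(data)
    conf = PDFA_CONF.search(data)
    return {
        "header_version": header.group(1).decode() if header else None,
        "claims_pdfa": part is not None,
        "claimed_part": part.group(1).decode() if part else None,
        "claimed_conformance": conf.group(1).decode().upper() if conf else None,
    }
```

For a file on disk, `pdfa_claim(Path("report.pdf").read_bytes())` (with `Path` from `pathlib` and a hypothetical file name) returns the claimed part and conformance level. Files that claim PDF/A are the ones worth passing to a dedicated validator such as veraPDF, since the claim alone says nothing about whether the file meets the standard.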


Published Research Papers


   Endangered large

Completed research papers published in serials, monographs or theses which fall under specific collecting policies of research libraries or archives and are managed through dedicated repository infrastructures.

Digital Species: Research Outputs

Trend in 2022:

No change

Consensus Decision

Added to List: 2017

Trend in 2023:

No change

Previously: Vulnerable

Imminence of Action

Action is recommended within three years, detailed assessment within one year.

Significance of Loss

The loss of tools, data or services within this group would impact on people and sectors around the world.

Effort to Preserve | Inevitability

It would require a small effort to preserve materials in this group, requiring the application of proven tools and techniques.

Examples

Published research papers in scholarly E-Books and Electronic Journals; Electronic manuscripts; Electronic theses (E-theses).

‘Endangered’ in the Presence of Aggravating Conditions

Lack of documentation; lack of clarity with respect to intellectual property; embedded complex objects; unstable funding for repository; lack of strategic investment; complex external dependencies; lack of persistent identifiers; bespoke formats; lack of legal deposit mandate.

‘Lower Risk’ in the Presence of Good Practice

Strong documentation including intellectual property rights; clarity of preservation path and ensuing responsibilities; credible preservation plan; proven capacity of repository; legal deposit preservation copying; post-cancellation access service; persistent identifiers used consistently; non-proprietary formats used and validated; minimal or well managed external dependencies.

2023 Review

This entry was added in 2017 under 'Published research outputs,' though without reference to the capacity of the repository infrastructure. The 2019 Jury amended it to presume the existence of repository infrastructure and noted that the aggravating conditions (which introduce risks) and good practice enhancements (which reduce them) are most relevant to repository operations.

While the 2020 Jury found no change in trend, the 2021 Jury agreed it should remain Vulnerable but discussed improvements and initiatives towards the preservation of research data and outputs, pointing to a trend towards reduced risk.

The 2023 Council agreed with the Vulnerable classification, noting a slight decrease in imminence of action with no significant trends towards greater or reduced risk.

Additionally, the Council recommended that a nomination received for a new ‘E-theses’ entry be included as a valuable example within this entry rather than as a new, standalone entry. However, as noted in the additional comments below, a recommended rescoping of the entry, planned for the next Bit List, will revisit this nomination as part of a restructuring.

Additional Comments

The 2023 Council additionally recognized that further scoping and input are needed for this entry and recommended that the next major review revisit and restructure the entry, in particular looking at restructuring based on differences between:

  • Types of published material. There are different levels of risk relating to the published version of record of the research paper (typically hosted on a publisher or aggregator platform), research papers hosted on institutional open access repositories (typically the author accepted manuscript rather than the version of record), and E-theses (typically hosted on an institutional repository or similar platform, sometimes with a copy harvested by an aggregation service, such as Ethos). However, there is a risk of becoming too granular if entries are separated by type.

  • The version of record hosted on the publisher platform and the version hosted in an open access repository. In other words, it might be better to ask where material is being published rather than what is being published. Preservation risks and considerations for these are quite different and would benefit from being assessed separately.

A 2023 nomination for E-theses highlights distinct risks tied to these digital published materials. E-theses tend to be sole documents which, when published by universities, may get harvested into other aggregators or resources, but in many cases the only copy (with no physical/analogue copy) sits on an institution's repository. In addition, many are deposited in PDF format (in many varieties, and many do not even attempt to use PDF/A), risking long-term accessibility and re-use. However, the breadth of risks goes beyond just the PDF variety, as e-theses often include databases, audiovisual materials, websites, and more.

The loss of tools, data or services within this group would impact on people and sectors around the world, particularly those involved with reproducibility and those wishing to use the datasets for further research.

Although there have been improvements in current practice, policies and workflows, there is still a significant corpus of information that was deposited before these improvements came into force. It is unlikely that there will be the time, will or resources to bring this information up to current standards.

See also:

  • Konstantelos, L., (2021) ‘Breaking down barriers in e-only thesis submission: how digital preservation contributes to the conversation at the University of Glasgow’, Digital Preservation Coalition Blog. Available at: https://www.dpconline.org/blog/wdpd/wdpd2021-konstantelos [accessed 24 October 2023]

  • Klungthanaboon, W., (2021) ‘From “research output” to “research data” - a willingness to move forward?’, Digital Preservation Coalition Blog. Available at: https://www.dpconline.org/blog/wdpd/research-output-to-research-data [accessed 24 October 2023]

  • Beagrie, N (2013) ‘Preservation, Trust and Continuing Access for E-Journals’, DPC Technology Watch Report 13-04. Available at: http://doi.org/10.7207/twr13-04

  • Morrissey, S, and Kirchhoff, A (2014) ‘Preserving E-Books’, DPC Technology Watch Report 14-01. Available at: http://doi.org/10.7207/twr14-01

  • Resources and recent outputs from the Public Knowledge Project (PKP) Preservation Network, which was developed to digitally preserve Open Journal Systems (OJS) journals. See Public Knowledge Project (n.d.) ‘PKP Preservation Network’. Available at: https://pkp.sfu.ca/pkp-pn/ [accessed 24 October 2023]

 


Local Network Storage


   Vulnerable small

Materials routinely copied or backed up to locally managed data storage facilities and able to be restored under institutional service arrangements.

Digital Species: Integrated Storage

Trend in 2022:

No change

Consensus Decision

Added to List: 2019

Trend in 2023:

No change

Previously: Vulnerable

Imminence of Action

Action is recommended as required, with periodic review every five years.

Significance of Loss

The loss of tools, data or services within this group would impact on many people and sectors.

Effort to Preserve | Inevitability 

Loss of material in this group could be entirely avoidable if provided the means to deploy proven tools and techniques.

Examples

Institutional or departmental network storage and institutional data centers based on technologies such as Network Attached Storage (NAS), Redundant Array of Independent Disks (RAID), Storage Area Networks (SAN), Just a Bunch of Disks (JBOD), SPAN and related.

‘Endangered’ in the Presence of Aggravating Conditions

Encryption; lack of routine maintenance; lack of storage replication; over-dependence on a single supplier, technology or technician; insufficient documentation; single point of failure; political or commercial interference; failure of dependencies (e.g., power supply, controller software); overly aggressive compression; poor information security; lack of integrity-checking; lack of strategic investment; lack of warranty; unenforceable warranty.

‘Lower Risk’ in the Presence of Good Practice

Backup to different technology; backup to diverse locations; documentation of assets; integrity checking; preservation planning; refreshment planning; export functionality; resilient to hacking; selection and appraisal criteria; version control; resilient funding; technology watch; enforceable warranty; disaster planning and documentation.

2023 Review

This entry was added in 2019 to ensure that the range of media storage is properly assessed and presented.

The 2023 Council agreed with the current Vulnerable classification with overall risks remaining on the same basis as before (no change to the trend), while also noting a slight decrease in the effort needed to preserve and the imminence of action required when compared to the 2021 Jury review.

Additional Comments

There has been a renewed interest in tape, as offline storage is the only sure protection against advanced ransomware.


Pension, Mortgage and Insurance Records


   Vulnerable small

Records of transactions for long-lived financial products and services contracted between individuals and corporations. These records typically contain or depend on significant amounts of personal information and outlast the infrastructure on which they were created.

Group: Sensitive Data

Trend in 2022:

No change

Consensus Decision

Added to List: 2017

Trend in 2023:

No change

Previously: Vulnerable

Imminence of Action

Action is recommended within three years, detailed assessment within one year.

Significance of Loss

The loss of tools, data or services within this group would impact on many people and sectors.

Effort to Preserve | Inevitability

It would require a small effort to preserve materials in this group, requiring the application of proven tools and techniques.

Examples

Applications, correspondence and ancillary records relating to pensions, mortgages and insurances and other contracts of long duration. This includes corporate databases, email, web archives and EDRMS, and may require some coordination of paper, microfiche, born-digital and digitized records. These records often include the scope and duration of the contract as well as any agreed changes during the lifetime of the product. It may also include evidence of mis-selling or other sharp practice, which only becomes apparent after the fact. This entry pertains to corporate records rather than personal records. 

‘Endangered’ in the Presence of Aggravating Conditions

Lack of corporate preservation planning; lack of preservation within the procurement of corporate systems; companies conflating backup with preservation; loss of integrity and authenticity; loss of context and connections to provide meaning; lack of preservation capability within agencies; lack of preservation voice at executive level; poor planning and roadmap for corporate infrastructure; proliferation of legacy systems; slapdash procurement or migration of new systems; mergers and acquisitions leading to confusion of corporate systems; lack of compliance, audit or accountability at operational levels; encryption.

‘Lower Risk’ in the Presence of Good Practice

Backup and documentation; use of open formats and open source software; considered data management planning; licensing that enables preservation; preservation capability in designated repository; resilient to hacking; selection and appraisal in place; authenticity and integrity of records managed; resilient funding and recognition at executive level; technology watch; regular preservation audits; accreditation and participation in the professional preservation community.

2023 Review

This entry was added in 2017 but was outside the competence of the judges to assess at that time. It was assessed in 2019 with additional expertise invited to the panel to support this assessment and reviewed again in 2020. The 2021 Jury agreed with that 2019 assessment and subsequent 2020 review, which classified these digital materials as Vulnerable with no trend towards greater or reduced risk. The 2023 Council agreed with the Vulnerable classification with the overall risks remaining on the same basis as before (no change to the trend).

Additional Comments

The importance of retaining documentation in any kind of legal agreement offers this kind of material more protection than most, but legal organizations may conflate backup with preservation and may not always have consistent records management systems.
