Published Research Data Appended to Journal Articles

Vulnerable

Closed research data sets produced and documented in accordance with good practice and appended to a journal article or transferred to a repository that does not have sufficient subject-matter expertise or funding commitment to ensure reliable or ongoing preservation for the long term.

Digital Species: Research Outputs

Trend in 2023:

Material improvement

Consensus Decision

Added to List: 2019

Trend in 2024:

No change

Previously: Endangered

Imminence of Action

Action is recommended within three years, detailed assessment within one year.

Significance of Loss

The loss of tools, data or services within this group would impact on people and sectors around the world.

Effort to Preserve | Inevitability

It would require a small effort to preserve materials in this group going forward, requiring the application of proven tools and techniques.

Examples

Supplementary data sets added to formally published papers in repositories that are designed primarily for papers; electronic journals offering data sets without obvious preservation capacity; institutional repositories servicing highly complex scientific data sets with insufficient subject-matter expertise.

‘Endangered’ in the Presence of Aggravating Conditions

Complex mix of formats; deposit in repositories that lack relevant expertise or knowledge or funding; poorly designed migration or normalization processes; poorly formed ingest and quality assurance procedures; rapid churn of staff; incoherent patterns of subject matter; lack of domain knowledge; no or very small numbers of users; weak or absent collecting policy; deposit to ensure minimal compliance with funder mandate; limited or dysfunctional data management planning and documentation; uncertainty over IPR or the presence of orphaned works.

‘Lower Risk’ in the Presence of Good Practice

Clear data management planning and documentation; deposit by publisher in a trusted repository; deposit by author/s in appropriate repositories with digital preservation expertise and mandate; clear licensing to enable digital preservation and access; strong user base; development roadmap; ability to transfer collections or share metadata with subject repositories or portals; demonstrable re-use of data; clear collecting policy; data management planning early in the data lifecycle.

2023 Review

This 2019 entry was previously introduced in 2017 under 'Published Research Outputs,' though without explicit reference to the research data appended to journal articles. The 2019 Jury split the entry into a range of contexts for research outputs, including this addition and ‘Research Data Published through Repositories’. The entry draws attention to services that take upon themselves commitments to preserve research data, but which may not deliver those promises through lack of capability. The 2021 Jury agreed with the Endangered classification but commented on the improvements and initiatives towards the preservation of research data outputs, with good practice documentation and replication in this space (e.g., collaborations with publishers and repositories, LOCKSS, CLOCKSS, etc.). For these reasons, the 2021 trend was towards reduced risk.

The 2022 Taskforce agreed on a trend towards reduced risk based on material improvement over the last year that had not only offered examples of good research data management and preservation practices but also suggested a significant shift towards a culture of change and collaboration across different research communities and stakeholders. Those mentioned included (but were not limited to) improvements and initiatives by the European Open Science Cloud (EOSC), Science Europe, Research Data Alliance (RDA), Digital Curation Centre (DCC) and related projects on the preservation of research data and outputs.

In light of the identified 2021 and 2022 trends, the 2023 Council changed the classification from Endangered to Vulnerable. They noted that many, if not most, HEI libraries that produce research are doing more in terms of research data management, and the activities in this area are growing and scaling up. Due to the increased focus on this area, it was recommended that the classification change to Vulnerable with a 2023 trend of ‘Material improvement’.

2024 Interim Review

These risks remain on the same basis as before, with no significant trend towards even greater or reduced risk (‘No change’ to trend).

A Council member recommended that, for further clarity, it might be worth differentiating two use cases: closed research data sets produced and documented in accordance with good practice and appended to a journal article, and closed research data sets produced and documented in accordance with good practice and transferred to a repository that does not have sufficient subject-matter expertise or funding commitment to ensure reliable or ongoing preservation for the long term.

Additional Comments

A number of aggravating conditions—those relating to poorly formed ingest and quality assurance procedures, rapid churn of staff, incoherent patterns of subject matter, lack of domain knowledge, no or very small numbers of users, weak or absent collecting policy, and deposit to ensure minimal compliance with funder mandate—are problems with some repositories, not all repositories.

Presenting different use cases would help tease apart supplementary materials appended to journal articles (which, e.g., CLOCKSS and Portico preserve) from those held in repositories that are perhaps not tailored to this use case. Cases where data is transferred to a repository that does not have sufficient subject-matter expertise or funding commitment to ensure reliable or ongoing preservation for the long term are far more at risk.

Research data is complex and has specific requirements for documentation which may only be known to subject matter experts. However well intended, it is risky for institutions to attempt to replicate that level of expertise across all the domains within the institution, and it can be hard for smaller publishers to make commitments to sustain data in the long term.

The loss of tools, data or services within this group would impact on people and sectors around the world, particularly those involved with reproducibility and those wishing to use the datasets for further research.

Although there have been improvements in current practice, policies and workflows, there is still a significant corpus of information that was deposited before these improvements came into force. It is unlikely that there will be the time, will or resource to bring this information up to current standards.

UK funders, e.g., the UKRI-NERC Environmental Data Service, are educating researchers about data policies which mandate depositing master and raw data at the funder disciplinary repository. These repositories have strong expertise in the research discipline, ensuring data and metadata standardization and quality assurance. Any copies of datasets published in journal articles or similar are considered secondary copies and do not comply with data policy; an institute attempting to use journal outputs as its funder-acknowledged datasets therefore risks its future research funding.

The significance and impact of this entry specifically depends on whether it is the only copy of the dataset in existence, or whether there is another copy hosted in a data repository.

Case Studies or Examples:

  • Analysis from Martin Eve of CrossRef shows scholarly content at risk. The findings, based on the assessment of around 7.5 million of the e-books and articles for which CrossRef provides a fixed identifier or Digital Object Identifier, suggest that around a quarter of academic publications are not being preserved for the future. For c. 2 million articles in the study there was no evidence of them being preserved, and 4.3 million of the works studied were preserved in at least one place. See: Eve, M. P. (2024) ‘Digital Scholarly Journals Are Poorly Preserved: A Study of 7 Million Articles’. Journal of Librarianship and Scholarly Communication 12(1). Available at: https://doi.org/10.31274/jlsc.16288

  • The FAIRsharing Collaboration with DataCite and Publishers. See McQuilton, P., Sansone, S.A., Cousijn, H., Cannon, M., Chan, W.M., Carnevale, I., Cranston, I., Edmunds, S., Everitt, N. and Ganley, E., (2019) ‘FAIRsharing Collaboration with DataCite and Publishers: Data Repository Selection, Criteria That Matter’. Available at: https://doi.org/10.17605/OSF.IO/N9QJ7

  • Resources and research outputs from the Enhancing Services to Preserve New Forms of Scholarship project, which examined a variety of enhanced eBooks and identified which features can be preserved at scale using tools currently available. Of particular note are the published guidelines for preserving new forms of scholarship. See Greenberg, J., Hanson, K., & Verhoff, D. (2021) ‘Guidelines for Preserving New Forms of Scholarship’ NYU Libraries. Available at: https://doi.org/10.33682/221c-b2xj.

  • The work by the Centre pour la Communication Scientifique Directe (CCSD) of France and the Confederation of Open Access Repositories (COAR) in creating a preprint repository directory, which has been relevant to building a user community. See Centre pour la Communication Scientifique Directe (CCSD) of France and the Confederation of Open Access Repositories (COAR) (n.d.) ‘Directory of Open Access Preprint Repositories’. Available at: https://doapr.coar-repositories.org/ [accessed 24 October 2023]

