Chelsea Goforth, Senior Data Project Manager at Inter-university Consortium for Political and Social Research (ICPSR).
Since the start of the COVID-19 pandemic, the Inter-university Consortium for Political and Social Research (ICPSR) has engaged in digital preservation activities to enable rapid data access, wide data sharing, and long-term reuse of COVID-19-related data collections. Our preservation work supports social science researchers who are asking and answering questions about COVID-19 in order to lessen its impact, especially on the most vulnerable populations.
In April 2020, ICPSR began working with the Research Data Alliance (RDA) to lend leadership and expertise to the creation of the COVID-19 Recommendations and Guidelines, a global effort to increase the quality of, and access to, data collected during the COVID-19 pandemic. The guidelines were written by, and are intended for use across, several major scientific domains, including the social sciences, for which ICPSR co-led the writing effort. The recommendations and guidelines are addressed to funders, researchers, repositories, and others involved in the data life cycle. The effort has allowed ICPSR to liaise with other research organizations, funding agencies, and repositories from Europe, Africa, Canada, Australia, and South America, as well as with more than 600 data professionals around the world, to offer recommendations on how social science infrastructure can make reuse of COVID-19 data easier and faster. The more than 30 writers working on the social sciences section emphasized the following:
- retention of information to allow data linkage within and across domains;
- enabling access to measures and data that might be used to adjust for selection bias, thereby improving the representativeness of findings from limited samples;
- balancing the desire to share data widely against the need to meet human subjects protections and keep confidential data secure; and
- protecting the interests of vulnerable populations and Indigenous populations in all data collection and sharing efforts.
The Data Sharing in Social Sciences section covers important aspects of the data life cycle to ensure that COVID-19 data are findable, accessible, interoperable, and reusable whenever possible. In particular, in the Policy Recommendations subsection, the writers urge agencies funding the social sciences to support not just data collection but also the infrastructure for data archiving and preservation, and to cover infrastructure costs in the broadest sense, including the costs of making data FAIR and enabling long-term preservation. They also recommend that official data providers require minimum metadata that allow linking across domains, and that social science journals require data availability statements for COVID-19 research articles upon publication. Both of these stakeholder mandates would help push rapid COVID-19 data publication toward long-term considerations of data access and reuse across disciplines and domains. Moreover, in the Data Sharing and Long-term Preservation subsection, the Social Sciences subgroup authors discuss long-term data access at length, emphasizing preservation as an important criterion for evaluating a data repository, the secure preservation of data linkage information, and the importance of preserving complete contextual information, including documentation and code.
To provide the infrastructure support for data sharing and preservation recommended by the RDA, ICPSR created the COVID-19 Data Repository to host data projects that examine the social, behavioral, public health, and economic impacts of the global novel coronavirus pandemic. ICPSR is particularly well suited to respond quickly to the urgent research and data sharing needs of the pandemic response, given the institution’s status as a trusted repository.[1] The repository’s features are designed to make COVID-19 data findable, accessible, interoperable, and reusable, following the FAIR principles (https://www.go-fair.org/fair-principles/) as noted in the RDA recommendations. Within the COVID-19 Data Repository:
- data producers have access to a wide variety of rich metadata fields to describe their projects and increase the discoverability of COVID-19 data projects;
- data producers and secondary data users can generate file-level citations for a data project in addition to study-level citations;
- data projects are indexed, and thus searchable and discoverable, within 24 hours of publication, and are immediately distributed to a network of nearly 800 member institutions;
- projects can be both versioned and linked to publications for greater transparency and discoverability;
- usage metrics based on the COUNTER Code of Practice for Research Data are included on every published project’s home page;
- there are options for depositing restricted-access projects containing sensitive or identifiable data; and
- self-published project files are archived in ICPSR’s standard archival storage, which replicates holdings through multiple and varied methods and locations and provides as-is, bit-level preservation and server-side encryption.
In addition, all self-published COVID-19 data projects can use ICPSR’s General Archive curation services once they’ve been published. ICPSR’s data curation makes sure people can easily find and use data, now and in the future. Curators find and fix issues with the data (e.g., standardizing missing data, handling outliers or other unexpected values, clarifying mysterious variables, and correcting documentation that includes too much or too little detail); add or update study-, file-, and variable-level metadata that adhere to the ICPSR Thesaurus; create complete documentation, including an ICPSR codebook; produce files for major statistical packages (SAS, SPSS, Stata, and R), in ASCII, and for online analysis; and save data and documentation in recognized archival formats. Thus, ICPSR is helping to ensure rapid and timely access to COVID-19 data projects, first through their self-publication in the COVID-19 Data Repository and then through high-quality, well-documented data that our team of professional curators has organized, described, cleaned, enhanced, and preserved for public use.
We encourage anyone interested to browse the COVID-19 Data Repository’s growing collection of pandemic-related data projects and to share their own data projects, which will be indexed alongside other topically relevant research. Please email icpsr-help@umich.edu with any questions about the COVID-19 Data Repository or about ICPSR services more generally.
">
[1] ICPSR is CoreTrustSeal certified (https://www.coretrustseal.org/) and was the 2019 winner of the National Medal for Museum and Library Service from the Institute of Museum and Library Services (IMLS), among other distinctions.