25 April 2022

Digital Collections Management and Services requires contract support to analyze the technical characteristics of complex, heterogeneous eBook and eJournal content accessible in Stacks, the onsite access platform of the Library of Congress (LOC/Library), to inform preservation planning. This content is published, born-digital material acquired from a wide range of publishers through the Cataloging in Publication (CIP) Program and through Copyright Deposit through the U.S. Copyright Office.

The contractor shall analyze the technical characteristics of complex, heterogeneous eBook and eJournal content accessible in Stacks, the Library’s onsite access platform, to inform preservation planning. Using specialized tools such as Apache Tika, the research will examine the structure and composition of over 50,000 ePub files, 100,000 PDF files, and a small number of XML/ONIX for Books, JATS, and HTML files. Many of these sets of files contain embedded content, such as audio, video, and other interactive features, whose presence and characteristics are not fully transparent. This research will inform action plans for access and preservation.
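As a rough illustration of what characterizing an ePub for embedded media can involve, the sketch below tallies the media types declared in a package's manifest. It is not the project's method and does not use Apache Tika; it relies only on the Python standard library, and the function names manifest_media_types and embedded_media are hypothetical. It assumes a well-formed ePub whose META-INF/container.xml points to an OPF package document, and it reports only resources declared in the manifest, so undeclared or remotely hosted media would not appear.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# XML namespaces used by the ePub container file and the OPF package document.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def manifest_media_types(epub):
    """Count the media types declared in an ePub's OPF manifest.

    `epub` is a file path or a binary file-like object (hypothetical helper).
    """
    with zipfile.ZipFile(epub) as zf:
        # container.xml names the location of the OPF package document.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.attrib["full-path"]

        # The OPF manifest lists each declared resource with its media type.
        package = ET.fromstring(zf.read(opf_path))
        items = package.findall("opf:manifest/opf:item", OPF_NS)
        return Counter(item.attrib.get("media-type", "unknown") for item in items)


def embedded_media(counts):
    """Keep only the audio and video entries from a media-type tally."""
    return {
        media_type: n
        for media_type, n in counts.items()
        if media_type.startswith(("audio/", "video/"))
    }
```

Running manifest_media_types over a collection and aggregating the results would give a first view of which files declare audio or video resources; a tool such as Apache Tika would be needed to inspect the embedded content itself.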

The deliverables from the project shall include:

  • Comparison matrices of characterization tools and rendering tools for the formats supporting eBook and eJournal content

  • Gap analysis for unmet needs in tools for specific formats

  • Report detailing process and outcomes, with LOC-focused and community-wide recommendations

  • Meeting (method to be determined) with LOC staff and selected community members to discuss project results and recommendations

