This blog post was written by Laurence Brewer, Elizabeth England, Lisa Haralampus, Leslie Johnston, and Markus Most at the National Archives and Records Administration (NARA).
Background
The U.S. federal government has signaled how important web content is to the public documentation and experience of government. According to U.S. government website tracking data, every 90 days there are over 5 billion visits to government websites. Enabling long-term preservation of government website content is critical to public understanding of the government and its history.
In 2006, NARA implemented a program in the Center for Legislative Archives to capture and archive Congressional web records. The websites for each Congress have been archived since then and are available to the public at webharvest.gov. A vendor supports this effort by both crawling the sites and hosting the archived copies for public access. NARA has taken a different approach to Presidential web records, preserving and presenting whitehouse.gov and related sites after the end of each Administration.
Image Description: Screenshot of Archived Clinton White House Website from July 2000, available at https://clintonwhitehouse3.archives.gov/
In 2022, NARA chartered a Web Records Archiving Project working group to identify policy needs and create recommendations for capturing, processing, preserving, and providing access to federal government web records as part of the National Archives. The group consists of staff from across the organization, representing experts from all parts of the records lifecycle. The working group serves in both an advisory and a functional capacity to ensure that NARA's proposed federal web archiving activities meet NARA's mission.
Building the Team
NARA anticipated that implementing a web archiving program would be challenging. From staffing to funding to technology, web archiving will require changes to every aspect of the records lifecycle, from policy to public access. This broad impact on the organization made it imperative for us to collaborate across almost every office at NARA: the Office of the Chief Records Officer, which provides guidance for agencies and would issue policies for the selection and capture of websites; the custodial units responsible for processing and description; digital preservation; the information technology team that would have to support the infrastructure; and the Office of Innovation, which is responsible for the internal description tool and the National Archives Catalog. The planning effort would not have succeeded without the active support of NARA’s leadership and the collaborative, inclusive participation of staff who would have a role in every part of the proposed web archiving lifecycle.
Understanding the Environment
One of our working group’s first steps was to ensure everyone had a shared understanding of the broader environment of web archiving. We began by interviewing staff at the National Archives already involved in Congressional and Presidential web archiving. Then we engaged with other national archives and libraries, U.S. state government archives, the International Internet Preservation Consortium, and federal government agencies with existing web archiving programs. Learning about the different collecting scopes, funding and staffing levels, technical approaches, and processes used by these institutions was extraordinarily helpful, and it allowed us to identify requirements and best practices for a NARA web archiving program.
As a critical part of our planning, we also needed to understand the expectations of user groups. We hosted discussions with researchers who use web archives in their work and teaching, including the use of web archives as datasets, as well as public interest groups with a focus on federal web content preservation and ongoing access. Their feedback increased our awareness of the diverse uses for web archives and will shape the public access goals of our proposed program.
What Comes Next
The working group has completed its first phase of work: conducting benchmarking, evaluating potential approaches, and developing recommendations for NARA to implement a web archiving program. This month, our working group began a new 18-month phase in which we will develop draft guidance, consider different public access approaches, document metadata requirements, and identify a feasible technical strategy for implementing a web archiving program. We appreciate all the people who made the time to share their work and answer our questions; continued collaboration across NARA and with outside communities will be foundational for a web archiving program to succeed.