Maria Praetzellis is Web Archiving Program Manager for the Internet Archive
Community Webs cohort members at Internet Archive Headquarters, San Francisco, CA
We are known for many things at the Internet Archive: a place to play long lost arcade games, listen to old 78 rpm records, or, most recently, find that old MySpace track “accidentally” deleted from the live web. Most people know us by the Wayback Machine, as we are the oldest and largest publicly available web archive.
The Internet Archive’s success in preserving over 45 petabytes of unique data is due in large part to numerous collaborations with libraries, archives, and other organizations across the globe. As relatively new members of the DPC (#80), we thought we’d take this opportunity to introduce ourselves by sharing some current collaborations, hopefully sparking interest from the DPC community to join us in future endeavors.
Community Initiatives
With some notable exceptions, the majority of web archiving has happened in academic and government libraries. We have been working in partnership with a number of specific organizational communities to enable them to start or scale their efforts to provide access to preserved materials from the web. Our Community Webs program is working to provide education, training, and infrastructure, via our Archive-It service, for public librarians to develop expertise in web preservation for the purpose of documenting local history and the unique citizens and cultures of their region. Many public libraries have active local history collections of print materials, yet public libraries have historically constituted a small percentage of the web archiving community. We hope this effort can diversify and localize the overall archive of web-published records, while also providing a foray into born-digital collecting for many smaller institutions. Community Webs has been successful in bringing numerous public libraries into web archiving, and all the open educational resources and training materials from the program will be made publicly available soon. We are interested in bringing this program to other nations and welcome ideas and partnerships from the DPC community to help expand globally.
We also aim to take this program model and adapt it to other communities largely underrepresented in digital preservation, such as city or municipal archives, local historical societies, or other special collections or community archives. As an example, we are working with art libraries, which have long served as stewards of the physical materials documenting the histories of the art world, including local galleries, artist groups, and events. To pursue a similar collaborative, multi-institutional approach to scaling institutional capacity within art libraries for preserving web content, we are working with the New York Art Resources Consortium to catalyze collaboration among art libraries in the stewardship of historically valuable art-related materials published on the web. As part of that effort, we recently held a two-day national forum in San Francisco. We are excited to expand these program models to other communities, institutions, and countries.
Technology Development & Services
Internet Archive is also known for open-source and collaborative technology development. Recent work includes system integrations through the development of data transfer APIs in support of new research and access tools. Working with a range of partners, the Web Archiving Systems API (WASAPI) project created a standardized mechanism for export and import of archived web data between systems for preservation, distribution, and research use. This work has enabled our collaboration with the Archives Unleashed project’s excellent work bringing computational research to curated collections of archived web data. We are also developing service partnerships, including with a number of European partners, for distributed digital preservation with LOCKSS, preserving research data, and archiving open access scholarly works at web scale.
But wait, there’s more!
Beyond preservation and access to born-digital materials, the Internet Archive has other initiatives that we hope can further the stewardship and open knowledge communities. Internet Archive’s Open Libraries program uses controlled digital lending (CDL) to deliver more than 900,000 volumes of digitized texts to readers and researchers all over the world. Through controlled digital lending, libraries maintain an “owned-to-loaned” ratio, meaning the library circulates only the exact number of copies of a specific title it owns, regardless of format, and puts controls in place to prevent users from redistributing or copying the digitized version. While CDL is based on US copyright law, Internet Archive seeks international partners that are interested in implementing CDL for their collections. Lastly, our Great 78 Project is digitizing 78 rpm records and making them available to listeners online. We’ve been digitizing full library collections of these awesome recordings. If you have 78 collections begging to be digitized, please do get in touch.
Meet and Greet
This year we’ll be holding a meeting for all our international partners to coincide with the iPres annual conference in Amsterdam, September 17-18. All are welcome! Those in the web archiving community are especially encouraged to join. And all DPC members are welcome to visit our unique, former-church headquarters in San Francisco if you are ever in town. Just drop us a line and we will give you a tour and maybe even buy you lunch or a California IPA.
Feel free to reach out to us about any of the projects mentioned or if you see potential for new collaborations or partnerships. Keep on preservin’, DPCsters!