Guest blogger Anna Perricci at Rhizome introduces us to Webrecorder
In her recent post, Sara Day Thomson described how digital preservation can be a conversation stopper at parties and at passport control. I empathize, though for me the puzzlement she describes is a real paradox: as our lives move increasingly online, it seems obvious that some evidence of our collective neuroses, passions and creativities should be preserved. Perhaps the web’s most astonishing feature is the speed at which it has become indispensable. Yet as it becomes more crucial, it grows in size, and as it grows it becomes more complex. The tools needed to manage and preserve those essential traces of our memory therefore face a three-fold challenge of scale, complexity and expectation.
Enter Webrecorder: ‘web archiving for all!’
This began as a specialist project to capture and manage net art for museum-quality conservation. But it should come as no surprise to DPC members to hear us argue that the subtle and complex demands of digital art offer a practical basis for much of the innovation needed by the wider digital preservation community. Digital art sits at the intersection of any number of key challenges for digital preservation so, as Dragan Espenschied, Rhizome’s Preservation Director, notes in this blog post, what is good for net art is good for everyone.
Working through a web browser, Webrecorder collects content and data from web pages, including HTML, images, scripts, stylesheets, Flash and Java applets, as well as video, audio and other elements used to make web pages and web apps.
In this way, it creates high-fidelity archives that deal explicitly with the dynamic and personalised content that constitutes modern web browsing, and writes that data to a standardized file format (WARC). Most crawler-based web archiving tools cannot capture this dynamic web content, but this is an area where Webrecorder excels. It achieves this through a symmetrical web archiving approach that uses the exact same software for recording and replay: a regular web browser. That means Webrecorder captures what you see when you are logged in to a social media profile (though it does not record site login credentials).
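For readers curious about what a WARC file actually holds, here is a minimal sketch in Python of a single WARC 1.0 "response" record: a version line, a block of named header fields, a blank line, the captured HTTP response, and a closing pair of line breaks. This is an illustration only, not Webrecorder's own code; the function names build_warc_record and parse_warc_record are hypothetical, and real archives are normally written and read with a dedicated library rather than by hand.

```python
import uuid
from datetime import datetime, timezone

CRLF = b"\r\n"


def build_warc_record(target_uri: str, http_payload: bytes) -> bytes:
    """Wrap a raw HTTP response in a single WARC/1.0 'response' record.

    Hypothetical helper for illustration; not part of Webrecorder or pywb.
    """
    headers = [
        ("WARC-Type", "response"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "application/http; msgtype=response"),
        ("Content-Length", str(len(http_payload))),
    ]
    head = b"WARC/1.0" + CRLF
    for name, value in headers:
        head += f"{name}: {value}".encode("utf-8") + CRLF
    # A blank line separates the headers from the content block,
    # and two CRLFs terminate the record.
    return head + CRLF + http_payload + CRLF + CRLF


def parse_warc_record(record: bytes):
    """Split one uncompressed record into (version, header fields, content block)."""
    head, _, rest = record.partition(CRLF + CRLF)
    lines = head.split(CRLF)
    version = lines[0].decode("utf-8")
    fields = {}
    for line in lines[1:]:
        name, _, value = line.decode("utf-8").partition(":")
        fields[name.strip()] = value.strip()
    # Content-Length says how many bytes of the remainder are the payload.
    length = int(fields["Content-Length"])
    return version, fields, rest[:length]


if __name__ == "__main__":
    payload = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hi</html>"
    rec = build_warc_record("http://example.com/", payload)
    version, fields, body = parse_warc_record(rec)
    print(version, fields["WARC-Type"], fields["WARC-Target-URI"], len(body))
```

Because the record stores the HTTP response exactly as the browser received it, the same bytes can later be served back to a browser for replay.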
Webrecorder works at a human scale, as it does not yet include automated tools such as classic web crawlers. It has two main configurations: an option to download captured content (as a WARC file) and save it locally, or an online stored version that lets you build collections over time (via a free online account that comes with 5 GB of storage space). Desktop software that can open a WARC file, such as Webrecorder Player, is needed to view web archives downloaded from Webrecorder.
Our most recent release, in mid-July, included tools for extraction and patching from existing web archives. Any standard WARC (or ARC, or HAR) file, including those created by traditional web crawlers or older versions of the Webrecorder software, can now be imported and added to users’ collections. This new feature makes it possible to add high-fidelity recordings to an existing low-fidelity data collection, or indeed to import older collections. Additionally, Webrecorder gives access to an emulated (remote) browser environment optimally configured for specific recording and replay, which ensures ongoing access to web archives in a fitting, contemporaneous browser. This feature is becoming increasingly essential as existing technologies, like Flash, are phased out and no longer available in newer browser releases.
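To give a sense of how the three importable formats differ on disk, the sketch below guesses a file's type from its opening bytes: WARC files begin with a "WARC/" version line, ARC files begin with a "filedesc://" record, and HAR files are JSON documents with a top-level "log" object. This is a rough illustration under those assumptions, not a description of how Webrecorder itself detects formats; the function name sniff_archive_format is hypothetical.

```python
import gzip
import json


def sniff_archive_format(data: bytes) -> str:
    """Guess whether the bytes look like a WARC, ARC or HAR file.

    Hypothetical helper for illustration; returns "warc", "arc", "har" or "unknown".
    """
    # WARC and ARC files are often gzip-compressed; gzip data starts with 1f 8b.
    if data[:2] == b"\x1f\x8b":
        data = gzip.decompress(data)
    if data.startswith(b"WARC/"):
        return "warc"
    if data.startswith(b"filedesc://"):
        return "arc"
    # HAR is plain JSON whose root object has a "log" member.
    try:
        doc = json.loads(data.decode("utf-8"))
    except ValueError:  # covers both decoding and JSON parsing errors
        return "unknown"
    if isinstance(doc, dict) and "log" in doc:
        return "har"
    return "unknown"


if __name__ == "__main__":
    print(sniff_archive_format(b"WARC/1.0\r\nWARC-Type: warcinfo\r\n"))
    print(sniff_archive_format(b'{"log": {"version": "1.2", "entries": []}}'))
```

A sniffing step like this reads the whole file into memory, which is fine for a demonstration but would need to be replaced by streaming for large archives.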
User-created collections can be kept private or made public. Public collections can be viewed by anyone and navigated via the bookmarks listed on the collections page, or by using the arrow buttons next to the box listing the archived page’s URL as you move from page to page in the archive. Each collection is a separate unit, so at this time you can only navigate content within one collection at a time. To some this clearly defined boundary could be a limitation, but on the other hand it can be very helpful in focusing users on the materials you have curated, including views of social media, which can vary widely from one user to another. This could be compared to having an exhibition with a clear focus, as opposed to an open storage approach in which a wide swath of materials may or may not provide a narrative or clear message to a viewer. More curatorial and collection management tools are being planned, which will extend the use of Webrecorder as well as give users a clearer path as they navigate web archives.
For those interested in the technical details, Webrecorder is open-source software (under the Apache License) shared via GitHub. The Webrecorder Player is also available via GitHub. Another notable service created by Webrecorder’s lead developer, Ilya Kreymer, is OldWeb.Today, which gives users an opportunity to view web archives using a range of emulated web browsers contemporary to the 1990s through to the present day. Ilya has also been the primary developer of Python WayBack (pywb), the web archive replay and live web proxy system which serves as a core component of Webrecorder. Other projects using pywb include Perma.cc, the Portuguese Web Archive, MirrorWeb (including for the UK Government Web Archive), and Hypothes.is.
The Webrecorder team, myself included, have been thrilled to see how Webrecorder has been used so far and are very much looking forward to extending its use. Some exemplar collections include: Our First Social Media President, Amalia Ulman’s Instagram performance Excellences & Perfections, and Marisa Olson’s American Idol Training Blog (2004-2005).
As online services come and go, Webrecorder is a valuable tool to have at hand. For example, Vine, a highly popular sharing platform for six-second videos, was shut down in early 2017, but Webrecorder does have some captures of Vines from the original site. These collections made by staff at the National Football Museum show some interesting Vines, and this blog post discusses the significance of the media and platform.
Webrecorder has seen use beyond the cultural heritage sector as well. It is a key part of the NetFreedom Pioneers’ innovative Toosheh satellite datacasting program. Toosheh employees use Webrecorder to record news website content, which is then sent via satellite to people in Iran and neighboring countries. Thanks to Toosheh, users can access a wide range of information harvested from the web daily by receiving data via their already installed satellite dishes. In a region where internet access is severely limited due to government restrictions, infrastructure and costs, Toosheh provides important information that connects users to worldwide conversations both efficiently and at no charge.
Since its beginning in 2014, Webrecorder and its associated tools have made a large impact in a short amount of time. A sustainability plan is being formed and we, the Webrecorder team, are building a strong foundation both technically and organizationally. All of Webrecorder’s services have been free to users so far. Looking forward, if specific use cases and integrations require additional support or storage, we may need to charge for those services. Some premium features, which could contribute to Webrecorder’s long-term viability, might be introduced in the future as well. Currently we are completing our project goals associated with generous funding from the Andrew W. Mellon Foundation. An interface redesign is in progress and new phases of development are being pursued.
Please stay tuned for more and contact us if you have any questions or feedback.
Byline: Webrecorder is a project of Rhizome under its digital preservation program led by Dragan Espenschied. It is currently developed by Ilya Kreymer with the assistance of Senior Front-End Developer Mark Beasley, Design Lead Pat Shiu, and Contract Developer Raffaele Messuti. Webrecorder has benefitted from research done by Lozana Rossenova, who is a PhD Researcher in Digital Archives Curation at London South Bank University as well as at Rhizome. As I write this post, I am working on contract as the Partnership Manager and Sustainability Consultant for Webrecorder.