Preserve to Innovate: De-brief on Web Archiving Week 2017 in London

Sara Day Thomson

Last updated on 30 June 2017

From the 12th to the 16th of June, the member institutions of IIPC and researchers from a surprising array of backgrounds gathered at the Senate House in London for Web Archiving Week – an amalgamation of the Archives Unleashed hackathon, the annual IIPC conference, and the ReSAW Conference. The week’s activities conveyed a healthy and vibrant picture of current capture and curation practices as well as the use of archived web content for research.

Lots of people in Senate House for the first day of our #WAWeek2017 conference! See what's next on the programme at https://t.co/89tjXb2IVD pic.twitter.com/VFPrb1GlTv
— SAS News (@SASNews) 14 June 2017

Talks and demonstrations highlighted the advances made by institutional web archiving programmes and services, from integrated access for NYARC to better metadata for discovery from the OCLC Research Library Partnership Web Archiving Metadata Working Group.

.@artlibrariannyc NYARC discovery tool contains also #webarchive material on Dutch #art. #WAweek2017 https://t.co/aAZkZ3rN1H pic.twitter.com/HycQigAfG8
— Kees Teszelszky (@keesone) 16 June 2017

Hey cool, great list of researchers and thinkers on this. Great to hear summary of an epic lit review from OCLC metadata group. #WAWeek2017 pic.twitter.com/3M76mstrPu
— Ian Milligan (@ianmilligan1) 16 June 2017

The programme also included updates on the development of tools for curators and researchers as well as the creative use of existing tools. The Library Innovation Lab at Harvard demoed perma.cc, a shiny new tool for comparing an archived version of a website to its live counterpart. And Steven M. Schneider from SUNY Polytechnic Institute walked us through the use of TiddlyWiki for scholarly analysis of archived web objects.

Presentation of perma.cc : permanent records & #webarchive comparison #WAweek2017 pic.twitter.com/4VbuvKARvT
— Lucien Castex (@LucienCastex) 16 June 2017

#tiddlywiki a platform to hep researchers to analyse archived web pages https://t.co/JyqqwEy6YK #WAweek2017 pic.twitter.com/RulQ42441B
— Helena Byrne (@HBee2015) 15 June 2017

The diversity of research at the conference revealed an impressive and compelling range of ways that the study of the web provides a critical approach to almost every discipline. And, correspondingly, institutions have shown a new commitment to understanding user needs and building web archive collections researchers can use. A notable example is the user needs project at Parliamentary Archives, carried out with the help of Peter Webster.

.@C_Fryer & @pj_webster encourage more conversation about meeting user needs at PA, it's not all about about #goatgate :) :) :) #WAWeek2017
— Sara Day Thomson (@sdaythomson) 16 June 2017

Anat Ben-David from The Open University of Israel presented on the DNS leaks from North Korea in 2016. As part of her research, Ben-David showed the disparity of access to North Korean websites from different nations. The United States (from where the Internet Archive crawls its collections) is only able to return a fraction of the North Korean webpages accessible in Russia. Even Europe has greater access to the highly secretive .kp domain. This finding makes a persuasive case for national web archives to share resources.

Really outstanding forensic work by @anatbd on the complexities pf making of a web archive for North Korea at #WAWeek2017 https://t.co/Ua4TmnHBS5
— Peter Webster (@pj_webster) 15 June 2017

Nicholas Taylor from Stanford University Libraries gave an illuminating overview of the court cases where web content (from the Internet Archive) has been introduced as evidence. Taylor reported that since the first uses in 2004, courts have come to generally accept Wayback Machine records as evidence. This research goes to show that questions about the authenticity and reliability of archived web content reach far beyond the archive and library professions.

.@nullhandle: imp to help legal profs explain to juries how to interpret web archives & what assumptions/claims can be supported #WAWeek2017
— Sara Day Thomson (@sdaythomson) 16 June 2017

These research projects and many others reveal a growing interest in the study of web archives as an entire corpus rather than focusing on individual web objects. The volume of content accessible through web archives (especially national collections) has created a new resource for data analysis and longitudinal studies of trends over time. Institutions have begun to re-think their role as custodians and begun addressing the need to provide access to tools and services that enable users to access web archives as datasets, rather than ‘one at a time’. The UK Web Archive’s Jason Webber led a workshop on SHINE, a great example of a free, easy-to-use tool using the JISC UK Web Domain Dataset (1996-2010) that anyone can play with online. Even if you’re not a web researcher (yet!) and are just looking for a party trick this weekend, have a look at the Trends function. Some of us may or may not have analysed the use of our own name, place of work, hometown, alma mater, favourite band, well you get the idea…

@jasonmarkwebber demonstrating the 'trends' functionality of the 'shine' https://t.co/oLEliu9lr7 #WAweek2017 pic.twitter.com/eZuODuPEwn
— Sally Chambers (@schambers3) 15 June 2017

If you want to learn more about my talk on applying principles of digital preservation to social media archiving, head over to the SAS blog to see my (and others’) full conference papers.

Or if you want the version with pictures, here are my slides.

To sum up my experience, over the two days I was able to attend, I detected a very inspiring and encouraging theme surfacing across many of the talks and discussions. Archiving institutions and researchers alike expressed an escalating need to approach the web as it comes, rather than trying to flatten it to fit established research models and collecting practices. By trying to mould archived web resources into something that can be inserted into our current systems and ways of working, we distort meanings inherent to web technologies, platforms, and content. By doing so, we restrict the full breadth of knowledge that might be gleaned from the study of the web. If I take forward one lesson in my approach to archiving web content, especially social media and user-generated content, it is that we preserve to innovate. By ‘preserve to innovate’ I mean developing approaches to archiving and curating that reflect user needs and don’t interfere with the full range of possible uses. Our aim as archivists, librarians, curators, and tool developers is to maximise the potential investigations and experiments undertaken by researchers and journalists and policy-makers and all future users.

Web Archiving Week 2017 was a roller coaster of inspiring ideas and glimpses into the progress of web collecting and web research. Thank you to the organisers and all the wonderful speakers and participants. Until next time!

Can you believe that #WAWeek2017 is almost over? Thanks to both programme committees & PC chairs @jfwinters & @nullhandle @resaw @SASNews pic.twitter.com/TFVsoqmb6Q
— IIPC (@NetPreserve) 16 June 2017

Web Archiving Week was a big, sprawling conference – which goes to show the growth and popularity of web archiving and web research, but it also means I didn’t make it to every session. To learn more about the diverse range of talks and workshops, have a look at these other summaries:

Nicola Osborne’s transcriptions: http://nicolaosborne.blogs.edina.ac.uk/2017/06/14/
Harvard Library Innovation’s round-up blog: https://lil.law.harvard.edu/blog/2017/06/16/iipc-2017-day-two/
Web Science and Digital Libraries Research Group at Old Dominion University trip report: http://ws-dl.blogspot.co.uk/2017/06/2017-06-26-iipc-web-archiving.html
The Bodleian’s take on the week: http://blogs.bodleian.ox.ac.uk/archivesandmanuscripts/2017/06/22/waweek2017/
Peter Webster’s thoughtful de-brief: https://peterwebster.me/2017/06/19/reflections-on-web-archiving-week-2017/

If you have a conference summary not listed above, please add it to the comments below!