iPres 2019 session - NEW HORIZONS // Web Archiving

Leontien Talboom

Last updated on 2 October 2019

Leontien is a collaborative PhD student at The National Archives, UK, and University College London; her research is about access to born-digital material. She attended iPres 2019 with support from the DPC's Leadership Programme, which is generously funded by DPC Supporters.

The first session I will be covering from iPres is on Web Archiving. My own research is about access to born-digital archival material, and since websites and other web material are among the many examples of born-digital material, I couldn't miss this session! Three papers were presented, each taking a slightly different approach to web archives.

Who is asking? Human & Machine experience a different Scholarly Web

In the first paper, Martin Klein opened with an engaging thought experiment comparing the use of a DOI in a web browser to calling the emergency services by phone. At first the two seem to have nothing in common, but, as Klein explained, calling the same service (here, the emergency services) should not produce a different result depending on the type of device used (a mobile phone, an older mobile phone, or a landline). The same should hold for a DOI: it should lead to the same result regardless of the browser or service used to access the web page.

However, as Klein highlighted, this is far from true. When investigating the top 100 DOIs from the 100 most frequent domains in the Internet Archive's crawl of the scholarly domain, different results appeared depending on how the requests were made. This is concerning, as DOIs are supposed to be a reliable and stable way of referencing academic articles.
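To make the kind of comparison Klein described a little more concrete, here is a minimal Python sketch (not the method used in the study) that resolves a single DOI through the doi.org resolver once per client profile and checks whether every client ends up at the same HTTP status and URL. The client names, User-Agent strings, and the helper functions `fetch_via_urllib`, `resolve_per_client`, and `is_consistent` are hypothetical illustrations; the study itself may have varied its requests in other ways.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Tuple

# Hypothetical client profiles: these User-Agent strings are illustrative
# assumptions, not the ones used in Klein's study.
CLIENTS: Dict[str, str] = {
    "browser": "Mozilla/5.0 (X11; Linux x86_64) Firefox/68.0",
    "crawler": "example-crawler/1.0 (+https://example.org/bot)",
    "script": "python-urllib/3",
}

Outcome = Tuple[int, str]  # (final HTTP status, final URL after redirects)


def fetch_via_urllib(url: str, user_agent: str, timeout: float = 10.0) -> Outcome:
    """Request `url` with the given User-Agent, following redirects."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.geturl()
    except urllib.error.HTTPError as err:
        # Error responses (e.g. a 403 for a suspected bot) still carry a
        # status code and the URL at which the redirect chain ended.
        return err.code, err.geturl()


def resolve_per_client(
    doi: str,
    clients: Dict[str, str],
    fetch: Callable[[str, str], Outcome] = fetch_via_urllib,
) -> Dict[str, Outcome]:
    """Resolve one DOI via doi.org once per client profile."""
    url = "https://doi.org/" + doi
    return {name: fetch(url, ua) for name, ua in clients.items()}


def is_consistent(results: Dict[str, Outcome]) -> bool:
    """True if every client ended at the same status and the same URL."""
    return len(set(results.values())) <= 1
```

For example, `resolve_per_client("10.1000/182", CLIENTS)` would resolve the DOI Handbook's DOI three times over the network, while passing a different `fetch` function lets the comparison logic be exercised offline. Comparing only the final status and URL is a deliberately coarse check: two clients could land on the same URL and still be served different content.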

Saving Data Journalism: Using ReproZip-Web to Capture Dynamic Websites for Future Reuse

The second talk, given by Vicky Steeves and Katherine Boss, focused on the preservation of News Apps. News Apps are interactive, dynamic websites built by journalists; they have a database at the back end and sometimes draw on external APIs. This back end and these external APIs, on top of the dynamic and interactive nature of the sites, make them difficult to archive. Current web crawls are good at capturing the static web but struggle to capture all the elements of the more dynamic web, such as these News Apps. This is a growing concern as data journalism itself grows.

Steeves and Boss propose a different way of archiving this material with an emulation-based web archiving tool called ReproZip-Web. The tool is meant to be set up in conjunction with the newsroom creating these News Apps and to be easy for them to use, as only four commands are needed to run it.

Data Stewards and Digital Preservation in Everyday Research Practice

The last talk was given by Esther Plomp, a data steward at TU Delft. Every faculty at TU Delft has data stewards, who help improve the data produced by researchers. The data stewards do not improve this material themselves; instead, they encourage and help the researchers in their faculty to do so.

When this talk started, it was unclear how it tied in with the other two web archiving talks, but Plomp made the connection with an example: the websites of research projects, which need to be preserved once a project comes to an end.

Question Round

After the talks, there was some time left for questions. These focused mainly on ReproZip-Web: how the tool is sustained, whether the public would ever get access to the preserved material, and how willing newsrooms are to preserve this material. The session was well led and the talks were very inspiring. If you want any further details on the talks, the papers can be found here.
