Emily Chen

Last updated on 11 June 2021

Emily Chen is Digital Archivist at the Parliamentary Archives. 


I actually started in the middle of the pandemic so I can’t tell you what business as usual looks like at the Parliamentary Archives (I’ve been told that biscuits figure hugely in it though). I am part of the Digital Preservation Team here, and so whilst some of my colleagues have begun returning to the office, I have only managed to be on site a grand total of two times since I started. To me working from home IS the norm. The joy and curse of being a digital archivist!

Whilst many things had to be put on hold in the middle of the pandemic, much of our work continued and was indeed able to expand, primarily in two areas: ingesting born-digital records into our digital repository and archiving the Parliamentary web estate.

A major focus of our ingest was on the roughly quarter of a million records pulled from our decommissioned ERMS (SPIRE), which reached end of life in 2019. Since then we have established a mature workflow that streamlines and automates a great deal of the cataloguing of these records. This meant that even with no prior digital preservation experience, it only took a bit of training for our cross-office colleagues to start ingesting on their own. They were able to give us their invaluable help and support in steaming ahead with the ingest, and over the period April 2020 to March 2021 we ingested nearly two-thirds of all the folders exported to date. This was around 13,000 individual folders (each corresponding roughly to one child workflow).

SPIRE Child Workflows Per Month (Mar 2020 - Mar 2021)

One of the benefits that came out of it (besides the obvious!) was that it gave many of our colleagues a chance to try their hand at digital preservation and demystify it a bit, as well as letting them see what our day to day work is like.

The other area where we made big strides was wrangling our web archive into shape. The Parliamentary web estate has grown and changed continually, and this past year has been no different.

We’ve seen some dramatic overhauls to large sections of the site. Some we were very glad to see go (old Calendar site, I’m looking at you) as they were a nightmare to crawl, with our crawlers getting stuck in endless loops. Some broke away from the main site and got their own domain names, e.g. https://northernestate.uk and https://restorationandrenewal.uk, but they’re still part of us and we will continue to support and archive them. Some changes were less helpful, like a sudden mania for serving dynamic content whenever and wherever possible, but we’re working on it!

Overall, there has been a definite shift towards more dynamic images and content and much less text. This can be seen in these visual breakdowns of the file types crawled from the Parliament Week site: on 22 Jan 2020 roughly 75% of the content was text/html and images were a measly 3%, whereas by 12 Mar 2021 text/html had dropped to 27.5% and images had moved to take the lion’s share at 64%.

 MIME types for Parliament Week - 22 Jan 2020 / 12 Mar 2021
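A breakdown like the one above can be computed from the list of MIME types recorded during a crawl. The following is a minimal sketch in Python, not the tooling used at the Archives: the function name `mime_breakdown`, the choice to group all image types together, and the sample data are all assumptions made for illustration.

```python
from collections import Counter


def mime_breakdown(mime_types):
    """Return {category: percentage share} for a list of MIME type strings.

    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for mime in mime_types:
        # Drop parameters such as "; charset=UTF-8" and normalise case.
        base = mime.split(";", 1)[0].strip().lower()
        # Group every image/* type together; keep other types as they are.
        key = "image/*" if base.startswith("image/") else base
        counts[key] += 1

    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: round(100 * n / total, 1) for key, n in counts.items()}


# Made-up sample, NOT real crawl data from the Parliament Week site.
sample = (
    ["text/html; charset=UTF-8"] * 3
    + ["image/png", "image/jpeg"] * 2
    + ["application/pdf"]
)

for category, share in mime_breakdown(sample).items():
    print(f"{category}: {share}%")
```

Running the same function over the MIME types from two crawls of the same site would give a pair of breakdowns that can be compared side by side.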

Over the course of the year a lot of work has been put into catching any technical glitches, checking for missing content and ensuring that the look and feel of sites were being properly captured. We’ve also taken steps to outline and formalise the QA process, prioritising problem sites and identifying common problems and their solutions. During that time the day-to-day work of web archiving has grown from a largely solo (and heroic) effort to a group one, with the help of keen volunteers from other teams.

While 2020 has shown us that we CAN work from home, I'm looking forward to meeting my colleagues in person and building on the connections we’ve made by sharing our work and living in each other’s homes.

