Alicia Pastrana García and José Carlos Cerdán are part of the Non-Print Legal Deposit Department at the National Library of Spain.
I know: you are living through it, and you know how your day-to-day life is changing. But what about your grandchildren? Will they understand how this is changing our lives? The best view of our society today is on the Internet, especially on social networks. How will a researcher of the future understand this change without access to all the information flowing across the Web? Is anyone preserving all that information?
Yes, someone is!
At the National Library of Spain, we are doing just that, like many other national libraries around the world. We began nominating websites about the emergence and spread of the coronavirus in mid-February, responding to a call from the International Internet Preservation Consortium (IIPC). But when the situation got worse in Spain, we decided to create our own collection with a much more exhaustive selection. The first Spanish crawl was launched on March 10th. Since then, we and our collaborators from the regional libraries, working within the framework of the Library Cooperation Council, have been intensively searching for and nominating information published on the Internet about this topic.
We have already built one of the most important web collections in our history. The number of pages that have appeared, and are still appearing, in response to the spread of the coronavirus is immense, and most of them will disappear once this great crisis is over. Web collections will become one of the largest sources of information about the situation caused by COVID-19.
Strength in numbers
As mentioned above, our regional collaborators helped us search for and nominate websites, and all of their work had to be coordinated. Those were days of intense work, as we tried to make sure that we did not lose the information being generated at the beginning of the pandemic. Their collaboration has allowed us to create a collection that covers both national and local aspects.
Other departments of the Library also took part in selecting websites. The support of the Music and Audiovisuals Department has been especially important: they have provided us with a wide selection of audio and video recordings about the coronavirus created during the pandemic, more than 700 to date. We have added other videos to this selection, such as official press conferences, national and regional health campaigns, radio programs, and interviews with virologists, scientists, and other specialists on the pandemic.
Finally, we decided to ask the public for nominations through social networks, providing citizens with a web form to submit their proposals.
However, since a web curator can never be sure of harvesting everything that will be useful to future researchers (that is the nature of selective work), we decided to bring our annual broad crawl forward to July. In this way, we took a snapshot of the Spanish web in times of pandemic. For those unfamiliar with web archives, a broad crawl saves all the websites in a national domain, in our case .es.
Some figures (the kind that quickly become obsolete)
Today, the collection consists of more than 5,000 websites covering many types of content related to the disease, the situation it has created, and its consequences. It contains both official pages (public administrations, political parties, media...) and more spontaneous ones, such as citizen and neighborhood initiatives, family activities, memes, etc. It also includes more than 1,000 social media profiles and hashtags, as well as the large number of audio and video recordings mentioned above. Approximately 50 terabytes of information have been stored so far.
To be continued…
But this is (unfortunately) not over yet: the collection is still alive, and we are still working on it.
We like to think that we are building a time capsule whose content will grow in value as the years go by, and that, when this pandemic is a distant memory, the collection will be used to relive and study what happened in this period and how we dealt with it.