DPA2024 Finalists Legacy Arquivo

The Web is the largest and most widely used source of digital objects. Although these objects become immediately accessible to millions of people as soon as they are published, most of them are hosted solely at their original source and are at risk of being irretrievably lost. Ready-to-use tools and services are therefore required to safeguard this invaluable digital legacy for future generations.

Arquivo.pt is a public digital preservation infrastructure that enables anyone to store, search and access historical digital objects preserved from the Web since the 1990s. It contains over 20 billion digital objects (1.3 PB) in multiple formats and languages, acquired from websites from all over the world. About half of Arquivo.pt users come from outside of Portugal.

The main objective of the Arquivo.pt Catalog is to support the preservation of the born-digital information published online that governs modern societies. It does so by providing a toolkit of freely accessible services, so that any Internet user can contribute to the digital preservation lifecycle of objects published online. Because different users have different digital preservation needs, a comprehensive Catalog of tools makes it possible to fulfil most requirements.

The Arquivo.pt Catalog of tools for digital preservation was officially launched in October 2023 after 15 years of iterative development. It is composed of 13 running tools/services listed at https://arquivo.pt/catalog to support the preservation of online digital objects from their acquisition to dissemination:

  • Search and access (arquivo.pt): includes full-text search, image search, version history listing, advanced search, automatic generation of narratives and replay of web-archived content with 6 complementary options (e.g. Technical details, Complete page or Replay with old browser);

  • Application programming interfaces (arquivo.pt/api): facilitate the development of added-value applications by third parties, supporting search over URLs, full text and images (Arquivo.pt API, Image Search API, CDX-server API, Memento API);

  • Suggest websites (arquivo.pt/suggest): any Internet user can suggest websites to be safeguarded. The users only need to submit the address of the homepage and optionally provide an email, so that they can be notified when the suggested website becomes available at Arquivo.pt, and assess the quality of the web-archived content;

  • SavePageNow (arquivo.pt/savepagenow): allows users to immediately perform high-quality archiving of a set of web pages in Arquivo.pt using a browser-based crawler. The users only need to enter a page’s address and start browsing so that all the visited content is preserved, which facilitates the complete archiving of a small website to be carried out autonomously by the users;

  • Integration of historical web data collections (arquivo.pt/donate): Arquivo.pt began archiving web content in January 2008. However, it also receives donations of previously published historical web content from external sources, so that this content can be safeguarded;

  • Training (arquivo.pt/training): a free training programme that aims to raise awareness of the importance of preserving the digital legacy published online. It is composed of four modules: “New ways of searching the past”, “Well publish to well preserve”, “Automatic processing of information preserved from the Web” and “Web archiving: Do-it-yourself!”. This training programme was the seed for the publication of the book “The Past Web: Exploring Web Archives”, which gathers contributions from worldwide experts in web preservation and research;

  • Open data (arquivo.pt/dadosabertos): datasets containing metadata about the preserved digital objects that is useful for third parties, such as lists of URLs that document elections. These datasets have been reused and improved by other organisations that are also interested in preserving this digital legacy (e.g. museums);

  • CitationSaver (arquivo.pt/citationsaver): extracts links from documents and safeguards the targeted digital objects so that they can be later retrieved from Arquivo.pt. Conventional documents meant to be printed (e.g. in PDF format) cite online digital objects by referencing their URLs. However, when these links become inaccessible, even printed documents lose their integrity because their citations become useless;

  • Arquivo404 (arquivo.pt/arquivo404): presents preserved digital objects instead of error messages (e.g. “Error 404: Page Not Found”). Webmasters just need to insert one single line of code in the page that generates the 404 error message. When a user tries to access a page that is no longer available on a website, Arquivo404 automatically checks if there is a version of that page preserved in a configurable set of web archives using the Memento protocol;

  • Memorial (arquivo.pt/memorial): safeguards the digital objects which compose a website after its deactivation. Maintenance costs grow as websites become older because of the obsolescence of supporting technologies and the dangerous security vulnerabilities that follow. The Memorial offers high-quality preservation of historical web content that maintains the original domain name of the deactivated website, keeps its content searchable through live-web search engines and avoids broken links to its pages;

  • High-quality archive (on-demand): conventional crawlers quickly collect large amounts of information but sometimes miss rich media, such as embedded videos. This service enables high-quality archiving of selected websites which are iteratively archived and curated using the best combination of technologies available;

  • Creation of collections and thematic exhibitions (arquivo.pt/expos): online exhibitions of safeguarded web pages organised by theme and curated in collaboration with external entities that are experts in their fields, such as press, radio, municipalities, R&D units, schools or museums. Each exhibition is followed by dissemination campaigns promoted by the external organisations, which raise awareness of the importance of digital preservation among new audiences;

  • Itinerant exhibition of posters at external institutions (arquivo.pt/posters): the downside of preserving exclusively born-digital artefacts is that it is a challenge to catch the attention of potential new users in the physical world. Many digital preservation initiatives rely on digital methods to preserve printed documents. We reversed this strategy and printed a set of posters featuring historical digital objects published online to raise awareness of the pertinence of preserving the born-digital legacy.
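
The Arquivo404 entry in the list above mentions that preserved versions of a missing page are looked up through the Memento protocol. As an illustration only, the following Python sketch shows two generic building blocks of such a lookup as defined in RFC 7089: the Accept-Datetime request header sent to a TimeGate, and the parsing of a TimeMap in link format to pick the capture closest to a desired date. It is not the Arquivo404 implementation. The function names, the archive.example host and the sample TimeMap are all hypothetical, and the parser is a simplified one that assumes quoted attribute values.

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Simplified link-format patterns: <target>; name="value"; name="value"
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)="([^"]*)"')

# Hypothetical TimeMap for illustration; archive.example is not a real archive.
SAMPLE_TIMEMAP = (
    '<http://example.com/>; rel="original",\n'
    '<https://archive.example/20090101000000/http://example.com/>; '
    'rel="first memento"; datetime="Thu, 01 Jan 2009 00:00:00 GMT",\n'
    '<https://archive.example/20150601120000/http://example.com/>; '
    'rel="last memento"; datetime="Mon, 01 Jun 2015 12:00:00 GMT"\n'
)


def accept_datetime_header(when):
    """Build the Accept-Datetime header from a timezone-aware datetime."""
    utc_time = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(utc_time, usegmt=True)}


def parse_timemap(link_format_text):
    """Return (capture_time, url) pairs for every memento in a TimeMap."""
    mementos = []
    for target, raw_params in LINK_RE.findall(link_format_text):
        params = dict(PARAM_RE.findall(raw_params))
        relations = params.get("rel", "").split()
        if "memento" in relations and "datetime" in params:
            capture_time = parsedate_to_datetime(params["datetime"])
            mementos.append((capture_time, target))
    return mementos


def closest_memento(mementos, when):
    """Pick the memento captured nearest to `when`, or None if there is none."""
    if not mementos:
        return None
    return min(mementos, key=lambda memento: abs(memento[0] - when))


if __name__ == "__main__":
    wanted = datetime(2010, 1, 1, tzinfo=timezone.utc)
    print(accept_datetime_header(wanted))
    print(closest_memento(parse_timemap(SAMPLE_TIMEMAP), wanted))
```

In a real deployment the TimeMap text would come from an HTTP request to each configured web archive, and the actual endpoint addresses should be taken from each archive's own documentation.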

The Arquivo.pt Catalog of tools for digital preservation is an innovative and comprehensive toolkit, available to anyone, for safeguarding the digital legacy published online for future generations.
