The #DPClinic at the end of May, took the form of a “watch party” on the theme of database preservation. Peter Francis, Manager, Standards and Policy at PROV (Public Record Office Victoria), recently gave a presentation at an online event in Australasia, where he described using the SIARD Suite (Software Independent Archiving of Relational Databases) tools to convert and store database content. At the watch party, highlights from a recording of Peter’s presentation were shared with approximately 25 attendees, drawn from Europe and US.
Peter began by explaining how the SIARD Suite of tools was developed by the Swiss Federal Archive, and has gained acceptance in jurisdictions across Europe. SIARD is both a format (.siard) and a software application, and Peter discussed both aspects during his presentation. PROV has adopted SIARD as a long-term preservation format and now accepts transfers of relational databases as SIARD VEOs (VERS Encapsulated Objects). This approach enables data to be stored independently of the original database software and, when required, loaded into modern database application for reuse. The SIARD Suite currently supports the archiving of the following database types:
-
MS Access 2007 or newer
-
DB/2 or newer
-
MySQL (or MariaDB) 5.5 or newer
-
Oracle 10 or newer
-
PostgreSQL 11 or newer
-
SQL Server 2012 or newer
Peter then offered a demonstration of the SIARD Suite in action. After connecting to the source database as user (with all the permissions and constraints associated with that account), the software generated a SIARD file – essentially an XML file containing a representation of the database structure, data, procedures, and views). The next step produced a SIARD VEO file which, in addition to the previous SIARD file, also included archival metadata, and any documentation of the system’s design and use. This was a product-independent file, approximately the same size as the original database but suitable for long term storage, which PROV could then ingest into their preservation system.
To access the preserved database, Peter showed how the SIARD Suite could be used to create an empty relational database (in any of the supported formats), which was then populated with the stored data, thereby enabling PROV users to access the data and functionality of the original. He explained that PROV had undertaken a number of test scenarios, and found that a database of c.5-6GB could be re-created and made available using the SIARD Suite in under 30 minutes, using a standard office PC at PROV. Peter also explained that as part of their testing they had used an open source tool called SQuirreL SQL to compare the imported and exported databases, to confirm the successful transfer of data. Peter felt that because the SIARD Suite is open and straightforward to use, it provides a good entry point for any organization that wants to manage databases-as-records in the way he had described.
After watching the recording of Peter’s presentation, the following questions were raised by members of the audience. We passed these questions to Peter and he has kindly provided answers which are included below:
Q: What is the difference between the SIARD toolkit and the Database Preservation Toolkit (DBPTK) and why did PROV choose one over the other - do they have different functionality? What are the pros and cons?
A: Our decision was not based on which tool was superior, but which one suited our purposes at the time.
We haven't evaluated the latest versions of the DBPT (hopefully soon) but found the earlier ones interesting. They also offer functionality beyond the .siard format, such as creation of SIP and AIP.
We discuss some of the benefits of the SFA SIARDedit tool in the evaluation report. Our expectation (hope!) is that agencies will prepare their records for transfer to the archive. The availability of a GUI was a factor therefore, as it made the process more 'visible' to agency staff. There are some speed advantages in using command line (which SIARDedit also has). In our testing it was about 20% faster than the GUI. But, as I said, we had a focus on adoption first and foremost. Our thinking was, and is, that if we can build confidence and familiarity using SIARD, then we can introduce a range of other tools.
Q: When discussing the creation of the SIARD VEO file, Peter mentioned the importance of documentation – but who should create such documentation (i.e. is it best created by the people who manage and use the database?) and when should the documentation be created (i.e. is it important to try and get in there early before the database becomes 'legacy' and the knowledge about it disappears)?
A: It is rare to find a system that is well documented so much of it will need to be put together at the time of preservation/transfer. You may find that if the system was recently superceded, the people doing the new system may have generated some valuable documentation of the old system in order to do transformation and migration.
The people with an interest in the preserved database, in our case the transfer archivists, will naturally drive the documentation process. This is much harder once the trail has gone cold. If the system is still available, then it makes sense to capture screenshots of any relevant interfaces (input forms, query forms, results pages, etc.) along with any relevant reports the system generates. If possible, it would be great to talk to and observe someone using the system - they may even let you video them.
This all sounds like a lot. We have done very few transfers to date so we expect to simplify and refine our ideas on documentation as we do more. I should explain that, from 2015 to 2020 we systematically replaced our digital archival systems end to end. Our digital transfer people were heavily involved as SMEs and we even had to halt transfers for a couple of years. Then of course, COVID happened. We have only now returned to planning to increase our database transfers.
More information about PROV’s use of SIARD can be found here, including a link to their original evaluation report: https://prov.vic.gov.au/recordkeeping-government/a-z-topics/software-independent-archiving-relational-databases-siard
Peter’s presentation can be watched back in full from our #DPClinic event page.