Following on from our excellent Digital Preservation Futures Webinar Series with DPC Supporters Arkivum, Artefactual, AVP, LIBNOVA and Preservica… there were some questions we just didn’t have time to ask! In this blog post, our Supporters provide answers to those questions:
Our Members in Australasia were keen to find out what our Supporters offer in terms of time-zone support for help desks and associated development work.
Arkivum: Arkivum’s helpdesk is based in the United Kingdom. We provide a Zendesk Portal so that customers can raise support tickets 24 hours a day, 7 days a week. Support tickets are monitored and escalated across multiple tiers of support. Work is typically done during the United Kingdom working day of 09:00hrs to 17:30hrs unless an issue is critical to the customer. The escalation and resolution process involves the Arkivum Operations Team adhering to the Help Desk Management SOP as part of Arkivum’s ISO 9001 Quality Management System.
With regard to associated development work, the Arkivum onboarding process includes Discovery and Scoping workshops to capture detailed requirements for product configuration. These workshops generate a documented project plan that is used for any required internal development work and product enhancements/capabilities.
Artefactual: Standard support is in North American Pacific Standard Time. We are moving to offering support in other time zones, starting in October of this year. Contact info@artefactual.com for more information.
Artefactual is the lead developer for both Archivematica and AtoM, but both are open-source software, so an organization interested in developing a feature or additional functionality can also work on the code itself, as can other organizations and individuals. Documentation for submitting development work for inclusion in the Archivematica public project can be found here: https://github.com/artefactual/archivematica/blob/stable/1.12.x/CONTRIBUTING.md
LIBNOVA: LIBNOVA Standard and Extended services do cover 3pm to 10pm Sydney time, but we also offer real 24/7 support options.
As we do not have customers in the Australasian region, we are not offering specific support in these time zones as we do in the USA or Europe, but, as we have always done, we will offer support in these time zones when required. For customers in this region asking for references, we provide USA- or EU-based ones that share a lot of commonalities. The LIBNOVA Cloud platform is available in Sydney.
You can find answers to this question from AVP and Preservica in their webinar recordings.
With storage, knowing it is ‘cloud’ is not enough. Some DPC Members have data that sometimes cannot leave the state (e.g. New South Wales, Victoria etc.) or the country, so they wanted to know about the physical location of the servers/storage infrastructure.
Arkivum: Currently Arkivum uses public cloud services such as Amazon Web Services (AWS) and Google Cloud Platform (GCP) for hosting the service and storing customer data. Arkivum’s solution can be deployed using the preferred cloud provider (AWS/GCP) for a given region or zone, and data will only be stored within that geographic area, such as a specific state or country. Arkivum already deploys its solution in the US, UK and Europe for this reason. As part of the Discovery and Scoping workshops, requirements for data replication and management are captured as part of the product configuration. For example, AWS has an Australia region (Sydney) and AWS supports the Australian Privacy Principles (https://aws.amazon.com/compliance/australia-data-privacy/). If data needs to be stored in a specific state that is not currently covered by AWS or GCP, then the Arkivum software solution can be deployed within a local data centre that provides similar infrastructure to AWS or GCP, e.g. object storage, VMs or Kubernetes for compute etc.
Artefactual: We have hosting infrastructure in Canada, the USA, the EU, the UK, and in Australia. In Australia specifically, data is hosted in our Sydney data centre. Contact info@artefactual.com if you would like more information.
You can find answers to this question from AVP and Preservica in their webinar recordings.
During episode 2 with Arkivum, William was poised to ask a question… but we didn’t have time for it. William asked:
"Access is often understood as being about a user as the reader, but with large quantities of data, I wonder about where access is actually a text miner or other machine learning system ..."
Arkivum: Access can be by users, by external applications that are integrated with the solution (e.g. DAMS, Institutional Repositories, CMS, web portals etc.), or, as William says, by other systems that want to interrogate, analyse and retrieve content from the system. Arkivum takes an ‘API first’ approach, which means all functionality in the system is available through REST API endpoints. Our UI is then built on top of these APIs. This means that there is a full set of API calls available for external applications to ingest, manage, search, retrieve and export content in the system. For example, in the ARCHIVER project (https://arkivum.com/news/arkivum-and-google-cloud-platform-selected-for-archiver-prototype-phase/) we are working with PB-scale scientific research datasets, including the ability for external applications to access and analyse these datasets, whether to verify previous scientific results or to use the data for new scientific analysis.
Also during episode 2 with Arkivum, the team talked about an interesting EDRMS case study in which they had encountered issues. DPC Members wanted to know more…
Arkivum:
We are working with a customer who has a large EDRMS (millions of records). They want to export some of their EDRMS content for long-term archiving and access in the Arkivum solution. Most of the issues arise from the EDRMS being proprietary and not having a full set of export capabilities. Information that can be exported is limited and not in standard formats. The organization worked with a specialist third party that knew the underlying way that the EDRMS stored and managed its records (e.g. within a SQL database) and used that knowledge to write extraction tools. These tools transformed the EDRMS internal content into a series of exports that contained metadata in XML and content as files on a file system. The exports were then further transformed to match Perpetua’s supported ingest format (e.g. BagIt bags of files with CSV metadata) and then ingested.
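The final transformation step described above, packaging exported files and their metadata as a BagIt bag with CSV metadata, might look roughly like the sketch below. It is an illustration only and is not Arkivum’s or the third party’s tooling: the function name write_bag, the record fields, the CSV column names and the choice to place metadata.csv at the bag root are all assumptions made for the example. Only bagit.txt, the data/ payload directory and the manifest-sha256.txt layout come from the BagIt specification (RFC 8493).

```python
import csv
import hashlib
from pathlib import Path


def write_bag(bag_dir: Path, records: list[dict]) -> None:
    """Write a minimal BagIt-style bag with a CSV metadata file.

    Each record is a dict with hypothetical keys:
    'filename', 'content' (bytes), 'title' and 'created'.
    """
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    # Bag declaration required by the BagIt specification (RFC 8493).
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Write each payload file under data/ and record its SHA-256 checksum.
    manifest_lines = []
    for rec in records:
        payload_path = data_dir / rec["filename"]
        payload_path.parent.mkdir(parents=True, exist_ok=True)
        payload_path.write_bytes(rec["content"])
        digest = hashlib.sha256(rec["content"]).hexdigest()
        manifest_lines.append(f"{digest}  data/{rec['filename']}")
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )

    # Assumption: metadata travels as a CSV tag file at the bag root,
    # one row per payload file. The column names are illustrative only.
    columns = ["filename", "title", "created"]
    with open(bag_dir / "metadata.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=columns)
        writer.writeheader()
        for rec in records:
            writer.writerow({key: rec[key] for key in columns})
```

In a real migration the records would come from parsing the XML exports rather than being built by hand, and the actual ingest profile would dictate where the CSV lives and which columns it carries.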
And finally, in the same episode, DPC Members found the escrow / exit plan strategy interesting and asked to hear more about how the data and metadata is structured in that scenario:
Arkivum:
The Arkivum solution can be configured to export an extra copy of all your data to an externally accessible online storage area, referred to as the ‘Escrow’ copy (typically in Microsoft Azure). This is designed to be accessible independently of any commercial arrangement with Arkivum, providing you with the extra reassurance that you will always be able to retrieve your data independently of the Arkivum service should you need to do so. In this way, escrow offers a robust exit strategy and means you are not dependent on Arkivum should you wish to migrate to another service provider or bring your data back in house. Data in the context of escrow means: your original files; any corresponding metadata ingested by you into the service; any extra versions of your files created by the service as part of digital preservation; any metadata extracted/created by the service; and a full audit trail of all processing and access to the data in the service. For example, if you have ingested a dataset/collection into the service then escrow will contain the full dataset/collection in its original form, plus any extra versions of the files that the system creates, plus an audit log of the dataset having been ingested along with any user access to its contents, plus any changes to files/metadata that might have been made whilst the dataset/collection was in the system. Therefore, in the unlikely event that you are not happy with Arkivum in the future (e.g. commercially your business requirements fundamentally change), you have the peace of mind that you would be free to move to an alternative service should you wish to, without being dependent on Arkivum.
It is important to note that escrow is in addition to all the data export and access facilities that the service already provides – escrow is there to provide extra reassurance in DR scenarios – the service already includes a full set of features for searching, finding, accessing, downloading and exporting data that can be used on a day-to-day basis.
Regarding the specific question about metadata and data structures, the metadata and data (bitstreams) are stored separately. The data is stored as a set of unique bitstreams and these can be placed on deep-archive storage if needed (e.g. to save costs). The metadata references these bitstreams and contains all the information about their context (e.g. original filenames, dates, permissions etc., as well as associated descriptive, technical and administrative metadata). This includes information on the structure of the data, e.g. how a set of files is logically and/or physically arranged into folders/hierarchies etc. The metadata also includes a full audit log. The metadata will typically be held on faster-access storage so it can be easily updated/extended/retrieved and hence kept in sync with the content in the Arkivum solution. Both metadata and bitstreams can be encrypted if desired (in addition to any storage-level encryption that might also be in place). To help customers retrieve and use escrowed content, Arkivum provides an open-source toolkit that will retrieve content from an escrow location, decrypt it (if needed), and reconstitute original filenames and folder structures – which, for example, might result in a series of AIPs on the user’s local filesystem. This can be done without any interaction with the Arkivum service or Arkivum as a business.
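To make the separation of bitstreams and metadata more concrete, the sketch below shows one way original filenames and folder structures could be rebuilt from such a layout. It is not the Arkivum escrow toolkit and does not reflect its actual formats: the function name restore, the entry fields (bitstream_id, sha256, original_path) and the idea that each bitstream is a file named by its identifier are all assumptions made for illustration. Decryption is left out.

```python
import hashlib
import shutil
from pathlib import Path


def restore(bitstream_dir: Path, entries: list[dict], out_dir: Path) -> list[Path]:
    """Rebuild original files and folders from bitstreams plus metadata.

    Each entry is a dict with hypothetical keys:
    'bitstream_id', 'sha256' and 'original_path'.
    """
    out_root = out_dir.resolve()
    restored = []
    for entry in entries:
        source = bitstream_dir / entry["bitstream_id"]

        # Fixity check: the bitstream must match the checksum held in metadata.
        digest = hashlib.sha256(source.read_bytes()).hexdigest()
        if digest != entry["sha256"]:
            raise ValueError(f"checksum mismatch for {entry['bitstream_id']}")

        # Refuse paths that would land outside the output directory.
        target = (out_root / entry["original_path"]).resolve()
        if not target.is_relative_to(out_root):
            raise ValueError(f"unsafe path: {entry['original_path']}")

        # Recreate the original folder hierarchy and filename.
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, target)
        restored.append(target)
    return restored
```

Because the bitstreams are unique, two metadata entries may point at the same bitstream (for instance, the same file held in two folders). In this sketch that simply produces two copies with different original paths.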
DPC Members can watch all of the episodes in the Digital Preservation Futures webinar series again by logging into the website to access the recordings.
Want to know more? Contact our DPC Supporters to continue the conversation.