Dirk Duellmann is the Head of the Scientific Computing Collaborations group, CERN IT department
CERN and the High Energy Physics (HEP) community in general have a long tradition of applying Open Science concepts, well before the term "Open Science" was coined. Because of the size and complexity of large-scale particle accelerator and detector projects, our community was forced from early on to consolidate effort and share key infrastructures: accelerators, computing resources (both at CERN and at partner labs worldwide) and also its crucial software investments. Compared with other sciences, this culture created an early coherence across the HEP community, which facilitated the collaborative development of open software over the decades and enabled an open exchange of data analysis methods and scientific results.
Multiple Motivations
In the past, interest in large-volume data analysis and the availability of suitable computing resources were limited to only a few scientific areas. With the advent of commercial Big Data and machine learning, however, both are becoming increasingly commonplace. Many now see the value of opening access to our data beyond HEP, and hence of extending the scientific dialogue. Consequently, funding agencies increasingly require newly funded projects to open their data. To support this movement, the IT department has developed the CERN Open Data Portal [http://cern.ch/opendata], with contributions from many partners across CERN and beyond. It already provides access to more than 2 PB of data from the LHC and other CERN experiments in a single, curated, searchable web application.
Openly available datasets already make it easier to create more realistic outreach and more effective training material, and they connect machine-learning experts in HEP with the rapid evolution of the field outside it [https://www.kaggle.com/c/higgs-boson, https://www.kaggle.com/c/flavours-of-physics]. Theory groups outside the LHC collaborations have also shown growing interest and computational capability by complementing LHC collaboration results with independent studies. This further demonstrates the increased return that opening access to CERN data brings to the community at large.
Open Data Policy Working Group
In addition to openness and sharing, the HEP community highly values complementarity: it compares the different detector designs and analysis approaches developed by independent collaborations working in overlapping research areas. In this environment, a common policy statement such as an "Open Data Policy" needs to be introduced carefully so as not to disturb the friendly competition between simultaneously active collaborations.
After preparatory discussions at the Worldwide LHC Computing Grid [https://cern.ch/wlcg, https://indico.cern.ch/event/858039/] and a presentation to the Scientific Policy Committee, the CERN Directorate earlier this year tasked a working group spanning the LHC experiments with proposing a statement on behalf of the LHC community, which could later be adopted by other experiments as well.
The Open Data Working Group, with representatives from LHC experiment physics and computing management, the CERN library unit and the CERN IT department, has met over the last few months and drafted a public policy statement. The statement expresses a common view on the benefits of open data and a commitment to follow a common approach for opening the data relevant for analysis. The experiments also exchanged practical considerations, assembled an implementation plan for the upcoming data releases, and agreed to review this plan through the Open Data Working Group over the course of the LHC run periods.
The policy draft aims to address potential risks to the collaborations' funding models and to mitigate the possible adverse impact of scientific results that have not gone through the strict publication procedures of the LHC collaborations. Both risks are managed through the planned release of statistically relevant experimental data sets and through an agreed embargo period, which gives the collaboration that built the detector and collected the data the first opportunity to exploit it.
Last but not least, sustaining data access for the community at large will require some dedicated computing resources to remain available. The joint policy statement now makes it possible to hold a concrete discussion on how to allocate the necessary development effort and storage media resources.
The resulting policy draft has recently been submitted to the respective collaboration bodies and is expected to be fully approved by the LHC experiments later this month.