The data infrastructure working group deals with issues such as data storage, data access, and reproducibility.
A. Nazli Basak, Erşen Kavak, Kevin P. Kenna, Maarten Kooyman, John E. Landers, Christien Staiger, Natalie Twine, Kristel R. van Eijk, Joke van Vugt, Brendan Kenna
We now face the so-called big-data problem: the amount of data we collect, thanks to the advent of high-throughput technologies, is growing exponentially. This is particularly true for a large-scale whole-genome-sequencing project such as Project Mine. To exploit this huge amount of data and make our science effective, we need both a powerful HPC/storage infrastructure, such as SURFsara, and a sophisticated e-infrastructure capable of providing the members of our consortium with the appropriate tools and utilities to perform their science. This infrastructure aims to guarantee full accessibility of the data and the portability and reproducibility of our science, and to address the wide range of issues that arise from the large number of collaborators involved in this project, e.g. data ownership and the heterogeneity of national privacy laws.
The Data Infrastructure Working Group has a key role in the success of Project Mine, and we aim to set a best-practice standard.