• Alfredo Iacoangeli, King's College London, UK


A. Nazli Basak, Erşen Kavak, Kevin P. Kenna, Maarten Kooyman, John E. Landers, Christien Staiger, Natalie Twine, Kristel R. van Eijk, Joke van Vugt, Brendan Kenna.


We now face the so-called big-data problem. The amount of data that we are collecting, thanks to the advent of high-throughput technologies, is growing exponentially. This is particularly true for a large-scale whole-genome-sequencing project such as Project Mine. To exploit this huge amount of data and make our science effective, we need both a powerful HPC/storage infrastructure, such as SURFsara, and a sophisticated e-infrastructure capable of providing the members of our consortium with the appropriate tools and utilities to perform their science. This complex infrastructure aims to guarantee full accessibility of the data and the portability and reproducibility of our science, and to deal with the wide range of issues that arise from the large number of collaborators involved in this project, e.g. data ownership and the heterogeneity of national privacy laws.

The Data Infrastructure Working Team has a key role in the success of Project Mine, and we aim to set a best-practice standard.


  1. Ensure reproducible and consistent workflows in Project Mine, including the sharing of scripts and next-generation sequencing bioinformatics pipelines.
  2. Explore other ICT set-ups enabling faster computational methods where possible and as needed (e.g. GPUs, Apache Spark, etc.).
  3. Explore ICT set-ups enabling efficient data sharing with the consortium members/collaborators and the major external data hubs (AnswerALS, external control data sets).
  4. Advise on optimal data compression for Project Mine (to be decided in General Assembly).