Project MinE, the largest genetic ALS research initiative to date
Because researchers have not yet discovered the cause of sporadic ALS, Project MinE has been set up. It is the largest genetic ALS research initiative to date and is designed to overcome the limited statistical power of previous genetic studies. The project involves ALS researchers worldwide and is a typical example of ‘bigger is better’: large amounts of genetic data (‘big data’) should help identify the cause of ALS and thereby speed up the development of an effective treatment for this devastating disease. At the ALS Centre Netherlands, a team has been set up to handle this big data. Kristel Kool-Van Eijk is the bioinformatician in this team, which means that she makes sure researchers can work with the enormous amount of genetic data.
Whole genome sequencing
Kristel explains why Project MinE is different from other research projects: “Project MinE researchers aim to analyse the genomes of 15,000 ALS patients and 7,500 control subjects worldwide with whole genome sequencing. This means reading the complete DNA sequence of each genome, not just the genes. This generates more than 6 petabytes of data, an amount that is almost unprecedented in medical research. Obviously, this amount of data needs the best and largest facilities for storage and analysis, and these are now being arranged.”
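To give a sense of scale, the figures in the quote can be turned into a rough per-genome average. This is a back-of-the-envelope sketch, assuming decimal units (1 petabyte = 10^15 bytes, 1 GB = 10^9 bytes) and treating the stated 6 petabytes as a lower bound; real file sizes per sample depend on sequencing depth and file formats, which the article does not specify.

```python
# Back-of-the-envelope arithmetic using the figures quoted in the article.
# Assumption: decimal SI units (1 PB = 10**15 bytes, 1 GB = 10**9 bytes).
PATIENTS = 15_000
CONTROLS = 7_500
TOTAL_BYTES = 6 * 10**15  # "more than 6 petabytes", so treat as a lower bound


def genomes_total(patients: int, controls: int) -> int:
    """Total number of genomes to be sequenced."""
    return patients + controls


def average_gb_per_genome(total_bytes: int, n_genomes: int) -> float:
    """Average storage per genome in gigabytes (10**9 bytes)."""
    return total_bytes / n_genomes / 10**9


n = genomes_total(PATIENTS, CONTROLS)
avg = average_gb_per_genome(TOTAL_BYTES, n)
print(f"{n} genomes, at least ~{avg:.0f} GB each on average")
```

Under these assumptions each genome accounts for roughly 270 GB of storage on average, which is why ordinary institutional storage is not sufficient.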
Together with her research colleagues at the ALS Centre Netherlands, Kristel is testing the analysis pipeline. Analysing the data of all 22,500 genomes would take more than 1,600 core years, that is, more than 1,600 years of computing time on a single processor core. To handle all data efficiently, Project MinE needs more storage and analysis capacity than any project before. Even with a supercomputer, the analyses will still take a few months.
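The relationship between core years and the actual (wall-clock) duration can be sketched as simple division: the total work is spread over many processor cores running in parallel. This is an idealised sketch, assuming perfect parallel scaling with no time lost to data transfer, queueing or coordination; the core counts below are hypothetical and do not describe the actual hardware used by Project MinE.

```python
# Idealised arithmetic: total work in core years spread evenly over many cores.
# Assumption: perfect parallel scaling (no I/O, queueing or coordination overhead).
CORE_YEARS_TOTAL = 1_600  # "more than 1,600 core years", so treat as a lower bound
N_GENOMES = 22_500


def core_days_per_genome(core_years: float, n_genomes: int) -> float:
    """Average compute per genome, in days on a single core."""
    return core_years * 365 / n_genomes


def wall_clock_months(core_years: float, cores: int) -> float:
    """Wall-clock duration in months if the work splits perfectly over `cores` cores."""
    return core_years * 12 / cores


print(f"~{core_days_per_genome(CORE_YEARS_TOTAL, N_GENOMES):.0f} core-days per genome")
for cores in (1, 1_000, 5_000, 10_000):  # hypothetical core counts, for illustration only
    print(f"{cores:>6} cores: {wall_clock_months(CORE_YEARS_TOTAL, cores):,.1f} months")
```

With these assumptions, each genome needs about 26 days on a single core, and spreading the work over a hypothetical 5,000 cores would bring the total down to roughly four months, consistent with the "few months" mentioned above.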
Handling petabytes of data
To store and process this amount of data safely and quickly, Project MinE uses the services of SURFsara, a Dutch non-profit organisation that provides secure, high-speed storage for petabytes of data. Researchers of the ALS Centre Netherlands are now testing how fast data can be uploaded to and analysed at SURFsara. All participating ALS centres worldwide need to be able to upload their genomes, retrieve their data and perform their analyses, without these processes becoming too time-consuming. As SURFsara's supercomputer ranks 45th among the 500 fastest supercomputers worldwide, expectations are high.
To support this important and innovative effort to unravel the whole genome sequencing data of 22,500 DNA samples, the Netherlands ALS Foundation has approved a grant application from the ALS Centre Netherlands covering the costs of data storage and analysis for the Project MinE consortium (called ‘Beyond Project MinE NL’). The funding for this part of Project MinE includes 1 million euro raised through the Amsterdam City Swim event in 2014.