• Johnathan Cooper-Knock, UK


Denis Bauer, Chen Eitan, Eran Hornstein, Alfredo Iacoangeli, Kevin P. Kenna, Natalie Twine, Nancy Yacovzada, John Quinn, Jack Marshall, Abigail Savage, Arash Bayat, Dennis Wang, Niamh Errington


Insights into the genetic causes of amyotrophic lateral sclerosis (ALS) have underpinned almost all of our current understanding of its molecular pathogenesis. Estimates of heritability for sporadic ALS are as high as 61% (Al-Chalabi et al 2010), but present knowledge accounts for only a proportion of the genetic basis in <10% of patients. A large proportion of the missing ALS heritability is likely to lie in non-coding DNA. Genetic association with ALS is significantly correlated with chromosome length (van Rheenen et al 2017), unlike the length of coding exons (Sakharkar et al 2004). In other disease areas the role of non-coding genetic association is increasingly recognised (e.g. Michailidou et al 2017). Analysis of non-coding sequence does not benefit from well-described features such as exons and introns, to say nothing of proteomics, all of which enable efficient prioritisation of variants to identify likely pathogenic candidates. As a result, novel approaches are needed. Machine learning, and particularly artificial neural networks, have delivered best-in-class performance in fields as diverse as computer vision and speech recognition with relatively little calibration. Increasingly these methods are being applied to biological problems with significant success (Angermueller et al 2016). We propose to apply both traditional and novel approaches to take advantage of the rapid increase in available sequencing data and deliver a significant step forward for ALS genetics.


  1. Mapping different non-coding DNA categories, including (i) promoters and enhancers; (ii) transcribed non-protein-coding RNAs (e.g. introns, miRNAs, lncRNAs, siRNAs, piRNAs, snoRNAs, snRNAs, exRNAs and scaRNAs).
  2. Developing methods for calling qualifying variants in different subtypes of non-coding regions.
  3. Aggregating non-coding regions with functionally associated coding genes where possible.
  4. Linear analysis of burden of variants in non-coding elements and associated groups of elements.
  5. Increasing detection power for complex association patterns and measurable modifier effects via machine learning approaches, including random forests and artificial neural networks.
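
To make aims 3 and 4 concrete, the following is a minimal sketch of a collapsing burden comparison, not the project's actual pipeline. It assumes that qualifying variants have already been called (aim 2) and that each non-coding element has been assigned to a functionally associated gene (aim 3). A one-sided Fisher's exact test on carrier counts is swapped in here as a simple stand-in for a regression-based burden analysis. All names (`burden_test`, `element_to_group`, `GENE_A`, the sample IDs) and the toy data are hypothetical.

```python
from math import comb


def fisher_exact_greater(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact test: probability of seeing at least
    `case_carriers` carriers among cases under the hypergeometric null."""
    carriers = case_carriers + control_carriers
    total = n_cases + n_controls
    upper = min(carriers, n_cases)
    numerator = sum(
        comb(carriers, k) * comb(total - carriers, n_cases - k)
        for k in range(case_carriers, upper + 1)
    )
    return numerator / comb(total, n_cases)


def burden_test(variants, element_to_group, case_ids, control_ids):
    """Collapse qualifying variants onto groups of elements and compare
    carrier counts between cases and controls.

    variants: iterable of (sample_id, element_id) qualifying variant calls.
    element_to_group: maps each non-coding element to its associated gene.
    Returns {group: (case_carriers, control_carriers, p_value)}.
    """
    cases, controls = set(case_ids), set(control_ids)
    carriers_by_group = {}
    for sample_id, element_id in variants:
        group = element_to_group.get(element_id)
        if group is None:
            continue  # element has no associated gene; skipped in this sketch
        # A set means each sample counts once per group, however many
        # qualifying variants it carries there.
        carriers_by_group.setdefault(group, set()).add(sample_id)

    results = {}
    for group, sample_ids in carriers_by_group.items():
        a = len(sample_ids & cases)
        c = len(sample_ids & controls)
        p = fisher_exact_greater(a, len(cases), c, len(controls))
        results[group] = (a, c, p)
    return results


# Toy, entirely hypothetical data: four cases, four controls.
case_ids = ["c1", "c2", "c3", "c4"]
control_ids = ["n1", "n2", "n3", "n4"]
element_to_group = {"enh_1": "GENE_A", "prom_1": "GENE_A", "enh_2": "GENE_B"}
variants = [
    ("c1", "enh_1"), ("c2", "prom_1"), ("c3", "enh_1"), ("c1", "prom_1"),
    ("c4", "enh_2"), ("n1", "enh_2"),
]

results = burden_test(variants, element_to_group, case_ids, control_ids)
for group, (a, c, p) in sorted(results.items()):
    print(f"{group}: case carriers={a}, control carriers={c}, p={p:.4f}")
```

Collapsing to carrier status per group is one of several possible design choices; a real analysis would typically also adjust for covariates such as ancestry and sequencing platform, and correct for the number of groups tested.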