This working group aims to characterise non-coding genomic variation by focussing on burden analysis of coding genes and their associated non-coding elements, and of specific classes of non-coding RNAs, using a machine learning approach.
Denis Bauer, Chen Eitan, Eran Hornstein, Alfredo Iacoangeli, Kevin P. Kenna, Natalie Twine, Nancy Yacovzada, John Quinn, Jack Marshall, Abigail Savage, Arash Bayat, Dennis Wang, Niamh Errington
Insights into the genetic causes of amyotrophic lateral sclerosis (ALS) have underpinned almost all of our current understanding of its molecular pathogenesis. Estimates of heritability for sporadic ALS are as high as 61% (Al-Chalabi et al 2010), but present knowledge accounts for only a proportion of the genetic basis in <10% of patients. A large proportion of the missing ALS heritability is likely to lie in non-coding DNA. Genetic association with ALS is significantly correlated with chromosome length (van Rheenen et al 2017), unlike the length of coding exons (Sakharkar et al 2004). In other disease areas, the role of non-coding genetic association is increasingly recognised (e.g. Michailidou et al 2017). Analysis of non-coding sequence does not benefit from well-described features such as exons and introns, to say nothing of proteomics, all of which enable efficient prioritisation of variants to identify likely pathogenic candidates. As a result, novel approaches are needed. Machine learning, and particularly artificial neural networks, have delivered best-in-class differentiation in fields as diverse as computer vision and speech recognition with relatively little calibration. Increasingly, these methods are being applied to biological problems with significant success (Angermueller et al 2016). We propose to apply both traditional and novel approaches to take advantage of the rapid increase in available sequencing data and deliver a significant step forward for ALS genetics.