TY - JOUR
T1 - Automated machine learning for Genome Wide Association Studies
AU - Lakiotaki, Kleanthi
AU - Papadovasilakis, Zaharias
AU - Lagani, Vincenzo
AU - Fafalios, Stefanos
AU - Charonyktakis, Paulos
AU - Tsagris, Michail
AU - Tsamardinos, Ioannis
N1 - KAUST Repository Item: Exported on 2023-09-08
Acknowledgements: The research work was supported by the European Research Council (ERC) under the European Union’s Seventh Framework Programme (FP/2007–2013) (grant agreement no 617393); the METALASSO project, which is co-financed by the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH – CREATE – INNOVATE (project code: T1EDK-04347); and the Hellenic Foundation for Research and Innovation (H.F.R.I.) under the “First Call for H.F.R.I. Research Projects to support Faculty members and Researchers and the procurement of high-cost research equipment grant” (Project Number: 1941). This study makes use of data generated by the Wellcome Trust Case-Control Consortium. A full list of the investigators who contributed to the generation of the data is available from: www.wtccc.org.uk. Funding for the project was provided by the Wellcome Trust under awards 076113, 085475 and 090355. We sincerely thank Professor Ioanna Tzoulaki for comments on the manuscript; Professors George Dedousis and Pavlos Pavlidis for fruitful discussions; Elissavet Greasidou for her help in data acquisition and cleaning; several members of our mensxmachina research group for useful comments; and Glykeria Fragioudaki for her administrative help with data access.
PY - 2023/9/6
Y1 - 2023/9/6
AB - Motivation: Genome Wide Association Studies (GWAS) present several computational and statistical challenges for their data analysis, including knowledge discovery, interpretability, and translation to clinical practice.
Results: We develop, apply, and comparatively evaluate an Automated Machine Learning (AutoML) approach, customized for genomic data, that delivers reliable predictive and diagnostic models, the set of genetic variants that are important for predictions (called a biosignature), and an estimate of the out-of-sample predictive power. This AutoML approach discovers variants with higher predictive performance than standard GWAS methods, computes an individual risk prediction score, generalizes to new, unseen data, is shown to better differentiate causal variants from other highly correlated variants, and enhances knowledge discovery and interpretability by reporting multiple equivalent biosignatures.
UR - http://hdl.handle.net/10754/694204
UR - https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btad545/7261476
U2 - 10.1093/bioinformatics/btad545
DO - 10.1093/bioinformatics/btad545
M3 - Article
C2 - 37672022
SN - 1367-4803
JO - Bioinformatics (Oxford, England)
JF - Bioinformatics (Oxford, England)
ER -