Department of Computer Science; Department of Mathematics; Institute of Computational and Theoretical Studies
Genome-wide association studies (GWASs), which assay more than a million single nucleotide polymorphisms (SNPs) in thousands of individuals, have been widely used to identify genetic risk variants for complex diseases. However, most of the variants that have been identified contribute relatively small increments of risk and explain only a small portion of the genetic variation in complex diseases. This is the so-called missing heritability problem. Evidence has indicated that many complex diseases are genetically related, meaning that these diseases share common genetic risk variants. Therefore, exploring the genetic correlations across multiple related studies could be a promising strategy for removing spurious associations and identifying underlying genetic risk variants, and thereby for helping to explain the missing heritability in complex diseases. We present a general and robust method to identify genetic patterns from multiple large-scale genomic datasets. We treat the summary statistics as a matrix and demonstrate that genetic patterns will form a low-rank matrix plus a sparse component. Hence, we formulate the problem as a matrix recovery problem, where we aim to discover the risk variants shared by multiple diseases/traits and those specific to each individual disease/trait. We propose a convex formulation for matrix recovery and an efficient algorithm to solve the problem. We demonstrate the advantages of our method using both synthesized datasets and real datasets. The experimental results show that our method can successfully reconstruct both the shared and the individual genetic patterns from summary statistics and achieves performance comparable to that of alternative methods under a wide range of scenarios. The MATLAB code is available at: http://www.comp.hkbu.edu.hk/~xwan/iga.zip.
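The low-rank plus sparse idea described in the abstract can be illustrated with a generic sketch. The Python code below is not the authors' algorithm (their MATLAB code is at the URL given in the abstract). It assumes the noise-free principal component pursuit formulation, minimize ||L||_* + lam * ||S||_1 subject to L + S = M, solved with a textbook ADMM loop. The function names, the default values of `lam` and `mu`, and the synthetic matrix (rows standing in for variants, columns for traits) are all illustrative assumptions.

```python
import numpy as np


def soft_threshold(x, tau):
    """Elementwise shrinkage: proximal operator of tau * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def svd_threshold(x, tau):
    """Singular value shrinkage: proximal operator of tau * nuclear norm."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt


def low_rank_plus_sparse(m, lam=None, mu=None, tol=1e-7, max_iter=500):
    """Split m into low_rank + sparse by ADMM on
    min ||L||_* + lam * ||S||_1  subject to  L + S = m.
    Assumes m is a nonzero 2-D array; defaults are common heuristics."""
    m = np.asarray(m, dtype=float)
    rows, cols = m.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(rows, cols))
    if mu is None:
        mu = rows * cols / (4.0 * np.abs(m).sum())
    low_rank = np.zeros_like(m)
    sparse = np.zeros_like(m)
    dual = np.zeros_like(m)
    norm_m = np.linalg.norm(m)
    for _ in range(max_iter):
        # Update the shared (low-rank) part, then the specific (sparse) part.
        low_rank = svd_threshold(m - sparse + dual / mu, 1.0 / mu)
        sparse = soft_threshold(m - low_rank + dual / mu, lam / mu)
        residual = m - low_rank - sparse
        dual = dual + mu * residual
        if np.linalg.norm(residual) <= tol * norm_m:
            break
    return low_rank, sparse


# Synthetic illustration only: a rank-2 "shared" pattern plus a few
# large "trait-specific" entries. No real GWAS data is involved.
rng = np.random.default_rng(0)
shared = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 60))
specific = np.zeros((60, 60))
mask = rng.random((60, 60)) < 0.05
specific[mask] = 5.0 * rng.standard_normal(int(mask.sum()))
observed = shared + specific

low_rank, sparse = low_rank_plus_sparse(observed)
rel_residual = np.linalg.norm(observed - low_rank - sparse) / np.linalg.norm(observed)
print("relative constraint residual:", rel_residual)
```

Real summary statistics are noisy, so a practical formulation would likely relax the equality constraint; the sketch keeps it only to make the two proximal steps easy to read.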
Diseases, Bioinformatics, Genomics, Sparse matrices, Data models, Noise
Source Publication Title: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Institute of Electrical and Electronics Engineers
(c) 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
This work was supported by the Georgia State University Deep Grant, Hong Kong Baptist University Strategic Development Fund, Hong Kong Baptist University Research Grant FRG2/14-15/077 and FRG2/14-15/069, and Hong Kong General Research Grant HKBU12202114 and HKBU22302815.
Teng, B., Yang, C., Liu, J., Cai, Z., & Wan, X. (2016). Exploring the genetic patterns of complex diseases via the integrative genome-wide approach. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 13 (3), 557-564. Retrieved from https://repository.hkbu.edu.hk/hkbu_staff_publication/2734