We study functional genetic variation in human populations and the mechanisms by which it affects human traits and disease. Our work combines computational analysis of high-throughput sequencing data, human population genetics approaches, and experimental work. We focus in particular on genetic effects on transcriptome traits, which in turn have applications to other traits at the cellular and individual level.

While some of our projects are closely related to specific diseases, our overall goal is to uncover general rules governing the genomic sources of human variation that apply to a wide variety of diseases. This is reflected in the diversity of our funding. Our work is funded mainly by the NIH, including NIGMS, NHLBI, NIMH, NIA, NHGRI, and the Common Fund. We also have institutional funding and funding from the Roy and Diana Vagelos Precision Medicine Pilot Award.

We are a highly collaborative lab. We work closely with other labs at the New York Genome Center, where we are physically located, and with colleagues in the Department of Systems Biology at Columbia University. We also have collaborators at other NYGC partner institutions and at other institutions in the U.S. and abroad. We participate in the GTEx, TOPMed, and MoTrPAC consortia and in the NHGRI Common Disease Center at the NYGC.

Below, we describe the key active areas of research in the lab.

Characterizing variants that affect the transcriptome

Our lab has a strong track record of integrating large-scale genome and transcriptome sequencing data sets to characterize the genetic architecture of variants that affect the transcriptome. These include both rare and common variants in noncoding and coding regions of the genome. This work has applications in interpreting disease-associated loci, in improving our understanding of the regulatory code, and in interpreting personal genomes. We are one of the leading groups of the Genotype-Tissue Expression (GTEx) project and use GTEx data for most of our computational analyses, but we also use (and generate) other large cohort data sets, e.g., in the TOPMed and MoTrPAC studies. While genome sequencing and RNA-sequencing transcriptome data are the main data types we analyze, in several projects we apply similar approaches to epigenomic and other cellular data sets as well. Improved understanding of regulatory mechanisms and integration of multi-omics data are major goals of the lab.

Regulatory modifiers of coding variant penetrance

Incomplete penetrance of genetic variants that cause severe disease is a major, largely unresolved question in genetics and an important limitation for clinical genetic applications that aim to predict an individual's disease risk from their genome. One of the key projects in the lab is characterizing how genetic regulatory variants affecting expression or splicing modify the penetrance of coding variants in their target genes. We study this using analytical methods applied to data sets from the general population and from disease cohorts, together with experimental CRISPR methods. An important advance of this work is that it brings two traditionally separate areas of human genetics (the analysis of rare coding variants in Mendelian disease and of common regulatory variants in complex disease) under a joint biological hypothesis and an integrated analytical model.

Gene-environment interactions

The effects of genetic variants are known to vary across tissues, cell types, and cell states, which are in part shaped by systemic and external environments such as infections, lifestyle, and various exposures. Additionally, genetic and environmental sources of disease may ultimately be driven by shared molecular pathways. We use diverse computational approaches to study these effects, especially for genetic regulatory variants. These include mapping gene-environment interactions for regulatory variants, analyzing the molecular mechanisms that drive these interactions, and jointly analyzing genetic and non-genetic molecular changes.

Disease-focused projects

While most of our work focuses on generalizable patterns and mechanisms of genetic variation in humans, we often apply our approaches to specific diseases, or use disease cohorts to demonstrate that the phenomena we study are indeed relevant to human health. Disease areas we study include autism, lower respiratory diseases, psychiatric diseases, and aging.

Computational and experimental methods development

Studying biology with large, complex data sets requires a deep understanding of the properties and biases of the data, as well as sophisticated methods for extracting biologically meaningful information. To this end, we have developed and published several methods, approaches, and software tools for allele-specific expression and eQTL analysis.

Furthermore, we develop and test two types of experimental approaches: 1) improved transcriptome sequencing, where we test long-read transcriptome sequencing with Oxford Nanopore and develop novel protocols for affordable, non-invasive, and thus highly scalable RNA-sequencing; and 2) genome editing approaches to characterize and validate the transcriptome effects of genetic variants.