I study statistics for genomics.
A central challenge of scientific research is that raw data cannot directly "enter the mind" to give understanding [1]. This difficulty is especially pronounced in genomics, where experiments can simultaneously measure data on a huge number of biological features, such as mutations or genes.

I study large-scale statistical inference approaches for making sense of genomic data. I also apply these approaches to questions in behavioral neuroscience and other substantive fields.

Statistical research

The field of statistics has been changing rapidly. My view is that statistics is the study of how to make meaning out of data, and thus that statistics research encompasses not only (i) the development and analysis of new procedures for inferring pre-defined population-level quantities from observed data, but also (ii) the specification and definition of new population-level quantitative constructs that can carry substantive meaning [2].

My statistical research program spans these activities in the context of genomics.

  1. I introduce new quantities useful for understanding highly multiplexed genomic data. I am currently interested in single-cell and spatial transcriptomics. For example, I recently studied transcript colocalization for single-molecule-resolution spatial transcriptomics.
  2. I create new methods to estimate and perform inference on these quantities. I mainly draw from large-scale inference procedures and high-dimensional statistics. For example, I recently studied high-dimensional mediation analysis.
  3. I study the fundamental statistical phenomena underlying large-scale inference. I am currently interested in simultaneous estimation problems. For example, I am currently developing a new nonparametric regression-based reformulation of empirical Bayes estimation.

Scientific research

I am a core member of the Gene Networks in Neural & Developmental Plasticity (GNDP) research theme at the Carl R. Woese Institute for Genomic Biology. Each theme at the institute is a tightly organized multidisciplinary team of investigators from departments across campus who work together to tackle grand-challenge questions that are typically outside the scope of an individual lab.

My scientific research program, in close collaboration with my GNDP colleagues, is to study the genomic basis of social behavior. I am particularly interested in how genomic information is integrated with information at higher levels of biological organization in the brain. For example, I am the PI of an NIH R01 grant to study multimodal network interactions underlying resiliency.

Funding

My work is or has been supported by:

  1. Zhao (PI), Bonthuis (Co-I), Gritton (Co-I), Vlasov (Co-I).
    CRCNS: Multimodal network interactions for internal state dynamics of resiliency.
    NIH, R01AT013189, 2024–2028.
  2. Han (PI), Zhao (Co-PI).
    Integrated experimental and statistical tools for ultra-high-throughput spatial transcriptomics.
    NIH, R21HG013180, 2023–2025.
  3. Foss (PI), Zhao (Co-PI).
    A Statistical Approach to Nonlocal Compression for Supervised Learning, Semi-Supervised Learning, and Anomaly Detection.
    Sandia National Labs, LDRD22-0599, 2021–2023.
  4. Robinson (PI), Zhao (Co-PI).
    Gut Microbiome Effects on Brain and Behavior.
    NSF IOS-2120378, 2021–2024.
  5. Li (PI), Zhao (Co-PI).
    Computational Reconstruction of Gene-Gene Dynamics in Temporal Patterning of Drosophila Medulla Neuroblasts from Single-Cell RNA-Seq.
    NSF-Simons Center for Quantitative Biology at Northwestern University, 2018–2019.
  6. Zhao (PI).
    Theory and Methods for Simultaneous Signal Analysis in Integrative Genomics.
    NSF DMS-1613005, 2016–2019.


[1]. Fisher (1922). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society A 222, 309–368.

[2]. I think this activity lies at the heart of the emerging discipline of data science, as anticipated by Nelder: "I see statistics as giving a central place to the theory and practice of the matching of theory to data".
Nelder (1986). Statistics, science and technology. Journal of the Royal Statistical Society: Series A 149, 109–121.