How do we make sense of genomic data?
I started out in the natural sciences. My main focus in college was on organic chemistry, but I was interested in everything. I eventually realized that studying statistics would let me, as Tukey said, play in everyone’s backyard. So now I am a statistician, and I'm interested in both the process of applying statistics to science, particularly to biology and genomics, and the theoretical nature of large-scale statistical inference.

An obvious practical difficulty is that biological questions are explanatory in nature, asking how and why things are the way they are, but statistics is for the most part descriptive. I am interested in developing analysis strategies for genomic data that bridge the gap between description and explanation. I'm currently focused on spatial transcriptomics and behavioral genomics, and I've also worked on:

  • Zhu, H., Zhao, S. D., Ray, A., Zhang, Y., and Li, X. (2022).
    A comprehensive temporal patterning gene network in Drosophila medulla neuroblasts revealed by single-cell RNA sequencing.
    Nature Communications 13, 1247.
  • Avalos, A., Fang, M., Pan, H., Lluch, A. R., Lipka, A. E., Zhao, S. D., Giray, T., Robinson, G. E., Zhang, G., and Hudson, M. E. (2020).
    Genomic regions influencing aggressive behavior in honey bees are defined by colony allele frequencies.
    Proceedings of the National Academy of Sciences 117, 17135–17141.

A hallmark of genomics research is that it asks a huge number of biological questions simultaneously, made possible by high-throughput technology. But more questions also lead to more inductive uncertainty. I am interested in understanding the fundamental statistical principles that make it possible to reduce this uncertainty. I'm currently studying empirical Bayesian inference and mediation analysis.

  • Barbehenn*, A. and Zhao, S. D.
    A nonparametric regression approach to asymptotically optimal estimation of normal means.
    arXiv:2205.00336
  • Zhou*, R. R., Wang, L., and Zhao, S. D. (2020).
    Estimation and inference for the indirect effect in high-dimensional linear mediation models.
    Biometrika 107, 573–589.
    (An earlier version of this paper was a winner of a 2018 American Statistical Association Section on Statistics in Genomics and Genetics distinguished student paper award.)

Finally, I am interested in developing new statistical inference procedures. My work includes multiple hypothesis testing, precision matrix estimation, and high-dimensional survival analysis methods.

My research has been supported by:

  1. Han (PI), Zhao (Co-PI).
    Integrated experimental and statistical tools for ultra-high-throughput spatial transcriptomics.
    NIH, R21HG013180, 2023–2025.
  2. Foss (PI), Zhao (Co-PI).
    A Statistical Approach to Nonlocal Compression for Supervised Learning, Semi-Supervised Learning, and Anomaly Detection.
    Sandia National Labs, LDRD22-0599, 2021–2023.
  3. Robinson (PI), Zhao (Co-PI).
    Gut Microbiome Effects on Brain and Behavior.
    NSF IOS-2120378, 2021–2024.
  4. Li (PI), Zhao (Co-PI).
    Computational Reconstruction of Gene-Gene Dynamics in Temporal Patterning of Drosophila Medulla Neuroblasts from Single-Cell RNA-Seq.
    NSF-Simons Center for Quantitative Biology at Northwestern University, 2018–2019.
  5. Zhao (PI).
    Theory and Methods for Simultaneous Signal Analysis in Integrative Genomics.
    NSF DMS-1613005, 2016–2019.