Academic job market candidate

Hufeng Zhou, PhD

I build statistical methods, functional annotation resources, and production-quality software for understanding human genetic variation and disease mechanisms at whole-genome scale.

45+ peer-reviewed publications
15+ years in computational biology
3 editorial service roles
CV and publications updated in 2026

Profile

Computational biology for genomic medicine

My work connects statistical genetics, functional genomics, and robust software systems so large-scale sequencing studies can move from raw variants to biological insight.

Research Scientist, Harvard T.H. Chan School of Public Health; former Instructor, Harvard Medical School and Brigham and Women's Hospital.

I focus on annotation-informed rare-variant association methods, functional annotation resources such as FAVOR, EBV-associated cancer epigenomics, AI-driven genomic and pathology data science, host-pathogen computational biology, and software infrastructure that helps research consortia analyze large sequencing datasets.

Statistical genetics, Whole-genome sequencing, Functional annotation, AI and digital pathology, EBV epigenomics, Scientific software

Research

A coherent program from methods to mechanisms

These areas connect statistical genetics, functional annotation, AI, epigenomics, software infrastructure, and host-pathogen systems biology.

Population genetics and rare variants

Scalable methods for large whole-genome sequencing studies, with emphasis on rare-variant association testing, noncoding regions, multi-trait analysis, time-to-event outcomes, and biobank-scale inference.

Variant annotation infrastructure

FAVOR, FAVOR 2.0, FAVORannotator, and FAVOR-GPT translate genome-wide functional annotation into searchable resources and analysis-ready formats for human genetics.

EBV epigenomics and gene regulation

Integrative genomic studies of Epstein-Barr virus transcriptional regulation, super-enhancers, enhancer RNAs, viral oncoproteins, and host chromatin architecture.

AI-enabled genomics and digital pathology

Machine-learning and deep-learning applications for pathogenic variant annotation, WGS quality control, multi-omic integration, and H&E whole-slide pathology risk modeling.

Host-pathogen computational biology

Computational approaches for protein-protein interaction prediction, pathway data integration, microbial systems biology, and molecular diagnostic collaborations.

Scientific software and reproducible pipelines

Open software, data resources, and analysis pipelines that turn statistical methods into practical tools for large consortia and biomedical collaborators.

Recent publications

Updated publication record

Recent papers now include 2024-2026 publications from PubMed/ORCID and publisher-indexed records.

2026

Scalable and accurate rare-variant association tests for whole genome sequencing time-to-event analysis in large biobanks.

Song S, Li X, Zhou H, Li Z, Lin X.

Proc Natl Acad Sci U S A. 2026;123(9):e2525288123.

2026

Comparison of variant callers using 60,532 multi-ancestry whole genome sequences.

Zhou H, Li Z, Shyr D, Li X, Yang H, Dey R, Tang Y, Maier R, Boerwinkle E, Buyske S, Daly M, Felsenfeld A, Gibbs RA, Gupta N, Hall IM, Matise T, Metcalf GA, Smith A, Reeves C, Sofia HJ, Stitziel NO, Zody MC, NHGRI Genome Sequencing Program Consortium, Neale B, Lin X.

Brief Bioinform. 2026;27(2).

2026

cellSTAAR: incorporating single-cell-sequencing-based functional data to boost power in rare variant association testing of noncoding regions.

Van Buren E, Zhang Y, Li X, Selvaraj MS, Li Z, Zhou H, Palmer ND, Arnett DK, Blangero J, Boerwinkle E, Cade BE, Carlson JC, Carson AP, Chen YI, Curran J, Duggirala R, Fornage M, Franceschini N, Graff M, Gu C, Guo X, He J, Heard-Costa N, Hou L, Hung YJ, Kalyani RR, Kardia SLR, Kenny E, Kooperberg C, Kral BG, Lange L, Levy D, Li C, Liu S, Lloyd-Jones D, Loos RJF, Manichaikul AW, Martin LW, Mathias R, Minster RL, Mitchell BD, Mychaleckyj JC, Naseri T, North K, O'Connell J, Perry JA, Peyser PA, Psaty BM, Raffield LM, Vasan RS, Redline S, Reiner AP, Rich SS, Smith JA, Spitzer B, Tang H, Taylor KD, Tracy R, Viali S, Yanek L, Zhao W, NHLBI TOPMed Consortium, Rotter JI, Peloso GM, Natarajan P, Lin X.

Nat Methods. 2026;23(2):338-349.

2026

FAVOR 2.0: A reengineered functional annotation of variants online resource for interpreting genomic variation.

Zhou H, Verma V, Li X, Li Z, Shedd N, Li TC, Yang H, Zhang A, Borsari B, Buyske S, Gerstein M, Matise T, Zody MC, Neale B, Weng Z, Sunyaev SR, Lin X.

Nucleic Acids Res. 2026;54(D1):D1405-D1414.

2025

A statistical framework for multi-trait rare variant analysis in large-scale whole-genome sequencing studies.

Li X, Chen H, Selvaraj MS, Van Buren E, Zhou H, Wang Y, Sun R, McCaw ZR, Yu Z, Jiang MZ, DiCorpo D, Gaynor SM, Dey R, Arnett DK, Benjamin EJ, Bis JC, Blangero J, Boerwinkle E, Bowden DW, Brody JA, Cade BE, Carson AP, Carlson JC, Chami N, Chen YI, Curran JE, de Vries PS, Fornage M, Franceschini N, Freedman BI, Gu C, Heard-Costa NL, He J, Hou L, Hung YJ, Irvin MR, Kaplan RC, Kardia SLR, Kelly TN, Konigsberg I, Kooperberg C, Kral BG, Li C, Li Y, Lin H, Liu CT, Loos RJF, Mahaney MC, Martin LW, Mathias RA, Mitchell BD, Montasser ME, Morrison AC, Naseri T, North KE, Palmer ND, Peyser PA, Psaty BM, Redline S, Reiner AP, Rich SS, Sitlani CM, Smith JA, Taylor KD, Tiwari HK, Vasan RS, Viali S, Wang Z, Wessel J, Yanek LR, Yu B, NHLBI TOPMed Consortium, Dupuis J, Meigs JB, Auer PL, Raffield LM, Manning AK, Rice KM, Rotter JI, Peloso GM, Natarajan P, Li Z, Liu Z, Lin X.

Nat Comput Sci. 2025;5(2):125-143.

Software and resources

Tools that make genome-scale studies usable

FAVOR, FAVORannotator, STAAR, STAARpipeline, metaSTAAR, MultiSTAAR, and cellSTAAR share a consistent thread: methods that are both statistically rigorous and usable by large collaborations.

FAVOR interface and annotation resource