Check out the latest update at Google Scholar
Technical Reports
(* indicates alphabetical ordering authorship; ** indicates equal contribution)
- Wenlong Ji, Weizhe Yuan, Emily Getzen, Kyunghyun Cho, Michael I Jordan, Song Mei, Jason E Weston, Weijie J Su, Jing Xu, Linjun Zhang. (2025)
An Overview of Large Language Models for Statisticians.
submitted
- Yinpeng Cai, Lexin Li, and Linjun Zhang. (2025)
A Statistical Hypothesis Testing Framework for Data Misappropriation Detection in Large Language Models.
submitted
- Zihan Dong, Xin Zhou, Ryumei Nakada, Lexin Li, and Linjun Zhang. (2025)
Contrastive Network Representation Learning.
submitted
- Ryumei Nakada, Wenlong Ji, Tianxi Cai, James Zou, Linjun Zhang. (2025)
A Theoretical Framework for Prompt Engineering: Approximating Smooth Functions with Transformer Prompts.
submitted
- Xintao Xia, Linjun Zhang, Zhanrui Cai. (2025)
Statistical Inference for Differentially Private Stochastic Gradient Descent.
submitted
- Sai Li and Linjun Zhang. (2024)
FAIRM: Learning Invariant Representations for Algorithmic Fairness and Domain Generalization with Minimax Optimality.
submitted
- Ryumei Nakada, Yichen Xu, Lexin Li, Linjun Zhang. (2024)
Synthetic Oversampling: Theory and A Practical Approach Using LLMs to Address Data Imbalance.
submitted
- * Tianxi Cai, Feiqing Huang, Ryumei Nakada, Linjun Zhang, and Doudou Zhou (2024)
Contrastive Learning on Multimodal Analysis of Electronic Health Records.
submitted
- Huiying Zhong, Zhun Deng, Weijie J Su, Zhiwei Steven Wu, and Linjun Zhang (2024)
Provable Multi-Party Reinforcement Learning with Diverse Human Feedback.
submitted
- Reid McIlroy-Young, Katrina Brown, Conlan Olson, Linjun Zhang, and Cynthia Dwork (2024)
Set-Based Prompting: Provably Solving the Language Model Order Dependency Problem.
submitted
- Sai Li and Linjun Zhang. (2023)
Multi-dimensional domain generalization with low-rank structures.
submitted
- * T. Tony Cai, Yichen Wang and Linjun Zhang. (2023)
Score Attack: A Lower Bound Technique for Optimal Differentially Private Learning.
submitted
- Peng Wang, Min-Ge Xie and Linjun Zhang. (2022)
Finite- and Large-Sample Inference for Model and Coefficients in High-dimensional Linear Regression with Repro Samples.
submitted
- Zhe Zhang and Linjun Zhang. (2021)
High-Dimensional Differentially-Private EM Algorithm: Methods and Near-Optimal Statistical Guarantees.
submitted
- * Maya Burhanpurkar, Zhun Deng, Cynthia Dwork and Linjun Zhang. (2021)
Scaffolding Sets.
submitted
- * T. Tony Cai, Yichen Wang and Linjun Zhang. (2020)
The Cost of Privacy in Generalized Linear Models: Algorithms and Minimax Lower Bounds.
submitted
- Xianli Zeng, Yingcun Xia, and Linjun Zhang. (2019)
Double Cross Validation for The Number of Factors in Orthogonal Factor Models.
submitted
- Linjun Zhang, Rong Ma, T. Tony Cai, and Hongzhe Li. (2020)
Estimation, Confidence Intervals, and Large-Scale Hypotheses Testing for High-Dimensional Mixed Linear Regression.
submitted
Publications
(* indicates alphabetical ordering authorship; ** indicates equal contribution)
Statistical Foundations of Large Language Models (LLMs)
- Ran Xu, Yuchen Zhuang, Zihan Dong, Ruiyu Wang, Yue Yu, Joyce C. Ho, Linjun Zhang, Haoyu Wang, Wenqi Shi, Carl Yang (2025)
AceRAG: Advancing Reasoning-Intensive Retrieval-Augmented Generation via LLM Self-Play.
NeurIPS 2025 (Spotlight).
- Tianze Wang, Dongnan Gui, Yifan Hu, Shuhang Lin, and Linjun Zhang (2025)
MPO: An Efficient Post-Processing Framework for Mixing Diverse Preference Alignment.
ICML 2025.
- Fan Nie, Xiaotian Hou, Shuhang Lin, James Zou, Huaxiu Yao, and Linjun Zhang (2025)
FactTest: Factuality Testing in Large Language Models with Finite-Sample and Distribution-Free Guarantees.
ICML 2025.
- Tianci Liu, Ruirui Li, Zihan Dong, Hui Liu, Xianfeng Tang, Qingyu Yin, Linjun Zhang, Haoyu Wang, and Jing Gao (2025)
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models.
ICML 2025.
- Peng Xia, Kangyu Zhu, Haoran Li, Hongtu Zhu, Yun Li, Gang Li, Linjun Zhang, and Huaxiu Yao (2024)
RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models.
EMNLP 2024.
- Peng Xia, Ze Chen, Juanxi Tian, Yangrui Gong, Ruibo Hou, Yue Xu, Zhenbang Wu, Zhiyuan Fan, Yiyang Zhou, Kangyu Zhu, Wenhao Zheng, Zhaoyang Wang, Xiao Wang, Xuchao Zhang, Chetan Bansal, Marc Niethammer, Junzhou Huang, Hongtu Zhu, Yun Li, Jimeng Sun, Zongyuan Ge, Gang Li, James Y Zou, and Huaxiu Yao (2024)
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models.
NeurIPS 2024 Datasets and Benchmarks Track.
- Xinyu Yang, Jixuan Leng, Geyang Guo, Jiawei Zhao, Ryumei Nakada, Linjun Zhang, Huaxiu Yao, and Beidi Chen (2024)
S2FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity.
NeurIPS 2024.
- Reid McIlroy-Young, Katrina Brown, Conlan Olson, Linjun Zhang, and Cynthia Dwork (2024)
Order-Independence Without Fine Tuning.
NeurIPS 2024.
- Yiyang Zhou, Zhiyuan Fan, Dongjie Cheng, Sihan Yang, Zhaorun Chen, Chenhang Cui, Xiyao Wang, Yun Li, Linjun Zhang, and Huaxiu Yao (2024)
Calibrated Self-Rewarding Vision Language Models.
NeurIPS 2024.
- Xinming Tu, James Zou, Weijie Su and Linjun Zhang. (2024)
What Should Data Science Education Do with Large Language Models?
Harvard Data Science Review
- * Lujing Zhang, Aaron Roth, and Linjun Zhang. (2023)
Fair Risk Control: A Generalized Framework for Calibrating Multi-group Fairness Risks.
ITCS 2023.
- Yiyang Zhou, Chenhang Cui, Jaehong Yoon, Linjun Zhang, Zhun Deng, Chelsea Finn, Mohit Bansal, Huaxiu Yao. (2024)
Analyzing and mitigating object hallucination in large vision-language models.
ICLR 2024
Algorithmic Fairness
- * Lujing Zhang, Aaron Roth, and Linjun Zhang. (2023)
Fair Risk Control: A Generalized Framework for Calibrating Multi-group Fairness Risks.
ITCS 2023.
- * Zhun Deng, Cynthia Dwork and Linjun Zhang. (2023)
HappyMap: A Generalized Multicalibration Method.
ITCS 2023.
- Shirley Wu, Mert Yuksekgonul, Linjun Zhang and James Zou. (2023)
Discover and Cure: Concept-aware Mitigation of Spurious Correlation.
ICML 2023.
- Puheng Li, James Zou, and Linjun Zhang. (2023)
FaiREE: Fair Classification with Finite-Sample and Distribution-Free Guarantee.
ICLR 2023
- Zhun Deng, Jiayao Zhang, Linjun Zhang, Ting Ye, Yates Coley, Weijie J. Su, and James Zou. (2023)
FIFA: Making Fairness More Generalizable in Classifiers Trained on Imbalanced Data.
ICLR 2023.
- Zhun Deng, He Sun, Zhiwei Steven Wu, Linjun Zhang and David C. Parkes. (2023)
Reinforcement Learning with Stepwise Fairness Constraints.
AISTATS 2023.
- Haotian Ye, $James Zou, and $Linjun Zhang. (2023)
Freeze then Train: Towards Provable Representation Learning under Spurious Correlations and Feature Noise.
AISTATS 2023
- Huaxiu Yao, Yu Wang, Sai Li, Linjun Zhang, Weixin Liang, James Zou and Chelsea Finn. (2022)
Improving Out-of-Distribution Robustness via Selective Augmentation.
ICML 2022.
Private Data Analysis
- * Xintao Xia, Linjun Zhang, and Zhanrui Cai. (2025)
Differentially Private Sliced Inverse Regression: Minimax Optimality and Algorithm.
Journal of the American Statistical Association.
- * Cynthia Dwork, Pranay Tankala, and Linjun Zhang. (2025)
Differentially Private Sliced Inverse Regression: Minimax Optimality and Algorithm.
Theory of Cryptography Conference (TCC) 2025
- * T. Tony Cai, Yichen Wang and Linjun Zhang. (2023)
Score Attack: A Lower Bound Technique for Optimal Differentially Private Learning.
- * Jinshuo Dong, Weijie Su, and Linjun Zhang. (2021)
A Central Limit Theorem for Differentially Private Query Answering.
NeurIPS 2021, and selected as spotlight (top 3% of submissions)
- * T. Tony Cai, Yichen Wang and Linjun Zhang. (2021)
The Cost of Privacy: Optimal Rates of Convergence for Parameter Estimation with Differential Privacy.
Annals of Statistics.
- Zhe Zhang and Linjun Zhang. (2020)
Privacy-Preserving Algorithms: the Gain and the Loss.
CHANCE 33 (4), 22-28.
Deep Learning (with focus on representation learning)
- Jianguo Huang, HuaJun Xi, Linjun Zhang, Huaxiu Yao, Yue Qiu, Hongxin Wei. (2024)
Conformal Prediction for Deep Classifier via Label Ranking.
ICML 2024
- Wenlong Ji, Zhun Deng, Ryumei Nakada, James Zou and Linjun Zhang. (2023)
The Power of Contrast for Feature Learning: A Theoretical Analysis.
Journal of Machine Learning Research.
- Mert Yuksekgonul, Linjun Zhang, James Zou, and Carlos Guestrin. (2023)
Beyond Confidence: Reliable Models Should Also Consider Atypicality.
ICML 2023
- Ryumei Nakada, Halil Ibrahim Gulluk, Zhun Deng, Wenlong Ji, James Zou and Linjun Zhang. (2023)
Understanding Multimodal Contrastive Learning and Incorporating Unpaired Data.
AISTATS 2023.
- Huaxiu Yao, Linjun Zhang and Chelsea Finn. (2022)
Meta-Learning with Fewer Tasks through Task Interpolation.
ICLR 2022, selected as the oral presentation (top 1.5% of submissions).
- Huaxiu Yao, Yiping Wang, Linjun Zhang, James Zou, and Chelsea Finn. (2022)
C-Mixup: Improving Generalization in Regression.
NeurIPS 2022.
- Huaxiu Yao, Yu Wang, Sai Li, Linjun Zhang, Weixin Liang, James Zou and Chelsea Finn. (2022)
Improving Out-of-Distribution Robustness via Selective Augmentation.
ICML 2022.
- ** Linjun Zhang, ** Zhun Deng, Kenji Kawaguchi and James Zou. (2022)
When and How Mixup Improves Calibration.
ICML 2022.
- Kenji Kawaguchi, Linjun Zhang and Zhun Deng. (2022)
Understanding Dynamics of Nonlinear Representation Learning and Its Application.
Neural Computation.
- ** Zhun Deng, ** Linjun Zhang, Kailas Vodrahalli, Kenji Kawaguchi, and James Zou. (2021)
Adversarial Training Helps Transfer Learning via Better Representations.
NeurIPS 2021.
- Huaxiu Yao, Longkai Huang, Linjun Zhang, Ying Wei, Li Tian, James Zou, Junzhou Huang and Zhenhui Li. (2021)
Improving Generalization in Meta-learning via Task Augmentation.
ICML 2021.
- **Linjun Zhang, **Zhun Deng, **Kenji Kawaguchi, Amirata Ghorbani, and James Zou. (2020)
How Does Mixup Help With Robustness and Generalization?
ICLR 2021, and selected as spotlight (top 5% of submissions)
- ** Zhun Deng, ** Linjun Zhang, Amirata Ghorbani, and James Zou. (2020)
Improving Adversarial Robustness via Unlabeled Out-of-Domain Data.
AISTATS 2021, and selected as the oral presentation (top 3% of submissions)
- * Zhun Deng, Cynthia Dwork, Jialiang Wang, and Linjun Zhang. (2020)
Interpreting Robust Optimization via Adversarial Influence Functions.
ICML 2020
High-dimensional Statistics (with Distribution Shifts)
- Sai Li and Linjun Zhang (2025)
Multi-Dimensional Domain Generalization with Low-Rank Structures.
Journal of the American Statistical Association.
- Sai Li, Linjun Zhang, T. Tony Cai, and Hongzhe Li. (2024)
Estimation and Inference in High-Dimensional Generalized Linear Models with Knowledge Transfer.
Journal of the American Statistical Association.
- Ruijia Wu, Linjun Zhang, and T. Tony Cai. (2021)
Sparse Topic Modeling: Computational Efficiency, Near-Optimal Algorithms, and Statistical Inference.
Journal of the American Statistical Association
- * T. Tony Cai, and Linjun Zhang. (2020)
A Convex Optimization Approach to High-dimensional Sparse Quadratic Discriminant Analysis.
Annals of Statistics
- * T. Tony Cai, and Linjun Zhang. (2019)
High-dimensional Linear Discriminant Analysis: Optimality, Adaptivity, and Missing Data.
Journal of the Royal Statistical Society, Series B
- * T. Tony Cai, Jing Ma and Linjun Zhang. (2018)
CHIME: Clustering of High-Dimensional Gaussian Mixtures with EM Algorithm and Its Optimality.
Annals of Statistics
- * T. Tony Cai, Linjun Zhang and Harrison H. Zhou. (2017)
Adaptive Functional Linear Regression.
Statistica Sinica
- * T. Tony Cai, and Linjun Zhang. (2017)
High-Dimensional Gaussian Copula Regression: Adaptive Estimation and Statistical Inference.
Statistica Sinica, 2018, Vol. 28, 963-993.
- * T. Tony Cai, and Linjun Zhang. (2016)
Discussion: Important feature PCA for high dimensional clustering.
Annals of Statistics, 2016, Vol. 44, No. 6, 2372-2381.
Network Analysis
- Linjun Zhang, Michael Small, and Kevin Judd. (2015)
Exactly scale-free scale-free networks.
Physica A, 433: 182-197.
- Michael Small, Lvlin Hou, and Linjun Zhang. (2014)
Random complex networks.
The IEEE International Symposium on Circuits and Systems (ISCAS) 2014 invited paper.
- Michael Small, Kevin Judd, L. Zhang. (2014)
How is that complex network complex?
National Science Review, 1(3): 357-367.