
I am a PhD candidate in Computer Science at the National University of Singapore, advised by Reza Shokri. Here is my CV (updated May 24, 2026).
I study how training data, optimization, and model architecture jointly shape memorization and generalization in LLMs. My current research interests lie in the memorization of useful facts, including:
- Better memorize facts in pretraining data via fact distribution truncation/reweighting;
- Better separate memorization of (user-specific) facts versus general capabilities in post-training via parameter separation;
- Better measure memorization of (sensitive) facts in long contexts to detect and monitor information leakage across long conversations.
Over the past year, I was a research intern at Apple ML Research, focusing on the first two problems above under long-tailed training data and constrained model capacity. Prior to that, I focused on measuring training data memorization, with applications to privacy auditing, data usage detection, and differentially private learning. For my work in privacy and security, I am a recipient of the 2024 Apple Scholars in AI/ML PhD Fellowship and the 2023-2024 Google PhD Fellowship in security and privacy. Before my PhD, I obtained my B.S. in Computational Mathematics from the University of Science and Technology of China.
Selected Publications
(* indicates equal contributions)
- Cram Less to Fit More: Training Data Pruning Improves Fact Memorization [Paper]
  Jiayuan Ye, Vitaly Feldman, Kunal Talwar
  ICML 2026
  Also received Best Paper Award at DATA-FM workshop @ ICLR 2026
- Leave-One-Out Distinguishability in Machine Learning
  Jiayuan Ye, Anastasia Borovykh, Soufiane Hayou, Reza Shokri
  ICLR 2024
- Instance-Optimality for Private KL Distribution Estimation
  Jiayuan Ye, Vitaly Feldman, Kunal Talwar
  NeurIPS 2025
  Also received Spotlight (Top 3%)
- How Much of My Dataset Did You Use? Quantitative Data Usage Inference in ML
  Yao Tong*, Jiayuan Ye*, Sajjad Zarifzadeh, Reza Shokri
  ICLR 2025
  Also received Oral (Top 2%)
- Initialization Matters: Privacy-Utility Analysis of Overparameterized Neural Networks
  Jiayuan Ye, Zhenyu Zhu, Fanghui Liu, Reza Shokri, Volkan Cevher
  NeurIPS 2023
- Differentially Private Learning Needs Hidden State (Or Much Faster Convergence)
  Jiayuan Ye, Reza Shokri
  NeurIPS 2022
- Enhanced Membership Inference Attacks Against Machine Learning Models
  Jiayuan Ye, Aadyaa Maddi, Sasi Kumar Murakonda, Vincent Bindschaedler, Reza Shokri
  CCS 2022
  Among the top 10 most cited papers published in security conferences in 2022. [Link]
- Differential Privacy Dynamics of Langevin Diffusion and Noisy Gradient Descent
  Rishav Chourasia*, Jiayuan Ye*, Reza Shokri
  NeurIPS 2021
  Also received Spotlight (Top 3%)
Other Publications Contributed To
- Optimal Splitting of Language Models from Mixtures to Specialized Domains
  Skyler Seto, Pierre Ablin, Anastasiia Filippova, Jiayuan Ye, Louis Béthune, Angelos Katharopoulos, David Grangier
  ICML 2026
- Generalization in LLM Problem Solving: The Case of the Shortest Path
  Yao Tong, Jiayuan Ye, Anastasia Borovykh, Reza Shokri
  ICLR 2026
- Unified Enhancement of Privacy Bounds for Mixture Mechanisms via f-Differential Privacy
  Chendi Wang*, Buxin Su*, Jiayuan Ye, Reza Shokri, Weijie J Su
  NeurIPS 2023
- Share Your Representation Only: Guaranteed Improvement of the Privacy-Utility Tradeoff in Federated Learning
  Zebang Shen, Jiayuan Ye, Anmin Kang, Hamed Hassani, Reza Shokri
  ICLR 2023