
I am a PhD candidate in Computer Science at the School of Computing, National University of Singapore, advised by Reza Shokri. Here is my CV (updated Feb 19, 2026).
I study how training data, optimization, and model architecture jointly shape what models memorize and learn. My research develops principled methods for controllable memorization, especially in language models.
Currently, I’m a research intern at Apple ML Research, where I design data selection algorithms and parameter-partitioned transformer architectures that improve fact memorization and memory-intensive reasoning when training language models on long-tailed data under constrained model capacity.
Earlier in my PhD, I focused on privacy and data protection in machine learning. I developed theoretical and empirical tools for analyzing training data influence, with applications to privacy auditing, data usage inference, and differentially private learning. I’m a recipient of the 2024 Apple Scholars in AI/ML PhD Fellowship and the 2023–2024 Google PhD Fellowship in Security and Privacy. I also had the privilege of interning at Apple ML Research (Spring 2024) and Azure Research, Microsoft Research (Summer 2023). Before my PhD, I received my B.S. in Computational Mathematics from the University of Science and Technology of China.
Selected Research
(* denotes equal contribution)
(see Google Scholar for complete publications)
Controllable Learning and Memorization in Language Models
- Parameters and Data Separation Improves Memory-Intensive Reasoning
  Jiayuan Ye, Vitaly Feldman, Kunal Talwar, Skyler Seto
  Manuscript in submission (draft available upon request)
- Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts
  Jiayuan Ye, Vitaly Feldman, Kunal Talwar
  Manuscript in submission (draft available upon request); to be presented at the Apple Workshop on Privacy-Preserving ML & AI 2026
- How much of my dataset did you use? Quantitative Data Usage Inference in Machine Learning [Paper] [Code]
  Yao Tong*, Jiayuan Ye*, Sajjad Zarifzadeh, Reza Shokri
  ICLR 2025 (Oral, top ~2% of submissions)
Foundations of Privacy and Learning
- Instance-Optimality for Private KL Distribution Estimation [Paper]
  Jiayuan Ye, Vitaly Feldman, Kunal Talwar
  NeurIPS 2025 (Spotlight, top ~3% of submissions); also presented as a highlight talk at TPDP 2025
- Leave-one-out Distinguishability in Machine Learning [Paper] [Code]
  Jiayuan Ye, Anastasia Borovykh, Soufiane Hayou, Reza Shokri
  ICLR 2024
- Enhanced Membership Inference Attacks against Machine Learning Models [Paper] [Slides] [Code]
  Jiayuan Ye, Aadyaa Maddi, Sasi Kumar Murakonda, Vincent Bindschaedler, Reza Shokri
  ACM CCS 2022 (among the top 10 most cited papers published in security conferences in 2022)
Professional Experiences
- Conference & Workshop Program Committee/Reviewer: NeurIPS 2022–2025; ICLR 2023–2026; ICML 2023–2026; AISTATS 2023, 2025; ACM CCS 2024, 2026; IEEE SaTML 2025–2026; TPDP 2025–2026; PPAI 2022; FL-ICML 2023; PRIVATE ML @ ICLR 2024; SYNTHDATA @ ICLR 2025; DATA-FM @ ICLR 2025; DATA-FM @ ICLR 2026.
- Journal Reviewer: JMLR (2022), SICOMP (2023)