I am a PhD candidate in Computer Science at the School of Computing, National University of Singapore, advised by Reza Shokri. Here is my CV (updated Feb 19, 2026).

I study how training data, optimization, and model architecture jointly shape what models memorize and learn. My research develops principled methods for controllable memorization and generalizable learning, especially in large language models.

Currently, I’m a research intern at Apple ML Research, where I design data selection algorithms and parameter-partitioned transformer architectures to improve fact memorization and memory-intensive reasoning when training language models on long-tailed data under constrained model capacity.

Earlier in my PhD, I focused on privacy and data protection in machine learning. I developed theoretical and empirical tools to analyze training data influence for privacy auditing, data usage inference, and differentially private learning. I’m a recipient of the 2024 Apple Scholars in AI/ML PhD Fellowship and the 2023-2024 Google PhD Fellowship in security and privacy. I also had the privilege of interning at Apple ML Research (Spring 2024) and Azure Research - Microsoft Research (Summer 2023). Even earlier, I received my B.S. in Computational Mathematics from the University of Science and Technology of China.

Selected Research

(* denotes equal contribution)

(see Google Scholar for complete publications)

Controllable Learning and Memorization in Language Models

  • Parameters and Data Separation Improves Memory-Intensive Reasoning
    Jiayuan Ye, Vitaly Feldman, Kunal Talwar, Skyler Seto
    Manuscript in submission (draft available upon request)

  • Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts [Paper]
    Jiayuan Ye, Vitaly Feldman, Kunal Talwar
    DATA-FM Workshop @ ICLR 2026
    Also to be presented at the Apple Workshop on Privacy-Preserving ML & AI 2026

  • How much of my dataset did you use? Quantitative Data Usage Inference in Machine Learning [Paper] [Code]
    Yao Tong*, Jiayuan Ye*, Sajjad Zarifzadeh, Reza Shokri
    ICLR 2025 (Oral, top ~2% of submissions)

Foundations of Privacy and Generalizable Learning

  • Instance-Optimality for Private KL Distribution Estimation [Paper]
    Jiayuan Ye, Vitaly Feldman, Kunal Talwar
    NeurIPS 2025 (Spotlight, top ~3% of submissions)
    Also presented as a highlight talk at TPDP 2025

  • Leave-one-out Distinguishability in Machine Learning [Paper] [Code]
    Jiayuan Ye, Anastasia Borovykh, Soufiane Hayou, Reza Shokri
    ICLR 2024

  • Enhanced Membership Inference Attacks against Machine Learning Models [Paper] [Slides] [Code]
    Jiayuan Ye, Aadyaa Maddi, Sasi Kumar Murakonda, Vincent Bindschaedler, Reza Shokri
    ACM CCS 2022 (among the top 10 most-cited papers published at security conferences in 2022)

Professional Experiences

  • Conference & Workshop Program Committee/Reviewer: NeurIPS 2022–2025; ICLR 2023–2026; ICML 2023–2026; AISTATS 2023, 2025; ACM CCS 2024, 2026; IEEE SaTML 2025–2026; TPDP 2025–2026; PPAI-2022; FL-ICML 2023; PRIVATE ML @ ICLR 2024; SYNTHDATA @ ICLR 2025; DATA-FM @ ICLR 2025; DATA-FM @ ICLR 2026.
  • Journal Reviewer: JMLR (2022), SICOMP (2023)