I am a PhD candidate in Computer Science at the School of Computing, National University of Singapore, advised by Reza Shokri. Here is my CV (updated Feb 19, 2026).

I study how training data, optimization, and model architecture jointly shape what models memorize and learn. My research develops principled methods for controllable memorization, especially in language models.

Currently, I’m a research intern at Apple ML Research, where I design data selection algorithms and parameter-partitioned transformer architectures to improve fact memorization and memory-intensive reasoning when training language models on long-tailed data under constrained model capacity.

Earlier in my PhD, I focused on privacy and data protection in machine learning. I developed theoretical and empirical tools to analyze training data influence for privacy auditing, data usage inference, and differentially private learning. I’m a recipient of the 2024 Apple Scholars in AI/ML PhD Fellowship and the 2023–2024 Google PhD Fellowship in Security and Privacy. I also had the privilege of interning at Apple ML Research (Spring 2024) and Azure Research, Microsoft Research (Summer 2023). Even earlier, I received my B.S. in Computational Mathematics from the University of Science and Technology of China.

Selected Research

(* denotes equal contribution)

(see Google Scholar for complete publications)

Controllable Learning and Memorization in Language Models

  • Parameters and Data Separation Improves Memory-Intensive Reasoning
    Jiayuan Ye, Vitaly Feldman, Kunal Talwar, Skyler Seto
    Manuscript in submission (draft available upon request)

  • Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts
    Jiayuan Ye, Vitaly Feldman, Kunal Talwar
    Manuscript in submission (draft available upon request); to be presented at the Apple Workshop on Privacy-Preserving ML & AI 2026

  • How much of my dataset did you use? Quantitative Data Usage Inference in Machine Learning [Paper] [Code]
    Yao Tong*, Jiayuan Ye*, Sajjad Zarifzadeh, Reza Shokri
    ICLR 2025 (Oral, top ~2% of submissions)

Foundations of Privacy and Learning

  • Instance-Optimality for Private KL Distribution Estimation [Paper]
    Jiayuan Ye, Vitaly Feldman, Kunal Talwar
    NeurIPS 2025 (Spotlight, top ~3% of submissions)
    Also presented as a highlight talk at TPDP 2025

  • Leave-one-out Distinguishability in Machine Learning [Paper] [Code]
    Jiayuan Ye, Anastasia Borovykh, Soufiane Hayou, Reza Shokri
    ICLR 2024

  • Enhanced Membership Inference Attacks against Machine Learning Models [Paper] [Slides] [Code]
    Jiayuan Ye, Aadyaa Maddi, Sasi Kumar Murakonda, Vincent Bindschaedler, Reza Shokri
    ACM CCS 2022 (among the top 10 most cited papers published at security conferences in 2022)

Professional Experiences

  • Conference & Workshop Program Committee/Reviewer: NeurIPS 2022–2025; ICLR 2023–2026; ICML 2023–2026; AISTATS 2023, 2025; ACM CCS 2024, 2026; IEEE SaTML 2025–2026; TPDP 2025–2026; PPAI 2022; FL-ICML 2023; PRIVATE ML @ ICLR 2024; SYNTHDATA @ ICLR 2025; DATA-FM @ ICLR 2025; DATA-FM @ ICLR 2026.
  • Journal Reviewer: JMLR (2022), SICOMP (2023)