Auditing Data Privacy in Machine Learning: A Comprehensive Introduction [keynote slides]

Machine learning algorithms can leak a significant amount of information about their training data. Even a legitimate user of a model, with access only to its predictions or parameters, can reconstruct sensitive information about the training data. Given that privacy policies and regulations require auditing data-driven algorithms to assess their privacy risks, we are interested in a generic yet rigorous approach for reasoning quantitatively about the privacy risks of various machine learning algorithms. We are also interested in explaining how machine learning algorithms leak information, and in understanding the sources of privacy risk.

Membership inference is emerging as a foundational technique for empirically quantifying a lower bound on how much information a machine learning algorithm leaks about the individual records in its training set. This notion of leakage is exactly what differentially private machine learning aims to mitigate, which makes membership inference a natural tool for auditing different types of machine learning algorithms in a consistent manner. It can also help make privacy guarantees (e.g., epsilon in differential privacy) more interpretable.
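The link between epsilon and attack performance can be made concrete: by the hypothesis-testing view of differential privacy, a pure epsilon-DP algorithm bounds the accuracy of any membership inference attack (with a balanced member/non-member prior) by e^epsilon / (1 + e^epsilon). A minimal sketch of this conversion, assuming pure epsilon-DP:

```python
import math

def max_mi_accuracy(epsilon: float) -> float:
    """Upper bound on balanced-prior membership inference accuracy
    against a pure epsilon-DP mechanism: e^eps / (1 + e^eps)."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))

# epsilon = 0 means no attack beats random guessing (accuracy 0.5);
# large epsilon permits near-perfect membership inference.
for eps in [0.1, 1.0, 5.0]:
    print(f"eps = {eps}: attack accuracy <= {max_mi_accuracy(eps):.3f}")
```

Reading the guarantee this way (e.g., epsilon = 1 caps attack accuracy at roughly 0.73) is often easier to communicate than the raw epsilon value.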

In this tutorial, we present a unified view of recent work on computing more accurate estimates of the privacy loss of iterative learning algorithms. We cover the foundations of practical inference attacks and provide a rigorous quantitative understanding of differentially private machine learning. The objective is to make explicit the relations between privacy concepts, attacks, protection mechanisms, and tools. We go beyond the mechanics of the techniques, explaining why machine learning algorithms leak information about their training data and how different membership inference algorithms exploit part of that leakage. We also give examples of how to audit machine learning algorithms using ML Privacy Meter, an open-source tool developed for this purpose.

ACM CCS Tutorial, 11 November 2022.


Hypothesis testing framework for membership inference attacks

Jiayuan Ye, Aadyaa Maddi, Sasi Kumar Murakonda, Vincent Bindschaedler, and Reza Shokri
Enhanced Membership Inference Attacks against Machine Learning Models [code]
ACM Conference on Computer and Communications Security (CCS), 2022

Dynamics of differential privacy in machine learning

Jiayuan Ye and Reza Shokri
Differentially Private Learning Needs Hidden State (Or Much Faster Convergence)
Conference on Neural Information Processing Systems (NeurIPS), 2022

Rishav Chourasia*, Jiayuan Ye*, and Reza Shokri
Differential Privacy Dynamics of Langevin Diffusion and Noisy Gradient Descent [talk by Jiayuan Ye]
Conference on Neural Information Processing Systems (NeurIPS), Spotlight, 2021

Privacy requirements

Hannah Brown, Katherine Lee, Fatemehsadat Mireshghallah, Reza Shokri, and Florian Tramèr
What Does it Mean for a Language Model to Preserve Privacy?
ACM Conference on Fairness, Accountability, and Transparency (FAccT), 2022

Inference attacks

Florian Tramèr, Reza Shokri, Ayrton San Joaquin, Hoang Le, Matthew Jagielski, Sanghyun Hong, and Nicholas Carlini
Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets
ACM Conference on Computer and Communications Security (CCS), 2022

Fatemehsadat Mireshghallah, Kartik Goyal, Archit Uniyal, Taylor Berg-Kirkpatrick, and Reza Shokri
Quantifying Privacy Risks of Masked Language Models Using Membership Inference Attacks
arXiv:2203.03929, 2022

Hongyan Chang and Reza Shokri
On the Privacy Risks of Algorithmic Fairness
IEEE European Symposium on Security and Privacy (EuroS&P), 2021
Also presented at FTC PrivacyCon, 2021

Reza Shokri, Martin Strobel, and Yair Zick
On the Privacy Risks of Model Explanations
AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2021
Also presented at FTC PrivacyCon, 2021

Sasi Kumar Murakonda, Reza Shokri, and George Theodorakopoulos
Quantifying the Privacy Risks of Learning High-Dimensional Graphical Models
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021

Liwei Song, Reza Shokri, and Prateek Mittal
Privacy Risks of Securing Machine Learning Models against Adversarial Examples [talk by L. Song]
ACM Conference on Computer and Communications Security (CCS), 2019

Milad Nasr, Reza Shokri, and Amir Houmansadr
Comprehensive Privacy Analysis of Deep Learning: Passive and Active White-box Inference Attacks against Centralized and Federated Learning [code] [talk by M. Nasr]
IEEE Symposium on Security and Privacy (S&P) -- Oakland, 2019

Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov
Membership Inference Attacks against Machine Learning Models [code] [tool] [datasets] [talk]
IEEE Symposium on Security and Privacy (S&P) -- Oakland, 2017
The Caspar Bowden Award for Outstanding Research in Privacy Enhancing Technologies 2018.



NUS Presidential Young Professor,
CS Department, National University of Singapore (NUS)
