28 June 2021 – A team from NUS Computing excelled at the National Institute of Standards and Technology (NIST) 2020 Differential Privacy Temporal Map Challenge, winning first place at two of the challenge’s three sprints, along with a total cash prize of US$44,000.
Held from October 2020 to June this year, the Differential Privacy Temporal Map Challenge features a series of coding sprints, where teams apply differential privacy methods to temporal map data. The goal is to create a privacy-preserving dashboard map that shows changes across different map segments over time.
The team ‘N-CriPT’, consisting Associate Professor Xiao Xiaokui and PhD students Cai Kuntai, Wei Jianxin, and Bao Ergute from the NUS Computing research centre of the same name, along with Alibaba Group research scientist Ding Bolin, came in 4th place at the first sprint.
However, they ended the challenge with a bang by coming in champions, and winning the progressive prize, at the second and third sprints. Progressive prizes are given mid-way through each sprint to the best-scoring four teams, with priority given to solutions that are pre-screened as satisfying differential privacy.
“We lost some sleep towards the end of Sprint 2 and Sprint 3, but managed to further improve our solutions to gain the upper hand,” said A/P Xiao.
The second sprint featured census data about simulated individuals in various American states from 2012–2018, including 35 different survey features aside from the simulated individual the record belonged to.
The teams needed to build de-identification algorithms generating a set of synthetic, privatised survey records that most accurately preserved the patterns in the original data.
Team N-CriPT used a Probabilistic Graphical Model (PGM)-based approach for this sprint, and applied new techniques to deal with the increased sensitivity of the temporal data.
“We used the public dataset to select a set of marginals that contain highly correlated attributes. Then, we injected noise into the marginals to achieve differential privacy, and used those marginals to construct a Markov random field (MRF), which we used to generate synthetic data,” said the team.
For the third and final sprint, the teams had to build de-identification algorithms for generating synthetic, privatised taxi records that most accurately preserved the patterns in the original data, which featured records of millions of taxi trips in Chicago.
Team N-CriPT opted for a PGM-based approach similar to theirs in Sprint 2, to great effect.
“Using the given dataset, we selected a set of marginals contain highly correlated attributes. Then, to lower the sensitivity of the marginals, we sampled a collection of trips for each taxi. After which, noise was injected into the marginals to achieve differential privacy. We used those marginals to construct a MRF, which was used to generate synthetic data,” said the team.