School of Computing


Research Topic - Privacy and Anonymity in Databases


1) Organizations such as government agencies, insurance companies and hospitals need to release detailed data (e.g., medical records or financial data) for research and other public-benefit purposes. However, sensitive personal information (e.g., the disease of a specific person) may be revealed in this process, even when identifying attributes such as name or social security number are hidden. This is due to the existence of quasi-identifiers (QI): sets of attributes (e.g., ⟨ZIP, DateOfBirth, Sex⟩) that can be joined with information obtained from other sources (e.g., public voter registration data) to reveal the identity of individual records. A well-known study found that 87% of the US population can be uniquely identified by the combination ⟨ZIP, DateOfBirth, Sex⟩. As more data become available online, people are expected to raise more concerns about their privacy. Recent research proposed k-Anonymity and ℓ-Diversity to address this problem. Both approaches involve generalization (e.g., showing the area code instead of the exact phone number) or suppression (completely hiding some data), both of which cause information loss. Our research focuses on:

a)      Developing a framework for efficient data transformations (i.e., k-Anonymity and ℓ-Diversity) that minimize information loss for a given k or ℓ. The majority of existing research employs many-to-one mappings to compute the anonymized versions of the original multi-dimensional data. We focus on more flexible many-to-many multi-dimensional mappings. Although it has been acknowledged that this method may achieve lower information loss, it has not received much attention because it is computationally expensive. To reduce the computation cost, we use dimensionality reduction techniques and study efficient approximation algorithms in the one-dimensional space.

b)      Studying the dual problem: "Given the maximum acceptable information loss, find a transformation that maximizes privacy (i.e., k or ℓ)". Although this problem is significant in practice, it has not been studied before.

c)      Extending our framework (i.e., anonymization via dimensionality reduction) to related applications. Examples include location-based services (e.g., online maps), where the location of the querying user (rather than the data) must be anonymized, and stream data (e.g., data from sensors or the stock market).
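
To make the notions of generalization, k-Anonymity and ℓ-Diversity concrete, the following is a minimal Python sketch. It is an illustration only, not the framework described above: the toy records, the choice of generalizing ZIP to a prefix and birth year to a decade, the simple "distinct values" reading of ℓ-Diversity, and the function names (generalize, groups, k_of, l_of) are all assumptions made for this example.

```python
from collections import defaultdict

# Toy records: (ZIP, year of birth, sex, disease). All values are invented.
RECORDS = [
    ("13053", 1971, "F", "flu"),
    ("13058", 1975, "F", "asthma"),
    ("13067", 1972, "F", "flu"),
    ("14850", 1983, "M", "diabetes"),
    ("14853", 1988, "M", "flu"),
    ("14859", 1981, "M", "asthma"),
]

def generalize(record, zip_digits=3):
    """Coarsen the quasi-identifier: keep a ZIP prefix and the birth decade."""
    zip_code, year, sex, disease = record
    masked_zip = zip_code[:zip_digits] + "*" * (len(zip_code) - zip_digits)
    decade = f"{year // 10 * 10}s"
    return (masked_zip, decade, sex), disease

def groups(records, zip_digits=3):
    """Collect sensitive values per generalized quasi-identifier."""
    out = defaultdict(list)
    for record in records:
        qi, sensitive = generalize(record, zip_digits)
        out[qi].append(sensitive)
    return out

def k_of(records, zip_digits=3):
    """Largest k for which the generalized table is k-anonymous."""
    return min(len(v) for v in groups(records, zip_digits).values())

def l_of(records, zip_digits=3):
    """Largest l under the simple 'distinct sensitive values' criterion."""
    return min(len(set(v)) for v in groups(records, zip_digits).values())

if __name__ == "__main__":
    for digits in (5, 3):
        print(digits, "ZIP digits kept -> k =", k_of(RECORDS, digits),
              ", l =", l_of(RECORDS, digits))
```

Keeping fewer ZIP digits raises k and ℓ but publishes coarser data, which is the privacy versus information-loss trade-off that items a) and b) approach from opposite directions.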

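
The idea of anonymizing through dimensionality reduction, mentioned in items a) and c) above, can be pictured with a rough Python sketch. The text does not say which mapping or approximation algorithm is used, so everything here is an assumption chosen for illustration: a Z-order (Morton) curve as the mapping to one dimension, a simple greedy cut of the sorted order into groups of at least k points, and the hypothetical helper names interleave_bits, partition_1d and bounding_box.

```python
def interleave_bits(x, y, bits=8):
    """Z-order (Morton) key: interleave the bits of two non-negative ints."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return key

def partition_1d(points, k):
    """Sort points by their 1-D key and cut the order into groups of >= k."""
    if len(points) < k:
        raise ValueError("fewer than k points: groups of size k are impossible")
    ordered = sorted(points, key=lambda p: interleave_bits(*p))
    result = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    if len(result) > 1 and len(result[-1]) < k:
        # Merge an undersized tail into the previous group.
        tail = result.pop()
        result[-1].extend(tail)
    return result

def bounding_box(group):
    """Generalized value published for a group: its bounding rectangle."""
    xs = [p[0] for p in group]
    ys = [p[1] for p in group]
    return (min(xs), min(ys), max(xs), max(ys))

if __name__ == "__main__":
    # Invented 2-D points, e.g. two numeric attributes or user locations.
    points = [(0, 0), (1, 0), (0, 1), (1, 1), (7, 7), (6, 7), (7, 6)]
    for group in partition_1d(points, k=3):
        print(bounding_box(group), group)
```

Each point is then published as the bounding rectangle of its group, so it cannot be told apart from at least k - 1 others. The greedy cut is only a baseline: a group that straddles distant regions gets a large rectangle, which is the kind of information loss a better one-dimensional algorithm would try to reduce.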

2)  Some Publications:

G. Ghinita, P. Kalnis, and S. Skiadopoulos, "PRIVE: Anonymous Location-Based Queries in Distributed Mobile Systems". In Proc. of World Wide Web Conference (WWW), Banff, Canada, 2007 (to appear).

G. Ghinita, P. Kalnis, and S. Skiadopoulos, "MobiHide: A Mobile Peer-to-Peer System for Anonymous Location-Based Queries". In Proc. of Int. Symposium on Spatial and Temporal Databases (SSTD), Boston, USA, 2007 (to appear).

P. Kalnis, G. Ghinita, K. Mouratidis, and D. Papadias, "Preserving Anonymity in Location Based Services". Technical Report TRB6/06, Dept. of Computer Science, NUS, 2006 (under submission).


3) Collaborators:

Dimitris PAPADIAS, Professor, Hong Kong University of Science and Technology

Spiros SKIADOPOULOS, University of Peloponnese, Greece

Nikos MAMOULIS, University of Hong Kong


4) Faculty Members in the Research Area:

Panos KALNIS

