Tutorials

Tutorial	Time	Title	Speaker
Tutorial A	12 April (AM)	Database Watermarking	Radu Sion
Tutorial B	12 April (PM)	Multilingual Database Systems	Jayant R. Haritsa
Tutorial C	14 April (PM)	Video Sequence Indexing and Query Processing	Xiaofang Zhou

Title:
Database Watermarking

Abstract:
Information, as an expression of knowledge, is probably the most valuable asset of humanity today. By enabling relatively cost-free, fast, and accurate access channels to information in digital form, computers have radically changed the way we think and express ideas. As more and more information is produced, packaged, and delivered in digital form in a fast, networked environment, one of its main features threatens to become its worst enemy: zero-cost verbatim copies. The inherent ability to produce duplicates of digital Works at virtually no cost can now be misused, e.g., for illicit profit. This dramatically increases the need for effective rights protection mechanisms.

Different avenues are available, each with its advantages and drawbacks. Enforcement by legal means is usually ineffective unless augmented by a digital counterpart such as Information Hiding. Digital Watermarking deploys Information Hiding as a method of Rights Protection to conceal an indelible "rights witness" (watermark) within the digital Work to be protected. The soundness of such a method relies on two assumptions: that altering the Work in the process of hiding the mark does not destroy the value of the Work, and that it is difficult for a malicious adversary ("Mallory") to remove or alter the mark beyond detection without destroying the value of the Work. The ability to resist attacks from such an adversary (mostly aimed at removing the embedded watermark) is one of the major concerns in the design of a sound watermarking solution.
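As a toy illustration of the embed-and-detect idea described above, the following Python sketch hides a keyed bit in the least significant bit of selected integer values and later checks for it. It is a simplified sketch written for this page, not one of the schemes covered in the tutorial; the function names, the `fraction` parameter, and the choice of HMAC-SHA256 are all assumptions made for the example.

```python
import hashlib
import hmac


def _keyed_hash(key: bytes, primary_key: str) -> int:
    # Keyed hash of the tuple's primary key; only the key holder can recompute it.
    digest = hmac.new(key, primary_key.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")


def embed(rows: dict, key: bytes, fraction: int = 2) -> dict:
    """Return a copy of rows with a keyed bit written into the LSB of selected values."""
    marked = {}
    for pk, value in rows.items():
        h = _keyed_hash(key, pk)
        if h % fraction == 0:              # keyed choice of which tuples carry the mark
            bit = (h >> 8) & 1             # keyed choice of the bit to hide
            value = (value & ~1) | bit     # changes the value by at most 1
        marked[pk] = value
    return marked


def detect(rows: dict, key: bytes, fraction: int = 2) -> tuple:
    """Return (matches, total) over the tuples that the key selects."""
    matches = total = 0
    for pk, value in rows.items():
        h = _keyed_hash(key, pk)
        if h % fraction == 0:
            total += 1
            matches += int((value & 1) == ((h >> 8) & 1))
    return matches, total
```

On unaltered marked data, every selected tuple matches. On unmarked data, or with the wrong key, roughly half would be expected to match by chance, so a match rate well above one half is what supports an ownership claim in this toy setting.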

Rights Protection for relational data is important in areas where sensitive, valuable content is to be outsourced. A good example is a data mining application, where data is sold in pieces to parties specialized in mining it (e.g., sales pattern databases, oil drilling data, financial data). Other scenarios involve, for example, online B2B interactions (e.g., airline reservation and scheduling portals) or sensor streams, in which data is made available for direct, possibly interactive use. In [5, 6, 7] Sion et al., and in [1, 2] Kiernan, Agrawal et al., explore rights protection solutions for numeric relational data through watermarking. In [4, 8] Sion introduces the problem of resilient rights proofs for categorical data. Additionally, in [3] Li et al. extend the work by Kiernan, Agrawal et al. [1, 2] to provide for multi-bit watermarks in a direct domain encoding. In this tutorial we explore these and other related research efforts. We analyze their resilience and their ability to provide court-time rights proofs. We discuss deployment scenarios, provide implementation recommendations, and explore associated future directions. Time permitting, we also intend to demonstrate one of the software packages discussed in [5].

References:
[1] Rakesh Agrawal, Peter J. Haas, and Jerry Kiernan. Watermarking relational data: framework, algorithms and analysis. The VLDB Journal, 12(2):157-169, 2003.
[2] J. Kiernan and R. Agrawal. Watermarking relational databases. In Proceedings of the 28th International Conference on Very Large Databases (VLDB), 2002.
[3] Yingjiu Li, Vipin Swarup, and Sushil Jajodia. A robust watermarking scheme for relational data. In Proceedings of the Workshop on Information Technology and Systems (WITS), pages 195-200, 2003.
[4] Radu Sion. Proving ownership over categorical data. In Proceedings of the IEEE International Conference on Data Engineering (ICDE), 2004.
[5] Radu Sion. wmdb.*: A suite for database watermarking (demo). In Proceedings of the IEEE International Conference on Data Engineering (ICDE), 2004.
[6] Radu Sion, Mikhail Atallah, and Sunil Prabhakar. Rights protection for relational data. In Proceedings of the ACM Special Interest Group on Management of Data Conference (SIGMOD), 2003.
[7] Radu Sion, Mikhail Atallah, and Sunil Prabhakar. Relational data rights protection through watermarking. IEEE Transactions on Knowledge and Data Engineering (TKDE), 16(6), June 2004.
[8] Radu Sion, Mikhail Atallah, and Sunil Prabhakar. Ownership proofs for categorical data. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2005.

About the Speaker:
Radu Sion is an Assistant Professor of Computer Sciences at Stony Brook University. He received his PhD (2004) in Computer Sciences from Purdue University. While at Purdue, Radu was affiliated with the Center of Education and Research in Information Assurance and with the Indiana Center of Database Systems. For most of 2004, Radu visited the IBM Almaden Research Center while on leave from Stony Brook.

Radu Sion's current research interests center on interconnected entities that access data and need to do so with assurances of security, privacy, and functionality, preferably fast. His research lies at the intersection of security, databases, and distributed systems. Applications include authentication, rights protection and integrity proofs, trusted reputation and secure storage in peer-to-peer and ad hoc environments, data privacy and bounds on illicit inference over multiple data sources, security in computation/data grids, and detection of intrusions by access profiling for online web portals.

Title:
Multilingual Database Systems

Abstract:
Efficient storage and query processing of data spanning multiple natural languages are of crucial importance in today's globalized world. A primary prerequisite for achieving this goal is that the de facto standard data repositories, relational database systems, should efficiently and seamlessly support multilingual data. In this tutorial, we will first present a detailed assessment of how well today's database systems (both commercial and public-domain) handle the storage, management, and processing of multilingual data. Our results will show that there are significant performance inefficiencies for languages based on scripts other than Latin (such as Devanagari, Kanji, Cyrillic, etc.). We will also outline techniques for alleviating these problems.

With regard to functionality, a major limitation of SQL is that it does not support querying of data across different natural languages, that is, cross-lingual queries. To address this lacuna, we will propose two new SQL operators that support phoneme-based matching of names, and ontology-based matching of concepts, in the multilingual world.

We will define an algebra for integrating these new operators with relational systems, along with the associated cost models, selectivity estimators, and access methods. We will also highlight our experience with a prototype implementation of these operators on PostgreSQL.
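To give a flavour of what phoneme-based matching of names could look like, the Python sketch below compares two names after they have been converted to phoneme sequences, and accepts them as a match when their normalized edit distance is below a threshold. It is a minimal sketch under stated assumptions, not the operators proposed in the tutorial: the text-to-phoneme conversion is assumed to have happened elsewhere, and the function names, the example phoneme sequences, and the 0.25 threshold are hypothetical.

```python
def edit_distance(a, b) -> int:
    """Levenshtein distance between two sequences (strings or lists of phonemes)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phonemic_match(p1, p2, threshold: float = 0.25) -> bool:
    """True when two phoneme sequences differ by at most `threshold` of the longer one."""
    if not p1 and not p2:
        return True
    return edit_distance(p1, p2) / max(len(p1), len(p2)) <= threshold


# Hypothetical phoneme sequences for the same name written in two scripts.
name_latin = ["n", "e", "h", "r", "u"]
name_other = ["n", "e", "h", "r", "uu"]
same_name = phonemic_match(name_latin, name_other)
```

Working on phoneme sequences rather than on characters is what makes the comparison independent of the script in which each name is stored.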

In a nutshell, this tutorial will present practical approaches towards realizing the ultimate goal of "natural-language-neutral" database engines.

Duration:
3 hours

About the Speaker:
Jayant R. Haritsa is on the faculty of the Supercomputer Education & Research Centre and the Department of Computer Science & Automation at the Indian Institute of Science, Bangalore. He received the BTech degree in Electronics and Communications Engineering from the Indian Institute of Technology (Madras), and the MS and PhD degrees in Computer Science from the University of Wisconsin (Madison). His research interests are in database systems. He is a recipient of the Swarnajayanti Fellowship from the Government of India, and the Sir C V Raman Young Scientist Award from the Government of Karnataka. He is an Associate Editor of the International Journal of Real-time Systems.

Title:
Video Sequence Indexing and Query Processing

Abstract:
Effective and efficient multimedia data retrieval has attracted extensive attention in the last decade. Among media types, video presents the most complex data, comprising a sequence of frames (or feature vectors), audio, motion, metadata, and more. With ever heavier use of video devices and advances in video processing technologies, the amount of video data has grown rapidly across many applications, such as advertising, news broadcasting, video surveillance, personal video archives, and medical video. Moreover, the popularity of the WWW enables enormous amounts of video data to be published and shared, and Web search engines give users convenient ways of finding videos of interest. Due to the high complexity of video data, retrieving video content similar to a user's query from a large database requires: (a) effective and compact video representations, (b) efficient similarity measurement, and (c) efficient indexing on the compact representations. Given this demand, indexing video sequences for fast retrieval has recently attracted much attention in the database community, both with and without consideration of temporal, spatial, and alignment features.

In this tutorial, we focus on the view of a video as a sequence of frames, each of which is a high-dimensional image feature vector. The number of frames is typically in the hundreds or more, depending on the length of the video. We will review recent video feature representation models and their similarity measures. Understanding these themes and the intuition behind them helps in constructing effective sequence indexing structures and developing fast search techniques. We will then discuss state-of-the-art methods for video sequence indexing in high-dimensional space. We will also discuss open issues, challenges, and potential research trends in video search.
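As a small illustration of treating a video as a sequence of frame feature vectors, the Python sketch below scores a candidate video by the fraction of query frames that have at least one candidate frame within a distance `eps`. This is only one simple, assumed similarity measure chosen for the example, not a specific model from the tutorial; the function names and the `eps` parameter are hypothetical, and the brute-force scan stands in for the index structures a real system would use.

```python
import math


def frame_distance(u, v) -> float:
    """Euclidean distance between two frame feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def sequence_similarity(query, candidate, eps: float) -> float:
    """Fraction of query frames with at least one candidate frame within eps."""
    if not query:
        return 0.0
    hits = sum(
        1
        for q in query
        if any(frame_distance(q, c) <= eps for c in candidate)
    )
    return hits / len(query)


# Tiny two-dimensional example; real frame features are high-dimensional.
query_video = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
candidate_video = [(0.0, 0.1), (1.0, 1.0), (9.0, 9.0)]
score = sequence_similarity(query_video, candidate_video, eps=0.5)
```

The scan compares every query frame with every candidate frame, which is the cost that compact representations and high-dimensional indexing aim to avoid.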

With increasingly complex queries emerging in Web search engines, indexing video, the most powerful communication medium, will be in the limelight. This tutorial covers a wide spectrum of topics in video search from the database point of view.

Duration:
3 hours

Target audience: researchers and practitioners in the area of multimedia databases and information retrieval.

About the Speaker:
Dr Xiaofang Zhou is a Professor in the School of Information Technology and Electrical Engineering, The University of Queensland, Australia. He received his BSc and MSc degrees in Computer Science from Nanjing University, and his PhD degree in Computer Science from the University of Queensland in 1994. His research interests include spatial databases, multimedia databases, and high-performance query processing. He has published over 90 research papers, including papers at SIGMOD, VLDB, and ICDE and in the VLDB Journal.
