VLDB2010 Spore :Program

Biography

Hanan Samet

Hanan Samet (http://www.cs.umd.edu/~hjs/) is a Professor of Computer Science at the University of Maryland, College Park and is a member of the Institute for Computer Studies. He is also a member of the Computer Vision Laboratory at the Center for Automation Research, where he leads a number of research projects on the use of hierarchical data structures for database applications involving spatial data. He has a Ph.D. from Stanford University. He is the author of the recent book "Foundations of Multidimensional and Metric Data Structures", published by Morgan Kaufmann, San Francisco, CA, in 2006 (http://www.mkp.com/multidimensional), an award winner in the 2006 best book in Computer and Information Science competition of the Professional and Scholarly Publishers (PSP) Group of the Association of American Publishers (AAP). He is also the author of the first two books on spatial data structures, titled "Design and Analysis of Spatial Data Structures" and "Applications of Spatial Data Structures: Computer Graphics, Image Processing and GIS", published by Addison-Wesley, Reading, MA, 1990. He is the founding chair of ACM SIGSPATIAL and a recipient of best paper awards in the 2008 SIGMOD Conference, the 2008 SIGSPATIAL Conference, and the 2007 Computers & Graphics Journal, as well as the 2009 UCGIS Research Award and the 2010 CMPS Board of Visitors Award at the University of Maryland. He is a Fellow of the ACM, IEEE, and IAPR (International Association for Pattern Recognition), and an ACM Distinguished Speaker.

Title: Techniques for Similarity Searching in Multimedia Databases

Similarity searching is a crucial part of retrieval in multimedia databases used for applications such as pattern recognition, image databases, and content-based retrieval. It involves finding objects in a data set S that are similar to a query object q based on some distance measure d, which is usually a distance metric. The search process is usually achieved by means of nearest neighbor finding.
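As a point of reference, nearest neighbor finding over S can be sketched as a plain sequential scan. This is only an illustrative baseline, not a method taken from the seminar: the function names `euclidean` and `k_nearest`, the choice of Euclidean distance for d, and the sample points are all assumptions made for the example.

```python
import heapq
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length tuples (assumed choice of d)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(S, q, k, d=euclidean):
    """Return the k objects of S closest to q under d, by scanning all of S."""
    return heapq.nsmallest(k, S, key=lambda o: d(o, q))

# Hypothetical data set S and query object q.
S = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (2.0, 0.5)]
q = (1.2, 0.9)
nearest = k_nearest(S, q, 2)
```

The scan evaluates d once per object in S, which is the cost that the indexing methods discussed in this seminar try to avoid.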

Existing methods for handling similarity search in this setting fall into one of two classes. The first is based on mapping to a vector space. The vector space is usually of high dimension, which requires special handling because indexing methods do not discriminate well in such spaces. In particular, the query regions often overlap all of the blocks that result from the decomposition of the underlying space. This has led to some special solutions that make use of a sequential scan. An alternative is to use dimension reduction to find a mapping from a high-dimensional space into a low-dimensional space by finding the most discriminating dimensions, and then to index the data using one of a number of different data structures such as k-d trees, R-trees, quadtrees, etc. The second class directly indexes the objects based on distances, making use of data structures such as the vp-tree, M-tree, etc. At times, the distances are not metrics, which requires additional care.
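To make the second class concrete, the following is a minimal sketch of a vp-tree, assuming d is a metric so that the triangle inequality can be used for pruning. It is not the seminar's own formulation: the names `VPNode`, `build`, and `nearest`, the random choice of vantage point, and the use of the median distance as the partition radius are assumptions made for illustration.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance; any metric could be substituted (assumed choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class VPNode:
    def __init__(self, point, radius, inside, outside):
        self.point = point      # vantage point stored at this node
        self.radius = radius    # partition radius around the vantage point
        self.inside = inside    # subtree of objects with d(point, o) <= radius
        self.outside = outside  # subtree of objects with d(point, o) > radius

def build(points, d, rng):
    """Build a vp-tree using only distances between objects."""
    if not points:
        return None
    points = list(points)
    vp = points.pop(rng.randrange(len(points)))  # random vantage point
    if not points:
        return VPNode(vp, 0.0, None, None)
    dists = sorted(d(vp, p) for p in points)
    radius = dists[len(dists) // 2]  # median distance as the radius
    inside = [p for p in points if d(vp, p) <= radius]
    outside = [p for p in points if d(vp, p) > radius]
    return VPNode(vp, radius, build(inside, d, rng), build(outside, d, rng))

def nearest(node, q, d, best=(None, math.inf)):
    """Return (object, distance) for the nearest neighbor of q in the tree."""
    if node is None:
        return best
    dist = d(q, node.point)
    if dist < best[1]:
        best = (node.point, dist)
    if dist <= node.radius:
        # q lies inside the ball: search inside first, then outside only if
        # the triangle inequality cannot rule out a closer object there.
        best = nearest(node.inside, q, d, best)
        if dist + best[1] > node.radius:
            best = nearest(node.outside, q, d, best)
    else:
        best = nearest(node.outside, q, d, best)
        if dist - best[1] < node.radius:
            best = nearest(node.inside, q, d, best)
    return best

# Hypothetical usage on random 2-d points.
rng = random.Random(0)
pts = [(rng.random(), rng.random()) for _ in range(200)]
tree = build(pts, euclidean, rng)
q = (0.5, 0.5)
point, dist = nearest(tree, q, euclidean)
```

Because the pruning tests rely on the triangle inequality, this sketch can return a wrong answer when d is not a metric, which is one form of the additional care mentioned above.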

This seminar is organized into four parts: an overview, followed by the three basic topics outlined above, namely indexing low- and high-dimensional spaces, distance-based indexing, and nearest neighbor searching.

