VLDB 2010 , 36th International Conference on Very Large Data Bases
  Singapore : 13 to 17 Sept 2010, Grand Copthorne Waterfront Hotel
  General Information
  Proceedings, Slides,
  Photos
  Program
  Information
  Call for Papers & Proposals
  Previous Conferences
  Links
   VLDB2010 are white and red as the Singapore flag: Photo by Courtesy of Eugene Tang/Singaporesights.com
Tutorial 3

Hanan Samet

Hanan Samet (http://www.cs.umd.edu/~hjs/) is a Professor of Computer Science at the University of Maryland, College Park and is a member of the Institute for Computer Studies.  He is also a member of the Computer Vision Laboratory at the Center for Automation Research where he leads a number of research projects on the use of hierarchical data structures for database applications involving spatial data.  He has a Ph.D from Stanford University.  He is the author of the recent book "Foundations of Multidimensional and Metric Data Structures" published by Morgan-Kaufmann, San Francisco, CA, in 2006 (http://www.mkp.com/multidimensional), an award winner in the 2006 best book in Computer and Information Science competition of the Professional and Scholarly Publishers (PSP) Group of the American Publishers Association (AAP), and of the first two books on spatial data structures titled "Design and Analysis of Spatial Data Structures" and "Applications of Spatial Data Structures:  Computer Graphics, Image Processing and GIS" published by Addison-Wesley, Reading, MA, 1990.  He is the founding chair of ACM SIGSPATIAL, and a recipient of best paper awards in the 2008 SIGMOD Conference, the 2008 SIGSPATIAL Conference, and the 2007 Computers & Graphics Journal, the 2009 UCGIS Research Award and the 2010 CMPS Board of Visitors Award at the University of Maryland, a Fellow of the ACM, IEEE, and IAPR (International Association for Pattern Recognition), and an ACM Distinguished Speaker.

 

Similarity searching is a crucial part of retrieval in multimedia databases used for applications such as pattern recognition, image databases, and content-based retrieval.  It involves finding objects in a data set S that are similar to a query object q based on some distance measure d which is usually a distance metric.  The search process is usually achieved by means of nearest neighbor finding.

Existing methods for handling similarity search in this setting fall into one of two classes.  The first is based on mapping to a vector space.  The vector space is usually of high dimension which requires special handling due to the fact indexing methods do not discriminate well in such spaces.  In particular, the query regions often overlap all of the blocks that result from the decomposition of the underlying space.  This has led to some special solutions that make use of a sequential scan.  An alternative is to use dimension reduction to find a mapping from a high-dimensional space into a low-dimensional space by finding the most discriminating dimensions and then index the data using one of a number of different data structures such as k-d trees, R-trees, quadtrees, etc.  The second directly indexes the objects based on distances making use of data structures such as the vp-tree, M-tree, etc.  At times, the distances are no t metrics which requires additional care.

This seminar is organized into four parts that include an overview as well as cover the basic concepts outlined above:  indexing low and high dimensional spaces, distance-based indexing, and nearest neighbor searching.

Click for Slides in PDF

 


Email Registration | Email Webmaster | Email Committees | NUS Home | SoC

© Copyright 2009-2010 National University of Singapore. All Rights Reserved.
Terms of Use | Privacy | Credits
Last modified on 14 Sep 2010