<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="publications.xsl"?>


<PUBLICATIONS UPDATED="10 November 2008">


<!--  ............................................................................  BOOK CHAPTERS  ............................................................................ -->


	
	
	<CHAPTER>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Gabriel</FIRST>
				<LAST>Ghinita</LAST>
			</AUTHOR>
			<TITLE>Spatial Anonymity</TITLE>
			<BOOK>(to appear) Encyclopedia of Database Systems</BOOK>
			<PUBLISHER>Springer</PUBLISHER>
			<YEAR>2008</YEAR>
	</CHAPTER>
	
	<CHAPTER>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<TITLE>Distributed Spatial Databases</TITLE>
			<BOOK>(to appear) Encyclopedia of Database Systems</BOOK>
			<PUBLISHER>Springer</PUBLISHER>
			<YEAR>2008</YEAR>
	</CHAPTER>
	
	

	<CHAPTER>			
			<AUTHOR>
				<FIRST>Nikos</FIRST>
				<LAST>Mamoulis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Spiridon</FIRST>
				<LAST>Bakiras</LAST>
			</AUTHOR>						
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<TITLE>Top-k OLAP Queries Using Augmented Spatial Access Methods</TITLE>
			<BOOK>Encyclopedia of GIS</BOOK>
			<PUBLISHER>Springer</PUBLISHER>
			<YEAR>2008</YEAR>
			<PAGES>1156-1161</PAGES>
			<ISBN>978-0-387-30858-6</ISBN>
	</CHAPTER>
	
	
	
	<CHAPTER>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Gabriel</FIRST>
				<LAST>Ghinita</LAST>
			</AUTHOR>
			<TITLE>OLAP Results, Distributed Caching</TITLE>
			<BOOK>Encyclopedia of GIS</BOOK>
			<PUBLISHER>Springer</PUBLISHER>
			<YEAR>2008</YEAR>
			<PAGES>805-809</PAGES>
			<ISBN>978-0-387-30858-6</ISBN>
	</CHAPTER>
	
	
	
	<CHAPTER>
			<AUTHOR>
				<FIRST>W H</FIRST>
				<LAST>Tok</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Stephane</FIRST>
				<LAST>Bressan</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Baihua</FIRST>
				<LAST>Zheng</LAST>
			</AUTHOR>
			<TITLE>Chapter VIII - Spatial Data on the Move</TITLE>
			<BOOK>Handbook of Research on Mobile Multimedia</BOOK>
			<PUBLISHER>Idea Group Reference</PUBLISHER>
			<ISBN>1-59140-866-0</ISBN>
			<YEAR>2006</YEAR>
			<PAGES>103-118</PAGES>
	</CHAPTER>




<!--  ............................................................................  JOURNALS  ............................................................................ -->

	<ARTICLE>
			<AUTHOR>
				<FIRST>Zhenzhou</FIRST>
				<LAST>Zhu</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Spiridon</FIRST>
				<LAST>Bakiras</LAST>
			</AUTHOR>
			<TITLE>DCMP: A Distributed Cycle Minimization Protocol for Peer-to-Peer Networks</TITLE>
			<JOURNAL>IEEE Transactions on Parallel and Distributed Systems (IEEE TPDS)</JOURNAL>
			<YEAR>2008</YEAR>
			<RANK>1</RANK>
			<VOLUME>19</VOLUME>
			<NUMBER>3</NUMBER>
			<PAGES>363-377</PAGES>
			<ABSTRACT>Broadcast-based Peer-to-Peer (P2P) networks, including flat (e.g., Gnutella) and two-layer super-peer implementations (e.g., Kazaa), are extremely popular nowadays due to their simplicity, ease of deployment and versatility. The unstructured network topology, however, contains many cyclic paths which introduce numerous duplicate messages in the system. While such messages can be identified and ignored, they still consume a large proportion of the bandwidth and other resources, causing bottlenecks in the entire network. In this paper we describe DCMP, a dynamic, fully decentralized protocol which significantly reduces the duplicate messages by eliminating unnecessary cycles. As queries are transmitted through the peers, DCMP identifies the problematic paths and attempts to break the cycles, while maintaining the connectivity of the network. In order to preserve the fault resilience and load balancing properties of unstructured P2P systems, DCMP avoids creating a hierarchical organization. Instead, it applies cycle elimination symmetrically around some powerful peers to keep the average path length small. The overall structure is constructed fast with very low overhead. With the information collected during this process, distributed maintenance is performed efficiently even if peers quit the system without notification. The experimental results from our simulator and the prototype implementation on PlanetLab confirm that DCMP significantly improves the scalability of unstructured P2P systems without sacrificing their desirable properties. Moreover, due to its simplicity, DCMP can be easily implemented in various existing P2P systems and is orthogonal to the search algorithms.</ABSTRACT>
			<PDF>tpds08.pdf</PDF>
			<BIB>journals/tpds/ZhuKB08</BIB>
	</ARTICLE>


	<ARTICLE>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Gabriel</FIRST>
				<LAST>Ghinita</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Kyriakos</FIRST>
				<LAST>Mouratidis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Dimitris</FIRST>
				<LAST>Papadias</LAST>
			</AUTHOR>
			<TITLE>Preventing Location-Based Identity Inference in Anonymous Spatial Queries</TITLE>
			<JOURNAL>IEEE Transactions on Knowledge and Data Engineering (IEEE TKDE)</JOURNAL>
			<VOLUME>19</VOLUME>
			<NUMBER>12</NUMBER>
			<YEAR>2007</YEAR>
			<PAGES>1719-1733</PAGES>
			<RANK>1</RANK>
			<ABSTRACT>The increasing trend of embedding positioning capabilities (e.g., GPS) in mobile devices facilitates the widespread use of Location Based Services. For such applications to succeed, privacy and confidentiality are essential. Existing privacy enhancing techniques rely on encryption to safeguard communication channels, and on pseudonyms to protect user identities. Nevertheless, the query contents may disclose the physical location of the user. In this paper, we present a framework for preventing location based identity inference of users who issue spatial queries to Location Based Services. We propose transformations based on the well-established K-anonymity concept to compute exact answers for range and nearest neighbor search, without revealing the query source. Our methods optimize the entire process of anonymizing the requests and processing the transformed spatial queries. Extensive experimental studies suggest that the proposed techniques are applicable to real-life scenarios with numerous mobile users.</ABSTRACT>
			<PDF>tkde07.pdf</PDF>
            <BIB>journals/tkde/KalnisGMP07</BIB>
	</ARTICLE>



	<ARTICLE>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Wee Siong</FIRST>
				<LAST>Ng</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Beng Chin</FIRST>
				<LAST>Ooi</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Kian Lee</FIRST>
				<LAST>Tan</LAST>
			</AUTHOR>
			<TITLE>Answering Similarity Queries in Peer-to-Peer Networks</TITLE>
			<JOURNAL>Information Systems</JOURNAL>
			<VOLUME>31</VOLUME>
			<NUMBER>1</NUMBER>
			<YEAR>2006</YEAR>
			<PAGES>57-72</PAGES>
			<RANK>2</RANK>
			<ABSTRACT>A variety of Peer-to-Peer (P2P) systems for sharing digital information are currently available and most of them perform searching by exact key matching. In this paper we focus on similarity searching and describe FuzzyPeer, a generic broadcast-based P2P system which supports a wide range of fuzzy queries. As a case study we present an image retrieval application implemented on top of FuzzyPeer. Users provide sample images whose sets of features are propagated through the peers. The answer consists of the top-k most similar images within the query horizon. In our system the participation of peers is ad-hoc and dynamic, their functionality is symmetric and there is no centralized index. In order to avoid flooding the network with messages, we develop a technique that takes advantage of the fuzzy nature of the queries. Specifically, some queries are "frozen" inside the network, and are satisfied by the streaming results of similar queries that are already running. We describe several optimization techniques for single and multiple-attribute queries, and study their tradeoffs. We evaluate the performance of our algorithms by a prototype implementation on our P2P platform and a simulated large-scale network. Our results suggest that by reusing the existing streams, the scalability of the system improves both in terms of number of nodes and query throughput.</ABSTRACT>
			<PDF>is06.pdf</PDF>
			<BIB>journals/is/KalnisNOT06</BIB>
	</ARTICLE>


	<ARTICLE>
			<AUTHOR>
				<FIRST>Rui</FIRST>
				<LAST>Zhang</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Beng Chin</FIRST>
				<LAST>Ooi</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Kian Lee</FIRST>
				<LAST>Tan</LAST>
			</AUTHOR>
			<TITLE>Generalized Multi-dimensional Data Mapping and Query Processing</TITLE>
			<JOURNAL>ACM Transactions on Database Systems (ACM TODS)</JOURNAL>
			<VOLUME>30</VOLUME>
			<NUMBER>3</NUMBER>
			<YEAR>2005</YEAR>
			<PAGES>661-697</PAGES>
			<RANK>1</RANK>
			<ABSTRACT>Multi-dimensional data points can be mapped to one-dimensional space to exploit single dimensional indexing structures such as the B+-tree. In this paper we present a Generalized structure for data Mapping and query Processing (GiMP), which supports extensible mapping methods and query processing. GiMP can be easily customized to behave like many competent indexing mechanisms for multi-dimensional indexing, such as the UB-Tree, the Pyramid technique, the iMinMax, and the iDistance. Besides being an extendible indexing structure, GiMP also serves as a framework to study the characteristics of the mapping and hence the efficiency of the indexing scheme. Specifically, we introduce a metric called mapping redundancy to characterize the efficiency of a mapping method in terms of disk page accesses and analyze its behavior for point, range and kNN queries. We also address the fundamental problem of whether an efficient mapping exists and how to define such a mapping for a given data set.</ABSTRACT>
			<PDF>tods05.pdf</PDF>
			<BIB>journals/tods/ZhangKOT05</BIB>
	</ARTICLE>

	<ARTICLE>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Dimitris</FIRST>
				<LAST>Papadias</LAST>
			</AUTHOR>
			<TITLE>Multi-query Optimization for On-Line Analytical Processing</TITLE>
			<JOURNAL>Information Systems</JOURNAL>
			<VOLUME>28</VOLUME>
			<NUMBER>5</NUMBER>
			<YEAR>2003</YEAR>
			<PAGES>457-473</PAGES>
			<RANK>2</RANK>
			<ABSTRACT>Multi-Dimensional Expressions (MDX) provide an interface for asking several related OLAP queries simultaneously. An interesting problem is how to optimize the execution of an MDX query, given that most data warehouses maintain a set of redundant materialized views to accelerate OLAP operations. A number of greedy and approximation algorithms have been proposed for different versions of the problem. In this paper we experimentally evaluate their performance using the APB and TPC-H benchmarks, concluding that they do not scale well for realistic workloads. Motivated by this fact, we develop two novel greedy algorithms. Our algorithms construct the execution plan in a top-down manner by identifying in each step the most beneficial view, instead of finding the most promising query. We show by extensive experimentation that our methods outperform the existing ones in most cases.</ABSTRACT>
			<PDF>TR-CS01-12.pdf</PDF>
			<BIB>journals/is/KalnisP03</BIB>
	</ARTICLE>

	<ARTICLE>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Nikos</FIRST>
				<LAST>Mamoulis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Dimitris</FIRST>
				<LAST>Papadias</LAST>
			</AUTHOR>
			<TITLE>View Selection Using Randomized Search</TITLE>
			<JOURNAL>Data &amp; Knowledge Engineering (DKE)</JOURNAL>
			<VOLUME>42</VOLUME>
			<NUMBER>1</NUMBER>
			<YEAR>2002</YEAR>
			<PAGES>89-111</PAGES>
			<RANK>2</RANK>
			<ABSTRACT>An important issue in data warehouse development is the selection of a set of views to materialize in order to accelerate OLAP queries, given certain space and maintenance time constraints. Existing methods provide good results but their high execution cost limits their applicability for large problems. In this paper, we explore the application of randomized, local search algorithms to the view selection problem. The efficiency of the proposed techniques is evaluated using synthetic datasets, which cover a wide range of data and query distributions. The results show that randomized search methods provide near-optimal solutions in limited time, being robust to data and query skew. Furthermore, they can be easily adapted for various versions of the problem, including the simultaneous existence of size and time constraints, and view selection in dynamic environments. The proposed heuristics scale well with the problem size, and are therefore particularly useful for real life warehouses, which need to be analyzed by numerous business perspectives.</ABSTRACT>
			<PDF>dke02.pdf</PDF>
			<BIB>journals/dke/KalnisMP02</BIB>
	</ARTICLE>





<!--  ............................................................................  2009  ............................................................................ -->

    
    
    <INPROCEDINGS>
			<AUTHOR>
				<FIRST>Man Lung</FIRST>
				<LAST>Yiu</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Gabriel</FIRST>
				<LAST>Ghinita</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Christian</FIRST>
				<LAST>Jensen</LAST>
			</AUTHOR>			
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<TITLE>Outsourcing of Private Spatial Data for Search Services</TITLE>
			<CONFERENCE>(to appear) Proc. of the IEEE International Conference on Data Engineering (ICDE), short paper</CONFERENCE>
			<YEAR>2009</YEAR>
			<PLACE>Shanghai, China</PLACE>
			<PAGES>4 pages</PAGES>
			<RANK>2</RANK>
			<ABSTRACT>Social networking and content sharing service providers, e.g., Facebook and Google Maps, enable their users to upload and share a variety of user-generated content, including location data such as points of interest. This paper considers a scenario in which the users wish to share location data, but only with their trusted friends. They wish to protect their data in such a way that only the trusted friends can perform spatial queries on the data. Anybody else, including the service provider, should not be able to view the data. We solve the problem by transforming the location data before uploading them. We contribute three spatial transformations that re-distribute locations in space and a fourth transformation that employs cryptographic techniques. The data owner selects transformation keys and shares them with the trusted friends. With the keys available, it is possible to perform spatial queries efficiently, whereas without the keys, it is infeasible to reconstruct the exact original data points from the transformed points. These four transformations represent a spectrum of tradeoffs between query efficiency and data security. In addition, we describe attack models for studying the security properties of our transformations. Empirical studies demonstrate that the proposed methods are efficient and applicable in practice.</ABSTRACT>
			<PDF>icde09.pdf</PDF>
    </INPROCEDINGS>







<!--  ............................................................................  2008  ............................................................................ -->

    
    
    <INPROCEDINGS>
			<AUTHOR>
				<FIRST>Manolis</FIRST>
				<LAST>Terrovitis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Nikos</FIRST>
				<LAST>Mamoulis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<TITLE>Privacy-preserving Anonymization of Set-valued Data</TITLE>
			<CONFERENCE>Proc. of the Int. Conf. on Very Large Data Bases (VLDB)</CONFERENCE>
			<YEAR>2008</YEAR>
			<PLACE>Auckland, New Zealand</PLACE>
			<PAGES>11 pages</PAGES>
			<RANK>1</RANK>
			<ABSTRACT>In this paper we study the problem of protecting privacy in the publication of set-valued data. Consider a collection of transactional data that contains detailed information about items bought together by individuals. Even after removing all personal characteristics of the buyer, which can serve as links to his identity, the publication of such data is still subject to privacy attacks from adversaries who have partial knowledge about the set. Unlike most previous works, we do not distinguish data as sensitive and non-sensitive, but we consider them both as potential quasi-identifiers and potential sensitive data, depending on the point of view of the adversary. We define a new version of the k-anonymity guarantee, the k^m-anonymity, to limit the effects of the data dimensionality and we propose efficient algorithms to transform the database. Our anonymization model relies on generalization instead of suppression, which is the most common practice in related works on such data. We develop an algorithm which finds the optimal solution, however, at a high cost which makes it inapplicable for large, realistic problems. Then, we propose two greedy heuristics, which scale much better and in most of the cases find a solution close to the optimal. The proposed algorithms are experimentally evaluated using real datasets.</ABSTRACT>
			<PDF>vldb08.pdf</PDF>
			<PPT>vldb08.ppt</PPT>
    </INPROCEDINGS>
	
	
	
	<INPROCEDINGS>
			<AUTHOR>
				<FIRST>Gabriel</FIRST>
				<LAST>Ghinita</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Ali</FIRST>
				<LAST>Khoshgozaran</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Cyrus</FIRST>
				<LAST>Shahabi</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Kian Lee</FIRST>
				<LAST>Tan</LAST>
			</AUTHOR>
			<TITLE>Private Queries in Location Based Services: Anonymizers are not Necessary</TITLE>
			<CONFERENCE>Proc. of ACM SIGMOD Conference</CONFERENCE>
			<YEAR>2008</YEAR>
			<PLACE>Vancouver, Canada</PLACE>
			<PAGES>121-132</PAGES>
			<RANK>1</RANK>
            <ABSTRACT>Mobile devices equipped with positioning capabilities (e.g., GPS) can pose location-dependent queries to Location Based Services (LBS). To protect privacy, the user location must not be disclosed. Existing solutions utilize a trusted anonymizer between the users and the LBS. This approach has several drawbacks: (i) All users must trust the third party anonymizer, which is a single point of attack. (ii) A large number of cooperating, trustworthy users is needed. (iii) Privacy is guaranteed only for a single snapshot of user locations; users are not protected against correlation attacks (e.g., history of user movement). We propose a novel framework to support private location-dependent queries, based on the theoretical work on Private Information Retrieval (PIR). Our framework does not require a trusted third party, since privacy is achieved via cryptographic techniques. Compared to existing work, our approach achieves stronger privacy for snapshots of user locations; moreover, it is the first to provide provable privacy guarantees against correlation attacks. We use our framework to implement approximate and exact algorithms for nearest-neighbor search. We optimize query execution by employing data mining techniques, which identify redundant computations. Contrary to common belief, the experimental results suggest that PIR approaches incur reasonable overhead and are applicable in practice.</ABSTRACT>
            <PDF>sigmod08.pdf</PDF>
            <PPT>sigmod08.ppt</PPT>
            <BIB>conf/sigmod/GhinitaKKST08</BIB>
      </INPROCEDINGS>



	<INPROCEDINGS>
			<AUTHOR>
				<FIRST>Gabriel</FIRST>
				<LAST>Ghinita</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Yufei</FIRST>
				<LAST>Tao</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<TITLE>On the Anonymization of Sparse High-Dimensional Data</TITLE>
			<CONFERENCE>Proc. of the IEEE International Conference on Data Engineering (ICDE)</CONFERENCE>
			<YEAR>2008</YEAR>
			<PLACE>Cancun, Mexico</PLACE>
			<PAGES>715-724</PAGES>
			<RANK>1</RANK>
            <ABSTRACT>Existing research on privacy-preserving data publishing focuses on relational data: in this context, the objective is to enforce privacy-preserving paradigms, such as k-anonymity and L-diversity, while minimizing the information loss incurred in the anonymizing process (i.e., maximize data utility). However, existing techniques adopt an indexing- or clustering-based approach, and work well for fixed-schema data, with low dimensionality. Nevertheless, certain applications require privacy-preserving publishing of transaction data (or basket data), which involves hundreds or even thousands of dimensions, rendering existing methods unusable. We propose a novel anonymization method for sparse high-dimensional data. We employ a particular representation that captures the correlation in the underlying data, and facilitates the formation of anonymized groups with low information loss. We propose an efficient anonymization algorithm based on this representation. We show experimentally, using real-life datasets, that our method clearly outperforms existing state-of-the-art in terms of both data utility and computational overhead.</ABSTRACT>
	        <PDF>icde08.pdf</PDF>
	        <PPT>icde08.ppt</PPT>
	        <BIB>conf/icde/GhinitaTK08</BIB>
    </INPROCEDINGS>
    
    
    
    	<INPROCEDINGS>		
			 <AUTHOR>
				<FIRST>Nikolay</FIRST>
				<LAST>Vyahhi</LAST>
			</AUTHOR>		
		   <AUTHOR>
				<FIRST>Spiridon</FIRST>
				<LAST>Bakiras</LAST>
			</AUTHOR>			
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Gabriel</FIRST>
				<LAST>Ghinita</LAST>
			</AUTHOR>
			<TITLE>Tracking Moving Objects in Anonymized Trajectories</TITLE>
			<CONFERENCE>Proc. of the Int. Conf. on Database and Expert Systems Applications (DEXA)</CONFERENCE>
			<YEAR>2008</YEAR>
			<PLACE>Turin, Italy</PLACE>
			<PAGES>158-171</PAGES>
			<RANK>2</RANK>
            <ABSTRACT>Multiple target tracking (MTT) is a well-studied technique in the field of radar technology, which associates anonymized measurements with the appropriate object trajectories. This technique, however, suffers from combinatorial explosion, since each new measurement may potentially be associated with any of the existing tracks. Consequently, the complexity of existing MTT algorithms grows exponentially with the number of objects, rendering them inapplicable to large databases. In this paper, we investigate the feasibility of applying the MTT framework in the context of large trajectory databases. Given a history of object movements, where the corresponding object ids have been removed, our goal is to track the trajectory of every object in the database in successive timestamps. Our main contribution lies in the transition from an exponential solution to a polynomial one. We introduce a novel method that transforms the tracking problem into a min-cost max-flow problem. We then utilize well-known graph algorithms that work in polynomial time with respect to the number of objects. The experimental results indicate that the proposed methods produce high quality results that are comparable with the state-of-the-art MTT algorithms. In addition, our methods reduce significantly the computational cost and scale to a large number of objects.</ABSTRACT>
            <PDF>dexa08.pdf</PDF>
            <PPT>dexa08.ppt</PPT>
            <BIB>conf/dexa/VyahhiBKG08</BIB>
    </INPROCEDINGS>    
    
    
    
       	<INPROCEDINGS>			
		   <AUTHOR>
				<FIRST>Wee Siong</FIRST>
				<LAST>Ng</LAST>
			</AUTHOR>			
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Kian Lee</FIRST>
				<LAST>Tan</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Markus</FIRST>
				<LAST>Kirchberg</LAST>
			</AUTHOR>
			<TITLE>POEMS: Peer-based Overload Management</TITLE>
			<CONFERENCE>Proc. of the Int. Conf. on Web Information Systems Engineering (WISE)</CONFERENCE>
			<YEAR>2008</YEAR>
			<PLACE>Auckland, New Zealand</PLACE>
			<RANK>3</RANK>
			<PAGES>350-365</PAGES>
			<ABSTRACT>The Internet has become increasingly important to many emerging applications such as blogs, wikis, podcasts, and other Web-based communities and social-networking services, i.e. Web 2.0. Behind the scenes, added functionalities depend on the ability of users to work with the data stored on servers, i.e. DBMSs. However, the unpredictability and fluctuations of requests could result in overload, which can substantially degrade the quality of service. It is a challenging task to provide quality of service with inexpensive and scalable infrastructure. In this paper, we look at a new architectural design dimension, POEMS, that is online transformable between a single-node server and peer-based service network architectures. POEMS operates as a conventional DBMS under normal load conditions and transforms to peer-to-peer operation mode for processing under heavy load. In contrast to traditional distributed DBMSs, all nodes contribute their spare capacities for data manipulation. This is achieved without the need to install any DBMS at any of the contributing nodes. Data are partitioned online and operators are distributed to nodes similarly. The effectiveness of query processing is achieved by node cooperation. POEMS allows processes or operators to be dismissed online, so a user can fully utilise his/her resources.</ABSTRACT>
			<BIB>conf/wise/NgKTK08</BIB>
    </INPROCEDINGS>
    
    
    

	<INPROCEDINGS>			
		   <AUTHOR>
				<FIRST>Bharath</FIRST>
				<LAST>Krishnamachari</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Gabriel</FIRST>
				<LAST>Ghinita</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<TITLE>Privacy-Preserving Publication of User Locations in the Proximity of Sensitive Sites</TITLE>
			<CONFERENCE>Proc. of the Int. Conf. on Scientific and Statistical Database Management (SSDBM)</CONFERENCE>
			<YEAR>2008</YEAR>
			<PLACE>Hong Kong, China</PLACE>
			<PAGES>95-113</PAGES>
			<RANK>2</RANK>
            <ABSTRACT>Location-based services, such as on-line maps, obtain the exact location of numerous mobile users. This information can be published for research or commercial purposes. However, privacy may be compromised if a user is in the proximity of a sensitive site (e.g., hospital). To preserve privacy, existing methods employ the K-anonymity paradigm to hide each affected user in a group that contains at least K − 1 other users. Nevertheless, current solutions have the following drawbacks: (i) they may fail to achieve anonymity, (ii) they may cause excessive distortion of location data and (iii) they incur high computational cost. In this paper, we define formally the attack model and discuss the conditions that guarantee privacy. Then, we propose two algorithms which employ 2-D to 1-D transformations to anonymize the locations of users in the proximity of sensitive sites. The first algorithm, called MK, creates anonymous groups based on the set of user locations only, and exhibits very low computational cost. The second algorithm, called BK, performs bichromatic clustering of both user locations and sensitive sites; BK is slower but more accurate than MK. We show experimentally that our algorithms outperform the existing methods in terms of computational cost and data distortion.</ABSTRACT>
            <PDF>ssdbm08.pdf</PDF>
            <PPT>ssdbm08.ppt</PPT>
            <BIB>conf/ssdbm/KrishnamachariGK08</BIB>
    </INPROCEDINGS>
    
    





<!--  ............................................................................  2007  ............................................................................ -->



	<INPROCEDINGS>
			<AUTHOR>
				<FIRST>Gabriel</FIRST>
				<LAST>Ghinita</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Panagiotis</FIRST>
				<LAST>Karras</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Nikos</FIRST>
				<LAST>Mamoulis</LAST>
			</AUTHOR>
			<TITLE>Fast Data Anonymization with Low Information Loss</TITLE>
			<CONFERENCE>Proc. of the Int. Conf. on Very Large Data Bases (VLDB)</CONFERENCE>
			<YEAR>2007</YEAR>
			<PLACE>Vienna, Austria</PLACE>
			<PAGES>758-769</PAGES>
			<RANK>1</RANK>
			<ABSTRACT>Recent research studied the problem of publishing microdata without revealing sensitive information, leading to the privacy preserving paradigms of k-anonymity and l-diversity. k-anonymity protects against the identification of an individual’s record. l-diversity, in addition, safeguards against the association of an individual with specific sensitive information. However, existing approaches suffer from at least one of the following drawbacks: (i) The information loss metrics are counter-intuitive and fail to capture data inaccuracies inflicted for the sake of privacy. (ii) l-diversity is solved by techniques developed for the simpler k-anonymity problem, which introduces unnecessary inaccuracies. (iii) The anonymization process is inefficient in terms of computation and I/O cost. In this paper we propose a framework for efficient privacy preservation that addresses these deficiencies. First, we focus on one-dimensional (i.e., single attribute) quasi-identifiers, and study the properties of optimal solutions for k-anonymity and l-diversity, based on meaningful information loss metrics. Guided by these properties, we develop efficient heuristics to solve the one-dimensional problems in linear time. Finally, we generalize our solutions to multi-dimensional quasi-identifiers using space-mapping techniques. Extensive experimental evaluation shows that our techniques clearly outperform the state-of-the-art, in terms of execution time and information loss.</ABSTRACT>
			<PDF>vldb07.pdf</PDF>
			<BIB>conf/vldb/GhinitaKKM07</BIB>
			<PPT>vldb07.ppt</PPT>
	</INPROCEDINGS>

	<INPROCEDINGS>
			<AUTHOR>
				<FIRST>Gabriel</FIRST>
				<LAST>Ghinita</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Spiros</FIRST>
				<LAST>Skiadopoulos</LAST>
			</AUTHOR>
			<TITLE>MOBIHIDE: A Mobile Peer-to-Peer System for Anonymous Location-Based Queries</TITLE>
			<CONFERENCE>Proc. of the Int. Symposium on Spatial and Temporal Databases (SSTD)</CONFERENCE>
			<YEAR>2007</YEAR>
			<PLACE>Boston, MA</PLACE>
			<PAGES>221-238</PAGES>
			<RANK>2</RANK>
			<ABSTRACT>Modern mobile phones and PDAs are equipped with positioning capabilities (e.g., GPS). Users can access public location-based services (e.g., Google Maps) and ask spatial queries. Although communication is encrypted, privacy and confidentiality remain major concerns, since the queries may disclose the location and identity of the user. Commonly, spatial K-anonymity is employed to hide the query initiator among a group of K users. However, existing work either fails to guarantee privacy, or exhibits unacceptably long response time. In this paper we propose MobiHide, a Peer-to-Peer system for anonymous location-based queries, which addresses these problems. MobiHide employs the Hilbert space-filling curve to map the 2-D locations of mobile users to 1-D space. The transformed locations are indexed by a Chord-based distributed hash table, which is formed by the mobile devices. The resulting Peer-to-Peer system is used to anonymize a query by mapping it to a random group of K users that are consecutive in the 1-D space. Compared to existing state-of-the-art, MobiHide does not provide theoretical anonymity guarantees for skewed query distributions. Nevertheless, it achieves strong anonymity in practice, and it eliminates system hotspots. Our experimental evaluation shows that MobiHide has good load balancing and fault tolerance properties, and is applicable to real-life scenarios with numerous mobile users.</ABSTRACT>
			<PDF>sstd07.pdf</PDF>
			<PPT>sstd07.ppt</PPT>
			<BIB>conf/ssd/GhinitaKS07</BIB>
	</INPROCEDINGS>
	
	
	<INPROCEDINGS>
			<AUTHOR>
				<FIRST>Gabriel</FIRST>
				<LAST>Ghinita</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Spiros</FIRST>
				<LAST>Skiadopoulos</LAST>
			</AUTHOR>
			<TITLE>PRIVE: Anonymous Location-based Queries in Distributed Mobile Systems</TITLE>
			<CONFERENCE>Proc. of World Wide Web Conf. (WWW)</CONFERENCE>
			<YEAR>2007</YEAR>
			<PLACE>Banff, Canada</PLACE>
			<PAGES>371-380</PAGES>
			<RANK>1</RANK>
			<ABSTRACT>Nowadays, mobile users with positioning devices can access Location Based Services (LBS) and query about points of interest in their proximity. For such applications to succeed, privacy and confidentiality are essential. Encryption alone is not adequate; although it safeguards the system against eavesdroppers, the queries themselves may disclose the location and identity of the user. Recently, there have been proposed centralized architectures based on K-anonymity, which utilize an intermediate anonymizer between the mobile users and the LBS. However, the anonymizer must be updated continuously with the current locations of all users. Moreover, the complete knowledge of the entire system poses a security threat, if the anonymizer is compromised. In this paper we address two issues: (i) We show that existing approaches may fail to provide spatial anonymity for some distributions of user locations and describe a novel technique which solves this problem. (ii) We propose Prive, a decentralized architecture for preserving the anonymity of users issuing spatial queries to LBSs. Mobile users self-organize into an overlay network with good fault tolerance and load balancing properties. Prive avoids the bottleneck caused by centralized techniques both in terms of anonymization and location updates. Moreover, the status is distributed in numerous users, rendering the system resilient to attacks. Extensive experimental studies suggest that Prive is applicable to real-life scenarios with large populations of mobile users.</ABSTRACT>
			<PDF>WWW07.pdf</PDF>
			<BIB>conf/www/GhinitaKS07</BIB>
			<PPT>www07.ppt</PPT>
	</INPROCEDINGS>
	
	
<!--  ............................................................................  2006  ............................................................................ -->

	<INPROCEDINGS>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Nikos</FIRST>
				<LAST>Mamoulis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Spiridon</FIRST>
				<LAST>Bakiras</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Xiaochen</FIRST>
				<LAST>Li</LAST>
			</AUTHOR>
			<TITLE>Ad-hoc Distributed Spatial Joins on Mobile Devices</TITLE>
			<CONFERENCE>Proc. of the Int. Parallel and Distributed Processing Symposium (IPDPS)</CONFERENCE>
			<PLACE>Rhodes, Greece</PLACE>
			<YEAR>2006</YEAR>
			<PAGES>10 pages</PAGES>
			<RANK>2</RANK>
			<ABSTRACT>PDAs, cellular phones and other mobile devices are now capable of supporting complex data manipulation operations. Here, we focus on ad-hoc spatial joins of datasets residing in multiple non-cooperative servers. Assuming that there is no mediator available, the spatial joins must be evaluated on the mobile device. Contrary to common applications that consider the cost at the server side, our main issue is the minimization of the transferred data, while meeting the resource constraints of the device. We show that existing methods, based on partitioning and pruning, are inadequate in many realistic situations. Then, we present novel algorithms that estimate the data distribution before deciding the physical operator independently for each partition. Our experiments with a prototype implementation on a WiFi-enabled PDA suggest that the proposed methods outperform the competitors in terms of efficiency and applicability.</ABSTRACT>
			<PDF>ipdps06.pdf</PDF>
			<PPT>ipdps06.ppt</PPT>
			<BIB>conf/ipps/KalnisMBL06</BIB>
	</INPROCEDINGS>
	
	
<!--  ............................................................................  2005  ............................................................................ -->

	<INPROCEDINGS>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Nikos</FIRST>
				<LAST>Mamoulis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Spiridon</FIRST>
				<LAST>Bakiras</LAST>
			</AUTHOR>
			<TITLE>On Discovering Moving Clusters in Spatio-temporal Data</TITLE>
			<CONFERENCE>Proc. of the Int. Symposium in Spatial and Temporal Databases (SSTD)</CONFERENCE>
			<YEAR>2005</YEAR>
			<PLACE>Angra dos Reis, Brazil</PLACE>
			<PAGES>364-381</PAGES>
			<RANK>2</RANK>
			<ABSTRACT>A moving cluster is defined by a set of objects that move close to each other for a long time interval. Real-life examples are a group of migrating animals, a convoy of cars moving in a city, etc. We study the discovery of moving clusters in a database of object trajectories. The difference of this problem compared to clustering trajectories and mining movement patterns is that the identity of a moving cluster remains unchanged while its location and content may change over time. For example, while a group of animals are migrating, some animals may leave the group or new animals may enter it. We provide a formal definition for moving clusters and describe three algorithms for their automatic discovery: (i) a straight-forward method based on the definition, (ii) a more efficient method which avoids redundant checks and (iii) an approximate algorithm which trades accuracy for speed by borrowing ideas from the MPEG-2 video encoding. The experimental results demonstrate the efficiency of our techniques and their applicability to large spatio-temporal datasets.</ABSTRACT>
			<PDF>sstd05a.pdf</PDF>
			<PPT>sstd05a.ppt</PPT>
			<BIB>conf/ssd/KalnisMB05</BIB>
	</INPROCEDINGS>


<INPROCEDINGS>
			<AUTHOR>
				<FIRST>Nikos</FIRST>
				<LAST>Mamoulis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Spiridon</FIRST>
				<LAST>Bakiras</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<TITLE>Evaluation of Top-k OLAP Queries Using Aggregate R–trees</TITLE>
			<CONFERENCE>Proc. of the Int. Symposium in Spatial and Temporal Databases (SSTD)</CONFERENCE>
			<YEAR>2005</YEAR>
			<PLACE>Angra dos Reis, Brazil</PLACE>
			<PAGES>236-253</PAGES>
			<RANK>2</RANK>
			<ABSTRACT>A top-k OLAP query groups measures with respect to some abstraction level of interesting dimensions and selects the k groups with the highest aggregate value. An example of such a query is “find the 10 combinations of product-type and month with the largest sum of sales”. Such queries may also be applied in a spatial database context, where objects are augmented with some measures that must be aggregated according to a spatial division. For instance, consider a map of objects (e.g., restaurants), where each object carries some nonspatial measure (e.g., the number of customers served during the last month). Given a partitioning of the space into regions (e.g., by a regular grid), the goal is to find the regions with the highest number of served customers. A straightforward method to evaluate a top-k OLAP query is to compute the aggregate value for each group and then select the groups with the highest aggregates. In this paper, we study the integration of the top-k operator with the aggregate query processing module. For this, we make use of spatial indexes, augmented with aggregate information, like the aggregate R–tree. We devise a branch-and-bound algorithm that accesses a minimal number of tree nodes in order to compute the top-k groups. The efficiency of our approach is demonstrated by experimentation.</ABSTRACT>
			<PDF>sstd05b.pdf</PDF>
			<PPT>sstd05b.ppt</PPT>
			<BIB>conf/ssd/MamoulisBK05</BIB>
	</INPROCEDINGS>
	
	
		<INPROCEDINGS>
			<AUTHOR>
				<FIRST>Rui</FIRST>
				<LAST>Yang</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Anthony K. H.</FIRST>
				<LAST>Tung</LAST>
			</AUTHOR>
			<TITLE>Similarity Evaluation on Tree-structured Data</TITLE>
			<CONFERENCE>Proc. of ACM SIGMOD Conference</CONFERENCE>
			<YEAR>2005</YEAR>
			<PLACE>Baltimore, MD</PLACE>
			<PAGES>754-765</PAGES>
			<RANK>1</RANK>
			<ABSTRACT>Tree-structured data are becoming ubiquitous nowadays and manipulating them based on similarity is essential for many applications. The generally accepted similarity measure for trees is the edit distance. Although similarity search has been extensively studied, searching for similar trees is still an open problem due to the high complexity of computing the tree edit distance. In this paper, we propose to transform tree-structured data into an approximate numerical multidimensional vector which encodes the original structure information. We prove that the L1 distance of the corresponding vectors, whose computational complexity is O(|T1|+|T2|), forms a lower bound for the edit distance between trees. Based on the theoretical analysis, we describe a novel algorithm which embeds the proposed distance into a filter-and-refine framework to process similarity search on tree-structured data. The experimental results show that our algorithm reduces dramatically the distance computation cost. Our method is especially suitable for accelerating similarity query processing on large trees in massive datasets.</ABSTRACT>
			<PDF>sigmod05.pdf</PDF>
			<BIB>conf/sigmod/YangKT05</BIB>
	</INPROCEDINGS>
	
<INPROCEDINGS>
			<AUTHOR>
				<FIRST>Shen Tat</FIRST>
				<LAST>Goh</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Spiridon</FIRST>
				<LAST>Bakiras</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Kian Lee</FIRST>
				<LAST>Tan</LAST>
			</AUTHOR>
			<TITLE>Real Datasets for File-Sharing Peer-to-Peer Systems</TITLE>
			<CONFERENCE>Proc. of the Int. Conference on Database Systems for Advanced Applications (DASFAA)</CONFERENCE>
			<YEAR>2005</YEAR>
			<PLACE>Beijing, China</PLACE>
			<PAGES>201-213</PAGES>
			<RANK>2</RANK>
			<ABSTRACT>The fundamental drawback of unstructured peer-to-peer (P2P) networks is the flooding-based query processing protocol that seriously limits their scalability. As a result, a significant amount of research work has focused on designing efficient search protocols that reduce the overall communication cost. What is lacking, however, is the availability of real data, regarding the exact content of users' libraries and the queries that these users ask. Using trace-driven simulations will clearly generate more meaningful results and further illustrate the efficiency of a generic query processing protocol under a real-life scenario. Motivated by this fact, we developed a Gnutella-style probe and collected detailed data over a period of two months. They involve around 4,500 users and contain the exact files shared by each user, together with any available metadata (e.g., artist for songs) and information about the nodes (e.g., connection speed). We also collected the queries initiated by these users. After filtering, the data were organized in XML format and are available to researchers. Here, we analyze this dataset and present its statistical characteristics. Additionally, as a case study, we employ it to evaluate two recently proposed P2P searching techniques.</ABSTRACT>
			<PDF>dasfaa05.pdf</PDF>
			<BIB>conf/dasfaa/GohKBT05</BIB>
	</INPROCEDINGS>
	

<!--  ............................................................................  2004 ............................................................................ -->


<INPROCEDINGS>
			<AUTHOR>
				<FIRST>Qiang</FIRST>
				<LAST>Jing</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Rui</FIRST>
				<LAST>Yang</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Anthony K H</FIRST>
				<LAST>Tung</LAST>
			</AUTHOR>
			<TITLE>Localized Signature Table: Fast Similarity Search on Transaction Data</TITLE>
			<CONFERENCE>Proc. of the Int. Conference on Information and Knowledge Management (CIKM)</CONFERENCE>
			<YEAR>2004</YEAR>
			<PLACE>Washington, DC</PLACE>
			<PAGES>314-323</PAGES>
			<RANK>2</RANK>
			<ABSTRACT>Recently, techniques for supporting efficient similarity search over huge transaction datasets have emerged as an important research area. Several indexing schemes have been proposed towards this direction. Typically, these schemes provide a tradeoff between searching efficiency and indexing overhead in terms of space. In this paper, we propose a novel indexing scheme for similarity search on transaction data. Based on well-studied clustering techniques, we develop a construction algorithm for the proposed index and a branch-and-bound searching strategy for answering similarity search. Unlike previous techniques, our indexing scheme exhibits high search efficiency and low space requirements by trading-off the pre-computation time. This behavior is ideal for applications with low update but high read volume (e.g., data warehousing, collaborative filtering, etc.). Moreover, our experimental results illustrate that our method is robust to the varying characteristics of the datasets.</ABSTRACT>
			<PDF>cikm04.pdf</PDF>
			<BIB>conf/cikm/JingYKT04</BIB>
	</INPROCEDINGS>
	
	
		<INPROCEDINGS>
			<AUTHOR>
				<FIRST>Eric</FIRST>
				<LAST>Lo</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Nikos</FIRST>
				<LAST>Mamoulis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>David W.</FIRST>
				<LAST>Cheung</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Wai Shing</FIRST>
				<LAST>Ho</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<TITLE>Processing Ad-Hoc Joins on Mobile Devices</TITLE>
			<CONFERENCE>Proc. of the Int. Conference on Database and Expert Systems Applications (DEXA)</CONFERENCE>
			<YEAR>2004</YEAR>
			<PLACE>Zaragoza, Spain</PLACE>
			<PAGES>611-621</PAGES>
			<RANK>2</RANK>
			<ABSTRACT>Mobile devices are capable of retrieving and processing data from remote databases. In a wireless data transmission environment, users are typically charged by the size of transferred data, rather than the amount of time they stay connected. We propose algorithms that join information from non-collaborative remote databases on mobile devices. Our methods minimize the data transferred during the join process, by also considering the limitations of mobile devices. Experimental results show that our approach can perform join processing on mobile devices effectively.</ABSTRACT>
			<PDF>dexa04b.pdf</PDF>
			<BIB>conf/dexa/LoMCHK04</BIB>
	</INPROCEDINGS>


<INPROCEDINGS>
			<AUTHOR>
				<FIRST>Imthiyaz Kasim</FIRST>
				<LAST>Mohammed</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Dong</FIRST>
				<LAST>Xiaoan</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<TITLE>Efficient Processing of Distributed Iceberg Semi-joins</TITLE>
			<CONFERENCE>Proc. of the Int. Conference on Database and Expert Systems Applications (DEXA)</CONFERENCE>
			<YEAR>2004</YEAR>
			<PLACE>Zaragoza, Spain</PLACE>
			<PAGES>634-643</PAGES>
			<RANK>2</RANK>
			<ABSTRACT>The Iceberg SemiJoin (ISJ) of two datasets R and S returns the tuples in R which join with at least k tuples of S. The ISJ operator is essential in many practical applications including OLAP, Data Mining and Information Retrieval. In this paper we consider the distributed evaluation of Iceberg SemiJoins, where R and S reside on remote servers. We developed an efficient algorithm which employs Bloom filters. The novelty of our approach is that we interleave the evaluation of the Iceberg set in server S with the pruning of unmatched tuples in server R. Therefore, we are able to (i) eliminate unnecessary tuples early, and (ii) extract accurate Bloom filters from the intermediate hash tables which are constructed during the generation of the Iceberg set. Compared to conventional two-phase approaches, our experiments demonstrate that our method transmits up to 80% less data through the network, while reducing the disk I/O cost.</ABSTRACT>
			<PDF>dexa04a.pdf</PDF>
			<BIB>conf/dexa/ImthiyazXK04</BIB>
	</INPROCEDINGS>
	
	
	
	<INPROCEDINGS>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Wee Siong</FIRST>
				<LAST>Ng</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Beng Chin</FIRST>
				<LAST>Ooi</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Kian Lee</FIRST>
				<LAST>Tan</LAST>
			</AUTHOR>
			<TITLE>Answering Similarity Queries in Peer-to-Peer Networks</TITLE>
			<CONFERENCE>Poster Paper, World Wide Web (WWW) Conference</CONFERENCE>
			<YEAR>2004</YEAR>
			<PLACE>New York, NY</PLACE>
			<PAGES>482-483</PAGES>
			<RANK>3</RANK>
			<PDF>www04.pdf</PDF>
			<BIB>conf/www/KalnisNOT04</BIB>
	</INPROCEDINGS>
	
	<!--  ............................................................................  2003  ............................................................................ -->
	
	<INPROCEDINGS>
			<AUTHOR>
				<FIRST>Nikos</FIRST>
				<LAST>Mamoulis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Spiridon</FIRST>
				<LAST>Bakiras</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Xiaochen</FIRST>
				<LAST>Li</LAST>
			</AUTHOR>
			<TITLE>Optimization of Spatial Joins on Mobile Devices</TITLE>
			<CONFERENCE>Proc. of the Int. Symposium in Spatial and Temporal Databases (SSTD)</CONFERENCE>
			<YEAR>2003</YEAR>
			<PLACE>Santorini, Greece</PLACE>
			<PAGES>233-251</PAGES>
			<RANK>2</RANK>
			<ABSTRACT>Mobile devices like PDAs are capable of retrieving information from various types of services. In many cases, the user requests cannot be directly processed by the service providers, if their hosts have limited query capabilities or the query combines data from various sources, which do not collaborate with each other. In this paper, we present a framework for evaluating spatial join queries that belong to this class. We presume that the connection and queries are ad-hoc, there is no mediator available and the services are non-collaborative. We also assume that the services are not willing to share their statistics or indexes with the client. We retrieve statistics dynamically in order to generate a low-cost execution plan, while considering the storage and computational power limitations of the PDA. Since acquiring the statistics causes overhead, we describe an adaptive algorithm that optimizes the overall process of statistics retrieval and query execution. We demonstrate the applicability of our methods with a prototype implementation on a PDA with wireless network access.</ABSTRACT>
			<PDF>sstd03.pdf</PDF>
			<PPT>sstd03.ppt</PPT>
			<BIB>conf/ssd/MamoulisKBL03</BIB>
	</INPROCEDINGS>
	

	<INPROCEDINGS>
			<AUTHOR>
				<FIRST>Spiridon</FIRST>
				<LAST>Bakiras</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Thanasis</FIRST>
				<LAST>Loukopoulos</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Wee Siong</FIRST>
				<LAST>Ng</LAST>
			</AUTHOR>
			<TITLE>A General Framework for Searching in Distributed Data Repositories</TITLE>
			<CONFERENCE>Proc. of the Int. Parallel and Distributed Processing Symposium (IPDPS)</CONFERENCE>
			<YEAR>2003</YEAR>
			<PLACE>Nice, France</PLACE>
			<PAGES>1-10</PAGES>
			<RANK>2</RANK>
			<ABSTRACT>This paper proposes a general framework for searching large distributed repositories. Examples of such repositories include sites with music/video content, distributed digital libraries, distributed caching systems, etc. The framework is based on the concept of neighborhood; each client keeps a list of the most beneficial sites according to past experience, which are visited first when the client searches for some particular content. Exploration methods continuously update the neighborhoods in order to follow changes in access patterns. Depending on the application, several variations of search and exploration processes are proposed. Experimental evaluation demonstrates the benefits of the framework in different scenarios.</ABSTRACT>
			<PDF>ipdps03.pdf</PDF>
			<BIB>conf/ipps/BakirasKLN03</BIB>
	</INPROCEDINGS>
	
	
	
	<!--  ............................................................................  2002  ............................................................................ -->
	
	

<INPROCEDINGS>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Wee Siong</FIRST>
				<LAST>Ng</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Beng Chin</FIRST>
				<LAST>Ooi</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Dimitris</FIRST>
				<LAST>Papadias</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Kian Lee</FIRST>
				<LAST>Tan</LAST>
			</AUTHOR>
			<TITLE>An Adaptive Peer-to-Peer Network for Distributed Caching of OLAP Results</TITLE>
			<CONFERENCE>Proc. of ACM SIGMOD Conference</CONFERENCE>
			<YEAR>2002</YEAR>
			<PLACE>Madison, WI</PLACE>
			<PAGES>25-36</PAGES>
			<RANK>1</RANK>
			<ABSTRACT>Peer-to-Peer (P2P) systems are becoming increasingly popular as they enable users to exchange digital information by participating in complex networks. Such systems are inexpensive, easy to use, highly scalable and do not require central administration. Despite their advantages, however, limited work has been done on employing database systems on top of P2P networks. Here we propose the PeerOLAP architecture for supporting On-Line Analytical Processing queries. A large number of low-end clients, each containing a cache with the most useful results, are connected through an arbitrary P2P network. If a query cannot be answered locally (i.e. by using the cache contents of the computer where it is issued), it is propagated through the network until a peer that has cached the answer is found. An answer may also be constructed by partial results from many peers. Thus PeerOLAP acts as a large distributed cache, which amplifies the benefits of traditional client-side caching. The system is fully distributed and can reconfigure itself on-the-fly in order to decrease the query cost for the observed workload. This paper describes the core components of PeerOLAP and presents our results both from simulation and a prototype installation running on geographically remote peers.</ABSTRACT>
			<PDF>sigmod02.pdf</PDF>
			<BIB>conf/sigmod/KalnisNOPT02</BIB>
	</INPROCEDINGS>



<INPROCEDINGS>
			<AUTHOR>
				<FIRST>Dimitris</FIRST>
				<LAST>Papadias</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Yufei</FIRST>
				<LAST>Tao</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Jun</FIRST>
				<LAST>Zhang</LAST>
			</AUTHOR>
			<TITLE>Indexing Spatio-Temporal Data Warehouses</TITLE>
			<CONFERENCE>Proc. of the IEEE International Conference on Data Engineering (ICDE)</CONFERENCE>
			<YEAR>2002</YEAR>
			<PLACE>San Jose, CA</PLACE>
			<PAGES>166-175</PAGES>
			<RANK>1</RANK>
			<ABSTRACT>Spatio-temporal databases store information about the positions of individual objects over time. In many applications however, such as traffic supervision or mobile communication systems, only summarized data, like the average number of cars in an area for a specific period, or phones serviced by a cell each day, is required. Although this information can be obtained from operational databases, its computation is expensive, rendering online processing inapplicable. A vital solution is the construction of a spatiotemporal data warehouse. In this paper, we describe a framework for supporting OLAP operations over spatiotemporal data. We argue that the spatial and temporal dimensions should be modeled as a combined dimension on the data cube and present data structures, which integrate spatiotemporal indexing with pre-aggregation. While the well-known materialization techniques require a-priori knowledge of the grouping hierarchy, we develop methods that utilize the proposed structures for efficient execution of ad-hoc group-bys. Our techniques can be used for both static and dynamic dimensions.</ABSTRACT>
			<PDF>icde02.pdf</PDF>
			<PPT>icde02.ppt</PPT>
			<BIB>conf/icde/PapadiasTKZ02</BIB>
	</INPROCEDINGS>


	<!--  ............................................................................  2001  ............................................................................ -->

<INPROCEDINGS>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Dimitris</FIRST>
				<LAST>Papadias</LAST>
			</AUTHOR>
			<TITLE>Proxy-Server Architectures for OLAP</TITLE>
			<CONFERENCE>Proc. of ACM SIGMOD Conference</CONFERENCE>
			<YEAR>2001</YEAR>
			<PLACE>Santa Barbara, CA</PLACE>
			<PAGES>367-378</PAGES>
			<RANK>1</RANK>
			<ABSTRACT>Data warehouses have been successfully employed for assisting decision making by offering a global view of the enterprise data and providing mechanisms for On-Line Analytical Processing. Traditionally, data warehouses are utilized within the limits of an enterprise or organization. The growth of Internet and WWW however, has created new opportunities for data sharing among ad-hoc, geographically spanned and possibly mobile users. Since it is impractical for each enterprise to set up a worldwide infrastructure, currently such applications are handled by the central warehouse. This often yields poor performance, due to overloading of the central server and low transfer rate of the network. In this paper we propose an architecture for OLAP cache servers (OCS). An OCS is the equivalent of a proxy-server for web documents, but it is designed to accommodate data from warehouses and support OLAP operations. We allow numerous OCSs to be connected via an arbitrary network, and present a centralized, a semi-centralized and an autonomous control policy. We experimentally evaluate these policies and compare the performance gain against the existing systems where caching is performed only at the client side. Our architecture offers increased autonomy at remote clients, substantial network traffic savings, better scalability, lower response time and is complementary both to existing OLAP cache systems and distributed OLAP approaches.</ABSTRACT>
			<PDF>sigmod01.pdf</PDF>
			<PPT>sigmod01.ppt</PPT>
			<BIB>conf/sigmod/KalnisP01</BIB>
	</INPROCEDINGS>
	
	<INPROCEDINGS>
			<AUTHOR>
				<FIRST>Dimitris</FIRST>
				<LAST>Papadias</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Jun</FIRST>
				<LAST>Zhang</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Yufei</FIRST>
				<LAST>Tao</LAST>
			</AUTHOR>
			<TITLE>Efficient OLAP Operations in Spatial Data Warehouses</TITLE>
			<CONFERENCE>Proc. of the Int. Symposium in Spatial and Temporal Databases (SSTD)</CONFERENCE>
			<YEAR>2001</YEAR>
			<PLACE>Redondo Beach, CA</PLACE>
			<PAGES>443-459</PAGES>
			<RANK>2</RANK>
			<ABSTRACT>Spatial databases store information about the position of individual objects in space. In many applications however, such as traffic supervision or mobile communications, only summarized data, like the number of cars in an area or phones serviced by a cell, is required. Although this information can be obtained from transactional spatial databases, its computation is expensive, rendering online processing inapplicable. Driven by the non-spatial paradigm, spatial data warehouses can be constructed to accelerate spatial OLAP operations. In this paper we consider the star-schema and we focus on the spatial dimensions. Unlike the non-spatial case, the groupings and the hierarchies can be numerous and unknown at design time, therefore the well-known materialization techniques are not directly applicable. In order to address this problem, we construct an ad-hoc grouping hierarchy based on the spatial index at the finest spatial granularity. We incorporate this hierarchy in the lattice model and present efficient methods to process arbitrary aggregations. We finally extend our technique to moving objects by employing incremental update methods.</ABSTRACT>
			<PDF>sstd01.pdf</PDF>
			<PPT>sstd01.ppt</PPT>
			<BIB>conf/ssd/PapadiasKZT01</BIB>
	</INPROCEDINGS>
		

		
		<INPROCEDINGS>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Dimitris</FIRST>
				<LAST>Papadias</LAST>
			</AUTHOR>
			<TITLE>Optimization Algorithms for Simultaneous Multidimensional Queries in OLAP Environments</TITLE>
			<CONFERENCE>Proc. of the Int. Conference on Data Warehousing and Knowledge Discovery (DaWaK)</CONFERENCE>
			<YEAR>2001</YEAR>
			<PLACE>Munich, Germany</PLACE>
			<PAGES>264-273</PAGES>
			<RANK>3</RANK>
			<ABSTRACT>Multi-Dimensional Expressions (MDX) provide an interface for asking several related OLAP queries simultaneously. An interesting problem is how to optimize the execution of an MDX query, given that most data warehouses maintain a set of redundant materialized views to accelerate OLAP operations. A number of greedy and approximation algorithms have been proposed for different versions of the problem. In this paper we evaluate experimentally their performance using the APB and TPC-H benchmarks, concluding that they do not scale well for realistic workloads. Motivated by this fact, we developed two novel greedy algorithms. Our algorithms construct the execution plan in a top-down manner by identifying in each step the most beneficial view, instead of finding the most promising query. We show by extensive experimentation that our methods outperform the existing ones in most cases.</ABSTRACT>
			<PDF>dawak01.pdf</PDF>
			<BIB>conf/dawak/KalnisP01</BIB>
	</INPROCEDINGS>
	
	
	<INPROCEDINGS>
			<AUTHOR>
				<FIRST>Thanasis</FIRST>
				<LAST>Loukopoulos</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Ishfaq</FIRST>
				<LAST>Ahmad</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Dimitris</FIRST>
				<LAST>Papadias</LAST>
			</AUTHOR>
			<TITLE>Active Caching of On Line Analytical Processing Queries in WWW Proxies</TITLE>
			<CONFERENCE>Proc. of the Int. Conference on Parallel Processing (ICPP)</CONFERENCE>
			<YEAR>2001</YEAR>
			<PLACE>Valencia, Spain</PLACE>
			<PAGES>419-426</PAGES>
			<RANK>2</RANK>
			<ABSTRACT>The Internet is offering more than just regular Web pages to the users. Decision makers can now issue analytical, as opposed to transactional, queries that involve massive data (such as, aggregations of millions of rows in a relational database) in order to identify useful trends and patterns. Such queries are referred to as On-Line-Analytical-Processing (OLAP) queries. Typically, pages carrying query results do not exhibit temporal locality and, therefore, are not considered for caching at WWW proxies. In OLAP processing, this becomes a major hurdle as the cost of such queries is much higher than traditional transactional queries. This paper proposes a systematic technique to reduce the response time for OLAP queries originating from geographically distributed private LANs and issued through the Web towards the central data warehouse (DW) of an enterprise. An active caching scheme is proposed that enables the LAN proxies to cache some parts of the data, together with the semantics of the DW, in order to process queries and construct the resulting pages. OLAP queries arriving at the proxy are either satisfied locally or from the DW, depending on the relative access costs. We formulate a cost model for characterizing the latencies of these queries, taking into consideration normal Web access as well as analytical processing. We propose a cache admittance and replacement algorithm that outperforms a widely accepted caching algorithm.</ABSTRACT>
			<PDF>icpp01.pdf</PDF>
			<BIB>conf/icpp/LoukopoulosKAP01</BIB>
			<SPECIAL>Best paper award.</SPECIAL>
	</INPROCEDINGS>
	
	
	
	
	<!--  ............................................................................  1999  ............................................................................ -->
	
	
<INPROCEDINGS>
			<AUTHOR>
				<FIRST>Dimitris</FIRST>
				<LAST>Papadias</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Marios</FIRST>
				<LAST>Mantzouroyannis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Nikos</FIRST>
				<LAST>Mamoulis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Ishfaq</FIRST>
				<LAST>Ahmad</LAST>
			</AUTHOR>
			<TITLE>Content-Based Retrieval Using Heuristic Search</TITLE>
			<CONFERENCE>Proc. of the ACM Conference on Information Retrieval (SIGIR)</CONFERENCE>
			<YEAR>1999</YEAR>
			<PLACE>Berkeley, CA</PLACE>
			<PAGES>168-175</PAGES>
			<RANK>1</RANK>
			<ABSTRACT>The fast growth of multimedia information in image and video databases has triggered research on efficient retrieval methods. This paper deals with structural queries, a type of content-based retrieval where similarity is not defined on visual properties such as color and texture, but on object relations in space. We propose the application of heuristic algorithms which provide good, but not necessarily optimal, solutions in a pre-determined time period, and compare our approach with systematic search methods which are guaranteed to find optimal solutions but require exponential time in the worst case. The quality of the output is calculated using a relation framework which is an extension of Allen’s relations. With this framework our methods can be applied in multiple resolutions and dimensions, thus covering a wide range of applications in spatial, multimedia and video systems.</ABSTRACT>
			<PDF>sigir99.pdf</PDF>
			<BIB>conf/sigir/PapadiasMKMA99</BIB>
	</INPROCEDINGS>

	
<INPROCEDINGS>
			<AUTHOR>
				<FIRST>Dimitris</FIRST>
				<LAST>Papadias</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Nikos</FIRST>
				<LAST>Mamoulis</LAST>
			</AUTHOR>
			<TITLE>Hierarchical Constraint Satisfaction in Spatial Databases</TITLE>
			<CONFERENCE>Proc. of AAAI Conference</CONFERENCE>
			<YEAR>1999</YEAR>
			<PLACE>Orlando, FL</PLACE>
			<PAGES>142-147</PAGES>
			<RANK>1</RANK>
			<ABSTRACT>Several content-based queries in spatial databases and geographic information systems (GISs) can be modelled and processed as constraint satisfaction problems (CSPs). Regular CSP algorithms, however, work for main memory retrieval without utilizing indices to prune the search space. This paper shows how systematic and local search techniques can take advantage of the hierarchical decomposition of space, preserved by spatial data structures, to efficiently guide search. We study the conditions under which hierarchical constraint satisfaction outperforms traditional methods with extensive experimentation.</ABSTRACT>
			<PDF>aaai99.pdf</PDF>
			<BIB>conf/aaai/PapadiasKM99</BIB>
	</INPROCEDINGS>	
	
	
	<!--  ............................................................................  1997  ............................................................................ -->
	
<INPROCEDINGS>
			<AUTHOR>
				<FIRST>M</FIRST>
				<LAST>Perakis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>K</FIRST>
				<LAST>Platis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>K</FIRST>
				<LAST>Adaos</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>D</FIRST>
				<LAST>Nikolos</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>G</FIRST>
				<LAST>Alexiou</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<TITLE>FPGA Implementation of a sub SDH STM/1 Framer - Deframer Unit</TITLE>
			<CONFERENCE>Proc. of EMAC</CONFERENCE>
			<YEAR>1997</YEAR>
			<PLACE>Barcelona</PLACE>
			<RANK>3</RANK>
	</INPROCEDINGS>
	

	
<INPROCEDINGS>
			<AUTHOR>
				<FIRST>Panos</FIRST>
				<LAST>Kalnis</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Kostas</FIRST>
				<LAST>Sfiris</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Georgios</FIRST>
				<LAST>Alexiou</LAST>
			</AUTHOR>
			<AUTHOR>
				<FIRST>Dimitris</FIRST>
				<LAST>Nikolos</LAST>
			</AUTHOR>
			<TITLE>From FPGAs to Standard Cell Based VLSI Chips</TITLE>
			<CONFERENCE>Proc. of ED&amp;TC, User Forum</CONFERENCE>
			<YEAR>1997</YEAR>
			<PLACE>Paris</PLACE>
			<RANK>4</RANK>
	</INPROCEDINGS>


	
	
</PUBLICATIONS>
