My research is in developing database systems and technologies for managing big data in Data Science applications. Currently, I am working on several projects.
MAGE: Dynamic Graph Analytics On GPUs
In many real-world graph applications, data are frequently updated. For example, online social networks continuously evolve as new social interactions occur. To enable online analytics for data streaming applications, we are motivated to propose a dynamic graph analytic system on GPUs, which simultaneously processes a stream of graph update operations and analytical queries on the dynamically maintained matrix representation of the graph. The high update rate brought about numerous questions and challenges:
· How can we efficiently perform updates against the matrix stored on GPUs? Due to the Single Instruction Multiple Thread (SIMT) architecture of GPU devices, there are two keys in optimizing GPU-resident applications: 1) minimizing thread divergence; 2) maximizing coalesced memory access. It is non-trivial to design an update-efficient storage scheme on GPU, which also supports blazing fast computation. On one hand, rebuilding the entire storage structure against each update is both time and space inefficient. On the other hand, CPU-styled updates (e.g. insertion/deletion) of SM entries tend to scatter data in the GPU memory which leads to uncoalesced memory access and increases of the level of thread divergence for the computation.
· How to expand existing SM primitives to support a wider range of applications? Existing matrix primitives, e.g. SM vector multiplication, operate on the entire matrix. This is too rigid for many applications. For instance, in the case of community tracking on dynamic networks, it is rather inefficient to query the adjacency matrix as a whole since the community is only a subgraph of the entire social network. In another example, it is often of interest to monitor and analyze certain consumers' purchase history (e.g. Singaporean) from a large matrix representing all consumers' purchase history (e.g. each row of the matrix represents a consumer and each row entry denotes which item was bought by the consumer). Again, running the analysis on the entire matrix is undesirable as only a subset of the rows is required.
· How to optimize dynamic graph analytics on GPUs? Note that it does not suffice to optimize each matrix primitive on GPUs, as many analytic queries are submitted simultaneously on the dynamically updated matrix in various application scenarios. Analysts may track different communities in a dynamic social network. Likewise, marketers are keen to analyse real-time sales records of different consumer groups to deliver targeted advertising instantly. To fully exploit the computational capabilities of GPUs, it calls for novel strategies of scheduling online graph analytics processing on GPUs. In addition, to facilitate applications in larger scales, there is a need to support optimizations on multiple GPUs and CPU-GPU heterogeneous computing environments.
· How to optimize graph processing algorithms for dynamic graphs? Existing analytics algorithms like pagerank and subgraph matching have to be redesigned to fully exploit a CPU-GPU heterogeneous computing environment.
To address the aforementioned research challenges, this project will focus on developing novel storage structures and algorithmic optimization for supporting dynamic SM analytics on GPU-assisted platforms.
· Kian-Lee Tan, NUS
· Wentian Guo, NUS
· Mo Sha, NUS
· Yucheng Li, SMU
· Parallel Personalized Pagerank on Dynamic Graphs W. Guo, Y. Li, M. Sha, K.L. Tan, PVLDB , Vol. 11, No. 1, 2017, pp. 93-106.
· Accelerating Dynamic Graph Analytics on GPUs, M. Sha, Y. Li, B. He, K.L. Tan, PVLDB, Vol. 11, No. 1, 2017, pp. 107-120.
SONAR: Social Network Analytics
Today’s social networking platforms (SNPs) offers a rich amount of information. This include not only users profiles but also the social relationships between users and the activities on SNPs. With such information, it is possible to extract user preferences and understand what/how information are shared between friends or followers. We have been investigating innovative solutions for social media marketing (SMM) in SNPs, including influence maximization, and different channels for SMM such as directed target advertising (i.e., advertisements are directly posted to match users' preferences), celebrity social advertising (i.e, advertisements are endorsed by celebrities) and self influential advertising (i.e., advertisements are propagated to wider audiences through the advertiser's friends or followers). Our recent work have considered a variety of other critical factors that lead to a variety of important variants of the problem:
· Stream Influence Maximization. Social inﬂuence is highly dynamic and the rate at which information is propagated between users can be altered drastically by breaking news and trending topics. This means the influencers are also dynamically changing. How to maintain a changing set of influencers continuously is a challenge.
· Topic-aware Influence Maximization. Users in a social network may have various interests (which can be represented by topics), and the inﬂuence strength between two users (i.e., edge) is often topic-dependent. For example, a user may be inﬂuenced by her friend in some topics (e.g., sports), while remaining neutral/unaffected in others (e.g., politics). It is therefore important to design solutions that are topic-aware, i.e., considers user-to-user inﬂuence strength (edges) is topic-dependent.
· Location-aware Influence Maximization. While two users may be socially close in the social network, they may not be physically near. As such, it does not make sense to send an advertisement that is in Singapore to a friend that is in US. Finding the right set of users who can influence users that are spatially and socially close is the right way to go.
Our research seeks to develop approximate schemes that are not only efficient for real-time applications, but also offer theoretical guarantees on performance.
· Kian-Lee Tan, NUS
· Yanhao Wang, NUS
· Yucheng Li, SMU
· Influence Maximization on Social Graphs: An Survey Y. Li, J. Fan, Y. Wang, K.L. Tan, IEEE Transactions on Knowledge and Data Engineering, July 2018, IEEE CS, accepted for publication.
· A Sliding-Window Framework for Representative Subset Selection (Poster Paper) Y. Wang, Y. Li, K.L. Tan, Proceedings of the 34th International Conference on Data Engineering (ICDE'18), CNAM, Paris, April 16-19, 2018.
· OCTOPUS: An Online Topic-Aware Influence Analysis System for Social Networks (Demo) J. Fan, J. Qiu, Y. Li, Q. Meng, D. Zhang, G. Li, K.L. Tan, X. Du, Proceedings of the 34th International Conference on Data Engineering (ICDE'18), CNAM, Paris, April 16-19, 2018.
· Parallel Discovering Your Selling Points: Personalized Social Influential Tag Exploration Y. Li, J. Fan, D. Zhang, K.L. Tan, 2017 International Conference on Management of Data (SIGMOD'2017), May 14 - May 19, 2017, Chicago, IL, USA, pp. 619-634.
· Real-Time Influence Maximization on Dynamic Social Streams Y. Wang, Q. Fan, Y. Li, Kian-Lee Tan, PVLDB (VLDB'2017), Vol. 10, No. 7, March 2017, pp. 805-816.
· Context-Aware Advertisement Recommendation for High-Speed Social News Feeding Y. Li, D. Zhang, Z. Lan, K.L. Tan, Proceedings of the 32th International Conference on Data Engineering (ICDE'16), Helsinki, Finland, May 16-20, 2016, pp. 505-516.
· Realtime Targeted Influence Maximization for Online Advertisements Y. Li, D. Zhang, K.L. Tan, PVLDB (VLDB'2015), Vol. 8, No. 10, 2015, pp. 1070-1081.
· Online Topic-Aware Influence Maximization S. Chen, J. Fan, G. Li, J. Feng, K.L. Tan, J. Tang, PVLDB (VLDB'2015), Vol. 8, No. 6, 2015, pp. 666-677.
TARDIS: Higher Order Analytics in Trajectory Data Management
The TARDIS can travel back in time to answer historical queries as well as offer insights about the future. We are finalizing our research project in this. Come back for more details in March 2018