Scalable Distributed Stream Join Processing

Efficient and scalable stream joins play an important role in performing real-time analytics for many cloud applications. However, like in conventional database processing, online theta-joins over data streams are computationally expensive and moreover, being memory-based processing, they impose high memory requirement on the system. In this paper, we propose a novel stream join model, called join-biclique, which organizes a large cluster as a complete bipartite graph. Join-biclique has several strengths over state-of-the-art techniques, including memory-efficiency, elasticity and scalability. These features are essential for building efficient and scalable streaming systems. Based on join-biclique, we develop a scalable distributed stream join system, BiStream, over a large-scale commodity cluster. Specifically, BiStream is designed to support efficient full-history joins, window-based joins and online data aggregation. BiStream also supports adaptive resource management to dynamically scale out and down the system according to its application workloads.

Join-Biclique Model: Given a cluster with m + n processing units, join-biclique organizes all of these units as a complete bipartite graph (a.k.a. biclique). Suppose there are two relations R and S. Each side of the bipartite graph corresponds to one relation for storage. Specifically, data of relation R are partitioned and stored into one side of the bipartite graph with m units without replicas. Similarly, data of relation S is partitioned into the other side with n units. Each edge in the join-biclique represents the join operation between two units of the opposite relations.

BiStream System: BiStream is built on top of Apache Storm, and it consists of two functional components: router and joiner. The router is designed to ingest and route the input streams to the corresponding units for further processing, and the joiner is in charge of data storing and join processing for the data received from the router. The BiStream topology follows the join-biclique model but reduces the degree of inter-unit connectivity. Each processing unit of the joiner only communicates with the router to receive the data to store or join without communicating among each other.

Dataflow Control: To facilitate load balancing, BiStream dataflow is designed as a two-stage process:
- (1) From the shuffler to the dispatcher, a random routing strategy is adopted where tuples emitted by the shuffler are randomly distributed across dispatcher tasks such that each dispatcher task receives an equal number of tuples. Being content-insensitive, the random routing strategy should be effective in automatically balancing the workload among the dispatcher tasks.
- (2) From the dispatcher to the joiner, two streams are involved. The store stream is for routing each tuple to a unit for storing, and the join stream is for sending the tuples to the proper units for join processing. Different routing strategies in these two streams can be implemented for different joins based on join selectivity.

Window-Based Stream Joins. A window-based stream join evaluates the join condition only for tuples within the designated window. The most widely used window-based stream join is to apply the time-based sliding window. Indexes are essential to support efficient stream join processing upon the maintenance of a sliding window. To facilitate low-overhead data discarding and index exploration, BiStream adopts the chained in-memory index. The idea is to partition the streaming tuples with respect to the discrete time intervals and construct a subindex per interval. All subindexes are chained as a linked list by the order of their construction time. According to the window constraint, expired data are discarded from memory in the granularity of subindex-level rather than tuple-level. This reduces the overhead of data discarding since the valid subindexes are not affected when the obsolete subindexes are discarded.

Supporting Other Operators: Centering on stream join processing, BiStream also supports other stream operators in a pipelined manner.
- Selection and Projection: Duplicate-preserving selection and projection are conducted by filtering the streams in a tuple-at-a-time manner without using any extra memory for storing the intermediate state. Duplicate-eliminating selection and projection are supported by integrating the implementation of existing duplicate-eliminating into either the router or the joiner to enable duplicate elimination.
- Online Data Aggregation: BiStream adopts a two-phase approach to support online data aggregation over the stream join outputs. The first phase is to pre-aggregate a partial local view in each unit in the joiner. As each unit contains part of the join outputs independently, the local view computation can be directly performed. The second phase is to shuffle the local views to a coordinating component and merge them into a global view from time to time. To avoid high communication and computation overhead incurred by frequent view updates, BiStream computes the updated views by batching the updated tuples based on a time period.