The rise of e-commerce and social networking has spawned many Internet-scale service providers. Several have phenomenal growth rates. These providers and the wannabes that they inspire must ensure that their systems are scalable. Engineering for rapid growth in this highly competitive setting requires intensive testing with scaled-up datasets. This motivates the Dataset Scaling Problem (DSP):
The Dataset Scaling Problem
Given a set of relational tables D and a scale factor s, generate a database state D' that is similar to D but s times its size.
UpSizeR, a tool for solving DSP, does this by extracting inter-column and inter-row information from D. Although UpSizeR was conceived for scalability testing (s > 1), an enterprise can also use it to make a synthetic copy (s = 1) of its proprietary dataset for a vendor, or to scale down a production dataset (s < 1) for non-production testing. We hope that UpSizeR will eventually work for any relational database. However, given the diversity of applications, the complexities of real data and the pressing need for a scaling tool, developing UpSizeR requires a community effort. We are therefore releasing UpSizeR to encourage open-source collaboration on its development.
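To make the problem statement concrete, here is a minimal Python sketch of one naive way to scale a two-table database while keeping one kind of inter-row information: the distribution of child rows per parent row. This is not UpSizeR's algorithm; the function name `scale_parent_child`, the two-table layout, and the resampling strategy are all assumptions made for illustration only.

```python
import random
from collections import Counter


def scale_parent_child(parent_ids, child_rows, s, seed=0):
    """Naively scale a hypothetical two-table database by a factor s.

    parent_ids: list of primary keys of the parent table.
    child_rows: list of (child_id, parent_id) tuples, where parent_id
                is a foreign key into the parent table.
    Returns (new_parent_ids, new_child_rows) with fresh integer keys.
    """
    rng = random.Random(seed)

    # Empirical number of children per parent, including parents with none.
    counts = Counter(pid for _, pid in child_rows)
    degrees = [counts.get(pid, 0) for pid in parent_ids]

    # The parent table is scaled exactly; the child table only in expectation.
    n_parents = round(s * len(parent_ids))
    new_parent_ids = list(range(n_parents))

    new_child_rows = []
    for pid in new_parent_ids:
        # Resample a degree from D so that the children-per-parent
        # distribution of D' resembles that of D.
        for _ in range(rng.choice(degrees)):
            new_child_rows.append((len(new_child_rows), pid))
    return new_parent_ids, new_child_rows


# Illustrative data: 4 parents, 6 children, scaled up by s = 3.
parents = [10, 11, 12, 13]
children = [(1, 10), (2, 10), (3, 11), (4, 13), (5, 13), (6, 13)]
new_parents, new_children = scale_parent_child(parents, children, s=3)
```

In this sketch every foreign key in the generated child table refers to a generated parent, so referential integrity holds by construction. It ignores inter-column correlations and non-key attribute values entirely, which is part of why a realistic scaling tool needs considerably more machinery.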
Y.C. Tay. Data Generation for Application-Specific Benchmarking. Proc. VLDB 2011 (Challenges and Visions Track), 1470-1473 (Aug. 2011).
Y.C. Tay, Bing Tian Dai, Daniel T. Wang, Eldora Y. Sun, Yong Lin and Yuting Lin. UpSizeR: Synthetically Scaling an Empirical Relational Database. Information Systems 38, 8 (Nov. 2013), 1168-1183.
J.W. Zhang, A. Mal and Y.C. Tay. GscalerCloud: A Web-Based Graph Scaling Service. Proc. 33rd IEEE Int. Conf. Data Engineering (ICDE), San Diego, USA (Apr. 2017), 1391-1392.