BigLog: A Flexible Query Language

BigLog Overview

BigLog is an extension of DataLog that supports processing a mixture of structured and unstructured data. It is a declarative logic programming language for expressing complex queries in epiC.

Overview

Many analytics tasks follow an extraction-like pattern when processing multi-structured datasets:

  • Extract structures from unstructured data
  • Perform analytics on the extracted data together with the structured data

Example: what is the conversion rate (R) for each region? R = # of payment sessions / # of total sessions

  • From web logs, extract (Sid, Uid, region) from cookies
  • From DBMS, extract (Sid, Uid, payment)
  • Join, Count
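The three steps above can be sketched in plain Java, without any BigLog API. This is only an illustration: the `WebSession` and `DbSession` classes, the choice to join on Sid alone, and the choice to take total sessions from the web logs are all assumptions, not part of epiC or BigLog.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ConversionRate {
    // Hypothetical row shapes; the field names follow the example above.
    static final class WebSession {
        final String sid, uid, region;
        WebSession(String sid, String uid, String region) {
            this.sid = sid; this.uid = uid; this.region = region;
        }
    }

    static final class DbSession {
        final String sid, uid;
        final boolean payment;
        DbSession(String sid, String uid, boolean payment) {
            this.sid = sid; this.uid = uid; this.payment = payment;
        }
    }

    // Join on Sid (an assumption), then count per region.
    static Map<String, Double> conversionRate(List<WebSession> web, List<DbSession> db) {
        // Index the DBMS rows by session id; a session counts as paid
        // if any of its rows has payment == true.
        Map<String, Boolean> paid = new HashMap<>();
        for (DbSession d : db) {
            paid.merge(d.sid, d.payment, Boolean::logicalOr);
        }

        // region -> {payment sessions, total sessions}
        Map<String, int[]> counts = new TreeMap<>();
        for (WebSession w : web) {
            int[] c = counts.computeIfAbsent(w.region, r -> new int[2]);
            c[1]++;
            if (paid.getOrDefault(w.sid, false)) {
                c[0]++;
            }
        }

        // R = # of payment sessions / # of total sessions
        Map<String, Double> rate = new TreeMap<>();
        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            rate.put(e.getKey(), (double) e.getValue()[0] / e.getValue()[1]);
        }
        return rate;
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        List<WebSession> web = new ArrayList<>();
        web.add(new WebSession("s1", "u1", "SG"));
        web.add(new WebSession("s2", "u2", "SG"));
        web.add(new WebSession("s3", "u3", "US"));

        List<DbSession> db = new ArrayList<>();
        db.add(new DbSession("s1", "u1", true));
        db.add(new DbSession("s2", "u2", false));

        System.out.println(conversionRate(web, db));
    }
}
```

Sessions that appear in the web logs but have no DBMS row are counted as non-payment sessions here; a real job might treat them differently.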

Features

We designed a Java library that provides a DataLog-like interface and can be plugged flexibly into epiC jobs. In addition, we developed a declarative query interface to support standalone applications.

The examples below compare BigLog with SQL case by case. We start with a simple SELECT-FROM-WHERE expression.

    SQL:
         SELECT b, c FROM T WHERE T.a = 10;
    
    BigLog:
         Query("b", "c")
         .pred(T, val(10), var("b"), var("c"))
         .writeTo(System.out);
             

In BigLog, a query consists of a set of predicates.

  • Each predicate is a constraint on a table
  • The fields in a record are indexed by position
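To make the positional reading concrete, here is a toy evaluator for a single predicate, written in plain Java. It is a sketch under assumptions, not BigLog's implementation: the `Term` class, its `val` and `var` helpers, and the `match` method are hypothetical stand-ins that only mimic the names used in the query above, and field values are assumed to be non-null.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PositionalPredicate {
    // Hypothetical term type: either a constant (val) or a named variable (var).
    static final class Term {
        final String name;   // non-null for variables
        final Object value;  // non-null for constants
        private Term(String name, Object value) {
            this.name = name; this.value = value;
        }
        static Term val(Object v) { return new Term(null, v); }
        static Term var(String n) { return new Term(n, null); }
        boolean isVar() { return name != null; }
    }

    // The i-th term constrains the i-th field of each row.
    // Returns one variable binding per matching row.
    static List<Map<String, Object>> match(List<Object[]> table, Term... terms) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Object[] row : table) {
            Map<String, Object> binding = new LinkedHashMap<>();
            boolean ok = row.length == terms.length;
            for (int i = 0; ok && i < terms.length; i++) {
                Term t = terms[i];
                if (t.isVar()) {
                    // A variable binds to the field; a repeated variable must agree.
                    Object prev = binding.putIfAbsent(t.name, row[i]);
                    ok = prev == null || prev.equals(row[i]);
                } else {
                    // A constant must equal the field at this position.
                    ok = t.value.equals(row[i]);
                }
            }
            if (ok) {
                out.add(binding);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up table T(a, b, c), for illustration only.
        List<Object[]> t = new ArrayList<>();
        t.add(new Object[] {10, "x", 1});
        t.add(new Object[] {20, "y", 2});
        t.add(new Object[] {10, "z", 3});

        // Mirrors pred(T, val(10), var("b"), var("c")) from the query above.
        for (Map<String, Object> b : match(t, Term.val(10), Term.var("b"), Term.var("c"))) {
            System.out.println(b);
        }
    }
}
```

In this reading, `val(10)` in the first position plays the role of `WHERE T.a = 10`, and the variables in the second and third positions name the fields that the query projects.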

Next, we show cases that illustrate how BigLog extends DataLog.

Aggregates

    SQL:
         SELECT a, AVG(b) FROM T WHERE c < 10 GROUP BY a;
         
    BigLog:
         Query("a", "avg_b")
         .pred(T, var("a"), var("b"), less(10))
         .pred(avg("b"), var("avg_b"))
         .writeTo(System.out);
              

Joins

    SQL:
         SELECT a, b FROM R, S WHERE R.c = S.c;
             
    BigLog:
         Query("a", "b")
         .pred(R, var("a"), var("b"), var("c"))
         .pred(S, var("c"))
         .writeTo(System.out);
              

User Defined Data Operators

    Query("entry")
    .pred(Dir("/usr/log/*.log"), var("f"))
    .pred(Split("f"), var("file"), var("off"), var("len"))
    .pred(ParTable("file", "off", "len", Grep), var("entry"))
    .writeTo(System.out);