XDuce to ML embedding

 

This sub-project deals with the problem of integrating XDuce into ML.

 

Highlights

o A source-to-source translation

o Type-safe embedding of XDuce into ML

o Semantics of the XDuce program is preserved by the translation

o An easy extension of XDuce with ML function call and ML pattern matching.

 

Preliminary Benchmarks

The XDuce to ML embedding prototype programs are written in Haskell via manual translation. The reason of using Haskell is that we find that laziness may help in improving runtime performance.

o  The addressbook program:

o      All programs are written based on the XDuce example "addrbook". Source codes are downloadable here.

o      We are measuring the transformation time. The results show that we are pretty close to the CDuce implementation.

o      The Benchmarks are performed on a Linux workstation (with 2.4Ghz P-IV and 1GB Ram).

o      Time is measured in milliseconds. To exclude the parsing time, we need to set two measuring points for each program.

o      XDuce's transformation time is obtained by calculating the difference between two programs. The first program parses the input and transforms it. The second one just parses the input.

o      CDuce's and our prototype's transformation times are obtained by measuring the difference between the CPU time at the two program locations. The first location is placed at the point right after the parsing is over. The second location is placed at the point right after the transformation is over.

o      The benchmark for CDuce is obtained via the compilation with Ocaml interface support, as suggested by the authors of CDuce. The benchmarks of CDuce and our prototype are subject to runtime parameters, i.e. OCAMLRUNPARAM for ocaml and +RTS for Haskell.  For simplicity, we use DEFAULT runtime settings.

o      To enforce strictness in Haskell code, we employ the DeepSeq module.

 

XDuce 0.4.0
(source)

CDuce 0.2.0
(source)

XDuce to ML embedding
(source, translated)

Tiny Size Input

 

 

 

100 entries

60

1.8

1.0

500 entries

2560

2.8

2.0

900 entries

9120

4.0

4.0

Medium Size Input

 

 

 

1k entries

11770

4.2

2.4

2k entries

51510

6.8

5.4

3k entries

*

10.4

7.4

4k entries

*

13.0

9.7

5k entries

*

16.4

12.7

6k entries

*

20.9

14.9

7k entries

*

23.3

18.3

8k entries

*

28.1

21.0

9k entries

*

32.7

23.5

Huge Size Input

 

 

 

10k entries

*

35.1

25.5

20k entries

*

70.4

54.7

30k entries

*

106.6

90.7

40k entries

*

135.9

133.4

50k entries

*

170.6

173.2

60k entries

*

202.1

217.7

70k entries

*

270.5

281.7

80k entries

*

301.9

332.9

90k entries

*

351.6

380.8

 

* denotes no respond after 60 seconds.

o  The token example:

o      The objective of this benchmark is to test whether laziness can employed as an optimization option.

o     We consider a program as follows

         regtype T = Token[]

         regtype D = Delim[]

         token :: (T,(D,T)*) -> Bool

         token (x as T) = False

  token _ = True

o      The problem lies in the application of (token e) where e is evaluated to some value of type ((T,D)*,T). Though we know that ((T,D)*,T) = (T,(D,T)*), the value is in structured representation, and it will be shuffled (restructured) to match with the function input type. It seems unnecessary because we just want to check whether there is more than one token in the sequence.

o      In this benchmark, we employ laziness in the function application, and we test the program with inputs ranging between 10k and 100k entries under the same environment as the above benchmark.

o      The results capture the responding time of the application in milliseconds. It shows that the processing time is constant. It might be more meaningful if we run this benchmark on a slower machine.

 

XDuce to ML embedding (benchmark2.tar.gz)

10k ~ 100k entries

0.0 ~ 0.04

 

o   Upcoming benchmarks

o      See Optimizations section below.

 

Optimizations

Here are some optimization options we can pursue. We plan to apply them in for future experiments.

o Optimization scheme #1: We can save some coercion, by some additional check. After inference takes place, if we find that pattern structure type (R_gamma_i) is a subtype of pattern type (R_I), we combine the downcast testing and the upcast value distribution.  This is common in most of the practical cases. For instance,

    f :: (A|B)* -> ...

    f ((x as A*), (y as B*)) = ...

    ...

o      Note that the type of values which will match first pattern is  (A*,B*)  (R_1) which is  in same shape as the pattern structural type  (A*,B*) (R_gamma_1). In this case, we combine d_1 :: [[(A|B)*]] -> Maybe [[R_1]]  and  u_1 :: [[R_1]] -> [[R_gamma_1]]  into  du_1 :: [[(A|B)*]] -> Maybe [[R_gamma_1]] . We need an additional check at compile time whether R_gamma_1 is a subtype of the function input type.

o Optimization scheme #2: We can compute a most general type based on the use of the pattern variable.  For instance,

    f :: (A|B)* -> ...

    f (x as A*) = g x

    ...

 

  g :: (A|B)* -> ...

o      Since the only use of variable x in the body is to apply it to g whose argument type is (A|B)*, we consider just perform a pattern match checking, and we need not perform any value distribution. This saves two up-casting coercions as compared to the original translation scheme.  In this case, we can translate f as follows,

    f :: [Or A_U B_U] -> ...

    f v = case (d1 v) of

          Just x -> g v

          Nothing -> ...

 

    g :: [Or A_U B_U] -> ...

    d1 :: [Or A_U B_U] -> Maybe [A_U]

o Optimization scheme #3: Another possible direction we can pursue is that when the patterns share a common prefix/suffix, we can merge the prefix/suffix.  For example,

    f :: (A|B)* -> ...

    f v = case v of

         (x as (A|C)*, y as A) -> ...

         (x as (A|D)*, y as B) -> ...

  ...

o     From the results of pattern inference, we have,

    f v = case v of

         (x as A*,y as A) -> ...

         (x as A*,y as B) -> ...

  ...

o     Now we can merge the patterns since they share a common prefix after inference taking place.

    f v = case v of

         (x as A*,v' as (A|B)) -> case v' of

                                  y as A -> ...

                                  y as B -> ...

   ...

o     We get rid of one redundant checking for prefix A* out of value v.

o Other optimization possibilities: fusion and deforestation, changing the underlying representation for sequence, etc.