
Good Example

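Here is a simplified version of the exchange format used by the Kleisli system. Punctuation and layout characters such as colons, commas, tabs, and white space are not significant. In the table below, O, O1, ..., On stand for objects written in this format, and #l, #l1, ..., #ln stand for field labels.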

 Logical    Lexical                   Remarks
 Booleans   true
            false
 Numbers    123, 123.123              Positive numbers
            ~123, ~123.123            Negative numbers (prefixed with ~)
 Strings    "a string"                String is put inside double quotes
 Records    (#l1: O1, ..., #ln: On)   Record is put inside round brackets
 Variants   <#l: O>                   Variant is put inside angle brackets
 Sets       {O1, ..., On}             Set is put inside curly brackets
 Lists      [O1, ..., On]             List is put inside square brackets
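Because the table fully determines the lexical layer, a reader for it is short. The following is a minimal sketch in Python, not Kleisli's own parser: the names `parse`, `tokenize`, `Variant`, and `KSet` are hypothetical, and it assumes labels consist of letters, digits, and underscores and that strings contain no embedded double quotes (the table gives no escape syntax). Records become Python dicts, lists become lists, and sets become a list subclass, since records (dicts) cannot be members of a Python set.

```python
import re
from typing import Any, NamedTuple


class Variant(NamedTuple):
    """Logical variant <#label: value> (hypothetical representation)."""
    label: str
    value: Any


class KSet(list):
    """Logical set {O1, ..., On}; kept as a list because records
    (Python dicts) are not hashable."""


# Lexical layer: one alternative per token kind. Whitespace, commas and
# colons are matched but discarded, since the format treats them as
# insignificant.
TOKEN = re.compile(r'''
    (?P<skip>[\s,:]+)
  | (?P<string>"[^"]*")
  | (?P<number>~?\d+(?:\.\d+)?)
  | (?P<label>\#\w+)
  | (?P<word>\b(?:true|false)\b)
  | (?P<punct>[()<>{}\[\]])
''', re.VERBOSE)

CLOSE = {"(": ")", "<": ">", "{": "}", "[": "]"}


def tokenize(text: str) -> list:
    """Split text into (kind, token) pairs, dropping separators."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "skip":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def _value(tokens: list, pos: int):
    """Parse one object starting at tokens[pos]; return (value, next_pos)."""
    kind, tok = tokens[pos]
    if kind == "string":
        return tok[1:-1], pos + 1
    if kind == "number":
        digits = tok.lstrip("~")                  # "~" marks a negative number
        num = float(digits) if "." in digits else int(digits)
        return (-num if tok.startswith("~") else num), pos + 1
    if kind == "word":
        return tok == "true", pos + 1
    if kind == "punct" and tok in CLOSE:
        close, pos = ("punct", CLOSE[tok]), pos + 1
        if tok in ("(", "<"):                     # record or variant: labelled fields
            fields = {}
            while tokens[pos] != close:
                lkind, label = tokens[pos]
                if lkind != "label":
                    raise SyntaxError(f"expected a #label, got {label!r}")
                value, pos = _value(tokens, pos + 1)
                fields[label[1:]] = value
            pos += 1                              # skip the closing bracket
            if tok == "(":
                return fields, pos
            if len(fields) != 1:
                raise SyntaxError("a variant holds exactly one labelled value")
            label, value = next(iter(fields.items()))
            return Variant(label, value), pos
        items = []                                # set or list: unlabelled elements
        while tokens[pos] != close:
            value, pos = _value(tokens, pos)
            items.append(value)
        pos += 1                                  # skip the closing bracket
        return (KSet(items) if tok == "{" else items), pos
    raise SyntaxError(f"unexpected token {tok!r}")


def parse(text: str) -> Any:
    """Read one object from its lexical form into Python values."""
    tokens = tokenize(text)
    try:
        value, pos = _value(tokens, 0)
    except IndexError:
        raise SyntaxError("unexpected end of input") from None
    if pos != len(tokens):
        raise SyntaxError("trailing input after the first object")
    return value
```

For example, `parse('(#uid: 7490551, #title: "Pof3p")')["title"]` returns the string `Pof3p`. The colon inside a string such as `"taxon:4896"` survives because the tokenizer consumes the whole quoted string before it ever considers separators.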

Here is an example of our F-box report in this exchange format:

 (#uid: 7490551,
  #title: "F-box domain protein Pof3p - fission yeast (Schizosaccharomyces pombe).",
  #accession: "T41727",
  #common: "fission yeast.",
  #organism: (#genus: "Schizosaccharomyces",
              #species: "pombe",
              #lineage: ["Eukaryota",
                         "Fungi",
                         "Ascomycota",
                         "Schizosaccharomycetales",
                         "Schizosaccharomycetaceae",
                         "Schizosaccharomyces"]),
  #feature: {(#name: "source",
              #start: 0,
              #end: 576,
              #anno: [(#anno_name: "organism",
                       #descr: "Schizosaccharomyces pombe"),
                      (#anno_name: "db_xref",
                       #descr: "taxon:4896")]),
             (#name: "Protein",
              #start: 0,
              #end: 576,
              #anno: [(#anno_name: "product",
                       #descr: "F-box domain protein Pof3p")])},
  #sequence: "MNNYQVKAIKEKTQQYLSKRKFEDALTFITKTIEQEPNPTIDLFELRAQVYEKSGQYSQAELDAKRMIHLNARNARGYLRLGKLLQLDGFDKKADQLYTQGLRMVHKMDPLRPVLKKVSQRLNERILRTRPVLDLFRILPREVLLCILQQLNFKSIVQCMQVCKHWRDCIKKEPSLFCCLDFSCASPRSVNSRDRNVMAVARYSVYSKDNIQEVIGLEKLGILTPTKALLRSVKSLKVYKTISPLHTQSTDKLYTIWTPFSELHYFYCATPITFSIASKILSCCKKLKQVELVDLIPDLIFDSMDWDKLFNAESVPLALKSLTFIRNQKFPFHHKEQQFLKDLLSASPYLEYLEASYQSDLVAAIKKYKINLRSLIIIDEGVSNTVKDLAFLPQSLTTLIVKPCNPASTILCPYLFPTNVRMESLINLELFLYLRLSQNDIDNVVKFLTSCYKLKKLVLHDSLALAPHFFEIFASLPELEHLEIPDNVALQNKHAIHITDCCPNLKYVNFSNSISLDGSGFIAVLRGLKELKRIDIINCDSVSRDAIDWARSKGMQVTVASSLPNSQPLGTKKIRLI")

Here is the tomato F-box Medline "abstract":

 (#muid: 10652136,
  #authors: "Rose A, Meier I, Wienand U",
  #address: "Institut fur Allgemeine Botanik, Universitat Hamburg, Ohnhorststr. 18, D-22609 Hamburg, Germany.",
  #title: "The tomato I-box binding factor LeMYBI is a member of a novel class of myb-like proteins.",
  #abstract: "The RBCS3A gene of tomato belongs to a small gene family consisting of five members. Although the RBCS1, RBCS2 and RBCS3A promoters contain closely related cis regulatory sequences, the expression patterns of the genes are different. Whereas the RBCS1 and RBCS2 genes are expressed in both leaves and young fruit, the RBCS3A promoter is highly active in leaves, but not in young fruit. This lack of transcription could be due to a mutation in the RBCS3A promoter creating the so-called F-box, a protein binding site located between the activating cis elements, the I-box and G-box. In order to identify proteins that bind to the RBCS3A I-box/F-box region, the yeast one-hybrid system was used. One clone, LeMYBI was isolated which contains strong similarity to plant myb transcription factors. The encoded LeMYBI protein is at least 188 amino acids in length and contains two myb-like domains located at the amino terminus and close to the carboxy terminus, separated by a negatively charged domain. The protein contains a SHAQKYF amino acid signature motif in the second myb-like repeat, which is highly conserved in a number of recently identified plant myb-related genes, thus defining a new class of plant DNA-binding proteins. LeMYBI binds specifically to the I-box sequence of the RBCS1, RBCS2 and RBCS3A promoters, therefore representing the first cloned I-box binding factor. LeMYBI acts as a transcriptional activator in yeast and plants, and binds to the I-box with a DNA-binding domain located in the carboxyterminal domain.",
  #journal: "Plant J 1999 Dec;20(6):641-52 ")

Because both records use the same lexical layer, the same "parser" can read both of them. For example, if DATA is either of the records above, this simple Kleisli/CPL program prints its title. The program operates on the logical layer and completely hides the lexical layer:

   DATA.#title

Here is a Kleisli/Perl program that does the same thing. Because it operates at a lower level than Kleisli/CPL, both the lexical and the logical layers are visible:

  $a_file  = CPLIO->Open(DATA);        # receiving at lexical layer
  $a_rec   = $a_file->Parse;           # parse into logical layer
  $a_title = $a_rec->Project("title"); # operate on logical layer
  print "$a_title";

The clean separation between the lexical and logical layers lets a programmer operate on the logical structure of an object (e.g., extracting its title) without worrying about how to locate the various data elements or how to determine their boundaries (e.g., how to recognize where the title begins and ends).
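The same separation works in the output direction. As a hedged sketch in Python (not Kleisli's own printer), a hypothetical `dump` function turns Python values into the lexical form given in the table, so the code that builds a record never deals with brackets, quotes, or the `~` sign. The mapping of dicts to records, lists to lists, and the small hypothetical `Variant` and `KSet` wrapper types to variants and sets is an assumption made for this example.

```python
from typing import Any, NamedTuple


class Variant(NamedTuple):
    """Logical variant <#label: value> (hypothetical representation)."""
    label: str
    value: Any


class KSet(list):
    """Marks a list that should be written as a set {...}."""


def dump(value: Any) -> str:
    """Write a Python value in the exchange format's lexical layer."""
    if isinstance(value, bool):              # check before int: bool subclasses int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        # repr may use exponent notation (e.g. 1e+20), which the table does not cover.
        text = repr(abs(value))
        return "~" + text if value < 0 else text
    if isinstance(value, str):
        if '"' in value:
            raise ValueError("the table gives no escape for embedded double quotes")
        return '"' + value + '"'
    if isinstance(value, Variant):
        return "<#" + value.label + ": " + dump(value.value) + ">"
    if isinstance(value, dict):              # record: labelled fields in round brackets
        return "(" + ", ".join(f"#{k}: {dump(v)}" for k, v in value.items()) + ")"
    if isinstance(value, KSet):              # check before list: KSet subclasses list
        return "{" + ", ".join(dump(v) for v in value) + "}"
    if isinstance(value, list):
        return "[" + ", ".join(dump(v) for v in value) + "]"
    raise TypeError(f"no lexical form for {type(value).__name__}")
```

For example, `dump({"uid": 7490551, "title": "F-box"})` yields `(#uid: 7490551, #title: "F-box")`, and `dump(KSet([1, -2.5]))` yields `{1, ~2.5}`. Only this one function knows about delimiters; everything else manipulates ordinary values.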


Wong Lim Soon
4/9/2000