
RECOMB SATELLITE CONFERENCE ON REGULATORY GENOMICS
PROGRAM -- (all in LT34, SOC1, Level 3)

Day 1: Monday JUL 17, 2006

Time

Presentation title

Speaker

8.00 - 8.45

Registration

-

8.45 - 9.00

Introductory remarks

-

INVITED TALKS:

9.00 - 9.45

Computational Prediction of Regulatory Elements by Comparative Sequence Analysis

Martin Tompa

9.45 - 10.30

A Tale of Two Topics: motif significance and sensitivity of spaced seeds

Ming Li

10.30 - 11.00

Coffee break / Networking opportunities

11.00 - 11.45

Computational Challenges for Top-Down Modeling and Simulation of Biological Pathways

Satoru Miyano

11.45 - 12.30

An Improved Gibbs Sampling Method for Motif Discovery via Sequence Weighting

Tao Jiang

12.30 - 2.00

Lunch (provided) / Networking opportunities / Poster session

2.00 - 2.45

Discovering Motifs with Transcription Factor Domain Knowledge

Francis Chin

PAPER PRESENTATIONS:

2.45 - 3.45

TScan: A Two-step De novo Motif Discovery Method

Osman Abul, Geir Kjetil Sandve and Finn Drablos

Redundancy Elimination in Motif Discovery Algorithms

Henry Leung and Francis Chin

GAMOT: An Efficient Genetic Algorithm for Finding Challenging Motifs in DNA Sequences

Nihat Karaoglu, Sebastian Maurer-Stroh and Bernard Manderick

3.45 - 4.15

Coffee break / Networking opportunities / Poster session (continued)

INVITED TALK:

4.15 - 5.00

Applications of ILP in Computational Biology

Andreas Dress

PAPER PRESENTATIONS:

5.00 - 5.40

Identification of spaced regulatory sites via submotif modeling

Edward Wijaya and Rajaraman Kanagasabai

Refining motif finders with E-value calculations

Niranjan Nagarajan, Patrick Ng and Uri Keich

BANQUET (at Furama RiverFront Hotel):

Buses leave the conference venue for the banquet at 6:30pm.



Day 2: Tuesday JUL 18, 2006

Time

Presentation title

Speaker

INVITED TALKS:

8.15 - 9.00

On the Evolution of Transcription Regulation Networks

Ron Shamir

9.00 - 9.45

Systems Pharmacology in Cancer Therapeutics: Iterative Informatics-Experimental Interface

Edison Liu

9.45 - 10.30

Computational Structural Proteomics and Inhibitor Discovery

Ruben Abagyan

10.30 - 11.00

Coffee break / Networking opportunities

11.00 - 11.45

Characterization of Transcriptional Responses to Environmental Stress by Differential Location Analysis

Haixu Tang

11.45 - 12.30

A Knowledge-based Hybrid Algorithm for Protein Secondary Structure Prediction

Hsu Wen Lian

12.30 - 2.00

Lunch (provided) / Networking opportunities

2.00 - 2.45

Monotony and Surprise (Conservative Approaches to Pattern Discovery)

Alberto Apostolico

PAPER PRESENTATIONS:

2.45 - 3.30

Multiple Indexing Sequence Alignment for Group Feature Identification

Tun-Wen Pai and Margaret Dah-Tsyr Chang

Improving the Accuracy of Signal Transduction Pathway Construction Using Level-2 Neighbours

Thomas Wong, Siu-Ming Yiu and Tak-Wah Lam

3.30 - 4.00

Coffee break / Networking opportunities

INVITED TALK:

4.00 - 4.45

Evolution of Bacterial Regulatory Systems

Mikhail Gelfand

PAPER PRESENTATIONS:

4.45 - 5.25

CisSearch: Software Package For Complex Analysis Of Gene Regulatory Sequences

Evgeny Cheremushkin, Tagir Valeev, Tatiana Konovalova, Dmitry Shtokalo and Anna Taraskina

Investigating roles of DNA flexibility in promoter recognition and regulation

Jim Bashford



ABSTRACTS:

Ron Shamir, Tel Aviv University, Israel:

On the Evolution of Transcription Regulation Networks
We are developing methods that employ sequence, expression and other data from multiple species, in order to identify transcription factor-DNA interactions and to trace their evolution. I will discuss several of our efforts in this direction:
- A study on the dynamics of minute changes in the regulatory sequences, using genomes of four closely related yeast species.
- Analysis of stability and change in transcriptional modules in 17 yeast species, using expression and sequence data.
- An integrated genome-wide evolutionary model of the regulatory code.
The emerging picture of the evolution in transcription regulation networks is quite fascinating. If time allows, I will also discuss our novel software tool for large scale de novo motif finding.

Ming Li, University of Waterloo, Canada:

A Tale of Two Topics: motif significance and sensitivity of spaced seeds
Computing the p-value of a motif has been a very difficult problem. Many heuristic algorithms try to approximate it. It turns out that this problem is very similar to optimal spaced seed design in homology search. Connecting the two topics, we show for the first time that computing the p-value is NP-hard, and give a reasonably fast algorithm based on dynamic programming. Test results will be given.
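
As an illustration of the spaced-seed side of this topic, the following is a minimal sketch of the textbook dynamic program for the sensitivity (hit probability) of one spaced seed. It is not the algorithm from the talk. It assumes an idealized model in which each alignment column matches independently with probability `p`; the function name `seed_hit_probability` and its parameters are hypothetical.

```python
from itertools import product


def seed_hit_probability(seed: str, length: int, p: float) -> float:
    """Probability that `seed` hits a random 0/1 alignment of `length` columns.

    seed: string over {'1', '0'}; '1' marks a required match, '0' a don't-care.
    p: per-column match probability (columns assumed independent).
    """
    span = len(seed)
    if length < span:
        return 0.0
    required = [i for i, c in enumerate(seed) if c == "1"]
    k = span - 1

    # State: the last k columns seen. Value: probability of reaching that
    # state without any seed hit so far.
    states = {}
    for bits in product((0, 1), repeat=k):
        prob = 1.0
        for b in bits:
            prob *= p if b else 1.0 - p
        states[bits] = prob

    # Extend the alignment one column at a time.
    for _ in range(length - k):
        nxt = {}
        for bits, prob in states.items():
            for b in (0, 1):
                window = bits + (b,)
                if all(window[i] for i in required):
                    continue  # the seed hits in this window; drop this mass
                key = window[1:]
                nxt[key] = nxt.get(key, 0.0) + prob * (p if b else 1.0 - p)
        states = nxt

    # Whatever probability mass survived corresponds to "no hit anywhere".
    return 1.0 - sum(states.values())


if __name__ == "__main__":
    # Compare a spaced seed with a contiguous seed of the same weight.
    for s in ("1101", "111"):
        print(s, seed_hit_probability(s, 20, 0.7))
```

The state space grows as 2^(span-1), so this direct form is only practical for short seeds.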

Satoru Miyano, Human Genome Center, Institute of Medical Science, University of Tokyo, Japan:

Computational Challenges for Top-Down Modeling and Simulation of Biological Pathways
If ordinary/partial differential equations were the only way to model biological pathways for simulation, our understanding of life as a system through computation would not increase drastically and would be very biased. If the language for modeling and describing biological pathways were not rich, we would lose a lot of the valuable knowledge and information on biological systems that has been produced and reported. With this understanding as the basis of our development, we have been developing an XML format, Cell System Markup Language (CSML) (http://www.csml.org/), and a modeling and simulation tool, Cell Illustrator (http://www.gene-networks.com/). In this talk, we present the newest versions, CSML 3.0 and Cell Illustrator 3.0, which supports CSML 3.0.
Cell Illustrator (CI for short) is a software tool for modeling and simulating biological pathways. It is based on the notion of Petri nets and was originally developed under the name Genomic Object Net [1]. An important challenge for Systems Biology is to create a software platform with which scientists in biology and medicine can comfortably create models of dynamic causal interactions and processes in cells and simulate them for further investigation, e.g. testing and creating hypotheses. CI employs the Hybrid Functional Petri Net with extension (HFPNe) as its architecture [2]. HFPNe was defined by adding functions to the hybrid Petri net so that various aspects of pathways can be modeled intuitively, using integer, real, string, boolean, vector and object types, among others. The architecture of CI 3.0 is designed so that users can approach modeling and simulation in a biologically intuitive way, drawing on their own knowledge and insights, and can also benefit from public and commercial pathway databases. Its effectiveness has been demonstrated by modeling various biological processes. Recently, we have developed a method for automatic parameter estimation for HFPN models by developing a theory of data assimilation, which will be implemented as a function of CI.
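
For readers unfamiliar with Petri nets, the sketch below shows only the basic discrete firing rule that such models build on. It is not HFPNe and not Cell Illustrator's implementation: the continuous, functional and typed extensions are omitted, and the class, function and place names (`Transition`, `fire`, `gene`, `mRNA`, `protein`) are hypothetical, chosen for a toy gene-expression pathway.

```python
from dataclasses import dataclass, field


@dataclass
class Transition:
    name: str
    inputs: dict = field(default_factory=dict)   # place -> tokens consumed
    outputs: dict = field(default_factory=dict)  # place -> tokens produced


def is_enabled(marking: dict, t: Transition) -> bool:
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(place, 0) >= n for place, n in t.inputs.items())


def fire(marking: dict, t: Transition) -> dict:
    """Return the new marking after firing `t`; the input marking is not modified."""
    if not is_enabled(marking, t):
        raise ValueError(f"transition {t.name!r} is not enabled")
    new = dict(marking)
    for place, n in t.inputs.items():
        new[place] = new.get(place, 0) - n
    for place, n in t.outputs.items():
        new[place] = new.get(place, 0) + n
    return new


def run(marking: dict, sequence) -> dict:
    """Fire a fixed sequence of transitions, starting from `marking`."""
    for t in sequence:
        marking = fire(marking, t)
    return marking


# Hypothetical toy pathway: the gene and the mRNA act as catalysts, so each
# is consumed and immediately produced again by the transition that uses it.
TRANSCRIPTION = Transition("transcription", {"gene": 1}, {"gene": 1, "mRNA": 1})
TRANSLATION = Transition("translation", {"mRNA": 1}, {"mRNA": 1, "protein": 1})
MRNA_DECAY = Transition("mRNA_decay", {"mRNA": 1}, {})

if __name__ == "__main__":
    start = {"gene": 1, "mRNA": 0, "protein": 0}
    print(run(start, [TRANSCRIPTION, TRANSLATION, TRANSLATION, MRNA_DECAY]))
```

Markings are treated as immutable snapshots here, so earlier states of a simulation remain available for inspection.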
Several XML formats have been proposed as standard formats for biological pathways. However, each provides only a partial solution for the storage and integration of biological data. The aim of CSML 3.0 is to create a truly usable XML format for visualizing, modeling and simulating biological pathways. Other XML formats, SBML 2.0 and CellML 1.0, have been proposed and developed for dynamic simulation. These formats have become popular for chemical reactions, and many applications support them as data exchange formats. However, they do not define any graphical elements, which makes it difficult for them to serve as a powerful data exchange format among biological pathway applications. CSML 3.0 is therefore developed as an integrated, unified data exchange format that covers widely used data formats and applications, e.g. CellML 1.0, SBML 2.0, BioPAX, and Cytoscape.
We have also developed programs that automatically convert SBML 2.0 and CellML 1.0 models to CSML 3.0. CI 3.0 fully supports CSML 3.0 as its base XML format. Thus every model in SBML 2.0 and CellML 1.0 is executable on CI 3.0. It is also possible to automatically convert KEGG and BioCyc metabolic pathways to CSML.
1. Genomic Object Net: http://www.genomicobject.net/
2. Nagasaki, M., Doi, A., Matsuno, H., Miyano, S. Computational modeling of biological processes with Petri net based architecture. In "Bioinformatics Technologies" (Y.P. Chen, ed). Springer Press. 179-243, 2005.

Tao Jiang, University of California at Riverside, USA:

An Improved Gibbs Sampling Method for Motif Discovery via Sequence Weighting
The discovery of motifs in DNA sequences remains a fundamental and challenging problem in computational molecular biology and regulatory genomics, although a large number of computational methods have been proposed in the past decade. Among these methods, the Gibbs sampling strategy has shown great promise and is routinely used for finding regulatory motif elements in the promoter regions of co-expressed genes. In this paper, we present an enhancement to the Gibbs sampling method when the expression data of the concerned genes is given. A sequence weighting scheme is proposed by explicitly taking gene expression variation into account in Gibbs sampling. That is, every putative motif element is assigned a weight proportional to the fold change in the expression level of its downstream gene under a single experimental condition, and a position specific scoring matrix (PSSM) is estimated from these weighted putative motif elements. Such an estimated PSSM might represent a more accurate motif model since motif elements with dramatic fold changes in gene expression are more likely to represent true motifs. This weighted Gibbs sampling method has been implemented and successfully tested on both simulated and biological sequence data. Our experimental results demonstrate that the use of sequence weighting has a profound impact on the performance of a Gibbs motif sampling algorithm.

Joint work with Xin Chen (School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore)
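
The weighted estimation step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it shows only how a log-odds PSSM could be estimated from putative sites weighted by fold change, and leaves out the Gibbs sampling loop itself. The function name `weighted_pssm`, the pseudocount value and the uniform background are assumptions.

```python
import math


def weighted_pssm(sites, weights, alphabet="ACGT", pseudocount=0.5, background=None):
    """Estimate a log-odds PSSM from equal-length sites, each with a weight.

    sites: putative motif elements, e.g. one per promoter sequence.
    weights: one non-negative weight per site (assumed here to be proportional
             to the fold change in expression of the downstream gene).
    Returns a list with one dict per motif column: base -> log2 odds score.
    """
    if not sites or len(sites) != len(weights):
        raise ValueError("need one weight per site")
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    if background is None:
        # Assumption: uniform background base frequencies.
        background = {b: 1.0 / len(alphabet) for b in alphabet}

    total = sum(weights) + pseudocount * len(alphabet)
    pssm = []
    for j in range(width):
        counts = {b: pseudocount for b in alphabet}
        for site, w in zip(sites, weights):
            counts[site[j]] += w  # each site contributes its weight, not 1
        pssm.append({b: math.log2((counts[b] / total) / background[b])
                     for b in alphabet})
    return pssm


if __name__ == "__main__":
    # Hypothetical sites and fold changes, for illustration only.
    sites = ["ACGT", "ACGA", "ACGT"]
    fold_changes = [4.0, 1.2, 3.1]
    for column in weighted_pssm(sites, fold_changes):
        print({b: round(v, 2) for b, v in column.items()})
```

With all weights equal to 1 this reduces to the usual unweighted count-based estimate, which makes the effect of the weighting easy to compare.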

Francis Chin, Hong Kong University, Hong Kong:

Discovering Motifs with Transcription Factor Domain Knowledge
Finding the binding sites of transcription factors from a set of promoter regions of co-regulated genes is an important problem in molecular biology. Most motif-discovering algorithms consider over-represented similar patterns as binding sites and find the position specific score matrix (PSSM) with the maximum likelihood as the solution motif. However, many motifs in real biological data cannot be discovered by these algorithms because they do not consider the biological characteristics of binding sites. We introduce a new algorithm, DIMDom, which exploits two kinds of information: (a) the characteristic pattern of binding site classes, where class is determined based on biological information about transcription factor domains and (b) posterior probabilities of these classes. We compared the performance of DIMDom with MEME on all the transcription factors of Drosophila in the TRANSFAC database and found that DIMDom outperformed MEME with more than double the number of successes and double the accuracy in finding binding sites and motifs.

Andreas Dress, Shanghai Institutes for Biological Sciences, Shanghai, PRC:

Applications of ILP in Computational Biology
In the lecture, I will present various problems in Computational Biology related in particular to phylogenetic combinatorics and network analysis that can be approached successfully using Integer Linear Programming.

Martin Tompa, University of Washington, USA:

Computational Prediction of Regulatory Elements by Comparative Sequence Analysis
With many vertebrate genomes now completely sequenced, the most promising methods for predicting functional sequence elements are based on comparison of sequences from multiple species. We focus on problems that arise when using such tools on a genome-wide scale in the vertebrates. These problems include difficulties in finding reliably homologous promoter sequences, difficulties in choosing the best tool and parameters to apply to these sequences, and difficulties in assessing the significance of the predictions produced. Solutions are offered to each of these problems, though they are far from complete.

Edison Liu, Genome Institute of Singapore, Singapore:

Systems Pharmacology in Cancer Therapeutics: Iterative Informatics-Experimental Interface
Systems biology, as a discipline, seeks to explain biologic phenomena through the net interactions of all cellular and biochemical components within a cell or organism. We present work that uses a systems approach to build the framework for predictive pharmacology. As a model system, we used the p53 transcriptional response in vitro and in human tumors. First, we analyzed transcript profiles in 251 primary breast cancers in which the p53 gene had been sequenced and identified a clinically embedded 32-gene expression signature that distinguishes p53-mutant and wild-type tumors of different histologies and that outperforms p53 sequencing. Thus, the transcriptional fingerprint is a more definitive downstream indicator of p53 function. Second, we identified a unique role for glycogen synthase kinase-3beta (GSK-3beta) in regulating p53 function in human colorectal cancer cells. Pharmacologic modulation of GSK-3beta markedly impaired p53-dependent transactivation of targets including p21 and Puma but promoted p53-dependent conformational activation of Bax, leading to apoptosis. Thus, the cell cycle arrest after p53-mediated damage response is converted to apoptosis following exposure to a variety of chemotherapeutic agents (Tan, et al. Cancer Res. 65(19):9012-20, 2005). The success of this compound will depend on a reliable assessment of p53 status in primary tumors.
Based on these observations, we sought to identify the precise mechanisms of p53 gene regulation by developing a robust approach that couples chromatin immunoprecipitation (ChIP) with the paired-end ditag (PET) sequencing strategy for unbiased and precise global localization of p53 binding sites. From a saturated sampling of over half a million PET sequences, we characterized 65,572 unique p53 ChIP DNA fragments and established overlapping PET clusters as a readout to define p53 binding loci with remarkable specificity. Based on this information, we refined the consensus p53 binding motif, identified at least 542 binding loci with high confidence, and discovered 98 previously unidentified p53 target genes that were implicated in novel aspects of p53 function, such as cell adhesion and motility. Finally, we showed their clinical relevance to p53-dependent tumorigenesis in primary cancer samples (Wei CL, et al. Cell. 124(1):207-19, 2006). The mutually supporting discovery framework we have established at the GIS has been the key to maximally exploiting individual discoveries in a collective manner.

Ruben Abagyan, The Scripps Research Institute, La Jolla, USA:

Computational Structural Proteomics and Inhibitor Discovery
The rapid advance of structural proteomics calls for the development of new methods for predicting structural changes, association and function, as well as for improved methods for structure-based molecular design. The main challenges of computational structural biology and chemistry will be reviewed. We have developed methods for predicting the functional map of a protein with a known 3D structure, accurate docking of compounds to a binding site and virtual ligand screening of large chemical databases, and structure prediction by global energy optimization, e.g. characterizing mutants and SNPs, homology modeling, protein-protein or peptide docking, and accurate loop prediction.
Predicting how flexible molecules dock to a flexible receptor is one of the main challenges in computational structural biology and structure-based ligand design. Two stories in which novel compounds were discovered through "ligand-guided" receptor pocket modeling, followed by virtual screening of large compound libraries, will be presented. First, we developed models of the androgen receptor in an antagonist-bound conformation. These models were used to discover computationally the secondary activity of antipsychotic drugs. These drugs were then chemically altered and "re-purposed" to lose their binding to the serotonin and dopamine receptors and to improve their anti-androgen properties. The experimental side of this project was performed by the labs of Xiaokun Zhang and James Dalton. Second, in collaboration with the David Lomas lab at Cambridge, we identified the first small molecules to inhibit pathological polymerization of an alpha1-antitrypsin mutant, which is the most common genetic cause of a lethal liver disease in childhood. Computationally this project was particularly difficult because the target of the small molecule was a dynamic protein-protein interface. Third, we developed a protocol for protein-protein docking which produced the winning overall predictions in two consecutive CAPRI competitions.
Finally, a new way to disseminate structural and functional information in structural proteomics developed in collaboration with the Oxford Center for Structural Genomics is presented.

Haixu Tang, Indiana University, USA:

Characterization of Transcriptional Responses to Environmental Stress by Differential Location Analysis
Unicellular organisms like yeast need to respond rapidly to changes in environmental conditions in order to survive. Using high-throughput location analysis (Chromatin Immuno-Precipitation on DNA chip, or ChIP-Chip for short), Harbison et al. have determined the genomic binding locations of 204 transcription factors (TFs) from the yeast Saccharomyces cerevisiae in rich media condition and 13 stress conditions. Here, we report a statistical method for differential location analysis, to determine the set of regulators that bind to significantly different genomic regions under certain stress conditions. From the ChIP-Chip data published by Harbison et al., we were able to identify 105 TF-condition pairs which showed statistically significant differential binding patterns (p < 0.05). Comparison with published microarray data revealed that the expression levels of nearly half of the tested TFs did not significantly change under the corresponding environmental stress, which implies that such regulatory responses would not be revealed by microarray data alone. In conclusion, complementary differential analyses (e.g. differential location analysis) are required, in addition to the commonly used microarray-based differential expression analysis, in order to understand the global picture of cellular responses to environmental stresses.

Hsu Wen Lian, Institute of Information Sciences, Academia Sinica, Taiwan:

A Knowledge-based Hybrid Algorithm for Protein Secondary Structure Prediction
In our previous approach, we proposed a hybrid method called HYPROSP II for protein secondary structure prediction, which combined our proposed knowledge-based prediction algorithm PROSP and a neural net approach PSIPRED. In this talk, we further improve the performance of PROSP by proposing a better voting strategy and a wider coverage rate using both 7-mers and 5-mers.

Generally speaking, the knowledge-based algorithm, PROSP, does not necessarily provide the best result among all secondary structure prediction systems, because it is restricted by its coverage rate. Therefore, we need to consult other algorithms or biological properties that are potentially complementary to those of PROSP. We will illustrate a neural network model that helps us combine results from more than one system, and the combined results can be substantially better than those of any single system. Our approach provides a general platform for knowledge-based prediction algorithms, one that is more amenable to incorporating various kinds of biological domain knowledge.

Alberto Apostolico, Georgia Institute of Technology, USA & University of Padova, Italy:

Monotony and Surprise (Conservative Approaches to Pattern Discovery)
Pattern discovery is often torn between the rigidity of the model and the abundance of candidates, a circumstance that tends to generate daunting computational burdens, and to give rise to a throughput that is impossible to visualize and digest.

While part of these problems is endemic, another part seems rooted in the characterizations traditionally offered for the notion of a motif or association, that are typically based either on syntax or on statistics alone. This talk describes alternate notions based on constraints of saturation that tightly combine syntactic and statistical specifications, and shows how they afford significant parsimony in the generation and testing of candidate patterns.

Mikhail Gelfand, Research and Training Center on Bioinformatics, Russia:

Evolution of Bacterial Regulatory Systems
Comparative analysis of bacterial genomes allows not only for identification of new regulatory systems and functional annotation of hypothetical genes, but also for characterization of changes in regulatory patterns. Although it is premature to speak about a theory of regulatory evolution, some patterns start to emerge. I will present results of genomic analysis of several systems of varying complexity. In particular, I will show how computational analysis of NrdR, a universal regulator of ribonucleotide reductases, has resulted in a detailed description of the regulatory signal and the mechanism of regulation, and has established links between this regulon and replication. I will present examples of regulon expansion, contraction, merging and disappearance in the metabolic pathways of oligosaccharide and sugar utilisation. Finally, I will attempt to reconstruct the evolutionary history of the regulation of the iron homeostasis system in alpha-proteobacteria.