Finding Regulatory Signals from Unaligned DNA Sequences

Yuh-Jyh Hu

 

Genome projects have generated an explosive amount of biosequence data; however, our biological knowledge has not kept pace with the growth of that data. This imbalance has stimulated the design of many novel analysis methods and devices. One of the most important new developments is microarray and gene chip technology. A cluster of co-regulated genes isolated by gene expression measurements can show which genes in a cell react similarly to a stimulus, but what biologists further want to understand is the mechanism responsible for the coordinated responses. The cellular response to a stimulus is controlled by the action of transcription factors. They bind to regulatory sites and interact with RNA polymerase, thereby activating or repressing the expression of a selected set of target genes. Given a family of genes characterized by their common response to a perturbation, the problem we try to solve is to find the regulatory signals (also known as motifs), i.e. transcription factor binding sites, that are shared by the control regions of these genes. Many different approaches have been developed, differing in their motif representations, motif significance measures, and motif search strategies. In this talk, we give a brief introduction to the history of the motif detection problem and present some lessons learned from these studies.