PB1 Phylogenetic tree based on protein-coding genes of xylosyltransferases is similar to the universal tree of life based on rRNA sequences

Hsingshen Hung a*, Kuo-yuan Hwa, PhD a#

a Institute of Organic and Polymeric Materials National Taipei University of Technology No. 1, Sec.3, Jungshiau E Rd, Taipei, Taiwan 106, R.O.C. Tel:+886-2-2771-2171 x 2419. E-mail:kyhwa@ntut.edu.tw # corresponding author * presenting author


Comparative analyses of rRNA sequences have been used to construct the "universal tree of life", a phylogenetic tree that divides living organisms into three major domains: Eucarya, Archaea, and Bacteria. In the rRNA phylogenetic tree, the earliest eukaryotic cells are the Archaezoa, amitochondriate organisms such as the diplomonads (Giardia) and Parabasalia (trichomonads). However, recent phylogenetic analyses based on protein-coding genes have shown that classical molecular phylogeny using rRNA genes may fail to delineate relationships between domains or between the major lineages within these domains. Recently, we found a novel N-linked core glycan structure in the glycocalyx of T. vaginalis (K.Y. Hwa et al., unpublished data). The structure resembles the xylosylated core structure found in plants. This finding prompted us to search for the enzymes that synthesize the new structure. To take advantage of the time and cost savings of a bioinformatics approach, we used several methods, based primarily on hidden Markov models (HMMs), to build theoretical models from known xylosyltransferase gene sequences characterized in other species. We then used the models to predict xylosyltransferase genes by in silico genome-wide screening of the EST genome database of T. vaginalis. The screen detected eight sequences with high similarity to beta-1,2-xylosyltransferase. We then constructed a phylogenetic tree from the xylosyltransferase coding regions. Interestingly, the phylogenetic tree based on these "hypothetical" xylosyltransferase protein-coding sequences is similar to the one based on rRNA sequences. Based on the structural similarity of the N-linked glycan core and on the phylogenetic analysis, we conclude that trichomonads might be the most ancient eukaryotes between the animal and plant kingdoms.



PB2 An improved 123D fold recognition method using an SSE algorithm and structurally conserved residues

Po-Hsien Lee a, Yi-Tsang Tu a, Sheh-Yi Sheu a,b

a Institute of Bioinformatics, b Faculty of Life Sciences and Institute of Genome Sciences, National Yang Ming University, Taipei 112, Taiwan


The challenge of protein structure prediction is to find suitable templates for a sequence that has less than 25% similarity to any sequence of known structure. Many fold recognition algorithms have been developed to solve this problem. 123D is one of them; it threads a sequence into the 3D structures of a fold library using three properties as parameters: sequence profiles, secondary structure preferences and contact capacity potentials. We incorporate secondary structure prediction and an SSE (secondary structure element) algorithm to improve 123D, because the accuracy of secondary structure prediction by support vector machines approaches 80% and proteins with highly similar structures have highly similar arrangements of SSEs. In addition, the templates found by 123D are filtered by their degree of structurally conserved residues. Tests on nonredundant datasets showed that these two factors complement 123D. Furthermore, by integrating a homology modeling method, we have constructed an automatic pipeline for protein structure prediction that provides a high-throughput tool for structural genomics studies.



PB3 (PS)2: Protein Structure prediction server

Chih-Chieh Chen a, Jenn-Kang Hwang a,b,c, and Jinn-Moon Yang a,b,c

a Institute of Bioinformatics, National Chiao Tung University, Hsinchu, 30050, Taiwan b Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, 30050, Taiwan c Core Facility for Structural Bioinformatics, National Chiao Tung University, Hsinchu, 30050, Taiwan


Protein structure prediction provides valuable insights into function, and comparative modeling is one of the most reliable methods for predicting 3D structures directly from amino acid sequences. However, critical problems arise in selecting the correct templates and aligning query sequences to them. We have developed an automatic protein structure prediction server, (PS)2, which uses an effective consensus strategy both in template selection, which combines PSI-BLAST and IMPALA, and in target-template alignment, which integrates PSI-BLAST, IMPALA and T-Coffee. (PS)2 was evaluated on 47 comparative modeling targets in CASP6 (Critical Assessment of Techniques for Protein Structure Prediction). On this benchmark dataset, the predictive performance of (PS)2, based on the mean GDT_TS score, was superior to that of 10 other automatic servers. Our method is based solely on the consensus sequence and is thus considerably faster than methods that rely on the additional structural consensus of templates. Our results show that (PS)2, coupled with suitable consensus strategies and a new similarity score, can significantly improve structure prediction. Our approach should be useful in structure prediction and modeling. (PS)2 is available at http://ps2.life.nctu.edu.tw/.



PB4 Study of Protein Interaction Network by a Bioinformatics Approach: Evaluation of Gene Expression Profiles Using Matrices

Nancy Lin a, Hsueh-Fen Juan b, Hsuan-Cheng Huang c, Shyh-Horng Chiou a

a Institute of Biochemical Sciences, National Taiwan University, Taipei, Taiwan, b Department of Life Science and Institute of Molecular and Cellular Biology, National Taiwan University, Taipei, Taiwan, c Institute of Bioinformatics, National Yang-Ming University, Taipei, Taiwan.


Searching for functional genes across the whole genome and annotating them is one of the most challenging problems of the post-genomic era. In this context, the search for meaningful bioinformatics methods of assigning protein functions is important. Many approaches are available for assigning putative functions to unannotated proteins. Herein, we demonstrate methods of assigning a role to each node of a network by calculating its degree of connectivity in the entire network. To find the correlation between gene expression profiles and protein-protein interactions, two matrices are built, one for the gene expression profiles and the other for the protein-protein interaction network. The first matrix contains the correlation coefficients of all pairs of genes, calculated by the Pearson, Spearman, and Kendall methods. The second matrix is obtained by a pair-wise column comparison method that assigns a score to each pair of proteins according to whether they interact, with one denoting yes and zero denoting no. Once the matrices are constructed, matrix algebra and statistical methods are combined to obtain the correlation coefficient between gene expression profiles and protein-protein interactions within the proteomes of E. coli and yeast. Our results show correlation coefficients of ~0.5 between the gene expression profiles and the protein interaction networks. In addition, we applied a relatively new concept, the Similarity Score, previously used only for internet networks. This method provides a numerical value for each node of the network, assigning statistical evaluation weights to all nodes within the studied networks. The results reveal a strong positive relationship between the correlation coefficients of gene expression profiles and the similarity scores of protein interaction networks.
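
As a rough illustration of the two-matrix comparison described above, the following Python sketch builds a gene-gene Pearson correlation matrix and a binary interaction matrix, then correlates the two over all gene pairs. The gene names, expression values and interaction list are made-up toy inputs, and the function names are hypothetical. The abstract does not specify the exact matrix algebra used, so this shows only one plausible reading (Pearson only, one entry per unordered gene pair).

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def expression_matrix(profiles):
    """Pairwise Pearson correlations between gene expression profiles."""
    genes = sorted(profiles)
    return {(g, h): pearson(profiles[g], profiles[h])
            for g, h in combinations(genes, 2)}


def interaction_matrix(genes, interactions):
    """Binary matrix: 1 if the two proteins interact, 0 otherwise."""
    pairs = {frozenset(p) for p in interactions}
    return {(g, h): 1 if frozenset((g, h)) in pairs else 0
            for g, h in combinations(sorted(genes), 2)}


def matrix_correlation(expr, inter):
    """Correlate the two matrices over their shared gene pairs."""
    keys = sorted(expr)
    return pearson([expr[k] for k in keys], [inter[k] for k in keys])


# Toy, made-up data (not from the study).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [2.0, 1.0, 4.0, 3.0],
}
interactions = [("geneA", "geneB"), ("geneC", "geneD")]

expr = expression_matrix(profiles)
inter = interaction_matrix(profiles, interactions)
r = matrix_correlation(expr, inter)
```

Swapping `pearson` for a rank-based coefficient in `expression_matrix` would give the Spearman or Kendall variants mentioned in the abstract.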



PB5 Prediction of glucose transport pathways in the human glucose transporter 1 (GLUT1)

Kuei-Ling Kuo a, Jung-Hsin Lin a,b

a School of Pharmacy, National Taiwan University, Taipei, Taiwan
b Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan


Passive transport of glucose across the plasma membrane is mediated by members of the glucose transporter (GLUT/SLC2A) family. GLUT1, known as the red blood cell glucose transporter, is one of the most extensively studied membrane transporters. However, the molecular mechanism of GLUT1-mediated glucose transport is still largely unknown. We conducted molecular dynamics simulations of the GLUT1 transporter in a fully hydrated POPC bilayer to refine the homology model of GLUT1 based on chemical crosslinking data. Umbrella sampling was then employed to calculate the potential of mean force profile, which indicated that the refined structure did have lower free energy. The glucose transport pathways were predicted by a novel docking scheme that can locate a series of juxtaposed ligand binding sites within such membrane channels, along with the most favorable conformation at each binding site. Detailed knowledge of glucose transport will lead to advances in understanding and designing therapeutics for disorders of glucose homeostasis, including type 2 diabetes mellitus.



PB6 Developing "Class-Optimized" Scoring Functions for Drug Design

Zhong-Wei Zhang a, Jung-Hsin Lin a,b

a Institute of Pharmacy, National Taiwan University, Taipei, Taiwan
b Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan


The scoring function for ligand-receptor interactions is a crucial element of computational drug design. Ideally, a scoring function should not only be accurate but also apply to as wide a range of cases as possible. Furthermore, its numerical values should map onto the binding free energies of the ligand-receptor complexes. However, owing to the heterogeneous nature of the binding thermodynamics of ligand-receptor complexes in solution, it is difficult, if not impossible, to predict binding free energies with a single scoring function for all possible ligand-receptor complexes. The choice of molecular descriptors, which constitute the variables of a scoring function, is also important. We have therefore conducted a systematic evaluation of molecular descriptors for predicting the binding free energies of ligand-receptor complexes, based on a comprehensive survey of the Protein Data Bank (PDB) and the binding affinity data in the literature. In addition, the ligand-receptor complexes are optimally classified using a decision function that assigns each complex to the best scoring function for predicting its binding free energy. The results show that this approach can significantly improve the accuracy and predictive power of the scoring scheme.



PB7 The Fragment Transformation Method to Detect the Protein Structural Motifs

Chih-Hao Lu,a Yeong-Shin Lin,b Yu-Ching Chen,a Chin-Sheng Yu,b Shi-Yu Chang,a and Jenn-Kang Hwang a,b,c

a Institute of Bioinformatics, National Chiao Tung University, Taiwan
b Department of Biological Science & Technology, National Chiao Tung University, Taiwan
c Core Facility for Structural Bioinformatics, National Chiao Tung University, Taiwan


Identifying functional structural motifs in protein structures of unknown function has become increasingly important in recent years owing to the progress of the structural genomics initiatives. Although certain structural patterns, such as the Asp-His-Ser catalytic triad, are easy to detect because of their conserved residues and stringently constrained geometry, it is usually more challenging to detect a general structural motif such as the ββα-metal binding motif, which has a much more variable conformation and sequence. At present, the identification of these motifs usually relies on manual procedures based on different structure and sequence analysis tools. In this study, we develop a structural alignment algorithm that combines structural and sequence information to identify local structural motifs. We applied our method to two examples: the ββα-metal binding motif and the treble clef motif. The ββα-metal binding motif plays an important role in nonspecific DNA interactions and cleavage in host defense and apoptosis. The treble clef motif is a zinc-binding motif adaptable to diverse functions such as the binding of nucleic acids and the hydrolysis of phosphodiester bonds. Our results are encouraging, indicating that we can effectively identify these structural motifs automatically. Our method may provide a useful means of automatic functional annotation through the detection of structural motifs associated with particular functions.



PB8 Automatic generation of possible initial structures to assist structural determination by NMR spectroscopy: An application to β class protein

Hsin-I Liao a,*, Yi-Chiao Fang a and Ta-Hsien Lin a,b,c,d

a Institute of Bioinformatics, b Bioinformatics Program and
c Institute of Biochemistry and Molecular Biology,
National Yang-Ming University, Taipei 112, Taiwan;
d Department of Medical Research & Education,
Taipei Veterans General Hospital, Taipei 112, Taiwan


Nuclear magnetic resonance (NMR) spectroscopy combined with molecular dynamics simulation is one of the preeminent techniques for determining protein structure. The main procedure consists of obtaining structural information, such as spatial constraints, from NMR experimental data, and then performing molecular dynamics simulation to calculate the three-dimensional structure of the protein. Generally, this process produces many possible protein structures with different tertiary folds, and identifying the true one is very time-consuming. The process can be improved by acquiring more structural information from NMR experiments or by employing homology modeling to rapidly build a 3D model as an initial structure for the structural calculation. However, when the sequence identity is not high enough, homology modeling cannot be used to build the initial structure. The goal of this study is to develop a protocol for the automatic generation of possible initial structures for structural determination by NMR spectroscopy in the absence of any significant sequence similarity. Possible initial structures can be generated rapidly from the sequence information and the secondary structure determined by NMR spectroscopy. An application to β class protein will be presented.



PB9 Elucidating the molecular regulation of Ganoderma lucidum polysaccharides in human monocytic cells: from gene expression to network construction

Hsueh-Fen Juan1,2, Kun-Chieh Cheng1,5, Hsuan-Cheng Huang6, Chern-Han Ou1,3, Jenn-Han Chen7, Wen-Bin Yang8, Shui-Tein Chen4,8, Chi-Huey Wong8,9

1Department of Life Science, 2Institute of Molecular and Cellular Biology, 3Department of Electronic Engineering, 4Institute of Biochemical Sciences, National Taiwan University, 5Institute of Biotechnology, National Taipei University of Technology, 6Institute of Bioinformatics, National Yang-Ming University, 7School of Dentistry, National Defense Medical Center, National Defense University, 8Institute of Biological Chemistry and the Genomics Research Center, Academia Sinica, Taipei, Taiwan, 9Department of Chemistry and The Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, CA 92037, U.S.A.


Ganoderma lucidum has been widely used as an herbal medicine for promoting health and longevity in China and other Asian countries. Polysaccharide extracts from Ganoderma lucidum have been reported to exhibit immuno-modulating and anti-tumor activities. We purified the active component of the polysaccharide extracts by gel filtration chromatography and designated it F3. The major carbohydrate components of F3 are glucose, mannose and galactose. In our study, F3 activated many cytokines, such as IL-1β, IL-6, IL-12B, IL-8 and TNF-α, in human THP-1 mononuclear cells. This raises the question of how F3 stimulates immuno-modulating and anti-tumor effects in THP-1 cells; understanding the molecular mechanism underlying the action of F3 in THP-1 cells is also of considerable importance. To address this, we used microarrays, real-time quantitative PCR and bioinformatics methods to study the F3-induced effects on THP-1 cells. In the microarray data analysis, we identified differentially perturbed pathways with statistical significance based on Fisher's exact test and the false discovery rate. The pathway of apoptosis induction through the DR3 and DR4/5 death receptors was found to be highly significant in F3-treated THP-1 cells. With time-series gene expression measurements and our software GeneNetwork and BSIP, we reconstructed a plausible gene regulatory network. F3 may mimic a death receptor ligand to initiate signaling via receptor oligomerization, recruitment of specialized adaptor proteins and activation of caspase cascades. This indicates that the cell shrinkage induced by F3 may be triggered through activation of the DR3 and DR4/5 death receptor pathway. Our results provide a molecular explanation for the properties of F3 in human THP-1 mononuclear cells and suggest a valuable prospect for leukemia and cancer therapy.
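
The pathway-level statistics mentioned above (Fisher's exact test followed by false discovery rate control) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses a one-sided test for over-representation and the Benjamini-Hochberg procedure, neither of which is specified in the abstract, and the pathway names and gene counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    k: differentially expressed genes that belong to the pathway
    n: all differentially expressed genes
    K: pathway genes on the array
    N: all genes on the array
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts, for illustration only: (k, K) per pathway.
N, n = 1000, 50
pathways = {"pathway_1": (8, 40), "pathway_2": (3, 60), "pathway_3": (1, 30)}
names = sorted(pathways)
pvals = [fisher_enrichment_p(pathways[p][0], n, pathways[p][1], N) for p in names]
fdr = dict(zip(names, benjamini_hochberg(pvals)))
```

Pathways whose adjusted value in `fdr` falls below a chosen threshold would be reported as significantly perturbed.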



PB10 Prediction of disulfide patterns from protein sequences

Yu-Ching Chen a and Jenn-Kang Hwang a,b,c

a Institute of Bioinformatics, National Chiao Tung University, Taiwan
b Department of Biological Science & Technology, National Chiao Tung University, Taiwan
c Core Facility for Structural Bioinformatics, National Chiao Tung University, Taiwan


Disulfide bonds play important structural roles in both stabilizing protein conformations and regulating protein functions. The ability to infer disulfide patterns directly from protein sequences will provide biologists with a valuable tool for investigating the structure-function relationship of proteins. However, the prediction of disulfide connectivity from protein sequences presents a major challenge to computational biologists because disulfide connectivity is nonlocal in terms of the linear sequence, i.e., the spatial proximity of a cysteine pair does not necessarily imply sequential closeness. In this report we treat each distinct disulfide pattern as a distinct class and solve the problem as a multi-class classification problem. We use support vector machines based on sequence features such as the coupling between the local sequence environments of cysteine pairs, the sequence separations of the cysteines, and global sequence descriptors such as amino acid content. Our approach is able to predict 55% of the disulfide patterns of proteins with two to five disulfide bridges.
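
To make the "one class per disulfide pattern" formulation concrete, the sketch below enumerates every possible pairing of 2n bonded cysteines and assigns each pairing a class label. It is illustrative only: the function names are hypothetical, cysteines are simply numbered by their order in the sequence, and the abstract does not say how the classes were actually encoded. The number of patterns for n bonds is (2n-1)!!, which is why the class count grows quickly with the number of bridges.

```python
def disulfide_patterns(cysteines):
    """Yield every way to pair up an even-length list of cysteine positions."""
    if not cysteines:
        yield ()
        return
    first, rest = cysteines[0], cysteines[1:]
    for i, partner in enumerate(rest):
        # Bond the first cysteine to one partner, then pair up the remainder.
        remaining = rest[:i] + rest[i + 1:]
        for tail in disulfide_patterns(remaining):
            yield ((first, partner),) + tail


def pattern_classes(n_bonds):
    """Map each connectivity pattern of 2*n_bonds cysteines to a class label."""
    cysteines = list(range(1, 2 * n_bonds + 1))
    return {pattern: label
            for label, pattern in enumerate(disulfide_patterns(cysteines))}


# Number of classes a multi-class classifier must separate for 2 to 5 bonds.
class_counts = {n: len(pattern_classes(n)) for n in range(2, 6)}
```

For two bonds the three classes are the pairings 1-2/3-4, 1-3/2-4 and 1-4/2-3; a multi-class classifier would then be trained to pick one label per protein.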



PB11 Prediction of Protein Subcellular Localization

Chin-Sheng Yu,b Yu-Ching Chen,a Chih-Hao Lu,a and Jenn-Kang Hwang a,b,c

a Institute of Bioinformatics, National Chiao Tung University, Taiwan
b Department of Biological Science & Technology, National Chiao Tung University, Taiwan
c Core Facility for Structural Bioinformatics, National Chiao Tung University, Taiwan


Recent years have seen a surge of interest in developing computational approaches to predict subcellular localization. These methods, based on a wide range of algorithms, have achieved varying degrees of success for specific organisms and for certain localization categories. Here, we developed an approach based on a two-level support vector machine (SVM) system: the first level comprises a number of SVM classifiers, each based on a specific type of feature vector derived from sequences; the second-level SVM classifier functions as a jury machine that generates the probability distribution of decisions over the possible localizations. We found that the homology search approach performs surprisingly well for sequence homology as low as 25%, but its performance deteriorates considerably at lower sequence identity. A data set with high homology levels will evidently lead to a biased assessment of the performance of predictive approaches, especially those relying on homology search or sequence annotations. Since our two-level SVM classification system does not rely on homology search, its performance remains relatively unaffected by sequence homology. Our results, which are consistent with previous studies, indicate that the homology search approach performs surprisingly well for sequence homology as low as 30%, but its performance deteriorates considerably for sequences sharing lower sequence identity. Furthermore, we developed a practical hybrid method that pipelines the two-level SVM classifier and the homology search method in sequential order as a general tool for the sequence annotation of subcellular localization.



PB12 Deriving a scoring matrix for mapping protein local structure and sequence

Chia-Chuan Liu and Ming-Jing Hwang

Institute of Bioinformatics, National Yang-Ming University, Taipei, Taiwan
IBMS, Academia Sinica, Taipei, Taiwan


The correlation between protein local structure and sequence is low (r ~ -0.2) when the two are matched using existing scoring matrices for amino acid sequence similarity. Here we improve the correlation with a new amino acid substitution scoring matrix. We created fragment pairs chosen randomly from PDBselect 25 (a set of protein structures with sequence identity less than 25%) and used a genetic algorithm (GA) to optimize the correlation. In our results, the GA-optimized scoring matrices for fragment lengths of 5, 7 and 9 amino acids achieved a much better correlation (r ~ -0.5). The same approach was then applied to local structure classification using the I-sites library, a set of sequence patterns that strongly correlate with protein structure at the local level, as a test set. The GA-optimized scoring matrix again achieved better results. Thus, in this work we have developed a GA-based approach that can produce amino acid substitution matrices suitable for mapping protein local structure and sequence.
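
The optimization described above can be pictured with a toy genetic algorithm. The sketch below is only a schematic under several assumptions that are not in the abstract: the structural measure is taken to be a distance (so a good matrix gives a strongly negative r), the GA uses elitism, uniform crossover and single-entry Gaussian mutation, and the fragment pairs and distances are invented for illustration. All function names are hypothetical.

```python
import random
from itertools import combinations_with_replacement
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = list(combinations_with_replacement(AMINO_ACIDS, 2))  # 210 symmetric entries
INDEX = {pair: i for i, pair in enumerate(PAIRS)}


def pearson(x, y):
    """Pearson r; returns 0.0 if either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)


def fragment_score(matrix, frag_a, frag_b):
    """Sum of substitution scores over aligned positions of two fragments."""
    return sum(matrix[INDEX[tuple(sorted((a, b)))]] for a, b in zip(frag_a, frag_b))


def correlation(matrix, data):
    """Pearson r between fragment-pair scores and structural distances."""
    scores = [fragment_score(matrix, a, b) for a, b, _ in data]
    return pearson(scores, [d for _, _, d in data])


def optimize(data, generations=30, pop_size=20, seed=0):
    """Toy GA (elitism, uniform crossover, Gaussian mutation) that minimizes r."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in PAIRS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: correlation(m, data))
        parents = pop[: pop_size // 2]  # the best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(p, q)]
            k = rng.randrange(len(child))
            child[k] += rng.gauss(0, 0.5)  # mutate one matrix entry
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda m: correlation(m, data))


# Made-up fragment pairs with made-up structural distances (illustration only).
data = [
    ("ACDEF", "ACDEF", 0.2), ("ACDEF", "ACDEG", 0.5),
    ("GHIKL", "GHIKV", 0.6), ("MNPQR", "STVWY", 3.1),
    ("ACDEF", "WYWYW", 3.5), ("KLMNP", "KLMNP", 0.1),
    ("GHIKL", "PQRST", 2.8), ("VWYAC", "DEFGH", 3.0),
]
best = optimize(data)
```

Because the best half of each generation is carried over unchanged, the best correlation found can never get worse from one generation to the next; a real application would evaluate the optimized matrix on held-out fragment pairs.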