Genome scale analysis of the immune response against pathogenic micro-organisms; identification of diagnostic markers, vaccine candidates and development of an integrated micro array platform for clinical investigations.
The genomes of microbial organisms responsible for diseases of world-wide medical importance have been sequenced or will be available in the near future. Technologies for producing large numbers of proteins have been developed, and high-throughput assays such as protein micro arrays have been clinically validated for detecting serum antibodies directed against microbial antigens. These achievements offer the opportunity to investigate the natural immune response against the whole proteome of a variety of micro-organisms. Powerful combinations of genomic information, molecular tools and immunological assays are becoming available to help identify the antigens that function as targets of protective immunity or that could be used as markers for serodiagnosis. We propose here to identify, in micro-organisms of great medical relevance (M. pneumoniae, C. pneumoniae, L. pneumophila, coronavirus spp. and P. falciparum), a large collection of surface and secreted proteins as well as putative endotoxins. This protein repertoire will be produced as recombinant molecules or as sets of overlapping synthetic peptides and printed on array slides. The serum reactivity of groups of individuals with a proven history of exposure to the selected micro-organisms will be analysed against the arrayed proteins to identify diagnostic markers and correlates of protection.
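The idea of covering a protein with overlapping synthetic peptides can be illustrated with a minimal sketch. The function name, the peptide length of 15 residues and the step of 5 residues are illustrative assumptions only; the proposal does not state which values would be used.

```python
from typing import List


def overlapping_peptides(protein: str, length: int = 15, step: int = 5) -> List[str]:
    """Split a protein sequence into overlapping peptides.

    `length` and `step` are hypothetical defaults, not values from the proposal.
    Consecutive peptides share (length - step) residues. A final peptide is
    anchored at the C-terminus so that every residue is covered.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    if step > length:
        raise ValueError("step larger than length would leave gaps")
    if not protein:
        return []
    if len(protein) <= length:
        # The whole sequence fits into a single peptide.
        return [protein]

    last_start = len(protein) - length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        # Add a C-terminal peptide so the tail of the protein is not lost.
        starts.append(last_start)
    return [protein[s:s + length] for s in starts]


if __name__ == "__main__":
    # Toy sequence made of letters, used only to make the windows easy to read.
    for peptide in overlapping_peptides("ABCDEFGHIJKLMNOPQRSTUVWXYZ", length=10, step=5):
        print(peptide)
```

A smaller step gives denser coverage of potential linear epitopes at the cost of more peptides to synthesise and print.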
This project will significantly expand the SMEs' bank of Intellectual Property and contribute to expertise within the RTDs. It is anticipated that the proposed work in high-throughput protein expression, software analysis, surface peptide synthesis, protein and peptide surface capture, and array reader instrumentation will create an integrated platform of great commercial and research value. Finally, it will contribute to unravelling how the humoral immune response interacts with the microbial proteomes, thus filling the gap between genomic data and the development of novel vaccines and diagnostic tools.
Structural genomics is a broad term describing the determination of structures based on information contained in the genome, and at present it is almost exclusively limited to proteins. Although in common understanding genetic information means “genes and their encoded protein products”, thousands of human genes produce transcripts that are biologically important but do not encode proteins. Furthermore, even though the sequence of human DNA is now known, the meaning of most of it remains unknown. It is very likely that the number of genes has been greatly underestimated, mainly because current gene finders work well only for large, highly expressed, evolutionarily conserved protein-coding genes. Most of the genome elements that are missed encode RNA, of which transfer and ribosomal RNAs are the classical examples. But beside these well-known molecules there is a vast unknown world of tiny RNAs that might play a crucial role in a number of cellular processes. These elements are named noncoding RNAs (ncRNAs), and they perform their function without being translated into a protein product. Here, we propose the development of an integrated bioinformatics platform specifically designed for detecting, verifying, and classifying noncoding RNAs. This comprehensive approach to “computational RNomics” will provide a pipeline capable of detecting RNA motifs with low sequence conservation. It will also integrate RNA motif prediction, which should significantly improve the quality of searches for RNA homologues.
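To illustrate why RNA motifs can be detected even when sequence conservation is low, the following sketch searches for a simple hairpin (stem-loop) by testing base pairing rather than sequence identity. It is a toy example, not the method of the proposed platform; the function names, the stem length and the loop-size limits are assumptions chosen for illustration.

```python
from typing import List, Tuple

# Watson-Crick pairs plus the G-U wobble pair commonly seen in RNA stems.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a: str, b: str) -> bool:
    """Return True if the two nucleotides can form a base pair."""
    return (a, b) in PAIRS


def find_hairpins(rna: str, stem: int = 4, min_loop: int = 3,
                  max_loop: int = 8) -> List[Tuple[int, int]]:
    """Return (start, loop_length) for every perfectly paired stem-loop.

    `stem`, `min_loop` and `max_loop` are hypothetical parameters. A hit at
    `start` means that the `stem` nucleotides beginning there pair, in
    antiparallel order, with the `stem` nucleotides that follow the loop.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    hits = []
    n = len(rna)
    for start in range(n):
        for loop in range(min_loop, max_loop + 1):
            end = start + 2 * stem + loop  # exclusive end of the candidate
            if end > n:
                break
            if all(can_pair(rna[start + k], rna[end - 1 - k]) for k in range(stem)):
                hits.append((start, loop))
    return hits


if __name__ == "__main__":
    # Two sequences with different stems but the same stem-loop arrangement.
    print(find_hairpins("GGGGAAAACCCC"))
    print(find_hairpins("ACGUAAAAACGU"))
```

The two example sequences have different stem sequences, yet both yield a hairpin with a four-nucleotide loop starting at position 0, which is the kind of structural signal that a purely sequence-based similarity search would miss.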
Deciphering the information in genome sequences in terms of the biological function of the genes and proteins is a major challenge of the post-genomic era. Currently, the bulk of function assignments for newly sequenced genomes is performed using bioinformatics tools that infer the function of a gene on the basis of sequence similarity with other genes of known function. It is now well recognised that these primary, sequence similarity-based function annotation procedures are frequently inaccurate and error-prone. Continuing to use them without clearly defining the limits of their applicability would lead to an unmanageable propagation of errors that could jeopardise progress in Biology. On the other hand, various novel bodies of data and resources are becoming available. These provide information on context-based aspects of the biological function of genes, namely on physical and functional interactions between genes and proteins, and on whole networks and processes. In parallel, structural genomics efforts worldwide are providing much better coverage of the structural motifs adopted by proteins and of their interactions. The availability of these additional and novel data offers an unprecedented opportunity for the development of methods for incorporating higher-level functional features into the annotation pipeline.
The GeneFun project aims at addressing these two important issues. The issue of annotation errors will be addressed by developing criteria for evaluating the reliability of the annotations currently available in databases. These criteria will be used to assign reliability scores to these annotations and will be incorporated into standard annotation pipelines for future use. The issue of incorporating higher-level features into functional annotations will be addressed by combining sequence and structure information in order to identify non-linear functional features (e.g. interaction sites), and by integrating available and newly developed methods for inferring function from higher-level and context-based information (protein domain architecture, protein-protein interaction, genomic context such as gene order, etc.).
To achieve these aims, several European groups with a strong track record in developing novel methods and analyses in comparative genomics, structure- and systems-oriented bioinformatics, and information technology have teamed up with an experimental group from Canada, which is well known for its outstanding achievements in the field of structural and functional proteomics. The expected outputs of the GeneFun project are: improved procedures for inferring function on the basis of sequence similarity, a set of procedures for predicting non-linear functional features from sequence and 3D structure in a more automated way, and benchmarked procedures for predicting context-based functional features. Major efforts will be devoted to devising protocols that optimally combine the results from several methods. In particular, Web-based servers for the individual and combined procedures will be developed and made available to the scientific community. The community will be introduced to these new tools through open workshops and training sessions.