Deciphering the information in genome sequences in terms of the biological function of genes and proteins is a major challenge of the post-genomic era. Currently, the bulk of function assignments for newly sequenced genomes is performed using bioinformatics tools that infer the function of a gene on the basis of sequence similarity with other genes of known function. It is now well recognised that these primary, sequence similarity-based function annotation procedures are frequently inaccurate and error-prone. Continuing to use them without clearly defining the limits of their applicability would lead to an unmanageable propagation of errors that could jeopardise progress in biology. On the other hand, various novel bodies of data and resources are becoming available. These provide information on context-based aspects of the biological function of genes, namely on physical and functional interactions between genes and proteins, and on whole networks and processes. In parallel, structural genomics efforts worldwide are providing much better coverage of the structural motifs adopted by proteins and of their interactions. The availability of these additional data offers an unprecedented opportunity to develop methods for incorporating higher-level functional features into the annotation pipeline.
The GeneFun project aims to address these two important issues. The issue of annotation errors will be addressed by developing criteria for evaluating the reliability of the annotations currently available in databases. These criteria will be used to assign reliability scores to these annotations and will be incorporated into standard annotation pipelines for future use. The issue of incorporating higher-level features into functional annotations will be addressed by combining sequence and structure information in order to identify non-linear functional features (e.g. interaction sites), and by integrating available and newly developed methods for inferring function from higher-level and context-based information (protein domain architecture, protein-protein interaction, genomic context such as gene order, etc.).
To achieve these aims, several European groups with a strong track record in developing novel methods and analyses in comparative genomics, structure- and systems-oriented bioinformatics, and information technology have teamed up with an experimental group from Canada that is well known for its outstanding achievements in structural and functional proteomics. The expected outputs of the GeneFun project are: improved procedures for inferring function on the basis of sequence similarity, a set of procedures for predicting non-linear functional features from sequence and 3D structure in a more automated way, and benchmarked procedures for predicting context-based functional features. Major efforts will be devoted to devising protocols that optimally combine the results from several methods. In particular, Web-based servers for the individual and combined procedures will be developed and made available to the scientific community. The community will be introduced to these new tools through open workshops and training sessions.
Genetically Modified Microbes (GMM) offer a biotechnological approach to various environmental problems such as the remediation of polluted sites, where microbes with recombinant catabolic pathways are envisaged as a solution for the removal of toxic organic compounds. Moreover, the exploration and exploitation of synergistic interactions between plants and microbes for phytoremediation is also a route to solving contamination problems. Critical to the safe application of recombinant microbes in the environment, and to addressing public concerns, is adequate information on the safety-related properties of the microbes in question. Current whole genome sequencing efforts on relevant microbes provide a unique opportunity to extract completely new safety-related information, to conduct experiments to generate important new data, and to create new tools for increasing the degree of predictability of the behaviour of strains designed for applications in the open environment or in industrial bioreactors.
One of the microorganisms with current applications in biotechnology is Pseudomonas putida, a paradigm of a metabolically versatile microorganism that recycles organic wastes in aerobic compartments of the environment and thereby plays a key role in the maintenance of environmental quality. The strain KT2440 is the most extensively characterised and best understood strain of P. putida. KT2440 is a nonpathogenic bacterium certified in 1981 by the Recombinant DNA Advisory Committee (RAC) of the United States National Institutes of Health as the host strain of the first Host-Vector Biosafety (HV1) system for gene cloning in Gram-negative soil bacteria. Since then, KT2440 has been used worldwide as the host of choice for environmental applications involving expression of cloned genes. This strain is one of the few nonpathogenic microbes subject to whole genome sequencing, by a P. putida genome project currently in progress in Germany. The sequence data generated in the genome project are being made public at appropriate intervals (a 10-fold genome equivalent of raw sequence data is already available) and will constitute an invaluable resource for this project. This microorganism, its recombinant derivatives, and the body of knowledge accumulated over the last 20 years on its genetics, physiology and biochemistry therefore make it an ideal microbe for safe biotechnological applications in the environment.
The major aim of this project is to lay the groundwork for reducing contamination problems in a rational, environmentally friendly, and safe manner by developing P. putida strains that can be used to design environmental treatment systems in harmony with the biosphere.
Chirality is a key factor in the efficacy of many drugs and the production of single enantiomers of chiral intermediates has therefore become increasingly important. Biocatalysis offers high enantioselectivity and regioselectivity in chiral synthesis through enzyme-catalyzed reactions and thus has an important advantage over chemical synthesis. Molecular genomic data is an unprecedented resource of enzymes for biocatalysis, but rational and effective methodologies must be established to realize the full potential of these resources. This project will focus on the discovery of novel enzymes, from both public and proprietary eubacterial genomes, in particular novel alcohol dehydrogenases, cytochrome P450 monooxygenases and amino acid modifying enzymes for use in established and innovative processes for chiral synthesis.
The DataGenome project extends from genome analysis, through cloning, expression, enzyme production, screening and protein engineering, to the enzymatic production of chiral biomolecules. The design of the project takes advantage of a broad funnel approach, starting with innovative data mining and processing of a large number of genes to ensure high throughput in the process and rational selection of the best enzyme candidates. The specific combination of expertise and the design of the research project are aimed at a high success rate in the development of biocatalysts. Emphasis will be put on effective bioinformatics analysis, to minimize the need for the more laborious “wet chemistry” analysis, as well as on the development of optimized vector-host systems for efficient gene expression and enzyme production. Rational protein engineering or directed molecular evolution will be employed in order to obtain more robust variants, new substrate preferences or enhanced enantiomeric selectivity. Selected enzymes will be tested in existing and/or novel biocatalytic processes for the production of chiral pharmaceutical intermediates with applications in therapeutic areas including AIDS, cancer and Alzheimer’s disease.
Structural genomics is a broad term describing the determination of a structure representation based on information contained in the genome, and at present it is almost exclusively limited to proteins. Although in common understanding genetic information means “genes and their encoded protein products”, thousands of human genes produce transcripts that are biologically important but do not produce proteins. Furthermore, even though the sequence of human DNA is now known, the meaning of most of the sequence still remains unknown. It is very likely that the number of genes has been greatly underestimated, mainly because current gene finders work well only for large, highly expressed, evolutionarily conserved protein-coding genes. Most of those genome elements encode RNA, of which transfer and ribosomal RNAs are the classical examples. But besides these well-known molecules there is a vast unknown world of tiny RNAs that might play a crucial role in a number of cellular processes. Those elements are named noncoding RNAs (ncRNAs), and they perform their function without being translated into a protein product. Here, we propose the development of an integrated bioinformatics platform specifically aimed at detecting, verifying, and classifying noncoding RNAs. This comprehensive approach to “computational RNomics” will provide a pipeline capable of detecting RNA motifs with low sequence conservation. It will also integrate RNA motif prediction, which should significantly improve the quality of searches for RNA homologues.
The objective of this project is to use Field Programmable Gate Arrays (FPGAs) in bioinformatics. The primary application will be the Smith-Waterman algorithm. Later we will assess the benefits of applying FPGAs to other bioinformatics tasks. As part of the project, a large cluster of over 500 simple (Spartan-6) FPGAs will be built.
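For orientation, the Smith-Waterman local alignment recurrence can be sketched in plain Python as follows. This is only an illustrative software reference, not the project's FPGA design: the function name `smith_waterman`, the linear gap penalty, and the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for the example. Each cell of the score matrix depends only on its upper, left, and upper-left neighbours, so all cells on one anti-diagonal can be computed independently, which is the property hardware implementations typically exploit.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment of strings a and b with a linear gap penalty.

    Scoring values are illustrative assumptions, not taken from the project.
    Returns (best_score, aligned_a, aligned_b).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # Score matrix; first row and column stay 0 (local alignment).
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + s,  # match or mismatch
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    score, top, bottom = smith_waterman("TTACGTTT", "GGACGGG")
    print(score)
    print(top)
    print(bottom)
```

The quadratic double loop is the part that dominates run time for long sequences, and so is the natural candidate for offloading to hardware.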
Within this project we will test the applicability of cryptocurrencies as a facilitator of political movements. The currency will be used to motivate people to (1) select members of parliament based on the fit of personal opinions on major political issues and (2) monitor the correlation between publicly made statements and the acts actually passed. The currency will also be tested as an alternative means of financing political activities and of assessing trust in political organizations. The currency will be designed to offer advantages over the currently most popular currencies and to remain competitive after the experiment.