High-throughput measurement techniques allow genome-wide studies of biological function. Gene expression, gene regulation, protein content, protein interactions, and metabolic profiles can be measured and combined with sequence information. The major challenge is to extract meaningful findings from large, noisy, high-dimensional, heterogeneous, and incomplete data sets. We develop new computational data analysis methods that take advantage of prior knowledge and previous experiments in biomedical research.
Translational medicine attempts to bring basic research findings into clinical practice. One necessary step in this process is to translate inferences made at the molecular level in model organisms, for example about metabolites, into inferences about humans. Metabolomics is the study of the set of all metabolites found in a tissue sample. Metabolite concentrations are strongly affected by diseases and drugs, and hence they complement genomic, proteomic, and transcriptomic measurements well in studies of the biological state of an organism. We have developed new computational methods for mapping observed metabolomics data between model organisms and humans.
Large repositories of genome-wide measurement data raise the research question of how to systematically relate different data sets. Reusing data sets increases the statistical power of new studies and makes it possible to put biological results in the context of previous studies. Most repositories provide keyword search for retrieving similarly annotated studies. To complement it, we developed machine learning methods that relate gene expression studies through their actual measurement data, along with visualization tools for exploring and interpreting the results. In the REx project (Retrieval of Relevant Experiments), relevance is defined by a model of biology that is both data- and knowledge-driven.
A living cell is an extremely complex system. Realizing the full potential of modern high-throughput measurements, such as gene expression or microRNA data, therefore requires integrating information from multiple sources, including relational information about genes, environmental factors, and disease. Much of the data integration literature focuses either on well-targeted combinations of sources, such as using sequence-based regulators to explain gene expression, or on well-focused prediction tasks, such as predicting molecular interactions from several data sources. We have focused instead on knowledge discovery problems, where the goal is to find what is relevant in massive data sets by discovering connections between data sources.