Analysis of changes in gene expression time series data

Reference:

Mikko Korpela. Analysis of changes in gene expression time series data. Master's thesis, Helsinki University of Technology, Finland, February 2006.

Abstract:

Gene expression is the process by which genes control the biological functions of an organism through the production of proteins. DNA microarray technology enables simultaneous evaluation of the expression of tens of thousands of genes. By using multiple arrays, expression can be measured under different conditions and at different time points.

In this thesis, a gene expression data set is analysed. The data originate from experiments where the effect of asbestos on three different cell lines was studied. First, the data are subjected to various quality control methods. The work continues with descriptions of preprocessing and analysis methods.

The purpose of preprocessing is, among other things, to reduce non-biological variation in the data and to enable the comparison of measurements from different arrays. The RMA preprocessing method is used here. The method consists of the following steps: background correction, normalisation, log-transformation, and summarisation.
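The steps listed above can be illustrated with a minimal Python sketch. It assumes the choices usually associated with RMA, namely quantile normalisation and median polish summarisation, which the abstract itself does not spell out. Background correction is left out, and the function names and the list-based data layout are hypothetical.

```python
import math
from statistics import median


def quantile_normalise(arrays):
    """Give every array the same empirical distribution.

    `arrays` is a list of equally long lists of intensities, one per array.
    Ties are broken by position, which is a simplification.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value over all arrays.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalised = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalised.append(out)
    return normalised


def log2_transform(arrays):
    """Take base-2 logarithms of strictly positive intensities."""
    return [[math.log2(v) for v in a] for a in arrays]


def median_polish_summary(log_values, iterations=10):
    """Summarise one probe set with Tukey's median polish.

    `log_values` is a probes x arrays matrix of log2 intensities.
    Returns one expression value per array (overall level + array effect).
    """
    resid = [list(row) for row in log_values]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(iterations):
        # Sweep out row (probe) medians.
        for r in range(n_rows):
            m = median(resid[r])
            row_eff[r] += m
            resid[r] = [v - m for v in resid[r]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep out column (array) medians.
        for c in range(n_cols):
            m = median(resid[r][c] for r in range(n_rows))
            col_eff[c] += m
            for r in range(n_rows):
                resid[r][c] -= m
        m = median(row_eff)
        overall += m
        row_eff = [x - m for x in row_eff]
    return [overall + c for c in col_eff]
```

In this sketch, `quantile_normalise` would be applied to background-corrected intensities of all arrays, followed by `log2_transform`, and `median_polish_summary` would then be called once per probe set.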

Preprocessed expression values are joined into time series which describe the differences between asbestos-exposed and normal samples. A recent clustering method designed for short time series is used to analyse these time series. The method includes the assessment of each cluster's statistical significance. The clustering scheme is laid out on an algorithmic level. An error in one part of the method is corrected, and an additional intermediate phase is introduced.
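As a rough illustration of how such difference time series might be formed, the sketch below subtracts the control value from the exposed value at each time point. It assumes that both inputs hold log2 expression values, so each difference is a log ratio. The dictionary layout and the function name are hypothetical, and the clustering method itself is not reproduced here.

```python
def difference_series(exposed, control):
    """Build per-gene time series of exposed minus control values.

    `exposed` and `control` map a gene identifier to a list of log2
    expression values, one per time point (assumed layout). Genes missing
    from either input are skipped; series are cut to the shorter length.
    """
    return {
        gene: [e - c for e, c in zip(exposed[gene], control[gene])]
        for gene in exposed
        if gene in control
    }
```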

Information available on genes is used in the analysis of clustering results. For example, it is of interest whether a cluster contains a significant number of genes that share the same biological function. The clustering of known asbestos-related genes is also studied. The implementation of the clustering algorithm is tested by repeating an experiment done on synthetic data. Finally, some results related to the asbestos data are shown. Actual conclusions are left for biologists to draw.
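One common way to judge whether a cluster contains unusually many genes with the same function is a one-sided hypergeometric test. The abstract does not state which test the thesis uses, so the following Python sketch is only an assumed example, and the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, cluster_size, hits):
    """Probability of seeing at least `hits` annotated genes in a cluster.

    population   -- number of genes on the array
    annotated    -- genes in the population that carry the function label
    cluster_size -- number of genes in the cluster
    hits         -- annotated genes observed in the cluster
    Assumes the cluster is a random draw without replacement under the
    null hypothesis (hypergeometric distribution).
    """
    upper = min(annotated, cluster_size)
    total = comb(population, cluster_size)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

A small p-value suggests that the function label is over-represented in the cluster. When many clusters and labels are tested, some form of multiple testing correction would normally be needed.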

Keywords:

bioinformatics, DNA microarray, gene expression, time series, clustering, preprocessing, asbestos

Suggested BibTeX entry:

@mastersthesis{KorpelaMSc,
    address = {Finland},
    author = {Mikko Korpela},
    month = {February},
    school = {Helsinki University of Technology},
    title = {Analysis of changes in gene expression time series data},
    year = {2006},
}

See www.cis.hut.fi ...