Integration and Reduction of Microarray Gene Expressions Using an Information Theory Approach

Document Type: Research Article


1 Faculty of Computer Engineering, Iran University of Science and Technology (IUST), Tehran, I.R. IRAN

2 Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, I.R. IRAN


The DNA microarray is an important technique that allows researchers to measure the expression levels of many genes in parallel. Although results are more significant when they are drawn from separate experiments, one of the most challenging tasks in microarray analysis is integrating expression-level datasets that have been gathered through different techniques. In this paper, we present a general, novel method for integrating collected datasets whose distributions are related by a linear transformation. The method is based on concepts from information theory. In addition, this article presents a new approach for checking the linearity between two distributions as a validation technique. This validation step allows feature reduction to be carried out before the integration phase. The time complexity of the proposed algorithm is low, and the presented methods perform well. Experimental results are presented at the end of the paper.
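
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of one standard information-theory fact that bears on linearly transformed distributions, not the authors' method. For Y = aX + b, differential entropy satisfies h(Y) = h(X) + log|a|, so the entropy difference between two datasets hints at the scale factor relating them. The function names (`histogram_entropy`, `entropy_shift`), the bin count, and the synthetic Gaussian data are all assumptions made for this example.

```python
import math
import random


def histogram_entropy(values, bins=30):
    """Histogram estimate of differential entropy, in nats (hypothetical helper)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value falls into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(values)
    # h is approximated by -sum p_i * log(p_i / width); empty bins are skipped.
    return -sum((c / n) * math.log(c / (n * width)) for c in counts if c)


def entropy_shift(x, y, bins=30):
    """Entropy difference h(y) - h(x); close to log|a| when y = a*x + b."""
    return histogram_entropy(y, bins) - histogram_entropy(x, bins)


# Synthetic stand-in for expression levels (assumption: not real microarray data).
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
a, b = 3.0, 2.0
y = [a * v + b for v in x]  # same data after a linear transformation

shift = entropy_shift(x, y)
print(f"estimated shift = {shift:.3f}, log|a| = {math.log(abs(a)):.3f}")
```

In this sketch the offset b has no effect on the estimate, and only the scale a appears in the entropy difference. An observed shift close to log|a| is consistent with a linear relation but does not prove one; it is only a necessary condition.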


