http://dx.doi.org/10.1142/S1793536911000763
 

Document Type

Journal Article

Department/Unit

Department of Mathematics

Title

On gene selection and classification for cancer microarray data using multi-step clustering and sparse representation

Language

English

Abstract

Microarray data profiles gene expression on a whole-genome scale and provides a good way to study associations between gene expression and the occurrence or progression of cancer. Many researchers have realized that microarray data is useful for predicting cancer cases. However, the high dimensionality of gene expression data, in which the number of genes is far larger than the sample size, makes this task very difficult. It is very important to identify the significant genes causing cancer. Many feature selection algorithms have been proposed that focus on improving the accuracy of cancer prediction at the expense of ignoring the correlations between features. In this work, a novel framework (named SGS) is presented for significant gene selection and efficient cancer case classification. The proposed framework first runs a clustering algorithm to find gene groups in which the genes have higher correlation coefficients, then selects the significant genes in each group using the Bayesian Lasso method and the important gene groups using the group Lasso method, and finally builds a prediction model on the shrunken gene space with an efficient classification algorithm (such as support vector machine (SVM), 1NN, or regression). Experimental results on publicly available microarray data show that the proposed framework often outperforms existing feature selection and prediction methods such as SAM, information gain (IG), and Lasso-type prediction models. © 2011 World Scientific Publishing Company.
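
The three stages named in the abstract (cluster correlated genes, select genes within each cluster, classify on the reduced gene set) can be illustrated with a minimal NumPy sketch. This is not the authors' SGS implementation: a plain Lasso solved by coordinate descent stands in for the Bayesian Lasso, the group Lasso step that ranks whole gene groups is omitted, the clustering is a simple greedy correlation threshold, and the data is synthetic. All function names, the threshold 0.8, and the penalty 0.4 are assumptions chosen for illustration.

```python
import numpy as np

def correlation_clusters(X, threshold=0.8):
    """Greedily group genes (columns) whose absolute Pearson correlation
    with the group's seed gene is at least `threshold` (assumed rule)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    unassigned = list(range(X.shape[1]))
    groups = []
    while unassigned:
        seed = unassigned[0]
        group = [seed] + [j for j in unassigned[1:] if corr[seed, j] >= threshold]
        groups.append(group)
        unassigned = [j for j in unassigned if j not in group]
    return groups

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - Xw||^2 + lam * ||w||_1 by coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    residual = y - X @ w
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            residual += X[:, j] * w[j]          # remove gene j's contribution
            rho = X[:, j] @ residual / n
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            residual -= X[:, j] * w[j]          # add the updated contribution back
    return w

def select_genes(X, y, groups, lam):
    """Run a plain Lasso inside each gene group; keep genes with nonzero weight."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    selected = []
    for group in groups:
        w = lasso_cd(Xs[:, group], yc, lam)
        selected += [g for g, wj in zip(group, w) if wj != 0.0]
    return sorted(selected)

def predict_1nn(X_train, y_train, X_test):
    """Label each test sample with the label of its nearest training sample."""
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[d.argmin(axis=1)]

# Synthetic stand-in for microarray data (not real expression values):
# genes 0-4 track latent factor a, genes 5-9 track factor b, genes 10-19 are noise.
rng = np.random.default_rng(0)
n = 120
a, b = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack(
    [a + 0.1 * rng.normal(size=n) for _ in range(5)]
    + [b + 0.1 * rng.normal(size=n) for _ in range(5)]
    + [rng.normal(size=n) for _ in range(10)]
)
y = np.where(a > 0, 1.0, -1.0)  # class label depends only on factor a

train, test = slice(0, 80), slice(80, n)
groups = correlation_clusters(X[train])
selected = select_genes(X[train], y[train], groups, lam=0.4)
pred = predict_1nn(X[train][:, selected], y[train], X[test][:, selected])
accuracy = float((pred == y[test]).mean())
print(groups[0], selected, accuracy)
```

Clustering and selection are fitted on the training samples only, so the held-out accuracy is not inflated by information from the test samples. Swapping in a Bayesian Lasso, a group Lasso, or an SVM, as the abstract describes, would change only `select_genes` and `predict_1nn`.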

Keywords

cancer prediction, classification, clustering, Gene selection, Lasso

Publication Date

2011

Source Publication Title

Advances in Adaptive Data Analysis

Volume

3

Issue

01n02

Start Page

911

End Page

918

Publisher

World Scientific Publishing

ISSN (print)

17935369

ISSN (electronic)

17937175
