The Bioinformatics Lab

Pathway based Feature Selection for Cancer Microarray Data

Classifying cancer is an important problem because class-specific medication improves treatment efficiency and reduces unwanted side effects. Classification algorithms based on gene expression often achieve better prediction accuracy than those based on clinical markers. However, microarray datasets contain a large number of genes (also called features) and often only a small number of samples. As a result, classification algorithms are prone to overfitting. One way to avoid this problem is to select a small set of relevant features and ignore the remaining ones. This paper develops a new feature selection method for microarray data called Biological Pathway based Feature Selection (BPFS). Unlike most existing methods, our method integrates signaling and gene regulatory pathways with gene expression data so that the selected features have significant classification accuracy and interact as little as possible with each other on the pathway. Thus, BPFS selects a biologically meaningful feature set that is minimally redundant. Our experiments on published breast cancer datasets demonstrate that all of the top 20 genes found by our method are associated with cancer. Furthermore, the classification accuracy of our signature is up to 18% better than that of van 't Veer's 70-gene signature and up to 8% better than that of the best published feature selection method, I-RELIEF.
Download the software.
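
The idea summarized in the abstract, choosing genes that separate the classes well while interacting little with each other on a pathway, can be illustrated with a small greedy sketch. This is not the BPFS algorithm or the downloadable software; the function names (`relevance`, `select_features`), the `penalty` parameter, the scoring formula, and the toy data are all assumptions made only for illustration.

```python
from statistics import mean, pstdev


def relevance(values, labels):
    """Hypothetical relevance score: gap between class means over total spread."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    spread = pstdev(values) or 1.0  # avoid division by zero for constant genes
    return abs(mean(pos) - mean(neg)) / spread


def select_features(expr, labels, interactions, k, penalty=1.0):
    """Greedily pick up to k genes, penalizing pathway links to genes already chosen.

    expr: dict mapping gene name -> list of expression values, one per sample
    labels: list of 0/1 class labels, one per sample
    interactions: set of frozenset({gene_a, gene_b}) pathway edges (assumed input)
    """
    scores = {g: relevance(v, labels) for g, v in expr.items()}
    selected = []
    while len(selected) < min(k, len(expr)):
        def gain(g):
            # Count pathway edges between the candidate and the selected genes.
            links = sum(frozenset((g, s)) in interactions for s in selected)
            return scores[g] - penalty * links

        best = max((g for g in expr if g not in selected), key=gain)
        selected.append(best)
    return selected


# Toy data: genes A and B are both highly relevant but linked on a pathway.
expr = {
    "A": [6.0, 6.0, 1.0, 1.0],
    "B": [5.0, 5.2, 1.0, 1.2],
    "C": [3.0, 3.4, 2.0, 2.4],
    "D": [2.0, 2.1, 2.0, 2.1],
}
labels = [1, 1, 0, 0]
interactions = {frozenset(("A", "B"))}

print(select_features(expr, labels, interactions, k=2))  # → ['A', 'C']
```

With the penalty set to zero, the same call returns `['A', 'B']`, the two individually strongest but pathway-linked genes. This is the kind of redundancy that a pathway-aware selection is meant to avoid.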

Tamer Kahveci
Last modified: Thu May 7 10:20:32 EDT 2009