Feature selection and computational optimization in high-dimensional microarray cancer datasets via InfoGain-modified bat algorithm

Authors: Hambali, M. A.; Oladele, Tinuke Omolewa; Adewole, K. S.; Sangaiah, A. K.; Gao, W.

Dates: 2023-05-12; 2023-05-12; 2022

URI: https://uilspace.unilorin.edu.ng/handle/20.500.12484/10170

Abstract: Achieving satisfactory cancer classification accuracy with the complete set of genes remains a great challenge because of the high dimensionality, small sample size, and noise in gene expression data. Feature reduction is a critical and sensitive step in the classification task, most importantly in heterogeneous multimedia data. One of the major challenges in cancer studies is identifying informative genes among the thousands available in microarray data. Traditional feature selection algorithms have failed to scale to large feature spaces such as microarray data. An effective feature selection algorithm is therefore required to find the most significant subset of genes by removing non-predictive genes from the dataset without compromising the accuracy of the classification algorithm. The study proposes an Information Gain-Modified Bat Algorithm (InfoGain-MBA) feature selection model for selecting relevant and informative features from high-dimensional microarray cancer datasets, and evaluates the approach with four classifiers: C4.5, Decision Tree, Random Forest, and classification and regression tree (CART). The results show that the proposed approach is promising for the classification of microarray cancer data. Random Forest reached 100% accuracy with few genes in all seven datasets used. Further investigations were also conducted to determine the optimal threshold for each dataset.

Language: en

Keywords: Feature selection, Binary bat algorithm, Information gain, Cancer classification, Microarray data, Random forest, Computational optimization

Type: Article
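The abstract mentions an information-gain filter and a per-dataset threshold but gives no formulas. The following is a minimal sketch of what such a filtering stage might look like, not the authors' implementation. It assumes expression values have already been discretized into categories (for example "high" and "low"). The function names, the toy data, and the threshold of 0.5 are all hypothetical. The modified bat algorithm search that would follow this stage is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on one discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_by_info_gain(columns, labels, threshold):
    """Return indices of features (genes) whose information gain exceeds the threshold."""
    gains = [information_gain(col, labels) for col in columns]
    return [i for i, gain in enumerate(gains) if gain > threshold]


# Toy data (assumption): four samples, two discretized genes.
labels = ["tumor", "tumor", "normal", "normal"]
columns = [
    ["high", "high", "low", "low"],  # separates the two classes perfectly
    ["high", "low", "high", "low"],  # carries no class information
]

# The threshold is illustrative only; the record says it was tuned per dataset.
selected = select_by_info_gain(columns, labels, threshold=0.5)
print(selected)
```

In this toy case only the first gene passes the filter. In a pipeline of the kind the abstract describes, the surviving genes would then be handed to the wrapper search and the classifiers.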