Sequential Optimization Based Feature Selection Algorithm for Efficient Cancer Classification and Prediction

Banjoko, A. W.; Yahya, W. B.

dc.contributor.author	Banjoko, A. W.
dc.contributor.author	Yahya, W. B.
dc.date.accessioned	2021-05-05T09:16:48Z
dc.date.available	2021-05-05T09:16:48Z
dc.date.issued	2018
dc.description.abstract	This study proposes an efficient method for selecting optimal feature subsets to enhance the classification performance of the Support Vector Machine (SVM) on high-dimensional genomic microarray data with binary and multiclass responses, using a Multi-Objective Optimization (MOO) approach. In a Monte Carlo experiment, the features were first pre-selected with a filter method based on the Sidak-adjusted alpha value to reduce the number of false-positive features in the data. The optimal values of the SVM cost parameter and the Radial Basis Function (RBF) kernel parameter were determined by grid search with 10-fold cross-validation. The SVM with RBF kernel was then fitted sequentially to select the set of near-optimal genes that are correlated with the response class. The proposed algorithm was compared with four machine learning methods: Naïve Bayes (NB), Random Forest (RF), Random Forest with variable selection (RFVS) and LASSO. On simulated data, the Misclassification Error Rate (MER) of the proposed method was 1.1% with a sensitivity of 97.8% using four (near) optimal selected genes. In contrast, the MERs of the NB, RF, RFVS and LASSO classifiers on the same data, with 10, 10, 9 and 37 genes, were 4.28%, 5.03%, 4.98% and 0.00% respectively. Applying the proposed method to published Leukaemia data yielded an MER of 0.03% with a sensitivity of 99.95% based on three (3) optimally selected genes. By comparison, the MERs of the NB, RF, RFVS and LASSO classifiers for the Leukaemia data were 1.0%, 3.0%, 5.67% and 0.00% based on 93, 93, 2 and 31 genes respectively. The same pattern of performance was observed for all the methods considered on a multiclass response DNA data set. The results generally showed that the proposed algorithm was more parsimonious and achieved better predictive performance than some of the existing methods considered. 
The optimally selected gene subsets in the data employed here can be further investigated by molecular biologists to establish the pathology of these genes with respect to their respective tumour classes.	en_US
dc.identifier.uri	https://uilspace.unilorin.edu.ng/handle/20.500.12484/4926
dc.language.iso	en	en_US
dc.publisher	Proceedings of the 14th iSTEAMS International Multidisciplinary Conference, Al-Hikmah University, Ilorin, Nigeria	en_US
dc.subject	Support Vector Machines, Feature selection, Multi-Objective Optimization	en_US
dc.title	Sequential Optimization Based Feature Selection Algorithm for Efficient Cancer Classification and Prediction	en_US
dc.type	Article	en_US
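
The abstract mentions a filter step based on a Sidak alpha value and reports results as MER and sensitivity. The sketch below is only an illustration of those two ideas, not the authors' implementation: it uses the standard Sidak correction, 1 - (1 - alpha)^(1/m), and the usual confusion-matrix definitions. The function names, the family-wise level of 0.05, and the assumption that each gene already has a p-value from some univariate test are all hypothetical; the record does not specify them.

```python
from typing import List, Sequence, Tuple


def sidak_alpha(family_alpha: float, n_tests: int) -> float:
    """Per-test significance level under the standard Sidak correction."""
    if not 0.0 < family_alpha < 1.0:
        raise ValueError("family_alpha must lie strictly between 0 and 1")
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return 1.0 - (1.0 - family_alpha) ** (1.0 / n_tests)


def prefilter(p_values: Sequence[float], family_alpha: float = 0.05) -> List[int]:
    """Return indices of features whose p-value beats the Sidak threshold.

    Assumption: p_values come from a per-gene univariate test chosen by
    the analyst; the abstract does not say which test was used.
    """
    threshold = sidak_alpha(family_alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p < threshold]


def mer_and_sensitivity(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Misclassification error rate and sensitivity from binary confusion counts."""
    total = tp + fp + tn + fn
    if total == 0 or (tp + fn) == 0:
        raise ValueError("need at least one sample and one positive case")
    mer = (fp + fn) / total          # share of all samples that were misclassified
    sensitivity = tp / (tp + fn)     # share of true positives that were detected
    return mer, sensitivity


if __name__ == "__main__":
    # Made-up p-values for four hypothetical genes.
    kept = prefilter([1e-6, 0.5, 0.02, 1e-4])
    print("features kept:", kept)
    print("MER, sensitivity:", mer_and_sensitivity(tp=45, fp=1, tn=49, fn=5))
```

Because the Sidak threshold shrinks as the number of tested genes grows, a filter of this kind keeps only features with very small p-values in high-dimensional data, which is consistent with the abstract's stated aim of reducing false-positive features before the SVM is fitted.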

Files

Original bundle

Name: Banjoko & Yahya.pdf
Size: 1.37 MB
Format: Adobe Portable Document Format