Sequential Optimization Based Feature Selection Algorithm for Efficient Cancer Classification and Prediction

dc.contributor.author: Banjoko, A. W.
dc.contributor.author: Yahya, W. B.
dc.date.accessioned: 2021-05-05T09:16:48Z
dc.date.available: 2021-05-05T09:16:48Z
dc.date.issued: 2018
dc.description.abstract: This study proposes an efficient method for selecting optimal feature subsets to enhance the classification performance of the Support Vector Machine (SVM) on high-dimensional genomic microarray data with binary and multiclass responses, using a Multi-Objective Optimization (MOO) approach. In a Monte Carlo experiment, the features were first pre-selected with a filter method based on the Sidak-adjusted alpha value to reduce the number of false-positive features in the data. The optimal values of the tuning parameters, the SVM cost and the Radial Basis Function (RBF) kernel parameter, were determined by grid search within 10-fold cross-validation. The SVM with RBF kernel was then fitted sequentially to select the set of near-optimal genes that are correlated with the response class. The proposed algorithm was compared with four machine learning methods: Naïve Bayes (NB), Random Forest (RF), Random Forest with variable selection (RFVS) and LASSO. On simulated data, the proposed method achieved a Misclassification Error Rate (MER) of 1.1% and a sensitivity of 97.8% using four (near-)optimal selected genes. In contrast, the MERs of the NB, RF, RFVS and LASSO classifiers on the same data were 4.28%, 5.03%, 4.98% and 0.00%, using 10, 10, 9 and 37 genes respectively. Applied to the published Leukaemia data, the proposed method yielded an MER of 0.03% and a sensitivity of 99.95% based on three optimally selected genes, whereas the MERs of the NB, RF, RFVS and LASSO classifiers were 1.0%, 3.0%, 5.67% and 0.00%, based on 93, 93, 2 and 31 genes respectively. A similar pattern of performance was observed for all the methods considered on a multiclass-response DNA data set. Overall, the results showed that the proposed algorithm is more parsimonious and achieves better predictive performance than some of the existing methods considered.
The optimally selected gene subsets identified in these data sets can be further investigated by molecular biologists to establish the pathology of these genes with respect to their respective tumour classes. [en_US]
dc.identifier.uri: https://uilspace.unilorin.edu.ng/handle/20.500.12484/4926
dc.language.iso: en [en_US]
dc.publisher: Proceedings of the 14th iSTEAMS International Multidisciplinary Conference, Al-Hikmah University, Ilorin, Nigeria [en_US]
dc.subject: Support Vector Machines, Feature selection, Multi-Objective Optimization [en_US]
dc.title: Sequential Optimization Based Feature Selection Algorithm for Efficient Cancer Classification and Prediction [en_US]
dc.type: Article [en_US]
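The abstract combines a multiple-testing filter, a sequential feature search and two evaluation metrics. The Python sketch below illustrates those pieces under stated assumptions and is not the authors' code. It assumes per-gene p-values are already available (the choice of per-gene test is not stated in the record). It uses a greedy forward search driven by a hypothetical `score` callback in place of the authors' sequential SVM fitting; that callback would, for example, return the 10-fold cross-validated accuracy of an RBF-kernel SVM whose cost and kernel parameter were tuned by grid search. The multi-objective criterion itself is not implemented here.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def sidak_alpha(alpha: float, m: int) -> float:
    """Per-test significance level that keeps the family-wise error rate
    at `alpha` over `m` independent tests (Sidak correction)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def sidak_filter(p_values: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Filter step: keep genes whose p-value is below the Sidak threshold.

    `p_values` maps a (hypothetical) gene name to its p-value from some
    per-gene test against the class label; that test is an assumption.
    """
    threshold = sidak_alpha(alpha, len(p_values))
    return [gene for gene, p in p_values.items() if p < threshold]


def forward_select(candidates: Sequence[str],
                   score: Callable[[List[str]], float],
                   max_features: int) -> List[str]:
    """Greedy sequential forward selection (an assumed stand-in for the
    paper's sequential SVM fitting).

    `score` is a hypothetical callback, e.g. cross-validated accuracy of
    an RBF-kernel SVM trained on the given genes. The search stops when no
    remaining gene improves the score or `max_features` is reached.
    """
    selected: List[str] = []
    best = float("-inf")
    remaining = list(candidates)
    while remaining and len(selected) < max_features:
        # Score every one-gene extension of the current subset.
        trial = {gene: score(selected + [gene]) for gene in remaining}
        gene_best = max(trial, key=trial.get)
        if trial[gene_best] <= best:
            break  # no candidate improves the current subset
        best = trial[gene_best]
        selected.append(gene_best)
        remaining.remove(gene_best)
    return selected


def mer_and_sensitivity(y_true: Sequence, y_pred: Sequence,
                        positive) -> Tuple[float, float]:
    """Misclassification error rate and sensitivity (true-positive rate).

    Assumes `y_true` contains at least one `positive` label.
    """
    errors = sum(t != p for t, p in zip(y_true, y_pred))
    preds_on_pos = [p for t, p in zip(y_true, y_pred) if t == positive]
    true_pos = sum(p == positive for p in preds_on_pos)
    return errors / len(y_true), true_pos / len(preds_on_pos)
```

For example, with alpha = 0.05 and m = 1000 genes, sidak_alpha returns roughly 5.1e-5, so only genes with very small p-values survive the pre-selection step. Keeping the scorer as a callback separates the search strategy from the classifier, so the same loop could wrap an SVM or any other model.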

Files

Original bundle: Banjoko & Yahya.pdf (Adobe Portable Document Format, 1.37 MB)