Multiclass Feature Selection and Classification with Support Vector Machine in Genomic Study
Date
2017
Publisher
Edited Conference Proceedings of the 1st International Conference of the Nigeria Statistical Society (NSS).
Abstract
This study proposes an efficient Support Vector Machine (SVM)
algorithm for feature selection and classification of multiclass
response groups in high-dimensional (microarray) data. The feature
selection stage of the algorithm employs the F-statistic of an
ANOVA-like testing scheme at a chosen family-wise error rate (FWER)
to control the detection of false positive features. The
hyper-parameters of the SVM were tuned by 10-fold cross-validation
to determine the appropriate kernel, using a one-versus-all
approach. The entire simulated dataset was randomly partitioned
into 95% training and 5% test sets. The SVM classifier was built on
the training sets, and its prediction accuracy on the response
class was assessed on the test sets over 1000 Monte-Carlo
cross-validation (MCCV) runs. The classification results of the
proposed classifier were assessed using Misclassification Error
Rates (MERs) and other performance indices. Results from the
Monte-Carlo study showed that the proposed SVM classifier was
efficient: it yielded high prediction accuracy for the response
groups with fewer differentially expressed features than when all
the features were employed for classification. The performance of
this new method on some published cancer data sets will be examined
vis-à-vis other state-of-the-art machine learning methods in
future work.
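
The pipeline summarised in the abstract (F-statistic screening, then repeated 95%/5% Monte-Carlo splits scored by misclassification error rate) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: a nearest-centroid rule stands in for the tuned SVM, keeping the top `m` features by F-statistic stands in for the FWER-based cut-off, and feature selection is repeated inside each split on the training portion only. All function names (`f_statistic`, `top_features`, `mccv_error`, and so on) are hypothetical.

```python
import random
from statistics import mean


def f_statistic(values, labels):
    """One-way ANOVA F-statistic for a single feature across class labels."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_features(X, y, m):
    """Indices of the m features with the largest F-statistics.

    Assumption: a top-m ranking replaces the FWER-adjusted threshold.
    """
    scores = [f_statistic([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:m]


def fit_centroids(X, y):
    """Stand-in classifier (not an SVM): per-class mean vectors."""
    by_class = {}
    for row, g in zip(X, y):
        by_class.setdefault(g, []).append(row)
    return {g: [mean(col) for col in zip(*rows)] for g, rows in by_class.items()}


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda g: sum((a - b) ** 2 for a, b in zip(row, centroids[g])),
    )


def mccv_error(X, y, m=2, test_frac=0.05, runs=100, seed=0):
    """Mean misclassification error rate over random train/test splits."""
    rng = random.Random(seed)
    n = len(y)
    n_test = max(1, round(test_frac * n))
    errors = []
    for _ in range(runs):
        idx = list(range(n))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        X_train = [X[i] for i in train]
        y_train = [y[i] for i in train]
        # Select features on the training portion only, so the test
        # samples never influence which features are kept.
        keep = top_features(X_train, y_train, m)
        model = fit_centroids([[r[j] for j in keep] for r in X_train], y_train)
        wrong = sum(predict(model, [X[i][j] for j in keep]) != y[i] for i in test)
        errors.append(wrong / n_test)
    return mean(errors)
```

Calling `mccv_error(X, y, m=10, runs=1000)` on a list-of-rows matrix `X` with class labels `y` would mirror the 1000-run design described above. The sketch assumes every class appears in each training split, which holds in practice when only 5% of the samples are held out.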
Keywords
Support Vector Machines, Monte-Carlo Cross-Validation, F-Statistic, Family-Wise Error Rate, Misclassification Error Rate