Heterogeneous Ensemble Methods Based On Filter Feature Selection

Date

2016

Publisher

Research Nexus Africa’s Networks in Conjunction with The African Institute of Development Informatics & Policy (AIDIP) Ghana & The International Centre for Information Technology & Development (ICITD), USA

Abstract

While certain computationally expensive novel methods can construct highly accurate predictive models from high-dimensional data, in many applications it is still desirable to reduce the dimensionality of the original data before any modeling. This research therefore presents a précis of ensemble methods (Stacking, Voting and MultiScheme) and base classifiers (Multilayer Perceptron, K-Nearest Neighbour and NBTree), together with a framework for measuring the performance of the base classifiers and ensemble methods with and without feature selection techniques (Principal Component Analysis (PCA), Information Gain Attribute Selection and Gain Ratio Attribute Selection). The enhancement consists of performing feature selection on the dataset prior to classification, and the aim of the study is to compare the performance of the ensemble methods on the original and reduced datasets. The ensemble methods and base classifiers are evaluated with 10-fold cross-validation on the Remote to Local (R2L) KDD Cup 1999 dataset and the UCI Vote dataset, using the Waikato Environment for Knowledge Analysis (WEKA) tool. The experiments revealed that the ensemble methods based on Stacking, Voting and MultiScheme yielded better results on the reduced datasets than on the full datasets. On the R2L dataset, the MultiScheme ensemble method achieved an accuracy of 98.76% with PCA as the feature selection technique, compared with 98.58% without feature selection. With gain ratio attribute selection, MultiScheme achieved 98.93% accuracy against 98.76% without feature selection, while information gain attribute selection gave 98.85% against 98.76% without feature selection.
For the Vote dataset, the MultiScheme ensemble method performed best, with an accuracy of 92.18% with PCA feature selection against 89.88% without feature selection, 95.40% with information gain attribute selection against 93.10% without, and 95.40% with gain ratio attribute selection against 93.10% without. It can therefore be concluded that, on these datasets, ensemble methods work well with feature selection.
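
The information gain and gain ratio filters named above score each attribute independently of any classifier and keep the top-ranked ones to form the reduced dataset. The following is a minimal Python sketch of that scoring for discrete attributes, assuming the standard definitions IG(A) = H(Y) - sum over v of |S_v|/|S| * H(Y | A = v) and GR(A) = IG(A) / SplitInfo(A). The helper names (entropy, information_gain, gain_ratio, rank_attributes, select_top_k) and the toy Vote-style rows are hypothetical and are not taken from the study. The sketch also does not reproduce WEKA's handling of numeric attributes or missing values; in WEKA these filters are provided by InfoGainAttributeEval and GainRatioAttributeEval combined with the Ranker search.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def partition(column, labels):
    """Group the class labels by the value the attribute takes."""
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    return groups


def information_gain(column, labels):
    """IG(A) = H(Y) - sum_v |S_v|/|S| * H(Y | A = v)."""
    n = len(labels)
    conditional = sum(
        len(group) / n * entropy(group)
        for group in partition(column, labels).values()
    )
    return entropy(labels) - conditional


def gain_ratio(column, labels):
    """GR(A) = IG(A) / SplitInfo(A), defined here as 0 when SplitInfo is 0."""
    # SplitInfo(A) is the entropy of the attribute's own value distribution.
    split_info = entropy(column)
    if split_info == 0:
        return 0.0
    return information_gain(column, labels) / split_info


def rank_attributes(rows, labels, score):
    """Return attribute indices sorted by descending filter score, plus the scores."""
    n_attrs = len(rows[0])
    scores = {j: score([r[j] for r in rows], labels) for j in range(n_attrs)}
    ranked = sorted(scores, key=lambda j: scores[j], reverse=True)
    return ranked, scores


def select_top_k(rows, ranked, k):
    """Keep only the k best-ranked attribute columns (the reduced dataset)."""
    keep = ranked[:k]
    return [[r[j] for j in keep] for r in rows]


# Hypothetical toy data in the style of the Vote dataset (y/n votes per issue).
TOY_ROWS = [
    ["y", "n", "y"],
    ["y", "n", "n"],
    ["n", "y", "y"],
    ["n", "y", "n"],
    ["y", "y", "y"],
    ["n", "n", "n"],
]
TOY_LABELS = ["democrat", "democrat", "republican",
              "republican", "democrat", "republican"]

if __name__ == "__main__":
    for name, score in [("info gain", information_gain), ("gain ratio", gain_ratio)]:
        ranked, scores = rank_attributes(TOY_ROWS, TOY_LABELS, score)
        # Attribute 0 separates the two classes perfectly, so it ranks first.
        print(name, ranked, {j: round(s, 4) for j, s in scores.items()})
        print("reduced:", select_top_k(TOY_ROWS, ranked, k=1))
```

Gain ratio divides by SplitInfo to penalise attributes with many distinct values, which plain information gain tends to favour; on the toy data both filters agree because every attribute is binary.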

Keywords

Machine Learning, Data Mining, Ensemble Methods, Feature Selection

Citation

Ameen A. O., Balogun A. O., Usman G. & Fashoto S. G. (2016): Heterogeneous Ensemble Methods Based On Filter Feature Selection. Computing, Information System Development Informatics & Allied Research Journals, Vol. 7, No. 4, pp. 63-78.

Collections