Performance Analysis of Selected Clustering Techniques for Software Defects Prediction

Balogun, Abdullateef; Oladele, Rufus; Mojeed, Hammed; Amin-Balogun, Barakat; Adeyemo, Victor Elijah; Aro, Taye Olalere

2019-06-18
2019-06-18
2019-06-01

Abdullateef O. Balogun, Rufus O. Oladele, Hammed A. Mojeed, Barakat Amin-Balogun, Victor E. Adeyemo and Taye O. Aro (2019), Performance Analysis of Selected Clustering Techniques for Software Defects Prediction, Afr. J. Comp. & ICT, Vol. 12, No. 2, pp. 30-42.

2006-1781

http://hdl.handle.net/123456789/2208

Classification algorithms that help to predict software defects play a major role in the software engineering process. This study investigated the application and performance of clustering techniques in software defect prediction (SDP). Seven clustering techniques (Farthest First Clusterer, K-Means, X-Means, Sequential Information Bottleneck, Hierarchical Clusterer, Make-Density Clusterer, and Expectation Maximization) were used to classify 8 software defect datasets from the NASA repository. Experimental results showed that using clustering techniques for classification is well founded, as they gave good predictive performance. Based on average accuracy across the 8 datasets, Farthest First performed best at 86.16%, followed by Hierarchical Clustering at 85.50% and K-Means at 72.33%. Expectation Maximization (EM) (33.52%) and X-Means (48.84%) gave rather poor results, while Sequential Information Bottleneck (SIB) (63%) and density-based clustering (71.08%) had average performance. In addition, a further comparison of classification via clustering with selected standard classification techniques, namely k-Nearest Neighbor (kNN), Naïve Bayes (NB), and Decision Tree (DT), showed that some classification via clustering techniques (Farthest First and Hierarchical Clustering) performed considerably well and outperformed some standard classification algorithms.
With this, classification via clustering can be considered an alternative to standard classification methods in SDP. It produced good and competitive predictive performance in SDP, with the advantage of not necessarily requiring a trained predictive model or annotated datasets during model development. Consequently, SDP models developed using classification via clustering can be transferred from one project to another, as no training of the model is involved. This will help to reduce and manage the resources available during the software development process.

en

Clustering Technique; Classification Technique; Software Defects Prediction; Software Engineering

Article