Browsing by Author "Olaniran, O. R."
Item Efficient Support Vector Machine Classification of Diffuse Large B-Cell Lymphoma and Follicular Lymphoma mRNA Tissue Samples (Annals Computer Science Series, 2015-04-25) Banjoko, A. W.; Yahya, W. B.; Garba, M. K.; Olaniran, O. R.; Dauda, K. A.; Olorede, K. O. This paper proposes a weighted Support Vector Machine (w-SVM) method for efficient class prediction in binary response data sets. The proposed method was obtained by introducing into the standard SVM algorithm weights that use the point-biserial correlation between each predictor and the dichotomized response variable, in order to maximize classification accuracy. The optimal values of the w-SVM cost parameter and of each kernel's parameters were determined by grid search with 10-fold cross-validation. Monte Carlo cross-validation was used to examine the predictive power of the proposed method by partitioning the data into training and test samples at different splitting ratios. Applied to simulated data sets, the proposed method yielded high prediction accuracy on the test samples, and the other performance indices further supported its efficiency. The proposed method was compared with three state-of-the-art machine learning methods, including the standard SVM, and outperformed them. Overall, the modified algorithm with the Radial Basis Function (RBF) kernel achieved the best predictive performance of all the classifiers considered.

Item Efficient Support Vector Machine Classification of Diffuse Large B-Cell Lymphoma and Follicular Lymphoma mRNA Tissue Samples (Faculty of Computer and Applied Computer Science, Tibiscus University of Timisoara, Romania, 2015) Banjoko, A. W.; Yahya, W. B.; Garba, M. K.; Olaniran, O. R.; Dauda, K. A.; Olorede, K.
O. In this study, an efficient Support Vector Machine (SVM) algorithm is presented that incorporates a feature selection procedure for identifying and selecting gene biomarkers predictive of Diffuse Large B-Cell Lymphoma (DLBCL) and Follicular Lymphoma (FL) tumor samples. The data were published real-life microarray cancer data containing 7,129 gene expression profiles measured on 77 biological samples, comprising 58 DLBCL and 19 FL tissue samples. The Welch statistic was used for dimension reduction at the feature selection phase of the SVM algorithm. The cost and kernel parameters of the SVM model were tuned by 10-fold cross-validation to improve the efficiency of the classifier. The entire sample was randomly partitioned into 95% training and 5% test samples, and the SVM classifier was trained using Monte Carlo cross-validation with 1000 replications. The performance of the classifier was assessed on the test samples using the misclassification error rate (MER) and other performance measures. The results showed that the SVM classifier is quite efficient, yielding very high prediction accuracy on the tumor samples with fewer differentially expressed genes. The selected gene biomarkers can be subjected to further clinical screening to determine their biological relationship with the DLBCL and FL tumor subgroups. However, more studies with larger samples might be needed to validate these results.

Item Improved Bayesian Feature Selection and Classification Methods Using Bootstrap Prior Techniques (Faculty of Computer and Applied Computer Science, Tibiscus University of Timisoara, Romania, 2016) Olaniran, O. R.; Olaniran, S. F.; Yahya, W. B.; Banjoko, A. W.; Garba, M. K.; Amusa, L. B.; Gatta, N.
F. In this paper, the behavior of feature selection algorithms using the traditional t-test, the Bayesian t-test using MCMC, and the Bayesian two-sample test using the proposed bootstrap prior technique was examined. In addition, we considered some frequentist classification methods such as k-Nearest Neighbor (k-NN), Logistic Discriminant (LD), Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA) and Naïve Bayes when the conditional independence assumption is violated. Two new Bayesian classifiers (B-LDA and B-QDA) were developed within the framework of LDA and QDA using the bootstrap prior technique. The model parameters were estimated using a Bayesian approach via the posterior distribution that involves normalizing the prior for the attributes and the likelihood from the sample in a Monte Carlo experiment. The bootstrap prior technique was incorporated into the Normal-Inverse-Wishart natural conjugate prior for the parameters of the multivariate normal distribution, where the scale and location parameters were required. All the classifiers were implemented on the simulated data at a 90:10 training-to-test ratio. The efficiency of the classifiers was assessed using the misclassification error rate, sensitivity, specificity, positive predictive value, negative predictive value and area under the ROC curve. Results from the various analyses established the superiority of the proposed Bayes classifiers (B-LDA and B-QDA) over the existing frequentist and Naïve Bayes classification methods considered. All these methods, including the proposed ones, were implemented on a published binary response microarray data set to validate the results from the simulation study.

Item Multiclass Feature Selection and Classification with Support Vector Machine in Genomic Study (Edited Conference Proceedings of the 1st International Conference of the Nigeria Statistical Society (NSS), 2017) Banjoko, A. W.; Yahya, W. B.; Garba, M. K.; Olaniran, O. R.; Amusa, L. B.; Gatta, N. F.; Dauda, K.
A.; Olorede, K. O. This study proposes an efficient Support Vector Machine (SVM) algorithm for feature selection and classification of multiclass response groups in high-dimensional (microarray) data. The feature selection stage of the algorithm employed the F-statistic of an ANOVA-like testing scheme at a chosen family-wise error rate (FWER) to control the detection of false positive features. In a 10-fold cross-validation, the hyper-parameters of the SVM were tuned to determine the appropriate kernel using the one-versus-all approach. The entire simulated data set was randomly partitioned into 95% training and 5% test sets, with the SVM classifier built on the training sets and its prediction accuracy on the response class assessed on the test sets over 1000 Monte Carlo cross-validation (MCCV) runs. The classification results were assessed using the misclassification error rate (MER) and other performance indices. Results from the Monte Carlo study showed that the proposed SVM classifier was quite efficient, yielding high prediction accuracy for the response groups with fewer differentially expressed features than when all the features were used for classification. The performance of this new method on some published cancer data sets will be examined against other state-of-the-art machine learning methods in future work.

Item Performance Evaluation of Some Estimators of Linear Models with Collinearity and Non–Gaussian Error (Edited Conference Proceedings of the 1st International Conference of the Nigeria Statistical Society (NSS), 2017) Yahya, W. B.; Garba, M. K.; Ajayi, A. G.; Dauda, K. A.; Olaniran, O. R.; Gatta, N. F. Typical challenges in multiple linear regression models include multicollinearity and non-normal disturbances, which have undesirable consequences for the ordinary least squares (OLS) estimator, the popular and naïve technique for estimating linear models. It is therefore important to combine estimation strategies that can cope when both challenges are present.
In this study, the strengths of some methods of estimating the classical linear regression model in the presence of multicollinearity and non-normal error structures were investigated. The conventional Least Squares (LS), Ridge Regression (RR), Weighted Ridge (WR), Robust M-estimation (M) and Robust Ridge Regression (RRR) methods, taking M-estimation procedures into account, were considered in this study. Results from a Monte Carlo study revealed the superiority of the RRR estimator over the others, using the Mean Squared Error (MSE) of the parameter estimates and the Absolute Bias (AB), among other assessment criteria, across various distributions of the disturbance term and levels of multicollinearity. The study concluded that whenever linear regression modeling is intended and multicollinearity among the regressors and a non-spherical disturbance structure on the response variable are suspected in a data set, the RRR estimator should be adopted to ensure optimal efficiency.

Item Review of Some Robust Estimators in Multiple Linear Regressions in the Presence of Outlier(s) (African Journal of Mathematics and Statistics Studies, 2023) Alanamu, T.; Oyeyemi, G. M.; Olaniran, O. R.; Adetunji, K. O. Linear regression has been one of the most important statistical data analysis tools. Multiple regression is a type of regression in which the dependent variable has a linear relationship with two or more independent variables. The OLS estimate is extremely sensitive to unusual observations (outliers), with a low breakdown point and low efficiency. This paper reviews and compares some existing robust methods (Least Absolute Deviation, Huber M Estimator, Bisquare M Estimator, MM Estimator, Least Median Square, Least Trimmed Square, S Estimator); a simulation study is used to compare the selected methods.
It was concluded from the results that for outliers in the y direction, the best estimator in terms of high efficiency and a breakdown point of at most 0.3 is MM; for outliers in the x direction, the best estimator in terms of a breakdown point of at most 0.4 is S; and for outliers in both the x and y directions, the best estimator in terms of high efficiency and a breakdown point of at most 0.2 is MM.

Item A Test Procedure for Ordered Hypothesis of Population Proportions Against a Control (Turkish Clinical publications, Turkey, 2016) Yahya, W. B.; Olaniran, O. R.; Garba, M. K.; Oloyede, I.; Banjoko, A. W.; Dauda, K. A.; Olorede, K. O. Objective: This paper presents a novel procedure for testing a set of population proportions against an ordered alternative with a control. Material and Methods: The distribution of the test statistic for the proposed test was determined theoretically and through Monte Carlo experiments. The efficiency of the proposed test was compared with that of the classical Chi-square test of homogeneity of population proportions using their empirical Type I error rates and powers at various sample sizes. Results: The new test statistic developed for testing a set of population proportions against an ordered alternative with a control was found to have a Chi-square distribution with non-integer degrees of freedom v that depend on the number of population groups k being compared. A table of values of v for comparing up to 26 population groups was constructed, and an expression was developed to determine v for cases where k > 26. Further results showed that the new test method is capable of detecting the superiority of a treatment, for instance a new drug type, over some of the existing ones in situations where only qualitative data on users’ preferences among the available treatments (drug types) are available.
The new test method was found to be relatively more powerful and more consistent in attaining the nominal Type I error rate (α), especially at smaller sample sizes, than the classical Chi-square test of homogeneity of population proportions. Conclusion: The new test method could find applications in pharmacology, where a newly developed drug might be expected to be preferred by users over some of the existing ones. This kind of test problem can equally arise in medicine, engineering and the humanities in situations where only qualitative data on users’ preferences among a set of treatments or systems are available.
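
For orientation, the classical Chi-square test of homogeneity of population proportions, which the last abstract uses as its comparison baseline, can be sketched as below. This is only the baseline test, not the proposed ordered-alternative procedure, whose statistic and degrees of freedom v are not given in this listing. The function name and the example counts are hypothetical.

```python
from typing import Sequence, Tuple


def homogeneity_chi_square(
    successes: Sequence[int], sizes: Sequence[int]
) -> Tuple[float, int]:
    """Pearson chi-square statistic for H0: p_1 = ... = p_k.

    Returns the statistic and its degrees of freedom (k - 1).
    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(successes) != len(sizes) or len(sizes) < 2:
        raise ValueError("need matching counts for at least two groups")
    if any(n <= 0 or not 0 <= x <= n for x, n in zip(successes, sizes)):
        raise ValueError("each group needs 0 <= successes <= size and size > 0")

    # Pooled proportion estimated under the null hypothesis.
    pooled = sum(successes) / sum(sizes)
    if pooled in (0.0, 1.0):
        raise ValueError("statistic is undefined when the pooled proportion is 0 or 1")

    # Sum of squared standardized deviations of the observed successes
    # from their expected values under the null hypothesis.
    stat = sum(
        (x - n * pooled) ** 2 / (n * pooled * (1.0 - pooled))
        for x, n in zip(successes, sizes)
    )
    return stat, len(sizes) - 1
```

With hypothetical counts of 30, 40 and 50 successes out of 100 in each of three groups, the pooled proportion is 0.4 and the statistic is 200/24, about 8.33, on 2 degrees of freedom; it would then be compared with the corresponding Chi-square critical value.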