Partial Least Squares-Based Classification and Selection of Predictive Variables of Crimes against Properties in Nigeria
Date
2017
Publisher
Edited Conference Proceedings of the 1st International Conference of the Nigeria Statistical Society (NSS).
Abstract
In this study, state-of-the-art Partial Least
Squares (PLS)-based models, namely PLS Discriminant Analysis
(PLS-DA), Sparse PLS-DA (SPLS-DA) and Sparse
Generalized PLS (SGPLS), were employed to model and
classify the rate of crimes (low or high) committed against
properties across the 36 states in Nigeria and the Federal
Capital Territory (FCT). The core variables that are
predictive of this crime type in Nigeria were identified using
the LASSO penalty method via the PLS. Data on occurrences
of cases of offences against property, obtained from the
database of the Nigerian Police Force, were utilized in this study. The
missing values due to non-occurrence or non-reporting of
crime cases were imputed using the technique of
multivariate imputation by chained equations. The complete
data set was partitioned into training and test sets using an
80:20 holdout scheme. The 80% training set was used to build
the PLS-based models that were in turn used to predict the
overall crime rates of Nigerian cities in the 20% held out test
data over 200 Monte-Carlo cross-validation runs. All the
PLS-based models yielded good classification of unseen test
samples into either of two qualitative classes of high and low
crime rates, with an average Correct Classification Rate (CCR)
of 94%. Other performance metrics, including sensitivity,
specificity, positive and negative predictive values, balanced
accuracy and diagnostic odds ratio, were estimated to further
examine their classification efficiencies. SGPLS identified
fewer core crime variables (just 3 out of 12) that are
predictive of the overall crime rates in Nigerian states, and did so
with the highest CCR, whereas SPLS selected 9 such variables to
achieve about the same feat.
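
The evaluation scheme described in the abstract (repeated random 80:20 holdout splits and metrics derived from a two-class confusion matrix) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `monte_carlo_splits` and `classification_metrics`, the rounding of the training-set size, the seed, and the example counts are all assumptions made for illustration, and no PLS model is fitted here.

```python
import random


def monte_carlo_splits(n_samples, n_runs=200, train_frac=0.8, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated random holdout splits.

    Hypothetical helper: each run reshuffles all sample indices and keeps
    the first `train_frac` share for training (size rounded, an assumption).
    """
    rng = random.Random(seed)
    n_train = round(train_frac * n_samples)
    indices = list(range(n_samples))
    for _ in range(n_runs):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield sorted(shuffled[:n_train]), sorted(shuffled[n_train:])


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the ratio is undefined."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(tp, fp, tn, fn):
    """Compute two-class metrics from confusion-matrix counts.

    Here "positive" stands for one class (say, high crime rate); which class
    is treated as positive is an assumption of this sketch.
    """
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    return {
        "ccr": _ratio(tp + tn, tp + fp + tn + fn),  # correct classification rate
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": _ratio(tp, tp + fp),  # positive predictive value
        "npv": _ratio(tn, tn + fn),  # negative predictive value
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "diagnostic_odds_ratio": _ratio(tp * tn, fp * fn),
    }


if __name__ == "__main__":
    # 37 units: the 36 states plus the FCT, as stated in the abstract.
    splits = list(monte_carlo_splits(37))
    print(len(splits), len(splits[0][0]), len(splits[0][1]))
    # Made-up counts, purely to exercise the function.
    print(classification_metrics(tp=40, fp=5, tn=45, fn=10))
```

In a full analysis, a classifier would be fitted on each training split and its test-set predictions tallied into the four counts before averaging the metrics over all runs.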
Keywords
Sparse Partial Least Squares, Partial Least Squares, Dimension Reduction, Correct Classification Rate, LASSO, Training Set, Test Set