Predicting and Preventing Crime: A Crime Prediction Model Using San Francisco Crime Data by Classification Techniques

Crime is difficult to predict: it can appear random and may occur anywhere at any time, which makes it a challenging issue for any society. The study proposes a crime prediction model by analyzing and comparing three known classification algorithms: Naive Bayes, Random Forest, and Gradient Boosting Decision Tree. The model analyzes the top ten crimes, which account for 97% of the incidents, to make predictions about different categories. Two significant crime classes, violent and nonviolent, are created by merging multiple smaller classes of crimes. Exploratory data analysis (EDA) is performed to identify patterns and understand crime trends in the dataset. The accuracies of the Naive Bayes, Random Forest, and Gradient Boosting Decision Tree techniques are 65.82%, 63.43%, and 98.5%, respectively, and the proposed model is further evaluated using precision and recall metrics. The results show that the Gradient Boosting Decision Tree prediction model is better than the other two techniques for predicting crime, based on historical data from a city. The analysis and prediction model can help security agencies use their resources efficiently, anticipate crime at a specific time, and serve society well.


Introduction
Data mining is the knowledge discovery process used to collect and analyze a large dataset and summarize it as helpful information. It is critical in different fields of science for analytical purposes and plays an essential role in human life and in fields such as education, business, medicine, health, and science. Data mining is the process of discovering valid, understandable, and helpful patterns and valuable information in large amounts of data [1]. The main goal of data mining is to find interesting and concealed knowledge in the data and summarize it in a meaningful form [2][3][4]. Similarly, the results should be in a form that conveys the underlying information effectively [5][6][7]. Classification techniques are therefore among the most important and commonly used techniques in data mining, and supervised class prediction techniques allow nominal class labels to be predicted [8].
San Francisco is one of the largest cities in the United States of America. Therefore, it is vital to understand the pattern of crimes to ensure the safety of its citizens. San Francisco Crime Classification is an open-source dataset made available for an online competition administered by Kaggle Inc. The main task in the dataset is to predict the crime category based on a given set of geographical and time-based variables.
Limited and constrained police resources prove insufficient to handle the city's law and order issues. Therefore, it is vital to study and understand the distribution of different types of crimes in the city, based on time of occurrence and location, so that security agencies can channel resources efficiently. Naive Bayes, Random Forest, and Gradient Boosting Decision Tree are used to predict and classify crimes into two types: violent and nonviolent.
In this paper, the main goal is to propose a prediction model that predicts crime based on past criminal records. The proposed model contains three techniques and is evaluated using the accuracy, precision, and recall metrics. The data is analyzed descriptively, and the statistical distribution of crime over space and time is visualized to help identify potential patterns. The features are extracted from the original dataset, and the classification is performed using the Naive Bayes, Random Forest, and Gradient Boosting Decision Tree techniques. The experimental results show that the Gradient Boosting Decision Tree prediction model is better than the other two techniques for predicting crime, based on historical data from a city. The analysis and prediction model can help security agencies use their resources efficiently, anticipate crime at a specific time, and serve society well. Conclusions of the study and directions for further research are presented in the last section of the paper.

Related Work
Data mining has been frequently used in crime prediction models over the last couple of years, considering different features. Yehya used variables such as longitude (X), latitude (Y), address, day of week, date (YYYY-mm-dd hh:MM:ss), district, resolution, and category to analyze and predict San Francisco crime data. The study used several techniques together with principal component analysis to improve classification accuracy and avoid overfitting. He also applied four different classifiers to the task (K-NN, XGB Decision Tree, Bayesian, and Random Forest) and obtained a log-loss of 2.39031 with the Random Forest classifier [9]. Wenbin Zhu et al. conducted an experiment on the classification of crime based on the San Francisco dataset. They noted that crime classification helps police keep the city safe. They predicted crime categories based on time and location, using Naive Bayes, logistic regression, and Random Forest as baseline classifiers with the best prediction results [10]. Umair Saeed et al. experimented with data mining techniques to identify and predict crimes and compared the results of Naive Bayes and Decision Tree classifiers. They observed that the Naive Bayes classifier performed better and predicted crime more accurately [11].
Somayeh Shojaee et al. conducted an experimental study of crime prediction using supervised classification learners. They applied two different feature selection methods to real crime datasets and compared the two methods based on AUC (Area Under the Curve) values. They found that Naive Bayes, K-Nearest Neighbor (KNN), and Neural Networks are better classifiers than Decision Tree (J48) and Support Vector Machine (SVM). The Chi-square feature selection technique is used in their experiment to measure the performance of the classifiers. The investigation was conducted in a RapidMiner environment to enhance the quality of crime mining [12]. Junbo et al. predicted crime categories for San Francisco from 2003 to 2015 based on a dataset derived from the SFPD Crime Incident Reporting System. They investigated Naive Bayes, K-NN, and Gradient Tree Boosting classification models and analyzed their advantages and disadvantages on that prediction task. According to their results, Naive Bayes was not well suited to the task because some features did not represent a count or frequency. K-Nearest Neighbor, on the other hand, improved the prediction result to a large extent. Gradient Tree Boosting was the best model in their experiment, although it was slightly slow; it generated a score of 2.39383 and was ranked 93 among 878 teams [9].
R. Iqbal et al. (2013) conducted an experimental study of classification algorithms. They experimented with the prediction of crime categories for the different states of the USA and compared Naive Bayes with the Decision Tree classifier for crime prediction. Naive Bayes achieved 70.81% accuracy and the Decision Tree classifier achieved 83.95% accuracy, which shows that the Decision Tree classifier performs better for crime classification problems [13].

San Francisco Dataset
The study uses a dataset from Kaggle to build the model [2].
The study randomly shuffles the original training dataset and divides it into a training dataset and a testing dataset containing 80% and 20% of the records, respectively.
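The shuffle-and-split step can be illustrated with a minimal Python sketch. It is not the authors' implementation (the paper reports using R); the function name `shuffle_split`, the fixed seed, and the stand-in records are assumptions made only for illustration.

```python
import random

def shuffle_split(records, train_fraction=0.8, seed=42):
    """Shuffle a copy of `records` and split it into (train, test)."""
    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Stand-in for the crime records (hypothetical data, not the Kaggle file).
records = list(range(1000))
train, test = shuffle_split(records)
print(len(train), len(test))  # 800 200
```

Applied to the 878,049 records mentioned in the paper, the same cut rule would give 702,439 training and 175,610 testing records.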

Exploratory Data Analysis.
A simple script explores the unique crime categories in the dataset and identifies 39 different categories. The figure also shows the distribution of crime and the change in the type of crime since 2003. For example, the plot shows that larceny/theft is the most common type of crime. Further, the distribution across crime types appears skewed: there have been 174,900 incidents of larceny/theft, whereas there have been only 6 of TREA since 2003.
Figure 1 shows that the top 10 crimes are larceny/theft, other offenses, noncriminal, assault, drug/narcotic, vehicle theft, vandalism, warrants, burglary, and suspicious OCC, which together account for 83.5% of all records [10]. It is reasonable to suggest allocating more police resources to deal with these crimes, as they are more likely to occur.
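As an illustration of the counting behind such a ranking, the following Python sketch tallies categories and reports the share covered by the top k. The helper name `top_k_share` and the tiny sample list are hypothetical; the real computation would run over the full category column.

```python
from collections import Counter

def top_k_share(categories, k=10):
    """Return the k most frequent categories and their combined share of all records."""
    if not categories:
        return [], 0.0
    counts = Counter(categories)
    top = counts.most_common(k)
    share = sum(n for _, n in top) / len(categories)
    return top, share

# Hypothetical miniature sample, for illustration only.
sample = ["LARCENY/THEFT"] * 5 + ["ASSAULT"] * 3 + ["VANDALISM"] * 2 + ["TREA"]
top, share = top_k_share(sample, k=2)
print(top)    # the two most frequent categories with their counts
print(share)  # fraction of records covered by those categories
```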
Figure 2 indicates a lower overall density of sex offenses compared to the other categories of crime, which is expected, as there are fewer crimes of this category in the data.
The overall structure indicates aggregation, with the most prominent hot spot in the northern area.
Figure 4 shows monthly reports of the top ten crimes in San Francisco, revealing the rise and fall of crime month by month. Notably, all top ten crimes follow a roughly three-month cycle: they increase in the 3rd month (March, spring), decrease in the 6th month (June, summer), and increase again in September (autumn), which suggests a seasonal pattern.
Figure 5 shows the ratio (increase or decrease) of the top ten crimes by day of the week. Crime is more concentrated in northern areas on Friday, Saturday, and Wednesday. Larceny, vehicle crime, and vandalism increase on Friday and Saturday with the same pattern, while suspicious OCC increases on Friday and Wednesday. Burglary increases on Friday, and assault increases on Saturday and Sunday. Drug/narcotics and warrants crimes increase on Wednesday. Together, these results indicate how the occurrence of crime in San Francisco varies by day of the week.
Figure 6 shows the aggregate crime count and the crime rate for each hour. The results suggest that all top ten crimes decrease between 3:00 AM and 6:00 AM, reach their first peak around 5:00 PM to 6:00 PM, and reach a second peak at midnight. So, when police resources are limited, our suggestion is to allocate more police from noon to midnight.
There are seasonal patterns in the data: although the total crime counts differ, the normalized values follow similar trends. When the counts are normalized by mean and standard deviation, seasonal patterns by month appear.
Similar patterns also emerge by hour. The lines in Figures 7 and 8 represent the different crime categories (top 10 only) by month and by hour, respectively.
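The normalization by mean and standard deviation can be sketched as a z-score per category. The monthly counts here are invented for illustration, and using the population standard deviation (`pstdev`) is an assumption, since the paper does not say which estimator was used.

```python
from statistics import mean, pstdev

def zscore(values):
    """Normalize a series to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # a constant series has no variation to scale
    return [(v - mu) / sigma for v in values]

# Hypothetical monthly counts for two categories with very different totals.
monthly_counts = {
    "LARCENY/THEFT": [1400, 1350, 1500, 1480, 1450, 1380],
    "ASSAULT": [620, 600, 660, 650, 640, 610],
}
normalized = {cat: zscore(vals) for cat, vals in monthly_counts.items()}
for cat, vals in normalized.items():
    print(cat, [round(v, 2) for v in vals])
```

After normalization the two series share a common scale, so their shapes can be compared even though the raw totals differ.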

Variable Selection.
The variable "Category" is the dependent variable for prediction. The variables "Resolution" and "Description" are irrelevant to the analysis because of their nature and were dropped from the dataset during preprocessing. The remaining variables are treated as independent variables and used to predict the dependent variable.
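A minimal sketch of this selection step, assuming each record is a dictionary keyed by variable name. The field names follow the variables named in this paper and may differ from the column headers in the actual file; the record values are invented.

```python
TARGET = "Category"
DROPPED = {"Resolution", "Description"}  # names as given in the text (assumed column names)

def select_variables(record):
    """Split one record into (independent variables, dependent variable)."""
    features = {
        name: value
        for name, value in record.items()
        if name != TARGET and name not in DROPPED
    }
    return features, record[TARGET]

# Hypothetical record for illustration.
record = {
    "Dates": "2015-01-01 12:00:00",
    "Category": "ASSAULT",
    "Description": "EXAMPLE DESCRIPTION",
    "DayOfWeek": "Thursday",
    "PdDistrict": "NORTHERN",
    "Resolution": "NONE",
    "Address": "EXAMPLE ST",
    "X": -122.42,
    "Y": 37.77,
}
features, target = select_variables(record)
print(sorted(features), target)
```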

Prediction Model
The prediction model is based on the Naive Bayes, Random Forest, and Gradient Boosting Decision Tree prediction techniques, which are briefly discussed below.

Naive Bayes.
Naive Bayes is based on Bayes' theorem; it is a conditional probability method that calculates probabilities by counting the frequency of values [14].
Naive Bayes can be summarized as follows: (1) it is a simple classifier; (2) it is best suited for historical data and prediction; (3) it analyzes the relationship between each attribute and the class for each instance; (4) it is a supervised learning method that can solve categorical and probabilistic problems; (5) it is a popular classification technique in text categorization [14].
The Naive Bayes classifier was introduced in 1995 [14]. It is known by different names in the data mining and machine learning communities, such as simple Bayes and independence Bayes [15].
The Naive Bayes classifier is commonly used in many applications, such as sentiment classification, and in different ensemble prediction models [16][17][18].
Using the Naive Bayes classifier, two types of quantities need to be calculated from the dataset, that is, Class Probabilities and Conditional Probabilities.
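The two quantities can be estimated by simple counting, as in the following Python sketch. It is a toy illustration rather than the authors' R implementation: the feature values, the Laplace smoothing constant `alpha`, and the function names are assumptions.

```python
from collections import Counter, defaultdict

def fit(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    cond_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    vocab = defaultdict(set)            # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            cond_counts[(label, i)][value] += 1
            vocab[i].add(value)
    return class_counts, cond_counts, vocab

def posterior(row, model, alpha=1.0):
    """Return P(class | row) under the naive independence assumption."""
    class_counts, cond_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # class probability P(C)
        for i, value in enumerate(row):
            count = cond_counts[(label, i)][value]
            # conditional probability P(x_i | C) with Laplace smoothing (assumed)
            score *= (count + alpha) / (n_label + alpha * len(vocab[i]))
        scores[label] = score
    evidence = sum(scores.values())  # P(X), used to normalize
    return {label: s / evidence for label, s in scores.items()}

# Hypothetical (day of week, district) records with class labels.
rows = [("Friday", "NORTHERN"), ("Friday", "NORTHERN"), ("Monday", "MISSION"),
        ("Saturday", "NORTHERN"), ("Monday", "MISSION"), ("Friday", "MISSION")]
labels = ["violent", "violent", "nonviolent", "violent", "nonviolent", "nonviolent"]
model = fit(rows, labels)
post = posterior(("Friday", "NORTHERN"), model)
print(post)
```

The predicted class is the one with the largest posterior value.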
The method of the Bayesian classifier is given in the following equation:

$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)}$$

Here, $P(C \mid X)$ is the posterior probability of the hypothesis (the class that maximizes it is the maximum a posteriori hypothesis), $P(C)$ is the prior, $P(X)$ is the evidence, and $P(X \mid C)$ is the likelihood of the hypothesis [8].

Random Forests.
The Random Forest technique is also used for handwritten character prediction, digital pattern recognition, semantic analysis, language feature extraction, and hybrid models [19][20][21][22]. In this technique, every tree depends on the values of a random vector that is sampled independently and with the same distribution for all trees in the forest. As the number of trees in the forest increases, the generalization error of the forest converges to a limit. The generalization error of the classifier depends on the strength of the individual trees in the forest and the correlation between them. Each node is split using a random selection of features, which yields error rates that compare favorably with AdaBoost.
Definition. Random Decision Forests, or Random Forest, is a classifier consisting of a collection of tree-structured classifiers $\{h(x, \Theta_k),\ k = 1, \ldots\}$, where the $\Theta_k$ are independent identically distributed random vectors and each tree casts a unit vote for the most popular class at input $x$.
Correlation and Strength. In Random Decision Trees, or RF, the generalization error can be expressed in terms of two parameters: how accurate the individual classifiers are (strength) and the dependence between them (correlation) [23]. Random Decision Forests correct for Decision Trees' habit of overfitting to their training set by producing a large number of decision trees. For data that include categorical variables with different numbers of levels, Random Forests are biased in favor of the attributes with more levels. Categorical variables also increase the computational cost of creating trees [24].
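The unit-vote idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a full Random Forest: each "tree" is a one-level stump on a single randomly chosen categorical feature, trained on a bootstrap sample, and all names and data are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(rows, labels, rng):
    """Draw len(rows) records with replacement."""
    idx = [rng.randrange(len(rows)) for _ in rows]
    return [rows[i] for i in idx], [labels[i] for i in idx]

def fit_stump(rows, labels, feature):
    """One-level tree: majority label for each value of one categorical feature."""
    by_value = {}
    for row, label in zip(rows, labels):
        by_value.setdefault(row[feature], Counter())[label] += 1
    default = Counter(labels).most_common(1)[0][0]  # fallback for unseen values
    table = {value: c.most_common(1)[0][0] for value, c in by_value.items()}
    return lambda row: table.get(row[feature], default)

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample_rows, sample_labels = bootstrap_sample(rows, labels, rng)
        feature = rng.randrange(len(rows[0]))  # random feature selection
        forest.append(fit_stump(sample_rows, sample_labels, feature))
    return forest

def predict(forest, row):
    votes = Counter(tree(row) for tree in forest)  # each tree casts one unit vote
    return votes.most_common(1)[0][0]              # most popular class wins

# Hypothetical (time of day, area) records.
rows = [("night", "north")] * 10 + [("day", "south")] * 10
labels = ["violent"] * 10 + ["nonviolent"] * 10
forest = fit_forest(rows, labels)
print(predict(forest, ("night", "north")))
```

A practical forest would grow deeper trees and choose among several random features at every split; the stump is used here only to keep the voting mechanism visible.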

Gradient Boosting Tree. Gradient Boosting Tree is a machine learning technique for classification and regression problems.
This technique produces a prediction model in the form of an ensemble of weak prediction models, typically Decision Trees. It builds the model in a stage-wise way, as other boosting methods do, and generalizes them by allowing optimization of an arbitrary differentiable loss function. The idea of gradient boosting originated in an observation by Leo Breiman.
The proposed prediction models are evaluated on accuracy, precision, and recall; ROC and Lift are also performance metrics for estimating classification models [25].
Therefore, it is imperative to compare accuracy with alternative measures, namely precision and recall. Because this is a two-class problem, the performance of a classifier is presented using the "confusion matrix" in Table 2.
The following are the standard equations for computing accuracy, sensitivity/recall, specificity, and precision:

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$

$$\text{Sensitivity} = \text{Recall} = \frac{TP}{t} = \frac{TP}{TP + FN}$$

$$\text{Specificity} = \frac{TN}{n} = \frac{TN}{TN + FP}$$

$$\text{Precision} = \frac{TP}{p} = \frac{TP}{TP + FP}$$

TP is True Positive, TN is True Negative, FP is False Positive, and FN is False Negative in the confusion matrix presented in Table 2, so that $t$, $n$, and $p$ denote the numbers of actual positives, actual negatives, and predicted positives, respectively. Precision in this context is the percentage of the crimes predicted by the classification model that actually belong to the predicted class, which translates into the returns on the cost of targeting those categories. Recall, on the other hand, measures the percentage of the crimes that need to be targeted that the model identifies. Finally, specificity measures how good a test is at avoiding false alarms.
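As a worked check of these formulas, the sketch below plugs in the Random Forest testing counts reported in this paper. Treating "violent" as the positive class is an assumption made here for illustration; only the accuracy value is independent of that choice.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Compute the four metrics from the cells of a two-class confusion matrix."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn),        # sensitivity
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }

# Random Forest testing counts from the paper; "violent" as positive is assumed:
# predicted violent: 56,617 correct (TP), 32,448 wrong (FP);
# predicted nonviolent: 54,779 correct (TN), 31,776 wrong (FN).
metrics = confusion_metrics(tp=56617, tn=54779, fp=32448, fn=31776)
print(round(metrics["accuracy"], 4))  # 0.6343
```

The resulting accuracy agrees with the 63.43% reported for Random Forest.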

Experiment Results and Performance Evaluation
All three models presented in the previous section were trained with different parameter settings and feature selections. The data exploration section shows that both the time-related features and the geographic features are important.
For the analysis, all three models are trained and tested on the Kaggle training dataset of 878,049 records, which is divided into two parts in the ratio 80:20 for all the models. Thus, 80% of the dataset was used to train the models, whereas 20% was used to test them. The following subsections discuss the performance and results.

Naive Bayes
In machine learning, Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features. In Table 3, each column holds the reference (or actual) data and each row holds the prediction. The diagonal represents instances where the predicted class of the item matches the actual class. The table classifies nonviolent crime and violent crime classes using the Naive Bayes algorithm for the training set.
In Table 4, each column holds the reference (or actual) data and each row holds the prediction. The diagonal represents instances where the predicted class of the item matches the actual class. The table classifies nonviolent crime and violent crime classes using the Naive Bayes algorithm for the testing set. For each class, the result of the confusion matrix is discussed below.
(1) In the violent crime class, the correctly classified items are 57,693.
(2) In the nonviolent crime class, the wrongly classified items are 31,518.
Random Forest
The Random Forest technique is an ensemble learning method for classification, regression, and other tasks. It operates by constructing a multitude of Decision Trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. Random Decision Forests correct for Decision Trees' habit of overfitting to their training set. In this experiment, Random Forest was selected as a technique to estimate the predictors (Table 5).
In Table 6, each column holds the reference (or actual) data and each row holds the prediction. The diagonal represents instances where the predicted class of the item matches the actual class.
The table classifies nonviolent crime and violent crime classes using the Random Forest algorithm for the training set. For each class, the result of the confusion matrix is discussed below.
There are 349,230 items classified into the nonviolent crime class.
(1) In the nonviolent crime class, the correctly classified items are 280,840.
(2) In the violent crime class, the wrongly classified items are 68,390.
There are 353,209 items classified into the violent crime class.
(1) In the violent crime class, the correctly classified items are 287,017.
(2) In the nonviolent crime class, the wrongly classified items are 66,192.
In Table 7, each column holds the reference (or actual) data and each row holds the prediction. The diagonal represents instances where the predicted class of the item matches the actual class.
The table classifies nonviolent crime and violent crime classes using the Random Forest algorithm for the testing set. For each class, the result of the confusion matrix is discussed below.
(1) In the violent crime class, the correctly classified items are 56,617.
(2) In the nonviolent crime class, the wrongly classified items are 32,448.

Gradient Boosting Trees
Gradient Boosting Decision Trees is a robust machine learning technique used in predictive modeling because of its high prediction accuracy compared to other modeling techniques. It produces a prediction model in the form of an ensemble of weak prediction models, that is, Decision Trees. It builds the model in a stage-wise fashion, as other boosting methods do, and generalizes them by optimizing an arbitrary differentiable loss function (Table 8).
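The stage-wise construction can be illustrated with a small Python sketch. It is a simplification under stated assumptions: a single numeric feature, regression stumps as the weak learners, and squared-error loss (for which the negative gradient is just the residual). A classifier such as the one in this study would use a classification loss instead, and the data here are invented.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump on one feature (needs >= 2 distinct x values)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep the right side non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def fit_gradient_boosting(xs, ys, n_stages=50, learning_rate=0.1):
    base = sum(ys) / len(ys)  # stage 0: constant prediction
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(n_stages):
        # Negative gradient of the squared-error loss = current residuals.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x)
                       for x, p in zip(xs, predictions)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

# Hypothetical data: the target switches from 0 to 1 between x = 3 and x = 4.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_gradient_boosting(xs, ys)
print(round(model(1), 3), round(model(6), 3))
```

Each stage fits a weak learner to what the current ensemble still gets wrong, and the learning rate controls how much of that correction is added.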
In Table 9, each column holds the reference (or actual) data and each row holds the prediction. The diagonal represents instances where the predicted class of the item matches the actual class. The table classifies nonviolent crime and violent crime classes using the Gradient Boosting Decision Trees algorithm for the training set. For each class, the result of the confusion matrix is discussed below.
There are 351,145 items classified into the nonviolent crime class.
In Table 10, each column holds the reference (or actual) data and each row holds the prediction. The diagonal represents instances where the predicted class of the item matches the actual class. The table classifies nonviolent crime and violent crime classes using the Gradient Boosting Decision Trees algorithm for the testing set. For each class, the result of the confusion matrix is discussed below.
There are 86,569 items classified into the nonviolent crime class.
(1) In the nonviolent crime class, the correctly classified items are 86,569.
(2) In the violent crime class, the wrongly classified items are 0.

Conclusions and Future Directions
The study presents an exploratory data analysis and a prediction model based on classification techniques, and compares their results on San Francisco crime data. Naive Bayes, Random Forest, and Gradient Boosting Decision Tree are used to predict the crime category attribute, labeled "violent" and "nonviolent." The techniques are implemented in the R language, and the experimental results for all three algorithms show that Gradient Boosting Decision Tree performed better than Naive Bayes and Random Forest for crime classification.
The Gradient Boosting Decision Tree achieved 98.5%, 96.96%, and 100% for accuracy, precision, and recall, respectively. Law enforcement agencies can take great advantage of machine learning algorithms like Gradient Boosting Decision Tree to fight crime effectively, channel resources efficiently, anticipate crime to some extent, and serve society. The proposed prediction models can be applied to other crime datasets for prediction and resource management.
In the future, more advanced classification algorithms can be applied to the crime dataset and their prediction performance evaluated to discover trends and improve subject knowledge. In designing a comprehensive framework for prediction that helps law enforcement agencies manage resources in a specific area quickly, it is believed that higher accuracy can be achieved by employing more feature engineering on the address field. Further temporal analysis can be performed to determine the number and intensity of criminal activities using time series analysis, or a mix of temporal and spatial analysis, which can help allocate resources more efficiently and effectively.

Figure 3 shows notable results by year. The map reveals the increase or decrease in the top ten crimes in San Francisco from 2003 to 2015.

Figure 1: Number of crimes in individual category.
On the Naive Bayes testing data, there are 86,399 items classified into the nonviolent crime class. (1) In the nonviolent crime class, the correctly classified items are 55,282. (2) In the violent crime class, the wrongly classified items are 31,117. There are 89,211 items classified into the violent crime class.

Figure 7: Normalizing by month reveals a common pattern in the data.

Figure 8: Normalizing by hour reveals a common pattern in the data.

On the Random Forest testing data: (1) In the nonviolent crime class, the correctly classified items are 54,779. (2) In the violent crime class, the wrongly classified items are 31,776. There are 89,065 items classified into the violent crime class.

On the Gradient Boosting Decision Trees training data: (1) In the nonviolent crime class, the correctly classified items are 347,260. (2) In the violent crime class, the wrongly classified items are 3,885. There are 351,294 items classified into the violent crime class. (1) In the violent crime class, the correctly classified items are 351,294. (2) In the nonviolent crime class, the wrongly classified items are 0.

Table 1: Selected features for analysis.
Leo Breiman and Adele Cutler developed the Random Forest algorithm. In 1995, Tin Kam Ho (Bell Labs) first used the term Random Decision Forests. Random Forests, or Random Decision Forests, is a well-known ensemble learning method for classification and regression. It builds a number of classifiers on the training dataset, which together make good predictions.

Breiman observed that boosting can be interpreted as an optimization algorithm on a suitable cost function. Explicit regression gradient boosting algorithms were subsequently developed by Jerome H. Friedman, simultaneously with the more general functional gradient boosting perspective of Llew Mason, Jonathan Baxter, Peter Bartlett, and Marcus Frean.

Table 3: Confusion matrix results of Naive Bayes on training data.

Table 4: Confusion matrix results of Naive Bayes on testing data.

Table 5: Accuracy, incorrectly classified instances, recall, and precision for Naive Bayes on training and testing data.

Table 6: Confusion matrix results of Random Forest on training data.

Table 7: Confusion matrix results of Random Forest on testing data.

Table 8: Accuracy, incorrectly classified instances, recall, and precision for Random Forest on training and testing data.

Table 9: Confusion matrix results of Gradient Boosting Decision Trees on training data.

On the Gradient Boosting Decision Trees testing data: (1) In the violent crime class, the correctly classified items are 88,611. (2) In the nonviolent crime class, the wrongly classified items are 430. Tables 5, 8, and 11 present the accuracies of the Naive Bayes, Random Forest, and Gradient Boosting Decision Tree techniques, respectively, and show that the Gradient Boosting Decision Trees technique has better results.

Table 11: Accuracy, incorrectly classified instances, recall, and precision for Gradient Boosting Decision Trees on training and testing data.

Table 10: Confusion matrix results of Gradient Boosting Decision Trees on testing data.