An Improved Transition Probability Matrix for Crime Distribution Prediction

Crime has long been a major problem affecting urban public security and the social security environment, so crime prevention and control is a focus of public security work. Traditional policing strategies respond slowly: they cannot adjust in real time as criminal activity unfolds, which limits their ability to deter and control crime. To address the low accuracy of existing crime distribution prediction models, this paper proposes an improved transition probability matrix for crime distribution prediction. Using a large volume of criminal trajectory data, the paper quantitatively describes the temporal and spatial characteristics of crowd movement in different areas of the city through spatio-temporal transfer probabilities. It then combines the Markov chain with Bayes' theorem to construct a probability model of the spatio-temporal transfer of criminal groups between regions. Finally, the model predicts the number of crimes in urban grid areas.


Introduction
Crime is a social problem that cannot be ignored in the process of urban development [1]. With the development of science and technology and the concentration of social wealth, crime has become more frequent, more sophisticated, and more destructive. Analyzing the current situation of criminal behavior and predicting crime trends can restrain the growth of the crime rate, help public security organs strengthen law enforcement, and maintain social stability [2]. Crime prediction has become an important task for public security organs in preventing and cracking down on crime, and it plays an increasingly significant supporting role in police work [3]. With the development of big data sampling and Internet information processing technology, increasing attention has been paid to the correlation between criminal behavior and the social, economic, and environmental factors of the studied area [4]. The purpose of spatio-temporal crime analysis is to predict the time, place, and type of crime from historical crime data [5]. This research is of great significance for maintaining public security and attracts growing attention in academia. Previous studies on spatio-temporal crime analysis are usually based on historical crime data alone and have neglected the full combination of historical crime data and socio-environmental factors [6]. Researchers typically model and predict crime with classical machine learning and pattern recognition methods, including linear regression, autoregression, lasso regression, ridge regression, decision tree, and Bayesian algorithms [7][8][9][10][11].
Other studies focus on introducing machine learning algorithms into spatio-temporal models to obtain spatio-temporal results on crime. Literature [12] applied an autoregressive integrated moving average (ARIMA) model to predict the crime rate of a city over a few weeks. A comparison of existing studies shows that, in spatio-temporal crime analysis, the Apriori algorithm screens out strong association rules using support, confidence, and lift criteria [13]. When there are too many attributes or the data volume is too large, however, support is low and it is difficult to mine association rules with predictive value. Random forest, LightGBM, and other decision tree algorithms perform better at classifying criminal events [14], but they describe the temporal and spatial connections between crimes only vaguely and cannot characterize them concretely.
Current research on regional population prediction generally extends individual-level prediction to population-level prediction. Because urban areas have large populations, however, the computational cost is very high. Moreover, how the uncertainty of individual predictions affects population prediction at the urban regional scale needs further study, and an efficient prediction method at the population level still needs to be established [15]. Time series studies mainly focus on the temporal variation of regional population and seldom consider the dynamic, cumulative impact of real population movement [16]. Markov-based predictors use the number of people in each region and the crowd transfer characteristics of the current period to predict the next period [17], but spatial and temporal differences in human movement reduce their accuracy. This paper therefore uses crime location data from a group perspective. Taking the spatio-temporal differences of crowd movement into account, it proposes a method for predicting the number of criminals at the urban regional scale that combines the Markov chain with Bayes' theorem. The method calculates the spatio-temporal transfer probabilities of crime locations and builds a prediction model for the number of crimes in urban areas. The innovations and contributions of this paper are as follows:
(1) The model makes full use of criminals' historical track data to describe the spatio-temporal characteristics of crowd flow in different regions.
(2) It combines the advantages of the Markov chain and Bayes' theorem to form a partition prediction model.
(3) It adds a distance factor to the model and accounts for changes in movement patterns between adjacent periods.
This paper consists of four parts: the first is the introduction, the second describes the methodology, the third analyzes and discusses the results, and the fourth concludes.

Space-Time Transfer Probability.
Transition probability is an important concept in Markov theory. It describes the transition from one state to the next in the state space. By the definition of a Markov chain $\{I_t, t \in \mathbb{N}\}$, the transition probability at step $t$ is the conditional probability
$$u^{t}_{i_t,i_{t+1}} = P\left(I_{t+1} = i_{t+1} \mid I_t = i_t\right), \quad (1)$$
where $u^{t}_{i_t,i_{t+1}}$ is the probability that the state changes from $i_t$ at the current step $t$ to $i_{t+1}$ at the next step $t+1$. The spatial position of a criminal at time $n$ is expressed as $X_t = (\mathrm{longitude}_t, \mathrm{latitude}_t, n)$, where $n$ denotes the time and $(\mathrm{longitude}_t, \mathrm{latitude}_t)$, abbreviated $(\mathrm{lon}_t, \mathrm{lat}_t)$, are the coordinates of the administrative area where the offender is located. The positions of a criminal at different times form the movement trajectory $L_t = (X_0, X_1, X_2, \ldots, X_t)$, where $t$ is the number of track records. In this paper, a change of spatial position is regarded as a change of state: $X_t = (\mathrm{lon}_t, \mathrm{lat}_t, n)$ is the state $i_t$, and the next state $i_{t+1}$ is $X_{t+1} = (\mathrm{lon}_{t+1}, \mathrm{lat}_{t+1}, n+1)$. The spatio-temporal transition probability of a criminal's location is then $u^n_{i_t,i_{t+1}} = U(X_{t+1} \mid X_t)$, which quantifies how likely criminals are to move between different regions. From the location data of the active regions, one can count $T^n_x$, the number of criminals in region $x$ in period $n$, and $\mathrm{flow}^n_{x,y}$, the number of times that a criminal's trajectory in period $n$ has current location $x$ and next location $y$, that is, the number of criminals moving between region $x$ and region $y$.
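The counting of $T^n_x$ and $\mathrm{flow}^n_{x,y}$ can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes each trajectory record is a hypothetical `(criminal_id, period, region)` tuple with at most one record per criminal per period, and it counts a transition only when the same criminal appears in consecutive periods $n$ and $n+1$. The function name `count_states_and_flows` is illustrative.

```python
from collections import Counter, defaultdict

def count_states_and_flows(records):
    """Count T^n_x and flow^n_{x,y} from (criminal_id, period, region) records.

    Assumes at most one record per criminal per period; a transition is
    counted only when the same criminal appears in periods n and n + 1.
    """
    # Group records into one time-ordered trajectory per criminal.
    tracks = defaultdict(list)
    for cid, period, region in records:
        tracks[cid].append((period, region))

    occupancy = Counter()  # (n, x) -> T^n_x
    flow = Counter()       # (n, x, y) -> flow^n_{x,y}
    for points in tracks.values():
        points.sort()
        for period, region in points:
            occupancy[(period, region)] += 1
        # Pair each position with the next one in the trajectory.
        for (n0, x), (n1, y) in zip(points, points[1:]):
            if n1 == n0 + 1:
                flow[(n0, x, y)] += 1
    return occupancy, flow


# Toy trajectories for three criminals over two periods.
records = [("a", 0, "R1"), ("a", 1, "R2"),
           ("b", 0, "R1"), ("b", 1, "R1"),
           ("c", 0, "R2"), ("c", 1, "R2")]
T, flow = count_states_and_flows(records)
print(T[(0, "R1")], flow[(0, "R1", "R2")])  # 2 1
```

Using `Counter` keeps unseen region pairs at zero, which is convenient when the counts are later arranged into a $w \times w$ matrix.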
According to Bayes' theorem, the spatio-temporal transfer probability $u^n_{x,y}$ of criminals moving from region $x$ to region $y$ is calculated by formula (2). In formula (2), $x$ and $y$ denote the regions where a criminal is located in period $n$ and period $n+1$, respectively; $w$ is the number of administrative regions; $\mathrm{flow}^n_{x,y}$ is the number of criminals moving from region $x$ to region $y$ in period $n$; $T^{n+1}_y$ is the number of criminals in region $y$ in period $n+1$; and $T^{n+1}$ is the total number of criminals in all regions in period $n+1$.
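Formula (2) itself is not reproduced in the text, so the sketch below shows one plausible Bayes-style reading rather than the paper's exact expression. It assumes the prior $P(y) = T^{n+1}_y / T^{n+1}$, the likelihood $P(x \mid y) = \mathrm{flow}^n_{x,y} / \sum_{x'} \mathrm{flow}^n_{x',y}$, and $u^n_{x,y} = P(y \mid x) \propto P(x \mid y)\,P(y)$, normalized over the $w$ destination regions. The function name `bayes_transfer_probability` is hypothetical.

```python
import numpy as np

def bayes_transfer_probability(flow, T_next):
    """One hypothetical Bayes-style estimate of u^n_{x,y} = P(y | x).

    flow[x, y]: flow^n_{x,y}, moves from region x (period n) to y (period n + 1).
    T_next[y]:  T^{n+1}_y, criminals observed in region y in period n + 1.
    """
    flow = np.asarray(flow, dtype=float)
    T_next = np.asarray(T_next, dtype=float)
    prior = T_next / T_next.sum()                 # P(y) = T^{n+1}_y / T^{n+1}
    arrivals = flow.sum(axis=0)                   # recorded moves ending in each y
    likelihood = np.divide(flow, arrivals, out=np.zeros_like(flow),
                           where=arrivals > 0)    # P(x | y)
    joint = likelihood * prior                    # P(x | y) P(y); prior scales column y
    evidence = joint.sum(axis=1, keepdims=True)   # sum over destinations y'
    # Rows with no outgoing evidence are left as zeros.
    return np.divide(joint, evidence, out=np.zeros_like(joint),
                     where=evidence > 0)


u = bayes_transfer_probability([[3, 1], [1, 5]], T_next=[4, 6])
# u is approximately [[0.75, 0.25], [0.1667, 0.8333]]
```

With complete tracking (every criminal counted in $T^{n+1}_y$ also appears in some $\mathrm{flow}^n_{x,y}$), the prior cancels the likelihood's denominator and the estimate reduces to the row-normalized flow $\mathrm{flow}^n_{x,y} / \sum_{y'} \mathrm{flow}^n_{x,y'}$, which is what the example values show.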

Space-Time Transition Probability Matrix.
A spatio-temporal transition probability matrix represents the likelihood of criminals moving between regions. Figure 1 shows a schematic diagram of inter-regional crowd flow. When only the movement of criminals from a single region $x$ to the $w$ regions is considered (Figure 1(a)), criminals move from region $x$ to regions $1, 2, \ldots, w$, and the spatio-temporal transfer probabilities form the row vector $U^n_x = [u^n_{x,1}, u^n_{x,2}, \ldots, u^n_{x,w}]$. Figure 1(b) shows criminals moving among multiple regions. The spatio-temporal transfer probabilities of crowd movement among all regions form the $w \times w$ matrix $U^n$:
$$U^n = \begin{bmatrix} u^n_{1,1} & u^n_{1,2} & \cdots & u^n_{1,w} \\ u^n_{2,1} & u^n_{2,2} & \cdots & u^n_{2,w} \\ \vdots & \vdots & \ddots & \vdots \\ u^n_{w,1} & u^n_{w,2} & \cdots & u^n_{w,w} \end{bmatrix}, \quad (3)$$
where every element of $U^n$ satisfies $0 \le u^n_{x,y} \le 1$ and the elements of each row sum to 1.
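The constraints on (3) can be checked mechanically. The sketch below builds $U^n$ by row-normalizing a $w \times w$ flow matrix and verifies that every entry lies in $[0, 1]$ and every row sums to 1. Treating a region with no recorded outgoing moves as keeping all its criminals (a self-transition probability of 1) is an assumption made here so that the row-sum constraint holds; the text does not say how such rows are handled. Both function names are hypothetical.

```python
import numpy as np

def transition_matrix(flow):
    """Row-normalise a w x w flow matrix into U^n (sketch).

    Assumption: a region with no recorded outgoing moves keeps its
    criminals (self-transition probability 1), so every row sums to 1.
    """
    flow = np.asarray(flow, dtype=float)
    out = flow.sum(axis=1, keepdims=True)       # total moves leaving each region
    U = np.eye(flow.shape[0])                   # default: stay in place
    has_out = out[:, 0] > 0
    U[has_out] = flow[has_out] / out[has_out]   # row x becomes U^n_x
    return U

def is_row_stochastic(U, tol=1e-9):
    """Check 0 <= u_{x,y} <= 1 and that every row sums to 1."""
    U = np.asarray(U, dtype=float)
    return bool(np.all(U >= -tol) and np.all(U <= 1 + tol)
                and np.allclose(U.sum(axis=1), 1.0, atol=tol))


U = transition_matrix([[2, 2, 0], [0, 0, 0], [1, 0, 3]])
row_x = U[0]                 # the row vector U^n_x for the first region
print(is_row_stochastic(U))  # True
```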

Castro's Prediction Model.
Castro's model uses the memoryless (Markov) property of the Markov chain to predict traffic flow in different periods. A Markov process is a random process in which the conditional distribution of the next state depends only on the current state and not on any earlier state. The model assumes that the number of vehicles in the urban area remains constant. At a chosen time granularity, the traffic flow in each period is counted and the spatio-temporal transition probability matrix $U^n$ of vehicles is calculated, from which the traffic flow prediction model of formula (4) is constructed. This model is widely used for short-term prediction of urban traffic flow. Applying the same state-transfer idea, the number of criminals in each region in the current period, $T^n$, and the transition probability matrix $U^n$ can be used to predict the number of criminals in each region in the next period:
$$T^{n+1} = T^n U^n, \quad (4)$$
where $U^n = [u^n_{xy}]$ is the transition probability matrix of criminals in period $n$, with $0 \le u^n_{xy} \le 1$ for all $x, y \le w$ and $\sum_{y=1}^{w} u^n_{xy} = 1$, and $T^n = [T^n_1, T^n_2, \ldots, T^n_w]$ is the row vector of the number of criminals in each region in period $n$.
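Formula (4) is a single vector-matrix product. The sketch below applies it with made-up counts for three regions. Because $U^n$ is row-stochastic, the predicted total equals the current total, which is exactly the fixed-total assumption that the Model Improvement section relaxes. The name `castro_predict` is illustrative.

```python
import numpy as np

def castro_predict(T_n, U_n):
    """Castro-style one-step prediction T^{n+1} = T^n U^n.

    T_n is a row vector of per-region counts; U_n is row-stochastic, so the
    total count is conserved (the fixed-total assumption).
    """
    return np.asarray(T_n, dtype=float) @ np.asarray(U_n, dtype=float)


T_n = np.array([10.0, 5.0, 5.0])        # invented counts for three regions
U_n = np.array([[0.8, 0.2, 0.0],
                [0.1, 0.6, 0.3],
                [0.0, 0.4, 0.6]])
T_next = castro_predict(T_n, U_n)
# T_next is [8.5, 7.0, 4.5]; the total stays at 20.
```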

Model Improvement.
The proposed method for predicting the number of criminals in each region combines Bayes' theorem with the memoryless (Markov) property of the Markov chain. The spatio-temporal transfer probabilities of criminals between regions are calculated, and the prediction method is built on the Castro model, which is improved in the following two respects: (1) In reality, inter-regional population flow and similar phenomena keep the total population of a region changing, so the Castro model's assumption that the total number of criminals remains constant does not hold. To account for fluctuations in the total number of criminals, this paper uses historical track data to determine a correction for the total between adjacent periods. First, the correction term $\Delta T^{n\to n+1}$ for the number of criminals in adjacent periods is calculated. The number of criminals in each region in each period is counted from the training data, and the change $\Delta T^{day,n\to n+1}$ between adjacent periods is calculated for each day; the superscript $day$ identifies the date. From the training data, the maximum $\Delta T^{n\to n+1}_{\max}$ and the minimum $\Delta T^{n\to n+1}_{\min}$ are obtained, and the range between them is divided evenly into $t$ state intervals $[g_z, h_z]$. The probability $u_z$ that $\Delta T^{day,n\to n+1}$ falls in each interval $[g_z, h_z]$ is calculated, and the correction term $\Delta T^{n\to n+1}$ is the weighted average of the interval midpoints:
$$\Delta T^{n\to n+1} = \sum_{z=1}^{t} u_z \frac{g_z + h_z}{2}, \quad (5)$$
where
$$g_z = \Delta T^{n\to n+1}_{\min} + (z-1)\frac{\Delta T^{n\to n+1}_{\max} - \Delta T^{n\to n+1}_{\min}}{t}, \qquad h_z = g_z + \frac{\Delta T^{n\to n+1}_{\max} - \Delta T^{n\to n+1}_{\min}}{t},$$
$\Delta T^{n\to n+1}_{\max}$ is the maximum change in the number of criminals between adjacent periods $n$ and $n+1$ in the training dataset, and $\Delta T^{n\to n+1}_{\min}$ is the minimum. Using interval midpoints limits the influence of extreme changes in the number of regional criminals caused by emergencies on $\Delta T^{n\to n+1}$.
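The interval-weighted average for $\Delta T^{n\to n+1}$ can be sketched as follows, assuming equal-width intervals between the observed minimum and maximum and taking $u_z$ as the share of training days whose change falls in interval $z$. The daily totals in the usage lines are invented for illustration, and `interval_weighted_correction` is a hypothetical name.

```python
import numpy as np

def interval_weighted_correction(diffs, t):
    """Midpoint-weighted average of daily differences over t equal-width intervals.

    diffs: one value per training day, e.g. ΔT^{day, n->n+1}.
    The range [min, max] is split into t intervals [g_z, h_z]; each midpoint
    (g_z + h_z) / 2 is weighted by u_z, the share of days in that interval.
    """
    diffs = np.asarray(diffs, dtype=float)
    lo, hi = diffs.min(), diffs.max()
    if hi == lo:                                  # no spread: nothing to bin
        return float(lo)
    edges = np.linspace(lo, hi, t + 1)            # g_1, ..., g_t and h_t
    counts, _ = np.histogram(diffs, bins=edges)   # last bin is closed, so max is counted
    u = counts / counts.sum()                     # u_z
    midpoints = (edges[:-1] + edges[1:]) / 2      # (g_z + h_z) / 2
    return float(np.dot(u, midpoints))


# Invented training data: total criminals in periods n and n + 1 on three days.
totals = np.array([[40, 46], [38, 41], [45, 44]])
daily_change = totals[:, 1] - totals[:, 0]        # ΔT^{day, n->n+1}
delta_T = interval_weighted_correction(daily_change, t=2)
```

`np.histogram` treats every bin as half-open except the last, which is closed, so the maximum observation is always counted and the shares $u_z$ sum to 1.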
Similarly, the correction term $\Delta U^{n\to n+1}$ of the spatio-temporal transition probability matrix between adjacent periods is calculated. Differences in crowd movement patterns across time periods change the spatio-temporal transition probabilities of movement between regions, and $\Delta U^{n\to n+1}$ reflects how the crowd's movement characteristics differ between periods. It is based on $\Delta u^{day,n\to n+1}_{x,y}$, the difference between the spatio-temporal transfer probabilities from region $x$ to region $y$ in adjacent periods: for each day, $u^n_{x,y}$ and $u^{n+1}_{x,y}$ are taken from the transition matrices of periods $n$ and $n+1$, and $\Delta u^{day,n\to n+1}_{x,y} = u^{n+1}_{x,y} - u^n_{x,y}$.
The daily differences $\Delta u^{day,n\to n+1}_{x,y}$ are divided into $t$ intervals, and the correction term $\Delta u^{n\to n+1}_{x,y}$ between regions $x$ and $y$ is calculated as
$$\Delta u^{n\to n+1}_{x,y} = \sum_{z=1}^{t} u_z \frac{g_z + h_z}{2}, \quad (6)$$
where $u_z$ here denotes the share of days whose $\Delta u^{day,n\to n+1}_{x,y}$ falls in $[g_z, h_z]$, and
$$g_z = \Delta u^{n\to n+1}_{xy,\min} + (z-1)\frac{\Delta u^{n\to n+1}_{xy,\max} - \Delta u^{n\to n+1}_{xy,\min}}{t}, \qquad h_z = g_z + \frac{\Delta u^{n\to n+1}_{xy,\max} - \Delta u^{n\to n+1}_{xy,\min}}{t}, \quad (7)$$
where $\Delta u^{n\to n+1}_{xy,\max}$ is the maximum difference in the spatio-temporal transfer probability between region $x$ and region $y$ over adjacent periods in the training data and $\Delta u^{n\to n+1}_{xy,\min}$ is the minimum. This yields the correction matrix of the crowd-movement transition probability matrix between adjacent periods, $\Delta U^{n\to n+1} = [\Delta u^{n\to n+1}_{x,y}]$, $x, y = 1, 2, \ldots, w$, where $w$ is the number of regions to be predicted.
(2) Allocation of the change in the number of crimes in each region. The flow allocation of Castro's prediction model does not consider changes in movement patterns between adjacent periods. This paper therefore adds the correction term $\Delta U^{n\to n+1}$ to the model and uses the improved transition probability matrix $U^{n'}$ to allocate the change $\Delta T^{n\to n+1}$ in the number of regional criminals. The practical significance of $\Delta U^{n\to n+1}$ is that it quantifies how the patterns of crowd movement between urban regions differ across periods. Formula (8),
$$U^{n'} = U^n + \Delta U^{n\to n+1}, \quad (8)$$
gives a transition probability matrix $U^{n'}$ that is closer to the actual movement pattern in period $n+1$; $U^{n'}$ is then rescaled so that each row again sums to 1. Finally, the change $\Delta T^{n\to n+1}$ in the number of crimes in period $n+1$ is allocated to each region according to $U^{n'}$, and the predicted number of criminals in each region in period $n+1$ is obtained from formula (9).
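The matrix correction and the improved matrix $U^{n'}$ can be sketched in the same way. The sketch applies the interval-weighted average entry by entry to the daily differences $\Delta u^{day,n\to n+1}_{x,y}$, adds $\Delta U^{n\to n+1}$ to $U^n$, and rescales the rows to sum to 1. Several steps are assumptions rather than details from the text: negative entries are clipped to 0 before rescaling, an all-zero row falls back to "stay in place", and, because formula (9) is not reproduced, `predict_next` uses one hypothetical reading in which $T^n U^{n'}$ absorbs the total change $\Delta T^{n\to n+1}$ in proportion to each region's share. All function names are illustrative.

```python
import numpy as np

def interval_weighted_correction(diffs, t):
    """Midpoint-weighted average over t equal-width intervals (same rule as ΔT)."""
    diffs = np.asarray(diffs, dtype=float)
    lo, hi = diffs.min(), diffs.max()
    if hi == lo:
        return float(lo)
    edges = np.linspace(lo, hi, t + 1)
    counts, _ = np.histogram(diffs, bins=edges)
    return float(np.dot(counts / counts.sum(), (edges[:-1] + edges[1:]) / 2))

def matrix_correction(daily_diffs, t):
    """ΔU^{n->n+1}: apply the interval rule to every entry (x, y).

    daily_diffs has shape (days, w, w) and holds u^{n+1}_{x,y} - u^n_{x,y} per day.
    """
    daily_diffs = np.asarray(daily_diffs, dtype=float)
    w = daily_diffs.shape[1]
    dU = np.empty((w, w))
    for x in range(w):
        for y in range(w):
            dU[x, y] = interval_weighted_correction(daily_diffs[:, x, y], t)
    return dU

def improved_matrix(U_n, dU):
    """U^{n'} = U^n + ΔU^{n->n+1}, rescaled so every row sums to 1.

    Clipping to [0, 1] first is an assumption; an all-zero row falls back to
    the identity row (criminals stay put), also an assumption.
    """
    U_n = np.asarray(U_n, dtype=float)
    U_prime = np.clip(U_n + dU, 0.0, 1.0)
    row_sums = U_prime.sum(axis=1, keepdims=True)
    return np.divide(U_prime, row_sums, out=np.eye(len(U_n)), where=row_sums > 0)

def predict_next(T_n, U_prime, delta_T):
    """Hypothetical reading of formula (9), not the paper's exact rule:
    redistribute T^n with U^{n'}, then spread the total change ΔT over the
    regions in proportion to that redistribution (assumes a nonzero total)."""
    base = np.asarray(T_n, dtype=float) @ U_prime
    return base + delta_T * base / base.sum()


U_n = np.array([[0.8, 0.2], [0.3, 0.7]])
daily = np.array([[[0.1, -0.1], [0.0, 0.0]]] * 3)   # invented: same change on 3 days
U_prime = improved_matrix(U_n, matrix_correction(daily, t=2))
T_next = predict_next([10, 10], U_prime, delta_T=2.0)
```

In this toy case the predicted total is $20 + 2 = 22$, so the fixed-total assumption of formula (4) no longer applies.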

Main Flow of the Algorithm.
The basic flow of the algorithm in this paper is shown in Figure 2.
The process has three stages: data preparation, model training and prediction, and evaluation. First, the inter-regional flows $\mathrm{flow}^n_{x,y}$ of criminals in each period and the number of criminals in each region, $T^n$, are counted. The spatio-temporal transfer probabilities of criminal groups between regions are calculated by formula (2), and the spatio-temporal transition probability matrix $U^n$ is constructed. Then, the historical track data are used to train the prediction model, and the correction terms for the number of criminals in adjacent periods and for the transition probability matrix are calculated by formulas (5) and (6). Finally, formula (9) is used to predict the number of crimes in each region. The prediction performance of the proposed algorithm is evaluated by prediction accuracy, and the optimal number of training weeks and the correction terms are determined experimentally.

Grid Classification and Prediction Experiment.
The research area was divided into 375 grids at a spatial scale of 200 m × 200 m. The frequency of cases in each grid was calculated from the distribution of historical crimes across all biweekly periods from 2017 to 2019. The k-means clustering method indicated an optimal number of clusters of 3 or 4, and all grids were therefore divided into stable hot spot grids, relatively hot spot grids, occasional hot spot grids, and nonhot spot grids. The grid distribution is shown in Figure 3. There were 20 stable hot spot grids, whose case frequency over the 78 biweekly periods exceeded 50, with a highest frequency of 68. There were 36 relatively hot spot grids, whose case frequency over the 78 biweekly periods was greater than 25 and less than 44. There were 49 occasional hot spot grids, whose case frequency over the 78 biweekly periods was greater than 15 and less than 26. In the remaining 270 grids, the case frequency over the 78 biweekly periods was less than 12.
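The gridding and clustering step can be sketched as follows. This is an illustration under assumptions: coordinates are taken to be in a projected, metre-based system with a known origin, the clustered value is a per-grid case frequency (for example, the number of biweekly periods with at least one case), and the k-means uses plain Lloyd iterations on that single feature with quantile-based initial centers. The paper's procedure for choosing the number of clusters is not reproduced, and `grid_index` and `kmeans_1d` are hypothetical names.

```python
import numpy as np

CELL = 200.0  # grid size in metres

def grid_index(x, y, x0, y0, n_cols):
    """Map projected coordinates (metres) to a row-major 200 m x 200 m cell id.

    Assumption: points are in a metric projection with the grid origin at (x0, y0).
    """
    col = int((x - x0) // CELL)
    row = int((y - y0) // CELL)
    return row * n_cols + col

def kmeans_1d(values, k, iters=100):
    """Plain Lloyd's k-means on one feature (e.g. per-grid case frequency)."""
    v = np.asarray(values, dtype=float)
    centers = np.quantile(v, np.linspace(0, 1, k))   # deterministic, spread-out start
    for _ in range(iters):
        # Assign each grid to the nearest center.
        labels = np.argmin(np.abs(v[:, None] - centers[None, :]), axis=1)
        # Move each center to the mean of its grids (keep it if the cluster is empty).
        new = np.array([v[labels == j].mean() if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


# Invented per-grid frequencies forming three obvious groups.
labels, centers = kmeans_1d([0, 1, 2, 20, 22, 24, 60, 62, 70], k=3)
```

Because the quantile initialization is sorted, cluster labels come out ordered from the lowest to the highest frequency, which makes it easy to map them to nonhot, occasional, relatively hot, and stable classes.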
Some of these grids never recorded a case, and all of them are classified as nonhot spot grids. The number of grids and cases of each type is shown in Table 1.
After all grids were classified, improved transition probability matrix prediction models were constructed for the whole study area and for each grid type. The model treats hot spot prediction as a binary classification problem: from all target grids, it predicts the hot spot grids in which cases will occur in the predicted period. For each grid, the number of cases in the same period as the target period and in the 3 adjacent periods in 2017 to 2019 was counted. Three representative built environment variables, namely, urban village area, road network density, and POI (catering, shopping mall, and entertainment) density, were selected as covariables of the crime hot spot prediction model, the improved prediction model was constructed with them, and the values of the three covariables were calculated for each grid. Two kinds of input data were used: historical crime data only, and historical crime data together with the three covariables. The three algorithms use the same input data as training samples and the same variables for the dataset to be predicted.

Overall Prediction Results of the Study Area.
The overall prediction results of the improved transfer matrix model before and after adding covariates were compared using two indexes: the grid hit ratio $\mathrm{HitR}_a$ and the case hit ratio $\mathrm{HitR}_n$, calculated by formulas (10) and (11), respectively. Table 2 shows the results of the 26 biweekly prediction experiments in 2019 on these two indicators, and Figures 4 and 5 show line charts of $\mathrm{HitR}_a$ and $\mathrm{HitR}_n$ for the three models.

Computational Intelligence and Neuroscience
$$\mathrm{HitR}_a = \frac{a^*}{A}, \quad (10)$$
where $A$ is the total number of actual hot spot grids and $a^*$ is the number of correctly predicted hot spot grids.
$$\mathrm{HitR}_n = \frac{N}{n}, \quad (11)$$
where $n$ is the total number of cases in the research area and $N$ is the number of actual cases in the predicted hot spot grids. Averaged over the 26 experiments, both the grid hit ratio $\mathrm{HitR}_a$ and the case hit ratio $\mathrm{HitR}_n$ of the improved model are higher than those of the models in [18] and [19], and the differences are significant at the 0.5 level. In terms of standard deviation, the improved model also performs more stably than [18] and [19] across the 26 experiments.
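Formulas (10) and (11) are straightforward to compute for one biweekly experiment. In this sketch, grids are identified by hypothetical integer ids, and `cases` maps each grid to its number of cases in the predicted period; the function name `hit_ratios` is illustrative.

```python
def hit_ratios(predicted, actual, cases):
    """Grid hit ratio (10) and case hit ratio (11) for one biweekly experiment.

    predicted, actual: sets of grid ids predicted / observed as hot spots.
    cases: dict grid id -> number of cases observed in the predicted period.
    """
    a_star = len(predicted & actual)              # a*: correctly predicted hot spot grids
    hit_ra = a_star / len(actual)                 # HitR_a = a* / A
    n_total = sum(cases.values())                 # n: all cases in the study area
    N = sum(cases.get(g, 0) for g in predicted)   # N: cases inside predicted grids
    hit_rn = N / n_total                          # HitR_n = N / n
    return hit_ra, hit_rn


hr_a, hr_n = hit_ratios({1, 2, 3}, {2, 3, 4}, {1: 0, 2: 5, 3: 3, 4: 2, 5: 0})
```

Here two of the three actual hot spot grids are found, and the predicted grids contain 8 of the 10 cases.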
Comparing the two line charts shows that, across the 26 biweekly prediction experiments in 2019, the improved prediction model with built environment covariable data performs better in overall prediction for the study area. With the same experimental data and requirements, its grid hit ratio and case hit ratio are higher than those of the other models. This shows that the model correctly predicts more hot spot grids, and that the hot spot grids it predicts correctly have higher crime density and cover more cases. In this study, the number of hot spots predicted in each biweekly experiment equals the actual number of hot spots.
The trends in Figures 4 and 5 show that, for the models in [18] and [19] as well as the proposed model, the case hit ratio rises and falls with the grid hit ratio. In other words, under normal circumstances, when the grid hit ratio is high, the case hit ratio for the same period is also high.

Classification Grid Prediction Results.
Before comparing the prediction performance for each grid type with and without covariates, the evaluation index system is refined. Because adding covariates changes the number of predicted hot spot grids, a new index, the case hit efficiency $\mathrm{HitE}_n$, is added to the two indexes above. $\mathrm{HitE}_n$ is calculated by formula (12), where $\mathrm{HitR}_n$ is the case hit ratio, $a$ is the number of predicted hot spot grids, and $A$ is the actual number of hot spot grids. Table 3 summarizes the prediction accuracy for the four grid types and compares the results for stable hot spot, relatively hot spot, occasional hot spot, and nonhot spot grids before and after adding the built environment covariables.
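Formula (12) is not reproduced in the text. One plausible form consistent with its description, which combines $\mathrm{HitR}_n$ with the predicted count $a$ and the actual count $A$, scales the case hit ratio by $A/a$ so that predicting more hot spot grids than actually exist is penalized. The sketch below implements only that assumed form and may differ from the paper's definition; `case_hit_efficiency` is a hypothetical name.

```python
def case_hit_efficiency(hit_rn, a, A):
    """Assumed form of HitE_n: the case hit ratio scaled by A / a.

    hit_rn: case hit ratio HitR_n; a: predicted hot spot grids; A: actual ones.
    With a == A the efficiency equals HitR_n; predicting extra grids lowers it.
    """
    if a <= 0:
        raise ValueError("a must be positive: at least one grid must be predicted")
    return hit_rn * A / a


# Invented values: 4 grids predicted where 3 hot spots actually exist.
efficiency = case_hit_efficiency(0.8, a=4, A=3)
```

Under this assumption, the index rewards a model for raising $\mathrm{HitR}_n$ without simply flagging more grids, which matches the motivation given for introducing it.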
For stable hot spot grids, which have the highest incidence, adding the three covariables increases the number of predicted hot spot grids and significantly improves both the grid hit ratio $\mathrm{HitR}_a$ and the case hit ratio $\mathrm{HitR}_n$. In terms of case hit efficiency $\mathrm{HitE}_n$, the mean value of the improved prediction model is slightly higher than those in [18] and [19]. This also indicates that the model with covariables predicts more efficiently than the original model using only historical crime data, and that the actual hot spot grids it finds are "hotter." For relatively hot spot grids, which also have high incidence, adding the three covariables likewise increases the number of predicted hot spot grids and significantly improves $\mathrm{HitR}_a$ and $\mathrm{HitR}_n$. For case hit efficiency $\mathrm{HitE}_n$, however, although the mean value of the improved model is slightly higher than those in [18] and [19], the difference is not statistically significant, which shows that adding covariables does not significantly improve the overall prediction efficiency. Therefore, for hot spot grids with high incidence, the prediction hit ratio can be improved by increasing the number of predicted hot spot grids. Stable hot spot grids and relatively hot spot grids are the areas where crimes against property in public places are most concentrated, and they should be treated as the key areas for crime prediction, prevention, and control. The analysis above shows that, when spatial differentiation is considered, the case hit ratio of the partition model is close to 0.912 for stable hot spot grids and 0.682 for relatively hot spot grids.
Based on the predictions of the zoning model, these two types of high incidence areas can be monitored and managed more effectively, which is of great significance for crime prediction and prevention in the study area as a whole. The zoning model is significantly more accurate than the whole-area model, which indicates that considering spatial differentiation plays an important role in improving the accuracy of crime hot spot prediction. It is therefore necessary to establish separate zoning models for different types of regions according to the spatial differentiation of crime. Optimizing the crime hot spot prediction model for each zone improves the overall prediction accuracy for the study area.

Accuracy of the Prediction Model.
To examine how the algorithm's accuracy depends on the amount of data, an experiment was conducted in which 1000 records were added to the sample data at each step. A small subset of the most recently acquired data was held out as test data, the algorithm's accuracy was recorded for each data volume, and the results were plotted as a line chart, shown in Figure 6.
Figure 6 shows that the algorithm's accuracy increases with the amount of sample data but levels off once the sample reaches a certain size. Beyond 6000 records, the imported data include some records that do not conform to real-world patterns, and these are nonetheless used as training samples. As Figure 6 shows, the accuracy of the models in [20], [21], and [22] and of the proposed model all declined at a data volume of 7000, although accuracy remained high.

Performance Comparison of Prediction Models.
This paper introduces a distance factor into the crime distribution prediction model. To further verify the performance of the proposed algorithm, the distance factor was also incorporated into the models of [20], [21], and [22] to predict the distribution of crimes, and their results were compared with those of the proposed model, as shown in Figure 7. The model in [20] performs worst among all the compared algorithms, especially in regions 1, 2, 8, and 10. From region 6 to region 10, the predictions of [20], [21], and [22] fluctuate greatly. The proposed model, however, follows a relatively mild trend in these regions, and the effect
Figure 6: Accuracy of each prediction model.

Conclusion
In recent years, with the continuous development of the economy and society, the situation facing comprehensive social governance has become increasingly severe. Crimes such as theft and robbery are increasingly prominent and can easily induce major social risks, so building a crime distribution prediction model is very important for fighting criminal activity. Based on the Markov chain and Bayes' theorem, this paper proposes a method that predicts the number of criminals from the spatio-temporal transition probabilities of criminals' locations. The method comprehensively considers the spatio-temporal characteristics of urban population movement and is suitable for population prediction at the scale of urban grid regions. The experimental results show that the proposed algorithm quantifies the spatio-temporal characteristics of human movement and predicts the number of criminals with good accuracy. However, further experimental studies are still needed to explore the underlying principles and the data required to predict different types of crime in different research areas. Only by further understanding the causes and patterns of crime through empirical research can more effective crime prevention and control be carried out, the social security environment be stabilized, and urban public security be maintained through practical measures.

Data Availability
The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.