Change-Point Analysis: An Effective Technique for Detecting Abrupt Change in the Homicide Trends in a Democratic South Africa

South Africa is considered the murder capital of the world. The challenge for the South African government is to attract foreign investment to boost the economy in a country plagued by homicide. In this study, a change-point analysis was used to pinpoint significant changes in the murder trends in each of the nine provinces in South Africa from 2005 to 2015. This analysis will help authorities gain a better understanding of the big picture in order to mitigate this crime. Two methods were used in the analysis, namely, CUSUM and Bootstrap. CUSUM was used to analyse data trends, and Bootstrap was used to calculate the occurrence of change points based on the confidence level. The results of the analysis clearly show the abrupt shifts in murder data across the provinces of South Africa. In addition, we used the South African population statistics dataset from 2005 to 2015 to evaluate the relationship between the population of the nine provinces and the murder crime rates, and to contextualise those rates year to year and province to province.


Introduction
Crime in South Africa is on the rise, and the government is desperately seeking ways to shake off the label of crime capital of the world in order to boost investor confidence [1,2]. Murder in South Africa is on the increase, and the country has been ranked as one of the most murderous countries in the world [3,4]. Past research on the rise of murder in South Africa has been reported in the literature [5][6][7][8][9]. Several reasons are put forward for the high crime rate, and these include the low standard of education, alcohol abuse, a lack of social and vocational skills, poor housing and living conditions, and a lack of parenting skills [1]. Violent crime is increasing faster than any other crime in South Africa.
An abrupt change occurs suddenly, and the application of abrupt change detection in crime studies is highly important to give early notice of an impending crime and its consequences for the health of a nation. Traditionally, control charts were used to detect changes. A major difference between control charts and change-point analysis is that control charts are updated following the collection of each data point, while change-point analysis is conducted only after all the data are collected. Control charts can quickly detect isolated abnormal points and major changes, while change-point analysis can detect subtle changes frequently missed by control charts [10]. Recently, change-point analysis has been extensively used and has proven to be a powerful analytic tool for time-series datasets and for revealing underlying trends. Several studies show that change-point analysis is efficient in exposing hidden change points in sequence or series datasets [11][12][13][14]. The murder trends in the nine provinces of South Africa were investigated in this work, and change-point analysis was conducted to find substantial changes. Two powerful change-point analysis tools, namely, CUSUM and Bootstrap, were used to discover trends and the occurrence of change points in the South African murder data for 10 years (2005-2015).
This research will enable the South African government to see at a glance the murder trend and the particular point at which an upward or downward change occurred, and to work toward prevention of the crime.
Some of the related research found in the literature on murder in South Africa is discussed in Section 2.

Murder in South Africa
Murder in South Africa mostly occurs as a result of conflict between different groups engaging in certain activities, which include taxi-related violence, illicit mining, political motives, and hostel-related violence [15]. More than 20 years after apartheid, South Africa still experiences an extreme homicide rate, which is among the world's highest, apart from war zones. According to the 2013 global report on homicide, Southern Africa and Central America have the highest murder rates [6]. The report from George Otieno et al. shows a high rate of homicide-related deaths in the typical rural South African population and calls for urgent attention in order to prevent loss of life [6].
According to Lindegaard, homicide is the leading cause of death in South Africa and is regarded as a serious health issue [8]. On average, the rate of death caused by violence in South Africa is almost twice the global average. Young men aged 15 to 29 years have been reported as culprits, suspects, and victims of homicide [7][8][9]. In the rural and township areas of South Africa, the victimization rate for men is considerably higher than in the big cities [6,8]. In addition, homicides of South African women are committed by their intimate partners at a rate six times the world average [8].
Sexual homicide is a form of gender-based violence that occurs in one in five female homicides and one in ten child homicides in South Africa [16]. Abraham et al. indicated that the rate of sexual homicide of adult women is on the increase, and that sexual homicides of boys and girls show different patterns of risk, with girls at the highest risk. The pervasiveness of various aspects of sexual homicide (forensic, social, and demographic characteristics of the victims and perpetrators) in adult women and in male and female children is reported by Abraham et al. [17].
According to McCafferty & Action, crime statistics in the new South Africa (post 1994) show that violent crime had the greatest increase of all crime categories. Murder is a subcategory of violent crime. Between 1994 and 2002, all other violent crimes such as attempted murder, serious assault, and rape were on the rise, whereas murder was declining [1]. Statistics released by the South African Police Service in 2008 show that there were 18 487 documented murders in South Africa between 2007 and 2008. Although this statistic shows a decline of more than 19%, homicide in South Africa is still higher than the global average [16]. As crime rates increase in South Africa, conviction rates decrease, adding to the culture of violence [1].
Data from the Crime Information Analysis Centre (CIAC) show that between 1994 and 1999, the postapartheid years, one out of every three crimes in South Africa was a violent crime. Interpol data from three countries, namely, Australia, South Africa, and Colombia, show that South Africa had the highest rate of violent theft, robbery, and murder [1]. According to Interpol statistics, South Africa has the highest per capita rates of murder and rape [18]. In the period from 1994 to 1999, all serious crimes increased in Pretoria and Durban by 19% and in Cape Town by 17% [18].
Since 1994, statistics show that Johannesburg has been the serious-crime capital of South Africa. It is followed by Pretoria, Cape Town, and Durban [1]. An analysis of murders in Johannesburg, Pretoria, Cape Town, and Durban shows that people in townships and poorer parts of the city were more at risk of murder [18].
There are cases where murder in South Africa is committed through police brutality. One such case that drew a lot of public attention was the Marikana massacre in the North West Province on 16 August 2012. In this incident, 34 striking miners were brutally murdered by the police, who claimed that they acted in self-defence [19].
This study will use a change-point analysis to assess the murder trends across the provinces of South Africa.

Materials and Methods
An experimental design with quantitative analysis was employed in this study. The crime statistics for South Africa dataset was analyzed using the change-point analysis data processing technique. Two change-point techniques were combined, namely, CUSUM and Bootstrap, as suggested by Taylor and Arif et al. [10,20]. The approach was aimed at making a realistic interpretation of murder trends in South Africa. The CUSUM charts were used to detect significant changes and indicate when the murder rate was out of control. The Bootstrap technique is a resampling technique, and its application indicated that some change points had occurred in the data. The 1000 bootstraps used in the experiments are the recommended minimum number [10,20]. All significant changes in the tables are pinpointed with 95% confidence. The change-point analysis uses a recursive algorithm to identify multiple changes. The crime statistics for South Africa dataset provides a history of crime statistics from 2005 to 2015 per province and station and is available online. The dataset covers a vast number of crime statistics from all South African provinces. It was last updated on 18 November 2019, with version 2 being the current version [21]. The change points obtained indicate the times at which there was a noticeable deviation in the murder statistics in South Africa.
The South African population statistics [22] provide information on the South African population for each province (2005 to 2015). The comparison of the population and crime rates is shown in Figures 1(a)-9(b) of Supplementary Material-2 and discussed in Section 4.

Change-Point Analysis.
In order to detect whether one change or more than one change occurred, a change-point analysis can be performed. Furthermore, change-point analysis can be used to determine when the changes occurred and with what confidence. The confidence level indicates the likelihood that a change occurred, and the confidence interval indicates when the change occurred. A change-point analysis can be applied to all types of time-ordered data [10]. Control charts have two horizontal lines (the upper control limit and the lower control limit) that indicate the maximum range over which values are expected to vary. If all points appear within these two horizontal lines, no change has occurred. Points outside these limits indicate that a change has occurred. While control charts are useful for detecting changes, they lack an analysis of those changes [10].
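The control-limit idea can be illustrated with a minimal sketch. The source does not state how its control limits were computed, so the moving-range rule used here (the usual individuals-chart constant 2.66) and the data series are assumptions chosen only for illustration.

```python
def individuals_chart_limits(values):
    """Return (lower, centre, upper) limits for an individuals control chart.

    Assumption: limits are centre +/- 2.66 * mean moving range, the common
    rule for charts of single observations; the paper may use another rule.
    """
    centre = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_range = sum(moving_ranges) / len(moving_ranges)
    spread = 2.66 * mean_range
    return centre - spread, centre, centre + spread


def points_outside(values):
    """Return the 0-based indices of points beyond either control limit."""
    lower, _, upper = individuals_chart_limits(values)
    return [i for i, x in enumerate(values) if x < lower or x > upper]


# Hypothetical series whose level drops after the fifth point.
series = [50, 52, 49, 51, 50, 40, 41, 39, 40, 42]
print(individuals_chart_limits(series))
print(points_outside(series))
```

For this hypothetical series every point stays inside the limits even though the level clearly shifts, which is the kind of change a control chart alone would not flag.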

Change-Point Analysis Algorithm.
The change-point analysis aims at detecting any change in the mean of a process in historical data such as murder crime datasets. By performing this analysis, the following questions can be adequately answered: Did a change occur? Did more than one change occur? When did the changes occur? How confident are we that they are real changes? [10].
Suppose $x_1, x_2, \ldots, x_n$ denote $n$ data points in a time series, and let $S_0, S_1, \ldots, S_n$ represent the cumulative sums of the points. To carry out the change-point analysis, the following steps are applied to the initial dataset $D_0 = \{x_1, \ldots, x_n\}$ of size $n$ ($n_0 = |D_0|$). The mean of $x_1, x_2, \ldots, x_n$ is expressed by

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i.$$

The cumulative sum always starts at zero; therefore, let $S_0 = 0$. Then, $S_i$ is calculated repetitively as follows:

$$S_i = S_{i-1} + (x_i - \bar{x}), \quad i = 1, \ldots, n.$$

Before the bootstrap analysis is computed, a boundary for the chart must be generated. An approximation of the magnitude of the change is calculated as

$$S^{i}_{\mathrm{diff}} = S^{i}_{\max} - S^{i}_{\min}, \quad S^{i}_{\max} = \max_{0 \le k \le n} S_k, \quad S^{i}_{\min} = \min_{0 \le k \le n} S_k.$$

After the magnitude of change has been computed, the bootstrap analysis is executed $N$ times on $D_0$. As described in [20], a single bootstrap is executed as follows:
(i) A bootstrap sample of $n$ values is generated by randomly reordering the original $n$ values, which is also known as sampling without replacement (SWOR).
(ii) The bootstrap CUSUM is computed by the same method from the bootstrap sample and is denoted $S^{j}$.
(iii) The magnitude of change for the bootstrap CUSUM is calculated as follows:

$$S^{j}_{\mathrm{diff}} = S^{j}_{\max} - S^{j}_{\min}.$$

(iv) The bootstraps for which the original magnitude of change exceeds the magnitude of change of the bootstrap CUSUM, $S^{i}_{\mathrm{diff}} > S^{j}_{\mathrm{diff}}$, are counted.
Let $N$ be the number of bootstrap samples executed and $K$ be the number of bootstraps for which $S^{i}_{\mathrm{diff}} > S^{j}_{\mathrm{diff}}$. The confidence level that a change has occurred, as a percentage, is defined as

$$\text{Confidence level} = 100 \cdot \frac{K}{N}\,\%.$$

Bootstrapping is a distribution-free approach with a single assumption, that of an independent error structure [20]. Errors distributed as shown below are referred to as an independent error structure:

$$x_i = m_i + e_i,$$

where $m_i$ denotes the mean at time $i$ and $e_i$ is the random error associated with the $i$-th value; the $e_i$ are assumed to be independent, to have zero mean, and to be identically and normally distributed.
Usually, $m_i = m_{i-1}$ except for a small number of values of $i$, which are called change points. When a change is detected, an approximation of when the change happened can be computed. The CUSUM estimator is calculated as follows:

$$|S_m| = \max_{i = 0, \ldots, n} |S_i|,$$

where $S_m$ is the furthest point from zero in the CUSUM chart; point $m$ estimates the last point before the change occurred, while point $m + 1$ estimates the first point after the change occurred [20]. The mean square error (MSE) is used as the second estimator of when the change happens. For a given sub-dataset $D$, let $m$ be the index for which $\mathrm{MSE}(m) = \min_{i=1,\ldots,|D|} \mathrm{MSE}(i)$, where [20]

$$\mathrm{MSE}(i) = \sum_{k=1}^{i} (x_k - \bar{x}_1)^2 + \sum_{k=i+1}^{|D|} (x_k - \bar{x}_2)^2, \quad \bar{x}_1 = \frac{1}{i} \sum_{k=1}^{i} x_k, \quad \bar{x}_2 = \frac{1}{|D| - i} \sum_{k=i+1}^{|D|} x_k,$$

and the second term vanishes when $i = |D|$. The bootstrapping technique is adopted to detect multiple changes. The analysis must be repeated to obtain other significant change points at subsequent levels, together with their confidence limits and levels [20].
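The CUSUM and bootstrap steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the fixed random seed, and the data series are hypothetical, and only the standard library is used.

```python
import random


def cusum(values):
    """Return the cumulative sums S_0..S_n of deviations from the mean."""
    mean = sum(values) / len(values)
    sums = [0.0]  # S_0 = 0
    for x in values:
        sums.append(sums[-1] + (x - mean))
    return sums


def s_diff(values):
    """Magnitude of change: S_max - S_min of the CUSUM."""
    sums = cusum(values)
    return max(sums) - min(sums)


def bootstrap_confidence(values, n_bootstraps=1000, seed=0):
    """Percentage of reordered samples whose S_diff is below the original."""
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    original = s_diff(values)
    shuffled = list(values)
    k = 0
    for _ in range(n_bootstraps):
        rng.shuffle(shuffled)  # random reordering, i.e. without replacement
        if s_diff(shuffled) < original:
            k += 1
    return 100.0 * k / n_bootstraps


# Hypothetical series with an obvious drop after the fifth point.
series = [50, 52, 49, 51, 50, 40, 41, 39, 40, 42]
confidence = bootstrap_confidence(series)
print(f"S_diff = {s_diff(series):.1f}, confidence = {confidence:.1f}%")
```

A high confidence value means that very few random reorderings produce a CUSUM range as large as the one observed in the original time order.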
Significant change points can be revealed by the application of this technique to time-series data on the murder crime dataset, which is considered for this study.
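The two estimators of when a change occurred (the CUSUM estimator and the MSE estimator) can be sketched as follows. The function names and the data are hypothetical, and the split is restricted to positions that leave both segments non-empty, which is an assumption of this sketch.

```python
def cusum_estimate(values):
    """1-based index m of the last point before the change: argmax |S_i|."""
    mean = sum(values) / len(values)
    running, best_m, best_abs = 0.0, 0, 0.0
    for i, x in enumerate(values, start=1):
        running += x - mean  # running is S_i
        if abs(running) > best_abs:
            best_m, best_abs = i, abs(running)
    return best_m


def mse_estimate(values):
    """1-based index m that minimizes the two-segment squared error."""
    best_m, best_mse = None, float("inf")
    for m in range(1, len(values)):  # keep both segments non-empty
        before, after = values[:m], values[m:]
        mean_before = sum(before) / len(before)
        mean_after = sum(after) / len(after)
        mse = sum((x - mean_before) ** 2 for x in before) + sum(
            (x - mean_after) ** 2 for x in after
        )
        if mse < best_mse:
            best_m, best_mse = m, mse
    return best_m


# Hypothetical series with a drop after the fifth point.
series = [50, 52, 49, 51, 50, 40, 41, 39, 40, 42]
print(cusum_estimate(series), mse_estimate(series))
```

For this series both estimators return 5, so the fifth point is the last one before the change and the sixth is the first one after it. To look for further changes, the same procedure would be repeated on the segments before and after the detected point.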

Research Methodology.
Data preprocessing was conducted in the Python high-level programming language. Python supports modules and packages, which makes it attractive for rapid application development.
The Python package pandas was used for practical real-world data analysis. Pandas is well suited to the dataset because the data are tabular with heterogeneously typed columns, as in an Excel spreadsheet. Furthermore, the data are time-series data. Data scientists follow several stages when working with data, namely, cleaning the data, analyzing or modeling the data, and finally organizing the results in the form of tables and graphs. Pandas is well suited to all these stages. Figure 1 depicts the process flow diagram of the research conducted. As shown in Figure 1, the South African crime data were preprocessed using Python data analytics tools. CUSUM and Bootstrapping techniques were applied to the preprocessed data to detect abrupt changes that occurred at different points.
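A minimal sketch of this kind of preprocessing with pandas is given below. The column names (`Province`, `Station`, `Category`, and one column per year) and all values are hypothetical stand-ins for the real dataset, which is not reproduced here.

```python
import pandas as pd

# Hypothetical rows: one row per police station and crime category,
# with one column per year.
records = pd.DataFrame(
    {
        "Province": ["Gauteng", "Gauteng", "Limpopo", "Limpopo"],
        "Station": ["Station A", "Station B", "Station C", "Station C"],
        "Category": ["Murder", "Murder", "Murder", "Burglary"],
        "2005": [10, 20, 5, 7],
        "2006": [12, 18, 6, 9],
    }
)

year_columns = ["2005", "2006"]

# Keep only murder records, then sum the station rows within each province.
murders = records[records["Category"] == "Murder"]
provincial = murders.groupby("Province")[year_columns].sum()

# Average over the years, analogous to an "Ave" column in a summary table.
provincial["Ave"] = provincial[year_columns].mean(axis=1)
print(provincial)
```

Grouping by province and summing the year columns turns station-level counts into one time series per province, which is the input the change-point analysis needs.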

Results and Discussion
The murder statistics at each police station in every province from 2005 to 2015 were summed using Python to provide provincial statistics on murder, which are displayed in Table 1. Table 1 shows the murder statistics in South Africa per province from 2005 to 2015. It shows that the province of KwaZulu-Natal has the highest average number of murders (4149.82) over the ten-year period from 2005 to 2015. This is followed by the provinces of Gauteng (3514.09) and Eastern Cape (3402.18). Table 1 also shows that, on average, more murders were committed in 2015 than any other year. The graphical representation of the murder statistics of the nine provinces of South Africa is depicted in Figures 1-9 of Supplementary Materials-1 of this article. Figures 1-9 show the number of crimes in each of the nine provinces for the ten-year period (2005-2016). For instance, Figure 9 of Supplementary Materials-1 shows that 750 murders occurred in 2005-2006 and more than 900 murders occurred in 2015-2016 in the North West Province of South Africa. It is possible to display the murder crime data on a control chart, but the drawback is that significant change points will not be pinpointed [10]. Comparing year to year, Table 2 (Figures 2(a)-9(b)) on the rates of population and murder crime shows the same pattern throughout: the crime rate does not correspond with the population.
Conversely, comparing province to province in terms of high or low population and crime rates, the results of our experiment depicted in Figures 1(a)-9(b) of Supplementary Material-2 and Table 3 show that the four provinces with the highest populations (Gauteng, KwaZulu-Natal, Eastern Cape, and Western Cape) have the highest crime rates, and the province with the smallest population (Northern Cape) has the lowest crime rate. Table 3 shows the average population and the murder crime in South Africa per province over a ten-year period. Interestingly, KwaZulu-Natal, with the second largest population, has the highest murder rate.
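One way to put murder counts from provinces of different sizes on a common scale is a rate per 100,000 residents. The sketch below assumes that normalisation, which the text does not state explicitly, and uses made-up province names and figures purely for illustration.

```python
def murder_rate_per_100k(murders, population):
    """Murders per 100,000 residents (an assumed normalisation)."""
    return murders * 100_000 / population


# Hypothetical figures, not taken from the study's tables.
provinces = {
    "Province A": {"murders": 4000, "population": 10_000_000},
    "Province B": {"murders": 400, "population": 1_000_000},
    "Province C": {"murders": 900, "population": 1_500_000},
}

rates = {
    name: murder_rate_per_100k(figures["murders"], figures["population"])
    for name, figures in provinces.items()
}

for name, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {rate:.1f} murders per 100,000")
```

In this made-up example the province with the most murders does not have the highest rate, which is why raw counts and population-adjusted rates can rank provinces differently.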
In order to observe changes in the murder trends during the study period, such as when changes occurred and by how much, a change-point analysis was performed. The results of the experiments are presented in the graphs and tables below. Figure 2 shows a graphical presentation of the results of the change-point analysis with background changes and control limits, while Figure 3 displays the results of the CUSUM analysis. The shaded background represents a region expected to contain all the values based on the current model that a change occurred. The one change is represented by the shift in the shaded background. In Figure 2, the red lines represent the upper and lower limits, and it can be observed from the figure that some points appear above the upper limit between 2005 and 2007 and below the lower limit around 2011. These points can be labelled as outliers as they fall outside the boundary. In Figure 3, the blue region of the CUSUM represents the existence of a change. Significant changes occurred in the period from 2009 to 2015. The CUSUM graph shows a descending trend. Table 4 shows the results of the bootstrapping analysis of murder data of KwaZulu-Natal. The analysis detected only one change, in 2011. This year represents the first year of the change. The confidence level, which indicates how confident the analysis is that the change actually happened, is 98%. Table 4 indicates that the murder statistic was 3324 prior to the change and 3626 after the change. Table 4 also gives a level associated with each change. Any number of levels can exist, depending on the number of changes found. The level 1 change is the change that is most visibly apparent in the plot in Figure 2. The level 1 change is also apparent in the CUSUM chart displayed in Figure 3. Figure 4 shows a graphical presentation of the results of the change-point analysis with background changes and control limits, while Figure 5 displays the results of the CUSUM analysis.
Figure 4 shows that there is no point outside the control limits. However, there is one change represented by the shift in the blue region in the background. This change would have been missed by a control chart because all points are within the control limits. Figure 5 shows the CUSUM chart with background changes. The straightness of the line segments before and after the changes indicates that the changes were fairly sudden. The CUSUM chart shows a descending trend in the blue region. Determining the exact time of the reduction of murders is the key to solving the murder problem.

The Scientific World Journal
In order to get a better understanding of the number and timing of change points, bootstrapping was used. Table 5 shows the results of the bootstrapping analysis of murder data for Gauteng. The analysis detected a level 1 change that occurred in 2011. As shown in Table 5, the number of murders reduced from 3401 to 3012, with a confidence interval for the change between 2009 and 2012 at a confidence level of 97%. Authorities require insight into why there was a drastic reduction in murders in 2011, as pinpointed by the change-point analysis. Figure 6 shows a graphical presentation of the results of the change-point analysis with background changes and control limits, while Figure 7 displays the results of the CUSUM analysis. Figure 6 reveals one change point in the blue-shaded region. All points are within the control limits. In Figure 7, the CUSUM plot shows significant changes in the blue region. The CUSUM chart detects significant changes after 2012, and there is an ascending trend in the blue region. Table 6 shows the results of the bootstrapping analysis of murder data for Western Cape. Table 6 shows a level 1 change in 2012. The confidence interval for the change is between 2011 and 2013 at a confidence level of 97%. The number of murders in the Western Cape Province increased from 2070 to 3106.2. The level 1 change is the change that is most visibly apparent in the plot in Figure 6.

Table 1: Murder statistics in South Africa per province from 2005 to 2015 (only the rows recoverable from the source are shown).
Province          2005     2006     2007     2008     2009     2010     2011     2012     2013     2014     2015      Ave
KwaZulu-Natal     4891     4984     4686     4737     4214     3740     3418     3623     3616     3810     3929  4149.82
Mpumalanga         877      858      831      895      864      717      726      693      806      831      859   814.27
Northern Cape      390      412      418      411      375      339      336      412      437      413      372   392.27
Limpopo            686      745      690      745      761      663      734      701      728      777
Ave            2046.00  2111.20  2040.70  2009.20  1877.60  1790.30  1753.50  1822.50  1903.60  1981.90  2068.80

Table 2: Total population statistics in South Africa per province over a ten-year period.
The level 1 change is apparent in the CUSUM chart displayed in Figure 7. Figure 8 shows a graphical presentation of the results of the change-point analysis with background changes and control limits, while Figure 9 shows the results of the CUSUM analysis. Table 7 below shows the results of the change-point analysis on the Eastern Cape murder data given in Table 1. The detected change is shown in Table 7. The confidence interval for the change is between 2008 and 2012 at a confidence level of 94%. The number of murders in the Eastern Cape Province decreased from 3284 to 3206. Figure 10 shows a graphical presentation of the results of the change-point analysis with background changes and control limits, while Figure 11 shows the results of the CUSUM analysis.
In Figure 10, the blue region shows one change point after 2009. All points are within the control limits. In Figure 11, the CUSUM chart concurs with the one change point. Significant changes are detected in the blue region. The CUSUM chart shows an ascending trend. Table 8 shows the results of the bootstrapping analysis of murder data for Free State. The analysis discovered a level 1 change in 2009. The result is shown in Table 8. The confidence interval for the change is between 2009 and 2010 at a confidence level of 92%. The number of murders in the Free State Province increased from 891.3 to 961.17. Figure 12 shows a graphical presentation of the results of the change-point analysis with background changes and control limits, while Figure 13 shows the results of the CUSUM analysis.
In Figure 12, the change-point analysis shows one significant change in the blue region. The figure also shows one point below the lower limit that represents an extreme value. There was a downward trend in the murder statistics after 2009. In Figure 13, the CUSUM chart confirms this change with the presence of a blue region. Table 9 below shows the results of the bootstrapping analysis of murder data for Mpumalanga.
The results in Table 9 show a level 1 change in 2011. The confidence interval for the change is between 2011 and 2012 at a confidence level of 92%. The number of murders in the Mpumalanga Province decreased from 711 to 686. The level 1 change is the change that is most visibly apparent in the plot in Figure 12. The level 1 change is also apparent in the CUSUM chart displayed in Figure 13. Figure 14 shows a graphical presentation of the results of the change-point analysis with background changes and control limits, while Figure 15 shows the results of the CUSUM analysis.
In Figure 14, there are no changes displayed in the blue region. All points appear within the control limits for the study period. In Figure 15, the CUSUM chart shows no significant changes, as indicated by the absence of a blue region. The results of the bootstrapping analysis of murder data given in Mpumalanga also showed no significant changes in murder data. Figure 16 shows a graphical presentation of the results of the change-point analysis with background changes and control limits, while Figure 17 shows the results of the CUSUM analysis.
In Figure 16, there are no changes displayed in the blue region. All points appear within the control limits for the study period. In Figure 17, the CUSUM chart shows no significant changes, as indicated by the absence of a blue region. The results of the bootstrapping analysis of murder data for Limpopo also showed no significant changes in murder data. Figure 18 shows a graphical presentation of the results of the change-point analysis with background changes and control limits, while Figure 19 shows the results of the CUSUM analysis.
In Figure 18, there are no changes displayed in the blue region. All points appear within the control limits for the study period. In Figure 19, the CUSUM chart shows no significant changes, as indicated by the absence of a blue region. The results of the bootstrapping analysis of murder data for the North West Province also showed no significant changes in murder data.
From the experiment, it is clear that abrupt shifts in murder data across provinces were detected by analyzing change points and the level of changes. A level 1 change was detected in the analysis, and this was apparent in the CUSUM charts depicted. Table 10 below shows the instances of upward, downward, and no shift in the murder trends from 2005 to 2015.
In order to generate results that are more precise, the number of bootstraps must be increased, which doubles the duration of the analysis [20].
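The link between the number of bootstraps and precision can be made concrete with a short worked equation. It rests on an assumption not stated in the text: that each bootstrap is an independent trial, so the count $K$ of bootstraps with a smaller magnitude of change is binomial with some true proportion $p$.

```latex
% Standard error of the estimated confidence level \hat{p} = K / N,
% assuming K follows a binomial distribution with parameters N and p.
\mathrm{SE}(\hat{p}) = \sqrt{\frac{p\,(1 - p)}{N}}

% Example with p = 0.95:
%   N = 1000:  \sqrt{0.95 \times 0.05 / 1000} \approx 0.0069  (about 0.7 percentage points)
%   N = 4000:  \sqrt{0.95 \times 0.05 / 4000} \approx 0.0034  (about 0.3 percentage points)
```

Under this assumption the standard error shrinks with the square root of $N$, so halving it requires roughly four times as many bootstraps.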

Conclusions
The change-point analysis of murder data over the ten-year period is preferred to the control chart because, in many instances, the control chart missed the changes. A review by authorities of when the changes occurred can provide valuable insight for reducing the number of murders committed in South Africa. The change-point analysis shows trends in the data and gives authorities a big-picture view of the data.

Data Availability
The South African crime and population datasets used to support the findings of this study are available at https://www.kaggle.com/slwessels/crime-statistics-for-southafrica and http://www.statssa.gov.za/publications/P0318/P03182018.pdf.