Prediction of Future Terrorist Activities Using Deep Neural Networks

Institute of Computing, Kohat University of Science and Technology, Kohat, Pakistan; Center for Excellence in IT, Institute of Management Sciences, Peshawar, Pakistan; Department of Information Technology, University of Haripur, Haripur, Pakistan; Department of Information Technology, Abbotabad University of Science and Technology, Havelian, Pakistan; Faculty of Engineering and Information Technology, Northern University, Nowshehra, Pakistan; Faculty of Computer & Information Technology, Al-Madinah International University, Kuala Lumpur, Malaysia; King Abdulaziz University, Jeddah, Saudi Arabia


Introduction
One of the most important threats to today's civilization is terrorism, which has affected the quality of life of people across the world [1]. Terrorism means the intentional, indiscriminate, and illegal use of power and violence to create terror among the general population in order to achieve political, monetary, religious, or legal objectives. The definition of terrorism according to Hoffman [2] is "the deliberate creation and exploitation of fear through violence or the threat of violence in the pursuit of political change." The objective of terrorism is to create instability by spreading fear, anxiety, and uncertainty on a scale larger than a single individual. According to the Global Terrorism Database (GTD), 1,411 terrorist attacks happened in 2019 alone, causing 6,362 fatalities and badly affecting the quality of life of individuals in society. A visualization of the world map showing different terrorist activities is given in Figure 1 (image source: https://www.start.umd.edu/gtd/). The orange color shows a high intensity value, computed as a combination of incident fatalities and injuries. The map shows a very high rate of terrorism in South Asia and the Middle East. The response to terrorist events is a constant sense of fear, helplessness, anger, and intolerance or aggression towards certain ethnic or religious groups. It is equally important to understand the emotional reactions of the population to terrorist events, so that assistance can be designed to help those who are suffering and to keep them from carrying out another terrorist activity as revenge. Terrorism has been studied for decades to understand the major factors causing it, how to perform counterterrorism, and its social and economic effects [3,4].
However, because of the complex nature of terrorism, it is difficult to find an effective counterterrorism solution that protects the lives of individuals. Identifying terrorist ideologies and predicting future terrorist attacks have proven to be of great importance, but both are time-consuming processes.
Machine learning algorithms have recently been used to study the different factors of terrorism [5,6]. Neural networks (NN), and deep neural networks (DNN) in particular, are gaining popularity mainly because a huge amount of labelled data has recently become available. Advancements in computer technologies [7][8][9] have produced computer systems powerful enough to perform the computation required by DNN. In this paper, NN and DNN models are used to predict different factors of terrorist activities. The model can help law enforcement agencies make predictions before an incident actually happens and potentially causes the loss of precious lives. The predicted factors are explained below.
(i) Suicide: to predict whether a terrorist activity is going to be suicide or not.
(ii) Success: to predict whether a terrorist activity will succeed or not.
(iii) Weapon type: to classify the general type of weapon used in a terrorist activity.
(iv) Region: to classify the region that will be targeted by the terrorist activity.
(v) Attack type: to classify the type of attack carried out as a terrorist activity.
These predictions are important for counterterrorism. Deep learning can make these predictions efficiently and can help law enforcement agencies devise mechanisms to deal with terrorists and protect the lives of individuals. With the help of these tools, a terrorist activity can be stopped before it happens and causes destruction in terms of lives, infrastructure, or law. The rest of the paper is organized as follows. Related work is explained in Section 2 to highlight the current state-of-the-art research in the field. The proposed methodology is explained in Section 3, which also gives a detailed analysis of the dataset and explains the architectures of the NN and DNN used for the prediction of different factors. Results are demonstrated in Section 4, and the paper is concluded in Section 5 with possible future research directions.

Related Work
Terrorism can affect a society very badly and can have a huge impact on people. The topic has been studied extensively over the last few decades to understand its causes and to develop effective counterterrorism mechanisms that reduce the chances of terrorist activities. Machine learning algorithms and data mining techniques have also been applied to understand the different factors involved in a terrorist activity. In 2004, Singh et al. [10] at the University of Connecticut presented an adaptive safety analysis and monitoring (ASAM) system. The system used hidden Markov models (HMMs) and Bayesian networks (BNs) to detect, track, and predict potential terrorist activities in real time. The paper demonstrated the use of ASAM in analyzing the vulnerabilities at the Athens 2004 Olympics. Also in 2004, Tranchita et al. [11] developed a classification model that includes internal and external, natural and unnatural (man-caused) events. They developed a new security analysis method that predicts event uncertainties.
In [12], Godwin et al. developed a visual analytical approach to effectively identify related entities such as terrorist groups, events, and locations based on a 2D layout. The paper demonstrates a sequence comparison technique from bioinformatics, modified to incorporate the element of time. The paper claims that the system reveals relationships between entities that are not easily detectable using traditional methods. In 2009, Ozgul et al. [13] proposed an ensemble framework that can classify and predict terrorist groups using four different classifiers: Naïve Bayes, K-NN, Iterative Dichotomiser 3, and decision stump. The authors demonstrated that the ensemble framework performs better than the individual models. In 2011, Dixon et al. [14] developed a neural network-based framework for counterterrorism. The authors used a game designed by criminologists and psychologists to generate data for testing the suitability of AI techniques for counterterrorism. The authors investigated a neural network and achieved a 60% success rate in identifying deceptive behaviour. In 2014, Pilley [15] predicted terrorist groups using the CLOPE algorithm.
In 2016, Toure and Gangopadhyay [16] collected incident data from a real-time system to develop a risk model that calculates the terrorism risk level of different locations. A set of rules was also proposed along with the risk model to predict future terrorist activities. The paper claims an accuracy of up to 96%. In another study in 2017, Saha et al. [17] predicted attack types, weapon types used, and target types (i.e., the type of people attacked) using an ensemble learning algorithm. The paper claims an accuracy in the range of 79% to 86%. In 2017, Mo et al. [18] focused on the prediction of terrorist events from the GTD with data mining techniques. SVM, Naïve Bayes, and logistic regression were used, and they demonstrated an accuracy of up to 78%. In [19], Ding et al. used machine learning methods (NNET, SVM, and Random Forest) to simulate the risk of terrorist attacks. The model was able to predict the places where terrorist events might occur with a success rate of 96%. In 2017, Garg et al. [20] studied the sentiments and survival of tweets before the terrorist attack on September 18, 2016, on security forces by four different terrorists. Different attributes of tweets, such as last retweet, number of retweets, and number of favorites, were used to study the sentiments of the tweets.
In 2018, Verma et al. [21] used five different machine learning models, i.e., SVM, ANN, Naïve Bayes, Random Forest, and Decision Trees, to make predictions on attack type, attack region, and weapon type, reporting an accuracy of around 90%. In 2018, Li et al. [22] predicted the behavior of terrorist groups by presenting a comprehensive framework that uses social network analysis, wavelet transform, and pattern recognition approaches to understand the dynamics of a terrorist group and eventually predict its attack behavior. The paper claims that the framework predicts the behavior of terrorist groups accurately. Zhang et al. [23] in 2018 improved the location recommendation algorithm with multisource factors and spatial characteristics using data on terrorist attacks in Southeast Asia from 1970 to 2016. The model was used to build a spatial risk assessment model of terrorist attacks. The paper claims an accuracy of up to 88%.
In another study, Hao et al. [24] used geospatial statistics to analyze the spatiotemporal evolution of terrorist attacks in Indo-China. Random Forest is used to predict the risk of terrorist attacks using 15 driving factors. In 2019, Agarwal et al. [25] analyzed the GTD dataset and made predictions on different factors that might have given a blow to terrorism. Different data mining and machine learning algorithms such as SVM, Random Forest, and logistic regression were used to understand the dataset and predict factors such as the success of a terrorist attack, the group involved in it, and the effect of different external factors. In 2019, Kalaiarasi et al. [26] developed multiple classifiers to group and predict different terrorist activities using the k-NN algorithm and Random Forest techniques. They used the GTD dataset for detection of terrorism. In 2019, Maniraj et al. [27] developed a system that examines the growth or decay of terrorist groups by time, location, type of attack, target motives, weapon type, and availability. They analyzed the GTD dataset and used a machine learning algorithm that can predict the probability of attacks in different regions. In 2019, Christie [28] carried out a thesis study to understand the dynamics of unclaimed terrorism events in Pakistan using machine learning algorithms. The study made predictions on terrorist attributes such as attack, target, weapon type, spatial attack, and lethality of attacks, and attempted to match unattributed terrorist attacks to known terrorist groups. In 2019, Ahmad et al. [29] developed a method for detection and classification of social media-based extremist affiliations based on sentiment analysis. The focus was to classify tweets into two classes: extremist and nonextremist. The system uses deep learning-based sentiment analysis to classify the tweets. Other similar studies in 2020 can be found in [30][31][32].
All previous studies have applied machine learning and deep learning techniques to build AI-based models for terrorism. Current state-of-the-art research papers are based on understanding the pattern of terrorism and have proposed different solutions to analyze factors of terrorism. However, to the best of the authors' knowledge, no research work has been carried out to predict future terrorist activities together with factors such as success, suicide, weapon type, attack type, and region. There is therefore a research gap in modeling and predicting future terrorist activities using deep learning. This research paper compares the performance of traditional machine learning and deep neural networks and concludes that a deep neural network is a suitable model for prediction of future terrorist activities.

Data Analysis.
In this section, a detailed analysis of the dataset is given. The preprocessing performed on the dataset is also explained.

Feature Selection.
The National Consortium for the Study of Terrorism and Responses to Terrorism (START) has prepared a dataset known as the Global Terrorism Database (GTD) (https://www.start.umd.edu/gtd). GTD contains information about terrorist activities from 1970 until 2018, including more than 181,000 different instances of terrorism. In this paper, 34 attributes are taken for the analysis (some attributes are redundant and hence discarded). These attributes along with their descriptions are given in Table 1.

Prediction of Different Factors of Terrorist Activities.
The following are the different factors that the neural network and deep neural network will be trained to learn.
(1) Suicide. This field indicates whether the attack is suicide or not suicide. 1 = "Yes" means that the incident was a suicide attack. 0 = "No" means there is no indication that the incident was a suicide attack. Dimension of the dataset is 350,116 × 34. 90% of the data is used for training (315,104 instances) and 10% is used for testing (35,012 instances). Both "Yes" and "No" classes have 175,058 instances.
(2) Success. This field indicates the success of a terrorist strike. 1 = "Yes" means that the incident was successful. 0 = "No" means that the incident was not successful. Dimension of the dataset is 323,264 × 34. 90% of the dataset is taken as training (290,937 instances) and 10% is taken as testing (32,327 instances). Each class has 161,632 instances.
(3) Weapon Type. This field indicates the general type of weapon used in the incident. In the dataset, 13 different labels are used to represent different types of weapon. These labels are explained below.
(1) Biological
(2) Chemical
(3) Radiological
(4) Left as blank
(5) [remaining weapon type labels are missing from the source text]

Region labels (list truncated in the source text):
(1) North America
(2) Central America and Caribbean
(3) South America
(4) East Asia

Table 1 (fragment): attributes and their descriptions.
1. iyear: This field contains the year in which the incident occurred
2. imonth: This field contains the number of the month in which the incident occurred
3. iday: This field contains the numeric day of the month on which the incident occurred
4. Extended: 1 = "Yes," the duration of an incident extended more than 24 hours; 0 = "No," the duration of an incident extended less than 24 hours
5. Provstate: Name (at the time of event) of the 1st order subnational administrative region
6. Latitude: The latitude of the city in which the event occurred
7. Longitude: The longitude of the city in which the event occurred
8. Specificity
9. Vicinity: The region in nearby location
10. Crit1

Attack type labels:
(1) Assassination
(2) Armed assault
(3) Bombing/explosion
(4) Hijacking
(5) Hostage taking (barricade incident)
(6) Hostage taking (kidnapping)
(7) Facility/infrastructure attack
(8) Unarmed assaults
(9) Unknown
Dimension of the dataset is 957,242 × 34. 90% of the dataset is used for training (861,517 instances) and 10% is used for testing (95,725 instances). Each class has 88,255 instances.

Text to Numbers.
In the GTD dataset, some features are in text format, for instance, group name and country name. NN and DNN cannot process text features directly. Multiple techniques exist to convert text data to numbers, e.g., TF-IDF, Word2Vec, GloVe, and one-hot encoding. In this paper, the LabelEncoder class of the sklearn library is used to convert nonnumeric data to numeric data; it maps any labels that are hashable and comparable to numerical labels.
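As an illustration of the label encoding step described above, the following is a minimal sketch using the LabelEncoder class of sklearn. The column values are hypothetical stand-ins for a text-valued GTD column and are not taken from the dataset; how the encoder was applied per column in the paper is not stated, so this only shows the basic mechanism.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical values standing in for a text-valued column such as country name.
countries = ["Pakistan", "Iraq", "India", "Iraq", "Pakistan"]

encoder = LabelEncoder()
# fit_transform sorts the distinct labels and maps each one to an integer code.
codes = encoder.fit_transform(countries)

# inverse_transform recovers the original strings from the integer codes.
decoded = encoder.inverse_transform(codes)
```

Note that the integer codes impose an arbitrary ordering on the categories, which is one reason one-hot encoding is sometimes preferred for input features.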

Missing Data.
The dataset contains many missing values, i.e., cells that do not contain any data, which result in NaN when processed by NN. Different interpolation techniques can be used to fill the missing data. In this paper, SimpleImputer of the sklearn library is used, replacing the missing values with the mean of each column.
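The mean imputation described above can be sketched as follows. The small matrix is a hypothetical example, not GTD data; only the use of SimpleImputer with the column mean comes from the text.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical numeric feature matrix with missing cells marked as NaN.
X = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
])

# strategy="mean" replaces each NaN with the mean of the observed values in its column.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imputer.fit_transform(X)
```

Here the missing cell in the first column becomes 2.0 (the mean of 1.0 and 3.0) and the one in the second column becomes 15.0.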

Dealing with Unbalanced Classes.
During the analysis of the dataset, it was observed that the data are not balanced across classes: some classes have many instances, while others have very few. NN and DNN trained on unbalanced data are biased [33] towards the classes having more instances. To balance the data, the Synthetic Minority Oversampling Technique (SMOTE), presented by Chawla et al. in 2002 [34] and later made available as a Python tool in [35], is used. The NN and DNN presented in this paper are trained on balanced data.
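The core idea of SMOTE is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The following is a simplified sketch of that idea in NumPy; it is not the Python tool cited in [35], and the function name `smote_sketch`, its parameters, and the toy data are assumptions made for illustration.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances between minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]  # k nearest neighbours per sample

    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        base = rng.integers(len(X_min))             # random minority sample
        other = neighbours[base, rng.integers(k)]   # one of its k neighbours
        gap = rng.random()                          # position along the segment, in [0, 1)
        synthetic[i] = X_min[base] + gap * (X_min[other] - X_min[base])
    return synthetic

# Hypothetical minority class: the four corners of the unit square.
X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_new = smote_sketch(X_minority, n_new=6)
```

Every synthetic point lies on a line segment between two existing minority samples, so the new data stay inside the region already occupied by that class.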

Normalization.
In GTD, data are in different ranges. Some columns have values of 0 and 1, while others have values in the hundreds or thousands. In this situation, it is difficult for a learning algorithm to learn the pattern and converge to a global minimum. Therefore, before the data are processed by a learning model, they should be normalized, e.g., to the range of 0 to 1 or −1 to 1. In this paper, MinMaxScaler of the sklearn library is named as the scaler. The transformation described, however, is standardization (provided in sklearn by StandardScaler rather than MinMaxScaler): for each value of a feature, the average of all values is subtracted and the result is divided by the standard deviation, which gives each feature zero mean and unit variance rather than a strictly bounded range. The formula of standardization is expressed in equation (1), where $X_i$ are the samples for a given feature, $\bar{X}$ is the average of all samples of the feature, and $s$ is the standard deviation.

$$z_i = \frac{X_i - \bar{X}}{s} \qquad (1)$$
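To make the two scalings mentioned in this section concrete, the following minimal sketch contrasts standardization (subtract the mean, divide by the standard deviation) with min-max scaling to the range 0 to 1. The sample values are hypothetical, and plain NumPy is used in place of the sklearn scaler classes as an assumption for illustration.

```python
import numpy as np

# Hypothetical feature column with values on a large scale.
x = np.array([100.0, 200.0, 300.0, 400.0])

# Standardization: subtract the mean and divide by the standard deviation.
# The result has zero mean and unit variance but is not bounded to [-1, 1].
z = (x - x.mean()) / x.std()

# Min-max scaling: map the smallest value to 0 and the largest to 1.
m = (x - x.min()) / (x.max() - x.min())
```

Which of the two was applied affects the range of the network inputs, so it is worth checking when reproducing the experiments.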

Learning Model.
In this section, the learning model used for the prediction of terrorist activities is explained. Two different models are developed: one based on NN and the other based on DNN. An NN [36][37][38] is a graph of nodes that perform computation. These nodes are connected with each other by weighted edges. Some of the nodes are input nodes, which take the input features, and some are output nodes, which make predictions. During forward propagation, the input features are multiplied by a matrix of weights, layer by layer, to eventually produce a prediction. We have developed five different models, one per predicted factor, and explain the learning process for one of them, suicide prediction. There are 34 features, of which 33 are input features and 1 is the output feature, which classifies whether an attack is suicide or not. To perform training, we store all data in a matrix. We have 315,104 training instances of terrorist activities; therefore, the size of the input matrix, represented by $X$, is 315,104 × 33. To train an NN, we need to provide a weight matrix whose number of rows equals the number of input features. If there are 10 units in the first layer, then the size of the weight matrix is 33 × 10. We initialize these weights randomly using the Glorot Uniform initializer. We also need to provide a bias, represented by $b$. The formula of this multiplication is shown in equation (2), where $W_1$ denotes the weights for the hidden layer, $b_1$ the bias, and $X$ the input matrix. The result is passed through a nonlinear function, ReLU [39], which is computed as $\mathrm{ReLU}(z) = \max(0, z)$.
For the output layer, we multiply the output of the hidden layer by a different set of weights. Suppose we have 10 units in the hidden layer and one unit in the output layer; then the dimension of the weight matrix is 10 × 1. We also need to add a bias at this layer. The calculation performed at the output layer is shown in equation (3), where $W_2$ and $b_2$ denote the weight and bias for the output layer and $A_1$, the output of the hidden layer, is the input to this layer. At the output layer, sigmoid [40] is computed as $\mathrm{sigmoid}(z) = 1/(1 + e^{-z})$.
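The forward propagation through one hidden layer and the output layer can be sketched in NumPy as follows, using the shapes given in the text (33 input features, 10 hidden units, 1 output unit). The random input rows, the zero-initialized biases, and the helper names are assumptions made for illustration; the text only specifies Glorot Uniform initialization for the weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def glorot_uniform(fan_in, fan_out):
    # Glorot uniform: sample from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)).
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

m, n_features, n_hidden = 5, 33, 10   # 5 rows stand in for the 315,104 training instances
X = rng.random((m, n_features))       # hypothetical, already-normalized inputs

W1, b1 = glorot_uniform(n_features, n_hidden), np.zeros(n_hidden)  # zero biases: an assumption
W2, b2 = glorot_uniform(n_hidden, 1), np.zeros(1)

A1 = relu(X @ W1 + b1)       # hidden layer activations, shape (m, 10)
A2 = sigmoid(A1 @ W2 + b2)   # predicted probabilities, shape (m, 1)
```

With the inputs stored one instance per row, the products are written as `X @ W1` and `A1 @ W2`, which matches the 33 × 10 and 10 × 1 weight shapes stated above.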
During the training phase of the NN, the prediction is made, represented by $A_2$ as shown in equation (3). Then, the loss is computed by comparing the predicted values with the actual values. We use the binary cross-entropy loss as shown in equation (4), where $m$ represents the number of samples and $Y$ denotes the actual output values. During the backpropagation process, the derivative of the loss is taken with respect to the parameters of the output layer and the hidden layer, and the weights are updated using an optimization technique, such as gradient descent [41], gradient descent with momentum [42], RMSProp [43], or Adam [44]. Backpropagation with gradient descent is shown in equation (5), where $\alpha$ is the learning rate, $L$ is the loss function given in equation (4), and $W$ and $b$ are the weights and bias. The training algorithm of the NN model for suicide prediction, with the gradient descent optimization algorithm, is given in Algorithm 1.
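The training loop described above (forward propagation, binary cross-entropy loss, backpropagation, gradient descent update) can be sketched on a toy problem as follows. The data, layer sizes, initialization, learning rate, and iteration count are all hypothetical choices for illustration and do not reproduce the paper's experiments.

```python
import numpy as np

def bce_loss(Y, A):
    # Binary cross-entropy averaged over the samples; eps guards against log(0).
    eps = 1e-12
    return float(-np.mean(Y * np.log(A + eps) + (1 - Y) * np.log(1 - A + eps)))

rng = np.random.default_rng(0)
X = rng.random((200, 3))                         # hypothetical inputs
Y = (X.sum(axis=1, keepdims=True) > 1.5) * 1.0   # hypothetical binary labels

W1, b1 = rng.normal(0, 0.5, (3, 4)), np.zeros(4)
W2, b2 = rng.normal(0, 0.5, (4, 1)), np.zeros(1)
alpha = 0.1                                      # learning rate
losses = []

for _ in range(500):
    # Forward propagation.
    Z1 = X @ W1 + b1
    A1 = np.maximum(0.0, Z1)
    A2 = 1.0 / (1.0 + np.exp(-(A1 @ W2 + b2)))
    losses.append(bce_loss(Y, A2))

    # Backpropagation: gradients of the loss with respect to each parameter.
    m = X.shape[0]
    dZ2 = (A2 - Y) / m
    dW2, db2 = A1.T @ dZ2, dZ2.sum(axis=0)
    dZ1 = (dZ2 @ W2.T) * (Z1 > 0)
    dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)

    # Gradient descent update: parameter = parameter - alpha * gradient.
    W1, b1 = W1 - alpha * dW1, b1 - alpha * db1
    W2, b2 = W2 - alpha * dW2, b2 - alpha * db2
```

The expression `A2 - Y` is the gradient of the binary cross-entropy with respect to the pre-activation of a sigmoid output, which is why no explicit sigmoid derivative appears in the code.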
A sample architecture of NN is given in Figure 2. The figure shows the input layer, which contains the input data. These data are passed to the hidden layer with $W$ and $b$ as shown in equation (2). The output of the hidden layer is passed to the output layer with $W$ and $b$ and computed as given in equation (3). From the output layer, the loss function is computed as given in equation (4). During the backpropagation process, weights are updated, taking into account the error between the actual $Y$ and the predicted $Y$, as shown in equation (5).
A DNN [45,46] has more hidden layers than a single-hidden-layer NN. Generally, a network with more than two hidden layers is considered a DNN. A much larger DNN has hundreds of layers; for instance, ResNet [47] has 152 layers. DNN has recently shown advancements in different fields and has achieved state-of-the-art accuracy in the applications given in [48][49][50]. A sample architecture of the DNN is given in Figure 3. The computation of forward propagation in DNN is the same as in NN, except that the computation for one hidden layer is now repeated for $L - 1$ hidden layers. The output layer $L$ computes the results as given in equation (3). Loss is computed as given in equation (4), giving the error between the actual and predicted $Y$. During backpropagation, the values of $W$ and $b$ for each layer are updated using the gradient descent optimization algorithm as shown in equation (5).
The algorithm of DNN is given in Algorithm 2. First, the weights and biases of all layers are initialized using the Glorot Uniform initializer [51]. Then, in the forward propagation, the linear function and the nonlinear activation (i.e., ReLU [39]) are computed at each layer. In the last layer, the binary cross-entropy loss function is used to compute the loss. In case of binary classification, the sigmoid [40] activation function is used, and in case of multiclass classification, softmax [52] is used. Then, during backpropagation, the derivative of the loss function with respect to the weights and biases is taken at every layer. The weights and biases are updated using gradient descent.
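The initialization and forward propagation of a multi-layer network as described above can be sketched as follows. The hidden layer widths, the zero biases, and the function names are assumptions for illustration; the layer widths used in the paper's experiments are not reproduced here.

```python
import numpy as np

def init_params(layer_sizes, seed=0):
    """Glorot-uniform weights and zero biases (zero biases are an assumption)."""
    rng = np.random.default_rng(seed)
    params = []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        W = rng.uniform(-limit, limit, size=(fan_in, fan_out))
        params.append((W, np.zeros(fan_out)))
    return params

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract the row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(X, params, multiclass=False):
    A = X
    for W, b in params[:-1]:               # hidden layers: linear step followed by ReLU
        A = np.maximum(0.0, A @ W + b)
    W, b = params[-1]                      # output layer: softmax or sigmoid
    Z = A @ W + b
    return softmax(Z) if multiclass else 1.0 / (1.0 + np.exp(-Z))

# Hypothetical widths: 33 inputs, three hidden layers, 9 output classes.
params = init_params([33, 64, 32, 16, 9])
X = np.random.default_rng(1).random((4, 33))
probs = forward(X, params, multiclass=True)
```

Each row of `probs` sums to 1, so the predicted class for an instance is the column with the largest probability.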
Gradient descent optimization is explained in Algorithms 1 and 2. However, gradient descent is a very basic optimization algorithm and is used here only to explain the concept.
There are more advanced optimization algorithms, such as gradient descent with momentum [42], RMSprop [43], and Adam [44]. In this paper, we use Adam optimization for the learning process, as it is one of the most effective optimization algorithms for training deep neural networks. Adam optimization can be expressed mathematically as in equation (6), where $v^{\mathrm{corrected}}_{dW^{[l]}}$ stores the exponentially weighted average of past gradients with bias correction for layer $l$, $s^{\mathrm{corrected}}_{dW^{[l]}}$ stores the exponentially weighted average of the squares of the past gradients with bias correction for layer $l$, $\beta_1$ and $\beta_2$ are hyperparameters that control the two exponentially weighted averages, $\alpha$ is the learning rate, $t$ counts the number of steps taken by Adam optimization, $l$ indexes the layer, and $\epsilon$ is a tiny value added to avoid a divide-by-zero error.
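A single Adam update, following the quantities named above (the two exponentially weighted averages, their bias correction, and the final step), can be sketched as follows. The default values for the hyperparameters are the commonly used ones and are an assumption, since the text does not state the values used; the quadratic objective is a toy example.

```python
import numpy as np

def adam_step(w, grad, v, s, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    v = beta1 * v + (1 - beta1) * grad          # weighted average of past gradients
    s = beta2 * s + (1 - beta2) * grad ** 2     # weighted average of squared gradients
    v_corrected = v / (1 - beta1 ** t)          # bias correction for early steps
    s_corrected = s / (1 - beta2 ** t)
    w = w - alpha * v_corrected / (np.sqrt(s_corrected) + eps)
    return w, v, s

# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3).
w = np.array([0.0])
v = np.zeros_like(w)
s = np.zeros_like(w)
for t in range(1, 2001):
    grad = 2.0 * (w - 3.0)
    w, v, s = adam_step(w, grad, v, s, t, alpha=0.05)
```

Because of the bias correction, the very first step has a magnitude of approximately `alpha` regardless of the gradient scale, which is one reason Adam is less sensitive to feature scaling than plain gradient descent.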
The main objective of this research work is to explore novel techniques in deep learning to understand different parameters of a terrorist activity, such as suicide, success, weapon type, type of attack, and region of attack. These factors help law enforcement agencies create strategies for counterterrorism. The deep learning algorithm was used to learn the pattern of the big data made available by GTD using recent optimization techniques and to make reasonable predictions and classifications. Even though many researchers have worked on AI solutions for counterterrorism, no one has studied an effective mechanism for understanding factors of terrorism using deep learning, which has recently become very popular with the increase in data and computational power [53,54]. To the best of the authors' knowledge, no comprehensive work is dedicated to predicting and classifying factors of terrorism using deep learning algorithms. Therefore, it is sensible to study the problem of predicting future terrorist activities from the perspective of deep learning to demonstrate the full potential of deep neural networks.

Experimental Setup on Cluster.
The working environment for all experiments in this paper is given in Table 2.

Architecture of NN and DNN.
In the experiments for this paper, the NN consists of one hidden layer having 10 units. The DNN consists of 5 hidden layers. The first layer has 100

Accuracy in Train and Test Datasets.
The accuracy on the train and test datasets for every iteration, computed by NN and DNN, is given in Figure 4. The accuracy over 500 iterations for suicide prediction on NN is shown in Figure 4(a) and that on DNN is shown in Figure 4(b). It can be observed that the training accuracy of DNN is more stable than that of NN. Although the accuracy of NN after 500 iterations is very close to that of DNN, the stability of the train and test curves in DNN is promising and gives better accuracy on different test datasets. The per-iteration accuracy for success prediction in NN is shown in Figure 4(c) and that in DNN is shown in Figure 4(d). NN is not able to make any improvement after 200 iterations, and its accuracy remains around 86%, whereas the test accuracy of DNN is around 92%. This demonstrates the performance improvement of DNN over NN. The accuracy at every iteration for weapon type prediction in NN is shown in Figure 4(e). It can be seen that after 100 iterations, the accuracy remains close to 72%. The accuracy in DNN is shown in Figure 4(f), and after 100 iterations, the accuracy is close to 92%. This again demonstrates the improvement in accuracy of DNN over NN. The accuracy of NN for region prediction is shown in Figure 4(g). The maximum accuracy achieved is around 80%. However, more than 95% accuracy is achieved by DNN, as shown in Figure 4(h). The accuracy of attack type prediction by NN is shown in Figure 4(i); as shown in the figure, the accuracy is around 78%, while the accuracy achieved by DNN is around 92%, as shown in Figure 4(j). All these experiments demonstrate that as the number of layers is increased, the network is able to learn more complex nonlinearity in the big data and hence makes better predictions.

Comparison of Accuracy, Precision, Recall, and F1-Score in NN and DNN.
The formulae to calculate accuracy, precision, recall, and F1-Score are given in equation (7). TP means true positive, TN means true negative, FP means false positive, and FN means false negative. The comparison of accuracy on the train and test datasets computed by NN and DNN is given in Figure 5. All these experiments demonstrate that DNN achieves an accuracy of more than 91% on both train and test datasets. The maximum accuracy, around 98%, is achieved on the suicide dataset. The comparison of precision, recall, and F1-Score on test data computed by NN and DNN is given in Figure 6. It can be observed that DNN achieves more than 91% in precision, recall, and F1-Score. This is another demonstration that as the number of layers is increased, the network is able to learn the features in the dataset and make better predictions.
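The four measures named above follow the standard definitions in terms of TP, TN, FP, and FN, which can be sketched as follows. The counts in the example are hypothetical and are not results from the paper.

```python
def classification_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all predictions that are correct
    precision = tp / (tp + fp)                   # share of predicted positives that are correct
    recall = tp / (tp + fn)                      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return accuracy, precision, recall, f1

# Hypothetical confusion counts for a binary classifier.
acc, prec, rec, f1 = classification_metrics(tp=90, tn=85, fp=15, fn=10)
```

For these counts, the accuracy is 0.875 and the recall is 0.9; the sketch assumes that at least one positive is predicted and present, otherwise the divisions are undefined.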

Confusion Matrix.
The confusion matrix is a performance measurement for machine learning classification problems. In case of binary classification, the table is 2 × 2, showing true positives, true negatives, false positives, and false negatives. In case of multiclass classification, the table has size equal to the number of classes squared. The confusion matrix computed by DNN for suicide and success is given in Figure 7. The confusion matrix for weapon type, region, and attack type is given in Figure 8. A confusion matrix with large values on the diagonal demonstrates the high accuracy of the model. As shown in these figures, the confusion matrices have high values on the diagonals, indicating that DNN is an effective model for making predictions.

ALGORITHM 1: The training of neural network with gradient descent optimization algorithm.
Input: the whole dataset of GTD along with labels
Output: optimized values of W and b
Data: GTD Dataset
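As a sketch of the confusion matrix described in this section, the following builds the table for a small multiclass example, with rows for actual classes and columns for predicted classes. The labels are hypothetical, and the row/column orientation is a common convention assumed here rather than one stated in the text.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    # matrix[i][j] counts instances of actual class i predicted as class j.
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix

# Hypothetical labels for a 3-class problem.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)

# Accuracy is the sum of the diagonal divided by the number of instances.
accuracy = sum(cm[i][i] for i in range(3)) / len(y_true)
```

The diagonal entries are the correctly classified instances, which is why large diagonal values correspond to high accuracy.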

ROC Curve
The ROC (receiver operating characteristic) curve shows the performance of the classification model at all classification thresholds. The curve plots two parameters: true-positive rate (TPR) and false-positive rate (FPR). These parameters are defined in equation (8).
The ROC computed by DNN for the prediction of suicide, success, weapon type, region, and attack type is given in Figure 9. The ROC shows that DNN is able to make classifications with accuracy of more than 94%.
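The two ROC parameters follow the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN), evaluated at each threshold. The following minimal sketch computes the ROC points for a handful of hypothetical scores; the function name and the data are assumptions for illustration.

```python
def roc_points(y_true, scores, thresholds):
    points = []
    for thr in thresholds:
        # An instance is predicted positive when its score reaches the threshold.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        fn = sum(1 for y, s in zip(y_true, scores) if s < thr and y == 1)
        tn = sum(1 for y, s in zip(y_true, scores) if s < thr and y == 0)
        points.append((fp / (fp + tn), tp / (tp + fn)))   # (FPR, TPR)
    return points

# Hypothetical labels and predicted probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
points = roc_points(y_true, scores, thresholds=[0.0, 0.5, 1.0])
```

Lowering the threshold moves the point towards (1, 1), raising it moves the point towards (0, 0); the curve is the set of such points over all thresholds.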

Comparison of NN and DNN with Traditional Machine Learning Algorithms.
In this section, the performance of the models based on NN and DNN is compared with traditional machine learning algorithms, i.e., logistic regression, SVM, and Naïve Bayes. The comparison in terms of average train and test accuracy and average precision, recall, and F1-Score is shown in Table 3. These results demonstrate that DNN is the most suitable model for this type of dataset, which is an example of big data, where performance improves with more data and a deeper network. Traditional machine learning algorithms such as logistic regression, SVM, and Naïve Bayes, as well as a single-layer NN, are not able to capture the pattern in the dataset, and thus reach a maximum performance of approximately 84%. With DNN, it is possible to achieve 95% accuracy on average.

Conclusion
Terrorism is one of the most important threats to human life in any era. It can affect the quality of life of not only an individual but the whole society. The fear of terrorism restricts people from contributing to the development of their country. In every country, dealing with terrorism is the topmost priority of the government. They seek techniques to [...]

ALGORITHM 2 (fragment).
Input: the whole dataset of GTD along with labels
Output: optimized values of W and b
Data: GTD Datasets
(1) W[1..L] = random numbers // Glorot Uniform initializer
(2) b[1..L] = random numbers
[...]
A[j] = g(Z[j]) // g(Z) = max(0, Z)
(8) increment j by 1
[...]

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.