Nature-Inspired-Based Approach for Automated Cyberbullying Classification on Multimedia Social Networking

Training and Research, ICT Academy, Chennai, India; Department of Computer Science and Engineering, SNS College of Technology, Coimbatore, India; Department of Computer Science, Government Bikram College of Commerce, Patiala-147001, Punjab, India; Dept of Computer Science and Engineering, Chennai Institute of Technology, Chennai, India; Southern Federal University, Rostov-on-Don, Russia; Dept of Computer Science and Engineering, Jagran Lakecity University, Bhopal, India; School of Electronics and Electrical Engineering, Lovely Professional University, Phagwara 144411, India; Department of Information Technology, College of Computers and Information Technology, Taif University, P. O. Box 11099, Taif 21944, Saudi Arabia; Department of Computer Science, College of Computers and Information Technology, Taif University, P. O. Box 11099, Taif 21944, Saudi Arabia


Introduction
Cyberbullying (CB) is considered a new, electronic form of traditional bullying [1]. CB is defined as a repetitive, intentional, and aggressive act committed by a group or an individual against another group or individual through Information and Communication Technology (ICT) tools such as social media, the Internet, and mobile phones [2]. CB incidents are carried out entirely virtually, over Internet media, rather than in physical form. CB consists of hateful messages transmitted via social networking, e-mail, and similar channels through personal or public computers or personal mobile phones. It has emerged as a serious threat across nations [1]. Various privacy-preserving tools are adopted on the Internet to protect data; however, most mechanisms are challenged by the process of traffic classification [3], a vital workhorse for network management that becomes a key factor in assigning the privacy level used to separate malign from benign standpoints [4].
This is true when testing the methods with a selected dataset on the Dark Web Forum Portal [5]. Previous studies consider CB a distinct variant of traditional bullying [2,6]. The suggested differences between CB and traditional bullying reveal that findings on conventional bullying are inadequate for CB [7]. Evidence in [8] reveals that CB differs in several features, including its prevalence rates, protective and risk factors, risk outcomes, and the strategies adopted for its prevention.
The features of CB are partially related to and partially distinct from those of conventional bullying [2,9]. CB, on the other hand, has affected victims psychologically and physically with its increasing prevalence, and the greatest vulnerability is reported among youths [2].
Hence, it is vital to detect the CB context and its applications to reduce this vulnerability. However, in the cyber world, applications involving CB face difficulties associated with ignorance of the aggressors and their identity, lack of direct communication, and the consequences for others [10-15]. The lack of direct communication allows only a partial interpretation of the significance or nature of a message and leads to confusion over an individual's intent in the exchanged messages. Despite the problems in identifying the behavioral intent of an individual, the major factor that turns aggression into CB is the intention to cause harm [16].
In the current scenario, social network platforms rely on automated alerts that ask moderators to review reported CB content. However, most frameworks lack an automated intelligent system that detects such content and alerts the moderators faster than the traditional reporting system. Such a system enables the moderator to respond to the alert and take the required action, such as reporting the user or removing the content [17]. The major constraint on existing detection systems in CB research is the lack of input data. Existing research is conventionally carried out on available datasets or survey data, where the perpetrators or the victims report their impressions [18]. Another issue associated with automated CB detection is the proper operationalization of CB content, which considers only the available literature in the CB detection field in order to identify CB events accurately. This creates complexity in identifying the events, and hence, well-developed tools are essential for integrating the features with an automated decision model [19].
Various research studies on automated cyberbullying detection with intelligent systems are reported in [8, 18, 20-28]. These studies utilized machine learning algorithms for automated detection of CB content using several common and psychological features. The performance of these intelligent systems is reported to be low, principally because they are limited to an individual's comment and leave out the context. An existing study has reported that using the user context, which involves the characteristics of users and the history of their comments, improves the performance of CB detection/classification [17].
In this paper, we utilize an integrated feature model that collects data and trains the system taking psychological features, user comments, and the context into consideration for CB detection. A classification engine using an artificial neural network (ANN), inspired by [22], performs the CB classification, and each classification is monitored by the reward-penalty model of a Deep Reinforcement Learning (DRL) engine.
The study makes the following contributions to the field of CB detection: (a) The authors develop a series of frameworks that extract the CB context from raw input messages. The study uses a wide variety of features to train the feature extraction module, including psychological traits, user comments, and context. (b) The authors develop an integrated classification engine that combines an ANN with DRL to classify CB content and improve the results after each iteration based on the feedback obtained from the DRL mechanism. Here, the entire classification is carried out by the ANN algorithm, and the DRL provides a state-action-reward signal for each classified result.
The outline of the study is as follows: Section 2 discusses the related works. Section 3 presents the proposed classification engine. Section 4 evaluates the entire work. Section 5 concludes the work with possible directions for future research.

Related Works
Nandhini and Sheeba [20] presented a detection technique to combat CB on social media. The study extracts features such as nouns, pronouns, and adjectives from the text, together with the frequency of word occurrences. These features are used to classify various activities such as Harassment, Flaming, Terrorism, and Racism using a fuzzy logic-based approach. The study in [21] employed a Support Vector Machine (SVM) classifier to classify CB based on various features such as local, sentimental, contextual, and gender-specific language features. The SVM classifier, combined with a tf-idf measure and a linear kernel, identifies online harassment.
Kumar and Sachdeva [28] reviewed various studies and found that both direct and indirect CB features have a strong impact on machine learning classification. The classification results show that the SVM classifier has a higher classification rate than other supervised/unsupervised learning methods.
Al-garadi et al. [8] used SVM [21,28], naïve Bayes (NB) [25,28], k-nearest neighbor (KNN), and random forest (RF) [25] classifiers with various features extracted from Twitter data, including network, activity, and user information and tweet content. The features are selected using information gain, the $\chi^2$ test, and Pearson correlation. Furthermore, the classification results are optimized using a synthetic minority oversampling approach, and the classes are balanced with weight adjustment in the dataset. The results show that RF achieves higher classification accuracy than the other classifiers.
Balakrishnan et al. [25] developed an automated detection model that uses the Big Five and Dark Triad models to determine user personality. The classification is carried out with various machine learning classifiers (NB, RF, and J48) to detect the bully, spammer, aggressor, and normal classes. Psychological features are selected from the Twitter data for better tweet classification. The study confirmed that user personalities have a higher impact on detection than other traits.
Murnion et al. [18] developed Artificial Intelligence-based CB detection on top of an automated data collection system for chat data from online multiplayer games. The sentiment text analytics system is supported by a scoring scheme for optimal classification. The study assigns eight descriptive attributes, IsAbusive, IsPositive, IsNegative, HasBadLanguage, IsRacist, NoobRelated, SpecificTarget, and FilteredText, for the potential identification of CB. The estimation of the CB score found that both Twinword- and Microsoft-aware sentiment analysis performed poorly, with low classification scores.
Ho et al. [27] used 90 features categorized into 10 classes for classification with a logistic regression model. Detection is improved by training the model with 14 abusive words to reduce the false classification rate.
Balakrishnan et al. [24] used an RF classifier with multiple decision trees, where the final classification is determined by majority vote. The study selects 15 Twitter features [23] using the Big Five and Dark Triad models to determine user personalities.
Sánchez-Medina et al. [26] used ensemble classification trees with the Dark Triad to identify personality traits. The study used psychopathy, narcissism, and Machiavellianism to detect CB sexual assaults in social media.
Lee et al. [22] used an embedded vector representation, skip-gram word2vec, that represents words as vectors. Cosine similarity detects new abusive words, and then n-grams, blacklists, and edit-distance metrics are used to detect obfuscated words. Finally, a three-layered neural network model, which acts as an unsupervised learning model, is used for classification. Misclassification is reduced by employing a dataset of 1.5 million nonabusive words, which improves the neural network classification. The abovementioned research used minimal features to classify the datasets, and furthermore, the CB word is treated as the seed word for CB detection. However, the CB word is a distinctive vocabulary that fails to cover all cases.

Proposed Method
In the present research, the focus is not on a specific CB word; instead, vulgarity is determined from a weight score calculation and a harmfulness index estimated over the entire word sequence (the optimal words chosen by the feature selection method) of the collected tweets. This substantially reduces the cost of constructing training data and the dependency between phrases. The architecture of the proposed classification model is given in Figure 1.
We consider a dataset $D = \{(x_i, \tilde{c}_i)\}$, where $x_i$ are the Twitter CB data, taken without their labels $\tilde{c}_i$. The dataset is divided into smaller subsets $L \subset D$. The aim is to detect CB instances in the Twitter data, which may vary from short to long paragraphs.

Preprocessing.
The preprocessing step uses a lexical normalization method [29] whose components clean the input tweet data. It also converts numerical values into equivalent text. The spell corrector component helps to reduce out-of-vocabulary terms; beforehand, all redundant or missing values, including spelling errors and wrong punctuation, are cleaned.
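As a rough illustration of the cleaning steps described above, the following Python sketch lower-cases a tweet, drops URLs, collapses repeated punctuation, spells out digits, and applies a tiny spelling lexicon. It is not the lexical normalization method of [29]; the function name `normalize_tweet` and the tables `DIGIT_WORDS` and `SPELLING_FIXES` are hypothetical stand-ins chosen for the example.

```python
import re

# Hypothetical digit-to-word table; a real normalizer would handle full numbers.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}

# Hypothetical spelling lexicon mapping noisy tokens to canonical forms.
SPELLING_FIXES = {"u": "you", "r": "are", "luv": "love", "stoopid": "stupid"}


def normalize_tweet(text: str) -> str:
    """Clean one tweet: lower-case, drop URLs, fix punctuation and spelling."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)        # remove links
    text = re.sub(r"([!?.,])\1+", r"\1", text)       # collapse repeated punctuation
    # Spell out each ASCII digit as a separate word.
    text = re.sub(r"[0-9]", lambda m: f" {DIGIT_WORDS[m.group()]} ", text)
    tokens = re.findall(r"[a-z']+|[!?.,]", text)     # keep words and basic punctuation
    tokens = [SPELLING_FIXES.get(tok, tok) for tok in tokens]
    return " ".join(tokens)


if __name__ == "__main__":
    print(normalize_tweet("U r STOOPID!!! http://t.co/x 2 much"))
```

A dictionary lookup keeps the sketch short; a production spell corrector would typically rely on edit distance or a language model instead.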

Feature Selection.
The selection of features (given in Table 1) from the input Twitter datasets involves three different methods: Information Gain [30], chi-square ($\chi^2$) [31], and Pearson correlation [32]. These methods are employed to select the features from the preprocessed datasets.

Information Gain.
A decision tree algorithm is used to implement feature extraction with information gain. Information gain is an entropy-based measure that is widely used in the machine learning domain. It acts as a statistical method that assigns weights to features based on the correlation between the categories and the features.
We consider a dataset $S = (s_1, s_2, \ldots, s_n)$, regarded as a collection of $n$ instances, whose categorical class variable $C$ takes the values $c_i$ with probability $p(c_i)$. The entropy of $C$ is
$$H(C) = -\sum_{i} p(c_i) \log_2 p(c_i).$$
The information gain of each feature is used for classification of the input data, where $A_q = (a_{q1}, a_{q2}, \ldots, a_{qk})$ represents the $q$th attribute ($q = 1, 2, \ldots, p$). The conditional entropy for an attribute $A_q$ is thus represented as
$$H(C \mid A_q) = -\sum_{j=1}^{k} p(a_{qj}) \sum_{i} p(c_i \mid a_{qj}) \log_2 p(c_i \mid a_{qj}),$$
where $a_{qj}$ is one of the $k$ values of attribute $A_q$, $p(a_{qj})$ is the probability of that attribute value, and $p(c_i \mid a_{qj})$ is the conditional probability of the category $c_i$ after the value of $A_q$ is fixed. The information gain is then estimated as the difference between $H(C)$ and $H(C \mid A_q)$:
$$IG(A_q) = H(C) - H(C \mid A_q).$$
The higher the information gain, the more vital the feature is considered to be for classification.
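The entropy-based definition above can be sketched in a few lines of Python. This is a minimal illustration for one categorical attribute, not the decision-tree implementation used in the study; the function names `entropy` and `information_gain` and the toy labels are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(C), in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(attribute_values, labels):
    """IG(A_q) = H(C) - H(C | A_q) for one categorical attribute."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(attribute_values).items():
        # Class labels of the instances where the attribute takes this value.
        subset = [c for a, c in zip(attribute_values, labels) if a == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


if __name__ == "__main__":
    labels = ["cb", "cb", "ok", "ok"]             # toy class labels
    contains_insult = [1, 1, 0, 0]                # hypothetical informative feature
    tweet_is_long = [1, 0, 1, 0]                  # hypothetical uninformative feature
    print(information_gain(contains_insult, labels))
    print(information_gain(tweet_is_long, labels))
```

In the toy data the first feature separates the two classes perfectly and so recovers the full class entropy, while the second leaves the class distribution unchanged and gains nothing.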

Chi-Square ($\chi^2$).
The chi-square statistic is used in feature extraction as an information theory function that measures the association between an element (term) $t_k$ and a class $c_i$. The selected elements are those distributed most differently across the sets of negative and positive examples of $c_i$:
$$\chi^2(t_k, c_i) = \frac{N (AD - CB)^2}{(A + C)(B + D)(A + B)(C + D)},$$
where $N$ is the total number of documents; $A$ is the number of documents in $c_i$ containing $t_k$; $B$ is the number of documents containing $t_k$ outside $c_i$; $C$ is the number of documents in $c_i$ without $t_k$; and $D$ is the number of documents outside $c_i$ without $t_k$. The next step assigns a score for each $c_i$ using the abovementioned equation, and the collective scores are summed into a single final score. The final score is used to rank the attributes, and the top-scoring ones are selected.
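A small sketch of the chi-square score computed from the four document counts defined above follows. It assumes the standard two-by-two contingency form of the statistic; the function name `chi_square` and the example counts are hypothetical.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square score of term t_k for class c_i from four document counts.

    a: documents in c_i containing t_k      b: documents outside c_i containing t_k
    c: documents in c_i without t_k         d: documents outside c_i without t_k
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        # Degenerate table: the term (or the class) never or always occurs.
        return 0.0
    return n * (a * d - c * b) ** 2 / denominator


if __name__ == "__main__":
    # Hypothetical counts: the term appears in 40 of 50 CB tweets
    # and in 10 of 150 non-CB tweets.
    print(chi_square(a=40, b=10, c=10, d=140))
```

Terms whose presence is independent of the class score zero, so ranking by this value pushes class-specific vocabulary to the top.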

Pearson Correlation.
In the present study, the Pearson correlation coefficient is used to estimate the optimal features by calculating the degree of linear correlation between the extracted class and the original class:
$$sim_i = \frac{\sum_{j} (X_j - \bar{X})(Y_{ij} - \bar{Y})}{\sqrt{\sum_{j} (X_j - \bar{X})^2} \sqrt{\sum_{j} (Y_{ij} - \bar{Y})^2}},$$
where $sim_i$ is the similarity between the $i$th class and the original class of the dataset; $X_j$ and $Y_{ij}$ are the selected attribute data to be tested for the original class and the $i$th class, respectively; and $\bar{X}$ and $\bar{Y}$ are their average values. Finally, the entire attribute data are normalized.
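The linear correlation described above can be computed as in the following sketch, which uses only the standard library. The function name `pearson` and the convention of returning 0 for a constant column are assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant column carries no linear signal; report zero correlation.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    feature = [0.1, 0.4, 0.5, 0.9]   # hypothetical normalized feature values
    target = [0, 0, 1, 1]            # hypothetical class labels (1 = CB)
    print(pearson(feature, target))
```

For feature selection, features would typically be ranked by the absolute value of this coefficient, since a strong negative correlation is as informative as a strong positive one.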

ANN.
Artificial neural networks [33] are trained on the weighted input features, as in Figure 2(a), by reducing an error function. Here, the error function chosen for classification is the cross-entropy error. The size of the input Twitter dataset $D$ for an ANN classification model $P(y \mid x)$ is influenced by the selection of CB instances from $D$. The challenge of model building is to generalize the underlying distribution from the specific sample $D$. Memorizing the dataset rather than identifying its distribution is known as overfitting.
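The cross-entropy error mentioned above is not spelled out in this section. A common form, written here as an assumption using the section's own symbols $D$ and $P(y \mid x)$, is shown below; the index $i$, the binary label $y_i \in \{0, 1\}$ (with 1 standing for CB), and the shorthand $\hat{p}_i$ are introduced only for illustration.

```latex
E(D) = -\sum_{(x_i, y_i) \in D} \log P(y_i \mid x_i)
     = -\sum_{i} \Bigl[ y_i \log \hat{p}_i + (1 - y_i) \log (1 - \hat{p}_i) \Bigr],
\qquad \hat{p}_i = P(y = 1 \mid x_i).
```

Under this form, a confident wrong prediction is penalized heavily, while a training error that keeps falling as the error on held-out tweets rises is the usual symptom of overfitting.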
An activation function is a real function that determines the value returned by a neuron. The present study uses inverse trigonometric functions as the activation function.
The multilayer perceptron (MLP) is the most frequent architecture of a feedforward neural network. It consists of at least three layers: the input layer, the hidden layer, and the output layer (Figure 2(b)). The deep neural network (DNN) is an MLP with multiple hidden layers. More precisely, additional layers and, therefore, connections enable the modelling of rare dependencies in the training data using fewer neurons [4]. Nevertheless, the DNN learning process can result in overfitting and declining performance [5].
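To make the three-layer structure concrete, the sketch below runs a forward pass through an MLP with one hidden layer that uses the inverse tangent as its activation. The sigmoid on the output neuron, the function names, and all weights are assumptions chosen for illustration; the study does not state which inverse trigonometric function or output activation it uses.

```python
import math


def arctan_activation(z: float) -> float:
    """Inverse-tangent activation; the output lies in (-pi/2, pi/2)."""
    return math.atan(z)


def sigmoid(z: float) -> float:
    """Logistic function, used here (by assumption) to output a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def mlp_forward(features, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> arctan hidden layer -> single sigmoid output neuron."""
    hidden = dense(features, hidden_w, hidden_b, arctan_activation)
    return dense(hidden, out_w, out_b, sigmoid)[0]


if __name__ == "__main__":
    # Hypothetical weights for 2 input features and 2 hidden neurons.
    score = mlp_forward(
        features=[0.8, 0.1],
        hidden_w=[[1.5, -0.5], [0.3, 0.9]],
        hidden_b=[0.0, -0.2],
        out_w=[[1.2, -0.7]],
        out_b=[0.1],
    )
    print(score)
```

Stacking further `dense` calls between the input and output layers would turn the same sketch into the deeper network described above.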
In ANN theory, the universal approximation theorem states that an MLP with a single hidden layer is enough to approximate, to a given accuracy, all compactly supported continuous real functions. In many cases, however, research shows [3] that DNN predictions are more exact than those obtained by ANNs.
During the training process, the ANN changes its weights depending on the value of an error function in order to minimize the error.
There are several different training algorithms, and their performance may vary depending on the particular problem [34].
Deep reinforcement learning (DRL) [35,36] consists of agents that assess their actions and observations at each time step in order to either reward or penalize the actions, i.e., the classifications. The detailed steps are given in Algorithm 1, where the DRL compares the classified results of the ANN with the features extracted in the repository. If the observed class and the original class are the same, the classifier is rewarded; otherwise, it is penalized. The outcomes of Algorithm 1 are sent to the ANN, which determines whether the unsupervised learning at each iteration receives a reward or a penalty. This ensures that the ANN-DRL classification is accurate and precise. Finally, the harmfulness index [37] helps to estimate whether the CB detection is accurate.
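The reward-penalty signal described above can be sketched as follows. This is only a minimal illustration, not Algorithm 1 and not a full DRL agent: the +1/-1 reward values and the names `reward`, `feedback_round`, and `penalized_indices` are assumptions made for the example.

```python
def reward(predicted: int, reference: int) -> int:
    """Return +1 when the ANN's class matches the stored class, -1 otherwise."""
    return 1 if predicted == reference else -1


def feedback_round(predictions, references):
    """Score one pass of ANN outputs; return per-sample rewards and their mean."""
    rewards = [reward(p, r) for p, r in zip(predictions, references)]
    return rewards, sum(rewards) / len(rewards)


def penalized_indices(rewards):
    """Indices of the samples whose classification was penalized."""
    return [i for i, r in enumerate(rewards) if r < 0]


if __name__ == "__main__":
    ann_output = [1, 0, 1, 1]      # hypothetical ANN classes (1 = CB, 0 = non-CB)
    repository = [1, 0, 0, 1]      # hypothetical reference classes
    rewards, mean_reward = feedback_round(ann_output, repository)
    print(rewards, mean_reward, penalized_indices(rewards))
```

In an iterative scheme, the penalized samples are the ones whose feedback would drive the next weight update of the ANN, and the mean reward gives a simple per-iteration progress signal.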

Results
In this section, we present the details of the experiments using the collected datasets and the performance metrics. The study selected 30,384 tweets collected from the Twitter datasets [4]. The collection contains both CB and non-CB tweets, and automated labelling or tagging is carried out using the feature selection methods. The tagging of CB and non-CB is based on the attributes listed in Table 1, which are common traits of online communication over social networks. The input tweet data are hence classified as CB or non-CB, where the former indicates vulnerable behavior and the latter indicates genuine behavior. Out of the 30,384 tweets, more than 1252 are classified as CB; however, the labelled data are not used to train the classifier. These labelled data act as an input for the DRL method, which rewards or penalizes the ANN mechanism.
The dataset has highly imbalanced classes, which penalizes the unsupervised ANN with inaccurate results in identifying the relevant instances. With imbalanced classes, the ANN ignores the minor classes and performs well only on the major classes. The weight adjustment approach helps to avoid oversampling the minority class, i.e., the abnormal class, and undersampling the majority class, i.e., the normal class.
The experiments are conducted with the algorithms that performed best in existing work, namely, the ANN, SVM, RF, and LR. These existing methods are compared with ANN-DRL in terms of classification accuracy. As in [8], the present study utilized three feature selection methods, namely, information gain, $\chi^2$, and Pearson correlation. A 10-fold cross validation is conducted, and the proposed classifier is tested individually with all three feature selection methods. The performance is estimated with various metrics that include accuracy, F-measure, geometric mean (G-mean), percentage error, precision, sensitivity, and specificity.
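One common way to realize a weight adjustment for imbalanced classes is inverse-frequency class weighting, sketched below. Whether the study uses exactly this scheme is not stated; the formula N / (K * n_c), where N is the number of samples, K the number of classes, and n_c the size of class c, and the function name `balanced_class_weights` are assumptions.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by N / (K * n_c) so rare classes count more in the loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


if __name__ == "__main__":
    # Toy label vector with a 1:3 imbalance (1 = CB, 0 = non-CB).
    print(balanced_class_weights([1, 0, 0, 0]))
```

If exactly 1252 of the 30,384 tweets were CB, this scheme would weight the CB class at roughly 12.1 and the non-CB class at roughly 0.52, so that both classes contribute equally to the total loss without resampling.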
The details of the metrics are given below.
Accuracy is the proportion of predictions that the system gets right. It is estimated as the ratio of the number of correct predictions to the total number of predictions:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$
Here, TP denotes the true positive cases, where the model correctly classifies CB instances as CB. TN denotes the true negative cases, where the model correctly classifies non-CB instances as non-CB. FP denotes the false positive cases, where the model wrongly classifies non-CB instances as CB. FN denotes the false negative cases, where the model wrongly classifies CB instances as non-CB. F-measure is the weighted harmonic mean of the recall and precision values, which ranges between zero and one; with equal weights it is
$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}, \qquad \text{Precision} = \frac{TP}{TP + FP},$$
where recall is the same quantity as sensitivity. A higher F-measure indicates higher classification performance.
G-mean aggregates the sensitivity and specificity measures and is intended to maintain the trade-off between them, especially when the dataset is imbalanced. It is measured as follows:
$$\text{G-mean} = \sqrt{\text{Sensitivity} \times \text{Specificity}}.$$
Mean Absolute Percentage Error (MAPE) is a measure of prediction accuracy that captures the total loss when predicting the actual classes. It is the ratio of the difference between the actual class ($A_t$) and the predicted class ($F_t$) to the actual class, summed over the $n$ fitted points, divided by $n$, and multiplied by 100%:
$$\text{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right|.$$
Sensitivity is the ability of the deep learning model to identify the positive cases correctly, i.e., the true positive rate:
$$\text{Sensitivity} = \frac{TP}{TP + FN}.$$
Specificity is the ability of the deep learning model to identify the negative cases correctly, i.e., the true negative rate:
$$\text{Specificity} = \frac{TN}{TN + FP}.$$
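The confusion-matrix metrics defined above can be computed together, as in this sketch. The function name and dictionary keys are illustrative, the counts in the usage example are hypothetical, and the code assumes that every denominator is nonzero.

```python
import math


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision, sensitivity, specificity, F-measure, and G-mean."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)      # also called recall or true positive rate
    specificity = tn / (tn + fp)      # true negative rate
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
        "g_mean": math.sqrt(sensitivity * specificity),
    }


if __name__ == "__main__":
    # Hypothetical counts for a small validation fold.
    for name, value in classification_metrics(tp=8, tn=6, fp=2, fn=4).items():
        print(f"{name}: {value:.3f}")
```

Reporting G-mean next to accuracy is useful on imbalanced data, because a classifier that labels every tweet as non-CB can still reach high accuracy while its sensitivity, and therefore its G-mean, is zero.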

Analysis.
This section provides the classification results, which are given in the following tables. The proposed ANN-DRL is validated and compared with existing methods, namely, the ANN, SVM, RF, LR, and NB. The results of predicting CB are validated with 60%, 75%, and 90% training data and various feature selection methods: information gain, $\chi^2$, and Pearson correlation. Figures 3-5 show the classification accuracy of the proposed classifier when each feature selection method is trained with 60%, 75%, and 90% of the training data. The results show that Pearson correlation yields higher classification accuracy than information gain and $\chi^2$. They further show that, at some point, as the number of residuals increases, the classification accuracy with information gain as the feature selection tool drops the most compared with chi-square and Pearson correlation. Therefore, the CB class is determined most accurately with Pearson correlation as the feature selection tool and ANN-DRL as the classifier. Tables 2-4 show the results of predicting CB over 60%, 75%, and 90% of training data with information gain as the feature selection tool. Tables 5-7 show the corresponding results with the $\chi^2$ tool, and Tables 8-10 show those with the Pearson correlation tool. The simulation results show that the proposed method has higher classification accuracy than the existing classifiers. It is further inferred that Pearson correlation selects the best features, boosting the classification accuracy more with 90% training data than with 75% or 60%. The other metrics also show better performance for Pearson correlation than for the other feature selection tools. Furthermore, the MAPE of the ANN-DRL is lower than that of the other methods (Table 11).
To test the efficacy of the ANN algorithm in the proposed method, we validate the algorithm with 3000 test samples and present a confusion matrix. The 3000 test samples are picked randomly from the overall dataset and are not part of the training data. A 10-fold cross validation is conducted to test the ANN with the DRL scheme. The classified results contain 1740 TP cases, 1030 TN cases, 160 FN cases, and 70 FP cases, as is evident from Table 12.
Based on the execution results, we found that the computational complexity of the ANN-DRL is lower than that of the existing machine learning methods for detecting cyberbullying content. However, the complexity increases with more neural network layers and more DRL iterations. The ANN-DRL is found to be $O(nl + en + n^3 + n_{\text{layers}})$ for training and $O(l + en + n^3 + n_{\text{layers}})$ for testing, where $n$ is the number of training samples, $l$ is the number of features, and $n_{\text{layers}}$ is the total number of hidden layers with $n$ neurons. The ANN has $O(nl + n_{\text{layers}})$

Conclusions
In this paper, an integrated model using an ANN and DRL is designed for the classification of CB from raw text datasets of a social media engine. The extraction of psychological features, user comments, and the context has enabled better classification performance, with the ANN producing improved classification results at the initial stage. The addition of a reward-penalty system using DRL has enhanced the classification well beyond that of the ANN model alone. The simulation results show an improved average classification accuracy of 80.69% for ANN-DRL compared with the existing three-layered ANN (77.40%), SVM (75.44%), RF (75.55%), LR (75.10%), and NB (75.19%). In future, a convolutional neural network can be applied to image datasets to extract information that serves the purpose of reducing cyberbullying.

Data Availability
The data used to support the findings of this study are available from the author upon request (gdhiman0001@gmail.com).

Conflicts of Interest
The authors declare that they have no conflicts of interest.
