Multiclass Event Classification from Text

Social media has become one of the most popular sources of information. People communicate with each other and share their ideas, commenting on global issues and events in a multilingual environment. While social media has been popular for several years, online data volumes have recently risen exponentially because of the increasing popularity of local languages on the web. This allows researchers of the NLP community to exploit the richness of different languages while overcoming the challenges posed by these languages. Urdu is also one of the most used local languages on social media. In this paper, we present the first-ever event detection approach for Urdu language text. Multiclass event classification is performed by popular deep learning (DL) models, i.e., Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Deep Neural Network (DNN). One-hot-encoding, word embedding, and term frequency-inverse document frequency (TF-IDF) based feature vectors are used to evaluate the DL models. The dataset used for the experimental work consists of more than 0.1 million (103,965) labeled sentences. The DNN classifier achieved a promising accuracy of 84% in extracting and classifying events in the Urdu language script.


Introduction
In the current digital era, social media has come to dominate other sources of communication, i.e., print and broadcast media [1]. Real-time availability [2] and multilingual support [3] are the key features that boost the usage of social media for communication.
The usage of local languages on social media has grown enormously over the last few years. People around the world share ideas, opinions, events, sentiments, and advertisements, etc. [4] via social media using local languages. A considerable amount of heterogeneous data is being generated, which poses challenges for extracting worthy insights, while this information plays a vital role in developing natural language processing (NLP) applications, i.e., sentiment analysis [5], risk factor analysis [6], law and order prediction, timeline construction, opinion mining, decision-making systems [7], social media monitoring [8], spam detection, information retrieval, document classification [9], e-mail categorization [10], sentence classification [11], topic modeling [12], content labeling, and trend detection.
South Asia (https://www.worldometers.info/) is home to about 24.98% of the world's population, spread across its different countries.
Many languages are spoken in Asia. The most widely used among these are Arabic, Hindi, Malay, Persian, and Urdu.

Features of Urdu Language.
The Urdu language is one of the South Asian languages frequently used for communication on social media, namely, Facebook, Twitter, news channels, and web blogs [13]. It is also the national language of Pakistan, which is the 6th (https://www.worldometers.info/world-population/population-by-country/) most populous country in the world. In other countries, i.e., India, Afghanistan, and Iran, the Urdu language is also spoken and understood. There are 340 million people in the world who use the Urdu language on social media for various purposes [13].
The Urdu language follows a right-to-left writing script. Its grammatical structure differs from that of other languages:

(1) Subject-object-verb (SOV) sentence structure [14]
(2) No letter capitalization
(3) Diacritics
(4) Free word order [15]

The Urdu language has 38 basic characters, which can be written joined or non-joined with other characters [16]. Words composed of joined characters of the Urdu alphabet are called ligatures, and this joining feature enriches the Urdu vocabulary with almost 24,000 ligatures [15,16]. It is pertinent to mention that this alphabet set is also considered a superset of the alphabets of all Urdu-script-based languages, namely, Arabic and Persian, which contain 28 and 32 letters, respectively. Furthermore, there are also some additional letters in Urdu script that are used to express some Hindi phonemes [15,16].

Event Classification.
An event can be defined as "specific actions, situations, or happenings occurring in a certain period [17,18]." The extracted information can represent different types of events, i.e., sports, politics, terrorist attacks, and inflation; information can be detected and classified at different levels of granularity, i.e., document level [19], sentence level [20], word level, character level, and phrase level [21].
Event classification is an automated way to assign a predefined label to new instances. It is pertinent to note that classification can be binary, multiclass, or multilabel [22]. The implementation of neural networks for text classification has helped to handle complex and large amounts of data [23]. Semantically similar words are used to generate feature vectors [24], which eliminates the sparsity of n-gram models. Urdu text classification was performed [25] to assess the quality of products based on comments and feedback. In [25], an embedding layer of a neural network was used to convert text into numeric values, and classification was performed at the document level. Contrary to [25], our multiclass event classification is performed at the sentence level instead of the document level. We further performed multiple experiments to develop an efficient classification system using TF-IDF, one-hot-encoding, a pretrained Urdu word embedding model, and custom pretrained Urdu language word embedding models.

Event Classification Challenges.
The lack of processing resources, i.e., part-of-speech (PoS) taggers, named entity recognizers, and annotation tools, is a major hurdle to performing event detection and classification for the Urdu language. Many people are unfamiliar with the meaning and usage of some Urdu words. This creates semantically ambiguous content that makes the event classification process a nontrivial and challenging task. The unavailability of appropriate resources/datasets is another major challenge for data-driven and knowledge-based approaches to extract and classify events.
Our contributions are given as follows:

(1) The first-ever large-scale labeled Urdu dataset for event classification, which is the biggest in terms of instances [15] and classes [25] among the Urdu text datasets reported in state of the art [19,26,27]
(2) To the best of our knowledge, the first multiclass event classification task at the sentence level for the Urdu language
(3) Different feature vector generating methods, i.e., one-hot-encoding, word embedding, and TF-IDF, used to evaluate the performance of the DNN, CNN, and RNN deep learning models
(4) Pretrained and custom word embedding models for the Urdu language explored
(5) A performance comparison of traditional machine learning classifiers and deep learning classifiers

In this paper, we performed multiclass event classification on an imbalanced dataset of Urdu language text. Our framework is designed to classify twelve different types of events, i.e., sports, inflation, politics, casualties, law and order, terrorist attack, sexual assault, fraud and corruption, showbiz, business, weather, and earthquake. Furthermore, we also present a detailed comparative analysis of different deep learning algorithms, i.e., long short-term memory (LSTM) and convolutional neural network (CNN), using TF-IDF, one-hot-encoding, and word embedding methods. We also compare the results of traditional machine learning classifiers with deep learning classifiers.

Related Work
In the past, researchers paid little attention to the Urdu language because of limited processing resources, i.e., datasets, annotators, part-of-speech (PoS) taggers, and translators [14], etc. Over the last few years, however, feature-based classification for Urdu text documents has started to use machine learning models [28][29][30]. A framework was proposed [31] to classify Chinese short texts into 7 kinds [32] of emotion and product review. The event-level information from the text and conceptual information from an external knowledge base are provided as supplementary input to the neural models.
A fusion of CNN and RNN models was used to classify sentences on a movie review dataset and achieved 93% accuracy [33]. A comparative research study of machine learning (ML) and deep learning (DL) models is presented [25] for Urdu text classification at the document level. CNN and RNN single-layer/multilayer architectures are used to evaluate three different sizes of dataset [26]. The purpose of that work was to analyze and predict the quality of products, i.e., valuable, not valuable, relevant, irrelevant, bad, good, or very good [25].
Different datasets are reported in state of the art, i.e., Northwestern Polytechnical University Urdu (NPUU), consisting of 10K news articles labeled into six classes, the Naïve dataset, including 5,003 news articles in five classes [34], and the Corpus of Urdu News Text Reuse (COUNTER), with 1,200 news articles in five classes [27]. A joint framework consisting of CNN and RNN layers was used for sentiment analysis [35]. The Stanford movie review dataset and the Stanford Treebank dataset were used to evaluate the performance of the system. Their proposed system showed 93.3% and 89.2% accuracy, respectively.
In [35], the authors performed supervised text classification in the Urdu language using statistical approaches, namely, Naïve Bayes and support vector machine (SVM). The classification is initiated by applying different preprocessing approaches, namely, stemming, stop word removal, and the combination of stop word elimination and stemming. The experimental results showed that the stemming process has little impact on improving performance. On the other hand, the elimination of stop words showed a positive effect on results.
The SVM outperformed Naïve Bayes, achieving classification accuracies of 89.53% and 93.34% based on the polynomial and radial basis functions, respectively.
Similarly, SVM was also applied to news headline classification [36] in Urdu text, showing only a modest accuracy improvement of 3.5%. News headlines are small pieces of information that frequently do not describe the contextual meaning of the contents. In [36], a majority voting algorithm used for text classification in the Urdu language showed 94% accuracy. The classification was performed on seven different types of news text. However, the number of instances was very limited. A dynamic neural network [37] was designed to model the sentiment of sentences. It consists of dynamic K-modeling, pooling, and global pooling over a linear sequence and performs multiclass sentiment classification.
A quite different task was performed in [38], where the authors used a hybrid approach of rule-based and machine learning-based techniques to perform sentiment classification while analyzing Urdu script at the phrase level. The hybrid approach showed 31.25%, 8.46%, and 21.6% for the performance metrics of recall, precision, and accuracy, respectively. In [39], a variant of the recurrent neural network (RNN) called long short-term memory (LSTM) was used to overcome the weaknesses of bag-of-words and n-gram models, and it outperformed these conventional approaches.
A neural network-based system [39] was developed to classify events. The purpose of the system was to help people in natural disasters like floods by analyzing tweets.
The Markov model was used to classify and predict locations; it showed 81% accuracy for classifying tweets as requests for help and 87% accuracy for predicting the location. Research work was conducted on life event detection and classification, i.e., marriage, birthday, and traveling, etc., to anticipate products and services that facilitate people [40]. Data about life events exist only in very small amounts. Linear regression, Naïve Bayes, and nearest neighbor algorithms were evaluated on the original dataset, which was very small, and did not show favorable results.
A multiple minimal reduct extraction algorithm was designed [41] by improving the quick reduct algorithm. The multiple reducts are used to generate the set of classification rules that represent the rough set classifier. To evaluate the proposed approach, an Arabic corpus of 2,700 documents was categorized into nine classes. Using multiple and single minimal reducts, the proposed system showed 94% and 86% accuracy, respectively. Experimental results also showed that both the K-NN and J48 algorithms performed well in terms of classification accuracy on the dataset at hand. Table 1 depicts the summary of the related research discussed previously.

Data Collection.
Contrary to the datasets reported in state of the art [27,34], in which no dataset was created for event classification, we created a larger dataset specific to event classification. Instead of focusing on the analysis of a specific product [25] or phrase-level sentiment analysis [38], we decided to classify sentences into multiple event classes. Instead of using a joint framework of CNN and RNN for sentiment analysis [35], we evaluated the performance of deep learning models for multiclass event classification. To collect data, a PHP-based web scraper was written to crawl data from popular news and social media websites, i.e., the Geo News Channel website (https://urdu.geo.tv/), BBC Urdu (https://www.bbc.com/urdu), and Urdu Point (https://www.urdupoint.com/daily/). A complete post is retrieved from the website and stored in MariaDB (a database). It consists of a title, body, published date, location, and URL. Sample text in both languages of the South Asian countries, i.e., the Urdu language on Twitter and the Hindi language on Facebook, is shown in Figure 1.
There are 0.15 million (150,000) Urdu language sentences. The diversity of data collection sources helped us to develop a multiclass dataset. It consists of twelve types of events.
Subsets of the dataset can be useful for other researchers.

Preprocessing.
In the first phase of dataset preparation, we performed some preprocessing steps, i.e., noise removal and sentence annotation/labeling. All non-Urdu words, sentences, hyperlinks, URLs, and special symbols were removed. It was necessary to clean the dataset to annotate/label the sentences properly.

Annotation Guidelines
(1) Go through each sentence and assign a class label
(2) Remove ambiguous sentences
(3) Merge closely related sentences into a single class, i.e., accident, murder, and death
(4) Assign one of the twelve types of events, i.e., sports, inflation, murder and death, terrorist attack, politics, law and order, earthquake, showbiz, fraud and corruption, weather, sexual assault, and business, to each sentence

To annotate our dataset, two M.Phil. (Urdu) level language experts were engaged. They deeply read and analyzed the dataset sentence by sentence before assigning event labels. They recommended removing 46,035 sentences from the dataset because those sentences did not contain information useful for event classification. Finally, after annotation, the dataset size was reduced to 103,965 imbalanced instances of twelve different types of events. The inter-annotator agreement, i.e., Cohen's kappa score, is 0.93, which indicates strong agreement between the two language experts. According to this agreement score, the annotated dataset is almost perfect.
In the second phase of preprocessing, the following steps are performed: stop word elimination, word tokenization, and sentence filtering.
All words which do not semantically contribute to the classification process are removed as stop words, i.e., ‫و‬ ‫ہ‬ , etc. A list of standard stop words of the Urdu language is available at https://www.kaggle.com/rtatman/urdu-stopwords-list.
After performing data cleaning and stop word removal, every sentence is tokenized into words based on white space. An example of sentence tokenization is given in Table 2.
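The whitespace tokenization and stop word removal described above can be sketched in a few lines of Python. Note that `URDU_STOPWORDS` below is a small illustrative stand-in, not the full published stop word list used in our work.

```python
# Illustrative subset of Urdu stop words (the real list is much longer).
URDU_STOPWORDS = {"و", "ہ", "کا", "کی", "کے"}

def tokenize(sentence):
    """Split an Urdu sentence into tokens on white space."""
    return sentence.split()

def remove_stopwords(tokens):
    """Drop tokens that carry no semantic weight for classification."""
    return [t for t in tokens if t not in URDU_STOPWORDS]

# Toy sentence: the stop word "کا" is removed, the rest are kept.
tokens = remove_stopwords(tokenize("یہ کا ایک جملہ ہے"))
```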
The previous preprocessing step revealed that the sentences vary greatly in length: some sentences were very short, and many were very long. We therefore decided to define a length boundary for tokenized sentences. We observed that the sentences in the dataset range from 5 words to 250 words in length, and we selected sentences of 5 to 150 words. An integer value is assigned to each type of event for all selected sentences. A detailed description of the different types of events and their corresponding numeric (integer) values used in the dataset is given in Table 3.
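The length filter and integer labeling step can be sketched as follows; the label numbers shown here are illustrative placeholders, as the actual event-to-integer mapping is given in Table 3.

```python
# Hypothetical subset of the event-label mapping (see Table 3 for the real one).
EVENT_LABELS = {"sports": 0, "politics": 1, "inflation": 2}

def keep_sentence(tokens, min_len=5, max_len=150):
    """Keep only sentences within the 5-150 token length boundary."""
    return min_len <= len(tokens) <= max_len

# Toy data: a 3-token sentence is dropped, a 10-token sentence survives.
sentences = [(["a"] * 3, "sports"), (["b"] * 10, "politics")]
filtered = [(toks, EVENT_LABELS[ev]) for toks, ev in sentences
            if keep_sentence(toks)]
```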
In our dataset, three types of events have larger numbers of instances, i.e., sports (18,746), politics (33,421), and fraud and corruption (10,078), while three other types of events have smaller numbers of instances, i.e., sexual assault (2,916), inflation (3,196), and earthquake (3,238). The remaining types of events have smaller differences in instance counts among them. There are 51,814 unique words in our dataset.
The visualization in Figure 3 shows that the dataset is imbalanced.

Methodology
We analyzed the performance of deep learning classifiers, i.e., deep neural network, convolutional neural network, and recurrent neural network, along with other machine learning classifiers, i.e., K-nearest neighbor, decision tree, random forest, support vector machine, multinomial Naïve Bayes, and linear regression.
Urdu news headlines contain insufficient information to classify events, i.e., few words and a lack of contextual information [29]. However, compared to news headlines, sentences written in an informal way contain more information. Sentence-level classification is performed using deep learning models instead of only machine learning algorithms. The majority voting algorithm performs well on a limited number of instances for seven classes, showing 94% accuracy [36], but in our work, more than 0.1 million instances labeled into twelve classes are used for classification.
There exist several approaches to extract useful information from a large amount of data. Three common approaches are rule-based, machine learning, and hybrid approaches [42]. The selection of methodology is tightly coupled with the research problem. For our problem, we decided to use machine learning (traditional machine learning and deep learning) classifiers. Several traditional machine learning algorithms, i.e., K-nearest neighbor (KNN), random forest (RF), support vector machine (SVM), decision tree (DT), and multinomial Naïve Bayes (MNB), are evaluated for multiclass event classification.
Deep learning models, i.e., convolutional neural network (CNN), deep neural network (DNN), and recurrent neural network (RNN), are also evaluated for multiclass event classification. Various feature generating methods are used to create feature vectors for the deep learning and machine learning classifiers, i.e., TF-IDF, one-hot-encoding, and word embedding. Feature vectors generated by all these techniques are fed as input into the embedding layer of the neural networks. The output generated by the embedding layer is fed into the next fully connected layer (dense layer) of the deep learning models, i.e., RNN, CNN, and DNN. A relevant class label out of the twelve categories is assigned to each sentence at the end of model processing in the testing/validation phase.
Bag-of-words is a common method to represent text. It ignores the sequence order and semantics of text [43], while the one-hot-encoding method maintains the sequence of text. The word embedding methods Word2Vec and GloVe (https://ybbaigo.gitbooks.io/26/pretrained-word-embeddings.html), which are used to generate feature vectors for deep learning models, are highly recommended for textual data. However, in the case of Urdu text classification, the pre-existing Word2Vec and GloVe models are incompatible. The framework of our designed system is represented in Figure 4. It shows the structure of our system from taking input to producing output.

Experimental Setup
We performed many experiments on our dataset using various traditional machine learning and deep learning classifiers. The purpose of these experiments was to find the most efficient and accurate classification model for multiclass events on an imbalanced dataset of Urdu language text. A detailed comparison between traditional classifiers and deep neural classifiers is given in the next section.

Feature Space.
Unigram and bigram tokens of the whole corpus are used as features to create the feature space. TF-IDF vectorization is used to create a dictionary-based model. It consists of 656,608 features. The training and testing datasets are converted to TF-IDF dictionary-based feature vectors. A convolutional sequential model (see Figure 5), consisting of three layers, i.e., an input layer, a hidden layer, and an output layer, is used to evaluate our dataset. Similarly, word embedding and one-hot-encoding are also included in our feature space to enlarge the scope of our research problem.

Feature Vector Generating Techniques.
Feature vectors are the numerical representation of text. They are the actual form of input that can be processed by machine learning classifiers. There are several feature generating techniques used for text processing. We used the following feature vector generating techniques.

Word Embedding.
Word embedding is a numerical representation of text in which each word is treated as a feature vector. It creates a dense vector of real values that captures the contextual, semantic, and syntactic meaning of a word. It also ensures that similar words have related weighted values [29].

Pretrained Word Embedding Models.
The use of a pretrained word embedding model for small amounts of data is highly recommended by researchers in state of the art. GloVe and Word2Vec are famous word embedding models developed using large amounts of data. Word embedding models for text classification, especially for the English language, have shown promising results. Word embedding has emerged as a powerful feature vector generating technique among others, i.e., TF, TF-IDF, and one-hot-encoding.
The results of the pre-existing pretrained word embedding models looked reasonable at first, but the achieved accuracy was very low, i.e., 60.26%. We explored the contents of these models, which revealed that many words are irrelevant or borrowed from other languages, i.e., Arabic and Persian. The contents of Wikipedia are entirely different from news websites, which also affects the performance of the embedding models. Another major factor, i.e., the low amount of data, affected the quality of feature vector generation. Stop words in the pretrained word embedding models are not eliminated and are treated as tokens, while in our dataset all stop words are removed; this also reduces the vocabulary size of the model while generating feature vectors. Therefore, we decided to develop a custom word embedding model on our preprocessed dataset. To postulate the enlargement of the research task, three different word embedding models were developed. The details of all the pretrained word embedding models used are given in Table 4.

One-Hot-Encoding.
Text cannot be processed directly by machine learning classifiers; therefore, we need to convert it into real values. We used one-hot-encoding to convert text to numeric features. For example, the sentences given in Table 5 can be converted into numeric feature vectors using one-hot-encoding as shown in Table 6.
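One-hot-encoding a tokenized sentence against a fixed vocabulary can be illustrated in pure Python; the three-word vocabulary below is a made-up stand-in for the examples in Tables 5 and 6.

```python
# Illustrative vocabulary; the real vocabulary covers the whole corpus.
vocab = ["میچ", "حکومت", "کھیل"]
index = {word: i for i, word in enumerate(vocab)}

def one_hot(tokens, index):
    """Return one one-hot vector (list of 0/1) per token in the sentence."""
    vectors = []
    for t in tokens:
        vec = [0] * len(index)
        vec[index[t]] = 1  # set the position of this word's vocabulary index
        vectors.append(vec)
    return vectors

encoded = one_hot(["کھیل", "میچ"], index)
```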

TF-IDF.
TF and TF-IDF are feature engineering techniques that transform text into numerical format. TF-IDF is one of the most widely used feature vector creation methods for text data. Three deep learning models were evaluated on our corpus. The sequential model with embedding layers outperformed the pretrained word embedding models [44] reported in state of the art [48]. A detailed summary of the evaluation results of CNN, RNN, and DNN is discussed in Section 7.

Deep Neural Network Architecture.
Our DNN architecture consists of three layers, i.e., an input layer, a hidden (dense) layer of 150 nodes, and an output layer of 12 nodes. The feature vector is given as input to the dense layer, which is fully connected. The softmax activation function is used in the output layer to classify sentences into multiple classes.
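The DNN described above can be sketched in Keras as follows. The 150-node hidden layer and 12-way softmax output follow the text; the input dimension (1000) and the ReLU hidden activation are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np
from tensorflow.keras import Sequential, Input
from tensorflow.keras.layers import Dense

# Assumed input dimension; the real model takes the TF-IDF feature vector.
model = Sequential([
    Input(shape=(1000,)),
    Dense(150, activation="relu"),    # single hidden (dense) layer, 150 nodes
    Dense(12, activation="softmax"),  # one output node per event class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Dummy forward pass: two fake sentences in, two 12-way distributions out.
probs = model(np.zeros((2, 1000), dtype="float32"))
```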

Recurrent Neural Network.
The recurrent neural network is evaluated using a long short-term memory (LSTM) classifier. The RNN consists of embedding, dropout, LSTM, and dense layers. A dictionary of the 30,000 most frequent unique tokens is built. The sentences are standardized to the same length using padding sequences. The dimension of the feature vector is set to 250. The RNN showed an overall accuracy of 81%, the second highest in our work.

Convolutional Neural Network (CNN).
CNN is a class of deep neural networks that is highly recommended for image processing [49]. It consists of an input layer (embedding layer), multiple hidden layers, and an output layer. There is a series of convolutional layers that convolve with a multiplication. The embedded sequence layer and average pooling layer (GlobalAveragePooling1D) are also part of the hidden layers. The common activation function of CNN is ReLU. The details of the hyperparameters used to train the CNN model for our problem are given in Table 7.
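The CNN layer stack described above (embedding input, convolution with ReLU, GlobalAveragePooling1D, softmax output) can be sketched in Keras. The vocabulary size, embedding dimension, filter count, and kernel size below are illustrative assumptions rather than the Table 7 values.

```python
import numpy as np
from tensorflow.keras import Sequential, Input
from tensorflow.keras.layers import (Embedding, Conv1D,
                                     GlobalAveragePooling1D, Dense)

model = Sequential([
    Input(shape=(250,)),                          # padded token-id sequences
    Embedding(input_dim=30000, output_dim=100),   # embedding input layer
    Conv1D(filters=64, kernel_size=5, activation="relu"),  # ReLU convolution
    GlobalAveragePooling1D(),                     # average pooling layer
    Dense(12, activation="softmax"),              # 12-way event output
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Dummy forward pass with two fake padded sentences.
probs = model(np.zeros((2, 250), dtype="int32"))
```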

Hyperparameters.
In this section, all the hyperparameters used in our experiments are given in tabular format. Only those hyperparameters which achieved the highest accuracy for the DNN, RNN, and CNN models are discussed here. The hyperparameters of DNN that are fine-tuned in our work are given in Table 8.
The RNN model showed its highest accuracy (80.3% and 81%) on two sets of hyperparameters, which are given in Table 9. Similarly, Table 7 provides the details of the hyperparameters of the convolutional neural network.

Performance Measuring Parameters
The most common performance measuring parameters [41], i.e., precision, recall, and F1-measure, are used to evaluate the proposed framework. These parameters were selected because of the multiclass classification task and the imbalanced dataset.
Accuracy = (TP + TN) / (TP + TN + FP + FN),

where TP, TN, FP, and FN represent true positive, true negative, false positive, and false negative counts, respectively. Precision is the fraction of instances predicted as positive that are actually positive, TP/(TP + FP), and recall is the fraction of the relevant (i.e., actually positive) instances that were retrieved during the experimental work, TP/(TP + FN). It is noteworthy that both precision and recall are relative measures of relevance.
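The metric definitions above translate directly into code; the TP/TN/FP/FN counts below are made-up numbers used only to exercise the formulas.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are actually positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of actual positives that were retrieved."""
    return tp / (tp + fn)

def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

def accuracy(tp, tn, fp, fn):
    """(TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)

# Made-up confusion counts for one class.
p = precision(80, 20)
r = recall(80, 20)
score = f1(p, r)
acc = accuracy(80, 80, 20, 20)
```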

Deep Learning Classifiers.
The feature vector can be generated using different techniques; the details of the feature vector generating techniques were discussed in Section 5.
The results of the feature vector generating techniques used in our work, i.e., "multiclass event classification for the Urdu language text," are given in the following subsections.

Pretrained Word Embedding Models.
The convolutional neural network model is evaluated on the feature vectors generated by all pretrained word embedding models. The summary of all results generated by the pre-existing [44] and custom pretrained word embedding models is given in Table 10. Our custom pretrained word embedding model, which contains 57,251 unique tokens, a larger dimension size of 350, and a window size of 1, showed 38.68% accuracy. The purpose of developing different custom pretrained word embedding models was to build a domain-specific model and achieve the highest accuracy. However, the results of both the pre-existing pretrained word embedding models and the domain-specific custom word embedding models are very low. The detailed summary of results can be seen in Table 10.

TF-IDF Feature Vector.
The DNN architecture consists of an input layer, a dense layer, and a max pooling layer. The dense layer, also called a fully connected layer, comprises 150 nodes. The softmax activation function and the sparse_categorical_crossentropy loss are used to compile the model on the dataset. 25,991 instances are used to validate the accuracy of the DNN model. The DNN with fully connected layer architecture showed 84% overall accuracy across all event classes. The details of the performance measuring parameters for each class of event are given in Table 11. Law and order, the 6th type of event in our dataset, consists of 2,000 instances used for validation. It showed 66% accuracy, which is comparatively low relative to the other types of events, and this affected the overall performance of the DNN model. The main reason behind this result is that law and order sentences overlap with politics sentences; even humans sometimes can hardly distinguish between law-and-order and political statements.
For example, consider the sentence (given in Urdu script in the original): "The irresponsible talk of the state minister is a threat to peace in the region." The performance of the DNN model is given in Table 11, which shows 84% accuracy over the multiple classes of events. All the other performance measuring parameters, i.e., precision, recall, and F1-score, of each class of events are also given in Table 11. The accuracy of the DNN model can be viewed in Figure 5, where the y-axis represents the accuracy and the x-axis represents the number of epochs. The DNN achieved 84% accuracy for multiclass event classification. The expected solution to tackle the problem of sentences overlapping multiple classes is to use a pretrained word embedding model like Word2Vec or GloVe. However, unlike for the English language, there is still no open/closed-domain pretrained word embedding model built from a large corpus of Urdu language text. The RNN sequential model architecture of deep learning is used in our experiments.
The recurrent deep learning model architecture consists of a sequence of the following layers: an embedding layer with 100 dimensions, SpatialDropout1D, LSTM, and dense layers. The sparse_categorical_crossentropy loss function is used for the compilation of the model. Multiclass categorical classification is handled by the sparse categorical cross-entropy loss function instead of categorical cross-entropy. A softmax activation function is used at the dense layer instead of the sigmoid function: softmax handles multiple classes, while sigmoid is suited to binary classification.
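The RNN layer sequence above can be sketched in Keras. The 100-dimension embedding, 30,000-word vocabulary, 250-token sequence length, SpatialDropout1D, and softmax output follow the text; the LSTM width and dropout rate are assumptions.

```python
import numpy as np
from tensorflow.keras import Sequential, Input
from tensorflow.keras.layers import Embedding, SpatialDropout1D, LSTM, Dense

model = Sequential([
    Input(shape=(250,)),                         # padded token-id sequences
    Embedding(input_dim=30000, output_dim=100),  # 100-dimension embedding
    SpatialDropout1D(0.2),                       # assumed dropout rate
    LSTM(100),                                   # assumed LSTM width
    Dense(12, activation="softmax"),             # 12-way event output
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Dummy forward pass with two fake padded sentences.
probs = model(np.zeros((2, 250), dtype="int32"))
```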
A bag-of-words consisting of 30,000 unique Urdu language words is used to generate the feature vector. The maximum length of the feature vector is 250 tokens. The overall accuracy of the RNN model is presented in Table 12; it achieved 81% validation accuracy on our problem using TF-IDF feature vectors. Other performance evaluation parameters for each class are also given in Table 12. The accuracy of the RNN model can be viewed in Figure 6, where the y-axis represents the accuracy and the x-axis represents the number of epochs. The RNN achieved 81% accuracy for multiclass event classification.
Although CNN is primarily recommended for image processing, it showed considerable results for multiclass event classification on textual data. The performance measuring parameters of the CNN classifier are given in Table 13. The distribution of the CNN classifier's accuracy over the twelve classes can be viewed in Figure 7. There is more than one peak (higher accuracies) in Figure 7, which shows that the dataset is imbalanced.

One-Hot-Encoding.
The deep learning classifiers used in our research work were also evaluated on one-hot-encoding features; their performance is presented in Figure 8. The one-hot-encoded feature vectors are given as input to the CNN, DNN, and RNN deep learning classifiers. RNN showed better accuracy than CNN, while DNN outperformed both.

Traditional Machine Learning Classifiers.
We also performed multiclass event classification using traditional machine learning algorithms: K-nearest neighbor (KNN), decision tree (DT), Naïve Bayes multinomial (NBM), random forest (RF), linear regression (LR), and support vector machine (SVM). All these models were evaluated using TF-IDF and one-hot-encoding feature vectors. It was observed that the results produced using TF-IDF features were better than those generated using one-hot-encoding features. A detailed summary of the results of the above-mentioned machine learning classifiers is given in the next section.

K-Nearest Neighbor (KNN).
KNN classifies a new data point by measuring the similarity distance between it and its nearest neighbors. In our experiments, we set k = 5, so the similarity distance is measured among the five nearest existing data points [50]. Although the performance of the traditional machine learning classifiers is considerable, it must be noted that it is lower than that of the deep learning classifiers. The main performance-degrading factors are the imbalanced number of instances and sentence overlap. The performance of the KNN machine learning model is given in Table 14. It showed 78% accuracy.
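A minimal scikit-learn sketch of this setup; the toy English sentences stand in for the labeled Urdu corpus, which is an assumption, while k = 5 and the TF-IDF features follow the text.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

# toy stand-ins for the labeled Urdu sentences and their event classes
sentences = ["flood hits the city", "team wins the final",
             "flood warning issued", "election results announced",
             "match ends in a draw", "votes counted today"]
labels = ["disaster", "sports", "disaster", "politics", "sports", "politics"]

vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(sentences)

# k = 5: a new point is classified by the vote of its five nearest neighbors
knn = KNeighborsClassifier(n_neighbors=5).fit(features, labels)
prediction = knn.predict(vectorizer.transform(["flood in the city"]))
```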

Decision Tree (DT).
A decision tree (DT) is a type of supervised machine learning algorithm [51] in which the input data are split according to certain parameters. The overall accuracy achieved by DT is 73%, while further performance details of the classes and the DT model are given in Table 15.

Naïve Bayes Multinomial (NBM).
Naïve Bayes multinomial is one of the most computationally efficient classifiers for text classification [52], but it showed only 70% accuracy, which is very low as compared to KNN, DT, and RF. The performance details of all twelve classes are given in Table 16.

Linear Regression (LR).
Linear regression is generally recommended for the prediction of continuous output rather than for categorical classification [53]. Table 17 shows the performance of the LR model, i.e., 84% overall accuracy for multiclass event classification.

Random Forest (RF).
A random forest comprises many decision trees [54]. Its results showed the highest accuracy among all the evaluated machine learning classifiers. A detailed summary of the results is given in Table 18.

Support Vector Machine (SVM).
The support vector machine (SVM) is one of the most highly recommended models for binary classification. It is based on statistical learning theory [55]. Its performance details are given in Table 19.
A comparative depiction of results obtained by the traditional machine learning classifiers is given in Figure 9.
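The comparison summarized in Figure 9 can be reproduced in outline as follows. The toy data and default hyperparameters are assumptions, and scikit-learn's LogisticRegression stands in for the paper's LR model, since plain linear regression does not produce class labels.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC

# toy stand-ins for the labeled Urdu sentences and their event classes
sentences = ["flood hits the city", "team wins the final",
             "flood warning issued", "election results announced",
             "match ends in a draw", "votes counted today"]
labels = ["disaster", "sports", "disaster", "politics", "sports", "politics"]

X = TfidfVectorizer().fit_transform(sentences)

classifiers = {
    "DT": DecisionTreeClassifier(),
    "NBM": MultinomialNB(),
    "LR": LogisticRegression(),       # classification stand-in for the LR model
    "RF": RandomForestClassifier(),
    "SVM": LinearSVC(),
}

# fit each classifier on the TF-IDF features and record its accuracy
train_accuracy = {name: clf.fit(X, labels).score(X, labels)
                  for name, clf in classifiers.items()}
```

In the actual experiments, a held-out test set (rather than training accuracy) would be used to produce the figures reported in Tables 14–19.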

Discussion and Conclusion
Lack of resources is a major hurdle in research on Urdu language text. We explored many feature-vector generating techniques. Different classification algorithms from traditional machine learning and deep learning approaches were evaluated on these feature vectors. The purpose of performing many experiments on various feature-vector generating techniques was to develop the most efficient and generic model of multiclass event classification for Urdu language text.
The word embedding feature generating technique is considered an efficient and powerful technique for text analysis. Word2Vec (W2Vec) feature vectors can be generated by pretrained word embedding models or by using dynamic parameters in the embedding layers of deep neural networks. We performed sentence classification using pretrained word embedding models, one-hot-encoding, TF, TF-IDF, and dynamic embeddings. The results of the other feature-vector generating techniques are better than those of the pretrained word embedding models.
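The two ways of obtaining W2Vec-style features mentioned above can be contrasted in a small Keras sketch; the random matrix below stands in for a real pretrained Urdu embedding, which is an assumption.

```python
import numpy as np
from tensorflow.keras.initializers import Constant
from tensorflow.keras.layers import Embedding

VOCAB_SIZE, DIM = 30000, 100  # vocabulary and embedding size from the experiments

# Dynamic embeddings: weights start random and are learned jointly with the task.
dynamic = Embedding(VOCAB_SIZE, DIM, trainable=True)

# Pretrained embeddings: weights come from a Word2Vec/GloVe-style matrix
# (random here, as a placeholder) and are frozen during training.
pretrained_matrix = np.random.rand(VOCAB_SIZE, DIM).astype("float32")
pretrained = Embedding(VOCAB_SIZE, DIM,
                       embeddings_initializer=Constant(pretrained_matrix),
                       trainable=False)
```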
Another argument in support of this conclusion is that only a few pretrained word embedding models exist for Urdu language text. These models are trained on a limited number of tokens and on domain-specific Urdu text. There is a need to develop generic word embedding models for the Urdu language on a large corpus. For CNN and RNN (LSTM), the choice between single-layer and multilayer architecture did not affect the performance of the proposed system.
Experimental results vividly depict that the one-hot-encoding method is better than the dynamic word embedding and pretrained word embedding models. However, among all the mentioned (see Section 5.2) feature generating techniques, TF-IDF outperformed the rest. It showed the highest accuracy (84%) using the DNN deep learning classifier, while event classification on an imbalanced multiclass event dataset for the Urdu language using traditional machine learning classifiers showed considerable but lower performance than the deep learning models. Deep learning algorithms, i.e., CNN, DNN, and RNN, are preferable over traditional machine learning algorithms because, unlike traditional machine learning, deep learning does not need a domain expert to find relevant features. DNN and RNN outperformed all other classifiers and showed overall accuracies of 84% and 81%, respectively, for the twelve classes of events. Comparatively, the performance of CNN and RNN is better than Naïve Bayes and SVM.
Multiclass event classification at the sentence level was performed on an imbalanced dataset; event classes having a low number of instances affect the overall performance of the classifiers. We can improve the performance by balancing the instances of each class. The following can be concluded:
(1) Pretrained word embedding models are suitable for sentence classification only if they are developed from an immense amount of textual data
(2) The existing word embedding models Word2Vec and GloVe, which were developed for English language text, are incompatible with Urdu language text
(3) In our case, TF-IDF, one-hot-encoding, and a dynamic embedding layer are better feature generating techniques as compared to pre-existing Urdu language text word embedding models
(4) The TF-IDF-based feature vectors showed the highest results, as compared to one-hot-encoding- and dynamic word embedding-based feature vectors
(5) The imbalanced number of instances in the dataset affected the overall accuracy
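Besides collecting more instances of the rare classes, one standard mitigation for the imbalance noted above (not applied in this study, sketched here only as an illustration) is to weight each class inversely to its frequency during training:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# toy label distribution: class 0 heavily outnumbers classes 1 and 2
labels = np.array([0] * 8 + [1] * 3 + [2] * 1)
classes = np.unique(labels)

# "balanced" weights = n_samples / (n_classes * class_count),
# so rarer classes receive proportionally larger weights
weights = compute_class_weight("balanced", classes=classes, y=labels)
class_weight = dict(zip(classes.tolist(), weights.tolist()))
# class_weight can be passed to Keras model.fit(..., class_weight=class_weight)
```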

Future Work
In a comprehensive review of Urdu literature, we found only a few referential works related to Urdu text processing. The main hurdle in Urdu exploration is the unavailability of processing resources, i.e., event datasets, a closed-domain part-of-speech tagger, lexicons, annotators, and other supporting tools.
There are a lot of tasks that can be accomplished for Urdu language text in the future. Some of these are as follows:
(1) Generic word embedding models can be developed from a large corpus of Urdu language text
(2) Different deep learning classifiers can be evaluated, i.e., BERT and ANN
(3) Event classification can be performed at the document level
(4) A balanced dataset can be used for better results
(5) Multilabel event classification can be performed in the future
(6) Unstructured Urdu text data can be classified into different event classes
(7) Classification of events for the Urdu language can be further performed for other domains of knowledge, i.e., literacy ratio, top trends, famous foods, and religious events like Eid
(8) Contextual information of a sentence, i.e., pre-sentence and post-sentence information, certainly plays a vital role in enhancing the performance accuracy of the classification model
(9) Event classification can be performed on a balanced dataset
(10) Unstructured Urdu data can be used for event classification
(11) Classification can be performed at the document and phrase level

Data Availability
The data used to support this study are available at https://github.com/unique-world/Multiclass-Event-Classification-Dataset.

Conflicts of Interest
The authors declare that there are no conflicts of interest.