A Novel Approach for Malicious URL Detection Based on the Joint Model

The number of malicious websites is increasing yearly, and many companies and individuals worldwide have suffered losses. Therefore, the detection of malicious websites is a task that needs continuous development. In this study, a joint neural network model combining the attention mechanism, bidirectional independent recurrent neural network (Bi-IndRNN), and capsule network (CapsNet) is proposed. The word vector tool word2vec is used to train character- and word-level static embedding vectors of the uniform resource locator (URL). At the same time, the algorithm extracts texture fingerprint features that capture content differences between the binary files of different malicious URLs. Then, the extracted features are fused and input into the joint neural network model. First, the multihead attention mechanism and Bi-IndRNN are used to extract contextual semantic features by adjusting weights. Second, CapsNet with dynamic routing is used to extract deep semantic information. Finally, the sigmoid classifier is used for classification. This study uses different methods from different angles to extract more comprehensive features. The experimental results show that the proposed method improves the classification accuracy of malicious web page detection compared with methods from other researchers.


Introduction
With the continuous improvement of the network environment, Internet applications have penetrated deeply into all aspects of life. At the same time, the vast number of Internet applications has attracted many attackers who seek profit through malware, spam, and phishing websites. According to Check Point's 2020 report [1], more than 100,000 malicious websites are used every day around the world to steal users' personal information or damage users' systems. Kaspersky's report [2] stated that the number of malicious URLs identified by web antivirus components in 2020 was 173 million. The report also mentioned that malicious URLs accounted for 66.07% of the 20 most active malicious programs. As more and more malicious websites emerge, more individuals and companies worldwide will suffer immeasurable losses. The web page represented by a malicious URL contains malicious interactive code, such as HTML tags [3], JavaScript (JS) [4], and Cascading Style Sheets (CSS) [5]. The attacker writes source code containing malicious JS tags into the website so that the malicious code is executed while the user is visiting the website. For example, a remote download program is executed in the background when a user clicks on an advertisement implanted with malicious code by a hacker. The user terminal is eventually controlled in order to collect the user's personal information. In addition, phishing websites are a main battlefield for malicious URLs. The attacker establishes an illegal site and leads users to malicious web pages through inducements and other means to carry out malicious acts such as network fraud. To lower the user's guard, the attacker constructs these websites to be so similar to the legitimate website that they are indistinguishable to the human eye. In such a network environment, accelerating the development of malicious URL detection has become an essential task of network security.
Many malicious URL detection methods have been proposed so far. In previous research on malicious website detection, researchers usually manually extracted one or more of the following features: web content features (HTML and JavaScript code), host information features (WHOIS), lightweight features (the web URL), and visualization features. These features were then input into a machine learning or heuristic system to detect malicious websites. For example, Kumar et al. [6] used an HTML parser and JavaScript simulator to extract web content features and input them into a heuristic system. Chu et al. [7] used domain-related information as the main feature and used machine learning for detection. However, feature engineering for machine learning is cumbersome and relies on the subjective judgment of researchers. The emergence of deep learning has largely addressed this problem. Ren et al. [8] extracted word embeddings of URL characters to identify malicious URLs effectively. Peng et al. [9] added texture fingerprint features on top of URL and host information and then used a deep learning model for detection. This study focuses only on URL features and uses deep learning techniques to detect malicious websites.
Designers generally build URLs from meaningful words to make them easy to remember, and even meaningless words usually convey information in their character sequence. Therefore, we use word embedding and character embedding to extract the semantic features of URLs. Since URLs generated by the same tool or organization have similar structures, we also extract URL texture fingerprint features (Section 3). A joint neural network model is then proposed to capture URL features. First, the attention mechanism is used to give higher weight to key features. Second, we use an improved independently recurrent neural network (IndRNN) [10], called the bidirectional IndRNN (Bi-IndRNN), to encode the fused feature information. Finally, CapsNet is used to extract high-level semantic features. The experiments show that stacking CapsNet brings a significant improvement and that the joint model is a valuable exploration. The innovations of this study are summarized as follows:
(1) We constructed a joint neural network model that combines the attention mechanism, Bi-IndRNN, and CapsNet for malicious URL detection.
(2) To obtain more specific and natural features, we integrated different kinds of malicious URL feature information to extract combined semantic and image information.
(3) A series of comparative experiments shows that the joint model proposed in this study achieves better performance than some state-of-the-art methods.
The subsequent sections of this study are organized as follows. Section 2 introduces the contributions of previous researchers to malicious URL detection, Section 3 introduces the details of the proposed method, Section 4 explains the experimental results and analysis, and Section 5 summarizes this study.

Related Work
The main aim of malicious URL detection is to distinguish malicious URLs from benign URLs. The methods proposed by previous researchers for malicious URL detection fall mainly into the following categories: blacklist-, rule-, machine learning-, and deep learning-based detection.

Blacklist.
The blacklist-based method detects malicious websites, marks them, and stores them in a database that contains relevant information about each malicious URL. A globally distributed URL blacklist service system based on P2P technology was proposed in [11]. Contributors share the blacklist information on storage nodes, and the client takes the form of a plug-in so that the user's normal browsing experience is preserved. Fukushima et al. [12] proposed a blacklist system based on the reputations of the IP address blocks and registrars used by attackers. To discover more malicious websites actively, some researchers have proposed methods that expand the blacklist by analyzing the features of malicious websites. Akiyama et al. [13] searched the structural neighborhoods of existing malicious URLs to find unknown malicious websites and verified them to expand the URL blacklist. Prakash et al. [14] proposed a prediction system composed of multiple heuristic components to generate new URLs. Then, regular expressions and hash maps are used to approximately match a URL and verify whether it is malicious. Compared with passively submitting URLs to the blacklist, this method can discover and verify malicious URLs from the same malicious source, but its limitations are also apparent. It cannot find newly emerging malicious domains; that is, it generalizes poorly. Although the blacklist-based approach is easy to operate, data storage and updating become challenging when a considerable number of malicious websites are added every day.

Rule Matching.
Researchers have proposed rule matching methods to solve the above problem, using selected features to formulate rules that filter malicious URLs. Cao et al. [15] proposed a rule matching method called Automated Individual Whitelist (AIWL), which uses a Naive Bayes classifier to automatically build and maintain a list of the login user interfaces (LUIs) that the user is familiar with. This detection method warns users when they visit untrusted websites or submit confidential information to them. Nguyen et al. [16] proposed a system that calculates six heuristic values, similar to the Levenshtein distance between the domain name and the Google search engine spelling suggestion, and then takes a weighted sum of these values and compares it with a threshold to determine whether a site is a phishing website. Liu and Zhang [17] proposed a two-round phishing page check method. The first round checks the domain name, URL, and e-mail of the current page, and if the result exceeds the threshold, the page is directly identified as a phishing page. Otherwise, the second round checks the password, link, and picture. If none of the checks exceeds the threshold, the page is a regular page. However, this method is only used in the financial field. Shekokar et al. [18] proposed a two-stage phishing page detection scheme. The first stage uses the LinkGuard algorithm to analyze the difference between visual links (links rendered by the browser) and actual links (hidden in HTML). The second stage compares the similarity between suspicious web page snapshots and legitimate web pages by calculating the discrete cosine transform. Although this method does not need to maintain a vast database of malicious websites, it cannot detect unknown malicious URLs because the establishment of rules relies on existing malicious URLs. Moreover, it requires much subjective experience to analyze malicious web pages. The rule-based approach can find some of the more obvious malicious websites. Nowadays, however, the features of malicious web pages are diversified, and many rule-based methods are ineffective against them.

Machine Learning.
As big data becomes more and more popular, machine learning, with its generalization ability and resistance to real attacks, has become the mainstream detection method for malicious URLs. To implement a self-learning model, researchers must have enough malicious website data. Known sites are used to train the model, and unknown sites are then classified by the trained model. After these steps, the model has a certain dynamic detection capability. Shahrivari et al. [19] used feature engineering to construct a dataset of 30 features extracted from the URL, web page content, and host information; 12 machine learning methods, such as random forest and decision tree, were then used to detect phishing websites. Crisan et al. [20] used word embedding to represent URL information and improved the performance of naive Bayes, logistic regression, and SVM models by adding general domain-specific features. This method abandons the selection of features from complex page content and simplifies data processing. However, machine learning methods require much feature design. Once these features are known to malicious website designers, it is easy to bypass these security settings. Singhal et al. [21] used machine learning to classify malicious websites and proposed concept drift detection to find the difference in data distribution between the feature vectors of the old training dataset and the newly collected dataset. The purpose is to prevent attackers from bypassing the detection rules by changing the URL once they realize that the features are extracted from the URL. The method proposed by Eshete et al. [22] uses machine learning algorithms for training and customizes the corresponding algorithms to further improve generalization. First, seven machine learning algorithms are trained on 39 features extracted from three categories: URL, page source, and social reputation. Then, the web page category is determined by a confidence-weighted majority vote classification algorithm.
Although machine learning can improve detection accuracy and has a certain generalization ability, manually extracting features is still a time-consuming and labor-intensive task and can only extract shallow features.

Deep Learning.
Different from machine learning, deep learning can automatically extract high-dimensional features from preprocessed data. Having stood the test of time, deep learning has also become a mainstream malicious URL detection method. The emergence of deep learning broke the deadlock of traditional machine learning algorithms: because features are extracted automatically, the time spent on manual feature engineering is freed up. Wei et al. [23] proposed a method for malicious URL detection using a CNN. This method first extracts character-level features from the URL. Then, the CNN is used for feature extraction and classification. Bahnsen et al. [24] proposed a feature extraction and classification method for malicious URLs based on long short-term memory (LSTM) networks. This method analyzes 14 lexical URL features, such as subdomain length and URL entropy, for feature engineering and uses an LSTM for classification. The experimental results show that malicious web page detection based on lexical URL features is more feasible than complete content analysis. Jiang et al. [25] proposed an online detection scheme based on a deep neural network to detect malicious URLs. This method maps the URL and DNS information into vectors and then uses a CNN to extract malicious features and train a classification model automatically. Nevertheless, this model can also be fine-tuned to make its predictions more accurate. Das et al. [26] compared a simple RNN, a simple LSTM, and a CNN-LSTM architecture for malicious URL classification. In terms of accuracy, precision, and recall, the CNN-LSTM architecture performs better than the other two. The lesson of this research is that different models take different approaches to feature extraction, so it is advisable to improve feature extraction by fusing models.
In general, deep learning technology has significantly improved the performance of malicious URL detection. Our method can process data faster than previous research results, which is essential when malicious URL detection is applied in practice. In addition, the fusion of texture fingerprint features enables the model to process URLs with complex structures, and the fusion of features gives the model better recognition accuracy. Experimental results show that our method improves the performance of malicious URL detection and classification.

Feature Analysis.
The malicious behavior of malicious websites is generally manifested in the URL and the website content. However, the method proposed in [27] bypassed the website content, extracted features directly from the URL for classification, and achieved good experimental results.
Inspired by this, the present study focuses only on the website URL. To grasp the global features of malicious URLs, we extracted the texture fingerprint features of website URLs. However, this type of feature is only superficial and does not fully reflect the essential attributes of URLs. We therefore also perform a static analysis of the website URL to extract its semantic features. In general, this study extracts two types of features: texture fingerprint features and semantic features.
3.1.1. Semantic Feature.
By analyzing the malicious URLs published by PhishTank [28] and OpenPhish [29], we found that the creators of some phishing websites usually imitate the content of regular pages and bind a similar domain name, such as "http://www.amazzonn.online." In addition, some special characters, such as "@" and "-", may be used to confuse users. Attackers also confuse users by lengthening the string with meaningless characters or by increasing the depth of the domain name (that is, the number of "."), as in "mlwdkaflzkpqccqdaxjuqlltyexdfcfuzufo-dot-cryptic-now-290917.ey.r.appspot.com." We therefore extracted URL word features. First, the input URL is symbolized (tokenized), and the string is decomposed into its constituent words. The symbolic description is shown in Figure 1.
To enable the symbolized data to be processed by the computer, the words obtained in the above steps must be embedded, that is, converted into numeric vectors containing the grammatical and semantic information of the words. Specifically, the symbolized data are embedded into a V × D matrix that is updated through backpropagation, where V is the size of the vocabulary and D is the dimension of the word embedding. Although word2vec yields vectors for most words, meaningless words and symbols can confuse the model, so we also extracted URL character features. The process is similar to that of word embedding. At this stage, we have extracted embeddings of the website URL at two levels of granularity: word level and character level.
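The two granularities described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the splitting rule (alphanumeric runs as words, every special character as its own token), the helper names, and the randomly initialized V × D table are all hypothetical stand-ins, and the random table merely takes the place of trained word2vec vectors.

```python
import random
import re


def word_tokens(url):
    # Assumed rule: maximal alphanumeric runs are words,
    # and each special character is kept as its own token.
    return re.findall(r"[A-Za-z0-9]+|[^A-Za-z0-9]", url)


def char_tokens(url):
    # Character-level view: every character is a token.
    return list(url)


def build_vocab(token_lists):
    # Index 0 is reserved for tokens never seen when the vocabulary was built.
    vocab = {"<unk>": 0}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def init_embedding(vocab_size, dim, seed=0):
    # V x D table with small random values (placeholder for trained vectors).
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def embed(tokens, vocab, table):
    # Look up one D-dimensional row per token.
    return [table[vocab.get(tok, 0)] for tok in tokens]


url = "http://www.amazzonn.online"
words = word_tokens(url)
chars = char_tokens(url)
vocab = build_vocab([words])
table = init_embedding(len(vocab), dim=4)
word_vectors = embed(words, vocab, table)
```

Keeping the special characters as tokens is a design choice of this sketch; it preserves signals such as "@", "-", and the number of "." that the text identifies as informative.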

Texture Fingerprint Feature.
We also extracted visual features from the URL. Wang et al. [30] concluded experimentally that web pages of the same malicious family have similar texture fingerprints. In previous studies, Su et al. [31] and Yang and Wen [32] demonstrated the validity of grayscale images as input for deep learning models. Inspired by these conclusions, we also converted the URLs into grayscale images. The two-dimensional texture fingerprint features, in the range of 8-bit unsigned integers, are converted into effective texture fingerprint features corresponding to the gray value range of the grayscale image.
Specifically, as shown in Figure 2, the original data are read in binary form, and each 8-bit read is used as a basic unit (padded with 0 if the last read is shorter than 8 bits). Then, each basic unit is converted to an unsigned integer, so that each integer value is guaranteed to be in the range [0, 255]. Each integer is mapped to a grayscale image and represents the grayscale value of one pixel. "0" means pure white, and "255" means pure black. Finally, the gray values are stored in a fixed-width matrix.
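The conversion can be sketched roughly as below. The function name and the default width are assumptions made for illustration, since the text does not state the matrix width. Because Python `bytes` are already 8-bit units, the bit-level padding is never needed here; the sketch instead pads the final row with zeros so that every row has the same width, which is also an assumption.

```python
def url_to_gray_matrix(url, width=8):
    # Read the URL as raw bytes: each byte is one 8-bit unit in [0, 255].
    pixels = list(url.encode("utf-8"))
    # Pad the last row with zeros so the matrix has a fixed width.
    remainder = len(pixels) % width
    if remainder:
        pixels.extend([0] * (width - remainder))
    # Cut the flat pixel list into rows of `width` gray values.
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


matrix = url_to_gray_matrix("http://www.amazzonn.online", width=8)
```

Every entry of `matrix` is an integer gray value, so the result can be handed to any image or tensor library without further conversion.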

Feature Fusion.
To further improve detection accuracy, the three kinds of features, namely the character-level embedding vector, the word-level embedding vector, and the texture fingerprint features, are fused. Given a sequence, $W_i$ represents the $i$th word, $C_i$ represents the $i$th character of $W_i$, and $P_i$ represents the $i$th pixel of the grayscale image. The following formula can be used to express the joint vector: where "[]" represents vector concatenation, and Emb(i) denotes the embedding of $i$. After the features are fused, they are sent to the model for training and prediction.
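A plausible reading of the joint vector, assuming a plain concatenation of the three embeddings in the order in which they are listed, is given below. The left-hand symbol $x_i$ is chosen here to match the fused feature vector used in the attention section; the exact form in the original may differ.

```latex
x_i = \bigl[\, \mathrm{Emb}(W_i);\ \mathrm{Emb}(C_i);\ \mathrm{Emb}(P_i) \,\bigr]
```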

Framework of the Model.
A deep learning framework for detecting malicious URLs based on Bi-IndRNN and CapsNet is proposed in this study. The main structure of the framework is shown in Figure 3. First, the displayable characters and words are embedded in a multidimensional feature space using character embedding and word embedding components. The texture fingerprint feature of the URL is extracted at the same time. Subsequently, the selected features are merged and input to the attention mechanism, which assigns probability weights to the mixed features so that key features receive higher weights. Next, an improved IndRNN called Bi-IndRNN is used to extract features from long sequences: the features produced by the attention mechanism are input into Bi-IndRNN. The features extracted by Bi-IndRNN are then input into CapsNet to establish high-level feature information. Finally, the sigmoid classifier is used to calculate the probability.

Attention Mechanism.
Each joint vector contributes differently to the feature expression of malicious URLs. Since the attention mechanism can give higher weight to key features and thereby highlight their impact on downstream models, we stacked the attention mechanism on top of the joint vector. Bahdanau et al. [33] first used the attention model in machine translation. The main task of the attention mechanism is to extract the information most critical to the model from a large number of inputs by simulating human attention, improving the efficiency of model training as much as possible while minimizing feature loss. At a macro level, the attention model can be understood as a mapping from a query to a series of key-value pairs. In essence, the attention mechanism performs a weighted summation of the values, where the query and key are used to calculate the weight coefficient of the corresponding value.
In this study, multihead attention is introduced to structure a subset of the high-dimensional URL features. Multihead attention is also based on the query, key, and value, represented by $Q, K, V \in \mathbb{R}^{n \times d}$, respectively ($n$ is the number of URL features, and $d$ is the dimension of the URL features), which are obtained by applying linear projections. Different from general attention, multihead attention uses scaled dot-product attention to calculate the attention score. Let $X = \{x_1, x_2, \ldots, x_n\} \in \mathbb{R}^{n \times d}$ represent the URL fusion feature vector, where $x_i$ is the $i$th feature vector; $x_i$ is input into the attention model:
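For reference, scaled dot-product attention in its standard form is written below. It is assumed, not confirmed by the text, that the model uses this form unchanged; $d_k$ denotes the dimension of the key vectors, a symbol introduced here for illustration.

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```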

Security and Communication Networks
The key to multihead attention is to apply the above attention multiple times; the number of heads is the number of times the above attention is performed. Then, the attention of all URL feature vectors is calculated as follows: However, the linear projections of Q, K, and V calculated by each head are different. Take the $i$th head of a multihead attention model with $h$ heads as an example: where $W^Q, W^K, W^V \in \mathbb{R}^{n \times d/h}$. After $h$ calculations, the results are concatenated: Finally, the weighted sum of the input joint vector $X$ and the obtained attention $M$ is calculated to obtain the input of the next layer, the feature vector $v$. After the above calculations, we can determine which information is more critical when Bi-IndRNN processes the current task. This important information is given a higher weight, so as to obtain as much information as possible for the current task from the URL joint vector.
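The per-head projection, scaled dot-product attention, and concatenation steps can be sketched in plain Python as follows. This is an illustration under assumptions rather than the authors' implementation: the projection matrices are randomly initialized instead of learned, they are given shape d × d/h so that the matrix products are defined, and the final weighted sum with X is omitted because its exact form is not specified. All function names are hypothetical.

```python
import math
import random


def matmul(a, b):
    # Plain list-of-lists matrix product.
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    d_k = len(k[0])
    # scores[i][j] = <q_i, k_j> / sqrt(d_k)
    scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def multi_head_attention(x, num_heads, seed=0):
    d = len(x[0])
    if d % num_heads != 0:
        raise ValueError("feature dimension must be divisible by the number of heads")
    d_head = d // num_heads
    rng = random.Random(seed)
    heads = []
    for _ in range(num_heads):
        # Each head has its own (here: random, untrained) projections.
        w_q = random_matrix(d, d_head, rng)
        w_k = random_matrix(d, d_head, rng)
        w_v = random_matrix(d, d_head, rng)
        out, _weights = scaled_dot_product_attention(
            matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
        )
        heads.append(out)
    # Concatenate the head outputs along the feature axis.
    return [[value for head in heads for value in head[i]] for i in range(len(x))]


x = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
m = multi_head_attention(x, num_heads=2)
```

With h heads of size d/h, the concatenated output has the same shape as the input, which is what allows it to be combined with X in a later step.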

Bi-IndRNN.
A recurrent neural network (RNN) can effectively process data with sequence characteristics. However, RNN training faces the problem of vanishing and exploding gradients caused by long-distance dependence. As a variant of the RNN, the long short-term memory network (LSTM) makes it easier to retain information from many steps earlier, but it does not guarantee that gradients will not vanish or explode. To overcome this limitation, Li et al. [10] proposed the independently recurrent neural network (IndRNN). This method effectively solves the problem of vanishing and exploding gradients because it works well with ReLU and other nonlinear activation functions and can regulate gradient backpropagation through time. The IndRNN unit structure is shown in Figure 4. The hidden layer of IndRNN can be described as where $W$, $u$, and $b$ represent the input weight, recurrent weight, and bias, respectively, $\odot$ denotes the Hadamard product, and $v_t$ denotes the input vector. However, IndRNN can only obtain features from forward information when processing sequences. To enable the model to integrate feature information better and have better modeling capabilities, an improved IndRNN called Bi-IndRNN is used in this study. Bi-IndRNN is based on IndRNN and adds the idea of the bidirectional recurrent neural network (BRNN). That is, for each time $t$, the input is given to two independent IndRNN units, one in the forward and one in the backward direction, at the same time, and the output is jointly determined by the two unidirectional IndRNN units. The joint vector is a description of semantic and visual information, including important text structure and the spatial position distribution between characters. To give the content represented by the joint vector more robust information representation capabilities, we use the Bi-IndRNN model to extract features from the joint vector. Given a feature vector $V = \{v_1, v_2, \ldots, v_t\}$ extracted from the fusion feature by multihead attention, in the Bi-IndRNN we implemented, the forward IndRNN $\overrightarrow{h}_t$ reads the feature sequence from $v_1$ to $v_t$, and the backward IndRNN $\overleftarrow{h}_t$ reads the feature sequence from $v_t$ to $v_1$. The hidden state expression can be expressed as follows: Next, we combine these two vectors as the output of Bi-IndRNN. In this way, each hidden state has information about the entire sequence, concentrated around the $t$th element of the input sequence. Then, the feature vector extracted by Bi-IndRNN is input into the capsule network to further extract deep features.
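The forward and backward passes can be sketched as follows, using the IndRNN recurrence of Li et al. [10], $h_t = \sigma(W v_t + u \odot h_{t-1} + b)$. Several points are assumptions of this sketch: the activation $\sigma$ is taken to be ReLU, the two directions are combined by concatenation (the text only says the vectors are combined), and all function names are hypothetical.

```python
def relu(vec):
    return [max(0.0, a) for a in vec]


def indrnn_step(v_t, h_prev, w, u, b):
    # h_t = ReLU(W v_t + u * h_prev + b); the recurrent weight u is a vector,
    # so each hidden unit only sees its own previous state (Hadamard product).
    return relu([
        sum(w_kj * x_j for w_kj, x_j in zip(w_row, v_t)) + u_k * h_k + b_k
        for w_row, u_k, h_k, b_k in zip(w, u, h_prev, b)
    ])


def indrnn(sequence, w, u, b):
    # Run the recurrence over the sequence, starting from a zero state.
    h = [0.0] * len(u)
    states = []
    for v_t in sequence:
        h = indrnn_step(v_t, h, w, u, b)
        states.append(h)
    return states


def bi_indrnn(sequence, forward_params, backward_params):
    forward = indrnn(sequence, *forward_params)
    # The backward unit reads v_t ... v_1; re-reverse to align with time.
    backward = indrnn(sequence[::-1], *backward_params)[::-1]
    # Assumed combination rule: concatenate both directions per time step.
    return [hf + hb for hf, hb in zip(forward, backward)]


params = ([[1.0]], [0.5], [0.0])  # W (hidden x input), u, b for one hidden unit
states = bi_indrnn([[1.0], [2.0]], params, params)
```

Because `u` is a vector rather than a matrix, the recurrent contribution of each unit can be bounded independently, which is the property the text credits for controlling gradients through time.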

Capsule Network.
This study introduces the capsule network to establish advanced feature information. The extracted feature data can be exploited to great advantage when the capsule network is built on top of the Bi-IndRNN layer. To address some of the defects of the convolutional neural network and adapt to new deep learning tasks, Sabour et al. [34] proposed the capsule network in 2017. The capsule network is also a kind of neural network. It differs from an ordinary neural network in that its neurons are vectors instead of scalars. Each dimension of these vectors represents an attribute of the object. Therefore, the capsule network retains the pose information and spatial relationships between objects to the greatest extent. As part of the overall model, the structure of the capsule network is shown in Figure 3. First, the features extracted by Bi-IndRNN are input to a standard convolution layer. The convolution operation is as follows: where $\circ$ is element-wise multiplication, $b$ denotes the bias, and $W_1 \in \mathbb{R}^{K \times d}$ denotes the convolutional filter, whose size is $K \times d$. That is, the convolution operation slides the filter over a given input to extract features and collects them in a feature map.
Next is the capsule layer, which converts the feature map into capsules through a group-convolution operation: where $V$ denotes the dimension of a capsule vector, $Z_i$ represents the $i$th dimension capsule vector, and the function $g$ is the nonlinear squash function, which is expressed by the following formula: Each capsule in the $i$th layer of the network needs to predict the output $u$ of the capsules in layer $(i + 1)$ separately: Then, the weighted sum of all prediction vectors is calculated to get the high-level capsule $v$: where $c_i$ is the coupling coefficient obtained by the dynamic routing algorithm. The capsule network can retain the most valuable information to the greatest extent, preserve it completely, and pass it to the upper capsule. Finally, the result is input into the sigmoid classifier to get the final probability. At this point, our model can complete the detection of malicious URLs.
Table 1
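The squash nonlinearity and the routing step can be sketched as below, following the routing-by-agreement procedure of Sabour et al. [34]. It is assumed here that the model uses that procedure unchanged; the number of routing iterations, the small `eps` constant, and the function names are choices made for this illustration only.

```python
import math


def squash(s, eps=1e-9):
    # Keeps the direction of s and maps its length into [0, 1):
    # g(s) = (|s|^2 / (1 + |s|^2)) * s / |s|
    norm_sq = sum(x * x for x in s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq + eps)
    return [scale * x for x in s]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, iterations=3):
    # u_hat[i][j] is the prediction of lower capsule i for upper capsule j.
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    logits = [[0.0] * n_out for _ in range(n_in)]
    for _ in range(iterations):
        # Coupling coefficients: softmax over the upper capsules.
        c = [softmax(row) for row in logits]
        v = []
        for j in range(n_out):
            # Weighted sum of predictions, then squash.
            s_j = [sum(c[i][j] * u_hat[i][j][k] for i in range(n_in)) for k in range(dim)]
            v.append(squash(s_j))
        # Raise the logit where a prediction agrees with the capsule output.
        for i in range(n_in):
            for j in range(n_out):
                logits[i][j] += sum(a * b for a, b in zip(u_hat[i][j], v[j]))
    return v, c


u_hat = [
    [[1.0, 0.0], [0.0, 0.1]],
    [[1.0, 0.0], [0.1, 0.0]],
]
v, c = dynamic_routing(u_hat)
```

In this toy input both lower capsules agree on the first upper capsule, so routing shifts their coupling coefficients toward it.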

Dataset.
The dataset in this study consists of benign and malicious instances. We obtained a collection of benign URLs from the top rankings of Alexa verified by Google Safe Browsing, and the collection of malicious URLs was obtained from public websites, such as host-file.net and phishtank.com. In the end, 32,378 benign URLs and 33,549 malicious URLs were obtained.

Evaluation Indicators.
We use five-fold cross-validation. The dataset is divided equally into five parts, four of which are used as training data and one as test data, and the experiments are carried out in turn. Accuracy (ACC), precision (P), recall (R), and F score (F) are used to evaluate the classification results. Before the evaluation, it is necessary to count the number of samples correctly classified as malicious (TP) and benign (TN) and the number of samples incorrectly classified as malicious (FP) and benign (FN). The evaluation is calculated as follows:
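The four indicators follow the usual confusion-matrix definitions: ACC = (TP + TN) / (TP + TN + FP + FN), P = TP / (TP + FP), R = TP / (TP + FN), and F = 2PR / (P + R). The sketch below computes them and shows one simple way to build five folds. It assumes that F is the balanced F1 score; the interleaved fold assignment and the function names are illustrative choices, not the authors' procedure.

```python
def classification_metrics(tp, tn, fp, fn):
    # Standard definitions; guard against empty denominators.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return {"ACC": accuracy, "P": precision, "R": recall, "F": f_score}


def k_fold_indices(n_samples, k=5):
    # Interleaved folds: sample i goes to fold i mod k.
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [idx for j, fold in enumerate(folds) if j != held_out for idx in fold]
        yield train, test


metrics = classification_metrics(tp=90, tn=80, fp=10, fn=20)
splits = list(k_fold_indices(10, k=5))
```

In practice the samples would be shuffled before the folds are built, so that benign and malicious URLs are mixed in every fold.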

The Influence of Model Parameters on Experimental Results.
In the model training process, we found that the influence of model parameters on the experimental results is quite apparent. Appropriate parameter settings have a positive effect on model training and classification results. To determine these parameters and obtain the optimal classification results, we test variable parameters, such as feature types and feature dimensions, on the same dataset and determine the optimal values based on the evaluation indicators.
To determine which feature type gives the best classification performance, we first tested three types of features separately (character embedding, word embedding, and texture fingerprint features) and then tested the combination of all three. The results are given in Table 2. Character and word embedding alone already classify well, reaching recall rates of 99.82% and 99.89%, respectively. In contrast, the texture fingerprint classifier is slightly weaker, reaching a recall rate of 97.48%. The table also shows that although character embedding features alone give good results, with an accuracy of 99.74%, combining the three features yields a consistent improvement in the evaluation indicators. The dimension of the feature vector also affects the experimental results. We treated the dimension as a variable parameter and compared six groups with different feature dimensions. As shown in Figure 5, the feature dimension is increased by ten from one group to the next. When the feature dimension increased from 90 to 130, all evaluation indicators increased, but increasing the feature dimension further did not give better results. When the dimension increases from 130 to 140, all evaluation indicators decrease except for a slight increase in recall, and the decrease in precision is the most pronounced. We therefore set the feature dimension to 130.

The Necessity of Model Components.
In this part, several sets of experiments are designed to verify the effectiveness of each part of the model. After comparing three attention mechanisms, we found that all of them perform well. As shown in Figure 6, the accuracy of self-attention, hierarchical attention, and multihead attention reaches 99.75%, 99.67%, and 99.78%, respectively. However, the multihead attention mechanism improves significantly across the evaluation indicators, and its recall rate reaches 99.90%, which is helpful for the detection and classification of malicious URLs. Multihead attention was therefore used as a component of the model in this study.
Our model combines the attention mechanism, Bi-IndRNN, and CapsNet components. To verify the effectiveness of each component, three other models were designed. The three models in the table are AIR (without CapsNet), ACaps (without IndRNN), and IRCaps (without the attention mechanism). Such a comparative experiment allows us to see the contribution of each component in the model. As given in Table 3, the AIR model without CapsNet is lower in accuracy, precision, and recall than the model used in this study, which shows the effectiveness of the capsule network. For ACaps without IndRNN, the result is similar to that of the AIR model, and all evaluation indicators are also lower than those of our model. In addition, the IRCaps model removes the attention mechanism, and the result is as we expected: its performance is not as good as that of our model because it cannot select the more helpful information, which also illustrates the necessity of the attention mechanism.
To verify that our proposed model is better suited to malicious URL detection and classification, a set of experiments was designed to compare it with methods based on other deep learning models.
The experimental results are shown in Table 4. In this experiment, we fixed the hyperparameters and fed the same data set into the different models under the same experimental environment to verify the improvement of our model. These methods all perform well for malicious URL detection and classification. Wanda and Jiang [35] also use character embedding technology and a single CNN architecture to extract features and classify, with a precision of 99.7%. Nevertheless, their method is slightly inferior in accuracy and F value. Bahnsen et al. [24] and Liang et al. [36] used LSTM and Bi-LSTM models, respectively, and it can be concluded that the Bi-LSTM model, with an accuracy of 99.74%, performs slightly better than LSTM. In Wang's [37] Table 5, and the time cost experimental results are given in Table 6. The experimental results show that the classic models can also achieve good results in a short time. For example, an attention-based Bi-LSTM model called AB-LSTM proposed in [8] reaches a test accuracy of 99.69%. In pursuit of higher accuracy, researchers have proposed more complex models to detect malicious URLs. The TException method proposed in [39] uses multiple TException Blocks composed of 1D convolution, batch normalization, max-pooling, ReLU, and deep neural network (DNN) layers to perform feature processing on character-level and word-level URLs. This method uses multiple batch normalization layers to speed up training, but this also reduces the expressive ability of the subsequent activation function, resulting in a limited improvement in accuracy. Both the attention-based CNN-LSTM (ATT-CNN-LSTM) method proposed in [40] and the CNN and attention-based hierarchical RNN (ATT-CNN-HRNN) method proposed in [41] combine CNN- and RNN-related methods, which can effectively extract relevant features and achieve malicious URL detection.
It can be seen from Table 6 that, compared with the other new methods, our method does require a longer time to train a single epoch, which is caused by the dynamic routing algorithm in the internal loop of the capsule network. However, the smaller number of training parameters makes our method converge faster, so its total training time is of the same order of magnitude as that of the other advanced algorithms, and its test accuracy is higher.
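To make the source of the per-epoch cost concrete, the following is a minimal pure-Python sketch of routing by agreement, assuming the standard dynamic routing procedure for capsule networks. The number of routing iterations and the capsule sizes used in this study are not stated in this section, so `num_iters=3` and the names `squash`, `softmax`, `dynamic_routing`, and `u_hat` are assumptions made for illustration.

```python
import math


def squash(v):
    """Shrink a vector so that its length lies in [0, 1), keeping its direction."""
    norm_sq = sum(x * x for x in v)
    if norm_sq == 0.0:
        return [0.0] * len(v)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in v]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, num_iters=3):
    """Route input capsules to output capsules by agreement.

    u_hat[i][j] is the prediction vector that input capsule i makes for
    output capsule j. Returns one output vector per output capsule.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits
    v = []
    for _ in range(num_iters):
        # Coupling coefficients: each input capsule distributes over outputs.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Raise the logit where a prediction agrees with the output vector.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v
```

Each routing iteration visits every pair of input and output capsules, so the work per forward pass grows with the number of iterations times the number of capsule pairs times the capsule dimension. This inner loop runs on top of the usual layer computation, which is consistent with the longer single-epoch training time noted above.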

Conclusions
This study proposed a joint neural network algorithm model combining the attention-based bidirectional independent recurrent neural network (Bi-IndRNN) and the capsule network (CapsNet) to identify and detect malicious URLs. It can be concluded from the experiments that the performance of this method in detecting malicious URLs is significantly better than that of a single deep neural network or a shallow neural network. The key to this study is to use the word vector model word2vec to train URL word and character vector features, extract the texture fingerprint features of the URL, and fuse the three features. Then, key features are extracted based on the weights of the multihead attention mechanism and Bi-IndRNN, and finally, the capsule network is used to build high-dimensional features and classify them. Besides, in the same experimental environment, we compared different feature types and dimensions, different model components, and different algorithm models. In summary, the method proposed in this article can effectively improve the detection efficiency and accuracy of malicious URLs.
Although the method in this study performed well, it can still be improved. In follow-up work, we will consider integrating dynamic and static features to verify its effectiveness. At the same time, we will continue to update the model, integrate new components into the system, and optimize the time cost of the model to obtain a better method.

Data Availability
The data used to support the findings of this study have been deposited in the GitHub repository (https://github.com/yipeng-liu-rep/malicious-url-data).

Conflicts of Interest
The authors declare that there are no conflicts of interest.