Identification of Attack on Data Packets Using Rough Set Approach to Secure End to End Communication



Introduction
Information technology plays a significant role in everyday life and has made life easier. With the advancement and development of information technology, security has come to be considered one of the main concerns for interaction and communication [1][2][3][4][5][6][7][8][9][10]. Within the last decade, information security attacks have risen, and hackers have tried to capture organizations' significant information for their personal profit. This kind of attack on information and networks can cause severe damage to the owner of the network and information. The security of an organization's network and information is highly dependent on the diverse forms of information the organization holds. Security has become one of the significant factors for any communication and transmission of data. Different smart devices are linked to process, communicate, monitor, and compute various real-time developments.
If eavesdropping occurs during interaction and communication, it can lead to severe damage to the whole network, and the data will be controlled by malicious users. Identification of attack is a way to identify security violations and analyze the measures in a computer network.
In the modern information communication industry, devices are connected through the Internet of Things (IoT), forming a network of communication. Due to the challenges of security and privacy, the idea of IoT came into existence. The reason was that conventional security protocols do not support the security of IoT devices. Researchers have proposed the use of diverse security approaches and measures for securing information and its communication and for further securing the network [10][11][12][13][14][15][16][17][18].
These measures include logical access, firewalls, identification, control, authentication, and encryption and decryption. Building a complete security system is hard to accomplish, and none of these security measures alone can secure network communication [11,18]. An identification system that is accurate and effective at detecting malicious activities or intrusions can further secure the prevailing system for secure and smooth communication between end to end nodes. The role of information security is to protect the entire data of networks and to maintain its confidentiality, integrity, and availability for its rightful users. Therefore, there is a need for end to end security management, which will ensure the security and privacy of the network and protect the data inside networks from malicious users. As the number of devices connected to a network grows with the passage of time, the level of threats to these devices also increases.
To address the severity of the security problem, the proposed study has implemented the rough set approach, a mathematical tool that can deal with uncertainty, for the identification of attacks on data packets and to ensure secure communication between end to end nodes. The dataset "Attack Prediction on Data Packets", available at Kaggle [19], was used to validate the research. The experimental results show the success of the study in detecting attacks on data packets. The paper is organized as follows. Section 2 presents the literature and work related to the current research, in particular the identification of attacks on data packets. Section 3 briefly presents a library-based analysis of the available literature from diverse viewpoints in the most widely used libraries and the application of rough sets to the proposed study. The experimental results and discussions are given in Section 4. The paper is concluded in Section 5.

Related Work
Researchers are seeking diverse techniques, approaches, tools, and solutions for the identification and prevention of security attacks. Kotenko and Chechulin [20] offered a framework for security evaluation and attack modelling in security information and event management systems. Subsorn and Limwiriyakul [21] examined the Internet banking security of 16 Australian banks to identify the deficiencies that could affect the confidentiality of bank customers. In addition, the research examined 12 Thai commercial banks and compared the experimental results with prior research studies. Kotenko and Chechulin [22] proposed an approach to computer attack modelling and security evaluation for security information and event management systems. The authors proposed a quantitative method for information systems security risk that is systematic, modular, and extendable. The research aimed to efficiently estimate security threats in a wide-ranging manner [9]. Manjiatahsien et al. [23] gave a summary of the architecture of IoT with comprehensive details on machine learning algorithms and the implications of IoT security for various attack types. Another study presented an approach to the information management factors relevant to organizing information security. Initially, 136 articles were surveyed to identify the factors of information security, and then a sequence of interviews was performed with 19 industry experts to evaluate the association of these factors. In the third step, a comprehensive model was developed [24]. The detection and measurement of security have an important role in areas such as IoT in smart cities. The authors of [25] conducted a comprehensive review of the literature on deep learning, IoT security, and big data technologies. Zhang et al. [7] suggested a method for crowd measuring the trustworthiness and security of open social networks on the basis of signaling theory.
IoT devices functioning in healthcare environments are vulnerable to numerous attacks and cyber threats. The healthcare industry faces 340% issues of security, and it is 200% more vulnerable to data theft [26]. Accordingly, over 90% of enterprises have faced a security breach [27]. Additionally, research has indicated that typically 164 cyber threats are identified per 1,000 connected host devices in Internet of Medical Things (IoMT) systems [28]. IoMT devices are deployed in a system without security being considered, and this is the key reason that these devices suffer from availability, integrity, and confidentiality issues [29]. These vulnerabilities permit cyber-criminals to gain access to the IoMT network and obtain sensitive and personal data about the patient. The key issues confronting IoMT devices are privacy and security. Johnson and Johnson reported that IoMT devices such as digital insulin devices are susceptible to cyber threats [30].
In an IoHT system, the data relating to the patient is stored in the cloud and is transferred back and forth through millions of IoT devices, which makes the data vulnerable in their applications. Due to this vulnerability, many enterprises may not agree to store IoT applications on the cloud. Thus, risk evaluation is required before storing such applications on the cloud and on mobile devices that run IoT applications [31]. Occasionally, deciding on the best security for IoHT devices is an issue due to numerous factors, including increasingly multifaceted security measures, the vast number of heterogeneous IoT devices, processing limitations, and the memory capacity of such devices. Considering these situations, the lack of appropriate security measures and criteria is not acceptable.
The study [5] offered a comprehensive summary of the analysis of security properties of ML algorithms. The authors examined the security of ML to build an outline for diverse areas of research. The attack approaches and the corresponding defense approaches were discussed. Another study offered an outline of the strengths and flaws of the existing assessment approaches for the security and usability of E-commerce websites. The assessment models for E-commerce from 2000 to 2018 were studied [32]. Mao et al. [33] proposed a security dependency system for measuring the implications of a security system from system-wide viewpoints. Nazir et al. [10] devised an approach to assess the security of software components through the analytic network process. They explained that the technique is effective and efficient in complex circumstances where dependencies exist among diverse network nodes. The emergence of deep learning has transformed the field of research by supporting practitioners and researchers with automatic feature extraction capabilities [34]. These automatic feature extraction capabilities not only free practitioners and researchers from the difficulty of picking a feature extraction approach pertinent to a certain problem, but also ensure a high recognition rate compared with conventional classification algorithms. Schuster and Paliwal [35] suggested the Bidirectional Recurrent Neural Network (BRNN), designed by running an RNN in a forward and a backward direction. This allows long-range sequence context about the past and the future to be preserved through Bidirectional Long Short-Term Memory [36]. The combination of the attributes of the BRNN and LSTM models is collectively known as BLSTM.

Research Method
In the last few years, attacks on information and network security have grown. Intruders endeavor to take significant information from organizations for their own use and profit. These security attacks on networks and their data can cause severe damage to the owner of the network and information. The information security of an organization is highly dependent on the diverse sorts of information the organization holds. Security has become one of the significant factors for any communication and transmission of data. Different smart devices are linked to process, communicate, monitor, and compute various real-time developments. If any intrusion or eavesdropping occurs during communication and transmission, it is harmful for the owners of the information network and can lead to serious damage to the whole network, and the data will be controlled by malicious users. Traditionally, computing security has relied on the approaches and mechanisms of authentication and access control. These provide access to authorized users. Pervasive and ubiquitous computing is highly flexible and scalable, which makes such mechanisms unsuitable for it.
Information security provides services through its platforms in the form of pervasive and ubiquitous computing, which is one of the advanced paradigms of information security. Pervasive computing plays an important role where it delivers the capability to bring computational facilities to the environments in which people work, and it raises concerns such as identity, privacy, and trust. The key benefits of pervasive computing are the design and development of services that are efficient for the users who send queries as service requests and for the situations from which the service request is sent. The contribution of the proposed study is to adopt the rough set approach, a mathematical tool for dealing with the uncertainty that arises in the detection of attacks on data packets. The dataset available online at Kaggle was used to validate the research process. The approach demonstrates success in identifying attacks on data packets for secure communication between end to end nodes. Figure 1 depicts the flowchart for conducting the study. The figure consists of different phases. In the first phase, the existing literature is studied in order to show the related work in the area. In the next phase, the details of the approaches used in the area are given. In the third phase, the dataset was identified for conducting the experiments of the proposed study. In the fourth phase, the information (decision) table was designed from the dataset. In the fifth phase, the information table was loaded into the RSES tool in order to run the experiment on the dataset. In the sixth and final phase, the process of calculation was done through the software in order to apply the algorithms and obtain results. The following subsections describe the research methodology in brief.

Search Strategies of the Existing Research.
Recent developments in the IT sector have moved the world away from large systems toward smaller controlling devices that facilitate interfaces for heterogeneous, large-scale computational wireless communications. Security is a key part of a system for its smooth functionality. Diverse methods and approaches are used for securing communication inside and outside the network. Identification of attack is a way to identify security violations and analyze the measures in a computer network. IoT devices are always vulnerable due to the environment in which they operate. The associated IoT devices are mobile devices and may lose connectivity because they are vulnerable to wireless outages. The common IoT vulnerabilities identified in [37][38][39] are presented in Figure 2.
A search strategy was adopted to learn about the existing literature in the field. For this purpose, the popular libraries in the field, namely ACM, IEEE, ScienceDirect, and Springer, were searched. The data from these libraries were collected in the form of year of publication, type of publication, area of publication, media of publication, and other types of studies, which are given in the figures and tables. Figure 3 presents the magazine/journal names and the articles published in the ACM library. The search process returned materials in different forms, including conference papers, journal papers, books, and many other online materials. The purpose of these materials was to show the background knowledge in the area of security for end to end communication. Figure 4 presents the number of publications along with the media format in which each paper is published. These categories include PDF, image, HTML, video, and other formats. Figure 5 shows all the publications along with the total number of papers published for the search process in the ACM library. These publications are in the form of journal, conference, book, and other types. The publication categories were further elaborated to show in-depth details of the associated studies. Figure 6 presents the types of content and the number of articles published. The figure shows that most papers were published in the form of research articles, followed by posters, and so on.
After the search process in ACM, the search was performed in the IEEE library. The reason for searching in different libraries was to identify more details of the area in the most widely used libraries. Figure 7 presents the publication types along with the number of articles in the IEEE library. In this library, most papers were obtained in the form of conference papers, followed by journal papers. Figure 8 presents the topics of publications along with the total number of articles published in the IEEE library. The figure shows different topics and areas in the field, in which most work is done in the area of security of data, followed by telecommunication security, and so on. The reason for identifying these topics was to know which area is researched the most. The search process was then performed in the ScienceDirect library to further identify the studies published in the field. This library publishes quality papers in different areas of interest. Among the available libraries, more focus was given to this library. The reasons for searching this library were to identify the related materials and the research done in the field and to provide background knowledge. Figure 9 shows the total number of papers published in each year. The figure suggests that the number of publications on the identification of security threats is likely to increase in the coming year. It also shows that more papers were published in the last two years, which indicates that significant work has been done in the area and that the work is still going on. Figure 10 depicts the types of articles along with the total number of articles published. These types of article include journal, conference, book, and others.
This step was performed to learn about the types of papers, with more focus given to indexed journal papers. Figure 11 presents the publication titles along with the number of articles published under each; the aim was to know how many publications in the area are available online under a given publication title. The figure shows that most work has been done in the area of computer networks, followed by computer and security, then computer communication, and so on. Identifying the different areas of security, the types of publications, the years of publication, and the publishers was quite a tricky process. This was done to ensure that the most closely related materials in the area were identified to support the current study. For these reasons, different types of search mechanisms were adopted. Most of the process was done manually by the authors to ensure that no related materials were missed, although some materials relevant to the current study may still have been missed.
Finally, the Springer library was searched to view associated materials published in relation to the proposed research. Figure 12 presents the types of content along with the number of articles in the Springer library. Figure 13 presents the disciplines in which the papers are published along with the number of articles published. Table 1 shows some of the subdisciplines in which the papers are published along with the total number of publications.

Rough Set Approach for Identification of Attack on Data Packets.
ML algorithms are in use for the identification of intrusions affecting organizations or their systems [2,6,40,41,42]. In this paper, a rough set approach is used to identify attacks on data packets. The rough set approach works very well in situations of uncertainty by means of upper and lower approximations. The rough set approach is a combination of rules made up of associated features. The resulting model consists of "IF THEN rules." The rough set was presented by Pawlak in 1982 [43]. It has specific lower and upper approximation boundary areas. Rough sets can be presented mathematically as in equation (1) [44]. Figure 14 presents the rough set concept. Figure 15 presents the rough set theory workflow and its application. The main parts of the workflow are described in this section. The workflow of the proposed research has been implemented with the RSES software [45]. The RSES library is a well-known implementation of the rough set theory process. The experimental process of the proposed study has followed the process that is commonly used in rough sets for data analysis. Rough set theory (RST) and fuzzy rough set theory (FRST) are based on some preliminary parts [46,47]. The details are given as follows.

Indiscernibility Relation.
The indiscernibility relation expresses the extent to which two objects are similar. In RST, given an information system (U, A), for any B ⊆ A the equivalence relation $R_B$ is defined by

$$R_B = \{(x, y) \in U \times U : a(x) = a(y) \ \text{for all } a \in B\}.$$

If $(x, y) \in R_B$, then x and y have the same values of the attributes in B. In FRST, R is a fuzzy tolerance relation in U, where R satisfies

Reflexivity: $\forall x \in U,\ R(x, x) = 1$
Symmetry: $\forall x, y \in U,\ R(x, y) = R(y, x)$
$\tau$-transitivity: $\forall x, y, z \in U,\ \tau(R(x, y), R(y, z)) \le R(x, z)$,

where R is a fuzzy t-equivalence relation.
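As a minimal sketch of the crisp (RST) case, the equivalence classes of the indiscernibility relation can be obtained by grouping objects that agree on every attribute in B. The toy decision table, its attribute names (`proto`, `size`, `attack`), and the function name are hypothetical illustrations and are not taken from the Kaggle dataset or from RSES.

```python
from collections import defaultdict

# Hypothetical toy decision table: packets described by made-up
# conditional attributes and a decision attribute "attack".
TABLE = {
    "p1": {"proto": "tcp", "size": "small", "attack": "no"},
    "p2": {"proto": "tcp", "size": "small", "attack": "yes"},
    "p3": {"proto": "udp", "size": "large", "attack": "yes"},
    "p4": {"proto": "udp", "size": "large", "attack": "yes"},
    "p5": {"proto": "tcp", "size": "large", "attack": "no"},
}


def indiscernibility_classes(table, attrs):
    """Partition the universe into the equivalence classes of R_B, B = attrs."""
    classes = defaultdict(set)
    for obj, row in table.items():
        # Objects with identical values on all attributes in B share a key.
        key = tuple(row[a] for a in attrs)
        classes[key].add(obj)
    return list(classes.values())


classes = indiscernibility_classes(TABLE, ["proto", "size"])
print(sorted(sorted(c) for c in classes))
```

Here p1 and p2 fall into the same class although their decisions differ, which is exactly the kind of uncertainty the approximations are meant to capture.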

Approximations.
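The indiscernibility relation is used in the definition of the lower and upper approximations. Given B ⊆ A, a set X ⊆ U can be approximated using the information in B by constructing the B-lower and B-upper approximations of X:

$$\underline{B}X = \{x \in U : [x]_{R_B} \subseteq X\}, \qquad \overline{B}X = \{x \in U : [x]_{R_B} \cap X \neq \emptyset\},$$

where $[x]_{R_B}$ is the equivalence class of x under the indiscernibility relation $R_B$. In FRST, following Radzikowska and Kerre [48], the crisp lower and upper approximations are generalized by means of an implicator $\mathcal{I}$ and a t-norm $\tau$, for a fuzzy indiscernibility relation R and a fuzzy set X in U:

$$(R \downarrow X)(y) = \inf_{x \in U} \mathcal{I}(R(x, y), X(x)), \qquad (R \uparrow X)(y) = \sup_{x \in U} \tau(R(x, y), X(x)).$$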
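The crisp approximations can be illustrated with a short sketch: the B-lower approximation collects the equivalence classes that lie entirely inside X, and the B-upper approximation collects those that intersect X. The table, attribute names, and function name below are hypothetical and only serve the illustration.

```python
# Hypothetical toy decision table (made-up attributes, not the Kaggle data).
TABLE = {
    "p1": {"proto": "tcp", "size": "small", "attack": "no"},
    "p2": {"proto": "tcp", "size": "small", "attack": "yes"},
    "p3": {"proto": "udp", "size": "large", "attack": "yes"},
    "p4": {"proto": "udp", "size": "large", "attack": "yes"},
    "p5": {"proto": "tcp", "size": "large", "attack": "no"},
}


def approximations(table, attrs, target):
    """Return the (B-lower, B-upper) approximations of `target` for B = attrs."""
    blocks = {}
    for obj, row in table.items():
        blocks.setdefault(tuple(row[a] for a in attrs), set()).add(obj)
    lower, upper = set(), set()
    for block in blocks.values():
        if block <= target:   # class certainly belongs to X
            lower |= block
        if block & target:    # class possibly belongs to X
            upper |= block
    return lower, upper


# X: the set of packets labelled as attacks.
attacks = {obj for obj, row in TABLE.items() if row["attack"] == "yes"}
lower, upper = approximations(TABLE, ["proto", "size"], attacks)
print(sorted(lower), sorted(upper), sorted(upper - lower))
```

The difference `upper - lower` contains the objects that cannot be classified with certainty from the chosen attributes.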

Regions and
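The B-positive region and the B-boundary region follow from these approximations:

$$POS_B = \bigcup_{x \in U} \underline{B}[x]_d, \qquad BND_B(X) = \overline{B}X \setminus \underline{B}X,$$

where $\underline{B}$ and $\overline{B}$ denote the B-lower and B-upper approximations and $[x]_d$ is the decision class of x. The degree of dependency of the decision attribute d on the set of conditional attributes B can be computed by

$$\gamma_B = \frac{|POS_B|}{|U|}.$$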
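A small sketch may help make the degree of dependency concrete. It assumes the usual crisp definition, in which the positive region is the union of the equivalence classes whose objects all share one decision value, and the dependency degree is the size of that region divided by the size of the universe. The table and names are hypothetical.

```python
# Hypothetical toy decision table (made-up attributes, not the Kaggle data).
TABLE = {
    "p1": {"proto": "tcp", "size": "small", "attack": "no"},
    "p2": {"proto": "tcp", "size": "small", "attack": "yes"},
    "p3": {"proto": "udp", "size": "large", "attack": "yes"},
    "p4": {"proto": "udp", "size": "large", "attack": "yes"},
    "p5": {"proto": "tcp", "size": "large", "attack": "no"},
}


def dependency_degree(table, cond_attrs, decision):
    """Return (positive region, |POS_B| / |U|) for B = cond_attrs."""
    blocks = {}
    for obj, row in table.items():
        blocks.setdefault(tuple(row[a] for a in cond_attrs), []).append(obj)
    positive = set()
    for block in blocks.values():
        # A class is in the positive region if its decision is unambiguous.
        if len({table[obj][decision] for obj in block}) == 1:
            positive.update(block)
    return positive, len(positive) / len(table)


pos_full, gamma_full = dependency_degree(TABLE, ["proto", "size"], "attack")
pos_proto, gamma_proto = dependency_degree(TABLE, ["proto"], "attack")
print(gamma_full, gamma_proto)
```

Dropping an attribute can only keep or lower the dependency degree, which is the basis for searching for reducts.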

Discernibility Matrix.
In RST, for an information system (U, A), the discernibility matrix M(A) is a symmetric n × n matrix whose elements $c_{ij}$ are defined as

$$c_{ij} = \{a \in A : a(x_i) \neq a(x_j)\} \quad \text{for } i, j = 1, \ldots, n.$$
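The definition above can be sketched directly: each entry of the matrix is the set of attributes on which two objects differ. The three objects and their attribute names are hypothetical.

```python
# Hypothetical objects x_0, x_1, x_2 with made-up conditional attributes.
OBJECTS = [
    {"proto": "tcp", "size": "small"},
    {"proto": "udp", "size": "small"},
    {"proto": "udp", "size": "large"},
]


def discernibility_matrix(objects, attrs):
    """Entry [i][j] is the set of attributes whose values differ for x_i and x_j."""
    n = len(objects)
    return [
        [{a for a in attrs if objects[i][a] != objects[j][a]} for j in range(n)]
        for i in range(n)
    ]


matrix = discernibility_matrix(OBJECTS, ["proto", "size"])
for row in matrix:
    print([sorted(entry) for entry in row])
```

The diagonal entries are empty and the matrix is symmetric, so in practice only the entries with i < j need to be stored.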
In RST the discernibility matrix can be defined as
(1) Decision/Information Table. An information system is denoted as IS = (U, A), where U is the universe, a nonempty finite set of objects, and A is a nonempty finite set of attributes.
(3) Cut and Discretization. A cut mostly appears in the context of the discretization process. Discretization is the process of calculating cuts for continuous attributes and grouping their values into intervals, thereby converting them into discrete attributes [49].
(4) Rules Generation. After constructing reduct sets, rules are generated in the form of (IF C THEN D), where "C" is the condition, and "D" is the decision value.

... objects of 8459. For cross validation purposes, the Decision rules (DR) and Decomposition tree (DT) algorithms were used. The DR algorithm shows an accuracy of 59.1%, while the DT shows an accuracy of 61.5%. The comparison of all these algorithms is shown in Figure 16.
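The cut, discretization, and rule-generation steps listed earlier in this section can be sketched as follows. This is only an illustration under simplifying assumptions: candidate cuts are taken as midpoints between consecutive distinct values, and rules are read off from consistent equivalence classes over all conditional attributes, without first computing reducts. RSES uses its own algorithms for both steps; the packet sizes, attribute names, and function names here are hypothetical.

```python
from bisect import bisect_right


def candidate_cuts(values):
    """Midpoints between consecutive distinct values (a simple cut heuristic)."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def discretize(value, cuts):
    """Map a continuous value to the index of the interval defined by the cuts."""
    return bisect_right(cuts, value)


def generate_rules(rows, cond_attrs, decision):
    """Emit (condition, decision) pairs for classes with a single decision value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in cond_attrs)
        seen.setdefault(key, set()).add(row[decision])
    return [
        (dict(zip(cond_attrs, key)), next(iter(decisions)))
        for key, decisions in seen.items()
        if len(decisions) == 1  # skip inconsistent (boundary) classes
    ]


# Hypothetical packets with a continuous "size" attribute.
packets = [
    {"size": 40, "attack": "no"},
    {"size": 60, "attack": "yes"},
    {"size": 60, "attack": "no"},
    {"size": 1500, "attack": "yes"},
]
cuts = candidate_cuts([p["size"] for p in packets])
rows = [{"size_bin": discretize(p["size"], cuts), "attack": p["attack"]} for p in packets]
rules = generate_rules(rows, ["size_bin"], "attack")
for condition, outcome in rules:
    print("IF", condition, "THEN attack =", outcome)
```

The two packets of size 60 carry conflicting decisions, so no certain rule is produced for their interval.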
After evaluating the proposed model on various measures, namely accuracy, F-measure, specificity, precision, recall, and misclassification rate, it was concluded that KNN performs best among the classification algorithms, as depicted in Figure 16. The other two generic algorithms, decision trees and decision rules, are used to check the applicability of the KNN-based recognition model. Figure 17 shows the experimental results of the KNN model.
(1) Decision trees: Figure 18 shows the recognition capabilities of the decision trees based model on different performance measures.
(2) Decision rules: Figure 19 shows the recognition capabilities of the decision rules based model on different performance measures.
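For reference, the performance measures named above can all be derived from a binary confusion matrix. The sketch below uses the standard definitions; the counts passed in are hypothetical and are not the results reported in this paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard measures from confusion-matrix counts.

    Assumes every denominator is nonzero (at least one predicted positive,
    one actual positive, and one actual negative).
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f_measure": 2 * precision * recall / (precision + recall),
        "misclassification_rate": (fp + fn) / total,
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=30, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy and misclassification rate always sum to one, so reporting both is mainly a matter of presentation.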

Conclusion
Information security is considered to be one of the important factors for any network and information communication. An organization with optimum security can run a successful business and earn substantial profit from it. With the passage of time, developments in information security are increasing. Protecting the data and information inside the network has become a challenging task for practitioners and researchers. To tackle such issues, an efficient and accurate mechanism is a pressing need of the modern information industry. The proposed research presents the detection of attacks on data packets through the use of rough set theory for the security of data packets and communication. The experimental work was performed with the RSES tool, and the results of the proposed study show that the approach is capable of detecting attacks on data packets.

Data Availability
No data are available.

Conflicts of Interest
The authors declare no conflicts of interest.