Intrusion Detection Using Machine Learning for Risk Mitigation in IoT-Enabled Smart Irrigation in Smart Farming

,


Introduction
Agriculture is very important to the country's economic well-being because it provides food for everyone. One of the most important things that happened in the country is linked to it. If a country has a lot of farmers, it is thought to be both economically and socially wealthy. Agriculture is the main source of jobs in most countries. When there are a lot of people on a big farm, they often need help with planting and caring for the animals. ese large farms can use nearby processing facilities to finish and improve their agricultural goods [1]. As human civilization has progressed, there have been big changes in agricultural output. ese changes have made it possible to use less resources and carry out less work.
Despite this, demand and supply have never been able to meet due to the high population density. In 2050, the world's population is expected to grow to 9.8 billion people, a 25 percent increase over the current amount [1].
ere is a strong likelihood that the majority of population growth will occur in developing countries [2].
Despite this, 70 percent of the world's population is anticipated to live in cities by 2050, up from 49 percent now [3]. In addition, as incomes rise, so will the need for food, particularly in developing countries. As a result, these countries will become more aware of the quality of their food and diet. As a result, consumers' tastes may move away from grains and cereals toward legumes and, eventually, meat.
In agriculture, water is a valuable yet finite natural resource [4][5][6]. In a country like India, a considerable amount of water is used for irrigation [7]. Many environmental conditions influence crop productivity, including air temperature, soil temperature, and humidity [8]. Crop irrigation is an important component in affecting crop output [8]. Farmers rely substantially on human supervision and experience for harvesting fields [9]. e water supply for the field must be preserved [10]. In today's globe, water scarcity is a big issue. It is already a problem for people all across the world [11,12]. e scenario could get even worse in the coming years.
Smart farming is a term that refers to a well-known and better way to run a farm that has become more common in modern farming. Agricultural and information technologies are used to keep an eye on the health and production of crops. is includes keeping an eye on field crop conditions and other indicators [13,14]. Finally, the goal of smart farming is to cut the cost of agricultural inputs while still keeping the quality of the end product the same. If you use a lot of pesticide or fertilizer at the same time, the whole field is treated as a single unit. e principal sources of natural water resources are rainwater, subterranean water, and surface water. 96.5 percent of the world's water is found in the oceans. Only 0.001 percent of the remaining water on the globe can be found in clouds, mist, and precipitation, which is 1.7 percent of the world's total water supply. e majority of the world's surface water is found in the ocean, which is made up of salt water. erefore, there is a shortage of fresh water in most of the countries around the world. e survival of all ecosystems depends on the availability of fresh water. According to the World Resources Institute (WRI), most of the countries are expected to face water scarcity in upcoming years [15].
ere is a considerable impact on downstream ecosystems due to the vast majority of freshwater being used for agricultural and industrial uses. ere is a need to utilize fresh water in such a way that the upcoming generations do not get affected by the scarcity of fresh water. e soil contains a variety of soil types, including sandy, salty, and clayey soils. Each type of soil has its own advantages and disadvantages. A good example of this is sandy soil, which has a high capacity for drainage. Drainage, on the other hand, swiftly removes nutrients from the soil. e soil's properties are crucial to determining how much water a plant needs [16].
Agricultural jobs, in particular, can benefit greatly from data mining approaches. One of these operations is the use of association rule restrictions to regulate water use in agricultural areas. In addition, the Internet of ings has made smart farming possible by various data acquisition and storage techniques. Field values for optimal plant irrigation are collected by smart sensor networks in modern irrigation systems. Machine learning is used in a wide range of realworld applications such as smart farming, smart healthcare, smart logistics, and smart production. In the framework [17], data are acquired using soil and moisture sensors, and then they are stored on a centralized cloud server. On a cloud server, various analytics are performed using machine learning algorithms.
is framework provides the exact water quantity required for a particular crop. is framework is shown in Figure 1.
An IoT-enabled global smart city concept is possible. ere are many different types of intelligent communities, such as smart homes, smart farms, smart environments, smart fitness, and smart governance, for example. Additionally, the Internet of ings is employed in the oil and gas extraction, manufacturing, and refining industries. e Internet of ings (IoT) improves efficiency, optimizes costs, maximizes energy utilization, maintains forecasts, and provides a great deal of convenience for the general public. Security concerns are increasing as more and more systems and data processing become more diversified. Security and privacy concerns are the primary impediments to the growth of the Internet of ings.
Computer security is the act of securing computer systems from external threats in order to ensure the confidentiality, integrity, and availability of computing resources. When an intrusion occurs, the network resources and the victim server are put in danger [18]. System administrators can take action if intrusions are detected by the intrusion detection system (IDS) because it monitors and reports on intrusions. As the number of cyberattacks has risen, so has people's mistrust of the Internet. A denial of service (DoS) is a well-executed security attack (DoS).
It is possible to use IDS to detect attacks from the outside as well as from within a company's computer network. It looks like a burglar alarm, but an intrusion detection system is different. It is proposed in this article a framework for detecting and categorizing intrusions into Internet of ings networks that are utilized in agriculture. Security and privacy are important considerations not only in IoT networks linked to agriculture but also in all Internet of ings applications in general. As an input data set, the NSL KDD data set is used in this technique and is available online [19]. e NSL-KDD data collection is preprocessed by first translating all symbolic features to numeric features and then transforming all numeric features to symbolic features. In order to extract features, principal component analysis is utilized. After that, the preprocessed data set is categorized using machine learning methods such as support vector machine, linear regression, and random forest to determine its classification. When comparing the performance of machine learning algorithms, the accuracy, precision, and recall measurements are taken into consideration.

Review of Existing Irrigation Systems.
A wide range of industries, including the military and agriculture, rely on wireless sensor networks (WSNs). A lot of research is going on the power consumption of wireless sensor network. Metaheuristics and WSN life spans were also examined in their research. Using metaheuristic algorithms, they propose a transition, evaluation, and determination procedure that helps to find the optimal solution to the problem.
It is clear from their explanation that the metaheuristic method requires a thorough comprehension of the many field characteristics and domain expertise associated with the longevity problem. Metaheuristic algorithms have already been studied by a large number of academics. Even though these strategies increase the WSN's effectiveness, there are still a number of issues that need to be addressed. Contingency planning may lead to a reduction of sensors or cluster heads. Furthermore, metaheuristic approaches are meant for optimization and are not suitable to solve scalable problems.
According to Li et al. [20], the nodes should be as small as possible to enhance connection and coverage. e proposed method is compared to other existing methods to perform the similar task. e efficacy of the suggested strategy was proven in four unique instances using two wellknown ways.
WSN nodes are used in mobile network topologies when a significant level of communication frequency is required. Hassan and colleagues explored the use of frequencies for industrial, scientific, and medical (ISM) objectives [21]. e premise of their reasoning is frequency reuse at sensor nodes along busy pathways. In order to spread the load across the macro cells, we employed the CR approach. In turn, this results in a reduced crash area and greater space for passengers in their suggested design. Using an underground sensor network, soil humidity and temperature may be monitored in real time.
us, sensors and power adjustments for input setup and regular energy maintenance are appropriately installed and maintained [22].
Verma et al. [23] investigated the use of wireless sensors for irrigation scheduling. Using a sensor array and sensor network-based accuracy technologies, they are able to predict real-time watering demands in the soil and so save unnecessary watering. For irrigation systems to function more efficiently, data transmission demands improved energy savings. Multiple access time division (MATD) is claimed by Ushakov et al. [24] to be more efficient for data transfer in WSN irrigation systems (TDMA). Energy is saved by immediate transmission and the addition of data. TDMA also enhances the performance of the network.
Goumopoulos et al. [25] developed an ontological technique for a wireless sensor or actuator-based autonomous zone irrigation system. In precision agriculture, plants communicate with one other in order to preserve water. Detecting node problems in the network is the primary goal of this system, which employs many machine learning approaches. In the past, irrigation activities were made more automated and user-friendly by creating enduser apps.
Cotton crops can be irrigated more efficiently according to a system devised by Rakhra et al. [26]. Researchers have used a wide range of data sets to predict the exact amount of water decided for a particular combination of soil. e output of the analytical study is made available to the farmers or other users by a mobile application.
Lopez et al. [27] investigated the agricultural uses of wireless sensor networks. Time-dependent agriculture industry characteristics were discussed extensively in their debate. As a consequence, they compiled a list of sensors that may be used to monitor agricultural factors. In the end, they looked at a variety of communication networks and made comparisons.
According to [28,29],field productivity can be improved by sending out notifications in the form of text messages or emails when certain variables have been measured over a threshold. Some sensors, such as those used to assess agricultural fields' statistical features, were thoroughly examined, as were the products they are linked to and the specifications attached to them. e model had sensors for  Journal of Food Quality 3 soil water content, soil moisture content, soil electrical conductivity, pH, temperature, and wind speed. Dou et al. [30] designed an ecofriendly framework to implement the basic functions of smart agriculture including the optimization of farming resources, decision support, and land surveillance. Water and fertilizer consumption may be maximized while crop yields can be improved at the same time thanks to a new technique they have devised.
Deepika and Rajapirian [31] have created a wireless sensor network prototype for precision agriculture that makes use of a constrained power source. For their model, they employed sand soil with varying amounts of water in order to demonstrate the impacts of off-the-shelf technology.
According to Deepika and Rajapirian [32], a research study on the current status of precision agriculture's wireless sensor networks assessed some of the more cutting-edge possibilities. In order to monitor a plant, an FPGA-based control system is employed. Imam et al. [33] examined various issues and challenges related to wireless sensor networks and sensors for IoT-based agriculture in order to optimize farmer advantages. In precision agriculture, microcontroller families and sensor nodes were compared. Researchers also detailed the demands of relative humidity sensors and their interfaces.

Review of Existing Intrusion Detection Methods.
Computational intelligence is a new generation of information systems that are being built with the help of soft computing [34]. With a soft computing system, you can build intelligent machines that can solve complicated realworld issues that cannot be mathematically modeled because they are too hard to model. In order to attain a likeness to human decision-making, it uses tolerance for approximation, ambiguity, imprecision, and partial reality [35]. Soft computing methods for intrusion detection are summarized in this section. e research is broken into four sections: fuzzy logic, neural networks, evolutionary algorithms, and artificial immune systems. Intrusion detection has yet to benefit from the use of coupled map lattices.
Since its inception in Holland, the genetic algorithm (GA) has been shown to be a flexible and effective search engine. Evolution in the wild is simulated using computer technology. As a stochastic global search process, the GA is based on the survival of the fittest principle and creates evercloser approximations to a solution.
New solutions are generated every generation by selecting people based on their performance in the issue area and spawning children. Individuals that are better suited to the problem area than their predecessors can be formed by using this method of recruiting new employees [36]. e fitness function provides an indication of how people did in the issue area.
As a result of the flocking and schooling behavior of birds and fish [37], they created PSO in 1995 [38]. Artificial life, psychology, physics, and computer science all played a role in the development of PSO in some capacity. A "population" of particles moves around the problem region at specific rates in order to solve it. Particle velocities are adjusted via stochastic calibration based on their prior best position and their best nearby position.
Both the particle best and the neighborhood best are calculated using a user-defined fitness function. As each particle moves, it naturally results in a near-ideal solution or a near-optimum solution. "Swarm" is used to describe the movement of particles in the issue room, rather than a flock of birds or school of fish.
As an approximation logic known as fuzzy logic (FL), two-valued logic is expanded to include operations on fuzzy sets, such as equalization, enclosure, complementation, intersection, and union, in the context of fuzzy logic. Machine learning optimization and classification paradigms based on evolutionary processes like genetics and natural selection are used in evolutionary computing. In the field of evolutionary computing, concepts such as genetic algorithms, evolutionary programming, genetic programming, and evolution techniques are all included. Genetic algorithms are the most frequently used in applications [39]. e hidden naive Bayes classifier (HNB) is more adaptable than the classic naive Bayesian classifier. In the HNB model, the hidden parent of each attribute is specified by adding a new layer. e structural properties of HNB can be inferred using naive Bayes. Hidden parents are created for each trait so that the forces of its other characteristics can be brought together. e average of weighted one-dependence estimators [40] is used to describe hidden parents. e support vector machine (SVM) is a classification method based on statistical learning theory (SLT). Hyperplane classifiers are another example of this. Using SVM, an ideal hyperplane maximizes the difference between two groups while minimizing any overlap.
ere are several layers of latent variables (hidden units) in DBNs in machine learning, and the relations between the levels but not the units inside each layer are generative graph models or deep neural networks.
A model created by researchers in [42] can be used to select the features of an intrusion detection system. Using genetic algorithms and PSO, the data set has an accuracy percentage of 91.75%. e main components of this framework are IDS data set, data preprocessing algorithm, machine learning algorithms, and prediction module.

Methodology
In preprocessing of the NSL-KDD data set, firstly all symbolic features are converted to numeric features. e target class is also allocated unique numbers. Continuous numeric features such as duration (duration of connection) and SRC bytes (data bytes) present in the dataset are normalized using z-score normalization. Feature extraction is carried out using the principle component analysis (PCA) approach in this article. Data analysis and compression may benefit from PCA's linear approach to dimensionality reduction [30]. It is based on finding orthogonal linear combinations of the original characteristics with the greatest variance in order to convert a large number of uncorrelated features.

Support Vector Machine Approach.
When it comes to handling two-class categorization problems, a support vector machine (SVM) is the most common method. It is possible to perform classification and regression with SVM in addition to other uses. In addition, SVM uses the kernel phenomenon to alter the data before determining an appropriate boundary between the most likely outcomes. In addition, the decision line between the two classes on a graph must be wide enough to be discernible. SVM creates an ideal boundary that divides the new data point into the correct categories. For that reason, the hyperplane is often referred to as the ideal boundary.

Logistic Regression.
Logistic regression is the technique used to link a dependent variable to one or more independent variables. In some contexts, the dependent variable and the independent variable are referred to as predictors and predictors, respectively. Temperature and humidity differential, soil moisture, and pH rate are variables independent of plant type prediction (c). e following formula has been established: 3.3. Random Forest. All of the predictors in a random forest are built from random variables with the same distribution across all the trees in the forest. e forest's generalization error decreases as time passes. In the woods, there is a record number of trees.
e generalization error of a tree classifier forest depends on the relative strength of the individual forest trees and their comparison. As compared to AdaBoost, using a random selection of features to split each node results in error rates that are more stable. Variance, frequency, and consistency of internal measurements are used to indicate the reaction to an increase in the number of attributes that are split. e parameter importance estimate is also based on external measurements. Regression is based on the same principles.

Results and Analysis
In fact, the NSLKDD dataset is used for anomaly detection.
is enhanced version of the KDDCUP99 dataset has no redundancy, no duplication of records, and a lower complexity level than NSLKDD [43]. Twenty percent of the NSLKDD dataset is training data (25192 records), and the remaining eighty percent is testing data (100781 records). In this paper, we use only 20% of the training data to generate decision rules. In the experimental analysis, three classification algorithms, namely, SVM-support vector machine, logistic regression, and random forest classifiers, are used. To calculate accuracy the following formulae were used: where TP represents true positive, TN represents true negative, FP represents false positive, and FN represents false negative.

NSL KDD Data Set Data Preprocessing
Classification of Preprocessed Data set by Machine Learning Algorithms-Support Vector Machine, Logistic Regression, Random Forest Journal of Food Quality e results proved that the accuracy of the SVM classifier is better than that of the random forest and logic regression algorithms. It is shown in Table 1 and Figure 3. Also, Figure 4 shows a graphical representation of the accuracy results of machine learning algorithms.

Intrusion Detection and Prediction
Precision and recall parameters are also used to measure the performance of machine learning algorithms. Precision and recall of SVM, random forest, and logistic regression for intrusion detection of agriculture fields are shown below in Figures 5 and 6.
In this graph, SVM exceeds random forest and logistic regression in terms of machine learning algorithm accuracy. e accuracy of SVM is higher than 98%, but random forest and logistic regression accuracy is less than 78%.

Conclusion
Agriculture is crucial to the country's economic well-being since farmers produce food for everyone. It connects a varied spectrum of enterprises around the country. A country with a sizable agricultural sector is considered wealthy economically and socially. Agriculture is critical in the majority of countries as a source of employment. Irrigation accounts for a sizable portion of overall water use in a country like India. Among the several elements affecting crop productivity are the surrounding environment's temperature, soil temperature, and relative humidity. Agricultural irrigation is crucial in crop production since it has a direct effect on crop yield. Harvesting fields successfully is highly dependent on human supervision and experience. It is vital to safeguard the field's water supply at all costs. Water scarcity is a significant issue in contemporary civilization. e issue is global in scope, affecting individuals on a daily basis. As a result, we are concerned about the possibility of the situation deteriorating worse in the next few years. To address the issues highlighted above, smart irrigation and precision   farming are the answer. Smart irrigation and precision agriculture are only possible with the development of the Internet of things and machine learning. Numerous benefits accrue from the Internet of ings, including enhanced efficiency, cost optimization, optimal energy consumption, forecasting, and convenience for the general public (IoT). e diversity of data processing technologies and methodologies raises issues regarding their dependability and security. Concerns about security and privacy are impeding mainstream adoption of the Internet of ings. is article proposes a system for detecting and categorizing intrusions into IoT networks in agricultural regions. SVM has a precision of greater than 98 per cent; however, random forest and logistic regression have a precision of less than 78 percent.
Data Availability e data supporting this research article are available from the corresponding author on reasonable request.

Conflicts of Interest
e authors declare that they have no conflicts of interest.