Probabilistic Prediction of Unsafe Event in Air Traffic Control Department Based on the Improved Backpropagation Neural Network

Air traffic control is an important tool for ensuring the safety of civil aviation. For the departments that perform air traffic control, reducing the percentage of unsafe events is the core task of safety management. If the relationship between the percentage of unsafe events and their influencing factors can be clarified, then the probability of an unsafe event in a given control department can be predicted, which is of great importance for improving the level of safety management. To quantitatively estimate the probability of unsafe events, a three-layer BP neural network model is introduced in this paper. First, a probabilistic representation of the events related to an air traffic control department is given. Then, the probabilities of the different classes of unsafe events and of the safe event are taken as the outputs of the BP neural network, the factors influencing the occurrence of unsafe events are taken as the inputs, and the sigmoid function is chosen as the activation function of the hidden layer. Based on the error function of the neural network, it is proved that the general BP neural network has two drawbacks when trained on small-probability events: it does not ensure that the probabilities of all events sum to one, and the relative error between the actual outputs and the desired outputs remains very large after training. The reason, proved in this paper, is that the occurrence rate of unsafe events is much smaller than that of the safe event, so that each weight in the hidden layer is dominated by the desired output of the safe event when the network is trained by gradient descent. To address this issue, a new mapping method is put forward to reduce the large difference between the desired outputs of the safe event and the unsafe events. It is theoretically proved that the proposed mapping method not only improves the training accuracy but also ensures that the probabilities sum to one. Finally, a numerical example is given to demonstrate that the proposed method is effective and feasible.


Introduction
China has a large population, a vast geographical area, and an uneven distribution of natural resources. To promote economic development and improve people's lives, people and goods are exchanged among different regions very frequently. Transportation is the means of realizing this exchange, and different transportation modes have different characteristics. Because China's population is large and unevenly distributed, transportation demands are diverse. To meet these demands, the Chinese government has been committed to building a diversified comprehensive transportation system, creating a comprehensive transportation network that integrates railroads, highways, waterways, and civil aviation [1-5].
With the popularization of China's civil aviation from high-end passengers to the general public, civil aviation is becoming more and more important in China's comprehensive transportation system. In 2019, China's civil aviation completed a total of 129.27 billion ton-kms of freight turnover, 660 million person-times of passengers, and 752.6 million tons of cargo and mail, with year-on-year growth rates of 7.1%, 7.9%, and 1.9%, respectively. With its advantages of safety, speed, and convenience, civil aviation is being chosen by more and more people, and its share of passenger transportation reached 32.8%, with a year-on-year growth rate of 1.5%. The number of flights is rising steadily, and Chinese transport airlines completed 4,966,200 takeoffs, an increase of 5.8% over the previous year [6].
Safety is the primary concern for every mode of transportation. With the increase in the number of flights, air routes and airports are becoming more and more congested, and controllers' workload is constantly increasing, which brings serious challenges to the safety of civil aviation. Air traffic management is an important part of ensuring the safety of civil aviation, and the prerequisite for safe air traffic management is the safety of air traffic control (ATC), which makes the operation of aircraft efficient, orderly, and safe. ATC is a service provided by ground-based controllers who direct aircraft on the ground and through controlled airspace and who provide advisory services to aircraft in noncontrolled airspace [7]. In ATC departments, it is becoming more and more difficult for controllers to direct aircraft, and the increase in flight volume leads to a growing possibility of unsafe events and a growing operational risk for the ATC department.
Safe operation has always been the goal pursued by civil aviation and is also a focus of the whole society [8]. There are many factors that influence safety in the ATC department. Air traffic controllers and pilots are crucial in achieving high levels of safety in air traffic operations [9]. The increasing volume of flights and the expanding route network have led to a dramatic increase in the workload of air traffic management, and the gradual increase in the number of ATC unsafe events has had a serious impact on the development of China's civil aviation [10]. Air traffic management systems are typically highly interactive, highly distributed, and complex systems [11]. Changing objective conditions lead to uncertainty, and the issue of ATC safety comes along with various uncertainties in the process of civil aviation transportation [10]. The increase in flights has led to a dramatic increase in demand for airspace, so that various unsafe factors also increase [12]. The contradiction between the capacity of air transport services and the demand for transport is becoming more and more prominent, and with it come safety risks in air traffic control operations [13]. Limited airspace, manpower, and equipment resources have increased the pressure on air traffic controllers, and the probability of unsafe events in air transport has increased [14]. Due to the dynamic and real-time nature of the ATC operation process, its risk level is higher than that of other systems, and control factors are more likely to lead to an unsafe event [15]. The risk factors of organization and management in the air traffic control system have a complex influence on flight safety [16].
To improve the safety level of air traffic control, a variety of approaches have been taken. A safety integrity system is of great significance for modern management, and it is helpful to establish such a system for ATC [17]. Global networks of satellites for communications, navigation, and surveillance are longer-term solutions to air traffic management [18]. Automation removes some existing sources of human error and can prevent some accidents [19]. The construction of a safety culture in air traffic control can protect air traffic controllers [20]. Among the many methods, safety assessment for air traffic control is the most commonly used [19,21,22]. The Chinese government attaches great importance to civil aviation safety. According to the "Rules for Safety Management of Air Traffic Management Operation Department of Civil Aviation" published by the Civil Aviation Administration of China, the air traffic control operation departments of civil aviation shall establish a safety assessment mechanism. However, this document does not put forward a specific implementation plan or evaluation method. The reason is that there is no universal risk assessment methodology that can be applied to every situation. So, in the "Regulations for Safety Assessment of Air Traffic Management of Civil Aviation" published by the Civil Aviation Administration of China, the Administration encourages and supports research and innovation on the methods and techniques of safety assessment to make safety assessment scientific and normative. This document also points out that a safety assessment should choose reasonable methods based on the conditions, characteristics, and needs of the actual circumstances. Therefore, safety assessment in air traffic control has always been an important research topic in civil aviation, and many experts and scholars have made outstanding contributions in this field.
For example, Wang and Yao [23] presented a fuzzy Petri net method to assess the risk of air traffic control. Wan and Zhang [24] dealt with the certainty and uncertainty of the assessment system as a whole and established a risk assessment model based on game theory and set pair analysis (SPA). Yuan et al. [25] argued that there are many uncertainties in air traffic control safety evaluation, such as randomness, imprecision, and ambiguity. They introduced the Dempster combination rule to address them and proposed an ATC safety assessment method based on evidence theory. Yao et al. [26] adopted a fuzzy Petri net and introduced a risk level threshold and the analytic hierarchy process to reduce the complexity of the fuzzy Petri net used for ATC safety assessment. Wang and Sun [27] used system theory process analysis (STPA) to identify potential unsafe behaviors of the ATC operation system and then used first-order linear temporal logic (LTL) to normalize the identified unsafe behaviors. Finally, they proposed a safety assessment method for the unsafe behaviors. Liao et al. [28], from the perspective of probability theory, proposed a safety probability evaluation method for air traffic control based on Bayesian analysis.
Through a comprehensive analysis of these studies, it can be found that most current research on unsafe events in the ATC department is focused on safety assessment or risk assessment. ATC safety assessment is mostly based on comprehensive evaluation, that is, taking ATC safety as a whole as the research object, establishing the corresponding evaluation indices, setting the weight of each index, scoring each index, and, given a set of safety levels, judging which level the ATC safety status belongs to by combining the weights and scores of the indices. At present, most ATC safety assessment methods are qualitative assessments of the safety level of the air traffic control department, and there are few quantitative assessments of the probability of unsafe events for a given air traffic control department. In the field of ground transportation, however, quantitative methods have been applied very extensively. For example, Lin et al. applied a hybrid deep learning model and generative adversarial networks to traffic incident detection [29,30]. In ground traffic incident detection, the use of various quantitative research methods is very common [31-33]. For the departments that perform air traffic control, reducing the percentage of unsafe events is the core task of safety management. If the relationship between the percentage of unsafe events and their influencing factors can be clarified, then the probability of an unsafe event in a given control department can be predicted. In order to quantitatively estimate the probability of unsafe events, a three-layer backpropagation neural network model is introduced in this paper. Because the training accuracy of the BP neural network is poor when the outputs are small-probability events, this paper introduces a corresponding model to improve it.

Artificial Neural Networks.
An artificial neural network (ANN) is modeled on the brain, where neurons are connected in complex patterns to process data from the senses, establish memories, and control the body. ANNs process data and exhibit some intelligence. An ANN is the part of a computing system designed to simulate the way the human brain analyzes and processes information [34]. It is the foundation of artificial intelligence (AI) and solves problems that would prove impossible or difficult by human or statistical standards. ANNs have self-learning capabilities that enable them to produce better results as more data become available. Warren McCulloch and Walter Pitts presented the first simple systems, which are the origins of ANNs, in the 1940s. They proved that an ANN can learn any arithmetic or logical function [35]. ANNs have been widely used in various industries and have achieved excellent results [36,37].

General Backpropagation Neural Network.
There are two types of artificial neural networks: shallow neural networks and deep neural networks. A shallow neural network has only one hidden layer of neurons that processes inputs and generates outputs. A deep neural network has two or more hidden layers of neurons that process inputs. According to experts [38], shallow neural networks can tackle equally complex problems. So, we use a shallow neural network to predict the probability of unsafe events in the air traffic control department.
Backpropagation is an algorithm that backpropagates the errors from the output nodes to the input nodes. It is the essence of neural net training. It is the method of fine-tuning the weights of a neural network based on the error rate obtained in the previous iteration. It is a standard method of training artificial neural networks.
This method calculates the gradient of a loss function with respect to all the weights in the network. The BP neural network is described as follows [39-41]. Consider a shallow neural network with three layers of neurons that process inputs and generate outputs. The network has $M$ inputs in the input layer, $N$ outputs in the output layer, and $K$ neurons (or nodes) in the hidden layer. The input vector of the input layer is $X = (x_1, \ldots, x_m, \ldots, x_M)$, the input vector of the hidden layer is $HI = (hi_1, \ldots, hi_k, \ldots, hi_K)$, the output vector of the hidden layer is $HO = (ho_1, \ldots, ho_k, \ldots, ho_K)$, the input vector of the output layer is $YI = (yi_1, \ldots, yi_n, \ldots, yi_N)$, the output vector of the output layer is $YO = (yo_1, \ldots, yo_n, \ldots, yo_N)$, and the desired output vector is $t = (t_1, \ldots, t_n, \ldots, t_N)$. The connection weight from the $m$th node in the input layer to the $k$th node in the hidden layer is $w_{mk}$, and the connection weight from the $k$th node in the hidden layer to the $n$th node in the output layer is $w_{kn}$. The bias of each neuron in the hidden layer is $b_k$, the bias of each neuron in the output layer is $b_n$, the activation function of the hidden layer is $f_1(\cdot)$, and the activation function of the output layer is $f_2(\cdot)$.
The BP neural network is trained with the backpropagation algorithm, which is commonly used in the training of artificial neural networks. The basic idea of the algorithm is that learning consists of two processes: the forward propagation of the signal and the backward propagation of the error. In forward propagation, the input signals pass from the input layer to the output layer after being processed in the hidden layer. If the actual outputs do not match the desired outputs, the process turns to the backpropagation of the error. The model of the BP neural network with one hidden layer is shown in Figure 1. The input of the $k$th neuron in the hidden layer is as follows:
$$hi_k = \sum_{m=1}^{M} w_{mk} x_m + b_k \tag{1}$$
The output of the $k$th neuron in the hidden layer is as follows:
$$ho_k = f_1(hi_k) \tag{2}$$
The input of the $n$th neuron in the output layer is as follows:
$$yi_n = \sum_{k=1}^{K} w_{kn} ho_k + b_n \tag{3}$$
The output of the $n$th neuron in the output layer is as follows:
$$yo_n = f_2(yi_n) \tag{4}$$
The loss function $E$ between the network outputs and the desired outputs is as follows:
$$E = \frac{1}{2} \sum_{n=1}^{N} (t_n - yo_n)^2 \tag{5}$$

Mathematical Problems in Engineering
In each training step, the change of the weights in each layer is obtained by the gradient descent algorithm as follows:
$$\Delta w_{kn} = \eta f_2'(yi_n)(t_n - yo_n)\, ho_k \tag{6}$$
$$\Delta w_{mk} = \eta \left[ \sum_{n=1}^{N} f_2'(yi_n)(t_n - yo_n)\, w_{kn} \right] f_1'(hi_k)\, x_m \tag{7}$$
where $\Delta w_{mk}$ represents the change of the weight from the $m$th node in the input layer to the $k$th node in the hidden layer, $\Delta w_{kn}$ represents the change of the weight from the $k$th node in the hidden layer to the $n$th node in the output layer, and $\eta$ represents the learning rate. In each training step, the change of the biases obtained by the gradient descent algorithm is as follows:
$$\Delta b_n = \eta f_2'(yi_n)(t_n - yo_n) \tag{8}$$
$$\Delta b_k = \eta \left[ \sum_{n=1}^{N} f_2'(yi_n)(t_n - yo_n)\, w_{kn} \right] f_1'(hi_k) \tag{9}$$
where $\Delta b_n$ represents the change of the $n$th bias in the output layer and $\Delta b_k$ represents the change of the $k$th bias in the hidden layer.
The BP neural network is widely used in all walks of life because of its strong adaptability, including its nonlinear mapping ability, self-learning ability, adaptive ability, generalization ability, and fault tolerance [39,42,43]. At the same time, many scholars have improved the BP neural network to address its shortcomings, which improves the accuracy of the model [44-46]. This paper introduces the BP neural network into the probabilistic prediction of unsafe events in the air traffic control department. To address the insufficient prediction accuracy of the general BP neural network, this paper proposes an improved method.
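The forward pass and the gradient-descent updates described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the choice of `tanh` for $f_1$, the logistic function for $f_2$, the layer sizes, the learning rate, and the single made-up training sample are all assumptions.

```python
import math
import random

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_mk, b_k, w_kn, b_n):
    """Forward pass; returns hidden outputs ho and network outputs yo."""
    M, K, N = len(x), len(b_k), len(b_n)
    hi = [sum(w_mk[m][k] * x[m] for m in range(M)) + b_k[k] for k in range(K)]
    ho = [math.tanh(v) for v in hi]   # f1: tanh, an assumed symmetric sigmoid
    yi = [sum(w_kn[k][n] * ho[k] for k in range(K)) + b_n[n] for n in range(N)]
    yo = [logistic(v) for v in yi]    # f2: logistic sigmoid
    return ho, yo

def loss(t, yo):
    """Half the sum of squared errors between targets and outputs."""
    return 0.5 * sum((tn - yn) ** 2 for tn, yn in zip(t, yo))

def train_step(x, t, w_mk, b_k, w_kn, b_n, eta=0.5):
    """One in-place gradient-descent update on a single sample."""
    M, K, N = len(x), len(b_k), len(b_n)
    ho, yo = forward(x, w_mk, b_k, w_kn, b_n)
    # Output-layer term f2'(yi_n) * (t_n - yo_n); for the logistic, f2' = yo * (1 - yo).
    d_out = [yo[n] * (1.0 - yo[n]) * (t[n] - yo[n]) for n in range(N)]
    # Hidden-layer term: back-propagated error times f1'(hi_k) = 1 - ho_k^2 for tanh.
    # It is computed before w_kn is modified.
    d_hid = [(1.0 - ho[k] ** 2) * sum(d_out[n] * w_kn[k][n] for n in range(N))
             for k in range(K)]
    for k in range(K):
        for n in range(N):
            w_kn[k][n] += eta * d_out[n] * ho[k]
    for n in range(N):
        b_n[n] += eta * d_out[n]
    for m in range(M):
        for k in range(K):
            w_mk[m][k] += eta * d_hid[k] * x[m]
    for k in range(K):
        b_k[k] += eta * d_hid[k]

random.seed(0)
M, K, N = 4, 3, 2                                  # assumed layer sizes
w_mk = [[random.uniform(-0.5, 0.5) for _ in range(K)] for _ in range(M)]
w_kn = [[random.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(K)]
b_k, b_n = [0.0] * K, [0.0] * N
x, t = [0.2, 0.1, 0.4, 0.3], [0.1, 0.9]            # made-up sample and targets

loss_before = loss(t, forward(x, w_mk, b_k, w_kn, b_n)[1])
for _ in range(200):
    train_step(x, t, w_mk, b_k, w_kn, b_n)
loss_after = loss(t, forward(x, w_mk, b_k, w_kn, b_n)[1])
print(loss_before, loss_after)
```

Updating after every sample, as here, is one common variant; batch updates that accumulate the changes over all samples would be an equally reasonable reading of the text.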

Modeling Based on the Backpropagation Neural Network.
Three problems need to be solved in calculating the probability of ATC unsafe events with the BP neural network: determining the inputs, the outputs, and the activation functions.

Probabilistic Representation of the ATC Event.
According to the air traffic management rules of civil aviation in China, ATC unsafe events can be classified into five levels with respect to their severity: accident, serious incident, general incident, serious error, and general error. Each class of unsafe event is strictly defined in these rules.
According to probability theory, this article defines the elementary events as follows. The elementary event $e_0$ is defined as the safe event, $e_1$ represents the accident event, $e_2$ the serious incident event, $e_3$ the general incident event, $e_4$ the serious error event, and $e_5$ the general error event. The sample space composed of the elementary events is $\Omega = \{e_0, e_1, e_2, \ldots, e_5\}$. $\Psi$ represents the set of all subsets of $\Omega$; it has a total of $2^6$ elements, $\Psi$ is a $\sigma$-algebra, and each subset $A$ in $\Psi$ is an event. The probability of $A$ is $P(A)$, so the probability space of the ATC events can be expressed as $(\Omega, \Psi, P)$. Defining the single-valued real function $X(e_i) = i$ (where $i$ is an integer from 0 to 5 standing for the different ATC events) on the probability space $(\Omega, \Psi, P)$, $X$ is the random variable describing the ATC event. From its values, it can be seen that $X$ is a discrete random variable, and the probabilities of its different values can be expressed by its distribution law. In the set of elementary events defined in this paper, there is a special elementary event $e_0$, which represents the safe event. One important property of probability is normalization, which means that the probabilities of all elementary events sum to one. If the safe event $e_0$ is not considered, normalization will not be satisfied, which will harm the accuracy of the prediction.

Outputs of the Backpropagation Neural Network.
The outputs are usually the data that the model builder cares about. From the preceding analysis, we are concerned with the probability of an unsafe event occurring in an ATC department. Therefore, the probabilities of the unsafe events should be used as outputs of the network. In addition, as mentioned previously, ensuring the normalization of probability can improve the prediction accuracy, so the probability of the safe event is also included in the outputs. Therefore, using the set of elementary events $\{e_0, e_1, e_2, \ldots, e_5\}$ as the outputs of the network is the most straightforward choice. Of course, the outputs can also be adjusted according to the research purpose. For example, if we only want to know the overall probability of all unsafe events and do not care which class of unsafe event occurs, we can define the event $A = \{e_1, e_2, \ldots, e_5\}$. The outputs of the network are then the probability of the event $A$ and the probability of the safe event $e_0$. That is to say, depending on the actual needs of the research problem, the outputs of the network can be subsets of the set $\Omega$. It should be noted that, in order to satisfy the normalization of probability, the network's output events must form a partition of the set $\Omega$.
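As a small illustration of the partition requirement, the sketch below turns event counts into a distribution over the six elementary events and then coarsens it to the two outputs "safe" and "any unsafe event". The counts are invented for illustration, and the helper names `distribution` and `merge` are hypothetical; exact fractions are used so that normalization can be checked without rounding error.

```python
from fractions import Fraction

def distribution(counts):
    """Relative frequencies of the elementary events (exact fractions)."""
    total = sum(counts.values())
    return {event: Fraction(c, total) for event, c in counts.items()}

def merge(dist, groups):
    """Coarsen a distribution; groups maps a new event name to its elementary events.

    The groups must form a partition of the elementary events, otherwise the
    merged probabilities would not sum to one.
    """
    members = [e for group in groups.values() for e in group]
    if sorted(members) != sorted(dist):
        raise ValueError("groups must form a partition of the elementary events")
    return {name: sum((dist[e] for e in group), Fraction(0))
            for name, group in groups.items()}

# Invented counts for one department; e0 is the safe event, e1..e5 the unsafe classes.
counts = {"e0": 99990, "e1": 0, "e2": 1, "e3": 2, "e4": 3, "e5": 4}
full = distribution(counts)
coarse = merge(full, {"e0": ["e0"], "A": ["e1", "e2", "e3", "e4", "e5"]})

print(sum(full.values()), sum(coarse.values()))
```

Leaving an elementary event out of `groups`, or listing one twice, raises an error rather than silently producing outputs that violate normalization.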

Inputs of the Backpropagation Neural Network.
The inputs of the BP neural network are usually influencing factors that affect the safety of air traffic control, including the technical level of the controllers, the number of conflict points in the controlled airspace, and so on. The essence of determining the inputs is to establish the indicator system that affects the safety of the ATC department. The establishment of the input indicator system is relatively flexible, but it usually needs to meet the following principles: the indicators are quantifiable, the number of indicators is moderate, and the data are convenient to collect. There are many factors affecting the safety of the ATC department, and the relationships among them are intricate and complex, so it is very difficult to find one set of influencing factors applicable to all ATC departments. In the safety management of ATC, the analysis method most commonly used and most widely recognized by experts is the SHEL model recommended by ICAO [23,47], as described in Doc 9859, the Safety Management Manual. In this method, the factors affecting ATC safety are divided into four elements: hardware, liveware, environment, and management. The SHEL model provides a holistic overview of the factors affecting the ATC department. Therefore, in this paper, the hardware, liveware, environment, and management elements of the SHEL model are used as the inputs of the neural network. The next question is how to quantify these four inputs. There is no doubt that unsafe human acts, equipment failures, management loopholes, and objective problems in the environment may all lead to unsafe events. So, the number of unsafe human acts in the ATC department can be used as the input data for liveware, the number of equipment failures can be used as the input data for hardware, the number of management problems can be used as the input data for management, and the number of environmental problems can be used as the input data for environment. Of course, these four inputs can be subdivided further according to the needs of the research. For example, the environment can be divided into the number of hazardous weather events and the volume of flights. In short, the inputs of the neural network can be subdivided from these four indicators in light of the practical problem.

Activation Function of the Backpropagation Neural Network.
The BP neural network used in this paper has a three-layer structure, which needs two activation functions. Under normal circumstances, the choice of activation function is determined by the actual problem. Probabilistic prediction has a particular requirement: the outputs of the output layer must lie in the interval $(0, 1)$. Therefore, the output layer uses the asymmetric sigmoid function as its activation function, whose range is also the interval $(0, 1)$:
$$f_2(x) = \frac{1}{1 + e^{-x}} \tag{10}$$
The domain of the input to the output layer may contain negative numbers. If the hidden layer also used the asymmetric sigmoid function, its outputs, which serve as the inputs of the output layer, would not cover this domain. Therefore, the symmetric sigmoid function, whose range is the interval $(-1, 1)$, is a better choice for the hidden layer.
2.4. Improvement of the BP Neural Network.
The BP neural network has good fault tolerance and is widely used in all walks of life. The probability of unsafe events in the air traffic control department, however, has its own particularity; if the BP neural network is applied directly to this probabilistic prediction, the result will have a large error. So, the network needs to be improved.

Problems of the General Backpropagation Neural Network.
There are two main problems in using the general BP neural network directly to predict the probability of unsafe events in the ATC department.
First, during the training of the network, the outputs do not satisfy the normalization of probability; that is, the sum of the probabilities of all network outputs is not equal to one. When the trained network is used for prediction, the prediction accuracy may be affected.
There is more than one reason for this problem. For example, the inputs may not fully cover all the influencing factors, the quantization of the inputs may be insufficient, the amount of training data may be too small, some data in the training set may be inaccurate, and so on. In short, a BP neural network can only approximate a function, and errors are inevitable. So, if a constraint is introduced to ensure that the sum of all outputs is equal to one in each training epoch, the problem can be solved.
Second, ATC unsafe events are small-probability events. That is, the probability of an unsafe event in the ATC department is very low compared with the probability of the safe event. When the connection weights of the BP neural network are adjusted, the change of the weights is mainly affected by the probability of the safe event. As a result, the training of the whole network is dominated by the error of the safe event, which makes the relative error of the unsafe events very large. Next, we prove this by starting from the principle of the BP neural network.
For the convenience of the proof, consider the case in which the BP neural network has only two outputs. $yo_n$ represents an output, where $n = 2$ stands for the safe event and $n = 1$ stands for the unsafe event. Since the unsafe event is a small-probability event, the desired probability of the unsafe event is far less than the desired probability of the safe event. For example, the rate of unsafe events in Chinese civil aviation was 0.0056 per 10,000 flights in 2020 and 0.0043 per 10,000 flights in 2019. This is expressed as follows:
$$t_1 \ll t_2 \tag{12}$$
where $t_1$ represents the desired probability of the unsafe event and $t_2$ represents the desired probability of the safe event.
Equation (13) is obtained from (5), and (14) and (15) are obtained from (12); (16) then follows from (14) and (15). In the training process, when the actual outputs $yo$ of the BP neural network are close to the targets $t$, we have $yo_1 \ll yo_2$, and (17) can be proved in the same way as (16). Relations (18) and (19) evidently hold, and (20) follows from them. Combining (13), (16), (17), and (20) gives (21). Substituting (21) into (7) and (9) gives (22) and (23), from which it is easy to see that the changes of the weights from the input layer to the hidden layer are mainly determined by the probability of the safe event.
Therefore, when the weight changes are driven by this error, the result for the safe event is good, but the result for the small-probability unsafe event is poor.
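The scale of this effect can be illustrated numerically. The sketch below is not the paper's proof; it only shows that, under the squared-error loss, an absolute error that looks negligible leaves the unsafe-event output with a huge relative error. The 2020 rate quoted above is converted to a per-flight probability (an assumption about how the rate maps to $t_1$), and the network outputs are hypothetical.

```python
def squared_error(t, yo):
    """Half the sum of squared errors, the loss used by the general BP network."""
    return 0.5 * sum((tn - yn) ** 2 for tn, yn in zip(t, yo))

def relative_errors(t, yo):
    """Relative error of each output with respect to its target."""
    return [abs(tn - yn) / tn for tn, yn in zip(t, yo)]

p_unsafe = 0.0056 / 10_000        # 2020 rate per 10,000 flights, taken per flight
t = [p_unsafe, 1.0 - p_unsafe]    # targets: [unsafe event, safe event]
yo = [1e-4, 1.0 - 1e-4]           # hypothetical outputs, both off by about 1e-4

loss = squared_error(t, yo)
rel_unsafe, rel_safe = relative_errors(t, yo)

print(f"loss = {loss:.3e}")
print(f"relative error, unsafe event: {rel_unsafe:.1f}")
print(f"relative error, safe event:   {rel_safe:.2e}")
```

A loss this small gives gradient descent almost nothing left to correct, even though the predicted unsafe-event probability is wrong by more than two orders of magnitude.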
The usual way of handling this situation is to normalize the inputs and outputs, that is, to map them to the same interval through a function. This method can reduce the differences among the inputs and outputs, reduce errors, and improve accuracy. After the prediction is made, the outputs can be converted back to probabilities through reverse normalization. However, this method still has the following problems.
(1) It still does not guarantee the normalization of probability, and the sum of all outputs may not be equal to 1.
(2) Different normalization functions produce different results, and it is hard to say which one is better. Selecting different normalization functions for the same model may result in some difference in accuracy, so it is unclear which normalization function can be trusted.
To address these problems, this paper proposes an improved BP neural network method, which not only satisfies the normalization of probability but also improves the training accuracy of the BP neural network.

Improvement of the Backpropagation Neural Network.
For a given input vector, assume that the $q$th output of the neural network is $u_q$. Since there is some error in the output, assume that the error of the $q$th output is $\varepsilon_q$. Then, the true value of the $q$th output is $u_q + \varepsilon_q$. According to random utility theory [48-50], the probability that the outcome belongs to the $q$th class of event is as follows:
$$P_q = P\left(u_q + \varepsilon_q > u_z + \varepsilon_z,\ \forall z \neq q\right) \tag{24}$$
where $u_z$ stands for the $z$th output of the neural network and $\varepsilon_z$ stands for the error of the $z$th network output.
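Assume that the errors $\varepsilon$ follow independent Gumbel distributions with the following distribution function:
$$F(\varepsilon) = \exp\left(-e^{-\varepsilon}\right) \tag{25}$$
Then, the joint distribution $F(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_N)$ is as follows:
$$F(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_N) = \prod_{z=1}^{N} \exp\left(-e^{-\varepsilon_z}\right) \tag{26}$$
Taking the partial derivative with respect to the $q$th random variable $\varepsilon_q$ gives the following:
$$\frac{\partial F}{\partial \varepsilon_q} = e^{-\varepsilon_q} \exp\left(-e^{-\varepsilon_q}\right) \prod_{z \neq q} \exp\left(-e^{-\varepsilon_z}\right) \tag{27}$$
Substituting (24) into (27) and evaluating the definite integral, the following can be obtained [51-53]:
$$P_q = \frac{\exp(u_q)}{\sum_{z=1}^{N} \exp(u_z)} \tag{28}$$
According to random utility theory, the $q$th desired output $t^i_q$ corresponding to the $i$th input vector can be taken as the desired probability of the ATC event. So, the following is valid:
$$t^i_q = \frac{\exp(u^i_q)}{\sum_{z=1}^{N} \exp(u^i_z)} \tag{29}$$
If there are $N$ outputs, then there are $N$ equations like (29), which form an equation set, and $u^i_q$ can be obtained by solving it. In the training of the BP neural network, $u^i_q$ replaces $t^i_q$ as the desired output. In prediction, the network output $u$ is substituted into (29), and the probability of the ATC event is obtained.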
Consider any two network outputs $u^i_p$ and $u^i_q$ corresponding to the $i$th input vector. According to (29), the following can be obtained:
$$t^i_p = \frac{\exp(u^i_p)}{\sum_{z=1}^{N} \exp(u^i_z)} \tag{30}$$
$$t^i_q = \frac{\exp(u^i_q)}{\sum_{z=1}^{N} \exp(u^i_z)} \tag{31}$$
Equation (30) divided by (31) yields the following:
$$\frac{t^i_p}{t^i_q} = \frac{\exp(u^i_p)}{\exp(u^i_q)} \tag{32}$$
After transformation, the following is obtained:
$$\frac{t^i_p}{t^i_q} = \exp\left(u^i_p - u^i_q\right) \tag{33}$$
Finally, we get the following:
$$u^i_p - u^i_q = \ln\frac{t^i_p}{t^i_q} \tag{34}$$
By the properties of the logarithmic function, when the difference between $t^i_p$ and $t^i_q$ is large, that is, when the value of $t^i_p / t^i_q$ is very large or very small, the value of $\ln(t^i_p / t^i_q)$ stays moderate in magnitude, so the difference $|u^i_p - u^i_q|$ stays small compared with the ratio of the targets. For example, if $t^i_p / t^i_q = 100000$, then $\ln(100000) = 11.51293$. Therefore, according to the analysis in Section 2.4.1, replacing $t^i$ with $u^i$ in the training of the network is more reasonable.
After the training of the network is completed, the $i$th sample is fed into the neural network and the $p$th output $U^i_p$ is obtained. Substituting $U^i_p$ into (29), the actual output for the $i$th input is obtained as follows:
$$yo^i_p = \frac{\exp(U^i_p)}{\sum_{z=1}^{N} \exp(U^i_z)} \tag{35}$$
Summing all the equations in (35) yields the following:
$$\sum_{p=1}^{N} yo^i_p = 1 \tag{36}$$
So, the model satisfies the normalization of probability.
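The normalization property can be checked directly: whatever raw values the trained network returns, the exponential mapping turns them into probabilities that sum to one. The sketch below assumes the mapping is the standard softmax form and uses hypothetical raw outputs; subtracting the maximum before exponentiating is a common numerical safeguard that leaves the result unchanged.

```python
import math

def softmax(raw):
    """Map raw network outputs to probabilities that sum to one."""
    shift = max(raw)                          # avoids overflow in exp
    exps = [math.exp(v - shift) for v in raw]
    total = sum(exps)
    return [e / total for e in exps]

raw_outputs = [-14.2, -12.9, 0.3]             # hypothetical raw outputs U for one sample
probs = softmax(raw_outputs)

print(probs, sum(probs))
```

Because the normalization is built into the mapping, no extra constraint on the network is needed during training.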
In summary, the outputs of the neural network contain all the events that form the partition of the probability space, so (36) holds without question.
The equation set formed by (29) has a redundant equation, which can be seen from (36) and (37), and $N$ unknown variables cannot be solved from $N - 1$ equations. Setting one of the $u^i_p$ to zero resolves this problem. It is difficult to solve the equation set formed by (29) directly, so a change of variables is introduced. Let $x^i_q = \exp(u^i_q)$; then (29) is transformed into the following:
$$t^i_q = \frac{x^i_q}{\sum_{z=1}^{N} x^i_z} \tag{38}$$
Equation (38) is transformed into the following:
$$t^i_q \sum_{z=1}^{N} x^i_z - x^i_q = 0 \tag{39}$$
The equation set formed by (39) is a set of linear equations, which is very simple to solve. After obtaining its solution, $u^i_q$ can be obtained as follows:
$$u^i_q = \ln x^i_q \tag{40}$$
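With one unknown fixed at zero, the linear system has a simple closed-form solution: the reference class gets $x = 1$ and every other class gets the ratio of its probability to the reference probability. The sketch below works under that reading; the function names and the choice of the safe event as the reference class are assumptions, and the example target reuses the 2020 rate quoted earlier, taken per flight.

```python
import math

def targets_to_u(t, ref):
    """Map desired probabilities t to unconstrained training targets u, with u[ref] = 0."""
    if any(p <= 0.0 for p in t):
        raise ValueError("every probability must be positive, since ln(0) is undefined")
    if abs(sum(t) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to one")
    # With x[ref] = 1, the linear equations give x[q] = t[q] / t[ref].
    x = [p / t[ref] for p in t]
    return [math.log(v) for v in x]

def u_to_probabilities(u):
    """Inverse mapping: exponentiate and normalize."""
    shift = max(u)
    exps = [math.exp(v - shift) for v in u]
    total = sum(exps)
    return [e / total for e in exps]

p_unsafe = 0.0056 / 10_000            # per-flight probability, an assumed conversion
t = [p_unsafe, 1.0 - p_unsafe]        # [unsafe event, safe event]
u = targets_to_u(t, ref=1)            # safe event as the reference class
recovered = u_to_probabilities(u)

print(u)
print(recovered)
```

The targets, which differ by more than six orders of magnitude, become training values that differ by roughly 14, which is the compression the logarithm provides. A class that was never observed has probability zero and cannot be mapped this way; how such classes are handled is not specified in the text above, so the sketch simply rejects them.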

Adjustment of the Improved Backpropagation Neural Network Structure.
Since the network output $u^i_p$ in the improved BP neural network does not need to be limited to the interval $(0, 1)$, the activation function of the output layer $f_2(\cdot)$ can be the linear transfer function, which simplifies the problem:
$$f_2(x) = x \tag{41}$$
Then, with $f_2'(\cdot) = 1$ and the mapped target $u_n$ in place of $t_n$, the changes of the weights in the output and hidden layers obtained by the gradient descent algorithm are as follows:
$$\Delta w_{kn} = \eta (u_n - yo_n)\, ho_k \tag{42}$$
$$\Delta w_{mk} = \eta \left[ \sum_{n=1}^{N} (u_n - yo_n)\, w_{kn} \right] f_1'(hi_k)\, x_m \tag{43}$$
Similarly, in each training step, the changes of the biases in the output and hidden layers obtained by the gradient descent algorithm are as follows:
$$\Delta b_n = \eta (u_n - yo_n) \tag{44}$$
$$\Delta b_k = \eta \left[ \sum_{n=1}^{N} (u_n - yo_n)\, w_{kn} \right] f_1'(hi_k) \tag{45}$$
2.5. Scope of Application of the Model.
One of the core tasks of air traffic control in civil aviation is to ensure the safety of aircraft and avoid unsafe events. Unfortunately, Murphy's law tells us that absolute safety does not exist. That is, it is difficult to avoid unsafe events completely.
Since unsafe events cannot be completely avoided, they must be confronted scientifically. For a department performing air traffic control work, being able to predict the possibility of an unsafe event is very meaningful for air traffic safety management. In this paper, the BP neural network model is introduced into the probability prediction of unsafe events in the air traffic control department. There are at least three practical applications:
(1) The improved model in this paper provides a theoretical approach to quantify the probability of unsafe events in the ATC department. As mentioned previously, the expert assessment method is mostly used in air traffic control safety research in China at present. In this class of safety assessment methods, experts assess the safety situation of an air traffic control department based on their own knowledge and experience. There is no doubt that this method plays a positive role in the safety management of air traffic control. However, it relies mainly on the qualitative evaluation of experts and lacks sufficient quantitative analysis. Because it is based on the subjective evaluation of experts, it also lacks objectivity. In this paper, the neural network model is introduced into the prediction of unsafe events in the ATC department. Historical data are used as the basis, which makes the evaluation result more objective. The neural network is also a quantitative mathematical model, which is more objective than a qualitative evaluation.
(2) It can be used for trend analysis of ATC safety development. For the air traffic control department, the development trend of its risk has a significant impact on air traffic safety management. If the safety risk is increasing, the ATC department needs to invest more manpower, material, and financial resources in safety management. If the safety risk is decreasing, the investment in safety management can be reduced appropriately, according to the actual situation, to lower the cost. The trend of the rate of unsafe events reflects the development trend of safety risk to a certain extent. By predicting the future probability of unsafe events and combining it with the past and present rates of unsafe events, the safety risk trend of the ATC department can be judged and decisions on its safety management can be made.
(3) It provides a method for calculating probabilities for risk assessment. Risk assessment is a common method in civil aviation safety management. The "Safety Management Manual" prepared and published by the International Civil Aviation Organization clearly defines risk as the product of the probability of an unsafe event and the consequences of the unsafe event. The Air Traffic Management Bureau of the Civil Aviation Administration of China has included this method in the "Rules for Safety Management of Air Traffic Management Operation Department of Civil Aviation." However, neither the "Safety Management Manual" issued by the International Civil Aviation Organization nor the "Rules for Safety Management of Air Traffic Management Operation Department of Civil Aviation" issued by the Civil Aviation Administration of China gives a specific method for calculating the probability of an unsafe event. The model proposed in this paper provides a probability calculation method for risk assessment in the ATC department.

Numerical Example
The following numerical example illustrates the advantages of the improved BP neural network proposed in this paper over other neural networks when training on unsafe events in an air traffic control department.

Data.
Suppose an air traffic control department wants to know the probability of an unsafe event occurring in the next year. As analyzed in Section 2.3.3, four indicators of hardware, liveware, environment, and management related to the SHEL model are used as inputs to the BP neural network, and the unsafe event and safe event are taken as outputs. The number of problems with management in the ATC department is used to quantify the management indicator. The number of problems with environment in the ATC department is used to quantify the environment indicator. The number of unsafe acts of personnel in the ATC department is used to quantify the liveware indicator. The number of equipment failures in the ATC department is used to quantify the hardware indicator. The occurrence rate of unsafe events is used to quantify the desired output of the unsafe event. The occurrence rate of safe events is used to quantify the desired output of the safe event. In Table 1, $A_1$ represents the safe event, $t(A_1)$ is the desired output of the safe event, $A_2$ represents the unsafe event, and $t(A_2)$ is the desired output of the unsafe event.
The historical data for the numerical example are shown in Table 1.

Modeling.
To show that the model proposed in this paper improves the prediction accuracy, three BP neural networks are applied: two benchmark models and the improved BP neural network. The benchmark models are the general BP neural network and the normalized BP neural network. If the accuracy of the improved neural network is higher than that of the benchmark models used for comparison, the improvement is meaningful. Therefore, the three neural networks are trained separately and the accuracy of their training results is compared.

Benchmark Models
(1) General BP Neural Network. The principle of the general BP neural network is shown in Section 2.2. In the general BP neural network, the inputs and outputs are used directly for network training without any processing. To make the presentation clearer, the structure of the general BP neural network is described as follows. Predicting the probability of unsafe events in a control department is essentially data fitting: finding the inherent and opaque connections between the inputs and outputs from historical data. It is widely accepted that three-layer neural networks can fit most problems, so this model uses a three-layer neural network, which contains the input, hidden, and output layers.
(1) The input layer
Any safety-related influence can be used as an input to the neural network, provided that it can be quantified, meaning that the input data can be collected in practice. For a specific ATC department, as detailed in Section 2.3.3, the inputs of the neural network can be found from the perspective of equipment, environment, personnel, and management according to the SHEL model. Considering the collectability of the data, the following four indicators are used as inputs in this case: the number of equipment failures in the ATC department, the number of problems with environment in the ATC department, the number of problems related to management in the ATC department, and the number of unsafe acts of personnel in the ATC department. It needs to be stressed again that, for the purpose of comparison, the input and output data are used directly for network training without any preprocessing.
(2) The hidden layer
Two problems need to be solved for the hidden layer: one is to determine the activation function and the other is to determine the number of nodes in the hidden layer.
Because the input domain of the output layer may contain negative numbers, an asymmetric sigmoid function in the hidden layer would produce outputs that, as inputs to the output layer, do not cover that domain. Therefore, the symmetric sigmoid function, whose range is the interval (−1, 1), is a better choice for the hidden layer. There is no generally agreed method for determining the number of hidden layer nodes. Some experts have introduced the empirical formula (47) [54], in which $K$ stands for the number of nodes in the hidden layer, $M$ stands for the number of nodes in the input layer, $N$ stands for the number of nodes in the output layer, and $\alpha \in [0, 10]$.
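The two hidden-layer choices can be sketched in Python. Both formulas below are assumptions, since the corresponding equations are not restated in this section: the node count uses the widely quoted form $K = \sqrt{M + N} + \alpha$, rounded up, and the symmetric sigmoid uses the common form $2/(1 + e^{-2x}) - 1$, which equals $\tanh(x)$. The function names are hypothetical.

```python
import math


def hidden_nodes(m_inputs, n_outputs, alpha):
    """Empirical hidden-layer size, assumed to be K = sqrt(M + N) + alpha.

    Rounding up is an assumption; it reproduces K = 10 for the values
    M = 4, N = 2, alpha = 7 reported in the numerical example.
    """
    return math.ceil(math.sqrt(m_inputs + n_outputs) + alpha)


def symmetric_sigmoid(x):
    """One common symmetric sigmoid with range (-1, 1); identical to tanh(x)."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0


k = hidden_nodes(4, 2, 7)
```

Unlike the asymmetric sigmoid, this activation returns negative values for negative arguments, which is the property the hidden layer needs here.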

(3) The output layer
For the output layer, two problems need to be solved: one is to determine the output indicators and the other is to determine the activation function. There are some limitations on the outputs. First, each output lies in the interval [0, 1]. Second, the outputs have to form a partition of the probability space. To meet these conditions, and taking into account the actual context of this case, there are two outputs: the probability of the unsafe event and the probability of the safe event. Since it is impossible to know the probability of an unsafe event in advance, the historical occurrence rate of events is used as the probability of ATC events for network training. The output layer uses the asymmetric sigmoid function as its activation function; its range is the interval (0, 1), which is also the interval of the probability.

(2) Normalized Neural Network. The inputs, outputs, and number of nodes in the hidden layer of the normalized neural network are the same as those of the general neural network. In other words, the structure of the two networks is the same; the difference is that the normalized neural network preprocesses the input and output data before training.
In the normalization formula (49), $y$ stands for the data used for training after normalization, $x_{\max}$ stands for the maximum value of an input or output vector, and $x_{\min}$ stands for the minimum value of an input or output vector. The normalized input and desired output data are shown in Table 2.
Because the outputs are normalized to the interval [−1, +1], the output layer of the normalized neural network cannot use the sigmoid function whose range is the interval (0, 1). So, a linear function is used as the activation function of the output layer. In the normalized neural network model, the data used for training are normalized, and reverse normalization is required when making predictions or testing training accuracy. The reverse normalization formula is (51), whose parameters are the same as those in (49).
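A minimal Python sketch of normalization to [−1, +1] and its reverse is given below. It assumes the standard min-max form $y = 2(x - x_{\min})/(x_{\max} - x_{\min}) - 1$, which matches the stated target interval and parameters but is not restated in this section. The function names are hypothetical.

```python
def normalize(values):
    """Min-max scale a vector to [-1, 1]; returns (scaled, x_min, x_max).

    Assumed form: y = 2 * (x - x_min) / (x_max - x_min) - 1.
    """
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("a constant vector cannot be min-max normalized")
    scaled = [2.0 * (x - x_min) / (x_max - x_min) - 1.0 for x in values]
    return scaled, x_min, x_max


def denormalize(y, x_min, x_max):
    """Reverse normalization: map a value in [-1, 1] back to the original scale."""
    return (y + 1.0) * (x_max - x_min) / 2.0 + x_min
```

Each output vector is scaled with its own minimum and maximum, so the reverse-normalized outputs of the two events are computed independently of each other and nothing forces them to sum to one.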

Improved BP Neural Network.
The principle of the improved BP neural network is shown in Section 2.4.3. To make the use of the improved neural network clearer, its features are summarized as follows:
(1) Defining the network structure
The purpose of defining the network structure is mainly to determine the number of inputs, outputs, and hidden layers of the network. In this case, the network structure of the three models is defined to be the same to make the comparison fairer. Details of the network structure can be found in Section 3.2.1.
(2) Output data preprocessing
First, let $u_2 = 0$. Then $u_1$ can be calculated according to (38)-(40); see the third column of Table 3. The network is trained with $u_1$ and $u_2$ as the target outputs of the neural network. To improve the speed of network training, the inputs and outputs of the improved BP neural network can also be normalized again.

(3) Network training
Network training involves the adjustment of weights and biases. The adjustments of the weights and biases in the improved neural network are shown in (42) to (45).

(4) Obtaining the output probability
The original output data are not used during network training. After the training is completed, whether to make predictions or to calculate the accuracy of the training results, the network outputs need to be converted to probabilities using the conversion formula (29).
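The conversion in step (4) can be sketched in Python as follows, again assuming that (29) has the softmax form. The raw output values are hypothetical. For contrast, the sketch also passes the same values through two independent asymmetric sigmoid outputs, as in the general BP neural network, to show why that design does not guarantee a total probability of one.

```python
import math


def to_probabilities(u):
    """Convert raw network outputs to probabilities (assumed softmax form of (29))."""
    m = max(u)  # subtracting the maximum avoids overflow; ratios are unchanged
    x = [math.exp(v - m) for v in u]
    total = sum(x)
    return [v / total for v in x]


def logistic(z):
    """Asymmetric sigmoid with range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


raw_outputs = [7.2, -0.1]  # hypothetical outputs for (safe event, unsafe event)
p = to_probabilities(raw_outputs)
independent = [logistic(v) for v in raw_outputs]
```

Whatever values the trained network returns, the entries of `p` are positive and sum to one, whereas the entries of `independent` are each valid probabilities but are not tied to each other.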

Model Calculation Process.
In the training of a neural network, the raw data can be divided into a training set, a validation set, and a test set. The training set is used for network training, the validation set is used to check whether the training process is overfitting, and the test set is used to compare models. In this paper, because the samples are few, all of them are used for training. The training set is also used to compare which model is better.
This raises the question of why more samples were not collected. The reason is that the quantitative data on the factors affecting unsafe events are kept confidential by air traffic control departments. For example, no department will make public the number of unsafe acts caused by its ATC controllers, simply because such figures are not flattering. So, it is very difficult to obtain much operational data. Given this situation, the data used in this numerical example are limited, and all samples are therefore put in the training set.
Another question is how to validate and compare the networks if there is no validation set or test set. The test set is used to compare the accuracy of different neural networks after they are trained. In the absence of a test set, the training set can be used to calculate the accuracy of the network. This may have some impact in actual use, but the purpose in this paper is to compare the accuracy of different networks and determine which neural network model is more accurate. Only the ranking of the networks by accuracy matters, not the exact accuracy of each network, so this choice has little impact. The validation set is mainly used to avoid overfitting: when overfitting is detected, the network stops training. Besides overfitting, the conditions for stopping training can be the maximum number of epochs, the performance goal, the minimum gradient, the maximum training time, and so on. So, the network can be trained without a validation set. Of course, overfitting cannot be ignored. To avoid the effect of overfitting on model comparison, this paper gives a detailed explanation in Sections 3.3 and 3.4. The whole process is divided into four steps:
Step 1. The training set is brought into the different models as the inputs and desired outputs for network training.
Step 2. After the training is completed, all inputs are brought into the trained network to calculate the network outputs. For the normalized neural network, use the normalized input data as inputs.
Step 3. For the general BP neural network, the network outputs are the actual outputs, because the inputs and outputs are not transformed in any way in this neural network. For the normalized neural network, the actual outputs are obtained after the network outputs are processed by reverse normalization. For the improved BP neural network, the actual outputs are obtained by bringing the network outputs into (29).
Step 4. Finally, the absolute error and the relative error between the actual outputs and the desired outputs are calculated. The advantages and disadvantages of the different BP neural networks can be derived by comparing their errors.
For fairness of comparison, the computing parameters of the different models are set to be the same: the maximum number of iterations is 1000, the learning rate is 0.01, and the number of nodes in the hidden layer is $K = 10$ according to (47), with $M = 4$, $N = 2$, and $\alpha = 7$. Tables 3-5 show the results of the different BP neural networks. Here $p(\cdot)$ is the probability of an event, which is also the actual output of the neural network. It is obtained by feeding the inputs into the trained network and then applying (29) for the improved BP neural network or reverse normalization for the normalized BP neural network. $e(\cdot)$ represents the absolute error between the actual outputs and desired outputs, and $Re(\cdot)$ represents the relative error between the actual outputs and desired outputs. $p(A)$ is the sum of the probabilities of the different events, which is used to check whether the normalization of probability is satisfied. The parameters in the tables are calculated from these definitions, where $A_1$ stands for the safe event and $A_2$ stands for the unsafe event.

Detailed Analysis through One Training Result.
The analysis result of the general BP neural network is shown in Table 4. The analysis result of the normalized BP neural network is shown in Table 5. The analysis result of the improved BP neural network is shown in Table 3:
(1) From Table 3, it is easy to see that the sum of the actual outputs, which are used as the probabilities of ATC events, is equal to one in the improved BP neural network. From Tables 4 and 5, it can be seen that, whether or not the inputs and outputs are normalized, the general BP neural network and the normalized BP neural network cannot guarantee that the sum of the actual outputs is equal to one.
(2) Comparing the errors of the three models shows that the general BP neural network has the worst training precision. Its precision for the unsafe event is especially poor, and some relative errors reach dozens or hundreds of times the desired output. Such a model cannot be used in practice. The comparison also shows that the improved model has the best precision.
(3) To compare the accuracy of the different models quantitatively as a whole, the absolute total error AE and the relative total error RE of each model are calculated using the mean square error (the results are shown in Table 6). Table 6 shows that the improved neural network model has the highest precision, because both its absolute total error and its relative total error are the smallest.
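The per-sample quantities reported in the tables can be sketched in Python. The formulas are assumptions based on the verbal definitions: the absolute error is taken as $|p - t|$, the relative error as $|p - t| / t$, and $p(A)$ as the sum of the actual outputs. The function name and the numbers are hypothetical and are not taken from Tables 3-5.

```python
def error_report(actual, desired):
    """Return (absolute errors, relative errors, sum of actual outputs).

    Assumed definitions: e = |p - t|, Re = |p - t| / t, p(A) = sum of p.
    """
    abs_err = [abs(p - t) for p, t in zip(actual, desired)]
    rel_err = [e / t for e, t in zip(abs_err, desired)]
    return abs_err, rel_err, sum(actual)


desired = [0.999, 0.001]  # hypothetical rates for (safe event, unsafe event)
actual = [0.994, 0.006]   # hypothetical network outputs
abs_err, rel_err, total = error_report(actual, desired)
```

In this made-up case both events have the same absolute error of about 0.005, yet the relative error is about 0.5% for the safe event and about 500% for the unsafe event. This is the effect that makes the relative error of small-probability events the more demanding measure.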

Comprehensive Analysis through 100 Training Sessions.
The preceding analysis is based on a single training result of each neural network, which shows the difference in training accuracy between the networks numerically. The training process of a neural network is an optimization process in which the output error is the objective function to be minimized and the network weights and biases are the decision variables. To prevent the neural network from falling into a local optimum, the weights and biases are initialized randomly, and training is repeated several times so that the best result can be taken for the actual application. Therefore, a single training result cannot fully demonstrate the advantage in training accuracy of the improved model proposed in this paper. To address this issue, we ran 100 training sessions for each network and calculated the average error over sessions 1 to 100, as shown in Figure 2. The black curve is the average error of the improved neural network as a function of the number of training sessions. The red curve is the average error of the normalized neural network as a function of the number of training sessions. The blue curve is the average error of the general neural network as a function of the number of training sessions. The figure shows that the average error of the improved neural network is the lowest at every point. This indicates that the higher training accuracy of the improved neural network is not accidental.
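One reading of the averaged curve in Figure 2 is a running mean: the value at session $k$ is the mean error of the first $k$ training sessions. The Python sketch below follows that reading, which is an assumption, and uses a hypothetical function name.

```python
def running_average(errors):
    """Running mean: entry k is the mean error of the first k training sessions."""
    averages = []
    total = 0.0
    for k, err in enumerate(errors, start=1):
        total += err
        averages.append(total / k)
    return averages
```

Because each session starts from random weights, single-session errors fluctuate; the running mean smooths those fluctuations so that the curves of the three networks can be compared.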

Excluding the Effect of Overfitting on Accuracy.
Overfitting is a potential pitfall in neural network training. Since the samples collected in this paper are few, all samples are used for training. Inevitably, the question arises whether the improved neural network in this paper achieves higher accuracy than the benchmark models only because it is overfitted. To answer this question, we first note that overfitting may be caused by an increase in the number of training epochs. On this basis, we compare the accuracy of the three models at each number of training epochs. At the early stage of training, the number of epochs is low and the possibility of overfitting is extremely low, so if the improved neural network is already more accurate at that stage, overfitting can be excluded as the cause. To improve the fairness of comparison, the training parameters of the three neural network models are set to be the same: the maximum number of training epochs is 1000, the learning rate is 0.01, and the same gradient descent algorithm is used. To avoid the influence of local optima on the training results, all three models are initialized by randomly assigning the weights and biases of the network. Considering the randomness of a single training result, the average error over 100 training sessions is used, as shown in Figure 3. The horizontal axis represents the 1000 training epochs, and the vertical axis represents the average error of 100 training sessions at each training epoch. The black curve is the average error of the improved neural network over 100 training sessions as a function of the number of training epochs. The red curve is the average error of the normalized neural network over 100 training sessions as a function of the number of training epochs. The blue curve is the average error of the general neural network over 100 training sessions as a function of the number of training epochs.
It is not difficult to see from the figure that, at every training epoch, the accuracy of the improved neural network proposed in this paper is higher than that of the benchmark model, which excludes the possibility that the improved neural network has high accuracy due to overfitting.

Application of the Model.
The trained BP neural network can be used to predict unsafe events in the ATC department by feeding new inputs to the network. For example, the safety managers in an air traffic control department have formulated a safety management objective for the next year, which requires that the number of unsafe acts of the controllers should not exceed five, the number of equipment failures should not exceed four, the number of management problems that cannot be solved in time should not exceed six, and the number of environmental problems should not exceed four. The question is: what is the probability of an unsafe event, given that these objectives are achieved? This question can be answered using the model proposed in this paper. The solution process is described as follows.
In this paper, all samples are used for network training because the samples are limited, which is feasible for comparing the accuracy of different networks. However, when the network is used in practice, this approach may result in overfitting and reduce the generalization ability of the model. The early stopping method can be used to avoid overfitting. The general practice is to first divide the data into three subsets. The first subset is the training set, which is used for computing the gradient and updating the network weights and biases. The second subset is the validation set. The error on the validation set is monitored during the training process. The validation error normally decreases during the initial phase of training, as does the training set error. However, when the network begins to overfit the data, the error on the validation set typically begins to rise. Training stops when the validation error is found to grow continuously. The network weights and biases are saved at the minimum of the validation set error. The test set error is not used during training, but it is used to compare different models. Of course, the prerequisite is that there are enough data to provide enough samples for each set.
To illustrate the use of the network, the samples in Table 1 are divided into three sets: the training set, the validation set, and the test set. The training set contains five samples, numbered 2, 6, 7, 8, and 9 in Table 1. The validation set contains two samples, numbered 4 and 5 in Table 1. The test set contains two samples, numbered 1 and 3 in Table 1. There are two points to clarify. First, the samples are assigned to the sets at random, and the assignment above is the one that gave the best result after the authors repeated the division and training many times. Second, this process only demonstrates the use of the model; since the samples are few, the training results may not be very good. This is why all samples were used for training, without dividing them into different sets, when the networks were compared in Section 3.3. The training parameters are as follows: the learning rate is 0.01, the maximum number of training epochs is 1000, the performance goal of the training set is zero, and the maximum number of validation failures is six. After many training sessions, a network with better results was selected for illustration. The performance curves of the improved BP neural network on the different sets are shown in Figure 4.
In this training process, the errors on the different sets decrease as the number of training epochs increases, and the rate of decrease becomes smaller and smaller, which is fully consistent with the properties of the gradient descent method. When the training reaches epoch 831, the error on the validation set has increased six consecutive times (the increase is small and is not clearly visible in Figure 4), reaching the maximum number of validation failures set by the user. If training continued at this point, overfitting would occur, so the network stopped training.
The optimal network is therefore obtained at training epoch 825, and the network at this epoch can be used for practical applications.
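The stopping rule described above can be sketched in Python as a generic patience-based early stopping check. This is not the authors' implementation: the function name is hypothetical, and the validation curve is synthetic, constructed only so that its minimum falls at epoch 825.

```python
def early_stopping(val_errors, max_fail=6):
    """Return (stop_epoch, best_epoch), both 1-based.

    Training stops once the validation error has failed to improve on its
    best value for max_fail consecutive epochs; the weights saved at
    best_epoch are the ones to keep.
    """
    best_err = float("inf")
    best_epoch = 0
    fails = 0
    for epoch, err in enumerate(val_errors, start=1):
        if err < best_err:
            best_err, best_epoch, fails = err, epoch, 0
        else:
            fails += 1
            if fails >= max_fail:
                return epoch, best_epoch
    return len(val_errors), best_epoch


# Synthetic validation curve: decreasing up to epoch 825, then rising slowly.
falling = [1.0 / e for e in range(1, 826)]
rising = [1.0 / 825 + 1e-6 * k for k in range(1, 176)]
stop_epoch, best_epoch = early_stopping(falling + rising, max_fail=6)
```

With six allowed validation failures, the stop epoch is always six epochs after the best epoch, which matches the relationship between epochs 831 and 825 reported above.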
Once the network has been trained, attention returns to the question posed in the first paragraph of this section: how to use the trained network and the known conditions to predict the probability of an unsafe event. With a trained network, the problem becomes very simple. The probability can be predicted by feeding the data [5, 4, 6, 4] into the network.
The predicted probability of the unsafe event is 0.0006093, and the predicted probability of the safe event is 0.9993907.

Summary
(1) With its powerful function-fitting ability, the BP neural network can be used to effectively predict the probability of unsafe events in an air traffic control department, provided that the network structure is reasonably designed and historical data are collected for training.
(2) The general BP neural network cannot guarantee the accuracy of the output if it is applied directly to predict the probability of unsafe events in the ATC department, and it cannot ensure the normalization of probability. When the inputs and outputs are normalized before training, as in the normalized BP neural network, the prediction accuracy improves, but the normalization of probability for the actual outputs is still difficult to satisfy.
(3) The improved BP neural network proposed in this paper has high accuracy when trained on small-probability events and guarantees that the sum of the probabilities of all events is equal to one.

Conclusions
As an important means of ensuring the safety of civil aviation, air traffic control plays a pivotal role in protecting passengers' lives and property. To improve the level of safety management of the department that performs air traffic control work, the authors introduce an artificial neural network to predict the probability of unsafe events for the air traffic control department. A three-layer neural network containing an input layer, a hidden layer, and an output layer is designed to solve the problem. The influences that affect ATC safety and can be quantified and collected are used as inputs according to the SHEL model. The probability of unsafe events is used as the output of the network. This paper proves theoretically that the general BP neural network cannot be used for training on small-probability events, because it does not ensure that the sum of the probabilities of all outputs is equal to one, and the error between the actual outputs and desired outputs is very large after training. To address this issue, a new mapping method is put forward from the probabilistic viewpoint. It is theoretically proved that the proposed mapping method not only improves the training accuracy but also ensures that the sum of the probabilities is equal to one. Finally, a case study demonstrates that the improved BP neural network model in this paper has higher accuracy in predicting the probability of unsafe events in the air traffic control department.
However, this study has two limitations, which should be addressed in future research. The first limitation is that the current analysis is based on Chinese civil aviation. In particular, the outputs of the neural network are built on the air traffic management rules of civil aviation in China. So, if the model is used outside China, the outputs of the network need to be modified according to the actual situation in other countries. The second limitation is that, as with other artificial neural networks, the number of nodes in the hidden layer has a great influence on the accuracy of the network, which is a common weakness of artificial neural network models. In actual application, the number of nodes in the hidden layer needs to be adjusted several times to achieve good results.
A BP neural network needs data for training. If some of the data in the dataset are wrong, they will adversely affect network training. When a large amount of data is collected, the data should be preprocessed to remove unreasonable values, which helps to improve the accuracy of the network. This paper does not study data filtering methods; how to filter the data to remove unreasonable values is a direction for future research.

Data Availability
Data used to support this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of the paper.