Staring Imaging Real-Time Optimal Control Based on Neural Network

In this paper, a real-time optimal attitude controller is designed for staring imaging, and the output command is based on future prediction. First, the mathematical model of staring imaging is established. Then, the structure of the optimal attitude controller is designed. The controller consists of a preprocessing algorithm and a neural network. Constructing the neural network requires training samples generated by optimization. The objective function in the optimization method takes the future control effect into account. The neural network is trained after sample creation to achieve real-time optimal control. Compared with the PID (proportional-integral-derivative) controller with the best combination of parameters, the neural network controller achieves better attitude pointing accuracy and pointing stability.


Introduction
The staring mode of satellites can be used to collect dynamic information about ground targets. Emergency rescue, airport monitoring, and traffic monitoring are all cases where staring imaging is more advantageous than conventional imaging modes.
The optical axis of the camera is required to point fixedly at the ground target during staring imaging. Thus, the attitude controller should run in a three-axis mode. Imaging quality is closely related to attitude control performance, which in turn is affected by the control strategy.
Analytical control laws are widely applied in attitude controllers for staring imaging. In those analytical control laws, one-step decisions are usually considered instead of long-term planning. Lian proposed two different calculation methods of the desired angular velocity for staring imaging [1]. Zhang designed the control strategy under the condition that the target error can be detected and that the actuator acts with delay [2]. Cui designed a PD adaptive controller for multiobjective staring imaging [3]. Zhang designed a controller to observe spatial cooperative targets [4].
Staring imaging is one application of attitude tracking control. For the more general tracking control problem, one-step decision instead of long-term planning is also the main concern. Wu [5] designed a combined control strategy consisting of a saturation controller for spatial motion targets and proposed a backstepping controller using multiexecuting mechanism control. Leonard designed a control law to monitor space debris [6]. An adaptive fault-tolerant controller with backstepping control was designed by Hu for attitude tracking of rigid spacecraft under unknown but constant inertial parameters, unexpected disturbances, and actuator faults or saturation [7]. Lu proposed an FNTSM (fast nonsingular terminal sliding mode) control law with finite-time control [8]. Pong designed a control law to achieve spacecraft pointing in a specified direction [9]. Wu implemented attitude tracking control based on iterative learning [10].
Recently, neural networks have mainly been applied to the following three aspects of aerospace guidance and control.
First, neural networks are used to accelerate some evolutionary algorithms in aerospace. Cassioli et al. [11] applied SVMs (support vector machines) to improve initial guesses for some trajectory problems in the GTOP (global trajectory optimization problem) database. Basu et al. [12] deployed a neural network to set initial conditions for the PSO (particle swarm optimization), which reduced the planning time of reorientation for a space telescope.
Another application of neural networks is real-time trajectory planning. Izzo et al. [13] trained a DNN (deep neural network) to design optimal interplanetary trajectories. Furfaro et al. [14] designed neural networks based on LSTM (long short-term memory) to generate optimal trajectories for planetary landing tasks.
Neural networks are also directly used in aerospace controllers. Zou [15] proposed a scheme for finite-time attitude tracking control of spacecraft using the terminal sliding mode and a Chebyshev neural network. Biggs and Fournier [16] trained an MLP (multilayer perceptron) to optimize the thruster firing scheme controlling the attitude of a 12U CubeSat. CNNs (convolutional neural networks) have shown superior performance in image classification, and they are also used as control networks. Furfaro et al. [17] designed a CNN that processes simulated Moon images and generates landing control commands in a simulated 2D environment.
Different from traditional control methods, the optimal control method proposed in this paper seeks the optimal control command sequence for a period of time in the future. The beginning of the command sequence is the current optimal control decision. However, the optimal control method requires iterative calculation, which consumes high computational resources and is not a real-time control method. Thus, the optimal control method is used to create training samples where the relationship between input and output is contained. Finally, the neural network is trained to achieve optimal control and real-time control simultaneously.
The main contributions of this paper are stated as follows.
(i) In the aspect of staring imaging, a controller which considers a future period of time and makes the optimal control decision is proposed, and it performs better than the traditional controller.

(ii) An algorithm for compressing the satellite state information is proposed. The state information consists of the current dynamics of the satellite as well as the desired sequence of the future angle and angular velocity. After compression, only 12 scalars are needed as the input of the neural network.

This paper is organized as follows. Section 2 states the staring imaging problem of the spacecraft system. The structure of the real-time optimal controller for staring imaging is designed in Section 3. In Section 4, auxiliary parts of the controller are designed. In Sections 5 and 6, the training sample set is generated and the neural network in the controller is constructed, respectively. Simulation results are provided in Section 7. Finally, conclusions are given in Section 8.

Problem Formulation
2.1. Definition of the Coordinate System. Inertial frame $S_i(Ox_iy_iz_i)$: its origin is located at the mass center of the Earth. The $x_i$ axis points from the Earth towards the vernal equinox. The $z_i$ axis is along the Earth's rotation axis. The $y_i$ axis is determined by the right-hand rule.

Satellite body frame $S_b(Ox_by_bz_b)$: its origin is located at the mass center of the satellite. The $x_b$, $y_b$, and $z_b$ axes lie along the corresponding principal axes of the satellite.

Desired satellite body frame $S_d(Ox_dy_dz_d)$: it is used for describing the desired attitude of the satellite. Its three axes coincide with $x_b$, $y_b$, and $z_b$ when the satellite is at the desired attitude.

Orbital reference frame $S_o(Ox_oy_oz_o)$: its origin is located at the mass center of the satellite. The $y_o$ axis is along the negative direction of the orbital angular momentum. The $z_o$ axis points to the Earth's center. The $x_o$ axis is determined by the right-hand rule.

2.2. Kinematic and Dynamic Equations. The satellite is considered a rigid body equipped with 3 reaction wheels installed orthogonally along the three axes of the satellite body frame. The camera is fixed to the satellite, and its optical axis lies along the $z_b$ axis of the satellite body frame.
The mathematical model contains orbit and attitude motion, both of which are described by kinematic and dynamic equations. The orbital kinematic equation in the inertial frame is given by

$$\dot{R}_s = V_s, \qquad (1)$$

where $R_s \in \mathbb{R}^{3\times1}$ is the satellite position and $V_s \in \mathbb{R}^{3\times1}$ is the satellite velocity. The orbital dynamic equation in the inertial frame is

$$\dot{V}_s = -\frac{\mu}{\left\lVert R_s \right\rVert^3} R_s, \qquad (2)$$

where $\mu$ is the product of the gravitational constant and the mass of the Earth. The attitude kinematic equation ([18], pp. 344) is defined using quaternions:

$$\dot{q} = \frac{1}{2}\,\Omega(\omega_b)\,q, \qquad (3)$$

where $q = [q_1\ q_2\ q_3\ q_4]^T = [q_v^T\ q_4]^T$ is the quaternion denoting the rotation from the inertial frame $S_i$ to the satellite body frame $S_b$, $q_v = [q_1\ q_2\ q_3]^T$ is the vector part, and $q_4$ is the scalar part. $\omega_b = [\omega_{bx}\ \omega_{by}\ \omega_{bz}]^T$ is the angular velocity vector of the satellite body frame $S_b$ with respect to the inertial frame $S_i$. It can also be expressed in terms of the basis vectors of $S_b$ as $\omega_b = \omega_{bx}\mathbf{i}_b + \omega_{by}\mathbf{j}_b + \omega_{bz}\mathbf{k}_b$. The attitude dynamic equation is

$$J\dot{\omega}_b = -\omega_b \times \left(J\omega_b + J_w\Omega_w\right) + M_c + M_{ext}, \qquad (4)$$

where $J \in \mathbb{R}^{3\times3}$ is the inertia tensor of the satellite and

$$J_w = \operatorname{diag}\left(J_{w1}, J_{w2}, J_{w3}\right) \qquad (5)$$

is the inertia tensor of the 3 reaction wheels. $\Omega_w \in \mathbb{R}^{3\times1}$ is the angular velocity of the 3 reaction wheels relative to the satellite body. $M_c \in \mathbb{R}^{3\times1}$ is the control torque generated by the reaction wheels, and $M_{ext} \in \mathbb{R}^{3\times1}$ is the external disturbance torque.

2.3. Attitude Control Effect Evaluation. As illustrated in Figure 1, $T_{img}$ is the imaging time interval, and $t_{img\,i}$ ($i = 1, 2, 3, \ldots$) is the instant when image $i$ is taken. Imaging quality has a close relationship with the nearby attitude control effect. For example, the image taken at $t_{img2}$ is affected by the attitude control effect during $[t_{img2} - T_{img}/2,\ t_{img2} + T_{img}/2]$. Thus, two steps are designed to properly evaluate the attitude control effect over the whole staring imaging task. First, the attitude control effect near each image is evaluated (i.e., the effect near image $i$ is evaluated based on the angle and angular velocity within $[t_{img\,i} - T_{img}/2,\ t_{img\,i} + T_{img}/2]$). Then, the average of those effects is taken to evaluate the whole task.

Figure 1: Imaging time interval and imaging instants.

International Journal of Aerospace Engineering

Pointing accuracy and stability are the two major categories into which payload attitude control requirements fall [19]. Therefore, the mean pointing error [20] and rate error [19] are included in the attitude control effect evaluation. Assuming that the number of all images is $m$, the pointing accuracy $J_A$ is proposed in Eq. (6), the pointing stability $J_\omega$ is proposed in Eq. (7), and the control effect of the whole task is obtained by Eq. (8):

$$J_A = \frac{1}{m}\sum_{i=1}^{m}\frac{1}{3(p+1)}\sum_{j=1}^{3}\sum_{k=1}^{p+1}\left|A_{eseq(j,k)}\right|, \qquad (6)$$

$$J_\omega = \frac{1}{m}\sum_{i=1}^{m}\frac{1}{3(p+1)}\sum_{j=1}^{3}\sum_{k=1}^{p+1}\left|\omega_{eseq(j,k)}\right|, \qquad (7)$$

$$J = k_1 J_A + k_2 J_\omega. \qquad (8)$$

In Eq. (6), the symbol $|\cdot|$ denotes the absolute value. $p$ is the number of simulation steps during $[t_{img\,i} - T_{img}/2,\ t_{img\,i} + T_{img}/2]$, $A_{eseq} \in \mathbb{R}^{3\times(p+1)}$ is the Euler angle error sequence during that time, and $\omega_{eseq} \in \mathbb{R}^{3\times(p+1)}$ in Eq. (7) is the corresponding angular velocity error sequence. $A_{eseq(j,k)}$ is the element in row $j$ and column $k$ of matrix $A_{eseq}$. Column $k$ of matrix $A_{eseq}$ is denoted $A_{eseq(k)} \in \mathbb{R}^{3\times1}$, and the Euler angle error vector $A_{eseq(k)} = [\varphi_e, \theta_e, \psi_e]^T$ represents the rotation from the desired satellite body frame to the real satellite body frame, with the rotation order 3-1-2 (i.e., $\psi_e$, then $\varphi_e$, then $\theta_e$). The other Euler angles defined in this paper share the same rotation order as $A_{eseq(k)}$.
In Eq. (8), k 1 and k 2 are the weights of each term.
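The two-step evaluation above can be sketched in a few lines. The per-image error sequences passed in are hypothetical stand-ins for $A_{eseq}$ and the corresponding rate error sequence, and the weights default to the $k_1 = 1$, $k_2 = 5$ combination used later in the paper.

```python
import numpy as np

def control_effect(euler_err_seqs, rate_err_seqs, k1=1.0, k2=5.0):
    """Evaluate a whole staring task: per-image mean absolute Euler
    angle error (pointing accuracy J_A) and mean absolute rate error
    (pointing stability J_w), averaged over all images, then combined
    as J = k1*J_A + k2*J_w, mirroring Eqs. (6)-(8)."""
    JA = np.mean([np.mean(np.abs(A)) for A in euler_err_seqs])
    Jw = np.mean([np.mean(np.abs(w)) for w in rate_err_seqs])
    return k1 * JA + k2 * Jw, JA, Jw

# toy usage: two images, each with a 3 x (p+1) error window, p = 4
A = [np.full((3, 5), 0.01), np.full((3, 5), 0.03)]
w = [np.full((3, 5), 0.001), np.full((3, 5), 0.003)]
J, JA, Jw = control_effect(A, w)
```

Each list entry holds the 3×(p+1) error samples collected in the window around one image.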

Structure of the Real-Time Optimal Controller
As shown in the dashed box named "Attitude Controller" in Figure 2, six modules are outlined; they are established in the following sections of this paper. The modules named "Calculating Staring Imaging Desired Sequence" and "Network Input Preprocessing" are explained in the section "Auxiliary Parts of the Real-Time Optimal Controller." The modules named "Normalization," "Neural Network," and "Anti-normalization" are the central parts and are explained in two sections of this paper: the method of creating training samples is first shown in the section "Sample Creation," and the design of those three modules is then shown in the section "Learning the Optimal Control Strategy." The module named "PID Control Law" is designed in case the real inputs exceed the designed input range of the neural network. This module only works under large disturbances, and it is tested in the "Impulse Response" part of "Results and Discussion." The PID coefficients are shown in Table 1.
The inputs of the controller consist of seven parts: the current Euler angle $A_{bo} \in \mathbb{R}^{3\times1}$ from the star sensor, the current angular velocity $\omega_b \in \mathbb{R}^{3\times1}$ from the gyro, the target longitude and latitude from telecommand, UTC (coordinated universal time) from the on-board computer, and the current $R_s \in \mathbb{R}^{3\times1}$ as well as $V_s \in \mathbb{R}^{3\times1}$ from GPS (Global Positioning System). $A_{bo}$ represents the rotation from the orbital reference frame to the satellite body frame.

The output of the controller is the torque command $M_{cmd} \in \mathbb{R}^{3\times1}$. The reaction wheels receive the torque command and generate the $M_c$ in Eq. (4).
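The fallback behavior described above (neural network while the inputs stay inside their designed range, PID control law otherwise) can be sketched as follows. The range bounds and the two forward functions are hypothetical placeholders; Table 2 defines the real input ranges.

```python
import numpy as np

# Hypothetical input ranges; the paper's Table 2 lists the real ones.
IN_LO = np.full(12, -1.0)
IN_HI = np.full(12, 1.0)

def controller_step(x, nn_forward, pid_forward):
    """Dispatch: use the neural network while the 12-element input
    vector stays inside its designed range; otherwise fall back to
    the PID control law until the state re-enters the range."""
    x = np.asarray(x, dtype=float)
    if np.all((x >= IN_LO) & (x <= IN_HI)):
        return nn_forward(x)   # real-time optimal command
    return pid_forward(x)      # safety fallback under large errors
```

In the paper, the PID branch is exercised only by the large-disturbance test of the "Impulse Response" subsection.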

Auxiliary Parts of the Real-Time Optimal Controller

4.1. Calculating Staring Imaging Desired Sequence. The module named "Calculating Staring Imaging Desired Sequence" is used to predict and obtain the future data for the controller. The outputs consist of the desired Euler angle sequence $A_{diseq} \in \mathbb{R}^{3\times(n+1)}$ and the desired angular velocity sequence $\omega_{dseq} \in \mathbb{R}^{3\times(n+1)}$. $n$ is the number of steps used in the simulation to predict the future. Denote $\tau$ as the time taken by a single simulation step. Column $j$ of matrix $A_{diseq}$ is defined as $A_{diseq(j)} \in \mathbb{R}^{3\times1}$, and it represents the desired Euler angle (representing the rotation from the inertial frame to the desired satellite body frame) at time $\tau(j-1)$. Column $j$ of matrix $\omega_{dseq}$ is defined as $\omega_{dseq(j)} \in \mathbb{R}^{3\times1}$, and it represents the desired angular velocity (in the desired satellite body frame) at time $\tau(j-1)$. $A_{diseq}$ and $\omega_{dseq}$ are calculated according to the states of the satellite, and the algorithm is shown in ref. [1].

4.2. Network Input Preprocessing. The module named "Network Input Preprocessing" is used to calculate the inputs for the neural network. Its inputs consist of $A_{bo}$, $\omega_b$, $A_{diseq}$, and $\omega_{dseq}$, and its outputs consist of the Euler angle error $A_{bd}$, the angular velocity error $\omega_e$, the desired angular velocity $\omega_d$, and its first derivative $\dot{\omega}_d$ (12 scalars in total). The desired angular velocity $\omega_d$ is the first column of matrix $\omega_{dseq}$.
The first derivative of the desired angular velocity, $\dot{\omega}_d$, is calculated from the desired angular velocity sequence (e.g., by the finite difference $(\omega_{dseq(2)} - \omega_{dseq(1)})/\tau$). The Euler angle error $A_{bd}$ is obtained by the following steps. First, the satellite's current Euler angle $A_{bo}$ is converted [18] to the coordinate transformation matrix $L_{bo} \in \mathbb{R}^{3\times3}$ (representing the rotation from $S_o$ to $S_b$). The current desired Euler angle $A_{diseq(1)}$ is converted to the coordinate transformation matrix $L_{di}$. The coordinate transformation matrix from $S_i$ to $S_o$ is defined as $L_{oi}$, and it can be obtained [21] from $R_s$ and $V_s$. The coordinate transformation matrix from $S_d$ to $S_b$ is then obtained by $L_{bd} = L_{bo} L_{oi} L_{di}^T$. Finally, $L_{bd}$ is converted to $A_{bd}$.
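A minimal sketch of the preprocessing step, assuming the transformation matrices have already been built from the Euler angles, and assuming (as the text suggests but does not print) that $\dot{\omega}_d$ is taken by finite differencing the first two columns of $\omega_{dseq}$:

```python
import numpy as np

def preprocess(L_bo, L_oi, L_di, omega_b, omega_dseq, tau):
    """Compress the controller inputs into the quantities fed to the
    network: the attitude error matrix L_bd (converted to the Euler
    angle error A_bd elsewhere), the angular velocity error w_e, the
    desired rate w_d, and its finite-difference derivative dw_d."""
    L_bd = L_bo @ L_oi @ L_di.T               # rotation from S_d to S_b
    w_d = omega_dseq[:, 0]                     # first column of sequence
    dw_d = (omega_dseq[:, 1] - omega_dseq[:, 0]) / tau
    w_e = omega_b - w_d
    return L_bd, w_e, w_d, dw_d
```

With all frames aligned (identity matrices), the attitude error reduces to the identity rotation and the rate error to $\omega_b$ itself.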
The current angular velocity error vector $\omega_e = \omega_b - \omega_d$ represents the difference between the angular velocity $\omega_b$ and the desired angular velocity $\omega_d$.

Sample Creation

The loop that creates one training sample by optimization is illustrated in Figure 3.
To start the algorithm, the interior-point optimizer first provides a three-axis zero torque command sequence $M_{cmdseq} \in \mathbb{R}^{3\times q}$, which is the matrix to be optimized in the loop. $q$ is the number of torque commands used in the simulation of the "Attitude Dynamics" module.
The inputs of the "Attitude Dynamics" module are $M_{cmdseq}$ as well as the initial values, and its outputs are the Euler angle sequence $A_{biseq} \in \mathbb{R}^{3\times(n+1)}$ and the angular velocity sequence $\omega_{bseq} \in \mathbb{R}^{3\times(n+1)}$. $n$ is the number of steps in the simulation.

5.2. Definitions of Sample Input and Output. Each sample is divided into a sample input vector $S_{in}$ and a sample output vector $S_{out}$, defined as

$$S_{in} = \left[A_{bd}^T\ \ \omega_e^T\ \ \omega_d^T\ \ \dot{\omega}_d^T\right]^T \in \mathbb{R}^{12\times1}, \qquad S_{out} = M_{cmdseq(1)} \in \mathbb{R}^{3\times1}.$$

The number of samples should cover the input range of the neural network. The input vector $S_{in}$ of each sample is created randomly, with the 12 elements selected within the designated input ranges shown in Table 2. The ranges are selected according to the requirements of staring imaging tasks. If an input is out of the designated range, $M_{cmd}$ is generated by the PID controller instead of the neural network; this function is tested in the simulation part of this paper. The variables in Table 2 are the components of $A_{bd}$, $\omega_d$, and $\dot{\omega}_d$, together with $\omega_e = [\omega_{ex}, \omega_{ey}, \omega_{ez}]^T$.

5.3. Calculating the Desired Sequence. From the desired angular velocity $\omega_d$ and its first derivative $\dot{\omega}_d$ at the initial time, the desired angular velocity sequence $\omega_{dseq}$ is generated by

$$\omega_{dseq(j)} = \omega_d + \dot{\omega}_d\,\tau(j-1), \quad j = 1, \ldots, n+1,$$

where $\tau$ is the simulation time step and $\omega_{dseq(j)} \in \mathbb{R}^{3\times1}$ is column $j$ of matrix $\omega_{dseq}$. The desired Euler angle sequence $A_{diseq} \in \mathbb{R}^{3\times(n+1)}$ is calculated via quaternions. The desired angular velocity sequence $\omega_{dseq}$ is integrated in time to obtain the quaternion sequence $q_{diseq} \in \mathbb{R}^{4\times(n+1)}$ (denoting the rotation from the inertial frame $S_i$ to the desired satellite body frame $S_d$). Column $j$ of matrix $q_{diseq}$ is $q_{diseq(j)} \in \mathbb{R}^{4\times1}$. The inertial frame is set to coincide with the desired attitude frame at the beginning of the simulation in sample creation; thus, the quaternion at the initial time is the identity quaternion $q_{diseq(1)} = [0, 0, 0, 1]^T$. The quaternion sequence $q_{diseq}$ is propagated with the kinematic equation (3), where the function $\Omega(\omega)$ is defined as

$$\Omega(\omega) = \begin{bmatrix} 0 & \omega_z & -\omega_y & \omega_x \\ -\omega_z & 0 & \omega_x & \omega_y \\ \omega_y & -\omega_x & 0 & \omega_z \\ -\omega_x & -\omega_y & -\omega_z & 0 \end{bmatrix}. \qquad (16)$$

The quaternion sequence $q_{diseq}$ is then converted to the desired Euler angle sequence $A_{diseq}$.
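A sketch of the quaternion propagation for the desired sequence, using the scalar-last convention of Section 2 and a simple Euler step with renormalization (the paper's integrator is not specified here, so this is illustrative only):

```python
import numpy as np

def Omega(w):
    """4x4 kinematics matrix for q = [q1 q2 q3 q4]^T (scalar last)."""
    wx, wy, wz = w
    return np.array([[0.0,  wz, -wy,  wx],
                     [-wz, 0.0,  wx,  wy],
                     [ wy, -wx, 0.0,  wz],
                     [-wx, -wy, -wz, 0.0]])

def propagate_desired(q0, w_dseq, tau):
    """Integrate qdot = 0.5 * Omega(w) * q over the desired rate
    sequence, one Euler step per column, renormalizing each step."""
    q_seq = [np.asarray(q0, dtype=float)]
    for j in range(w_dseq.shape[1] - 1):
        q = q_seq[-1] + 0.5 * tau * Omega(w_dseq[:, j]) @ q_seq[-1]
        q_seq.append(q / np.linalg.norm(q))
    return np.column_stack(q_seq)
```

With zero desired rates, the sequence stays at the identity quaternion, as expected for a frame that starts aligned with the inertial frame.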

5.4. Calculating the Initial Values of Simulation. The desired satellite body frame coincides with the inertial frame at the initial time. Thus, the initial Euler angle of the satellite is $A_{bi} = A_{bd}$, where $A_{bi}$ denotes the rotation from the inertial frame $S_i$ to the satellite body frame $S_b$. The initial satellite angular velocity in the satellite body frame is $\omega_b = \omega_d + \omega_e$.

5.5. Attitude Dynamics. It has been defined that $q$ is the number of commands in $M_{cmdseq} \in \mathbb{R}^{3\times q}$ and that $n$ is the number of steps in the simulation. Assuming that $T$ is the time interval between two commands, the relationship between $q$, $T$, $n$, and the simulation time step $\tau$ is

$$T = k\tau, \qquad n = kq,$$

where $k$ is the ratio of $T$ to $\tau$ and is a positive integer. The standard fourth-order Runge-Kutta method is used to calculate the kinematics and dynamics of the satellite. The Euler angle sequence $A_{biseq} \in \mathbb{R}^{3\times(n+1)}$ and the angular velocity sequence $\omega_{bseq} \in \mathbb{R}^{3\times(n+1)}$ are recorded over the whole process and obtained after the simulation.

5.6. Calculating the Objective Function. The objective function $J$ (defined in Eq. (8)) is used to evaluate the performance of $M_{cmdseq}$. The smaller the objective function, the better the control performance.
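The timing relationship above amounts to a zero-order hold of each torque command over $k$ simulation steps, which can be sketched as:

```python
import numpy as np

def expand_commands(M_cmdseq, k):
    """Zero-order hold: each of the q torque commands is held for
    k simulation steps (T = k*tau), giving n = k*q steps in total."""
    return np.repeat(M_cmdseq, k, axis=1)  # shape (3, n)
```

For example, q = 2 commands held for k = 3 steps each expand into n = 6 per-step torques.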

5.7. Interior-Point Optimization.
At the beginning of the sample creation for a sample input $S_{in}$, the interior-point optimizer creates a three-axis zero torque command sequence $M_{cmdseq} \in \mathbb{R}^{3\times q}$. The objective function $J$ is obtained from the simulation in which $M_{cmdseq}$ is tested. Then, the optimizer makes a judgment according to $J$: if the exit condition is satisfied, the loop exits; if not, a new $M_{cmdseq}$ is generated and the optimization continues.
The terminal value of $M_{cmdseq}$ is obtained when the loop exits. The sample output is then $S_{out} = M_{cmdseq(1)}$, where $M_{cmdseq(1)}$ is the first column of matrix $M_{cmdseq}$.
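The sample-creation loop can be sketched as follows. The paper uses an interior-point optimizer driving the full attitude-dynamics simulation; here a toy quadratic objective and a plain finite-difference gradient descent stand in for both, only to illustrate the loop structure that ends with taking the first optimized command as $S_{out}$.

```python
import numpy as np

q_cmd = 5  # number of torque commands in M_cmdseq (3 x q_cmd)

def objective(M):
    """Hypothetical stand-in for running the 'Attitude Dynamics'
    simulation with M_cmdseq and evaluating Eq. (8): a smooth bowl
    whose minimum puts 0.01 N*m on every command."""
    return float(np.sum((M - 0.01) ** 2))

def optimize_sample(n_iter=200, lr=0.1, h=1e-6):
    """Iterative optimization loop in the spirit of Figure 3, using
    finite-difference gradient descent instead of an interior-point
    method purely for illustration."""
    M = np.zeros((3, q_cmd))            # zero initial command sequence
    for _ in range(n_iter):
        g = np.zeros_like(M)
        for idx in np.ndindex(M.shape):  # finite-difference gradient
            E = np.zeros_like(M)
            E[idx] = h
            g[idx] = (objective(M + E) - objective(M - E)) / (2 * h)
        M -= lr * g
    return M[:, 0]                       # S_out: first optimized command

S_out = optimize_sample()
```

In the paper the inner objective evaluation is a full dynamics simulation, which is exactly why each sample takes seconds to create and the loop cannot run online.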

Learning the Optimal Control Strategy
In this section, the neural network, the normalization module, and the antinormalization module in the attitude controller are established, which are the three central parts shown in Figure 2. The three modules can be modeled as function N. The algorithms of normalization and antinormalization can be found in ref. [22].
6.1. Division and Normalization of the Sample Set. The whole sample set is divided into two parts: the training sample set and the test sample set. 75% of the samples are randomly selected as training samples, and the rest are test samples. The training sample set consists of two matrices: the sample input matrix $I_{train} \in \mathbb{R}^{12\times N_{train}}$ and the corresponding sample output matrix $O_{train} \in \mathbb{R}^{3\times N_{train}}$, where $N_{train}$ is the number of training samples. Each row of the two matrices is normalized to $[-1, 1]$ before the training sample set is used to train the neural networks.
The test sample set also consists of two matrices: the sample input matrix $I_{test} \in \mathbb{R}^{12\times N_{test}}$ and the corresponding sample output matrix $O_{test} \in \mathbb{R}^{3\times N_{test}}$, where $N_{test}$ is the number of test samples. Each row of the two matrices is normalized to $[-1, 1]$ before the test sample set is used to test the neural networks.
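The row-wise mapping to $[-1, 1]$ and its inverse (the "Normalization" and "Anti-normalization" modules) can be sketched as follows; the min/max statistics are assumed to be fitted per row on the sample set.

```python
import numpy as np

def fit_rowwise(M):
    """Per-row min/max over the sample set, shared by both the
    normalization and anti-normalization modules."""
    return M.min(axis=1, keepdims=True), M.max(axis=1, keepdims=True)

def normalize(M, lo, hi):
    """Map each row linearly onto [-1, 1]."""
    return 2.0 * (M - lo) / (hi - lo) - 1.0

def denormalize(Mn, lo, hi):
    """Inverse map from [-1, 1] back to physical units."""
    return (Mn + 1.0) / 2.0 * (hi - lo) + lo
```

The same fitted `lo`/`hi` must be reused online, so that the controller's anti-normalization exactly inverts the training-time scaling.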
6.2. Training the Neural Networks. The structure of the neural network is input layer-feedforward layer-feedforward layer. The stochastic gradient descent (SGD) algorithm is applied to the training process. In order to reduce overfitting, dropout (with the dropout fraction set to 0.1) and weight decay (with the regularization parameter set to 0.01) are also applied. Different network structure parameters and training parameters are tested to achieve the best performance. The parameter values are selected on linear grids and are shown in Table 3. All parameter combinations are used in training, and 3876 neural networks (3876 = 4 × 51 × 19) are obtained.
6.3. Evaluating the Neural Networks. In the process of evaluating a neural network, the output matrix $\hat{O}_{test} \in \mathbb{R}^{3\times N_{test}}$ is obtained by

$$\hat{O}_{test(j)} = N\left(I_{test(j)}\right), \quad j = 1, \ldots, N_{test},$$

where $I_{test(j)} \in \mathbb{R}^{12\times1}$ is column $j$ of matrix $I_{test} \in \mathbb{R}^{12\times N_{test}}$ and $\hat{O}_{test(j)} \in \mathbb{R}^{3\times1}$ is column $j$ of matrix $\hat{O}_{test}$. Define $O_{test(i,j)}$ as the element in row $i$ and column $j$ of matrix $O_{test}$, and $\hat{O}_{test(i,j)}$ as the element in row $i$ and column $j$ of matrix $\hat{O}_{test}$. The MSE (mean squared error) is defined as

$$\mathrm{MSE} = \frac{1}{3N_{test}}\sum_{i=1}^{3}\sum_{j=1}^{N_{test}}\left(O_{test(i,j)} - \hat{O}_{test(i,j)}\right)^2.$$

Finally, the neural network with the smallest MSE (0.00011686) is selected. Its training parameters are shown in Table 4, and its training process is shown in Figure 4.
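The selection criterion can be sketched as a plain MSE over the normalized test outputs:

```python
import numpy as np

def mse(O_test, O_hat):
    """Mean squared error over all 3 x N_test normalized outputs,
    matching the network-selection criterion above."""
    O_test, O_hat = np.asarray(O_test), np.asarray(O_hat)
    return float(np.mean((O_test - O_hat) ** 2))
```

The candidate with the smallest value over the held-out test set is the one deployed in the controller.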

Results and Discussion
7.1. Simulation Condition Design. The satellite orbit parameters at the initial time are shown in Table 5. The initial time is 2017-05-31-04-40-00 (in yyyy-MM-dd-HH-mm-ss format).
As shown in Figure 5, 100 ground targets are randomly selected in the circle with center at 120°E, 30°N and a radius of 100 km. After all the ground targets are tested, the average performance $\bar{J}$ is obtained by

$$\bar{J} = \frac{1}{N_{target}}\sum_{i=1}^{N_{target}} J_i,$$

where $N_{target} = 100$ and $J_i$ is the performance (defined in Eq. (8)) of a single test. The average pointing accuracy $\bar{J}_A$ and the average pointing stability $\bar{J}_\omega$ are obtained in the same way from the per-target values of $J_A$ and $J_\omega$. For each ground target, the initial Euler angle is set equal to the desired initial Euler angle, and the initial angular velocity is set equal to the desired initial angular velocity. Other simulation parameters are defined in Table 6.
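Drawing the 100 random targets can be sketched as follows, under a small-area flat-Earth approximation; the paper does not state how the points in Figure 5 were sampled, so uniform sampling over the disc is an assumption.

```python
import numpy as np

R_EARTH_KM = 6371.0

def random_targets(n, lon0=120.0, lat0=30.0, radius_km=100.0, seed=0):
    """Draw n ground targets uniformly in a small circle around the
    central target; sqrt(u) makes the radial density uniform in area."""
    rng = np.random.default_rng(seed)
    r = radius_km * np.sqrt(rng.random(n))   # uniform over the disc
    a = 2.0 * np.pi * rng.random(n)
    dlat = np.degrees(r * np.sin(a) / R_EARTH_KM)
    dlon = np.degrees(r * np.cos(a) /
                      (R_EARTH_KM * np.cos(np.radians(lat0))))
    return lon0 + dlon, lat0 + dlat

lons, lats = random_targets(100)
```

The per-target performances $J_i$ would then be averaged with `np.mean` to form $\bar{J}$.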

7.2. Searching the Optimal Coefficients for the PID Controller. Different combinations of coefficients in the PID controller are tested to obtain its best performance as the baseline for comparison. The coefficients of each axis are optimized separately, using the same 12240 combinations (12240 = 60 × 4 × 51) defined in Table 7. The Euler angles and angular velocities used in the PID control law are expressed in degrees. All the ground targets defined in Figure 5 are tested for each coefficient combination, and the average performance $\bar{J}$ is calculated. Then, the optimal PID coefficient combination is obtained according to Eq. (25) ($k_1 = 1$ and $k_2 = 5$, in order to balance the pointing accuracy and the pointing stability).
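The exhaustive coefficient search can be sketched as follows. The grids here are hypothetical placeholders for Table 7, and `evaluate` stands in for running all ground targets with one coefficient triple and returning the average performance.

```python
import numpy as np
from itertools import product

# Hypothetical coefficient grids; Table 7 in the paper defines the
# real ones (60 x 4 x 51 = 12240 combinations per axis).
KP = np.linspace(0.1, 6.0, 4)
KI = np.linspace(0.0, 0.3, 3)
KD = np.linspace(0.1, 5.1, 5)

def grid_search(evaluate):
    """Exhaustive search for the coefficient triple minimizing the
    average performance returned by evaluate(kp, ki, kd)."""
    best, best_J = None, np.inf
    for kp, ki, kd in product(KP, KI, KD):
        J = evaluate(kp, ki, kd)
        if J < best_J:
            best, best_J = (kp, ki, kd), J
    return best, best_J
```

Since each axis is tuned separately, this search is simply run three times with the corresponding per-axis objective.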

7.3. Performance Comparison of Steady-State Control. The comparison of steady-state control performance between the PID controller and the neural network controller consists of two parts. The first part tests the condition in which the satellite stares at the central ground target (at 120°E, 30°N); curves of the Euler angle error and angular velocity error are provided. The second part tests all the ground targets shown in Figure 5 and calculates the average performance $\bar{J}$. The results of the first part are summarized in Table 8 and the error curves.
Error curves of the PID controller are shown in Figures 6 and 7.
Error curves under the control of the neural network are shown in Figures 8 and 9.

7.5. Testing All the Ground Targets. All the ground targets defined in Figure 5 are tested for the neural network controller and for the PID controller with the best coefficient combination. The neural network controller performs better according to the results shown in Table 9. Figure 10 visualizes the performances while excluding the influence of $k_1$ and $k_2$: the x-axis and y-axis denote the pointing accuracy $J_A$ and the pointing stability $J_\omega$, respectively. The blue points represent the performances of the PID controller on all the ground targets, and the red points represent those of the neural network controller.
7.6. Performance Comparison of the Dynamic Response

7.6.1. Step Response. In order to test the step response, two simulations are executed for each controller and each ground target. One is the normal simulation, and the other adds an extra angle measurement error (0.005° on all three axes) from 30 s onward. From the difference between the two Euler angle sequences, the overshoot, settling time (with a ±5% error band), and steady-state error are obtained.
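The step-response indices can be extracted from the simulated Euler angle difference as follows; treating the final sample as the steady-state value and assuming a unit step reference are simplifying assumptions of this sketch.

```python
import numpy as np

def step_metrics(t, y, band=0.05):
    """Overshoot, settling time (+/- 5% band), and steady-state error
    of a step response y(t). The last sample is taken as the
    steady-state value, and the step reference is assumed to be 1."""
    y, t = np.asarray(y, float), np.asarray(t, float)
    y_ss = y[-1]
    overshoot = max(0.0, (y.max() - y_ss) / abs(y_ss)) if y_ss else y.max()
    outside = np.abs(y - y_ss) > band * abs(y_ss)
    settle = t[np.nonzero(outside)[0][-1] + 1] if outside.any() else t[0]
    return overshoot, settle, abs(1.0 - y_ss)
```

For a first-order response $1 - e^{-t}$, the function reports zero overshoot and a settling time of about three time constants, as expected.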
The average values of overshoot, settling time, and steady-state error over all the ground targets are shown in Table 10 (PID controller) and Table 11 (neural network controller). The Euler angle differences between the two simulations (with the satellite staring at 120°E, 30°N) are provided in Figure 11 (PID controller) and Figure 12 (neural network controller). According to the results, the neural network controller shows an advantage over the PID controller on all three indices (in the column "Average of Three Axes").

7.7. Frequency Response. In order to test the frequency response, an external periodic torque is applied in the simulation in which the satellite stares at the central ground target. The amplitude of the external periodic torque is 0.03 N·m, and the response of each of the three axes is tested. Magnitude-frequency characteristic curves of the two controllers are shown in Figure 13 (PID controller) and Figure 14 (neural network controller). The amplitude of the neural network controller is higher than that of the PID controller when the frequency is lower than 2 Hz, and the amplitudes are similar at higher frequencies.

7.8. Impulse Response. A large disturbance is applied to the system under the control of the neural network in order to test the stability of the system (with the satellite staring at 120°E, 30°N). The disturbance starts at 30 s and lasts for 0.5 s, and its amplitudes on the three axes are [1.0, 1.0, 1.0] N·m. Figure 15 shows that although the Euler angle error exceeds the input range of the neural network (defined in Table 2), the PID control law in the proposed attitude controller (Figure 2) manages to converge and then gives control back to the neural network.
7.9. Testing the Real-Time Control Ability. It takes an average of 6.1 seconds to create a sample using an Intel i7-8700 CPU. As a result, applying the optimization loop (defined in Figure 3) in the satellite attitude controller does not meet the real-time control requirement (e.g., the control period is usually set to 0.1 s). By contrast, the calculation time of the whole neural network controller on an ARM processor (STM32F417) is 3.3 ms.

Conclusions
In this paper, optimal control and machine learning methods are applied jointly to the attitude control of staring imaging. Compared with the PID controller with the best coefficients, the neural network controller shows advantages in steady-state performance (including pointing accuracy and pointing stability), which is the most important aspect of staring imaging tasks. As for the dynamic response, the neural network controller performs better in the step response and worse in the frequency response than the PID controller. The proposed controller also shows the ability to converge when a large disturbance is applied. The calculation time of the proposed attitude controller on an ARM processor (STM32F417) is 3.3 ms, which meets the requirement of real-time control.