SL-BiLSTM: A Signal-Based Bidirectional LSTM Network for Over-the-Horizon Target Localization

Deep learning technology provides novel solutions for localization in complex scenarios. Conventional methods generally suffer from performance loss in the long-distance over-the-horizon (OTH) scenario due to uncertain ionospheric conditions. To overcome the adverse effects of the unknown and complex ionosphere on positioning, we propose a deep learning positioning method based on multistation received signals and bidirectional long short-term memory (BiLSTM) network framework (SL-BiLSTM), which refines position information from signal data. Specifically, we first obtain the form of the network input by constructing the received signal model. Second, the proposed method is developed to predict target positions using an SL-BiLSTM network, consisting of three BiLSTM layers, a maxout layer, a fully connected layer, and a regression layer. Then, we discuss two regularization techniques of dropout and randomization which are mainly adopted to prevent network overfitting. Simulations of OTH localization are conducted to examine the performance. The parameters of the network have been trained properly according to the scenario. Finally, the experimental results show that the proposed method can significantly improve the accuracy of OTH positioning at low SNR. When the number of training locations increases to 200, the positioning result of SL-BiLSTM is closest to CRLB at high SNR.


Introduction
High-precision localization of over-the-horizon (OTH) targets is a critical issue in the fields of space target surveillance and navigation. Although wireless localization theories and methods have made great progress in recent years, many challenging problems remain in OTH scenarios [1]. Existing passive localization algorithms fall mainly into two classes [2][3][4][5][6]: two-step and direct positioning algorithms. Methods based on azimuth of arrival (AOA), time difference of arrival (TDOA), frequency difference of arrival (FDOA), and multiparameter joint estimation belong to the two-step class. Because two-step positioning methods rely on second-level information fusion, they inevitably lose information, and their location precision is strictly limited by the accuracy and matching degree of the measured parameters. The other class is direct position determination (DPD) [7][8][9][10][11], which overcomes the shortcomings of two-step methods and significantly improves positioning performance under low signal-to-noise ratio (SNR) conditions. By establishing the received signal model, DPD obtains information directly from the signal data according to the maximum likelihood criterion. DPD can also integrate channel information (such as ionospheric structure) into the position estimation model [12]. However, the actual OTH scene is extremely complicated, and shortwave communication often relies on ionospheric reflection, resulting in a serious decrease in location precision [7,12]. Uncertain ionospheric conditions and low SNR make it difficult for existing positioning algorithms to model complex channel scenes. Thus, OTH positioning still faces huge challenges even with current advanced DPD algorithms.

The Received Signal Model of OTH Target
The multiarray passive localization of the OTH target is presented in Figure 1. The antenna arrays of multiple receiving stations receive signals and transmit the signal data to the central station.
The central station combines the signals received by each station with the prior observation information of the ionosphere to directly estimate the target position. OTH positioning is characterized by non-line-of-sight propagation and long target distances, and it is mainly realized by shortwave communication and ionospheric reflection. To simplify the model, the positioning model in this paper only considers the influence of ionospheric virtual heights on the signal propagation path, assuming that the geometric structure of the signal reflected by the ionosphere is symmetrical.
Consider a transmitter and L base stations intercepting the transmitted signal. Each base station is equipped with an antenna array composed of M elements. In this paper, we mainly consider localization in the X-Y plane.
Thus, the two-dimensional coordinates of the target position and the $l$th base station position are defined as $p = [x_0, y_0]^T$ and $q_l = [x_l, y_l]^T$. The signal observed by the $l$th base station array is given by (1), where $r_l(t)$ is a time-dependent $M \times 1$ vector, $a_l(p)$ is the $l$th array response to a signal transmitted from position $p$, $O_l(p)$ is a complex matrix representing the channel effect, and $s(t - t_0)$ is the signal waveform, transmitted at time $t_0$. The vector $n_l(t)$ represents noise and interference, including multipath observed by the array, and $T$ denotes the length of the observation time. The sampled version of the signal in (1) is given by (2). Then, the discrete Fourier transform (DFT) of the signal in (2) is taken, where $j = 1, 2, \ldots, J$ indexes the DFT coefficients of the corresponding time samples.
We assume that the propagation path of electromagnetic waves is mainly affected by ionospheric virtual heights at reflection points [12]. Therefore, the channel effect $O_l(p)$, which is mainly composed of signal transmission delay and attenuation, is determined by the following quantities: $b_l$ is an unknown complex scalar representing the attenuation related to the $l$th base station, $\tau_l(p, h)$ is the delay at the $l$th station of the signal transmitted from position $p$, and $h = [h_1, h_2, \ldots, h_L]^T$ collects the unknown ionospheric virtual heights at the reflection points. It is well known that the distribution of the ionosphere is related to the spatial position. In this paper, we define $h_l = f(p_{lr}) = f(x_{lr}, y_{lr})$, where $p_{lr} = [x_{lr}, y_{lr}]^T$ is the coordinate of the reflection point and $f(\cdot)$ is the nonlinear mapping from spatial position to ionospheric virtual height. We set $T \gg \max \tau_l(p, h)$ to ensure a reasonable range of time delay estimation.
The stacked vectors and matrices are then defined using the following notation: $\otimes$ represents the Kronecker product, $I_{L \times L}$ represents the identity matrix of size $L \times L$, and $1_{M \times 1}$ is the $M \times 1$ column vector whose elements are all one.
We can rewrite (7) as $r_l(j) = A(j) b s(j) + n(j)$, $j = 1, 2, \ldots, J$.
Then, the received data of each station can be combined into a matrix. The next step is to determine the probable location information of the $Q$ radio sources from $r(j)$. If we assume that the virtual height $h$ of the ionosphere is known, the Cramér-Rao lower bound (CRLB) of the conventional DPD algorithm is provided in [7,12]. However, ionospheric parameters, including the virtual height distribution, are uncertain and changeable, so highly nonlinear algorithms are needed to handle localization in complex scenes.
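The delay $\tau_l(p, h)$ in the model above depends on the reflection geometry. As a rough illustration, the sketch below assumes the simplified geometry that this section describes: a flat Earth, a symmetric path, and a mirror-like reflection above the midpoint between target and station, so the path consists of two equal legs of a triangle whose height is the virtual height. The function names and the example coordinates are hypothetical and are not taken from the paper.

```python
import math

C_KM_PER_S = 299_792.458  # speed of light in km/s

def reflection_point(p, q):
    """Midpoint between target p and station q (symmetric-reflection assumption)."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)

def virtual_path_delay(p, q, h_km):
    """Propagation delay in seconds for a single ionospheric hop.

    p, q  : (x, y) coordinates in km of the target and the station
    h_km  : ionospheric virtual height in km at the reflection point
    Assumes a flat Earth and a mirror-like reflection at height h_km.
    """
    ground = math.hypot(q[0] - p[0], q[1] - p[1])
    leg = math.hypot(ground / 2.0, h_km)   # target -> reflection point
    return 2.0 * leg / C_KM_PER_S

# Example with a 3-4-5 triangle: 600 km ground range, 400 km virtual height
tau = virtual_path_delay((0.0, 0.0), (600.0, 0.0), 400.0)
path_km = tau * C_KM_PER_S   # 1000 km total path
```

Under these assumptions an error in the assumed virtual height changes the delay directly, which is one way to see why a fixed-height hypothesis can bias the position estimate.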

Proposed Location Method
Signal-Based Localization BiLSTM.
Since LSTM is powerful at handling sequential data and capturing temporal dependencies, it has been successfully applied in many applications [29][30][31]. In this paper, we utilize a BiLSTM network based on signal data for OTH target localization. A standard structure of BiLSTM can be found in Figure 2. The LSTM cell follows the standard formulation, in which $\circ$ denotes the Hadamard product, tanh is the hyperbolic tangent function, and $x_k$ is the input vector. $W_m$, $Q_m$, $b_m$ ($m \in \{i, f, o, g\}$) are learnable parameters. The output of the LSTM at time $k$ is $h_k$.
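For reference, the textbook LSTM cell can be written in this notation as follows. This is the standard formulation rather than a transcription of the paper's own equations; it assumes that $W_m$ multiplies the input $x_k$ and $Q_m$ multiplies the previous output $h_{k-1}$, and it uses $\sigma$ for the logistic sigmoid and $c_k$ for the cell state.

```latex
\begin{aligned}
i_k &= \sigma(W_i x_k + Q_i h_{k-1} + b_i), \\
f_k &= \sigma(W_f x_k + Q_f h_{k-1} + b_f), \\
o_k &= \sigma(W_o x_k + Q_o h_{k-1} + b_o), \\
g_k &= \tanh(W_g x_k + Q_g h_{k-1} + b_g), \\
c_k &= f_k \circ c_{k-1} + i_k \circ g_k, \\
h_k &= o_k \circ \tanh(c_k).
\end{aligned}
```

A bidirectional layer runs one such cell forward over the sequence and a second cell backward, then concatenates the two outputs at each time index.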
Location features of OTH targets are highly complex and difficult to model. The received signal segments from each base station at a specific location can be treated as data with a complex position-feature distribution and multiple noise sources. The proposed SL-BiLSTM architecture is shown in Table 1; it consists of a BiLSTM encoder (three BiLSTM layers and a maxout layer [32]), a fully connected layer, and a regression layer. Received signal segments are fed into the BiLSTM encoder to obtain a high-level encoding of the localization features. The fully connected layer and the linear regression layer are then used to estimate the final target positions. Learning the target position from such complex data requires a large amount of signal data from different positions.
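The layer stack described above can be sketched as a plain NumPy forward pass. This is only an illustration of how the pieces connect, not the authors' implementation: all layer sizes are hypothetical, the gate ordering is arbitrary, averaging the encoder output over time before the maxout layer is an assumption, and so is the tanh activation on the fully connected layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_pass(x, W, Q, b, reverse=False):
    """Run one LSTM direction over x of shape (K, d_in); returns (K, d_h).

    W: (4*d_h, d_in), Q: (4*d_h, d_h), b: (4*d_h,), gates stacked as i, f, o, g.
    """
    d_h = Q.shape[1]
    h, c = np.zeros(d_h), np.zeros(d_h)      # zero initial state
    out = np.zeros((x.shape[0], d_h))
    steps = range(x.shape[0] - 1, -1, -1) if reverse else range(x.shape[0])
    for k in steps:
        z = W @ x[k] + Q @ h + b
        i, f, o = (sigmoid(z[n * d_h:(n + 1) * d_h]) for n in range(3))
        g = np.tanh(z[3 * d_h:])
        c = f * c + i * g
        h = o * np.tanh(c)
        out[k] = h
    return out

def bilstm_layer(x, fwd, bwd):
    """Concatenate forward and backward passes: (K, d_in) -> (K, 2*d_h)."""
    return np.concatenate(
        [lstm_pass(x, *fwd), lstm_pass(x, *bwd, reverse=True)], axis=1)

def maxout(v, W, b):
    """Maxout unit: W has shape (pieces, d_out, d_in); take the max over pieces."""
    return np.max(W @ v + b, axis=0)

def init_lstm(rng, d_in, d_h):
    return (0.1 * rng.standard_normal((4 * d_h, d_in)),
            0.1 * rng.standard_normal((4 * d_h, d_h)),
            np.zeros(4 * d_h))

def sl_bilstm_forward(x, params):
    """x: (K, d_in) real-valued signal features -> predicted (x, y) position."""
    for fwd, bwd in params["bilstm"]:            # three stacked BiLSTM layers
        x = bilstm_layer(x, fwd, bwd)
    feat = maxout(x.mean(axis=0), *params["maxout"])   # assumption: average over time
    hidden = np.tanh(params["fc_W"] @ feat + params["fc_b"])
    return params["reg_W"] @ hidden + params["reg_b"]  # linear regression output

rng = np.random.default_rng(0)
d_in, d_h, d_mo, d_fc = 54, 16, 32, 16   # hypothetical sizes
sizes = [d_in, 2 * d_h, 2 * d_h]         # input width of each BiLSTM layer
params = {
    "bilstm": [(init_lstm(rng, d, d_h), init_lstm(rng, d, d_h)) for d in sizes],
    "maxout": (0.1 * rng.standard_normal((2, d_mo, 2 * d_h)), np.zeros((2, d_mo))),
    "fc_W": 0.1 * rng.standard_normal((d_fc, d_mo)), "fc_b": np.zeros(d_fc),
    "reg_W": 0.1 * rng.standard_normal((2, d_fc)), "reg_b": np.zeros(2),
}
pred = sl_bilstm_forward(rng.standard_normal((64, d_in)), params)   # shape (2,)
```

In practice a deep learning framework would supply these layers together with automatic differentiation; the explicit loop is shown only to make the data flow visible.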
Under the same assumption as [7,12] that the signal waveform is known to the receivers, we treat OTH localization as learning the mapping from received signal segments $X = \{r_i(t)\}_{i=1}^{n}$ to the target locations. In our model, $X$ are the signal segments of length $T$ from the $L$ base stations. As shown in Figure 1, we compute $r(k)$ as in equation (6) from these signal segments and use both their real and imaginary parts as the network input.
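One plausible way to turn the complex-valued samples into a real-valued network input is to place the real and imaginary parts side by side. The array layout (one row per sample index, one column per station-element pair) and the function name below are assumptions made for illustration.

```python
import numpy as np

def to_network_input(r):
    """Stack real and imaginary parts of complex samples.

    r: complex array of shape (K, L*M), one row per sample index k
    returns: real array of shape (K, 2*L*M)
    """
    r = np.asarray(r)
    return np.concatenate([r.real, r.imag], axis=1)

r = np.array([[1 + 2j, 3 - 1j],
              [0 + 1j, -2 + 0j]])
x = to_network_input(r)   # first half of each row: real parts; second half: imaginary parts
```

Keeping both parts preserves the phase information that carries the delay and direction cues.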
Mathematical Problems in Engineering
Through the SL-BiLSTM encoder and regression layer, we obtain the predicted coordinates of the target locations. Then, we calculate the mean square error between the predicted locations and the true locations of the training samples and backpropagate this prediction loss to update the SL-BiLSTM parameters, where $W_b$ and $W_r$ are the weights of the BiLSTM and regression layers. We adopt the Adam optimization method [33] to compute adaptive learning rates for the network parameters $W_b$ and $W_r$ and then optimize the parameters on the objective function. Specifically, assuming that $\theta_t$ is the parameter of the network and $g_t$ is the corresponding gradient, the Adam algorithm computes the update $\theta_{t+1}$ from $\alpha_t$ and $\beta_t$, the first and second moments of the gradient, respectively; $c$ is the learning rate; and $\lambda_1$, $\lambda_2$, and $\varepsilon$ are set to 0.9, 0.999, and $10^{-8}$, respectively. In this paper, we select the step-declining learning rate (StepLR) as our learning rate reduction strategy, with an initial learning rate $c$.
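The training ingredients named above (mean square error, Adam, and a stepped learning rate) can be sketched for a single scalar parameter. The code follows the commonly published form of Adam with bias correction, which is an assumption here because the paper's own update equation is not reproduced; the step size and decay factor in the toy loop are hypothetical values.

```python
import math

def mse_loss(pred, true):
    """Mean squared position error over N samples of (x, y) pairs."""
    n = len(pred)
    return sum((px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(pred, true)) / n

def step_lr(lr0, epoch, step_size, gamma):
    """StepLR: multiply the rate by gamma every step_size epochs."""
    return lr0 * gamma ** (epoch // step_size)

class Adam:
    """Scalar Adam with bias correction, lambda1=0.9, lambda2=0.999, eps=1e-8."""
    def __init__(self, lam1=0.9, lam2=0.999, eps=1e-8):
        self.lam1, self.lam2, self.eps = lam1, lam2, eps
        self.alpha = 0.0   # first-moment estimate
        self.beta = 0.0    # second-moment estimate
        self.t = 0

    def update(self, theta, grad, lr):
        self.t += 1
        self.alpha = self.lam1 * self.alpha + (1 - self.lam1) * grad
        self.beta = self.lam2 * self.beta + (1 - self.lam2) * grad ** 2
        alpha_hat = self.alpha / (1 - self.lam1 ** self.t)
        beta_hat = self.beta / (1 - self.lam2 ** self.t)
        return theta - lr * alpha_hat / (math.sqrt(beta_hat) + self.eps)

# Toy use: minimise (theta - 3)^2 with a stepped learning rate
opt, theta = Adam(), 0.0
for epoch in range(300):
    lr = step_lr(0.1, epoch, step_size=100, gamma=0.5)
    theta = opt.update(theta, 2 * (theta - 3.0), lr)
```

With bias correction, the first update has a magnitude close to the learning rate regardless of the gradient scale, which is part of why Adam is fairly insensitive to how the loss is scaled.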

Regularization Methods.
Deep networks often overfit, especially in regression problems where the number of training samples is limited. To obtain a robust network that avoids overfitting the training data, appropriate regularization methods are essential [34]. In this paper, dropout and randomization are the main techniques adopted to prevent overfitting.

Dropout Layers.
First, we adopt two dropout layers, placed after the last LSTM layer and after the fully connected layer, respectively. In a dropout layer, a subset of the neurons is randomly masked with a certain probability during each round of parameter updates. Assuming that the dropout probability is $p_d$ and $w_i$ is the node weight during round $i$, the output $z_i$ of the $(l+1)$th layer during training is computed from the masked output of the previous layer and the network bias parameter $b_i^{(l+1)}$. To achieve better generalizability for coordinate regression, the dropout rate is increased from 0.1 to 0.6. The maxout layer also acts as a dropout-like method: it removes the part of the network that is not sensitive to the input data, which reduces the risk of overfitting.
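The masking step can be illustrated with a few lines of Python. The sketch uses the "inverted" variant, in which surviving activations are rescaled by $1/(1 - p_d)$ during training so that nothing changes at test time; that rescaling is an assumption based on common practice and may differ from the formulation used in the paper.

```python
import random

def dropout(y, p_d, rng, training=True):
    """Inverted dropout on a list of activations y.

    Each element is zeroed with probability p_d during training and the
    survivors are scaled by 1 / (1 - p_d); at test time y passes through.
    """
    if not training or p_d == 0.0:
        return list(y)
    keep = 1.0 - p_d
    return [v / keep if rng.random() < keep else 0.0 for v in y]

rng = random.Random(0)
y = [1.0, 2.0, 3.0, 4.0]
dropped = dropout(y, 0.5, rng)                        # training-time output
unchanged = dropout(y, 0.5, rng, training=False)      # inference-time output
```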

Sample Randomization.
Second, since the number of training samples in actual scenes is limited, distribution characteristics of the training samples that are unrelated to the localization scene may also cause overfitting. For example, we cannot directly use the grid points in the area as training samples, because the neural network would then overlearn "on-grid" as one of the location features. Therefore, we uniformly randomize the positions of the training samples in the target area. The specific operation is illustrated in Figure 3. We divide the normalized area evenly (gray lines) according to the predetermined sample number and then randomly select the training sample position (blue dots) within each subregion. This regularization method not only ensures the adaptability of the network to the entire positioning area but also avoids the overfitting problem caused by the limited training samples.
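The randomization described above amounts to stratified sampling: one uniformly random point per grid cell. A minimal sketch follows, assuming the normalized area is the unit square and that the sample number factors into a rectangular grid; the function name and the seed are hypothetical.

```python
import random

def stratified_positions(n_x, n_y, rng):
    """One uniformly random point inside each cell of an n_x x n_y grid
    covering the normalized area [0, 1] x [0, 1]."""
    return [((i + rng.random()) / n_x, (j + rng.random()) / n_y)
            for i in range(n_x) for j in range(n_y)]

rng = random.Random(1)
train = stratified_positions(10, 10, rng)   # 100 training locations
```

Every subregion contributes exactly one sample, so the whole area is covered, while the random offset inside each cell keeps the samples off a regular grid.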
We take $I_t$ target emission source locations as training samples (blue dots) to train our network and randomly generate $I_v$ validation samples (red circles) to verify the network convergence performance. Finally, 100 Monte Carlo simulation experiments are performed under several SNR conditions at five test positions (red asterisks).

Experiments
To examine the method's performance and compare it with existing approaches, we perform extensive Monte Carlo simulations. Since the OTH localization problem is related to the ionosphere, traditional localization systems usually utilize the multistation direction finding and intersection method for positioning. Thus, AOA and DPD methods based on the MUSIC algorithm [2,11] and the OTH DPD (ODPD) [7,12] method with a fixed virtual height hypothesis are selected for performance comparison. We use machines equipped with an Intel Core i7 CPU, 32 GB RAM, and NVIDIA GeForce RTX 2080 Ti GPUs to train our model and complete the simulation experiments.

Simulation Scenarios.
Consider three base stations placed at three corners of a 2000 km × 2000 km square, as shown in Figure 4; their coordinates are given in Table 2. Each base station is equipped with a circular array of nine antenna elements. The radius of the array is set to one wavelength.
The OTH target area is a 1000 km × 1000 km square. As shown in Table 2, the 5 test locations differ from the 100 locations in the training dataset (denoted as training locations with labels 1, 2, . . . , 100 in Table 2 and Figure 3), and the coordinates are accurate to two decimal places to ensure high-precision positioning capability. The actual distribution of the ionosphere changes with time and geographic location. In this scenario, we ignore time-varying effects and the influence of the Earth's curvature. Conventional methods often assume that the ionospheric virtual height is a known constant, which increases the positioning error. Without loss of generality, we use the low-dimensional DCT coefficients of a random matrix (uniformly distributed in [160 km, 240 km]) to simulate the spatial distribution of the ionospheric virtual height, as shown in Figure 5. Since the reflection point is difficult to measure, we assume that the reflection is geometrically symmetric to simplify the model. Then, we consider the situation in which the propagation path of the signal through the ionosphere is only affected by the ionospheric virtual heights at the reflection points.
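One way to realize such a smooth virtual-height map is to take the 2-D DCT of a uniform random matrix, keep only a small block of low-frequency coefficients, and transform back. The sketch below does this with an explicit orthonormal DCT-II matrix; the grid size, the number of retained coefficients, and the mapping from coordinates to grid cells are all hypothetical choices, not values from the paper.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ v is the DCT of v and C.T inverts it."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C

def smooth_height_map(n, keep, rng, low=160.0, high=240.0):
    """Low-pass a uniform random n x n matrix by keeping keep x keep DCT coefficients."""
    raw = rng.uniform(low, high, size=(n, n))
    C = dct_matrix(n)
    coeff = C @ raw @ C.T                    # 2-D DCT
    mask = np.zeros((n, n))
    mask[:keep, :keep] = 1.0                 # low-frequency block only
    return raw, C.T @ (coeff * mask) @ C     # inverse 2-D DCT

def virtual_height(height_map, p, q, area_km):
    """Nearest-grid lookup of the height at the midpoint between p and q."""
    n = height_map.shape[0]
    mid = (np.asarray(p, float) + np.asarray(q, float)) / 2.0
    idx = np.clip((mid / area_km * n).astype(int), 0, n - 1)
    return height_map[idx[1], idx[0]]

rng = np.random.default_rng(0)
raw, hmap = smooth_height_map(n=64, keep=4, rng=rng)
h = virtual_height(hmap, p=(500.0, 500.0), q=(0.0, 0.0), area_km=2000.0)
```

Because the zero-frequency coefficient is retained, the smoothed map has the same mean as the random matrix, while the discarded high-frequency terms remove the cell-to-cell jitter.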
Thus, the ionospheric virtual height can be written as $h_l = f_c(p_{lr}) = f_c[(p + q_l)/2]$. The emission signal is an 8PSK signal with a known waveform; the sampling frequency is 1 MHz, with ten samples per symbol. The number of signal samples per position is 100. Each location prediction is based on 64 samples of the signal. The path-loss attenuation magnitude follows a normal distribution (mean = 1, STD = 0.1), with phase uniformly distributed in $[-\pi, \pi]$.
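The signal settings above can be turned into a small generator. The sketch below is an illustration under stated assumptions: rectangular pulses (the pulse shape is not specified in the text), a hypothetical segment of ten symbols, and complex white Gaussian noise scaled to a chosen SNR. With ten samples per symbol at 1 MHz, the implied symbol rate is 100 kBd.

```python
import numpy as np

def psk8_waveform(n_symbols, samples_per_symbol, rng):
    """Baseband 8PSK with rectangular pulses (pulse shape is an assumption)."""
    symbols = rng.integers(0, 8, size=n_symbols)
    points = np.exp(1j * 2 * np.pi * symbols / 8)
    return np.repeat(points, samples_per_symbol)

def path_attenuation(n_stations, rng, mean=1.0, std=0.1):
    """Complex attenuation b_l: normal magnitude, phase uniform in [-pi, pi]."""
    mag = rng.normal(mean, std, size=n_stations)
    phase = rng.uniform(-np.pi, np.pi, size=n_stations)
    return mag * np.exp(1j * phase)

def add_noise(x, snr_db, rng):
    """Add complex white Gaussian noise at the requested SNR (in dB)."""
    power = np.mean(np.abs(x) ** 2)
    sigma2 = power / 10 ** (snr_db / 10)
    noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(x.shape)
                                   + 1j * rng.standard_normal(x.shape))
    return x + noise

rng = np.random.default_rng(0)
s = psk8_waveform(n_symbols=10, samples_per_symbol=10, rng=rng)   # 100 samples
b = path_attenuation(3, rng)                                      # three base stations
noisy = add_noise(b[0] * s, snr_db=10.0, rng=rng)
```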
Then, we calculate the received signal waveform matrix of the corresponding target position according to the model in Section 2 and use it as the network input data. We combine all the position labels and the corresponding received waveform data to form the dataset required for network training. We use the root mean square error (RMSE) of the target position estimates obtained from the Monte Carlo experiments to measure the localization performance.
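A common way to compute this metric, sketched below, is the square root of the mean squared Euclidean distance between the position estimates and the true position over the Monte Carlo runs. The function name and the sample estimates are hypothetical.

```python
import math

def localization_rmse(estimates, true_pos):
    """RMSE over Monte Carlo position estimates [(x, y), ...] against true (x, y)."""
    sq = [(x - true_pos[0]) ** 2 + (y - true_pos[1]) ** 2 for x, y in estimates]
    return math.sqrt(sum(sq) / len(sq))

est = [(3.0, 4.0), (-3.0, -4.0), (0.0, 5.0), (5.0, 0.0)]
rmse = localization_rmse(est, (0.0, 0.0))   # every estimate is 5 km off
```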

Discussion on Training Hyperparameters.
In this subsection, we discuss the training hyperparameters, including the learning rate, batch size, dropout rate, and training loss threshold. Appropriate hyperparameters improve network convergence to a certain extent. Network hyperparameters are usually selected based on data characteristics and empirical experience. In this paper, considering the data dimension and network scale, we first adjust the batch size and the corresponding learning rate (initial rate and decay step) according to the training loss. Then, we adjust the dropout rate from 0 to 0.5 according to the validation loss to keep the network from overfitting.
Throughout the training procedure, different hyperparameters are used to train the network. Both 64 and 128 are suitable batch sizes. Considering GPU parallel efficiency and the learning rate decay strategy, $N_b = 128$ is finally selected. In addition, an initial learning rate that is too large or too small may make it difficult for the network to converge to the best result. After tuning multiple sets of parameters, we select 0.001 as the initial learning rate. Finally, increasing the dropout rate to 0.5 effectively avoids network overfitting, and the validation loss gradually converges. The five sets of parameters and the corresponding network losses are given in Table 3, where parameter set (d) is chosen as the final setting. We use machines equipped with NVIDIA GeForce RTX 2080 Ti GPUs to train our model.

Experimental Results and Discussions.
In this subsection, we evaluate the performance of our method through the experimental results of $N_M = 100$ Monte Carlo simulations at 5 testing target locations. First, Figure 5 shows the predicted locations of the 5 testing targets for 100 experiments and their RMSE at an SNR of 10 dB, and the corresponding CDF curves of the localization error are shown in Figure 6. Then, we select several SNR values between −10 dB and 10 dB. At each SNR value, we conduct 100 experiments on target A to obtain the performance statistics of our proposed method. In traditional positioning systems, the OTH problem is usually addressed by multistation direction finding and intersection. Therefore, we select the following approaches for performance comparison: AOA estimation based on the MUSIC algorithm, MUSIC-DPD, ML-DPD, and ODPD [11,12]. The comparison curves of the localization RMSE of these methods are shown in Figure 7.
Figures 5 and 6 indicate that, although the localization accuracy differs across geometric positions, the results of 100 location experiments on the five test targets are stable. Target B has the highest accuracy, with 50% and 80% of localization errors within 8.57 km and 12.54 km, respectively. The accuracy for target D is relatively low, with 50% and 80% of localization errors within 11.13 km and 16.96 km. The plots in Figure 7 indicate that SL-BiLSTM is superior to AOA, MUSIC-DPD, ML-DPD, and ODPD, especially at low SNR. Owing to the uncertainty of the ionospheric distribution, all methods fail to reach the theoretical level even at high SNR.
Input: The received signals $X_t = \{r_i(t)\}$.
Initialize: Learning rate $c$, decline step $d$, training batch size $N_b$, maximum number of training epochs 500; the initial state $(h_l^0, c_l^0)$ of the LSTM is (0, 0); and the network weight $W_0$ is zeros.
(1) Dataset creation:
To adapt to complex situations that are difficult to model by conventional mathematical methods, deep learning models often need a large number of parameters and gradient calculations, which makes the training process take more time. However, the training process of deep learning models is generally completed offline, and the trained model can be used directly for online positioning tasks. The online positioning time of the compared methods over 100 Monte Carlo experiments is given in Table 4.
The results show that the well-trained SL-BiLSTM model is significantly faster than the other methods when performing online positioning.

Impact of the Number of Training Locations.
The network performance depends on the training dataset to a certain extent. In this paper, the impact of the dataset is mainly reflected in the number and distribution of training locations. Therefore, we further study the influence of the number of training locations on the positioning accuracy in the target area in Figure 5. The localization RMSE versus SNR for 50, 100, and 200 training locations is shown in Figure 8.
The experimental results show that, as the number of training positions increases, the positioning accuracy increases accordingly. The performance advantage of the proposed method is more obvious at low SNR. When training locations are insufficient, for example with only 50 locations, the network does not reach the desired accuracy even at high SNR. In addition, the training locations represent a global reference for the entire area, while the accuracy at different locations differs. Using global reference data to estimate a single target may lose part of the local accuracy. Choosing appropriate reference locations according to actual scenarios is also an issue worth further research. In summary, SL-BiLSTM can learn complex channel features from signal data after proper training and give more accurate location predictions.

Conclusions
In this paper, a deep learning method for OTH positioning called SL-BiLSTM was proposed. The proposed SL-BiLSTM encodes location features in signal data based on the BiLSTM structure and obtains a position estimation model by training on reference locations. The number of reference positions is limited in practical applications, so we utilize regularization methods to address the network overfitting caused by this limitation. Simulation experiments verify that SL-BiLSTM has higher positioning accuracy in OTH scenarios, while conventional localization methods generally suffer performance losses. Our work provides a novel method and experimental basis for long-distance OTH positioning.
There are still many points worthy of further study, such as how to choose as few reference locations as possible while achieving high accuracy. The experimental results also illustrate that deep learning methods have clear advantages, especially in complex scenarios that cannot be modelled. The deep learning method is affected by the scene data, so when applying the proposed method, we need to consider different OTH positioning problems with different distributions of the ionosphere. When the ionosphere distribution is significantly different, the model will need more data for further adjustment. In addition, since the ionosphere distribution changes with time, improving the generalization performance of our SL-BiLSTM network requires data with sufficient observation time and improved models for sufficient pretraining of the network, which is a promising direction for future work.

Data Availability
The method of generating simulation data has been described in the experiments of the paper. All data included in this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.