Resonant Frequency Modeling of Microwave Antennas Using Gaussian Process Based on Semisupervised Learning

In the optimal design of electromagnetic devices, the most time-consuming step is obtaining training samples from full-wave electromagnetic simulation software such as HFSS, CST, and IE3D. Traditional machine learning methods usually use only labeled samples or only unlabeled samples, but in practical problems labeled and unlabeled samples coexist, and labeled samples are relatively expensive to acquire. This paper proposes a semisupervised learning Gaussian process (GP), which uses unlabeled samples to improve the accuracy of the GP model and to reduce the number of labeled training samples required. The proposed GP model consists of two parts: initial training and self-training. In initial training, a small number of labeled samples obtained by full-wave electromagnetic simulation are used to train the initial GP model. In self-training, the trained GP model is copied to a second GP model, and the two GP models are then updated by cross-training with different unlabeled samples. Both models are tested on the same test samples, and the model with the smaller error replaces the other. The self-training process is repeated until a predefined stopping criterion is met. Four benchmark functions and the resonant frequency modeling problems of three different microstrip antennas are used to evaluate the effectiveness of the GP model. The results show that the proposed GP model fits the benchmark functions well. For the microstrip antenna resonant frequency modeling problems, with the same labeled samples, its predictive ability is better than that of the traditional supervised GP model.


Introduction
In recent years, excellent results in the optimization design of electromagnetic devices have been achieved by numerical simulation or by combining full-wave electromagnetic simulation software such as HFSS with a global optimization algorithm such as particle swarm optimization (PSO) [1]. In general, microwave devices can be simulated by HFSS to obtain accurate results. However, when a global optimization algorithm is combined with HFSS, every update must call HFSS for evaluation, which is costly and very time consuming. Therefore, using a surrogate model instead of HFSS to evaluate the fitness of electromagnetic devices can greatly reduce optimization time, and it is a hot topic in electromagnetic optimization design. Researchers have proposed many surrogate methods, such as the artificial neural network (ANN) [2,3], support vector machine (SVM) [4,5], kernel extreme learning machine (KELM) [6,7], and Gaussian process (GP) [8,9].
GP is a machine learning method that has developed rapidly in recent years. It has a rigorous statistical foundation and is suitable for complex problems involving high dimensions, small samples, and nonlinearity [10,11]. GP grew out of continued research on Bayesian neural networks (NNs) and offers flexible nonparametric inference, adaptive acquisition of hyperparameters, and predictive outputs that come with an uncertainty estimate. In electromagnetics, many researchers have applied GP successfully, verifying its feasibility as an alternative to electromagnetic simulation software.
However, to the best of our knowledge, most GP models of electromagnetic behavior are based on supervised learning. The labeled training samples used in GP modeling are generated by HFSS. Acquiring labeled samples with HFSS consumes a lot of time, which is also the main factor limiting the efficiency of antenna optimization. Therefore, this study adopts a semisupervised learning (SSL) [12,13] method on the basis of existing research. Traditional machine learning techniques rely on large numbers of labeled samples for training. In practical electromagnetic engineering, labeled samples are difficult to obtain, while unlabeled samples are cheap and easy to obtain [14]. SSL lies between supervised and unsupervised learning [15]; it combines labeled and unlabeled samples to improve learning efficiency and is suitable for both regression and classification problems. Specifically, SSL methods include self-training [16], co-training [17], graph-based methods [18], EM with generative models [19], and transductive SVM [20]. In this paper, the self-training method is combined with GP modeling for antenna optimization design.
Self-training is a simple and effective SSL method that requires no specific assumptions [21,22] and is commonly used for classification problems. Based on the traditional self-training method, this paper proposes an SSL-based GP model for predicting the resonant frequency of microstrip antennas (MSAs), which is a regression problem.
The SSL-based GP model proposed in this study includes two parts: initial training and self-training. In initial training, a few labeled samples are used to obtain a GP model with low accuracy, and the initial error of this GP model is computed. Before self-training, the GP model from initial training is copied to a second model. Next, different unlabeled samples are input into each GP model, and the corresponding outputs are obtained. The two models are cross-trained with the generated pseudolabeled samples, so the two updated GP models differ from each other. The same test samples are used to verify the two updated GP models, and the model with the smaller error replaces the other; this yields a more accurate training sample set for further training of the GP model. The self-training process is repeated until a predefined error threshold is met. Four benchmark functions and the resonant frequency modeling problems of three MSAs are used to evaluate the effectiveness of the proposed algorithm.
Through the experiments on test functions and on the resonant frequency of three different MSAs, we conclude that the predictive ability of the proposed GP model is better than that of the traditional supervised GP model.

Semisupervised Learning Model
The properties of GP are determined by the mean function and the covariance function [23], which can be expressed as
$$f(x) \sim \mathcal{GP}\left(m(x), k(x, x')\right),$$
where $x, x' \in \mathbb{R}^d$, $m(x)$ is the mean function, and $k(x, x')$ is the covariance function. Furthermore, these can be expressed as
$$m(x) = \mathbb{E}[f(x)], \qquad k(x, x') = \mathbb{E}\left[(f(x) - m(x))(f(x') - m(x'))\right].$$
For the regression model $y = f(x) + \varepsilon$, the observed target value $y$ is corrupted by additive noise $\varepsilon$, a normally distributed random variable with mean 0 and variance $\sigma_n^2$; therefore, the prior distribution of $y$ is given by
$$y \sim \mathcal{N}\left(0, K + \sigma_n^2 I_n\right),$$
where $K = K(X, X)$ is the $n \times n$ symmetric positive definite covariance matrix whose element $k_{ij}$ measures the correlation between $x_i$ and $x_j$. The $n$ training sample outputs $y$ and the $n_*$ testing sample outputs $f_*$ constitute the joint Gaussian prior distribution, that is,
$$\begin{bmatrix} y \\ f_* \end{bmatrix} \sim \mathcal{N}\left(0, \begin{bmatrix} K(X, X) + \sigma_n^2 I_n & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix}\right).$$
GP can use different covariance functions [24]; the usual choice is the squared exponential covariance function, whose standard form is
$$k(x, x') = \sigma_f^2 \exp\left(-\frac{\lVert x - x' \rVert^2}{2\ell^2}\right),$$
where $\sigma_f^2$ is the signal variance and $\ell$ is the length scale.
The properties of the mean function and the covariance function of GP are determined by a set of hyperparameters. The log-likelihood function of the conditional probability of the training samples is established, its partial derivatives with respect to the hyperparameters are obtained, and the optimal hyperparameters are found by the conjugate gradient optimization method. The negative log-likelihood function has the form
$$L(\theta) = \frac{1}{2} y^{T} C^{-1} y + \frac{1}{2} \log \lvert C \rvert + \frac{n}{2} \log 2\pi, \qquad C = K + \sigma_n^2 I_n,$$
where $\theta$ denotes the hyperparameters. After the optimal hyperparameters are obtained, the trained GP is used to perform the relevant prediction.
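As a minimal sketch of the hyperparameter selection described above, the following Python code evaluates the negative log-likelihood of a zero-mean GP with a squared exponential kernel and picks a length scale from a small candidate list. The grid search is a simple stand-in for the conjugate gradient optimization mentioned in the text, and the names `se_kernel`, `neg_log_likelihood`, `sigma_f`, `length`, and `noise`, as well as the toy data, are assumptions made for illustration.

```python
import math

def se_kernel(a, b, sigma_f, length):
    """Squared exponential covariance between two input vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return sigma_f ** 2 * math.exp(-0.5 * sq_dist / length ** 2)

def cholesky(A):
    """Return lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_chol(L, b):
    """Solve (L L^T) x = b by forward and backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def neg_log_likelihood(X, y, sigma_f, length, noise):
    """0.5 y^T C^-1 y + 0.5 log|C| + 0.5 n log(2 pi), with C = K + noise * I."""
    n = len(X)
    C = [[se_kernel(X[i], X[j], sigma_f, length) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(C)
    alpha = solve_chol(L, y)  # C^-1 y
    data_fit = 0.5 * sum(yi * ai for yi, ai in zip(y, alpha))
    half_log_det = sum(math.log(L[i][i]) for i in range(n))  # equals 0.5 * log|C|
    return data_fit + half_log_det + 0.5 * n * math.log(2.0 * math.pi)

# Toy data (hypothetical): samples of sin(x) on [0, 5].
X = [[0.5 * i] for i in range(11)]
y = [math.sin(x[0]) for x in X]

# Grid search over the length scale, used here in place of conjugate gradients.
candidates = [0.1, 0.3, 1.0, 3.0]
scores = {ell: neg_log_likelihood(X, y, sigma_f=1.0, length=ell, noise=1e-4)
          for ell in candidates}
best_length = min(scores, key=scores.get)
print("selected length scale:", best_length)
```

The Cholesky factor is reused for both the data-fit term and the log-determinant, which avoids forming an explicit matrix inverse. A gradient-based optimizer would minimize the same function over all hyperparameters at once.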

Predicting.
Given the new input $x_*$ (stacked as $X_*$ when several test points are predicted), the training inputs $X$, and the observed target values $y$, the posterior predictive distribution of the output $f_*$ is written as
$$f_* \mid X_*, X, y \sim \mathcal{N}(m, \Sigma),$$
where $m$ and $\Sigma$ are the mean and the covariance of the prediction, given by
$$m = K(X_*, X)\left[K(X, X) + \sigma_n^2 I_n\right]^{-1} y,$$
$$\Sigma = K(X_*, X_*) - K(X_*, X)\left[K(X, X) + \sigma_n^2 I_n\right]^{-1} K(X, X_*).$$
The predicted mean and covariance describe the Gaussian distribution that the predicted output follows. The predicted mean can be regarded as the output value of a nonlinear fitting tool, and the predicted variance as an uncertainty estimate for that mean. The magnitude of the prediction variance reflects the accuracy of the model at that point: the smaller the variance, the higher the accuracy of the model.
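The predictive mean and variance described above can be computed directly from the training data. The sketch below assumes a zero-mean GP with a squared exponential kernel, fixed hyperparameters, and a noise variance added to the diagonal of the training covariance; the function names, the default hyperparameter values, and the toy data are assumptions for illustration.

```python
import math

def se_kernel(a, b, sigma_f, length):
    """Squared exponential covariance between two input vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return sigma_f ** 2 * math.exp(-0.5 * sq_dist / length ** 2)

def cholesky(A):
    """Return lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_chol(L, b):
    """Solve (L L^T) x = b by forward and backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(X, y, x_star, sigma_f=1.0, length=1.0, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at one new input x_star."""
    n = len(X)
    C = [[se_kernel(X[i], X[j], sigma_f, length) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(C)
    k_star = [se_kernel(x_star, xi, sigma_f, length) for xi in X]
    alpha = solve_chol(L, y)        # C^-1 y
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_chol(L, k_star)       # C^-1 k_star
    prior_var = se_kernel(x_star, x_star, sigma_f, length)
    var = prior_var - sum(k * vi for k, vi in zip(k_star, v))
    return mean, max(var, 0.0)      # clip tiny negative round-off

# Toy data (hypothetical): four samples of sin(x).
X = [[0.0], [1.0], [2.0], [3.0]]
y = [math.sin(x[0]) for x in X]

mean_near, var_near = gp_predict(X, y, [1.0])    # at a training input
mean_far, var_far = gp_predict(X, y, [10.0])     # far from all training inputs
print(mean_near, var_near)
print(mean_far, var_far)
```

At a training input the variance is close to the noise level, while far from the data it returns to the prior value `sigma_f ** 2`, so a larger variance signals a less reliable prediction.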

Training Process of the Semisupervised Learning Model.
The proposed method has two training processes. The first is the initial training with one GP model, and the second is the self-training with two GP models.

Initial Training Process.
First, we use HFSS to simulate a small number of labeled samples, denoted as $N_0$. Then, we use these $N_0$ samples as the original training samples.
The $i$th training sample is expressed as $(x_i, y_i)$, where $x_i = \{x_{i1}, x_{i2}, \ldots, x_{in}\}$ is the input variable, $n$ is the dimension of the input variables, $y_i = \{y_{i1}, y_{i2}, \ldots, y_{im}\}$ is the corresponding output, and $m$ is the dimension of the output variables. After that, we train the GP model with the $N_0$ training samples. Figure 1 is the flow chart of the initial training, and the process can be summarized as follows:
Step 1: use HFSS to simulate a few labeled training samples, denoted as $N_0$.
Step 2: use these $N_0$ samples to train the GP model. After training, the GP model has relatively low accuracy because of the small number of samples.
Step 3: obtain the initial error of the trained GP model.

Self-Training Process.
Before self-training, the GP model trained in the initial training process is copied, and the two copies are denoted as GP1 and GP2. Two groups of unlabeled sample sets are denoted as $\mathrm{Self}_1.X = \{x_{11}, x_{12}, \ldots, x_{1n}\}$ and $\mathrm{Self}_2.X = \{x_{21}, x_{22}, \ldots, x_{2n}\}$. In each iteration, we select one sample from each unlabeled sample set and input them into GP1 and GP2, respectively. We obtain the corresponding outputs $\mathrm{Self}_1.Y = \{y_{11}, y_{12}, \ldots, y_{1n}\}$ and $\mathrm{Self}_2.Y = \{y_{21}, y_{22}, \ldots, y_{2n}\}$. The two groups of pseudolabeled samples generated by GP1 and GP2 are denoted as $\mathrm{Self}_1.\mathrm{GP}_1 = (\mathrm{Self}_1.X, \mathrm{Self}_1.Y)$ and $\mathrm{Self}_2.\mathrm{GP}_2 = (\mathrm{Self}_2.X, \mathrm{Self}_2.Y)$, respectively. In each iteration, we also use HFSS to simulate one sample and obtain its label; this labeled sample is used as the test sample, denoted as Test.
The test sample is used to evaluate the performance of GP1 and GP2, and the test errors are denoted as $e_1$ and $e_2$, respectively. Figure 2 is the flow chart of the self-training algorithm, where $i$ is the number of iterations, and its process can be summarized as follows:
Step 1: copy the GP model from the initial training process to another one; the two models are denoted as GP1 and GP2.
Step 2: select one unlabeled sample from each of $\mathrm{Self}_1.X$ and $\mathrm{Self}_2.X$, input them into GP1 and GP2, respectively, and take the outputs $\mathrm{Self}_1.Y$ and $\mathrm{Self}_2.Y$ as their pseudolabels.
Step 3: use the pseudolabeled samples $\mathrm{Self}_1.\mathrm{GP}_1 = (\mathrm{Self}_1.X, \mathrm{Self}_1.Y)$ to further train GP2, and use $\mathrm{Self}_2.\mathrm{GP}_2 = (\mathrm{Self}_2.X, \mathrm{Self}_2.Y)$ to further train GP1. After that, GP1 and GP2 are updated, and the two updated models differ from each other.
Step 4: use the test sample of the current iteration, $\mathrm{Test}.G_i = \{(x_i, y_i) \mid x_i \in \mathbb{R}^n, y_i \in \mathbb{R}^m\}$, simulated by HFSS, to test the performance of the updated GP1 and GP2. The test error of GP1 is denoted as $e_1$, and the test error of GP2 is denoted as $e_2$.
Step 5: compare $e_1$ and $e_2$. If $e_1$ is larger than $e_2$, the accuracy of GP2 is higher than that of GP1, so we replace GP1 with GP2 and denote the result as GP. Otherwise, we replace GP2 with GP1 and denote the result as GP.
Step 6: judge whether the smaller of $e_1$ and $e_2$ meets the error threshold. If yes, the program ends. Otherwise, the test sample of this iteration, $\mathrm{Test}.G_i = \{(x_i, y_i) \mid x_i \in \mathbb{R}^n, y_i \in \mathbb{R}^m\}$, is used to further train the GP obtained in step 5, and the process returns to step 1.
The pseudocode of the self-training process is shown in Algorithm 1. During self-training, the GP model is updated continually with the information obtained from unlabeled samples. We also add the test sample of each iteration to the training sample set in order to improve the model.
The four benchmark functions are given in (10)-(13). The first two functions are unimodal functions, and the error threshold is set as 1e-06. The last two functions are multimodal functions, and the error threshold is set as 1e-04. We set the maximum number of iterations to 100.
The selected four benchmark functions are used to test the performance of the proposed SSL-based GP model. For each iteration, the relative error (RE) is used for evaluation, and it is given by
$$\mathrm{RE} = \frac{\lvert y_{\mathrm{pred}} - y_{\mathrm{test}} \rvert}{\lvert y_{\mathrm{test}} \rvert},$$
where $y_{\mathrm{pred}}$ is the label predicted by the SSL-based GP model and $y_{\mathrm{test}}$ is the true label of the test sample.
Figure 3 shows the iterative results of the four benchmark functions. The left column, (a), shows the test error curves of the four functions. The curves show that the test error in each iteration is small: for the unimodal functions, the maximum error is on the order of 1e-05, and for the multimodal functions, it is on the order of 1e-02. The right column, (b), shows the fitting results of the four functions at 50 test points. Although the fit of the multimodal functions is imperfect at some points, the fit of all four functions reaches a good level.

Resonant Frequency of MSAs

Rectangular MSA.
Figure 4 is a schematic diagram of the rectangular MSA [25], which is composed of the radiation element, the dielectric layer, and the reference ground. The width of the rectangular patch is $W$, the length is $L$, the thickness of the dielectric layer is $h$, and the relative dielectric constant is $\varepsilon_r$. The design variables are $W$, $L$, $h$, and $\varepsilon_r$, and the resonant frequencies were measured by Mehmet Kara [26]. There are 33 sets of data in total, shown in Table 2. When selecting the training sample set, we consider the information in each dimension and make sure that the samples are uniformly dispersed in each dimension. Then, 13 sets of data with suffix # are used as the initial training samples, and 10 sets of data with suffix * are used as test samples. The label information of the 10 sets of data with suffix ★ is first removed, and these sets are then used as unlabeled samples, because no unlabeled samples are available in this case.
In initial training, we take the four design variables $W$, $L$, $h$, and $\varepsilon_r$ as the inputs of the GP model and the resonant frequency as its output; we then establish the initial GP model using the training samples in Table 2. The test samples are used to obtain the initial error, and the resulting mean RE is 0.0093.
In each iteration, GP1 and GP2 each select a different unlabeled sample for cross-training and take one test sample for verification. The iteration terminates when two conditions hold simultaneously: the test error has met the error threshold, and the error of the next iteration is worse than that of the current one. If the condition is satisfied, the program stops at the latter iteration. In this case, the error threshold is 1e-05, and the smallest test error is 3.4683e-6 at the 5th iteration. Therefore, we stop at the 6th iteration. Table 3 shows the test errors of the six iterations, and Figure 5 shows the error curves.
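The self-training iteration and its termination condition can be sketched as follows. This is an illustrative reading of the procedure, not the authors' implementation: a model is represented simply by its training set, the GP uses a fixed squared exponential kernel on scalar inputs, and `simulate` is a hypothetical stand-in for a full-wave solver such as HFSS. All function names and the toy data are assumptions.

```python
import math

def kernel(a, b, length=1.0):
    """Squared exponential covariance for scalar inputs (unit signal variance)."""
    return math.exp(-0.5 * (a - b) ** 2 / length ** 2)

def solve_spd(A, b):
    """Solve A x = b for symmetric positive definite A via Cholesky factorization."""
    n = len(b)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_mean(train, x_star, noise=1e-6):
    """Posterior mean at x_star of a zero-mean GP conditioned on train = [(x, y), ...]."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    C = [[kernel(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    alpha = solve_spd(C, ys)
    return sum(kernel(x_star, x) * a for x, a in zip(xs, alpha))

def relative_error(y_pred, y_test):
    return abs(y_pred - y_test) / abs(y_test)

def self_train(labeled, unlabeled_1, unlabeled_2, test_inputs, simulate,
               threshold, max_iter=100):
    """Self-training with two GP copies; returns the best model and the error history."""
    model = list(labeled)
    best_model, best_err = list(labeled), float("inf")
    errors = []
    n_iter = min(max_iter, len(unlabeled_1), len(unlabeled_2), len(test_inputs))
    for i in range(n_iter):
        gp1, gp2 = list(model), list(model)        # copy the current model
        x1, x2 = unlabeled_1[i], unlabeled_2[i]    # one unlabeled sample each
        pseudo_1 = (x1, gp_mean(gp1, x1))          # pseudolabel produced by GP1
        pseudo_2 = (x2, gp_mean(gp2, x2))          # pseudolabel produced by GP2
        gp2.append(pseudo_1)                       # cross-training: GP2 learns from GP1
        gp1.append(pseudo_2)                       # and GP1 learns from GP2
        x_t = test_inputs[i]
        y_t = simulate(x_t)                        # labeled test sample
        e1 = relative_error(gp_mean(gp1, x_t), y_t)
        e2 = relative_error(gp_mean(gp2, x_t), y_t)
        winner, err = (gp2, e2) if e1 > e2 else (gp1, e1)
        errors.append(err)
        if err < best_err:
            best_model, best_err = winner, err
        # Stop once the previous error met the threshold and the current one is worse.
        if len(errors) >= 2 and errors[-2] <= threshold and err > errors[-2]:
            break
        model = winner + [(x_t, y_t)]              # add the test sample and continue
    return best_model, errors

def simulate(x):
    """Hypothetical stand-in for a full-wave simulation."""
    return 2.0 + math.sin(x)

labeled = [(x, simulate(x)) for x in (0.0, 2.5, 5.0)]
model, errors = self_train(labeled, [0.5, 1.5, 3.5, 4.5], [1.0, 2.0, 3.0, 4.0],
                           [0.75, 1.75, 3.25, 4.25], simulate, threshold=1e-2)
print(len(model), errors)
```

Because the hyperparameters are fixed in this sketch, a pseudolabel that agrees with the model's own prediction changes the posterior mean very little, so `e1` and `e2` stay close. Re-estimating the hyperparameters after each update, which is not shown here, would be one way to let the two models diverge.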

Complexity
From the above results, in addition to the unlabeled samples used in each iteration, the optimal model has been further trained with four more test samples. For comparison, these four test samples are added to the original training sample set, and a traditional GP model is trained with the updated training sample set. The 5th test sample is used to test the traditional GP model, and the resulting error is 7.7814e-4, larger than 3.4683e-6. This gives preliminary evidence that the SSL-based GP model has advantages over the traditional supervised GP model.
Considering the effect of different test samples on the error, we use the trained SSL-based GP model to predict the 5th to 10th test samples; the predicted results for these six test samples are shown in bold and underlined in column 7, named $f_{\mathrm{proposed}}$, of Table 2. We also use the traditional GP model above to predict the 5th to 10th test samples for comparison. Here, we use the mean relative error (MRE) to evaluate the performance of the model. The MRE is given by
$$\mathrm{MRE} = \frac{1}{N} \sum_{i=1}^{N} \frac{\lvert y_{\mathrm{pred},i} - y_{\mathrm{test},i} \rvert}{\lvert y_{\mathrm{test},i} \rvert},$$
where $N$ is the number of test samples.
For the rectangular MSA, the MRE of the proposed SSL-based GP model is 0.0075, while that of the traditional supervised GP model is 0.0081. We may conclude that, with the same training samples and the same test samples, the SSL-based GP model performs better than the traditional supervised GP model: the test error is smaller and the accuracy is improved.
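The two error measures can be computed as in the short sketch below, which assumes the usual definitions: RE is the absolute deviation divided by the magnitude of the true label, and MRE is the average RE over the test samples. The function names and the sample values are hypothetical and are not taken from the paper's tables.

```python
def relative_error(y_pred, y_test):
    """RE for a single prediction: |y_pred - y_test| / |y_test|."""
    return abs(y_pred - y_test) / abs(y_test)

def mean_relative_error(preds, tests):
    """MRE: average RE over paired predictions and true labels."""
    if len(preds) != len(tests) or not tests:
        raise ValueError("preds and tests must be non-empty and of equal length")
    return sum(relative_error(p, t) for p, t in zip(preds, tests)) / len(tests)

# Hypothetical resonant frequencies (not taken from the paper's tables).
f_test = [1.0, 2.0, 4.0]
f_pred = [1.1, 1.9, 4.0]
mre = mean_relative_error(f_pred, f_test)
print(mre)
```

Because RE divides by the true label, it is undefined when the true label is zero; this does not arise for resonant frequencies, which are strictly positive.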

Circular MSA.
Figure 6 is a schematic diagram of the circular MSA, obtained by replacing the rectangular patch with a circular patch. The radius of the circular patch is $a$, the thickness of the dielectric layer is $h$, and the relative dielectric constant is $\varepsilon_r$. Its resonant frequency is measured by Sing [27] and Seref Sagiroglu [28]. The design variables are $a$, $h$, and $\varepsilon_r$, as shown in Table 4, and there are 20 sets of data in total. As described above, considering the information in each dimension, 8 sets of data with suffix * are used as test samples; their label information is also removed so that they can serve as unlabeled samples. The remaining samples are used as training samples.
In initial training, as above, the initial error is 0.0807, which is large, so the GP model needs further improvement. In this case, the error threshold is 1e-02, and the smallest test error is 0.0071 at the 7th iteration. Therefore, the program stops at the 8th iteration. Table 5 shows the test errors of the eight iterations, and Figure 7 shows the error curve. From the above results, the GP model has been further trained with six more test samples. As above, these six test samples are added to the original training sample set to train a traditional supervised GP model. The 7th test sample is used for comparison, and the resulting error is 0.0073, which is larger than 0.0071.
We use the trained GP model to predict the 7th and 8th test samples, and the predicted results are shown in bold and underlined in column 7, named $f_{\mathrm{proposed}}$, of Table 4. The MRE of the SSL-based GP model is 0.0160, while that of the traditional GP model is 0.0240. Therefore, for the resonant frequency of the circular MSA, the proposed SSL-based GP model performs better than the traditional GP model.

Triangular MSA.
Figure 8 is a schematic diagram of the triangular MSA, obtained by replacing the rectangular patch with a triangular patch. The side length of the equilateral triangle is $s$, the height is $h$, and the relative dielectric constant is $\varepsilon_r$. The antenna has five different modes of operation. The design variables are the mode, $s$, $h$, and $\varepsilon_r$, and its resonant frequency is measured by Chen [29] and Danele [30].
As described above, considering the information in each dimension, 5 sets of samples with suffix * in Table 6 are used as the test samples. Their label information is also removed so that they can serve as unlabeled samples. The remaining samples are training samples.
In the initial training process, the error is 0.1428. Because there are few training samples, the initial error is large and needs to be reduced. In this case, the error threshold is 1e-02, and the smallest test error is 0.0067 at the 4th iteration. Table 7 shows the test errors, and Figure 9 is the error curve. The optimal model has been further trained with three more test samples. As above, these three test samples are added to the original training sample set to train a traditional supervised GP model. The 4th test sample is used for testing, and the resulting error is 0.0781, which is larger than 0.0067. We use the trained SSL-based GP model to predict the 4th and 5th test samples, and the predicted results are shown in bold and underlined in column 7, named $f_{\mathrm{proposed}}$, of Table 6. The MRE of the SSL-based GP model is 0.0194, while that of the traditional GP model is 0.0423. This error is smaller than the initial training error, so the proposed algorithm can improve the accuracy of the model when the initial model is not accurate enough. At the same time, the MRE of the SSL-based GP model is smaller than that of the traditional supervised GP model. Therefore, for the resonant frequency of the triangular MSA, the proposed SSL-based GP model performs better than the traditional supervised GP model.

Comparison with Other Algorithms.
The three basic geometries, rectangular MSA, circular MSA, and triangular MSA, led to the development of fractal geometries for the design of multiband antennas. Many studies have used the resonant frequency data of these three MSAs that we use here. First, we compare the proposed algorithm with the NNs in reference [26], including backpropagation (BP), delta-bar-delta (DBD), and extended delta-bar-delta (EDBD). Each model is used to predict all the samples, including training samples and test samples, and the total absolute errors are obtained. The predicted results of the proposed GP model for the different MSAs are shown in column 7, named $f_{\mathrm{proposed}}$, of Tables 2, 4, and 6, respectively. The comparison results are shown in Table 8. For the rectangular MSA, the total absolute error of the proposed algorithm is clearly smaller than that of the other algorithms. For the circular MSA and the triangular MSA, the proposed algorithm is better than DBD and BP but a little worse than EDBD. In short, the proposed SSL-based GP model uses fewer training samples but has almost the same or better prediction ability than the NNs in [26]. Moreover, deciding the structure of an NN is difficult; it usually depends on the researcher's experience or on trial and error. The proposed method has no such problem and is easy to model.
We also cite some results from other references for comparison, and these results are also shown in Table 8. For the rectangular MSA, the proposed algorithm is better than the methods in references [31,32]. For the circular MSA and triangular MSA, the same conclusion holds with respect to the methods in references [29-33]. In short, using a small number of training samples, the proposed SSL-based GP model has better prediction ability than the methods in these references.

Conclusion
To improve the efficiency of the optimal design of electromagnetic devices and to save the time spent collecting training samples simulated by full-wave electromagnetic software, this study proposes a semisupervised GP model, which consists of an initial training process and a self-training process. In the initial training process, a few labeled samples are used to train a GP model with relatively low accuracy. In each iteration of the self-training process, the trained GP model is first copied to another GP model, and the two GP models are further updated with unlabeled samples. After both models are tested on the same test sample, the GP model with the smaller error replaces the other for self-updating. The self-training process is repeated until the error threshold is met. Four benchmark functions are used to test the effectiveness of the proposed algorithm. Experimental results show that both the unimodal functions and the multimodal functions reach the expected error within a small number of iterations. The resonant frequency problems of three different microstrip antennas are also used to verify the effectiveness of the proposed GP model. Compared with the traditional supervised GP model, the proposed semisupervised GP model is more accurate and has a smaller error. Compared with other algorithms, the proposed GP model uses fewer labeled samples while its prediction ability has some advantages. In short, the proposed semisupervised GP model further promotes research on the optimal design of electromagnetic devices.

Data Availability
The data used to support the findings of this study are included within the article.