Semi-Supervised Ensemble Classifier with Improved Sparrow Search Algorithm and Its Application in Pulmonary Nodule Detection

The Adaptive Boosting (AdaBoost) classifier is a widely used ensemble learning framework that achieves good classification results on general datasets. However, it is difficult to apply the AdaBoost classifier directly to pulmonary nodule detection on labeled and unlabeled lung CT images, because the ensemble learning method still has some drawbacks. Therefore, to solve the classification problem for labeled and unlabeled data, a semi-supervised AdaBoost classifier using an improved sparrow search algorithm (AdaBoost-ISSA-S4VM) is established. First, the AdaBoost classifier is used to construct a strong semi-supervised classifier from several S4VM weak classifiers (AdaBoost-S4VM). Next, to improve the accuracy of AdaBoost-S4VM, the sparrow search algorithm (SSA) is introduced into the AdaBoost classifier and S4VM. Then, the sine cosine algorithm and a new labor cooperation structure are adopted to improve the global search capability and the convergence performance of the sparrow search algorithm, respectively. Furthermore, based on the improved sparrow search algorithm and the adaptive boosting classifier, the AdaBoost-S4VM classifier is improved. Finally, the improved AdaBoost-ISSA-S4VM classification model is developed for practical pulmonary nodule detection based on the publicly available LIDC-IDRI database. The experimental results show that the AdaBoost-ISSA-S4VM classification model performs well on labeled and unlabeled lung CT images.


Introduction
Pulmonary nodule detection is a classification problem. Pulmonary nodule detection based on lung CT images is the key to diagnosing lung cancer. In practical lung CT image recognition, classification accuracy and the acquisition of image labels are crucial issues. To address classification accuracy, ensemble learning is introduced. After 30 years of development, ensemble learning has been applied in many fields of machine learning [1,2] and is considered one of the effective ways to improve classification accuracy. In 1996, Breiman proposed the Bagging algorithm [3], which is similar to the Boosting algorithm [4]. This algorithm is one of the many algorithm families in the field of ensemble learning. However, the Boosting algorithm is difficult to apply to practical problems because it must know the generalization lower bound of the "weak" learning algorithm. To solve this problem, Freund and Schapire proposed the well-known adaptive boosting (AdaBoost) algorithm in 1997 [5]. Compared with the Boosting algorithm, AdaBoost is more robust and more widely applicable, and it further promoted the development of ensemble learning. In many studies, the AdaBoost classifier is improved through the choice of "weak" classifier, such as the support vector machine (SVM) [6] and the long short-term memory (LSTM) network [7].
Although the AdaBoost classifier has been applied successfully in many fields thanks to its competitive accuracy, it shows a main weakness when part of the lung CT image labels are missing: on its own, it cannot classify a mixture of labeled and unlabeled lung CT images well. To solve this problem, a hybrid ensemble learning approach combining the AdaBoost algorithm and the safe semi-supervised support vector machine (S4VM) [8] is proposed for pulmonary nodule detection. In addition, its performance is mainly affected by the key parameters of the model. The weight $\beta$ of each weak classifier needs to be optimized in AdaBoost-S4VM. The S4VM optimization involves two main hyper-parameters, $C_1$ and $C_2$, which are regularization parameters trading off the model complexity against the empirical error on labeled and unlabeled data, respectively. The parameters of S4VM are usually optimized by 10-fold cross-validation, which cannot adapt to the data automatically, makes the parameter range difficult to set, and easily falls into a local optimum. To overcome these shortcomings, many studies have proposed using intelligent optimization algorithms to optimize the key parameters of the S4VM model. These algorithms include quasi-Newton algorithms [9], the cuckoo search algorithm (CS) [10], and beetle antennae search (BAS) [11]. Because the sparrow search algorithm (SSA) [12] has a simple principle, strong search capacity, and few adjustable parameters, an improved SSA is used in this research to optimize the key parameters of AdaBoost-S4VM and S4VM and thereby improve the classification performance of AdaBoost-S4VM. Moreover, hybrid strategies, one of the main research directions for improving the performance of swarm intelligence algorithms, have become a research hotspot in machine learning. Rasoulizadeh et al. [13] modified the local RBF-generated finite difference method (RBF-FD) based on local stencil nodes, which yields a sparse system and overcomes the dense, ill-conditioned system. Rashidinia et al. [14] proposed two meshless collocation methods based on the radial basis function-generated finite difference (RBF-FD) and global RBF (GRBF) methods, and the simulation results showed that the proposed approach was viable and effective. Can et al. [15] modified the idea of interpolation by radial basis functions, and the obtained results show that the proposed method provides valid and accurate results and outperforms its counterparts.
In this study, in view of the shortcomings of the AdaBoost algorithm, the S4VM weak classifier is introduced. Then, SSA is used to optimize the parameters of AdaBoost-S4VM. However, SSA easily falls into local optima and performs poorly on complex optimization problems. Because the sine cosine algorithm (SCA) [16] offers strong search capability and avoids local optima, we first introduce SCA to improve the global search capability of SSA. Additionally, to enhance the convergence ability of SSA, the labor cooperation structure of the sparrows is redefined. Finally, based on the new labor cooperation structure and SCA, the improved cooperative sparrow search algorithm based on the sine cosine algorithm (SCA-CSSA) is proposed. SCA-CSSA is used to optimize the weights of AdaBoost-S4VM and the key parameters of S4VM in order to improve the accuracy of the AdaBoost-S4VM model for semi-supervised lung CT classification. In this way, the improved semi-supervised AdaBoost classification model using an improved sparrow search algorithm (AdaBoost-ISSA-S4VM) is established. To verify the effectiveness of the proposed approach, SCA-CSSA is first compared with several hybrid algorithms and popular algorithms on the CEC2017 functions and 12 benchmark functions, including unimodal and multimodal functions. In addition, to evaluate the effectiveness of the AdaBoost-ISSA-S4VM model, it is compared with supervised and semi-supervised classifiers. Experimental results show that the proposed machine learning model has better stability and higher prediction accuracy.

Adaptive Boosting (AdaBoost).
The AdaBoost classifier is implemented by changing the data distribution. It determines the weight of each sample according to whether that sample was classified correctly in the current training round and according to the overall accuracy of the previous round. The dataset with the modified weights is then passed to the next classifier for training, and so on. Finally, according to the calculated weights, the desired "strong classifier", named AdaBoost-S4VM, is obtained by combining a series of "weak classifiers." In the strong learning algorithm of equation (1), $g_m(x)$ is the basic weak classifier, $\beta_m$ ($m = 1, \ldots, M$) is the weight of each weak classifier, and $M$ is the total number of basic weak classifiers. In the weight coefficient of equation (2), $e_m$ is the error rate of $g_m$ on the training set, that is, $e_m = P(G_m(x_i) \neq y_i) = \sum_{i=1}^{N} w_{mi} I(G_m(x_i) \neq y_i)$. Since we explore the discrimination of features by training weak classifiers and organize the AdaBoost classifiers in a cascade, we require simple weak classifiers, with which the target of the cascade-AdaBoost classifier can be controlled easily.
Thus, simple threshold classifiers are chosen as weak classifiers, as in equation (3), where $T_{lower}$ and $T_{upper}$ are the thresholds of the weak classifier $g_m$, which can be obtained by a semi-exhaustive search technique.
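As a minimal illustration of the quantities defined above, the following Python sketch computes the weighted error rate $e_m$ and a weak-classifier weight from it. The coefficient $\beta_m = \frac{1}{2}\ln\frac{1 - e_m}{e_m}$ and the exponential sample-weight update are the standard AdaBoost choices and are assumed here rather than taken from equations (1) to (3); the function names and the toy data are hypothetical.

```python
import numpy as np

def weighted_error(weights, predictions, labels):
    """e_m = sum_i w_mi * I(G_m(x_i) != y_i), with the weights summing to 1."""
    return float(np.sum(weights * (predictions != labels)))

def classifier_weight(error, eps=1e-12):
    """Assumed standard AdaBoost coefficient: 0.5 * ln((1 - e_m) / e_m)."""
    error = min(max(error, eps), 1.0 - eps)  # keep the logarithm finite
    return 0.5 * np.log((1.0 - error) / error)

def update_sample_weights(weights, beta, predictions, labels):
    """Raise the weight of misclassified samples, then renormalize to sum 1."""
    new_weights = weights * np.exp(-beta * labels * predictions)
    return new_weights / new_weights.sum()

# Toy data (hypothetical): five samples with labels in {-1, +1}, one mistake.
labels = np.array([1, 1, -1, -1, 1])
predictions = np.array([1, -1, -1, -1, 1])
weights = np.full(5, 0.2)

e_m = weighted_error(weights, predictions, labels)
beta_m = classifier_weight(e_m)
weights_next = update_sample_weights(weights, beta_m, predictions, labels)
```

In this toy case the single misclassified sample ends up carrying half of the total weight, which is what forces the next weak classifier to focus on it.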

Safe Semi-Supervised Support Vector Machines (S4VM).
The core design idea of S4VM is to optimize the label assignment of the unlabeled samples when many different large-margin separators fit the data, so that the performance improvement over a support vector machine that uses only labeled samples is maximized in the worst case. The objective function $h(f, y)$ that S4VM needs to optimize is given in equation (4); its last term sums the loss $\ell(y_j, f(x_j))$ over the unlabeled samples.

Mathematical Problems in Engineering

Its goal is to find multiple large-margin low-density separators $\{f_t\}_{t=1}^{T}$ and the corresponding label assignments $\{y_t\}_{t=1}^{T}$ such that the functional in equation (5) is minimized, where $T$ is the number of separators, $\Omega$ is a penalty on the diversity of the separators, and $M$ is a large constant enforcing large diversity. $\Omega(\{y_t\}_{t=1}^{T})$ is a sum of pairwise terms. Here it is defined as $\Omega(\{y_t\}_{t=1}^{T}) = \sum_{1 \le t \ne \tilde{t} \le T} I\left(\frac{y_t^{\top} y_{\tilde{t}}}{u} \ge 1 - \varepsilon\right)$, where $I$ is the indicator function, $u$ is the number of unlabeled samples, and $\varepsilon \in [0, 1]$ is a constant; other penalty quantities are also applicable. Without loss of generality, suppose that $f$ is a linear model, $f(x) = w^{\top}\phi(x) + b$, where $\phi(x)$ is a feature mapping induced by the kernel $k$. Then, the optimization problem to be solved can be expressed as equation (6), where $y_{t,j}$ refers to the $j$th entry of $y_t$. Formula (6) is nonconvex, and two solutions are presented in the following. It can evidently also be solved by other approaches, especially those based on efficient S3VMs.
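The diversity penalty $\Omega$ defined above can be evaluated directly from a matrix of candidate label assignments. The sketch below is only an illustration of that definition: the function name and the example assignments are hypothetical, and the sum is taken over ordered pairs $t \ne \tilde{t}$ exactly as the index range is written.

```python
import numpy as np

def diversity_penalty(label_assignments, eps):
    """Count ordered pairs (t, t~), t != t~, with y_t . y_t~ / u >= 1 - eps."""
    Y = np.asarray(label_assignments, dtype=float)  # shape (T, u), entries +/-1
    T, u = Y.shape
    similarity = Y @ Y.T / u              # pairwise agreement in [-1, 1]
    too_similar = similarity >= 1.0 - eps
    np.fill_diagonal(too_similar, False)  # exclude the t == t~ terms
    return int(too_similar.sum())

# Three hypothetical label assignments for u = 4 unlabeled samples.
assignments = [
    [1, 1, -1, -1],
    [1, 1, -1, 1],
    [-1, -1, 1, 1],
]
penalty_loose = diversity_penalty(assignments, eps=0.5)    # threshold 0.5
penalty_strict = diversity_penalty(assignments, eps=0.25)  # threshold 0.75
```

With the looser threshold the first two assignments count as too similar (once in each order), whereas with the stricter one no pair is penalized.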

Sparrow Search Algorithm (SSA).
The sparrow search algorithm (SSA) was originally proposed by Xue et al. [12]. The algorithm imitates the distinctive foraging and anti-predation behavior of sparrows in nature to solve optimization problems. In SSA, the position of each sparrow in the population is a candidate solution of the given optimization problem.
According to the mathematical model of SSA, the sparrows are divided into three roles: producers, scroungers, and sparrows at the edge of the group. Following the producer rules, producers search for food over a broader range than scroungers and react once a sparrow detects a predator. The location of the producer is updated by equation (7), where $t$ indicates the current iteration and $j = 1, 2, \ldots, d$. $X_{i,j}^{t}$ represents the value of the $j$th dimension of the $i$th sparrow at iteration $t$. $iter_{max}$ is a constant equal to the maximum number of iterations, $\alpha \in (0, 1]$ is a random number, and $R_2$ ($R_2 \in [0, 1]$) and $ST$ ($ST \in [0.5, 1.0]$) represent the alarm value and the safety threshold, respectively. $Q$ is a random number that obeys the normal distribution, and $L$ is a $1 \times d$ matrix in which every element is 1. Following the rules for producers and scroungers, the position of the scrounger is updated by equation (8), where $X_P$ is the optimal position occupied by the producer, $X_{worst}$ denotes the current global worst location, $A$ is a $1 \times d$ matrix in which each element is randomly assigned 1 or $-1$, and $A^{+} = A^{\top}(AA^{\top})^{-1}$. When $i > n/2$, the $i$th scrounger, which has a worse fitness value, is most likely starving. Following the rule for the sparrows at the edge of the group, their mathematical model is given by equation (9), where $X_{best}$ is the current global optimal location, $\beta$ is the step size control parameter, drawn from a normal distribution with mean 0 and variance 1, and $K \in [0, 1]$ is a random number. Here, $f_i$ is the fitness value of the present sparrow, $f_g$ and $f_w$ are the current global best and worst fitness values, respectively, and $\varepsilon$ is a very small constant that avoids division by zero.
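To make the three roles concrete, the following Python sketch implements one position update per role. It assumes the update rules of the original SSA paper [12], since equations (7) to (9) are referenced but not shown here; the function names, the random generator handling, and the example vector are hypothetical, and $K$ is drawn from $[0, 1]$ as stated in the text.

```python
import numpy as np

def producer_update(x, i, iter_max, st, rng):
    """Producer move (assumed original SSA rule); i is the 1-based rank."""
    d = x.shape[0]
    r2 = rng.random()                      # alarm value R2 in [0, 1)
    if r2 < st:                            # no predator: wide-range search
        alpha = 1.0 - rng.random()         # random number in (0, 1]
        return x * np.exp(-i / (alpha * iter_max))
    return x + rng.normal() * np.ones(d)   # Q ~ N(0, 1), L = vector of ones

def scrounger_update(x, i, n, x_producer, x_worst, rng):
    """Scrounger move: starving scroungers (i > n/2) fly elsewhere."""
    d = x.shape[0]
    if i > n / 2:
        return rng.normal() * np.exp((x_worst - x) / i**2)
    a = rng.choice([-1.0, 1.0], size=d)    # A: 1 x d with entries +/-1
    a_plus = a / d                         # A^+ = A^T (A A^T)^-1, and A A^T = d
    return x_producer + (np.abs(x - x_producer) @ a_plus) * np.ones(d)

def edge_update(x, f_i, x_best, x_worst, f_g, f_w, rng, eps=1e-50):
    """Move of a sparrow that perceives danger at the edge of the group."""
    if f_i > f_g:                          # worse than the best: move toward it
        return x_best + rng.normal() * np.abs(x - x_best)
    k = rng.random()                       # K in [0, 1] as stated in the text
    return x + k * np.abs(x - x_worst) / ((f_i - f_w) + eps)

rng = np.random.default_rng(0)
x = np.array([1.0, -2.0, 0.5])
safe_move = producer_update(x, i=1, iter_max=100, st=1.0, rng=rng)
```

Because $A$ contains only $\pm 1$, the product $AA^{\top}$ equals $d$, so the pseudo-inverse reduces to a division and no matrix inversion is needed.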

The Proposed Method
The sparrow search algorithm and the principle of the AdaBoost classifier have been described, and the basic S4VM has also been discussed. However, classifying pulmonary nodules from lung CT images is complex and challenging.
Although the AdaBoost classifier is novel and effective, it still has shortcomings when applied to pulmonary nodule classification based on lung CT images.
Thus, two improvement strategies, the "S4VM algorithm" and "parameter optimization" (or "weight optimization"), are introduced into the original AdaBoost classifier and AdaBoost-S4VM, respectively, to help them jump out of local optima.
SSA also has shortcomings when used to optimize them, so the sine cosine algorithm is used as a hybrid component to help it jump out of local optima. In addition, a new division of labor structure is introduced into the original SSA to help it converge to the global optimal solution faster and more stably. In this section, the proposed SCA-CSSA algorithm, AdaBoost-S4VM algorithm, and AdaBoost-ISSA-S4VM algorithm are discussed in detail.

The Proposed AdaBoost-S4VM Model (AdaBoost-S4VM).
The AdaBoost classifier is implemented by changing the data distribution. It determines the weight of each sample according to whether that sample was classified correctly in the current training round and according to the overall accuracy of the previous round. The dataset with the modified weights is then passed to the next classifier for training, and so on. Finally, according to the calculated weights, the desired "strong classifier", named AdaBoost-S4VM, is obtained by combining a series of "weak classifiers." In the strong learning algorithm of equation (10), $g_m(x)$ is the basic weak classifier, $\beta_m$ ($m = 1, \ldots, M$) is the weight of each weak classifier, subject to $\sum_{m=1}^{M} \beta_m = 1$ and $0 \le \beta_m \le 1$, and $M$ is the total number of basic weak classifiers.
According to formulas (6) and (10), the improved AdaBoost-S4VM formula that uses S4VM as the "weak" classifier is obtained as equation (11), where $\beta_m$ ($m = 1, \ldots, M$) is the weight of each S4VM classifier, subject to $\sum_{m=1}^{M} \beta_m = 1$ and $0 \le \beta_m \le 1$, and $M$ is the total number of basic S4VM classifiers.
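The weight constraints $\sum_{m=1}^{M} \beta_m = 1$ and $0 \le \beta_m \le 1$ can be illustrated with a short Python sketch. It assumes that the ensemble output is the sign of the $\beta$-weighted sum of the weak classifier outputs, which is the usual linear combination but is not confirmed by equation (11) as reproduced here; the weak outputs below are hard-coded stand-ins for trained S4VM classifiers, and all names are hypothetical.

```python
import numpy as np

def normalize_weights(raw):
    """Map raw scores to beta with 0 <= beta_m <= 1 and sum(beta) = 1."""
    clipped = np.clip(np.asarray(raw, dtype=float), 0.0, None)
    total = clipped.sum()
    if total == 0.0:                       # degenerate case: use equal weights
        return np.full(clipped.shape, 1.0 / clipped.size)
    return clipped / total

def ensemble_predict(weak_outputs, beta):
    """Sign of the beta-weighted sum; rows are classifiers, columns samples."""
    scores = beta @ np.asarray(weak_outputs, dtype=float)
    return np.where(scores >= 0.0, 1, -1)

# Stand-in outputs of M = 3 weak classifiers on 3 samples (labels in {-1, +1}).
weak_outputs = [
    [1, -1, 1],
    [1, 1, -1],
    [-1, 1, 1],
]
beta = normalize_weights([3.0, 1.0, 1.0])   # -> [0.6, 0.2, 0.2]
prediction = ensemble_predict(weak_outputs, beta)
```

Normalizing inside the ensemble keeps any candidate weight vector proposed by an optimizer feasible, so the search itself can remain unconstrained.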

The Proposed Sparrow Search Algorithm (SCA-CSSA).
It can be seen from Section 3.1 that, to obtain the optimal classification effect and efficiency, the parameters $C_1$ and $C_2$ and the weights $\beta_m$ ($m = 1, \ldots, M$) need to be optimized. However, because the original SSA easily falls into local optima and converges slowly, it is difficult to guarantee the quality of the obtained solution. In response to these problems, the SCA algorithm and a new labor cooperation structure are used to improve the global and local search capability of SSA. The position update of the sine cosine algorithm (SCA) is written mathematically as formula (12), where $X_i^t$ is the position of the current solution in the $i$th dimension at the $t$th iteration, $r_2$ and $r_3$ are random numbers, $P_i$ is the position of the destination point in the $i$th dimension, $|\cdot|$ indicates the absolute value, and $r_4$ is a random number in $[0, 1]$.
To balance exploration and exploitation, the range of sine and cosine in formula (12) is changed adaptively using equation (13), where $t$ is the current iteration, $T$ is the maximum number of iterations, and $a$ is a constant.
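The SCA position update and its adaptive range can be sketched in Python as follows. The sketch assumes the standard SCA formulation, namely $r_1 = a - t\,a/T$, $r_2 \in [0, 2\pi]$, $r_3 \in [0, 2]$, and a switch between sine and cosine at $r_4 < 0.5$; these ranges, the default $a = 2$, and the function names are assumptions that are not stated in the text above.

```python
import numpy as np

def sca_r1(t, t_max, a=2.0):
    """Adaptive range: decreases linearly from a to 0 over the iterations."""
    return a - t * a / t_max

def sca_update(x, destination, t, t_max, rng, a=2.0):
    """One SCA move of solution x toward or around the destination point P."""
    d = x.shape[0]
    r1 = sca_r1(t, t_max, a)
    r2 = rng.uniform(0.0, 2.0 * np.pi, size=d)   # assumed range [0, 2*pi]
    r3 = rng.uniform(0.0, 2.0, size=d)           # assumed range [0, 2]
    r4 = rng.random(size=d)                      # switch between sine and cosine
    step = np.abs(r3 * destination - x)
    sine_move = x + r1 * np.sin(r2) * step
    cosine_move = x + r1 * np.cos(r2) * step
    return np.where(r4 < 0.5, sine_move, cosine_move)

rng = np.random.default_rng(1)
x = np.array([4.0, -3.0])
best = np.zeros(2)
early = sca_update(x, best, t=0, t_max=100, rng=rng)    # large range: explore
final = sca_update(x, best, t=100, t_max=100, rng=rng)  # r1 = 0: no movement
```

Early iterations allow steps of up to $a$ times the distance term, which favors exploration, while the shrinking $r_1$ later confines the search to the neighborhood of the current solution.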
To increase the search speed and jump out of local optima, the SCA algorithm is introduced into SSA. Because SCA increases the search speed and escapes local optima, it effectively prevents the premature convergence of the sparrows. The SSA update combined with the SCA algorithm is given by formula (14). Then, in the improved SSA algorithm, a new labor cooperation structure is used to converge to the global optimal solution faster and more stably. In the new labor cooperation structure, three roles of sparrows are again defined: producers, scroungers, and sparrows at the edge of the group. Since the producers and scroungers determine the location range and the convergence performance of the group, they share their locations to achieve cooperation. This cooperation improves both the producers and the scroungers and thereby improves convergence. The cooperative location of the producer is given by formula (18), and that of the scrounger by formula (19). The new location replaces the old one only if it is better, which is expressed by formula (20), where $min_{best}$ refers to the global optimal minimum solution.
The pseudocode of the whole SCA-CSSA is given in Algorithm 1.

The Proposed AdaBoost-S4VM Model Improved by the Improved Sparrow Search Algorithm (AdaBoost-ISSA-S4VM).
As noted in Section 3.1, to obtain the optimal classification effect and efficiency, the parameters $C_1$ and $C_2$ and the weights $\beta_m$ ($m = 1, \ldots, M$) need to be optimized.
With the improved SCA-CSSA available, the AdaBoost-S4VM parameters are optimized using SCA-CSSA. The pseudocode of AdaBoost-ISSA-S4VM is shown in Algorithm 2.

Experimental Studies
In this section, to evaluate the performance of the proposed SCA-CSSA and the AdaBoost-ISSA-S4VM model, a series of experiments on test functions and CT images is conducted. All experiments in this paper are implemented using the following: MATLAB R2014b; Win 10 (64 bit); Intel(R) Core(TM) i5-10210M CPU @1.60 GHz 2.11 GHz.

Function Optimization Experiment.
This section presents the evaluation of SCA-CSSA using a series of experiments on benchmark functions [17] and CEC2017 test functions [18]. To obtain fair results, all the experiments were conducted under the same conditions. The population size is set to 30 in all algorithms, and each algorithm is run 30 times independently on each function.
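The experimental protocol above (30 independent runs per function, summarized by their extreme values, mean, and spread) can be sketched in Python as follows. The optimizer is a plain random search used purely as a placeholder for SCA-CSSA and the compared algorithms, and the Rosenbrock function, its search range, the dimension, and the iteration budget are illustrative assumptions rather than the settings of Table 1.

```python
import numpy as np

def rosenbrock(x):
    """Rosenbrock function; its global minimum is 0 at x = (1, ..., 1)."""
    return float(np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (x[:-1] - 1.0) ** 2))

def random_search(objective, dim, bounds, pop, iters, rng):
    """Placeholder optimizer (NOT SCA-CSSA): best of uniformly sampled points."""
    low, high = bounds
    best = np.inf
    for _ in range(iters):
        candidates = rng.uniform(low, high, size=(pop, dim))
        best = min(best, min(objective(c) for c in candidates))
    return best

def summarize(objective, dim, bounds, runs=30, pop=30, iters=20, seed=0):
    """Run the optimizer `runs` times with different seeds and report stats."""
    results = np.array([
        random_search(objective, dim, bounds, pop, iters,
                      np.random.default_rng(seed + r))
        for r in range(runs)
    ])
    return {"Max": results.max(), "Min": results.min(),
            "Mean": results.mean(), "Std": results.std()}

stats = summarize(rosenbrock, dim=5, bounds=(-30.0, 30.0))
```

Using a separate seed for every run keeps the 30 runs independent while making the whole table reproducible.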

Benchmark Functions and CEC 2017 Test Functions.
To investigate the effectiveness and generality of SCA-CSSA in comparison with several hybrid algorithms and popular algorithms, 12 benchmark functions and the CEC2017 test functions are applied. To test the effectiveness of the proposed SCA-CSSA, 12 benchmark functions are adopted, all of which have an optimal value of 0. The benchmark functions and their search ranges are shown in Table 1. In this test suite, $f_1$ to $f_7$ are unimodal functions. These unimodal functions are usually used to test whether the proposed algorithm has good convergence performance.
The functions $f_8$ to $f_{12}$ are multimodal. These multimodal functions are used to test the global search capability of the proposed algorithm. The smaller the fitness value, the better the algorithm performs. Furthermore, to verify the performance of SCA-CSSA more comprehensively, another 30 complex CEC2017 test functions are used. The CEC2017 test functions are briefly described in Table 2.

Parameter Settings.
To verify the effectiveness and generalization of the proposed SCA-CSSA, it is compared with several hybrid algorithms and popular algorithms. These algorithms are SSA, SCA, SCA_SSA, and SCA_CSSA. Five other popular intelligent algorithms, namely particle swarm optimization (PSO) [19], the bird swarm algorithm (BSA) [20], the crow search algorithm (CSA) [21], the whale optimization algorithm (WOA) [22], and the grasshopper optimization algorithm (GOA) [23], are also compared with SCA-CSSA. These state-of-the-art algorithms allow the performance of SCA-CSSA to be verified more comprehensively. For a fair comparison, the population size of all algorithms is set to 30, and the other parameters of all algorithms are set according to their original papers. The initial control parameters of all algorithms are shown in Table 3.

ALGORITHM 1: The framework of the SCA-CSSA.
Input: the maximum iterations: $iter_{max}$; the number of producers: $P_{Num}$; the number of sparrows who perceive the danger: $SD_{Num}$; the number of sparrows: $pop$; dynamic parameter: $a$;
Output: global optimal position: $X_{best}$; fitness of global optimal position: $f_g$;
Begin:
(1) Initialize a population of sparrows $pop$ and define its relevant parameters.
(2) while ($t < iter_{max}$)
(3)   Rank the fitness values and find the current best individual and the current worst individual.
        Using formula (16), update the sparrow's location;
(7)   end for
(8)   for $i = 1 : pop$
(9)     Using formula (17), update the sparrow's location;
(10)  end for
      /* the new division of labor structure scheme */
(11)  Using formulas (18) and (19), update the producer and scrounger's cooperative location;
(12)  If the new location is better than before, update it using formula (20);
(13)  for $l = 1 : SD_{Num}$
(14)    Using formula (9), update the sparrow's location;
(15)  end for
(16)  Get the current new location;
(17)  If the new location is better than before, update it;
      /* sine cosine algorithm scheme */
(18)  Using formula (14), update the SCA sparrow's location;
(19)  If the new location is better than before, update it using formula (15);

ALGORITHM 2: AdaBoost-ISSA-S4VM.
Input: weak classifier type: S4VM; train data set, train label set, test data set, test label set; the maximum iterations: $M$; kernel: RBF; parameters of S4VM: weight for the hinge loss of labeled instances $C_1$, weight for the hinge loss of unlabeled instances $C_2$, and the sampling times for each trial sampleTime.
Output: prediction label of test data set
(1) Set the weights of the training data set
      If there are misclassification points
      /* Parameter selection based on SCA-CSSA */
(4)     According to SCA-CSSA, find the optimal hyper-parameters ($C_1$, $C_2$) of weak classifier S4VM;
      /* Weight of AdaBoost selection based on SCA-CSSA */
(5)     According to SCA-CSSA, find the optimal weight $\beta_m$ ($m = 1, \ldots, M$) of weak classifier S4VM;
(6)     Using the weight distribution $\beta_m$, calculate the $m$th weak classifier $G_m$;
(7)     Update the weight distribution of the training set $w_{m+1,i}$;
(8)     $m = m + 1$;
(9)   else
(10)    jump out of the loop;
(11)  end
(12) end for
(13) According to formula (11), $m$ groups of weak classifiers are linearly combined, and the final classifier is output;
(14) Use the final classifier to predict the training set classification.

Comparison on Benchmark Functions with Hybrid Algorithms and Popular Algorithms.
The evolution curves of the algorithms on the benchmark functions are shown in Figures 1 and 2, where the horizontal axis represents the number of iterations and the vertical axis represents the fitness value. The convergence speeds of the different algorithms can be seen clearly. The maximum value (Max), the minimum value (Min), the mean value (Mean), and the variance (Var) obtained by the algorithms are shown in Tables 4 and 5, where the best results are marked in bold. Table 4 shows the performance of the algorithms on unimodal functions when FEs = 1000, and Table 5 shows the performance of the algorithms on multimodal functions when FEs = 1000.
(1) Unimodal Functions. The evolution curves of these algorithms on 3 unimodal functions $f_1$, $f_3$, and $f_5$ are given in Figure 1. The figure shows that the curve of SCA-CSSA descends fastest, within far fewer than 10,000 iterations. For $f_1$ and $f_3$, SCA-CSSA has the fastest convergence speed among the algorithms, whereas the original CSA and GOA obtain the worst solutions because they are trapped in local optima prematurely. For function $f_5$, none of the algorithms finds the value 0. However, SCA-CSSA continues to decline, its convergence speed is significantly faster than that of the other algorithms in the early stage, and the solution it eventually finds is the best. Overall, owing to the enhanced diversity of the population, SCA-CSSA has a relatively excellent convergence speed when FEs = 10,000.
From the numerical results on the 7 unimodal functions in Table 4, we can see that SCA-CSSA finds the minimum value on $f_1$, $f_2$, $f_3$, $f_4$, $f_5$, and $f_7$. SCA-CSSA can find the optimal solution for all unimodal functions and reaches the minimum value of 0 on $f_1$, $f_2$, $f_3$, and $f_4$. This illustrates that SCA-CSSA has the best performance on unimodal functions compared with the other algorithms when FEs = 1000. Moreover, SCA-CSSA has the best maximum value (Max), minimum value (Min), mean value (Mean), and standard deviation (Std) on $f_1$, $f_3$, $f_5$, and $f_7$, which indicates a relatively good convergence speed. In summary, compared with these popular algorithms and hybrid algorithms, SCA-CSSA is a competitive algorithm and has the best performance on most of the benchmark functions.
(2) Multimodal Functions. The evolution curves of these algorithms on 3 multimodal functions $f_8$, $f_9$, and $f_{10}$ when FEs = 10,000 are depicted in Figure 2. We can see that SCA-CSSA finds the optimal solution within the same number of iterations. For $f_8$ and $f_{10}$, SCA-CSSA continues to decline and reaches the best value 0, whereas the original PSO and GOA produce flat, parallel curves because of their poor global convergence ability on these 3 functions. For function $f_9$, although SCA-CSSA is also trapped in a local optimum, it finds the smallest value among the algorithms. The convergence speed of SCA-CSSA is significantly faster than that of the other algorithms in the early stage, and the solution it eventually finds is the best. In general, owing to the enhanced diversity of the population, SCA-CSSA has a relatively balanced global search capability when FEs = 10,000.
From the numerical results on the 5 multimodal functions in Table 5, we can see that SCA-CSSA can find the optimal solution for all multimodal functions and reaches the minimum value of 0 on $f_8$ and $f_{10}$. SCA-CSSA performs relatively well on multimodal functions compared with the other algorithms. Moreover, SCA-CSSA has the best maximum value (Max), minimum value (Min), mean value (Mean), and standard deviation (Std) on $f_9$, $f_{10}$, $f_{11}$, and $f_{12}$, which indicates a relatively good global search capability. The main reason is that SCA-CSSA gains a stronger global exploration capability from the SCA method. In summary, SCA-CSSA has a superior global search capability on most multimodal functions when FEs = 1000.

Comparison on CEC2017 Test Functions with Hybrid Algorithms and Popular Algorithms.
To further verify the universality of the proposed SCA_CSSA algorithm, it is compared with PSO, BSA, WOA, GOA, SSA, SCA, and SCA_SSA on the latest CEC2017 test functions. In this experiment, the dimension (Dim) is set to 10, and the number of function evaluations (FEs) is 10,000. The experimental comparisons include the maximum value (Max), the minimum value (Min), the mean value (Mean), and the standard deviation (Std) and are given in Tables 6 and 7, where the best results are marked in bold.
SCA-CSSA obtains the minimum value on $F_3$, $F_4$, $F_6$, $F_{10}$, $F_{11}$, $F_{12}$, $F_{13}$, and $F_{14}$ in Table 6 and on further functions in Table 7. According to the results, SCA-CSSA does well on 21 CEC2017 test functions. Further, SCA-CSSA has the best maximum value (Max), minimum value (Min), mean value (Mean), and standard deviation (Std) on $F_3$, $F_4$, $F_6$, $F_{10}$, $F_{11}$, $F_{12}$, $F_{13}$, $F_{14}$, and $F_{15}$ in Table 6 and on $F_{17}$, $F_{18}$, $F_{20}$, $F_{22}$, $F_{23}$, $F_{24}$, and $F_{25}$ in Table 7. Therefore, SCA-CSSA not only finds the optimal solution but is also stable on 16 CEC2017 test functions. In summary, SCA-CSSA obtains the optimal value on these functions, and it can be concluded that SCA-CSSA has better global search ability and better robustness on these test suites.

Application to Practical Pulmonary Nodule Detection Classification Problem Based on Lung CT Images.
In this section, to evaluate the performance of SCA-CSSA on a real-world optimization problem, the proposed AdaBoost-ISSA-S4VM model is used for pulmonary nodule detection classification. CT images from the LIDC/IDRI database are used for the AdaBoost-ISSA-S4VM classification. To obtain fair results, all the implementations, such as SVM [24], S4VM, AdaBoost-SVM, and AdaBoost-S4VM, are run under the same conditions. The experimental environment for all experiments in this section is the same as in Section 4.1. Each algorithm is run 30 times independently for each classification model. The population size and the maximum generation are set to 30 and 100, respectively.

Design of Pulmonary Nodule Detection Classification System.
To identify and classify lung nodules and non-nodules, the processing module includes the preprocessing of the DICOM image, the extraction of the lung parenchyma and lung nodules, the cropping of the ROI (region of interest) image, the acquisition of ROI feature vectors, and the dimensionality reduction and classification of the vectors. A block diagram of the pulmonary nodule detection classification system based on the improved AdaBoost-ISSA-S4VM is shown in Figure 3.
Step 1 (Image Selection). Lung CT images containing solitary lung nodules are selected. The datasets should be closely related to lung cancer sample analysis and should be relatively independent. The dataset is randomly divided into training and testing samples.
Step 2 (Picture Preprocessing). The lung CT images are read first, as shown in Figure 4(a), and then denoised with the RPCA method improved by weighted nonconvex regularization [25]. After the contrast of the images is enhanced through binarization, the optimal threshold segmentation (OTSU) method is used to sharpen the image, as shown in Figure 4(b).
Step 3 (Lung Parenchyma Extraction). To narrow the range of the lung parenchyma and reduce the difficulty of detection, thus improving the detection accuracy, we fill the lung parenchyma, as shown in Figure 4(c). Then, an XOR operation is applied to the image, and the area of the lung parenchyma is obtained, as shown in Figure 4(d). After external and small areas in the lung parenchyma are deleted, a morphological method is used to repair the edge of the image, as shown in Figure 4(e). Finally, the lung parenchyma template and the preprocessed image are multiplied to obtain the required lung parenchyma, as shown in Figure 4(f).
Step 4 (ROI Region Extraction). The optimal threshold segmentation method is used again to extract the pathological part. After the linear structures are eliminated, small areas are removed by deleting the smaller connected components, as shown in Figure 4(g). Finally, false positives are removed by the dot filter method, which removes linear structures effectively, and the ROI regions are obtained, as shown in Figure 4(h). The ROI regions include lung nodules and non-nodules, as shown in Figures 4(i) and 4(j).
Step 5 (Feature Vector Extraction). To avoid the influence of the particularity, heterogeneity, texture, and complexity of lung nodules on the selection of feature vectors, we introduce the Curvelet transform, which has a rigorous mathematical theory, to supplement the feature vectors obtained by conventional feature extraction methods [24].
Step 6 (Classifier Training and Feature Classification). The feature vectors of the lung nodule dataset obtained by feature vector extraction are input into the AdaBoost-ISSA-S4VM classifier, the AdaBoost-ISSA-S4VM classification algorithm is trained, and the AdaBoost-ISSA-S4VM classification model is finally obtained. The train dataset is identified by the trained AdaBoost-ISSA-S4VM classification model, and the classification results are obtained as output.
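The optimal threshold segmentation used in Steps 2 and 4 can be illustrated with a compact Otsu implementation in Python. This is a generic sketch of Otsu's method (choosing the threshold that maximizes the between-class variance of the gray-level histogram), not the authors' MATLAB code; the function name, the bin count, and the synthetic two-level image are hypothetical.

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Return the gray level that maximizes the between-class variance."""
    hist, edges = np.histogram(np.asarray(image).ravel(), bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    prob = hist / hist.sum()
    w0 = np.cumsum(prob)                   # probability of the darker class
    w1 = 1.0 - w0                          # probability of the brighter class
    cum_mean = np.cumsum(prob * centers)   # cumulative first moment
    total_mean = cum_mean[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (total_mean * w0 - cum_mean) ** 2 / (w0 * w1)
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    return centers[np.argmax(between)]

# Synthetic image: a dark background (50) and a bright region (200).
image = np.concatenate([np.full(100, 50.0), np.full(100, 200.0)]).reshape(10, 20)
threshold = otsu_threshold(image)
mask = image > threshold                   # True on the bright region only
```

On real CT slices the same function would be applied to the denoised image, and the resulting mask would feed the filling, XOR, and morphological steps described above.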

Practical Application.
In Section 2.3, the performance of the proposed ISSA is simulated and analyzed on benchmark functions. To test the application effect of the improved AdaBoost-ISSA-S4VM classification model, lung CT images from the LIDC/IDRI database are selected for experiments. According to the description in the XML annotation file of the case nodule information in the database, the solid solitary lung nodule was analyzed. ROI region extraction on the DICOM image is performed before feature vector extraction, as shown in Figure 4. Figure 4 shows some of the steps of lung nodule extraction, where (a) is the original CT image, (f) is the lung parenchyma after processing, and (i) and (j) are the lung nodules and non-nodules in the ROI regions. Figure 5 shows some lung nodules and non-nodules gained from the experiment. Feature vector extraction on the ROI region datasets is performed before the AdaBoost-ISSA-S4VM classification model is trained on the training dataset, as shown in Table 8. In total, 715 feature vector parameters are extracted, including 12 morphological feature parameters, 10 gray-scale feature parameters, 7 texture feature parameters, and 686 Curvelet transform coefficients. Then, these feature vectors are normalized to prevent features with a large dynamic range from dominating features with a small one, as shown in Table 9.
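The normalization step can be illustrated with a small sketch. The exact normalization formula is not given in the text, so min-max scaling of each feature column to [0, 1] is assumed here, and the function name `min_max_normalize` is hypothetical.

```python
def min_max_normalize(samples):
    """Scale each feature column to [0, 1].

    samples: list of equal-length feature lists, one list per ROI region.
    """
    n_features = len(samples[0])
    lows = [min(s[j] for s in samples) for j in range(n_features)]
    highs = [max(s[j] for s in samples) for j in range(n_features)]
    normalized = []
    for s in samples:
        row = []
        for j in range(n_features):
            span = highs[j] - lows[j]
            # A constant feature has zero span; map it to 0.0.
            row.append((s[j] - lows[j]) / span if span > 0 else 0.0)
        normalized.append(row)
    return normalized
```

After scaling, a feature ranging over hundreds of units and one ranging over a few units contribute on the same [0, 1] scale, which is the effect the normalization is meant to achieve.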
The CT images, after preprocessing, lung parenchyma extraction, lung nodule extraction, and feature extraction, are used to train the pulmonary nodule detection classification model. To measure the performance of the AdaBoost-ISSA-S4VM classification model, we compare the improved classification model with several popular pulmonary nodule detection classification models. The models compared are the SVM model [26], the standard S4VM model, the AdaBoost-SVM model, the AdaBoost-S4VM model, and the AdaBoost-ISSA-S4VM model. To evaluate the performance of the recognition models, the following performance indicators are selected in this paper. The formulas for the classification indexes are shown in Table 10.
ACC is used to evaluate the accuracy of each classification model. SEN and SPE measure the ability to detect true positives and true negatives, respectively. FPR and FNR are the misdiagnosis rate and the missed diagnosis rate, respectively. Table 11 records the performance indicator data of each classification model, and the best results are marked in bold. The larger the SEN and SPE are, the better the classification model performs. Conversely, the smaller the FPR and FNR are, the better the classification model performs.
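As a worked illustration of these indicators, the sketch below computes them from the four confusion-matrix counts. It assumes the standard definitions (for example, SEN = TP / (TP + FN) and FPR = FP / (FP + TN)), which are taken to match the formulas in Table 10. The function name and the counts in the usage note are hypothetical, not experimental results.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute ACC, SEN, SPE, FPR, and FNR from confusion-matrix counts.

    tp, fn: nodules classified correctly / missed.
    tn, fp: non-nodules classified correctly / flagged as nodules.
    """
    total = tp + tn + fp + fn
    return {
        "ACC": (tp + tn) / total,  # overall accuracy
        "SEN": tp / (tp + fn),     # sensitivity (true positive rate)
        "SPE": tn / (tn + fp),     # specificity (true negative rate)
        "FPR": fp / (fp + tn),     # misdiagnosis rate, equal to 1 - SPE
        "FNR": fn / (fn + tp),     # missed diagnosis rate, equal to 1 - SEN
    }
```

For instance, hypothetical counts of 40 true positives, 45 true negatives, 5 false positives, and 10 false negatives give ACC = 0.85, SEN = 0.8, SPE = 0.9, FPR = 0.1, and FNR = 0.2.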
From the performance indicator data of each classification model in Table 11, we can see that the classification accuracy of the AdaBoost-ISSA-S4VM classification model is comparable to or even better than that of supervised classifiers such as SVM. First, the classification accuracy of the S4VM classifier is quite poor and far lower than that of the SVM. The reason is that the SVM is a supervised classifier whose input dataset is labeled, while S4VM is a semi-supervised classifier whose input dataset contains unlabeled data, which reduces the accuracy of the classifier. Second, the S4VM classifier optimized by ensemble learning is better than the SVM combined with ensemble learning. Third, the established AdaBoost-ISSA-S4VM, which is the S4VM classifier optimized by ensemble learning and SCA-CSSA, reaches a classification accuracy of 94.22% on labeled and unlabeled lung CT images, which is much higher than that of the original supervised classifier on labeled samples. At the same time, the false positive rate and false negative rate of the established AdaBoost-ISSA-S4VM are 0.1234 and 0.0146 on these images, so the model also performs well on both indicators. Based on the above results, the proposed classification model is better than traditional supervised classifiers such as the SVM model on lung nodule classification.

Conclusion
In summary, an improved semi-supervised ensemble classifier (AdaBoost-ISSA-S4VM) is proposed for semi-supervised problems by combining the AdaBoost classifier, the semi-supervised SVM, and the improved sparrow search algorithm. The proposed algorithm is applied to lung CT images for pulmonary nodule detection, and a detailed performance comparison and analysis are presented based on the publicly available LIDC-IDRI database. The improved algorithm obtains better experimental results than the SVM, S4VM, AdaBoost-SVM, and AdaBoost-S4VM algorithms. In particular, the proposed AdaBoost-ISSA-S4VM improves accuracy by 21% over the standard SVM and by 26% over S4VM. This study demonstrates that the established AdaBoost-ISSA-S4VM classifier can solve the problem of pulmonary nodule detection on labeled and unlabeled lung CT images. In other words, the proposed AdaBoost-ISSA-S4VM classifier has the potential to improve lung CT image classification using both labeled and unlabeled lung CT images, with a high probability of detecting cancers at an early stage.
Although the proposed AdaBoost-ISSA-S4VM has been proven to be effective in solving general optimization problems, it has some shortcomings that warrant further investigation. In particular, owing to the improvement strategies, AdaBoost-ISSA-S4VM needs more time than the classical S4VM and most supervised classifiers. Therefore, deploying the proposed algorithm in a way that increases recognition efficiency is a worthwhile direction. In future research, the method presented in this paper can also be extended to discrete optimization problems and multiobjective optimization problems. Furthermore, applying the proposed AdaBoost-ISSA-S4VM model to other fields, such as financial prediction and biomedical diagnosis, is also interesting future work.

Data Availability
All data included in this study are available upon request by contacting the corresponding author. All the lung CT images used for pulmonary nodule detection in this study can be found in the free, publicly available LIDC/IDRI database.

Conflicts of Interest
The authors declare that they have no conflicts of interest.