Kernel-Free Nonlinear Support Vector Machines for Multiview Binary Classification Problems

Abstract. Multiview learning (MVL) frequently uses support vector machine (SVM)-based models, but it can be difficult to select appropriate kernel functions and corresponding parameters. By introducing kernel-free tricks, two multiview classifiers are proposed, called the C-multiview kernel-free nonlinear support vector machine (C-MKNSVM) and its ν-version, namely, ν-MKNSVM. They seek a quadratic hypersurface under each view to classify the sample points and employ a consistency constraint to fuse the sample points from the two views. Neither the primal nor the dual problems of C-MKNSVM and ν-MKNSVM involve kernel functions; thus, they can be solved directly. In addition, the relationship between the solutions of the primal and dual problems is discussed for each classifier. For the C-version and ν-version of MKNSVM, the meanings of their parameters and the relationship between them are analyzed in detail. Experimental results on artificial and benchmark datasets show that our methods, especially ν-MKNSVM, are superior to some traditional MVL classifiers such as SVM-2K, PSVM-2V, and MvTSVM.


Introduction
For the multiview binary classification problem, two classifiers are proposed in this paper. They are based on both kernel-free learning and multiview learning (MVL).
For kernel-free learning, our first foundation, Dagher [1] first proposed a kernel-free nonlinear classifier, namely, the quadratic surface support vector machine (QSSVM), which is based on the idea of the maximum geometric margin: the classification hyperplane is replaced by a quadratic hypersurface. Later, in order to improve the generalization ability and classification accuracy of the model, Luo [2] proposed a soft QSSVM. Subsequently, their team developed a one-class QSSVM [3] and applied the QSSVM model to assess credit risk [4, 5]. Bai et al. [6] developed a least squares QSSVM for target disease classification and achieved good performance. Mousavi [7] proposed a QSSVM based on an L1-norm regularization term. Following these leading works, some scholars performed further studies [8-11]. Recently, Gao et al. [12] proposed an improved version of QSSVMs based on a double potential function. In addition, the minimax probability machine [13] and the ε-quadratic surface SVR [14] based on QSSVMs were proposed by our team.
As our second foundation, MVL [15-19] focuses on multiview data. One branch of MVL algorithms is based on the support vector machine (SVM) [20-23]. In 2005, Farquhar [24] proposed SVM-2K, which combined a standard SVM with kernel canonical correlation analysis. Following the consensus principle, it minimizes the distance between the predictive functions of the two views as well as the loss within each view. Later, Xie and Sun [25] developed MvTSVMs on the basis of twin SVMs by introducing coupling terms and coupling coefficients. Tang et al. [26] proposed PSVM-2V, which generalizes a privileged classifier. Niu et al. [27] proposed MSVMCFs, which can select low-dimensional features for classification. Meanwhile, MVGSVM [28] was proposed by Sun et al. Cheng et al. [29] developed an improved MVGEPSVM, in which a multiview regularization term and an L1 norm were introduced. Recently, Yu et al. [30] proposed IBMvSVM, which focuses on the characteristics of each instance itself in different views rather than treating all instances equally. Xie and Xiong [31] developed two generalized multiview extensions of generalized eigenvalue proximal support vector machines, which can handle generalized multiview learning cases. In addition, many researchers have developed various improved versions [32-37]. However, it should be noted that the methods mentioned above require kernel tricks to achieve multiview nonlinear classification, so their interpretability is weak. In addition, it is usually difficult to select appropriate kernel functions and corresponding parameters.
It is therefore interesting to study multiview kernel-free nonlinear classifiers using the above kernel-free trick. Thus, we propose two novel multiview kernel-free nonlinear classifiers, namely, C-MKNSVM and ν-MKNSVM. In summary, our main contributions can be outlined as follows:

(1) C-MKNSVM is proposed by combining the kernel-free trick with the consensus principle of multiview learning. The penalty parameters C1 and C2 of this model take values in (0, +∞). In addition, an alternative ν-version approach is also developed, namely, ν-MKNSVM. The parameters ν1 and ν2 involved in this model range over (0, 1]. Hence, the parameters of our ν-MKNSVM are easier to tune than those of our C-MKNSVM.

(2) The two classifiers do not involve kernel functions but instead look for a quadratic hypersurface to separate the data in each view. Therefore, they not only avoid the choice of kernel functions and corresponding parameters but also have stronger interpretability. Moreover, both the primal and dual problems of C-MKNSVM and ν-MKNSVM can be solved directly. It should be emphasized that solving the primal problems of these two classifiers does not require matrix inversion.

(3) For our C-MKNSVM and ν-MKNSVM, the relationships between the solutions of the primal and dual problems are discussed in the theoretical analysis. In addition, the meanings of the parameters are addressed for the two classifiers, respectively. In fact, the parameter ν has a more intuitive meaning than the parameter C. Furthermore, the quantitative relationship between ν-MKNSVM and C-MKNSVM is established.

The rest of this paper is organized as follows: Section 2 briefly reviews related works. Section 3 introduces our two proposed multiview quadratic classifiers, C-MKNSVM and ν-MKNSVM, and gives the solution procedure in detail. Section 4 analyzes the relationship between our two classifiers and their parameters. Section 5 presents the numerical experiments. Section 6 concludes this paper.

Related Work
This section is separated into three subsections: the first concentrates on the notation used throughout the paper; the second concentrates on several classical binary SVM-based classifiers, including C-SVM, ν-SVM, and SQSSVM; and the third reviews SVM-2K for multiview learning.

Notations.
Throughout this paper, we use lowercase letters to denote scalars, lowercase bold letters to denote vectors, and uppercase bold letters to denote matrices. The set of real numbers is written as R, the set of m-dimensional real vectors as R^m, and the set of real symmetric m × m matrices as S^m. We use I to represent the identity matrix of appropriate dimension.
For further convenience, we give the following three operators, which describe the mappings between matrices and vectors used in the sequel.

Definition 1. For any symmetric matrix W = [w_ij]_{m×m} ∈ S^m, we define its half-vectorization operator as

$$\operatorname{hvec}(\mathbf{W}) \triangleq \left(w_{11}, w_{12}, \ldots, w_{1m}, w_{22}, w_{23}, \ldots, w_{2m}, \ldots, w_{mm}\right)^{T} \in \mathbb{R}^{m(m+1)/2}.$$

Definition 2. For any vector x = (x_1, x_2, ..., x_m)^T ∈ R^m, we define its quadratic mapping operator q(x) so that (1/2) x^T W x = hvec(W)^T q(x) for every W ∈ S^m:

$$\mathbf{q}(\mathbf{x}) \triangleq \left(\tfrac{1}{2}x_{1}^{2},\; x_{1}x_{2}, \ldots, x_{1}x_{m},\; \tfrac{1}{2}x_{2}^{2},\; x_{2}x_{3}, \ldots, \tfrac{1}{2}x_{m}^{2}\right)^{T} \in \mathbb{R}^{m(m+1)/2}.$$

Definition 3. For any vector u ∈ R^{m(m+1)/2}, we define its matrixing operator mat(u) as the inverse of the half-vectorization, i.e., mat(hvec(W)) = W for every W ∈ S^m. From Definition 3, the matrix mat(u) can be illustrated for m = 3 as

$$\operatorname{mat}(\mathbf{u}) = \begin{pmatrix} u_{1} & u_{2} & u_{3} \\ u_{2} & u_{4} & u_{5} \\ u_{3} & u_{5} & u_{6} \end{pmatrix}.$$
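As a concrete aid, the following Python sketch implements one consistent convention for these three operators; the paper's exact scaling of the cross terms may differ, so the halving of the diagonal products in q(x) is an assumption made here so that (1/2) x^T W x = hvec(W)^T q(x) holds.

```python
import numpy as np

def hvec(W):
    """Half-vectorization: stack the upper-triangular entries of a
    symmetric m x m matrix W row by row (Definition 1)."""
    m = W.shape[0]
    return np.concatenate([W[i, i:] for i in range(m)])

def qmap(x):
    """Quadratic mapping paired with hvec (Definition 2): returns q(x)
    with 0.5 * x^T W x == hvec(W) @ qmap(x) for any symmetric W.
    Assumed convention: diagonal products halved, cross products once."""
    x = np.asarray(x, dtype=float)
    parts = []
    for i in range(x.shape[0]):
        row = x[i] * x[i:]
        row[0] *= 0.5          # 0.5 * x_i^2 for the diagonal entry
        parts.append(row)
    return np.concatenate(parts)

def mat(u, m):
    """Matrixing operator (Definition 3): inverse of hvec, rebuilding the
    symmetric m x m matrix from its half-vectorization."""
    W = np.zeros((m, m))
    k = 0
    for i in range(m):
        W[i, i:] = u[k:k + m - i]
        k += m - i
    return np.triu(W) + np.triu(W, 1).T
```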

Related Binary Classification SVMs.
For a traditional binary classification problem, we are given the training set

$$T = \left\{\left(\mathbf{x}_{1}, y_{1}\right), \left(\mathbf{x}_{2}, y_{2}\right), \ldots, \left(\mathbf{x}_{l}, y_{l}\right)\right\},$$

where x_i ∈ R^m is the i-th input and y_i ∈ {+1, -1} is the i-th output, i = 1, 2, ..., l. The goal is to find a separating hyperplane

$$\mathbf{w}^{T}\mathbf{x} + b = 0,$$

with the decision function f(x) = sgn(w^T x + b). The optimization problem of the standard SVM is formulated as

$$\begin{aligned} \min_{\mathbf{w}, b, \boldsymbol{\xi}} \quad & \frac{1}{2}\|\mathbf{w}\|^{2} + C\sum_{i=1}^{l}\xi_{i} & (7) \\ \text{s.t.} \quad & y_{i}\left(\mathbf{w}^{T}\mathbf{x}_{i} + b\right) \geq 1 - \xi_{i}, & (8) \\ & \xi_{i} \geq 0, \quad i = 1, 2, \ldots, l, & (9) \end{aligned}$$

where ξ = (ξ_1, ξ_2, ..., ξ_l)^T is a slack vector and C is a nonnegative penalty parameter. The standard SVM is also called C-SVM. In C-SVM, the penalty parameter C trades off two conflicting goals: maximizing the margin and minimizing the training error. The value of C is qualitatively clear: a larger C implies that more attention is paid to minimizing the training error. However, it lacks quantitative meaning.
In order to overcome this drawback, another classical SVM, ν-SVM, was proposed in [38]. Its optimization problem is

$$\begin{aligned} \min_{\mathbf{w}, b, \boldsymbol{\xi}, \rho} \quad & \frac{1}{2}\|\mathbf{w}\|^{2} - \nu\rho + \frac{1}{l}\sum_{i=1}^{l}\xi_{i} & (10) \\ \text{s.t.} \quad & y_{i}\left(\mathbf{w}^{T}\mathbf{x}_{i} + b\right) \geq \rho - \xi_{i}, & (11) \\ & \xi_{i} \geq 0, \quad i = 1, 2, \ldots, l, \quad \rho \geq 0, & (12) \end{aligned}$$

where ξ = (ξ_1, ξ_2, ..., ξ_l)^T is a slack vector and ν ∈ (0, 1] is a preselected parameter. The significance of the parameter ν is that it lets one effectively control the number of support vectors.
It is well known that both of the aforementioned SVM-based methods need the kernel trick to obtain a nonlinear separation. Because a kernel function is introduced, the dual problems of the primal problems (7)-(9) and (10)-(12) are generally solved to obtain the normal vector w and the bias b of the separating hyperplane. For the traditional binary classification problem, C-SVM and ν-SVM are both popular. In fact, there is an equivalence relation between C-SVM and ν-SVM based on the quantitative relation between the parameters C and ν [38, 39].
Recently, a series of kernel-free SVM models [12, 14] have been developed; a typical one is the soft quadratic surface support vector machine (SQSSVM) [2]. For the given training set, the goal of the SQSSVM is to find a quadratic separating hypersurface

$$g(\mathbf{x}) = \frac{1}{2}\mathbf{x}^{T}\mathbf{W}\mathbf{x} + \mathbf{b}^{T}\mathbf{x} + c = 0, \qquad (13)$$

where W ∈ S^m, b ∈ R^m, and c ∈ R. The value of y for any x is derived by the decision function

$$f(\mathbf{x}) = \operatorname{sgn}\left(\frac{1}{2}\mathbf{x}^{T}\mathbf{W}\mathbf{x} + \mathbf{b}^{T}\mathbf{x} + c\right). \qquad (14)$$

In order to obtain the quadratic hypersurface (13), the SQSSVM establishes the following optimization problem:

$$\begin{aligned} \min_{\mathbf{W}, \mathbf{b}, c, \boldsymbol{\xi}} \quad & \frac{1}{2}\sum_{i=1}^{l}\left\|\mathbf{W}\mathbf{x}_{i} + \mathbf{b}\right\|^{2} + C\sum_{i=1}^{l}\xi_{i} & (15) \\ \text{s.t.} \quad & y_{i}\left(\frac{1}{2}\mathbf{x}_{i}^{T}\mathbf{W}\mathbf{x}_{i} + \mathbf{b}^{T}\mathbf{x}_{i} + c\right) \geq 1 - \xi_{i}, & (16) \\ & \xi_{i} \geq 0, \quad i = 1, 2, \ldots, l, \quad \mathbf{W} \in S^{m}, & (17) \end{aligned}$$

where ξ = (ξ_1, ξ_2, ..., ξ_l)^T is a slack vector and C is a nonnegative penalty parameter. Taking advantage of the symmetry of the matrix W, the optimization problem (15)-(17) can be further simplified by using Definitions 1-3 above. For more details, readers are referred to [2].
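Because the problem is a convex QP in (W, b, c), it can be posed directly with an off-the-shelf modeling package. The following Python sketch uses cvxpy (the paper's experiments use MATLAB, so this tool choice and the function name train_sqssvm are illustrative assumptions, not the authors' code):

```python
import numpy as np
import cvxpy as cp

def train_sqssvm(X, y, C=1.0):
    """Sketch: the SQSSVM primal (15)-(17) posed directly as a convex QP.
    X: (l, m) sample matrix; y: (l,) labels in {+1, -1}."""
    l, m = X.shape
    W = cp.Variable((m, m), symmetric=True)
    b = cp.Variable(m)
    c = cp.Variable()
    xi = cp.Variable(l, nonneg=True)
    obj = C * cp.sum(xi)
    cons = []
    for i in range(l):
        # Margin surrogate: squared gradient norm ||W x_i + b||^2 at sample i.
        obj += 0.5 * cp.sum_squares(W @ X[i] + b)
        g_i = 0.5 * cp.sum(cp.multiply(np.outer(X[i], X[i]), W)) + b @ X[i] + c
        cons.append(y[i] * g_i >= 1 - xi[i])   # quadratic-surface margin constraint
    cp.Problem(cp.Minimize(obj), cons).solve()
    return W.value, b.value, c.value
```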

SVM-2K.
For a two-view binary classification problem, we are given the training set

$$T_{v} = \left\{\left(\mathbf{x}_{1}^{A}, \mathbf{x}_{1}^{B}, y_{1}\right), \ldots, \left(\mathbf{x}_{l}^{A}, \mathbf{x}_{l}^{B}, y_{l}\right)\right\}, \qquad (18)$$

where x_i^A ∈ R^{m1} and x_i^B ∈ R^{m2} are the i-th inputs of view A and view B, respectively, and y_i ∈ {+1, -1} is the i-th output, i = 1, 2, ..., l.
For the given training set (18), the goal of SVM-2K [24] is to seek a pair of hyperplanes under view A and view B:

$$f_{A}\left(\mathbf{x}^{A}\right) = \mathbf{w}_{A}^{T}\mathbf{x}^{A} + b_{A} = 0, \qquad (19)$$
$$f_{B}\left(\mathbf{x}^{B}\right) = \mathbf{w}_{B}^{T}\mathbf{x}^{B} + b_{B} = 0. \qquad (20)$$

The decision function can be constructed as

$$f\left(\mathbf{x}^{A}, \mathbf{x}^{B}\right) = \operatorname{sgn}\left(\frac{1}{2}\left(f_{A}\left(\mathbf{x}^{A}\right) + f_{B}\left(\mathbf{x}^{B}\right)\right)\right). \qquad (21)$$

In order to obtain the pair of hyperplanes (19)-(20), SVM-2K establishes the following optimization problem:

$$\begin{aligned} \min \quad & \frac{1}{2}\left\|\mathbf{w}_{A}\right\|^{2} + \frac{1}{2}\left\|\mathbf{w}_{B}\right\|^{2} + C_{1}\sum_{i=1}^{l}\xi_{i}^{A} + C_{2}\sum_{i=1}^{l}\xi_{i}^{B} + D\sum_{i=1}^{l}\eta_{i} & (22) \\ \text{s.t.} \quad & \left|f_{A}\left(\mathbf{x}_{i}^{A}\right) - f_{B}\left(\mathbf{x}_{i}^{B}\right)\right| \leq \eta_{i} + \varepsilon, & (23) \\ & y_{i}f_{A}\left(\mathbf{x}_{i}^{A}\right) \geq 1 - \xi_{i}^{A}, \quad y_{i}f_{B}\left(\mathbf{x}_{i}^{B}\right) \geq 1 - \xi_{i}^{B}, & (24) \\ & \xi_{i}^{A} \geq 0, \ \xi_{i}^{B} \geq 0, \ \eta_{i} \geq 0, \quad i = 1, 2, \ldots, l, & (25) \end{aligned}$$

where C1, C2, D, and ε are nonnegative parameters and ξ^{A/B} = (ξ_1^{A/B}, ..., ξ_l^{A/B})^T and η = (η_1, ..., η_l)^T are slack variables. The similarity constraint (23) integrates the two views, coupling two SVMs from two distinct feature spaces. Similar to classical SVM-based methods, the solution of the optimization problem (22)-(25) is obtained by solving its dual problem. The kernel trick can then be used in SVM-2K, and two different kernel functions can be adopted by the two views, respectively.
For the two-view training set (18), our goal is to find a pair of quadratic hypersurfaces of the following form under views A and B:

$$g_{A}\left(\mathbf{x}^{A}\right) = \frac{1}{2}\left(\mathbf{x}^{A}\right)^{T}\mathbf{W}_{1}\mathbf{x}^{A} + \mathbf{b}_{1}^{T}\mathbf{x}^{A} + c_{1} = 0, \qquad (26)$$
$$g_{B}\left(\mathbf{x}^{B}\right) = \frac{1}{2}\left(\mathbf{x}^{B}\right)^{T}\mathbf{W}_{2}\mathbf{x}^{B} + \mathbf{b}_{2}^{T}\mathbf{x}^{B} + c_{2} = 0, \qquad (27)$$

where W1 ∈ S^{m1} and W2 ∈ S^{m2} are symmetric matrices, b1 ∈ R^{m1} and b2 ∈ R^{m2} are column vectors, and c1 and c2 are scalars. Once the quadratic hypersurfaces (26)-(27) are obtained, the following decision function can be used to label a new unknown sample point (x^A, x^B):

$$f\left(\mathbf{x}^{A}, \mathbf{x}^{B}\right) = \operatorname{sgn}\left(\frac{1}{2}\left(g_{A}\left(\mathbf{x}^{A}\right) + g_{B}\left(\mathbf{x}^{B}\right)\right)\right). \qquad (28)$$
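A minimal Python sketch of the joint decision rule (28), assuming the averaged-score form reconstructed above:

```python
import numpy as np

def decide(xA, xB, W1, b1, c1, W2, b2, c2):
    """Joint two-view prediction by averaging the two quadratic scores;
    either single-view score gA or gB can also be used on its own."""
    gA = 0.5 * xA @ W1 @ xA + b1 @ xA + c1
    gB = 0.5 * xB @ W2 @ xB + b2 @ xB + c2
    return np.sign(0.5 * (gA + gB))
```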

C-Multiview Kernel-Free Nonlinear Support Vector Machine (C-MKNSVM).
In order to obtain the pair of quadratic hypersurfaces (26)-(27), the following optimization problem is established:

$$\begin{aligned} \min \quad & \frac{1}{2}\sum_{i=1}^{l}\left\|\mathbf{W}_{1}\mathbf{x}_{i}^{A} + \mathbf{b}_{1}\right\|^{2} + \frac{1}{2}\sum_{i=1}^{l}\left\|\mathbf{W}_{2}\mathbf{x}_{i}^{B} + \mathbf{b}_{2}\right\|^{2} + C_{1}\sum_{i=1}^{l}\xi_{i}^{1} + C_{2}\sum_{i=1}^{l}\xi_{i}^{2} + D\sum_{i=1}^{l}\eta_{i} & (29) \\ \text{s.t.} \quad & y_{i}\,g_{A}\left(\mathbf{x}_{i}^{A}\right) \geq 1 - \xi_{i}^{1}, \quad y_{i}\,g_{B}\left(\mathbf{x}_{i}^{B}\right) \geq 1 - \xi_{i}^{2}, & (30) \\ & \left|g_{A}\left(\mathbf{x}_{i}^{A}\right) - g_{B}\left(\mathbf{x}_{i}^{B}\right)\right| \leq \varepsilon + \eta_{i}, & (31) \\ & \xi_{i}^{1} \geq 0, \ \xi_{i}^{2} \geq 0, \ \eta_{i} \geq 0, \quad i = 1, 2, \ldots, l, & (32) \end{aligned}$$

where C1, C2, and D are nonnegative penalty parameters, ε is a nonnegative constant that tends to zero, and ξ^{1/2} = (ξ_1^{1/2}, ..., ξ_l^{1/2})^T and η = (η_1, ..., η_l)^T are slack vectors. We can see that the optimization problem (29)-(32) is constructed from two distinct C-version SQSSVMs. In particular, the constraint (31) follows the consensus principle and integrates the two views. In other words, the similarity constraint (31) is introduced so that the prediction results of the same sample points under the two views are as consistent as possible. The error of the similarity is measured by the ε-insensitive loss function.
Using the symmetry of the matrices W1 and W2, the optimization problem (29)-(32) can be simplified by Definitions 1-3. Writing w1 = (hvec(W1)^T, b1^T)^T, w2 = (hvec(W2)^T, b2^T)^T and s_i^A = (q(x_i^A)^T, (x_i^A)^T)^T, s_i^B = (q(x_i^B)^T, (x_i^B)^T)^T, i = 1, 2, ..., l, the first two terms of the objective function (29) can be converted into the quadratic forms

$$\frac{1}{2}\mathbf{w}_{1}^{T}\mathbf{G}_{1}\mathbf{w}_{1} + \frac{1}{2}\mathbf{w}_{2}^{T}\mathbf{G}_{2}\mathbf{w}_{2}, \qquad (33)$$

where G1 and G2 are positive semidefinite matrices assembled from the training inputs, and the quadratic functions in the constraints (30)-(32) are transformed into the linear forms

$$g_{A}\left(\mathbf{x}_{i}^{A}\right) = \mathbf{w}_{1}^{T}\mathbf{s}_{i}^{A} + c_{1}, \qquad g_{B}\left(\mathbf{x}_{i}^{B}\right) = \mathbf{w}_{2}^{T}\mathbf{s}_{i}^{B} + c_{2}. \qquad (34)$$

Then, the optimization problem (29)-(32) is equivalent to

$$\begin{aligned} \min \quad & \frac{1}{2}\mathbf{w}_{1}^{T}\mathbf{G}_{1}\mathbf{w}_{1} + \frac{1}{2}\mathbf{w}_{2}^{T}\mathbf{G}_{2}\mathbf{w}_{2} + C_{1}\sum_{i=1}^{l}\xi_{i}^{1} + C_{2}\sum_{i=1}^{l}\xi_{i}^{2} + D\sum_{i=1}^{l}\eta_{i} & (35) \\ \text{s.t.} \quad & y_{i}\left(\mathbf{w}_{1}^{T}\mathbf{s}_{i}^{A} + c_{1}\right) \geq 1 - \xi_{i}^{1}, \quad y_{i}\left(\mathbf{w}_{2}^{T}\mathbf{s}_{i}^{B} + c_{2}\right) \geq 1 - \xi_{i}^{2}, & (36) \\ & \left|\mathbf{w}_{1}^{T}\mathbf{s}_{i}^{A} + c_{1} - \mathbf{w}_{2}^{T}\mathbf{s}_{i}^{B} - c_{2}\right| \leq \varepsilon + \eta_{i}, & (37) \\ & \xi_{i}^{1} \geq 0, \ \xi_{i}^{2} \geq 0, \ \eta_{i} \geq 0, \quad i = 1, 2, \ldots, l. & (38) \end{aligned}$$

Since w^T G1 w ≥ 0 and w^T G2 w ≥ 0 hold for any w, the optimization problem (35)-(38) is a convex quadratic programming problem.
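As with the SQSSVM sketch above, this primal can be posed directly as a convex QP without any kernel. The following hedged cvxpy sketch works in the matrix variables of (29)-(32), which is equivalent to the vectorized form (35)-(38); the function name and default parameter values are illustrative assumptions:

```python
import numpy as np
import cvxpy as cp

def train_c_mknsvm(XA, XB, y, C1=1.0, C2=1.0, D=1.0, eps=1e-6):
    """Sketch of the C-MKNSVM primal (29)-(32) as a convex QP.
    XA: (l, mA) view-A samples; XB: (l, mB) view-B samples; y in {+1, -1}."""
    l, mA = XA.shape
    mB = XB.shape[1]
    W1 = cp.Variable((mA, mA), symmetric=True); b1 = cp.Variable(mA); c1 = cp.Variable()
    W2 = cp.Variable((mB, mB), symmetric=True); b2 = cp.Variable(mB); c2 = cp.Variable()
    xi1 = cp.Variable(l, nonneg=True); xi2 = cp.Variable(l, nonneg=True)
    eta = cp.Variable(l, nonneg=True)
    obj = C1 * cp.sum(xi1) + C2 * cp.sum(xi2) + D * cp.sum(eta)
    cons = []
    for i in range(l):
        gA = 0.5 * cp.sum(cp.multiply(np.outer(XA[i], XA[i]), W1)) + b1 @ XA[i] + c1
        gB = 0.5 * cp.sum(cp.multiply(np.outer(XB[i], XB[i]), W2)) + b2 @ XB[i] + c2
        obj += 0.5 * cp.sum_squares(W1 @ XA[i] + b1) + 0.5 * cp.sum_squares(W2 @ XB[i] + b2)
        cons += [cp.abs(gA - gB) <= eps + eta[i],   # consistency constraint (31)
                 y[i] * gA >= 1 - xi1[i],           # view-A margin (30)
                 y[i] * gB >= 1 - xi2[i]]           # view-B margin (30)
    cp.Problem(cp.Minimize(obj), cons).solve()
    return (W1.value, b1.value, c1.value), (W2.value, b2.value, c2.value)
```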
In addition, if the matrices G1 and G2 degenerate into identity matrices, then the optimization problem (35)-(38) is similar in form to the primal problem (22)-(25) of SVM-2K. In particular, to obtain a nonlinear separating hypersurface in each view, the primal problem (35)-(38) does not need to introduce any kernel function, so it can be solved directly. By contrast, SVM-2K needs to introduce a kernel function, so its dual problem generally must be solved.
The dual problem of the optimization problem (35)-(38) can also be constructed and solved. For the optimization problem (35)-(38), introducing the nonnegative Lagrange multiplier vectors, the Lagrange function (39) is formulated, and, according to the Karush-Kuhn-Tucker (KKT) conditions [40], the optimality conditions of the optimization problem (35)-(38) can be written as (40)-(47). Substituting the stationarity conditions (44)-(45) into the Lagrange function (39) and combining them with the inequalities (46)-(47), we obtain the dual problem (48)-(52) of the primal problem (35)-(38). Clearly, the dual problem (48)-(52) is also a convex QP problem and is easy to solve. According to duality theory, the solution of the primal problem (35)-(38) can be represented by the solution of the dual problem (48)-(52). Intuitively, it is more advantageous to solve the dual problem (48)-(52) when the number of sample points is relatively small. In contrast, solving the primal problem (35)-(38) directly avoids computing inverse matrices.
Next, we give the relationships between the solutions of the primal problem (35)-(38) and the dual problem (48)-(52) in the following theorem (Theorem 1).
After obtaining the optimal solution (w1, c1, w2, c2) of the optimization problem (35)-(38) from Theorem 1, the optimal solutions W1, b1 and W2, b2 of the primal problem (29)-(32) can be recovered by the inverse (matrixing) operation of Definition 3 applied to the corresponding parts of w1 and w2. For an unknown new sample point (x^A, x^B), its label can be predicted using the information from both views by the decision function (28); it is also possible to predict from view A or view B alone using single-view information. □

ν-Multiview Kernel-Free Nonlinear Support Vector Machine (ν-MKNSVM).
In this subsection, for the two-view binary classification problem (18), we develop a ν-version multiview kernel-free classifier, called ν-MKNSVM. In order to obtain the pair of quadratic hypersurfaces (26)-(27), the primal problem of ν-MKNSVM is established as

$$\begin{aligned} \min \quad & \frac{1}{2}\sum_{i=1}^{l}\left\|\mathbf{W}_{1}\mathbf{x}_{i}^{A} + \mathbf{b}_{1}\right\|^{2} + \frac{1}{2}\sum_{i=1}^{l}\left\|\mathbf{W}_{2}\mathbf{x}_{i}^{B} + \mathbf{b}_{2}\right\|^{2} - \nu_{1}\rho_{1} - \nu_{2}\rho_{2} + \frac{1}{l}\sum_{i=1}^{l}\left(\xi_{i}^{1} + \xi_{i}^{2}\right) + D\sum_{i=1}^{l}\eta_{i} & (55) \\ \text{s.t.} \quad & y_{i}\,g_{A}\left(\mathbf{x}_{i}^{A}\right) \geq \rho_{1} - \xi_{i}^{1}, \quad y_{i}\,g_{B}\left(\mathbf{x}_{i}^{B}\right) \geq \rho_{2} - \xi_{i}^{2}, & (56) \\ & \left|g_{A}\left(\mathbf{x}_{i}^{A}\right) - g_{B}\left(\mathbf{x}_{i}^{B}\right)\right| \leq \varepsilon + \eta_{i}, & (57) \\ & \xi_{i}^{1} \geq 0, \ \xi_{i}^{2} \geq 0, \ \eta_{i} \geq 0, \ i = 1, 2, \ldots, l, \quad \rho_{1} \geq 0, \ \rho_{2} \geq 0, & (58) \end{aligned}$$

where ν1, ν2, and D are nonnegative penalty parameters, ε is a nonnegative constant that tends to zero, and ρ1 and ρ2 are variables controlling the soft margins in views A and B.
Similarly, using Definitions 1-3, the primal problem (55)-(58) can be simplified into the vectorized convex QP (59)-(62), in which the quadratic terms again take the form (1/2)w1^T G1 w1 + (1/2)w2^T G2 w2 and the hypersurfaces become linear functions of w1 and w2. According to duality theory, the dual problem of the optimization problem (59)-(62) is given by (63)-(68). Similar to Theorem 1, the relationships between the solutions of the primal problem (59)-(62) and the dual problem (63)-(68) are given by Theorem 2.
After obtaining the optimal solution (w1, c1, w2, c2) of the optimization problem (59)-(62) from Theorem 2, the optimal solutions W1, b1 and W2, b2 of the primal problem (55)-(58) can be recovered by the inverse (matrixing) operation of Definition 3. As before, the decision function (28) is used to predict the label of a new sample point (x^A, x^B). We can see that the solving process of ν-MKNSVM is similar to that of C-MKNSVM; the main difference is that the penalty parameters satisfy ν1, ν2 ∈ (0, 1] and that the variables ρ1 and ρ2, which control the soft margins under views A and B, adapt to the specific problem. It should be pointed out that, unlike other nonlinear models with kernel functions, C-MKNSVM and ν-MKNSVM are nonlinear yet kernel-free. They have good interpretability because they avoid selecting kernel functions and corresponding parameters.
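For comparison with the C-version sketch above, here is a hedged cvxpy sketch of the ν-version primal (55)-(58): the unit margins become the variables ρ1, ρ2, which enter the objective with weights -ν1, -ν2 (again, names and defaults are illustrative assumptions):

```python
import numpy as np
import cvxpy as cp

def train_nu_mknsvm(XA, XB, y, nu1=0.5, nu2=0.5, D=1.0, eps=1e-6):
    """Sketch of the nu-MKNSVM primal (55)-(58) as a convex QP."""
    l, mA = XA.shape
    mB = XB.shape[1]
    W1 = cp.Variable((mA, mA), symmetric=True); b1 = cp.Variable(mA); c1 = cp.Variable()
    W2 = cp.Variable((mB, mB), symmetric=True); b2 = cp.Variable(mB); c2 = cp.Variable()
    xi1 = cp.Variable(l, nonneg=True); xi2 = cp.Variable(l, nonneg=True)
    eta = cp.Variable(l, nonneg=True)
    rho1 = cp.Variable(nonneg=True); rho2 = cp.Variable(nonneg=True)
    obj = -nu1 * rho1 - nu2 * rho2 + (cp.sum(xi1) + cp.sum(xi2)) / l + D * cp.sum(eta)
    cons = []
    for i in range(l):
        gA = 0.5 * cp.sum(cp.multiply(np.outer(XA[i], XA[i]), W1)) + b1 @ XA[i] + c1
        gB = 0.5 * cp.sum(cp.multiply(np.outer(XB[i], XB[i]), W2)) + b2 @ XB[i] + c2
        obj += 0.5 * cp.sum_squares(W1 @ XA[i] + b1) + 0.5 * cp.sum_squares(W2 @ XB[i] + b2)
        cons += [cp.abs(gA - gB) <= eps + eta[i],   # consistency constraint (57)
                 y[i] * gA >= rho1 - xi1[i],        # soft margin rho1 in view A
                 y[i] * gB >= rho2 - xi2[i]]        # soft margin rho2 in view B
    cp.Problem(cp.Minimize(obj), cons).solve()
    return (W1.value, b1.value, c1.value), (W2.value, b2.value, c2.value), rho1.value, rho2.value
```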

Some Discussion
In this section, for both C-MKNSVM and ν-MKNSVM, we discuss the meanings of their parameters and their relationship. For completeness, we also briefly give the optimization model of ν-SVM-2K, which corresponds to the C-version SVM-2K.

Analysis of the Parameters.
In C-MKNSVM, the penalty parameters C1 and C2 trade off two conflicting goals: maximizing the margin and minimizing the training error. Their values are qualitatively clear: larger C1 and C2 imply that more attention is paid to the latter goal.
The significance of the parameters ν1 and ν2 concerns the notions of "support vectors" and "training points with margin errors," defined as follows.

Definition 4. Given the training set Tv (18), suppose that (α^A, α^B, β^+, β^-) is an optimal solution of the dual problem (63)-(68), where α^{A/B} = (α_1^{A/B}, ..., α_l^{A/B})^T. A training point (x_i^A, x_i^B, y_i) is called a support vector in view A if α_i^A > 0 and a support vector in view B if α_i^B > 0.

Definition 5. Given the training set Tv (18), suppose that (w1, c1, w2, c2, ξ^1, ξ^2, η) is an optimal solution of the optimization problem (59)-(62), where ξ^{1/2} = (ξ_1^{1/2}, ..., ξ_l^{1/2})^T. A training point is called a training point with a margin error in view A if ξ_i^1 > 0 and in view B if ξ_i^2 > 0. Roughly speaking, the training points with margin errors are the training points lying within the soft margin band under each view or misclassified by the decision function (28).

The significance of ν1 and ν2 is shown by the following theorem.

Theorem 3. Suppose that ν-MKNSVM is performed on the training set Tv (18) and that the values ρ1 and ρ2 are computed by (71). If ρ1, ρ2 > 0 for views A and B, then the following hold:
(1) Denoting the numbers of training points with margin errors in views A and B by m1 and m2, we have ν1 ≥ m1/l and ν2 ≥ m2/l; i.e., ν1 and ν2 are upper bounds on the fractions of training points with margin errors in views A and B, respectively.
(2) Denoting the numbers of support vectors in views A and B by n1 and n2, we have ν1 ≤ n1/l and ν2 ≤ n2/l; i.e., ν1 and ν2 are lower bounds on the fractions of support vectors in views A and B, respectively.

In addition, it can be shown under certain conditions that, with probability 1, both the fraction of training points with margin errors and the fraction of support vectors approach ν1 and ν2 in views A and B, respectively, as the number l of training points tends to infinity. Theorem 3 and these conclusions provide a basis for the selection of the penalty parameters ν1 and ν2.

Relationship between ν-MKNSVM and C-MKNSVM.
Corresponding to the relationship between the C-version and ν-version of standard SVMs, we have the following theorem.

Theorem 4. There exists a mapping ψ1/2 from C1/2 ∈ (0, +∞) to ν1/2 ∈ (0, 1] such that, for any C1/2 and the corresponding ν1/2 = ψ1/2(C1/2), the decision functions obtained by ν-MKNSVM with parameters ν1/2 and by C-MKNSVM with parameters C1/2 are identical, provided the decision functions can be computed by both of them.

ν-SVM-2K.
For a two-view binary classification problem, owing to the advantages of the ν-version, another SVM-2K can be proposed by constructing, analogously to ν-MKNSVM, the optimization problem

$$\begin{aligned} \min \quad & \frac{1}{2}\left\|\mathbf{w}_{A}\right\|^{2} + \frac{1}{2}\left\|\mathbf{w}_{B}\right\|^{2} - \nu_{1}\rho_{1} - \nu_{2}\rho_{2} + \frac{1}{l}\sum_{i=1}^{l}\left(\xi_{i}^{A} + \xi_{i}^{B}\right) + D\sum_{i=1}^{l}\eta_{i} \\ \text{s.t.} \quad & \left|f_{A}\left(\mathbf{x}_{i}^{A}\right) - f_{B}\left(\mathbf{x}_{i}^{B}\right)\right| \leq \eta_{i} + \varepsilon, \\ & y_{i}f_{A}\left(\mathbf{x}_{i}^{A}\right) \geq \rho_{1} - \xi_{i}^{A}, \quad y_{i}f_{B}\left(\mathbf{x}_{i}^{B}\right) \geq \rho_{2} - \xi_{i}^{B}, \\ & \xi_{i}^{A} \geq 0, \ \xi_{i}^{B} \geq 0, \ \eta_{i} \geq 0, \ i = 1, \ldots, l, \quad \rho_{1}, \rho_{2} \geq 0, \end{aligned}$$

where ν1, ν2, D, and ε are nonnegative parameters and ξ^{A/B} and η are slack vectors. The details are omitted.

Numerical Experiments
In this section, we evaluate our proposed methods, C-MKNSVM and ν-MKNSVM, on an artificial dataset and three multiview classification benchmark datasets: Ionosphere, Handwritten Digits, and Caltech101-20. Basic information about the datasets is listed in Table 1.
To verify the feasibility of our learners, we compare them with SVM-2K, multiview privileged support vector machines (PSVM-2V), and multiview twin support vector machines (MvTSVMs). For C-MKNSVM and ν-MKNSVM, both the primal and dual problems are solved, marked with (P) and (D), respectively. In addition, we compare the results of SVM-2K and ν-SVM-2K; to make the C-version explicit, SVM-2K is denoted as C-SVM-2K in the numerical experiments. For convenient presentation of the results, C-MKNSVM1, C-MKNSVM2, ν-MKNSVM1, and ν-MKNSVM2 denote the single-view learners, while C-MKNSVM and ν-MKNSVM denote the joint-view learners. For the joint-view learners C-SVM-2K and ν-SVM-2K, the corresponding single-view learners are C-SVM1, C-SVM2, ν-SVM1, and ν-SVM2. Note that ν-SVM-2K is also what we propose in Section 4.3. Similarly, for the joint-view learners PSVM-2V and MvTSVM, the single-view learners are denoted PSVM1-2V, PSVM2-2V, MvTSVM1, and MvTSVM2, respectively. For the learners with kernel functions, the Gaussian kernel is used, marked with (R). We run all methods in MATLAB R2020a on a desktop PC running Windows with 16.00 GB of RAM.
For each method, the optimal parameters are obtained by 5-fold cross-validation with a grid search strategy. For C-MKNSVM, MvTSVM, PSVM-2V, and C-SVM-2K, the optimal penalty parameters C1 and C2 are chosen from the set {2^i | i = -4, -3, ..., 4}. For ν-MKNSVM and ν-SVM-2K, the optimal penalty parameters ν1 and ν2 are chosen from the set {0.1, 0.2, ..., 1.0}. Furthermore, the penalty parameter D is chosen from the set {2^i | i = -3, -2, ..., 3}. In addition, for the learners with kernel functions, the Gaussian kernel parameter is tuned over the set {2^i | i = -5, -3, ..., 5}, and the tradeoff parameter c for PSVM-2V is tuned in the range {10^i | i = -3, -2, ..., 3}. Note that we take ε = 10^-6 in the primal problems of C-MKNSVM and ν-MKNSVM and set ε = 0 in all other cases. For each dataset, we randomly select 80% of the samples as the training set and the rest as the test set. The test process is repeated 10 times to record the averaged accuracy and the corresponding standard deviation. Finally, the best accuracy is marked in bold.
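The following Python sketch mirrors this tuning protocol (5-fold cross-validation over a parameter grid); train_fn and predict_fn are hypothetical stand-ins for any of the trainers above and the decision function, not the authors' code:

```python
import numpy as np
from itertools import product

def grid_search_cv(train_fn, predict_fn, XA, XB, y, grid, folds=5, seed=0):
    """Score every parameter combination by averaged fold accuracy."""
    rng = np.random.default_rng(seed)
    splits = np.array_split(rng.permutation(len(y)), folds)
    best_params, best_acc = None, -1.0
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        accs = []
        for k in range(folds):
            te = splits[k]
            tr = np.concatenate([splits[j] for j in range(folds) if j != k])
            model = train_fn(XA[tr], XB[tr], y[tr], **params)
            accs.append(np.mean(predict_fn(model, XA[te], XB[te]) == y[te]))
        if np.mean(accs) > best_acc:
            best_params, best_acc = params, float(np.mean(accs))
    return best_params, best_acc

# Parameter grids matching the protocol above (nu-version shown):
nu_grid = {"nu1": [0.1 * i for i in range(1, 11)],
           "nu2": [0.1 * i for i in range(1, 11)],
           "D":   [2.0 ** i for i in range(-3, 4)]}
```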

Artificial Dataset.
The artificial dataset is a two-view binary dataset in which each view contains two moons and two crossed lines. The dataset contains 300 sample points, 150 positive and 150 negative, denoted by blue circles and red crosses in Figure 1, respectively. Figure 1 shows the geometric properties of C-MKNSVM and ν-MKNSVM. Figures 1(a) and 1(b) and Figures 1(c) and 1(d) demonstrate the separation surfaces obtained by C-MKNSVM and ν-MKNSVM in each view, and the self-verification accuracy based on single-view information is indicated in the upper right corner of each subgraph. We can see that different quadratic surfaces are obtained from the two views, although the same quadratic transformation is used for both C-MKNSVM and ν-MKNSVM in each view. The results show that C-MKNSVM and ν-MKNSVM are feasible.
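The paper's exact generator for the artificial data is not given; the following is only an illustrative way to build a 300-point two-view toy set of this flavor, using scikit-learn's make_moons for view A and a fixed random linear image of it as view B:

```python
import numpy as np
from sklearn.datasets import make_moons

# View A: a two-moons sample; view B: a fixed random linear image of view A
# plus small noise, so the views share labels but differ as feature spaces.
XA, y01 = make_moons(n_samples=300, noise=0.15, random_state=0)
y = 2 * y01 - 1                                   # labels in {+1, -1}
rng = np.random.default_rng(0)
XB = XA @ rng.normal(size=(2, 2)) + 0.1 * rng.normal(size=XA.shape)
```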
Table 2 also shows the accuracy of all methods on the artificial dataset. The accuracy obtained by C-MKNSVM and ν-MKNSVM exceeds 98%, and ν-MKNSVM is slightly better than the other compared methods. In addition, we can see that ν-MKNSVM(P) achieves the best result, which can be attributed to choosing more appropriate parameters. Moreover, the results of C-MKNSVM and ν-MKNSVM obtained by solving the primal problem and the dual problem are similar. In particular, our C-MKNSVM and ν-MKNSVM are kernel-free classifiers. We further verify the performance of our methods on the benchmark datasets below.

Ionosphere Dataset.
The Ionosphere [36] dataset was collected by a system in Goose Bay, Labrador, which consists of a phased array of 16 high-frequency antennas with a total transmission power of about 6.4 kilowatts. The label of each sample is the "good" or "bad" return of free-electron radar in the ionosphere. A "good" radar return shows evidence of some structure in the ionosphere; that is, the radar returns information, and such samples are recorded as positive. A "bad" radar return means that no information was returned: the signal passed through the ionosphere, and such samples are recorded as negative. The data contain 225 positive samples and 126 negative samples. In our experiments, the original dataset is treated as view A, while an artificially generated dataset obtained by reducing the dimension from 34 to 25 with PCA is treated as view B. The experimental results are shown in Table 3, which shows that our ν-MKNSVM performs better than the other two-view classifiers. For all compared methods, the prediction results of joint views are better than those of single views. In addition, it is easy to observe that ν-MKNSVM and ν-SVM-2K have higher classification accuracy than C-MKNSVM and C-SVM-2K. The reason is that the ν-version learners have a smaller parameter range.
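A minimal Python sketch of how view B can be constructed from view A by PCA, as described above (reducing Ionosphere's 34 features to 25):

```python
import numpy as np

def pca_reduce(X, k=25):
    """Project X (n x m) onto its top-k principal components, mirroring how
    view B is generated from the 34-dimensional Ionosphere features."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T
```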

Handwritten Digits Dataset.
This dataset [34] consists of feature sets of handwritten digits 0-9 collected from a collection of Dutch utility maps. It consists of 2000 images in binary format, with 200 images per digit, for 10 classes. The dataset is composed of six feature sets, which can be regarded as datasets under six views, namely, FOU, KAR, FAC, MOR, ZER, and PIX. Since our methods are two-view binary classifiers, we selected two pairs of views out of the six: FOU and KAR, and MOR and ZER. We select 13 pairs of digits to evaluate all involved methods, namely, (0,1), (2,5), (3,8), (6,9), (3,9), (0,9), (8,9), (0,8), (4,5), (6,8), (2,3), (1,7), and (2,8).
The experimental results are shown in Tables 4 and 5. We observe that ν-MKNSVM outperforms almost all comparison methods, especially ν-MKNSVM(P). The accuracy of C-MKNSVM is slightly better than that of ν-SVM-2K. For MKNSVM and SVM-2K, the results of the ν-version are better than those of the C-version in most cases, which again verifies that ν-version classifiers are more advantageous. In addition, compared with PSVM-2V and MvTSVM, our methods also achieve better classification performance in most cases, especially ν-MKNSVM. Moreover, in most cases, joint-view prediction performs better than single-view prediction for all compared methods, which demonstrates the advantage of multiview learning. Furthermore, the experimental results under views MOR and ZER are generally better than those under views FOU and KAR, showing that selecting appropriate views is beneficial to classification performance.

Caltech101-20 Dataset.
Caltech-101 [32] is an image dataset containing 101 classes of images. Each class may contain between 40 and 800 images, most classes have around 50 images, and there are about 9000 images in total. Here, the first 20 classes are selected as experimental subjects, with a total of 2386 images. We extract Gabor and wavelet-moment information from the images as views A and B. Gabor features describe image texture, frequency, and orientation; the Gabor filter resembles the human visual system, so it is well suited for texture representation and discrimination. The wavelet moments represent the feature information of the image. The Gabor texture information is 48-dimensional, and the wavelet-moment feature information is 40-dimensional.
In the experiment, a one-versus-rest strategy is used for the 20 classes of images. The images of one class are successively regarded as the positive class, denoted as class +1, and all other images are classified as the negative class, denoted as class -1. The experimental results are listed in Table 6, where the best accuracy is marked in bold for comparison of the classification performance of the compared methods.
Similarly, the results obtained by the ν-version of MKNSVM and SVM-2K are better than those of the C-version in most cases. It is not difficult to find that, in the same group of experiments, ν-MKNSVM(P) has higher classification accuracy than ν-MKNSVM1(P) or ν-MKNSVM2(P). In addition, for C-MKNSVM and ν-MKNSVM, the results obtained by solving the primal problems are slightly better than those obtained by solving the dual problems. This may be due to the matrix inversion involved in solving the dual problem.
From the experiments on the above datasets, we can draw several conclusions. The geometric intuition of C-MKNSVM and ν-MKNSVM is demonstrated on the artificial dataset: different quadratic surfaces are obtained from the two views even though the same quadratic transformation is used for both C-MKNSVM and ν-MKNSVM in each view. The prediction results of the joint view are better than those of the single views. The results demonstrate that our approaches are comparable with SVM-2K, PSVM-2V, and MvTSVM, and ν-MKNSVM is slightly better than the other learners. In addition, SVM-2K, PSVM-2V, and MvTSVM all need to solve the dual problem in the nonlinear case, whereas both the primal and dual problems of C-MKNSVM and ν-MKNSVM can be solved directly, and the primal problems do not involve inverse matrices. In these experiments, the classification performance of the ν-version learners is better than that of the C-version learners, which further verifies the advantage of the ν-version. Furthermore, since our methods are kernel-free, fewer parameters need to be chosen. We discuss the role of the penalty parameters ν1 and ν2 in ν-MKNSVM next.

Parameters ν1 and ν2.
In this section, we discuss the role of the parameters ν1 and ν2 in ν-MKNSVM; only ν-MKNSVM(P) is considered here. The two penalty parameters ν1 and ν2 take values on a grid from 0.15 to 0.95 with a step size of 0.1, and the parameters ε = 10^-6 and D = 2^3 are fixed to obtain the results of ν-MKNSVM.
We plot a 3D grid chart of the relationship between ν1, ν2, and accuracy, as shown in Figure 2. It is easy to find that, in most cases, the optimal accuracy corresponds to small penalty parameters ν1 and ν2. This implies that we can appropriately shrink the value range of the parameters ν1 and ν2.

Statistical Analysis.
To further compare the performance of the above methods, the Friedman test and the Nemenyi post hoc test are performed. According to the experimental results in Tables 2 to 6, the accuracies of the methods on each dataset are ranked, and the average rank of each method is obtained. For the methods using single-view prediction, we only compare ν-MKNSVM1(P) and ν-MKNSVM2(P), giving a total of 10 methods. First, the Friedman test is used to compare the average ranks of the different methods. The null hypothesis states that all methods have the same performance, i.e., that their average ranks are equal. The Friedman statistic τF is calculated as

$$\chi_{F}^{2} = \frac{12N}{k(k+1)}\left(\sum_{i=1}^{k} r_{i}^{2} - \frac{k(k+1)^{2}}{4}\right), \qquad (75)$$

$$\tau_{F} = \frac{(N-1)\,\chi_{F}^{2}}{N(k-1) - \chi_{F}^{2}}, \qquad (76)$$

where N and k are the numbers of datasets and methods, respectively, and r_i is the average rank of the i-th method. According to formulas (75) and (76), we have τF = 15.2115. For α = 0.05, we obtain Fα = 2.0374. Since τF > Fα, we reject the null hypothesis. Then, we proceed with the Nemenyi test to determine which methods are significantly different. Specifically, the performances of two methods are considered significantly different if the difference of their average ranks is larger than the critical difference (CD), calculated by

$$\mathrm{CD} = q_{\alpha}\sqrt{\frac{k(k+1)}{6N}}. \qquad (77)$$

For α = 0.05, we have qα = 3.5265, and then CD = 1.9554 is obtained by equation (77). Figure 3 visually shows the results of the Friedman test and the Nemenyi post hoc test, where the average rank of each method is marked along one axis; the axis is oriented so that the lowest (best) ranks are on the right. Groups of methods that are not significantly different are linked by a red line. From Figure 3, we can see that our ν-MKNSVM, especially ν-MKNSVM(P), is statistically the best among the compared methods.
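These test statistics can be computed directly from the rank table; the following Python sketch implements the Friedman statistic in its F-distributed form (75)-(76) and the Nemenyi critical difference (77):

```python
import numpy as np

def friedman_nemenyi(ranks, q_alpha=3.5265):
    """ranks: (N datasets x k methods) table of per-dataset ranks.
    Returns the F-form Friedman statistic tau_F and the Nemenyi CD."""
    N, k = ranks.shape
    r_bar = ranks.mean(axis=0)                       # average rank per method
    chi2 = 12.0 * N / (k * (k + 1)) * (np.sum(r_bar ** 2) - k * (k + 1) ** 2 / 4.0)
    tau_f = (N - 1) * chi2 / (N * (k - 1) - chi2)    # statistic compared with F_alpha
    cd = q_alpha * np.sqrt(k * (k + 1) / (6.0 * N))  # Nemenyi critical difference
    return tau_f, cd
```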

Conclusion
In this paper, two multiview binary classifiers, C-MKNSVM and ν-MKNSVM, are proposed, which combine the kernel-free trick with the consensus principle of multiview learning. Compared with C-MKNSVM, ν-MKNSVM makes it easier to select suitable parameters because its parameter range is smaller. It should be pointed out that our methods are nonlinear and kernel-free; thus, the complex task of selecting kernel functions and corresponding parameters is avoided. Owing to the kernel-free trick, C-MKNSVM and ν-MKNSVM have stronger interpretability, and both their primal and dual problems can be solved directly. In the theoretical analysis, the relationship between the solutions of the primal and dual problems is addressed for each classifier, and the meanings of the penalty parameters and the connection between the two classifiers are analyzed. Finally, the experimental results show that the two proposed multiview kernel-free classifiers are feasible and effective. In the future, we will consider extending MKNSVMs to the multiview (more than two views) scenario and designing efficient acceleration algorithms for large-scale datasets.
Appendix: Proof Sketches
For Theorem 2, assume that (α^A, α^B, β^+, β^-) is any solution of the dual problem (63)-(68); then w1 and w2 are obtained from the KKT conditions. If there exist two components α_{n}^{A/B} and α_{k}^{A/B} of α^{A/B} for which, by the complementary slackness conditions, ξ_{n}^{1/2} = ξ_{k}^{1/2} = 0, then the corresponding active constraints determine c1/2 and ρ1/2. For Theorem 3, according to the KKT conditions, if ρ1, ρ2 > 0, then the corresponding Lagrange multipliers vanish and the constraint (67) holds with equality; counting the training points with margin errors and the support vectors in each view then yields the bounds ν1 ≥ m1/l, ν2 ≥ m2/l and ν1 ≤ n1/l, ν2 ≤ n2/l stated in Theorem 3.

Table 1: Information on the four datasets.