Automatic image annotation assigns labels to images to enable more accurate image retrieval and classification. This paper proposes a semisupervised framework based on graph embedding and multiview nonnegative matrix factorization (GENMF) for automatic annotation of multilabel images. First, we construct a graph embedding term in the multiview NMF based on the association graphs between labels, which imposes semantic constraints. Then, the multiview features are fused and their dimensions are reduced by the multiview NMF algorithm. Finally, image annotation is performed on the new features with a KNN-based approach. Experiments validate that the proposed algorithm achieves competitive performance in terms of accuracy and efficiency.
1. Introduction
The advent of the Internet age has brought explosive growth of image resources. Although managing and retrieving images by semantic tags is a common and effective practice, a large number of images remain untagged or only partially tagged. Manual annotation is impractical given the cost of human labor and the semantic nuances of annotation across cultures, religions, and languages; moreover, cognitive bias caused by subjectivity can introduce semantic discrepancies. Thus, designing an efficient automatic image annotation algorithm that provides accurate labels for untagged images has become an urgent problem.
Automatic image annotation (AIA) refers to the process by which computers automatically provide one or more semantic tags reflecting the content of a given image. It is a mapping from images to semantic concepts, namely, the process of understanding images. Image annotation builds on image feature representations, and features used in different tasks have different representation abilities [1–3]. For example, global color and texture features have been used successfully for retrieving similar images [4], while local structure features perform well in object classification and matching [5, 6]. In general, features that depict images from different views provide complementary information, so a rational fusion of multiview features yields a more comprehensive depiction of images, which benefits image search, classification, and related tasks.
Many multiview learning algorithms have been proposed for operating some tasks such as classification, retrieval, and clustering based on multiview features. According to the levels of feature fusion, multiview learning methods can be grouped into two categories [7]: feature-level fusion such as MKL [8], SVM-2K [9], and CCA [10] and classifier-level fusion such as hierarchical SVM [11]. Some experimental studies show that classifier-level fusion outperforms simple feature concatenation, whereas sophisticated feature-level fusion usually performs better than classifier-level fusion [11, 12].
Recently, many image annotation algorithms have used a variety of low-level features to improve annotation performance [8–10]. Multiview features improve accuracy, but they also decrease the efficiency and applicability of the algorithms because the feature dimension grows. Moreover, many existing multiview learning algorithms are unsupervised; that is, they do not make use of the label information in the training set, so the fused features may not capture the semantic relationships between samples. This paper proposes a semisupervised learning framework based on graph embedding and multiview NMF (GENMF). In GENMF, feature fusion and dimension reduction are first performed by the proposed graph embedded multiview NMF algorithm, and the resulting features are then used to annotate images through a KNN-based approach.
2. Related Works
Existing image annotation algorithms can be roughly divided into two categories [13]: model-based learning methods and database-based retrieval methods. Model-based methods explore the relationship between high-level semantic concepts and low-level visual features to discover a mapping function through machine learning or knowledge models for image annotation. Unlike model-based methods, database-based methods do not need to set up the mapping function based on the training set but directly provide a sequence of candidate labels according to the already annotated images in the database.
There are three kinds of model-based learning methods for image annotation: classification-based methods, possibility-based methods, and topic model-based methods. Classification-based methods [14–16] treat tags as class labels and learn the mapping between low-level visual features and labels through machine learning; in essence, they transform image annotation into image classification. Different classifiers are used to establish mapping functions between low-level features (from images or regions) and semantic concepts, and labels with high classifier confidence are assigned to images. In contrast, possibility-based methods [17, 18] do not build the mapping with classifiers but model the relationship between the underlying image features and the semantic labels with unsupervised probabilistic and statistical models. They use these relations to calculate the joint probability of images and labels, or the conditional probability of labels given an image, and then estimate the likely labels through statistical inference. Topic model-based methods [19, 20] use latent topics to associate low-level visual features with high-level semantic concepts.
The model-based methods have three difficulties in practical applications. First, the learning models trained on the datasets with finite image types and semantic labels can hardly reflect the characteristics of feature distributions in the real world, which leads to unsatisfactory annotation performance when facing new features and semantic labels. Second, the limited size of training sets may result in overfitting and low generalization ability of the models. Third, low-level features may often fail to express high-level semantic information because they belong to different feature spaces. Thus, it is also hard to establish a mapping model between image features and semantic concepts because of the semantic gap.
The essence of retrieval based method is directly providing a list of candidate labels for the images to be tagged based on the existing datasets with complete and valid label information. Most common retrieval methods are based on KNN [21–23]: they retrieve k images with the highest similarity to the input image from the database, and the labels of the k images are sorted based on the statistical relationship or weighted statistical relationship to generate the candidate labels of the input images. The other category is graph-based methods [24–27] that utilize image feature distance to establish relevant graphs of samples. Based on the assumption that neighboring images in the relevant graph have similar labels (label smoothness), the similarity between nodes and the global structural characteristics of the relevant graph are used to propagate and enrich the node information including labels and classes. This kind of semisupervised learning methods is suitable for not fully tagged datasets existing on the Internet.
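The KNN retrieval idea above can be sketched as follows; the Euclidean distance, the exponential neighbour weighting, and the function name are illustrative assumptions rather than the exact schemes of [21–23]:

```python
import numpy as np

def knn_candidate_labels(x, X_train, Y_train, k=5):
    """Rank candidate labels for a query image by weighted votes of its
    k nearest training images (illustrative weighting, not [21-23])."""
    # Euclidean distance from the query to every training image
    d = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(d)[:k]          # indices of the k nearest images
    w = np.exp(-d[nn])              # closer neighbours vote more strongly
    scores = w @ Y_train[nn]        # accumulate weighted label votes
    return np.argsort(-scores)      # label indices ranked by score

# toy example: 4 training images (2-D features), 3 labels
X_train = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1]])
Y_train = np.array([[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]])
ranked = knn_candidate_labels(np.array([0.05, 0.0]), X_train, Y_train, k=2)
```

The top-ranked labels then form the candidate label sequence for the query image.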
Traditional graph-based methods usually annotate images by aggregating multiple features into one feature and building a relation graph from it. The work in [25] points out that such methods cannot effectively capture the information unique to each feature and proposes building relation subgraphs from the individual features and linking them into a supergraph, over which label propagation is performed. In [26], separate feature graphs are built from the different features of the images; a graph-based method then constructs both the relationships between images and the relationships between images and the different features. Finally, the two kinds of relationships are fused through a designed objective function to obtain good candidate labels.
In [27], graph learning KNN (GLKNN) is proposed by combining a KNN-based method and a graph-based method. GLKNN first uses the graph-based method to propagate the labels of the k nearest neighbors to the new image, obtaining one sequence of candidate labels; it then employs the naive-Bayes nearest neighbor algorithm to relate labels to image features, obtaining another sequence of candidate labels. Finally, the two candidate label sequences are linearly combined into the final predicted labels. In [28], graph embedding discriminant analysis is applied to classify marine fish species by constructing an intraclass similarity graph and an interclass penalty graph. Although that algorithm improves classification and clustering performance by using class labels to build the graph embedding term, traditional graph embedding is not suitable for multilabel images because there is no clean intraclass and interclass relationship. In [21, 22], different metric-based models are proposed to enhance the representation ability of features and further improve annotation performance. However, the metric-based feature processing only linearly embeds the original features and does not reduce the feature dimension. In [13], multiple features are fused by concatenation, which ignores the manifold characteristics of the different features, and the high feature dimension results in low efficiency.
To reduce the dimension of each feature for annotation, an extended locality-sensitive discriminant analysis algorithm is proposed in [29] by constructing relevant and irrelevant graphs. Feature dimension reduction methods based on NMF are generally designed for single-view features. References [30, 31] extend them to multiview features by simply concatenating multiple vectors into one feature vector before dimension reduction. However, such concatenation invites the curse of dimensionality; moreover, since multiview features describe images from different views, simple concatenation is not well justified. A multiview NMF model with a shared coefficient matrix was later developed to capture the latent feature patterns in multiview features [32], where each view has its own basis matrix and all views share one coefficient matrix. That model targets classification and clustering problems and is not suitable for multilabel images.
Based on the above reviews, this paper proposes a semisupervised learning model based on multiview NMF and graph embedding. A novel multiview NMF algorithm based on graph embedding is developed to fuse the multiview features and reduce the dimension of the fused features by designing appropriate graph embedded regularization terms. Then, the image annotation is performed by using the new features through a KNN-based algorithm.
3. The Proposed Methods
In this section, we elaborate the proposed semisupervised framework for automatic image annotation. First, the graph embedding terms for multilabel problems are constructed through semantic similarity matrix. Second, an objective function is established by adding graph embedded semantic constraints. Third, the update rules for optimizing are derived in detail. Finally, the overall framework of the algorithm is presented.
3.1. Graph Embedding for Multilabel Problem
The traditional graph embedding model was introduced for classification problems in which each sample has only one label, so the Laplacian matrices L and Lp can be constructed according to whether samples belong to the same category. However, in multilabel problems a sample usually carries multiple category labels, so traditional graph embedding methods cannot be applied directly. In this paper, we define a relation matrix according to whether samples are semantically related. By setting appropriate thresholds, a relevant matrix and an irrelevant matrix can be obtained and used to compute the Laplacian matrices L and Lp, respectively.
Let $(x_i, y_i)$ denote the $i$-th sample and $Y \in R^{n_1 \times m}$ denote the label matrix, where $n_1$ is the number of samples in the training set, $m$ is the number of labels, $y_i$ represents the $i$-th row of $Y$, and $y_{:i}$ represents the $i$-th column. The semantic similarity between sample $i$ and sample $j$ can be formulated as $y_i C y_j^T$, where $C$ is a priori label relation matrix similar to that in [33]:

(1) $C_{ij} = \cos(y_{:i}, y_{:j}) = \dfrac{\langle y_{:i}, y_{:j} \rangle}{\|y_{:i}\| \, \|y_{:j}\|}$

where $y_{:i} \in R^{n_1 \times 1}$ is the indicator vector of the $i$-th label over the samples and $\|y_{:i}\|$ denotes its $L_2$-norm. Then, the semantic similarity matrix of the samples is obtained as

(2) $W^s_{ij} = y_i C y_j^T$

Given thresholds $T_u$ and $T_l$ ($T_u \ge T_l$), samples with similarity greater than $T_u$ are considered relevant, and samples with similarity no greater than $T_l$ are considered irrelevant. Therefore, the relevant matrix $W$ and the irrelevant matrix $W^p$ are constructed as

(3) $W_{ij} = \begin{cases} W^s_{ij}, & W^s_{ij} > T_u \\ 0, & W^s_{ij} \le T_u \end{cases}$

(4) $W^p_{ij} = \begin{cases} 1, & W^s_{ij} \le T_l \\ 0, & W^s_{ij} > T_l \end{cases}$

The corresponding Laplacian matrices are

(5) $L = D - W$

(6) $L^p = D^p - W^p$

where $D_{jj} = \sum_l W_{jl}$ and $D^p_{jj} = \sum_l W^p_{jl}$.
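A minimal sketch of this graph construction, assuming a binary label matrix Y and the thresholds Tu and Tl of Eqs. (3)-(4):

```python
import numpy as np

def build_graphs(Y, Tu=2.0, Tl=1.0):
    """Construct the relevant/irrelevant graphs and their Laplacians
    from a binary label matrix Y (n1 x m), following Eqs. (1)-(6)."""
    # Eq. (1): cosine similarity between label columns -> label relation matrix C
    norms = np.linalg.norm(Y, axis=0, keepdims=True)
    C = (Y.T @ Y) / (norms.T * norms)
    # Eq. (2): semantic similarity between samples
    Ws = Y @ C @ Y.T
    # Eqs. (3)-(4): threshold into relevant (W) and irrelevant (Wp) graphs
    W = np.where(Ws > Tu, Ws, 0.0)
    Wp = np.where(Ws <= Tl, 1.0, 0.0)
    # Eqs. (5)-(6): graph Laplacians
    L = np.diag(W.sum(axis=1)) - W
    Lp = np.diag(Wp.sum(axis=1)) - Wp
    return W, Wp, L, Lp
```

By construction each row of L and Lp sums to zero, which is a quick way to check the Laplacians.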
Having the relevant and irrelevant matrices, the following two constraint terms $C_1$ and $C_2$ are incorporated to make the feature representations in the new feature space consistent with the semantic concepts:

(7) $C_1 = \frac{1}{2}\sum_{i,j=1}^{n_1} \|v_i - v_j\|^2 W_{ij} = \sum_{i=1}^{n_1} v_i^T v_i D_{ii} - \sum_{i,j=1}^{n_1} v_i^T v_j W_{ij} = \mathrm{Tr}(V^T D V) - \mathrm{Tr}(V^T W V) = \mathrm{Tr}(V^T L V)$

(8) $C_2 = \frac{1}{2}\sum_{i,j=1}^{n_1} \|v_i - v_j\|^2 W^p_{ij} = \sum_{i=1}^{n_1} v_i^T v_i D^p_{ii} - \sum_{i,j=1}^{n_1} v_i^T v_j W^p_{ij} = \mathrm{Tr}(V^T D^p V) - \mathrm{Tr}(V^T W^p V) = \mathrm{Tr}(V^T L^p V)$

where $n_1$ denotes the number of samples in the training set and $v_i$ and $v_j$ are the new feature vectors (rows of $V$) of sample $i$ and sample $j$, respectively. Minimizing $C_1$ pulls semantically relevant samples together, while maximizing $C_2$ pushes irrelevant samples apart.
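As a sanity check on these constraint terms, the identity $\frac{1}{2}\sum_{i,j} \|v_i - v_j\|^2 W_{ij} = \mathrm{Tr}(V^T L V)$ with $L = D - W$ can be verified numerically on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 3
V = rng.random((n, k))
W = rng.random((n, n)); W = (W + W.T) / 2   # symmetric affinity matrix
L = np.diag(W.sum(axis=1)) - W              # graph Laplacian, Eq. (5)

# left-hand side: explicit pairwise sum of weighted squared distances
pairwise = 0.5 * sum(W[i, j] * np.sum((V[i] - V[j]) ** 2)
                     for i in range(n) for j in range(n))
# right-hand side: the compact trace form used in Eqs. (7)-(8)
trace = np.trace(V.T @ L @ V)
```

Since the quadratic form equals a sum of nonnegative terms, it also confirms that the Laplacian is positive semidefinite.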
3.2. An Automatic Image Annotation Model Based on Multiview Feature NMF and Graph Embedding
Let $X = \{X^{(v)}\}_{v=1}^{M}$ denote the data, where $X^{(v)} \in R^{D_v \times N}$ is the feature matrix of the $v$-th view, $D_v$ is the dimension of the feature vectors, $M$ is the number of views, and $N$ is the number of samples. The objective function can be formulated as

(9) $O_1 = \sum_{v=1}^{M} \|X^{(v)} - U^{(v)} V^T\|^2, \quad \text{s.t.}\ u^{(v)}_{ij} \ge 0,\ v_{ij} \ge 0$

where $U^{(v)} \in R^{D_v \times K}$ and $V \in R^{N \times K}$ are nonnegative matrices and $K$ denotes the dimension of the new low-dimensional feature.
Furthermore, combining the graph embedding regularization terms (7) and (8) with the above loss function gives

(10) $O_1 = \sum_{v=1}^{M} \|X^{(v)} - U^{(v)} V^T\|^2 + \mathrm{Tr}(V_l^T \check{L} V_l), \quad \text{s.t.}\ u^{(v)}_{ij} \ge 0,\ v_{ij} \ge 0$

where $\check{L} = \alpha L - \beta L^p$ and $\alpha$ and $\beta$ are two equilibrium coefficients. Equation (10) consists of two terms: the first is the reconstruction error, and the second imposes semantic constraints on $V$ through graph embedding regularization, pulling semantically related sample features closer together and pushing unrelated ones apart. It is worth noting that the model is semisupervised, since $V_l$ refers only to the labelled data and the graph embedding term constrains only $V_l$.
3.3. Update Rules Derivation
The established model is semisupervised: only part of the data has label information, so the objective function is rewritten in block-matrix form. The update rules are derived below.
The update rules for formula (10) are derived as follows:

(11) $O_1 = \sum_{v=1}^{M} \mathrm{Tr}\!\left[(X^{(v)} - U^{(v)}V^T)(X^{(v)} - U^{(v)}V^T)^T\right] + \alpha\,\mathrm{Tr}(V_l^T L V_l) - \beta\,\mathrm{Tr}(V_l^T L^p V_l) = \sum_{v=1}^{M} \left[\mathrm{Tr}(X^{(v)}X^{(v)T}) - 2\,\mathrm{Tr}(X^{(v)} V U^{(v)T}) + \mathrm{Tr}(U^{(v)} V^T V U^{(v)T})\right] + \alpha\,\mathrm{Tr}(V_l^T L V_l) - \beta\,\mathrm{Tr}(V_l^T L^p V_l)$

Let $\psi^{(v)}_{ij}$ and $\phi_{ij}$ be the Lagrange multipliers for the constraints $u^{(v)}_{ij} \ge 0$ and $v_{ij} \ge 0$, respectively, with $\Psi^{(v)} = [\psi^{(v)}_{ij}]$ and $\Phi = [\phi_{ij}]$. Then the Lagrange function can be written as

(12) $\mathcal{L} = \sum_{v=1}^{M}\left[\mathrm{Tr}(X^{(v)}X^{(v)T}) - 2\,\mathrm{Tr}(X^{(v)} V U^{(v)T}) + \mathrm{Tr}(U^{(v)} V^T V U^{(v)T}) + \mathrm{Tr}(\Psi^{(v)} U^{(v)T})\right] + \alpha\,\mathrm{Tr}(V_l^T L V_l) - \beta\,\mathrm{Tr}(V_l^T L^p V_l) + \mathrm{Tr}(\Phi V^T)$

The partial derivative of $\mathcal{L}$ with respect to $U^{(v)}$ is

(13) $\dfrac{\partial \mathcal{L}}{\partial U^{(v)}} = -2X^{(v)}V + 2U^{(v)}V^T V + \Psi^{(v)}$

Writing $X^{(v)} = [X^{(v)}_l, X^{(v)}_u]$, $V = [V_l^T, V_u^T]^T$, and $\Phi = [\Phi_l^T, \Phi_u^T]^T$, where the subscript $l$ denotes labelled data and $u$ denotes unlabelled data (so $X^{(v)}_l$ and $V_l$ refer to the labelled data), (12) can be rewritten as

(14) $\mathcal{L} = \sum_{v=1}^{M}\left[\mathrm{Tr}(X^{(v)}X^{(v)T}) - 2\,\mathrm{Tr}(X^{(v)}_l V_l U^{(v)T}) - 2\,\mathrm{Tr}(X^{(v)}_u V_u U^{(v)T}) + \mathrm{Tr}(U^{(v)} V_l^T V_l U^{(v)T}) + \mathrm{Tr}(U^{(v)} V_u^T V_u U^{(v)T}) + \mathrm{Tr}(\Psi^{(v)} U^{(v)T})\right] + \alpha\,\mathrm{Tr}(V_l^T L V_l) - \beta\,\mathrm{Tr}(V_l^T L^p V_l) + \mathrm{Tr}(\Phi_l V_l^T) + \mathrm{Tr}(\Phi_u V_u^T)$

Separating the terms associated with $V_l$ and $V_u$, the above equation can be written as

(15) $\mathcal{L} = \mathcal{L}_{V_l} + \mathcal{L}_{V_u}$

(16) $\mathcal{L}_{V_l} = \sum_{v=1}^{M}\left[-2\,\mathrm{Tr}(X^{(v)}_l V_l U^{(v)T}) + \mathrm{Tr}(U^{(v)} V_l^T V_l U^{(v)T})\right] + \alpha\,\mathrm{Tr}(V_l^T L V_l) - \beta\,\mathrm{Tr}(V_l^T L^p V_l) + \mathrm{Tr}(\Phi_l V_l^T) + \text{const}$

(17) $\mathcal{L}_{V_u} = \sum_{v=1}^{M}\left[-2\,\mathrm{Tr}(X^{(v)}_u V_u U^{(v)T}) + \mathrm{Tr}(U^{(v)} V_u^T V_u U^{(v)T})\right] + \mathrm{Tr}(\Phi_u V_u^T) + \text{const}$

The partial derivatives of $\mathcal{L}$ with respect to $V_l$ and $V_u$ are

(18) $\dfrac{\partial \mathcal{L}}{\partial V_l} = \sum_{v=1}^{M}\left[-2X^{(v)T}_l U^{(v)} + 2V_l U^{(v)T} U^{(v)}\right] + 2\alpha L V_l - 2\beta L^p V_l + \Phi_l$

(19) $\dfrac{\partial \mathcal{L}}{\partial V_u} = \sum_{v=1}^{M}\left[-2X^{(v)T}_u U^{(v)} + 2V_u U^{(v)T} U^{(v)}\right] + \Phi_u$

Using the KKT conditions $\psi^{(v)}_{ij} u^{(v)}_{ij} = 0$ and $\phi_{ij} v_{ij} = 0$ and setting the derivatives (13), (18), and (19) to zero, the following three equations are obtained:

(20) $-\left(X^{(v)}V\right)_{ij} u^{(v)}_{ij} + \left(U^{(v)}V^T V\right)_{ij} u^{(v)}_{ij} = 0$

(21) $\left[\sum_{v=1}^{M}\left(-X^{(v)T}_l U^{(v)} + V_l U^{(v)T} U^{(v)}\right) + \alpha L V_l - \beta L^p V_l\right]_{ij} v_{ij} = 0$

(22) $\left[\sum_{v=1}^{M}\left(-X^{(v)T}_u U^{(v)} + V_u U^{(v)T} U^{(v)}\right)\right]_{ij} v_{ij} = 0$

These equations yield the following multiplicative update rules (using $L = D - W$ and $L^p = D^p - W^p$ to keep numerators and denominators nonnegative):

(23) $u^{(v)}_{ik} \leftarrow u^{(v)}_{ik} \dfrac{\left(X^{(v)}V\right)_{ik}}{\left(U^{(v)}V^T V\right)_{ik}}$

(24) $v^{l}_{jk} \leftarrow v^{l}_{jk} \dfrac{\left[\sum_{v=1}^{M} X^{(v)T}_l U^{(v)} + \alpha W V_l + \beta D^p V_l\right]_{jk}}{\left[\sum_{v=1}^{M} V_l U^{(v)T} U^{(v)} + \alpha D V_l + \beta W^p V_l\right]_{jk}}$

(25) $v^{u}_{jk} \leftarrow v^{u}_{jk} \dfrac{\left[\sum_{v=1}^{M} X^{(v)T}_u U^{(v)}\right]_{jk}}{\left[\sum_{v=1}^{M} V_u U^{(v)T} U^{(v)}\right]_{jk}}$

As mentioned in [34], to ensure the convexity of the loss function, $\beta$ needs to be an appropriately small value; $\beta = 10^{-4}$ is suggested.
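The derived update rules can be sketched as follows. The toy dimensions, the small α used in the accompanying check, and the ε guard against division by zero are illustrative assumptions, not settings from the paper:

```python
import numpy as np

def genmf_updates(Xs, W, Wp, n_l, K=10, alpha=1.0, beta=1e-4,
                  n_iter=300, eps=1e-10, seed=0):
    """Sketch of the multiplicative updates (23)-(25): each view v keeps its
    own basis U^(v); all views share the coefficient matrix V = [Vl; Vu]."""
    rng = np.random.default_rng(seed)
    N = Xs[0].shape[1]
    D = np.diag(W.sum(axis=1))     # degree matrices of the relevant /
    Dp = np.diag(Wp.sum(axis=1))   # irrelevant graphs (labelled samples only)
    Us = [rng.random((X.shape[0], K)) for X in Xs]
    V = rng.random((N, K))
    for _ in range(n_iter):
        for v, X in enumerate(Xs):
            # Eq. (23): update the basis of view v
            Us[v] *= (X @ V) / (Us[v] @ (V.T @ V) + eps)
        Vl, Vu = V[:n_l], V[n_l:]
        # Eq. (24): labelled block, with the graph-embedding terms split via
        # L = D - W and Lp = Dp - Wp so numerator/denominator stay nonnegative
        num_l = sum(X[:, :n_l].T @ U for X, U in zip(Xs, Us)) \
                + alpha * (W @ Vl) + beta * (Dp @ Vl)
        den_l = sum(Vl @ (U.T @ U) for U in Us) \
                + alpha * (D @ Vl) + beta * (Wp @ Vl)
        Vl *= num_l / (den_l + eps)
        # Eq. (25): unlabelled block, plain multiview NMF update
        num_u = sum(X[:, n_l:].T @ U for X, U in zip(Xs, Us))
        den_u = sum(Vu @ (U.T @ U) for U in Us)
        Vu *= num_u / (den_u + eps)
    return Us, V
```

Because every factor is nonnegative, the iterates stay nonnegative automatically, and running more iterations should shrink the reconstruction error.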
Besides, [35] gives a modified strategy to the original update rules to ensure convergence. The same strategy can be applied to the derived update rules.
3.4. Framework of the GENMF
The schematic diagram of the proposed GENMF model is illustrated in Figure 1. First, multiview features are extracted from the images to form the input matrices X in (10), and equations (1)-(8) are used to build the graph embedding regularization terms, yielding the matrices L and Lp in (10). Then, U(v) and V are updated iteratively with the update rules (23)-(25) until the maximum number of iterations is reached or the loss value falls within the permissible range. Finally, the new features Vu of the test set and the training-set features Vl are input to the KNN-based labelling algorithm to obtain the predicted labels. The flowchart of the algorithm is shown in Figure 2.
Schematic diagram of the GENMF model.
Flowchart of the GENMF.
Algorithm 1 gives the pseudocode of the GENMF.
Algorithm 1: Multiview NMF with graph embedding for image annotation.
Input: Image set I=Itrain,Itest and label matrix Yl of the training set.
Output: Predicted label matrix Ypre for the test set Itest.
(1) Extract features X(v) ∈ R+^{Dv×N} for the image set I;
(2) Construct the Laplacian matrices L and Lp;
(3) Initialize U(v) ∈ R+^{Dv×K} and V ∈ R+^{N×K} randomly;
(4) do
(5) Update U(v) and V based on equations (23)-(25);
(6) while the terminating condition is not satisfied
(7) Input V into the 2PKNN [22] image annotation algorithm;
(8) Output the predicted labels Ypre for the test set.
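Step (6) leaves the terminating condition open. A common choice, sketched here as an assumption (the tolerance value is illustrative), is to stop when the relative decrease of the loss falls below a tolerance or a maximum number of iterations is reached:

```python
def should_stop(loss_history, tol=1e-4, max_iter=300):
    """Illustrative terminating condition for step (6): stop when the
    relative improvement of the loss drops below tol, or when the
    iteration budget is exhausted (tol and max_iter are assumed values)."""
    if len(loss_history) >= max_iter:
        return True
    if len(loss_history) < 2:
        return False          # need at least two loss values to compare
    prev, cur = loss_history[-2], loss_history[-1]
    return abs(prev - cur) / max(prev, 1e-12) < tol
```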
4. Experimental Studies

4.1. Dataset and Experiment Design
The main purpose of the proposed algorithm is to improve the performance of automatic image annotation by fusing multiview features and reducing the feature dimension, so that semantic concepts are better represented under semantic constraints in the new low-dimensional feature space. Thus, this paper uses the Corel5k dataset with 15 different features, which consists of 4500 training images and 499 test images and is available at http://lear.inrialpes.fr. The 15 features are all low-level image features: Gist, DenseSift, DenseSiftV3H1, HarrisSift, HarrisSiftV3H1, DenseHue, DenseHueV3H1, HarrisHue, HarrisHueV3H1, Rgb, RgbV3H1, Lab, LabV3H1, Hsv, and HsvV3H1. In the experiments, we select a local feature (DenseSiftV3H1), a global feature (Gist), and a color feature (Hsv).
In the experiments, all features except Gist are regularized through L2-normalization, and the normalized features are input into the GENMF to obtain low-dimensional representations. The low-dimensional feature vectors are then input into the 2PKNN annotation algorithm to obtain the predicted labels for the test set. Performance is evaluated with four metrics: Pre, Rec, F1, and N+. Table 1 lists the parameters used in the experiments.
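The L2-normalization step can be written as a one-liner; here each column holds one image's feature vector, which is an assumption about the data layout:

```python
import numpy as np

def l2_normalize_columns(X, eps=1e-12):
    """Scale each column (one image's feature vector) to unit L2 norm,
    so that distances are comparable across heterogeneous features."""
    return X / (np.linalg.norm(X, axis=0, keepdims=True) + eps)

# two images with 2-dimensional features, stored column-wise
X = np.array([[3.0, 0.0],
              [4.0, 2.0]])
Xn = l2_normalize_columns(X)
```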
Parameters required in the algorithm and their ranges of values.

Notation | Description                      | Range of values
α        | Weight for graph embedding terms | {0, 500, 1000, 1500, 2000, 2500, 3000}
K        | Dimension of the new features    | {100, 200, 300, 400, 500, 600, 700, 800}
Tu       | Label-relevant threshold         | {1, 2, …, 10}
Tl       | Label-irrelevant threshold       | {0, 1, …, Tu}
4.2. Experimental Results

4.2.1. Convergence Curve of the Loss Function
Figure 3 shows the convergence curves of the loss function under different parameters. It can be observed that the loss curves level off after about 300 iterations.
Convergence curves of the loss function: (a) α = 1000, K = 300; (b) α = 1500, K = 400.
4.2.2. The Influence of Different Tu and Tl
The relation matrix Ws ∈ R^{4500×4500} is established according to formula (2). Empirically, the maximum value of Ws is 12.9554 and the minimum is 0. The values Tu = 1, 2, …, 10 and Tl = 0, …, Tu are traversed, where Tu ≥ Tl. Figure 4 shows how the annotation performance changes with different parameter values. Overall, the algorithm obtains the highest F1 value when Tu = 2 and Tl = 1. Thus, in the following experiments Tu is set to 2 and Tl to 1.
Impact of different values for Tu and Tl.
4.2.3. The Influence of Different α
Figure 5 shows the curves of Pre, Rec, F1, and N+ for K = 300 with different α values. Figure 5-1 shows that the annotation accuracy first increases and then decreases as α grows, reaching its highest value at α = 1000. Figure 5-2 shows that the recall rate also generally first increases and then decreases, peaking at α = 2000. From Figure 5-3, it can be seen that the F1 value likewise first increases and then decreases with α, with a dip at α = 1500 and its highest value at α = 1000. In Figure 5-4, the N+ value fluctuates in the interval [0, 1500], peaks at α = 2000, and decreases afterwards.
Curve of Pre, Rec, F1, and N+ with different α values.
4.2.4. The Influence of Different Feature Dimensions K
Figure 6 shows the annotation performance curves when α is taken as 0, 1000, and 2000, with K increasing from 100 to 800 in steps of 100. The three curves for the different values of α show a consistent trend. In Figure 6-1, the accuracy increases with the dimension because more information can be retained, and the curves then become stable; the worst performance is at α = 0. Figure 6-2 shows that the recall rate decreases slightly as the dimension grows, because retrieval becomes more demanding in higher dimensions. In Figure 6-3, F1 reflects the combined effect of accuracy and recall: it increases in the interval [100, 300] and then tends to be stable, except for α = 0. Figure 6-4 shows that the N+ value fluctuates but its overall trend is stable. In general, with α = 1000 or α = 2000 and dimension in the range [200, 800], the proposed algorithm outperforms the original features on all four metrics.
Annotation performance curves for different values of K.
4.2.5. Comparison with Existing Annotation Algorithms
Table 2 presents the comparison with existing annotation algorithms. RMLF [36] optimizes the final prediction tag scores by fusing the prediction scores of 15 different features. LDMKL [14] and SDMKL [14] annotate images using classifiers based on nonlinear kernels of a three-layer network. 2PKNN [22] annotates in two steps: after dealing with data imbalance, images are annotated through a KNN-based method on the balanced dataset. LJNMF [31], merging features [31], and Scoefficients [31] consider different kinds of NMF modeling, extract new features, and annotate images through a KNN-based method. TagProp (ML) [21] and TagProp (σML) [21] learn a discriminative feature fusion on the training set through a metric learning model and annotate images using a weighted KNN method. JEC [37] is a KNN-based algorithm using the average distance over multiple features and is a standard baseline for image annotation. MRFA [38] proposes a semantic context modeling and learning method based on multiple Markov random fields. SML [39] is a discriminative model that treats each label as one class of a multiclass classification problem. GS introduces a regularization-based feature selection algorithm to exploit the sparsity and clustering properties of features.
Comparison results with other annotation algorithms.

Methods          | Pre  | Rec  | F1   | N+
SML              | 23   | 29   | 25.7 | 137
JEC              | 27   | 32   | 29.3 | 139
GS               | 30   | 33   | 31.4 | 146
MRFA             | 31   | 36   | 33.3 | 172
TagProp(ML)      | 31   | 37   | 33.7 | 146
TagProp(σML)     | 33   | 42   | 37.0 | 160
RMLF             | 29.7 | 32.6 | 31.1 | -
Merging features | 33   | 40   | 36.5 | -
Scoefficients    | 30   | 39   | 34.6 | -
LJNMF(3f')       | 35   | 43   | 39.1 | -
2PKNN(3f)        | 32   | 28   | 30.6 | 177
SDMKL            | 38   | 25   | 30   | 158
LDMKL            | 44   | 29   | 34.9 | 179
GENMF (3f)       | 38   | 39   | 39.2 | 168
In Table 2, the note (3f) denotes the three features selected in this paper, and (3f') indicates three features different from ours. The results of the other algorithms are taken directly from the respective papers, where all 15 features are utilized. Our algorithm uses only three features, and Table 2 shows that the proposed GENMF still achieves competitive performance.
4.2.6. The Best, Average, and Standard Deviation of the Results
Table 3 shows the best, average, and standard deviation of the results over 10 independent runs. NMF-based algorithms involve randomness, and different initial values may produce different results. Table 3 shows that the influence of different initializations is limited, although better performance could be expected with a better initialization strategy. Besides, labelling all 499 test images with the new low-dimensional features takes the proposed GENMF 13.945 seconds on average, whereas labelling with the original features takes 34.652 seconds, about 2.5 times as long.
The maximum, mean, and standard deviation of results over 10 independent runs.

Metrics | Precision | Recall | F1    | N+
Mean    | 0.38      | 0.39   | 0.392 | 168
SD      | 0.017     | 0.010  | 0.012 | 4.50
Maximum | 0.41      | 0.40   | 0.398 | 175
5. Conclusions
In this paper, we propose a semisupervised framework based on graph embedding and multiview nonnegative matrix factorization for automatic annotation of multilabel images. The main purpose of the proposed algorithm is to improve annotation performance by fusing multiview features and reducing the feature dimension, so that semantic concepts are better represented under semantic constraints in the new low-dimensional feature space. For feature fusion and dimension reduction, a novel graph embedding term is constructed from the relevant and irrelevant graphs. Then, multiview feature fusion and dimensionality reduction are realized through the multiview NMF model, and the update rules of the model are derived. Finally, images are annotated using a KNN-based approach. Experimental results validate that the proposed algorithm achieves competitive performance in terms of accuracy and efficiency.
Data Availability
The code used in this paper is released, which is written in Matlab and available at https://github.com/MenSanYan/image-annotation.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
The authors are grateful to the support of the National Natural Science Foundation of China (61572104, 61103146, 61425002, and 61751203), the Fundamental Research Funds for the Central Universities (DUT17JC04), and the Project of the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (93K172017K03).
References

[1] Mai L., Jin H., Lin Z., Fang C., Brandt J., and Liu F., "Spatial-semantic image search by visual feature synthesis," in Proc. IEEE CVPR, Honolulu, HI, USA, 2017, pp. 1121–1130.
[2] Guan H. and Smith W. A., "BRISKS: binary features for spherical images on a geodesic grid," in Proc. IEEE CVPR, Honolulu, HI, USA, 2017, pp. 4516–4524.
[3] Zhang Y., Lin W., Li Q., Cheng W., and Zhang X., "Multiple-level feature-based measure for retargeted image quality," IEEE Transactions on Image Processing, vol. 27, no. 1, pp. 451–463, 2018.
[4] Zhang F. and Wah B. W., "Fundamental principles on learning new features for effective dense matching," IEEE Transactions on Image Processing, vol. 27, no. 2, pp. 822–836, 2018.
[5] Zhang H., Patel V. M., and Chellappa R., "Hierarchical multimodal metric learning for multimodal classification," in Proc. IEEE CVPR, Honolulu, HI, USA, 2017, pp. 3057–3065.
[6] Li P., Wang Q., Zeng H., and Zhang L., "Local log-Euclidean multivariate Gaussian descriptor and its application to image classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 4, pp. 803–817, 2017.
[7] Luo Y., Liu T., Tao D., and Xu C., "Multiview matrix completion for multilabel image classification," IEEE Transactions on Image Processing, vol. 24, no. 8, pp. 2355–2368, 2015.
[8] Lanckriet G. R., Cristianini N., Bartlett P., El Ghaoui L., and Jordan M. I., "Learning the kernel matrix with semidefinite programming," Journal of Machine Learning Research, vol. 5, pp. 27–72, 2004.
[9] Farquhar J. D. R., Hardoon D. R., Meng H., Shawe-Taylor J., and Szedmak S., "Two view learning: SVM-2K, theory and practice," in Proc. NIPS, 2005, pp. 355–362.
[10] Hardoon D. R., Szedmak S., and Shawe-Taylor J., "Canonical correlation analysis: an overview with application to learning methods," Neural Computation, vol. 16, no. 12, pp. 2639–2664, 2004.
[11] Kludas J., Bruno E., and Marchand-Maillet S., "Information fusion in multimedia information retrieval," Lecture Notes in Computer Science, vol. 4918, Springer, Berlin, Germany, 2008, pp. 147–159.
[12] Snoek C. G. M., Worring M., and Smeulders A. W. M., "Early versus late fusion in semantic video analysis," in Proc. 13th ACM International Conference on Multimedia, 2005, pp. 399–402.
[13] Gu Y., Qian X., Li Q., Wang M., Hong R., and Tian Q., "Image annotation by latent community detection and multikernel learning," IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 3450–3463, 2015.
[14] Jiu M. and Sahbi H., "Nonlinear deep kernel learning for image annotation," in Proc. IEEE ICASSP, 2017, pp. 1551–1555.
[15] Sun L., Ge H., Yoshida S., Liang Y., and Tan G., "Support vector description of clusters for content-based image annotation," Pattern Recognition, vol. 47, no. 3, pp. 1361–1374, 2014.
[16] Zhang M.-L. and Wu L., "LIFT: multi-label learning with label-specific features," in Proc. IJCAI, 2011, pp. 1609–1614.
[17] Zand M., Doraisamy S., Abdul Halin A., and Mustaffa M. R., "Visual and semantic context modeling for scene-centric image annotation," Multimedia Tools and Applications, vol. 76, no. 6, pp. 8547–8571, 2017.
[18] Tian D. and Shi Z., "Automatic image annotation based on Gaussian mixture model considering cross-modal correlations," Journal of Visual Communication and Image Representation, vol. 44, pp. 50–60, 2017.
[19] Tian J., Huang Y., Guo Z., Qi X., Chen Z., and Huang T., "A multi-modal topic model for image annotation using text analysis," IEEE Signal Processing Letters, vol. 22, no. 7, pp. 886–890, 2015.
[20] Pliakos K. and Kotropoulos C., "PLSA driven image annotation, classification, and tourism recommendation," in Proc. IEEE ICIP, 2014, pp. 3003–3007.
[21] Guillaumin M., Mensink T., Verbeek J., and Schmid C., "TagProp: discriminative metric learning in nearest neighbor models for image auto-annotation," in Proc. IEEE ICCV, Kyoto, Japan, 2009, pp. 309–316.
[22] Verma Y. and Jawahar C. V., "Image annotation using metric learning in semantic neighbourhoods," in Proc. ECCV, 2012, pp. 836–849.
[23] Kalayeh M. M., Idrees H., and Shah M., "NMF-KNN: image annotation using weighted multi-view non-negative matrix factorization," in Proc. IEEE CVPR, Columbus, OH, USA, 2014, pp. 184–191.
[24] Chen Z., Chen M., and Weinberger K. Q., "Marginalized denoising for link prediction and multi-label learning," in Proc. AAAI, 2015, pp. 1707–1713.
[25] Hamid Amiri S. and Jamzad M., "Efficient multi-modal fusion on supergraph for scalable image annotation," Pattern Recognition, vol. 48, no. 7, pp. 2241–2253, 2015.
[26] He Z., Chen C., Bu J., Li P., and Cai D., "Multi-view based multi-label propagation for image annotation," Neurocomputing, vol. 168, pp. 853–860, 2015.
[27] Su F. and Xue L., "Graph learning on K nearest neighbours for automatic image annotation," in Proc. ACM ICMR, 2015, pp. 403–410.
[28] Hasija S., Buragohain M. J., and Indu S., "Fish species classification using graph embedding discriminant analysis," in Proc. CMVIT, 2017, pp. 81–86.
[29] Liu X., Liu R., Li F., and Cao Q., "Graph-based dimensionality reduction for KNN-based image annotation," in Proc. ICPR, 2012, pp. 1253–1256.
[30] BenAbdallah J., Caicedo J. C., Gonzalez F. A., and Nasraoui O., "Multimodal image annotation using non-negative matrix factorization," in Proc. IEEE/WIC/ACM International Conference on Web Intelligence, 2010, pp. 128–135.
[31] Rad R. and Jamzad M., "Automatic image annotation by a loosely joint non-negative matrix factorisation," IET Computer Vision, vol. 9, no. 6, pp. 806–813, 2015.
[32] Guan Z., Zhang L., Peng J., and Fan J., "Multi-view concept learning for data representation," IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 11, pp. 3016–3028, 2015.
[33] Wang H., Ding C., and Huang H., "Multi-label linear discriminant analysis," in Proc. ECCV, 2010, pp. 126–139.
[34] Guan N., Huang X., Lan L., Luo Z., and Zhang X., "Graph based semi-supervised non-negative matrix factorization for document clustering," in Proc. IEEE ICMLA, 2012, pp. 404–408.
[35] Lin C.-J., "On the convergence of multiplicative update algorithms for nonnegative matrix factorization," IEEE Transactions on Neural Networks, vol. 18, no. 6, pp. 1589–1596, 2007.
[36] Yao Y., Xin X., and Guo P., "A rank minimization-based late fusion method for multi-label image annotation," in Proc. ICPR, 2016, pp. 847–852.
[37] Makadia A., Pavlovic V., and Kumar S., "A new baseline for image annotation," in Proc. ECCV, 2008, pp. 316–329.
[38] Xiang Y., Zhou X., Chua T.-S., and Ngo C.-W., "A revisit of generative model for automatic image annotation using Markov random fields," in Proc. IEEE CVPR Workshops, 2009, pp. 1153–1160.
[39] Carneiro G., Chan A. B., Moreno P. J., and Vasconcelos N., "Supervised learning of semantic classes for image annotation and retrieval," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 3, pp. 394–410, 2007.