A Distinguishable Pseudo-Feature Synthesis Method for Generalized Zero-Shot Learning

Generalized zero-shot learning (GZSL) aims to classify seen classes and disjoint unseen classes simultaneously. Hybrid approaches based on pseudo-feature synthesis are currently the most popular among GZSL methods. However, they suffer from negative transfer and low-quality class discriminability, causing poor classification accuracy. To address these problems, we propose a novel GZSL method of distinguishable pseudo-feature synthesis (DPFS). The DPFS model can provide high-quality distinguishable characteristics for both seen and unseen classes. Firstly, the model is pretrained with a distance prediction loss to avoid overfitting. Then, the model selects attributes only from similar seen classes and makes attribute-based sparse representations for unseen classes, thereby overcoming negative transfer. After the model synthesizes pseudo-features for unseen classes, it disposes of pseudo-feature outliers to improve class discriminability. The pseudo-features are fed into the model's classifier together with seen-class features for GZSL classification. Experimental results on four benchmark datasets verify that the proposed DPFS outperforms existing methods in GZSL classification.


Introduction
Target classification and recognition have improved dramatically with the development of deep learning technologies. Traditional deep learning methods rely heavily on large-scale labelled training datasets such as ImageNet [1]. However, they are infeasible in extreme cases where labelled samples of some classes are unavailable [2]. To address this, zero-shot learning (ZSL), which imitates the process of human recognition, has been proposed to link seen classes (available in training datasets) and unseen ones (not available in training datasets) using auxiliary information (e.g., attributes [3] and word vectors [4]). Conventional ZSL methods only consider the recognition of unseen classes but neglect that of seen classes, so they fail to recognize both simultaneously [5]. Subsequently, generalized zero-shot learning (GZSL) [6] has been proposed to address this limitation.
Most previous GZSL works are mainly divided into mapping-based approaches [7,8] and hybrid approaches. The former learn a visual-semantic projection model trained with labelled samples. However, they are prone to overfitting due to the limited number of labelled samples and the domain shift between disjoint seen and unseen classes [9], and thus fail at unseen-class classification. The latter, including generating-based approaches [10] and synthesis-based ones, have been proposed to alleviate overfitting. Generating-based approaches (e.g., generative adversarial networks (GANs) [11] and variational auto-encoders (VAEs) [12]) generate pseudo-features for unseen classes from prior semantic knowledge. However, they suffer from mode collapse [13] because such hybrid models are challenging to train. Unlike them, synthesis-based approaches [14][15][16] synthesize pseudo-features for unseen classes by using semantic information and seen-class features. However, they suffer from negative transfer [17] and low-quality class discriminability [18].
In this paper, we propose a novel two-stage method of distinguishable pseudo-feature synthesis (DPFS) for GZSL tasks, as shown in Figure 1. In stage 1, the embedding network and the preclassifier are jointly pretrained to extract distinguishable features for seen classes and simultaneously predict prototypes for unseen ones. This ensures that the features of seen classes are well preserved and effectively avoids overfitting. Next, in stage 2, distinguishable pseudo-features of unseen classes are synthesized through the attribute projection module (APM) and the pseudo-feature synthesis module (PFSM). For each unseen class, APM builds an attribute-based sparse representation to output a base vector. It only uses attributes of the base classes (i.e., the similar seen classes), thereby overcoming negative transfer. Furthermore, PFSM creates feature representations and synthesizes the pseudo-features by using the base-class features, the base vectors, and the unseen-class attributes. The outliers of the pseudo-features are disposed of to obtain distinguishable pseudo-features and improve class discriminability. The distinguishable features are fed to the classifier to boost GZSL classification performance.
Our major contributions are summarized as follows: (1) We propose a novel generalized zero-shot learning (GZSL) method of distinguishable pseudo-feature synthesis (DPFS). The proposed method further improves GZSL classification performance compared with other state-of-the-art methods. (2) We pretrain our model with a well-designed distance prediction loss while predicting prototypes for unseen classes, thereby avoiding overfitting. (3) We select attributes only from similar seen classes when making attribute-based sparse representations for unseen classes, thereby effectively overcoming negative transfer. (4) We screen the outliers of synthesized pseudo-features and dispose of them to further improve class discriminability.

Related Works
Mapping-based approaches can be traced back to early ZSL tasks [2][3][4][9]. They learn a mapping function between visual features and semantic features by supervised learning, so it is important to construct a feature-semantic loss function for training the mapping model [19]. But early methods are prone to overfitting in GZSL tasks [7]. CPL [8] learned visual prototype representations for unseen classes to solve the problem. To obtain discriminative prototypes, DVBE [20] used second-order graphical statistics, DCC [21] learned the relationship between embedded features and visual features, and HSVA [22] used hierarchical two-step adaptive alignment of visual and semantic feature manifolds. However, the prototype representation is constrained and does not correspond to actual features [10] due to domain shift. Different from these works, we propose a distance prediction loss, which not only constructs a feature-attribute distance constraint for seen classes but also predicts unseen-class prototypes under the guidance of a preclassifier. It keeps seen-class features from disturbing the classification of unseen classes, thereby avoiding overfitting.
Generating-based approaches [23,24], which utilize GANs and VAEs, have been widely applied to produce information about unseen classes and improve prototype representations for GZSL tasks. They generate pseudo-features for unseen classes conditioned on prior semantic knowledge and random noise. LDMS [25], Inf-FG [26], and FREE [27] improved the generating strategy in terms of discrimination loss, consistency descriptors, and feature refining, respectively. Besides, GCF [28] presented counterfactual-faithful generation to solve the recognition-rate imbalance between seen and unseen classes. Although some strategies of generating-based methods are incorporated into our proposed method, in generating-based approaches the use of simplex semantic information and the training difficulty [16] of GANs cause mode collapse.
Synthesis-based approaches [24,29] integrate features and semantics of seen classes to enhance feature diversity. SPF [15] designed a synthesis rule to guide feature embedding. TCN [14] exploited class similarities to build knowledge transfer from seen to unseen classes. To deal with the domain shift, LIUF [16] synthesized domain-invariant features by minimizing the maximum mean discrepancy distance of seen-class features. However, this can lead to negative transfer by mixing in irrelevant class information. Different from the above-mentioned works, we select only the similar seen classes, instead of all seen classes, to accomplish knowledge transfer, thereby avoiding the negative transfer caused by mixing irrelevant information. Then, we utilize distinguishable features extracted from the pretrained embedding network for the pseudo-feature synthesis. Besides, we use a preclassifier to dispose of the outliers of synthesized components, thereby improving class discriminability. Unlike the method [24] that uses synthesized elements from other domains, we only utilize the similar seen classes from the same domain, which avoids the unavailability of data from other domains.

Proposed Method
GZSL is more challenging than ZSL, which recognizes samples only from unseen classes, because GZSL needs to recognize samples from both seen and unseen classes. Therefore, we propose the DPFS method to further improve the theoretical basis of GZSL and boost classification performance. DPFS synthesizes distinguishable pseudo-features for unseen classes and then uses the pseudo-features, together with features of seen classes, to accomplish GZSL classification. In this section, we first define the notations and definitions of GZSL, then describe the proposed method, including base class selection, distinguishable feature extraction, attribute projection, and distinguishable pseudo-feature synthesis. Finally, we provide the training algorithm.

Mathematical Formulation.
In GZSL tasks, suppose we have S seen classes y^S and U unseen classes y^U, with y^S ∩ y^U = ∅. We are given a training dataset Δ^S = {(x_i, y_i)}_{i=1}^{n_s} ⊂ Ξ × y^S, where n_s is the sample number, Ξ is the visual space, x_i is a visual feature, and y_i is the class index of x_i. The mapping function of the embedding network is denoted as φ: Ξ → ς, where ς is the latent space. The weight parameters of the embedding network, the preclassifier, and the classifier are θ_en, θ_pcls, and θ_cls, respectively. A^S = [a_1^S, ..., a_S^S] and A^U = [a_1^U, ..., a_U^U] are the class-attribute matrices of seen classes and unseen classes, respectively. s ∈ y^S and u ∈ y^U are indexes of seen classes and unseen classes.
GZSL methods learn a function f_GZSL: Ξ → y^S ∪ y^U from the training dataset Δ^S and the class-attribute matrices A^S and A^U to classify disjoint seen classes and unseen ones at the same time. After training, samples of both seen and unseen classes from the testing datasets are predicted by f_GZSL.

Base Class Selection.
For each unseen class, we select only the top K seen classes most similar to it, in order to overcome negative transfer. The base classes of unseen class u are the seen classes whose attributes have the smallest distance to the attribute of the unseen class:

y_u^B = topk_{s ∈ y^S}( ||a_u^U − a_s^S|| ),    (1)

where topk(•) is an operator that sorts elements from small to large and selects the indices of the top K elements. y_u^B stores the indices of the top K base classes, i.e., the first to the K-th seen classes most similar to unseen class u.
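The base-class selection above can be sketched as follows (a minimal NumPy sketch; the toy attribute values are hypothetical):

```python
import numpy as np

def select_base_classes(a_u, A_S, K):
    """Pick the K seen classes whose attribute vectors lie closest
    (Euclidean distance) to the unseen-class attribute a_u."""
    dists = np.linalg.norm(A_S - a_u, axis=1)  # distance to each seen class
    return np.argsort(dists)[:K]               # indices of the K smallest

# Toy example: 4 seen classes with 3-dimensional attributes.
A_S = np.array([[1.0, 0.0, 0.0],
                [0.9, 0.1, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
a_u = np.array([1.0, 0.05, 0.0])               # unseen-class attribute
base_classes = select_base_classes(a_u, A_S, K=2)
```

Here the two seen classes closest to `a_u` in attribute space (indices 0 and 1) are returned; attributes of all other seen classes are never touched, which is what prevents negative transfer.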

Distinguishable Feature Extraction.
In stage 1, we pretrain the embedding network and the preclassifier. This makes the embedding network extract distinguishable features for seen classes and build a relationship between classes and semantics, as shown in Figure 1. Attributes obtained by cognitive scientists [30] are the most commonly used semantic knowledge; they are high-level descriptions of target objects specified by human beings [2]. We introduce the feature-attribute distance constraint by imitating meta-learning [31] and build prototype representations, as shown in Figure 2. The customary way to construct a meta-learning task is called K-way-N-shot [32], where N labelled samples in each of K classes are provided in each iteration of model training.
We randomly sample one unseen class and K seen classes per iteration, and set the support set and the query set as

Σ = {(x_i, y_i) | y_i ∈ Ψ^S}_{i=1}^{N×K},  Θ = {(x_i, y_i) | y_i ∈ Ψ^S}_{i=N×K+1}^{N×K+Q×K}.    (2)

The visual features from Σ produce prototypes for the seen classes through the embedding network:

c_s = (1/N) Σ_{(x_i, y_i) ∈ Σ, y_i = s} φ(x_i),    (3)

where x_i is a visual feature from seen class s and N is the number of samples per class. Then, a feature-attribute distance (FAD) loss is constructed over the query set, pulling each embedded feature towards the attribute of its own class and away from the attributes of the other classes:

Λ_FAD = −(1/|Θ|) Σ_{(x_i, y_i) ∈ Θ} log [ exp(−||φ(x_i) − a_{y_i}^S||²) / Σ_{s} exp(−||φ(x_i) − a_s^S||²) ].    (4)

Different from the meta-representation [33] restrained by minimizing the distance of intraclass features, we apply the feature-attribute distance constraint to structure a meta-representation associating common characteristics between different attributes. Under this constraint, features in the latent space are pulled near their prototypes, so that similar items attract and dissimilar ones repel each other. The prototype and the attribute from the same class are close to each other. Therefore, the features of seen classes in the latent space can be regarded as the distinguishable features extracted from the embedding network.
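The prototype computation and a prototypical-network-style reading of the FAD loss can be sketched as below; the softmax-over-negative-squared-distances form and all toy values are our assumptions, not the paper's exact implementation:

```python
import torch
import torch.nn.functional as F

def prototypes(latent_feats, labels, num_classes):
    """Equation (3): class prototype = mean embedded feature per seen class."""
    return torch.stack([latent_feats[labels == s].mean(dim=0)
                        for s in range(num_classes)])

def fad_loss(query_feats, query_labels, attrs):
    """FAD loss sketch: cross-entropy over negative squared distances between
    embedded query features and class attributes, so each feature is pulled
    toward its own class attribute and pushed away from the others."""
    sq_dists = torch.cdist(query_feats, attrs) ** 2  # [n_query, n_classes]
    return F.cross_entropy(-sq_dists, query_labels)

# Toy latent features for two seen classes (latent dimension 2).
feats = torch.tensor([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = torch.tensor([0, 0, 1, 1])
protos = prototypes(feats, labels, num_classes=2)
attrs = torch.tensor([[1.0, 0.0], [0.0, 1.0]])   # hypothetical attributes
loss = fad_loss(feats, labels, attrs)
```

With well-separated features the loss is small but strictly positive; minimizing it drives each embedded feature toward its class attribute, which is the constraint described above.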
To keep the embedding network from overfitting, the prototypes of unseen classes are predicted from features of their base classes. A component from the k-th base class is denoted as

v_k = φ(choice(b_k)),    (5)

where choice(•) is a choice operator; specifically, choice(b_k) randomly chooses a visual feature of the k-th similar base class from Ψ_u^B. A predicted prototype is then denoted as

ĉ_u = Σ_{k=1}^{K} m_{u,k} v_k.    (6)

For each iteration, we build a prototype query set Θ^U = {(ĉ_u, u)}_{u=1}^{U}. Then, a preclassification loss Λ_PC used to pretrain the preclassifier is denoted as

Λ_PC = −(1/U) Σ_{u=1}^{U} log p(u | ĉ_u),    (7)

where p(•|•) is a SoftMax function for the preclassification. Then, Λ_FAD and Λ_PC are summed to form the distance prediction loss Λ_DP:

Λ_DP = Λ_FAD + Λ_PC.    (8)

We use the distance prediction loss to jointly pretrain the embedding network and the preclassifier. After that, seen classes can be classified and unseen classes predicted preliminarily, which prevents the trade-off between seen and unseen classes from failing. Besides, the extracted seen-class features are then used for unseen pseudo-feature synthesis.
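Putting the two terms together, the joint pretraining objective can be sketched as below; the linear preclassifier, the latent dimension, and the toy prototypes are hypothetical stand-ins:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distance_prediction_loss(fad, predicted_protos, unseen_labels, preclassifier):
    """Equation (8) sketch: Lambda_DP = Lambda_FAD + Lambda_PC, where Lambda_PC
    is the softmax cross-entropy of the preclassifier on predicted prototypes."""
    pc = F.cross_entropy(preclassifier(predicted_protos), unseen_labels)
    return fad + pc

torch.manual_seed(0)
preclassifier = nn.Linear(4, 3)        # latent dim 4, 3 total classes (toy)
protos = torch.randn(3, 4)             # predicted unseen prototypes
labels = torch.tensor([0, 1, 2])       # one predicted prototype per class
fad = torch.tensor(0.5)                # placeholder Lambda_FAD value
loss = distance_prediction_loss(fad, protos, labels, preclassifier)
```

Both the embedding-network and preclassifier parameters receive gradients from this single scalar, matching the joint pretraining described above.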

Attribute Projection.
Inspired by sparse coding, we make a sparse representation for each unseen class. Unlike the methods [14,16] that use all seen classes, we select attributes only from the base classes to build attribute projections from seen to unseen classes. For unseen class u, its attribute projection matrix is

M_u = [a_{b_1}^S, ..., a_{b_K}^S],    (9)

where a_{b_1}^S, ..., a_{b_K}^S ∈ Ψ_u^B. The attribute projection can represent the unseen-class information through the sparse representation vector set {m_u}_{u=1}^{U}. The objective function of the attribute projection is

min_{m_u} ||a_u^U − M_u m_u||² + β_1 ||m_u||_1 + β_2 ||m_u||²,    (10)

where β_1 and β_2 are two regularization coefficients, β_1, β_2 > 0. The mixed L1-norm and L2-norm regularizations have the advantages of sparsity and of trading off deviation against variance [34]. Both β_1 and β_2 are set as 0.4 with appropriate generality. The objective function is optimized under the local optimality conditions of Karush-Kuhn-Tucker [35] with m_u non-negative. We normalize m_u by

m_u ← m_u / Σ_{k=1}^{K} m_{u,k}.    (11)

Then, we treat m_u as the base vector. The attribute projection provides a vital item for the pseudo-feature synthesis, as shown in Figure 3.
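A minimal sketch of solving this non-negative elastic-net objective by projected gradient descent (the solver choice is ours; the paper only states the KKT conditions, and the toy projection matrix is hypothetical):

```python
import numpy as np

def sparse_base_vector(a_u, M_u, beta1=0.4, beta2=0.4, lr=0.01, steps=2000):
    """Minimize ||a_u - M_u m||^2 + beta1*||m||_1 + beta2*||m||^2 with m >= 0
    (equation (10)), then L1-normalize the solution (equation (11))."""
    K = M_u.shape[1]
    m = np.full(K, 1.0 / K)
    for _ in range(steps):
        grad = 2 * M_u.T @ (M_u @ m - a_u) + beta1 + 2 * beta2 * m
        m = np.maximum(m - lr * grad, 0.0)  # project onto non-negative orthant
    total = m.sum()
    return m / total if total > 0 else m    # base vector sums to one

# Toy projection matrix: two base-class attributes as columns.
M_u = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [0.0, 0.0]])
a_u = np.array([1.0, 0.2, 0.0])             # unseen-class attribute
m_u = sparse_base_vector(a_u, M_u)
```

The L1 term zeroes out weakly related base classes (here the second coordinate), so the base vector concentrates on the most similar seen classes.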

Distinguishable Pseudo-Feature Synthesis.
For unseen class u, we randomly choose a feature from each of its base classes to construct an embedding matrix [v_1 · · · v_K]. The base vector is used to weight the chosen features, which are embedded into the attribute projection, as shown in Figure 3(a). A feature representation is then formulated as

ṽ_u = (1 − c) Σ_{k=1}^{K} m_{u,k} v_k + c ã_u,    (12)

where c is a weighting coefficient (c ∈ [0, 1]) and ã_u denotes the attribute information of unseen class u embedded into the latent space. However, a feature representation integrated only with features of the base classes may be scattered and produce outliers among the candidate pseudo-features, as shown in Figure 3(b). Therefore, attribute information is integrated into the feature representation to synthesize candidate pseudo-features, as shown in Figure 3(c).
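The synthesis step can be sketched as below, assuming the attribute has already been embedded into the latent space and blended convexly with the weighted base-class features (the convex blend and all toy values are our assumptions):

```python
import torch

def synthesize_candidate(chosen_feats, m_u, attr_embedded, c=0.2):
    """Equation (12) sketch: weight one randomly chosen latent feature per
    base class by the base vector, then blend in the embedded attribute
    with coefficient c."""
    rep = (m_u.unsqueeze(1) * chosen_feats).sum(dim=0)  # feature representation
    return (1 - c) * rep + c * attr_embedded

# Toy inputs: K = 2 base classes, latent dimension 3 (all values hypothetical).
chosen = torch.tensor([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])      # one feature per base class
m_u = torch.tensor([0.75, 0.25])              # base vector (sums to one)
attr = torch.tensor([0.0, 0.0, 1.0])          # embedded unseen-class attribute
pseudo = synthesize_candidate(chosen, m_u, attr, c=0.2)
```

The result mixes 80% base-class feature information with 20% attribute information, mirroring how the weighting coefficient c controls the attribute contribution.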
To dispose of the outliers, we screen the candidates by

f(v) = 1 if p(u | v) ≥ τ for the intended unseen class u, and f(v) = 0 otherwise,    (13)

where τ is the creditability threshold (τ ∈ [0, 1]). The preclassifier acts as the operator for outlier disposal: it screens and reserves the credible pseudo-features satisfying f(v) = 1 to obtain distinguishable pseudo-features of unseen classes, as shown in Figure 3(d). After the attribute projection and pseudo-feature synthesis operations, the synthesized features, integrated with the information of the similar base classes and the unseen classes, possess separability characteristics.
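Outlier disposal can be sketched as a confidence filter on the preclassifier's softmax output; the deterministic toy preclassifier below (weights = 10·I) is purely illustrative:

```python
import torch
import torch.nn.functional as F

def dispose_outliers(candidates, class_idx, preclassifier, tau=0.85):
    """Keep only candidate pseudo-features that the preclassifier assigns to
    their intended unseen class with probability >= tau (equation (13))."""
    probs = F.softmax(preclassifier(candidates), dim=1)
    keep = probs[:, class_idx] >= tau
    return candidates[keep]

# Deterministic toy preclassifier: logits = 10 * input.
pre = torch.nn.Linear(2, 2, bias=False)
with torch.no_grad():
    pre.weight.copy_(10 * torch.eye(2))
cands = torch.tensor([[1.0, 0.0],    # confidently class 0 -> kept
                      [0.0, 1.0],    # confidently class 1 -> dropped
                      [0.5, 0.5]])   # ambiguous           -> dropped
credible = dispose_outliers(cands, class_idx=0, preclassifier=pre)
```

Only the first candidate survives the τ = 0.85 threshold, illustrating how scattered or confusable candidates are screened out before classifier training.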

Train and Inference.
We now describe the DPFS model training. Algorithm 1 shows the pseudo-code of the DPFS training algorithm. The algorithm mainly includes two loop structures because DPFS is a two-stage method. Firstly, the sequential structure in lines 1 to 2 performs the attribute projection to obtain the base vector for each unseen class. Next, the first loop, lines 3 to 9, performs the embedding-module pretraining to extract distinguishable features of seen classes. Then, the second loop, lines 10 to 15, performs the classifier training for GZSL tasks. In each iteration of the classifier training, we randomly select N_w samples in total from the training samples and the synthesized pseudo-feature samples, where the proportion of pseudo-feature samples among the selected samples is set as η.
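The classifier-training sampling step can be sketched as follows; the mixing rule (a fraction η of each batch drawn from pseudo-features) follows the description above, while the tensor shapes are illustrative:

```python
import torch

def mixed_batch(real_feats, real_labels, pseudo_feats, pseudo_labels,
                n_w=1000, eta=0.85):
    """Draw N_w samples per iteration: a proportion eta of them from the
    synthesized pseudo-features, the rest from real seen-class features."""
    n_pseudo = int(round(n_w * eta))
    n_real = n_w - n_pseudo
    ri = torch.randint(len(real_feats), (n_real,))
    pi = torch.randint(len(pseudo_feats), (n_pseudo,))
    feats = torch.cat([real_feats[ri], pseudo_feats[pi]])
    labels = torch.cat([real_labels[ri], pseudo_labels[pi]])
    return feats, labels

# Toy pools: real samples labelled 0, pseudo samples labelled 1.
real = torch.zeros(50, 4)
pseudo = torch.ones(30, 4)
feats, labels = mixed_batch(real, torch.zeros(50, dtype=torch.long),
                            pseudo, torch.ones(30, dtype=torch.long),
                            n_w=20, eta=0.85)
```

With n_w = 20 and η = 0.85, each batch contains 17 pseudo-feature samples and 3 real ones, so the classifier sees unseen-class information at the frequency η controls.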
After each iteration, the classifier is adopted for evaluation.

Experiments

Datasets.
We evaluate our model on four benchmark datasets (AWA2, aPY, CUB, and SUN [38]). AWA2 and aPY are coarse-grained datasets, and aPY includes a higher proportion of unseen classes than AWA2. CUB and SUN are fine-grained datasets; SUN, in particular, has more classes in total and fewer training samples per class than CUB. Table 1 summarizes the statistics of the four evaluating benchmarks.

Implementation Details.
We adopt ResNet-101 [39], a convolutional neural network, as the backbone. Visual features are extracted from the output of the final average-pooling layer after the backbone is pretrained on ImageNet [1]. Figure 4 shows the network structures of the DPFS model, including the embedding network, the preclassifier, and the classifier. The embedding network is composed of three fully connected (FC) layers, each followed by a ReLU activation function for nonlinearity. The preclassifier and the classifier share the same module structure: two FC layers whose output dimension equals the total number of classes. For the four benchmarks, the middle-layer dimension of the classifier is 512 for AWA2 and aPY, and 1024 for CUB and SUN. Our model is coded in PyTorch, runs on a GeForce RTX 2080 Ti, and is trained with the adaptive moment estimation (Adam) [40] optimizer. During the embedding-module pretraining, the numbers of samples per class in the support set and the query set, N and Q, are both set as 4 for AWA2, aPY, and CUB, and 2 for SUN. The learning rate of our model is 10^-4. During the classifier training, the number of selected samples N_w is set as 1000. The classifier is trained with a learning rate of 10^-4, and the embedding module is fine-tuned with a learning rate of 10^-6. Besides, four additional hyper-parameters, the proportion of pseudo-feature samples η, the creditability threshold τ, the number of base classes K, and the weighting coefficient c, are discussed later in the hyper-parameter sensitivity section. Samples from the training datasets are used to train our model by supervised learning, and samples from the testing datasets are used to evaluate its GZSL classification performance.
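The module structure described above can be sketched in PyTorch; the hidden widths of the embedding network and the ReLU between the classifier's two FC layers are our assumptions, since the text only fixes the input (2048), the classifier's middle dimension, and the output size:

```python
import torch
import torch.nn as nn

def embedding_network(in_dim=2048, latent_dim=1024):
    """Three FC layers, each followed by ReLU, mapping ResNet-101 features
    (2048-dim avg-pool output) into the latent space; hidden widths are
    illustrative assumptions."""
    return nn.Sequential(
        nn.Linear(in_dim, 1600), nn.ReLU(),
        nn.Linear(1600, 1280), nn.ReLU(),
        nn.Linear(1280, latent_dim), nn.ReLU(),
    )

def classifier(latent_dim=1024, hidden=512, num_classes=50):
    """Two FC layers; output dimension equals the total class count
    (hidden = 512 for AWA2/aPY, 1024 for CUB/SUN)."""
    return nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, num_classes))

emb = embedding_network()
cls = classifier(num_classes=50)          # e.g., AWA2 has 50 total classes
out = cls(emb(torch.randn(2, 2048)))      # two dummy ResNet-101 features
```

The preclassifier would be a second instance of `classifier` with its own weights, consistent with the shared module structure stated above.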
The accuracies of average seen classes (As) and average unseen classes (Au) are computed with the universal evaluation protocols [6] as per-class averages:

As = (1/S) Σ_{s ∈ y^S} (number of correct predictions in class s / number of test samples in class s),
Au = (1/U) Σ_{u ∈ y^U} (number of correct predictions in class u / number of test samples in class u).

We evaluate the simultaneous classification accuracy of both seen and unseen classes by computing the harmonic mean H:

H = 2 · As · Au / (As + Au).

H is regarded as the most crucial criterion for measuring GZSL classification performance.
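The protocol above can be sketched as (toy labels are hypothetical):

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, classes):
    """Average per-class accuracy over a given label set (As or Au)."""
    accs = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(accs))

def harmonic_mean(acc_seen, acc_unseen):
    """H = 2 * As * Au / (As + Au), the main GZSL criterion."""
    return 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)

# Toy predictions: classes 0-1 are seen, class 2 is unseen.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 0, 2, 2])
As = per_class_accuracy(y_true, y_pred, classes=[0, 1])   # (1.0 + 0.5) / 2
Au = per_class_accuracy(y_true, y_pred, classes=[2])      # 1.0
H = harmonic_mean(As, Au)
```

Because H is a harmonic mean, it penalizes any imbalance between As and Au, which is why it is the preferred single-number GZSL criterion.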

Hyper-Parameter Sensitivity.
There are four hyper-parameters: the proportion of pseudo-feature samples η, the creditability threshold τ, the number of base classes K, and the weighting coefficient c. We discuss their sensitivity because proper hyper-parameters give our model extra reliability and robustness. The proportion η controls the frequencies of obtaining information from seen classes versus unseen ones; a higher η gives the classifier more opportunities to learn the characteristics of unseen classes. Figure 5 shows GZSL classification performance under different η on the four benchmarks. We set η within the range 0.7 to 0.97 and select the proper η value according to the optimal GZSL performance.
In most cases, As decreases slowly while Au and H increase until reaching a peak as η grows. This result reveals that DPFS can provide more balanced GZSL performance by adjusting η. After Au and H reach their peaks, As decreases at an increasing rate, which indicates that a proper selection of η is necessary to solidify seen-class classification. The η value at which H peaks differs across the four benchmarks and depends on the granularity of the training samples. In general, the value on benchmarks with few training samples (such as SUN) should be lower than on benchmarks with many training samples (such as AWA2), and the value on benchmarks with a higher proportion of unseen classes (such as aPY and CUB) should be higher. Therefore, we set η = 0.85 for AWA2, η = 0.94 for aPY, η = 0.91 for CUB, and η = 0.76 for SUN.

Algorithm 1: DPFS training.
Input: training dataset Δ^S; class-attribute matrices of seen and unseen classes A^S and A^U; learning rate λ; max-epochs of the embedding-module pretraining and the classifier training n_pre and n_t.
Initialize: set of weight parameters of the embedding module and the preclassifier W = {θ_en, θ_pcls}; classifier weight parameter θ_cls.
(1) Build attribute projection matrices with A^S and A^U by equations (1), (2), and (9) for unseen classes
(2) Compute the base vectors with the matrices and A^U by equations (10) and (11)
(3) for step = 0, ..., n_pre do
(5)   Compute base class prototype c_k with E_{θ_en} by equation (3)
(6)   Build prototype query set Θ^U = {ĉ_i}_{i=1}^{Q} with E_{θ_en} and A^U by equations (5) and (6)
(7)   Compute Λ_DP by equation (8)
(8)   Update W ← W − λ ∇_W Λ_DP
(9) end for
(10) for step = 0, ..., n_t do
(11)   Synthesize candidate pseudo-features for unseen classes by equation (12)

The creditability threshold τ controls the effect of the outlier disposal. Figure 6 shows the performance under different τ on the four benchmarks. We set τ within the range of 0.7 to 0.95.
This result reveals that As decreases and Au increases as τ grows in most cases. Meanwhile, H increases until reaching a peak. When τ is in the range of 0.8 to 0.9, H reaches its peak and the classification accuracy is best. This indicates that a proper τ can prevent the outliers from interfering with seen-class classification while maintaining unseen-class classification. Therefore, we set τ = 0.85 on all four benchmarks.
The number of base classes K and the weighting coefficient c jointly control the pseudo-feature synthesis. Figure 7 shows heatmaps of H under different K and c values on the four benchmarks. The range of K is set from 3 to 9 for AWA2 and CUB, from 6 to 12 for aPY, and from 2 to 8 for SUN; the range of c is set from 0 to 0.4. The results reveal that K has a more significant impact than c on H. H first increases and then decreases as K grows. This indicates that an appropriate integration with similar seen classes achieves outstanding classification accuracy, but over-integration degrades the accuracy because information of irrelevant classes is mixed in. According to the performance on the four benchmarks, the K at which H peaks depends on the granularity of the training samples. In general, K on benchmarks with few training samples (such as CUB and SUN) should be lower than on benchmarks with many training samples (such as AWA2), and K on benchmarks with a higher proportion of unseen classes (such as aPY) should be higher. So, we set K = 5 for AWA2, K = 9 for aPY, K = 6 for CUB, and K = 3 for SUN.
The results also reveal that when K is fixed, H first increases and then decreases as c grows in most cases. This indicates that weighting a certain proportion of attributes improves classification accuracy, and that a proper introduction of attribute information raises the performance of our model. Therefore, we set c = 0.2 for AWA2, c = 0.3 for aPY, c = 0.1 for CUB, and c = 0.35 for SUN.

Comparison Results.
Table 2 shows the GZSL classification results of existing state-of-the-art approaches and the proposed DPFS. The existing approaches comprise mapping-based, generating-based, and synthesis-based approaches, marked with †, ⸶, and ⸷, respectively. The results show that DPFS gains the best performance on AWA2, aPY, and SUN, and achieves the second-best performance on CUB. Compared with the mapping-based approaches, DPFS is superior to DCC by 5.5% on aPY, and to DVBE by 4.8%, 2%, and 5.4% on AWA2, CUB, and SUN, respectively. Compared with the generating-based approaches, DPFS is superior to FREE by 4.7% on AWA2, to LDMS by 4.9% on aPY, and to GCF by 3.9% on SUN. Compared with the synthesis-based approaches, DPFS is superior to LIUF by 1.6%, 1.1%, 6%, and 3.2% on AWA2, aPY, CUB, and SUN, respectively. DPFS significantly improves Au and avoids overfitting.

Table 1: Statistics of the four benchmark datasets.
Dataset | Seen classes | Unseen classes | Total classes | Attributes | Training samples | Test samples (seen) | Test samples (unseen) | Total samples
AWA2 | 40 | 10 | 50 | 85 | 23527 | 5882 | 7913 | 37322
aPY | 20 | 12 | 32 | 64 | 5932 | 1483 | 7924 | 15339
CUB | 150 | 50 | 200 | 312 | 7057 | 1764 | 2967 | 11788
SUN | 645 | 72 | 717 | 102 | 10320 | 2580 | 1440 | 14340

DPFS is superior to most mapping-based approaches in terms of Au and H, especially on SUN. This indicates that DPFS has a stronger learning ability on benchmarks with few training samples. DPFS also shows significant improvement in As, Au, and H, especially compared with generating-based approaches on aPY.
This shows that DPFS makes full use of the feature information of seen classes and the attribute information, thereby overcoming the difficulty of classifying a higher proportion of unseen classes and avoiding mode collapse.
We also evaluate DPFS on the four benchmarks for conventional ZSL tasks, where only the synthesized pseudo-feature samples are fed into the classifier. Table 3 shows the ZSL classification results. We observe that DPFS outperforms existing methods on AWA2, aPY, and SUN, which further verifies that the synthesized pseudo-features have distinguishable characteristics.
We further demonstrate the advantage of DPFS over SPF and LIUF. We imitate SPF and LIUF by replacing our pseudo-feature synthesis strategy with theirs, forming the reference methods D-SPF and D-LIUF, respectively; the embedding-module pretraining and classifier-training stages of D-SPF and D-LIUF are the same as those of DPFS. Table 4 shows the comparison among D-SPF, D-LIUF, and DPFS. DPFS gains prominent advantages over D-SPF because the optimized attribute projection embeds and projects seen-class features into unseen-class features more accurately, improving class discriminability. DPFS also has apparent advantages over D-LIUF, especially on CUB and SUN. DPFS eliminates irrelevant classes and thus suppresses negative transfer. In addition, DPFS introduces the attribute weighting in equation (12) and the outlier disposal in equation (13) to decrease the confusion between classes. Therefore, DPFS is superior to D-SPF and D-LIUF in classification.

Ablation Results.
We conducted ablative experiments to illustrate the influence of different tactics in DPFS. The tactics comprise the embedding-module pretraining (mpt), the outlier disposal (odi) in equation (13), and the preclassification loss (pc) in equation (7). Table 5 shows the results of the ablation experiments. Four ablated methods, PFS, DPFS-1, DPFS-2, and DPFS-3, are validated. PFS removes all the tactics. DPFS-1, which pretrains the model only with the feature-attribute distance loss in equation (4), adds the mpt tactic. DPFS-2 adds both the mpt and odi tactics. DPFS-3, which pretrains the model with the distance prediction loss in equation (8), adds both the mpt and pc tactics.
Adding the mpt tactic is important for extracting common characteristics between seen and unseen classes because it improves prototype representations and eliminates the domain shift. Therefore, DPFS-1 makes obvious progress compared with PFS: DPFS-1 is superior to PFS by 8.6% on AWA2, 8.3% on aPY, 9.3% on CUB, and 9.3% on SUN. On this foundation, DPFS-2 adopts the odi tactic to eliminate the outliers of candidate pseudo-features, which boosts the performance on some benchmarks; DPFS-2 is superior to DPFS-1 by 0.9% on AWA2 and 0.4% on aPY. We visualize features from the embedding module by t-SNE [41] to further show the tactic effects on the AWA2 benchmark for GZSL tasks. Figure 8 shows the visualization results. We find that DPFS improves the distinguishability of unseen classes while maintaining that of seen classes, according to the comparisons among Figures 8(a)-8(d).
Considering that existing methods [18,26] do not visualize all features of both seen and unseen classes, we visualize all the output features of the testing samples from PFS and DPFS in Figures 8(e) and 8(f), respectively. The classes characterized by the output features from DPFS are clearly more separable than those from PFS. DPFS eliminates the confusion between classes and improves feature distinguishability, thus achieving better multiclass classification accuracy. Both seen and unseen classes satisfy the characteristics of intraclass compactness and interclass separability. Therefore, DPFS can effectively eliminate the domain shift.

Discussion
Based on the results above, our model was trained and evaluated on four benchmark datasets. Our method selected the optimal hyper-parameters for different benchmarks and achieved the best GZSL classification performance compared with most existing methods. DPFS gained superior performance especially on benchmarks with few training samples or a higher proportion of unseen classes, because it uses feature and attribute information appropriately and avoids mode collapse. Compared with similar existing synthesis-based models, DPFS eliminates the introduction of irrelevant classes and suppresses negative transfer; it also synthesizes candidate pseudo-features and disposes of the outliers to improve class discriminability. Furthermore, our model was also trained and evaluated for ZSL tasks and outperformed competing ZSL methods on most benchmarks. Besides, we conducted ablation experiments on DPFS and explained the performance gain of each tactic. With the embedding-module pretraining tactic, distinguishable features can be extracted and GZSL performance improved. On this basis, adding the preclassification tactic predicts prototypes for unseen classes before the classifier training, thereby improving performance and avoiding overfitting. The outlier-disposal tactic further enhances the performance. These tactics are the foundation on which DPFS outperforms the competing GZSL methods. The visualization results demonstrate that DPFS provides distinguishability for both seen and unseen classes.

Conclusion
This paper proposed a novel distinguishable pseudo-feature synthesis (DPFS) method for GZSL tasks. It includes the procedures of base class selection, distinguishable feature extraction, attribute projection, feature representation, and outlier disposal. These procedures realize the initialization, the connection, and the weight updating of the DPFS model, so the model can synthesize distinguishable pseudo-features from attributes of unseen classes and features of similar seen classes. Experimental results showed that DPFS achieves better GZSL classification performance than existing methods, indicating that DPFS significantly improves class discriminability, restrains negative transfer, and effectively eliminates the domain shift and the confusion between classes. In the future, we will synthesize more distinguishable features of unseen classes by integrating more auxiliary information, such as statistical features and knowledge graphs, to extend our method to other applications.

Conflicts of Interest
The authors declare that they have no conflicts of interest.