Active Discriminative Dictionary Learning for Weather Recognition

1 School of Computer Science and Information Technology, Northeast Normal University, Changchun 130117, China
2 School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, China
3 Key Laboratory of Intelligent Information Processing of Jilin Universities, Northeast Normal University, Changchun 130117, China
4 College of Statistics, Capital University of Economics and Business, Beijing 100070, China


Introduction
Traditional weather detection depends on expensive sensors and is limited by the number of weather stations. If existing surveillance cameras that capture images of the local environment could be used to detect weather conditions, weather observation and recognition could become a low-cost and powerful computer vision application. In addition, most current computer vision systems are designed to operate in clear weather [1]. However, in many outdoor applications (e.g., driver assistance systems [2], video surveillance [3], and robot navigation [4]), "bad" weather cannot be avoided. Hence, image-based weather recognition is urgently needed, so that vision systems can recognize the weather conditions and adaptively tune their models or adjust their parameters under different weather.
To date, despite its considerable value, only a few works have addressed the weather recognition problem. In [5,6], the authors proposed models that recognize the weather (sunny or rainy) from images captured by in-vehicle cameras. However, these models rely heavily on prior information about the vehicle, which may weaken their performance. Song et al. [7] proposed a method to classify traffic images into sunny, foggy, snowy, and rainy. Their method extracted several features, such as the image inflection point, power spectral slope, and image noise, and used K-nearest neighbor (K-NN) as the classification model. This method applies only to classifying the weather conditions of traffic scenes. Lu et al. [8] applied a collaborative learning approach to label an outdoor image as either sunny or cloudy. In their method, an existence vector is first introduced to indicate the confidence that the corresponding weather feature is present in the given image. Then, the existence vector and the weather features are combined for weather recognition. However, this method involves many complicated techniques (shadow detection, image matting, etc.); thus its performance largely depends on the accuracy of these techniques. Chen et al. [9] employed a support vector machine (SVM) with the help of active learning to classify the weather conditions of images into sunny, cloudy, or overcast. Nevertheless, they extracted features only from the sky part of the images and neglected the useful information in the nonsky part.
In general, the above methods have been applied successfully to weather recognition. However, they still suffer from the following limitations. Firstly, since these methods extracted features (e.g., SIFT, LBP, and HSV color) either from whole images or only from the sky part, they neglected the fact that the sky part and the nonsky part carry different useful information. In the sky part, the distribution of clouds and the color of the sky are key factors for weather recognition, so visual appearance features such as texture, color, and shape should be extracted. In the nonsky part, features based on physical properties can characterize the changes in images caused by varying weather conditions; thus image contrast [5] and dark channel [10] should be considered. Secondly, these methods directly used K-NN or SVM based algorithms to classify the different weather conditions. Although K-NN is a simple classifier that is easy to implement, it is not robust enough for complicated real-world images in practical applications, while SVM is a more complex classifier for which it is difficult to select an appropriate kernel function. Lastly, these methods required a large amount of labeled training data, which is often expensive and seldom available.
To address the above problems, we propose a novel framework to recognize different weather conditions (sunny, cloudy, and overcast) from outdoor images acquired by a static camera looking at the same scene over a period of time. The proposed method extracts not only features from the sky parts of images, which are relevant to the visual manifestations of different weather conditions, but also features based on physical characteristics from the nonsky parts. Thus, the extracted features are more comprehensive for distinguishing images captured under various weather situations. Unlike other methods that use a traditional classifier (e.g., SVM, K-NN), we use discriminative dictionary learning as the classification model. Moreover, in order to achieve good weather recognition performance with only a few labeled samples, an active discriminative dictionary learning algorithm (ADDL) is proposed. In ADDL, an active learning procedure is introduced into dictionary learning; it selects informative and representative samples for learning a compact and discriminative dictionary to classify weather conditions. As far as we know, the proposed framework is the first approach that combines active learning with dictionary learning for weather recognition.
The rest of this paper is organized as follows. Section 2 briefly reviews related work. Section 3 presents the details of the feature extraction and the proposed ADDL algorithm. Extensive experiments and comparisons are reported in Section 4, and Section 5 concludes the paper.

Related Work
2.1. Dictionary Learning. Dictionary learning has emerged in recent years. It is one of the most popular tools for learning the intrinsic structure of images and has achieved state-of-the-art performance in many computer vision and pattern recognition tasks [11][12][13]. Unsupervised dictionary learning algorithms such as K-SVD [14] have achieved satisfactory results in image restoration, but they are not suitable for classification tasks because they only require that the learned dictionary faithfully represent the training samples. By exploiting the label information of the training dataset, several supervised dictionary learning approaches have been proposed to learn a discriminative dictionary for classification. Among these methods, one may directly use the training data of all classes as the dictionary, and the test image can be classified by finding which class leads to the minimal reconstruction error. Such a naive supervised dictionary learning method is called the sparse representation based classification (SRC) algorithm [15], which has shown good performance in face recognition. However, it is less effective when the raw training images include noise and trivial information, and it cannot sufficiently exploit the discriminative information in the training data. Fortunately, this problem can be addressed by properly learning a dictionary from the original training data. After the test samples are encoded over the learned dictionary, both the coding residual and the coding coefficients can be employed for identifying the different classes of samples [16]. Jiang et al. [17] proposed a method named label consistent K-SVD (LC-K-SVD), which encourages samples from the same class to have similar sparse codes by applying a binary class label sparse code matrix. The Fisher discrimination dictionary learning (FDDL) method [18] was proposed based on the Fisher criterion; it learns a structured dictionary to distinguish samples from different classes. Most existing supervised dictionary learning methods adopt $\ell_0$-norm or $\ell_1$-norm sparsity regularization. As a result, they often suffer heavy computational costs in both the training and testing phases. In order to address this limitation, Gu et al. [19] developed a projective dictionary pair learning (DPL) algorithm, which learns an analysis dictionary together with a synthesis dictionary to attain the goal of signal representation and discrimination. DPL avoids solving $\ell_0$-norm or $\ell_1$-norm problems, which accelerates both training and testing, so we adopt the DPL algorithm as the classification model in our work.

Active Learning.
Active learning, which aims to construct an efficient training dataset for improving the classification performance through iterative sampling, has been well studied in computer vision. According to [20], existing active learning algorithms can generally be divided into three categories: stream-based [21,22], membership query synthesis [23,24], and pool-based [25][26][27]. Among them, pool-based active learning is the most widely used for real-world learning problems because it assumes that a small set of labeled data and a large pool of unlabeled data are available. This assumption is consistent with the actual situation. In this paper, we adopt pool-based active learning.
The crucial point of pool-based active learning is how to define a strategy for ranking the unlabeled samples in the pool. Two criteria, informativeness and representativeness, are widely considered for evaluating unlabeled samples. Informativeness measures the capacity of a sample to reduce the uncertainty of the classification model, while representativeness measures whether a sample represents the overall input patterns of the unlabeled data well [20]. Most active learning methods take only one of the two criteria into account when selecting unlabeled samples, which restricts their performance [28].
Although several approaches [29][30][31] have considered both informativeness and representativeness of unlabeled data, to our knowledge, almost no researchers have introduced these two criteria of active learning into the dictionary learning algorithms.

Proposed Method
In this section, we present an efficient scheme, including effective feature extraction and the ADDL classification model, to classify the weather conditions of images into sunny, cloudy, and overcast. Figure 1 shows a flow chart of the overall procedure of our method. First, the visual appearance based features are extracted from the sky region, and the physical characteristics based features are extracted from the nonsky region of the images. Second, the labeled training dataset is used to learn an initial dictionary, and then samples are iteratively selected from the unlabeled dataset based on two measures: informativeness and representativeness. These selected samples are used to expand the labeled dataset for learning a more discriminative dictionary. Finally, the testing dataset is classified by the learned dictionary.

Features Extraction.
Feature extraction is an essential preprocessing step in pattern recognition problems. In order to express the differences among images of the same scene taken under various weather situations, we analyze the different visual manifestations of images caused by different weather conditions and extract features that describe both the visual appearance properties and the physical characteristics of images.
From the viewpoint of visual appearance, the sky is the most important part of an outdoor image for identifying the weather. On a sunny day, the sky appears blue because sunlight is scattered as it passes through the atmosphere, with blue light scattered most strongly, while, under the overcast condition, the sky is white or grey due to the thick cloud cover. A cloudy day is a situation between sunny and overcast. On a cloudy day, clouds float in the sky and exhibit a wide range of shapes and textures. Hence, we extract SIFT [32], HSV color, LBP [33], and the gradient magnitude from the sky parts of images as the visual appearance features. These features describe the texture, color, and shape of images for weather classification.
Unlike Chen et al. [9], who directly discarded the nonsky regions of images, we extract two features from the nonsky parts based on physical characteristics, which can also serve as powerful features for distinguishing different weather conditions. The interaction of light with the atmosphere is studied in atmospheric optics. On a sunny (clear) day, the light rays reflected by scene objects reach the observer without alteration or attenuation. However, under bad weather conditions, atmospheric effects can no longer be neglected [34][35][36]. Bad weather (e.g., overcast) causes a decay in image contrast that is exponential in the depths of the scene points [35]. So images of the same scene taken in different weather conditions should have different contrasts.

Mathematical Problems in Engineering

The contrast CON of an image can be computed by
$$\mathrm{CON} = \frac{E_{\max} - E_{\min}}{E_{\max} + E_{\min}}, \tag{1}$$
where $E_{\max}$ and $E_{\min}$ are the maximum and minimum intensities of the image, respectively.
Overcast weather may come with haze [8]. The dark channel prior presented in [10] is effective for detecting image haze. It is based on the observation that most haze-free patches in the nonsky regions of images have a very low intensity in at least one RGB color channel. The dark channel $J^{\mathrm{dark}}$ is defined as
$$J^{\mathrm{dark}}(x) = \min_{c \in \{r, g, b\}} \Bigl( \min_{y \in \Omega(x)} J^{c}(y) \Bigr), \tag{2}$$
where $J^{c}$ is one color channel of the image $J$ and $\Omega(x)$ is a local patch centered at $x$.
In summary, both the visual appearance features (SIFT, HSV color, LBP, and the gradient magnitude) of the sky region and the physical characteristics based features (the contrast and the dark channel) of the nonsky region are extracted to distinguish images under different weather situations. The "bag-of-words" model [37] is then used to code each feature to form the feature vectors.
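To make the two physical features concrete, the contrast and the dark channel described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the experiments: the function names, the 3 × 3 patch size, and the edge padding at image borders are assumptions made for the example.

```python
import numpy as np


def contrast(gray):
    """Contrast (E_max - E_min) / (E_max + E_min) of a grayscale block."""
    e_max = float(gray.max())
    e_min = float(gray.min())
    denom = e_max + e_min
    # Guard against an all-black block, where the ratio is undefined.
    return 0.0 if denom == 0 else (e_max - e_min) / denom


def dark_channel(rgb, patch=3):
    """Minimum over the RGB channels, then minimum over a patch x patch window.

    `patch` is assumed to be odd; borders are handled by edge padding
    (an assumption of this sketch).
    """
    h, w, _ = rgb.shape
    channel_min = rgb.min(axis=2)          # min over color channels
    r = patch // 2
    padded = np.pad(channel_min, r, mode="edge")
    out = np.empty((h, w), dtype=channel_min.dtype)
    for i in range(h):
        for j in range(w):
            # Window centered at pixel (i, j) in the original image.
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out
```

A hazy region tends to have a bright dark channel and a low contrast, so both values can be computed per block and then quantized by the bag-of-words step.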

The Proposed ADDL Classification Model.
Inspired by advances in discriminative dictionary learning and active learning, we design an active discriminative dictionary learning (ADDL) algorithm to improve the discriminative power of the learned dictionary. In the ADDL algorithm, DPL [19] is applied to learn a discriminative dictionary for recognizing different weather conditions, and an active sample selection strategy is developed to iteratively select unlabeled samples from a given pool to enlarge the training dataset and thereby improve the DPL classification performance. The criterion for active sample selection includes an informativeness measure and a representativeness measure.

DPL
where $\tau$ is an algorithm parameter. According to [19], the objective function in (4) can be solved by alternating updates.
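For orientation, the relaxed DPL objective of [19], to which the description above refers, is usually written in the form below. The symbols $X_c$ (training samples of class $c$), $\bar{X}_c$ (training samples of all other classes), $A_c$ (coding coefficients of class $c$), and $d_i$ (the $i$th atom of $D$) follow the common presentation of DPL and are assumptions here rather than notation taken from this paper.

```latex
\{P^{*}, A^{*}, D^{*}\} = \arg\min_{P, A, D} \sum_{c=1}^{C}
  \|X_c - D_c A_c\|_F^2
  + \tau \, \|P_c X_c - A_c\|_F^2
  + \lambda \, \|P_c \bar{X}_c\|_F^2,
\quad \text{s.t. } \|d_i\|_2^2 \le 1 .
```

The first term is the reconstruction error, the second ties the coefficients $A_c$ to the analysis projection $P_c X_c$ with weight $\tau$, and the third, weighted by $\lambda$, pushes samples of other classes toward near-zero codes under $P_c$.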
When $D$ and $P$ have been learned, given a test sample $x_t$, the class-specific reconstruction residual is used to assign the class label. So the classification model associated with DPL is defined as
$$\mathrm{label}(x_t) = \arg\min_{c} \|x_t - D_c P_c x_t\|_2. \tag{5}$$
When dealing with a classification task, DPL requires sufficient labeled training samples to learn a discriminative dictionary pair and obtain good results. In practice, it is difficult and expensive to obtain a vast quantity of labeled data. If we can exploit the information provided by the massive number of inexpensive unlabeled samples and choose a small number of "profitable" samples from the unlabeled dataset to be labeled manually (the "profitable" unlabeled samples are the ones that are most beneficial for improving the DPL classification performance), we can learn a more discriminative dictionary than the one learned from only a limited number of labeled training data. To achieve this, we introduce the active learning technique into DPL in the next section.
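The class-specific residual rule used by DPL for classification can be sketched as follows. This is a minimal illustration under stated assumptions: `dpl_classify`, `D_list`, and `P_list` are hypothetical names, each `D_list[c]` is taken to be a (feature dimension × atoms) synthesis subdictionary, and each `P_list[c]` an (atoms × feature dimension) analysis subdictionary that has already been learned.

```python
import numpy as np


def dpl_classify(x, D_list, P_list):
    """Assign x to the class whose subdictionary pair reconstructs it best.

    Returns the predicted class index and the vector of per-class residuals
    ||x - D_c P_c x||_2.
    """
    residuals = np.array([
        np.linalg.norm(x - D @ (P @ x))   # code with P_c, reconstruct with D_c
        for D, P in zip(D_list, P_list)
    ])
    return int(np.argmin(residuals)), residuals
```

Because coding is a single matrix product, no sparse optimization is needed at test time, which is the speed advantage of DPL noted in Section 2.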

Introducing Active Learning to DPL.
When evaluating whether a sample is "profitable" or not, two measures are considered: informativeness and representativeness. The proposed ADDL iteratively evaluates both the informativeness and the representativeness of the unlabeled samples in a given pool, seeking the ones that are most beneficial for improving the DPL classification performance. Specifically, the informativeness measure is constructed from the reconstruction error and the entropy of the probability distribution over the class-specific reconstruction errors, and the representativeness is obtained from the distribution of the unlabeled dataset.
The target of ADDL is to iteratively select the $N_s$ most "profitable" samples, denoted by $\mathcal{D}_s$, from the unlabeled dataset $\mathcal{D}_u$, query their labels $\mathcal{Y}_s$, and then add them to the labeled dataset $\mathcal{D}_l$ to improve the performance of the dictionary learning classification model.
Informativeness Measure. The informativeness measure is an effective criterion for selecting informative samples that reduce the uncertainty of the classification model; it captures the relationship between the candidate samples and the current classification model. Probabilistic classification models select the sample that has the largest entropy of the conditional distribution over its labels [38,39]. Query-by-committee algorithms choose the samples on which a committee disagrees most [22,26]. In SVM methods, the most informative sample is regarded as the one closest to the separating hyperplane [40,41]. Since DPL is used as the classification model in our framework, we design an informativeness measure based on the reconstruction error and the entropy of the probability distribution over the class-specific reconstruction errors of the sample.
For dictionary learning, samples that are well represented by the current learned dictionary are unlikely to provide much additional information for refining the dictionary. Instead, samples with a large reconstruction error and a large uncertainty deserve the most attention, because they carry information that is not captured by the current dictionary. As a consequence, the informativeness measure $E(x_j)$ in (6) is defined from two quantities, $\mathrm{Error\_R}_j$ and $\mathrm{Entropy}_j$, which denote the reconstruction error of the sample $x_j \in \mathcal{D}_u$ with respect to the current learned dictionary and the entropy of the probability distribution over the class-specific reconstruction errors of $x_j \in \mathcal{D}_u$, respectively. $\mathrm{Error\_R}_j$ is defined as
$$\mathrm{Error\_R}_j = \min_{c} \|x_j - D_c P_c x_j\|^2, \tag{7}$$
where $D_c$ and $P_c$ represent the subdictionary pair corresponding to class $c$, which is learned by the DPL algorithm as shown in (4). A larger $\mathrm{Error\_R}_j$ indicates that the current learned dictionary does not represent the sample $x_j$ well.
Since the class-specific reconstruction error is used to identify the class label of the sample $x_j$ (as shown in (5)), the probability distribution of the class-specific reconstruction error of $x_j$ can be obtained. The class-specific reconstruction error probability of $x_j$ in class $c$ is defined as
$$p_c(x_j) = \frac{\|x_j - D_c P_c x_j\|^2}{\sum_{c=1}^{C} \|x_j - D_c P_c x_j\|^2}. \tag{8}$$
The class probability distribution $p(x_j)$ for the sample $x_j$ is computed as $p(x_j) = [p_1, p_2, \ldots, p_C]$, which indicates how well the dictionary distinguishes the input sample. That is, if an input sample can be expressed well by the current dictionary, we obtain a small value of $\|x_j - D_c P_c x_j\|^2$ for one of the class-specific subdictionaries, and thus the class distribution reaches its valley at the most likely class. Entropy is a measure of uncertainty. Hence, in order to estimate the uncertainty of the label of an input sample, the entropy of the probability distribution over the class-specific reconstruction errors is calculated as follows:
$$\mathrm{Entropy}_j = -\sum_{c=1}^{C} p_c(x_j) \log p_c(x_j). \tag{9}$$
A high $\mathrm{Entropy}_j$ value indicates that $x_j$ is difficult to classify with the current learned dictionary; thus it should be selected, labeled, and added to the labeled set for further dictionary training.
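The informativeness computation described above can be sketched as follows. Two points are assumptions of this sketch rather than statements about the authors' implementation: the reconstruction error and the entropy are combined by a plain sum, and a small constant `eps` is added to keep the logarithm and the normalization well defined. The names `informativeness`, `D_list`, and `P_list` are hypothetical.

```python
import numpy as np


def informativeness(x, D_list, P_list, eps=1e-12):
    """Return (score, error, entropy) for one unlabeled sample x.

    error   : smallest class-specific squared reconstruction error
    entropy : entropy of the normalized class-specific errors
    score   : error + entropy (the sum is an assumption of this sketch)
    """
    errs = np.array([
        np.sum((x - D @ (P @ x)) ** 2) for D, P in zip(D_list, P_list)
    ])
    error = float(errs.min())
    p = errs / (errs.sum() + eps)              # class-specific error probabilities
    entropy = float(-np.sum(p * np.log(p + eps)))
    return error + entropy, error, entropy
```

A sample that one subdictionary reconstructs almost perfectly gets a score near zero, while a sample that all subdictionaries reconstruct equally poorly gets both a large error and the maximum entropy.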
Representativeness Measure. Since the informativeness measure only considers how the candidate samples relate to the current classification model, it ignores the potential structure information of the whole unlabeled input dataset. Therefore, the representativeness measure is employed as an additional criterion for choosing useful unlabeled samples. The representativeness measure evaluates whether a sample represents the overall input patterns of the unlabeled data well, and it exploits the relation between the candidate sample and the rest of the unlabeled samples. The distribution of the unlabeled data is very useful for training a good classifier. In previous active learning work, the marginal density and the cosine distance have been used as representativeness measures to capture information about the data distribution [39,42]. Li and Guo [43] defined a more straightforward representativeness measure called mutual information. Its intention is to select the samples located in the dense region of the unlabeled data distribution, which are more representative of the remaining unlabeled data than the ones located in a sparse region. We introduce the representativeness framework proposed by Li and Guo [43] into dictionary learning.
For an unlabeled sample $x_j$, the mutual information with respect to the other unlabeled samples is defined as follows:
$$M(x_j) = H(x_j) - H(x_j \mid X_{U_j}) = \frac{1}{2} \ln\!\left( \frac{\sigma_j^2}{\sigma_{j|U_j}^2} \right), \tag{10}$$
where $H(x_j)$ and $H(x_j \mid X_{U_j})$ denote the entropy and the conditional entropy of the sample $x_j$, respectively. $U_j$ represents the index set of unlabeled samples where $j$ has been removed from $U$, $U_j = U - j$, and $X_{U_j}$ represents the set of samples indexed by $U_j$. $\sigma_j^2$ and $\sigma_{j|U_j}^2$ can be calculated by the following formulas:
$$\sigma_j^2 = K(x_j, x_j), \qquad \sigma_{j|U_j}^2 = \sigma_j^2 - \Sigma_{j U_j} \Sigma_{U_j U_j}^{-1} \Sigma_{U_j j}. \tag{11}$$
Assume that the index set $U_j = (1, 2, 3, \ldots, t)$ and that $\Sigma_{U_j U_j}$ is a kernel matrix defined over all the unlabeled samples indexed by $U_j$; it is computed in the following form:
$$\Sigma_{U_j U_j} = \begin{pmatrix} K(x_1, x_1) & K(x_1, x_2) & \cdots & K(x_1, x_t) \\ K(x_2, x_1) & K(x_2, x_2) & \cdots & K(x_2, x_t) \\ \vdots & \vdots & \ddots & \vdots \\ K(x_t, x_1) & K(x_t, x_2) & \cdots & K(x_t, x_t) \end{pmatrix}. \tag{12}$$
$K(\cdot)$ is a symmetric positive definite kernel function. In our approach, we apply the simple and effective linear kernel $K(x_i, x_j) = \|x_i - x_j\|^2$ for our dictionary learning task. The mutual information is used to implicitly exploit the information between the selected samples and the remaining ones. The samples that have large mutual information should be selected from the pool of unlabeled data for refining the dictionary learned by DPL.
Procedure of Active Samples Selection. Based on the above analysis, we integrate the strengths of the informativeness measure and the representativeness measure to select unlabeled samples from the pool of unlabeled data. We choose the samples that have not only a large reconstruction error and a large entropy of the probability distribution over the class-specific reconstruction errors with respect to the DPL classification model, but also a large representativeness with regard to the rest of the unlabeled samples. The sample set $\mathcal{D}_s = \{x_{s_1}, x_{s_2}, \ldots, x_{s_{N_s}}\}$ is iteratively selected from the pool by (13), which ranks the unlabeled samples by jointly considering $E(x_j)$ and $M(x_j)$. The overall procedure of ADDL is given in Algorithm 1.

Algorithm 1: ADDL.
Inputs: labeled set $\mathcal{D}_l$ and its label set $\mathcal{Y}_l$, unlabeled set $\mathcal{D}_u$, the number of iterations $I_t$, and the number of unlabeled samples $N_s$ to be selected in each iteration.
(1) Initialization: learn an initial dictionary pair $D^*$ and $P^*$ by the DPL algorithm from $\mathcal{D}_l$.
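The representativeness score and the final ranking step can be sketched together as below. This is an illustration under explicit assumptions, not the authors' code. First, a Gaussian (RBF) kernel is swapped in for the kernel stated in the text, because the conditional-variance formula needs $K(x, x) > 0$; `gamma` and the diagonal `jitter` are hypothetical numerical parameters. Second, the informativeness and representativeness scores are combined by a plain sum, which is an assumption made for the example.

```python
import numpy as np


def gaussian_kernel(X, gamma=1.0):
    """Gaussian kernel matrix over the rows of X (an assumed stand-in kernel)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def representativeness(X, gamma=1.0, jitter=1e-6):
    """Mutual information 0.5 * ln(var_j / var_{j | rest}) for every row of X."""
    n = X.shape[0]
    K = gaussian_kernel(X, gamma) + jitter * np.eye(n)
    scores = np.empty(n)
    for j in range(n):
        rest = np.delete(np.arange(n), j)          # index set U_j = U - j
        k_j = K[j, rest]
        K_rest = K[np.ix_(rest, rest)]
        # Conditional variance via the Schur complement.
        cond_var = K[j, j] - k_j @ np.linalg.solve(K_rest, k_j)
        scores[j] = 0.5 * np.log(K[j, j] / cond_var)
    return scores


def select_profitable(info, rep, n_select):
    """Indices of the n_select samples with the largest combined score."""
    combined = np.asarray(info, dtype=float) + np.asarray(rep, dtype=float)
    order = np.argsort(-combined, kind="stable")
    return order[:n_select].tolist()
```

Samples that sit inside a dense cluster are well predicted by their neighbors, so their conditional variance is small and their score is large, whereas an isolated outlier scores close to zero. In practice the two scores may need rescaling to a common range before they are combined.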

Experiments
In this section, the performance of the proposed framework is evaluated on two weather datasets. We first give the details of the datasets and experimental settings. Then, the experimental results are provided and analyzed.

Datasets and Experimental Setting
Datasets. The first dataset employed in our experiment is the dataset provided by Chen et al. [9] (denoted as DATASET 1). DATASET 1 contains 1000 images of size 3966 × 270, and each image has been manually labeled as sunny, cloudy, or overcast. There are 276 sunny images, 251 cloudy images, and 473 overcast images in DATASET 1. Figure 2 shows three images from DATASET 1 labeled sunny, cloudy, and overcast, respectively.
Because there are few publicly available datasets for weather recognition, we construct a new dataset (denoted as DATASET 2) to test the performance of the proposed method. The images in DATASET 2 are selected from the panorama images collected on the roof of the BC building at EPFL (http://panorama.epfl.ch/ provides high-resolution (13200 × 900) panorama images from 2005 till now, recorded every 10 minutes during daytime) and categorized into sunny, cloudy, or overcast based on the classification criterion presented in [9]. It includes 5000 images that were captured approximately every 30 minutes during daytime in 2014, and the size of each image is 4821 × 400. Although both DATASET 1 and DATASET 2 are constructed from the images provided by http://panorama.epfl.ch/, DATASET 2 is more challenging because it contains a large number of images captured in different seasons. In Figure 3 [...] representation of the dark channel. To be specific, we divide each nonsky region of the image into 32 × 32 blocks, extract the contrast and the dark channel features by (1) and (2), and then use the bag-of-words model [37] to code each feature.
In our experiment, 50% of the images in each dataset are randomly selected for training and the remaining images are used for testing. The training data are randomly partitioned into a labeled sample set $\mathcal{D}_l$ and an unlabeled sample set $\mathcal{D}_u$. $\mathcal{D}_l$ is used for learning an initial dictionary, and $\mathcal{D}_u$ is used for actively selecting "profitable" samples to iteratively improve the classification performance. To make the experimental results more convincing, each of the following experiments is repeated ten times, and the mean and standard deviation of the classification accuracy are reported.
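The block-wise extraction followed by bag-of-words coding can be sketched as below. Several details are assumptions of this sketch: "32 × 32 blocks" is read as non-overlapping blocks of 32 × 32 pixels, the codebook is taken as given (for example, cluster centers learned beforehand), and each descriptor is assigned to its nearest codeword by Euclidean distance. The names `iter_blocks` and `bow_histogram` are hypothetical.

```python
import numpy as np


def iter_blocks(region, size=32):
    """Yield non-overlapping size x size blocks of a 2-D or 3-D image array."""
    h, w = region.shape[:2]
    for top in range(0, h - size + 1, size):
        for left in range(0, w - size + 1, size):
            yield region[top:top + size, left:left + size]


def bow_histogram(descriptors, codebook):
    """Normalized histogram of nearest-codeword assignments.

    descriptors : (n, d) array with n >= 1 rows, one feature per block
    codebook    : (k, d) array of codewords
    """
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                       # nearest codeword per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

For the contrast feature, for instance, each block would contribute a one-dimensional descriptor, and the resulting histogram over the codebook becomes that image's feature vector.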
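The evaluation protocol described above (a random half for training, a labeled/unlabeled partition of the training half, and a mean and standard deviation over repeated runs) can be sketched as follows. The function names, the use of a seed per repetition, and the choice of the sample standard deviation are assumptions of this sketch.

```python
import random
import statistics


def split_indices(n_images, n_labeled, seed):
    """Return (labeled, unlabeled, test) index lists for one repetition."""
    rng = random.Random(seed)
    idx = list(range(n_images))
    rng.shuffle(idx)
    half = n_images // 2
    train, test = idx[:half], idx[half:]           # 50% train, 50% test
    return train[:n_labeled], train[n_labeled:], test


def summarize(accuracies):
    """Mean and sample standard deviation over repeated runs."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)
```

For DATASET 1 with 50 initially labeled images, `split_indices(1000, 50, seed)` yields 50 labeled, 450 unlabeled, and 500 test indices, and calling it with ten different seeds gives the ten repetitions.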

Experiment I: Verifying the Performance of Feature Extraction.
The effectiveness of the feature extraction of our method is evaluated first. Many previous works extract only the visual appearance features of the sky part for weather recognition [9]. In order to validate the power of our extracted features based on the physical characteristics of images, the results of two weather recognition schemes are compared. One uses only the visual appearance features to classify the weather conditions, and the other combines the visual appearance features with the features based on physical characteristics. In order to weaken the influence of the classifier on the weather recognition results, K-NN classification (K is empirically set to 30), SVM with the radial basis function kernel, and the original DPL without the active sample selection procedure are applied in this experiment. Figures 4 and 5 show the comparison results on DATASET 1 and DATASET 2, respectively.
In Figures 4 and 5, the x-axis represents the number of training samples and the y-axis represents the average classification accuracy. The red dotted lines indicate that only the six visual appearance features of the sky area are used, and the blue solid lines indicate that both the six visual appearance features and the two features based on physical characteristics of the nonsky area are used for recognition. From Figures 4 and 5, it is clearly observed that the combination of visual appearance features and physical features achieves better performance on the weather recognition task.

Experiment II: Recognizing Weather Conditions by the Proposed ADDL.
In this section, the performance of the proposed ADDL algorithm is evaluated. The first experiment determines the best parameters for ADDL. Then ADDL is compared against several popular classification methods.

Parameters Selection.
There are three important parameters in the proposed approach, that is, $k$, $\lambda$, and $\tau$. $k$ is the number of atoms in each subdictionary $D_c$ learned from the samples of each class, $\lambda$ controls the discriminative property of $P$, and $\tau$ is a scalar constant in the DPL algorithm. The performance of ADDL under various values of $k$, $\lambda$, and $\tau$ is studied on DATASET 1. Figure 6(a) shows the classification results for $k \in \{15, 25, 35, 45, 55, 65, 75, 85, 95\}$. It can be seen that the highest average classification accuracy is obtained when $k = 25$. This demonstrates that ADDL is effective at learning a compact dictionary. According to the observation in Figure 6(a), we set $k$ to 25 in all experiments. The classification results obtained with different values of $\lambda$ and $\tau$ are shown in Figures 6(b) and 6(c). From Figures 6(b) and 6(c), the optimal values of $\lambda$ and $\tau$ are 0.05 and 25, respectively. This is because a value of $\lambda$ that is too large or too small makes the reconstruction coefficients in ADDL too sparse or too dense, which deteriorates the classification performance. If $\tau$ is too large, the effect of the reconstruction error constraint (the first term in (4)) and the sparse constraint (the third term in (4)) is weakened, which decreases the discrimination ability of the learned dictionary. On the contrary, if $\tau$ is too small, the second term in (4) is neglected in dictionary learning, which also reduces the performance of the algorithm. Hence, we set $\lambda = 0.05$ and $\tau = 25$ for all experiments.
Now we evaluate whether active sample selection can improve the recognition performance. From DATASET 1, 500 samples are randomly selected as training data and the remaining samples are used for testing. In the training dataset, 50 samples are randomly selected as the labeled dataset $\mathcal{D}_l$ and the remaining 450 samples form the unlabeled dataset $\mathcal{D}_u$. The proposed ADDL uses the labeled dataset $\mathcal{D}_l$ to learn an initial dictionary pair and then, in each iteration, selects 50 samples from $\mathcal{D}_u$ to be labeled for expanding the training dataset. Figure 7 shows the recognition accuracy versus the number of iterations.
In Figure 7, the 0th iteration indicates that only the initial 50 labeled samples are used to learn the dictionary, and the 9th iteration means that all 500 training samples are used to learn the dictionary for recognition. The recognition ability of ADDL is improved by the active sample selection procedure, and it achieves the highest accuracy when the number of iterations is 3, at which point a total of 200 samples are used for training. If the number of iterations is larger than 3, the recognition rate drops by about 1%. This is because there are some noisy examples or "outliers" in the unlabeled dataset, and more of them are selected for dictionary learning as the number of iterations increases, which interferes with the dictionary learning and degrades the classification performance. In the following, the number of iterations is set to 3 for all experiments.

Comparisons of ADDL with Other Methods.
The first two methods are the K-NN algorithm used by Song et al. [7] and the SVM with the radial basis function kernel (RBF-SVM) adopted by Roser and Moosmann [5]. The third method is SRC [15], which directly uses all training samples as the dictionary for classification. In order to confirm that the active sample selection in our ADDL method is effective, the proposed ADDL is compared with the original DPL method [19]. We also compare ADDL with the method proposed by Chen et al. [9]. As far as we know, the work in [9] is the only framework that addresses the same problem as our method, that is, recognizing different weather conditions (sunny, cloudy, and overcast) in images captured by a still camera. It actively selects useful samples to train an SVM for recognizing different weather conditions.
For DATASET 1, 500 images are randomly selected as the training samples and the rest of the images are used as the testing samples. In the training dataset, 50 images are randomly chosen as the initial labeled training dataset $\mathcal{D}_l$, and the remaining 450 images are regarded as the unlabeled dataset $\mathcal{D}_u$. ADDL and Chen's method [9] both include an active learning procedure; thus they iteratively choose 150 samples from $\mathcal{D}_u$ to be labeled based on their own sample selection criteria and add these samples to $\mathcal{D}_l$ for further training of the classification model. For the K-NN, RBF-SVM, SRC, and DPL methods, which have no active learning procedure, 150 samples are randomly selected from $\mathcal{D}_u$ to be labeled for expanding the labeled training dataset $\mathcal{D}_l$. Table 1 lists the comparisons of our approach with several methods.
From Tables 1 and 2, two points can be observed. First, the recognition performance of K-NN, RBF-SVM, SRC, and DPL is overall inferior to that of the proposed ADDL algorithm. This is probably because these four algorithms randomly select the unlabeled data from the given pool and do not consider whether the selected samples are beneficial for improving the performance of the classification model. Second, although the proposed ADDL and Chen's method [9] both include the active learning paradigm, the proposed ADDL performs better than Chen's method [9]. This is because Chen's method [9] considers only the informativeness and ignores the representativeness of samples when selecting unlabeled samples from the given pool.

Conclusions
We have presented an effective framework for classifying three types of weather (sunny, cloudy, and overcast) from outdoor images. By analyzing the different visual manifestations that different weather conditions produce in images, various features are extracted separately from the sky area and the nonsky area of the images; these features describe the visual appearance properties and the physical characteristics of images under different weather conditions, respectively. The ADDL approach was proposed to learn a more discriminative dictionary and thereby improve weather classification performance by selecting informative and representative unlabeled samples from a given pool to expand the training dataset. Since few image datasets are available for weather recognition, we have collected and labeled a new weather dataset for testing the proposed algorithm. The experimental results show that ADDL is an effective strategy for weather classification, and it may also be applicable to other computer vision tasks.

Figure 1: The flow chart of the proposed method.
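) for each sample $x$ in the unlabeled dataset $\mathcal{D}_u$.
(4) Select a batch of samples (denoted by $\mathcal{D}_s$) from $\mathcal{D}_u$ by (13), and add them to $\mathcal{D}_l$ with their class labels, which are manually assigned by the user. Then update $\mathcal{D}_u = \mathcal{D}_u - \mathcal{D}_s$ and $\mathcal{D}_l = \mathcal{D}_l \cup \mathcal{D}_s$.
(5) Learn the refined dictionaries $P^*_{\mathrm{new}}$ and $D^*_{\mathrm{new}}$ over the expanded dataset $\mathcal{D}_l$.
(6) End for
Output: Final learned dictionary pair $P^*_{\mathrm{new}}$ and $D^*_{\mathrm{new}}$.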
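The select-label-update loop in the steps above can be sketched as follows. This is a hedged illustration, not the authors' code: the scoring rule (prediction entropy multiplied by a rescaled mean cosine similarity to the k nearest pool neighbours) is an assumed stand-in for the criterion in (13), and `fit` and `predict_proba` are hypothetical callbacks standing in for dictionary learning and for class-probability estimation.

```python
import numpy as np

def informativeness(probs):
    """Shannon entropy of each row of class probabilities (higher = more uncertain)."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def representativeness(X_pool, k=5):
    """Mean rescaled cosine similarity of each pool sample to its k nearest pool neighbours."""
    n = len(X_pool)
    if n < 2:
        return np.ones(n)
    norms = np.maximum(np.linalg.norm(X_pool, axis=1, keepdims=True), 1e-12)
    Xn = X_pool / norms
    sim = (1.0 + Xn @ Xn.T) / 2.0      # map cosine similarity to [0, 1]
    np.fill_diagonal(sim, -np.inf)     # exclude self-similarity
    k = min(k, n - 1)
    return np.sort(sim, axis=1)[:, -k:].mean(axis=1)

def select_batch(probs, X_pool, n_select, k=5):
    """Positions (within the pool) of the n_select highest-scoring samples."""
    score = informativeness(probs) * representativeness(X_pool, k)  # assumed criterion
    return np.argsort(-score)[:n_select]

def active_loop(X, labeled, pool, fit, predict_proba, n_iter=3, n_select=50):
    """Iteratively move selected samples from the pool to the labeled set and refit."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(n_iter):
        if not pool:
            break
        model = fit(labeled)                       # learn on the current labeled set
        probs = predict_proba(model, X[pool])      # class probabilities for pool samples
        picked = select_batch(probs, X[pool], n_select)
        chosen = [pool[i] for i in picked]         # labels would be requested from the user here
        labeled += chosen                          # D_l = D_l ∪ D_s
        chosen_set = set(chosen)
        pool = [i for i in pool if i not in chosen_set]  # D_u = D_u − D_s
    return labeled, pool, fit(labeled)             # refit on the expanded labeled set
```

The multiplicative score is only one way to combine the two terms; it is chosen here because a sample then scores highly only if it is both uncertain and located in a dense region of the pool, which is the intuition behind preferring informative and representative samples over isolated outliers.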

Figure 4: Weather recognition results over DATASET 1 by using different features.

Figure 5: Weather recognition results over DATASET 2 by using different features.

Figure 6: Selection of parameters. The x-axis represents the different values of the parameters and the y-axis represents the average classification accuracy. (a) The average classification rate under different k. (b) The average classification rate under different . (c) The average classification rate under different .

Figure 7: Recognition accuracy on DATASET 1 versus the number of iterations.
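Algorithm. Suppose there is a set of $p$-dimensional training samples from $K$ classes, denoted by $X = [X_1, \ldots, X_k, \ldots, X_K]$ with the label set $Y = [Y_1, \ldots, Y_k, \ldots, Y_K]$, where $X_k \in \mathbb{R}^{p \times n}$ denotes the sample set of the $k$th class and $Y_k$ denotes the corresponding label set. The DPL algorithm [19] jointly learns an analysis dictionary $P = [P_1; \ldots; P_k; \ldots; P_K] \in \mathbb{R}^{mK \times p}$ and a synthesis dictionary $D = [D_1, \ldots, D_k, \ldots, D_K] \in \mathbb{R}^{p \times mK}$ to avoid solving the costly $\ell_0$-norm or $\ell_1$-norm sparse coding problem. $P$ and $D$ are used for linearly encoding the representation coefficients and for class-specific discriminative reconstruction, respectively. Following the notation of DPL [19], the objective function of the DPL model is

$$\{P^*, D^*\} = \arg\min_{P, D} \sum_{k=1}^{K} \left\| X_k - D_k P_k X_k \right\|_F^2 + \lambda \left\| P_k \bar{X}_k \right\|_F^2, \quad \text{s.t. } \left\| d_i \right\|_2^2 \le 1, \tag{3}$$

where $D_k \in \mathbb{R}^{p \times m}$ and $P_k \in \mathbb{R}^{m \times p}$ represent the subdictionary pair corresponding to class $k$, $\bar{X}_k$ represents the complementary matrix of $X_k$ in $X$, $\lambda > 0$ is a scalar constant that controls the discriminative property of $P$, and $d_i$ denotes the $i$th atom of dictionary $D$. The objective function in (3) is generally nonconvex, but it can be relaxed to the following form by introducing a variable matrix $A$:

$$\{P^*, A^*, D^*\} = \arg\min_{P, A, D} \sum_{k=1}^{K} \left\| X_k - D_k A_k \right\|_F^2 + \tau \left\| P_k X_k - A_k \right\|_F^2 + \lambda \left\| P_k \bar{X}_k \right\|_F^2, \quad \text{s.t. } \left\| d_i \right\|_2^2 \le 1,$$

where $\tau$ is a scalar constant.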
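To make the roles of the two dictionaries concrete, the following sketch evaluates the DPL objective for given subdictionary pairs and classifies a sample by its class-wise reconstruction residual. It is a minimal illustration under stated assumptions: the function names are hypothetical, the dictionaries are taken as given rather than learned, and the residual-based decision rule is the one commonly associated with DPL [19] and is assumed here rather than taken from this section.

```python
import numpy as np

def dpl_objective(X_by_class, D_by_class, P_by_class, lam):
    """Sum over classes of ||X_k - D_k P_k X_k||_F^2 + lam * ||P_k Xbar_k||_F^2 (needs K >= 2)."""
    K = len(X_by_class)
    total = 0.0
    for k in range(K):
        X_k, D_k, P_k = X_by_class[k], D_by_class[k], P_by_class[k]
        # Complementary matrix: all training samples that do not belong to class k.
        X_bar = np.hstack([X_by_class[j] for j in range(K) if j != k])
        recon = np.linalg.norm(X_k - D_k @ P_k @ X_k, "fro") ** 2
        discr = np.linalg.norm(P_k @ X_bar, "fro") ** 2
        total += recon + lam * discr
    return total

def dpl_classify(x, D_by_class, P_by_class):
    """Assign x to the class whose dictionary pair reconstructs it with the smallest residual."""
    residuals = [np.linalg.norm(x - D @ (P @ x)) for D, P in zip(D_by_class, P_by_class)]
    return int(np.argmin(residuals))

# Toy example (assumed data): two classes living in orthogonal 2-D subspaces of R^4.
I4 = np.eye(4)
D_toy = [I4[:, :2], I4[:, 2:]]          # synthesis subdictionaries, each 4 x 2
P_toy = [D_toy[0].T, D_toy[1].T]        # analysis subdictionaries, each 2 x 4
X_toy = [np.array([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0]]),
         np.array([[0.0, 0.0], [0.0, 0.0], [5.0, 6.0], [7.0, 8.0]])]

toy_value = dpl_objective(X_toy, D_toy, P_toy, lam=0.5)
toy_label = dpl_classify(np.array([0.0, 0.0, 3.0, 1.0]), D_toy, P_toy)
```

In the toy example each pair reconstructs its own class exactly and projects the other class to zero, so both terms of the objective vanish; this is the ideal behaviour that the discriminative term weighted by $\lambda$ encourages.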

Table 1: Comparisons on DATASET 1 among different methods.
As can be seen from Table 1, ADDL outperforms the other methods. The mean classification rate of ADDL reaches about 94%. In DATASET 2, 2500 images are randomly selected as the training samples and the rest of the images are used as the testing samples.

Table 2: Comparison on DATASET 2 among different methods.
In the training dataset, 50 images are randomly chosen as the initial labeled training dataset $\mathcal{D}_l$; the remaining 2450 images are regarded as the unlabeled dataset $\mathcal{D}_u$. All parameter settings for DATASET 2 are the same as for DATASET 1. Table 2 lists the recognition results of the different methods, which indicate that the proposed ADDL performs better than the other methods.