A Semi-Supervised Framework for MMMs-Induced Fuzzy Co-Clustering with Virtual Samples

Although the goal of clustering is to reveal structural information in unlabeled datasets, semi-supervised clustering is expected to improve partition quality when partial structural supervision is available. However, in many real applications, providing a sufficient number of supervised objects with class labels incurs additional costs. The virtual sample approach is a practical technique for improving classification quality in semi-supervised learning, in which additional virtual samples are generated from the supervised objects. In this research, the virtual sample approach is adopted in semi-supervised fuzzy co-clustering, whose goal is to reveal object-item pairwise cluster structures from cooccurrence information. Several experimental results demonstrate the characteristics of the proposed approach.


Introduction
Clustering or cluster analysis is a basic technique for unsupervised classification, whose goal is to reveal intrinsic substructures buried in large-scale unlabeled datasets. In some applications, however, it is possible to utilize partial knowledge of the substructures [1], such as must-link and cannot-link constraints among some objects [2][3][4], class labels on a part of the objects [5][6][7], or predefined fuzzy membership degrees [8,9], and partition quality can be significantly improved by utilizing such partial knowledge. In this study, a semi-supervised situation is considered, in which some supervised objects are available in conjunction with their class labels. Semi-supervised clustering [5,10] is a practical approach for utilizing partial supervised information with the goal of improving the partition quality of unsupervised classification. Such partial knowledge can be utilized in two phases: supervised initialization and supervised membership assignment [6,7].
Although partition quality is expected to improve with sufficient semi-supervision, in many real-world applications it is often difficult to obtain a sufficient number of supervised objects. For example, in many web data analyses, various open data are available, but most of them are unlabeled and their class labels cannot be provided without heavy costs. A promising approach to improving classification quality in semi-supervised learning is the virtual sample approach, in which additional virtual samples are artificially generated from several supervised objects [11][12][13].
In this paper, the virtual sample approach is adopted in semi-supervised fuzzy co-clustering, whereas the conventional model [13] was designed for semi-supervised classification in multidimensional data spaces. Fuzzy co-clustering is a fundamental technique for summarizing mutual cooccurrence information among objects and items, such as document-keyword frequencies in document analysis [14] and customer-product preferences in purchase history analysis [15]. The task of co-clustering is achieved by simultaneously estimating the memberships of both objects and items from a cooccurrence information matrix. Fuzzy co-clustering induced by the MMMs concept (FCCMM) [16] is a fuzzy co-clustering algorithm induced by multinomial mixture models (MMMs) [17], where statistical mixture models are interpreted as k-means-type classification models with regularized objective functions. The iterative algorithm is composed of updating two types of memberships, of items and of objects. In semi-supervised fuzzy co-clustering [18], partial knowledge was utilized in two phases, supervised initialization and supervised object membership assignment, in a similar manner to the framework of [6,7]. In supervised initialization, initial item memberships are generated using labeled objects only. Then, in supervised object membership assignment, the memberships of supervised objects are prefixed while the others are updated in the iterative algorithm.
The remaining parts of this paper are organized as follows: Section 2 provides a brief review of MMMs-induced fuzzy co-clustering, which is then combined with a semi-supervised framework. An artificial process for generating virtual samples is introduced in Section 3. Section 4 presents experimental results, and Section 5 gives summary conclusions.

MMMs-Induced Fuzzy Co-Clustering and Its Variant for Semi-Supervised Clustering

Assume an n × m cooccurrence information matrix R = (r_ij) among n objects and m items, whose element r_ij represents the degree of cooccurrence of item j with object i. For example, r_ij can be the frequency of keyword (item) j in document (object) i in document analysis. Many previous studies have shown that co-cluster structure analysis is useful for summarizing contents, such as the intrinsic chapter or category information of many documents together with their representative keywords. In the fuzzy co-clustering context, the task reduces to the problem of finding pairwise clusters of mutually familiar objects and items, where the goal is to estimate the fuzzy memberships of both objects and items such that mutually familiar object-item pairs have large memberships in the same cluster. MMMs [17] are probabilistic mixture models for co-clustering, in which each component density is a multinomial distribution. The multinomial distribution is a multicategory extension of the binomial distribution, where the probability of an object with item cooccurrence vector r_i = (r_i1, ..., r_im)ᵀ is defined as the joint probability of all items with their frequencies r_ij. MMMs construct a mixture distribution by iteratively estimating the item occurrence probabilities and the a priori probabilities of the generative distributions in conjunction with the a posteriori probabilities of the objects with respect to each generative model.
Following the soft clustering interpretation of probabilistic mixture models [19], Honda et al. [16] introduced a fuzzy co-clustering-based interpretation of MMMs, where the pseudo-log-likelihood function was decomposed into an object-item aggregation measure and a K-L information-based fuzzification penalty. FCCMM [16] is an MMMs-induced fuzzy co-clustering model, in which the degree of object partition fuzziness can be tuned with an adjustable penalty weight on the K-L information-based penalty. For extracting C fuzzy co-clusters, the objective function to be maximized is defined as

J = Σ_{c=1}^{C} Σ_{i=1}^{n} Σ_{j=1}^{m} u_ci r_ij log w_cj + λ Σ_{c=1}^{C} Σ_{i=1}^{n} u_ci log(α_c / u_ci),   (1)

where u_ci and w_cj are the fuzzy memberships of object i and item j to cluster c, respectively, and α_c is the volume of cluster c such that Σ_{c=1}^{C} α_c = 1. From the MMMs viewpoint, u_ci is the a posteriori probability of class c given object i and is constrained by Σ_{c=1}^{C} u_ci = 1. On the other hand, w_cj corresponds to the generative probability of item j in component c and satisfies Σ_{j=1}^{m} w_cj = 1. Then, u_ci is mainly responsible for the exclusive object partition while w_cj just represents the typicality of item j within a generative model.
Σ_{c=1}^{C} Σ_{i=1}^{n} Σ_{j=1}^{m} u_ci r_ij log w_cj is the aggregation criterion of objects and items, which measures the degree of aggregation of familiar objects and items, and becomes larger when mutually familiar object-item pairs with large r_ij have large memberships u_ci and w_cj in the same cluster c. From the viewpoint of k-means-type clustering with respect to the object memberships u_ci, the aggregation criterion is essentially a (linear) hard clustering measure, and the K-L information term [20,21] is responsible for estimating a soft partition of objects, in the same manner as the soft partition nature of Gaussian mixture models [19]. λ tunes the degree of fuzziness of the object partition. When λ = 1, the objective function reduces to the pseudo-log-likelihood function of MMMs with C component distributions. In the case of λ > 1, FCCMM brings a fuzzier co-cluster partition than MMMs, while the object partition becomes crisper with λ < 1. As an extreme case, λ → 0 implies a crisp co-clustering model. It was shown that careful tuning of the fuzziness degree can improve the partition quality of MMMs by reducing sensitivity to initialization and noise [16].
The updating rules for these model parameters are given as

u_ci = α_c exp((1/λ) Σ_{j=1}^{m} r_ij log w_cj) / Σ_{l=1}^{C} α_l exp((1/λ) Σ_{j=1}^{m} r_ij log w_lj),   (2)

α_c = (1/n) Σ_{i=1}^{n} u_ci,   (3)

w_cj = Σ_{i=1}^{n} u_ci r_ij / Σ_{j'=1}^{m} Σ_{i=1}^{n} u_ci r_ij'.   (4)

The clustering algorithm is a 3-step iterative process composed of these updating rules.

Algorithm of FCCMM
(1) Let C be the number of clusters, and choose the fuzzification weight λ.
(2) Initialization: randomly initialize the object memberships u_ci such that Σ_{c=1}^{C} u_ci = 1.
(3) Iterative Process: iterate the following steps until convergence of all u_ci:
(a) Update the cluster volumes α_c using (3).
(b) Update the item memberships w_cj using (4).
(c) Update the object memberships u_ci using (2).
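The iteration above can be sketched compactly with NumPy. This is a minimal illustrative sketch, not the authors' implementation: the function name `fccmm`, the fixed iteration count in place of a convergence test, and the small smoothing constant added to avoid log(0) are all assumptions.

```python
import numpy as np

def fccmm(R, C, lam=1.0, n_iter=100, seed=0):
    """Minimal sketch of MMMs-induced fuzzy co-clustering (FCCMM).

    R   : (n, m) cooccurrence matrix of n objects and m items
    C   : number of clusters
    lam : fuzziness weight (lam = 1 recovers the MMMs pseudo-log-likelihood)
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = rng.random((C, n))
    U /= U.sum(axis=0)                        # object memberships, sum_c u_ci = 1
    for _ in range(n_iter):
        alpha = U.mean(axis=1)                # cluster volumes, rule (3)
        W = U @ R + 1e-12                     # item memberships, rule (4)
        W /= W.sum(axis=1, keepdims=True)
        # rule (2): u_ci proportional to alpha_c * exp((1/lam) sum_j r_ij log w_cj)
        logits = (R @ np.log(W).T).T / lam
        U = alpha[:, None] * np.exp(logits - logits.max(axis=0))  # stabilized
        U /= U.sum(axis=0)
    return U, W, alpha
```

Note that the exponent is shifted by its per-object maximum before exponentiation; this leaves the normalized memberships unchanged but avoids numerical underflow for long cooccurrence vectors.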

Semi-Supervised Fuzzy Co-Clustering and Inductive Classification.
When some objects have intrinsic class labels, the partition quality of unsupervised clustering is expected to be improved with their support [5,10]. A possible semi-supervised framework for fuzzy co-clustering [18] utilized such partial supervision at two levels: supervised initialization and supervised membership assignment. Assume that we have an n × m cooccurrence matrix, where only a part of the objects, i ∈ D_L (|D_L| < n), have class labels while the others do not.
A framework of semi-supervised fuzzy co-clustering is given as follows [18].

Semi-Supervised Fuzzy Co-Clustering Framework
(1) Let C be the number of clusters, which is usually equal to the number of classes of the supervised objects. Choose the fuzzification weight λ.
In the initialization level, plausible initial co-clusters are estimated considering only the supervised objects. This approach is available only if the number of supervised objects is sufficient for estimating cluster-wise item preferences and each cluster index strictly corresponds to an intrinsic class index. In cases with insufficient supervision, this initialization step should instead be performed with the conventional procedure of random assignment, to avoid overfitting to a few supervised objects.
Next, in the iterative optimization level, a fixed crisp object membership is assigned to each supervised object reflecting its class label. The partial supervision can then contribute to guiding the other unlabeled objects into plausible co-clusters. This second level is expected to be useful even if only a few supervised objects are available.
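The supervised membership assignment can be sketched as a modified iteration step: supervised objects keep crisp, prefixed memberships while only the unsupervised ones are updated by rule (2). The helper name `supervised_step` and the dictionary encoding of labels are illustrative assumptions.

```python
import numpy as np

def supervised_step(U, R, labels, lam=1.0):
    """One iteration of the semi-supervised variant.

    U      : (C, n) current object memberships
    R      : (n, m) cooccurrence matrix
    labels : dict mapping a supervised object index i (i in D_L) to its class c
    """
    for i, c in labels.items():               # prefix crisp supervised memberships
        U[:, i] = 0.0
        U[c, i] = 1.0
    alpha = U.mean(axis=1)                    # rule (3)
    W = U @ R + 1e-12                         # rule (4)
    W /= W.sum(axis=1, keepdims=True)
    logits = (R @ np.log(W).T).T / lam        # rule (2) for all objects
    Unew = alpha[:, None] * np.exp(logits - logits.max(axis=0))
    Unew /= Unew.sum(axis=0)
    for i, c in labels.items():               # supervised objects remain fixed
        Unew[:, i] = 0.0
        Unew[c, i] = 1.0
    return Unew, W, alpha
```

Because the supervised columns are re-imposed after each update, the labeled objects act as anchors that pull the cluster-wise item memberships toward the intrinsic classes.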
Once a co-cluster structure has been obtained, we can perform inductive classification of new (unlabeled) objects supported by the co-cluster information [18].
The goal is to predict the class of a new test object *, which is associated with its cooccurrence vector r_* = (r_*1, ..., r_*m).
(1) Co-Cluster Estimation: estimate the cluster volumes α_c and item memberships w_cj by semi-supervised fuzzy co-clustering of the training objects.
(2) Membership Calculation: calculate the memberships u_c* of the test object using (2) with the estimated α_c and w_cj fixed.
(3) Maximum Membership Assignment: search for the largest u_c* and output its class label.
This inductive classification approach was shown to outperform supervised classification using only a small set of supervised objects [18]. This implies that unlabeled objects can contribute to effectively estimating class-wise distributions, compared with supervised learning under insufficient supervision.
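The inductive classification step amounts to one application of rule (2) with the trained parameters held fixed, followed by the maximum-membership assignment. The function name `classify_new_object` is an illustrative assumption.

```python
import numpy as np

def classify_new_object(r_star, W, alpha, lam=1.0):
    """Classify a new object via rule (2) with W and alpha fixed.

    r_star : (m,) cooccurrence vector of the test object
    W      : (C, m) trained item memberships
    alpha  : (C,) trained cluster volumes
    Returns the maximum-membership class and the membership vector u_{c*}.
    """
    logits = (np.log(W) @ r_star) / lam       # (1/lam) sum_j r_{*j} log w_cj
    u = alpha * np.exp(logits - logits.max())  # stabilized rule (2)
    u /= u.sum()
    return int(u.argmax()), u
```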

Generation of Virtual Samples for Semi-Supervised Fuzzy Co-Clustering
Although the performance of semi-supervised clustering is expected to improve as the number of labeled objects becomes larger, generating a sufficient number of supervised objects may incur high costs in real applications. The virtual sample approach is a practical strategy for improving classification quality in semi-supervised learning without such additional costs. In the remaining parts of this paper, the virtual sample approach is adopted in semi-supervised fuzzy co-clustering. Sassano [13] proposed two methods for generating virtual samples for text classification based on the following assumption: the category of a document is unchanged even if a small number of words are added or deleted.
Because documents belonging to the same category usually contain several common keywords, deletion or addition of a small number of words is expected not to have a severe impact on classification quality. In [13], the two strategies of deletion and addition were considered and utilized in semi-supervised support vector machine learning of text documents. In GenerateByDeletion, virtual samples are generated by deleting some portions of the original supervised documents and are added to the class of the original ones. On the other hand, in GenerateByAddition, virtual samples are generated by adding a small number of words to the original supervised documents, where the words to be added are taken from documents whose labels are the same as that of the original document.
In this paper, the two generative strategies are introduced to semi-supervised fuzzy co-clustering tasks. Assume that r_i = (r_i1, ..., r_im) is the cooccurrence vector of a supervised object i ∈ D_L, where r_ij ∈ {0, 1} represents the appearance/absence of item j in object i. Its virtual copy r̃_i = (r̃_i1, ..., r̃_im) is given as an additional virtual (supervised) sample to be added to the partial supervision after slight revisions based on the two strategies.

GenerateByDeletion

(1) Copy r_i to r̃_i.
(2) For each item j with r̃_ij = 1, if rand(j) ≤ p then set r̃_ij = 0, where rand(j) is a uniform random number in [0, 1] and p is the deletion rate.

This procedure implies that each virtual object is a virtual copy of the original supervised object with a smaller number of item appearances. The virtual object r̃_i is almost equivalent to the original r_i as p → 0, while r̃_i becomes sparser as p grows.
Here, the detailed process is demonstrated with a toy example using a set of 6 supervised objects (|D_L| = 6) shown in Table 1, where each cooccurrence vector r_i is composed of ten items (m = 10) and the objects belong to one of two supervised classes (C = 2). Note that class 1 is mainly related to the first 5 items while class 2 is related to the others.

GenerateByAddition
(1) Construct the set of all supervised objects whose class labels are the same as that of object i, and generate a temporal item set composed of all items appearing in this supervised object set.
(2) Copy r_i to r̃_i.
(3) For each item j with r_ij = 1, if rand(j) ≤ p, then randomly select an item j' from the temporal item set and set r̃_ij' = 1.
The generated virtual objects are added to the set of supervised objects and are utilized as semi-supervision in semi-supervised learning.
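The two generative strategies can be sketched as follows for binary cooccurrence vectors. This is a sketch under the notation above; the function names and the NumPy random-generator interface are assumptions, and `temporal_items` stands for the temporal item set of step (1) of GenerateByAddition.

```python
import numpy as np

def generate_by_deletion(r, p, rng):
    """Virtual copy of r in which each appearing item is deleted with rate p."""
    v = r.copy()
    for j in np.flatnonzero(r == 1):          # items appearing in the original
        if rng.random() <= p:
            v[j] = 0                          # delete item j
    return v

def generate_by_addition(r, temporal_items, p, rng):
    """Virtual copy of r in which, per appearing item, a random item drawn from
    the temporal item set (items of same-class supervised objects) is added
    with rate p. Addition never removes original appearances."""
    v = r.copy()
    for _ in np.flatnonzero(r == 1):
        if rng.random() <= p:
            v[rng.choice(temporal_items)] = 1  # add a same-class item
    return v
```

With p → 0 both procedures return near-exact copies; a larger p yields sparser (deletion) or denser (addition) virtual objects, matching the behavior described above.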

Numerical Experiment
The classification quality of semi-supervised fuzzy co-clustering with virtual objects is investigated through numerical experiments in this section.
The classification quality was investigated through a 5-fold cross-validation scheme. In applying semi-supervised fuzzy co-clustering, the dataset was first partitioned into 5 disjoint subsets. Four subsets were utilized as the training set for semi-supervised fuzzy co-clustering, and the remaining subset was used as the test set for validating the classification ability. These training/test trials were repeated 5 times, rotating the test subset. The number of clusters was set to the actual class number, that is, C = 6 for the CiteSeer dataset and C = 7 for the Cora dataset. The fuzzification weight was set as λ = 5.0 for the CiteSeer dataset and λ = 3.0 for the Cora dataset, respectively, such that a slightly fuzzier model than MMMs (λ = 1.0) can contribute to better performance [16].
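The rotation of the 5 disjoint subsets can be sketched as an index generator; the function name `five_fold_splits` and the shuffling seed are illustrative assumptions.

```python
import numpy as np

def five_fold_splits(n, seed=0):
    """Partition n object indices into 5 disjoint subsets and rotate:
    4 subsets form the training set, the remaining one the test set."""
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, 5)
    for k in range(5):
        test = folds[k]
        train = np.concatenate([folds[l] for l in range(5) if l != k])
        yield train, test
```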
4.1. Preliminary Experiment. First, a preliminary experiment was performed with the goal of investigating the effects of the amount of supervised objects, where the semi-supervised framework was implemented without virtual objects. The ratio of supervised objects available in the training data was varied over {0%, 1%, 5%, 10%, 20%, 50%, 100%} for the two benchmark datasets, while the class labels of the remaining objects were withheld as unknown. "0%" corresponds to the conventional unsupervised model, which utilizes no supervision. "100%" corresponds to the fully supervised model, where all training objects have class information and C co-cluster models (cluster-wise item memberships) were independently estimated in each class. After co-cluster estimation, the classes of the unsupervised training objects were predicted by the largest memberships. The classes of the test objects were also predicted by the inductive classification scheme of Section 2.
The recognition rates are compared in Table 2 for the CiteSeer dataset and Table 3 for the Cora dataset. The "100%" entries for the training data are shown as "-" because no unsupervised training objects exist in that case. Notably, in these datasets, larger supervision ratios, such as more than 10%, did not contribute to further improvement in the test evaluation. This may be because too much supervision brings overfitting to the training samples and causes poor generalization capability. Semi-supervised learning is thus expected to contribute to improving generalization capability, compared with fully supervised learning.
In the following experiments, a situation with 1% partial supervision was designed to simulate a case of insufficient partial knowledge, where the goal is to demonstrate the advantage of additional virtual partial supervision.

Investigation of Effects of Virtual Objects.
Next, the effect of virtual objects in semi-supervised fuzzy co-clustering was investigated. In this experiment, the classification abilities of semi-supervised fuzzy co-clustering with and without virtual objects were compared. Following the result of the preliminary experiment above, 1% supervised objects were included in the training sets for semi-supervised co-clustering, simulating the situation where a sufficient number of supervised objects is not available. Virtual objects were generated from each of these 1% supervised objects by GenerateByDeletion and GenerateByAddition, and then added to the training sets and utilized as additional supervision. Supported by the additional information of the virtual objects, the partition quality is expected to improve over the original 1% supervision case.
Additionally, the influence of the number of virtual objects was also investigated, where the number of virtual objects was increased by iterating the generative procedures. Furthermore, a hybrid method was also applied, using both GenerateByDeletion and GenerateByAddition simultaneously; in this hybrid method, at least 2 virtual objects are generated from one supervised object. The recognition rates of both training and test objects were estimated in the same manner as in the preliminary experiment. Tables 4 and 5 show the results for the CiteSeer dataset, and Tables 6 and 7 show the results for the Cora dataset. Boldface indicates improvement over the original 1% supervision case (without virtual objects). "GBD" and "GBA" denote GenerateByDeletion and GenerateByAddition, respectively, and "GBD+GBA" is their hybrid method. In this experiment, the number of virtual objects per supervised object was varied over {1, 2, 5} for GenerateByDeletion and GenerateByAddition and over {2, 4} for the hybrid method, adopting different random seeds. The addition/deletion rate parameter was varied over p ∈ {0.01, 0.05, 0.1, 0.2}.
In both the training and test cases, the recognition rates of semi-supervised co-clustering with virtual objects are superior to those without virtual objects when the number of virtual objects generated from a supervised object is relatively small. These results indicate that virtual objects can contribute to improving class-wise recognition rates for all classes except Agents and HCI in the CiteSeer dataset and Reinforcement Learning in the Cora dataset. Therefore, virtual objects can mostly contribute to improving partition quality from the viewpoint of class-wise recognition.

Conclusion
In this paper, the effect of virtual objects in semi-supervised fuzzy co-clustering was demonstrated. Following the previous study [13], two novel procedures for generating virtual objects for cooccurrence data analysis were proposed, and their utility was investigated through numerical experiments.
In the numerical experiments with two benchmark datasets, the effects of virtual objects were compared in conjunction with the effects of the number of virtual objects per original supervised object. The results indicated that the classification quality of semi-supervised fuzzy co-clustering can be improved, without the additional cost of collecting supervised objects, by adding several virtual objects, while the classification quality can be degraded with too many virtual objects.
Future work includes improving the quality of virtual samples. For example, it may be possible to evaluate the plausibility of the additional samples with cluster validity measures for fuzzy co-clustering [22]. Furthermore, besides simple virtual copies of objects, virtual copies of items may be another possible direction.
(2) Supervised Initialization: initialize the memberships of the supervised objects i ∈ D_L such that u_ci = 1 for the labeled class c and u_ci = 0 for the others. For unsupervised objects i ∉ D_L, set u_ci = ε, ∀c, i, where ε is a small positive value. (The initial cluster volumes and item memberships are thus estimated mainly from the supervised objects.)
(3) Iterative Process: iterate the following steps until convergence of all u_ci:
(a) Update the cluster volumes α_c using (3).
(b) Update the item memberships w_cj using (4).
(c) Update the memberships u_ci of unsupervised objects i ∉ D_L using (2). (The memberships of supervised objects are prefixed and remain unchanged.)

Table 1 :
A toy example of supervised objects set.

Table 2 :
Comparison of recognition rates in preliminary experiment (CiteSeer).

Table 3 :
Comparison of recognition rates in preliminary experiment (Cora).

Table 4 :
Comparison of recognition rates with and without virtual objects (CiteSeer-training).

Table 5 :
Comparison of recognition rates with and without virtual objects (CiteSeer-test).