Nonnegative Matrix Factorization-Based Spatial-Temporal Clustering for Multiple Sensor Data Streams

Cyber physical systems have grown exponentially and have been attracting a lot of attention over the last few years. To retrieve and mine the useful information from massive amounts of sensor data streams with spatial, temporal, and other multidimensional information has become an active research area. Moreover, recent research has shown that clusters of streams change with a comprehensive spatial-temporal viewpoint in real applications. In this paper, we propose a spatial-temporal clustering algorithm (STClu) based on nonnegativematrix trifactorization by utilizing time-series observational data streams and geospatial relationship for clustering multiple sensor data streams. Instead of directly clustering multiple data streams periodically, STClu incorporates the spatial relationship between two sensors in proximity and integrates the historical information into consideration. Furthermore, we develop an iterative updating optimization algorithm STClu. The effectiveness and efficiency of the algorithm STClu are both demonstrated in experiments on real and synthetic data sets. The results show that the proposed STClu algorithm outperforms existing methods for clustering sensor data streams.


Introduction
Cyber physical systems (CPS) have grown exponentially and have been attracting a lot of attention over the last few years [1].Emerging applications include transportation, energy, manufacturing, environment monitoring, and many parts of our social infrastructure.It consists of a large number of sensors to monitor physical or environmental conditions [2,3].The incoming information in CPS needs to be considered as a stream and not so much as an ever-growing data set [4,5].
To obtain interesting relationships and useful information from multiple data streams, a variety of methods have developed over the past decades.Beringer and Hüllermeier [6] used a discrete Fourier transformation (DFT) to summarize the data streams and presented an online version of the classic -means clustering algorithm to cluster parallel data streams.To avoid recalculating the DFT coefficients and minimize processing time, an incremental update mechanism for clustering large number of data streams was proposed in [7].To discover cross relationships among multiple data streams, a clustering-on-demand framework was presented in [8].Yeh et al. [9] proposed a framework for clustering over multiple evolving streams by correlations and events, which monitored the distribution of clusters over multiple data streams based on their correlation.To find communities in a large set of interacting entities, Aggarwal and Yu [10] proposed an online analytical processing framework for community detection of data streams.Evolutionary clustering was a relatively new topic and was formulated by Chakrabarti et al. [11] in 2006.To discover communities and capture their evolution with temporal smoothness, incremental and evolutionary clustering technologies were designed to handle dynamic data streams [10,[12][13][14].Wang et al. [15] proposed an ECKF framework for evolutionary clustering large-scale data based on low-rank kernel matrix factorization.Chi et al. [16] proposed two algorithms to incorporate temporal smoothness in evolutionary spectral clustering.Ning et al. [14] presented an incremental approach for spectral clustering to handle not only insertion/deletion of data points but also similarity changes between existing points.
The values of a sensor in the temporal domain usually have spatial correlation features, meaning that the sensors' readings are influenced by near sensors.Such as in transportation systems, the congestion swiftly expands along the street and influences nearby sensors.A serious congestion usually lasts for a few hours and covers hundreds of sensors when reaching the full size.As time passes by, it shrinks slowly, eventually reduces the coverage, and finally disappears.As shown in Figure 1, we find that those records in an atypical event are spatially close and timely relevant.To retrieve the useful information from massive sensor data streams, Tang et al. [5] proposed a model of atypical cluster to describe the atypical events and retrieve them with spatial and temporal attribute.To explore the spatial correlation between data sampled by different sensors for sensor networks, -local spatial clustering algorithm was proposed [17].To explore the temporal locality features of sensor data, Wei and Peng [18] proposed an incremental clustering algorithm that clusters objects by inducing clustering results from previous time windows.
In this paper, we propose a spatial-temporal clustering algorithm (STClu) based on nonnegative matrix trifactorization by utilizing time-series observational data streams and geospatial relationship for clustering multiple sensor data streams.Our proposal was motivated by recent progress in nonnegative matrix factorization and its applications [19][20][21][22][23][24][25], with the aim being to discover clusters of objects with similar behavior and, consequently, discover potential anomalies and atypical events which may be revealed by the relationship evolving over time.We assume that the spatial feature is the summary of the atypical event in temporal dimension, and the temporal feature is the summary of the event in spatial dimension.STClu simultaneously incorporates the spatial correlation between two sensors in proximity and integrates the historical information into consideration.To maintain a consistent clustering result, there is a trade-off between the spatially close and timely relevant cost embedded in STClu and the benefit from current observations and the spatially close historical results.Thus, the clustering results at a given time step are determined by current snapshot, prior knowledge, and spatial correlation between two sensors in proximity.
The rest of this paper is organized as follows.Related research work is introduced in Section 2.Then, the preliminaries of STClu are given in Section 3. Section 4 presents the proposed algorithm in detail, while convergence and correctness proofs of the STClu are provided in the Appendix.Experimental results using synthetic and real world data sets are presented in Section 5. Finally, Section 6 gives the conclusion.

A Brief Review of Matrix Factorization and Its Applications
Graphs are used in a wide range of applications, such as transportation networks, sensor networks, and social networks.So, various problems are studied under graph mining.To find clusters in graphs is a new challenge if the graph is evolving over time.Matrix is a usual representation of a graph; the relationships between matrix factorization and means clustering have been explored in [26].Singular value decomposition (SVD) has served as a building block for many important applications, such as PCA and LSI [27].SVD factorizes a matrix with the general form of  ≈ SVD  , where  is a unitary basis consisting of left-singular vectors of ,  is a unitary basis consisting of right-singular vectors of , and  is a diagonal matrix with singular values on the diagonal.In SVD, matrices  and  are allowed to have negative entries; the projected data might have negative values in spite of the original data being positive.That can prevent from the clustering results to be intuitive for applications such as documents or images that have a positive data input.
Nonnegative matrix factorization (NMF) [28,29] is a linear, nonnegative approximate data representation technique.NMF focuses on the analysis of data matrices whose elements are nonnegative, a common occurrence in data sets derived from text and images.The nonnegative data matrix  is factorized into matrixes  and  as  ≈   , with the constraints that  ∈ R × and  ∈ R × are nonnegative.
In particularly, the result of a -means clustering run can be written as a matrix factorization  =   , where  is a data matrix,  contains the cluster centroids, and  contains the cluster membership indicators.Nonnegative matrix factorization focuses on the analysis of data matrices whose elements are nonnegative [28].
To extend the applicable range of NMF methods, when the data matrix is unconstrained, Semi-NMF is motivated from the perspective of clustering [29], in which it restricts  to be nonnegative while placing no restriction on the signs of .For reasons of interpretability, Convex-NMF constrains the basis vectors  = (f 1 , . . ., f k ).The vectors defining  lie within the column space of :   =  1  1 + ⋅ ⋅ ⋅ +     =   , or  = .It can be applied to both nonnegative and mixed-sign data matrices.This constraint could interpret the columns   as weighted sums of certain data points and these columns would capture a notion of centroids.
To expand the range of application of NMF, Semi-NMF and Convex-NMF algorithm have been proposed in [29].Semi-NMF offers a low-dimensional representation of data points which lends itself to a convenient clustering interpretation.Convex-NMF restricts the columns of  to be convex combinations of data points in .The advantage of this constraint is that the columns as weighted sums of certain data points could interpret.Based on NMF, a clustering algorithm has been proposed in [29].
However, the traditional NMF, Semi-NMF, and Convex-NMF are linear model and they may fail to discover the nonlinearities of data streams.In real world, the data streams have potential nonlinear structure.Kernel method is a powerful technique for extracting useful information form nonlinear correlations.To solve the problem of nonlinearities, the kernel method is to map the data nonlinearly into a kernel feature space.Then, the Convex-NMF method can be accomplished in the kernel feature space to process the nonlinear data.
Indeed, several important applications can be modeled as large sparse graphs, such as transportation network analysis and social network analysis.Low-rank approximation for the matrix of a graph is essential in finding patterns and detecting anomalies.And it can extract correlations and remove noise from matrix structured data.This has led to proposal of methods such as CUR [30], CMD [31], and the family of Colibri [32].

Preliminaries
To monitor the traffic status along a freeway, sensors are deployed and utilized to collect readings, such as vehicle speed and volume of traffic.It is assumed that data records arrive synchronously, which means that all data streams will be updated simultaneously.The sensors are the points in the graph and an edge is formed between each pair of sensors; these data records can be used to construct graphs of transportation networks periodically.

Our Approach
4.1.Objective Function of .By utilizing time-series observational data streams and geospatial relationship for clustering multiple sensor data streams, the proposed method STClu takes the spatial correlation between two sensors in proximity and the historical information into consideration.The cost of spatially close and timely relevant is embedded in the objective function.The cluster partitions are obtained by decomposing the adjacency matrix representation of the graph into a product of its latent factors.The square of the Euclidean distance and the generalized Kullback-Leibler divergence are common metrics for the approximation error [24].In STClu, we seek to minimize the distance function ‖ − ‖ 2  between two nonnegative matrices  and , defined as where ‖ ⋅ ‖ 2  is the Frobenius norm.This is lower bounded by zero and clearly vanishes if, and only if,  = .

Optimization of 𝑆𝑇𝐶𝑙𝑢.
As can be seen, the objective function in ( 4) is minimized with respect to  () ,  () ,  () , and  (−1,) , and it is unrealistic to expect an algorithm to find the global minimum.In the following, we introduce an alternating schema to optimize the objective function, which can achieve a local minimum.
The computational cost of the proposed STClu algorithm is discussed as follows.To compute the complexity of the STClu algorithm, we need to inspect its main operations, namely, the decomposition matrix and multiplicative updates.In the decomposition step, the computation is actually done using the three small matrices, , , and .Each of , , and  is multiplied by an update factor, which involves several matrix multiplications.We assume that  is the cluster number at time step ,  1 and  0 are the sizes of the data subspaces at time steps  and −1, respectively, and  1 and  0 are the sizes of the input matrices at time steps  and  − 1, respectively; the multiplicative update stops after  iterations, and the cost for STClu is (( 2  +  +  2 +  2 )), where  = max( 0 ,  1 ) and  = max( 0 ,  1 ).

Experiments and Results
In this section, we use several synthetic and real world data sets to evaluate the effectiveness and efficiency of the STClu algorithm.
following three popular clustering algorithms.The first is means clustering in the principle component analysis (PCA) subspace.The other two algorithms are the typical spectral clustering algorithm normalized cut (Ncut) [34] and NMFbased clustering [19].
To evaluate clustering quality, all our comparisons are based on clustering accuracy (ACC) and normalized mutual information (NMI) measurements.
Clustering accuracy (ACC) discovers the one-to-one relationships between clusters and classes and measures the extent to which each cluster contains data points from the corresponding class.It is defined as [18,19] where   denotes the cluster label,   denotes the true class label,  is the total number of documents, (, ) is the delta function that is equal to one if  =  and zero otherwise, and map(  ) is the permutation mapping function that maps each cluster label   to the equivalent label from the data set.
Between two random variables  (category label) and  (cluster label), NMI is defined as [35] where  is the number of documents,   and   denote the number of documents in category  and cluster , respectively,  , denotes the number of documents in category  as well as in cluster, and  () and  () are the cluster numbers in the true and predicted clusters.The NMI score is 1 if the clustering results perfectly match the category labels whereas the score is 0 if the data are randomly partitioned.The higher the NMI score is, the better the clustering quality is.Since there are no predefined categories in our data, we have to design an alternative way to carry out the evaluation.In this paper, we use the -means clustering result of the combined feature data as the target labels for feature clustering.Similarly, the -means clustering result of the combined user data is used as the target labels for user clustering.To randomize the experiments, we conduct the evaluation using different cluster numbers.Note that all the experiments ran on the same machine.At each time step, for each given cluster number, 30 test runs are carried out on different randomly chosen clusters.For each experiment, we run it 10 times and report the average.

Synthetic Data Sets.
In this section, we use synthetic datasets to demonstrate that STClu can significantly cluster data for multiple evolving data streams over time.The number of nodes and number of clusters in each time step are given.The synthetic datasets are generated by two steps [18].The first step generates the objects' geographic information, and the second step generates the values of the objects' nongeographic attributes.The objects' nongeographic attributes are generated by applying prototype systems used in [6].It can be defined as follows: (+Δ) = ()+  (+Δ),   (+Δ) =   () + (), in which (⋅) is a stochastic process,  = 0, and Δ, 2Δ, . . .are independent random variables, uniformly distributed in an interval [−, ].So, the data stream (⋅) can be defined by () = ( + ℎ()) + (), where ℎ(⋅) and (⋅) are stochastic processes that are generated in the same way as the prototype (⋅).The constant  determines the smoothness of a data stream, which can be different for (⋅), ℎ(⋅), and (⋅), such as 0.05, 0.1, and 0.2, respectively.To examine the quality of STClu for clustering multiple data streams, we generate random datasets with the number of streams ranging from 100 to 1000.Each stream has a length of 2,000 points.
We show the performance of four methods while the time window evolving over time and the number of clusters in the data varies from 2 to 20.The stream number is fixed at 400, and each of them contains 1,000 points.As can be seen in Table 1, the average ACC and NMI among four methods The reason is that it incorporates the previous clustering results of the upstream sensors and the spatial relationship between the upstream and downstream sensors.
The average ACC and NMI value for the proposed algorithm varies with the varied number of clusters or data streams shown in Figures 3(a) and 3(b), respectively.As shown in Figure 3, we can observe that the performance of the proposed algorithm STClu is very robust to the number of clusters and data streams.
Finally, we evaluate the average processing time for a round of clustering multiple data streams.There are different factors affecting the execution time.In this experiment, we evaluate the effect of the window size and the number of data streams on the response time for the compared algorithms.
The following set of experiments evaluate the effect of window size on the execution time of these algorithms.Figure 4(a) shows that as the window size increases, the processing time of these algorithms increases.The -axis shows the execution time, while the -axis represents the window size () from 60 to 600.In Figure 4(b), the -axis shows the execution time, while the -axis represents the number of data streams being varied from 100 to 1000.The other parameters are fixed as follows: () = 300 s,  = 4.Note that as the number of streams increases, the execution time of these methods increases.However, the time of the STClu and NMF increases much more slowly than others.This is because of the execution efficiency of STClu and NMF for clustering multiple data streams in the approximate subspace.Although STClu requires more time than NMF, as can be seen from the experiments, it is able to process extremely large sensor data streams with spatial, temporal, and other multidimensional information.

Real World Data
Sets.The real world data sets were obtained from the PeMS (http://pems.dot.ca.gov/) database.
With sensor devices installed on road networks, a system monitors traffic flows on major US highways 24 hours a day, 7 days a week, thus acquiring huge volumes of data.We collected data from 963 sensors positioned along the freeway.We selected 15 months of daily data records giving the occupancy rates (between 0 and 1) of different car lanes on freeways in the San Francisco bay area over time.The data set covered records sampled every 10 min from 1 January 2008 to 30 March 2009.Data for public holidays were removed from the data set, as well as datva collected on two days with anomalies (8 and 9 March 2008), where all sensors were muted between 2:00 and 3:00 am.
First, we discuss the experiments on the PeMS data set with the number of clusters varying from 2 to 20.For each given cluster number , the clustering results of all these algorithms on the PeMS data set are shown in Table 2; Ncut, NMF, and the proposed STClu algorithm achieve better performance than the -means algorithm, since these three algorithms consider the geometrical structure information contained in the data.
Next, we discuss the experiments on the PeMS data set with different time steps.For each time step, the clustering results of all these algorithms on the PeMS data set are shown in Table 3.We observe that the STClu algorithm performs better than the other three algorithms for most time steps, except  = 1.The reason for this is that STClu integrates more historical knowledge and the spatial relationship between two sensors in proximity.
Finally, we report the experimental results on PeMS data set over a different number of time steps with a varying number of clusters.The clustering results of all these algorithms on the PeMS data set are shown in Table 4. STClu outperforms the other three algorithms in terms of both ACC and NMI, because it simultaneously considers the geometric structure and incorporates the prior knowledge and the spatial correlation between sensors in proximity.

Conclusions
Clustering multiple sensor data streams in CPS has been extensively studied in various applications, including transportation systems, sensor networks, and social networks.To extract and retain meaningful information from multiple sensor data streams, we assume that the spatial feature is the summary of the atypical event in temporal dimension, and the temporal feature is the summary of the event in spatial dimension.In this work, we have proposed a spatial-temporal clustering algorithm, called STClu.It utilizes time-series observational data streams and geospatial relationship for clustering multiple sensor data streams.It aims to discover clusters of objects with similar behavior and discover potential anomalies which may be revealed by the relationship evolving over time.STClu simultaneously incorporates the spatial relationship between two sensors in proximity and integrates the historical information into

Figure 1 :
Figure 1: Time-series of upstream and downstream traffic flow.

Figure 3 :Figure 4 :
Figure 3: The performance of the proposed algorithm varying with the size of (a) clusters, (b) data streams, respectively.

Table 1 :
Comparison of ACC and NMI results for the four methods based on synthetic data sets.

Table 2 :
Comparison of the ACC and NMI results for the four methods on PeMS with the number of clusters varying from 2 to 20.

Table 3 :
Comparison of the ACC and NMI results for the four methods on PeMS with different time steps.

Table 4 :
Comparison of the ACC and NMI results for the four methods on PeMS with the number of clusters varying from 2 to 20 at different time steps.Experimental results on synthetic and real world data sets show that STClu achieves a better performance for clustering multiple sensor data streams evolving over time.