Improved Convolutional Neural Network for Chinese Sentiment Analysis in Fog Computing

Fog computing extends the concept of cloud computing to the edge of the network to relieve performance bottlenecks and minimize data analytics latency at the central server of a cloud. It uses edge nodes directly to perform data input and data analysis. In a public opinion analysis system, edge nodes that collect opinions from users are responsible for some data filtering jobs, including sentiment analysis. Therefore, it is crucial to find a suitable algorithm that is lightweight in operation and accurate in predictive performance. In this paper, we focus on the Chinese sentiment analysis job in a fog computing environment and propose a non-task-specific method called Channel Transformation Based Convolutional Neural Network (CTBCNN) for Chinese sentiment classification, which uses a new structure called the channel transformation based (CTB) convolutional layer to enhance the ability of automatic feature extraction and applies a global average pooling layer to prevent overfitting. Through experiments and analysis, we show that our method achieves competitive accuracy and that it is convenient to apply to different cases in operation.


Introduction
The concept of fog computing [1,2] extends cloud computing by moving a substantial amount of data analysis to edge nodes. These edge nodes are densely deployed geographically and are close to the original data input. The main purpose of fog computing is to relieve network traffic load and reduce data calculation latency at the cloud. With the explosive growth of data from the Internet, fog computing is becoming more and more important in the Internet of Things (IoT) [3], intrusion detection, and many other fields. In a conceptual framework of a public opinion analysis system, edge nodes are responsible for some data filtering jobs, including sentiment analysis. Therefore, operations on these edge nodes are supposed to be lightweight and accurate in predictive performance.
Sentiment analysis is an important task in many real-world applications and has been studied extensively. Most work on sentiment analysis falls into two categories: unsupervised methods based on sentiment lexicons and supervised machine learning methods. The first strategy identifies the polarity of text using sentiment lexicons.
Reference [4] implemented sentiment value calculation and classification of microblog texts with an extensible sentiment dictionary. Reference [5] combined a basic emotion value lexicon and a social evidence lexicon to improve the traditional polarity lexicon, thus achieving significant improvement in Chinese text sentiment analysis. However, there are various ways to express the same opinion in Chinese, so it is impossible for any lexicon to cover all sentiment words or phrases. Furthermore, the same word may have different sentimental impacts in different fields, which means a lexicon is usually task-specific. Supervised machine learning methods, in contrast, use traditional text classification methods such as Naive Bayes (NB) and Support Vector Machine (SVM). Pang et al. applied several machine learning techniques to the sentiment classification problem in [6], using movie reviews as data, and the results showed that SVM outperforms the other methods. Feature extraction methods based on TF-IDF, Mutual Information, and Chi-Square [7] are commonly used for machine learning methods. However, classification performance varies a great deal with different feature selection methods [8]. Meanwhile, feature engineering is also task-specific. As far as we know, most sentiment analysis methods are implemented on cloud servers, so it is crucial to find a suitable algorithm that works in a fog computing environment. Recently, more and more researchers have applied deep neural networks to sentiment analysis [9][10][11], and the release of TensorFlow Lite makes it possible to run deep learning models on portable equipment.
In this work, we focus on Chinese sentiment analysis and aim to present a lightweight, non-task-specific method with high portability in a fog computing environment, without manual feature engineering. We propose a method called Channel Transformation Based Convolutional Neural Network (CTBCNN), which is an improved Convolutional Neural Network (CNN) model. Firstly, we take sentences as input, obtain word vectors with the skip-gram model [12,13], and keep them static. Then we utilize a new structure called the channel transformation based (CTB) convolutional layer, which enhances the capability of automatic feature extraction by considering the channel information in the output of the previous convolutional layer. Our method replaces the fully connected layer with a global average pooling layer to prevent overfitting. Global average pooling was shown to be effective in computer vision tasks by Lin et al. [14]. Inspired by that work, we produce the outputs with a global average pooling layer instead of a fully connected layer.
The main contributions of this work are as follows. Firstly, we present an effective method that is suitable for a fog computing environment. The method can be applied to different cases to handle different data conveniently, without much human intervention. Secondly, we put forward the idea of channel transformation for convolutional layers, which is able to cover more information and extract more representative features. Thirdly, our work demonstrates the regularization effect of the global average pooling layer in a natural language processing task.
This work is organized as follows. Section 2 introduces the overall model of CTBCNN. Section 3 shows the structure of the stacked CTB convolutional layers. Section 4 presents the implementation of the global average pooling layer. Section 5 shows the experimental results. We conclude this work in Section 6.

Channel Transformation Based Convolutional Neural Network
We first present the overall structure of the CTBCNN model, which is shown in Figure 1. Compared with the classic CNN [15], CTBCNN has two main differences. Firstly, CTBCNN has three CTB convolutional layers. Channel transformation is a matrix transformation trick that allows us to implement convolution not only on the height×width plane but also on the height×depth plane and the depth×width plane. This kind of convolution enhances the capability of feature extraction by covering more information in the channel depth dimension. The second difference is that we replace the fully connected layer with a global average pooling layer, because a fully connected layer is a kind of dense connection that is prone to overfitting. CTBCNN takes a sentence vector as input, extracts feature maps from each input sentence with the three CTB convolutional layers, and then applies global average pooling over these feature maps to output the sentiment classification result.
In the following two sections, we present the structure of the three channel transformation based convolutional layers and the global average pooling layer in detail.

Channel Transformation Based Convolutional Layer
In this section, we first define the shape of a matrix or a vector. In the second part we introduce the conventional convolutional layer and the convolution process. The third part presents the CTB convolutional layer on the basis of the conventional convolutional layer. Finally, we stack CTB convolutional layers and give the structure of the three CTB convolutional layers.

Definition of Shape.
To make the description easier to follow, we define the shape of a matrix or vector using three dimensions: height, width, and depth. The input and output of a convolutional layer can then be represented intuitively by their shapes. For example, if we have an image whose size is h×w and which has 3 RGB channels, then the shape of the image can be represented as (h, w, 3).

Conventional Convolutional Layer.
The input is a sentence vector that can be represented as
$$x_{1:n} = x_1 \oplus x_2 \oplus \cdots \oplus x_n,$$
where $x_i \in \mathbb{R}^k$ is the $k$-dimensional word vector of the $i$-th word in a sentence that consists of $n$ words and $\oplus$ denotes the concatenation operation. Thus the input sentence vector is similar to an "image" whose shape is $(n, k, 1)$.
Generally, in the convolution process, a filter $W \in \mathbb{R}^{h_c \times k}$, acting like a window over $h_c$ words, is applied to generate a new feature $c_i$:
$$c_i = f(W \cdot x_{i:i+h_c-1} + b),$$
where $b$ is a bias term and $f$ is an activation function. In this work we use ReLU as the activation function. From [16] we know that the ReLU function leads to faster convergence of gradient descent. As a result of the convolution, a feature map $c$ is generated, where
$$c = [c_1, c_2, \ldots, c_{n-h_c+1}], \quad c \in \mathbb{R}^{(n-h_c+1) \times 1}.$$
Figure 2 gives an example of conventional convolution. One feature is extracted from one filter. Assume the number of filters is $d$; then the shape of the convolutional output is $(n - h_c + 1, 1, d)$. Intuitively, we can see that the width of the output is compressed to 1, which means information loss.
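The convolution described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `conv_sentence`, the toy sizes, and the random inputs are assumptions made for the example and are not taken from the paper's implementation.

```python
import numpy as np

def relu(z):
    """ReLU activation, applied element-wise."""
    return np.maximum(z, 0.0)

def conv_sentence(x, filters, bias):
    """Stride-1 convolution without padding, using full-width filters.

    x:       (n, k) sentence matrix, one k-dimensional word vector per row
    filters: (d, h, k) array holding d filters that each span h words
    bias:    (d,) one bias term per filter
    returns: (n - h + 1, 1, d) array in (height, width, depth) order
    """
    n, k = x.shape
    d, h, _ = filters.shape
    out = np.empty((n - h + 1, 1, d))
    for i in range(n - h + 1):
        window = x[i:i + h]  # words i .. i+h-1
        # Sum of the element-wise product of each filter with the window.
        response = np.tensordot(filters, window, axes=([1, 2], [0, 1]))
        out[i, 0, :] = relu(response + bias)
    return out

# Toy sizes chosen for illustration only.
rng = np.random.default_rng(0)
n, k, h, d = 7, 5, 3, 4
x = rng.normal(size=(n, k))
W = rng.normal(size=(d, h, k))
b = np.zeros(d)
features = conv_sentence(x, W, b)
print(features.shape)  # (5, 1, 4)
```

Because each filter is as wide as the word vector, the width axis of the output always has length 1, which is the compression the text refers to.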

Channel Transformation Based Convolutional Layer.
To reduce information loss and take full advantage of information in the depth dimension, namely, the channel information, we propose a method called channel transformation. Channel transformation is a reshape operation on a vector. For example, the shape of the convolution output is $(n - h_c + 1, 1, d)$, which can be transformed to $(n - h_c + 1, d, 1)$ by switching the width and depth dimensions. Channel transformation provides two benefits. One is that the output of the previous convolutional layer retains an "image" shape, so that we can stack multiple convolutional layers. The other is that we can make good use of information in the depth dimension and extract feature maps with more sentimental semantics.
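As a minimal sketch of this operation, the snippet below swaps two axes of a (height, width, depth) array with NumPy. The helper name `channel_transform` and the toy shape are assumptions made for the example; the swap is implemented here as an axis transposition, which coincides with a plain reshape in this case because the width axis has length 1.

```python
import numpy as np

def channel_transform(t, axis_a, axis_b):
    """Swap two of the (height, width, depth) axes of a 3-D array."""
    return np.swapaxes(t, axis_a, axis_b)

# A toy convolution output with shape (height=5, width=1, depth=4).
conv_out = np.arange(5 * 1 * 4, dtype=float).reshape(5, 1, 4)

# Switch width (axis 1) and depth (axis 2) to obtain an "image" again.
image_like = channel_transform(conv_out, 1, 2)
print(image_like.shape)  # (5, 4, 1)
```

After the swap, the responses of the different filters lie side by side along the width axis, so a following convolutional layer can slide over them as it would over the columns of an image.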

The Three CTB Convolutional Layers.
In the CTBCNN model, we have three CTB convolutional layers. Figure 3 shows the structure of the three channel transformation based convolutional layers.
In the first convolutional layer (conv1), the shape of the input vector is $(n, k, 1)$. Similar to Kim's work, we use 3 different sizes of filters, $W_{1j} \in \mathbb{R}^{h_{1j} \times k}$ for $j = 1, 2, 3$, where $h_{11} = 3$. Each size has $n_1$ filters and extracts $n_1$ feature maps. For filter $W_{1j}$, the shape of the convolution output is $(n - h_{1j} + 1, 1, n_1)$. Channel transformation is then used to switch the width and depth dimensions, so that the new shape $(n - h_{1j} + 1, n_1, 1)$ serves as input for the next convolutional layer.
In the second convolutional layer (conv2), we again use 3 different filter sizes: filters $W_{2j}$ with height $h_{2j} = n - h_{1j} + 1$, for example $h_{21} = n - h_{11} + 1$ and $h_{22} = n - h_{12} + 1$. Each size has $n_2$ filters and extracts $n_2$ feature maps. For filter $W_{2j}$, the shape of the convolution output is $(1, n_1, n_2)$. Channel transformation is used to switch the height and depth dimensions to get the new shape $(n_2, n_1, 1)$. Finally, we concatenate the outputs from the 3 different filter sizes along the depth dimension, so the shape of the output becomes $(n_2, n_1, 3)$, which can be seen as an image of size $n_2 \times n_1$ with 3 channels.
In the third convolutional layer (conv3), we use filters $W_3 \in \mathbb{R}^{h_3 \times h_3}$ to implement wide convolution, with $n_3$ filters extracting $n_3$ feature maps. Note that the number of feature maps $n_3$ needs to equal the number of output labels of the whole model. These feature maps contain the sentimental semantics of the input sentence and are sent to the global average pooling layer to capture the most important features.
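The shape bookkeeping across the three layers can be traced with a small pure-Python sketch. Several details here are assumptions rather than statements from the paper: the conv1 heights other than 3, the filter counts, and the conv3 kernel size are hypothetical; conv2 filters are assumed to have width 1, which is what the stated output shape implies for a stride-1 convolution without padding; and "wide convolution" is assumed to mean full padding, so that each spatial size grows by the kernel size minus one.

```python
def valid_size(size, kernel):
    """Output length of a stride-1 convolution without padding."""
    return size - kernel + 1

def wide_size(size, kernel):
    """Output length of a stride-1 wide (fully padded) convolution."""
    return size + kernel - 1

def trace_ctb_shapes(n, k, conv1_heights, n1, n2, n3, h3):
    """Trace (height, width, depth) shapes through the three CTB layers."""
    branches = []
    for h1 in conv1_heights:
        conv1 = (valid_size(n, h1), 1, n1)         # h1 x k filters
        reshape1 = (conv1[0], conv1[2], conv1[1])  # swap width and depth
        h2 = reshape1[0]                           # conv2 height = n - h1 + 1
        conv2 = (valid_size(reshape1[0], h2), reshape1[1], n2)
        reshape2 = (conv2[2], conv2[1], conv2[0])  # swap height and depth
        branches.append({"conv1": conv1, "reshape1": reshape1,
                         "conv2": conv2, "reshape2": reshape2})
    # Every branch ends at (n2, n1, 1), so they concatenate along depth.
    merged = (n2, n1, len(branches))
    conv3 = (wide_size(merged[0], h3), wide_size(merged[1], h3), n3)
    return branches, merged, conv3

# Hypothetical sizes: 20 words, 400-dimensional vectors, 2 output classes.
branches, merged, conv3 = trace_ctb_shapes(
    n=20, k=400, conv1_heights=(3, 4, 5), n1=8, n2=6, n3=2, h3=3)
print(branches[0])
print(merged, conv3)
```

The trace makes one property of the design visible: although the three conv1 branches produce outputs of different heights, choosing the conv2 filter height as $n - h_{1j} + 1$ collapses each of them to height 1, so all branches reach the same shape before concatenation.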

Global Average Pooling Layer
In the classic CNN, feature maps produced by convolutional layers are usually sent to a max pooling layer for downsampling and then concatenated into a long vector, which is fully connected to the output categories. The dense connection makes it hard to interpret how the category-level information from the objective cost layer feeds back to the convolutional layer. Furthermore, fully connected layers are prone to overfitting and depend heavily on dropout regularization.
For the reasons mentioned above, our method replaces the fully connected layer with a global average pooling layer. The global average pooling layer enforces direct correspondences between feature maps and categories. In this way the feature maps can be intuitively interpreted as category confidence maps. Furthermore, there is no parameter to optimize in the global average pooling layer; thus overfitting at this layer is avoided.
From the comparison in Figure 4 we can see that global average pooling layer takes the average value of each feature map which can be regarded as confidence value for each category, and the resulting vector is fed directly into the Softmax function to get the probability distribution among sentiment categories.
$$p(y \mid x, \theta) = \frac{\exp(s_y(x, \theta))}{\sum_{y' \in Y} \exp(s_{y'}(x, \theta))}, \tag{6}$$
where $\theta$ represents the model parameter set, $s_y(x, \theta)$ is the average pooling result of the feature map that corresponds to category $y$, and $Y$ is the category space. We use stochastic gradient descent to minimize the negative log-likelihood function of formula (6) and learn the parameter set.
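The pooling and Softmax steps can be illustrated with a short NumPy sketch. The function names and the constant-valued toy feature maps are assumptions made for the example; the sketch covers only the forward computation and the loss for a single sample, not the gradient descent training itself.

```python
import numpy as np

def gap_softmax(feature_maps):
    """Map (height, width, num_classes) feature maps to class probabilities."""
    scores = feature_maps.mean(axis=(0, 1))  # one average per feature map
    shifted = scores - scores.max()          # stabilise the exponentials
    exp = np.exp(shifted)
    return exp / exp.sum()

def negative_log_likelihood(probs, label):
    """Loss for one sample whose true category index is `label`."""
    return -np.log(probs[label])

# Two toy 4x5 feature maps, one per sentiment category.
maps = np.stack([np.full((4, 5), 2.0), np.full((4, 5), 0.0)], axis=-1)
probs = gap_softmax(maps)
loss = negative_log_likelihood(probs, 0)
print(probs, loss)
```

Note that the pooling step contains no trainable weights: the only learned quantities that influence the scores are the convolutional filters that produced the feature maps.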

Experiments
Overview.
We have two parts of experiments. The first performs Chinese sentiment classification tasks using CTBCNN and compares it with other typical machine learning methods and the classic textCNN [15]. The second experiment focuses on the regularization effect of the global average pooling layer of the CTBCNN model. Finally, we analyse the portability of our model and how it can be conveniently used in different cases.
All experiments are evaluated on two datasets: THU Hotel reviews (http://nlp.csai.tsinghua.edu.cn/~lj/) used in [17] and ChnSentiCorp-Book (http://www.nlpir.org/?actionviewnews-itemid-77) used in [18]. Table 1 shows the details of these two datasets after duplicate removal. Both datasets are roughly equally split into positive and negative reviews. Since there is no standard test set for these datasets, we performed 10-fold cross validation and used the average accuracy to measure the overall performance of each method.
The initial word vectors are 400-dimensional and were trained on 230,000 articles from Chinese Wikipedia. The first line of Table 2 gives the results of the lexicon-based baseline method used in [18,19]. Experimental results show that most machine learning methods surpass the baseline method on both datasets. It is worth noticing that book reviews contain more less frequently used words and
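The evaluation protocol can be sketched as follows. This is a generic illustration of 10-fold cross validation with averaged accuracy, not the authors' evaluation script: the shuffling with a fixed seed, the round-robin fold assignment, and the `train_and_predict` callback are assumptions made for the example.

```python
import random

def k_fold_indices(num_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold validation."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

def cross_validated_accuracy(samples, labels, train_and_predict, k=10):
    """Average test accuracy over the k folds."""
    accuracies = []
    for train, test in k_fold_indices(len(samples), k):
        predictions = train_and_predict(
            [samples[i] for i in train],
            [labels[i] for i in train],
            [samples[i] for i in test],
        )
        correct = sum(p == labels[i] for p, i in zip(predictions, test))
        accuracies.append(correct / len(test))
    return sum(accuracies) / k

# Hypothetical stand-in for a real model: predicts the parity of each sample.
def parity_model(train_x, train_y, test_x):
    return [s % 2 for s in test_x]

samples = list(range(40))
labels = [s % 2 for s in samples]
accuracy = cross_validated_accuracy(samples, labels, parity_model)
print(accuracy)
```

In practice `train_and_predict` would train a fresh model on the training folds each time, so that no information from the held-out fold leaks into training.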

The Regularization Effect of Global Average Pooling Layer.
To evaluate the regularization effect of the global average pooling layer, we set up a comparison experiment that replaces the global average pooling (GAP) layer of CTBCNN with a fully connected (FC) layer, while the other parts remain the same. We evaluate this model with and without dropout before the FC layer. All models are tested on the two datasets and the results are shown in Table 3.
As we can see in Table 3, for both datasets, the model with a fully connected layer and no dropout gives the worst performance, which is expected because the fully connected layer is prone to overfitting [20] when no regularizer is applied. The second model applies dropout before the fully connected layer and achieves better performance than the first one, while our model with the global average pooling layer achieves the highest classification accuracy, which indicates that global average pooling is an effective way to avoid overfitting.
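One way to make the difference between the two classification heads concrete is to count their trainable parameters. The sketch below does this for a hypothetical final feature-map shape; the shape, the function names, and the assumption that the FC layer acts on the flattened feature maps are illustrative and are not taken from the paper's configuration.

```python
def fc_head_params(height, width, depth, num_classes):
    """Weights and biases of a dense layer applied to flattened feature maps."""
    return height * width * depth * num_classes + num_classes

def gap_head_params():
    """Global average pooling only averages, so it has nothing to train."""
    return 0

feature_map_shape = (8, 10, 2)  # hypothetical (height, width, depth)
fc_params = fc_head_params(*feature_map_shape, num_classes=2)
print(fc_params, gap_head_params())  # 322 0
```

The FC head grows with the spatial size of the feature maps, while the GAP head stays parameter-free regardless of sentence length, which is consistent with the weaker tendency to overfit observed in the comparison.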

Portability of CTBCNN.
From the above discussion we know that CTBCNN avoids manual feature engineering by using CTB convolutional layers, which means CTBCNN is non-feature-specific and non-task-specific. This is helpful when different nodes handle different types of data in a distributed computing network. Besides, CTBCNN can be applied conveniently to multiclass classification tasks. All we need to do is make sure that the number of output feature maps of the last convolutional layer is equal to the number of categories.

Conclusion
Fog computing uses edge nodes to carry out a substantial amount of data analysis work. In a public opinion analysis system, it is crucial to find a suitable algorithm that is lightweight in operation and accurate in prediction. This work focuses on Chinese sentiment analysis in a fog computing environment and proposes a non-task-specific method called Channel Transformation Based Convolutional Neural Network (CTBCNN). CTBCNN mainly consists of two parts: the three CTB convolutional layers and the global average pooling layer. The CTB convolutional layer is able to cover more channel information and extract more representative feature maps, and global average pooling acts as a regularizer to prevent overfitting. Through experiments and analysis, we show that our model achieves competitive accuracy and that it is convenient to apply to different cases in operation.

Figure 1: The overall structure of Channel Transformation Based Convolutional Neural Network. The model consists of three channel transformation based convolutional layers and a global average pooling (GAP) layer. This paper focuses on binary classification, so the number of output categories is 2.

Figure 2: The convolution process with one input channel for an example sentence. Filter $W \in \mathbb{R}^{h_c \times k}$ extracts one feature $c \in \mathbb{R}^{(n-h_c+1) \times 1}$, and with $d$ filters we can extract $d$ features.

Figure 3: The structure of the three channel transformation based convolutional layers. The first reshape switches the width and depth dimensions of the convolutional output. The second reshape switches the height and depth dimensions of the convolutional output.

Figure 4: The comparison between the fully connected layer and the global average pooling layer. The global average pooling layer feeds each feature map to its corresponding category.

Table 1: Summary of the two datasets.

Table 2: The classification results on the two datasets.

Table 3: Global average pooling compared to the fully connected layer.