
With the increasing depth and complexity of convolutional neural networks, parameter dimensionality and computational volume have greatly restricted their applications. Based on the SqueezeNet network structure, this study introduces grouped convolution and uses channel shuffle between groups to alleviate the resulting information blockage. The method aims to reduce the parameter dimensionality of the original network structure and improve the efficiency of network operation. Verification on the ORL dataset shows that classification accuracy and convergence efficiency are not reduced, and are even slightly improved, when the network parameters are reduced, which supports the validity of grouped convolution for structural lightweighting. Moreover, on the classic CIFAR-10 dataset, this network decreases parameter dimensionality while accelerating computation, with excellent convergence stability and efficiency, while network accuracy is reduced by only 1.3%.

In the 5G era, with the development of emerging technologies such as the Internet of Things and big data, related applications on smart terminals are becoming increasingly widespread. As a support for these intelligent applications, brain-computer interface (BCI) technology plays an essential role in intelligent identification, classification, and computing. Our work mainly focuses on the intelligent recognition of images and videos, an indispensable intelligent application in daily life.

Since the publication of the 2006 Hinton research [

The development of ANNs can be traced back to the 1940s, and their development is roughly divided into three stages. The first stage, from 1947 to 1969, saw the proposal of neuron models and learning rules, such as the perceptron, the Hebb learning rule, and the binary neuron (MP) model. The second stage is the HNN neural network model introduced by Professor Hopfield in 1982, which introduced the concept of an energy function. The third stage is the classic back-propagation algorithm proposed by Professor Rumelhart in 1986, now known as the BP algorithm [

Traditional ANN model.

One of the benefits of deep learning frameworks in image recognition is that they do not need traditional classification algorithms, which require a great deal of manual processing of image features.

It is an adaptive algorithm: through multilayer convolution and nonlinear activation functions, the algorithm classifies and regresses all image features through an MLP [

Many classic deep learning network architectures have been developed based on the ILSVRC platform, such as AlexNet [

Two lines of work have been proposed to make deep learning networks applicable to daily life. One is to improve hardware and the computing power of mobile terminals. The other is to improve network structures, with the goal of minimizing the training time and the amount of data required without affecting accuracy. Hardware develops relatively slowly, and its update cycle lags far behind the pace required by the evolution of network structures. Therefore, reducing the parameter count and computational complexity of traditional network frameworks has attracted the most research interest in deep learning.

Since the discovery of electrodes that can be used to collect EEG signals from the subcortex in the 1930s, research on EEG signals has provided experimental tools to decode neural substrates that are associated with thoughts and feelings of study subjects. With the rapid development of pattern recognition algorithms, ANNs, and deep learning frameworks, research on brain-computer interface (BCI) systems is in full swing.

BCI system-evoked potential collection methods include nonimplantable electroencephalogram (EEG) [

The five-layer AlexNet and its optimized successors such as ZF, VGG, GoogLeNet, ResNet, and DenseNet show gradually improving performance, but their parameter counts also keep increasing. See Figure

Comparison of famous model quality.

UC Berkeley proposed the SqueezeNet convolutional network model in 2016. This model can compress network structures of tens or hundreds of megabytes down to about 4.6 megabytes without affecting accuracy. This paper proposes three improvement strategies for SqueezeNet's core module, the Fire Module. The first strategy is to improve on the dense

In order to research the lightweighting of CNNs in BCI, we first review deep residual neural networks, a new structure introduced by Professor Kaiming He in 2015 [

As shown in Figure

Composition of Fire Module

The comparison between grouped convolution and regular convolution is shown in Figure

Regular convolution and grouped convolution.

Regular convolution

Grouped convolution

It is clear from Table

Comparison of parameters and calculations of traditional convolution and grouping convolution.

 | Parameters | Calculations
---|---|---
Regular convolution | |
Grouping convolution | |
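The parameter formulas referenced in the table did not survive extraction, so the comparison can be sketched with a small helper (`conv_params` is an illustrative function, not from the paper): a grouped convolution splits both input and output channels into `groups` independent parts, dividing the weight count by the number of groups.

```python
def conv_params(k, c_in, c_out, groups=1):
    """Weight count of a k x k convolution; grouped convolution runs
    `groups` independent convolutions on channel slices of size
    c_in/groups -> c_out/groups."""
    assert c_in % groups == 0 and c_out % groups == 0
    return groups * (k * k * (c_in // groups) * (c_out // groups))

# A 3x3 layer mapping 64 -> 64 channels:
regular = conv_params(3, 64, 64)            # 36,864 weights
grouped = conv_params(3, 64, 64, groups=4)  # 9,216 weights, 1/4 of regular
```

The same factor-of-`groups` saving applies to multiply-accumulate operations, since each output position touches only its own group's input channels.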

Because the input is a whole, the output of a regular convolution is mapped from the whole input. In grouped convolution, each group is trained independently on its own channels, which is equivalent to dividing the overall input into many independent parts for convolution. The independent operation of each group therefore prevents information from flowing between group channels. To strengthen the information exchange between groups, this article adds a channel shuffle operation on this basis.

As shown in Figure

The principle of group convolution. (a) The channels are grouped into 3 groups, and there is no communication between different groups of feature maps. (b) Reconstructed feature maps. (c) Channel shuffle after convolution
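The shuffle in panel (c) can be implemented with a reshape-transpose-reshape, as in ShuffleNet; a minimal numpy sketch (function name illustrative):

```python
import numpy as np

def channel_shuffle(x, groups):
    """Interleave channels across groups: split C into (groups, C/groups),
    transpose the two group axes, and flatten back to C channels."""
    h, w, c = x.shape
    assert c % groups == 0
    x = x.reshape(h, w, groups, c // groups)
    x = x.transpose(0, 1, 3, 2)
    return x.reshape(h, w, c)

# 6 channels in 3 groups: [0,1 | 2,3 | 4,5] -> [0,2,4 | 1,3,5]
x = np.arange(6).reshape(1, 1, 6)
print(channel_shuffle(x, 3).ravel())  # [0 2 4 1 3 5]
```

After the shuffle, the next grouped convolution sees channels drawn from every previous group, which restores inter-group information flow at negligible cost.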

SqueezeNet uses a conventional SoftMax classifier, and the SoftMax function is a finite discrete probability distribution function. The SoftMax function, as shown in

For probabilistic multiclassification problems, it is simple and effective. However, human face images are highly similar and their distinguishing features are not apparent, so the intra-class distance of their features is often substantial and may even be larger than the inter-class distance. This results in a lower recognition rate on complex face pictures. The Center loss function is shown in

The mixed loss function is shown in

In Equation (
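A numpy sketch of the mixed loss, following the standard SoftMax-plus-center-loss formulation (the function name and the weight `lam` are illustrative stand-ins for the symbols in the paper's equations):

```python
import numpy as np

def softmax_center_loss(logits, features, labels, centers, lam=0.5):
    """Mixed loss: softmax cross-entropy plus a lam-weighted center loss.
    `centers[k]` is the learned feature center of class k."""
    # Softmax cross-entropy, numerically stabilised
    z = logits - logits.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -log_p[np.arange(len(labels)), labels].mean()
    # Center loss: half the mean squared distance to the class center,
    # which penalises large intra-class spread
    diff = features - centers[labels]
    center = 0.5 * (diff ** 2).sum(axis=1).mean()
    return ce + lam * center
```

The cross-entropy term keeps classes separable, while the center term pulls each sample's feature toward its class center, shrinking the intra-class distance that plain SoftMax leaves uncontrolled.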

The Fire Module improved based on the above method is shown in Figure

Improved NVM.

According to the hyperparameters of the previous module, the grouping number

The dimensions of the input and output of the entire model are the same. This article follows the structure of the SqueezeNet model by linking the improved NVMs, adding a pooling layer and two NVMs to the structure, and appending the SoftMax-Center Loss function at the end. During training, the SoftMax-Center Loss function maps the model's output values to the [0, 1] interval while the overall output still sums to 1; this is the probability value of the classification result needed in this paper. Table

NVMNet structure table.

Layer name | Output size (number of parameters) | | |
---|---|---|---|
Input | | | |
Convolution layer | | | |
Maximum pooling | | | |
NVM2 | 16 | 16 | 64 |
NVM3 | 16 | 16 | 64 |
Maximum pooling 3 | | | |
NVM | 32 | 32 | 128 |
NVM | 32 | 32 | 128 |
Maximum pooling 5 | | | |
NVM6 | 48 | 48 | 192 |
NVM7 | 48 | 48 | 192 |
NVM8 | 64 | 64 | 256 |
Maximum pooling 9 | | | |
NVM11 | 64 | 64 | 256 |
NVM12 | 64 | 64 | 256 |
NVM13 | 64 | 64 | 256 |
Average pooling | | | |
Full connection | Classification number | | |
SoftMax-Center Loss | Classification number | | |


As can be seen from Figure

Structure diagram of shortcut connect.

It is not difficult to understand why ResNet is superior to other CNNs in terms of convergence speed and classification accuracy. First, in each residual module, the shortcut connection relieves the convolutional layers of learning the full mapping: they only need to fit the residual, which eases optimization. Secondly, it guarantees efficient transfer of gradients.
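The residual idea can be sketched in a few lines (the helper and the toy layers are illustrative, not the paper's implementation): the stacked layers compute F(x), and the block outputs F(x) + x, so the identity path carries both activations and gradients through unchanged.

```python
import numpy as np

def residual_block(x, weight_layers):
    """y = F(x) + x: the stacked layers only learn the residual F(x),
    while the shortcut passes the input straight to the output."""
    fx = x
    for f in weight_layers:
        fx = f(fx)
    return fx + x

# If the residual branch outputs zero, the block is an exact identity map,
# so extra depth cannot make the network worse than its shallower version.
x = np.array([1.0, 2.0, 3.0])
assert np.allclose(residual_block(x, [lambda v: v * 0.0]), x)
```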

Figure

Schematic diagram of bottleneck structure.
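The parameter saving of the bottleneck design can be illustrated with the common ResNet channel sizes (256 reduced to 64 and expanded back; these concrete numbers are an assumption for illustration, not values from the paper):

```python
def conv_params(k, c_in, c_out):
    """Weight count of a k x k convolution layer."""
    return k * k * c_in * c_out

# Plain design: two 3x3 convolutions at 256 channels
plain = 2 * conv_params(3, 256, 256)              # 1,179,648 weights
# Bottleneck: 1x1 reduce to 64, 3x3 at 64, 1x1 expand back to 256
bottleneck = (conv_params(1, 256, 64)
              + conv_params(3, 64, 64)
              + conv_params(1, 64, 256))          # 69,632 weights
```

The expensive 3x3 convolution thus operates only on the reduced 64-channel representation, cutting the weight count by more than an order of magnitude.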

In the ILSVRC 2017 computer vision competition, SENet won the classification championship [

Schematic diagram of the structure and composition of the Squeeze-and-Excitation module.

SE-inception module

SE-ResNet module
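The SE computation can be sketched in numpy (function name and random weights are illustrative; a real block learns `w1`/`w2` with a reduction ratio r):

```python
import numpy as np

def se_block(x, w1, w2):
    """Squeeze-and-Excitation: global-average-pool each channel,
    pass through a reduce/expand pair of FC layers, and rescale
    the input channels by the resulting sigmoid gates."""
    s = x.mean(axis=(0, 1))                   # squeeze: one scalar per channel
    e = np.maximum(s @ w1, 0.0)               # excitation: FC reduce + ReLU
    gate = 1.0 / (1.0 + np.exp(-(e @ w2)))    # FC expand + sigmoid, in (0, 1)
    return x * gate                           # channel-wise reweighting

# 4 channels squeezed to 2 (reduction ratio r = 2); weights here are random
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))
y = se_block(x, rng.standard_normal((4, 2)), rng.standard_normal((2, 4)))
assert y.shape == x.shape
```

The block leaves spatial structure untouched and only modulates channel importance, which is why it can be dropped into Inception or ResNet branches as in the figure.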


According to the previous article, we improved the structure in the global pooling layer based on the Squeeze-and-Excitation module, grew the

According to the introduction of the CNN pooling layer, the sliding window size of the general pooling layer is fixed. Based on this, RMAC pooling introduces variable sliding windows to pool features. As shown in Figure

Schematic diagram of RMAC pooling process.
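A simplified sketch of the regional pooling idea (the regular grid of regions here is an assumption for illustration; the published RMAC sampling uses overlapping windows):

```python
import numpy as np

def rmac_pool(x, scales=(1, 2)):
    """Slide square regions of several sizes over the feature map,
    take the channel-wise max inside each region, and sum the
    resulting region descriptors into one vector."""
    h, w, c = x.shape
    desc = np.zeros(c)
    for s in scales:
        rh, rw = h // s, w // s          # an s x s grid of regions
        for i in range(s):
            for j in range(s):
                region = x[i * rh:(i + 1) * rh, j * rw:(j + 1) * rw]
                desc += region.max(axis=(0, 1))
    return desc

# With a single region covering the whole map, this reduces to global max pooling
x = np.arange(2 * 2 * 3, dtype=float).reshape(2, 2, 3)
assert np.allclose(rmac_pool(x, scales=(1,)), x.max(axis=(0, 1)))
```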

In each dataset, some categories are easy for the system to recognize, while others are difficult. Moreover, the number of samples in each category differs and is prone to uneven proportions, so a CNN classifier suited to this situation must be chosen.

When a class has a large number of samples, the classifier can generally distinguish that class well. Conversely, when the number of samples is small, the performance of the classifier is not as good. The conventional cross-entropy loss function makes no distinction between categories with a larger proportion of samples and those with fewer. This wastes network resources, because the system repeatedly learns the samples it already discriminates well instead of focusing on training the samples it discriminates poorly.

The focus loss function expression is shown in

In Equation (
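A binary sketch following the standard focal loss formulation FL(p_t) = -α(1 - p_t)^γ log(p_t); the default values α = 0.25 and γ = 2 are the common choices, assumed here since the paper's exact settings are in its numbered equation:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss for binary labels y with predicted probabilities p.
    gamma down-weights well-classified samples; alpha balances classes."""
    p_t = np.where(y == 1, p, 1.0 - p)        # probability of the true class
    return -(alpha * (1.0 - p_t) ** gamma * np.log(p_t)).mean()

# A confident, correct prediction contributes almost nothing...
easy = focal_loss(np.array([0.95]), np.array([1]))
# ...while a confident, wrong prediction is barely down-weighted.
hard = focal_loss(np.array([0.05]), np.array([1]))
assert easy < hard
```

With γ = 0 and α = 1 the expression reduces to plain cross-entropy, so the modulating factor (1 - p_t)^γ is exactly the mechanism that shifts training effort onto the poorly discriminated samples described above.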

In the optimization of convolution, we use the idea of grouped convolution to replace all

The improved Se module is shown in Figure

Schematic diagram of the improved SE module.

Part I

Part II

When the input of the upper layer enters the RSE module, it is divided into two parts. One part uses the shortcut-connection branch, which takes the input of the previous layer directly as the output. The other branch uses the bottleneck structure to reduce the dimensionality of the previous layer's input and then feeds it into convolution kernels of different sizes for feature extraction. The RSE module uses a

Parameters of each level of the R-SeNet network structure.

Layer | Input size (number of parameters) | Kernel size (number of parameters) | Stride
---|---|---|---
Input size | N/A | N/A |
Convolution layer 1 | | | 2
Pooling layer 1 | | | 2
RSE.1 | | | 1
RSE.2 | | | 1
RSE.3 | | | 2
RSE.4 | | | 1
RSE.5 | | | 2
RSE.6 | | | 1
RSE.7 | | | 2
RSE.8 | | | 1
Pooling layer | N/A | Global |

The ORL face dataset [19] was created in 1994 by the Olivetti Laboratory at the University of Cambridge, UK. The dataset contains 40 directories, each storing 10 face images of one subject with different expressions. As shown in Figure

ORL dataset.

On SqueezeNet, the SoftMax classifier at the end was likewise replaced with the combined SoftMax-Center Loss for supervision. On the ORL dataset, this paper sets the number of groups in the NVM grouped convolutions to 4.

As shown in Table

Comparison of parameter quantity and accuracy of the improved network.

Structure | Parameter quantity (M) | Accuracy (batch normalization) | Accuracy (no batch normalization)
---|---|---|---
SqueezeNet | 4.81 | N/A | 0.7125 ± 0.0004
NVM | 3.27 | 0.7082 ± 0.0002 | 0.7016 ± 0.0013

Also on the ORL face dataset, powers of 2 evenly divide the channel dimension of the image input, so grouped convolutions with 2, 4, 8, and 16 groups are used for training and testing, respectively. It can be seen from Table

Influence of different numbers of groups on parameter quantity and accuracy.

Number of groups | Parameter quantity (M) | Accuracy
---|---|---
2 | 3.31 |
4 | 3.12 |
8 | 3.01 |
16 | 2.88 |

This experiment trained three different network structures for 40 epochs each on the ORL dataset. As shown in Figure

The training process of three different network structures on the ORL dataset.

Comparison of different network structures.

Structure | Parameter quantity (M) | Amount of computation
---|---|---
SqueezeNet | 1.24 | 0.70
AlexNet | 61.18 | 0.73
ResNet | 11.69 | 3.49
ShuffleNet | 1.32 | 0.32
Vgg-16 | 138 | 72
NVMNet | 3.21 | 0.3

As shown in Table

Compared with SqueezeNet on the ORL dataset, the classification accuracy of NVMNet decreases by 0.7%, while the parameter amount is reduced by 33%. Therefore, NVMNet has certain value as a lightweight model.

The CIFAR-10 [20] dataset contains 60,000 color images with a resolution of 32 × 32.

Overview of 10 categories and their respective pictures on the CIFAR-10 dataset.

We use the classic CIFAR-10 dataset for efficiency and parameter comparison of our CNN model (training on a Mac with an Intel Xeon processor, a Radeon Pro 580X graphics card, and 32 GB of memory, based on the TensorFlow deep learning framework). According to our hardware conditions and the requirements of the network structure, the learning rate is 0.01, the optimization strategy of the CNN is stochastic gradient descent (SGD), the learning rate decay factor is 0.1, and the maximum number of iterations is 400,000.

As shown in Table

Weight and resolution of CNN model on CIFAR-10 classic data.

CNN model | Network model weight (MB) | Network model compression ratio (%) | Network model accuracy (%) | Network model error distribution (%)
---|---|---|---|---
ResNet-50 | 98.1 | N/A | 95.1 | N/A
ResNet-50@2.5 | 16.5 | 16.1% | 93.1 | -2.0
ResNet-50@.5 | 7.2 | 7.2% | 93.1 | -2.0
R-SeNet | 9.6 | 10.1% | 93.3 | -1.8

We compare the convergence of R-SeNet with other popular lightweight CNN networks, such as MobileNet, ShuffleNet, and SqueezeNet. The comparative convergence curves are shown in Figure

Comparison of training convergence of SqueezeNet, ResNet, ShuffleNet, and MobileNet.

Based on the comparison of different lightweight CNN structures, we compare the improved parts of ResNet separately to observe the impact of each method on the performance of the network structure. As shown in Table

The influence of each compression technique, applied separately, on the accuracy of the CNN.

R-SeNet | | | |
---|---|---|---
Grouped convolution | √ | √ | √
RMAC pooling | √ | √ |
Focus loss function | √ | |
CNN model accuracy (%) | | |

Berkeley and Stanford proposed SqueezeNet to reduce parameter dimensionality relative to the AlexNet and VGGNet models. The

The BCI system based on convolutional neural networks comprises functional modules including a visual triggering device, an EEG acquisition device, an EEG preprocessing module, a classifier based on a convolutional neural network, and a classification result display. There are successive dependencies among the modules: the system first receives EEG signal data from the EEG collector, filters and normalizes it in the EEG preprocessing module, and then uses the data as the input of the EEG signal classifier. After these EEG data are recognized and classified by the classifier, the recognition results are displayed in the result output module of the system. Building on previous work, this paper makes lightweight improvements to the Fire Module of the SqueezeNet convolutional network structure and introduces batch normalization and the SoftMax-Center Loss classifier to improve the recognition accuracy and efficiency of the network structure on face images. While refining the overall structure of the network, the classification effect on the ORL dataset has also improved. However, because the ORL dataset has relatively few training samples, data samples can be added for further verification in future experiments. The lightweight model structure has broad application scenarios; in the future, we plan to explore its feasibility in vision fields other than human faces, including applications in the BCI and BMI fields.

If necessary, the relevant experimental data can be obtained from the authors of this article.

The authors declare that there is no conflict of interest regarding the publication of this paper.

This work is supported in part by the Shanghai Academy of Agricultural Sciences for the Program of Excellent Research Team (2017[B-09]). The authors thank the generous support from the Tongji University, particularly the Department of Software Engineering.