A Lightweight Binarized Convolutional Neural Network Model for Small Memory and Low-Cost Mobile Devices



Introduction
Deep neural networks (DNNs) are making remarkable progress every day and are involved in many application fields. Computer vision, natural language processing, and many other domains benefit from this progress, which opens doors to new solutions to hard problems. Convolutional neural networks (CNNs) are the most common method in use these days. CNNs solve various visual problems such as image classification, recognition, or detection. New CNN models are constantly proposed and improved, such as ResNeXt [1] and SK-Net [2]; however, their architecture has not changed much during the last decade. The main improvements were made possible by the availability of computational power: the use of GPU-based machines, together with the increase of the associated memory, allows CNNs to achieve outstanding performance. A natural question arose: is it possible to reach similar or better performance at a lower computational cost? That is to say, is it possible to have CNNs, or an equivalent, running on cheaper machines with less memory (typically mobile phones, for instance) and obtaining comparable results? This open problem has been addressed recently, and people have started to develop methods based on model compression and binarization. Yoshua Bengio, in his seminal work [3], introduced a method for training binarized neural networks (BNNs). In the training phase, binary weights and activations replace the real-valued ones in the gradient operations used by CNNs. This greatly reduces the memory size used and the access times to it, and replaces most arithmetic operations with bitwise operations, which fits exactly the initial quest of keeping the same effectiveness at a lower cost.
Our work takes inspiration from BNN [3]. We focus mostly on two points: (i) finding the conditions to reduce the number of trainable parameters and (ii) deriving the best preprocessing operation, while keeping the highest possible performance at the lowest possible cost. Moreover, we also show that the color information is not necessary and that the brightness channel of images is sufficient to reach performance similar to classical CNNs on the same tasks. We obtain good experimental results. This double compression method has three advantages, the first of which is to effectively reduce the amount of calculation in the training process and accelerate training.

Related Work
Information Loss. Unlike approaches that optimize the binarization process directly at the convolution layer, LAB2 [4] directly considers the binarization loss and applies a proximal Newton algorithm to the binary weights. CI-BCNN [5] learns reinforced graph models to mine channel-level interactions and corrects the popcount accordingly, reducing sign inconsistency in binary feature maps and retaining the information of the input samples. LNS [6] proposes to predict binary weights through learning with noisy supervision and training binary functions. ProxyBNN [7] uses basis and coordinate submatrices to form the weight matrix prior to binarization, while IR-Net [8], RBNN [9], IA-BNN [10], SLB [11], and BBG-Net [12] optimize, reshape, activate, and allocate weights for the binarization process.

Network Structures.
Existing work that binarizes classical networks suffers from several problems, such as excessive memory, many parameters, complex network structures, and relatively high application cost, because it inherits the structure of classical neural networks. Moreover, different binarized neural network architectures affect not only the performance of binarized convolutional neural networks but also the actual hardware deployment cost. Subsequent researchers have made a series of enhancements to BNN [3], such as BBG-Net [12] and Real-to-Bin [13], which aim to improve the accuracy of ResNet and other high-performance conventional networks. DMS [14] has effectively narrowed the precision gap with full-precision networks. BATS [15], BNAS [16], NASB [17], and High-Capacity-Expert [18] have proposed specialized NAS approaches for searching BNN architectures and compare their accuracy with binarized conventional networks such as ResNet. Meanwhile, High-Capacity-Expert [18] applies a conditional computation method called expert convolution in BNNs, combining group convolution with the above method. MoBiNet-Mid [19] and Binarized MobileNet [20] propose new, lighter BNN structures with better accuracy, taking MobileNet-V1 as a reference. MeliusNet [21] and ReActNet [22] design new BNN model structures with lower floating-point and binary operation (FLOPs/BOPs) costs, achieving better accuracy than the full-precision lightweight MobileNet. BNN-BN-free [23] incorporates the BN-free [24] concept and presents a method of constructing a network architecture without batch normalization, which is replaced by a scaling factor. FracBNN [25] extends the topology of ReActNet and reconstructs the network blocks. BCNN [26] designs a network structure specifically for the ImageNet classification task, and its model is more lightweight than MeliusNet and ReActNet. Binarizations based on classical networks sometimes waste computational resources when dealing with small-scale practical engineering applications; at the same time, such models need more memory and increase the application cost. Drawing inspiration from this previous research, we design a lightweight CBCNN model to address real-world hardware deployment constraints.

Training Strategy. The choice of training schemes and techniques also affects the best achievable accuracy of a neural network. Main/Subsidiary [27] proposes a method for pruning BNN filters. Bop [28] and UniQ [29] each propose a new optimizer for training BNNs. Referring to the lottery ticket hypothesis [30], MPT [31] designs a simpler scheme to learn high-precision BNNs by pruning and quantizing a full-precision CNN with random weights. Real-to-Bin [13] designs a two-step training strategy that uses transfer learning to train a BNN by learning from a real-valued pretrained network. By adopting this training strategy, highly accurate models such as ReActNet [22], High-Capacity-Expert [18], and BCNN [26] are ultimately obtained. Additionally, BNN-stochastic [32] proposes a relaxation of stochastic methods that improves accuracy on the CIFAR-10 dataset. The research above has laid a solid foundation for the development of binarized networks, greatly reducing computational complexity while gradually increasing accuracy. Facing the needs of hardware deployment in applications, we study a sequential, dual-compression binarized convolutional neural network structure, CBCNN, to make the network structure lighter.

CBCNN (Compress Binarized Convolutional Neural Network).
CBCNN is a sequential model structure, which makes the model simpler than others. We binarize the network weights and activation functions so that they participate in error back-propagation. During training, binarized weights and activation values are involved in the calculation of gradients. When making predictions, the weights and activation values of the network are binary (−1/+1).
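To illustrate why −1/+1 weights and activations remove most arithmetic, the sketch below shows the standard bit-packed trick used in BNN-style inference [3]: a dot product of −1/+1 vectors becomes an XNOR followed by a popcount. This is a hedged illustration of the general technique, not code from this paper; the encoding and helper name are assumptions.

```python
# Hedged illustration (not the paper's code): a -1/+1 dot product as XNOR + popcount.
# Encode -1 as bit 0 and +1 as bit 1; for n packed bits:
#   dot(a, w) = 2 * popcount(~(a XOR w) & mask) - n
def binary_dot(a_bits: int, w_bits: int, n: int) -> int:
    mask = (1 << n) - 1
    matches = (~(a_bits ^ w_bits)) & mask       # XNOR: 1 wherever the signs agree
    return 2 * bin(matches).count("1") - n      # agreements minus disagreements

# Example: a = [+1, -1, +1, +1] -> 0b1011, w = [+1, +1, -1, +1] -> 0b1101
assert binary_dot(0b1011, 0b1101, 4) == 0       # equals the real-valued dot product
```

Packing 32 binary weights into a single 32-bit word is also what underlies the roughly 32-fold reduction in memory size and memory accesses mentioned later in the conclusions.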
This section describes our proposed compress binarized convolutional neural network (CBCNN) framework and the related training details.

Model.
The core target of our CBCNN is to reasonably compress the parameters of a BCNN to make the model more lightweight. CBCNN contains three types of blocks, which we name Binary Block 1, Binary Block 2, and Image Compression Block, as shown in Figure 1. Binary Block 1 contains a Binary_C (binary convolution) layer, a MaxPooling layer, and a batch normalization layer, and Binary Block 2 contains a Binary_D (binary dense) layer and a batch normalization layer. In addition, we design the Image Compression Block to effectively compress the dataset. Our network architecture is shown in Figure 2, where different blocks are set for different input sizes. We evaluated our model with three datasets of two different input sizes: 32 × 32 × 1, 28 × 28 × 1, and 28 × 28 × 1, respectively. The Binary_C layer is designed to extract features. We carried out a series of experiments on the configuration of the different blocks, and finally chose the model structure according to the experimental results. The kernel size is 3 × 3, and the pool size is 2 × 2.
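To make the block layout concrete, the following is a minimal Keras-style sketch of how Binary Block 1 (binary convolution + max pooling + batch normalization) and Binary Block 2 (binary dense + batch normalization) could be stacked into a sequential model. Standard Conv2D and Dense layers stand in for the Binary_C and Binary_D layers sketched later in Section 3.3; the filter counts, block counts, and unit sizes are illustrative assumptions, not the exact configuration used in the paper.

```python
# A minimal structural sketch of the CBCNN block layout (filter counts are assumptions).
from tensorflow.keras import layers, models

def add_binary_block_1(model, filters):
    # Binary Block 1: Binary_C (binary convolution) + max pooling + batch normalization.
    # layers.Conv2D stands in here for the binarized Binary_C layer sketched later.
    model.add(layers.Conv2D(filters, kernel_size=(3, 3), padding="same"))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.BatchNormalization())

def add_binary_block_2(model, units):
    # Binary Block 2: Binary_D (binary dense) + batch normalization.
    # layers.Dense stands in here for the binarized Binary_D layer sketched later.
    model.add(layers.Dense(units))
    model.add(layers.BatchNormalization())

model = models.Sequential([layers.InputLayer(input_shape=(32, 32, 1))])  # single-channel input
add_binary_block_1(model, 64)
add_binary_block_1(model, 128)
model.add(layers.Flatten())
add_binary_block_2(model, 256)
add_binary_block_2(model, 43)  # e.g., 43 output classes for GTSRB
```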

Training.
As the GTSRB [33] dataset contains RGB color images, we use the Image Compression Block to carry out some data preprocessing before training. The input images are converted from RGB to YUV, the two color channels U and V are removed, and the brightness channel Y is kept and used as the input of the network. Meanwhile, we use histogram equalization and feature standardization for training. The final mapping of the histogram equalization is shown in equation (1), where S_k is the target pixel value, r_k is the original pixel value, L is the number of gray levels, P_r(r_j) is the probability of r_j in the original image, MN is the total number of pixels in the image, and n_j represents the number of pixels with value r_j in the original image. The feature standardization method is shown in equation (2), where μ is the mean value of the image, m is the image matrix, σ is the standard deviation, and P represents a pixel value of the image. For Fashion-MNIST [34] and MNIST [35], which are already single-channel grayscale images, we do not carry out additional processing before training.
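The bodies of equations (1) and (2) are not reproduced in this extraction. Assuming they take the standard forms consistent with the symbol definitions above, the histogram-equalization mapping would be S_k = (L − 1) · Σ_{j=0}^{k} P_r(r_j) = ((L − 1)/MN) · Σ_{j=0}^{k} n_j for k = 0, 1, ..., L − 1, and the standardization would be P' = (P − μ)/σ, with μ and σ computed over the image matrix m. Under these assumptions, a minimal preprocessing sketch (using OpenCV and NumPy, which the paper does not name explicitly) could look as follows:

```python
# Hedged sketch of the Image Compression Block preprocessing:
# RGB -> YUV, keep only the Y (brightness) channel, histogram-equalize, standardize.
import cv2
import numpy as np

def preprocess(rgb_image: np.ndarray) -> np.ndarray:
    # Convert RGB to YUV and keep only the luminance channel Y (drop U and V).
    y = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2YUV)[:, :, 0]
    # Histogram equalization: S_k = (L - 1)/MN * sum_{j<=k} n_j  (assumed standard form of eq. (1)).
    y = cv2.equalizeHist(y)
    # Feature standardization: P' = (P - mu) / sigma  (assumed standard form of eq. (2)).
    y = y.astype(np.float32)
    y = (y - y.mean()) / (y.std() + 1e-8)
    return y[..., np.newaxis]  # shape (H, W, 1), matching the network's single-channel input

# Usage example (paths and resizing are illustrative):
# img = cv2.cvtColor(cv2.imread("sign.png"), cv2.COLOR_BGR2RGB)
# x = preprocess(cv2.resize(img, (32, 32)))
```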
We introduce the implementation principles of Binary_act (the binary activation function), Binary_C (the binary convolutional layer), and Binary_D (the binary dense layer), respectively. The calculation rules for the single-layer gradient in the CBCNN model are shown in Algorithm 1, where x is the input weight, g_x is the gradient of the input, y is the output weight, and g_y is the gradient of the output. The process of Algorithm 1 is shown in Figure 3.
In order to train each layer of the CBCNN model according to Algorithm 1, we use the "hard_sigmoid" function given in equation (3). In the training process of the CBCNN model, in order to implement the forward and backward propagation algorithms (Algorithm 1), we define the intermediate function "cross" in equation (4), where "S_G" means the stop gradient.
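Equations (3) and (4) are not reproduced in this extraction. The hard sigmoid commonly used in BNN-style training [3] is h_s(x) = clip((x + 1)/2, 0, 1), and a common way to build a binarization function whose non-differentiable rounding is bypassed in the backward pass is a stop-gradient ("straight-through") construction. The sketch below assumes these standard forms rather than quoting the paper's exact definitions of h_s and "cross":

```python
# Hedged sketch of a BNN-style binary activation with a stop-gradient pass-through.
import tensorflow as tf

def hard_sigmoid(x):
    # h_s(x) = clip((x + 1) / 2, 0, 1), the standard hard sigmoid used in BNN training.
    return tf.clip_by_value((x + 1.0) / 2.0, 0.0, 1.0)

def binary_act(x):
    # Forward: binarize to -1/+1 by rounding the hard sigmoid.
    b = 2.0 * tf.round(hard_sigmoid(x)) - 1.0
    # Backward: the stop_gradient term hides the rounding from autodiff, so the
    # gradient flows through x, the usual straight-through behaviour for BNNs [3].
    return x + tf.stop_gradient(b - x)
```

With such a construction, the forward pass sees only −1/+1 activations while the backward pass propagates real-valued gradients, which matches, at a high level, the single-layer gradient rule of Algorithm 1.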
As for Binary_C, we propose the function Binary_kC (equation (6)) to binarize the kernels of the convolutional layers in the CBCNN model, where W_C is the value of the kernel in the convolutional layers and "S_G" means the stop gradient. As for Binary_D, we propose the function Binary_kD (equation (7)) to binarize the kernels of the dense layers in the CBCNN model, where W_D is the value of the kernel in the dense layers, "S_G" means the stop gradient, and "h_s" means the "hard_sigmoid" function (equation (3)). Through equations (6) and (7), the kernel values are converted into binary values (−1/+1).
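As with equations (3) and (4), equations (6) and (7) are not reproduced here. Assuming they apply the same round-through construction to the kernels W_C and W_D, a binary convolutional layer and a binary dense layer could be sketched in Keras as follows; the class names and implementation details are illustrative, not the paper's exact code:

```python
# Hedged sketch of Binary_C / Binary_D: kernels are binarized to -1/+1 in the forward
# pass, while gradients flow to the underlying real-valued kernels via stop_gradient.
import tensorflow as tf
from tensorflow.keras import layers

def binarize_kernel(w):
    # Assumed form of Binary_kC / Binary_kD: round-through of hard_sigmoid(w), mapped to -1/+1.
    b = 2.0 * tf.round(tf.clip_by_value((w + 1.0) / 2.0, 0.0, 1.0)) - 1.0
    return w + tf.stop_gradient(b - w)

class BinaryConv2D(layers.Conv2D):
    def call(self, inputs):
        w = binarize_kernel(self.kernel)   # W_C binarized on the fly
        outputs = tf.nn.conv2d(inputs, w, strides=1, padding=self.padding.upper())
        if self.use_bias:
            outputs = tf.nn.bias_add(outputs, self.bias)
        return outputs

class BinaryDense(layers.Dense):
    def call(self, inputs):
        w = binarize_kernel(self.kernel)   # W_D binarized on the fly
        outputs = tf.matmul(inputs, w)
        if self.use_bias:
            outputs = tf.nn.bias_add(outputs, self.bias)
        return outputs
```

These classes could replace the placeholder Conv2D and Dense layers in the structural sketch of Section 3.2.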

Experimental Results
We tested our models on three different datasets (GTSRB [33], Fashion-MNIST [34], and MNIST [35]) and compared them to other neural network models that use convolution and binarization methods.

GTSRB Test and Analysis.
In order to better simulate the classification problems found in actual engineering, in this article we choose a more challenging and practical dataset (43 classes of traffic signs) to evaluate the performance of our model. We choose the German Traffic Sign Recognition Benchmark (GTSRB) [33], a database for traffic sign recognition provided by the Institute of Neural Computation (INI) in Germany. In the end, 51840 images and more than 1700 instances, covering a total of 43 classes, were obtained.
According to the number of traffic sign pictures in each class, we reasonably divide them into a training set and a validation set. We have a training set with 39209 samples and a validation set with 12630 samples. To our knowledge, this article is the second to evaluate binarized neural networks on the GTSRB dataset. The classes and their number distribution in the training set are shown in Figure 4, and the classes and their number distribution in the validation set are shown in Figure 5. On the GTSRB dataset, we compare against the methods in [36-40], Faster R-CNN [41], and 5 traditional methods [42] on the test set (12630 images). The results are shown in Table 1.

We set some training parameters: the number of epochs is 1000. We use batch normalization with a minibatch size of 200 to speed up the training. The optimizer used is "Adam" and the loss function used is "squared hinge." We use a learning rate with an initial value of 10^−3 and an end value of 10^−4. The accuracy and loss we obtained are shown in Figure 6, and the accuracy of the model reaches 92.94%. In Table 1, we can clearly see that CBCNN is superior to the five traditional methods in [42]; its accuracy is 1.14% higher than that of Faster R-CNN [41] and only 6.65% lower than the current state-of-the-art result [37]. However, the memory footprint of our model is only 6.81 MB, the trainable parameters number only 0.59 M, and bitwise operations can be used at the same time.
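As an illustration of this training configuration, a minimal Keras-style compile/fit sketch might look as follows. The shape of the learning-rate decay between 10^−3 and 10^−4, the label encoding, and the variable names (model, x_train, y_train, x_val, y_val) are assumptions; the paper only states the endpoints, optimizer, loss, epochs, and batch size.

```python
# Hedged sketch of the GTSRB training setup described above (epochs, batch size, optimizer,
# loss, and learning-rate endpoints from the text; the schedule shape is assumed).
import tensorflow as tf

EPOCHS, BATCH_SIZE = 1000, 200
LR_START, LR_END = 1e-3, 1e-4

def lr_schedule(epoch):
    # Assumed exponential decay from 1e-3 down to 1e-4 over the training run.
    return LR_START * (LR_END / LR_START) ** (epoch / (EPOCHS - 1))

# `model`, `x_train`, `y_train`, `x_val`, `y_val` come from the earlier sketches
# and from your own data pipeline (assumed, not defined in the paper's text).
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=LR_START),
    loss="squared_hinge",          # squared hinge loss, as stated in the text
    metrics=["accuracy"],
)
model.fit(
    x_train, y_train,              # labels assumed to be one-hot in {-1, +1} for the hinge loss
    validation_data=(x_val, y_val),
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    callbacks=[tf.keras.callbacks.LearningRateScheduler(lr_schedule)],
)
```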

Fashion-MNIST Test and Analysis. Fashion-MNIST [34] is a dataset composed of objects related to clothing, shoes, and bags. The training and test sets of Fashion-MNIST have the same distribution as the training and test sets of MNIST. To our knowledge, this article is the first to evaluate binarized neural networks on the Fashion-MNIST dataset. We compare the test results (10000 images) with other advanced methods [43-48] on the Fashion-MNIST dataset, and the results are shown in Table 2.
We set some training parameters: the number of epochs is 500. We use batch normalization with a minibatch size of 50 to speed up the training. The optimizer used is "Adam" and the loss function used is "squared hinge." We use a learning rate with an initial value of 10^−3 and an end value of 10^−4. The accuracy and loss we obtained are shown in Figure 7, and the accuracy of the model reaches 92.86%. In Table 2, we can clearly see that the accuracy of CBCNN is only 4.05% lower than that of the current best method [48]. However, the memory footprint of our model is only 1.89 MB, the trainable parameters number only 0.48 M, and bitwise operations can be used at the same time.

MNIST Test and Analysis.
MNIST is a benchmark image classification dataset [35]. It is made up of 28 × 28 grayscale images representing the digits 0 to 9, and contains 60000 training images and 10000 test images. In BNN [3], a binary MLP is used to obtain a best accuracy of 99.04% on MNIST, but the MLP design makes the model occupy a large amount of memory. We compare the results obtained with the CBCNN method against other methods; the results are shown in Table 3.
Our experimental parameter configuration is the same as in the Fashion-MNIST test. The accuracy and loss we obtained are shown in Figure 8, and the accuracy of the model reaches 99.32%. In Table 3, we can see that the best accuracy of CBCNN is 0.28% higher than that of the current best method [3] among binarized neural networks. Moreover, the memory footprint of our model is only 1.89 MB, the trainable parameters number only 0.48 M, and bitwise operations can be used at the same time.

Discussion
We analyze the model performance of CBCNN as follows.

Conclusions
In this article, we propose a lightweight neural network, CBCNN (compress binarized convolutional neural network), to solve the problem of multiclass image recognition. We compress both the datasets and the binarized convolutional neural network structure when dealing with the multiclassification problem. CBCNN obtains the most advanced results among binarized convolutional neural networks on GTSRB [33], Fashion-MNIST [34], and MNIST [35]. In addition, in the forward pass (at run time and during training), the CBCNN model replaces most arithmetic operations with bitwise operations and reduces the memory size and memory accesses by a factor of 32. Furthermore, the dual compression method (of the network structure and the dataset) greatly reduces the memory space occupied by the model and makes it possible to load neural networks onto portable devices with severely limited memory, which is more conducive to embedded deployment of neural networks. Experimental results show that CBCNN has slightly lower accuracy than a full-precision convolutional neural network when dealing with multiclassification problems, but CBCNN has a lower hardware deployment cost. High-performance neural network architectures sometimes waste computing resources when dealing with practical engineering problems, and excessive reliance on high-performance hardware increases the application cost. In the future, we will continue to work on improving the performance of binarized neural networks by changing the network structures and training strategies.

Data Availability
Our code is available at https://github.com/AI-Xuan/CBCNN. All experimental datasets are public datasets.

Figure 4: (a) Classes in the training set. (b) Quantity distribution of each class in the training set.

Figure 5: (a) Classes in the validation set. (b) Quantity distribution of each class in the validation set.

Figure 6: (a) Accuracy of CBCNN on the GTSRB training and validation sets. (b) Loss of CBCNN on GTSRB.

Figure 8: (a) Accuracy of CBCNN on the MNIST training and validation sets. (b) Loss of CBCNN on MNIST.

Algorithm 1: Rules for calculating the gradient of a single layer in CBCNN.

Table 1: The accuracy of different methods compared on the GTSRB dataset.

Table 2: The accuracy of different methods compared on the Fashion-MNIST dataset.

Table 3: The accuracy of different methods compared on the MNIST dataset.