Unrestricted Face Recognition Algorithm Based on Transfer Learning on Self-Pickup Cabinet

In the contactless delivery scenario, the self-pickup cabinet is an important terminal delivery device, and face recognition is one of the efficient ways to achieve contactless access to express deliveries. To recognize face images effectively in unrestricted environments, an unrestricted face recognition algorithm based on transfer learning is proposed in this study. First, the region extraction network of the faster RCNN algorithm is modified to increase the recognition speed of the algorithm. Then, the first transfer learning is applied between the large ImageNet dataset and the face image dataset under restricted conditions. The second transfer learning is applied between the face image dataset under restricted conditions and the unrestricted face image dataset. Finally, the unrestricted face image is processed by the image enhancement algorithm to increase its similarity with the restricted face image, so that the second transfer learning can be carried out effectively. Experimental results show that the proposed algorithm achieves a better recognition rate and recognition speed on the CASIA-WebFace, LFW, and MegaFace datasets.


Introduction
At present, global epidemic prevention and control has become routine, so it is necessary to develop effective prevention and control measures. In the field of logistics distribution, terminal contactless distribution has become a focus of public attention [1]. The main technology currently studied for terminal contactless distribution is face recognition. In the contactless scene, face images are often disturbed by unrestricted environmental factors such as light, occlusion, and expression. Face images affected by the unrestricted environment are of poor quality, so they are difficult to recognize and the recognition accuracy is low [2]. Therefore, the study of a fast face image recognition algorithm for unconstrained environments is of great significance for the field of terminal contactless distribution.
In computer vision, face recognition is one of the most important research directions. Face recognition can be used in digital cameras, access control systems, identity recognition network applications, entertainment applications, and other fields. With the rapid development of artificial intelligence and image analysis technology, face recognition technology has been widely used in many fields [3]. However, face images collected in an unrestricted environment are subject to the mixed interference of illumination, occlusion, expression, and other factors, so that the face recognition accuracy is greatly reduced. At the same time, because the original images are high-dimensional data, the speed of face recognition is also affected. Therefore, it is of great significance to study a fast face image recognition algorithm for unrestricted environments. Face recognition methods can be divided into traditional methods and deep neural network methods. The traditional methods mainly comprise the geometric feature-based method, the local feature analysis method, and the eigenface method. In recent years, with the rapid development of deep learning theory, the deep neural network has become the most widely used algorithm in face recognition. The first deep neural network model that attracted attention in face recognition was Facebook's DeepFace [4] model in 2014. DeepFace is the first study to use deep learning to approach human performance in face recognition. It achieves an average accuracy of 97.35% on the LFW dataset, approaching the human limit of 97.5%. DeepFace extracts highly compressed facial features and models facial features with a 3D model. The model assumes that the feature parts of all faces are fixed at the pixel level, and the faces are aligned by region-based affine transformation. Finally, a deep neural network is used as a feature extractor to extract the face features from the image.
This model contains a very large number of parameters due to the use of fully connected layers. Subsequently, Professor Tang et al. developed the DeepID series of deep neural network structures. DeepID1 [5] is a small network consisting of four convolutional layers and two fully connected layers. The DeepID feature output by the penultimate layer has 160 dimensions. In a convolutional neural network, the higher the layer, the larger the receptive field of its features, so this connection mode takes into account both the local features of the face and the global features of the whole face. DeepID2 [6] adds a face verification loss on the basis of DeepID1 and learns the discriminative features of faces through two objective functions. The final loss function is the weighted sum of the face classification loss and the face verification loss, and the relative importance of the two tasks is adjusted by changing the weights of the two losses. The subsequent DeepID2+ [7] network makes two further changes on the basis of DeepID2. First, the network structure is changed so that the feature dimension increases from 160 to 512. Second, extensive structural analysis is performed, which shows that neurons at the higher levels are more sensitive to faces. DeepID2+ achieves results that exceed human performance on the LFW dataset and is robust to moderate face occlusion. Compared with the DeepID2+ model, the later DeepID3 model further increases the number of layers and achieves better results.
In recent years, it has become a trend to train deep neural network models for face recognition on ever larger labeled datasets. For example, Google used 200 million face images of 8 million different individuals in FaceNet [8]. However, the cost of collecting and labeling such datasets is enormous, and training deep neural networks on them requires better hardware support. Larger and larger datasets are used to train better models, but this is not a good development direction. For example, when the face verification accuracy on the LFW dataset increased from 99.47% to 99.77%, the number of training images increased from 200,000 to 1.2 million. In most realistic scenarios, only a small amount of data is available, and how to learn rich knowledge from such limited data is a problem to be solved. There is growing interest in techniques such as domain adaptation and transfer learning [9], and the personalized search model has achieved impressive results [10]. In a study [11], a transfer learning algorithm combining a large number of source domain samples with a relatively small amount of target domain data is proposed. In a study [12], a deep transfer metric learning method for cross-domain visual recognition is proposed, which transfers recognition knowledge from the labeled source domain to the unlabeled target domain. Therefore, knowledge transfer or domain adaptation is a promising way to guide deep face representation learning from a few samples. In other words, some prior knowledge can be learned from other large databases and then fine-tuned in the target domain. Compared with traditional machine learning methods, deep learning methods have great advantages in feature extraction.
In other words, the convolutional neural network algorithm can automatically extract the features of the image content layer by layer without any prior knowledge [13]. When the number of samples is large enough and the network is deep enough, the data can be fully mined to extract highly discriminative features. However, deep learning is driven by data, and it is difficult to extract features with generalization ability when the amount of data is insufficient. In the unrestricted environment, the number of face images is small and the acquisition cost is high, so the data scale is not enough to support the training of the network.
In this study, the transfer learning method is used to solve the problem of the insufficient number of face images in unrestricted environments. First, the faster RCNN algorithm is modified to increase the recognition speed of the algorithm. Then, the network parameters trained on the ImageNet large-scale dataset [14] are used for initialization through the first transfer learning. Next, the network parameters are fine-tuned with face images under restricted conditions, for which the data volume is relatively sufficient, so that the network can recognize face images under restricted conditions. Finally, the face images under unrestricted conditions are enhanced, for example by pose alignment, illumination enhancement, and angle rotation, which increases their similarity with the face images under restricted conditions. After secondary transfer learning of the previously obtained restricted face image recognition network, a network that can accurately recognize unrestricted face images is obtained. The rest of this article is organized as follows. The second section introduces the face recognition model of the improved faster RCNN algorithm. The third section introduces the enhancement method of unrestricted face images. The fourth section gives the transfer learning method and experimental results. The fifth section is the conclusion of this study.

Face Recognition Model with the Improved Faster RCNN Algorithm
Faster RCNN is an object detection framework that improves on the existing RCNN framework. RCNN combines a CNN with region candidate boxes, and the faster RCNN algorithm was developed on this basis [15]. The basic idea of these algorithms is to divide the original image into different candidate boxes. A convolutional neural network (CNN) is used as a feature extractor, and a feature vector is extracted from each candidate box. Then, a classifier is trained to classify the feature vectors. The resulting target detection framework consists of three parts, as shown in Figure 1. Faster RCNN proposes a region proposal network (RPN) to generate candidate boxes based on the convolutional neural network, while the detector itself is retained. The RPN and the detector share convolutional layers to extract features; thus, they combine to form a single unified faster RCNN network. As shown in Figure 2, it can be roughly divided into four parts: convolutional backbone network, RPN micronetwork, faster RCNN detector, and multitask loss.
In order to speed up the detection process of the model, the convolutional backbone network and the multitask loss function are adjusted; at the same time, the RPN micronetwork of faster RCNN is improved in this study.

The Adjusted Convolution Backbone.
The convolutional layers in faster RCNN borrow the convolutional architecture of a classical classification model together with its pretrained weights. The pretrained model of the classification task is applied to a similar detection task and its weights are fine-tuned directly, which greatly reduces the amount of training required. As shown in Figure 3, the convolutional part of the VGG-16 classification model [16] is adopted as the convolutional backbone network in this study. The only slight change is that the feature map is not pooled before output. The stride of all convolution operations is 1, and the boundary padding is 1. The convolution kernels are 3 × 3, which ensures that the width and height of the image remain unchanged before and after convolution. The pooling layers use 2 × 2 max pooling with a stride of 2. Pooling does not affect the number of channels, but after each pooling layer the width and height are halved. The numbers of convolution channels are 64, 128, 256, and 512 in successive stages; the number of channels represents the number of feature maps extracted by convolution. After each convolutional layer, the rectified linear unit (ReLU) activation function performs a nonlinear transformation, which does not affect the width, height, or number of channels of the feature map. Therefore, after the input image passes through 13 convolutional layers and 4 pooling layers, the width and height of the output feature map become 1/16 of those of the original image, and the number of channels changes from the 3 RGB channels to 512.
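The size arithmetic described above can be checked with a short sketch. The block layout `BLOCKS` (2, 2, 3, 3, 3 convolutions per stage) is the usual VGG-16 configuration and is an assumption here, as are the helper names; only the kernel, stride, padding, and pooling settings come from the text.

```python
def conv_out(size, kernel=3, stride=1, pad=1):
    """Spatial size after one convolution (floor division, as in most frameworks)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial size after one max-pooling layer without padding."""
    return (size - kernel) // stride + 1


# Assumed VGG-16 layout: (number of conv layers, output channels) per stage.
# 2 + 2 + 3 + 3 + 3 = 13 convolutional layers in total.
BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def backbone_shape(width, height):
    """Return (width, height, channels) of the backbone output feature map."""
    channels = 3  # RGB input
    for index, (n_convs, out_channels) in enumerate(BLOCKS):
        for _ in range(n_convs):
            width, height = conv_out(width), conv_out(height)
            channels = out_channels
        # The last stage is not pooled, so only 4 pooling layers are applied.
        if index < len(BLOCKS) - 1:
            width, height = pool_out(width), pool_out(height)
    return width, height, channels


print(backbone_shape(800, 600))  # (50, 37, 512)
```

For an 800 × 600 input, the feature map is 50 × 37 with 512 channels, that is, roughly 1/16 of the input in each spatial dimension.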

Improved RPN Micronetwork.
The RPN micronetwork adopts the sliding window mode to generate 9 anchors in the input image for each point on the feature map. Figure 4 shows the anchors on the input image that correspond to the center point of the feature map. The outer black box is the original image of 800 × 600 pixels. The inner, middle, and outer groups of boxes, drawn with different line thicknesses, represent the scales 128, 256, and 512, respectively. At each scale, there are three aspect ratios of 1 : 2, 1 : 1, and 2 : 1, so each sliding window corresponds to 9 anchors.
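A minimal sketch of the anchor layout follows. It assumes that each anchor keeps an area of scale² and that the ratio is height : width, which is the common convention but is not stated in the text; `make_anchors` is a hypothetical helper name.

```python
import numpy as np


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return a (9, 4) array of [x1, y1, x2, y2] anchors centered on (cx, cy).

    Assumption: ratio = height / width, and every anchor of a given scale
    has area scale**2.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            w = scale / np.sqrt(ratio)
            h = scale * np.sqrt(ratio)
            anchors.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(anchors)


# Anchors for the center of an 800 x 600 image, as in Figure 4.
center_anchors = make_anchors(400.0, 300.0)
print(center_anchors.shape)
```

Anchors that extend beyond the image border are not clipped in this sketch.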
In the original anchor design, the scales 128, 256, and 512 are set so that target objects of various sizes can be matched. With three scales and three aspect ratios, one point in the feature map corresponds to 9 regions in its receptive field on the original image, and each region corresponds to an anchor. Through supervised training, the model adjusts its parameters so that the computed feature map corresponds to the objects in the original image. Smaller scales capture small differences between objects, which allows different classes of objects to be distinguished. The larger scale ensures that the whole original image, that is, the entire receptive field, is covered, so that no object in the original image is missed. The RPN network structure is shown in Figure 5. For a given input image, the convolutional layers generate the convolutional feature map. The RPN micronetwork slides a small 3 × 3 window over this feature map. Each window is mapped to a 256-dimensional feature vector, which is then fed into two branch networks: the Cls classification network and the Reg regression network. Here, the original 512-dimensional feature vector is reduced to a 256-dimensional feature vector, which accelerates detection.
The Cls classifier classifies the feature vector mapped from each window. It predicts a foreground probability and a background probability for each anchor, so there are 2 × 9 = 18 probability values, which are represented by 18 neurons.
The Reg branch performs regression on the feature vector mapped from each window. It predicts the offsets of the center point coordinates, width, and height of each anchor, represented by $(t_x, t_y, t_w, t_h)$, so there are 4 × 9 = 36 offsets, represented by 36 neurons. Note that the feature map is processed in a sliding window mode, so these operations can be realized by convolution.
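To illustrate why reducing the window feature from 512 to 256 dimensions lightens the RPN, the following sketch counts its parameters. It assumes the window is a 3 × 3 convolution over the 512-channel backbone output and that both branches are 1 × 1 convolutions; the function names are hypothetical.

```python
def conv_params(in_channels, out_channels, kernel):
    """Weights plus biases of one convolutional layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


def rpn_params(mid_channels, in_channels=512, num_anchors=9):
    """Parameter count of the RPN micronetwork for a given feature dimension."""
    window = conv_params(in_channels, mid_channels, 3)   # 3x3 sliding window
    cls = conv_params(mid_channels, 2 * num_anchors, 1)  # 18 class scores
    reg = conv_params(mid_channels, 4 * num_anchors, 1)  # 36 box offsets
    return window + cls + reg


print(rpn_params(512))  # 2387510
print(rpn_params(256))  # 1193782
```

Under these assumptions the 256-dimensional variant has about half the parameters of the 512-dimensional one, and the per-window multiply-accumulate count shrinks in the same proportion.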

Multitasking Loss.
For RPN training, the multitask loss is adopted in this study to combine the cross entropy loss of the classifier with the Smooth L1 loss of the regression. Let the classification loss be $L_{cls}(p_i, p_i^*)$ and the regression loss be $L_{reg}(t_i, t_i^*)$; then the multitask loss $L(\{p_i\}, \{t_i\})$ over all samples is calculated as follows:

$$L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*),$$

where $N_{cls}$ and $N_{reg}$ are the normalization terms, and $\lambda$ is the tradeoff coefficient. The classification loss of a single sample is the cross entropy between $p^*$ and $p$, where $p^*$ is the category label corresponding to the anchor, and $p$ is the predicted probability of that category.
In the multitask loss, only the regression loss of anchors marked as positive is calculated. The regression loss is the Smooth L1 loss of the difference between the predicted and true offsets:

$$L_{reg}(t_i, t_i^*) = \sum_{j \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}(t_{i,j} - t_{i,j}^*), \qquad \mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2, & |x| < 1, \\ |x| - 0.5, & \text{otherwise}, \end{cases}$$

where $t_i$ and $t_i^*$ are each four-tuples of the form $(t_x, t_y, t_w, t_h)$. For simplicity, the subscript $i$ can be dropped, so that only the regression of a single sample is considered, between the predicted offset $t$ of the anchor and the true offset $t^*$ of the ground truth relative to the anchor. The SGD method [17] is used in the RPN network to optimize the multitask loss function, that is, to minimize $L(\{p_i\}, \{t_i\})$. In the optimization process, the model adjusts its parameters to find a local optimal solution. During testing, the RPN predicts the category probability of each anchor and the regression offset of the anchors marked as positive. The candidate boxes, corrected by the regression offsets, are then obtained from the output of the RPN micronetwork by nonmaximum suppression.
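The multitask loss can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the training code of this study: $p$ is taken to be the predicted foreground probability, the offsets follow the standard faster RCNN parameterization, $\lambda$ defaults to 1, and both normalization terms default to the number of anchors. The names `encode`, `smooth_l1`, and `rpn_loss` are hypothetical.

```python
import numpy as np


def encode(box, anchor):
    """Offsets (t_x, t_y, t_w, t_h) of a box relative to an anchor.

    Both are given as (center_x, center_y, width, height); this is the
    standard faster RCNN parameterization, assumed here.
    """
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return np.array([(x - xa) / wa, (y - ya) / ha, np.log(w / wa), np.log(h / ha)])


def smooth_l1(x):
    """Elementwise Smooth L1: 0.5 x^2 if |x| < 1, otherwise |x| - 0.5."""
    x = np.abs(x)
    return np.where(x < 1.0, 0.5 * x ** 2, x - 0.5)


def rpn_loss(p, p_star, t, t_star, lam=1.0, n_cls=None, n_reg=None):
    """Multitask loss over N anchors.

    p:      (N,) predicted foreground probabilities (assumption)
    p_star: (N,) labels, 1 for positive anchors and 0 for negative ones
    t, t_star: (N, 4) predicted and true offsets
    """
    eps = 1e-12
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    l_cls = -(p_star * np.log(p) + (1.0 - p_star) * np.log(1.0 - p))
    l_reg = smooth_l1(t - t_star).sum(axis=1)
    n_cls = len(p) if n_cls is None else n_cls
    n_reg = len(p) if n_reg is None else n_reg
    # p_star masks the regression term so only positive anchors contribute.
    return l_cls.sum() / n_cls + lam * (p_star * l_reg).sum() / n_reg
```

Multiplying the regression term by `p_star` is what restricts the regression loss to anchors marked as positive.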

The Enhancement Method of Unrestricted Face Image

Face Posture Alignment through Image Rotation.
Because of complicated pose variation, natural images pose a great challenge to key point localization. Therefore, face images need to be clustered by pose, and a separate model is then trained for each category. For an image I, the goal of face alignment is to learn a nonlinear mapping function D from features to key points. Because poses differ greatly, learning D is complex, so D is divided into several simpler subtasks $D_1, D_2, \ldots, D_n$. In each subtask $D_k$, the faces have similar poses, which simplifies the learning of D.
Because of the diversity of poses, an affine transformation is used to adjust the face pose before clustering. The affine transformation matrix M is given in equation (7). Only two sets of three point coordinates are needed to obtain M. The three points are the two eyes and the middle of the mouth. For each image, one set contains the coordinates $(x, y, 1)^T$ in the original coordinate system, and the other contains the coordinates $(u, v, 1)^T$ in the target coordinate system. Note that the two eyes lie on the same horizontal line in the target coordinate system. Once the transformation matrix M is calculated, it can be used to apply the affine transformation to the entire image. The result is shown in Figure 6: the first row is before the affine transformation, and the second row is after the transformation. After correction, there are only three kinds of facial pose: frontal face, left face, and right face.
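As an illustration of how M can be obtained from the two sets of three points, the following sketch solves the linear system in homogeneous coordinates. The point values and the function name `affine_from_points` are hypothetical; the sketch assumes M maps $(x, y, 1)^T$ to $(u, v, 1)^T$ and that the three points are not collinear.

```python
import numpy as np


def affine_from_points(src, dst):
    """Return the 3x3 matrix M with (u, v, 1)^T = M (x, y, 1)^T.

    src, dst: (3, 2) arrays holding the two eyes and the mouth center in
    the original and target coordinate systems.
    """
    src_h = np.hstack([src, np.ones((3, 1))])  # rows are (x, y, 1)
    dst_h = np.hstack([dst, np.ones((3, 1))])  # rows are (u, v, 1)
    # M @ src_h.T = dst_h.T  is equivalent to  src_h @ M.T = dst_h.
    return np.linalg.solve(src_h, dst_h).T


# Hypothetical landmarks: the right eye is 4 pixels lower than the left eye.
src = np.array([[30.0, 40.0], [70.0, 44.0], [52.0, 80.0]])
# Target layout with both eyes on the same horizontal line.
dst = np.array([[30.0, 40.0], [70.0, 40.0], [50.0, 80.0]])
M = affine_from_points(src, dst)

# Key points found in the aligned image map back with the inverse of M.
original = np.linalg.inv(M) @ np.array([70.0, 40.0, 1.0])
```

The last line shows the inverse mapping that returns a key point from the aligned image to the source coordinate system.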
Since no ground-truth pose labels are provided in the dataset, the K-means unsupervised clustering algorithm [18] is used to realize pose clustering. Then, for all samples in each class, a better initial position that is closer to the true position is provided. The adaptive SDM model is used to extract discriminative features, and each category is trained separately to obtain three different trained models. Since the key point positions are located after the affine transformation, the final output key point positions need to be converted back to the source coordinate system by the inverse transformation. In formula (8), which expresses this conversion, $x_t'$ is the position coordinate in the coordinate system after affine transformation, and $x_t$ is the coordinate of the key point position in the source coordinate system.

The Alignment of Face Images.
In face alignment, a regression function is learned to predict the position increment between the current position and the real position. Since this regression function is a complex nonlinear mapping, SDM uses linear regression instead of complex nonlinear regression to predict the position. The objective function is as follows:

$$f(x_0 + \Delta x) = \left\| h(d(x_0 + \Delta x)) - \Phi^* \right\|_2^2 \quad (9)$$

Suppose that a picture has m pixels, $d \in \mathbb{R}^{m \times 1}$, and $d(x) \in \mathbb{R}^{p \times 1}$ indexes the p key points on the picture. $x_0 \in \mathbb{R}^{p \times 2}$ represents the initial position, and h is a nonlinear feature extraction function. In the experiment of this study, the HOG feature is used. $\Phi^* = h(d(x^*))$ represents the feature extracted at the real position. For each sample, there is an initial position $x_0$. According to Newton's method, formula (9) only needs to be iterated repeatedly to obtain a sequence of increments $\Delta x_1, \Delta x_2, \ldots, \Delta x_k$. After each iteration, the position is corrected by $x_k = x_{k-1} + \Delta x_k$. After several iterations, $x_k$ converges to the optimal position $x^*$. A Taylor expansion of equation (9) is differentiated with respect to $\Delta x$, and setting the derivative to 0 yields equation (10). The first iteration can then be expressed as the linear regression of equation (11), where $R_0$ is regarded as the descent direction. A series of descent directions $R_k$ and bias terms $b_k$ then needs to be calculated, as expressed in equation (12). The features extracted at each stage constitute a set $\Phi = \{\Phi_1, \Phi_2, \ldots, \Phi_k\}$.
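For reference, the chain from the Newton step to the linear regression can be written out in the notation of the standard supervised descent method. This is a sketch under assumptions: $J_h$ and $H$ (the Jacobian of $h$ and the Hessian of the objective, both evaluated at $x_0$) and $\Phi_0 = h(d(x_0))$ are symbols introduced here, and the stage indices follow the usual SDM convention, which may differ from the numbering used in equations (10) to (12).

```latex
\begin{aligned}
f(x_0 + \Delta x) &\approx f(x_0) + J_f(x_0)^{T} \Delta x
    + \tfrac{1}{2}\, \Delta x^{T} H(x_0)\, \Delta x,
    \qquad J_f = 2 J_h^{T} (\Phi_0 - \Phi^{*}), \\
\Delta x_1 &= -H^{-1} J_f = -2 H^{-1} J_h^{T} (\Phi_0 - \Phi^{*}), \\
\Delta x_1 &= R_0 \Phi_0 + b_0,
    \qquad R_0 = -2 H^{-1} J_h^{T}, \quad b_0 = 2 H^{-1} J_h^{T} \Phi^{*}, \\
x_k &= x_{k-1} + R_{k-1} \Phi_{k-1} + b_{k-1}.
\end{aligned}
```

Because $\Phi^*$ is unknown at test time, $R_k$ and $b_k$ are not computed from $J_h$ and $H$ but learned from training data, which is what makes the method a sequence of linear regressions.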
The adaptive feature extraction is embodied in $\Phi_k$. Figure 7 takes five key points as an example. The red dot represents the position obtained at each stage, the green dot represents the real position, and the red circle represents the radius r of the feature extraction frame. Figure 7(a) shows how the radius r changes in the SDM model. It can be seen that the size of r is constant. This causes useless features to be extracted, which affects the positioning of key points.
Face alignment is a process from coarse to fine. The radius r of the feature extraction frame is related to the position increment $\Delta x$ generated in each stage.
When $\Delta x$ in the training samples is widely distributed, a large r is preferred for feature extraction. Following the coarse-to-fine rule, the size of r is changed adaptively to obtain discriminative features. As shown in Figure 7(b), in the initial stage, the obtained position $x_k$ is far from the real position $x^*$, and $\Delta x$ is widely distributed. Using large feature boxes near the key points extracts more useful information, which helps to handle large differences in face shape and ensures robustness. As the stages proceed, the distance between $x_k$ and $x^*$ becomes smaller and smaller, and a gradually reduced feature extraction frame can effectively obtain discriminative features. Especially in the later stages, a small feature extraction frame reduces noise and ensures accuracy. Equation (13) expresses how the radius $r_k$ of the adaptive feature extraction frame is obtained, where $x_k^{ij}$ represents the position of the j-th key point of the i-th sample in stage k. Although a radius $r_k$ that simply decreases gradually follows the coarse-to-fine rule, such a strategy is rigid and does not take into account the distribution of the position increments $\Delta x$ generated at each stage of the training samples. In the experiment of this study, the radius $r_k$ of the feature extraction frame is obtained adaptively from the $\Delta x$ produced in each stage. At each stage, each sample produces a $\Delta x$ of dimension p × 2. The distance between the current position and the real position of each key point is calculated, which gives p distances per sample, so N samples produce N × p distances. The maximum of these N × p distances is taken as the size r of the feature extraction frame for every key point of all samples at this stage. The maximum is chosen so that useful features around the real key points are extracted.
In this way, the size of the feature extraction frame selected at each stage fully considers the distribution of the current and true positions of the samples. As the stages proceed, the frame gradually shrinks, so features are extracted as close to the real position as possible and the interference of redundant features is reduced.
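The selection rule described above, taking the largest of the N × p distances at a stage, can be sketched as follows. The array layout (N, p, 2) and the function name `adaptive_radius` are assumptions made for illustration.

```python
import numpy as np


def adaptive_radius(current, target):
    """Radius of the feature extraction frame for one stage.

    current, target: (N, p, 2) arrays with the current and real positions
    of p key points for N samples. The radius is the largest Euclidean
    distance among all N * p key points.
    """
    distances = np.linalg.norm(target - current, axis=2)  # shape (N, p)
    return float(distances.max())


# Two samples with three key points each; one key point is 5 pixels away.
current = np.zeros((2, 3, 2))
target = np.zeros((2, 3, 2))
target[1, 2] = [3.0, 4.0]
target[0, 0] = [1.0, 0.0]
print(adaptive_radius(current, target))  # 5.0
```

As the current positions move toward the real positions from stage to stage, the value returned by this function shrinks, which reproduces the coarse-to-fine behaviour.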
By obtaining the radius r of the adaptive feature extraction frame, the discriminative features $\Phi = \{\Phi_1, \Phi_2, \ldots, \Phi_k\}$ are obtained. The values of $R_k$ and $b_k$ are calculated by minimizing, over all training samples, the squared difference between the position increment predicted from the features and the actual position increment between the current position and the real position. This is a typical linear least squares problem, for which an analytical solution can be obtained. Then, according to formula (12), the position increment $\Delta x_k$ of the k-th stage can be obtained, followed by the key point position $x_k$ of the k-th stage. After the iteration is completed, the $R_k$ and $b_k$ obtained at each stage are saved.
For a test sample, the pose of the face image is first determined, and the corresponding initial position $x_0$ is given. Then, the series of $R_k$ and $b_k$ obtained in the training stage is used to predict the positions of the key points.
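One stage of this training and prediction procedure can be sketched with an ordinary least squares solver. The feature extraction itself (HOG in this study) is not reproduced; `features` stands for precomputed feature vectors, key point positions are flattened to length 2p, and the names `fit_stage` and `apply_stage` are hypothetical.

```python
import numpy as np


def fit_stage(features, delta_true):
    """Solve the linear least squares problem of one stage.

    features:   (N, F) feature vectors extracted at the current positions
    delta_true: (N, 2p) actual increments, real position minus current position
    Returns R with shape (2p, F) and b with shape (2p,).
    """
    n_samples = features.shape[0]
    # Append a column of ones so that the bias b is estimated jointly with R.
    design = np.hstack([features, np.ones((n_samples, 1))])
    weights, *_ = np.linalg.lstsq(design, delta_true, rcond=None)
    return weights[:-1].T, weights[-1]


def apply_stage(positions, features, R, b):
    """Update flattened positions (N, 2p) with the learned R and b."""
    return positions + features @ R.T + b
```

At test time, the saved (R, b) pairs are applied in order, with features re-extracted at the updated positions before each stage.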

Enhancement of the Face Image.
It is difficult and costly to obtain sufficient samples of unrestricted face images, and it is difficult to train a satisfactory model with the number of existing samples. When the data volume is insufficient, a common approach is to expand the dataset by random cropping, color conversion, and similar methods. Although this can improve the recognition accuracy of the network, the improvement is limited. In this study, transfer learning is adopted with the aid of the restricted face dataset, for which the data volume is relatively sufficient: the restricted face recognition network model is fine-tuned with unrestricted face data to realize the recognition of unrestricted face images.
Since restricted face images and unrestricted face images are very similar, the features extracted by the network are also very similar. Transfer learning takes advantage of this similarity to transplant the restricted face training model to the unrestricted face recognition network. Then, the unrestricted face data are enhanced to make the algorithm more suitable for the recognition of unrestricted face images. The higher the similarity between the two domains connected by transfer learning, the more effective the transfer. A comparison of restricted and unrestricted face data shows that the main difference lies in the interference of illumination, occlusion, expression, and other factors. Therefore, the similarity between restricted and unrestricted face data can be improved by processing the data with respect to illumination, occlusion, or expression. In this study, illumination enhancement is selected for this purpose. The Retinex algorithm is adopted for illumination enhancement because it can weaken the influence of light on the object in the image and restore the original color, edge, and other information of the object.
In Retinex theory, an image is regarded as the combination of incident light and reflected light [19]. The basic idea of the image enhancement method is to remove the influence of the illuminating light and retain the reflection properties of the object itself.
The Frankle-McCann Retinex iterative algorithm is used in this study. This algorithm estimates the illumination by iterative piecewise linear comparison of pixels along a spiral-structured path. With the spiral path, the corrected value of the pixel at point (0, 0) is jointly determined by the pixel values at the inflection points of the path. The number of selected reference points is moderate, and the sampling becomes denser closer to the target point, which gives better results. Color images are used in this study, so the three RGB channels are processed separately and then merged for output.
(1) Data conversion in the early stage. To reduce the amount of calculation in subsequent steps, the pixel values of the original image are converted from the integer domain to the logarithmic domain. To avoid negative values, 1 is added to every pixel value of the original image before the logarithm is taken. Then, the constant matrix r is initialized. Its constant value is the average of the original image pixels, and its size is the same as that of the original image.
(2) Comparison and correction between pixels. For an image with a pixel size of m × n, the coordinate change between the two comparison points that are furthest from the target point is computed first; in that expression, fix denotes the rounding function. Then, at each subsequent step, the distance between the two comparison points is shortened to half of that in the previous step and the direction is rotated clockwise. The direction keeps rotating until the interval between the comparison points is less than 1.
Let $r_n(x, y)$ be the result of the previous iteration and $r_{n+1}(x, y)$ the result of the current iteration. The update of $r_{n+1}(x, y)$ takes one form if D > 0 and another if D < 0.
(3) Image output display. After the iterative operation, the gray value of the reflected image is often a floating point number, which needs to be linearly converted to a valid gray value using $r_{(n+1)\max}(x, y)$ and $r_{(n+1)\min}(x, y)$, the maximum and minimum values of the iteration result $r_{n+1}(x, y)$, respectively. $R(x, y)$ is the final enhancement result.
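The three steps can be sketched as follows. Since the exact update rules for D > 0 and D < 0 are not reproduced here, the sketch substitutes the classical Frankle-McCann ratio-product-reset-average step; the horizontal and vertical comparison pattern, the number of iterations per distance, the 8-bit output range, and all function names are likewise assumptions.

```python
import numpy as np


def _compare(log_img, old, dr, dc, maximum):
    """One comparison at offset (dr, dc): ratio, product, reset, average."""
    rows, cols = log_img.shape
    new = old.copy()
    if dr + dc > 0:
        new[dr:, dc:] = (old[:rows - dr, :cols - dc]
                         + log_img[dr:, dc:] - log_img[:rows - dr, :cols - dc])
    else:
        dr, dc = -dr, -dc
        new[:rows - dr, :cols - dc] = (old[dr:, dc:]
                                       + log_img[:rows - dr, :cols - dc] - log_img[dr:, dc:])
    np.minimum(new, maximum, out=new)  # reset values above the maximum
    return (new + old) / 2.0           # average with the previous estimate


def frankle_mccann(channel, n_iter=4):
    """Retinex estimate for one channel (2-D array of non-negative values)."""
    log_img = np.log(channel.astype(np.float64) + 1.0)  # step (1), +1 avoids log(0)
    maximum = log_img.max()
    estimate = np.full_like(log_img, log_img.mean())    # constant initial matrix
    levels = min(log_img.shape).bit_length() - 1        # floor(log2(min(m, n)))
    step = (1 << levels) >> 1                           # furthest comparison distance
    sign = 1
    while step >= 1:                                    # step (2)
        for _ in range(n_iter):
            for dr, dc in ((0, sign * step), (sign * step, 0)):
                estimate = _compare(log_img, estimate, dr, dc, maximum)
        step //= 2       # halve the distance
        sign = -sign     # reverse the comparison direction
    return estimate


def stretch(r):
    """Step (3): linear conversion of the result to gray values 0..255."""
    lo, hi = r.min(), r.max()
    if hi == lo:
        return np.zeros(r.shape, dtype=np.uint8)
    return np.round(255.0 * (r - lo) / (hi - lo)).astype(np.uint8)


def enhance_rgb(image):
    """Process the three RGB channels separately and merge them."""
    return np.dstack([stretch(frankle_mccann(image[..., c])) for c in range(3)])
```

A call such as `enhance_rgb(image)` expects an array of shape (height, width, 3) and returns an array of the same shape with 8-bit values.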

Method of Transfer Learning.
In order to effectively recognize face images in an unrestricted environment, an unrestricted face recognition algorithm based on transfer learning is proposed. The region extraction network of the faster RCNN algorithm is modified to increase the recognition speed of the algorithm. To improve the detection accuracy, a transfer learning method is adopted, in which network parameters trained on large-scale datasets are used for initialization. Then, the network parameters are fine-tuned with the face dataset under unrestricted conditions. The specific steps are as follows.
Step 1. The first transfer is carried out between the large ImageNet dataset and the face image dataset. The improved faster RCNN network in this study is trained with the large ImageNet dataset and the face image dataset. The parameters obtained from the training are used for network initialization.
Step 2. The secondary transfer learning on the face image dataset under unconstrained conditions is carried out. The network parameters trained on the face dataset are fine-tuned under unrestricted conditions.
Step 3. The initialized network is used to train the RPN and generate ROIs.
Step 4. According to the ROIs obtained in Step 3, the source domain dataset is used to conduct classification and regression training for the initialized network.
Step 5. The network obtained in Step 4 is used to train the RPN; only the network layer parameters specific to the RPN are adjusted, and ROIs are generated.
Step 6. The generated ROIs are used to train the network for classification and regression, with the shared convolutional layer parameters kept fixed. At this point, the training of the faster RCNN detection model for the target domain data is complete.

Restricted Face Image Recognition Experiment.
In order to verify the effectiveness of the face recognition algorithm proposed in this study, the recognition experiment is conducted on the CASIA-WebFace face dataset [16], LFW dataset [20], and MegaFace dataset [21] under unrestricted conditions.

CASIA-WebFace Dataset.
CASIA-WebFace is one of the most important large-scale datasets in the field of face recognition. It contains more than 494,000 labeled face images of 10,575 people, so its training set amounts to only about 0.49 million images. In this study, face images belonging to the same people as in LFW and MegaFace were first removed from the dataset. A total of 122,875 face images of 2580 people were selected from the rest of the dataset, and these images were divided into a training set, a verification set, and a test set in the ratio 7:2:1. The face images in the training set are preprocessed as follows. The largest face region in each image is obtained, and the interference outside the face region is removed. Key points are located in the image, and an affine transformation is carried out according to the nose and eyes so that the eyes are level and the nose is centered. Then, the face image is resized to 112 × 112. Face images from the training set are shown in Figure 8.

Recognition Results of Restricted Face Images.
The network model trained on ImageNet is used to initialize the network parameters of the algorithm in this study, and the restricted face image set in the CASIA-WebFace dataset is then used to fine-tune the network. This completes the recognition of restricted face images based on a single transfer learning step. In this experiment, the recognition effects of different networks are compared on CASIA-WebFace. The results are given in Table 1.
The recognition results in Table 1 show that the network model in this study can accurately identify restricted face images and that the proposed algorithm has a faster recognition speed than the compared networks.

Unrestricted Face Image Recognition Experiment.
In order to verify the effectiveness of the proposed algorithm for unrestricted face image recognition, the algorithm is tested on the LFW and MegaFace datasets under unrestricted conditions. During model training on the CASIA-WebFace dataset, face images belonging to the same persons as those in the LFW and MegaFace datasets were removed. The low-level convolutional layers of the network model extract shallow features such as edges, colors, and textures, which differ little between datasets. Therefore, for the trained restricted face image recognition model, the parameters of the lowest three convolutional layers are fixed, and only the parameters of the deeper layers are fine-tuned on the unrestricted face image dataset.
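The layer-freezing rule described above can be sketched as follows. This is an illustration only: the layer names (`conv1` to `conv5`, `fc6`, `fc7`) are hypothetical, each layer is reduced to a single scalar weight, and a plain gradient step stands in for the actual optimizer, which the paper does not specify.

```python
def fine_tune_plan(layers, n_frozen_conv=3):
    """Map each layer name to True (fine-tuned) or False (frozen).

    layers: ordered list of (name, kind) pairs from input to output.
    The first n_frozen_conv convolutional layers are frozen.
    """
    plan = {}
    conv_seen = 0
    for name, kind in layers:
        if kind == "conv":
            conv_seen += 1
            plan[name] = conv_seen > n_frozen_conv
        else:
            plan[name] = True
    return plan


def gradient_step(params, grads, plan, lr=0.1):
    """Update only the layers marked trainable; frozen layers are copied."""
    return {name: (w - lr * grads[name]) if plan[name] else w
            for name, w in params.items()}


# Hypothetical layer layout, used only for illustration.
LAYERS = [("conv1", "conv"), ("conv2", "conv"), ("conv3", "conv"),
          ("conv4", "conv"), ("conv5", "conv"), ("fc6", "fc"), ("fc7", "fc")]

plan = fine_tune_plan(LAYERS)
params = {name: 1.0 for name, _ in LAYERS}
grads = {name: 1.0 for name, _ in LAYERS}
updated = gradient_step(params, grads, plan)
print(plan)
print(updated)
```

After one step, `conv1` to `conv3` keep their pretrained values while the deeper layers move, which is the behavior the secondary transfer learning relies on.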

LFW Dataset Experiment.
The LFW dataset contains more than 13,000 facial pictures of 5749 people in the natural environment. In the natural environment, human faces are often affected by illumination, expression, and occlusion, which bring great challenges to recognition. In the experiment, the View2 test set containing 6000 pairs of faces was used. This test set is divided into 10 folds, and each fold contains 300 matched pairs and 300 mismatched pairs of faces. A random sample of the LFW dataset is shown in Figure 9.
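A 10-fold evaluation over the 6000 View2 pairs can be sketched as follows. This is a hedged illustration, not the evaluation code used in the paper: it assumes each pair has already been reduced to a similarity score in [0, 1], selects the decision threshold on the nine training folds from a fixed grid, and uses synthetic scores in place of real network outputs.

```python
def accuracy(pairs, threshold):
    """Fraction of (score, is_same) pairs classified correctly."""
    correct = sum((score >= threshold) == is_same for score, is_same in pairs)
    return correct / len(pairs)


def best_threshold(pairs, grid):
    """Return the first grid threshold with the highest accuracy."""
    best_t, best_acc = grid[0], -1.0
    for t in grid:
        acc = accuracy(pairs, t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def ten_fold_accuracy(folds):
    """Mean test accuracy; the threshold is chosen on the other folds."""
    grid = [i / 100 for i in range(101)]  # assumed score range [0, 1]
    accs = []
    for i, test_fold in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        accs.append(accuracy(test_fold, best_threshold(train, grid)))
    return sum(accs) / len(accs)


# Synthetic, perfectly separable scores (illustration only):
# 10 folds, each with 300 matched and 300 mismatched pairs.
folds = [[(0.9, True)] * 300 + [(0.1, False)] * 300 for _ in range(10)]
print(sum(len(f) for f in folds), ten_fold_accuracy(folds))
```

Choosing the threshold on the training folds only keeps the reported accuracy from being tuned on the pairs it is measured on.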
Comparison tests were performed on the unrestricted dataset, and the results are shown in Table 2. The comparison in Table 2 shows that the algorithm proposed in this study achieves a good recognition effect on the LFW dataset: its recognition rate is 3.20% higher than that of Ouamane, which uses the same small dataset. It also performs better than DeepFace and DDML, which are trained on large datasets, and its accuracy is similar to that of FaceNet, which uses large training data. This shows that the algorithm framework in this study has a good recognition rate. In addition, the recognition speed of the algorithm in this study is better than that of the other algorithms, which gives it good value for practical engineering applications.

MegaFace Dataset Experiment.
MegaFace is a public face test dataset to which millions of interference items (distractors) have been added. The MegaFace dataset covers multiple application scenarios such as face verification, face training, and face confirmation.
MegaFace specifies that a training set of fewer than 0.5 M (million) images is a small dataset, while a training set of more than 0.5 M images is a large dataset. The network proposed in this study is trained and evaluated under the small-dataset setting. A random sample of the MegaFace dataset is shown in Figure 10.
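The effect of interference items on recognition can be illustrated with a rank-1 identification check: a probe counts as recognized only if its true mate is more similar to it than every distractor in the gallery. The sketch below is an assumption-laden illustration rather than the paper's evaluation code; the use of cosine similarity on feature vectors and all function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank1_hit(probe, mate, distractors):
    """True if the mate is strictly more similar to the probe
    than every distractor."""
    mate_score = cosine(probe, mate)
    return all(cosine(probe, d) < mate_score for d in distractors)


def rank1_rate(trials):
    """Fraction of (probe, mate, distractors) trials with a rank-1 hit."""
    hits = sum(rank1_hit(p, m, ds) for p, m, ds in trials)
    return hits / len(trials)


# Toy 2-D features (illustration only).
trials = [
    ((1.0, 0.0), (0.9, 0.1), [(0.0, 1.0), (0.5, 0.5)]),  # mate wins
    ((1.0, 0.0), (0.5, 0.5), [(1.0, 0.1)]),              # distractor wins
]
print(rank1_rate(trials))
```

As more distractors are added to the gallery, the chance that one of them outscores the true mate grows, which is why a large distractor set makes the test demanding.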
As can be seen from the experimental comparison in Table 3, the recognition rate of the algorithm proposed in this study is 2.10% higher than that of the Ouamane algorithm under the same test conditions. The algorithm in this study also exceeds the recognition rate of FaceNet trained on large-scale data, which shows that the proposed algorithm has good robustness under unconstrained conditions. In addition, the proposed algorithm has a faster recognition speed.
Figure 9: A random sample of the LFW dataset.

Conclusions
In this study, the first transfer learning is completed from the large-scale ImageNet dataset to a medium-scale restricted face image set, and effective recognition of the restricted face image set is realized. Then, the secondary transfer learning from the medium-scale restricted face image set to the small-scale unrestricted face image set is completed. Image enhancement methods, such as pose alignment, illumination brightness enhancement, and angle rotation, are applied to the unrestricted face image set to facilitate smooth transfer learning. Experimental results show that the proposed algorithm has high identification accuracy, meets the requirements of rapid detection, and has engineering application value for the field of terminal contactless distribution.

Data Availability
The source code of this algorithm cannot be provided directly because of the programmer's reason.

Conflicts of Interest
The author declares that there are no conflicts of interest.