Vision-based recognition and localization of electronic components on the PCB (printed circuit board) can improve the efficiency of quality inspection in the manufacturing of electronic products. As design and production processes improve, the electronic components on PCBs are becoming smaller and more similar in appearance, which makes visual object detection harder. This paper designs a real-time electronic component detection network by matching the effective receptive field size to the anchor size in YOLOv3. We make three contributions: (1) we compute and visualize the effective receptive field size of layers at different depths of a CNN (convolutional neural network) based on gradient backpropagation; (2) we propose a modular YOLOv3 composition strategy in which modules can be added and removed; and (3) we design a lightweight and efficient detection network using an effective receptive field size-anchor size matching algorithm. Compared with Faster-RCNN (regions with convolutional neural network features), SSD (single-shot multibox detector), and the original YOLOv3, our method achieves the highest detection mAP (mean average precision) on the PCB electronic component dataset, 95.03%, and the smallest parameter size, about 1/3 of that of the original YOLOv3, while ranking second in FLOPs (floating point operations).
As essential parts of electronic information products, electronic components must be of the correct class and placed at the correct location during the manufacturing of electronic products [
Electronic components come in many kinds and shapes. A CNN mimics the visual cognition of the brain and retains object features through dimensionality reduction, so it can still identify an object when the object reappears at a different scale, orientation, or position. It is therefore natural to combine electronic component detection with CNNs. Kuo et al. proposed a novel Graph Network block that refines component features conditioned on each PCB, reaching an mAP of 65.3% for electronic component detection on the test PCBs [
The receptive field (RF) is an important concept, drawn from biological vision research, that helps reveal why CNNs can complete various visual tasks. The RF of a pixel in a feature layer at a given depth of a CNN is the area of the original image that this pixel can see [
The above analysis shows that studying the visual characteristics of the CNN and adjusting the network depth according to the sample data size are two problems that remain open in current object detection. To address them, we select the anchor-based, one-stage YOLOv3 as the research framework, take the electronic components on the PCB as the detection objects, focus on the effective receptive field, and design an electronic component detection method based on matching the anchor size to the effective receptive field size. The key contributions of this paper are summarized as follows:
(1) We compute and visualize the effective receptive field size of CNN layers at different depths based on gradient backpropagation. The method handles both the multichannel structure of the CNN and its nonlinear modules. This interpretability analysis shows that the effective receptive field changes dramatically across layers, which makes it easier to understand why shallow layers are sensitive to position information and deep layers to semantic information. To the best of our knowledge, this is the first work to reveal, through the receptive field, how YOLOv3 internally captures data to detect targets.
(2) We propose a modular YOLOv3 composition strategy. The entire YOLOv3 model is composed of five modules, which can be added, removed, or retained. In particular, we found that changing the number of modules in the backbone network, Darknet-53, changes the effective receptive field size corresponding to each pixel in the three anchor distribution layers of YOLOv3.
(3) We design an effective receptive field size-anchor size matching algorithm based on YOLOv3. The algorithm analyzes the factors that affect the ERF size of the anchor distribution layers and formulates module addition and removal strategies so that the ERF size of each anchor distribution layer is as close as possible to the size of the largest anchor assigned to that layer.
In the next section, we review the related works on YOLOv3 and the effective receptive field. In Section
Since the proposed PCB electronic component detection network is built on YOLOv3 and relies on accurately quantifying the ERF size, this section reviews related research in these two areas.
The YOLOv3 algorithm is a typical one-stage object detection algorithm that combines the classification and target regression problems with an anchor box, thus achieving high efficiency, flexibility, and generalization performance [
Object detection with YOLOv3 consists of four core parts. The first is preprocessing of the training data, including resizing the input images, generating anchors by clustering, and allocating the anchors. The second is the feature extraction network, implemented by Darknet-53. The third is the feature fusion network, which uses the YOLO layers to build a feature pyramid. The fourth is the loss function and the output module. Improvements to YOLOv3 generally target these four parts. This paper resizes the training data before generating the anchors, and removes or adds internal modules of Darknet-53 to improve the match between the anchor size and the effective receptive field size of each anchor distribution layer, thereby achieving efficient detection. We therefore mainly review previous work in this area.
Liu et al. proposed the ACF-PR-YOLO structure for cyclist detection in high-resolution images, which adds a region proposal extraction step based on aggregated channel features before the whole image is sent to YOLOv3 [
Since the RF concept was linked to CNNs, researchers have tried to use it to reveal why CNNs can perform certain visual tasks. In particular, several mathematical formulas describe the relationship between the convolution kernel size, padding size, stride, and dilation rate and the receptive field size of different feature layers [
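The relations referred to above can be written as a simple recurrence. As a generic sketch (our own illustration, not code from the cited works), for layers with kernel size $k_l$, stride $s_l$, and dilation $d_l$, the receptive field grows as $r_l = r_{l-1} + d_l(k_l-1)\,j_{l-1}$ with jump $j_l = j_{l-1}s_l$, starting from $r_0 = j_0 = 1$; padding shifts where the receptive field is centered but does not change its size. The function name `theoretical_rf` is hypothetical.

```python
def theoretical_rf(layers):
    """Theoretical receptive field (in input pixels) of one unit in the last layer.

    layers: list of (kernel_size, stride, dilation) tuples, ordered input -> output.
    Padding only moves the RF centre, so it is not needed for the size.
    """
    rf, jump = 1, 1  # RF size and spacing (in input pixels) between adjacent units
    for k, s, d in layers:
        k_eff = d * (k - 1) + 1      # span of a dilated kernel
        rf += (k_eff - 1) * jump     # grow by the kernel span measured in input pixels
        jump *= s                    # strides compound the spacing between units
    return rf


# Hypothetical stack: 3x3/1 conv, 3x3/2 conv, 3x3/1 conv
print(theoretical_rf([(3, 1, 1), (3, 2, 1), (3, 1, 1)]))  # 9
```

Deeper stacks and larger strides enlarge this theoretical size quickly, which is the quantity the ERF discussion below refines.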
The concept of the ERF (effective receptive field) arises from a question: since enlarging the receptive field can improve recognition accuracy, should the CNN be made as deep as possible to maximize accuracy? The answer is no. Researchers noted that a given feature is not equally affected by all input pixels within its receptive field: input pixels near the center of the receptive field have more “paths” to influence the feature and therefore carry more weight. The theoretical receptive field is the region of the input space observed by a neuron in a convolutional neural network, whereas the effective receptive field is the set of input neurons connected to a higher-level neuron that actually influence it, excluding the ineffective neurons in the receptive field. Luo Wenjie et al. provided a mathematical formulation and a procedure to measure effective receptive fields, and experimentally observed that the ERF has a Gaussian shape within the theoretical receptive field, peaking at its center [
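Why the ERF looks Gaussian can be illustrated in the simplified setting Luo et al. analyzed: stride-1 layers with uniform weights and no nonlinearity. The 1-D sketch below is our own illustration, not the paper's procedure. Backpropagating a unit gradient through each layer convolves it with the kernel, so the gradient map is a repeated self-convolution that approaches a Gaussian. The 95.45% mass threshold is borrowed, as an assumption, from the ERF size criterion this paper mentions in its discussion; the function names are hypothetical.

```python
import numpy as np


def uniform_stack_gradient(num_layers, kernel_size=3):
    """1-D gradient of the centre output unit w.r.t. the input for a stack of
    stride-1 convolutions with uniform weights and no nonlinearity."""
    kernel = np.ones(kernel_size) / kernel_size
    grad = np.array([1.0])  # unit gradient injected at the output unit
    for _ in range(num_layers):
        grad = np.convolve(grad, kernel)  # 'full' mode widens the support by k - 1
    return grad


def central_fraction(grad, mass=0.9545):
    """Fraction of the theoretical RF needed to hold `mass` of the total
    gradient, taking the largest-gradient positions first (assumed criterion)."""
    ordered = np.sort(grad)[::-1]
    needed = np.searchsorted(np.cumsum(ordered), mass * grad.sum()) + 1
    return needed / grad.size


for n in (5, 20, 80):
    g = uniform_stack_gradient(n)
    # theoretical RF = 2n + 1 here; the effective fraction shrinks as n grows
    print(n, g.size, round(central_fraction(g), 3))
```

The theoretical RF grows linearly with depth while the Gaussian's width grows only with its square root, which is why simply deepening the network does not enlarge the effective area proportionally.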
Although significant progress has been made in both fields, the review reveals several gaps that remain to be filled.
For improving Darknet-53 in YOLOv3, most studies replace the Darknet-53 backbone with an existing high-efficiency backbone or with a Darknet-M, where M is chosen arbitrarily. Few studies consider how the number of convolution modules in Darknet-53 affects visual recognition on the original image. This paper proposes the Darknet-X modular design method. While keeping at least three down-sampling stages of the original feature extraction network, we remove, reduce, or add modules to obtain a backbone that matches the size of the detection objects.
Regarding the effective receptive field, most applications describe only in general terms the relationship among the receptive field, the effective receptive field, and the target size, and rarely give a specific calculation method. Only two articles included the calculation and display of effective receptive fields. One used gradient backpropagation to obtain the size of an effective receptive field, but used only one channel and considered only the effect of convolution and linear operation modules. The other only showed the effect of dilated convolution parameters on the effective receptive field and did not quantify the effective receptive field size of each layer of a specific CNN. This paper proposes an ERF calculation method based on gradient backpropagation that quantifies the effective receptive field size of each feature layer in YOLOv3 before training and provides data to support the design of an accurate object detection network.
This research implements a fast, lightweight model design method for small-object detection by reducing and removing backbone network modules. The network uses YOLOv3 as the base network and the electronic components on the PCB as the detection objects, and uses the maximum width or height of each anchor group, computed after the input images are resized to 416 × 416, as the threshold. By analyzing how different module combinations of Darknet-53 affect the effective receptive field size of the anchor distribution layers in YOLOv3, we design a PCB detection algorithm that matches anchors to effective receptive fields. For convenience, the proposed method is called ERFAM-YOLOv3 in the rest of this paper.
The implementation process of the proposed ERFAM-YOLOv3 is shown in Figure
The implementation process of ERFAM-YOLOv3. ERFAM-YOLOv3 is derived from YOLOv3, but its data preprocessing method and feature extraction module differ from those of the original YOLOv3. The subsequent feature fusion module and output layer are adjusted to match the changes in the backbone.
The anchor box is a concept used by YOLOv3 for bounding box prediction. The size of an anchor predefines the most likely width and height of the targets to be detected. In YOLOv3 data preprocessing, K-means is usually used to cluster the target sizes in the training set into the nine most likely anchors, each with its own width and height.
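The clustering step can be sketched as follows. The paper does not state which distance its K-means uses; this sketch assumes the 1 − IoU distance popularized by YOLOv2/YOLOv3 anchor clustering, with boxes compared by width and height only (as if they shared a corner). Function names and the synthetic data are illustrative.

```python
import numpy as np


def iou_wh(boxes, anchors):
    """IoU between (w, h) pairs, assuming boxes and anchors share a corner."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_a = anchors[:, 0] * anchors[:, 1]
    return inter / (area_b[:, None] + area_a[None, :] - inter)


def kmeans_anchors(wh, k=9, iters=100, seed=0):
    """K-means on box sizes with the (assumed) 1 - IoU distance.

    Returns k anchors sorted by area, smallest first."""
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), k, replace=False)].astype(float)
    for _ in range(iters):
        assign = np.argmax(iou_wh(wh, anchors), axis=1)  # nearest = highest IoU
        new = np.array([wh[assign == j].mean(axis=0) if np.any(assign == j) else anchors[j]
                        for j in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]


rng = np.random.default_rng(1)
wh = rng.uniform(1, 40, size=(500, 2))  # synthetic (w, h) pairs, illustration only
anchors = kmeans_anchors(wh)
print(np.round(anchors, 1))
```

Sorting by area at the end makes it straightforward to split the nine anchors into the small, medium, and large groups used later.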
Image sizes in the dataset are often not uniform, and in YOLOv3 all images, for both training and testing, must first be resized to 416 × 416. Therefore, in ERFAM-YOLOv3, the anchors are generated after the training set images have been resized. The advantage is that all data already match the network input size, so the width or height of an anchor can be used directly as the threshold for the ERF size in the three anchor distribution layers. For the PCB training dataset, the 9 anchors generated by the conventional method, after normalization, are (24 × 14); (16 × 32); (37 × 21); (55 × 29); (28 × 57); (72 × 46); (48 × 106); (136 × 60); and (212 × 211). The anchors generated after the images are resized are (1 × 3); (3 × 1); (2 × 5); (5 × 2); (5 × 5); (4 × 9); (10 × 4); (14 × 12); and (31 × 31). We show them in Figure
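Generating anchors after resizing amounts to rescaling each ground-truth box by the same factors as its image before clustering. A minimal sketch, assuming a plain (non-letterboxed) resize to 416 × 416 as described above; the function name and example sizes are hypothetical.

```python
import numpy as np

INPUT_SIZE = 416  # YOLOv3 network input resolution


def resize_box_wh(box_wh, image_wh, input_size=INPUT_SIZE):
    """Map box widths/heights from original image pixels to the resized input.

    box_wh:   (N, 2) box widths/heights in original pixels.
    image_wh: (N, 2) width/height of the image each box comes from.
    Assumes a plain stretch to input_size x input_size (no letterboxing).
    """
    box_wh = np.asarray(box_wh, dtype=float)
    image_wh = np.asarray(image_wh, dtype=float)
    return box_wh * input_size / image_wh


# Hypothetical 120 x 60 px component on a 4000 x 3000 px board image
print(resize_box_wh([[120, 60]], [[4000, 3000]]))  # width 12.48, height 8.32
```

The rescaled sizes are then clustered, so the resulting anchors are already expressed in the pixel units of the 416 × 416 network input.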
Schematic diagram of K-means clustered anchors. (a) 9 anchors produced by the original K-means clustering method. (b) 9 anchors produced by K-means clustering after resizing. In this paper, the anchors in (b) are used as prior boxes for target recognition and localization. In YOLOv3, the blue, green, and red anchors are assigned to the small, middle, and large feature fusion layers of the output layer, respectively.
Luo et al. proposed the concept of the ERF. They concluded that although all pixels in the receptive field affect the final result, their weights differ: the weight is largest at the center and smallest at the edge. We therefore need to quantify the ERF size as a specific value, namely the size of the area of the original image that each pixel in a CNN feature layer effectively sees. In the paper [
In Figure
The last feature layer of Darknet-53 is taken as the starting point, random weights are assigned to the entire network, and gradient backpropagation is used to compute the ERF.
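It is essential to note that YOLOv3 includes convolution, batch normalization, Leaky ReLU, and multichannel processing. As an activation function, Leaky ReLU is defined as

$$f(x)=\begin{cases}x, & x>0,\\ \alpha x, & x\le 0,\end{cases}$$

where $\alpha$ is a small positive slope (0.1 in YOLOv3). The gradient backpropagation formula for Leaky ReLU is

$$\frac{\partial L}{\partial x}=\begin{cases}\dfrac{\partial L}{\partial f(x)}, & x>0,\\[4pt] \alpha\,\dfrac{\partial L}{\partial f(x)}, & x\le 0,\end{cases}$$

where $L$ denotes the loss function.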
Batch normalization is to normalize the
For the multichannel gradient backpropagation problem, we assume that the loss function
The gradient backpropagation formula for the multichannel is
Although YOLOv3 contains many convolution, BN, Leaky ReLU, and multichannel operations, the chain rule together with the above effective receptive field analysis method allows us to visualize the ERF and compute the ERF size of any feature layer in YOLOv3.
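The chain-rule procedure can be sketched with NumPy. This is our own minimal illustration, not the authors' code, and it rests on several assumptions: stride-1 "same" 3 × 3 convolutions without bias, BN in inference mode (so its backward pass is a per-channel scale $\gamma/\sqrt{\sigma^2+\epsilon}$), a Leaky ReLU slope of 0.1, and a multichannel rule that injects a unit gradient into every channel of the chosen output pixel and sums absolute gradients over input channels. All names are hypothetical.

```python
import numpy as np

ALPHA = 0.1  # Leaky ReLU negative slope (Darknet/YOLOv3 value)
EPS = 1e-5   # BN epsilon (assumed)


def conv2d(x, w):
    """'Same' stride-1 cross-correlation. x: (C_in, H, W), w: (C_out, C_in, k, k)."""
    k = w.shape[2]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1:]
    out = np.zeros((w.shape[0], h, wd))
    for i in range(k):
        for j in range(k):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out


def conv2d_backward_input(g, w):
    """Gradient of conv2d w.r.t. its input, given upstream gradient g: (C_out, H, W)."""
    k = w.shape[2]
    p = k // 2
    h, wd = g.shape[1:]
    gx = np.zeros((w.shape[1], h + 2 * p, wd + 2 * p))
    for i in range(k):
        for j in range(k):
            gx[:, i:i + h, j:j + wd] += np.einsum("oc,ohw->chw", w[:, :, i, j], g)
    return gx[:, p:p + h, p:p + wd]  # drop the zero-padded border


def erf_gradient(x, layers):
    """|d(centre pixel, summed over channels) / d input|, summed over input channels."""
    cache, act = [], x
    for layer in layers:  # forward pass: conv -> BN (inference) -> Leaky ReLU
        z = conv2d(act, layer["w"])
        scale = layer["gamma"] / np.sqrt(layer["var"] + EPS)
        b = scale[:, None, None] * (z - layer["mean"][:, None, None]) + layer["beta"][:, None, None]
        cache.append((scale, b))
        act = np.where(b > 0, b, ALPHA * b)
    g = np.zeros_like(act)
    g[:, act.shape[1] // 2, act.shape[2] // 2] = 1.0  # unit gradient at the centre pixel
    for layer, (scale, b) in zip(reversed(layers), reversed(cache)):
        g = g * np.where(b > 0, 1.0, ALPHA)   # Leaky ReLU backward
        g = g * scale[:, None, None]          # BN backward (inference mode)
        g = conv2d_backward_input(g, layer["w"])
    return np.abs(g).sum(axis=0)


rng = np.random.default_rng(0)
chans = [3, 8, 8, 8]  # hypothetical channel widths
layers = [dict(w=rng.normal(size=(co, ci, 3, 3)), gamma=rng.normal(size=co),
               beta=np.zeros(co), mean=np.zeros(co), var=np.ones(co))
          for ci, co in zip(chans[:-1], chans[1:])]
grad = erf_gradient(rng.normal(size=(3, 33, 33)), layers)
print(grad.shape)  # (33, 33); non-zero only inside the 7 x 7 theoretical RF
```

A real YOLOv3 also has stride-2 convolutions, residual additions, and upsampling; each contributes its own backward rule, but the chain-rule structure stays the same.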
The modular design strategy of ERFAM-YOLOv3 disassembles the entire YOLOv3 object detection framework according to its arithmetic modules. When the network is rebuilt for targets of a different size, the original core modules are retained, some down-sampling modules are removed or added, and the number of repeatable modules is reduced or increased, so that the final classification and localization adapt more accurately to the target size.
According to the operation sequence in the YOLOv3, we have defined five modules, namely, DBL1, DBL2, Res-
Based on this modular design strategy, we analyzed the modular structure of the original YOLOv3. ERFAM-YOLOv3 retains the core modules, determines which modules to remove, and sets the number of repeatable modules so that the anchor size of each of the three output layers matches the ERF size of that layer. The design of ERFAM-YOLOv3 thus reduces to replacing the original Darknet-53 with a Darknet-X, that is, determining X1, X2, X3, X4, and X5 and deciding whether some modules should be removed. The ERFAM-YOLOv3 structure is shown in Figure
Module composition in the YOLOv3 structure.
The ERFAM-YOLOv3 structure is designed by a module design strategy.
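The effect of a Darknet-X configuration on depth and on the output resolutions can be tallied with a short helper. This sketch assumes the standard Darknet-53 layout: an optional leading DBL1 3 × 3 convolution, then one stride-2 DBL2 down-sampling convolution per stage followed by X_i residual blocks of two convolutions each, with the YOLO heads attached to the last three stages. The function name and the example configuration are hypothetical.

```python
def darknet_x_summary(res_counts, keep_stem=True, input_size=416):
    """Count conv layers and YOLO output sizes for a hypothetical Darknet-X.

    res_counts: repeat numbers (X1, ..., Xn) of the residual blocks, one per
    down-sampling stage (assumed layout: DBL2 stride-2 conv + X_i blocks of
    1x1 and 3x3 convs). keep_stem: whether the leading DBL1 conv is kept.
    Returns (number of convs, [Yolo_scale_1, Yolo_scale_2, Yolo_scale_3] sizes).
    """
    n_down = len(res_counts)
    if n_down < 3:
        raise ValueError("keep at least three down-sampling stages")
    convs = int(keep_stem) + n_down + 2 * sum(res_counts)
    # heads sit on the last three stages, deepest (coarsest) first
    scales = [input_size // 2 ** (n_down - i) for i in range(3)]
    return convs, scales


print(darknet_x_summary((1, 2, 8, 8, 4)))  # (52, [13, 26, 52]) for standard Darknet-53
print(darknet_x_summary((1, 2, 8)))        # (26, [52, 104, 208]) after dropping two stages
```

Under this counting the standard configuration has 52 convolutions; the name Darknet-53 is commonly explained by also counting the fully connected layer of its ImageNet classifier. Removing the last two down-sampling stages moves the heads to 52 × 52, 104 × 104, and 208 × 208, the output sizes listed for ERFAM-YOLOv3.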
In the YOLOv3 framework, objects are classified and localized by assigning nine predefined anchors to three output layers of different scales and continuously learning features from the training data. Although many factors affect the final detection performance, the method in this paper focuses on the size of the ERF corresponding to a pixel of the layer where each anchor is located. We define
Because the clustering algorithm determines the nine anchor sizes before training, the width and height of each anchor are fixed. The nine anchors are sorted from small to large and divided into three groups, called small anchors, medium anchors, and large anchors. Small anchors are assigned to Yolo-scale-3, medium anchors to Yolo-scale-2, and large anchors to Yolo-scale-1. We can see them in Figure
Based on the above definition, we designed the effective receptive field size-anchor size matching algorithm for object detection. Its flowchart is shown in Figure
Flow chart of anchor size-effective receptive field size matching algorithm.
Case one: in the three comparison formulas of
Case two: in the three comparison formulas of
The anchor-effective receptive field matching algorithm adds, reduces, or removes modules in Darknet-53 to minimize the difference between the effective receptive field size of each anchor distribution layer and its anchor size, achieving the best match and thereby improving object detection.
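The grouping and matching steps can be sketched as follows. The case-by-case decision rules of the flowchart are not fully recoverable here, so this sketch substitutes a simplified criterion, which is an assumption: among candidate backbones whose per-layer ERF sizes have been measured, pick the one with the smallest sum of absolute differences to the anchor thresholds. The anchors and ERF sizes are the values reported in this paper; the function names are hypothetical.

```python
import numpy as np

# Resized anchors reported for the PCB dataset, as (width, height)
ANCHORS = [(1, 3), (3, 1), (2, 5), (5, 2), (5, 5), (4, 9), (10, 4), (14, 12), (31, 31)]


def anchor_thresholds(anchors):
    """Sort anchors by area, split them into small/medium/large groups of three,
    and return the largest width or height per group, ordered as
    (Yolo_scale_1: large, Yolo_scale_2: medium, Yolo_scale_3: small)."""
    ordered = sorted(anchors, key=lambda wh: wh[0] * wh[1])
    groups = [ordered[0:3], ordered[3:6], ordered[6:9]]  # small, medium, large
    return tuple(max(max(w, h) for w, h in g) for g in reversed(groups))


def best_config(erf_sizes, thresholds):
    """Simplified matching criterion (assumed): choose the configuration whose
    per-layer ERF sizes are closest to the thresholds in summed absolute gap."""
    gaps = {name: np.abs(np.subtract(erf, thresholds)) for name, erf in erf_sizes.items()}
    return min(gaps, key=lambda n: gaps[n].sum()), gaps


thresholds = anchor_thresholds(ANCHORS)  # (31, 9, 5)
# ERF sizes of Yolo_scale_1/2/3 reported for the two backbones
name, gaps = best_config({"Darknet-53": (174, 95, 49), "Darknet-11": (47, 23, 13)}, thresholds)
print(name, gaps[name])  # Darknet-11, gaps 16, 14, 8
```

The thresholds (31, 9, 5) reproduce the per-group maxima in the anchor and ERF size table, and the Darknet-53 gaps (143, 86, 44) reproduce the YOLOv3 column of the matching-degree table.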
We evaluate the proposed method on a dataset of PCB electronic components containing 1000 images, 29 component categories, and 182,900 electronic components [
The experimental platform is as follows: operating system (OS): Windows 10; processor (CPU): Intel Xeon 6132 × 2, 2.60 GHz; graphics processor (GPU): NVIDIA Titan RTX (24 GB); storage: 512 GB SSD + 2 TB SATA; memory: 192 GB; Python 3.5.2; development framework: Python 3.7, TensorFlow 2.0, CUDA 10.1. The PCB electronic component dataset is split 8:2: 80% of the images are randomly selected for training, and the remaining 20% are used for testing.
Using the ERF analysis and the effective receptive field size-anchor size matching algorithm, we now design ERFAM-YOLOv3 for detecting electronic components on the PCB.
To obtain the ERF size of each feature layer of YOLOv3, we first load the YOLOv3 model. Each convolutional layer has two parameters, weight and bias; we set each layer’s weight to random values and its bias to 0. Each BN layer has four parameters, weight, bias, running_mean, and running_var; we set the weight to random values, the bias to 0, running_mean to 0, and running_var to 1.
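The initialization above and the conversion from a gradient map to a single ERF size can be sketched as follows. The use of normal random values is an assumption (the text says only "random"), and so is our reading of the ERF size criterion: the paper states that the ERF size is the square root of an activation area selected with a 95.45% threshold (the two-sigma mass of a normal distribution), which we interpret as the smallest set of pixels holding 95.45% of the total absolute gradient. Names are hypothetical.

```python
import numpy as np


def init_dbl_params(c_in, c_out, k=3, rng=None):
    """Initialise one conv + BN block for ERF analysis: random conv/BN weights
    (normal distribution assumed), zero biases, running_mean 0, running_var 1."""
    if rng is None:
        rng = np.random.default_rng()
    return {
        "conv_weight": rng.normal(size=(c_out, c_in, k, k)),
        "conv_bias": np.zeros(c_out),
        "bn_weight": rng.normal(size=c_out),
        "bn_bias": np.zeros(c_out),
        "running_mean": np.zeros(c_out),
        "running_var": np.ones(c_out),
    }


def erf_size(grad_map, mass=0.9545):
    """ERF side length (assumed criterion): sqrt of the smallest number of pixels
    whose |gradient| sums to `mass` of the total."""
    vals = np.sort(np.abs(grad_map).ravel())[::-1]
    n_pixels = np.searchsorted(np.cumsum(vals), mass * vals.sum()) + 1
    return float(np.sqrt(n_pixels))


# Sanity check on an isotropic Gaussian gradient map with sigma = 5 pixels
yy, xx = np.mgrid[-50:51, -50:51]
gmap = np.exp(-(xx ** 2 + yy ** 2) / (2 * 5.0 ** 2))
print(round(erf_size(gmap), 1))  # roughly 22 pixels
```

Taking a square root turns the selected area into a side length that can be compared directly with an anchor's width or height, which also explains why the measure favors roughly square targets.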
We are particularly concerned about whether the ERF size of the three yolo_scale output layers corresponding to the assigned anchors matches the anchors’ size. Therefore, Figure
YOLOv3 original anchors allocation corresponding to the activation map and effective receptive field. (a) yolo_scale_3 (52 × 52) activation map (b) yolo_scale_3 (52 × 52) ERF (c) yolo_scale_2 (26 × 26) activation map (d) yolo_scale_2 (26 × 26) ERF (e) yolo_scale_1 (13 × 13) activation map (f) yolo_scale_1 (13 × 13) ERF.
Anchors and ERF size of Yolo_scales in YOLOv3 and ERFAM-YOLOv3.
Anchors | Maximum width or height in the group | YOLOv3 (Darknet-53) output layer | YOLOv3 ERF size | ERFAM-YOLOv3 (Darknet-11) output layer | ERFAM-YOLOv3 ERF size |
---|---|---|---|---|---|
(10 × 4), (14 × 12), (31 × 31) | 31 | Yolo_scale_1 (13 × 13) | 174 | Yolo_scale_1 (52 × 52) | 47 |
(5 × 2), (5 × 5), (4 × 9) | 9 | Yolo_scale_2 (26 × 26) | 95 | Yolo_scale_2 (104 × 104) | 23 |
(1 × 3), (3 × 1), (2 × 5) | 5 | Yolo_scale_3 (52 × 52) | 49 | Yolo_scale_3 (208 × 208) | 13 |
There are many electronic components on the PCB, and many of them are soldered to the board using SMT (surface mount technology); they have similar shapes and small sizes. From the previous analysis of the electronic component dataset, we can understand from Table
According to the effective receptive field size-anchor size matching algorithm, we need to remove the fifth down-sampling DBL2 and RES-
ERFAM-YOLOv3 network structure.
After completing the design of the ERFAM-YOLOv3 object detection framework for the electronic component dataset, we again apply the ERF analysis and calculation method to a single pixel in each of the three anchor distribution output layers of ERFAM-YOLOv3, mapped back to the original image. The activation maps and ERFs are shown in Figure
ERFAM-YOLOv3 anchors allocation corresponding to the activation map and effective receptive field. (a) yolo_scale_3 (208 × 208) activation map (b) yolo_scale_3 (208 × 208) ERF (c) yolo_scale_2 (104 × 104) activation map (d) yolo_scale_2 (104 × 104) ERF (e) yolo_scale_1 (52 × 52) activation map (f) yolo_scale_1 (52 × 52) ERF.
Comparing Figure
The above comparison of YOLOv3 and ERFAM-YOLOv3 shows that, for a dataset whose clustered object sizes are small overall, after removing some modules and reducing the repeat number of Res-
To better illustrate the effectiveness of the proposed algorithm, we compare Faster-RCNN, SSD, YOLOv3, and ERFAM-YOLOv3 using detection result images, a table of detection accuracy, and a series of evaluation curves.
Faster-RCNN, SSD, YOLOv3, and ERFAM-YOLOv3 can all detect multiple targets in one image, and we tested each of them on 200 images. Each detected target is marked with a bounding box, and some detection results are shown in Figure
Comparisons of object detection results between the four algorithms. (a) Original image of Arty_Bottom. (b) Faster R-CNN object detection effect of Arty_Bottom. (c) SSD object detection effect of Arty_Bottom. (d) YOLOv3 object detection effect of Arty_Bottom. (e) ERFAM-YOLOv3 object detection effect of Arty_Bottom. (f) Original image of Zybo. (g) Faster R-CNN object detection effect of Zybo. (h) SSD object detection effect of Zybo. (i) YOLOv3 object detection effect of Zybo. (j) ERFAM-YOLOv3 object detection effect of Zybo.
In the five pictures about Arty_Bottom in the left column of Figure
The right column of Figure
Evaluation curves of the four algorithms. (a) precision-recall curve. (b) mAP-train steps curve. (c) Loss-train step curve. (d) Accuracy-threshold curve.
For detecting PCB electronic components containing 29 categories, we used the AP (average precision) of each category of components to characterize the four algorithms’ performance. In Table
AP for each electronic component category of four algorithms.
Category | Faster R-CNN AP (%) | SSD (512) AP (%) | YOLOv3 (Darknet-53) AP (%) | ERFAM-YOLOv3 (Darknet-11) AP (%) |
---|---|---|---|---|
Resistor | 28.07 | 15.89 | 23.89 | |
Capacitor | 14.75 | 16.08 | 39.87 | |
Text | 14.90 | 16.72 | 51.74 | |
Unknown | 22.84 | 43.06 | 72.62 | |
Emi | 17.63 | 36.36 | 93.84 | |
Ferrite | 23.81 | 37.24 | 60.89 | |
Pads | 21.09 | 16.92 | 48.79 | |
Led | 13.51 | 17.18 | 68.95 | |
Zener | 21.80 | 75.00 | 78.79 | |
Component | 16.99 | 17.02 | 47.05 | |
Transistor | 25.21 | 18.00 | 93.49 | |
Diode | 26.62 | 27.11 | 78.76 | |
Jumper | 23.26 | 27.27 | 80.55 | |
Inductor | 23.54 | 27.27 | 77.76 | |
Fuse | 25.73 | 27.27 | ||
Electrolytic | 25.19 | 27.05 | 63.01 | |
Transformer | 28.24 | 97.73 | ||
Potentiometer | 23.45 | 9.09 | 55.88 | |
Pins | 19.14 | 54.55 | 99.00 | |
Clock | 16.76 | 45.45 | 84.31 | |
Battery | 29.85 | 54.55 | ||
Button | 27.43 | 90.43 | 99.08 | |
Ic | 20.02 | 54.55 | 95.12 | |
Switch | 24.73 | 81.82 | ||
Test | 55.02 | 17.21 | 74.54 | |
Connector port | 31.00 | 54.52 | 95.73 | |
Buzzer | 34.69 | 100.00 | ||
Heatsink | 32.35 | 100.00 | ||
Display | 26.64 | 100.00 |
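The per-category AP values above can be computed from ranked detections. The paper does not state its AP interpolation scheme; this sketch assumes the PASCAL VOC all-point interpolation and that each detection has already been marked as a true positive when it matches a previously unmatched ground-truth box at IoU ≥ 0.5 (the threshold of the mAP@0.5 metric used in the ablation study). The mAP is then the mean of the per-category APs. The function name is hypothetical.

```python
import numpy as np


def average_precision(scores, is_tp, num_gt):
    """All-point interpolated AP for one category (VOC-style, assumed).

    scores: confidence of each detection; is_tp: 1 if the detection matched an
    unmatched ground truth at IoU >= 0.5, else 0; num_gt: number of ground truths.
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    tp = np.asarray(is_tp, dtype=float)[order]
    tp_cum = np.cumsum(tp)
    fp_cum = np.cumsum(1.0 - tp)
    recall = tp_cum / num_gt
    precision = tp_cum / (tp_cum + fp_cum)
    # add sentinels, then make precision non-increasing from right to left
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]
    # area under the stepwise precision-recall curve
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))


# Four ranked detections for one category with four ground-truth boxes
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1], num_gt=4))  # 0.625
```

Other protocols (11-point VOC2007, COCO-style 101-point) give slightly different numbers, so APs are only comparable when computed the same way.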
The comparison of objective analyses shows that the detection effect of the ERFAM-YOLOv3 in Figure
For a detailed ablation analysis, we conducted experiments against the YOLOv3 baseline. We use the mAP (mean average precision) to measure the model’s effectiveness; the higher the mAP, the better the detection. For electronic components on the PCB, the mAP of the original YOLOv3 is 79.48%. Table
The effectiveness of different components on mAP.
Remove the fifth down-sampling DBL2 and Res-X5 | Remove the fourth DBL2 and Res-X4 | Remove the leftmost DBL1 | mAP@0.5 | |
---|---|---|---|---|
√ | 86.65%+7.17 | |||
√ | √ | 89.88%+3.23 | ||
√ | √ | √ | 87.18%−2.70 | |
√ | √ | √ | 95.03%+5.15 |
To further illustrate that the different components in Table
The matching degree of removing different modules.
Difference between ERF size and anchor size | YOLOv3 | 234(1-1111)-YOLOv3 | 123(1-128)-YOLOv3 | ERFAM-YOLOv3 |
---|---|---|---|---|
143 | 63 | 18 | ||
86 | 43 | 17 | ||
44 | 19 | 10 |
From Tables
To comprehensively compare the advantages and disadvantages of different object detection algorithms in the above ablation experiments, we have drawn four curves for comparative analysis in Figure
The precision-recall curve is a useful measure of prediction success when classes are highly imbalanced. In information retrieval, precision measures result relevancy, while recall measures how many truly relevant results are returned. With the accuracy of the
The mAP provides a single-figure measure of quality across recall levels. Among evaluation measures for object detection algorithms, the mAP has been shown to have excellent discrimination and stability. In our experiments, we train for 350 epochs, that is, 70,000 steps. We have the mAP value as the
Figure
Figure
For a deep learning model, detection performance can be described by detection accuracy, and model complexity by computational cost (FLOPs) and model size (number of parameters). FLOPs (floating-point operations,
We measured the mAP, FLOPs, and parameters of Faster-RCNN (ResNet-50), SSD (VGG16, 512 × 512), YOLOv3 (Darknet-53), and ERFAM-YOLOv3 (Darknet-11) used in this paper. They are shown in Table
Statistics of accuracy and complexity of four algorithms.
Model | mAP (%) | Params (M) | FLOPs (G) |
---|---|---|---|
Faster R-CNN (Resnet50) | 24.63 | 43.435 | 742.473 |
SSD (VGG16, 512) | 45.01 | 28.516 | 91.545 |
YOLOv3 (Darknet-53) | 79.48 | 61.727 | |
ERFAM-YOLOv3 (Darknet-11) | 95.03 | | 69.784 |
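The two complexity measures in this table can be estimated layer by layer. A minimal sketch for a single convolution, our own illustration: parameters are the kernel weights (plus an optional bias), and the operation count is given as multiply-accumulates (MACs). Conventions differ between tools, and some report 1 MAC as 2 FLOPs, so figures are only comparable when measured the same way. The function name is hypothetical.

```python
def conv_cost(c_in, c_out, k, h_out, w_out, bias=False):
    """Parameters and multiply-accumulate operations (MACs) of one conv layer.

    Convolutions followed by BN usually have no bias, hence bias=False by default.
    FLOPs are often reported as MACs or as 2 * MACs, depending on the tool.
    """
    params = k * k * c_in * c_out + (c_out if bias else 0)
    macs = k * k * c_in * c_out * h_out * w_out
    return params, macs


# First Darknet convolution: 3 -> 32 channels, 3 x 3 kernel, 416 x 416 output
params, macs = conv_cost(3, 32, 3, 416, 416)
print(params, macs)  # 864 weights, about 0.15 G MACs
```

Because the MAC count scales with the output resolution, layers kept at high resolution can cost more computation than deeper layers with many more parameters.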
Using the match between the ERF size and the anchor size as the entry point for designing the object detection network, ERFAM-YOLOv3 offers high detection accuracy, a lightweight model, and low computational complexity compared with traditional object detection algorithms. These benefits stem from considering why CNNs can perform various visual tasks: the area of the original image onto which a single pixel of the output layer maps is fixed, and only some pixels in this area are effective. A predefined object anchor that is far smaller or larger than the ERF size leads to inefficient and inaccurate detection. The ERF analysis method used in ERFAM-YOLOv3 can calculate and visualize the ERF not only of the three anchor distribution layers but of any feature layer of a CNN, which in turn supports a hierarchical explanatory analysis. From a technological point of view, the modular network design provides packaged, independent modules that can be added, removed, or retained as core modules, so that recognition networks suited to targets of different sizes can be assembled like building blocks. To show that module recombination based on ERF size-anchor size matching improves detection, we made two comparisons with the electronic components on the PCB as the detection target. The first compares the differences between the ERF size and the anchor size in YOLOv3, ERFAM-YOLOv3, and the other variants formed by removing different modules. The
Although ERFAM-YOLOv3 offers several advantages, such as high accuracy, fast operation, and few parameters, several limitations and challenges remain for further improving its effectiveness. First, the ERF size is currently computed as the square root of the area of a specific range of the activation map, so it suits approximately square targets. Targets can have complex shapes; for example, a large number of long, narrow objects limits how well the ERF size matches the anchor size. Second, the ERF size is determined by a threshold of 95.45% in the activation map, based on the normal distribution, and how to select this threshold adaptively remains an open challenge. Fortunately, according to current experience, deformable kernel networks [
Inspired by the receptive field in biological neuroscience, this paper studied the stimulus (an assigned anchor) at a pixel in an anchor distribution layer of YOLOv3, propagated it back to the original image through the weights connecting successive CNN layers, and determined the size of the effective area (ERF) that causes the stimulation. For the first time, the match between the bottom-level anchor size and the top-level ERF size is linked to adding and removing CNN modules. We apply the ERF size-anchor size matching YOLOv3 to electronic component detection on PCBs in electronic product manufacturing.
Based on the results presented in this paper, several contributions are significant. First, a novel ERFAM-YOLOv3 architecture for detecting electronic components on the PCB is proposed; its backbone network, Darknet-11, has 42 fewer convolutional layers than the YOLOv3 backbone, Darknet-53. Second, a modular composition strategy for YOLOv3 is designed, which makes it possible to change the effective receptive field size. Third, an effective receptive field size-anchor size matching algorithm is developed, which makes it feasible to adapt object detection to different object size distributions.
The evaluation experiments demonstrate that the effective receptive field size-anchor size matching algorithm based on YOLOv3 achieves higher detection accuracy and lower model complexity than the other methods compared.
The data used to support the findings of this study are available from the corresponding author upon request.
The authors declare that there are no conflicts of interest regarding the publication of this paper.
This research was funded by the National Natural Science Foundation of China (No. 51875266) and in part by Henan Provincial Department of Education Key Discipline Project Funded Human-Machine and Environmental Engineering ([2018]119).