Hardware Decoding Accelerator of (73, 37, 13) QR Code for Power Line Carrier in UPIoT

The proposal of the ubiquitous power Internet of Things (UPIoT) has increased the demand for communication coverage and data collection in the smart grid, so communication networks face greater challenges in both quantity and quality. This brief applies (73, 37, 13) quadratic residue (QR) codes to power line carrier technology to improve the quality of local data communication in UPIoT. To improve the decoding performance of the QR codes, an induction method for the error pattern is proposed, which divides the originally coupled error patterns into six parts and reuses the same module for decoding. This method greatly reduces the resource requirements, so that the (73, 37, 13) QR code can be implemented on FPGA hardware. Notably, the hardware architecture is a modular framework that can fit into FPGAs of different sizes. As an example, the (73, 37, 13) QR code is implemented on an Intel Arria10 FPGA; the experimental results show that the maximum decoding frequency of this architecture is 21.7 MHz, a 4121x speedup compared to a CPU. Moreover, the proposed architecture is highly flexible thanks to its modular design and pipelined decoding framework, and it can be seen as an alternative scheme for decoding long QR codes.


Introduction
With the development of power systems, the types and quantities of electrical equipment are increasing rapidly; for example, Energy Storage (ES) is being widely adopted in the grid to meet the needs of intermittent power generation [1]. This brings a growing demand for information sharing [2] and analysis [3] among all nodes in the power grid system. The ubiquitous power Internet of Things (UPIoT) is proposed as an advanced technology for the construction of smart grids. It efficiently integrates the Internet of Things (IoT) with the power business and the power system and improves the information transparency of the power system. The "ubiquitous" in UPIoT is embodied in the various nodes of the power system (transmission, transformation, distribution, and consumption), which means that real-time interconnection of people, machinery, power networks, and platforms can be achieved at any time [4].
UPIoT is based on IoT, so it has a structure similar to the traditional IoT, including a perception layer, a network layer, a platform layer, and an application layer [5]. The perception layer is composed of sensors with different functions in each link and node of the power network. The network layer is mainly composed of the internet or a dedicated network and is responsible for transmitting a large amount of information. The platform layer is the data sharing and publishing center. The application layer provides users with data display and control signal transmission functions. It is recognized that UPIoT communication mainly includes three parts [6]: local communication, the edge IoT gateway, and remote communication. As an intermediate carrier between the perception layer and the platform layer, the edge IoT gateway gathers data and performs fast local edge computing. The remote network is usually implemented by 4G/5G and dedicated networks, while the local communication network can use wired or microwave wireless networks because of its short distance. The types and quantities of data measured by local sensors in UPIoT will increase greatly with the development of power networks. Therefore, the local transmission network should have the following characteristics:
(i) Two-way real-time communication capability
(ii) Low power consumption and low cost
(iii) Strong data compatibility
Power line carrier communication (PLC) technology, as a local communication technology, can carry both power and data and has been widely used for high-speed data transmission, even in relatively remote areas [7]. The advantages of PLC, such as transmitting data directly over the power line, requiring no changes to the wiring layout, easy implementation, and two-way communication, satisfy the basic characteristics required of the local transmission network.
However, using the power line as a carrier leads to relatively large attenuation of the carrier signal, and the performance degradation caused by multiple noise sources is still a challenge. To deal with these shortcomings, some studies have proposed Reed-Solomon (RS), Low Density Parity Check (LDPC) [8,9], Convolution Code (CC), and other advanced codes [10,11]. In addition, modulation techniques such as Quadrature Frequency Shift Keying (QFSK) and Orthogonal Frequency Division Multiplexing (OFDM) [12] can improve the reliability of the system.
LDPC codes were selected as the channel coding scheme for the data channels in 5G New Radio because of their excellent error correction performance for long codes. In local communication, most of the signals transmitted between the sensors and the IoT gateway are expected to be power status data and control signals, which can be regarded as short data; the advantages of LDPC codes cannot be realized for such dense and short communication data [8]. In recent years, the root-protograph (RP) LDPC code has appeared to solve the interruption limitation of the BF channel due to its nontraversal characteristics [9]. CC decoding has high computational complexity due to the Viterbi algorithm, and the use of an interleaver causes significant end-to-end delay. Reed-Solomon (RS) codes, which are used in many PLC systems, require a huge amount of calculation when the redundancy is large. Because of the insufficient error correction capability of the channel encoder, the receiver still cannot achieve satisfactory performance [10]. Therefore, we propose to use quadratic residue (QR) codes [13] to encode data and control signals in front of the PLC channel, while still using common modulation methods such as OFDM. This brings the following advantages:
(i) The required transmission power can be reduced in the same environment
(ii) The reliability of communication is improved, which in turn improves communication efficiency
The QR code gains its powerful error correction capability at the cost of increased decoding complexity. Although the QR code decoding algorithm has been improved several times, it is still difficult for it to stand out in practical applications because of its unsatisfactory decoding speed.
Based on two existing decoding algorithms, the difference syndrome (DS) algorithm [14] and the optimized cyclic weight (OCW) decoding algorithm [15], we propose a method suitable for digital circuit hardware implementation, named error pattern induction (EPI). By using EPI, we divide the error patterns of the (73, 37, 13) QR code into 6 parts so that they can share the same pipeline processor, and the (73, 37, 13) QR code can be realized on limited hardware resources. We also implemented the hardware decoding architecture of the (73, 37, 13) QR code on the Intel Arria10 10AX115U4F45I1SG FPGA platform, achieving a maximum clock frequency (fmax) of 260.42 MHz, which is equivalent to a decoding frequency of 21.7 MHz.

QR Code and Decoding Algorithm
The QR code was proposed by E. Prange in 1958 and has excellent error detection and correction capability due to its large minimum Hamming distance [3]. The QR code is defined on the finite field $GF(2^m)$, a finite field containing $2^m$ elements. The (73, 37, 13) QR code is constructed over $GF(2^9)$, and its quadratic residue set is given by

$$Q_{73} = \{\, i \mid i \equiv j^2 \bmod 73,\ 1 \le j \le 72 \,\}.$$

Let $m$ be the smallest positive integer such that $2^m \bmod n = 1$; for the (73, 37, 13) QR code, $n = 73$ and $m = 9$. Let $\alpha \in GF(2^9)$ be a root of the primitive polynomial $p(x) = x^9 + x^4 + 1$; then $\alpha$ generates all nonzero elements of the finite field $GF(2^9)$. Since $(2^9 - 1)/73 = 7$, $\beta = \alpha^7$ is a primitive 73rd root of unity in $GF(2^9)$, and the generator polynomial of the (73, 37, 13) QR code is given by

$$g(x) = \prod_{i \in Q_{73}} (x - \beta^i).$$

The message part of the (73, 37, 13) QR code is a vector of length 37 and can be expressed by the polynomial $m(x) = \sum_{i=0}^{36} m_i x^i$. The codeword polynomial is then obtained from $m(x)$ and $g(x)$. For a received word $r(x)$, let $s(x) = r(x) \bmod g(x) = \sum_{i=0}^{35} s_i x^i$; $s(x)$ is called the syndrome polynomial.
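The construction above can be checked numerically. The following is a minimal Python sketch, not the authors' implementation: it represents elements of $GF(2^9)$ as 9-bit integers, assumes the product form of $g(x)$ over the quadratic residues modulo 73, and uses illustrative helper names (`gf_mul`, `gf_pow`, `generator_poly`, `gf2_mod`).

```python
# GF(2^9) built from p(x) = x^9 + x^4 + 1; field elements are 9-bit integers.
PRIM_POLY = 0b1000010001

def gf_mul(a, b):
    """Multiply two GF(2^9) elements (shift-and-add, reduced by p(x))."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x200:              # degree reached 9, so reduce by p(x)
            a ^= PRIM_POLY
    return result

def gf_pow(a, e):
    """Raise a GF(2^9) element to a non-negative integer power."""
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result

N = 73
ALPHA = 0b10                       # the element x, a root of p(x)
BETA = gf_pow(ALPHA, 7)            # beta = alpha^7, a 73rd root of unity
QR_SET = sorted({(j * j) % N for j in range(1, N)})

def generator_poly():
    """Coefficients of prod_{i in Q} (x - beta^i), lowest degree first."""
    g = [1]
    for i in QR_SET:
        root = gf_pow(BETA, i)
        nxt = [0] * (len(g) + 1)
        for k, c in enumerate(g):
            nxt[k + 1] ^= c                 # contribution of c * x
            nxt[k] ^= gf_mul(c, root)       # contribution of c * root (minus is plus here)
        g = nxt
    return g

def gf2_mod(a, m):
    """Remainder of binary polynomial a modulo m (bit k is the x^k coefficient)."""
    dm = m.bit_length()
    while a.bit_length() >= dm:
        a ^= m << (a.bit_length() - dm)
    return a

G = generator_poly()
# Packing into an integer is valid because every coefficient turns out to be 0 or 1.
G_INT = sum(c << k for k, c in enumerate(G))
```

Because the quadratic residue set is closed under multiplication by 2 modulo 73, the coefficients of the product collapse to 0 or 1, and the resulting degree-36 polynomial divides $x^{73} - 1$.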
We describe the decoding procedure by integrating the DS algorithm and the OCW algorithm. There are other decoding algorithms, such as Berlekamp-Massey (BM) [16], the Fast Algebraic Decoding Algorithm (ADA) [17], and the Syndrome and Syndrome Difference Decoding Algorithm (SSDDA) [18,19]; all of them are implemented in software.
For simplicity, split the codeword $r$ into a message part $r_m$ and a parity part $r_p$, and split the decoded codeword $d_r$ into a message part $dr_m$ and a parity part $dr_p$. Let $e_i$ be an error in the $i$th bit of $r_m$, and let $s_i$ denote the syndrome corresponding to $e_i$. The syndromes $s_i$ ($0 \le i \le 36$) are shown in Table 1.
For a received codeword $r$, the syndrome $s_r$ of $r$ is calculated first, then the weight $W(s_r)$ of $s_r$ is calculated, and the codeword can be decoded by the following steps.
Step 1. If $W(s_r) \le 6$, all errors of the received codeword $r$ are in $r_p$. Then the decoded message part is $dr_m = r_m$, and the decoding is completed. If $W(s_r) > 6$, there is at least one error in the message part $r_m$.
Step 2. Calculate $s_{ri} = s_r \wedge s_i$ ($0 \le i \le 36$), where '$\wedge$' denotes the bitwise XOR operation, and calculate the weight $W(s_{ri})$ of $s_{ri}$. If there exists an $i$ in the range $0 \le i \le 36$ such that $W(s_{ri}) \le 5$, there is only one error in $r_m$, and the decoded message part is $dr_m = r_m \wedge e_i$. The decoding is then completed, and the computational complexity of this step is $C_{37}^{1}$. If no such $i$ exists, there are at least two errors in the message part $r_m$.
Step 3. Calculate $s_{rij} = s_r \wedge s_i \wedge s_j$ ($0 \le i < j \le 36$), and calculate the weight $W(s_{rij})$ of $s_{rij}$. If there exist $i, j$ in the range $0 \le i < j \le 36$ such that $W(s_{rij}) \le 4$, there are two errors in $r_m$, and the decoded message part is $dr_m = r_m \wedge e_i \wedge e_j$. The decoding is then completed, and the computational complexity of this step is $C_{37}^{2}$. If no such $i, j$ exist, there are at least three errors in $r_m$.
Step 4. Calculate $s_{rijk} = s_r \wedge s_i \wedge s_j \wedge s_k$ ($0 \le i < j < k \le 36$), and calculate the weight $W(s_{rijk})$ of $s_{rijk}$. If there exist $i, j, k$ in the range $0 \le i < j < k \le 36$ such that $W(s_{rijk}) \le 3$, there are three errors in $r_m$, and the decoded message part is $dr_m = r_m \wedge e_i \wedge e_j \wedge e_k$. The decoding is then completed, and the computational complexity of this step is $C_{37}^{3}$. If no such $i, j, k$ exist, there are at least four errors in $r_m$.
Step 5. Cyclically shift $r$ to the left by 36 bits, and denote the result by $r'$. Then calculate the syndrome $s_{r'}$ of $r'$ and its weight $W(s_{r'})$. Repeat Steps 1 to 4, after which 4 to 6 errors in $r_m$ can be corrected, and the decoding is completed. We can infer that the total computational complexity of the DS algorithm is $(C_{37}^{1} + C_{37}^{2} + C_{37}^{3}) \times 2$.
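Steps 1 to 4 can be sketched in software as follows. This is an illustration under stated assumptions, not the paper's code: the codeword is taken to be systematic with the parity part in bits 0 to 35 and the message part in bits 36 to 72, the syndromes $s_i$ are computed as $x^{36+i} \bmod g(x)$ instead of being read from Table 1, $g(x)$ is built as the product over the quadratic residues modulo 73, and Step 5 (the cyclic shift) is omitted. All function names are hypothetical.

```python
from itertools import combinations
import random

PRIM_POLY = 0b1000010001  # p(x) = x^9 + x^4 + 1

def gf_mul(a, b):
    """Multiply two GF(2^9) elements stored as 9-bit integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x200:
            a ^= PRIM_POLY
    return r

def gf_pow(a, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r

def build_generator():
    """g(x) as an integer bit mask (assumed product form over the residues)."""
    beta = gf_pow(2, 7)
    g = [1]
    for i in sorted({j * j % 73 for j in range(1, 73)}):
        root = gf_pow(beta, i)
        nxt = [0] * (len(g) + 1)
        for k, c in enumerate(g):
            nxt[k + 1] ^= c
            nxt[k] ^= gf_mul(c, root)
        g = nxt
    return sum(c << k for k, c in enumerate(g))

def gf2_mod(a, m):
    """Remainder of binary polynomial a modulo m."""
    dm = m.bit_length()
    while a.bit_length() >= dm:
        a ^= m << (a.bit_length() - dm)
    return a

G = build_generator()
# Assumed stand-in for Table 1: syndrome of a single error in message bit i.
SYN = [gf2_mod(1 << (36 + i), G) for i in range(37)]

def encode(msg):
    """Systematic encoding (assumed layout): message high, parity low."""
    shifted = msg << 36
    return shifted | gf2_mod(shifted, G)

def weight(x):
    return bin(x).count("1")

def decode_steps_1_to_4(r):
    """Return the corrected 37-bit message, or None if Steps 1-4 fail."""
    r_m = r >> 36
    s_r = gf2_mod(r, G)
    if weight(s_r) <= 6:                         # Step 1: errors only in parity
        return r_m
    for n_err, limit in ((1, 5), (2, 4), (3, 3)):  # Steps 2, 3, 4
        for pos in combinations(range(37), n_err):
            s = s_r
            for i in pos:
                s ^= SYN[i]
            if weight(s) <= limit:
                for i in pos:
                    r_m ^= 1 << i                # flip the located message bits
                return r_m
    return None

# Usage: two message-bit errors plus three parity-bit errors.
random.seed(7)
msg = random.getrandbits(37)
noisy = encode(msg) ^ (1 << 39) ^ (1 << 56) ^ 0b1000000100000001
recovered = decode_steps_1_to_4(noisy)
```

The exhaustive loops mirror the $C_{37}^{1}$, $C_{37}^{2}$, and $C_{37}^{3}$ search sizes given in the steps, which is why the software decoding time depends strongly on the number of message errors.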

Journal of Sensors
Simulation of the QR code is indispensable. A 37-bit random message is generated and encoded with the (73, 37, 13) QR code; the encoded signal is then mixed with Gaussian white noise, and the decoding performance is calculated. The simulated bit error rate (BER) performance is shown in Figure 1, which also includes the curves of the two RS codes in [20] and of the EPI algorithm proposed in this paper. Figure 1 shows that the QR code performs better than the RS codes, which suggests that QR codes will also perform better in practical applications. The EPI algorithm is explained in Section 3.

Error Pattern Induction Method
To improve the decoding speed and keep the decoding time controllable, all decoding possibilities must be evaluated, so that decoding completes within a fixed time. However, the long codeword of the (73, 37, 13) QR code leads to high decoding complexity, and decoding with a fully parallel module would consume a great amount of resources. The proposed decoding framework summarizes the error patterns so that modules can be multiplexed on the basis of the DS algorithm; this reduces resource consumption and hardware burden and makes implementation on low-cost hardware possible.
3.1. Error Pattern Induction and Classification. Section 2 shows that the decoding complexity is highest when there are three errors in $r_m$, so we optimize this case first.
3.1.1. Three-Error Patterns. The number of three-error patterns is $C_{37}^{3}$, a total of 7770 kinds, which can be decomposed into several parts that we call inductive error patterns; the inductive error patterns are shown in Table 2. From Table 2, there are 210 kinds of inductive error patterns. Cyclically shifting each inductive pattern to the left by $m$ ($1 \le m \le 36$) bits yields, together with the unshifted pattern, all $210 \times 37$ three-error patterns, so the 7770 patterns are decomposed as $7770 = 210 \times 37$.
At this point, the first decomposition is completed. However, because the number of errors is unevenly distributed, a direct hardware implementation would result in different pipeline lengths, so the inductive error patterns need further processing. As shown in Table 3, we combine the inductive error patterns and divide them into 6 groups so that the patterns are evenly distributed; each combination pattern group has 35 inductive error patterns. We call the inductive patterns in the same group combination patterns and implement them in the same pipeline. The three-error patterns are thus decomposed as $C_{37}^{3} = 7770 = 6 \times 35 \times 37$.
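The counts in this decomposition can be reproduced with a short enumeration. The sketch below assumes that a cyclic shift acts on the 37 message positions modulo 37 and picks the lexicographically smallest shift as the representative of each class; the representatives and the consecutive grouping are illustrative and are not the entries of Tables 2 and 3.

```python
from itertools import combinations

N_MSG = 37

def canonical(pattern):
    """Smallest cyclic shift (mod 37) of a set of positions, as a sorted tuple."""
    return min(
        tuple(sorted((p + m) % N_MSG for p in pattern)) for m in range(N_MSG)
    )

# All three-error patterns in the message part.
all_patterns = list(combinations(range(N_MSG), 3))

# One representative ("inductive pattern") per cyclic-shift class.
inductive = sorted({canonical(p) for p in all_patterns})

# Split the representatives into 6 groups of 35 (arbitrary consecutive split).
groups = [inductive[k * 35:(k + 1) * 35] for k in range(6)]

# Expanding every representative by all 37 shifts recovers every pattern.
expanded = {
    tuple(sorted((p + m) % N_MSG for p in pat))
    for pat in inductive
    for m in range(N_MSG)
}
```

Since 37 is prime, no three-error pattern is mapped to itself by a nontrivial shift, so every class contains exactly 37 patterns and the split into $6 \times 35 \times 37$ is exact.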

Two-Error Patterns. The number of two-error patterns is $C_{37}^{2}$, a total of 666 kinds, which can be decomposed into 18 kinds of inductive patterns, expressed as $e_0 e_n$ ($1 \le n \le 18$). Cyclically shifting each inductive pattern to the left by $m$ ($1 \le m \le 36$) bits yields, together with the unshifted pattern, all $18 \times 37$ two-error patterns.
Since the three-error patterns are divided into 6 groups, we also combine the two-error patterns into 6 groups to keep the pipeline consistent, and each group has 3 two-error inductive patterns. The two-error combination patterns are shown in Table 4.
At this point, the two-error patterns are decomposed as $C_{37}^{2} = 666 = 6 \times 3 \times 37$.
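A quick check of the two-error decomposition, as a sketch that assumes shifts act modulo 37 on the message positions. The grouping into 6 sets of 3 is an arbitrary consecutive split for illustration and is not the content of Table 4.

```python
from math import comb

N_MSG = 37

# Inductive two-error patterns e_0 e_n for 1 <= n <= 18.
base_patterns = [(0, n) for n in range(1, 19)]

# Expand every inductive pattern by all 37 cyclic shifts (m = 0 is the original).
covered = {
    frozenset(((a + m) % N_MSG, (b + m) % N_MSG))
    for a, b in base_patterns
    for m in range(N_MSG)
}

# Six groups of three inductive patterns each.
groups = [base_patterns[k * 3:(k + 1) * 3] for k in range(6)]

total_pairs = comb(N_MSG, 2)
```

Each pair of positions has two cyclic distances, $d$ and $37 - d$, and exactly one of them lies in the range 1 to 18, which is why 18 base patterns suffice.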

Single-Error Pattern. The number of single-error patterns is $C_{37}^{1}$, a total of 37 kinds. The single-error pattern is the minimum error correction unit and can be grouped without decomposition. Since 37 is not divisible by 6, the single-error patterns are divided into 6 groups of 7 slots each; the 37 patterns occupy 37 of the 42 slots, and no operation is performed in the 5 redundant slots, which keeps the pipeline consistent. The single-error division groups are shown in Table 5.

Error Pattern Traversal and Decoding Framework. Unlike in the DS algorithm, optimizing the memory footprint is not a high priority in an FPGA application. Therefore, we propose a decoding framework that separates the decoding operation into an error pattern traversal part and a codeword decoding part. We use memory cells to reduce register consumption, which is feasible in FPGA designs.
The specific operation of this decoding framework is described in Section 4. After the preprocessing operation, the codeword $r$/$r'$ and the flag bit are stored in a First Input First Output (FIFO) memory. The initial syndrome $s_r$ is then calculated from the codeword $r$/$r'$, and all error patterns are traversed using $s_r$. During traversal, if the corresponding condition is met (see Section 4 for details), the decoding flag bit 'd_flag' is set to 1, and the current error pattern, denoted 'ep1', 'ep2', and 'ep3', is recorded in the registers 'e_reg_out1', 'e_reg_out2', and 'e_reg_out3'. When the error pattern traversal is completed, decoding is performed according to the decoding flag bit, the recorded error pattern positions, and the original codeword stored in the FIFO.
This framework reduces the register consumption of each pipeline stage from storing the entire 73-bit codeword to storing the 18 bits of the three error pattern positions (6 bits per position). The registers saved here can hold other intermediate values to increase the speed of the entire framework.
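To make the 18-bit figure concrete, the sketch below packs three 6-bit error positions into one word. The field order (first position in the highest bits) and the function names are assumptions chosen for illustration; the paper does not specify the register layout.

```python
POSITION_BITS = 6                       # 6 bits cover positions 0..36
POSITION_MASK = (1 << POSITION_BITS) - 1

def pack_positions(ep1, ep2, ep3):
    """Pack three error positions into one 18-bit word (assumed field order)."""
    for ep in (ep1, ep2, ep3):
        if not 0 <= ep <= POSITION_MASK:
            raise ValueError("position does not fit in 6 bits")
    return (ep1 << 12) | (ep2 << 6) | ep3

def unpack_positions(word):
    """Recover the three positions from an 18-bit word."""
    return (
        (word >> 12) & POSITION_MASK,
        (word >> 6) & POSITION_MASK,
        word & POSITION_MASK,
    )

# Usage: record the pattern (3, 20, 36) instead of a full 73-bit codeword.
word = pack_positions(3, 20, 36)
bits_saved_per_stage = 73 - 3 * POSITION_BITS
```

Six bits per position leave room for values above 36, which is consistent with the design's use of out-of-range values such as 40, 41, and 42 as indications for the postprocessing unit.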

Hardware Architecture Design
According to Section 3, the operations before and after the cyclic shift are the same, so both can be implemented in the same hardware framework by preprocessing and postprocessing the received codeword. This allows pipeline multiplexing, which improves resource utilization and reduces resource consumption. All the error patterns are summarized and divided into 6 parts so that the same module can be reused for decoding.

Table 3: Three-error combination. Columns: Group; Three-error combination patterns; Numbers.

Code Words Preprocessing Module. The preprocessing module applies the cyclic shift operation to the input codeword $r$ and provides both the original codeword $r$ and the cyclically shifted codeword $r'$. The specific structure is shown in Figure 2. The current data is selected according to the state of the preprocessing flag 'pre_flag' (0/1): when pre_flag = 0, the current data is the original codeword $r$; when pre_flag = 1, the current data is the cyclically shifted codeword $r'$. The data sequence $r$, $r'$ is then input into the initial syndrome generation unit one by one. Note that, to keep the figures clear, the clock signal is not shown in the figures below.
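The preprocessing step, together with the inverse shift applied later in postprocessing, can be sketched as below. This assumes a 73-bit word stored as an integer, a shift amount of 36 bits as in Step 5 of Section 2, and that "left" means toward higher bit indices; the bit ordering of the actual hardware is not given in the text, and the function names are hypothetical.

```python
N = 73
MASK = (1 << N) - 1
SHIFT = 36

def rotl(word, k):
    """Cyclically rotate a 73-bit word left by k bits."""
    k %= N
    return ((word << k) | (word >> (N - k))) & MASK

def rotr(word, k):
    """Cyclically rotate a 73-bit word right by k bits (inverse of rotl)."""
    return rotl(word, N - (k % N))

def preprocess(r):
    """Return the (pre_flag, data) sequence fed to the syndrome unit."""
    return [(0, r), (1, rotl(r, SHIFT))]

def restore(word, pre_flag):
    """Undo the preprocessing shift according to the delayed flag."""
    return rotr(word, SHIFT) if pre_flag else word

# Usage: both passes map back to the same original word.
r = (1 << 72) | 0b1011
sequence = preprocess(r)
restored = [restore(data, flag) for flag, data in sequence]
```

Feeding $r$ and $r'$ through the same pipeline back to back is what lets one set of traversal hardware cover both passes of the algorithm.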

Weight Calculation Unit. The weight calculation unit accumulates the sum of the syndrome bits, $\sum_{i=0}^{35} s_i$ (as shown in Figure 3). To reduce the path delay and increase the operating frequency, we divide S_in into 6 parts and complete the sum operation in 2 steps.
In addition, a module identification signal named "Condition" is introduced so that the weight calculation unit can adapt to different modules.
In the initial syndrome generation module, "flag_out" is enabled when $W(S_{in}) \le 6$.
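A behavioral sketch of the two-step weight calculation follows. It assumes the 36-bit syndrome is split into six 6-bit slices, and that the "Condition" signal selects the weight threshold of the calling module using the limits 6, 5, 4, and 3 from Steps 1 to 4 of Section 2; the dictionary keys are hypothetical names, not signals from the design.

```python
# Assumed mapping from module type to weight threshold (Steps 1-4 of Section 2).
THRESHOLDS = {"initial": 6, "single": 5, "double": 4, "triple": 3}

def weight_two_stage(s_in):
    """Hamming weight of a 36-bit syndrome, computed in two steps."""
    # Step 1: count the ones in each of the six 6-bit slices.
    partial = [bin((s_in >> (6 * k)) & 0x3F).count("1") for k in range(6)]
    # Step 2: add the six partial counts.
    return sum(partial)

def flag_out(s_in, condition):
    """Flag that is enabled when the weight meets the module's threshold."""
    return weight_two_stage(s_in) <= THRESHOLDS[condition]

# Usage: a syndrome with six ones passes the initial check but not the others.
example = 0b100001_000000_010000_000100_000000_100001
passes_initial = flag_out(example, "initial")
passes_single = flag_out(example, "single")
```

Splitting the sum this way shortens the longest adder chain, which is the stated motivation for the two-step structure.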

Syndrome Generation Unit. The syndrome generation unit has two kinds of units, one for $s_r$ and another for $s_{ri}$, $s_{rij}$, $s_{rijk}$.
The initial syndrome $s_r$ is calculated by formula (6) from $r_m$, $r_p$, and the syndromes $s_0$ to $s_{36}$. The other syndromes are then calculated from $s_r$, $s_i$, $s_j$, and $s_k$ as $s_{ri} = s_r \wedge s_i$, $s_{rij} = s_r \wedge s_i \wedge s_j$, and $s_{rijk} = s_r \wedge s_i \wedge s_j \wedge s_k$, where $i, j, k$ are the error positions in $r_m$. The specific structure is shown in Figure 4.
According to the algorithm, once the initial syndrome $s_r$ has been calculated, its weight $W(s_r)$ is calculated first. If $W(s_r) \le 6$, the decoding flag bit 'd_flag_init' is set to 1 and the error pattern register 'e_reg_out3' is assigned the value 40, which indicates to the postprocessing unit how to decode.

Error Traversal Units. The error traversal units are divided into three types according to the number of errors in $r_m$: single-error traversal units, two-error traversal units, and three-error traversal units. To reduce register consumption in this design, after each traversal unit calculates the weight of the new syndrome, only the current decoding flag bit 'd_flag_out' and the current error pattern 'ep1', 'ep2', 'ep3' are recorded, instead of decoding directly. After all patterns have been traversed, decoding is performed using the decoding flag bit and the recorded error patterns.

Single-Error Traversal Units.
The specific structure is shown in Figure 5. According to the input error pattern position 'ep1', calculate $s_{r,ep1} = s_r \wedge s_{ep1}$ and its weight $W(s_{r,ep1})$; the flag bit 'd_flag_out' is then set according to $W(s_{r,ep1})$. Regardless of whether the current error pattern meets the decoding condition, the current error pattern position 'ep1' is recorded in 'ep1_out', which simplifies the logic and accurately records the error pattern.

Three-Error Traversal Units.
The specific structure is shown in Figure 7. According to the input error pattern positions 'ep1', 'ep2', 'ep3', calculate $s_{r,ep1,ep2,ep3} = s_r \wedge s_{ep1} \wedge s_{ep2} \wedge s_{ep3}$ and its weight $W(s_{r,ep1,ep2,ep3})$. The flag bit 'd_flag_out' is then set according to $W(s_{r,ep1,ep2,ep3})$. At the same time, the current error pattern 'ep1', 'ep2', 'ep3' is recorded.
The three-error combination patterns are listed in Table 3. After combining, each combination pattern group contains 35 inductive error patterns. Module multiplexing can then be performed by controlling the value of the input parameter 'pctr'. In each clock, 35 kinds of inductive error patterns are traversed, so all three-error patterns are finished in 6 clocks (as shown in Table 6).
The input parameter 'pctr' ($1 \le pctr \le 6$) controls the switching of the combination patterns, and the offset parameter 'poff' ($0 \le poff \le 34$) determines the initialization pattern $e_{init1} e_{init2} e_{init3}$ used to construct the basic pipeline unit; the initialization parameters are calculated from 'pctr' and 'poff'. According to the input decoding flag bit 'd_flag_in', if d_flag_in = 1, the error pattern of the previous stage traversal unit has met the decoding condition, so 'd_flag_out' is set to 1 and 'e_reg_in1', 'e_reg_in2', 'e_reg_in3' are registered directly to 'e_reg_out1', 'e_reg_out2', 'e_reg_out3'; if d_flag_in = 0, the error pattern of the previous stage traversal unit does not meet the decoding condition, and the error pattern traversal result of this level is considered.
The specific structure of the basic pipeline unit is shown in Figure 8(a). The initial error positions 'e_init1', 'e_init2', 'e_init3' are determined according to the parameters 'pctr' and 'poff'. Then, $l$ ($0 \le l \le 36$) is added to each initial error position, modulo 37, to obtain the new error patterns for traversal. Combined with the input initial syndrome $s_r$, 37 error traversal units are obtained to form a basic pipeline unit. After these error patterns are traversed, the 37 traversal results are judged and selected. If an error pattern in this unit satisfies the decoding condition, 'd_flag_out' is set to 1, and the current error positions are output to the error pattern record registers 'e_reg_out1', 'e_reg_out2', 'e_reg_out3'. Then, the input initial syndrome $s_r$ is passed to the next level for traversal.
35 three-error traversal pipeline units are connected step by step to form a complete three-error traversal module. The specific structure is shown in Figure 8(b). The initial syndrome $s_r$, the parameter 'pctr', the decoding flag 'd_flag_out', and the error pattern record registers 'e_reg_out1', 'e_reg_out2', 'e_reg_out3' are passed in stages to ensure the integrity of the error pattern traversal and the stability of the data.
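The behavior of one basic pipeline unit can be modeled in a few lines. The sketch below assumes positions are shifted modulo 37 and that the first matching shift is the one recorded; the syndrome table is filled with random 36-bit values purely as a stand-in for Table 1, and the function names are hypothetical.

```python
import random

def weight(x):
    return bin(x).count("1")

def unit_hits(s_r, init, syn, limit):
    """All shifts of the initial pattern that meet the decoding condition."""
    hits = []
    for l in range(37):                          # models 37 parallel traversal units
        pos = tuple((e + l) % 37 for e in init)
        s = s_r
        for p in pos:
            s ^= syn[p]                          # XOR in the syndrome of each position
        if weight(s) <= limit:
            hits.append(pos)
    return hits

def pipeline_unit(s_r, d_flag_in, e_reg_in, init, syn, limit=3):
    """One stage: pass an earlier result through, or record a new match."""
    if d_flag_in:                                # an earlier stage already decoded
        return 1, e_reg_in
    hits = unit_hits(s_r, init, syn, limit)
    if hits:
        return 1, hits[0]                        # select one matching pattern
    return 0, e_reg_in

# Usage with a placeholder syndrome table (NOT the values of Table 1).
rng = random.Random(0)
syn_table = [rng.getrandbits(36) for _ in range(37)]
s_r = syn_table[5] ^ syn_table[6] ^ syn_table[8]   # errors at positions 5, 6, 8
flag, regs = pipeline_unit(s_r, 0, (0, 0, 0), (0, 1, 3), syn_table)
```

Chaining 35 such units, each with a different initial pattern, and cycling 'pctr' through its 6 values corresponds to the $6 \times 35 \times 37$ decomposition of the three-error patterns.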

Two-Error Traversal Module.
In the three-error traversal pipeline unit, all three-error patterns are distributed over 6 clocks, so we also distribute all two-error patterns over 6 clocks. In each clock, 3 inductive two-error patterns are traversed, so all two-error patterns are finished in 6 clocks (as shown in Table 6).
The input parameter 'pctr' ($1 \le pctr \le 6$) controls the switching of the combination patterns, and the offset parameter 'poff' ($0 \le poff \le 3$) determines the initialization pattern $e_{init1} e_{init2}$ used to construct the basic pipeline unit; the initialization pattern is calculated from 'pctr' and 'poff'. According to the input decoding flag bit 'd_flag_in', if d_flag_in = 1, the error pattern of the previous stage traversal unit has met the decoding condition, so 'd_flag_out' is set to 1 and 'e_reg_in1', 'e_reg_in2', 'e_reg_in3' are registered directly to 'e_reg_out1', 'e_reg_out2', 'e_reg_out3'; if d_flag_in = 0, the error pattern of the previous stage traversal unit does not meet the decoding condition, and the error pattern traversal result of this level is considered.
The specific structure is shown in Figure 9(a). The initial error positions 'e_init1', 'e_init2' are determined according to the parameters 'pctr' and 'poff'. Then, $l$ ($0 \le l \le 36$) is added to each initial error position, modulo 37, to obtain the new error patterns for traversal. Combined with the input initial syndrome $s_r$, 37 error traversal units are obtained to form a basic pipeline unit. After these error patterns are traversed, the 37 traversal results are judged and selected. If an error pattern in this module satisfies the decoding condition, 'd_flag_out' is set to 1, the current error positions are output to the error pattern record registers 'e_reg_out1', 'e_reg_out2', and the value of 'e_reg_out3' is set to 42, providing an indication for the postprocessing unit. Then, the input initial syndrome is passed to the next level for traversal.
Three two-error traversal pipeline units are connected step by step to form a complete two-error traversal module. The specific structure is shown in Figure 9(b). The initial syndrome, the parameters 'pctr' and 'poff', the decoding flag 'd_flag_out', and the error pattern record registers are passed in stages.
For the single-error traversal module, the single-error patterns are likewise distributed over 6 clocks (as shown in Table 6). The input parameter 'pctr' ($1 \le pctr \le 6$) controls the switching of the combination patterns and determines the initialization pattern $e_{init1}$ used to construct the basic pipeline unit. According to the input decoding flag bit 'd_flag_in', if d_flag_in = 1, the error pattern of the previous stage traversal unit has met the decoding condition, so 'd_flag_out' is set to 1 and 'e_reg_in1', 'e_reg_in2', 'e_reg_in3' are registered directly to 'e_reg_out1', 'e_reg_out2', 'e_reg_out3'; if d_flag_in = 0, the error pattern of the previous stage traversal unit does not meet the decoding condition, and the error pattern traversal result of this level is considered.
The specific structure is shown in Figure 10. The initial error position 'e_init1' is determined according to the parameter 'pctr'. Then, $l$ ($0 \le l \le 36$) is added to the initial error position, modulo 37, to obtain the new error patterns for traversal. Combined with the input initial syndrome $s_r$, 7 single-error traversal units are obtained to form a complete single-error traversal module.
After the error patterns are traversed, the 7 traversal results are judged and selected. If an error pattern in this module satisfies the decoding condition, 'd_flag_out' is set to 1 and the current error position is output to the error pattern record register 'err_reg_out1'; 'err_reg_out2' is set to 0 and 'err_reg_out3' is set to 41, which provides an indication for the postprocessing unit. Then, the input initial syndrome $s_r$ is passed on for final decoding.
4.6. Decoding and Postprocessing Unit. Read the data stored in the FIFO, denoted as 'r_delay' and 'pre_flag_delay', and 'r_delay' contains a message part 'm_delay' and a parity part 'p_delay'. According to the decoding status flag bit 'd_flag1', error pattern record registers 'e_reg_out11', 'e_reg_out12', 'e_reg_out13', preprocessing status flag bit 'pre_flag_delay' and the pipeline passed syndrome 's_delay', the decoding can be completed. Denote the decoded codeword message part as 'm_decode' and denote decoded codeword parity part 'p_decode'. The operations of decoding are shown in Table 7.
After the decoding is completed, postprocessing is required to restore the original codeword order. The decoded codeword is cyclically shifted back according to the preprocessing status register 'pre_flag_delay', which yields the final decoded codeword 'm_final'. At this point, the decoding is completed; the specific structure is shown in Figure 11.
4.7. Pipeline Architecture. The preprocessing module, the initial syndrome generation module, the three-error traversal module, the two-error traversal module, the single-error traversal module, and the decoding and postprocessing module are connected step by step to form a 42-stage pipeline. The specific architecture is shown in Figure 12.
4.8. Hardware Implementation Results. The experiment and verification are divided into two parts: software verification and hardware verification. Software verification tests the performance of the QR code, while hardware verification tests the acceleration achieved. The software verification is carried out on a Personal Computer (PC), as also mentioned in Section 2. The verification software runs in VS2019 on an Intel Core i5-6500 under Windows 10, and the decoding time is shown in Table 8. To show the difference between the decoding algorithms, we recorded the error correction time of each algorithm for different numbers of errors. The more errors an encoded message has, the longer the decoding takes, and this unstable decoding delay affects the real-time performance of an IoT device. In addition, the software experiment also computed the average decoding time of each algorithm, weighted by the probability distribution of the number of errors.
The hardware verification framework is shown in Figure 13. Random messages and noise are generated by a ROM lookup table. The sum of the encoded message and the noise is input into the decoding accelerator. The BER counter records the error rate by comparing the decoding result with the original message.
The decoding accelerator is deployed on the Intel Arria10 10AX115U4F45I1SG FPGA; the results are given in Table 9.
In this decoding framework, all error patterns are evenly distributed into the pipeline, so the decoding time is fixed. A complete decoding requires 125 clocks, of which the traversal modules cost 12 clocks: 6 clocks to traverse all patterns and 6 clocks repeated after shifting. Therefore, a valid decoding result is obtained after every 12 clocks of pipeline processing. At the operating frequency of 260.42 MHz, one output every 12 clocks is equivalent to a decoding speed of 21.7 MHz; the decoding time is 46.07 ns, and the decoding time required under each condition is shown in Table 8. Compared with the PC, the hardware decoding accelerator increases the decoding speed by a factor of 4121 and maintains a constant decoding time.
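The throughput figures follow directly from the clock frequency and the 12-clock output interval. The short calculation below reproduces them; the implied software time is only a back-of-the-envelope consequence of the stated 4121x speedup and is not a value taken from Table 8.

```python
F_MAX_HZ = 260.42e6          # maximum clock frequency reported for the FPGA
CLOCKS_PER_RESULT = 12       # 6 clocks for r plus 6 clocks for the shifted r'
SPEEDUP = 4121               # reported speedup over the PC implementation

# One decoded codeword every 12 clocks.
decode_rate_mhz = F_MAX_HZ / CLOCKS_PER_RESULT / 1e6

# Time between successive decoded codewords, in nanoseconds.
decode_time_ns = CLOCKS_PER_RESULT / F_MAX_HZ * 1e9

# Software decoding time implied by the speedup, in microseconds (derived estimate).
implied_software_time_us = SPEEDUP * decode_time_ns / 1e3

print(f"{decode_rate_mhz:.2f} MHz, {decode_time_ns:.2f} ns per codeword")
```

The rate rounds to the reported 21.7 MHz, and the per-codeword time agrees with the reported 46.07 ns to within rounding.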

Conclusion
This article introduces the feasibility and advantages of the QR code in power line carrier technology based on the concept of UPIoT and proposes a solution to its shortcomings (high decoding complexity and long decoding time). By improving the DS decoding algorithm, the error patterns are split according to the characteristics of the inductive combination of error patterns; simulation shows that the proposed Error Pattern Induction method loses no performance and performs better than RS codes. This article also analyses the hardware feasibility of separating error pattern traversal from the decoding operation and, on this basis, implements an FPGA-based hardware decoding architecture. On the Intel Arria10 10AX115U4F45I1SG FPGA platform, an equivalent decoding frequency of up to 21.7 MHz is achieved, which is 4121 times the decoding speed of software. To the best of our knowledge, the hardware decoding architecture proposed in this paper is the first hardware implementation of a (73, 37, 13) QR code decoder. While providing a new implementation scheme for PLC error correction coding in UPIoT, this architecture also shows that QR codes with longer codewords can be decoded quickly by our hardware framework.

Figure 12: Pipeline architecture.

Data Availability
Our data is not public because of confidentiality; readers can contact hjynet@hdu.edu.cn for available data.

Conflicts of Interest
The authors declare that they have no conflicts of interest.