Zynq-Based Reconfigurable System for Real-Time Edge Detection of Noisy Video Sequences

We implement a Zynq-based self-reconfigurable system that performs real-time edge detection of 1080p video sequences. Although object edge detection is a fundamental tool in computer vision, noise in the video frames significantly degrades edge detection results. Moreover, because filtering 1080p video is computationally complex, implementation on reconfigurable hardware fabric is necessary. The proposed embedded system exploits the dynamic reconfiguration capability of the Zynq SoC: different filter bitstreams are partially reconfigured at run time according to the noise density level detected in the incoming video frames. Pratt's Figure of Merit (PFOM), which measures edge detection accuracy, is analyzed for various noise density levels, and we demonstrate that adaptive run-time reconfiguration of the proposed filter bitstreams significantly increases edge detection accuracy while providing enough computing power for real-time processing of 1080p video frames. Configuration time, CPU usage, and hardware resource utilization are also compared.


Introduction
Heterogeneous embedded systems have proliferated in the Internet of Things era, and stream-based multimedia applications are widely used in many types of portable devices. These applications are data-intensive and need high computing capability to meet their throughput requirements and stringent real-time constraints. At the same time, low power consumption is as important as throughput for portable embedded systems, since they often operate in energy-constrained environments [1].
Recently, the Zynq SoC (System on Chip) platform, which integrates a dual-core ARM Cortex-A9 processor with FPGA fabric, has been used in embedded systems such as Advanced Driver Assistance Systems to implement computationally complex signal processing algorithms, combining the software flexibility of the ARM processor with the parallel processing capability of reconfigurable hardware [2]. Edge detection is a fundamental tool in computer vision applications such as sensor processing and tracking, object classification, and assessment. However, real-time object edge extraction from 1080p video is highly time-consuming and becomes a computational bottleneck for ARM processors. Therefore, migrating the algorithm onto the FPGA fabric has been the preferred way to meet performance requirements. Previous designs [3,4] suffer from drawbacks such as an inflexible hardware architecture and no capability to adapt to time-varying input video, and partial reconfiguration of the FPGA is a highly desirable way to solve these problems. In particular, as shown in [5], images corrupted by salt-and-pepper noise, caused by faulty memory cells, sensing errors in analog-to-digital conversion, or bit errors in transmission, significantly degrade the performance of edge detection filters.
In this paper, we implement an adaptive, partially reconfigurable system that maximizes the output quality of the implemented edge detection filter. Based on the noise density level detected in the incoming video, the hardware bitstream is adaptively self-reconfigured in real time so that unwanted noise is removed before edge extraction. For efficient utilization of hardware resources, the bitstream of a Median filter followed by Sobel edge detection is mapped to the same FPGA region as the Sobel-only bitstream. The rest of this paper is organized as follows. Section 2 introduces the Zynq SoC platform, and the adaptive reconfiguration approach is proposed in Section 3. Experimental results are discussed in Section 4, and Section 5 concludes the paper.

Zynq SoC Platform
The Zynq-7000 AP SoC is a hybrid platform that combines a high-performance ARM processor with FPGA fabric. Figure 1 shows the Zynq internal architecture, which consists of the Processing System (PS) and Programmable Logic (PL). The PS contains a dual-core ARM Cortex-A9 MPCore, caches, a DMA controller, and various built-in peripherals such as USB, UART, SPI, CAN, and I2C. The PL, attached to the PS through AMBA AXI ports, provides hardware resources such as Configurable Logic Blocks (CLBs), Digital Signal Processing (DSP) blocks, two 12-bit analog-to-digital converters, and serial transceivers [6].
The Advanced Microcontroller Bus Architecture (AMBA) is used to connect functional blocks in a System on Chip (SoC) [7]. For the PS-PL interface, Advanced eXtensible Interface (AXI) high-performance slave ports with a configurable 32-bit or 64-bit data width and AXI general-purpose master/slave ports with a 32-bit data width are available. These ports allow user-defined functional blocks in the programmable logic to exchange data with one another and with the rest of the system [8].
Partial Reconfiguration (PR) plays a critical role in enhancing FPGA adaptability: a specific region of the FPGA is reconfigured dynamically with a new bitstream while the remainder of the FPGA continues to run. These regions, called Partially Reconfigurable Regions (PRRs), implement the bitstreams of Partially Reconfigurable Modules (PRMs). Compared with traditional full configuration, a design approach based on partial reconfiguration offers many advantages, such as reduced hardware resource utilization and reconfiguration time overhead, increased scalability, reduced system downtime, and smaller bitstream storage requirements. Additionally, because several functions can share the same region over time, a smaller FPGA may suffice, which reduces cost and power consumption [9]. Therefore, partial reconfiguration techniques have been widely studied in different domains to enhance the quality, performance, and reliability of systems [10][11][12].
To access the FPGA configuration memory for partial reconfiguration, Xilinx FPGAs offer several interfaces, such as JTAG, SelectMAP, and ICAP (Internal Configuration Access Port). JTAG and SelectMAP require a reconfiguration controller outside the FPGA, whereas ICAP provides internal access from within the FPGA and thus supports self-reconfiguration. ICAP has therefore been widely used together with a soft-IP processor (Xilinx MicroBlaze) or a hard-IP processor (IBM PowerPC), and many studies have proposed new ICAP interfaces that increase reconfiguration speed [13,14] or reduce the required hardware resources [15]. The Zynq SoC provides an additional interface, called the Processor Configuration Access Port (PCAP), that enables the PS to configure the PL region [16].
In this paper, partial reconfiguration of the proposed filter bitstreams is performed through the 32-bit PCAP interface, which is clocked at 100 MHz and therefore supports a theoretical download throughput of up to 400 MB/s (4 bytes per cycle × 100 MHz). Figure 2 shows the PR interface using PCAP. The First Stage Boot Loader (FSBL), read from an external SD card, boots the PS and configures the PL with the full bitstream via the PCAP, and the user application later loads the partial bitstreams into DDR memory. From then on, software-controlled partial reconfiguration can dynamically reconfigure part of the PL with the bitstream of a pre-implemented IP core [17].
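The 400 MB/s figure follows directly from the interface width and clock: 32 bits, or 4 bytes, per cycle at 100 MHz. The short Python sketch below computes this theoretical peak and the corresponding lower bound on the time needed to push a bitstream of a given size through PCAP. The function names and the 4,000,000-byte example size are illustrative assumptions, not values from this paper.

```python
def pcap_peak_throughput(width_bits: int = 32, clock_hz: float = 100e6) -> float:
    """Theoretical peak PCAP throughput in bytes per second."""
    return (width_bits / 8) * clock_hz  # bytes per cycle times cycles per second


def min_transfer_time(size_bytes: int, width_bits: int = 32, clock_hz: float = 100e6) -> float:
    """Lower bound on transfer time in seconds (ignores DMA and interconnect overheads)."""
    return size_bytes / pcap_peak_throughput(width_bits, clock_hz)


if __name__ == "__main__":
    print(pcap_peak_throughput())        # 400000000.0 bytes/s, i.e. 400 MB/s
    print(min_transfer_time(4_000_000))  # hypothetical 4 MB bitstream -> 0.01 s
```

The measured PCAP throughput is considerably lower than this bound, as reported with Table 2, so the computed time should be read as a best case only.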

Proposed Approach
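For real-time object edge extraction from 1080p video frames, a Sobel filter is implemented in the PL region. The Sobel operators shown in Figure 3 are orthogonal gradient operators, and they are applied as a 2-dimensional convolution over the 3 × 3 neighborhood of every pixel in the incoming video frame. With f(x, y) denoting the pixel value at location (x, y) [18], the horizontal and vertical gradients are
$$G_x(x,y) = \left[f(x+1,y-1) + 2f(x+1,y) + f(x+1,y+1)\right] - \left[f(x-1,y-1) + 2f(x-1,y) + f(x-1,y+1)\right]$$
$$G_y(x,y) = \left[f(x-1,y+1) + 2f(x,y+1) + f(x+1,y+1)\right] - \left[f(x-1,y-1) + 2f(x,y-1) + f(x+1,y-1)\right]$$
The gradient vector magnitude and direction are given by
$$|\nabla f(x,y)| = \sqrt{G_x(x,y)^2 + G_y(x,y)^2}, \qquad \theta(x,y) = \tan^{-1}\!\left(\frac{G_y(x,y)}{G_x(x,y)}\right)$$
Salt-and-pepper noise is typically caused by defective sensors, faulty memory cells, and bit errors in transmission, and it significantly degrades the performance of edge detection filters. In this paper, we adopt the noise detection algorithm proposed in [19].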
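As a software reference for the Sobel gradient computation described above, the following Python sketch computes G_x, G_y, the gradient magnitude, and the direction for every interior pixel of a grayscale frame stored as a list of rows (indexed `img[y][x]`). It is an illustrative model only, not the HLS code used on the PL; the function name and the choice to leave border pixels at zero are assumptions.

```python
import math
from typing import List, Tuple

# 3x3 Sobel kernels, indexed as KERNEL[dy + 1][dx + 1].
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))

Image = List[List[int]]


def sobel(img: Image) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (magnitude, direction) maps; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    p = img[y + dy][x + dx]
                    gx += SOBEL_X[dy + 1][dx + 1] * p
                    gy += SOBEL_Y[dy + 1][dx + 1] * p
            mag[y][x] = math.hypot(gx, gy)  # sqrt(gx^2 + gy^2)
            ang[y][x] = math.atan2(gy, gx)  # direction in radians
    return mag, ang


if __name__ == "__main__":
    step = [[0, 0, 255, 255] for _ in range(4)]  # vertical step edge
    mag, _ = sobel(step)
    print(mag[1][1], mag[1][2])  # 1020.0 1020.0
```

The kernels are applied as a correlation (no kernel flip), which changes only the sign of G_x and G_y relative to a true convolution and leaves the magnitude unchanged. `math.atan2` is used instead of a plain arctangent of G_y/G_x so that pixels with G_x = 0 are handled without division by zero.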
Let f(i, j) be the pixel value to be processed, and consider the 3 × 3 window centered at (i, j). Let W_min and W_max be the minimum and maximum pixel values in this window. Thresholds T_min and T_max are then defined from these values, and Equation (4) is used as the criterion to determine whether f(i, j) is a corrupted noise pixel [19]. The noise density is then measured as the total number of detected noise pixels divided by the total number of pixels in the video frame.
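The noise density measurement can be illustrated with a small Python sketch. Because the exact threshold definitions of [19] are not reproduced here, the detector below is a deliberately simplified stand-in and not the algorithm of [19]: it flags a pixel as salt-and-pepper noise when the pixel equals the minimum or maximum of its 3 × 3 window and also lies at or beyond the hypothetical intensity bounds `low` and `high` (defaulting to the 8-bit extremes 0 and 255). Only the final step, dividing the number of flagged pixels by the total number of pixels, follows the text directly.

```python
from typing import List

Image = List[List[int]]


def window3x3(img: Image, i: int, j: int) -> List[int]:
    """3x3 neighborhood of (i, j), replicating edge pixels at the frame border."""
    h, w = len(img), len(img[0])
    return [img[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]
            for r in range(i - 1, i + 2)
            for c in range(j - 1, j + 2)]


def noise_density(img: Image, low: int = 0, high: int = 255) -> float:
    """Fraction of pixels flagged as noise by a simplified (hypothetical) detector."""
    h, w = len(img), len(img[0])
    noisy = 0
    for i in range(h):
        for j in range(w):
            win = window3x3(img, i, j)
            v = img[i][j]
            is_extreme_in_window = v == min(win) or v == max(win)
            is_extreme_value = v <= low or v >= high
            if is_extreme_in_window and is_extreme_value:
                noisy += 1
    return noisy / (h * w)


if __name__ == "__main__":
    frame = [[128] * 4 for _ in range(4)]
    frame[1][1], frame[2][3] = 255, 0  # one "salt" and one "pepper" pixel
    print(noise_density(frame))        # 0.125
```

A limitation of this stand-in is that it also flags pixels in genuinely black or white regions, so it tends to overestimate the noise density on such content.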
Because salt-and-pepper noise significantly degrades edge detection performance in the corrupted image, a Median filter is implemented for denoising:
$$f_M(i,j) = \operatorname{Median}\{\, f(s,t) : (s,t) \in W(i,j) \,\} \qquad (5)$$
Here, the median of the neighboring pixel values in the window W(i, j) centered at (i, j) is selected as the output [20].
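The median filter of Equation (5), for a 3 × 3 window, can be sketched in Python as below. This is a software model for illustration, not the HLS implementation; copying border pixels unchanged is an assumption, since the paper does not state how frame borders are handled.

```python
from typing import List

Image = List[List[int]]


def median3x3(img: Image) -> Image:
    """Apply a 3x3 median filter; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # copy so the input frame is not modified
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            win = [img[r][c] for r in range(i - 1, i + 2) for c in range(j - 1, j + 2)]
            out[i][j] = sorted(win)[4]  # middle of the 9 sorted values
    return out


if __name__ == "__main__":
    frame = [[100] * 5 for _ in range(5)]
    frame[2][2] = 255              # isolated "salt" pixel
    print(median3x3(frame)[2][2])  # 100
```

An isolated impulse cannot be the median of a window in which it is the only extreme value, which is why the median removes salt-and-pepper noise while leaving a clean step edge intact.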
When salt-and-pepper noise is added to the video frame, the proposed self-reconfiguration method replaces the Sobel PRM with a PRM that applies the Median operator as preprocessing before Sobel edge detection (hereafter referred to as the Median + Sobel filter), which is effective for noise reduction.
Pratt's Figure of Merit (FOM), which evaluates the accuracy of detected edges in a noisy image, is used to determine the corresponding noise density threshold. When edge detection accuracy deteriorates, partial reconfiguration is triggered so that noise is reduced before edge filtering. Pratt's FOM is defined by
$$\mathrm{FOM} = \frac{1}{I_N} \sum_{i=1}^{I_D} \frac{1}{1 + \alpha d_i^2}$$
Here, I_N = max(I_I, I_D), I_I is the number of edge points in the ideal edge, I_D is the number of edge points in the detected edge, α is a calibration constant, and d_i is the distance between the i-th detected edge point and the ideal edge [21]. The distance d is an important factor in evaluating edge detection with PFOM, because FOM decreases as d increases: for a displaced or smeared edge, the distance d between the ideal and detected edges grows and FOM is reduced.
Figure 4 shows the video pipeline architecture and the noise detection task. The 1080p video frames from HDMI-IN are stored in DDR memory. The implemented filter consists of three subfunctions: filtering, edge detection, and a bus interface that controls video input and output. Video DMA (VDMA) reads video frames from DDR memory and sends them to the filter engine. The AXI4-Stream interfaces are connected through the VDMA and an AXI interconnect block to the high-performance slave port of the PS [22]. The output frame from the filter engine is stored back into DDR memory, and the stored frame is then sent to the display controller for HDMI-OUT. Synthesis results show that the pipelined edge detection process has a latency of 9 clock cycles and a total processing time of 2,059 clock cycles.
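Pratt's FOM, as defined earlier in this section, can be computed from two binary edge maps with the Python sketch below. It takes d_i as the Euclidean distance from each detected edge pixel to the nearest ideal edge pixel and uses α = 1/9, a value commonly used with Pratt's FOM; the paper does not state its choice of α, so this default is an assumption. The brute-force nearest-neighbor search favors clarity over speed.

```python
from typing import List, Sequence, Tuple

EdgeMap = Sequence[Sequence[int]]  # nonzero entries are edge pixels


def edge_points(edges: EdgeMap) -> List[Tuple[int, int]]:
    return [(r, c) for r, row in enumerate(edges) for c, v in enumerate(row) if v]


def pratt_fom(ideal: EdgeMap, detected: EdgeMap, alpha: float = 1.0 / 9.0) -> float:
    """Pratt's Figure of Merit in [0, 1]; 1 means the detected edges match the ideal ones."""
    ideal_pts = edge_points(ideal)
    det_pts = edge_points(detected)
    if not ideal_pts or not det_pts:
        return 0.0  # degenerate case, defined here as 0
    i_n = max(len(ideal_pts), len(det_pts))  # I_N = max(I_I, I_D)
    total = 0.0
    for r, c in det_pts:
        # squared distance d_i^2 to the nearest ideal edge point
        d2 = min((r - ir) ** 2 + (c - ic) ** 2 for ir, ic in ideal_pts)
        total += 1.0 / (1.0 + alpha * d2)
    return total / i_n


if __name__ == "__main__":
    ideal = [[1 if c == 2 else 0 for c in range(5)] for _ in range(5)]
    shifted = [[1 if c == 3 else 0 for c in range(5)] for _ in range(5)]
    print(pratt_fom(ideal, ideal))               # 1.0
    print(round(pratt_fom(ideal, shifted), 6))   # 0.9
```

Spurious detections lower the score through I_N even when every true edge is found, which is how isolated noise responses from an unfiltered Sobel output are penalized.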
Noise density level detection is performed to trigger partial reconfiguration to the Median + Sobel filter. The user application running on the PS loads the partial bitstreams for the filter operations from the SD card into DDR memory. Keeping the bitstreams in DDR memory shortens reconfiguration time, because they do not have to be read from the SD card for each reconfiguration, and also takes advantage of caching. The application can then use the partial bitstreams to modify the partially reconfigurable region in the PL without interrupting the rest of the PL. Partial reconfiguration of the Sobel or Median + Sobel bitstream from DDR memory to the predefined PL region is performed through the PCAP interface. If the measured noise density exceeds the threshold, the Median + Sobel bitstream is configured into the partially reconfigurable region (PRR), replacing the Sobel bitstream.
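The threshold-based switching described above reduces to a small piece of control logic. The Python sketch below uses hypothetical bitstream file names and a caller-supplied `load_partial` callback standing in for the PCAP transfer, and it reconfigures the PRR only when the required filter differs from the one currently loaded. The threshold value in the example is an assumption, since the paper determines the threshold from PFOM measurements.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical partial bitstream names; the paper does not give file names.
SOBEL = "sobel_partial.bin"
MEDIAN_SOBEL = "median_sobel_partial.bin"


@dataclass
class ReconfigController:
    threshold: float                     # noise density threshold (chosen via PFOM)
    load_partial: Callable[[str], None]  # performs the PCAP transfer of one bitstream
    active: str = SOBEL                  # the full bitstream starts with the Sobel filter

    def update(self, noise_density: float) -> bool:
        """Select the filter for the measured density; return True if the PRR was reconfigured."""
        wanted = MEDIAN_SOBEL if noise_density > self.threshold else SOBEL
        if wanted == self.active:
            return False                 # avoids the frame drops of an unnecessary reconfiguration
        self.load_partial(wanted)
        self.active = wanted
        return True


if __name__ == "__main__":
    loaded = []
    ctrl = ReconfigController(threshold=0.02, load_partial=loaded.append)
    for density in (0.00, 0.05, 0.10, 0.01):
        ctrl.update(density)
    print(loaded)  # ['median_sobel_partial.bin', 'sobel_partial.bin']
```

Comparing against the active filter matters because each reconfiguration can drop several frames. A deployed controller might also add hysteresis around the threshold to avoid oscillating when the noise density hovers near it; the paper does not discuss this.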
In the Zynq SoC, an AXI-PCAP bridge consisting of "transmit" and "receive" FIFO buffers between the AXI and PCAP interfaces is implemented in the Device Configuration interface (DevC) of the PS. This bridge converts 32-bit AXI-formatted data to the 32-bit PCAP protocol, and a DMA engine transfers data between the FIFOs and DDR memory during partial reconfiguration. A DevC driver function, built on top of sysfs, moves data across the PCAP interface by initiating a DMA transaction and then waiting for an interrupt signaling that the transfer has completed; the function call returns when both the AXI and PCAP transfers are finished. The application does not need to know the physical location of the partially reconfigurable region, because the partial bitstream contains the configuration frame addressing information. The filter in the PL region is held in reset before a partial bitstream is transferred via DevC/PCAP. When the transfer is completed, the reset is released and the newly configured filter is restarted with the VDMA. Our measurements show that up to 5 frames of incoming video can be dropped during partial reconfiguration.
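On Zynq Linux kernels that use the legacy Xilinx `xdevcfg` driver, the DevC/PCAP path is commonly driven from user space by setting the driver's `is_partial_bitstream` sysfs attribute and writing the bitstream to `/dev/xdevcfg`. The Python sketch below follows the reset, transfer, and release sequence described above under that assumption. The sysfs path varies between kernel versions, newer kernels replace `xdevcfg` with the FPGA Manager framework, and `set_filter_reset` is a hypothetical callback for whatever mechanism (for example, an AXI GPIO) drives the filter reset; none of these details are specified in the paper.

```python
from pathlib import Path
from typing import Callable

# Assumed paths for the legacy xdevcfg driver; check the sysfs layout of your kernel.
DEVCFG_NODE = Path("/dev/xdevcfg")
PARTIAL_FLAG = Path("/sys/devices/soc0/amba/f8007000.devcfg/is_partial_bitstream")


def load_bitstream(path: Path) -> bytes:
    """Read a partial bitstream from the SD card into memory (DDR) ahead of time."""
    return path.read_bytes()


def reconfigure_prr(bitstream: bytes,
                    set_filter_reset: Callable[[bool], None],
                    devnode: Path = DEVCFG_NODE,
                    partial_flag: Path = PARTIAL_FLAG) -> None:
    """Reset the filter, push a partial bitstream through DevC/PCAP, then release reset."""
    set_filter_reset(True)        # hold the PL filter in reset during the transfer
    partial_flag.write_text("1")  # mark the next image as a partial bitstream
    with open(devnode, "wb") as dev:
        dev.write(bitstream)      # the driver DMAs the data through the AXI-PCAP bridge
    # Reached only if the write succeeded; on an exception the filter stays in reset.
    set_filter_reset(False)
```

Reading all partial bitstreams once with `load_bitstream` at start-up mirrors the paper's choice of keeping them in DDR memory, so a later switch costs only the PCAP transfer itself.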

Experimental Results
The experiment uses a ZC702 evaluation board with an XC7Z020 AP SoC, an FMC module providing HDMI input/output based on the ADV7611/ADV7511, and a 1920 × 1080 monitor. The ZC702 board is controlled through a UART terminal emulator running on a PC [23]. Figure 5 shows the experimental setup of the proposed reconfigurable edge detection system. The boot binary file that boots the Zynq SoC consists of the Zynq FSBL created in the SDK tool, the full bit file generated in Vivado, and U-Boot, the second-stage boot loader. The compressed kernel image (uImage) runs the Linux operating system on the target board [24]. The partial bit files are initially stored on the SD card and read into DDR memory so that the PS can perform PR through the PCAP interface. The target board uses uramdisk as the root file system. The Software Development Kit (SDK) tool is used to create the Linux application that runs on the processor and performs the proposed method, including partial reconfiguration.
In this paper, the proposed Sobel and Median + Sobel filter blocks are implemented with the High-Level Synthesis (HLS) tool [25,26]. The HLS tool transforms C, C++, and SystemC code into an RTL implementation and also allows functions to be pipelined through its GUI.
The Vivado Integrated Design Environment (IDE) is a development tool that provides the functionality of the Xilinx Integrated Synthesis Environment (ISE) and Xilinx Platform Studio (XPS). It is used to analyze and synthesize HDL designs and to perform timing analysis. Figure 6 shows the overall procedure of full and partial bitstream generation. The HDL description of the system and the IP cores generated by the HLS tool are synthesized. Then, as shown in Figure 7, we floorplanned the partially reconfigurable region so that the hardware resources required to implement the PRMs are less than 90% of the total PRR hardware resources. Table 1 summarizes the hardware resources of the PRR and the PRMs.
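The 90% floorplanning rule is a per-resource utilization check. The Python sketch below expresses it; the resource names and counts in the example are hypothetical placeholders, not the values of Table 1.

```python
from typing import Dict, Mapping


def prr_utilization(prm: Mapping[str, int], prr: Mapping[str, int]) -> Dict[str, float]:
    """Fraction of each available PRR resource type that a PRM consumes."""
    return {kind: prm.get(kind, 0) / avail for kind, avail in prr.items() if avail > 0}


def fits_prr(prm: Mapping[str, int], prr: Mapping[str, int], limit: float = 0.9) -> bool:
    """True if the PRM uses less than `limit` of every resource type in the PRR."""
    if any(used > 0 and prr.get(kind, 0) == 0 for kind, used in prm.items()):
        return False  # the PRM needs a resource type the PRR does not contain
    return all(u < limit for u in prr_utilization(prm, prr).values())


if __name__ == "__main__":
    prr = {"LUT": 4800, "FF": 9600, "BRAM": 20, "DSP": 40}         # hypothetical PRR
    median_sobel = {"LUT": 3100, "FF": 4200, "BRAM": 6, "DSP": 4}  # hypothetical PRM
    print(fits_prr(median_sobel, prr))  # True
```

Every PRM that may be loaded into the region must pass this check, so in practice the larger Median + Sobel module determines how big the PRR has to be.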
As a result, one full bitstream and two partial bitstreams are generated. The PL is initially configured with the full bitstream, which includes the static logic and the Sobel filter. If the detected noise density level rises above the threshold, the partial bitstream of the Median + Sobel filter is used to reconfigure the PRR. If the level falls below the threshold, the partial bitstream of the Sobel filter is read again from DDR memory for run-time reconfiguration. In this paper, the PCAP (Processor Configuration Access Port) reconfiguration interface available on the Xilinx Zynq SoC is explored to perform partial reconfiguration of filter bitstreams from the ARM Cortex-A9 processor. Although the theoretical reconfiguration speed of the 32-bit PCAP interface clocked at 100 MHz is 400 MB/s, the practical speed is much lower because of the internal ARM interconnect architecture. Table 2 lists several examples of filter bitstream reconfiguration using the JTAG, ICAP, and PCAP interfaces.
The partial reconfiguration design approach significantly reduces both the bitstream file size and the reconfiguration time through the PCAP interface. As shown in Table 3, partial reconfiguration time is reduced to 12% of the full configuration time, and the system no longer needs to be shut down to replace the function of the proposed filter engine. As shown in Figures 8(g) and 8(h), the Median + Sobel filter is highly effective for the noisy video sequences #1 and #2, and its edge detection results are significantly improved both subjectively and objectively. For objective evaluation, PFOM is used to compare the performance of the Sobel and Median + Sobel filters [31,32]. Video sequences #1 and #2 are corrupted by salt-and-pepper noise at 5% and 10% noise density levels. Because the Median operator removes the salt-and-pepper noise in the corrupted video frames, the Median + Sobel filter improves PFOM by a factor of about 14 to 20 compared with the Sobel filter, as shown in Figure 9.
Figure 10 shows the frame rates supported by the Sobel and Median + Sobel filters, and Figure 11 shows the measured run-time CPU usage of the hardware and software filter implementations. The software implementation of the Sobel filter uses 100% of the CPU yet achieves only 1.5 fps, which indicates that software implementation is not suitable for real-time processing of 1080p video frames. Here, the camera controller supplies 60 input frames per second. Because of its additional computational complexity, the H/W Median + Sobel filter supports up to 29 frames per second (fps), about 1 fps less than the H/W Sobel implementation (30 fps).
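The frame rates above translate directly into pixel throughput requirements. The Python sketch below computes the active-pixel rate for 1080p video and the number of clock cycles available per pixel at a given clock. The 100 MHz filter clock in the example is a hypothetical value, since the paper does not state the filter clock frequency, and blanking intervals are ignored.

```python
def pixel_rate(width: int = 1920, height: int = 1080, fps: float = 30.0) -> float:
    """Active pixels per second that a filter must process (blanking ignored)."""
    return width * height * fps


def cycles_per_pixel(clock_hz: float, width: int = 1920, height: int = 1080,
                     fps: float = 30.0) -> float:
    """Clock cycles available per pixel at the given clock frequency."""
    return clock_hz / pixel_rate(width, height, fps)


if __name__ == "__main__":
    print(pixel_rate(fps=30))   # 62208000.0 pixels/s for the 30 fps Sobel pipeline
    print(pixel_rate(fps=1.5))  # 3110400.0 pixels/s reached by the software filter
    print(round(cycles_per_pixel(100e6, fps=30), 2))  # 1.61 at a hypothetical 100 MHz clock
```

A budget this close to one cycle per pixel illustrates why a pipelined design that accepts a new pixel on nearly every clock cycle is needed; at the 60 fps input rate the budget halves.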
After PR, the power consumption of the hardware platform on the Xilinx Zynq device is estimated using the Power Report in the Vivado Design Suite [33].
As shown in Figure 12, the Median + Sobel filter consumes more power than the Sobel filter because it requires more FPGA hardware resources.

Conclusion
In this paper, we propose an adaptive, partially reconfigurable system that maximizes the output quality of the implemented edge detection filter. Implementing the filter engine in the FPGA fabric provides the computing capability required for real-time edge detection of 1080p video sequences.
Based on the noise density level detected in the incoming video, the hardware bitstream is adaptively self-reconfigured at run time, which significantly improves both subjective and objective results. Experimental results show that partial reconfiguration time is reduced to 12% of the full configuration time and that PFOM improves by a factor of about 14 to 20.

Figure 4: Video pipeline and noise detection task.

Figure 6: The procedure of bitstream generation.

Figure 7: Static logic and PR modules.

Figure 8: Comparison of edge detection results for the noisy video sequences.

Table 1: Comparison of PRR and PRM resources.

Table 3: Bitstream size and configuration time.