A Heterogeneous Image Fusion Method Based on DCT and Anisotropic Diffusion for UAVs in Future 5G IoT Scenarios

Unmanned aerial vehicles (UAVs), with their inherent attributes of flexibility, mobility, and autonomy, play an increasingly important role in the Internet of Things (IoT). Airborne infrared and visible image fusion, which constitutes an important data basis for the perception layer of IoT, has been widely used in various fields such as electric power inspection, military reconnaissance, emergency rescue, and traffic management. However, traditional infrared and visible image fusion methods suffer from weak detail resolution. In order to better preserve useful information from source images and produce a more informative image for human observation or UAV vision tasks, a novel fusion method based on discrete cosine transform (DCT) and anisotropic diffusion is proposed. First, the infrared and visible images are denoised by using DCT. Second, anisotropic diffusion is applied to the denoised infrared and visible images to obtain the detail and base layers. Third, the base layers are fused by using weighted averaging, and the detail layers are fused by using the Karhunen–Loeve transform. Finally, the fused image is reconstructed through the linear superposition of the fused base layer and the fused detail layer. Compared with six other typical fusion methods, the proposed approach shows better fusion performance in both objective and subjective evaluations.


Introduction
The Internet of Things (IoT) has attracted extensive attention in academia and industry ever since it was first proposed. IoT aims at integrating various technologies, such as body domain network systems, device-to-device (D2D) communication, unmanned aerial vehicles (UAVs), and satellite networks, to provide a wide range of services in any location by using any network. This makes it highly useful for various civil and military applications. In recent years, however, the proliferation of intelligent devices used in IoT has given rise to massive amounts of data, which challenges the smooth functioning of the wireless communication network. The emergence of fifth-generation (5G) wireless communication technology provides an effective solution to this problem. Many scholars are committed to the study of key 5G characteristics such as quality-of-service (QoS) and connectivity [1,2]. With the development of UAV technology and 5G wireless systems, the application field of IoT has expanded further [3][4][5]. Due to their characteristics of dynamic deployment, convenient configuration, and high autonomy, UAVs play an extremely important role in IoT. As some wireless devices suffer from limited transmission ranges, UAVs can be used as wireless relays to improve the network connection and extend the coverage of the wireless network. Meanwhile, due to their adjustable flight altitude and mobility, UAVs can easily and efficiently collect data from the users of IoT on the ground. At present, there exists an intelligent UAV management platform, which can operate several UAVs at the same time through various terminal devices. It is capable of customizing flight routes as needed and obtaining the required user data. Intelligent transportation systems (ITS) can use UAVs for traffic monitoring and law enforcement. UAVs can also be used as base stations in the air to improve wireless network capacity.
In 5G IoT, UAV wireless communication systems will play an increasingly important role [6,7].
The perception layer, which is the basic layer of IoT, consists of different sensors. The UAVs collect the data of IoT users by means of airborne IoT devices, including infrared (IR) cameras and visible (VI) light cameras [8]. One of the key technologies affecting the reliability of the perception layer is the accurate acquisition of multisource signals and reliable fusion of data. In recent years, heterogeneous image fusion has become an important topic regarding the perception layer of IoT. IR and visible light sensors are the two most commonly used types of sensors. IR images taken by an IR sensor are usually less affected by adverse weather conditions such as bright sunlight and smog [9]. However, IR images lack sufficient details of the scene and have a lower spatial resolution than VI images. In contrast, VI images contain more detailed scene information, but are easily affected by illumination variation. IR and VI image fusion could produce a composite image, which is more interpretable to both human and machine perception. The goal of image fusion is to combine images obtained by different types of image sensors to generate an informative image. The fused image would be more consistent with the human visual perception system than the source image individually. It can be convenient for subsequent processing or decision-making. Nowadays, the image fusion technique is widely used in such fields as military reconnaissance [10], traffic management [11], medical treatment [12], and remote sensing [13,14].
In recent decades, a variety of IR and VI image fusion approaches have been investigated. In general, based on the different levels of image representation, fusion methods could be classified into three categories: pixel-level, feature-level, and decision-level [15]. Pixel-level fusion, conducted on raw source images, usually generates more accurate, richer, and reliable details compared with other fusion methods. Feature-level image fusion first extracts various features (including colour, shape, and edge) from the multisource information of different sensors. Subsequently, the feature information obtained from multiple sensors is analysed and processed synthetically. Although this method could reduce the amount of data and retain most of the information, some details of the image are still lost. Decision-level fusion is used to fuse the recognition results of multiple sensors to make a global optimal decision on the basis of each sensor independently, thus completing the decision or classification. Decision-level fusion has the advantage of good real-time performance, self-adaptability, and strong anti-interference; however, the fault-tolerance ability of the decision function directly affects the fusion classification performance. In this study, we focus on the pixel-level fusion method.
The remainder of this paper is organised as follows. In Section 2, we introduce related works and the motivation behind the present work. The proposed fusion method is described in Section 3. Experimental results on public datasets are covered in Section 4. The conclusions of this study are presented in Section 5.

Related Works
In general, pixel-level fusion approaches can be divided into two categories: space-based fusion methods and transform domain techniques. Space-based fusion methods usually address the fusion issue via pixel grayscale or pixel gradient. Although these methods are simple and effective, they easily lose spatial details [16]. Liu et al. [17] observed that this type of method is more suitable for fusion tasks involving images of the same type. Transform domain techniques usually take the transform coefficients as the features for image fusion. The fused image is obtained by the fusion and reconstruction of the transform coefficients. Image fusion methods based on multiscale transformation have been widely investigated because of their compatibility with human visual perception. In recent years, a variety of fusion methods based on multiscale transform have been proposed, such as ratio of low-pass pyramid (RP) [18], gradient pyramid (GP) [19], nonsubsampled contourlet transform (NSCT) [20], discrete wavelet transform (DWT) [21], and dual-tree complex wavelet transform (DTCWT) [22]. However, image fusion methods based on multiscale transform are usually complex and suffer from long processing times and high energy consumption, which limit their application.
For the aforementioned reasons, many researchers have implemented image fusion methods by using discrete cosine transform (DCT). In [23], the authors pointed out that image fusion methods based on DCT were efficient due to their fast speed and low complexity. Cao et al. [24] proposed a multifocus image fusion algorithm based on spatial frequency in the DCT domain. The experimental results showed that it could improve the quality of the output image visually. Amin-Naji and Aghagolzadeh [25] employed the correlation coefficient in the DCT domain for multifocus image fusion, showing that their method could improve image quality and stability for noisy images. In order to provide better visual effects, Jin et al. [26] proposed a heterogeneous image fusion method combining DSWT, DCT, and LSF, and showed that it was superior to conventional multiscale methods. Although image fusion methods based on DCT have achieved superior performance, the fused results show undesirable side effects such as blocking artifacts [27]. Before DCT is performed, the image is usually divided into small blocks, which causes discontinuities between adjacent blocks in the image. In order to address this problem, several filtering techniques have been proposed, such as the weighted least square filter [28], the bilateral filter [29], and the anisotropic diffusion filter [30]. Xie and Wang [31] pointed out that the anisotropic diffusion processing of images could retain the image edge contour information. Compared with other fusion methods based on filtering, image fusion based on anisotropic diffusion could retain more edge profiles. It also suppresses noise and achieves a better visual evaluation. However, most of the proposed methods for anisotropic diffusion models are based on the diffusion equation itself, ignoring the image's own feature information, which may lead to loss or blurring of image details (textures, weak edges, etc.).
Inspired by the above research, a heterogeneous image fusion method based on DCT and anisotropic diffusion is proposed. The advantages of the proposed method mainly lie in the following three aspects:
(1) Due to the use of DCT transform, the fusion algorithm proposed in this paper shows good denoising ability.
(2) The final fusion images show satisfactory detail resolution.
(3) The proposed algorithm is easy to implement, and the real-time performance is very good, which is suitable for real-time requirements.

Proposed Fusion Method
In this section, the operation mechanism of the proposed algorithm is described in detail. The proposed image fusion framework can be divided into three components, as shown in Figure 1. In the first step, in order to eliminate the noise in the original images, DCT and inverse discrete cosine transform are performed on the IR and VI images, respectively. In the second step, anisotropic diffusion is adopted to decompose IR and VI images to obtain the detail and base layers. In the third step, base layers are fused by using the weighted averaging, and detail layers are fused by using the Karhunen-Loeve transformation. Finally, the fused base and detail layers are linearly superimposed to obtain the final fusion result.

DCT.
As an effective transform tool, DCT can transform the image information from the spatial domain to the frequency domain so as to effectively reduce the spatial redundancy of the image. In this study, the 2D DCT of an $N \times N$ image block $f(i, j)$ is defined as follows [32]:

$$F(u, v) = \frac{2}{N}\, c(u)\, c(v) \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} f(i, j) \cos\left[\frac{(2i+1)u\pi}{2N}\right] \cos\left[\frac{(2j+1)v\pi}{2N}\right], \tag{1}$$

where $f(i, j)$ denotes the $(i, j)$-th image pixel value in the spatial domain and $F(u, v)$ denotes the $(u, v)$-th DCT coefficient in the frequency domain; $c(k)$ is a multiplication factor, defined as follows:

$$c(k) = \begin{cases} 1/\sqrt{2}, & k = 0, \\ 1, & \text{otherwise}. \end{cases} \tag{2}$$

Similarly, the 2D inverse discrete cosine transform (IDCT) is defined as:

$$f(i, j) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} c(u)\, c(v)\, F(u, v) \cos\left[\frac{(2i+1)u\pi}{2N}\right] \cos\left[\frac{(2j+1)v\pi}{2N}\right]. \tag{3}$$

When performing the DCT, most of the image information is concentrated in the DC coefficient and the nearby low-frequency spectrum. Therefore, the coefficients close to 0 are deleted, and the coefficients containing the main information of the image are retained for the inverse transformation. The influence of noise can be effectively removed without
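The denoising step described above (transform, discard near-zero coefficients, invert) can be sketched as follows. This is a minimal illustration and not the authors' implementation: it uses plain Python lists and the orthonormal DCT-II written as a matrix product, and the `threshold` parameter and the function names are assumptions introduced here, since the text does not state how "close to 0" is decided.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis D (n x n), so that F = D f D^T."""
    rows = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Matrix product of two list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def dct2(block):
    """2D DCT of a square block."""
    d = dct_matrix(len(block))
    return matmul(matmul(d, block), transpose(d))

def idct2(coeffs):
    """2D inverse DCT of a square coefficient block."""
    d = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(d), coeffs), d)

def dct_denoise(block, threshold):
    """Zero every coefficient with magnitude below `threshold` (a
    hypothetical parameter), then transform back to the spatial domain."""
    coeffs = dct2(block)
    kept = [[c if abs(c) >= threshold else 0.0 for c in row] for row in coeffs]
    return idct2(kept)

# A flat 4 x 4 block with one slightly perturbed pixel.
block = [[10.0] * 4 for _ in range(4)]
block[0][0] = 10.4
clean = dct_denoise(block, threshold=1.0)
```

In this toy case only the DC coefficient survives the threshold, so the perturbation is averaged into the block. A practical implementation would more likely use an array library and a fast DCT routine.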

Anisotropic Diffusion.
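In computer vision, anisotropic diffusion is widely used to reduce noise while preserving image details. The model of anisotropic diffusion of an image $I$ can be represented as follows [33]:

$$\frac{\partial I}{\partial t} = c(x, y, t)\, \Delta I + \nabla c \cdot \nabla I, \tag{4}$$

where $c(x, y, t)$ is the rate of diffusion, $\nabla$ and $\Delta$ represent the gradient operator and the Laplacian operator, respectively, and $t$ is time. Equation (4) can then be discretized as:

$$I^{t+1}_{i,j} = I^{t}_{i,j} + \lambda \left[ c^{t}_{N} \nabla_{N} I^{t}_{i,j} + c^{t}_{S} \nabla_{S} I^{t}_{i,j} + c^{t}_{E} \nabla_{E} I^{t}_{i,j} + c^{t}_{W} \nabla_{W} I^{t}_{i,j} \right]. \tag{5}$$

In (5), $I^{t+1}_{i,j}$ is an image with coarse resolution at scale $t + 1$, and $\lambda$ denotes a stability constant satisfying $0 \le \lambda \le 1/4$. The local image gradients along the north, south, east, and west directions are represented as:

$$\nabla_{N} I^{t}_{i,j} = I^{t}_{i-1,j} - I^{t}_{i,j}, \quad \nabla_{S} I^{t}_{i,j} = I^{t}_{i+1,j} - I^{t}_{i,j}, \quad \nabla_{E} I^{t}_{i,j} = I^{t}_{i,j+1} - I^{t}_{i,j}, \quad \nabla_{W} I^{t}_{i,j} = I^{t}_{i,j-1} - I^{t}_{i,j}. \tag{6}$$

Similarly, the conduction coefficients along the four directions of north, south, east, and west can be defined as:

$$c^{t}_{N} = g\left(\left|\nabla_{N} I^{t}_{i,j}\right|\right), \quad c^{t}_{S} = g\left(\left|\nabla_{S} I^{t}_{i,j}\right|\right), \quad c^{t}_{E} = g\left(\left|\nabla_{E} I^{t}_{i,j}\right|\right), \quad c^{t}_{W} = g\left(\left|\nabla_{W} I^{t}_{i,j}\right|\right). \tag{7}$$

In (7), $g(\cdot)$ is a decreasing function. In order to maintain smoothing and edge preservation, in this paper, $g(\cdot)$ is chosen as in equation (8), where $T$ is an edge magnitude parameter. Let $I_{IR}(x, y)$ and $I_{VI}(x, y)$ be IR and VI images, respectively, which have been coregistered. Anisotropic diffusion for an image $I$ is denoted as $AD(I)$. The base layers $I^{B}_{IR}(x, y)$ and $I^{B}_{VI}(x, y)$ are obtained after performing anisotropic diffusion processes on $I_{IR}(x, y)$ and $I_{VI}(x, y)$, respectively, which are represented by:

$$I^{B}_{IR}(x, y) = AD\left(I_{IR}(x, y)\right), \quad I^{B}_{VI}(x, y) = AD\left(I_{VI}(x, y)\right). \tag{9}$$

Then, the detail layers $I^{D}_{IR}(x, y)$ and $I^{D}_{VI}(x, y)$ are obtained by subtracting the base layers from the respective source images, as shown in (10):

$$I^{D}_{IR}(x, y) = I_{IR}(x, y) - I^{B}_{IR}(x, y), \quad I^{D}_{VI}(x, y) = I_{VI}(x, y) - I^{B}_{VI}(x, y). \tag{10}$$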
The results obtained by the anisotropic diffusion of IR image and VI image are shown in Figure 3.
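The base/detail decomposition can be sketched as a four-neighbour discrete diffusion followed by a subtraction. This is a hedged illustration, not the authors' code. The exponential conduction function `g(d) = exp(-(d / T)^2)` is an assumption (it is one of the classical Perona-Malik choices, and the form used in this paper is not recoverable from the text). The default values of `iterations`, `lam`, and `T` are hypothetical, and border pixels are handled by treating the missing neighbour difference as zero.

```python
import math

def anisotropic_diffusion(img, iterations=10, lam=0.15, T=30.0):
    """Four-neighbour anisotropic diffusion on a list-of-lists image.

    lam must satisfy 0 <= lam <= 1/4. The conduction function g is an
    ASSUMED exponential form; T is the edge magnitude parameter.
    """
    h, w = len(img), len(img[0])
    cur = [[float(p) for p in row] for row in img]

    def g(d):
        return math.exp(-(d / T) ** 2)

    for _ in range(iterations):
        nxt = [row[:] for row in cur]
        for i in range(h):
            for j in range(w):
                c = cur[i][j]
                # Differences towards the N, S, E, W neighbours; a missing
                # neighbour at the border contributes a zero difference.
                dn = cur[i - 1][j] - c if i > 0 else 0.0
                ds = cur[i + 1][j] - c if i < h - 1 else 0.0
                de = cur[i][j + 1] - c if j < w - 1 else 0.0
                dw = cur[i][j - 1] - c if j > 0 else 0.0
                nxt[i][j] = c + lam * (g(abs(dn)) * dn + g(abs(ds)) * ds
                                       + g(abs(de)) * de + g(abs(dw)) * dw)
        cur = nxt
    return cur

def decompose(img, **kwargs):
    """Return (base, detail) with base = AD(img) and detail = img - base."""
    base = anisotropic_diffusion(img, **kwargs)
    detail = [[float(p) - b for p, b in zip(row, brow)]
              for row, brow in zip(img, base)]
    return base, detail

# A strong step edge survives diffusion, while weak variations are smoothed.
edge = [[0, 0, 200, 200] for _ in range(4)]
base, detail = decompose(edge, iterations=5)
```

With `T = 30`, the conduction coefficient across the 200-level step is practically zero, so the edge stays in the base layer. Small fluctuations, whose gradients are well below `T`, are smoothed and therefore end up in the detail layer.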

Base Layer Fusion.
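The weighted average is adopted to fuse the base layers of IR and VI images. The fused base layer $I^{B}_{F}$ is calculated by:

$$I^{B}_{F}(x, y) = \omega_{1} I^{B}_{IR}(x, y) + \omega_{2} I^{B}_{VI}(x, y), \tag{11}$$

where $\omega_{1}$ and $\omega_{2}$ are normalized weighting coefficients, both set to 0.5 in this paper.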

Detail Layer Fusion.
The KL-transform can make the new sample set approximate the original sample set distribution with minimum mean square error, and it eliminates the correlation between the original features. We use the KL-transform to fuse the detail layers of IR and VI images. Let $I^{D}_{IR}(x, y)$ and $I^{D}_{VI}(x, y)$ be the detail layers of the IR and VI images, respectively. The fusion process based on the KL-transform is described as follows.
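Step 1. Arrange $I^{D}_{IR}(x, y)$ and $I^{D}_{VI}(x, y)$ as column vectors of a matrix, denoted as $X$.
Step 2. Calculate the autocorrelation matrix $C$ of $X$.
Step 3. Calculate the eigenvalues $\lambda_{1}$ and $\lambda_{2}$ of $C$ and the corresponding eigenvectors $\mu_{1}$ and $\mu_{2}$.
Step 4. Calculate the uncorrelated coefficients $K_{1}$ and $K_{2}$ corresponding to the largest eigenvalue $\lambda_{\max}$, which is defined as:

$$\lambda_{\max} = \max(\lambda_{1}, \lambda_{2}).$$

The eigenvector corresponding to $\lambda_{\max}$ is denoted as $\mu_{\max}$, and $K_{1}$ and $K_{2}$ are given by:

$$K_{1} = \frac{\mu_{\max}(1)}{\sum_{i} \mu_{\max}(i)}, \quad K_{2} = \frac{\mu_{\max}(2)}{\sum_{i} \mu_{\max}(i)}.$$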

Wireless Communications and Mobile Computing
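Step 5. The fused detail layer $I^{D}_{F}$ is calculated using:

$$I^{D}_{F}(x, y) = K_{1} I^{D}_{IR}(x, y) + K_{2} I^{D}_{VI}(x, y).$$

Reconstruction.
In this study, the final fused image is obtained by the linear superposition of $I^{B}_{F}$ and $I^{D}_{F}$, as shown in equation (14):

$$I_{F}(x, y) = I^{B}_{F}(x, y) + I^{D}_{F}(x, y). \tag{14}$$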
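The three fusion stages (weighted base-layer averaging, KL-transform detail fusion, and linear superposition) can be sketched together as below. This is a minimal sketch under stated assumptions rather than the authors' implementation. The matrix $C$ is taken as the uncentred 2 × 2 second-moment matrix of the two flattened detail layers, and its dominant eigenvector is obtained in closed form. The equal-weight fallback for a vanishing eigenvector sum is a choice made for this sketch only, and all function names are hypothetical.

```python
import math

def flatten(img):
    return [float(p) for row in img for p in row]

def kl_weights(d_ir, d_vi):
    """K1, K2 from the dominant eigenvector of the 2 x 2 matrix
    C = [[a, b], [b, c]] built from the two detail layers."""
    x1, x2 = flatten(d_ir), flatten(d_vi)
    n = len(x1)
    a = sum(p * p for p in x1) / n
    c = sum(q * q for q in x2) / n
    b = sum(p * q for p, q in zip(x1, x2)) / n
    # Largest eigenvalue of a symmetric 2 x 2 matrix, in closed form.
    lam_max = 0.5 * (a + c + math.sqrt((a - c) ** 2 + 4.0 * b * b))
    if abs(b) > 1e-12:
        mu = (b, lam_max - a)          # eigenvector for lam_max
    else:
        mu = (1.0, 0.0) if a >= c else (0.0, 1.0)
    total = mu[0] + mu[1]
    if abs(total) < 1e-12:
        return 0.5, 0.5                # fallback chosen for this sketch only
    return mu[0] / total, mu[1] / total

def weighted_sum(img1, img2, w1, w2):
    return [[w1 * p + w2 * q for p, q in zip(r1, r2)]
            for r1, r2 in zip(img1, img2)]

def fuse(b_ir, b_vi, d_ir, d_vi, w1=0.5, w2=0.5):
    """Fuse base layers by weighted averaging, detail layers by KL
    weights, and add the two fused layers to reconstruct the image."""
    k1, k2 = kl_weights(d_ir, d_vi)
    fused_base = weighted_sum(b_ir, b_vi, w1, w2)
    fused_detail = weighted_sum(d_ir, d_vi, k1, k2)
    return weighted_sum(fused_base, fused_detail, 1.0, 1.0)

# Toy layers: the VI detail layer carries twice the IR detail energy.
d_ir = [[1.0, -1.0], [2.0, -2.0]]
d_vi = [[2.0, -2.0], [4.0, -4.0]]
b_ir = [[10.0, 10.0], [10.0, 10.0]]
b_vi = [[20.0, 20.0], [20.0, 20.0]]
fused = fuse(b_ir, b_vi, d_ir, d_vi)
```

In this toy example the dominant eigenvector is proportional to (1, 2), so the weights come out as $K_1 = 1/3$ and $K_2 = 2/3$, and the detail layer with more energy contributes more to the fused result.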

Experiment
In this section, the coregistered IR and VI images from the TNO image fusion dataset are used to evaluate our algorithm. This dataset provides a large number of coregistered infrared and visible images. All the experiments are implemented [...]. The proposed method is compared with six typical fusion methods: multiresolution singular value decomposition (MSVD) [34], discrete cosine harmonic wavelet transform (DCHWT) [35], discrete wavelet transform (DWT) [21], two-scale fusion (TS) [36], dual-tree complex wavelet transform (DTCWT) [22], and curvelet transform (CVT) [37]. In order to verify the advantages of the proposed method, the experimental verification is divided into two parts. Subjective evaluation results of the fused images are shown in the first part. In the second part, we compare the objective evaluation results of the proposed algorithm with those of the six comparison algorithms. The six pairs of coregistered source images used in this experiment are depicted in Figure 4.

Subjective Evaluation.
The fusion results obtained by the proposed method and the six compared methods are shown in Figure 5. The results for pair 1 to pair 6 are arranged from top to bottom. For easier comparison, the details in the fused images are highlighted with red boxes. As can be seen from Figure 5, our method preserves more detail information and contains less artificial noise in the red boxes. The image details in the fused images obtained by MSVD, DWT, and TS are blurred, as can be clearly seen in the first three pairs of the experiment. Compared with these three fusion methods, the fusion results based on DCHWT preserve more detail information but show obvious VI artifacts (clearly visible in the last pair of the experiment). The fused images obtained by DTCWT, CVT, and the proposed method preserve more detail information. Compared with the DTCWT and CVT fusion methods, the fusion results of the proposed algorithm look more natural. In the next section, several different objective quality metrics are evaluated to demonstrate the advantages of the proposed method.

Objective Evaluation.
In order to verify the advantages of the proposed method, cross entropy (CE) [38], mutual information (MI) [39], average gradient (AG) [39], relative standard deviation (RSD) [39], mean gradient (MG) [39], and running time are used as objective evaluation metrics. CE represents the cross entropy between the fused image and the source image. The smaller the cross entropy, the smaller the difference between the images. MI represents the mutual information between the fused image and the source image. The larger the value of MI, the higher the similarity between the two images. AG measures the definition (sharpness) of the image and reflects its ability to express detail contrast. RSD represents the relative standard deviation of the source image and the fused image, which reflects the degree of deviation from the true value. The smaller the relative standard deviation, the higher the fusion accuracy. MG also represents the definition of the fused image; it refers to the clarity of each detail and its boundary in the image. The objective evaluation results for the 6 pairs of images from the "Subjective Evaluation" section are shown in Tables 1-3, and the best results are highlighted in bold.
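Two of these metrics can be sketched as follows. The formulas are commonly used definitions and are assumptions here, since the exact forms used in [39] are not given in the text: AG is taken as the mean of $\sqrt{(\Delta_x^2 + \Delta_y^2)/2}$ over forward differences, and MI is computed in bits from the joint histogram of two images with discrete grey levels. The function names are hypothetical.

```python
import math
from collections import Counter

def average_gradient(img):
    """Mean of sqrt((dx^2 + dy^2) / 2) over forward differences
    (an assumed, commonly used definition of AG)."""
    h, w = len(img), len(img[0])
    total = 0.0
    for i in range(h - 1):
        for j in range(w - 1):
            dx = img[i][j + 1] - img[i][j]
            dy = img[i + 1][j] - img[i][j]
            total += math.sqrt((dx * dx + dy * dy) / 2.0)
    return total / ((h - 1) * (w - 1))

def mutual_information(img_a, img_b):
    """Mutual information in bits between two equally sized images with
    discrete grey levels, estimated from their joint histogram."""
    xs = [p for row in img_a for p in row]
    ys = [p for row in img_b for p in row]
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x,y) / (p(x) p(y)) simplifies to count * n / (count_x * count_y).
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

ramp = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
ag = average_gradient(ramp)              # horizontal unit ramp
two_level = [[0, 0], [1, 1]]
mi_self = mutual_information(two_level, two_level)
```

For fusion evaluation, MI is typically computed between the fused image and each source image, and the values are then combined; how they are combined in this study is not specified in the text.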
The comparison results of the objective evaluation indexes of the first two pairs of experiments are given in Table 1. As seen from Table 1, the proposed method outperforms other fusion methods in terms of all metrics except for CE. Although the CE value of the proposed method is not the minimum value, it is very close to the minimum.
The comparison results of objective evaluation indexes of the third and fourth pairs of the experiment are shown in Table 2. As can be seen from Table 2, the fusion result of the proposed algorithm in this paper contains more information. The proposed method outperforms other methods as regards AG, RSD, MG, and running time. The CE values of the proposed method in the third pair of the experiment are very close to the best values produced by MSVD. In the fourth pair of the experiment, all fusion quality indexes of image fusion generated by the proposed method are the best.
The comparison results of the objective evaluation indexes of the last two pairs of the experiment are given in Table 3. As can be seen from Table 3  As can be seen from Tables 1-3

Conclusions
Data fusion, which is a key technology in IoT, can effectively reduce the amount of data in the communication network and thus reduce the energy loss. In this article, we focused on the fusion of IR and VI images and proposed a novel heterogeneous fusion approach. Considering that DCT has shown good denoising ability and anisotropic diffusion has shown satisfactory detail resolution, we combined the two techniques through effective fusion strategies. Experimental results show that the proposed method achieves better performance than six other state-of-the-art fusion approaches in terms of both subjective and objective indexes. However, the IR and VI images in this experiment are all coregistered, whereas actual multisource data may be unregistered. In the future, we will focus on unregistered image fusion.

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.