Digital image forensics is a key branch of digital forensics that is based on forensic analysis of image authenticity and image content. Advances in new technologies, such as smart devices, the Internet of Things (IoT), artificial images, and social networks, give forensic image analysis an increasing role in a wide range of criminal case investigations. This work focuses on image source identification by analysing the fingerprints of both digital devices and images in an IoT environment. A new convolutional neural network (CNN) method is proposed to identify the source device that took an image in a social IoT environment. The experimental results show that the proposed method can effectively identify the source devices with high accuracy.
The IoT is revolutionizing our everyday lives by provisioning a wide range of novel applications that leverage ecosystems of smart and highly heterogeneous devices [
In recent years, social network platforms, such as Twitter, Facebook, WeChat, Instagram, and Weibo, have been increasingly used in our daily lives and are changing the way we communicate [
The accuracy of traditional camera source identification mainly relies on the compression strength of the image that needs to be suppressed before noise fingerprint extraction [
In summary, the major contributions of the proposed work are fourfold:
(1) A novel method that combines smart mobiles and social network platforms for image source identification is proposed.
(2) A new CNN is designed to extract the fingerprint characteristics of image noise on social networks and to match the device fingerprint to identify the camera source device of the image.
(3) A loss function is proposed based on deep learning method to effectively extract the noise fingerprint of the test image.
(4) A new dataset was constructed to test the user identification framework based on camera fingerprints.
As we all know, the information shared on social networks is often dominated by images. Tracing the source of these images and identifying the camera that took them is of great significance for multimedia forensics, because it gives law enforcement officers an effective means of collecting network evidence in the event of cybercrime. To clarify the relationship between social network platform images and the cameras to which they belong, this section gives an overview of existing image traceability techniques. The widely used methods mainly include camera source identification based on photo response nonuniformity (PRNU) and camera source identification based on deep learning techniques.
PRNU arises from imperfections in the manufacturing of the CCD sensor array, which cause small differences in the photosensitive characteristics of the individual photosensitive elements of an imaging device. The most widely used is the PRNU feature proposed by [
With the development of artificial intelligence technology and the increase of available image datasets, deep learning technology is gradually introduced into the field of image forensics. Also, deep learning technology can extract the best features from a large number of training datasets, avoiding the limitations of artificially designed features. Due to the rise of social networking sites such as Twitter, Facebook, WeChat, Instagram, and Weibo, researchers can easily obtain a large number of images with complete tags, use these images as research objects to extract image features, and then, use the larger-scale dataset to verify the effectiveness of the algorithm. For example, [
So far, due to the extensiveness and heterogeneity of data information on social network platforms and the difficulty of high computational complexity caused by large-scale datasets for camera source identification algorithms, it is of great significance to combine the traditional PRNU-based noise estimation with the deep learning-based noise estimation and apply it to camera source identification and network forensics.
Based on the investigation of the above-related work, this paper integrates PRNU and deep learning to design a camera source identification network (CSI-CNN) based on image noise fingerprint feature extraction, which optimizes the fully convolutional networks (FCN) [
The core idea of the social network image source identification method proposed in this paper is to identify the camera device source of the images posted by the user on the social network. That is, the noise fingerprint features can be extracted from the images on the social network through CSI-CNN designed in this paper, and the extracted noise fingerprint is correlated with the preestimated camera fingerprints; afterwards, the calculated correlation is used to determine whether the image on the social network is a real image taken by the camera held by the user. Camera fingerprint estimation and social network noise fingerprint extraction are the key contents of camera source identification, which will be introduced in detail in this section.
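The matching step described above can be sketched as follows. This is a minimal illustration, assuming the correlation is the zero-mean normalized correlation between the extracted noise fingerprint and the camera fingerprint; the function names `normalized_correlation` and `taken_by` and the default threshold value are hypothetical and are not taken from the paper.

```python
import numpy as np


def normalized_correlation(a, b):
    """Zero-mean normalized correlation between two arrays of equal size."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # A constant input carries no fingerprint information; report 0.
    return float(a @ b / denom) if denom > 0 else 0.0


def taken_by(residual, fingerprint, threshold=0.05):
    """Decide whether an image's noise residual matches a camera fingerprint.

    The threshold is an assumed, illustrative value; in practice it would
    be chosen on validation data.
    """
    return normalized_correlation(residual, fingerprint) > threshold
```

A residual that contains the camera's fingerprint plus independent noise yields a clearly positive correlation, while a residual from another device stays close to zero.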
The social network image source identification method based on camera source recognition requires preestimation of the camera fingerprint, that is, the PRNU value. The specific process includes two parts: determining the camera sensor output model and PRNU estimation.
The imaging process of the camera is very complicated. The light is focused on a Charge-Coupled Device (CCD) or Complementary Metal-Oxide Semiconductor (CMOS) sensor. The CCD or CMOS converts the optical signals into electrical signals, and the electrical signals are converted into digital signals by an analog-to-digital converter. The digital signal is then turned into a digital image through digital signal processing.
In the camera imaging process, the sensor leaves sensor pattern noise (SPN) in every image taken. SPN is an inherent feature of digital cameras and is mainly caused by photo response nonuniformity (PRNU) and fixed-pattern noise (FPN). Even with the same type of sensor, the output values of the photosensitive units differ, which produces the PRNU; it is unique to a single sensor. Aiming at the complexity and polymorphism of camera imaging, [
Among them,
Among them,
The camera fingerprint is estimated in three steps: applying a denoising filter, obtaining the noise residual, and estimating the PRNU.
The maximum likelihood estimation can be used to estimate the value of
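The three steps (denoise, take the noise residual, estimate the PRNU) can be sketched as below. This is a hedged illustration, not the paper's implementation: it assumes the maximum likelihood estimator commonly used in the PRNU literature, K = Σ W_i I_i / Σ I_i², where W_i = I_i − F(I_i) is the noise residual of image I_i. A simple mean filter (`box_denoise`, a hypothetical helper) stands in for the denoising filter F, and all function names are assumptions.

```python
import numpy as np


def box_denoise(img, k=3):
    """Stand-in denoiser: k x k mean filter with edge padding (2-D input)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def noise_residual(img, denoise=box_denoise):
    """W = I - F(I): what remains after the denoised content is removed."""
    img = np.asarray(img, dtype=np.float64)
    return img - denoise(img)


def estimate_prnu(images, denoise=box_denoise, eps=1e-12):
    """Assumed ML estimate K = sum(W_i * I_i) / sum(I_i ** 2), per pixel."""
    num = np.zeros(images[0].shape, dtype=np.float64)
    den = np.zeros(images[0].shape, dtype=np.float64)
    for img in images:
        img = np.asarray(img, dtype=np.float64)
        num += noise_residual(img, denoise) * img
        den += img * img
    return num / (den + eps)
```

Flat, well-lit images are the usual choice for this estimate, because scene content that leaks into the residual acts as noise on the fingerprint.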
After obtaining the PRNU of the device, it is necessary to extract the noise of the test image. CSI-CNN is designed for this purpose: it extracts the noise fingerprint, which is then correlated with the preestimated PRNU value to determine whether the test image belongs to the corresponding device. This section proposes the CSI-CNN network model and describes in detail how the network structure is built and how the model is trained.
The overall network structure of the proposed CSI-CNN is shown in Figure
The middle layers use batch normalization (BN) and convolution kernel stacking. The main reason is that when the network is trained with minibatches, the data distribution differs from batch to batch, so the network must adapt to a different distribution in each iteration, which greatly reduces the training speed. Standardizing the data with BN speeds up training and improves the denoising performance.
Using a stack of full convolution kernels allows the network to accept inputs of any size.
The network structure design uses the bottleneck residual block and uses a
CSI-CNN network structure. The network is fully convolutional and does not change the height and width of the input image; its input is a 3-channel RGB image, and its output is a single-channel noise residual image.
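The shape behaviour stated in the caption can be illustrated with a minimal NumPy sketch. It is not the CSI-CNN itself: batch normalization and the residual connections are omitted, the weights are random, and the layer widths and the names `conv2d_same` and `tiny_fcn` are assumptions made for illustration. It only shows that a stack of stride-1, same-padded convolutions accepts inputs of any size and maps a 3-channel image to a single-channel map of the same height and width.

```python
import numpy as np


def conv2d_same(x, w):
    """Stride-1 convolution with zero 'same' padding.

    x: (C_in, H, W) input, w: (C_out, C_in, k, k) kernels with odd k.
    Returns an array of shape (C_out, H, W).
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for dy in range(k):
        for dx in range(k):
            patch = xp[:, dy:dy + h, dx:dx + wd]  # (C_in, H, W)
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out


def tiny_fcn(x, weights):
    """Conv + ReLU for every layer except the last, which is linear."""
    h = x
    for w in weights[:-1]:
        h = np.maximum(conv2d_same(h, w), 0.0)
    return conv2d_same(h, weights[-1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shapes = [(8, 3, 3, 3), (8, 8, 3, 3), (1, 8, 3, 3)]  # assumed widths
    weights = [0.1 * rng.standard_normal(s) for s in shapes]
    for size in [(32, 48), (17, 23)]:
        rgb = rng.standard_normal((3, *size))
        print(size, "->", tiny_fcn(rgb, weights).shape)
```

Because no layer depends on a fixed spatial size, the same weights can be applied to image blocks during training and to larger images at test time.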
According to the above network construction ideas, the input of CSI-CNN is the image to be tested
The number of network parameters in CSI-CNN.
Network layer name | Parameter number |
---|---|
Convolution layer 1 | |
BatchNorm | |
Activation function ReLU | 0 |
Convolution layer 2 | |
BatchNorm | |
Activation function ReLU | 0 |
Convolution layer 3 | |
BatchNorm | |
Activation function ReLU | 0 |
Convolution layer 4 | |
Function ADD | 0 |
BatchNorm | |
Convolution layer 5 | |
BatchNorm | |
Activation function ReLU | 0 |
Function sub | 0 |
Activation function ReLU | 0 |
Convolution layer 6 | |
BatchNorm | |
Activation function ReLU | 0 |
Convolution layer 7 | |
BatchNorm | |
Activation function ReLU | 0 |
Convolution layer 8 | |
BatchNorm | |
Activation function ReLU | 0 |
Convolution layer 9 | |
Function ADD | 0 |
Convolution layer 10 | |
BatchNorm | |
Figure
CSI-CNN network training process.
The loss function designed in this paper uses the cosine distance to measure the similarity between the network output and the predicted PRNU value and calculates the loss through the idea of segmentation, and finally, uses it to update the parameters in the network. It enables the network to better extract the characteristics of noise fingerprint for camera source identification.
Among them,
This means that
This means that from the same position of the same camera, we should hope that
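As a rough illustration of the cosine-distance part of the loss, the sketch below computes one minus the cosine similarity between a network output and a PRNU fingerprint. It is an assumption-laden simplification: the segmented (piecewise) part of the loss described above is not reproduced, the function name `cosine_distance_loss` is hypothetical, and NumPy is used instead of a deep learning framework so that only the arithmetic is shown.

```python
import numpy as np


def cosine_distance_loss(output, fingerprint, eps=1e-12):
    """1 - cos(output, fingerprint); 0 when aligned, 2 when opposite."""
    o = np.asarray(output, dtype=np.float64).ravel()
    k = np.asarray(fingerprint, dtype=np.float64).ravel()
    cos = (o @ k) / (np.linalg.norm(o) * np.linalg.norm(k) + eps)
    return float(1.0 - cos)
```

Cosine distance ignores the overall scale of the two inputs, so the loss rewards an output whose spatial pattern follows the fingerprint regardless of its amplitude.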
To evaluate the performance of camera source identification of the proposed method, we use the following four datasets for testing.
This dataset was established by [
This dataset comes from a competition on the identification of mobile phone image sources held on the Kaggle [
This dataset was established by [
Due to the small number of pictures of a single camera in the above datasets, the model cannot be trained well. To better estimate the performance of the algorithm proposed in this paper, we use 5 different models of mobile phones, including
This paper preprocesses the collected dataset. First, the central area of every image in the dataset is cropped into blocks; blocks are then randomly selected from the cropped blocks as the input data of CSI-CNN for training.
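The preprocessing can be sketched as follows. The crop size, block size, and number of sampled blocks are not given in the text, so the parameters `crop`, `block`, and `n`, as well as the function names, are illustrative assumptions.

```python
import numpy as np


def center_crop(img, size):
    """Return the central size x size region of an (H, W[, C]) image."""
    h, w = img.shape[:2]
    if size > h or size > w:
        raise ValueError("crop size exceeds image size")
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]


def split_blocks(img, block):
    """Cut an image into non-overlapping block x block tiles."""
    h, w = img.shape[:2]
    return [img[y:y + block, x:x + block]
            for y in range(0, h - block + 1, block)
            for x in range(0, w - block + 1, block)]


def sample_blocks(images, crop, block, n, seed=0):
    """Randomly pick up to n tiles from the central crops of all images."""
    rng = np.random.default_rng(seed)
    pool = [b for img in images
            for b in split_blocks(center_crop(img, crop), block)]
    idx = rng.choice(len(pool), size=min(n, len(pool)), replace=False)
    return [pool[i] for i in idx]
```

For example, `sample_blocks(images, crop=64, block=32, n=5)` draws five 32 x 32 tiles from the 64 x 64 central regions of the given images.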
During the experiment, this paper selects different control methods and evaluation indicators according to different experimental purposes, and all comparison methods are experimented on the datasets used in this paper.
When evaluating the denoising model, this paper compares with the wavelet filter denoising model and DnCNN [
When evaluating the performance of the CSI-CNN network model, this paper uses accuracy (ACC), receiver-operating characteristic (ROC), and area under curve (AUC) as evaluation indicators, which are defined as follows:
In the ROC curve, the abscissa is the false-positive rate (FPR), and the ordinate is the true-positive rate (TPR).
Among them, TP represents the number of samples taken by a certain camera and classified by the model as belonging to that camera. FP represents the number of samples that do not belong to a certain camera but are classified by the model as belonging to it. TN represents the number of samples that do not belong to a certain camera and are classified by the model as not belonging to it. FN represents the number of samples taken by a certain camera but classified by the model as not taken by it. AUC is the area under the ROC curve. It is equivalent to Mann-Whitney
Among them,
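The evaluation indicators can be illustrated with a short sketch that uses the standard textbook definitions: TPR = TP / (TP + FN), FPR = FP / (FP + TN), and ACC = (TP + TN) / (TP + FP + TN + FN). The AUC is computed through its rank interpretation, the probability that a positive sample scores higher than a negative one, with ties counted as one half. The function names and the example numbers are assumptions for illustration only.

```python
def rates(tp, fp, tn, fn):
    """Return (TPR, FPR, ACC) from the four confusion-matrix counts."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    acc = (tp + tn) / (tp + fp + tn + fn)
    return tpr, fpr, acc


def auc_rank(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    This equals the Mann-Whitney U statistic divided by
    len(pos_scores) * len(neg_scores); ties contribute 0.5.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    print(rates(tp=8, fp=1, tn=9, fn=2))
    print(auc_rank([0.9, 0.8, 0.4], [0.5, 0.3, 0.2]))
```

In the camera identification setting, the positive scores would be correlations between images and their true camera's fingerprint, and the negative scores would be correlations with fingerprints of other cameras.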
The experiments run on the Ubuntu 16.04 LTS operating system with PyTorch 1.0.1 and Python 3.6, using an NVIDIA GeForce GTX 1080 Ti graphics card.
This paper uses four datasets and uploads them to five social platforms to obtain twenty different datasets. Experiments are performed on these datasets, respectively, and the correlation between the noise fingerprint obtained by CSI-CNN and the PRNU camera fingerprint is estimated by Section
To perform camera source identification on image data from social network platforms, noise fingerprints must first be extracted from the images. The quality of noise fingerprint extraction directly affects the performance of camera source identification. On our dataset, 200 images are randomly drawn from the test set of each camera, and the mean correlation coefficient with each camera’s fingerprint is computed. The results are shown in Table
Correlation calculation results on our dataset.
Image (rows) / Model (columns) | Honor 10 | iPhone 6 | Nubia Z17 | Redmi note8 | Galaxy S5 | Wavelet |
---|---|---|---|---|---|---|
Honor 10 | 0.0767 | -0.0068 | -0.0040 | 0.0058 | -0.0005 | 0.0591 |
iPhone 6 | -0.0067 | 0.0990 | -0.0051 | 0.0026 | -0.0052 | 0.0777 |
Nubia Z17 | -0.0047 | -0.0056 | 0.0670 | 0.0002 | 0.0013 | 0.0474 |
Redmi note8 | 0.0035 | -0.0027 | 0.0051 | 0.1033 | 0.0024 | 0.0821 |
Galaxy S5 | -0.0136 | -0.0250 | 0.0042 | 0.0088 | 0.2610 | 0.2044 |
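The table can be read programmatically with the short sketch below. It copies the five camera-fingerprint columns of the table (the Wavelet column is left out) and, assuming the decision rule that an image is attributed to the camera with the largest correlation, checks which camera each row would be assigned to. The variable and function names are illustrative.

```python
cameras = ["Honor 10", "iPhone 6", "Nubia Z17", "Redmi note8", "Galaxy S5"]

# Mean correlation of each camera's test images (keys) with each camera
# fingerprint (list entries, in the order of `cameras`), from the table.
mean_corr = {
    "Honor 10":    [0.0767, -0.0068, -0.0040, 0.0058, -0.0005],
    "iPhone 6":    [-0.0067, 0.0990, -0.0051, 0.0026, -0.0052],
    "Nubia Z17":   [-0.0047, -0.0056, 0.0670, 0.0002, 0.0013],
    "Redmi note8": [0.0035, -0.0027, 0.0051, 0.1033, 0.0024],
    "Galaxy S5":   [-0.0136, -0.0250, 0.0042, 0.0088, 0.2610],
}


def best_match(row):
    """Name of the camera whose fingerprint correlates most with the row."""
    return cameras[max(range(len(row)), key=row.__getitem__)]


predicted = {source: best_match(row) for source, row in mean_corr.items()}

if __name__ == "__main__":
    for source, guess in predicted.items():
        print(f"{source:12s} -> {guess}")
```

In every row the largest mean correlation lies on the diagonal, so each group of images is attributed to the camera that took it, while the off-diagonal values stay close to zero.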
To test the performance of the CSI-CNN camera source identification method proposed in this paper, we perform camera source identification by correlating the noise fingerprint extracted by CSI-CNN with the corresponding camera fingerprint estimated in Equation (
(a–f) Heat maps of the correlation coefficients obtained by uploading our dataset to five platforms and randomly taking two pictures from each camera in the dataset.
In order to better analyze and evaluate the proposed CSI-CNN camera source identification algorithm, we use the representative indicators of deep learning-related performance evaluation to analyze and evaluate its performance.
Figure
(a–d) The histogram representation of AUC values obtained by applying three different algorithms to four datasets on five platforms.
In this paper, the camera with the largest correlation coefficient with the image is used as the source camera, and on this basis, the ACC value is calculated. Table
Accuracy comparison (ACC).
Dataset | Platform | CSI-CNN | Wavelet | DnCNN |
---|---|---|---|---|
Our | | 0.950 | 0.887 | 0.453 |
Our | | 0.864 | 0.792 | 0.580 |
Our | | 0.836 | 0.708 | 0.552 |
Our | | 0.803 | 0.727 | 0.423 |
Our | | 0.937 | 0.843 | 0.460 |
Kaggle | | 0.936 | 0.928 | 0.508 |
Kaggle | | 0.928 | 0.880 | 0.524 |
Kaggle | | 0.840 | 0.756 | 0.492 |
Kaggle | | 0.813 | 0.808 | 0.436 |
Kaggle | | 0.892 | 0.864 | 0.508 |
Vision | | 0.744 | 0.696 | 0.384 |
Vision | | 0.708 | 0.676 | 0.352 |
Vision | | 0.660 | 0.560 | 0.372 |
Vision | | 0.648 | 0.548 | 0.368 |
Vision | | 0.728 | 0.604 | 0.368 |
Daxing | | 0.832 | 0.780 | 0.420 |
Daxing | | 0.836 | 0.880 | 0.524 |
Daxing | | 0.840 | 0.768 | 0.424 |
Daxing | | 0.764 | 0.652 | 0.408 |
Daxing | | 0.808 | 0.768 | 0.428 |
To further evaluate the performance of the algorithm designed in this paper, we plot the ROC curves of camera source identification (as shown in Figure
The ROC curves obtained by applying three different algorithms to four datasets on five platforms. (a–d) Our, Vision, Kaggle, and Daxing datasets.
To improve the accuracy of camera source recognition, this paper designs a new loss function. To test its effectiveness, we use the Daxing dataset for training and testing, with a training-to-test ratio of 3 : 1. The initial learning rate is 0.001, training runs for 100 epochs, and every 30 epochs the learning rate is multiplied by 0.2. As shown in Figure
AUC on each epoch for the mean-square error (MSE) loss function and our loss function, tested on five platforms.
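The training schedule can be made concrete with a small sketch. It reads the description as a step decay, in which the learning rate starts at 0.001 and is multiplied by 0.2 after every 30 epochs over 100 epochs in total; this reading, and the function name `learning_rate`, are assumptions rather than details confirmed by the paper.

```python
def learning_rate(epoch, base_lr=0.001, step=30, gamma=0.2):
    """Step decay: multiply the rate by `gamma` after every `step` epochs."""
    return base_lr * gamma ** (epoch // step)


if __name__ == "__main__":
    for epoch in (0, 29, 30, 60, 90, 99):
        print(epoch, learning_rate(epoch))
```

Under this reading the rate changes three times during the 100 epochs, at epochs 30, 60, and 90. In PyTorch, the same behaviour would typically be obtained with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.2)`.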
Multimedia forensics is an important research topic in the field of computer security. The combination of online social networks and smartphones is of great significance to crime prevention, evidence collection, and the security of IoT devices. In this paper, a CSI-CNN is proposed to extract noise fingerprints from pictures on social networks and match the extracted noise fingerprint with camera fingerprints to identify the camera source. We conduct experiments on five online social network platforms with different image compression levels. The experimental results show that the CSI-CNN network model proposed in this paper achieves better recognition performance than the currently popular DnCNN and wavelet filter camera source recognition algorithms.
With the development of deep learning and the diversification of forensic data, the method proposed in this paper may have some limitations. To overcome these problems, we will use pure deep learning methods to train the features of a large number of heterogeneous forensic data and extend the research object to the video data of social networks.
The labeled datasets used to support the findings of this study can be provided on request.
The authors declare that there are no conflicts of interest regarding the publication of this paper.
The research presented in this paper is supported in part by the National Natural Science Foundation (No. U20B2050) and the Youth Innovation Team of Shaanxi Universities (No. 2019-38).