Calligraphy and Painting Identification 3D-CNN Model Based on Hyperspectral Image MNF Dimensionality Reduction

As a form of cultural art, calligraphy and painting are not only an important part of traditional culture but also have significant value for art collection and trade. The existence of forgeries has seriously affected the fair trade, protection, and inheritance of calligraphy and painting, so there is an urgent need for an efficient, accurate, and intelligent technical identification method. Combining the material attribute recognition and imaging detection capabilities of hyperspectral imaging with the powerful feature expression and classification abilities of the convolutional neural network can greatly improve the overall efficiency of calligraphy and painting identification. Meanwhile, to reduce the redundancy and the number of parameters incurred by using the hyperspectral image directly, an objective convex dimensionality reduction method should be used to compress the original hyperspectral image before deep learning. Based on this, we propose a deep learning method that classifies author and authenticity from the multichannel images obtained by applying minimum noise fraction (MNF) dimensionality reduction to calligraphy and painting hyperspectral data. Its core is a 2D-CNN or 3D-CNN model with a basic network of “4 convolution layers + 4 pooling layers + 2 full-link (fully connected) layers.” The experimental results show that the 2D-CNN with a mosaic of MNF pseudocolor images as input and the 2D-CNN with the multichannel MNF dimensionality reduced images directly as input both achieve high identification accuracy, while the 3D-CNN with the multichannel MNF dimensionality reduced images directly as input not only maintains excellent identification accuracy but also shows better learning convergence (number of steps) and stability than the 2D-CNN models. In particular, the 3D-CNN identification accuracy for the author and the authenticity of calligraphy and painting on the test set reaches 93.2% and 95.2%, respectively.


Introduction
Calligraphy and painting works of art have important cultural relic value and art transaction value. The existence of fake works seriously affects the fairness and justice of the calligraphy and painting art market and the growth of calligraphy and painting culture. Identification by the expert eye is currently the main appraisal method for discovering fake paintings. It relies on the spatial visual information of calligraphy and painting, but the information obtained by the eye is the integrated, accumulated information of visible light, and its spectral resolution is low. In particular, with the continuous improvement of counterfeiting technology, it is difficult for traditional eye recognition to identify high-level counterfeiting means, such as splitting and high-quality copying. A few technical recognition means exist at present, but their accuracy is limited. Hence, there is an urgent need for better scientific and technological identification means [1][2][3].
Spectral imaging is an information acquisition means for attribute recognition and visual perception. For a given substance, different wavelengths may correspond to different spectral values, so a curve relating wavelength to spectral value can be drawn, and substances can be classified according to their different spectral curves. Meanwhile, a spectral image carries distinctive spectral and spatial characteristics. For calligraphy and painting, even when the level of imitation is very high and the spatial information of the fake work is consistent with that of the true one, it is impossible to make the materials completely consistent with the original. So, there are always some spectral differences in ink, color, paper, seal, and other aspects, which can be discovered by spectral imaging technology [4].
For example, a comparison between conventional photographs and hyperspectral feature images of a true painting and a fake painting is shown in Figure 1. In the pictures taken by the camera, the wicks of the true work and the fake work are basically the same, but in the hyperspectral feature images composed of the 930 nm, 780 nm, and 850 nm bands of the original hyperspectral image, there is an obvious difference: the right work shows the wick, while the left one does not.
Hyperspectral image classification refers to assigning labels to different data using the spatial features, spectral features, or spatial-spectral joint features of three-dimensional hyperspectral images, and it includes supervised and unsupervised approaches. For pattern recognition and classification based on calligraphy and painting hyperspectral images, the usual approach is to compute on spectral or spatial features with methods such as spectral angle matching, support vector machine, particle swarm optimization [5], maximum likelihood, clustering, and decision tree. This kind of method is better suited to situations with simple features. For classification problems with complex features or high-dimensional data, such feature analysis is incomplete or leaves the classes inseparable, which affects the efficiency and accuracy of classification. It is therefore necessary to combine the previous methods with data dimensionality reduction or sparse constraints, or to find more efficient classification models, such as the method based on locality preserving dimension reduction and the Gaussian mixture model [6], the model based on the sparse least squares support vector machine (LSSVM) [7], classification by low-rank and sparse representation with adaptive neighborhood regularization [8], and Harris Hawks optimization with principal component analysis (PCA) for dimensionality reduction [9]. For hyperspectral image dimensionality reduction, in addition to the aforementioned PCA, band selection, feature fusion, projection methods, and other data transformations can also be used [10]. Meanwhile, some shallow networks, such as the deep belief network and the stacked autoencoder network, can be used to avoid manual feature extraction. Their input is a one-dimensional vector, however, which makes them inefficient for the feature expression and classification of hyperspectral data [11,12].
With the development of intelligent data analysis, the convolutional neural network (CNN) has become a helpful method in machine learning. It expresses the features of high-dimensional data through the convolution operation and significantly reduces the number of parameters through local connectivity and weight sharing [13][14][15][16][17][18].
Generally, CNNs use three main types of convolution kernel: one-dimensional (1D-CNN), two-dimensional (2D-CNN), and three-dimensional (3D-CNN). Their network structures are basically similar, but the convolution kernels are different [19][20][21]. 1D-CNN mainly extracts spectral features and then classifies them without considering spatial features. When hyperspectral images exhibit phenomena such as "the same material with different spectra" and "different materials with the same spectrum," it is difficult to obtain good classification results from spectral information alone. The most essential difference between 2D-CNN and 1D-CNN is that the convolution and pooling of 2D-CNN are two-dimensional operations. Therefore, 2D-CNN can directly extract the spatial features of hyperspectral images or of the dimensionality-reduced data for classification. Whether it uses characteristic spectral data or dimensionality-reduced images, it mainly uses the target's spatial features, supplemented by some spectral features [22,23].
There are two main methods for classification based on spatial-spectral joint features in hyperspectral data. The first reduces the dimension of the hyperspectral data and extracts its two-dimensional features with a 2D-CNN, then extracts spectral information with a 1D-CNN or a traditional method, and finally fuses the spatial and spectral features to complete the classification. The second uses a 3D-CNN directly to extract the spectral-spatial joint features of hyperspectral images. Compared with the spectral or spatial features extracted by 1D-CNN or 2D-CNN, respectively, the features extracted by 3D-CNN are high-dimensional, global features, which are more holistic [24,25].
It should be mentioned that calligraphy and painting identification based on hyperspectral images is different from conventional hyperspectral remote sensing classification. The purpose of traditional hyperspectral remote sensing classification is mainly to classify and recognize different pixels in a single image or in multiple images, while calligraphy and painting recognition based on hyperspectral images classifies and recognizes the label corresponding to a whole hyperspectral image through the analysis of many hyperspectral images; in other words, the latter is a whole-image classification. Therefore, the computational complexity of calligraphy and painting identification based on hyperspectral images is generally higher than that of comparable remote sensing image classification. In practical learning, it requires dimensionality reduction of the hyperspectral images or parameter reduction of the model [26,27].
To sum up, identification by an expert eye using manual features suffers from problems such as time consumption, laborious manual work, and incomplete feature construction. Combining hyperspectral imaging with machine learning can greatly improve the overall efficiency of calligraphy and painting identification. However, because hyperspectral data can be used for machine learning as a whole without manual feature extraction, and because the different channels of hyperspectral images are redundant, the computational complexity of machine learning on whole hyperspectral images is high. To reduce the redundancy and the number of network parameters, in this paper we propose a deep learning method for painting and calligraphy identification based on hyperspectral data, with an objective convex dimensionality reduction for compressing the original hyperspectral image.
Computational Intelligence and Neuroscience

Models of Calligraphy and Painting Identification Based on Hyperspectral Image Dimensionality Reduction
Suppose that, in the hyperspectral image domain of calligraphy and painting, an element can be expressed in an n-dimensional space, namely, $x \in X(\mathbb{R}^n)$. Painting identification based on hyperspectral dimensionality reduced images then amounts to finding a dimensionality reduction mapping function c(·) on this n-dimensional space such that different kinds of painting can be separated in the mapped, dimensionality reduced space. Taking the classification of different calligraphy and painting authors as an example, for the hyperspectral data of any two authors, $x_{t1} \in X(\mathbb{R}^n)$ and $x_{t2} \in X(\mathbb{R}^n)$, the requirement can be written as

$$f\big(c(x^{i}_{t1}),\, c(x^{j}_{t1})\big) < f\big(c(x^{i}_{t1}),\, c(x^{k}_{t2})\big)$$

Here, $x^{i}_{t1}$ and $x^{k}_{t2}$ represent the ith sample and the kth sample in the data sets of two different authors; $x^{i}_{t1}$ and $x^{j}_{t1}$ represent the ith sample and the jth sample in the data set of the same author. In other words, the dimensionality reduction mapping c(·) maps the calligraphy and painting data from the original n-dimensional space into a new lower dimensional space in which mapped data of the same author have the smallest measure under the classification operator f(·).
The method of calligraphy and painting identification based on hyperspectral dimensionality reduced images can then be divided into two steps. First, find a convex data dimensionality reduction mapping under which the dimensionality reduced data are in one-to-one correspondence with the original space. Second, design the subspace partition function in the dimensionality reduced space so that the partitions it induces have no intersection, or the smallest possible intersection.
Here, hyperspectral data dimensionality reduction is a projection method in a continuous space, which can be expressed as follows:

$$c: X(\mathbb{R}^n) \rightarrow Y(\mathbb{R}^m), \quad m < n$$

That is, any n-dimensional data can be projected into a new lower dimensional space by a unified continuous feature map c(·), where the new space need not be complete but is at least convex. Hence, the first task in this paper is to find a space dimensionality reduction transformation so as to construct a calligraphy and painting identification model based on the dimension reduced data.

Calligraphy and Painting Hyperspectral Image Dimensionality Reduction Based on MNF Transformation
Hyperspectral data dimensionality reduction needs to meet the following conditions: first, the feature information of the data should be preserved as much as possible; second, the data redundancy and correlation should be removed. The mainstream dimensionality reduction methods for hyperspectral data are spectral band selection and dimensionality reduction based on data transformation. Band selection picks a smaller number of channels from the original data while keeping as much of the original spectral information as possible in the selected channels. Manual selection, however, may inevitably lead to a situation in which the dimensionality reduced space and the original space do not satisfy a one-to-one mapping. Feature extraction applies a mathematical transformation to the original data, which is more suitable for the dimensionality reduction of high-dimensional data. Transformation-based dimensionality reduction compresses the search space with a continuous dimensionality reduction mapping. Common transformation methods include principal component analysis (PCA), minimum noise fraction (MNF) [28], and independent component analysis (ICA). PCA is a commonly used dimensionality reduction method for hyperspectral data. Its essence is to rank components by variance and to retain the principal components with the larger eigenvalues. In fact, only when the noise of the original data is independent of the data and the noise variance is the same in all channels are the principal components sorted by variance consistent with the components sorted by noise level. Therefore, a dimensionality reduction method that can eliminate the influence of noise is required. MNF solves this problem with a two-layer PCA and is selected for hyperspectral dimensionality reduction in the following study. It enables the transformed components to be selected according to the signal-to-noise ratio, so the noise in the transformed components is removed more accurately.
Assume that the original calligraphy and painting hyperspectral image F is composed of an ideal image $F_0$ and a noise image N, separated by filter processing, which can be expressed as follows:

$$F = F_0 + N$$

Because the filtered ideal spectral data and the noise data are uncorrelated, let the covariance matrices of the ideal image and the noise image be $C_0$ and $C_N$; the covariance matrix of the original hyperspectral image F can then be written as follows:

$$C_F = C_0 + C_N$$

The noise fraction (NF) is defined as the ratio of the noise variance to the total variance of the data; that is, for the linear combination $F\alpha$ of the original spectral data F, the noise fraction is as follows:

$$\mathrm{NF}(\alpha) = \frac{\alpha^{T} C_N \alpha}{\alpha^{T} C_F \alpha}$$

Here, $\alpha = [\alpha_1, \alpha_2, \cdots, \alpha_\lambda]^{T}$ is the λ-dimensional linear representation vector. Similarly, the signal-to-noise ratio (SNR) is defined as the ratio of the variance of the filtered spectral data to the variance of the noise, that is,

$$\mathrm{SNR}(\alpha) = \frac{\alpha^{T} C_0 \alpha}{\alpha^{T} C_N \alpha} = \frac{1}{\mathrm{NF}(\alpha)} - 1 \qquad (6)$$

It can be seen from equation (6) that the minimum noise fraction (NF) is obtained by maximizing the signal-to-noise ratio (SNR). That is, the essence of MNF is to maximize the following formula:

$$\max_{\alpha} \; \frac{\alpha^{T} C_F \alpha}{\alpha^{T} C_N \alpha}$$

Therefore, the minimum noise fraction (MNF) transformation can be divided into two steps: (1) a noise unitization transformation, after which the noise term $\alpha^{T} C_N \alpha$ corresponds to a unit (identity) matrix; (2) a spectral principal component transformation, which finds the principal components corresponding to $\alpha^{T} C_F \alpha$. The specific implementation process of the MNF transformation is given as follows.
Noise unitization transformation: the covariance matrix $C_N$ of the noise data N is obtained by noise estimation and then diagonalized. Its diagonalization matrix $D_N$ can be expressed as follows:

$$D_N = U^{T} C_N U$$

In the formula, $D_N$ is a diagonal matrix consisting of the eigenvalues of $C_N$ in descending order, and U is an orthogonal matrix consisting of the corresponding eigenvectors. Furthermore, it is expressed as follows:

$$E = P^{T} C_N P$$

where E is the identity matrix and $P = U D_N^{-1/2}$. The spectral data F are then transformed along the spectral dimension:

$$F' = F P$$

Here, the original spectral data are projected into the transformation space, the noise of the transformed data has unit variance, and the noise in different components is uncorrelated.

Principal component transformation: the covariance matrix $C_F$ of the spectral data F is transformed as follows:

$$C_{F\text{-}adj} = P^{T} C_F P$$

Then, we diagonalize the noise-whitened matrix $C_{F\text{-}adj}$ to obtain the diagonalization matrix $D_{F\text{-}adj}$, which is expressed as follows:

$$D_{F\text{-}adj} = V^{T} C_{F\text{-}adj} V$$

In the formula, $D_{F\text{-}adj}$ is a diagonal matrix formed by the eigenvalues of $C_{F\text{-}adj}$ in descending order, and V is an orthogonal matrix formed by the corresponding eigenvectors. The noise-unitized spectral data $F'$ obtained in the first step are then passed through the principal component transformation of the second step:

$$I_{mnf} = F' V = F P V$$

This gives the MNF transformation data $I_{mnf}$ of the spectral data F. The dimensionality-reduced components obtained in this way are ordered without the influence of noise and are less sensitive to noise than ordinary principal components.
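The two-step procedure above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `estimate_noise` and `mnf_transform` are hypothetical, the cube layout (height, width, bands) is assumed, and the shift-difference noise estimate is an assumption, since the text does not say which noise estimator was used.

```python
import numpy as np

def estimate_noise(cube):
    """Assumed noise estimator: differences of horizontally adjacent pixels."""
    diff = cube[:, 1:, :] - cube[:, :-1, :]
    # Dividing by sqrt(2) keeps the variance of i.i.d. noise unchanged.
    return diff.reshape(-1, cube.shape[2]) / np.sqrt(2.0)

def mnf_transform(cube):
    """Return (I_mnf, eigenvalues, P, C_N) for a cube of shape (h, w, bands)."""
    h, w, bands = cube.shape
    cube = cube.astype(np.float64)
    F = cube.reshape(-1, bands)
    F = F - F.mean(axis=0)                    # center the spectral data
    C_F = np.cov(F, rowvar=False)
    C_N = np.cov(estimate_noise(cube), rowvar=False)

    # Step 1: noise unitization, P = U D_N^{-1/2}, so that P^T C_N P = E.
    d_n, U = np.linalg.eigh(C_N)
    order = np.argsort(d_n)[::-1]             # descending eigenvalues
    d_n, U = np.maximum(d_n[order], 1e-12), U[:, order]
    P = U / np.sqrt(d_n)                      # scales column j by d_n[j]^{-1/2}

    # Step 2: principal components of the noise-whitened covariance.
    C_adj = P.T @ C_F @ P
    e, V = np.linalg.eigh(C_adj)
    order = np.argsort(e)[::-1]
    e, V = e[order], V[:, order]

    I_mnf = (F @ P @ V).reshape(h, w, bands)  # I_mnf = F P V
    return I_mnf, e, P, C_N
```

The returned components are ordered by decreasing eigenvalue of the noise-whitened covariance, so keeping the first few channels of `I_mnf` keeps the components with the highest estimated signal-to-noise ratio.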
Generally, the MNF transformation yields as many MNF images as there are spectral bands. To achieve band compression or dimensionality reduction, the first few transformed images, which carry the largest amount of information, are selected as the input data for identification. In practical application, to select an appropriate number of MNF channels, the characteristic contribution rate curve is calculated as a function of the number of MNF channels. The characteristic contribution rate of the first i channels of the MNF images is expressed as follows:

$$\eta_i = \frac{\sum_{k=1}^{i} e_k}{\sum_{k=1}^{m} e_k}$$

Here, $e = [e_1, e_2, \cdots, e_m]$ is the vector of eigenvalues on the diagonal of the matrix $D_{F\text{-}adj}$. In engineering applications, the characteristic contribution rate of the dimensionality reduced data is generally required to reach 85%.
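As an illustration of the selection rule, the cumulative contribution rate and the smallest channel count that reaches a threshold can be computed as below. The helper names `contribution_rate` and `select_channels` are hypothetical, and the eigenvalues in the usage note are made-up numbers chosen only to show the arithmetic.

```python
import numpy as np

def contribution_rate(eigenvalues):
    """Cumulative contribution rate of the first i channels, for i = 1..m."""
    e = np.asarray(eigenvalues, dtype=float)
    return np.cumsum(e) / e.sum()

def select_channels(eigenvalues, threshold=0.85):
    """Smallest number of leading channels whose rate reaches the threshold."""
    rates = contribution_rate(eigenvalues)
    # searchsorted returns the first index whose rate is >= threshold.
    count = int(np.searchsorted(rates, threshold)) + 1
    return min(count, len(rates))
```

For example, eigenvalues of 6, 2, 1, and 1 give cumulative rates of 0.6, 0.8, 0.9, and 1.0, so three channels are needed to pass the 85% level.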

Convolution Neural Network Based on Multichannel Dimensionality Reduced Images
Based on the above, the MNF-transformed feature images are still multichannel, but the number of feature image channels carrying effective information becomes smaller, which makes modeling and learning more convenient. In this section, convolutional neural networks based on the dimensionality reduced data are discussed for different convolution kernels.

Convolution Neural Network Based on Multichannel Dimensionality Reduced Images
With the in-depth development of machine learning technology, a series of representative networks have been designed. As one of the classic networks, the visual geometry group (VGG) network improves classification performance to a certain extent by increasing the network depth. The VGG network has two common structures [29,30], namely, VGG16 and VGG19, which differ in network depth. Compared with earlier classical networks, one improvement of the VGG network is to replace a larger convolution kernel with several consecutive smaller convolution kernels. In this paper, following the design of the VGG network, a new deep network is presented as shown in Figure 2. Its depth increases and the width of each layer decreases in equal steps, so as to ensure a stable number of parameters and sufficient capacity for complex feature extraction in the feature extraction stage. The model mainly includes an input layer, four convolution layers, four pooling layers, two full-link layers, and an output layer. The detailed network design is discussed separately for the different models below.
Assume that the input data of the calligraphy and painting appraisal network are X and that the output of the network is y. For the current ith layer with input $x_i$, the input of the (i + 1)th layer can be expressed as follows:

$$x_{i+1} = f_i(w_i x_i + b_i)$$

Here, $w_i$ is the weight of the ith layer, $b_i$ is the offset (bias), and $f_i$ is the excitation (activation) function.
To increase the nonlinear expression of the network, the ReLU function is preferentially selected as the excitation function of the convolution layers and full-link layers. Its expression is as follows:

$$f(x) = \max(0, x)$$

Furthermore, to reduce the parameters and the feature dimension, max pooling is preferred; meanwhile, the softmax function is adopted in the output layer.
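The network's output then predicts the probability of belonging to each class in the current iteration. The specific calculation formula is as follows:

$$p_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \quad j = 1, \ldots, K$$

where $z_j$ is the jth input of the output layer, K is the number of classes, and the factor $1/\sum_{k=1}^{K} e^{z_k}$ performs the normalization so that the probabilities of all classes sum to 1.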
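The layer recursion, ReLU, and softmax described above can be sketched as follows. The function names are illustrative only; subtracting the row maximum inside `softmax` is a standard numerical-stability step that is not mentioned in the text and does not change the result.

```python
import numpy as np

def relu(x):
    """ReLU excitation: element-wise max(0, x)."""
    return np.maximum(0.0, x)

def softmax(z):
    """Softmax over the last axis; each row of the result sums to 1."""
    z = z - z.max(axis=-1, keepdims=True)   # stability shift, result unchanged
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def layer_forward(x, w, b, activation):
    """One full-link layer: x_{i+1} = f_i(x_i w_i + b_i) for row-vector inputs."""
    return activation(x @ w + b)
```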
The training process of a CNN is mainly divided into two parts: forward propagation and backward propagation. During forward propagation, the previous formulas are used to calculate the classification results with the current network parameters. Backward propagation is used to update the training parameters so as to minimize the difference between the current classification results and the target classification, which is defined as follows:

$$J(w) = -\frac{1}{m} \sum_{t=1}^{m} \sum_{j} 1\{y^{(t)} = j\} \log p_j^{(t)}$$

where m is the number of training samples, y is the target output, and $p_j^{(t)}$ is the probability, predicted by the softmax output layer, that the tth training sample $x^{(t)}$ belongs to class j. The indicator $1\{y^{(t)} = j\}$ equals 1 if class j is the same as the real label and 0 otherwise.
In network training, the gradient descent algorithm is used to update the parameters. This requires the gradient of the cost function J(w):

$$\nabla_w J(w) = \frac{\partial J(w)}{\partial w}$$

Then, the parameter iteration is carried out through the formula $w = w - \alpha \cdot \nabla_w J(w)$, where α is the learning rate.
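To make the update rule concrete, the sketch below applies it to a single softmax output layer, for which the gradient of the cross-entropy cost has the well-known closed form X^T (P - Y) / m. This is an illustration on one layer only, not the full backward pass of the CNN, and all names (`gradient_step`, `cross_entropy`, `lr`) are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(p, y):
    """Mean negative log-probability of the true class (y holds class indices)."""
    m = y.shape[0]
    return -np.mean(np.log(p[np.arange(m), y] + 1e-12))

def gradient_step(w, b, x, y, lr):
    """One update w = w - lr * grad J(w) for a softmax output layer."""
    m = x.shape[0]
    p = softmax(x @ w + b)                 # forward propagation
    onehot = np.zeros_like(p)
    onehot[np.arange(m), y] = 1.0          # indicator 1{y = j}
    grad_z = (p - onehot) / m              # dJ/dz for softmax + cross-entropy
    w_new = w - lr * (x.T @ grad_z)
    b_new = b - lr * grad_z.sum(axis=0)
    return w_new, b_new, cross_entropy(p, y)
```

Calling `gradient_step` repeatedly on the same batch returns the cost before each update, which should fall as long as the learning rate is small enough.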

2D-CNN Model Based on Multichannel Dimensionality Reduced Image
The 2D-CNN model based on multichannel dimensionality reduced images is also composed of an input layer, convolution layers, pooling layers, full-link layers, and an output layer. Its characteristic is that the channels of the multichannel data are convolved separately and then summed, as described below. Figure 3 shows that in the 2D-CNN convolution layer, the convolution kernel moves in two spatial directions. When the input is a color image, the input of the 2D-CNN model has three channels. To use it for hyperspectral images or for multichannel dimensionality reduced images derived from them, the channel number is changed from 3 to a greater value. Let the channel number of the calligraphy and painting multichannel dimensionality reduced images be m, so that the size of the input image can be expressed as (m, h, w) and the size of the convolution kernel as (m, P, Q). The convolution kernel then performs a sliding window convolution over all values in a window (P, Q) on each of the m channels. When all the sliding window calculations are completed, the values calculated in the different channels are summed to obtain a feature image.
For a multichannel image, one 2-dimensional convolution kernel yields one feature image; that is, the weighted products of the different channel images with one convolution kernel are combined into a single feature image. When n convolution kernels are used, a group of n feature images is obtained. The calculation formula of the convolution layer can then be expressed as follows:

$$v_{i,j}^{x,y} = f\left(b_{i,j} + \sum_{s} \sum_{p=0}^{P_i - 1} \sum_{q=0}^{Q_i - 1} w_{i,j,s}^{p,q} \, v_{i-1,s}^{x+p,\,y+q}\right)$$

where $v_{i,j}^{x,y}$ represents the value of the neuron at position (x, y) in the jth feature image of the ith layer; f is the activation function; the height and width of the convolution kernel are $P_i$ and $Q_i$, respectively; $w_{i,j,s}^{p,q}$ is the weight at kernel position (p, q) that links the sth feature image of the (i − 1)th layer to the jth feature image of the ith layer, and it is multiplied by the value at position (x + p, y + q) in the sth feature image of the (i − 1)th layer before summation; and $b_{i,j}$ indicates the bias.
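A direct, loop-based reading of this formula is sketched below for clarity rather than speed. The function name `conv2d_multichannel` and the array layouts are assumptions made for the example, and the sketch uses no zero padding, so each output shrinks by the kernel size minus one, unlike the padded layers used later in the paper.

```python
import numpy as np

def conv2d_multichannel(x, w, b, f):
    """x: (S, H, W) input channels; w: (n, S, P, Q) kernels; b: (n,) biases."""
    n, S, P, Q = w.shape
    _, H, W = x.shape
    out = np.zeros((n, H - P + 1, W - Q + 1))
    for j in range(n):                       # one feature image per kernel
        for xx in range(H - P + 1):
            for yy in range(W - Q + 1):
                window = x[:, xx:xx + P, yy:yy + Q]
                # Multiply per channel, then sum over channels s and offsets p, q.
                out[j, xx, yy] = np.sum(w[j] * window) + b[j]
    return f(out)
```

Each kernel spans all S input channels and collapses them into one two-dimensional feature image, which is the summation over channels that distinguishes the 2D case.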

3D-CNN Model Based on Multichannel Dimensionality Reduced Image
The input of the 3D-CNN model is high-dimensional data, the convolution kernels are also high-dimensional, and features can be extracted in three directions. Hyperspectral images or hyperspectral dimensionality reduced images are therefore processed by the convolution kernel in three directions: the model no longer extracts spatial features or spectral features separately but captures the spatial and spectral features as a whole.
Similar to the 2D-CNN model, the 3D-CNN model is composed of an input layer, 3D convolution layers, 3D pooling layers, full-link layers, and an output layer. Unlike 2D-CNN, the characteristic of 3D-CNN is that the multichannel data are convolved along the channel direction as well, rather than summed over all channels, as discussed below. Suppose that the size of the input data is (m, h, w), the number of convolution kernels is n, and the size of the convolution kernel is (n, P, Q, R). Figure 4 shows the convolution layer of the 3D-CNN model.
It can be seen that the 3D convolution operation no longer processes the multiple channels of data as the 2D convolution kernel does; instead, it computes weighted products directly over a local neighborhood of the multichannel data, without summing over all channels. In this structure, each feature in the convolution layer is connected with several adjacent channels of data in the previous layer, and a 3D feature is obtained through the 3D sliding window convolution operation. Therefore, it has a better overall ability to express spatial-spectral features.
The calculation formula of 3D convolution is then

$$v_{i,j}^{x,y,z} = f\left(b_{i,j} + \sum_{s} \sum_{p=0}^{P_i - 1} \sum_{q=0}^{Q_i - 1} \sum_{r=0}^{R_i - 1} w_{i,j,s}^{p,q,r} \, v_{i-1,s}^{x+p,\,y+q,\,z+r}\right)$$

where $v_{i,j}^{x,y,z}$ represents the value of the neuron at position (x, y, z) in the jth feature data of the ith layer; f is the activation function; the length, height, and width of the convolution kernel are $P_i$, $Q_i$, and $R_i$, respectively; $w_{i,j,s}^{p,q,r}$ is the weight at kernel position (p, q, r) that links the sth feature data of the (i − 1)th layer to the jth feature data of the ith layer, and it is multiplied by the value of the sth feature data at position (x + p, y + q, z + r) of the (i − 1)th layer before summation; and $b_{i,j}$ indicates the bias.
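The same loop-based reading extends to three directions. In the sketch below, the name `conv3d` and the array layouts are assumptions: the input holds S feature cubes (a single cube, S = 1, when the MNF stack is fed in directly), and no zero padding is applied.

```python
import numpy as np

def conv3d(x, w, b, f):
    """x: (S, D, H, W) feature cubes; w: (n, S, P, Q, R) kernels; b: (n,)."""
    n, S, P, Q, R = w.shape
    _, D, H, W = x.shape
    out = np.zeros((n, D - P + 1, H - Q + 1, W - R + 1))
    for j in range(n):                        # one feature cube per kernel
        for xx in range(D - P + 1):
            for yy in range(H - Q + 1):
                for zz in range(W - R + 1):
                    window = x[:, xx:xx + P, yy:yy + Q, zz:zz + R]
                    out[j, xx, yy, zz] = np.sum(w[j] * window) + b[j]
    return f(out)
```

Because the kernel also slides along the first data axis, the output keeps a depth dimension instead of collapsing all channels into one feature image.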
In addition, the pooling layer of the 3D-CNN model also differs from that of the 2D-CNN model: it is realized by 3D downsampling, such as the common 3D max pooling and 3D mean pooling.
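A compact way to express non-overlapping 3D max pooling on a single feature cube is a reshape followed by a maximum over the window axes. The function name `max_pool3d` is hypothetical, and trailing voxels that do not fill a complete window are dropped here, which is one of several possible boundary conventions.

```python
import numpy as np

def max_pool3d(x, k=2):
    """Non-overlapping k x k x k max pooling of a cube x with shape (D, H, W)."""
    d, h, w = (s // k for s in x.shape)
    x = x[:d * k, :h * k, :w * k]             # drop incomplete windows
    blocks = x.reshape(d, k, h, k, w, k)      # split each axis into (cells, window)
    return blocks.max(axis=(1, 3, 5))
```

Replacing `max` with `mean` on the same axes gives 3D mean pooling.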

Calligraphy and Painting Identification CNN Model Based on Multichannel Dimensionality Reduced Images
Based on the foregoing comparison of different model designs and the corresponding analysis, a basic network consisting of "4 convolution layers + 4 pooling layers + 2 full-link layers" is obtained. In this section, the different types of convolution kernel and the different input designs of the calligraphy and painting identification CNN model are discussed first; the optimal calligraphy and painting identification CNN model is then selected through the experimental comparison in the next section.

2D-CNN Calligraphy and Painting Identification Model under Multiple MNF Pseudocolor Images of Mosaic as Input
For calligraphy and painting identification based on multichannel dimensionality reduced images with a single 2D-CNN, considering that different channels contain different information, multiple groups of calligraphy and painting MNF pseudocolor images are spliced together and fed into a single 2D-CNN. In this paper, the MNF pseudocolor images $I_{mnf(i,j,k)}$, each composed of three different MNF channels, are normalized and then stitched together in the column direction to obtain new integrated data, which conveniently serves as the input of the 2D-CNN model, that is,

$$I_{in} = \mathrm{concat}\big(\mathrm{norm}(I_{mnf(i_1,j_1,k_1)}),\, \mathrm{norm}(I_{mnf(i_2,j_2,k_2)}),\, \ldots,\, 2\big)$$

where concat represents the stitching of multiple matrices, norm represents normalization, and 2 indicates stitching along the column direction.
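The mosaic construction can be sketched as follows. This is only one plausible reading of the description: the helper names, the (height, width, channels) layout, and the choice of min-max normalization applied to each pseudocolor image as a whole are all assumptions, since the text does not specify the normalization. Column stitching corresponds to `axis=1` in NumPy.

```python
import numpy as np

def normalize(img):
    """Assumed min-max normalization of one pseudocolor image to [0, 1]."""
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros_like(img, dtype=float)
    return (img - lo) / (hi - lo)

def mosaic(mnf, triples):
    """Stitch pseudocolor images side by side.

    mnf: (h, w, m) MNF stack; triples: channel index triples such as (0, 1, 2).
    """
    tiles = [normalize(mnf[:, :, list(t)]) for t in triples]
    return np.concatenate(tiles, axis=1)      # column-direction stitching
```

With two triples taken from a stack of height h and width w, the result is a three-channel image of size h by 2w, which an ordinary color-image 2D-CNN can take as input.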
In the application, the specific model design is as follows: the first convolution layer uses a 5 × 5 convolution kernel to generate 32 feature images, the second convolution layer uses a 5 × 5 convolution kernel to generate 64 feature images, the third convolution layer uses a 3 × 3 convolution kernel to generate 128 feature images, and the fourth convolution layer uses a 3 × 3 convolution kernel to generate 128 feature images. All convolution layers use zero padding, so the height and width of the feature images are unchanged by convolution. The pooling layers use a 2 × 2 window for max pooling. The step size of the convolution layers is (1, 1), and the step size of the pooling layers is (2, 2). The two full-link layers have 1024 and 512 dimensions, respectively, and the output layer is trained separately for the classification of calligraphy and painting authors and for the identification of calligraphy and painting authenticity. Here, the classification of calligraphy and painting authors refers to judging the author of a work through data analysis, and the authenticity identification of calligraphy and painting refers to judging whether a work is true or fake through data analysis.
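To see what this design implies for feature sizes and parameter counts, the short sketch below traces shapes through the four convolution and pooling stages. It assumes that a pooling layer follows each convolution layer, and the input size and class count in the usage note are hypothetical values, not figures from the paper.

```python
def trace_2d_cnn(channels, height, width, n_classes):
    """Trace feature shapes and count trainable parameters of the 2D design."""
    convs = [(5, 32), (5, 64), (3, 128), (3, 128)]   # (kernel size, feature images)
    shapes, params = [], 0
    for k, n_out in convs:
        params += (k * k * channels + 1) * n_out     # weights plus one bias each
        channels = n_out
        # Zero padding keeps the size; 2 x 2 pooling with stride 2 halves it.
        height, width = height // 2, width // 2
        shapes.append((channels, height, width))
    flat = channels * height * width
    for n_out in (1024, 512, n_classes):             # two full-link layers + output
        params += (flat + 1) * n_out
        flat = n_out
    return shapes, params
```

For a hypothetical 3-channel 64 × 64 input and two output classes, the feature images shrink to 4 × 4 after the fourth pooling layer, and most of the parameters sit in the first full-link layer rather than in the convolution layers.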

2D-CNN Calligraphy and Painting Identification Model under Multichannel MNF Dimensionality Reduced Images Directly as Input
When a single 2D-CNN is used for calligraphy and painting identification based on multichannel MNF dimensionality reduced images, the multichannel MNF dimensionality reduced images $I_{mnf}$ can also be input into the 2D-CNN model directly as a whole, instead of using a mosaic of MNF pseudocolor images as input.
Similarly, we keep the unified basic network of "4 convolution layers + 4 pooling layers + 2 full-link layers," and the specific model design is consistent with the previous model.

3D-CNN Calligraphy and Painting Identification Model under Multichannel MNF Dimensionality Reduced Images Directly as Input
In this section, the 3D-CNN model is used directly for modeling and learning. With the multichannel MNF dimensionality reduced images directly as input, 3D convolution kernels are used for the convolution operation so as to exploit the high-dimensional feature data. For convenient comparison with the previous 2D-CNN models, the 3D-CNN design in this section is also composed of an input layer, four 3D convolution layers, four 3D pooling layers, two full-link layers, and an output layer; the convolution layers and pooling layers are connected alternately.
The 3D-CNN calligraphy and painting identification model based on multichannel MNF dimensionality reduced images is designed as follows: the first convolution layer uses a 5 × 5 × 3 convolution kernel to generate 9 feature cubes, the second convolution layer uses a 5 × 5 × 2 convolution kernel to generate 16 feature cubes, the third convolution layer uses a 3 × 3 × 1 convolution kernel to generate 32 feature cubes, and the fourth convolution layer uses a 3 × 3 × 1 convolution kernel to generate 32 feature cubes. All convolution layers use zero padding, so the length, width, and depth of the feature cubes are unchanged by convolution. Max pooling with a 2 × 2 × 2 window is used for downsampling in the three directions. The step size of the convolution layers is (1, 1, 1) and the step size of the pooling layers is (2, 2, 2), so the size of the feature cube is halved layer by layer. The two full-link layers have 1024 and 512 dimensions, and the output layer is trained separately for the classification of authors and for the identification of authenticity.
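The size of the feature cubes and the number of convolution parameters implied by this design can be traced with a few lines of Python. The input size used in the usage note (16 MNF channels, 64 × 64 pixels) is a hypothetical choice that allows four successive halvings; the paper's actual block size and channel count are not stated in this section.

```python
def trace_3d_cnn(depth, height, width):
    """Trace feature-cube shapes and count convolution parameters of the 3D design."""
    layers = [((5, 5, 3), 9), ((5, 5, 2), 16), ((3, 3, 1), 32), ((3, 3, 1), 32)]
    cubes_in, shapes, conv_params = 1, [], 0      # the MNF stack enters as one cube
    for (kh, kw, kd), cubes_out in layers:
        conv_params += (kh * kw * kd * cubes_in + 1) * cubes_out
        cubes_in = cubes_out
        # Zero padding keeps the size; 2 x 2 x 2 pooling with stride 2 halves it.
        depth, height, width = depth // 2, height // 2, width // 2
        shapes.append((cubes_out, depth, height, width))
    return shapes, conv_params
```

Under these assumptions, the cube depth falls from 16 to 1 over the four stages, which suggests that the number of retained MNF channels bounds how many spectral halvings the network can perform.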
Compared with 2D-CNN, the core of 3D-CNN is the use of a 3D convolution kernel to perform convolution along all three directions of the hyperspectral image. Through the 3D convolution operation, the model can learn not only the spatial structure characteristics of calligraphy and painting but also the internal correlation between spectral information and spatial information. This combined convolution over spatial and spectral information takes into account both the visual analysis of calligraphy and painting and the learning of spectral attributes. Meanwhile, the feature data after convolution are connected with several adjacent subgraphs in the preceding layer, so 3D-CNN learning is transitive. Hence, the 3D-CNN model can be used in a calligraphy and painting identification model based on hyperspectral images.
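The difference between a purely spatial kernel and a kernel that also spans the spectral direction can be illustrated with a small, dependency-free sketch of zero-padded 3D convolution. The function name `conv3d_same` and the toy 3 × 3 × 3 cube are illustrative assumptions; a practical model would use a deep learning framework rather than nested loops.

```python
def conv3d_same(cube, kernel):
    """Zero-padded 3D cross-correlation with stride 1 (the usual CNN "convolution").

    cube[h][w][d] and kernel[i][j][k] are nested lists. The output has the
    same height, width, and depth as the input.
    """
    H, W, D = len(cube), len(cube[0]), len(cube[0][0])
    kh, kw, kd = len(kernel), len(kernel[0]), len(kernel[0][0])
    # Offsets that centre the kernel (for even sizes the extra cell trails).
    oh, ow, od = (kh - 1) // 2, (kw - 1) // 2, (kd - 1) // 2
    out = [[[0.0] * D for _ in range(W)] for _ in range(H)]
    for h in range(H):
        for w in range(W):
            for d in range(D):
                acc = 0.0
                for i in range(kh):
                    for j in range(kw):
                        for k in range(kd):
                            y, x, z = h + i - oh, w + j - ow, d + k - od
                            # Positions outside the cube count as zero.
                            if 0 <= y < H and 0 <= x < W and 0 <= z < D:
                                acc += cube[y][x][z] * kernel[i][j][k]
                out[h][w][d] = acc
    return out


if __name__ == "__main__":
    # Toy cube: a single non-zero voxel in the middle band.
    cube = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
    cube[1][1][1] = 1.0
    spatial_only = [[[1.0] for _ in range(3)] for _ in range(3)]  # 3 x 3 x 1
    spectral = [[[1.0, 1.0, 1.0]]]                                # 1 x 1 x 3
    print(conv3d_same(cube, spatial_only)[1][1])  # response stays in one band
    print(conv3d_same(cube, spectral)[1][1])      # response spreads across bands
```

A kernel with depth 1 mixes only neighbouring pixels within a band, while a kernel with depth greater than 1 also mixes neighbouring channels, which is what lets the first two layers of the described network relate spectral and spatial information.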

Verification Test
To test the effect of the previous three kinds of CNN models, the following calligraphy and painting identification experiments are carried out to select the optimal one. Figure 5 shows the hyperspectral images of a group of Chinese calligraphy and painting works obtained by the calligraphy and painting hyperspectral imaging scanning system. We use calligraphy and painting works by Qi Baishi and Lu Yanshao to build the data set. Visible and near-infrared hyperspectral data are then collected for Qi Baishi's true paintings, Qi Baishi's fake paintings, and Lu Yanshao's true paintings. The visible near-infrared hyperspectral data have 134 spectral channels.
Through data extraction, it is found that the spectrum of the red seal in Qi Baishi's true painting differs from the spectrum of the red seal in Qi Baishi's copied painting, as shown in Figure 6. In the specific experiments, the original hyperspectral images of calligraphy and painting are cut into image blocks along the spatial dimensions to obtain more data.
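Cutting a hyperspectral image into blocks along the spatial dimensions can be sketched as follows. The block size, the stride, and the function names `block_origins` and `cut_blocks` are hypothetical; the text does not state the values used in the experiments.

```python
def block_origins(height, width, block=100, stride=100):
    """Return the (row, col) top-left corners of all full block x block tiles."""
    if block <= 0 or stride <= 0:
        raise ValueError("block and stride must be positive")
    rows = range(0, height - block + 1, stride)
    cols = range(0, width - block + 1, stride)
    return [(r, c) for r in rows for c in cols]


def cut_blocks(image, block=100, stride=100):
    """Cut image[row][col] (each pixel a list of band values) into tiles.

    Only the two spatial dimensions are cut; every tile keeps all bands.
    """
    height, width = len(image), len(image[0])
    return [
        [row[c:c + block] for row in image[r:r + block]]
        for r, c in block_origins(height, width, block, stride)
    ]


if __name__ == "__main__":
    # Hypothetical scan size, chosen only to show how the count grows.
    print(len(block_origins(250, 320, block=100, stride=100)))
    print(len(block_origins(250, 320, block=100, stride=50)))
```

A stride smaller than the block size produces overlapping blocks, which is one way a small number of scanned works can yield a much larger number of training samples.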

MNF Dimensionality Reduction Test for Data Preprocessing.
Following the previous design, the eigenvalue contribution rate is used as the basis for selecting the number of MNF image channels. Generally, a cumulative contribution rate of 85% is taken as the standard for choosing the number of feature images. Figure 7 shows the eigenvalue contribution rate curve of the MNF images for a group of calligraphy and painting hyperspectral images. When the number of feature components reaches 5, the main information contribution rate reaches 85%; as more components are added, the contribution rate increases steadily at a roughly even rate. As mentioned earlier, the network input includes pseudocolor images and multichannel images. Because a pseudocolor image needs three channels and the contribution rate of the first five MNF channels just exceeds 85%, at least two groups of pseudocolor images (six channels) are needed to satisfy the 85% contribution rate. To retain as much information from the MNF images as possible, one more group of pseudocolor images is added in this study. The final number of MNF channels after dimensionality reduction is therefore 9; that is, the first nine channels of the MNF images are selected as the dimensionality reduced data for learning. At this point, the main information contribution rate reaches 86%.
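The channel selection rule described above (reach the 85% cumulative contribution, round up to whole three-channel pseudocolor groups, then add one more group) can be sketched as below. The function names, the consecutive grouping of channels, and the eigenvalues in the usage example are assumptions made for illustration; they are not the measured values behind Figure 7.

```python
import math


def select_mnf_channels(eigenvalues, threshold=0.85, group_size=3, extra_groups=1):
    """Pick the number of leading MNF channels to keep.

    eigenvalues: MNF eigenvalues sorted from largest to smallest.
    Returns (k_min, n_channels, cumulative contribution at n_channels).
    """
    total = sum(eigenvalues)
    cumulative, running = [], 0.0
    for value in eigenvalues:
        running += value
        cumulative.append(running / total)
    # Smallest number of components whose cumulative contribution reaches the threshold.
    k_min = next(i + 1 for i, c in enumerate(cumulative) if c >= threshold)
    # Round up to whole pseudocolor groups, then add the extra group(s).
    groups = math.ceil(k_min / group_size) + extra_groups
    n_channels = min(groups * group_size, len(eigenvalues))
    return k_min, n_channels, cumulative[n_channels - 1]


def pseudocolor_groups(n_channels, group_size=3):
    """Split channel indices into consecutive groups (assumed grouping)."""
    return [
        list(range(start, min(start + group_size, n_channels)))
        for start in range(0, n_channels, group_size)
    ]


if __name__ == "__main__":
    # Made-up eigenvalues: five dominant components and a long flat tail.
    eigenvalues = [40, 20, 12, 8, 6] + [0.5] * 28
    k_min, n_channels, rate = select_mnf_channels(eigenvalues)
    print(k_min, n_channels, round(rate, 2))
    print(pseudocolor_groups(n_channels))
```

With a five-component threshold crossing, the rule yields two groups plus one extra group, that is, nine channels, which matches the channel count used in this study.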
In addition, the input data are compressed to 100 × 100 to meet the input requirement of the model. After compression, the input of the 2D-CNN calligraphy and painting identification model with the MNF pseudocolor image mosaic as input has dimension 100 × 100 × 3, where the pseudocolor image is similar to a traditional RGB color image; the difference is that the three selected channels replace the RGB channels for image display. The input of the 2D-CNN calligraphy and painting identification model with multichannel MNF dimensionality reduced images directly as input has dimension 100 × 100 × 9, and the input of the 3D-CNN calligraphy and painting identification model based on multichannel MNF dimensionality reduced images is also 100 × 100 × 9.
It can be seen from Figure 8 that the MNF feature images of the same author's true and false paintings show some differences; in particular, when features are extracted with the 2D-CNN authenticity identification model, the feature images of Qi Baishi's true and false paintings show more differences as the depth of the convolution layer increases. Furthermore, to analyze the feature expression ability of 2D-CNN models with different functions on the same input, the 2D-CNN author classification model and the 2D-CNN authenticity identification model are both used to convolve the same Qi Baishi calligraphy and painting data, and the differences in feature expression are compared. The results are shown in Figure 9. When the same painting and calligraphy data are input into 2D-CNN models with different functions, the output features show some differences; as the depth of the convolution layer increases, this indicates that the feature expression parameters learned for different output targets are different, and the resulting feature expressions also differ in texture and color.

Calligraphy and Painting Feature Extraction Based on 3D-CNN.
To evaluate the feature extraction ability of 3D-CNN on calligraphy and painting data, 9-channel MNF dimensionality reduced image blocks of Qi Baishi's true paintings and of Qi Baishi's fake paintings are selected for the feature extraction experiment. Different features are extracted by the 3D-CNN convolution operation with the trained network parameters. Since the features obtained by the 3D-CNN model are 9-channel feature data, only three channels of each feature are taken for pseudocolor display. First, for Qi Baishi's true and false paintings, the 3D-CNN calligraphy and painting authenticity identification model with multichannel MNF dimensionality reduced image blocks directly as input is used for feature extraction. The original image and the output features of the fourth convolution layer are shown in Figure 10.
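Displaying three channels of a multichannel feature as a pseudocolor image can be sketched as follows. The per-channel min-max scaling to 8-bit values and the function name `to_pseudocolor` are assumptions; the text does not say how the displayed channels are chosen or scaled.

```python
def to_pseudocolor(cube, channels=(0, 1, 2)):
    """Map three channels of cube[row][col][channel] to 8-bit RGB triples.

    Each selected channel is min-max scaled independently to 0..255
    (an assumed display convention).
    """
    ranges = []
    for ch in channels:
        values = [pixel[ch] for row in cube for pixel in row]
        lo = min(values)
        ranges.append((lo, max(values) - lo))
    return [
        [
            tuple(
                # A constant channel carries no contrast and is shown as 0.
                0 if span == 0 else round(255 * (pixel[ch] - lo) / span)
                for ch, (lo, span) in zip(channels, ranges)
            )
            for pixel in row
        ]
        for row in cube
    ]


if __name__ == "__main__":
    # Tiny 2 x 2 feature with four channels; channels 0, 2, 3 are displayed.
    feature = [
        [[0.0, 5.0, 2.0, 9.0], [1.0, 5.0, 4.0, 9.0]],
        [[2.0, 5.0, 6.0, 9.0], [4.0, 5.0, 10.0, 9.0]],
    ]
    for row in to_pseudocolor(feature, channels=(0, 2, 3)):
        print(row)
```

Because each channel is scaled independently, the displayed colors show relative variation within a channel and should not be read as absolute feature magnitudes.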
It can be seen from Figure 10 that the MNF feature images of true and false paintings by the same author show some differences. The feature extracted by 3D-CNN is a stereo feature, and each feature contains multiple channels, whereas the feature extracted by 2D-CNN is a two-dimensional feature that contains only one channel. As a result, as the depth of the convolution layer increases, the feature images of Qi Baishi's true and false paintings produced by the 3D-CNN authenticity identification model show more differences.
Similarly, to analyze the feature expression ability of 3D-CNN models with different functions on the same input image, the 3D-CNN author classification model and the 3D-CNN authenticity identification model are both used to convolve a group of Qi Baishi's calligraphy and painting data, and the differences in feature expression are compared in Figure 11.
It can be seen that when the same painting and calligraphy data are input into 3D-CNN models with different functions, the output features show some differences. As the depth of the convolution layer increases, this indicates that the feature expression parameters are different, and the resulting feature expressions also differ in texture and in so-called color.
In addition, it can be seen from the figures that, for Qi Baishi's true and false painting image blocks, the feature expression ability of the 2D-CNN model and the 3D-CNN model on the same input is different. In particular, the differences in the convolution layer output of the 2D-CNN authenticity identification model are obvious in the spatial features, while the convolution layer output of the 3D-CNN authenticity identification model is a multidimensional image that provides more information in the depth dimension. In other words, its feature expression is multidimensional, and the feature information is richer.

Calligraphy and Painting Identification Experiment Based on Multichannel MNF Dimensionality Reduced Images.
In this experiment, we mainly test the efficiency of the 2D-CNN and 3D-CNN calligraphy and painting identification models based on multichannel MNF dimensionality reduced images.

Classification Experiment of Different Calligraphers and Painters.
In the test, nearly 100 typical calligraphy and painting works of Qi Baishi and 50 typical calligraphy and painting works of Lu Yanshao are scanned by hyperspectral imaging to obtain their visible near-infrared hyperspectral data. Because it is difficult to collect calligraphy and painting samples with similar labels, the number of hyperspectral images available in the real experiment is small, so the original spectral images are cut into hyperspectral image blocks, which are what is actually used for learning. In this way, the amount of training data is greatly increased, to thousands of blocks.
Then, MNF is carried out to obtain multichannel MNF dimensionality reduced images. The first nine channels of the MNF images are taken as learning data, and the different models are then trained according to the previous design. Figures 12 and 13 show a group of pseudocolor image blocks composed of different MNF feature image channels for the different authors.
It can be seen that the visual information in MNF pseudocolor images synthesized from different MNF feature image channels is different, which provides a lot of classification information for learning. There are also some differences in spatial information between Qi Baishi's and Lu Yanshao's calligraphy and painting. For example, Qi Baishi's works focus on ink rendering with more lines, while Lu Yanshao's works use more curves and lines with richer rendering.
Next, the previous data are used to train the 2D-CNN calligraphy and painting identification model with the MNF pseudocolor image mosaic as input, the 2D-CNN calligraphy and painting identification model with multichannel MNF dimensionality reduced images directly as input, and the 3D-CNN calligraphy and painting identification model based on multichannel MNF dimensionality reduced images; the target output is the classification of the two authors. The results are shown in Figures 14-16.
It can be seen that the 2D-CNN calligraphy and painting identification model with the MNF pseudocolor image mosaic as input, the 2D-CNN calligraphy and painting identification model with multichannel MNF dimensionality reduced images directly as input, and the 3D-CNN calligraphy and painting identification model with multichannel MNF dimensionality reduced images directly as input all show good convergence. Furthermore, the accuracy of the different methods with the different kinds of input is computed, as shown in Table 1.
It can be further seen from Table 1 that, for author identification of different calligraphy and painting works, the test-set accuracy of the 2D-CNN model with multichannel MNF images directly as input reaches 93.7%, that of the 2D-CNN model with the MNF pseudocolor image mosaic as input reaches 92.6%, and that of the 3D-CNN model based on multichannel MNF dimensionality reduced images reaches 93.2%. Although the 3D-CNN-based author identification model is slightly inferior to the 2D-CNN-based model in accuracy and computing speed, its convergence is more stable and takes fewer computing steps than that of 2D-CNN.
[...] used for authenticity identification learning of calligraphy and painting. As a result, the amount of training data is also increased to thousands. Then, MNF is performed to obtain the multichannel MNF dimensionality reduced images. To reduce the amount of data, the first nine channels of the MNF images are taken as the final experimental data. Figures 17 and 18 show a group of pseudocolor image blocks composed of different MNF images of true works and fake works. Then, the different models are trained according to the previous design.
It can be seen from the previous figures that the visual information in pseudocolor images synthesized from three channels of the MNF images of true and false paintings is obviously different. The spatial textures of the MNF pseudocolor images of Qi Baishi's true and false paintings are basically the same, but there are some differences in the visual pseudocolor, which indicates that MNF images can provide rich feature differences between true and false paintings.
Next, the previous data are used to train the 2D-CNN calligraphy and painting identification model with the MNF pseudocolor image mosaic as input, the 2D-CNN calligraphy and painting identification model with multichannel MNF dimensionality reduced images directly as input, and the 3D-CNN calligraphy and painting identification model based on multichannel MNF dimensionality reduced images. The results are shown in Figures 19-21.
Similarly, it can be seen that the 2D-CNN calligraphy and painting identification model with the MNF pseudocolor image mosaic as input, the 2D-CNN calligraphy and painting identification model with multichannel MNF dimensionality reduced images directly as input, and the 3D-CNN model based on multichannel MNF dimensionality reduced images all show good convergence. The accuracy of the different models is then computed; the results are shown in Table 2.
It can be further seen from Table 2 that, for authenticity identification of different calligraphy and painting works in the test set, the accuracy of the 2D-CNN calligraphy and painting identification model with multichannel MNF images directly as input reaches 93.1%, that of the 2D-CNN model with the MNF pseudocolor image mosaic as input reaches 92.8%, and that of the 3D-CNN model based on multichannel dimensionality reduced images reaches 95.2%. The 3D-CNN model therefore has the best accuracy. More importantly, the convergence process of the 3D-CNN calligraphy and painting identification model is more stable and faster than that of the 2D-CNN identification model.
[...] and the identification is credible; when part of the surface content is uniformly black or there is no painting content, the reflection spectrum information is relatively uniform, and its features differ from those provided by the parts with painting content. In this case, when training on the same work, the learning information in the parts with and without painting content is different. Therefore, in the specific training, hyperspectral image blocks with painting content are mainly used, and hyperspectral image blocks with little or no painting content may be wrongly identified.
For example, Figures 22 and 23 show that, for a group of true and false paintings of Qi Baishi, when the selected image block contains a lot of uniform ink information, the classification judged by the expert eye may be wrong. However, the MNF pseudocolor images of true paintings with ink and false paintings with ink show some differences, which keeps classification by CNN feasible.
However, when the painting contains little or no information, the identification result has a small probability of error. Figures 24 and 25 show that, for a group of true and false paintings of Qi Baishi, when the selected image block consists largely of paper with a little ink, it is difficult to identify by the expert eye. Meanwhile, the MNF pseudocolor images show few distinguishing characteristics, which sometimes makes the judgment wrong.
In addition, because it is difficult to obtain a variety of true calligraphy and painting samples, this paper selects only the calligraphy and painting of Qi Baishi and Lu Yanshao for verification of the principle. To identify a large number of writers, the above learning model needs to be modified appropriately. For example, when learning calligraphy and painting writers, the number of neurons in the network output layer needs to be set to the number of writers. Meanwhile, in the test verification, a large number of image blocks without direct information are removed, such as the case in Figure 24. Finally, when training on different kinds of painting, data with similar characteristic distributions should be selected as far as possible so that the essential differences between calligraphy and painting works can be used to make the identification more accurate.
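Removing image blocks that carry little direct information can be sketched as a simple filter. The text does not say how such blocks are detected, so everything below is a hypothetical criterion: a pixel counts as ink when its mean band value falls below `dark_threshold`, and a block is kept only when its ink fraction lies between `min_ink` and `max_ink`, which discards both nearly bare paper and nearly uniform ink.

```python
def ink_fraction(block, dark_threshold=0.5):
    """Fraction of pixels whose mean band value is below dark_threshold.

    block[row][col] is a list of band values scaled to 0..1 (assumed scaling).
    """
    pixels = [pixel for row in block for pixel in row]
    dark = sum(1 for p in pixels if sum(p) / len(p) < dark_threshold)
    return dark / len(pixels)


def keep_informative(blocks, min_ink=0.05, max_ink=0.95, dark_threshold=0.5):
    """Drop blocks that are almost all bare paper or almost all uniform ink."""
    return [
        b for b in blocks
        if min_ink <= ink_fraction(b, dark_threshold) <= max_ink
    ]


if __name__ == "__main__":
    paper = [[[0.9, 0.9], [0.9, 0.9]], [[0.9, 0.9], [0.9, 0.9]]]
    solid_ink = [[[0.1, 0.1], [0.1, 0.1]], [[0.1, 0.1], [0.1, 0.1]]]
    mixed = [[[0.1, 0.1], [0.9, 0.9]], [[0.9, 0.9], [0.9, 0.9]]]
    kept = keep_informative([paper, solid_ink, mixed])
    print(len(kept))
```

The thresholds would need tuning on real data; the point of the sketch is only that blocks such as the one in Figure 24 can be screened out before testing by a rule that is independent of the classifier.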

Summary
By combining hyperspectral imaging and machine learning, the comprehensive efficiency of calligraphy and painting identification can be greatly improved. To reduce the redundancy and the number of network parameters, and to improve the classification effect compared with using hyperspectral images directly, we propose a deep learning method for painting and calligraphy identification based on space dimensionality reduction, which uses an objective convex dimensionality reduction mapping to compress the original hyperspectral image; meanwhile, we use 2D-CNN and 3D-CNN to judge the calligrapher/painter and the authenticity in the dimensionality reduced image space. The test results show that the 2D-CNN calligraphy and painting recognition model with the MNF pseudocolor image mosaic as input and the 2D-CNN calligraphy and painting recognition model with multichannel MNF images directly as input both have high recognition accuracy, while the 3D-CNN calligraphy and painting recognition model based on multichannel MNF dimensionality reduced images not only maintains high recognition accuracy but also improves the convergence speed (step number) and learning stability compared with the 2D-CNN calligraphy and painting recognition model. In particular, the 2D-CNN recognition model with multichannel MNF images directly as input performs best for author classification, and the 3D-CNN recognition model based on multichannel MNF dimensionality reduced images performs best for authenticity identification, among the methods mentioned in this article. Unfortunately, due to the small number of hyperspectral image samples of calligraphy and painting, especially the small number of fake calligraphy and painting data, the calligraphy and painting recognition model still needs to be further verified and improved.

Data Availability
Part of the data used to support the findings of the study can be obtained from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Computational Intelligence and Neuroscience