Application of Neural Networks to the Classification of Pancreatic Intraductal Proliferative Lesions

The aim of the study was to test the applicability of neural networks to the classification of pancreatic intraductal proliferative lesions based on nuclear features, especially chromatin texture. Material for the study was obtained from patients operated on for pancreatic cancer, chronic pancreatitis and other tumours requiring pancreatic resection. Intraductal lesions were classified as low and high grade as previously described. The image analysis system consisted of a microscope, a CCD camera combined with a PC, and AnalySIS v. 2.11 software. The following texture characteristics were measured: variance of grey levels, features extracted from the grey-level correlation matrix, and the mean, variance and standard deviation of the energy obtained from Laws matrices. Furthermore, we used moment-derived invariants and basic geometric data such as surface area, the minimum and maximum diameter and shape factor. The sets of data were randomly divided into training and testing groups. The training of the network, using the back-propagation algorithm, and the final classification of data were carried out with the neural network simulator SNNS v. 4.1. We studied the efficacy of networks containing from one to three hidden layers. Using the best network, containing three hidden layers, the rate of correct classification of nuclei was 73%, and the rate of misdiagnosis was 3%; in 24% the network response was ambiguous. The present findings may serve as a starting point in the search for methods facilitating early diagnosis of ductal pancreatic carcinoma.

Introduction
The incidence of ductal pancreatic carcinoma is rising, and the prognosis remains poor. For this reason early detection of this malignancy and identification of potential precancerous lesions are desirable. It is believed that ductal pancreatic carcinoma may develop from intraductal hyperplasia in a multistage process. Because hyperplastic lesions may be present near infiltrating carcinoma, it is possible to find cells derived from these lesions in cytological smears obtained by fine needle aspiration biopsy from a pancreas evaluated for suspected cancer. Identification of cells originating from these hyperplastic lesions provides a chance of identifying patients who are at risk of dying of pancreatic carcinoma, yet are potentially curable. For this purpose it is necessary to identify the cytological features of intraductal hyperplastic cells using special techniques such as image analysis.
The appearance of nuclear chromatin is a well-known feature used to distinguish cells in pathological diagnostics. A regular arrangement of elements on a surface, with a repeating pattern, is referred to as texture. The texture characteristics of chromatin may be used to compare various morphological changes.
Apart from selection and extraction of image characteristics, it is necessary to classify the data using not only classical methods of statistical classification but also newer approaches such as genetic algorithms, symbolic classification and neural networks. Analysis of ductal epithelial hyperplasia by neural networks was attempted because in our previous studies none of the individual parameters allowed for effective differentiation of hyperplastic lesions. These findings were compatible with the idea that a set of features (a nuclear signature) is more useful than single features [3].

Material and methods
The processed tissue consisted of resection or biopsy specimens of the pancreas. Material was obtained from 35 patients. 14 patients had infiltrating pancreatic carcinoma (group I), 11 had chronic pancreatitis (group II) and 6 had other malignancies of this region (group III). Basic details of the groups under study are given in Table 1. The cases studied were the same as in previous papers [17,18]. In all groups tissue for histopathological examination was obtained from grossly normal pancreatic parenchyma (in group I beyond the grossly visible cancer infiltrate). The tissue was fixed in formalin, routinely processed, embedded in paraffin, and then cut into 4 µm sections, which were stained with hematoxylin and eosin for histological examination, and with hematoxylin for morphometric analysis. The lesions were initially divided into four groups according to the criteria of Kozuka, Cubilla, and Fitzgerald [4,11]:
- flat hyperplasia,
- papillary hyperplasia,
- atypical papillary hyperplasia,
- preinvasive carcinoma.
Normal ductal epithelium and foci of infiltrating carcinoma from group I cases were also included in the analysis.
The lesions were finally classified as low and high grade, with low-grade lesions including normal epithelium, flat hyperplasia and papillary hyperplasia, and high-grade lesions including atypical papillary hyperplasia, preinvasive and invasive carcinoma. This classification is based upon the fact that low-grade lesions occur in patients both with and without pancreatic malignancy, whereas high-grade lesions are specific for pancreatic malignancy. In previous studies we concluded that nuclei of ductal pancreatic epithelial cells do not show significant differences within grades, but do show them between grades [17,18].
The system of image acquisition and analysis consisted of an Axioscop microscope (Zeiss, Germany) with a 100× Plan-NeoFluar immersion lens and a ZVS-47DE CCD camera (Optronics, USA) connected by an RGB line to a GraBIT PCI card (Soft Imaging System GmbH, Germany) in a standard PC running Windows 95 (Microsoft Corp., USA). The software was developed by one of the authors (K.O.) in the Imaging C (ANSI C) language and ran in the AnalySIS v. 2.11 (Soft Imaging System GmbH, Germany) image analysis environment [1,2].
Ducts showing proliferative lesions were marked on the slide by one of the authors (R.T.). To shorten the interactive processing time, image analysis was carried out in two phases. In the first phase selected images were entered into the computer system, filtered and segmented with an automatic threshold setting, and the nuclear profiles were separated automatically. If necessary, nuclear profiles were manually corrected, and only those accurately delineated were selected for further processing. The images of segmented nuclei in a 255 grey level scale were saved on a magneto-optical disc in the TIFF format with PackBits compression. The data identifying the lesion and case were included in each file. The procedure was continued until at least 100 nuclei per case and per lesion were acquired. In the second phase geometric and textural features of the nuclei were extracted in batch processing mode, and the results were saved in a text file. Then the data were normalised and converted into a file compatible with the network simulator.
Texture refers to the fashion in which smaller patterns are arranged on a surface. There are several methods of quantifying texture properties. The simplest one is to measure the standard deviation or variance of grey levels. High values indicate a greater variation of pixel values in the image. Other methods commonly used for texture evaluation are the grey-level correlation matrix and the energy derived from Laws matrices. Table 3 summarises the parameters of nuclear chromatin texture used. Apart from the regional distribution of chromatin, its central or peripheral location within the nucleus is also important. We decided to assess this distribution using moments.
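The simplest texture measure mentioned above, and the normalisation step that precedes export to the simulator, can be sketched as follows. This is only an illustration: the text does not state which normalisation was applied, so the min-max scaling to the range 0 to 1 is an assumption, and the function names `grey_level_variance` and `min_max_normalise` are hypothetical.

```python
def grey_level_variance(pixels):
    """Population variance of the grey levels of one segmented nucleus.

    `pixels` is a flat list of grey values belonging to the nucleus.
    """
    n = len(pixels)
    mean = sum(pixels) / n
    return sum((p - mean) ** 2 for p in pixels) / n


def min_max_normalise(values):
    """Scale one feature column to [0, 1].

    Assumption: min-max scaling; the paper only says the data were normalised.
    A constant column is mapped to zeros to avoid division by zero.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

A uniformly grey nucleus gives a variance of zero, whereas coarse, clumped chromatin gives a larger value; normalising each feature column keeps features with large numeric ranges from dominating the network input.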
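The moment of order (p, q) of an image function f(x, y) is defined as

$$m_{pq} = \sum_{x}\sum_{y} x^{p}\, y^{q}\, f(x, y).$$

Moments defined in this way represent the shape of an object and, importantly, the gradient of the distribution of grey levels with respect to the position of pixels within the image. It seems natural to use moments in the evaluation of chromatin distribution. Unfortunately, such direct use is impossible because the moments depend on scale, location and rotation. To obtain invariance with respect to location, the coordinates can be referred to the centre of gravity, which gives the central moments:

$$\mu_{pq} = \sum_{x}\sum_{y} (x - \bar{x})^{p}\, (y - \bar{y})^{q}\, f(x, y), \qquad \bar{x} = \frac{m_{10}}{m_{00}}, \quad \bar{y} = \frac{m_{01}}{m_{00}}.$$

Invariance with respect to scale is obtained through the conversion:

$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \qquad \gamma = \frac{p + q}{2} + 1.$$

These parameters are still sensitive to rotation; however, seven invariants can be obtained from them:

$$\begin{aligned}
\phi_1 &= \eta_{20} + \eta_{02},\\
\phi_2 &= (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2,\\
\phi_3 &= (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2,\\
\phi_4 &= (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2,\\
\phi_5 &= (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right]\\
&\quad + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right],\\
\phi_6 &= (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}),\\
\phi_7 &= (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right]\\
&\quad - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right].
\end{aligned}$$

In our study, apart from the above listed parameters of chromatin distribution, we also used basic geometric parameters of the nuclei such as surface area, the minimum and maximum diameter and shape (roundness) factor [14][15][16].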
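The chain from raw moments to rotation invariants can be illustrated with a short sketch. It assumes the standard Hu formulation (raw moments, central moments about the centre of gravity, scale-normalised moments, then seven invariants) computed as discrete sums over pixels; the function names are hypothetical and this is not the authors' Imaging C code.

```python
def raw_moment(img, p, q):
    """Raw moment of order (p, q); img is a list of rows, x = column, y = row."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))


def normalised_central_moments(img, max_order=3):
    """Scale-normalised central moments eta[(p, q)] for p + q <= max_order."""
    m00 = raw_moment(img, 0, 0)
    xc = raw_moment(img, 1, 0) / m00   # centre of gravity, x
    yc = raw_moment(img, 0, 1) / m00   # centre of gravity, y
    eta = {}
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            mu = sum(((x - xc) ** p) * ((y - yc) ** q) * v
                     for y, row in enumerate(img)
                     for x, v in enumerate(row))
            eta[(p, q)] = mu / m00 ** ((p + q) / 2 + 1)
    return eta


def hu_invariants(img):
    """Seven rotation invariants built from the normalised central moments."""
    e = normalised_central_moments(img)
    n20, n02, n11 = e[(2, 0)], e[(0, 2)], e[(1, 1)]
    n30, n03, n21, n12 = e[(3, 0)], e[(0, 3)], e[(2, 1)], e[(1, 2)]
    a, b = n30 + n12, n21 + n03        # recurring sums
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]
```

Applied to the grey-level image of a segmented nucleus (background set to zero), the seven values should stay the same when the nucleus is shifted within the frame or rotated by a multiple of 90 degrees; for arbitrary angles on a pixel grid the invariance holds only approximately.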
The neural network was simulated on a PC running Linux (kernel 2.0.36) using the Stuttgart Neural Network Simulator (SNNS) v. 4.1 (University of Stuttgart, Germany). Fully connected feed-forward networks were used. As the number of features per nucleus was 33, they all had a 33-neuron input layer. The output layer was composed of 2 neurons. The networks contained from one to three hidden layers of 33 neurons each. We used logistic, perceptron and tanh activation functions, and the identity output function. Learning was performed with the standard back-propagation and back-propagation with momentum methods. The best networks had three hidden layers and perceptron activation functions, and were trained with the back-propagation with momentum method.
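The architecture described above (33 inputs, up to three hidden layers of 33 neurons, 2 outputs, fully connected) can be sketched as a plain forward pass. This is an illustration only, not SNNS code: the weights are random placeholders rather than trained values, the logistic activation is just one of the three functions the study tried, and `build_network` and `forward` are hypothetical names.

```python
import math
import random


def build_network(sizes, seed=0):
    """Create random weights for a fully connected feed-forward network.

    sizes, e.g. [33, 33, 33, 33, 2], lists the neurons per layer.
    Each neuron stores one weight per input plus a trailing bias term.
    """
    rng = random.Random(seed)
    return [[[rng.uniform(-1.0, 1.0) for _ in range(n_in + 1)]
             for _ in range(n_out)]
            for n_in, n_out in zip(sizes, sizes[1:])]


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(network, features):
    """Propagate one feature vector through all layers (identity output function)."""
    activation = list(features)
    for layer in network:
        activation = [
            logistic(sum(w * a for w, a in zip(neuron[:-1], activation)) + neuron[-1])
            for neuron in layer
        ]
    return activation


# Largest network studied: input layer, three hidden layers, output layer.
net = build_network([33, 33, 33, 33, 2])
outputs = forward(net, [0.5] * 33)   # two values, one per grade
```

With these sizes the network has 3434 adjustable weights including biases, which gives a sense of how many parameters had to be fitted from the 2000 training records.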
The set of data contained 7833 records describing individual nuclei. Each record consisted of the 33 features listed above and the correct classification for comparison with the network output. From the dataset, 2000 records were randomly selected as a learning group. Of the remaining 5833 records, 500 served as a validation group to test the ability of the network to generalise during the training process. The training process lasted from 100 to 200 cycles; during training the network error was analysed for both the learning and the validation group. When the minimal error for the validation group was reached, the learning process was interrupted and the response of the network for all 5833 records not in the training group was saved. These results were evaluated with a program from the SNNS package by comparing the classification obtained from the network with the histological one recorded in the input file. For this evaluation the '402040' method was used, i.e., records for which the output of the neuron corresponding to the correct classification was in the range of 0.6 to 1 were deemed correctly classified, records for which it was in the range of 0 to 0.4 were deemed misdiagnosed, and for records with an output between 0.4 and 0.6 the network response was deemed ambiguous [19]. Statistical analysis of the results was performed using the Statistica for Windows 5.5 PL package (StatSoft Inc., USA).
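The data split and the '402040' evaluation rule, as described in this paragraph, can be sketched as follows. The sketch follows the wording of the text rather than the SNNS analysis program itself; treating the boundary values 0.4 and 0.6 as inclusive is an assumption, and all function names are hypothetical.

```python
import random


def split_records(n_records, n_train, n_valid, seed=0):
    """Randomly split record indices into training, validation and test sets.

    As in the text, the validation records are part of the set on which the
    final network response is evaluated (all records not used for training).
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    not_trained = indices[n_train:]
    valid = not_trained[:n_valid]
    return train, valid, not_trained


def classify_402040(correct_output, low=0.4, high=0.6):
    """Label one record from the output of the neuron for its true class."""
    if correct_output >= high:
        return "correct"
    if correct_output <= low:
        return "misdiagnosed"
    return "ambiguous"


def summarise(outputs):
    """Percentage of records falling into each of the three categories."""
    counts = {"correct": 0, "misdiagnosed": 0, "ambiguous": 0}
    for value in outputs:
        counts[classify_402040(value)] += 1
    return {label: 100.0 * n / len(outputs) for label, n in counts.items()}


train, valid, test = split_records(7833, 2000, 500)
```

The three percentages returned by `summarise` correspond to the three rates reported for each network: correct classification, misdiagnosis and ambiguous response.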

Results
Ductal pancreatic carcinoma in the whole group I was an adenocarcinoma of intermediate differentiation. High-grade lesions occurred exclusively in group I. In 4 cases foci of intraductal carcinoma were found beyond the primary tumor. In 7 cases atypical papillary hyperplasia, in 8 papillary hyperplasia and in 9 flat hyperplasia was recognised in the ductal epithelium. In 9 cases there were ducts lined with histologically normal epithelium. In 2 cases from group I material for histopathological examination did not contain pancreatic parenchyma without carcinoma infiltration. Additionally, in a third case we found an infiltrating carcinoma in the pancreatic parenchyma identified as normal on gross examination. In groups II and III only low-grade lesions were identified. In group II pancreatic ducts were mostly lined with normal epithelium. In 5 cases flat hyperplasia and in one case papillary hyperplasia was present. Also in group III pancreatic ducts were mostly lined with normal epithelium. In 4 cases papillary hyperplasia and in 5 flat hyperplasia was present.
Mean values of many parameters describing nuclear chromatin differed between grades, but the ranges of values overlapped and there was no tendency of low and high-grade lesions to form distinct clusters (Fig. 1).
As shown in Table 4, the general rate of correct classification was over 70%. The number of misclassified cells was lowest when using a 5-layer network (3%); adding further complexity to the network did not significantly improve these results. Considering the correct classification of low and high-grade lesions separately (Table 5), it is notable that low-grade lesions were recognised more effectively, with the rate of correct classification reaching 80%, whereas the rate of correct classification for high-grade lesions was 40-50%. The difference was due mainly to a large proportion of non-classified nuclei in the high-grade group, which was above 35%.
Considering the classification of Kozuka, Cubilla and Fitzgerald, the results for individual lesions grouped into the low-grade and high-grade categories were similar (Table 6). Whereas for all 9 lesions the differences in the results were statistically significant with p < 0.01, the differences among the 3 lesions classified as low grade, as well as among the 3 lesions classified as high grade, did not attain statistical significance (Kruskal-Wallis ANOVA). The differences in the results obtained by individual networks for individual proliferative lesions did not show any distinct regularity.
To assess the usefulness of the classification we evaluated the number of correctly classified nuclei for individual patients, taking into account only unequivocally classified nuclei. In group I, for the 3-, 4- and 5-layer networks the mean percentage of correctly classified nuclei was 88.4%, 90.4% and 94.6%, respectively. In the case in which the percentage of correctly classified nuclei was lowest, the figures were 79.8%, 80.0% and 87.8%, respectively. In group II the mean percentage of correctly classified nuclei was 95.0%, 96.5% and 97.0%. In the case in which the percentage of correctly classified nuclei was lowest, the figures were 87%, 91.4% and 94.3%. In group III the mean percentage of correctly classified nuclei was 96.5%, 96.6% and 97.0%. In the case in which the percentage of correctly classified nuclei was lowest, the figures were 91.9%, 89.2% and 90.0%. The lowest efficiency, seen in group I, is associated with the presence of high-grade lesions in this group only; as mentioned, high-grade lesions were difficult for the network to recognise.
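The per-patient figures above count only unequivocally classified nuclei. A minimal sketch of that aggregation is given below, assuming each record carries a patient identifier and the output of the neuron for the correct class, and using the same 0.4 and 0.6 thresholds as the '402040' rule; the record layout and the name `per_patient_accuracy` are hypothetical.

```python
from collections import defaultdict


def per_patient_accuracy(records, low=0.4, high=0.6):
    """Percentage of correctly classified nuclei per patient.

    records: iterable of (patient_id, correct_output) pairs.
    Nuclei with an ambiguous response (between low and high) are ignored.
    Patients with no unequivocally classified nuclei are omitted.
    """
    correct = defaultdict(int)
    decided = defaultdict(int)
    for patient_id, output in records:
        if low < output < high:
            continue                      # ambiguous response, not counted
        decided[patient_id] += 1
        if output >= high:
            correct[patient_id] += 1
    return {pid: 100.0 * correct[pid] / decided[pid] for pid in decided}


rates = per_patient_accuracy([("A", 0.9), ("A", 0.1), ("A", 0.5),
                              ("B", 0.8), ("B", 0.7)])
mean_rate = sum(rates.values()) / len(rates)   # group mean
lowest_rate = min(rates.values())              # worst case in the group
```

The group mean and the lowest per-patient value computed this way correspond to the two kinds of figures quoted for each group.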

Discussion
One of the main uses of neural networks is the classification of data. When classes comprise sets of objects with non-overlapping features, their recognition is easy. When the number of features necessary to differentiate classes is large, or the values of the individual features for specific groups overlap, classification becomes difficult. A neural network may be useful in this case, although it may require a highly complex network architecture. The overlapping of the values of nuclear features of intraductal hyperplasia of the pancreas was noted in our previous study [18].
A fundamental property of the network is its ability to generalise. This means that it is able to correctly classify not only the cases used for training, but also cases that were not present in the learning set. Classification using a trained neural network is very quick, and its duration depends only linearly on the size of the data set. An artificial neural network is usually constructed using a program simulating the activity of the neural network on a classic computer. This facilitates changing the network architecture and testing various classifiers. What is more, a neural network simulator such as SNNS can generate an executable program simulating the response of the network, which can be integrated as a function within the image analysis system.
The difficulties associated with the use of neural networks may be linked to the need to select the appropriate network architecture and training method. The most frequently used neural network architecture is a fully connected feed-forward network trained with the back-propagation algorithm; however, Karakitsos et al. [9] proposed the use of recurrent neural networks, which permit finding natural clustering in a set of data without any known classification and without the need for a training phase.
Neural networks are now used in optical character recognition and industrial control systems; they are also used in morphological diagnosis, especially in cytopathology, mainly gynaecological. The PAPNET system, based upon neural networks, was approved by the US Food and Drug Administration for use in quality control of cytological smears from the uterine cervix. This system looks for suspicious cells and presents them for evaluation by an operator. The same tool may be used in the cytological diagnosis of other material; Koss et al. [10] reported a high efficiency in rescreening of cytological preparations of the oesophagus, in several cases with a sensitivity higher than that of standard cytological examination.
Hurst et al. [6] used a neural network to classify cells in cytological material obtained from the urinary bladder, using immunofluorescence staining for tumor related antigen p300. Their classification was correct in about 75%.
Molnar et al. [13] reported the agreement rate exceeding 95% in cytological material from the stomach. This high efficacy was probably associated with using DNA-specific staining and assessment of DNA ploidy, whose abnormalities, as is known, are frequently seen in neoplastic processes. These investigators reported also much higher efficacy of neural networks as compared with statistical multidimensional analysis and classifiers based on fuzzy logic. Similar results were reported by Karakitsos et al. [7][8][9].
McKeown et al. used neural networks to classify astrocytomas of different grades [12], based on the histological findings of a pathologist. Using principal component analysis they identified the morphological features that were most important for establishing the diagnosis and grade. They recommend the system to improve the standardisation of astrocytoma grading.
Concerning malignancies of the pancreatic region, Hittelet et al. [5] used nuclear features, including textural parameters, to differentiate between dysplastic and neoplastic cells of the epithelium of the papilla of Vater. Using discriminant analysis, concordance was found in most of 38 cases, attaining a sensitivity of 88% and a specificity of 96%.
To our knowledge, our paper presents the first application of texture analysis with neural network classification to pancreatic hyperplastic epithelia. The purpose of our study was to establish a method of classifying cells that is potentially useful in clinical practice, with the ultimate goal of application to cytology. The material obtained from the pancreas by means of fine needle aspiration biopsy may be scarce and not sufficient for establishing a definite diagnosis. Methods such as histochemical staining, including DNA-specific or immunohistochemical staining, can give promising results, but the frequent paucity of the material poses difficulties for such an approach. Image analysis could improve the efficiency of diagnosis in these cases, making it possible to recognise cells originating from both carcinoma and hyperplastic high-grade lesions while leaving all the material available for standard cytological examination. The "potential usefulness", as we understand it, refers to increasing the diagnostic capacity of standard cytological examination. The first goal is to decrease the percentage of false negative biopsies, that is, cases in which misdiagnosis is due to excessively cautious interpretation. The second goal is to identify patients with pancreatic diseases accompanied by intraductal epithelial hyperplasia, who are at risk of developing pancreatic carcinoma. For this purpose, the division into low and high grades is required and sufficient, because high-grade lesions do not occur in patients with pancreatic diseases that, like chronic pancreatitis, may clinically imitate carcinoma. Cytological diagnosis of high-grade lesions in circumstances in which the standard criteria of carcinoma are not met would give a chance to increase the sensitivity of fine needle aspiration biopsy in the diagnosis of pancreatic cancer.
It seems that the probability of finding cells derived from pancreatic intraductal proliferative lesions in a cytological preparation is higher in the case of a clinically less advanced lesion. Potentially, it could be possible to identify a group of patients with early pancreatic carcinoma, or patients with pancreatic precancerous lesions only and no infiltrating cancer.
The present findings indicate that epithelial hyperplastic lesions in the pancreas differ in nuclear features, permitting correct classification of many of them. We believe that the presented method may offer a chance of using image analysis to classify cytological material obtained by means of fine needle aspiration biopsy of the pancreas. The present results may serve as a starting point in the prospective search for methods assisting the diagnosis of intraductal hyperplastic lesions in the pancreas. This in turn may prove useful in the diagnosis of pancreatic carcinoma in its early stage or in identifying a high-risk group.

Conclusion
- pancreatic intraductal proliferative lesions of low and high grade differ in nuclear texture parameters,
- such parameters of individual cell nuclei may overlap, but a multilayer neural network can correctly classify most of the nuclei,
- classification of textural features by a neural network could help to improve pancreatic cancer diagnostics.