Identification of Regional Lymph Node Involvement of Colorectal Cancer by Serum SELDI Proteomic Patterns

Background. This study explored the application of serum proteomic patterns for the preoperative detection of regional lymph node involvement in colorectal cancer (CRC). Methods. Serum samples were applied to immobilized metal affinity capture ProteinChip arrays to generate mass spectra by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS). Proteomic spectra of serum samples from 70 node-positive CRC patients and 75 age- and gender-matched node-negative CRC patients were used as a training set, and a classification tree was generated with the Biomarker Pattern Software package. The validity of the classification tree was then challenged with a blind test set comprising another 65 CRC patients. Results. The software identified an average of 46 mass peaks per spectrum, and 5 of the identified peaks, at m/z 3,104, 3,781, 5,867, 7,970, and 9,290, were used to construct the classification tree. The classification tree effectively separated node-positive from node-negative CRC patients, achieving a sensitivity of 94.29% and a specificity of 100.00%. In the independent blind test, the model achieved a sensitivity of 91.43% and a specificity of 96.67%. Conclusions. The results indicate that SELDI-TOF-MS serum profiling can distinguish node-positive CRC patients from node-negative ones and shows great potential for preoperative screening for regional lymph node involvement in CRC.


Introduction
Pathologic stage is the most important prognostic factor for patients with colorectal carcinoma (CRC) [1,2]. Many regional lymph node metastases in CRC have been shown to occur in small lymph nodes (less than 5 mm in diameter) [3,4], and standard pathologic evaluation may overlook these low-volume nodal metastases, thereby failing to identify nodes that are essential to accurate staging. It is therefore necessary to understand the molecular alterations that confer regional lymph node involvement and to use this information to improve node-staging accuracy and individual patient management.
Because of the marked heterogeneity of CRC, a panel of biomarkers would be most appropriate for screening and diagnosis. Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS), an affinity-based mass spectrometry method that uses a protein chip modified with a specific chromatographic surface, is a modification of matrix-assisted laser desorption/ionization mass spectrometry that overcomes many of the limitations of 2-dimensional electrophoresis and matrix-assisted laser desorption/ionization TOF mass spectrometry [5,6]. Applied to complex biologic specimens such as cell lysates, serum, and other body fluids, the technique can detect multiple protein changes simultaneously with high sensitivity and specificity [5][6][7][8][9]. In recent years, several groups have used SELDI-TOF-MS with different protein chips to investigate serum samples from CRC patients and controls and to construct patterns that distinguish CRC patients from healthy controls with high sensitivity and specificity [10,11]. The aim of the current study was to investigate the application of serum SELDI protein profiling for the preoperative detection of regional lymph node involvement in CRC. Because pooling serum samples may lead to loss of potential biomarkers in SELDI-TOF-MS proteomic profiling [12], the quality control sample was provided by a single healthy volunteer (male, 42 years old). This quality control serum sample was used to determine reproducibility and served as a control protein profile in each SELDI-TOF-MS experiment.

Patients and Serum
Two milliliters of whole blood were collected by venipuncture into a vacuum tube in the morning before food intake (two days before operation) and left to clot at 4 °C for 2 hours. The blood was then centrifuged for 20 min at 700 g, and the serum was aliquoted into 100 μL portions and stored frozen at −80 °C until use.

SELDI-TOF-MS Protein Analysis.
The IMAC30 array, which is suitable for this work [13], was assembled in an 8-well ProteinChip Bioprocessor, pretreated with 50 μL of 100 mM CuSO4 in each well for 5 minutes at room temperature, washed 5 times with 50 μL distilled water, incubated with 50 μL neutralizing buffer (100 mM NaAc, pH 4) on a platform shaker for 5 minutes, and washed 5 times with 50 μL distilled water. After equilibration with binding buffer (100 mmol/L sodium phosphate, 500 mmol/L sodium chloride, pH 7.0), the IMAC30 array was chelated with copper to capture copper-binding proteins through histidine, tryptophan, cysteine, or phosphorylated amino acids. Serum samples were diluted 1 : 3 in U9 buffer (9 M urea, 2% CHAPS, 50 mM Tris-HCl, pH 9.0) and incubated on ice for 30 minutes. The diluted samples were then further diluted 1 : 13 with the U9 buffer. Each array spot was loaded with 50 μL of diluted serum sample in the 8-well ProteinChip Bioprocessor. After incubation on a platform shaker at room temperature for 60 minutes, the unbound sample was discarded and each spot was washed twice with binding buffer (50 μL/spot, 5 minutes per wash) to remove nonspecifically bound proteins. The array was then quickly rinsed with 150 μL of distilled water and air-dried. Each spot was loaded with 0.5 μL of saturated sinapinic acid solution prepared in 50% (vol/vol) acetonitrile and 0.5% (vol/vol) trifluoroacetic acid. After air-drying, the sinapinic acid solution was applied again. The array was then read by the ProteinChip reader. A total of 31 protein chip arrays were processed one by one as described above for the analysis of samples. During the experiments, the quality control serum sample was used to test reproducibility within a single IMAC30 chip (intraassay) and between chips (interassay).
The PBS-II(c) ProteinChip reader was calibrated with the "All-in-one" peptide standard (Ciphergen Biosystems). Each spot was scanned by a laser with the intensity of 200 and a detector with the sensitivity of 9. Mass-to-charge (m/z) ratio was optimized from 2,000 to 20,000, with a maximum of 150,000. The selected sample spots were exposed to the laser beam at 15 different positions, 7 spots for each position. The TOF mass spectra were then collected using Ciphergen's ProteinChip Software 3.1.

Bioinformatics and Biostatistics.
The entire dataset was separated into a training set and a test set before analysis. The training set, which consisted of spectral data from 75 node-negative CRC patients and 70 age- and gender-matched node-positive patients, was used to construct the classification tree. The test set, which consisted of the remaining spectral data from 35 node-positive patients and 30 age- and gender-matched node-negative patients, was used to challenge the discriminatory ability of the classification algorithm blindly.
All spectral data were normalized by total ion current after background subtraction. The range of peak masses was set between m/z 2,000 and 20,000 because the majority of resolved proteins/peptides were found in this range. The range from m/z 0 to 2,000 was excluded from analysis because it consisted mainly of signal noise from the energy-absorbing molecule. The Biomarker Wizard Software (Ciphergen Biosystems) was subsequently used for peak detection and clustering across all spectra in the training set and test set with the following settings: for peak detection, the signal-to-noise ratio was 3 and the minimum peak threshold was 20%; for cluster completion, the cluster mass window was 0.5% and the signal-to-noise ratio for the second pass was 2. The spectral data were then exported as spreadsheet files. The spectral data of the training set were further analyzed with the Biomarker Pattern Software (version 4.0; Ciphergen Biosystems) to develop a classification tree. The classification tree was set up to divide the training dataset into node-positive and node-negative CRC patients through multiple rounds of decision-making in training mode. When the dataset was first transferred to the Biomarker Pattern Software, it formed a "root node". Based on peak intensity, the software sought the best peak to separate this dataset into 2 "child nodes"; to do so, it identified the peak and set the corresponding intensity threshold. If the intensity of that peak in a blind sample was not higher than the threshold, the sample went to the left-side child node; otherwise, it went to the right-side child node. The process continued for each child node until the blind sample entered a terminal node, labelled either node-positive or node-negative. The peaks selected during this process to form the model were those that yielded the least classification error when used in combination.
After cross-validation in test mode, the validity of this decision tree was further verified blindly using the test set data, which are independent of the training set.
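The threshold search at a single tree node can be sketched as follows. This is a minimal illustration under stated assumptions, not the internal algorithm of the Biomarker Pattern Software: it assumes a CART-style split scored by Gini impurity, and the function names (gini, best_threshold) and the intensity values are hypothetical. As in the description above, a sample whose peak intensity is not higher than the threshold goes to the left child node.

```python
from typing import List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of binary labels (1 = node-positive, 0 = node-negative)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_threshold(intensities: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (threshold, weighted Gini impurity) of the best split on one peak.

    Samples with intensity <= threshold go to the left child node,
    all others go to the right child node.
    """
    n = len(labels)
    best = (float("nan"), float("inf"))
    # The largest intensity is skipped because it would leave the right child empty.
    for t in sorted(set(intensities))[:-1]:
        left = [y for x, y in zip(intensities, labels) if x <= t]
        right = [y for x, y in zip(intensities, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical normalized intensities of one peak (not data from this study).
peak = [0.8, 1.1, 1.3, 2.9, 3.4, 3.8]
status = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_threshold(peak, status)
print(threshold, impurity)
```

A full tree would repeat this search over every candidate peak at every child node and stop when a node is pure or a stopping rule is met.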

Statistical Analysis.
Relative peak intensity levels were compared between groups using the Student t-test, and in all cases P < 10⁻⁴ was considered statistically significant. Rates were compared between groups using the χ² test, and P < 0.05 was regarded as a significant difference.
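For readers unfamiliar with the comparison of peak intensities, the pooled-variance Student t statistic can be sketched as below. This is an illustrative sketch only, not the analysis code used in the study; the function name student_t and the intensity values are hypothetical, and the P value, which requires the t distribution, is not computed here.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Return (t statistic, degrees of freedom) for two independent samples.

    Uses the classical pooled-variance (equal-variance) Student t-test.
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return t, df


# Hypothetical normalized intensities of one peak in two groups.
node_negative = [1.0, 2.0, 3.0]
node_positive = [2.0, 3.0, 4.0]
t_stat, dof = student_t(node_negative, node_positive)
print(round(t_stat, 4), dof)
```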

Serum SELDI Profiles of CRC with Regional Lymph Node Involvement versus CRC without Regional Lymph Node Involvement.
Spectra from 145 serum samples of CRC patients were acquired in the training set. Protein peaks were identified with masses from m/z 2,000 to 20,000, and 46 peak clusters (common peaks) were generated from the identified peaks using the Biomarker Wizard Software. Most of the peaks were detected between m/z 3,000 and 16,500, and comparison among samples showed that the overall serum profiles of node-positive and node-negative CRC patients were very similar despite some intersample variation. Therefore, the variations that consistently differentiated these 2 groups could be considered biomarkers of node-positive CRC and were considered the most useful for protein profiling. The peak data from the training set were saved and exported for pattern recognition by the Biomarker Pattern Software, and a classification tree was created from the training set to discriminate node-positive from node-negative CRC patients. Figure 1 shows spectral views of the protein peaks at m/z 7,970 in these 2 groups. Quantitatively, the average normalized intensities of these proteins were either higher or lower in node-positive CRC patients (Table 1). These differences were statistically significant (P < 0.05).
Among the classification trees generated by adjusting the Gini, costs, advance, and testing settings of the Biomarker Pattern Software, the optimal classification tree with the lowest error cost was eventually established. The selected classification tree is simple and straightforward; it used 2 splitters (Node 1 and Node 2) with distinct masses of m/z 3,104, 3,781, 5,867, 7,970, and 9,290 and classified the samples into 3 terminal nodes (Figure 2). The variable importance scores of some peaks are shown in Table 2. The error rate of the generated classification tree was estimated through cross-validation.
Performance of the generated classification tree on the training and test sets is summarized in Table 3.

Quality Control and Reproducibility.
The reproducibility of the SELDI protein chip assay spectra, that is, of mass and intensity from array to array on a single IMAC30 chip (intraassay) and between chips (interassay), was determined using the serum sample from the one healthy volunteer. Three protein peaks in the range of m/z 3,000 to 10,000, observed on spectra randomly selected over the course of the study, were used to calculate the coefficient of variation (CV). The intra- and interassay CVs for mass were both 0.03%, while the intra- and interassay CVs for normalized intensity were 17.20% and 19.48%, respectively. Little variation attributable to day-to-day sampling, instrumentation, or chips was found.
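The CV reported here is the standard deviation expressed as a percentage of the mean. A minimal sketch of the calculation is given below; the function name and the replicate intensities are hypothetical and are not measurements from this study, and the use of the sample (n − 1) standard deviation is an assumption.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical normalized intensities of one peak across replicate spots.
replicates = [10.0, 12.0, 8.0, 10.0]
cv_percent = coefficient_of_variation(replicates)
print(round(cv_percent, 2))
```

Applied to the same peak measured on spots of one chip, this gives an intraassay CV; applied across chips, it gives an interassay CV.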

Table 2 footnote: Peaks were named by their mass-to-charge ratio (m/z). * The most important peak was assigned an importance index of 100; the importance of each other peak was scored relative to the top peak with a number below 100.

Discussion
Death from CRC is directly associated with disease stage and therapeutic method. The presence and extent of regional lymph node involvement predict outcome in patients with CRC. Completeness of nodal resection and staging accuracy have significant implications for diagnosis, treatment, and survival in patients with this disease [14]. Up to 30% of patients with node-negative CRC staged by standard pathologic techniques ultimately suffer disease recurrence and tumor-related mortality following potentially curative primary resection. Traditionally, the methods of local staging for CRC include digital rectal assessment, proctoscopy, transrectal ultrasonography (TRUS), pelvic computed tomography (CT), and magnetic resonance imaging (MRI). More recently, positron emission tomography has also been taken into consideration. Although numerous studies and meta-analyses have compared the performance of these staging modalities, both overall and for specific parameters (T and N stage, circumferential margin), no general agreement has been reached [15][16][17]. The molecular and cellular heterogeneity of CRC results in a variety of expressed tumor cell products. Analysis of the resulting protein profile may therefore be more efficient for node-staging when it selects a combination of protein alterations (pattern recognition) rather than focusing on a specific tumor marker. Evaluations of conventional serologic markers, such as CEA and CA19-9, have yielded confusing results, with poor sensitivity and specificity for early detection. To our knowledge, these biomarkers have not yet been used in node-staging.
Two-dimensional gel electrophoresis has traditionally been used to identify differences in protein expression in serum, saliva, or tissue specimens; the proteins of interest are subsequently excised from the gel and subjected to peptide mapping by mass spectrometry for identification [18,19]. However, the method is labor- and time-intensive and difficult to reproduce. In addition, it cannot readily handle proteins with molecular weights of less than 10 kDa. SELDI-TOF-MS can generate high-throughput protein profiles that can then be analyzed to reveal differences in protein patterns between patients with disease and healthy controls; it can even be used to distinguish patients at different disease stages. Proteomic analysis by SELDI-TOF-MS of serum samples from patients with pancreatic, gastric, breast, nasopharyngeal, liver, ovarian, prostate, and colorectal cancer has proved feasible for identifying reproducible protein profiles that are associated with specific tumor biomarkers and may be useful for early detection of disease [5][6][7][8][9].
In the current study, we combined SELDI protein chip technology with an artificial intelligence classification algorithm to screen serum protein spectra of node-positive and node-negative CRC patients in a Chinese population. For the training set, all 145 serum samples were used to profile protein peaks and to detect important peaks. With a panel of 5 peaks, a classification and regression tree was built using the Biomarker Pattern Software, which yielded a sensitivity of 94.29% and a specificity of 100% in differentiating node-positive from node-negative CRC patients. Furthermore, in the blind test the model achieved a sensitivity of 91.43% (32 of 35), a specificity of 96.67% (29 of 30), and a positive predictive value of 96.97% (32 of 33).
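The blind-test figures follow directly from the reported counts, as the short sketch below shows. The counts are taken from the text (32 of 35 node-positive and 29 of 30 node-negative patients classified correctly); the helper name percent and the variable names are illustrative.

```python
def percent(numerator, denominator):
    """Return a proportion as a percentage rounded to two decimals."""
    return round(100.0 * numerator / denominator, 2)


# Blind test set: 35 node-positive and 30 node-negative patients.
true_pos, false_neg = 32, 3   # node-positive patients called positive / negative
true_neg, false_pos = 29, 1   # node-negative patients called negative / positive

sensitivity = percent(true_pos, true_pos + false_neg)   # 32 of 35
specificity = percent(true_neg, true_neg + false_pos)   # 29 of 30
ppv = percent(true_pos, true_pos + false_pos)           # 32 of 33

print(sensitivity, specificity, ppv)
```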
Regarding serum biomarkers for CRC, Chen et al. [10] investigated 55 serum samples from patients with CRC and 92 from healthy individuals with corresponding physiological features by using H4 protein chips and SELDI-TOF-MS. The analysis software (an artificial neural network classifier) separated CRC patients from healthy individuals with a sensitivity of 91% and a specificity of 93%. Four top-scored peaks, at m/z 5,911, 8,930, 8,817, and 4,476, were finally selected as potential "fingerprints" for the detection of CRC. Liu et al. [11] used SELDI protein chip (IMAC3) arrays to screen both patients with CRC and healthy people. The Biomarker Wizard and Biomarker Pattern Software packages were also applied to construct a pattern with 2 protein peaks, which achieved a sensitivity of 95.00% and a specificity of 94.87% in masked analysis of an independent set of serum samples. As for staging of CRC, Xu et al. [20] examined the serum proteomic pattern in CRC using SELDI-TOF-MS technology and the CM10 protein chip. They built a model of 6 protein peaks (m/z 2,759, 2,964, 2,047, 4,795, 4,139, and 37,761) that could distinguish local CRC patients (stage I and stage II) from regional CRC patients (stage III). The serum biomarkers found in these studies were quite different from one another, which may be due to the different types of chips used and the patients included.
Because the serum protein profile changes with the development of cancer, we reasoned that some proteins must reflect lymph node involvement in CRC. Considering that IMAC30-Cu2+ chips are an improvement on IMAC3-Cu2+ chips and that sporadic moderately differentiated adenocarcinoma accounts for about 50% of all CRC, we studied the lymph node stage of sporadic moderately differentiated colorectal adenocarcinoma using IMAC30-Cu2+ chips.
We also compared the sensitivity and specificity of the classification tree we built with those of TRUS and MRI reported in the literature; the former surpassed the latter.
The present work explored a panel of highly sensitive and specific serum biomarkers using SELDI protein chip technology combined with an artificial intelligence classification algorithm. The model could separate node (+) patients with colon cancer and rectal cancer from node (−) ones with a sensitivity close to 100%. These results suggest that the model has high sensitivity and specificity in classifying both colon cancer and rectal cancer. Although these biomarkers provide a potential diagnostic platform for CRC node-staging, they need confirmation and demonstration of reproducibility in a much larger and more detailed dataset. Nevertheless, such an innovative clinical diagnostic method has the potential to improve preoperative node-staging and optimize the individual management of CRC. The 22 protein peaks, either over- or underexpressed in CRC with regional lymph node involvement, are now being identified by HPLC and MALDI-MS-MS in our laboratory.
In brief, serum protein profiling using SELDI-TOF-MS could differentiate CRC patients with regional lymph node involvement from those without it with a high degree of sensitivity, specificity, and accuracy. This technology holds promise for efficient and accurate preoperative node-staging of CRC.