Novel Serum Biomarkers to Differentiate Cholangiocarcinoma from Benign Biliary Tract Diseases Using a Proteomic Approach

Background and Aim. Cholangiocarcinoma (CCA) is the most frequent biliary malignancy and carries a high mortality rate owing to the lack of early detection. Hence, most CCA cases present at advanced to late stages, with local or distant metastasis at the time of diagnosis. Currently available tumor markers, including CA19-9 and CEA, are of limited use because of their low sensitivity and specificity. Here, we attempt to identify serum tumor markers that can effectively distinguish CCA from benign biliary tract diseases (BBTDs). Methods. Serum samples from 19 CCA patients and 17 BBTD patients were separated by SDS-PAGE followed by LC-MS/MS and were subjected to statistical analysis and cross-validation to identify proteins whose abundance was significantly elevated or suppressed in CCA samples compared to BBTD samples. Results. In addition to identifying several proteins previously known to be differentially expressed in CCA and BBTDs, we also discovered a number of molecules that were previously not associated with CCA. These included FAM19A5, MAGED4B, KIAA0321, RBAK, and UPF3B. Conclusions. Novel serum biomarkers to distinguish CCA from BBTDs were identified using a proteomic approach. Further validation of these proteins has the potential to provide a biomarker for differentiating CCA from BBTDs.


Introduction
Cholangiocarcinoma (CCA) is a highly aggressive malignant tumor that arises from the cholangiocytes lining the biliary tree [1]. The incidence and mortality of the disease continue to increase worldwide, and the highest incidence has been observed in Southeast Asia, especially in Thailand [2,3]. The prognosis of this malignancy is poor due to its silent clinical characteristics, difficulties in early diagnosis, and limited therapeutic measures. At present, radiotherapy and chemotherapy do not significantly improve the survival rate, while resection of tumors detected at an early stage offers the best curative treatment [4]. Most CCA patients present with biliary tract obstruction; however, many cases of benign biliary tract diseases (BBTDs) present with similar clinical symptoms [5]. Differences in treatment and prognosis between CCA and BBTDs create a need for accurate tumor biomarkers that can differentiate CCA from BBTDs. Because CCA typically grows along the bile duct without protruding outward as a mass, current imaging techniques including ultrasound, computed tomography (CT), and magnetic resonance imaging (MRI) are not efficient at revealing this lesion [6]. Laboratory assessments for CCA are often neither sensitive nor specific enough. Distinguishing between benign and malignant causes of biliary tract obstruction based on biopsies is difficult, and biopsies are usually inadequate to provide an accurate measure. Currently, determination of the serum concentration of carbohydrate antigen 19-9 (CA19-9) is routinely applied in most laboratories for CCA detection. However, a wide range of sensitivity (50-90%) and specificity (54-98%) has been reported for this biomarker in CCA [7][8][9], and elevated serum CA19-9 has also been observed in patients with BBTDs [10,11]; therefore, the use of CA19-9 for differentiating CCA from BBTDs is not reliable.
Other serum markers including carcinoembryonic antigen (CEA) and cancer antigen 125 (CA125) have also been used for detecting CCA, but these markers are not satisfactory for screening because of their low specificity and sensitivity [12][13][14]. Hence, identification of new tumor markers in the serum would be beneficial in the clinical management of this disease.
In recent years, quantitative proteomics has gained considerable attention and investment as a means of identifying diagnostic biomarkers for several diseases, including a variety of cancers [15]. In the present study, the serum proteome of CCA patients was quantitatively compared with that of patients with BBTDs, which share many molecular and imaging features with CCA. We reasoned that large-scale quantitative global protein profiling of serum, coupled with bioinformatic analyses, could identify a proteomic signature for effectively differentiating CCA from BBTDs. Patterns of differential serum protein expression between CCA and BBTD patients were exploited for the development of a diagnostic or prognostic tool for this type of cancer.

Serum Samples.
Serum samples were collected from obstructive jaundice patients who underwent endoscopic retrograde cholangiography (ERCP) or biliary tract surgery at Rajavithi Hospital. The use of human materials was approved by the research ethics committee of Rajavithi Hospital. Seventeen patients with BBTDs and 19 CCA patients were enrolled in this study. The diagnosis of CCA was carried out using one of the following criteria: (i) tissue biopsy; (ii) cytology plus radiological (CT scan or MRI) and clinical observation to identify tumor progression at a follow-up of at least two months. Serum samples from these patients were separated by centrifugation and stored at −80 °C within 1 h. The biochemical determinations of serum markers, including CEA and CA19-9, were performed using routine automated methods in the Pathological Laboratory at Rajavithi Hospital.

Sample Preparation, Electrophoresis, and Trypsin Digestion.
Samples were treated with protease inhibitor cocktail, and protein extraction from serum was carried out in lysis buffer containing 8 M urea and 10 mM dithiothreitol (DTT). Protein concentration was determined using the Bradford protein assay with bovine serum albumin as a standard. Fifty micrograms of total serum proteins were resolved on 12.5% SDS-PAGE. The gel was then fixed for 30 min in a fixing solution containing 50% methanol, 12% acetic acid, and 0.05% formaldehyde, washed twice for 20 min in 35% ethanol, and then sensitized in 0.02% (w/v) sodium thiosulfate for 2 min with mild agitation. After washing twice for 5 min each with deionized water, the gel was stained with 0.2% (w/v) silver nitrate for 20 min and washed twice prior to detection in a developing solution (6% (w/v) sodium carbonate, 0.02% (w/v) sodium thiosulfate, and 0.05% formalin). The staining was stopped by incubation in 1.5% Na2EDTA solution for 20 min. Finally, the stained gel was washed three times for 5 min each with deionized water. The gel was scanned using a GS-710 scanner (Bio-Rad, Benicia, CA) before being stored in 0.1% acetic acid until in-gel tryptic digestion.
The gel lanes were divided into 5 fractions according to the standard protein markers and then subdivided into 15 ranges. Each gel range was chopped into pieces (1 mm³/piece), which were dehydrated in 100% acetonitrile (ACN) for 5 min with agitation and dried at room temperature for 15 min. Subsequently, the cysteine residues were blocked with 10 mM DTT in 10 mM NH4HCO3 for 1 h at room temperature and alkylated with 100 mM iodoacetamide in 10 mM NH4HCO3 for 1 h at room temperature in the dark. The gel pieces were dehydrated twice in 100% ACN for 5 min and then incubated with 0.20 μg trypsin in 50% ACN/10 mM NH4HCO3 for 20 min. Purified peptide fractions were dried and reconstituted in 2% ACN and 0.1% formic acid for subsequent LC-MS/MS.

Liquid Chromatography-Tandem Mass Spectrometry (LC-MS/MS).
The LC-MS/MS analysis was carried out using a Waters nanoACQUITY ultra performance liquid chromatography system coupled with a SYNAPT HDMS mass spectrometer. A 5-μL aliquot of peptide fractions was injected using a built-in nanoACQUITY autosampler onto a Symmetry C18 trapping column (200 μm × 180 mm, 5 μm particle size; Waters) at a flow rate of 10 μL/min for on-line desalting and then separated on a C18 RP nano-BEH column (75 μm id × 200 mm, 1.7 μm particle size; Waters) and eluted in a 30-min gradient of 2% to 40% ACN in 0.1% formic acid (FA) at 350 nL/min, followed by a 10-min ramp to 80% ACN-0.1% FA and a 5-min hold at 80% ACN-0.1% FA. The column was re-equilibrated with 2% ACN-0.1% FA for 20 min prior to the next run. The MS nano-ion source contained a 10-μm analyte emitter (New Objective, Woburn, MA) and an additional 20-μm reference sprayer through which a solution of 200 fmol/μL Glu-fibrinopeptide B (Glufib) in 25% ACN-0.1% FA was constantly infused at 200 nL/min for external lock mass correction at 30 s intervals. For all measurements, the MS instrument was operated in V mode (at 10,000 resolution) with positive nanoES ion mode. The instrument was tuned and calibrated by infusion of 200 fmol/μL Glufib and set up for a spray voltage of 2.7 kV and a sample cone voltage of 45 eV. The spectral acquisition time was 0.6 s. In MS expression mode, the low-energy scan used a constant trap collision energy of 6 V. In the elevated-energy scan, the trap collision energy was ramped from 15 to 40 V during each 0.6-s data collection cycle, with one complete cycle of low and elevated energy. The transfer collision energy was set to 4 V and 7 V for low and high energy, respectively. The quadrupole mass analyzer was adjusted such that ions from m/z 200 to 1990 were efficiently transmitted.
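The elution program described above can be written down as a simple timetable. The following is a minimal sketch, assuming that each ramp is linear and that the return to 2% ACN for re-equilibration is instantaneous; neither assumption is stated in the text, and the function name `acn_percent` is purely illustrative.

```python
def acn_percent(t_min):
    """Percent ACN at t_min minutes into one run, assuming linear ramps."""
    # (start_min, end_min, start_%ACN, end_%ACN); durations taken from the text
    segments = [
        (0, 30, 2.0, 40.0),    # 30-min separating gradient
        (30, 40, 40.0, 80.0),  # 10-min ramp to 80% ACN
        (40, 45, 80.0, 80.0),  # 5-min hold at 80% ACN
        (45, 65, 2.0, 2.0),    # 20-min re-equilibration at 2% ACN
    ]
    for start, end, lo, hi in segments:
        if start <= t_min < end:
            return lo + (hi - lo) * (t_min - start) / (end - start)
    raise ValueError("time outside the 65-min cycle")


if __name__ == "__main__":
    for t in (0, 15, 35, 42, 50):
        print(t, acn_percent(t))
```

Under these assumptions one injection occupies 65 min of instrument time (30 + 10 + 5 + 20), which is useful when budgeting runs for 15 gel ranges per sample.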

Data Processing, Protein Identification, and Data Analysis.
Continuum LC-MS data were processed using ProteinLynx Global Server version 2.4 (Waters) for ion detection, clustering, and mass correction. Protein identification was performed with the embedded ion accounting algorithm against the NCBI human protein database with a minimum cutoff of two peptides per protein. The relative quantitation ratios were log2-transformed and processed with median normalization for each sample and rank normalization across the data set. The data were subjected to a 6-fold cross-validation. A differentially expressed (DE) protein was defined as having a P value of <0.01, based on the t-distribution with Welch approximation, in all data sets of the 6-fold cross-validation. The visualization and statistical analyses were performed using the MultiExperiment Viewer (MeV) in the TM4 suite software [16]. Other information, including protein categorization and biological function, was analyzed according to the protein analysis through evolutionary relationships (Panther) protein classification [17]. Known and predicted functional interaction networks of identified proteins were derived from the STRING database version 9.1 [18].
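The selection rule for DE proteins can be illustrated with a short sketch. It is not the MeV implementation used in the study: it assumes that "6-fold cross-validation" means leaving out one of six folds at a time and requiring P < 0.01 on every remaining subset, it assigns samples to folds by position, and it omits the rank normalization step. All function names and the example values are hypothetical.

```python
import math
import statistics


def median_normalize_log2(ratios):
    """Log2-transform one sample's ratios and centre them on the sample median."""
    logs = [math.log2(r) for r in ratios]
    med = statistics.median(logs)
    return [x - med for x in logs]


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def t_two_sided_p(t, df, steps=4000):
    """Two-sided P value: Student's t density integrated with Simpson's rule."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps
    area = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - 2.0 * area * h / 3.0)


def stable_de(cca, bbtd, k=6, alpha=0.01):
    """True if P < alpha every time one of k positional folds is left out."""
    for fold in range(k):
        a = [x for i, x in enumerate(cca) if i % k != fold]
        b = [x for i, x in enumerate(bbtd) if i % k != fold]
        t, df = welch_t(a, b)
        if t_two_sided_p(t, df) >= alpha:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical normalized log2 values for one protein (not study data)
    cca = [2.1, 1.8, 2.4, 2.0, 1.9, 2.3, 2.2, 1.7, 2.5, 2.0, 1.6, 2.2]
    bbtd = [0.1, -0.2, 0.3, 0.0, 0.2, -0.1, 0.4, -0.3, 0.1, 0.0, 0.2, -0.2]
    print(stable_de(cca, bbtd))
```

Requiring significance on every subset is stricter than a single test on the full data set, because a protein whose signal depends on a few samples will fail when the fold containing them is removed.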

Statistical Analysis.
Comparisons between the quantitative variables were performed using either the Mann-Whitney test or Student's t-test, where appropriate. Qualitative variables were reported as counts, and comparisons between independent groups were performed using Pearson chi-squared tests. P values of less than 0.05 were considered significant.
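For a qualitative variable with two levels compared across the two groups, the Pearson chi-squared test reduces to a 2 × 2 table with one degree of freedom. The sketch below shows that calculation without continuity correction; whether the study applied a correction is not stated, and the function name and example counts are hypothetical.

```python
import math


def chi2_2x2(table):
    """Pearson chi-squared statistic and P value (1 d.f.) for a 2x2 count table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the survival function is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    # Hypothetical counts: rows are groups (17 and 19 subjects), columns are sexes
    stat, p = chi2_2x2([[9, 8], [11, 8]])
    print(stat, p, p < 0.05)
```

With groups of 17 and 19 subjects, expected cell counts can be small, in which case an exact test may be preferable to the chi-squared approximation.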

Patient Characteristics.
A total of 36 subjects were included in this serum proteome study, of which 17 were diagnosed as having BBTDs and 19 were diagnosed as having CCA. The BBTD cases included intrahepatic duct stones, common bile duct stones, and benign bile duct strictures. The CCA cases included perihilar cholangiocarcinoma, intrahepatic cholangiocarcinoma, and middle and distal common bile duct cancer. The clinical characteristics of the patients in this study are shown in Table 1. No statistically significant differences were found between the BBTD patients and those with CCA regarding gender, age, and CEA. Although the level of CA19-9 in the serum of patients with CCA was significantly higher when compared to the control patients, the range of detection in both groups was exactly the same (0.60-10000).
Figure 1 (caption fragment): Purified proteins from these samples were then separated by SDS-PAGE. After migration, entire lanes were divided into 5 sections, which were excised into slices and treated with in-gel digestion. The resulting tryptic peptides were subjected to reverse-phase LC-MS/MS, from which the mass spectrometric results were then analyzed for protein identification and quantification. The relative quantitation ratios were subjected to statistical analyses and 6-fold cross-validation to retrieve the DE proteins between BBTDs and CCA.

Serum Proteome Profiling.
An overview of the experimental strategy conducted in this study is shown in Figure 1. The proteome of serum samples from CCA patients was compared with the serum proteome of the BBTD controls in order to identify the proteins in serum, in particular those that are secreted or leaked from tissues, including potential differential protein biomarkers from tumor cells. A total of […] Only proteins identified and quantifiable in all folds in cross-validation were further analyzed, allowing for stringent and sensitive protein identification and quantification of differential proteins.

Identification of Differentially Expressed Proteins between CCA and BBTDs.
Applying a P value cutoff of <0.01 yielded a total of 94 candidate proteins, of which 32 were higher and 62 were lower in observed abundance in the serum samples from CCA patients compared to the BBTD controls (Table 2 and Figure 2(a)). We also tested the discriminatory power of these differentially expressed proteins using unsupervised hierarchical clustering. As shown in Figure 2(c), the spectral counts for these proteins resulted in near complete separation of the CCA cases from the BBTD control cases, with only two exceptions where BBTD cases were clustered with the CCA samples. However, the PCA scores plot based on the normalized data of serum samples showed a clear separation between the CCA patients and BBTD controls (Figure 2(b)). The Panther classification system was used to identify the functional attributes of the 94 potential CCA-selective proteins. The analysis of the abundance of each functional category revealed substantial differences in the CCA serum proteome compared to the BBTD serum proteome. The number of differentially expressed proteins in each functional class is schematically depicted in Figure 3. The analysis revealed significant enrichment of proteins related to a variety of biological functions, such as cell adhesion molecules, cytoskeletal proteins, defense/immunity proteins, enzymes and their modulators, extracellular matrix proteins, membrane traffic proteins, nucleic acid-binding proteins, receptors, signaling molecules, structural proteins, transcription factors, transfer/carrier proteins, and transporters. To gain an overview of the biological interactions among the identified proteins, we also constructed protein-protein functional networks using the STRING database (Figure 4). The protein network analysis provides a clearer view of the complex framework of proteins that might underlie the differences between CCA and BBTDs.
To determine the distinguishing performance of the top five differentially expressed proteins in terms of fold change, the averaged log2 fold values of family with sequence similarity 19 (chemokine (C-C motif)-like), member A5 (FAM19A5) protein, KIAA0321 protein, melanoma-associated antigen D4 (MAGED4B), RB-associated KRAB zinc finger protein (RBAK), and regulator of nonsense transcripts 3B (UPF3B) were compared between CCA and BBTD cases from all cross-validation cohorts, as shown in Figure 5. However, due to limited resources and the lack of an independent validation set, the diagnostic relevance of these molecules for CCA requires further investigation.
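The box plots in Figure 5 summarize each protein by its three quartiles, with whiskers at the minimum and maximum. A minimal sketch of that five-number summary is given below; the quartile interpolation method is an assumption (plotting software differs on this point), and the function name and input values are hypothetical.

```python
import statistics


def box_summary(values):
    """Minimum, quartiles, and maximum for a box plot with min/max whiskers."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": q2, "q3": q3, "max": max(values)}


if __name__ == "__main__":
    # Hypothetical normalized log2 values for one protein in one group
    print(box_summary([1.6, 1.7, 1.9, 2.0, 2.1, 2.2, 2.4, 2.5]))
```

Because the whiskers extend to the extremes rather than to 1.5 times the interquartile range, outliers are not drawn separately and a single extreme sample lengthens the whisker.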

Discussion
CCA is the second most prevalent primary hepatobiliary malignancy and represents about 3% of all gastrointestinal cancers [1]. It is associated with inflammatory conditions in the biliary system, and patients with risk factors such as primary sclerosing cholangitis and liver fluke infestations have a higher risk for CCA development [1][2][3]. The generally late clinical presentation of CCA results in high mortality. At present, the most commonly studied and routinely used serum biomarkers for detecting CCA include CEA and CA19-9 [6]. However, they are not specific to CCA and can be elevated in the setting of other gastrointestinal malignancies or benign conditions, such as cholangitis, cirrhosis, and hepatolithiasis [7][8][9][10][11][12][13][14]. In this study, neither CEA nor CA19-9 could reliably distinguish patients with CCA from those with BBTDs in our sample cohort, as both appeared to be nonspecific for either condition. Hence, there is an urgent need for new diagnostic targets. In this study, we evaluated the differential proteome in the serum between the BBTD controls and CCA patients and identified potential biomarker panels to aid in the diagnosis of these common liver diseases.
Total proteins were retrieved from whole serum without depletion of high-abundance proteins because additional depletion steps may not enrich low-abundance proteins and may reduce reproducibility from one sample to another [19]. Among the identified proteins, we found that a number had previously been described in the context of CCA, confirming the validity of our quantitative proteomic approach. These included overexpression of MAGED4 [20] and DNA mismatch repair protein (MLH1) [21,22] and downregulation of albumin (ALB) [20], apolipoprotein B (APOB) [20], apolipoprotein A-II (APOA2) [20], and inter-alpha (globulin) inhibitor H1 (ITIH1) [20,23]. Expression of serum alpha-2-macroglobulin (A2M) was found to be significantly higher in BBTD compared to CCA patients. Consistent with this, serum A2M has been reported to increase in patients with liver malignancies including CCA but to be markedly elevated in hepatic cirrhosis [24]. Fibronectin 1 (FN1) in the serum of CCA patients appeared to be lower than that of BBTD patients. Biliary FN1 has been reported as a differential biomarker of benign and malignant diseases [25]. Similarly, serum plasminogen (PLG) in CCA cases was significantly lower than that in BBTD controls. PLG in malignant livers including CCA has been demonstrated to be lower than that in cirrhosis patients [26]. Other serum proteins were also found to be differentially expressed between CCA and BBTD, including angiotensinogen (AGT), ADAM metallopeptidase with thrombospondin type 1 motif 3 (ADAMTS3), hemoglobin, zeta (HBZ), keratin-1 (KRT1), keratin-10 (KRT10), and serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 1 (SERPINA1). However, validation of these identified proteins is needed in order to determine whether they can be clinically useful as differential biomarkers for CCA and BBTD.
The top five proteins that exhibited the maximal fold change between CCA and BBTD were FAM19A5, MAGED4B, KIAA0321, RBAK, and UPF3B. FAM19A5 belongs to the TAFA family of small secreted proteins, which are brain-specific and distantly related to MIP-1 alpha, a member of the CC-chemokine family [27]. This family of proteins has been postulated to function as brain-specific chemokines or neurokines that act as regulators of immune and nervous cells, although the association of this protein with CCA pathogenesis has yet to be evaluated. Overexpression of MAGED4B has been linked to malignant tumors and poor patient outcome in many types of cancer, including breast cancer [28], oral squamous cell carcinoma [29], and hepatocellular carcinoma [30]. However, there are no data available on the expression and the diagnostic or prognostic relevance of MAGED4B in CCA and BBTDs. KIAA0321 is a zinc finger FYVE domain-containing protein; the FYVE domain mediates binding of such proteins to membrane lipids, and KIAA0321 may be involved in the abscission step of cytokinesis. However, the relevance of this protein to cancer development is yet to be elucidated [31]. RBAK is a member of a known family of transcriptional repressors that contain zinc fingers of the Kruppel type, and it interacts with the tumor suppressor retinoblastoma 1. It has been shown that RBAK is expressed ectopically in human fibroblast cells [32].
Figure 5: Comparison of the top five differentially expressed proteins between BBTDs and CCA. Normalized log2-transformed data were used to create box plots, in which the horizontal lines of each box correspond to the first, second, and third quartiles (25%, 50%, and 75%, respectively) and the whiskers correspond to the maximum and minimum values.
Since fibroblasts in the stroma of desmoplastic cancers provide an optimal microenvironment for CCA progression and usually become susceptible to apoptosis [33], it is possible that the elevated serum RBAK in CCA patients originates from apoptosis-prone cancer-associated fibroblasts. UPF3B has been reported to be overexpressed in patients with alcoholic hepatitis [34], but no link between UPF3B and cancer has been reported so far.
In conclusion, we identified proteins in the serum that can potentially discriminate patients with CCA from BBTD individuals through a proteomic approach using highly stringent analysis with cross-validation. These proteins may prove clinically useful in preventing misdiagnosis between CCA and BBTD, which have similar clinical symptoms. Further independent validation of these biomarkers is required, using greater numbers of samples from patients with CCA and a wider range of BBTD conditions, to test their robustness and to select those with the greatest diagnostic power for differentiating patients with CCA from BBTD controls.