Proteomic Analysis of Colorectal Cancer: Prefractionation Strategies Using Two-Dimensional Free-Flow Electrophoresis

This review deals with the application of a new prefractionation tool, free-flow electrophoresis (FFE), to the proteomic analysis of colorectal cancer (CRC). CRC is a leading cause of cancer death in the Western world. Early detection is the single most important factor influencing the outcome of CRC patients. If identified while the disease is still localized, CRC is treatable. To improve outcomes for CRC patients there is a pressing need to identify biomarkers for early detection (diagnostic markers), prognosis (prognostic indicators), tumour response (predictive markers) and disease recurrence (monitoring markers). Despite recent advances in the use of genomic analysis for risk assessment, genomic methods alone have yet to produce reliable candidate markers for CRC. For this reason, attention is being directed towards proteomics as a complementary analytical tool for biomarker identification. Here we describe a proteomics separation tool that combines continuous FFE, a liquid-based isoelectric focusing (IEF) technique, in the first dimension with rapid reversed-phase HPLC (RP-HPLC; 1–6 min/analysis) in the second dimension. We have optimized imaging software to present the FFE/RP-HPLC data in a virtual 2D gel-like format. The advantage of this liquid-based fractionation system over traditional gel-based fractionation systems is its ability to fractionate large quantities of protein. Unlike 2D gels, the method is applicable to both high-Mr proteins and small peptides, which are difficult to separate and, in the case of peptides, are not retained in standard 2D gels.


Epidemiology of colorectal cancer (CRC)
CRC is one of the most common cancers diagnosed each year in Western countries, accounting for 13-14% of all cancer presentations and for 10-13% of all cancer deaths [1,2]. The major impetus for research into markers of CRC is that, if the disease is detected early, when tumours are still localized [3], the projected 5-year survival rate is ∼90%. Unfortunately, most colorectal cancers (∼60%) present at an intermediate stage, with a concomitant decrease in survival rates [2]. The majority of colorectal cancers (65-85%) are sporadic in nature, the result of cumulative somatic mutations [4]. Up to 30% of patients have a family history of bowel cancer, of whom 6% belong to clearly defined familial genetic syndromes, such as hereditary non-polyposis colon cancer (HNPCC) and familial adenomatous polyposis (FAP). Of the sporadic risk factors, age is the most important, with the incidence increasing exponentially after age 50 [5].
Colorectal carcinomas arise from adenomas (also referred to as polyps), hence the 'adenoma-carcinoma sequence', with a lag time of ∼10 years [4,6,7], and the removal of these polyps has been shown to prevent colorectal cancer [7].

Current detection methods for CRC
In the USA, UK and Australia, the generally recommended screening regimen involves annual to biennial faecal occult blood tests (FOBT) and 5-yearly sigmoidoscopies [8,9]. However, effective population screening by these means is hampered by poor patient compliance, due to discomfort and lack of awareness of the epidemiology of colorectal cancer. If used alone, FOBT has been shown to be ineffective [10]. This type of screening is not diagnostic of CRC, but merely selects those patients who should proceed to colonoscopy.
The most widely used blood-based protein biomarker of CRC, carcinoembryonic antigen (CEA), exhibits poor sensitivity for screening for the early detection of the disease and is mainly employed in post-operative detection for recurrence and monitoring of metastasis [11,12]. Whilst a single marker may lack sensitivity and specificity for detection, combining markers improves these parameters [13,14].

Proteomics tools
Current proteomics research can be divided into two contrasting but complementary strategies, cell-mapping proteomics and protein-expression proteomics [15,16].

Cell-mapping proteomics
Cell-mapping proteomics aims to define protein-protein interactions to build a picture of the complex networks that constitute intracellular signalling pathways. Many genetic mutations associated with cancer progression affect genes encoding proteins in signalling pathways, highlighting the importance of defining these signalling networks [17]. For example, Pandey et al. treated HeLa cells with either epidermal growth factor (EGF) or platelet-derived growth factor (PDGF), and used anti-phosphotyrosine immunoprecipitation to enrich a range of proteins that became tyrosine-phosphorylated in response [18]. The analysis revealed the role of vav-2, as well as a number of other proteins, in growth factor signalling. In another approach, Lewis et al. selectively activated or inhibited the mitogen-activated protein kinase (MAPK) pathway and used a proteomic approach to identify 20 novel targets of MAPK signalling [19]. Affinity capture techniques have also been used to identify the pro-apoptotic protein DIABLO/SMAC [20] and binding proteins for suppressors of cytokine signalling, SOCS [21]. Although the scope of this review does not cover the full range of possibilities available, these few examples show the utility of cell-mapping proteomics.

Protein-expression proteomics
Protein expression analysis monitors global expression of large numbers of proteins within a cell type or tissue and quantitatively identifies how patterns of expression change in different circumstances. Global protein profiles can be produced for normal compared with tumour cells in a given tissue, or for cells before and after treatment with a specific drug. Currently, this is the most widely used model of proteomics and is largely dependent upon two-dimensional gel electrophoresis (2DE) for visualization of protein profiles. Expression proteomics is the protein equivalent of DNA microarray analysis. Like DNA microarrays, it has the advantage of being non-prejudicial and could define unexpected ways in which known proteins regulate cellular responses. Major limitations of the 2DE system include an inability to detect proteins of medium to low abundance, as well as a limited apparent molecular mass (Mr) range, where molecules smaller than 10 K are generally lost. This has prompted much interest in non-2DE approaches for studying global protein profiles.

Correlation of mRNA transcripts and protein expression levels
In the search for tumour progression markers or anticancer drug targets, there has been a concerted effort to define gene expression profiles at the transcript level [22,23]. However, it is clear that mRNA expression data alone are insufficient to predict functional outcomes for the cell, as they provide very little information about the activation state, post-translational modification or localization of the corresponding proteins. Moreover, there are numerous reports highlighting the disparity between mRNA transcript and protein expression levels [24,25]. Thus, at the very least, mRNA expression studies must be supported with proteomic information in an integrated approach to provide a complete picture of how cells are altered during malignant transformation [25,26].

Dynamic range of protein abundances
Several technical issues need to be addressed before proteomics can realize its full potential for protein expression profiling of complex proteomes such as cells and tissues [27,28]. Foremost is the problem of the dynamic range of protein abundances. For instance, the dynamic range of protein abundances in blood is thought to be ∼10^10 [29]. This makes it extremely challenging to visualize low-abundance proteins and peptides in complex proteomes such as blood, let alone identify them using current mass spectrometry (MS)-based identification methods [30-32]. For example, the most abundant protein in human plasma is serum albumin (HSA), present at 40-50 mg/ml, whereas some of the least abundant proteins, such as cytokines [33] and protease biomarkers (e.g. prostate-specific antigen [34]), are present at ∼10 and ∼3 pg/ml, respectively. Given that the current sensitivity of routine protein and peptide identification by MS-MS is ∼500 amol, ∼1.5 ml plasma would be required for the initial fractionation step in order to obtain 500 amol of the cytokine IL-6. However, this amount of plasma would also contain 1.4 µmol (90 mg) HSA, a 6.2 × 10^9-fold (w/w) excess over IL-6. This is a formidable quantity of protein to fractionate and presents a challenge to current purification schemes, where extensive prefractionation/depletion strategies need to be invoked in order to reveal low-abundance proteins. Moreover, this estimate assumes that the protein of interest is homogeneous and that 100% recovery is obtained through all fractionation steps. For IL-6, which is extensively post-translationally modified (pI range 5-7, Mr range 22-29 K [33,35,36]), much larger quantities of plasma would be required to obtain a single enriched population of IL-6 molecules for identification purposes. Of equal importance is the problem that proteins exhibit tremendous heterogeneity with respect to size, charge, post-translational modifications and solubility.
Consequently, a wide range of protein separation methods, or combinations thereof, i.e. multidimensional separation strategies, are usually required for the comprehensive analysis of complex proteomes.
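The IL-6 estimate above can be reproduced with a few lines of arithmetic. The following sketch is illustrative only: it assumes an IL-6 Mr of 29 K (the upper end of the 22-29 K range quoted in the text) and an HSA Mr of ∼66.5 K (a value not given in the text), and the function names are hypothetical.

```python
# Back-of-envelope check of the plasma volume and HSA excess quoted in the text.
# Assumptions (not from the text): IL-6 Mr taken as 29 000, HSA Mr as 66 500.

IL6_MR = 29_000          # g/mol, upper end of the quoted 22-29 K range
HSA_MR = 66_500          # g/mol, assumed
MS_SENSITIVITY_AMOL = 500
IL6_PG_PER_ML = 10       # approximate cytokine level quoted in the text
HSA_MG = 90              # HSA mass quoted in the text for this plasma volume


def plasma_volume_ml(target_amol, mr, conc_pg_per_ml):
    """Plasma volume (ml) that contains target_amol of a protein."""
    mass_pg = target_amol * 1e-18 * mr * 1e12   # amol -> mol -> g -> pg
    return mass_pg / conc_pg_per_ml


def weight_excess(abundant_mg, target_amol, mr):
    """Weight ratio (w/w) of an abundant protein to the target protein."""
    target_g = target_amol * 1e-18 * mr
    return abundant_mg * 1e-3 / target_g


volume = plasma_volume_ml(MS_SENSITIVITY_AMOL, IL6_MR, IL6_PG_PER_ML)
excess = weight_excess(HSA_MG, MS_SENSITIVITY_AMOL, IL6_MR)
hsa_umol = HSA_MG * 1e-3 / HSA_MR * 1e6         # g -> mol -> umol

print(f"plasma needed: {volume:.2f} ml")        # about 1.5 ml
print(f"HSA excess (w/w): {excess:.1e}")        # about 6.2e9
print(f"HSA amount: {hsa_umol:.2f} umol")       # about 1.4 umol
```

Under these assumptions the figures agree with those in the text; if recovery falls below 100%, or if IL-6 is spread over several modified forms, the required volume scales up accordingly.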
For several decades, 2DE [37,38] has been the only proteomics technique that has permitted the separation of thousands of proteins in a single experiment [39,40]. However, the dynamic range of protein abundances that can be separated by 2DE is ∼10^4, which is inadequate for cell types and tissues such as blood.
Prefractionation of proteins from large volumes, as a prelude to subsequent analysis using 2DE, can be performed by preparative polyacrylamide gel electrophoresis (PAGE) or size-exclusion chromatography on the basis of Mr [57,58]. Alternative electrophoretic prefractionation methods based upon solution-phase IEF, rather than gel-based IEF [59] used in 2DE, have been developed. These non-gel-based IEF methods can accommodate large sample volumes and amounts in contrast to gel-based IEF methods. Preparative IEF as a prefractionation technique was first proposed by Bier's laboratory [60,61]. To overcome problems associated with the original device, Righetti and colleagues [62] developed a multicompartment electrolyser in which each compartment was separated by a polyacrylamide gel membrane, each with a defined pH. A microscale liquid-phase IEF prefractionation method that is also based upon a multicompartment apparatus (4.7 ml total volume, 650 µl/chamber) has recently been described by Speicher et al. [63]. In this apparatus the chambers are separated by thin polyacrylamide gels containing ampholyte mixtures at specific pH values. For a recent review of prefractionation techniques in proteome analysis, see Righetti et al. [64] and Simpson [28].
[Figure legend fragment] For analytical imaging separations, a portion of each first-dimension FFE-IEF fraction (50 µl/total volume ∼2 ml) was injected directly from the 96 deep-well plate using the Agilent 1100 HPLC equipped with a well-plate autosampler and samples collected automatically into a multi-plate fraction collection system.

Free-flow electrophoresis (FFE)
FFE, first described by Hannig [65,66], is a separation device that continuously streams a sample into a carrier ampholine solution flowing as a thin laminar film (0.3-1.0 mm) between two flat plates (see Figure 1). By introducing an electric field perpendicular to the direction of flow, cellular organelles, proteins and low-Mr species such as peptides can be separated by IEF according to their different pI values and subsequently collected for further analysis [67,68]. Previously, we reported an uncoupled FFE-IEF/SDS-PAGE strategy for separating cytosolic proteins from a human colon carcinoma cell line for subsequent identification by on-line RP-HPLC/electrospray-ionization (ESI)-ion trap MS [69]. We have further refined this strategy to provide a complete liquid-based fractionation strategy by introducing off-line rapid RP-HPLC (1-6 min separation times) as a second dimension for each of the FFE fractions. An example of the separation of a complex protein mixture, such as a cell lysate, using this method is shown in Figure 2. Unlike 2DE, this technique is suitable for separating low-Mr proteins and polypeptides and does not have the problem of sample loadability, owing to the ease with which the method can accommodate large sample volumes [70]. FFE can be performed over both broad and narrow ranges of pH by the judicious choice of ampholytes in the first-dimension step. Additionally, we have developed software to present the chromatographic output as a single 2D plot (virtual 2D analysis) for quick visual evaluation and 'spot' matching of fractionated proteins.
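The idea behind the virtual 2D presentation can be illustrated by stacking the RP-HPLC trace of each FFE fraction into one intensity grid, with fraction pH on one axis and retention time on the other. The sketch below is not the authors' software; the function name, the binning scheme and the example values are assumptions made purely for illustration.

```python
# Minimal sketch: arrange per-fraction chromatograms as a pH x retention-time grid.
# All names and data here are hypothetical.

def virtual_2d(chromatograms, n_bins, run_minutes):
    """Return [(fraction_pH, [intensity per time bin]), ...] sorted by pH.

    chromatograms maps the measured pH of an FFE fraction to a list of
    (retention_time_min, absorbance) points from its RP-HPLC run.
    """
    grid = []
    for ph in sorted(chromatograms):
        row = [0.0] * n_bins
        for minutes, absorbance in chromatograms[ph]:
            # Map retention time to a bin; clamp the final point into the last bin.
            idx = min(int(minutes * n_bins / run_minutes), n_bins - 1)
            # Keep the strongest signal seen in each bin.
            row[idx] = max(row[idx], absorbance)
        grid.append((ph, row))
    return grid


# Two made-up fractions, each analysed in a 6 min RP-HPLC run.
example = {
    5.02: [(1.2, 0.2), (3.4, 0.9)],
    5.00: [(0.5, 0.1), (5.9, 0.4)],
}
grid = virtual_2d(example, n_bins=6, run_minutes=6.0)
for ph, row in grid:
    print(ph, row)
```

Each row of the grid corresponds to one first-dimension fraction, so a peak that recurs at the same retention time in neighbouring rows shows up as a single 'spot', much as on a 2D gel.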
The ability to fractionate low-Mr compounds is an important feature of this 2D liquid-based FFE-IEF/RP-HPLC method, because techniques designed for this purpose are under-represented in the armoury of current proteomic separation tools. 2D gel-based systems, multicompartment and column-based separation systems are also limited by the quantity of bulk starting material that can be loaded in the first dimension (IPG). This limitation on the amount of starting material that can be fractionated can hamper efforts to mine complex tissues.
The advantages of this system can be summarized as follows: (a) protein and/or peptide separations in the first dimension (IEF) are performed in a liquid phase and, unlike those in multicompartment electrolysers, are not restricted by passage through any barrier or matrix; (b) the system is truly preparative in that it is not sample-limited, and separation efficiency is maintained by continual flushing of the separated sample; and (c) the FFE-IEF/RP-HPLC system is capable of separating compounds of low Mr (e.g. peptides) as well as high Mr (e.g. native proteins and their multimeric complexes) over a broad pH range. For those proteins (especially membrane proteins) that exhibit poor solubility at or near their pI value, an appropriate buffer (e.g. amino acids as well as detergents in the case of membrane protein separations) can be incorporated in the FFE counterflow media prior to collection in the 96-well plate to minimize the time that such proteins stand at their pI value [69]. The high resolving power produced in the first-dimension IEF step, where very narrow range pH gradients can easily be generated, coupled to the high resolution of modern RP-HPLC stationary phases, extends the resolving power of this 2D protein separation system over other previously described 2D systems based solely on coupled HPLC columns [71,72]. In the case of high-Mr proteins and very hydrophobic proteins such as membrane proteins, the RP-HPLC stationary phases can be substituted by other chromatographic modes, such as hydrophobic interaction chromatography or hydroxyapatite stationary phases, to extend the power of the method to cover classes of proteins that are refractory to RP chromatography.
The advantage of the 2D liquid-based FFE/RP-HPLC system over traditional gel-based systems is that complex mixtures of low-Mr compounds, such as tryptic peptides, can be fractionated into discrete pools of increasing pH that differ from one another by ∼0.02 pH units. The apparent pI value of a peptide can be determined easily and rapidly by measuring the pH of its pool using a laboratory combination pH electrode. By combining a peptide's pI value with high mass accuracy (1 ppm) peptide-mass fingerprinting [73-75], the discrimination of peptides differing by small mass units, or even of isobaric peptides, is feasible [76]. Peptide mass alone, even as determined by current high mass accuracy mass spectrometers, is insufficient for high-confidence protein identification in large genome databases and is no longer considered a valid approach to protein identification by the editors of proteomics journals [77]. The use of a peptide's pI value combined with MS-MS data also provides a powerful qualifier for peptide identifications, as well as decreasing the peptide search pool. The search pool is reduced by considering only those candidate peptides that fall within a discrete pI range, in conjunction with the mass range defined by the accuracy of the mass spectrometer used.
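The reduction of the search pool described above amounts to applying two filters in sequence: a mass window set by the instrument accuracy and a pI window set by the FFE fraction. The following is a minimal sketch under stated assumptions: the candidate peptides, their masses and pI values are invented for illustration, the pI tolerance of 0.05 pH units is an arbitrary allowance for prediction error, and none of the names correspond to an existing search engine.

```python
# Sketch of candidate filtering by mass tolerance (ppm) and fraction pI.
# Candidates and tolerances are hypothetical illustration values.
from dataclasses import dataclass


@dataclass(frozen=True)
class Peptide:
    sequence: str   # placeholder label, not a real sequence
    mass: float     # theoretical monoisotopic mass, Da
    pi: float       # predicted isoelectric point


def within_ppm(theoretical, observed, ppm):
    """True if the theoretical mass lies within +/- ppm of the observed mass."""
    return abs(theoretical - observed) <= observed * ppm * 1e-6


def filter_by_mass(candidates, observed_mass, ppm):
    return [p for p in candidates if within_ppm(p.mass, observed_mass, ppm)]


def filter_by_pi(candidates, fraction_ph, ph_tolerance):
    return [p for p in candidates if abs(p.pi - fraction_ph) <= ph_tolerance]


candidates = [
    Peptide("PEPTIDE_A", 1500.7000, 4.50),
    Peptide("PEPTIDE_B", 1500.7010, 6.80),   # near-isobaric with A, different pI
    Peptide("PEPTIDE_C", 1500.7500, 4.52),   # similar pI, outside the mass window
]

# Observed mass measured at 1 ppm accuracy; peptide collected in a pH 4.51 pool.
mass_only = filter_by_mass(candidates, observed_mass=1500.7005, ppm=1.0)
mass_and_pi = filter_by_pi(mass_only, fraction_ph=4.51, ph_tolerance=0.05)

print([p.sequence for p in mass_only])
print([p.sequence for p in mass_and_pi])
```

In this made-up case the mass window alone leaves two near-isobaric candidates, and the pI of the FFE pool separates them. In practice the usable pI window depends on how accurately peptide pI values can be predicted, which is not addressed here.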

Concluding remarks
The 2D liquid-based FFE-IEF/RP-HPLC method described here promises to play a key role in the analysis of complex mixtures of proteins and low-Mr compounds, the latter being largely under-represented in most proteome studies to date. Interestingly, low-Mr proteins and peptides are thought to contain a rich source of previously undiscovered biomarkers of disease [78]. Although current claims regarding the potential of proteomics to define cancer-related molecules might outnumber reports of concrete achievement, both global-expression [79,80] and cell-mapping proteomics [19-21] have contributed to our understanding of cell biology and disease. To date, several proteins have been identified from colon tumour samples and patient blood that can potentially be used as biomarkers for the identification of early-onset colorectal cancer [81,82]. These identifications must be validated before the proteins can be regarded as diagnostic biomarkers [83]. It is anticipated that further proteomics studies aimed at identifying specific proteins present in biological specimens of diseased patients may reveal a panel of proteins that correlate with aberrant growth specific for individual cancer subtypes. When used in combination, such a cohort of biomarkers may provide a predictive assay of high sensitivity and specificity that is minimally invasive for the patient, thereby allowing for population-based screening.