A Proteomic Approach for the Discovery of Early Detection Markers of Hepatocellular Carcinoma

Individuals chronically infected with hepatitis B or C virus (HBV, HCV) are at high risk for the development of hepatocellular carcinoma (HCC), with disease progression occurring relentlessly over many years. The diagnosis of HCC usually occurs at late stages in the disease when there are few effective treatment options and the prognosis for patients with HCC is very poor. The long latency period, together with clearly identified at-risk populations, provides opportunities for earlier detection that will allow more timely and effective treatment of this devastating cancer. We are using a proteomic approach to test the hypothesis that changes in the amount of certain serum polypeptides, or changes in their post-translational modifications, can be used to predict the onset of HCC. Advances in the standardization of two dimensional gel electrophoresis (2DE) coupled with computerized image analysis now permit the reproducible resolution of thousands of polypeptides per run. Serum polypeptides from individuals at different stages in the disease continuum are being resolved by 2DE to identify those that change with disease progression. Polypeptides found by this method can be further characterized by mass spectrometry. In addition, the potential for changes in the glycan structure of certain polypeptides to serve as a marker for disease progression can be explored. The proteomic approach is expected to liberate us from the need to “cherry pick” or guess the best biomarkers and let the data tell us which are the best indicators of disease. Information may also be gleaned about the pathobiology of the disease process.

Primary hepatocellular carcinoma and hepatitis B and C virus
Hepatocellular carcinoma (HCC) is responsible for approximately one million deaths annually and ranks as the 4th or 5th leading cause of cancer death worldwide [1][2][3]. Incidence rates vary geographically and are highest in some regions of Asia and Africa but considerably lower in developed western nations including the USA [2][3][4]. Unfortunately, recent studies show that the incidence of HCC is actually rising in the USA [5] as well as in other areas including Japan and Europe [6][7][8]. HCC is an aggressive malignancy with a poor prognosis [1]; the 5 year survival rates are usually less than 10% following diagnosis using conventional methods of detection and treatment.
The major etiology of HCC is chronic infection with hepatitis B virus (HBV), which is associated with more than 80% of all cases worldwide, and as many as 95% of cases in areas where HBV is endemic [4]. Despite the availability of an effective vaccine for prevention, there are currently more than 350 million people worldwide who are chronically infected with HBV, including 1.25 million in the USA [9]. In the absence of intervention, between 20 and 40% of these individuals will eventually die from cirrhosis or HCC.
There is good evidence that a second major etiology of HCC is chronic infection with hepatitis C virus (HCV) [6][10][11][12]. In some parts of the world, HCV is the major etiology of HCC and it is speculated that rising HCV infection rates account for much of the increase in HCC incidence [5]. There is no vaccine for the prevention of disease from HCV and the Centers for Disease Control (CDC) has classified this as an important emerging disease [13]. Most infections with HCV result in chronic carriage and as many as 2.7 million Americans are chronically infected with HCV [13]. Worldwide, the prevalence of chronic HCV may be as high as 170 million [14].
It is clear that chronic infection with HBV or HCV defines populations at high risk for the development of HCC. However, most chronically infected individuals remain asymptomatic for many years. The pathogenesis and natural history of chronic disease from HBV and HCV are similar, despite their very different virologies. Clinical disease may be the result of a relentless necroinflammatory process resulting from immunological recognition of viral epitopes over decades of infection [9,15]. The long latency between infection and development of serious liver disease (cirrhosis and HCC) provides an important window of time during which individuals can be monitored for disease progression and intervention can be proposed.

Current methods for detection of liver disease in chronically infected people
The progression of liver disease in asymptomatic chronic carriers of HBV and HCV can be monitored by regular physical assessments, serum liver function tests (LFTs), and ultrasound imaging for detection of small masses in the liver [16]. Although these tests involve only simple office procedures, the usefulness of each is restricted by certain limitations. For instance, ultrasound imaging is expensive, making its routine use prohibitive in most countries. Moreover, small masses are difficult to detect, particularly in a cirrhotic liver, and therefore detection often occurs at a stage at which the prognosis is very poor [9,17]. Liver biopsy often provides the most useful information, but is unrealistic as a routine screen. Transaminase levels, which are among the serum enzymes measured in the LFT panel, vary throughout the course of chronic hepatitis and are not reliable predictors of disease progression.
The correlation between elevated serum concentrations of alpha fetoprotein (AFP) and the occurrence of HCC is impressive and has provided a useful surrogate marker for disease [18]. Levels of AFP exceeding 50 ng/ml occur in at least 60% of the cases of HCC at the time of diagnosis [19]. However, AFP levels may fluctuate wildly in chronically infected individuals and are influenced by a number of non-malignant physiological events, including alcohol consumption and pregnancy. For example, 58% of the chronic carriers in one study experienced rises of AFP that were unrelated to HCC [19]. Thus, although the limited success of AFP correlation with HCC serves to emphasize the potential of serum markers for early detection of disease, it also underscores the desperate need for additional, more reliable markers.
Early detection of HCC in HCV infected individuals is even less developed than that for HBV. As in the case of HBV carriers, liver biopsy and pathological analyses are useful [20], but of limited practicality. Presumably, at a minimum, ultrasound imaging and AFP monitoring will be an important part of the prognostic picture. However, additional methods of outcome prediction are necessary for both HBV and HCV infected populations.

The need to predict outcomes of chronic HBV infection
Individuals who are chronically infected with HBV or HCV experience a lifetime risk of 20-40% of developing HCC, usually after a period of many years of infection. Currently, it is impossible to predict which chronically infected individuals will have asymptomatic lives and which will succumb to fatal liver disease. Once HCC has been diagnosed, surgical (transplantation or resection) and targeted chemo- or thermo-therapeutic intervention is an afflicted individual's best hope [1,17,21]. There is a clear and urgent need for non-invasive, reliable methods of (i) predicting which chronically infected, but asymptomatic, individuals are likely to develop serious liver disease and (ii) detecting the onset of serious liver disease as early as possible in these individuals.
The correlation of elevated levels of AFP with HCC provides evidence that the serum is likely to contain biomarkers that reflect pathologies occurring in the liver. It is likely that serum will contain polypeptides in addition to AFP whose presence is influenced by disease processes in the liver. The influence of liver pathology on such hypothetical biomarkers might be the result of their direct involvement in liver pathogenesis or might reflect inflammatory activities that are coincident. Moreover, it is likely that post-translational modifications of some hypothetical markers will correlate with disease status and their detection will be important. For instance, it has been suggested that detection of specific glycoforms of AFP provides a better correlation with HCC than simple testing for AFP, per se (see below) [22][23][24].

Opportunities for the discovery of new biomarkers for HCC
Currently, the diagnosis of hepatitis and monitoring of liver disease relies heavily on serological assays. These include assays for viral antigens or corresponding antibodies and for liver-derived proteins such as transaminases and AFP. Advances in analytical technologies and bioinformatics are now making it possible to search for additional, and better, markers of disease status by using broad-based screens for disease related changes in gene expression. Many of these new technologies, such as those based on gene chip microarrays [25], SAGE [26], and differential display [27], are designed to study changes that occur at the RNA transcript level. While these are powerful tools for examining changes in tissues or cells, they are less appropriate for the analysis of serum samples. On the other hand, the protein content, or proteome, of serum is very well-suited for analysis by high resolution profiling by two-dimensional gel electrophoresis (2DE) [28]. Developments in 2D gel technology such as the use of immobilized pH gradients for isoelectric focusing [29,30], improved buffers for sample solubilization [31,32], and sensitive new stains that are compatible with mass spectrometry [33,34] make this approach feasible. Also essential are improved tools for digital imaging of gels, software for computerized image analysis, and rapidly growing databases with information regarding protein identification and expression and links to extensive resources in genome databases. It should now be possible to identify polypeptide biomarkers that correlate with clinically defined disease states, permitting the advancement of candidate polypeptides that may serve as markers for progressing liver disease. If polypeptides other than AFP can be added to a panel of detection markers, the ability to predict the onset of HCC with confidence will be greatly enhanced.

Proteome profiling of serum samples as a function of disease state
Several technologies are being explored for the resolution and analysis of large numbers of proteins found in tissues, body fluids, cells, and sub-cellular fractions. The diversity of proteins in terms of their physical and biochemical properties creates both challenges and opportunities for broad-based separation and profiling schemes. Some current methods that are evolving rapidly are those that combine chip technologies with protein identification by mass spectrometry, such as surface-enhanced laser desorption/ionization (e.g. SELDI ProteinChip Arrays) [35] and biomolecular interaction analysis-mass spectrometry (BIA-MS) [36]. Additional techniques such as those for improved resolution by capillary isoelectric focusing [37] and liquid chromatography [38] are also being pursued. The workhorse of protein separation, however, remains 2DE where advances in sample preparation, gel technology, and gel analysis (see above) have all contributed to the lasting value of this classical approach [28,39].
It is possible to resolve over 1000 polypeptides from human serum samples by 2DE using a relatively broad pH range (pH 3-10) for isoelectric focusing in the first dimension and an acrylamide gradient gel for SDS-PAGE in the second dimension. Figure 1(a) shows an electronic image of serum derived proteins detected by fluorescent staining of a 2D gel. The same gel is shown in Fig. 1(b), where spots (or "features" in the 2D gel) have been found by computer analysis using a program designed to detect spots and then quantify spot intensities. Computerized spot detection must be augmented by manual editing, or curation, to refine the computer generated results. After careful spot detection has been completed, the software can be used to match images from a series of similar gels and query the results as to the presence or absence of particular spots or changes in spot intensity across the series. Results can be subjected to statistical analysis to confirm the significance of observed variations.
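The kind of query described above, presence or absence of a spot and change in intensity between matched gels, can be sketched roughly as follows. This is only an illustration under stated assumptions: the spot identifiers and volumes are hypothetical, and normalizing each spot to the total spot volume of its gel is a commonly used convention assumed here, not a description of the software used in this work.

```python
import math

def normalize(volumes):
    """Express each spot volume as a fraction of the gel's total spot volume.

    Assumption: total-volume normalization is adequate for comparing gels.
    """
    total = sum(volumes.values())
    if total <= 0:
        raise ValueError("gel has no measurable spot volume")
    return {spot: v / total for spot, v in volumes.items()}

def compare(gel_a, gel_b):
    """Report, for every matched spot ID, appearance, disappearance,
    or the log2 ratio of normalized intensity (gel_b relative to gel_a)."""
    a, b = normalize(gel_a), normalize(gel_b)
    report = {}
    for spot in sorted(set(a) | set(b)):
        if spot not in a:
            report[spot] = "appeared"
        elif spot not in b:
            report[spot] = "disappeared"
        else:
            report[spot] = math.log2(b[spot] / a[spot])
    return report

# Hypothetical spot volumes keyed by hypothetical matched-spot IDs.
gel_a = {"s1": 50.0, "s2": 50.0}
gel_b = {"s1": 25.0, "s2": 25.0, "s3": 50.0}
report = compare(gel_a, gel_b)
```

In practice the matched spot identifiers would come from the curated gel-matching step, and any change flagged this way would still need statistical confirmation across replicate gels.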
A general strategy for using this technology to discover new disease biomarkers is to analyze serum collected from patients at different stages of disease and to look for polypeptides, or features, whose appearance, disappearance, change in abundance, or posttranslational modification correlates with disease progression. To apply this strategy to the discovery of biomarkers for the early detection of HCC, we have defined several categories that roughly describe patient groups within the chronically HBV-infected high risk population (see Table 1). The groups are intended to represent possible stages along the continuum between initial infection and clinical disease (hepatitis or HCC), although not all patients follow a clear path through all categories. Serum samples from patients in each of the groups are being examined by 2DE so that a composite gel can be generated (electronically) that represents an average of all features found in the serum of patients within a given group. Composite gels will then be compared to look for differences that occur among the disease categories. A master composite gel can be generated by matching and then merging digital images of 2D gels from several samples representing each of the patient groups. This master composite gel, illustrated schematically in Fig. 2, will then provide an address, or unique identifier, for every feature and can serve as a point of comparison for each gel that is analyzed. Potential markers of disease progression will be features that show a greater variation (with higher statistical significance) among different disease groups than among individuals within a disease group.
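The selection criterion stated above, greater variation among disease groups than among individuals within a group, corresponds to the ratio computed in a one-way analysis of variance. The sketch below uses that F ratio as one possible scoring scheme; the text does not specify which statistic is used, and the feature names and intensities are hypothetical.

```python
from statistics import mean

def f_ratio(groups):
    """One-way ANOVA F ratio for one feature.

    groups: list of lists, each holding the feature's normalized intensity
    in the individual sera of one disease group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and replicate samples")
    grand = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    if ms_within == 0:
        return float("inf")
    return ms_between / ms_within

def rank_features(table):
    """Return feature IDs ordered from highest to lowest F ratio."""
    return sorted(table, key=lambda feature: f_ratio(table[feature]), reverse=True)

# Hypothetical intensities for two features across three patient groups.
table = {
    "spot_A": [[1.0, 1.2, 0.8], [2.0, 2.2, 1.8], [3.0, 3.2, 2.8]],
    "spot_B": [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [0.9, 2.0, 3.1]],
}
ranking = rank_features(table)
```

Here spot_A changes consistently between groups and ranks first, while spot_B varies mostly between individuals. With roughly a thousand features per gel, a real analysis would also need a correction for multiple comparisons before any feature is advanced as a candidate.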
Once potential polypeptide biomarkers are recognized by this method, the corresponding protein will be identified by mass spectrometry. A knowledge of the identity of proteins that undergo changes in the serum as the disease progresses may provide insight into the disease process itself. In addition, it will allow the development of simpler assays, such as immunoassays, for the detection of these polypeptides so they can be validated as markers and monitored in a large population of patient samples.

Simplifications of the polypeptide profile
While one advantage of the emerging technology of proteome analysis is the ability to examine the protein complement of a sample on a global scale, it is also true that some simplification of the pattern of spots seen on gels greatly facilitates reproducibility as well as subsequent gel matching and analysis. To achieve this, several approaches are possible that do not necessarily compromise, and in fact may enhance, the information content of the gels. These include the use of a narrow pH range in the first dimension focusing gels [40,41], the removal of some very abundant proteins from the sample, and the removal of some post-translational modifications from proteins prior to electrophoresis.
Fig. 3. 2DE of human serum proteins with a pH 4-7 first dimension gel. Human serum proteins were resolved by 2DE as in Fig. 1 except that the isoelectric focusing in the first dimension was carried out in a pH 4-7 IPG strip.

Isoelectric focusing in a narrow pH range
Isoelectric focusing gel strips are commercially available, or can be made [42], that will focus proteins over a narrow pH range. Some of these will focus either acidic (e.g. pH 4-7) or basic (e.g. pH 6-11) proteins and others will focus over a range as narrow as one pH unit [40,41]. Use of these strips improves separation of proteins that might otherwise migrate as overlapping spots, with the added benefit of increased reliability of computerized detection of spots. They also allow a higher total protein load on the gels so that detection of less abundant proteins is improved. An example of how better resolution of human serum proteins is achieved by 2DE using a pH 4-7 isoelectric focusing gel in the first dimension is shown in Fig. 3.

Removal of abundant proteins
A small number of highly abundant proteins are very prominent in the serum proteome profile. For instance, serum albumin constitutes 30-50% of the total serum protein and is poorly resolved in the gel. This causes distortions that can prevent the proper resolution of, or even obscure, nearby or co-migrating proteins. Immunoglobulin chains, particularly the IgG heavy chains, also represent a major set of proteins that do not resolve well (due to their heterogeneity) and can hamper accurate analysis of nearby proteins. Selective removal of these proteins from the serum sample prior to electrophoresis allows the detection of proteins in these regions of the gel. In addition, greater amounts of the remaining proteins can be loaded so that more of the less abundant proteins will be detected and included in the proteome analysis. We are currently developing immunologically based methods for the selective removal of a small number of prominent proteins such as albumin and IgG heavy chains and preliminary results are shown in Fig. 4. A comparison of Figs 4(A) and (B) demonstrates that the significant streaking, overlap, and distortion of nearby features caused by serum albumin and IgG heavy chains can be remedied by removal of these abundant proteins. Since the depletion removes more than 60% of the protein mass in serum, low abundance serum proteins and proteins normally disguised by albumin or IgG on the 2D gel can be more accurately identified and subjected to quantitative gel analysis. Examples of proteins that can now be identified are complement factor B, complement C3, complement factor H, beta-2-glycoprotein I, and apolipoprotein L (see Fig. 4 and Table 2). These proteins are all involved in inflammatory processes and the corresponding features addressed here by our gel and mass spectrometric analysis are assigned for the first time (screening against public domain databases).
Clearly, the removal of abundant proteins prior to 2D gel electrophoresis will allow a more comprehensive analysis of the patient serum samples.
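The gain in loading capacity from depletion follows from simple arithmetic, sketched below. The calculation assumes that the total protein load per gel is held constant and that depletion removes only the targeted proteins; both are simplifying assumptions, and the function name is illustrative.

```python
def enrichment_factor(fraction_removed):
    """Fold increase in the amount of each remaining protein that can be
    loaded when the total protein load per gel is kept constant.

    Assumes depletion removes only the targeted abundant proteins.
    """
    if not 0.0 <= fraction_removed < 1.0:
        raise ValueError("fraction_removed must be in [0, 1)")
    return 1.0 / (1.0 - fraction_removed)

# Depletion of 60% of serum protein mass (the lower bound given in the text)
# allows roughly 2.5 times more of every remaining protein on the gel.
fold = enrichment_factor(0.60)
```

Because the text gives 60% as a lower bound on the mass removed, 2.5-fold is correspondingly a lower bound on the enrichment under these assumptions.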

Deglycosylation of serum proteins prior to 2DE
The majority of the proteins that pass through the secretory pathway are glycoproteins and this is reflected in the predominance of glycoproteins in serum [43,44]. An investigation of the SWISS-PROT database has shown that almost two-thirds of all proteins contain the N-glycosylation sequon, NXS/T [45]. Assessing the probability of occupancy of the sequon, it was concluded that half of all proteins may be N-glycosylated proteins. In 2DE analysis, the absolute number of features and quantification of those features may vary as a consequence of the oligosaccharide heterogeneity associated with any given polypeptide. This is one of the ways in which the resolution and reproducibility of the 2D gels can be compromised. Although oligosaccharide heterogeneity can contain useful information with regard to potential biomarkers [46], there are more selective and accurate methods for the investigation of glycan structure (see below). Therefore, we have explored the approach of removing N-glycans on serum proteins prior to their analysis by 2DE, as shown in Fig. 5. Consequences of this de-N-glycosylation are a reduction in the absolute number of features on the gel, the improved resolution of low abundance features into identifiable spots, an improvement in feature reproducibility, and generally better resolved, tighter spots that will allow more accurate spot picking. We are currently investigating the likely improvements in protein identification by mass spectrometry by this de-N-glycosylation approach.
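Scanning a protein sequence for the N-glycosylation sequon can be illustrated with a short pattern search. The sketch below assumes the commonly used form of the sequon, in which the middle residue X may be any amino acid except proline; the example sequence is hypothetical, and a match indicates only a potential site, since occupancy of the sequon is not guaranteed.

```python
import re

# N-X-S/T with X != P. The lookahead lets overlapping sequons
# (for example "NNTS") each be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(sequence):
    """Return 1-based positions of the asparagine in each N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]

# Hypothetical peptide: N-G-T at position 2, N-P-S (excluded) at position 6,
# and two overlapping sequons, N-N-T and N-T-S, at positions 9 and 10.
sites = find_sequons("MNGTANPSNNTS")
```

Applied across a sequence database, a count of proteins with at least one match gives the kind of upper-bound estimate quoted above, before any correction for sequon occupancy.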

Protein-associated oligosaccharides as biomarkers
Much of the proteome diversity seen in 2DE separations of complex protein mixtures is due to posttranslational modifications such as glycosylation and phosphorylation. However, gel-based proteome analyses can only identify major charge or size differences in the oligosaccharides of glycosylated proteins. Therefore, it will be important in biomarker discovery to investigate technology platforms that accurately elucidate oligosaccharide structure and that can compare subtle differences in glycan structure. This is especially true since alterations in the oligosaccharides associated with glycoproteins and glycolipids are one of the many molecular alterations that accompany malignant cellular changes [46][47][48][49][50][51].
As noted above, it is possible that changes in the glycosylation of AFP show a better correlation with HCC than changes in absolute levels of the protein in serum, which can vary in response to a variety of physiological conditions [19]. For instance, an increase in alpha-1,6 core fucosylation of AFP has been shown to be fairly specific to HCC [52] and therefore a fucosylation index may be useful as a prognostic indicator for HCC [24]. Unfortunately, even when AFP is detected by 2DE, this change in oligosaccharide structure results in a difference of only 146 Da in molecular weight and would be missed by the proteome analysis.
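The scale of this problem can be seen by expressing the fucose mass shift as a fraction of the protein mass. The 146 Da value is taken from the text; the figure of about 70 kDa for AFP is an approximate value assumed here for illustration.

```python
FUCOSE_RESIDUE_DA = 146.0   # mass added by one fucose residue (from the text)
AFP_MASS_DA = 70_000.0      # assumed approximate molecular mass of AFP

def relative_shift_percent(delta_da, protein_da):
    """Mass change expressed as a percentage of the protein's mass."""
    if protein_da <= 0:
        raise ValueError("protein mass must be positive")
    return 100.0 * delta_da / protein_da

# One core fucose changes the mass of AFP by roughly a fifth of one percent.
shift = relative_shift_percent(FUCOSE_RESIDUE_DA, AFP_MASS_DA)
```

A shift of this size in the second dimension is small compared with the width of a typical spot, and because fucose is uncharged it would not be expected to move the protein in the isoelectric focusing dimension either.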
Our lab is investigating the use of highly sensitive glycosequencing technology [53] to allow the identification of glycan-based biomarkers. The increase in fucosylation associated with HCC is not restricted to AFP [52,54,55], and AFP levels are not always sufficient in patient serum samples to allow accurate analysis of fucosylation. Therefore a broader fucosylation index based on abundant proteins will be of great value. Patients from the high risk population infected with HBV can carry high levels (mg/ml) of viral proteins in their serum. HBV surface antigen (HBsAg) isolated from patient serum can be purified in a stepwise fashion and subjected to highly sensitive glycosequencing. An example of this type of analysis is shown in Fig. 6. In the example shown, HBsAg isolated from serum taken from patients with HCC is more highly fucosylated than that from chronically infected patients who have not developed HCC.
Fig. 6. Glycan analysis of oligosaccharides derived from serum polypeptides as a method of biomarker identification. Normal phase high performance liquid chromatography analysis of 2-aminobenzamide labeled oligosaccharides released from HBsAg isolated from: non-HCC HBV infected patient (upper panel); HBV infected HCC patient (middle panel); and from the medium of a human hepatoma tissue culture cell line, Hep G2.2. Particles were purified and glycans were removed by hydrazinolysis [56], fluorescently labeled at their reducing end, and glycan structures identified [53]. The major glycan, as indicated by annotation, is the simple biantennary sugar in a non-HCC HBV infected patient. However, HBsAg purified from an HBV infected individual with HCC shows the presence of new peaks, including biantennary fucosylated sugars. HBsAg purified from an HBV producing cultured human cell line shows the presence of tri-antennary sugars in addition to the other sugar types observed in human patients.
Interestingly, it also appears that there is no increase in oligosaccharide branching in HCC-associated HBsAg, as opposed to what has been seen for glycans in other cancers and immortalized cell lines [46]. The increase in fucosylation of HBsAg in HCC patients shown here only provides a demonstration of the possibilities of glycosequencing in disease marker identification. Confirmation of this result requires the analysis of a larger number of samples derived from HBV infected individuals experiencing different disease states.
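One plausible way to express a fucosylation index from a glycan chromatogram is the share of total peak area contributed by fucosylated structures. The sketch below assumes that definition, which is not given in the text, and uses hypothetical peak names and areas rather than measured data.

```python
def fucosylation_index(peak_areas, fucosylated_peaks):
    """Percentage of total integrated peak area assigned to fucosylated glycans.

    peak_areas: mapping of glycan peak name to integrated area.
    fucosylated_peaks: set of peak names annotated as fucosylated.
    The index definition itself is an assumption made for illustration.
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("chromatogram has no integrated peak area")
    fucosylated = sum(area for name, area in peak_areas.items()
                      if name in fucosylated_peaks)
    return 100.0 * fucosylated / total

# Hypothetical peak areas for a single HBsAg glycan profile.
peaks = {
    "biantennary": 70.0,
    "biantennary_fucosylated": 20.0,
    "triantennary": 10.0,
}
index = fucosylation_index(peaks, {"biantennary_fucosylated"})
```

An index of this kind would only become meaningful as a marker once it has been compared across the larger set of samples and disease states called for above.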

Concluding remarks
There is now a tremendous opportunity for the systematic examination of polypeptide profiles of those at risk for the development of HCC as a function of their disease state. Advances in methods for separation of polypeptides as well as in bioinformatics make it possible to take a fresh look at the expression profiles of at risk populations for many diseases in which changes in serum polypeptides might be expected to correlate with clinical disease status. What is being undertaken for the population at risk for HCC can be applied to any of a number of diseases for which clear at risk populations can be identified. These populations could range from those with infectious or pre-malignant diseases to those with genetic pre-dispositions for particular diseases. The goal of having a large and comprehensive panel of markers for which the same assay technology could be applied to serum or other body fluids to forecast illness is realistic. Early detection of those truly fated for complications will permit rational and early intervention. This, in turn, will have a very favorable impact on the public health, in general, and the affected individuals' life choices, in particular.