Novel Approaches to Identification of Biomarkers for Detection of Early Stage Cancer

Implementation of routine cancer screening accompanied by treatment of early stage disease is paramount to realizing reductions in cancer mortality rates. While morphologic assessment of relevant cells has traditionally been used to identify individuals with cancer, this approach is not feasible to identify cancers at inaccessible sites, such as the ovary or pancreas. For these and many other cancers, cancer mortality remains high, as sensitive and specific screening assays to detect in-situ or early stage disease are as yet unavailable. Identification of novel markers for early identification of cancer is an established priority of the National Cancer Institute. The rapid expansion of genomic-based technologies developed over the last decade, and the development of sensitive and specific blood-based assays for the detection of molecular changes associated with cancer, begin to offer the means for achieving this goal. Below, we review current approaches used in the effort to identify biomarkers appropriate for early cancer detection.


Introduction
Data from the Surveillance, Epidemiology, and End Results (SEER) registry clearly demonstrate that five-year cancer survival is significantly greater if, at the time of diagnosis and initial treatment, disease is localized rather than showing regional or distant spread [39]. Thus, implementation of routine cancer screening accompanied by treatment of early stage disease is paramount to realizing reductions in cancer mortality rates. Historically, the identification of individuals with cancer has been achieved by morphologic assessment of relevant cells. While this approach is suitable for identifying those cancers that occur at easily accessible sites (for example, the uterine cervix), morphologic evaluation is less feasible for identifying cancers at less accessible sites, such as the ovary or pancreas. For these and many other cancers, cancer mortality remains high, as sensitive and specific screening assays to detect in-situ or early stage disease are as yet unavailable. Currently, there is considerable interest in the development of sensitive and specific blood-based assays for the detection of molecular changes associated with cancers at inaccessible sites. Below, we review current approaches used in the effort to identify such biomarkers.

Approaches to detecting cancer biomarkers
Genetic analysis of normal tissues, cancer precursor lesions, in-situ carcinomas and metastatic lesions supports the hypothesis that progression to malignancy is a multi-step process during which cells accumulate mutations that permit the development of a malignant phenotype [11,61]. The accumulation of molecular changes that abrogate or disrupt the function of genes regulating cell cycle control, genomic stability, DNA repair, apoptosis, and/or invasion is necessary for progression to malignancy. Although it is known that alterations of a relatively limited number of genes are essential to the process of malignant progression [13,85], it is also clear that there are a number of different pathways to the development of cancer. For example, genetic changes (alterations in host DNA such as mutations or deletions) or various heritable epigenetic changes (permanent alterations in gene expression not mediated by changes in the nucleotide sequence, such as DNA promoter hypermethylation) may produce profound qualitative and quantitative changes in gene expression and thus protein production. The presence, in serum, of abnormally high levels of certain proteins (resulting from gene over-expression), or the detection of a humoral immune response elicited by the presence of proteins not normally in contact with the immune system, may herald the development of cancer. Therefore, the occurrence of specific genetic or epigenetic changes, or the detection of specific proteins or antibodies in serum, may serve as biomarkers to guide the development of sensitive and specific assays to identify subjects with, or who are at high risk for, malignancy (Fig. 1).

DNA sequence changes as potential cancer biomarkers
DNA sequence changes, including point mutations, translocations, deletions and amplifications, are characteristic of the vast majority of cancers, with some advanced cancers having up to 10^5 mutations identified [9,34]. Many such changes occur after the development of invasive disease and are of little use in guiding the early detection of cancer. However, mutations of genes which play a role in the pathogenesis of the malignant phenotype may serve as biomarkers for in-situ or early cancer.
Several approaches have been used to identify specific changes of interest. Early studies examined metaphase chromosomes (karyotypic analysis) to identify cancer-specific genetic alterations or chromosomal abnormalities, such as the "Philadelphia chromosome" associated with chronic myeloid leukemia [71]. Although the addition of various banding procedures has improved the sensitivity and specificity of this technique, metaphase chromosome analysis is of limited value in identifying mutations in solid cancers due to both the large number and variety of mutations which occur, and the difficulties in obtaining adequate numbers of interpretable metaphases. A modification of this approach, fluorescence in situ hybridization (FISH), has been more successful in solid cancers because it makes the many complicated genetic alterations occurring in these tumors easier to visualize and interpret. For example, FISH has been used to identify prognostic markers in neuroblastomas and breast cancer [63,65]. However, even with the use of multi-color FISH, many important changes will not be identified.
Comparative genomic hybridization (CGH) is another more recently developed technique used to identify regions of DNA amplification and/or deletion in cancer tissues. Equal amounts of tumor and non-tumor DNA are first labeled with distinctive fluorochromes. The labeled DNA is then mixed and applied to normal human metaphase chromosomes for competitive hybridization. Assessment of the ratio of the fluorochrome-labeled DNA (tumor to non-tumor) at any given area allows one to identify regions of DNA amplification and/or deletion, although structural changes might be missed. Further, since one is only measuring average DNA copy numbers, DNA changes occurring at very low frequency are likely to be missed. Deletions and amplifications must be greater than 10 Mb and 2 Mb in size, respectively, to be detected [11]. Recently, the use of CGH arrays, in which metaphase chromosomes have been replaced by cloned nucleic acid sequences, has overcome some of these problems [36,72]. Additional recent approaches to identify novel amplified DNA sequences include Restriction Landmark Genomic Scanning [27] and, when the loci of interest have been previously identified, real-time quantitative PCR [68].
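The ratio-based reasoning behind CGH can be illustrated with a minimal sketch. The log2 cutoffs of +/-0.3, the signal values, and the function names below are illustrative assumptions and are not taken from any cited study; the second function shows why a change present in only a small fraction of cells tends to be averaged away.

```python
import math

def call_copy_number(tumor_signal, normal_signal, gain_cutoff=0.3, loss_cutoff=-0.3):
    """Classify a region from tumor and non-tumor fluorescence intensities.

    The cutoffs are hypothetical values chosen for illustration only.
    Returns (call, log2 ratio).
    """
    if tumor_signal <= 0 or normal_signal <= 0:
        raise ValueError("signals must be positive")
    log_ratio = math.log2(tumor_signal / normal_signal)
    if log_ratio >= gain_cutoff:
        return "amplified", log_ratio
    if log_ratio <= loss_cutoff:
        return "deleted", log_ratio
    return "balanced", log_ratio

def expected_ratio(fraction_altered, copies_in_altered, normal_copies=2):
    """Average tumor/normal copy ratio when only a fraction of cells carry the change."""
    mean_copies = (fraction_altered * copies_in_altered
                   + (1 - fraction_altered) * normal_copies)
    return mean_copies / normal_copies

# Hypothetical (tumor, non-tumor) intensities for three chromosomal regions.
regions = {"regionA": (2000.0, 1000.0),
           "regionB": (500.0, 1000.0),
           "regionC": (1000.0, 1000.0)}
calls = {name: call_copy_number(t, n)[0] for name, (t, n) in regions.items()}

# A four-copy gain present in only 10% of cells is diluted below the cutoff.
diluted_ratio = expected_ratio(0.1, 4)
diluted_call = call_copy_number(diluted_ratio, 1.0)[0]
```

In this sketch a region is called only when the averaged signal crosses a cutoff, which is why a rare subclone carrying an amplification can remain undetected.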

Epigenetic changes as potential cancer biomarkers
Loss of the ability to maintain genomic stability and thus regulation of gene expression can also result from heritable epigenetic alterations, that is, permanent changes in gene expression not mediated by a change in the nucleotide sequence. One such change is that seen with aberrant DNA methylation, which refers to methylation of cytosine in CpG dinucleotides to 5-methylcytosine. Methylated cytosine residues are known to have a high risk of mutation. Methylation in the CpG islands in the promoter region is associated with "gene silencing", with the density of methylated cytosine residues of the gene promoter region being inversely related to transcriptional activity (that is, the gene is expressed in the absence of methylation). Both global hypomethylation and gene specific hypermethylation are associated with malignancy [2,16,82]. Global demethylation activates putative oncogenes, increases mutation rate, and facilitates genomic instability, while gene specific hypermethylation can abolish the expression of important tumor suppressor genes. Studies in animals and in humans have demonstrated that these epigenetic changes are an early event in carcinogenesis, being present in a variety of cancer precursor lesions, including those of the lung [2,16], colon and rectum [17,18], and endometrium [14]. In one of the largest studies published thus far, Esteller [15] examined promoter hypermethylation of 12 genes important in tumor suppression (p16INK4a, p15INK4b, p14ARF, p73, APC, BRCA1), DNA repair (hMLH1, GSTP1, MGMT), and metastasis and invasion (CDH1, TIMP3, and DAPK) in over 600 tissues from 15 different types of cancers. As expected, multiple genes were found to be hypermethylated in all tumors. Moreover, different cancer types tended to have different "hypermethylation profiles", that is, for each cancer type, a panel of 3-4 hypermethylated genes could identify that cancer with a sensitivity of 70-90%.
Furthermore, the specific combination of genes aberrantly methylated in any given cancer tissue tended to affect all pathways central to tumorigenesis, including cell cycle control and DNA repair pathways, as well as metastasis-related processes. Detection of such hypermethylated genes is of particular value for the development of cancer detection assays since promoter hypermethylation of a specific gene always occurs at the same location within a gene, thus facilitating the detection of such changes. In contrast, disruption of gene function by DNA sequence alterations tends to occur at many different sites within the gene (by a variety of different changes), which greatly complicates the design of assays for the routine detection of genetic changes. Furthermore, the methylation status of a particular DNA sequence can be measured in DNA isolated from easily obtained body fluids such as serum or sputum [58].
A number of different methods to identify methylated CpG islands have been described. Bisulfite nucleotide sequencing [21] is based upon the fact that sodium bisulfite treatment of genomic DNA deaminates unmethylated, but not methylated, cytosine residues to uracil. PCR testing is then performed using primers specific for certain fragments containing thymine, or alternatively, cytosine, followed by sequencing of the PCR products. Methylation status is determined by comparing the test sequence to unmodified sequences of the promoter of interest. The utility of this method is limited by the fact that only one gene can be examined at a time; thus an assay system based upon this approach would be extremely labor intensive. Other approaches have been developed, including methylation-sensitive arbitrarily primed PCR [42], restriction landmark genomic scanning [12], methylated CpG island amplification [74], CpG island arrays (also called differential methylation hybridization) [33], and methylation-specific oligonucleotide microarrays [23], but to date, all rely on the ability of methylation-sensitive restriction enzymes to cleave unmethylated CpG islands, while leaving methylated CpG islands intact. For example, in the use of methylation-specific oligonucleotide microarrays, PCR amplification products of the bisulfite modified DNA are hybridized to a set of oligonucleotide arrays (19-23 nucleotides in length) that discriminate methylated from unmethylated cytosine at specific nucleotide positions. Quantitative differences in hybridization are then determined by fluorescence analysis.
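The logic of bisulfite sequencing described above can be sketched in silico. The sequence, the methylated positions, and the function names below are hypothetical and chosen only for illustration; the sketch assumes a single strand read 5' to 3' and complete conversion.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Return the sequence as read after bisulfite treatment and PCR.

    Unmethylated C is deaminated to U and read as T after PCR;
    methylated C is protected and still reads as C.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(sequence)
    )

def infer_methylated_cpgs(reference, converted):
    """Return 0-based positions of CpG cytosines that resisted conversion."""
    if len(reference) != len(converted):
        raise ValueError("sequences must be the same length")
    return [
        i for i in range(len(reference) - 1)
        if reference[i:i + 2] == "CG" and converted[i] == "C"
    ]

# Hypothetical promoter fragment with CpG sites at positions 1, 5 and 9;
# the cytosines at positions 1 and 9 are assumed to be methylated.
reference = "ACGTTCGACCGA"
converted = bisulfite_convert(reference, methylated_positions=[1, 9])
methylated_sites = infer_methylated_cpgs(reference, converted)
```

Comparing the converted read with the unmodified reference recovers the methylated CpG positions, which mirrors the comparison step described in the text.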

Differentially expressed genes as potential cancer biomarkers
Another approach to the detection of potential cancer biomarkers utilizes genome-based identification of differentially expressed genes and their protein products. This method is based upon two facts: (i) at any given time, only a subset of the genes in a given cell are activated or expressed [5], and (ii) progression to malignancy is associated with activation of oncogenes, inactivation of tumor suppressor genes, up- or down-regulation of genes associated with cell cycle regulation, and/or alteration of genes associated with apoptosis. These latter genetic changes are exceedingly common in cancer cells and result in alterations in the level of gene expression as measured by levels of mRNA or protein [3,4,7,19,24,31,37,38,40,43,44,54,78-80,82,83]. Thus, even for cells of the same tissue type, the set of active genes in tumor cells will likely differ from that in normal cells, both qualitatively (different genes expressed) and quantitatively (levels of proteins expressed). These differences can be exploited to facilitate the identification of cancer biomarkers.
Techniques used to identify differences in gene expression between two tissues of interest include Northern blot analysis, ribonuclease protection assay, quantitative RT-PCR, and in-situ hybridization. These methods provide high sensitivity and specificity in identifying over-expressed genes; however, only a few genes can be simultaneously evaluated and compared. Detection of over-expressed gene products (that is, assessment of either mRNA or protein) distinguishing normal from abnormal cells has been greatly facilitated by genomics-based techniques such as expressed sequence tag (EST) sequencing [50,51], serial analysis of gene expression (SAGE) [77], differential display [22], cDNA expression array hybridization [49,53,69,79], and proteomics [46]. The public availability of nucleotide and amino acid sequence data (generated from large scale DNA-sequencing projects, such as the Human Genome Project) has made development and exploitation of these approaches a reality.

Expressed Sequence Tag (EST) sequencing and Serial Analysis of Gene Expression (SAGE) to detect over-expressed genes
Expressed sequence tag (EST) sequencing involves the examination of a short transcript sequence (10-14 base pairs), located at the end of a cDNA, which is sufficient to uniquely identify the sequence. Identification of tags is achieved by searching databases containing the known genomic and cDNA sequences already available. Comparison of the frequency of the clones present in two cDNA libraries, one constructed from a cancer tissue and the other from corresponding normal tissue, should reflect the relative abundance of the expressed transcripts in the two tissues. Serial analysis of gene expression (SAGE) [77] also measures transcript abundance by sequencing; however, this method improves upon the efficiency of EST sequencing, as short sequence tags, isolated from RNA at a defined position, are concatenated to permit the generation of long serial molecules for cloning and sequencing. This approach allows efficient analysis of transcripts by the sequencing of multiple tags in a single clone. Over-expressed clones are of particular interest if one is attempting to identify biomarkers upon which early cancer detection screening programs could be based, since low cost ELISA assays, appropriate for large scale screening efforts, can be developed. Once over-expressed markers are identified, confirmation of the association between the cancer and marker over-expression is undertaken by the use of RT-PCR and in-situ hybridization, and then appropriate monoclonal antibodies can be developed and tested.
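The comparison of tag frequencies between a cancer library and a normal library can be sketched as follows. The 10-base tags, the counts, the two-fold cutoff, and the pseudocount are all hypothetical choices made for illustration; they are not drawn from the cited studies, and a real analysis would also require a statistical test.

```python
from collections import Counter

def overexpressed_tags(cancer_tags, normal_tags, min_fold=2.0, pseudocount=1):
    """Return tags whose relative abundance is higher in the cancer library.

    Fold change compares (count + pseudocount) / library size between the
    two libraries; the pseudocount avoids division by zero for tags that
    are absent from the normal library.
    """
    cancer = Counter(cancer_tags)
    normal = Counter(normal_tags)
    n_cancer, n_normal = len(cancer_tags), len(normal_tags)
    result = {}
    for tag in cancer:
        fold = ((cancer[tag] + pseudocount) * n_normal) / (
            (normal[tag] + pseudocount) * n_cancer)
        if fold >= min_fold:
            result[tag] = fold
    return result

# Hypothetical tag lists, one entry per sequenced tag.
cancer_library = ["AAACCCGGGT"] * 6 + ["GGTTAACCTT"] * 2 + ["CCTAGGATCA"] * 2
normal_library = ["AAACCCGGGT"] * 1 + ["GGTTAACCTT"] * 5 + ["CCTAGGATCA"] * 4

candidates = overexpressed_tags(cancer_library, normal_library)
```

Tags flagged in this way would then be mapped to known sequences and confirmed by independent methods, as described above.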

Use of differential display to detect cancer-associated over-expressed messages
The use of differential display (RNA fingerprinting) to detect over-expressed messenger RNA has been extensively reviewed by Matz and Lukyanov [47]. Using differential display to identify differentially expressed transcripts involves selecting and sequencing random clones from two libraries (for example, a cancer and non-cancer tissue library) [64]. The terminal portions of the cDNAs from each library are systematically amplified using various sets of short arbitrary primers and the DNA fragments are resolved on a DNA sequencing gel. Several different primer sets are generally designed to attempt to prime all possible transcripts. The relative frequency of each transcript in each sample is reflected by the intensity of the observed banding. By comparing the two gels, differentially expressed transcripts are readily identified. The DNA band of interest can then be removed, cloned and sequenced.

Expression array hybridization to identify over-expressed genes associated with early stage cancer
DNA hybridization arrays are of interest for cancer biomarker discovery as these methods allow a global approach to gene expression analysis by permitting a simultaneous examination of expression patterns of thousands of transcripts. Expression array hybridization has been successfully used to characterize gene expression patterns in a wide range of both normal and cancerous human tissues, including tissues from the lung [29,60,80], prostate [32,50], colon [10], ovary [57,79,81], breast [6,20,28,62,70], and cervix [1]. Arrays vary by the type of material used for the array substrate (for example, membrane, glass or silicon), the array density (macroarrays are less dense than microarrays), specific sequences examined (known or unknown sequences in the form of cDNA clones, oligonucleotides, or PCR products), type of labeling used (radioisotopes versus fluorescence), and method of signal detection (autoradiography, phosphorimaging or fluorescence). Over the last decade, array technology has evolved quickly, and it continues to do so. The choice of the specific array format used depends upon many factors, including the scientific question being addressed and the availability and cost of the arrays. In general, individual cDNA clones, oligonucleotide sequences corresponding to specific genes or ESTs, or PCR products are spotted on a solid support such as a nylon membrane, silicon or glass. In the case of macroarrays, replicates (identical sets) containing generally up to 5000 cDNA clones are spotted onto nitrocellulose or nylon. Sets of these arrays are hybridized in parallel with labeled sequences (generally using radiolabeling or chemiluminescence) from either cancer or normal tissues. Individual or groups of cDNAs with differential expression signals are identified using quantitative image analysis software. For cDNA microarrays, up to 40,000 sequences can be placed on the array.
Further, sequences from cancers and normal tissue can be labeled with different dyes and competitively hybridized to the same array. High density oligonucleotide arrays, consisting of 25-mers photolithographically generated onto silicon chips, require that multiple oligonucleotides (to increase sensitivity), as well as mismatch oligonucleotides (to assure specificity), be present for each sequence. Array technology has clearly facilitated the identification of large numbers of potential cancer biomarkers; however, there are a number of issues concerning array design, data interpretation, and verification of results from array experiments which must be considered and addressed [30]. Once over-expressed markers are identified, confirmation of the association with malignancy and development of reagents for appropriate clinical assays should then be undertaken.
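The role of mismatch oligonucleotides can be illustrated with one simple summary, the mean difference between paired perfect-match and mismatch probe intensities. This is a sketch under stated assumptions: the intensities and the function name are hypothetical, and the calculation is not presented as the scoring used by any particular array platform.

```python
def average_difference(perfect_match, mismatch):
    """Mean of (perfect-match minus mismatch) intensity over paired probes.

    A value near zero suggests that the signal is largely non-specific,
    because the mismatch probes hybridize about as well as the matches.
    """
    if len(perfect_match) != len(mismatch) or not perfect_match:
        raise ValueError("need equal, non-empty lists of paired intensities")
    differences = [pm - mm for pm, mm in zip(perfect_match, mismatch)]
    return sum(differences) / len(differences)

# Hypothetical probe set with specific hybridization.
specific_signal = average_difference([1200, 1500, 900, 1100],
                                     [200, 300, 100, 200])

# Hypothetical probe set dominated by cross-hybridization.
nonspecific_signal = average_difference([500, 520], [490, 530])
```

Using several probe pairs per sequence averages out probe-to-probe variation, while the mismatch term subtracts signal that does not depend on an exact sequence match.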

Proteome analysis to identify potential cancer biomarkers
As discussed above, cDNA microarray hybridization methods are immensely useful for identifying potential cancer biomarkers as they facilitate the isolation of genes with significantly higher levels of mRNA transcripts in malignant, as compared to normal, tissue (which in many cases reflects a corresponding difference in the level of protein products in the two tissues). However, a major limitation to this approach is that the rate of protein synthesis and/or the half-life of the protein product is often post-transcriptionally controlled. Therefore, detection of high levels of mRNA transcripts of a specific gene may not necessarily mean that high levels of the corresponding protein product will be present. Further, in regard to the development of appropriate blood-based cancer screening assays, it is difficult to assure that a marker selected on the basis of high levels of messenger RNA would in fact be secreted into the blood. In addition, significant post-transcriptional modifications may have occurred, making the specific over-expressed mRNA transcripts of little value for protein detection. Considering the uncertainty regarding expression, modification, and secretion of the products of over-expressed RNA sequences, many have proposed that direct identification of over-expressed proteins form the basis of identifying blood-based cancer biomarkers. Profiling of the specific protein products present would, in theory, avoid these major pitfalls associated with mRNA expression analysis.
Proteome analysis requires the ability to resolve, quantitate, and identify proteins in appropriate patient samples. The standard approach involves resolution or separation of complex protein mixtures by two-dimensional polyacrylamide gel electrophoresis (2D PAGE), in which proteins are separated by their isoelectric point and molecular weight. Subsequently, selected protein species are characterized and identified by mass spectrometry [41,59]. However, 2D PAGE technology is of limited usefulness for large scale use due to the intensive labor requirements and the inability to detect low abundance proteins and other proteins such as membrane proteins. The use of narrow range immobilized pH gradient strips helps to increase resolution and loading capacity, although their use does not appreciably increase the ease of detection of very high or very low molecular weight proteins.
After the protein separation step, individual protein spots are removed from the gel, trypsin digested and then analyzed by one of a variety of mass spectrometric techniques, such as Matrix-Assisted Laser Desorption Ionization Time of Flight Mass Spectrometry (MALDI-TOF-MS) [26,76] or Electrospray Ionization Time of Flight Mass Spectrometry [76]. For these techniques, proteins (peptides) are exposed to an ultraviolet laser resulting in the desorption and ionization of the peptides. By applying a high voltage gradient to the sample, the ions separate in the flight tube based on their mass and charge. The mass of the protein of interest is established by comparing the time necessary for the ions to reach the detector to the time characteristic of known proteins. The protein of interest is identified by referring to databases containing the masses of specific peptides. Comparisons of proteins found in samples from cancer cases to those from normal subjects can help identify proteins that are over-expressed in cancer, as has been done for squamous cell bladder carcinoma [8].
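The step of comparing flight times with those of known proteins amounts to a calibration. In a linear time-of-flight instrument, ions accelerated through the same voltage have a flight time approximately proportional to the square root of their mass-to-charge ratio, so two calibrants of known m/z are enough to fit a line. The following minimal sketch assumes that idealized relationship; the calibrant masses, flight times, and function names are hypothetical.

```python
import math

def calibrate(time_1, mz_1, time_2, mz_2):
    """Fit sqrt(m/z) = a * t + b from two calibrants of known m/z."""
    if time_1 == time_2:
        raise ValueError("calibrants must have different flight times")
    a = (math.sqrt(mz_2) - math.sqrt(mz_1)) / (time_2 - time_1)
    b = math.sqrt(mz_1) - a * time_1
    return a, b

def mz_from_time(flight_time, a, b):
    """Convert an observed flight time to m/z using the calibration."""
    return (a * flight_time + b) ** 2

# Hypothetical calibrants: m/z 1000 arriving at 10 microseconds and
# m/z 4000 arriving at 20 microseconds.
a, b = calibrate(10.0, 1000.0, 20.0, 4000.0)

# An unknown peptide ion observed at 15 microseconds.
unknown_mz = mz_from_time(15.0, a, b)
```

The resulting mass would then be matched against a database of peptide masses, as described above, to propose an identity for the protein.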
Several new alternative approaches are being developed, including those that incorporate new methods for protein labeling, such as Isotope Coded Affinity Tag (ICAT) peptide labeling [35,75], which combines quantification and sequence identification of individual proteins from complex mixtures. Another novel approach uses mass spectrometry immunoassay (MSIA) in combination with bioreactive probes [52,73], which allows high throughput protein analysis, offers the potential for automation and improved sensitivity, and provides data regarding interactions between specific proteins. Antibody microarrays/biochips using antibodies of varying affinities as well as recombinant antibodies have been proposed, and there is considerable enthusiasm for the "ProteinChip System" (Ciphergen, Fremont, California), which is based on Surface Enhanced Laser Desorption/Ionization (SELDI) [48] and permits direct and rapid separation and analysis of small amounts of proteins from biologic samples. Once proteins of interest have been identified, high throughput ELISA assays to detect them can be readily developed.

Detection of antibodies to tumor-associated or tumor-specific antigens as a basis for cancer screening assays
It has been known for some time that, in some patients, the presence of tumor-specific antigens elicits a tumor-specific antibody response [55]. It is now more widely accepted that antibodies to tumor-associated or tumor-specific antigens can be detected in the serum of many cancer patients [25,45], thus making these antibodies candidates for potential cancer biomarkers. However, while such antibodies can be easily detected by ELISA methods, the frequency of antibody responses to any given cancer antigen among cancer patients is typically low (~10-30%). Thus, it has been suggested that the use of a panel of multiple tumor antigens might serve to identify subsets of cancer patients with greater sensitivity and specificity. Recently, approaches that permit systematic screening for identification of such antigens have been developed, including serological analysis of tumor antigens by recombinant cDNA expression cloning coupled to a serum antibody-based screening method (SEREX) [66,67,76]. Briefly, fresh tumor tissue is used to construct a tumor-specific lambda phage cDNA expression library which is transfected into E. coli. Phage plaques are lifted onto a nitrocellulose membrane that has been impregnated with a chemical agent that induces expression of cDNA inserts within the phage library. The membrane containing the phage plaques and their respective cDNA products is subjected to immunoblotting with serum from a patient with the cancer of interest. The serum can be from the same individual whose cancer specimen was used to construct the cDNA expression library, or from another individual. A subset of the phage plaques that are recognized by antibodies in the patient serum will contain cDNAs corresponding to tumor proteins that elicited the specific antibody response, and thus, would be of interest as diagnostic targets.
This select group of cDNAs can be further tested for immunoreactivity through a series of further SEREX screening procedures, and the sequence of promising cDNAs can be readily determined by isolating phage DNA. While the SEREX technique may fail to detect some tumor antigens, such as those containing conformation or glycosylation-dependent epitopes, it has the advantage of allowing direct cloning of the corresponding cDNAs by reliable, well-established and relatively inexpensive molecular techniques. A variety of different tumor antigens have been identified using this approach [56], including p53 mutational antigens identified in colon cancer [67], differentiation antigens from melanoma, and over-amplified genes in renal cancer [56].
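The rationale for using a panel of antigens, each recognized by only a minority of patients, can be shown with a small calculation. The antigen names and reactivity data below are entirely hypothetical; the sketch counts a patient as panel-positive when serum reacts with at least one antigen.

```python
def antigen_sensitivity(patients, antigen):
    """Fraction of cancer patients whose serum reacts with one antigen."""
    return sum(1 for reactivity in patients if reactivity[antigen]) / len(patients)

def panel_sensitivity(patients):
    """Fraction of cancer patients reactive to at least one panel antigen."""
    positives = sum(1 for reactivity in patients if any(reactivity.values()))
    return positives / len(patients)

# Hypothetical ELISA results for ten cancer patients and three antigens.
# antigen_A is positive in patients 0-1, antigen_B in 2-4, antigen_C in 4-5.
patients = [
    {"antigen_A": i in (0, 1),
     "antigen_B": i in (2, 3, 4),
     "antigen_C": i in (4, 5)}
    for i in range(10)
]

single = {name: antigen_sensitivity(patients, name)
          for name in ("antigen_A", "antigen_B", "antigen_C")}
combined = panel_sensitivity(patients)
```

In this example no single antigen detects more than 30% of patients, while the panel detects 60%. Because a positive result on any antigen counts, false positives in cancer-free subjects also accumulate across antigens, so specificity has to be evaluated separately in control sera.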

Conclusion
Identification of novel markers for early identification of cancer is an established priority of the National Cancer Institute of the National Institutes of Health. The rapid expansion of genomic-based technologies developed over the last decade begins to offer the means for achieving this goal.