Proteomics as a Tool for Biomarker Discovery

Novel technologies are now being advanced for the purpose of identification and validation of new disease biomarkers. A reliable and useful clinical biomarker must a) come from a readily attainable source, such as blood or urine, b) have sufficient sensitivity to correctly identify affected individuals, c) have sufficient specificity to avoid incorrect labeling of unaffected persons, and d) result in a notable benefit for the patient through intervention, such as survival or life quality improvement. Despite these critical descriptors, the few available FDA-approved biomarkers for cancer do not completely fit this definition and their benefits are limited to a small number of cancers. Ovarian cancer exemplifies the need for a diagnostic biomarker of early stage disease. Symptoms are present but not specific to the disease, delaying diagnosis until an advanced and generally incurable stage in over 70% of affected women. Diagnostic intervention in the form of oophorectomy can be performed in the appropriate at-risk population, provided that population can be identified, for example with a new accurate, sensitive, and specific biomarker. If early stage disease is identified, the requirement for survival and life quality improvement will be met. One of the new technologies applied to biomarker discovery is tour-de-force analysis of serum peptides and proteins. Optimization of mass spectrometry techniques coupled with advanced bioinformatics approaches has yielded informative biomarker signatures that discriminate patients with cancer from unaffected individuals in multiple studies from different groups. Validation and randomized outcome studies are needed to determine the true value of these new biomarkers in early diagnosis, and improved survival and quality of life.


Biomarkers: A working definition
A biomarker is a measurable or assessable entity that provides diagnostic, prognostic, or treatment-orienting information which can drive patient care [1][2][3]. In order to be time, cost, and patient conscious, optimal biomarkers must fulfill four criteria: 1. Easily attainable; 2. Adequate sensitivity; 3. Adequate specificity; 4. Lead to patient benefit through a therapeutic or diagnostic intervention.
An easily attainable sample is one that can be obtained in a physician's office or clinic and for which limited stringency of preparation is required. Urine is an example of readily usable samples [4,5]. More difficult samples are those requiring an invasive procedure for ascertainment, such as breast nipple aspirate [6,7] or needle biopsy. These may yield more sensitive and specific results, but due to the increased complexity and potential injury during the procedure, they may not attain mainstream applications. Blood is a logical source for biomarker information because it is exposed directly to all organs of the body, and therefore may be an archive of all ongoing processes. Blood samples requiring refrigeration or separation within a 4-24 hour period are commonplace for a variety of clinical tests in current use [8]. In ovarian cancer, this may be especially helpful because the symptoms and signs are not specific to the disease and current diagnostic modalities do not recognize early stage disease [9]. This stage of disease may result in alterations of circulating blood components in a fashion that is detectable with newer technologies [10].
Sensitivity, the ability to correctly identify affected patients, is an important criterion for a biomarker. Correct designation of a process is a logical expectation and has been a reasonably attainable goal. Specificity, the ability to correctly identify unaffected persons, is often more challenging and becomes progressively more difficult as events become more rare. Ovarian cancer is estimated to affect one in 2500 post-menopausal women [11][12][13]. It has been estimated that a specificity of over 99% is required of a useful diagnostic biomarker for a disease this rare [14,15]. A balance between the stringency of the targeted sensitivity and appropriate specificity coupled with adequately powered test steps is required for success. The need for adequately powered sample sets for validation and prospective testing would likely require sharing of specimens and multi-institutional studies to develop the necessary sample repositories. This suggests that biomarker development is best done as a collaborative team effort with shared resources.
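The specificity requirement for a rare disease can be illustrated with a short calculation. The sketch below applies Bayes' rule to the one-in-2500 prevalence quoted above; the function name and the assumption of perfect sensitivity are illustrative choices, not values taken from the cited studies.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of positive test results that are true positives (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Prevalence from the text: one in 2500 post-menopausal women.
prevalence = 1 / 2500

# Assumption for illustration: a test with perfect (100%) sensitivity.
for specificity in (0.95, 0.99, 0.996, 0.999):
    ppv = positive_predictive_value(1.0, specificity, prevalence)
    print(f"specificity={specificity:.3f}  PPV={ppv:.3f}")
```

Under these assumptions, even a test with 99% specificity yields a positive predictive value of only about 4%, meaning most positive results would be false positives. This is why specificity, rather than sensitivity, dominates the design of a screening biomarker for a disease at this prevalence.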
The final criterion of an effective biomarker is that of clinical applicability. Development of biomarkers as a scientific endeavor may have merit, especially if the identified biomarkers yield insight into etiology, mechanism, or therapeutic intervention for disease. However, for biomarkers ultimately to be of clinical value, they must provide information that will direct clinical practice. A blood test that identifies the presence of lung cancer may be useful if the components of the test provide knowledge about lung cancer, or if it is used in conjunction with other diagnostic modalities. A blood test for lung cancer done in a clinical vacuum makes little sense: one cannot resect both lungs to find the cancer for the patient with a positive biomarker. Therefore, the final requirement a biomarker must meet is that the clinician can use the information gained to alter patient outcome, such as survival or quality of life. Returning to the ovarian cancer example, one asks, how do we intervene with a positive biomarker? Oophorectomy can be considered in the population of women at risk for ovarian cancer, those who have completed child-bearing. A valid, highly specific and highly sensitive biomarker indicating high likelihood of ovarian cancer or high risk of developing ovarian cancer would provide justification for a diagnostic and/or prophylactic oophorectomy. Testing for a defined biomarker coupled with an appropriate and effective clinical intervention will need to be proven to alter outcome, through early diagnosis and intervention, or improved quality of life and reduced lifetime risk of cancer.

Why proteins?
Given the vast array of information from which to develop and validate biomarkers for detecting ovarian cancer, the rationale for using proteins is a fair question. Many technologies have been used for discovery of genes and proteins that may function as novel biomarkers [16][17][18][19][20]. Comparative genomic hybridization and cDNA array technology have been used to identify single genes or sets of genes with prognostic or diagnostic information for ovarian and other cancers. Most commonly, these studies have been done using archival tumor samples and so do not address the criterion of easily obtainable samples. Gene array studies uncovered useful and interesting information about ovarian cancer that researchers moved forward into the development of biomarkers and therapeutic targets. For example, one very promising protein biomarker, HE-4, was identified in a microarray format [12,21][22][23][24]. This protein is a whey acidic protein (WAP) shown to be increased in expression and protein quantity in blood of patients with ovarian cancer. It is one of several proteins in a multiplex assay under development [23]. The gene array in this case led to a focus on protein. Secreted circulating proteins are easy targets for detection and quantitation as indicated by the biomarker assay. They may be reflections of the tumor directly and/or its local microenvironment. The clinical analyte source is blood, which circulates in contact with all tissues of the body and is readily obtained and used. The protein is the effector end of the gene in almost all situations, and can be modified co- and post-translationally to further regulate information exchange. Therefore, proteins are easily accessible and may have a greater information load than genomic or genetic materials.

Mass spectrometry as a spy glass
Many approaches to discovery of clinically informative proteins and peptides have been attempted. Discovery tools such as two-dimensional electrophoresis (2DE), which compares spot patterns between samples from affected and unaffected patients, have been examined, with differentially expressed spots then sequenced for identification and subsequent verification [17,25]. Unfortunately, this technique is a low throughput system that requires large amounts of clinical material because of the low sensitivity of the technology. Validation requires further large quantities of sample. We and others have used this technique to identify putative biomarkers in ovarian cancer. Brown and colleagues reported use of laser capture microdissected cells from low malignant potential and invasive epithelial ovarian tumors in a 2DE discovery project [17,26]. RhoGDI was differentially expressed and validated as upregulated in invasive cancer [17]. While individual samples identified might address specificity and sensitivity, ease of sampling and speed of discovery are drawbacks with this technique.
Mass spectrometry (MS) has long been used for peptide sequencing. More recently, it has been applied to high throughput discovery techniques when coupled to chip, matrix, or spray sample introduction methods. MS can be successful with minute quantities of sample and can test hundreds of samples in one day. The high throughput nature of MS lends itself to a biomarker application. SELDI (surface-enhanced laser desorption ionization) [6,27] and MALDI (matrix-assisted laser desorption ionization) [1,25,28] are two mass spectrometry techniques used successfully for peptide and protein discovery. SELDI uses an on-chip protein fractionation as a first selection followed by MALDI for interrogation. The MALDI technique requires a fractionation or isolation step followed by interrogation. As little as a fraction of a drop of blood can be used with either technique, or alternatively low abundance proteins and peptides can be concentrated using chromatographic selection approaches. Resultant MS datastreams are stable and can be introduced into different discriminatory algorithms with supervised and unsupervised analyses to cull peptides or proteins with potential clinical impact [29][30][31][32].

Mass spectrometry and bioinformatics wed to yield biomarker patterns
Our initial hypothesis argued that blood circulated to all parts of the body and would thus be exposed to tumor, in situ or invasive. MS could then be used to mine the hidden information in the serum to yield diagnostic information. The initial proof of concept brought about a storm of support, opposition, collaboration and competition [32,33]. In the ensuing years, many groups have culled information from MS analysis of serum to yield patterns and/or identification of proteins with diagnostic load [5,16,27,28,30,34][35][36][37][38][39][40]. The original work used early SELDI-time of flight (TOF) technology with a hydrophobic on-chip separation and cinnamic acid matrix [33]. A proprietary genetic bioinformatic algorithm of Correlogics, Inc was applied to datastreams from a defined training set of serum from 50 cases of ovarian cancer and 50 unaffected women. A five-space peptide signature was identified and validated against an independent and blinded set of sera. Since that time, our group and others have advanced the process to use more developed SELDI, MALDI, and other MS technology with a variety of bioinformatic platforms. These methods have evolved to reflect the knowledge of the source of the diagnostic markers and their association with albumin. Our current SELDI technology utilizes a strong anionic exchange surface that specifically binds albumin. Recently, ProExpression kits (Perkin Elmer, Inc.) have been used for extraction of diagnostic fragments from albumin which are then profiled using a high resolution orthogonal mass spectrometer [41]. The information from these methods is rich, and robust bioinformatics methods that provide a list of important ions can guide the identification of novel diagnostic markers.
Pilot studies have discovered novel cancer-specific serum proteomic profiles in ovarian cancer, bladder cancer, prostate cancer, pancreatic carcinoma, and colorectal cancer [5,16,27,28,30,34][35][36][37][38][39][40]. Most have used independent training and validation (or test) sets of archival serum samples. None have yet completed large prospective validation trials.
The Gynecologic Oncology Group is currently accruing to GOG-220, a protocol designed to build and validate a proteomic signature for women with pelvic masses in order to discriminate malignant tumors from benign masses. A clinical biomarker such as this would be of value for triaging women to the appropriate gynecologic oncologic care for their diagnosis and initial therapy of their malignancy. The training set of this trial is powered to require at least 50 cancers and 300 benign masses. Validation will use at least 50 additional cancers and 500 benign masses provided blinded to diagnosis at the time of MS analysis. The subsequent step will be a diagnosis trial in which the algorithm defined in GOG-220 will be applied prospectively for diagnostic triage and for prediction of the accuracy of the diagnostic test. The spectrum of cancers by stage and grade will be important in assessing the potential for the biomarker, if successful, to result in a survival advantage. This would fulfill criterion number four, an intervention with clinical benefit.
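The size of a validation set limits how precisely sensitivity and specificity can be estimated. The sketch below computes a Wilson score confidence interval for a proportion; the choice of the Wilson method and the example counts (45 of 50 cancers detected, 475 of 500 benign masses correctly called) are hypothetical illustrations and are not drawn from the GOG-220 protocol or its results.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical counts, sized to match the minimum validation set in the text.
sens_lo, sens_hi = wilson_interval(45, 50)    # sensitivity from 50 cancers
spec_lo, spec_hi = wilson_interval(475, 500)  # specificity from 500 benign masses

print(f"sensitivity 0.90, 95% CI {sens_lo:.3f} to {sens_hi:.3f}")
print(f"specificity 0.95, 95% CI {spec_lo:.3f} to {spec_hi:.3f}")
```

With only 50 cancers, the interval around an observed sensitivity of 90% spans well over ten percentage points, whereas the larger benign group gives a much tighter specificity estimate. This illustrates why multi-institutional accrual is needed to reach adequately powered sample sets.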

POSTANALYTICAL
• Analyze each spectrum to identify peaks before applying diagnostic algorithms
• Develop criteria for the acceptability of each spectrum based on peak characteristics
• Use peaks rather than raw data as the basis for diagnostic analysis
• Use caution in interpretation of peaks with m/z <1200
• Select peaks with high intensities and sample stability for diagnosis
• Select approximately equal numbers of peaks that increase and decrease in intensity as diagnostic discriminators
• In developing a training set for diagnosis, careful clinical classification of patients is essential
• Clinical validity depends on having a typical rather than highly selected population of patients
• The number of training specimens should be at least 10 times the number of measured values
• Any clinical application should use a fixed training set and algorithm for analysis
• Any analysis should provide a numerical value
• Diagnostic performance should be evaluated with ROC curves to select cutoffs
• A sensitivity analysis should be performed of the necessary precision for accurate diagnostic performance
• There should be QC procedures for daily verification of software performance
*Adapted from [43].
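The recommendation to evaluate diagnostic performance with ROC curves can be sketched as follows. This is a minimal illustration, assuming each specimen has already been reduced to a single numerical score; the function names, the toy scores, and the use of Youden's J statistic to pick a cutoff are assumptions made for the example and are not prescribed by [43].

```python
def roc_points(scores, labels):
    """Return (cutoff, sensitivity, specificity) for every distinct cutoff.

    A specimen is called positive when its score is >= cutoff.
    labels: 1 = disease, 0 = no disease.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        points.append((cutoff, tp / n_pos, tn / n_neg))
    return points


def best_cutoff_youden(points):
    """Pick the cutoff maximizing Youden's J = sensitivity + specificity - 1."""
    return max(points, key=lambda p: p[1] + p[2] - 1.0)


# Toy data (hypothetical): algorithm scores for 6 unaffected and 4 affected sera.
scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90, 0.95]
labels = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1]

cutoff, sensitivity, specificity = best_cutoff_youden(roc_points(scores, labels))
print(f"cutoff={cutoff}  sensitivity={sensitivity:.2f}  specificity={specificity:.2f}")
```

Youden's J weights sensitivity and specificity equally. For a rare disease, a cutoff chosen to meet a minimum specificity first may be more appropriate than one that maximizes J.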

To know or not to know . . . the identity of the proteins
The outcome of MS analysis of serum in a biomarker discovery platform falls into two categories: individually identified proteins or patterns of multiple proteins. Both may have diagnostic value coming from different approaches. MS datastreams can be analyzed in a wide variety of higher order bioinformatic algorithms charged with identifying proteins or patterns of proteins that discriminate event A, e.g. malignancy, from event B, e.g. benign disease [1,42]. A supervised training of the algorithm can yield a product that can then be evaluated for its sensitivity and specificity, or can be subjected to further analysis including protein sequence identification. Development of a product set of MS signals that can correctly categorize events, and therefore correctly diagnose disease from no disease, has been shown to be a probable and reliable outcome of MS proteomics.
The initial proof of concept study demonstrated that a pattern of MS features found in serum could discriminate samples between ovarian cancer patients and unaffected patients [33]. A different pattern was shown to segregate prostate cancer from unaffected males [34]. This was not a simple separation of male from female. The "black box" discriminant, a descriptor rather than a set of identified proteins, may be intellectually frustrating while still being clinically useful. The identity of the protein(s) may yield key information about the etiology, behavior, and/or treatment of the disease. Thus, many groups have taken a different approach, wherein they identify peaks of interest and assess them independently for clinical value. For example, three markers (transthyretin, inter-α-trypsin inhibitor and apolipoprotein A1) were identified as potential diagnostic biomarkers for early stage ovarian cancer using SELDI-TOF patterns to guide identification by SELDI-MS/MS [16]. Is one preferable over the other clinically? Any biomarker that has documented validity and stability and is shown to have a clinical benefit to the population, whether a "black box" of information or known entities, would be a dramatic and critical advance for ovarian cancer and many other illnesses.

Fig. 1. The capture of proteins from samples by antibodies is detected by label free detection such as surface plasmon resonance. Once protein has been detected, any modifications are characterized by mass spectrometry.

A window into the future
The use of proteomic patterns as potential diagnostic technology is evolving due to advances in the understanding of sample acquisition and processing, sample fractionation and preparation, robotic processing, mass spectrometry, bioinformatics and data analysis, and interpretation. However, there is much to be done before these techniques can be introduced to the clinical lab. The basics of the source of the diagnostic ions must be confirmed and validated. Understanding the principle of the test procedure is a necessary part of any clinical test. In addition, there are many factors that must be addressed before mass spec technology can be considered for clinical use (Table 1) [43]. Meanwhile, the pattern approach is being adapted to and used in traditional immunoassays some of which have been incorporated into microarrays [16,44]. This approach is one that could allow transition to the clinical laboratory because the technology is better understood. However, the manufacturing issues of a multiplexed assay system are challenging and will need to be resolved. The combination of the microarray approach using binding partners such as antibodies along with mass spectrometry is an exciting possibility. As shown in Fig. 1, antibodies would be used to capture proteins or peptides of interest, which could then be detected by methods such as surface plasmon resonance. MALDI interrogation of the bound protein or peptide could then be used to fine tune the specificity of the antibody binding event. This may detect subtle differences in disease-related proteins, such as phosphorylation events that are difficult to detect using antibodies but are discernible by mass spectrometry.

Parting shots
Clinical proteomics is a rapidly evolving field whose clinical application has yet to be realized. Controversies regarding reproducibility, reliability, sample handling and analysis of the data were thought to have tainted the early promise of this application. On the contrary, these early studies have provided important lessons for the successful application of clinical proteomics in the future. Clinical proteomics offers the potential of early diagnosis of disease and prognostic information to guide clinical treatment of the patient. Additionally, proteomics may supply information regarding drug susceptibility and toxicity that may foretell side effects and complications. The greatest hope of clinical proteomics may be the potential to individually tailor treatment to the patient, a truly revolutionary step in the practice of medicine.