Propidium iodide (PI) is a fluorochrome used to measure the DNA content of individual cells, taken from solid tissues, with a flow cytometer. Compensation for spectral cross-over of this fluorochrome still yields results that depend on operator experience. We present a data-driven compensation (DDC) algorithm that is designed to automatically compensate combined DNA phenotype flow cytometry acquisitions. The compensation values generated by the DDC algorithm are validated by comparison with manually determined compensation values. The results show that (1) compensation of two-color flow cytometry leads to comparable results using either manual compensation or the DDC method; (2) DDC can calculate sample-specific compensation trace lines; (3) the effects of two different approaches to calculate compensation values can be visualized within one sample. We conclude that the DDC algorithm contributes to the standardization of compensation for spectral cross-over in flow cytometry of solid tissues.
Multiparameter flow cytometry (MP-FCM) of solid tumors is a powerful tool for quantification of antigen expression and DNA content, based on large numbers of individual mammalian cells. However, simultaneous application of different fluorochromes introduces spectral cross-over. Spectral cross-over is the acquisition of fluorochrome intensities from a primary fluorochrome in the detector(s) used to acquire the intensity of secondary fluorochromes. Compensation is the estimation of the amount of fluorochrome intensity that needs to be subtracted from the acquired intensities to correct for spectral cross-over [
A particular problem in flow cytometry of cells originating from solid tissues is the use of propidium iodide (PI). PI is a dye that binds to DNA, and the acquired intensity is proportional to the amount of DNA in a (tumor) cell. The major advantages of PI are that (1) the DNA profile of the acquired (tumor) cells can be studied in relation to their phenotype, (2) very small coefficients of variation (CV) can be obtained even in paraffin-embedded material, in contrast to other DNA dyes like TOPRO3, and (3) PI does not stick to the interior of the flow cytometer, as DAPI does. The major disadvantages of the PI dye are (1) spectral cross-over in all detectors that primarily detect fluorochromes excited with 488 and 635 nm lasers and (2) noncovalent binding to DNA. The loose binding makes compensation case specific because the average amount of bound PI also depends on the number of cells in a single-cell suspension. When the number of cells varies from case to case, the acquired average intensity of PI varies, and therefore the primary detector of PI needs variable amplification. Variable amplification of detectors in a flow cytometry system disturbs compensation matrices. Each case therefore needs its own compensation matrix, and compensation has to be performed for each case separately. This is time consuming and therefore costly.
In this paper we investigate spectral compensation using an algorithm we call Data-Driven Compensation (DDC), which was developed specifically to deal with variable compensation matrices when PI is used to study the DNA content of (tumor) cells in a single-cell suspension. We describe the analysis steps of the automated compensation concept based on the data of a 2-color experiment. In 2-color flow cytometry each event or count represents the acquired fluorochrome intensities for the 2 colors of one cell. The main difference between the DDC method and known compensation algorithms is that DDC calculates the compensation values from automatically selected counts. The key characteristic of the selected counts is that they all have the same primary fluorescence. This feature reduces the value spread of the counts that are selected to calculate the correlation between the acquired fluorochrome intensities in the primary PI detector and a secondary detector. The reduced spread of these counts thus opens the possibility of automatic compensation for spectral cross-over from PI, avoiding the need for any manual intervention.
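The core idea can be sketched in a few lines. This is a minimal illustration, not the implementation used in this work: it assumes that the selected counts are already available as linear (not logarithmic) PI and FITC intensities, that the cross-over is well described by a straight line, and that an ordinary least-squares fit is an acceptable way to estimate its slope. The function names and the example values are hypothetical.

```python
def fit_crossover(pi, fitc):
    """Least-squares slope and intercept of FITC against PI (linear units).

    The slope estimates the fraction of PI signal that crosses over into the
    FITC detector; the intercept estimates the signal-independent background.
    """
    n = len(pi)
    mean_x = sum(pi) / n
    mean_y = sum(fitc) / n
    sxx = sum((x - mean_x) ** 2 for x in pi)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pi, fitc))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def compensate(pi, fitc, slope):
    """Subtract the PI-dependent contribution from every FITC value."""
    return [y - slope * x for x, y in zip(pi, fitc)]


# Hypothetical selected counts: constant background of 5 plus 5% of the PI signal.
selected_pi = [200.0, 400.0, 600.0, 800.0]
selected_fitc = [15.0, 25.0, 35.0, 45.0]

slope, background = fit_crossover(selected_pi, selected_fitc)
compensated = compensate(selected_pi, selected_fitc, slope)
```

After compensation the selected counts no longer depend on the PI intensity, which is the criterion for correct compensation used throughout this paper.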
Over the period of January through December 2007 we analyzed 227 lymph node biopsies that were obtained from sentinel lymph node (SLN) procedures in breast cancer patients [
To ensure comparability of the data files of individual cases, 190 of a total of 227 data sets were selected based on a minimum of 100,000 counts acquired for both the NC and TEST files. A further 38 cases were excluded from the data set because of a laser replacement and laser beam alignment optimization, leaving a total of 152 cases in the final data set.
All flow cytometric acquisitions were performed on a BD FACSCalibur (BD Biosciences, San Jose, CA) flow cytometer with a single 488 nm argon laser. Forward light scatter, right-angle (side) scatter, and two fluorescence signals (FITC and PI) were acquired simultaneously in list mode. The fluorescence was measured using the standard photomultipliers (PMTs) and optical filters (530/30 nm BP filter for FITC and 670 nm LP filter for PI). The forward scatter was recorded with a photo diode. For FITC emission the pulse height was recorded (FL1h), for PI emission, in addition to the pulse height (FL3h), also the pulse width (FL3w), to calculate the area (FL3a), was acquired.
For each sample 100,000 counts were acquired, triggered on FL3. The DNA content was recorded in linear mode with a resolution of 1024 units. The FITC expression of the cells was recorded in logarithmic mode with 4 log decades in a range of 10^0 to 10^4, also using a resolution of 1024 units. No hardware compensation was performed during flow cytometric acquisition.
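Because the FITC signal is stored as a logarithmic channel number, it has to be converted to a linear intensity before any subtraction-based compensation. The sketch below shows one way to do this, assuming that channel 0 corresponds to 10^0 and that the 1024 channels are spread evenly over the four decades; the function names are hypothetical.

```python
import math

CHANNELS = 1024  # resolution of the acquisition
DECADES = 4      # logarithmic range, 10^0 to 10^4


def channel_to_intensity(channel):
    """Convert a logarithmic channel number to a linear intensity."""
    return 10 ** (channel * DECADES / CHANNELS)


def intensity_to_channel(intensity):
    """Convert a linear intensity (>= 1) back to a logarithmic channel number."""
    return math.log10(intensity) * CHANNELS / DECADES


# Under these assumptions every 256 channels correspond to one decade.
one_decade = channel_to_intensity(256)
two_decades = channel_to_intensity(512)
```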
For the analysis of the flow cytometric data files, the software package Summit V 4.0 (DakoCytomation, Glostrup, Denmark) was used. Single cells were selected by setting a region in a dot plot of FL3-w (abscissa) against FL3-a (ordinate) of the PI parameter, for the NC and TEST separately. As a standard procedure, the height of the single-cell region was set to include the 2C, 4C, 6C, and 8C ploidy clusters. These four clusters represent (1) cells in the diploid G0/G1 (2C) mode; (2) cells in the G2M phase of the diploid cell cycle (4C) combined with tetraploid G0/G1 cells; and (3)-(4) two aggregate peaks (6C and 8C), with the 8C peak containing cells in the G2M phase of the tetraploid cell cycle. The four clusters are identified in Figure
In the standard procedure, not data-driven, compensation for spectral cross-over is accomplished using a dot plot of FL3a-PI (abscissa) against FL1h-FITC (ordinate), by entering a percentage (compensation value) in the compensation matrix of the Summit software. As stated before, correct compensation is achieved when the compensated FITC intensity has no bias in the fluorescence distribution that is related to the acquired PI intensity. Since spectral compensation leads to compensation artifacts [
This section describes the theoretical background that stimulated us to develop a data-driven approach for compensation of spectral cross-over compared to hardware-driven or operator-dependent methods. The total fluorescence intensity detected by a primary fluorescence PMT in a 2-color flow cytometer setup consists of four contributions: (1) the fluorescence signal of the primary fluorochrome, (2) the cellular autofluorescence [
To confirm and visualize this intensity-dependent bias, operators plot the fluorescence intensities as detected by the primary PMT against those detected by the secondary PMT. The resulting dot plot contains an approximately linear distribution of points with the intensity-dependent bias expressed as the slope of the distribution [
Most dedicated software packages such as Summit version 4.0 (Dako Cytomation), Flowmax V 2.2 (Partec), and WinList V 5.0 (Verity Software House Inc.) function with manually determined values of the intensity-dependent bias. In contrast, the semiautomatic determination of the compensation value is used as a feature in the Winlist, Summit, and FlowJo (V 7.5) software packages. All of these methods essentially work with an initial estimate of the intensity-dependent bias and require operator evaluation and validation through visual inspection of the compensation results. In most cases this process results in an iterative manual refinement of the initial estimate of the intensity-dependent bias and is clearly operator dependent.
In this paper we propose a fully automated compensation algorithm that we have called Data-Driven Compensation or DDC. With this method we aim to exclude operator-associated subjectivity in defining the compensation value. To achieve this, we select a subset of counts that share the characteristic of having the value “zero” for their primary fluorescence. As defined above, the background fluorescence is composed of three factors only: autofluorescence, optical or electronic noise, and cross-over from a second fluorochrome. Autofluorescence and noise are independent of the signal of a second fluorochrome. The cross-over contribution, however, increases the intensity of the observed fluorescence proportionally to the intensity of the second fluorochrome. This difference in contribution between “signal-independent” and “signal-dependent” elements provides a good tool to measure the amount of cross-over. If cross-over did not exist, all counts with zero primary fluorescence would be displayed on a line parallel to the abscissa or ordinate in a dot plot. Therefore, the slope of the total fluorescence in the selected counts with zero primary fluorescence can serve this purpose. The resulting selected subset has a reduced variance of FITC intensity when compared to that of all of the counts acquired, as illustrated in Figures
In the following section we describe the formal approach to compensation using the DDC concept and explain the automatic determination of the intensity-dependent bias.
In a 2-color flow cytometry experiment with single cross-over, we define a detector
In essence the DDC algorithm automatically selects counts with zero primary fluorescence content from the acquired flow cytometry data. Below we describe the automatic selection of the zero fluorescence counts and the calculation of the cross-over values from these counts.
As described, an NC and a TEST sample, with
Identification of the common counts as basis for the calculation of the cross-over value. The matrices for the NC (a) and TEST (b) result from an experiment with 4 counts (
In the files every count consists of a total fluorescence signal to which different components contribute. In a similar way the total number of counts is made up of different classes of counts. Three types can be identified in the NC file: (1) negative counts with zero primary fluorescence, (2) positive counts with a primary fluorescence signal generated by nonspecific binding of nonrelevant mouse Ig (staining background), and (3) counts that solely result from noise and other flow cytometer imperfections. In a typical experiment less than 5% of the counts in the NC are expected to be type 2 counts resulting from the immunostaining. In a well-calibrated and maintained flow cytometer, type 3 counts will make up less than 0.1%. In summary, about 95% of the counts in the NC sample are type 1 counts.
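The selection step can be illustrated with a small sketch. It assumes that a common count is a (PI, FITC) channel pair that occurs in both the NC and the TEST acquisition, so that the selection reduces to an intersection of the two sets of value pairs. The function name and the example values are hypothetical, and the sketch ignores how often each pair occurs.

```python
def common_counts(nc, test):
    """Return the (PI, FITC) channel pairs present in both acquisitions."""
    test_pairs = set(test)
    return sorted({pair for pair in nc if pair in test_pairs})


# Hypothetical acquisitions with 4 counts each, as (PI channel, FITC channel).
nc_counts = [(200, 10), (200, 12), (400, 20), (410, 300)]
test_counts = [(200, 10), (400, 20), (400, 650), (200, 700)]

cc = common_counts(nc_counts, test_counts)
```

In this toy example the strongly stained TEST counts and the background-stained NC count have no partner in the other file, so only the two pairs with low FITC values remain as common counts.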
The counts collected from a TEST sample consist of the same three types as those described for the NC sample, supplemented with counts that contain sufficient fluorescence intensity to be defined as “positive.” The common counts of the NC and TEST matrices will
Having identified the common counts with zero primary fluorescence, we can now proceed to calculate the cross-over values. In the case of single cross-over from
Even though we were able to present a proper path to define compensation, no objective measure exists to quantify and validate the correctness of compensation. Visual inspection is subjective as it depends solely on operator experience while the RIDB value is also visually estimated from the compensated dot plots. Herzenberg et al. [
To test the DDC algorithm we conducted three experiments. In the first experiment we compared the compensation values obtained with DDC to those of manual compensation. All manual compensations were performed individually in the Summit software. We consider the performance of the DDC algorithm comparable with manual compensation when the two sets of compensation values agree. In the second experiment the results of the SLN analysis are calculated. These results are expressed as the percentage of positive counts in the TEST sample. The outcome is compared between the DDC and Summit paths. For these first two experiments a new MATLAB implementation was written for the reanalysis of the cases in the DDC dataset, which used the function FCA_readfcs [
The third experiment is performed to express the correctness of compensation. We herewith propose a new option, building on earlier work. Two methods have been suggested to achieve proper compensation. The first method uses a single stained control (SSC) for each fluorochrome in the panel [
To test the capability of the DDC algorithm to correctly calculate the compensation values, we compared these calculated values against the manually determined values. After linear transformation of the Summit compensation values
Results of a regression analysis of the manually determined compensation values using Summit V4.0 (
(a) shows events of a negative control (NC). (b) shows the events of the matching TEST. Both graphs are uncompensated 2-color plots, from sentinel lymph node (SLN) tissue. (c) shows the TEST after compensation, using a trace line fitted through the common counts of the NC and SSC (dashed line). (d) shows the TEST values, after compensation with a compensation trace fitted through the CC of the NC and the TEST (the full line). Each dot plot consists of 100,000 counts. The grey dots represent the individual counts in each dot plot. The black dots represent the common counts. The median values of each of the 5 distinct clusters (ploidy level 2C, 4C, 6C, 8C, and 10C) are marked with an “X.” For optimal visual representation, the pairs of compensation trace lines are white in the upper graphs and black in the lower graphs. Note that the 2 different compensation trace lines seem to be parallel in (a) and (b), which is an optical illusion because the ordinate is logarithmic. In the linear domain these 2 compensation trace lines diverge. The dot plots represent logical (
A total of five cases (out of 152) fall outside the 95% confidence interval. Two of these five cases lie close together below the lowest prediction bound, and their shared characteristic is a high background signal of the NC sample. This high nonspecific binding in these NC samples leads to an increase of CC values. Since this nonspecific reactivity induces a shift towards higher FITC intensities, fitting the compensation trace line leads to a higher compensation value, and thus to overcompensation. The other three cases (above the highest prediction bound) were reanalyzed by the two operators. All three turned out to be manually overcompensated in the Summit procedure.
The objective of the SLN procedure is to determine the percentage positive counts in a TEST sample compared to its NC. Therefore a second regression analysis was performed to compare the percentage positive counts found with the original SLN Summit procedure (pos_Summit) versus the procedure with incorporation of the DDC algorithm (pos_DDC). The regression line is defined by
Even though compensation values might agree between the two methods applied, this does not automatically imply that these compensation values are correct. As mentioned before, we acquired an additional 45 cases, stained according to the SSC concept. The compensation values found using the CC of the SSC and the NC (set 1) can thus be compared with the ones obtained using the CC of the NC and TEST (set 2).
The 2 sets of compensation values show no Gaussian distribution and are highly correlated (
The different compensation values within one sample are illustrated in Figures
We have evaluated an automated data-driven approach to fluorescence compensation in a two-color setting. All other compensation methods used thus far involve operator interaction and limit a fully automated data analysis. In order to provide confirmation and validation for the proposed methodology, we have used two software packages that are commonly used in flow cytometry laboratories as reference, Summit and Winlist. We have compared the outcome of the proposed DDC method against the outcome of these reference methods. The results show that (1) two-color flow cytometry compensation and analysis of sentinel lymph nodes in Summit and DDC lead to comparable results, even though Summit uses different compensation values than DDC does, (2) the effects of different approaches to calculate compensation values based on the CC can be visualized within one sample, (3) DDC is a data-driven method that combines the information of an NC or SSC with a TEST to calculate sample-specific trace lines, which makes it possible to automatically analyze large datasets of individual cases batchwise. This is where DDC improves current compensation methods.
According to the Summit reference guide, the Summit software uses a variant of our formulas (
To better understand the different distributions in the compensated dot plots, we have recompensated 10 data files using the Winlist software and have compared these results with data obtained using Summit. The Winlist option was chosen as a valuable alternative, as it is an independent, commercially available and commonly used software. The compensation algorithm in Winlist is based also, as we have done in the DDC approach, on (
The DDC method as proposed in this work makes it possible to objectively compare the effect on fluorescence compensation results when using different approaches.
In the experiments performed in our laboratory, both FITC and PI are used for simultaneous staining. The compensation performed requires an estimate of the cross-over of signal from the PI dye into the FITC detector (FL1). This cross-over effect and its estimate were also discussed in earlier work [
One alternative path in the literature claims that the percentage cross-over can only be estimated using an SSC [
Although we found a better distribution of median values after overcompensation in a specific two-color setting, this does not imply that overcompensation always yields better results. When compensated values of one fluorochrome influence the compensated values of other fluorochromes, overcompensation is not recommended. On the other hand, when the fluorochromes FITC, PI, and APC are combined, there is only cross-over from PI into FITC and from PI into APC. In this specific setting the compensated FITC and APC values do not influence each other; therefore both can be automatically compensated with DDC, based on two pairs of isotype controls and their TEST.
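The independence of the two corrections in the FITC, PI, and APC setting can be made explicit with a short sketch. It assumes linear intensities and a purely linear cross-over from PI into each of the other two detectors; the coefficient values are hypothetical and serve only as an illustration.

```python
def compensate_from_pi(pi, fitc, apc, pi_into_fitc, pi_into_apc):
    """Correct FITC and APC for cross-over from PI only.

    Neither corrected value enters the formula of the other, so each
    coefficient can be estimated and applied separately.
    """
    fitc_compensated = fitc - pi_into_fitc * pi
    apc_compensated = apc - pi_into_apc * pi
    return fitc_compensated, apc_compensated


# Hypothetical count: PI = 400, FITC = 30, APC = 50 (linear units).
fitc_a, apc_a = compensate_from_pi(400.0, 30.0, 50.0, 0.05, 0.10)

# Changing the PI-into-APC coefficient leaves the compensated FITC value unchanged.
fitc_b, apc_b = compensate_from_pi(400.0, 30.0, 50.0, 0.05, 0.12)
```

If FITC and APC also spilled into each other, the corrections would be coupled and would have to be solved jointly, which is the situation in which overcompensation of one fluorochrome propagates to the others.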
Batchwise compensation with currently available software is only possible with stable compensation values, which can be monitored by daily calibration and performance tracking. However, PI binds noncovalently to DNA and therefore the intensity of the PI signal is dependent on the cell concentration. Given the same amount of PI, single-cell suspensions with high cell concentrations result in lower PI intensities per cell. The result is a shift of the acquired FL3-a histogram to the left, as compared to samples that contain lower cell concentrations. DNA acquisitions are therefore standardized by adapting the voltage of the PMT of the first acquisition in each tumor sample, until the median value of the first peak falls in channel 200 of a 1024-resolution, linearly acquired DNA histogram [
The evolution from limited multiparameter to full polychromatic flow cytometry comes with increasing flow cytometry colors and parameters [
The DDC approach presented here also integrates operator experience to appropriately deal with sample-specific variables. The set of CC represents the specific area in the traditional approach to set a compensation trace line [
We expect that the implementation of DDC will improve the use of classification algorithms by generating training sets with data that have been corrected for case specific variability.
In the case of 2-color experiments with double cross-over, an NC uses two nonrelevant antibodies for background staining, one coupled to fluorochrome
Illustration of the identification of the common counts and the calculation of the cross-over value. Each 2-color experiment consists of 2 acquisitions; a negative control (NC) and a TEST. Matrices are shown for the NC (a) and TEST (b) based on an experiment with a total of 4 counts. Cross-over exists from fluorochrome
In a dot plot of
To compare the manual compensation values generated in the Summit software with those obtained with the DDC algorithm, the compensation units need to have a quantifiable relationship. Unfortunately, the unit of compensation in the Summit software is not documented. To properly define this unit, we have compared the “Summit” units to those units as defined in the Winlist software. This software uses compensation algorithms described by Bagwell. These algorithms have been extensively documented [
We used 10 of the acquired data files and reanalyzed these using the Winlist software. Table
The compensation values obtained on 10 data files using Summit and Winlist software. The compensation value corrects for spillover of the PI dye (detected in FL3) into the FL1 detector (FITC).
Compensation value Summit | Compensation value Winlist |
---|---|
1.0 | 0.05 |
0.9 | 0.04 |
1.35 | 0.07 |
1.45 | 0.08 |
1.65 | 0.09 |
1.50 | 0.08 |
1.20 | 0.07 |
0.80 | 0.04 |
0.80 | 0.04 |
1.1 | 0.05 |
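The relationship between the two sets of units can be explored with an ordinary least-squares fit on the ten value pairs listed in the table. The sketch below is an illustration only: it assumes that a straight line is an adequate model, it is not the transformation used in this work (whose coefficients are not given here), and the function names are hypothetical.

```python
import math

# Compensation values from the table above (10 data files).
summit = [1.0, 0.9, 1.35, 1.45, 1.65, 1.50, 1.20, 0.80, 0.80, 1.1]
winlist = [0.05, 0.04, 0.07, 0.08, 0.09, 0.08, 0.07, 0.04, 0.04, 0.05]


def linear_fit(x, y):
    """Return slope, intercept, and Pearson correlation of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy / math.sqrt(sxx * syy)


slope, intercept, r = linear_fit(summit, winlist)


def summit_to_winlist(value):
    """Map a Summit compensation value onto the Winlist scale (fitted line)."""
    return slope * value + intercept
```

With only ten pairs, and Winlist values reported to two decimals, the fitted coefficients should be read as a rough indication of the relationship rather than as a calibration.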
The authors acknowledge E. O. Postma, R. Langelaar, and F. Nauwelaers for their critical comments on this paper.