Chemometrics-Enabled Raman Spectrometric Qualitative Determination and Assessment of Biochemical Alterations during Early Prostate Cancer Proliferation in Model Tissue

The use of Raman spectroscopy combined with multivariate chemometrics for disease diagnosis has attracted great attention from researchers in recent years. This is because it is a noninvasive and nondestructive detection approach with enhanced sensitivity. However, a major challenge when analyzing spectra from biological samples has been the detection of subtle biochemical alterations buried in background and fluorescence noise. This work reports a qualitative chemometrics-assisted investigation of subtle biochemical alterations associated with prostate malignancy in model biological tissue (metastatic androgen insensitive (PC3) and immortalized normal (PNT1a) prostate cell lines). Raman spectra were acquired from PC3 and PNT1a cells at various stages of growth, and their biochemical alterations were determined from difference spectra between the two cell lines (for prominent alterations) and principal component analysis (PCA) (for subtle alterations). The Raman difference spectra were computed by subtracting the normalized mean spectral intensities of PNT1a cells from the normalized mean spectral intensities of PC3 cells. These difference spectra revealed prominent biochemical alterations associated with the malignant PC3 cells at the 566 ± 0.70 cm⁻¹, 630 cm⁻¹, 1370 ± 0.86 cm⁻¹, and 1618 ± 1.73 cm⁻¹ bands. The band intensity ratios at 566 ± 0.70 cm⁻¹ and 630 cm⁻¹ suggested that prostate malignancy can be associated with an increase in relative amounts of nucleic acids and lipids, respectively, whereas those at 1370 ± 0.86 cm⁻¹ and 1618 ± 1.73 cm⁻¹ suggested that prostate malignancy can be associated with a decrease in relative amounts of saccharides and tryptophan, respectively. In the analysis using PCA, intermediate-order and high-order principal components (PCs) were used to extract the subtle biochemical fingerprints associated with the cell lines. This revealed subtle biochemical differences at 1076 cm⁻¹, (1232, 1234 cm⁻¹), (1276, 1278 cm⁻¹), (1330, 1333 cm⁻¹), (1434, 1442 cm⁻¹), and (1471, 1479 cm⁻¹). The band intensity ratios at 1076 cm⁻¹ and 1232 cm⁻¹ suggested that prostate malignancy can be associated with an increase in subtle amounts of nucleic acids and amide III components, respectively. The method reported here has demonstrated that subtle biochemical alterations can be extracted from Raman spectra of normal and malignant cell lines. The identified subtle bands could play an important role in quantitative monitoring of early biomarker alterations associated with prostate cancer proliferation.


Introduction
Cancer is a potentially fatal disease dominated by the uncontrolled growth and metastasis of abnormal cells [1]. By 2018, prostate cancer was estimated to be the second most frequent cancer and the fifth leading cause of cancer death in men [2]. The conventional prostate-specific antigen (PSA) procedure for prostate cancer diagnosis has limitations, including the risk of early detection and treatment of indolent cancers [1], and Gleason grading may be highly subjective, leading to inter- and intraobserver variations in pathological reporting and to false-positive and/or false-negative outcomes [3]. This highlights the need to develop highly sensitive, less invasive, real-time, and relatively low-cost methods for screening patients at the early stages of cancer development.
In the search for biomolecular differences between normal and diseased biological samples, vibrational spectroscopy methods, e.g., Fourier transform infrared (FTIR) absorption spectroscopy and Raman spectroscopy, have generated increasing interest in the biomedical sciences. These methods have been applied successfully in disease diagnostics due to their capability of performing quantitative analysis on the different chemical compositions and molecular structures of healthy and pathological tissues [4, 5]. For instance, FTIR spectroscopy has previously been employed to spectrally differentiate between normal and neoplastic human skin samples affected by epithelioma and basalioma cancers [4] and to characterize the damage and regeneration caused by chemical agents in liver tissues [5, 6]. In the last two decades, Raman microspectroscopy has become a popular noninvasive spectroscopic technique for in vivo, ex vivo, and in situ human cancer studies [3, 7-10], thanks to the numerous advances in instrumentation, including the use of various types of powerful laser sources for excitation, novel optics arrangements, and sensitive detectors. Besides being a rapid and objective technique, it requires minimal sample preparation, is less invasive, is reagent free, has high spatial and depth resolution (down to 1 μm), and is minimally influenced by water bands [10, 11]. Its potential is enhanced when combined with chemometric algorithms for extracting the most significant chemically relevant information, while the less informative data and insignificant information (noise) are discarded.
Recent works show that Raman microspectroscopy has been extensively used in detecting biochemical alterations in prostatic malignancies. For instance, a 785 nm excitation laser-based spectroscopic study on benign prostate tissue specimens, employing Raman spectroscopy and attenuated total reflectance Fourier transform infrared (ATR-FTIR) spectroscopy in conjunction with principal component analysis-linear discriminant analysis (PCA-LDA) and variable selection techniques (genetic and successive projection algorithms), was reported by Theophilou et al. [12]. In this study, deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) bands in the 1000-1400 cm⁻¹ region were observed to be the most significant for segregation between the prostate tissues archived over a period of three decades. In [13], the applicability of the 1064 nm wavelength in prostate cancer diagnosis was evaluated, where spectral markers associated with proteins, lipids, and nucleic acids differentiated malignant and benign samples, mainly in the 1000-1450 cm⁻¹ region. A support vector machine (SVM) classification model predicted Gleason scores of samples with an overall accuracy of 95%. Elsewhere, a dual excitation wavelength system (based on 785 nm and 671 nm lasers) was employed in assessing human prostate tissues in the fingerprint (500-1800 cm⁻¹) and high wavenumber (2800-3550 cm⁻¹) regions [14]. In this study, application of SVM to the spectral datasets showed that the 500-1800 cm⁻¹ and 2800-3050 cm⁻¹ spectral regions contained complementary molecular information for prostate cancer detection. The peak intensities of DNA/RNA, lipids, carbohydrates, and proteins were found to be higher in cancerous tissues than in normal tissues.
To understand early progression of prostate cancer, the potential of using monoclonal proliferated immortalized prostate cell lines in prostate cancer studies has been investigated using Raman microspectroscopy and chemometrics. For instance, various chemometric methods (PCA, classical least-squares curve fitting (CLSCF), and self-modelling curve resolution (SMCR)) were evaluated on their capability to detect biochemical differences between two human prostatic cell lines (PC3 and PNT2) [15]. PCA was found to be more suitable for the imaging process, where the second principal component (PC2) successfully revealed tumor cells exhibiting a modified distribution of the cytoplasmic lipid fraction. In contrast, SMCR revealed concentration levels of proteins and lipids, where a higher concentration increase in lipids (97%) was observed in tumor cells. In another study [16], biochemical differences between immortalized normal prostate cells (PNT2) and metastatic androgen-independent prostate cancer cells (DU145) were investigated. The features of glycogen (485 cm⁻¹), phenylalanine (621 cm⁻¹, 1003 cm⁻¹, and 1031 cm⁻¹), DNA (723 cm⁻¹), and nucleic acids (1184 cm⁻¹) were found to be influential for classification assignment into the cancer class (DU145), whereas the L-arginine (497 cm⁻¹) component was found to be influential for classification assignment into the normal class (PNT2). In this study, a PCA-LDA model distinguished the two cell lines with specificity and sensitivity values of 88% and 95%, respectively.
Often, however, a major challenge is the detection of subtle biomolecular differences that have eluded detection, perhaps due to their extremely small concentrations against the pronounced background and fluorescence noise. A possible solution which is yet to be explored for detection of subtle alterations during prostate cancer proliferation is the combination of Raman spectroscopy and principal component analysis (PCA). PCA is a widely used dimensionality-reduction multivariate analysis technique for removing redundancy in the original datasets. PCA aims at reducing the dimensionality while, at the same time, accounting for as much of the variation in the original dataset as possible [17]. With PCA, the data are transformed into a new set of coordinates or variables (principal components) that are linear combinations of the original variables, whereas the observations (scores) in the new principal component space are uncorrelated. In many spectral analyses, however, only the first few principal components (PCs) that account for large proportions of variance (i.e., most significant information) are retained, while the remaining higher-order PCs (typically assumed to be noise) are discarded [17, 18]. Limiting spectral data analysis to only a reduced set of the first few principal components is disadvantageous in two ways: firstly, there is no guarantee that the difference between the sample groups of interest will be in the direction of the first few or high-variance principal components [18], and secondly, the low-variance signals (weak Raman bands) in the presence of much higher variance signals can vanish into the noise of higher principal components [19]. The loss of low-variance signals can exclude most of the subtle information (e.g., subtle biochemical alterations) in the original data concerning between-group variation.
Jolliffe [18] showed that the difference between the sample groups will not necessarily be in the direction of the first few or high-variance principal components. This is because the first few principal components are useful for identifying differences between sample groups only in the following situations: (i) where the between-group variation is much larger than the within-group variation and (ii) where the within- and between-group variation have the same dominant directions [18]. For instance, work by Theophilou et al. [12] showed that the first six principal component scores, which accounted for >90% of the variance, could not demonstrate any substantial separation of prostate chippings according to their year of collection, although the principal component scores were statistically significant (p < 0.05). In another study, the third principal component (PC3), which accounted for 10.2% of the total cumulative variance, was found to discriminate two prostatic cell lines (PNT2 and DU145) better than the first principal component (PC1), which accounted for 62% of the total cumulative variance [16]. Because the best subset of principal components is not necessarily limited to those with the largest variances, Pelletier [19] suggested that analysis of the excluded higher principal components could be a potential method of detecting analyte information and other trends in the dataset. This is in agreement with observations made by Jolliffe [18] that the last few PCs are not simply unstructured leftovers after removing the important PCs but can be useful in other ways, which include detecting otherwise unobservable near-constant linear relationships between variables, regression, selection of subsets of variables, and outlier detection.
Currently, there is a lack of research interrogating diagnostically useful weak band biomolecular alterations during prostate cancer progression based on the utility of low-variance principal components. Therefore, the aim of this study was to investigate the weak detectable biomolecular alterations associated with the metastatic androgen insensitive (PC3) and immortalized normal (PNT1a) prostate cell lines using intermediate- and high-order PCs. The weak band alterations in cells were studied over three stages of cell proliferation, i.e., stage 1 (48 hours), stage 2 (72 hours), and stage 3 (96 hours). PCA was used for multivariate analysis. The chosen principal components must be statistically distinct in order to make biological sense; otherwise, they would only represent random directions [20]. Therefore, the utility of PCs was determined by a combination of two statistical criteria, i.e., the t-test and effect size (Cohen's d and Pearson correlation coefficients (r)).
In brief, this paper is organized as follows: Section 2 describes the sample preparation, Raman spectral data collection, and spectral analysis procedures. Section 3 presents the results and their discussion. Finally, Section 4 highlights the significant findings of the study.

Cell Lines Samples Preparation.
The metastatic androgen insensitive (PC3) and immortalized normal (PNT1a) human prostate cell lines at passages 3 and 4, respectively, were prepared at Kenya Medical Research Institute (KEMRI), Nairobi. Both cell lines were cultured in Dulbecco's Modified Eagle Medium (DMEM (1x)) (Gibco ... for approximately 8 minutes, rinsed with double distilled Millipore water, and then allowed to dry overnight in the biosafety chamber. The attached monolayer cells were examined using the Raman optical microscope system. The remaining cells in T-75 flasks were detached by the usual trypsinization procedure and centrifuged twice at 1200 revolutions per minute (rpm) for 5 minutes. The resultant cells were washed twice in 1.5 ml of phosphate-buffered saline solution (Sigma-Aldrich®, USA) and centrifuged at 1200 rpm for 5 minutes after each wash. After removing the remaining supernatants, cells were vortexed again, then suspended in 1.5 ml of phosphate-buffered saline solution, and stored at −80 °C.

Raman Spectroscopy Instrument.
Raman spectra were recorded using the STR series Raman spectrometer setup (Airix Corporation, Japan), which comprised two excitation sources: a diode-pumped solid-state green 532 nm laser (class IIIb, 50 mW, Laser Quantum Corporation, UK) and a single-mode diode near-infrared 785 nm laser (class IIIb, 100 mW). The scattering efficiency is higher at shorter wavelengths (e.g., 532 nm), thereby resulting in shorter integration times [21], but so are autofluorescence and sample photodegradation [22]. Therefore, spectra were collected using 785 nm laser excitation. Importantly, the 785 nm NIR excitation source has good penetration depth into samples (e.g., biological tissues), leading to excitation of large sample volumes [23]. Raman signals were collected with a 50x infinity-corrected microscope objective (Olympus BX51, Olympus Corporation, Tokyo) and focused, after cutting out more than 99.9% of the elastically scattered light with a 785 nm MaxLine® laser clean-up filter (LL01-785, Semrock Corporation), onto the entrance slit of a spectrograph (Princeton Instruments Acton SpectraPro2300, entrance slit 10 μm to 3.0 mm, focal length 300 to 500 mm, grating 600 lines/mm). A charge-coupled device (CCD) camera (Ludl Electronics Products, Ltd) was employed for signal detection. The experimental parameters/control functions (including cosmic-ray removal and file conversions) were set and managed by the Windows-based STR Raman version 1.9.3 software (Airix Corporation, Japan).

Raman Spectral Data Collection.
Thawed suspended supernatants were centrifuged at 1200 revolutions per minute (rpm) for 5 minutes. To avoid large background signals that could emanate from the sample substrate, calcium fluoride substrates (Crystran, UK) were specifically employed for this study [21]. For Raman measurements, a 15 μl aliquot was deposited onto freshly prepared sterile calcium fluoride substrates. The cells were allowed to dry at room temperature (approximately 18°C) in the biosafety chamber for the next 6 hours and immediately taken for spectroscopic measurements. The spectra were collected in the 490-2132 cm⁻¹ region, with a resolution of 1.0 cm⁻¹. The spectra collected for each cell line were as follows: 48 hours spectral measurements (PC3 = 79 spectra and PNT1a = 75 spectra), 72 hours spectral measurements (PC3 = 84 spectra and PNT1a = 76 spectra), and 96 hours spectral measurements (PC3 = 97 spectra and PNT1a = 101 spectra). The exposure time was 90 seconds. The laser power irradiating the samples ranged between 19.80 (±0.2) mW and 21.10 (±0.03) mW. The room temperature was constantly maintained at 22°C to ensure wavelength stability. Measurements were taken in the dark to avoid interference from ambient light sources.

Spectral Analysis.
Spectral data analysis was limited to the 500-1800 cm⁻¹ region. This region is advantageous in revealing the major relevant biochemical fingerprint alterations in biological samples and in circumventing baseline correction issues that could emerge from the large background contributions in the lower wavenumber regions [21].
The analysis was performed in two separate processes as follows:
(i) To examine prominent Raman band alterations in both cell lines, spectral data were coadded, averaged, baseline-corrected (linearly), and minimally denoised by Daubechies wavelet transformations, followed by spectral smoothing with the Savitzky-Golay (S-G) polynomial filter algorithm. For plotting, ORIGIN (OriginPro 9.1) software was used. The resultant 6 averaged spectra (3 averaged spectra for PC3 cells and 3 averaged spectra for PNT1a cells) were normalized to their maximum intensity. To further examine the band intensity differences in the spectral profiles, the difference spectra were computed by subtracting the normalized mean spectrum intensities of PNT1a samples from the normalized mean spectrum intensities of PC3 samples. The two-sample t-test was used to determine their statistical significance.
(ii) To determine subtle biochemical alterations, individual raw spectra were baseline corrected (linearly) and normalized to their maximum intensity. The spectral denoising preprocessing procedure was avoided to preserve all pertinent spectral features in the datasets.
The PC3 and PNT1a spectra were then combined into single matrices of size w × n, where w = number of wavenumbers and n = number of spectra, based on the stage of cell proliferation (48 hours, 72 hours, and 96 hours). The resultant stage 1 (X, 781 × 154), stage 2 (X, 781 × 160), and stage 3 (X, 781 × 198) matrices were subjected to singular value decomposition principal component analysis (SVD-PCA) [17]. With this procedure, three matrices U, S, and V were obtained, where matrix U represented the scores, S the size of the scores, and V transpose (Vᵀ) the loadings. This was done in the MATLAB 2018a scripting environment.
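The decomposition described above can be illustrated with a minimal sketch. It is written in Python with NumPy rather than MATLAB, and it uses random numbers in place of measured spectra. Two choices are assumptions rather than details taken from the text: the matrix is transposed so that each row is one spectrum (so that U carries one score vector per spectrum), and the data are mean-centered before the decomposition. The function name `svd_pca` and its `center` flag are hypothetical.

```python
import numpy as np

def svd_pca(X_wn, center=True):
    """PCA by SVD for a (wavenumbers x spectra) matrix.

    Returns per-spectrum scores, singular values, loadings (one row per PC,
    one column per wavenumber), and the fraction of variance per PC.
    """
    A = np.asarray(X_wn, dtype=float).T            # rows = spectra (assumption)
    if center:                                     # mean-centering is an assumption
        A = A - A.mean(axis=0, keepdims=True)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    scores = U * s                                 # scale unit-length U by singular values
    explained = s**2 / np.sum(s**2)                # share of the sum of squares per PC
    return scores, s, Vt, explained

# Synthetic stand-in with the stage 1 shape (781 wavenumbers x 154 spectra).
rng = np.random.default_rng(0)
X = rng.random((781, 154))
X = X / X.max(axis=0, keepdims=True)               # normalize each spectrum to its maximum
scores, s, loadings, explained = svd_pca(X)
print(scores.shape, loadings.shape)
```

With `center=False` the explained fractions refer to the total sum of squares rather than to variance about the mean spectrum, which changes how much weight the first component receives.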
The primary estimation of the number of probable significant PCs to be retained was done by two methods as follows: firstly, by examining the cumulative fraction (or percentage) of the data and/or eigenvalues explained as a function of the number of principal components (scree plots), and secondly, by examining the canonical loadings plots of the principal components based on the canonical parameters.
A methodology which categorizes PCs as either low, intermediate, or high-order PC based on the size of variance of PCs and cumulative percentage [24] was adopted, as follows: (i) Low-order PCs: <99% of the cumulative variance and >1.0 (average) eigenvalue. (ii) Intermediate-order PCs: between 99% and 99.5% of the cumulative variance. (iii) High-order PCs: >99.5% of the cumulative variance and <1.0 (average) eigenvalue.
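As a rough illustration of the categorization above, the following Python sketch labels each PC using only the cumulative-variance thresholds (99% and 99.5%). The average-eigenvalue criterion is deliberately left out, so the labels it produces may differ from those obtained when both criteria are applied. The function name `categorize_pcs` and the example percentages are hypothetical.

```python
from itertools import accumulate

def categorize_pcs(explained_pct, low_cut=99.0, high_cut=99.5):
    """Label PCs by the cumulative percentage of variance reached at each PC.

    Simplification: only the cumulative-variance thresholds are used; the
    average-eigenvalue criterion is not implemented here.
    """
    labels = []
    for cumulative in accumulate(explained_pct):
        if cumulative < low_cut:
            labels.append("low")
        elif cumulative <= high_cut:
            labels.append("intermediate")
        else:
            labels.append("high")
    return labels

# Hypothetical per-PC variance percentages (not measured values).
example_pct = [98.85, 0.28, 0.20, 0.15, 0.30, 0.22]
labels = categorize_pcs(example_pct)
print(list(zip(range(1, len(labels) + 1), labels)))
```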
The t-test and effect size statistical criteria [25] were included in determining the utility (relevance) of PCs for chemometric analysis. These calculations were done in Microsoft Excel (2010). Two-sample t-tests were employed to compare the PC scores between the PC3 and PNT1a cells. The determined t-test values were utilized in calculating Cohen's d values for each PC score [26], where d = Cohen's d effect size, x̄ = mean of the diseased or normal condition, s = standard deviation, n = number of samples, t = diseased condition, and c = control condition. It should be noted that Cohen's d effect sizes were classified as small (d = 0.2), medium (d = 0.5), and large (d ≥ 0.8) [25]. The resultant p values were adjusted using the Holm-Bonferroni method to maintain a family-wise alpha of 0.05 for tests on all variables [27]. Pearson correlation coefficients (r) were computed to assess the degree of linear relationship between principal components, where effect sizes were classified as small (r = 0.2), medium (r = 0.5), and large (r = 0.8) [25].
The low-variance signals (weak Raman band alterations) were extracted from the intermediate- and high-order PC loadings with the help of the peak finding function in Omnic® software from Thermo Scientific. Two parameters, i.e., threshold and sensitivity levels, were adjusted, where the threshold value specified the Y value above which peaks could be found, and the sensitivity value took into account the relative size of adjacent spectral features. The sensitivity level was set at a low value (30%) to eliminate noise and other unimportant features above the threshold value.
Figures 1(a) and 1(b) show the highlighted optical photomicrographs of unstained PC3 and PNT1a monolayer cells grown on calcium fluoride (CaF₂) substrates, respectively. As expected, the PC3 and PNT1a cells are typically adherent cells that grow attached to a substrate in discrete patches, usually with more regular dimensions (ideally polygonal shapes).
Here, it is noted that the attached monolayer cells have epithelial morphological features, with the nucleus (i), cytoplasm (ii), and cell wall (iii) locations discernible in both images.
The biochemical assignments of peaks were done in accordance with the Raman spectroscopy of tissues, body fluids, or biomolecules, as highlighted in the literature. This was done in consideration of the position and possible wavenumber differences of each particular Raman band. Theoretically, spectral resolution in a dispersive Raman spectrometer is determined by many factors, which include the spectrometer focal length, diffraction grating, laser wavelength, and the detector [28]. However, the effect of each factor on spectral resolution is considered under the assumption that all other factors remain unchanged, which may include changes in slit width, grating line spacing, and effects of small grating movements. In addition, the CCD detector (dispersive Raman) is usually sensitive to changes in room temperature, owing to the thermal expansion of various elements including the diffraction grating, the detector, and all the other optics. In view of possible detection conditions and experimental errors, Raman peaks with wavenumber differences within 10 cm⁻¹ were considered as the same peak. This choice was further motivated by the work of McCreery [28], who suggested that most analytical Raman applications involve liquids and solids in which Raman bandwidths are significantly greater than those in the gas phase; hence, the narrowest linewidths encountered in most liquid and solid samples lie in the range of 3 to 10 cm⁻¹.
Table 1. Figure 2(b) shows the stacked mean Raman spectra that explain biochemical alterations occurring during all stages of cell proliferation, for both cell lines. It should be noted that the spectra have been linearly offset for comparison purposes. The spectra demonstrate a similar spectral pattern, although there are minor Raman shifts. The figure reveals the mean positions (and the standard errors) of major Raman band contributions for the two cell lines, whose biochemical assignments are well explained in the literature [9, 13, 16, 29, 30].

Determination of Prominent Biochemical Alterations in PC3 and PNT1a Proliferation
The major bands include 519 ± 1.30 cm⁻¹ (phosphatidylinositol), 719 ± 3.78 cm⁻¹ (phospholipids and nucleic acids), 857 ± 1.84 cm⁻¹ (proline and tyrosine proteins), 1003 ± 1.49 cm⁻¹ (phenylalanine), 1083 ± 1.94 cm⁻¹ (C-N stretch of proteins and lipids and nucleic acids), 1253 ± 1.87 cm⁻¹ (nucleic acids, lipids, and proteins), 1449 ± 1.35 cm⁻¹ (C-H vibration of proteins and lipids and nucleic acids), and 1659 ± 1.33 cm⁻¹ (amide I), with the strongest Raman bands occurring at 519 ± 1.30 cm⁻¹, 1253 ± 1.87 cm⁻¹, 1449 ± 1.35 cm⁻¹, and 1659 ± 1.33 cm⁻¹. Figures 3(a)-3(c) show the difference spectra between the normalized mean spectra of PC3 and PNT1a cells at 48, 72, and 96 hours, respectively. The positive bands explain alterations present in greater concentration in PC3 cells, while negative bands explain alterations abundant in PNT1a cells. Generally, the difference spectra show that PC3 cells exhibit prominent biochemical alterations at 566 ± 0.70 cm⁻¹ (cytosine and guanine), 630 cm⁻¹ (glycerol), 972 ± 1.17 cm⁻¹ (cytosine and proteins), 1186 cm⁻¹ (guanine, cytosine, adenine, and antisymmetric phosphate vibrations), 1520 ± 1.41 cm⁻¹ (cytosine, C-C stretch, and C=C stretch mode (β-carotene accumulation)), and 1743 cm⁻¹ (ester groups) [9, 16, 29-31]. Similarly, PNT1a samples have prominent biochemical alterations at 550 ± 0.23 cm⁻¹ (cytosine, guanine, tryptophan, and glycerol), 719 ± 1.31 cm⁻¹ (phospholipids and nucleic acids), 852 ± 0.47 cm⁻¹ (proline, tyrosine, and polysaccharides), 948 ± 1.88 cm⁻¹ (valine and proline), 1250 ± 2.86 cm⁻¹ (amide III, lipids, adenine, and cytosine), 1332 ± 1.64 cm⁻¹ (nucleic acids and CH₃CH₂ wagging of collagen), 1450 ± 2.20 cm⁻¹ (lipids and proteins), and 1660 cm⁻¹ (amide I and lipids) [13, 30, 31]. In the stage 2 and stage 3 spectral datasets, the PNT1a spectra have prominent band alterations at 623 cm⁻¹ (phenylalanine and adenine), 664 cm⁻¹ (guanine, thymine, and collagen), 898 ± 0.20 cm⁻¹ (proline and saccharides), 1066 cm⁻¹ (proline of collagen), 1152 ± 1.44 cm⁻¹ (proteins and carotenoids), 1370 ± 0.86 cm⁻¹ (saccharides), 1573 cm⁻¹ (guanine, adenine, and tryptophan proteins), 1618 ± 1.73 cm⁻¹ (tryptophan), and 1675 cm⁻¹ (amide I (β-sheet)) [30, 31]. Furthermore, the spectral marker at 576 cm⁻¹ (phosphatidylinositol) was only detected during the first two stages of PNT1a cell proliferation and the late proliferation cycle (stage 3) of malignant (PC3) cells. This suggests the presence of enhanced lipid alterations during advanced malignancy. However, to evaluate the utility of these bands in prostate cancer diagnosis, the statistical p values were calculated using Student's t-test for each band. The two-sample t-test on averaged intensities around these bands showed they were statistically significant (p < 0.05), with the exception of the 576 cm⁻¹ band, meaning the biochemical changes at the 576 cm⁻¹ band could not be used to discriminate between the control (PNT1a) and prostate cancer (PC3) cell groups.
The peak intensity ratios of Raman spectra measurements have been previously utilized in classifying diseased and healthy samples [32]. A similar analysis was performed on the prominent difference bands observed in Figures 3(a)-3(c), where ratio values (i.e., I_C/I_N) were calculated by dividing the normalized intensities of PC3 spectra (I_C) by the normalized intensities of PNT1a spectra (I_N). Table 2 highlights the bands (566 ± 0.70 cm⁻¹, 630 cm⁻¹, 1370 ± 0.86 cm⁻¹, and 1618 ± 1.73 cm⁻¹) whose band intensity ratios were found to increase or decrease with the stages of cell proliferation.
The band intensity ratio values at 566 ± 0.70 cm⁻¹ and 630 cm⁻¹ were found to increase with the stages of cell proliferation, while the band intensity ratio values at 1370 ± 0.86 cm⁻¹ and 1618 ± 1.73 cm⁻¹ were found to decrease with the stages of cell proliferation. The increasing peak ratios at the 566 ± 0.70 cm⁻¹ band suggest that biochemical changes due to nucleic acid bases (cytosine and guanine) increased with malignancy. This correlates with other closely related prostatic studies [33-35], which have suggested that spectral intensities of DNA-related bands in prostate samples increase with malignancy, a factor attributed to larger nuclei in malignant cells than in normal cells and therefore a greater abundance of DNA content in malignant samples [34]. For instance, qualitative findings by Crow et al. [33] showed that nucleic acid contents in malignant prostate biopsies were higher than in benign prostatic hyperplasia (BPH) biopsies. Elsewhere, a study that investigated biochemical alterations for the different tissue pathologies within the bladder and prostate gland region observed that DNA contents increased with malignancy [35], while work by Taleb et al. [34] observed that DNA contents were higher in malignant (LNCaP) cells than in normal (PNT1a) cells. The increasing band intensity ratio values at the 630 cm⁻¹ band suggest that biochemical changes due to lipids increased with malignancy. A closely related study by Matias et al. [36] observed that adipocyte levels increased with the severity of malignancy, a factor attributed to the production of lipids via de novo lipogenesis [37]. The 1370 cm⁻¹ band is a pronounced saccharide band in biological samples, while the 1618 cm⁻¹ band is assigned to ν(C=C) vibrations due to tryptophan protein [30]. The decreasing band intensity ratio values observed at the 1370 ± 0.86 cm⁻¹ and 1618 ± 1.73 cm⁻¹ bands suggest that biochemical changes due to saccharides and tryptophan decreased with malignancy. The observed decrease in saccharide levels with malignancy is consistent with the findings of a previous qualitative cell line-based study where glycogen content was found to be lower in prostatic adenocarcinoma pathologies than in benign pathologies [33], a factor that can be attributed to enhanced glucose uptake by cells during the onset of tumor development for conversion to lactate molecules necessary for energy production during cell proliferation [38]. The decrease in tryptophan levels with malignancy is an indication of prostate malignancy undergoing increased tryptophan degradation.
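The difference spectra and band intensity ratios (I_C/I_N) used in this section can be sketched in a few lines of Python. This is only an illustration with toy numbers: baseline correction, wavelet denoising, and Savitzky-Golay smoothing are omitted, and all function names (`normalize_to_max`, `mean_spectrum`, `difference_spectrum`, `band_ratio`) are hypothetical.

```python
def normalize_to_max(spectrum):
    """Scale a spectrum so that its largest intensity equals 1."""
    peak = max(spectrum)
    return [value / peak for value in spectrum]

def mean_spectrum(spectra):
    """Average a list of equally long spectra point by point."""
    count = len(spectra)
    return [sum(column) / count for column in zip(*spectra)]

def difference_spectrum(pc3_spectra, pnt1a_spectra):
    """Normalized mean PC3 spectrum minus normalized mean PNT1a spectrum."""
    mean_c = normalize_to_max(mean_spectrum(pc3_spectra))
    mean_n = normalize_to_max(mean_spectrum(pnt1a_spectra))
    diff = [c - n for c, n in zip(mean_c, mean_n)]
    return mean_c, mean_n, diff

def band_ratio(mean_c, mean_n, index):
    """Band intensity ratio I_C / I_N at one wavenumber index."""
    return mean_c[index] / mean_n[index]

# Toy three-point spectra (two per cell line), not measured data.
pc3 = [[2.0, 4.0, 8.0], [2.0, 4.0, 8.0]]
pnt1a = [[4.0, 4.0, 8.0], [4.0, 4.0, 8.0]]
mean_c, mean_n, diff = difference_spectrum(pc3, pnt1a)
ratio_first_band = band_ratio(mean_c, mean_n, 0)
print(diff, ratio_first_band)
```

Positive entries in `diff` correspond to bands that are stronger in the PC3 mean spectrum, and negative entries to bands that are stronger in the PNT1a mean spectrum, matching the sign convention used for Figures 3(a)-3(c).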

Determination of Subtle Biochemical Alterations during PC3 and PNT1a Proliferation
Figure 4 shows the eigenvalues plotted as a function of the number of principal components in our raw datasets. As highlighted in Section 2.2.3, the raw datasets were baseline corrected by the linear method and normalized to their maximum intensity, but without denoising, to preserve all pertinent spectral features in the datasets. Table 3 shows the categorization of principal components as low-, intermediate-, or high-order principal components based on the size of variances and cumulative percentages of the total variation. It was noted that the first principal component (PC1) could be categorized as a low-order principal component. For the stage 1 datasets of both cell lines, principal components 2 to 16 were categorized as intermediate-order PCs. For the stage 2 and stage 3 datasets, the intermediate-order principal components were PCs 2 to 22 and PC 2, respectively. The remaining principal components were categorized as high-order principal components according to the criteria given in Table 3. The first two principal components (PC1 and PC2) were found to account for the largest variance of the data. Overall, the total cumulative variances accounted for by the first two principal components (PC1 and PC2) in the stage 1, stage 2, and stage 3 datasets were 99.13%, 99.02%, and 99.58%, respectively.
To determine which PC scores were potentially significant for sample discrimination, the PCs were subjected to the two-sample t-test and effect size (Cohen's d and Pearson's correlation coefficient r) statistical criteria, as described in Section 2.2.3. Table 4 shows the p values, Cohen's d, and Pearson correlation coefficients (r) computed for the first 10 principal components. Significant differences (p < 0.05) between the principal component scores were observed in PCs 1 to 5 for the stage 1 dataset, PCs 1 to 3 and PC 8 for the stage 2 dataset, and PCs 1 to 6 and PC 8 for the stage 3 dataset. Generally, the first principal component (PC1) had the largest effect size (Cohen's d) for all the datasets. The value of d was 1.22 for stage 1, 1.32 for stage 2, and 2.27 for stage 3 datasets. The respective explained total variances were 98.85%, 98.88%, and 99.19%. The utility of the observed PCs was further evaluated by plotting their canonical variable distributions (Figures 5(a), 5(c), and 5(e)). Figure 5(a) shows that PCs 2, 3, 4, and 5 had the largest standardized loadings for the stage 1 dataset. For the stage 2 dataset (Figure 5(c)), the largest standardized loading values were for PCs 2, 3, 8, and 9, while PCs 2, 4, 5, and 6 had the largest standardized loadings for the stage 3 dataset (Figure 5(e)).
These are the principal components that were used for cell discrimination.
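The effect-size screening described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of Section 2.2.3: it assumes Cohen's d is computed with a pooled standard deviation, that r is obtained from d with the equal-group-size conversion r = d / sqrt(d² + 4), and that the t statistic is the pooled-variance (Student) form. The function names and the example scores are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (assumed form)."""
    n1, n2 = len(group_a), len(group_b)
    pooled_var = ((n1 - 1) * variance(group_a) +
                  (n2 - 1) * variance(group_b)) / (n1 + n2 - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def d_to_r(d):
    """Convert d to a correlation-type effect size; valid for equal group sizes."""
    return d / math.sqrt(d * d + 4.0)


def pooled_t(group_a, group_b):
    """Student two-sample t statistic, which equals d / sqrt(1/n1 + 1/n2)."""
    scale = math.sqrt(1.0 / len(group_a) + 1.0 / len(group_b))
    return cohens_d(group_a, group_b) / scale


def effect_label(d):
    """Bin |d| with the conventional 0.2 / 0.5 / 0.8 benchmarks."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


# Made-up PC scores for two cell lines (illustration only).
pc3_scores = [3.0, 4.0, 5.0, 6.0, 7.0]
pnt1a_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
d = cohens_d(pc3_scores, pnt1a_scores)
r = d_to_r(d)
t = pooled_t(pc3_scores, pnt1a_scores)
```

Obtaining a p value from t additionally requires the t distribution with n1 + n2 - 2 degrees of freedom, which a statistics package would normally supply.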
Cell discrimination was done by examining the scatter plots of PC scores and their respective loading vectors. Each loading vector was related to the original spectrum by PC scores, which refer to the weight of that particular biochemical component in each spectrum [16]. The score plots highlighted the natural groupings of each of the principal components for the cell types. It was found that PC 2 (0.29%), PC 2 (0.42%), and PC 4 (0.39%) gave the best grouping of both cell types for stages 1, 2, and 3 spectral data, respectively. Examination of the loading vectors showed that these groupings/discriminations were attributable to the prominent biochemical differences observed in Figure 3. It was also observed that some level of clustering was present due to … These discriminations were attributed to the subtle biochemical differences between the malignant and normal cells.
As explained in Section 2.2.3, Cohen's d effect sizes were classified as small (d = 0.2), medium (d = 0.5), and large (d ≥ 0.8) [25]. Table 4 shows that PC5 (0.03%) for the stage 1 dataset had a large effect size, whereas PC8 (0.02%) for the stage 2 dataset and PC5 (0.01%) for the stage 3 dataset had medium effect sizes. Examination of both the scatter plots (Figures 5(b), 5(d), and 5(f)) and the loading spectra (Figures 6(a)-6(c)) showed that the positive bands in the loading spectrum were present in cells to which higher scores were attributed, and vice versa. The loading profiles were noted to contain useful peaks amid noisy features. A possible method of extracting useful peaks would have been to denoise or smooth the loading profiles. However, this would have introduced spectral artifacts, peak shifts, and loss of useful peaks. Therefore, the peaks were extracted in their raw form by adjusting the threshold and sensitivity parameters of the peak-finding function in Omnic® software from Thermo Scientific. It should be noted that the sensitivity levels were kept at low values (30%) to eliminate noisy peaks, and only the largest loading vectors that explained the score distributions in Figures 5(b), 5(d), and 5(f) were extracted and used for further analysis. Detailed information concerning the loading vectors (which in this case are weak Raman bands) that explained the natural groupings of PNT1a and PC3 scores using the intermediate- and high-order PC5 (0.03%) for stage 1, PC8 (0.02%) for stage 2, and PC5 (0.01%) for stage 3 datasets is shown in Table 5.
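The idea of extracting only the largest peaks from an unsmoothed loading profile can be illustrated with a simple threshold-based local-maximum search. This sketch is a generic stand-in and does not reproduce the Omnic peak-finding algorithm or its sensitivity setting. The function name `pick_peaks`, the `window` parameter, and the example values are assumptions made for illustration.

```python
def pick_peaks(wavenumbers, loadings, threshold, window=1):
    """Return (wavenumber, loading) pairs that are local maxima above threshold.

    A point counts as a peak when its loading is at least `threshold` and
    strictly greater than every neighbour within `window` points on each side.
    No smoothing is applied, so the raw profile is left untouched.
    """
    peaks = []
    n = len(loadings)
    for i in range(n):
        value = loadings[i]
        if value < threshold:
            continue
        lo = max(0, i - window)
        hi = min(n, i + window + 1)
        neighbours = loadings[lo:i] + loadings[i + 1:hi]
        if neighbours and all(value > v for v in neighbours):
            peaks.append((wavenumbers[i], value))
    return peaks


# Made-up loading profile (illustration only, not data from this study).
example_wavenumbers = [1070, 1072, 1074, 1076, 1078, 1080, 1082]
example_loadings = [0.01, 0.02, 0.05, 0.12, 0.04, 0.06, 0.03]
strong_peaks = pick_peaks(example_wavenumbers, example_loadings, threshold=0.10)
```

Raising `threshold` or widening `window` keeps fewer, more dominant peaks. Negative-going bands could be found in the same way by negating the loadings before the search.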
It was observed (Table 5) … 697, 1319, 1377, and 1577 cm⁻¹) had the most influence on the assignment of scores to the normal class (PNT1a) for stages 1, 2, and 3 measurements, respectively. Note that the loading vectors at (1232, 1330, and 1442 cm⁻¹), (1471 cm⁻¹), and (1076 and 1278 cm⁻¹) were common to the two cell lines in stages 1, 2, and 3 datasets, respectively. The two-sample t-test on the averaged normalized intensities at these bands indicated that the differences were statistically significant (p < 0.05).
To better understand these differences, band intensity ratios (I_C/I_N) were determined at all observed loading vectors (weak Raman bands) by dividing the normalized intensities of the diseased (PC3 cells) Raman spectra (I_C) by the respective normalized intensities of the control (PNT1a cells) Raman spectra (I_N). The band intensity ratio values at the 1076 cm⁻¹ and 1232 cm⁻¹ bands were found to increase with the stage of cell proliferation (Table 6). The 1076 cm⁻¹ band is associated with symmetric phosphate stretching modes, which normally originate from the phosphodiester groups in nucleic acids, and is thought to suggest an increase in nucleic acids in malignant tissues [30]. The alterations around the 1232 cm⁻¹ band indicate antisymmetric phosphate stretching vibrations due to nucleic acids and amide III components [39]. Therefore, the I_C/I_N ratios at the 1076 cm⁻¹ and 1232 cm⁻¹ bands suggested that prostate malignancy could be associated with an increase in the relative amounts of nucleic acids and amide III components. The rest of the weak Raman bands (Table 5) can be mainly attributed to subtle nucleic acid and protein alterations, and their specific assignments are explained elsewhere [9,30,31]. The use of high-order principal components is still underexplored as a chemometric tool for the analysis of prostatic tissues. In fact, Raman spectroscopy-based studies for the diagnosis of prostate cancer favor the use of the first few principal components (PCs) that account for large proportions of variance [12,15,16,36,40,41].
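The band intensity ratio can be sketched as follows. This is a minimal illustration that assumes each spectrum is first normalized to its maximum intensity, that the normalized spectra of each cell line are then averaged, and that the band is read at the sampled wavenumber nearest to the requested position. The helper names and the example spectra are hypothetical.

```python
def normalize_to_max(spectrum):
    """Scale a spectrum so that its maximum intensity equals 1."""
    peak = max(spectrum)
    return [value / peak for value in spectrum]


def mean_spectrum(spectra):
    """Point-by-point mean of several spectra sampled on the same axis."""
    count = len(spectra)
    return [sum(column) / count for column in zip(*spectra)]


def band_ratio(wavenumbers, cancer_spectra, normal_spectra, band):
    """Return I_C / I_N at the sampled wavenumber closest to `band`."""
    idx = min(range(len(wavenumbers)),
              key=lambda i: abs(wavenumbers[i] - band))
    i_c = mean_spectrum([normalize_to_max(s) for s in cancer_spectra])[idx]
    i_n = mean_spectrum([normalize_to_max(s) for s in normal_spectra])[idx]
    return i_c / i_n


# Made-up three-point spectra (illustration only, not data from this study).
axis = [1000, 1076, 1232]
pc3_spectra = [[10.0, 6.0, 4.0], [20.0, 12.0, 8.0]]
pnt1a_spectra = [[10.0, 3.0, 4.0], [10.0, 3.0, 4.0]]
ratio_1076 = band_ratio(axis, pc3_spectra, pnt1a_spectra, band=1076)
```

A ratio above 1 at a band indicates a higher relative intensity in the PC3 spectra, and tracking the ratio across the three growth stages gives the trend discussed in the text.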
A previous review on applications of Raman spectroscopy for prostate cancer [3] showed that the majority of prostate-based Raman studies report bands around 1000 cm⁻¹ (phenylalanine ring breathing), 1200 to 1350 cm⁻¹ (overlapping of the amide III, CH₂ "wagging," and CN "stretch" peaks present mainly in lipids and nucleic acids), 1450 cm⁻¹ (CH₂ deformation vibration present in nucleic acids, proteins, and lipids), and in the region of 1600 to 1700 cm⁻¹ (C═O stretch and amide I, present in nucleic acids and proteins). To the best of the authors' knowledge, spectroscopic prostate studies based on the utility of intermediate- and high-order principal components, which account for negligible variance (assumed noise), have not been previously reported. Our findings have shown that it is possible to discriminate prostatic cells using weak band alterations determined with the aid of intermediate- and high-order principal components. Interestingly, earlier studies on prostate cancer cells by Corsetti et al. [16] and Crow et al. [40] reported prominent bands around (880, 1184, 1588, and 1614 cm⁻¹) and (1094, 1096, 1125, and 1576 cm⁻¹), respectively. Similarly, the 1328 cm⁻¹ and 643 cm⁻¹ bands were observed as prominent bands in works by Matias et al. [36] and Patel and Martin-Hirsch [41], respectively.
Figure 6: Principal component loadings that explain the natural grouping of PNT1a and PC3 scores due to (a) PC5 (0.03%) for stage 1, (b) PC8 (0.02%) for stage 2, and (c) PC5 (0.01%) for stage 3 datasets. The loading profiles contain useful peaks amid noisy features. The useful peaks (weak Raman bands) were extracted from the raw loading profiles by adjusting the threshold and sensitivity parameters of the peak-finding function in Omnic® software from Thermo Scientific. Sensitivity levels were set at a low value (30%) so that noise and other unimportant features above the threshold value were eliminated.
In our study, intermediate- and high-order principal components demonstrated that the weak bands detected at 871 cm⁻¹, 1094 cm⁻¹, 1122 cm⁻¹, 1123 cm⁻¹, 1191 cm⁻¹, 1330 cm⁻¹, 1333 cm⁻¹, 1586 cm⁻¹, and 1614 cm⁻¹ (Table 5) were relevant for the grouping of normal and diseased prostatic cells. However, these bands were subtle in intensity and could not be detected in the spectral profiles, as shown in Figures 2(b) and 3(a)-3(c). This apparent discrepancy in Raman band intensities between their results and our findings is most likely due to the different experimental conditions employed. In this study, the observed subtle bands can be explained by experimental conditions that could have limited the detection of optimum Raman signals, e.g., the scattering efficiency at the laser wavelength (and hence the laser power irradiating the sample) and the large background due to scattering from the sample itself, the substrate, and the microscope objective [21]. Generally, weak Raman signals that are potentially useful for disease diagnostics may be swamped by large background effects. Consequently, useful Raman bands may not be optimally detected in all related experiments. It should be noted that the works by Corsetti et al. [16], Crow et al. [40], Matias et al. [36], and Patel and Martin-Hirsch [41] were based on 785 nm, 830 nm, 830 nm, and 785 nm excitation lasers with laser powers of 135 mW, 300 mW, 80 mW, and 100 mW, respectively. It is therefore not entirely clear whether the laser power played a major role in the detection of optimum Raman signals in their experiments.
Our results showed that the weak bands in Table 5 were not spectrally visible in Figures 2(b) and 3(a)-3(c), although they were statistically significant (p < 0.05). However, as seen in Figures 5(b), 5(d), and 5(f), the score plots demonstrated a reasonable level of score discrimination, strengthening the view that there were inherent subtle molecular differences between the two cell lines. These results are encouraging and should motivate the use and deeper investigation of alternative multivariate analysis methods for detecting weak Raman band alterations (e.g., higher-order principal components), particularly when performing Raman experiments under limited experimental conditions.
Table 5: The Raman bands (loading vectors) that explain natural groupings of PNT1a and PC3 scores using PC 5 (0.03%) for stage 1, PC 8 (0.02%) for stage 2, and PC 5 (0.01%) for stage 3 datasets.

Conclusion
In this study, the utility of Raman spectroscopy in interrogating subtle molecular alterations in cultured PC3 and PNT1a cells in vitro has been demonstrated. The findings from this study suggest that Raman spectroscopy combined with principal component analysis (PCA) has potential applications as a research tool in prostate cancer diagnostics. Raman difference spectra, PCA, and Raman band intensity ratios were used to identify prominent and subtle biochemical alterations associated with prostate malignancy.
The 566 ± 0.70 cm⁻¹, 630 cm⁻¹, 1370 ± 0.86 cm⁻¹, and 1618 ± 1.73 cm⁻¹ bands were identified from the difference spectra and were associated with prominent biochemical differences between diseased (PC3) and healthy (PNT1a) cells. The band intensity ratios at 566 ± 0.70 cm⁻¹ and 630 cm⁻¹ were found to increase with the stage of cell proliferation. This suggested that prostate malignancy can be linked to an increase in the relative amounts of nucleic acids and lipids, respectively. In contrast, decreasing band intensity ratios at 1370 ± 0.86 cm⁻¹ and 1618 ± 1.73 cm⁻¹ suggested that prostate malignancy can be associated with a decrease in the relative amounts of saccharides and tryptophan, respectively, as the cells proliferate. PCA was used to detect subtle biochemical alterations through analysis of intermediate-order and high-order principal components (PCs). Subtle bands around 1076 cm⁻¹, (1232, 1234 cm⁻¹), (1276, 1278 cm⁻¹), (1330, 1333 cm⁻¹), (1434, 1442 cm⁻¹), and (1471, 1479 cm⁻¹) were observed during proliferation of both cell lines. The band intensity ratios at 1076 cm⁻¹ and 1232 cm⁻¹ were found to increase with the stage of cell proliferation, suggesting that prostate malignancy can be associated with an increase in the relative amounts of nucleic acids and amide III components, respectively. The results of this study show that subtle biochemical alterations can be detected and extracted from the Raman spectra of prostate cell lines. There is a need for further exploration and quantification of such biochemical alterations, especially at the subcellular level.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.