Reprogrammed Cells Display Distinct Proteomic Signatures Associated with Colony Morphology Variability

Human induced pluripotent stem cells (hiPSCs) are of high interest because they can be differentiated into a vast range of different cell types. Ideally, reprogrammed cells should sustain long-term culturing in an undifferentiated state. However, some reprogrammed cell lines represent an unstable state by spontaneously differentiating and changing their cellular phenotype and colony morphology. This phenomenon is not fully understood, and no method is available to predict it reliably. In this study, we analyzed and compared the proteome landscape of 20 reprogrammed cell lines classified as stable and unstable based on long-term colony morphology. We identified distinct proteomic signatures associated with stable colony morphology and with unstable colony morphology, although the typical pluripotency markers (POU5F1, SOX2) were present with both morphologies. Notably, epithelial to mesenchymal transition (EMT) protein markers were associated with unstable colony morphology, and the transforming growth factor beta (TGFB) signalling pathway was predicted as one of the main regulator pathways involved in this process. Furthermore, we identified specific proteins that separated the stable from the unstable state. Finally, we assessed both spontaneous embryonic body (EB) formation and directed differentiation and showed that reprogrammed lines with an unstable colony morphology had reduced differentiation capacity. To conclude, we found that different defined patterns of colony morphology in reprogrammed cells were associated with distinct proteomic profiles and different outcomes in differentiation capacity.


Introduction
Human pluripotent stem cells (hPSCs) such as induced pluripotent stem cells (iPSCs) and embryonic stem cells (ESCs) have the potential to be differentiated into a whole range of different cell types and are, therefore, of high interest for both researchers and clinicians. Reprogramming of somatic cells to generate hiPSCs has rapidly gained popularity as it enables the use of patient-specific cells.
Maintaining cells in a pluripotent state in vitro requires routine monitoring during expansion. A typical characterization pipeline to ensure pluripotency includes expression of singular pluripotency markers (SOX2 and POU5F1), karyotype analysis, and the ability to form the three germ layers using teratoma assays or embryoid body formation [1]. Despite these quality controls, numerous studies have shown major line-to-line variations [2][3][4][5]. To improve the utility of hPSCs in regenerative medicine and to ensure high-quality clinical-grade cell products, we need a pipeline of robust quality control methods that can be automated to benchmark the cells and filter out reprogrammed cells of inferior quality.
Besides teratoma formation, the colony morphology of reprogrammed cells is considered an important assessment criterion of pluripotency [6][7][8][9][10]. In several studies, the capacity to form teratomas and stable culturing has been correlated to colony morphology [6,11,12,13], thus linking colony morphology to the functionality of the hiPSCs. However, during long-term culturing, the colony morphology has been observed to take essentially two forms: stable and unstable. Typically, a reprogrammed cell line with a stable colony morphology exhibits compact colonies, usually round, with distinct borders and well-defined sharp edges and is associated with a pluripotent state [14]. A reprogrammed cell line with an unstable colony morphology exhibits irregular colony morphology and is associated with spontaneous differentiation [9]. Although colony morphology is an important indicator of pluripotency, it suffers from subjective evaluation and a lack of well-established quantitative metrics. In recent years, several groups have established metrics of colony morphology based on image acquisition to probe for loss of pluripotency [8,15]. However, this requires sophisticated microscopy methods and only takes into account the physical characteristics of the cells and colonies.
Proteomics provides an excellent tool for large-scale quantification and benchmarking of cells and an opportunity to further improve the characterization of colony morphology of reprogrammed cells. Compared to other omics approaches (transcriptomics and genomics), proteomics measures the translated proteins as opposed to molecules that potentially can become the proteins [16]. The proteome is dynamic and changes rapidly. In this study, we hypothesized that the proteome of reprogrammed cell lines showing stable colony morphology would differ from that of reprogrammed cell lines showing unstable colony morphology. Subsequently, we aimed to use proteomics to obtain insight into the molecular landscape associated with different colony morphology groups and corresponding variable differentiation potential.

Materials and Methods
2.1. Cell Source. We reprogrammed fibroblasts taken from seven donors. All patients gave written informed consent. The reported experiments were approved by the Regional Committee of Medical and Health Research Ethics (REK 2010/2295). All methods were performed according to the Declaration of Helsinki. A total of 20 reprogrammed cell lines were generated. From donor 1, we generated the following reprogrammed cell lines: 1-A, 1-B, and 1-C. Furthermore, cell lines 2-A, 2-B, and 2-C are derived from donor 2; cell lines 3-A, 3-B, 3-C, and 3-D are derived from donor 3; cell lines 4-A, 4-B, and 4-C are derived from donor 4; cell lines 5-A and 5-B are derived from donor 5; cell lines 6-A, 6-B, and 6-C are derived from donor 6; and cell lines 7-A and 7-B are derived from donor 7. Additional information can be found in supplementary table 1.

Sendai Reprogramming.
Reprogramming of donors 5-7 was performed by Sendai reprogramming and carried out by the company Takara Bio Inc. using a CytoTune-iPS 2.0 Sendai Reprogramming Kit (cat# A16517, Life Technologies). Clearance of the Sendai virus was tested by Q-PCR using a TaqMan assay for Sendai virus. The Sendai virus level was under the detection limit (CT ≥ 36) for all the generated clones. Colonies were picked 3-4 weeks post transduction and expanded in a Cellartis Feeder-Free DEF-CS Culture System (cat# Y30017, Takara). All hiPSC lines were tested negative for mycoplasma.

Maintenance of the Reprogrammed Cells.
The reprogrammed cell lines were cultured in 6-well plates (cat# 83.3920, Sarstedt), coated with Matrigel (cat# 354230, Corning). The cells were maintained in mTeSR™1 media, and media were changed every day. Once the dish was confluent, just before colonies were in contact with each other, the cells were split by using a Gentle Cell Dissociation Reagent (cat# 7174, STEMCELL Technologies) by following the instructions provided by the supplier. In brief, 1 mL Gentle Cell Dissociation Reagent was added to a well in a 6-well plate and incubated (37°C) for 5 min, followed by replacing the Gentle Cell Dissociation Reagent with 1 mL prewarmed mTeSR™1 media and subsequently disrupting the colonies by gently scraping the surface with a cell scraper. The cells were split at a ratio between 1 : 6 and 1 : 10 depending on the growth rate of the line and further cultivated until confluency was reached again.
2.5. SSEA4+ Enrichment. All reprogrammed cell lines were enriched for stage-specific embryonic antigen 4 (SSEA-4)-positive cells by using magnetic cell isolation with MicroBeads (cat# 130097855, Miltenyi Biotec) following the guidelines provided by the supplier.
2.6. Classification of Reprogrammed Cell Lines. The reprogrammed cell lines were qualitatively evaluated by using a phase-contrast microscope and manually assigned to one of the three morphology groups (stable colony, unstable class 1, and unstable class 2). Representative lines for each colony morphology group were imaged using a Nikon TE2000 with a 10x objective. Immunocytochemistry analysis was performed on a representative line for each colony morphology group. Cells were cultured on glass coverslips and fixed in 2% PFA for 15 min. The immunofluorescence protocol was performed following the guidelines provided by the suppliers. The following primary antibodies were used: mouse anti-human α-tubulin (1/100, cat# T5168, Sigma) and rabbit anti-human β-tubulin (1/500, cat# ab32572, Abcam). The following secondary antibodies were used: donkey anti-rabbit A647 (1/500, Molecular Probes) and donkey anti-mouse A594 (1/500, Molecular Probes). The nuclei were stained with DAPI (cat# D1306, Molecular Probes). The samples were mounted in ProLong Diamond Antifade Mountant Media (cat# P36970, Life Technologies). The expression of β-tubulin and α-tubulin was analyzed by using a Leica TCS SP5 confocal microscope with a 40x objective. No specific feature of the original data was obscured, eliminated, or misrepresented.

Embryonic Body Formation.
Embryonic bodies were generated by following the instructions of the AggreWell™800 Starter Kit (cat# 34850, STEMCELL Technologies). Briefly, cells were harvested with the Gentle Cell Dissociation Reagent and 1.2 million cells were plated in the AggreWell™800 plates and incubated for 24 hours. The generation of embryonic bodies was facilitated by culturing the embryonic bodies in Primate ES Cell Media (cat# 258RCHEMD001, Tebu Bio), the first 10 days in suspension plates (cat# 83.3920.500, Sarstedt) followed by 14 days in 6-well plates (cat# 83.3920, Sarstedt), coated with Matrigel (cat# 354230, Corning). The embryonic bodies were stained for beta-III tubulin (TUJ1), smooth muscle actin (SMA), and alpha-fetoprotein (AFP) by following the instructions of the 3-germ layer immunocytochemistry kit (cat# A25538, Thermo). The expression of TUJ1 (1/500), SMA (1/100), and AFP (1/500) was analyzed by using a Leica TCS SP2 microscope with a 40x objective, a Leica TCS SP5 confocal with a 40x objective, or a Leica TCS SP8 STED 3X confocal microscope with a 100x objective.
2.9. Flow Cytometry Analysis. Cells were washed in Ca/Mg-free PBS and incubated with TrypLE™ Select (cat# 12563011, Thermo Fisher Scientific) for 5 minutes in the incubator. The cell suspension was washed in Ca/Mg-free PBS and then centrifuged at 500 g for 4 minutes. The pellet was resuspended in Ca/Mg-free PBS and incubated with the LIVE/DEAD Fixable Dead Cell Near-IR Fluorescent Dye (cat# L10119, Invitrogen), according to the manufacturer's instructions. Next, the cells were fixed and permeabilized with the Fix/Perm Solution Kit (cat# 554714, BD Biosciences) according to the manufacturer's instructions. Cells were then stained with antibodies and washed. For CD9 analysis (surface marker), cells were stained with the antibody before fixation. A titration curve was previously done to determine the volume of antibody to add per tube of 10^6 cells: 1.5 μL of AF488-POU5F1 antibody (cat# BD560253, BD Biosciences), 1 μL of APC-SOX17 antibody (cat# IC1924A, R&D), and 0.2 μL of APC-CD9 antibody (cat# BD341648, BD Biosciences) and the same amount of isotype control antibodies (cat# BD55772, BD Biosciences; cat# IC108A, R&D; cat# IC003R, R&D). Data were analyzed with FlowJo 10. Dead cells, debris, and doublets were excluded, and after compensation, gating was determined on FL1/FL4 dot plots using Fluorescence Minus One (FMO) controls. Unstained cells and isotype controls were run separately.
Assay Kit (cat# 232225, Thermo Fisher Scientific), and volume corresponding to 25 μg protein was further reduced in 0.1 M DTT and processed into peptides using filter-aided sample preparation [18]. Prior to usage, all filters were checked with a simple centrifugation step [19] in order to exclude nonretaining protein membrane filters. The peptide solutions were desalted with Oasis HLB 96-well μElution plate (cat# 186001828BA, Waters) using 0.1% formic acid (FA) and 80% acetonitrile (ACN)/0.1% FA as binding and elution buffers, respectively.
Eluted peptides were vacuum dried and dissolved in 2% ACN, 1% FA prior to LC-MS analysis.
The eluting peptides from the LC-column were ionized in the electrospray and analyzed by the Q-Exactive HF. The mass spectrometer was operated in the DDA mode (data-dependent acquisition) to automatically switch between full-scan MS and MS/MS acquisition. Instrument control was through Q Exactive HF Tune 2.4 and Xcalibur 3.0. MS spectra were acquired in the scan range 375-1500 m/z with resolution R = 120,000 at m/z 200, automatic gain control (AGC) target of 3e6, and a maximum injection time (IT) of 100 ms. The 15 most intense eluting peptides above intensity threshold 50 000 counts and charge states 2 to 5 were sequentially isolated to a target value (AGC) of 1e5 and a maximum IT of 100 ms in the C-trap, and isolation width maintained at 1.6 m/z (offset of 0.3 m/z), before fragmentation in the HCD (Higher-Energy Collision Dissociation) cell. Fragmentation was performed with a normalized collision energy (NCE) of 28%, and fragments were detected in the Orbitrap at a resolution of 15 000 at m/z 200, with first mass fixed at m/z 100. One MS/MS spectrum of a precursor mass was allowed before dynamic exclusion for 20 s with "exclude isotopes" on. Lock-mass internal calibration (m/z 445.12003) was used. Furthermore, for spray and ion-source parameters, the ion spray voltage was at 1800 V, no sheath and auxiliary gas flow was used, and the capillary temperature was at 260°C.
[20] using the default parameters with the following exceptions: label-free quantification was set to LFQ, minimum peptide length was set to 6 amino acids, and the match-between-runs option was enabled. The cellular protein levels were relatively quantified using the MaxLFQ algorithm [21], and these intracellular levels are presented as the relative LFQ intensity defined as the normalized relative protein abundance compared across the MS runs.
2.11. Postprocessing of the Proteomic Data. MaxQuant normalized expression data (LFQ intensities) were log2-transformed. Reverse hits and contaminants were removed. All samples had missing values, which is common for low-abundance proteins; however, to avoid too many missing values, we only considered proteins with expression values in at least 14/20 samples. For every protein, the fold change (FC) between stable and unstable was evaluated by subtracting the medians of the respective log-transformed intensities. Next, we used Z-statistics [22] to evaluate the significance of the FC (referred to as FC significance), and FC p values < 0.05 were considered significant. The Perseus software (v.1.6.2.3) was used to analyze and visualize the data [23]. Principal component analysis (PCA) was performed in Perseus and used to compare the reprogrammed lines using the protein abundance. Missing values are incompatible with this approach; therefore, we filtered the protein abundance matrix to only contain valid values. Unsupervised hierarchical clustering was performed in Perseus on z-normalized abundance values. The parameters for clustering were average linkage and Euclidean distance as the distance measure, preprocessed with k-means.
2.12. Pathway Analysis of the Proteomic Data. Pathway analysis of the proteomic data was performed in Ingenuity Pathway Analysis (IPA) software. Proteins being more abundant in the unstable colony morphology group (n = 338) and proteins being more abundant in the stable colony morphology group (n = 276) were analyzed in IPA to find networks and upstream regulators in the two groups. We used log-transformed z-normalized abundance values. "Select identifier type" was set to "Gene symbol-human" and "Measurement annotation for observation" was set to "expr log ratio." We performed a core analysis and used the default settings except for the following: 70 molecules per network, 25 networks, and all tissues and cell lines.
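The postprocessing steps described in 2.11 can be illustrated with a minimal Python sketch. It is not the Perseus workflow used in the study. The function names, the toy intensities, and the treatment of zero intensities as missing values are assumptions made for illustration. The exact Z-statistic of [22] is not reproduced; the sketch assumes a plain z-score of each fold change against the mean and standard deviation of all fold changes, with a two-sided normal p value.

```python
import math
import statistics


def log2_or_none(intensity):
    """Log2-transform an LFQ intensity; zero or missing values stay missing (assumption)."""
    if intensity is None or intensity <= 0:
        return None
    return math.log2(intensity)


def filter_min_valid(rows, min_valid=14):
    """Keep proteins that have at least `min_valid` non-missing values."""
    return {protein: values for protein, values in rows.items()
            if sum(v is not None for v in values) >= min_valid}


def median_log2_fc(values, groups, a="stable", b="unstable"):
    """Median of log2 values in group a minus median in group b (None if a group is empty)."""
    in_a = [v for v, g in zip(values, groups) if g == a and v is not None]
    in_b = [v for v, g in zip(values, groups) if g == b and v is not None]
    if not in_a or not in_b:
        return None
    return statistics.median(in_a) - statistics.median(in_b)


def fc_z_pvalues(fold_changes):
    """Two-sided normal p value per protein from a z-score over all fold changes.

    Assumed simplification, not the method of [22]. Needs at least two
    proteins with differing fold changes.
    """
    valid = {p: fc for p, fc in fold_changes.items() if fc is not None}
    mean = statistics.mean(list(valid.values()))
    sd = statistics.stdev(list(valid.values()))
    return {p: math.erfc(abs((fc - mean) / sd) / math.sqrt(2))
            for p, fc in valid.items()}


# Toy example with four samples instead of 20 (hypothetical values).
groups = ["stable", "stable", "unstable", "unstable"]
raw = {"P1": [8.0, 8.0, 2.0, 2.0],
       "P2": [4.0, 4.0, 4.0, 0.0],
       "P3": [1.0, 0.0, 0.0, 1.0]}
logged = {p: [log2_or_none(x) for x in v] for p, v in raw.items()}
kept = filter_min_valid(logged, min_valid=3)
fold_changes = {p: median_log2_fc(v, groups) for p, v in kept.items()}
print(fold_changes)
print(fc_z_pvalues(fold_changes))
```

With the study's design, `min_valid` would be 14 and `groups` would hold 14 "stable" and 6 "unstable" labels.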

Receiver Operating Characteristic (ROC) Curves.
ROC curves were generated in GraphPad Prism by using the default settings including a confidence interval of 95% calculated by using the Wilson/Brown method.
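As an illustration of what the area under a ROC curve measures, the following Python sketch computes the AUC as the probability that a randomly chosen sample from one group has a higher value than a randomly chosen sample from the other group, with ties counted as one half. This is the standard rank-based equivalent of the AUC and is not the GraphPad Prism implementation; the function name and the example values are hypothetical, and the Wilson/Brown confidence interval is not reproduced.

```python
def roc_auc(positive_values, negative_values):
    """AUC as P(positive > negative) + 0.5 * P(positive == negative).

    `positive_values` could hold the abundance of one marker in the stable
    lines and `negative_values` its abundance in the unstable lines.
    """
    if not positive_values or not negative_values:
        raise ValueError("both groups need at least one value")
    wins = 0.0
    for pos in positive_values:
        for neg in negative_values:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_values) * len(negative_values))


# Hypothetical log2 abundances of one marker in each group.
stable_values = [25.1, 24.8, 25.4, 24.9]
unstable_values = [23.0, 24.9, 22.7]
print(roc_auc(stable_values, unstable_values))
```

An AUC near 0.5 means the marker does not separate the groups, while an AUC near 1.0 means nearly every stable line has a higher value than every unstable line.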
2.14. EMT Reversal Experiment Using Ligands. Cells were seeded in wells in a 24-well plate, each well containing a 9 mm cover slip subsequently coated with Matrigel. For the first replicate, cells were harvested prior to the experiment with the Gentle Cell Dissociation Reagent and 50,000 cells were seeded in each well. For the second replicate, cells were harvested with TrypLE™ Select and 100,000 filtered cells were seeded in each well. In both experiments, the cells were treated daily with ALX-270-445 (10, 25, and 50 μM) or A83-01 (0.2, 1, and 10 μM) or SMURF1-i (2, 10, and 25 μM) and the cover slips were collected and fixed after 7 days. The immunofluorescence protocol was performed following directions provided by the supplier, and the following antibodies were used: mouse anti-human E-cadherin (1/250, cat# ab76055, Abcam) and rabbit anti-human vimentin (1/100, cat# 5741, CST). The following secondary antibodies were used: donkey anti-rabbit A647 and donkey anti-mouse A594. The secondary antibodies were all from Molecular Probes (dilution 1/500). The nuclei were stained with DAPI. The samples were mounted in ProLong Diamond Antifade Mountant Media. The expression of E-cadherin and vimentin was analyzed by using the Andor Dragonfly 505 (Andor Technologies, Inc.) confocal microscope with a 20x dry objective (CFI Plan Apochromat Lambda 20x). The immunofluorescence was quantified using the Imaris software (v9.2.1). No specific feature of the original data was obscured, eliminated, or misrepresented.

Generation and Morphological Classification of Reprogrammed Cell Lines.
We used fibroblast cells isolated from seven donors' skin biopsies to generate 20 reprogrammed cell lines (Figure 1(a), supplementary table 1). Donor 1-4 fibroblasts were reprogrammed using episomal plasmids [17] while donor 5-7 fibroblasts were reprogrammed using the Sendai virus. Each donor yielded 2-4 reprogrammed cell lines. After reprogramming, all lines presented a typical pluripotent colony morphology. However, after subsequent enrichment of SSEA4+ cells and further culturing, four of the lines had changed their colony morphology to a state with disintegrating colonies and two of the lines had changed colony morphology to a monolayer state with completely dispersed cells, referred to in the remainder of this paper as class 1 and class 2 unstable lines, respectively (Figures 1(a) and 1(b)). The cell lines were maintained in the same culturing conditions and split when they reached 80% confluence. At around passage 13, the lines were qualitatively classified into the three colony morphology groups (stable and unstable class 1 and 2) by the use of a phase-contrast microscope (Figure 1(c)). Reprogrammed lines generated from donors 1, 2, 5, and 7 were all classified as lines showing stable colony morphology, whereas reprogrammed lines generated from donors 3, 4, and 6 included some cell lines showing unstable colony morphology and some cell lines showing stable colony morphology.

The Colony Morphology of Reprogrammed Cells Predicts Differences in Spontaneous and Directed Differentiation Capacity.
We then assessed how the variation in colony morphology of the reprogrammed cell lines affected the spontaneous and directed differentiation capacity. First, we assessed spontaneous differentiation by testing the capacity to form embryonic bodies (EB) in 14 selected lines. We used AggreWell plates for EB formation, followed by 10 days of culture in suspension plates and 14 days on Matrigel-coated plates, and subsequently analyzed the EB by immunohistochemistry using markers for ectoderm (TUJ1), endoderm (AFP), and mesoderm (SMA) (Figure 2(a)). Already at day 2 in suspension, a difference was noticeable, where EB from stable colonies stayed as individual spheres, whereas EB from unstable class 1 and class 2 formed aggregates (Figure 2(b)). After completing the 29 days of the EB formation protocol, we found, as expected, that all the reprogrammed lines with stable colony morphology were able to form all three germ layers (Figure 2(c)). In contrast, reprogrammed lines with unstable class 1 morphology and unstable class 2 morphology were only able to reliably form ectoderm. Two of the lines (lines 6-A and 3-C) could only form ectoderm. Two of the lines (lines 4-A and 4-C) could form ectoderm and mesoderm, while one of the lines (line 3-B) could form ectoderm and endoderm. Only one of the unstable class 1 lines (line 6-C) was able to form all three germ layers. An overview showing immunohistochemistry images for the lines can be found in supplementary figure 1.
Next, we investigated the directed differentiation capacity using ligands that directed the reprogrammed lines (d0 stage) towards definitive endoderm (DE stage) and furthermore to primitive gut tube (PG stage) (Figure 2(d)). One representative line from each colony morphology group was analyzed by flow cytometry (3 replicates per line) at the starting point, at the DE stage, and at the PG stage. In order to analyze the capacity to exit the pluripotent state and enter and exit the DE stage, we analyzed the cells by flow cytometry at all three time points (d0, DE, and PG) for cells expressing POU5F1 (pluripotency marker also known as OCT4) and SOX17 (essential transcription factor in the formation and maintenance of DE [24]) (Figure 2(e)). We found that the reprogrammed line with stable colony morphology had 99% ± 0.2 SD POU5F1+ cells at d0, dropping to 30% ± 11 SD at DE and 20% ± 4.8 SD at PG (Figure 2(e)). As anticipated, SOX17 was undetectable at d0 and increased to 80% ± 6.9 SD at the DE stage before dropping to 50% ± 26 SD at the PG stage. For the reprogrammed line with unstable class 1 colony morphology, we observed a similar pattern, albeit with only 83% ± 1.6 SD POU5F1+ cells at d0 and 71% ± 4.4 SD SOX17+ cells at the DE stage. Finally, for the unstable class 2 colony morphology line, we found 93% ± 2.4 SD POU5F1+ cells at d0, and the level stayed high at the DE stage (94% ± 2.4 SD) and at the PG stage (95% ± 0.7 SD) with no observable SOX17+ cells at any time point. Taken together, the unstable colony morphology was associated with impaired directed differentiation capacity and reduced capacity to form the three germ layers.

The Variable Colony Morphology Groups Show Distinctly Different Proteomic Signatures.
Global label-free proteomics of the 20 reprogrammed lines yielded 6173 quantified proteins, with an average of ~5000 quantified proteins in each sample (Figure 3(a)). Proteins expressed in at least 14/20 samples (n = 5043) were analyzed by unsupervised clustering (Figure 3(b)). We found that reprogrammed lines clustered together based on colony morphology appearance and not by reprogramming method or sex of the donor. However, it should be noted that one of the unstable lines (line 3-B) clustered within the stable colony morphology group, although the line was classified as an unstable class 1 line. Next, we looked at proteins expressed in all samples (n = 3833) and performed a PCA (Figure 3(c)). Again, we found that reprogrammed lines clustered together based on colony morphology, this time with a clear separation between samples from stable and unstable colony morphology lines. However, we noted that unstable class 1 colony morphology and unstable class 2 colony morphology samples did not separate from each other. These groups were thus merged into one common group in the remaining global proteome comparison. Next, we looked at differentially abundant proteins (n = 614) comparing the stable group (14 cell lines) to the unstable group (6 cell lines) (Figure 3(d)). Significantly differentially abundant proteins had by definition a p value < 0.05 and a fold change p value < 0.05. We identified 338 proteins being more abundant in the unstable colony morphology group (supplementary table 3), and we identified the top molecular and cellular functions associated with these proteins, as listed in Figure 3(e). Furthermore, in the unstable group, we identified proteins well known as markers for mesenchymal cells, including N-cadherin (CDH2), fibronectin (FN1), vimentin (VIM), and matrix metallopeptidase 14 (MMP14) (Figure 3(f)).
We also investigated whether we could detect protein markers for any of the three germ layers in the unstable colony morphology group. Endoderm markers (SOX17, GATA4, GATA6, and EOMES) and mesoderm markers (TBXT, FOXF1) were not identified in our data set. Ectoderm markers including nestin (NES), RNA-binding protein Musashi homolog 1 (MSI1), microtubule-associated protein 2 (MAP2), and glial fibrillary acidic protein (GFAP) were identified, among which only NES and MAP2 had a significantly higher abundance in the unstable colony group (Figure 3(f)).
Similarly, we identified 276 proteins being more abundant in the stable colony morphology group (supplementary table 2) and we identified the top molecular and cellular functions associated with these proteins (Figure 3(e)). Furthermore, we detected protein markers for pluripotency including podocalyxin-like protein 1 (PODXL), developmental pluripotency-associated 4 (DPPA4), and DNA (cytosine-5-)methyltransferase 3 beta (DNMT3B) (Figure 3(f)). We also noted a significantly higher abundance of E-cadherin (CDH1) in the stable colony morphology group. Together with the significantly higher abundance of N-cadherin (CDH2) in the unstable colony morphology group, our observations are in line with a cadherin switch (increase of CDH2 and a decrease of CDH1) previously described in EMT events [25]. Figure 3(g) shows the ranked fold changes for the individual proteins providing the signature for both morphology groups.
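The selection rule stated above (p value < 0.05 and fold change p value < 0.05, followed by a split into proteins more abundant in the stable or in the unstable group) can be written as a short Python sketch. The function name, the input layout, and the toy values are hypothetical. The sketch assumes the fold change is expressed as log2(stable) minus log2(unstable), and it takes both p values as given because the text does not specify which test produced the first one.

```python
def split_signatures(stats, alpha=0.05):
    """Split significant proteins into stable and unstable signatures.

    `stats` maps protein -> (log2_fc, p_value, fc_p_value), where log2_fc is
    stable minus unstable (assumed convention). Returns two lists ranked by
    effect size, strongest first.
    """
    stable, unstable = [], []
    for protein, (log2_fc, p_value, fc_p_value) in stats.items():
        if p_value >= alpha or fc_p_value >= alpha:
            continue  # fails at least one of the two significance criteria
        if log2_fc > 0:
            stable.append(protein)
        elif log2_fc < 0:
            unstable.append(protein)
    stable.sort(key=lambda p: stats[p][0], reverse=True)
    unstable.sort(key=lambda p: stats[p][0])
    return stable, unstable


# Hypothetical placeholder proteins and values, not data from the study.
toy_stats = {
    "PROT_A": (2.0, 0.01, 0.01),
    "PROT_B": (-3.0, 0.001, 0.02),
    "PROT_C": (1.5, 0.20, 0.01),   # fails the first criterion
    "PROT_D": (-1.0, 0.01, 0.04),
    "PROT_E": (4.0, 0.03, 0.001),
}
stable_signature, unstable_signature = split_signatures(toy_stats)
print(stable_signature)
print(unstable_signature)
```

Ranking each list by fold change mirrors the ordering used for a ranked fold-change display such as Figure 3(g).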

Common Markers for Pluripotency Did Not Vary Significantly between Reprogrammed Lines Showing Stable and Unstable Colony Morphologies.
Surprisingly, the abundance of the common pluripotency markers sex-determining region Y-box 2 (SOX2) and octamer-binding transcription factor 4 (POU5F1) was not significantly higher in the stable colony morphology group compared to the unstable colony morphology group (Figure 4(a)), with p values of 0.22 and 0.69, respectively (not shown). Furthermore, we asked if other common pluripotency markers had a higher abundance in the stable colony morphology group compared to the unstable colony morphology group, which could serve as a potential marker for stable colony morphology. Based on well-known pluripotency markers previously published [26,27], we identified 33 out of a total of 49 markers in our data set and further assessed the abundance of these 33 markers in stable and unstable morphology groups (Figure 4(a)). Indeed, we identified a subgroup of around ten proteins, including CD9 antigen (CD9) and PODXL, that was more abundant in the stable colony morphology group, with p values of 0.0003 and 0.0002, respectively. Next, the separation efficiency for POU5F1, SOX2, PODXL, and CD9 to distinguish between the two groups (stable and unstable) was evaluated by making receiver operating characteristic (ROC) curves (Figure 4(b)). Indeed, SOX2 and POU5F1 showed low ability to distinguish between the two groups, with area under the curve (AUC) values of, respectively, 0.62 and 0. 5   data, we measured the levels of CD9 and POU5F1 by flow cytometry in selected stable and unstable colony morphology lines (Figure 4(c)). There was a tendency, although not significant, towards lower POU5F1 and CD9 levels in the unstable colony morphology lines.

Pathway Analysis Suggests TGFB-Induced EMT Events in Reprogrammed Lines with Unstable Colony Morphology.
To identify upstream regulators in the unstable morphology group, we performed pathway analysis using the IPA (Ingenuity Pathway Analysis) software tool. In this analysis, we used the differentially abundant proteins (n = 614, displayed in Figure 3(d)) and asked which upstream regulator proteins could explain the emergence of this protein signature in silico. The top scoring upstream regulators in the unstable morphology group were TCF7L2 followed by CTGF and TGFB1 (Figure 5(a) and supplementary table 4). Since TGFB is the major signalling pathway for inducing EMT [28], we sought to focus on TGFB as an upstream regulator in the unstable morphology group. The canonical TGFB signalling is activated by ligands that act on TGFB receptors with subunits ALK4, ALK5, and ALK7 [29] (Figure 5(b)). SMAD2/3 is subsequently phosphorylated and together with SMAD4 enters the nucleus to activate transcription factors and regulate target genes. An alternative cascade occurs through SMURF1-regulated RHOA degradation that mediates EMT [30] (Figure 5(b)). In this case, the activated receptor phosphorylates PAR6, thereby stimulating the recruitment of SMURF1 and leading to tight junction dissolution, which is a characteristic of EMT. In our proteomic data set, several of the molecules involved in both the canonical TGFB route to EMT and the alternative SMURF1-regulated route to EMT were identified and displayed in the volcano plot (Figure 5(c)). For the canonical TGFB signalling pathway, SMAD2 was found to be more abundant in the unstable colony morphology group together with downstream target molecules such as COL1A1 and FN1. For the SMURF1-regulated route, SMURF1 itself was found to be one of the most abundant proteins in the unstable colony morphology group compared to the stable colony morphology group (Figure 5(c)). Furthermore, we have conflicting data as PAR6 was found to be less abundant and RHOA was found to be more abundant in the unstable colony morphology group.
From these different protein abundances, we hypothesize enhanced activity in the canonical TGFB route and a modified activity in the alternative SMURF1-regulated route in the unstable colony morphology group.
Figure 3(g): Ranked fold changes (stable/unstable, log2 relative abundance) for the proteins of the stable signature and the unstable signature. Proteins shown: RBPMS2, EPCAM, SMPDL3B, FBXO2, KRT19, TRIM71, ADD2, L1TD1, CYP2S1, ALPL, DNMT3B, SALL4, JARID2, CGN, APOE, DSG2, DSP, AASS, VRTN, BUB1B, PCDH1, DPPA4, CRIP2, ALDH1L2, CSRP1, CD99, NEFL, KCTD12, GSTM3, AHNAK, NEFM, PALLD, COL3A1, NEK7, ANXA1, LGALS1, SMURF1, P4HA2, ZC3HAV1L, EPB41L3, FKBP9, MAP2, FN1, MRC2.
Figure 4: (a) Pluripotency markers [26,27] identified in our proteomic data set. The cluster analysis was only applied on rows, not columns. The heat map revealed a group of 10 markers that were able to separate the two morphology groups. (b) ROC curves for POU5F1, SOX2, PODXL, and CD9 when comparing the stable colony morphology group with the unstable colony morphology group. (c) Flow cytometry analysis of the markers POU5F1 and CD9 in selected stable (S) and unstable (U) lines.
To validate these findings experimentally, we selected a reprogrammed line with unstable class 1 colony morphology (line 4-C) previously identified with high expression of EMT markers (Figure 3(f)) and exposed it to TGFB inhibitors (ALX-270-445 and A83-01) and a SMURF1 inhibitor (SMURF1-i) to see whether these ligands could reverse the EMT event, which would be indicated by an increase in the colony marker E-cadherin (CDH1) and a decrease in the EMT marker vimentin (VIM) [31]. We treated the line for seven days with each drug at three different concentrations and quantified the level of vimentin and E-cadherin by immunocytochemistry (Figure 5(d)). Although there were observable alterations in the quantified levels, none of the ligands led to significantly decreased levels of vimentin or significantly increased levels of E-cadherin, and we did not observe a reversal of colony morphology (towards stable colony morphology, not shown).

Discussion
In this study, we used label-free quantitative proteomics to compare reprogrammed cell lines displaying stable colony morphology to lines with unstable colony morphology. Colony morphology is typically considered an important criterion for undifferentiated pluripotent cells and is a valuable assessment in the daily routine in stem cell laboratories. However, this assessment suffers from manual and subjective microscopic inspection and is therefore questionable in an automated pipeline for benchmarking of cells [32].
By providing a first proteomic characterization of the molecular signatures of reprogrammed cells displaying different colony morphologies, our results demonstrate proteome signature patterns that robustly capture the colony morphology and provide an insight into the molecular mechanisms involved in spontaneous differentiation. The protein signatures presented here could represent a basis for next-generation benchmarking of pluripotent cells, correlating protein profiles with colony morphology, which is considered a critical indicator of true pluripotent cells.
In the unstable colony morphology group, we found higher abundance of mesenchymal markers including vimentin (VIM), N-cadherin (CDH2), and fibronectin (FN1). This is in line with previous reports [25,31,33-36]. In fact, the presence of mesenchymal-like cells in colonies that undergo spontaneous differentiation was first reported in 2001 [37]. Epithelial to mesenchymal transition (EMT) was subsequently identified and associated with spontaneous differentiation [33,34]. However, EMT markers in differentiating PSCs have mainly been shown by immunohistochemistry and Q-PCR [25,33,35], and also by RNA-seq [38] and DNA microarray [31,36]. In our study, we show for the first time that mass spectrometry-based proteomics can identify a similar EMT profile and also capture the broader molecular picture of this event.
It is known that EMT can be induced via several pathways [28]; however, the mechanisms triggering EMT in stem cells are not fully understood. As early as 2005, D'Amour et al. discovered an Activin A-induced EMT in the differentiation to DE; however, it was not clear which signalling pathway was involved [39]. Later, in 2017, Li et al. showed that Activin A-induced formation of DE includes an EMT event triggered by TGFB signalling [38]. This is in line with our global proteomic assay, where the pathway analysis is suggestive of a TGFB-induced event in the unstable colony morphology group. We found TGFB pathway molecules (SMAD2, SMURF1, ROCK2, and RHOA) as well as downstream target genes (COL1A1, VIM, and FN1) to be more abundant. In our attempt to reverse EMT, we tried to inhibit the TGFB receptors by using the ligands ALX-270-445 (inhibits ALK 5 subunit) and A83-01 (inhibits ALK 4, 5, and 7 subunits). We also attempted to inhibit SMURF1, as this TGFB-related protein had a high abundance in the unstable colony morphology group in our data. With the selected ligands, we observed an alteration in vimentin and E-cadherin expression; however, a reversal of EMT, indicated by an increased level of E-cadherin and a decreased level of vimentin, was not observed. As EMT can be induced via several pathways and crosstalk can occur [40], the role of the molecules we targeted can possibly be replaced by other signals. For example, Feng et al. showed in 2012 that activation of PKC is associated with EMT in stem cells, and Kinehara et al. showed in 2014 that EMT was reversed by using a PKC inhibitor [31].
The underlying reason for the dynamic change of the PSC colony morphology is not fully understood. Epigenetic memory and incomplete reprogramming could be one explanation [41]. Furthermore, the feeder-free system has been reported to cause unwanted spontaneous differentiation [37], especially when using Matrigel [42]. Both these findings could explain the variation in our sample set. Cell competition was recently found to be a mechanism during reprogramming where elite cells overtake the cell population [43]. Cell competition could also explain changes in colony morphology at a later passage, where differentiated cells outcompete nondifferentiated cells. Also, variation in hiPSC lines has been shown to be donor dependent [5,44,45]; our study, however, showed that variations related to colony morphology are not donor dependent, as three of the donors (donors 3, 4, and 6) had lines classified into more than one morphology group.
The differentiation potential associated with colony morphology is an important aspect, as this is a crucial function of PSCs. In our study, we found that reprogrammed lines with unstable colony morphology could form ectoderm; however, the extent of endoderm and mesoderm formation varied. There have been some studies correlating different classes of PSCs to differentiation capacity; however, most of them have shown a successful formation of the three germ layers in all classes or only tested a selection of qualified lines [11,12]. Only a few studies have shown varying differentiation potential; for example, Chen et al. published a study in 2009 in which hESCs were classified into three morphology groups and found that in vivo differentiation capacity, measured by teratoma formation in mice, differed between the classes [6]. However, the hESC classes were based on expression markers, not colony morphology. Also, Wakao et al. published a study in 2012 in which only one out of seven iPSC classes could successfully form EBs [13]. However, the iPSCs were classified based on cell characteristics and not the overall colony morphology. To our knowledge, our study is unique in classifying the reprogrammed lines (>P10) based on overall colony morphology and correlating this classification to EB formation capacity.
For the PSC and regenerative medicine field, the safety aspect is unavoidable. Changes and variations in PSCs are partly unpredictable, and it is important to evaluate the cells routinely. As typical and common markers for pluripotency have been questioned [13], more comprehensive automated assays to benchmark cells are needed to ensure sufficient quality control. Our proteomic data show distinct proteomic profiles for the colony morphology groups; hence, the proteomic analysis reflects the colony morphology and the PSC status. In this study, we demonstrate the validity of using proteomics to monitor reprogrammed lines and suggest that it should be part of an automated assay to benchmark cells.
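As an illustration of how a single protein could be scored in such an automated benchmark, the following minimal Python sketch computes the area under the ROC curve (AUC) for one marker when comparing stable with unstable lines. It is not the analysis pipeline used in this study: the function names, the marker names (marker_a, marker_b), and all abundance values are hypothetical and were invented for illustration only.

```python
# Hypothetical sketch: how well does one protein's abundance separate
# stable from unstable lines? All values below are invented, not study data.

def roc_auc(stable, unstable):
    """AUC as the probability that a randomly chosen stable line has a higher
    abundance than a randomly chosen unstable line (ties count as 0.5).
    This equals the Mann-Whitney U statistic divided by n_stable * n_unstable."""
    if not stable or not unstable:
        raise ValueError("both groups need at least one value")
    wins = 0.0
    for s in stable:
        for u in unstable:
            if s > u:
                wins += 1.0
            elif s == u:
                wins += 0.5
    return wins / (len(stable) * len(unstable))


def separation(stable, unstable):
    """AUC folded around 0.5, so 1.0 means perfect separation in either direction."""
    auc = roc_auc(stable, unstable)
    return max(auc, 1.0 - auc)


# Invented log2 relative abundances for two hypothetical markers.
marker_a_stable = [5.1, 4.8, 5.4, 5.0]
marker_a_unstable = [2.2, 3.0, 2.7, 3.1]
marker_b_stable = [7.0, 7.2, 6.9, 7.1]
marker_b_unstable = [7.1, 6.8, 7.3, 7.0]

auc_a = roc_auc(marker_a_stable, marker_a_unstable)  # groups do not overlap
auc_b = roc_auc(marker_b_stable, marker_b_unstable)  # groups overlap heavily

print(f"marker_a AUC: {auc_a:.2f}")
print(f"marker_b AUC: {auc_b:.2f}")
```

In this toy example, a marker whose abundance is similar in both groups yields an AUC near 0.5, whereas a marker that differs consistently between the groups approaches 0 or 1. Folding the AUC around 0.5, as in the separation helper, allows markers that are higher in either group to be ranked on the same scale.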

Conclusion
In this study, we classified 20 reprogrammed cell lines based on colony morphology and subsequently tested their differentiation capacity and analyzed their proteomic profiles using mass spectrometry. We found that different defined patterns of colony morphology were associated with distinct proteomic profiles and different outcomes in differentiation capacity. Finally, we provided insight into possible molecular mechanisms involved in the formation of stable and unstable colony morphologies during reprogramming.

Data Availability
The raw MS data files are available via ProteomeXchange with identifier PXD013481.