Parallelism of Chemicostructural Properties between Filgrastim Originator and Three of Its Biosimilar Drugs

The approval and granting of marketing authorization for a putative biosimilar is based on strong comparability studies with its biological reference product. This is due to the complexity of the structure and nature of the manufacturing process of biological drugs. Hence, a rigorous analytical workflow for chemical characterization and clinical trials to evaluate the efficacy and safety is required to demonstrate their high similarities to the reference drug. This work is focused on the comparison of the originator of filgrastim with three of its biosimilars by evaluating their structural similarity and biological activity. Qualiquantitative analyses were performed by MALDI-TOF/TOF-MS and RP-HPLC-UV. An innovative functional assay using zebrafish as the animal model was developed to evaluate the biological activities of the drugs. The different analyses performed in this study highlighted the structural similarity of biosimilar drugs and their originator. This result was further confirmed by a similar in vivo biological activity.


Introduction
Biosimilar drugs have been on the market for some decades now. However, the use of biosimilars in Italy and in the European Union as a whole remains controversial. To be authorized for the market, a biosimilar must guarantee quality, safety, and efficacy similar to those of its reference drug [1].
It is known that biosimilars have structures that are closely homologous to the original drug. Indeed, the complexity of the protein structure and the manufacturing process selected (i.e., host cell, production system, and purification) could lead to minor differences in the physicochemical properties of the biological drugs. However, these differences should not affect their biological activity, efficacy, and safety. Therefore, comparability studies, based on a thorough physicochemical and functional characterization, are mandatory to demonstrate the similarity between biosimilars and their originator [1].
To address this, a robust analytical workflow should be used to determine the primary, secondary, and tertiary structures of the protein and to identify posttranslational modifications (e.g., glycosylation pattern). It is also necessary to assess possible product-related impurities and to evaluate the stability of the biodrug using its degradation profile. Lastly, functional assays should be performed to test biological activity as well as clinical efficacy and safety [2,3].
This information is mandatory for the development of biosimilars and the granting of their marketing authorization [4,5].
Biosimilars available on the market can be categorized based on chemical and therapeutic criteria. The most common representative drugs belong to the growth factor family, vaccines, and monoclonal antibodies. This study focuses on a hematopoietic growth factor called granulocyte colony-stimulating factor (G-CSF).
G-CSF is a member of the family of regulatory proteins known as cytokines. It stimulates differentiation, survival, and migration of granulocytes and activates neutrophils in vitro and in vivo [6,7]. Native G-CSF is a 22,000 Da glycoprotein composed of a single polypeptide chain of 207 amino acids, with glycosylation at Thr-166 (uniprot_P09919) [8].
The native G-CSF is encoded by a gene on chromosome 17, from which two isoforms are derived by alternative splicing. Both isoforms are made up of a signal peptide of 29 amino acids followed by a peptide chain with 178 amino acid residues for isoform A and 175 amino acid residues for isoform B. The difference is accounted for by three additional residues (Val-Ser-Gln) inserted after Leu65 in isoform A. Isoform B, which is more acidic, elicits a higher biological activity and greater stability than the longer isoform A. Because of these properties, its amino acid sequence was chosen as a template for the commercially produced G-CSF filgrastim [9].
Filgrastim is produced by recombinant DNA technology using bacteria, particularly Escherichia coli, as host cells. It is primarily used to reduce the severity and duration of neutropenia due to different etiologies [10]. This protein consists of 175 amino acids with a molecular weight of approximately 18,800 Da [11]. The sequence of filgrastim is identical to isoform B of G-CSF isolated from human cells, except for two modifications: (1) the presence of a methionine in the N-terminal position of the recombinant protein instead of an alanine (r-met Hu G-CSF) and (2) the absence of glycosylation [9,12], which is due to the lack of a posttranslational modification mechanism in the E. coli expression system. Despite these differences, filgrastim preserves the biological activity of isoform B of human G-CSF [13]. The molecule contains a free cysteine (Cys) at position 18 and two intramolecular disulfide bonds, Cys37-Cys43 and Cys65-Cys75, that are essential to preserve a properly folded tertiary structure of the rhG-CSF molecule [14,15].
Filgrastim was first marketed in 1991. Since the expiration of its patent, more than fifteen biosimilars have become commercially available worldwide. Although several comparability studies on their structure and function have already been reported in the literature [16-20], this study is the first, to the best of our knowledge, to compare all four drugs distributed in Italy from both a structural and a functional point of view. This is important to properly inform end users (i.e., patients and medical staff) about biosimilar drugs as an alternative, rather than having them avoid these drugs due to lack of trust.
In this paper, we compare the originator of filgrastim, Granulokine® (Italian brand name of Neupogen®, GRA), with its biosimilars supplied on the Italian market, namely, Nivestim™ (NIV), Tevagrastim® (TEV), and Zarzio® (ZAR). We particularly focused on the in-depth characterization of the peptide sequences of all drugs to reveal possible posttranslational modifications that could lead to changes in amino acid composition and structure. By using reversed-phase high-performance liquid chromatography (RP-HPLC) and mass spectrometry, we were able to confirm the structural similarity of the four drugs and identified the presence of one disulfide bridge, which is essential for the biological activity of these proteins.
Moreover, in order to provide information about their functional activity, we performed an innovative functional analysis using zebrafish (Danio rerio) embryos to compare the effect of the drugs on immune system cell activation. Recently, zebrafish has become a very interesting animal model in different fields, including pharmacology and toxicology [21-28]. Aside from small molecule discovery and screening, it is now used for the characterization of complex biological drugs, such as biosimilars [29,30]. Zebrafish is a less expensive and more manageable organism than conventional animal models such as the mouse or rat, and it allows fast and reproducible tests for high-throughput screening [31,32].
Zebrafish is widely used as an animal model to study vertebrate hematopoiesis in vivo. Genetic and molecular pathways are highly conserved between zebrafish and mammals. The morphology and function of their blood cells are also very similar. Monocytes/macrophages and neutrophil granulocytes, together with erythrocytes, are the first blood cells to enter the bloodstream at around 24 hours postfertilization (hpf) [33,34]. The white blood cells of the innate immune system are able to respond quickly to proinflammatory stimuli, reaching the site of the stimulus in a few minutes. Because the embryos are transparent and there is no adaptive immunity until the third day of development, microinjection of exogenous substances into the embryos is widely used as a validated protocol for in vivo studies of the innate immune response (neutrophils and macrophages) [30,35-37].
Filgrastim, GRA, and its biosimilars are analogues of zebrafish G-CSF, which promotes the proliferation and differentiation of granulocytes. G-CSF and its receptor G-CSFR have been identified and well characterized in zebrafish, revealing conservation of gene and protein sequences among different species, including human and mouse. Moreover, it was demonstrated that the functional G-CSF/G-CSFR mechanism of action is highly conserved [38]. Zebrafish embryos may thus represent a suitable in vivo animal model to study the effects of exogenously administered G-CSF. The results obtained indicate similar functional and structural properties of all tested compounds, supporting the assessment of biosimilarity among them.

Drugs Tested.
All drugs were obtained in their commercially available forms as an injection solution in prefilled syringes (one batch for each compound), as reported in Table 1. Filgrastim reference standard (Y0001173_ European Pharmacopoeia Reference Standard) was purchased from Sigma Italia (Milan, Italy).

pH Value Determination.
The pH of the pharmaceutical preparation GRA and its biosimilars NIV, TEV, and ZAR was measured on three different samplings for each batch at 25°C using a Basic 20 pH meter (Crison). The high-quality sensor probe 50 28 (Crison), a specific electrode for pH measurement in small sample volumes, meets the technical specifications required by the European Pharmacopoeia [12]. The pH value is expected to be in a range between 3.8 and 4.2 [39], and it must not differ by more than 0.05 pH units from the value corresponding to the sample in analysis [12].
LC method optimization was carried out using the filgrastim reference standard at a concentration of 340.00 μg/ml (data not shown).
For the quantitation of filgrastim, a calibration curve was constructed using 5 different dilutions of the filgrastim reference standard (340.00 μg/ml, 170.00 μg/ml, 85.00 μg/ml, 42.50 μg/ml, and 21.25 μg/ml). Fifteen microliters of each standard dilution was analyzed by RP-HPLC-UV as reported above. The calibration curve was plotted as peak area versus filgrastim reference standard concentration. GRA, NIV, TEV, and ZAR concentrations were calculated by determining their peak areas and interpolating them on the calibration curve (Electronic Supplementary Material Figure S1).
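The external-standard quantitation described above can be sketched as an ordinary least-squares fit followed by inverse prediction. This is a minimal illustration only: the standard concentrations are those listed in the text, whereas the peak areas, the detector response factor, and all function names are hypothetical and are not taken from the study.

```python
# Sketch of external-standard quantitation by linear calibration.
# Concentrations come from the text; peak areas are HYPOTHETICAL placeholders.

def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(x, y, slope, intercept):
    """Coefficient of determination of the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def concentration_from_area(area, slope, intercept):
    """Invert the calibration line to read a concentration from a peak area."""
    return (area - intercept) / slope

standards_ug_per_ml = [340.00, 170.00, 85.00, 42.50, 21.25]
# Hypothetical detector response: area = 12 * concentration + 5
areas = [12.0 * c + 5.0 for c in standards_ug_per_ml]

slope, intercept = fit_line(standards_ug_per_ml, areas)
r2 = r_squared(standards_ug_per_ml, areas, slope, intercept)
unknown = concentration_from_area(12.0 * 100.0 + 5.0, slope, intercept)
```

Inverse prediction is only meaningful for peak areas that fall within the calibrated range (here 21.25-340.00 μg/ml), so more concentrated samples would have to be diluted before injection.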

MALDI-TOF/MS.
Before mass spectrometry analyses, proteins were desalted and concentrated by using ZipTip® SCX, according to the manufacturer's protocol.
A volume of purified proteins was mixed with an equal volume of the appropriate matrix, sinapinic acid (SA; 10 mg/ml, in acetonitrile : water : TFA 70.00 : 30.00 : 0.10, v/v/v). One microliter of the mixture was loaded on the MALDI plate and allowed to dry at room temperature (rt).
Experiments were carried out on three different samplings for each batch on an AB Sciex 5800 MALDI-TOF/TOF-MS, equipped with a nitrogen laser (λ = 337 nm). Protein samples were analyzed using the midlinear mode, setting the laser intensity at 5,000 μJ with a laser pulse rate of 400 Hz and the detector voltage multiplier at 0.68, and recording the spectra in a mass range from 4,000 to 45,000 Da with a focus mass of 18,800 Da.

GC-MS Analysis.
An aliquot (2 µl) of the peak eluted at 1.1 min in ZAR samples was collected during three independent RP-HPLC qualitative analyses and analyzed by GC-MS using the method previously reported by Gianoncelli et al. [29] with the capillary column HP-5MS (30 m, 0.25 mm ID, 0.25 μm film thickness; J&W Scientific, Folsom, CA, USA).

Enzymatic Digestion with Endoproteinase Glu-C.
The whole procedure was carried out in a sterilized laminar flow hood using powder-free gloves, in order to reduce keratin contamination. The enzymatic digestion was conducted under both reducing and nonreducing conditions, on three different samplings for each batch. It is necessary to maintain the pH value between 7.50 and 8.00, which is the optimum range for the enzyme activity. The solution containing the protein was reduced with 2 μl of 0.1 M DTT solution and kept at 56°C for 1 h. In order to alkylate the thiol residues, 2 μl of 0.027 M IAA solution was added, and the mixture was kept at rt for 20 min. The solution was then digested with 1 μl of an aqueous solution of Glu-C (0.50 μg/μl, enzyme : substrate ratio of 1 : 10 (w/w)) and kept overnight (o/n) at 37°C. Subsequently, GRA, NIV, TEV, and ZAR were digested using the same protocol.

Nonreducing Condition.
Ten micrograms of filgrastim standard (corresponding to 16.70 µl, 0.53 nmol) was digested with 2 μl of an aqueous solution of Glu-C (0.50 μg/μl, enzyme : substrate ratio of 1 : 10 (w/w)) and diluted with 50 μl of NaH2PO4 (0.02 M, pH 7.80), which is necessary to maintain the pH value at 7.50, the optimum value for the enzyme activity. The solution was kept o/n at 37°C. Subsequently, GRA, NIV, TEV, and ZAR were digested using the same protocol. Each experiment was conducted on three different samplings for each batch.
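The amounts quoted for the nonreducing digestion can be checked with a short calculation. The sketch below assumes the approximate molecular weight of 18,800 Da given in the Introduction; the function and constant names are illustrative only.

```python
# Worked arithmetic for the nonreducing digestion amounts.
# ASSUMPTION: filgrastim molecular weight of ~18,800 Da, as quoted in the Introduction.
FILGRASTIM_MW_DA = 18_800.0

def micrograms_to_nanomoles(mass_ug, mw_da):
    """ug divided by g/mol gives umol; multiply by 1000 to obtain nmol."""
    return mass_ug / mw_da * 1000.0

def enzyme_to_substrate_ratio(enzyme_volume_ul, enzyme_conc_ug_per_ul, substrate_ug):
    """Weight-to-weight enzyme : substrate ratio, expressed as a fraction."""
    return enzyme_volume_ul * enzyme_conc_ug_per_ul / substrate_ug

substrate_nmol = micrograms_to_nanomoles(10.0, FILGRASTIM_MW_DA)  # about 0.53 nmol
ratio = enzyme_to_substrate_ratio(2.0, 0.50, 10.0)                # 0.1, i.e., 1 : 10 (w/w)
```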

RP-HPLC-UV.
Twenty microliters of each drug digested with Glu-C was separated on a Waters XSELECT CSH C18 column (150 mm × 2.1 mm ID, particle size 3.5 μm); the mobile phase composition was (A) water : acetonitrile : TFA (95.00 : 5.00 : 0.05 v/v/v) and (B) acetonitrile : water : TFA (95.00 : 5.00 : 0.05 v/v/v); the mobile phase flow rate was 0.2 ml/min. The gradient program was as follows: from 10 to 50% B in 40 min, then to 90% B in 5 min, held at 90% B for 2 min, then back to 10% B in 3 min, followed by reequilibration at 10% B for 15 min. All analyses were performed on three different samplings for each batch at 30°C; the detection wavelength was set at 215 nm.
LC method optimization was carried out using filgrastim reference standard digested with Glu-C (data not shown).
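The gradient program can be written as a table of time/composition breakpoints with linear interpolation between them. The sketch below is one reading of the program described above (in particular, it assumes that "for 2 min" denotes an isocratic hold at 90% B); the names `GRADIENT` and `percent_b` are hypothetical.

```python
# Gradient program as (time in min, % mobile phase B) breakpoints.
# ASSUMPTION: the 2 min step is an isocratic hold at 90% B.
GRADIENT = [
    (0.0, 10.0),   # start
    (40.0, 50.0),  # 10 -> 50% B in 40 min
    (45.0, 90.0),  # 50 -> 90% B in 5 min
    (47.0, 90.0),  # hold at 90% B for 2 min
    (50.0, 10.0),  # 90 -> 10% B in 3 min
    (65.0, 10.0),  # reequilibration at 10% B for 15 min
]

FLOW_RATE_ML_PER_MIN = 0.2

def percent_b(t_min):
    """Linearly interpolate the % B delivered at time t_min."""
    if t_min < GRADIENT[0][0] or t_min > GRADIENT[-1][0]:
        raise ValueError("time outside the gradient program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)

run_time_min = GRADIENT[-1][0]
solvent_per_run_ml = FLOW_RATE_ML_PER_MIN * run_time_min
```

Under this reading, one injection takes 65 min and consumes about 13 ml of mobile phase.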

MALDI-TOF/TOF-MS.
Before mass spectrometry analyses, peptides were desalted and concentrated by using ZipTip® C18, according to the manufacturer's protocol.
A volume of purified peptides was mixed with an equal volume of the appropriate matrix, CHCA (10.00 mg/ml, in acetonitrile : water : TFA 70.00 : 30.00 : 0.10, v/v/v). One microliter of the mixture was loaded on the MALDI plate and allowed to dry at rt.
Experiments were carried out on three different samplings for each batch on an AB Sciex 5800 MALDI-TOF/TOF-MS, equipped with a nitrogen laser (λ = 337 nm). Peptide samples were analyzed using the reflector mode, setting the laser intensity at 3,500 μJ with a laser pulse rate of 400 Hz and the detector voltage multiplier at 0.52, and recording the spectra in a mass range from 700 to 2,600 Da with a focus mass of 1,600 Da. All mass spectra resulted from the accumulation of at least 1,500 laser shots using a random search pattern.
The most intense peptides were subjected to tandem mass spectrometry, using the MS/MS mode with the laser intensity set at 5,000 μJ, a laser pulse rate of 1,000 Hz, and the detector voltage multiplier at 0.60. All mass spectra resulted from the accumulation of at least 17,500 laser shots using a random search pattern.
The MALDI-TOF/TOF mass spectrometer was calibrated using the ProteoMass™ Peptide and Protein MALDI-MS Calibration Kit.

Fish Maintenance and Egg Collection.
All zebrafish embryos were handled according to national and international guidelines, following protocols approved by the local Committee (OPBA protocol nr 211B5.24) and authorized by the Ministry of Health (authorization nr 393/2017-PR). Healthy adult wild-type zebrafish (AB strain) were used for egg production. Fish were maintained under standard laboratory conditions as described [40], at 28°C on a constant 14 h light/10 h dark cycle. Immediately after spawning, fertilized eggs were harvested, washed, and placed in 10 cm Ø Petri dishes in fish water. The developing embryos were incubated at 28°C and maintained in 0.003% (w/v) 1-phenyl-2-thiourea to prevent pigmentation.

In Silico Analysis.
The human G-CSF protein information collected in the UniProt database (uniprot_P09919) was used to obtain the human G-CSF receptor (CSF3R) entry, in the "Protein Interaction" section [8]. Ensembl full-length sequences of the CSF3R transcript and protein were obtained and used to BLAST search the zebrafish Ensembl genome assembly [41]. One full-length zebrafish csf3r transcript, encoding one Csf3r protein, was identified. A comparative analysis of gene organization between human and zebrafish G-CSF receptor genes was carried out, employing information supplied by the Ensembl database [41]. Synteny analysis was performed using the Synteny database and the Genomicus genome browser [42,43]. The protein sequences of human and zebrafish G-CSF receptor were aligned using the Clustal Omega multiple sequence alignment program [44].

Leukocytes Quantification.
The reference standard of filgrastim, as well as the originator GRA and its biosimilars TEV, NIV, and ZAR, was diluted in 0.05% (w/v) phenol red solution (Sigma-Aldrich) to the final concentration of 250 ng/μl. At 48 hours postfertilization (hpf), 1 nl of each dilution was injected into the otic cavity of dechorionated zebrafish embryos. As a negative control, 0.05% (w/v) phenol red solution without the pharmaceutical compounds was used. Escherichia coli JM109 bacteria in 0.05% (w/v) phenol red solution were used as the positive control [37]. Embryos were incubated at 28°C for 2 h after injection and then fixed in 4% (w/v) paraformaldehyde in PBS overnight at 4°C. Fifty embryos for each injected compound were used to perform whole-mount in situ hybridization (WISH), according to the Thisse protocol [45]. The probes pu1, lplastin, and mpx were selected to identify early zebrafish myeloid progenitors, monocytes/macrophages, and neutrophil granulocytes, respectively [46]. Embryos were mounted in agarose-coated dishes, and images were taken with a Leica MZ16 F stereomicroscope equipped with a DFC 480 digital camera and LAS Leica Imaging software (Leica, Wetzlar, Germany). Leukocyte quantification was performed using ImageJ 1.45s image analysis software. Quantifications are expressed as the mean ± standard deviation of three independent experiments.

Statistical Analysis.
Statistical analyses were done using GraphPad Prism software version 6.01 (La Jolla, CA, USA). One-way ANOVA followed by Dunnett's test was performed to identify statistically significant differences among the different groups of data, considering a p value < 0.05 as the threshold for a significant difference.
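As an illustration of the first step of this analysis, the F statistic of a one-way ANOVA can be computed from the between-group and within-group sums of squares. The sketch below is not the GraphPad Prism procedure used in the study: the cell counts are hypothetical toy values, the function name is illustrative, and the p value and the Dunnett post hoc comparison against the control group are left to a statistics package.

```python
# Minimal one-way ANOVA F statistic (pure Python sketch, no p value).
# The group values below are HYPOTHETICAL toy counts, not data from the study.

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

control = [1.0, 2.0, 3.0]
treated_a = [2.0, 3.0, 4.0]
treated_b = [3.0, 4.0, 5.0]
f_stat, df_b, df_w = one_way_anova_f([control, treated_a, treated_b])
```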

pH Values.
The measured pH values are the following: GRA 3.95 ± 0.03, NIV 3.90 ± 0.02, TEV 4.12 ± 0.02, and ZAR 4.28 ± 0.02. All values were within the range of variability due to the presence of acetic acid and sodium hydroxide (sodium acetate) as excipients in the pharmaceutical formulation (Table 1). As claimed by the manufacturer, ZAR had a different excipient composition; i.e., the sodium acetate was replaced by glutamic acid. However, this difference did not result in significant changes in the pH value.

Qualitative Analyses of Intact Proteins by RP-HPLC-UV and MALDI-TOF-MS.
In order to evaluate the purity of the protein present in the formulations, as well as the possible presence of degradation products or aggregates, we performed RP-HPLC-UV analyses on the pharmaceutical preparations under study. The results obtained from RP-HPLC were also confirmed by MALDI-TOF-MS, an analytical technique that allows the measurement of intact proteins with molecular weights higher than 100,000 Da.

RP-HPLC-UV Qualitative Analysis.
RP-HPLC-UV analyses demonstrated the presence of a single molecular species (presumably filgrastim) in the originator GRA as well as in its biosimilar products NIV, TEV, and ZAR, with no sign of product-related impurities. The overlapping chromatograms of the drugs under study are reported in Figure 1. Only one major peak is present, eluting at 17.500 min, the same retention time as the filgrastim reference standard (data not shown). A zoom of the chromatograms highlighted a small variation in the retention time of each individual peak, probably due to instrumental error. However, this difference was still within the acceptable range of variability. The filgrastim protein present in the originator GRA and in its biosimilar products NIV, TEV, and ZAR eluted at 17.567, 17.505, 17.572, and 17.466 min, respectively.
Interestingly, only in the chromatogram of ZAR (Figure 1) was it possible to detect a second, minor peak, which eluted at 1.100 min, close to the solvent front. This peak was recovered during RP-HPLC-UV analysis by a fraction collector and analyzed by GC-MS (Electronic Supplementary Material Figures S2A, S2B, S3A, and S3B). It was identified as the glutamic acid used as an excipient only in the ZAR pharmaceutical formulation (Table 1).

MALDI-TOF-MS Qualitative Analysis.
MALDI-TOF-MS analysis was performed to determine the molecular weight of the proteins contained in the pharmaceutical formulations. As shown in Figure 2, the mass spectra of the samples were similar and characterized by 3 peaks with m/z values of about 9,400, 18,800, and 37,700, respectively.
While the main peak with an m/z value of approximately 18,800 represented singly charged filgrastim, the m/z value of about 9,400 represented doubly charged filgrastim, and the m/z value of about 37,700 could represent the singly charged filgrastim dimer. The major peak had an average mass value in the range of 18,831-18,842 Da, which was in agreement with the molecular weight of the filgrastim reference standard (18,799 Da). This result suggested that the main component of these preparations is indeed filgrastim. No peaks corresponding to contaminants or impurities were observed in the mass profile.
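The assignment of the three peaks follows from the relation m/z = (n·M + z·m_H)/z for an n-mer of mass M carrying z protons. The sketch below applies it to the lower end of the observed mass range (18,831 Da); the proton mass is a standard constant, while the function name and the choice of example mass are illustrative assumptions.

```python
# Expected m/z of protonated filgrastim species in positive-ion MALDI.
PROTON_MASS_DA = 1.00728

def mz(monomer_mass_da, charge, n_mer=1):
    """m/z of an n-mer carrying `charge` protons: (n*M + z*m_H) / z."""
    return (n_mer * monomer_mass_da + charge * PROTON_MASS_DA) / charge

# ASSUMPTION: lower end of the observed average mass range, used as an example.
observed_mass_da = 18_831.0

singly_charged = mz(observed_mass_da, charge=1)            # [M+H]+
doubly_charged = mz(observed_mass_da, charge=2)            # [M+2H]2+
dimer_singly = mz(observed_mass_da, charge=1, n_mer=2)     # [2M+H]+
```

Rounded to the nearest hundred, these values reproduce the approximate peak positions of 18,800, 9,400, and 37,700 reported for Figure 2.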

Quantitative Analysis of Intact Proteins by RP-HPLC-UV.
In order to quantify the protein amount in the different pharmaceutical preparations, a calibration curve was prepared using the filgrastim reference standard at different concentrations (Electronic Supplementary Material Figure S1). The linearity of the calibration curve was assessed over a range of 21.25-340.00 µg/ml. The method proved to be linear, with R² = 0.9999. The quantitative analyses of GRA, NIV, TEV, and ZAR showed that their respective average concentrations were 579.9305 ppm (with a recovery of −3.345%), 592.2905 ppm (with a recovery of −1.285%), 596.1610 ppm (with a recovery of −0.640%), and 570.4979 ppm (with a recovery of −4.917%) (Electronic Supplementary Material Figure S4 and Table S1). The results obtained for each pharmaceutical preparation correspond to the label claims of the individual manufacturers.
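The reported recovery figures behave like percentage deviations from a common nominal concentration. The sketch below reproduces them under the assumption of a nominal value of 600 ppm; this value is not stated in this section and is inferred here only because it is consistent with all four reported pairs, so it should be checked against Table 1. The names used are illustrative.

```python
# Percentage deviation of the measured concentration from the nominal value.
# ASSUMPTION: nominal concentration of 600 ppm, inferred from the reported
# figures and not stated in this section.
NOMINAL_PPM = 600.0

def percent_deviation(measured_ppm, nominal_ppm=NOMINAL_PPM):
    """Signed deviation from nominal, in percent."""
    return (measured_ppm - nominal_ppm) / nominal_ppm * 100.0

measured_ppm = {"GRA": 579.9305, "NIV": 592.2905, "TEV": 596.1610, "ZAR": 570.4979}
deviations = {name: round(percent_deviation(value), 3)
              for name, value in measured_ppm.items()}
```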

Structural Analyses of Drugs: Peptide Mapping.
To better characterize the chemical structure of filgrastim, the originator and biosimilar drugs were studied using a proteomic approach. Samples were digested with endoproteinase Glu-C followed by chymotrypsin, in order to obtain peptides of optimum length for MALDI-TOF/TOF-MS analysis. Both digestions were performed under nonreducing conditions to verify the presence of disulfide bridges, which are essential for the biological activity of this protein. In addition, Glu-C digestion was also conducted under reducing conditions, which helped to demonstrate the presence of the disulfide bridge. Peptides resulting from both digestions were purified by ZipTip C18 and analyzed by MALDI-TOF/TOF mass spectrometry and RP-HPLC-UV. Figure 3 shows the RP-HPLC-UV chromatograms of the drug samples previously digested under nonreducing conditions. As shown in Figure 3(a), the chromatographic profiles of the peptides were identical for each drug. The data were then compared to the peptides obtained from digestion of the filgrastim reference standard (data not shown). Although all digested proteins had a similar profile, the intensity of some individual signals was slightly different; this is better illustrated by the overlapped chromatograms (Figure 3(b)). The apparent discrepancy was probably due to the enzymatic digestion process which, although performed under the same experimental conditions, may result in different cleavage efficiencies, thereby generating different concentrations of peptide products. In general, proteases do not always cleave the protein completely, even under identical experimental conditions; therefore, one or more missed cleavages must be taken into account when analyzing the peaks resulting from mass spectrometry and liquid chromatography analyses.

MALDI-TOF/TOF-MS Qualitative Analysis.
In Figure 4, we report the comparison of the MALDI-TOF/TOF-MS analysis of filgrastim after proteolytic digestion under nonreducing (Figure 4(a)) and reducing (Figure 4(b)) conditions. In both cases, the mass range between m/z 1,522 and m/z 1,660 was zoomed to better highlight possible differences in the spectrum. As shown in Figure 4(a′), we were able to identify one peak with an m/z value of 1,532 that corresponded to the peptide "35KLCATYKLCHPEE47," in which C37 and C43 are linked by an intramolecular disulfide bond that is necessary for the activity of the compound (Figure 5) [47].
To confirm this result, we performed protein digestion under reducing conditions to break the disulfide bond. As expected, the peak with m/z of 1,532 was not detectable in the relative mass spectrum (Figure 4(b′)), while we were able to identify two new peaks (m/z values of 1,534 and 1,648), which are consistent with the theoretical molecular weight of the peptide "35KLCATYKLCHPEE47" with a reduced (+2 Da) and alkylated (+57 Da for each Cys) disulfide bond.
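The three m/z values can be reproduced from the peptide sequence using standard monoisotopic residue masses. The following sketch computes the singly protonated mass of KLCATYKLCHPEE in its disulfide-bonded, reduced, and carbamidomethylated forms; the function name and its parameters are illustrative, and the alkylation is assumed to be the carbamidomethylation expected from iodoacetamide (IAA) treatment.

```python
# Monoisotopic [M+H]+ of the peptide 35KLCATYKLCHPEE47 in three states.
# Residue masses are standard monoisotopic values; only the residues that
# occur in this peptide are listed.
RESIDUE_MASS = {
    "A": 71.03711, "C": 103.00919, "E": 129.04259, "H": 137.05891,
    "K": 128.09496, "L": 113.08406, "P": 97.05276, "T": 101.04768,
    "Y": 163.06333,
}
WATER = 18.01056            # added once per linear peptide
PROTON = 1.00728            # charge carrier for [M+H]+
H2 = 2.01565                # two hydrogens lost on disulfide formation
CARBAMIDOMETHYL = 57.02146  # ASSUMPTION: IAA alkylation adds this per Cys

def mh_plus(sequence, disulfide_bonds=0, alkylated_cys=0):
    """Singly protonated monoisotopic mass of a peptide."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    mass -= disulfide_bonds * H2
    mass += alkylated_cys * CARBAMIDOMETHYL
    return mass + PROTON

peptide = "KLCATYKLCHPEE"
mz_disulfide = mh_plus(peptide, disulfide_bonds=1)   # about 1,532.7
mz_reduced = mh_plus(peptide)                        # about 1,534.7
mz_alkylated = mh_plus(peptide, alkylated_cys=2)     # about 1,648.8
```

The 2 Da gap between the first two values corresponds to the two hydrogens gained on reduction, and the further 114 Da shift to the two carbamidomethyl groups.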
These data indicate that the analytical approach employed in this study can be used to verify the integrity of chemical bonds within the filgrastim molecule.
A similar pattern of results was obtained when the MALDI-TOF/TOF-MS analysis was applied to the originator GRA and its biosimilar products NIV, TEV, and ZAR. The relative data are reported in Electronic Supplementary Material Table S2. The experimentally determined peptide mass values matched the theoretical peptide masses of filgrastim obtained from the in silico digestion [48]. As expected, the small peptides consisting of a few amino acids were lost during sample purification or suppressed due to interference by matrix ions in the low m/z range and thus were not detected.
In the peptide maps, the peak patterns of GRA and its biosimilars (NIV, TEV, and ZAR) were comparable, with no additional or missing peptides detected, indicating an identical primary structure. The tandem mass analysis allowed us to confirm the sequence, amino acid by amino acid, of each peptide corresponding to a peak found during the MALDI-TOF/TOF-MS analysis (data not shown).

Journal of Chemistry
Overall, these results showed that the masses of the peptides generated by the Glu-C/chymotrypsin digestion of the originator GRA and its biosimilar products NIV, TEV, and ZAR were identical and superimposable to the theoretical and experimental mass of filgrastim. In addition, disulfide bonds were preserved in all drugs as shown by the data obtained after digestion in nonreducing conditions.

In Silico and In Vivo Analysis on Zebrafish Embryo.
In order to evaluate whether zebrafish could be used as an animal model for filgrastim functional analysis, we first assessed the similarity between human and zebrafish G-CSF receptors by in silico analysis. We then investigated the effects of filgrastim on the activation of the innate immune response in vivo.
The information about the human G-CSF protein (P09919) supplied by the UniProt database [8] allowed us to identify, in the "Protein Interaction" section, the full-length human G-CSF receptor (CSF3R) entry: Q99062. This was used to search the Ensembl database for the 836-amino-acid full-length CSF3R protein sequence, with accession number ENSP00000362198 [41]. The human CSF3R protein is encoded by the 3,373 bp transcript ENST00000373106.5, which is the product of the CSF3R gene ENSG00000119535 located on human chromosome 1.
The Ensembl human G-CSF receptor protein and transcript sequences correspond to the NCBI database RefSeq: NP_000751 and RefSeq: NM_000760, respectively [49]. The human CSF3R sequence was used to BLAST search the zebrafish GRCz11 Ensembl genome assembly [41]. We identified one full-length zebrafish csf3r transcript, ENSDART00000063986.6 (RefSeq: NM_001113377), which is the product of the csf3r gene ENSDARG00000045959, located on zebrafish chromosome 16. The 2,687 bp zebrafish transcript encodes the 810-amino-acid Csf3r protein ENSDARP00000063985 (RefSeq: NP_001106848.1).
The gene organization of the human and zebrafish G-CSF receptor genes appeared very similar, each consisting of 16 exons, 15 of which are coding, and 15 introns. It is well established that a conserved colocalization of gene clusters among different species often corresponds to a conserved protein function [50]. A synteny analysis was performed between human chromosome 1 and zebrafish chromosome 16, in the genomic region that contains the G-CSFR gene. By using the Genomicus genome browser [43], we found two paralogs and five ortholog genes in the G-CSFR syntenic region, two of which maintained the same orientation and three of which were oriented in the opposite direction. The analysis was repeated using the Synteny database [42], which allowed the evaluation of a more extended genomic region. By analyzing a 100-gene window, one more ortholog gene associated with the G-CSFR gene was highlighted, while enlarging the window size to 200 genes yielded 74 gene pairs.
The amino acid sequences of the human and zebrafish G-CSF receptors were aligned using Clustal Omega multiple sequence alignment [44]. The human CSF3R amino acid sequence covered 93% of the zebrafish Csf3r sequence, with 29% identity and an overall similarity of 43%.
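A percent identity figure of this kind is derived from the aligned sequences by counting identical residues over the compared columns. The sketch below shows one common convention, in which columns containing a gap are excluded from the denominator; alignment tools differ in how they define the denominator, so this is not necessarily the convention behind the 29% value reported here. The two short aligned strings are hypothetical and are not CSF3R sequences.

```python
# Percent identity between two rows of a pairwise alignment.
# ASSUMPTION: gap-containing columns are ignored; other tools may count them.

def percent_identity(aligned_a, aligned_b, gap="-"):
    """Identical residues as a percentage of the gap-free aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# HYPOTHETICAL toy alignment, for illustration only
identity = percent_identity("MKT-LLA", "MRTALL-")
```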
By using the information collected in the UniProt database [8], we identified in the human G-CSF receptor sequence (UniProt: Q99062) the most important domains and the amino acids responsible for posttranslational modifications. The eight cysteines located in position 26, 46, 52, 101, 131, 142, 248, and 295 of the human CSF3R protein were conserved in the zebrafish Csf3r sequence in position 26, 46, 52, 100, 129, 141, 234, and 284, respectively. These amino acids are essential to form four disulfide bonds, necessary for the correct protein folding and function. Six out of eight N-glycosylation sites in the human G-CSF receptor sequence, in position 51, 93, 128, 134, 579, and 610, were conserved in the zebrafish Csf3r sequence in position 51, 92, 126, 132, 552, and 589, respectively. The WSXWS motif in position 318-322, necessary for proper protein folding and receptor binding, was also present in the zebrafish Csf3r sequence in position 306-310. Moreover, the box 1 motif in position 658-666, required for JAK interaction and activation, was well conserved in position 638-646 of the zebrafish Csf3r sequence.

Figure 4: Comparison of MALDI-TOF/TOF-MS analysis of the filgrastim standard after digestion under nonreducing (a, a′) and reducing (b, b′) conditions. In (a, a′), the peak at an m/z value of 1,532 (red arrow) indicates the presence of the disulfide bond, while the two peaks at m/z values of 1,534 and 1,648 (blue arrows), which represent the reduced and alkylated disulfide bond, respectively, are absent. Conversely, in (b, b′), the two peaks at m/z values of 1,534 and 1,648 (red arrows), which represent the reduced and alkylated disulfide bond, respectively, are present, while the peak at an m/z value of 1,532 related to the disulfide bond (blue arrow) is absent.

The MALDI-TOF/TOF spectra of digested proteins in nonreducing conditions are shown in Figure 5. It is possible to note that identical peptides were present in each drug preparation (* nonreducing content; ** reducing content).

Data obtained by computational analysis strongly suggested that the zebrafish G-CSF receptor could recognize and bind human recombinant G-CSF protein.
To perform an in vivo functional analysis of filgrastim, as well as of the originator GRA and its biosimilars TEV, NIV, and ZAR, we conducted preliminary experiments to set the optimal drug concentration to be used in zebrafish embryos. Based on the data available in the literature, several increasing doses of the filgrastim reference standard were selected for administration to the embryos. In a study on the effect of gamma radiation, the dose of filgrastim (Neupogen®) injected subcutaneously in mice was 300 µg/kg [51]. In another preclinical study, filgrastim (Neupogen®) was administered subcutaneously to neutropenic and nonneutropenic rats at doses of 10, 20, 30, 100, and 500 µg/kg [52]. In human patients, filgrastim is used in clinical practice at 1-10 µg/kg/day, depending on the pathology to be treated and on its severity, as reported in the manufacturer leaflets. Based on the collected data, the doses selected for the zebrafish embryos were 1, 5, 10, 100, and 500 µg/kg, which are numerically equal to the same values expressed in pg/mg. Since the average weight of a 48 hpf embryo is 0.5 mg, the final doses of the filgrastim reference standard administered were 0.5, 2.5, 5, 50, and 250 pg/embryo. Following a well-established protocol [36,37], the filgrastim reference standard was diluted in 0.05% phenol red solution to the selected concentrations and injected into the otic capsule of healthy zebrafish embryos at 48 hpf. Negative control embryos were injected with the 0.05% phenol red solution without the drug, while Escherichia coli JM109 in 0.05% (w/v) phenol red solution was used as the positive control. Twenty-five embryos were injected for each point. Treated embryos were incubated at 28°C for 2 h after injection to let the drugs act, and WISH was then performed. The mpx probe was selected to identify neutrophil granulocytes in the preliminary experiments [46].
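The dose conversion described above can be checked with a few lines of arithmetic. This is a minimal sketch. The function and constant names are hypothetical, while the dose levels and the 0.5 mg embryo weight are taken from the text.

```python
PG_PER_UG = 1e6   # picograms in one microgram
MG_PER_KG = 1e6   # milligrams in one kilogram
EMBRYO_WEIGHT_MG = 0.5  # average weight of a 48 hpf embryo, as stated in the text


def dose_per_embryo_pg(dose_ug_per_kg: float,
                       weight_mg: float = EMBRYO_WEIGHT_MG) -> float:
    """Convert a weight-normalized dose (ug/kg) to an absolute dose (pg/embryo).

    1 ug/kg = 1e6 pg per 1e6 mg = 1 pg/mg, so the numeric value is unchanged
    by the unit conversion and is then scaled by the embryo weight in mg.
    """
    dose_pg_per_mg = dose_ug_per_kg * PG_PER_UG / MG_PER_KG
    return dose_pg_per_mg * weight_mg


doses_ug_per_kg = [1, 5, 10, 100, 500]
doses_pg = [dose_per_embryo_pg(d) for d in doses_ug_per_kg]
print(doses_pg)
```

The resulting list reproduces the per-embryo doses given in the text (0.5, 2.5, 5, 50, and 250 pg/embryo).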
The concentration curves demonstrated a concentration-dependent increase in the number of neutrophils at the site of injection. Moreover, none of the tested doses proved lethal or harmful to the embryos (data not shown). Based on the results of the preliminary experiments, we chose the highest dose, 250 ng/μl, for use in the subsequent experiments.
We then verified whether the originator GRA and its biosimilars TEV, NIV, and ZAR could stimulate granulocytes in zebrafish embryos with a potency similar to that of the filgrastim reference standard and to one another. Following the same protocol used in the preliminary experiments, all the samples were injected into the otic capsule of healthy zebrafish embryos at 48 hpf. The probes pu1, lplastin, and mpx were selected to identify early zebrafish myeloid progenitors, monocytes/macrophages, and neutrophil granulocytes, respectively [46]. The results are reported in Figure 6, in which the amount of each of the three leukocyte populations is shown as the percentage of the embryo area marked by the corresponding probe. Myeloid progenitors, monocytes/macrophages, and neutrophil granulocytes were all significantly attracted (p < 0.05) to the site where E. coli was injected (positive control, black column) as compared with embryos injected with the negative control (white column). The percentage of leukocytes in the positive controls was 2.95-, 1.71-, and 2.19-fold higher than in the negative controls for the pu1, lplastin, and mpx probes, respectively. A statistically significant increase (p < 0.05) in the number of myeloid progenitors, macrophages, and neutrophils over the respective negative controls was likewise observed in embryos treated with each of the tested drugs (gray columns). The amount of the three leukocyte populations in treated embryos was comparable to that observed in the respective positive control embryos. The pu1-positive cells were increased around 2.5-fold relative to negative controls in embryos treated with the filgrastim reference standard, as well as with GRA, TEV, NIV, and ZAR.
The percentage of the lplastin-positive area in treated embryos was about 1.5-fold higher than in negative controls, while the mpx-positive neutrophils were increased around 2-fold relative to negative controls in embryos treated with all the tested drugs. These data showed that the analyzed pharmaceutical compounds can efficiently activate the innate immune response in vivo, with a potency of positive immunomodulatory action similar to that of the filgrastim reference standard and to one another.
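To make the statement that treated embryos were "comparable" to the positive controls more concrete, the reported fold increases can be expressed relative to one another. The sketch below uses only the fold values quoted in the text. The treated values are the approximate ("around") figures, the ratio is not a statistic reported by the study, and the names are hypothetical.

```python
# Fold increase over the negative control, as quoted in the text.
positive_control = {"pu1": 2.95, "lplastin": 1.71, "mpx": 2.19}
# Approximate ("around") values reported for embryos treated with the drugs.
treated_approx = {"pu1": 2.5, "lplastin": 1.5, "mpx": 2.0}


def relative_response(treated: dict, positive: dict) -> dict:
    """Ratio of the treated fold increase to the positive-control fold increase."""
    return {probe: treated[probe] / positive[probe] for probe in positive}


ratios = relative_response(treated_approx, positive_control)
for probe, ratio in ratios.items():
    print(f"{probe}: {ratio:.2f}")
```

With these rounded inputs, the drug-treated response falls at roughly 85 to 91% of the E. coli positive-control response for the three probes. This is only a descriptive ratio and does not replace the statistical comparison shown in Figure 6.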

Discussion and Conclusion
Here we show the results of a comparative study of the biotechnology drug GRA and three of its biosimilars, NIV, TEV, and ZAR. To the best of our knowledge, this is the first time that all four drugs distributed in Italy have been compared at the same time and in the same way from both a structural and a functional point of view. Quantitative and qualitative chemical evaluation was carried out with recognized techniques, namely RP-HPLC-UV and MALDI-TOF/TOF-MS, while biological activity was studied in vivo using an innovative experimental animal model, the zebrafish embryo.
Quantitative analyses showed that the concentration of filgrastim in all the analyzed drug preparations matched the label claims made by the corresponding manufacturers. The qualitative analysis performed by RP-HPLC-UV demonstrated that all four drugs were characterized by a single main peak. The only difference was found in ZAR, which showed an additional peak that was identified by GC-MS as glutamic acid.
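The quantitative comparison with the label claims rests on an external calibration curve (Figure S1) and a recovery calculation (Table S1). The sketch below illustrates that workflow under stated assumptions. The five concentration levels are those listed for Figure S1, whereas the peak areas, the slope and intercept used to generate them, the sample area, and the label claim are all hypothetical placeholders, not data from the study.

```python
# Two-fold dilution series of the reference standard (ug/ml), as in Figure S1.
concentrations = [340.0 / 2**k for k in range(5)]


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def recovery_percent(measured: float, label_claim: float) -> float:
    """Measured concentration as a percentage of the label claim."""
    return 100.0 * measured / label_claim


# HYPOTHETICAL peak areas generated from an assumed linear detector response.
areas = [12.0 * c + 3.0 for c in concentrations]
slope, intercept = fit_line(concentrations, areas)

# HYPOTHETICAL sample: peak area and label claim are illustrative only.
sample_area = 12.0 * 300.0 + 3.0
label_claim = 300.0  # ug/ml, placeholder value
measured = (sample_area - intercept) / slope
print(round(measured, 2), round(recovery_percent(measured, label_claim), 1))
```

In practice, the areas would come from the duplicate RP-HPLC-UV injections, and the recovery would be computed on their average.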
However, it should be emphasized that molecular weight determination, even when associated with a known DNA sequence inserted into a known cell system, is not sufficient to identify a molecular species. Coding errors as well as posttranscriptional modifications may indeed alter the primary, secondary, or tertiary protein structure. Thus, with MALDI-TOF/TOF-MS, we demonstrated that all four drugs had the same amino acid sequence and that, under nonreducing conditions, both disulfide bridges were conserved. Finally, by functional in vivo analysis using the zebrafish embryo, we showed that there is no difference in their biological activities. We have confirmed in vivo that TEV, NIV, and ZAR are similar to their originator GRA in terms of efficiency in activating the innate immune response, with similar positive immunomodulatory action.
Today's complex biologic formats and the ever-increasing regulatory demands necessitate accurate and robust analytical methods to characterize a molecule. Thus, the use of the most effective tools to assess a molecule's physicochemical properties, especially in the case of biosimilars and innovators, will ensure that the end product is on a par with the original, safe, and efficacious. The robust comparative analytical and functional data reported in this study support the biosimilarity between the biosimilar drugs Nivestim™, Tevagrastim®, and Zarzio® and the reference product Granulokine®. Such data are important for increasing the knowledge of end users (i.e., patients and medical staff), so that the use of biosimilars is not restricted by a lack of trust.
Figure S1: calibration curve and correlation coefficient obtained from analysis of the filgrastim reference standard. Five different dilutions of the filgrastim reference standard (340.00 μg/ml, 170.00 μg/ml, 85.00 μg/ml, 42.50 μg/ml, and 21.25 μg/ml) were analyzed by RP-HPLC-UV. The calibration curve was plotted using the area under the peak versus the concentration of the filgrastim reference standard.
Figure S2: GC-MS spectrum of the second minor peak found only in the ZAR formulation and recovered during RP-HPLC analysis. The chromatogram is reported in A, and the related mass spectrum is reported in B.
Figure S3: peak identification by comparing the experimental mass spectrum against the mass spectra in a specific library. The mass spectrum of the unknown peak is reported in A, and the matching library mass spectrum, corresponding to glutamic acid, is reported in B.
Figure S4: quantitative RP-HPLC-UV analyses of Granulokine® (GRA), Tevagrastim® (TEV), Nivestim™ (NIV), and Zarzio® (ZAR). Each drug was injected two times, and the chromatogram shows the first analysis in blue and the second one in black.
Table S1: data analysis of the duplicate quantitative analyses of Granulokine® (GRA), Tevagrastim® (TEV), Nivestim™ (NIV), and Zarzio® (ZAR) and the relative average and recovery calculation. Each drug was injected two times; the values of the first analysis are in blue and the values of the second one are in black.
Table S2: matrix-assisted laser desorption/ionization time-of-flight/time-of-flight mass spectrometry (MALDI-TOF/TOF-MS) positive ion spectrum of Granulokine® (GRA) and its biosimilars, Tevagrastim® (TEV), Nivestim™ (NIV), and Zarzio® (ZAR), after endoproteinase Glu-C (in the nonreducing condition) and chymotrypsin digestion and purification with Zip-Tip C18. (Supplementary Materials)