Assessing the Role of DNA Methylation-Derived Neutrophil-to-Lymphocyte Ratio in Rheumatoid Arthritis

Rheumatoid arthritis (RA) is a disease of chronic systemic inflammation (SI). In the present study, we used four datasets to explore whether methylation-derived neutrophil-to-lymphocyte ratio (mdNLR) might be a marker of SI in new onset, untreated, and treated prevalent RA cases and/or a marker of treatment response to the tumour necrosis factor inhibitor (TNFi) etanercept. mdNLR was associated with increased odds of being a new onset RA case (OR = 2.32, 95% CI = 1.95–2.80, P < 2 × 10−16) and performed better in distinguishing new onset RA cases from controls compared to covariates: age, gender, and smoking status. In untreated preclinical RA cases and controls, mdNLR at baseline was associated with diagnosis of RA in later life after adjusting for batch (OR = 4.30, 95% CI = 1.52–21.71, P = 0.029) although no association was observed before batch correction. When prevalent RA cases were treated, there was no association with mdNLR in samples before and after batch correction (OR = 0.34, 95% CI = 0.05–1.82, P = 0.23), and mdNLR was not associated with treatment response to etanercept (OR = 1.10, 95% CI = 0.75–1.68, P = 0.64). Our results indicate that SI measured by DNA methylation data is indicative of the recent onset of RA. Although preclinical RA was associated with mdNLR, there was no difference in the mean mdNLR between preclinical RA cases and controls. mdNLR was not associated with RA case status if treatment for RA has commenced, and it is not associated with treatment response. In the future, mdNLR estimates may be used as a valuable research tool to reliably estimate SI in the absence of freshly collected blood samples.


Introduction
Rheumatoid arthritis (RA) is the most common inflammatory arthropathy, characterized by chronic systemic inflammation (SI) [1].The pathophysiology of RA involves a complex interplay between different cells including leukocytes, synovial fibroblasts, chondrocytes, and osteoclasts that leads to loss of immune homeostasis [1].Of all the cells implicated in the pathophysiology of RA, neutrophils possess the greatest cytotoxic potential owing to their ability to release degradative enzymes and reactive oxygen species [2].They are activated by exposure to immune complexes, rheumatoid factors, and cytokines in synovial fluid [3].In addition, the neutrophils interact with macrophages, dendritic cells (DCs), natural killer cells, mesenchymal stem cells, and lymphocytes influencing innate and adaptive immune responses leading to SI [3,4].Circulating blood cell components such as white blood cell and mean platelet volume are considered putative biomarkers of inflammatory activity [5,6].Clinically, this inflammatory activity can be measured by acute phase proteins [7], although recent studies have shown that aberrant neutrophil-to-lymphocyte ratio (NLR) may be used as a marker of SI in the development of coronary heart diseases [8,9], solid tumours [10], and autoimmune diseases like Takayasu's arteritis [11] and RA [12].Under certain circumstances such as anti-IL-6 therapy when C-reactive protein (CRP) levels are less useful in monitoring inflammation, NLR has been shown to be a better marker of evaluating disease activity in patients with RA [13].However, leukocyte measures are not readily available in many studies, especially from prospective population-based cohorts due to archiving of blood samples.This limits the evaluation of immune parameters and immunomodulation which are of immense importance in a chronic disease research setting [14].Two recent studies have shown the utility of a methylation-derived NLR (mdNLR) index from peripheral blood DNA as an alternative measure of NLR and have applied this as a marker of cancer development and progression [15,16].
We hypothesized that SI in RA could be assessed by measuring mdNLR and could be used as a research tool for assessing SI especially in a chronic disease setting without the need for fresh samples.Further, we hypothesized that treatment for RA might reduce any association between RA and mdNLR.In this regard, we tested if mdNLR might be associated with treatment response to the tumour necrosis factor inhibitor (TNFi) etanercept.

Materials and Methods
Figure 1 provides an outline of the study design.

Study Samples
2.1.1.New Onset RA Dataset.The raw methylation data and covariates for rheumatoid arthritis cases and controls were obtained from the publicly available Gene Expression Omnibus submitted dataset GSE42861 which was part of the Epidemiological Investigation of Rheumatoid Arthritis (EIRA) study [17,18].Only incident RA cases were invited for the study within the years 1996-2009 from middle Sweden.The controls matched by sex, age, smoking status, and residence area were selected from the same population as previously described [17].Cells for isolating DNA were obtained from the patients during their first visit to the rheumatology department before giving any disease-modifying antirheumatic drugs (DMARDs) [19].Methylation data were generated on DNA isolated from EDTA-treated blood samples of anticitrullinated protein antibody-(ACPA-) associated subtype of rheumatoid arthritis and controls using the Illumina HumanMethylation450 BeadChip array [17].In the new onset RA dataset, of the 689 samples available for analysis (controls = 335, cases = 354), we removed 2 outlier samples (for mdNLR) and 2 samples with no information on smoking status, leaving us with 685 samples for the final analysis including 352 ACPA-associated RA cases and 333 controls.
2.1.2.Preclinical RA Dataset.To assess whether mdNLR can be used as a predictive biomarker of future RA diagnosis in individuals who were disease-free at the time of blood collection, we used data from the Avon Longitudinal Study of Parents and Children (ALSPAC).DNA samples were collected when the women in the study were pregnant, which was 18 years prior to completing a questionnaire ascertaining RA status.The women were asked whether they had ever been diagnosed with RA by a doctor and what year they had first been diagnosed.HumanMethylation 450K Bead-Chip array data for roughly 1000 women were generated as part of the Accessible Resource for Integrated Epigenomics Studies (ARIES) project [20].Of these, a random sample of 200 women who had never been diagnosed with any form of arthritis 18 years after enrolment were classified as "disease-free" controls for the purposes of the current study.A further 48 women were selected as "preclinical RA" cases (14 from the original ARIES study plus 34 ALSPAC women The sample selection and preparation have been described previously [21].Fifteen samples with missing information on smoking status were removed from the main analysis leaving 56 samples for the final analysis.

Data Preprocessing and Estimating Proportion of
Leukocyte Cell Types 2.2.1.New Onset RA Dataset.The raw data was normalized using subset-quantile within array normalization (SWAN) algorithm [22].We removed bad quality probes (detection P > 0 01), probes containing SNPs in the CpG interrogation site or single-nucleotide extension site, cross-reactive probes, and probes on chromosomes X and Y [23].Proportions of leukocyte subtypes-granulocytes, monocytes, and lymphocytes (CD4 + T cells, CD8 + T cells, B cells, and NK cells)-were estimated by (i) the "estimateCellCounts" function implemented in the Bioconductor package minfi [24] according to the Houseman method [25] and (ii) an optimized reference-based cell mixture deconvolution methodology IDOL [26].Methylation-derived neutrophil-to-lymphocyte ratio (mdNLR) was estimated by dividing estimated proportions of granulocytes by lymphocytes as previously described [15].Previous studies have reported a strong agreement between mdNLR and cytological NLR estimates instilling confidence in the DNA methylation-based estimates of leukocytes and methylation-derived SI [15,16].

Preclinical RA and Prevalent Treated RA Datasets.
Peripheral blood samples were collected according to standard procedures, spun, and frozen at −80 °C.Isolated DNA was bisulphite converted using the Zymo EZ DNA Methylation™ kit (Zymo, Irvine, CA).Following conversion, the genome-wide methylation status of over 485,000 CpG sites was measured using the Illumina HumanMethylation 450K BeadChip assay according to the standard protocol.The arrays were scanned using an Illumina iScan, and initial quality review was assessed using GenomeStudio (version 2011.1).Quality control of ARIES samples has been described previously [15].Data for both time points (during pregnancy and 18 years after) and both subsamples (ARIES and extra samples from ALSPAC) were preprocessed as a single set in R (version 3.0.1)with the wateRmelon package according to the subset-quantile normalization approach [22].The mdNLR was calculated as described above using the "estimateCellCounts" generated using the Houseman method in minfi [24,25].Data for the prevalent treated RA dataset and the preclinical RA dataset were normalized together.

Treatment Response
Dataset.Seventy-one samples were available for the assessment of mdNLR as a biomarker of response to TNFi (etanercept) therapy.The raw data were preprocessed and normalized as previously described [21].The proportion of leukocyte subtypes was estimated by "esti-mateCellCounts" function implemented in minfi, and mdNLR was derived as described above.

Statistical Analyses.
The analyses were performed using the statistical software R (version 3.4.0).Univariate and multivariate logistic regression was performed to test the association between mdNLR and RA status.A Wilcoxon rank sum test was performed to test if RA cases and controls differed in the levels of DNA methylation at five sites that have recently been identified as CpGs arising during myeloid differentiation that could serve as surrogates for mdNLR [16].
In the new onset RA dataset, Liu et al. reported an imbalance between the number of cases and controls run per date which may potentially confound our analysis.Surrogate variable analysis is used for identifying, estimating, and incorporating sources of variation in gene expression and DNA methylation analysis [27,28].To identify technical sources of variation, we performed SVA and derived ten surrogate variables in the new onset dataset.Individuals' age, gender, and smoking status have been previously shown to be associated with the risk of developing RA [29,30].Hence, we incorporated age, gender, and smoking status along with 10 surrogate variables in the statistical model.The ability of mdNLR to classify RA cases from controls was assessed using Receiver Operating Characteristic (ROC) curves and the corresponding area under the ROC curve (AUC) values using the R package pROC [31].For the preclinical and prevalent treated RA datasets, multivariate logistic regression was 3 Journal of Immunology Research The statistical model was adjusted for individuals' age, smoking status, and bisulphite conversion plate.In the treatment response dataset, the association between mdNLR at the beginning of the treatment and response at 3 months was evaluated using a multivariate logistic regression model adjusting for individuals' age, gender, smoking status, use of DMARDs, and disease activity score 28.

Sample Characteristics.
The sample characteristics including demographic and epidemiological data for all four datasets are shown in Table 1.New onset RA, preclinical, and treated prevalent RA cases did not differ from controls in terms of mean age and smoking status (Table 1).In the treatment response dataset, the nonresponders were older and had a higher health assessment questionnaire score compared to good responders (Table 1).

mdNLR Is Elevated at Rheumatoid Arthritis Disease
Onset.In the new onset RA dataset, we compared the mdNLR derived using two different algorithms ("estimate-CellCounts" and IDOL) and found that the two methods of estimating mdNLR were highly correlated (r = 0 87, P < 2 × 10 −16 , Supplementary Figure 1).For this reason, the mdNLR derived using "estimateCellCounts" was used for further analyses.
We observed an elevated neutrophil and decreased lymphocyte count in new onset RA patients compared to controls (Figure 2(a)).The mean mdNLR for controls was 2.0, compared to 4.7 in new onset RA cases (Figure 2(b)).In a multivariate logistic regression model, a higher mdNLR index was associated with increased odds of being an RA case (OR = 2.32, 95% CI = 1.95-2.80,P < 2 × 10 −16 ) as shown in Table 2. Further, we found that all five CpGs associated with myeloid differentiation (suggested to be a surrogate for mdNLR) were hypomethylated in RA cases compared to controls (Supplementary Table 1, Supplementary Figure 2).On its own, mdNLR was able to distinguish RA cases from controls with an AUC of 0.80 (95% CI = 0.77-0.83),which was higher than covariates alone including individuals' age, gender, and smoking status (AUC = 0.56 ,95% CI = 0.52-0.60,P < 2 × 10 −16 for a difference between the two AUCs).Including the covariates with mdNLR did not improve the ROC curve, and AUC remained at 0.80 (95% CI = 0.77-0.84)(Figure 3).

Elevated mdNLR Is Associated with Increased Odds of
Being a Preclinical RA Case.We found increased odds of being a preclinical RA after adjusting for batch effects (OR adj = 4.30, 95% CI = 1.52-21.71,P = 0 029; Table 2).We did not observe a difference in the mean mdNLR between preclinical RA case and controls (Table 1).There  5 Journal of Immunology Research was no association between mdNLR in pregnancy and time to RA diagnosis (Supplementary Figure 3), and mdNLR had limited diagnostic ability to discriminate preclinical RA from controls in this small sample (Supplementary Figure 4(a)).

mdNLR Is Not Associated with Treated Rheumatoid
Arthritis.Women with RA who were assumed to be using DMARDs had a lower mean mdNLR compared to controls (Table 1), but confidence intervals around the odds ratio crossed the null in multivariate model adjusted for batch (OR = 0.34, 95% CI = 0.05-1.82,P = 0 23; Table 2).In a ROC analysis, mdNLR was unable to discriminate RA cases from controls (Supplementary Figure 4(b)).

mdNLR Is Not Associated with TNFi Treatment Response.
A higher mdNLR index was not associated with increased odds of being a poor responder to TNFi (etanercept) treatment compared to a good responder (OR = 1.10, 95% CI = 0.75-1.68,P = 0 64; Table 2).

Discussion
In this investigation of four datasets, we have identified mdNLR as a marker of SI and RA status.We were able to demonstrate the utility of mdNLR as a marker of chronic SI in untreated RA, which is in line with the findings of previous studies [12,32].The mdNLR had an improved diagnostic ability over age, sex, and smoking alone, although the poor performance of the covariates may be explained by the original study design, which matched cases and controls on smoking status.We also found associations between RA status and methylation at five myeloid differentiation CpGs, which may potentially indicate myeloid suppression [16].This finding may reflect the presence of myeloid-derived suppressor cells (MDSCs) in RA cases.MDSCs are known to act as suppressors of antitumour immune responses [33,34] and have been shown to play a role in arthritic progression in mice and RA patients [35].Interestingly, we found that pregnant women with elevated mdNLR had increased odds of being a preclinical RA compared to disease-free pregnant women, suggesting that mdNLR may have some utility in identifying RA before clinical features manifest.However, it should be noted that prior to batch effect correction, there was no difference in the mean mdNLR between RA cases and controls suggesting that any difference between the two groups are potentially masked by batch effects.These findings would need replication in an independent dataset.Similarly, we found no association between mdNLR and time to diagnosis; this may be due to low statistical power in this small dataset.We found lower (nonsignificant) mdNLR in treated RA cases compared to controls; these findings are not surprising because treatment with DMARDs is hypothesized to reduce inflammation [36] and would therefore affect mdNLR.For example, a recent meta-analysis showed that treatment with TNFi reduces SI [37].TNF is a key cytokine in the inflammatory response, stimulating both its own production and the production of many other inflammatory cytokines.It has been shown to have a dominant role in RA, hence the rationale for TNF inhibition as a therapeutic target in RA [38].We hypothesized that a high inflammation index at the baseline (reflected by a high mdNLR) may be associated with TNFi treatment response.However, our finding of no association between mdNLR and response to the TNFi etanercept suggests that response to TNFi may be predominantly a genetically and/or epigenetically driven process independent of the baseline SI.Although with the limited sample size, we are unable to conclusively prove the role of SI in TNFi treatment response.
Strengths of our study include the use of four datasets, giving us the ability to explore varied roles for mdNLR in untreated, treated, and preclinical RA cases, as well as in relation to treatment response.We carried out multivariate analyses adjusting for appropriate potential confounders and sources of variation.For example, smoking is suggested to play a critical role in the development of RA through altered immunologic function [1], and our recent study identified an altered number of immune cells in response to smoking [39].The original study that generated the new onset RA dataset attempted to control potential confounding by smoking by matching cases and controls on smoking status [17].In our analyses, we further adjusted models for smoking status to address any residual confounding that may have occurred due to smoking.
The limitations of our study are as follows.First, we were unable to identify independent datasets in which to replicate our findings.This could be attributed to the fact that most of the published and publicly available RA datasets have genome-wide DNA methylation data on specific cell types like T and B lymphocytes [40,41] or fibroblast-like synoviocytes [42] which cannot be used to derive mdNLR.Second, in the new onset RA dataset, we were only able to adjust for the 7 Journal of Immunology Research variables that were publicly available, which included age, gender, and smoking status.However, we believe that by using surrogate variables, we were able to capture and adjust the potential sources of heterogeneity that are not captured by variables included in the model.Third, we were unable to compare mdNLR to NLR due to the absence of directly measured blood cell type proportions for the studied datasets.Although, confidence in our measured methylation-derived SI is strengthened by previous studies that have validated the use of methylation-derived cell counts in estimating SI [16,26].Fourth, in the ALSPAC sample, we assumed that all prevalent RA cases were undergoing some form of treatment because they all reported that they had been diagnosed by a doctor; however, treatment and dosage data were not available.Finally, we were unable to adjust for the ACPA levels for the preclinical RA dataset as the ACPA levels were not measured for the samples in the ALSPAC dataset.
The NLR is routinely derived from absolute neutrophil and lymphocyte counts from a complete blood count in RA patients.We envisage that mdNLR will be useful for epidemiologists as a research tool to investigate SI using archival blood samples in the absence of cell-based NLR estimates.Our confidence in estimating mdNLR as a measure of systemic inflammation is strengthened by previous studies that have reported a high concordance between cell-based and methylation-derived NLR [16,26].

Conclusions
In conclusion, we have demonstrated that mdNLR is elevated during RA disease onset, but not in prevalent cases.This may be reflective of a higher SI in RA patients prior to DMARD therapy.Our findings would be useful in estimating SI especially in prospective studies where the estimates of leukocyte subtypes were not recorded at recruitment.It remains to be tested if mdNLR measures SI independent of acute phase proteins such as CRP.In the future, it would be interesting to validate our findings in a large prospective study of RA and note how early we can detect SI during the disease development.Finally, we would be interested in testing if mdNLR may be a useful tool to evaluate SI in a chronic disease research setting, especially in large prospective studies with archived blood samples.

Data Availability
ALSPAC data management policies do not permit datasets to be made publicly available due to data confidentiality and the potential to identify individual study participants from the data.Data used will be made available following an approved request from the ALSPAC executive (alspacexec@bristol.ac.uk).The ALSPAC data management plan is available online: http://www.bristol.ac.uk/media-library/ sites/alspac/documents/researchers/data-access/alspac-datamanagement-plan.pdf.BRAGGSS data is available on request from Professor Anne Barton (anne.barton@manchester.ac.uk) or is available from the corresponding author upon request.

Figure 2 :
Figure 2: Leukocyte cell subtypes and mdNLR in new onset RA cases and controls.(a) Estimated proportions of leukocyte subtypes (monocytes, neutrophils, and lymphocytes) in controls and new onset RA cases.(b) Higher mdNLR SI index in new onset RA cases compared to controls (P < 2 2e − 16).

Figure 3 :
Figure 3: Diagnostic ability of mdNLR to distinguish new onset RA cases and controls.Each ROC curve was generated from a different classifier: shown below the figure with the area under the ROC curve (AUC) values.Covariates included age, gender, and smoking status.

Table 1 :
Sample characteristics of datasets.

Table 2 :
Association between mdNLR and rheumatoid arthritis status.