Next Generation Exome Sequencing of Pediatric Asthma Identifies Rare and Novel Variants in Candidate Genes

Multiple genes have been implicated in asthma predisposition by association studies. Pediatric patients often manifest a more extensive form of this disease and a particularly severe disease course, so genetic predisposition is likely to play a more substantial role in this group. This study aimed to identify the spectrum of rare and novel variation in known pediatric asthma susceptibility genes using whole exome sequencing in nine individual cases of childhood-onset allergic asthma. DNA samples from the nine children with a history of bronchial asthma diagnosis underwent whole exome sequencing on the Ion Proton. For each patient, the entire complement of rare variation within strongly associated candidate genes was catalogued. The analysis showed 21 variants in the subjects, of which 13 had been previously identified and 8 were novel; 19 were nonsynonymous and 2 were nonsense. Among the novel variants, the 2 nonsynonymous variants in the PRKG1 gene (PRKG1: p.C519W and PRKG1: p.G520W) were present in 4 cases, and a nonsynonymous variant in the MAVS gene (MAVS: p.A45V) was identified in 3 cases. The variants found in this study will enrich the variant spectrum and help build up the variant database for the Saudi population. The eight novel variants provide further evidence of genetic susceptibility to asthma among Saudi children and contribute to a genetic screening map of the molecular genetic determinants of allergic disease in Saudi children, with the goal of reducing the impact of chronic diseases on health and the economy. We believe that the advanced filtration and annotation programs used in this study made it possible to obtain these results in a preliminary study exploring the genetic map of the disease in Saudi children.


Introduction
Asthma and other allergic diseases, including allergic rhinitis, eczema, and food allergy, cause a substantial burden of disease in childhood. Although a rapid increase in asthma and allergies has been identified over the latter part of the 20th century, the reasons for this are still unknown [1]. Recent changes in environmental factors and their interactions with genetic profiles have been suggested as major factors responsible for the increase in asthma and allergic diseases [2].
Asthma, a chronic inflammatory respiratory condition characterized by hyperresponsive airways and reversible airflow obstruction, is a substantial public health problem that affects nearly 155 million individuals worldwide, and the prevalence of current asthma is higher in children than in adults [3][4][5]. Although environmental factors are important, there are strong genetic predispositions for the development of allergic diseases. More than 100 candidate genes, distributed across every chromosome, have been reported to show linkage with asthma, and the strength of association of their single-nucleotide polymorphisms (SNPs) with asthma varies in different parts of the world [3][4][5]. Better knowledge of asthma susceptibility holds promise for a better understanding of the pathology, diagnosis, prevention, treatment, and management of this increasingly frequent disease.
Next-generation sequencing (NGS) technology, in particular exome sequencing, currently represents the most powerful and cost-effective approach to identifying variation in the human genome and has already been shown to uncover important disease-causing variation missed by GWAS [6][7][8][9][10][11][12][13]. Previous studies have implicated rare variants in asthma and asthma-related traits for a number of relevant genes [14,15], suggesting that rare variants may indeed play an important role in asthma susceptibility.
Chromosome 17q21 was the first asthma susceptibility locus discovered by genome-wide association studies (GWAS) [10,14,16,17,18,19]. However, none of the genes within the locus had previously been implicated in asthma pathogenesis. SNPs in 17q21 showing highly significant associations with childhood asthma correlated with the expression of ORMDL3 transcripts, suggesting ORMDL3 was a plausible asthma candidate gene in the locus [1]. Later, allele-specific gene expression was also observed for other genes in the 17q21 locus [20].
At least twelve GWAS of asthma have been conducted and have yielded numerous associations, with the most significant (and in most cases replicated) associations occurring in or near the following genes: ORMDL3 [1,2], PDE4D [21], HLA-DRB1 [2], HLA-DQ [2,3], RAD50-IL13 [3], DENND1B [4], TLE4 [5], SMAD3 [2], IL1RL1 [22], IL18R1 [2], IL33 [2], IL2RB [2], RORA [2], and SLC22A5 [2]. These findings have greatly expanded our understanding of the disease, having identified several novel genetic loci that had never previously been implicated in the pathogenesis of asthma (e.g., ORMDL3, RAD50, DENND1B, and TLE4). Despite these successes, no definitive causal variants have been identified in any of these genes. It is asserted that the associated variants are in linkage disequilibrium (LD) with the causal variants in these genes, but more effort must be made to identify causal variants so that the biology of these genes in the etiology of asthma can be better understood.
Despite the success of GWAS in identifying genetic variants associated with complex diseases, and despite more than 1000 studies conducted in the past few years to unravel the genetic complexity of many immune diseases including asthma, this approach can still explain the heritability of asthma only in part with regard to the clinical prediction of phenotypic heritability and immunological pathways [23].
This study aimed to reveal the genetic determinants of pediatric asthma in Saudi Arabia using whole exome sequencing and to compare the results with those of similar international studies.

Recruitment of Pediatric Asthma Cohort of Patients.
Seventy-nine children included in this study were initially selected from the Pediatric Allergy and Immunology Clinic, Maternity Children Hospital Makkah, KSA, between Jan 2014 and Oct 2015. All children had been diagnosed with asthma after clinical evaluation, physiological assessment including pulmonary function tests (PFT), and the use of the International Study of Asthma and Allergies in Childhood (ISAAC) questionnaire (modified for the population). PFT was not considered for young children who were unable to perform the test properly. The children were aged between 5 and 14 years at the time of recruitment. Written informed consent was obtained from the attending parents of all the children. In the initial recruitment interview, clinical data and venous blood samples (3 ml of whole blood for CBC and DNA extraction and 3 ml for plasma separation) were collected.
Additional comprehensive clinical data were extracted from the patients' medical records with consent. For each patient, the information gathered included gender, dates of birth and initial diagnosis, laboratory investigations, physiological assessment, disease history, parents' history of any allergic and/or other autoimmune diseases, medication history (use of steroids, immunomodulators, and biological therapies), and history of exposure to potential allergens such as carpets, plants, and/or animals.

Selection of Samples for Whole Exome Analysis.
Nine asthmatic patients were selected for NGS based on both clinical evaluation and phenotypic criteria, as follows:
(i) Global Initiative for Asthma (GINA) guidelines [24], together with the International Study of Asthma and Allergies in Childhood (ISAAC) questionnaire (modified for the recruited patients), as initial identification, with consideration of the medications used (β-agonists and steroids)
(ii) A history of respiratory symptoms such as shortness of breath, chest tightness, wheezing, and coughing that vary in intensity over time, as well as variable expiratory airflow limitation
(iii) Physiological tests including both PFT and spirometry measurements
(iv) Only patients with no other recorded allergy symptoms, such as skin or food allergy, were submitted to whole exome sequencing (see Table 1)
This study will be extended by genotyping all selected patients to measure the incidence of the identified variants.

Selection of a Panel of Known Asthma Target Genes.
All studies in the GWAS catalogue with "asthma" as a phenotype or keyword up to 28 March 2017 were reviewed; studies identifying asthma susceptibility variants that included pediatric subjects were identified, and the significant loci were retrieved. Subsequently, 16 additional genes from four studies were added following manual review of the literature. For a list of studies contributing to the candidate gene list, see Supplementary Table 3-3. In total, twenty loci encompassing 131 potential candidate genes were identified, of which 110 genes were covered in this analysis.
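Restricting each patient's variants to the candidate gene panel and separating known from novel variants can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the record fields (`gene`, `dbsnp_id`), the function name `catalogue_panel_variants`, and the example records are all hypothetical.

```python
from collections import Counter


def catalogue_panel_variants(variants, panel_genes):
    """Keep variants that fall in panel genes and count known vs. novel.

    A variant is treated as "known" when it carries a dbSNP identifier
    and as "novel" otherwise (an assumption made for this sketch).
    """
    in_panel = [v for v in variants if v["gene"] in panel_genes]
    status = Counter(
        "known" if v.get("dbsnp_id") else "novel" for v in in_panel
    )
    return in_panel, status


# Hypothetical records for illustration only (not the study's data).
panel = {"PRKG1", "MAVS", "TLR1"}
records = [
    {"gene": "TLR1", "dbsnp_id": "rs3923647"},
    {"gene": "PRKG1", "dbsnp_id": None},
    {"gene": "OFFPANEL1", "dbsnp_id": None},  # not in the panel, dropped
]

kept, counts = catalogue_panel_variants(records, panel)
print(len(kept), dict(counts))
```

In practice the panel would hold the covered candidate genes, and the variant records would come from an annotated variant call file rather than a hand-written list.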

Results
VerifyBamID FREEMIX values for eight samples were below the 0.02 threshold and did not indicate substantial levels of contamination. One sample (subject 9) showed evidence of contamination and was excluded from further analysis. The minimum mean read depth was 47.7x, and the minimum coverage at 10x depth was 86.6% across the eight samples (Supplementary Table 2). In a total of 8 subjects, 21 variants were identified following filtering for potential functionality, among which 13 had previously been identified in dbSNP and 8 were novel (Table 2). Nineteen of the 21 variants were nonsynonymous and 2 were nonsense (stop gain). Among the 13 nonsynonymous variants in dbSNP, 2 have been previously studied. The variant rs3923647 was reported to be associated with increased production of the Th1 cytokines, IFN-γ and IL-2, following BCG vaccination [37]. The variant rs16889462 lies in the SLC30A8 gene, which has been reported to have several roles in type 2 diabetes (T2D) pathophysiology, including a lower T2D risk when SLC30A8 gene activity is reduced [38]. With regard to the novel variants, the 2 nonsynonymous variants in the PRKG1 gene (54041969 and 54041970) were present in 4 cases (1, 3, 4, and 5), and a nonsynonymous variant in the MAVS gene (3842992) was identified in 3 cases (4, 5, and 9).
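The contamination check described above amounts to a simple threshold filter on per-sample contamination estimates, such as the FREEMIX value reported by VerifyBamID. The sketch below assumes the estimates have already been parsed into a dictionary; the function name and the example values are hypothetical and are not the study's data. Only the 0.02 cutoff comes from the text.

```python
FREEMIX_THRESHOLD = 0.02  # contamination cutoff stated in the text


def split_by_contamination(freemix_by_sample, threshold=FREEMIX_THRESHOLD):
    """Return (passed, excluded) sample names, each sorted alphabetically.

    Samples with an estimate strictly below the threshold pass;
    all others are excluded from further analysis.
    """
    passed = sorted(s for s, f in freemix_by_sample.items() if f < threshold)
    excluded = sorted(s for s, f in freemix_by_sample.items() if f >= threshold)
    return passed, excluded


# Hypothetical contamination estimates for illustration only.
example = {"subject_1": 0.004, "subject_2": 0.011, "subject_9": 0.083}

passed, excluded = split_by_contamination(example)
print("passed:", passed)
print("excluded:", excluded)
```

Keeping the cutoff as a named constant makes it explicit which samples a stricter or looser threshold would remove.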

Discussion
This report describes the results of a first exome sequencing study in Saudi children diagnosed with asthma. Exome sequencing of eight children identified 21 potentially deleterious SNPs in known asthma genes, comprising eight novel variants and 13 variants previously deposited in dbSNP.
In this study, the genetic association was not replicated in the selected samples because the study was not designed as a family-based study. Our study aimed to define, for the first time in KSA, the genetic variants most strongly associated with asthma development at pediatric age in the Makkah region, building on large asthma GWAS cohort studies, in order to recognize the possible genetic causes implicated in asthma and perhaps also to suggest related biological pathways that play a role in its pathogenesis. Using advanced filtration and annotation programs, the total number of variants was reduced to only 28 candidate genetic variants on 10 loci associated with childhood-onset disease.
On chromosome 1, three genetic variants were identified in three different samples (cases 3, 4, and 7), two of them novel. The previously reported variant, rs147827524, lies in the pyrin and HIN domain family member 1 gene (PYHIN1), a member of the HIN200 family of proteins, which are primarily nuclear proteins involved in transcriptional regulation of genes important for cell cycle control, differentiation, and apoptosis; this gene has also been linked to a surprisingly large proportion of asthma risk in people of African descent [39]. The two novel SNPs, rs152488110 and rs180651520, lie in the cysteine-rich C-terminal 1 gene (CRCT1) and the xenotropic and polytropic retrovirus receptor 1 gene (XPR1), respectively. The CRCT1 SNP is located near the end of the exonic region of the gene, among many reported nonsynonymous SNPs; hence, we suppose that it has no significant effect on protein function. In contrast, XPR1 has been reported to play a role in modulating human airway smooth muscle (ASM) contraction, cell growth, and proinflammatory cytokine production, which promote bronchoconstriction, airway inflammation, and remodeling in asthma [40]. On chromosome 2, only one variant was identified, the previously reported rs184451758 in the tensin 1 gene (TNS-1), a gene essential for myofibroblast differentiation and extracellular matrix formation. Polymorphism in this gene has been reported to be significantly associated with chronic obstructive pulmonary disease (COPD) risk [41]. On chromosome 4, three polymorphic variants were identified. One, the previously reported rs3923647, lies in the Toll-like receptor 1 gene (TLR1), polymorphisms of which seem to play a role in susceptibility to asthma, atopic eczema, and allergic rhinitis [42].
The other two polymorphisms identified, rs745975778 and rs61529635, both lie in the same gene, synaptopodin 2 (SYNPO2), which was reported to be associated with total serum IgE in asthmatics in an independent GWAS, suggesting a role for this gene in asthma [43]. On chromosome 6, two adjacent novel variants were identified, rs32946010 and rs32946011, both in the same transcription factor gene, bromodomain (BRD2), in exon 8 (of 12 exons in that gene); its protein has been shown to modulate transcription, in particular cell cycle-induced transcriptional activation. It has been reported recently that BRD2 protein inhibition attenuates neutrophil-dominant allergic airway disease in mouse models [44]. On chromosome 8, one nonsynonymous polymorphic variant was identified, the previously reported rs16889462, in the gene encoding one of the zinc efflux transporters, solute carrier family 30 member 8 (SLC30A8), which is classified as one of the major components providing zinc for insulin maturation and/or storage processes in insulin-secreting pancreatic β-cells. Five genome-wide association studies (GWAS) identified the SLC30A8 polymorphism rs13266634 among Asian and European but not African populations [45]. On chromosome 9, only one reported polymorphism was identified, rs35642290, in the gene for the cytoskeletal protein talin 1 (TLN-1), which is considered one of the genes that might be associated with total IgE in asthmatics [46]. On chromosome 10, surprisingly, three variants in the same gene were identified, including one previously reported SNP and two sequential novel ones, rs54041969 and rs54041970, identified in 50% of samples (cases 1, 3, 4, and 5) and located in exon 14 (of 18 exons reported for that gene). All three variants lie in the protein kinase cGMP-dependent 1 gene (PRKG1).
All PRKG protein isoforms act as key mediators of the nitric oxide/cGMP signaling pathway and are important components of many signal transduction processes in diverse cell types [47]. Asthma has been reported to be typically associated with high levels of exhaled nitric oxide (NO), which reduce the normal levels of S-nitrosothiols that act as bronchodilators in the airway [47].
On chromosome 12, one previously reported SNP, rs747186265, was identified in only one sample (case number 6), in the otogelin-like gene, which has been reported to be expressed in the inner ear of vertebrates, with the highest level of expression seen at the embryonic stage. No significant ear complications were recorded for that child in our study. We suppose that this variant was not significantly involved in the pathophysiology of asthma development. We also suppose that this SNP does not affect protein structure and function even though it is classified as nonsynonymous, and we recommend further studies of the effects of this particular SNP. On chromosome 19, four polymorphisms were identified in our study, including two previously reported SNPs, rs142299823 and rs74939505, in two genes of the zinc finger protein family, ZNF30 and ZNF154, respectively, which are involved in DNA-binding transcription factor activity. Additionally, two novel variants, one nonsynonymous (rs7267725) and one stop gain (rs7267726), were identified, both in the same two children (cases 4 and 5). Both variants lie in the insulin receptor (INSR) gene region, which was suggestively associated with asthma risk [48]. On chromosome 20, only one variant was identified, the previously reported nonsynonymous rs779234123, in three samples (cases 4, 5, and 8), in the mitochondrial antiviral-signaling gene (MAVS), whose protein is required for protein kinase activity that is essential for gene expression. Impaired antiviral interferon expression may be involved in the asthma exacerbations commonly caused by rhinovirus infections in asthmatic patients [49].
Numerous genetic and molecular studies of asthma have previously been carried out in both children and adults [50][51][52][53][54][55][56][57][58][59][60]. Among genetic studies, exome/NGS studies have been documented across global populations [61,62]. These observations, together with the genetic findings, support the role of genetic susceptibility factors in asthma patients. It is possible that both novel and documented variants, together with nongenetic factors, play a major role in Saudi asthma patients. However, the limited number of patients and the lack of screening for the novel variants are limitations of this study.
Genetic factors as well as environmental, racial, and ethnic factors are associated with asthma etiology, and dietary factors may also have an active and direct impact on asthma symptoms [63][64][65]. Asthma can be triggered by exposure to many environmental factors [66]. Allergic asthma, which is caused by allergens such as pet dander, pests, dust mites, and mold, is more common in children than nonallergic asthma. Nonallergic asthma can be caused by factors such as viruses, anxiety, stress, cold or dry air, and smoke. The incidence of allergic asthma is highest in childhood, while the incidence of nonallergic asthma peaks in late adulthood [67]. African Americans and Hispanics are reported to be more susceptible to asthma, with high morbidity in these groups [68,69]. Children living in single-parent households are twice as likely to be diagnosed with asthma as children living with married parents [70]. However, to our knowledge, how these factors contribute to asthma etiology in the Arab population is not fully understood, because only limited studies of this disease are available from Middle Eastern countries.

Study Strengths and Limitations.
The results of this study represent a unique genetic analysis of children with asthma in Saudi Arabia, identifying novel nonsynonymous polymorphisms that may correlate with severe asthma and weak response to treatment in children. This study will form a basis for future research in the field and may serve archival and perhaps diagnostic purposes.
This study is also limited in terms of sample size. Although we initially recruited 79 asthma cases, budget constraints allowed whole exome sequencing of only 9 cases. The study would benefit from a larger sample size and perhaps comparison with genetic profiles in different countries.

Conclusions
In conclusion, this is the initial exome sequencing study carried out in Saudi children diagnosed with asthma. Based on earlier studies and the results of this study, we assume that genetic variants might play a role in the increased susceptibility to the development of asthma. The contribution of other variants present in this study cannot be ruled out, considering the high number of loci and their specific genetic roles in the disease in the global population. Future studies should screen more patients for the novel variants within the Saudi population to clarify their role in asthma.
Abbreviations
DNA: Deoxyribonucleic acid
EDTA: Ethylenediaminetetraacetic acid
SNP: Single-nucleotide polymorphism
QC: Quality control
KSA: Kingdom of Saudi Arabia

Data Availability
Data are available in the Supplementary Materials provided.

Conflicts of Interest
All investigators have no affiliations with or involvement in any organization or entity with any financial interest or nonfinancial interest (such as personal or professional relationships, affiliations, knowledge, or beliefs) in the subject matter or materials discussed in this manuscript.