Identification of Immune Related LRR-Containing Genes in Maize (Zea mays L.) by Genome-Wide Sequence Analysis

A large number of plant immune receptors are nucleotide binding site-leucine rich repeat (NBS-LRR) proteins and leucine rich repeat-receptor-like kinases (LRR-RLKs), which play a crucial role in plant disease resistance. Although many NBS-LRR genes have previously been identified in Zea mays, there are no reports identifying NBS-LRR genes that encode an N-terminal Toll/interleukin-1 receptor (TIR) motif or identifying LRR-RLK genes genome-wide. In the present study, 151 NBS-LRR genes and 226 LRR-RLK genes were identified by bioinformatics analysis of the entire maize genome. Of these identified genes, 64 NBS-LRR genes and four TIR-NBS-LRR genes were identified for the first time. The NBS-LRR genes are unevenly distributed across the chromosomes, with gene clusters located at the distal end of each chromosome, while LRR-RLK genes have a random chromosomal distribution with more paired genes. Additionally, six LRR-RLK/RLPs, including FLS2, PSY1R, PSKR1, BIR1, SERK3, and Cf5, were characterized in Zea mays for the first time. Their predicted amino acid sequences have protein structures similar to those of their respective homologues in other plants, indicating that these maize LRR-RLK/RLPs are likely to have the same functions as their homologues and to act as immune receptors. The identified gene sequences will assist in the study of their functions in maize.


Introduction
Pathogens, including bacteria, fungi, oomycetes, and viruses, are a major threat to the global human food supply. They attack plants in an attempt to gain nutrients from them. During the course of evolution, both plants and pathogens have evolved methods to combat each other. Plants, like animals, are equipped with immune receptors for recognizing invading pathogens and activating innate immune responses [1,2]. It is widely accepted that plant immune responses consist of two branches of resistance. The first branch involves plasma membrane-localized pattern recognition receptors (PRRs) that recognize conserved microbial molecules referred to as pathogen-associated molecular patterns (PAMPs) [2,3]. The second requires intracellular receptors, proteins encoded by classical plant disease resistance (R) genes, to detect the presence of pathogen proteins inside the host cell [4]. The largest group of plant immune receptors, first identified 20 years ago, are cytoplasmic nucleotide binding site (NBS) leucine rich repeat (LRR) proteins encoded by R genes [5]. The NBS-LRR genes have been further subdivided into two main groups, based on their N-terminal structures [6][7][8][9]. The first group possesses a domain with homology to the intracellular signalling domains of the Drosophila Toll and mammalian interleukin-1 receptors and is referred to as TIR-NBS-LRRs or TNLs. The second, non-TNL, group is collectively known as CC-NBS-LRRs or CNLs, based on the presence of a predicted N-terminal coiled-coil domain in some, but not all, members of this group [10]. Studies on the structure of cloned R genes revealed that not all R genes encode NBS-LRR proteins. Several R genes encode transmembrane receptor-like kinases/proteins (RLK/RLP), which are one of the most important groups of cell surface receptors [4,11].
A typical RLK contains an extracellular receptor domain (LRR) to perceive a specific signal, a transmembrane domain (TM) to anchor the protein within the membrane, and an intracellular cytoplasmic kinase domain to transduce the signal through autophosphorylation followed by further phosphorylation of downstream components to regulate gene expression [12][13][14].
Since the discovery of the first plant RLK in 1990, a small number of RLKs have been functionally characterized, such as flagellin sensing 2 (FLS2) and the tyrosine-sulfated peptide receptors PSKR1 and PSY1R [15][16][17]. The extracellular LRR domains of these proteins, which act as cell surface immune receptors or as components of receptor complexes, contribute to plant defense and pathogen recognition. Further studies on the molecular structure and function of flagellin reveal that numerous signal perception and transduction systems are needed in plants to recognize all potential invaders [18]. Indeed, more than 400 genes encoding RLK sequences with various receptor configurations are present in the genomic sequence of Arabidopsis thaliana, of which 216 members contain an LRR in the extracellular domain [19]. In addition to RLKs, 149 R genes encoding NBS-LRR proteins are also present in the Arabidopsis thaliana genome [20]. Therefore, the LRR-containing receptors play a crucial role in intercellular communication and disease resistance in plant immunity.
Since the first plant R gene, Hm1, which confers specific resistance against a leaf blight and ear mold disease of corn, was cloned in 1992, more than 100 R genes have been cloned from different plant species, of which approximately 80% encode R proteins with NBS-LRR domains [21]. To date, with the development of bioinformatics technology and the availability of whole-genome sequences of several plants, such as Arabidopsis, rice, Brachypodium distachyon, maize, and sorghum, numerous NBS-LRR genes have been revealed [10,20,22-24]. However, reports on how many RLK/RLP genes are present in different plant species are rare. To further understand the importance of R genes, this study aims to identify the sequences of R genes encoding NBS-LRR proteins and RLKs in the sequenced maize genome. In doing so, the R genes may be made available for breeding purposes. The bioinformatics analysis of the R gene homologues provides a resource for ongoing functional and evolutionary studies of this large plant gene family. Meanwhile, maize (Zea mays L.) is a food staple in many regions of the world and is used for animal feed and ethanol fuel. It is the world's most extensively grown crop and has the highest worldwide production of all cereal crops (http://faostat.fao.org/). In addition to its economic value, maize is also an important model plant for studies in plant genetics, physiology, and development. Thus, our work will shed more light on the maize immune system, and the findings will provide groundwork for the isolation of candidate R genes in maize.

The Maize Genome. The complete genome sequence (RefGen v3) of Zea mays (maize inbred line B73) deposited in Ensembl (http://plants.ensembl.org/Zea_mays/Info/Index) was used in the genomic analysis of LRR-receptor-like genes.

Identification of NBS-LRR and LRR-RLK Encoding Genes.
Plant Gene Family Database (PGF-DB), a database for analysis of gene families from Oryza sativa, Sorghum bicolor, Zea mays, and Arabidopsis thaliana, contains more than ten thousand gene families constructed by the Markov clustering (MCL) method and the complete linkage method of BLASTP. By searching PGF-DB (http://rapdb.dna.affrc.go.jp/) for gene families with the keyword "NB-LRR," we identified one relevant gene family, "NB-ARC." This family includes a number of maize genes that are homologues of known rice NBS-LRR genes. The obtained maize genes were used as a set of candidate NBS-LRR genes to search the maize Ensembl database. First, the complete set of NBS-encoding genes was identified in the genome of Zea mays by a reiterative process using the NBS domain from the Pfam database (PF00931; http://pfam.sanger.ac.uk/search). The threshold expectation value (E-value) was set to 10^-4, which corresponds to that described in rice and Arabidopsis [20,22]. In the second step, a set of LRR-encoding genes was identified using the conserved LRR domains from the Pfam database (PF00560, PF08263, and PF12799), and the NBS-LRR-encoding genes were then identified as those present in both the NBS- and LRR-encoding sets. Sequences found multiple times were identified by multiple sequence alignment using Clustal W2 [25], and the redundant sequences were manually removed. All of the corresponding NBS-LRR candidate proteins were surveyed to determine whether they encoded a TIR or CC motif. This survey was based on the Pfam database (http://pfam.xfam.org/search) and used SMART (Simple Modular Architecture Research Tool: http://smart.embl-heidelberg.de/) protein motif analysis and COILS, a program for detecting coiled-coil (CC) domains (http://www.ch.embnet.org/software/COILS_form.html). Candidate NBS-LRR proteins lacking a TIR or CC motif were filtered out.
Using a method similar to that described above for retrieving NBS-encoding genes, a complete set of RLK-encoding genes was retrieved from the maize Ensembl database using the protein kinase catalytic domain (PF00069) from the Pfam database. The LRR-RLK genes were then identified among the RLK-encoding genes using the conserved LRR domains (PF00560, PF08263, and PF12799).
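The two-step filtering described above amounts to intersecting two sets of genes: those with a hit to an anchor domain (NBS or kinase) and those with a hit to any of the LRR domains. The following is a minimal sketch of that logic, not the pipeline actually used in this study. The input format (gene identifier, Pfam accession, E-value), the gene names, and the function names are hypothetical, and applying the same E-value cutoff to the LRR hits is an assumption.

```python
from collections import defaultdict

# Pfam accessions and E-value threshold taken from the text.
NBS_DOMAIN = "PF00931"
KINASE_DOMAIN = "PF00069"
LRR_DOMAINS = {"PF00560", "PF08263", "PF12799"}
E_VALUE_CUTOFF = 1e-4


def genes_by_domain(hits, cutoff=E_VALUE_CUTOFF):
    """Map each Pfam accession to the genes with a hit at or below the cutoff."""
    domain_to_genes = defaultdict(set)
    for gene_id, pfam_id, e_value in hits:
        if e_value <= cutoff:
            domain_to_genes[pfam_id].add(gene_id)
    return domain_to_genes


def two_step_filter(hits, anchor_domain, cutoff=E_VALUE_CUTOFF):
    """Return genes that carry the anchor domain AND at least one LRR domain."""
    domain_to_genes = genes_by_domain(hits, cutoff)
    anchor_genes = domain_to_genes.get(anchor_domain, set())
    lrr_genes = set()
    for pfam_id in LRR_DOMAINS:
        lrr_genes |= domain_to_genes.get(pfam_id, set())
    return anchor_genes & lrr_genes


# Hypothetical domain hits, for illustration only.
hits = [
    ("geneA", "PF00931", 1e-30),
    ("geneA", "PF00560", 1e-6),
    ("geneB", "PF00931", 1e-12),  # NBS domain only, no LRR
    ("geneC", "PF00069", 1e-40),
    ("geneC", "PF08263", 1e-8),
    ("geneD", "PF00069", 1e-20),
    ("geneD", "PF12799", 1e-2),   # LRR hit above the cutoff, ignored
]

nbs_lrr = two_step_filter(hits, NBS_DOMAIN)
lrr_rlk = two_step_filter(hits, KINASE_DOMAIN)
print(sorted(nbs_lrr), sorted(lrr_rlk))
```

In this toy input only geneA passes the NBS-LRR filter and only geneC passes the LRR-RLK filter. The subsequent removal of redundant sequences and the TIR/CC motif survey are not modelled here.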

Gene Locations and Definitions of Gene Cluster and Gene Duplication. The positions of all the NBS-LRR and LRR-RLK genes on maize chromosomes were determined with Ensembl Genomes Search (http://plants.ensembl.org/Zea_mays/Info/Index) using the gene names. If two or more NBS-LRR/LRR-RLK genes resided within 200 kb, they were defined as a gene cluster, based on Holub's definition of a gene cluster [26]. If a cluster contained two genes, they were designated as paired genes; if three or more genes were located within 200 kb, these genes were designated as multigenes. The gene duplication events of NBS-LRR and LRR-RLK genes were also investigated, using the encoded amino acid sequence as a query in BLASTP searches for possible homologues in the Zea mays genome, in accordance with the following criteria: (1) the sequence alignment covered >70% of the longer gene; (2) the aligned region had an identity of at least 70%.
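The cluster and duplication definitions above can be sketched in code as follows. This is an illustrative sketch under stated assumptions, not the procedure used in the study: genes are chained into a cluster whenever neighbouring start positions are at most 200 kb apart (the text does not say whether distance is measured between neighbours or across the whole cluster), the identity criterion is read as "at least 70%", and all gene names, coordinates, and function names are hypothetical.

```python
CLUSTER_WINDOW_BP = 200_000  # 200 kb, from the cluster definition in the text


def find_clusters(genes, window=CLUSTER_WINDOW_BP):
    """Group genes on the same chromosome whose neighbouring starts are <= window apart.

    genes: iterable of (gene_id, chromosome, start_bp).
    Returns a list of clusters (lists of gene ids), each with at least two genes.
    """
    by_chrom = {}
    for gene_id, chrom, start in genes:
        by_chrom.setdefault(chrom, []).append((start, gene_id))

    clusters = []
    for chrom in sorted(by_chrom):
        current = []
        last_start = None
        for start, gene_id in sorted(by_chrom[chrom]):
            # A gap larger than the window closes the current group.
            if last_start is not None and start - last_start > window:
                if len(current) >= 2:
                    clusters.append(current)
                current = []
            current.append(gene_id)
            last_start = start
        if len(current) >= 2:
            clusters.append(current)
    return clusters


def label_cluster(cluster):
    """Two genes form 'paired genes'; three or more form 'multigenes'."""
    return "paired genes" if len(cluster) == 2 else "multigenes"


def is_duplicate(aligned_length, longer_gene_length, identity,
                 min_coverage=0.70, min_identity=0.70):
    """Apply the two BLASTP criteria: coverage > 70% of the longer gene, identity >= 70%."""
    coverage = aligned_length / longer_gene_length
    return coverage > min_coverage and identity >= min_identity


# Hypothetical gene coordinates, for illustration only.
genes = [
    ("g1", "1", 100_000), ("g2", "1", 250_000), ("g3", "1", 900_000),
    ("g4", "2", 50_000), ("g5", "2", 200_000), ("g6", "2", 380_000),
]
clusters = find_clusters(genes)
labels = [label_cluster(c) for c in clusters]
print(clusters, labels)
```

Under the chaining assumption a cluster can span more than 200 kb in total (g4 to g6 above), so a stricter reading of the definition would need a fixed-window test instead.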

Alignment and Analysis of Sequence.
Proteins encoded by LRR-RLK genes are crucial cell surface immune receptors for plant disease resistance. The LRR-RLK homologues in maize were found by searching the available maize nonredundant protein sequence database (taxid:4577) with BLASTP (http://blast.ncbi.nlm.nih.gov/Blast.cgi). The scoring parameters for the BLASTP search were left at their defaults: BLOSUM62 was used as the protein weight matrix, gap costs were set to 11 for existence and 1 for extension, and compositional adjustment was set to conditional compositional score matrix adjustment. The maize sequence with the highest score (the lowest E value and the highest identity) was assigned to the LRR-RLK from other plants with which it shared homology and phylogenetic grouping. Homology analysis of the maize LRR-RLK protein sequences with their homologue sequences was performed using MatGAT software v2.02 [27]. The protein signature was also analyzed using the simple modular architecture research tool (SMART) [28]. Multiple sequence alignments were generated using Clustal W2 [25]. Based on the Clustal W2 multiple alignment, a phylogenetic tree was constructed using the neighbor-joining method within the MEGA5 program [29] and bootstrapped 1,000 times.
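As an illustration of what a pairwise identity value measures, the sketch below computes percent identity from two sequences that have already been aligned. It is not a reimplementation of MatGAT: the choice of denominator (alignment columns without a gap in either sequence) is an assumption, and other tools normalize differently. The function name and the example peptides are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over alignment columns without a gap.

    Both inputs must be rows of the same alignment (equal length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # columns with a gap are skipped (assumed convention)
        compared += 1
        if a == b:
            identical += 1

    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical aligned peptides: 6 comparable columns, 5 identical.
identity = percent_identity("MKT-LLA", "MRTALLA")
print(f"{identity:.1f}%")
```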

Identification and Classification of NBS-LRR and LRR-RLK Encoding Genes. Availability of the complete Zea mays genome sequence (maize inbred line B73) has made it possible for the first time to identify all the LRR-containing receptor-like genes in this plant species [24]. In the first step, the NBS filter, a total of 217 NBS-encoding genes were identified in the genomic sequence of maize inbred line B73 deposited in Ensembl. In the second step, the LRR filter, 62 of the NBS-encoding genes were found not to be LRR-encoding genes. Of the remaining genes, 151 were identified as having an NBS-LRR structure and were surveyed using Pfam, SMART, and COILS to determine whether they encoded TIR or CC motifs. A total of 147 genes were found to contain a CC motif, but only four genes belong to the TNL group (see Supplementary Table S1 in Supplementary Material available online at http://dx.doi.org/10.1155/2015/231358). The fact that most of these genes in the maize genome are CNLs further suggests that TIR genes are largely absent from monocots [10,22,23,30]. Of the 151 NBS-LRR genes, 64 were identified for the first time in this study [31] (marked with an asterisk in Supplementary Table S1). The majority of these newly found genes are located on chromosomes 2, 4, and 7 (11, 18, and 7 genes, respectively). In the initial search for LRR-RLK genes, 1521 protein-kinase-encoding genes were identified in the maize genome. Of these, 226 were found to encode LRR-RLK proteins in the subsequent reiterative process (Supplementary Table S2).
In the maize whole-genome sequence, at least 39,469 protein-coding genes have been identified. NBS-LRR genes accounted for approximately 0.38% of all the protein-coding genes in this species. The relative proportion of these genes was over three times lower than that in the rice genome (1.23%) but higher than that in the sorghum genome (0.18%) [22-24, 32, 33]. Additionally, the LRR-RLK encoding genes accounted for approximately 0.57% of maize protein-coding genes, about 1.5 times the proportion of NBS-LRR genes. Furthermore, compared with the number of LRR-RLK genes identified in maize, sorghum (Sorghum bicolor) had similar numbers, with 208 [34] and 211 [35] reported in different studies. However, the numbers reported for rice (Oryza sativa) varied widely between studies: 177 [34], 309 [36], and 353 [35].
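The maize proportions quoted above follow directly from the gene counts given in the text. A short check, using only those counts (the variable and function names are illustrative):

```python
# Counts reported in the text for maize.
TOTAL_PROTEIN_CODING = 39_469
NBS_LRR_GENES = 151
LRR_RLK_GENES = 226


def percent_of_genome(n_genes, total=TOTAL_PROTEIN_CODING):
    """Share of all protein-coding genes, as a percentage."""
    return 100.0 * n_genes / total


nbs_pct = percent_of_genome(NBS_LRR_GENES)
rlk_pct = percent_of_genome(LRR_RLK_GENES)
ratio = LRR_RLK_GENES / NBS_LRR_GENES

print(f"NBS-LRR: {nbs_pct:.2f}%")   # 0.38%
print(f"LRR-RLK: {rlk_pct:.2f}%")   # 0.57%
print(f"LRR-RLK / NBS-LRR: {ratio:.1f}")  # 1.5
```

The rice and sorghum percentages are quoted from the cited studies and are not recomputed here, because the total gene counts for those genomes are not given in this section.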

Chromosomal Distribution of Maize NBS-LRR and LRR-RLK Encoding Genes. NBS-LRR type R genes were identified on each of the ten maize chromosomes. The genes are located either separately on a chromosome or in gene clusters (see Section 2 for the detailed definition). Their distribution on chromosomes is nonrandom and uneven. For example, chromosome 9 contains only two NBS-LRR genes, while chromosomes 10, 4, and 2 contain 32, 28, and 18 NBS-LRR genes, respectively, with about 21.19% of all NBS-LRR genes located on chromosome 10 (Figure 1(a) and Table 1). Notably, two larger clusters were found on chromosome 10, one containing 14 genes and the other containing 5 genes (Figure 1(a) and Table 2). Similar to the distribution of sorghum NBS-LRR gene clusters [23], most of the maize NBS-LRR gene clusters were located at the distal end of each chromosome (Figure 1).
LRR-RLK genes are also located on each individual maize chromosome. However, their distribution on chromosomes is relatively even, unlike the clustered NBS-LRR genes (Figure 1(b) and Table 2). The number of LRR-RLK genes on each chromosome is between 13 (on Chr. 10) and 32 (on Chr. 4).

Duplications of NBS-LRR and LRR-RLK Genes.
During evolution, both segmental duplication and tandem duplication have contributed to the large number of gene families in plants [37]. Gene duplications have greatly expanded the NBS gene family in both monocot and dicot lineages [20,22,23]. In this study, the duplication of NBS-LRR and LRR-RLK genes was confirmed by a BLASTP comparison of all the predicted maize proteins against each other. A total of 56 out of the 151 NBS-LRR genes were duplicated, and they were subsequently divided into 16 multigene families (Tables 2 and 3). The largest family had 14 members, which cluster at the distal end of chromosome 10 (Figure 1), and the average number of family members was 3.5.
In contrast to NBS-LRR genes, 46 multigene families containing 110 of the 226 LRR-RLK genes were identified in the maize genome (Tables 2 and 4). The proportion of genes belonging to multigene families was higher for LRR-RLK genes (48.67%) than for NBS-LRR genes (37.09%), demonstrating that almost half of the LRR-RLK genes were duplicated. Furthermore, the maximum number of family members was lower for LRR-RLK genes than for NBS-LRR genes (5 and 14, resp.; Table 2). The average number of LRR-RLK members per multigene family was 2.4, also lower than that of NBS-LRR genes. This result suggests that LRR-RLK genes are more diverse than NBS-LRR genes within the maize genome. More interestingly, no pairs of NBS-LRR genes were found on duplicated chromosomal segments, while 36 pairs of LRR-RLK genes were found. This indicates that the expansion of LRR-RLK genes may have

Molecular Characterization and Phylogenetic Analysis of Maize LRR-RLKs.
To identify the homologues of the LRR-RLKs in maize, we used known plant LRR-RLK protein sequences to search the maize protein database. Out of the 226 identified LRR-RLK genes, six homologues were found, including FLS2 (GRMZM2G080041), Cf5 (GRMZM2G107872), PSY1R (GRMZM2G177570), PSKR1 (GRMZM2G080537), BIR1 (GRMZM2G121565), and OsSERK1 (GRMZM2G150024). All the predicted amino acid sequences of these genes, except for Cf5, contain a signal peptide, extracellular LRR domains, a single-pass transmembrane domain (TM), and an intracellular kinase domain (Figure 2). Maize FLS2, OsSERK1, PSY1R, and PSKR1 are typical LRR-serine/threonine protein kinases, BIR1 is a dual-specificity serine/threonine/tyrosine kinase, and Cf5 is a transmembrane LRR-receptor-like protein. At the amino acid level, the identified maize LRR-RLK/RLPs shared the highest identity (or similarity) with members of their own family (Table 5). Moreover, maize LRR-RLK/RLP proteins share a higher identity (or similarity) with their monocot homologues than with their dicot homologues. To elucidate the relationships between the maize LRR-RLK/RLP genes and their homologues in other plants, the encoded amino acid sequences of LRR-RLK/RLP genes (see Supplementary Material 2) were used to construct a neighbor-joining phylogenetic tree from the multiple sequence alignments (Figure 3). The phylogenetic tree showed that all six maize LRR-RLK/RLP protein sequences grouped with their respective homologues, confirming their phylogenetic relationships. The tree also indicated two distinct clades: one includes PSY1R, PSKR1, BIR1, and OsSERK1, which are typical LRR-RLKs, and the other includes FLS2 and Cf5, both known as PRRs in dicots, as they contain more similar LRR domains. This is supported by the fact that these maize LRR-RLK/RLPs are grouped with their homologues from other plant species. It can be concluded that the maize LRR-RLK/RLPs are more closely related to those from monocots than to those from dicots.

Discussions
Disease-resistance genes encoding NBS domains have been extensively investigated by genomic analysis in many plant species. However, no TIR-NBS-LRR genes had been reported in maize, and the LRR-RLK genes had rarely been characterized, although the first plant RLK was identified in maize twenty years ago [38].
In the present study, we discovered for the first time four TIR-NBS-LRR genes among 151 NBS-LRR genes, together with 226 LRR-RLK genes, in the maize whole-genome sequence database and further characterized six proteins encoded by LRR-RLK/RLP genes. The putative LRR domain in these proteins is likely to act as a ligand-binding domain that recognizes pathogens as part of the maize immune system. The maize genome sequence data employed in the present study are more accurate than those used in previous studies because gaps were also sequenced, which enabled us to estimate the actual number of NBS-LRR encoding genes. Thus, an additional 64 NBS-LRR genes were identified in maize compared with the 109 [31] or 95 [39] NBS-LRR genes found in previous investigations. Interestingly, the newly found genes on chromosome 7 are in clusters or pairs, indicating that local tandem duplications are possibly responsible for gene expansion. In this study, 226 LRR-RLK genes were discovered in maize, which is in line with the numbers in Sorghum bicolor (211) and Arabidopsis thaliana (213) but far fewer than in Oryza sativa (353) [35].
Previous studies have shown that rice diverged from the progenitors of maize and sorghum about 60 million years ago, whereas maize and sorghum evolved from a common ancestor about 25 million years ago [40]. The NBS-LRR genes are much less frequent in maize than in rice or sorghum, in which 480 and 211 NBS-LRR genes were identified, respectively [22,23]. This suggests that NBS-LRR genes were lost rapidly in maize and sorghum after the split from rice, especially in maize. The distribution of NBS-LRR genes across chromosomes agrees with previous findings, although the total number found was different [31]. For example, chromosome 10 contains the greatest number of NBS-LRR genes, whereas chromosome 9 has the fewest. Additionally, NBS-LRR genes are most abundant on maize chromosome 10, and their homologues are located on chromosome 11 in rice and chromosome 8 in sorghum [33,41]. Clustering of NBS-LRR genes in genomic regions in various species suggests that there are chromosomal hot spots in which NBS-LRR genes are duplicated. Additionally, 14 NBS-LRR genes are located in a single cluster on maize chromosome 10, suggesting that these genes originated by tandem duplications and subsequently evolved under selective pressure in this region. Maize NBS-LRR genes mostly encode the CC type of N-terminal domain. Only four maize TIR-encoding genes (GRMZM2G319375, GRMZM2G402165, GRMZM2G132403, and GRMZM2G302279) were identified. Similar findings were also reported in other monocots, such as rice (only one TIR-encoding gene identified), sorghum (two TIR-encoding genes), and Brachypodium distachyon (none identified) [10,22,33]. In contrast, the TIR-encoding genes have extensively expanded in dicots, with 98 found in Arabidopsis and 78 in poplar [20,30]. The CC-type NBS-LRR genes from dicots and monocots tend to cluster together on chromosomes, suggesting that the CC-type NBS-LRR genes originated before the divergence of the monocots and dicots [20,22,23].
Additionally, TIR-type genes are likely to have been lost from the grass species rather than having arisen during plant evolution after the monocot/dicot separation, as there are fewer TIR-type genes in monocots than in dicots, although the reason remains unclear [42,43].
In maize, 27.8% of the NBS-LRR genes are located in gene clusters, a much lower proportion than the 97% in sorghum [23]. The highest proportions of clustered NBS-LRR genes have been reported in Arabidopsis and rice [20,22]. Moreover, there is potentially less duplication and fewer multigene families in maize than in sorghum and rice. This peculiar distribution in maize might be caused by more dispersed recombination than duplication, as the maize genome is approximately three times larger than the sorghum genome [33]. In contrast, the LRR-RLK genes are not distributed in clusters but are scattered throughout the maize genome. Nearly 50% of LRR-RLK genes have been duplicated, indicating that a whole-genome duplication event could have resulted in the expansion of LRR-RLK genes in maize. The differences in genome location and copy number between NBS-LRR genes and LRR-RLK genes suggest that they may have evolved from two different ancestors and have different roles as receptors in plant immune systems. Additionally, the genome-wide distribution of maize NBS-LRR and LRR-RLK genes indicates that a large number of different loci are related to the immune system and that the maize resistance system is very complex.
The number of LRR-RLK genes found in Arabidopsis was similar to that in maize. In Arabidopsis, more than 400 genes were identified with RLK configurations that can be classified into at least 21 structural classes based on their extracellular domains, among which the LRR-RLKs represented the largest group, consisting of 216 members [19]. The presence of similar numbers of LRR-RLK genes in both dicots and monocots suggests that the size of this gene family may have been similar to the present-day level before the diversification of the land plant lineages.
Plant RLKs have been functionally characterised in some studies and are implicated in a diverse range of signalling processes, including brassinosteroid signalling via brassinosteroid insensitive 1 (BRI1) [44], recognition of flagellin by FLS2 [15], and bacterial resistance mediated by Xa21 [11]. So far, none of the 226 LRR-RLKs in maize have been functionally studied. The LRR-RLK genes identified in this paper will be invaluable for gene function analysis. In addition, six maize LRR-RLK/RLPs were found to contain an extracellular ligand-binding domain (LRR domain), a single membrane-spanning domain, and a C-terminal intracellular protein kinase domain (tyrosine or serine/threonine rich region), and thus it is likely that they have the same functions as their homologues in other plant species. For example, maize FLS2 may function as an immune receptor sensing bacterial flagellin, as it does in Arabidopsis, using its extracellular LRR domain to recognize the peptide flg22 [45]. Maize BIR1, as a BAK1- (BRI1-associated receptor kinase 1-) interacting receptor-like kinase, may work together with BAK1 to negatively regulate cell death and defense responses [12,46,47]. In addition to OsSERK1 (GRMZM2G150024 on Chr. 4), two more SERK (Somatic Embryogenesis Receptor-Like Kinase) family members, named ZmSERK1 (GRMZM5G870959 on Chr. 10) and ZmSERK2 (GRMZM2G115420 on Chr. 5), had previously been characterised in maize [48]. Besides their role in somatic embryogenesis, SERK family members have been associated with R-gene-mediated resistance, such as Mi-1 against potato aphids in tomato plants [49] and Xa21 against Xanthomonas oryzae pv. oryzae (Xoo) in rice [50]. Although three members of the SERK family have been found in maize, whether they have the same functions in disease resistance as seen in tomato and rice remains to be determined.
In conclusion, this study has identified a number of the maize LRR-receptor-like genes and characterised a number of LRR-RLK/RLP genes based on their structural domains, physical chromosomal locations, and phylogenetic relationships. These sequences will aid in the study of their functions in maize.
International Journal of Genomics