Identification and Analysis of the SET-Domain Family in Silkworm, Bombyx mori

As an important economic insect, Bombyx mori is also a useful model organism for lepidopteran insects. SET-domain-containing proteins belong to a group of enzymes named after a common domain that utilizes the cofactor S-adenosyl-L-methionine (SAM) to methylate its substrates. Many SET-domain-containing proteins have been shown to display catalytic activity towards particular lysine residues on histones, but emerging evidence also indicates that various nonhistone proteins are specifically targeted by this clade of enzymes. To explore the diverse functions of the SET-domain superfamily in insects, we identified, cloned, and analyzed the SET-domain proteins in the silkworm, Bombyx mori. Firstly, 24 genes containing a SET domain were identified in the silkworm genome, and 17 of them belonged to six subfamilies: SUV39, SET1, SET2, SUV4-20, EZ, and SMYD. Secondly, the SET domains of the silkworm SET-domain family were conserved both within and between species, especially the catalytic core “NHSC” motif, the substrate binding site, and the catalytic site. Lastly, further analyses indicated that the silkworm SET-domain gene BmSu(var)3-9 differed in its characteristics and expression profile from its counterparts in other invertebrates. Overall, our results provide new insight into the functional and evolutionary features of the SET-domain family.


Introduction
The SET-domain superfamily includes all but one of the methyltransferases that methylate specific lysine (K) residues of histones involved in epigenetic regulation. This family is characterized by the highly conserved SET (Su(var)3-9, E(z), Trithorax) domain, which consists of approximately 130 amino acids and is responsible for the catalytic activity of these methyltransferases. By transferring a methyl group from S-adenosyl-L-methionine (AdoMet) to the amino group of a lysine residue at different sites on histone proteins, SET-domain methyltransferases establish distinct histone methylation marks. Histone methylation is important for chromatin modification and the regulation of gene expression [1][2][3][4], and it plays a crucial role in animal development and a number of other biological processes, such as heterochromatin establishment, transcription regulation, parental imprinting, and cell fate determination. Research also suggests that SET-domain proteins are closely related to many human diseases [5][6][7][8][9][10].
Members of this family have been studied extensively in many research models, such as human, mouse, Drosophila, Arabidopsis, and yeast, for their function in methylating histones directly and thereby changing the chromatin state to regulate the binding of cofactors. In addition, a growing number of recent studies provide evidence that SET-domain-containing proteins also regulate many nonhistone substrates, including some transcription factors and tumor suppressors [12][13][14][15][16][17]. Epigenetic regulation by the SET-domain family remains to be studied in lepidopteran insects. Based on the silkworm genome database, we identified, cloned, and analyzed the SET-domain proteins in the silkworm, Bombyx mori. Here, we provide an overview of the common and unique features of silkworm SET-domain family members, which will provide more information and an important reference for SET-domain proteins as a whole.

Materials and Methods
2.1. Silkworm. The silkworm strain Dazao (p50) used in this study is maintained by the State Key Laboratory of Silkworm Genome Biology. The silkworm larvae were reared on fresh mulberry leaves at 25 °C with a 12 h/12 h photoperiod.

Screening of SET-Domain Family Genes.
Protein sequences of SET-domain family members from other species were used to query the silkworm database with an E-value less than 0.1. The hits from this screening were further confirmed by BLAST searches against the NCBI protein database. In addition, the conserved SET-domain sequences of other species were used as query sequences in BLAST searches of the silkworm database. Moreover, we used the online protein domain prediction programs SMART (http://smart.embl-heidelberg.de/) and Pfam (http://pfam.sanger.ac.uk/) to validate the SET domain in the screening hits from the silkworm database.
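The E-value screening step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say which BLAST program or output format was used, so the 12-column tabular layout (E-value in the eleventh column), the function name filter_hits, and the example hit identifiers are all hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 0.1  # threshold stated in the text


def filter_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return the subject IDs that have at least one hit below the cutoff.

    Assumes 12-column tab-separated BLAST output:
    column 2 = subject ID, column 11 = E-value.
    """
    kept = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        subject_id, e_value = row[1], float(row[10])
        if e_value < cutoff:
            kept.add(subject_id)
    return kept


# Made-up rows for illustration only; these are not results from the study.
example = (
    "query_SET\thypothetical_hit_1\t45.0\t120\t60\t2\t1\t120\t10\t130\t1e-30\t110\n"
    "query_SET\thypothetical_hit_2\t25.0\t40\t30\t0\t5\t44\t7\t46\t2.5\t20.1\n"
)
candidates = filter_hits(example)
print(sorted(candidates))
```

Hits retained by such a filter would still need the domain-level validation with SMART and Pfam described above, since a permissive cutoff of 0.1 admits many weak matches.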
2.4. RNA Extraction. Different tissues of silkworm larvae were collected and stored in liquid nitrogen until use. Trizol reagent (Invitrogen, USA) was used to extract total RNA from silkworm tissues. RNA concentration was measured with a spectrophotometer. RNA samples were digested with RNase-free DNase I (TaKaRa, Japan) to remove genomic DNA contamination. 2 μg of each RNA sample was used to synthesize first-strand cDNA with M-MLV Reverse Transcriptase following the manufacturer's instructions (Promega, USA).

Verification of Identified Genes and Expression Analysis.
Primers were designed for most of the identified silkworm SET-domain family members to clone them from silkworm cDNA. The silkworm cytoplasmic Actin3 gene (forward primer: 5′-AACACCCCGTCCTGCTCACTG-3′; reverse primer: 5′-GGGCGAGACGTGTGATTTCCT-3′) was used as an internal control. PCR amplification was performed using cDNA from different silkworm tissues to examine expression profiles. PCR was carried out in a 20 μL reaction volume under the following conditions: initial denaturation at 94 °C for 3 min; 30 cycles of 94 °C for 30 s, annealing (usually at 55 °C) for 45 s, and extension at 72 °C for 1 min; and a final extension at 72 °C for 10 min. The PCR products were analyzed on 1% agarose gels.
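As a quick arithmetic check of the cycling program above, the programmed hold time can be added up as follows. This is a minimal sketch: the variable names are illustrative, and ramp times between temperatures, which depend on the instrument, are ignored.

```python
# Cycling program as stated in the text; ramp times between steps are ignored.
INITIAL_DENATURATION_S = 3 * 60   # 94 °C for 3 min
CYCLES = 30
CYCLE_STEPS_S = {
    "denaturation_94C": 30,
    "annealing_55C": 45,          # annealing temperature "usually" 55 °C
    "extension_72C": 60,
}
FINAL_EXTENSION_S = 10 * 60       # 72 °C for 10 min


def total_hold_time_s():
    """Sum the programmed hold times of the whole PCR run, in seconds."""
    per_cycle = sum(CYCLE_STEPS_S.values())
    return INITIAL_DENATURATION_S + CYCLES * per_cycle + FINAL_EXTENSION_S


total_s = total_hold_time_s()
print(f"{total_s} s = {total_s / 60:.1f} min")  # prints "4830 s = 80.5 min"
```

The 30 cycles account for 4050 s of the 4830 s total, so the run length is dominated by the cycle count rather than by the initial and final steps.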
2.6. Phylogenetic Analysis. SET-domain amino acid sequences of silkworm SET-domain family candidates were aligned with each other and with representative SET domains of other species using the program ClustalX. The sequences of silkworm SET-domain-containing proteins were used to construct phylogenetic trees by the neighbor-joining algorithm (1000 bootstrap replicates) with the program MEGA4.0.
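Neighbor-joining builds a tree from a matrix of pairwise distances between aligned sequences. The sketch below shows one simple way to compute such a matrix, the proportion of differing sites (p-distance). It is an illustration only: the text does not state which distance model was selected in MEGA4.0, and the three short sequences are toy data, not SET domains from this study.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # pairwise deletion of gapped columns
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


# Toy aligned fragments (hypothetical, for illustration only).
toy_alignment = {
    "toy_A": "NHSCDPN",
    "toy_B": "NHSCEPN",
    "toy_C": "NHTC-PN",
}

distances = {
    (x, y): p_distance(toy_alignment[x], toy_alignment[y])
    for x, y in combinations(sorted(toy_alignment), 2)
}
for pair, d in distances.items():
    print(pair, round(d, 3))
```

A bootstrap analysis such as the 1000 replicates mentioned above would resample alignment columns with replacement and repeat the distance and tree calculation for each replicate.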

Identification of Silkworm SET-Domain Family.
We identified the silkworm SET-domain family genes from the silkworm database SilkDB for the first time. Screening of the silkworm genome yielded 24 genes containing a SET domain. Following the classification used for SET-domain family members in other species, which is based on the SET-domain sequence and the features of its flanking motifs or domains, we assigned 17 of them to six subfamilies: SUV39, SET1, SET2, SUV4-20, EZ, and SMYD (see Table 1). Silkworm SET-domain genes are mainly located on chromosomes 1, 3, 4, 15, 16, and 23, and 12 of them have ESTs in the database. We did not find any homologs of the mammalian RIZ subfamily in the silkworm database. This may reflect a species difference, as in yeast, where SET1 and SET2 are present while homologs of Su(var)3-9 and EZH1/2 are missing.

Cloning and Bioinformatics Analysis.
Based on the gene sequences from the silkworm genome database, we designed primers to clone these genes and verify that they are true hits. We cloned most members of the silkworm SET-domain family (see Table 1). Through bioinformatics analysis, we found that the SET domains of the silkworm SET-domain family are conserved both within and between species, especially the catalytic core "NHSC" motif, the substrate binding site, and the catalytic site (Figure 1).

Phylogenetic Analysis of Silkworm SET-Domain Subfamilies.
We selected representative subfamilies for phylogenetic analysis. We found that the domain architecture of the SUV4-20 subfamily, and of SETDB1 and Su(var)3-9 of the SUV39 subfamily, is highly conserved relative to other species. Each of the three members of the silkworm SET2 subfamily clusters with its homologs from other species.

SUV4-20 Subfamily.
The SUV4-20 subfamily methylates H4K20 in other species. Human, mouse, and Xenopus laevis have two SUV4-20 members, while Drosophila and several other insects have only one. In silkworm, we identified only one SUV4-20 member, as in other insects. Phylogenetic analysis of the SUV4-20 subfamily suggests that [...] Figure 3).
We identified three members of the SUV39 subfamily in silkworm: BmG9a-like, BmSetdb1, and BmSu(var)3-9 (in fact, we found two variants of Su(var)3-9 in silkworm; see details in Study of Silkworm SET-Domain Gene BmSu(var)3-9). We did not identify highly similar homologs of G9a and SETDB2 in silkworm. We named one protein BmG9a-like because its domain structure is very similar to that of G9a in other species, including ANK (ankyrin repeats) and Pre-SET domains, while the Post-SET domain is replaced by an ALARD domain. Since BmG9a-like shows very low sequence similarity to G9a of other species, it stands alone on the phylogenetic tree. On this basis, we speculate that BmG9a-like is a unique member existing in silkworm, although more evidence is needed. SETDB1 has a distinctive SET domain that is bifurcated by an insert of 100-300 amino acids. We also aligned the SET-domain sequence of BmSetdb1 with those of other species. The SET sequence of SETDB1 is highly conserved between silkworm and other species, while the insert sequences differ from one species to another (Figure 4).

SET2 Subfamily.
The SET2 subfamily of mammals mainly includes five members: NSD1, NSD2, NSD3, SET2 (HIF1/HYPB), and ASH1. NSD1, NSD2, and NSD3 share conserved domains such as SET, AWS (Associated With SET), Post-SET, PWWP (a domain containing a highly conserved Pro-Trp-Trp-Pro motif), PHD, and Ring finger. In silkworm, we identified only one homolog of the NSD proteins, BmNSD1, which has five of the above-mentioned conserved domains and lacks only the Ring finger. So far, however, no NSD homolog has been identified in Drosophila. HIF1 contains four conserved domains: SET, Post-SET, AWS, and WW. BmHIF1 identified from the silkworm database has all of these domains except the WW domain. This may be because the BmHIF1 sequence in the silkworm database is incomplete, which is consistent with BmHIF1 being shorter than its counterparts in Drosophila and other insects. In addition to the SET domain, ASH1 has four conserved domains: AT hook, BROMO, BAH, and PHD. BmASH1 has all of these domains except the AT hook. Phylogenetic analysis of the SET2 subfamily shows that each member of the silkworm SET2 subfamily clusters with its homologs in other species (Figure 5).
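The domain comparison described for the SET2 subfamily amounts to a set difference between a reference architecture and the silkworm protein. The sketch below encodes only the domain lists given in the text; the dictionary names and the missing_domains helper are illustrative, and the domain order within each protein is not represented.

```python
# Reference domain sets as listed in the text for the SET2 subfamily.
reference_domains = {
    "NSD": {"SET", "AWS", "Post-SET", "PWWP", "PHD", "Ring finger"},
    "HIF1": {"SET", "Post-SET", "AWS", "WW"},
    "ASH1": {"SET", "AT hook", "BROMO", "BAH", "PHD"},
}

# Domains reported for the silkworm homologs in the text.
silkworm_domains = {
    "NSD": {"SET", "AWS", "Post-SET", "PWWP", "PHD"},   # BmNSD1
    "HIF1": {"SET", "Post-SET", "AWS"},                 # BmHIF1
    "ASH1": {"SET", "BROMO", "BAH", "PHD"},             # BmASH1
}


def missing_domains(reference, observed):
    """For each protein, return the reference domains absent from the observed set."""
    return {name: reference[name] - observed[name] for name in reference}


missing = missing_domains(reference_domains, silkworm_domains)
for name, domains in missing.items():
    print(name, sorted(domains))
```

Each silkworm homolog lacks exactly one reference domain in this comparison; for BmHIF1 the text attributes the missing WW domain to a possibly incomplete sequence rather than to a true loss.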

Expression Profile of Silkworm SET-Domain Genes.
The expression profiles of silkworm SET-domain family members show that, with the exception of BGIBMGA002076, they are highly expressed in the gonads (testis and ovary); this may be related to the gonads' function of propagating genetic information (Figure 6).

Study of Silkworm SET-Domain Gene BmSu(var)3-9.
Through amplification of the Su(var)3-9 homolog in silkworm, we found that it has two transcript isoforms. Full-length cDNA sequences obtained by RACE showed that the situation differs from that in other species: the two transcript isoforms differ by an 846 bp sequence at the 5′ end, which belongs to the 5′ UTR of the longer isoform, and they encode proteins of 317 and 593 amino acids, respectively. Expression analysis across embryonic stages and across organs of day-3 fifth-instar larvae shows that Su(var)3-9 is highly expressed during days 1-9 of embryonic development, whereas its expression in larval organs is relatively low (Figures 7 and 8).

Conclusion
The SET-domain superfamily is a group of histone lysine methyltransferases that establish different methylation marks (mono-, di-, and trimethylation, also known as me1, me2, and me3) on histones by transferring a methyl group from S-adenosyl-L-methionine (AdoMet) to lysine residues of histone proteins, producing S-adenosyl-L-homocysteine (AdoHcy). The core enzymatic sites of these methyltransferases are located in the SET domain. The first SET-domain protein, Drosophila DmSu(var)3-9, was identified at the end of the last century, and the methyltransferase activity of the Su(var)3-9 protein was first confirmed in mammals at the beginning of the 21st century. SET-domain proteins have been identified widely, from the lower eukaryote yeast to humans. Humans have more than 100 SET-domain-containing proteins. The large size of the SET-domain family is linked to its broad function in epigenetic regulation. SET-domain-containing proteins play important roles in the development of human, Drosophila, and Arabidopsis. There are very few reports on histone methylation in silkworm. Currently, the only SET-domain protein reported is BmE(z), a main member of the PcG (Polycomb group) proteins. The methylation of H3K27 by BmE(z) has been confirmed in silkworm: knocking down BmE(z) reduces the H3K27me3 level [18]. In addition, a demethylase, BmLid, has been reported in silkworm. BmLid has been shown to demethylate H3K4me2, H3K4me3, H3K9me2, H3K9me3, and H3K27me3, and it is thought to have a wider substrate range than its homologs in mammals and Drosophila [19].
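The stepwise formation of the me1, me2, and me3 marks described at the start of this section can be summarized as a single reaction scheme. The notation below is a simplified shorthand introduced here for illustration (it assumes the amsmath package for \text and omits the proton released in each step); it is not taken from the original figures.

```latex
\[
\text{Lys}(\mathrm{me}_{n}) + \text{AdoMet}
\;\longrightarrow\;
\text{Lys}(\mathrm{me}_{n+1}) + \text{AdoHcy},
\qquad n = 0, 1, 2 .
\]
```

Each step consumes one AdoMet and releases one AdoHcy, so forming a trimethylated lysine from an unmodified one requires three successive methyl transfers.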
In our study, 24 SET-domain family genes were identified from the silkworm database, and 17 of them were assigned to six subfamilies: SET1, SET2, SUV39, SUV4-20, EZ, and SMYD. Apart from these 17 genes (three SUV39 members, four SET1 members, three SET2 members, one SUV4-20 member, one EZ member, and five SMYD members), the remaining five were independent members, two of which were homologous genes. Although the RIZ subfamily was not identified in the silkworm, the identified members cover the members common to other species. Comparative analysis of the SET-domain family members showed that they are highly conserved across species, including the substrate binding site and the catalytic site. Evolutionary analysis of a representative subfamily showed that the structure of the SUV4-20 subfamily is highly consistent across a large number of species, including the silkworm.
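The per-subfamily counts given above can be checked against the stated total of 17 classified genes with a short tally. The counts are taken directly from the text; the variable names are illustrative only.

```python
# Members per subfamily, as stated in the text.
subfamily_counts = {
    "SUV39": 3,
    "SET1": 4,
    "SET2": 3,
    "SUV4-20": 1,
    "EZ": 1,
    "SMYD": 5,
}

classified_total = sum(subfamily_counts.values())
print(len(subfamily_counts), "subfamilies,", classified_total, "classified genes")
```

The six subfamily counts add up to the 17 classified genes reported in the abstract and in Table 1.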
The domain composition of the SUV39 subfamily members SETDB1 and Su(var)3-9 identified in the silkworm is highly conserved among various species. In particular, the distinctive feature of SETDB1, an inserted sequence that splits the SET domain, is present in the silkworm as in other species. In addition, the G9a-like gene BGIBMGA007949 in the silkworm possesses the conserved domain composition and structure of G9a as well as group specificity. BGIBMGA001497, BGIBMGA003106, and BGIBMGA002246 are the silkworm homologs of three respective SET2 subfamily members. Interestingly, BmSu(var)3-9 was highly expressed in embryos from day 1 to day 9, whereas its expression in the organs of day-3 fifth-instar larvae was relatively low, suggesting a possible role in embryonic differentiation. In summary, based on the silkworm genome database, we identified, cloned, and analyzed the SET-domain proteins in the silkworm, Bombyx mori. We intend to study the common and unique features of silkworm SET-domain family members further, which will provide more information and an important reference for SET-domain proteins as a whole.

Figure 7: Expression patterns of BmSu(var)3-9 during embryogenesis (day 1 to day 9) in silkworm. RT-PCR was performed to detect the expression of BmSu(var)3-9 using specific primers, and the Actin3 gene was used as the internal control at each time point.