Comparative Structures and Evolution of Vertebrate Carboxyl Ester Lipase (CEL) Genes and Proteins with a Major Role in Reverse Cholesterol Transport

Bile-salt activated carboxylic ester lipase (CEL) is a major triglyceride, cholesterol ester and vitamin ester hydrolytic enzyme contained within pancreatic and lactating mammary gland secretions. Bioinformatic methods were used to predict the amino acid sequences, secondary and tertiary structures and gene locations for CEL genes, and encoded proteins using data from several vertebrate genome projects. A proline-rich and O-glycosylated 11-amino acid C-terminal repeat sequence (VNTR) previously reported for human and other higher primate CEL proteins was also observed for other eutherian mammalian CEL sequences examined. In contrast, opossum CEL contained a single C-terminal copy of this sequence whereas CEL proteins from platypus, chicken, lizard, frog and several fish species lacked the VNTR sequence. Vertebrate CEL genes contained 11 coding exons. Evidence is presented for tandem duplicated CEL genes for the zebrafish genome. Vertebrate CEL protein subunits shared 53–97% sequence identities; demonstrated sequence alignments and identities for key CEL amino acid residues; and conservation of predicted secondary and tertiary structures with those previously reported for human CEL. Phylogenetic analyses demonstrated the relationships and potential evolutionary origins of the vertebrate CEL family of genes which were related to a nematode carboxylesterase (CES) gene and five mammalian CES gene families.


Introduction
Bile-salt activated carboxylic ester lipase (CEL; also designated as cholesterol esterase and lysophospholipase) is a major triglyceride, cholesterol ester and vitamin ester hydrolytic enzyme contained within pancreatic and lactating mammary gland secretions [5][6][7][8][9][10]. CEL is also secreted by the liver and is localized in plasma where it contributes to chylomicron assembly and secretion, the selective uptake of cholesteryl esters in HDL by the liver, LDL lipid metabolism, and reverse cholesterol transport [11][12][13][14]. Plasma CEL may also contribute to endothelial cell proliferation, the induction of vascular smooth muscle proliferation, and thrombus formation through interaction with platelet CXCR4 [15]. More recently, CEL expression has been reported in human pituitary glands where it may function in regulating hormone secretion in association with the CEL hydrolytic activity of ceramides [16].
Structures for several human and animal CEL genes and cDNA sequences have been determined, including human (Homo sapiens) [7,17,18,19], gorilla (Gorilla gorilla) [20], mouse (Mus musculus) [21][22][23], rat (Rattus norvegicus) [24][25][26], and cow (Bos taurus) CEL genes [1,27]. The human CEL gene comprises 11 exons and is localized on chromosome 9 [28]. Several Alu repetitive sequence elements and putative transcription factor binding sites have been identified in the 5′-untranslated region (UTR), including pancreatic-specific binding sites, which contribute to a high level of expression in the exocrine pancreas [17,29,30]. Exon 11 of the human CEL gene encodes a variable number of tandem repeats (VNTR) region (17 repeats are most common), which is highly polymorphic in human populations and contributes to plasma cholesterol and lipid composition [13]. Moreover, rare CEL gene defects in this region are responsible for a monogenically derived diabetes condition called maturity-onset diabetes of the young type 8 (MODY8), also known as diabetes and pancreatic exocrine dysfunction (DPED), which causes a defect in insulin secretion [31,32].
Human CEL is expressed predominantly in the lactating mammary gland and beta cells of the exocrine pancreas, where the enzyme contributes significantly to triglyceride, cholesterol ester and vitamin ester metabolism [5][6][7][8][9][10]. CEL also promotes large chylomicron production in the intestine, and its presence in plasma supports interactions with cholesterol and oxidized lipoproteins [11] which may influence atherosclerosis progression [12]. CEL expression has also been reported in the human pituitary gland, and a possible role for CEL in the regulation of hormone secretion and ceramide metabolism has been described [16]. Studies of Cel−/Cel− knockout mice have shown that other enzymes besides CEL are predominantly responsible for the hydrolysis of dietary cholesteryl esters, retinyl esters, and triglycerides [33]. Metabolic studies of Cel-null mice, however, have reported that a lack of CEL activity causes an incomplete digestion of milk fat and lipid accumulation by enterocytes in the ileum of neonatal mice, which suggests a major role for this enzyme in triglyceride hydrolysis in breast-fed animals [9,34]. Moreover, reverse cholesterol transport is elevated in carboxyl ester lipase-knockout mice, which supports a significant role for this enzyme in the biliary disposal of cholesterol from the body [14].
A CEL-like gene (designated as CELL) has also been identified on human and gorilla chromosome 9, about 10 kilobases downstream of CEL, which is transcribed in many tissues of the body but lacks exons 2-7 and is unlikely to be translated into protein [17,20,35]. The duplication that gave rise to the CELL pseudogene apparently occurred prior to the separation of Hominidae (man, chimpanzee, gorilla, and orangutan) from Old World monkeys (macaque), with CELL being restricted to genomes of man and the great apes [20].
Three-dimensional structural analyses of human CEL have shown that the enzyme belongs to the alpha/beta hydrolase fold family with several key structural and catalytic features, including an active site catalytic triad located within the enzyme structure and partially covered by a surface loop, the carboxyl terminus region of the protein which regulates enzymatic activity by forming hydrogen bonds with the surface loop to partially shield the active site, and a loop domain which binds bile salt and frees the active site to access water-insoluble substrates [1,10,36,37,38]. In both conformations, CEL forms dimeric subunit structures with active sites facing each other. The common variant of the human CEL gene contains VNTR repeats, but there is a high degree of polymorphism in the repeated region [31,32]. While the biological function of the polymorphic repeat region is unknown, it has been suggested that it may be important for protein stability and/or secretion of the enzyme, particularly given that this region contains many O-glycosyl bonds linking carbohydrate residues to the CEL C-terminus, including fucose, galactose, glucosamine, galactosamine, and neuraminic acid residues [39].
This paper reports the predicted gene structures and amino acid sequences for several vertebrate CEL genes and proteins, the predicted secondary and tertiary structures for vertebrate CEL protein subunits, and the structural phylogenetic and evolutionary relationships for these genes and enzymes with mammalian CES (carboxylesterase) gene families [40,41].
BLAT (BLAST-Like Alignment Tool) analyses were subsequently undertaken for each of the predicted vertebrate CEL amino acid sequences using the University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu/cgi-bin/hgBlat) [4] with the default settings to obtain the predicted locations for each of the vertebrate CEL genes, including predicted exon boundary locations and gene sizes. Structures for human, mouse, and rat CEL isoforms (splice variants) were obtained using the AceView website (http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/index.html?human) to examine predicted gene and protein structures [2]. Alignments of vertebrate CEL sequences with human carboxylesterase (CES) protein sequences were assembled using the ClustalW2 multiple sequence alignment program [53] (http://www.ebi.ac.uk/Tools/clustalw2/index.html).

Phylogenetic Studies and Sequence Divergence.
Alignments of vertebrate CEL and human, mouse, and nematode CES-like (carboxylesterase) protein sequences were assembled using BioEdit v.5.0.1 and the default settings [58]. Ambiguously aligned regions were excluded prior to phylogenetic analysis, yielding alignments of 480 residues for comparisons of sequences (Table 1). Evolutionary distances were calculated using the Kimura option [59] in TREECON [60]. Phylogenetic trees were constructed from evolutionary distances using the neighbor-joining method [61] and rooted with the nematode CES sequence. Tree topology was reexamined by the bootstrap method of resampling (100 bootstrap replicates were applied), and only values that were highly significant (≥90) are shown [62].
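The distance step described above can be illustrated with a minimal sketch. It assumes that the Kimura option corresponds to Kimura's empirical protein correction, d = -ln(1 - p - 0.2p²), where p is the fraction of differing residues over gap-free aligned columns; the exact implementation in TREECON is not confirmed here, and the function names and toy sequences below are hypothetical.

```python
import math

GAP = "-"


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing residues over gap-free columns of an aligned pair."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == GAP or b == GAP:
            continue  # ignore columns with a gap in either sequence
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differing / compared


def kimura_protein_distance(seq_a: str, seq_b: str) -> float:
    """Kimura's empirical correction for multiple substitutions (assumed form)."""
    p = p_distance(seq_a, seq_b)
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0:
        raise ValueError("sequences too divergent for this correction")
    return -math.log(arg)


if __name__ == "__main__":
    # Toy aligned fragments, not real CEL sequences.
    print(kimura_protein_distance("ACDEFGHIKL", "ACDEFGHIKV"))
```

A matrix of such pairwise distances is the input that a neighbor-joining program would then use to build the tree; the corrected distance is always at least as large as the uncorrected p-distance.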

Alignments of Human and Other Vertebrate CEL Subunits.
The deduced amino acid sequences for opossum (Monodelphis domestica) and chicken (Gallus gallus) CEL subunits and for zebrafish (Danio rerio) CEL1 and CEL2 subunits are shown in Figure 1 together with the previously reported sequences for human (Homo sapiens) [7,39], mouse (Mus musculus) [22], and bovine (Bos taurus) [27,63] CEL subunits (Table 1). Alignments of the human and other vertebrate CEL subunits examined in this figure showed 56-80% sequence identities, suggesting that these are products of the same family of genes and proteins (Table 2). The amino acid sequence for human CEL contained 756 residues whereas other vertebrate CEL subunits contained fewer amino acids: 599 residues (cow), 598 residues (mouse), 579 residues (opossum), 556 residues (chicken), and 550 residues for zebrafish CEL1 and CEL2 (Figure 1; Table 1). These differences are predominantly explained by changes in the number of 11-residue VNTR repeats at the respective CEL C-termini, with human CEL containing 17 repeats, whereas bovine, mouse, and opossum CEL C-termini contained only 3, 3, and 1 repeats, respectively, while chicken and zebrafish CEL subunits exhibited no VNTR-like C-terminus sequences. Table 1 summarizes this feature among all of the vertebrate CEL sequences examined and shows that substantial numbers of C-terminus VNTR repeats were predominantly restricted to higher primates, especially gorilla (Gorilla gorilla) (39 repeats) [20], human (17 repeats) [17], and rhesus (Macaca mulatta) (15 repeats) CEL, whereas other mammalian CEL subunits usually contained 3 VNTR repeats, with the exception of the predicted dog (Canis familiaris) CEL C-terminus, which contained 13 VNTR-repeat sequences.
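As a rough illustration of how repeat numbers of this kind might be counted, the sketch below scans a protein sequence for the longest run of consecutive 11-residue units. It assumes that every unit begins with the Pro-Val-Pro-Pro (PVPP) motif of the consensus reported in this paper; real repeats can diverge from that motif, so such a scan may undercount. The function name and the toy sequences are hypothetical and are not actual CEL sequences.

```python
import re

REPEAT_LENGTH = 11
# One unit = assumed PVPP start motif followed by any 7 residues;
# a run = one or more units placed back to back.
TANDEM_RUN = re.compile(r"(?:PVPP[A-Z]{7})+")


def count_tandem_repeats(sequence: str) -> int:
    """Return the number of units in the longest tandem run (0 if none)."""
    runs = TANDEM_RUN.findall(sequence.upper())
    if not runs:
        return 0
    return max(len(run) for run in runs) // REPEAT_LENGTH


if __name__ == "__main__":
    core = "MKTLLVAGSA"          # hypothetical non-repeat region
    unit = "PVPPTGDSEAA"         # consensus unit reported in the paper
    tail = "KEF"                 # hypothetical unique C-terminal tail
    for n in (0, 1, 3):
        print(n, count_tandem_repeats(core + unit * n + tail))
```

Scanning forward with a pattern, rather than stepping back from the last residue, keeps the count tolerant of a short non-repeat tail after the repeat region.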
A comparison of the 11-residue repeat sequences for the mammalian CEL subunits examined showed the following consensus sequence: Pro-Val-Pro-Pro-Thr-Gly-Asp-Ser-Glu-Ala-Ala (Figure 2), for which the first 4 residues have been proposed to play a role in facilitating O-glycosylation at the 5th residue (Thr) position [10]. Several other key amino acid residues for mammalian CEL have been recognized (sequence numbers refer to human CEL) (Figure 1). These include the catalytic triad for the active site (Ser194; Asp320; His435) forming a charge relay network for substrate hydrolysis [10,64]; the hydrophobic N-terminus signal peptide (residues 1-20) [7,65]; disulfide bond forming residues (Cys84/Cys100 and Cys266/Cys277) [7,66]; arginine residues (Arg83/Arg446) which contribute to bile-salt binding and activation [1,36]; a heparin binding site (residues 21-121); as well as the 11-residue VNTR repeat (×17) at the CEL C-terminus (residues 562-756). Identical residues were observed for each of the vertebrate CEL subunits for the active site triad, disulfide bond forming residues and key arginine residues contributing to bile salt activation; however, the N-terminus 20-residue signal peptide underwent changes in sequence but retained predicted signal peptide properties (Figure 1; Table 1). The N-glycosylation site reported for human CEL at Asn207-Ile208-Thr209 [10] was retained for each of the 22 vertebrate CEL sequences examined, with the exception of platypus (Ornithorhynchus anatinus) CEL which contained two predicted N-glycosylation sites at Asn381-Val382-Thr383 and Asn548-Leu549-Thr550 (Table 3). Predicted N-glycosylation sites were also observed at other positions, including Asn381-Ile382-Thr383 for opossum (Monodelphis domestica) CEL; Asn270-Thr271-Thr272 and Asn381-Leu382-Thr383 for chicken (Gallus gallus) CEL; Asn270-Thr271-Thr272 for lizard (Anolis carolinensis) CEL; and Asn550-Val551-Thr552 for fugu (Takifugu rubripes) CEL (Table 3).
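The predicted N-glycosylation sites listed above (NIT, NVT, NLT, NTT) all fit the textbook sequon pattern Asn-X-Ser/Thr, where X is any residue except proline. The prediction tool used for these sites is not named in this section, so the following is only a minimal sketch of a sequon scan under that assumed rule; the function name and toy sequence are hypothetical.

```python
import re

# Lookahead so that overlapping sequons (e.g. "NNSS") are all reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_n_glycosylation_sequons(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based Asn position, tripeptide) for each N-X-S/T match, X != P."""
    seq = sequence.upper()
    return [(match.start() + 1, match.group(1)) for match in SEQUON.finditer(seq)]


if __name__ == "__main__":
    # Toy sequence, not a real CEL fragment: NIT and NVT match, NPS does not.
    toy = "MANITKNPSQNVT"
    for position, tripeptide in find_n_glycosylation_sequons(toy):
        print(f"Asn{position}: {tripeptide}")
```

A sequon match marks only a potential site; whether it is glycosylated in vivo depends on structural context that a pattern scan cannot capture.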
Given the reported role of the N-glycosylated carbohydrate group in contributing to the stability and maintaining catalytic efficiency of a related enzyme (carboxylesterase or CES1) [67], this property may be shared by the vertebrate CEL subunits as well, especially for those containing multiple predicted sites for N-glycosylation, such as chicken CEL, which contains three such sites.
Figure 1: See Table 1 for sources of CEL sequences; * shows identical residues for CEL subunits; : similar alternate residues; . dissimilar alternate residues; N-signal peptide residues are in red; N-glycosylation residues at 207NIT (human CEL) are in green; active site (AS) triad residues Ser, Asp, and His are in pink; O-glycosylation sites are in blue; disulfide bond Cys residues for human CEL (•); essential arginines which contribute to bile-salt binding are in red; helix (human CEL or predicted helix); sheet (human CEL or predicted sheet); bold font shows known or predicted exon junctions; exon numbers refer to human CEL gene; CEL "loop" covering the active site (human CEL residues 136-143) is in green; Hu-human CEL; Co-cow CEL; Mo-mouse CEL; Op-opossum CEL; Ch-chicken CEL; Z1-zebrafish CEL1; Z2-zebrafish CEL2.

Predicted Secondary and Tertiary Structures.
Predicted secondary structures for vertebrate CEL sequences were compared with the previously reported secondary structure for bovine and human CEL [1,68] (Figure 1). Similar α-helix and β-sheet structures were observed for all of the vertebrate CEL subunits examined. Consistent structures were particularly apparent near key residues or functional domains including the β-sheet and α-helix structures near the active site Ser194 (β8/αD) and Asp320 (β10/α8) residues, and the N-glycosylation site at Asn207-Ile208-Thr209 (near β8) [69]. The single helix at the C-termini (αN) for the vertebrate CEL subunits was readily apparent, as were the five β-sheet structures at the N-termini of the CEL subunits (β1-β5). It is apparent from these studies that all of these CEL subunits have highly similar secondary structures. Figure 3 describes predicted tertiary structures for mouse CEL and zebrafish CEL1 protein sequences, which showed significant similarities for these polypeptides with bovine [1,36] and human CEL [68]. Identification of specific structures within the predicted mouse CEL and zebrafish CEL1 sequences was based on the reported structure for a truncated human CEL, which identifies a sequence of twisted β-sheets interspersed with several α-helical structures [10,68] that are typical of the alpha-beta hydrolase superfamily [40]. The active site CEL triad was centrally located, which is similar to that observed in other lipases and esterases [40,70,71]. The major difference between CEL and other serine esterases is an apparent insertion at positions 139-146 (for human CEL) (Figure 1 of Supplementary Material available online at doi 10.1155/2011/781643) which appears to act as a surface loop that partially covers the opening to the catalytic triad and, in the truncated CEL, allows access to the active site by water-soluble substrates [68]. This active site loop is also readily apparent in the predicted structures for mouse CEL and zebrafish CEL1.
These comparative studies of vertebrate CEL proteins suggest that the properties, structures, and key sequences are substantially retained for all of the vertebrate sequences examined.
Figure 2: Amino acid alignments for C-terminal 11-residue repeat sequences for mammalian CEL subunits. Hydrophobic amino acid residues are shown in red; hydrophilic residues in green; acidic residues in blue; basic residues in pink; (squared T) refers to known O-glycosylation sites for human CEL; R refers to repeat number. P-proline; V-valine; T-threonine; G-glycine; D-aspartate; E-glutamate; S-serine; A-alanine; K-lysine; N-asparagine; note consistent PVPP start sequences.

Cholesterol
Table 3: The identified N-glycosylation site is for human CEL (see [10]). Amino acid residues are shown for known or predicted N-glycosylation sites.
Figure 3: The rainbow color code describes the 3-D structures from the N- (blue) to C-termini (red); N refers to amino terminus; C refers to carboxyl terminus; specific alpha helices (αA . . . αN) and beta sheets (β1 . . . β13) were identified, as well as the active site region and the "loop" covering the active site.
Locations were predicted for vertebrate CEL genes based upon BLAT interrogations of several vertebrate genomes using the reported sequences for human, gorilla, mouse, rat, and bovine CEL [6,7,20,22,23,24,27] and the predicted sequences for other vertebrate CEL proteins and the UCSC Genome Browser [4]. Human and mouse CEL genes were located on human chromosome 9 and mouse chromosome 2, which are distinct from the carboxylesterase (CES for human or Ces for mouse) gene family cluster locations in each case: on human chromosome 16 and mouse chromosome 8, respectively (Table 1; see [41]). The zebrafish (Danio rerio) genome showed evidence of tandem duplicated CEL genes, with predicted CEL1 and CEL2 genes being located about 7.3 kilobases apart on zebrafish chromosome 21 (Table 1). This is in contrast with many other gene duplication events during zebrafish evolution that have occurred predominantly by polyploidisation or duplication of large chromosomal segments rather than by tandem gene duplication [72]. Figure 1 summarizes the predicted exonic start sites for cow, opossum, chicken, and zebrafish CEL genes, with each having 11 exons in identical or similar positions to those reported for the human CEL and mouse Cel genes [17,22,23]. In contrast, human CES1 [73,74], CES2, CES3 [75,76], CES4 [41], and CES5 [77,78] genes contained 14, 12, 13, 14, and 13 exons, respectively, which are predominantly in distinct positions to those described for vertebrate CEL genes, with the exception of the last exon in each case (Figure 1 of Supplementary Material). Consequently, even though CEL and CES genes and proteins are members of the same serine hydrolase superfamily [10,40], it is apparent that CEL is not a close relative of the CES gene family, for which at least five genes are clustered on a single chromosome in both human and mouse and are more similar in gene structure to each other than they are to the CEL gene (Figure 1 of Supplementary Material; see [41]).
Figure 4 illustrates the predicted structures of mRNAs for human and mouse CEL transcripts for the major transcript isoform in each case [2]. The transcripts were 10.5 and 7.6 kilobases in length, respectively, with 10 introns and 11 exons present for these CEL mRNA transcripts. The human CEL genome sequence contained a microRNA site (miR485-5p) located in the 3′-untranslated region and a CpG island (CpG51). The occurrence of the CpG island within the CEL gene may reflect a role in regulating gene expression [79], which may contribute to the higher than average gene expression level reported for human CEL (1.5 times higher). Figure 2 of Supplementary Material shows a nucleotide sequence alignment diagram for the CpG51 region of the human CEL gene in comparison with several other mammalian and other vertebrate CEL genes. The Multiz alignment patterns observed demonstrated extensive sequence conservation for the CpG island, which contains dinucleotide and trinucleotide repeat sequences in most genomes examined.
The prediction of a microRNA (miRNA; miR485-5p) binding site in the 3′-untranslated region of human CEL is also potentially of major significance for the regulation of this gene. MicroRNAs are small noncoding RNAs that regulate mRNA and protein expression and have been implicated in regulating gene expression during embryonic development [80]. Moreover, a related miRNA gene (miR-375) has recently been shown to be selectively expressed in pancreatic islets and has been implicated both in the development of islets and the function of mature pancreatic beta cells [81]. A similar role may be played by miR485-5p with respect to the regulation of CEL expression during pancreatic beta cell development. Table 1 of Supplementary Material presents comparative nucleotide sequences for miR485-5p-like CEL gene regions for several vertebrate genomes, which show high levels of sequence identity, particularly among mammalian CEL miRNA target sites, and suggest that this site has been predominantly conserved during vertebrate evolution, particularly by eutherian mammalian CEL genes. Figure 5 shows a UCSC Genome Browser Comparative Genomics track that shows evolutionary conservation and alignments of the nucleotide sequences for the human CEL gene, including the 5′-flanking, 5′-untranslated, intronic, exonic, and 3′-untranslated regions of this gene, with the corresponding sequences for 10 vertebrate genomes, including 5 eutherian mammals (e.g., mouse, rat), a marsupial (opossum), a monotreme (platypus), and lower vertebrate genomes. Extensive conservation was observed among these genomic sequences, particularly for the rhesus CEL gene and for other eutherian mammalian genomes. In contrast with the eutherian mammalian genomes examined, other vertebrate genomes retained conserved sequences only within the 11 exonic CEL regions.
It would appear that exonic CEL nucleotide sequences have been conserved throughout vertebrate evolution whereas only in eutherian mammalian genomes have other regions of the CEL gene been predominantly conserved. Figure 6 presents "heat maps" showing comparative gene expression for various human and mouse tissues obtained from GNF Expression Atlas Data using the GNF1H (human) and GNF1M (mouse) chips (http://genome.ucsc.edu/; http://biogps.gnf.org/) [3]. These data supported a high level of tissue expression for human CEL, particularly for the pancreatic islets, the pituitary gland, and fetal liver, which is consistent with previous reports for these genes (see [10]). High levels of CEL gene expression have also been reported for the human mammary gland where CEL plays a major role in lipid digestion in breast milk by neonates [6]. The localization of CEL within the human pituitary gland is of major interest as this enzyme also hydrolyzes ceramides [8], which suggests a possible role in the regulation of hormone secretion in both normal and adenomatous pituitary cells [16]. A high level of expression of CEL in mouse tissues was also observed (3.4 times average expression) (Figure 4), particularly for the pancreas, mammary gland, and spleen (Figure 5).
Figure 7: Phylogenetic tree of vertebrate CEL with human and mouse CES1, CES2, CES3, CES4, and CES5 amino acid sequences. The tree is labeled with the gene name and the name of the vertebrate. Note the major cluster for the vertebrate CEL sequences and the separation of these sequences from human and mouse CES1, CES2, CES3, CES4, and CES5 sequences. The tree is "rooted" with the CES sequence (T27C12) from a nematode (Caenorhabditis elegans). See Table 1 for details of sequences and gene locations. A genetic distance scale is shown (% amino acid substitutions). The number of times a clade (sequences common to a node or branch) occurred in the bootstrap replicates is shown. Only replicate values of 90 or more, which are highly significant, are shown, with 100 bootstrap replicates performed in each case.

Comparative Human and Mouse CEL Tissue Expression.
Roles for this enzyme in cholesterol ester, retinyl ester, and triglyceride hydrolysis and metabolism have been described [10]. Recent metabolic studies using Cel−/Cel− (knockout) mice (or CELKO mice) have demonstrated that CEL is not an essential enzyme for these metabolic functions [82,83], although CELKO neonatal mice exhibit an incomplete digestion of milk fat [9,84], and adult CELKO mice show an elevation in reverse cholesterol transport (RCT) [14]. The latter finding is potentially of major clinical significance for this enzyme, given that any increase in RCT and the associated increased biliary disposal of cholesterol may contribute to preventing atherosclerosis [85,86].

Phylogeny and Divergence of Vertebrate CEL and Mammalian/Nematode CES Sequences.
A phylogenetic tree (Figure 7) was calculated by the progressive alignment of human and other vertebrate CEL amino acid sequences with human and mouse CES1, CES2, CES3, CES4, and CES5 sequences. The phylogram was "rooted" with a nematode CES sequence and showed clustering of the CEL sequences which were distinct from the human and mouse CES families. In addition, the zebrafish CEL1 and CEL2 sequences showed clustering within the fish CEL sequences examined, which is consistent with these genes being products of a recent duplication event during teleost fish evolution. Overall, these data suggest that the vertebrate CEL gene arose from a gene duplication event of an ancestral CES-like gene, resulting in at least two separate lines of gene evolution for CES-like and CEL-like genes. This is supported by the comparative biochemical and genomic evidence for vertebrate CEL and CES-like genes and encoded proteins, which share several key features of protein and gene structure, including having similar alpha-beta hydrolase secondary and tertiary structures [10,40,41,71,78] (Figure 1 of Supplementary Material).
In conclusion, the results of the present study indicate that vertebrate CEL genes and encoded CEL enzymes represent a distinct alpha-beta hydrolase gene and enzyme family which shares key conserved sequences and structures with those that have been reported for the human CES gene families. CEL is a major triglyceride, cholesterol ester and vitamin ester hydrolytic enzyme contained within exocrine pancreatic and lactating mammary gland secretions and is also localized in plasma where it contributes to chylomicron assembly and secretion, to the selective uptake of cholesteryl esters in HDL by the liver, and to reverse cholesterol transport, including biliary disposal of cholesterol. Bioinformatic methods were used to predict the amino acid sequences, secondary and tertiary structures and gene locations for CEL genes and encoded proteins using data from several vertebrate genome projects. A proline-rich and O-glycosylated 11-amino acid C-terminal repeat sequence (VNTR) previously reported for human and other higher primate CEL proteins was also observed for other eutherian mammalian CEL sequences examined. Opossum CEL, however, contained a single C-terminal copy of this sequence while CEL proteins from lower vertebrates lacked the VNTR sequence. Evidence is presented for tandem duplicated CEL genes for the zebrafish genome. Vertebrate CEL protein subunits shared 53-97% sequence identities and exhibited sequence alignments and identities for key CEL amino acid residues as well as extensive conservation of predicted secondary and tertiary structures with those previously reported for human CEL. Phylogenetic analyses demonstrated the relationships and potential evolutionary origins of the vertebrate CEL family of genes which were related to a nematode carboxylesterase (CES) gene and five mammalian CES gene families.
These studies indicated that CEL genes have apparently appeared early in vertebrate evolution prior to the teleost fish common ancestor more than 500 million years ago [87].