Codon Usage Bias and Cluster Analysis of the MMP-2 and MMP-9 Genes in Seven Mammals

Matrix metalloproteinase (MMP)-2 and MMP-9 are Zn2+- and Ca2+-dependent gelatinases of the MMP family that regulate muscle development and are involved in disease treatment, and they are highly conserved during biological evolution. Despite increasing knowledge of MMP genes, the evolutionary mechanism of their functional adaptation remains unclear. Analysis of codon usage bias (CUB) is a reliable way to understand evolutionary associations; however, the distribution of CUB in the MMP-2 and MMP-9 genes of mammals has not been clearly revealed. Several analytical software packages were used to study the genetic evolution, phylogeny, and codon usage pattern of these two genes in seven mammalian species. The results showed that the MMP-2 and MMP-9 genes have CUB. By comparing the synonymous codon base content amongst the seven mammals, we found that MMP-2 and MMP-9 were low-expression genes with high codon conservation, and that their third codon position preferred G/C. RSCU analysis revealed that these two genes preferred codons encoding delicious amino acids. Analysis of the factors influencing CUB showed that the third bases of these two genes were mainly C/A and C/T, respectively, and that GC3S had a wide distribution range relative to the ENC plot reference curve expected under no selection or mutational pressure. Thus, mutational pressure is an important factor in CUB. This study revealed the usage characteristics of the MMP-2 and MMP-9 gene codons in different mammals and provided basic data for further study towards enhancing meat flavour, treating muscle disease, and optimizing codons.


Introduction
Codon usage bias (CUB) is defined as the unequal frequency with which synonymous codons are used to encode amino acids (AAs), and it has been used extensively for investigating gene phylogeny [1]. The characteristics of synonymous codons include universality, degeneracy, and wobble, and in the absence of interfering pressure synonymous codons should be used randomly to encode the corresponding AAs. However, CUB can be affected by nucleotide composition, translation, hydrophobicity, tRNA abundance, and protein structure [2][3][4][5][6]. Notably, natural selection and mutational pressure, which drive the correct translation process, are the major factors associated with CUB [7,8]. Natural selection affects the pattern of codon usage in organisms, and mutational pressure may arise as the proportion of codon bases changes. CUB greatly increases the variability of genetic information and reflects the genetic drift of codons to a certain extent [9]. Therefore, CUB can reveal the evolution of genes or organisms and their environmental adaptation [10].
CUB is assessed by using the effective number of codons (ENC), codon adaptation index (CAI), frequency of optimal codons (FOP), codon bias index (CBI), and relative synonymous codon usage (RSCU). ENC is calculated by comparing the GC content of synonymous codon positions [11]. CAI ranges from 0 to 1; the closer the value is to 1, the stronger the synonymous codon preference [12]. FOP and CBI both range from 0 to 1. When these two indicators are close to 1, the optimal codons for encoding amino acids are preferred. However, if CBI is negative, the optimal codons are used less often than the average number of codons used [3,13]. RSCU is the ratio between the observed frequency of a codon and its expected frequency, where the expected frequency is the value obtained when all synonymous codons are used equally, namely, when there is no codon bias. If RSCU = 1, there is no CUB. If RSCU > 1, the codon appears more frequently than its other synonymous codons; by contrast, RSCU < 1 indicates that it appears less frequently. If RSCU > 2, the codon bias is extremely high [14].
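The RSCU definition above can be illustrated with a minimal Python sketch. It is not the CodonW implementation used in this study; the function name `rscu`, the use of the standard genetic code, and the toy sequence are assumptions made only for illustration.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# the nuclear standard code applies to the genes being analysed).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for every amino acid that occurs in the CDS.

    RSCU = observed codon count / (family total / number of synonymous codons).
    """
    cds = cds.upper().replace("U", "T")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group sense codons into synonymous families, one family per amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    result = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in family:
            result[c] = counts[c] * len(family) / total
    return result


# Toy sequence (hypothetical): four Ile codons, three ATC and one ATT.
toy = rscu("ATCATCATCATT")
print(toy["ATC"], toy["ATT"], toy["ATA"])
```

In the toy sequence, ATC is used three times out of four Ile codons in a three-codon family, so its RSCU exceeds 2, whereas ATT and ATA fall below 1.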
Matrix metalloproteinases (MMPs) are a family of Zn2+- and Ca2+-dependent proteolytic enzymes that are widely expressed in animal tissues and highly conserved during biological evolution [15]. MMP-2 and MMP-9 can regulate muscle growth, repair, and related processes that affect biochemical reactions for muscle regulation [16]. Although recent research has mainly focused on the function of MMP-2 and MMP-9 in animal skeletal muscle development, the healing of diseased muscle, and even meat [17][18][19][20][21][22][23], studies on MMP codons are rare. Therefore, there is an urgent need to explore the genetic evolution and codon usage pattern of the mammalian MMP-2 and MMP-9 genes that regulate muscle growth.
In   Genetics Research affecting CUB for MMP-2 and MMP-9 genes and provide basic data for enhancing the meat flavour and finding a promising gene treatment for muscle disease.

Base Composition of MMP Genes' CDS in Different Mammals.
The coding sequences (CDS) of the yak MMP-2 and MMP-9 genes were obtained in our laboratory, and their NCBI accession numbers were MZ476247 and MZ476248, respectively. The CDS of the other animals' genes were from NCBI GenBank, and their accession numbers are shown in Figure 1. CodonW 1.4.2 software developed by J. Peden was used to analyse the MMP-2 and MMP-9 CDS in seven mammals and to calculate A/T (A/T base content, the same below), G/C, T3S (content of T at the third codon position, the same below), C3S, A3S, G3S, GC3S, AT3S, ENC, CAI, CBI, FOP, and RSCU [24]. The R packages pheatmap and ggplot2 were used to analyse the data.
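As a rough illustration of the third-position indices listed above, the following Python sketch tallies the third base of each codon. It is a simplified stand-in, not CodonW: the exact CodonW definitions of T3S, C3S, A3S, and G3S differ in how each index is normalised, so the keys here are deliberately named `A3`, `GC3`, and so on. The function name and the toy sequence are hypothetical.

```python
from collections import Counter

# Codons with no synonymous alternative (Met, Trp) and stop codons are skipped,
# so that only positions where a synonymous choice exists are counted.
NON_SYNONYMOUS_OR_STOP = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def third_position_composition(cds):
    """Return the fraction of A, C, G, T (and GC, AT) at third codon positions."""
    cds = cds.upper().replace("U", "T")
    thirds = Counter()
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        # Skip excluded codons and codons containing ambiguous bases.
        if codon in NON_SYNONYMOUS_OR_STOP or set(codon) - set("ACGT"):
            continue
        thirds[codon[2]] += 1

    total = sum(thirds.values())
    if total == 0:
        raise ValueError("no usable codons in sequence")

    comp = {base + "3": thirds[base] / total for base in "ACGT"}
    comp["GC3"] = comp["G3"] + comp["C3"]
    comp["AT3"] = comp["A3"] + comp["T3"]
    return comp


# Toy CDS (hypothetical): Met, four Ala codons ending in C, A, T, G, then a stop.
print(third_position_composition("ATGGCCGCAGCTGCGTAA"))
```

A gene whose `GC3` exceeds its `AT3` would correspond to the G/C-ending preference described in this study.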

Table 2: Nucleotide composition in the sequence of MMP-9 gene.

PR2 Plot.
A PR2 plot can be used to analyse the bias amongst A, T, C, and G under gene mutation [25]. If A is more frequent than T at the third base, the dots are scattered in the upper part of the PR2 plot. If C is more frequent than G, the dots lie on the left.
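The PR2 coordinates can be sketched as follows, assuming the common convention that the x-axis is G3/(G3 + C3) and the y-axis is A3/(A3 + T3), which matches the description above (A > T plots in the upper half, C > G plots on the left). PR2 plots are often restricted to fourfold degenerate codons; this simplified, hypothetical function instead uses all codons except Met, Trp, and stop codons.

```python
def pr2_point(cds):
    """Return (x, y) = (G3 / (G3 + C3), A3 / (A3 + T3)) for one CDS."""
    cds = cds.upper().replace("U", "T")
    skip = {"ATG", "TGG", "TAA", "TAG", "TGA"}
    n = {"A": 0, "C": 0, "G": 0, "T": 0}
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        # Skip excluded codons and codons containing ambiguous bases.
        if codon in skip or set(codon) - set("ACGT"):
            continue
        n[codon[2]] += 1

    if n["G"] + n["C"] == 0 or n["A"] + n["T"] == 0:
        raise ValueError("PR2 coordinates undefined for this sequence")

    x = n["G"] / (n["G"] + n["C"])  # below 0.5 means C is used more than G
    y = n["A"] / (n["A"] + n["T"])  # above 0.5 means A is used more than T
    return x, y


# Toy CDS (hypothetical): four Ala codons ending in A, A, C, T.
x, y = pr2_point("GCAGCAGCCGCT")
print(x, y)
```

A point at (0.5, 0.5) would indicate no bias between G and C or between A and T at the third position.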

Codon Neutral Analysis.
Codon neutral analysis was carried out by correlating GC12 (the average GC content of the first and second codon bases) with GC3S to compare the influence of natural selection pressure and mutational pressure on CUB [27]. A significant correlation between GC12 and GC3S indicated that mutational pressure had a strong influence on codon preference; otherwise, natural selection influenced CUB [28].
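A minimal sketch of the neutrality analysis is given below, assuming GC12 is the mean GC fraction of codon positions 1 and 2 and that a plain Pearson coefficient is computed across genes. The function names are hypothetical, the third-position value is taken over all sense codons rather than CodonW's GC3S, and no significance test is included.

```python
import math


def gc12_and_gc3(cds):
    """Return (GC12, GC3) for one CDS, ignoring stop and ambiguous codons."""
    cds = cds.upper().replace("U", "T")
    stops = {"TAA", "TAG", "TGA"}
    gc = [0, 0, 0]  # GC counts at codon positions 1, 2, 3
    n = 0
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon in stops or set(codon) - set("ACGT"):
            continue
        n += 1
        for pos in range(3):
            if codon[pos] in "GC":
                gc[pos] += 1
    if n == 0:
        raise ValueError("no usable codons in sequence")
    return (gc[0] + gc[1]) / (2 * n), gc[2] / n


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy CDS (hypothetical): four Ala codons ending in A, A, C, T.
print(gc12_and_gc3("GCAGCAGCCGCT"))
```

In practice, `gc12_and_gc3` would be applied to each species' CDS and `pearson_r` to the resulting GC3 and GC12 lists for one gene.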

ENC Plot.
The relationship between ENC and GC3S in the absence of environmental selection pressure can be simulated by formula (1). The ENC/GC3S reference curve shows the main characteristics of codon usage patterns [24]. If the CUB of a gene is more affected by natural selection, its point should lie below the standard curve. By contrast, the point should lie above the standard curve if CUB is more affected by other factors such as gene mutation. In general, ENC ranges from 20 to 61. If ENC > 35, CUB is weak [11].
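Formula (1) is not reproduced in this text. A widely used expression for the expected ENC when codon usage is shaped only by GC3S is Wright's approximation, ENC = 2 + s + 29 / (s^2 + (1 - s)^2), where s is GC3S. The sketch below assumes this is the intended reference curve; the function names are hypothetical.

```python
def expected_enc(gc3s):
    """Expected ENC for a given GC3S, assuming Wright's approximation."""
    s = gc3s
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


def below_reference_curve(observed_enc, gc3s):
    """True if the observed ENC falls below the expected curve at this GC3S."""
    return observed_enc < expected_enc(gc3s)


# Reference curve sampled at a few GC3S values.
for s in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(s, round(expected_enc(s), 2))
```

Under this assumption the curve peaks at GC3S = 0.5 and is symmetric in its dominant term, so genes with strongly skewed GC3S are expected to have a lower ENC even without selection.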

Phylogenetic Analysis.
Neighbour joining (NJ) trees were established based on the MMP-2 and MMP-9 CDS in seven mammals. The results (Figure 1) showed that the MMP-2 and MMP-9 genes of Bos grunniens were similar to those of B. taurus. These two genes of S. scrofa were similar to those of B. grunniens and B. taurus. Interestingly, the MMP-9 gene of C. lupus familiaris was closer to that of S. scrofa, whereas its MMP-2 gene was more distant.

Nucleotide Composition of MMP-2 and MMP-9 Genes.
Comparison of the codon base content of the MMP-2 and MMP-9 genes in seven mammals (Tables 1 and 2) showed that the G/C content was higher than the A/T content. For most mammals, GC3S of MMP-2 and MMP-9 was larger than AT3S, except for the MMP-2 gene of B. taurus and M. musculus. The above findings indicated that the MMP-2 and MMP-9 gene codons preferred GC3S. The codon usage results (Tables 3 and 4) showed that the ENCs of the MMP-2 and MMP-9 genes in seven mammals were 40-56, indicating that these two genes had low expression and that their codon conservation was high.
CAI showed that the synonymous codon preference of the MMP-9 genes in seven mammals was significantly stronger than that of MMP-2, but both values were lower than 0.3, indicating that CAI failed to reflect the preference of synonymous codons.
FOP and CBI results of the MMP-2 and MMP-9 genes showed that the optimal codon usage of MMP-2 in B. grunniens and B. taurus was inferior to that in the five other animals, whilst the optimal codon usage of MMP-9 was better than that of MMP-2.
Heat map analysis of the correlation between codon base composition and GC3S (Figure 2) showed that most of the codons of the MMP-2 and MMP-9 genes in different mammals were positively correlated with GC3S, in line with AC-, CG-, AT-, TC-, GG-, CC-, GC-, and other codons whose third base was C.

Factors Influencing CUB.
The PR2 plot result (Figure 5) showed that the base distribution of the MMP-2 and MMP-9 genes amongst seven mammals was above 0.5 on the x-axis. The base distribution of the MMP-2 genes was mainly on the x-axis and towards the upper right of the y-axis, and that of the MMP-9 genes was towards the x-axis and the upper right of the y-axis. The above results indicated that the contents of A3S and C3S were high for the MMP-2 gene and that the contents of T3S and C3S were high for the MMP-9 gene.
Neutral analysis (Figure 6 and Table 7) showed that GC3S of these two genes was in the range of 0.44-0.78, whereas GC12 was from 0.52 to 0.67. The difference was that GC12 and GC3S of the MMP-2 gene were strongly negatively correlated (Pearson r = −0.851, p value < 0.05), whilst GC12 and GC3S of the MMP-9 gene were not significantly correlated, indicating that the base composition of the MMP-2 gene codons was susceptible to mutational pressure, but the factor influencing the MMP-9 gene was natural selection. The ENC plot (Figure 7 and Table 7) showed that all ENC/GC3S dots of the MMP-2 and MMP-9 genes were distributed below the reference line. ENC and GC3S had a strongly negative correlation (MMP-2: Pearson r = −0.993, p value < 0.01; MMP-9: Pearson r = −0.963, p value < 0.01), and the distribution range of GC3S was large, indicating that the CUB of these two genes was affected by mutational pressure.

Discussion
This study found that gelatinase MMP genes had CUB for codons encoding amino acids such as Ile, Arg, Glu, and Ser, which are related to muscle development and meat quality. Gly, Arg, and Leu can promote collagen synthesis, and animal muscle is the main source of natural collagen for humans [29,30]. Delicious amino acids (DAAs), including Glu, Gly, Ser, Asp, Arg, and Ile, are known as precursor substances that determine the flavour of meat and can improve the taste of chicken and keep the meat soft [31]. Recent research found that the quality of chicken improves and the content of DAAs increases [32]. In addition, Strecker amino acids (SAAs), including Phe (phenylalanine), Cys (cysteine), Ile (isoleucine), and Leu (leucine), are highly related to the production of flavour. The higher their content, the stronger the fragrance [33]. For the MMP-2 and MMP-9 genes, the RSCUs of AUC encoding Ile; UCC and AGC encoding Ser; CGC encoding Arg; GAC encoding Asp; GAG encoding Glu; UUC [...] sarcopenia with aging [35].
Thus, BCAAs are important regulators of metabolism and metabolic health in vivo [36]. The gelatinase MMP CUB associated with the corresponding AAs can provide basic data for improving meat quality and treating muscle disease through MMP molecular modification.
Mutational pressure may be the main factor influencing the CUB of MMPs.
This study found that the clustering results of the RSCU were different from the NJ trees of the genes, indicating that the MMP genes were highly conserved but may be subjected to mutations during the evolution of different species.
This influence caused a decline in the accuracy of single-gene species classification. Nucleotide AT(U)CG base composition is an important feature of genes, and the GC content can reflect the overall trend of gene mutation, which is a decisive factor affecting the frequency of nucleotide use. Changes in the third base of the codon did not affect the encoded AAs, so GC3S could be an important reference for analysing the codon usage pattern. Gene mutation will affect the composition of the third bases of synonymous codons in the absence of natural selection, and the stronger the CUB, the more the codon is inclined to GC3S [37,38]. Novembre et al. also found that the third base distribution of the MMP-2 and MMP-9 genes is mainly AC3S and CT3S, respectively, and the ENC/GC3S dot distribution can reach a wide range compared with the reference curve with gene mutation pressure. Thus, mutational pressure may play an important role in affecting the CUB of the MMP-2 and MMP-9 genes, which also explains the difference in RSCU clustering in the seven mammals. Interestingly, we also found that the clustering results based on the RSCU of the MMP-2 gene were not completely consistent with the phylogenetic results based on the MMP-2 gene's CDS. Given that wild yak and Tibetan antelope grow in harsh environments with high altitudes and low oxygen, their EGLN1 gene has mutated, changing nucleotide bases and leading to CUB changes [39,40]. Therefore, we believe that the phylogenetic evolution of MMP-2 genes should refer not only to gene sequence but also to CUB, which could be a supplement to species classification.
Note. * p value < 0.05; ** p value < 0.01; red represents strong correlation, blue represents moderate correlation, and black represents irrelevance.

Conclusion
MMP-2 and MMP-9 are low-expression genes in mammals, and their codons are highly conserved. Both have a CUB towards GC3S and prefer codons encoding DAAs and SAAs, which is relevant to improving meat softness and treating muscle disease.

Data Availability
The yak MMP-2 and MMP-9 gene data used to support the findings of this study are included within the article and are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.