Identification of Biomarkers Associated with Cancerous Change in Oral Leukoplakia Based on Integrated Transcriptome Analysis

Objective Oral leukoplakia (OLK) is the most common precancerous lesion in the oral cavity. This study aimed to explore key biomarkers for monitoring OLK for early diagnosis of oral squamous cell carcinoma (OSCC) and screen small-molecule drugs for the prevention of OSCC. Method The Gene Expression Omnibus (GEO) database was explored to extract two microarray datasets, namely, GSE85195 and GSE25099. The data of the normal group, OLK group, and OSCC group were analyzed by weighted gene coexpression network analysis (WGCNA) to identify the most significant gene module and differentially expressed genes (DEGs). The intersection genes were extracted as the key genes of OLK carcinogenesis. Subsequently, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were analyzed in the module. Connectivity Map and molecular docking were used to screen small-molecule drugs. The diagnostic values of four key genes were identified and verified in the GSE26549 dataset. Results WGCNA obtained the red module (r = −0.91, p < 0.05) with the strongest correlation with cancerous phenotype. GO enrichment analysis showed 60 pathways, including 28 biological processes, 11 cell components, and 21 molecular functions, and KEGG enrichment analysis showed 4 pathways (p < 0.05). In the differential expression analysis, there was no intersection between the upregulated genes and the red module genes. However, the intersection of the downregulated genes and the red module genes yielded 4 key genes: dopachrome tautomerase (DCT), keratin 3 (KRT3), keratin 76 (KRT76), and FAM3 metabolic regulation signal molecule B (FAM3B). The area under the curve of the diagnostic model constructed by these four genes was 0.963 (CI = 0.913–1.000). The sensitivity was 0.933, and the specificity was 0.923. The diagnostic model was successfully verified in GSE26549 (AUC = 0.745, CI = 0.638–0.851). Compared with the diagnostic models of the previous studies, the diagnostic efficiency of this model was the highest. The small-molecule drugs, selumetinib and benidipine, were selected according to the gene expression profile and showed binding activity when docking with the above molecules. Conclusions This study provides new targets and drugs for OLK. These targets could be used as the key diagnostic molecules for long-term follow-up of OLK. The small-molecule drugs selumetinib and benidipine could be used for the prevention and treatment of OSCC.


Introduction
Oral leukoplakia (OLK) is a potentially malignant disease, with an incidence of 409.2 per 100,000 persons in men and 70.0 in women [1]. e pathological manifestations of OLK are various degrees of abnormal epithelial hyperplasia, which eventually progresses to malignant transformation and invades the surrounding tissues [2]. OLK is the most common precancerous lesion in the oral cavity. e malignant transformation rate is between 1% and 40% [3]. Clinically, photodynamic therapy, microwave therapy, and surgical resection are usually used to prevent cancer development [4]. e clinical manifestations and pathology of patients need to be followed up and monitored. Currently, there is no specific drug for oral squamous cell carcinoma (OSCC) treatment. erefore, early diagnosis is an important means to reduce morbidity and mortality and improve prognosis. Due to the insidious onset and nonspecific symptoms of OLK, the diagnosis of OLK is often delayed. It is well known that changes in gene expression usually precede histopathological changes and are closely associated with the progression of cancers [5]. erefore, abnormal gene expression has become a new perspective for the early diagnosis of OLK.
Currently, there are no clear diagnostic methods to predict whether OLK would become cancer. Cai et al. [6] found that SPP1 increased as normal tissues progressed to OLK and OSCC tissues. Fernanda Herrera Costa et al. [7] found that ALDH1A1 was positive in OLK tissues and negative in OSCC tissues, while ALDH1A2 was negative in OLK tissues and positive in OSCC tissues, which can be used as potential biomarkers for early detection of OSCC. However, this method only focused on the effect of the single gene, which cannot reveal the genetic relationship and build the relationship between genes and diseases. With the development of bioinformatics, large-sample, high-throughput gene microarray chip sequencing technology is increasingly used for the screening of diagnostic biomarkers. e most important bioinformatic analysis method is weighted gene coexpression network analysis (WGCNA). WGCNA is a method of constructing an expression network by using correlations between genes to mine gene modules that are highly related to external biological traits. It is highly sensitive to genes with small fluctuations and magnifies potential organisms in functional enrichment research. e signal has been successfully used in the analysis of a variety of diseases. Niemira et al. [8] used WGCNA to identify 4 new molecular targets related to nonsmall cell lung cancer, to gain an understanding of the pathogenesis of the disease. Yang et al. [9] used WGCNA to construct a four-gene prognostic model for liver cancer, and its C-index has better predictive potential than TNM staging.
In this study, the combined mRNA sequencing data of multiple microarray chips were used for WGCNA and differential expression analysis, pathway enrichment analysis, and protein interaction network construction to obtain key genes and pathways for OLK carcinogenesis. e results were used in centralized validation of external data. We also screened small-molecule drugs that could prevent OLK from becoming cancerous lesions through the Connectivity Map database with molecular docking, providing a theoretical basis for the clinical diagnosis and treatment of OLK. e flow chart of this study is shown in Figure 1.

Microarray Chip Expression Data Source.
e raw data of GSE85195 [10] and GSE25099 [11] were downloaded from Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/ geo/) and processed using R software (4.0.5). Agilent's microarray sequencing chip was used in GSE85195; the sample type was Homo sapiens tissue samples; and the sequencing platform was the GPL6480 (Agilent-014850 Whole Human Genome Microarray 4 × 44 K G4112 F). e samples included 1 normal tissue, 15 OLK tissues, and 34 OSCC tissues. Affymetrix's microarray sequencing chip was used in GSE25099; the sample type was Homo sapiens tissue samples; and the sequencing platform was the GPL5175 (Affymetrix Human Exon 1.0 ST Array). e samples included 22 normal tissues and 57 OSCC tissues. e detailed information is shown in Table 1.
e normal-izeBetweenArrays function of the limma package [12] and the RMA method of the affy [13] package was used to perform data standardization, normalization, and gene annotation, remove probes without annotation information, take the average expression when the same probe appears multiple times, and take the common gene combined data in the two datasets. e ComBat method of the sva package [14] was used to remove batch effects between multiple datasets to obtain a gene expression matrix. (WGCNA). WGCNA package [15] was used to perform weighted gene coexpression network analysis: First, the hclust function was used to cluster the above gene expression matrix, remove outliers, and construct a gene relationship network for all the gene data in the remaining samples. e pickSoft reshold function was used to select the best soft threshold. en, the gene module was constructed, and blockwiseModules function was used to identify gene modules, set the minimum number of genes in the module to 20, the maximum number of genes to 7,000, and the tree cutting height value to 0.25. Finally, the correlation between each gene module and the clinical phenotype was calculated, moduleEigengenes function was used to calculate the module eigengene (ME) in each module, and the first principal component in the principal component analysis result was used to express the gene expression in each module. e overall level of the module's feature values was compared with the normal samples, OLK samples, OSCC samples, and status (status phenotype represented the dynamic process of progression from normal to OLK to OSCC). Pearson's correlation analysis was performed, the modules with the strongest correlation with the phenotype were selected, and the correlation coefficient was changed in sequence with the progression of the disease.

Functional Enrichment Analysis of Gene Modules.
In order to dig deeper into the biological functions of gene modules, the abovementioned selected gene modules were subjected to Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genome (KEGG) enrichment analysis, in the Database for Annotation, Visualization, and Integrated Discovery (David, https://david. ncifcrf.gov/Home.jsp). e biological processes, cell components, molecular functions, and KEGG pathways were 2 Journal of Oncology selected, and the results were exported to obtain pathways related to OLK and OSCC.

e Protein Interaction Network Diagram of Gene Modules.
e genes in the module with the strongest correlation with the clinical phenotype in the WGCNA results were extracted, and the STRING online database (https:// string-db.org/) was used to score the genes in the module to predict the possibility of protein interactions and construct a protein-protein interaction network diagram. e more complex the structural relationship, the more important the core gene was in the development of the disease. e analysis results were imported into Cytoscape (v3.8.2), and the maximal clique centrality (MCC) algorithm in the cyto-Hubba plugin was used to mark the top 10 key genes.

Differential Expression Analysis.
We compared the three groups of samples in the abovementioned high-throughput gene expression matrix pair by pair, and the limma package [12] in R software (4.0.5) was used for differential expression analysis. e calculation method of the difference between any two groups was that the latter was compared with the former. e ratio was the fold change (FC) value, and then, the logarithmic value based on 2 was used. is value was represented by log 2 FC. If the log 2 FC value is greater than 0, it means that the latter was higher than the former. It was an upregulated gene and was represented by red. Otherwise, it is downregulated gene and was indicated in blue. According to the FDR criterion proposed by Benjamini and Hochberg in 1995, the adjusted p value was calculated by multiplying the p value by the ranking of the gene in the total genes. In differential gene expression analysis, the difference was considered to be meaningful when the multiplicity of difference was greater than 2, filter |log 2 FC| greater than 1, p value less than 0.05, and adjusted p value less than 0.05 as filter conditions. e volcano map and heat map were drawn. R software was used to compare the upregulated genes and downregulated genes of the three groups of normal, OLK, and OSCC with the most clinically relevant gene modules and drew the Venn diagram to obtain the key genes. e key genes were used as a diagnostic model to identify whether OLK was cancerous.
2.6. Small-Molecule Drug Prediction. Connectivity Map (https://clue.io/) is a database of chemical reagent action expression profiles. Researchers can use gene expression profiles to match chemical drugs in the database. e degree  of enrichment is expressed by scores. A positive number means that the chemical drug action is the same as the expression profile results. Negative numbers represent the opposite. e higher the score, the more similar and the better the prediction effect. e gene expression profile of OLK vs. OSCC was entered into the Query tool in the database. We entered the 150 upregulated genes and 150 downregulated genes with the largest fold difference, and the top 5 smallmolecule chemical drugs with the highest scores in the results were selected to be molecularly docked with key genes.

Molecular
Docking. e three-dimensional structure of the protein molecules was downloaded from the AlphaFold database (https://alphafold.ebi.ac.uk/), and the small-molecule drugs were downloaded from the PubChem database (https://pubchem.ncbi.nlm.nih.gov/). e AutoDock Vina 1.2.0 for molecular docking was used for analyzing the threedimensional structure. e PyMOL 2.5.2 software was used to draw three-dimensional docking images, and the Discovery Studio 2021 software was used to draw two-dimensional docking images. It is generally believed that when the binding energy is less than −4.25 kcal/mol, there is a certain binding activity between the small-molecule drug and the protein.
When the binding energy is less than −5.0 kcal/mol, it is considered that the two have good binding activity, and when the binding energy is less than −7.0 kcal/mol, it is considered that the two have strong binding activity.

Verification in the External Dataset.
e OLK group and OSCC group were extracted in the dataset of this study, and the expression level and diagnostic efficacy of the diagnostic model were compared. en, GSE26549 [16] was used to verify the diagnostic model. Affymetrix's microarray sequencing chip was used in GSE26549; the sample type was Homo sapiens tissue samples; and the sequencing platform was the GPL6244 (Affymetrix Human Gene 1.0 ST Array). e samples included 86 OLK tissues. e median follow-up time was 6.08 years, and 35 of the 86 patients developed OSCC over the course. R software (4.0.5) was used to compare the expression differences of four key genes in different pathological grades of OLK. Since the samples of the severe dysplasia were less than 3, 2 severe dysplasia samples and 1 sample without pathological type were not included in the study. But all samples are included in the comparison of hyperplasia and dysplasia. en, the receiver operating characteristic curve (ROC) is analyzed, and the diagnostic performance of different diagnostic models in the dataset is compared. In order to dig deeper into the functions of the key genes, the Gene Expression Profiling Interactive Analysis (GEPIA, http://gepia.cancer-pku.cn/) database was used to calculate the overall survival rate and disease-free survival rate, and its expression characteristics were observed in tumors.

Protein Expression in Healthy and Tumor Tissues.
e Human Protein Atlas (HPA, http://www.proteinatlas.org/) database provides multispecies, multitissue, and multisite immunohistochemical staining sections, including a large number of normal tissues and cancer tissues. It contains immunohistochemical staining results of a variety of proteins and currently contains more than 26,000 kinds of antibodies. It was used to compare the expression levels of key genes in the normal head and neck tissues and squamous cell carcinoma tissues.

Removal of the Batch Effect of Gene Expression Profiling
Chip. After the two datasets of GSE85195 and GSE25099 were merged, they contained 13,905 genes and 129 samples, including 23 normal tissue samples, 15 OLK samples, and 91 OSCC samples. ere were no missing values in the gene expression matrix. Figure 2 shows the principal component analysis (PCA) results before and after batch correction. Before the batch correction, the datasets were clustered and the datasets were separated, indicating that there was an obvious batch effect. After batch correction, the datasets were clustered and the tissue samples were separated, indicating that the batch effect had been removed and could be used for subsequent analysis.

Construction of Gene Coexpression Network.
We performed cluster analysis on all samples and found no outliers (Figure 3(a)), so all samples were included for subsequent analysis. First, we constructed a gene relationship network: when R 2 is more than 0.8 and the average connectivity is less than 100 (Figures 3(b) and 3(c)), the best soft threshold was selected as 6.
en, we constructed the gene module according to the selection of the above soft threshold. When the branch height was less than 0.25, it was considered that the gene similarity exceeds 75% and could be merged into one module, and the genes that cannot match any module were merged into a gray module. A total of 23 modules were identified ( Figure 3(d)).
Subsequently, cluster analysis was performed on all gene modules (Supplementary Figure 1A), and a correlation heat map was drawn (Supplementary Figure 1B). It was found that each gene module has little correlation with other modules and has a strong correlation with its own module. e adjacency relationship or topological overlap relationship was represented by a heat map ( Supplementary Figure 2), and it was found that each gene had a strong correlation with the module gene. e correlation between the genes in the module was represented by a density map (Supplementary Figure 3). It was found that except for the gray module, the correlation coefficients of genes and modules in other modules were all basically distributed above 0.5, indicating that the genes in the module were highly correlated with each related module. ese results suggested that the module construction was reasonable. Supplementary Figure 4 shows the relationship between module connectivity and genes. e connectivity of each gene in the module was the sum of its correlation with other genes. We found that except for the gray module, the greater the correlation coefficient between a gene and a module, the . e shapes represent different datasets, the circle is GSE25099, the triangle is GSE85195, the colors represent different tissue samples, dark blue represents normal tissue, red represents OLK tissue, and green represents OSCC tissue.      greater the mean connectivity of the gene, the more the gene was at the core of the module, and the network in each module conforms to the scale-free network. Finally, the correlation between the gene module and the clinical phenotype was analyzed (Figure 3(e)). It was found that the red module had the strongest correlation with the disease development (r � −0.91, p � 2e − 50), which contained a total of 435 genes. e correlation coefficient of modular genes gradually decreased with the progression of the disease. Figures 3(f )-3(i) show the correlation analysis of module membership (MM) and gene significance (GS) in the red module for the four phenotypes of normal, OLK, OSCC, and status. MM was the correlation between gene expression and module eigengene, and GS was the correlation coefficient between gene and phenotype. e correlation coefficient in the normal group was 0.87 (p � 1.6e − 135), the correlation coefficient in the OLK group was 0.52 (p � 1.7e − 31), the correlation coefficient in the OSCC group was 0.96 (p < 1e − 200), and the correlation coefficient in the status group was 0.97 (p < 1e − 200). We found that in the red module, MM and GS were related to the phenotype genes with stronger correlations that had higher module eigenvalues, and red modules were highly correlated with each phenotype. e expression trend of the red module gene in the sample showed that as the characteristic value of the module increases, the expression of the gene in the sample increases (Figure 3(j)), indicating that this module gene could be used to describe the sample.

Functional Enrichment Analysis.
ere were 60 GO analysis results, including 28 biological processes, 11 cell components, and 21 molecular functions (p < 0.05), sorted by p value from small to large. Figure 4(a) shows the top 10 biological processes, which were mainly related to melanin biosynthetic process, vitamin D metabolic process, exogenous drug catabolic process, xenobiotic metabolic process, mammary gland epithelial cell differentiation, left/right axis specification, ion transport, steroid metabolic process, biotin metabolic process, and epithelial cell differentiation. Figure 4(b) shows the first 10 cellular components, which were mainly related to melanosome membrane, melanosome, organelle membrane, cell cortex, extracellular exosome, endosome membrane, mitochondrion, intermediate filament, extrinsic component of membrane, and serotonin-activated cation-selective channel complex. Figure 4(c) shows the first 10 molecular functions, which were mainly related to oxidoreductase activity/acting on paired donors/ with the incorporation of or reduction in molecular oxygen, monooxygenase activity, heme binding, protein homodimerization activity, oxidoreductase activity, iron ion binding, steroid hydroxylase activity, oxygen binding, transcription factor binding, and transcription corepressor activity.
ere were 10 KEGG analysis results totally shown in Figure 4(d), which were mainly related to 4 statistically significant pathways. ey were metabolic pathways, drug metabolism-cytochrome P450, linoleic acid metabolism, and arginine and proline metabolisms (p < 0.05).

Construction of a Protein Interaction Network Diagram.
e genes in the red module were entered into the STRING database, Homo sapiens were selected as the species, and the noninteracting proteins were hidden. ere were a total of 342 pairs of protein interactions. e network diagram formed is shown in Figure 4(e). e MCC algorithm in Cytoscape software could predict the protein or gene at the core position in the network and highlight the first 10 types. e darker the color indicated, the gene was more likely to be the core gene. ese genes were as follows: TP53, UBC, CYP3A4, GTPBP4, RBM28, DCT, TYR, TYRP1, PDCD11, and CYP2C19 (Figure 4(f )).

Differential Expression Analysis.
e results of a pairwise comparison of the normal group, the OLK group, and the OSCC group showed that in comparing the normal group with the OSCC group, 176 genes were screened out according to the filter conditions, among them 101 genes were upregulated and 75 genes were downregulated ( Figure 5(a) and 5(b)); in comparing the OLK group and the OSCC group, 780 genes were screened out, and among them, 337 genes were upregulated and 443 genes were downregulated (Figures 5(c) and 5(d)); in comparing the normal group with the OSCC group, 847 genes were screened out, and among them, 419 genes were upregulated and 428 genes were downregulated (Figures 5(e) and 5(f )).
e results of the pairwise difference analysis among the normal group, the OLK group, and the OSCC group are intersected with the genes in the red module. ere was no overlap in the upregulated genes, and there were 4 key genes in the downregulated genes, which were the dopachrome tautomerase (DCT) gene, keratin 3 (KRT3) gene, keratin 76 (KRT76) gene, and FAM3 metabolic regulation signal molecule B (FAM3B) gene (Figures 5(g) and 5(h)). ese four key genes were marked in the volcano map.

Small-Molecule Drug Screening.
e Connectivity Map database was used to compare and analyze the gene expression profile. We selected small-molecule drugs that could adjust the expression profile and ranked the top 5 according to the scoring from high to low (Table 2). e first place was selumetinib, which was a MEK inhibitor and MAP kinase inhibitor, with an enrichment score of −96.76. e second place was benidipine, which was a calcium channel blocker and L-type calcium channel blocker with an enrichment score of −95.74. e third place was levetiracetam, which was an acetylcholine receptor agonist, N-type calcium channel blocker, and synaptic vesicle glycoprotein ligand, with an enrichment score of −93.23. e fourth place was ampicillin, which was a cell wall synthesis inhibitor, with an enrichment score of −92.88. e fifth place was aminolevulinic acid, which was an oxidizing agent, with an enrichment score of −92.37.  OSCC group (f ). Intersection of differential genes in the healthy group, OLK group, and OSCC group compared with the genes in the red module (g-h). Upregulated genes (g). Downregulated genes (h).    e binding energy of benidipine and FAM3B was −6 kcal/mol, showing a good binding activity.

Verification in the External Dataset.
In the combined dataset of GSE85195 and GSE25099, the expression of DCT, KRT76, and FAM3B in the OSCC group was significantly lower than that of the OLK group, and the expression of KRT3 in the OSCC group was lower than that of the OLK group (p < 0.05) (Figure 7(a)).
e ROC curve analysis showed that the area under the curve in which DCT, KRT3, KRT76, and FAM3B distinguished OLK from OSCC is 0.946 (CI � 0.900-0.991), 0.665 (CI � 0.451-0.880), 0.893 (CI � 0.802-0.984), and 0.911 (CI � 0.844-0.977), respectively (Figure 7(b)). When the four indicators jointly distinguished OLK from OSCC, the area under the curve was 0.963 (CI � 0.913-1.000), sensitivity 0.933, and specificity 0.923, which had high diagnostic accuracy (Figure 7(c)). e dataset of GSE26549 was downloaded to compare the differences between four key genes in different pathological grades of OLK and analyze the ROC curve of the diagnostic model in OLK and OSCC. e results showed that the difference between hyperplasia and moderate dysplasia of DCT and FAM3B was statistically significant (p < 0.05), and the differences in other genes among the three groups were not statistically significant (Figure 7(d)). e comparison of hyperplasia and dysplasia showed that the difference between the two groups of DCT was statistically significant (p < 0.05), and the difference of other genes between the three groups was not statistically significant (Figure 7(e)).
According to the pan-cancer study of the GEPIA database, DCT was highly expressed in skin cutaneous melanoma (SKCM) and almost not expressed in normal skin tissues and other tissues (Figure 8(a)). e expression of KRT3 was highest in the normal esophageal epithelium and oral mucosal epithelium of the head and neck. It was almost not expressed in other tissues of the body, and its expression decreased when it underwent cancerous transformation (Figure 8(b)). e expression of KRT76 was the highest in normal head and neck oral mucosal epithelial tissues, it was almost not expressed in other tissues of the body, and its expression was reduced when cancer occurs (Figure 8(c)). e expression of FAM3B in tumors of various systems was sometimes upregulated and sometimes downregulated, and its expression was decreased in head and neck squamous cell carcinoma (HNSC) (Figure 8(d)), which validated our results.
Survival analysis showed that there was a difference in the overall survival rate between the two groups with high and low expression of DCT gene (logrank p � 0.04) (Figure 8(e)), and the high expression group was the risk group. e overall survival rate of the FAM3B gene highand low-expression groups was different (logrank p � 0.046) (Figure 8(h)), and the low-expression group was the highrisk group. ere were also differences in the survival curves of disease-free survival between the two groups (logrank p � 0.013) (Figure 8(l)). e low-expression group was the high-risk group, and the remaining survival curves had no statistical difference (Figures 8(f )

Protein Expression in Normal and
Tumor Tissues in the HPA Database. We searched the expression levels of four genes in normal oral epithelial tissues and tumor tissues, but KRT3 was not found in the database. HPA database showed that DCT expression was higher in normal oral epithelial cells than in cancerous cells (Figures 8(m) and 8(n)). e cytoplasmic/membranous expression of KRT76 in oral epithelial cells was high in normal epithelial tissues, and its expression was reduced in squamous cell carcinoma (Figures 8(o) and 8(p)). e cytoplasmic/membranous expression of FAM3B in oral epithelial cells was reduced when  the epithelial tissue undergoes cancerous change (Figures 8(q) and 8(r)). e conclusion was consistent with our above research results.

Discussion
In this study, the multichip joint analysis solved the problem of insufficient sample size in oral cancer research. e gene changes in the process of phenotypic changes of epithelium from normal to abnormal hyperplasia and cancer were analyzed. e one-step analysis of WGCNA among the three groups of samples simplifies the analysis process and does not require pairwise comparison analysis, and the correlation analysis results with the clinical phenotype can be directly determined.
e key genes screened by the intersection of the gene module with the strongest correlation of WGCNA and differential expression analysis not only retain the most relevant genes in the WGCNA results with clinical phenotype but also retain the different genes in the differential expression analysis results and save the step of subjectively screening and optimizing the analysis process.
e four genes screened in this study are not only significantly different between OLK and OSCC, but DCT is also different in simple hyperplasia and dysplasia, and DCT and FAM3B are also different in hyperplasia and moderate dysplasia. Other genes are not significantly different in different pathological grades of OLK, and the reason may be due to the small sample size. Severe dysplasia is most prone to cancer, so severe dysplasia of OLK is extremely difficult to obtain, which has a certain impact on the results.
DCT, also named as tyrosinase-related protein 2 (tyrosine 2, TYR2), is one of the three important enzymes for melanin synthesis in the human body. e lack of DCT can cause albinism. Studies have shown that the risk of squamous cell carcinoma in patients with albinism is 1,000 times higher than that of the general population [22]. Cell experiments [23] and pan-cancer studies have shown that primary melanoma, metastatic melanoma, and nevus occur  Survival curve of KRT3 gene disease-free survival rate (j). Survival curve of KRT76 gene disease-free survival rate (k). Survival curve of FAM3B gene disease-free survival rate (l). DCT expression in normal oral epithelial tissues (m). DCT expression in oral squamous cell carcinoma (n). Expression of KRT76 in normal oral epithelial tissues (o). Expression of KRT76 in oral squamous cell carcinoma (p). FAM3B expression in normal oral epithelial tissues (q). FAM3B expression in oral squamous cell carcinoma (r).
when DCT is overexpressed, and squamous cell carcinoma occurs when DCT is expressed at a low level. KRT3 and KRT76 are the skeletal components of cells, which are mainly distributed in the skin, mucous membrane, esophagus, and other areas with a high degree of keratinization, and their expression is significantly reduced when cancerous change occurs. KRT3 mutations are related to corneal dystrophy [24]. Immunohistochemistry and KRT76 knockout mouse experiments showed that the expression of KRT76 was downregulated in normal oral tissues, oral precancerous tissues, and oral cancer tissues [25]. e reason for increased tumor susceptibility is related to the change in the tumor microenvironment and immune factors [26]. FAM3B is a cytokine-like protein that regulates glucose and lipid metabolism by interacting with the liver and pancreas. Its elevated expression is related to type 2 diabetes, colon cancer, prostate cancer, and more. Staining of tissue sections showed that FAM3B expression decreased in OSCC tissues. Mouse experiments showed that knockdown of FAM3B promotes cell apoptosis by upregulating p53 in mice [27], and the expression of p53 protein increases in OSCC [28]. erefore, FAM3B might cause OSCC by upregulating p53 protein.
GO enrichment analysis in the red module showed that OLK was related to the abnormal expression of multiple pathways, including vitamin D metabolic process, steroid metabolic process, exogenous drug catabolic process, and so on. Ras activation suppressed vitamin D transcriptional activity, vitamin D levels fell, and the risk of OLK cancer and OSCC increased. e prognosis and overall quality of life of patients with OSCC were affected by abnormal expression of the vitamin D metabolic pathway [29]. As a steroid hormone derivative, vitamin D also inhibited the activation of the NF-κB pathway mediated by lcn2 via RPS3, enhancing the susceptibility of OSCC to cisplatin and the efficacy of treatment [30]. Patients with OSCC had considerably greater plasma and saliva cortisol levels than patients with OLK, patients with OLK had higher cortisol levels than normal people, and those with advanced stages had higher cortisol levels than those with early stages. As a result, OSCC is related to aberrant steroid metabolism [31]. Meanwhile, the drug efficacy of the patients with OSCC was influenced by the exogenous drug catabolic process. Endogenous and external triggers such as pH, matrix metalloproteinases (MMPs), reactive oxygen species (ROS), redox conditions, light, and magnetic fields could activate the stimulating response drug delivery system and improve the prognosis [32].
KEGG enrichment analysis in the red module showed that drug metabolism-cytochrome P450 and arginine, linoleic acid metabolism, and proline metabolism were related to OSCC. Cytochrome P450 2R1 (CYP2R1) is a vitamin D 25-hydroxylase that is involved in the conversion of dietary vitamin D to the active metabolite 25-(OH)-D 3 . CYP2R1 and vitamin D receptor (VDR) mRNA expression considerably rose in OSCC, according to the real-time RT-PCR study [33]. Simultaneously, the expression of the cytochrome P450 subtypes 1A1 and 1B1 (CYP1A1 and CYP1B1) genes increased in the head and neck cancer (HNC) cell line [34], elucidating the involvement of cytochrome P450 in OSCC. In the amino acid codon 72 of the p53 protein, there is a single-nucleotide polymorphism that encodes arginine (Arg) or proline (Pro). e arginine genotype of the OSCC group lowered the risk of oral cancer compared to the normal control group; however, the proline allele raised the risk of OSCC [35], showing the importance of arginine and proline metabolism in OSCC. e relevant results are consistent with the conclusions of this study.
OLK is an important step in the process of oral mucosal carcinogenesis, which requires long-term close monitoring and follow-up. e current monitoring of OLK is limited to clinical manifestations and lacks clear molecular indicators of cancer. e key genes screened in this study provide high-performance diagnostic indicators for the long-term monitoring of OLK and early diagnosis of OSCC. e genes DCT, KRT3, and FAM3B are reported for the first time in OSCC, and their expression remains to be evaluated. More samples are still needed to expand to verify the conclusions, and more intensive biological verification should be carried out in cell experiments and animal experiments.
Data Availability e data that support the findings of this study are openly available in GSE85195, GSE25099, and GSE26549 at https:// www.ncbi.nlm.nih.gov/geo/.

Disclosure
Chunshen Li and Yingying Shi are the co-first authors.

Conflicts of Interest
e authors declare that there are no conflicts of interest.

Supplementary Materials
Supplementary Figure 1: clustering tree between modules (A). Clustering heat map between modules (B). e abscissa and ordinate are the module names. Red is high similarity, and blue is low similarity. Supplementary Figure 2: the relationship between gene clusters and modules in each module. e darker the color, the stronger the correlation. Supplementary Figure 3: density plot of gene number and correlation coefficient. e abscissa is the correlation coefficient between genes and modules, and the ordinate is the number of genes. It is used to observe the distribution of correlation coefficients between each module and the gene expression in the module. A density map of 4 modules is drawn in each graph. Supplementary Figure 4: the relationship between connectivity and gene correlation in each module. e abscissa is the connectivity of each gene in each module, and the ordinate is the correlation coefficient between each gene in each module and the module. (Supplementary Materials)