Microarray datasets are widely used resources for predicting and characterizing functional entities across the whole genome. The present study aims to identify overexpressed stress-responsive genes from microarray datasets by applying
Plant stresses are a major cause of food insecurity and thus a major threat to mankind [
Plants respond to environmental stresses at both the cellular and molecular levels by altering the expression of many genes via different types of complex molecular signaling networks [
Various abiotic stresses, such as drought, high salinity, and variable temperature, negatively impact plant growth and productivity. Plants have adapted to respond to these stresses at the molecular, cellular, physiological, and biochemical levels, enabling them to survive. Various adverse environmental stresses induce the expression of a variety of genes in many plant species [
In current study,
The whole working procedure is demonstrated in Figure
(a) A schematic diagram of the whole experimental procedure, showing all databases and software used. (b) A screenshot of the tools used for the literature search. The Alibaba and Highwire search tools were used extensively in this study.
In the current study, microarray datasets were retrieved from the ArrayExpress database [
Microarray datasets collected for this study.
Serial | Accession | Description of the data |
---|---|---|
1 | E-MEXP-3714 | To identify novel miRNA and NAT-siRNAs that are associated with salt and cold stresses in |
2 | E-GEOD-42290 | Expression data in an |
3 | E-GEOD-45543 | Microarray analysis of transcriptional responses to abscisic acid and salt stress in |
4 | E-GEOD-33642 | Genome-wide profiling of small RNAs in |
All collected datasets were downloaded and copied into an Excel sheet. The aim was to create a scatter plot from the log-scale expression values generated by the microarray experiments. A simple chart layout was created from the values corresponding to the samples represented in the study.
Cytoscape version 3.1.0 (
All expressed genes from the physically interacting nodes were entered into Venny to find the common genes/proteins, and a Venn diagram was produced as an output file (
Those commonly upregulated genes were taken for further analysis. Protein-protein interactions were identified using STRING 9.05 (
The initial pool of stress response genes was generated through extensive text mining and literature search. For searching literature associated with
Target genes were searched with the BLASTn suite from NCBI (
Gene ontology program [
The functions of the targeted genes were determined by identifying their protein domains using InterProScan [
The robustness of the network was analyzed by checking several network parameters. The statistical probability was calculated following the methods described by Zaman et al. [
Initially, four microarray datasets were taken from the ArrayExpress database. Sample data were collected and arranged in an Excel sheet. Sample expression was assessed on the basis of log values, and only values greater than 5 were considered in this study. Figure
Chart layouts for the datasets collected for this study. The values of the represented transcripts ranged from 0 to 20 and were classified into four distinct groups: 0–5, 5–10, 10–15, and 15–20. All transcripts are upregulated at the range of up to 5–10.
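The grouping and thresholding described here can be illustrated with a minimal sketch. It assumes that each transcript has a single log expression value, that the four groups are half-open intervals with the top group closed at 20, and that "more than 5" means strictly greater than 5. The transcript identifiers, the values, and the function names `bin_label` and `summarize` are hypothetical and are not taken from the study.

```python
from collections import Counter

# Bin edges follow the four groups named in the figure caption (assumption:
# half-open intervals, with the last one closed on the right).
BIN_EDGES = [0, 5, 10, 15, 20]
THRESHOLD = 5.0  # only log values above this are retained (assumed strict)


def bin_label(value):
    """Return the label of the expression group a log value falls into."""
    for low, high in zip(BIN_EDGES, BIN_EDGES[1:]):
        is_top_edge = high == BIN_EDGES[-1] and value == high
        if low <= value < high or is_top_edge:
            return f"{low}-{high}"
    raise ValueError(f"log value {value} is outside the 0-20 range")


def summarize(expression):
    """Count transcripts per group and list those above the threshold."""
    counts = Counter(bin_label(v) for v in expression.values())
    selected = sorted(t for t, v in expression.items() if v > THRESHOLD)
    return counts, selected


# Made-up transcript identifiers and log values, for illustration only.
example = {"tx_a": 3.2, "tx_b": 5.0, "tx_c": 7.9, "tx_d": 12.4, "tx_e": 16.1}
counts, selected = summarize(example)
print(dict(counts))
print(selected)
```

In this toy input, `tx_b` sits exactly on the threshold and is therefore counted in the 5-10 group but not selected; whether the original screening treated boundary values this way is not stated in the text.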
The collected data were further screened based on the expression mentioned in Figure
Numbers of transcripts selected for further study.
Serial | Treatment | Number of transcripts |
---|---|---|
1 | ABA | 643 |
2 | Drought | 526 |
3 | Cold | 1023 |
4 | Salinity | 977 |
The selected transcript IDs were then imported into Cytoscape, with the aim of creating an expression hub based on the physical interaction values as well as the maximum expression parameters. The hub generated (Figure
An expression hub generated by Cytoscape. (a) The calculated clustering coefficient was 0.657, the network diameter was 14 at radius 1, the number of nodes was 680, and the network centralization was 0.099. (b) Clustering of genes based on the calculated physical distance. Three visible clusters were found when the coefficient value was kept at 0.5.
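To make the reported topology parameters concrete, the following is a minimal sketch of how an average clustering coefficient, a diameter, and a radius can be computed for a small undirected graph. It is not the Cytoscape implementation. It assumes an unweighted, connected network, and it assumes that nodes with fewer than two neighbors contribute a clustering coefficient of 0. The edge list and all function names are hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def average_clustering(graph):
    """Mean over nodes of (links among neighbors) / (possible links)."""
    total = 0.0
    for neighbors in graph.values():
        k = len(neighbors)
        if k < 2:
            continue  # assumption: such nodes count as 0
        # Each neighbor-neighbor link is seen twice, once from each end.
        links = sum(1 for a in neighbors for b in graph[a] if b in neighbors) / 2
        total += 2 * links / (k * (k - 1))
    return total / len(graph)


def eccentricities(graph):
    """Longest shortest-path distance from each node (breadth-first search)."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for nxt in graph[current]:
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        result[source] = max(dist.values())
    return result


# Hypothetical five-node network: a triangle A-B-C with a tail C-D-E.
toy = build_graph([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")])
ecc = eccentricities(toy)
clustering = average_clustering(toy)
diameter = max(ecc.values())
radius = min(ecc.values())
print(round(clustering, 3), diameter, radius)
```

The diameter is the largest eccentricity and the radius is the smallest, so for a connected network the radius can never exceed the diameter.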
The commonly expressed (i.e., upregulated) and physically connected genes found under the different stress signals were then sorted with a Venn diagram, so that common transcripts could easily be isolated from the transcript pool. The Venn diagram showed that only 42 commonly upregulated genes were found in the selected datasets (Figure
A Venn diagram representing the common transcripts from the collected datasets. Note that the four collected datasets were generated from
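The central region of a four-way Venn diagram is the intersection of the four gene sets. A minimal sketch of that step is given here, assuming each treatment is represented as a set of gene identifiers. The identifiers and the function name `common_genes` are placeholders and do not correspond to the study's data.

```python
from functools import reduce


def common_genes(gene_sets):
    """Return the identifiers present in every treatment's gene set."""
    if not gene_sets:
        return set()
    return reduce(set.intersection, (set(s) for s in gene_sets.values()))


# Placeholder identifiers, for illustration only.
upregulated = {
    "ABA": {"g1", "g2", "g3", "g5"},
    "Drought": {"g1", "g2", "g4", "g5"},
    "Cold": {"g1", "g2", "g5", "g6"},
    "Salinity": {"g1", "g2", "g5", "g7"},
}
shared = common_genes(upregulated)
print(sorted(shared))
```

Genes that appear in only some of the treatments, such as `g3` or `g7` in this toy input, fall in the outer regions of the diagram and are excluded from the shared set.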
The 42 commonly upregulated genes (Supplementary File 1, see Supplementary Material available online at
Protein-protein interaction networks of plant transcription factors (TFs), enzymes, and regulatory genes in plant abiotic stress responses. Here, regulatory genes and TFs are indicated with arrows. Abiotic stress factors such as drought, salt, cold, and ABA modulated the level and activity of the regulatory genes and their target genes. The box (b) represents an indication of TF proteins from the model plant
A gene regulatory network of thirty commonly upregulated genes in the ABA-dependent, salinity, cold, and drought stress responsive pathways. The bridging between transcription factors and stress responsive proteins clearly indicates their correlation in this figure.
It was revealed that (Figure
In light of the above result, eight genes, comprising TFs and enzymes, bridged one another and brought their downstream targets into the expression hub. These strongly correlated connectomes in abiotic stress tolerance were short-listed (Table
The eight major selected genes/TFs/enzymes with their short forms.
Serial | Gene name | Identity |
---|---|---|
01 | DREB2A (dehydration-responsive element-binding protein 2A) | Transcription factor |
02 | P5CS1 (delta-1-pyrroline-5-carboxylate synthase 1) | Enzyme |
03 | CPL1 (C-terminal domain phosphatase-like 1) | Transcription factor |
04 | ERD5 (early responsive to dehydration 5) | Transcription factor |
05 | NHX1 (Na+/H+ exchanger) | Vacuolar antiporter |
06 | SOS1 (salt overly sensitive 1) | Plasma membrane antiporter |
07 | SOS2 (salt overly sensitive 2) | Protein kinase |
08 | SOS3 (salt overly sensitive 3) | Calcium-dependent protein serine |
In the next part of the results, the characteristics of the eight targeted molecules were examined on the basis of their amino acid sequences, protein domains, individual interactomes, and gene ontology, to obtain an overall picture of the genes in three different domains: biological, molecular, and cellular. Available free tools mentioned in Section
The amino acid sequences of the eight selected genes were collected from the NCBI database and searched with the NCBI BLASTp suite using the protein-protein BLAST algorithm. The conserved domains were retrieved to understand the functional entities of the target proteins. Only homologs close to the query molecule were considered in order to assess protein superfamily conservation (Table
Targeted proteins and their conserved domains found with the protein-protein BLAST algorithm. The coverage and identity of the homologs of the target proteins indicate the conservation of the targeted proteins among species.
Serial | Name | Conserved domain | Homologs | Identity (I) and Coverage (C) | Functions of homologs |
---|---|---|---|---|---|
1 | DREB2A | AP2 superfamily | | C: 100%, I: 99% | Drought responsive elements |
2 | P5CS1 | AAK superfamily/ALDH-SF superfamily | | C: 100%, I: 97% | Encodes a delta-1-pyrroline-5-carboxylate synthase and responsive to abiotic stress tolerance |
3 | CPL1 | NIF superfamily/DSRM superfamily | | C: 99%, I: 63% | Encodes a novel transcriptional repressor harboring two double-stranded RNA-binding domains |
4 | ERD5 | Pro_dh superfamily | | C: 100%, I: 97% | Encodes a proline oxidase and is expressed by high levels of osmotic stress |
5 | NHX1 | TM_PBP1_branched-chain-AA_like superfamily | | C: 100%, I: 99% | Vacuolar antiporter |
6 | SOS1 | TM_PBP1_branched-chain-AA_like superfamily/CAP_ED superfamily | | C: 100%, I: 89% | Ca+ responsive elements |
7 | SOS2 | PKc like superfamily/AAMPKA_C like superfamily | | C: 100%, I: 92% | Protein kinase |
8 | SOS3 | EFh superfamily | | C: 99%, I: 86% | Membrane transporter |
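As an illustration of what the identity (I) and coverage (C) columns measure, the sketch below computes both from a single gapped pairwise alignment. It assumes that identity is the share of alignment columns with identical residues and that coverage is the share of query residues included in the alignment. The sequences and the function name `identity_and_coverage` are made up for the example; they are not BLAST output from the study.

```python
def identity_and_coverage(aligned_query, aligned_subject, query_length):
    """Return (percent identity, percent query coverage) for one alignment.

    Both aligned strings must have the same length and use '-' for gaps.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    columns = len(aligned_query)
    # A column is an identity when both residues match and neither is a gap.
    matches = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-"
    )
    # Query residues that take part in the alignment (gaps excluded).
    query_residues = sum(1 for q in aligned_query if q != "-")
    identity = 100.0 * matches / columns
    coverage = 100.0 * query_residues / query_length
    return identity, coverage


# Made-up aligned fragments: one gap in the query, one mismatch.
identity, coverage = identity_and_coverage("MKT-AYIAKQ", "MKTLAYLAKQ", 10)
print(identity, coverage)
```

A high coverage with a lower identity, as reported for CPL1 in the table, means that almost the whole query aligned but a smaller share of the aligned positions were identical.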
All targeted genes are highly conserved among other species. The mentioned superfamilies (Table
The protein domains (Table
Target proteins and their characterized domains explored using InterProScan.
Serial | Name | Protein domains |
---|---|---|
1 | DREB2A | (1) AP2/ERF domain |
2 | P5CS1 | (1) Gamma-glutamyl phosphate reductase |
3 | CPL1 | (1) Double-stranded RNA-binding |
4 | ERD5 | (1) Proline dehydrogenase |
5 | NHX1 | (1) Na+/H+ exchanger |
6 | SOS1 | (1) Cyclic nucleotide-binding domain |
7 | SOS2 | (1) Protein kinase, catalytic domain |
8 | SOS3 | (1) Recoverin |
Protein-protein interactions were examined using the STRING 9.05 database. The PPI of each target molecule was checked, and the interactions and their stringency relative to each other were calculated (Figure
Calculated distance and physical interaction of the targeted genes with their interacting proteins.
The parameters revealed the robustness of the network. All targeted upregulated proteins could build a protein-protein interaction network among themselves (Figure
Parameters for the integrated network. (a) Average clustering coefficient versus number of neighbors, (b) betweenness centrality versus number of neighbors, (c) closeness centrality versus number of neighbors, (d) average neighborhood connectivity versus number of neighbors, (e) number of nodes versus degree, (f) frequency versus number of shared neighbors, (g) frequency versus path length, (h) number of nodes versus stress centrality, and (i) topological coefficient versus number of neighbors have been plotted.
The gene ontology study helped to identify the correlated known and hypothetical functions of the target protein molecules. It depicted (Table
GO annotation of the targeted proteins, showing their diverse and significant roles during stress.
Name | Direct annotation by GO |
---|---|
DREB2A | Protein binding, response to hydrogen peroxide, regulation of transcription, response to chitin, sequence-specific DNA binding transcription factor activity, response to heat, response to UV-B, response to water deprivation, heat acclimation, and response to high light intensity |
P5CS1 | Response to salt stress, response to abscisic acid, hyperosmotic salinity response, pollen development, root development, response to water deprivation, proline biosynthetic process, delta-1-pyrroline-5-carboxylate synthetase activity, response to desiccation, and response to oxidative stress |
CPL1 | Double-stranded RNA binding, phosphatase activity, response to salt stress, response to wounding, abscisic acid-activated signaling pathway, negative regulation of transcription, DNA-templated, and phosphoserine phosphatase activity |
ERD5 | Glutamate biosynthetic process, response to water deprivation, proline catabolic process, proline dehydrogenase activity, response to oxidative stress, and defense response to bacterium |
NHX1 | Protein binding, vacuolar membrane, lithium ion transport, sodium:hydrogen antiporter activity, response to salt stress, regulation of stomatal closure, sodium ion transmembrane transporter activity, leaf development, and protein import into peroxisome matrix |
SOS1 | Protein binding, response to hydrogen peroxide, lithium ion transport, response to salt stress, sodium:hydrogen antiporter activity, sodium ion transmembrane transport, regulation of reactive oxygen species metabolic process, response to oxidative stress, and response to reactive oxygen species |
SOS2 | Protein binding, response to salt stress, protein kinase activity, plasma membrane, kinase activity, plant-type vacuole membrane, and identical protein binding |
SOS3 | Protein binding, detection of calcium ion, calcium-mediated signaling, cellular potassium ion homeostasis, stomatal movement, calcium ion binding, calcium-dependent protein serine/threonine phosphatase activity, and hypotonic salinity response |
A correlation image was drawn based on the calculation and analysis provided in Supplementary File 2 to address an internal correlation of these eight protein molecules in
Final integration based on the correlation data count (Supplementary File 2).
In model plant,
The initial aim of the study was to find genes that are commonly upregulated under different abiotic stresses, with the ultimate goal of hypothesizing a gene regulatory network. In the current study, gene expression data for four abiotic stress conditions were taken. Building the expression hub led to the identification of the genes/proteins most commonly upregulated across all targeted abiotic stress conditions. These were then sorted using the connectome data, and only the bridging molecules were taken for further study. Bioinformatics tools and databases were then used extensively to characterize each of them in terms of similarity, conservation, protein domains, GO, and individual interactions. All of them turned out to be highly correlated in function while remaining diverse in mechanism. Most of the GO annotations referred to functional entities in common patterns, which helped to create a regulatory network suggesting that these targeted genes/proteins are the most common role players in plants during stress while maintaining some uniqueness. Moreover, most of these targeted genes/proteins have DNA binding properties, which may be a major basis for saying that these molecules are the most competent for initiating the stress tolerance response, as they bring more TFs, enzymes, and/or other regulatory genes into the same string during stress tolerance.
This bioinformatics study, based on online tools and databases and using freely available microarray datasets, shows that some genes are commonly upregulated under various environmental stresses. The proposed protein-protein interaction network may help to explain the mechanism of abiotic stress tolerance, but further validation by wet lab experiments is required. Future wet-bench work is therefore needed to analyze the activity of these genes as a whole and to gain an in-depth understanding of their actual activity under stress conditions, so that it can bring some answers to farmers in the crop sector as well as in nature.
The authors declare that there is no conflict of interests regarding the publication of this paper.
The authors would like to thank BAS-USDA PALS program for funding the project.