MicroRNA Gene Interaction in Amyotrophic Lateral Sclerosis Dataset

All microRNAs (miRNAs) in this amyotrophic lateral sclerosis (ALS) study were collected from public databases such as miRBase, miR2Disease, and the Human microRNA and Disease Database (HMDD). These miRNA datasets were used for target identification; the miRNAs are expressed in specific parts of the brain, namely, midbrain, cerebellum, frontal cortex, and hippocampus. Gene information and sequences were collected from the NCBI and KEGG databases. All miRNAs were used for target prediction against 35 ALS-associated genes. Three programs were used for target identification, namely, miRanda, TargetScan, and PicTar. The dataset contains information about the miRNA target sites identified by each program. Pairwise intersection studies of the three programs (miRanda versus TargetScan, miRanda versus PicTar, and TargetScan versus PicTar) were carried out with all datasets. Target sites identified by each program were further explored for their distribution across the 35 genes in the 5′ UTR, CDS, and 3′ UTR for miRNAs expressed in midbrain, cerebellum, frontal cortex, and hippocampus. The dataset was also used for calculation of multiplicity and cooperativity; this information was then used for construction of a complex gene-microRNA interaction map.


Introduction
Neurological disorders have become a major health problem in recent years, and it is projected that the number of people affected by these disorders will double every 20 years [1]. The burden associated with these diseases is especially high in low income and developing countries. The most promising approaches must be used for rehabilitation and for reducing the burden of neurological disorders. This requires healthcare policies that strengthen neurological care within the existing healthcare system. ALS is the most common neuromuscular disease and affects younger and older people of all races and ethnic backgrounds. It is caused by mutations in SOD1 and other associated genes, which lead to ALS-related motor neuron degeneration. There is no specific cure for ALS. However, the FDA has approved the first drug treatment for the disease, riluzole (Rilutek), which is believed to reduce damage to motor neurons [2]. It increases survival in ALS patients, but it is not a very effective treatment. Thus the search for novel ALS therapeutics remains a major challenge, and promising new technologies such as RNAi are a boon for healthcare. miRNAs are small, potent, noncoding regulatory RNAs that control almost 60% of human genes and are implicated in several diseases. Various studies have shown that miRNAs are highly expressed in the brain, where they are involved in maintaining normal brain function, neuron differentiation, synaptic plasticity, neuronal degeneration, and so on [2][3][4][5][6][7]. Williams et al. have shown recently that miRNA-206 delays ALS progression and promotes regeneration of neuromuscular synapses in mice. They have also shown that histone deacetylase 4 (HDAC4), which controls neuromuscular gene expression, is among the strongest computationally predicted targets of miR-206. Several studies have shown that brain specific miRNAs control important neurological pathways involved in neuronal signalling, so there is an urgent need to search for brain specific miRNA interactions in neurological pathways such as those of ALS [7][8][9][10][11]. This dataset describes the interaction of brain associated miRNAs with ALS pathway specific genes, which is an important future direction for the study of ALS pathogenesis.

Methodology
The aim of this study is to identify miRNA targets in ALS associated genes and to construct an interaction map of miRNAs and ALS genes. The idea is to screen all the miRNAs that are affiliated with neurological disorders. We did an extensive literature search for miRNAs related to ALS and other neurological disorders. We generated miRNA data from databases such as the Human microRNA and Disease Database (HMDD) (http://202.38.126.151/hmdd/mirna/md/) and the miR2Disease database (http://www.mir2disease.org/) and from other relevant references. HMDD has integrated 10,177 entries, which include 617 miRNA genes and 438 diseases from 3266 publications [12]. It is a manually curated database of miRNA-disease associations taken from the literature. It contains miRNA names, disease names, dysfunction evidence, and the PubMed ID of the supporting literature. Each miRNA is linked to miRBase for detailed genome annotations. miR2Disease is also a manually curated database; it is updated bimonthly [13]. It provides detailed information on each miRNA-disease relationship, including miRNA ID, disease name, a brief description of the miRNA-disease relationship, miRNA expression pattern in the disease state, detection method for miRNA expression, experimentally verified miRNA target gene(s), and the literature references. We also collected brain specific miRNAs, as all neurological diseases are associated with the brain. Broadly, we divided this dataset into four brain specific parts: midbrain, cerebellum, frontal cortex, and hippocampus. miRNAs can be expressed in any part of the body, so they are not exclusive to one part. Thus, midbrain miRNAs can also be expressed in the cerebellum or any other part of the brain or body. We took 112 midbrain specific miRNAs, 62 cerebellum specific miRNAs, 105 frontal cortex specific miRNAs, and 93 hippocampus specific miRNAs. To remove the resulting redundancy, all miRNA datasets were combined, duplicate miRNAs were removed, and all unique miRNAs were used for miRNA target identification.
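The merging step described above can be sketched as a simple set union. This is a minimal illustration only: the assignment of miRNA names to brain regions below is hypothetical, and the function name `unique_mirnas` is ours rather than part of any tool used in the study.

```python
# Hypothetical miRNA name lists per brain region (illustrative only);
# the real region-specific sets hold 112, 62, 105, and 93 entries.
region_mirnas = {
    "midbrain": ["hsa-miR-127-3p", "hsa-miR-149", "hsa-miR-206"],
    "cerebellum": ["hsa-miR-149", "hsa-miR-874"],
    "frontal_cortex": ["hsa-miR-125a-5p", "hsa-miR-127-3p"],
    "hippocampus": ["hsa-miR-874", "hsa-miR-142-5p"],
}


def unique_mirnas(sets_by_region):
    """Return the sorted union of all region lists, dropping duplicates."""
    merged = set()
    for names in sets_by_region.values():
        merged.update(names)
    return sorted(merged)


unique = unique_mirnas(region_mirnas)
print(len(unique), "unique miRNAs")
```

Keeping the per-region lists alongside the merged list lets each predicted target still be reported by brain region later on.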
Mature miRNA sequences were collected from miRBase (http://www.mirbase.org/), which is a searchable database of published miRNA sequences and annotation [14]. Each entry in the miRBase sequence database represents a predicted hairpin portion of a miRNA transcript, with information on the location and sequence of the mature miRNA. Hairpin and mature sequences can be browsed, and entries can be retrieved by name, keyword, references, and annotation. All sequence and annotation data are also available for download.
The gene sequences were collected from the KEGG pathway database (http://www.genome.jp/kegg/). KEGG pathway is a collection of manually drawn pathway maps representing current knowledge on the molecular interaction and reaction networks for metabolism, genetic information processing, environmental information processing, cellular processes, organismal systems, and human diseases such as neurodegenerative diseases [15]. We selected one neurodegenerative disease pathway, that of amyotrophic lateral sclerosis (ALS). In total, 35 genes were selected from the ALS pathway. KEGG provides a link for every gene, giving access to information such as gene name, definition, orthology, organism, pathway, drug targets, class, SSDB, motif, other DBs, structure, position, amino acid sequence, and nucleotide sequence. All the relevant information and gene sequences were collected. Other gene information, such as gene length and its division into 5′ UTR, CDS, and 3′ UTR, was collected from NCBI (http://www.ncbi.nlm.nih.gov/).
There are many tools available for miRNA target identification, but very few have good sensitivity and specificity. To address this, we used three well-known tools for target identification which have reliable sensitivity and specificity: miRanda, TargetScan, and PicTar. miRanda is an algorithm for the detection of potential miRNA target sites in genomic or gene sequences. miRanda reads RNA sequences (such as miRNAs) from one file and genomic DNA/RNA sequences from another file. Both files should be in FASTA format. We supplied FASTA files for the miRNA sequences and for each gene sequence separately to miRanda. The algorithm scans the miRNA sequences against all gene sequences and reports potential target sites. It identifies potential target sites using a two-step strategy. In the first step, a dynamic programming local alignment is carried out between the query miRNA sequence and the reference sequence. The alignment scores are based on sequence complementarity and not on sequence identity; that is, the algorithm looks for A:U and G:C matches instead of A:A, G:G, and so on. The G:U wobble pair is also permitted but generally scores lower than the more optimal matches. In the second step, the method uses folding routines from the RNAlib library, which is part of the ViennaRNA package. At this stage it generates a constrained fictional single-stranded RNA composed of the query sequence, a linker, and the reference sequence (reversed). This structure is then folded using RNAlib and its minimum free energy (ΔG, kcal/mol) is calculated. Finally, detected targets with energies below an energy threshold are selected as potential target sites. Target site alignments passing both thresholds are produced as output, together with other information [16,17].
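The first miRanda step, local alignment scored on complementarity rather than identity, can be illustrated with a small Smith-Waterman-style sketch. This is not miRanda's implementation: the score values, the single linear gap penalty, and the function names are assumptions chosen for illustration, and the free-energy folding step is not modelled.

```python
# Illustrative score values (assumptions, not miRanda's exact matrix).
PAIR_SCORES = {
    ("A", "U"): 5, ("U", "A"): 5,
    ("G", "C"): 5, ("C", "G"): 5,
    ("G", "U"): 1, ("U", "G"): 1,  # wobble pair: allowed, but scores lower
}
MISMATCH = -3
GAP = -4  # single linear gap penalty, for simplicity


def pair_score(a, b):
    """Score a base pair by complementarity, not identity."""
    return PAIR_SCORES.get((a, b), MISMATCH)


def local_complementarity_score(mirna, target):
    """Best local alignment score between a miRNA (5'->3') and a target
    site (5'->3'); the target is reversed so pairing is antiparallel."""
    mirna = mirna.upper().replace("T", "U")
    target = target.upper().replace("T", "U")[::-1]
    prev = [0] * (len(target) + 1)
    best = 0
    for a in mirna:
        curr = [0]
        for j, b in enumerate(target, start=1):
            score = max(
                0,                              # start a new local alignment
                prev[j - 1] + pair_score(a, b),  # pair the two bases
                prev[j] + GAP,                   # gap in the target
                curr[j - 1] + GAP,               # gap in the miRNA
            )
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# A perfectly complementary site outscores an identical sequence.
print(local_complementarity_score("AAGGC", "GCCUU"))
print(local_complementarity_score("AAGGC", "AAGGC"))
```

In a full pipeline, sites whose alignment score passes the score threshold would then be folded to obtain ΔG, and only sites passing both thresholds would be kept.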
The following cutoff values were used for prediction of target sites using the miRanda algorithm: gap open penalty: 2.0, gap extension: 8.00, match score ≥ 150.00, duplex free energy (ΔG) = −25.00 kcal/mol, and scaling parameter = 3.00. The selected gene sequences and mature miRNA sequences were used as reference and query sequences, respectively, as input to miRanda.
The TargetScan algorithm predicts miRNA targets in mammals by searching for conserved 8mer and 7mer sites that match the seed region of each miRNA. It can also predict nonconserved sites as an option. It also identifies sites with mismatches in the seed region that are compensated by conserved 3′ pairing. In mammals, predictions are ranked by the predicted efficacy of targeting, calculated using the context scores of the sites; as an option, predictions can also be ranked by their probability of conserved targeting. It uses annotated human UTRs and their orthologs, as defined by UCSC whole-genome alignments. Conserved targeting has also been detected within open reading frames (ORFs) [18].
PicTar is also an algorithm for the identification of miRNA targets. It is a searchable resource which provides details of 3′ UTR alignments with predicted sites, links to various public databases, and so forth. It also provides miRNA target predictions in vertebrates, seven Drosophila species, and three nematode species, as well as human miRNA targets that are not conserved but are coexpressed [19].
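The seed-matching idea behind the 8mer and 7mer sites mentioned for TargetScan can be sketched as a plain string search. This is a simplified illustration under stated assumptions, not TargetScan itself: it ignores conservation and context scores, uses the commonly described site definitions (7mer-m8: match to miRNA positions 2-8; 7mer-A1: match to positions 2-7 followed by an A; 8mer: both), and the miRNA sequence and function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_seed_sites(mirna, utr):
    """Return (start, site_type) tuples, 0-based, for seed sites in utr."""
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    core = reverse_complement(mirna[1:7])  # pairs with miRNA positions 2-7
    pos8 = COMPLEMENT[mirna[7]]            # pairs with miRNA position 8
    sites = []
    for i in range(len(utr) - len(core) + 1):
        if utr[i:i + len(core)] != core:
            continue
        has_m8 = i > 0 and utr[i - 1] == pos8
        has_a1 = utr[i + len(core):i + len(core) + 1] == "A"
        if has_m8 and has_a1:
            sites.append((i - 1, "8mer"))
        elif has_m8:
            sites.append((i - 1, "7mer-m8"))
        elif has_a1:
            sites.append((i, "7mer-A1"))
        # a bare 6mer match (neither flank) is not reported
    return sites


# Hypothetical miRNA and UTR fragments, for illustration only.
example_mirna = "UGGAAUGUAAGGAAGUGUGUGG"
print(find_seed_sites(example_mirna, "GGGACAUUCCAGGG"))
```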
Initially it was believed that miRNAs target only the 3′ untranslated region (3′ UTR) in animals, whereas in plants they can target the 5′ untranslated region (5′ UTR), the coding region (CDS), and the 3′ UTR. However, there is growing evidence that miRNAs can also target the CDS [20][21][22][23][24] and the 5′ UTR in animals. All target sites identified by the three programs were collected and tabulated in Excel files for all 35 genes. These sites were divided into 5′ UTR, CDS, and 3′ UTR. miRNAs expressed in various parts of the nervous system and their target sites were also explored in the three regions. In total, 1456 target sites were predicted using miRanda for the miRNAs considered in the study. The majority of target sites were found in the CDS, whereas target sites in the 3′ UTR were comparatively fewer; a considerable number of targets were also found in the 5′ UTR of the genes. This suggests that the CDS is more prone to miRNA target regulation than the 3′ UTR and 5′ UTR. All predicted miRNA targets were mapped on the 35 genes with respect to 5′ UTR, CDS, and 3′ UTR. Figure 4 shows the target sites predicted on the BAX gene by the miRanda program. The BAX gene is situated on chromosome 19 in the 19q13.3-q13.4 region. On analyzing miRNA targets on the BAX gene, one hotspot is identified in the 3′ UTR region. This hotspot has four miRNAs on the target site: hsa-miR-127-3p, hsa-miR-149, hsa-miR-125a-5p, and hsa-miR-874. Mapping of other miRNAs was also carried out on the BAX gene.
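The division of predicted sites into 5′ UTR, CDS, and 3′ UTR can be sketched as a comparison of each site position with the CDS coordinates of the gene. The sketch below uses the CDS range listed for SLC1A2 in the dataset (593-2317 bp); the site positions are hypothetical, and assigning a site by its start position alone is our simplifying assumption, since the text does not say how boundary-spanning sites were handled.

```python
from collections import Counter


def classify_site(site_start, cds_start, cds_end):
    """Assign a 1-based site start position to a transcript region."""
    if site_start < cds_start:
        return "5' UTR"
    if site_start <= cds_end:
        return "CDS"
    return "3' UTR"


def region_distribution(site_starts, cds_start, cds_end):
    """Count sites per region for one gene."""
    return Counter(classify_site(s, cds_start, cds_end) for s in site_starts)


# CDS range for SLC1A2 as given in the dataset; site starts are made up.
counts = region_distribution([100, 600, 1500, 3000], cds_start=593, cds_end=2317)
print(dict(counts))
```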
Several miRNA hotspots were also identified; a miRNA hotspot is a region of sequence that is prone to the action of a group of miRNAs. Many miRNAs bind at the same site, share target sites, or bind in the vicinity of other miRNAs; these regions are commonly known as "hotspots." miRNAs that occupy the same hotspot are usually coregulated and coexpressed and are involved in important biological functions. A miRNA target region which shows a minimum 10-nucleotide overlap at the miRNA binding region and an overlap of at least three miRNA targets is considered a hotspot. In total, 11 hotspots were identified on the 35 genes in the 5′ UTR, CDS, and 3′ UTR regions. Multiplicity and cooperativity of the miRNA dataset were calculated. It is known that one miRNA can control more than one gene, which is called multiplicity, and that one gene can be controlled by more than one miRNA, which is called cooperativity [17]. The top 10 miRNAs showing the highest number of targets (multiplicity) and cooperativity were selected and used for constructing a gene-miRNA interaction map. A complex interaction map among the selected 35 genes and the top miRNAs was constructed. To reduce false-positive noise from computational miRNA target prediction and to get reliable results, we looked for miRNA targets predicted in common by the three programs (miRanda, TargetScan, and PicTar). Figures 1-35 represent the mapping position of miRNAs, the chromosome number, and the location of each gene on the chromosome; for example, Figure 1 is the schematic representation of miRNA target sites on the ALS2 gene, located on chromosome 2 at 2q33.1. Figures 36 and 37 are schematic representations of miRNA conserved regions (hotspots) on two genes, BCL2 and MAP2K3, respectively.
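The hotspot rule (at least three miRNAs sharing at least 10 nucleotides) and the multiplicity and cooperativity counts can be sketched as follows. This is one possible reading, not the authors' procedure: a hotspot is taken here as a maximal run of positions covered by at least three distinct miRNA sites, multiplicity is counted as the number of distinct genes targeted by a miRNA, and cooperativity as the number of distinct miRNAs targeting a gene. All miRNA names, coordinates, and function names are hypothetical.

```python
from collections import Counter, defaultdict


def find_hotspots(sites, min_overlap=10, min_mirnas=3):
    """sites: (mirna, start, end) tuples, 1-based inclusive, on one gene.
    Returns (start, end, mirnas) for each run of positions covered by at
    least min_mirnas distinct miRNAs and at least min_overlap nt long."""
    cover = defaultdict(set)
    for mirna, start, end in sites:
        for pos in range(start, end + 1):
            cover[pos].add(mirna)
    dense = sorted(p for p, names in cover.items() if len(names) >= min_mirnas)

    hotspots, run = [], []
    for pos in dense + [None]:  # None flushes the final run
        if run and (pos is None or pos != run[-1] + 1):
            if len(run) >= min_overlap:
                members = sorted(set().union(*(cover[p] for p in run)))
                hotspots.append((run[0], run[-1], members))
            run = []
        if pos is not None:
            run.append(pos)
    return hotspots


def multiplicity_and_cooperativity(pairs):
    """pairs: (mirna, gene) predictions; duplicates are counted once."""
    unique_pairs = set(pairs)
    multiplicity = Counter(mirna for mirna, _ in unique_pairs)
    cooperativity = Counter(gene for _, gene in unique_pairs)
    return multiplicity, cooperativity


# Hypothetical sites on one gene and hypothetical miRNA-gene pairs.
sites = [("miR-a", 100, 121), ("miR-b", 105, 126),
         ("miR-c", 110, 131), ("miR-d", 300, 321)]
print(find_hotspots(sites))

pairs = [("miR-a", "BAX"), ("miR-a", "BCL2"), ("miR-b", "BAX"), ("miR-a", "BAX")]
mult, coop = multiplicity_and_cooperativity(pairs)
print(mult.most_common(10))  # ranking used to pick the top miRNAs
```

Ranking miRNAs with `most_common` mirrors the selection of top miRNAs for the interaction map, with the counted quantity being an assumption as noted above.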

Dataset Description
The dataset associated with this Dataset Paper consists of 7 items which are described as follows.

A gene's location on a chromosome describes its chromosome number, the arm of the chromosome (the longer arm is called q and the shorter arm is called p), position (distance from the centromere), and so forth. The location of gene SLC1A2 is 11p13-p12, meaning that it is located on chromosome 11, on the p (shorter) arm, at a position between bands p13 and p12. The last column, Function of Genes, lists the function of the genes.

... were used for prediction of miRNA targets by the miRanda program. miRanda is used for prediction of target sites across the genes in the 5′ UTR, CDS, and 3′ UTR regions. This item has information on the distribution of target sites in these regions. Column Serial Number represents the serial numbers; column Gene Name lists the names of the 35 genes associated with this study. Columns miRNAs Expressed in Midbrain in 5′ UTR, miRNAs Expressed in Midbrain in CDS, and miRNAs Expressed in Midbrain in 3′ UTR contain the target distribution of miRNAs expressed in midbrain across the 5′ UTR, CDS, and 3′ UTR regions. For gene number 1, ALS2, 2 miRNA targets are located in the 5′ UTR, whereas 17 and 1 target sites were identified in the CDS and 3′ UTR, respectively. Targets identified for cerebellum specific miRNAs are present in columns miRNAs Expressed in Cerebellum in 5′ UTR, miRNAs Expressed in Cerebellum in CDS, and miRNAs Expressed in Cerebellum in 3′ UTR; for the ALS2 gene, 2, 3, and 0 targets were identified in the 5′ UTR, CDS, and 3′ UTR, respectively. One interesting observation in this study is that, for all types of brain specific miRNA, the CDS has more targets than the 5′ UTR and 3′ UTR.

Hotspot regions possess a minimum overlap of three miRNA targets at miRNA binding regions. Eleven hotspots were identified in the 5′ UTR, CDS, and 3′ UTR regions on 9 of the 35 genes, so these are important genes with significant biological functions. Columns Serial Number and Gene Symbol list the serial numbers and gene symbols, respectively. Column Hotspots at 5′ Region has the information on hotspots in the 5′ region; for the BAX gene, only 1 hotspot is present in the 5′ UTR region, whereas there are no hotspots in the CDS and 3′ regions. Hotspot information for the CDS and 3′ regions is present in columns Hotspots at CDS Region and Hotspots at 3′ Region, respectively; for the BAX gene, the single hotspot is in the 5′ UTR region.

Concluding Remarks
Repression of candidate genes is of paramount importance in the battle against neurological disorders such as ALS; it can be achieved by means of a single miRNA or a group of miRNAs. The ALS datasets associated with this study describe miRNA action on genes and the important miRNAs involved in ALS. They also present a complex miRNA interaction map of 35 genes. Most computational algorithms predict miRNA targets in the 3′ UTR only, but this study gives a comprehensive map of miRNA targets in the 5′ UTR and CDS regions as well. Interestingly, we found that more miRNA target sites were present in the CDS and 5′ UTR than in the 3′ UTR. This suggests that miRNA regulation may be highly active in these regions, and further in vivo investigation may be required to confirm it. miRNA conserved targets, or hotspots, are important regions of genes and can be used as efficient regions for knockdown. Mapping of miRNA targets in ALS affiliated genes gives a comprehensive map that can guide the choice of important miRNA markers for ALS. In this study we used an integrated approach, combining various computational programs to predict common miRNA targets; this is an effective way of reducing the noise in complex datasets and can be applied to other data associated with health and disease.

Dataset Availability
The dataset associated with this Dataset Paper is dedicated to the public domain using the CC0 waiver and is available at http://dx.doi.org/10.1155/2014/780726/dataset.

Figure 7: Schematic representation of miRNA target sites on BID.

Figure 21: Schematic representation of miRNA target sites on TNF.

Column 1: Serial Number
Column 2: Gene Symbol
Column 3: Hotspots at 5′ Region
Column 4: Hotspots at CDS Region
Column 5: Hotspots at 3′ Region
This item (Table) contains information on the genes associated with amyotrophic lateral sclerosis. 35 genes were selected from the ALS pathway (KEGG); information on the genes was collected from NCBI, KEGG, and other relevant databases. Column Serial Number lists the serial numbers from 1 to 35. Column Gene Symbol lists the gene symbols of the 35 genes, as every gene has a unique symbol abbreviating its name. Gene number 1 has the symbol SLC1A2 (solute carrier family 1 (glial high affinity glutamate transporter), member 2). Most genes have synonyms; the SLC1A2 gene has the synonyms EAAT2, GLT-1, and so forth. This information is present in column Synonyms. Column Accession Number has the accession numbers; the SLC1A2 gene has accession number NM_004171. Every gene has a unique gene ID. Gene IDs are present in column Gene ID; the SLC1A2 gene has gene ID 6506. Each gene's length in base pairs (bp) is present in column Gene Length; the SLC1A2 gene is 12021 bp long. The coding region, or coding DNA sequence (CDS), of a gene is the region that codes for protein; it is composed of exons. The CDS position of each gene is present in column CDS Position; for gene SLC1A2, the CDS is 593-2317 bp. Column Gene Location has information about the gene's location, that is, its cytogenetic band.

This item (Table) contains 3′ UTR specific targets identified by the TargetScan program. In total, 3214 targets were identified in the 35 genes using the four sets of brain specific miRNAs. Column Serial Number represents the serial numbers; column Gene Name lists the names of the 35 genes associated with this study. Column miRNAs Expressed in Midbrain in 3′ UTR has the number of targets of midbrain specific miRNAs in the 3′ UTR region; for the ALS2 gene, 26 targets were identified using TargetScan. Cerebellum specific miRNA targets are present in column miRNAs Expressed in Cerebellum in 3′ UTR; here the ALS2 gene has 17 targets in the 3′ UTR. Column miRNAs Expressed in Hippocampus in 3′ UTR has the targets of hippocampus specific miRNAs; 21 targets were identified for the ALS2 gene. Frontal cortex specific miRNA targets are present in column miRNAs Expressed in Frontal Cortex in 3′ UTR; for the ALS2 gene, 25 targets were identified.

Table 1: Number of target sites predicted in 3′ UTR using miRanda, TargetScan, and PicTar.

Results obtained using TargetScan, PicTar, and miRanda were used for comparison of target sites, that is, for common target site prediction. This item (Table) contains a comparison of the target site prediction results obtained using TargetScan and PicTar. Column miRNA has the list of miRNAs. Column Start Position of Target Site in TargetScan contains the start position of the target site for a particular miRNA; for hsa-miR-142-5p, the start position is 1203 and end position of
Column 4: Start Position of Target Site in PicTar
Column 5: Gene Name

Dataset Item 7 (Table). miRNA hotspots are the regions which show a minimum 10-nucleotide overlap at miRNA binding regions and possess a minimum overlap of three miRNA targets.