Prediction of RNA structure is useful for designing new drugs and for understanding genetic diseases. In this paper, we propose a particle swarm optimization (PSO) and ant colony optimization (ACO) based framework (PAF) for RNA secondary structure prediction. PAF consists of two stages: crucial stem searching (CSS) and global sequence building (GSB). In CSS, a modified ACO (MACO) searches for crucial stems and generates a set of stems. In GSB, a modified PSO (MPSO) combines these stems into one structure for the whole sequence. We evaluated PAF on ten sequences ranging in length from 122 to 1494 nucleotides and compared it with six well-known existing methods: SARNA-Predict, RnaPredict, ACRNA, PSOfold, IPSO, and mfold. The comparison shows that PAF not only predicts structures with higher accuracy but also finds crucial stems.
RNA functions as an information carrier, catalyst, and regulatory element, perhaps reflecting its importance in the earliest stages of evolution. The structures of RNAs provide insight into the mechanisms behind these functions. Determining the sequence is the first step toward determining the structure, and many billions of nucleotide sequences are now known. The second step is determining the secondary structure, yet relatively few classes of RNA currently have known secondary structures [
The metaheuristic methods [
The main contributions of this paper are as follows. First, we propose PAF, a framework for RNA secondary structure prediction that consists of CSS and GSB. Second, in CSS, we propose MACO to search for crucial stems. Third, in GSB, we design MPSO to combine all the stems into one structure for the sequence.
The rest of the paper is organized as follows. Section
The ACO algorithm is inspired by the behavior of real ant colonies, in particular by how they forage for food. ACO was formalized as a metaheuristic for combinatorial optimization problems by Dorigo and coworkers [
In ACO, an ant
PSO originated from the simulation of social behavior of birds in a flock [
Kennedy and Eberhart [
The function
See Algorithm
Initialize the parameters of MACO
Randomly initialize the solutions for all the ants
While current number of iterations < Max iteration
    For each ant in the population
        For each stem
        End for
        Evaluate the solution according to the energy
    End for
    For each stem in the set
        Update the pheromones according to (
    End for
End while
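The MACO loop above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stem encoding (i, j, n), the per-stem inclusion rule (include with probability w/(1 + w), where w = tau^alpha * eta^beta), the heuristic eta (stem length), the toy energy (-1 per base pair instead of a thermodynamic model), and all parameter values are assumptions made for illustration. It keeps the ACO structure of the pseudocode: each ant builds a set of mutually compatible stems, each solution is evaluated by its energy, and pheromones evaporate and are then reinforced on the stems of the best solution.

```python
import random

# Hypothetical stem encoding (assumption): (i, j, n) pairs nucleotide i+k with j-k for k in 0..n-1.


def pairs(stem):
    i, j, n = stem
    return [(i + k, j - k) for k in range(n)]


def compatible(a, b):
    """Stems may coexist if they share no nucleotide and do not cross (no pseudoknots)."""
    used = {x for p in pairs(a) for x in p}
    if any(x in used for p in pairs(b) for x in p):
        return False
    (a0, a1, _), (b0, b1, _) = a, b
    return not (a0 < b0 < a1 < b1 or b0 < a0 < b1 < a1)


def energy(solution):
    """Toy energy (assumption): -1 per base pair; lower is better."""
    return -sum(n for _, _, n in solution)


def aco_select_stems(stems, n_ants=20, n_iter=50, alpha=1.0, beta=1.0, rho=0.1, seed=0):
    rng = random.Random(seed)
    tau = [1.0] * len(stems)                   # pheromone per candidate stem
    eta = [float(n) for _, _, n in stems]      # heuristic desirability: stem length
    best, best_e = [], 0
    for _ in range(n_iter):
        iter_best, iter_best_e = [], 0
        for _ in range(n_ants):
            order = list(range(len(stems)))
            rng.shuffle(order)                 # each ant visits the stems in its own order
            chosen = []
            for k in order:
                w = tau[k] ** alpha * eta[k] ** beta
                # Include stem k with probability w / (1 + w) if it fits the partial solution.
                if rng.random() < w / (1.0 + w) and all(
                    compatible(stems[k], stems[c]) for c in chosen
                ):
                    chosen.append(k)
            e = energy([stems[c] for c in chosen])
            if e < iter_best_e:
                iter_best, iter_best_e = chosen, e
        if iter_best_e < best_e:
            best, best_e = iter_best, iter_best_e
        # Evaporate all trails, then reinforce the stems of the iteration-best solution.
        tau = [(1.0 - rho) * t for t in tau]
        for k in iter_best:
            tau[k] += rho * -iter_best_e
    return [stems[k] for k in best], best_e


stems = [(0, 20, 4), (5, 15, 3), (2, 18, 3), (22, 30, 2)]
solution, e = aco_select_stems(stems)
print(solution, e)
```

On this toy input the best compatible combination is (0, 20, 4), (5, 15, 3), and (22, 30, 2), with energy -9; the stem (2, 18, 3) shares nucleotides with (0, 20, 4), so the two can never appear together.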
MPSO was modified based on our previous studies IPSO [
See Algorithm
Initialize all the parameters of MPSO
While current number of iterations < Max iteration
    For each particle
        Update its velocity
        Update its position
        Restrict position and velocity
        Calculate fitness and update local best
    End for
    Update the global best
    Tune the parameters of MPSO via fuzzy logic controllers
End while
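The MPSO loop can likewise be sketched in Python with Kennedy and Eberhart's discrete binary PSO, in which each position bit is set to 1 with probability sigmoid(velocity). This is a hypothetical illustration, not the authors' MPSO: the encoding (one bit per candidate stem from CSS), the toy fitness (-1 per base pair plus a penalty of 10 per clashing stem pair), the values of c1, c2, and v_max, and the linearly decreasing inertia weight (0.9 to 0.4), which stands in for the fuzzy logic controllers, are all assumptions.

```python
import math
import random


def pairs(stem):
    i, j, n = stem  # hypothetical encoding: pairs (i+k, j-k) for k in 0..n-1
    return [(i + k, j - k) for k in range(n)]


def compatible(a, b):
    used = {x for p in pairs(a) for x in p}
    if any(x in used for p in pairs(b) for x in p):
        return False
    (a0, a1, _), (b0, b1, _) = a, b
    return not (a0 < b0 < a1 < b1 or b0 < a0 < b1 < a1)


def fitness(bits, stems, penalty=10):
    """Hypothetical fitness (lower is better): -1 per base pair, +penalty per clashing stem pair."""
    chosen = [s for b, s in zip(bits, stems) if b]
    score = -sum(n for _, _, n in chosen)
    for x in range(len(chosen)):
        for y in range(x + 1, len(chosen)):
            if not compatible(chosen[x], chosen[y]):
                score += penalty
    return score


def binary_pso(stems, n_particles=20, n_iter=100, c1=2.0, c2=2.0, v_max=4.0, seed=1):
    rng = random.Random(seed)
    d = len(stems)
    x = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_particles)]
    v = [[rng.uniform(-v_max, v_max) for _ in range(d)] for _ in range(n_particles)]
    pbest = [row[:] for row in x]
    pbest_f = [fitness(row, stems) for row in x]
    g = min(range(n_particles), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for t in range(n_iter):
        # Linearly decreasing inertia weight: a stand-in for the fuzzy controllers (assumption).
        w = 0.9 - 0.5 * t / max(1, n_iter - 1)
        for p in range(n_particles):
            for k in range(d):
                r1, r2 = rng.random(), rng.random()
                v[p][k] = (w * v[p][k]
                           + c1 * r1 * (pbest[p][k] - x[p][k])
                           + c2 * r2 * (gbest[k] - x[p][k]))
                v[p][k] = max(-v_max, min(v_max, v[p][k]))  # restrict velocity
                # Binary position: the bit is 1 with probability sigmoid(velocity).
                x[p][k] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-v[p][k])) else 0
            f = fitness(x[p], stems)
            if f < pbest_f[p]:                            # update local (personal) best
                pbest[p], pbest_f[p] = x[p][:], f
        g = min(range(n_particles), key=pbest_f.__getitem__)
        if pbest_f[g] < gbest_f:                          # update global best
            gbest, gbest_f = pbest[g][:], pbest_f[g]
    return [s for b, s in zip(gbest, stems) if b], gbest_f


stems = [(0, 20, 4), (5, 15, 3), (2, 18, 3), (22, 30, 2)]
chosen, f = binary_pso(stems)
print(chosen, f)
```

Because positions are binary, "restrict position" is implicit here; only the velocity is clamped to [-v_max, v_max], which also keeps every bit flippable with probability at least sigmoid(-4), about 0.018.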
The parameter details of ACRNA are number of ants = 100, number of iterations = 600,
The measures most commonly used in published studies to assess prediction accuracy are sensitivity, specificity, and F-measure.
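As a sketch, the three measures can be computed as follows, assuming the definitions that are standard in the RNA secondary structure literature: Se = TP/(TP + FN), Sp = TP/(TP + FP) (so "specificity" here is the positive predictive value, not TN/(TN + FP)), and F-measure = 2·Se·Sp/(Se + Sp). These definitions reproduce the entries of the PAF/mfold comparison table; for example, TP = 82, FP = 170, and FN = 151 give 35.2, 32.5, and 33.8. The helper names `confusion` and `accuracy` are illustrative, and `confusion` assumes exact matching of base pairs (i, j), which the text does not state explicitly.

```python
def confusion(known, predicted):
    """TP, FP, FN from base-pair sets; exact (i, j) matching is assumed."""
    known, predicted = set(known), set(predicted)
    tp = len(known & predicted)
    return tp, len(predicted - known), len(known - predicted)


def accuracy(tp, fp, fn):
    """Return (sensitivity, specificity, F-measure) as fractions."""
    se = tp / (tp + fn) if tp + fn else 0.0          # share of known pairs that were predicted
    sp = tp / (tp + fp) if tp + fp else 0.0          # share of predicted pairs that are correct
    f = 2 * se * sp / (se + sp) if se + sp else 0.0  # harmonic mean of Se and Sp
    return se, sp, f


# Counts from the first data row of the PAF/mfold comparison table.
se, sp, f = accuracy(82, 170, 151)
print(f"{100 * se:.1f} {100 * sp:.1f} {100 * f:.1f}")  # 35.2 32.5 33.8

# Tiny illustrative structure (hypothetical pairs).
known = [(1, 20), (2, 19), (3, 18), (5, 15)]
predicted = [(1, 20), (2, 19), (4, 17)]
print(confusion(known, predicted))  # (2, 1, 2)
```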
Ten sequences from the Comparative RNA Web Site were selected to evaluate the proposed method; their details are listed in Table
RNA sequence details.
| Organism | Accession number | RNA class | Length (nt) | Base pairs in known structure |
|---|---|---|---|---|
| | X05914 | 16S rRNA | 784 | 233 |
| | M27605 | 16S rRNA | 945 | 251 |
| | AF034620 | 5S rRNA | 122 | 38 |
| | U40258 | Group I intron, 16S rRNA | 468 | 113 |
| | AF197120 | Group I intron, 23S rRNA | 394 | 120 |
| | L19345 | Group I intron, 16S rRNA | 543 | 138 |
| | U02540 | Group I intron, 16S rRNA | 556 | 131 |
| | J01415 | 16S rRNA | 954 | 266 |
| | Y08511 | 16S rRNA | 964 | 265 |
| | D14876 | 16S rRNA | 1494 | 468 |
Table
A comparison of the highest matching base pair structures from PAF and mfold in terms of sensitivity, specificity, and F-measure.
| Sequence | Known bps | Predicted bps (PAF) | Predicted bps (mfold) | TP (PAF) | TP (mfold) | FP (PAF) | FP (mfold) | FN (PAF) | FN (mfold) | Sensitivity % (PAF) | Sensitivity % (mfold) | Specificity % (PAF) | Specificity % (mfold) | F-measure % (PAF) | F-measure % (mfold) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 233 | | 252 | | 82 | | 170 | | 151 | | 35.2 | | 32.5 | | 33.8 |
| | 251 | | 245 | | 113 | | 132 | | 138 | | 45.0 | | 46.1 | | 45.6 |
| | 38 | | 34 | 28 | | | 5 | 10 | | 73.7 | | | 85.3 | | 80.6 |
| | 113 | | 133 | | 74 | | 59 | | 39 | | 65.5 | | 55.6 | | 60.2 |
| | 120 | | 116 | 91 | | | 24 | 29 | | 75.8 | | | 79.3 | | 78.0 |
| | 138 | | 167 | 82 | | | 84 | 56 | | 59.4 | | | 49.7 | | 54.4 |
| | 131 | | 174 | | 95 | | 79 | | 36 | | 72.5 | | 54.6 | | 62.3 |
| | 266 | 260 | | | 95 | | 163 | | 171 | | 35.7 | | 36.8 | | 36.3 |
| | 265 | 249 | | | 74 | | 167 | | 191 | | 27.9 | | 30.7 | | 29.2 |
| | 468 | | 496 | | 271 | | 225 | | 197 | | 57.9 | | 54.6 | | 56.2 |
| Averages | 202.3 | | 211.6 | | 100.8 | | 110.8 | | 101.5 | | 55.28 | | 52.52 | | 53.66 |
Table
A comparison of the highest matching base pair structures from SARNA-predict, RnaPredict, ACRNA, PAF, PSOfold, and mfold in terms of sensitivity and specificity.
| Sequence | Length (nt) | SARNA-Predict Se | SARNA-Predict Sp | RnaPredict Se | RnaPredict Sp | ACRNA Se | ACRNA Sp | PAF Se | PAF Sp | PSOfold Se | PSOfold Sp | mfold Se | mfold Sp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 784 | 42.1 | | 27.9 | 26.9 | 27.9 | 26.9 | | 41.7 | 42.9 | 41.5 | 35.2 | 32.5 |
| | 945 | | 48.9 | 37.1 | 38.8 | 45.8 | 46.9 | | | 45.6 | 48.5 | 45.0 | 46.1 |
| | 122 | 71.1 | 90.0 | 71.1 | 90.0 | | 78.4 | 73.7 | | 71.1 | 87.1 | | 85.3 |
| | 468 | 74.3 | 64.6 | 60.2 | 51.9 | 63.7 | 54.1 | 75.2 | | | 64.7 | 65.5 | 55.6 |
| | 394 | 63.3 | 65.0 | 62.5 | 62.0 | 63.3 | 65.0 | 75.8 | | 75.0 | 74.4 | | 79.3 |
| | 543 | | 56.7 | 57.2 | 49.1 | 58.0 | 47.9 | 59.4 | | 58.0 | 53.0 | 60.1 | 49.7 |
| | 556 | 74.0 | 58.8 | 61.8 | 50.3 | 73.3 | 55.2 | | | 76.3 | 59.2 | 72.5 | 54.6 |
| | 954 | 44.7 | | 33.5 | 35.6 | 44.7 | | | 47.7 | 44.4 | 47.2 | 35.7 | 36.8 |
| | 964 | 35.1 | 38.9 | 30.9 | 33.9 | 34.3 | 38.7 | | | 35.1 | 39.6 | 27.9 | 30.7 |
| | 1494 | 52.4 | 51.9 | 52.4 | 51.9 | 58.3 | 58.0 | | | 58.3 | 58.0 | 57.9 | 54.6 |
| Averages | 722.4 | 56.77 | 56.58 | 49.46 | 49.04 | 54.56 | 51.99 | | | 58.28 | 57.32 | 55.28 | 52.52 |
To assess the stability of the proposed method, we ran ACRNA, IPSO, PAF, and PSOfold ten times each and averaged the sensitivity and specificity of the highest-matching structures found by each algorithm. Detailed results are shown in Table
Average sensitivity and specificity via ACRNA, IPSO, PAF, and PSOfold.
| Sequence | Length (nt) | ACRNA Se | ACRNA Sp | IPSO Se | IPSO Sp | PAF Se | PAF Sp | PSOfold Se | PSOfold Sp |
|---|---|---|---|---|---|---|---|---|---|
| | 784 | 25.1 | 24.5 | 27.8 | 27.1 | | | 32.6 | 29.9 |
| | 945 | 40.2 | 38.8 | 38.7 | 36.7 | | | 42.4 | 40.1 |
| | 122 | 70.5 | 73.6 | 67.8 | 68.1 | | | 69.2 | 75.4 |
| | 468 | 61.3 | 52.6 | 60.5 | 51.8 | 62.4 | | | 52.6 |
| | 394 | 60.9 | 63.5 | 57.1 | 56.2 | | | 65.1 | 63.3 |
| | 543 | 56.2 | | 50.1 | 41.3 | | 47.8 | 54.4 | 47.8 |
| | 556 | 67.1 | 54.1 | 63.2 | 47.3 | | 52.9 | 68.5 | |
| | 954 | 34.1 | 32.1 | 31.9 | 33.5 | | | 32.4 | 33.3 |
| | 964 | 31.1 | 30.2 | 30.2 | | | 31.8 | 32.6 | 31.8 |
| | 1494 | 56.0 | 55.4 | 52.5 | 51.4 | 54.5 | | | 55.9 |
| Averages | 722.4 | 50.25 | 47.36 | 47.98 | 44.59 | | | 51.76 | 48.46 |
Average best sensitivity over 10 runs of each algorithm on
Average best sensitivity over 10 runs of each algorithm on
Average best sensitivity over 10 runs of each algorithm on
In this paper, we proposed PAF, a framework for RNA secondary structure prediction that consists of CSS and GSB. To preserve crucial structures, MACO in CSS searches for the important stems; to reduce the search space, MPSO in GSB builds the predicted structures from these stems. The experimental results show that the proposed method performs significantly better than the other metaheuristic methods in terms of sensitivity, specificity, and F-measure.
This research was supported by The Chinese Government’s Executive Program “Instrumentation development and field experimentation” (SinoProbe-09).