This paper addresses the problem of providing a faithful representation and a reliable quality measure for DNA base calling. The proposed method targets data quality in DNA sequencing analysis: it uses Neurofuzzy techniques to predict a confidence value for each called base from the data collected at that base. The simulation model is built with ANFIS and consists of three subsystems and a main system; each subsystem produces one feature, and the main system uses the three features to predict the confidence value for each base. The approach gives effective results with high performance.
A neural fuzzy approach is considered; it combines the advantages of both fuzzy logic and neural networks [
There have been several attempts to apply the Neurofuzzy technique to a wider range of applications. Neurofuzzy approaches combine fuzzy inference systems (FISs) with neural networks (NNs) and inherit the advantages of both, which has made them a successful practical technology in many areas. Neural fuzzy (NF) systems have also found a place in the field of genomics; the paper [
The paper [
The important part of this work is to determine the confidence value by building on the approach used in Phred base calling. By running an algorithm on a very large data set, Phred was able to build a model in the form of a lookup table. The input space of the model consists of trace features such as peak spacing, uncalled/called ratio, and peak resolution. The output space is the resulting quality value. Creating a confidence value for the base calls, in this procedure of traces and quality values using Consed [
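For context, Phred quality values are conventionally related to the estimated probability P that a base call is wrong by Q = -10 log10(P). The short Python sketch below shows only that conversion; the function names are illustrative and are not part of Phred or Consed.

```python
import math


def phred_quality(error_prob: float) -> float:
    """Convert an estimated base-call error probability into a Phred quality value."""
    if not 0.0 < error_prob <= 1.0:
        raise ValueError("error_prob must be in the interval (0, 1]")
    return -10.0 * math.log10(error_prob)


def error_probability(quality: float) -> float:
    """Convert a Phred quality value back into an error probability."""
    return 10.0 ** (-quality / 10.0)
```

Under this convention, a quality value of 20 corresponds to one expected error in 100 calls and a value of 30 to one in 1000.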
Neural fuzzy (NF) systems can achieve higher accuracy within a relatively short training time than neural networks, and they ease the difficulty of choosing and building membership functions that fuzzy logic poses for a given problem. NF models are also more transparent than other techniques and efficient to implement.
On the implementation side, a Neurofuzzy classifier has been used that is able to decode much of the hidden contextual information in two fuzzy rules per base and partially discover its underlying behavior [
A Neurofuzzy design suitable for determining the confidence value of base calling is implemented. The design includes three subsystems that determine three features from the data collected at each base (for more information about the data collection see [
Model overview of ANFIS main confidence system.
We use the Neurofuzzy technique, which is well suited to measuring incorrect calls, so that the analysis and editing process becomes easier and faster and the resulting values give a credible and faithful representation of call quality. The technique is well established and has considerable potential to deliver highly accurate and efficient results in DNA sequencing. The approach proposed here relies on information collected at each base; for more information see [
The design includes three subsystems that determine the three features (peakness, height, and spacing) and a main system that takes the three resulting features as inputs. The confidence value for each base in DNA base calling is determined using the MATLAB tool; see Figure
In this system we use four data files of 500 samples each. Each file is divided into two parts: a training set of about 350 samples and a testing set of about 150 samples. First, in the training step, we load the training set of DNA base-calling samples. A fuzzy inference system is then generated for each system: two inputs and one output for each of the three subsystems, which produce the three features, and three inputs and one output for the main system, which produces the confidence value for each base. We try several configurations of the ANFIS subsystems and the ANFIS main system and choose the one with the lowest average testing error.
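The 350/150 division described above can be sketched as follows. This is a minimal Python illustration, not the MATLAB workflow used in the paper; the function name, the fixed seed, and the decision to shuffle before splitting are assumptions, since the text does not say how samples were assigned to each set.

```python
import random


def split_samples(samples, n_train=350, seed=0):
    """Shuffle a list of samples and split it into training and testing parts."""
    if n_train > len(samples):
        raise ValueError("n_train exceeds the number of samples")
    shuffled = list(samples)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder rows standing in for one 500-sample file (not real trace data).
rows = [(i, i) for i in range(500)]
train, test = split_samples(rows)
```

The same call would be repeated for each of the four files, with one split per subsystem and one for the main system.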
For the membership functions (MFs), we generate five MFs for each input in each system. In the subsystem design, we use triangular MFs for peakness, gauss 2 MFs for height, and trapezoidal MFs for spacing; in the main confidence system, we use gauss 2 MFs. The output is constant for each of the three subsystems and for the main system; see Figure
Membership functions of each feature in the subsystems: (a) peakness, (b) height, and (c) spacing; (d) membership function for the confidence value of the main system.
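The three membership function shapes named above can be written out directly. The sketch below gives Python versions of a triangular, a trapezoidal, and a two-sided Gaussian MF, following the usual definitions behind MATLAB's `trimf`, `trapmf`, and `gauss2mf`. The helper `five_triangles` and its even spacing over [0, 1] are assumptions made for illustration; the trained MF parameters are not given in the text.

```python
import math


def trimf(x, a, b, c):
    """Triangular MF with feet at a and c and peak at b (requires a < b < c)."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)


def trapmf(x, a, b, c, d):
    """Trapezoidal MF with feet at a and d and plateau on [b, c] (requires a < b <= c < d)."""
    return max(min((x - a) / (b - a), 1.0, (d - x) / (d - c)), 0.0)


def gauss2mf(x, sigma1, c1, sigma2, c2):
    """Two-sided Gaussian MF: Gaussian tail left of c1, Gaussian tail right of c2."""
    left = math.exp(-((x - c1) ** 2) / (2 * sigma1 ** 2)) if x < c1 else 1.0
    right = math.exp(-((x - c2) ** 2) / (2 * sigma2 ** 2)) if x > c2 else 1.0
    return left * right


def five_triangles(x, lo=0.0, hi=1.0):
    """Memberships of x in five evenly spaced triangular MFs (hypothetical initial partition)."""
    step = (hi - lo) / 4
    centers = [lo + i * step for i in range(5)]
    return [trimf(x, c - step, c, c + step) for c in centers]
```

With evenly spaced triangles, an input between two neighbouring peaks belongs partly to both, and the five memberships sum to one inside the range.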
Training is then performed with backpropagation in the neural fuzzy system for 500 epochs, and the system is tested by loading the testing set. Among the configurations tried, we select the one that gives the highest accuracy; see Figure
ANFIS testing results for each feature in the subsystems: (a) peakness, (b) height, and (c) spacing; (d) ANFIS testing results for the confidence value in the main system.
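With constant outputs, each system can be read as a zero-order Sugeno model: every rule fires with a strength computed from its input memberships, and the prediction is the strength-weighted average of the rule constants. The Python sketch below evaluates such a model and scores predictions against targets. It is an illustration under stated assumptions, not the trained MATLAB model: the product AND operator, the full grid of rules (five MFs on each of two inputs gives 25 rules, and on three inputs 125), the plain Gaussian memberships, the use of root-mean-square error as the testing error, and every parameter value are hypothetical.

```python
import itertools
import math


def gauss(x, center, sigma):
    """Plain Gaussian membership value of x."""
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))


def sugeno_predict(inputs, centers, sigma, constants):
    """Zero-order Sugeno output for one sample.

    inputs:    one value per input variable
    centers:   MF centers shared by every input (grid partition, assumed)
    constants: dict mapping a tuple of MF indices (one per input) to a rule constant
    """
    memberships = [[gauss(x, c, sigma) for c in centers] for x in inputs]
    numerator = denominator = 0.0
    for rule in itertools.product(range(len(centers)), repeat=len(inputs)):
        # Firing strength: product of the memberships selected by this rule.
        strength = math.prod(memberships[i][j] for i, j in enumerate(rule))
        numerator += strength * constants[rule]
        denominator += strength
    return numerator / denominator


def rmse(predictions, targets):
    """Root-mean-square error between predictions and targets."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))


centers = [0.0, 0.25, 0.5, 0.75, 1.0]
# Hypothetical rule constants: each rule outputs the mean of its two MF centers.
constants = {
    rule: (centers[rule[0]] + centers[rule[1]]) / 2
    for rule in itertools.product(range(5), repeat=2)
}
```

Training would adjust the MF parameters and the rule constants over the 500 epochs so that an error of this kind decreases on the training set; the testing set is then used only for scoring.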
Using the Neurofuzzy technique, we obtained high-performance results by building a suitable design and then training and testing the system. The if-then rules are generated automatically from the data set loaded into the system, which supports a correct analysis. Once the system has been tested and the rules have been generated, the structure of the Neurofuzzy system is applied, and the rule view can be used to obtain the results for each system. To illustrate our method, a section of a DNA sequence comprising six bases (ATCTCG) is used. Table
The input data value for each base.
| Features | Input data | A | T | C | T | C | G |
|---|---|---|---|---|---|---|---|
| Peakness | | 0.998 | 0.999 | 0.794 | 0.999 | 0.930 | 0.999 |
| | | 0.361 | 0.478 | 0.838 | 0.721 | 0.665 | 0.618 |
| Height | | 0.889 | 0.991 | 0.644 | 0.954 | 0.696 | 0.952 |
| | | 0.560 | 0.421 | 0.604 | 0.606 | 0.531 | 0.485 |
| Spacing | | 0.305 | 0.305 | 0.281 | 0.286 | 0.302 | 0.274 |
| | | 0.298 | 0.305 | 0.305 | 0.281 | 0.286 | 0.302 |
Figure
(a, b, c) The ANFIS rule view and the structure for the three fuzzy subsystems and (d) for the main fuzzy system.
From the above, we obtain the values of the three features (peakness, height, and spacing); these values are then used as the three inputs to the main ANFIS confidence system to determine the confidence value for each base in DNA base calling. Figure
Confidence value for bases called.
| Input data | A | T | C | T | C | G |
|---|---|---|---|---|---|---|
| Peakness | 0.824 | 0.825 | 0.499 | 0.499 | 0.612 | 0.766 |
| Height | 0.683 | 0.767 | 0.230 | 0.692 | 0.390 | 0.741 |
| Spacing | 0.813 | 0.813 | 0.813 | 0.812 | 0.813 | 0.813 |
| Confidence value | | | | | | |
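The two-stage data flow, with three two-input subsystems feeding one three-input main system, can be outlined in Python as below. The `FuzzySystem` alias, the function `confidence_for_base`, and the averaging stand-ins are hypothetical placeholders for trained ANFIS models; they show only how values are passed along and do not reproduce any value in the tables.

```python
from typing import Callable, Dict, Sequence

# Any callable that maps a sequence of inputs to one output (hypothetical alias).
FuzzySystem = Callable[[Sequence[float]], float]

FEATURES = ("peakness", "height", "spacing")


def confidence_for_base(
    raw_inputs: Dict[str, Sequence[float]],
    subsystems: Dict[str, FuzzySystem],
    main_system: FuzzySystem,
) -> float:
    """Run the three subsystems, then feed their outputs to the main system."""
    features = []
    for name in FEATURES:
        pair = raw_inputs[name]
        if len(pair) != 2:
            raise ValueError(f"{name} subsystem expects exactly two inputs")
        features.append(subsystems[name](pair))
    return main_system(features)


def mean(values: Sequence[float]) -> float:
    """Stand-in model used only to make the sketch runnable."""
    return sum(values) / len(values)


stand_ins = {name: mean for name in FEATURES}
```

In practice each stand-in would be replaced by the corresponding trained subsystem, and `main_system` by the trained main confidence system.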
The main idea of this paper is to apply an efficient technique to DNA base calling in order to determine the confidence value for each base. The combination of neural networks and fuzzy logic overcomes both the difficulty of building membership functions in fuzzy logic and the constraints of implementing a neural network on its own. An ANFIS is designed for each of the three subsystems, which produce the three features, and for the main system, which predicts the confidence value for each base; this gives efficient results with high performance.
The authors declare that they have no competing interests.