An Efficient Machine Learning-Based Feature Optimization Model for the Detection of Dyslexia

Dyslexia is among the most common neurological disorders in children. Its detection therefore remains an important pursuit for researchers across various domains, as illustrated by the large body of work published on the topic. The work presented herein uses data from a unified gaming test of subjects (dyslexic and control) in tandem with principal components derived from that data to detect dyslexia. The aim is to build a machine learning model for dyslexia detection using comprehensive gaming test data. We explore various kernel functions of the Support Vector Machine (SVM) on different numbers of principal components to reduce the computational complexity. A detection accuracy of 92% is obtained from the radial basis function with 5 components, and the highest detection accuracy obtained from the radial basis function with 3 components is 93%. The Artificial Neural Network (ANN), in contrast, offers the added advantage of a minimal number of hyperparameters and obtains an accuracy of 95% with 3 components. A comparison of the proposed method with some existing works shows its efficacy for dyslexia detection.


Introduction
One of the most complicated neurological disorders attracting attention among researchers in modern neuroscience is dyslexia [1]. The International Dyslexia Association defines dyslexia as a disorder identified by difficulties with spelling, language processing, and accurate word recognition. The overall paradigm of dyslexia is summarized in Figure 1. The main actors of dyslexia are phonological disorder (PD), visual disorder (VD), and auditory disorder (AD). These disorders start to evolve from the time of birth and manifest themselves as an abnormality. The abnormality associated with these actors plays a critical role in shaping the personality of a person. The consequences are manifold, with numerous behavior deficits (BD) and cognitive deficits (CD). Most people think of dyslexia as a disorder in which a person sees letters and words backwards, such as seeing "b" as "d" or "was" as "saw", and vice versa. However, the truth is that people with dyslexia see things the same way as everyone else. Dyslexia is caused by a phonological processing problem [2], meaning that people affected by it have trouble not with seeing language but with manipulating it. For example, if a person with dyslexia hears a word such as "heat" and is then asked to remove the first sound (which is "h"), it would be very difficult for that person to tell what word is left ("eat").
Another example is that people with dyslexia tend to break a word into parts to read it, which delays reading comprehension. Dyslexia affects about 5-17% of the population across most languages [3]. The condition emerges at some stage in childhood and evolves progressively in adolescence.
This effect hampers academic growth and subsequently diminishes self-esteem and confidence [4]. On the other hand, the emergence of transformative healthcare technologies has catalyzed a revolution in the provisioning and operational functioning of healthcare services, driven mostly by Computer Aided Detection and Diagnostic systems, interchangeably referred to as CAD systems. Recent advances in imaging technologies have made it feasible for medical practitioners to use advanced and hybrid imaging techniques such as PET, USG, X-ray, CT, fMRI, and SPECT, among others. Enhancements in these techniques enable medical practitioners to gather detailed information about body organs and physiology. These techniques typically make use of internal sources of energy, external sources, or both [5,6]. The work proposed herein has multiple advantages that make it a potential candidate as a dyslexia detection framework. It does not consider any imaging modality information for the development of a CAD system for dyslexia. The information used is generated by a gamified online test structured in such a way that behavioral and cognitive deficits are perceived and quantified. The acquired data is used to establish a machine learning framework with principal component analysis (PCA). With SVM and ANN as machine learning frameworks, PCA is seen to be very effective in the detection of dyslexia. Moreover, PCA significantly reduces the computational overhead of the model, since the model has to deal with a narrow feature space. The paper is organized as follows: Section 2 presents some of the existing work in the domain of dyslexia detection. The proposed framework is illustrated in Section 3, while the experimental setup and the results are discussed in Section 4. Section 5 presents the conclusion of the work.

Related Work
With the advent of smart devices that are utilized in different domains such as healthcare, business organizations, the educational sector, cities, and agriculture, an enormous amount of data is being generated. Insights into these data open new challenges and possibilities in a wide range of applications. The information collected from various sources in a healthcare setup opens the possibility of early detection or future prediction of various diseases. The studies presented in [7][8][9][10] have leveraged healthcare data for different detection tasks. The studies in [11,12] reveal that collecting datasets for dyslexia is relatively cheap when the dataset is created using standardized psychoeducational tests and learners' handwriting. This is the reason that we use a gamified online test for this study. The use of such datasets provides 2 benefits: first, the data is very cheap to collect, and second, the dataset is very large in terms of features, which is one of the fundamental requirements for building a stable machine learning model. The remainder of this section lists machine learning algorithms that have been proposed from time to time for the detection of dyslexia. These studies have used different types of machine learning algorithms and datasets of varying nature and size. The study in [13] demonstrated the application of artificial neural networks to identify the presence of dyslexia in school children. The study used test scores as the data and a multilayer perceptron (MLP) architecture of ANN. With 10-fold cross validation, an accuracy of 75% was reported. The work presented in [14] used a series of machine learning algorithms, including support vector machines, artificial neural networks, and k-means. MRI scans were used to classify between dyslexia and control groups. With a dataset size of 56, ANN achieved the best accuracy of 94.8%.
The work in [15] demonstrates the use of MRI scans for discriminating dyslexia and control cases. The study is carried out on a dataset of 236 subjects with SVM as the ML algorithm. An accuracy of 83% was reported in this study. In [16], EEG scans of 80 subjects were used for the diagnosis of dyslexia at an early stage using machine learning algorithms that included k-means, ANN, and a fuzzy logic classifier, with average accuracies of 89.6%, 89.7%, and 85.7%, respectively. Another study based on EEG scans is presented in [17] on 6 subjects with a median age group of 5. Here, a multilayer perceptron model was used to detect dyslexia by analyzing brain activity signals, achieving an accuracy of 85%. The LIBSVM toolbox for MATLAB was used to implement a linear support vector machine classifier on 61 MRI scans to discriminate a dyslexia biomarker using white matter features of the brain. The accuracy reported in this study for dyslexia detection is 83.6%. The work in [18] categorizes dyslexia and nondyslexia cases on MRI scan data of 925 subjects using a linear SVM. The study reports an accuracy of 80%. A linguistic computer game-based dyslexia detection was carried out in [19] on a dataset of 267 subjects utilizing eye tracker features. This study reports an accuracy of 85% using the SVM from the LIBSVM toolbox for MATLAB. Another eye tracker-based dyslexia detection was performed in [3] on a dataset of 185 subjects. The SVM was used with automatic recursive feature elimination, obtaining an accuracy of 96%. Another study in which the SVM has been used for dyslexia detection on eye tracking features is reported in [20][21][22][23][24]. An accuracy of 80% on a dataset of size 97 is achieved in this work.

Proposed Methodology
The methodology adopted for this study is shown pictorially in Figure 2. While a large number of methodologies exist that use imaging modalities for dyslexia detection, gaming-based tests are also being explored as a potential detection methodology for dyslexia. The mathematical framework of the work is discussed next, from the start. Assume a feature vector $F_{mi}$ for each subject, as given in equation (1), where $m$ corresponds to the subject number and $i$ corresponds to each index of the feature vector. The complete feature space in 2D, $F_s$, is given in equation (2). With $F_s$, we estimate an accuracy parameter $A_u$ from a set of machine learning kernel functions corresponding to SVM. In addition, the same accuracy parameter is estimated for the ANN. Having obtained an $A_u$ from a feature space of size $3644 \times 196$, our aim is to reduce the feature space by weighted reduction to improve $A_u$. Put more generically, we aim to obtain the best possible $A_u$ with a feature space that maximizes the variance $\sigma^2$ for all $F_{i,j}$. Initially, a set of parameters is chosen for the different kernel functions of SVM as given in equations (3) to (8), including $k(x, y) = \tanh(k\,x \cdot y + c)$ and $k(x, y) = 1 + xy + xy\min(x, y)$. These equations correspond to the linear, radial basis function, Laplace, hyperbolic tangent, Bessel, and linear spline kernels, respectively. These 6 kernel functions yield 6 machine learning algorithms whose potential we wish to explore as the size of the feature space changes. The choice of the kernels given in equations (4), (5), and (7) largely depends on the tunable parameters $\sigma_1$, $\sigma_2$, and $\sigma_3$, respectively. The selection of these 3 parameters determines the efficacy of the kernel in particular and of the SVM as a classifier in general. The values of $\sigma_1$, $\sigma_2$, and $\sigma_3$ should be neither underestimated nor overestimated. If the values are overestimated, the kernel function will behave more like a linear function, thus losing the capability of a nonlinear projection.
On the other hand, if the values are underestimated, the decision boundary will be sensitive to the noisy data; thus, there will be a lack of regularization.
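The sensitivity to the kernel parameters described above can be illustrated with a minimal Python sketch. The bandwidth conventions used here, exp(-||x - y||^2 / (2 sigma^2)) for the radial basis function and exp(-||x - y|| / sigma) for the Laplace kernel, are assumed textbook forms and may differ from the exact parameterization used in this work. The function names and default values are illustrative only, and the Bessel and linear spline kernels are left out.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def rbf_kernel(x, y, sigma=0.15):
    # Assumed form: exp(-||x - y||^2 / (2 * sigma^2)).
    return math.exp(-sq_dist(x, y) / (2.0 * sigma ** 2))


def laplace_kernel(x, y, sigma=0.15):
    # Assumed form: exp(-||x - y|| / sigma).
    return math.exp(-math.sqrt(sq_dist(x, y)) / sigma)


def tanh_kernel(x, y, k=1.0, c=0.0):
    # Follows k(x, y) = tanh(k x.y + c); the defaults for k and c are illustrative.
    return math.tanh(k * dot(x, y) + c)


# Two points at Euclidean distance 0.5 from each other.
x, y = [0.0, 0.0], [0.3, 0.4]
for sigma in (0.15, 0.5, 5.0):
    print(sigma, rbf_kernel(x, y, sigma), laplace_kernel(x, y, sigma))
```

Under these assumed forms, a small sigma drives the kernel value between nearby but distinct points toward zero, so the decision boundary can follow individual (possibly noisy) samples, while a large sigma makes the kernel nearly flat over the data and removes most of its nonlinearity.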
In line with this rationale, the values of $\sigma_1$, $\sigma_2$, and $\sigma_3$ are set to 0.15. With all the parameters of the kernel functions set, we apply principal component analysis (PCA) to $F_s$ to extract a new set of $F_s$ coefficients. The main idea of applying PCA is to replace a high-dimensional feature space containing highly correlated data with a lower-dimensional feature space containing weakly correlated data. The principal components derived from the original data tend to capture most of the variance of the data and hence can be effectively utilized to train a classifier model. Figure 3 shows an instance in which we have plotted 100 principal components against the amount of variance they capture, in the form of eigenvalues. As can be seen in Figure 3, the first few components capture almost all the variance of the data, which indicates the efficacy of the approach. Algorithm 1 shows pseudocode for the proposed methodology.
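The idea that a few principal components can capture most of the variance can be sketched in plain Python as follows. This is a minimal illustration and not the implementation used in this work: it computes the sample covariance matrix, extracts the leading eigenvalues by power iteration with deflation, and reports each eigenvalue as a fraction of the total variance. All function names and the toy data are hypothetical, and the fixed starting vector means the sketch can fail if that vector happens to be orthogonal to a leading eigenvector.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length feature rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
            for a in range(d)]


def top_eigenpair(mat, iters=500):
    """Largest eigenvalue and eigenvector of a symmetric PSD matrix (power iteration)."""
    d = len(mat)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(mat[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # no variance left to extract
            return 0.0, v
        v = [x / norm for x in w]
    mv = [sum(mat[a][b] * v[b] for b in range(d)) for a in range(d)]
    return sum(v[a] * mv[a] for a in range(d)), v


def explained_variance_ratio(rows, n_components):
    """Fraction of total variance captured by each of the leading components."""
    cov = covariance_matrix(rows)
    d = len(cov)
    total = sum(cov[a][a] for a in range(d))
    ratios = []
    for _ in range(n_components):
        lam, v = top_eigenpair(cov)
        ratios.append(lam / total)
        # Deflation: subtract the component just found before searching again.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return ratios


# Toy data: the first feature varies four times as much as the second.
toy = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
print(explained_variance_ratio(toy, 2))
```

Plotting such ratios (or the eigenvalues themselves) against the component index gives a curve of the kind described for Figure 3.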

Experimental Results
The dataset [1] chosen for this study is a thorough evaluation of the following components of language speaking and understanding: phonological awareness, morphological awareness, visual discrimination and categorization, alphabetic awareness, syllabic awareness, semantic awareness, auditory discrimination and categorization, visual working memory, and sequential auditory working memory. The setup contrasts with setups that use different types of imaging modalities as a tool for detecting dyslexia. The dataset comprises 3644 subjects with 196 attributes each, labeled with 2 classes. Figure 4 shows the distribution of the cases with respect to various age groups. The dyslexic and nondyslexic cases are well distributed in the range of 7 to 17 years. Figure 5 shows the percentage of dyslexic subjects by age. The point to observe here again is that the distribution is almost even. With the given feature set, the proposed model uses two classifiers. Several classification methods exist, including quadratic discriminant analysis (QDA), linear discriminant analysis (LDA), decision trees, the maximum entropy classifier, the Naive Bayes classifier, K-nearest neighbor, the support vector machine (SVM), and the Artificial Neural Network (ANN) [18]. The work herein uses the said dataset to detect dyslexia with the SVM and ANN. First, we propose PCA-driven new feature vectors as the indicators for dyslexia. Table 1 depicts the dyslexia detection accuracy using 10 principal components. The highest accuracy is achieved by using the radial basis kernel function with a hyperparameter value $\sigma_1 = 0.5$. The lowest detection accuracy is obtained using the spline kernel function for the SVM. Similarly, Table 2 gives a comparative detection accuracy of the 6 kernel functions with 5 principal components: PC1, PC2, PC3, PC4, and PC5. As expected, the accuracies obtained are slightly better than the results in Table 1.
The reason for this is depicted in Figure 3, wherein it is seen that the first few principal components capture most of the variance in the data. The dyslexia detection accuracy is seen to improve further when the number of components used in the framework of SVM kernels is reduced to 3, as shown in Table 3.
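The experimental loop described above (evaluate a classifier on a given number of principal components, then reduce that number and repeat) can be sketched as follows, together with the 70% / 15% / 15% data split used in Algorithm 1. This is a hedged outline rather than the actual experimental code: `split_70_15_15`, `sweep_components`, the seed, and the `evaluate` callable are hypothetical names, and the two accuracies in the demo are simply the radial basis function results reported in this paper for 5 and 3 components, used only to exercise the loop.

```python
import random


def split_70_15_15(n_samples, seed=0):
    """Shuffle sample indices and split them into 70% / 15% / 15% parts."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = int(0.70 * n_samples)
    n_test = int(0.15 * n_samples)
    train = idx[:n_train]
    test = idx[n_train:n_train + n_test]
    val = idx[n_train + n_test:]  # whatever remains goes to validation
    return train, test, val


def sweep_components(component_counts, evaluate):
    """Score each component count with `evaluate` and return the best one.

    `evaluate` is a hypothetical callable that takes a number of principal
    components and returns an accuracy in [0, 1].
    """
    scores = {n: evaluate(n) for n in component_counts}
    best_n = max(scores, key=scores.get)
    return best_n, scores


train, test, val = split_70_15_15(3644)
print(len(train), len(test), len(val))

# Stand-in evaluator: looks up the RBF accuracies reported in this paper.
reported_rbf = {5: 0.92, 3: 0.93}
best_n, scores = sweep_components([5, 3], reported_rbf.get)
print(best_n, scores)
```

In a real run, `evaluate` would project the data onto the first n principal components, train the classifier on the training indices, and score it on the held-out indices.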
In comparison to a detection accuracy of 92% obtained from the radial basis function with 5 components, the highest detection accuracy obtained from the radial basis function with 3 components is 93%. The capability of principal components in detecting dyslexia is also depicted by the score plot shown in Figure 6. The plot shows how dyslexic and nondyslexic cases are segregated by the first two principal components. First, PC1, shown as the dotted red vertical line, divides all the given cases in the direction of the maximum variance. The number of outliers on application of the first principal component is significantly large. The second principal component, shown as the purple dotted horizontal line, is then seen to reduce the number of outliers.

(1) For each $C_i$, estimate $\sigma_1$, $\sigma_2$, and $\sigma_3$
(2) While $A_u$ is not optimized
(3) Apply PCA
(4) For n no. of PCs: $F_s'$
(5) For n up to 3, do
(6) Split $F_s'$ into $X_1$, $X_2$, and $X_3$, where $X_1 = 0.7 F_s$, $X_2 = 0.15 F_s$, and $X_3 = 0.15 F_s$
(7) For $C_i$: train the model with each data point in $X_1$; test the model with data points $X_2$; validate the model with data points $X_3$
(8) Estimate $A_u$
(9) Reduce the value of n; return to (6)
Return accuracy.
ALGORITHM 1: Pseudocode of the proposed methodology.

The same methodology for predicting dyslexia using the online gaming-based test is carried out using ANN. The aim of this part of the study was to observe whether accuracy improves when the input size of the ANN is changed. We choose a fixed hidden layer size of 10. With two output classes, dyslexic and nondyslexic, the input to the ANN was changed according to the number of principal components retained from the feature space. Table 4 shows the number of weights learnt by each NN with the changing number of inputs.
On the one hand, a smaller number of components hides most of the information in the data; on the other hand, a smaller number of components leads to fewer weights that need to be learnt. The comparison of the proposed methodology with some of the recent works reported in [15,25,26] is tabulated in Table 5. Most of the work reported for the detection of dyslexia can be described by 4 main parameters, namely, the size of the dataset, the nature of the dataset, the underlying machine learning approach, and the performance of the overall methodology. Based on these 4 parameters, tabulated in Table 5, most of the work has been carried out on relatively small datasets. The drawback of a small dataset in a machine learning framework is that the resulting model lacks generalization. The proposed work is carried out on a dataset that is much larger than those of the other reported works and hence is better in terms of generalization.
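The relationship between the number of retained principal components and the number of ANN weights can be made concrete with a small Python sketch. It assumes a fully connected network with a single hidden layer of 10 units and 2 output units, as described above. Whether bias terms are counted is an assumption, so the figures it produces need not match Table 4, and the function name is hypothetical.

```python
def mlp_parameter_count(n_inputs, n_hidden=10, n_outputs=2, include_bias=True):
    """Number of trainable parameters in a one-hidden-layer, fully connected network."""
    weights = n_inputs * n_hidden + n_hidden * n_outputs
    biases = (n_hidden + n_outputs) if include_bias else 0
    return weights + biases


# Input sizes correspond to the numbers of principal components used in the experiments.
for n_components in (10, 5, 3):
    print(n_components, mlp_parameter_count(n_components))
```

Under these assumptions, each principal component that is dropped removes one weight per hidden unit, which is why a smaller input size gives a smaller network to train.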

Conclusion
As the numbers state, dyslexia affects about 10% of the population across the globe, with consequences ranging from moderate to severe personality changes. In Saudi Arabia, the incidence rate of dyslexia is found to be around 7%. Early detection of this disorder can enable effective treatment in most cases. With researchers, clinicians, and experts from various domains making strides to address this issue, success is not far away. Artificial intelligence and machine learning, in conjunction with medical imaging modalities, have opened up hopeful possibilities. The work presented herein successfully used an online gaming test-based strategy for the detection of dyslexia. It is pertinent to mention that, given the age and lifestyle of the subjects under consideration, an online gaming methodology for data acquisition becomes one of the first choices. The work extends this by utilizing the acquired data to establish a machine learning framework with principal component analysis. With SVM and ANN as machine learning frameworks, PCA is seen to be very effective in the detection of dyslexia. Moreover, PCA significantly reduces the computational overhead of the model, since the model has to deal with a narrow feature space. The work herein reports an accuracy of 95% with PCA and ANN, with nearly 4000 subjects in the overall experimental setup. The proposed work shows potential, as depicted by the comparison of this methodology with some of the existing works. This work can be a promising candidate for the development of a learning management system for dyslexia. In the future, the authors will try to improve the results of this research by employing a deep learning model in which optimization is carried out on input images directly [27,28].

Data Availability
Data used in this article will be shared on request to the corresponding author.

Conflicts of Interest
The authors declare that they have no conflicts of interest.