IoMT-Based Mitochondrial and Multifactorial Genetic Inheritance Disorder Prediction Using Machine Learning

Genetic disorders are serious diseases that affect a large number of individuals around the world. There are various types of genetic illness; this work focuses on predicting mitochondrial and multifactorial genetic disorders. Genetic illness is associated with a number of factors, including a defective maternal or paternal gene, excessive abortions, a lack of blood cells, and a low white blood cell count. Early detection of genetic diseases is crucial for development in early and teenage life. Although it is difficult to forecast genetic disorders ahead of time, such prediction is critical because a person's life progress depends on it. Machine learning algorithms can diagnose genetic disorders with high accuracy using datasets collected and constructed from a large number of patient medical reports. Many recent studies have employed genome sequencing for illness detection, but fewer have used patient medical history, and the accuracy of the existing studies that do use a patient's history is limited. The internet of medical things (IoMT) based model proposed in this article for genetic disease prediction uses two machine learning algorithms: support vector machine (SVM) and K-nearest neighbor (KNN). Experimental results show that SVM outperforms KNN and existing prediction methods in terms of accuracy. SVM achieved an accuracy of 94.99% in training and 86.6% in testing.


Introduction
Genes are the building blocks of heredity. They are passed down through the generations. They contain deoxyribonucleic acid (DNA), which includes protein-making instructions. A mutation is a change in one or more genes that happens on a regular basis. The mutation changes the gene's instructions for making a protein, leading it to either not work properly or not exist at all. This can lead to a genetic disorder, which is a serious illness. One or both parents can pass on a genetic mutation to their children. Everybody is susceptible to mutation at some point in their lives [1]. There are illnesses caused by mutations inherited from the parents at birth. Congenital mutations in a gene or a combination of genes that appear at different times in life might cause other disorders. A mutation of this type may occur at random or as a result of environmental factors [2].

Multifactor Genetic Disorder.
These disorders are caused by mutations in numerous genes, and they are typically the consequence of a complex interplay of environmental and nutritional factors. Such a disorder is sometimes referred to as a complex or polygenic disease [3]. Cancer, diabetes, and Alzheimer's disease can all be linked to a multifactor genetic condition.

Mitochondrial Genetic Disorder.
This disorder is associated with mutations in the nonnuclear DNA of the mitochondria. Each mitochondrion contains 5 to 10 circular DNA molecules. During fertilization, egg cells retain their organelles, including the mitochondria. As a result, this condition is always inherited from the mother [3]. The mitochondrial genetic condition causes mitochondrial encephalopathy, lactic acidosis, stroke-like events, and eye damage.
"Every year, about 140 million toddlers are born throughout the world, with ten million of these toddlers being born with a severe birth defect of genetic or partially genetic origin, many of which are identified late," said Linguraru. The genetic disease prediction challenge was first handled as a two-class classification problem in machine learning research, with a classification model built from positive and negative training data. Decision trees, K-NN, naïve Bayesian classifiers, and binary SVM classifiers were employed [4]. Positive training samples in binary classification systems contain genes associated with known illnesses, whereas negative samples do not. Machine learning technology may be used to detect the presence of a genetic condition using a facial photograph taken at a point of care, such as a pediatric office, maternity ward, or general practitioner clinic, as well as the patient's medical history [5].
The major contributions of this study are given below:
(i) An IoMT-based machine learning model is proposed to predict mitochondrial and multifactorial genetic disorders.
(ii) The proposed model improves on previously used machine learning techniques with the help of different simulation parameters.
(iii) The proposed framework uses unique data preprocessing techniques to enhance the prediction results.
(iv) The proposed model uses various statistical metrics to check performance and reliability.

Literature Review
The identification of the most likely disease candidate genes is an important issue in biomedical research, and several methodologies have been proposed [6,7]. Most early techniques, such as ToppGene [8], highlighted candidate genes by rating them according to morphological or behavioral systems and correlating these ranks to commonly identified illness genes. These techniques have the limitation of being unable to find indirect relationships between genes that do not yet share comparable characteristics or activities. Biological network-driven gene prioritization approaches have recently been developed to solve this issue [6, 9-12].
The coverage of functional genomic data, where new high-throughput technologies have provided a huge quantity of behavioral data among biological components, has resulted in the development of such network-based approaches over application techniques as well as protein structures. Machine learning algorithms have recently been applied effectively to many important biomedical problems [13,14], including genetic code explanation [15], genetic analysis categorization [16,17], deductive reasoning of gene monitoring networks [18], drug target prognosis [19,20], and revelation of epigenetic interactions in malady statistics [21,22], as well as pharmacology [23]. Machine learning has also been used to predict disease-associated genes [24,25]. The challenge is typically framed as a classification task in which known genetic disorders and biological data linked with medical history data are used to build a classification model that is then used to predict emerging genetic illnesses. More pragmatic techniques have since been developed. Unary classifiers that can be trained from positive data only have been proposed [26]; to combine data from various sources, that research employed a binary support vector machine. Because the remaining collection may contain genes for unknown disorders, semisupervised learning approaches such as semisupervised binary learning techniques [27] and positive and negative [28] were proposed. Previous research used machine learning for genome disorder prediction with DNA sequencing data and unary classification. Because they rely on sequencing data, these methods are impactful but not efficient at predicting different kinds of genetic disorders accurately and on time. The major drawback of previous research is its dependence on DNA sequencing data: results vary between paternal and maternal genes, and most clinical parameters, such as abortion counts, are ignored.
The authors of [29] employed a fine Gaussian SVM on hepatitis C patients using public data and achieved 97.9% accuracy. A previous study [30] used an IoMT architecture empowered with a deep neural network for intrusion detection and achieved a 15% improvement in test results.

Computational Intelligence and Neuroscience
In this research, we used different supervised machine learning approaches with patient medical history data to predict mitochondrial and multifactorial genetic inheritance disorders. In doing so, the proposed model overcomes the drawbacks of DNA sequencing data and achieves the best prediction accuracy among the compared studies. Table 1 shows the limitations of previous studies. Asif et al. [31] achieved 79% prediction accuracy with RF and SVM on a miRNA feature-based dataset, limited by handcrafted features and imbalanced data. Alshamlan et al. [32] achieved 81% prediction accuracy with the GBC algorithm on the SRBCT feature-based dataset, limited by handcrafted features and imbalanced gene sequence data. KhaderKhader et al. [33] achieved 80.5% prediction accuracy with BA and SVM on a gene sequence feature-based dataset, limited by imbalanced gene sequence data.

Materials and Methods
The ability to forecast genetic disorders allows doctors to provide drugs that are helpful to the patient's health, and patients may maintain their health before any severe complications arise. In this research, we employed the machine learning techniques SVM and KNN to predict mitochondrial and multifactorial inheritance gene disease. Following the prediction analysis, we highlight the model with the best accuracy. Figure 1 shows our workflow from dataset selection to prediction. The proposed model uses IoMT technology to gather data from numerous hospitals through digital devices, which can vary from hospital to hospital. With the help of IoMT, collecting and processing the data is easy and beneficial for further simulations. The suggested model uses a novel labeled dataset of genomic disorders obtained from Kaggle. This dataset consists of 12,280 instances, 28 independent features, and one dependent feature (the output class). Data were preprocessed in the early phases of this work by normalizing the data, replacing null or missing values using different mean-based techniques, and splitting the dataset into two parts: training and testing. In the training phase, the proposed model trains two machine learning techniques, SVM and KNN, on 70% of the dataset. The remaining 30% of the data is used for testing. Based on the best accuracy, we chose the best-performing model, which is described in the simulation results section. Before describing the simulation results, we briefly describe the algorithms employed in this work.
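The preprocessing steps named above (mean-based replacement of missing values, normalization, and a 70/30 split) can be sketched as follows. This is a minimal illustration in plain Python and not the authors' implementation: the function names, the choice of min-max scaling as the normalization, and the fixed random seed are assumptions made here for clarity.

```python
import random


def impute_mean(rows):
    """Replace None entries with the mean of the observed values in that column.

    Assumes every column has at least one observed (non-None) value.
    """
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        present = [r[c] for r in rows if r[c] is not None]
        means.append(sum(present) / len(present))
    return [[means[c] if r[c] is None else r[c] for c in range(n_cols)]
            for r in rows]


def min_max_normalize(rows):
    """Rescale every column to [0, 1]; constant columns map to 0.0.

    Min-max scaling is an assumption; the paper only says "normalization".
    """
    n_cols = len(rows[0])
    lows = [min(r[c] for r in rows) for c in range(n_cols)]
    highs = [max(r[c] for r in rows) for c in range(n_cols)]
    return [[(r[c] - lows[c]) / (highs[c] - lows[c]) if highs[c] > lows[c] else 0.0
             for c in range(n_cols)]
            for r in rows]


def train_test_split(rows, labels, train_fraction=0.7, seed=0):
    """Shuffle the indices and split them into training and testing parts."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * len(idx)))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])
```

In practice the scaling statistics would usually be computed on the training part only and then applied to the testing part, so that no information from the test set leaks into training.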
Support Vector Machine.
The support vector machine algorithm maps the raw data into a feature space and then constructs an optimal-margin hyperplane that discriminates between positive and negative examples. A two-class SVM approach is used for this classification, with the training set created from molecular sequences and interaction data, as reported in [27]. The positive training data includes all known illness genes, whereas the negative training data includes genes linked with new diseases and an additional 10% of genomic sequences. The study [28] also uncovered PID-related genes using a binary class SVM classifier. 69 binary characteristics of known PID and non-PID genes were combined to produce the classifier. The trained classifier identified 1,442 potential PID genes. In this work, a binary class SVM is trained on 29 features and 70% of the dataset instances.
To show the characteristics of y_i, linear combination variables β_i may be used to choose the vectors of the SVM hyperplane. A hyperplane relation is defined as [34,35]: where k is the kernel function k(x, y) and m is a constant. Polynomial kernel function used for the training dataset is as follows [34-36]: where k is the kernel function and y is the instance of features. SVM classifier minimizes the variables by soft margins.
The soft margins minimizing classifier is represented by equation (3) above, whereas the hard margins classifier is represented by β. Using a limited optimization problem, soft margin equation (3) can be rewritten as follows [37]: where i ∈ {1, ..., n} and ζ_i is the smallest nonnegative number.
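For reference, the standard textbook forms that match the symbols named in this subsection are sketched here. They are an assumption about what the cited equations look like, and the exact notation in [34-37] may differ. The symbols w, b, λ, c, and d do not appear in the text and are introduced only for this sketch. Note that y denotes a feature instance in the kernel, while y_i in the soft-margin expressions denotes the class label of the i-th training example.

```latex
% Hyperplane relation in kernel form: beta_i are the linear combination
% variables, k is the kernel function, and m is a constant.
\sum_{i} \beta_i \, k(x_i, y) = m

% Polynomial kernel of order d with offset c \ge 0 (c and d are assumed symbols).
k(x_i, y) = \left( x_i \cdot y + c \right)^{d}

% Soft-margin objective, with labels y_i \in \{-1, +1\}.
\frac{1}{n} \sum_{i=1}^{n} \max\!\bigl( 0,\; 1 - y_i \,( w^{\top} x_i - b ) \bigr)
  + \lambda \lVert w \rVert^{2}

% Equivalent constrained form, where \zeta_i is the smallest nonnegative
% number satisfying the margin constraint for example i.
\min_{w,\, b,\, \zeta} \;\; \frac{1}{n} \sum_{i=1}^{n} \zeta_i + \lambda \lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,( w^{\top} x_i - b ) \ge 1 - \zeta_i, \qquad \zeta_i \ge 0, \qquad i \in \{1, \dots, n\}
```

The constrained form follows from the objective by setting ζ_i = max(0, 1 - y_i(w^T x_i - b)), which is the smallest nonnegative value that satisfies the margin constraint.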

K-Nearest Neighbors.
KNN is a nonparametric predictive method developed in 1951 by Evelyn Fix and Joseph Hodges and later extended by Thomas Cover [28]. It is used for both classification and regression. In both cases, the input consists of the k nearest training examples in the dataset, and the output depends on whether KNN is used for classification or for regression. To improve prediction outcomes, the suggested model employed KNN for prediction and used the 70% training dataset to train the model based on features by varying the number of k folds. The statistical formulation of KNN follows [38]. In the KNN classifier, each of the k nearest neighbors is given a weight of 1/k, while the remaining examples are given a weight of 0. The jth nearest neighbor is assigned a weight f_nj [38].
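The uniform weighting described above, in which each of the k nearest neighbors contributes 1/k to the vote and all other training examples contribute 0, can be illustrated with a short exhaustive-search classifier. This is a sketch under stated assumptions and not the authors' code: the names `minkowski` and `knn_predict` are hypothetical, and the Minkowski exponent p = 2 (Euclidean distance) is an assumed default.

```python
from collections import defaultdict


def minkowski(a, b, p=2):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(train_X, train_y, query, k=5, p=2):
    """Classify `query` by an exhaustive search over all training examples.

    Each of the k nearest neighbors adds a weight of 1/k to its class;
    every other example has weight 0, so the weights sum to 1.
    Returns the winning label and the per-class weights.
    """
    order = sorted(range(len(train_X)),
                   key=lambda i: minkowski(train_X[i], query, p))
    votes = defaultdict(float)
    for i in order[:k]:
        votes[train_y[i]] += 1.0 / k
    label = max(votes, key=votes.get)
    return label, dict(votes)
```

For example, with three training points of class 0 near the origin and three of class 1 near (5, 5), a query at (4, 4) with k = 5 receives a weight of 3/5 for class 1 and 2/5 for class 0, so class 1 is predicted.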

Dataset
We used the genome disorder dataset from Kaggle [39]. This dataset contains the medical histories of 12,280 people who have mitochondrial and multifactorial genetic inheritance disorders. There are 28 independent variables and one dependent variable in the genomic disorder dataset. In data preparation, the suggested model uses several missing value strategies to substitute null values.

Simulation Results and Discussion
SVM and KNN machine learning methods were used to train and test the proposed model. Classification accuracy, misclassification rate, precision, sensitivity, and F1 score are used to evaluate these algorithms. The suggested model's initial stage involves preprocessing the data, replacing missing values, and dividing the data into two sets: training and testing. The model is then trained with the SVM and K-NN methods and evaluated in the testing phase. The simulation results are detailed below in terms of several prediction parameters. First, the training and testing confusion matrices of both algorithms are presented; then their performance parameters are compared. Table 2 shows the simulation parameters of the proposed SVM and KNN models. The KNN model uses 5 neighbors with the exhaustive NS (neighbor search) method, the Minkowski distance between neighbors, and standardization enabled. The SVM uses a polynomial kernel function of order 3 with automatic kernel scale and standardization enabled. The training confusion matrices of the SVM and K-NN algorithms are given in Table 3. The trained KNN model's confusion matrix yields 6922 true positives, 657 true negatives, 825 false positives, and 191 false negatives. SVM yields 6959 true positives, 1205 true negatives, 277 false positives, and 154 false negatives. The SVM therefore obtains a higher true positive rate than the KNN model. Table 4 presents the prediction outcomes of both machine learning algorithms using the suggested model.
The testing confusion matrix of the K-NN model yields 3023 true positives, 115 true negatives, 469 false positives, and 77 false negatives, while the testing confusion matrix of the SVM yields 2931 true positives, 262 true negatives, 322 false positives, and 169 false negatives. As shown in Figure 2, the suggested SVM model reaches its lowest mean squared error of 0.1089 after 24 epochs, which indicates that the model's predictions are accurate and efficient. This value was obtained by varying the simulation hyperparameters and the dataset over numerous iterations.
In Table 4, the accuracy, misclassification rate, sensitivity, precision, and F1 score values are calculated from the confusion matrix counts. The proposed model outcomes are analyzed using these accuracy, misclassification rate, precision, sensitivity, and F1 score parameters. Table 6 shows the comparative analysis of previous studies with the proposed model. Asif et al. [31] achieved 79% prediction accuracy with RF and SVM on a miRNA feature-based dataset, limited by handcrafted features and imbalanced data. Alshamlan et al. [32] achieved 81% prediction accuracy with the GBC algorithm on the SRBCT feature-based dataset, limited by handcrafted features and imbalanced gene sequence data. KhaderKhader et al. [33] achieved 80.5% prediction accuracy with BA and SVM on a gene sequence feature-based dataset, limited by imbalanced gene sequence data. In comparison, the proposed model achieves 86.6% prediction accuracy with SVM, using genetic clinical feature-based data and IoMT technology. The proposed SVM model, tuned with different simulation parameters, thus achieves a higher accuracy than the previously published studies. This shows that varying the simulation parameters can improve both training and testing results.
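The evaluation measures named above can be computed directly from the four confusion matrix counts. The sketch below uses the standard definitions of these measures, which are assumed here to match the formulas used in the paper; the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard evaluation measures from binary confusion matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)      # share of predicted positives that are correct
    sensitivity = tp / (tp + fn)    # share of actual positives that are found
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "misclassification_rate": 1.0 - accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


# Counts reported in the text for the SVM model.
svm_train = classification_metrics(tp=6959, tn=1205, fp=277, fn=154)
svm_test = classification_metrics(tp=2931, tn=262, fp=322, fn=169)
print(f"SVM training accuracy: {svm_train['accuracy']:.2%}")
print(f"SVM testing accuracy:  {svm_test['accuracy']:.2%}")
```

With the reported counts, the training accuracy is 8164/8595 and the testing accuracy is 3193/3684, which agree with the stated 94.99% and 86.6% up to rounding.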

Conclusion and Future Work
Smart machine learning plays a critical role in the early detection of genetic disorders. SVM and K-NN techniques were employed in this study to predict mitochondrial and multifactorial genetic inheritance disorders. The medical history of a patient provides significant information about a genetic problem, and the suggested model uses this information to forecast genetic inheritance disorders. SVM has the highest prediction accuracy, 86.6%, and it outperforms the genetic sequence methods compared in this study. Patients and physicians will benefit from this research since it will allow them to predict gene abnormalities quickly and save lives. In the future, we intend to extend this study to multiclass classification of cancer, dementia, and diabetes, which will be extremely useful in the health care industry.
Data Availability
The data used in this paper can be requested from the corresponding author.

Disclosure
Atta-ur-Rahman and Muhammad Umar Nasir are co-first authors.

Conflicts of Interest
The authors declare that there are no conflicts of interest.