Predicting Divorce Prospect Using Ensemble Learning: Support Vector Machine, Linear Model, and Neural Network

A divorce is a legal step taken by married people to end their marriage. It occurs after a couple decides to no longer live together as husband and wife. Globally, the divorce rate more than doubled between 1970 and 2008, with divorces per 1,000 married people rising from 2.6 to 5.5. Divorce occurs at a rate of 16.9 per 1,000 married women. According to experts, over half of all marriages end in divorce or separation in the United States. A novel ensemble learning technique based on advanced machine learning algorithms is proposed in this study. The support vector machine (SVM), passive aggressive classifier, and neural network (MLP) are applied in the context of divorce prediction. A question-based dataset was created by field specialists. The responses to the questions provide important information about whether a marriage is likely to end in divorce in the future. Cross-validation is applied in 5 folds, and the performance results of the evaluation metrics are examined. The accuracy score is 100%, and the Receiver Operating Characteristic (ROC) curve accuracy score, recall score, precision score, and F1 score are consistently close to 97%. Our findings examine the key indicators for divorce and the factors that are most significant when predicting divorce.


Introduction
Divorce (or dissolution of marriage) is the definitive termination of a marital partnership, canceling the legal duties and responsibilities of marriage and dissolving the parties' matrimonial relations. In other terms, divorce is a legal action taken by married people to end their marriage. It is also known as marriage dissolution and is the legal step that ends a marriage other than by the death of either partner.
In general, there are two sorts of divorce. One option is "divorce from bed and board," which is legal in some jurisdictions. At its essence, this permits couples to legally separate and is frequently utilized by spouses who want to live their own lives but do not want to formally break their marriage for whatever reason. Divorce from bed and board is uncommon these days. An "absolute divorce," which terminates the marriage, is the most prevalent kind of divorce, so to speak, a legal clean break. This topic will be the subject of this article. There are several ways to have a court issue an absolute divorce ruling. For convenience, it has been the usual practice in law to classify each of these procedures as a different type of divorce, which we will do here. The following divorce statistics are for the United States. There have been 2,015,603 weddings. Marriage occurs at a rate of 6.1 per 1,000 of the total population. There have been 746,971 divorces. Divorce occurs at a rate of 2.7 per 1,000 people (45 reporting states) [1].
Divorce occurs at a rate of 16.9 per 1,000 married women. Many experts believe that this is a far more authentic representation of the genuine divorce rate than the raw number [2]. The divorce rate per 1,000 married women is about double what it was in 1960; nonetheless, it is lower than the all-time high of 22.6 in the early 1980s. In the United States, about half of all marriages end in separation or divorce. According to researchers, 41% of all first marriages result in divorce, about 60% of second marriages fail, and about 73% of third marriages end in divorce. The United States has the world's sixth highest divorce rate [3].
Machine learning is an artificial intelligence (AI) technique that enables computers to automatically develop and learn on their own without being explicitly programmed. Machine learning [4] is concerned with the development of computer programs that can access data and use it to learn on their own. Text classification [5] is a machine learning approach that assigns tags or categories to text automatically. Text classifiers can evaluate and categorize text by sentiment [6], subject, and consumer intent using natural language processing (NLP) [7] more quickly and more accurately than people.
Ensemble modeling is an effective method for improving the performance of a model. It typically pays to use ensemble learning in addition to any other models we may be developing. Ensemble learning techniques [8] are a kind of machine learning methodology that combines numerous base techniques to create the best prediction technique. Divorce prospect prediction is the core objective of this research study. The main contributions of this research are the following: (i) A novel research study on divorce prospect prediction using a questionnaire dataset is proposed in this paper. (ii) Three advanced machine learning models, the support vector machine (SVM), passive aggressive classifier (PAC), and neural network (multilayer perceptron classifier), are utilized for the prediction task. The employed techniques are fully hyperparameter tuned. (iii) An enhanced novel ensemble learning approach based on the three machine learning techniques is employed to predict the divorce prospect of a couple. (iv) Divorce exploratory data analysis (DEDA) is conducted to gain fruitful insights from the dataset and to determine the major factors that cause divorce. (v) Cross-validation (CV) is applied in 5 folds, and the performance evaluation metrics of the proposed approach are examined.
(vi) A comparative analysis of model performance is conducted among the three employed approaches, SVM, PAC, and neural network. The rest of the paper is organized as follows: divorce-related work is examined in Section 2. The architectural methodology of our proposed research approach is analyzed in Section 3. The applied advanced machine learning techniques are examined in Section 4. Then, the novel ensemble learning approach based on the three machine learning techniques is discussed in Section 5.
The results and evaluation of the proposed approaches are explained and discussed in Section 6. Finally, Section 7 contains the conclusion of this research study.

Related Work
The authors used Yöntem's findings to construct 56 questions as divorce predictors. Furthermore, they employed four automated learning models (perceptron, logistic regression, neural networks, and random forest) as well as three hybrid models based on voting criteria. Each of these models was trained in 5 distinct scenarios, resulting in a total of 35 tests, with performance in terms of accuracy, sensitivity, and specificity of 0.98, 1.0, and 0.96, respectively, for the perceptron model and a hybrid model [9]. Classification approaches have been used to forecast divorce in Turkey, in an investigation the authors carried out in 2019. They determined in this study that the ANN technique paired with correlation-based feature selection performs best, with an accuracy score of 98% and a Kappa value of 0.97. The SVM model's training time is also shorter than that of the ANN model [10]. The authors of another study selected significant characteristics by deleting redundant features that do not help with the prediction, applying an improved machine learning technique to the standard dataset available for forecasting the divorce rate. They were able to reach 99% accuracy. This technique may also be utilized as evidence by family counseling professionals on a couple's emotional and psychological well-being [11].
Within the scope of this study, divorce prediction was performed utilizing the Divorce Predictors Scale (DPS) based on Gottman couples therapy. The success of the DPS was explored utilizing multilayer perceptron (MLP) neural networks and decision tree algorithms. The study also seeks to identify the most important features of the Divorce Predictors Scale values that influence divorce. When the direct classification learning methods were applied to the divorce dataset, the RBF neural network had the greatest success rate of 98%.
This scale can be used by family counselors and family therapists to help with case formulation and intervention planning. Furthermore, the predictors of divorce in Gottman couples therapy were verified in the Turkish samples [12].
In a long-term, prospective longitudinal study, this paper explores the predictability of divorce. During the 14-year research period, prediction was attainable with a technique that incorporated marital happiness, thoughts of marriage breakup, and emotional interaction in both conversations. The algorithm correctly predicted divorce 93% of the time [13]. An artificial neural network (ANN) technique was created and employed in another study to predict whether or not a couple will divorce. The prediction is based on several questions that the couple answered, and the answers to those questions served as the input data to the ANN model. The model was subjected to repeated learning over training data and validation cycles until it achieved 100% accuracy [14].
The authors of [15] offer a study on the prediction of divorce cases using available machine learning techniques. They compared the accuracy of the perceptron, random forest, decision tree, Naive Bayes, support vector machine, and K-nearest neighbor classifiers for divorce case prediction. Following training, the algorithm forecasts whether or not the divorce will materialize. This allows the therapist to assess how stressful a couple's condition is and counsel them properly. With the perceptron model, the authors attained an accuracy of 98% [15]. The detection of COVID-19 based on a blood test was proposed in [16]. An ensemble-learning-based approach was developed for the prediction of COVID-19. In the first stage of the research, a deep-learning-based convolutional neural network (CNN) classifier was utilized. The dataset was from the San Raffaele Hospital. In the second stage, 15 different machine-learning-based classifiers were applied. The findings of the research study show that the ensemble learning model achieved an accuracy score of 99%.
Malware detection based on ensemble learning techniques is proposed in [17]. A fully connected convolutional neural network (CNN)-based classifier was developed for base-stage classification, and machine-learning-based models were utilized for end-stage classification, with 15 classifiers applied for malware detection. A dataset of Windows Portable Executable (PE) malware was used for model training and testing. The research findings show that the fully connected CNN ensemble model with a machine-learning-based extra trees classifier achieved an accuracy score of 100%.
In conclusion, our proposed research study is based on the prediction of divorce prospects using ensemble learning techniques. A comparative analysis with past research shows that our study outperforms previous work by utilizing advanced techniques. The results are efficient, validated, and higher than those of past approaches. We have revealed the key indicators for divorce and the factors that are most significant when predicting divorce.

Methodology
The methodological analysis of the proposed research study is presented in this section, and the workflow of our research is elaborated here. The questionnaire dataset is analyzed and useful insights are taken from it. Feature engineering is applied to build a predictive model with the best-fit features in the context of divorce prediction. Data normalization is applied to put the dataset in perfect form for our proposed model. Dataset splitting is then applied to divide the dataset into two portions: 80% of the data is used for model training and 20% is utilized for model testing and performance evaluation. The three models are applied with the ensemble learning approach. Finally, the ensemble learning model prediction is used for predicting divorce. The research methodology is examined in Figure 1, which visualizes the workflow of the complete research study. In the first step, the questionnaire dataset is analyzed by exploratory data analysis (EDA). In the next step, feature engineering is applied to obtain the useful features for the ensemble learning model. Then, data normalization is applied, followed by dataset splitting. The training portion is given to the model, and the model is then evaluated on the test portion. After all these methodology steps are done, a predictive ensemble learning model is formed and ready to predict the divorce of a couple.

Dataset.
The dataset is based on questions asked by specialists to married couples [18]. The answers to these 54 questions are used to predict the chance of divorce between them. The questions are graded on a scale of 0 to 4, with 0 being the lowest and 4 being the highest. The last column indicates whether or not the couple has divorced. Table 1 contains the descriptive dataset analysis.

Divorce Exploratory Data Analysis.
Divorce exploratory data analysis (DEDA) refers to the essential process of administering preliminary investigations on data to spot anomalies. Hidden data patterns can be found by applying DEDA, hypotheses can be tested, and assumptions can be validated using graphical representations and summary statistics. A bar plot of the Divorce_Y_N column is shown in Figure 2. In the bar plot, 0 represents the no-divorce class and 1 represents the divorce class. The bar plot shows the total number of divorce and no-divorce values. In Figure 2, the divorce class has 86 values and the no-divorce class has 84. The bar chart shows that the dataset is balanced: both classes have approximately the same number of rows. Violin charts are plotted on the dataset to explore the causes of divorce in Figures 3-5. A violin graph is a cross between a kernel density plot and a box plot that visualizes the data peaks. It is utilized to display how numerical data points are distributed in the employed dataset.
In contrast to a box plot, which can only show summary statistics, violin graphs visualize summary statistics as well as the frequency of every variable. In the violin plot of the I'm_not_wrong (51) column, we see that as the intensity of the value increases, the number of divorces increases, and as the value decreases, the number of divorces decreases. The analysis graph also shows that this column has a great impact on the Divorce_Y_N column.
In Figure 3, the violin plot data is also explored for the columns of love (16), common goal (10), and enjoy holidays (8).
The graph shows the cause of divorce and no-divorce as the value of the scale changes. The violin plot is also plotted on the columns of happy (17), always never (32), trust (21), and you are inadequate (53) in Figure 4, showing how the cause of divorce changes when the scale changes. The violin plot of argue_then_leave (42), humiliate (36), and friend social (30) is analyzed in Figure 5, in which we explored whether the effect of divorce is linked with the scale change.
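The class balance behind Figure 2 and the per-class spread that the violin plots visualize can also be checked numerically. A minimal pandas sketch, assuming illustrative column names (Divorce_Y_N, Im_not_wrong) and synthetic stand-in responses rather than the real questionnaire data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in: divorced couples tend to give higher-intensity answers
# on the 0-4 scale, mirroring the pattern described for I'm_not_wrong (51).
divorce = rng.integers(2, 5, size=86)      # divorce class, 86 rows
no_divorce = rng.integers(0, 3, size=84)   # no-divorce class, 84 rows
df = pd.DataFrame({
    "Divorce_Y_N": [1] * 86 + [0] * 84,
    "Im_not_wrong": np.concatenate([divorce, no_divorce]),
})

# Class balance (the counts behind the Figure 2 bar plot).
counts = df["Divorce_Y_N"].value_counts()
print(counts.to_dict())

# Per-class distribution summary (what the violin plots show graphically).
means = df.groupby("Divorce_Y_N")["Im_not_wrong"].mean()
print(means)
```

A higher mean response in the divorce class is the numerical counterpart of the taller upper lobe in the violin plot.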
All these applied divorce analyses prove to be very fruitful in the context of getting useful insights from the dataset and its related features. Histogram charts are plotted from the dataset in Figures 6 and 7. A histogram is a data representation tool that appears to be a bar chart bucketing a range of outcomes along the x-axis. The count or percentage of value occurrences in the dataset for every column is represented on the y-axis.
We plot the histogram of the feature 2_stranger (7), showing the total number of counts on the y-axis and the 0 to 4 scale on the x-axis. Histogram charts are also plotted on the trust, role, marriage, love, and dreams columns, exploring the counts at the different scale values on the y-axis and x-axis, respectively. From Figure 6, we have analyzed that the feature I'm_not_wrong (51) has higher-rank values among all. This shows that this feature question is a major cause of divorce, which is why it has higher-ranked scale values. This applied divorce histogram analysis is based on the prominent questions present in the dataset and their scale ranks. These questions are analyzed to determine their feature importance and the relationship between divorce causes. These features are used for model training and for obtaining divorce predictions.
A correlation graph displays the correlations for the various variables present in the employed dataset. The correlation matrix emphasizes the relationship between all possible pairings of values in a dataset. It is a powerful tool for summarizing a large dataset and for visualizing and identifying trends in the data. We draw the correlation matrix on the dataset in Figure 8. The visualized features are those with correlation values greater than or equal to 0.7; features with low correlation values are not present in the feature display map. The correlation matrix shows that all these features are highly related and all are important to use for training our model.
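The 0.7 correlation threshold described above can be sketched as follows; the column names and data are synthetic stand-ins, not the real dataset:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
base = rng.integers(0, 5, size=200)

# Synthetic stand-in: two items derived from a shared signal plus one noisy item.
df = pd.DataFrame({
    "trust": base,
    "harmony": np.clip(base + rng.integers(-1, 2, size=200), 0, 4),
    "happy": np.clip(base + rng.integers(-1, 2, size=200), 0, 4),
    "noise": rng.integers(0, 5, size=200),
})

corr = df.corr()
# Keep only features whose correlation with "trust" is at least 0.7,
# mirroring the 0.7 cutoff used for the correlation map in Figure 8.
strong = corr["trust"][corr["trust"].abs() >= 0.7].index.tolist()
print(strong)
```

Features that survive this filter are the highly related ones retained for model training.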

Feature Engineering.
Feature engineering is the technique of changing the raw dataset into a prominent feature space that well describes the root problem for predictive techniques, resulting in improved model performance. The most correlated features are examined in Figure 9. The fav_food (24), know_well (30), freedom_value (12), marriage (18), special_time (5), roles (19), harmony (11), happy (17), enjoy_travel (9), insult (36), humiliate (37), and trust (21) are the top correlated features. The full text of the corresponding questionnaire items (e.g., "The time I spent with my wife is special for us," "I enjoy traveling with my wife," "Most of our goals are common to my spouse") is listed in Table 1.

Dataset Splitting.
Dataset splitting is a requirement for removing bias from training data in machine learning systems. The dataset is split into two sets: the training dataset, which is used by the model to learn an efficient mapping of inputs to outputs, and the test set, which is utilized to assess the proposed model's performance. This division prevents the employed technique from overfitting [19]. The dataset splitting utilized in this research has a ratio of 80:20. The 80% portion of the dataset is used to train the ensemble learning models, and the 20% portion is utilized for testing and evaluating the ensemble model. The random state for splitting is 42.
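The 80:20 split with a random state of 42 described above can be sketched with scikit-learn; the data here is a synthetic stand-in matching the dataset's shape (170 couples, 54 questions), not the real questionnaire responses:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the dataset's shape: 170 samples, 54 features.
X, y = make_classification(n_samples=170, n_features=54,
                           n_informative=10, random_state=42)

# 80:20 split, random_state=42, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)
```

With 170 rows, the split yields 136 training and 34 test samples.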

Passive Aggressive Classifier.
The passive-aggressive classifier [20] is one of the accessible incremental learning methods because it uses a closed-form updating rule. In the sense that they do not require a learning rate, passive-aggressive algorithms are akin to perceptron models. They do, however, contain a regularization parameter. The classifier updates its weight vector for each misclassified training sample it receives in an attempt to fix it. The hyperparameter tuning analysis of the passive-aggressive algorithm is examined in Table 2.
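A minimal scikit-learn sketch of this classifier on synthetic stand-in data; the regularization value C is illustrative, not the tuned value from Table 2:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.model_selection import train_test_split

# Synthetic, well-separated stand-in data with the dataset's shape.
X, y = make_classification(n_samples=170, n_features=54, n_informative=10,
                           class_sep=2.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# C is the regularization parameter mentioned in the text; no learning rate is needed.
pac = PassiveAggressiveClassifier(C=0.5, max_iter=1000, random_state=42)
pac.fit(X_train, y_train)
print(pac.score(X_test, y_test))
```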

Support Vector Machine.
The support vector machine (SVM) [21] is a supervised learning model that is utilized to solve regression and classification problems, although it is largely employed for classification. Every data item is visualized as a point in n-dimensional space, where n is the number of data features, and the value of every feature is the value of a certain coordinate. Then, we achieve classification by establishing the hyperplane that best distinguishes the two classes of the employed dataset. The SVM technique hyperparameters are analyzed in Table 3.
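The SVM step can be sketched as follows on synthetic stand-in data; the kernel and C are illustrative defaults, not the tuned values from Table 3:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=170, n_features=54, n_informative=10,
                           class_sep=2.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Fit a separating hyperplane (RBF kernel maps to a higher-dimensional space).
svm = SVC(kernel="rbf", C=1.0, random_state=42)
svm.fit(X_train, y_train)
print(svm.score(X_test, y_test))
```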

Neural Networks.
A feedforward artificial neural network (ANN) that generates a set of outputs from a set of inputs is referred to as a multilayer perceptron (MLP) neural network [22]. An MLP is characterized by several layers of nodes connected as a directed graph between the input and output layers, meaning that the data signal routed across the graph nodes travels in only a single direction. Backpropagation is utilized to train the network. In addition to the input nodes, every node has a nonlinear activation function.
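A minimal MLP sketch on synthetic stand-in data; the hidden-layer size is an illustrative choice, not the paper's tuned architecture:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=170, n_features=54, n_informative=10,
                           class_sep=2.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# One hidden layer; fit() trains the network via backpropagation.
mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=2000, random_state=42)
mlp.fit(X_train, y_train)
print(mlp.score(X_test, y_test))
```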

Ensemble Learning
The ensemble learning approach is examined and applied in this research. The architecture of the applied ensemble approach is analyzed in Figure 10. The training dataset is used for training the three classification models utilized in this research. The SVM, linear model, and neural network model are trained and tested in parallel using the ensemble learning pipeline; the architecture is based on training and testing all underlying models in parallel. The testing results are then used by the "hard" voting function, which combines the class-label votes of the base models. We have applied hard voting because our classification depends on class labels and the weights associated with every classifier. The prediction with the highest accuracy score is our best prediction value.
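The hard-voting pipeline above can be sketched with scikit-learn's VotingClassifier on synthetic stand-in data; the base-model hyperparameters are illustrative, not the tuned values:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=170, n_features=54, n_informative=10,
                           class_sep=2.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Hard voting: each base model casts a class-label vote; the majority wins.
ensemble = VotingClassifier(
    estimators=[
        ("svm", SVC(random_state=42)),
        ("pac", PassiveAggressiveClassifier(max_iter=1000, random_state=42)),
        ("mlp", MLPClassifier(max_iter=2000, random_state=42)),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
print(ensemble.score(X_test, y_test))
```

Hard voting only needs class labels from each base model, which is why it suits classifiers (such as the passive-aggressive classifier) that do not expose calibrated probabilities.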

Results and Evaluation
All performance evaluation metrics utilized in this research are examined in this section. The ensemble learning model accuracy score, ROC accuracy score, recall score, precision score, and F1 score are the performance evaluation metrics employed in this research study. Accuracy is one parameter for assessing classification models: the accuracy score is the percentage of correct predictions made by our proposed model. The accuracy of our proposed technique is 100%. Formally, accuracy is represented by the following equation:

accuracy = number of correct predictions / total number of predictions. (1)

The ROC curve is the probability curve that displays the true positive rate (TPR) vs the false positive rate (FPR) at numerous threshold settings, separating the signal from the noise. The area under the curve (AUC) is a measure of a classifier's ability to discriminate between classes and is utilized to summarize the ROC curve. The ROC AUC of our proposed technique is 97%. Precision is the ratio of true positives to all predicted positives, and recall is a measure of how well our model identifies true positives. In our case, both have a 97% score. Precision and recall are expressed as:

precision = true positives / (true positives + false positives), (2)
recall = true positives / (true positives + false negatives). (3)
The F1 score is measured by taking the weighted average of recall and precision. As a result, this score examines both the false positive rate (FPR) and the false negative rate (FNR). The F1 score is sometimes more valuable than the accuracy score, especially if the dataset class distribution is not equal. In our situation, the F1 score is 97%. Mathematically, it is represented as follows:

F1 score = 2 * (recall * precision) / (recall + precision). (4)
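Equations (1)-(4) can be computed directly with scikit-learn; the labels below are toy values for illustration, not the paper's test-set results:

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Toy ground truth and predictions: TP=4, TN=4, FP=1, FN=1.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0]

print(accuracy_score(y_true, y_pred))   # equation (1)
print(precision_score(y_true, y_pred))  # equation (2)
print(recall_score(y_true, y_pred))     # equation (3)
print(f1_score(y_true, y_pred))         # equation (4)
print(roc_auc_score(y_true, y_pred))    # AUC summarizing the ROC curve
```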
The hyperparameter tuning results before and after tuning are analyzed in Table 5. The k-fold cross-validation comparative results are analyzed in Table 6. The comparative analysis of the applied learning techniques with the ensemble learning approach is demonstrated in Tables 7 and 8. A confusion matrix (CM) is a summary of the employed classification problem and the prediction outcomes, as visualized in Figure 11. The numbers of right and wrong predictions are summarized with count values and divided by dataset category. The CM displays the ways in which the classification technique gets confused when making predictions. It is critical to assess the model's performance once it has been trained on the training data. A confusion matrix has several components: the true positives, true negatives, false positives, and false negatives.
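The 5-fold cross-validation protocol and the confusion matrix of Figure 11 can be sketched together; the data is a synthetic stand-in with the dataset's shape, and the SVM is used as a representative base model:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=170, n_features=54, n_informative=10,
                           class_sep=2.0, random_state=42)

# 5-fold cross-validation, as in the paper's evaluation protocol.
scores = cross_val_score(SVC(random_state=42), X, y, cv=5)
print(scores.mean())

# Confusion matrix layout for a binary problem: [[TN, FP], [FN, TP]].
clf = SVC(random_state=42).fit(X, y)
cm = confusion_matrix(y, clf.predict(X))
print(cm)
```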

Conclusion
The prediction of divorce by using machine learning and ensemble learning techniques is the core motive of this research study. The findings of our study are based on key indicators for divorce and the factors that are most significant when predicting divorce. The support vector machine (SVM), passive aggressive classifier, and neural network (MLP) are applied to predict divorce. Cross-validation and performance evaluation techniques are used to evaluate the proposed models. Our proposed EL technique achieved the highest accuracy of 100%. In the context of limitations and future directions, we will try to enhance the questionnaire dataset by adding more questions to get clearer results and will also apply data augmentation techniques. To reduce overfitting, we will explore different deep learning models.
Data Availability
The supporting data for the findings of this study are available from the corresponding author on reasonable request.

Consent
Informed consent was obtained from all subjects involved in the study.

Conflicts of Interest
The authors declare no conflicts of interest.

Authors' Contributions
M. M. S. F. conceptualized the study, carried out supervision, conducted the survey, and assisted in data collection. A. R. and F. Y. performed data analysis and manuscript writing. G. A., M. A., N. Z., and A. T. provided resources and carried out data curation, funding acquisition, and project administration. All authors have read and agreed to the published version of the manuscript.