Stress Estimation Model for the Sustainable Health of Cancer Patients

Good health is essential for the stress-free, skilled, hardworking, and cooperative people who create a sustainable society. This study validates two algorithms, sequential minimal optimization for regression (SMOreg), which is based on the support vector machine, and linear regression (LR), and uses their predicted numbers of cancer patients to present a patient's stress estimation model (PSEM). PSEM forecasts the stress of patients' families so that the under-study cancer hospitals can support patients' sustainable health and better care through early management. The year-wise predictions (2011-2020) by LR and SMOreg, made from the cases registered from 1998 to 2010, are verified by comparing them with observed values. The statistical difference between the predictions (2021-2030) of these models is analyzed using a t-test. From the data of 217067 patients, patients' stress-impacting factors are extracted for use in the proposed PSEM. Using the total population of the under-study areas and the predicted population (2021-2030) of each area, the proposed PSEM forecasts the overall stress for expected cancer patients (2021-2030). The root mean square error (RMSE) for LR (1076.15) is less than the RMSE for SMOreg (1223.75); hence, LR performs better than SMOreg in forecasting (2011-2020). There is no significant statistical difference between the values (2021-2030) predicted by LR and SMOreg (p value = 0.767 > 0.05). The average stress for a family member of a cancer patient is 72.71%. It is concluded that the under-study areas face a minimum of 2.18%, an average of 30.98%, and a maximum of 94.81% overall stress because of 179561 expected cancer patients of all major types from 2021 to 2030.


Introduction
Every resilient city in the world has an intense need for a sustainable society, and this need is fulfilled by people whose characteristics can serve as the pillars of a successful civilization. These characteristics include "unstressedness," "skillfulness," "hardworking," and "cooperativeness." Cancer is one of the most devastating diseases and causes many deaths. It was reported that, in 2020, across 185 countries, an estimated 19.3 million new cancer cases (18.1 million excluding nonmelanoma skin cancer) and almost 10.0 million cancer deaths (9.9 million excluding nonmelanoma skin cancer) occurred [1]. From 1998 to 2020, 217067 patients with different cancer types were registered, within these 23 years, in just three hospitals in the Punjab province of Pakistan. Cancer has therefore become a great burden on sustainable public health. A cancer patient in the family is a cause of immense stress for all family members. Such family members cannot work hard, even if they have qualities like "skillfulness," "hardworking," and "cooperativeness," to create a sustainable society for a resilient city.
Machine learning provides various algorithms that have been applied in data mining for the social sciences [2][3][4][5][6][7]; regression models are the ones most often used for prediction. Linear regression predicts the value of a dependent variable from an independent variable. Multiple regression uses several explanatory variables to predict the outcome of a response variable. For the support vector machine (SVM) [8], sequential minimal optimization (SMO) [9] was proposed for solving the regression problem [10]. SMOreg is an improvement to SMO for SVM regression presented by Shevade et al. [11]. One study compared linear regression and SMOreg for prediction in the business area [12].
Good health is essential for the stress-free, skilled, hardworking, and cooperative people who create a sustainable society. As discussed above, a cancer patient in the family is a cause of immense stress for all family members. Such family members cannot work hard, even if they have qualities like "skillfulness," "hardworking," and "cooperativeness," to establish a viable civilization. Therefore, to overcome or reduce this stress on families, every hospital needs early management for better care of cancer patients, especially in underdeveloped countries like Pakistan. This study presents a model that forecasts the stress of patients' families to support patients' sustainable health and better care through early management by the under-study cancer hospitals. Because the predicted numbers of new cases from 2021 to 2030 are used to estimate the stress, this study also validates the results predicted by linear regression and SMOreg; some previous studies validated their forecasted cases of cancer patients, and others did not.

Literature Review
The literature contains extensive work on prediction models for different diseases. Reddy et al. presented an adaptive genetic algorithm with a fuzzy logic model to predict heart disease at early stages [13]. A study proposed a novel approach for classifying the cries of newborn infants into three groups: sleep, hunger, and discomfort [14]. Ramaneswaran et al. proposed a hybrid Inception v3 XGBoost model for classifying a severe and deadly disease, lymphoblastic leukemia, from microscopic images of white blood cells [15]. Gundluru et al. designed a deep learning model that uses principal component analysis for dimensionality reduction and the Harris hawks optimization algorithm to optimize classification and feature extraction; they also extracted the most important features in this regard [16].

Structural equation modeling was used to study the relationships between mental health and parenting stress [17]. Structural equation modeling and confirmatory factor analysis were used to test the roles of posttraumatic growth, physical posttraumatic growth, resilience, and mindfulness in predicting health-related and psychological adjustment [18]. Mediation analyses and multivariate regression were used to clarify the extent to which coping strategies, psychological symptoms, and social support interfere with sleep quality, and whether they mediate the relationship between fatigue or functional capacity and sleep quality, in a sample of lung cancer patients treated with chemotherapy [19]. Guo et al. described stress patterns in connection with the social support networks of hospice care [20]. Lisowska et al. classified patient stress in experiments using blood volume pulse [21]. Durangi et al. discussed stress levels and related aspects in cancer patients [22]. Mikkelsen et al. reported the effect of emotional therapy on psychologically distressed caregivers of tumor patients [23]. Safaei and Shokri assessed stress in cancer patients using factorial validity [24].

The research community has also published fruitful results on predictions for coming years, giving oncologists better management and healthcare ideas for the treatment of this lethal disease [25][26][27][28][29][30][31][32][33][34]. A study presented a comprehensive analysis of the risk of subsequent hematological malignancies after primary tumors in cancer patients [35]. The performances of different machine learning algorithms, including Decision Tree, k Nearest Neighbors, support vector machine, and Naive Bayes, were compared on the Wisconsin Breast Cancer dataset to assess the accuracy, effectiveness, and efficiency of each algorithm in classifying that dataset [32]. Table 1 shows the related results about developments in different areas published in recent years. In 2012, worldwide mortality and incidence rates of breast cancer were investigated using age-specific mortality and incidence rates [31]. Breast cancer statistics of four countries, the US, UK, Egypt, and India, were shared in 2015 [26]. According to one prediction, around 3.2 million new cases of female breast cancer per year will be seen worldwide by 2050 [27].

Method
This study has three parts. The first part evaluates the approaches used (LR and SMOreg). The second part forecasts and compares the numbers of cancer patients predicted by these approaches, which are then used in the third part. The third part presents the model proposed by this study, the patient's stress estimation model (PSEM).

Patients and Datasets.
A total of 219882 cases of cancer patients registered from 1998 to 2020 were obtained, with year-wise details, from three sources. The first source was the record room of the Clinical Oncology Department of Allied Hospital, Faisalabad Medical University, Faisalabad, Pakistan. The second was the Shaukat Khanum Cancer Registry [36] at Shaukat Khanum Memorial Cancer Hospital and Research Centre, Lahore, Pakistan. The third was derived from a previous study [37]. During data cleaning and organization, 2815 repeated cases were removed, leaving 217067 cancer patients, who were listed year-wise in two parts of the dataset for this study. The first part, named "CancerPatients1998to2010," contained 88710 patients listed year-wise from 1998 to 2010. The second part, named "CancerPatients1998to2020," listed all 217067 patients year-wise from 1998 to 2020. The adopted methodology of this study is shown in Figure 1.

Computational and Mathematical Methods in Medicine

The "CancerPatients1998to2010" dataset was used in the first part because we wanted to evaluate both approaches before forecasting new cancer incidences from 2021 to 2030. Therefore, in the first part, LR and SMOreg were given the list of cancer patients registered from 1998 to 2010 and used to predict the number of cancer patients from 2011 to 2020. Both were configured through five properties, "selected attribute," "number of time units to forecast," "timestamp," "periodicity," and "confidence interval," which were set to "patients," "10," "year," "yearly," and "95%," respectively.
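The hold-out design described above (fit on 1998 to 2010, then forecast the next 10 yearly values) can be sketched with an ordinary least-squares trend line. This is only an illustration in plain Python: the yearly counts are made-up placeholders rather than the registry data, and `fit_trend` and `forecast` are hypothetical helper names, not part of the tool the authors configured.

```python
from typing import List, Tuple


def fit_trend(years: List[int], counts: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of counts = slope * year + intercept."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(slope: float, intercept: float,
             last_year: int, horizon: int) -> List[Tuple[int, float]]:
    """Extrapolate the fitted line for `horizon` yearly steps after `last_year`."""
    return [(last_year + k, slope * (last_year + k) + intercept)
            for k in range(1, horizon + 1)]


# Placeholder training series (assumption): NOT the paper's registry counts.
train_years = list(range(1998, 2011))                    # 1998..2010
train_counts = [3000.0 + 500.0 * i for i in range(len(train_years))]

slope, intercept = fit_trend(train_years, train_counts)
# "Number of time units to forecast" = 10, periodicity = yearly.
predictions = forecast(slope, intercept, last_year=2010, horizon=10)

for year, value in predictions:
    print(year, round(value))
```

In the study itself, the forecasts for 2011 to 2020 are then compared with the observed registrations for those years; the sketch stops at producing the forecasts.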

Configuration for Forecasting Cancer Patients from 2021 to 2030. In the second part, LR and SMOreg were again implemented with the same configuration, as discussed in Section 3.1, to forecast the year-wise number of patients from 2021 to 2030 using the "CancerPatients1998to2020" dataset. The values forecasted by the two approaches then needed to be compared. The next section elaborates on the analysis methods used to compare the differences between the predicted values and the known values listed in the dataset.

Table 1: Related results published in recent years.

| Year | Ref. | Objective | Approach | Results |
| --- | --- | --- | --- | --- |
| 2017 | [17] | To study the relationships between mental health, parenting stress, and dyadic adjustment among first-time parents | Structural equation modeling | Showed the full intervention effect of mental health between dyadic adjustment and parenting stress. A multigroup analysis observed that the paths did not vary across fathers and mothers. |
| 2018 | [18] | To examine the role of physical posttraumatic growth, posttraumatic growth, resilience, and mindfulness in the prediction of psychological and health-related adjustment | Confirmatory factor analysis and structural equation modeling | Forecasted quality of life and improvement of lower distress. The relationship between adjustment and resilience was noticed to be negotiated. |
| 2019 | [19] | To clear up the extent to which coping strategies, psychological symptoms, and social support interfere with good sleep quality and whether they arbitrate the relationship between fatigue and sleep quality or functional capacity of lung cancer patients | Multivariate regression and mediation analyses | 119 patients were enrolled, 58.2% of whom were found to have poor sleep because of cancer stress. |
| | [13] | To forecast heart disease, which will help a physician in the diagnosis of heart disease at early stages | Rough sets and fuzzy rule-based classification with adaptive genetic algorithm | The main strength of the presented model was that it could efficiently tackle noisy data even with a huge number of attributes. |
| | [14] | To categorize the infant cries of a newborn into three groups such as hunger, discomfort, and sleep | Acoustic feature engineering and variable selection using random forests | Showed a mean accuracy of around 91% for most situations, which showed the capability of the suggested extreme gradient boosting-powered grouped-support-vector network in the classification of neonate cries. The presented approach also had a fast recognition rate of 27 seconds in the recognition of those emotional cries. |
| 2021 | [15] | To classify severe lymphoblastic leukemia from microscopic images of white blood cells | Image feature extractor and a classification head | Exhibited that using an XGBoost versus softmax classification head enhanced classification performance. Further, an attention map of the features extracted by Inception v3 was used to interpret the features learned by the presented model. |
| | [16] | To detect diabetic retinopathy at the early stages, giving better results than other published approaches | Harris hawks optimization | The proposed model surpassed the other leading machine learning algorithms, and training time was minimized. However, it was prone to overfitting, which had a negative impact on results when the original dataset was employed. The performance of the proposed approach improved even when the dataset size was doubled. |
RMSE 1 and RMSE 2, for LR and SMOreg, respectively, calculated against the values listed in the "CancerPatients1998to2020" dataset, were then compared for the conclusion. The detail of this analysis is given in Statistical Analysis of this study.

It also causes stress for them when they see a person in their relationship become a patient, especially a chronic patient. From observation of, and interviews with, the under-study patients and their family members, it was derived that, when a person suffers from cancer, his or her family members become stressed for two major reasons: affiliations and financial aspects. Under affiliations, the first stress-impacting factor, this study includes "father," "mother," "child," "brother," "sister," "friend," "colleague," and "neighbor." Under financial aspects, the second stress-impacting factor, it includes "(is patient) working person," "expired," "physical status," "income status," and "treatment expenses." Other factors that take part in the calculation of total stress for a family of a cancer patient(s) are "number of working family members," "number of dependent family members," and "number of expired patients in a family."

Estimating Stress for a Family Member of a Cancer Patient. The first equation of PSEM, derived by this study, is given below:

where Sf denotes the stress for a family member of a cancer patient. A is the affiliation, which may be of five types: father/mother, child, brother/sister, friend, and colleague/neighbor. To estimate the stress, these types are assigned the weights 5, 4, 3, 2, and 1, respectively. wP takes the answer to the question "Is the cancer patient a working person?"; if the answer is "yes," wP is assigned 10, and 5 otherwise. E takes the answer to the question "Is the cancer patient expired?"; if the answer is "yes," E is assigned 7, and 4 otherwise. The variable pS takes the answer to the question "What is the physical status of the cancer patient, can he/she work?" The answer may be "cannot work," "can work 25%," or "can work 50%," with weights 5, 2, and 1, respectively. The variable iS takes the answer to the question "What is the income status of the cancer patient?" The answer may be "cannot work," "can work 25%," or "can work 50%," with weights 5, 2, and 1, respectively. The variable eT takes the answer to the question "What are the expenses for treatment of the cancer patient?" The answer may be "self," if no funding was available; "self and free," if some funding was available; or "free," if the treatment was funded. "Self" and "free" are assigned the weights 10 and 1, respectively, whereas "self and free" is assigned a weight from 2 to 9 according to the ratio of available funding to self-paid expenses on the treatment of the cancer patient. All the weights are assumed so that the values can be calculated mathematically. Observation of the under-study data and interviews with many patients led this study to adopt the above weights.
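The weight assignments described above can be written down as a small lookup, which makes the questionnaire-to-weight mapping explicit. The sketch below is a hedged illustration: the function and dictionary names are hypothetical, the weight for "self and free" must be supplied by the caller because the text gives only its range (2 to 9), and the code deliberately stops short of combining the weights into Sf, since that step depends on Equation (1).

```python
from typing import Dict, Optional

# Weights as stated in the text.
AFFILIATION_WEIGHTS: Dict[str, int] = {
    "father/mother": 5,
    "child": 4,
    "brother/sister": 3,
    "friend": 2,
    "colleague/neighbor": 1,
}
# The text lists the same three answers and weights for both pS and iS.
WORK_CAPACITY_WEIGHTS: Dict[str, int] = {
    "cannot work": 5,
    "can work 25%": 2,
    "can work 50%": 1,
}


def treatment_expense_weight(funding: str, mixed_weight: Optional[int] = None) -> int:
    """eT: 10 for "self", 1 for "free", caller-supplied 2..9 for "self and free"."""
    if funding == "self":
        return 10
    if funding == "free":
        return 1
    if funding == "self and free":
        # Assumption: how the funding ratio maps onto 2..9 is not specified,
        # so the caller passes the chosen weight explicitly.
        if mixed_weight is None or not 2 <= mixed_weight <= 9:
            raise ValueError("mixed_weight must be an integer from 2 to 9")
        return mixed_weight
    raise ValueError(f"unknown funding answer: {funding!r}")


def factor_weights(affiliation: str, is_working: bool, is_expired: bool,
                   physical_status: str, income_status: str,
                   funding: str, mixed_weight: Optional[int] = None) -> Dict[str, int]:
    """Return the six stress-impacting weights for one family member."""
    return {
        "A": AFFILIATION_WEIGHTS[affiliation],
        "wP": 10 if is_working else 5,
        "E": 7 if is_expired else 4,
        "pS": WORK_CAPACITY_WEIGHTS[physical_status],
        "iS": WORK_CAPACITY_WEIGHTS[income_status],
        "eT": treatment_expense_weight(funding, mixed_weight),
    }


heaviest = factor_weights("father/mother", True, True,
                          "cannot work", "cannot work", "self")
print(heaviest, sum(heaviest.values()))
```

With these assignments, the six weights sum to at most 42 (all answers at their heaviest) and at least 13 (all answers at their lightest).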

Calculating Total Stress for a Family of a Cancer Patient. After the stress for a family member of a cancer patient is estimated, PSEM must calculate the total stress for the whole family of the cancer patient(s). Therefore, using Equation (1) and other factors, namely "number of working members of a family of a patient(s)," "number of dependent members (who do not work) of a family of a patient(s)," and "number of expired cancer patients in that family," the following equation was derived by this study to calculate the total stress for the whole family of a cancer patient(s) (Figure 2).
where TS denotes the total stress for the whole family of a cancer patient(s). Sf is the stress for a family member of a cancer patient, calculated by Equation (1). nD is the number of dependent members (who do not work) of a family of a patient(s). nW is the number of working members of a family of a patient(s), whereas nE is the number of expired cancer patients in that family.
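As a consistency check on these definitions, the sketch below takes the five-number rows tabulated in this paper's results and tests one candidate relation against them. Both the column reading (Sf, two family-member counts, nE, TS) and the relation `Sf * (|a - b| + 1) + nE` are assumptions inferred from the numbers alone; they are not a statement of Equation (2), and which of the two counts is nW and which is nD is left open.

```python
from typing import List, Tuple

# Rows copied from the paper's table, in order.
# Assumed reading: (Sf, family-member count a, family-member count b, nE, TS).
ROWS: List[Tuple[int, int, int, int, int]] = [
    (96, 3, 7, 1, 481), (95, 2, 6, 1, 476), (94, 1, 3, 0, 282),
    (93, 1, 5, 1, 466), (92, 1, 4, 1, 369), (91, 1, 3, 0, 273),
    (64, 0, 5, 1, 385), (63, 1, 2, 0, 126), (61, 1, 2, 0, 122),
    (49, 3, 4, 0, 98), (48, 1, 2, 0, 96), (47, 1, 6, 0, 282),
    (46, 1, 5, 1, 231), (45, 1, 3, 0, 135), (44, 0, 7, 1, 353),
    (42, 1, 2, 0, 84), (41, 5, 1, 0, 205), (45, 1, 2, 2, 92),
    (23, 1, 1, 0, 23),
]


def candidate_total_stress(sf: int, count_a: int, count_b: int, n_expired: int) -> int:
    """Hypothetical relation inferred from the rows; NOT the paper's Equation (2)."""
    return sf * (abs(count_a - count_b) + 1) + n_expired


# Collect every row whose last value is not reproduced by the candidate relation.
mismatches = [row for row in ROWS if candidate_total_stress(*row[:4]) != row[4]]
print(f"{len(ROWS) - len(mismatches)} of {len(ROWS)} rows reproduced")
```

All 19 rows are reproduced by this candidate, which makes it a plausible reading of the tabulated values but does not establish the form of Equation (2).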

Estimating the Overall Stress of All Cancer Patients in Under-Study Areas. Using Equations (1) and (2), PSEM derives the third equation (given below) to estimate the overall stress for all cancer patients in the under-study areas.

$$\mathit{OES} = \frac{\sum_{i=1}^{\mathit{nF}} \mathit{TS}_i}{\mathit{pA}} \qquad (3)$$
where OES denotes the overall estimated stress of all cancer patients in the under-study areas, nF is the number of families with cancer patients in the under-study areas, TS is the total stress for the whole family of a cancer patient, calculated using Equation (2), and pA is the population of the areas of the under-study hospitals. For example, if there are 35 families with cancer patients in an area, the numerator of the given fraction adds the total stresses of those 35 families, and this sum is then divided by the population of that area.

96  3  7  1  481
95  2  6  1  476
94  1  3  0  282
93  1  5  1  466
92  1  4  1  369
91  1  3  0  273
64  0  5  1  385
63  1  2  0  126
61  1  2  0  122
49  3  4  0   98
48  1  2  0   96
47  1  6  0  282
46  1  5  1  231
45  1  3  0  135
44  0  7  1  353
42  1  2  0   84
41  5  1  0  205
45  1  2  2   92
23  1  1  0   23

Evaluating and Validating the Predictions by LR

Statistical Analysis. The performance of LR and SMOreg in forecasting the year-wise number of patients from 2011 to 2020 needs to be compared and evaluated. Therefore, their RMSE 1 and RMSE 2, which are 1076.15 and 1223.70, respectively, are calculated using the following equation [38]:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^2}$$

where P denotes the predicted value, O is the observed value, n is the number of forecasting instances, and i = 1, 2, 3, ..., n. To analyze the statistical difference between LR and SMOreg in forecasting the year-wise number of patients from 2021 to 2030, a two-sample t-test is applied; the value of N for each of these models is 10.
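The RMSE described above and the two-sample comparison can be sketched in plain Python. The inputs are small made-up placeholders, not the study's forecasts; `rmse` and `pooled_t_statistic` are illustrative names, and the equal-variance (pooled) form of the t statistic is an assumption, since the text does not say which variant was applied.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error: sqrt of the mean of (P_i - O_i)^2 over n instances."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pooled_t_statistic(sample_a: Sequence[float],
                       sample_b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample t statistic with pooled variance (assumed variant) and its
    degrees of freedom; the p value would come from the t distribution."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df


# Placeholder numbers (assumption), chosen only to exercise the functions.
print(rmse([2, 4, 6], [1, 4, 8]))
print(pooled_t_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
```

With N = 10 forecasts per model, as in this study, the pooled statistic would have 10 + 10 - 2 = 18 degrees of freedom.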

The Estimated Stress for a Family Member of Cancer Patients. In the third part of the study, forecasting (2021-2030) the overall stress for all expected cancer patients of the under-study areas with PSEM required TS, the total stress for a family of a cancer patient. This in turn required Sf, the stress for a family member of a cancer patient (see Section 3.4.2), because Sf is used in Equation (2). It was observed that many patients had common values of the stress-impacting factors A, wP, E, pS, iS, and eT (these variables have already been discussed in Section 3.4.2). Using Equation (1) with these common values, the calculated Sf is given in Table 2.

Discussions
Part 1 of this study concludes that, based on the observed number of patients registered from 1998 to 2010 in the under-study hospitals, linear regression is better than SMOreg at forecasting the year-wise number of patients from 2011 to 2020, because RMSE 1 (1076.15) is less than RMSE 2 (1223.70). The statistical analysis of part 2 finds no significant statistical difference between the year-wise numbers of patients from 2021 to 2030 predicted by linear regression and by SMOreg, because the p value (0.767) is not less than 0.05. Linear regression predicts 179561 patients, whereas SMOreg predicts 181768 patients, from 2021 to 2030. Because linear regression performed better than SMOreg on the 2011 to 2020 forecasts, the year-wise patients forecasted by LR from 2021 to 2030 are the ones used in this study. Further, based on the 217067 cancer patients already registered from 1998 to 2020, it is estimated that the under-study hospitals will register 15493, 16119, 16658, 17183, 17707, 18231, 18755, 19280, 19805, and 20330 new cases of cancer patients in the years 2021 to 2030, respectively. As discussed in "Method," the third part of this study derives patients' stress-impacting factors and estimates the stress for a family member of a cancer patient, the total stress for a family of a cancer patient, and the overall stress of all cancer patients. We could not find any paper that was exactly relevant to the major contributions of this study; however, some studies presented parts of these contributions. Table 5 compares their relevant work with the approach used in this study.
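The totals quoted in this section can be cross-checked with a few lines of arithmetic. The snippet below only re-adds the LR forecasts listed above and compares the two models' totals; the variable names are illustrative.

```python
# Year-wise LR forecasts for 2021..2030, as listed in the text.
lr_forecasts = {
    2021: 15493, 2022: 16119, 2023: 16658, 2024: 17183, 2025: 17707,
    2026: 18231, 2027: 18755, 2028: 19280, 2029: 19805, 2030: 20330,
}

lr_total = sum(lr_forecasts.values())
smoreg_total = 181768  # total reported for SMOreg over the same years

# Year-on-year increments of the LR forecasts.
values = [lr_forecasts[year] for year in sorted(lr_forecasts)]
increments = [later - earlier for earlier, later in zip(values, values[1:])]

print(lr_total)                  # 179561
print(smoreg_total - lr_total)   # 2207
print(increments)                # [626, 539, 525, 524, 524, 524, 525, 525, 525]
```

The sum matches the 179561 expected patients used throughout the paper, and from 2023 onward the forecasts grow by about 525 cases per year, as expected of a near-linear trend.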

Conclusion
Using the proposed model, PSEM, this study forecasts (2021-2030) the overall stress for expected cancer patients of the under-study areas as 30.98%, 2.18%, and 94.81%, corresponding to the average, minimum, and maximum values of TS (328.43, 23, and 1003), respectively. Thus, the under-study areas face a minimum of 2.18%, an average of 30.98%, and a maximum of 94.81% overall stress because of 179561 expected cancer patients of all major types from 2021 to 2030. These families are therefore hindered in creating a sustainable society by the stress arising from their family members' cancer. This study recommends that PSEM can also be used to calculate and forecast stress for patients with other chronic diseases.

Data Availability
The authors have used publicly available data to support the findings of this study, and these data are included within the article.

Ethical Approval
This study used only the numbers of patients from three major cancer hospitals. The data are openly available in the repository on their website, as discussed in Section 3.1 of this paper. The interviews with patients and the observations of patients' factors discussed in this study are not personal; therefore, this study does not require approval from an ethical approval body.