Identifying the Influencing Factors of Depressive Symptoms among Nurses in China by Machine Learning: A Multicentre Cross-Sectional Study



Introduction
As a result of the COVID-19 pandemic, there is growing concern about the health of healthcare professionals such as nurses, who face stressors including overloaded clinical tasks and heavy nursing assignments. Nurses play an essential role in the healthcare system but are often exposed to high workloads and stressful environments, such as those that place them in proximity to dying patients [1]. These characteristics of nursing work have increased the demand for nursing professionals on the frontline, leading to high levels of overwork, detrimental psychological consequences, and the deterioration of mental health [2], including increased depressive symptoms.
Depressive symptoms not only affect nurses' health but also negatively affect their work performance and the quality of the care they provide [3][4][5]. Depressive symptoms also substantially reduce an individual's quality of life and life satisfaction. Nurses are highly prone to psychological health problems such as depression [6]. According to a meta-analysis, approximately 30% of nurses working during the COVID-19 pandemic outbreak suffered from psychological symptoms, including anxiety, stress, and depression, with prevalence rates of 37%, 43%, and 35%, respectively [7]. In China, nurses had poor mental health during the COVID-19 pandemic, with prevalence rates for anxiety and depression estimated at 18.1% and 34.4%, respectively. Meanwhile, nurses who cared for COVID-19 patients had an extremely high prevalence of depression, at 47.1% [8].
Most current studies on nurses' depressive symptoms have emphasised the association between depressive symptoms and negative physical and psychological outcomes, such as perceived stress [9], burnout [10], chronic fatigue [11], sleep quality [12], medication errors [13], and decreased quality of care received by patients [13]. In China, there is a lack of efficient supervision systems and organisational institutions to help nurses cope with poor psychological health; instead, the focus has largely been on improving nurses' professional skills and abilities. Recently, more attention has been paid to nurses' career identity and organisational support as ways to help nurses combat occupational stress, ease their burden, and prevent depression [14,15]. Moreover, positive psychology theory has gained traction, with an emphasis on positive personal resources such as recovery experience and resilience [16,17]. Based on the NAM model [18], the factors that affect nurses' well-being can be categorised into external factors (i.e., the sociocultural, regulatory, business, and payer environment; organisational factors; and the learning/practice environment) and internal (individual) factors (i.e., healthcare role, personal factors, and skills and abilities). Therefore, this study explores the influence of the following factors on nurses' depressive symptoms: (1) internal (individual) features, including demographic characteristics (e.g., age, sex, and economic status) and internal personal features (e.g., recovery experience, resilience, and chronic fatigue); and (2) external features (i.e., environmental and organisational resources and work-demand-related factors), such as organisational support, career identity, and burnout.
Machine learning (ML) can learn complex patterns from training data and has clear advantages in building prediction models, which may help identify nurses' depressive symptoms. Most previous studies have employed traditional linear regression methods to analyse the relationship between independent variables and depressive symptoms [9][10][11]. ML can handle many related factors and mitigate multicollinearity between variables. Furthermore, ML can rank the contributions of the predictors and compare the effects of independent factors on the dependent variable [19]. Some studies have confirmed that ML models can effectively predict depression-related problems [20][21][22]. However, few studies have used ML methods to identify the factors influencing depressive symptoms among nurses [23], so a more detailed analysis is needed. To develop evidence-based interventions that reduce the impact of depressive symptoms among nurses, it is necessary to determine the specific contributions of external variables, such as work-related factors; internal personal factors, such as positive psychological resources; and demographic characteristics.
Consequently, the purpose of this study was to (1) detect nurses' depressive symptoms by applying ML and (2) identify the predictors of nurses' depressive symptoms and provide a research basis for reducing nurses' negative psychological outcomes, so as to improve their quality of life and well-being.

Design and Sample.
This study employed a multicentre cross-sectional design in Liaoning Province, Northeast China, from January 2022 to April 2022, during the phase of normalised COVID-19 prevention and control. Nurses were recruited by multistage proportional random sampling: two general hospitals were randomly selected from each of four cities in Liaoning Province (Shenyang, Fushun, Fuxin, and Liaoyang), and approximately 30.0% of the nurses in the clinical departments of each hospital were selected. In total, approximately 1,500 clinical nurses were selected from the 8 hospitals. A self-administered, anonymous smartphone questionnaire (via the Wenjuanxing platform) was delivered to the nurses to evaluate their depressive symptoms and related factors. After illogical answers and invalid responses were excluded, the valid response rate was 95.4%.

Participants.
This study included 1,500 clinical nurses aged 18 years and above who were working in the 8 selected general hospitals in Shenyang, Fushun, Fuxin, and Liaoyang and who could complete the online questionnaire via the Wenjuanxing platform. Nurses who had been diagnosed with or treated for a severe mental illness (e.g., bipolar disorder, schizoaffective psychosis, or paranoid psychosis) were excluded from this study.
In this study, the sample size was calculated with the formula for estimating a single proportion:

$$n = \frac{Z_{1-\alpha/2}^{2}\,\pi(1-\pi)}{d^{2}}$$

where π is the expected prevalence and d is the allowable error. The prevalence of depressive symptoms among nurses after the epidemic is around 30%, so π = 30% was used as the basis for estimating the sample size. To ensure accuracy, d = 3.5%, and for a 99% confidence level, α = 0.01 and Z = 2.58. Allowing for a 20% loss rate, the estimated sample size was 1,369. In total, about 1,500 questionnaires were collected; after unqualified questionnaires were excluded, 1,431 valid questionnaires remained, giving a valid response rate of 95.4% (1,431/1,500).
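The arithmetic behind the 1,369 figure can be checked with a short Python sketch. It uses the standard single-proportion formula n = Z²π(1 − π)/d² and assumes that the 20% loss is applied as a multiplicative inflation (×1.2) with rounding to the nearest integer at the end; the function name `sample_size_proportion` is hypothetical.

```python
def sample_size_proportion(p: float, d: float, z: float) -> float:
    """Sample size for estimating one proportion: n = z^2 * p * (1 - p) / d^2."""
    return z ** 2 * p * (1 - p) / d ** 2

# Inputs reported in the text: prevalence 30%, allowable error 3.5%, Z = 2.58.
n0 = sample_size_proportion(p=0.30, d=0.035, z=2.58)

# Assumption: the 20% loss is handled by inflating n0 by a factor of 1.2, then rounding.
n_required = round(n0 * 1.2)

print(f"base n = {n0:.1f}, inflated n = {n_required}")  # base n = 1141.1, inflated n = 1369
```

Rounding the base value up to 1,142 before inflating would give about 1,370, so the reported figure appears to round only after the inflation step.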

Data Preparation.
This study excluded survey responses that were logically inconsistent, leaving a total of 1,431 survey responses. The data were randomly divided into a training set (n = 1,144) and a test set (n = 287) in an 8:2 ratio. Figure 1 presents the detailed flowchart of this study.
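A minimal scikit-learn sketch of the 8:2 split, assuming `train_test_split` with `test_size=0.2`. The feature matrix and labels are random placeholders, and `random_state=42` is an arbitrary choice not taken from the study. With 1,431 rows, scikit-learn rounds the test share up (0.2 × 1,431 = 286.2 becomes 287), which reproduces the reported 1,144/287 sizes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1431, 10))    # placeholder feature matrix (hypothetical)
y = rng.integers(0, 2, size=1431)  # placeholder labels: 1 = PHQ-9 >= 10 (hypothetical)

# 8:2 random split; stratify=y could be added to preserve the class ratio (not stated in the study).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))  # 1144 287
```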

Variables.
This study used the Patient Health Questionnaire-9 (PHQ-9) [24] to measure nurses' depressive symptoms. The PHQ-9 is a commonly used nine-item measure of depressive symptoms based on the Diagnostic and Statistical Manual of Mental Disorders, with good sensitivity and specificity. A cut-off score of ≥10 on the PHQ-9 indicates the presence of depressive symptoms. In this study, Cronbach's α coefficient of the PHQ-9 was 0.924. A detailed description of the questionnaire is provided in the supplementary materials. This study categorised the predictors affecting the nurses' psychological health as internal and external features. Internal features included demographic characteristics (i.e., age, sex, marital status, income, and chronic disease) and internal personal features (i.e., coping styles, recovery experience, resilience, sleep quality, chronic fatigue, and perceived stress); external features included organisational support, career identity, and burnout.
To explore the factors influencing depressive symptoms among nurses accurately, this study simultaneously included both the total score of each scale and its dimension scores in the analytic models.
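The binary outcome and the reliability coefficient can be illustrated with a short NumPy sketch. It assumes standard PHQ-9 scoring (nine items scored 0 to 3 and summed to a 0 to 27 total) and the usual Cronbach's α formula, α = k/(k − 1) × (1 − Σ item variances / variance of the total). The function names and the three example respondents are hypothetical.

```python
import numpy as np

def phq9_total(items):
    """Sum the nine PHQ-9 item scores (each 0-3) per respondent; range 0-27."""
    items = np.asarray(items)
    if items.shape[1] != 9 or items.min() < 0 or items.max() > 3:
        raise ValueError("expected nine items scored 0-3")
    return items.sum(axis=1)

def has_depressive_symptoms(items, cutoff=10):
    """Binary outcome used in the study: PHQ-9 total >= 10."""
    return phq9_total(items) >= cutoff

def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

# Three hypothetical respondents.
responses = np.array([
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [2, 2, 2, 2, 2, 2, 2, 2, 2],
    [0, 0, 1, 0, 0, 1, 0, 0, 0],
])
print(phq9_total(responses))               # totals 9, 18 and 2
print(has_depressive_symptoms(responses))  # only the second respondent meets the cut-off
```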

Model Establishment and Performance Evaluation.
This study implemented the following three steps to identify the important features of depressive symptoms among nurses. First, the variables describing demographic characteristics were analysed using the chi-square test. The study then used maximum relevance minimum redundancy (mRMR) on the training set to initially screen the internal personal and external features. mRMR combines two criteria to evaluate the importance of features: it maximises the relevance between each feature and the outcome class while minimising the redundancy among the selected features. In this study, the relevance between the internal personal and external features and depressive symptoms was calculated with the chi-square test, and the redundancy among these features was calculated with Pearson's correlation coefficient. All internal personal and external features were then ranked by importance.
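The mRMR ranking described above can be sketched as a greedy loop. This is an assumed variant, not the authors' code: relevance is the chi-square statistic from `sklearn.feature_selection.chi2` (which requires non-negative inputs such as scale scores), rescaled to [0, 1] so that it is comparable with redundancy, defined here as the mean absolute Pearson correlation with already-selected features; the two are combined by difference. The function name `mrmr_rank` and the example feature names are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import chi2

def mrmr_rank(X, y, names):
    """Greedy mRMR (difference form): relevance minus mean redundancy."""
    X = np.asarray(X, dtype=float)
    relevance, _ = chi2(X, y)                          # chi-square statistic per feature vs. label
    relevance = relevance / relevance.max()            # assumption: rescale to [0, 1]
    redundancy = np.abs(np.corrcoef(X, rowvar=False))  # |Pearson r| between feature pairs

    remaining = list(range(X.shape[1]))
    order = [int(np.argmax(relevance))]                # start with the most relevant feature
    remaining.remove(order[0])
    while remaining:
        # Score each candidate: its relevance minus its mean redundancy with selected features.
        scores = [relevance[j] - redundancy[j, order].mean() for j in remaining]
        best = remaining[int(np.argmax(scores))]
        order.append(best)
        remaining.remove(best)
    return [names[j] for j in order]

# Hypothetical example: feature_a tracks the label, feature_a_copy nearly duplicates it.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=300)
X = rng.integers(0, 10, size=(300, 4)).astype(float)
X[:, 0] += 5 * y
X[:, 1] = X[:, 0] + rng.integers(0, 2, size=300)
names = ["feature_a", "feature_a_copy", "noise_1", "noise_2"]
print(mrmr_rank(X, y, names))
```

The redundancy term is what pushes a near-duplicate down the ranking even when its relevance alone is high.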
Subsequently, five ML algorithms were trained on the training set using a five-fold cross-validation strategy: k-nearest neighbours (KNN), Gaussian naive Bayes (GNB), support vector machine (SVM), random forest (RF), and logistic regression (LR). To determine the appropriate feature dimension, the internal personal and external features were added one at a time in order of their importance ranking to locate the optimal feature subset for identifying depressive symptoms. Based on the results of the five ML models, this study selected the smallest feature subset that yielded the largest area under the curve (AUC). The performance of these models was then evaluated on the test set. Finally, the significant demographic features in the training set were added to the optimal subset of internal personal and external features to establish combined models, whose performance was likewise evaluated on the test set. Furthermore, univariate LR was used to compare the abilities of the important demographic characteristics and of the optimal internal personal and external features to identify depressive symptoms, and these abilities were validated on the test set. The receiver operating characteristic (ROC) curves of the demographic characteristics and of the internal personal and external features were compared using the DeLong test.
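The incremental search over the importance ranking can be sketched with scikit-learn. Several details are assumptions rather than statements from the text: the five models' cross-validated AUCs are averaged to compare subset sizes, the folds are stratified and shuffled, and the `StandardScaler` steps and the helper name `search_feature_subset` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

MODELS = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "GNB": GaussianNB(),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "RF": RandomForestClassifier(random_state=0),
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

def search_feature_subset(X, y, ranked_idx, max_k=None):
    """Add features in ranked order; return the smallest k with the highest mean AUC."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    max_k = max_k or len(ranked_idx)
    mean_auc = {}
    for k in range(1, max_k + 1):
        cols = ranked_idx[:k]  # top-k features by importance
        aucs = [cross_val_score(m, X[:, cols], y, cv=cv, scoring="roc_auc").mean()
                for m in MODELS.values()]
        mean_auc[k] = float(np.mean(aucs))  # assumption: average AUC across the five models
    best = max(mean_auc.values())
    best_k = min(k for k, v in mean_auc.items() if np.isclose(v, best))
    return best_k, mean_auc
```

Taking the minimum k among tied maxima implements the "smallest subset with the largest AUC" rule; `ranked_idx` would be the column indices in mRMR order.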
To obtain the best predictive performance, several interacting hyperparameters needed to be tuned. The radial basis function was used as the kernel of the SVM model, and a cross-validated grid search was applied to find the best SVM hyperparameters γ (from 1e−03 to 1e+03, 12 values) and C (from 1e−04 to 1e+04, 12 values). A random search was used to find the best RF hyperparameters and avoid overfitting, including the number of estimators (from 80 to 120, 5 values), the maximum depth of the tree (from 2 to 8, 3 values), the minimum number of samples required to split a node (3, 5, or 10), and the minimum number of samples per leaf node (5, 10, or 15). Stepwise LR was used to analyse features related to depressive symptoms. The number of neighbours in the KNN model was set to 3, 5, 7, or 9. The GNB classifier was built with default parameters. The hyperparameter tuning results are provided in the supplementary materials, Table S1.
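The search spaces listed above map naturally onto scikit-learn search objects. In this sketch the SVM grid uses `np.logspace`, which assumes the 12 values are log-spaced; the `StandardScaler` steps, `cv=5`, ROC AUC scoring, and `n_iter=10` for the random search are assumptions, since the text does not state them. The helper name `tune` is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

svm_grid = {
    "svc__gamma": np.logspace(-3, 3, 12),  # 1e-03 ... 1e+03, 12 values
    "svc__C": np.logspace(-4, 4, 12),      # 1e-04 ... 1e+04, 12 values
}
rf_space = {
    "n_estimators": np.linspace(80, 120, 5, dtype=int),  # 80, 90, 100, 110, 120
    "max_depth": [2, 5, 8],                               # 2 to 8, 3 values
    "min_samples_split": [3, 5, 10],
    "min_samples_leaf": [5, 10, 15],
}
knn_grid = {"kneighborsclassifier__n_neighbors": [3, 5, 7, 9]}

def tune(X, y, n_iter=10, seed=0):
    """Exhaustive grid for SVM and KNN, random search for RF (n_iter is assumed)."""
    svm = GridSearchCV(make_pipeline(StandardScaler(), SVC(kernel="rbf")),
                       svm_grid, cv=5, scoring="roc_auc").fit(X, y)
    rf = RandomizedSearchCV(RandomForestClassifier(random_state=seed), rf_space,
                            n_iter=n_iter, cv=5, scoring="roc_auc",
                            random_state=seed).fit(X, y)
    knn = GridSearchCV(make_pipeline(StandardScaler(), KNeighborsClassifier()),
                       knn_grid, cv=5, scoring="roc_auc").fit(X, y)
    return svm, rf, knn
```

The random search samples only `n_iter` of the 135 RF combinations, which keeps tuning cheap while the shallow depths and large leaf sizes restrain overfitting.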
Data Analysis.
All statistical analyses were performed using Python (version 3.7.3). The level of statistical significance was set at P < 0.05. The stepwise LR used the Statsmodels package, and all other ML methods used the scikit-learn package. The basic configuration of the computer used for statistical analysis was as follows: CPU, Intel(R) Core(TM) i7-9700; RAM, 32 GB; and operating system, Windows 10.

Ethical Statement.
This study was implemented in accordance with the Helsinki Declaration (1989) and was approved by the Ethics Committee of China Medical University (ID: 2020048). Participation was voluntary and anonymous. The participants were well informed of the aims and contents of this study and provided signed informed consent before the survey.

Table 1 shows that of the 1,431 nurses, 456 (31.9%) have depressive symptoms. The nurses' ages range from 18 to 57 years. The majority are female (97.8%) and married (78.1%). Approximately 64.5% receive an income of 3,000-6,000 yuan, and 23.5% suffer from a chronic disease.

Demographic Characteristics.
In the training set, the prevalence of depressive symptoms differs significantly by income, marital status, and chronic disease, which may indicate that unmarried nurses, nurses with low incomes, and nurses with chronic diseases tend to suffer more often from depressive symptoms.

Table 2 shows the importance ranking of the internal personal and external features obtained from mRMR. Each model reaches its largest AUC when the top five features are included. Ultimately, sleep quality, chronic fatigue, exhaustion, physical fatigue, and organisational support make up the optimal feature subset for distinguishing nurses with depressive symptoms from those without. Table 3 shows the evaluation indicators of the five models based on the optimal internal personal and external features. Each model shows good predictive performance on the test set, indicating that internal personal and external features can predict nurses' depressive symptoms.

Combined Models Establishment.
After income, chronic disease, and marital status are added to the optimal feature subset, the performance of the five combined models improves slightly, with the AUC and accuracy on the test set reaching 0.904 and 0.826, respectively. Compared with the other models, the LR model is more stable. Notably, only six features are included in the LR model; marital status and physical fatigue are excluded. These results suggest that sleep quality, chronic fatigue, exhaustion, organisational support, income, and chronic disease are the most important features for identifying nurses' depressive symptoms. The P values of the variables included in the LR model are shown in the supplementary materials, Table S2. Table 4 shows the evaluation indicators of the five combined models on the training and test sets, and Figure 2 shows the ROC curves of the combined models. Figure 3 shows the ROC curves from the univariate analysis on the test set. The AUCs of the optimal internal personal and external features are much higher than those of the demographic characteristics, and chronic fatigue has the best predictive performance. According to the DeLong test, the ROC curves of organisational support and the demographic characteristics do not differ significantly (P = 0.068 to 0.237), whereas the ROC curves of the other features differ significantly from those of the demographic characteristics (P < 0.001 to P = 0.016). These results suggest that internal personal and external features are more effective than demographic characteristics for predicting nurses' depressive symptoms.
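scikit-learn has no DeLong test, so the comparison of two correlated ROC curves reported above is sketched here from DeLong et al.'s structural-component formulation. This is an assumed, minimal implementation rather than the software the study used; the function name `delong_test` and the simulated scores are hypothetical. Both score vectors must come from the same test-set nurses, and the returned P value is two-sided.

```python
import numpy as np
from scipy.stats import norm
from sklearn.metrics import roc_auc_score

def _placements(pos, neg):
    """Kernel psi = 1 if pos > neg, 0.5 if tied, 0 otherwise; return V10 and V01."""
    psi = (pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :])
    return psi.mean(axis=1), psi.mean(axis=0)  # per-positive, per-negative components

def delong_test(y_true, score_a, score_b):
    """Two-sided DeLong test for the difference between two correlated AUCs."""
    y_true = np.asarray(y_true).astype(bool)
    v10, v01, aucs = [], [], []
    for s in (np.asarray(score_a, float), np.asarray(score_b, float)):
        a, b = _placements(s[y_true], s[~y_true])
        v10.append(a)
        v01.append(b)
        aucs.append(a.mean())                  # AUC equals the mean of V10
    m, n = len(v10[0]), len(v01[0])
    cov = np.cov(np.vstack(v10)) / m + np.cov(np.vstack(v01)) / n  # 2x2 AUC covariance
    var_diff = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]
    z = (aucs[0] - aucs[1]) / np.sqrt(var_diff)
    return aucs[0], aucs[1], 2 * norm.sf(abs(z))

# Hypothetical scores for 287 test-set cases.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=287)
score_a = y + rng.normal(scale=0.8, size=287)  # stronger predictor
score_b = y + rng.normal(scale=2.0, size=287)  # weaker predictor
auc_a, auc_b, p = delong_test(y, score_a, score_b)
print(auc_a, roc_auc_score(y, score_a), p)     # the first two values should agree
```

Because both curves are estimated on the same nurses, the covariance term matters; treating the AUCs as independent would overstate the variance of their difference.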

Discussion
To the best of our knowledge, this study is the first to use ML to predict the risk factors of depressive symptoms among nurses in Northeast China. This study found that nurses had severe depressive symptoms, with a prevalence of 31.9%. This prevalence was much higher than that reported in a study conducted in Iran (17.8%) [30] and in Li's study, which found that 26.2% of nurses suffered from depression during COVID-19 isolation [31]. However, it was slightly lower than that reported in a study conducted in Sichuan Province and Wuhan City, in which 34.3% of nurses had depressive symptoms at the outbreak of COVID-19 [8], and well below the 55.0% rate among frontline nurses at the beginning of the COVID-19 pandemic [32]. As a high-stress group, nurses are prone to negative emotions such as depression owing to the specific demands and limitations of their work environment, which in turn affect nurses' mental health and work quality. During the COVID-19 pandemic, nurses not only undertook heavy work tasks due to unprecedented workloads but also faced the risk of infection, which could have triggered complex psychological stress responses and resulted in the development of depressive symptoms [32]. This study used five ML methods to identify the most important predictors of depressive symptoms among nurses and achieved good predictive results: the LR model performed best, with an AUC of 0.904 on the test set, while the KNN model performed worst, although its AUC was still 0.871. These results exceeded those of Zhou et al., who used four ML models to predict nurses' depressive symptoms during the COVID-19 pandemic in China (AUCs: 0.785-0.829) [23]. Therefore, ML methods are feasible for predicting depressive symptoms among nurses, and the proposed predictive factors are reliable.
Most previous research using ML to predict nurses' mental health has focused on clinical characteristics or workplace factors, highlighting health-related predictors and the importance of optimising workplaces [23,33]. This study examined nurses' positive psychological resources, coping styles, organisational support resources, and career identity as predictors of depressive symptoms; these predictors can inform efforts to improve nurses' psychological health. Overall, more attention should be paid to nurses' depressive symptoms, and effective measures should be taken to optimise their mental health status.
This study found that ML was an effective tool for identifying the most meaningful and distinctive predictors of depressive symptoms among Chinese nurses. A previous study by Havaei et al. used ML algorithms to predict the impact of work-related factors on nurses' mental health, with significantly beneficial results [33]. In the current study, the nurses' demographic characteristics, including marital status, income, and chronic disease, were associated with their depressive symptoms, which concurs with the results of previous studies [34]. More importantly, in the current study's ML models, the internal personal and external features explained depressive symptoms among nurses better than the demographic characteristics did. Moreover, five key internal personal and external features significantly predicted depressive symptoms: sleep disturbance, exhaustion due to burnout, chronic fatigue, physical fatigue, and perceived organisational support. Among the five depression prediction models, the LR model performed best, and sleep quality was the strongest predictor, although physical fatigue was excluded from the final LR model.

Journal of Nursing Management
The results also showed that poor sleep quality, exhaustion due to burnout, chronic fatigue, and physical fatigue were positively associated with depressive symptoms. Perceived organisational support acted as a protective factor and was negatively associated with depressive symptoms, which is consistent with previous studies [12][35][36][37]. Compared with traditional LR, ML algorithms can mitigate the multicollinearity between variables and screen out more effective predictors [38]. This study differs from previous studies that employed ML algorithms in that it incorporated the total score of each scale into the prediction model together with the scale's dimension scores, in order to capture both the distinctive contribution of each dimension and the total effect of the overall factor [33,39]. Dimension scores and total scores are not interchangeable: a high score on one dimension of a feature may, but need not, coincide with a high total score. Therefore, this study simultaneously considered both the dimension scores and the total effect of each feature on depressive symptoms.
Sleep quality was a critical predictor of nurses' depressive symptoms. This agrees with previous studies showing that sleep disorders impair an individual's ability to handle work tasks, which in turn harms job performance, reduces work productivity, and ultimately affects the quality of care, thereby increasing the risk of depression [12]. Research has also confirmed a strong association between sleep quality and depression [40]. Sleep disturbances may cause changes in epigenetic characteristics, personality, and neurobiological functioning, which are typical risk factors for depression. Moreover, sleep disorders are accompanied by fatigue, which may in turn lead to mental impairments such as depressive symptoms.
In this study, both the chronic and physical fatigue dimensions were crucial predictors of depressive symptoms. Fatigue is a common problem among healthcare professionals, especially during the COVID-19 pandemic in China. Chronic and physical fatigue can impair an individual's recovery and vigour to varying degrees and can disrupt the harmonious balance between family and work [41]. Chronic fatigue caused by a depleted ability to restore physical and mental health could increase an individual's susceptibility to developing depressive symptoms [42]. Moreover, the longer fatigue lasts and the more intense it is, the more it can impair daily activities and the duties inherent in various professional and social roles. Depressive symptoms could be further aggravated by chronic fatigue or physical exhaustion in stressful work environments. Some scholars have argued that fatigue and depressive symptoms overlap to a high degree, while recent studies have suggested that chronic fatigue can be treated as an independent predictor of depressive symptoms. Therefore, fatigue should be managed to optimise nurses' mental health status [43].
This study found a strong association between exhaustion and depressive symptoms: the greater the level of burnout the nurses experienced, the more depressive symptoms they reported. Burnout among nurses has received much attention and has been identified as one of the occupational risks for nurses owing to its high prevalence [44]. Many studies have found that all three dimensions of burnout are significantly associated with depression [45], whereas the current study found that only exhaustion due to burnout was a powerful predictor of depressive symptoms. Meanwhile, a study conducted in a Brazilian hospital noted that assigning too many tasks to a nurse could lead to a faster work pace and less time to complete each task. Moreover, nurses must deal with many patients and their family members, which could increase their vulnerability to stress-related disorders [46] and exacerbate depressive symptoms.
In addition, perceived organisational support was negatively correlated with depressive symptoms, as found in other studies [14,15]. Nurses with a high level of perceived organisational support were less likely to experience depressive symptoms. This might be because nurses who perceive organisational support and feel that their work contributions are valued are more likely to adopt an optimistic attitude towards work, which positively affects their mental health [47,48]. Previous studies have also shown that perceived organisational support can relieve depressive symptoms in other populations, such as police officers [37,49]. The less support nurses perceived from their organisations, the more severe the depressive symptoms they reported. Supportive organisations can help nurses cope with stressful work tasks, reducing the incidence of depressive symptoms. Organisational support helps nurses cope appropriately with their working environment and can thus prevent depressive symptoms [50]. Nowadays, most hospitals underestimate the mental health demands placed on nurses and the supportive strategies needed to foster psychological health, resulting in widely varying levels of organisational support across healthcare settings [51]. Organisational support interventions should facilitate communication between nurses and the hospital's supervisors and leaders and increase nurses' participation in decision-making, strengthening the social support that helps them combat depressive symptoms. At the organisational level, hospitals should reduce the pace and workload of nursing work and increase staffing to ease work overload and burnout. Medical institutions should develop strategies to enhance organisational support, which will help prevent depressive symptoms in the nursing population and ultimately improve the quality of care for patients.
This study has several limitations. First, it did not comprehensively consider other factors associated with depressive symptoms among nurses, such as fear of COVID-19, the stage of the COVID-19 pandemic, work organisation, and work conditions including work climate. Nevertheless, its results may serve as a reference for future research on psychological interventions at the individual and organisational levels. Second, because this study used a cross-sectional design, causal relationships between variables could not be confirmed; further longitudinal studies are needed. Third, the results could be further optimised by using other efficient ML methods. Finally, the study was conducted in Northeast China, which may limit the generalisability of the results to other national and organisational contexts.

Conclusion
This study found that Chinese nurses in clinical settings suffered from severe depressive symptoms and that ML is a feasible approach for identifying the predictors of depressive symptoms. Specifically, the LR model successfully captured high-dimensional information on the risk factors of depression among nurses. Sleep disturbance, exhaustion due to burnout, chronic fatigue, and physical fatigue were important predictors of depressive symptoms, whereas organisational support could relieve depressive symptoms. To prevent depressive symptoms among nurses, this study suggests prioritising interventions that improve the management of fatigue, sleep quality, and exhaustion due to burnout, and emphasising organisational support to enhance nurses' work-life balance.

Implications for Nursing Management
By using ML to predict depressive symptoms in nurses, this multicentre cross-sectional study can help nursing managers identify psychological problems in nurses. ML prediction is a promising strategy for predicting depressive symptoms in Chinese nurses, and it provides evidence for recommendations on preventing depressive symptoms. Based on internal personal and external variables, the ML prediction models showed high validity and effectiveness, supporting the identification of nurses' mental health problems. Moreover, improving nurses' mental health contributes to the quality of care and to patient satisfaction with healthcare services, and plays an important role in strengthening nursing care and creating value for the healthcare system.

Data Availability
The datasets used during the study are available from the corresponding author upon reasonable request.

Additional Points
Implications for Nursing Management. ML is a promising strategy that can predict depressive symptoms and make evidence-based recommendations for interventions to prevent depressive symptoms among Chinese nurses.

Ethical Approval
This study was approved by the Ethics Committee of China Medical University (Ref no. 2020048) and conducted in line with the principles of the Helsinki Declaration.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Authors' Contributions
Shu Li contributed to the acquisition and analysis of data, drafting, and the revision of the manuscript. Kristin K. Sznajder contributed to revision of the manuscript and provided English edits. Lingfang Ning, Hong Gao, Xinyue Xie, Shuo Liu, Chunyu Shao, and Xinru Li contributed to the acquisition and analysis of data. Xiaoshi Yang was responsible for the conception, design, drafting, and the revision of the manuscript.