Incidence and Complications of Atrial Fibrillation in a Low Socioeconomic and High Disability United States (US) Population: A Combined Statistical and Machine Learning Approach

Background Poor socioeconomic status coupled with individual disability is significantly associated with incident atrial fibrillation (AF) and AF-related adverse outcomes, with the information currently lacking for US cohorts. We examined AF incidence/complications and the dynamic nature of associated risk factors in a large socially disadvantaged US population. Methods A large population representing a combined poor socioeconomic status/disability (Medicaid program) was examined from diverse geographical regions across the US continent. The target population was extracted from administrative databases with patients possessing medical/pharmacy benefits. This retrospective cohort study was conducted from Jan 1, 2016, to Sep 30, 2021, and was limited to 18- to 80-year age group drawn from the Medicaid program. Descriptive and inferential statistics (parametric: logistic regression and neural network) were applied to all computations using a combined statistical and machine learning (ML) approach. Results A total of 617413 individuals participated in the study, with mean age of 41.7 years (standard deviation “SD” 15.2) and 65.6% female patients. Seven distinct groups were identified with different combinations of low socioeconomic status and disability constraints. The overall crude AF incidence rate was 0.49 cases/100 person-years (95% confidence limit “CI” 0.40–0.58), with the lowest rate for the younger group (temporary assistance for needy family “TANF”) (0.20, 95%CI 0.18–0.21), the highest rates for the older groups (age, blindness, or disability “ABD” duals—1.51, 95% CI 1.31–1.58; long-term services and support “LTSS” duals—1.45, 95% CI 1.31–1.58), and the remaining four other groups in between the lower and upper rates. 
Based on independent effects after accounting for confounders in main effect modeling, the point estimates of odds ratios for AF status with various clinical outcomes were as follows: stroke (2.69, 95% CI 2.53–2.85); heart failure (6.18, 95% CI 5.86–6.52); myocardial infarction (3.71, 95% CI 3.49–3.94); major bleeding (2.26, 95% CI 2.14–2.38); and cognitive impairment (1.74, 95% CI 1.59–1.91). A logistic regression-based ML model produced excellent discriminant validity for high-risk AF outcomes (c “concordance” index based on training data 0.91, 95% CI 0.891–0.929), together with similar measures for external validity, calibration, and clinical utility. The performance measures for the ML models predicting associated complications with high-risk AF cases were good to excellent. Conclusions A combination of low socioeconomic status and disability contributes to AF incidence and complications, elevating risks to higher levels relative to the general population. ML algorithms can be used to identify AF patients at high risk of clinical events. While further research is needed on this socially important issue, the reported investigation is unique in that it captures the general case: the US population brings together different ethnic groups from around the world under a shared culture.


Introduction
Atrial fibrillation (AF) is an evolving medical condition that is fueled in most cases by cardiovascular and/or noncardiovascular multimorbidity. [1,2] If not properly treated, AF patients are at a high risk of stroke and other AF-related complications related to dynamic interactions with age, gender, and associated comorbidities [3].
In addition to the traditional cardiovascular risk factors, poor socioeconomic status coupled with individual disability is significantly associated with adverse AF outcomes. [4,5] Yet, there is little information in the published literature from the US. [6] A recent study [7] in the US indicated that individuals with household income < $40,000 had the greatest risk for heart failure (HR "hazard ratio" 1.17; 95% CI 1.05 to 1.30) and MI (HR 1.18; 95% CI 0.98 to 1.41) relative to those with income ≥ $100,000.
In the US, Medicaid is a program financed by the federal and state governments that covers multiple groups with varying characteristics; its enrollees are centered around the poverty line, and several groups have varied degrees of cognitive and/or physical disability. To date, no study has examined these different groups within the Medicaid space with respect to AF incidence and associated complications.
This investigation was initiated to examine this issue in greater detail, with an emphasis on incident AF and AF-related complications, as well as the dynamic nature of associated risk factors in a large socially disadvantaged Medicaid population spread across several geographical areas of the US.
Our specific aims are as follows: (i) to report the incidence of AF in Medicaid recipients making up seven distinct groups of low socioeconomic status and disability; and (ii) to use a combined statistical and machine learning (ML) approach to examine and predict AF incidence and AF-related complications (i.e., stroke, congestive heart failure, myocardial infarction, major bleeding, and cognitive impairment) as a function of different Medicaid groups and their comorbid/demographic profiles.

Data Sources and Eligibility Criteria for Poor or Socially Disadvantaged Socioeconomic Status.
The patients were drawn from the US Medicaid program financed by both the federal and state governments. The program aims to provide healthcare coverage for the adult population ranging in age from 18 to 90 years and characterized by low socioeconomic status coupled with disability. The data were accessed from administrative medical and pharmacy claim databases that are subject to US privacy laws.
This retrospective cohort study was conducted from Jan 1, 2016, to Sep 30, 2021, with patients enrolled in both medical and pharmacy benefits. This population was subjected to a number of inclusions and exclusions, the most notable being continuous enrollment for a minimum of 30 months. This requirement allows the investigation of incident AF cases, which require the absence of claims with AF ICD ("International Classification of Diseases") 10 codes in the medical databases for at least 24 months.
Our approach was in line with the methods determined by Piccini et al. [8] and Lip et al. [2] According to the researchers, to avoid classifying patients with prevalent AF as incident cases, an individual was considered to have incident AF only if the diagnosis occurred after at least 2 years of enrollment in the health plan with no AF diagnosis. Tu et al. [9] and Lip et al. [2] found that adding requirements based on pharmacy data would increase the sensitivity and specificity of identifying incident AF patients from administrative databases. The researchers reported the added requirements of the absence of anticoagulant use and heart rhythm control subject to the proper exclusions (see suppl Tables S1 and S2).
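The washout rule described above can be illustrated with a minimal sketch. It assumes only what the text states (a first AF claim counts as incident when it follows at least 24 months of AF-free enrollment); the function names and the month-counting convention are hypothetical, and the pharmacy-based exclusions from the supplementary tables are not modeled.

```python
from datetime import date

# From the text: at least 24 months of enrollment with no AF claim.
WASHOUT_MONTHS = 24


def months_between(start: date, end: date) -> int:
    """Whole calendar months elapsed from start to end (hypothetical helper)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the last month is not yet complete
    return months


def is_incident_af(enrollment_start: date, af_claim_dates: list[date]) -> bool:
    """True if the earliest AF claim follows the AF-free washout period.

    Assumes af_claim_dates holds every AF-coded claim seen during
    continuous enrollment; an empty list means no AF at all.
    """
    if not af_claim_dates:
        return False
    first_af = min(af_claim_dates)
    return months_between(enrollment_start, first_af) >= WASHOUT_MONTHS
```

Under this sketch, a member enrolled on Jan 1, 2016 with a first AF claim in March 2018 would count as an incident case, whereas a first claim in June 2017 would be treated as possibly prevalent and excluded.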

Definition of Poor Economic Status/Socially Disadvantaged for Eligibility into Medicaid Programs and Its Group Categories.
Eligibility into Medicaid programs is based on financial requirements and/or medical needs. The upper limit for financial criteria is based on a percentage of the Federal Poverty Line (FPL), which is considered a measure of the minimum household income developed yearly by the US government. For 2021, the federal government set the income limit to qualify for Medicaid at 138% of the FPL depending on the family size. For example, a two-person household would be eligible for government assistance programs if its income is at or below 138% of $17,420, that is, $24,040 annually or $2,003 per month. For a three-member household, the upper limit is $2,525 per month. The medically needy program varies from state to state and is designed for individuals with significant health needs whose income is higher than the income limit requirements to otherwise qualify for Medicaid under income eligibility groups. As such, the main criteria for eligibility for Medicaid are based on financial and medical needs and are reserved for individuals with poor economic status and critical medical needs.
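The income-limit arithmetic in the example above can be checked with a short sketch. The 138% multiplier and the two-person FPL of $17,420 come from the text; the three-person FPL of $21,960 is an assumption not stated in the text, chosen because it reproduces the quoted $2,525 monthly limit. The function name is hypothetical.

```python
# From the text: the 2021 Medicaid income limit is 138% of the FPL.
MEDICAID_FPL_FRACTION = 1.38


def medicaid_income_limits(fpl_annual: float) -> tuple[float, float]:
    """Return the (annual, monthly) income limit for a given annual FPL."""
    annual_limit = fpl_annual * MEDICAID_FPL_FRACTION
    return annual_limit, annual_limit / 12


# Two-person household, FPL taken from the text.
annual_2, monthly_2 = medicaid_income_limits(17_420)

# Three-person household; the FPL value here is an assumption.
annual_3, monthly_3 = medicaid_income_limits(21_960)

print(round(annual_2), round(monthly_2), round(monthly_3))
```

Rounded to whole dollars, the two-person limits come out at $24,040 per year and $2,003 per month, matching the figures quoted in the paragraph.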

The Medicaid Program Has Several Groups, Including
(i) Temporary assistance for needy families (TANF): This program, which is time limited, assists families with children when the parents or other responsible relatives cannot provide for the family's basic needs.
[...] recipients with the exception of a subset group of patients who are termed as dual eligible with benefits paid for by both Medicaid and Medicare (each patient is designated as primary Medicare and secondary Medicaid).
(iv) Long-term services and support (LTSS): It is a diverse group, extending from young to old adult population, with many different types of physical and cognitive disabilities and receiving (a) institutional care and (b) home- and community-based services. They often receive services and support for many years, or even decades, and often have complex conditions and high needs. Thus, they are among Medicaid's most expensive beneficiaries. Similar to the ABD program, individuals in the LTSS group also have a subset of a dual-eligible population.
With the above in mind, the TANF, family care, nondual ABD, or nondual LTSS programs are covered by Medicaid. On the other hand, individuals in the dual ABD and LTSS programs are covered by both Medicaid and Medicare.

Variable Definition.
The index date for an incident AF case was defined as the date of the first medical claim with an AF ICD 10 code, as explained above. The incidence of any adverse clinical outcome (i.e., heart failure, stroke, myocardial infarction, major bleeding, or cognitive impairment) was identified as the first case occurring at least 30 days after the AF index date and before the end of the study period (Sep 30, 2021) (see suppl. Table S3 for the definition of outcomes). Patients were censored for each of the five adverse clinical outcomes. Clinical outcomes were treated as binary variables with 1 for the presence of a condition and 0 for its absence. The presence or absence of AF was also treated as a binary variable and was defined in a similar way. The list of comorbid conditions was identified during a baseline period of 2 years preceding the AF index date. The clinical outcomes and baseline comorbid conditions were identified from medical claims using primary and/or secondary diagnoses, as summarized in suppl Table S3 (for ICD 10 codes). Each comorbid condition was treated as a binary variable, with 1 for condition presence and 0 for its absence.
Demographic variables included gender and age. Gender was used as a binary variable, with females as 1 and males as 0 (or reference group). Age was defined as a continuous (in years) as well as a categorical (i.e., nominal variable consisting of multiple levels) variable, with several groups (18-44 years being the reference group or 0; 45-54 years or 1; 55-64 years or 2; 65-74 years or 3; and 75-90 years or 4).
Medicaid group type was categorized into seven groups (TANF or 0; family care or 1; any two categories enrolled at different times by the same patient or 2; ABD nondual or 3; LTSS nondual or 4; ABD duals or 5; and LTSS duals or 6). The TANF group was the reference category and the Medicaid group type was treated as a nominal variable, with multiple levels including the reference group.
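The coding scheme for the nominal variables can be sketched as follows. The level codes and age bands are those listed in the text; the dictionary keys, function names, and the use of reference-level dummy coding are illustrative assumptions, since the text does not state how the software expanded nominal variables internally.

```python
# Age bands and codes as listed in the text (18-44 is the reference group).
AGE_GROUPS = [(18, 44, 0), (45, 54, 1), (55, 64, 2), (65, 74, 3), (75, 90, 4)]

# Medicaid group codes as listed in the text (TANF is the reference group).
# The key strings are hypothetical labels.
MEDICAID_GROUP_CODES = {
    "TANF": 0,
    "family care": 1,
    "two categories": 2,
    "ABD nondual": 3,
    "LTSS nondual": 4,
    "ABD duals": 5,
    "LTSS duals": 6,
}


def age_group_code(age: int) -> int:
    """Map age in years to its categorical code."""
    for low, high, code in AGE_GROUPS:
        if low <= age <= high:
            return code
    raise ValueError(f"age {age} is outside the 18-90 study range")


def dummy_code(code: int, n_levels: int) -> list[int]:
    """Reference-level dummy coding: level 0 maps to all zeros."""
    return [1 if code == level else 0 for level in range(1, n_levels)]
```

For instance, `dummy_code(MEDICAID_GROUP_CODES["ABD nondual"], 7)` yields six indicators with a single 1 in the third position, while the TANF reference group yields all zeros.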
Two multimorbid indices were defined in this study: the first as the sum of comorbid conditions (i.e., the sum of all 1s when a chronic condition is present), and the second as the sum of all comorbid conditions and age group as a nominal variable. The two multimorbid indices included both cardiovascular and noncardiovascular multimorbidity. The CHADS2, [10] CHA2DS2_VASc, [11] and C2HEST [12] clinical rules were used as originally defined in the literature.
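A brief sketch of these scores follows. The first index is a plain count of binary comorbidity flags, as described. The second index is shown under the assumption that the age-group code (0 to 4) is simply added to that count; the text does not spell out the combination rule. The CHA2DS2-VASc function uses the standard published point weights; all function and parameter names are hypothetical.

```python
def multimorbidity_index(comorbid_flags: dict[str, int]) -> int:
    """First index: number of comorbid conditions coded 1."""
    return sum(comorbid_flags.values())


def multimorbidity_index_with_age(comorbid_flags: dict[str, int],
                                  age_group_code: int) -> int:
    """Second index, ASSUMED to add the age-group code to the count."""
    return multimorbidity_index(comorbid_flags) + age_group_code


def cha2ds2_vasc(chf: bool, hypertension: bool, age: int, diabetes: bool,
                 stroke_tia: bool, vascular_disease: bool, female: bool) -> int:
    """CHA2DS2-VASc score with the standard published weights."""
    score = 0
    score += 1 if chf else 0               # C: congestive heart failure
    score += 1 if hypertension else 0      # H: hypertension
    if age >= 75:                          # A2: age 75 or older
        score += 2
    elif age >= 65:                        # A: age 65-74
        score += 1
    score += 1 if diabetes else 0          # D: diabetes mellitus
    score += 2 if stroke_tia else 0        # S2: prior stroke/TIA/thromboembolism
    score += 1 if vascular_disease else 0  # V: vascular disease
    score += 1 if female else 0            # Sc: sex category (female)
    return score
```

The maximum CHA2DS2-VASc score under these weights is 9, since the two age criteria are mutually exclusive.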

Quantitative Analyses.
The analyses included both descriptive and inferential computations. The SAS Enterprise Software was used for all descriptive computations and main effect modeling using logistic regression. Machine learning computations were performed using parametric methods (i.e., logistic regression and neural networking) of the SAS Miner Software [13,14]. The details of quantitative analyses are provided in Supplementary Materials. In particular, administrative databases carry limited information on the severity of comorbid conditions and clinical outcomes. This limitation was partly addressed by using a cost threshold, on the assumption that any clinical condition exceeding the threshold is economically costlier and hence more severe. Therefore, a binary variable was created with 1 representing high cost (hence, severe cases) and 0 representing lower cost (thus, mild cases). The cost was based on the total allowed amount (paid by the insurance company as well as the deduction paid by the patient in a given health plan) for the year prior to the AF index date. For example, any member with a total annual cost of $2000 or more in the year before entering the study (as defined by the index date or the equivalent) was considered a severe case; otherwise, the member was considered a nonsevere case (coded 0). More details are provided in Appendix S1.
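The cost-based severity proxy reduces to a single comparison, sketched below. The $2000 example threshold and the "or more" rule come from the text; the function and parameter names are hypothetical.

```python
# Example threshold from the text: total allowed cost in the year
# before the index date (or its equivalent for non-AF members).
COST_THRESHOLD_USD = 2000.0


def severity_flag(total_allowed_prior_year: float,
                  threshold: float = COST_THRESHOLD_USD) -> int:
    """Return 1 (severe proxy) if prior-year cost meets the threshold, else 0."""
    return int(total_allowed_prior_year >= threshold)
```

Passing a different `threshold` (for example 5000.0) gives the stricter definition of higher-risk cases used later in the paper.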

AF Status Outcome and AF Complication Outcomes Using Main Effect/ML Modeling.
With the care cost threshold introduced as a model feature (i.e., an input variable defined as a risk factor if the total annual cost prior to the index date for AF cases, or the equivalent for non-AF cases, exceeds the cost threshold of $2000), all comorbid conditions were significant risk factors for incident AF events, with the exception of cognitive impairment, which was not statistically significant, while lipid disorders, metabolic syndrome, and asthma were protective factors (Table 3). Males were at a higher risk of AF incidence than females, and risk increased with advancing age. In general, strong risk factors (≥50% higher risk for AF incidence relative to its absence) were congestive heart failure, hypertension, valvular disease, chronic obstructive pulmonary disease, cost threshold, age group, and Medicaid group type. The C index was 0.822. The ML models demonstrated better discriminant validity relative to main effect modeling. The two parametric methods employed showed comparable results for both [...] (Figure S1). The true value of ML models lies in their nonlinear associations with the outcomes, including the two-way interactions (suppl Table S4). The point estimates of odds ratios for AF status with various clinical outcomes were as follows: stroke (2.69, 95% CI 2.53-2.85); heart failure (6.18, 95% CI 5.86-6.52); myocardial infarction (3.71, 95% CI 3.49-3.94); major bleeding (2.26, 95% CI 2.14-2.38); and cognitive impairment (1.74, 95% CI 1.59-1.91) (Table 4).
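For readers less familiar with how odds ratios arise from logistic regression, the following sketch shows the usual conversion from a fitted coefficient and its standard error to an odds ratio with a Wald 95% confidence interval. This is the textbook relationship, not a reproduction of the SAS procedure used in the study; the function names are hypothetical, and the standard error is back-calculated from the reported interval purely for illustration.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def odds_ratio_ci(beta: float, se: float) -> tuple[float, float, float]:
    """Odds ratio and Wald 95% CI from a logistic coefficient and its SE."""
    return (math.exp(beta),
            math.exp(beta - Z_95 * se),
            math.exp(beta + Z_95 * se))


def se_from_ci(lower: float, upper: float) -> float:
    """Back out the log-scale SE implied by a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


# Illustration with the stroke estimate reported in the text.
beta_stroke = math.log(2.69)
se_stroke = se_from_ci(2.53, 2.85)
or_stroke, lo_stroke, hi_stroke = odds_ratio_ci(beta_stroke, se_stroke)
```

The round trip recovers the point estimate exactly and the interval bounds to within rounding, which is consistent with the reported intervals being approximately symmetric on the log scale.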

ML Modeling of Higher Risk AF Incidence and Associated Complications.
Table 5 shows the c index values for the ML-based models for higher-risk AF incidence (defined by condition presence and a cost threshold of at least $5000 in total care cost in the year prior to the index date) and the associated adverse clinical outcomes. The c index values were good to excellent (0.82-0.92 for all outcomes, except for major bleeding, which was about average at 0.72) for both the training and validation samples.
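For a binary outcome, the c index is the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, with ties counted as one half. A minimal pairwise sketch of that definition is given below; it is meant only to make the metric concrete, is not the routine used by the study software, and its quadratic loop would be too slow for a cohort of this size.

```python
def c_index(y_true: list[int], y_score: list[float]) -> float:
    """Concordance index for a binary outcome (equals the ROC AUC)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0   # case ranked above non-case
            elif p == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(pos) * len(neg))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the 0.72 to 0.92 values above should be read.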
The areas under the curve and the calibration curves for the external validation samples were good (suppl Figure S2). The cumulative lift values were good (Table 5). For example, targeting the top 10% of the high-risk population would capture about 70% of all AF patients and 50% to 65% of the associated stroke, heart failure, myocardial infarction, and cognitive impairment cases; about 32% of major bleeding events would be detected among the top 10% of high-risk members.
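The capture figures quoted above can be understood through a small sketch: rank members by predicted risk, take the top fraction, and count the share of all events that fall in it. Cumulative lift is that share divided by the fraction of the population targeted, so capturing 70% of cases in the top 10% corresponds to a lift of about 7. The function names are hypothetical and the definition is the conventional one, which may differ in detail from the software output in Table 5.

```python
def capture_rate(y_true: list[int], y_score: list[float],
                 top_fraction: float = 0.10) -> float:
    """Share of all events found among the top-scoring fraction of members."""
    total_events = sum(y_true)
    if total_events == 0:
        raise ValueError("no events to capture")
    n_top = max(1, int(len(y_true) * top_fraction))
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    captured = sum(y for _, y in ranked[:n_top])
    return captured / total_events


def cumulative_lift(y_true: list[int], y_score: list[float],
                    top_fraction: float = 0.10) -> float:
    """Capture rate divided by the share of the population actually targeted."""
    n_top = max(1, int(len(y_true) * top_fraction))
    return capture_rate(y_true, y_score, top_fraction) / (n_top / len(y_true))
```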
The ML-based formulations were nonlinear in nature and mostly dominated by interactive terms, with fewer polynomial and main effects (see suppl Table S5). Both Medicaid group type and AF contributed significantly to the associations with adverse clinical outcomes. Figure 1 shows the decision curve analysis result for all ML models. The developed models produced better results in terms of net true positives than the "treat all" option, even in the presence of low prevalence for the diagnosed conditions. Selecting a probability threshold of 2% as the separator between low- and high-risk outcomes gave sensitivity/specificity values of 71.2%/90.1% and 87.6%/86.3% for the AF and cognitive impairment outcomes, respectively. A probability threshold of 3.5% would be adequate for the stroke, CHF "congestive heart failure," and MI "myocardial infarction" outcomes, with sensitivity/specificity values of 74.3%/72.5%, 83.1%/76.4%, and 71.9%/82.6%, respectively. Finally, a higher threshold of 6.5% was satisfactory for major bleeding (sensitivity/specificity values: 64.5%/65.3%), which had the highest prevalence among the outcomes.
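The threshold-based measures in this paragraph can be made concrete with a short sketch. It computes sensitivity and specificity at a chosen probability threshold, together with the net benefit used in decision curve analysis, using the standard formula (true positives per member minus false positives per member weighted by the threshold odds). The function names are hypothetical, and this is an illustration of the general technique rather than the exact computation behind Figure 1.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Return (tp, fp, tn, fn) when 'high risk' means y_prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_high = p >= threshold
        if predicted_high and y == 1:
            tp += 1
        elif predicted_high and y == 0:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(y_true, y_prob, threshold):
    """Sensitivity and specificity; assumes both classes are present."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    return tp / (tp + fn), tn / (tn + fp)


def net_benefit(y_true, y_prob, threshold):
    """Decision-curve net benefit of the model at a threshold in (0, 1)."""
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of labeling every member high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

A decision curve plots `net_benefit` against a range of thresholds; comparing it with `net_benefit_treat_all` on the same data is one way to judge whether a model adds value at a given operating point.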

Discussion
In this study, our principal finding shows how combination(s) of socioeconomic status and disability contribute to AF incidence and complications, elevating risks to higher levels relative to the general population. Second, ML algorithms can be used to identify AF patients at high risk of clinical events in these groups with low socioeconomic status and disability. Moreover, we identified the distinct Medicaid groups which are at the highest risk of AF incidence and complications.
It has been suggested that poor socioeconomic status is associated with an increased risk of AF incidence, but the literature is not definitive on this relationship. A recent systematic review found no consistent pattern for an association between socioeconomic status and the risk of AF. [15] One Chinese study [16] found that the prevalence of AF was highest in high-income regions (2.54%), followed by middle-income regions (2.33%), and lowest in low-income regions (1.98%). On the other hand, a European study [17] found that high-income groups tended to have the lowest levels of AF risk relative to low-income cohorts. A life-course disadvantaged socioeconomic status is an important predictor of the first hospitalization for AF. [4] Nonetheless, the US lacks a comprehensive study of different combinations of disadvantaged socioeconomic and disability cohorts such as those enrolled in the Medicaid population. Of note, in this Medicaid population, the great majority of the population examined was under 65 years of age, and the overall crude incidence rate was distinctively high (0.49 cases/100 person-years, 95% CI 0.40-0.58) and ranged from 0.20 to 1.5 across groups, which is relatively high compared to the published literature for this age group. For example, Wilke et al. [18] found incidence rates of 0.436 cases/100 person-years for men and 0.387 for women in the German population. Miyasaka et al. [19] reported an increase in age-/gender-adjusted incidence of AF per 100 person-years from 0.304 (95% CI 0.278-0.331) in 1980 to 0.368 (95% CI 0.342-0.395) in 2000 based on a general cohort in a Minnesota county in the US, so the higher incidence rates obtained in the present investigation suggest that the examined cohort is a sicker group than the general population. Furthermore, mental health conditions such as depression are also important, as the prevalence was considerably high as well (19.2%). The above argument is further supported by the incidence ratios of complications of higher AF risk patients.
The incidence ratios of adverse clinical outcomes were as follows: 22.3% for stroke, 45.6% for heart failure, 21.7% for myocardial infarction, 24.2% for major bleeding, and 9.2% for cognitive impairment relative to the following ratios for the non-AF cohorts (3.7% for stroke, 4.2% for heart failure, 2.5% for myocardial infarction, and 1.3% for cognitive impairment). The contribution of Medicaid group type toward the high AF incidence and adverse clinical outcomes was clearly demonstrated in strong terms both as an independent effect as well as interaction with comorbid profile (e.g., depression, vascular disease, and diabetes mellitus) and demographic variables (i.e., age groups and gender). The findings of this investigation clearly suggest that a poor socioeconomic status coupled with disability constraints may have negative consequences for AF incidence and associated AF-related complications. These were indeed demonstrated in the ML models developed for the detection of high-risk AF incidence and their associated complications, particularly in light of their high discriminant validity/performance effectiveness (based on cumulative lift) and high calibration/clinical utility. Further research would help advance the role of population health studies with respect to improved quality of care and cost of care savings.

Limitations.
This study is observational in nature and may be limited by its inherent biases as well as the use of administrative databases. Yet, our findings support the findings of European studies on the role of poor socioeconomic status as a risk factor for AF incidence and its potential complications.

Conclusions
A combination of low socioeconomic status and disability constraints contributed significantly to AF incidence and complications, elevating the risk to higher levels relative to the general population. The use of ML algorithms revealed significant nonlinear associations which can be used to target high-risk AF patients for cardiovascular prevention programs.

Data Availability
Data are available as presented in the paper. According to US laws and corporate agreements, our own approvals to use the Anthem and Ingenio-Rx data sources for the current study do not allow us to distribute or make patient data directly available to other parties.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Supplementary Materials
The supplementary material includes: (a) Healthcare codes for extracting atrial fibrillation from medical and pharmacy claims as well as comorbid history; (b) Details of machine learning-based models for atrial fibrillation outcomes and associated complications (i.e., stroke, congestive heart failure, myocardial infarction, major bleeding, and cognitive impairment, together with performance assessment analyses); and (c) Details of quantitative analyses.