Empowering Medical Students: Harnessing Artificial Intelligence for Precision Point-of-Care Echocardiography Assessment of Left Ventricular Ejection Fraction

Introduction. Point-of-care ultrasound (POCUS) use is now universal among nonexperts. Artificial intelligence (AI) is currently employed by nonexperts in various imaging modalities to assist in diagnosis and decision making.
Aim. To evaluate the diagnostic accuracy of POCUS, operated by medical students with the assistance of an AI-based tool, for assessing the left ventricular ejection fraction (LVEF) of patients admitted to a cardiology department.
Methods. Eight students underwent a 6-hour didactic and hands-on training session. Participants used a hand-held ultrasound device (HUD) equipped with an AI-based tool for the automatic evaluation of LVEF. The clips were assessed for LVEF by three methods: visually by the students, by students + the AI-based tool, and by the cardiologists. All LVEF measurements were compared to formal echocardiography completed within 24 hours, which was evaluated for LVEF using the Simpson method and eyeballing assessment by expert echocardiographers.
Results. The study included 88 patients (aged 58.3 ± 16.3 years). The AI-based tool measurement was unsuccessful in 6 cases. Comparing LVEF reported by students' visual evaluation and students + AI vs. cardiologists revealed a correlation of 0.51 and 0.83, respectively. Comparing these three evaluation methods with the echocardiographers revealed a moderate/substantial agreement for the students + AI and cardiologists but only a fair agreement for the students' visual evaluation.
Conclusion. Medical students' utilization of an AI-based tool with a HUD for LVEF assessment achieved a level of accuracy similar to that of cardiologists. Furthermore, the use of AI by the students achieved moderate to substantial inter-rater reliability with expert echocardiographers' evaluation.


Introduction
Point-of-care ultrasound (POCUS) is frequently utilized by physicians in many medical specialties as well as among medical students [1]. With the recent development of low-cost portable devices and an increasing number of applications, it is expected that POCUS use will expand in the coming years [2,3]. Furthermore, in fields such as emergency medicine or internal medicine, the expected results are often general or binary (i.e., pericardial fluid present or absent; left ventricular ejection fraction (LVEF) normal or grossly abnormal) rather than a detailed result given by a cardiologist [4,5]. Also, different POCUS guidelines have been proposed with basic requirements including a qualitative assessment of left ventricular (LV) systolic function, leaving the exact calculation of LVEF to expert echocardiographers only [6]. The specific quantification of cardiac LVEF is one of the most significant and frequent applications of echocardiography [7]. Nonetheless, the methods used to make these specific calculations are operator-dependent [8].
Artificial intelligence (AI) and computational technologies are increasingly utilized across various imaging modalities, including cardiac imaging, by nonexpert operators. They assist in decision making and enhance diagnostic capabilities [9-12]. In the clinical practice of echocardiography, AI is mainly used in automated tools implemented in high-end devices but is less often incorporated in systems used for point-of-care testing [13]. This is due to the low frame rate and image quality that limit the use of speckle-tracking algorithms on these devices. Incorporating AI into real-time focused echocardiography operated by noncardiologists to accurately assess various cardiac functions may significantly improve image interpretation, reduce variability among nonexperts, and lead to better diagnostic decisions.
The objective of this study was to assess the diagnostic accuracy of an AI-based assessment tool on a hand-held ultrasound device (HUD) operated by medical students as compared with cardiologists' visual evaluation in assessing the LVEF of patients hospitalized in a cardiology department of a tertiary care teaching hospital.

Study Design.
This was a prospective study of real-time focused echocardiography operated by medical students using an AI-based technology for LVEF evaluation compared to cardiologists and expert echocardiographers. The study was approved by the hospital's Institutional Review Board (IRB number: 0325-18-SZMC).
The clips acquired using the HUD were assessed for LVEF by three methods: visually by the students, students + the AI-based tool, and visually by cardiologists (Figure 1(a)). A formal echocardiography was completed within 24 hours, including LVEF eyeballing and Simpson's method evaluation by expert echocardiographers (Figure 1(b)).

Study Comparisons.
The study's primary comparison was designed to show that the AI-based tool used by nonexperts for LVEF evaluation is accurate compared with cardiologists' assessment. A target correlation of 0.80 (80%) between the AI-based tool and the cardiologists was defined as suitable for the study.
Secondary comparisons included all three LVEF measurements (students, students + AI-based tool, and cardiologists) as compared with a parallel mean assessment of the high-end echocardiography completed within 24 hours by the expert echocardiographers.

Patient Selection.
Study participants were nonselected patients admitted to the cardiology department within their first 48 hours of hospitalization. As part of the cardiology department routine, all admitted patients underwent an official echocardiography within the first 24 hours of hospitalization.

Study Setting.
The study was conducted at a single tertiary care medical center from March 2019 through March 2020 and included 4th- to 6th-year medical students who routinely worked in the cardiology department as physician assistants. The students were trained to use a HUD (Vscan Extend with Dual Probe; General Electric) equipped with LVivo EF (DiA Imaging Analysis Ltd), an AI-based program that provides automated calculation of LVEF from the apical 4-chamber (A4ch) view (Figure 2). The students were assigned preliminary reading of relevant information, after which they took a quiz to assess their knowledge. They then underwent a 6-hour course that included frontal lectures and hands-on practice. The frontal lectures covered background information, practical information, and heterogeneous echocardiographic video clips encompassing the full clinical range of LVEF calculation. During the hands-on practice, each participant had to complete at least four supervised scans assessed for both proper acquisition and LVEF evaluation. Prior to the clinical study, a preliminary practical examination of the devices and the AI-based application was performed by the principal investigators for troubleshooting and to rule out any practical problems. Following the training, a pilot phase was conducted in which the operators' skills were evaluated. A total of nine students were trained in the course; one did not complete the required training, leaving eight students who participated in the study.

Study Protocol.
Written consent was obtained from all patients who participated in the study. Those who refused to participate or whose AI-based measurement was unsuccessful were excluded. Data included age, sex, body mass index (BMI), relevant chronic comorbidities, and admission presentation. Technical aspects were also recorded, including the patient's ability to turn on their left side, their proficiency in maintaining effective communication (i.e., adhering to instructions and cooperating with the examination), study difficulty, and quality.
The study flowchart is shown in Figure 1(a). The medical students performed the POCUS examination using the HUD and acquired the echocardiography clips from the A4ch view. The study acquisition was evaluated by the students on a scale of 1-3 for difficulty (easy, intermediate, or difficult) and on a scale of 1-4 for image quality (excellent: optimal visualization; high: proper visualization of >50% of the segments; moderate: <50%; and poor: inappropriate visualization). The view was focused and optimized on the LV, avoiding foreshortening. The interventricular septum was aligned parallel to the plane, and at least a 2-beat heart cycle was recorded. Depth was adjusted so that the LV occupied two-thirds of the view. The students were then asked to visually evaluate the exact LVEF. Next, the acquired clips were visually evaluated by cardiologists (AO and ZD), who were blinded to the previous results, for a second LVEF measurement and image quality according to the abovementioned scale. Finally, the LVEF was assessed on the recorded echocardiographic clips using the AI-based application (after both the student and the cardiologist had committed to specific LVEF values). If the automated algorithm failed to calculate the LVEF, if the entire border tracing was incorrect, or if the clip was significantly foreshortened, the image acquisition was repeated (up to five subsequent attempts).
The patients underwent an official echocardiogram using a high-end device within 24 hours of being recruited into the study. These clips were acquired by a certified echocardiographic technician (equivalent to a Registered Diagnostic Cardiac Sonographer in the United States). Each formal echocardiogram was evaluated for LVEF using both visual evaluation and Simpson's method by two fellowship-trained expert echocardiographers (AB and DR), blinded to the patient's details and previous study assessments.

Data Management.
Following consent, the study patients were given a separate anonymous identifying number for the study documentation. The Primary Investigator (PI) kept an Excel file with the case identifying number, the date of the study, and patient identifiers (PI file). The HUD-based LVEF results were inserted into a second file using the patient's identifying number (Hand-held file). Official echocardiography results were documented in a third file (Official file). The HUD results were later matched to official results using the identifying number.

Sample Size Calculation.
Sample size calculations were designed to meet the study comparison and were performed using G*Power software (version 3.1.9.4, Heinrich Heine University Düsseldorf, Germany). We planned a paired study with a 1:1 ratio. While previous data regarding LVivo EF usage showed a high correlation with the gold standard [14], in order to maximize data yield, we assumed a low correlation of 0.4 between the AI-based LVEF calculation and the cardiologist's assessment. Based on these assumptions, we calculated that data accrued from 67 participants would suffice to reject the null hypothesis with a probability (power) of 0.9. The type I error was set at 0.05 and was two-tailed.
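
As a rough cross-check of a calculation of this kind, the sketch below uses the Fisher z approximation for the sample size needed to detect a nonzero Pearson correlation. It is not the G*Power procedure used in the study: the function name and the choice of approximation are assumptions made for illustration, so the result need not coincide with the 67 participants reported above.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.90):
    """Approximate n needed to detect correlation r (two-tailed), via Fisher's z."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for two-tailed alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the desired power
    fisher_z = math.atanh(r)                       # Fisher transform of the assumed r
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)


# Inputs taken from the text: r = 0.4, alpha = 0.05 (two-tailed), power = 0.9
n_required = n_for_correlation(0.4)
print(n_required)
```

Under this approximation the required sample is a little over 60; exact routines, different test families, or allowances for dropout can give somewhat different figures.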

Statistical Analyses.
Descriptive statistics were used to analyze baseline and clinical characteristics as well as echocardiography results and comparisons, using chi-square or Fisher's exact tests for categorical variables and the t-test or Mann-Whitney U test for continuous variables, with test selection based on data distribution and normality.
For continuous LVEF comparisons, the paired t-test or the signed-rank test for two means (paired observations) was applied to test the statistical significance of the differences between the results obtained from each method.
The students' visual evaluation and the students + AI-based tool LVEF continuous evaluations were compared to the cardiologists' assessment for linear correlation using the Pearson correlation coefficient (r values <0.3, 0.3 to 0.5, 0.5 to 0.7, and ≥0.7 were considered to represent poor, poor to fair, fair to good, and excellent correlation, respectively). LVEF assessment agreement and bias were calculated using the Bland-Altman analysis, including mean difference and 95% limits of agreement (according to 2 standard deviations).
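
The two continuous measures described above can be sketched as follows. This is a minimal illustration, not the SPSS procedure used in the study: the function names and the LVEF readings are hypothetical, and the limits of agreement are taken as the mean difference plus or minus 2 standard deviations of the paired differences, as stated in the text.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(x, y, k=2.0):
    """Mean difference (bias) and limits of agreement, bias +/- k * SD of differences."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    return bias, bias - k * sd, bias + k * sd


# Hypothetical paired LVEF readings (%), NOT study data
ai_lvef = [55, 40, 62, 35, 50, 58, 45, 30]
cardiologist_lvef = [57, 42, 60, 38, 52, 60, 44, 33]

r = pearson_r(ai_lvef, cardiologist_lvef)
bias, lower, upper = bland_altman(ai_lvef, cardiologist_lvef)
print(f"r = {r:.2f}, bias = {bias:.2f}, limits = ({lower:.2f}, {upper:.2f})")
```

A negative bias here means the first method reads lower on average than the second; the width of the limits, rather than the correlation, indicates how far individual readings can diverge.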
For categorical variables, the inter-rater reliability using the Kappa coefficient was then calculated using cutoffs of 50% and 40% for the LVEF between the echocardiographer's high-end device assessment and the three HUD-based LVEF evaluations, including visually by the students, students + AI tool, and visually by the cardiologists. Kappa values of 0, 0 to 0.20, 0.21 to 0.40, 0.41 to 0.60, 0.61 to 0.80, and ≥0.81 were considered to represent no agreement, slight, fair, moderate, substantial, and almost perfect agreement, respectively.
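
The categorical comparison can be illustrated with a short sketch of Cohen's kappa on dichotomized LVEF values. The helper names, the sample readings, and the convention that a value strictly below the cutoff counts as reduced are all assumptions for illustration; the text does not state how values exactly at the cutoff were classified.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' binary labels over the same cases."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_a = sum(rater_a) / n  # proportion labelled True by rater A
    p_b = sum(rater_b) / n  # proportion labelled True by rater B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    return (observed - expected) / (1 - expected)


def below_cutoff(lvef_values, cutoff):
    """Dichotomize LVEF: True when below the cutoff (assumed convention)."""
    return [v < cutoff for v in lvef_values]


# Hypothetical LVEF values (%), NOT study data
hud_lvef = [55, 40, 62, 35, 50, 58, 45, 30]
formal_lvef = [57, 52, 60, 38, 52, 60, 44, 33]

kappa_50 = cohen_kappa(below_cutoff(hud_lvef, 50), below_cutoff(formal_lvef, 50))
print(round(kappa_50, 2))
```

Because kappa subtracts the agreement expected by chance, it can be much lower than the raw percentage of matching classifications when one category dominates. The sketch does not handle the degenerate case in which chance agreement equals 1.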
All tests were two-tailed, and a p value of 0.05 or less was considered statistically significant.
Statistical analyses were performed using SPSS Statistics for Windows version 26 (SPSS Inc., Chicago, IL).

Correlation of the LVEF Assessment Methods: Students vs. Cardiologists and Students + AI vs. Cardiologists.
A fair to good correlation was demonstrated between the students' and the cardiologists' visual evaluation for LVEF assessment of the students' acquired clips, with a Pearson correlation coefficient of 0.51 (p < 0.001; Figure 3(a)). An excellent correlation was demonstrated between the students + AI measurement and the cardiologists' visual evaluation for LVEF assessment, with a Pearson correlation coefficient of 0.83 (p < 0.001; Figure 3(b)).

Assessment Agreement of the LVEF Assessment Methods: Students vs. Cardiologists and Students + AI vs. Cardiologists.
LVEF assessment agreement between the students' and the cardiologists' visual assessment of the HUD-acquired clips using the Bland-Altman analysis revealed a mean bias of −1.77 (p = 0.062), with limits of agreement ranging from −18.4 to 14.8 (Figure 4(a)). LVEF assessment agreement between the AI measurement and the cardiologists' visual assessment of the students' acquired echocardiography clips revealed a mean bias of −1.44 (p = 0.052), with limits of agreement ranging from −14.4 to 11.5 (Figure 4(b)).

International Journal of Clinical Practice

Inter-Rater Reliability of LVEF Assessment: The Three Assessment Methods of HUD-Acquired Clips vs. the Expert Echocardiographers' Assessment of High-End Device Clips.
As shown in Figure 5, the categorical agreement of LVEF assessment comparing the three assessment methods of students' acquired clips (students, students + AI, and cardiologists) with the expert echocardiographers' assessment of the formal echocardiogram using LVEF 50% as the cutoff revealed a substantial agreement for the AI measurement and the cardiologists (Kappa of 0.64, standard error of 0.09, p < 0.001 and Kappa of 0.67, standard error of 0.09, p < 0.001, respectively) but only a fair agreement for the students' visual evaluation (Kappa of 0.29, standard error of 0.10, p = 0.007). A similar analysis using LVEF 40% as the cutoff revealed a moderate agreement for the AI measurement (Kappa of 0.51, standard error of 0.12, p < 0.001) and a substantial agreement for the cardiologists (Kappa of 0.71, standard error of 0.10, p < 0.001) but a fair agreement for the students' visual evaluation (Kappa of 0.24, standard error of 0.13, p = 0.027).
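
To give a sense of the uncertainty around these estimates, the sketch below derives approximate 95% confidence intervals from the reported Kappa values and standard errors for the 50% cutoff. It assumes a simple normal approximation (estimate plus or minus 1.96 standard errors); the function name is hypothetical, and the intervals may differ from those plotted in Figure 5, which are not reproduced in the text.

```python
def wald_ci(estimate, standard_error, z=1.96):
    """Approximate 95% confidence interval: estimate +/- z * SE (normal approximation)."""
    return estimate - z * standard_error, estimate + z * standard_error


# Kappa and standard error pairs reported in the text for the 50% cutoff
reported = {
    "students + AI": (0.64, 0.09),
    "cardiologists": (0.67, 0.09),
    "students (visual)": (0.29, 0.10),
}

intervals = {name: wald_ci(k, se) for name, (k, se) in reported.items()}
for name, (low, high) in intervals.items():
    print(f"{name}: {low:.2f} to {high:.2f}")
```

Under this approximation the intervals for students + AI and for the cardiologists overlap almost entirely, which is consistent with the description of their agreement as similar.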

Discussion
This study showed that the use of an AI-based tool on a HUD operated by medical students for LVEF assessment of patients admitted to the cardiology department has a high correlation with cardiologist visual assessment. Moreover, when compared with fellowship-trained expert echocardiographers using a high-end device, the AI-based LVEF measurement of the students' HUD-acquired clips can reach an agreement significantly higher than student visual evaluation and almost as good as that of the cardiologists.
The increasing use of POCUS by clinicians across specialties has been accompanied by a parallel introduction of ultrasound to medical students [15]. However, according to one critical systematic review, ultrasound was not shown to improve medical students' understanding of anatomy, and only some studies show that it improves diagnostic abilities, while there are no clear benefits in terms of patient outcomes [16]. Many of the tools suggested to enhance ultrasound skills involve either passive learning or are not conducted in clinical settings [17,18].
As POCUS has gained popularity across many medical disciplines, the use of HUD has expanded due to its advantages, including small size, portability, cost, and its ability to provide a real-time and instantaneous assessment [19]. These characteristics were proven useful in settings that can lead to a direct impact on immediate patient diagnosis and management and led to HUD utilization for bedside evaluations, including during the COVID-19 pandemic [20-22]. Though HUD use may involve several limitations, including screen size, imaging quality, and equivocal observations, its utilization was found to be reliable and accurate in different POCUS settings if properly performed [23,24]. Similarly, this study demonstrated a moderate/substantial agreement between a HUD with an AI-based tool operated by medical students and high-end devices operated by skilled sonography technicians and evaluated by expert echocardiographers.
Short-term accurate assessment of LVEF by medical students following a dedicated training session has been previously shown for both prerecorded and real-time acquired clips [25,26]. In contrast to these studies, our research took place in a real-time setting on patients admitted to the cardiology department and included both independent clip acquisition and LVEF evaluation by medical students. Also, the long-term effect of training for echocardiography diagnosis among novice users who are not routinely exposed to echocardiography practice is less studied and is not always maintained [27]. Aside from the obvious loss of training that increases with time and lack of reinforcement, this observation may also stem from a limited training effect secondary to novice users' adoption of a less structured approach to image reading, ending with a less efficient analysis. This trend may explain the finding in the present study that, unlike previous publications [25,26],
The AI-based tool used in the present study (LVivo EF) has been recently validated using a traditional formal echocardiography device for LVEF automated quantification as compared with cardiac magnetic resonance imaging [28]. Also, a previous study that tested this AI tool with hand-held echocardiography clips acquired by a 5th-year cardiology resident found an excellent correlation of 0.92 for the entire studied cohort as compared with formal echocardiography [14]. Similarly, we found a correlation of 0.83 with the cardiologist evaluation as well as a high agreement when compared with fellowship-trained expert echocardiographers. While Filipiak-Strzecka et al. tested the tool on a single highly trained cardiology resident (after six months of training in the echocardiography lab), the echocardiography acquisition in the present study was conducted by eight medical students who underwent only a 6-hour didactic course, and as such it reflects real-life novice use.
We have shown that all of the cases with unsuccessful AI measurement had a poor/moderate quality, were difficult to perform, and had a lower LVEF as compared with the successful cases. As the phased array transducer on this particular HUD is narrower than in conventional devices, if the LV was severely dilated, it may be challenging to include all borders in view throughout the entire cardiac cycle, resulting in an unsuccessful AI measurement. Similar to the findings of this study, Filipiak-Strzecka et al. found that most of the unsuccessful AI tool calculations were conducted on poor-quality clips (26/36). Notably, they showed an unsuccessful AI measurement in 36 patients (27% of those attempted), whereas in our study, the rate was only 7% (6/88). Samtani et al. showed a 2% (6/242) unsuccessful rate of AI-based measurement using a standard echocardiography device [28]. The varying rate of unsuccessful attempts may result from a higher number of acquisition attempts (five vs. three) or from differing patient and image characteristics, including a lower proportion of poor-quality images (7% vs. 23%) and a potentially higher proportion of normally functioning hearts (the proportion in their study was not published).

Limitations.
The study extended over one year, resulting in a gradual loss of training since the didactic course. This factor may account for the relatively lower diagnostic accuracy observed among the medical students. Nonetheless, they retained their acquisition capabilities, as shown by the comparisons to the high-end device evaluations. Moreover, the study reflects real-life clinical practice, as novice users are utilizing their skills for months and years after their initial training. A significant limitation is that the cardiologist visual assessment used as the reference was done on the students' acquired clips, which could have been foreshortened. This design was chosen to minimize the potential biases for LVEF mismatch, including different acquisition by experienced personnel. Another limitation is the relatively small sample size from a single medical center. Even though this study compared different echocardiographic methods for the assessment of LVEF, i.e., visually estimated evaluation vs. tracing of the ventricular borders and exact maximum and minimum surface measurement (via the AI-based tool and Simpson's method), it has been shown that the two are closely correlated when properly conducted [29]. Moreover, the scales used for acquisition difficulty and image quality were not based on official guidelines and were created for study purposes. Also, the LVEF evaluation was assessed by the students using the A4ch view exclusively. Nonetheless, the LVEF evaluation was accurate, with a moderate/substantial agreement achieved when compared with the high-end device clips assessed from all views.

Conclusions
Medical students can improve their LVEF assessment proficiency using a HUD to match that of cardiologists through the utilization of an AI-based tool. In addition, the use of AI for LVEF assessment enabled novice users to achieve moderate to substantial inter-rater reliability as compared with expert echocardiographers. This study offers a rationale for considering the use of this AI-based tool as an effective decision-making support tool for POCUS LVEF evaluation by nonexperts. Further studies should be conducted among different types of noncardiologist clinicians such as internists, emergency physicians, and physician assistants to assess the generalizability of these findings. In addition, prospective studies should be conducted to investigate whether AI-based tools can impact patient outcome.

Figure 1: Flowchart of LVEF assessments and the primary and secondary comparisons. (a) The LVEF assessment methods using the students' HUD-acquired clips upon recruitment to the study and the echocardiographer assessment using the formal high-end echocardiography clips completed within 24 hours from recruitment to the study. (b) The study's primary comparison included LVEF assessment on students' HUD-acquired clips: students vs. cardiologists and students + AI vs. cardiologists. The secondary comparisons included the three assessment methods of students' HUD-acquired echocardiography clips (students' visual evaluation, AI + students, and cardiologists' visual evaluation) with the fellowship-trained expert echocardiographer's assessment of the formal high-end echocardiography. AI, artificial intelligence; HUD, hand-held ultrasound device; LVEF, left ventricular ejection fraction.

Figure 4: The agreement using the Bland-Altman analysis of LVEF assessment on students' HUD-acquired clips: students vs. cardiologists and students + AI vs. cardiologists. (a) LVEF assessment agreement between the students' and cardiologists' visual evaluation revealed a mean bias of −1.77 (red line) with limits of agreement ranging from −18.37 to 14.83 (yellow lines), p = 0.062. (b) LVEF assessment agreement between the students + AI and cardiologists' visual evaluation revealed a mean bias of −1.44 (red line) with limits of agreement ranging from −14.40 to 11.52 (yellow lines), p = 0.052. AI, artificial intelligence; HUD, hand-held ultrasound device; LVEF, left ventricular ejection fraction.

Figure 5: Categorical inter-rater reliability of LVEF assessment using the Kappa coefficient comparing the 3 assessment methods of students' HUD-acquired echocardiography clips (students' visual evaluation, AI + students, and cardiologists' visual evaluation) with the fellowship-trained expert echocardiographers' assessment of the formal high-end echocardiography using 2 cutoff values of LVEF for each set of analyses (LVEF of 40 and 50%). AI, artificial intelligence; CI, confidence interval; HUD, hand-held ultrasound device; LVEF, left ventricular ejection fraction.

Table 1: Baseline demographics and clinical characteristics of those with successful AI vs. unsuccessful AI measurements.
+ As per cardiologist evaluation. * As per medical student evaluation. ++ As per expert echocardiographer assessment of the formal echocardiography. AI, artificial intelligence; BMI, body mass index; bpm, beats per minute; HR, heart rate; LVEF, left ventricular ejection fraction; n, number; SD, standard deviation. Bold values indicate that the p value is statistically significant (less than 0.05).