Identification of Differences in Body Composition Measures Using 3D-Derived Artificial Intelligence from Multiple CT Scans across the L3 Vertebra Compared to a Single Mid-Point L3 CT Scan

Purpose: Body composition analysis in colorectal cancer (CRC) typically uses a single 2D axial abdominal CT slice taken at the mid-L3 level. Artificial intelligence (AI) makes it possible to analyse the entire L3 vertebra (both non-mid-L3 and mid-L3 slices). The goal of this study was to determine whether this AI-based approach provides additional information when capturing body composition measures.
Methods: A total of 2203 axial CT slices covering the entire L3 level (4–46 slices per patient) were retrospectively collected from 203 CRC patients treated at Western Health, Melbourne (97 males; 47.8%). A pretrained AI model was used to segment muscle, visceral adipose tissue (VAT), and subcutaneous adipose tissue (SAT) on these slices. Differences in body composition measures between mid-L3 and non-mid-L3 slices were compared for each patient, and for males and females separately.
Results: Median average percent differences between body composition measures derived from non-mid-L3 slices and those derived from the single mid-L3 slice ranged from 0.85% to 6.28%. Females showed significantly greater variation in VAT surface area than males (p = 0.02), whereas males showed greater variation in SAT surface area (p < 0.001) and radiodensity (p = 0.007).
Conclusion: Significant differences in various body composition measures were observed when non-mid-L3 slices were compared with the mid-L3 slice alone. Researchers should be aware that relying on a single mid-L3 CT slice affects estimates of body composition.


Introduction
The measurement of body composition relies on assessing the quantity and distribution of body fat and lean muscle mass [1], and body composition varies between sexes [2]. In colorectal cancer (CRC) patients, body composition has been associated with survival-related clinical outcomes [3][4][5][6][7][8]. Computed tomography (CT) has become the most common technique for evaluating body composition [9]. CT images can be graded by semiautomated analysis with manual interpretation of body composition, but this approach is limited by its labour-intensive nature and the high degree of specialisation it requires. A single axial abdominal CT image taken at the L3 level (typically at the midpoint of L3, hereafter referred to as mid-L3) is commonly used to examine body composition in individuals with CRC [10][11][12]. However, there is limited justification for using mid-L3 as the gold standard [13,14], and few data are available on whether body composition measures derived from other L3 slices or from the entire L3 vertebral level (non-mid-L3) yield different estimates.
Deep learning is one of the primary techniques in artificial intelligence (AI), and it has grown in popularity as a viable approach for automating body composition segmentation [15]. In prior studies, AI models designed to replicate semiautomated analysis have been trained and validated using a single mid-L3 slice [16][17][18][19][20][21], and these models have yielded promising results [16][17][18][19][20][21]. Our previously trained AI model has also shown promising segmentation of CT body composition in CRC patients (Dice similarity of 0.98; submitted for publication). AI may therefore make it possible to rapidly analyse other L3 slices and compare their body composition measures with those from a single mid-L3 slice.
In the present study, we aimed to use our in-house AI model to automatically segment and quantify body composition on all available CT slices spanning each patient's complete L3 level. This would allow us to determine the degree of variation in body composition estimates across the L3 region and to highlight any potential impact on future clinical studies.

Methods
This study was approved by the Western Health Office for Research (Project QA2020.24_63907). The protocol followed the tenets of the Declaration of Helsinki, and all privacy requirements were met.

Study Population and CT Scans.
Using sagittal imaging, a trained human grader (author JoY) identified the anatomical level of L3 in the medical image viewer Synapse 5 (FUJIFILM). All available axial scans at the L3 level for each patient (n = 2203 axial scans) were collected. For each patient, the single CT slice most representative of L3 was defined as the mid-L3 slice; in line with the Alberta Protocol (https://tomovision.com/Sarcopenia_Help/index.htm), this slice was manually selected by the same trained grader (author JoY).
Each collected CT scan was stored as a Digital Imaging and Communications in Medicine (DICOM) image with a resolution of 512 × 512 pixels. The CT acquisition parameters, including slice thickness (1–8 mm) and tube voltage (100–140 kVp), differed depending on the clinical indication. Each stored pixel value was converted to the Hounsfield unit (HU) scale, a quantitative measure of radiodensity used in analysing CT scans [22], using the formula HU = pixel value × slope + intercept (https://www.idlcoyote.com/fileio_tips/hounsfield.html). The pixel value, intercept, and slope were retrieved from each DICOM file.
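The HU conversion can be illustrated with a minimal sketch. It assumes the stored pixel values are already available as a 2D list and that the slope and intercept have already been read from the DICOM header (in DICOM these are the Rescale Slope (0028,1053) and Rescale Intercept (0028,1052) attributes). The function name `to_hounsfield` and the sample values are hypothetical and are not the authors' code.

```python
def to_hounsfield(pixel_rows, slope, intercept):
    """Convert stored CT pixel values to Hounsfield units (HU).

    Applies HU = pixel value * slope + intercept to every pixel of a 2D image
    given as a list of rows. Slope and intercept are assumed to come from the
    DICOM Rescale Slope / Rescale Intercept attributes.
    """
    return [[value * slope + intercept for value in row] for row in pixel_rows]


# Hypothetical 2 x 2 patch; slope 1 and intercept -1024 are assumed sample values.
raw = [[1024, 1064], [974, 0]]
print(to_hounsfield(raw, slope=1.0, intercept=-1024.0))  # [[0.0, 40.0], [-50.0, -1024.0]]
```

Keeping the conversion as a separate step means every later measure (tissue thresholds and mean radiodensity) operates on HU rather than on scanner-specific stored values.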
Patients were included if they (a) were diagnosed with colon cancer at Western Health between 2012 and 2021 and (b) had L3 axial CT scans available. Patients were identified from the Australian Comprehensive Cancer Outcomes and Research Database (ACCORD), a prospectively maintained registry of patients diagnosed with CRC in Victoria, Australia.
Patients were excluded from the study if any of the following were present on their L3 scan: (a) CT image quality too low to read manually; (b) an excess quantity of SAT extending outside the CT image; (c) signs of muscle being cut off; or (d) major artefacts.
Age at the time of diagnosis and sex were both obtained from the ACCORD database for each patient.

Body Composition Measures.
This study examined skeletal muscle (SM), visceral adipose tissue (VAT), and subcutaneous adipose tissue (SAT) as components of body composition on the mid-L3 slice and the other L3 slices of each patient. The following body composition measures were analysed:
(1) SM surface area (cm²)
(2) VAT surface area (cm²)
(3) SAT surface area (cm²)
(4) SM radiodensity (HU)
(5) VAT radiodensity (HU)
(6) SAT radiodensity (HU)
The surface area (cm²) of a given tissue on each slice was calculated as the size of that tissue in pixels multiplied by the area of one pixel, which was derived from the pixel spacing stored in each CT DICOM file.
The radiodensity of a given tissue was determined by averaging the HU values of the pixels assigned to that tissue on each slice.
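A minimal sketch of the two per-slice measures follows, under stated assumptions: each tissue label is a 2D boolean mask, the HU image is a 2D list of the same shape, "size" in the area formula is taken to mean the pixel count, and one pixel covers (row spacing × column spacing) mm², converted to cm² by dividing by 100. The function names are hypothetical.

```python
def tissue_area_cm2(mask, pixel_spacing_mm):
    """Surface area (cm^2) of one tissue on one slice.

    mask: 2D list of booleans, True where a pixel is labelled as the tissue.
    pixel_spacing_mm: (row spacing, column spacing) in millimetres (assumed).
    """
    n_pixels = sum(1 for row in mask for flag in row if flag)
    row_mm, col_mm = pixel_spacing_mm
    pixel_area_cm2 = (row_mm * col_mm) / 100.0  # 1 cm^2 = 100 mm^2
    return n_pixels * pixel_area_cm2


def tissue_radiodensity_hu(mask, hu):
    """Mean HU over the pixels labelled as the tissue on one slice."""
    values = [
        value
        for mask_row, hu_row in zip(mask, hu)
        for flag, value in zip(mask_row, hu_row)
        if flag
    ]
    if not values:
        raise ValueError("the mask contains no pixels for this tissue")
    return sum(values) / len(values)


# Hypothetical 2 x 2 slice: three pixels labelled as VAT, 0.5 mm x 0.5 mm pixels.
vat_mask = [[True, True], [False, True]]
hu_slice = [[-100.0, -80.0], [30.0, -60.0]]
print(tissue_area_cm2(vat_mask, (0.5, 0.5)))       # about 0.0075 cm^2
print(tissue_radiodensity_hu(vat_mask, hu_slice))  # -80.0
```

Raising an error on an empty mask avoids silently reporting a radiodensity for a tissue that was not segmented on that slice.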

AI Model.
A two-dimensional U-Net convolutional network, trained and validated on 541 previously collected mid-L3 CT scans, was used to segment muscle, VAT, and SAT (submitted for publication). The development dataset comprised 338 CT scans from 116 CRC patients. All accessible CT scans for each patient (from six months before to three months after surgery) were collected, so that one or more scans were available per patient. For each patient, a trained human grader (author JoY) manually selected the mid-L3 CT slice based on the Alberta Protocol (https://tomovision.com/Sarcopenia_Help/index.html). Using semiautomated software (Slice-O-Matic version 5.0, Tomovision, Quebec, Canada), all CT scans in this dataset were manually segmented in accordance with the Alberta Protocol (https://tomovision.com/Sarcopenia_Help/index.html). This dataset was then randomly divided into a training set (80% of scans, n = 270) and a validation set (the remaining 20% of scans, n = 68). The training set was used to develop the segmentation model, and the validation set was used to assess the performance of the final fitted model. In the validation set, the average Dice coefficient across all body composition segmentations was 0.98, with 0.98 for muscle, 0.98 for VAT, and 0.99 for SAT. The AI model was further tested on an additional CT dataset from another 203 patients, in which 1 in 10 scans (n = 21) was selected at random for manual segmentation to cross-check the AI output. The average Dice coefficient in this test dataset was 0.98, with 0.97 for muscle, 0.98 for VAT, and 0.98 for SAT. Figure 1 shows an example of body composition segmentation, including an original CT scan and a segmented CT scan.
To assess how well our AI model segmented different L3 slices in the current dataset, all available scans at the L3 level (198 CT slices in total) from 21 randomly selected patients were manually segmented (author JoY) using the semiautomated software (Slice-O-Matic version 5.0, Tomovision, Quebec, Canada) according to the Alberta Protocol (https://tomovision.com/Sarcopenia_Help/index.htm). The threshold settings for the segmentation tool were SM: −29 to 150 HU, VAT: −150 to −50 HU, and SAT: −190 to −30 HU. These thresholds are predefined in the Alberta Protocol for Slice-O-Matic (https://tomovision.com/Sarcopenia_Help/index.htm).
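To show what the intensity windows do, the sketch below keeps only pixels whose HU lies inside the protocol range for a tissue, optionally restricted to a region. The ranges are those quoted above; inclusive bounds, the `region` argument, and the function name are assumptions. Because the VAT and SAT windows overlap, HU thresholding alone cannot separate the two compartments; in the semiautomated workflow the grader's delineation provides that separation, and the hypothetical `region` mask merely stands in for it.

```python
# HU windows quoted in the text (Alberta Protocol settings); inclusive bounds are an assumption.
HU_WINDOWS = {
    "SM": (-29, 150),
    "VAT": (-150, -50),
    "SAT": (-190, -30),
}


def threshold_mask(hu, tissue, region=None):
    """Return a 2D boolean mask of pixels whose HU lies in the tissue's window.

    hu: 2D list of HU values.
    region: optional 2D boolean mask (hypothetical stand-in for the grader's
    delineation of the tissue compartment); pixels outside it are excluded.
    """
    low, high = HU_WINDOWS[tissue]
    return [
        [
            low <= value <= high and (region is None or bool(region[r][c]))
            for c, value in enumerate(row)
        ]
        for r, row in enumerate(hu)
    ]


hu_row = [[-100, 40, -170]]
print(threshold_mask(hu_row, "SM"))   # [[False, True, False]]
print(threshold_mask(hu_row, "SAT"))  # [[True, False, True]]
```

Note that the pixel at −100 HU falls inside both the VAT and SAT windows, which is why a compartment mask is needed in addition to the HU window.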
The Sørensen-Dice coefficient (Dice coefficient) was used to evaluate the U-Net-based segmentation by comparing the AI and manual segmentations of the 198 assessed scans. The average Dice coefficient across all body composition segmentations on these scans was 0.97 (0.97 for SM, 0.96 for VAT, and 0.97 for SAT), indicating that our AI model produced highly accurate body composition segmentation on each of the different L3 slices.
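The Dice coefficient between an AI mask A and a manual mask B is 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical masks). A minimal sketch on 2D boolean masks follows; the function name is hypothetical, and returning 1.0 when both masks are empty is a convention chosen here rather than something stated in the study.

```python
def dice_coefficient(mask_a, mask_b):
    """Sørensen-Dice coefficient between two binary masks of equal shape."""
    # Represent each mask as the set of (row, column) positions it covers.
    a = {(r, c) for r, row in enumerate(mask_a) for c, flag in enumerate(row) if flag}
    b = {(r, c) for r, row in enumerate(mask_b) for c, flag in enumerate(row) if flag}
    if not a and not b:
        # Convention chosen here: two empty masks are treated as agreeing perfectly.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical AI and manual masks for one tissue on one slice.
ai_mask = [[True, True], [False, False]]
manual_mask = [[True, False], [True, False]]
print(dice_coefficient(ai_mask, manual_mask))  # 0.5
```

Per-tissue averages such as those reported above would then be obtained by averaging this value over all assessed slices for each tissue.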

Statistical Analysis.
To compare body composition between mid-L3 and other L3 slices, the average percent difference was calculated. For a given body composition measure of each patient, the average percent difference (APD) was computed as

$$\mathrm{APD} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{x_i - x_{\mathrm{mid}}}{x_{\mathrm{mid}}}\right| \times 100,$$

where $x_i$ is the measure on the $i$-th non-mid-L3 slice, $n$ is the number of non-mid-L3 slices for that patient, and $x_{\mathrm{mid}}$ is the measure on the mid-L3 slice.
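A minimal sketch of the APD formula for one patient and one measure, followed by the median across patients (the summary statistic quoted for this measure in the Abstract and Discussion). The function name and input values are hypothetical. The absolute value keeps the result non-negative even when the mid-L3 value is negative, as it is for adipose tissue radiodensity in HU.

```python
import statistics


def average_percent_difference(mid_value, non_mid_values):
    """Average absolute percent difference of non-mid-L3 slices from the mid-L3 slice."""
    if mid_value == 0:
        raise ValueError("mid-L3 value must be non-zero")
    if not non_mid_values:
        raise ValueError("at least one non-mid-L3 slice is required")
    diffs = [abs((x - mid_value) / mid_value) * 100 for x in non_mid_values]
    return sum(diffs) / len(diffs)


# Hypothetical SAT surface areas (cm^2): (mid-L3 value, non-mid-L3 values) per patient.
patients = [
    (80.0, [60.0, 100.0]),
    (100.0, [100.0, 100.0]),
    (50.0, [75.0]),
]
per_patient_apd = [average_percent_difference(mid, others) for mid, others in patients]
median_apd = statistics.median(per_patient_apd)
print(per_patient_apd)  # [25.0, 0.0, 50.0]
print(median_apd)       # 25.0
```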
The Mann-Whitney test was used to assess whether continuous parameters differed significantly between sexes (unpaired data). A p value below 0.05 was considered statistically significant.
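In practice a statistics package would normally be used (for example `scipy.stats.mannwhitneyu`). The dependency-free sketch below only illustrates the idea: tied values receive midranks, U is computed for the first sample, and a two-sided p value comes from the normal approximation with the usual tie correction. No continuity correction is applied, so p values will differ slightly from library defaults and are unreliable for very small samples. The function name and the example data are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U for sample x, two-sided p value). Ties get midranks and the
    variance includes the tie correction; no continuity correction is used.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # Pool the observations, remembering which group each came from (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # every value tied: no evidence of a difference
    z = (u_x - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return u_x, p


# Hypothetical per-patient APD values (%) for one measure, split by sex.
males = [4.1, 6.3, 5.0, 7.2, 3.9]
females = [2.2, 3.1, 2.8, 4.0, 1.9]
u, p = mann_whitney_u(males, females)
print(f"U = {u}, p = {p:.3f}")
```

A rank-based test suits these data because per-patient percent differences are non-negative and typically skewed, so normality is not assumed.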

Results
The dataset for the current study consisted of 2203 CT scans obtained from 203 patients who underwent surgical treatment for CRC. The mean age of the cohort was 60.87 ± 12.42 years (97 males, 106 females). The median number of CT slices covering the whole L3 vertebra was 10 per patient (IQR: 9–11).

Single Mid-L3 Slice.
Body composition measurements from the mid-L3 CT slice of all patients are shown in Table 1. Females had significantly less SM and VAT surface area than males (p < 0.001). Female patients also had significantly more SAT surface area and lower SAT radiodensity than male patients (p < 0.001).

Discussion
Body composition measurements, in particular SM surface area, have been associated with the response of rectal cancer to neoadjuvant therapy and with corresponding survival outcomes [23,24]. Furthermore, body composition has been proposed as a superior basis for dosing chemotherapy in CRC, to reduce rates of dose-limiting toxicity [8,25]. Currently, 2D body composition is still commonly measured because little clinically validated software is available to researchers and clinicians. As a result, the mid-L3 vertebral CT slice defined by the Alberta Protocol remains the gold standard and is routinely used to measure body composition [10][11][12].
Two studies by Shen et al. [13,14], published in 2004, have frequently been cited to justify the use of the L3 vertebra as the gold standard for obtaining body composition. The first study examined the relationship between cross-sectional VAT areas at various anatomic locations and VAT volume in 320 healthy subjects. Its findings indicated that, when only a single 2D slice was used, the areas 5 cm and 10 cm above the L4-L5 level provided the most accurate estimates of VAT volume in men and women, respectively. The second study investigated the relationship between a single cross-sectional area at different anatomic locations and the total volume of muscle and adipose tissue in 328 healthy subjects. Its results indicated that the areas 5 cm above and 5 cm below the L4-L5 level showed the highest correlations with muscle and adipose tissue volume, respectively. However, both studies relied on MRI scans, neither included CRC patients, and neither specifically stated the significance of the L3 segment (although L3 lies 5–10 cm above L4-L5). Another study, by Schweitzer et al. [26], reported that a single MRI slice at the L3 level was the most representative site for assessing total SM, VAT, and SAT volumes. Again, this study was conducted in only 142 healthy subjects rather than CRC patients. Consequently, whether a single mid-L3 CT slice adequately represents body composition when correlated with a patient's clinical outcomes does not appear to have been established and requires further investigation.
Our study demonstrated that body composition measurements obtained from a single CT slice at the mid-L3 vertebral level differ from those obtained by analysing the multiple slices that make up the entire L3 vertebra. The surface area of body composition components varied considerably across L3. For example, the median average percent differences in VAT and SAT surface area between non-mid-L3 slices and the mid-L3 slice were 5.49% and 6.28%, respectively.
Of particular interest, the variation in body composition parameters between the mid-L3 slice and non-mid-L3 slices differed significantly between the sexes: VAT variation was greater in females, whereas SAT variation was greater in males.
Our results suggest that using only a single 2D CT slice at the mid-L3 level gives a limited view of body composition, and that AI now offers researchers an enhanced and more accurate means of obtaining broader, 3D body composition measures, which will aid our understanding of the role that body composition plays in clinical outcomes.
In this study, we have presented results from our validated AI model, which automatically segments SM, SAT, and VAT on multiple CT slices across the whole L3 vertebra in CRC patients. Manual cross-checking by experienced researchers showed that the AI model provides excellent body composition segmentation on all CT slices at the L3 level (Dice similarity of 0.97).
Despite these promising results, our study has several limitations. It was conducted at a single centre with retrospectively collected data. Furthermore, the clinical impact of these body composition findings on CRC outcomes remains to be established. In addition, our results have not been evaluated on an external dataset (i.e., from other hospitals or other countries). Our future work will recruit additional internal and external patient datasets to test the validity and robustness of our results across various institutions and patient cohorts. A future prospective study in a clinical context is essential for more rigorous testing of our AI models, specifically to evaluate their generalizability and robustness.

Conclusion
We found that using multiple CT slices from various locations on L3 identified significant variation in body composition estimates compared with using only a single slice from the mid-L3 vertebral level. The extent of this heterogeneity across L3 differed significantly between sexes. Using AI to derive 3D body composition offers an enhanced and more accurate means of measuring body composition as a predictive tool for outcomes related to colorectal cancer.

Figure 1: A sample case showing the original CT scan (a) and the AI-segmented CT slice (b). In the segmented slice, red indicates SAT, yellow indicates VAT, and blue indicates muscle.

Table 1: Characteristics of body composition for all patients and by sex in CRC patients, using only the mid-L3 slice.

Table 2: Average percent difference (%) in muscle, VAT, and SAT area and radiodensity between the mid-L3 slice and non-mid-L3 slices, in all patients and by sex. For each patient, the average percent difference was calculated by averaging, over all L3 slices excluding mid-L3, the absolute value of ((slice body composition − mid-L3 body composition)/mid-L3 body composition) × 100.