Automatic Detection of Horner Syndrome by Using Facial Images

Horner syndrome is a clinical constellation that presents with miosis, ptosis, and facial anhidrosis. It is an important warning sign of damage to the oculosympathetic chain, which can have serious causes. However, the diagnosis of Horner syndrome is operator-dependent and subjective. This study aims to present an objective method that can recognize the Horner sign from facial photos and to verify its accuracy. A total of 173 images were collected, annotated, and divided into training and testing groups. Two types of classifiers were trained: a two-stage classifier and a one-stage classifier. The two-stage method used the MediaPipe face mesh to estimate the coordinates of facial landmarks and generated facial geometric features from them. Ten machine learning classifiers were then trained on these features. The one-stage classifier was trained with one of the latest object detection algorithms, YOLO v5. The performance of the classifiers was evaluated by diagnostic accuracy, sensitivity, and specificity. For the two-stage model, MediaPipe successfully detected faces in 92.9% of the images in the testing group, and the decision tree classifier achieved the highest accuracy (0.790), with a sensitivity of 0.432 and a specificity of 0.970. For the one-stage classifier, the accuracy, sensitivity, and specificity were 0.65, 0.51, and 0.84, respectively. The results of this study demonstrate the feasibility of automatically detecting Horner syndrome from images. This tool could serve as a second advisor for neurologists, reducing subjectivity and increasing accuracy in the diagnosis of Horner syndrome.


Introduction
Horner syndrome is a clinical constellation of signs and symptoms, typically consisting of the triad of miosis, ptosis, and facial anhidrosis. This syndrome was first comprehensively described by the Swiss ophthalmologist Johann Friedrich Horner in 1869 [1]. Horner syndrome occurs when the sympathetic innervation of the eye is interrupted. Because the oculosympathetic efferent chain follows a long, circuitous anatomical pathway, Horner syndrome can have many different causes. The literature reports that Horner syndrome often has no identifiable cause, but 35%-60% of cases are associated with neoplasms [2]. Given these potentially life-threatening causes, researchers have recommended treating this syndrome as a "red flag" warning that deserves careful attention from all clinicians [3]. However, the diagnosis of Horner syndrome is often challenging because its signs are inconsistently present [3]. In addition, although clinical history, physical examination, and pharmacologic testing can improve the diagnosis, it remains operator-dependent because pupillometry is subjective. Therefore, an objective diagnostic tool might be beneficial to clinicians.
Computer vision methods have been widely applied to medical images in recent years to provide objective predictions. Recent studies have achieved outstanding accuracy in the classification of skin lesions from dermoscopic images [4], malignancy detection on mammography [5], the diagnosis of acute lymphoblastic leukemia [6], the detection of retinopathy in retinal fundus photographs [7], and the detection of COVID-19 from CT scans [8,9] and X-ray images [10]. However, these techniques have rarely been used to detect clinical signs such as the Horner sign.
This study presents an objective method for detecting Horner syndrome from facial images. We propose two methods for this task: a two-stage method and a one-stage method. The two-stage method consists of two steps: landmark extraction with the MediaPipe face mesh [11], followed by the construction of conventional machine learning classifiers. The one-stage method recasts Horner syndrome recognition as an object detection task, so that Horner syndrome can be recognized directly from facial photos. For this, we used one of the latest and most powerful algorithms in the field, YOLO v5 [12], which localizes the region of interest and classifies it simultaneously. Our method may provide a practical way to detect Horner syndrome, and these classifiers could act as reliable assistants for neurologists in the near future.

Data Sources.
Our dataset was drawn from the image dataset of patients with brachial plexus injury in our department. Image acquisition followed the relevant regulations and was approved by the ethics committee of our institution. The inclusion criteria were as follows: (1) the image contained both eyes and at least two-thirds of the subject's face; (2) the image was taken in sufficient light and had adequate resolution for pupil observation. The exclusion criteria were as follows: (1) the face in the image had apparent tilt or rotation; (2) the image contained more than one person. The whole process of this study is illustrated in a flow chart (Figure 1).

Annotation Procedure.
Image annotation greatly influences the quality of the dataset, which in turn affects the accuracy of the final model. Therefore, two experts with 14 and 26 years of experience, respectively, labeled the data together. To support a precise diagnosis, they were given the images and the necessary clinical information, including case history and EMG and MRI results. Labels were assigned by consensus of the two experts. In cases of conflict, they thoroughly re-evaluated and discussed the case to reach a final diagnosis. An image was excluded if agreement could not be reached after all available information had been reviewed.
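The consensus rule above can be written as a small decision function. This is a minimal sketch for illustration only: the label strings and the `re_evaluate` callback (standing in for the experts' joint re-review of history, EMG, and MRI) are hypothetical names, not part of the study's tooling.

```python
from typing import Callable, Optional, Tuple

# Hypothetical label values; None marks an image that is excluded.
POSITIVE, NEGATIVE = "horner+", "horner-"


def consensus_label(
    expert_a: str,
    expert_b: str,
    re_evaluate: Callable[[], Tuple[str, str]],
) -> Optional[str]:
    """Return the agreed label for one image, or None if it must be excluded."""
    if expert_a == expert_b:
        return expert_a  # initial agreement: accept the label
    # Conflict: both experts re-review all available information
    # (case history, EMG, MRI) and discuss; re_evaluate returns their revised labels.
    revised_a, revised_b = re_evaluate()
    if revised_a == revised_b:
        return revised_a
    return None  # still no agreement after full review: exclude the image
```

Returning `None` rather than guessing a label keeps disputed cases out of both the training and the testing sets.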

Dataset Splitting.
The images in the dataset were randomly split into a training set and a testing set at a ratio of 75/25. The training set was used to train and validate the model, while the testing set was used to evaluate the model's performance.
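A random 75/25 split can be sketched as follows. The sketch assumes the split is applied to the original images and that augmentation happens afterwards, as described under Clinical Characteristics, so augmented copies of one photo never appear in both sets. The function name and the fixed seed are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_75_25(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly split original (non-augmented) images into 75% train / 25% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(0.75 * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]
```

With 173 image identifiers this yields 130 training and 43 testing images; a stratified split (keeping the Horner (+) proportion equal in both sets) would be a reasonable variant, but the text only states that the split was random.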

Data Augmentation.
To overcome the shortage of training data and increase the robustness of the model, we applied data augmentation using the albumentations library [13]. The image set was expanded tenfold by adding Gaussian and multiplicative noise, RGB shifting, contrast/brightness/scale changes, flipping, and cropping.
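The tenfold expansion can be illustrated with a dependency-free sketch. It is not the albumentations pipeline used in the study (which offers transforms such as `HorizontalFlip`, `RandomBrightnessContrast`, `GaussNoise`, `MultiplicativeNoise`, and `RGBShift`); it operates on a hypothetical grayscale image stored as nested lists, omits RGB shifting, scaling, and cropping for brevity, and assumes that "ten times" means the original plus nine augmented copies. The transform ranges and probabilities are also assumptions.

```python
import random
from typing import Callable, List

# Stand-in image: a grayscale picture as rows of pixel values in [0, 255].
Image = List[List[float]]


def _clip(value: float) -> float:
    return min(255.0, max(0.0, value))


def hflip(img: Image, rng: random.Random) -> Image:
    return [row[::-1] for row in img]


def brightness_contrast(img: Image, rng: random.Random) -> Image:
    alpha = rng.uniform(0.8, 1.2)    # contrast factor (assumed range)
    beta = rng.uniform(-20.0, 20.0)  # brightness offset (assumed range)
    return [[_clip(alpha * p + beta) for p in row] for row in img]


def gaussian_noise(img: Image, rng: random.Random) -> Image:
    return [[_clip(p + rng.gauss(0.0, 5.0)) for p in row] for row in img]


def multiplicative_noise(img: Image, rng: random.Random) -> Image:
    return [[_clip(p * rng.uniform(0.9, 1.1)) for p in row] for row in img]


TRANSFORMS: List[Callable[[Image, random.Random], Image]] = [
    hflip, brightness_contrast, gaussian_noise, multiplicative_noise,
]


def expand_tenfold(img: Image, seed: int = 0) -> List[Image]:
    """Return the original image plus nine randomly augmented copies."""
    rng = random.Random(seed)
    copies = [img]
    for _ in range(9):
        augmented = img
        for transform in TRANSFORMS:
            if rng.random() < 0.5:  # each transform applied with probability 0.5
                augmented = transform(augmented, rng)
        copies.append(augmented)
    return copies
```

Horizontal flipping is label-preserving here because the label records whether Horner syndrome is present, not which side is affected.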

Two-Stage Detection Classifier.
The two-stage detection classifier involved two main steps: first, facial landmarks were extracted; second, machine learning classifiers were constructed using the extracted features. In this work, we used the facial landmark detector from the MediaPipe library [11] to generate landmarks on the face images. This model outputs the 3D positions of 468 face landmarks from an image, covering facial areas such as the cheeks, forehead, mouth, and eyes. Because Horner signs mainly influence the appearance of the eyes and the periocular regions, we selected 32 landmarks to generate geometric features for the classifier. To represent the geometric features efficiently, we converted the landmark coordinates into distances between points and angles between edges. A total of 22 parameters (11 for each side) were selected to characterize the geometric features of the area of interest (shown in Figure 2). Each parameter was estimated in two ways, because the MediaPipe face mesh can estimate landmark coordinates in both 2D and 3D. Then, to eliminate individual differences, we computed the ratio of each parameter between the left and right sides. All ratios were calculated by dividing the smaller value into the larger value, so that the result did not depend on which side was affected.
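The conversion from landmark coordinates to side-independent features can be sketched with three helpers: a distance, an angle at a vertex, and a left/right ratio. The specific landmark pairs and the 22 parameters are defined in Figure 2 and are not reproduced here. The sketch assumes the ratio is expressed as smaller over larger, so it lies in (0, 1]; the reverse convention would be equally side-independent.

```python
import math
from typing import Sequence

# A landmark as (x, y) or (x, y, z); the same helpers serve the 2D and 3D variants.
Point = Sequence[float]


def distance(p: Point, q: Point) -> float:
    """Euclidean distance between two landmarks of equal dimension."""
    return math.dist(p, q)


def angle_deg(a: Point, vertex: Point, b: Point) -> float:
    """Angle in degrees at `vertex` between edges vertex->a and vertex->b.

    Assumes the three points are distinct (non-zero edge lengths).
    """
    u = [ai - vi for ai, vi in zip(a, vertex)]
    v = [bi - vi for bi, vi in zip(b, vertex)]
    cos = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding error


def side_ratio(left: float, right: float) -> float:
    """Left/right ratio that does not depend on which side is affected."""
    return min(left, right) / max(left, right)
```

For example, if a hypothetical eyelid-opening distance measured 8.0 on one side and 10.0 on the other, `side_ratio` returns 0.8 whichever side is passed first; values well below 1 indicate asymmetry such as unilateral ptosis.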
After data standardization, we performed feature decomposition using principal component analysis (PCA), and the resulting features were fed to the classifiers. In this work, logistic regression, K-nearest neighbors, decision tree, support vector machine (SVM), Bernoulli naïve Bayes, random forest, gradient boosting, AdaBoost, LightGBM, and XGBoost were used for classification. The grid-search method [14] was used to identify the optimal hyperparameters and structure of each classifier. In addition, five-fold cross-validation was employed to assess each combination of hyperparameters and to avoid overfitting. In this procedure, the training set was further split into five subsets; four subsets were used to train the classifier, while the remaining one was used to validate its accuracy. The optimal configurations from this stage were then applied to the testing set.
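Grid search with five-fold cross-validation can be illustrated in plain Python. In scikit-learn, which the study used, this role is normally played by `GridSearchCV(estimator, param_grid, cv=5)`; the sketch below instead tunes a single hypothetical hyperparameter (the number of neighbors of a minimal k-nearest-neighbors classifier) so that every step is visible. Standardization and PCA are omitted; in a full pipeline they would be fitted inside each training fold to avoid leaking information from the validation fold.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def five_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle sample indices and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_x: List[Vector], train_y: List[int], x: Vector, n_neighbors: int) -> int:
    """Majority vote among the n_neighbors closest training points (labels 0/1)."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    votes = sum(train_y[i] for i in order[:n_neighbors])
    return 1 if 2 * votes > n_neighbors else 0


def cv_accuracy(x: List[Vector], y: List[int], n_neighbors: int, folds: List[List[int]]) -> float:
    """Pooled accuracy: each sample is predicted once by a model trained on the other four folds."""
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(x)) if i not in held_out]
        tx, ty = [x[i] for i in train], [y[i] for i in train]
        correct += sum(knn_predict(tx, ty, x[i], n_neighbors) == y[i] for i in fold)
    return correct / len(x)


def grid_search(x: List[Vector], y: List[int], grid: Sequence[int]) -> Tuple[int, Dict[int, float]]:
    """Score every candidate on the same five folds and keep the best one."""
    folds = five_fold_indices(len(x))
    scores = {k: cv_accuracy(x, y, k, folds) for k in grid}
    best = max(grid, key=lambda k: scores[k])  # ties resolve to the first candidate
    return best, scores
```

The winning configuration is then refitted on the whole training set and evaluated once on the untouched testing set.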

One-Stage Detection Classifier.
The YOLO (you only look once) family comprises some of the most powerful and fastest deep learning object detection algorithms. Unlike other object detection techniques that send multiple patches to the classifier, YOLO models send the whole image to a single convolutional neural network (CNN), which predicts the bounding boxes and the class probabilities at the same time. YOLO was first presented by Redmon et al. in 2016 [15]. In this study, we used the latest version, YOLO v5 [12], to detect Horner syndrome. The region of interest (ROI) was the portion of the image containing the eyes and periocular areas. Because the data in this work were insufficient, we chose to freeze the convolutional layers and retrain the fully connected layer rather than train from scratch. The pretrained weights (yolov5s) were used as the initial weights. The YOLO classifier was trained for 2000 epochs with a batch size of 32, a learning rate of 0.01, a weight decay of 0.005, and a momentum of 0.937.

Journal of Healthcare Engineering

Model Evaluation.
The performance of the classifiers was evaluated by diagnostic accuracy, sensitivity, and specificity. Receiver operating characteristic (ROC) curves were used to illustrate the performance of the two-stage classifiers. The precision-recall curve, with mean average precision at an IoU (intersection over union) threshold of 0.5 (mAP@0.5), was used to show the performance of the one-stage classifier. In addition, confusion matrices were used to show where the predictions of the classifiers agreed or disagreed with the gold standard.
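For the mAP@0.5 metric, a predicted box counts as a correct detection only if its class matches the ground truth and its IoU with the ground-truth box is at least 0.5. A minimal sketch, assuming axis-aligned boxes in corner format `(x_min, y_min, x_max, y_max)` for simplicity; YOLO label files themselves store normalized center-width-height values, so a conversion would precede this step.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, pred_cls: int, gt: Box, gt_cls: int, thr: float = 0.5) -> bool:
    """Detection criterion used by mAP@0.5: same class and IoU >= threshold."""
    return pred_cls == gt_cls and iou(pred, gt) >= thr
```

Two 2 x 2 boxes offset by one unit in each direction overlap in a 1 x 1 square, giving IoU = 1/7, well below the 0.5 threshold.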

Statistical Analysis.
Experiments were performed on three Intel(R) Xeon(R) CPUs with 8 GB of RAM and an NVIDIA RTX 3090 GPU with 24 GB of memory. Python 3.7 was used as the development environment. OpenCV was used for image preprocessing, and albumentations was used for data augmentation. The MediaPipe library was used for facial landmark extraction. Scikit-learn was used to construct the machine learning classifiers and compute the evaluation metrics. NumPy, Pandas, OS, and Matplotlib were also used.

Clinical Characteristics.
In total, 173 images of patients were collected. Sixty-nine images were from patients diagnosed as Horner (+), and the remaining images were Horner (−). The included images were split into training and testing sets, and data augmentation was then performed. The training and testing sets contained 1350 (510 positive and 840 negative) and 380 (140 positive and 240 negative) images, respectively. The resolution of the included images ranged from 455 × 837 to 2848 × 4288 pixels.

Descriptive Statistics of Extracted Features.
With a minimum detection confidence of 0.7, the MediaPipe face mesh failed to detect faces in 40 images of the training set and 27 images of the testing set. Therefore, only 1310 images (478 positive and 832 negative) were used for feature extraction, and 353 images (118 positive and 235 negative) were used for model evaluation, corresponding to detection rates of 97.03% and 92.9%, respectively. The ratios of the selected geometric features were then calculated (shown in Figure 2). The distribution of the data in the training and testing sets is summarized in Table 1. Principal component analysis identified 11 components that together explained 98.22% of the variance of the features (Figure 3(a)). The composition of each principal component is presented in Figure 3(b). The optimal hyperparameters of each classifier were identified by grid search and are shown in Table 2.
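The detection rates follow directly from the image counts above. The number of retained principal components can be expressed with the usual cumulative explained-variance criterion; this is a standard formulation assumed here for illustration, since the threshold $\tau$ used in the study is not stated. $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p$ denote the eigenvalues of the covariance matrix of the $p$ standardized features; in this study, $k^{*} = 11$ components reached 98.22%.

```latex
\text{detection rate}_{\text{train}} = \frac{1310}{1350} \approx 97.0\%,
\qquad
\text{detection rate}_{\text{test}} = \frac{353}{380} \approx 92.9\%

k^{*} = \min\left\{ k \;:\; \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{j=1}^{p} \lambda_j} \ge \tau \right\}
```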
All features are presented as ratios; their generation is described in the Methods. sd: standard deviation.

Two-Stage Classifier.
The performance of the machine learning classifiers is listed in Table 3, sorted from highest to lowest prediction accuracy. The decision tree classifier achieved the highest accuracy (0.790), followed by KNN, XGBoost, the gradient boosting classifier, logistic regression, the support vector classifier, LightGBM, the random forest classifier, the AdaBoost classifier, and Bernoulli NB. The sensitivity, specificity, positive predictive value, and negative predictive value are also presented in Table 3. The confusion matrices in Figure 3(c) show the numbers of true positives, false positives, true negatives, and false negatives. In addition, the classifiers were compared visually using receiver operating characteristic curves (Figure 3(d)). The gradient boosting classifier produced the highest AUC (0.830), while the decision tree classifier produced the lowest (0.628).

Figure 2: Examples of the facial landmarks generated by MediaPipe and the parameters used to characterize the face.
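All metrics in Table 3 derive from the four confusion-matrix counts. The sketch below uses counts back-calculated from the reported decision-tree sensitivity (0.432) and specificity (0.970) together with the 118 positive and 235 negative test images; they are a reconstruction for illustration, not values read from Figure 3(c), and they reproduce the reported accuracy of 0.790.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary diagnostic metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Back-calculated (assumed) counts for the decision tree on the 353 test images.
m = diagnostic_metrics(tp=51, fp=7, tn=228, fn=67)
```

Unlike sensitivity and specificity, the predictive values depend on the proportion of Horner (+) images in the test set, so they would shift in a population with a different prevalence.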

One-Stage Classifier.
The training of the one-stage classifier is summarized in Figure 4(a), which shows the change in accuracy and losses during training. The accuracy, sensitivity, and specificity of the classifier were 0.65, 0.51, and 0.84, respectively. Its classification performance is also presented by the confusion matrix (Figure 4(b)) and the precision-recall curve (Figure 4(c)). The average precision for the negative class, the positive class, and all classes was 0.702, 0.838, and 0.770, respectively.
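The value reported for all classes is consistent with mAP@0.5 being the unweighted mean of the per-class average precisions, where each AP is the area under that class's precision-recall curve with detections matched at IoU ≥ 0.5:

```latex
\text{mAP@0.5} = \frac{\text{AP}_{\text{neg}} + \text{AP}_{\text{pos}}}{2}
= \frac{0.702 + 0.838}{2} = 0.770
```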

Discussion
This work presented two approaches for automatic Horner syndrome detection from facial images. The two-stage method integrated the automatic face landmark generator MediaPipe with machine learning classifiers. The one-stage method used an object detection algorithm, YOLO v5. Both methods achieved adequate accuracy in this task. To the best of our knowledge, this is the first study to attempt automatic detection of Horner syndrome, and the results suggest that the computer output could act as a second adviser for neurologists and support doctors before they make final decisions.
Horner syndrome arises from a lesion or disruption along the oculosympathetic efferent chain, which can be caused by a variety of etiologies. The typical triad of Horner syndrome is ipsilateral ptosis, pupillary miosis, and facial anhidrosis [16]. However, not all three signs are consistently present, and they are often subtle. According to a report of 318 patients, fewer than 2% presented with anhidrosis, and ptosis was recorded in only 34% [17]. Although the occurrence rate of miosis was relatively high (91%) [17], its observation is significantly affected by light [18]. Miosis is usually more apparent in the dark, and the so-called "dilation lag" is apparent only within the first few seconds [18,19]. The actual degree of miosis can also be affected by several factors, including the resting pupil size, the alertness of the patient, and sympathetic drive [18]. Therefore, Horner syndrome can easily be overlooked in clinical practice. Although the ptosis and miosis of Horner syndrome are unlikely to cause any functional disturbance, detecting Horner syndrome is still critical, as its cause can be very threatening or even lethal. Previous studies have indicated that Horner syndrome should be considered a "red flag" warning, and thus its recognition and evaluation are important to all clinicians [3].
Typically, the diagnosis of Horner syndrome is confirmed with pharmacological agents such as cocaine, apraclonidine, and hydroxyamphetamine. However, the use of these agents has several drawbacks. First, some of the agents are controlled drugs and rarely available, which makes them impractical for general departments. Second, performing pharmacological tests and pupillometry after drug administration requires operator experience, so the procedure is hard to standardize and generalize. Third, the sensitivity of pharmacological tests needs further validation, as there are several reports of false-negative cases [20,21]. In addition, the results of these tests can be influenced by the time elapsed since the onset of damage [22,23], as well as by the use of other drugs [19]. Therefore, a diagnostic tool that does not rely on drugs may benefit clinical practice.
In this study, we present a method to detect Horner syndrome from digital images. Images have been the most commonly used means of recording Horner signs, so our method may provide a new way to diagnose the syndrome without additional burden. Both types of classifiers showed adequate accuracy. For the two-stage method, we used MediaPipe for face landmark extraction and trained machine learning models on features derived from the landmark coordinates. This combination has been used in several previous studies. Siam et al. [24] used facial landmarks to generate geometric features for human emotion classification and demonstrated superior performance. Gomez et al. [25] analyzed evoked facial gestures in videos of patients with Parkinson's disease and reported that the detection rate improved significantly (from 75.00% to 88.46%) when 17 facial features derived from a landmark detection algorithm were used. Similar approaches have also been applied to the assessment of cerebral palsy [26], pose evaluation in sports [27,28], and human activity recognition [29], where the combination of pose estimation methods and machine learning classifiers showed superior performance. In our study, we used landmarks around the eyes to generate parameters for this task because Horner syndrome mainly influences the geometric features around the eyes. However, as mentioned above, ptosis is not consistently present in Horner (+) patients [17], and some asymmetry between the eyes is common among healthy individuals. Because mild asymmetry occurs in both groups, only pronounced asymmetry separates them, which may explain the relatively high specificity and low sensitivity of these models. As for the one-stage classifier, YOLO v5 is the latest version of one of the most powerful and fastest object detection algorithms.
The YOLO family has been used in many medical tasks, including the detection of lung nodules [30], breast abnormalities [31], and lymphocytes [32], with high accuracy. However, in this study, its predictive accuracy was lower than that of the two-stage classifiers. We assume this was due to the insufficient data volume, even though we used data augmentation and transfer learning. Data deficiency is currently a common problem in medical imaging research. It is especially pronounced in our task because the incidence of Horner syndrome is relatively low and no database of this syndrome exists. Future studies with more images are needed to develop this model further.
The lack of powerful computers is an inevitable problem in most clinical settings [33]. Therefore, for medical use, the running speed of a model is just as important as its accuracy. In this study, both deep learning models are characterized by fast running speed and good performance [11,34]. The MediaPipe face mesh (BlazeFace) [11] showed super-real-time performance (200-1000+ frames per second) on mobile devices and achieved an average precision (AP) of 98.61% on its test dataset. Similarly, YOLO v5 has been presented as an efficient and powerful object detection model [34] that achieves state-of-the-art performance at 140 frames per second on a Tesla P100, twice as fast as the previous version [35]. Because they do not require excessive computational power, these methods are well suited to real clinical settings. As for model performance in this study, the specificities of the classifiers were much higher than their sensitivities. This means that a positive prediction is rarely a false alarm, so the proposed models can help single out patients with oculosympathetic pathway problems. These detectors could reduce the need to examine every patient in detail, thereby saving time and resources. In addition, automatic detection could benefit primary hospitals, where no experts are available to identify high-risk patients.
This study also has some limitations. First, because Horner (+) patients are rare, the study involved only a limited number of samples. Future studies with larger sample sizes may help to enhance the robustness and improve the accuracy of this method. Second, all diagnoses of Horner syndrome were derived from patients with brachial plexus injury. The prevalence of Horner syndrome differs across diseases, which might affect the sensitivity and specificity of the detectors and could limit the generalizability of the results. Third, although facial images are the most convenient and commonly used means of recording Horner syndrome, videos can provide dynamic information and have great potential to achieve better results. To address these problems, we will establish a Horner syndrome database covering patients with different primary diseases. We will also investigate the feasibility and accuracy of video-based detection.

Conclusions
In summary, we proposed two pipelines to detect Horner syndrome from facial images and evaluated their performance against the consensus diagnoses of human experts. Both methods showed adequate accuracy. Our results demonstrate the feasibility of automatic Horner syndrome detection, which could serve as a second advisor to single out high-risk patients.

Data Availability
The raw data required to reproduce these findings cannot be shared at this time, as the data also form part of an ongoing study. The processed data are available from the corresponding author upon request.