Gait Analysis Using Computer Vision Based on Cloud Platform and Mobile Device



Introduction
This work is part of a project called Gait-A whose main objective is the early detection of frailty and senility syndromes using gait analysis. Physical activity is one of the main components involved in frailty syndrome evaluation [1,2]. Gait is identified as a high cognitive task in which attention, planning, memory, and other cognitive processes are involved [3,4]. Through gait analysis, that is, the quantification of measurable information of gait and its interpretation [5], frailty and dementia syndromes can be diagnosed. This process is carried out by specialists and is based on estimations through visual inspection of gait.
In this work, we propose a computer vision approach that could aid specialists by providing them with objective measurements of gait, thus making the gait analyses performed more objective.
We propose the use of smartphone cameras to record the subject's gait and also provide computer vision algorithms able to analyse those sequences to extract spatiotemporal gait parameters. These parameters are then sent to the cloud to be analysed by a classifier for the purpose of determining whether abnormalities are present or not.
Many works dealing with gait analysis using computer vision can be found in the literature. However, most of them focus on gait biometrics for human identification, and few address gait analysis for the detection of abnormalities.
The main goal of this study is to provide an inexpensive and easy-to-deploy solution to obtain the spatiotemporal parameters of gait, which will be fed to classification algorithms that will discriminate between normal and abnormal gait. It should be noted that obtaining spatiotemporal parameters for abnormal gait complicates the task, as the number of assumptions that can be made about gait patterns is drastically reduced. In such cases, neither cyclic patterns nor the totality of the gait phases can be assumed to be present. In this work, for study purposes, Parkinsonian gait, knee pain, and foot dragging, among other patterns that deviate from what we consider normal gait, will be taken as abnormal gait.
A set of different gait features is analysed in [6] for person identification. The process starts by extracting the silhouette with a background subtraction technique and then obtaining the contour. After the contour is obtained, they extract four time-series features: width/height ratio, bounding box width, silhouette area, and center of gravity (COG). These four features follow a cyclic pattern that matches the gait cycle and are used to identify a person through deterministic learning.
Xu et al. examined the suitability of the Kinect sensor to measure gait parameters while walking on a treadmill in frontal view [7]. They compared the heel strike (HS) and toe off (TO) they obtained with those obtained using a motion tracking system. HS showed less error than TO because it happens closer to the sensor.
Choudhury and Tjahjadi [8] proposed a method composed of three modules: silhouette extraction, subject classification using Procrustes shape analysis (PSA) and elliptic Fourier descriptor (EFD), and combination of both results. For silhouette extraction, they use background subtraction and morphological operations to remove noise. The PSA module analyses a group of shapes by matching geometrical locations of a silhouette. The stride length is computed using the width of the bounding box. Finally, EFD characterizes the contour of the subject at key points of a gait phase.
Leu et al. proposed a method to extract skeleton joints from sagittal and frontal views [9]. The method uses the horizontal and vertical projection of the silhouette pixels to obtain the neck joint. Then they apply an anatomical model to obtain hip, knees, and ankles. Yoo and Nixon [10] also extract skeleton joints using an anatomical model to segment the silhouette, but they obtain the mean points of each segment and then apply linear regression to obtain a line that represents the bones. During the double support gait phase, they apply motion tracking to estimate the location of the occluded points. Khan et al. [11] similarly obtain the skeleton by computing the mean points of each body segment. They obtain leg movement and posture inclination and compare them with a normal gait model to recognise Parkinsonian gait.
In addition, we find the following proposals for classifying gait patterns. In Wang [12], the method is based on optical flow: a histogram of silhouette flows is calculated, and an eigenspace transformation is applied to it. The data obtained are compared with a normal gait template to calculate the deviation. In Bauckhage et al. [13], homeomorphisms are applied between 2D lattices and binary shapes to obtain a vector space in which the silhouette is encoded. They performed several silhouette bounding box splittings to obtain different lattices that are then classified using a support vector machine (SVM).
Most vision-based gait analysis proposals use the sagittal view because it provides more information to work with. According to Whittle [14], more gait abnormalities can be observed from a sagittal view than from a frontal view. However, frontal gait analysis also offers benefits, and we undertake it for the following reasons: (i) Some abnormalities can only be observed from a frontal point of view. Whittle [14] mentions that circumduction gait, hip hiking, abnormal foot contact, and rotation, among others, are better observed from a frontal view. (ii) In terms of the physical space necessary for recording, sagittal gait sequences require much more than frontal ones, for which a small hall or corridor suffices.
A way to reduce the space needed for sagittal view recording is to use a treadmill, but it could alter gait patterns, especially with frail people. Another workaround is to use a motorised camera that follows the subject, but it is expensive and could complicate the background subtraction because the camera itself is moving. Both workarounds complicate the acquisition of gait sequences, making them difficult to process on a smartphone.
Sagittal images show a clear view of feet displacement and enough information to locate the heel and toe of each foot. In frontal view, on the other hand, it is not easy to determine where the heel and toe are located in each foot. Therefore, a different approach is required for frontal sequences.
In sagittal view, the size of the subject's silhouette is maintained along the whole of its trajectory. However, in frontal view, the size of the silhouette increases along its trajectory, so a normalization might be required.
The paper is organized as follows. Section 2 describes the sagittal and frontal methods to obtain spatiotemporal parameters of gait, their implementation in a smartphone, and the classification of normal and abnormal gait in a cloud platform. Section 3 shows the results in which the spatiotemporal gait parameters are subjected to normal and abnormal gait classification. Finally, Section 4 provides the conclusion of this work.

Methods
In this paper, we present a platform for gait analysis using computer vision in which a smartphone records and processes a gait sequence to obtain spatiotemporal parameters, which are then sent to the cloud to be classified as normal or abnormal gait. The layout of the platform is shown in Figure 1. In the following subsections, each module of the platform will be described.

Sagittal Approach.
The sagittal approach takes gait sequences recorded from the side as input. The method presents four phases: preprocessing, feet location, feature extraction, and skeleton extraction. Figure 2 shows the diagram of the sagittal approach. The classification phase is performed in the cloud.

Preprocessing.
In this phase, background subtraction based on a mixture of Gaussians [15] is performed to obtain the silhouette of the subject. After that, a morphological operator is applied to remove noise. Finally, the bounding box of the remaining silhouette is extracted by computing the x, y positions using (1), and those points are then mapped to a rectangle (x, y, width, height) using (2).
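The bounding-box step can be illustrated with a minimal sketch. It assumes the silhouette is already available as a binary mask (a list of rows, truthy values marking foreground); the function name `bounding_box` and the inclusive width/height convention are assumptions made for illustration, not the authors' implementation.

```python
def bounding_box(mask):
    """Return (x, y, width, height) of the foreground pixels in a binary mask.

    `mask` is a list of rows; a truthy value marks a silhouette pixel.
    Returns None when the mask contains no foreground pixel.
    """
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Width and height are inclusive of both extreme pixels (assumption).
    return (x_min, y_min, x_max - x_min + 1, y_max - y_min + 1)


mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]
print(bounding_box(mask))  # (1, 1, 3, 3)
```

In practice the mask would come from the background subtraction and morphological filtering described above; this sketch covers only the final rectangle computation.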

Feet Location.
The silhouette obtained by background subtraction is then enclosed in its bounding box and split into four regions, namely, head (13% of bounding box height), torso (34%), upper legs (24%), and lower legs (29%), according to an anthropometric model [16] as shown in Figure 3. We then focus on the lower leg region. We search for the silhouette pixel with maximum X component to obtain the toe of the front foot (FF) using (3) and the pixel with minimum X to obtain the heel of the back foot (BF) using (4). Then, the lower leg region is split vertically into halves to separate each foot. In the BF half, we search for the lower right pixel (assuming displacement from left to right) to obtain the BF toe. In the FF half, we search for the lower left pixel to obtain the FF heel. The final result is shown in Figure 3.

$$\text{FF}_{\text{toe}} = \underset{(x,y)}{\arg\max}\ \{\, x : (x,y) \in \text{silhouette} \,\} \qquad (3)$$

$$\text{BF}_{\text{heel}} = \underset{(x,y)}{\arg\min}\ \{\, x : (x,y) \in \text{silhouette} \,\} \qquad (4)$$

Feature Extraction.
This phase takes as input the time series of heel and toe positions obtained in the previous phase. To these time series, we apply gradient analysis of the X component. A heel strike (HS) is detected when the gradient of the mean point between FF heel and FF toe goes from greater than zero to zero (the foot stops moving, as shown in (5)), and a toe off (TO) is detected when the gradient of the mean point between BF heel and BF toe goes from zero to greater than zero (the foot starts moving, as shown in (6)). Applying the gradient directly over the position time series produces many false positives due to noise. To filter the noise, we apply a threshold below which any gradient value is set to zero. This threshold removes small oscillations caused by errors in extracting the silhouette and locating toes and heels. Then, Gaussian smoothing is applied, and isolated values greater than zero or equal to zero are removed using (7).
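The gradient-based detection can be sketched as follows. This is a simplified illustration, not the authors' code: it tracks a single foot, uses a forward difference as the gradient, and omits the Gaussian smoothing and the removal of isolated values. The function name `detect_events` and the default threshold are assumptions.

```python
def detect_events(xs, threshold=1.0):
    """Detect heel strike (HS) and toe off (TO) frames for one foot.

    xs: per-frame X position of the foot (mean point of heel and toe).
    threshold: gradients below this value are treated as noise and zeroed
               (hypothetical default; it has to be tuned per setup).
    """
    # Forward-difference gradient of the X position.
    grad = [b - a for a, b in zip(xs, xs[1:])]
    # Zero out small oscillations caused by silhouette noise.
    grad = [g if g >= threshold else 0.0 for g in grad]

    heel_strikes, toe_offs = [], []
    for i in range(1, len(grad)):
        if grad[i - 1] > 0 and grad[i] == 0:
            heel_strikes.append(i)   # foot stops moving
        elif grad[i - 1] == 0 and grad[i] > 0:
            toe_offs.append(i)       # foot starts moving
    return heel_strikes, toe_offs


xs = [0, 0, 0, 5, 10, 15, 15.2, 15, 15, 20, 25]
print(detect_events(xs))  # ([5], [2, 8])
```

In the example, the small oscillation around frame 6 falls below the threshold, so it does not generate spurious events.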

Skeleton Extraction.
The skeleton extraction phase provides a fast way of obtaining an approximation of the locations of the head, neck, hip, knees, and feet. It uses the four regions of the silhouette described in the feet location phase. The head and torso regions are divided in half horizontally, and the COG of each half is computed. The COG of the upper region is moved to the top, and the COG of the lower region is moved to the bottom. Then, the head lower COG and the torso upper COG are averaged to obtain a common point, which is the neck. The head location corresponds to the upper COG of the head region.
The upper leg region is also split horizontally in half, and both COGs are obtained. In addition, a vertical split is performed, and another two COGs are obtained. The upper COG is moved to the top and averaged with the lower torso COG to obtain the hip location. The lower COG is discarded. Then the right and left COGs are moved to the bottom, those two points being the location of the knees. The knees are adjusted to simulate bending. The adjustment consists in tracing three circles: one with its center at the hip and a radius equal to the thigh length (which is the height of the upper leg segment) and two others with their centers at each foot and a radius equal to the tibia length (which is the height of the lower leg segment). Then, the intersection between the hip circle and each of the foot circles is computed. There are three possibilities: (i) No intersection. In this case, the knee point is the one given by the COG. (ii) One intersection. In this case, the knee point is the intersection point. (iii) Two intersections. In this case, the knee point is the intersection point further to the right (assuming gait direction from left to right).
Finally, the location of each foot is the mean point of the heel and toe obtained in the feet location phase. Figure 4 shows the final result.
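The knee adjustment by circle intersection can be sketched as below. The helper names `circle_intersections` and `adjust_knee` are hypothetical, points are plain `(x, y)` tuples, and the three cases follow the description above (no intersection, one intersection, two intersections with the rightmost point chosen).

```python
import math


def circle_intersections(c0, r0, c1, r1):
    """Return the 0, 1 or 2 intersection points of two circles."""
    (x0, y0), (x1, y1) = c0, c1
    d = math.hypot(x1 - x0, y1 - y0)
    if d == 0 or d > r0 + r1 or d < abs(r0 - r1):
        return []  # same centre, too far apart, or one circle inside the other
    a = (r0**2 - r1**2 + d**2) / (2 * d)    # distance from c0 to the chord
    h = math.sqrt(max(r0**2 - a**2, 0.0))   # half of the chord length
    mx, my = x0 + a * (x1 - x0) / d, y0 + a * (y1 - y0) / d
    if h == 0:
        return [(mx, my)]                   # tangent circles
    ox, oy = h * (y1 - y0) / d, h * (x1 - x0) / d
    return [(mx + ox, my - oy), (mx - ox, my + oy)]


def adjust_knee(hip, foot, thigh_len, tibia_len, cog_knee):
    """Pick the knee location from the hip/foot circle intersection."""
    points = circle_intersections(hip, thigh_len, foot, tibia_len)
    if not points:
        return cog_knee                       # (i) keep the COG estimate
    return max(points, key=lambda p: p[0])    # (ii)/(iii) rightmost point


print(adjust_knee((0, 0), (0, 8), 5, 5, cog_knee=(1, 4)))   # (3.0, 4.0)
print(adjust_knee((0, 0), (0, 20), 5, 5, cog_knee=(1, 4)))  # (1, 4)
```

Choosing the point with the largest x matches the left-to-right walking assumption; for the opposite direction the selection would have to be mirrored.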

Frontal Approach.
The frontal approach is very similar to the sagittal one proposed in the previous subsection. It has the same phases: preprocessing, feet location, feature extraction, and skeleton detection. The diagram of the frontal gait approach is shown in Figure 5.

Preprocessing.
This phase is exactly the same as for the sagittal approach. The silhouette is obtained using mixture of Gaussians background subtraction, and then morphological operators are applied to remove noise.

Feet Location.
In frontal view, both toes are always visible but heels are constantly occluded, so heels cannot be properly located. Therefore, we can only rely on toe information.
To obtain the toes, we proceed by dividing the silhouette into four regions according to the anthropometric model established in [16]. We focus only on the lower leg segment. Then, we calculate its bounding box and split it vertically in half to separate both feet. It is important to recalculate the bounding box of this part so that the vertical split separates both feet accurately; otherwise, any misalignment can cause problems. Note that splitting the bounding box to separate both feet will never be accurate with gait patterns that place one foot in front of the other. We assume that this particular gait pattern is not present in our dataset. We obtain the left and right foot toe by locating the pixel with minimum y component in the left and right half, respectively, using (8) (Figure 6).

$$\text{toe} = \underset{(x,y)}{\arg\min}\ \{\, y : (x,y) \in \text{silhouette} \,\} \qquad (8)$$
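A minimal sketch of the frontal toe location follows. It assumes the silhouette is given as a list of `(x, y)` pixel coordinates with y measured upward from the ground, so that the toe is the minimum-y pixel as in (8); with the usual top-left image origin the comparison would be reversed. The function names and the 29% lower-leg fraction applied directly to the bounding box height are illustrative assumptions.

```python
def lower_leg_pixels(pixels):
    """Keep the pixels in the lowest 29% of the silhouette bounding box."""
    ys = [y for _, y in pixels]
    y_min = min(ys)
    height = max(ys) - y_min + 1
    limit = y_min + 0.29 * height
    return [(x, y) for x, y in pixels if y < limit]


def locate_toes(pixels):
    """Return (left_toe, right_toe) as the minimum-y pixel of each half."""
    legs = lower_leg_pixels(pixels)
    xs = [x for x, _ in legs]
    # Recompute the bounding box of the lower-leg region before splitting,
    # so the vertical split falls between the two feet.
    mid = (min(xs) + max(xs)) / 2
    left = [p for p in legs if p[0] <= mid]
    right = [p for p in legs if p[0] > mid]
    by_y = lambda p: p[1]
    return min(left, key=by_y, default=None), min(right, key=by_y, default=None)


# Toy silhouette: a trunk plus two one-pixel-wide legs of different length.
pixels = [(x, y) for x in (4, 5, 6) for y in range(30, 100)]
pixels += [(3, y) for y in range(2, 30)]   # left leg, toe at y = 2
pixels += [(7, y) for y in range(0, 30)]   # right leg, toe at y = 0
print(locate_toes(pixels))  # ((3, 2), (7, 0))
```

As noted above, this split fails when one foot is placed directly in front of the other, because both feet then fall into the same half.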

Feature Extraction.
The previous phase provides the position of each toe for each frame, which is precisely the information we need to derive HS and TO. We propose an approach to obtain HS and TO with frontal gait based on the time series derived by subtracting the vertical component of both feet.
We use the subtraction of the y component of the toes to obtain a curve in which zero crossings indicate the feet adjacent gait phase. The HS and TO of each foot are located between consecutive zero crossings. We can estimate HS and TO by assuming that HS is produced before TO; HS is produced in the first half of each region and TO in the second half. Therefore, we can estimate HS and TO following (9) and (10), respectively, where $zc_i$ is the frame in which a zero crossing occurs and $zc_{i-1}$ is the frame of the previous zero crossing.
This approach poses problems with some abnormal gait patterns, as shown in [17], in which some events could not be detected, for example, when a foot is always behind the other or is dragged due to some injury or pain. Figure 7 shows foot dragging where, in some cases, the curve does not cross zero during the swing phase. To solve the problem, we devise another method. Using the same curve from the previous approach (the difference of the y component of each foot), we first apply Gauss filters to remove noise (Figure 8 shows the curve of Figure 7 after applying Gauss filters). Then we obtain the local maxima and minima, which are located roughly at the center of each pair of zero crossings. In this case, however, the curve does not have to cross zero to produce a maximum or minimum, and the problem is solved.
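The zero-crossing step can be sketched as follows. The code only finds the crossings of the toe-difference curve and the two half-intervals in which HS and TO are expected; it does not reproduce the exact estimates of (9) and (10). The function names and the toy data are assumptions.

```python
def zero_crossings(diff):
    """Return the frames at which the toe-difference curve changes sign."""
    crossings = []
    prev_sign = 0
    for i, v in enumerate(diff):
        sign = (v > 0) - (v < 0)
        if sign != 0:
            if prev_sign != 0 and sign != prev_sign:
                crossings.append(i)
            prev_sign = sign
    return crossings


def event_windows(crossings):
    """For each pair of consecutive zero crossings, return the frame windows
    in which HS (first half) and TO (second half) are expected."""
    windows = []
    for prev, cur in zip(crossings, crossings[1:]):
        mid = (prev + cur) / 2
        windows.append({"hs": (prev, mid), "to": (mid, cur)})
    return windows


left_y = [5, 4, 2, 1, 2, 4, 6]
right_y = [2, 3, 4, 5, 3, 2, 1]
diff = [l - r for l, r in zip(left_y, right_y)]
zc = zero_crossings(diff)
print(zc)                 # [2, 5]
print(event_windows(zc))  # [{'hs': (2, 3.5), 'to': (3.5, 5)}]
```

Exact zeros in the curve are skipped and the crossing is reported at the first frame with the opposite sign, which avoids counting a touch of the zero line as a crossing.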
HS are located before a maximum or minimum, and TO after. We know that both events are located in that region. Empirically adjusting them, we derived that the HS is located at 1/4 the distance between one maximum (or minimum) and the previous one (12), and TO is located at 1/8 the distance between one maximum (or minimum) and the next one (13).
Let M be the set of maxima and minima in ascending chronological order. The HS associated with $m_i \in M$ is obtained using (12), and the TO using (13).
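One possible reading of this empirical rule is sketched below. It assumes that both fractions are measured from the extremum $m_i$ itself, that is, HS at $m_i - (m_i - m_{i-1})/4$ and TO at $m_i + (m_{i+1} - m_i)/8$; this interpretation, the function names, and the toy values are assumptions rather than the authors' exact formulas.

```python
def local_extrema(curve):
    """Indices of strict local maxima and minima of a smoothed curve."""
    return [
        i for i in range(1, len(curve) - 1)
        if (curve[i] - curve[i - 1]) * (curve[i + 1] - curve[i]) < 0
    ]


def estimate_events(extrema):
    """Estimate (HS, TO) frames around every interior extremum.

    Assumed reading: HS lies 1/4 of the way back towards the previous
    extremum, TO lies 1/8 of the way forward towards the next one.
    """
    events = []
    for i in range(1, len(extrema) - 1):
        m_prev, m, m_next = extrema[i - 1], extrema[i], extrema[i + 1]
        hs = m - (m - m_prev) / 4
        to = m + (m_next - m) / 8
        events.append((hs, to))
    return events


print(local_extrema([0, 2, 4, 3, 1, 2, 5]))  # [2, 4]
print(estimate_events([10, 30, 46]))         # [(25.0, 32.0)]
```

The first and last extrema are skipped because they lack a previous or next neighbour; how the boundaries are handled in the original method is not stated.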

Skeleton Detection.
The process is the same as the one described for the sagittal approach, but for the frontal approach, the adjustment of the knees is not necessary.

Smartphone Implementation.
Sagittal and frontal approaches were implemented on Android using OpenCV native functions. We allowed two ways of processing a dataset: (i) On a real-time video: the smartphone camera records the subject walking and processes it at the same time. (ii) On a previously recorded video: the smartphone records the subject walking and stores it in memory, and then the stored video is processed.

Mobile Information Systems
To achieve real-time processing, we use the pyramidal multiresolution approach described in [18]. We achieve 10 fps using a smartphone with a quad-core 1.4 GHz processor and 1 GB of memory, and 25 fps using a tablet with a quad-core Tegra K1 processor at 2.2 GHz and 2 GB of memory. The size of the input image was reduced to 480 × 270 pixels. However, the results shown in Section 3 were obtained by processing the dataset at full resolution.

Cloud Platform.
To develop the cloud platform, we used the Microsoft Azure Machine Learning platform. This is a cloud platform for designing and developing predictive models. Azure provides a REST Web Service to access the Machine Learning tools.
For our purposes, we develop a K-nearest neighbour (KNN) algorithm with Dynamic Time Warping (DTW) as a distance function accessed through the REST Web Service provided by Azure. To perform a classification between normal and abnormal gait, we use the stride (bounding box width for sagittal approach, and subtraction between y component of each foot for frontal approach) and leg-angle time series (provided by the skeleton extraction algorithm computed as the angle formed by the hip and each foot).
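A KNN classifier with DTW as the distance function can be sketched in a few lines. This is a local, pure-Python illustration, not the Azure service itself: the function names, the absolute-difference local cost, `k=3`, and the toy series are all assumptions.

```python
from collections import Counter


def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost (assumed metric)
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]


def knn_predict(train, query, k=3):
    """train: list of (series, label) pairs.
    Returns the majority label among the k nearest series under DTW."""
    nearest = sorted(train, key=lambda item: dtw_distance(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy time series: label 0 = normal, label 1 = abnormal.
train = [
    ([0, 1, 2, 1, 0], 0), ([0, 1, 2, 2, 1, 0], 0), ([0, 0, 1, 2, 1, 0], 0),
    ([0, 3, 6, 3, 0], 1), ([0, 3, 6, 6, 3, 0], 1), ([0, 2, 6, 3, 0], 1),
]
print(knn_predict(train, [0, 1, 1, 2, 1, 0]))  # 0
print(knn_predict(train, [0, 3, 5, 3, 0]))     # 1
```

DTW lets gait cycles of different duration be compared directly, which is why the first query matches the normal templates with zero distance even though its length differs from some of them.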

Results and Discussion
We now describe the experiments performed and the results obtained. The dataset recorded for the experiments is also described in this section.

Dataset.
To test the proposed approaches, we recorded two datasets of subjects walking: one using sagittal view and the other using frontal view. Both datasets were recorded in a room with a nonhomogeneous background including windows where the light made it difficult to extract the silhouette. This was intentional because we wanted to test our approaches in real conditions, and so the silhouette is often incomplete. Figure 9 shows the room in which the recordings were performed.
To record the frontal dataset, we placed a camera at one end of an 8 m corridor and asked the subject to walk towards it. We captured a total of 23 samples of normal gait and 20 samples of abnormal gait. To record the sagittal dataset, we used the same environment, but we placed a camera at a distance of 4 m, perpendicular to the gait direction, to obtain a side view. In this case, a total of 15 samples of normal gait and 15 of abnormal gait were recorded. Although the number of recorded samples is low (43 for frontal gait and 30 for sagittal gait), there are a total of 320 HS events and 319 TO events for frontal gait and 233 HS events and 223 TO events for sagittal gait.
We asked the subjects to walk normally along the corridor and then to walk feigning some of the following abnormalities: (i) Knee pain: the subject simulated pain in one of his knees. (ii) Foot dragging: the subject dragged one foot. (iii) Parkinsonian gait: the subject made some small steps with variable speed. (iv) Other: the subject depicted random patterns.
To guarantee the privacy of the subjects, we published only the silhouettes extracted during the silhouette extraction phase. These silhouettes are stored as an ordered set of images, and a file with the elapsed milliseconds for each image is also included. For each recorded sample, we manually marked the frames in which an HS or TO event occurs to use them as ground truth. We also include information related to pixel width, to be able to calculate distances, and the sample class (normal = 0 or abnormal = 1). In addition, a file with the output of the feet location and feature extraction phases is included, which contains the positions of the heel and toe of each foot, their gradients, and the HS and TO events detected. These results are the output of the HS and TO detection algorithm using full resolution (1920 × 1080) and do not correspond to those provided by the smartphone, which works at a quarter of that resolution in each dimension (480 × 270).
Both datasets are accessible through the URL provided by [19].

Experiments.
We performed experiments using our own datasets for sagittal and frontal gait. We used the manual marking of the HS and TO events of each gait sequence of the dataset as ground truth. The error margin of this manual marking was set to ±1 frame because that is the minimum value. We also assumed an error of ±1 frame in the algorithm output. So, the global error margin was set to ±2 frames. Then, the difference in frames between the ground truth and the proposed algorithm was analysed. Any difference less than or equal to the global error margin was considered acceptable. Then, the root mean square error (RMSE) of the differences was computed using

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(m_i - a_i\right)^2}$$

where $n$ corresponds to the number of events (HS or TO in this case), $m_i$ is the frame of event $i$ in the manual marking, and $a_i$ is the frame of event $i$ in the algorithm output.
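The evaluation can be illustrated with a short sketch. It assumes that manual and detected events have already been paired one to one (undetected events are counted separately and are not part of the lists); the function names and the sample frame numbers are made up for illustration.

```python
import math


def rmse(manual, detected):
    """Root mean square error between paired event frames."""
    n = len(manual)
    return math.sqrt(sum((m - a) ** 2 for m, a in zip(manual, detected)) / n)


def count_acceptable(manual, detected, margin=2):
    """Number of events whose difference is within the global error margin."""
    return sum(abs(m - a) <= margin for m, a in zip(manual, detected))


manual = [10, 40, 70, 100]     # frames marked by hand (toy values)
detected = [11, 40, 68, 103]   # frames returned by the algorithm (toy values)
print(round(rmse(manual, detected), 3))    # 1.871
print(count_acceptable(manual, detected))  # 3
```

Here three of the four events fall within the ±2 frame margin, and the last one (3 frames off) would count as a wrong detection.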

Sagittal Approach.
In Table 1, we show the results after applying the HS and TO detection algorithm with the filtering method described in the previous section for sagittal view. The table shows the number of correct detections (at most 2 frames of difference between algorithm and manual marking), undetected cases, wrong detections (more than 2 frames of difference), and the root mean square error of both correct and wrong cases. As observed, the RMSE of both HS and TO events is lower than the error margin of 2 frames. TO events are more accurately delimited than HS events, but HS events show fewer undetected cases. Therefore, HS is the event we use to obtain the spatiotemporal parameters to perform classification. Figure 10 shows graphically the correct, wrong, and undetected cases.

Frontal Approach.
Table 1 also shows the results after applying the frontal approach. As shown there, the RMSE of both HS and TO in normal gait is smaller than the error margin of 2 frames, but it is slightly bigger for abnormal gait. We therefore consider the results acceptable for both normal and abnormal gait. Error is mainly produced in the first steps, when the silhouette is smaller (the subject is farthest from the camera). Figure 11 shows graphically the results of Table 1. The results obtained with our sagittal view approach are similar for normal gait. We obtained 1.44 frames for HS and 1.08 for TO, which were slightly more precise than the ones we extracted from the frontal approach (1.88 and 1.63) but close to each other. However, in the case of abnormal gait, we obtained 1.79 frames for HS and 1.59 for TO, which were more precise than those obtained with the frontal approach (

Classification.
To perform a classification between normal and abnormal gait, we use KNN to compare the stride length and leg-angle time series of the different gait cycles. To calculate the distance between two time series, we apply DTW. We perform the classification test with two different methods: (i) Testing each gait cycle separately. The time series corresponding to each gait cycle is treated separately as if it belonged to a different subject. (ii) Testing each gait cycle of each recording sample and outputting the mode class for each subject. In this case, a prediction is made for each gait cycle, and then another prediction is computed by outputting the mode class for the same recording sample.
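The second method, which aggregates per-cycle predictions into one label per recording, can be sketched as follows. The function name and the sample predictions are hypothetical, and the tie-breaking rule (the class seen first wins) is an assumption, since the text does not specify one.

```python
from collections import Counter


def recording_label(cycle_predictions):
    """Return the mode class of the per-cycle predictions of one recording."""
    return Counter(cycle_predictions).most_common(1)[0][0]


# Per-cycle predictions for two recordings (0 = normal, 1 = abnormal).
print(recording_label([0, 1, 0, 0]))  # 0
print(recording_label([1, 1, 0]))     # 1
```

A single misclassified gait cycle, as in the first recording, is outvoted by the remaining cycles, which is how this method suppresses outliers.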
To validate the proposed classification, we use 10-fold and leave-one-out cross-validation to measure the accuracy of each classifier. Table 2 shows the results of the stride and leg-angle time series for the sagittal approach. We obtained an accuracy rate of 100% using leg-angle time series when outputting the mode class for each recording sample. The least accurate results, however, are those obtained with the stride width. The results of the classification experiments for the frontal approach are shown in Table 3. As shown there, testing each recording sample produces better results, as it tends to eliminate outliers.
We have focussed on obtaining a classification between normal and abnormal gait to assess the suitability of the proposed algorithm to differentiate between the two. For this test, we considered knee pain and foot dragging as abnormal gait. The results obtained suggest that the classifier can differentiate between normal and abnormal gait. Therefore, future work will focus on classifying different abnormal gaits.

Conclusion
The main contribution of this paper is an inexpensive and easy-to-deploy approach to obtain HS and TO and some skeleton joints using both sagittal and frontal gait sequences. Frontal view poses some problems when obtaining the position of the heels, so we focus on the toes instead. Results show acceptable precision in providing HS and TO in both the sagittal and the frontal methods. Comparing both approaches, results were similar, but the sagittal approach proved to be more accurate. The dataset recorded to test the proposed approaches is publicly available [19]. To maintain the privacy of the subjects, we published only the silhouette.
We also provide a cloud platform-based web service to perform a classification between normal and abnormal gait for both sagittal and frontal views. Results show a classification rate greater than 80% in frontal view and more than 90% in sagittal view. The ability to perform gait analysis using frontal view reduces the physical space required for the tests. In addition, this method does not rely on silhouette displacement (the sagittal approach does), so it is also suitable for treadmill gait sequences. Therefore, the space could be reduced even more in cases where the alteration of gait patterns that the treadmill could cause does not significantly matter.
Future work will focus on improving the accuracy of HS and TO for abnormal gait and classifying different abnormal gait types.

Conflicts of Interest
The authors declare no conflicts of interest.