Quantification of the Upper Extremity Motor Functions of Stroke Patients Using a Smart Nine-Hole Peg Tester

This paper introduces a smart nine-hole peg tester (s-9HPT), which comprises a standard nine-hole peg test pegboard, but with light-emitting diodes (LEDs) next to each hole. The s-9HPT still supports the traditional nine-hole peg test operating mode, in which the order of the peg placement and removal can be freely chosen. Considering this, the s-9HPT was used in lab research to analyze the traditional procedure and possible new procedures. As this analysis required subjects with similar levels of dexterity, measurement data from 16 healthy subjects (seven females, nine males, 25–80 years old) were used. We consequently found that illuminating the LEDs in various patterns facilitated guided tests of diverse complexity levels. Next, to demonstrate the clinical application of the s-9HPT, the improvement in the hand dexterity of 12 hospitalized stroke patients (45–80 years old, six females and six males) was monitored during their rehabilitation. Here, we used traditional and guided tests validated by healthy subjects. Consequently, improvements were found to be patient specific. At the beginning of rehabilitation, traditional tests suitably indicate improvements, while guided tests are beneficial following improvements in motor functions. Further, the guided tests motivated certain patients, meaning the rehabilitation was more effective for these individuals.


Introduction
The nine-hole peg test was introduced in 1971 [1], together with a mechanical tool for measuring finger dexterity [2,3]; further, upon its introduction, specific dimensions were provided for both the board and the pegs used. The traditional test (TRT) is quite simple, and healthy persons can complete it in 15-20 seconds. To perform it, the subject must, using only one hand, pick nine pegs up one at a time from the holder and place them in the holes in the board in arbitrary order. When all nine pegs are in the holes, they must then be removed, also one at a time, and with the same hand. The result of the test is the time that has elapsed from the moment the subject picked up the first peg to the moment they returned the last peg to the holder or placed it on the table. The execution time must be measured by a person supervising the test.
Assessments of hand movement do not rate fine motor skills; instead, they are functional tests. By analyzing the results of such tests, medical doctors or physiotherapists can diagnose the severity of the disorder in question, judge the level of self-management possible, and estimate the prospective improvement. In short, the objective assessment of fine motor function greatly assists effective therapy.
The nine-hole peg test has been extensively studied in the literature. For instance, several prior studies have tested its effectiveness with patients with various illnesses: Heller et al. [4] found the nine-hole peg test to be suitable for rating the dexterity of patients recovering from acute strokes; Earhart et al. [5] tested 262 patients with Parkinson's disease and found the result of the nine-hole peg test to be a clinically useful measure for assessing their upper extremity function; and Feys et al. [6] found the nine-hole peg test to be reliable and applicable for assessing multiple sclerosis patients with different levels of upper limb impairment. Furthermore, the nine-hole peg test has also been applied in other related studies: Brunner et al. [7] and Paquin et al. [8] used the nine-hole peg test to assess the functional recovery of stroke patients; Mathiowetz et al. [9] and Wang et al. [10] published adult norms for the nine-hole peg test; Wade [11] found that dexterity disability is best assessed using the nine-hole peg test or the ten-hole peg test; Wang et al. [12] recommended the nine-hole peg test for inclusion in the motor battery of the NIH Toolbox; and Koyama et al. [13] measured thumb and index finger distance during the nine-hole peg test and found the placement phase to be more informative than the removal phase.
We evaluated medical doctors' experiences concerning the traditional nine-hole peg test and, as a result, the limitations of the low-tech pegboard became evident. Consequently, a smart nine-hole peg tester (s-9HPT) was developed at the Department of Measurement and Information Systems, Budapest University of Technology and Economics. In this paper, we begin by analyzing both the traditional and guided tests based on recordings of healthy subjects using the smart device. Then, we demonstrate the clinical applicability of the s-9HPT by having stroke survivors perform traditional and guided tests that have been validated by the healthy subjects.

Materials and Methods
As mentioned in the previous section, the main aim of this research was to analyze the nine-hole peg test procedure using the s-9HPT; additionally, however, a further aim was to assess the clinical applicability of the new tests facilitated by s-9HPT.
2.1. Tested Persons. Twelve hospitalized stroke patients (Table 1) and 16 healthy control subjects (seven females and nine males, aged 25-80 years, all but one being right-handed) were tested. All subjects provided written informed consent. The research was performed in accordance with the Declaration of Helsinki, and the study protocol was approved by the Scientific and Research Ethics Committee of Szent János Hospital Budapest (protocol no: 001/2016). Stroke patients were selected from the patients of the Department of Physical and Rehabilitation Medicine of Szent János Hospital, Budapest. Specifically, the inclusion criteria for recruitment were (1) upper extremity functional impairment due to damage to the central nervous system and (2) age between 18 and 85 years. Meanwhile, the exclusion criteria were (1) plegia of the upper extremities that rendered motor functions untestable, (2) the presence of any disorder influencing hand function that is not related to the central nervous system, and (3) full legal incapacity or partial legal capacity. Patients' actual functional states were assessed for both hands at hospitalization and at the end of the rehabilitation program (i.e., when they were discharged from the hospital) using the Functional Independence Measure (FIM [14,15]; performed by a physician who specialized in physical and rehabilitation medicine) and the Barthel Index (BI; taken by a nurse). Stroke patients repeated the test session on several occasions during their hospitalization period. The minimum number of tests performed by a patient was five, and the maximum was 16 (with the average being nine).
FIM and BI were also used to reveal changes in the patients' functional ability; these measures are widely used to assess stroke patients during rehabilitation. Specifically, FIM features 18 items within six domains and focuses on both activities of daily living and cognition, while BI features 10 items but only assesses activities of daily living.
For the healthy control group, the inclusion criterion was an age between 18 and 85 years, and the exclusion criterion was the presence of any disease influencing hand function. The hand dominance of the tested persons was determined by the therapists.

s-9HPT.
Using the same mechanical dimensions as the traditional pegboard for the nine-hole peg test, the s-9HPT (Figure 1) has several extra functions. The most obvious of these is that the test can be completed without the presence of a supervisor, as it automatically measures users' execution times. Such unassisted usage is helped by the incorporation of a delayed start to the test: when the "start" button is pressed, the device displays the following messages in sequence for one second each: "ready"-"steady"-"go." At the moment the "go" message appears, a buzzer also sounds. Then, two measurements of execution time are performed. The first ranges from the time the buzzer sounds to the removal of the last peg, while the second begins with the insertion of the first peg and ends with the removal of the last one. The difference between the two measurements is the time taken to pick up the first peg and insert it into a hole.
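The two execution-time measurements described above can be sketched in code. This is only an illustration under stated assumptions, not the device firmware: the `PegEvent` record, the `execution_times` function, and the idea that all timestamps come from a single clock started with the "start" button are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PegEvent:
    """One sensor event (hypothetical record layout)."""
    t: float        # seconds on a shared clock (assumption)
    hole: int       # hole number, 1..9
    inserted: bool  # True for an insertion, False for a removal


def execution_times(buzzer_t: float, events: List[PegEvent]) -> Tuple[float, float, float]:
    """Return (time from buzzer, time from first insertion, first-peg latency)."""
    insertions = [e.t for e in events if e.inserted]
    removals = [e.t for e in events if not e.inserted]
    if not insertions or not removals:
        raise ValueError("need at least one insertion and one removal")
    first_in = min(insertions)   # insertion of the first peg
    last_out = max(removals)     # removal of the last peg
    from_buzzer = last_out - buzzer_t
    from_first_peg = last_out - first_in
    # The difference is the time needed to pick up and insert the first peg.
    return from_buzzer, from_first_peg, from_buzzer - from_first_peg
```

The third returned value corresponds to the difference between the two measurements, that is, the time taken to pick up the first peg and insert it.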
The s-9HPT has a sensor in each hole that allows it to detect the presence of a peg. The device measures not only the time required to complete the test but also the time intervals between consecutive peg insertions and removals. Thus, the device affords the measuring of three phases of the total execution time: placement, intermediate, and removal time.
On the device, a light-emitting diode (LED) is located next to each hole; in Figure 1, it can be seen that the LED corresponding to hole 1 is illuminated. Adding this visual component allows guided tests (GUTs) to be performed. An example of such a test would involve a subject being asked to insert a peg into the hole whose LED is illuminated; when this peg is correctly inserted, the LED at this hole remains lit and the LED for the next hole in the sequence illuminates. When the ninth peg is placed in its hole, all LEDs are turned off and on for a short period (0.2 s), and then all but one LED are switched off. At this point, the peg associated with the illuminated LED should be removed. There is another test that diverges slightly from this process: the random test (RAT). Here, the device might request that the subject, while in the process of inserting pegs, remove a peg before all nine have been inserted. During RATs, an unlit LED, while all other eight LEDs are illuminated, signifies that the corresponding peg should be removed.
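The LED behavior during the placement phase of a guided test can be illustrated with a small sketch. It covers only the placement phase as described above; the function names and the example hole order are hypothetical and are not taken from the device software.

```python
def lit_leds_during_placement(order, n_placed):
    """Return the set of lit LEDs after n_placed correct insertions.

    LEDs of already filled holes stay lit, and the LED of the next
    hole in the programmed order is switched on as the new target.
    """
    if not 0 <= n_placed < len(order):
        raise ValueError("n_placed must be between 0 and len(order) - 1")
    return set(order[: n_placed + 1])


def is_expected_insertion(order, n_placed, hole):
    """True if `hole` is the hole the device is currently asking for."""
    return n_placed < len(order) and order[n_placed] == hole


# Hypothetical programmed order (not one of the study's GUT sequences).
example_order = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(lit_leds_during_placement(example_order, 0))
```

Because the order is an ordinary list, sequences of different difficulty can be expressed simply by passing a different list.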
Considering the above descriptions, it is clear that this system affords the setting of GUTs with differing levels of complexity. The TRT does not specify the order in which the pegs should be placed into and removed from the holes; thus, a subject can choose a different order every time they are tested, causing low reproducibility. GUTs eliminate this uncertainty. Further, the order of placement and removal can be programmed beforehand by the tester. Moreover, in both GUTs and RATs, subjects are required to perceive and recognize lit and unlit LEDs and to place pegs in the corresponding holes; it should therefore be noted that deteriorated cognitive function results in longer execution times.

The Measurement Procedure.
In this study, the same medical doctor supervised all tests. To begin, the doctor informed each subject of the correct method of using the s-9HPT, allowing them to become acquainted with the device before commencing the assessment. The tester was placed on the table in front of the test subject, and each user kept the tester in a fixed position using the hand that was not being tested. Meanwhile, the pegs were placed in a holder next to the tester, on the same side as the hand being tested. As mentioned earlier, the pegs were to be picked up one at a time, either from the holder or from the holes; if a peg fell onto the table, the participant picked it up and continued the test; if a peg fell to the floor, the supervisor picked it up and placed it back in the holder. When the participant was comfortable with the device, the supervisor began a TRT; this entailed the insertion of nine pegs into the holes in an arbitrary order, and then the removal of each peg, also in an arbitrary order.
The device was rotated by 90 degrees so that the display was closer to the nontested hand; this position guarantees the same level of usage as the traditional nine-hole peg testing device, while improving the visibility of the LEDs. Five GUTs with different complexities were defined; there were two main kinds of GUTs: those entailing a simple sequence of holes, and those featuring a pseudorandom sequence. A test sequence could be made more difficult by requesting that the test subject place pegs into holes that are located behind already placed pegs (from the angle of the tested hand). Specifically, for the GUTs, the following hole orders were used (for an s-9HPT that has been rotated counterclockwise, as in Figure 1, and with the subject using their right hand):
GUT 1 (easy): 1 4 3 5 9 2 7 6
Meanwhile, for the RAT, as mentioned above, the placement and removal of pegs was conducted in a random order; further, during such tests, the removal of placed pegs was requested intermittently, at which times the subject would remove pegs they had placed up to that point. In the notation for a random order, "P" denotes placement and "R" denotes removal.

Contents of the Test Sessions for Healthy Subjects
(2) TRT, repeated a number of times (for most subjects, twice) until the subject becomes familiar with the device and the test procedure
The above session plan was identical for all healthy subjects; that is, the number of each form of test and the order were not randomized.

Contents of the Test Sessions for Stroke Patients.
A different test plan was provided to the stroke patients; this was because the healthy subjects were tested in order to assess the traditional nine-hole peg test procedure and the smart tester, while the stroke patients were tested to show that the s-9HPT gives more information concerning stroke patients' rehabilitation than the simple mechanical nine-hole peg test device. Further, stroke patients became tired within a few minutes of testing, meaning their test sessions had to be shorter. At the beginning of the first test session involving stroke patients, the subjects were also given an opportunity to familiarize themselves with the s-9HPT, and they signed an information sheet and consent form. Then, they repeatedly performed TRTs until their results stabilized. The remainder of the first test session comprised the following tests:
Items (1)-(4); completed with the non-affected hand
Items (1)-(4); completed with the affected hand
(1) TRT, repeated three times
Further test sessions began with three TRTs. Then, for each patient, depending on their actual condition, the test session was either terminated after the third TRT or continued through the application of GUTs, which were given on an approximately weekly basis. Details on the length of each patient's rehabilitation process are provided in Table 1.
Further, the exponential fit of the measured data and the calculation of adjusted R² parameters (characterizing the goodness of the fit) were also performed using MATLAB.

Results
Overall, over 600 tests were completed by the healthy subjects, while over 1000 were completed by the hospitalized stroke patients. Although all participants were asked to begin the test upon hearing the buzzer, some participants, especially patients, occasionally commenced the test too early or too late. Thus, during the evaluation, the moment the first peg was inserted was set as the start of the test, as this increased the reliability of the results.
Although the test sessions comprised different test items for stroke patients than for healthy persons, all participants' test results were found to have improved; in some cases, improvement was even found during the same test session, as the participants became accustomed to the device while they were using it.
For GUT1 and GUT2, the correct orders in which the pegs should be placed were relatively easy for the participants to replicate; meanwhile, during TRTs (i.e., without LED guiding), both patients and healthy persons generally spontaneously placed the pegs in the same order as requested by GUT1. However, GUT3 featured a more sophisticated sequence, while GUT4 and GUT5 required the subjects to pay close attention. GUTs have three levels of complexity. This was determined by using the nonparametric Kruskal-Wallis test to examine the test sessions (minimum of six) performed by each of the 16 healthy subjects; consequently, three groups (TRT; GUT3, GUT4, and GUT5; and RAT) with significantly different mean ranks were revealed (α = 0.05). Figure 2 shows, for subjects' dominant hands, TRT scores and results for GUT1, GUT2, GUT3, GUT4, GUT5, and RAT. A surprising finding was that for most healthy persons, GUT1 required a longer execution time than the more difficult GUT2. However, the reason for this is that the participants initially found the GUT format to be unusual, but grew accustomed to the method as they used it and consequently found the more difficult GUTs performed afterward to be more normal.
During the traditional test, the time intervals of the phases had a higher coefficient of variation (COV) than the ratios of these time intervals to the total execution times. The results (mean, COV) were as follows: placement time: 12
However, the stroke patients' results could not be averaged; the patients had different levels of motor disability, and their improvements during rehabilitation also differed. The results of two stroke patients (S3 and S4, affected hand) over the course of the rehabilitation are shown in Figure 3, along with total execution time and the ratio of placement time to total execution time.
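The Kruskal-Wallis comparison of test types mentioned above can be illustrated with a short, dependency-free sketch. It computes only the tie-corrected omnibus H statistic; sorting the test types into groups with different mean ranks would additionally need a post hoc comparison, which is not shown. The function name and the execution times in the example are hypothetical, not study data.

```python
def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic.

    `groups` is a list of samples (lists of numbers), one per test type.
    """
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12.0 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    correction = 1.0 - tie_term / (n ** 3 - n)
    return h / correction


# Hypothetical execution times in seconds for three test types.
trt = [17.2, 18.1, 16.9, 17.5]
gut = [21.0, 22.4, 20.7, 23.1]
rat = [30.3, 28.9, 31.2, 29.5]
h = kruskal_wallis_h([trt, gut, rat])
# With three groups, H is compared against the chi-square distribution
# with 2 degrees of freedom; the critical value at alpha = 0.05 is 5.991.
print(h > 5.991)
```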
Further, results for stroke patients' nonaffected hands were also found to have improved (see Table 2); however, the level of improvement here was much less than that for affected hands. This shows that the noted improvements in affected hands were only slightly based on the practice patients gained through performing the test.

Discussion
The s-9HPT facilitates guided tests of diverse complexity levels. These tests provide more detailed assessments of hand dexterity than the traditional nine-hole peg test. Nevertheless, guided tests require sound cognitive function. The traditional and guided tests were analyzed based on the results of healthy subjects, and this analysis greatly assisted in the assessment of stroke patients.
Stroke patients do not constitute a homogeneous group; thus, their results cannot be averaged, and each patient's data were analyzed individually. At the beginning of the rehabilitation, TRT was found to be appropriate for assessing patients' hand movements. Guided tests were used to assess fine motor skills and were applied following observations of improvements in motor functions. Our results attest that improvements in stroke patients' motor skills are reflected differently by nine-hole peg test results and functional ability measures (FIM, BI). The correlation between FIM and BI was found to be weak, with the Pearson correlation coefficient (PCC) decreasing from 0.58 at the commencement of the rehabilitation to 0.38 at the end. The correlation between BI and the nine-hole peg test results was also weak. However, this discrepancy can be explained by the fact that FIM and BI evaluate functional independence, which differs from hand dexterity.
In general, stroke patients had higher removal-time rates and, thus, lower placement-time rates than healthy persons. This indicates that for stroke patients, making the required hand movements and grabbing pegs are difficult. The level of difference between the difficulty experienced placing a peg into a hole and that involved in placing a peg into the much bigger peg holder was smaller for patients than for the healthy subjects. However, results are person specific; illustrating this point, Figure 3 shows two patients' (S3 and S4) differing rehabilitation progress.
Analysis of the stroke patients' results revealed a marked improvement with the number of days spent in hospital. Decreases in the total execution time of the TRTs can be approximated using the function $T(d) = A \exp(-B d) + C$, where $T(d)$ is the total execution time of a TRT; $d$ is the time in days from the first test session to the actual test session; and $A$, $B$, and $C$ are patient-specific constants.
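A few properties of this exponential model follow directly from its form and help in reading the fitted constants. The block below is a short worked derivation, assuming $B > 0$ (execution time decreasing over the rehabilitation); it adds no new data.

```latex
\begin{align*}
T(0) &= A + C
  && \text{(execution time at the first test session)} \\
\lim_{d \to \infty} T(d) &= C
  && \text{(estimated final, i.e. minimum, execution time)} \\
T(d) - C &= A \exp(-B d)
  && \text{(improvement still to be expected at day } d\text{)} \\
T\!\left(d + \tfrac{\ln 2}{B}\right) - C &= \tfrac{1}{2}\,\bigl(T(d) - C\bigr)
  && \text{(the remaining gap halves every } \ln 2 / B \text{ days)}
\end{align*}
```

In this reading, $C$ is the estimated final value against which the last measured TRT execution time can be compared.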
The goodness of the exponential fit was tested using MATLAB's curve-fitting tool, and the median adjusted R² for the 12 stroke patients was found to be 0.67. For nine of these patients (S1, S3, S4, S6, S7, and S9-12), the exponential fit was good (median adjusted R²: 0.88; min.: 0.53, max.: 0.99). For the other three patients, however, the exponential fit was worse than simply fitting a horizontal line; this caused negative R² values.
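The fitting step can be approximated without MATLAB. The sketch below is not the authors' procedure: it swaps in a simple grid search over B combined with ordinary linear least squares for A and C, and it computes adjusted R² with the usual degrees-of-freedom correction for three fitted constants. The function names, the grid, and the example data are assumptions made for illustration.

```python
import math


def fit_exponential(days, times, b_grid=None):
    """Least-squares fit of T(d) = A*exp(-B*d) + C via a grid search over B.

    Returns (A, B, C, SSE) for the best grid value of B, or None if no fit
    is possible. For a fixed B the model is linear in A and C.
    """
    if b_grid is None:
        b_grid = [0.005 * k for k in range(1, 401)]  # assumed range, per day
    n = len(days)
    best = None
    for b in b_grid:
        x = [math.exp(-b * d) for d in days]
        mx, my = sum(x) / n, sum(times) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        if sxx == 0.0:
            continue
        a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, times)) / sxx
        c = my - a * mx
        sse = sum((yi - (a * xi + c)) ** 2 for xi, yi in zip(x, times))
        if best is None or sse < best[3]:
            best = (a, b, c, sse)
    return best


def adjusted_r2(times, sse, n_params=3):
    """Adjusted R^2 = 1 - (SSE / (n - p)) / (SST / (n - 1))."""
    n = len(times)
    mean = sum(times) / n
    sst = sum((y - mean) ** 2 for y in times)
    return 1.0 - (sse / (n - n_params)) / (sst / (n - 1))


# Hypothetical patient: test days and TRT execution times in seconds.
days = [0, 3, 7, 10, 14, 21, 28]
times = [56.0, 47.5, 40.2, 36.4, 32.1, 28.9, 26.8]
a, b, c, sse = fit_exponential(days, times)
print(a, b, c, adjusted_r2(times, sse))
```

Because of the degrees-of-freedom penalty, adjusted R² drops below zero when the residual variance of the fit exceeds the variance around the mean, which is the situation of results fluctuating around a constant value.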
Such exponential curve fitting can help therapists decide if further improvements (decreases) in TRT execution time can be expected in the near future. We evaluated the goodness of exponential fit for each stroke patient and the relation between the estimated final values for the fitted exponential functions and the last measured values for TRT execution time. These data can help determine whether the rehabilitation program is worth continuing. Consequently, considering our participants, the exponential function fit must be rejected for S2, S5, and S8, as the results corresponding to these patients exhibit a fluctuation around a given value; this means that for these patients, continuing the rehabilitation program would not be reasonable. Meanwhile, the exponential fit was good for the other patients' results. In particular, S1, S3, S7, S10, S11, and S12 achieved the estimated final (minimum) value with regard to their total TRT execution time. Thus, the rehabilitation program is not expected to yield further improvement for these participants, either. However, during the last complete test session, for S4, S6, and S9, the total execution time was much longer than the estimated minimum value; consequently, for these patients, the rehabilitation program is expected to yield further improvements in the near future (it should be noted that there are also other aspects that should be taken into account in order to appropriately determine if the rehabilitation program should be terminated or continued). Figure 3 shows the difference between the progress of S3 and S4.
The guided tests mainly confirmed the qualification of the approach based on the exponential fit, with S4 and S10 representing exceptions. For S1, S2, S3, S5, S7, S8, and S12, similar to the healthy subjects, GUT4 and GUT5 required longer execution times than TRT. Further, the difference between the affected and nonaffected hands with regard to GUT4 execution time was a maximum of 10 s. S6 and S9 differed from all other patients: their execution times in GUT1, GUT4, and GUT5 were shorter than those in TRT, and the average differences between their affected and nonaffected hands were substantially greater than those of the other patients (GUT1: 27.4 and 31.8 s, resp., average of other patients: 6.4 s; GUT4: 19.3 and 18.2 s, resp., average of other patients: 9.1 s; GUT5: 18.5 and 19.7 s, resp., average of other patients: 9.4 s). These results affirm that for S6 and S9, further improvements can soon be expected with regard to the motor function of their affected hands.
Based on the exponential fit, S4 was qualified to continue the rehabilitation. However, similar to the healthy subjects, she produced longer execution times in all guided tests than in TRT, and the difference between her affected and nonaffected hands in TRT, GUT1, and GUT5 was much smaller than the differences shown by S6 and S9. These results do not suggest that further improvement can be expected in the motor function of S4's affected hand in the near future. For S10, the difference between the affected and nonaffected hands with regard to GUT4 execution time was greater than 10 s (12.8 s), while GUT1, GUT4, and GUT5 execution times were equal. Consequently, his rehabilitation should be continued.
Through GUTs, therapists can personalize test items for a given patient. In our sample, some patients (S4, S7, and S10) mentioned enjoying the GUTs; these patients were motivated to improve (decrease) their total execution times, and this made the rehabilitation more effective. However, there were also patients who disliked (S5) or were almost unable to perform (S9 and S11) the GUTs, and consequently, these tests did not improve the efficiency of their rehabilitation.
Next, it should be noted that at the beginning of the rehabilitation process, S9 and S11 stated that they found the GUTs to be too difficult; however, by the end of the program S9 was able to perform GUTs. Unfortunately, as she did not perform a GUT at the beginning of her rehabilitation, her improvement could not be quantified. Further, during the final test session, S2 became confused while performing GUT4 because he was disturbed by a nurse accidentally entering the room. Consequently, the score for this test (48.79 s) was much worse (longer execution time) than that S2 could have achieved, but he was too tired to repeat GUT4 at the end of the test session.
Another finding of our research is that the better a stroke patient's nine-hole peg test score at the beginning of the rehabilitation program, the smaller their improvement at the end (see Table 2). Further, substantial improvement at the beginning of rehabilitation (S3, S4, S9, S10, and S12) was mainly due to improvements in arm (and not in hand) movement. For patients' affected hands, the correlation between the result of the TRT given at the commencement of the program and the difference between the test scores for the TRTs at the beginning and at the end of the rehabilitation program was excellent: PCC = 0.996 and SRC = 0.944. Further, the correlation between the average TRT result during the first complete hospital test session and the average daily improvement was also excellent: PCC = 0.92 and SRC = 0.92.
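The two correlation measures reported above can be computed as in the following sketch, which assumes that PCC denotes the Pearson correlation coefficient and SRC the Spearman rank correlation (Pearson correlation of average ranks). The function names and the example values are hypothetical, not study data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        avg = (i + 1 + j) / 2.0
        for k in range(i, j):
            ranks[order[k]] = avg
        i = j
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical first-session TRT times (s) and total improvements (s).
initial = [35.0, 52.0, 80.0, 41.0, 120.0]
improvement = [4.0, 15.0, 38.0, 9.0, 70.0]
print(pearson(initial, improvement), spearman(initial, improvement))
```

Reporting both values is useful because the rank-based measure is less sensitive to a single patient with a very long initial execution time.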
Test-retest reliability [17] was rated based on the three TRTs performed by the 16 healthy persons at the beginning of their test sessions. The ICC for the 16 data groups was calculated using Excel, and the results are convincing: for the dominant hand, ICC was 0.921 (95% confidence interval: 0.82-0.97), while for the nondominant hand, ICC was 0.918 (95% confidence interval: 0.80-0.97).
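For readers who want to reproduce a test-retest ICC outside a spreadsheet, the sketch below computes two common single-measurement forms from a two-way ANOVA decomposition. The source does not state which ICC form was used, so both the choice of forms (Shrout and Fleiss ICC(2,1) and ICC(3,1)) and the function name are assumptions; the example data are hypothetical.

```python
def icc_two_way(data):
    """Return (ICC(2,1), ICC(3,1)) for a subjects-by-trials table.

    `data` is a list of rows, one per subject, each holding the k
    repeated measurements (for example, three TRT execution times).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ms_rows = ss_rows / (n - 1)                      # between subjects
    ms_cols = ss_cols / (k - 1)                      # between trials
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    icc21 = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    icc31 = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    return icc21, icc31


# Hypothetical TRT times (s): four subjects, three repetitions each.
example = [
    [17.1, 16.8, 16.5],
    [21.4, 20.9, 20.2],
    [15.2, 15.0, 14.9],
    [24.8, 23.9, 23.5],
]
print(icc_two_way(example))
```

ICC(2,1) also penalizes systematic shifts between repetitions, such as a learning effect across the three TRTs, whereas ICC(3,1) does not.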

Study Limitations
In total, the 12 stroke patients and the 16 healthy control persons completed over 1600 tests, and we found this to be sufficient to analyze the traditional nine-hole peg test, to verify the evaluation algorithms and the GUTs, and to give feedback to engineers developing the s-9HPT. However, further tests involving more stroke patients are required to validate the clinical applicability of the s-9HPT. In addition, further tests are needed to evaluate the diagnostic power of the detailed time values the s-9HPT measures; specifically: placement, intermediate, and removal time as well as time intervals between consecutive peg placements and removals.

Conclusions
The TRT results for the healthy persons were categorized into three age groups, and these are presented in Table 3; these data are similar to norms published in previous related studies [9,12,18].
Finally, in accordance with the findings of Wang et al. [10], Earhart et al. [5], and Oxford Grice et al. [18], within the healthy group, we found that women performed slightly better than men (see Table 3).
The nine-hole peg test is considered an effective test for hand dexterity, as it can objectively qualify and quantify the progress of (e.g., Parkinson's) or recovery from (e.g., stroke) certain diseases. However, the s-9HPT provides more detailed information on hand dexterity. Using the s-9HPT, therapists can personalize test sessions for patients, selecting traditional and guided tests of different complexities; moreover, the results measured by the s-9HPT can help therapists decide if the rehabilitation program can be expected to yield further improvements for a patient. Meanwhile, guided tests motivate some patients, making their rehabilitation more effective; further, there are marked differences in stroke patients' progress, meaning each patient's data must be analyzed individually. Finally, the s-9HPT can be used by the patients themselves at home without the presence of a supervisor; such home application can help the rehabilitation of stroke patients.

Conflicts of Interest
The authors report no declarations of interest. The authors alone are responsible for the content of this article.

Acknowledgments
The s-9HPT was developed by Balázs Patonai and then improved by Zoltán Havas and Gábor Nemes, students of Budapest University of Technology and Economics.