This research investigates if a computer and an alternative input device in the form of sensor gloves can be used in the process of teaching children sign language. The presented work is important, because no current literature investigates how sensor gloves can be used to assist children in the process of learning sign language. The research presented in this paper has been conducted by assembling hardware into sensor gloves, and by designing software capable of (i) filtering out sensor noise, (ii) detecting intentionally posed signs, and (iii) correctly evaluating signals in signs posed by different children. Findings show that the devised technology can form the basis of a tool that teaches children sign language, and that there is a potential for further research in this area.
Communication involves the exchange of information, and this can only occur effectively if all participants use a common language [
This research investigates if a computer, and an alternative input device in the form of sensor gloves, can be used in the process of teaching children Australian sign language (Auslan). Each sign consists of a number of parts: hand shape, place of articulation, orientation, path of movement, and nonsign components including facial expression [
A central difference between current research in this area and our work is that current research tends to investigate how sensor gloves can be used for sign language interpretation [
This paper presents initial research into the viability of using data gloves in combination with a computer and software to give children feedback on the accuracy of their expressive signs. It describes the gloves and hardware, explains how intentionally posed handshapes are identified, and reports an initial investigation into the viability of evaluating signals from two children with different hand sizes using one set of data gloves. How the results could be incorporated into a learning system is discussed only briefly.
We have decided to incorporate sensor gloves into our system design, as this technology has been used in a variety of application areas that demand accurate tracking and interpretation of sign language. An example is the AcceleGlove technology developed by [
This paper is organized as follows. In Section
In this section we define what sensor gloves are, and describe some of the existing glove technologies. The hardware components of the gloves will be discussed first. We then go on to describe some of the processing techniques that are used to analyse and interpret data signals that are generated by the gloves.
Sensor gloves are hand worn devices with inbuilt sensors that can capture information about the movements and positioning of the user’s hands. Some of the most widely known sensor glove technologies are the (i) DataEntryGlove [
The DataEntryGlove was presented by Gary Grimes from Bell Telephone Laboratories in 1983, and was the first widely published sensor glove [
Thomas Zimmerman developed the DataGlove in 1987. This glove was constructed of a lightweight fabric glove equipped with optical sensors on each finger, and magnetic sensors on the back of the glove [
The CyberGlove was developed at Stanford University in 1988 and was specifically designed for the Talking Glove Project, which focused on translating American sign language into spoken English [
The AcceleGlove uses accelerometers and potentiometers to capture finger and hand poses. The accelerometers are placed on the fingers, the wrist, and the upper arm, and are used to provide orientation and acceleration information. The potentiometers are located on the elbow and the shoulder, and provide information about the hand’s absolute position with respect to the body [
Before we move on to the next section, it is important to notice that literature points out that signals from sensor gloves have to be converted from an analogue to a digital format before being interpreted by a computer [
When the reviewed literature discusses the software components of glove technologies, the main focus is on how to classify signals, because the classification process is central in determining whether signs can be correctly identified. The requirements of the target application area determine which method is best suited for classifying the signals (e.g., sometimes it is sufficient to use a classification method that only takes into account the shapes of the hands, while at other times it is necessary to use a method that analyses hand shapes, hand locations, and hand movements). One must also decide whether to classify static or articulated signs. To classify static signs, it might be necessary to use a classification method that can filter out “transitional signs,” which are not intentionally posed by the user but rather arise as the fingers and hands move from one pose to another. If the target application area requires classification of articulated signs, then one might have to use a classification method that takes into account (i) initial hand shapes, (ii) hand orientations, (iii) hand positions, (iv) hand motions, and (v) end hand shapes [
Some methods that have been used to successfully classify signals from sensor gloves are (i) neural networks (NNs), (ii) hidden Markov models (HMMs), and (iii) template matching. When using NNs or HMMs, one must first construct a network with sufficient nodes and links to capture gestures at an abstraction level that satisfies the requirements of the application area. Then one must train the network by iteratively processing representative samples of the type of data to be classified. The drawback of using NNs and HMMs is that significant time and effort are required to design and train the networks. It is also hard to search for errors and to explain the outcome of the classification process [
In this section we will describe the hardware and software components that have been devised throughout this research. We start off by describing the hardware components. Then we continue by describing software components that have been devised to (i) filter out sensor noise, (ii) detect intentionally posed signs, and (iii) evaluate signals in signs posed by different children.
A number of issues had to be considered when we were assembling the hardware for the sensor gloves: (i) what size the gloves had to be to fit the hands of different children; (ii) how to make the gloves robust enough to withstand the wear and tear that results from several children putting them on and taking them off; (iii) what type and number of sensors to select to successfully capture Auslan signs; (iv) how to attach the sensors to the gloves; and (v) how to convert the analogue signals from the sensor gloves into a digital format that can be readily interpreted by a computer.
To ensure that the gloves would have a size suitable for a child, we used a pair of children’s gloves as a base for the sensor gloves. The selected gloves were made up of robust and stretchy lycra material. We selected gloves with this material to ensure that the sensor gloves would be robust enough to withstand wear and tear, and to ensure that they would have enough stretch to fit the hands of different children.
Ten flex and ten tactile sensors were incorporated into the gloves. These sensors were selected because they make it possible to detect finger flexion and the touch of fingertips, which is sufficient to register a number of different Auslan signs. A pair of the selected flex and tactile sensors is shown in Figure
A pair of the selected flex and tactile sensors.
How flex and tactile sensors were distributed across the gloves.
The wires from the sensors were then plugged into an I-cube X converter so that sensor signals could be converted from an analogue to a digital format. The I-cube X is a system that enables a large variety of sensors to be connected to the I-cube X box, a digitizer that converts the signal from each sensor into a digital message [
Sensor gloves, I-cube X converter, and uno MIDI to USB converter.
The main issues that had to be considered when we were designing the software for the sensor gloves were (i) how to simplify the data to support fast processing, (ii) how to identify intentionally posed signs, and (iii) how to correctly evaluate signals in signs posed by different children.
We first give a brief overview of the raw data from the sensors, as this makes it easier to understand why we employed the particular methods described afterwards. Data packets with raw data are transmitted from the I-cube X box to the computer every four milliseconds. These data packets include information about when the data was captured, the wire or channel the data is transmitted through, and the signal strength. Signals from the tactile sensors have a strength that ranges from zero to 118, where zero corresponds to no pressure, and 118 corresponds to hard pressure. Signals from the flex sensors range from zero to 115, where zero corresponds to no flex and 115 corresponds to maximum flex.
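The packet contents described above can be pictured as a small record. The following is a minimal sketch only: the class name `Packet`, its field names, and the helper `is_valid` are illustrative assumptions, not the actual I-cube X message format.

```python
from dataclasses import dataclass

# Upper bounds of the raw signal ranges reported in the text.
TACTILE_MAX = 118
FLEX_MAX = 115


@dataclass(frozen=True)
class Packet:
    """One raw reading. Field names are hypothetical, chosen for illustration."""
    timestamp_ms: int  # when the reading was captured
    channel: int       # wire or channel the reading arrived on
    strength: int      # raw signal strength


def is_valid(packet: Packet, is_tactile: bool) -> bool:
    """Check that a raw strength lies inside the range reported for its sensor type."""
    upper = TACTILE_MAX if is_tactile else FLEX_MAX
    return 0 <= packet.strength <= upper
```

For example, `is_valid(Packet(0, 1, 118), is_tactile=True)` holds, whereas the same strength on a flex sensor falls outside the reported range.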
To support fast processing we devised a low-pass filter. This filter removes data signals if the input from all the sensors has a strength below a set threshold. The threshold value was set to 45, which is a value just above the maximum random signal fluctuation observed in the sensor system when the sensors are not stimulated. Signals that are not removed by the low-pass filter are scaled so that the minimum value of all signals is zero, the maximum value of signals from the tactile sensors is 73, and the maximum value of signals from the flex sensors is 70. These signals are then passed on to a classification module. This classification module first labels each data packet according to the finger and sensor the particular stimulus was detected from. This is done by using a mapping model, which relates wires or channels to particular label names. The mapping model is shown in Table
Mapping model used to label data packets from the sensors.
Channel | Hand | Finger | Sensor |
---|---|---|---|
1 | Right | Thumb | Tactile |
2 | Right | Thumb | Flex |
3 | Right | Index | Tactile |
4 | Right | Index | Flex |
5 | Right | Middle | Tactile |
6 | Right | Middle | Flex |
7 | Right | Ring | Tactile |
8 | Right | Ring | Flex |
9 | Right | Little | Tactile |
10 | Right | Little | Flex |
11 | Left | Thumb | Tactile |
12 | Left | Thumb | Flex |
13 | Left | Index | Tactile |
14 | Left | Index | Flex |
15 | Left | Middle | Tactile |
16 | Left | Middle | Flex |
17 | Left | Ring | Tactile |
18 | Left | Ring | Flex |
19 | Left | Little | Tactile |
20 | Left | Little | Flex |
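The thresholding, scaling, and labeling steps can be sketched as follows. This is an illustration under stated assumptions, not the software used in the study: the function names are hypothetical, a "frame" is assumed to be a dictionary from channel number to raw strength, scaling is assumed to be a subtraction of the threshold (which maps 118 to 73 and 115 to 70, as reported), and readings below the threshold in a retained frame are assumed to be clamped to zero.

```python
from typing import Dict, Optional

THRESHOLD = 45  # noise threshold reported in the text

FINGERS = ["thumb", "index", "middle", "ring", "little"]


def label_for(channel: int) -> str:
    """Map a channel number (1-20) to a label, following the mapping model."""
    if not 1 <= channel <= 20:
        raise ValueError(f"unknown channel: {channel}")
    hand = "right" if channel <= 10 else "left"
    offset = (channel - 1) % 10  # position within one hand, 0-9
    finger = FINGERS[offset // 2]
    kind = "tactile" if offset % 2 == 0 else "flex"
    return f"{hand} {finger} {kind}"


def filter_and_scale(frame: Dict[int, int]) -> Optional[Dict[str, int]]:
    """Drop a frame whose readings are all below THRESHOLD; otherwise rescale.

    Assumption: scaling subtracts THRESHOLD and clamps negative results to zero.
    """
    if all(value < THRESHOLD for value in frame.values()):
        return None  # nothing but noise, so the frame is removed
    return {label_for(ch): max(value - THRESHOLD, 0) for ch, value in frame.items()}
```

With these assumptions, a frame such as `{1: 118, 2: 115, 3: 10}` is kept and becomes 73 for the right thumb tactile sensor, 70 for the right thumb flex sensor, and 0 for the right index tactile sensor, while a frame in which every reading is below 45 is discarded.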
When the data packets have been labeled, they are analyzed to discriminate between intentionally and unintentionally posed signs. This analysis is conducted by evaluating the strength of the sensor signals throughout pulses, which last for two seconds. These pulses have two main phases. We call the first of these phases (which lasts for one second) a “registration phase.” In this phase, signals from all the sensors are registered by the software. The second phase (which also lasts for one second) is referred to as a “constant phase.” In this phase the signals from the sensors can only fluctuate 30 units above or below the signals detected in the first phase, for input to be recognized as being part of an intentionally posed sign. If sensor signals that have been detected throughout the “constant phase” are stable enough to satisfy this criterion, then they are grouped into an intentionally posed sign and processed further. If some sensor signals fail to satisfy this criterion, then all the detected signals are discarded at the end of the pulse. This process is illustrated in Figure
Process of discriminating between intentionally and unintentionally posed signs.
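The pulse-based check can be sketched in a few lines. The sketch below rests on assumptions that the text does not settle: samples are assumed to arrive in time order with times measured from the start of the pulse, the value "registered" for a sensor is assumed to be its latest reading in the registration phase, a sensor never seen in that phase is assumed to have the value zero, and a deviation of exactly 30 units is assumed to be allowed. The function name is hypothetical.

```python
REGISTRATION_MS = 1000  # length of the registration phase
PULSE_MS = 2000         # total pulse length
TOLERANCE = 30          # allowed fluctuation during the constant phase


def detect_intentional_sign(samples):
    """Decide whether one pulse contains an intentionally posed sign.

    samples: list of (time_ms, {label: value}) pairs in time order,
    with time_ms measured from the start of the pulse.
    Returns the registered values if the pose was held, otherwise None.
    """
    registered = {}
    for time_ms, readings in samples:
        if time_ms < REGISTRATION_MS:
            # Registration phase: remember the latest value per sensor.
            registered.update(readings)
        elif time_ms < PULSE_MS:
            # Constant phase: every reading must stay within the tolerance.
            for label, value in readings.items():
                if abs(value - registered.get(label, 0)) > TOLERANCE:
                    return None  # one unstable signal discards the whole pulse
    return registered or None
```

A pulse in which a sensor reads 52 at the end of the registration phase and then 60 and 30 in the constant phase is accepted, because both later readings stay within 30 units of 52; a jump from 50 to 90 is rejected.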
When an intentionally posed sign is detected, it is compared to a library of prestored model signs. This is done to classify the sign, and to determine if it is correctly posed. An intentionally posed sign is regarded as being correctly posed if it satisfies two criteria. To satisfy these criteria the sensor signals in the intentionally posed sign must be the
The parameters were specified throughout an empirical trial and error process. Six different model handshapes have been generated. These model signs have been labeled (i) fist, (ii) thumb up, (iii) little finger up, (iv) pointer up, (v) ok, and (vi) cup. How these model handshapes are expressed is shown in Figure
Expected and actual input values from sensors in the sensor glove.
Model signs and results | Thumb tactile | Thumb flex | Index tactile | Index flex | Middle tactile | Middle flex | Ring tactile | Ring flex | Little tactile | Little flex |
---|---|---|---|---|---|---|---|---|---|---|
Model sign | | | | | | | | | | |
Child 1 | 34/ | 0/0 | 0/0 | 0/0 | 0/0 | 34 | 31/ | 46/ | 52/ | 0/ |
Child 2 | 35/ | 80/ | 73/ | 0/0 | 0/0 | 0/0 | 0/0 | 45/ | 0/0 | 79/ |
Model sign | | | | | | | | | | |
Child 1 | 0/ | 0/0 | 33/ | 0/0 | 0/0 | 0/0 | 31/ | 38/ | 0/0 | 0/0 |
Child 2 | 31/31 | 48/ | 73/ | 0/0 | 0/0 | 0/0 | 0/0 | 56/ | 0/0 | 0/0 |
Model sign | | | | | | | | | | |
Child 1 | 49/ | 0/ | 0/0 | 0/0 | 31/ | 0/0 | 31/ | 105/ | 61/42 | 0/0 |
Child 2 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 65/ | 0/ | 0/0 |
Model sign | | | | | | | | | | |
Child 1 | 37/ | 33/ | 0/ | 0/ | 0/0 | 0/0 | 0/0 | 90/ | 0/0 | 0/0 |
Child 2 | 0/0 | 0/0 | 34/34 | 0/ | 0/0 | 0/0 | 0/0 | 53/ | 0/0 | 0/0 |
Model sign | | | | | | | | | | |
Child 1 | 54/ | 0/0 | 88/ | 34/ | 0/ | 0/0 | 0/ | 0/0 | 0/ | 0/0 |
Child 2 | 0/0 | 0/0 | 36/ | 0/ | 0/ | 0/0 | 0/ | 55/ | 0/ | 0/0 |
Model sign | | | | | | | | | | |
Child 1 | 0/41 | 36/100 | 0/0 | 0/0 | 0/ | 0/0 | 0/ | 0/0 | 0/ | 0/0 |
Child 2 | 0/ | 0/0 | 0/0 | 36/ | 0/ | 0/0 | 0/ | 70/ | 34/34 | 0/0 |
How the six model signs incorporated into the software should be posed.
Model signs and their associated sensor signals are displayed in rows with bold bounding boxes. The sensor signals that were generated when the children posed the signs are displayed in the cells below each model sign. The first of the two values in the cells with sensor signals generated by children was detected in the “registration phase” of pulses. The second value was detected in the “constant phase.” Gray cells contain sensor signals that have been successfully matched with corresponding signals in a model sign. Preliminary analysis shows that the technology has the potential to recognize aspects of intentionally posed signs. When one studies the table further, one also finds that the flex and tactile sensors on the ring finger generated the data that was most consistent with the signals in the model signs.
When one studies Table
We will provide a more thorough analysis of the data presented in Table
In this section we describe two experiments that were conducted to investigate whether the devised technology (i) has the potential to identify intentionally posed signs, and (ii) is robust enough to correctly evaluate signs posed by more than one child. We will also explain how the data in Table
To properly test the devised technology, we asked three different children (one 7-year-old and two 5-year-olds) to pose the six model signs illustrated in Figure
However, before we go on to describe the results we have to point out that one of the three participants (one of the children aged 5 years) did not want to interact with the technology in any way. When conducting ethical experiments it is important to give participants the right to refuse to participate at any stage during the experiment as exercised by one child [
At the start of the first experiment we asked the children to put a sensor glove onto the right hand. We then asked them to pose the signs in Figure
In the second experiment, we compared the signals captured from different children as they intentionally posed the six hand shapes in Figure
(i) deviate by less than 30 units below or above the model values, when the model values are greater than zero;
(ii) are not different when the model values are zero.
We use these boundaries because a study of the model signs shows that it is possible to discriminate between the six model handshapes in Figure
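The two boundaries stated above translate directly into a per-signal comparison. The following is a minimal sketch, assuming that a sign is represented as a dictionary from sensor label to scaled signal value and that a sensor missing from the observed sign counts as zero; the function names are hypothetical.

```python
TOLERANCE = 30  # allowed deviation from a non-zero model value


def signal_matches(model: int, observed: int) -> bool:
    """Apply the two boundaries to a single sensor signal."""
    if model == 0:
        # A zero model value must be reproduced exactly.
        return observed == 0
    # A non-zero model value may deviate by less than TOLERANCE units.
    return abs(observed - model) < TOLERANCE


def sign_matches(model_sign: dict, observed_sign: dict) -> bool:
    """A posed sign matches when every signal in the model sign matches."""
    return all(
        signal_matches(value, observed_sign.get(label, 0))
        for label, value in model_sign.items()
    )
```

For a hypothetical model value of 50, an observed value of 79 matches and 80 does not, since the deviation must be strictly less than 30 units.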
Results from the experiments described in the last section are presented below. We start off by describing the data that was generated throughout the experiment that investigated if it is possible to identify intentionally posed signs.
To investigate if it is possible to identify intentionally posed signs by using the pulse concept described in Section
Summed signal values registered at the start and the end of pulses as each of the six signs was posed by participant 1.
Summed signal values registered at the start and the end of pulses as each of the six signs was posed by participant 2.
To investigate if the devised technology is robust enough to correctly evaluate signs from more than one child, we compared the signals registered as the children posed the six hand shapes shown in Figure
Outcome when the allowed deviation from a model sign was zero units.
The number of signal pairs that did, and did not, satisfy the constraints when the allowed deviation from a model sign was 30 units below or above the specifications of the model sign is shown in Figure
Outcome when the allowed deviation was 30 units above or below a model sign.
Results from this experiment show that 52 of the 60 signal pairs were similar enough to be correctly evaluated. We therefore regard it as possible to correctly evaluate signals in signs posed by different children. However, the results also show that eight of the 60 signal pairs were so different that it is impossible to correctly evaluate at least one of the signals in the pair by using the current system specifications.
An analysis of why the differences between these eight signal pairs are so great indicates that the differences arise because the children were unable to wear the gloves in exactly the same way. The children had considerable difficulty in putting the glove on the individual fingers and pulling it up to the correct position. The other reason for the differences is that the children were of different ages, and therefore had quite different hand sizes. This turned out to be a problem because the sensors ended up being distributed differently across the hands of the different children.
This paper has described how to construct a set of sensor gloves that could potentially be used as a component in a system that gives feedback to children learning Auslan from a computer. Experiments showed that the devised technology can (i) identify intentionally posed signs, and (ii) correctly evaluate signals in signs posed by different children. It is therefore worth pursuing this research further and extending it to address other aspects of sign language, including movement, hand orientation, and the location where the sign is made relative to the body. Further work is needed, however, before the technology can be used to teach children Auslan in an accurate and efficient way. Some of the issues that should be addressed include (i) how to redesign the gloves to reduce the discrepancy between signals registered from different children, and (ii) how to devise a function that provides intuitive feedback to guide children in reducing the discrepancy between posed signs and model signs. A learning system can only be developed if the feedback it gives is timely and accurate for a wide range of learners.