This paper applies speech recognition and RFID technologies to extend an omnidirectional mobile robot with voice control and guide-introduction functions. For speech recognition, the speech signals were captured by short-time processing. The speaker first recorded the isolated words for the robot to build a speech database of specific speakers. After preprocessing of this speech database, the feature parameters of cepstrum and delta-cepstrum were obtained using linear predictive coding (LPC). The Hidden Markov Model (HMM) was then used to train models from the speech database, and the Viterbi algorithm was used to find an optimal state sequence as the reference sample for speech recognition. The trained reference models were loaded into the industrial computer on the robot platform, and the user uttered the isolated words to be tested. After the test utterance was processed in the same way and compared with the reference models, the path with the maximum total probability among the models, found using the Viterbi algorithm, gave the recognition result. Finally, the speech recognition and RFID systems were implemented on the omnidirectional mobile robot and tested in an actual environment to prove their feasibility and stability.
In early approaches to speech recognition, the dissimilarity between the characteristic values of the input signal and the characteristic values stored in the database was calculated, and the entry with the minimum difference was taken as the recognition result. However, this method suffers from poor recognition when talking speeds differ. Later, some scholars proposed dynamic time warping (DTW) to improve the recognition performance [
The artificial neural network (ANN) is another method often used in the artificial intelligence domain [
The direction of the voice-controlled guide-type omnidirectional mobile robot is controlled by voice, and the robot also provides an RFID guide system, infrared image tracking, and ultrasonic obstacle avoidance [
Robot system hardware link.
The speech signals are preprocessed before recognition. Preprocessing comprises sampling, framing, endpoint detection, pre-emphasis, and windowing. After preprocessing, the feature parameters are extracted for the subsequent recognition calculation. In this paper, linear predictive coding (LPC) is used to derive the cepstrum and delta-cepstrum as the main feature parameters.
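The preprocessing chain above (pre-emphasis, framing, windowing) can be sketched as follows. This is a minimal illustration: the frame length, overlap, and pre-emphasis coefficient are assumed typical values, not parameters reported in the paper, and endpoint detection is omitted for brevity.

```python
import numpy as np

def preprocess(signal, frame_len=256, overlap=128, alpha=0.95):
    """Pre-emphasize, frame, and Hamming-window a speech signal.

    frame_len, overlap, and alpha are illustrative assumptions,
    not values taken from the paper.
    """
    # Pre-emphasis s'[n] = s[n] - alpha * s[n-1] boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])

    # Split into overlapping frames.
    step = frame_len - overlap
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // step)
    frames = np.stack([emphasized[i * step: i * step + frame_len]
                       for i in range(n_frames)])

    # A Hamming window reduces spectral leakage at the frame edges.
    return frames * np.hamming(frame_len)
```

Each row of the returned array is one windowed frame ready for feature extraction.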
The concept of linear prediction originates from the fact that the amplitude at a sampling point is correlated with the amplitudes of adjacent sampling points during pronunciation. If the sampled sequence of speech signals is
After determining the LPC, the cepstrum coefficient is deduced from the LPC [
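As a sketch of these two steps, the LPC coefficients can be computed from the frame's autocorrelation with the Levinson-Durbin recursion (the paper does not specify its solver, so this choice is an assumption), and the cepstrum then follows from the standard recursion c_n = a_n + sum_{k=1}^{n-1} (k/n) c_k a_{n-k}. The analysis order and cepstrum length are illustrative.

```python
import numpy as np

def lpc(frame, order=10):
    """LPC coefficients a_1..a_p for the model s[n] ~ sum_k a_k s[n-k],
    via autocorrelation and the Levinson-Durbin recursion.
    The order (10) is an illustrative assumption."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][:order + 1]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return -a[1:]  # sign flip to the prediction convention used above

def lpc_cepstrum(a, n_ceps=12):
    """Cepstral coefficients from the LPC coefficients via the standard
    recursion c_n = a_n + sum_{k=1}^{n-1} (k/n) c_k a_{n-k}."""
    p = len(a)
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        c[n - 1] = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                c[n - 1] += (k / n) * c[k - 1] * a[n - k - 1]
    return c
```

The delta-cepstrum is then obtained by differencing the cepstra of successive frames.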
According to the preset number of HMM states, the voiced part of a speech segment is divided uniformly into states and frames, and the feature vectors of the frames assigned to each state are used to calculate the mean value
In order to determine the relationship between frames and HMM states more accurately, this paper uses a Gaussian probability function [
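The two pieces above, the uniform initial division of frames among states and the Gaussian score of a frame against a state, can be sketched as follows. A diagonal covariance is assumed here for simplicity; the paper does not state its covariance structure.

```python
import numpy as np

def uniform_segment(n_frames, n_states):
    """Initial alignment: divide the frames evenly among the HMM states."""
    bounds = np.linspace(0, n_frames, n_states + 1).astype(int)
    labels = np.zeros(n_frames, dtype=int)
    for s in range(n_states):
        labels[bounds[s]:bounds[s + 1]] = s
    return labels

def log_gauss(x, mean, var):
    """Log of a diagonal-covariance Gaussian density: the probability
    score between a frame's feature vector and a state."""
    x, mean, var = map(np.asarray, (x, mean, var))
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)
```

Working with log probabilities avoids numerical underflow when many frame scores are multiplied along a path.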
The HMM can be represented by the parameter set λ = (π, A, B), where π is the initial state probability distribution, A is the state transition probability matrix, and B is the observation probability distribution.
The Gaussian probability density function determines the probability value between a frame and a state. The HMM has many possible state-transition paths, and the path with the maximum total probability among all possible paths must be found. This paper uses the Viterbi algorithm [
(1) Initialization: $\delta_1(i) = \pi_i\, b_i(o_1)$, $\psi_1(i) = 0$.
(2) Recursion: $\delta_t(j) = \max_i [\delta_{t-1}(i)\, a_{ij}]\, b_j(o_t)$ and $\psi_t(j) = \arg\max_i [\delta_{t-1}(i)\, a_{ij}]$ for $t = 2, \dots, T$.
(3) Termination: $P^{*} = \max_i \delta_T(i)$, $q_T^{*} = \arg\max_i \delta_T(i)$.
(4) Path backtracking: $q_t^{*} = \psi_{t+1}(q_{t+1}^{*})$ for $t = T-1, \dots, 1$.
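The four Viterbi steps can be sketched in log space as follows (log probabilities are an implementation choice for numerical stability, not something the paper specifies):

```python
import numpy as np

def viterbi(log_b, log_A, log_pi):
    """Viterbi search: best state path and its total log probability.

    log_b  : (T, N) log observation probabilities per frame and state
    log_A  : (N, N) log state-transition probabilities
    log_pi : (N,)   log initial-state probabilities
    """
    T, N = log_b.shape
    delta = np.zeros((T, N))           # best log score ending in state j at t
    psi = np.zeros((T, N), dtype=int)  # backpointers
    delta[0] = log_pi + log_b[0]       # initialization
    for t in range(1, T):              # recursion
        scores = delta[t - 1][:, None] + log_A   # (N, N): from state i to j
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = np.max(scores, axis=0) + log_b[t]
    # Termination and path backtracking.
    path = np.zeros(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1][path[t + 1]]
    return path, float(delta[-1].max())
```

The returned path is the optimal state sequence; its score is the maximum total log probability used to compare models during recognition.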
After the new relationship between states and frames is obtained using the Viterbi algorithm, the mean value and variance of each state are updated, and the Gaussian density function is used to recompute the probability between states and frames. A new total probability value is then obtained using the Viterbi algorithm. The update is repeated until the maximum total probability value converges, which yields the trained reference model.
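The re-estimation step of this training loop, updating each state's statistics from the frames the current alignment assigns to it, might look like the following sketch (the variance floor is an assumed numerical safeguard, not a detail from the paper):

```python
import numpy as np

def update_states(frames, alignment, n_states, var_floor=1e-3):
    """Re-estimate each state's mean and diagonal variance from the
    frames that the current Viterbi alignment assigns to it."""
    dim = frames.shape[1]
    means = np.zeros((n_states, dim))
    vars_ = np.ones((n_states, dim))
    for s in range(n_states):
        x = frames[alignment == s]
        if len(x):
            means[s] = x.mean(axis=0)
            # Floor the variance so no state's Gaussian collapses.
            vars_[s] = np.maximum(x.var(axis=0), var_floor)
    return means, vars_
```

Alternating this update with a new Viterbi alignment until the total probability converges is the segmental training procedure described above.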
The required commands are trained into models, which serve as the reference database for speech recognition. During recognition, the feature parameters are determined by the same procedure as in training. The models in the reference database are compared using the Viterbi algorithm to determine each model's probability value and find the optimal state sequence. The time warping of the speech signal is handled automatically when the sequence of frames is mapped to the state sequence. The key point of the training procedure is to identify the correspondence between frames and states: the relationship is updated by repeated Viterbi path backtracking until the path with the maximum total probability is found. The most important step of the recognition procedure is to compare the input against the trained reference models and select the model with the maximum total probability.
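The recognition decision itself reduces to a maximum over the per-model Viterbi scores. In this sketch, `models` and `score_fn` are hypothetical names (a mapping from command words to trained HMM parameters, and a Viterbi scorer), not identifiers from the paper:

```python
def recognize(features, models, score_fn):
    """Return the command word whose model gives the highest Viterbi
    log probability for the input feature sequence.

    models   : dict mapping command words to trained HMM parameters
    score_fn : callable (features, model) -> log probability
    (both names are illustrative placeholders)
    """
    return max(models, key=lambda word: score_fn(features, models[word]))
```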
Figure
System operation flow of voice controlled guide type omnidirectional mobile robot.
Omnidirectional mobile robot.
We placed the robot in the actual environment and tested various movements (forward, backward, turn left, turn right, stop, and turn back). Speaker-dependent and speaker-independent voice control were each tested by five users, and the resulting speech recognition rates are shown in Table
Recognition rates for the speaker-dependent and speaker-independent tests.
Speaker dependent | Recognition rate
---|---
Chun-Yuan | 96.7%
Jian-Min | 93.3%
Yi-Chung | 90%
Wei | 96.7%
Hung-Hui | 93.3%
Average | 94.0%

Speaker independent | Recognition rate
---|---
Jason | 66.7%
Ian | 73.3%
Andy | 90%
Momo | 83.3%
Apple | 50%
Average | 72.7%
User using speech to control robot to move forward and turn left. (a) User commands robot to move forward. (b) User commands robot to stop. (c) User commands robot to turn left. (d) Robot turns left and moves on.
Robot guide experiment. (a) User commands robot to move forward. (b) Robot detects tag and asks user whether he needs any introduction to the place or not. (c) User says YES. (d) Robot plays video.
This paper used an HMM-based speech recognition method to realize a voice-controlled guide-type omnidirectional mobile robot. Voice control frees the user from manual operation, which makes the robot more user-friendly. The guide system based on RFID technology enables users to obtain information about an unfamiliar environment quickly. Finally, the robot movement experiment and the robot guide experiment demonstrated the feasibility and stability of this voice-controlled guide-type omnidirectional mobile robot.
The authors declare no conflicts of interest.
The financial support of this research by the National Science Council of Taiwan, under Grant no. NSC-100-2221-E-167-004, is greatly appreciated.