Bionics solution to learn the arm reaching with collision avoidance

Abstract: This article presents a learning model that simulates the control of the kinematic motion of an anthropomorphic arm. The objective is to reach and grasp a static prototypic object placed behind obstacles of different sizes and positions. The network, composed of two generic neural network modules, learns to combine multi-modal arm-related information (trajectory parameters) with obstacle-related information (obstacle size and location). Our simulation is based on the notion of Via Point, which postulates that motion planning is divided into specific successive positions of the arm. In order to determine these special points, an experimental protocol has been built and the pertinent parameters have been integrated into the model. Building on these studies, we propose an original method that takes into account the previous learning modules to determine the entire trajectory of the wrist in order to reach the same object placed behind two successive obstacles. The aim of this approach is to better understand the impact of experience on task performance and to show that learning can build on previously learned modules. Some results (applied to an obstacle-avoidance task) show the efficiency of the proposed method.


INTRODUCTION
Over the past twenty years, motion planning has been one of the most important research objectives in the robotics field. More specifically, learning to reach and grasp an object is particularly difficult for a robot when an obstacle is placed between them. Although several studies have shown the coupling of arm and finger movements during prehension (Jeannerod 1981, 1984, Bootsma et al 1994), reach-motion planning in a cluttered environment may be handled separately. Thus, in this study, we consider the kinematic motion planning of an arm avoiding obstacles.
We can classify the different methods that deal with this problem into two groups. The first, called "local", uses only incomplete information about the environment and is usually implemented as an "on-line" algorithm. In this case, the robot checks for potential collisions during its motion and activates a matching strategy to avoid the obstacle. Several authors have worked on the path-planning problem with obstacles and proposed a large number of methods, such as the potential-field method Khatib (1985), Krogh and Thorpe (1986), Koren and Borenstein (1991), the wall-following method Bauzil et al (1981) (which follows the obstacle's contours until the obstacle has been passed), or the goal-oriented recursive path-planning method Noorhosseini and Malowani (1992) (which tries to find the longest straight path segment).
Corresponding Author: P. Gorce, LESP EA 31-62, Université du Sud Toulon Var, Avenue de l'université, 83957 Lagarde, France. Email: gorce@univ-tln.fr
The "global" method uses complete information about the workspace and considers all the degrees of freedom of the manipulator. In this case, the collision-avoidance algorithms can be classified as "off-line" algorithms, in which the motion planning is carried out before the robot moves. Concerning off-line algorithms (Lei 1999), the principle of distance maps is to divide the space into grids of equal spacing (Latombe 1991, Jarvis 1993). Many solutions have been proposed in the literature, for example a differential geometry method based on the kinematics (Chirikjian and Burdick 1990) or methods based on a potential function around the obstacles (Khatib 1986, Volpe and Khosla 1987, Khosla and Volpe 1988).
This paper presents a learning model for the motion planning of an anthropomorphic arm. The goal of this learning is to reach and grasp a static prototypic object placed behind an obstacle of varying position and size. To this end, two generic neural network modules have been used (Carenzi et al 2004). These two learning modules correlate multi-modal information: arm-related information (trajectory parameters) and obstacle-related information (obstacle size and location).
We based our simulation on the notion of Via Point, previously observed by Johansson et al (2001), which postulates that reach-motion planning is divided into specific successive positions of the arm derived from gaze information. In this study, as we have no gaze information, we show that the elbow and wrist positions and the obstacle characteristics are sufficient to avoid the obstacle.
To this end, we present an experimental protocol that makes it possible to extract the specific data to be integrated into the generic learning model. The neural-network architecture is used to determine the complete trajectory of the arm in reaching and grasping tasks while avoiding the obstacle.
Building on these studies, we propose an original method that takes into account the previous learning modules. The goal of this method is to determine the entire trajectory of the wrist in order to reach the object placed behind two successive obstacles.
This paper is organized as follows. In the "Experimental Procedure" section, we present the method used to obtain the different expected data. The third section is devoted to the presentation of the neural-network architectures retained to perform the reach-and-grasp learning. In the fourth section, we present the learning and simulation results demonstrating the tool's efficiency. Finally, we present the neural-network architecture and simulation results of the novel approach.

Materials and methods
Seventeen healthy subjects (aged 22-34 years; 12 males and 5 females) volunteered to participate. Subjects were seated in a chair facing a table and were instructed to reach and grasp a prototypic object (a box) placed behind an obstacle of varying position and size. The object was a compact block (8 × 8 × 8 cm) placed 50 cm from the initial wrist position P0 (see Figure 1). Three obstacles of different sizes (10 × 35 × 10, 10 × 35 × 15 and 10 × 35 × 20 cm) were placed randomly at 25 cm (P2 position) or 40 cm (P1 position) from P0. Combining the three obstacle sizes with the two positions, subjects thus had to perform six different tasks.
Each subject executed all tasks in randomised order. Movements were recorded using the Vicon system from markers placed on the shoulder, elbow and wrist joints. The trajectory of each marker was reconstructed by dedicated software. Subjects were allowed to reach the object either by going around the obstacle or by passing over it, but only the latter trials were taken into account.
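As a quick check of the protocol, the six tasks follow from crossing the three obstacle sizes with the two obstacle positions. The short Python sketch below enumerates them. The names are illustrative and not part of the original study, and the third value of each size is assumed to be the obstacle height.

```python
from itertools import product

# Obstacle sizes in cm as listed in the protocol; the third value is
# assumed here to be the obstacle height (10, 15 or 20 cm).
OBSTACLE_SIZES_CM = [(10, 35, 10), (10, 35, 15), (10, 35, 20)]

# Distance of each obstacle position from the initial wrist position P0, in cm.
OBSTACLE_POSITIONS_CM = {"P1": 40, "P2": 25}


def experimental_conditions():
    """Return every (position label, distance, obstacle size) combination."""
    return [
        (label, distance, size)
        for (label, distance), size in product(
            OBSTACLE_POSITIONS_CM.items(), OBSTACLE_SIZES_CM
        )
    ]


conditions = experimental_conditions()
for label, distance, size in conditions:
    print(label, distance, size)
```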

Trajectory analysis: via point underlining
The location of the obstacle between the initial hand position and the target object required the subjects to produce a greater vertical elevation during the reaching movement. This phenomenon affects the wrist and the elbow trajectories. Figures 2 and 3 represent, respectively, the wrist and the elbow trajectories for the six conditions (obstacle height: 10, 15, 20 cm; positions P1 and P2). From the trajectory description, we can notice that the maximum vertical height of the wrist is reached at 42 ± 1% of the total task time, independently of the obstacle location and size. This characteristic trajectory parameter defines the first via point position.
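The relative time of the maximum wrist height can be extracted from a sampled trajectory as sketched below. This is a minimal illustration, not the authors' analysis code: the function name is an assumption, and the sample values are placeholders rather than recorded data.

```python
def peak_time_fraction(times, heights):
    """Return when the maximum height occurs, as a fraction of total task time."""
    if len(times) != len(heights) or len(times) < 2:
        raise ValueError("need two or more matching samples")
    # Index of the sample with the greatest vertical height.
    peak_index = max(range(len(heights)), key=lambda i: heights[i])
    duration = times[-1] - times[0]
    return (times[peak_index] - times[0]) / duration


# Placeholder samples (seconds, metres), not measurements from the study.
times = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25]
heights = [0.00, 0.12, 0.22, 0.15, 0.06, 0.02]
print(peak_time_fraction(times, heights))
```

Averaging this fraction over trials and subjects would give a figure comparable to the 42 ± 1% reported in the text.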
In order to take into account the curvature of the reach path, which depends on the height of the obstacle, we determine the second and the third via point positions at 20% and 70% of the total task time, respectively (see Figure 2). With these three characteristic positions, we can reproduce the experimental trajectories by means of a spline interpolation.
In relation to these time characteristics, we define the corresponding elbow via point locations as shown in Figure 3.
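The spline interpolation through the characteristic positions can be sketched as follows. The article does not state which spline was used, so this example assumes a natural cubic spline over normalized task time, written with the standard library only. The knot times 0.20, 0.42 and 0.70 come from the text, whereas the wrist heights are placeholder values.

```python
def natural_cubic_spline(ts, ys):
    """Build a natural cubic spline through (ts[i], ys[i]); ts must be increasing."""
    n = len(ts) - 1
    if n < 1 or len(ys) != len(ts):
        raise ValueError("need at least two matching knots")
    h = [ts[i + 1] - ts[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives; natural ends keep m[0] = m[n] = 0
    if n >= 2:
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        rhs = [
            6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
            for i in range(1, n)
        ]
        for j in range(1, n - 1):  # forward elimination (Thomas algorithm)
            w = h[j] / diag[j - 1]
            diag[j] -= w * h[j]
            rhs[j] -= w * rhs[j - 1]
        m[n - 1] = rhs[-1] / diag[-1]
        for j in range(n - 3, -1, -1):  # back substitution
            m[j + 1] = (rhs[j] - h[j + 1] * m[j + 2]) / diag[j]

    def evaluate(t):
        # Find the segment containing t; values outside the knot range
        # are extrapolated with the first or last segment.
        i = 0
        while i < n - 1 and t > ts[i + 1]:
            i += 1
        a, b = ts[i + 1] - t, t - ts[i]
        return (
            (m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
            + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
            + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b
        )

    return evaluate


# Normalized times: start, via point at 20%, maximum at 42%, via point at 70%, end.
VIA_TIMES = [0.0, 0.20, 0.42, 0.70, 1.0]
# Placeholder wrist heights in metres (assumed values, not experimental data).
wrist_heights_m = [0.00, 0.14, 0.25, 0.17, 0.04]

wrist_height = natural_cubic_spline(VIA_TIMES, wrist_heights_m)
print([round(wrist_height(k / 10), 3) for k in range(11)])
```

The same function can be applied separately to each coordinate of the wrist and of the elbow to obtain a full spatial path.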

Final position analysis
The final position gives some relevant information concerning the elbow and wrist positions in relation to the obstacle position and size (see Figure 4). In fact, we notice the existence of a safety distance between the elbow and the obstacle, which enables the obstacle avoidance. The elbow trajectory expressed in the obstacle frame shows that the final vertical distance between the elbow and the obstacle is invariant, whatever the height of the obstacle. The final elbow position and the final wrist position can therefore be determined from the obstacle position and height alone.
At this stage, we have determined, for each condition, five characteristic positions of the upper limb (see Table 1) that permit the path planning: the initial position, the first heuristic via point, the via point corresponding to the maximal height of the wrist, the second heuristic via point and the final position.
In the next section, we will present the neural network used to learn the final-arm configuration and the generalization of the trajectory parameter determined in this section.

Architecture
The neural network learning algorithm is based on locally weighted projection regression (LWPR), used for incremental learning of nonlinear functions (Vijayakumar and Schaal 2000, D'Souza et al 2001). It uses locally linear models, spanned by a small number of univariate regressions in selected directions in input space, to achieve a piecewise linear function approximation.
The region of validity of each linear model, called a receptive field, is computed from a Gaussian function:

$$w_k = \exp\left(-\tfrac{1}{2}\,(x - c_k)^T D_k\,(x - c_k)\right)$$

where $c_k$ is the centre of the kth linear model, and $D_k$ corresponds to a distance metric that determines the size and shape of the validity region of the linear model. Given a query point $x$, every linear model calculates a prediction $y_k(x)$. The total output of the learning system is the normalized weighted mean of all linear models:

$$\hat{y} = \frac{\sum_k w_k\, y_k(x)}{\sum_k w_k}$$

The main capability of the generic neural network is to learn multi-modal sensory-motor relations independently of the nature of the sensory signals. In this paper, we focus on a net of two neural networks able to generate the entire collision-free trajectory from the position and size of the obstacle and of the object to grasp.
The first learning module is dedicated to the learning of the final upper limb configuration, whereas the second module is devoted to the learning of the via points of the trajectory. Figure 5 illustrates the net of neural networks.
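The prediction step of such a locally weighted model, that is, the Gaussian receptive-field activation followed by the normalized weighted mean of the local linear predictions, can be sketched as below. This is only an illustration of the prediction rule: it does not implement the incremental training or the projection regression of LWPR, and the dictionary field names are assumptions made for the example.

```python
import math


def receptive_field_weight(x, centre, metric):
    """Gaussian activation w = exp(-0.5 * (x - c)^T D (x - c))."""
    diff = [xi - ci for xi, ci in zip(x, centre)]
    quad = sum(
        diff[i] * metric[i][j] * diff[j]
        for i in range(len(diff))
        for j in range(len(diff))
    )
    return math.exp(-0.5 * quad)


def predict(x, models):
    """Normalized weighted mean of the local linear predictions.

    Each model is a dict with 'centre', 'metric', 'offset' and 'slope'
    (hypothetical field names); its local prediction is
    offset + slope . (x - centre).
    """
    weighted_sum, weight_total = 0.0, 0.0
    for model in models:
        w = receptive_field_weight(x, model["centre"], model["metric"])
        local = model["offset"] + sum(
            s * (xi - ci) for s, xi, ci in zip(model["slope"], x, model["centre"])
        )
        weighted_sum += w * local
        weight_total += w
    if weight_total == 0.0:
        raise ValueError("query point is outside every receptive field")
    return weighted_sum / weight_total


# Two one-dimensional local models with assumed parameters.
models = [
    {"centre": [0.0], "metric": [[4.0]], "offset": 0.0, "slope": [0.0]},
    {"centre": [1.0], "metric": [[4.0]], "offset": 1.0, "slope": [0.0]},
]
print(predict([0.5], models))
```

Halfway between the two centres both receptive fields are equally active, so the output is the plain average of the two local predictions.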

Learning of the final upper limb configuration
The learning of the arm configuration requires several steps. The first step concerns the determination of the end-effector position and orientation. The wrist position and orientation relative to the object are pre-defined.
Following the study of Tolani et al (2000), we have implemented a robust algorithm that determines the inverse kinematics and deals with the redundancy problem. In fact, only one arm configuration remains once the desired elbow position is imposed as a constraint. This desired position comes from the experimental data, as shown in the "Experimental Procedure" section.
With these two conditions, the final arm configuration avoiding the obstacle is determined and the learning can be performed (see Figure 6).
The following scheme explains the learning procedure.
In the following section, we will present the results of the learning described in this paragraph and illustrate the efficiency of the tool through some modelisation results.

Learning of the via point-generated trajectory
The generalization of the via points comes from the experimental data. A training matrix containing the via point positions and the corresponding obstacle heights and positions has been established. The neural network then learns to generalize the via point positions. Figure 7 shows the algorithm of this learning.

Learning curve results: final arm configuration and via point
In this section, we present the simulation results concerning the learning of the arm configuration avoiding the obstacle and the learning of the trajectory parameters, namely the via points (see Figures 5 and 6). Figure 8 shows their respective learning curves (mean squared error (MSE)) over 6000 and 9000 training epochs.
To evaluate the efficiency of the neural network, we have calculated the mean positional error, defined as the norm of the vector from the desired position to the actual position, over a test set of 5000 different configurations, each specified by the obstacle position and size. These mean error values are 5.1 ± 1.0 and 7.2 ± 1.6 mm for the elbow and the wrist, respectively. In addition, it is worth noting that the safety distance given by the predicted model is approximately equal to the one given by the experimental data. The mean error value is 20 ± 6 mm for the wrist position and 15.5 ± 3 mm for the elbow in the six conditions.
Figure 10(a) shows the wrist trajectory given by the predicted model and the experimental one for the higher obstacle heights in the P1 and P2 positions.
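The positional error used in this evaluation, the norm of the vector from the desired position to the actual position, averaged over a test set, can be computed as in the sketch below. The function names are illustrative, and reading the "±" value as a population standard deviation is an assumption, since the article does not specify it.

```python
import math
import statistics


def positional_errors(desired, actual):
    """Euclidean distance between each desired and actual position."""
    return [math.dist(d, a) for d, a in zip(desired, actual)]


def mean_and_std(errors):
    """Mean error and population standard deviation (assumed meaning of '±')."""
    return statistics.mean(errors), statistics.pstdev(errors)


# Placeholder positions in millimetres, not results from the study.
desired = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
actual = [(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)]

mean_error, std_error = mean_and_std(positional_errors(desired, actual))
print(mean_error, std_error)
```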

Modelisation results
In this section, we present the modelisation results. As a core, an anthropomorphic arm model has been created. The 7 degrees of freedom (DoF) arm model is composed of two segments (arm and forearm) linked by three joints, as shown in Figure 11. The shoulder joint has 3 DoF ($q_1$, $q_2$, $q_3$), the elbow joint has 1 DoF ($q_4$) and the wrist joint has 3 DoF ($q_5$, $q_6$, $q_7$).
Thus, the configuration of the arm is completely defined by the vector of the joint angles $q = (q_1, q_2, q_3, q_4, q_5, q_6, q_7)^T$.
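In a simulator, this joint vector can be held in a small data structure that keeps the grouping by joint explicit. The class below is a hypothetical sketch: its name and field names are not taken from the article, and the joint axis conventions are left unspecified.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ArmConfiguration:
    """Joint angles of the 7-DoF arm, grouped by joint (radians assumed)."""

    shoulder: Tuple[float, float, float]  # q1, q2, q3
    elbow: float                          # q4
    wrist: Tuple[float, float, float]     # q5, q6, q7

    def as_vector(self) -> List[float]:
        """Return q = (q1, ..., q7) as a flat list."""
        return [*self.shoulder, self.elbow, *self.wrist]

    @classmethod
    def from_vector(cls, q: Sequence[float]) -> "ArmConfiguration":
        """Build a configuration from a flat list of seven joint angles."""
        if len(q) != 7:
            raise ValueError("expected exactly 7 joint angles")
        return cls(tuple(q[0:3]), q[3], tuple(q[4:7]))


config = ArmConfiguration.from_vector([0.1, 0.2, 0.3, 1.0, 0.0, -0.1, 0.2])
print(config.as_vector())
```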
The last figures illustrate the ability of our learning model. The hand shape is appropriate for grasping the block, and the wrist position and orientation relative to the object have been pre-defined.
The inputs of the model are the obstacle position and height and the object position.
Using the two trained neural networks, the simulator is able to generate a trajectory that avoids the obstacle.
Figures 12(a) and (b) represent the final posture of the upper limb at the end of the grasping movement. The evolution of the elbow and wrist positions while avoiding the obstacle is also represented.

Comparison between the predicted model and the experimental data (panels: positions P1 and P2).

REACH TRAJECTORY LEARNING AVOIDING TWO SUCCESSIVE OBSTACLES
In this section, we propose an original method that takes into account the previous learning modules to determine the entire trajectory of the wrist when reaching an object placed behind two successive obstacles of different sizes and locations (see Figure 13).
The approach aims at understanding the impact of experience on task performance by showing that learning can build on previously learned modules.

Architecture and algorithm of the new neural network
This learning requires multiple steps. The input of the model is a vector of size 3 containing the positions of obstacles 1 and 2 on the x-axis and the height. For each obstacle position, we can determine the associated via points, as shown in the "Neural network model" section, which enables the determination of the entire trajectory.
At this step, we have defined three via points for each obstacle position and height. We compare the Z value of each via point and define the four characteristic points as the output of the model:
• the first via point Vp1 corresponds to the higher one of the two obstacles,
• the second and third via points are related to the maximum of each obstacle, and
• the fourth via point Vp4 refers to the same condition as the first via point.
The next figure (see Figure 14) shows the algorithm of this learning.
After learning (when we specify the obstacle positions and heights), the model gives us the four parameters describing the entire reaching trajectory towards the object.
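The selection of the four characteristic points from the two sets of single-obstacle via points can be sketched as follows. This is one possible reading of the rule described above, not the authors' implementation: for the first and fourth points, the via point with the larger Z value among the two obstacles is kept, and the two maxima are kept as they are. The function names and the coordinates are assumptions.

```python
def higher(point_a, point_b):
    """Return the via point with the larger Z (height) value."""
    return point_a if point_a[1] >= point_b[1] else point_b


def combine_via_points(first_obstacle, second_obstacle):
    """Merge two single-obstacle via point sets into four characteristic points.

    Each argument is (first heuristic point, maximum, second heuristic point),
    with every point given as (x, z).
    """
    early_1, peak_1, late_1 = first_obstacle
    early_2, peak_2, late_2 = second_obstacle
    return [
        higher(early_1, early_2),  # Vp1: higher of the two first heuristic points
        peak_1,                    # maximum above the first obstacle
        peak_2,                    # maximum above the second obstacle
        higher(late_1, late_2),    # Vp4: same rule as Vp1
    ]


# Placeholder via points in metres (x, z), assumed for illustration.
via_first = [(0.10, 0.12), (0.20, 0.20), (0.30, 0.14)]
via_second = [(0.15, 0.15), (0.30, 0.25), (0.40, 0.18)]
print(combine_via_points(via_first, via_second))
```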

Via point learning curve
Here we present the learning curve of the via point determination. Figure 15 shows the learning curve (MSE) over 18,000 training epochs.
To evaluate the efficiency of the neural network, we have calculated the mean positional error between each via point determined by the predicted model and the one given by the forward model. For the four via points, the error is 5.5 ± 1.2, 4.1 ± 0.8, 10.1 ± 1.5 and 4.5 ± 0.4 mm, respectively. Figure 17 represents the wrist trajectory in the XZ plane given by the prediction for the 15 and 20 cm obstacle heights in the P1 and P2 positions.
Figure 18 represents the wrist trajectory in the XZ plane given by the learned model for two obstacles with a height of 14 cm placed 5 cm apart.
Although the distance between the two obstacles is small, we can notice that the wrist follows the first trajectory up to the maximum and then down to the final position.

Modelisation results
The last figures illustrate the capability of our learning model. As previously explained, the hand shape is appropriate for grasping the block, and the wrist position and orientation relative to the object have been pre-defined.
The inputs of the model are the position and height for the two obstacles and the object positions.
Using the predicted model, the simulator is able to generate the entire trajectory in order to avoid the two obstacles (see Figure 19).

CONCLUSION
In this article, we have presented an original approach that integrates several parameters from experimental data into a generic neural network for path-planning tasks.
Based on the notion of via point and on obstacle-related information such as obstacle size and location, the trajectory is determined after learning in order to reach and grasp a static prototypic object placed behind an obstacle of varying position and size. This method consists in combining two neural networks. The first is dedicated to the learning of the final upper limb configuration avoiding the obstacle, whereas the second is devoted to the learning of the trajectory parameters (Via Points). Both kinds of information are provided by the experimental data analysis and integrated into the two different learning modules.
Several results show the efficiency of the tool and of an original method that takes into account the previous learning modules. The aim of this method was to determine the trajectory of the wrist in order to reach the same object placed behind two successive obstacles.
The different results have shown that learning can build on previously learned modules. This approach has allowed a better understanding of the impact of experience on task performance.
Moreover, the use of this net of generic neural networks has shown that we can treat more complex situations. In future work, we will integrate other learning modules, such as the determination of the hand shape or of the wrist orientation and position relative to the object.

Figure 1 Experimental protocol representation.
Figures 2 and 3 Wrist and elbow trajectories for the six conditions (obstacle height: 10, 15, 20 cm; positions P1 and P2).

Figure 2 Temporal wrist elevation and via point specification.

Figure 5 Net of neural nets.

Figure 6 Algorithm for the arm configuration learning.

Figure 9(a) presents the wrist trajectory given by the predicted model and the experimental one for the three obstacle heights in the P1 position. Figure 9(b) shows the predicted elbow trajectory in the P2 position for the higher obstacle and for an arbitrary case (position = 0.3; height = 0.16).

Figure 13 Via point position determination.

Figure 14 Algorithm for the via point position learning.

Table 1 Trajectory characteristic position.