Research on Visual Servo Grasping of Household Objects for Nonholonomic Mobile Manipulator

This paper addresses the problem of visual servo grasping of household objects by a nonholonomic mobile manipulator. First, a new kind of artificial object mark based on QR (Quick Response) Code is designed, which can be affixed to the surface of household objects. Second, after summarizing the vision-based autonomous mobile manipulation system as a generalized manipulator, the generalized manipulator's kinematic model is established, its analytical inverse kinematic solutions are acquired, and a novel active vision-based camera calibration method is proposed to determine the hand-eye relationship. Finally, a visual servo switching control law is designed to control the service robot to finish the object grasping operation. Experimental results show that the QR Code-based artificial object mark can overcome the difficulties caused by household objects' variety and operation complexity, and that the proposed visual servo scheme enables the service robot to grasp and deliver objects efficiently.


Introduction
A classical mobile manipulator system (MMS) consists of a manipulator mounted on a nonholonomic mobile platform. This arrangement considerably extends the manipulator's workspace and is widely used in service robot applications [1,2]. The development of MMS mainly involves two classical topics, namely, motion planning [3][4][5][6][7][8] and coordinated control [9][10][11][12][13], which are used to overcome the mobile platform's nonholonomic constraint and make the MMS move quickly and efficiently.
When robots operate in unstructured environments, it is essential to include exteroceptive sensory information in the control loop. In particular, visual information provided by vision sensors such as charge-coupled device (CCD) cameras guarantees accurate positioning, robustness to calibration uncertainties, and reactivity to environmental changes. Much of the work relating CCD cameras and manipulators has focused on the manipulator's visual servo control, which specifies robotic tasks (such as object grasping and assembly) in terms of desired image features extracted from a target object. Overviews of visual servoing can be found in [14][15][16]. In general, visual servo approaches can be divided into three kinds, namely, position-based visual servoing (PBVS) [17,18], image-based visual servoing (IBVS) [19,20], and hybrid visual servoing (HYBVS) [21][22][23]. In PBVS, the feedback signals in the vision loop are the relative 3D pose between the current and desired cameras, estimated from the current and desired image features through homography matrix or fundamental matrix estimation and decomposition. In IBVS, the feedback signals are image features whose velocities are related to the velocity twist of the camera via the image Jacobian matrix (also called the interaction matrix). Compared with PBVS, IBVS is robust to perturbations of the robot/camera models and can maintain the image features in the field of view (FOV) of the camera through path planning [24]. But the drawbacks are also obvious: the depth information in the interaction matrix needs to be estimated, and only local stability can be guaranteed for most IBVS schemes. To combine the advantages of PBVS and IBVS, HYBVS has been proposed. In HYBVS, the feedback signals consist of both the relative 3D pose and image features; the former is used to control a subset of the camera configuration vector while the latter regulate the remaining components.
Combining CCD cameras with mobile robots leads to applications in vision-based autonomous navigation control. Ma et al. [25] developed a vision-guided navigation system in which a nonholonomic mobile robot tracks an arbitrarily shaped continuous ground curve. Dixon et al. [26] presented an adaptive tracking controller for a wheeled mobile robot with an uncalibrated camera system, such that the controller copes with parameter uncertainty in both the mechanical dynamics and the camera system. Amarasinghe et al. [27] developed a vision-based hybrid control scheme for autonomous parking of a mobile robot, whose controller consists of a discrete-event controller and a pixel-error-driven proportional controller. Vassallo et al. [28] presented a similar project in which a vision-based mobile robot attempts to navigate autonomously in a building.
The newest trend is to integrate CCD cameras into a mobile manipulator to form a vision-based mobile manipulation system (VBMMS). Thanks to the capabilities of the vision subsystem, a VBMMS can work in unstructured environments and has wider applications than a fixed-base manipulator or a mobile platform alone. Due to the difficulty of achieving accurate and robust positioning with a VBMMS, very few physical implementations have been reported. De Luca et al. [29] considered the task-oriented modeling of the differential kinematics of nonholonomic mobile manipulators and developed an image-based controller for a VBMMS, but their approach is illustrated through simulation, not physical implementation. Mansard et al. [30] attempted to control a humanoid robot to grasp an object while walking. Wang et al. [31] developed a robust vision-based mobile manipulation system for wheeled mobile robots; in their research, an innovative controller with machine learning (Q-learning) is proposed to guarantee visibility of the visual features during the servo process.
This paper presents a physical implementation of a VBMMS in a service robot intelligent space. It makes two basic contributions. First, after summarizing the VBMMS as a generalized manipulator, its kinematics are analyzed analytically and an active vision-based camera calibration method is proposed to determine the hand-eye relationship. Second, a novel switching control strategy is proposed which switches between eye-fixed approximation and position-based static look-and-move grasping. The remainder of the paper is organized as follows. Section 2 introduces the design of the QR Code-based artificial object mark. In Section 3, the VBMMS is summarized as a generalized manipulator, and its kinematics, inverse kinematics, and hand-eye relationship determination are discussed. In Section 4, the switching control strategy that alternates between eye-fixed approximation and static look-and-move grasping is designed. Two experiments are presented in Section 5 to validate the designed switching controller. Conclusions are drawn in Section 6.

Design of QR Code-Based Artificial Object Mark
As shown in Figure 1, the QR Code-based artificial object mark is composed of two parts: the internal information representation part, which includes the object's property and operation information, and the blue concentric ring region, which is called the external identification part. Because the external identification part is easy to detect, the mark can be recognized rapidly in a complex home environment using a vision sensor. The coding of the information stored in the internal information representation part of the mark is shown in Table 1.
As Table 1 shows, there are two different kinds of information, namely, the object's property information and the object's operation information. The property information includes name, serial number, sizes, and material; the name and serial number serve as the unique identification of an object, and the sizes let the robot determine the gripper's opening degree. The operation information includes operation force, position, and orientation, which assist the robot in finishing the grasping operation in an appropriate way.
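As an illustration of how a robot-side parser might consume such a mark, the sketch below decodes a hypothetical semicolon-delimited payload into the property and operation fields of Table 1. The payload layout, field names, and the `parse_object_mark` function are our own assumptions; the paper does not specify the byte-level encoding of the mark.

```python
# Hypothetical payload layout for the internal information part of the mark.
# Field names and the delimiter are illustrative; Table 1 defines the actual coding.
def parse_object_mark(payload: str) -> dict:
    """Parse a semicolon-delimited QR payload into property/operation fields."""
    fields = dict(item.split("=", 1) for item in payload.split(";") if item)
    return {
        # Property information: unique identification, sizes, and material.
        "name": fields.get("name"),
        "serial": fields.get("serial"),
        "size_mm": tuple(float(v) for v in fields.get("size", "0,0,0").split(",")),
        "material": fields.get("material"),
        # Operation information: how the gripper should act on the object.
        "grasp_force_N": float(fields.get("force", "0")),
        "grasp_position": fields.get("pos"),
        "grasp_orientation": fields.get("orient"),
    }

mark = parse_object_mark(
    "name=cup;serial=007;size=80,80,95;material=ceramic;force=5;pos=side;orient=upright"
)
```

In such a scheme, `size_mm` would drive the gripper's opening degree and `grasp_force_N` the applied force, mirroring the roles Table 1 assigns to those entries.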

Kinematics, Inverse Kinematics, and Hand-Eye Relationship Determination of VBMMS
As shown in Figure 2, the eye-in-hand type VBMMS combines a nonholonomic tracked mobile robot (TMR), a 4-DOF (Degree of Freedom) manipulator, and a CMOS camera. The Grandar AS-RF type TMR uses a differential drive structure, which makes steering control easy to carry out. The Schunk Powercube 4-DOF manipulator is mounted on the TMR, and the Gsou V80 CMOS camera is mounted on the manipulator's end-effector. In addition, the VBMMS is equipped with an onboard computer whose computational capability supports real-time performance of the system.

3.1. Kinematics. Due to the nonholonomic constraint of the TMR and the nonredundancy of the 4-DOF manipulator, using the VBMMS to complete a grasping task is very difficult, and so far hardly any related work can be found. Taking into account the difficulty of controlling the TMR and the manipulator separately, the VBMMS is summarized as a generalized manipulator, shown in Figure 2. For the generalized manipulator, the TMR is considered as a 3-DOF (rotation-translation-rotation) manipulator, and the manipulator's first degree is omitted because it coincides with the TMR's third degree. In that case, the generalized manipulator has six degrees of freedom, which guarantees that the end-effector can approach an arbitrary position at any pose. Table 2 lists the modified D-H parameters of each link of the serial-link manipulator, where α_i, a_i, θ_i, and d_i denote the twist, length, angle, and offset of the ith link, respectively. In the last column of the table, R stands for revolute while T stands for prismatic. For the prismatic joint, d_i is the joint variable with value range d_i > 0, while for the revolute joints, θ_i is the joint variable with value range −π to π.
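A minimal numeric sketch of the modified D-H convention used in Table 2 is given below (the Craig-style link transform ^{i−1}T_i = Rot_x(α) Trans_x(a) Rot_z(θ) Trans_z(d)). The function names and the example link rows are our own illustrations, not the paper's actual parameter values.

```python
import numpy as np

def dh_modified(alpha, a, theta, d):
    """Modified D-H link transform ^{i-1}T_i = Rot_x(alpha) Trans_x(a) Rot_z(theta) Trans_z(d)."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    ct, st = np.cos(theta), np.sin(theta)
    return np.array([
        [ct,      -st,      0.0,  a],
        [st * ca,  ct * ca, -sa, -sa * d],
        [st * sa,  ct * sa,  ca,  ca * d],
        [0.0,      0.0,      0.0, 1.0],
    ])

def forward_kinematics(dh_rows):
    """Chain the per-link transforms; dh_rows is a list of (alpha, a, theta, d) tuples.

    Returns the cumulative poses 0T1, 0T2, ..., 0Tn of every link frame."""
    T = np.eye(4)
    poses = []
    for row in dh_rows:
        T = T @ dh_modified(*row)
        poses.append(T.copy())
    return poses
```

Chaining the six rows of Table 2 with `forward_kinematics` (with the prismatic row's `d` and the revolute rows' `theta` treated as joint variables) would yield the generalized manipulator's end-effector pose with respect to the base frame.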
The symbol ^E J denotes the generalized manipulator's Jacobian matrix expressed in the end-effector coordinate frame {E}; it transforms joint-space velocities into Cartesian velocities of the end-effector. For the 6-DOF generalized manipulator, the end-effector Cartesian velocity is

^E v = ^E J(q) q̇,

where the 6-vector ^E v is the end-effector's Cartesian velocity with respect to its own frame {E}. From the resolved link transformation matrices ^{i−1}T_i, the transform ^i T_E of the end-effector frame relative to link frame {i} can be determined. Writing ^i T_E in terms of its rotation ^i R_E and translation ^i p_E, and denoting by ẑ = [0, 0, 1]^T the ith joint axis expressed in its own frame, the ith column of ^E J can be constructed as follows: for the prismatic joint (i = 2),

^E J_i = [ (^i R_E)^T ẑ ; 0 ],

and for the revolute joints (i = 1, 3, 4, 5, 6),

^E J_i = [ (^i R_E)^T (ẑ × ^i p_E) ; (^i R_E)^T ẑ ].

Stacking the six columns finally yields ^E J.

3.2. Inverse Kinematics. From (2)-(13), the manipulator has 6 joints, which allow an arbitrary end-effector pose. In order to reach a specified end-effector position, the inverse kinematic solutions can be acquired by separating the unknown variables θ_1, d_2, and θ_3 in (2)-(13).
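The column-wise Jacobian construction of Section 3.1 can be sketched as follows. For brevity this illustrative version (function name ours) builds the geometric Jacobian in the base frame from the cumulative link poses; mapping it into the end-effector frame is a constant frame change.

```python
import numpy as np

def geometric_jacobian(transforms, joint_types):
    """Base-frame geometric Jacobian of a serial chain.

    transforms: cumulative 4x4 poses 0T1..0Tn, one per joint frame
                (the last one also carries the end-effector position).
    joint_types: 'R' (revolute) or 'P' (prismatic) per joint.
    """
    p_e = transforms[-1][:3, 3]              # end-effector position
    cols = []
    for T, jt in zip(transforms, joint_types):
        z = T[:3, 2]                         # joint axis in base frame
        p = T[:3, 3]                         # joint origin in base frame
        if jt == 'R':
            # Revolute: linear part z x (p_e - p), angular part z.
            cols.append(np.hstack((np.cross(z, p_e - p), z)))
        else:
            # Prismatic: linear part z, no angular contribution.
            cols.append(np.hstack((z, np.zeros(3))))
    return np.column_stack(cols)
```

For a planar two-revolute chain with a unit link along x, the first column gives unit linear velocity along y per unit joint rate, as expected from ż × p geometry.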
Given an arbitrary end-effector pose, with orientation expressed using the RPY description, the four sets of inverse kinematic solutions can be solved from (19), as shown in Table 3.
After all the solutions are obtained, they are checked against the joints' value ranges, and the optimal solution set is then selected using an optimality criterion such as the shortest joint-space path.
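This selection step can be sketched as follows: keep the candidate sets that respect the joint ranges, then pick the one closest to the current configuration (the shortest-path criterion mentioned above). The function name and limit representation are illustrative, not the paper's implementation.

```python
import numpy as np

def select_ik_solution(solutions, q_current, lower, upper):
    """Filter candidate IK solutions by joint limits, then pick the shortest path.

    solutions: iterable of candidate joint vectors (e.g. the four analytic sets).
    q_current: current joint configuration.
    lower, upper: per-joint limit arrays.
    """
    feasible = [np.asarray(q, dtype=float) for q in solutions
                if np.all((np.asarray(q) >= lower) & (np.asarray(q) <= upper))]
    if not feasible:
        raise ValueError("no IK solution satisfies the joint limits")
    # Shortest joint-space path: minimise Euclidean distance to the current pose.
    return min(feasible, key=lambda q: np.linalg.norm(q - q_current))
```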

Hand-Eye Relationship Determination.
Hand-eye relationship determination is a key issue in visual servo control of a robot hand-eye system, and much work has been done on it. In this paper, we propose a novel method based on Zhang's camera calibration method and tensor theory. Zhang's algorithm, proposed in 1999, is very representative because of its ease of use, flexibility, and high accuracy [32]. The algorithm only requires the camera to observe a planar pattern (such as a planar checkerboard) shown at a few different orientations. As a result, a closed-form solution for the camera intrinsic parameter matrix K and for the pose ^C T_O of the planar object's coordinate frame {O} with respect to the camera frame {C} can be computed, followed by a nonlinear refinement based on the maximum likelihood criterion to improve accuracy.
Figure 3 shows the scheme of hand-eye relationship determination, where {E} and {C} are the current end-effector frame and current camera frame, respectively.
From Figure 3, the kinematic loop gives

^E T_{E′} ^{E′}T_{C′} ^{C′}T_O = ^E T_C ^C T_O,   (20)

where {E′} and {C′} denote the end-effector and camera frames after a known manipulator motion. In (20), ^E T_C = ^{E′}T_{C′} is the unknown hand-eye relationship, ^E T_{E′} is known from the known motion of the manipulator, and ^C T_O and ^{C′}T_O are known from Zhang's method.
Note that

^E T_C = [ R  t ; 0  1 ].   (21)

Substituting (21) into (20) yields the rotation equation (22) and the translation equation (23). It can be seen from (22) that once the unknown R is solved, t follows easily from (23). Equation (22) can be summarized in the form A X B = X. Considering an n × n matrix X as a second-order mixed tensor whose element in the ith row and jth column is x_ij, the equation A X B = X can be written as

a_ik x_kl b_lj = x_ij,   (24)

where i, j are free indexes and k, l are dummy indexes. Taking the Kronecker delta into account,

x_ij = δ_ik δ_jl x_kl,   (25)

equation (24) can be written as

(a_ik b_lj − δ_ik δ_jl) x_kl = 0.   (26)

According to the different values of the free indexes i and j, n × n sets of equations can be acquired from (26). Let x = [x_1, x_2, …, x_n]^T, where x_k is the kth row vector of matrix X. Equation (26) can finally be written as

H x = 0,   (27)

where H ∈ R^{(n×n)×(n×n)} and the element in the row indexed by (i, j) and the column indexed by (k, l) is a_ik b_lj − δ_ik δ_jl. Considering that there are p sets of known manipulator movements, with H_k corresponding to the kth movement, we obtain the stacked system

B x = 0,  B = [H_1^T, H_2^T, …, H_p^T]^T.   (28)

Based upon the least-squares solution of a homogeneous system of linear equations, x is the last column of V, where B = U D V^T. Because R is a rotation matrix whose Frobenius norm is √3 and det(R) > 0, the recovered x can be rescaled accordingly, and the unique R verifying the constraint det(R) > 0 can be distinguished (31). Furthermore, we choose R* = U V^T, where R = U D V^T, as the final solution of (22).
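The Kronecker-based derivation above amounts to a small numerical routine: stack one (Bᵀ ⊗ A − I) block per known movement, take the right-singular vector of the smallest singular value, and normalize the reshaped result to a proper rotation. The sketch below is our own numpy illustration of that procedure, not the paper's code:

```python
import numpy as np

def solve_rotation_axb(As, Bs):
    """Least-squares rotation R satisfying A_k R B_k = R for every motion k.

    Vectorising with vec(A X B) = (B^T kron A) vec(X) turns each motion into
    ((B_k^T kron A_k) - I) vec(R) = 0; stacking all motions, vec(R) is the
    right-singular vector of the smallest singular value.
    """
    H = np.vstack([np.kron(B.T, A) - np.eye(9) for A, B in zip(As, Bs)])
    _, _, Vt = np.linalg.svd(H)
    X = Vt[-1].reshape(3, 3, order="F")   # undo the column-stacking vec()
    # vec(R) is only recovered up to scale (||R||_F = sqrt(3), det(R) > 0),
    # so project onto the orthogonal polar factor and fix the sign of det.
    U, _, Vt2 = np.linalg.svd(X)
    R = U @ Vt2
    if np.linalg.det(R) < 0:
        R = -R
    return R
```

For the hand-eye problem, A and B are the rotation blocks of the known end-effector and camera motions; at least two movements with non-parallel rotation axes are needed for a unique solution.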
After solving R, the unknown t can be solved from (23).

Design of Visual Servo Switch Control Scheme for Grasping
The visual servo control scheme designed in this section consists of two steps: eye-fixed approximation of the household object and static look-and-move grasping. Once the VBMMS is commanded to grasp a household object that is in its camera's FOV, the VBMMS starts to approach the object while keeping its eye fixed on it. When the distance between the VBMMS's camera and the household object reaches a certain value, the approximation process switches to static look-and-move grasping.
4.1. Eye-Fixed Approximation. Figure 4 illustrates the scheme of eye-fixed approximation. In Figure 4, a_00 and a*_00 are the zero-order moments of the blue concentric ring region of the object mark in the current image and the desired image, respectively. Likewise, m and m* are the nonhomogeneous projective coordinates of the center of the blue concentric ring region in the current image and the desired image, respectively.
It is well known that the time derivative ṁ of the nonhomogeneous projective coordinate m is linearly related to the joint velocities q̇ through the interaction matrix L_m:

ṁ = L_m(m, Z) ^C W_E ^E J(q) q̇,

where Z is the depth of the corresponding 3D point P = [X, Y, Z]^T relative to the current camera frame, ^C W_E is the velocity-twist transformation built from the hand-eye results ^C R_E and ^C t_E, and ^E J(q) is the generalized manipulator's Jacobian in the end-effector coordinate frame {E}. For the unknown depth Z in L_m, the estimate Ẑ can be chosen as

Ẑ = Z*,

where Z* is the constant depth of the 3D point relative to the desired camera frame.
We partition the interaction matrix so as to isolate the second degree of freedom of the generalized manipulator. Writing the joint-space interaction matrix as L = L_m ^C W_E ^E J(q), we note

ṁ = L_2 q̇_2 + L_⊥ q̇_⊥,

in which L_2 contains only the second column of L and L_⊥ contains the first, fourth, fifth, and sixth columns. Define the feature error e_2 = ρ(a_00 − a*_00), where ρ is a constant coefficient; a simple approximation control law using e_2 is

q̇_2 = −v_1 sgn(e_2),   (37)

where sgn(·) is the sign function and v_1 is a time-invariant coefficient. Define e_1 = m − m*; requiring e_1 to converge exponentially, ė_1 = −λ e_1, yields

q̇_⊥ = L_⊥^+ (−λ e_1 − L_2 q̇_2).   (38)

The vector −λ e_1 − L_2 q̇_2 can be considered a modified error that incorporates the original error e_1 while taking into account the motion induced by the approach input q̇_2. Under the control inputs (37) and (38), the VBMMS approaches the object at a constant speed until e_2 = 0, and during the approach the object is maintained in the FOV from beginning to end.
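The two control laws (37) and (38) can be sketched numerically as below. This is our own illustrative numpy version: it uses the classic point-feature interaction matrix, and for simplicity it removes only the approach column when forming L_⊥ (the paper's partition keeps exactly columns 1, 4, 5, and 6). The default gains mirror the experimental values reported later (λ = 0.15, v₁ = 0.01, ρ = 0.5).

```python
import numpy as np

def interaction_matrix_point(x, y, Z):
    """Classic 2x6 interaction matrix of a normalised image point (x, y) at depth Z."""
    return np.array([
        [-1.0 / Z, 0.0,      x / Z, x * y,        -(1.0 + x * x),  y],
        [0.0,      -1.0 / Z, y / Z, 1.0 + y * y,  -x * y,         -x],
    ])

def eye_fixed_control(m, m_star, a00, a00_star, Z_hat, J,
                      lam=0.15, v1=0.01, rho=0.5):
    """Sketch of laws (37)-(38): joint 2 drives the constant-speed approach from
    the zero-order-moment error; the remaining joints keep the mark centred.

    Simplification: L_perp keeps every non-approach column, whereas the paper
    uses only columns 1, 4, 5, and 6 of the joint-space interaction matrix.
    """
    m = np.asarray(m, dtype=float)
    e1 = m - np.asarray(m_star, dtype=float)
    e2 = rho * (a00 - a00_star)                          # scaled moment error
    Lm = interaction_matrix_point(m[0], m[1], Z_hat) @ J  # joint-space interaction matrix
    q2_dot = -v1 * np.sign(e2)                           # (37): constant-speed approach
    L2 = Lm[:, 1]                                        # approach-joint column
    L_perp = np.delete(Lm, 1, axis=1)
    # (38): exponential decay of e1, compensating the approach-induced motion.
    q_perp = np.linalg.pinv(L_perp) @ (-lam * e1 - L2 * q2_dot)
    return np.insert(q_perp, 1, q2_dot)
```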

4.2. Static Look-and-Move Grasping

Figure 5 illustrates the scheme of static look-and-move grasping.
In Figure 5, the 3D object point P lies on the plane π whose normal is n. Meanwhile, there are three different camera frames: the desired camera frame {C*}, the initial camera frame {C}, and the current camera frame {C′} reached after a known manipulator movement from q_a to q_b. The homogeneous transformation ^{C′}T_C can be computed as

^{C′}T_C = (^E T_C)^{-1} (^0 T_{E′})^{-1} ^0 T_E ^E T_C,   (39)

where ^E T_C is the hand-eye relationship, ^0 T_E is the initial end-effector pose, and ^0 T_{E′} is the current end-effector pose. Our ultimate objective is to determine the pose of {C′} relative to {C*}, namely R and t. Two steps are needed to solve them.
Step 1. Structure reconstruction of the plane π with respect to camera frame {C′}.
Taking the initial camera frame {C} as the reference frame, the plane π can be noted as ^C π = [^C n^T, d]^T, where ^C n is its unit normal and d its distance to the origin of {C}. A point on π observed in {C} and in {C′} is related by the Euclidean homography H_c:

^{C′}m ≅ H_c ^C m,

where ≅ means that the transformation holds up to a scale factor. Given at least four sets of image point correspondences ^{C′}m_k ↔ ^C m_k, k = 1, …, N, the H_c satisfying ‖H_c‖_F = 1 can be derived using the direct linear transformation (DLT) method.
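A minimal unnormalised DLT sketch for estimating such a homography from point correspondences is shown below (our own illustration, not the paper's implementation; a practical version should add Hartley normalisation of the points for numerical conditioning):

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate H with dst_i ~ H src_i from N >= 4 correspondences, ||H||_F = 1.

    src, dst: (N, 2) arrays of image points. Each correspondence contributes
    two rows of the homogeneous system A h = 0; h is the right-singular
    vector of the smallest singular value.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / np.linalg.norm(H)   # enforce unit Frobenius norm
```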
After solving H_c, it can be combined with the known motion (^{C′}R_C, ^{C′}t_C) obtained from (39): since the Euclidean homography satisfies H_c ≅ ^{C′}R_C + ^{C′}t_C ^C n^T / d, equations (41)-(42) follow, and by solving (42) the plane parameters ^C n and d can be determined. Finally, using (43), we acquire the structure ^{C′}π of the plane with respect to camera frame {C′}.
Step 2. Computation of R and t of the camera frame {C′} with respect to the desired camera frame {C*}.
Similarly, corresponding points in {C′} and the desired frame {C*} are related by a Euclidean homography H, and the H satisfying ‖H‖_F = 1 can also be derived using the DLT method when there are N ≥ 4 sets of image point correspondences.
Adjust the solved H so that it satisfies det(H) > 0; then

γ H = R + t (^{C′}n)^T / d′,   (45)

where γ > 0 is a scale factor and (^{C′}n, d′) is the plane structure obtained in Step 1. As is well known, a rotation matrix R whose axis and angle are k and θ can be described by Rodrigues' formula. Denoting e_3 = [0 0 1]^T and following the derivation (46)-(52), the positive scale factor γ and the rotation matrix R can be extracted from (45). To refine the solved R, apply singular value decomposition to it: writing R = U D V^T, choose R = U V^T as the ultimate result. By substituting (52) and the refined R into (45), the unknown t can finally be determined. The homogeneous transformation ^{C*}T_{C′} can then be assembled from R and t. Furthermore, the transformation ^0 T_{E*}, which represents the pose of the desired end-effector frame {E*} relative to the base frame {0}, can be computed as

^0 T_{E*} = ^0 T_{E′} ^E T_C (^{C*}T_{C′})^{-1} (^E T_C)^{-1}.   (56)

In (56), ^0 T_{E′} is known from the VBMMS's kinematic model established in Section 3.1, and the hand-eye relationship ^E T_C is known from Section 3.3. Finally, the VBMMS's control input q is acquired from the analytical inverse kinematics proposed in Section 3.2.
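The SVD refinement step used above, projecting the solved R onto the nearest rotation matrix while enforcing det(R) = +1, can be sketched as:

```python
import numpy as np

def project_to_so3(M):
    """Nearest rotation matrix (in the Frobenius sense) to M, with det = +1."""
    U, _, Vt = np.linalg.svd(M)
    # Flip the sign of the last singular direction if the polar factor reflects.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt
```

The same polar-factor cleanup is applied to the hand-eye rotation in Section 3.3; it is the standard correction for matrices that should be rotations but are corrupted by estimation noise.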

Experiments and Analysis
In this section, we discuss two experiments separately: hand-eye relationship determination and the VBMMS's switching control scheme.

5.1. Experiment of Hand-Eye Relationship Determination.
In order to simplify grid corner extraction, a model plane containing a pattern of 4 × 4 squares is chosen as the calibration object; the size of each square is 22 mm × 22 mm. An object frame {O} is attached to it: the upper left corner of the first square is selected as the origin of {O}, the Z axis is perpendicular to the model plane and points outward, the XY-plane of {O} is aligned with the model plane so that points on the plane have zero Z-coordinate, and the X and Y axes are parallel to the sides of the checkerboard squares, respectively.
With the TMR static, six images of the plane are taken under different orientations produced by several known manipulator movements. The images are shown in Figure 6; their resolutions are all 640 × 480, and the corners are detected as the intersections of straight lines fitted to each square. Using Zhang's calibration method, the refined camera intrinsic parameter matrix K, together with the refined extrinsic parameters ^{C_k}T_O (k = 1, …, 6) corresponding to the six images, can be acquired. The known movements of the manipulator corresponding to the six images are shown in Table 4. From Table 4, the homogeneous transformations ^0 T_{E_k} can be computed using the generalized manipulator's kinematic model. The computed ^0 T_{E_k} and the corresponding ^{C_k}T_O are shown in Table 5.
Combining image 1 and image 2 (noted as {12}) and using the calibration method described in Section 3.3, the matrix H_1 ∈ R^{9×9} corresponding to this movement can be constructed, and the hand-eye relationship can then be solved from the stacked system.

As can be seen in Figure 7, the blue concentric ring region of the object mark can be detected robustly and conveniently using a Gaussian model for color-based segmentation and the Hough transformation for ellipse fitting of the segmented region.
Choose ρ = 0.5, set the gain of the approximation control law (37) to v_1 = 0.01, and set the gain of the eye-fixed control law (38) to λ = 0.15. The depth estimate Ẑ = Z* is set to 25 cm while its true value is 31 cm; the computed eye-fixed approximation control input is given in Figure 8.
Figure 9 shows the images corresponding to the camera position {C}, where the eye-fixed approximation process terminated, and the position {C′}, reached after the VBMMS performs a known movement from q = [0, 0, 0, 1.2820, −0.0240, −0.2175]^T to q′ = [0, 0, 0, 1.3894, −0.1964, −0.0416]^T, together with the extracted corner correspondences matched by the RANSAC (random sample consensus) method. For the corner extraction, the ROI (region of interest) is selected to be within the concentric ring.
The corner correspondences given in Figure 9 lead to the homography H_c satisfying ‖H_c‖_F = 1. Using Step 1 of Section 4.2, the 3D structure of the object mark plane with respect to the camera frame {C′} can be computed (59). Figure 10 shows the images corresponding to the camera position {C′} and the desired camera position {C*}, together with the extracted corner correspondences matched by RANSAC.
Applying DLT to the corner correspondences, the homography H satisfying det(H) > 0 can be acquired; then, using Step 2 of Section 4.2, the transformation ^{C*}T_{C′} can be computed. Finally, ^0 T_{E*} and the control input of the VBMMS are determined from the inverse kinematics discussed in Section 3.2:

q = [−0.1847, 5.3829, 0.1237, 1.3305, −0.0058, −0.1220]^T.   (60)

Driven by the computed control input, the VBMMS moved to the position corresponding to the desired camera frame {C*}. Thereby, the visual servo grasping task was executed successfully.

Conclusion
It is nearly impossible for a VBMMS to finish grasping and delivering household objects without some prior knowledge (such as the objects' color, texture, sizes, and localization) provided by people, not only because of the household objects' variety and operation complexity, but also because of the difficulties of the VBMMS's kinematic modeling and the handling of its nonholonomic constraint. On the one hand, a new QR Code-based artificial object mark is designed, which can store an object's property and operation information and can easily be distinguished from a complex home environment. On the other hand, in order to model the VBMMS, we summarize it as a generalized manipulator, acquire its analytical inverse kinematic solutions, and determine the hand-eye relationship based upon active vision. Meanwhile, in order to deal with the VBMMS's nonholonomic constraint, a visual servo switching control law composed of an eye-fixed approximation part and a static look-and-move grasping part is designed. The proposed scheme solves the household object grasping and delivering problem well and makes it possible for a VBMMS-type service robot to provide better housekeeping service.

Figure 2: Vision-based mobile manipulation system and its link coordinate frames.

Figure 5: Scheme of static look-and-move grasping.

Figure 6: Six images of a model plane under different orientations, together with the corners (indicated by red crosses and blue squares).

Figure 7: Images of the target for the desired/initial camera position and object mark recognition using Gaussian model and Hough transformation.

Figure 8: Computed eye-fixed approximation control input.

Figure 9: Two images of the object mark corresponding to {C} and {C′}, together with the extracted corner correspondences.

Figure 10: Two images of the object mark corresponding to {C′} and {C*}, together with the extracted corner correspondences.

Table 1: Coding of information stored in the QR Code-based artificial object mark.

Table 4: The known movements of the generalized manipulator.
5.2. Experiment of VBMMS's Switch Control Scheme. The images corresponding to the desired and initial camera positions and the object mark recognition using the Gaussian model and Hough transformation are illustrated in Figure 7.

Table 5: The pose information corresponding to the known movements of the generalized manipulator.