A vision-based method for autonomous landing of a rotorcraft unmanned aerial vehicle

This article introduces a real-time vision-based method for guided autonomous landing of a rotor-craft unmanned aerial vehicle (UAV). The pattern of the landing target is designed to simplify both identification and calibration. A linear algorithm is applied to estimate the three-dimensional structure in real time. In addition, multiple-view vision technology is used to calibrate the intrinsic parameters of the camera online, so calibration prior to flight is unnecessary and the focus of the camera can be changed freely in flight, which improves the flexibility and practicality of the method.

landing point. As the major sensor for the main task undertaken by a rotor-craft UAV, the vision sensor can perform tasks such as aerial monitoring and filming, and can also provide a natural sensing modality for object detection and landing. Vision-based autonomous landing technology has recently become an active topic of research (Shakernia et al. 1999; Sharp and Shakernia 2001; Shakernia and Vidal 2002; Saripalli and Montgomery 2002; Saripalli et al. 2003). Shakernia and Vidal (2002) developed a multiple-view UAV motion estimation method that makes use of the rank deficiency of the multiple-view matrix, and Sharp and Shakernia (2001) developed a motion estimation method based on two views. Saripalli and Sukhatme (2003) proposed a vision-based method that enables a helicopter to land on a moving target: moments of inertia are used to identify the target, a Kalman filter is used for target tracking, and a track controller is employed to land the helicopter on the target.
Our work has been inspired by the work mentioned above, with a focus on improving the flexibility and practicality of the vision method. We propose a simple scheme for designing and identifying the landing target, which achieves a frame rate of 25 Hz. Another improvement is that calibration can be done online. In practice, the focus of the camera needs to be changed during the searching and tracking task, while pose estimation requires the value of the focus to be known. Most cameras cannot return exactly to a previous focus value after it has been changed. Multiple-view vision is used to solve this problem, so the camera does not need to be calibrated prior to takeoff and its focus can be changed in flight. For this purpose, the landing pattern must be designed to provide enough corners for calibration. The UAV used in our work is a mini rotor-craft (SHU-XY-UAV 2) developed by Shanghai University, which is equipped with GPS, a gradienter, a magnetic compass, an angular-rate gyroscope, and a two-degree-of-freedom (pitch and yaw) cradle head whose pitch and yaw rotation angles are obtained through a photoelectric encoder. This article focuses on the vision system; the flight control system is not discussed.

AUTONOMOUS LANDING STRATEGY
The designed autonomous landing of our mini rotor-craft UAV is divided into three stages.
Stage 1: Return to the vicinity of the landing point using the GPS navigation system, then hover above the landing location at a height of 5-8 meters and search for the landing target. The focus of the camera is fixed during landing.
Stage 2: Select five images, with distinct angle differences between them, from the sequence of images that contain the target. The angle difference is determined using information from the gradienter and the magnetic compass. The intrinsic calibration of the camera is then carried out.
Stage 3: Identify the target and estimate the pose, then descend gradually under the guidance of the pose information from the vision system. The workflow of this vision method is shown in Figure 1.
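The image selection in Stage 2 can be illustrated with a minimal sketch. It assumes that each frame containing the target comes with a (pitch, yaw) reading in degrees; the function name `select_calibration_frames`, the greedy strategy, and the 10-degree separation threshold are illustrative assumptions, not values given in the article.

```python
def select_calibration_frames(angles, count=5, min_sep_deg=10.0):
    """Greedily pick frame indices whose viewing angles differ enough.

    angles      -- list of (pitch, yaw) readings in degrees, one per frame
                   (assumed to come from the angle sensors)
    count       -- number of frames wanted (five in the article)
    min_sep_deg -- hypothetical minimum angular separation between any two
                   selected frames; yaw wrap-around at 360 degrees is ignored
    """
    chosen = []
    for idx, (pitch, yaw) in enumerate(angles):
        # Keep the frame only if it is far from every frame already chosen.
        far_enough = all(
            max(abs(pitch - angles[j][0]), abs(yaw - angles[j][1])) >= min_sep_deg
            for j in chosen
        )
        if far_enough:
            chosen.append(idx)
        if len(chosen) == count:
            break
    return chosen
```

If fewer than `count` sufficiently different frames are available, the function returns the shorter list, and the caller would keep collecting frames before calibrating.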

Designing target
The target pattern is designed so that it is easy to identify and so that the image coordinates of its corners are easy to obtain; providing enough corners for calibration is also taken into account. In Figure 2, a black square lies on a white background, and four white squares are distributed inside it in the shape of a triangle; the black-white contrast helps segmentation and corner extraction. Corner 1 is the origin of the world coordinate system. The direction from corner 1 to corner 4 is the positive direction of the X axis, and the direction from corner 1 to corner 2 is the positive direction of the Y axis. The world coordinates of the 20 corners in Figure 2 are measured in advance for use in pose estimation. Usually, the process for obtaining the image coordinates of the corners includes threshold segmentation, labeling, corner detection, and identification (Sharp and Shakernia 2001; Saripalli et al. 2003; Rekimoto 1998). To improve real-time performance, a simplified identification algorithm is employed, as described in the following sections.

Segmentation
In practical applications, robustness of the algorithm is a very important factor. To test the effectiveness of the method, an image taken in strong light and shadow is selected (see Figure 4). Since the target is designed in black and white, the histogram (see Figure 3) shows that the target (see Figure 4) can be segmented by means of a global threshold. However, because of unstable lighting and changes in image brightness when pictures are taken from different angles, the segmentation result also contains disturbing areas (see Figure 5). After threshold segmentation, labeling is done. A labeled area is eliminated if its pixel number is less than the preset lower limit or more than the preset upper limit. For each remaining labeled area (see Figure 6), the image coordinates of the pixels with the maximum x, maximum y, minimum x, and minimum y are recorded. From these extreme image coordinates, the width/length ratio of the bounding rectangle of the labeled area is computed. A labeled area whose ratio lies between the lower limit and the upper limit is kept; otherwise it is eliminated (see Figure 7). The labeled areas that remain after this processing are called candidate areas. If there is no candidate area, the next frame is processed.
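The segmentation steps above (global threshold, labeling, pixel-count filter, width/length-ratio filter) can be sketched as follows. This is a minimal illustration in pure Python, assuming the image is a list of rows of grey values and that dark pixels form the foreground; the function name `candidate_regions` and all limit parameters are hypothetical, and 4-connectivity is an assumption because the article does not state which connectivity is used.

```python
from collections import deque


def candidate_regions(gray, threshold, min_pixels, max_pixels, min_ratio, max_ratio):
    """Return bounding boxes (min_x, min_y, max_x, max_y) of candidate areas."""
    height, width = len(gray), len(gray[0])
    # Global threshold: True marks dark pixels (assumed to be the foreground).
    dark = [[gray[y][x] < threshold for x in range(width)] for y in range(height)]
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not dark[y0][x0] or seen[y0][x0]:
                continue
            # Label one 4-connected area with a breadth-first flood fill,
            # tracking its pixel count and extreme coordinates.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            count = 0
            min_x = max_x = x0
            min_y = max_y = y0
            while queue:
                x, y = queue.popleft()
                count += 1
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and dark[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            # Filter 1: eliminate areas that are too small or too large.
            if not (min_pixels <= count <= max_pixels):
                continue
            # Filter 2: width/length ratio of the bounding rectangle.
            ratio = (max_x - min_x + 1) / (max_y - min_y + 1)
            if min_ratio <= ratio <= max_ratio:
                boxes.append((min_x, min_y, max_x, max_y))
    return boxes
```

An empty result corresponds to the case in which no candidate area exists and the next frame is processed.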

Identification
For pose estimation, the image points corresponding to the 20 corners of the pattern must be found in the segmented image; that is, we determine the image coordinates of the 20 corners in Figure 2.
Step 1: Identify the four vertexes of the black square pattern. First, the edge is extracted (see Figure 8), and then the edge points with the maximum x, maximum y, minimum x, and minimum y are found. Next, it is determined how many of these points are distinct. If there are two distinct points (see Figure 9), they must be two diagonal vertexes, and the other two are found by searching the edge for the points farthest from the diagonal. If there are three distinct points (see Figure 10), each pair of them is connected by a line; if all pixel values in the eight-neighborhood of the line's midpoint are 1 (white), then the two points are diagonal vertexes. The remaining vertexes are then found in the same way as in the two-point case. If there are four distinct points, they are taken as the four vertexes (see Figure 11). The vertexes of the four white squares inside can be obtained with the same method.
Step 2: Identify the order of the four vertexes. In Figure 2, let the centroid of the black square be point 0; the pixel value at the midpoint of line 01 should then be 0 (black). In practice, we check whether all pixel values in the eight-neighborhood of this midpoint are 0. Once point 1 is found, its adjacent vertex in the counterclockwise direction is 2, the diagonal vertex is 3, and the last vertex is 4. After points 1, 2, 3, and 4 are found, the vertexes of the four white squares are easily determined.
If 20 corners cannot be found in a candidate area with this method, the candidate is not considered a target.
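The two-point and four-point cases of Step 1 can be sketched as follows. This is a minimal illustration, assuming the edge is given as a list of (x, y) pixel coordinates; the function name `square_vertices` and the tie-breaking rule for the extreme points are assumptions. The three-point case needs the pixel test on the segmented image and is left out here, so the sketch returns `None` for it.

```python
def square_vertices(edge_points):
    """Find the four vertexes of a quadrilateral from its edge pixels.

    Returns four (x, y) points (not yet in the order of Step 2), or None
    when three distinct extreme points are found.
    """
    # Extreme points; ties are broken by the other coordinate so that an
    # axis-aligned square yields corner pixels rather than mid-edge pixels.
    extremes = [
        min(edge_points, key=lambda p: (p[0], p[1])),  # minimum x
        max(edge_points, key=lambda p: (p[0], p[1])),  # maximum x
        min(edge_points, key=lambda p: (p[1], p[0])),  # minimum y
        max(edge_points, key=lambda p: (p[1], p[0])),  # maximum y
    ]
    distinct = list(dict.fromkeys(extremes))  # drop repeats, keep order

    if len(distinct) == 4:
        return distinct

    if len(distinct) == 2:
        a, b = distinct  # the two diagonal vertexes

        def signed_distance(p):
            # Proportional to the signed distance of p from the line a-b.
            return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

        # The farthest edge point on each side of the diagonal.
        c = max(edge_points, key=signed_distance)
        d = min(edge_points, key=signed_distance)
        return [a, c, b, d]

    return None  # three distinct points: needs the midpoint pixel test
```

For an axis-aligned square the extreme points collapse to two opposite corners, which is the two-point case; for a square rotated by about 45 degrees all four extremes are distinct.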

CALIBRATION BASED ON MULTIPLE VIEWS
Before estimating the Euclidean structure, the intrinsic matrix K of the camera must be known. The general model of K is shown in Equation (1):

$$K = \begin{bmatrix} \alpha & \gamma & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{bmatrix} \tag{1}$$

where $\alpha$ and $\beta$ are the scale factors along the two image axes, $\gamma$ is the skew, and $(u_0, v_0)$ is the principal point.
In most vision-based landing systems, it is assumed that the intrinsic matrix of the camera is precalibrated and that its focus does not change in flight. In practical applications, however, the focus is changed by remote control for the purpose of shooting or inspection. With multiple-view vision technology, multiple images that contain the target can be used to estimate the intrinsic matrix of the camera (Hartley and Zisserman 2000; Zhang 2000), as shown in the following deduction. Assume the target lies on the Z = 0 plane of the world coordinate system. The image coordinate of a pixel is denoted by $m = [u, v, 1]^T$, and the world coordinate of a three-dimensional point is denoted by $M = [x, y, z, 1]^T$. According to projection theory, we have Equation (2):

$$\lambda m = K \begin{bmatrix} r_1 & r_2 & r_3 & T \end{bmatrix} M \tag{2}$$

where $\lambda$ is the unknown homogeneous scale factor, $r_1$, $r_2$, $r_3$ are the column vectors of the rotation matrix, and $T$ is the translation vector. Given

$$H = \begin{bmatrix} h_1 & h_2 & h_3 \end{bmatrix} = K \begin{bmatrix} r_1 & r_2 & T \end{bmatrix} \tag{3}$$

it is clear that H is a homography between the image plane and the target plane. From Equation (3), since z = 0, we have

$$\lambda m = H \, [x, y, 1]^T$$

Each correspondence between a point in the present image and a point of the designed pattern gives two linear equations in the elements of H. Given four or more such correspondences, we obtain a matrix equation of the form $Ah = 0$, where h is the vector of the components $h_{ij}$ stacked into a nine-element vector. The solution h is the null space of A, which may be computed using the singular value decomposition (Hartley and Zisserman 2000; Yuan et al. 2004).
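The linear solution of Ah = 0 can be sketched as follows. This is a minimal illustration using NumPy, not the authors' implementation; the function name `estimate_homography` is hypothetical, and the coordinate normalization that is commonly applied before the SVD to improve conditioning is omitted for brevity.

```python
import numpy as np


def estimate_homography(world_xy, image_uv):
    """Estimate H (3x3, scaled so that H[2, 2] = 1) from >= 4 correspondences.

    world_xy -- (x, y) coordinates of pattern corners on the Z = 0 plane
    image_uv -- matching (u, v) image coordinates
    """
    rows = []
    for (x, y), (u, v) in zip(world_xy, image_uv):
        # Each correspondence gives two linear equations in the nine
        # elements of H, stacked row by row: h = [h11, h12, ..., h33].
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    A = np.asarray(rows, dtype=float)
    # The null space of A is the right singular vector that belongs to the
    # smallest singular value, i.e. the last row of Vt.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    # Remove the arbitrary scale (assumes H[2, 2] is not close to zero).
    return H / H[2, 2]
```

With the 20 corners of the pattern, A has 40 rows, so the solution is a least-squares estimate in the presence of noise.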
Using the knowledge that $r_1$ and $r_2$ are orthonormal, we have

$$h_1^T K^{-T} K^{-1} h_2 = 0 \tag{4}$$

$$h_1^T K^{-T} K^{-1} h_1 = h_2^T K^{-T} K^{-1} h_2 \tag{5}$$

Let

$$B = K^{-T} K^{-1} \tag{6}$$

Note that B is symmetric, so it is defined by a six-dimensional vector $b = [B_{11}, B_{12}, B_{13}, B_{22}, B_{23}, B_{33}]^T$. Therefore, the two fundamental constraints (4) and (5) can be rewritten as two homogeneous equations in b:

$$\begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix} b = 0 \tag{7}$$

where $V_1$ and $V_2$ are six-dimensional vectors that depend only on the elements of matrix H. If n images of the model plane are observed, by stacking n such equations as Equation (7), we have

$$V b = 0 \tag{8}$$

where V is a 2n × 6 matrix. If n ≥ 3, we will have, in general, a unique solution b defined up to a scale factor. Once b is estimated, all the parameters of the camera intrinsic matrix K can be computed.
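The estimation of b and K from several homographies can be sketched as follows. This is a minimal illustration using NumPy under stated assumptions: the explicit form of the six-dimensional vectors follows Zhang (2000), reordered to match the ordering of b used here; the function names are hypothetical; and K is recovered from B by a Cholesky factorization, which is a substitute chosen for brevity and is not necessarily the closed-form computation used by the authors.

```python
import numpy as np


def v_ij(H, i, j):
    """Vector v with h_i^T B h_j = v^T b, b = [B11, B12, B13, B22, B23, B33]."""
    a, c = H[:, i], H[:, j]
    return np.array([
        a[0] * c[0],
        a[0] * c[1] + a[1] * c[0],
        a[0] * c[2] + a[2] * c[0],
        a[1] * c[1],
        a[1] * c[2] + a[2] * c[1],
        a[2] * c[2],
    ])


def intrinsics_from_homographies(homographies):
    """Estimate K from n >= 3 homographies of the same planar target."""
    rows = []
    for H in homographies:
        H = np.asarray(H, dtype=float)
        rows.append(v_ij(H, 0, 1))                   # h1^T B h2 = 0
        rows.append(v_ij(H, 0, 0) - v_ij(H, 1, 1))   # h1^T B h1 = h2^T B h2
    V = np.array(rows)                               # 2n x 6
    # b spans the null space of V: last right singular vector.
    _, _, vt = np.linalg.svd(V)
    b = vt[-1]
    if b[0] < 0:          # B11 must be positive; fix the arbitrary sign
        b = -b
    B = np.array([[b[0], b[1], b[2]],
                  [b[1], b[3], b[4]],
                  [b[2], b[4], b[5]]])
    # B is proportional to K^-T K^-1 and K^-T is lower triangular, so the
    # Cholesky factor L (B = L L^T) is proportional to K^-T.
    # Raises LinAlgError if noise makes B not positive definite.
    L = np.linalg.cholesky(B)
    K = np.linalg.inv(L.T)
    return K / K[2, 2]
```

Because every homography is known only up to scale, both constraints are homogeneous in H, so the arbitrary scale of each H does not affect the result.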
Note that the camera motion between the selected images must not be a pure translation; otherwise a degenerate configuration occurs (Zhang 2000). In practice, we use five pictures for calibration of the camera, chosen according to two criteria: first, the target must be detected in the picture; second, the cradle angles at which the five pictures are taken must differ considerably, where the angle is obtained through a photoelectric encoder. During and after calibration, the focus of the camera is kept fixed.

POSE ESTIMATION
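If the target is found in the image, we obtain the image coordinates $m = [u, v, 1]^T$ of the 20 corners. With the known world coordinates $M = [x, y, z, 1]^T$ of the 20 corners, H can be obtained by the calculation method described above.
From (2) and (3), we have

$$\begin{bmatrix} r_1 & r_2 & T \end{bmatrix} = K^{-1} H \tag{9}$$

where $r_1$, $r_2$ and $r_3$ are the column vectors of the rotation matrix. Because H is estimated only up to a scale factor, it is first scaled so that $\|K^{-1} h_1\| = 1$. When K and H are known, $r_1$, $r_2$ and $T$ can be worked out from Equation (9). Since R is a rotation matrix, namely, $r_1$, $r_2$ and $r_3$ are mutually perpendicular,

$$r_3 = r_1 \times r_2 \tag{10}$$

Because of noise in the data, the so-computed matrix $\hat{R} = [r_1, r_2, r_3]$ does not in general satisfy the properties of a rotation matrix. The problem considered is to find the best rotation matrix R to approximate the given 3×3 matrix $\hat{R}$. Here, 'best' is in the sense of the smallest Frobenius norm of the difference $R - \hat{R}$:

$$\min_{R} \|R - \hat{R}\|_F^2 \quad \text{subject to } R^T R = I \tag{11}$$

Since $\|R - \hat{R}\|_F^2 = 3 + \mathrm{trace}(\hat{R}^T \hat{R}) - 2\,\mathrm{trace}(R^T \hat{R})$, Problem (11) is equivalent to maximizing $\mathrm{trace}(R^T \hat{R})$. Let the singular value decomposition of $\hat{R}$ be $USV^T$, where $S = \mathrm{diag}(s_1, s_2, s_3)$. If we define an orthogonal matrix Z by $Z = V^T R^T U$, then

$$\mathrm{trace}(R^T \hat{R}) = \mathrm{trace}(R^T U S V^T) = \mathrm{trace}(V^T R^T U S) = \mathrm{trace}(ZS) = \sum_{i=1}^{3} z_{ii} s_i \le \sum_{i=1}^{3} s_i$$

It is clear that the maximum is achieved by setting $R = UV^T$ because then $Z = I$. This gives the solution to (11). Note: It is the pose of the camera that is calculated here. To obtain the pose of the UAV, the cradle pose must also be taken into account, which is not explained in detail here.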
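The pose computation described above can be sketched as follows. This is a minimal illustration using NumPy; the function name `pose_from_homography` is hypothetical, and resolving the sign of the scale by requiring the target to lie in front of the camera (positive third component of T) is an assumption that the article does not state.

```python
import numpy as np


def pose_from_homography(K, H):
    """Recover the camera rotation R and translation T from K and H."""
    M = np.linalg.inv(K) @ np.asarray(H, dtype=float)
    # H is known only up to scale: normalize so that r1 has unit length.
    scale = 1.0 / np.linalg.norm(M[:, 0])
    # Assumed sign convention: the target is in front of the camera.
    if M[2, 2] * scale < 0:
        scale = -scale
    r1, r2, T = scale * M[:, 0], scale * M[:, 1], scale * M[:, 2]
    r3 = np.cross(r1, r2)
    R_noisy = np.column_stack([r1, r2, r3])
    # Closest rotation matrix in the Frobenius sense: R = U V^T.
    U, _, Vt = np.linalg.svd(R_noisy)
    R = U @ Vt
    return R, T
```

Because the third column is built as a cross product, the determinant of the noisy matrix is non-negative, so the product U V^T is a proper rotation as long as the first two columns are not parallel.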

EXPERIMENT
To protect the UAV, stable independent operation of the vision system should be ensured prior to its installation on the UAV. A ground simulation experiment was therefore conducted to verify the effectiveness of the method. In our test, a laser printer was used to print the target, in which the black square measures 180 mm × 180 mm; in practical applications, the target size can be chosen according to the parameters of the camera. In testing, the camera is moved by hand at a height of 500-2000 mm relative to the target. A Polhemus fastrack six-dimensional magnetic tracker (with a static accuracy of 0.8 mm RMS for the x, y, or z position and 0.15° RMS for receiver orientation) is installed on the camera to provide a reference against which the computed results can be evaluated. The camera is moved above the target and the image sequence is sent wirelessly to a computer for real-time processing at an image resolution of 320 × 240. A higher image resolution would give more precise results, but the required transmission speed cannot be reached. The experiment uses 120 frames; five images are automatically chosen from the first 40 frames for calibration with the method mentioned above, taking into consideration the angle signal of the tracker. Following this, pose estimation can be done. We chose five frames randomly (see Figure 12), and the result of pose estimation is given in Table 1 and Figure 14.
When detecting the target in the other 80 frames, errors occurred in five frames, which gives a detection accuracy of 93.75%; the processing speed reaches 25 frames per second. The average absolute deviation from the measurements of the magnetic tracker is as follows:
Detection errors occur when the angle between the optical axis of the camera and the target plane is large, or when the images of the target are too seriously distorted. However, autonomous landing is not significantly affected. During autonomous landing, the UAV moves at low speed and low acceleration, so gross errors can be removed by checking motion continuity. In addition, because the refresh rate of the system is high, small errors do not affect the UAV for long and can therefore be ignored.
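The motion continuity check mentioned above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes that each frame yields an (x, y, z) position estimate in millimeters and that a hypothetical limit `max_step` bounds how far the slowly moving vehicle can travel between two consecutive frames; the function name is also hypothetical.

```python
import math


def reject_discontinuous(positions, max_step):
    """Replace position estimates that jump too far with None.

    positions -- list of (x, y, z) estimates, one per frame
    max_step  -- hypothetical largest plausible displacement between two
                 accepted frames, in the same unit as the positions
    """
    filtered = []
    last_good = None
    for p in positions:
        if last_good is None or math.dist(p, last_good) <= max_step:
            filtered.append(p)
            last_good = p
        else:
            # The jump is inconsistent with slow, smooth motion: treat the
            # estimate as a detection error and keep the previous reference.
            filtered.append(None)
    return filtered
```

The sketch trusts the first estimate unconditionally, so a practical version would also need a way to reinitialize after several consecutive rejections.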
For protection from any danger, an important control rule is that, if any signal is lost or an error occurs, the UAV enters a protective mode in which it automatically holds its position and pose in the air.

CONCLUSION
The real-time vision-based method for guided autonomous landing of a rotor-craft unmanned aerial vehicle introduced here has two features:
1. It combines the design and identification of the target, which simplifies image processing and thus improves real-time performance and accuracy.
2. It adopts multiple-view vision technology, combined with an angle sensor, to calibrate the intrinsic parameters of the camera online, thus improving flexibility and practicality.
Although the method uses only linear computation, it is accurate enough to guide an autonomous landing of the UAV; theoretical analysis and experiment have also shown it to be robust.
Next, the method is to be debugged on the mini rotor-craft UAV; the cradle head is to be controlled, based on motion prediction, to track the landing target; and suitable lightweight, low-power hardware is to be sought to run the vision task on board the UAV.

Figure 14 The poses of the five frames presented in world coordinates.

Figure 1 Pose estimation algorithm in autonomous landing.

Figure 2 Pattern of target and order of corners.

Figure 3 Histogram of grey image.

Figure 4 Sample grey image.

Figure 5 After threshold segmentation.

Figure 6 After filtering by pixel number.

Figure 7 After filtering by width/length ratio.

Figure 8 Edge of pattern.

Figure 12 Five original images.

Figure 13 Five images after edge detection.

Table 1 Result of pose estimation
the angle signal of the tracker, and the calibration result is