A Master-Slave Surveillance System to Acquire Panoramic and Multiscale Videos

This paper describes a master-slave visual surveillance system that uses stationary-dynamic camera assemblies to achieve wide field of view and selective focus of interest. In this system, the fish-eye panoramic camera is capable of monitoring a large area, and the PTZ dome camera has high mobility and zoom ability. In order to achieve the precise interaction, preprocessing spatial calibration between these two cameras is required. This paper introduces a novel calibration approach to automatically calculate a transformation matrix model between two coordinate systems by matching feature points. In addition, a distortion correction method based on Midpoint Circle Algorithm is proposed to handle obvious horizontal distortion in the captured panoramic image. Experimental results using realistic scenes have demonstrated the efficiency and applicability of the system with real-time surveillance.


Introduction
Digital video surveillance is now commonly used in public and private places such as government buildings, military bases, car parks, and banks. Traditional monitoring cameras can only cover a limited area, leading to "blind spots." Developments in panoramic imaging technology offer significant advantages over traditional surveillance systems. Panoramic cameras can monitor an area that covers 180° or 360° and so replace several traditional cameras. Nevertheless, images captured by panoramic cameras have a limited range of scale due to their relatively low resolution. By comparison, PTZ dome cameras can focus on areas of interest rapidly by decreasing or increasing focal length. The master-slave camera composed of a fish-eye panoramic camera and a PTZ dome camera combines the advantages of both (Figure 1). The fish-eye panoramic camera is responsible for acquiring global, wide images of the large surveillance area, and the PTZ dome camera is used to acquire multiscale videos for more detailed information. Figure 2(b) demonstrates the internal structure of the proposed system, where a panoramic camera acts as the master camera and is mounted directly above the traditional PTZ dome camera (Figure 2(a)). Upgrading from existing surveillance systems to a master-slave system is simple (Figure 2(c)). This system can automatically direct one or more slave PTZ dome cameras to zoom into target areas of interest, in which details of object appearance are available at a higher resolution.
Different master-slave camera systems use various compositions of PTZ dome cameras and mixtures of other types of camera. Regardless of which camera types make up the system, the critical technique is to develop a suitable calibration algorithm for accurate interaction between the cameras. Through calibration, we hope that a target designated in the master camera can steer the slave camera to focus on the same position at the pixel level. The simplest and most direct way to precisely calibrate two cameras is to manually find every pixel in an image captured by one camera and match it to the corresponding pixel in an image captured by the other camera. Dense mapping such as this is impractical and seriously limits applicability. In practice, fewer points are sampled and the remaining mapping is interpolated to some degree of accuracy. However, the resulting level of accuracy is often not acceptable. It has been shown experimentally that calibration based on 200 sample points may take several hours. Hence, a practical calibration method is required for master-slave camera systems.
Figure 2: (a) The master-slave visual surveillance system. (b) The internal structure of the proposed system. (c) Most existing surveillance systems can be upgraded to the master-slave system by adding a fish-eye camera above the traditional PTZ camera, or adding a PTZ dome camera underneath the panoramic camera.
To ensure the panoramic camera captures the same scene content that the PTZ camera covers, the fish-eye camera is fixed inclining towards the gravity direction (Figures 3(a)-3(c)). However, this design brings serious image distortion in the horizontal direction (Figure 3(d)). So, an additional challenge for the proposed system is to find a suitable image correction to handle these distortions.
Our contributions over existing competing systems are twofold: (1) in terms of camera calibration, an efficient and accurate calibration method is proposed to accomplish the calibration between stationary and dynamic cameras. This method does not require specific camera setup or a particular grid pattern; (2) in terms of fish-eye distortion correction, our technique correctly handles the particular type of distortion introduced in fish-eye panoramic images. By adjusting the values of interrelated parameters, the extent of the distortion can be controlled. Moreover, the proposed algorithm can be applied to an embedded camera platform without any extra hardware resources due to its low computational cost.
The remainder of the paper is organized as follows. Section 2 reviews related work. Section 3 introduces the calibration method between panoramic and PTZ dome cameras. Section 4 describes the proposed distortion correction algorithm. In Section 5, experiments are implemented and the experimental results are shown. Finally, Section 6 concludes this paper.

Related Work
As mentioned, the main challenge in the application of the proposed system is to actively control a PTZ dome camera to correctly focus on the same target in the panoramic scene. The precision of this interaction largely depends on the accuracy of the spatial calibration, which can be considered as the mapping between each pixel in the fish-eye panoramic image and the pan-tilt angles of the PTZ dome camera. A practical calibration method should not only be efficient and effective but also need no particular system setup or human intervention. The proposed system also suffers from horizontal distortion when the master camera is mounted at an angle towards the gravity direction. Here we start by reviewing the calibration methods and approaches used in master-slave camera systems. Current calibration methods can be divided into two main categories: geometry calibration and data fitting calibration. Geometry calibration obtains the mapping relationship through single-camera calibration and dual-camera joint calibration. However, it requires a priori knowledge about the camera imaging model and the geometric environment. Sato et al. [1] presented an indoor monitoring system with multiple camera units, which includes a panoramic camera and a PTZ camera. They calculated the PTZ camera rotation angle corresponding to the point ( , ) on the panoramic image using the following equation: where and are the distances between panoramic camera and PTZ camera in the horizontal and vertical direction respectively.
is the height from PTZ camera to the ground, and is the height from PTZ camera's bottom to the ground; and are the height and width of panoramic image. This simple method has large error because the PTZ rotation angle is calculated through the position of the point in the panoramic image and the relative shift of the PTZ rotation angle. Scotti et al. [2] employed a master-slave camera in which the optical axis of the catadioptric panoramic camera coincides with the horizontal axis of the PTZ camera. They approximately assumed that the optical centers of the two cameras overlap with each other due to the close installation. As the polar angle of a pixel in the panoramic image equals the corresponding horizontal rotation angle of the PTZ camera, this simplifies the calculation greatly. It also reduces the error of the horizontal rotation angle which is brought by the change of the optical center with the PTZ's movement. However, the method assumes that objects in the scene are located on the same ground plane. This hypothesis is invalid in most practical circumstances. The distance between the PTZ camera and the ground also has to be set manually, which complicates the installation procedure. Geometry calibration generally relies on two priors and one hypothesis, so the mapping relationship largely depends on both the accuracy of the priors and the validity of the hypothesis. By contrast, data fitting calibration models the relationship between the panoramic coordinates and the PTZ rotation angle by fitting the sample points. Both the camera imaging model and the relative position of the two cameras can be ignored, so this type of method is more flexible. Hampapur et al. [3] triangulated a position using two or more calibrated cameras and determined the steering parameters for a third PTZ camera which is also calibrated. Chen et al. [4] proposed a versatile method for a variety of cameras. However, their method comes at the cost of reduced accuracy.
They sampled the pixel coordinates ( , ) in the panoramic image and the corresponding PTZ rotation angle and sought the best fitting polynomial to describe their relationship. Tan [5] proposed a mapping method based on image piecewise fitting to obtain an improvement. Different polynomials were used according to the distortion degree in different areas. Nevertheless, the accuracy does not meet requirements. Senior et al. [6] used a master-slave camera that is composed of a fixed super wide-angle camera and a PTZ camera. They selected several sample points in the FOV of the super wide-angle camera and determined the mapping relationship of non-sample points by linear interpolation. Zhou et al. [7] selected a number of pixel locations in a static camera. For each pixel, they manually moved the slave camera to center the slave image on it and recorded the corresponding slave pan-tilt angles to obtain a lookup table, which links the static camera coordinates with the pan and tilt angles. Although their method is accurate enough to initialize the track of the dynamic camera, it is time consuming and inconvenient. You et al. [8] employed a mosaic image created from snapshots of the slave camera to estimate the relationship between the static master camera plane and the pan-tilt controls of the slave camera. Compared with other approaches, this solution provides an efficient and automatic way to calibrate a master-slave system. Nevertheless, the mapping determined by linear interpolation is inaccurate.
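The lookup-table style of calibration reviewed above (a sparse set of sample pixels with recorded pan-tilt angles, and interpolation for all other pixels) can be illustrated with a minimal sketch. It assumes the sample pixels lie on a regular grid and that pan angles do not wrap past 360° inside a grid cell; the function name and table layout are hypothetical and are not taken from the cited works.

```python
from bisect import bisect_right


def interpolate_pan_tilt(xs, ys, table, x, y):
    """Bilinearly interpolate (pan, tilt) at pixel (x, y).

    xs, ys : sorted sample coordinates along each image axis (assumed grid).
    table  : table[j][i] is the (pan, tilt) recorded at pixel (xs[i], ys[j]).
    """
    # Locate the grid cell containing (x, y), clamping to the table borders.
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    j = min(max(bisect_right(ys, y) - 1, 0), len(ys) - 2)
    tx = (x - xs[i]) / (xs[i + 1] - xs[i])
    ty = (y - ys[j]) / (ys[j + 1] - ys[j])

    result = []
    for k in range(2):  # k = 0 -> pan, k = 1 -> tilt
        top = (1 - tx) * table[j][i][k] + tx * table[j][i + 1][k]
        bottom = (1 - tx) * table[j + 1][i][k] + tx * table[j + 1][i + 1][k]
        result.append((1 - ty) * top + ty * bottom)
    return tuple(result)
```

The accuracy of such a table depends directly on how densely it is sampled, which is the trade-off between manual effort and precision discussed above.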
In terms of fish-eye panoramic image distortion correction, Devernay and Faugeras [9] assumed the presence of straight lines in the scene. Distortion parameters are sought which lead to lines being imaged as straight in the corrected image. Kannala and Brandt [10] proposed a novel calibration method for fish-eye lens cameras that was based on viewing a planar calibration pattern. This method was proven suitable for different kinds of omnidirectional cameras as well as for conventional cameras. Wang et al. [11] presented a new model of camera lens distortion that utilized two angular parameters and two linear parameters. These parameters were used to determine the transform from an ideal plane to the real sensor array plane, which governs the lens distortion. Yu [12] proposed a lens geometric and photometric distortion correction method to obtain a high quality image. By using a simplified camera calibration technique, the lens geometric coefficients can be estimated. Photometric distortion was corrected using a nonlinear model fitting of a proposed photometric distortion model function. Ying et al. [13] used a spherical perspective projection model to calibrate fish-eye lenses. Based on the straight line spherical perspective projection constraint, the mapping between a fish-eye image and its corresponding spherical perspective image was determined. Once the mapping is obtained, the fish-eye lenses can be calibrated. Since orthographic spherical perspective projection was employed, these algorithms can only be applied to orthographic fish-eye cameras but not to equidistant fish-eye cameras [14,15]. Li et al. [16] presented an embedded real-time fish-eye image distortion correction algorithm, which can be applied in an IP network camera. However, this algorithm only aimed to correct the distortion in the vertical direction. Moreover, methods that adapted the projection to content in the scene were also presented [17-19]. However, these methods require human intervention, and the corrected image has to be cropped.
Most previous distortion correction research focuses on constructing and calculating the internal reference model, which can express the mapping between the three-dimensional world and the two-dimensional image. Based on the internal reference model, the distorted image is mapped onto a three-dimensional spherical surface or parabolic surface. By using perspective projection, the distortion can be corrected. However, these methods aim to correct the distortion of conventional fish-eye panoramic images, and few methods have been proposed to correct the particular type of distortion in the fish-eye panoramic image captured by a master-slave camera. The proposed system draws inspiration from the Midpoint Circle Algorithm (MCA) [16] and applies this algorithm to correct these distortions.

Calibration
Through a specific calibration, the target appointed by master camera can steer the slave camera to focus on the same position. To achieve this goal, the core technique is to determine the geometric relationship between the master camera image pixel coordinates and the pan-tilt angles of the slave camera.

Analysis of Coordinate Systems.
The basis of calibration is to establish three coordinate systems: a panoramic coordinate system, a PTZ coordinate system, and a spherical coordinate system. As a master-slave camera is composed of a fish-eye panoramic camera and a PTZ dome camera, the panoramic coordinate system is a fish-eye coordinate system based on the fish-eye panoramic images, the PTZ coordinate system is a coordinate system based on the PTZ camera taking pan angle and tilt angle as parameters, and the spherical coordinate system is an auxiliary coordinate system used to transform from the panoramic coordinate system to the PTZ coordinate system. Here we set ( , ) as the pixel positions in the panoramic image. As shown in Figure 4(b), the PTZ coordinate system contains two parameters. is a point on the surface of the sphere. represents the pan angle between and positive -axis. It increases in the anticlockwise direction viewed from positive -axis and ranges from 0° to 359°.
is the tilt angle that ranges from 0° to 89°. It is the angle between and the XOY plane, which increases in the clockwise direction viewed from positive -axis. The PTZ coordinate system is defined within a hemisphere on the XOY plane with the -axis pointing downwards. Figure 4(c) illustrates the unit spherical coordinate system. The origin and every axis correspond to those of the PTZ dome coordinate system. The coordinate of point is denoted by ( , , ). is the angle between and positive -axis which increases in the anticlockwise direction viewed from positive -axis and ranges from 0° to 359°. is the angle between and positive -axis which increases in the anticlockwise direction viewed from positive -axis and ranges from 0° to 89°.
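The relation between pan-tilt angles and a point on the unit sphere can be sketched as follows. This is only an illustration under stated assumptions: pan is measured in the XOY plane from the positive x-axis, tilt is measured from the XOY plane towards the z-axis (which points downwards), and the axis names and function names are hypothetical.

```python
import math


def pan_tilt_to_unit_vector(pan_deg, tilt_deg):
    """Map (pan, tilt) in degrees to a point on the unit sphere.

    Assumed convention: pan from the +x axis in the XOY plane,
    tilt from the XOY plane towards the +z (downward) axis.
    """
    pan = math.radians(pan_deg)
    tilt = math.radians(tilt_deg)
    return (
        math.cos(tilt) * math.cos(pan),
        math.cos(tilt) * math.sin(pan),
        math.sin(tilt),
    )


def unit_vector_to_pan_tilt(x, y, z):
    """Inverse mapping: unit-sphere point back to (pan, tilt) in degrees."""
    pan = math.degrees(math.atan2(y, x)) % 360.0   # keep pan in [0, 360)
    tilt = math.degrees(math.atan2(z, math.hypot(x, y)))
    return pan, tilt
```

Because every (pan, tilt) pair maps to a unique point on the hemisphere, the spherical system can serve as the common intermediate representation between the two cameras.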

Transformation between Coordinate Systems.
When the transformation from the panoramic coordinate system to the PTZ coordinate system is obtained, we can determine the mapping relationship between each pixel in the image captured by the fish-eye camera and the image captured by the PTZ camera. However, the transformation cannot be calculated directly in a single step. We need to use the spherical coordinate system to link these two coordinate systems. The process is completed in three steps. Firstly, assume ( , , 1) to be any pixel with homogeneous coordinates in the panoramic coordinate system with a corresponding spherical homogeneous coordinate ( , , , 1). Secondly, set the mapped homogeneous coordinate to be ( , , , 1). In the PTZ coordinate system, its corresponding spherical homogeneous coordinate is ( , , , 1). Finally, establish a mapping relationship between ( , , , 1) and ( , , , 1) as follows: is a 4 × 4 matrix which represents the transformation from a panoramic spherical coordinate to a PTZ dome spherical coordinate. Before the transformation, there are two constraints for the master-slave camera system. Firstly, the camera's optical axis is perpendicular to the image plane and the point of intersection is at the center of the image plane. Secondly, the fish-eye panoramic camera's optical center and the PTZ dome camera's optical center are in the same vertical plane. If these two constraints are not satisfied, large errors may result. The mapping is obtained through a process of transformation among coordinates:
(1) Transformation from panoramic coordinate system to spherical coordinate system. As shown in Figure 3(a), we can obtain the distance between point and the center. Based on the fish-eye imaging model, the radial angle of point off-center is = . According to and , the -axis of the point can be calculated. So the transformation is = , is the normalization constant to ensure √ 2 + 2 + 2 = 1.
(2) Transformation from PTZ coordinate system to spherical coordinate system. As the PTZ dome camera model is a hemisphere model, the PTZ coordinate system is identical with the spherical coordinate system. The transformation can be written as  According to the transformation using the two steps described above, the final transformation can be obtained.
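Step (1), the transformation from a panoramic pixel to a point on the unit sphere, can be sketched as below. The sketch assumes an equidistant fish-eye model (radial distance proportional to the off-axis angle), a principal point at the image center (cx, cy), and a focal length f in pixels; these parameter names and the choice of the equidistant model are assumptions made for illustration, not details confirmed by the text above.

```python
import math


def fisheye_pixel_to_unit_vector(u, v, cx, cy, f):
    """Map a fish-eye pixel (u, v) to a unit-sphere point.

    Assumes the equidistant model r = f * theta, with the optical axis
    along +z and passing through the principal point (cx, cy).
    """
    dx, dy = u - cx, v - cy
    r = math.hypot(dx, dy)          # distance from the image center
    if r == 0.0:
        return (0.0, 0.0, 1.0)      # pixel on the optical axis
    theta = r / f                   # off-axis angle under the assumed model
    scale = math.sin(theta) / r     # normalizes (x, y, z) to unit length
    return (dx * scale, dy * scale, math.cos(theta))
```

The `scale` factor plays the role of the normalization constant mentioned in step (1): it guarantees that the returned point lies on the unit sphere.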

Calibration Theory between Master and Slave Camera.
( , ) are the coordinates of pixel in the panoramic coordinate system. ( , , ) are the coordinates of the corresponding point in the PTZ coordinate system. The mapping can be determined by finding a matrix that makes the optical centers of the two cameras coincide with each other. Namely, a panoramic image can be considered as a large PTZ dome image. A PTZ dome camera satisfies the theory of pinhole imaging. According to Tsai [20], the theory of pinhole camera imaging is illustrated in Figure 5(b), in which̃= . In the proposed system, a fish-eye panoramic camera is mounted above a PTZ dome camera and their optical centers are on the same vertical plane. Here is the optical center of the panoramic camera and is its principal point. is the optical center of the PTZ dome camera and is its principal point. is a point in the panoramic coordinate system and is the corresponding object point in the world coordinate system. We project the PTZ dome image onto the panoramic image. According to pinhole imaging theory, if the PTZ dome camera shoots at , will coincide with , which is the result we hope to obtain. In this work, the mapping is calculated through a process of selecting sampling points. According to the algorithm presented by Zhang [21], at least 3 PTZ dome images must be selected to solve the transformation matrix.

Feature Points Matching.
This step requires a method to detect and match visual features that is robust to changes in scale, rotation, viewpoint, and lighting. The Scale Invariant Feature Transform (SIFT) [22] performs well under these requirements. In this work, we employ SIFT to detect feature points in both the panoramic image and the PTZ dome images. is the feature point matrix in the th PTZ dome image.
is the matching feature point matrix in the panoramic image: Here is the center point spherical coordinate matrix in the PTZ dome image.
is the matching point matrix in the panoramic image: . The calibration can be considered as calculating the transformation matrix. As a result, for any given pixel, we can calculate the corresponding rotation angle. The feature points matching step is employed to solve the unknown in the transformation matrix .
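Once matched points are available in homogeneous spherical coordinates for both cameras, the 4 × 4 transformation matrix can be estimated from them. The sketch below uses an ordinary least-squares fit solved through the normal equations; this solver is an assumption chosen for illustration (the estimation procedure actually used is not spelled out above), and the function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_transform(src, dst):
    """Least-squares 4x4 matrix M such that M @ src[i] ~= dst[i].

    src, dst: lists of homogeneous 4-vectors (x, y, z, 1) for matched
    points in the panoramic and PTZ spherical systems, respectively.
    """
    # Normal equations, shared by all four rows of M: (sum p p^T) m_k = sum p q_k
    ata = [[sum(p[i] * p[j] for p in src) for j in range(4)] for i in range(4)]
    rows = []
    for k in range(4):
        atb = [sum(p[i] * q[k] for p, q in zip(src, dst)) for i in range(4)]
        rows.append(solve_linear(ata, atb))
    return rows
```

At least four matched points that are not coplanar are needed for the normal equations to be solvable; using more points averages out individual matching errors.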

Distortion Correction
As mentioned by Strand and Hayman [23], a straight line in world coordinates can be projected to a corresponding circle on the fish-eye image plane, which means that the mapping process can be calculated directly using MCA [16]. The proposed distortion correction method can be divided into two parts. Firstly, the coordinate mapping is calculated based on MCA between a column of the first corrected image and the arc line of the original fish-eye panoramic image in the vertical direction, which is named Vertical Correction Image (VCI) in this work. Secondly, following the same principle, the coordinate mapping between a row of the second corrected image and the arc line image of VCI in horizontal direction is calculated, which is named Horizontal Vertical Correction Image (HVCI) here. Figures 6(a) and 6(b) demonstrate the structure of captured fish-eye panoramic image and VCI in vertical direction, whose width and height both are and . Figures 6(c) and 6(d) show the VCI and HVCI in the horizontal direction.
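For reference, the standard midpoint circle algorithm that the correction builds on can be sketched as follows. It generates the integer pixels of one octant of a circle using only integer additions, which is why it suits embedded platforms. This is the textbook algorithm, not the paper's own column-to-arc mapping; the function name is hypothetical.

```python
def midpoint_circle_octant(radius):
    """Integer points of a circle centered at the origin, from (0, r)
    clockwise until x exceeds y (one octant), via the midpoint test."""
    x, y = 0, radius
    d = 1 - radius            # decision variable at the first midpoint
    points = []
    while x <= y:
        points.append((x, y))
        if d < 0:
            d += 2 * x + 3            # midpoint inside: keep y
        else:
            d += 2 * (x - y) + 5      # midpoint outside: step y inwards
            y -= 1
        x += 1
    return points
```

The remaining seven octants follow from symmetry by swapping and negating the coordinates of each returned point.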
For vertical distortion correction, the first column in VCI corresponds to the first arc line in the fish-eye panoramic image. For an arbitrary point on , there is a corresponding point on . Since the distortion is corrected only in the vertical direction, the ordinate value of point is the same as that of its corresponding point . In the fish-eye panoramic image, does not project to VCI, so its length is called the Vertical Distortion Redundancy Length (VDRL).
For an arbitrary column in VCI, assume that its corresponding arc line image in the fish-eye panoramic image is . The point ( , ) on has its corresponding point ( , ) on . Based on MCA, for an arbitrary point ( , ) on , assume that its corresponding point on is ( 0 , 0 ). Thus the corresponding point position can be calculated as follows: where = /2, = + (( − 2 )/ ), and = − . The extent of the distortion correction can be controlled by the value of VDRL. The distortion correction in the horizontal direction follows the same logic. In HVCI, an arbitrary row has its corresponding arc line in the VCI, such as the first row line image and its corresponding first arc line . As with VDRL in the fish-eye panoramic image, there is also a Horizontal Distortion Redundancy Length (HDRL) in the VCI, which is the length of | |. Because the distortion is corrected only in the horizontal direction, the abscissa value of a point in HVCI is the same as that of its corresponding point in VCI. For an arbitrary row line in HVCI, assume that its corresponding arc line in VCI is . A point ( , ) on has its corresponding point ( 1 , 1 ) on . The coordinate values of point can be derived as follows: where = /2 and = + (( − 2 )/ ). By traversing the HVCI, all positions can be mapped to corresponding points in VCI. Thus, the distortion in the horizontal direction can be corrected, and the extent of distortion correction can also be controlled by adjusting the value of HDRL. After correcting the distortion in the vertical and horizontal directions, the distortion in the fish-eye panoramic image captured by the master-slave camera can be corrected effectively and efficiently.

Experiments and Results
With regard to the time consumption of the proposed calibration method, the process can be considered as two steps: image capturing and image matching. Capturing one image takes roughly 2 seconds. For one pair of images, the image matching procedure takes about 5 seconds. In this circumstance, the entire process only takes 50 seconds. Compared with traditional manual calibration, which normally consumes several hours, the proposed automatic calibration algorithm increases the efficiency dramatically.
For the accuracy evaluation, Figure 7 demonstrates the images captured by the proposed master-slave camera. Here we focus on four areas of the panoramic image which include distinctive features. Figure 8 shows the mean error between the requested pixel ( , ) and the ground truth pixel ( , ) to evaluate the distribution of error.
In Figure 8, the dark blue area represents 0∼35 pixels, the green area represents 35∼70 pixels, and the red area represents 70∼105 pixels. In this work, the size of the PTZ dome image is 1920 × 1080 pixels. The figure shows that the error can be kept within 1° over 95% of the scene. However, three main factors could influence the accuracy during the calibration procedure. Firstly, the PTZ dome camera only has an accuracy of 1° rather than 0.1°. Under this condition, 3.5 pixels is equivalent to 1°, which greatly reduces the centering accuracy. Secondly, the positions of the selected PTZ dome images matter: if the 6 chosen PTZ dome images were uniformly distributed over the panoramic image, the red area would not appear in Figure 8. Lastly, the feature point matching process and the number of matched points directly influence the result of calibration.
Figure 9 shows the images after distortion correction with different HDRL values, and the correction results are marked by red lines to make them easier to see. The distortion in Figure 9(a) is corrected only slightly, with no noticeable effect. Although the distortion correction result in Figure 9(b) is closer to the realistic scene, it is still far from ideal, while in Figure 9(c), where the HDRL value is 120, the distortion in the horizontal direction is corrected to a desirable degree. When the HDRL value is 140 (Figure 9(d)), the setting is obviously too large and causes overcorrection. It is worth mentioning that the HDRL needs to be tuned experimentally to obtain a relatively ideal value since it differs between panoramic images.
Table 1 compares the time consumption of the proposed algorithm on images of different resolutions with that of the algorithm proposed by Ying et al. [13]. The comparison was conducted with a single-threaded implementation on a computer with a 3.20 GHz Intel Core i5-3470 CPU and 4.00 GB RAM, using Microsoft Visual Studio 2008. The result shows that the proposed algorithm consumes much less time than Ying's algorithm at all tested resolutions. Moreover, the computational resource consumption of Ying's algorithm and the proposed algorithm is also listed, which includes the utilization of CPU and memory. As shown, the proposed correction method requires fewer resources, which enables the proposed surveillance system to achieve real-time performance.

Conclusion and Discussion
A master-slave camera system that is composed of panoramic and PTZ dome cameras is proposed for stationary and dynamic visual surveillance. A panoramic camera observes a scene with a large field of view, and PTZ dome cameras simultaneously capture high-resolution images with multiscale information. One such camera can cover an area of roughly 2 square kilometers, which makes it especially suitable for large-area surveillance such as squares and stadiums. More specifically, we present a calibration method for obtaining the mapping relations between the master camera and the slave camera. The availability and accuracy of the method are validated by the experiments shown in this paper. Additionally, we propose a correction approach for the particular type of distortion in the fish-eye panoramic image captured by this camera system. It has been applied on an embedded camera platform without any extra hardware resources due to its low computational cost. To achieve more precise interaction, future work will consider a calibration method based on panoramic image mosaics to obtain the pixel-level mapping relation between the fish-eye image and the PTZ camera's motion parameters.