Performance Improvement for Two-Lens Panoramic Endoscopic System during Minimally Invasive Surgery

One of the major challenges for Minimally Invasive Surgery (MIS) is the limited field of view (FOV) of the endoscope. A previous study by the authors designed a MIS Panoramic Endoscope (MISPE) that gives the physician a broad field of view, but this approach is still limited in terms of performance and quality because it encounters difficulty when there is smoke, specular reflection, or a change in viewpoint. This study proposes a novel algorithm that increases the MISPE's performance. The method calculates the disparity for the region that is overlapped by the two cameras to allow image stitching. An improved evaluation of the homography matrix uses a frame-by-frame calculation, so the stitched videos are more stable for MIS. The experimental results show that the revised MISPE has a FOV that is 55% greater and that the system operates stably in real time. The proposed system allows a frame rate of 26.7 fps on a single-CPU computer. The proposed stitching method is 1.55 times faster than the previous method, and the stitched image that it produces is as similar to the ground truth as that for the SURF-based stitching method that was used in the previous study.


Introduction
Minimally Invasive Surgery (MIS) is becoming an increasingly preferred alternative to traditional open surgery because it involves less blood loss, less postoperative pain, a faster recovery time, and less scarring. However, one of the major obstacles for MIS is its limited FOV. MIS is difficult to implement because of its nonintuitive nature, so less experienced physicians face greater risks during MIS procedures.
To address the limited FOV of the current endoscope, we consider other studies that proposed panoramic endoscopes with special designs. One example is the endoscope developed by Yamauchi et al. [1], the first group to design an endoscope that produces a wider view of an image with the help of an image-shifting prism. In another study, Roulet et al. [2] proposed a 360° endoscope design that employs a panomorph lens to produce wide-view images. Recently, a panoramic endoscope that uses convex parabolic mirrors was developed by Tseng and Yu [3]. These studies are important because they increase the viewing angle of the surgical images as well as the visible working area, which makes laparoscopic surgery safer. They focus on designing optical systems that apply multiple prisms, lenses, and mirrors to generate a larger surgical view. However, this approach also suffers from many issues, ranging from aberrations to blind zones, and image quality is often affected by noise or distortion.
To increase the image viewing angle, an image-stitching (mosaicking) technique from computer vision is used [4]. Many studies use this technique in MIS. Behrens et al. [5, 6] demonstrated the mosaicking of a sequence of endoscopic bladder images from a video. A global and local panoramic view for gastroscopy is proposed in [7]. In [8], a scene-adaptive feature-based approach was proposed for the mosaicking of placental vasculature images that are obtained during computer-assisted fetoscopic procedures. These studies perform image stitching using the movement of a monocular endoscope.
This yields only static panoramic images that do not reflect the changes that may occur in the shape of the organs or blood vessels outside the camera's FOV. Takada et al. [9, 10] proposed a hybrid tracking and matching algorithm for mosaicking multiple surgical views. They combined feature-based image registration and an optical-flow tracking algorithm to evaluate the homography matrices for frames acquired from different trocar-retractable cameras.
This approach improves speed and robustness over the purely feature-based method for cases in which the overlap is small. However, it can produce erroneous results if the tracking process fails.
Previous studies by the authors proposed a MIS panoramic endoscope (MISPE) that gives physicians a broad field of view [11] using a feature-based image-stitching algorithm. Our MISPE has two lenses that are mounted on its tip with a tunable distance between them; the shortest distance is 5 mm. The endoscope is connected to a PC via USB so that it can simultaneously capture the surgical images. A video-stitching module then stitches the images to give physicians a wide view. The stitched image reflects every transformation that occurs in the FOVs of the two lenses mounted on the endoscope. Figure 1 shows a schematic diagram of this MISPE system. However, this feature-based image-stitching approach does not perform well for MIS, which is often affected by smoke, changes in viewpoint, and specular highlights. These problems are a function of the tissue characteristics, the proximity of the light source, and the proximity of the cameras. In such situations, the distribution of features in the images is ambiguous and the precision with which feature pairs are matched decreases, so few features are detected or they are unevenly distributed. This results in failed image registration and less accurate stitching. Figure 2 shows the stitching result for the Speeded-Up Robust Features- (SURF-) based stitching method that was used in the previous study. Figure 2(b) shows an erroneous patch because there are only a few common feature points.
In the previous study, video stitching was performed using a frame-by-frame evaluation of a homography matrix. This approach ensures correct stitching when the two cameras move toward or away from the surgical area. However, it also changes the shape of the stitched images, because there can be a significant change in the homography matrix even when the cameras are fixed: the matching points change frequently because of environmental factors such as brightness, smoke, and specular reflection. Figure 3 shows the changes in the dark area between Figures 3(a) and 3(b). This produces unstable stitched images in a video, which distract the physician while observing the video during surgery. The change in the homography matrices between consecutive frames must therefore be smooth when stitching videos. This study proposes several improvements in the revised MISPE system that address these issues of the previous MISPE system. A new algorithm based on calculating the disparity map is used to stitch images, and the speed, quality, and stability of video stitching are increased. The remainder of this paper is organized as follows. Section 2 presents the proposed image-stitching algorithm. Section 3 presents the proposed video-stitching algorithm.
The experimental results are presented and discussed in Section 4. Finally, conclusions are drawn in Section 5.

The Proposed Image-Stitching Algorithm
The image-stitching algorithm comprises two stages: image registration and image compositing [11].
Image registration is the most important element of the image-stitching process because it directly affects the accuracy of the stitching results. It involves searching for matching pixels (e.g., feature points) or objects in the two different camera views. However, searching the entire image is a complex process that requires high-performance computing, and images that are captured directly from the camera must also be corrected to account for lens distortion. Therefore, this study uses an image rectification technique [12] to transform the endoscope system into an aligned, undistorted configuration.
The search is then simplified to a one-dimensional problem along a horizontal line parallel to the baseline between the cameras. A video-stitching module then uses this aligned, undistorted configuration. The algorithm involves three steps, as shown in Figure 4. The details of these processes are described in the following subsections.

Rectifying Images.
This step corrects any distortion in the image that occurs due to the lens and aligns the two cameras onto one viewing plane, so that the pixel rows of the two cameras are exactly aligned with each other. This study uses Bouguet's algorithm [12] from the OpenCV library. To ensure precise rectification, 20 pairs of concurrent images of a 16 × 11 chessboard were captured at distances from 3 cm to 15 cm and at different angles. This step is performed offline. Figures 5(a) and 5(b) show the input images and the undistorted, rectified images with the corresponding pixels on the same horizontal (epipolar) line. When the rectification process is complete, the two rectified images are used for image stitching.

Image Registration.
Image registration matches points in two overlapping images to evaluate a homography matrix. For two rectified images, image registration involves two steps. Step 1. Compute the disparity map. The disparity map gives, for each pixel, the horizontal offset between the corresponding pixels in the two rectified images. This study uses the block-matching (BM) algorithm from OpenCV, as the StereoBM module, because it is fast and effective; it is similar to the algorithm developed by Konolige [13]. It uses a small sum-of-absolute-differences (SAD) window to find matching points in the left and right images, and the disparity is then calculated as the actual horizontal pixel difference.
A disparity map that is computed using StereoBM usually contains invalid values (holes), which are typically concentrated in uniform, textureless areas, half-occlusions, and regions near depth discontinuities. Therefore, Fast Global Smoothing (FGS) [14] is used as the post-filtering module in OpenCV to filter the disparity map. This module enables this type of post-filtering in real time on the CPU. Figure 6 shows two disparity maps that are computed using StereoBM: one that uses the left image as the reference (the left disparity map) and a second that uses the right image as the reference (the right disparity map). Left-right-consistency-based confidence [15] is then used to refine the disparity map (the refined disparity map) in half-occlusions and uniform areas.
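The paper relies on OpenCV's StereoBM. As an illustration of the underlying SAD block-matching principle (a Python/NumPy sketch, not the authors' C++ implementation; the window size, disparity range, and synthetic images are arbitrary choices for the example):

```python
import numpy as np

def sad_disparity(left, right, window=3, max_disp=8):
    """Brute-force SAD block matching: for each block in the left image,
    find the horizontal shift into the right image with the smallest sum
    of absolute differences. Illustrative only."""
    h, w = left.shape
    half = window // 2
    disp = np.zeros((h, w), dtype=np.int64)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            block = left[y - half:y + half + 1, x - half:x + half + 1]
            costs = [
                np.abs(block - right[y - half:y + half + 1,
                                     x - d - half:x - d + half + 1]).sum()
                for d in range(max_disp + 1)
            ]
            disp[y, x] = int(np.argmin(costs))
    return disp

# Synthetic rectified pair: the right view is the left view shifted 4 px
# to the left, so the true disparity of interior pixels is 4.
rng = np.random.default_rng(0)
left = rng.integers(0, 256, (20, 40)).astype(np.float64)
right = np.roll(left, -4, axis=1)
disp = sad_disparity(left, right)
```

OpenCV's StereoBM adds prefiltering, a uniqueness check, and heavy optimization on top of this basic search, which is why the paper can run it in real time.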
It is not necessary to use all of the matching pixels in the overlap region to evaluate the homography matrix, because the computational burden would be large. Therefore, this study proposes an ROI-grid method to determine the corresponding point pairs between the two rectified images that are used to evaluate the homography matrix.
A region of interest (ROI), in which the pixels' disparities are calculated, is defined within the overlapping part of the left rectified image. This study assumes that the minimum width of the overlapping area is 30% of the width of the rectified image, so the ROI is established as shown in Figure 7: a region at position A (7w/10, 5) with width (3w/10 − 5) and height (h − 10), where w and h are the width and height of the left rectified image. The points at the edge of the image are excluded because the disparity value for these points is often unstable.
This ROI is then divided into m × n grid cells. The pixel of peak intensity in each cell is used, and its corresponding point in the right rectified image is extracted from its disparity value as follows: a point (x, y) in the left rectified image with disparity d(x, y) corresponds to the point (x − d(x, y), y) in the right rectified image. This gives a set of (m × n) corresponding point pairs for the two overlapping rectified images. Because the homography matrix is a (3 × 3) matrix with 8 degrees of freedom (DoF), at least four corresponding point pairs are required to determine it, so the grid size (m × n) must ensure that the number of corresponding point pairs is not less than 4. This study uses a (9 × 24) grid. Figure 7 shows that this approach ensures that a large number of corresponding pairs are evenly distributed across the ROI, which makes stitching more accurate and more stable.
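The ROI-grid step described above can be sketched as follows (Python/NumPy rather than the paper's C++; the grid orientation, the use of the brightest pixel per cell, and the negative-value convention for invalid disparities are assumptions based on the text):

```python
import numpy as np

def roi_grid_pairs(left, disparity, m=9, n=24):
    """ROI-grid sketch: split the assumed overlap region of the left
    rectified image into m x n cells, take the brightest pixel of each
    cell, and pair it with (x - d, y) in the right image via the
    disparity map. Cells whose chosen pixel has an invalid (negative)
    disparity are skipped."""
    h, w = left.shape
    x0, y0 = 7 * w // 10, 5                    # ROI origin A(7w/10, 5)
    roi_w, roi_h = 3 * w // 10 - 5, h - 10     # ROI size per the paper
    pairs = []
    for i in range(m):
        for j in range(n):
            ys = y0 + i * roi_h // m
            ye = y0 + (i + 1) * roi_h // m
            xs = x0 + j * roi_w // n
            xe = x0 + (j + 1) * roi_w // n
            cell = left[ys:ye, xs:xe]
            dy, dx = np.unravel_index(np.argmax(cell), cell.shape)
            x, y = xs + dx, ys + dy
            d = disparity[y, x]
            if d >= 0:
                pairs.append(((x, y), (x - d, y)))
    return pairs

# Demo: a random left image with a constant disparity of 7 px everywhere.
rng = np.random.default_rng(1)
left = rng.random((240, 320))
disp = np.full((240, 320), 7)
pairs = roi_grid_pairs(left, disp)
```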
Because some mismatched pairs with invalid disparity values remain, the RANdom SAmple Consensus (RANSAC) algorithm [16] is used to determine the well-matched pairs (inliers) by removing the mismatched pairs (outliers). The homography matrix is then evaluated using the set of inliers and the least-squares method to give the least reprojection error [17].
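The combination of a least-squares homography fit and RANSAC outlier rejection can be illustrated with a toy direct linear transformation (DLT) estimator wrapped in a minimal RANSAC loop. This is a didactic Python/NumPy sketch, not the OpenCV routine the paper uses; the sample sizes, iteration budget, and test homography are invented for the example:

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares DLT: solve for the 3x3 H mapping src -> dst
    (in homogeneous coordinates) from >= 4 point pairs via SVD."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=np.float64))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def ransac_homography(src, dst, thresh=3.0, iters=200, seed=0):
    """Toy RANSAC: repeatedly fit H to 4 random pairs, keep the H with
    the most inliers (reprojection error < thresh), then refit on all
    inliers with least squares."""
    rng = np.random.default_rng(seed)
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    best_inliers = None
    for _ in range(iters):
        idx = rng.choice(len(src), 4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        proj = (H @ np.c_[src, np.ones(len(src))].T).T
        proj = proj[:, :2] / proj[:, 2:3]
        err = np.linalg.norm(proj - dst, axis=1)
        inliers = err < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return fit_homography(src[best_inliers], dst[best_inliers]), best_inliers

# Demo: 30 pairs consistent with a mild projective H, plus 6 gross outliers.
rng = np.random.default_rng(42)
H_true = np.array([[1.05, 0.02, 10.0], [-0.01, 0.98, -5.0], [1e-4, 0.0, 1.0]])
src = rng.uniform(0, 300, (36, 2))
hom = np.c_[src, np.ones(36)] @ H_true.T
dst = hom[:, :2] / hom[:, 2:3]
dst[30:] += np.array([50.0, -40.0])      # 6 mismatched pairs (outliers)
H_est, inliers = ransac_homography(src, dst)
```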

Image Compositing.
After image registration, the image-compositing stage yields wide-angle images. This step uses the same process that is described in a previous paper by the authors [11]. The graph-cut algorithm [18] is used to determine an optimal seam to eliminate the appearance of "artifacts" or "ghosting." The multiband blending method [19] is then used to smooth the stitching results. In summary, the proposed image-stitching algorithm comprises the six steps described in Algorithm 1.

The Proposed Video-Stitching Algorithm
The proposed image-stitching algorithm stitches video by stitching the images that are captured from the videos or cameras frame by frame. For practical applications, the proposed MISPE system has two requirements: stability and a fast processing time for stitching videos.

Stitching Video at Increased Speed.
For the proposed image-stitching algorithm, the two most time-consuming steps are computing the disparity map and determining the seam mask, especially for high-resolution images. Therefore, the proposed method accelerates the video-stitching process by reducing the time that is required for these two steps, as shown in Figure 8.
This study uses a downsizing technique to transform the processed images into low-resolution images using an image-resize function and the bilinear interpolation algorithm in OpenCV. The resizing-scale value is input manually to accelerate the process while maintaining the required image quality. To decrease the time that is required to produce the disparity map, the resizing-scale value (k1) is selected such that the rectified images are resized to a resolution of (320 × 240) in most of the experimental cases. The disparity map is then computed, and the homography matrix for the two resized, rectified images is calculated. Let this homography matrix be H_resize. Because the coordinates of a point in the resized image are proportional to the coordinates of that point in the original image with the resizing-scale factor k1, the homography matrix that transforms the two original rectified images onto the same plane is H = S^(-1) · H_resize · S, where S = diag(k1, k1, 1). To decrease the time that is required to determine the seam mask, a second resizing-scale value (k2) is used to transform the two warped images into two low-resolution (64 × 48) resized, warped images. The two seam masks are then calculated using the graph-cut algorithm. The masks are then resized to the original resolution, and the warped images are blended. For this study, the resizing-scale value (k2) can be adjusted to ensure that the quality of the seam estimation is high.
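The rescaling of the homography from the low-resolution images back to full resolution amounts to a similarity conjugation. A minimal sketch, assuming the convention that k1 maps original coordinates to resized coordinates (p_resized = k1 · p):

```python
import numpy as np

def upscale_homography(H_resize, k1):
    """Map a homography estimated on images downscaled by the factor k1
    back to full resolution: with S = diag(k1, k1, 1) taking original to
    resized coordinates, H = S^-1 @ H_resize @ S."""
    S = np.diag([k1, k1, 1.0])
    return np.linalg.inv(S) @ H_resize @ S

# Sanity check: a translation of (8, -3) px in the half-size images must
# become a translation of (16, -6) px at full resolution.
k1 = 0.5
H_resize = np.array([[1.0, 0.0, 8.0],
                     [0.0, 1.0, -3.0],
                     [0.0, 0.0, 1.0]])
H = upscale_homography(H_resize, k1)
```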

Increasing the Stability of the Stitched Video.
The proposed method rectifies two frames that are captured from the two cameras and computes the disparity map. The matching point pairs in the two rectified frames are then determined using the proposed ROI-grid method. A RANSAC algorithm is used to determine the well-matched pairs (inliers) and to remove the mismatched pairs (outliers) using a RANSAC threshold; this study uses a RANSAC threshold of 3.0. Assume that there are N inliers, {(p_1, q_1), (p_2, q_2), . . ., (p_N, q_N)}, in the two frames. The mean reprojection error of a 3 × 3 homography matrix H for the set of N inliers is defined as ME(H) = (1/N) Σ_{i=1..N} d(p_i, H·q_i), where H is a 3 × 3 homography matrix, p_i and q_i are the corresponding homogeneous coordinates of the i-th pair in the N inliers, and d(p_i, H·q_i) is the Euclidean distance between the points p_i and H·q_i. Let H_prev be the homography matrix for the previous two frames. The mean reprojection error for H_prev on the set of N inliers is ME(H_prev) = (1/N) Σ_{i=1..N} d(p_i, H_prev·q_i). The error ME(H_prev) is large when the cameras move significantly, because the coordinates of the inliers change significantly. The error is small when the cameras are almost fixed. If this error is sufficiently small, it is possible to use H_prev to stitch the two current frames.

Input: two input images
(1) Rectify the two input images into two rectified images
(2) Compute the disparity map using StereoBM and the FGS filter in OpenCV
(3) Establish a homography matrix using the proposed ROI-grid method and a RANSAC algorithm
(4) Transform the right rectified image onto the plane of the left rectified image using the estimated homography matrix
(5) Determine an optimal seam to prevent "ghosting" in the stitched image using the graph-cut technique
(6) Render the panorama using the multiband blending method
Output: panoramic image
Algorithm 1: Proposed image-stitching algorithm.
Journal of Healthcare Engineering
Therefore, the proposed algorithm compares ME(H_prev) with a specified threshold value to reduce the variation in the homography matrix, which increases the stability of the stitched videos.
For the two initial frames, the homography matrix is calculated by minimizing the cost function ME(H) over the set of inliers in the two initial frames.
For subsequent frames, the error ME(H_prev) is calculated using the set of inliers in the current frames. If the error is less than a specified threshold value, H_prev is used to stitch the two current frames. If the cameras move by a significant amount, the homography matrix is recalculated by minimizing the cost function ME(H) over the set of inliers in the two current frames. This study uses a threshold value of 2.0. The detailed procedure is described in Algorithm 2.
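The frame-to-frame decision described above can be sketched as follows (Python/NumPy; `fit_translation` is a hypothetical stand-in for the paper's least-squares homography fit, used only so the demo is self-contained):

```python
import numpy as np

def mean_reproj_error(H, p, q):
    """ME(H) = (1/N) * sum_i d(p_i, H q_i): the mean Euclidean distance
    between the left-frame inliers p_i and the projected right-frame
    inliers H q_i."""
    qh = np.c_[q, np.ones(len(q))] @ H.T
    proj = qh[:, :2] / qh[:, 2:3]
    return float(np.linalg.norm(proj - p, axis=1).mean())

def fit_translation(p, q):
    """Stand-in estimator for the demo: a translation-only 'homography'."""
    H = np.eye(3)
    H[0, 2], H[1, 2] = (np.asarray(p) - np.asarray(q)).mean(axis=0)
    return H

def select_homography(H_prev, fit_fn, p, q, thresh=2.0):
    """Reuse the previous frames' homography when its mean reprojection
    error on the current inliers is below the threshold (2.0 in the
    paper); otherwise re-estimate."""
    if H_prev is not None and mean_reproj_error(H_prev, p, q) < thresh:
        return H_prev, True      # cameras nearly fixed: keep frame shape
    return fit_fn(p, q), False   # cameras moved: re-estimate

# Demo: inliers consistent with H_prev (reuse) vs. shifted by 12 px (refit).
rng = np.random.default_rng(3)
q = rng.uniform(0, 100, (20, 2))
H_prev = np.array([[1.0, 0.0, 5.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
H1, reused1 = select_homography(H_prev, fit_translation, q + [5.0, 0.0], q)
H2, reused2 = select_homography(H_prev, fit_translation, q + [12.0, 0.0], q)
```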

Results and Discussion
The proposed method was tested using a PC with an Intel i5-4590 3.4 GHz CPU and 16 GB of RAM running Ubuntu 16.04. The program is implemented in C++ with OpenCV 3.4.0. The default parameters for the OpenCV functions are used; these parameters can be adjusted on the control panel to achieve the best-quality disparity.

Video-Stitching Results.
To validate the method, in vivo animal trials were performed with the support of the IRCAD MIS research center, Show Chwan Memorial Hospital, Taiwan. The two endoscopic cameras that were used in the experiments were 2.0 MP USB Digital Endoscopes from Oasis Scientific Inc. Figure 9 shows the animal trial results: the left and middle panels show the two input images that are captured by the two endoscopic cameras during the in vivo animal experiment, and the right panel shows the image-stitching results. The results confirm that the proposed method increases the FOV by 55%, to 155% of that of a single camera.
The video-stitching process was performed on various kinds of video samples, three of which are shown in Figure 9. These samples involve the presence of smoke, the appearance of specular reflections, the appearance of moving surgical tools, and camera movement during MIS. The first sample represents a situation in which the cameras are held still but the heart moves and smoke is present. The second sample shows the endoscope moving toward or away from the surgical area. The third is filmed with various camera movements, the appearance of specular reflections, and the appearance of a moving tool in MIS. These input videos and the results can be found in [20-22].

Comparison with the Previous Method.
To determine the effectiveness of the proposed algorithm, it is compared with the SURF feature-based stitching method that is described in a previous study by the authors [11]. Comparisons are made only with [11] because it addresses the same problem of stitching the images from two endoscopic cameras for MIS.
First, the percentage of stitchable frames is evaluated for the three samples, as shown in Table 1. In sample 1, many frames in the two input videos are not stitched by SURF. This is due to environmental factors such as smoke or specular reflections, which result in very few matching features being detected correctly; in contrast, all of the frames in the three samples are stitched using the proposed method.
Second, the quality of the two methods is evaluated against the ground truth for the same dataset. Figure 10 shows the stitching results for both methods in the three samples: the left panels show the SURF-based stitching results, the middle panels show the stitching results for the proposed method, and the right panels show the ground truths. The stitching results for the proposed method are similar to the ground truth, whereas the SURF-based stitching result is distorted in shape because only a few matching feature pairs are detected and these are unevenly distributed.
To determine the alignment accuracy for both methods, the average error for every pixel in the overlapping area after the warping transformation is calculated using the evaluated homography matrix. Here, (f) is the left image, and (g) is the result of transforming the right image onto the same plane as the left image. A maximum rectangular area within the overlapping area is then defined in which to calculate the corresponding pixel difference between images (f) and (g). The alignment error is calculated as E = (1/(m × n)) Σ_{x,y} |f(x, y) − g(x, y)|, where f(x, y) and g(x, y) are the grayscale pixel values at position (x, y) in the defined rectangular area of the two images (f) and (g), and m × n is the size of the selected rectangular area.
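The alignment metric can be sketched as follows (Python/NumPy; the ramp images are synthetic examples, and the mean-absolute-difference form of the error is an assumption based on the text):

```python
import numpy as np

def alignment_error(f, g):
    """E = (1/(m*n)) * sum |f(x, y) - g(x, y)| over the m x n rectangular
    comparison area (grayscale images)."""
    f = np.asarray(f, dtype=np.float64)
    g = np.asarray(g, dtype=np.float64)
    assert f.shape == g.shape
    return float(np.abs(f - g).mean())

# A perfectly aligned pair scores 0; shifting a horizontal intensity ramp
# by one pixel produces a small nonzero error.
ramp = np.tile(np.arange(0, 100, 2, dtype=np.uint8), (50, 1))  # 50 x 50
err_aligned = alignment_error(ramp, ramp)
err_shifted = alignment_error(ramp, np.roll(ramp, 1, axis=1))
```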
The alignment error for a stitched video is evaluated as the average alignment error over the stitchable frames. For the three datasets in Figure 9, the alignment errors are shown in Table 2. The alignment errors for the proposed method are all smaller than those for SURF.
Third, the computational time, or video-stitching rate, is determined for both methods. The input videos that were used in the experiments were simultaneously acquired by the two endoscopic cameras at a resolution of 640 × 480 pixels and 30 fps. Figure 11 shows the video-stitching rate over 1500 consecutive frames. These graphs show that the proposed method operates at 26.7 fps and the SURF-based method operates at 17.2 fps, so the proposed method stitches video 1.55 times faster than the SURF-based method.
Finally, to compare the stability and quality of the stitched videos for both methods, three additional demonstration videos were posted on YouTube [20-22]. These results show that the shape of the frames in the videos stitched by our method changes only when the cameras move significantly toward or away from the surgical area, whereas the frames in the video stitched by the previous approach change shape frame by frame. The video that is processed using the proposed method therefore has a more stable picture than that processed using the previous method. These results demonstrate the feasibility of the proposed method for expanding the narrow FOV of a conventional endoscope in MIS.
The proposed method increases the FOV of the input image and supports the reconstruction of a 3D surface image of the overlapping area. Figure 12(a) shows the two input images with the overlapping area (in yellow), (b) shows the disparity map, (c) shows the stitching result, and (d) shows a 3D surface image obtained directly from the disparity map.
However, the proposed method still has some limitations. It cannot stitch two images unless the parameters for the rectification process are known; these include the intrinsic and extrinsic parameters for each camera. The feature-based stitching method does not have this requirement. The accuracy of the proposed method also depends on the stereo-matching algorithm that is used and on the accuracy of the rectification process: if the rectification is not sufficiently accurate, the two rectified images are not aligned, and this affects the quality of the final stitched image. This study uses only the StereoBM algorithm and the FGS filter in OpenCV to calculate the disparity map. A future study will improve the quality of the disparity map and develop an additional 3D reconstruction module.

Conclusions
This study proposes a novel algorithm that increases the performance of a MISPE. The proposed stitching algorithm uses stereo-vision theory, so it also supports 3D reconstruction. The experimental results show that the revised MISPE system operates stably in real time and increases the endoscope's FOV by 55%, to 155% of that of a single camera. Using the proposed algorithm, the revised MISPE operates at a frame rate of 26.7 fps on a single-CPU computer for two endoscopic cameras at a resolution of 640 × 480. The proposed stitching method is 1.55 times faster and produces results that are closer to the ground truth than the SURF-based method that was used in a previous study by the authors.
Data Availability

The image and video data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.