Visual prostheses that apply electrical stimulation to restore visual function for the blind have promising prospects. However, because of the low resolution, limited visual field, and low dynamic range of prosthetic visual perception, a large amount of information is lost when daily scenes are presented, and prosthetic users' ability to recognize objects in real-life scenarios is severely restricted. To overcome these limitations, research has focused on optimizing the visual information presented in simulated prosthetic vision. This paper proposes two image processing strategies based on a salient object detection technique. Both strategies enable the prosthesis to focus on the object of interest and suppress background clutter. Psychophysical experiments show that foreground zooming with background clutter removal and foreground edge detection with background reduction both have positive effects on object recognition in simulated prosthetic vision. By using edge detection and zooming, the two processing strategies significantly improve object recognition accuracy. We conclude that a visual prosthesis using the proposed strategies can help blind people recognize objects more effectively. The results provide effective solutions for the further development of visual prostheses.
Globally, around 45 million people suffer from blindness caused by eye diseases or uncorrected refractive errors. Although much progress has been made in treating visual impairments, there is still no effective treatment for blindness [
To overcome the limited visual perception, researchers have tried to optimize the image presentation so that simulated prosthetic vision delivers more effective visual information. Research groups have evaluated a range of approaches for optimizing how visual information is presented. Boyle et al. [
Among image processing strategies based on saliency detection, most studies use biologically plausible visual saliency models to extract the foreground objects. These sophisticated models have relatively low accuracy and high computational complexity, which makes the subsequent segmentation step (e.g., "GrabCut" segmentation) more complex. Li et al. [
In a visual prosthesis, the image processing stage reduces the resolution of the original image to match the number of stimulating electrodes; this step is called Lowering Resolution with Gaussian Dots (LRG). Because the number of electrodes is limited, a large amount of information is lost when daily scenes are presented, which severely restricts prosthesis users' ability to recognize objects in daily life. Increasing the contrast between the foreground and background regions of real-life scenes can optimize the presentation of visual information in simulated prosthetic vision. This requires first detecting the main objects automatically and separating them precisely from the scene. In this paper, we define the salient object as the main object and segment it as the foreground. In Figure
Schematic diagram of the two image processing strategies.
The extraction of the ROI is based on saliency detection. Saliency detection models are built on the visual attention mechanism and extract salient features to generate a saliency map. Common models such as the Itti model and GBVS are widely used in the field of visual prosthesis [
The salient object detection model is based on manifold ranking. The manifold ranking method proposed by Zhou et al. exploits the intrinsic manifold structure of the data for graph labeling [
In this model, each image is first segmented into superpixels by SLIC [
Diagram of the main step in the salient object detection model: (a) original image, (b) superpixels, (c) saliency map in the first stage, and (d) final saliency map.
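The ranking step can be illustrated with a minimal sketch, assuming the unnormalized closed form f* = (D − αW)^{-1} y that is often used in manifold-ranking saliency, where W holds Gaussian affinities between adjacent superpixels, D is its diagonal degree matrix, and y marks the query nodes. The toy graph, the mean-intensity features, σ, α = 0.99, and the function names `affinity_matrix` and `manifold_rank` are illustrative assumptions. The full two-stage model (boundary queries followed by foreground queries) is reduced here to a single background-query pass, with saliency taken as one minus the normalized ranking.

```python
import numpy as np


def affinity_matrix(features, adjacency, sigma=0.25):
    """Gaussian affinities w_ij = exp(-||c_i - c_j||^2 / sigma^2) between adjacent nodes."""
    diff = features[:, None, :] - features[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    W = np.exp(-dist2 / sigma ** 2) * adjacency  # zero where nodes are not adjacent
    np.fill_diagonal(W, 0.0)
    return W


def manifold_rank(W, queries, alpha=0.99):
    """Rank all nodes against the query nodes: f* = (D - alpha*W)^{-1} y, scaled to [0, 1]."""
    D = np.diag(W.sum(axis=1))
    y = np.zeros(W.shape[0])
    y[list(queries)] = 1.0  # indicator vector of the query (seed) nodes
    f = np.linalg.solve(D - alpha * W, y)
    return (f - f.min()) / (f.max() - f.min() + 1e-12)


if __name__ == "__main__":
    # Toy "image": six superpixels in a chain, three dark (background) and three bright.
    feats = np.array([[0.10], [0.10], [0.12], [0.90], [0.90], [0.88]])
    adj = np.zeros((6, 6))
    for i in range(5):
        adj[i, i + 1] = adj[i + 1, i] = 1.0
    W = affinity_matrix(feats, adj)
    saliency = 1.0 - manifold_rank(W, queries=[0])  # node 0 acts as a boundary query
    print(np.round(saliency, 3))
```

Because the bright nodes are only weakly connected to the background query, they receive low ranking scores and therefore high saliency.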
The salient object detection model based on manifold ranking outperforms other models in detecting salient objects. However, illumination changes and complex backgrounds may produce misclassified pixels, so a single threshold applied to the saliency map cannot yield a precise binary mask of the objects. This reduces the accuracy of foreground extraction and affects how prosthetic wearers perceive the main objects. Thus, in this paper, we introduce a dual-threshold and multiregion connectivity analysis method for saliency map segmentation. Compared with the single-threshold method, the new method improves the accuracy of the segmented objects [
pixels. Pixels whose saliency lies between the lower and upper thresholds are labeled weak target pixels. If a weak target pixel is connected to a strong target pixel, the weak target pixel is added to the strong target set. Once all weak target pixels have been traversed, the strong target region contains the final segmentation result.
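The dual-threshold rule behaves like hysteresis thresholding with region growing. The sketch below assumes a saliency map with values in [0, 1], fixed `low`/`high` thresholds (the adaptive choice of the thresholds is not reproduced), and 8-connectivity; the function name `dual_threshold_segment` is hypothetical.

```python
from collections import deque

import numpy as np


def dual_threshold_segment(saliency, low, high):
    """Hysteresis segmentation of a saliency map.

    Pixels >= high are strong targets; pixels in [low, high) are weak targets.
    A weak pixel is kept only if it is 8-connected (directly or through other
    weak pixels) to a strong pixel.
    """
    strong = saliency >= high
    weak = (saliency >= low) & ~strong
    mask = strong.copy()
    queue = deque(zip(*np.nonzero(strong)))  # start from every strong pixel
    h, w = saliency.shape
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < h and 0 <= nc < w and weak[nr, nc] and not mask[nr, nc]:
                    mask[nr, nc] = True  # weak pixel joins the strong target set
                    queue.append((nr, nc))
    return mask
```

Weak regions that never touch a strong region (for example, an isolated bright patch of background) are discarded, which is what a single threshold at `low` would fail to do.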
For the edge detection step, a multiscale Sobel operator is adopted to extract edge features. The Sobel operator is a first-order differential edge detection operator that works by computing the image gradient; the gradient magnitude and direction reflect the edge strength and direction, respectively. The first-order differential operator with the scale
Results of edge detection under different scales. (a) Image with white Gaussian noise, (b) Sobel edge detection with scale
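One way to realize a multiscale Sobel operator is sketched below. Assumptions: the scale is implemented as Gaussian pre-smoothing with standard deviation σ before the 3×3 Sobel kernels, and the per-scale magnitudes are combined by a pixelwise maximum; neither detail is specified in the text above, and the helper names are hypothetical.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T


def filter2d(img, kernel):
    """Correlate img with kernel using edge padding; output has the same shape as img."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out


def gaussian_kernel(sigma):
    """Normalized 2-D Gaussian kernel with radius about 3*sigma."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    return np.outer(g, g)


def sobel(img):
    """Return gradient magnitude (edge strength) and direction (radians)."""
    gx = filter2d(img, SOBEL_X)
    gy = filter2d(img, SOBEL_Y)
    return np.hypot(gx, gy), np.arctan2(gy, gx)


def multiscale_sobel(img, sigmas=(1.0, 2.0)):
    """Pixelwise maximum of Sobel magnitudes over several smoothing scales."""
    img = img.astype(float)
    mags = [sobel(filter2d(img, gaussian_kernel(s)))[0] for s in sigmas]
    return np.max(mags, axis=0)
```

Larger σ suppresses noise such as the white Gaussian noise in the figure at the cost of blurring fine edges; taking the maximum keeps the strongest response from any scale.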
The “FEBR” processing strategy is used to enhance the contrast between the main object and the background in low-resolution simulated prosthetic vision, thereby increasing the object recognition rate. For the foreground image
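Because the exact FEBR formula is not given here, the following is only a hypothetical sketch of the idea: edge strength is shown inside the foreground mask and the background intensity is attenuated. The gain `background_gain = 0.3`, the use of `np.gradient` as a simple edge detector, and the function name `febr_compose` are assumptions.

```python
import numpy as np


def febr_compose(image, fg_mask, background_gain=0.3):
    """Hypothetical FEBR-style composition (assumed rule, not the paper's exact one).

    Foreground pixels show a normalized edge-strength map; background pixels
    keep their intensity scaled down by `background_gain`.
    """
    img = image.astype(float)
    gy, gx = np.gradient(img)           # central-difference gradient (rows, cols)
    edges = np.hypot(gx, gy) * fg_mask  # keep edge strength inside the foreground only
    if edges.max() > 0:
        edges = edges / edges.max()     # stretch foreground edges to [0, 1]
    return np.where(fg_mask, edges, background_gain * img)
```

Scaling the background rather than deleting it keeps some scene context while still raising the object/background contrast.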
The “FZE” processing strategy detects and enhances the edge features of the foreground objects to increase the recognition rate. Before edge detection, a zoom step enlarges the target so that it occupies the entire visual field. The zoom step takes the minimum bounding box containing the foreground pixels as the zoom window; the region inside this window is then cropped and scaled to the size of the final presented image. In Figure
Flowchart of the FZE processing strategy.
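The zoom step can be sketched as a bounding-box crop followed by resampling. This is a minimal sketch under assumptions: background pixels are set to 0 before cropping, nearest-neighbour resampling is used, and the aspect ratio is not preserved; `zoom_to_foreground` is a hypothetical name.

```python
import numpy as np


def zoom_to_foreground(image, fg_mask, out_shape):
    """Crop the minimum bounding box of the foreground and rescale it to out_shape."""
    rows = np.any(fg_mask, axis=1)
    cols = np.any(fg_mask, axis=0)
    if not rows.any():
        raise ValueError("empty foreground mask")
    r0, r1 = np.flatnonzero(rows)[[0, -1]]  # first and last foreground rows
    c0, c1 = np.flatnonzero(cols)[[0, -1]]  # first and last foreground columns
    cleared = np.where(fg_mask, image, 0)   # remove background clutter (assumed: set to 0)
    crop = cleared[r0:r1 + 1, c0:c1 + 1]
    h, w = crop.shape
    H, W = out_shape
    ri = np.arange(H) * h // H              # nearest-neighbour source rows
    ci = np.arange(W) * w // W              # nearest-neighbour source columns
    return crop[ri[:, None], ci[None, :]]
```

The zoomed result can then be passed to edge detection, so that the object's contour is spread over more of the available phosphenes.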
To simulate realistic prosthetic visual perception, this paper uses a phosphene model based on a Gaussian distribution [
The images processed by the phosphene model correspond to the actual electrode array of a visual prosthesis. This process is called “Lowering Resolution with Gaussian (LRG) dots.” After LRG pixelization, the images processed by “FEBR” and “FZE” were rendered at 6 different phosphene resolutions. Figure
Results under different strategies after LRG: (a) original image, (b) image processed after LRG without optimization, (c) image processed after LRG under the strategy FZE, and (d) image processed after LRG under the strategy FEBR.
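A minimal LRG-style renderer is sketched below, assuming a grayscale image in [0, 1]: the image is averaged over one block per electrode, brightness is quantized to a few gray levels, and each electrode is drawn as an isotropic Gaussian dot. The grid size, dot size, σ as a fraction of the cell, and 8 gray levels are assumed parameters, not values taken from this study.

```python
import numpy as np


def lrg_render(image, grid=(32, 32), cell=8, sigma_frac=0.25, levels=8):
    """Render a grayscale image in [0, 1] as a grid of Gaussian phosphenes."""
    gh, gw = grid
    h, w = image.shape
    if h < gh or w < gw:
        raise ValueError("image smaller than electrode grid")
    # Mean intensity per electrode region (image cropped to a multiple of the grid).
    bh, bw = h // gh, w // gw
    blocks = image[:gh * bh, :gw * bw].reshape(gh, bh, gw, bw).mean(axis=(1, 3))
    # Quantize phosphene brightness to a small number of gray levels (assumed).
    blocks = np.round(blocks * (levels - 1)) / (levels - 1)
    # One Gaussian dot template, peak value 1 at the cell centre.
    x = np.arange(cell) - (cell - 1) / 2
    sigma = sigma_frac * cell
    dot = np.exp(-(x[:, None] ** 2 + x[None, :] ** 2) / (2 * sigma ** 2))
    # Kronecker product places a brightness-scaled dot in every cell.
    return np.kron(blocks, dot)
```

Changing `grid` is one way to produce a series of phosphene resolutions for the same input image.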
This paper adopts a two-stage saliency detection scheme based on manifold ranking. It makes full use of the image’s intrinsic manifold structure, which effectively highlights the target uniformly while suppressing the background. To illustrate the advantages of the model, we compare this algorithm with other saliency detection algorithms, namely IT, GB, FT, CA, RC, and CB [
Average time taken to compute saliency map for images in MSRA-1000 database.
Method | IT | GB | FT | CA | RC | CB | Ours |
---|---|---|---|---|---|---|---|
Time (s) | 0.246 | 1.614 | 0.102 | 38.896 | 0.154 | 2.146 | 0.091 |
Code type | MATLAB | MATLAB | C++ | MATLAB | C++ | MATLAB | MATLAB |
Comparison of different saliency detection algorithms.
The salient object mask is segmented using the adaptive dual-threshold method. To evaluate the performance of the segmentation method, we compare the segmented salient object masks with the ground truth. In Figure
Comparison of different saliency segmentation methods: (a) example image “nail clippers” and (b) example image “dustbin.”
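Comparing a segmented mask with the ground truth is commonly summarized by precision, recall, the F-measure (often with β² = 0.3 in saliency work), and intersection over union (IoU). The text does not state which metric underlies the comparison, so this sketch and the function name `mask_scores` are illustrative assumptions.

```python
import numpy as np


def mask_scores(pred, gt, beta2=0.3):
    """Precision, recall, F-measure and IoU of a binary mask against ground truth."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()  # pixels correctly labeled as object
    precision = tp / pred.sum() if pred.sum() else 0.0
    recall = tp / gt.sum() if gt.sum() else 0.0
    denom = beta2 * precision + recall
    f_measure = (1 + beta2) * precision * recall / denom if denom else 0.0
    union = np.logical_or(pred, gt).sum()
    iou = tp / union if union else 0.0
    return precision, recall, f_measure, iou
```

Precision penalizes leftover background in the mask, while recall penalizes missing parts of the object; both failure modes are discussed for poorly segmented images.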
To verify the recognition rate of objects in real-life scenes using the proposed processing strategies, we designed a psychophysical experiment for visual prosthesis. The experimental materials were images of objects that are common and essential in daily life. The visual field was 20°, simulating that of the prosthetic device. The resolution of every image was normalized to
Subjects were seated 60 cm in front of a 21-inch LCD monitor (Lenovo Inc., Beijing,
The recognition score (RS) was used to quantify the recognition results. If a subject correctly recognized the object and gave the right name, RS was set to 2. If the subject could not correctly name the object but could describe its shape or specific features, RS was set to 1. Otherwise, RS was set to 0. As shown in (
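The scoring rule can be written as code. The normalization of recognition accuracy (RA) as the total score divided by the maximum possible score (2 per trial), expressed as a percentage, is an assumption, since the original formula is not reproduced here; the function names are hypothetical.

```python
from typing import List


def recognition_score(named_correctly: bool, described_features: bool) -> int:
    """RS = 2 for a correct name, 1 for a correct shape/feature description, else 0."""
    if named_correctly:
        return 2
    if described_features:
        return 1
    return 0


def recognition_accuracy(scores: List[int]) -> float:
    """Assumed RA: total score over the maximum possible score (2 per trial), in percent."""
    if not scores:
        raise ValueError("no trials")
    return 100.0 * sum(scores) / (2 * len(scores))
```

For example, four trials scored 2, 1, 0, and 2 give 5 out of a possible 8 points.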
In this experiment, the accuracy of the object recognition task was evaluated under 6 different resolutions. Table
Results of the object recognition rate under different resolutions.
Strategy | Resolution | |||||
---|---|---|---|---|---|---|
| | | | | | |
FZE | 0.00 ± 0.00 | 35.46 ± 2.59 | 64.13 ± 2.58 | 83.29 ± 2.38 | 92.45 ± 1.83 | 96.32 ± 1.12 |
FEBR | 0.00 ± 0.00 | 13.57 ± 2.13 | 54.14 ± 2.23 | 81.39 ± 2.42 | 90.26 ± 2.19 | 95.56 ± 1.09 |
Comparison of the object recognition rate under different resolutions (
The image resolution has a statistically significant effect on the RA scores (
Introducing appropriate image processing techniques is considered an effective way to optimize the presentation of visual information in visual prostheses. Basic image processing methods such as edge detection and contrast enhancement have already been applied in some retinal prosthesis systems. In this paper, we demonstrate that introducing a saliency detection model based on manifold ranking and a segmentation technique based on multithreshold and connectivity analysis has significant effects on the segmentation of main objects in daily scenes. Two processing strategies are proposed to optimize the image presentation. These strategies help extract and present the main information from real-life scenes and can help a blind person complete the tasks of perceiving and recognizing objects in a given scene. Through psychophysical experiments, we show that the proposed image processing methods can significantly improve a person’s ability to locate and recognize objects.
Automatic detection and extraction of the main objects in a scene is a key step in the processing strategy. The proposed saliency detection model and segmentation technique segmented the objects in the 60 experimental images with an accuracy of 90%.
The validity of object segmentation affects the subsequent object enhancement as well as prosthetic users’ performance in perceiving and recognizing objects. Analysis of recognition performance under the two processing strategies shows that segmentation quality significantly affects the recognition rate: segmentation that more closely matches the real scene improves object recognition accuracy. Clearly, our proposed method cannot match the human visual system, which can extract complete objects from complex scenes. The objects extracted by our method will always either miss some of their content or contain some unnecessary background information. Previous prosthetic vision research shows that edge information has a significant influence on object recognition, so if the edge information of the extracted object is not well preserved, the final recognition performance will be relatively poor.
The most important factor affecting the accuracy of object extraction is the generation of the saliency map, which is also a technical challenge in saliency region segmentation. In poorly segmented materials, the main objects are not marked by enough salient points. Although the computational model based on manifold ranking offers substantial benefits for saliency extraction, some objects in the image do not receive a sufficiently large region of interest. More efficient saliency models, ROI definitions, and segmentation methods should be explored in future research to extract objects from daily scenes more accurately.
Extracting objects from real-life scenarios can be used to enhance how objects are presented to blind people. Edge information is an important object feature and the main factor affecting object recognition performance at low resolution. To enhance the foreground information, the presentation strategy zooms the foreground and preserves the edge information. Experimental results in simulated prosthetic vision show that foreground zooming and edge detection can effectively improve subjects’ recognition accuracy for well-segmented images. Although recognition of individual poorly segmented images is worse, these cases do not affect the overall performance of the recognition task. Because the foreground is zoomed, this approach presents more edge detail to the user than direct edge detection does. The foreground zooming strategy achieved the highest recognition results in this study. Based on this, for current retinal prosthesis systems with fewer than 1000 electrodes, foreground zooming is more suitable for visual presentation. By enhancing the foreground information, not only is the number of correctly named objects greatly increased, but subjects can also describe the objects in the images more accurately. Although the two processing methods remove some background information, they also reduce the influence of scene illumination and thereby highlight the main information in the scene. This is highly significant for achieving better visual task performance under limited visual perception.
Analysis of the results for different images shows that subjects differ in their recognition abilities. Under the same segmentation, objects with simple shapes are relatively hard to recognize, although subjects were able to describe these images accurately after enhancement. The recognition rate of objects with complex contour information is much higher than that of objects with simple shapes. Zhao et al. [
The results demonstrate that the recognition accuracy is significantly affected by the resolution. When the resolution increased from
In this paper, we simulate visual percepts using a phosphene model with Gaussian distributions. In fact, the percepts provided by current visual prostheses are affected by multiple factors that influence recognition, including distortion, dropout, and shape irregularity. Insights from psychological experiments and theoretical considerations suggest that the interaction between implant electronics and the underlying neurophysiology of the retina will result in spatiotemporal distortions, including visual “comets” in epiretinal prostheses and motion streaks in optogenetic devices [
In this paper, different visual information processing strategies were explored to optimize the presentation of visual information under simulated prosthetic vision. A saliency detection model is introduced to detect salient objects in real-life scenes, and a multithreshold method is proposed to improve foreground segmentation. Two processing strategies are then used to optimize the presentation of visual information. Experimental results demonstrate that the two strategies significantly improve the visual perception and recognition rate of objects at low resolution. This work may help blind people significantly improve their ability to adapt to their surroundings.
FZE: Foreground zooming and edge detection
FEBR: Foreground edge detection and background reduction
LRG: Lowering resolution with Gaussian dots
ROI: Region of interest
The authors declare that there are no conflicts of interest regarding the publication of this article.
The authors are thankful to the volunteers from Xi’an University of Technology. The research is supported by the National Natural Science Funds Youth Science Fund Project (no. 61102017) and the Scientific Research Program Funded by Shaanxi Provincial Education Department (no. 12JK0499).