Humans make about three saccades per second, at the eyeball's maximum speed of 700 deg/sec, to reposition the high-acuity fovea on targets of interest and build up an understanding of a scene. The brain's visuosaccadic circuitry uses the oculomotor command of each impending saccade to shift receptive fields (RFs) to their future cortical locations before the eyes arrive there, giving a continuous and stable view of the world. We have developed a model of image representation based on the projective Fourier transform (PFT), intended for robotic vision, which may efficiently process visual information during the motion of a camera with a silicon retina that resembles saccadic eye movements. Here, the related neuroscience background is presented, the effectiveness of the conformal camera's non-Euclidean geometry in intermediate-level vision is discussed, and algorithmic steps for modeling perisaccadic perception with the PFT are proposed. Our modeling utilizes two basic properties of the PFT. First, the PFT is computable by FFT in complex logarithmic coordinates that also approximate the retinotopy. Second, the shift of RFs in retinotopic (logarithmic) coordinates is modeled by the shift property of the discrete Fourier transform. The perisaccadic mislocalization observed in human subjects in laboratory experiments is a consequence of the fact that the RF shifts take place in logarithmic coordinates.

In this article, we demonstrate that a mathematical data model we have developed for image representation intended for biologically-mediated machine vision systems [

Humans make about three saccades per second at the eyeball's maximum speed of

Although the interruptions caused by saccades go unnoticed in daily life, laboratory experiments can probe their unexpected consequences. Specifically, experiments in lit environments have shown that probes flashed briefly around a saccade's onset are perceived as compressed toward the saccadic target [

In this article, we argue that the conformal camera's complex projective geometry and the related harmonic analysis (projective Fourier analysis) may be useful in modeling perisaccadic perception. In particular, the image representation in terms of the PFT may efficiently model the RF shifts that remap the cortical retinotopy in anticipation of each saccade, along with the related phenomenon of perisaccadic compression of perceptual space. During fixations the brain acquires visual information, resolving the inconsistencies of the brief compression resulting from remapping. The computational significance of this remapping, when incorporated into the neural engineering design of a foveate visual system, stems from the fact that it may integrate visual information from an object across saccades, eliminating the need to start visual information processing anew three times per second at each fixation and speeding up the costly process of visual information acquisition [

This paper is organized as follows. We outline the neural processes of the visuosaccadic system involved in the preparation, execution, and control of the saccadic eye movement in Section

One of the most important functions of any nervous system is sensing the external environment and responding in a way that maximizes immediate survival chances. For this reason, perception and action have evolved in mammals to support each other's functions. This functional link between visual perception and oculomotor action is well demonstrated in primates when they execute eye-scanning movements (saccades) to overcome the eye's acuity limitation in building up an understanding of the scene.

In fact, humans can only see clearly the central part of the visual field of a

(a) San Diego skyline and harbor. (b) Progressively blurred image of (a) simulating the progressive loss of retinal acuity with eccentricity. The circle

With three saccades per second, the saccadic eye movement is the most common bodily movement. Between consecutive saccades, the eyes remain relatively still for about 180–320 ms, depending on the task performed. During this period, the image is processed by the retinal circuitry and sent mainly to the visual cortex (starting with the primary visual cortex, or V1, and reaching higher cortical areas, including the cognitive areas), with a minor part sent to the oculomotor midbrain areas. During the saccadic eye movement itself, visual sensitivity is markedly reduced, although some modulations of low spatial frequencies (contrast and brightness) are well preserved or even enhanced [

Although saccades are among the simplest bodily movements, they are controlled by a widespread neural network that involves nearly every level of the brain. In Figure

The visuosaccadic system. The course of events is the following. (

Although many of the neural processes involved in saccade generation and control are amenable to precise quantitative studies [

We model the human eye's imaging functions with the conformal camera, the name of which will be explained later. In the conformal camera, shown in Figure

The conformal camera. (a) Image projective transformations are generated by iterations of transformations covering translations “

The image projective transformations are generated by the two basic transformations

In the homogeneous coordinate framework of projective geometry [

In this embedding, the “slopes”

The stereographic projection

The mappings in (

The image plane of the conformal camera does not admit a distance that is invariant under the image projective (linear-fractional, or Möbius) transformations. Therefore, the geometry of the conformal camera does not possess a Riemannian metric; for instance, there is no curvature measure. It is customary in complex projective (Möbius, or inversive) geometry to consider a line as a circle passing through the point

As discussed before, circles play a crucial role in the conformal camera's geometry; if this camera is relevant to modeling primate visual perception, this role should be reflected in the psychological and computational aspects of natural scene understanding.

Neurophysiological experiments demonstrate that the retina filters impinging images by extracting local contrast spatially and temporally. For instance, center-surround cells at the retinal processing stage are triggered by local spatial changes in intensity, referred to as edges or contours. This filtering is enhanced in the primary visual cortex, the first cortical area receiving the retinal output. This area is itself a case study in the dense packing of overlapping visual submodalities: motion, orientation, frequency (color), and ocular dominance (depth). In psychological tests, humans easily detect a significant change in spatial intensity (low-level vision) and effortlessly and unambiguously group this usually fragmented visual information (contours of occluded objects, for example) into coherent, global shapes (intermediate-level vision). Considering its computational complexity, this grouping is one of the difficult problems that the primate visual system has to solve [

The Gestalt phenomenology and quantitative psychological measurements established the rules, summarized in the ideas of good continuation [

This discussion shows that the conformal camera should effectively model the eye's imaging functions related to lower- and intermediate-level vision of natural scenes.
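The cocircularity constraint discussed above can be made concrete with a small sketch. The snippet below is a hypothetical illustration, not the paper's algorithm: two oriented edge elements are compatible with a common circle exactly when their orientations are mirror-symmetric about the chord joining them, which also covers collinearity as the degenerate case.

```python
import numpy as np

def cocircular(p1, theta1, p2, theta2, tol=1e-6):
    """Test whether two oriented edge elements lie on a common circle.

    Elements at points p1, p2 with tangent orientations theta1, theta2
    (radians, defined modulo pi) are cocircular exactly when the two
    orientations are mirror-symmetric about the chord joining the points.
    """
    chord = np.arctan2(p2[1] - p1[1], p2[0] - p1[0])
    # Angle of each tangent relative to the chord, folded into (-pi/2, pi/2].
    a1 = (theta1 - chord + np.pi / 2) % np.pi - np.pi / 2
    a2 = (theta2 - chord + np.pi / 2) % np.pi - np.pi / 2
    # Mirror symmetry about the chord: a1 = -a2.
    return abs(a1 + a2) < tol

# Two tangents to the unit circle are cocircular by construction:
# at (1, 0) the tangent is vertical, at (0, 1) it is horizontal.
print(cocircular((1.0, 0.0), np.pi / 2, (0.0, 1.0), 0.0))
```

A grouping procedure built on this test would accumulate pairwise compatibility votes between local edge responses, in the spirit of the good-continuation rules cited above.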

The projective Fourier analysis has been constructed by restricting geometric Fourier analysis of

In log-polar coordinates

In spite of the logarithmic singularity of log-polar coordinates, PFT of any function

On introducing complex coordinates

We discussed above the relevance of the conformal camera to the intermediate-level vision task of grouping image elements into individual objects in natural scenes. Here we discuss the relevance of the DPFT-based data model of image representation to image processing in biologically mediated machine vision systems.

The mapping

The following facts support our modeling of retinotopy with DPFT. First, for small

We conclude this discussion with the following remark. Both models discussed above, as well as all other similar models, are, in fact, fovea-less models [

The DPFT approximation was obtained using the rectangular sampling grid

The retinocortical sampling interface. (a) The exp-polar sampling (the distance between circles displayed in the first quadrant changes exponentially) of a bar pattern. (b) The bar pattern in the cortical coordinates rendered by the inverse DPFT computed with FFT. The cortical uniform sampling grid, obtained by applying the complex logarithm to the exp-polar grid in (a), is shown in the upper left corner.
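The retinocortical interface of the figure can be sketched numerically. Assuming hypothetical grid parameters (M rings, N spokes, radii from r0 to r_max), the snippet below builds the exp-polar sampling grid and verifies that the complex logarithm carries it onto a uniform rectangular (cortical) grid:

```python
import numpy as np

# Hypothetical grid parameters: M rings, N spokes, radii from r0 to r_max.
M, N = 64, 128
r0, r_max = 1.0, 100.0

# Exp-polar sampling: ring radii grow exponentially with the ring index,
# so the log-spacing between consecutive rings is a constant delta.
delta = np.log(r_max / r0) / (M - 1)
radii = r0 * np.exp(delta * np.arange(M))
angles = 2 * np.pi * np.arange(N) / N

# Retinal sample positions as complex numbers z = r * exp(i*phi).
z = radii[:, None] * np.exp(1j * angles[None, :])

# Cortical coordinates: w = log z = log r + i*phi maps the exp-polar
# grid onto a uniform rectangular grid, as in the figure above.
w = np.log(z)
u, v = w.real, w.imag

# Uniformity check: spacing between rings is exactly delta in u.
print(np.allclose(np.diff(u, axis=0), delta))
```

On this uniform cortical grid, the DPFT and its inverse reduce to standard FFTs, which is the basis of the rendering shown in panel (b).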

Let us assume that we have been given a picture of the size

The central disc of radius

We let

In the example from the previous section, the number of pixels in the original image is

The most basic and frequent of the eye's imaging functions are connected with the incessant saccadic eye movement (about

When the eyes remain fixed, motion of objects is perceived by the successive stimulation of adjacent retinal loci. These image transformations are modeled in the conformal camera by the corresponding covariant transformations of the image representation in terms of DPFT; see the end of Section

Of the numerical approaches to foveate (also called space-variant) vision, involving, for example, the Fourier–Mellin transform or the log-polar Hough transform, the most closely related to our work are the results reported by Schwartz's group at Boston University. We note that the approximation of the retinotopy by a complex logarithm was first proposed by Eric Schwartz in 1977. This group introduced the fast exponential chirp transform (FECT) [

The other approaches to space-variant vision use geometric transformations, mainly based on a complex logarithmic mapping between the nonuniform (retinal) sampling grid and the uniform (cortical) grid, for the purpose of developing computer programs for problems in robotic vision. We give only a few examples of such problems: tracking [

A sequence of fast saccadic eye movements is necessary to process the details of a scene by fixating the fovea successively on targets of interest. Given the frequency of three saccades per second and limited computational resources, it is critical that visual information be acquired efficiently, without restarting much of the acquisition process at each fixation. This is especially important in robotic designs based on the foveate vision architecture (silicon retina), and in this section we propose front-end algorithmic steps to address this problem.

The model of perisaccadic perception presented in this section is based on the theory in [

Experimental data (see [

We recall the time course of events (Figure

The modeling steps are the following.

The eye initially fixated at

The local retinotopic mapping of four probes flashed around the saccade target at

The log-polar image is multiplied by two characteristic set functions,

Modeling shifts of RFs of flashed probes. See text for detailed description.

These transformations are shown on the left of the gray arrow in Figure

Further, the image reflection about the vertical axis of the

This perisaccadic compression is obtained by decoding the cortical image representation to the visual field representation:

Perisaccadic compression of flashed probes perceived briefly during the saccadic eye movement. Note that this perception does not involve the visual sensory system.
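The qualitative mechanism behind this compression can be illustrated with a minimal numpy sketch, assuming hypothetical probe positions and shift size: a uniform translation in the cortical (logarithmic) coordinates decodes, via the exponential, to a multiplicative contraction of the visual field, pulling every probe radially toward the center of the log-polar map.

```python
import numpy as np

# Hypothetical probe positions flashed around the saccade target,
# as complex numbers with the target at the origin of the log-polar map.
probes = 0.8 * np.exp(1j * np.pi / 2 * np.arange(4))

# Encode to cortical (logarithmic) coordinates: w = log z.
w = np.log(probes)

# Model the presaccadic remapping as a uniform cortical translation
# by d along the real (eccentricity) axis.
d = 0.6
w_shifted = w - d

# Decode back to the visual field: z' = exp(w - d) = z * exp(-d),
# i.e. every probe is pulled radially toward the target by the same
# factor -- the perisaccadic compression.
decoded = np.exp(w_shifted)
print(np.abs(decoded) / np.abs(probes))   # uniform factor exp(-d)
```

In the full model, this cortical translation is carried out on the DPFT coefficients via the shift property of the DFT, which is what keeps the step FFT-computable.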

In the modeling steps we presented above, the cortical translation shown by the green arrow in Figure

Although we do not show here quantitative results of the modeling steps, the qualitative results can be seen if we translate the cortical image of the bar in Figure

The model presented in this section complements the theory proposed in [

The global retinotopy reflects the anatomical fact that the axons in the optic nerve leaving each eye split along the retinal vertical meridian: the axons originating from the nasal half of each retina cross at the optic chiasm to the contralateral brain hemisphere, joining the uncrossed axons from the temporal half of the other eye's retina, which remain on their own side of the brain. This splitting and crossing reorganize the local retinotopy (log-polar mapping) such that the left hemisphere receives the projection of the right visual field and the right hemisphere receives the projection of the left visual field. According to the split theory [

Although crucial for synthesizing

Two consecutive reflections that can be computed with FFT account for the global retinotopy without the foveal region.
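The FFT computability of such reflections rests on the reversal property of the DFT: reversing a spatial coordinate reverses the corresponding frequency index. A minimal sketch, with a hypothetical 8×8 image block, checks this for a reflection about the vertical axis:

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.random((8, 8))   # hypothetical cortical image block

# Spatial reflection about the vertical axis: column n -> -n (mod N).
reflected = f[:, (-np.arange(8)) % 8]

# The same reflection in the Fourier domain: reversing a coordinate
# reverses the corresponding frequency index of the 2D DFT, so a
# reflection costs one forward and one inverse FFT.
F = np.fft.fft2(f)
G = F[:, (-np.arange(8)) % 8]
via_fft = np.fft.ifft2(G).real

print(np.allclose(reflected, via_fft))
```

Two such operations composed, one per hemifield, would realize the two consecutive reflections of the caption within the same FFT pipeline as the rest of the model.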

At this point, we can only graphically show what we expect to obtain when the foveal image representation complements the peripheral (log-polar) image representation we have developed in terms of projective Fourier transform. To this end, Figure

Schematic depiction of the global retinotopy with the peripheral region (gray) and the foveal region (yellow). The perifoveal region, which connects the other two, is shaded with gray lines. The red dots show the neuronal activity in receptive fields shifted from the veridical positions (gray dots), resulting in the illusory compression of space.

Our model, which is based on PFT, uses the approximation of the retinotopy given by the complex logarithmic mapping

The results obtained in the simulations in [

Further, in both our model and the model in [

Two computational theories of transsaccadic vision proposed in visual neuroscience are related to our modeling, both with a similar functional explanation of perisaccadic mislocalization in terms of the cortical magnification factor. The first theory [

The second theory [

What really sets our modeling apart from other models is that computational efficiency is built into the modeling process, as all algorithmic steps (except the last one) involve computations with FFT. This is especially important because the incessant occurrence of saccades and the time the oculomotor system needs to plan and execute each saccade require that visual information be processed efficiently during each fixation period, without repeating the whole process afresh at each fixation [

All models proposed so far capture only the initial, front-end stage of remapping for a particularly simple scene of flashed probes and, though they explain the perisaccadic mislocalization phenomenon, they leave out the crucial modeling step of the integration of the objects' features (pattern, color,

Although the understanding of the neural mechanisms involved in trans-saccadic perception is incomplete, significant progress has recently been made in understanding the dynamic interactions taking place between different pathways in the visuosaccadic system. In particular, the fundamental principles underlying the perception of objects across saccades have been outlined [

Further, the effectiveness of the conformal camera's geometry in intermediate-level vision problems and the perspectively covariant projective Fourier analysis, well adapted to the retinotopy, strongly suggest that the DPFT-based image representation should be useful in modeling the neural processes that underlie the transfer of objects' features across saccades and maintain the continuity and stability of perception.

Finally, it was observed that saccades cause not only a compression of space, but also a compression of time [

In this article, we presented a comprehensive framework we have developed for computational vision over the last decade, and we applied this framework to model some of the processes underlying trans-saccadic perception. We did so by bringing together, in one place, the physiological and behavioral aspects of primate visual perception, the conformal camera's computational harmonic analysis, and the underlying conformal geometry. This allowed us to discuss the conformal camera's effectiveness in modeling a biologically mediated active visual system. First, the conformal camera's geometry fully accounts for the basic concepts of cocircularity and scale invariance employed by the human visual system in solving the difficult intermediate-level vision problems of grouping local elements into the individual objects of natural scenes. Second, the conformal camera has its own harmonic analysis—projective Fourier analysis—providing image representation and processing that is well adapted to image projective transformations and to the retinotopic mapping of the brain's visual and oculomotor pathways. This latter assertion follows from the fact that the projective Fourier transform integrates the head, the eyes (conformal cameras), and the visual cortex into a single computational system. Based on this system, we presented a computational model for some of the neural processes of perisaccadic perception. In particular, we modeled the presaccadic activity which, through shifts of stimuli's current receptive fields to their future postsaccadic locations, is thought to underlie the remapping of the scene from the current foveal frame to the frame at the upcoming saccade target. This remapping uses the motor command of the impending saccade and may help maintain the stability of primate perception in spite of three saccadic eye movements per second with the eyeball's maximum speed of

The author thanks Dr. Noriyasu Homma for helpful comments.