The study of the neuronal correlates of the spontaneous alternation in perception elicited by bistable visual stimuli is promising for understanding the mechanism of neural information processing and the neural basis of visual perception and perceptual decision-making. In this paper, we develop a sparse nonnegative tensor factorization (NTF)-based method to extract features from the local field potential (LFP), collected from the middle temporal (MT) visual cortex in a macaque monkey, for decoding its bistable structure-from-motion (SFM) perception. We apply the feature extraction approach to the multichannel time-frequency representation of the intracortical LFP data. The advantage of the sparse NTF-based feature extraction approach lies in its capability to yield components that are common across the space, time, and frequency domains yet discriminative across different conditions, without prior knowledge of the discriminating frequency bands and temporal windows for a specific subject. We employ a support vector machine (SVM) classifier based on the features of the NTF components for single-trial decoding of the reported perception. Our results suggest that, although other bands also have some discriminability, the gamma band feature carries the most discriminative information for bistable perception, and that imposing sparseness constraints on the nonnegative tensor factorization improves extraction of this feature.

The question of cortex is of central
importance to many issues in cognitive neuroscience. To answer this question, one important experimental paradigm is to dissociate percepts from the visual
inputs using bistable stimuli. The study of bistable perception holds great
promise for understanding the neural correlates of visual perception [

One important research direction in the field of
neuroscience is to study the rhythmic brain activity during different tasks.
For example, it has been found that the beta and mu bands are associated with
event-related desynchronization and the gamma band is associated with
event-related synchronization for movement and motor imagery tasks [

The conventional two-way decomposition approaches include principal component analysis (PCA), independent component analysis
(ICA), and linear discriminant analysis (LDA), which extract features from
two-way data (matrices) by decomposing them into different factors (modalities)
based on orthogonality, independence, and discriminability, respectively.
However, PCA, ICA, and LDA all represent data in a holistic way, with their
factors combined both additively and subtractively. For two-way decomposition
of nonnegative data matrices, it is intuitive to allow only nonnegative factors
to achieve an easily interpretable parts-based representation of data. Such an
approach is called nonnegative matrix factorization (NMF) [

In this paper, we develop a sparse NTF-based method to
extract features from the LFP responses for decoding the bistable
structure-from-motion (SFM) perception. We apply the feature extraction
approach to the multichannel time-frequency representation of intracortical LFP
data collected from the MT visual area in a macaque monkey performing an SFM
task, aiming to identify components common across the space, time, and
frequency domains and at the same time discriminative across different
conditions. To determine the best LFP band for bistable perceptual
discrimination, we first cluster each NTF component using the

Electrophysiological recordings were performed in a healthy adult male rhesus monkey. After behavioral training was complete, occipital recording chambers were implanted and a craniotomy was made. Intracortical recordings were conducted with a multielectrode array while the monkey was viewing structure-from-motion (SFM) stimuli, which consisted of an orthographic projection of a transparent sphere that was covered with randomly distributed dots on its entire surface. Stimuli rotated for the entire period of presentation, giving the appearance of three-dimensional structure. The monkey was well trained and was required to indicate its choice of rotation direction (clockwise or counterclockwise) by pushing one of two levers. Correct responses for disparity-defined stimuli were acknowledged with a fluid reward. In the case of fully ambiguous (bistable) stimuli, which can be perceived in one of two possible ways and for which no correct response can be externally defined, the monkey was rewarded by chance. Only the trials corresponding to bistable stimuli are analyzed in this paper. The recording site was the middle temporal area (MT) of the monkey's visual cortex, which is commonly associated with visual motion processing. The LFP was obtained by filtering the collected data between 1 and 100 Hz.

In [

Let

A tensor can be converted into a matrix. Let the matrix
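As an illustration of converting a tensor into a matrix, the following is a minimal NumPy sketch of mode-n unfolding. The function name `unfold`, the tensor shape, and the column ordering are assumptions made for this example; the ordering of columns in an unfolding differs between authors and may not match the convention used in this paper.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-`mode` unfolding: rows index the chosen mode,
    columns enumerate all remaining modes (C-order, an assumed convention)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

# Hypothetical tensor: 4 channels x 3 frequency bins x 5 time bins.
X = np.arange(4 * 3 * 5, dtype=float).reshape(4, 3, 5)

X0 = unfold(X, 0)  # shape (4, 15): one row per channel
X1 = unfold(X, 1)  # shape (3, 20): one row per frequency bin
X2 = unfold(X, 2)  # shape (5, 12): one row per time bin
```

With this ordering, entry `X[i, j, k]` appears in row `j` and column `i * 5 + k` of `X1`.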

The cost function for the sparse NTF approach based on
minimization of the generalized KL divergence can be written as
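To make the optimization concrete, here is a minimal NumPy sketch of one common way to minimize a generalized KL divergence with an L1 sparseness penalty, using multiplicative updates on a three-way tensor. It is a sketch under stated assumptions, not necessarily the algorithm used in this paper: the three-way shape, the penalty weight `lam` applied equally to all factors, and the names `reconstruct`, `gkl`, and `sparse_ntf_kl` are all hypothetical.

```python
import numpy as np

def reconstruct(A, B, C):
    """Sum of rank-one terms: Xhat[i,j,k] = sum_r A[i,r] * B[j,r] * C[k,r]."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def gkl(X, Xhat, eps=1e-12):
    """Generalized KL divergence: sum(X * log(X / Xhat) - X + Xhat)."""
    return np.sum(X * np.log((X + eps) / (Xhat + eps)) - X + Xhat)

def sparse_ntf_kl(X, rank, lam=0.0, n_iter=200, seed=0, eps=1e-12):
    """Multiplicative updates for gkl(X, Xhat) + lam * (sum(A) + sum(B) + sum(C)).
    X must be a nonnegative 3-way array (assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    # Random strictly positive initialization for all modalities.
    A, B, C = (rng.random((n, rank)) + 0.1 for n in X.shape)
    costs = []
    for _ in range(n_iter):
        R = X / (reconstruct(A, B, C) + eps)  # elementwise ratio X / Xhat
        A *= np.einsum('ijk,jr,kr->ir', R, B, C) / (B.sum(0) * C.sum(0) + lam + eps)
        R = X / (reconstruct(A, B, C) + eps)
        B *= np.einsum('ijk,ir,kr->jr', R, A, C) / (A.sum(0) * C.sum(0) + lam + eps)
        R = X / (reconstruct(A, B, C) + eps)
        C *= np.einsum('ijk,ir,jr->kr', R, A, B) / (A.sum(0) * B.sum(0) + lam + eps)
        penalty = lam * (A.sum() + B.sum() + C.sum())
        costs.append(gkl(X, reconstruct(A, B, C)) + penalty)
    return (A, B, C), costs

# Hypothetical data: a noiseless rank-3 nonnegative tensor plus a small offset.
rng = np.random.default_rng(1)
true_factors = [rng.random((n, 3)) for n in (4, 6, 8)]
X = reconstruct(*true_factors) + 0.01
factors, costs = sparse_ntf_kl(X, rank=3, lam=0.01, n_iter=50)
```

Adding `lam` to each denominator shrinks the factor entries at every step, which is how the L1 penalty encourages sparse modalities; setting `lam=0` gives the nonsparse variant of the same updates.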

The

We use the silhouette value to determine the number of
clusters [
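The silhouette value can be computed directly from pairwise distances. The sketch below uses the standard definition with Euclidean distance and compares two candidate labelings of a toy data set; the function name, the distance metric, and the data are assumptions for illustration only. It requires at least two clusters.

```python
import numpy as np

def silhouette_values(points, labels):
    """Per-point silhouette s = (b - a) / max(a, b), where a is the mean distance
    to the other members of the point's own cluster and b is the smallest mean
    distance to the members of any other cluster."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    clusters = np.unique(labels)
    s = np.zeros(len(points))
    for i in range(len(points)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton cluster: silhouette is taken to be 0
        a = dists[i, same].sum() / (same.sum() - 1)  # self-distance is 0
        b = min(dists[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = (b - a) / denom if denom > 0 else 0.0
    return s

# Hypothetical feature vectors forming two well-separated groups.
pts = [[0, 0], [0, 1], [10, 0], [10, 1]]
good = silhouette_values(pts, [0, 0, 1, 1]).mean()  # labels match the groups
bad = silhouette_values(pts, [0, 1, 0, 1]).mean()   # labels cut across the groups
```

A larger mean silhouette indicates a better partition, so candidate cluster counts can be compared by clustering once per count and keeping the count with the highest mean value.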

The support vector machine (SVM) is a popular classifier
that minimizes the empirical classification error and at the same time
maximizes the margin by determining a linear separating hyperplane to
distinguish different classes of data [
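The trade-off between classification error and margin can be illustrated with a small linear soft-margin SVM. Note that this sketch minimizes the primal hinge-loss objective by subgradient descent, which is a different solution technique from a Lagrangian dual formulation; the function names, learning rate, and toy data are assumptions for illustration.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, n_epochs=200):
    """Subgradient descent on 0.5 * ||w||^2 + C * sum(max(0, 1 - y * (X @ w + b))).
    Labels y must be in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # points inside the margin or misclassified
        grad_w = w - C * (y[active, None] * X[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    """Assign +1 or -1 according to the side of the hyperplane w.x + b = 0."""
    return np.where(X @ w + b >= 0, 1, -1)

# Hypothetical two-class feature vectors (linearly separable).
X_train = np.array([[2, 2], [3, 2], [2, 3], [3, 3],
                    [-2, -2], [-3, -2], [-2, -3], [-3, -3]], dtype=float)
y_train = np.array([1, 1, 1, 1, -1, -1, -1, -1])
w, b = train_linear_svm(X_train, y_train)
y_pred = predict(X_train, w, b)
```

The `0.5 * ||w||^2` term favors a wide margin, while the hinge term penalizes training points that fall inside the margin or on the wrong side of the hyperplane.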

Assume that

The Lagrange multiplier method can be used to find the
optimal solution for

In this section, we provide experimental examples to
demonstrate the performance of the proposed feature extraction approach for
predicting perceptual decisions from the neuronal data. Simultaneously
collected 4-channel LFP data are used for demonstration. The Gabor transform (STFT
with a Gaussian window) is used to obtain the time-frequency representation of
the data. The number of trials is 96. The time window used is from stimulus onset
to 1 second after that. We find that the performance does not change much if a
different time window, for example, from stimulus onset to 800 milliseconds
after that, is used. We use both nonsparse and sparse NTF approaches based on
minimizing the generalized KL divergence and choose the number of NTF
components to be 20 with random initialization for all modalities. The
regularization parameter
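For reference, a Gabor transform in the sense used here (a short-time Fourier transform with a Gaussian window) can be sketched in a few lines of NumPy. The sampling rate, window length, hop size, and Gaussian width below are assumed values chosen for illustration and are not taken from this paper.

```python
import numpy as np

def gabor_transform(signal, fs, win_len=128, hop=16, sigma=16.0):
    """Power spectrogram from an STFT with a Gaussian window.
    Returns (power, freqs, times) with power of shape (n_freqs, n_frames)."""
    n = np.arange(win_len) - (win_len - 1) / 2.0
    window = np.exp(-0.5 * (n / sigma) ** 2)         # Gaussian window, width sigma samples
    starts = range(0, len(signal) - win_len + 1, hop)
    frames = np.array([signal[s:s + win_len] * window for s in starts])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # nonnegative, shape (frames, freqs)
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    times = (np.array(starts) + win_len / 2.0) / fs   # approximate frame centers in seconds
    return power.T, freqs, times

# Hypothetical 1-second trial sampled at 1 kHz with a 55 Hz oscillation.
fs = 1000
t = np.arange(fs) / fs
lfp = np.sin(2 * np.pi * 55 * t)
power, freqs, times = gabor_transform(lfp, fs)
peak_freq = freqs[np.argmax(power.mean(axis=1))]
```

Because the squared magnitude is nonnegative, stacking such spectrograms over channels yields a nonnegative tensor of the kind that NTF operates on.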

Figure

The silhouette value obtained by clustering the nonsparse NTF components using the

Comparison of the frequency modalities of the 20 nonsparse NTF components clustered by the

The silhouette value obtained by clustering the sparse NTF components using the

Comparison of the frequency modalities of the 20 sparse NTF components clustered by the

To have a closer look at the NTF components, we
construct the time-frequency representation for each component based on the
outer product of its frequency modality and time modality. Figures
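The outer-product construction is a one-line operation. The sketch below builds the time-frequency map of a single component from two made-up loading vectors; the values and the function name are purely illustrative.

```python
import numpy as np

def component_tf_map(freq_mode, time_mode):
    """Rank-one time-frequency map: entry (f, t) is freq_mode[f] * time_mode[t]."""
    return np.outer(freq_mode, time_mode)

# Hypothetical loadings of one component over 4 frequency bins and 3 time bins.
freq_mode = np.array([0.0, 0.2, 1.0, 0.1])
time_mode = np.array([0.0, 0.5, 1.0])

tf_map = component_tf_map(freq_mode, time_mode)  # shape (4, 3)
peak_f, peak_t = np.unravel_index(np.argmax(tf_map), tf_map.shape)
```

A component whose frequency and time modalities are both sparse produces a map concentrated in a small time-frequency region.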

The time-frequency plot for (a) the first nonsparse NTF component and (b) the second nonsparse NTF component of cluster 1. Red and blue represent strong and weak activity, respectively. Note that the first component has localized time-frequency representation in the high gamma band, while the second component contains strong activity in both high gamma band and other bands. In addition, these two components occupy different time windows.

The representative time-frequency plot for (a) cluster 1, (b) cluster 2, (c) cluster 3, and (d) cluster 4, respectively, of the sparse NTF components. Red and blue represent strong and weak activity, respectively. Note that the first cluster for the sparse NTF components contains only one component in the high gamma band (50–60 Hz) with well-localized time-frequency representation, and that clusters 2–4 have concentrated time-frequency distributions in the delta band (1–4 Hz), alpha band (10–20 Hz), and low gamma band (30–40 Hz), respectively.

We next compare the SVM decoding accuracy based on
different features of the nonsparse and sparse NTF components in Tables

Comparison of the decoding accuracy based on the combination of all features from each of cluster 1–4 (denoted as c1 (combined)–c4 (combined), resp.), and the single best feature from each of cluster 1–4 (denoted as c1 (best)–c4 (best), resp.). Clusters 1–4 correspond to high gamma band (50–60 Hz), delta band (1–4 Hz), alpha band (10–20 Hz), and low gamma band (30–40 Hz), respectively. The nonsparse NTF approach based on minimization of the generalized KL divergence is used.

Feature | c1 (combined) | c2 (combined) | c3 (combined) | c4 (combined) |
---|---|---|---|---|
Decoding accuracy | 0.61 | 0.63 | 0.63 | |

Feature | c1 (best) | c2 (best) | c3 (best) | c4 (best) |
---|---|---|---|---|
Decoding accuracy | 0.61 | 0.61 | 0.61 |

Comparison of the decoding accuracy based on the combination of all features from each of cluster 1–4 (denoted as c1 (combined)–c4 (combined), resp.), and the single best feature from each of cluster 1–4 (denoted as c1 (best)–c4 (best), resp.). Clusters 1–4 correspond to high gamma band (50–60 Hz), delta band (1–4 Hz), alpha band (10–20 Hz), and low gamma band (30–40 Hz), respectively. The sparse NTF approach based on minimization of the generalized KL divergence is used.

Feature | c1 (combined) | c2 (combined) | c3 (combined) | c4 (combined) |
---|---|---|---|---|
Decoding accuracy | 0.61 | 0.53 | 0.58 | |

Feature | c1 (best) | c2 (best) | c3 (best) | c4 (best) |
---|---|---|---|---|
Decoding accuracy | 0.61 | 0.61 | 0.61 |

In this paper, we have developed a sparse nonnegative
tensor factorization (NTF)-based method to extract features from the local
field potential (LFP) in the middle temporal area (MT) of a macaque monkey
performing a bistable structure-from-motion (SFM) task. We have applied the
feature extraction approach to the multichannel time-frequency representation
of the LFP data to identify components common across the space, time, and
frequency domains and at the same time discriminative across different
conditions. To determine the most discriminative band of LFP for bistable
perception, we have clustered the NTF components using the

The work was supported by the NIH Grant and the Max Planck Society.