A Collaborative Brain-Computer Interface Framework for Enhancing Group Detection Performance of Dynamic Visual Targets

The superiority of collaborative brain-computer interfaces (cBCIs) in performance enhancement makes them an effective way to break through the performance bottleneck of BCI-based dynamic visual target detection. However, existing cBCIs focus on multi-mind information fusion in a static and unidirectional mode, lacking information interaction and learning guidance among multiple agents. Here, we propose a novel cBCI framework to enhance the group detection performance of dynamic visual targets. Specifically, a mutual learning domain adaptation network (MLDANet) with information interaction, dynamic learning, and individual transferring abilities is developed as the core of the cBCI framework. MLDANet takes the P3-sSDA network as its individual network unit, introduces a mutual learning strategy, and establishes a dynamic interactive learning mechanism between individual networks and collaborative decision-making at the neural decision level. The results indicate that the proposed MLDANet-cBCI framework achieves the best group detection performance and that the mutual learning strategy improves the detection ability of the individual networks. In MLDANet-cBCI, the F1 scores of collaborative detection and of the individual networks are 0.12 and 0.19 higher, respectively, than those in the multi-classifier cBCI when three minds collaborate. Thus, the proposed framework breaks through the traditional multi-mind collaborative mode and exhibits superior group detection performance for dynamic visual targets, which is also of great significance for the practical application of multi-mind collaboration.


Introduction
Brain-computer interface (BCI) technology aims to build an interaction bridge between human and computer and to provide a new technical means for the brain to control and monitor the external environment. Advanced BCI technology can not only help improve the movement abilities of patients with physical disorders [1,2], but also enhance such abilities in healthy people [3][4][5][6][7]. Affected by changes in the surrounding environment and in the psychological state of users, a single-mind BCI shows limited performance and is thus hard to translate into practical application. Multi-mind collaborative brain-computer interfaces (cBCIs) have special advantages in enhancing group detection performance. Multi-mind BCIs are equivalent to multiple information processing systems, which show higher group decision-making performance and stronger robustness. In addition, multi-mind collaborative work is more conducive to the future development of human-computer interaction socialization. P300-based BCIs broaden the BCI's practical application; the classical visual speller and target detection are based on P300 identification [8,9]. For the task of dynamic target detection, the dynamics of the video background, the uncertainty of distractors, and the jitter of detection latency increase the detection difficulty, resulting in the limitations of single-mind BCIs [10,11]. The cBCI can be considered a good strategy to solve this problem, as it contributes to improving the stability and accuracy of comprehensive discrimination [12,13]. Therefore, building a novel cBCI framework that highlights the performance advantages of multi-mind enhancement has become a research focus for improving the performance of dynamic visual target detection. Indeed, cBCIs are regarded as one of the most promising applications in human augmentation [14][15][16].
The group decision-making capability can be improved by integrating multi-mind information and optimizing collaborative strategies [17,18]. For the task of target detection, multi-mind collaborative information integration mainly occurs at three levels: signal-level fusion, feature-level fusion, and decision-level fusion. Signal-level fusion is the simplest way to improve the signal-to-noise ratio (SNR) of EEG signals, where multi-participant EEG signals are averaged and input to a classifier. Feature-level fusion is the classification of averaged or concatenated features from multi-participant EEG signals. Both signal-level fusion and feature-level fusion belong to the single-classifier cBCI (SC-cBCI). Decision-level fusion merges multiple classifiers' decision-making results into the final decision, with each classifier corresponding to one participant.
Thus, decision fusion is also known as the multi-classifier cBCI (MC-cBCI). The specific decision emergence strategies include averaged decision, weighted decision, and majority voting on the decision probability layer. The cBCI has attracted scholars' interest in target detection. To explore the best fusion level, Matran-Fernandez et al. preliminarily verified that decision fusion in the cBCI performs better than any single-mind BCI (sBCI) for single-trial P300 detection [19][20][21]. Related studies indicated that decision-level fusion performs better than signal-level fusion and feature-level fusion in the cBCI [22]. To explore the best decision emergence strategies, Cecotti et al. [23,24] found that the averaged decision performs better than the weighted decision and voting strategies; Davide et al. [15,[25][26][27], Yuan et al. [28,29], and Jiang et al. [30] trained individual decision weights through the least angle regression (LARS) method, a two-layer SVM, and a combination of SVM and LDA classifiers to improve group detection performance. Since the above findings are inconsistent, the selection of decision fusion strategies depends on the specific experimental task. To introduce information interaction among multiple minds, Davide et al. [16] studied the impact of sharing individual behavioral decisions on collaborative behavioral decision-making and found that information interaction during the experiment leads to a decline in behavior-level collaborative decision-making performance. To fuse more information and improve group detection performance, Zhang et al. [31] proposed a dual-brain collaborative target detection model, which integrates data fusion and feature fusion to ensure that important information is not missed; Eckstein et al. [32] explored and compared the impact of the number of users on collaborative decision performance and found that the best collaborative detection performance generally requires 5∼10 users.
These studies provide technical reference and theoretical support for the design of multi-mind collaborative experimental paradigms and the development of cBCI frameworks.
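As an illustration, the three information-integration levels described above can be sketched in a few lines of NumPy. The array shapes and probability values below are toy placeholders, not data from this study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-participant single-trial epochs: (participants, channels, samples).
epochs = rng.standard_normal((3, 16, 150))

# Signal-level fusion: average the raw EEG across participants, then classify once.
signal_fused = epochs.mean(axis=0)            # (16, 150)

# Feature-level fusion: concatenate per-participant feature vectors into one input.
features = epochs.reshape(3, -1)              # toy "features": flattened epochs
feature_fused = np.concatenate(features)      # (3 * 16 * 150,)

# Decision-level fusion: one classifier per participant, average their
# target probabilities, then threshold (the averaged-decision rule).
probs = np.array([0.8, 0.6, 0.7])
decision = probs.mean() > 0.5
```

Signal-level and feature-level fusion feed a single classifier (SC-cBCI), while decision-level fusion keeps one classifier per participant (MC-cBCI).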
Studies on multi-mind collaborative target detection have achieved remarkable results. However, some concerning issues remain. Firstly, in current cBCIs, the computing models are static and unidirectional [18,[32][33][34] and can only be described as "multi-mind fusion" rather than "multi-mind collaboration." Real collaboration should involve multi-mind information interaction, which is a dynamic learning process. Given the higher error rate caused by individual communication during the experiment [16], the information interaction can be established at the neural decision layer rather than at the behavioral level.
Thus, a dynamic interactive cBCI framework at the neural decision layer could be considered to improve performance. Secondly, previous studies required participants to spend extra preparation time collecting labeled signals for an individual-specific computing model [26,27,34]. For dynamic visual target detection, unsupervised domain adaptation networks, P3-MSDA and P3-sSDA, have been developed as individual-general networks with reliable performance in previous studies [35]. Therefore, it is necessary to develop a novel cBCI framework with information interaction, dynamic learning, and individual transferring abilities to enhance the group detection performance for dynamic video targets.
In this study, we designed a novel multi-mind cBCI framework based on a mutual learning domain adaptation network (MLDANet), aiming to enhance the group detection performance of dynamic visual targets. In the framework, a multi-mind synchronous cBCI experimental paradigm for UAV-video vehicle detection is designed, and MLDANet is established with the P3-sSDA network as the individual network unit. This work makes the following contributions.
(1) The MLDANet-cBCI framework was proposed for achieving better group detection performance. In particular, MLDANet establishes an information interaction and dynamic learning mechanism between individual networks and collaborative decision-making by introducing the mutual learning strategy at the neural decision layer. (2) In the MLDANet-cBCI framework, the mutual learning strategy can effectively improve the individual network capability.

Materials and Methods
This collaborative brain-computer interface (cBCI) framework is designed to detect dynamic visual targets, as shown in Figure 1. The framework consists of four modules: stimulus presentation, synchronous acquisition, data preprocessing, and classification. The stimulus presentation module synchronously shows unmanned aerial vehicle (UAV) videos to all participants, whose task is to detect the vehicles in these videos. The synchronous acquisition module collects multi-mind EEG signals with time synchronization. The data preprocessing module aims to obtain artifact-free and filtered EEG epochs for target and nontarget trials. The classification module is the core of the cBCI framework, where a mutual learning domain adaptation network (MLDANet) is proposed to improve the group detection performance.

Stimulus Presentation.
The experimental paradigm for vehicle detection from UAV videos, reported in our previous study [5], is depicted in Figure 2. The UAV videos recorded traffic conditions while flying along campus streets. A series of video clips was segmented from the original videos to construct a stimulus library. One hundred video clips with vehicles (one vehicle per video) and 100 video clips without vehicles were regarded as target videos and nontarget videos, respectively. In this experiment, the total duration of the videos is about 28 minutes. To alleviate the visual load, we divided all the video clips into 10 blocks and set break time between blocks. In each block, 10 target videos and 10 nontarget videos were randomly presented to the participants. The length of each video clip varied from 4 s to 10 s, and a 2 s "+" fixation preceded each video to help participants focus their attention. For each target video, the vehicle could enter the visual field from any direction at any time from 1 s after the video stimulus was presented. To reduce the influence of video color and eye movement on visual perception, the video clips (1920 × 1080 px) were converted to black and white and scaled to 40% (768 × 432 px) at the screen center against a black background. The break duration depended entirely on the subjects, to allow sufficient rest. On average, the experimental duration of the 10 blocks (including breaks) was around 50 minutes per participant.

Synchronous Acquisition.
A total of 89 healthy college participants volunteered for this study, with a median age of 25 years (all right-handed). All reported normal or corrected-to-normal vision and presented no neurological problems, and all signed an informed consent form before the experiment. All tests involving human participants were approved by the Ethics Committee of Henan Province People's Hospital.
In this study, the EEG signals were collected by a g.USBamp (g.tec, Austria) EEG recording system with 16 electrodes. The electrode distribution followed the international 10-20 electrode placement system. The EEG online sampling rate was 600 Hz, with band-pass filtering at 0.01-100 Hz and notch filtering at 50 Hz.
The study comprised two parts: a single-mind experiment and a multi-mind experiment. For the single-mind experiment, 29 participants were recruited, and each time only a single participant was invited to perform the detection task. The EEG signals collected from the single-mind experiment were used as the training set. In the multi-mind experiment, 20 groups (3 participants in each group) were recruited to perform the detection task together. The acquisition environment for the multi-mind synchronous experiment is shown in Figure 3. The same stimulus materials were simultaneously displayed on three mirrored displays.
There was no communication between the three participants during the experiment. Meanwhile, EEG signals from the 3 participants were synchronously collected by three parallel EEG amplifiers and recorded by recording software (g.Recorder). The recording software arranged these 16-channel signals such that channels 1∼16, channels 17∼32, and channels 33∼48 came from participant 1, participant 2, and participant 3, respectively. In this manner, the synchronization of time, space, and surrounding environment ensured that the influence of external factors on each participant was identical. All the collected data will be made available to peers for any relevant future work.
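Recovering the per-participant signals from such a synchronous recording is a simple slicing operation. In the sketch below, the recording array is a random stand-in for a real multi-amplifier export, assuming the channel arrangement described above:

```python
import numpy as np

# Stand-in for a synchronous 48-channel recording (e.g., 10 s at 600 Hz).
recording = np.random.randn(48, 6000)

# Channels 1-16, 17-32, and 33-48 belong to participants 1, 2, and 3.
participants = [recording[i * 16:(i + 1) * 16] for i in range(3)]
```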

Data Preprocessing.
The parameter settings of data preprocessing were consistent between the single-mind and multi-mind experiments. Firstly, electrooculogram artifacts were removed from the original signals using the fast ICA algorithm and the EEGLAB toolbox. Next, the data were band-pass filtered to 0.1-10 Hz and downsampled to 100 Hz. Then, target segments and nontarget segments were extracted from target video-induced and nontarget video-induced EEG signals, respectively. The target trials were segmented starting from the target onset time. One target video can induce only one target trial, while nontarget trials were segmented from nontarget video-induced signals without overlapping, so one nontarget video can contain several nontarget trials. Thus, 100 target trials from 100 target videos and 521 nontarget trials from 100 nontarget videos were extracted. The signal for each trial was 1500 ms long, and the size of the single-trial sample was 16 × 150 (channels × time sample points).
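The filtering, downsampling, and epoching steps can be sketched as below. The ICA-based EOG removal is omitted, and the function names and filter order are illustrative assumptions, not the authors' code:

```python
import numpy as np
from scipy.signal import butter, filtfilt, decimate

def preprocess(raw, fs=600):
    """Band-pass filter to 0.1-10 Hz, then downsample 600 Hz -> 100 Hz (factor 6)."""
    b, a = butter(4, [0.1, 10.0], btype="bandpass", fs=fs)
    filtered = filtfilt(b, a, raw, axis=-1)       # zero-phase filtering
    return decimate(filtered, 6, axis=-1, zero_phase=True)

def epoch(signal, onset_s, dur_s=1.5, fs=100):
    """Cut a 1500 ms trial (16 x 150) starting at the target onset time."""
    start = int(onset_s * fs)
    return signal[:, start:start + int(dur_s * fs)]

raw = np.random.randn(16, 600 * 30)               # 30 s of 16-channel EEG at 600 Hz
clean = preprocess(raw)                           # (16, 3000) at 100 Hz
trial = epoch(clean, onset_s=5.0)                 # (16, 150) single-trial sample
```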
Since a domain adaptation network was applied for detection, single-mind signals and multi-mind signals were used to construct the source domain and the target domain, respectively.

Single-Mind Signals (Source Domain).
Single-mind signals from 29 participants were employed to construct the training set. In our previous studies [35], selecting individuals with strong P3 responses as source domains via the P3 map-clustering method helped to achieve better detection performance. Here, the P3 map-clustering method was applied to select suitable individuals as the source domain. Owing to the severe time jitter of the P300 latency in video-induced EEG signals, individual P3 maps had to be extracted with the event-related potential (ERP) alignment method before applying the P3 map-clustering method [11]. The principle of this method is to reduce the spatial dimension of the single-trial signal with common spatial patterns (CSP) [6,36] to construct a one-dimensional target ERP template, and then match all one-dimensional time series with the ERP template to obtain aligned P300 signals. Using the constructed 1000 ms ERP template, the size of the aligned trials was 16 × 100 (channels × time sample points). The brain topographic map at the peak time of the P300 component was extracted as an individual P3 map, yielding 29 individual P3 maps from the 29 participants. Using a K-means distance-clustering method with two cluster centers, the 29 maps were clustered into a strong P3 map group and a weak P3 map group [35]. Subsequently, the individuals in the strong P3 map group were selected as the source domain.
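The clustering step can be sketched as follows. The P3 maps here are synthetic, and taking the larger-mean-amplitude cluster as the "strong" group is an assumption of this sketch:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)

# Hypothetical P3 maps: one 16-channel topography per participant,
# taken at each individual's P300 peak latency.
p3_maps = rng.standard_normal((29, 16))
p3_maps[:13] += 4.0    # simulate 13 participants with strong P3 amplitudes

# Two-center K-means splits participants into strong / weak P3 groups.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(p3_maps)

# Assume the cluster with the larger mean amplitude is the "strong P3" group.
strong_label = int(km.cluster_centers_.mean(axis=1).argmax())
source_ids = np.flatnonzero(km.labels_ == strong_label)   # source domain individuals
```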

Multi-Mind Signals (Target Domain).
Multi-mind signals were used as the testing set. Using the ERP template constructed from the source domain, 621 aligned trials in total (100 target trials and 521 nontarget trials) were available for each individual, which constituted an imbalanced dataset. The size of the single-trial sample was 16 × 100 (channels × time sample points). The validity of the single-trial signals was tested to screen samples for the target domain of the domain adaptation network. A threshold method was adopted for the sample screening: single-trial signals whose maximum amplitude values stayed within ±120 μV were regarded as valid signals; otherwise, they were regarded as invalid. Thus, each single-trial EEG signal corresponded to two labels (a category label and a validity label) for each participant.
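A minimal sketch of the threshold-based validity screening, assuming trial amplitudes are in microvolts:

```python
import numpy as np

def screen_valid(trials, limit_uv=120.0):
    """Mark a trial valid if its peak absolute amplitude stays within +/-120 uV."""
    peak = np.abs(trials).max(axis=(1, 2))   # per-trial maximum |amplitude|
    return peak <= limit_uv                  # boolean validity labels

trials = np.zeros((3, 16, 100))              # three toy trials (trials x ch x samples)
trials[1, 0, 50] = 200.0                     # one trial with a large artifact
valid = screen_valid(trials)                 # [True, False, True]
```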

Classification Model.
Aiming to enhance the group detection performance of dynamic visual targets, MLDANet is proposed as shown in Figure 4. The core of MLDANet is to establish mechanisms of multi-mind information interaction and dynamic learning at the neural decision layer. In MLDANet, the collaborative decision-making guides each individual network to re-make its decisions, thereby enhancing the collaborative decision-making performance.
In the proposed framework, an unsupervised single-source domain adaptation network with strong P3 map individuals as the source domain for dynamic visual target detection (the P3-sSDA network) is used as the individual network unit [35]. The P3-sSDA network is an individual-generalized model with good performance in EEG-based dynamic visual target detection, as shown in Figure 5. P3-sSDA consists of five parts: source domain selector, feature extractor, domain discriminator, category classifier, and target domain sample selector. In P3-sSDA, a P3 map-clustering method selects the individuals with strong P3 maps as one source domain. The feature extractor extracts EEG features from video target-induced EEG signals. The domain discriminator performs adversarial domain adaptation to eliminate individual differences. The category classifier classifies the EEG features to distinguish target samples from nontarget samples. The testing samples are ranked according to their predicted probability of being target samples, and the target domain sample selector selects the samples most similar to target samples as the target domain samples for the imbalanced data classification; the proportion of samples selected is 80%. In this study, the training individuals and testing individuals were completely independent. Thus, the P3-sSDA network was suitable for establishing individual-generalized cBCI frameworks for dynamic visual target detection. The detailed network architecture was given in our previous work [35].
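The individual network unit follows the standard adversarial domain adaptation pattern: a shared feature extractor feeding a category head and a domain head through a gradient reversal layer. The sketch below shows only this general pattern; the layer sizes and names are illustrative and do not reproduce the published P3-sSDA architecture:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer used for adversarial domain adaptation."""
    @staticmethod
    def forward(ctx, x, alpha):
        ctx.alpha = alpha
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -ctx.alpha * grad, None   # flip gradients toward the extractor

class IndividualNet(nn.Module):
    """Sketch of one individual network unit: extractor plus two heads."""
    def __init__(self, n_feat=32):
        super().__init__()
        self.extractor = nn.Sequential(
            nn.Flatten(), nn.Linear(16 * 100, n_feat), nn.ReLU())
        self.classifier = nn.Linear(n_feat, 2)      # target vs. nontarget
        self.discriminator = nn.Linear(n_feat, 2)   # source vs. target domain
    def forward(self, x, alpha=1.0):
        f = self.extractor(x)
        return self.classifier(f), self.discriminator(GradReverse.apply(f, alpha))

net = IndividualNet()
x = torch.randn(8, 16, 100)                         # batch of 8 aligned trials
cls_logits, dom_logits = net(x)
```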
In MLDANet, there were N target domain individuals from one group, with N P3-sSDA networks used as classifiers that synchronously predicted the binary classification probability of single-trial signals. Collaborative decision-making was achieved by decision fusion, and the mutual learning strategy was introduced between the N P3-sSDA networks and the collaborative decision-making. Data from the common source domain and the different target domains were denoted as S_0 and T_1, T_2, ..., T_N, respectively; samples from S_0 and T_n were input into the n-th P3-sSDA network in each batch. Importantly, the target domain samples from the N target domain individuals were synchronously recorded for the same stimulus scene. Thus, the predicted category labels of the target domain could be shared among the N P3-sSDA networks, which was crucial to achieving information interaction. In each iteration, the n-th P3-sSDA network output the domain discrimination probability between source and target domain samples, the category prediction probability of the source domain samples, and the category prediction probability of the target domain samples, denoted as p_n^d, p_n^s, and p_n^t, respectively. Since the N individuals synchronously received the same stimulus information, the prediction probabilities p_1^t, p_2^t, ..., p_N^t could reflect the discrimination levels of different individuals for the same information. In the process of domain adaptation, the category labels of the source domain l^s and the domain labels l^d were available; hence, the category loss of the source domain was computed as the cross-entropy between p_n^s and l^s. The category loss of the target domain, however, was unknown due to the lack of target domain category labels l^t and had to be estimated in each iteration: the individual predictions were averaged into the integrated prediction probability p^t = (1/N) Σ_n p_n^t, which was then binarized to obtain l^t. Thus, the category and discrimination losses could be calculated.
For the MLDANet with N P3-sSDA networks, the entire adversarial learning problem could be described as

min_{F,C} max_{D} Σ_{n=1}^{N} ( γ L_class^{s_n} + β L_class^{t_n} − α L_adv^{s,t_n} ),

where L_adv^{s,t_n}, L_class^{s_n}, and L_class^{t_n} denote the domain discrimination loss between the source domain and the n-th target domain, the category loss of the source domain when adapted to the n-th target domain, and the category loss of the n-th target domain, respectively, and F, C, and D denote the feature extractors, category classifiers, and domain discriminators; α, γ, and β are hyperparameters denoting the discrimination loss weight, the category loss weight of the source domain, and the category loss weight of the target domain, respectively. The mechanism of information interaction among multi-mind signals was established through information integration and feedback. In the MLDANet framework, each individual network not only receives supervision from its individual labels, l^d and l^s, but also refers to the collaborative label l^t, which is calculated from all individual networks. Through the backpropagation of the collaborative label l^t, all the individual networks can learn from each other and make common progress; this process can make the training of small networks more powerful. Different from the classical cBCI with one-shot collaborative decision-making, the dynamic learning ability of the individual network was established in MLDANet through the iteration and updating of network parameters.
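The core of the mutual learning step, averaging the N target-domain predictions, binarizing them into a shared pseudo-label l^t, and feeding that label back to every individual network, can be sketched as follows. The function name and the negative-log-likelihood form of the loss are assumptions of this sketch:

```python
import torch
import torch.nn.functional as F

def mutual_learning_losses(target_probs, beta=1.0):
    """Collaborative pseudo-labels for N synchronously recorded target domains.

    target_probs: list of N tensors, each (batch, 2), holding per-network
    target-domain class probabilities for the SAME stimulus trials.
    """
    # Collaborative decision: average the N individual predictions ...
    p_t = torch.stack(target_probs).mean(dim=0)      # (batch, 2)
    # ... and binarize it into a shared pseudo-label l_t.
    l_t = p_t.argmax(dim=1)                          # (batch,)
    # Each network is supervised by the collaborative label, so through
    # backpropagation every network learns from the others' predictions.
    return [beta * F.nll_loss(torch.log(p + 1e-8), l_t) for p in target_probs]

# Three networks' predictions for the same batch of four trials.
probs = [torch.softmax(torch.randn(4, 2), dim=1) for _ in range(3)]
losses = mutual_learning_losses(probs)               # one target-domain loss per network
```

In training, each of these losses would be added (with weight β) to the corresponding network's source-domain category loss and adversarial domain loss.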
Thus, a single P3-sSDA network (a single individual) could learn from the source domain, the target domain, and the collaborative decision-making (the group). After the domain adaptation training process, online testing can be conducted following the procedure marked by the red dotted line in Figure 5: new testing samples are passed directly through the feature extractor and category classifier.

Source Domain Individuals.
A total of 29 individual P3 maps were clustered into a strong P3 group {sub6, sub11, sub12, sub13, sub14, sub16, sub17, sub19, sub20, sub21, sub23, sub26, sub28} and a weak P3 group {sub1, sub2, sub3, sub4, sub5, sub7, sub8, sub9, sub10, sub15, sub18, sub22, sub24, sub25, sub27, sub29}. The topographies of the strong P3 group and the weak P3 group at the P3 peak are presented in Figure 6, where the parieto-occipital region is closely related to P3 responses. Since previous studies indicated that individuals with strong P3 maps are more suitable as the source domain, we selected all individuals from the strong P3 group as the source domain. The averaged ERP responses of the source domain individuals (13 individuals from the strong P3 group) are shown in Figure 7.

Detection Performance.
In this work, we compared the detection performance of four BCI frameworks, namely, the single-mind BCI (sBCI), the single-classifier cBCI (SC-cBCI), the multi-classifier cBCI (MC-cBCI), and the proposed MLDANet-cBCI framework. A summary of these BCI frameworks is shown in Figure 8. The MC-cBCI and MLDANet-cBCI frameworks perform decision-level fusion; in particular, the MLDANet-cBCI framework introduces the mutual learning strategy, so its decision-making is dynamic and interactive. The optimal parameter values are shown in Table 1, with all frameworks trained on an NVIDIA TITAN RTX GPU on the PyTorch platform. We fit the models using the Adam optimizer with a cross-entropy loss function. The weighting coefficients among the three participants were (1,1,1); that is, the collaborative decision was obtained by averaging the three decision-making probabilities. The detection performances of the different BCI frameworks, namely, classification accuracy, hit rate, false alarm rate, and F1 score, are shown in Table 2. Here, the F1 score is viewed as the main performance evaluation criterion because of the imbalanced classification. Statistical significance between MLDANet-cBCI and the other cBCI frameworks was assessed by analysis of variance (ANOVA). The results indicated that the cBCI frameworks outperform the sBCI framework; among the cBCI frameworks, decision-level fusion (MC-cBCI and MLDANet-cBCI) outperformed signal-level fusion (SC-cBCI), and the MLDANet-cBCI with the mutual learning strategy performed best. Compared with the MC-cBCI framework, the F1 score of the MLDANet-cBCI framework improved by 0.12, highlighting the superiority of the MLDANet-cBCI framework. Both the hit rate and the false alarm rate of the MLDANet-cBCI framework were lower than those of the SC-cBCI framework, which suggests that MLDANet adopts a higher decision threshold. The convergence of training loss and testing F1 score in the MLDANet-cBCI framework across the 20 groups is shown in Figure 9.
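The equal-weight decision fusion and F1 computation described above can be sketched as follows; the probabilities and labels are toy values, not results from Table 2:

```python
import numpy as np

def collaborative_f1(indiv_probs, labels, weights=(1, 1, 1)):
    """Weighted-average decision fusion (here equal weights) plus F1 score."""
    w = np.asarray(weights, dtype=float)
    fused = (w[:, None] * indiv_probs).sum(axis=0) / w.sum()   # averaged decision
    pred = fused > 0.5
    tp = np.sum(pred & (labels == 1))
    fp = np.sum(pred & (labels == 0))
    fn = np.sum(~pred & (labels == 1))
    return 2 * tp / (2 * tp + fp + fn)

probs = np.array([[0.9, 0.2, 0.6, 0.4],     # participant 1's target probabilities
                  [0.7, 0.1, 0.4, 0.3],     # participant 2
                  [0.8, 0.3, 0.8, 0.2]])    # participant 3
labels = np.array([1, 0, 1, 0])             # ground-truth trial categories
f1 = collaborative_f1(probs, labels)
```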
The training loss remained unchanged once the iterations exceeded about 150 rounds, while the detection performance became stable after about 50 rounds.

Effects of Mutual Learning Strategy on Individual Network Capability.
It was expected that the mutual learning strategy would improve the individual network capability of the MLDANet-cBCI framework. Here, the individual detection performance for the 20 groups in the MC-cBCI and MLDANet-cBCI frameworks is shown in Table 3. The results indicated that the individual F1 score was 0.66 in the MLDANet-cBCI framework, which is significantly higher than that in the MC-cBCI framework (p < 0.01) and even exceeds the group performance of the SC-cBCI framework (0.59) and the MC-cBCI framework (0.61) (Table 2). This finding further confirmed that the proposed mutual learning strategy improves the information interaction and dynamic learning capability of the individual networks.
The effect of the number of source domain individuals on detection performance is shown in Figure 10. The results indicated that around 13∼16 source domain individuals yielded improved performance for the multi-mind EEG signals in this paradigm. Furthermore, the MLDANet-cBCI framework always showed the best performance. Notably, the proposed MLDANet-cBCI framework was particularly sensitive to the number of source domain individuals: its F1 score improved by 0.14 as that number varied from 4 to 13. Thus, the superiority of MLDANet rests on a sufficient number of source domain individuals.

Conclusions
In the present work, we developed a multi-mind cBCI framework to enhance the group detection performance of dynamic visual targets. In this framework, a mutual learning domain adaptation network (MLDANet) was proposed to establish mechanisms of information interaction and dynamic learning between the individual network units and the collaborative decision-making. Through the mutual learning strategy, the collaborative decision-making could guide the re-decision-making process of each individual network for better and more robust detection performance. The results indicated that the proposed MLDANet-cBCI framework outperforms the other cBCI frameworks, with the highest F1 score, 0.73, and classification accuracy, 0.91, when three participants collaborate. The mutual learning strategy can effectively improve the individual network capability. Therefore, the proposed cBCI framework provides a novel multi-mind collaborative mode for improving collaborative work performance, which is of great significance for the progression of research on human augmentation.

Data Availability
The datasets in this study are available on request from the corresponding author.

Ethical Approval
The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Ethics Committee of Henan Provincial People's Hospital.

Consent
Informed consent was obtained from all subjects involved in the study.

Conflicts of Interest
The authors declare no conflicts of interest.

Authors' Contributions
Jun Shu contributed to conceptualization. Xiyu Song participated in methodology, data curation, and visualization. Li Tong helped with validation. Bin Yan was responsible for formal analysis and funding acquisition. Qiang Yang and Jian Kou conducted investigation. Xiyu Song and Ying Zeng prepared and wrote the original draft, and reviewed and edited the manuscript. Ying Zeng was involved in project administration. All authors read and agreed to the published version of the manuscript.