A Multichannel fNIRS System for Prefrontal Mental Task Classification with Dual-level Excitation and Deep Forest Algorithm

This paper presents a multichannel functional continuous-wave near-infrared spectroscopy (fNIRS) system, which collects data under a dual-level light intensity mode to optimize the SNR for channels with multiple source-detector separations. The system is applied to classify different cortical activation states of the prefrontal cortex (PFC). Mental arithmetic, digit span, a semantic task, and the rest state were selected as the four mental states. A deep forest algorithm is employed to achieve high classification accuracy. By applying multigrained scanning to the fNIRS data, the system extracts structural features, which improves performance. With proper optimization, the proposed system achieves 86.9% accuracy on the self-built dataset, which is the highest result among the existing systems it is compared with.


Introduction
In recent years, brain monitoring has been applied to study human brain activity and to explore brain-computer interfaces. There are many different types of noninvasive brain activity monitoring methods. Traditional techniques such as functional magnetic resonance imaging (fMRI) and positron emission tomography (PET) are expensive and unsuitable for continuous daily brain monitoring. Therefore, portable and wearable neuroimaging techniques have become more popular choices, especially functional near-infrared spectroscopy (fNIRS).
fNIRS is an optical technique based on the attenuation of near-infrared light that enables us to monitor hemodynamic and metabolic changes during cortical activation [1]. As a noninvasive technique with balanced spatial and temporal resolution, fNIRS has drawn increasing attention over the past years as a powerful alternative or supplement to traditional neuroimaging techniques [2]. By measuring changes in the concentrations of tissue chromophores, mainly oxy- and deoxyhemoglobin, fNIRS can be applied to assess functional brain activity in different mental tasks.
Since the brain-computer interface (BCI) technology paves a new way to interact with machines through brain activity, it draws increased research efforts. As a result, novel fNIRS and hybrid fNIRS-EEG systems have been proposed to develop BCI applications with novel analysis algorithms and signal processing techniques [3][4][5][6][7].
A very attractive brain region for BCI application is the prefrontal cortex (PFC), which provides high-quality signals without the interference of the hair and becomes a suitable and popular measurement region in fNIRS. It is known that the PFC is involved in various executive functions, working memory, and semantic tasks [8].
Many commercial fNIRS instruments are now available to researchers, and most of them perform very well in detecting brain activation. However, all of these systems are either too complex to be portable or too expensive for large-scale research studies. To implement an fNIRS system for collecting data in the PFC, several light-emitting diode (LED) sources and detectors are placed over the brain region of interest to form optical channels. In previous studies, some systems use a single source-detector separation distance [9][10][11][12], and some use multiple source-detector separation distances [13,14]. Since the detected light intensity varies considerably not only with the source-detector separation distance but also with the intensity emitted by the light source and the optical characteristics of the tissue, careful channel-by-channel calibration is necessary to improve the signal quality for channels with different source-detector separation distances, so that the outputs of all detectors have a high signal-to-noise ratio (SNR) without saturating.
Calibration must be supported by hardware, which is usually expensive. There are two traditional ways to perform self-adaptive calibration. One is to adjust the emitted source light intensity so that the detected light intensity falls within the input range of the detection circuit. The other is to adjust the light detection gain, the sensitivity of the light sensor, and the related front-end circuit so that the output signal lies in a proper range of the analog-to-digital converter (ADC). Besides the complexity of the fNIRS hardware, the adaptive calibration procedure in software is also complicated, especially for layouts with multiple source-detector separations, which are more flexible for monitoring specific brain regions. Moreover, such a complicated fNIRS system cannot be made compact and light enough for wearable applications, and its high price also limits the large-scale use of fNIRS.
Building a robust model to classify trial data is also essential to an fNIRS-BCI system. Previous studies have verified the feasibility of classifying several mental tasks in the PFC (e.g., mental arithmetic). So far, there are many binary classification models, and some of them achieve high accuracy, even on a single trial [4,6,7]. However, the multiclass classification of mental tasks in the PFC has not yet been well studied. Different cortical activities show spatial differences in NIRS patterns, so a multitask classification algorithm based on a multichannel system is promising for more applications in fNIRS-BCI.
Taking the above concerns into consideration, this paper proposes a dual-level light intensity method to provide more useful channels and decrease the requirement of hardware, which makes our fNIRS system more suitable for portability, as it relaxes the need to calibrate the signal channel by channel. Then, we conduct four different mental tasks to activate PFC with our multichannel NIRS system and then employ a deep forest algorithm to classify four cortical activation states. This paper also compares the performances when taking different chromophore concentrations as features and concludes the optimal parameters in feature extraction and model adjustment to achieve high accuracy.

System Design and Experimental Paradigm
2.1. The Hardware of fNIRS System. The overall system block diagram is shown in Figure 1. The proposed system consists of a 6-channel light source module, an 8-channel photodiode (PD) light detector module, and a field programmable gate array-(FPGA-) based controller with a built-in Wi-Fi module. Light source probes and light detector probes are placed on the forehead for light emitting and collecting, respectively.
In the consideration of measured signal sensitivity and the optional light sources in the commercial market, the 735 and 850 nm wavelengths have been selected in the system [15]. The 6-channel light source module consists of 6 double wavelength optical sources (two LEDs in one package, 735 nm and 850 nm, Ushio) and its driving circuit which utilizes a voltage buffer and a triode-based voltage-current converter to convert the output of digital-to-analog converters (DAC, AD5542A) to the corresponding LED driving current without affecting the function of the DAC. In order to implement the system with high adaptability of various   Journal of Sensors experiments and subjects, each wavelength light intensity could be adjusted from 0 mA to 50 mA with a 0.78 μA step.
The light-emitting intensity should be chosen carefully for the given experimental conditions in order to obtain a better final signal during testing. The light-receiving circuit amplifies the μA-level current detected by a photodiode and digitizes the amplified signal for subsequent data transmission and signal processing. The amplification circuit has two stages. The first stage is a transimpedance amplifier with a gain of 5000. The second stage is an active low-pass filter with a 100 kHz cutoff frequency for antialiasing, followed by a 24-bit ADC.
The controller maintains the sequences of light driving and receiving with the multithread capability of FPGA (ZYNQ7000) and encodes data received from the ADC for channel identification and data compression. The wireless communication between the system and the terminal device is implemented by the embedded transceiver in TI CC3220SF SoC for further data processing and classification.
Under the coordination of the controller, 48 channels work in time-division multiple access (TDMA) modulation at a switching rate of 43.8 Hz, with each light source and each wavelength LED emitting one by one at a certain intensity. A flexible probe distribution plate is designed to hold the light source and detector probes. The layout of the light source and light detector probes on the forehead can be customized to the requirements of a particular experiment so that the observation points are located over the relevant cortical region. A layout with multiple source-detector separations, used in our BCI task classification experiment, is shown in Figure 2.

Data Acquisition with Dual-Level Light Intensity. Because brain tissue is highly scattering, photons emitted by the light source are scattered and reflected along their propagation path in the tissue, and some of them are reflected out of the brain and reach the detectors. The diffuse reflection path formed in the brain tissue by the detected photons between the source and the detector is banana shaped. The light coming out of the brain is attenuated because it is partly absorbed by the chromophores along the path [16,17]. The fNIRS system measures the light intensity through the human tissue to obtain the change in light attenuation, and from it the concentration changes of oxyhemoglobin (HbO2) and deoxyhemoglobin (Hb), based on the differential form of the modified Beer-Lambert Law (dMBLL) [18][19][20]:

$$\Delta A = -\log\frac{I_{\mathrm{det},2}}{I_{\mathrm{det},1}} = \Delta\mu_a \cdot L \tag{1}$$

where $\Delta A$ is the change of light attenuation, $I_{\mathrm{det},1}$ and $I_{\mathrm{det},2}$ are the detected intensity values in two different states of the tissue, $L$ is the total mean path length of detected photons, and $\mu_a$ is the absorption coefficient of the tissue. For different kinds of tissues, the path length $L$ is related to the differential path length factor (DPF) and the source-detector separation distance $d$:

$$L = \mathrm{DPF} \cdot d \tag{2}$$

The value of the DPF can be obtained through experiments or Monte Carlo simulations under different conditions [21,22]. In this work, we use 5.98 and 6.5 for the 735 nm and 850 nm wavelengths, respectively.
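From Equation (1), we know that the attenuation change is proportional to the change of absorption, which is the weighted sum of the changes in the concentrations of HbO2 and Hb:

$$\Delta\mu_a = \alpha_{\mathrm{HbO2}}\,\Delta c_{\mathrm{HbO2}} + \alpha_{\mathrm{Hb}}\,\Delta c_{\mathrm{Hb}} \tag{3}$$

where the $\alpha$ weights are the absorption coefficients of the different chromophores. If the attenuation change is measured at two wavelengths $\lambda_1$ and $\lambda_2$, the concentration changes can be calculated from the detected intensity values [20]:

$$\begin{pmatrix}\Delta c_{\mathrm{HbO2}}\\ \Delta c_{\mathrm{Hb}}\end{pmatrix} = \begin{pmatrix}\alpha_{\mathrm{HbO2}}(\lambda_1) & \alpha_{\mathrm{Hb}}(\lambda_1)\\ \alpha_{\mathrm{HbO2}}(\lambda_2) & \alpha_{\mathrm{Hb}}(\lambda_2)\end{pmatrix}^{-1}\begin{pmatrix}\Delta A(\lambda_1)/L(\lambda_1)\\ \Delta A(\lambda_2)/L(\lambda_2)\end{pmatrix} \tag{4}$$

Therefore, when the detected light attenuation is converted to concentration changes of the chromophores, the measurement of $\Delta A$ is crucial to the accuracy of the final results. As shown in Figure 3, the detected light intensity depends on the emitted light intensity of the source and on the source-detector separation distance. In the figure, the LED sources can work at two different intensity levels, and two identical photodiode detectors are marked as PD1 and PD2. The banana-like shapes show the different spatial distributions of NIR light in the channels.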
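The conversion from detected intensities to concentration changes can be illustrated with a minimal Python sketch. It assumes a base-10 logarithm for the attenuation change and uses placeholder absorption coefficients in `EXT`; neither choice comes from the paper, and the coefficients must be replaced by tabulated values before any real use. Only the DPF values (5.98 and 6.5) are taken from the text.

```python
import math

# Differential path length factors quoted in the text.
DPF = {735: 5.98, 850: 6.5}

# PLACEHOLDER absorption coefficients (HbO2, Hb) per wavelength.
# These numbers are illustrative assumptions, not values from the paper.
EXT = {735: (0.4, 1.1), 850: (1.1, 0.7)}


def attenuation_change(i_ref, i_now):
    """Attenuation change between two tissue states (base-10 log assumed)."""
    return -math.log10(i_now / i_ref)


def concentration_changes(d_a, d_sep, wl=(735, 850), ext=EXT, dpf=DPF):
    """Solve the two-wavelength dMBLL system for (dHbO2, dHb).

    d_a   : attenuation changes at the two wavelengths
    d_sep : source-detector separation distance
    """
    # Absorption change per wavelength: delta_mu = delta_A / (DPF * d)
    m1 = d_a[0] / (dpf[wl[0]] * d_sep)
    m2 = d_a[1] / (dpf[wl[1]] * d_sep)
    (o1, h1), (o2, h2) = ext[wl[0]], ext[wl[1]]
    det = o1 * h2 - h1 * o2  # determinant of the 2x2 coefficient matrix
    d_hbo2 = (m1 * h2 - h1 * m2) / det
    d_hb = (o1 * m2 - o2 * m1) / det
    return d_hbo2, d_hb
```

HbT, where needed, would simply be the sum of the two returned values.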
The total detected light intensity decreases as the source-detector separation increases. However, the ratio of photons that went through the white matter layer increases, which means a gain in sensitivity [23,24]. Therefore, there is a trade-off between the detected light intensity and the useful information in the fNIRS signal. As shown in Figure 3, if the emitted source light intensity is high, the signal intensity in channels with a long source-detector separation is improved, but the detectors of channels with a short separation might be saturated. Conversely, if the emitted source light intensity is low, the signal quality in channels with a short source-detector separation is good, but channels with a long separation might suffer from a low achievable SNR. Even when the SNR of some channels is not good under some conditions, those channels still contain effective information related to the mental state.
To achieve the optimal SNR for each channel, the light intensity of the LED source should be chosen well and adjusted for each detector according to its distance. As shown in Figure 2, the channels have different source-detector (SD) path lengths, so a single light intensity level cannot provide a good SNR for long and short SD path channels at the same time. If we calibrated the light intensity for each channel and adjusted it for each detector at its distance, 8 configurations would be needed, and the temporal resolution would be reduced by a factor of 8. To resolve the conflict between signal quality and temporal resolution, we propose a dual-level light intensity data acquisition method that balances the two, as shown in Figure 4.
The single-level intensity mode is easy to realize. However, when the spacing and locations of source/detector pairs are limited, it cannot be adjusted properly for all measurement requirements and test subjects. Since the LED sources are driven by a DAC, the controller in this system is able to adjust the light intensity. Therefore, the LED sources can be programmed in software to work at different light intensity levels. Based on TDMA, each LED is switched on twice, once with high-level intensity (Lv.H) and once with low-level intensity (Lv.L); all LEDs work in this mode and are switched on one by one. The switching scheme affects the detection result, in particular the distortion between different channels. This is a common problem for time-division control methods in fNIRS. If one switching cycle period (in our system, 1/43.8 Hz ≈ 0.023 s) is much shorter than the response time of the hemoglobin signal of brain activity (usually longer than 1 second), the effect can be ignored. In addition, as there is only one DAC in our system, the light sources can only be switched one by one. Each time the DAC output is changed to set the light intensity level, a stable output is obtained only about five milliseconds later because of the setup time of the circuits. If the levels are changed frequently, the total additional waiting time becomes unacceptable. The number of level changes in Figure 4 is one, which costs the minimal additional time and thus introduces the smallest distortion in the final signals. In our system, we keep the light intensity exposure on the tissue much weaker than the limit of the safety standard (IEC 62471).
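The effect of slot ordering on the number of level changes can be sketched in Python as follows. This is an illustration rather than the controller firmware: it assumes 6 sources and 2 wavelengths per source (from the hardware description), a settling time of 5 ms per level change, and that level changes are counted within one frame only. The names `build_frame`, `level_changes`, and `settle_cost_s` are hypothetical.

```python
N_SOURCES = 6
WAVELENGTHS_NM = (735, 850)
LEVELS = ("L", "H")          # low-level and high-level intensity
FRAME_RATE_HZ = 43.8
DAC_SETTLE_S = 0.005         # "about five milliseconds" per level change


def build_frame(level_major=True):
    """Return the (level, source, wavelength) slots of one TDMA frame."""
    if level_major:
        # All slots at one level first, then all slots at the other level.
        return [(lv, s, wl) for lv in LEVELS
                for s in range(N_SOURCES) for wl in WAVELENGTHS_NM]
    # Interleaved alternative: both levels back to back for each LED.
    return [(lv, s, wl) for s in range(N_SOURCES)
            for wl in WAVELENGTHS_NM for lv in LEVELS]


def level_changes(frame):
    """Count intensity-level switches inside one frame (no wrap-around)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a[0] != b[0])


def settle_cost_s(frame):
    """Extra waiting time per frame caused by DAC level changes."""
    return level_changes(frame) * DAC_SETTLE_S
```

Under these assumptions the level-major ordering needs a single level change per frame, whereas interleaving the two levels per LED needs 23, so its settling time alone would exceed the frame period of roughly 23 ms.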
The dual-level light intensity method reduces the dependence on hardware and helps make the system wearable. Moreover, for channels with multiple source-detector separations, one of the two levels always gives the better result for each channel. The purpose of calibration is to give every channel a good signal-to-noise ratio without saturation. When the dual-level light intensity method is used in a configuration with multiple source-detector separations, a source-detector pair with a short separation distance is not saturated under low-intensity emitted light, and for a source-detector pair with a long separation, a low-noise signal is detected under high-intensity emitted light. The dual-level light intensity method thus widens the tolerable range of source-detector separations and relaxes the need for channel-by-channel calibration. By combining the two levels, we can maximize the number of effective channels in the limited area on the forehead. In addition, the dual-level mode sacrifices less temporal resolution than any other multilevel mode or channel-by-channel calibration under multiple source-detector separations.
Finally, we use the designed fNIRS system with a custom layout pattern under the dual-level light intensity mode to collect PFC activity data during the experiment. The complete output consists of 48 channels from all source-detector combinations, and its preprocessing procedure is shown in Figure 5. After conversion with the dMBLL, the signals are filtered by a fifth-order Butterworth filter with a passband of 0.01-1 Hz, which prepares the data for the following feature extraction.

Experimental Paradigm.
The NIRS data were collected by our continuous-wave NIRS system with two wavelengths (735 nm and 850 nm). As shown in the right part of Figure 1, the multichannel system consists of eight detectors and six sources; all of them are attached to a special cap made of silicone, providing good coupling to the scalp. The subject needs to wear this cap during the experiment, and the setup is shown in Figure 6.
During the experiment, the subject was asked to sit in a chair and to avoid head and body movements. Each trial comprised a 30-second prerest period to obtain the baseline, 5 repetitions of the given task, and a 30-second postrest period. Before each experiment, the instruction was displayed on the screen, and the subject needed to respond as quickly and as accurately as possible.
(i) Mental arithmetic (MA) task: the subject needs to subtract a small prime number (such as 13, in this case) from a random three-digit number and then keep subtracting the same number from each result until the task period is finished. During the MA task, only the first number is displayed on the screen.
(ii) Digit span (DS) task: when the task begins, a random six-digit number is displayed on the screen digit by digit; each digit is displayed for 0.1 s with a 0.4 s interval. After the entire number has been displayed, the subject needs to recall the number in reverse order and then press the button to proceed to the next number; this continues throughout the task period.
(iii) Semantic (SM) task: two Chinese nouns randomly selected from the word database are displayed on the screen. The subject needs to use these two words to make a sentence and press the button to get the next set of words to continue the same semantic task until the task is finished.
The experimental paradigm and an example of screen display are shown in Figure 7. All procedures are controlled by the software automatically to guarantee a standard paradigm, and the NIRS system collects data simultaneously.

Deep Forest for Mental Task Classification
Deep forest is a novel decision tree-based approach. By combining multigrained scanning with a cascaded random forest, deep forest is aware of structure in the data and performs well even on small-scale data, since the model complexity is set automatically. Moreover, deep forest has fewer hyperparameters than traditional deep neural networks, and its performance is quite robust to hyperparameter settings [25]. Compared to a standard decision tree algorithm, the deep forest approach handles features better, as dimensionality reduction of the raw data is unnecessary. Secondly, the results of deep forest are more accurate, as they are the decision of multiple classification and regression trees. Besides, the deep forest approach
Because NIRS data are sampled at a high rate over multiple channels, they are high-dimensional and have a strong spatial-temporal structure. For future real-time applications in NIRS-BCI systems, the classification algorithm should be as fast and efficient as possible. Since the running efficiency of deep forest is high and can be improved further with an optimized parallel implementation [25], deep forest is a suitable and promising choice.

Feature Extraction with Multigrained Scanning. According to the experimental paradigm described above, we need to select and extract the features from the raw NIRS data in advance. When the concentration changes of HbO2, Hb, and HbT (the sum of HbO2 and Hb) are selected as features, there are 1315 raw features in each time sequence sampled over a 30-second task period at a frequency of 43.8 Hz. For a total of 48 channels with three variables, the signals collected under low-level light intensity (LI) and high-level light intensity (HI) are concatenated into a 288 × 1315 matrix as the raw instance of each task, arranged according to the given spatial locations.
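The dimensions of the raw instance can be checked with a short Python sketch. The row ordering used here (low level first, then high level, with HbO2, Hb, and HbT rows per channel) is an assumption made for illustration; the paper arranges rows according to spatial location, which is not reproduced. The sample count assumes both endpoints of the 30-second period are included, which matches the stated 1315.

```python
FS_HZ = 43.8
TASK_S = 30
N_CHANNELS = 48
# round() avoids floating-point truncation of 30 * 43.8
N_SAMPLES = round(TASK_S * FS_HZ) + 1


def build_raw_instance(low, high):
    """Stack HbO2, Hb and HbT rows for both intensity levels.

    low, high: one (hbo2, hb) pair per channel, where hbo2 and hb are
    equal-length lists of concentration-change samples.
    """
    rows = []
    for level in (low, high):
        for hbo2, hb in level:
            hbt = [a + b for a, b in zip(hbo2, hb)]  # HbT = HbO2 + Hb
            rows.extend([hbo2, hb, hbt])
    return rows
```

With 48 channels per level this yields 2 × 48 × 3 = 288 rows of 1315 samples each.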
Taking the spatial-temporal characteristics of NIRS data into account, we scan the NIRS data in the same way that images are processed; thus, we can extract structural features without reconstructing an image of the cortical activity. As shown in Figure 8, taking the dimensions of the final feature vectors into account, the raw instance (with 288 × 1315 raw features) is sliced by sliding a $w \times w$ window with a step of $s$; then, $\left(\frac{288 - w}{s} + 1\right)\left(\frac{1315 - w}{s} + 1\right)$ new small instances are produced, which belong to the same task class as the raw instance. If we slide the window one feature at a time, that is, with a step of one, the number of new small instances is equal to $(289 - w)(1316 - w)$ (e.g., if $w = 288$, sliding the window produces 1028 small instances for each raw instance).
All instances extracted with the same window size are used to train two different kinds of forest, a completely random tree forest A and a random forest B. Since we have three tasks and a rest state, each forest generates a 4-dimensional class vector for each instance, and these vectors are concatenated as the transformed features. By using multiple sizes of sliding windows, different feature vectors are generated and prepared for the following cascaded forest stage.
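The instance count can be verified with a small Python sketch of the sliding-window scan. It assumes a square window of side `w` and integer (floor) division when the step does not divide the range evenly; the function names are hypothetical.

```python
def n_positions(size, w, s=1):
    """Number of window positions along one axis."""
    return (size - w) // s + 1


def n_instances(rows, cols, w, s=1):
    """Number of small instances produced from a rows x cols raw instance."""
    return n_positions(rows, w, s) * n_positions(cols, w, s)


def scan(matrix, w, s=1):
    """Yield every w-by-w patch of a 2-D list in row-major order."""
    rows, cols = len(matrix), len(matrix[0])
    for r in range(0, rows - w + 1, s):
        for c in range(0, cols - w + 1, s):
            yield [row[c:c + w] for row in matrix[r:r + w]]
```

For example, `n_instances(288, 1315, 288)` reproduces the 1028 small instances quoted in the text.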

Cascaded Random Forest.
Deep forest employs a cascaded structure, as illustrated in Figure 9, in which each layer receives feature information and passes it on. Since we use multigrained scanning, there are several levels in each layer, and each level is an ensemble of forests based on decision trees. We use two completely random tree forests and two random forests for each level, and each forest produces an estimate of the class distribution. After three window sizes are used for multigrained scanning, three feature vectors of different dimensions are produced, which are used to train the three grades of the cascaded random forest correspondingly. For each instance, each tree generates a distribution given by the percentage of the different classes among the training examples in the leaf node where the instance falls. By averaging across all trees in the same forest, each forest produces its estimate in the form of a 4-dimensional class vector. The class vectors of all forests are then concatenated with the original feature vector of the corresponding level and passed as input to the next level of the cascade.
Figure 8: Illustration of feature reconstruction with scanning.
After a new level is added, the performance of the whole cascade is estimated on the validation set. This procedure is repeated until the validation performance converges. If there is no performance gain, the training procedure terminates automatically; the final prediction is then generated by pooling the results of the four forests in the last layer.
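How class vectors are formed and passed along the cascade can be sketched in plain Python. The per-tree distributions are taken as given inputs here (no forest is actually trained), and placing the original features before the class vectors is an assumed ordering; `forest_class_vector` and `augment` are hypothetical names.

```python
def forest_class_vector(tree_distributions):
    """Average the per-tree class distributions of one forest."""
    n_trees = len(tree_distributions)
    n_classes = len(tree_distributions[0])
    return [sum(t[k] for t in tree_distributions) / n_trees
            for k in range(n_classes)]


def augment(features, forests):
    """Concatenate a feature vector with the class vector of each forest.

    forests: one list of per-tree class distributions per forest.
    """
    out = list(features)
    for trees in forests:
        out.extend(forest_class_vector(trees))
    return out
```

With four forests and four classes, each level appends 16 values to the feature vector it received.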
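The stopping rule and the final pooling step can be sketched as follows. This is a sketch under stated assumptions: `evaluate_level` is a hypothetical callback that trains a cascade with a given number of levels and returns its validation accuracy, `max_levels` and `tol` are illustrative parameters, and pooling is implemented as averaging the class vectors followed by taking the most probable class, which is one common choice rather than a detail confirmed by the text.

```python
LABELS = ("MA", "DS", "SM", "REST")


def grow_cascade(evaluate_level, max_levels=20, tol=0.0):
    """Add cascade levels until validation accuracy stops improving.

    Returns (number of levels kept, best validation accuracy).
    """
    best_acc, best_n = float("-inf"), 0
    for n in range(1, max_levels + 1):
        acc = evaluate_level(n)
        if acc <= best_acc + tol:
            break  # no performance gain: stop growing
        best_acc, best_n = acc, n
    return best_n, best_acc


def predict(class_vectors, labels=LABELS):
    """Average the last-level class vectors and pick the top class."""
    mean = [sum(col) / len(class_vectors) for col in zip(*class_vectors)]
    return labels[mean.index(max(mean))]
```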

Experimental Results and Discussion
To verify the proposed system and test the performance of the classifiers, we use a dataset collected in the experiments described above from two healthy men with an average age of 23 years.
There are four classes labelled with MA, DS, SM, and REST, and each class has 48 instances.
To determine the optimal models, we first compared the performance of different kinds of chromophores as features, with and without scanning. The fNIRS signals of HbO2 and Hb are usually negatively correlated during mental tasks, and the change of HbO2 is larger than that of Hb in actual cortical activation [26]. This is consistent with the result in Figure 10 that using only HbO2 as a feature achieves higher accuracy than using only Hb. It also explains the poor performance of using only HbT as raw features, because HbT usually follows the same tendency as HbO2 but with a smaller change. However, when a change in blood flow gives rise to similar trends in HbO2 and Hb, HbT could be an important feature. The results show that combining all three kinds of concentrations performs better than using any single chromophore.
We then compared the performance of different light intensity levels. The raw features were scanned with window sizes ranging from 36 to 144, corresponding to 1/8 and 1/2 of the width, respectively, and the results are shown in Figure 11. Different scanning window sizes expose different data features from different time-frequency domains, and some of them contain more useful information for classification. As a result, the classification accuracy varies with the scanning window size. Selecting data under dual-level light intensity gives a higher mean accuracy than a single high-level or low-level intensity. This result is consistent with the fact that, in a layout with multiple source-detector separation distances, dual-level light intensity provides more channel information than a single level; it also relaxes the complexity of the hardware and preserves temporal resolution by avoiding channel-by-channel calibration, which are two important aspects for wearable BCI equipment.
The data collected by the dual-level light intensity method and all three kinds of chromophores were then selected to construct the features. To compare traditional machine learning classifiers, namely support vector machine (SVM), decision tree (DT), and k-nearest neighbours (KNN), with the deep forest (DF) classifier, we performed 4-fold cross-validation to evaluate the accuracy. As shown in Figure 12, all models and datasets were evaluated with different single-grained scanning sizes, where size 0 means no scanning of the raw features. The deep forest classifier clearly performs better than the other classifiers, especially without scanning to reconstruct the raw features. The other classifiers also benefit considerably from the scanning process with forests, as shown by the improvement in mean accuracy in Figure 12.
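A 4-fold evaluation over the four classes of 48 instances each can be sketched in plain Python. The text does not say whether the folds were stratified or how they were shuffled, so the stratified split and the fixed seed below are assumptions; `fit_predict` is a hypothetical callback standing in for any of the classifiers.

```python
import random


def stratified_folds(labels, k=4, seed=0):
    """Split indices into k folds with classes spread evenly across folds."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validate(labels, fit_predict, k=4, seed=0):
    """Mean accuracy over k folds.

    fit_predict(train_idx, test_idx) must return one predicted label
    per entry of test_idx.
    """
    folds = stratified_folds(labels, k, seed)
    accs = []
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        pred = fit_predict(train, test)
        hits = sum(p == labels[i] for p, i in zip(pred, test))
        accs.append(hits / len(test))
    return sum(accs) / k
```

With 192 instances in total, each fold holds 48 instances, 12 from each class.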
After a comparison among all these algorithms and sizes, we selected three window sizes with the best performance in   Table 1.
After training with the optimal hyperparameters, the generated model was used to predict the test sets and achieved an average accuracy of 86.95%; the confusion matrices of the two subjects are shown in Figure 13.
The results indicate that the rest state is easily identified correctly, but the MA, DS, and SM tasks might be mistaken for the rest state when there are no obvious fluctuations in the concentration. These three tasks might also sometimes be confused with each other, mainly because of motion artifacts. This will be improved with other algorithms in our future work. Table 2 compares this work with other recently published fNIRS-based mental task classification studies. With the designed multichannel fNIRS system, dedicated source-detector layout, and dual-level intensity data acquisition, the proposed system is convenient to wear, transmits data conveniently, and achieves the highest classification accuracy with 4 states.
In conclusion, dual-level light intensity excitation benefits brain activity classification by providing more useful channels, which is important for portable, compact NIRS-BCI equipment that uses a multi-interval source/detector layout to locate the monitoring point on a specific brain region. The deep forest algorithm achieves higher accuracy than the other methods, especially without scanning. This indicates that deep forest has the potential to deal with raw data, which costs less time and is promising for future NIRS-BCI applications.

Conclusions
This paper proposed a continuous-wave fNIRS system with multiple channels of different source-detector intervals, which extracts spatial characteristics and provides flexibility in choosing the brain region of interest. The system is compact and wearable, and it uses a dual-level light-emitting intensity mode for better SNR. The system was applied to collect fNIRS data in the PFC during three cognitive mental tasks and the rest state. By employing a deep forest algorithm, our system achieves a higher classification accuracy than other methods, even with raw data. Based on a comparison of different hyperparameters, we determined the optimal model with three-grained scanning. Finally, this work achieves 86.9% accuracy for 4 different cortical activation states.

Disclosure
The research in this paper is an extension of the previous work presented as conference
Figure 13: Confusion matrices of two subjects.