Generalized Information Equilibrium Approaches to EEG Sleep Stage Discrimination

Recent advances in neuroscience have raised the hypothesis that the underlying pattern of neuronal activation giving rise to electroencephalography (EEG) signals consists of power-law-distributed neuronal avalanches, and that EEG signals are nonstationary. Therefore, spectral analysis of EEG may miss many properties inherent in such signals. A complete understanding of such dynamical systems requires knowledge of the underlying nonequilibrium thermodynamics. In recent work by Fielitz and Borchardt (2011, 2014), the concept of information equilibrium (IE) in information transfer processes has successfully characterized many different systems far from thermodynamic equilibrium. We utilized a publicly available database of polysomnogram EEG data from fourteen subjects with eight different one-minute tracings of sleep stage 2 and waking and an overlapping set of eleven subjects with eight different one-minute tracings of sleep stage 3. We applied principles of IE to model EEG as a system that transfers (equilibrates) information from the time domain to scalp-recorded voltages. We find that waking consciousness is readily distinguished from sleep stages 2 and 3 by several differences in mean information transfer constants. Principles of IE applied to EEG may therefore prove to be useful in the study of changes in brain function more generally.


Introduction
In electroencephalography (EEG), scalp electrodes measure electrical potential as a function of time [1]. EEG measures the sum of local field potentials in the region of cortex below the electrode, comprising ∼10⁹ cortical neurons [1]. EEG is typically analyzed by spectral analysis (Fourier transform), which assesses power in frequency bands [1]. However, many studies over the last 20 years have demonstrated that the underlying cortical neuronal dynamics is nonlinear and that EEG signals are nonstationary (the mean and variance change over time unpredictably). This has been most convincingly demonstrated both in vivo and in vitro using multielectrode arrays on cortical tissue, revealing the presence of "neuronal avalanches" [2,3].
Given that the cortical neuronal dynamics largely responsible for the summed local field potentials that comprise EEG are characterized by scale-free avalanches consistent with a system at a critical state that is well described by power-law dynamics, many attempts have been made to analyze EEG using methods derived from fractal and other nonlinear theories, with some degree of success [4-9]. Another avenue of physical understanding of cortical avalanche dynamics would be via statistical physics and thermodynamics; however, the relatively large magnitude changes in scalp-recorded voltages in EEG clearly could not be characteristic of a system in thermodynamic equilibrium [10]. Therefore, a thorough statistical physics understanding of EEG would involve a complete description of cortical nonequilibrium thermodynamics, which is not possible for a noninvasive technique such as EEG [10,11]. Similarly, previously published information-theoretic shortcuts to a thermodynamic understanding (such as maximum entropy approaches) for EEG suffer from insufficient knowledge of appropriate constraints for microscopic variables [12,13].
Instead, we propose to utilize the concept of generalized information transfer, where EEG could be modeled as an information transfer process [11,14]. Generalized information equilibrium (IE) has been proposed as a system-independent mechanism to study systems far from thermodynamic equilibrium, with applications to astrophysics, economics, materials science, Newtonian physics, and thermodynamics [11,14,15]. The principles of IE were developed from Hartley's original description [16] of an amount of information (H):

H = Kn, (1)

where n is the number of selected symbols and K = ln s is a constant which depends on the number of symbols (s) available at each selection. Note that we use the natural logarithm, so that our natural information measure is in "nats" instead of "bits." Following Fielitz and Borchardt (2014) we will use the Hartley definition of information to say that the information in a given process is

I = n ln s, (2)

where s is the size of the alphabet of symbols used to encode and n is the number of symbols we select. A key assumption is that n ≫ 1 (which we have from the 10⁸ to 10⁹ neurons in the cortex underlying an electrode).
Note that the more commonly utilized Shannon entropy (H_S), defined as [17]

H_S = −∑_i p_i ln p_i, (3)

reduces to the Hartley definition of information when the probability of each symbol in the alphabet is equal (i.e., p_i = 1/s is a constant). The use of Hartley's information theory, lacking any probabilistic assumptions, thus allows an estimation of information flow in any system even without access to knowledge of microscopic states or, in the case of maximum entropy approaches, appropriate constraints [11,14]. It should also be noted here that Hartley information is a special case of the Rényi entropy H_α for α = 0 [18]:

H_α = (1/(1 − α)) ln ∑_i p_i^α. (4)

It has been demonstrated that one can use Hartley's information theory to define a natural amount of information for any system [11,14]:

I = K |Δq|/|δq|, (5)

where K is the information transfer constant, |Δq| is the absolute value of the change in the process variable q, and |δq| is the signal (smallest detectable change) of the process variable q, with |δq| ≪ |Δq|. Using this relationship, virtually any system where information flows from a source variable (q_s) to a destination variable (q_d) can be considered from the point of view of information transfer [11,14]. The important point is, however, that the amount of information (I) must generally obey the inequality

K_d |Δq_d|/|δq_d| ≤ K_s |Δq_s|/|δq_s|, (6)

when the process variable q_d is related to the information destination and the process variable q_s to the information source. For the current study, we assume ideal information transfer (I_d = I_s) and, hence, information equilibrium (IE). Considering (5) one gets

|Δq_d|/|δq_d| = (K_s/K_d) |Δq_s|/|δq_s|. (7)

For convenience we will denote the ratio K_s/K_d as κ and call it the information transfer constant for ideal information transfer or for IE. For EEG, we use (7) to define an information transfer constant (κ) relating each time interval (|Δt|) to the corresponding change in the voltage reading (|Δv|). We analyze the distribution of κ values to see if they are peaked around a well-defined mean. In that case we can interpret (7) (for small changes in the process variables t and v) as a differential equation:

d|Δv|/d|Δt| = κ_avg |Δv|/|Δt|, (8)

which has the solution

|Δv| ∼ |Δt|^κ_avg. (9)
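The reduction of Shannon entropy to Hartley information under equal symbol probabilities is easy to verify numerically. The following minimal Python sketch (illustrative only; the alphabet size s = 16 is arbitrary and the function names are ours) computes both quantities in nats:

```python
import math

def hartley_information(n, s):
    """Hartley information (in nats) for n selections from an alphabet of s symbols."""
    return n * math.log(s)

def shannon_entropy(probs):
    """Shannon entropy (in nats) of one selection with symbol probabilities probs."""
    return -sum(p * math.log(p) for p in probs if p > 0)

s = 16
uniform = [1.0 / s] * s   # equal symbol probabilities, p_i = 1/s
# With equal probabilities, Shannon entropy per selection equals Hartley's K = ln s
print(shannon_entropy(uniform), hartley_information(1, s))
```

With p_i = 1/s, both quantities evaluate to ln s, which is the reduction used in the text.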
We will make a few observations here about the IE approach and its relationship to other physical descriptions of dynamic systems. For general information equilibrium, the solution to (8) can be rewritten as

|Δv| = |Δv₀| (1 + Δt/t₀)^κ_avg. (10)

Let us now set a new parameter, λ = κ_avg/t₀. Over short time scales (t₀ ≫ Δt), (10) reduces to

|Δv| ≈ |Δv₀| e^(λΔt). (11)

Equation (11) is precisely the form of a Lyapunov exponent if the voltage measurement is considered as a superposition of a large number of neurons at different distances from the EEG sensor (i.e., |Δv| is a sum over individual neuron voltages near the sensor, mapping a 4n-dimensional "phase space" to a voltage measurement, (ℝ³ × ℝ)ⁿ → |Δv|). Lyapunov exponents are deeply related to the study of chaotic dynamical systems, with positive values indicating a chaotic system with exponential divergence from initial conditions [19]. For systems with power-law sensitivity to initial conditions, Lyapunov exponent analysis has been generalized to the scale-dependent Lyapunov exponent, which has been utilized to successfully describe many dynamic physical systems, including EEG-based seizure identification in humans (e.g., [5, 20-22]). For the current study, we utilize a publicly available database of polysomnographic data for fourteen subjects with eight minutes each of waking and sleep stage 2 EEG (and eleven subjects with eight minutes of sleep stage 3 EEG) to assess for differences in patterns of κ values, and thereby the utility of IE in distinguishing different states of consciousness. Our hypothesis is that different states of consciousness can be identified by different distributions of κ and different κ_avg values.
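The short-time reduction used above can be written out step by step (a sketch in the reconstructed notation, using ln(1 + x) ≈ x for x ≪ 1):

```latex
|\Delta v| = |\Delta v_0|\left(1 + \frac{\Delta t}{t_0}\right)^{\kappa_{\mathrm{avg}}}
           = |\Delta v_0|\,\exp\!\left[\kappa_{\mathrm{avg}}\,\ln\!\left(1 + \frac{\Delta t}{t_0}\right)\right]
           \approx |\Delta v_0|\,\exp\!\left(\frac{\kappa_{\mathrm{avg}}}{t_0}\,\Delta t\right)
           = |\Delta v_0|\, e^{\lambda \Delta t},
\qquad \lambda \equiv \frac{\kappa_{\mathrm{avg}}}{t_0},\quad \Delta t \ll t_0 .
```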
Computational and Mathematical Methods in Medicine

Database.
We utilized a publicly available EEG dataset (slpdb; http://www.physionet.org/), a polysomnogram study of patients with severe sleep apnea [23]. There were n = 14 subjects with 8 min of waking EEG and sleep stage 2 EEG and n = 11 subjects with 8 min of sleep stage 3 EEG. An additional dataset of n = 13 subjects of waking EEG, n = 10 subjects of REM sleep EEG, and n = 8 subjects of sleep stage 1 EEG (1 minute each, nonoverlapping with the larger 8 min EEG dataset) was also generated from the larger dataset. The exact dataset used has previously been described in a prior unrelated study [9]. EEG segments chosen for further analysis were selected on the basis of the absence of movement artifacts and disordered breathing, which limited the amount of suitable tracings. No demographic and only limited clinical information was available from the dataset. Digitized 250 Hz EEG recordings using the international 10-20 electrode system were used, with a single EEG lead for each subject, which differed among subjects; no information was provided about reference electrode placement [9]. Use of the dataset for this study was approved by the VA West Los Angeles IRB.

κ Estimation.
EEG is a time series of voltage readings v(t_j), where j = 1, 2, ..., N (length of series). For each value of j up to N − Δj, given a time interval Δt (corresponding to Δj time steps), the κ value for each instant can be calculated:

κ_j = ln |v(t_j + Δt) − v(t_j)| / ln |Δt|. (12)

Therefore, each segment of EEG is characterized by a series of information transfer constant ratios κ_j for different values of the time interval Δt (i.e., 1, 2, 4, 8, ... time steps), and for each segment the mean κ_avg was calculated:

κ_avg = (1/N′) ∑_j κ_j, (13)

where N′ is the number of included κ_j values. Code for extracting κ values from EEG was written in R [24]. We used the natural log for transformation throughout. Values where v(t_j) = v(t_j + Δt) were excluded from estimation (as the logarithm of zero is undefined).
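The extraction code used in the study was written in R; purely as an illustration of the estimator described above, a Python sketch might look as follows (the function names and the per-instant reading κ_j = ln|Δv|/ln|Δt| are our own assumptions; fs is the 250 Hz sampling rate):

```python
import math

def kappa_values(v, delta_steps, fs=250.0):
    """Per-instant information transfer constants kappa_j = ln|v(t_j + dt) - v(t_j)| / ln(dt)
    for one EEG segment v (a sequence of voltages) at a lag of delta_steps samples.
    Pairs with zero voltage difference are excluded (the logarithm of zero is undefined)."""
    dt = delta_steps / fs                      # time interval Delta-t in seconds
    kappas = []
    for j in range(len(v) - delta_steps):
        dv = abs(v[j + delta_steps] - v[j])
        if dv > 0:                             # exclude v(t_j) == v(t_j + dt)
            kappas.append(math.log(dv) / math.log(dt))
    return kappas

def kappa_avg(v, delta_steps, fs=250.0):
    """Segment mean kappa over the included time points."""
    k = kappa_values(v, delta_steps, fs)
    return sum(k) / len(k)
```

At 250 Hz, lags of 1, 10, 100, and 1000 samples correspond to the Δt values of 0.004, 0.04, 0.4, and 4 seconds used in the study.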

Analyses.
Probability density function (PDF) estimation was done using the density function in R. Lomb-Scargle periodograms were computed using the R package cts [25], designed to follow [26]. To assess statistically significant periodogram peaks, we utilized a p ≤ 0.01 threshold, heuristically estimating the maximum possible number of frequencies in the input PDF as twice the number of data points in the PDF [26]. All statistics were done in R [24]. For the REM sleep and sleep stage 1 analysis with the reduced-size dataset, we utilized generalized linear mixed modeling (GLMM) with unstructured covariance matrices to account for subject-specific effects, using the R package nlme [27].
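The periodogram step was done with the R package cts; as a rough, purely illustrative analogue, scipy's lombscargle can be applied to a synthetic periodically modulated curve standing in for a κ-value PDF (all numbers below are invented for the example and are not from the dataset):

```python
import numpy as np
from scipy.signal import lombscargle

# Synthetic stand-in for a kappa-value PDF with a periodic modulation
# of 2.0 cycles per unit on the kappa axis.
x = np.linspace(0.0, 10.0, 500)                   # kappa grid (arbitrary units)
pdf = 1.0 + 0.3 * np.sin(2 * np.pi * 2.0 * x)     # modulated "density"

omegas = np.linspace(0.5, 30.0, 300)              # angular frequencies to scan
power = lombscargle(x, pdf - pdf.mean(), omegas, normalize=True)

peak_omega = omegas[np.argmax(power)]             # should sit near 2*pi*2.0
```

Note that lombscargle expects angular frequencies; a significance threshold analogous to the p ≤ 0.01 criterion above would still have to be computed separately.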

Waking Differs from Sleep Stages 2 and 3 in κ Values at Multiple Time Scales.
We calculated the mean κ value for each segment in our database with a range of different Δt values (0.004, 0.04, 0.4, and 4 seconds; Figure 1, Table 1). An example comparison of the PDFs of κ values across all three states of consciousness, for 1 min each of EEG from a single subject at Δt values from 0.004 to 4 sec, is shown in Figure 1.
Segment-specific mean κ values were then analyzed by repeated-measures ANOVA with state of consciousness as the grouping variable and subject as the repeated measure (Table 1). These results demonstrate that waking EEG is clearly distinguishable from sleep stages 2 and 3 via segment mean κ values (Table 1). While waking and sleep stage 3 κ values differ strongly at the 0.004-, 0.4-, and 4-second time scales, there is no difference between them at the 0.04-second time scale (Table 1). Interestingly, while there seems to be a pattern for waking and sleep stage 2 κ values to slowly become less different over time, if anything the opposite is true for the waking/sleep stage 3 comparison, where the 4-second time scale shows the largest difference between the two (comparing κ values; Table 1). We also examined the proportion of low information transfer κ values at the smallest time step (the sampling rate; Table 2). Strikingly, both sleep stage 2 and (more so) sleep stage 3 have a greater proportion of low information transfer κ values than waking EEG (Table 2). At larger time steps, however, there was no difference between consciousness stages in the proportion of low information transfer (data not shown).
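The layout of the repeated-measures ANOVA (state of consciousness as grouping variable, subject as repeated measure) can be sketched as follows, here in Python with statsmodels and entirely fabricated κ values (the study's own statistics were computed in R; nothing below comes from the dataset):

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Toy long-format table: one segment-mean kappa per subject per state (made-up numbers).
rows = []
states = [("waking", -0.50), ("stage2", -0.60), ("stage3", -0.70)]
for subj in range(1, 7):
    for s_idx, (state, base) in enumerate(states):
        rows.append({"subject": subj, "state": state,
                     "kappa": base + 0.01 * subj + 0.005 * ((subj * s_idx) % 3)})
df = pd.DataFrame(rows)

# Repeated-measures ANOVA: state is the within-subject factor.
res = AnovaRM(df, depvar="kappa", subject="subject", within=["state"]).fit()
print(res.anova_table)
```

AnovaRM requires balanced data (one observation per subject per state), which mirrors the one-mean-per-segment design described above.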

Waking Differs from Sleep Stages 2 and 3 in the Extent of Periodicity in the PDFs of κ Values at All Time Scales.
As can be noted in Figure 1, there appears to be a periodicity in the PDF of κ values, in that certain magnitudes are enhanced and others are diminished. In order to quantify this, we made PDF estimations of κ values for all segments and then performed a normalized Lomb-Scargle periodogram analysis to assess for periodicity (Figure 2; Table 3). Sleep stage 2 exhibits enhanced periodicity, while sleep stage 3 shows diminished periodicity compared with waking at the 0.004 sec time step. For all other time steps, though, sleep stage 3 shows greater periodicity in the κ value PDF estimations than sleep stage 2 and waking consciousness (Figure 2; Table 3). Note that although sleep stage 2 and waking appear to be very similar at all other time steps (Figure 2), there are in fact modest quantitative differences between their periodicities (Table 3). We performed the same analysis on the smaller dataset of sleep stage 1 and REM sleep EEG (Table 4). Interestingly, both sleep stage 1 and REM sleep exhibited no difference from waking EEG in terms of the extent of periodicity in the κ value PDF estimations at the sampling rate, whereas both showed stronger differences with larger time steps (Table 4). Sleep stage 1 did not differ from waking in periodicity in the κ value PDF estimations at Δt = 0.04 sec, demonstrated a trend towards an increase in periodicity at Δt = 0.4 sec, and showed an increase in periodicity at Δt = 4 sec (Table 4). By contrast, REM sleep exhibited a greater degree of periodicity in the κ value PDF estimations than waking at Δt = 0.04, 0.4, and 4 sec.

Generalized Information Theory and EEG.
To our knowledge, this report is the first application of principles of IE to the study of EEG. Other information-theoretic approaches have a long history of neuroscience applications [28], but maximum entropy applications to EEG in particular have been limited by the lack of an appropriate understanding of microscopic constraints [12,13]. The theoretical advantage of IE approaches to the analysis of EEG (or any other system) is that an explicit probabilistic understanding of the underlying system states (i.e., cortical local field potentials) is unnecessary; thus an estimation of "natural" information transfer can be made from a macro-observable (like time-dependent scalp voltage) alone [11,14].

Utility of IE for the Study of Sleep Stage Discrimination.
Our study demonstrates several interesting findings with regard to the application of IE to the analysis of polysomnogram EEG data. Firstly, there is a clear distinction between waking, sleep stage 2, and sleep stage 3 consciousness in terms of mean κ values across different time scales (Table 1), with sleep stage 1 and REM sleep showing distinctions at fewer time scales (Table 4). Secondly, waking differs from both sleep stage 2 and sleep stage 3 (but not sleep stage 1 or REM sleep) in terms of the proportion of low information transfer κ values at the sampling rate (250 Hz). Thirdly, and perhaps most surprisingly, there is a clear periodicity in the PDF of the κ values at all time scales, which differs strongly between waking, sleep stage 2, and sleep stage 3, with more limited differences between waking and sleep stage 1 and REM sleep (Figure 2, Tables 3 and 4). Thus, there is an overall richness to the IE description of differences in sleep and consciousness states that may well suit it for use as a general tool to study states of altered cortical function. Indeed, the apparent discriminative power of IE for sleep staging (Tables 1-4) compares favorably with many other computer-based analytic techniques for EEG, including fractals [6], multifractals [7,9,29,30], and Tsallis entropy [4], not to mention automatic feature extraction from spectral analysis (reviewed in [31]).

Limitations.
We utilized a publicly available dataset with minimal clinical or demographic information available. The number of subjects available was relatively small, and the number of EEG segments available was smaller still for sleep stage 1 and REM sleep. Waking consciousness is likely to be a heterogeneous state of brain activity; thus identifying this state based only upon clinical polysomnogram staging may limit the ability of any technique to assess for differences between waking and sleep stages. We cannot exclude the possibility that some of the observed differences between states of consciousness were caused by differences in motor or muscle activity.

Conclusions
Given the highly significant results from the application of IE to EEG, with the ability to discriminate between waking and sleep consciousness stages via multiple distinct statistical descriptions, the study of the EEG-based information transfer constant (κ) certainly deserves to be extended to other sleep EEG datasets to ensure replicability. The application of IE to EEG is very straightforward, with extremely simple programming algorithms compared to other techniques. Indeed, if the results of the present study are a guide, it may be worthwhile to apply IE more widely to states of brain dysfunction to see if it can become a useful tool in the quantitative analysis of EEG.

Abbreviations
IE: Information equilibrium.