Calculating permutation entropy without permutations

A method for analyzing sequential data sets, similar to the permutation entropy one, is discussed. The characteristic features of this method are as follows: it preserves information about equal values, if any, in the embedding vectors; it is exempt of combinatorics; it delivers the same entropy value as does the permutation method, provided the embedding vectors do not have equal components. In the latter case this method can be used instead of the permutation one. If embedding vectors have equal components this method could be more precise in discriminating between similar data sets.


Introduction
Due to technical progress in the areas of sensors and storage devices a huge amount of raw data about time course of different processes, such as ECG, EEG, climate data recordings, stock market data have become available. These data are redundant. The data processing and classification, aimed at extracting meaningful for nonspecialist characteristics, is based on reducing the excess of redundancy. As a result, a new data is obtained, small in size and digestible by a human being. Examples of those reduced data for time series can be mean value, variance, Liapunov exponents, correlation dimension, attractor dimension and others.
A remarkable method suitable for reducing the excess of redundancy in time series has been proposed by Ch.Bandt and B.Pompe in [1], known as permutation entropy. This method is simple and transparent, is robust with respect to monotonic distortions of the raw data, and is suitable for estimating the dynamical complexity of the underlying dynamical process. Many interesting results, e.g. [2,3,4,5], have been obtained with straightforward application of the permutation entropy methodology in its initial form, as it is described in [1]. Nevertheless, this method is subjected to a critique for not taking into account absolute values of the raw data and for not treating properly a possibility of having equal values in the embedding vector (ties), [6,7]. In this connection, it should be taken into account that any redundancy reduction method leaves out some type of information, which may be useless for one process/task and may carry useful information for another one. In the latter case, the bare idea of [1] about how to treat equal values can/should be modified in order to meet a purpose of concrete situation. Examples of such a modification can be found in [8,9] for taking into account absolute values, or in [10,11] for treating equal values. Interesting modification of the permutation entropy method has been proposed in [12] for 3-tuple EEG data.
In the standard permutation entropy methodology, it is preferable that embedding vectors have all their components different. Otherwise, they cannot be plainly symbolized by a permutation without using additional rules, which actually treat equal values as not being such. Situation with equal values in the embedding vector may arise for high embedding dimension, for crude quantization of measured data, for very long data sequences and when observed dynamical system has intrinsically only a small number of possible outputs.
This note is aimed at discussing a slightly different symbolization technique of embedding vectors, which does not refer to combinatorics, and which is capable of preserving information about equal values in embedding vectors. Instead of permutation, an embedding vector is emblematized with a single integer number of base D, where D is the embedding dimension. In the case of no ties (no equal components in the embedding vectors) the technique is equivalent to the standard permutation entropy methodology. In the opposite case, it may discriminate between similar data sets better than the permutation entropy method does.

Permutation entropy
Consider a finite sequence of measurements. By choosing the embedding dimension D < N the data (1) can be embedded into a D-dimensional space by picking out consecutive D-tuples from X. As a result, a set of D-dimensional embedding vectors is obtained: where each vector has the following form: An additional parameter of the embedding procedure is delay τ = 1, 2, . . . . In the above definition, we put τ = 1 for simplicity. With τ = 1 one would have The data represented in (2) and/or (3) is even more redundant than that represented in (1) since, for D N , most data values from (1) are represented in (3) D times. In the permutation entropy technique [1], each embedding vector from (2) and/or (3) is replaced with a permutation π of D integers {0,1,2,. . . ,D − 1}, which is defined by the order pattern of values composing the vector. For any embedding vector V = (x 0 , x 1 , . . . , x D−1 ) the permutation π, which symbolizes it, is calculated as follows. Arrange all components of V either in the descending, [13]: or in the ascending, [11,14]: order 1 keeping their subscripts unchanged. The permutation π which corresponds to V is obtained as the row of the subscripts in the rearranged vector V π from either (4), or (5): π ≡ π(V ) = (r 0 , r 1 , . . . , r D−1 ).
From the set of embedding vectors V, calculate a new set Π of order patterns by replacing each vector in (2) by the corresponding permutation: Now, empirical probability of each permutation, p(π i ), can be obtained by dividing the number of occurrences of π i in the Π by the total number of elements in the Π. The permutation entropy of V is the Shannon entropy of the probability distribution p(π i ): where K is the number of different permutations in the Π.

Treatment of equal values
Equal values in an embedding vector are, to an extent, inconvenient. Indeed, if x r = x s for some 0 ≤ r, s < D in a vector V = (x 0 , x 1 , . . . , x D−1 ), then r and s should be placed side by side in the permutation (6), but which one should go first? Due to sameness of values it is impossible to uniquely determine a corresponding permutation without introducing additional rules. In some cases the possibility of equal values can be ignored due to their low probability. This is reasonable when the embedding dimension is low, and/or a chaotic process data are recorded with high precision, see [1,15,16]. If equal values are inevitable, the following rule is applied 2 if x s = x r and s > r than s goes first.
The rule (7) has different meaning depending of whether (4) or (5) convention is used. Namely, in the case of (4), an embedding vector with all components equal will be equivalent to a vector with monotonically ascending components. If (5) is adopted, then that same vector will be equivalent to a vector with monotonically descending components, see Fig. 1. Without knowing a real system, it is not clear which case is better and whether it is good or bad to label a sequence of same values as being decreasing or increasing. Actually, the permutation symbolization technique aims at reducing redundancy. Discrimination between constant and either increasing, or decreasing sequences of data may appear to be excessive in some cases. On the other hand, when a system generating data has a few possible outputs, or the data was subjected to a crude quantization, or embedding dimension is large, it may happen to be useful if presence of equal values in the embedding vector results in order pattern preserving this fact. One possible approach to do this is discussed in the next section.

Symbolization
The following symbolization is aimed to keep information about equal values in embedding vectors. Having a vector V = (x 0 , x 1 , . . . , x D−1 ) construct a 2 In some cases, e.g. [10,11], the opposite inequality sign is used here.
sequence of integers α: by the following rule. Find the smallest component, c 0 , in the V . If c 0 is found at places r 1 , r 2 , . . . , put number 0 at those places in the α(V ). Find the next smallest component c 1 , c 1 > c 0 in the V . If c 1 is found at places s 1 , s 2 , . . . , put number 1 at those places in the α. Proceed this way until components of V are exhausted. At this stage, all D components of α will be determined. The α obtained this way is used as a symbol of embedding vector V . For example, consider V = (7,15,7,25,15). The corresponding symbol, or the order pattern is α = (0, 1, 0, 2, 1). Here, information about equal values and their positions is preserved.
If V has no equal components, it can be proven (see Appendix A) that α = π −1 . This means that α is the inverse permutation of the one obtained for V if convention (5) is used. Since correspondence between permutations and their inverse is one-to-one, it does not matter which one, π or α, is used for calculating entropy. This further means that for a data set and embedding method, which does not deliver equal values in the embedding vectors, symbolization used here is equivalent to the permutation one 3 while calculating entropy.

Arithmetization
Expect that embedding vector V in (8) has exactly d unique components, where d ≤ D. In this case, corresponding symbol α(V ) will be a sequence of D numbers chosen from the set d = {0, 1, . . . , d − 1} in such a way that not any element from d is missed. The latter can be formulated as the following condition: The sequence α(V ) can be considered as a single integer A(V ), in a base-D positional numeral system 4 , with digits a D−1 a D−2 . . . a 0 : It is clear that there is one-to-one correspondence between order patterns α and integers obtained as shown in (10). Therefore, a set of order patterns, constructed as described in Sec. 3.1, can be replaced with a set A of integers obtained as shown in (10): The empirical probabilities p(A i ) to find an integer A i among those in A can be calculated as usual, and we have for the arithmetic entropy: where L is the number of different integers in the A. For a data sequence and embedding method which does not deliver equal values in the embedding vectors, all d i = D and the integers A i will represent corresponding permutation order patterns unambiguously. In this case, and A max corresponds to pattern α max = (0, 1, . . . , D − 1): In this case, only D! integers will be used from [A min ; A max ] due to condition (9).

How many new possible order patterns are got?
If it is decided to treat order patterns generated from embedding D-vectors with some components equal as not equivalent to those from vectors with all components different, then the number of all possible patterns will be greater than D!. Here we attempt to estimate how many new patterns can be obtained. Any where b(D) are known as ordered Bell numbers, see [19, p.337] for naming discussion. Calculating 5 N (D) for D ∈ {2, 3, 4, 5, 6, 7} we see that the number of new patterns is normally greater than D!, see Fig. 2 and also Table 1. Of course, the possible new patterns may only be significant when they can be observed (see discussion about this in [7]). This depends on the process under study and embedding method.

Coding
Certainly, there are several possible implementations of the algorithm discussed in Secs. 3.1, 3.2. Here, the one used for the examples in Sec. 4, and Appendix C, below, is shown. It is a C++ program. It is expected that the sequence (1) is organized into a one-dimensional array X[N]. For calculating arithmetic order pattern of vector V i shown in (3) it is necessary to pass a pointer to the X[i] to the function get numerical pattern, below, as its third argument: data point = X + i.
In the below example, X[i] is declared as double, but it can be of any type with appropriate sorting defined. The returning value is declared as mpz class, which is a GNU multiple precision integer (https://gmplib.org/). This is used because for embedding dimensions D > 15 the returned number representing an order pattern may exceed 64 bits in size 6 . For smaller D, mpz class can be replaced with int, or long everywhere in the code. Here D is the embedding dimension, tau is the delay. 9 The data_point points to the first component of the Vi in the 10 array of raw data. This code is transparent and does not refer to combinatorics. At the same time, provided an embedding vector does not have equal components, when loop at lines 23-28 above is complete, we obtain in the array pDpnm[D] a permutation π −1 , where π is the permutation for that vector, obtained in accordance with the standard rules of [1] reproduced in Sec. 2 with (5) adopted.

Example
The discussed methodology has been tested at two surrogate sequences. The purpose was to demonstrate that for a pair of sequences the standard permu-tation entropy method gives roughly the same entropy, whereas the arithmetic entropy may be considerably different.
For calculating standard permutation entropy in situation when equal components in embedding vectors are possible we replace the following fragment: With such a replacement we get in the array pDpnm[D] above, the permutation, which is inverse to one obtained for V i in the standard permutation entropy symbolization with rules (5) and (7) adopted. As it was mentioned above, usage of inverse permutations instead of the initial ones delivers the same value for the standard permutation entropy.
1 000 000 long S1 and S2 were produced and both permutation and arithmetic entropy have been calculated. The results are shown in Tables 2 and 3.
Notice that arithmetic entropy is considerably greater than the permutation one. This is due to high frequency of embedding vectors with equal components. Also, from Table 2 with τ = 1 it can be seen that arithmetic entropy discriminates better between S1 and S2. Although, case with delay τ = 2 shown in Table 3 is not similarly conclusive. This might be due to construction method of the S2 sequence. Namely, by pulling from S2 embedding vectors with delay 2,  Table 3: Same as Table 2 for embedding delay τ = 2.
we may get vectors with equal adjacent components, similarly to S1 case. This alleviates difference between S1 and S2. For τ = 1, embedding vectors for S2 do not have equal adjacent components. More examples are in the Appendix C, below.

Conclusions and discussion
In this note, we have discussed a method for calculating entropy in a sequence of data, which is similar to the permutation entropy method. The characteristic features of this method are as follows: (i) it treats equal components in the embedding vectors as being equal instead of ordering them artificially; (ii) it is entirely exempt of combinatorics, labeling order patterns by integers instead of permutations; (iii) if embedding vectors do not have equal components, this method delivers exactly the same value for the entropy as does the standard permutation entropy one.
In the symbolization procedure discussed in Sec. 3.1, new order patterns may appear as compared to the standard permutation method, see Sec. 3.3, above. Those new patterns arise from embedding vectors with some components being equal to each other. In the standard permutation entropy method, the embedding vectors characterized by those new patterns, if any, are labeled by permutations as if there were no equal components. This is made possible through ordering equal values in accordance with the rule (7).
Mathematically, replacing embedding vectors with their order patterns means constructing a quotient set from the set of all embedding vectors with respect to some equivalence relation, [21,10,22]. In the case of permutation entropy, the corresponding equivalence relation is defined by (7) and either (4), or (5).
Denote it by ∼ P . For arithmetic entropy, the corresponding equivalence relation is defined by the algorithm described in the first paragraph of Sec. 3.1. Denote it by ∼ A . It is clear that for two embedding vectors U , V , if U ∼ A V , then U ∼ P V . Namely, if U , V have the same arithmetic order pattern then they do have the same permutation order pattern. That means that ∼ P is coarser relation than ∼ A . Other equivalence relations could be offered, which are courser than ∼ P , or finer than ∼ A , or lying in between, or incomparable with the both, see e.g. [12]. A symbolization which still uses permutations, but is equivalent to discussed here, as regards the treatment of equal values in embedding vectors, has been proposed in [11], see discussion in the Appendix B, below. Which one is better depends on the data sequence and which kind of redundancy is intended to strip.

A Equivalence with permutations
The following theorem proves the statement made in Sec. 3.1.
The latter just means that α(V ) = (π(V )) −1 . Due to this theorem, method discussed in this note is equivalent to the standard permutation entropy method if in any embedding vector any two components are different.

B Comparison with modified permutation entropy
Several versions of modified permutation entropy symbolization have been proposed. We analyze here those proposed in [10] and [11]. Consider firstly [10]. The symbolization proposed there is obtained as follows. Having an embedding vector V = (x0, x1, . . . , xD−1) arrange its components as shown in (5), above, with their subscripts retained. If there are equal components, arrange their subscripts similar to the (7) rule, or any other way. Before fetching the row of subscripts in the resulting vector Vπ = (xr 0 , xr 1 , . . . , xr D−1 ) as modified symbol, do the following preparation. If there is a group of equal components xs 0 = xs 1 = · · · = xs l in the Vπ, then replace all subscripts in this group by the smallest among {s0, s1, . . . , s l }. Do this with all groups of equal components in the Vπ. Use the row of subscripts in the such way modified Vπ as the modified symbol of V . This way modified symbolization retains some information about equal components in the V . Let us denote this type of symbolization as MPE and corresponding symbol as µ(V ).
By comparing values presented in the Table 1, above, with the data of [10, TABLE I] we see that the total number of possible patterns is bigger in the Table 1. Therefore, it could be expected that MPE symbolization used in [10] is coarser than that discussed in this note. Additional hint in the same direction is that for some embedding vectors symbolization of [10] gives the same result while method discussed in this note gives two different. Here is one example: V1 = (3, 5, 3, 5), V2 = (3, 5, 5, 3). MPE symbolization of [10] gives both for V1 and V2 the same order pattern (0, 0, 1, 1), while AE symbolization gives (0, 1, 0, 1) for V1 and (0, 1, 1, 0) for V2 resulting in two different numerical patterns, 68 and 20 calculated for D = 4 as shown in (10), above. Notice now that for any two embedding vectors V1 and V2, if α(V1) = α(V2), then µ(V1) = µ(V2).
Indeed, MPE symbol of any vector V is obtained through rearranging components of V in accordance to their rank order. Symbol α(V ), if considered as a vector, has the same rank of its components as does V . Therefore, α(V ) can be used for calculating µ(V ) instead of V itself. If so, then (14) becomes evident. The above reasoning proves that MPE and AE methods of symbolization are comparable and AE is finer than MPE.
Consider now symbolization used for modified permutation entropy proposed in [11]. In this symbolization each embedding vector V is symbolized with σ(V ), The σ(V ) has the following structure: Here, π(V ) is a permutation. The second half in (15), e(V ) = (e1, e2, . . . , eD−1), keeps information about equal components in V . Call this symbolization MPE2. The symbol σ(V ) is obtained as follows. Arrange components in V = (x0, x1, . . . , xD−1) in the ascending order keeping their subscripts. As a result we obtain a sequence of groups consisting of equal components. Each group may have from one to D elements. Of course, in the latter case there will be only one group. The value composing each group in the sequence increases from left to right. Arrange subscripts in each group in the ascending order. Denote this way prepared sequence of components with their initial subscripts asṼ = (vr 0 , vr 1 , . . . , vr D−1 ). The row of subscripts inṼ is π(V ) from (15). This is standard PE symbol with (5) adopted with the only difference that in the rule (7) the opposite inequality sign is used. The sequence e(V ) is composed of D − 1 zeros and ones by the following rule: if vr i−1 = vr i then ei = 1, otherwise ei = 0.
Theorem 2. Symbolization MPE2 produces the same partition of a set of embedding vectors as does the AE one described in the Sec. 3.1, above.
Proof. In order to prove this statement we need to show that for any V1, V2 the following equivalence holds: It is easily seen that σ(V ) can be unambiguously recovered from α(V ). Indeed, V and α(V ) considered as a vector, have the same rank order of components. And calculation of σ(V ) is based exclusively on the rank order. Therefore, Thus, vectors with same α will have same σ. This proves one half of (16). In order to prove the second half, we need to show how α(V ) = (a0, a1, . . . , aD−1) can be unambiguously recovered from σ(V ). For this purpose we use the mentioned above equality (17). So, if we arrange the α(V ) in ascending order retaining subscripts, we obtain, instead ofṼ , above, a vectorα = (ãr 0 ,ãr 1 , . . . ,ãr D−1 ). This vector consists of groups of equal values: the first group has only zeros, the second one -only '1', the last one -only 'd − 1', where d is the number of unique components in the V or α(V ). The sequence of subscripts in theα is the permutation, which constitutes the first part in the σ(V ). If one would have anα without subscripts inherited from the α(V ), in the form of β = (b0, b1, . . . , bD−1), the required α might be obtained by applying permutation π(V ) to β. Namely, for i = 0, 1, . . . , D − 1, ar i = bi, where ri is taken from the permutation π(V ): ri = πi. The required sequence β can be recovered from the second part of σ. For this purpose, do the following reprocessing of e(V ). Replace e(V ) with the following sequence (0, e1, e2, . . . , eD−1). With the obtained new sequence proceed as follows. At the step number one, if e1 = 1 replace it with 0, otherwise replace it wit 1. Similarly, at the step number i, if ei = 1 replace it with the number put at the previous step in place of ei−1. Otherwise, replace ei with that same number incremented by 1. After replacing eD−1 we obtain the required sequence β. This completes the proof.
C One more example 7 Here we consider the sequence of digits in decimal expansion of √ 2. The first 1 million digits in the decimal expansion of √ 2 has been downloaded from here: https://catonmat.net/tools/generate-sqrt2-digits and here: https://apod.nasa.gov/htmltest/gifcity/sqrt2.1mil. Denote this sequence S1. The first 10 millions digits in the decimal expansion of √ 2 has been downloaded from here: https://apod.nasa.gov/htmltest/gifcity/sqrt2.10mil. Denote this sequence S10. Both PE and AE were calculated for both S1 and S10 for different embedding dimensions D with delay τ = 1.  Table 4: Occurrences of different AE patterns in S1. Here n min denote the smallest number of repetitions in S1 for a pattern, n max -is the biggest number of repetitions, N tot -is the total number of patterns found.  Table 5: Arithmetic entropy, permutation entropy and normalized entropies for the first 1 000 000 digits of √ 2. Entropy is given in bits. NAE is calculated as AE/log 2 (N tot ), were N tot is taken from the bottom row of Table 4.
From the Tables 5 and 6 we see that AE is usually bigger than PE. This could be explained by the bigger total number of patterns available in the AE symbolization. Perhaps, for this same reason normalized AE is smaller than NPE for small D. What seems unexpected, it is the opposite behavior of NPE and APE with growing D. Namely, NAE is increasing and NPE is decreasing function of D for the parameter set considered 8 . As it is illustrated in the previous paragraph, the D-tuples of digits from the expansion sequence are distributed unevenly between different order patterns both for PE and AE. (This might explain dispersion of the patterns' frequencies observed in [6, Fig. 8]). The above mentioned behavior with increasing D suggests that the unevenness decreases for AE and increases for PE, at least in some "normalized" sense. This is for the √ 2 expansion. Whether a similar behavior takes place for other sequences, and a possible practical utilization of this fact require additional study.

Data Availability
The data used to support the findings of this study are available from the author upon request.

Conflicts of Interest
The author declares that there are no conflicts of interest.