Theory of Acceleration of Decision Making by Correlated Time Sequences

Photonic accelerators have been intensively studied to provide enhanced information processing capability to benefit from the unique attributes of physical processes. Recently, it has been reported that chaotically oscillating ultrafast time series from a laser, called laser chaos, provide the ability to solve multi-armed bandit (MAB) problems or decision-making problems at GHz order. Furthermore, it has been confirmed that the negatively correlated time-domain structure of laser chaos contributes to the acceleration of decision-making. However, the underlying mechanism of why decision-making is accelerated by correlated time series is unknown. In this study, we demonstrate a theoretical model to account for accelerating decision-making by correlated time sequence. We first confirm the effectiveness of the negative autocorrelation inherent in time series for solving two-armed bandit problems using Fourier transform surrogate methods. We propose a theoretical model that concerns the correlated time series subjected to the decision-making system and the internal status of the system therein in a unified manner, inspired by correlated random walks. We demonstrate that the performance derived analytically by the theory agrees well with the numerical simulations, which confirms the validity of the proposed model and leads to optimal system design. The present study paves the way for improving the effectiveness of correlated time series for decision-making, impacting artificial intelligence and other applications.


Introduction
Optics and photonics have been extensively studied for high-speed information processing in various applications, especially machine learning [1][2][3][4][5]. One of the important branches of the research frontier is reinforcement learning [6], wherein the impacts of photonics have been intensively examined [7][8][9]. The multi-armed bandit (MAB) problem concerns decision-making to obtain high rewards from multiple selections, called arms, wherein the best arm is initially unknown. MAB problems involve a difficult tradeoff known as the exploration-exploitation dilemma, which captures a fundamental aspect of reinforcement learning [6]. The physical properties of photons have been utilized in solving MAB problems [7,8]. In particular, chaotically oscillating ultrafast time series generated by semiconductor lasers, called laser chaos, have been successfully utilized to solve two-armed bandit problems at GHz order; we call this system the laser chaos decision-maker hereafter [7]. As introduced below, the principle of the laser chaos decision-maker simply depends on the signal-level comparison between the chaotically oscillating time series and the threshold level. It has also been demonstrated that such a level comparison-based principle is scalable in a tree architecture, which has been experimentally demonstrated up to 64 arms [10].
Furthermore, the applications of laser chaos decision-makers have been studied to benefit from their prompt adaptation abilities in dynamically changing uncertain environments [11][12][13][14]. Takeuchi et al. applied laser chaos decision-making to channel selection problems in wireless communications [11], in which communication channels suffer from dynamically changing disturbances due to traffic, interference, or fading [15]. Kanemasa et al. extended the principle of the laser chaos decision-maker to channel bonding in IEEE 802.11ac networks [12]. Furthermore, Duan et al. optimized user-pairing in non-orthogonal multiple access (NOMA) systems by the laser chaos decision-maker [13]. Moreover, Kanno et al. combined laser chaos-based decision-making with photonic reservoir computing, where adaptive model selection is realized to enhance the computing capability [14].
In [7], it was demonstrated that the autocorrelation inherent in laser chaos time series impacted the decision-making performance. Indeed, chaotic time series with negative maximum autocorrelation yield superior performance when compared with pseudorandom numbers, colored noise, and random shuffle surrogate data of the original laser chaos time series [7]. Furthermore, Okada et al. extensively examined the decision-making acceleration by laser chaos using surrogate analysis, such as the Fourier transform surrogate [16]. It was found that both the statistical distribution of the amplitude of the time series and the negative autocorrelation therein impact decision-making performance [16].
In the literature, the usefulness of negative autocorrelation in time series has been theoretically analyzed regarding code division multiple access (CDMA) [17][18][19]. To achieve high performance in CDMA, the cross-correlation between the spreading sequences must be small. The optimal negative autocorrelation to minimize the interference has been mathematically derived, and the chaotic map that generates the smallest cross-correlation was defined. In addition, ref. [19] clarifies that the negative autocorrelation that minimizes cross-correlation accelerates the performance of solution search algorithms for combinatorial optimization problems. An FIR filter to generate the optimal chaotic CDMA sequence was also proposed based on the negative autocorrelation analysis [20]. Moreover, the effectiveness of such optimal negative autocorrelation codes has been experimentally demonstrated using software-defined radio systems [21].
However, regarding decision-making, the fundamental underlying mechanism of how negative autocorrelation inherent in time series impacts performance superiority is still unclear.
That is, the results of the previous studies [7,16] are all limited to empirical findings. If the effectiveness of the negative autocorrelation in laser chaos or correlated time series for decision-making is theoretically understood, it allows, for example, a systematic design approach to derive the optimal autocorrelation for a given problem. Besides, the insights gained by mathematical modeling support the reliability of the effectiveness provided by the negative autocorrelation in time series.
In this study, we theoretically construct a model to account for the effect of negative autocorrelation on decision-making performance. The theory of this study is inspired by correlated random walks [22,23]. Contrary to conventional random walks, which have transition probabilities independent of prior events, correlated random walks have probabilities dependent on prior events [22,23]. That is, the notion of correlated random walks allows us to represent state-dependent probability evolution dynamics. Such a theoretical architecture accounts for the interplay between the correlated time series and the evolution of decision-making. We clarify the validity of the proposed theoretical model by confirming the excellent agreement between the decision-making performance derived analytically by the proposed model and that obtained by numerical simulations.
The rest of the article is organized as follows: Section 2 reviews the mechanism of the laser chaos decision-maker. In Section 3, we introduce a numerical method to generate an arbitrary autocorrelation in time series, by which the relation between autocorrelation and the resultant decision-making performance is systematically examined. Section 4, which is the most important contribution of this study, presents the theoretical model of decision-making based on correlated time sequences. Section 5 demonstrates the agreement of the decision-making performance predicted by the proposed theory and numerical simulations. Section 6 concludes the article.

Laser Chaos Decision Maker: Using Time Series for Decision-Making
As mentioned in Section 1, the laser chaos time series allows ultrafast decision-making. Figure 1(a) schematically illustrates the architecture of the laser chaos decision-maker for a two-armed bandit problem, which is the scope of this study [7]. The two arms are called slot machines A and B. Laser chaos is generated by feeding a portion of the output light back to the laser by an externally arranged reflector, which is called delayed feedback. We compare the intensity level of the laser chaos with a certain threshold value, which is denoted by $T(t)$. The decision-making is executed as follows: When the sampled value of the time series is above the threshold, the decision is to choose slot machine A; otherwise, slot machine B is selected. The threshold $T(t)$ is updated according to the result of the slot machine play. Overall, the threshold is revised so that the same decision becomes more likely in subsequent plays when the present action is successful, whereas the threshold is revised in the opposite direction when the present action is a failure [7,8,10].
More precisely, the values of the threshold $T(t)$ are determined by

$$T(t) = k \times [TA(t)], \tag{1}$$

where $TA(t)$ is called the threshold adjuster and $[\,\ast\,]$ is the nearest integer to $\ast$. $[TA(t)]$ can take an integer value ranging from $-N$ to $N$, with $N$ being a natural number. Therefore, the number of levels that the threshold adjuster can take is $2N + 1$. Here, $k$ is a coefficient to convert $[TA(t)]$ to $T(t)$.
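$TA(t)$ is updated depending on the result of the action conducted at $t - 1$:

$$TA(t) = \begin{cases} -\Delta + \alpha\, TA(t-1) & \text{if machine A was selected and won,} \\ +\Omega + \alpha\, TA(t-1) & \text{if machine A was selected and failed,} \\ +\Delta + \alpha\, TA(t-1) & \text{if machine B was selected and won,} \\ -\Omega + \alpha\, TA(t-1) & \text{if machine B was selected and failed,} \end{cases} \tag{2}$$

where $\Delta$ denotes the increment, which is given by $\Delta = 1$ in this study. $\alpha$ is the forgetting parameter for weighting previous threshold adjuster variables, ranging from 0 to 1, that is, $0 \le \alpha \le 1$. $\Omega$ is called the penalty parameter [7,8].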
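The threshold mechanism described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the default value of `alpha`, the round-half-up tie rule, and the clamping of the integer level to the range from −N to N are all assumptions. The sign convention (the threshold falls when machine A wins and rises when machine A fails) is taken from the special case discussed in Section 4.

```python
import math


def update_threshold_adjuster(ta, selected, won, alpha=0.99, delta=1.0, omega=1.0):
    """Return the new threshold adjuster value after one slot-machine play.

    selected is "A" or "B"; won is True for a reward, False otherwise.
    alpha (forgetting), delta (increment), omega (penalty) follow the text;
    the default alpha=0.99 is an arbitrary illustrative choice.
    """
    if selected == "A":
        # A win pushes the threshold down (A becomes more likely next time).
        step = -delta if won else +omega
    else:
        # A win on B pushes the threshold up (B becomes more likely next time).
        step = +delta if won else -omega
    return alpha * ta + step


def threshold(ta, k=1.0, n=2):
    """Convert the adjuster to a threshold level T = k * [TA].

    Assumptions: ties round half up, and the integer level is clamped
    to the range -n..n so that 2n + 1 levels are available.
    """
    level = math.floor(ta + 0.5)
    level = max(-n, min(n, level))
    return k * level


def decide(signal, thr):
    """Choose machine A when the sampled signal exceeds the threshold."""
    return "A" if signal > thr else "B"
```

For example, starting from `ta = 0.0`, a win on machine A with `alpha=1.0` gives `ta = -1.0`, which lowers the threshold and makes a subsequent selection of machine A more likely.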
A hierarchical formation of such two-armed bandit problems has been proposed to deal with problems with more than two arms [10]. The elemental structure is the abovementioned two-armed situation with a dynamically updated threshold. This study focuses on the two-armed situation as the first theoretical analysis of the laser chaos decision-maker. Cases with more than two arms can be analyzed by extending the method proposed in this study; however, the analysis becomes very complicated. Therefore, we focus on the simple case in this study, and cases with more than two arms are left for future work.

Effectiveness of Correlated Time Series on Decision-Making
As described in Section 1, the performance of the two-armed bandit problem using laser chaos time series depends on the autocorrelation inherent therein [7,16]. The best performance is obtained when the autocorrelation of the time series exhibits its negative maximum [7]. Furthermore, the surrogate data analysis of laser chaos time series clarifies the impact of time-domain correlation [10]. In this study, to examine the influence of correlations in time series in a systematic manner, we introduce an artificially constructed time-correlated time series and analyze its influence on decision-making performance.
We construct a time series whose amplitude follows a Gaussian distribution while having a determined autocorrelation by utilizing the Fourier transform surrogate method [24]. The various steps involved are as follows:

(1) A time series $r(t)$ is constructed with $t$ ranging from 0 to $T - 1$, where $T$ is the length of the time series.
Here, we suppose that $r(t) = r(0)\lambda^{t}$. Specifically, $r(t) = \lambda\, r(t-1)$ holds, indicating that $r(t)$ has a time correlation specified by $\lambda$ with its previous point $r(t-1)$. We call $\lambda$ the autocorrelation coefficient in this study. Through the process above, the autocorrelation of the resultant $r'(t)$ is equivalent to that of $r(t)$. However, the amplitude distribution of $r'(t)$ follows a Gaussian profile because of the randomized phase factors in the Fourier domain.
The above-described process corresponds to a special case of the Fourier transform surrogate [24].
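A possible realization of this procedure is sketched below with NumPy. It assumes the standard phase-randomization recipe for Fourier transform surrogates (forward transform, random phases, inverse transform); the function names, the choice $r(0) = 1$, the series length, and the seed are assumptions made for illustration. The DC and Nyquist components are left unrotated so that the inverse transform is real and the power spectrum, and hence the circular autocorrelation, is preserved.

```python
import numpy as np


def correlated_gaussian_series(lam, length=4096, seed=0):
    """Generate r'(t): near-Gaussian amplitudes, autocorrelation set by lam."""
    rng = np.random.default_rng(seed)
    t = np.arange(length)
    r = float(lam) ** t                      # r(t) = r(0) * lam**t with r(0) = 1
    spectrum = np.fft.rfft(r)                # one-sided spectrum of the real series
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.shape)
    phases[0] = 0.0                          # keep the DC bin real
    if length % 2 == 0:
        phases[-1] = 0.0                     # keep the Nyquist bin real
    # Rotating each bin by a random phase keeps |spectrum| unchanged.
    return np.fft.irfft(spectrum * np.exp(1j * phases), n=length)


def lag1_autocorrelation(x):
    """Circular lag-1 autocorrelation normalized by the lag-0 value."""
    return float(np.dot(x, np.roll(x, -1)) / np.dot(x, x))
```

Calling `lag1_autocorrelation(correlated_gaussian_series(-0.8))` should return a value very close to −0.8, because only the phases, not the magnitudes, of the Fourier components are altered.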
Snapshots of the time series generated for the cases when the time correlation is specified by $\lambda = 0.8$, 0, and $-0.8$ are shown in Figures 1(b)-1(d), respectively. All of the time-series signals appear random, but there are distinct differences in their autocorrelation. With $\lambda = 0.8$, the signal level at time $t$ is similar to the signals around that point; that is, large signal-level differences between consecutive data points are rarely observed (Figure 1(b)). Conversely, with $\lambda = -0.8$, meaning a strong negative autocorrelation, the signal at time $t$ has almost the exact opposite value to the surrounding data (Figure 1(d)). As a result, the time series exhibits a highly time-varying structure. Meanwhile, the histogram of the signal level of these time series follows the same Gaussian distribution.
It should be noted that the above-described Fourier transform surrogate-based procedure does not perfectly reproduce the experimentally observed laser chaos time series. This is because the correlation in the above process is determined only by $r(t) = \lambda\, r(t-1)$ in Step (1), whereas the experimental laser chaos involves very long-range time correlations via delayed optical feedback. However, we consider that the Fourier transform surrogate-based method is quite beneficial to this study for several reasons. The first is that the correlation between two successive points can be specified by an arbitrary number, allowing $\lambda$ values smaller than even $-0.5$, which was experimentally not feasible, at least in the previous studies [7,10]. Therefore, systematic analysis is enabled for a wide range of $\lambda$. The second is that the amplitude distributions are kept equivalent to each other even when $\lambda$ is configured to different values, which also allows a clear examination of the impact of the autocorrelation inherent in the time series.
For these reasons, we use the time-series r′(t) generated using the above process. We then analyze how the MAB performance depends on the autocorrelation specified by λ.
In evaluating the performance of the MAB problem, we employ the correct decision rate (CDR). The CDR$(t)$ is defined as the ratio of selecting the slot machine with the highest reward probability at time step $t$, averaged over $m$ simulations or cycles. That is, CDR$(t)$ is expressed by

$$\mathrm{CDR}(t) = \frac{1}{m} \sum_{i=1}^{m} C_i(t), \tag{3}$$

where $m$ is the number of cycles with different random initial conditions. Here, $C_i(t) = 1$ when the slot machine with the highest reward probability is selected at the $t$th decision (or time $t$) of the $i$th cycle; in other words, correct decision-making is conducted. Otherwise, $C_i(t) = 0$, meaning that correct decision-making is not executed. In the following simulations, $m = 60000$.
Figure 2 summarizes the calculated CDR at $t = 1000$ as a function of the autocorrelation coefficient $\lambda$ in several different reward environments and settings of the decision-maker.
The reward probabilities of the two slot machines, called machine A and machine B, are denoted by $P_A$ and $P_B$, respectively. For example, in Figure 2(a), $P_A$ and $P_B$ are given as 0.9 and 0.3, respectively. In this situation, the correct decision is to select machine A as it is the slot machine with the highest reward probability ($P_A > P_B$). In addition, the number of threshold adjuster levels is 5, as specified by $N = 2$. It should be emphasized that a higher CDR is obtained when the autocorrelation is negative; indeed, the best CDR is given by $\lambda = -0.6$. Table 1 summarizes the reward probabilities of the slot machines and the number of threshold levels $N$ for each MAB problem. In Figures 2(b) and 2(c), $P_A$ and $P_B$ are configured differently while maintaining the same threshold number as in Figure 2(a) (i.e., $N = 2$). More specifically, the difference between $P_A$ and $P_B$ is only 0.1 in Figure 2(b) by setting $(P_A, P_B) = (0.6, 0.5)$. Similarly, the difference is 0.2 in Figure 2(c) by setting $(P_A, P_B) = (0.9, 0.7)$.
That is, the difficulties in finding the best machine are configured differently. Here, it should be noted that the highest CDR is accomplished when the autocorrelation coefficient $\lambda$ is given by $-0.8$ and $-0.3$ in Figures 2(b) and 2(c), respectively. That is, the best decision-making is realized with negatively correlated time series. The reward setting of (P A , respectively. The only difference is in the threshold value, which is specified by $N = 4$. The achieved CDR was different because of the change in the value of $N$. However, it should be noted that the highest CDR performances are all obtained with negative autocorrelation when $\lambda$ is given by −0.6, −0.9, and −0.

Theoretical Model of Decision-Making Using Correlated Time Series
This section presents a mathematical model to account for the impact of correlated time series on decision-making. Here, we focus on two-armed bandit problems where the two slot machines are called machines A and B. Figure 3 shows a conceptual architecture of the proposed model. We assume that slot machine A has a larger reward probability than slot machine B, that is, $P_A > P_B$. Therefore, the correct decision would be to choose slot machine A.
Here, we assume that the subjected time sequence takes either of the two signal levels specified by $+x$ or $-x$, which are denoted by sky blue marks in Figure 3. In the meantime, remember that the threshold level $T(t)$ given by equation (1) takes in total $2N + 1$ different signal levels, each of which is represented by $-N, -N+1, \ldots, N-1, N$. Furthermore, we assume that the higher-level signal $+x$ satisfies $N - 1 < x < N$, meaning that the upper signal level of the incoming time series is below the maximum threshold level but greater than the second maximum threshold. Similarly, the lower signal level $-x$ satisfies $-N < -x < -N + 1$, indicating that the lower signal level of the subjected time series is above the minimum threshold level but less than the second minimum threshold.
Based on the decision-making principle described in Section 2, we summarize the decision-making process in the present situation. Let the signal level of the incoming time series at time $t$ and the threshold level at time $t$ be denoted by $s(t)$ and $T(t)$, respectively.
(1) If $s(t)$ is given by $+x$ and the threshold is not at its maximum level $N$, the decision is to select machine A because $s(t) = +x$ is greater than $N - 1$.
Furthermore, the incoming signal $s(t)$ contains inherent correlations, as discussed in Sections 1 and 2. Because the signal $s(t)$ under study is a two-level signal train, we can consider the probability that the signal level $s(t+1)$ at time $t+1$ is different from $s(t)$ at time $t$, that is, $s(t+1) = +x$ follows $s(t) = -x$ or $s(t+1) = -x$ follows $s(t) = +x$. Since the autocorrelation between two consecutive timings is given by $\lambda$, such a signal-level changing probability is given by $\mu = (1 - \lambda)/2$. Conversely, the probability of exhibiting the same signal level is given by $1 - \mu = (1 + \lambda)/2$. Therefore, such stochastic processes are represented by the conditional probabilities

$$\Pr\bigl(s(t+1) = \pm x \mid s(t) = \mp x\bigr) = \mu \quad \text{and} \quad \Pr\bigl(s(t+1) = \pm x \mid s(t) = \pm x\bigr) = 1 - \mu,$$

where Pr denotes probability.

Table 1: The settings of the reward probabilities of slot machines ($P_A$ and $P_B$) and the parameter $N$ that specifies the number of threshold levels ($2N + 1$).

The important aspect is that the internal status of the decision-maker, represented by $T(t)$, is tightly coupled with the correlated time series subjected to the system as well as the betting results of the slot machine plays, which are specified by $P_A$ and $P_B$. The revision of $T(t)$ is described by the following cases. When slot machine A is selected, the threshold is updated as

$$T(t+1) = \begin{cases} T(t) + 1 & \text{when machine A fails, with probability } 1 - P_A, \\ T(t) - 1 & \text{when machine A wins, with probability } P_A, \end{cases}$$

and when slot machine B is selected, the threshold is updated as

$$T(t+1) = \begin{cases} T(t) + 1 & \text{when machine B wins, with probability } P_B, \\ T(t) - 1 & \text{when machine B fails, with probability } 1 - P_B. \end{cases}$$

It should be noted that regardless of the machine selection and betting result, the threshold level always increases or decreases in this case, meaning that staying at the same threshold level is not allowed. The procedure summarized above is a special case of the principle shown in Section 2, obtained by specifying the parameters therein as $k = \Delta = \Omega = \alpha = 1$.
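The relation $\mu = (1 - \lambda)/2$ used above can be checked with a short calculation. The sketch below assumes that the two-level signal is stationary, so that both levels are equally likely by symmetry, and interprets $\lambda$ as the normalized lag-1 autocorrelation.

```latex
\begin{align}
\Pr\bigl(s(t) = +x\bigr) &= \Pr\bigl(s(t) = -x\bigr) = \tfrac{1}{2}
  \quad \Rightarrow \quad \mathrm{E}[s(t)] = 0, \;\; \mathrm{E}[s(t)^2] = x^2, \\
\frac{\mathrm{E}[s(t)\, s(t+1)]}{x^2} &= (+1)(1 - \mu) + (-1)\,\mu = 1 - 2\mu, \\
1 - 2\mu = \lambda \quad &\Longrightarrow \quad \mu = \frac{1 - \lambda}{2}.
\end{align}
```

For instance, $\lambda = -0.8$ gives $\mu = 0.9$ (the level flips at nine out of ten steps on average), $\lambda = 0$ gives $\mu = 0.5$, and $\lambda = 0.8$ gives $\mu = 0.1$.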
In addition, we have to emphasize that upper and lower limits are newly imposed on $T(t)$: the decrement or increment of the threshold is not permitted beyond the range between $-N$ and $N$. Hereafter, we refer to this as the stopping rule. This setting is the simplest case for the laser chaos decision-maker. We use this simplest case to keep our analysis model from being too complicated. Cases with other settings may be treated by extending our proposed scheme, but this is left for future work.
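The simplified decision-maker with the stopping rule can be simulated directly. The following Monte Carlo sketch is an illustration under stated assumptions rather than the authors' code: the function name, the initial threshold of 0, the random initial signal level, the number of cycles, and the seed are all assumptions. It assumes $P_A > P_B$, so that selecting machine A counts as the correct decision, and it uses the two-level signal with flip probability $\mu = (1 - \lambda)/2$.

```python
import random


def simulate_cdr(p_a, p_b, n, lam, steps=200, cycles=2000, seed=1):
    """Estimate the correct decision rate at the final step by simulation."""
    rng = random.Random(seed)
    mu = (1.0 - lam) / 2.0                 # probability that the signal level flips
    correct = 0
    for _ in range(cycles):
        thr = 0                            # assumed initial threshold level
        signal_high = rng.random() < 0.5   # True means s(t) = +x
        choose_a = True
        for _ in range(steps):
            if thr == -n:
                choose_a = True            # both signal levels exceed the threshold
            elif thr == n:
                choose_a = False           # both signal levels are below the threshold
            else:
                choose_a = signal_high
            win = rng.random() < (p_a if choose_a else p_b)
            # A win or B loss lowers the threshold; A loss or B win raises it.
            step = -1 if choose_a == win else 1
            thr = max(-n, min(n, thr + step))   # stopping rule at the edges
            if rng.random() < mu:
                signal_high = not signal_high
        correct += choose_a                # decision taken at the final step
    return correct / cycles
```

A call such as `simulate_cdr(0.9, 0.3, 2, -0.6)` estimates the CDR for the reward environment of Figure 2(a); sweeping `lam` over a grid gives a rough picture of how the autocorrelation affects the result.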
To theoretically deal with the abovementioned, seemingly complex situations, we introduce a pair $v_t = (T(t), s(t))$, which represents the state of the model at time $t$. The space spanned by $v_t$ is

$$\{-N, -N+1, \cdots, N-1, N\} \times \{+x, -x\}.$$

Herein, we can characterize the state transition probability between two states. Suppose, for example, that the current state is $(i, +x)$ while $T(t)$ is not at the border, that is, $-N + 1 \le i \le N - 1$. Here, we consider the probability of the state transition to $(i + 1, -x)$. It should be noted that the decision is to select machine A in this given situation $(i, +x)$ since the signal level $+x$ is larger than the current threshold $T(t)$. In this state transition from $(i, +x)$ to $(i + 1, -x)$, the threshold is incremented ($i \to i + 1$) and the incoming signal level is reversed ($+x \to -x$). Such a situation occurs when the play of slot machine A is unsuccessful and the incoming signal level is flipped, whose probability is given by $(1 - P_A)\mu$. Similarly, all transition probabilities are determined. The notion of correlated random walks allows us to summarize such transitions in a unified manner [22,23].
We first denote the probability of the state $v_t$ by $\pi_t(v) = \pi_t(i, \sigma)$, meaning the probability of the state with $T(t) = i$ and $s(t) = \sigma$. In addition, we define a probability vector $\boldsymbol{\pi}_t(i)$, which is given by

$$\boldsymbol{\pi}_t(i) = \begin{pmatrix} \pi_t(i, +x) \\ \pi_t(i, -x) \end{pmatrix},$$

which combines the probabilities of the threshold level being $i$ for the different signal levels of the time series ($+x$ and $-x$). We denote the probability of the threshold being $i$ at time $t$, regardless of the incoming signal level, by $\pi_t(i)$, which is mathematically equivalent to the $L^1$-norm of $\boldsymbol{\pi}_t(i)$. That is,

$$\pi_t(i) = \pi_t(i, +x) + \pi_t(i, -x) = \lVert \boldsymbol{\pi}_t(i) \rVert_1.$$

Based on these preparations, the recurrence formulae of $\boldsymbol{\pi}_t(i)$ allow us to precisely characterize the behavior of the system.

Case 1.
The probability vector for the case when the threshold is between $-N + 1$ and $N - 1$ at time $t + 1$ is given by

$$\boldsymbol{\pi}_{t+1}(i) = P(i+1)\,\boldsymbol{\pi}_t(i+1) + Q(i-1)\,\boldsymbol{\pi}_t(i-1), \tag{11}$$

where the matrices $P$ and $Q$ are given, for $-N + 1 \le i \le N - 1$, by

$$P(i) = \begin{pmatrix} P_A(1-\mu) & (1-P_B)\mu \\ P_A\mu & (1-P_B)(1-\mu) \end{pmatrix}, \tag{12}$$

$$Q(i) = \begin{pmatrix} (1-P_A)(1-\mu) & P_B\mu \\ (1-P_A)\mu & P_B(1-\mu) \end{pmatrix}. \tag{13}$$

Equation (11) clearly implies that the probability vector of the threshold being $i$ comprises the transitions from the states with the thresholds being $i - 1$ and $i + 1$. The dynamics given by equation (11) are schematically illustrated in Figure 4(a). The elements of the matrices $P(i)$ and $Q(i)$ can be understood intuitively as follows.
The matrix $P(i)$ concerns the probability of decrementing the threshold level. For example, the (1, 1)-element of $P(i)$, or $P_{1,1}(i)$, represents the probability of the transition from the state $(i, +x)$ to $(i - 1, +x)$. The state $(i, +x)$ indicates that the decision is to select machine A. The decrement of the threshold indicates that the result is a win. The probability of consecutive identical signal levels is given by $1 - \mu$. Hence, $P_{1,1}(i) = P_A(1 - \mu)$. Similarly, $P_{1,2}(i)$ means the probability of the transition from the state $(i, -x)$ to $(i - 1, +x)$; here machine B is selected and loses, and the polarity of the incoming signal level changes. Therefore, $P_{1,2}(i) = (1 - P_B)\mu$. Similarly, $P_{2,1}(i)$ corresponds to the probability of the transition from the state $(i, +x)$ to $(i - 1, -x)$, and $P_{2,2}(i)$ corresponds to the transition from $(i, -x)$ to $(i - 1, -x)$. The blue arrows in Figure 4(a) schematically represent the role of the matrix $P(i)$, which concerns the decrementing of the threshold level.
Conversely, the matrix $Q(i)$ concerns the probability of incrementing the threshold level. $Q_{1,1}(i)$, for example, represents the probability of the transition from the state $(i, +x)$ to $(i + 1, +x)$, meaning that the threshold is incremented while the signal level is unchanged. This situation represents the case where the decision is to select machine A, the result is a loss, and the polarity of the incoming signal is the same; the corresponding probability is given by $(1 - P_A)(1 - \mu)$. Similarly, the other elements of $Q(i)$ are specified straightforwardly. The red arrows in Figure 4(a) schematically represent the role of the matrix $Q(i)$, which concerns the incrementing of the threshold level.
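The element-by-element reasoning for the interior matrices can be written down compactly. The sketch below builds the two 2×2 matrices for a threshold that is not at an edge, following the convention that the column index is the current signal level and the row index is the next signal level (index 0 for $+x$, index 1 for $-x$). The function name and the nested-list representation are assumptions made for illustration.

```python
def interior_matrices(p_a, p_b, lam):
    """Return (P, Q) for a threshold level strictly between -N and N.

    P collects the transitions that decrement the threshold
    (A wins or B loses); Q collects those that increment it
    (A loses or B wins).
    """
    mu = (1.0 - lam) / 2.0          # signal flip probability
    keep = 1.0 - mu                 # probability of the same signal level
    P = [[p_a * keep, (1.0 - p_b) * mu],
         [p_a * mu, (1.0 - p_b) * keep]]
    Q = [[(1.0 - p_a) * keep, p_b * mu],
         [(1.0 - p_a) * mu, p_b * keep]]
    return P, Q
```

A useful sanity check is that, for each current signal level, the entries of the corresponding column of P and Q together sum to one, since the threshold must either decrease or increase at an interior level.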

Case 2.
The probability vector for the case when the threshold is at the edge on the negative side, $-N$, at time $t + 1$ is specified by

$$\boldsymbol{\pi}_{t+1}(-N) = P(-N)\,\boldsymbol{\pi}_t(-N) + P(-N+1)\,\boldsymbol{\pi}_t(-N+1). \tag{14}$$

Edges are to be treated carefully in this case. First, $P(-N + 1)$ in the second term on the right-hand side of equation (14) describes the transition of the decrement of the threshold level from $-N + 1$ to $-N$, which has already been defined in equation (12). Second, since there are no threshold levels smaller than $-N$, the transitions involving increments or any $Q$ matrix are not included in equation (14). Third, what is different from Case 1 above is that the threshold level can be maintained at the edges, which is indicated by the first term on the right-hand side of equation (14). More specifically, the $P$ matrix at $-N$ is given by

$$P(-N) = \begin{pmatrix} P_A(1-\mu) & P_A\mu \\ P_A\mu & P_A(1-\mu) \end{pmatrix}. \tag{15}$$

$P_{1,1}(-N)$ means the state transition from $(-N, +x)$ to $(-N, +x)$. This corresponds to the case where the decision is to select machine A, the result is a win, and the signal polarity is unchanged. Therefore, $P_{1,1}(-N) = P_A(1 - \mu)$. Similarly, $P_{1,2}(-N)$ means the state transition from $(-N, -x)$ to $(-N, +x)$; what is different from $P_{1,1}(-N)$ is the change in polarity. Hence, $P_{1,2}(-N) = P_A\mu$. Likewise, $P_{2,1}(-N)$ and $P_{2,2}(-N)$ can be obtained. The blue arrows in Figure 4(b) illustrate the role of the matrix $P(-N)$, which concerns keeping the same threshold level.
Case 3.
Similar to Case 2, the probability vector for the case when the threshold is $N$ at time $t + 1$ is specified by

$$\boldsymbol{\pi}_{t+1}(N) = Q(N)\,\boldsymbol{\pi}_t(N) + Q(N-1)\,\boldsymbol{\pi}_t(N-1). \tag{16}$$

The meaning of equation (16) is similar to that of equation (14). $Q(N - 1)$ on the right-hand side of equation (16) has already been defined in equation (13). As in Case 2, the threshold level can be maintained at the edge, which is shown by $Q(N)$ in equation (16). This is given by

$$Q(N) = \begin{pmatrix} P_B(1-\mu) & P_B\mu \\ P_B\mu & P_B(1-\mu) \end{pmatrix}. \tag{17}$$

$Q_{1,1}(N)$ means the state transition from $(N, +x)$ to $(N, +x)$. This corresponds to the case where the decision is to select machine B, the result is a win, and the signal polarity is unchanged. Therefore, $Q_{1,1}(N) = P_B(1 - \mu)$. Similarly, $Q_{1,2}(N)$ indicates the state transition from $(N, -x)$ to $(N, +x)$; what is different from $Q_{1,1}(N)$ is the change in polarity. Hence, $Q_{1,2}(N) = P_B\mu$. Likewise, $Q_{2,1}(N)$ and $Q_{2,2}(N)$ can be obtained. The red arrows in Figure 4(c) illustrate the role of the matrix $Q(N)$, which concerns keeping the same threshold level.
Finally, a remark is needed for the matrix $P$ at $N$ and the matrix $Q$ at $-N$, which differ from those given by equations (12) and (13) and are given by

$$P(N) = \begin{pmatrix} (1-P_B)(1-\mu) & (1-P_B)\mu \\ (1-P_B)\mu & (1-P_B)(1-\mu) \end{pmatrix}, \tag{18}$$

$$Q(-N) = \begin{pmatrix} (1-P_A)(1-\mu) & (1-P_A)\mu \\ (1-P_A)\mu & (1-P_A)(1-\mu) \end{pmatrix}. \tag{19}$$

This is because the decision at the edges does not depend on the incoming signal level. For example, with the threshold at $N$, the decision is always to select machine B because both signal levels $+x$ and $-x$ are smaller than the threshold. Hence, $P_{1,1}(N)$ means the probability of the state transition from $(N, +x)$ to $(N - 1, +x)$, meaning that the decision is to select machine B, the result is a loss, and the polarity of the signal is unchanged. Therefore, $P_{1,1}(N) = (1 - P_B)(1 - \mu)$. Similarly, all other elements in equations (18) and (19) are specified. The blue arrows in Figure 4(c) and the red arrows in Figure 4(b) illustrate $P(N)$ and $Q(-N)$, respectively.
Figure 5 summarizes the chains of the probability vector $\boldsymbol{\pi}_t(i)$ given by equations (11), (14), and (16). The blue arrows, which represent the decrement of the threshold level, are induced by either a win by selecting machine A or a loss by selecting machine B. In contrast, the red arrows, which represent the increment of the threshold level, are triggered by either a win by selecting machine B or a loss by selecting machine A. The thresholds at the edges ($-N$ and $N$) involve arrows of transitions to an identical threshold.
Finally, the CDR can be discussed using the probabilities defined above. Assume that the correct decision is to select machine A. The selection of machine A is realized in the following two cases:
(1) The threshold is $-N$. In this case, both signal levels $-x$ and $+x$ result in the decision to choose machine A.
(2) When the threshold is between $-N + 1$ and $N - 1$, the input signal level of $+x$ results in the decision to choose machine A.
Hence, the probability of selecting machine A at time $t$, denoted by $\mathrm{CDR}^{(\mathrm{theory})}(t)$, is given by

$$\mathrm{CDR}^{(\mathrm{theory})}(t) = \pi_t(-N) + \sum_{i=-N+1}^{N-1} \pi_t(i, +x). \tag{20}$$
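The recurrences and the resulting theoretical CDR can be evaluated numerically with a short script. The sketch below is an illustration under stated assumptions, not the authors' implementation: instead of forming the 2×2 matrices explicitly, it propagates the probability of every state (threshold level, signal level) one step at a time, which is equivalent to the matrix recurrences of Cases 1 to 3 including the stopping rule. The function name, the dictionary representation, and the initial condition (threshold 0 with both signal levels equally likely) are assumptions.

```python
def theoretical_cdr(p_a, p_b, n, lam, steps):
    """Propagate state probabilities and return (CDR, state probabilities).

    States are (i, s) with threshold level i in -n..n and signal index s
    (0 for +x, 1 for -x). Machine A is assumed to be the correct choice.
    """
    mu = (1.0 - lam) / 2.0
    pi = {(i, s): 0.0 for i in range(-n, n + 1) for s in (0, 1)}
    pi[(0, 0)] = pi[(0, 1)] = 0.5          # assumed initial condition

    def selects_a(i, s):
        if i == -n:
            return True                    # both levels exceed the threshold
        if i == n:
            return False                   # both levels are below the threshold
        return s == 0                      # interior: A only for +x

    for _ in range(steps):
        nxt = dict.fromkeys(pi, 0.0)
        for (i, s), prob in pi.items():
            # Threshold goes down on an A win or a B loss.
            down = p_a if selects_a(i, s) else 1.0 - p_b
            moves = ((max(i - 1, -n), down), (min(i + 1, n), 1.0 - down))
            for new_i, p_move in moves:
                nxt[(new_i, s)] += prob * p_move * (1.0 - mu)   # same level
                nxt[(new_i, 1 - s)] += prob * p_move * mu       # flipped level
        pi = nxt

    cdr = sum(prob for (i, s), prob in pi.items() if selects_a(i, s))
    return cdr, pi
```

Evaluating `theoretical_cdr(0.9, 0.7, 2, lam, 1000)[0]` over a grid of `lam` values gives a curve of the theoretical CDR against the autocorrelation coefficient without running any stochastic simulation, which is the kind of design calculation the model is intended to enable.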

Evaluation
With the theoretical model shown in Section 4, we can calculate the time evolution of the probability vector $\boldsymbol{\pi}_t(i)$ and its $L^1$-norm $\pi_t(i)$ from any initial conditions. Consequently, $\mathrm{CDR}^{(\mathrm{theory})}(t)$ is derived by equation (20).
Here, we examine the case when the reward probabilities are given by $P_A = 0.9$ and $P_B = 0.7$ and assume that $N$ is given by 2, meaning that the number of threshold levels is 5. Herein, the initial probability vector is given by $\boldsymbol{\pi}_1(0) = (0.5, 0.5)$ while assuming all the other vectors are zero. The autocorrelation coefficient $\lambda$ specifies the time-correlated, two-level signal trains. Figure 5(b) shows the analytically calculated chains of probability vectors. As time evolves, the probability vector at the edge ($i = -2$) increases, indicating a high likelihood of choosing machine A, which is the correct decision (since $P_A > P_B$).
To examine the mechanism more deeply, Figures 6(a)-6(c) show the time evolution of the probability that the threshold is at level $i$ ($i = -2, -1, 0, 1, 2$) when the autocorrelation $\lambda$ is specified by $-0.8$, 0, and 0.8, respectively. What is commonly observed in these figures is that $\pi_t(-2)$, indicated by blue curves, increases as time elapses, leading to a high chance of selecting machine A, that is, correct decision-making. Meanwhile, $\pi_t(2)$, indicated by green curves, is approximately 0.2 at a time step of 25 when $\lambda$ is 0.8 (Figure 6(c)), whereas it is nearly zero at the same time step when $\lambda$ is $-0.8$ (Figure 6(a)). This indicates that the probability of choosing machine B, which is the wrong decision, is not negligible when $\lambda = 0.8$.
From another perspective, the blue, red, and yellow markers in Figure 6(d) characterize the probabilities of the threshold at $t = 1000$, written as $\pi_{1000}(i)$, when the autocorrelation $\lambda$ is given by $-0.8$, 0, and 0.8, respectively. We can clearly observe a large probability, greater than 0.6, at the threshold level of $-2$, regardless of the $\lambda$ value.
It is remarkable that for $\lambda = -0.8$, the probability monotonically decreases as the threshold increases, whereas for $\lambda = 0.8$, the probability increases when the threshold increases from 0 to 2. Even with zero autocorrelation ($\lambda = 0$), a slight increase in probability is observed at the threshold level of 2. We consider that a positive autocorrelation tends to produce similar decisions consecutively, and hence the decision can be locked into a state that is actually not the optimal one. Indeed, a related tendency is observed in Figures 6(a)-6(c), where the dynamic change of probabilities, most notably of $\pi_t(0)$ indicated by orange curves, exhibits a strong oscillatory behavior with $\lambda = -0.8$, whereas it is attenuated when $\lambda = 0.8$.

Figure 5: (a) Chains of the probability vector $\boldsymbol{\pi}_t(i)$ given by equations (11), (14), and (16). (b) An example of the evolution of the probability vector $\boldsymbol{\pi}_t(i)$ when the initial condition is $\boldsymbol{\pi}_0(0) = (0.5, 0.5)$, the autocorrelation coefficient $\lambda$ is $-0.8$, the threshold number is specified by $N = 2$, and the reward environment is $(P_A, P_B) = (0.9, 0.7)$.
As discussed in Section 4, the decision-making ability can be theoretically derived as $\mathrm{CDR}^{(\mathrm{theory})}(t)$, given in equation (20), using the probability model. We examined $\mathrm{CDR}^{(\mathrm{theory})}(t)$ under a variety of conditions. Herein, the reward probabilities $(P_A, P_B)$ and the number of threshold levels specified by $N$ are summarized in Table 1, which are the same as discussed in Section 3 and Figure 2. For example, Figure 7(a) concerns the case $(P_A, P_B) = (0.9, 0.3)$ and $N = 2$. The red curves in Figure 7 show $\mathrm{CDR}^{(\mathrm{theory})}(1000)$ as a function of the autocorrelation coefficient $\lambda$ ranging from $-0.95$ to 0.9 with an interval of 0.05. In addition, $\lambda = -0.99$ is examined. For all cases in Figure 7, the maximum $\mathrm{CDR}^{(\mathrm{theory})}(1000)$ is obtained when the autocorrelation coefficient is negative, as indicated by the red arrows therein, which coincides with the numerical observations shown in Figure 2.
Furthermore, we numerically simulate the correct decision rate CDR$(t)$ defined in equation (3) based on the original decision-making algorithm described in Section 3 while adopting the stopping rule in Section 4. The results are shown by the blue curves in Figure 7. We observe in all panels in Figure 7 that the results from theory (red) and simulation (blue) match well with each other. Additionally, while the blue marks exhibit fluctuations since they are obtained as a statistical average over numerical results, the red marks are smooth because they are analytically derived based on the theory described in Section 4.

Conclusion
In this study, we construct a theoretical model to account for the acceleration of decision-making by correlated time sequences. Previous studies have shown that the solution to the two-armed bandit problem is accelerated by negative autocorrelation inherent in the time series subjected to the decision-making system. However, its underlying mechanisms were unclear. We begin the discussion by clarifying the impact of time-domain correlation on decision-making by utilizing time series with specific autocorrelation designed via the Fourier transform surrogate. Coinciding with the prior reports using experimentally observed laser chaos time series, we confirm that negative autocorrelation accomplishes superior decision-making performance. The difficulties in understanding the underlying mechanism of such acceleration stem from the fact that multiple entities are involved: the dynamical reconfiguration of the internal status of the decision-maker (the threshold level and its revision), the time-domain structure of the incoming time series, and the stochastic attributes of the environment (reward probability of slot machines). The theoretical model of this study unifies these entities based on correlated random walks. Furthermore, the decision-making performance obtained analytically by the theoretical model agrees with the numerical results from simulations, which validates the proposed theory. Additionally, this indicates that the optimal autocorrelation for maximizing decision-making performance can be obtained through the model without executing enormous numerical simulations. The proposed scheme to select the laser chaos with the best autocorrelation can accelerate performance in applications such as wireless communication systems [11][12][13].
This study constitutes a foundation of the intellectual mechanism enhanced by correlated time series, which is important for future information and communications technology. The laser chaos decision-maker can quickly solve MAB problems with GHz order decisions. Therefore, it will be possible to optimize decisions in wireless communication systems in real time. However, a dedicated device for the laser chaos decision-maker is necessary. In the meantime, a chip-scale photonic implementation has been recently demonstrated [25] on the basis of the recent advancements in integrated photonics technology, indicating the potential for system integration and miniaturization.

Data Availability
The data that are used to support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.