Strongly Nonlinear Stochastic Processes in Physics and the Life Sciences

Strongly nonlinear stochastic processes can be found in many applications in physics and the life sciences. In particular, in physics, strongly nonlinear stochastic processes play an important role in understanding nonlinear Markov diffusion processes and have frequently been used to describe order-disorder phase transitions of equilibrium and nonequilibrium systems. However, diffusion processes represent only one class of strongly nonlinear stochastic processes out of four fundamental classes of time-discrete and time-continuous processes evolving on discrete and continuous state spaces. Moreover, strongly nonlinear stochastic processes appear both as Markov and non-Markovian processes. In this paper the full spectrum of strongly nonlinear stochastic processes is presented. Not only are processes presented that are defined by nonlinear diffusion and nonlinear Fokker-Planck equations but also processes are discussed that are defined by nonlinearMarkov chains, nonlinear master equations, and strongly nonlinear stochastic iterative maps. Markovian as well as non-Markovian processes are considered. Applications range from classical fields of physics such as astrophysics, accelerator physics, order-disorder phase transitions of liquids, material physics of porous media, quantum mechanical descriptions, and synchronization phenomena in equilibriumandnonequilibrium systems to problems inmathematics, engineering sciences, biology, psychology, social sciences, finance, and economics.


McKean and a New Class of Stochastic Processes.
In two seminal papers, McKean introduced a new type of stochastic processes [1,2].The stochastic processes studied by McKean satisfy drift-diffusion equations for probability densities that are nonlinear with respect to their probability densities.While nonlinear drift-diffusion equations have frequently been used to describe how concentration fields evolve in time [3], the models considered by McKean describe stochastic processes.A stochastic process not only describes how expectation values (such as the mean value) evolve in time, but also, provides a complete description of all possible correlations between two or more than two time points [4][5][6].Consequently, the study of stochastic processes goes beyond the study of concentration fields or the evolution of single-time point expectation values such as the mean.The observation by McKean that nonlinear drift-diffusion equations can describe stochastic processes opened a new avenue of research.

From McKean's Stochastic Processes to Strongly Nonlinear
Stochastic Processes.Due to the key role that the nonlinearity has for the processes considered by McKean and due to the fact that the processes satisfy the Markov property, processes of this type have been called "nonlinear Markov processes" (see, e.g., [7]).Moreover, in this context the term "nonlinear Fokker-Planck equations" has also been introduced because the drift-diffusion equations of nonlinear Markov processes exhibit formally the mathematical structure of Fokker-Planck equations.However, the phrases "nonlinear Markov processes" and "nonlinear Fokker-Planck equations" are misleading because they refer both to processes that are subjected to nonlinear forces and to processes that are described by drift-diffusion equations that are nonlinear with respect to their probability densities.In search for a more specific label, two alternative terms have been proposed in the literature.Wehner and Wolfer [8] suggested the term "truly nonlinear Fokker-Planck equations, " whereas Frank [9] coined the term "strongly nonlinear Fokker-Planck equations." Only recently, it has become clear that the class of stochastic processes addressed by McKean is not limited to be described by driftdiffusion equations of the Fokker-Planck type.Therefore, the research field inspired by the work of McKean is about nonlinear stochastic processes in general.To give processes of this kind a more specific name, we may adopt the suggestions by Wehner and Wolfer or Frank and say that there is a novel class of processes called "truly nonlinear stochastic processes" or "strongly nonlinear stochastic processes." In what follows, we will use the term "strongly" that should be considered as a synonym to "truly" just to avoid repeating the phrase "truly or strongly" over and over again.
For the purpose of this paper, the following unifying definition of a strongly nonlinear process will be used: A strongly nonlinear stochastic process is a stochastic process for which the evolution of realizations depends on at least one expectation value of the process or on the appropriately defined probability density of that process.
Note that for strongly nonlinear processes evolving on discrete state spaces the realizations are required to depend on at least one expectation value or on the occupation probability rather than the probability density.However, as we will demonstrate in Section 2, for our purposes the occupation probability of a discrete-state process can be derived from an appropriately defined probability density.Therefore, the definition as given above holds both for processes evolving on continuous and discrete state spaces.
In this review of strongly nonlinear stochastic processes a distinction will be made between four classes of processes.These classes arise naturally if it is taken into consideration that in many disciplines time is not necessarily considered to be a continuous variable.Rather, time might be considered as a discrete variable such that processes evolve in discrete time steps.For example, a process describing the evolution of a stock price on a day-to-day basis is typically modelled by means of a time-discrete process [10,11].In short, time can be a discrete or continuous variable.Likewise, the state space of a process might be discrete or continuous.The four possible combinations of time and space characteristics give rise to four classes of stochastic processes.We will address these classes in separate sections, see Table 1.
Before we proceed in Sections 3, 4, 5, and 6 with the systematic presentation outlined in Table 1, we will present in Section 2 benchmark examples of processes for each of the four classes.For the examples discussed in Section 2 a unifying mathematical definition irrespective of class membership will be presented in the final subsection of Section 2. Mean field theory and the mathematical literature will be addressed in Sections 7 and 8.As it turns out, all examples reviewed in Sections 2 to 6 exhibit the Markov property.Therefore, Section 9 is devoted to the review of some more recent studies on non-Markovian strongly nonlinear stochastic processes.In Section 10 some key issues of strongly nonlinear stochastic processes will be highlighted.

Benchmark Examples of Strongly Nonlinear Processes and on a General Stochastic Evolution Equation for a Broad Class of Strongly Nonlinear Markov Processes
2.1.Some Notations.At the present stage, it is useful to introduce some notations in order to give some benchmark examples for each of the four classes indicated in Table 1.Time-discrete processes evolve on discrete time steps  = 1, 2, 3, . ... In contrast, time-continuous processes are described by a continuous time variable  defined on some interval.For our purposes this interval will go from zero to infinity, meaning that we will consider initial value problems starting at an initial time  = 0 and extending in time infinitely.Note that the symbol  will be used to denote both discrete-valued and continuous time variables.
Processes evolving in discrete, finite state spaces exhibit a number of  states labeled by  = 1, 2, . . ., .Let  denote the state variable of a state discrete process, which implies that  assumes an integer between 1 and .Moreover, let   denote the th realization of the process.A fundamental (although not complete) stochastic description of processes evolving on finite, discrete state spaces is given by the occupation probabilities ().The occupation probability () defines the probability that the state  is occupied.It can be computed from a number of  realizations in the limiting case of  → ∞ like The function (⋅) is the Kronecker function, which equals 1 for   =  and equals 0 otherwise.The probabilities () defined by (1) satisfy the normalization condition Processes evolving on finite dimensional, continuous state spaces can be described by a state vector  with coefficients  1 , . . .,   .The elements or coordinates   are defined on certain domains of definition Ω  .For example, a domain of definition may correspond to the real line.An important measure characterizing stochastic processes evolving on continuous state spaces is the probability density  defined as the ensemble average over the Dirac delta function (⋅).Note that the same symbol (⋅) will be used for the Kronecker function and the Dirac delta function in order to stress the analogies between the classes listed in Table 1.Mathematically speaking, the probability density  can be defined by In (3) the variable  is the state vector with coordinates  1 , . . .,   , whereas the elements   1 , . . .,    are the elements of the th vector-valued realization   of the process.The probability density satisfies the normalization condition where Ω  are the domains of definitions mentioned previously.
Let us return to processes defined on a discrete state space.In addition, we put  = 1 and assume that Ω tot = Ω 1 denotes the real half-line from 0 to infinity.Formally, the probability density () assumes the form where () are the occupation probabilities of the states  defined in (1).In turn, the occupation probabilities can be calculated from () like where  is a positive number smaller than 1 (e.g.,  = 0.1).
In view of (6) the occupation probabilities () may be considered as functions of an appropriately defined probability density , as stated in Section 1.With these notations at hand, some benchmark examples of strongly nonlinear stochastic processes can be given.
Here,  describes the transition probability matrix of the process.The matrix is an  times  matrix with coefficients ( → ).The coefficients ( → ) describe the conditional probabilities that transitions from state  to state  occur given that the process is in the state .The conditional probabilities ( → ) may depend explicitly on time  such that ( → , ).A nonlinear Markov process is characterized by the property that at least one conditional probability depends at least on one expectation value computed from the process at time step  or on one occupation probability (, ).
In this case we have ( → , , ()) as indicated in (7).A complete description of the stochastic process can be derived from (7) as shown in Frank [12].In view of the definition of a strongly nonlinear stochastic process given in Section 1.2 that is based on realizations of stochastic processes, it is worth while to consider an alternative definition of the process.To this end, the stochastic map (⋅, ) can be defined that maps a state () to a state ( + 1) like  ( + 1) =  ( () , ,  () ,  ()) .
The mapping (⋅, ) depends on a random variable () that is uniformly distributed on the interval [0, 1] at any time step .Across time, () is statistically independent.The stochastic map (⋅, ) describes the jumps from () to ( + 1) satisfying the transition probabilities ( → ).In order to derive an explicit definition of (⋅, ), we first note that the conditional probabilities ( → ) are by definition real numbers between 0 and 1 that add up to unity.Next, it us useful to consider the cumulative conditional probabilities (, ) defined for  = 1, 2, . . . and  = 0, 1, . . .,  by for  > 0. For  = 0 we put (, 0) = 0.For  > 0 the cumulative measures (, ) describe the probabilities that transitions from state  to one of the states 1, 2, . . .,  occur at time  given that the process has an occupation probability vector  at time .The cumulative measures represent a set of monotonically increasing points on the interval [0, 1] such that the interval between two adjacent points (,  − 1) and (, ) equals ( → ).Consequently, the point spectrum (, ) in combination with the random variable () can be used to construct realizations of the process under consideration.If () =  holds and if () falls into the interval [(,  − 1), (, )), then we put ( + 1) = .If we do so, then transitions from  to  happen with the probability ( → ) if the process is in state  at  and distributed like () at -as required.A concise mathematical formulation of this idea can be obtained with the help of the indicator function ( → , ) defined by Then, the map (⋅, ) can be defined as follows: As indicated in (11), the mapping depends on the probability vector () because the indicator function (⋅) depends on () via the conditional probabilities  and .Therefore, each realization   () of the process defined by ( 8)- (11) depends on the occupation probabilities (, ) of the process.
The occupation probabilities in turn can be computed from the realizations; see (1).Therefore, the mathematical model given by (1), and ( 8)-( 11) provides a closed (and complete) description for the strongly nonlinear stochastic process described by the nonlinear Markov chain (7).

Strongly Nonlinear Markov Processes Described by Stochastic Iterative
Maps.The realization of a time-discrete stochastic process on an -dimensional, continuous state space is described by a time-dependent state vector () with timedependent components  1 (), . . .,   ().From (3) it follows that at any time step , the vector () is distributed according to the time-dependent probability density: Strongly nonlinear Markov processes defined by stochastic iterative maps with nonparametric white noise satisfy equations of the general form A simplified version of ( 13) was introduced earlier in Frank [13].In (13) the vector-valued variable ℎ(⋅) is called the drift function and describes the deterministic evolution of the process.Second, the term (⋅) ⋅ () in ( 13) describes the vector-valued fluctuating force of the stochastic process.The random vector () with coefficients  1 (), . . .,   () is an dimensional normally distributed white noise process.That is, each coefficient   () is normally distributed at any time step  and statistically independent in time.The random coefficients   () are statistically independent among each other.The variable  is an  times  matrix with coefficients   .The coefficients   with  = 1, . . .,  for a fixed index  weight the contributions that the random components  1 (), . . .,   () have on the evolution of the process in the direction (or along the coordinate)   .In particular, for   (⋅, ) = 0 the random coefficient   () does not occur in the evolution equation of   at time .Therefore,  may be considered as weight matrix.Importantly, the coefficients   measure the strength of the fluctuating force (⋅) ⋅ () of the stochastic process described by (13).The reason for this is that the random vector () is normalized at any time step .Most importantly, in the case of a strongly nonlinear Markov process, the drift function ℎ(⋅) or the fluctuating force (⋅) ⋅ () or both depend on at least one expectation value or on the probability density  as indicated in (13).The operator  maps the probability density  to a real number or a set of real numbers.This mapping may depend on the value of .
For example, the probability density may be evaluated at the value of the process like In this context, it should be noted that the formal dependencies ℎ(  [𝑃]) and (  []) include the dependencies on expectation values (such as the mean) as special cases.For example, where ⟨⋅⟩ is the operator symbol for an ensemble average such that ⟨⟩ denotes the mean of the random vector .Note that ( 12) and ( 13) provide a closed description of a stochastic process.Moreover, from the structure of ( 13) it follows immediately that processes defined by ( 12) and ( 13) satisfy the definition of strongly nonlinear stochastic processes presented in Section 1.2.An illustrative example of a stochastic iterative map ( 13) is the autoregressive model of order 1 subjected to a so-called mean field force ℎ MF proportional to the mean ⟨()⟩ of the process.For  = 1, the model reads [13]  ( + 1) =  () +  ⟨ ()⟩ +  () , with  ≥ 0, where the parameters  and  are scalars.We will return to model ( 16) in Section 4.

Strongly Nonlinear Markov Processes Described by Nonlinear Master Equations.
For time-continuous stochastic processes evolving on discrete state spaces the occupation probabilities () are functions of time , where  is a continuous variable.Likewise, the probability vector () becomes a time-dependent variable with coefficients (1, ), . . ., (, ) just as in Section 2.2 but with  being a continuous rather than a discrete variable.Time-continuous, strongly nonlinear Markov processes on discrete state spaces can be described by means of nonlinear master equations of the form (see, e.g., [14]) for  >   and  = 1, . . .,  with ( → ) = 0 for all states .Here, the variables ( → ) ≥ 0 are transition rates that describe the rate of transitions (number of transitions per unit time) from states  to states .We will use the convention ( → ) = 0 for all states .Note that as opposed to the conditional probabilities (⋅) occurring in a Markov chain, the rates (⋅) of the master equation are not normalized.
Transition rates may depend explicitly on time such that ( → , ).For strongly nonlinear stochastic processes transition rates do not only depend on the states  and  and on time but also depend on an expectation value or the occupation probabilities (, ).Note that ( 17) is the integral form of the master equation.If the integral is evaluated for a small time interval Δ =  −   → 0, then the integral version exhibits the structure of a Markov chain (7) with which can be written in a more concise form like Note again that we will use the convention that the diagonal elements ( → ) of the transition rates equal zero, which implies that for the diagonal elements ( → ) only the second term in (19) matters as stated in (18).For the sake of completeness, note that the differential version of the master equation ( 17 Just as in Section 2.2, in order to describe the four classes of stochastic processes listed in Table 1 on an equal footing, we consider an alternative description of the process defined by (20).Accordingly, we introduce the flow function Φ(, ) that maps initial states (0) defined on the integer set 1, . . .,  to states (): The variable  is a random variable that is uniformly distributed on the interval [0, 1] at every time point  and is statistically independently distributed across time.In view of the fact that the time-discrete version of the master equation exhibits the structure of a Markov chain, trajectories defined by the flow function can be regarded as time-continuous counterparts to the trajectories of Markov chains as defined by (1), and ( 8)- (11).This analogy can be exploited to construct iteratively the flow function Φ in the limiting case Δ → 0 like Φ :  ( + Δ) =  ( () , , Δ,  () ,  ()) , The iterative map (⋅) involves the indicator function  ( → , , , Δ, ) = { 1 for  ∈ [ (,  − 1) ,  (, )) 0 otherwise (24) and the cumulative transition probabilities Equations ( 1), (18), and ( 22)-( 25) provide a closed approximate description of the trajectories of interest.By definition, the description becomes exact in the limiting case of infinitely small time intervals Δ.Note that the conditional (transition) probabilities (⋅) defined by (18) are in the interval [0, 1] provided that Δ is chosen sufficiently small.In the limiting case Δ → 0 this is always the case assuming that the rates (⋅) are bounded from above.Note also that the probabilities (⋅) defined by ( 18) are normalized to 1. Furthermore, from (18), and ( 22)-( 25) it follows that (⋅) =  holds for Δ = 0.That is, the iterative map (22) describes transitions from states () =  to states ( + Δ) in the small time interval Δ.Most importantly, according to (22), the evolution of realizations depends on the occupation probability vector () at every time point .Therefore, processes defined by ( 1), (18), and ( 22)-( 25) belong to the class of strongly nonlinear stochastic processes (see the definition in Section 1.2).

Strongly Nonlinear Markov Diffusion
Just as in the case of the stochastic iterative maps discussed in Section 2.3, ( 26) and ( 27) exhibit two different components.
First, the vector-valued function ℎ(⋅) occurring in (26) and ( 27) is the drift function introduced in Section 2.3 that accounts for the deterministic evolution of a process.Second, the expressions  ⋅  and  ⋅ Γ, respectively, describe a vector-valued fluctuating force.The variable () is the dimensional random variable introduced in Section 2.3.The function Γ() in the Ito-Langevin equation ( 27) is the counterpart to the random vector () in (26) and corresponds to a so-called vector-valued Langevin force [4,6].More precisely, each component Γ  () is a Langevin force.We will consider Langevin forces normalized with respect to a magnitude of 2, like lim where Γ   () are realizations of the th component.The variable  is the  times  weight matrix introduced in Section 2.3.
The stochastic process defined by the Ito-Langevin equation (27) can equivalently be expressed by means of a strongly nonlinear Fokker-Planck equation: The evolution equation for (, ) is nonlinear for (, ). In For details, see Frank [9].A complete description of the stochastic process can be obtained either from the realizations defined by ( 26) and (27) or from the evolution equations ( 29) and (30) for (, ) and (,  | , ).The expression occurring in (29) and (30) is the diffusion coefficient of the process.
A fundamental example of a strongly nonlinear Markov diffusion process is a generalized Ornstein-Uhlenbeck process involving a mean field force ℎ MF proportional to the mean ⟨()⟩ of the process [9,15,16].For  = 1, the stochastic difference equation reads with  ≥ 0. Typically, the case ,  > 0 is considered.For (32) the strongly nonlinear Langevin equation reads The generalized Ornstein-Uhlenbeck process in its form as stochastic difference equation (32) corresponds to a special case of the generalized autoregressive model of order 1 defined by (16).Model (34) has been studied in the context of the Shimizu-Yamada model.We will return to this issue later.

On a General Stochastic Evolution
The additional variable  defines the characteristic jumping interval of the process under consideration and can be used to distinguish between time-discrete and time-continuous processes.For  = 1 we are dealing with time-discrete stochastic processes.In the case of processes defined on discrete state spaces (, ) is related to the occupation probability (, ) by means of ( 5) and ( 6) and (35) reduces to the special case of (8) of the Markov chain model.For processes defined on continuous state spaces and  = 1 the general form (35) becomes (13) in the case of stochastic processes driven by nonparametric white noise.In the limiting case  → 0 we are dealing with time-continuous stochastic processes.In this case, the function (⋅) possesses the property (⋅) = () for  = 0.For processes that evolve on discrete state spaces and satisfy a nonlinear master equation it follows that (35) becomes the iterative map (22) provided that the argument (, ) in (⋅) is expressed in terms of the occupation probability (, ); see (6).In contrast, for processes evolving on continuous state spaces and satisfying strongly nonlinear Ito-Langevin equations, the general form (35) reduces to the stochastic difference equation ( 26) when we consider the limiting case  → 0.

The Strongly Nonlinear Two-State Markov Chain Model.
A fundamental strongly nonlinear stochastic process is the two-state nonlinear Markov chain process [17].For the sake of convenience, the states  = 1 and  = 2 will be labeled with "" and ".
which is a closed nonlinear equation in (, ).Model (37) involves two functions only: the conditional probabilities ( → ) and ( → ) which can be chosen independently from each other and in the case of a nonlinear Markov process depend on (, ) as indicated in (37).

Social Sciences and Group Decision Making: Time-Discrete
Case.Group decision making has frequently been described by nonlinear stochastic processes [21][22][23][24].The reason for this is that under certain circumstances individuals are likely to be affected by the opinion that is presented by a group as a whole rather than by the opinion of individuals.This socalled group opinion in turn is produced by the opinions and decisions of the individual group members.Consequently, the decision making process of an individual is affected by the sample average obtained from the opinions of the others.
If the groups can be considered as being relatively large, the sample average may be replaced by the ensemble average.The decision making behavior of an individual is then considered as a realization of a stochastic process that is affected by the ensemble average of the process.In the context of voting behavior, the two-state model introduced in Section 3.1 was applied [17].The members of the US House of Representatives had the choice to vote for or against passing the so-called bailout bill (US House of Representatives, USA, 2008).The bill failed to pass in the September 29 vote.In a second vote, on October 3, the bill finally passed.That is, a critical number of "No" voters changed their opinions between September 29 and October 3.The states  = 1 and  = 2 were labeled by "" and "" representing "Yes" and "No" voters.According to model (37), the conditional probabilities ( → ) and ( → ) describing a change in the opinion from "No" to "Yes" and a consistent "Yes" opinion were considered.Using the notation of ( 37), the voting model proposed in Frank [18] reads The data available in the literature suggested that ( → ) = 1.For the ( → ) the function ( → ) = −⋅ (, ) was used to describe the impact of the group opinion on the voting behavior.This relationship assumes that the pressure to change from a "No" voter to a "Yes" voter was high immediately subsequent to the September 29 votes, when the probability (, ) at  = Sept 29 was not high enough to make the bill pass.In this context, it should be mentioned that the failure of the bill to pass on September 29 triggered a dramatic breakdown of the US stock market.That is, the members of the House of Representatives who had voted with "No" were not only subjected to political pressure but also experienced essential pressure from the economy.For ( → ) = 1 and ( → ) =  − (, ) the stationary occupation probability (, st) is given by (, st) = / such that (41) eventually reads The stationary value (, st) was estimated from the data.A detailed model-based analysis of the data showed that the members of the two main parties, the Democratic and Republican members, were differently affected by the group pressure.

Psychology and Human Memory.
A pioneering experiment in human memory research is the free recall experiment in which participants are asked to recall items of a particular category [25,26].For example, they are asked to recall animal names.Typically, participants are instructed to recall as many items as possible and to do so as fast as possible and without repetition.The total number of recalled items is by definition a monotonically increasing function of time.Strikingly, free recall involves clusters in which item subcategories are recalled more or less as a whole.For example, when recalling animal names a participant may recall a number of insect names in succession.From a theoretical point of view, at any given time point during the free recall process the ability to recall items in clusters depends on the size of the pool of items available for recalling.In other words, if there is a large pool of items that have not yet been recalled, then there is a high chance that a whole item subcategory can be recalled.In line with this argument, a stochastic model for free recall dynamics was developed in which the recall process depends on the size of the pool of items available for recall [27].More precisely, time was parceled in discrete time intervals in which recall attempts are executed.The aim was to develop a model for the probability that a single item has been or has not yet been recalled at a given time step .
To this end, the two-state model of Section 3.1 was applied.Accordingly, the states  = 1 and  = 2 correspond to being recalled or being not yet been recalled and are labeled by "" and ", " respectively.The two relevant conditional probabilities are ( → ) and ( → ).By the design of the experiment, we have ( → ) = 1.Moreover, as argued previously, if the pool of items that have not yet recalled is large, that is, if the probability () that an item has not yet been recalled is high, then the conditional probability ( → ) should be high as well.Likewise, if the probability () that an item has been recalled is high, then ( → ) should be relatively low.Therefore, it was suggested to put ( → ) =  ⋅ (1 − (, )) with  > 0. This function implies that the occupation probability (, ) converges to the stationary fixed (, st) = 1 at which ( → ) = 0.
The full model can be obtained by substituting ( → ) and ( → ) into (37) and reads Note that from a mathematical point of view, this model is a special case of model (42) for voting behavior (hint: put (, st) = 1 in ( 42)).Since the model describes the single item recall probability, the model must be scaled to the number of items recalled by any given participant.To this end, a parameter  describing the maximal number of items for recall was introduced.The unknown parameters  and  were estimated for free recall data collected in a study by Rhodes and Turvey [28], and the sample mean values  = 0.009 (SD = 0.005) and  = 307 (SD = 138) were found for a sample of eight participants [27].Note that (43) predicts that the fixed point (, st) = 1 will never be reached in a finite number of time steps.Consequently, the model predicts that the total number of recalled items  at the end of the experiment is smaller than the number of available items for recall.This prediction was confirmed by the data.Moreover, the model-based analysis revealed a statistically significant positive correlation between  and  for the participant sample studied in Frank and Rhodes [27].

Biology and Evolutionary Population Dynamics.
A primary concern of evolutionary population dynamics is the understanding of the origin of life.Among the many models that have been proposed in this context, the studies by Crutchfield and Görnerup [29] and Görnerup and Crutchfield [30] are of interest for our review.In these studies a number of  macromolecules with different functional properties were considered.These macromolecules compete with each other for survival during evolutionary time scales.However, not only competition but also cooperative behavior is taken into account.Roughly speaking, it is assumed that in the early stages of the development of life, macromolecules of different types existed and could change their types due to interactions with other macromolecules of the same or different types.Eventually, only a subset of functionally different types survived the selection process.Let  = 1, . . .,  denote the functionally different types of macromolecules and (, ) the occupation probability of the th type.Then, at every discrete reaction time step  the macromolecules can change their types such that transitions from type  to type  occur.This so-called metamodel can be formulated in terms of a nonlinear Markov chain model [31].Accordingly, the conditional (transition) probability ( → ) describes the probability of a transition from type  to type  given that a macromolecule is of type  at time step .As suggested by the studies of Crutchfield and Görnerup [29] and Görnerup and Crutchfield [30] interactions between two different types of macromolecules can be modeled by assuming that the evolution of occupation probabilities is affected by terms of the form (, )(, ).More precisely, in general, a macromolecule of type  may interact with a macromolecule of type  and turn into a macromolecule of type .It can be shown that from the metamodel it follows that ( → ) assumes the form [31]  ( → ) = In (44) the parameters () ≥ 0 are weight coefficients that describe the relevance of interactions between macromolecules of different types.Note that the conditional probabilities ( → ) satisfy the normalization condition that the sum over all probabilities with index  equals 1.This normalization can be guaranteed by requiring that the sum over all weight coefficients  with index  equals 1. Substituting ( 44) into (7), the metamodel for evolutionary population dynamics can be cast into a nonlinear Markov chain model and reads It has been shown that the nonlinear Markov chain model ( 45) is a useful tool for investigating evolutionary population dynamics.First of all, trajectories that describe the evolution of individual macromolecules were computed numerically using (1), and ( 8)- (11); see Section 2.2.That is, the model allows for generating temporal sequences of type changes for individual macromolecules.Second, it was shown that the degree to which a type  is attractive can be quantified.
In particular, for evolutionary selection process that become stationary, the attractors of the surviving macromolecule types can be analyzed and distinctions between weakly and strongly attractive types can be made (e.g., see Figure 3 in [31]).

The Generalized Autoregressive Model of Order 1 as a
Strongly Nonlinear Markov Process.The generalized autoregressive model of order 1 defined by ( 16) was studied in Frank [13] as time-discrete version of the time-continuous strongly nonlinear Shimizu-Yamada model; see Section 6.
Let us discuss this generalized autoregressive model in more detail.To make this discussion more self-consistent, ( 16) is presented here once again as (46).Since the random variable  exhibits a normal distribution at any time step , the conditional probability density can be calculated that describes the distribution of  at time  + 1 given the value of  =  at the previous time step  and the probability density (, ) of  at time step .Assuming that the variance of  equals 2, the solution reads As indicated in (47) the conditional probability depends only on the first moments (i.e., the mean value) of the probability density (, ).By means of the conditional probability density (,  + 1) can be computed from (, ) like From (46) it follows that the mean value  1 () = ⟨()⟩ evolves like For 0 <  +  < 1 the mean value is an exponentially decaying function.For −1 <  +  < 0 the mean value alternate between negative and positive values but decays in amplitude monotonically.In summary, for −1 <  +  < 1 the mean value converges to zero in the limiting case  → ∞.
A detailed calculation shows that the second moment  2 = ⟨ 2 ()⟩ evolves like For −1 <  +  < 1 and −1 <  < 1 the first moment converges to zero which implies that the second moment converges to a stationary value given by  2 (st) = 2  (40).Then, for −1 <  +  < 1 and −1 <  < 1 the stationary variance of  is given by Not surprisingly, ( 51) is just the expression for the variance of the ordinary stationary autoregressive processes of order 1.Moreover, for normally distributed initial values (1), the probability density (, ) satisfies a normal distribution for all times .This follows from ( 47) and (48) or alternatively from the linearity of (46).Consequently, for −1 <  +  < 1 and −1 <  < 1 the probability density (, ) converges in this case to a normal distribution with zero mean value and variance given by ( 51).In the stationary case, the correlation function Corr() = ⟨()( − )⟩/ Var() for  ≥ 0 has exactly the same form as the correlation function of the ordinary autoregressive process of order 1 because the mean value  1 equals zero and drops out of (46): In conclusion, for −1 <  +  < 1 and −1 <  < 1 the generalized autoregressive process (46) behaves in the stationary case exactly as the ordinary autoregressive process (i.e., the process with  = 0).The two processes however exhibit different transient characteristic properties.

The Generalized Deterministic Autoregressive Process of Order 1 and the Sensitivity of Strongly Nonlinear Processes to
Initial Conditions.In the deterministic case, we put  = 0 in (13) such that Let us consider the case  = 1 and assume that the drift function ℎ(⋅) depends only linearly on the state ().If, in addition, ℎ(⋅) does not explicitly depend on time and [⋅] is a map (functional or linear operator) that maps (, ) to a real number, we arrive at the model In (54) we simplified the notation by introducing the operator (⋅) that maps (, ) to a real number (e.g.,  calculates expectation values of  and then uses them as arguments in a function).Equation ( 54) may be considered as a generalized deterministic autoregressive model of order 1.Model (54) was studied in detail in Frank [32].A striking feature of the model is that the stationary probability density (, st) if it exists depends crucially on the initial probability density (,  = 1).For sake of readability, let us put () = (,  = 1).Then, it can be shown that (, st) if it exists is a rescaled function of (⋅) like [32]  (, st with the scaling parameter In other words, the strongly nonlinear process (54) might exhibit infinitely many stationary probability densities.This was demonstrated more explicitly for a specific form of (⋅) which will be discussed next.

Economics and the Emergence of Stationary Volatilities
as a Group Dynamics Process.In economics the standard deviation or the variance of a price of an asset is a measure for how risky the assist is.That is, the variability measures reflect the beliefs of traders how risky it is to hold the asset.Zero variability reflects a risk-free asset.In this context, the variability of a price is also called the price volatility.Volatility is observed by traders and informs individual traders how risky the market as a whole considers a particular asset.On the other hand, the market is composed of individual traders.An autoregressive model of the form ( 54) can be used to model such a circular relationship.To this end, we assume that the operator (⋅) calculates the variance of .In this case, the evolution of  is determined by the variance of , on the one hand.On the other hand, the variance by definition is calculated from the realization of .Let us consider the following special case of (54) (for details see [32]): with ,  > 0. It can be shown that for  < 2/ the process exhibits stable but not asymptotically stationary probability densities.The shape of these probability densities depends on the initial distributions as discussed in Section 4.2.However, the variance of all stable stationary distributions equals .This is intuitively clear, when we realize that for Var() = , (57) becomes the identity ( + 1) = ().Model (57) also exhibits unstable stationary probability densities.One of these unstable solutions is the Dirac delta function () = ().It can be shown that any process converges to a stationary process with variance  provided that the initial probability density of the process has a variance Var() smaller than a certain critical value.For example, if the Dirac delta function is perturbed slightly such that the variance becomes finite but is still close to zero, then the process will not converge back to the Dirac delta function.Rather, the process will converge to a stationary process with variance equal to .We distinguished previously between stable and asymptotically stable probability densities.Stable but not asymptotically stable stationary solutions have also been called marginally stable stationary solutions.Recently, research has been focused on the importance of marginally stabile stationary solutions for biochemical and biological systems [33][34][35].In the context of strongly nonlinear processes, stable but not asymptotically stable probability densities or alternatively marginally stable probability densities refer to processes that exhibit a family of stationary probability densities with two properties [9].First, by an infinitely small perturbation a particular stationary probability density can be transformed into another stationary probability density of the family of probability densities.Second, the family of probability densities as a whole acts as an attractor.That is, for any perturbation that does not transform a member of the family into another member of the family, the perturbed stationary probability density eventually converges to one of the members of the family of marginally stable probability densities.A simple example can be given for model (57).As mentioned previously, for the marginally stable probability densities with Var() = , (57) reads ( + 1) = ().A shift of the coordinate system is consistent with the identity ( + 1) = () and does not affect the variance.Therefore, any stationary probability density with Var() =  can be shifted along the -axis by an infinitely small amount to give rise to another stationary probability density.Clearly, this kind of perturbation will not decay.
In closing the discussion on model (54), let us return to the interpretation of the model as a model for volatility dynamics.The model predicts that the emergence of a stationary volatility in the market is not guaranteed but depends on market parameters (here the inequality  < 2/ must be satisfied) and the initial variance.Furthermore, the model predicts that the price itself is not affected whether or not a stable, stationary volatility measure emerges.

Phase Synchronization of Inhomogeneous Ensemble of
Globally Coupled Oscillatory Units.Phase synchronization of coupled oscillatory subunits has frequently been studied by means of time-continuous strongly nonlinear stochastic model, see Section 6.However, also time-discrete models have been considered.Phase synchronization is a special case of synchronization when the amplitude dynamics is neglected.Accordingly, each oscillatory unit is described by a single real number representing a phase ().Let us envision each oscillatory unit as an oscillator evolving along a closed orbit in an appropriately defined state space.Then the phase () is the coordinate along that closed orbit.The phase () is a periodic variable.The oscillators are desynchronized if  has a uniform distribution on the domain of definition.If  exhibits a nonuniform distribution (in particular, a unimodal distribution), then the oscillators are partially synchronized.If  =  for all oscillators, then there is complete synchronization.
Let us consider an all-to-all coupling between the oscillators.In this case, the evolution of the phase () of an individual oscillator depends on the evolution of all other oscillator phases.If the set of oscillators is relatively large, the limit of infinitely many oscillators may be considered as a good approximation and the sample of oscillators may be replaced by a statistical ensemble.In this case, the phase dynamics of an individual oscillator depends on the expectation value of the statistical ensemble of oscillators.Moreover, any individual oscillator represents a realization of the ensemble.In summary, a closed description of the phase dynamics in terms of a strongly nonlinear stochastic process can be obtained.
In the deterministic case, we put  = 0 in (13) such that (13) becomes (53) mentioned in Section 4.2.Without loss of generality, the period can be put equal to unity such that  is defined in the interval [0, 1], and  = 0 and  = 1 indicate the same location on the closed orbit.In line with seminal works by Kuramoto [36], Daido [37] proposed an explicit form of (53) that can be written for individual realizations like The variable  is the order parameter of the oscillator system.For desynchronized oscillators (i.e., for a uniformly distributed ) we have  = 0.If  differs from zero, the oscillators are partially synchronized.In (58) the parameter  ≥ 0 describes the coupling strength between the oscillators.More precisely, it is the coupling between individual oscillators and the mean field force defined by ℎ MF () =  ⋅ sin(2)/(2) that acts on an individual oscillator and is generated by all oscillators.In (58) the parameter  denotes the phase velocity for the uncoupled oscillator.As indicated,  is taken from a distribution and differs among the oscillators.Therefore, model ( 58) describes an inhomogeneous ensemble of globally coupled oscillatory units.Model ( 58) is a useful model for studying the emergence of phase synchronization from the perspective of nonequilibrium phase transitions.At a critical value of the coupling parameter , a nonequilibrium phase transition occurs and the probability density () bifurcates from a uniform distribution to a nonuniform distribution indicating that the ensemble makes a transition from a desynchronized state to a partially synchronized state [37].For Lorentzian distributed phase velocities , an analytical expression of the critical coupling parameter can be found [37].Similar aspects of time-discrete models for synchronizations have been studied in Daido [38,39] and by Teramae and Kuramoto [40].
Following more recent work [32], the conditional probability density of the phase synchronization model (58) in terms of the so-called Frobenius-Perron operator involving the Dirac delta function (⋅) can be determined and reads The modulus-1 operator may be eliminated by casting (59) into the form The conditional probability density holds for a particular unit with phase velocity .Let () describe the probability density of phase velocities.Then, (,  + 1) can at least formally be computed from (, ) and () via the integral equation Accordingly, stationary solutions (, st) satisfy The uniform distribution () = 1 satisfies (62) for any coupling parameter  ≥ 0 and consequently is a stationary probability density.However, for oscillator systems with Lorentzian velocity distributions (), the uniform distribution is an unstable stationary probability density if the coupling parameter exceeds the critical value which can be shown by numerical simulations [37].

Time-Continuous Strongly Nonlinear Stochastic Processes on Discrete State Spaces
Strongly nonlinear stochastic processes evolving on discrete state spaces but exhibiting a continuous time variable have been studied in the context of nonlinear master equations as introduced in Section 2.4; that is, in the context of Markov processes.Frequently, nonlinear master equations have been studied as a tool to derive nonlinear Fokker-Planck equations (see, e.g., [14,[41][42][43][44]).However, nonlinear master equations have also been studied in their own merit (see, e.g., [45]).

Nonlinear Master Equations as a Tool for Identifying the Generic Forms of Nonlinear Fokker-Planck Equations.
As mentioned in Section 2.4, transition rates of master equations may depend on occupation probabilities.Curado and Nobre [14] considered only transition rates between neighboring states .In addition, the explicit time dependency of rates was neglected.In this case, (20) reads Note that there are only two types of transition rates.First, there are transition rates that describe the rate of transitions from a level to the next higher level ("upward rates").Second, there are the transition rates of transitions from a level to the next lower level ("down rates").Using (, up, ) = ( →  + 1, ) and (, down, ) = ( →  − 1, ), we can rewrite (63) as Furthermore, Curado and Nobre [14] assumed that the states  represent positions on a one-dimensional grid with spacing Δ.They assumed that the transition rates depend on the spacing Δ but also on the occupation probabilities (, ).
More precisely, the model involves the following up-and downrates: In the limiting case of an infinitely small grid (i.e., for Δ → 0), the master equation ( 64) with (65) behaves like the nonlinear Fokker-Planck equation of the form (29).The diffusion term of that Fokker-Planck equation is proportional to the probability density   .This model, in turn, is a standard model for diffusion in porous media [46,47]; see Section 6.

Strongly Discrete-Valued Nonlinear Stochastic Processes Maximizing Generalized Entropy Measures.
A key property of stochastic processes described by ordinary master equations such as (20) with rates ( → ) independent of the occupation probabilities () is that the processes approach stationarity in the long time limit [48].More precisely, processes evolve such that the Kullback distance measure [49] is a monotonically decreasing function of time and approaches a stationary value.The stationarity of the Kullback distance measure indicates that the process under consideration has become stationary.The Kullback distance measure is defined on the basis of the Boltzmann-Gibbs-Shannon entropy, which is a fundamental entropy measure not only for equilibrium systems [50] but also for nonequilibrium systems [51].It was suggested to exploit this link between the Boltzmann-Gibbs-Shannon entropy, the Kullback distance measure, and the ordinary master equation to define nonlinear master equations based on generalized entropy and generalized Kullback distance measures [9,45].Accordingly, entropy measures  of the form ISRN Mathematical Physics 13 are considered that involve a kernel function (⋅).Furthermore, it is assumed that the kernel (⋅) satisfies certain properties.The second derivative of the kernel () with respect to  is negative for any argument  in the interval [0, 1] which implies that the entropy () is concave [52].In addition, the first derivative of the kernel () with respect to  in the limiting case  → 0 goes to plus infinity (i.e., exhibits a singularity).This property is important for the master equation that will be defined in the following and guarantees that occupation probabilities () solving the master equation do not become negative.The following master equation has been proposed [45]: The nonlinearity in the master equation ( 67) is conceptually different from the nonlinearity in the master equation (20).While in (20) it is assumed that the transition rates depend on occupation probabilities, in (67) the transition rates ( → ) are independent of the occupation probabilities.However, transitions are not proportional to the occupation probabilities () as in (20).Rather, they are proportional to the terms (()), whose explicit form depends on the entropy kernel (⋅).In view of this property, transitions are entropy driven.It can be shown that stochastic processes satisfying (67) converge to occupation probabilities that maximize the entropy measures .Therefore, we may say that the transitions of processes defined by ( 67) are proportional to an entropy drive (⋅) that maximizes the entropy  under consideration.Note that for the special case of the Boltzmann-Gibbs-Shannon entropy the function  reduces to the identity () = , which implies that (67) includes the ordinary, linear master equation as special case.
In closing our considerations on the nonlinear master equation (67), it is useful to note that formally (67) can be cast into the form of the nonlinear master equation ( 20 for (, ) > 0, then model (67) assumes the form of (20).Note that in the limiting case  → 0, we have  → 0 because of the aforementioned assumption that ()/ → ∞ holds for  → 0. This, in turn, implies that ( → , , (, )) for (, ) = 0 can be defined as the product of ( → , ) and the limiting case "0/0, " where the ratio "0/0" has to evaluated for any given entropy kernel ().The relation ( 68) is useful because it can be used to compute realizations of the nonlinear master equation ( 67) from ( 1), (18), and ( 22)-(25).

Physiological, Partitioned (Discrete-Valued) Signals and Strongly Nonlinear Stochastic Processes Exhibiting Stationary
Cut-Off Distributions.It has been argued that any probability density with tails that extend to infinity fails to capture the boundedness of physiological signals [53].In other words, while probability densities with tails that extend to infinity such as the normal distribution are crucially important for developing theory and for modeling and are frequently good approximations, they imply that signals can assume arbitrarily large values in magnitude.This is not plausible.Physiological signals such as muscle force produced by human and animals, limb velocities, limb accelerations, blood flow velocities, heart rates, electroencephalographic recorded brain signals, pulse rates of neurons, and dendritic currents are bounded and cannot exceed certain maximum values.Taking an information theoretical point of view [51], stochastic processes describing physiological signals may maximize information or entropy measures.However, they cannot maximize the Boltzmann-Gibbs-Shannon measure under ordinary constraints because this measure yields normal distributions or other distributions with tails that extend to infinity.In contrast, physiological signals may maximize generalized information and entropy measures that exhibit the so-called cut-off distributions as maximum entropy distributions.This argument was developed for stochastic processes defined on continuous state spaces [53] but holds for processes evolving on discrete-state spaces as well.In particular, signals from sensory systems are known to be partitioned in a certain way.Differences between two magnitudes can only be detected if they exceed certain thresholds.For this reason, modeling of sensory physiological signals and physiological signals in general may assume that the state space of consideration is of discrete nature.
In Frank [45] the coordination of rhythmic movements of two limbs has been addressed exploiting the master equation model (67).It has been assumed that the physiological relevant variable, namely, the relative phase between the limbs, is appropriately described on a partitioned (i.e., discretevalue) state space.Furthermore, it has been assumed that the coordination dynamics corresponds to a stochastic process maximizing an entropy measure that exhibits a maximum entropy cut-off distribution.Out of all possible entropy measures, the entropy measure, nowadays called Tsallis entropy, has been used for that purpose [54][55][56].The model can be considered as a modified version of the seminal coordination model by Haken et al. [57].The benefits of the nonlinear master equation model (67) relative to the original Haken-Kelso-Bunz model are twofold.First, model (67) offers a consistent approach that addresses physiological data within the framework of cut-off distributions.Second, the model predicts that coordination is bistable in a distributional sense.That is, under certain conditions the model exhibits two stationary distributions (, st).This feature is in line with experimental observations.For human coordination models with continuous state spaces similar attempts have been made to deal with bistability in a distributional sense [58,59].

Time-Continuous Strongly Nonlinear Stochastic Processes on Continuous State Spaces
There is a relatively large literature on time-continuous strongly nonlinear stochastic processes defined on continuous state spaces.Most of these studies are concerned with the Markov diffusion processes.Reviews can be found in Frank [9,60].Strongly nonlinear phase synchronization models based on the Kuramoto model [36] are reviewed in Acebrón et al. [61].In this section, some of applications in physics and the life sciences will be highlighted without going into much detail.
6.1.Chaos in Probability: Time-Continuous Case.In Section 3.2 strongly nonlinear time-discrete stochastic processes were considered that exhibit occupation probabilities that evolve in a chaotic fashion.As an example, the logistic map model defined by (39) was discussed.A key feature of this model and of similar models is that the chaotic behavior arises from the coupling of the dynamics of the realizations to the occupation probabilities.In particular, without this coupling equation (39) reads (,  + 1) =  0 = /4 with fixed parameters  taken from the interval [0, 4] and clearly does not exhibit a chaotic solution.Chaos is due to the fact that transition probability  0 = /4 is replaced by a transition probability that depends on the process occupation probabilities like  0 = ⋅(, )(1−(, )).That is, probability measures of strongly nonlinear processes can exhibit chaotic behavior due to the very nature of strongly nonlinear processes, which is the influence of a probability measure on realizations from which the probability measure is calculated.Such a circular causality structure may be realized in many-body systems composed of interacting subsystems.In such systems, chaos may emerge due to the coupling of the single subsystem dynamics to the relative frequency with which states are occupied by subsystems.Chaos emerges even if the isolated subsystem dynamics is not chaotic [18].Consequently, chaos may be considered as a self-organization phenomenon.Recall that self-organizing phenomena in general are emerging properties of many-body systems that do not exist on the level of the subsystems [62].
In our context, the many-body system as a whole behaves qualitatively differently from the individual, isolated parts.The catchphrases "the whole is different from the sum of its parts" or "more is different" apply [63].This feature of strongly nonlinear stochastic processes has also been demonstrated for time-continuous models involving continuous state spaces [64][65][66][67].Subsystems described by ordinary stochastic differential equations that do not exhibit chaotic solutions have been coupled together and the limiting case of a many-body system composed of infinitely many subsystems has been considered.The limiting case models were described by strongly nonlinear stochastic processes in the sense defined in Section 1.2.In particular, in these studies the dynamics of realization was coupled to certain expectation values of the respective processes.It was found that the expectation values under appropriate conditions exhibit a chaotic dynamics.

Plasma Physics and the Landau Form of Strongly Nonlinear Fokker-Planck Equations.
Plasma physics is concerned with the evolution of many-particle systems composed of charged particles.The particles can interact with each other in various ways.Two interaction forms are particle-particle collisions and Coulomb interaction.The Landau form of the Fokker-Planck equation accounts for these two interactions and describes the evolution of the particle density in the three-dimensional velocity space.That is, let V = (V 1 , V 2 , V 3 ) denote the velocity vector of a particle.Let (V, ) denote the particle mass density at time .Assuming that particles are neither created nor destroyed, the total mass  is invariant over time.The particle probability density (V, ) = (V, )/ can be introduced.According to the Landau form, the probability density (V, ) satisfies a strongly nonlinear Fokker-Planck equation of the form (29).More precisely, the Landau form reads [68][69][70][71][72][73][74] Stationary solutions (V, st) of ( 69) have been studied, for example, by Soler et al. [75].In addition, several studies have been concerned with the derivation of transient solutions (V, ) using different (semi)numerical approaches [74,[76][77][78][79].

Astrophysics: Stellar Cut-Off Mass Distributions Defined by Strongly Nonlinear Stochastic Models.
Astrophysical mass distributions may exhibit relative sharp boundaries.More precisely, particle distributions in the stellar space may be spatially bounded due to gravity.The gravitational forces are produced by the mass distributions themselves.That is, the dynamics of an individual particle of a mass distribution is affected by the particle distribution as a whole.In turn, the distribution is composed of its individual particles.In view of this circular causality, it does not come as a surprise that strongly nonlinear stochastic models have been investigated that describe cut-off distributions of stellar particles [80][81][82][83][84].The models typically assume the form of the strongly nonlinear Fokker-Planck equation (29).Two more comments may be appropriate.First, in Section 5.3 we addressed cut-off distributions in the context of physiological signals.However, the motivation in Section 5.3 for considering cut-off distributions was quite different from the argument used in astrophysical applications.Second, in addition to the aforementioned studies on strongly nonlinear processes describing astrophysical cutoff mass distribution, strongly nonlinear stochastic processes satisfying the Landau form ( 69) have also been used for studying astrophysical problems [85,86].

Accelerator Physics: Shortening of Bunch-Particle Distribution of Particle Beams.
Particles in accelerator beams can travel in localized cluster of particles.In a longitudinal beam, the so-called bunch-particle distribution is a mass density that describes the distribution of particles with respect to the center of the cluster.Let  1 denote relative position of a particle with respect to the bunch center.Typically, the particle dynamics also depends on the particle energy.That is, beam particle dynamics involves a second coordinate  2 which is a rescaled energy measure (rather than a velocity or momentum variable).Dividing the mass density with the total mass, the bunch-particle distribution becomes a probability density (, ) of the two-dimensional vector  = ( 1 ,  2 ).Particles in accelerator beams are charged particles that are accelerated by means of electromagnetic forces.Although particles may interact with each other by means of Coulomb forces, the crucial force that determines the shape and length of a cluster is the so-called wake-field [87,88].The wake-field is an electromagnetic field generated by the whole bunch of particles that travels ahead of the bunch but is reflected at the accelerator walls and then acts back on the particles of the bunch.The wake-field can cause a shortening of the particle bunch.That is, under the impact of the wakefield the cluster is smaller in size than it would be theoretically in the absence of the wake-field.The understanding of bunch shortening is an important step towards an understanding of beam instabilities that should be avoided.
The dynamics of bunch particles is typically described by strongly nonlinear Markov diffusion processes.The reason for this is that the dynamics of individual particles is affected by the wake-field which can be calculated from an appropriate expectation value of the bunch particle probability density (, ).As a result, in various studies it has been proposed that (, ) satisfies a strongly nonlinear Fokker-Planck equation of the form (29) (see, e.g., [89][90][91][92][93][94][95][96][97][98]).A useful model, in particular for the purpose of theory development, is the Haissinski model [95,96,[99][100][101][102][103][104] for which analytical solutions of the reduced density ( 1 ) can be derived.In particular, according to the Haissinski model, the dynamics of single particles satisfies a strongly nonlinear Langevin equation ( 27), which reads [100] where  > 0 describes a phenomenological damping constant.For the Haissinski model the wake-field function   assumes the form of a Dirac delta function:   () =  ⋅ ().The parameter  measures the strength of the impact of the wake-field.For  < 0 the width of the reduced probability density ( 1 ) becomes smaller when increasing the magnitude || of the impact of the wake-field.In doing so, model ( 70) provides a dynamic account for the squeezing effect of wake-fields on particle clusters traveling in accelerator beams.

Physics of Liquid Crystal Phase Transitions from the
Perspective of Strongly Nonlinear Stochastic Processes.Liquid crystals typically consist of rod-shape macromolecules that interact with each other by means of weak Van-der-Waals forces.Due to this interaction, molecules can align themselves such that the molecules point with their primary axes more or less in the same direction.The physical properties of a crystal with aligned molecules are qualitatively different from the properties that the crystal exhibits when the macromolecules are isotropically (randomly) distributed.Therefore, liquid crystals exhibit two phases: an ordered phase with molecules that are aligned at least to some degree, which is called the nematic phase, and a disorder phase with molecules pointing in random directions, which is called the isotrope phase.A phase transition from the isotropic to the nematic phase occurs when the temperature is decreased below a critical temperature [105][106][107][108].The degree of orientational order can be captured by the Maier-Saupe order parameter [102,[105][106][107][108][109][110][111].For the isotropic phase the order parameter equals zero.In the nematic phase, the Maier-Saupe order parameter is larger than zero.Stochastic models describing the emergence of orientational order (i.e., isotropic-nematic phase transitions of liquid crystals) exploit the notion that the order parameter itself expresses the magnitude of an overall force that acts on individual molecules and tends to align them with the mean orientation of the liquid crystal macromolecules.The model is based conceptually on the notions of circular causality and self-organization: individual molecules are affected by the order parameter and at the same time constitute the order parameter.Hess [112] as well as Doi and Edwards [113] have proposed a strongly nonlinear stochastic process describing the rotational Brownian motion of the molecules under the impact of the (self-generated) order parameter.The Hess-Doi-Edwards model exhibits the structure of a strongly nonlinear Fokker-Planck equation (29) and can be cast into the form of a strongly nonlinear Langevin equation (27) (see, e.g., [114]).Exploiting the concept of strongly nonlinear stochastic processes, the martingale for the Hess-Doi-Edwards model can be defined [60].
Transient solutions of the Hess-Doi-Edwards model may be computed from approximate coupled moment equations [115].Using Lyapunov's direct method [62,116], it can be shown that the ordered state exhibits an asymptotically stable probability density [114].Importantly, since stochastic models provide a dynamic account, it is possible to study response functions of liquid crystals.Transient responses describing the relaxation to equilibrium [117,118] as well as correlation functions and mean square displacement functions in the stationary case [114] can be determined at least semi-numerically.

ISRN Mathematical Physics
Beyond the application of the concept of strongly nonlinear stochastic processes to the Hess-Doi-Edwards Fokker-Planck equation, in general, strongly nonlinear stochastic models have turned out to be useful models in liquid polymer physics (see, e.g., [119][120][121]).

Classical Descriptions of Quantum Systems by means of Strongly Nonlinear Stochastic
Processes.The relaxation towards equilibrium of quantum mechanical Fermi and Bose systems can be modeled using a classical approach by means generalized Boltzmann equations that exhibit appropriately modified collision integrals [122][123][124][125]. Applying a diffusion approximation to these collision integrals, such quantum mechanical Boltzmann equations reduce to Fokker-Planck equations that are nonlinear with respect to their probability densities [122][123][124].In addition to this approach via the Boltzmann equation, alternative derivations of classical quantum mechanical Fokker-Planck equations that are nonlinear with respect to probability densities have been proposed [126][127][128][129][130][131][132][133].A key property of these models is that they exhibit stationary probability densities that correspond to Fermi or Bose distributions.Interpreting these models in terms of strongly nonlinear Fokker-Planck equations of form (29) allows us not only to consider the evolution of probability densities but also to examine predictions of semiclassical stochastic processes for quantum systems.For example, solutions of trajectories computed from the corresponding strongly nonlinear Langevin equations of form (27) have been studied [126,127].In particular, the relaxation of the free electron gas towards its stationary Fermi energy distribution can be determined by computing numerically individual electron trajectories in the relevant energy space [9,126].Moreover, martingales for quantum processes within this classical Langevin approach can be defined [60].
Finally, note that classical stochastic descriptions of Fermi and Bose systems do not necessarily involve strongly nonlinear stochastic processes.It has been shown that ordinary Fokker-Planck equations with appropriately defined drift functions ℎ(⋅) can also exhibit Fermi and Bose distributions as stationary probability densities (see, e.g., [134]).

Material Physics of Porous Media.
Gas particles and fluids diffusing through porous media satisfy a nonlinear diffusion equation, frequently called the porous media equation.In one spatial dimension, the porous medium equation for the particle mass density (, ) reads [3,9,46,47,135] with  > 0. Dividing the mass density (, ) by the total mass , (71) becomes an evolution equation for a probability density (, ) normalized to unity with  =  ⋅  (−1) , and assumes the form of the nonlinear Fokker-Planck equation (29).The connection between the porous medium equation and nonlinear master equations has been addressed in Section 5.1 (see the aforementioned).Equations ( 71) and ( 72) exhibit exact time-dependent solutions, frequently called Barenblatt-Pattle solutions [136,137].The parameter  captures the nonlinearity of the equation.For  = 1 (linear case) the porous medium equation reduces to the ordinary diffusion equations and the variance of the diffusion process (or the second moment of (, ) assuming a zero first moment) increases linearly with time .In contrast, for  > 1/3 and  ̸ = 1 (nonlinear case) the variance increases as a nonlinear function of  like   with  = 2/(1+) as can be shown by various methods [9,[138][139][140].
Interpreting (72) in terms of a strongly nonlinear Fokker-Planck equation ( 29), trajectories of realizations can be computed from the corresponding strongly nonlinear Langevin equation [138].For a generalized version of (72) involving a drift force such a Langevin equation approach has been exploited to derive analytical expressions of certain autocorrelation functions [141].
The porous medium equation in its form as a nonlinear Fokker-Planck equation ( 72) plays an important role in the theory development of nonextensive thermostatistics.As it turns out, a particular generalization of ( 72), the so-called Plastino-Plastino model, exhibits stationary solutions that maximize the nonextensive entropy proposed by Tsallis.We will return to this issue later.

Material Physics and the Liouville Dynamics of Many-
Body Systems with Interacting Subsystems.The particle distribution of a many-body system in the overdamped, deterministic case satisfies a Liouville equation.For many-body systems with interacting subsystems the Liouville equation involves an integral term describing the interaction force on a single subsystem.This interaction integral may be evaluated using a diffusion approximation.In doing so, the Liouville equation assumes the form of the nonlinear Fokker-Planck equation ( 29) when considering the probability density rather than the subsystem density.The nonlinearity occurs in the diffusion term.This approach has been developed in a series of studies by Andrade et al. [142] and Ribeiro et al. [143,144].For example, the distribution functions of interacting particles in dusty plasmas, interacting vortices, and interacting polymer chains in complex fluids have been studied within this framework.
Note that as mentioned previously the dynamics of the subsystems is deterministic.Nevertheless, the effective model for the subsystem probability density involves a diffusion term.That is, according to the effective, approximative model in terms of a nonlinear Fokker-Planck equation, due to the interaction between the subsystems the many-body system as a whole generates a fluctuating force that acts on the individual subsystems of the many-body system.

Nonextensive Physical Systems and the Plastino-Plastino
Model.Tsallis (Tsallis,[54,55]; see also Abe and Okamoto [56]) introduced in physics a generalized form of the Boltzmann-Gibbs-Shannon entropy, nowadays frequently called the Tsallis entropy.The Tsallis entropy exhibits a parameter .For  = 1 the entropy reduces to the Boltzmann-Gibbs-Shannon entropy capturing properties of extensive systems.
For  ̸ = 1 the Tsallis entropy describes nonextensive systems.A. R Plastino and A. Plastino [145] proposed a nonlinear Fokker-Planck equation of the form (29) that may be written like The model describes the probability density (, ) of a particle living in a one-dimensional continuous state space given by the whole real line.The term involving a force function ℎ() in ( 73) is linear with respect to the probability density (, ).The diffusion term of the model involves the parameter , which using the notation suggested by (73) corresponds to the parameter  of the Tsallis entropy (see Frank [9]).Note that in the original model a slightly different notation was proposed [145].A key feature of the Plastino-Plastino model ( 73) is that stationary probability densities (, st) of ( 73) maximize the Tsallis entropy.For  = 1, that is, in the special case in which the Tsallis entropy reduces to the Boltzmann-Gibbs-Shannon entropy, the model is linear with respect to the probability density (, ).For  ̸ = 1, that is, for the general case that the Tsallis entropy describes a nonextensive system, the diffusion term is nonlinear with respect to (, ).In view of these observations, the Plastino-Plastino model establishes a connection between nonextensivity and nonlinearity.
The Plastino-Plastino model ( 73) has been examined in various studies.In particular, it has frequently been pointed out that (73) may be considered as a generalized version of the porous medium model (72) for gas and fluid flows that are subjected to an additional driving force captured by the force function ℎ().
Exact analytical solutions for the time-dependent probability density (, ) are available in the case of a linear drift force [140,145,146].Trajectories describing the stochastic dynamics of individual particles can be computed when interpreting (73) as a strongly nonlinear Fokker-Planck equation by means of the corresponding strongly nonlinear Langevin equation [138,141,147].In this case, second order statistics measures such as autocorrelation functions can be determined [141] and martingales can be found [60].Exact time-dependent solutions have been derived for generalized versions of model ( 73) that account for source terms [148][149][150].The Kramers escape rate has been determined for processes described by the Plastino-Plastino model involving an appropriately defined drift force [151].The multivariate generalization of ( 73) with [139,152,153] and without [129,154] time-dependent coefficients has been considered.The Plastino-Plastino model ( 73) with explicit state-dependent diffusion coefficients () has been studied [153,155].The model has been generalized for the Sharma-Mittal entropy that involves two parameters and includes the Tsallis entropy as a special case [156].Interestingly, H-theorems have been found that show that transient solutions (, ) of (73) converge to stationary probabilities densities (, st) provided that ℎ() is chosen appropriately [122,[157][158][159][160][161][162][163][164].
6.10.Oscillatory Coupled Nonequilibrium Systems: Canonical-Dissipative Approach.Coupled oscillators with short-range interactions can be studied within the framework of strongly nonlinear stochastic processes [165].In this study isolated oscillators were modeled using the canonical-dissipative approach [166][167][168] that allows to exploit the concepts of Hamiltonian and Newtonian dynamics, on the one hand, and to address nonequilibrium systems such as self-excited limit cycle oscillators, on the other hand.
6.11.Phase Synchronization and Kuramoto Model: Time-Continuous Case.Synchronization is a phenomenon studied in various disciplines ranging from physics to the social sciences [169].In the context of strongly nonlinear stochastic processes, we are primarily concerned with the synchronization of a large ensemble of oscillatory subunits.Synchronization can be studied both in the phase space of the oscillators (e.g., the space spanned by position and momentum variables) and in reduced spaces.An example for the latter case was discussed already in Section 4.4: phase synchronization.In fact, we will focus in this section on the phenomenon of phase synchronization.
A benchmark model that describes the emergence of phase synchronization in a population of phase oscillators is the Kuramoto model [36].Accordingly, each oscillator is described by a phase  that evolves on a continuous time scale  and represents a periodic variable.Each oscillator is coupled to all remaining oscillators and the limiting case of a group of infinitely many oscillators is considered.The oscillators as a whole generate an order parameter, which is a complex-valued variable.The complex-valued order parameter has an amplitude, the so-called cluster amplitude , and an angle, the so-called cluster phase .Both cluster phase  and cluster amplitude  are measures defined in circular statistics (see, e.g., [36,170,171]).Mathematically speaking, for an oscillator phase  defined on the interval [0, 2] the complex order parameter  is defined by the expectation value where  is the imaginary unit (i.e., the square root of −1) and the cluster phase  is chosen such that  ≥ 0 in any case.Note that both  and  are time-dependent variables.
The order parameter  induces a force that acts on the individual phase oscillators.That is, the oscillators interact with each other indirectly via the self-generated order parameter.The population of phase oscillators can be regarded as a statistical ensemble.Accordingly, the order parameter can be computed as an expectation value from the probability density (, ), that is, the probability density of the phase of an arbitrarily chosen phase oscillator.Since the order parameter acts back on the realizations (individual phase oscillators), the model satisfies the definition of a strongly nonlinear stochastic process provided in Section 1.2.+ Γ () (75) and assumes the form (27).In (75), the parameter  ≥ 0 measures the strength of the force induced by the order parameter .
The uniform probability density  = 1/(2) describing a desynchronized oscillator population is a stationary solution for any parameter  because for  = 1/(2) the cluster amplitude  equals zero.However, only for  < 2 2 the uniform distribution is asymptotically stable.For  > 2 2 the uniform distribution is unstable meaning that for any initial condition that slightly deviates from the uniform probability density  = 1/(2) the strongly nonlinear stochastic process will not converge to a stationary process with uniform probability density.Rather, for  > 2 2 the Kuramoto model exhibits stable, stationary unimodal distributions [9,36].These unimodal probability densities indicate that the oscillator population is partially synchronized.Moreover, the order parameter amplitude  is larger than zero for these unimodal stationary probability densities.The unimodal probability densities are stable but not asymptotically stable (see, e.g., [9]).A shift of such a stationary probability density along the -axis transforms a member of the family of stable stationary probability densities into another member of that family.This feature becomes intuitively clear when we realize that the form of ( 75) is invariant against transformation  →  +  and  →  + , where  is an arbitrarily small real number.We may use the term "marginally" stable to characterize the stability of these stationary probability densities (see also Section 4.3).
The Kuramoto model has been reviewed in Strogatz [172] and Acebrón et al. [61].In particular, there exists an H-theorem of the Kuramoto model showing that transient probability densities converge to stationary ones [173].This H-theorem is related to a free energy approach to the Kuramoto model (Frank [174]; see also Frank [9]).
A key question in the context of the Kuramoto model concerns the bifurcation from the desynchronized state to the synchronized state for oscillator populations with inhomogeneous eigenfrequencies.In this case, (75) may be written for realizations   of the process like where   is the phase velocity of the th realization of the process.If we restrict our considerations to symmetric distributions of velocities and to symmetric initial probability densities, then the symmetry property remains conserved during the evolution of the process and the cluster phase can assume only one of the two values  = 0 or  = .Moreover, the order parameter  becomes a real-valued variable.It can be shown that in this case (76)  As mentioned previously, reviews for the Kuramoto model are available in the literature and the reader may consult these works for more details ( [61,172]; see also Tass [175] and Section 7.5 in [9]).
6.12.Biology: Population Dispersion and Population Dynamics.The spatial distribution of insects forming swarms may be described in terms of cut-off distributions.For example, the insect density () =  ⋅ [{1 −  ⋅ ||} + ] 2 has been proposed [176], where the coordinate x measures the location of an insect with respect to the center of the swarm and ,  > 0. The operator {⋅} + corresponds to the identify for positive arguments and equals zero for negative arguments (i.e., {} + =  for  > 0 and {} + = 0 otherwise).Normalizing  by the total mass , a stochastic model for the probability density () = ()/ needs to explain the emergence of a stationary insect cut-off probability density.In fact, to this end, a model similar to the Plastino-Plastino model ( 73) with an appropriate drift term was proposed [176,177]: with  > 0. The stationary probability (, st) =  ⋅ [{1−  ⋅ ||} + ] 2 is obtained for  = 0.5.However, exponents different from  = 0.5 have also been proposed [178,179].Moreover, descriptions of population dynamics in terms of nonlinear Fokker-Planck equations alternative to (78) have been considered in the literature as well ( [81,168]; see also the Shigesada-Teramoto model in Okubo and Levin, [176], Section 5.6).

Social Sciences and Group Decision
Making: Time-Continuous Case.Group decision making has already been addressed in Sections 3.3 and 4.3 in the context of timediscrete models.In a series of studies, Boster and coworkers developed the so-called linear discrepancy model of group decision making and compared predictions with experimental data [180][181][182].In particular, the model accounts for a key phenomenon in the social sciences: the risky shift.A risky shift occurs when people arrive at riskier decisions when discussing a given topic in a group than when making their own decisions individually.According to the linear discrepancy model due to discussions the opinion in favor or against a particular issue shifts by an amount that is proportional to the opinion that an individual holds and the opinion that is presented in the group by another individual.The linear discrepancy model predicts that a risky shift can be observed when the initial distribution (i.e., the prediscussion distribution) of opinions is skewed.Accordingly, during the discussion session, the opinion distribution tends to become more symmetric, which is accompanied with a shift of the mean value.Some of the mathematical properties of the linear discrepancy model have been studied in more detail within the framework of a strongly nonlinear stochastic process [183].Interestingly, the stationary, symmetric probability densities describing the distribution of opinions among the group members are only stable and not asymptotically stable.
In particular, a shift along the opinion axis transforms one member of the family of stable, stationary probability density into another member.In this regard, the group decision making model exhibits the same feature as the Kuramoto model.

Finance and Option Pricing.
A stochastic option pricing model has been developed on the basis of the Plastino-Plastino model (73).In particular, the option pricing model exhibits a diffusion term similar to the one of the Plastino-Plastino model [184][185][186].A particular benefit of the model is that it features non-Gaussian distributions.This is due to the nonlinearity of the diffusion term (or the nonextensivity of the Tsallis entropy measure involved in the definition of the process).In fact, non-Gaussian distributions frequently fit financial data better than Gaussian distributions.

Economics and Modeling the Dynamics of Target Leverage
Ratios.The total capital (firm value) of a company is typically composed of borrowed money and equity.The leverage of a company or the leverage ratio is the ratio of borrowed money relative to equity.A company needs to decide what leverage ratio the company should have.The decision is not necessarily made once and for all.Rather, the desired leverage ratio (i.e., the target leverage ratio) may be updated dynamically depending on the internal and external circumstances.For this reason, the leverage ratios of companies frequently correspond to dynamic variables subjected to fluctuations.Lo and Hui [187] proposed a strongly nonlinear stochastic model for the dynamics of the leverage ratio.The model is based on a model developed earlier by Hui et al. [188] that involved an explicitly time-dependent function.In the strongly nonlinear stochastic model, this function is eliminated, and in this sense the strongly nonlinear stochastic model can be regarded as being closed.In order to achieve this closure, Lo and Hui [187] assumed that the leverage dynamics of a company is affected by the leverage ratios of similar companies.Within the framework of strongly nonlinear stochastic processes this implies that the leverage ratio dynamics of companies depends on an expectation value of the leverage ratio process.Formally, the model by Lo and Hui is closely related to the Shimizu-Yamada model that will be discussed later.

Engineering Sciences:
Generators for Noise Sources.In engineering sciences and related disciplines a common problem is to simulate numerically signals of noise sources.These signals can then be used to test the response of engineered systems to random signals emerging from noise sources.A characteristic feature of noise sources is that they are stationary.In view of this property, strongly nonlinear stochastic processes can be defined which for any initial probability density are stationary [147,189,190].For example, the strongly nonlinear Langevin equation with , ,  > 0, corresponds to a nonlinear Fokker-Planck equation of the form (29) whose right-hand side is by definition equal to zero [9,147].Consequently, the Langevin equation defines for any initial probability density (,  = 0) a stochastic process with a stationary probability density ().The parameters , ,  can be chosen in order to select a desired correlation time of the process.
6.17.Further Benchmark Models 6.17.1.The Shimizu-Yamada Model.The Shimizu-Yamada model is motivated by research on muscle contraction (Shimizu [191], and Shimizu and Yamada [192]; see also Section 5.5.4 in [9]).The Shimizu-Yamada model corresponds to the generalized Ornstein-Uhlenbeck process that is defined by (33) and involves a mean-field force proportional to the mean value of the process.The Shimizu-Yamada model is highly relevant for the theory development of strongly nonlinear Markov diffusion processes because it can be solved exactly.That is, exact transient solutions can be found and exact solutions for the conditional probability density (,  | , , ⟨()⟩) are available.The conditional probability density, in turn, can be used to calculate all correlation functions of interest both in the transient and in the stationary cases.For details, see Frank [9,193].Since the model exhibits exact solutions, it has also been used to test the accuracy of numerical algorithms for solving nonlinear Markov diffusion processes [16].
6.17.2.The Desai-Zwanzig Model.The Desai-Zwanzig model involves a mean-field force proportional to the mean value of the process just as the Shimizu-Yamada model.However, the individual, isolated subsystems are assumed to perform an overdamped motion in a double-well potential [194,195].for a stochastic process defined on the whole real line and , ,  > 0. The term  −  3 in (80) describes the potential force derived from the aforementioned double-well potential.The term −( − ) describes the mean-field force proportional to the mean value of the process.The force attracts the realizations of the process towards the mean value of the process.The parameter  measures the strength of the mean-field force.The model is symmetric with respect to  = 0.In view of this observation it is intuitively clear that the model exhibits a symmetric stationary probability density with  = 0 for all parameters .It can be shown that for parameters  smaller than a certain critical value this symmetric probability density is asymptotically stable and is the only stationary probability density.However, when  exceeds the critical value, the symmetric stationary probability density becomes unstable.Two asymptotically stable, stationary probability densities emerge with mean values  =  and  = −, where  > 0 [9,194,196].The mean value M has typically been considered as order parameter of a many-body system described by the strongly nonlinear stochastic process (80).For  = 0 the manybody system exhibits a disordered state.For  ̸ = 0 the system exhibits some order.Therefore, the Desai-Zwanzig model describes an order-disorder phase transition.
The special importance of the Desai-Zwanzig model is its feature that provides a fundamental stochastic model for an order-disorder phase transition.The Desai-Zwanzig model and the Kuramoto model may be viewed as the two key models in this regard.Both models are able to capture order-disorder phase transitions.While the Kuramoto model is a prototype system for many-body systems with state variables subjected to periodic boundary conditions, the Desai-Zwanzig model is a prototype system for many-body systems featuring state variables satisfying natural boundary conditions.Moreover, the Desai-Zwanzig model has played a crucial role in the development of the H-theorem for strongly nonlinear Fokker-Planck equations (we will return to this issue later).
Various studies have addressed at least to some extent the Desai-Zwanzig model.This can be shown by interpreting the Desai-Zwanzig model in the context of the free energy approach to nonlinear Fokker-Planck equations [9].Accordingly, the Desai-Zwanzig model does not only involve the mean value M of the process but also the process variance  = ⟨ 2 ⟩− 2 .In order to see this, we need to take advantage of variational calculus.Let / denote the variational derivative of  with respect to the probability density ().A detailed calculation shows that Exploiting this result, the free energy  can be defined like where  denotes the Boltzmann-Gibbs-Shannon entropy of the probability density ().The potential () denotes the double-well potential () = − 2 /2 +  4 /4.The strongly nonlinear Fokker-Planck equation ( 80) can then equivalently be expressed by [9,194,196] where / is the variational derivative of the free energy  with respect to the probability density (, ).According to (82) and ( 83) the Desai-Zwanzig model involves the variance  in the free energy.It has been suggested to call strongly nonlinear stochastic processes "variance models" or " models" provided that they involve the variational derivatives of the variance (i.e., they involve a coupling to the process mean value).A minireview of the Desai-Zwanzig models and related "variance models" or " models" can be found in Frank [9].Finally, note that the Shimizu-Yamada model discussed in Section 6.17.1, that is, the generalized Ornstein-Uhlenbeck model (33), belongs to the class of "variance models" as well.

Shiino's H-Theorem for Nonlinear Fokker-Planck Equations.
As mentioned in Section 6.17.2, the Desai-Zwanzig model played a crucial role in the theory development of the H-theorem for strongly nonlinear Fokker-Planck equations.
While it was well known that stochastic processes described by ordinary Fokker-Planck equations under appropriate circumstances converge to stationary processes in the long time limit, the situation in this regard was not clear for strongly nonlinear Markov diffusion processes described by strongly nonlinear Fokker-Planck equations.In physics, the proof of convergence to the stationary case is typically given in terms of a so-called H-theorem.In a seminal work, Shiino [196] provided an H-theorem for the Desai-Zwanzig model.This H-theorem was generalized and discussed in various studies [122, 157-164, 173, 197-201]; see also our comments in Section 6.8 and 6.10 for H-theorems related to the Plastino-Plastino model and the Kuramoto model, respectively.A selection of H-theorems can be found in Frank [9].Finally, note that one can show that not only the single time point probability density (, ) of a strongly nonlinear stochastic process converges to a stationary probability density.But, the whole process becomes stationary as well [202].
In the context of Shiino's work on the H-theorem, we would like to point out that the study by Shiino [196] also established a novel approach to conduct a stability analysis for nonlinear Fokker-Planck equation.A minireview of the stability analysis by Shiino can be found in Frank [9].

Mean Field Theory and Strongly Nonlinear Stochastic Processes
While from a mathematical point of view strongly nonlinear stochastic processes are well defined, the question arises to what extent strongly nonlinear stochastic processes describe physical systems.In other words, we may ask what kind of physical systems give rise to strongly nonlinear stochastic processes.An important class of physical systems that has been discussed in this context is the class of many-body systems that can be treated at least approximately with the help of mean-field theory (see, e.g., [36,175,195,203,204]).
The departure point is a many-body system composed of  subsystems.The subsystems are coupled with each other by means of an all-to-all coupling which involves an average over a function  of the state variables of the subsystems.The limiting case  → ∞ (the so-called thermodynamic limit) is considered next.In this limiting case, it is assumed that (i) the subsystems become statistically independent of each other and (ii) the average over  becomes an expectation value of  of an arbitrarily chosen representative subsystem.The function  frequently represents a component of a force or corresponds to a force itself.This force is called a mean-field force.A key problem in this regard is to provide a mathematical rigorous proof that in the limiting case  → ∞ the many-body system composed of  subsystems can be regarded as a statistical ensemble of realizations.That is, it is often a challenge to prove the convergence of an ordinary multivariate stochastic process to a strongly nonlinear stochastic process.In particular, the mathematical literature is concerned with this problem (see the following).An alternative bottom-up approach to strongly nonlinear stochastic processes uses as departure point a many-body system in the limiting case  → ∞.Again, the subsystems are assumed to be coupled by means of an average over a function  of the subsystem state variables.Furthermore, it is assumed that the subsystems of the many-body system are initially statistically independent and identically distributed.It can then be shown with relatively little effort that if an arbitrary finite subset of size  is considered, then the subsystems of this subset remain statistically independent at all times.The implication is that in the limiting case  → ∞ the subset converges to a statistical ensemble and the many-body system can be described by means of a strongly nonlinear stochastic process [9,202].

Strongly Nonlinear Stochastic Processes in the Mathematical Literature: Mean-Field Approach, H-Theorems, and Martingales
In the mathematical literature, nonlinear Markov processes have been discussed in the monograph by Kolokoltsov [205].
Besides the seminal work by McKean that opened the avenue to the novel research field on strongly nonlinear stochastic processes (see Sections 1.1 and 1.2), the mathematical literature on strongly nonlinear stochastic processes has traditionally been concerned with the convergence of a multivariate stochastic process to a strongly nonlinear stochastic process in the limiting case in which the system size goes to infinity [7,195,[206][207][208][209][210].That is, the argument advocated by meanfield theory presented in Section 7 has been examined in those studies from a mathematical perspective.A second line of inquiry concerns the convergence of processes towards stationarity.That is, the issue of the existence of H-theorems discussed in Section 6.17 has also attracted the attention of researchers in mathematics [197,[211][212][213][214]. Furthermore, the existence of martingales for strongly nonlinear Markov processes has been pointed out in the mathematical literature [7,208,209,[215][216][217][218][219][220] and has been taken from there to physics and the life sciences [60].
Strongly nonlinear stochastic processes have also been approached from two perspectives slightly different from the mean-field theory approach.Dawson et al. [221] approached strongly nonlinear stochastic processes from the perspective of measure-valued processes, whereas Stroock [222] provided a geometrical approach to the notion of strongly nonlinear Markov processes.
Finally, nonlinear Fokker-Planck equations have frequently been addressed in the mathematical literature in the context of semilinear and nonlinear parabolic partial differential equations.In this context, the reader may consult the books by Friedman [223,224] and Logan [225] and the works by Milstein and colleagues [226][227][228] on numerical solution methods for nonlinear parabolic partial differential equations.

Strongly Nonlinear Non-Markovian Processes
The examples reviewed in the previous sections belong to the class of Markov processes.The definition of strongly nonlinear stochastic processes given in Section 1.2 however does not imply that strongly nonlinear stochastic processes satisfy the Markov property.In fact, non-Markovian strongly nonlinear stochastic processes can be defined with little effort.For example, while the generalized autoregressive model of order 1 defined by ( 16 with   ̸ = 0 or   ̸ = 0, describes a strongly nonlinear stochastic non-Markovian process for  > 1.Likewise, strongly nonlinear stochastic non-Markovian processes have been studied evolving on continuous time scales and continuous state spaces [229][230][231], in particular, in the context of phase synchronization and the Kuramoto model [232] and in the context of muscle contraction and the Shimizu-Yamada model [233].

Coda
Let us highlight some aspects of strongly nonlinear stochastic processes.
Any closed description of an evolution equation for the realization of a process provides a complete and well-defined description of the process because all quantities of interest can at least numerically be computed from (an infinitely larger) ensemble of realizations.Therefore, in Section 1.2 strongly nonlinear stochastic processes have been defined on the basis of evolution equations for realizations of stochastic processes.In Section 2 closed descriptions of such evolution equations have been provided.In view of these considerations, we conclude that strongly nonlinear stochastic processes are well defined.That is, it is advocated here that strongly nonlinear stochastic processes are processes worth being studied in their own merit irrespective of underlying physical models or derivations based, for example, on generalized thermostatistics (Section 6.8) or mean-field theory (Sections 7 and 8).
The term "strongly nonlinear" should be considered as a placeholder for a variety of other phrases that have been used in the literature to label processes of the kind discussed in this paper.In particular, it should be considered as a synonym to the phrase "truly nonlinear." Using the phrase "strongly nonlinear, " we may also regard "weakly nonlinear" processes.Such processes exhibit stochastic evolution equations for realizations that do not depend on expectation values or probability densities but feature other types of nonlinearities, for example, nonlinear functions of the relevant state variables.The stochastic Gompertz model and the stochastic logistic model [234] of population growth are two examples of processes that may be called "weakly nonlinear." From our considerations in Sections 2 to 6 it becomes clear that the concept of strongly nonlinear stochastic processes is a fundamental one and does not depend on the explicit representation of time and space.In particular, strongly nonlinear stochastic processes are not restricted to the class of processes originally considered by McKean.That is, strongly nonlinear stochastic processes are not limited to the class of diffusion processes evolving on continuous state spaces and on a continuous time scale.
Likewise, as it obvious from our discussion in Section 9, the concept of strongly nonlinear stochastic processes and the concept of Markov processes are two independent concepts.A strongly nonlinear stochastic process may or may not exhibit the Markov property.In this context, it should be mentioned that textbook definitions of the Markov property frequently give rise to some confusion because such textbook definitions typically make use of the notion of a "state variable." However, the key issue concerning the Markov property is the clear separation between future, presence, and past.Accordingly, a process is a Markov process if the future depends only on the present characteristic properties of the process and is independent of the characteristic properties that the process exhibited in the past.Subsequently, we may ask the question about what are the characteristic properties.These properties include the state variables of the process but may also include the occupation probabilities for a process defined on discrete states or the probability density for a process defined on a continuous state space.That is, in the case of a Markov process the present state as well as the present probability density determines entirely the future of a realization of a process.The knowledge about states prior to the present state as well as the knowledge about the probability density at times prior to the present time is irrelevant.This notion is at the heart of the definition of a Markov process [13,60] and applies to the processes reviewed in Sections 2 to 6.In this context, the fact that in the mathematical literature Markov diffusion processes have been frequently related to martingales (see Section 8) should be highlighted because martingales nicely illustrate the dependency of the future of a process on the present properties of the process and the independency of the future of the process on its past properties.

Table 1 :
Four classes of strongly nonlinear stochastic processes and sections in which they will be addressed.
" Transitions from  to ,  to ,  to , and  to variance of the stationary processes is determined by Var() =  2 (st).Let us substitute the variable  =  ⋅  (which is a normally distributed random variable with variance 2 2 ) into 2 /(1 −  2 ) such ISRN Mathematical Physics that the reduces to     () =   −  sin (  ()) + Γ  ()