The Stochastic Evolution of a Protocell: The Gillespie Algorithm in a Dynamically Varying Volume

We propose an improvement of the Gillespie algorithm allowing us to study the time evolution of an ensemble of chemical reactions occurring in a varying volume, whose growth is directly related to the amount of some specific molecules, belonging to the reactions set. This allows us to study the stochastic evolution of a protocell, whose volume increases because of the production of container molecules. Several protocell models are considered and compared with the deterministic models.


Introduction
All known life forms are composed of basic units called cells; this holds true from the single-cell prokaryote bacterium to the highly sophisticated eukaryotes, whose existence is the result of the coordination, in term of self-organization and emergence, of the behavior of each single basic unit.
While present-day cells are endowed with highly sophisticated regulatory mechanisms, which represent the outcome of almost four billion-years of evolution, it is generally believed that the first life forms were much simpler. Such primordial life-bricks, the protocells, were most probably exhibiting only few simplified functionalities, that required a primitive embodiment structure, a protometabolism, and rudimentary genetics, so to guarantee that offsprings were "similar" to their parents [1][2][3].
Intense research programs are being established aiming at obtaining protocells capable of growth and duplication, endowed with some limited form of genetics [3][4][5][6]. Despite all efforts, artificial protocells have not yet been reproduced in laboratory and it is thus extremely important to develop reference models [6][7][8][9] that capture the essence of the first protocells appeared on Earth and enable to monitor their subsequent evolution. Due to the uncertainties about the details, high-level abstract models are particularly relevant. Quoting Kaneko [10], it is necessary to "consider simplified models able to capture universal behaviors, without carefully adding complicating details." Most of the models present in the literature are based on deterministic differential equations governing the evolution of the concentrations of the involved reacting molecules. Even if the results are worth discussing and provide important insights, it should be stressed that the former assumptions are rarely satisfied in a cell [11]. Firstly, the number of involved molecules is small and should be counted by integer numbers, hence the use of the concentrations can be questioned; secondly, the presence of the thermal noise introduces in the system a degree of stochasticity than cannot be trivially encoded by a differential equation, mostly because this makes the time evolution a stochastic process. One possible way to overcome such difficulties is to use the Chemical Master equation: given the present state of the system, namely, the number of available molecules for each species, and the possible reactions among them, one can compute the transition probabilities to reach and leave the given state and thus get a partial differential equation describing the time evolution of the probability distribution of having a given number of molecules at any future times [11,12].
Analytically solving the resulting equation is normally a very hard task, one should thus resort to use numerical methods. A particularly suitable one is the algorithm presented by Gillespie [11,12], allowing to determine, as a function of the present state of the system, the most probable reaction and the most probable reaction time, that is, the time at which such reaction will occur.
Let us however observe that in the setting we are hereby interested in, the chemical reactions occur in a varying volume, because of the protocell growth; we thus need to adapt the Gillespie method to account for this factor. To the best of our knowledge, there are in the literature very few papers dealing with the Gillespie algorithm in a varying volume [13,14]. Moreover in all these papers, the volume variation can be considered as an exogenous factor, not being directly related to the number of lipids forming the protocell membrane. So our main contribution is to improve the Gillespie algorithm taking into account the protocell varying volume which is moreover consistent with the increase of the number of lipids constituting the protocell membrane.
The paper is organized as follow. In Section 2 we briefly recall the Surface Reaction Models of protocell that would be used to compare our stochastic numerical scheme. Then in Sections 3 and 4 we will present our implementation of the Gillespie algorithm in a dynamically varying volume. Finally in Section 5 we will present some applications of our method.

Surface Reaction Models
Among the available models for protocells, a particularly interesting one is the Surface Reactions Model [7,9], SRM for short, and its applicability to the synchronization problem. Such model is roughly inspired by the Los Alamos bug hypothesis [2,6] but which, due to its abstraction level, the SRM can be applied to a wider set of protocell hypotheses.
The SRM is built on the assumption that a protocell should comprise at least one kind of "container" molecule (typically a lipid or amphiphile), hereby called C molecule, and one kind of replicator molecule-loosely speaking "genetic material," hereafter called Genetic Memory Molecule, GMM for short, and named with the letter X. There are therefore two kinds of reactions which are crucial for the working of the protocell: those which synthesize the container molecules and those which synthesize the GMM replicators,respectively, In both cases L i and P j are the buffered precursors, respectively, of container molecules and of the jth GMM, while α i and M i j are the reaction kinetic constants. A second main assumption of the SRM, is that such reactions occur on the surface of the protocell, exposed to the external medium where precursors are free to move. Hence, as long as container molecules are produced, they are incorporated in the membrane that thus increases its size, until a critical point at which, due to physical instabilities, the membrane splits and two offsprings are obtained, each one getting half of the mother's GMMs and whose size is roughly half that of the mother just before the division.
Under the previous assumptions and in the deterministic setting, one can prove [7,9] that the number of membrane molecules and the number of GMMs evolve in time according toĊ where X = (X 1 , . . . , X N ) represents the amount of each GMM, α = (α 1 , . . . , α N ) is the vector of the reaction constants responsible for the production of C molecules from the X molecules plus some appropriate precursor. (M i j ) denotes the reaction constant at which X i is produced by X j plus some precursor. β ∈ [2/3, 1] is a geometrical shape factor that relates the surface to the volume of the protocell and ρ is the lipid density (for more details the interested reader can consult [7,9]). Let us observe that in this setting the precursors are assumed to be buffered and thus their amount to be constant, hence the latter can be incorporated into the constants α and M.
So starting with an initial value of container molecules, C(t 0 ) = C 0 , and of GMMs, X(t 0 ) = X 0 , the protocell will grow until some time t 0 + ΔT 1 at which the amount of C molecules has doubled with respect to the initial value, C(t 0 + ΔT 1 ) = 2C 0 and thus the protocell undergoes a division. Each offspring will get half of the GMMs the mother protocell had just before the division, X (1) = X(t 0 + ΔT 1 )/2. And the protocell cycle starts once again. One can prove [7,9] that under suitable conditions X (n) tends to a constant value once n goes to infinity, implying thus the emergence of synchronization of growth and information production.

The Method
Let us now improve the previous scheme by introducing a probabilistic settingà la Gillespie. We thus consider a protocell made by a lipidic vesicle and containing a wellstirred mixture of N GMMs, X 1 , . . . , X N , that may react through m elementary reaction channels R μ , μ = 1, . . . , m, running within the volume V (t) of the protocell.
Let us observe that because of the protocell growth the volume is an increasing function of time. Actually one can relate the volume to the amount of container molecules via their density V = C/ρ where C denotes the integer number of molecules forming the lipidic membrane. We will hereby use the same symbol X i to denote both the ith GMM and the integer number of molecules of type X i in the system.
For each reaction channel R μ assume that there exists a scalar rate c μ such that c μ dt + o(dt) is the probability that a random combination of molecules from channel R μ will react in the interval [t, t + dt) within the volume V (t).
Let h μ (Y ) be the total number of possible distinct combinations of molecules for a channel R μ when the system is in Computational and Mathematical Methods in Medicine 3 state Y = (X 1 , . . . , X N , C), then we can define the propensity [14] of the reaction R μ to be a μ (Y ) = h μ (Y )c μ .
One can prove [11] that for a binary reaction the rate c μ can be written in the form c μ = k μ /V , where k μ is a fixed constant. Similarly one can prove that for a reaction involving n different species, we get c μ = k μ /V n−1 . And thus for a single molecule reaction, that is, a decay, we get c μ = k μ , namely, independently from the volume.
Let us now assume that among the m reactions, Q 1 involve one single molecule, Q 2 are binary reactions, Q 3 are ternary reactions, and so on. Of course Q 1 +Q 2 +· · ·+Q N+1 = m. We recall that we have N GMMs and the container-type molecule C, hence N + 1 species. For short we will denote Q 1 the set of indices μ for monomolecule reactions, and by Q the remaining ones. Let us observe that in this way some coefficient a μ , will depend both on the system state Y and on the time via the volume More precisely to study the time evolution of the system we need to determine the probability P μ (τ | Y , t)dτ, that given the system in the state Y = (X 1 , . . . , X n , C) at time t, then the next reaction will occur in the infinitesimal time interval (t + τ, t + τ + dτ) and it will be the reaction R μ . This probability will be computed as where P not (τ | Y , t) is the probability that no reaction occurs in (t, t + τ) given the state Y at time t whereas the rightmost term denotes the probability to have a reaction R μ in (t+τ, t+ τ + dτ) given the state Y at time t + τ.
To compute the first term P not , let us take s ∈ [t, t + τ] and observe that: being 1 − μ a μ (Y , t + s)ds the probability that no reaction will occur in (t + s, t + s + ds) once we are in state Y at time t + s. Thus rewriting the previous difference equation as a differential equation, passing to the limit ds → 0, and observing that P not (0 | Y , t) = 1, we get the solution: where The apparent asymmetry in the exponential term in (6) is We can thus conclude that Let us observe that the rightmost term is correctly a μ (Y , t+τ), namely the system is still in the state Y at time t + τ, because no reaction has been produced in (t, t + τ). Let us recall that the volume enters in the previous relation via the function A Q , more explicitly one has that can be rewritten in terms of C molecules using the relation C = ρV . So our method applies to a different problem with respect to the one considered in [14], in fact in our case the volume growth is not imposed a priori but dynamically evolves according to the reaction scheme, if C is produced then V increases otherwise it will keep a constant value, while in [14] the volume growth is an exogenous variable.

The Stochastic Simulation Algorithm in a Growing Volume
Once we have the probability function P μ (τ | Y , t) we can build an algorithm that reproduces the time evolution given by the model defined above.
Given the system in some state Y at time t, we must determine the interval of time τ and the reaction channel R μ according to the probability distribution function P μ (τ | Y , t), and finally update the state Y → Y + ν μ , where ν μ is a stoichiometric vector representing the increase and decrease of molecular abundance due to the reaction R μ . This will be accomplished following the standard approach by Gillespie [11] but taking care of the time dependence of the propensities. We will thus need to compute the cumulative probability distribution function and then make use of the inversion method [12], to determine the channel μ and the next reaction time τ, distributed according to P μ (τ | Y , t).
From (8) we can compute the cumulative distribution function providing the probability that any reaction will occur in (t, t+ τ) starting from the state Y at time t. The function F(τ | Y , t) can be explicitly computed by the following.

Proposition 1. Under the above assumptions we have
Proof. The first step is to use (8) and perform a sum over all the channels μ to rewrite (10) as Then we can observe that and thus Once we have the cumulative distribution function we can obtain the value τ by drawing a random number u 1 from an uniform distribution in [0, 1] and then solve with respect to τ the implicit equation: Let us stress once again that this is not as straightforward as for the original Gillespie [11] scheme, or the simplified one presented in [14], because of the time dependence of A Q via the volume. One can nevertheless find suitable approximation for the integral, this will be the goal of the next sections.

The Adiabatic
Assumption. Let us assume that τ is very small, or which is equivalent, that the time scale of the chemical reactions involving the GMMs is much faster than the production of container molecules, hence the volume growth is very slow compared with the production of the chemicals X i .
Under this hypothesis one can assume that in the interval (t, t + τ) the volume does not vary and thus one can make the following approximation One can thus explicitely solve (15) to get that is the standard Gillespie result except now that A Q (Y , t) depends on time and as long the volume increases, then the contribution arising from A Q (Y , t) might become smaller because A Q ∼ 1/V .

The Next Order Correction.
One can obtain a somehow better estimate valid in the case of comparable time scales for the reactions involving GMM and the container growth. The idea is to compute the integral in (15) using the following approximation: where ∂A Q (Y , t)/∂t can be obtained using the definition (9) and expressing the volume in terms of C = V (t)ρ, namely, To computeĊ/C we make the assumption that in a very short time interval, as the one we are interested in, the deterministic growth of the container is a good approximation for the stochastic underlying mechanism; this implies that we can use (3)Ċ Inserting the previous result into (18) and finally solving (15) with respect to τ, we can compute the next reaction time up to correction of the order of τ 3 , as follows where we wrote for shortȦ Q (Y , t) = ∂A Q (Y , t)/∂t and we selected the positive square root in such a way in the limiṫ A Q (Y , t) → 0 we recover the previous solution (17).
Remark 2 (On the existence of τ Gill ). In the case of variable volume a new phenomenon can arise: the volume growth can be so fast that no reaction can occur in the interval (t, t + τ + dτ) for any τ. Mathematically this translates into a sign condition for the term under square root in (21), if then (15) has no real solution. This can be geometrically interpreted as follows. The relation (15) determines τ Gill as the intersection of the , which is negative because u 1 ∈ (0, 1). Such parabola intersect the y-axis at τ 1 = 0 and τ 2 = −2(A Q1 (Y ) + A Q (Y , t))/Ȧ Q (Y , t) > 0 and it is concave. Then its absolute (negative) minimum is reached at the vertex τ V = (t 1 + t 2 )/2 and its value is and it is negative becauseȦ Q (Y , t) is negative. Hence if the horizontal line is below this value, that is, condition (22) is verified, the parabola and the line do not have any real intersections (see Figure 1).
Let us also observe that, whenever it exists, τ Gill is always positive as it should be. In the case of a protocell the nonexistence of such next reaction time could be translated into the death by dilution of the protocell.

The Next Reaction
Channel. Whenever the next reaction time does exist, the next reaction channel is determined using the classical Gillespie method, namely, by drawing a second uniformly distributed random number u 2 ∈ [0, 1] and fix the channel μ such that

Remark 3.
Let us observe that if all the reactions involve the same number of chemicals, then the determination of which reaction channel μ will be activated in the next reaction does not depend on the volume which factorizes out from (23). In fact assuming all the reactions to involve p chemical, we obtain by definition and thus (23) rewrites which is clearly independent of the volume value V .

Some Applications
The aim of this section is to provide some applications of the previous algorithm to the study of the evolution of a protocell.

One Single Genetic Memory
Molecule. The simplest model is the one where only one GMM specie is present in 6 Computational and Mathematical Methods in Medicine the protocell [9] and thus only two chemical channels are active: where P 1 and L 1 are, respectively, precursors of GMM, that is, nucleotide, and precursors of amphiphiles. One can thus compute the propensities in the state Y = (X, C) at time t: let us observe that we assume that precursors are buffered and thus they are constant.
Because system (26) contains only bimolecular reactions, all the propensities are time dependent, hence A Q1 = 0 and A Q = a 1 (X, C, t)+a 2 (X, C, t) = (P 1 η +L 1 α)X/V (t), thus (15) simplifies into whose second-order solution (21) is given by So we can finally obtain Which reaction channel μ will be active in the time interval [t, t + τ] can be obtained according to Let us observe that according to Remark 3, the choice of μ does not depend on the volume, because only binary reactions are present.
Let C 0 be the initial amount of container molecules, then we assume that once C(t) = 2C 0 the protocell splits into two offspring, almost halving the GMM amount. More precisely we assume that the first offspring will get a number of GMMs drawn according to a binomial distribution with parameter p = 1/2 and n = X(t). From this step, for technical reason, only one randomly chosen offspring will be studied during each generation.
In Figure 2 we report a comparison between the deterministic (3) and the stochastic dynamics, under the adiabatic assumption for τ Gill , corresponding to the continuous growth phase of the container between two successive divisions. As one should expect, a system composed by a large number of molecules exhibits small stochastic fluctuations whose average is not too far from the dynamics described by the deterministic model.
In Figure 3 we report the amount of GMM, X (k) (Figure 3(a)), at the beginning of each protocell cycle and the duplication time (Figure 3(b)), namely, the interval of time needed to double the amount of C molecules, for both the stochastic and deterministic models. Once again one can clearly observe the small fluctuations of the stochastic system around the value obtained by the numerical integration of the deterministic description (3). Let us observe that these fluctuations are due to the stochastic integrator scheme and also on the division mechanism.
We are now interested in studying the fluctuations dependence on the amount of molecules. We already know that for a sufficiently large number of molecules the stochastic dynamics follows closely the deterministic one and thus the fluctuations are small. On the other hand, one should expect that when the number of molecules decreases, then the fluctuation will rise and the system behavior could not be completely described by means of a deterministic approach. This is confirmed by Figures 4 and 5, where we can observe that a model composed by a small number of initial molecules, 20-times lesser than in the model presented in Figure 2 exhibits larger stochastic fluctuations.
In Figure 6 we summarize the results of several protocell models each one with a different amount of initial molecules, in order to appreciate the influence of the latter on the stochastic fluctuations. To compare with, we also report the case of the deterministic model. Because the kinetic constants are kept constant, the analytical theory for the deterministic model ensures that the division time does not vary [7]. Nevertheless the fewer the initial amount of X 0 and C 0 is, the larger the fluctuations present in the stochastic integration are.
To get a more complete understanding of the fluctuations dependence, we decided to measure them using the standard deviation of the protocell division time (after a sufficiently long transient phase). In Figure 7 we report the standard deviation of the division time ΔT as a function of the initial amount of molecules. As expected the fluctuations strength decreases rapidly as soon as the number of molecules increases and the relation can be very well approximated by a power law distribution with exponent −0.54 ± 0.03 (linear best fit).

Two Noninteracting Genetic Memory Molecules.
A slightly more sophisticated model can be obtained by considering two linear non interacting GMMs. The system can be described by the following chemical reactions: where P i and L i are, respectively, precursors of the ith GMM, that is, nucleotide, and precursors of amphiphiles used by the ith GMM to build a C molecule.
As previously done, we compare the stochastic and the deterministic models. Results are reported in Figure 8 and one can still observe that in presence of a large number of molecules the deterministic dynamics well approximates the stochastic model. On the other hand, the protocell division time exhibits large fluctuations around the deterministic value even in presence of quite large number of molecules (see Figure 9(b)).
The parameters have been set in such a way that only one GMM will survive according to the analytical theory for the deterministic model. One can observe that, despite the fluctuations, the same fate is obtained for the stochastic model (see Figure 9(b)).
Once we reduce the number of involved molecules, the stochastic fluctuations dramatically increase (see Figures 10 and 11).
As in the case of only one GMM, when two non interacting linear GMMs are present the size of the stochastic fluctuations as a function of the initial number of molecules follows a power law distribution with exponent −0.51 ± 0.05 (linear best fit), see Figure 12: the fewer the molecules in the system are, the larger the fluctuations around the deterministic dynamics are.
A new phenomenon arises in the case of two GMMs modeled by a stochastic process. There can be a breaking of the symmetry emerging in systems composed of two identical GMMs (i.e., equal kinetic constants, equal initial amounts, and availability of precursors) present with a few initial amounts of each one. Although adopting a deterministic approach the dynamics of the two replicators would be perfectly the same, a small fluctuation in the very first instants of the protocell evolution entails the dilution of one of the two replicators and thus a different fate for the protocell. Let us observe that the probability to have a large fluctuation is never zero, thus waiting for a sufficiently long time, a specie can always disappear from the system, thus giving rise to the breaking of the symmetry phenomenon. See Figure 13 where we report, as a function of the initial amount of molecules X i (0), i = 1, 2, the proportion of simulations where the symmetry breaking has been observed repeating 50 times each simulation with the same set of parameters and initial conditions during 100 generations.      Figure 12: Fluctuation dependence on the initial conditions. We report the standard deviation of the protocell division time as a function of the initial amount of molecules X i (0), i = 1, 2, (•) and a linear best fit, whose slope is = −0.51 ± 0.05. Parameters are X 1 (0) = X 2 (0) = 2 n with n = 0, . . . , 10, C(0) = 10X 1 (0), η 1 = η 2 = 1, α 1 = α 2 = 2, L1 = 500, L2 = 500, P1 = 500, P 2 = 600, ρ = 100, and β = 1.

Conclusion
In this paper we presented a new stochastic integration algorithm based on the one introduced by Gillespie. Our contribution is devoted to the explicit introduction of the volume variation in the algorithm, which moreover is directly related to the amount of contained molecules, and thus it evolves in a self-consistent way. This algorithm straightforwardly adapts to the study of the evolution of a protocell, simplified form of cells, where an ensemble of chemical reactions occurs in a varying volume, the volume of the protocell, that in turn increases because of the production of container molecules.
We presented several protocell models and we compare them with the analogous deterministic protocell models, namely, solved using the ODE. In this preliminary study, we emphasized the role of the fluctuations and their dependence on the initial amount of molecules. The dynamics is richer than the deterministic one and thus it is worth studying, in particular we deserve to future investigations the case where the interactions among the molecules can be modeled by a linear system, whose interaction matrix is not diagonal; the off diagonal terms representing the cross-catalysis. Also the case of nonlinear interactions will be deferred to a forthcoming paper. Also the study of the emergence of timeperiodic patterns due to the fluctuations, will be analyzed. An analytical treatment of the latter case could be possible using some recent technics developed by [15,16], see also [17] where the space is also taken into account.