Reduced variable optimization methods via implicit functional dependence with applications

Optimization methods have been broadly applied to two classes of objects, viz. (i) the modeling and description of data and (ii) the determination of the stationary points of functions. Here, a theoretical basis is developed that optimizes an arbitrary number of variables for classes (i) and (ii) by the minimization of a function of a single variable. Algorithms that focus on a reduced variable set also avoid problems associated with the multiple minima and maxima that arise because of large numbers of parameters. The methods described could have applications in the physical sciences where the optimization of one physically significant variable has priority over the other variables of secondary importance. For (i), we develop both an approximate but computationally more tractable method and an exact method in which the single controlling variable k of all the other variables P(k) passes through the local stationary point of the least squares (LS) metric. For (ii), an exact theory is developed whereby the optimized function of an independent variation of all parameters coincides with that due to single parameter optimization. The implicit function theorem has to be further qualified to arrive at this result. The topology of the surfaces of constant value of the target or cost function is considered for all the methods. A real world application of the above implicit methodology to the determination of rate constants and final concentration parameters for first and second order chemical reactions from published data is attempted to illustrate its utility. This work is different from, and more general than, all the reduction schemes for conditionally linear parameters used, for example, in extracting data from mixed signal spectra of physical quantities such as those found in laser spectroscopy, since it is valid for conditional and nonconditional nonlinear parameters. Nor is it a subset of the Adomian decomposition method (ADM) used for estimating solutions of differential equations, which still requires boundary conditions that do not feature in topics (i) and (ii).

Introduction
The following theory and elaboration revolve around properties of constrained and unconstrained functions that are continuous and differentiable to various specified degrees [1,2], the existence of implicit functions [3], and the form of the function to be optimized. The implicit function theorem is applied in a way that requires further qualification, because the optimization problem is of an unconstrained kind without any redundant variables. Methods (i)a and (i)b (Secs. (2) and (3), respectively) refer to the modeling of data [4, Chap. 15], where the function $Q_{MD}(\mathbf P,k)$ with independently varying variables $(\mathbf P,k)$ has the general form
$$Q_{MD}(\mathbf P,k)=\sum_{i=1}^{N}\left[y_i-f(\mathbf P,t_i,k)\right]^2, \qquad (1)$$
where $y_i$ and $t_i$ are datapoints and f is a known function; optimization of $Q_{MD}$ may be termed a least squares (LS) fit over the parameters $(\mathbf P,k)$, which are independently optimized. Method (ii) focuses on optimizing a general function $Q_{Opt}(\mathbf P,k)$, not necessarily of LS form. There are many standard and hybrid methods to deal with such optimization [4, Ch. 10], such as golden section searches in 1-D, simplex methods over multiple dimensions [4, p.499-525], steepest descent and conjugate gradient methods [5], and variable metric methods in multiple dimensions [4, p.521-525]. Hybrid methods include multidimensional (DFP) secant methods [6], BFGS (secant optimization) [7] and RFO (rational function optimization) [8], which is a Newton-Raphson technique utilizing a rational function rather than a quadratic model for the function close to the solution point. Global deterministic optimization schemes combine several of the above approaches [9, Sec. 6.7.6]. Other ad hoc, physically motivated methods that are perhaps less easy to justify analytically include probabilistic "basin-hopping" algorithms [9, Sec. 6.7.4], simulated annealing methods [10] and genetic algorithms [9, p.346]. An analytical justification, on the other hand, is attempted here, but in real-world problems some of the assumptions (e.g. $C^2$ continuity, compactness of spaces) may not always hold.
For what follows, the distance metrics used are all Euclidean, represented by $|\cdot|$ or $\|\cdot\|$, and $\det(\cdot)$ denotes the determinant of the matrix $(\cdot)$. Reduction of the number of variables to be optimized is possible in the standard matrix regression model only if conditionally linear parameters β exist [11]; these β variables do not appear in the final S(θ) expression of the least squares function (2) to be optimized, whereas the nonlinear parameters φ do, and these form a subset of the θ variables. For each conditionally linear parameter that exists, there is a unit reduction in the number of independent parameters to be optimized. These reductions occur for any "expectation function" f(x, θ), which is the model or law to which a fit is required, where there are N different datapoints $x_i$, i = 1, 2, ..., N, that must be used to determine the p parameter variables θ [11, Ch. 2, p.32]. A parameter $\theta_i$ is conditionally linear if and only if the derivative of the expectation function f(x, θ) with respect to $\theta_i$ is independent of $\theta_i$. Clearly such a condition may severely limit the number of parameters that can be eliminated when the prescribed matrix regression techniques are employed [11, Sec. 3.5.5, p.85], where the residual sum of squares
$$S(\theta)=\|\mathbf y-\boldsymbol\eta(\theta)\|^2=\sum_{n=1}^{N}\left[y_n-f(x_n,\theta)\right]^2 \qquad (2)$$
is minimized. As θ varies over its p-dimensional parameter space, the N-vectors η(θ) trace out the expectation surface. If the θ variables are partitioned into the conditionally linear parameters β and the remaining nonlinear parameters φ, then the response can be written η(β, φ) = A(φ)β. Golub and Pereyra [12] used the standard Gauss-Newton algorithm to minimize
$$S_2(\phi)=\|\mathbf y-A(\phi)\hat\beta(\phi)\|^2,$$
which depends only on the nonlinear parameters φ, where $\hat\beta(\phi)=A^{+}(\phi)\mathbf y$, with the matrix $A^{+}$ a defined pseudoinverse of the matrix A [11, Sec. 3.5.5, p.85].
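As a concrete illustration of the Golub-Pereyra reduction, the sketch below profiles out a single conditionally linear parameter. It assumes a hypothetical model $y=\beta e^{-\phi t}$ (β conditionally linear, φ nonlinear) and synthetic noiseless data. For a one-column A(φ) the pseudoinverse solution reduces to $\hat\beta=(\mathbf a\cdot\mathbf y)/(\mathbf a\cdot\mathbf a)$. For simplicity, the 1-D minimization of $S_2(\phi)$ uses a golden-section search rather than the Gauss-Newton algorithm of [12].

```python
import math


def golden_section_min(fun, a, b, tol=1e-10):
    """Minimize a unimodal function of one variable on [a, b] by golden-section search."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0  # 1/golden ratio, about 0.618
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    fc, fd = fun(c), fun(d)
    while b - a > tol:
        if fc < fd:  # the minimum is bracketed by [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = fun(c)
        else:  # the minimum is bracketed by [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = fun(d)
    return 0.5 * (a + b)


def vp_profile(phi, t, y):
    """S2(phi) and beta_hat(phi) for the hypothetical model y = beta * exp(-phi * t)."""
    a = [math.exp(-phi * ti) for ti in t]  # the single column of A(phi)
    beta = sum(ai * yi for ai, yi in zip(a, y)) / sum(ai * ai for ai in a)  # A^+(phi) y
    s2 = sum((yi - beta * ai) ** 2 for ai, yi in zip(a, y))  # ||y - A(phi) beta_hat||^2
    return s2, beta


# Synthetic, noiseless data generated with beta = 3.0 and phi = 0.8 (illustrative only)
t = [0.5 * i for i in range(11)]
y = [3.0 * math.exp(-0.8 * ti) for ti in t]

# Only the nonlinear parameter phi is searched; beta follows from phi
phi_hat = golden_section_min(lambda p: vp_profile(p, t, y)[0], 0.1, 2.0)
beta_hat = vp_profile(phi_hat, t, y)[1]
print(phi_hat, beta_hat)
```

The search runs over φ alone, and β is recovered afterwards. This is the "unit reduction" per conditionally linear parameter described above.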
The variables must be separable as discussed above, and the reduction in the number of variables equals only the number of conditionally linear parameters that exist for the problem. In applications, the preferred algorithm that exploits this valuable variable reduction is called Variable Projection. Many applications in time-resolved spectroscopy depend heavily on this technique, and many references to the method are given in the review by van Stokkum et al. [13]. Recently, the method of variable projection has been extended in a restricted sense [14] in the field of inverse problems, which is related neither to our methods of modeling or optimization nor to the implicit function methods. In short, many of the reported methods are ad hoc, meaning that they are constructed to face the specific problems at hand with no pretense to any all-encompassing generality. This work too is ad hoc, in the sense of suggesting variable reduction for the specific classes of non-inverse problems indicated. It develops a method of reducing the number of variables to unity for all variables in the expectation function space, irrespective of whether they are conditionally linear or not, by approximating their values by a method of averages (method (i)a). No form of linear regression is used to determine these approximations during the minimization cycles, and the standard matrix theory, which is valid only for a very limited class of functions, is not necessarily used. Methods (i)b and (ii) are exact treatments. No "elimination" of conditionally linear parameters is involved in this nonlinear regression method, because they are explicitly calculated. Nor is any projection in the mathematical sense involved.
These more general methods could have useful applications in deterministic systems comprising many parameters that are all linked to one variable, the primary one (denoted k here), which is considered on physical grounds to be the most important. A generalization of this method would be to select a smaller set of variables than the full parameter list. Examples of multiparameter complex systems include multiple-step elementary reactions, each with its own rate constant, that give rise to photochemical spectral signals which must be resolved unambiguously [15]. All these complex and coupled processes in physical theories are related by postulated laws $Y_{law}(\mathbf P,k,t)$ that feature the parameters $(\mathbf P,k)$. Other examples include quantum chemical calculations with many topological and orientation variables that need to be optimized with respect to the energy, but in relation to one or a few variables, such as the molecular trajectory parameter during a chemical reaction, where this variable is of primary significance in deciding on the 'reasonableness' of the analysis [9, Sec. 6.2.3, p.294]. Methods (i)a and (i)b below refer to LS data-fitting algorithms. Method (i)a is an approximate method for which it is proved that, under certain conditions, it could yield a more accurate determination of parameters than a standard LS fit using (1). Method (i)b develops the methodology whereby its optimum value of $Q_{MD}$ with domain values $(\mathbf P,k)$ coincides with that of the standard LS method, in which the $(\mathbf P,k)$ variables are varied independently. The relative accuracy of method (i)a (subsection 2.2) and of method (i)b (endnote at the end of Section 3) is also discussed. Method (ii) develops a single parameter optimization in which the stationarity conditions of an arbitrary function $Q_{Opt}(\mathbf P,k)$ are met simultaneously, viz.
$$\frac{\partial Q_{Opt}}{\partial P_j}=0\quad (j=1,2,\dots,N_p),\qquad \frac{\partial Q_{Opt}}{\partial k}=0.$$
We note that methods (i)a, (i)b and (ii) are not related to the Adomian decomposition method and its variants, which use polynomial expansions [16] to solve differential equations and are not connected to estimation theory. Indeed, here there are no boundary values that determine the solution of differential equations.

Method (i)a theory
Deterministic laws of nature are sometimes written, for the simplest examples, in the form
$$Y_{law}=Y_{law}(\mathbf P,k,t), \qquad (3)$$
linking the variable $Y_{law}$ to t. The components $P_i$ ($i=1,2,\dots,N_p$) of $\mathbf P$, and k, are parameters. Verification of a law of form (3) relies on an experimental dataset $\{Y_{exp}(t_i),t_i\}$, $i=1,2,\dots,N$. The t variable could be a vector of experimentally measured components, or a single parameter as in the example below, where $t_i$ denotes values of time t in the domain space; the vector form will be denoted x. The variables x are members of the 'domain space' of the measurable system, and similarly $Y_{law}$ is the defined range or 'response' space of the physical measurement. Confirmation or verification of the law is based on (a) deriving experimentally meaningful values for the parameters $(\mathbf P,k)$ and (b) showing a good enough degree of fit between the experimental set $Y_{exp}(t_i)$ and $Y_{law}(t_i)$. In real world applications, to chemical kinetics for instance, several methods [17, 18, 19, 20, etc.] have been devised to determine the optimal $\mathbf P,k$ parameters, but most if not all of these methods consider the aforementioned parameters as autonomous and independent (e.g. [18]). A similar scenario broadly holds for current state of the art applications of structural elucidation via energy functions [9]. To preserve the viewpoint of the inter-relationship between these parameters and the experimental data, we devise a scheme that relates $\mathbf P$ to k for all $P_i$ via the set $\{Y_{exp}(t_i),t_i\}$ and optimizes the fit over k-space only, i.e. a dependency $P_i(k)$ on k is induced via the experimental set $\{Y_{exp}(t_i),t_i\}$. The conditions that allow for this will also be stated.

Details of method (i)a
Let N be the number of dataset pairs $\{Y_{exp}(t_i),t_i\}$, $N_p$ the number of components of the parameter $\mathbf P$, and $N_s$ the number of singularities, i.e. datasets $(Y_{exp},t)$ whose use leads to a singularity in the determination of $\bar P_i(k)$ as defined below and which must therefore be excluded from that determination. Then $(N_p+1)\le(N-N_s)$ for the unique determination of $\{\mathbf P,k\}$. Let $N_c$ be the total number of different dataset combinations that can be chosen without leading to singularities. If the singularities are not choice dependent, i.e. a particular dataset pair leads to a singularity for all possible choices, then
$$N_c=\binom{N-N_s}{N_p},$$
the total number of combinations of the datasets $\{Y_{exp}(t_i),t_i\}$ taken $N_p$ at a time that do not lead to singularities in $P_i$. In general, $N_c$ is determined by the nature of the datasets and the way in which the proposed equations are to be solved. Write $Y_{law}$ in the form
$$Y_{law}=f(\mathbf P,t,k), \qquad (4)$$
and, for a particular dataset, $Y_{exp}(t_i)=f(\mathbf P,t_i,k)$.
Proof. The above follows from the implicit function theorem (IFT) [3, Th. 13.7, p.374], where $k\in K_0$ is the independent variable for the existence of the $\mathbf P(k)$ function.
We seek the solutions for $\mathbf P(k)$ subject to the above conditions, where the term $\bar{\mathbf P}$ and its components are defined below and where k is a varying parameter. For any of the $N_c$ combinations α of $N_p$ datasets, it is in principle possible to solve for the components of $\mathbf P$ in terms of k through the simultaneous equations
$$Y_{exp}(t_{\alpha_j})=f(\mathbf P,t_{\alpha_j},k),\qquad j=1,2,\dots,N_p, \qquad (6)$$
by Lemma 1, and each choice of α yields a unique solution $\mathbf P(k,\alpha)$. Hence any functions of $P_i(k,\alpha)$ involving addition and multiplication are also in $C^1$. For each $P_i$ there will be $N_c$ different solutions, $P_i(k,1),P_i(k,2),\dots,P_i(k,N_c)$. We can define an arithmetic mean (several other definitions of the mean could be used) for the components of $\bar{\mathbf P}$ as
$$\bar P_i(k)=\frac{1}{N_c}\sum_{j=1}^{N_c}P_i(k,j). \qquad (7)$$
In choosing the functional form of $\bar{\mathbf P}$ in (7) we assumed equal weighting for each of the dataset combinations; however, the choice is open, based on appropriate physical criteria. We verify below that the choice of $\bar{\mathbf P}(k)$ satisfies the constrained variation of the LS method, so as to emphasize the connection between the level surfaces of the unconstrained LS function and the line function $\bar{\mathbf P}(k)$. Each $P_i(k,j)$ is a function of k whose derivative is known either analytically or by numerical differentiation. To derive an optimized set for the LS method, define
$$Q(k)=\sum_{i=1}^{N}\left[Y_{exp}(t_i)-f(\bar{\mathbf P}(k),t_i,k)\right]^2. \qquad (8)$$
Then for an optimized k we have $Q'(k)=0$. Defining
$$R(k)\equiv\frac{dQ(k)}{dk}, \qquad (9)$$
the optimized solution for k corresponds to $R(k)=0$, which has been reduced to a one-dimensional problem. The standard LS variation, on the other hand, states that the variables $\mathbf P_T=\{\mathbf P,k\}$ in (4) are independently varied, so that
$$Q_T(\mathbf P_T)=\sum_{i=1}^{N}\left[Y_{exp}(t_i)-f(\mathbf P,t_i,k)\right]^2, \qquad (10)$$
with solutions for $Q_T$ in terms of $\mathbf P_T$ whenever $\partial Q_T/\partial\mathbf P_T=0$. Of interest is the relationship between the single variable variation in (8) and the total variation in (10). Since $\mathbf P$ is a function of k, (10) becomes a constrained variation, where
$$P_i=\bar P_i(k),\qquad i=1,2,\dots,N_p \qquad (11)$$
(for some function of k), and where $\bar P_i$ are the components of $\bar{\mathbf P}$. According to the Lagrange multiplier theory [3, Th. 13.12, p.381], if the function $f:\mathbb R^n\to\mathbb R$ has an optimal value at $\mathbf x_0$ subject to the constraints $\mathbf g:\mathbb R^n\to\mathbb R^m$ over the subset S where $\mathbf g=(g_1,g_2,\dots,g_m)$ vanishes, i.e. $\mathbf x_0\in X_0$ with $X_0=\{\mathbf x:\mathbf x\in S,\ \mathbf g(\mathbf x)=0\}$, then either of the following equivalent equations is satisfied:
$$\nabla f(\mathbf x_0)+\sum_{j=1}^{m}\lambda_j\nabla g_j(\mathbf x_0)=0, \qquad (12)$$
$$D_r f(\mathbf x_0)+\sum_{j=1}^{m}\lambda_j D_r g_j(\mathbf x_0)=0,\qquad r=1,2,\dots,n, \qquad (13)$$
where $\det[D_j g_i(\mathbf x_0)]\neq 0$ and the λ's are invariant real numbers. We refer to $\bar P_i$ as any variable that is a function of k constructed on physical or mathematical grounds, and not just to the special form defined in (7). Write
$$g_i(\mathbf x)=P_i-\bar P_i(k)=0,\qquad i=1,2,\dots,N_p, \qquad (14)$$
where $\mathbf x=(\mathbf P,k)$, and
$$f_Q(\mathbf x)=\sum_{i=1}^{N}\left[Y_{exp}(t_i)-f(\mathbf P,t_i,k)\right]^2, \qquad (15)$$
where $Y_{exp}(t_i)$ are the experimental subspace variables as in (6), with $\mathbf x\in X_0$ defined above. We next verify the relation between Q(k) and $Q_T$: (9) is equivalent to the variation of $f_Q(\mathbf x)$ defined in (15) subject to the constraints $g_i$ of (14).
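Proof. Define the Lagrangian of the problem as
$$L(\mathbf x,\boldsymbol\lambda)=f_Q(\mathbf x)+\sum_{j=1}^{N_p}\lambda_j g_j(\mathbf x). \qquad (16)$$
Then the equations that satisfy the stationary condition reduce to the (equivalent) simultaneous equations
$$\frac{\partial f_Q}{\partial P_j}+\lambda_j=0,\qquad j=1,2,\dots,N_p, \qquad (17)$$
$$\frac{\partial f_Q}{\partial k}-\sum_{j=1}^{N_p}\lambda_j\frac{d\bar P_j}{dk}=0. \qquad (18)$$
Substituting $\lambda_j$ from (17) into (18) gives
$$\frac{\partial f_Q}{\partial k}+\sum_{j=1}^{N_p}\frac{\partial f_Q}{\partial P_j}\frac{d\bar P_j}{dk}=0. \qquad (19)$$
Since $dP_j/dk=d\bar P_j/dk$ on the constraint set (14), (19) is equal to $dQ(\mathbf P,k,t)/dk=0$ for the Q functions in (10), (11) and (15).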
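The construction (6) to (8) can be sketched for a hypothetical two-parameter law $f(\mathbf P,t,k)=P_1+P_2e^{-kt}$ ($N_p=2$) with synthetic noiseless data. Each pair of datapoints (a combination α) gives a 2×2 linear system (6), which is solved for $\mathbf P(k,\alpha)$. The solutions are averaged as in (7), and Q(k) of (8) is minimized over k alone. Here that minimization uses a coarse grid scan followed by golden-section refinement, which is our choice and is not prescribed by the method. The last lines check (19) numerically: the total derivative dQ/dk should equal $\partial f_Q/\partial k+\sum_j(\partial f_Q/\partial P_j)(d\bar P_j/dk)$, with every derivative approximated by central differences.

```python
import itertools
import math


def f(P, t, k):
    """Hypothetical law Y = P1 + P2 * exp(-k t), so that N_p = 2."""
    return P[0] + P[1] * math.exp(-k * t)


def P_bar(k, t, y):
    """Eq. (7): mean of the exact solutions of (6) over all pairs of datapoints."""
    sols = []
    for (ta, ya), (tb, yb) in itertools.combinations(zip(t, y), 2):
        ea, eb = math.exp(-k * ta), math.exp(-k * tb)
        if abs(eb - ea) < 1e-14:  # singular combination: excluded from the average
            continue
        p2 = (yb - ya) / (eb - ea)  # solve the 2x2 linear system (6) for this pair
        sols.append((ya - p2 * ea, p2))
    n_c = len(sols)
    return (sum(s[0] for s in sols) / n_c, sum(s[1] for s in sols) / n_c)


def f_Q(P, k, t, y):
    """Eqs. (10)/(15): LS metric for independently given (P, k)."""
    return sum((yi - f(P, ti, k)) ** 2 for ti, yi in zip(t, y))


def Q(k, t, y):
    """Eq. (8): LS metric as a function of k alone, with P = P_bar(k)."""
    return f_Q(P_bar(k, t, y), k, t, y)


def golden_section_min(fun, a, b, tol=1e-12):
    """Golden-section search for the minimum of a unimodal function on [a, b]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    fc, fd = fun(c), fun(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = fun(c)
        else:
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = fun(d)
    return 0.5 * (a + b)


# Synthetic noiseless data with P = (1.0, 2.0) and k = 0.7 (illustrative only)
t = [0.5 * i for i in range(11)]
y = [f((1.0, 2.0), ti, 0.7) for ti in t]

# Coarse scan in k, then golden-section refinement around the best grid point
grid = [0.05 + 0.01 * i for i in range(300)]
k_grid = min(grid, key=lambda k: Q(k, t, y))
k_opt = golden_section_min(lambda k: Q(k, t, y), k_grid - 0.01, k_grid + 0.01)
P_opt = P_bar(k_opt, t, y)

# Numerical check of eq. (19) at an arbitrary k (all derivatives by central differences)
h, kc = 1e-6, 0.9
Pc, Pp, Pm = P_bar(kc, t, y), P_bar(kc + h, t, y), P_bar(kc - h, t, y)
total = (Q(kc + h, t, y) - Q(kc - h, t, y)) / (2 * h)  # dQ/dk along P_bar(k)
chain = (f_Q(Pc, kc + h, t, y) - f_Q(Pc, kc - h, t, y)) / (2 * h)  # partial f_Q/partial k
for j in range(2):
    up = tuple(p + (h if i == j else 0.0) for i, p in enumerate(Pc))
    dn = tuple(p - (h if i == j else 0.0) for i, p in enumerate(Pc))
    dfQ_dPj = (f_Q(up, kc, t, y) - f_Q(dn, kc, t, y)) / (2 * h)
    chain += dfQ_dPj * (Pp[j] - Pm[j]) / (2 * h)  # times dP_bar_j/dk
print(k_opt, P_opt, total, chain)
```

With noiseless data, every combination returns the true $\mathbf P$ at the true k, so $\bar{\mathbf P}$ and Q(k) reach their exact values there. With noisy data, the averaged $\bar{\mathbf P}$ is the approximation that method (i)a relies on.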
Of interest is the theoretical relationship between the $\bar{\mathbf P},k$ variables of the Q function described by (11) and (8), denoted $Q_1$, and those of the free variational Q function of (15), denoted $Q_2$. This relationship is given by the following theorem, where we note that the functional form of f is unique and the same for both of these α variables.
The theorem, verification and lemma above do not indicate topologically under what conditions the solutions of the constrained and unconstrained models coincide. Fig.(1) depicts the discussion below. From theorem (3), if set A represents the solution $\{\mathbf P,k\}$ for the unconstrained LS method and set $B=\{\mathbf P,k'\}$ that for the constrained method, then $B\supseteq A$. Define k within the range $k_1\le k\le k_2$. Then k lies in a compact space, and since $\bar{\mathbf P}(k)\in C^1$, $\bar{\mathbf P}(k)$ is uniformly continuous [2, Th. 8, p.79]. Admissible solutions to the above constraint problem with $B\supseteq A$ then imply $Q(\bar{\mathbf P}(k))\ge Q_{min}$, where $Q_{min}$ is the unconstrained minimum. The unconstrained LS function $Q=Q_T$ to be minimized in (10) implies [...]. Defining the constrained function $Q_c(k)$ [...] $N_p$), $\partial Q/\partial k=0$, where this solution is a special case of (iii) when the vector $\mathbf P'_T$ is perpendicular to $\nabla Q$, i.e. $\mathbf P'_T$ is tangent to the surface $Q=S_2$ for some $S_2\ge Q_{min}$; this situation is shown in Fig.(1), where the vector is tangent at some point of the surface $Q_T=S_2$. Whilst the above characterizes the topology of a solution, the existence of a solution for the line $(\bar{\mathbf P}(k),k)$ that passes through the point of the unconstrained minimum of $Q_T$ is proven below under certain conditions, where a set of equations is constructed to allow for this exceptionally important application. Also discussed is the case where it may be possible for the unconstrained solution set U to satisfy the inequality $Q_T(U)\ge Q_c$, where $Q_c$ is a function designed to accommodate all solutions of (6).

Discussion of LS fit for a function $Q_c$ with a possibility of a smaller LS deviation than for $\{\mathbf P,k\}$ parameters derived from a free variation of eq.(10)
The LS function metric (10) implies $Q(\mathbf P_T)\le Q(\mathbf P,k)$ for all $(\mathbf P,k)$ in a neighbourhood of a stationary (minimum) point $\mathbf P_T$. On the other hand, the sets of solutions of (6), $N_c$ in number, provide for each set exact solutions $\mathbf P(k)$, which are averaged to $\bar{\mathbf P}(k)$. If the solutions $\mathbf P_i$, $i=1,2,\dots,N_c$, lie in a δ-neighbourhood, then it may be possible that the composite function metric to be optimized over all the sets of [...], implying that, under these conditions, the $Q_c$ of (28), with $Q_{c,i}(\mathbf P_i)(k_0)=0$, [...] in the following sense:

Nb. A similar definition holds for a (strictly) decreasing function, with the (<) ≤ inequalities. Since the boundaries ∂B of the balls are compact and f is continuous, the maximum and minimum values are attained on all ball boundaries.
Nb. Similar conditions apply for the non-strict inequalities ≤, ≥. The function that is optimized is [...]. Define $\mathbf P_i$ as the solution vector of the i-th equation set (6). We illustrate the conditions under which the solution $\mathbf P_T$ of a free variation of the Q metric given in (10) can fulfil the inequality
$$Q_c(\mathbf P_T,k)>Q_c(\bar{\mathbf P},k),$$
where $Q_c$ is as defined in (32), with $\bar{\mathbf P}$ given as in (7). A preliminary result is required. Define $\max_{i,j}\|\mathbf P_i-\mathbf P_j\|=\delta$ for $i,j=1,2,\dots,N_c$, and $\delta P_i=\|\bar{\mathbf P}-\mathbf P_i\|$.
Proof. Any point $\bar{\mathbf P}$ would be located within a spherical annulus centred at $\mathbf P_i$, with radii chosen so that, by lemma (5), the following results: [...], where $f=Q_{c,i}$ in (30). Choose $\delta_i$ so that $\delta_i<\delta P_i<\delta$. Define $\mathrm{Ann}(\delta,\delta_i,\mathbf P_i)$ as the space bounded by the boundaries of the balls centred on $\mathbf P_i$ with radii δ and $\delta_i$ ($\delta>\delta_i$). Then $\bar{\mathbf P}\in\mathrm{Ann}(\delta,\delta_i,\mathbf P_i)$ by lemma (6). Since $Q_c$ in (28) is not equivalent to $Q_T=Q$ in (10), where we write the free variation vector solution here as $\mathbf P_T$, the above results lead to the following: [...], where (36) follows from (31). Summing (36) leads to $Q_c(\mathbf P_T,k)>Q_c(\bar{\mathbf P},k)$.
Hence we have demonstrated that, if $\mathbf P_T$ lies sufficiently far away from $\bar{\mathbf P}$, it may be more realistic or accurate to fit parameters based on a function that represents different coupling sets, such as $Q_c$ of (28) above, rather than by the standard LS method using (10). We note that if $\mathbf P_T$ is instead the solution of the free variation of $Q_c$ in (28), then from the arguments presented after the proof of theorem (3), $Q_c(\mathbf P_T,k)\le Q_c(\bar{\mathbf P},k)$. This implies that the independent variation of all parameters in the LS optimization of $Q_c$ is a more accurate functional form to use, assuming equal weighting of experimental measurements, than the standard free variation of parameters using the Q function of (10).

Theory of method (i)b
Whilst it is advantageous in scientific data analysis to optimize a particular multiparameter function by focusing on a few key variables (our k variable of restricted dimensionality, which we apply to a 1-dimensional optimization in the applications below), it has been shown that this method yields a value of the Q function that is never lower than that of a full and independent parameter optimization, meaning that it is in general less accurate. The key issue, therefore, is whether, for any Q function, including those of the $Q_c$ variety, it is possible to construct a k parameter optimization such that the line of parameter variables $\mathbf P(k)$ passes through the minimum of the Q function. We develop a theory to construct such a function below. However, method (i)a may still be advantageous because of the greater simplicity of the equations to be solved, and because only $C^1$ functions $f^{(i)}$ were required there, whereas here the $f^{(i)}$ functions must be at least $C^2$ continuous.
Proof. As before, $f^{(i)}=f(\mathbf P,t_i,k)$, so that
$$Q_T(\mathbf P,k)=\sum_{i=1}^{N}\left[Y_{exp}(t_i)-f^{(i)}\right]^2, \qquad (38)$$
and for an independent variation of the variables $(\mathbf P,k)$ at the stationary point we have
$$h_j(\mathbf P,k)\equiv\frac{\partial Q_T}{\partial P_j}=-2\sum_{i=1}^{N}\left[Y_{exp}(t_i)-f^{(i)}\right]\frac{\partial f^{(i)}}{\partial P_j}=0,\qquad j=1,2,\dots,N_p, \qquad (39)$$
$$\frac{\partial Q_T}{\partial k}=-2\sum_{i=1}^{N}\left[Y_{exp}(t_i)-f^{(i)}\right]\frac{\partial f^{(i)}}{\partial k}=0. \qquad (40)$$
For the functions $h_j(\mathbf P,k)=0$ ($j=1,2,\dots,N_p$) to have a unique implicit function of k, denoted $\mathbf P(k)$, the IFT [3, Th. 13.7, p.374] requires that
$$\det\left[\frac{\partial h_i(\mathbf P,k)}{\partial P_j}\right]\neq 0 \qquad (41)$$
on an open set S, $k\in S$. More formally, expanding the determinant in (41) shows that the matrix $\partial h_i(\mathbf P,k)/\partial P_j=\partial^2Q_T/\partial P_j\partial P_i$ is symmetric, owing to the commutation of second order partial derivatives with respect to the $P_j$. Defining $Q_1(k)=Q_T(\mathbf P(k),k)$ (42) as a function of k only yields the total derivative with respect to k,
$$Q_1'(k)=\sum_{j=1}^{N_p}\frac{\partial Q_T}{\partial P_j}\frac{dP_j}{dk}+\frac{\partial Q_T}{\partial k}. \qquad (43)$$
Then $h_j(\mathbf P,k)=0$ by construction (39), so that
$$\frac{\partial Q_T}{\partial P_j}=0,\qquad j=1,2,\dots,N_p. \qquad (44)$$
Substituting (44) into (43), together with the condition $Q_1'(k)=0$, gives $\partial Q_T/\partial k=0$, which satisfies (40) for the free variation in k. Thus $Q_1'(k)=0\Rightarrow\delta Q_T=0$ for independent variation of $(\mathbf P,k)$.
So $Q_T$ fulfils the criteria of a stationary point at, say, $k=k_0$, since $\nabla_{\mathbf P,k}Q_T=0$ ([2, Prop. 16, p.112]). Suppose that $Q_T$ is strictly convex on D, a convex subset of its domain, and that $\mathbf P_T=\{\mathbf P_0,k_0\}\in D$ is a minimum point. Then at $\mathbf P_T$, $\nabla_{\mathbf P,k}Q_T=0$, and $\mathbf P_T$ is also the unique global minimum over D according to [1, Theorem 3.2, p.46]. Thus $\mathbf P_T$ is unique, whether it is derived from a free variation of $(\mathbf P,k)$ or via the k-dependent parameters $\mathbf P(k)$ with the $Q_1$ function.
Nb. As before, $Q_T$ and $Q_1$ may be replaced by sums over the index i, as for $Q_c$ in (28), to derive a physically more accurate fit.
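A minimal sketch of method (i)b, assuming the hypothetical model $f(\mathbf P,t,k)=k\,e^{-Pt}$ with $N_p=1$ and synthetic noiseless data. For each trial k, P(k) is obtained from $h(P,k)=\partial Q_T/\partial P=0$ (39) by Newton's method. The condition (41), here $\partial h/\partial P\neq 0$ (and in fact $>0$, for a minimum in P), is checked at every step. $Q_1(k)=Q_T(P(k),k)$ is then minimized over k alone. At the optimum, $\partial Q_T/\partial k$ in (40) should also vanish, as asserted above. The golden-section search, its bracket and the starting guess for P are illustrative choices.

```python
import math

# Synthetic noiseless data generated with k = 2.0 and P = 0.5 (illustrative only)
t = [0.25 * i for i in range(21)]
y = [2.0 * math.exp(-0.5 * ti) for ti in t]


def h_and_dh(P, k):
    """h = dQ_T/dP (eq. 39) and dh/dP = d2Q_T/dP2 for the hypothetical f = k*exp(-P*t)."""
    h = dh = 0.0
    for ti, yi in zip(t, y):
        u = math.exp(-P * ti)
        r = yi - k * u  # residual Y_exp(t_i) - f_i
        df = -k * ti * u  # df_i/dP
        d2f = k * ti * ti * u  # d2f_i/dP2
        h += -2.0 * r * df
        dh += 2.0 * (df * df - r * d2f)
    return h, dh


def P_of_k(k, P=0.5, tol=1e-13, max_iter=100):
    """Implicit function P(k) defined by h(P, k) = 0, solved by Newton's method."""
    for _ in range(max_iter):
        h, dh = h_and_dh(P, k)
        if dh <= 0.0:  # condition (41) violated, or the root is not a minimum in P
            raise ValueError("dh/dP <= 0 at P = %g, k = %g" % (P, k))
        step = h / dh
        P -= step
        if abs(step) < tol:
            return P
    raise RuntimeError("Newton iteration for P(k) did not converge")


def Q_T(P, k):
    """Eq. (38): LS metric with (P, k) varied independently."""
    return sum((yi - k * math.exp(-P * ti)) ** 2 for ti, yi in zip(t, y))


def Q1(k):
    """Eq. (42): Q_1(k) = Q_T(P(k), k)."""
    return Q_T(P_of_k(k), k)


def golden_section_min(fun, a, b, tol=1e-10):
    """Golden-section search for the minimum of a unimodal function on [a, b]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    fc, fd = fun(c), fun(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = fun(c)
        else:
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = fun(d)
    return 0.5 * (a + b)


k_opt = golden_section_min(Q1, 1.8, 2.2)
P_opt = P_of_k(k_opt)
# Eq. (40) at (P(k_opt), k_opt); for this model df/dk = exp(-P*t)
dQ_dk = sum(-2.0 * (yi - k_opt * math.exp(-P_opt * ti)) * math.exp(-P_opt * ti)
            for ti, yi in zip(t, y))
print(k_opt, P_opt, dQ_dk)
```

Here the 1-D optimum over k reproduces the free two-parameter minimum, and $\partial Q_T/\partial k$ vanishes there. This is the coincidence that method (i)b is constructed to guarantee.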

Method (ii) theory
Methods (i)a and (i)b, which are variants of each other, are applications of the implicit method to modeling problems. Here, another variant of the implicit methodology, for the optimization of a target or cost function $Q_E$, is presented. One can, for instance, consider $Q_E(\mathbf P,k)$ to be an energy function with coordinates $\mathbf R=\{\mathbf P,k\}$, where, as before, the components of $\mathbf P$ are $P_j$, $j=1,2,\dots,N_p$, and $k\in\mathbb R$ is another coordinate, so that $\mathbf R\in\mathbb R^{N_p+1}$. For bounded systems (such as molecular coordinates), one can write [...] (45) and define
$$o_j(\mathbf P,k)=\frac{\partial Q_E}{\partial P_j},\qquad j=1,2,\dots,N_p, \qquad (46)$$
$$\kappa(\mathbf P,k)=\frac{\partial Q_E}{\partial k}. \qquad (47)$$
Then the equilibrium conditions become
$$o_j(\mathbf P,k)=0, \qquad (48)$$
$$\kappa(\mathbf P,k)=0. \qquad (49)$$
Take (46) as the defining equations for $o_j(\mathbf P,k)$, specified by $Q_E$, which casts them in a form compatible with the IFT, where some further qualification is required for $(\mathbf P,k)$. Assume $Q_E$ is $C^2$ on D and $\det[\partial o_i/\partial P_j]\neq 0$ [...], so that $o_j(\mathbf P,k)=0$ defines $\mathbf P(k)$ implicitly, and let $Q_{E,1}(k)=Q_E(\mathbf P(k),k)$, whose total derivative is
$$Q'_{E,1}(k)=\sum_{j=1}^{N_p}o_j\frac{dP_j}{dk}+\kappa. \qquad (50)$$
Then denote by $k_i$ a solution of $Q'_{E,1}(k)=0$, where $k_i\in T_0$ lies in the range indicated in (45).
Proof. If $Q'_{E,1}(k)=0$ where $k\in T_0$, then it also follows from the IFT that $o_j=0$, and therefore $\partial Q_E/\partial k=0$ from (50), which satisfies (46) and (47) for the equilibrium point. Conditions (i) and (ii) of the theorem are requirements of the IFT. Conversely, if $o_j=0$ ($j=1,2,\dots,N_p$) and $\partial Q_E/\partial k=0$ (a stationary or equilibrium point), then by (50) $Q'_{E,1}(k)=0$. Hence the coordinates $\{k_i\}$ for which $Q'_{E,1}(k_i)=0$ correspond to the condition $\delta Q_E(\mathbf P,k_i)=0$, and uniqueness follows from the local uniqueness of the $\mathbf P(k)$ function given by the IFT.
Nb. In a bounded system, one can choose any of the $N_p$ components $P_j$ of $\mathbf P$ as the k coordinate, partly on the basis of the convenience of solving the implicit equations, and determine the minima $k_i$ and thus, by the uniqueness criterion, the coordinates of the minima in $\mathbb R^{N_p+1}$ space. Consider a non-degenerate coordinate choice, meaning that for the particular choice of k coordinate there do not exist two equilibrium structures (sets of coordinate values) A and B with $k_A=k_B$. For such structures, the total number of minima within the bounded range of the k coordinate equals the total number of minima of the target function $Q_E(\mathbf P,k)$ within the bounded range. Hence a method exists for the very challenging problem of locating and enumerating minima [9, Sec. 5.1, p.242, "How many stationary points are there"]. From the uniqueness part of the IFT, one can infer the points on the k axis where non-uniqueness obtains, i.e. wherever $\det[\partial^2Q_E/\partial P_i\partial P_j]=0$. In such cases, for particles with the same intermolecular potentials, permutation of the coordinates in conjunction with symmetry considerations could be of use in selecting an appropriate coordinate system to overcome these degeneracies [9, Sec. 4.2.5, p.205, "Appearance and disappearance of symmetry elements"]. Another method is to scan a different 1-D profile, for the $P_l$ coordinate, to locate minima when, for the same value $k_i$ of the chosen coordinate, there exist two structures with different values of the $P_l$ coordinate. Thus, by scanning through all or a selected number of the 1-D $P_j$ profiles of $Q_{E,1}$, it would be possible to assign the location of a minimum in $\mathbb R^{N_p+1}$ space. One is reminded of the methods that spectroscopists use in assigning different energy bands based on selection rules to uniquely characterize, for instance, vibrational frequencies.
A similar analogy holds for X-ray reflections, where the amplitude variation of the X-ray intensity in reciprocal space can be used to elucidate structure. The minima of the 1-D k coordinate scan must correspond to the minima in $\mathbb R^{N_p+1}$ space of the $Q_E$ function, given that all such minima of $Q_E$ are locally strict and global within a small open set about each minimum. By continuity, $Q_{E,1}(k)-Q_{E,1}(k_0)>0$ for $|k-k_0|<\delta$ and $|\mathbf P(k)-\mathbf P(k_0)|<\delta_2$, which violates the condition for a maximum.
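The 1-D scan described above can be sketched for a hypothetical two-coordinate target function $Q_E(P,k)=(P^3+P-k)^2+(k^2-1)^2$ ($N_p=1$). It is chosen only because its stationary points are easy to verify by hand. Since $o(P,k)=\partial Q_E/\partial P=2(P^3+P-k)(3P^2+1)$ and $3P^2+1>0$, the implicit function P(k) solves $P^3+P=k$, which is found here by Newton's method. Along P(k), $Q'_{E,1}(k)=\kappa(P(k),k)$ by (50), because o vanishes. Sign changes of $Q'_{E,1}$ on a grid, refined by bisection, locate and classify the stationary points. The grid spacing and offset are illustrative assumptions.

```python
import math


def P_of_k(k, tol=1e-14, max_iter=100):
    """P(k) from o(P, k) = 2(P^3 + P - k)(3P^2 + 1) = 0, i.e. from P^3 + P - k = 0."""
    P = 0.0
    for _ in range(max_iter):
        step = (P ** 3 + P - k) / (3.0 * P ** 2 + 1.0)  # Newton step for P^3 + P - k
        P -= step
        if abs(step) < tol:
            return P
    raise RuntimeError("Newton iteration for P(k) did not converge")


def kappa(P, k):
    """kappa(P, k) = dQ_E/dk (eq. 47) for Q_E = (P^3 + P - k)^2 + (k^2 - 1)^2."""
    return -2.0 * (P ** 3 + P - k) + 4.0 * k * (k * k - 1.0)


def dQE1_dk(k):
    """Eq. (50): Q'_E1(k) = o * dP/dk + kappa = kappa, because o = 0 along P(k)."""
    return kappa(P_of_k(k), k)


def bisect(fun, a, b, tol=1e-12):
    """Root of fun in [a, b], assuming fun(a) and fun(b) differ in sign."""
    fa = fun(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        fm = fun(m)
        if (fm < 0.0) == (fa < 0.0):
            a, fa = m, fm
        else:
            b = m
    return 0.5 * (a + b)


# Scan the bounded k range; the grid is offset so that no node falls on a root
ks = [-2.05 + 0.1 * i for i in range(42)]
minima, maxima = [], []
for a, b in zip(ks, ks[1:]):
    da, db = dQE1_dk(a), dQE1_dk(b)
    if da < 0.0 < db:  # Q_E1 decreasing then increasing: a minimum
        minima.append(bisect(dQE1_dk, a, b))
    elif da > 0.0 > db:  # increasing then decreasing: a maximum
        maxima.append(bisect(dQE1_dk, a, b))

for k in minima:
    print("minimum at k = %.6f, P = %.6f" % (k, P_of_k(k)))
print("maxima of Q_E1 at k =", maxima)
```

For this function the scan finds minima at k = ±1, where P = ±0.682328 (the real root of $P^3+P=1$ and its negative), and a maximum of $Q_{E,1}$ at k = 0. That maximum is a saddle point of $Q_E$: its Hessian at the origin, $\begin{pmatrix}2&-2\\-2&-2\end{pmatrix}$, is indefinite. Only the minima of the 1-D scan are claimed above to correspond to minima of $Q_E$.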

Applications in Chemical Kinetics
The utility of the above triad of methods is illustrated by the determination of two parameters in chemical reaction rate studies, of 1st and 2nd order respectively, using data from the published literature. Method (i)a yields values within experimental error of those quoted in the literature. The method can directly derive certain parameters, like the final concentration terms (e.g. $\lambda_\infty$ and $Y_\infty$), if k, the rate constant, is the single optimizing variable in this approximation. We assume here that the rate laws and rate constants are not slowly varying functions of the reactant or product concentrations, although simulation has recently shown that this is generally not the case [21]. Under this standard assumption, the rate equations below all hold. The first order reaction studied here is (i) the methanolysis of ionized phenyl salicylate, with data derived from the literature [22, Table 7.1, p.381], and the second order reaction analyzed is (ii) the reaction between plutonium(VI) and iron(II) according to the data in [23,

First order results
Reaction (i) above corresponds to [...], where the rate law is pseudo first-order, expressed as [...], with the concentration of methanol held constant (80% v/v), and where the physical and thermodynamic conditions of the reaction appear in [22, Table 7.1, p.381]. The change with time t of any material property λ(t), which in this case is the absorbance A(t) (i.e. A(t) ≡ λ(t)), is given for a first order reaction by
$$\lambda(t)=\lambda_\infty+(\lambda_0-\lambda_\infty)e^{-k_at}, \qquad (52)$$
where $\lambda_0$ is the measurable property value at time t = 0 and $\lambda_\infty$ is the value at t = ∞. $\lambda_\infty$ is usually treated as a parameter chosen to yield the best least squares fit, even if its optimized value is less (for monotonically increasing functions, with dλ/dt positive at all t) than an experimentally determined λ(t) at some time t. In Table 7.1 of [22], for instance, A(t = 2160 s) = 0.897 > $A_{opt,\infty}$ = 0.882, and this value of $A_\infty$ is used to derive the best estimate of the rate constant as (16.5 ± 0.1) × $10^{-3}$ s$^{-1}$. For this reaction, the $P_i$ of (4) refers to $\lambda_\infty$, so that $P\equiv\lambda_\infty$ with $N_p=1$, and $k\equiv k_a$. To determine the parameter $\lambda_\infty$ as a function of $k_a$ according to (8), based on the entire experimental dataset $\{(\lambda_{exp},t_i)\}$, we invert (52) and write
$$\lambda_\infty(k)=\frac{1}{N-N_s}\sum_{i}\frac{\lambda_{exp}(t_i)-\lambda_0e^{-kt_i}}{1-e^{-kt_i}}, \qquad (53)$$
where the summation is over all the values of the experimental dataset that do not lead to singularities, such as $t_i=0$, so that here $N_s=1$. We define the nonoptimized, continuously deformable theoretical curve $\lambda_{th}$, where $\lambda_{th}\equiv Y_{th}(t,k)$ in (5), as
$$\lambda_{th}(t,k)=\lambda_\infty(k)+\left(\lambda_0-\lambda_\infty(k)\right)e^{-kt}.$$
With this relationship of the $\lambda_\infty$ parameter P to k, we seek the least squares minimum of $Q_1(k)$, where $Q_1(k)\equiv Q$ of (8), for this first-order rate constant k, in the form
$$Q_1(k)=\sum_{i}\left[\lambda_{exp}(t_i)-\lambda_{th}(t_i,k)\right]^2,$$
where the summation is over all the experimental $(\lambda_{exp}(t_i),t_i)$ values. The resulting $P_k$ function (9) for the first order reaction based on the published dataset is given in Fig.(3). The solution for the rate constant k corresponds to the zero of this function, which exists for both orders. The P parameters ($\lambda_\infty$ and $Y_\infty$) are derived by back substitution into eqs. (53) and (58), respectively.
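The first-order construction can be sketched as follows. The data are synthetic and noiseless, not the published data of [22]. $\lambda_0$ is taken as the measured value at t = 0, the point excluded as singular. $\bar\lambda_\infty(k)$ follows (53), and $\lambda_{th}$ and $Q_1(k)$ follow the definitions above. Purely for illustration, $Q_1$ is minimized by a coarse grid followed by golden-section refinement; the text itself finds the zero of $P_k$ by Newton-Raphson.

```python
import math

# Synthetic, noiseless absorbance-like data (NOT the published data of [22]):
# lambda_0 = 0.2, lambda_inf = 0.9, k = 0.01 s^-1, sampled every 60 s
t = [60.0 * i for i in range(21)]
lam = [0.9 + (0.2 - 0.9) * math.exp(-0.01 * ti) for ti in t]
lam0 = lam[0]  # measured value at t = 0


def lam_inf_bar(k):
    """Eq. (53): mean of the single-point inversions of (52); t = 0 is singular (N_s = 1)."""
    vals = [(li - lam0 * math.exp(-k * ti)) / (1.0 - math.exp(-k * ti))
            for ti, li in zip(t, lam) if ti > 0.0]
    return sum(vals) / len(vals)


def Q1(k):
    """Q_1(k) = sum_i [lambda_exp(t_i) - lambda_th(t_i, k)]^2, a function of k only."""
    li = lam_inf_bar(k)
    return sum((lam_i - (li + (lam0 - li) * math.exp(-k * ti))) ** 2
               for ti, lam_i in zip(t, lam))


def golden_section_min(fun, a, b, tol=1e-13):
    """Golden-section search for the minimum of a unimodal function on [a, b]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    fc, fd = fun(c), fun(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = fun(c)
        else:
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = fun(d)
    return 0.5 * (a + b)


grid = [0.001 + 0.0005 * i for i in range(99)]  # 0.001 ... 0.05 s^-1
k_grid = min(grid, key=Q1)
k_opt = golden_section_min(Q1, k_grid - 0.0005, k_grid + 0.0005)
lam_inf_opt = lam_inf_bar(k_opt)  # back substitution into (53)
print(k_opt, lam_inf_opt)
```

Only k is searched. $\lambda_\infty$ is never an independent fitting parameter and follows from (53) at the optimum.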
The Newton-Raphson (NR) numerical procedure [4, p.456] was used to find the roots of $P_k$. For each dataset there exists a value of $\lambda_\infty$, so the error, expressed as a standard deviation, may be computed. The tolerance in accuracy for the NR procedure was 1.0 × $10^{-10}$. We define the function deviation $f_d$ as the standard deviation of the experimental results from the best fit curve, $f_d=$ [...]. Our results are as follows: $k_a$ = (1.62 ± 0.09) × $10^{-2}$ s$^{-1}$; $\lambda_\infty$ = 0.88665 ± 0.006; and $f_d$ = 3.697 × $10^{-3}$. The experimental estimates are: $k_a$ = (1.65 ± 0.01) × $10^{-2}$ s$^{-1}$; $\lambda_\infty$ = 0.882 ± 0.0; and $f_d$ = 8.563 × $10^{-3}$. The experimental method involves adjusting $A_\infty\equiv\lambda_\infty$ to minimize the $f_d$ function, and hence no estimate of the error in $A_\infty$ could be made. Our method has a lower $f_d$ value and is thus a better fit, and the parameter values can be considered to coincide with the experimental estimates within experimental error. Fig.(2) shows the close fit between the curve due to our optimization procedure and experiment. The slight variation between the two curves may well be due to experimental uncertainties.
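The NR step can be sketched on synthetic data that carry a small deterministic ±0.002 perturbation standing in for measurement error (hypothetical, not the data of [22]). Both $P_k=dQ_1/dk$ and its derivative are approximated by central differences, which is an assumption, since the text does not state how the derivatives were obtained. The NR tolerance is the 1.0 × $10^{-10}$ quoted above. The scatter of the single-point $\lambda_\infty$ values gives the standard deviation mentioned in the text. The exact normalization of $f_d$ is not recoverable here, so a root-mean-square residual is used as an assumed stand-in.

```python
import math
import statistics

# Synthetic data (NOT from [22]): lambda_0 = 0.2, lambda_inf = 0.9, k = 0.01 s^-1,
# plus a deterministic +/-0.002 perturbation standing in for measurement error
t = [60.0 * i for i in range(21)]
lam = [0.9 - 0.7 * math.exp(-0.01 * ti) + (0.002 if i % 2 else -0.002)
       for i, ti in enumerate(t)]
lam0 = lam[0]


def lam_inf_values(k):
    """Single-point inversions of (52), one per nonsingular datapoint (t > 0)."""
    return [(li - lam0 * math.exp(-k * ti)) / (1.0 - math.exp(-k * ti))
            for ti, li in zip(t, lam) if ti > 0.0]


def Q1(k):
    vals = lam_inf_values(k)
    li = sum(vals) / len(vals)  # eq. (53)
    return sum((lam_i - (li + (lam0 - li) * math.exp(-k * ti))) ** 2
               for ti, lam_i in zip(t, lam))


def R(k, h=1e-7):
    """The P_k function R(k) = dQ_1/dk, by central differences."""
    return (Q1(k + h) - Q1(k - h)) / (2.0 * h)


def newton_raphson(fun, x, tol=1e-10, h=1e-7, max_iter=50):
    """Newton-Raphson for fun(x) = 0 with a central-difference derivative."""
    for _ in range(max_iter):
        slope = (fun(x + h) - fun(x - h)) / (2.0 * h)
        step = fun(x) / slope
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")


k_opt = newton_raphson(R, 0.009)
vals = lam_inf_values(k_opt)
lam_inf_opt = sum(vals) / len(vals)
spread = statistics.stdev(vals)  # scatter of the per-point lambda_inf values
f_d = math.sqrt(Q1(k_opt) / len(t))  # RMS residual: an assumed form of f_d
print(k_opt, lam_inf_opt, spread, f_d)
```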

Second order results
To test the method further, we also analyze the second-order reaction studied by Newton et al. [23, eqns. (8,9), p.1429], using the data reported in [23]. According to Espenson, the equivalent form of the rate expression given in [24, p.25] cannot be used "because an experimental value of Y∞ was not reported." However, according to Espenson, if Y∞ is determined independently, the rate constant k may be determined. Thus central to all conventional methods is the autonomous and independent status of both k and Y∞. We avoid this requirement by defining Y∞ as a function of k and of the whole set of experimental t_i values: we invert the second-order rate expression and sum over all experimental points that do not lead to singularities, such as t_i = 0 (eq. (58)). Here the P parameter is Y∞(k) = P_1(k), and k_b ≡ k is the varying parameter k of (4). We likewise define a function Y_th(t, k) of t and k, in which k is interpreted as a "distortion" parameter of the curve. To extract the parameters k and Y∞, we minimize with respect to k the least squares function Q_2(k) for this second-order rate constant,

$$Q_2(k) = \sum_i \left[ Y_{\mathrm{exp}}(t_i) - Y_{\mathrm{th}}(t_i, k) \right]^2,$$

where the summation is over the experimental t_i coordinates. The solution of the minimization problem is the zero of the corresponding P_k function (9); the NR method was used to solve P_k = 0 with an error tolerance of 1.0 × 10^−10. In the same notation as the first-order case, the second-order results again agree closely with the experimental estimates. The experimental curve and the curve obtained from our optimization method are compared in Fig.(4).
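The same construction carries over to the second-order case. Because the text does not reproduce the rate expression of [23], this sketch assumes the standard second-order form for equal initial reactant concentrations, Y(t) = Y∞ + (Y0 − Y∞)/(1 + κt), with κ = k[A]0 lumping the rate constant and an initial concentration; Newton et al.'s expression may differ. The dataset and the names `y_inf_of_k`, `q2`, `KAPPA_TRUE`, etc. are hypothetical.

```python
# Hypothetical noiseless second-order dataset with assumed parameters
# (kappa = k * [A]0 is an assumed lumped constant; NOT the data of [23]).
KAPPA_TRUE, Y0, Y_INF_TRUE = 0.05, 1.0, 0.2
T = [0.0, 5.0, 10.0, 20.0, 40.0, 80.0, 160.0]
Y_EXP = [Y_INF_TRUE + (Y0 - Y_INF_TRUE) / (1.0 + KAPPA_TRUE * t) for t in T]


def y_inf_of_k(kappa, t, y_exp, y0):
    """Y_inf as a function of kappa alone.

    Solves Y(t)(1 + kappa t) = Y_inf kappa t + Y0 for Y_inf at each point
    with t_i != 0 (t_i = 0 gives 0/0 and is skipped), then averages.
    """
    estimates = [
        (yi * (1.0 + kappa * ti) - y0) / (kappa * ti)
        for ti, yi in zip(t, y_exp)
        if ti != 0.0
    ]
    return sum(estimates) / len(estimates)


def q2(kappa, t, y_exp, y0):
    """Least-squares metric Q2(kappa) with Y_inf slaved to kappa."""
    y_inf = y_inf_of_k(kappa, t, y_exp, y0)
    return sum(
        (yi - (y_inf + (y0 - y_inf) / (1.0 + kappa * ti))) ** 2
        for ti, yi in zip(t, y_exp)
    )
```

As in the first-order sketch, Q2 is a function of the single variable κ, so the root search on P_k can be applied to it unchanged.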

Conclusions
The triad of associated implicit-function optimization methods covers both the modeling of data and the optimization of arbitrary functions in cases where experimental or theoretical considerations require that a single variable be tagged to a process variable that relaxes iteratively to equilibrium. Applying method (i)a to chemical kinetics allows the direct determination of parameters that the standard methods cannot provide. The results presented here show that for linked variables it is possible to derive all the parameters associated with a curve from only one independent variable, which serves as the independent variable of the other functions in an optimization that uses the experimental dataset as function values. Apart from possibly reducing computational errors, this may also be a more accurate way of deriving parameters that are more influenced or conditioned (on physical grounds) by the value of one parameter (such as k here) than by the others; current methods, which give equal weight to all the variables, might in some cases yield results that would be considered "unphysical". In complex dynamical systems with multiple processes, physical considerations may make it advantageous to conduct the optimization on just one primary coordinate variable, for example when attempting to derive the most general stable conformer of a large molecule, where thousands of local minima are present if all free coordinate variables are considered [9, Sec.6.7, p.330]. The resulting generalized potential surface might be suitable for reaction trajectory calculations that require a single path variable [9, Ch.4, p.192 on "Features of a landscape"], where the general optimized conformer would be relevant to the study of the potential surfaces and force fields present.

Acknowledgments
This work was supported by University of Malaya Grant UMRG(RG077/09AFR) and Malaysian Government grant FRGS(FP084/2010A). Cordial discussions concerning real world applications with Gareth Tribello (ASC) are acknowledged. I thank Jorge Kohanoff (ASC) and Christopher Hardacre (Chem. Dept., QUB) for congenial hospitality during this sabbatical.