ADIFOR -- Generating Derivative Codes from Fortran Programs

The numerical methods employed in the solution of many scientific computing problems require the computation of derivatives of a function f $R^N$→$R^m$. Both the accuracy and the computational requirements of the derivative computation are usually of critical importance for the robustness and speed of the numerical solution. Automatic Differentiation of FORtran (ADIFOR) is a source transformation tool that accepts Fortran 77 code for the computation of a function and writes portable Fortran 77 code for the computation of the derivatives. In contrast to previous approaches, ADIFOR views automatic differentiation as a source transformation problem. ADIFOR employs the data analysis capabilities of the ParaScope Parallel Programming Environment, which enable us to handle arbitrary Fortran 77 codes and to exploit the computational context in the computation of derivatives. Experimental results show that ADIFOR can handle real-life codes and that ADIFOR-generated codes are competitive with divided-difference approximations of derivatives. In addition, studies suggest that the source transformation approach to automatic differentiation may improve the time to compute derivatives by orders of magnitude.


Introduction
The methods employed for the solution of many scienti c computing problems require the evaluation of derivatives of some function.Probably best known are gradient methods for optimization 22], Newton's method for the solution of nonlinear systems 15, 2 2 ], and the numerical solution of sti ordinary di erential equations 8,21].Other examples can be found in 17].In the context of optimization, for example, given a function f : R n !R one can nd a minimizer x of f using variable metric methods that involve the iteration for i = 1 , 2 , . . . .do Solve B i s i = ;rf(x i ) x i+1 = x i + i s i end for for suitable step multipliers i > 0.Here rf(x) = 0 B @ @ @x1 f (x) . . .@ @xn f (x) is the gradient of f at a particular point x 0 , and B i is a positive de nite matrix that may c hange from iteration to iteration.
In the context of nding the root of a nonlinear function Newton's method requires the computation of the Jacobian matrix f 0 (x) = 0 B @ @ @x1 f 1 (x) @ @xn f 1 (x) . . . . . .@ @x1 f n (x) @ @xn f n (x) 1 C A: (2) Then, we execute the following iteration: for i = 1 , 2 , . . . .do Solve f 0 (x i )s i = ;f(x i ) x i+1 = x i + s i end for Another important application is the numerical solution of initial value problems in sti ordinary di erential equations.Methods such as implicit Runge-Kutta 9] and backward di erentiation formula (BDF) 35] methods require a Jacobian which is either provided by the user or approximated by divided di erences.Consider a system of ODEs y 0 = f (t y) y(t 0 ) = y 0 : (3) System ( 3) is called sti if its Jacobian J = @f @y (in a neighborhood of the solution) has eigenvalues i with Re( i ) << 0 in addition to eigenvalues of moderate size.Equations of this type arise frequently in chemical reaction models, for example.They can be of arbitrarily large dimension, because they also arise as discretizations of partial di erential equations, where J is large and sparse.If explicit methods (multistep, Runge-Kutta, Taylor, or extrapolation) are applied, the step size must be very small in order to retain desirable stability properties of the method.That is why authors as early as the 1920's 19] and again in the late 1940's 20] w ere led to consider implicit methods in which the approximate solution y i+1 at t = t i+1 is given by the solution to some nonlinear system (y i+1 ) = 0 which is solved by a Newton-type iteration requiring the Jacobian @ @y .The exact form of di ers from one implicit method to another, but for many methods, (y i+1 ) = y i+1 ; hf (t i+1 y i+1 ) + terms not involving y i+1 so the user is asked to supply the Jacobian J = @f @y .These methods are examples of a large class of methods for numerical computation, where the computation of derivatives is a crucial ingredient i n t h e n umerical solution process.The function f is not usually represented in closed form, but in the form of a computer program.
For purposes of illustration, we assume that f : x 2 R n 7 !y 2 R and that we wish to compute the derivatives of y with respect to x. W e c a l l x the independent variable and y the dependent variable.While the terms \dependent", \independent", and \variable" are used in many di erent c o n texts, this terminology corresponds to the mathematical use of derivatives.There are four approaches to computing derivatives (these issues are discussed in more detail in 29]): By Hand: Hand coding is increasingly di cult and error-prone, especially as the problem complexity increases.
Divided Di erences: The derivative o f f with respect to the ith component o f x at a particular point x 0 is approximated by either one-sided di erences @ f (x) @ x i x=x0 f (x 0 h e i ) ; f (x 0 ) h or central di erences @ f (x) @ x i x=x0 f (x 0 + h e i ) ; f (x 0 ; h e i ) 2h : Here e i is the ith Cartesian basis vector.Computing derivatives by divided di erences has the advantage that we need only the function as a \black b o x".The main drawback of divided di erences is that their accuracy is hard to assess.A small step size h is needed for properly approximating derivatives, yet may lead to numerical cancellation and the loss of many digits of accuracy.In addition, di erent scales of the x i 's may require di erent step sizes for the various independent v ariables.
Symbolic Di erentiation: This functionality i s p r o vided by symbolic manipulation packages such a s Maple, Reduce, Macsyma, or Mathematica.Given a string describing the de nition of a function, symbolic manipulation packages provide exact derivatives, expressing the derivatives all in terms of the intermediate variables.For example, if This is correct, yet it does not represent a v ery e cient w ay to compute the derivatives, since there are a lot of common subexpressions in the di erent derivative expressions.Symbolic di erentiation is a p o werful technique, but it may not derive good computational recipes, and it may r u n i n to resource limitations when the function description is complicated.Functions involving branches or loops cannot be readily handled by s y m bolic di erentiation.
Automatic Di erentiation: Automatic di erentiation techniques rely on the fact that every function, no matter how complicated, is executed on a computer as a (potentially very long) sequence of elementary operations such as additions, multiplications, and elementary functions such as sin and cos.By applying the chain rule over and over again to the composition of those elementary operations, one can compute derivative information of f exactly and in a completely mechanical fashion.ADIFOR transforms Fortran 77 programs using this approach.For example, if we h a ve a program for computing f = Q 5 ADIFOR produces a program whose computational section is shown in Figure 1.
The $ sign is used to emphasize ADIFOR-generated variables.To i m p r o ve readability, w e deleted continuation line characters.If the variable x is initialized to the desired value x 0 , g$p$ to 5, and the array g$x to the 5 5 i d e n tity matrix, then on exit the vector g$f contains @ f (x) @ x j x=x0 .No redundant subexpressions are computed here, since the overall product is computed in a binary-tree like fashion, and the proper pieces of the product are reused in the derivative computation.The rationale underlying this code will be given in Section 2.
Symbolic di erentiation uses the rules of calculus in a more or less mechanical way, although some e ciency can be recouped by b a c k-end optimization techniques 13, 2 8 ].In contrast, automatic di erentiation is intimately related to the program for the computation of the function to be di erentiated.By applying the chain-rule step by step to the elementary operations executed in the course of computing the \function", automatic di erentiation computes exact derivatives (up to machine precision, of course) and avoids the potential pitfalls of divided di erences.The techniques of automatic di erentiation are directly applicable to functions with branches and loops.
ADIFOR is a tool to provide automatic di erentiation for programs written in Fortran 77.Given a Fortran subroutine (or collection of subroutines) for a function f , ADIFOR produces Fortran 77 subroutines for the computation of the derivatives of this function.ADIFOR di ers from other approaches to automatic di erentiation (see 39] for a survey) by being based on a source translator paradigm and by h a ving been designed from the outset with large-scale codes in mind.ADIFOR provides several advantages: Portability: ADIFOR produces vanilla Fortran 77 code.ADIFOR-generated derivative code does not require any run-time support and can easily be ported between di erent computing environments.
Generality: ADIFOR supports almost all of Fortran 77, including arbitrary calling sequences, nested subroutines, common blocks, and equivalences.Fortran 77 functions and statement functions will be supported in the next version of ADIFOR.We d o n o t a n ticipate support for i/o, alternate returns for subroutines, or entry statements.E ciency: ADIFOR-generated derivative code is competitive with codes that compute the derivatives by divided di erences.In most applications we h a ve run, the ADIFOR-generated code is faster than the divided-di erence code.
Preservation of Software Development E ort: The code produced by ADIFOR respects the data ow structure of the original program.That is, if the user invested the e ort to develop code that vectorizes and parallelizes well, then the ADIFOR-generated derivative c o d e a l s o v ectorizes and parallelizes well.In fact, the derivative code o ers more scope for vectorization and parallelization.
Extensability: ADIFOR employs a consistent subroutine-naming scheme that allows the user to supply his own derivative routines.In this fashion, the user can exploit domain-speci c knowledge, exploit vendor-supplied libraries, and reduce computational bottlenecks.
Ease of Use: ADIFOR requires the user to supply the Fortran source code for the subroutine representing the function to be di erentiated and for all lower-level subroutines.The user then selects the variables (in either parameter lists or common blocks) that correspond to the independent and dependent variables.ADIFOR then determines which other variables throughout the program require derivative information.
Intuitive I n terface: An X-windows interface for ADIFOR (called \xadifor") makes it easy for the user to set up the ASCII script le that ADIFOR reads.This functional division makes it easy both to set up the problem and to rerun ADIFOR if changes in the code for the target function require a new translation.Using ADIFOR, one then need not worry about the accurate and e cient computation of derivatives, even for complicated \functions".As a result, the computational scientist can concentrate on the more important issues of algorithm design or system modeling.
In the next section, we shall give a brief introduction to automatic di erentiation.Section 3 describes how ADIFOR provides this functionality in the context of a source transformation environment, and gives the rationale for choosing such an approach.Section 4 gives a brief introduction into the use of ADIFOR-generated derivative codes, including the exploitation of sparsity structure in the derivative matrices.In Section 5, we present some experimental results which show that the run-time required for ADIFOR-generated exact derivative codes compares very favorably with divided-di erence derivative approximations.Lastly, w e outline ongoing work and present evidence that the source-transformation approach to automatic di erentiation may reduce the time to compute derivatives by orders of magnitudes.

Automatic Di erentiation
We illustrate automatic di erentiation with an example.Assume that we h a ve the sample program shown in Figure 2 for the computation of a function f : R 2 !R 2 .Here, the vector x contains the independent variables, and the vector y contains the dependent v ariables.The function described by this program is de ned except at x(2) = 0 and is di erentiable except at x(1) = 2 .
Figure 2. Sample program for a function f : x 7 !y By associating a derivative o b j e c t rt with every variable t, w e can transform this program into one for computing derivatives.Assume that rt contains the derivatives of t with respect to the independent variables x, rt = @ t @ x(1) @ t @ x(2) !: We can propagate those derivatives by using elementary di erentiation arithmetic based on the chain rule (see 45] for more details).For example, the statement a = x(1) + x(2) implies ra = rx(1) + rx(2): The chain rule, applied to the statement y(1) = a=x(2) implies that ry(1) = @ y(1) @ a r a + @ y(1) @ x(2) r x(2) = 1:0=x(2) r a + ( ;a=(x(2) x(2))) r x(2): Care has to be taken when the same variable appears on both the left-and the right-hand sides of an assignment statement.For example, the statement a = a x(i) implies ra = x(i) r a + a r x(i): However, simply combining these two statements leads to the wrong results, since the value of a referred to in the right-hand side of the ra assignment is the value of a before the assignment a = a*x(i) has been executed.We a void this di culty in the ADIFOR-generated code by using a temporary variable as shown in Figure 3.
Elementary functions are easy to deal with.For example, the statement Straightforward application of the chain rule in this fashion then leads to the pseudo-code shown in Figure 3 for computing the derivatives of y(1) and y(2).The situation gets more complicated when the source statement is not just a binary operation.For example, consider the statement w = ;y=(z z z) where y and z depend on the independent v ariables.We h a ve already computed ry and rz and now wish to compute rw.By breaking up this compound statement i n to unary and binary statements as shown in Figure 4, Figure 4. Expansion of w = -y / ( z * z * z ) in unary and binary operations we could simply apply the mechanism that was used in Figure 3 and associate a derivative computation with each binary or unary statement (the resulting pseudo-code is shown in the left half of Figure 6).
There is another way, though.The chain rule tells us that rw = @ w @ y r y + @ w @ z r z: Hence, if we know the \local" derivatives ( @ w @ y @ w @ z ) o f w with respect to z and y, w e can easily compute rw, the derivatives of w with respect to x.The \local" derivatives ( @ w @ y @ w @ z ) can be computed e ciently by using the reverse mode of automatic di erentiation.Here we maintain the derivative of the nal result with respect to an intermediate quantity.These quantities are usually called adjoints.They measure the sensitivity of the nal result with respect to some intermediate quantity.This approach is closely related to the adjoint sensitivity analysis for di erential equations that has been used at least since the late sixties, especially in nuclear engineering 10, 1 1 ], in weather forecasting 41], and even in neural networks 50].
In the reverse mode, let tbar denote the adjoint object corresponding to t.The goal is for tbar to contain the derivative @ w @ t .W e k n o w that wbar = @ w @ w = 1 :0.We can compute ybar and zbar by applying the following simple rule to the statements executed in computing w, but in reverse order: The forward mode code in Figure 6 requires that space be allocated for three auxiliary gradient objects, and the code contains four gradient computation loops.In contrast, the reverse mode code requires only ve scalar auxiliary derivative objects and has only one gradient loop.In either case, the storage required by automatic di erentiation is at most the amount of storage required by the original function evaluation times the length of the gradient objects computed.
Figures 5 and 6 illustrate a very simple example of using the reverse mode.The reverse mode requires fewer operations if the numb e r o f i n d e p e n d e n t v ariables is larger than the number of dependent v ariables.This is exactly the case for computing a gradient, which can be viewed as a Jacobian matrix with only one row.This issue is discussed in more detail in 29, 3 1 , 33].
Despite the advantages of the reverse mode with regard to complexity, the implementation of the reverse mode for the general case is quite complicated.It requires the ability to access in reverse order the instructions performed for the computation of f and the values of their operands and results.Current tools (see 39]) achieve this by storing a record of every computation performed.Then an interpreter performs a backward pass on this \tape."The resulting overhead often annihilates the complexity advantage of the reverse mode in an actual implementation (see 23, 2 4 ]).
ADIFOR uses a hybrid approach.It is generally based on the forward mode, but uses the reverse mode to compute the gradients of assignment statements, since for this restricted case the reverse mode can easily be implemented by a source-to-source translation.We also note that even though we s h o wed the computation only of rst derivatives, the automatic di erentiation approach can easily be generalized to the computation of univariate Taylor series or multivariate higher-order derivatives 14, 45,30].
The derivatives computed by automatic di erentiation are highly accurate, unlike those computed by divided di erences.Griewank and Reese 34] s h o wed that the derivative objects computed in the presence of round-o corresponds to the exact result of a nonlinear system whose partial derivatives have been perturbed by factors of at most (1 + " i j ) 2 , where j" i j j ", the relative m a c hine precision.

ADIFOR Design Philosophy
The examples in the preceding section have s h o wn that the principles underlying automatic di erentiation are not complicated: We just associate extra computations (which are entirely speci ed on a statement-bystatement basis) with the statements executed in the original code.As a result, a variety of implementations of automatic di erentiation have been developed over the years (see 39] for a survey).
Most of these implementations implement automatic di erentiation by means of operator overloading, which is a language feature in C++, Ada, Pascal-XSC, and Fortran 90 18].Operator overloading provides the possibility of associating side-e ects with arithmetic operations.For example, with an addition \+" we now could associate the addition of the derivative v ectors that is required in the forward mode.Operator overloading also allows for a simple implementation of the reverse mode, since as a by-product of the computation of f we can store a record of every computation performed and then have a n i n terpreter perform a backward pass on this \tape."The only drawback is that for straightforward implementations, the length of the tape is proportional to the number of arithmetic operations performed 33,5 ].Recently, Griewank 31] suggested an approach t o o vercome this limitation through clever checkpointing.
Nonetheless, for all their simplicity and elegance, operator overloading approaches present t wo fundamental drawbacks: Loss of context: Since all computation is performed as a by-product of an elementary operation, it is very di cult, if not impossible, to perform optimizations that transcend one elementary operation (such a s the constant-folding techniques that simpli ed the reverse mode shown in Figure 5 into that shown in Figure 6).The resulting disadvantages, especially those associated with the exploitation of parallelism, are discussed in 2].
Loss of E ciency: The overwhelming majority of codes for which computational scientists want derivatives are written in Fortran, which does not support operator overloading.While we can emulate operator overloading by associating a subroutine call with each e l e m e n tary operation, this approach slows computation considerably, and usually also imposes some restrictions on the syntactic structure of the code that can be processed.Examples of this approach are DAPRE 43, 4 9 ], GRESS/ADGEN 37, 3 8 ], and JAKEF 36].Experiments with some of those systems are described in 48].The lack of e ciency of previously existing tools has prevented automatic di erentiation from becoming a standard tool for mainstream high-performance computing, even though there are numerous applications where the need for accurate rst-and higher-order derivatives essentially mandated the use of automatic differentiation techniques and prompted the development of custom-tailored automatic di erentiation systems (see 32]).For the majority of applications, however, automatic di erentiation techniques were substantially slower than divided-di erence approximations, discouraging potential users.
The issues of ease of use and portability h a ve received scant attention in software for automatic di erentiation as well.In many applications, the \function" of which w e wish to compute derivatives is a collection of subroutines, and all that really should be expected of the user is to specify which o f t h e v ariables correspond to the independent and dependent v ariables.In addition, the automatic di erentiation code should be easily transportable between di erent machines.
ADIFOR takes those requirements into account.Its user interface is simple, and the ADIFOR-generated code is e cient and portable.Unlike previous approaches, ADIFOR can deliver this functionality because it views automatic di erentiation from the outset as a source transformation problem.The goal is to automate and optimize the source translation process that was shown in very simple examples of the preceding section.By taking a source translator view, we can bring the many man-years of e ort of the compiler community to bear on this problem.
ADIFOR is based on the ParaScope programming environment 1 2 ] w h i c h c o m bines dependence analysis with interprocedural analysis to support the semi-automatic parallelization of Fortran programs.While our primary goal is not the parallelization of Fortran programs, the ParaScope environment provides us with a Fortran parser, data abstractions for representing Fortran programs, and tools for constructing and manipulating those representations.In particular, ParaScope tools gather data ow facts for scalars and arrays, dependence graphs for array elements, control ow graphs, and constant and symbolic facts.The data dependence analysis capabilities are critical for determining which v ariables need to have derivative objects associated with them, a process we call variable nomination.Only those variables z whose values depend on an independent v ariable x and in uence a dependent v ariable y need to have derivative information associated with them.Such a v ariable is called active.Variables that do not require derivative information are called passive.Interprocedurally, v ariable nomination proceeds in a series of passes over the program call graph by using an \interaction matrix" for each subroutine.Such a matrix represents a bipartite graph.Input parameters or variables in common blocks are connected with output parameters or variables in common blocks whose values they in uence.This dependency analysis is also crucial in determining the sets of active/passive v ariable binding contexts in which e a c h subroutine may b e i n voked.For example, consider the following code for computing y = 3 .0 * x * x : subroutine threexx(x,y) call prod(3.0,x,t)call prod(t,x,y) end subroutine prod(x,y,z) z = x * y end In the rst call to prod, only the second and third of prod's parameters are active, whereas in the second call, all variables are active.ADIFOR recognizes this situation and performs procedure cloning to generate di erent augmented versions of prod for these di erent c o n texts.The decision to do cloning based on active/passive v ariable context will eventually be based on an assessment o f t h e s a vings made possible by i n troducing the cloned procedures, in accordance with the goal-directed interprocedural transformation approach being adopted within ParaScope 7].
Another advantage of a compiler-based approach i s t h a t w e h a ve the mechanism in place for simplifying the derivative code that has been generated by application of the simple statement-by-statement rules.For example, consider the reverse mode code shown in Figure 5.By applying constant folding and eliminating variables that are used only once, we eliminate multiplications by 1.0 and additions to 0, and we reduce the numb e r o f v ariables that must be allocated.
In summary, A D I F OR proceeds as follows: 1.The user speci es the subroutine that corresponds to the \function" for which h e w i s h e s d e r i v atives, as well as the variable names that correspond to dependent and independent v ariables.These names can be subroutine parameters or variables in common blocks.In addition to the source code for the function subroutine, the user must submit the source code for all subroutines that are directly or indirectly called from this subroutine.2. ADIFOR parses the code, builds the call graph, collects intra-and interprocedural dependency information, and determines active v ariables.3. Derivative objects are allocated in a straightforward fashion: Derivative objects for parameters are again parameters derivative objects for variables in common blocks and local variables are again allocated in common blocks and as local variables, respectively.4. The original source code is augmented with derivative statements | the reverse mode is used for assignment statements, the forward mode overall.Subroutine calls are rewritten to propagate derivative information, and procedure cloning is performed as needed.5.The augmented code is optimized, eliminating unnecessary arithmetic operations and temporary variables.The resulting code generated by A D I F OR can be called by users' programs in a exible manner to be used in conjunction with standard software tools for optimization, solving nonlinear equations, or for sti ordinary di erential equations.A discussion of calling the ADIFOR-generated code from users' programs is included in 4]. 4 The Functionality of ADIFOR-Generated Derivative Codes The functionality provided by A D I F OR is best understood through an example.Our example is adapted from problem C2 in the STDTST set of test problems for sti ODE solvers 25].The routine FCN2 shown in Figure 7 that computes the right-hand side of a system of ordinary di erential equations y 0 = f (x y) by calling a subordinate routine FCN.In the numerical solution of the ordinary di erential equation, the Jacobian @ f @ y is required.Original code for problem C2 Nominating Y as independent v ariables, and YP as dependent ones, ADIFOR produces the code shown in Figures 8 and 9.We use the dollar sign $ to indicate ADIFOR-generated names.In practice, ADIFOR generates variable names which do not con ict with any names appearing in the original program.We see that the derivative codes have a gradient object associated with every dependent v ariable.Our convention is to associate a gradient g$<var> of leading dimension ldg$<var> with variable <var>.T h e calling sequence of g$foo$n is derived from that of foo by inserting an argument g$p$ denoting the length of the gradient v ectors as the rst argument, and then copying the calling sequence of foo, inserting g$<var> and ldg$<var> after every active v ariable <var>.P assive v ariables or those not of REAL or DOUBLE PRECISION type are left untouched.Subroutine g$fcn2$6 relates to the Jacobian J yp = 0 B B @ @yp1 @y1 @yp1 @ym . . . . . .@ypm @y1 @ypm @ym 1 C C A as follows: Given input values for g$p$, m, x, y, g$y, ldg$y, and ldg$yp, the routine g$fcn2$6 computes yp and g$yp, where g$yp(1:g$p$,1:m) = (J yp (g$y(1:g$p$,1:m) T )) T The superscript T denotes matrix transposition.While the implicit transposition may s e e m a wkward at rst, this is the only way to handle assumed-size arrays (like real a(*)) in subroutine calls.It is the responsibility of the user to allocate g$yp and g$y with leading dimensions ldg$yp and ldg$y that are at least g$p$.
For example, to compute the Jacobian of yp with respect to y, w e initialize g$y to be an m m identity matrix and set p to m.After the call to g$fcn2$6, g$yp contains the transpose of the Jacobian of yp with respect to y.I f w e wish to compute only a matrix-vector product (as is often the case when iterative s c hemes are applied to solve equation systems with the Jacobian as the coe cient matrix), we s e t p = 1 a n d g$y to the vector by which the Jacobian is to be multiplied.
From the forementioned discussion, ADIFOR-generated code is well suited for computing dense Jacobian matrices.We will now show that it can also exploit the sparsity structure of Jacobian matrices.Remember that the forward mode of automatic di erentiation upon which ADIFOR is mainly based requires roughly g$p$ operations for every assignment statement in the original function.Thus, if we compute a Jacobian J with n columns by s e t t i n g g$p$ = n, its computation will require roughly n times as many operations as the original function evaluation, independent of whether J is dense or sparse.However, it is well known 16,27] that the number of function evaluations that are required to compute an approximation to the Jacobian by divided di erences can be much less than n if J is sparse.The same idea can be applied to greatly reduce the running time of ADIFOR-generated derivative c o d e a s w ell.
As an example, consider the swirling ow problem, which comes from Parter 42] and is part of the MINPACK{2 test problem collection 1].The problem is a coupled system of boundary value problems describing the steady ow of a viscous, incompressible, axisymmetric uid between two rotating, in nite coaxial disks.The numb e r o f v ariables in the resulting optimization problem depends on the discretization.For example, for n = 56 the Jacobian of F has the structure shown in Figure 10.By using a graph coloring algorithm designed to identify structurally orthogonal columns (we used the one described in 16]), we can determine that this Jacobian can be grouped into 14 sets of structurally orthogonal columns, independent of the size of the problem.As a result, we initialize a 56 14 matrix g$x T to the structure shown in Figure 11.Here every circle denotes the value 1.0.The structure of the resulting compressed Jacobian g$Fval T is shown in Figure 11 as well.Here every circle denotes a nonzero entry.N o w, instead of g$p$ = 56, a size of g$p$ = 14 is su cient, a sizeable reduction in cost.The proper and e cient initialization of ADIFOR-generated derivative codes is described in detail in 4].One issue that deserves some attention is that of error handling.Exceptional conditions arise because of branches in the code or because subexpressions may be de ned but not be di erentiable ( p (x) a t x = 0 , for example).ADIFOR knows when Fortran intrinsics are nondi erentiable, and traps to an error handler if we wish to compute derivatives at a point where the derivatives do not exist.The current error-handling mechanism of ADIFOR is described in 3].

Experimental Results
In this section, we report on the execution time of ADIFOR-generated derivative codes in comparison with divided-di erence approximations of rst derivatives.While the ADIFOR system runs on a SPARC platform, the ADIFOR-generated derivative codes are portable and can run on any computer that has a Fortran-77 compiler.
The problems named \camera", \micro", \heart", \polymer", \psycho", and \sand" were given to us by Janet Rogers, National Institute of Standards and Technology in Boulder, Colorado.The code submitted to ADIFOR computes elementary Jacobian matrices which are then assembled to a large sparse Jacobian matrix used in an orthogonal-distance regression t 6].The code named \shock" was given to us by Greg Shubin, Boeing Computer Services, Seattle, Washington.This code implements the steady shock tracking method for the axisymmetric blunt body problem 46].The Jacobian has a banded structure.The compressed Jacobian has 28 columns, compared to 190 for the \normal" Jacobian.The code named \adiabatic" is from Larry Biegler, Chemical Engineering, Carnegie-Mellon University and implements adiabatic ow, a common module in chemical engineering 47].Lastly, the code named \reactor" was given to us by Hussein Khalil, Reactor Analysis and Safety Division, Argonne National Laboratory.While the other codes were used in an optimization setting, the derivatives of the \reactor" code are used for sensitivity analysis to ensure that the model is robust with respect to certain key parameters.
Tables 1 and 2 summarize the performance of ADIFOR-generated derivative codes with respect to divided di erences.These tests were run on a SPARCstation 1, a SPARC 4/400, or an IBM RS6000/550.We u s e d di erent m a c hines because the codes were submitted from di erent computing environments.The numbers  1 are for 10,000 evaluations of the Jacobian, while those in Table 2 are for a single evaluation of the Jacobian.The column of the tables labeled \ADIFOR Improvement" indicates the percentage improvement o f t h e running time of the ADIFOR-generated derivative c o d e o ver an approximation of the divided-di erence running times.For the \shock" code, we had a derivative code based on sparse divided di erences supplied to us.In the other cases, we estimated the time for divided di erences by m ultiplying the time for one function evaluation by t h e n umber of independent v ariables.This approach is conservative, yet fairly typical in an optimization setting, where the function value already has been computed for other purposes.An improvement greater than 0% indicates that the ADIFOR-generated derivatives ran faster than divided di erences.
The percentage improvement for the \camera" problem indicates a stronger-than-expected dependence of running times of ADIFOR-generated code on the choice of compiler and architecture.In fact, the 69% degradation in performance on the \camera" problem is a result of the SPARC compiler's missing an opportunity to move loop-invariant cos and sin invocations outside of loops, as occurs in the following ADIFOR-generated code: C cteta = cos(par(4)) d$0 = par(4) do 99969 g$i$ = 1, g$p$ g$cteta(g$i$) = -sin(d$0) * g$par(g$i$, 4) 99969 continue cteta = cos(d$0) If we edit the ADIFOR-generated code by hand to extract the invariant expression, we get a similar performance on the SPARC.Moving loop-invariant code outside of loops is one of the performance improvements that we will implement in later versions.We see that already in its current v ersion, ADIFOR performs well in competition with divided di erence approximations.It is up to a factor of three faster, and never worse by more than a factor of 1.69.This improvement w as obtained without the user having to make a n y modi cations to the code.We also see that ADIFOR can handle problems where symbolic techniques would be almost certain to fail, such a s t h e \shock" or \reactor" codes.The ADIFOR-generated derivative codes had up to four times as many lines of code as the code that was submitted to ADIFOR.
The performance of ADIFOR-generated derivatives can even be better than that of hand-coded derivatives.For example, for the swirling ow problem mentioned in the preceding section, we obtain the performance shown Figure 12.  Figure 12 shows the performance of the hand-derived derivative code supplied as part of the MINPACK-2 test set collection 40], and that the ADIFOR-generated code, properly initialized to exploit the sparsity structure of the Jacobian.On an RS6000/320, the ADIFOR-generated code signi cantly outperforms the hand-coded derivatives.On one processor of the CRAY Y-MP/18, the two approaches perform comparably.The values of the derivatives computed by the ADIFOR-generated code agree to full machine precision with the values from the hand-coded derivatives.The accuracy of the nite di erence approximations, on the other hand, depends on the user's careful choice of a step size.
We conclude that ADIFOR-generated derivatives are a more than suitable substitute for hand-coded or divided-di erence derivatives.Virtually no time investment is required by the user to generate the codes.In most of our example codes, ADIFOR-generated codes outperform divided-di erence derivative approximations.In addition, the fact that ADIFOR computes highly accurate derivatives may signi cantly increase the robustness of optimization codes or ODE solvers, where good derivative v alues are critical for the convergence of the numerical scheme.

Future Work
We are planning many improvements for ADIFOR.The most important are second-and higher-order derivatives, automatic detection of sparsity, increased use of the reverse mode for better performance, and integration with Fortran parallel programming environments such a s F ortran-D 26].Second-order derivatives are a natural extension, and this functionality is required for many applications in numerical optimization.In addition, for sensitivity analysis applications, second derivatives reveal correlations between various parameters.While we currently can just process the ADIFOR-generated code for rst derivatives, much can be gained by computing both rst-and second-order derivatives at same time 30,44].
The automatic detection of sparsity is a functionality that is unique to automatic di erentiation.Here we exploit the fact that in automatic di erentiation, the computation of derivatives is intimately related to the computation of the function itself.The key observation is that all our gradient computations have t h e form vector = X i scalar i vector i : By merging the structure of the vectors on the right-hand side, we can obtain the structure of the vector on the left-hand side.In addition, the proper use of sparse vector data structures will ensure that we perform computations only with the non-zero components of the various derivative v ectors.We can improve the speed of ADIFOR-generated derivative code through increased use of the reverse mode.The reverse modes requires us to reverse the computation from a trace of at least part of the computation which w e later interpret.If we can accomplish the code reversal at compile time, we can truly exploit the reverse mode, since we do not incur the overhead that is associated with run-time tracing.
ADIFOR currently does a compile-time reversal of composite right-hand sides of assignment statements, but there are other syntactic structures such as parallel loops for which this could be performed at compile time.In a parallel loop, there are no dependencies between di erent iterations.Thus, in order to generate code for the reverse mode, it is su cient to reverse the computation inside the loop body.This can easily be done if the loop body is a basic block.The potential of this technique is impressive.Hand-compiling reverse mode code for the loop bodies of the torsion problem, another problem in the MINPACK-2 test set collection, we obtained the performance shown in Figure 13.This gure shows the ratio of gradient/function evaluation on a Solbourne 5E/900 for the current ADIFOR version, and for a hand-modi ed ADIFOR code that uses the reverse mode for the bodies of parallel loops.If nint is the number of grid points in each dimension, then the gradients are of size nint nint.
Approximation of the gradient b y divided di erences costs nint nint function evaluations.Hence, we see that the current A D I F OR is faster than divided di erence approximations by a factor of 70 on a problem of size 4900 and using the reverse mode for loop bodies, we can compute the gradient in about six to seven times the cost of a function evaluation, independent of the size of the problem.Taken together, these points mean that for the problem of size 4900, we can improve the speed of derivative computation by o ver two orders of magnitude compared to divided-di erence computations.We stop at a problem of size 4900 only because at that size, we ran out of memory.
These examples for which w e h a ve \compiled" ADIFOR-generated code by hand show again the promise of viewing automatic di erentiation as a syntax transformation process.By taking advantage of the context (parallel loops, in this case) of a piece of code, we c a n c hoose whatever automatic di erentiation technique is most applicable, and generate the most e cient code for the computation of derivatives.In many applications where the computation of derivatives currently requires the dominant portion of the running time, the use (2) * rx(1) + x(1) * rx(2)

Figure 3 .
Figure 3. Sample program of Figure 2 augmented with derivative c o d e This mode of automatic di erentiation, where we maintain the derivatives with respect to the independent variables, is called the forward m o de of automatic di erentiation.The situation gets more complicated when the source statement is not just a binary operation.For example, consider the statement

w = t 4 Figure 5 .Figure 6 .
Figure 5. Reverse mode computation of rwIn Figure6, we juxtapose the derivative computations for w = -y/(z*z*z) based on the pure forward mode and those based on the reverse mode for computing rw.F or the reverse mode, we performed some simple optimizations such as eliminating multiplications by 1 and additions to 0.
Figure 7. Original code for problem C2 Nominating Y as independent v ariables, and YP as dependent ones, ADIFOR produces the code shown in Figures8 and 9.We use the dollar sign $ to indicate ADIFOR-generated names.In practice, ADIFOR generates variable names which do not con ict with any names appearing in the original program.

Figure 9 .
Figure 9. ADIFOR-generated code for problem C2 (part 2)We see that the derivative codes have a gradient object associated with every dependent v ariable.Our convention is to associate a gradient g$<var> of leading dimension ldg$<var> with variable <var>.T h e calling sequence of g$foo$n is derived from that of foo by inserting an argument g$p$ denoting the length of the gradient v ectors as the rst argument, and then copying the calling sequence of foo, inserting g$<var> and ldg$<var> after every active v ariable <var>.P assive v ariables or those not of REAL or DOUBLE PRECISION type are left untouched.

Figure 11 :
Figure 11: Left: Structure of g$x TRight: Structure of g$Fval T

Figure 12 :
Figure 12: Swirling ow Jacobian .-. ADIFOR w/ loops done in reverse mode Solbourne 5E/900 ratio number of grid points in each dimension

Figure 13 :
Figure 13: Ratio of gradient/function evaluation

Table 1 :
Performance of ADIFOR-generated derivative codes compared to divided-di erence approximations on orthogonal-distance regression examples for 10,000 Jacobian evaluations

Table 2 :
Performance of ADIFOR-generated derivative codes compared to divided-di erence approximations for a single Jacobian evaluation