Estimation of Open Boundary Conditions for an Internal Tidal Model with Adjoint Method : A Comparative Study on Optimization Methods

Based on an internal tidal model, the practical performances of the limited-memory BFGS (L-BFGS) method and two gradient descent (GD) methods (the normal one with Wolfe’s line search and the simplified one) are investigated computationally through a series of ideal experiments in which the open boundary conditions (OBCs) are inverted by assimilating the interior observations with the adjoint method. In the case that the observations closer to the unknown boundary are included for assimilation, the LBFGS method performs the best. As compared with the simplified GD method, the normal one really uses less iteration to reach a satisfactory solution, but its advantage over the simplified one is much smaller than expected. In the case that only the observations that are further from the unknown boundary are assimilated, the simplified GD method performs the best instead, whereas the performances of the other two methods are not satisfactory. The advanced L-BFGS algorithm and Wolfe’s line search still need to be improved when applied to the practical cases. The simplified GD method, which is controllable and easy to implement, should be regarded seriously as a choice, especially when the classical advanced optimization techniques fail or perform poorly.


Introduction
In the recent years, with the rapid advances in ocean observing systems, increasingly more oceanic observation data have become available.At the same time, mainframe supercomputers have become more powerful, with improvements in memory size, the introduction of multiprocessor capabilities as well as enhanced processor speed.These advances have provided us with an opportunity to improve the precision of numerical simulations of the ocean with these available observation data, for both operational and research purposes.Indeed, this is an active topic of a fast-growing research field, data assimilation, which is an effective method for marine research, and it has become widely used in meteorological and oceanographic predictions in recent years.Among all data assimilation methods, four-dimensional variational (4D-Var) data assimilation is often considered as one of the most effective and powerful approaches developed over the past two decades.It is an advanced data assimilation method that involves the adjoint technique.This method has the advantage of directly assimilating various observations distributed in time and space into the numerical model and can maintain dynamical and physical consistency with the model at the same time.Thus, 4D-Var has been widely applied to meteorological and oceanographic data assimilation, sensitivity studies, and parameter estimation.The issue of 4D-Var applied to the shallow water equations model has been the subject of numerous works (see, e.g.,  and references therein).
In this paper, we are interested in solving a large-scale optimization problem minimizing a cost functional where p represents the control variables and is defined by the open boundary conditions (OBCs) for a regional internal tidal model, which are denoted by a series of Fourier coefficients in the present work,  is the time, [  ,   ] denotes the assimilation window where   and   are the initial and final time, respectively, and x() and x() are the time-dependent state variable in an -dimensional Euclidean space R  and the time-dependent observation variable in an -dimensional Euclidean space R  , respectively.The operator  : R  → R  represents a projection from the -dimensional space (R  ) of the model solution x to the -dimensional space (R  ) of observations.Superscript Τ indicates transpose.The components of x are values of the various model fields (elevation, velocity, temperature, salinity, etc., alone or in combination) at each of the model grid locations.The number of components of x is denoted by , and the number  represents the number of observations at any given time.In general,  is larger than .In fact, the objective function  is the weighted sum of squares of distance between the model solution and available observations distributed in space and time. x is the weighting matrix and theoretically should be the inverse of observation error covariance matrix [1,2].However, determining the "exact" form for  x is far from easy [3].Since this work represents a preliminary study in the application of an adjoint assimilation model, we proceed under the best and even unrealistic scenario by assuming that there are no observation errors and that the model solutions will have a perfect fit to the observations such that min p (p) = 0. Therefore,  x is simplified to be unit matrix [30].The control variables p and the state vector x satisfy the time-dependent model equations of the form x  =  (x, p) , where  : R  → R  is model vector function of x.In general, the model equations ( 2) are nonlinear and are assumed to be twice continuously differentiable.The objective of the adjoint assimilation method is to find model control variables p that minimize the cost functional .Note that  is only a function of the control variables p because x is uniquely defined by the model equations (2) for any prescribed p.The control variables can be initial conditions, boundary conditions or parameters to be estimated, or combination of them.In a regional ocean model, the open boundary conditions, which must be prescribed to complete the model description at non-land boundaries, are very important and have a critical impact on the modeling results.Therefore, for simplicity, only OBCs are taken as control variables in this paper.A number of studies have been concerned about the OBC estimation with adjoint method.For example, the work of Lardner et al. [4] discussed the optimal control of OBC in the channel using a 2D adjoint tidal model.Seiler [5] performed a series of identical twin assimilation experiments using the adjoint method to estimate lateral open boundary values of stream function and relative vorticity for a quasigeostrophic open ocean model.Zou et al. [2] also developed a sequential open boundary control scheme augmenting radiation conditions and applied it to idealized barotropic wind-driven ocean simulations.ten Brummelhuis et al. [6] used the data assimilation technique to estimate the OBCs for a shallow sea model of the entire European continental shelf.Heemink et al. [7] used the adjoint approach for a 3D shallow water flow system (TRIWAQ) to estimate the harmonic constants in the OBCs, the friction parameter, viscosity parameter, and water depth by assimilating tide gauge data as well as altimeter data.Shulman [8] proposed a data assimilation approach to specify open boundary conditions, and their method was then tested by Shulman et al. [9] with respect to the improvement of the model prediction skills of the  2 tidal amplitudes and phases in the Yellow Sea.Taillandier et al. [10] used a fourdimensional variational data assimilation method to control boundary conditions in a nonstratified open coastal model.In the work of Zhang et al. [11], lateral tidal OBCs that forced tides in the internal region were estimated by assimilating predicted coastal tidal elevations into a 2D POM with the adjoint method.Gejadze and Copeland [12] studied the open boundary control problem for free-surface barotropic Navier-Stokes equations with adjoint data assimilation technology.By assimilating the tidal harmonic constants derived from T/P altimeter data with a 3D numerical barotropic adjoint tidal model constructed in [13], Zhang and Lu [14] optimized the OBCs and simulated the  2 tide and tidal current in the Bohai and Yellow Seas.Based on a 2D tidal model, Cao et al. [15] and Guo et al. [16] studied the inversion of OBCs by assimilating the T/P altimeter data with adjoint method.Kazantsev [17] used a variational data assimilation technique to control boundary conditions on rigid boundaries for a shallow-water model.Recently, in the work of Chen et al. [18], an adjoint assimilation model for the simulation of internal tides was constructed and simply tested with a series of ideal experiments in which several prescribed spatial distributions of OBCs were successfully inverted by assimilating the model-generated pseudoobservations.
The previous minimization problem (1) requires largescale optimization techniques.This problem attempts to find a solution of the model equations (2) that best fits, in the generalized least-squares sense, the observations distributed in space and time.In meteorology and oceanography, the fluid dynamics can be enforced through the use of a Lagrangian function constructed by appending the model equations to the cost function as constraints.To solve problem (1), many feasible large-scale optimization methods can be found in [31].However, the number of studies discussing the performances of various optimization methods in the meteorological and oceanographic application is still relatively small.Wang et al. [19] proposed an adjoint truncated Newton algorithm for the large-scale unconstrained minimization in the meteorology application and applied this algorithm to a limited-area shallow-water equation model with modelgenerated data where the initial conditions served as control variables.Subsequently, Wang et al. [20] presented a new adjoint Newton algorithm for carrying out largescale unconstrained optimization required in variational data assimilation and applied this algorithm to three 1D models and to one 2D limited-area shallow-water equation model with both model-generated and First Global Geophysical Experiment data.Zou et al. [21] compared the performances of several limited-memory quasi-Newton and truncated Newton methods for unconstrained nonlinear optimization on some large-scale problems in oceanography and meteorology.Their results showed that among the tested limitedmemory quasi-Newton methods, the limited-memory BFGS (L-BFGS) method of Liu and Nocedal [32] has the best overall performance for the problems examined, and the numerical performance of two truncated Newton methods, differing in the inner-loop solution for the search vector, is competitive with that of L-BFGS.More recently, Alekseev et al. [33] compared the performance of several advanced large-scale minimization algorithms, including the conjugate gradient method, quasi-Newton (BFGS), the limited-memory quasi-Newton (L-BFGS), Hessian Free Newton method, and a new hybrid algorithm, applied to the minimization of the cost functional in the solution of ill-posed inverse computational fluid dynamics problems related to parameter estimation.In the work of [14], the comparison of a simple gradient descent (GD) method with the L-BFGS algorithm was conducted within a 3D barotropic tidal model, and the results demonstrate that, compared with the steepest descent method, the L-BFGS really uses much fewer steps to reach a minimum, while the values of the minimized cost function and inverted Fourier coefficient are almost the same.In the work of [18], the L-BFGS version of Liu and Nocedal [32], combined with the adjoint assimilation technique, was applied to the optimization of the OBCs for an internal tidal model and was proved to be very efficient.
The main objective of this paper is to make a computational investigation on the performances of the L-BFGS method and two versions of GD method.All these methods do not require any evaluations of the Hessian matrices but gradient vectors and thus are computationally feasible.In the present work, the number of the control variables (OBCs) is often large.In this case, the limited-memory methods are suggested, and one of the most commonly used limited-memory methods is the L-BFGS method [31,32].The L-BFGS method is an adaptation of the standard BFGS method to large-scale optimization problems.This method has been proved to be globally convergent for convex objective functions and provides a fast rate of linear convergence as well as requiring minimal storage.As the competitor, we choose the GD method.The GD method is a fundamental and probably the earliest method for nonlinear optimization so that it attracts many researchers' attention since its proposition.A recent and interesting report on the GD method can be found in [31,34].The GD method has the negative gradient direction as the search direction and is one of the simplest minimization algorithms that require calculation of derivatives.The GD method is particularly useful when the dimension of the problem is very large [35].Two versions of GD method are considered here, differing in the formulation of the step length.One normally uses the step length satisfying the strong Wolfe conditions [31,36,37] which have been widely used in line search (see [38,39] and references therein).And the other is a simplified GD method without the line search which has been used in a relatively small number of studies [13,14,22,23].In this paper, based on the model constructed by Chen et al. [18], the performances of the L-BFGS method and the two versions of GD method are compared through a series of ideal experiments in which the OBCs for a numerical internal tidal model are estimated by assimilating the interior observations with the adjoint method.
This paper is organized as follows.We start with a brief introduction of the two-layered internal tidal model in Section 2. For the details of the model, we refer interested readers to [18].The optimization methods for comparison, including the L-BFGS method and two GD methods, are described briefly in Section 3. In Section 4, as illustrative numerical examples, a series of ideal experiments are carried out to invert the prescribed OBCs with the different methods examined here, and the experimental results are presented.Finally, we make a summary and draw some conclusions in Section 5.

Forward Model.
A two-layer version of the numerical internal tidal model described in [18] is considered in this paper.Assuming that the potential density   ( = 1, 2) in each layer is constant, the layer-averaged, nonlinear, timedependent continuity and momentum equations of each layer subject to the hydrostatic approximations are derived from the primitive 3D governing equations.Here, the subscripts 1 and 2 refer to the upper and lower layers, respectively.Using spherical coordinates in the horizontal direction and isopycnic coordinates in the vertical one, we obtain the internal mode equations as follows: upper layer: lower layer: Here,  is the time,  and  are the eastern longitude and northern latitude, respectively, (, , ) and V(, , ) are horizontal velocities in  and , respectively, (, , ) is the time-varying layer mass, and where ℎ  and   are the undisturbed layer thickness and interface (surface for  = 1) elevation above the undisturbed level, respectively. is the radius of the earth,  the gravitational acceleration,  the Coriolis parameter, and  = 2Ω sin , where Ω represents the angular speed of Earth's rotation,  ℎ the horizontal eddy viscosity coefficient, Δ the Laplace operator, and V and  are the interface and bottom friction coefficient, respectively,  = ( 1 +  2 )/2, and ℎ = (ℎ 1 + ℎ 2 )/2.In the forward model ,  and V are the main outputs and called the state variables in this paper.By defining the barotropic currents we can derive a 2D external mode from the internal mode equations.For further details about the external mode, we refer the interested reader to the previous work of [18].Usually, the model domain ∑ is defined within a certain range of space and time, enclosed by the initial and boundary conditions.In the present model, a zero field is assigned to the model state variables as the initial conditions.The model is run for several tidal cycles, and the simulation results in the last cycle are taken as the model output.The boundary located in the wet grid is treated as the open boundary, and the boundary located in the dry grid (e.g., the island and the land) is treated as the closed boundary.The details of the boundary conditions are described as follows.
The Flather condition which was originally proposed by Flather [40] yielded good results [18,41] when applied to the open boundaries in this model.In this paper, an adaptation of the Flather condition used by [18] is installed in the external mode of the forward model, which is given as follows: where η is external data beyond the model boundary representing the clamped surface elevation relating to the boundary barotropic tidal force and  = ℎ +  1 is the time-varying total water depth.The sign in (8) depends on the boundary location (positive for eastern and northern boundaries; negative for western and southern boundaries).For the internal mode, the relaxation conditions, which have been found to have a robust performance in a variety of situations [42,43], are applied at open boundaries.In this condition, a relaxation region of several grids close to the boundary is defined.Within this region, total tidal values including the interface elevation and fluid velocity of both barotropic and baroclinic tides are first calculated using the standard discretization and then relaxed towards the barotropic values as follows: to give the final values .Here,  represents the tidal states   ,   , or V  ,   and   represent the barotropic and baroclinic tides, respectively, and (, ) ∈ [0, 1] is the relaxation factor in grid (, ).In our model,  is chosen to be increased linearly from 0 to 1 while the grid is getting close to the open boundary.
Closed boundary conditions for both the internal and the external modes are zero flow normal to the coast; that is, ⃗  ⋅ ⃗  = 0 for the grid points at closed boundary, where ⃗  is the outward unit vector and ⃗  = (, V) is the velocity vector.

Adjoint Model.
The adjoint method is a powerful tool for parameter estimation.The basic idea of the adjoint method is quite simple: a model is defined by an algorithm and its control variables including initial conditions, boundary conditions, and empirical parameters.The cost function that measures the data misfit between the model output and observations is minimized through optimizing the control variables.In detail, the cost function decreases along a certain search direction which can be calculated with a certain optimization algorithm according to the gradient of cost function with respect to the control variables, and this gradient is calculated by what has been known as the adjoint model.Based on the governing equations (3a)-(4c) of the forward model, its adjoint model can be constructed as follows.The details of the adjoint model in a generalized form (not limited to the two-layer case) can be found in the work of [18].Concretely, the cost function is defined by where  is layer index, ∑ denotes the model domain of both time and space, p represents the generalized control variables and is defined by the open boundary conditions (OBCs) for a regional internal tidal model, which are denoted by a series of Fourier coefficients in the present work,   and V  are the models simulated, and û and V are the observations.In (10), the variables   and V  can be uniquely determined by the control p via the model equations (3a)-(4c).Hence, the functional  is implicitly dependent on p.
The Lagrangian function is defined by  (, , V;   ,   , V  ; p) =  (, , V; p) where   ,   , and V  are called adjoint variables of the state variables   ,   , and V  , respectively.F (3a) , F (3b) , . . ., F (4c) denote the left-hand side of (3a), (3b), . .., (4c), respectively.According to the typical theory of Lagrangian multiplier method, we have the following first-order derivatives of Lagrangian function with respect to all the variables and parameters: Equations ( 12) return the governing equations (3a)-(4c).The adjoint equations can be derived using (13).From ( 14), we can obtain the gradients of the cost function with respect to control variables.Similar to the forward model, the adjoint model also consists of the internal and external modes.Actually, the equations derived from (13) are considered as the internal mode, and the external mode can be derived from the internal mode in a similar way the external mode of the forward model is derived.The details of both the internal and external modes of the adjoint model can be found in [18].
2.3.Discretization.Several numerical methods have been widely used in the discretization of time-dependent 3D primitive equations.The time integration schemes of these methods can be fully explicit [44], semi-implicit [45,46] or fully implicit [47].For large-scale oceanic problems, the applications of 3D models are becoming a reality with the aid of modern computers.The fully explicit finite difference method is relatively simple to implement, except that its time step is strictly restricted by the Courant-Friedrich-Lewy (CFL) stability criterion [48].At present, many existing ocean models are based on an Alternating Direction Implicit (ADI) method which was proposed for the approximate solution of the shallow-water equations in [49].ADI method results in computational efficiency superior to fully explicit methods because their improved stability allows large time steps to be employed, and it is also easy for implementation (see [13,14]).Since the model must simulate fields of both velocity and elevations in each isopycnic layer, a technique known as external-internal mode splitting has been used in several ocean models by Simons [50].Complete details on the model discretization were given in [18].

2.4.
Gradients.Among all the control variables in this model, the OBCs are the most important and have critical impacts for a regional tidal simulation.Solutions in model interior are uniquely determined by the tidal OBCs once the initial conditions and other parameters have been determined.The Flather condition (8) can be determined once the external data η is described.Hence, the determination of the open boundary conditions for the present model is equivalent to the estimation of the tidal force η.Assume that at a certain open boundary grid point (  ,   ), the  2 tidal force η at the th time step is subject to where  denotes the frequency of Details on computing     ,  were given in [18].

Mathematical Problems in Engineering
Having determined the adjoint variables (  ,   and V  ) with the adjoint model, the gradients of cost function with respect to the OBCs (i.e., the Fourier coefficients  and ) can thus be calculated.Then, the OBCs are optimized with some minimization algorithms which will be described in the following section.

Optimization Methods
In general, numerical methods solving the minimization problem (1) have the following iterative formula: where p  ,   , and d  are current iterative point, a positive step length, and a search direction, respectively.For simplicity, (p  ) and ∇(p  ) are denoted by   and g  , respectively.There are many formulas to define search direction d  in formula ( 17) [31].In this paper, three optimization methods are compared when applied to the assimilation model.They are different in the selection of   or d  .The three methods are given as follows.

L-BFGS Method.
The L-BFGS method is a well-known limited-memory quasi-Newton method for solving largescale problems whose Hessian matrices cannot be computed at a reasonable cost or are not sparse [31].It requires the search direction to be where Here, is the identity matrix, and H 0  is the initial Hessian approximation.Many previous studies have shown that typically 3 ≤  ≤ 7, where  > 7 does not improve the performance of L-BFGS.In this paper, the L-BFGS version of Liu and Nocedal [32] is employed, and the Fortran codes are authored by Nocedal [51], while and the number of corrections  used in the L-BFGS update is taken as 5 [33].

Gradient Descent Method with Wolfe's Line Search (GDM-W). This method requires the search direction to satisfy
while the step length   satisfies the following strong Wolfe conditions [36,37]: with 0 <  1 <  2 < 1.The line search routine enforcing the strong Wolfe conditions can also be found in [31].

Simplified Gradient Descent Method (GDM-S).
This method requires the search direction to satisfy while the step length   is simply defined by where the denominator ‖g  ‖ 2 denotes the  2 norm of the gradient g  and  0 is an empirical constant.This method has been used in a relatively small number of studies (see [13,14,23,52]) showing that this method is indeed feasible and effective in practical application.In this work, for simplicity and no loss of generality, only the eastern boundary is treated as the source of tidal force driving the forward model (meaning that both Fourier coefficients  and  are equal to zero on the other three boundaries), and only the estimation of Fourier coefficient  is studied.To be specific, on the eastern boundary, the Fourier coefficient  is always treated as the known parameter (equal to 0 in this work), and the Fourier coefficient  to be estimated here is assumed to be spatially distributed and is prescribed with a spatial distribution which is constructed by the trigonometric function (see Figure 1(c)).Some grids are randomly picked out from the 51 × 41 horizontal grids and treated as the observation positions (see Figure 1(a)).The forward model is run with the prescribed OBCs, and the model-generated results of the surface currents (i.e., currents in the upper layer) at these observation points are taken as the pseudo-observations (referred to as observations for brevity hereafter).Then, the OBCs can be optimized with the following steps.

Numerical Experiments
Step 1.An initial value is chosen empirically and assigned to the unknown control variables.
Step 2. Perform the simulation by running the forward model, and the simulation results, mainly the state variables especially including the current velocity at observation points, are obtained.The value of the cost function is worked out in this step.
Step 3. The difference between simulation results and observations serves as the external force driving the adjoint model.Also, the adjoint variables in a period of  2 tide are calculated through backward integration of the adjoint equations.
Step 4. Having known both the state and adjoint variables from Steps 2 and 3, respectively, the gradients with respect to control variables to be inverted are calculated by formula (16).
Step 5. Update the unknown control variables with a certain optimization algorithm.
Step 6.If some stopping criterion is satisfied, then stop and return the optimized control variables, otherwise; replace the initial value with the new control variables and go to Step 2.
With the procedure previous repeated, the OBCs will be optimized continuously, and the difference between simulated values and observations will be diminished.Meanwhile, the difference between the prescribed and the inverted OBCs will also be decreased.The initial value of  is taken as zero in each experiment.For all methods examined here, the total number of iterations is allowed to be 100 at most, and we report the cost function values, the gradient  2 norms, and the inversion results obtained by each method.Note that the L-BFGS iteration may stop before it reaches the maximum number due to the termination criterion installed in the L-BFGS routine authored by Nocedal [51].The number of corrections  used in the L-BFGS update is taken as 5.For the GDM-W, Wolfe's line search is carried out with parameters  1 = 10 −4 and  2 = 0.9.For the GDM-S, the constant  0 is experientially chosen to be 0.1.

Results.
As shown in Figure 1(a), all observations can be divided into two groups: O1 which are closer to the eastern boundary and O2 which are further from the eastern boundary.Accordingly, the experiments are divided into three groups: Group 1 (both O1 and O2 are used), Group 2 (only O1 is used), Group 3 (only O2 is used).After assimilation, the inversion error (in the root mean square sense) and correlation coefficient between the prescribed and inverted OBCs are reported in Table 1, and the inverted OBCs are compared with the prescribed ones in Figure 2. All experiment results will be examined group by group as follows.
First, let us examine the performances of the three methods in the experiments of Group 1 in which observations of both O1 and O2 are assimilated.The results of Group 1 shown in Figure 2(a) and Table 1 indicate that with sufficient observations assimilated, the inversion results obtained by using all three methods are satisfactory in terms of the inverted OBC curve as well as the inversion error and the correlation coefficient.And especially, the solution obtained with the L-BFGS method is the closest to the exact solution (the prescribed OBCs).The variation of the cost function normalized by its initial value, that of the  2 norm of the gradient of the cost function with respect to the OBCs, and that of the inversion error are plotted in Figures 3(a), 3(b), and 3(c), respectively, as a function of the iteration number.The optimization procedure clearly shows that among the three methods examined, the L-BFGS performs the best and its convergence rate is much faster than those of the GD methods, which is consistent with the classical theories about the convergence rate of the quasi-Newton methods and GD method [31].Now, let us pay more attention to the results obtained with GDM-W and GDM-S.The comparison between the two versions of GD method is rather interesting.At the beginning stage of the iteration, the GDM-W has a faster convergence rate than the GDM-S.After more than 20 iteration steps, the result of GDM-S begins to be better than that of the GDM-S.Then, after a neighborhood of the exact solution is reached, a slow convergence of GDM-S occurs in the following iterations, also with oscillations in both of the cost function and gradient norm which may be caused by the constant  0 making the step length relatively overlarge when the solution is close to the exact one.In contrast, the GDM-W keeps the cost function and the gradient norm declining on the whole due to its step length satisfying the strong Wolfe conditions (21a) and (21b).It is worth noting that in the latter part of the assimilation procedure (after about 30 iteration steps when the oscillations in the cost function and in the gradient norm for the GDM-S appear (see Figures 3(a) and 3(b)), the convergence rate of the GDM-W is a little faster than that of the GDM-S.This leads to the final values of both the cost function and gradient norm obtained with the GDM-W slightly smaller than those obtained with the GDM-S.On the other hand, however, the comparison between the inversion errors for the GDM-W and GDM-S presents a different pattern.As shown in Figure 3(c), during the beginning several iterations, the descent rate for both GD methods is very rapid and the inversion error for the GDM-W has a faster descent rate than that for the GDM-S.This state continues until the 25th iteration step when the inversion error for the GDM-W begins to exceed that for the GDM-S, and these two methods almost begin to have the same slow descent rate of the inversion error.Finally, the inversion result obtained with the GDM-S is better than that obtained with the GDM-W (also see Figure 2(a) and Table 1).For all experiments in Group 2, only the observations of O1 are assimilated.For this group, the variation of the cost function normalized by its initial value, that of the  2 norm of the gradient of the cost function with respect to the OBCs, and that of the inversion error are plotted in Figures 4(a), 4(b), and 4(c), respectively, as a function of the iteration number.As we can see, the assimilation procedure of Group 2 shown in Figure 4 follows a similar pattern as that of Group 1 shown in Figure 3, although the number of observations for Group 2 is only half of that for Group 1.This suggests that removing the observations that are far from the unknown boundary may have very little effect upon the experiment results, whereas the observations that are closer to the eastern open boundary play a much more important role in the estimation of the OBCs in our model.Probably, this may be the main reason why we can obtain the satisfactory results in Group 1.
To sum up, from the previous results, we have seen that the L-BFGS method can hold its advantage over the other two versions of GD method in terms of both the optimization procedure and the final results providing the observations that are closer to the unknown boundary are included for assimilation, and the L-BFGS method is recommended in this case.Meanwhile, both results obtained with the two versions of GD method are also satisfactory.The comparison between the results obtained with the GDM-W and GDM-S indicates that the Wolfe's line search is effective in finding a reasonable step length that can achieve adequate reductions in the cost function and can guarantee a rapid rate of convergence at the beginning of iteration.This allows the GDM-W to gain an advantage over the GDM-S in terms of the final values of the cost function and the gradient norm.Nevertheless, their difference is smaller than expected, and finally, the GDM-S excels the GDM-W in the inversion result which is an essential criterion testing whether the whole model is valid.Therefore, the GD method simplified with a constant step length is competitive to that with the Wolfe's line search in this case.
It is very interesting to find that when we come to the results of Group 3 which are dramatically different from those of Group 2 although the same number of observations is used in these two groups.In Group 3, only the observations O2 that are further from the unknown boundary are assimilated.For this group, the variation of the cost function normalized by its initial value, that of the  2 norm of the gradient of the cost function with respect to the OBCs, and that of the inversion error are plotted in Figures 5(a), 5(b) and 5(c), respectively, as a function of the iteration number.From the optimization procedure shown in Figure 5, we can clearly see that, in this case, the L-BFGS method stops at the 36th iteration step and fails to converge, and the GDM-W has an unacceptable slow convergence rate, both leading to a relatively large discrepancy between the inverted and prescribed OBCs (see Figure 2(c) and Table 1), whereas the simple GDM-S performs the best and, more importantly, can still yield satisfactory inversion result (see Figure 2(c) and Table 1).The performance of GDM-S shown in Figure 5 is more interesting.After about 10 steps of iteration in the beginning, the cost function and the gradient norm begin to vary with oscillations which become increasingly larger as the iteration goes on, but meanwhile, their minimum values are getting smaller.At last when the maximum of 100 iterations is reached, the cost function and the gradient norm are reduced by more than 3 orders of magnitude and more than 1.5 orders of magnitude, respectively, compared with their initial values, which are comparable to the results of GDM-S in Groups 1 and 2. On the other hand, the variation of the inversion error for GDM-S also contains oscillations, but compared with the oscillations in the cost function and gradient norm (see Figures 5(a) and 5(b)), the ones in the inversion error are much smaller, and therefore the variation curve looks much smoother (see Figure 5(c)).This may be caused by the high complexity of the cost function in the control variable space if the observations to be assimilated are far from the unknown boundary, and this high complexity may also be a possible reason for the failure of the L-BFGS method as well as for the inefficiency of Wolfe's line search in Group 3. The results of Group 3 indicate that the far distance between the observations and the unknown boundary has almost little effect on the validity and feasibility of the GDM-S, and therefore the simplified GD method is recommended in this case, instead of the L-BFGS method.

Summary and Conclusions
In this paper, based on the numerical internal tidal model constructed by Chen et al. [18], the practical performances of the L-BFGS method (given by Liu and Nocedal [32]) and two versions of GD method are compared and investigated computationally through a series of ideal experiments in which the OBCs are estimated by assimilating the interior observations with the adjoint method.The two GD methods include the normal one with Wolfe's line search for the step length and the simplified one with a fixed step length which should be predetermined by modeler.The cost function, the gradient norm, and the inversion result are reported and examined for each experiment.
According to the locations of the observations assimilated, all observations can be divided into two groups.Accordingly, all experiments are divided into three groups.In both of the first two groups, the observations that are closer to the unknown boundary are included for assimilation.In this case, as expected, the L-BFGS method has a remarkably better performance than the GD methods, not only in terms of the convergence rate but also in terms of the final result.This is consistent with the classical theory about the convergence properties of the L-BFGS method and the GD method.On the other hand, compared with the simplified GD method, the normal one with Wolfe's line search can really use fewer iteration steps to reach a satisfactory solution, but the values of the minimized cost function and the gradient norm are almost the same.Although great computational efforts have been made to implement Wolfe's line search to optimize the step length, the effect is much smaller than expected, and even the normal GD method finally yields a worse inversion result than the simplified one.In the third group of experiments, only the observations that are further from the unknown boundary are assimilated.In this case, the simplified GD method remains effective and yields satisfactory results, whereas the L-BFGS method fails to converge and the GD method with Wolfe's line search presents an unacceptable slow convergence rate.This demonstrates that in the practical application, the simplified GD method, which is controllable and easy to implement, is competitive to both the classical L-BFGS method and the normal GD method with Wolfe's line search.This suggests that when applied to the practical cases, the advanced L-BFGS algorithm and Wolfe's line search may still need to be improved, while how to take sufficient advantage of some simple methods is still worth investigating.Nonetheless, the simplified GD method should not be ignored and should be regarded seriously as a choice, especially when the classical advanced optimization techniques fail or perform poorly.Furthermore, how to take sufficient advantage of some other simplified methods is also worth investigating.

4. 1 .
Numerical Implementation.The performances of L-BFGS, GDM-W, and GDM-S are compared by a set of ideal experiments.The computing domain has 51 × 41 horizontal grids (with the horizontal resolution 10  × 10  ) and two vertical isopycnic layers (Figures1(a) and 1(b)).The maximum undisturbed water depth is 6000 m, and the undisturbed interface is placed at the depth of 2000 m.The 2D bottom topography is constructed by the superposition of an inclined plane and a Gaussian surface (Figure1(b)).The angular frequency of  2 tide is 1.4050789025 × 10 −4 s −1 , and the whole-time step is 496.863s (1/90 of the period of  2 tide) for both external and internal modes.The horizontal eddy viscosity coefficient is chosen to be  ℎ = 1000.The coefficients of bottom and interface frictions are taken as  = 0.0025 and  V = 0.003, respectively.As shown in Figure1(a), all four boundaries are open and installed with the passive Flather condition allowing the outgoing wave to radiate out of the computing domain through the open boundaries.

Figure 1 :
Figure 1: Experiment design.(a) Plane view of the computing domain and observation locations.The black dots and the black open circles represent the locations of the observations O1 and O2, respectively.The white dots and the white open circles denote the open boundaries and the open boundary points where the tidal force is installed, respectively.The white dashed line indicates the position of the longitudinal section which will be shown in (b).(b) Bottom topography along the dashed line shown in (a) and the location of the undisturbed interface between the two layers shown as the dashed line.(c) Prescribed spatial distribution of OBCs (Fourier coefficient  specially here) along the eastern boundary.

Figure 2 :
Figure 2: Inverted spatial distribution of OBCs (Fourier coefficient  specially here) compared with the prescribed one.

Figure 3 :
Figure 3: Optimization history for experiments of Group 1, about (a) the cost function normalized by its initial value  0 , (b) the  2 norm of the gradient of the cost function with respect to the OBCs, and (c) the inversion error (in the root mean square sense) between the inverted and prescribed OBCs.

Figure 4 :Figure 5 :
Figure 4: As in Figure 3 but for the experiments of Group 2.
2 constituent, Δ is the time step length, and    ,  and    ,  are the Fourier coefficients as well as tunable parameters in the model.The gradients of cost function with respect to    ,  and    ,  can be deduced from (14) which yields     ,  + ∑ ,  cos (Δ) = 0,     ,  + ∑      ,  sin (Δ) = 0.

Table 1 :
Inversion error (in the root mean square sense) and correlation coefficient between the inverted and prescribed OBCs.