An Evaluation of the Use of Simulated Annealing to Optimize Thinning Rates for Single Even-Aged Stands

We evaluated the potential of simulated annealing as a reliable method for optimizing thinning rates for single even-aged stands. Four types of yield models were used as benchmark models to examine the algorithm’s versatility. Thinning rate, which was constrained to 0–50% every 5 years at stand ages of 10–45 years, was optimized to maximize the net present value for one fixed rotation term (50 years). The best parameters for the simulated annealing were chosen from 11 patterns, using the mean of the net present value from 39 runs to ensure the best performance. We compared the solutions with those from coarse full enumeration to evaluate the method’s reliability and with 39 runs of random search to evaluate its efficiency. In contrast to random search, the best run of simulated annealing for each of the four yield models resulted in a better solution than coarse full enumeration. However, variations in the objective function for two yield models obtained with simulated annealing were significantly larger than those of random search. In conclusion, simulated annealing with optimized parameters is more efficient for optimizing thinning rates than random search. However, it is necessary to execute multiple runs to obtain reliable solutions.


Introduction
Optimizing the thinning rate for single stands is a fundamental problem in improving the economy of forest management.In Japan, the proportion of small-scale forest holders is large.For example, 65.4% of private forest owners in Nagano Prefecture own areas < 1 ha in extent, and 90.6% own areas smaller than 5 ha [1].They therefore have little leeway in performing spatial optimization, and it is important to optimize the thinning rates applied to single stands to improve profitability.Fortunately, recent developments in computing enable rapid quantitative optimization with no experimental knowledge.
The problem may be optimized in a straightforward manner using full enumeration (FE), that is, enumerating and examining all patterns.Because calculation cost increases exponentially with the number of control variables (referred to as "the curse of dimensionality"), this however tends to be impractical.Approaches that reduce the computing costs are therefore required.Many algorithms for solving this problem have been proposed and can be grouped as follows: dynamic programming (DP) [2][3][4][5][6]; nonlinear programming (NP) [7,8]; and heuristic methods such as random search (RS) [9], tabu search, or greedy algorithms [10].Of these, DP and NP have been widely used and have been further developed based on comparison with each other.
Roise [7] reported that two NP approaches, the methods of Hooke and Jeeves [8] and Powell [11], were more efficient than the discrete DP algorithm which served as a baseline comparison.On the other hand, Paredes et al. [4] proposed a new DP algorithm named PATH (projection alternative technique) that was more efficient than traditional DP algorithms.Yoshimoto et al. [3] developed an extended PATH algorithm named MSPATH (multistage PATH), which can reflect long-term effects such as an increase in log price with diameter and showed that it could provide better solutions than PATH [6,12].Yoshimoto et al. [6] developed an improved PATH algorithm named RLS-PATH (regionlimiting strategies PATH) to deal with yield models including many variables and reported that in many cases it provided better solutions than Hooke and Jeeves' method.Each of these methods has been used to optimize thinning rates for single stands (e.g., NP: [13][14][15][16][17][18][19][20][21], DP: [22][23][24][25][26][27][28][29][30][31][32]).In particular, the MSPATH algorithm is more appropriate than PATH when long-term effects caused by competition-density effects or fluctuations of log price with diameter are taken into account [12].Yoshimoto [12] developed a way to apply it to yield models based on stand density management diagram (SDMD) [33], a type of forest growth model incorporating competition-density effects that is commonly used in Japan, which has been applied in several studies [27][28][29][30][31][32].However, it is necessary to consider carefully whether such optimized results represent a "better choice" or "the best choice." With respect to the spatial planning of forests, it has been argued that it is important to validate optimization algorithms [34].This also applies to algorithms for optimizing thinning rates for single stands, but existing algorithms have as yet only been compared with each other, not against exact optimal solutions as provided by FE.Producing a solution that is validated as close to the optimal solution is potentially not always essential, if the user is only using the optimization algorithm as a way to improve the current situation; in other words, for them "optimized" means "better choice."However, in some cases a solution close to the optimum is needed.For example, it is difficult to manage forests without subsidy in Japan.In this case, the result of a stand-level optimization analysis can determine whether a stand can be managed successfully under optimized conditions and whether it is worth subsidizing.Such an analysis can provide important information for regional zoning, and in this case a high degree of reliability, that is, confidence that the method provides solutions close to "the best choice," is necessary to satisfy accountability requirements, especially towards forest owners who have applied for rezoning and whose forest is not subsidized.The same considerations may apply to countries or regions in situations similar to Japan, that is, little leeway in performing spatial optimization and a limited supply of funding.
In a previous study [35], RS with 10 6 iterations and MSPATH were compared with coarse FE (CFE), by splitting variables into a lattice, comparing them using two simple SDMD-base small-scale yield models (differing only in their price functions) as benchmark, and setting net present value (NPV) for 50 years as the objective function.The NPV of the nonmonotonic pricing yield model provided by MSPATH was 60,419 yen/ha less than that derived using CFE, and the monotonic pricing yield model provided by MSPATH was 30,920 yen/ha less.In contrast, those provided by RS were 3,755 yen/ha and 5,487 yen/ha less, respectively, than that derived using CFE.On the other hand, MSPATH reduced the calculation cost by 10 −5.2 ∼10 −5.1 times relative to CFE and 10 −2.6 ∼10 −2.5 times relative to RS. MSPATH therefore confers a great advantage in terms of calculation cost, but it is not possible to rely on it providing "the best choice."Thus we need to establish a more reliable method of optimizing thinning rates for single stands.
There are diverse thinning rate models for single stands as a result of differences in the complexity of the models (e.g., whether they consider variation in trees [26,27] or in logging and lumbering [31]), in the objective function used (e.g., NPV [10,12,29,32] or in soil expectation value (SEV) [14,36]), or in other criteria [26,27].It is difficult to deal with the necessary complexity using NP algorithms, since the various trade-off relationships, for example, between yield volume at thinning and that at final cutting, may require multimodality of yield models, and the incorporation of logging processes makes them discontinuous.On the other hand, theoretical proofs are required to ensure the reliability of the DP algorithm; however it may be difficult to obtain sufficient proof to support complex yield models.We therefore suggest that simulated annealing (SA) [37,38], a stochastic algorithm inspired by the annealing process of metals, may satisfy the requirements.It is worth noting that SA has been theoretically proven to provide the global optimum for a wide variety of problems [38], such as the traveling salesman problem [37,38] and the knapsack problem [39], as long as the "temperature" parameter is decreased gradually enough.
In the field of forestry, SA has usually been used for forest planning under spatial constraints [40][41][42] or with a special objective function such as that defined based on "forest spatial value" [43,44].Lately the performance of SA itself has been analyzed, for example, its sensitivity to the parameters used [45] and its application in neighborhood methods [46].Thus, SA is commonly used in the field and its performance is evaluated.However, this approach has never been used to optimize thinning rates for single stands and its performance in this area has not been evaluated.It is therefore necessary first to evaluate whether SA can be used reliably to optimize thinning rates for single stands.
In this study, we evaluate the potential performance of SA in optimizing thinning rates for single even-aged stands, using plural yield models based on SDMD.We set the thinning interval in each model at five years and the objective function as the NPV before planting for one rotation term.The age of the rotation is one of the most important variables for improving the financial viability of stand-level forest management.However, we fixed this variable in our model, because a change in rotation age leads to a change in the number of dimensions of the model (the number of variables, in this case the number of thinning ages).To optimize both rotation age and thinning rates, it would thus be necessary to define several models with different rotation ages, optimize them, and then choose the best one.The searching performance of the model with a fixed rotation age therefore determines performance in optimizing a model that treats rotation age as a variable.
Whereas the "performance" of an optimization method includes both reliability and efficiency, we must first improve its reliability and only then enhance its efficiency while maintaining reliability.The emphasis is therefore on securing reliable optima with proven, orthodox SA methods.However, the combination of parameters used has strong effects on the performance of SA approaches [45].In addition, SA is a stochastic algorithm, and as a result the solution obtained from each searching process varies.We therefore search for "the best" combination of SA parameters using CFE, to ensure the best possible performance, using the mean NPV from 39 runs of the SA as the objective function.We also compare the 39 NPVs from the solutions obtained using optimized SA parameters with those obtained using CFE to evaluate whether SA can provide solutions close to the global optimum.Finally, we statistically compared these values with those obtained using RS with five times iterations, to evaluate the comparative merit of SA.

Benchmark Model.
A hypothetical even-aged stand of Japanese larch (Larix kaempferi) in Nagano Prefecture with medium site quality and a density of 2500 planted trees/ha was used as a benchmark stand.Minimum thinning age was fixed in 10 years and the thinning interval in five years.The rotation age was fixed in 50 years, since this value would still allow application of CFE.

Objective Function.
We defined the objective function as NPV before planting for one rotation term: where   is the income from the th cutting (yen/ha),  is the annual interest (%),   is the stand age at the th cut (year), and   is the cost of regeneration (yen/ha).Annual interest was set at 0.9% [47].As mentioned above,   values were fixed to increments of five years over a range of 10-50 years.

Growth Model.
We used SDMD to define the growth model.The formulae for larch in Nagano Prefecture are as follows [48]: log 10   = 5.529749 − 1.780184 log 10 , where  is the stand stem volume per area (m 3 /ha),  is the mean height of dominant trees (m),  is the present number of trees per area (trees/ha),  is the stand form height (m),  is the basal area (m 2 /ha),  is the diameter at breast height (DBH) of the mean basal area (cm),  is the mean DBH (cm),   is the number of trees per area on the full-density curve (trees/ha),   is the stand volume per area on the full density curve (m 3 /ha), and Ry is the relative yield index (the standard density index for the SDMD). and  are parameters that change with the mean height of the dominant trees and are defined as follows: Note that these formulae are common to the aforementioned previous study [35], but the aspects of the model described in the rest of this section and in the next one have been modified.The present model is more appropriate because it simulates self-thinning after the first thinning.
We modeled self-thinning using Tadaki's model [49], calculated as follows [50]: where   is "initial number of trees per area" (trees/ha), a parameter of the self-thinning curve. is given as follows: Equation ( 11) is the self-thinning curve before the stand reaches the full-density condition.After that, self-thinning proceeds according to (12), the full-density curve, which takes the mean height of the dominant trees as a variable.  is what the number of trees per area at age 0 would be if the stand had only been subject to self-thinning but not to additional thinning by humans.  is therefore only ever equal to the planted number of trees per area if the forest is never thinned.We call this parameter   to avoid confusion with planted number of trees per area and initial values of optimization.
SDMD is a function of the mean height of the dominant trees and the number of trees per area.It hypothesizes that only the number of trees per area, not the mean height of the dominant trees, is affected by thinning.In other words, SDMD simulates thinning by translating the self-thinning curve under a specific mean height of dominant trees.The value of   corresponds to the self-thinning curve in (11).As the relationship between the number of trees per area under a specific mean height of dominant trees and   is monotonic, we can simulate thinning by decreasing   .
It is easy to calculate the number of trees per area based on   under an arbitrary mean height of dominant trees using (11) and (12).However, if number of trees per area at an arbitrary mean height of dominant trees is given, we first need to identify   with a numerical approximation with (11) to simulate self-thinning after that age.Moreover, after International Journal of Forestry Research the stand has reached full-density condition, we cannot compute   using (12) and given the number of trees per area, because the full-density curve as ( 12) is independent of   .To address these problems, we define thinning rate as the ratio of the difference between   before and after thinning as follows: where  is the thinning rate (%),  all is   before the thinning, and  main is   after the thinning.
To simulate the growth of the stand with age using SDMD, the mean height growth curve of the dominant trees is required.The growth curve of Japanese larch at an average site in Nagano Prefecture can be predicted using the following formula [51]: where  is the stand age (years).Since SDMD assumes lower thinning, the mean height of the dominant trees will not be affected by thinning.

Volume and Diameter of Yield
Trees.The total stem volume of cut trees per area is calculated as follows: where  cut ( cut , ) is the cut stem volume per area when  cut trees per area are cut at time , (, ) is the standing tree volume per area calculated using ( 2) and (15), where the number of trees per area is  at time ,  cut is the number of cut trees per area,  main is the number of standing trees per area, and  all is the total number of standing and cut trees per area.This equation can be applied to both thinning and final cutting.Mean stem volume per tree can be calculated by dividing the left side of ( 16) by  cut .The mean DBH of the cut trees was calculated using the formula: where the subscripts of  are the same as those of  in (16).SDMD does not describe variation in stem volume, so we assumed a constant size for all cut trees for a given stand age.

Constraints of Thinning Rates and the Number of Trees per Area.
Because the parameters of SDMD were estimated using data from real forests subjected to standard silvicultural processes, unusual conditions such as extremely small number of trees per area should be avoided when using this model.Accordingly, the lower bound of   was set to 200 trees/ha.In addition, it is recommended that a high thinning rate be avoided when using SDMD; however intensive thinning is often performed to reduce thinning costs.For this study, we placed emphasis on actual practice and restricted the thinning rates to the range 0-50%.
where  is the height for which stem width is being calculated relative to total tree height (scaled from 0 to 1),  is the diameter at relative height , and  0.9 is the diameter at relative height 0.9.Parameters  and  are calculated as follows [52]: where V  is stem volume,   is the total height of the tree,  is the relative height at breast height, and  bh is the DBH.Equation ( 18) can be transformed as follows to provide the relationship between the diameter at any height and the DBH: 2.1.6.Log Price.The relationship between the end diameter of 4 m long Japanese larch logs and the value in a real market in Nagano Prefecture [53] is shown in Table 1.Although there is a brisk market for 6-11 cm logs for log piles, the value of 12-14 cm logs is lower as there is less demand for this size.This price model adds multimodality to the NPV.

Cost of Regeneration and
Yield.The thinning cost was set to 4588 yen/m 3 , based on the mean total cost of logging and transportation of Japanese larch [54], assuming a 50% subsidy is provided, and the cost of final cutting was set to 5987 yen/m 3 , assuming no subsidies.Usually no subsidy for final cutting is provided; however for the purpose of examining a range of patterns, we also tested the model on a scenario in which the final cost was similarly reduced to 2994 yen/m 3 .Hereafter, we refer to the two scenarios as F (full) and H (half), respectively.The difference between scenarios affects the flatness of the NPV.The present costs for regeneration are calculated as 1,078,590 yen/ha according to standard government sources [55], which includes ground clearance of a typical meadow at the beginning of the planting year, planting 2500 trees/ha, prevention of mammal damage by spreading Ziram solution in three years, bush cutting every year from one to five years, tree trimming and crosscutting in 10 and 15 years, 8% consumption tax, 0.9% annual interest, and 50% subsidy.This total was applied to   in (1).
2.1.8.Yield Model.We defined two types of yield models that differ in the degree of difficulty involved in optimizing them.
(1) Simple Yield Model.This model calculates the yield volume by simply multiplying the yield rate by the stem volume of the harvested trees.We calculated income based on the price in Table 1 corresponding to the DBH of the harvested trees.
The thinning and final cutting yield rates were set to 58% and 65%, respectively.Gaps in the classes in Table 1 were linearly interpolated.The income from a cut was calculated as follows: where  is the income from a cutting per area (yen/ha),  is the yield rate (%),  cut is the total volume of cut trees per area (m 3 /ha),  cut is the price of a cut tree per volume (yen/m 3 ), and  cut is the cutting cost per volume (yen/m 3 ).Hereafter, we refer to this model as S and in combination with the half and full scenarios described in Section 2.1.7,S-H and S-F model are defined.The models are identical to the nonmonotonic pricing yield models used in a previous study [35], except for the self-thinning model and volume calculation used.
(2) Logging Yield Model.This model takes the logging process into consideration.As many 4 m logs (with 0.1 m margin) as possible were bucked, from stump height (0.5 m) to the top of the harvested tree.Diameters were calculated at every height from 4.6 m to the top of the tree in 4.1 m increments, using the stem profile curve, and were assumed to be end diameters.The Japanese Agricultural Standard for logs [56] was applied to the calculations of diameter and volume.The thinning and final cutting yield rates were set to 80% and 90%, respectively.Income from a cut was calculated as follows: where  is the index number of a log from its stump,   is the price of the th log (yen/m 3 ), V  is the volume of the th log (m 3 ) calculated as where   is the end diameter of the th log (cm) (omitted for multiples of two for diameters > 14 cm and omitted for natural numbers otherwise), and  is the number of bucked logs from a harvested tree, calculated as follows: where "floor" rounds down to the next lower integer.
Because the number of bucked logs from a tree and the end diameters were restricted to integers, the NPV was a discontinuous, multimodal function.Hereafter we refer to this model as L and in combination with the half and full scenarios described in Section 2.1.7,L-H and L-F model are defined.

Optimization Method.
This section details the implementation of SA, its application to the thinning rate optimization problem, and the method we used to evaluate its performance.

Simulated Annealing.
SA was developed based on the metal annealing process, which finds the global minimum "energy" required by gradually decreasing a parameter named "temperature."At the beginning, the temperature is high, so frequent transitions occur for both low-energy and high-energy states.This makes it possible to find the global optimum by searching a wide solution space.As the temperature cools, the frequency of transitions to a higherenergy state decreases, and the system tends to transition to a lower-energy state more frequently.Using these features, approximate global minima can be found heuristically.The flowchart is shown in Figure 1.The procedure can be described as a Metropolis algorithm [57] with changes in temperature.SA is an adaptable method that requires the following four components: (1) a cooling function, which controls the rate of decrease in temperature; (2) a proposal density function, which is a probability distribution function (PDF) that generates candidate variables, that is, thinning rates; (3) an energy function, which is the metaobjective function to be minimized; and (4) an acceptance probability, a temperature-dependent PDF that decides whether or not the proposed state is accepted.
The cooling function is defined as an exponential decay function; that is, a decreasing rate determined by the initial and final temperatures is multiplied with temperature at time  to obtain temperature at time  + 1.The proposal density is defined by a normal distribution with a mean of the present value of the target variable and a standard deviation that is optimized as described later.The energy function is defined as follows: where  is the energy function, NPV min  is the tentative minimum NPV, and NPV max  is the tentative maximum NPV.Since our aim was to maximize the NPV, the problem was to minimize the negative NPV.This definition limits the differential of the energy function to the range 0-1.We set Set i = 0 Is the number of iterations per temperature < i?
Is the present temperature < final temperature?Random (0, 1) < acceptance probability?the acceptance probability as the Boltzmann distribution, as follows: where   is the acceptance probability,   is the energy of the candidate state, and   is the energy of the present state.It is difficult to set appropriate temperature bounds to control the acceptance probability, since this depends on the units of the objective function when it is set directly as the energy function.The limits inherent in (25) facilitate control of the acceptance probability.
The following SA parameters are required: (1) number of iterations per temperature; (2) number of total iterations; (3) initial temperature; (4) final temperature; (5) standard deviation of the proposal density.Because of the practical constraints of computing costs, we fixed parameters 1 and 2 to 5 × 10 3 and 2 × 10 5 , respectively.We then obtained the best combinations of parameters 3-5 using CFE.Taking into consideration that log-transformed parameters are more suited for optimization and that final temperature must be lower than initial temperature, we chose the best combinations from 11 3 (1331) patterns as shown below, using the mean NPV of 39 runs with random initial thinning as the objective function, to ensure average performance:   = 10 −  ,   = 0.0, 0.4, . . ., 4.0,   = 10 −(  −  ) ,   = 1.0, 1.5, . . ., 6.0, where   is initial temperature,   is final temperature, Sd  is the standard deviation of the proposal density (cut rate, %), and   ,   , and  sd are the variables to be selected.

Control of Variables.
The variables to be optimized are   for each thinning age.Each of these is restricted to be lower than or equal to that at the previous thinning age.Controlling them directly requires changing the bounds according to the value of   at the previous and next ages.However, this restriction can always be satisfied by constraining the thinning rate at each age to the range 0-50%.No thinning is simulated by setting the thinning rate to 0%.We implemented this method because it simplifies controlling   .

Controlling for Infeasible Solutions. If high thinning
rates are used at multiple cutting ages,   at the final cutting may be <200 trees/ha.Penalty functions are often defined for infeasible solutions but may be difficult to define appropriately [42,46].The quality of definition has a significant influence on performance.Because we placed emphasis on the potential performance of SA and on fair comparison with RS (by random sampling only; see next section), the generation of infeasible solutions was not prevented; rather, candidate variables from the proposal density were sampled repeatedly until the generated solution was feasible.This differs from defining the energy function as infinity for infeasible solutions in that the latter method does not increase iterations if generated solutions were infeasible.

Evaluation.
We derived feasible approximate global optima using CFE (Figure 2).The "combinatorial state" in this case is all patterns of repeated permutations of candidate thinning rates ranging from 0 to 50% in 5% increments.Since this solution was just an approximation based on a coarse thinning rate, it is possible to obtain better solutions (or at least solutions sufficiently close to the optimum) using other algorithms, given satisfactory performance.
We also applied RS to the same yield models, for the purposes of assessing the effectiveness of SA (Figure 3).Independent combinations of thinning rates with a uniform distribution over the range 0-50% were generated at each cutting age.Since this method was implemented as random sampling rather than a random walk, all solutions for each sample were independent of each other.A total of 10 6 feasible solutions were simulated.We ran each model 39 times, and the optimal NPVs were sampled for comparison with those generated by SA.Because the thinning rates are initiated randomly at the beginning of each SA process, the results can be tested statistically [44,58,59].We tested our results to confirm independence between the initial and optimal solutions, differences between solutions generated by SA and RS, and differences between yield models.

Examination of the Basis of Data.
The best values of   ,   , and  sd were 1.2, 6.0, and 1.9, respectively, for the S-H model; 0.8, 1.5, and 1.6 for L-H; 0.0, 5.5, and 1.0 for S-F; and 0.0, 2.0, and 1.0 for L-F.There was no significant correlation between the initial and the optimal NPV for any model (Spearman's correlation test; S-H:  = 0.83; L-H:  = 0.69; S-F:  = 0.86; L-F:  = 0.28).This indicates that the optimal solutions provided by SA can be regarded as independent of their initial solutions.

Comparing SA with CFE and RS.
The NPV derived using CFE was 674,765 yen/ha for the S-H model, 769,711 yen/ha for L-H, 177,065 yen/ha for S-F, and 185,302 yen/ha for L-F.  at final cutting was 2000, 819, 205, and 819 trees/ha for S-H, L-H, S-F, and L-F, respectively.The numerical differences between the optimal NPVs produced by SA and RS and those derived using CFE are presented in Table 2.The median and mean optimal NPVs provided by SA were better than those produced by the RS.In addition, some of the SA runs for each model provided better solutions than the CFE.Comparison of each model's NPVs provided by SA with those by RS reveals distinct differences in the level of variability (Figure 4).With respect to S-H, the standard deviation of the optimal NPVs was less than 1 yen/ha, much smaller than that for the other models (Table 2).SA also provided some low-variability outcomes for the S-F model (Figure 4); however, some of the runs might have converged to local optima (Figure 4) that caused the relatively large standard deviation for this model (Table 2).SA provided better solutions than RS for the L-H and L-F models, particularly with respect to the median, maximum, and mean optimal NPVs, although the standard deviations were larger (Table 2).Each of these values followed the Gumbel distribution (Kolmogorov-Smirnov test; L-H using SA:  = 0.50 and using RS:  = 0.17; L-F using SA:  = 0.58 and using RS:  = 0.20).A Gumbel distribution is defined by two parameters: location (equal to the mode) and scale.There are four possible SA/RS parameter combinations to describe the density of the data: common scale and location; independent scale and location; common scale and independent location; and independent scale and common location.A likelihood ratio test (using the  2 test) between all patterns for each model revealed significant differences ( < 0.05) between SA and RS for both the scale and location parameters.Using maximum likelihood estimation, the estimated L-H location parameters were −5128 for SA and −20,390 for RS.For the L-F model they were −13,663 for SA and −19,216 for RS.The estimated L-H scale parameters were 6362 for SA and 3352 for RS, and for the L-F model they were 8013 for SA and 3796 for RS.

Number of Updates.
In a comparison of the models, the order of performance of the models in terms of the median or mean number of updates (Table 3) is the same as the order of performance in terms of the standard deviation of optimal NPVs (Table 2).The same pattern was apparent for the number of iterations at last update; however, the order of performance is reversed with respect to the medians of the L-H and L-F model results (Table 3).There were significant correlations between the optimal NPVs and the number of iterations at last update or the number of updates for each model except S-H (Spearman's test; iterations at last update: S-H:  = 0.58; S-F:  < 0.001; L-H:  < 0.001; L-F:  < 0.001; number of updates: S-H:  = 0.79; S-F:  < 0.001; L-H:  < 0.01; L-F:  < 0.01).
Fluctuations in NPVs of the worst, best, and median runs of SA for each model are shown in Figure 5.Those of the S-H model included many updates and the trajectories were smooth.This indicates that for this model SA achieved its aim at each stage, that is, rough global sampling at the beginning and local optimization at the end.However, the updates of the S-F model were concentrated in the final stages.The L-F model runs contained the fewest updates over all stages.
3.4.  Trajectory.Thinning rates at each age provided by the best, median, and worst runs of SA for each model are shown in Table 4.There were some differences between the thinning rates reached for each age by CFE and the best SA runs, since the thinning rates of CFE were split into 5% increments.The NPV of the best runs was higher than that of CFE, so the solutions produced by the best runs are closer to the global optima.However, there was some variation between the SA runs of each model except S-H.Even if the thinning rate is constant, the numbers of cut trees differ if numbers of trees per area are different.Figure 6 presents the   trajectories  for each model, which indicate the changes in the number of trees per area as the stand ages.Those of the worst, best, and median S-H model runs were very similar (Figure 6), reflecting almost identical thinning rates (Table 4).The other models showed some discrepancy between the trajectories produced by the worst runs and those produced by CFE or the best runs.With respect to the S-F model,   trajectory of the worst runs differed from the others in 35 years, but converged again in 40 years (Figure 6).In contrast, all   trajectories of the L-H model differed at almost every age.  trajectory of the worst runs for the L-F model varied greatly from the others and that of median also differed from those of the best and CFE at most ages.However, the trajectory of the best run for each model matched that of the CFE closely.

Discussion
There were no significant correlations between the NPVs calculated with the initial solution and that calculated with the optimized solution for any of the models.This indicates that an SA process using optimized parameters could provide solutions, regardless of the initial solutions.
The best runs for all models provided better solutions than the respective CFE, whereas the RS never provided such solutions.The   trajectory of the best runs of each yield model was almost identical to that of the respective CFE.These results suggest that SA can provide an approximate global optimum for similar yield models to those used in this study, if the parameters are optimized and the best solution is chosen from multiple runs.The S-H and S-F models are identical to the nonmonotonic pricing yield models used in a previous study [35] (see Introduction), except for the selfthinning model and volume calculation used.Our results therefore suggest that SA may be a better algorithm than MSPATH for optimizing thinning rates for single even-aged stands, for the purposes of obtaining reliable solutions.
Compared with RS, SA provided highly reliable solutions for the S-H model, as indicated by the negligible difference between the minimum and maximum NPVs provided by the SA runs (<1 yen/ha).With respect to the S-F model, a few of SA runs may have converged on local optima, although even those NPVs were higher than most of those produced by the RS runs.In contrast, the scale and location parameters of the Gumbel distribution of NPVs produced by SA for the L-H and L-F models were both significantly larger than those produced by the RS.The difference in location parameters means that NPVs produced by SA runs were higher overall than those produced by RS.However, the scale parameter determines the variance of the Gumbel distribution, so this indicates that the variation in NPVs produced by SA was larger than those produced by RS.
The only difference between the H and F model definitions was the final cutting cost, but this resulted in substantial differences in the variance of the optimal NPVs and the amounts of variation in the   trajectories.It is possible that the high degree of sensitivity of the yield model caused large changes in the form of the objective function.In fact, in the S-F model, optimal   at final cutting was 205 trees/ha, based on CFE, which is close to the lower bound of   values, whereas in the S-H model it was 2000 trees/ha.This difference indicates that the change in final cutting cost resulted in large changes in the optimal solution.This suggests that the degree of variation in NPVs provided by SA was strongly influenced by the yield model.
Bettinger et al. [60] examined the performance of eight optimizing algorithms for spatial forest planning, using three benchmark models.They showed that SA, the parameters of which were decided based on "several trials," could provide values close to the global optima or assumed "best" values for two of the models with a high probability, but not for the other model or the other algorithms.Our study shows that this is a common problem for the optimization of thinning rates for single stands; hence the variation in the solutions provided by SA depends on the model.In cases where the performance of SA is influenced by the parameters [45], this can be reduced by optimizing the parameters of SA themselves, as in the present study.However, our results suggest that this problem cannot be solved entirely by optimizing the parameters.
Considering these results, we suggest that using SA as demonstrated here has the potential to achieve the approximate global optima for thinning rates for single stands, if the controlling parameters are optimized.However, it is necessary to execute multiple runs and choose the best solutions from among them.
Despite the above, the reliability of SA is weakened by the large variation between the L-H and L-F models, and it is important to decrease this variation.The results of updating processes (in Section 3.3) indicate that few updates of L-H and L-F are the main cause of the large variation.Whereas the controlling parameters have been optimized, there is little room to improve them, and a simple increase in the number of iterations leads to increased computing time.Therefore other approaches may be advisable.In this study, we used orthodox SA because it is widely used.However, different approaches to SA implementation may be more efficient, for example, fast annealing [61] or adaptive annealing [62], or other types of SA such as the great deluge algorithm [63] used by Bettinger et al. [60].At the same time, Borges et al. [46] showed that improving the neighborhood search method can enhance the performance of SA for spatial planning.Attempting to decrease the variation in optimal NPVs of SA and enhancing its reliability using all of these methods may be worthwhile.

Conclusion
With the aim of establishing a reliable method for optimizing thinning rates for single even-aged stands, we evaluated the performance of SA by comparing it with RS and CFE, using optimized parameters and four benchmark models.A comparison of 39 runs of the SA (2 × 10 5 iterations of feasible solutions) and RS (10 6 iterations) indicated that SA generally provided better solutions than RS.Further, the best SA run of each model provided better solutions than CFE did for that model.Therefore, if the controlling parameters for the SA are optimized, as in this study, the best of multiple runs has a high probability of approximating the global optimum.Because stand-level thinning optimization methods have rarely been compared with exact solutions such as those provided by CFE, their reliability has not previously been tested.At present, SA is the only method validated to provide solutions superior to those provided by CFE for the same problem.
However, the variance of the NPVs provided by SA differed noticeably between yield models: for the S-H model, the standard deviation of the optimal NPVs was <1 yen/ha, but for the L-H and L-F models it was substantially larger than that provided by RS.The dependence of the variance on the yield model decreases the reliability of SA and makes it necessary to select the best from among multiple runs.To increase the reliability of the SA method, the variation of optimal solutions within and between yield models should therefore be decreased.

Figure 4 :
Figure 4: The difference between the optimal NPVs produced by SA (black bars) or RS (gray bars) and that derived using CFE (upper: S; lower: L; left: H; right: F).

Table 1 :
Relationship between end diameter and log price.

Table 2 :
Differences between optimal NPVs produced by SA and RS and that derived using CFE.

Table 3 :
Summary of the updating processes of SA.

Table 4 :
Optimized thinning rates (%) provided by CFE and SA.