Resolution of the Min-Max Optimization Problem Applied in the Agricultural Sector with the Estimation of Yields by Nonparametric Statistical Approaches

The ultimate objective of the problem under study is to apply the min-max tool, thus making it possible to optimize the default risks linked to several areas: the agricultural sector, for example, which requires the optimization of the default risk using the following elements: silage crops, annual consumption requirements, and crops produced for a given year. To minimize the default risk in the future, we start, in the first step, by forecasting the total budget of agriculture investment for the next 20 years, then distribute this budget efficiently between the irrigation and construction of silos. To do this, Bangladesh was chosen as an empirical case study given the availability of its data on the FAO website; it is considered a large agricultural country in South Asia. In this article, we give a detailed and original in-depth study of the agricultural planning model through a calculating algorithm suggested to be coded on the R software thereafter. Our approach is based on an original statistical modeling using nonparametric statistics and considering an example of a simulation involving agricultural data from the country of Bangladesh. We also consider a new pollution model, which leads to a vector optimization problem. Graphs illustrate our quantitative analysis.


Introduction
The study is based on the quantification of the risk of having the need exceeding the production and the quantity of the production already ensiled; the same quantification will be applied in the case where the production of a given year exceeds both the need as well as the capacity of the silos for the same year; the idea is to calculate these risks of faults over 20 years in the future, based on the total investment amounts allocated for irrigation and the construction of the silos planned over these 20 years, via 3 calculation methods simulated in N iterations (N distributions of the amounts of irrigation and construction of the silos), and optimize them via a vector Pareto optimization algorithm for faults calculated by considering the pollution and those calculated without the assumption of the pollution; the strategies (belonging to the Pareto front) said optimal strategies will be linked to their amount of investment in irrigation and construction of silos affected and based on the total of these planned investment amounts. Therefore, it is up to the decision-maker to choose a strategy for allocating these investment amounts, among all optimal strategies retained (strategies belong to the Pareto front).

The Problem Schematization
2.1. The General Idea of the Work:. Minimize the difference between the production PðnÞ , the requirement φðnÞ, and the quantity to be removed or ensiled QðnÞ after having maximized the risk through 3 scenarios that will be well explained after.
The variables S tot = S n ð Þ + s n ð Þ, P n ð Þ = p n ð Þ * S n ð Þ + q n ð Þ * s n ð Þ, ð1Þ with S tot as the total cultivated area in year n, SðnÞ the surface area of irrigated land with yield p in year n, sðnÞ the surface of nonirrigated land with yields q in year n, PðnÞ the total production in year n, pðnÞ the yield of irrigated land, and qðnÞ the yield of nonirrigated land. Knowing the requirement φðnÞof year n, the difference between the production PðnÞ of year n and the requirements can be explained as follows: If : P n ð Þ − φ n ð Þ > 0 : quantity to ensile Otherwise∶the silo stock is to be consumed ð2Þ However, the quantities to ensile or to consume are not often equal, respectively, to the capacity of the silo or to the harvest stock for a given year, which requires an optimization of resources. To do this, we need to calculate the default Δ, as follows: (iii) P cons ðn − 1Þ: the budget allocated to the construction of the silo for year n (iv) c x : the cost price of a unit of silo capacity: (v) P irrig ðn − 1Þ: the budget allocated to irrigation for year n (vi) c y :the expenditure per unit of irrigated land (vii) φðnÞ: the product requirements in year n Therefore, the total budget P tot ðnÞ is given by 3. Fundamental Elements for the Calculation The data was collected via the FAO site, selected country: Bangladesh (total agricultural land = 8:549 million ha and irrigated land areas) [1,2].
The surface area of nonirrigated lands has been deducted as shown in Table 1.
(3) Irrigation and silo construction budgets: based on FAO data (if we consider that the unit costs of irrigation and silo construction are, respectively c y = 622 $/ha [3] and c x = 25$/tonne [4] (for the case of tank stores, source FAO)), and based on the total production of the crops and the irrigated land area, we will have Table 2 It is also assumed that 75% of the total production is devoted to cereals; therefore, the total production can be approximated to the production of cereals.
(4) Irrigated and nonirrigated lands yields: based on FAO publications [5], the yields of irrigated land are recorded between 5 tonnes/ha and 13 tonnes/ha from an irrigation of 4500 m 3 /ha and from 0 tonne/ha to 5 tonnes/ha for nonirrigated lands; therefore, we go through a numerical simulation of the respective yield values of irrigated and nonirrigated lands to deduce the simulated productions, while assuming these yields, do not take into account the factor of pollution (the yields of the supposedly polluted land is to be deduced later) (see Table 3) In order to determine the distribution function of the returns p and q in order to deduce the expectations, nonparametric methods will be the subject of the next part of this study.

Nonparametric Estimation of Yield
n is the number of the observed values in the interval ½a, b½, and x is a continued value that belongs to the same interval that we are looking for its probability formula.
The density function [6] associated to x is in the following form:f Such thatp k ðxÞ Abstract and Applied Analysis Therefore, the problem comes down to estimating the probability vector:p While the question that arises is how to choose h, this comes down to choosing the number of intervals to have a well-smoothed distribution (do not fall in the case of a histo-gram called "oversmoothing" when h is larger or the opposite case of a histogram called "undersmoothing" when h tends towards zero).

(1) The quadratic risk off H ðxÞ
The solution proposed in this sense is to establish a risk function RðhÞ and minimize it as a function of the window h; the function adequate to this problem is that expressing the expectation of the quadratic deviation betweenf H ðxÞ and f ðxÞ: However, ∑ n i=1 I ½L k−1 ,L k ½ ðx i Þ = n k and ∑n k = n; this says that n k~B ðp k , nÞ : the law of succeeding an experiment with a probability p k = n k /n, repeating n times.
Therefore, we will have which implies and that Finally, the expectation of the quadratic deviation off H ðxÞ is written as follows: (2) The quadratic risk off H ðxÞ integrated For the function to be as a function of h, the EQMðh, xÞ just mentioned, it will be convenient to integrate the function over the interval ½a, b½ : It seems that the speed of convergence of RðhÞ depends on nh, which leads us to reformulate the RðhÞ and going through the limited expansion of f ðxÞ : Passing through the limited development of the first Taylor degree at pointx, we will have Therefore, Proof of Equation 19.
Since v ðtÞ is negligible in front of ðt − xÞ, then However, we have Therefore, Abstract and Applied Analysis Finally, which implies After the calculation is done, we will have Finally, we will have For its part, the variance is given by If we approach the quantity ð1 − Ð c k f ðtÞdtÞ to 1, the variance can be expressed in the following form: Finally, However, the quantification of this risk function, in other words, the determination of optimal h, must be carried out over the entire interval ½a, b½; therefore, the integral is applied only to the nonnegligible part of EQMðx, hÞ: The optimal h corresponds to finding This leads us to look for the argument of the null derivative of RðhÞ. Therefore, However, we do not have the distribution of f , to deduce Ð b a ðf ′ðxÞÞ 2 dx.
To do this, it will be necessary to go through the "crossvalidation" estimator.

Abstract and Applied Analysis
(3) The "cross-validation" formula of the estimation It seems that, according to the formulation of Integrate-dRisk, the calculation of Ð b a ðf ′ ðxÞÞ 2 dx turns out to be difficult in practice; hence, the idea is to calculate de RðhÞ − Ð f 2 ðtÞdt; this amounts to calculating the formula "cross-validation" (see Figure 1): If we estimate Eðf H ðxÞÞ = p k /h, we will havȇ Finally, we have And finally, F M ðxÞ ⟶ M→+∞ FðxÞ almost certainly.
The derivative of the function FðxÞ is given by This implies Such an estimator is known as the Rosenblatt estimator.
Generally, we put [8] c K is called the kernel of the estimate; generally, this kernel can be illustrated following different paces [9]: (1) The quadratic risk off K ðxÞ The mean square deviation as mentioned in the histogram approach is defined as follows: Definition 1. Let T be an interval in R and the pair ðβ, LÞϵR +2 , 6 Abstract and Applied Analysis the class Hölder H ðβ, LÞ defined on T is the set of differentiable functions l = bβc which satisfy Definition 2. Let l ≥ 1 be an integer; we would say that K : R ⟶ R is a kernel function of order l [10] if the functions v ⟶ v j KðuÞ, j = 0, ⋯ ⋯ , l, are integrable and which satisfy ð In the following, we will focus on the kernel function of order 2.
Definition 3 (Cauchy-Schwarz inequality). If f and g are two integral functions on T, then the absolute value of the integral of their product satisfies the following inequality: Let us calculate the upper bound of the bias [11,12]: Let ℘ðβ, LÞ = f f ϵH ðβ, LÞ/f ≥ 0, Ð f ðyÞdy = 1∀y ∈ Rg. Going through a Taylor expansion of order 2 of pðuh + xÞ, we have However, we previously assumed that the kernel function 7 Abstract and Applied Analysis is of order 2, and therefore, The hypothesis of p ∈ ℘ðβ, LÞ implies Following Cauchy-Schwartz's theorem, we will have the following inequality: Knowing that the calculation of the expectation is done on the interval ½a, b½: u > 0 if y > x and vice versa. Finally, Likewise, the variance of b f K ðxÞ is calculated as follows: The maximum of a probability function = 1: Put Computing h opt amounts to minimizing the upper bound of EQMðx, hÞ.
(2) The quadratic risk off K ðxÞ integrated Computing h opt from RðhÞ amounts to deducing it from the following integral: (3) The "cross-validation" formula of the estimation As mentioned previously, the cross-validation formula "CV ðhÞ" of the estimation of distributions is defined by Let us saŷ is an unbiased estimator of Eð Ðf K ðxÞf ðxÞdxÞ. This amounts to demonstrating that Under the assumption that X j are i, i, d, therefore we have Likewise,

Theoretical Calculation of Strategies
Recall that the default risk ΔðnÞ, for a given year n, "as discussed by Moiseev [13]," is calculated as follows: The nonparametric estimate of yield expectations will allow the calculation of strategies for the distribution of budgets between irrigation and the construction of silos over the next N years; these strategies are given by

Abstract and Applied Analysis
The project strategy can be calculated in another way: Strictly positive or strictly negative defaults do not have the same meaning. This leads us to calculate the budget allocation strategy using the following formula: The problem of optimizing the min-max comes down to choosing the optimal strategy for the allocation of budgets that minimizes the risk of default; this amounts to finding the following scalar minimum: In the case where the pollution factor is considered, the new default risk values will be obtained from the formula below:   Figure 1) of calculating the risk of default Δ"delta" and Δ * "delta_pollution" for as mentioned in Figure 1 under the R software: the last iteration of simulation.

Abstract and Applied Analysis
p * , q * , and Q * ðnÞ are, respectively, the yield of irrigated land, the yield of nonirrigated land, and the quantities to ensile or to consume by integrating the factor of pollution in the modeling of the risk of default, then go through a vector optimization between the strategies as a 14 Abstract and Applied Analysis function of ΔðnÞ and that calculated as a function of Δ * ðnÞ; this leads us to calculate: Then, minimize the following vector:

Multiobjective Pareto Optimization (Pareto Front)
A multiobjective optimization problem presents itself, in general, as follows [14,15]: h k x ð Þ, k = 1, ⋯ ⋯ :,l: We will say that Y dominates [16] Z, and we write Y≺ P Z, if and only if [17] ∀i ∈ 1, ⋯, p, U i ≤ U i ′ , A solution X ′ = ðX 1 ′, X 2 ′, ⋯ ⋯ :X n ′Þ of the multiobjective problem P is said to be Pareto optimal [18] if there is no point of the decision space X * = ðX * 1 , X * 2 , ⋯ ⋯ :X * n Þ which dominates X ′ such that F i ðX * Þ ≤ F i ðXÞ for each i and j which verifies that F j ðX * Þ < F j ðXÞ. All of these nondominated points in the Pareto sense constitute the Pareto front [19]. In our example, the vector of irrigation prices over the next 20 years will be the vector of decisions and the vector of objectives is the vector ðJ i , J * i Þ. Notice that "yes" means the corresponding identity holds and "no" means the identity does not hold.
Once the strategies are simulated, we will have a data frame of two columns and of length Nrows tab of (J i , J * i ) (see Figure 2). Therefore and by applying the principle of dominance, the efficient strategies (Pareto Front strategies) are calculated by the algorithm of the Figure 3.

Empirical Part
Following a Shapiro-Wilk test [20] of the Gaussian law, the p value of the test (>5%) indicates that the yields of irrigated and nonirrigated land follow a Gaussian law (Figure 4).
Under the assumption of the normal distribution of these returns, we choose to calculate their expectations through a nonparametric estimate of the density by the kernel method by the choice of a Gaussian kernel, through the following program in Figures 5 and 6.
Since the hypothesis of the normality of yields of nonirrigated land and those of irrigated land is verified, the median and the mean must be equal, given the Gaussian law is a symmetrical law.
Through the "summary" command, we notice that the arithmetic mean is not significant for estimating the expectation because it is different from the median.
By using the kernel estimate, the average obtained corresponds perfectly to the median; this means that the estimate of the density by this method turns out to be robust for the calculation of the expected returns (mean = median = E ðqÞ = 2,215 tonnes/ha for nonirrigated land and mean = median = E ðpÞ = 9, 38 tonne/ha for irrigated land).
We assume that the pollution factor can reduce the yields of irrigated and nonirrigated land by 12% and 20%, respectively. The average of the returns by the kernel Figure 15: The optimal Pareto values of (J i , J * i ) simulated. 16 Abstract and Applied Analysis method makes it possible to deduce that Eðq * ðnÞÞ = 1, 77 tonnes/ha and Eðp * ðnÞÞ = 8, 25 tonnes/ha. The yields qðnÞ and pðnÞ were simulated by the numerical "Box-Müller" method taking into account the intervals of the values of qðnÞ and pðnÞ according to the FAO site (from 0 to 5 tonnes/ha for nonirrigated land and more than 5 tonnes/ha for irrigated land).
The nonsimulated data series cover the period (area of irrigated land) 2001-2016 (source: FAO site); thereafter, the areas of irrigated land will be forecast over the period 2017-2036.
Total agricultural land = 8,549 million ha (to deduct the surface areas of nonirrigated land).
The total budget to be invested will be planned based on the ARIMA time series (p, d, q) [21], then will be distributed between irrigation and construction of silos; the total budget to be invested will be forecast [22] based on the ARIMA time series (p, d, q), then will be distributed between irrigation and construction of silos.
If a process X t follows an ARIMA (p, d, q) model, then it is represented by the following relation: such that (i) ε t is white noise (ii) d is order of differentiation of the process X t (iii) L is the delay operator If X t follows an ARIMA (p, d, q) process, so Y t follows an ARIMA (p, q) process.
In practice, the series of budgets was not stationary following the increased Dickey-Fuller test (p value > 5%) and therefore, the alternative hypothesis of stationarity is rejected.
After two differentiations, the series of investment budgets became stationary (p value < 5%) as shown in the code in Figure 7.
To determine the degrees p and q of the series, we use the partial autocorrelation function PACF and the autocorrelation function ACF, respectively (Figures 8 and 9, respectively).
From the two graphs, we can first assume that p = 2 and q = 1 (after these values, the PACF and AFC functions practically cancel each other out).
To ensure the choice of the appropriate model, we proceed to build the AIC matrix (Akaike Information Criterion) of the ARIMA models (i, 2, j) with i ranging from 1 to 3 and j ranging from 1 to 3 and retain the model which corresponds to the minimum AIC index ( Figure 10).
The AIC matrix confirms that the model to be retained is ARIMA (2, 2, 1).
The forecast of this series under the software over the period 2017-2036 is represented by the graph in Figure 11.
The construction of the budgets allocated to irrigation and the construction of silos over the period 2001-2016 is based, respectively, on the unit prices c y = 622$/ha and c x = 25$/tonne; then, their forecasts (2017-2036) will be a distribution simulation of the series of investment budgets between irrigation and the construction of silos by digital simulation methods (we go through 4 distribution simulations of the total investment budget).
The product quantity requirements for the year over the period 2017-2036 will be simulated; after the calculation of the "delta" faults under the software, we move on to calculate the (J i , J * i ) with i = 1, 2, 3 over the period 2017-2036. An example of calculation of Δ * and Δ for the Nrows tab = 249; the number of simulation of (J i , J * i ) with i = 1, 2, 3 is 17 Abstract and Applied Analysis N simule = 249/3 = 83 ( tab, Nrows tab and N simule are mentioned in Figure 1), (Figure 12).
The table of the values (tab) of the strategies ðJ i , J * i Þ = ðX3, X4Þ simulated (in each iteration of simulation, we obtain a distribution of the total investment budget between the construction of the silos and the irrigation over 20 future years) follows Figure 2 (Figure 13).
The strategies belonging to the so-called efficient Pareto front calculated following Figure 1 are presented as follows (the strategies which do not belong to the front register a missing value NA as indicating the algorithm) ( Figure 14).
After eliminating the missing values, NA was obtained by the Pareto front algorithm (Figure 15).
By going through the program encoded on R [23], we can plot the graphs the strategies obtained following Figure 1 and the Pareto front following Figure 3 (Figure 16).

Conclusion
Therefore, each point belonging to the Pareto front corresponds to its vector of irrigation and construction of silos amounts and it is up to the decision-maker to choose the distribution of the total investment budget between irrigation and the construction of silos, which gives a strategy belonging to the Pareto front and minimizing the agricultural risk defaults in the case of Bangladesh in future. In this article, the min-max tool was suggested to be an appropriate formula to determine strategies said "optimal" especially when we use the dimension rather than d = 1 and to give an exhaustive study, other mathematics principles are invited to apply nonparametric statistics, ARIMA time series, etc.

Data Availability
All data are available on the FAO website.

Conflicts of Interest
The authors declare that they have no conflicts of interest.