Ranking Methods for Multicriteria Decision-Making: Application to Benchmarking of Solvers and Problems

Evaluating the performance of solvers (e.g., computer programs), known as the solver benchmarking problem, has become a topic of intense study, and various approaches have been discussed in the literature. Such a variety of approaches exists because a benchmarking problem is essentially a multicriteria problem. In particular, an appropriate multicriteria decision-making problem corresponds naturally to each benchmarking problem and vice versa. In this study, to solve the solver benchmarking problem, we apply a ranking-theory method recently proposed for solving multicriteria decision-making problems. The benchmarking problem of differential evolution algorithms was considered as a case study to illustrate the ability of the proposed method. This problem was solved using ranking methods from different areas of origin. The comparisons revealed that the proposed method is competitive and can be successfully used to solve benchmarking problems and obtain relevant engineering decisions. This study can help practitioners and researchers use multicriteria decision-making approaches for benchmarking problems in different areas, particularly software benchmarking.


Introduction
Recently, evaluating the performance of solvers (e.g., computer programs), that is, the problem of solver benchmarking, has attracted significant attention from scientists. Currently, most benchmarking tests produce tables that present the performance of each solver for each problem according to a specified evaluation metric (e.g., the central processing unit (CPU) time or the number of function evaluations) and use various statistical tests to draw conclusions. Thus, the selection of the benchmarking method currently depends on the subjective tastes and preferences of individual researchers. The components of the benchmarking process, including the solver set, problem set, metric for performance assessment, and statistical tools for data processing, are chosen individually according to the researcher's preferences. For example, the performance profile method, which is currently the most popular and widely used method in practice (see [1]), is based on a comparative analysis of empirical probability distribution functions obtained in numerical experiments with different solvers.
In this study, we consider the benchmarking process from a viewpoint that emphasizes the natural relations between problems and solvers, as determined by their evaluation tables (see [2]). Specifically, we present data for benchmarking in the form of a so-called benchmarking context, that is, a triple 〈S, P, J〉, where S and P are sets of solvers and problems, respectively, and J: S × P ⟶ R is an assessment function (a performance evaluation metric). Throughout the paper, the sets of solvers and problems are assumed to be finite.
This concept is quite general and emphasizes that problems, solvers, and assessment functions must be considered closely related and not independent. The benchmarking procedure presented in this study is described as follows. The data encapsulated by the given benchmarking context 〈S, P, J〉 are used to build the corresponding multicriteria decision-making (MCDM) problem 〈A, C〉, where A = S is the set of alternatives, and C = {J(·, p) | p ∈ P} is the set of criteria. Hence, we define a decision matrix as a matrix whose elements exhibit the performance of different alternatives (i.e., solvers) concerning various criteria (i.e., problems) through the assessment function.
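The correspondence just described is mechanical, and a minimal sketch may make it concrete. All solver names, problem names, and assessment values below are hypothetical, chosen only to illustrate how a benchmarking context 〈S, P, J〉 yields a decision matrix.

```python
# Sketch: turning a benchmarking context <S, P, J> into an MCDM decision
# matrix. The solver/problem names and timings below are hypothetical.
solvers = ["s1", "s2", "s3"]                      # set S of alternatives
problems = ["p1", "p2"]                           # set P of criteria
J = {("s1", "p1"): 4.0, ("s1", "p2"): 9.0,        # assessment function J(s, p)
     ("s2", "p1"): 6.0, ("s2", "p2"): 5.0,
     ("s3", "p1"): 4.5, ("s3", "p2"): 7.0}

# Decision matrix X: rows = alternatives (solvers), columns = criteria
# (problems); x[i][j] = J(s_i, p_j), lower is better for every criterion.
X = [[J[(s, p)] for p in problems] for s in solvers]
```

Row i then lists the cost of solver s_i on every problem, which is exactly the decision-matrix convention used in the rest of the paper.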
Thus, the investigation of benchmarking problems is reduced to an MCDM problem. Moreover, for each MCDM problem, a corresponding benchmarking context can be presented. The rationale for such a consideration is that a vast array of different approaches for MCDM problems can be used for the analysis of benchmarking problems. In particular, such a multicriteria formulation allows the consideration of Pareto-optimal alternatives (i.e., solvers) as "good" solvers. The next innovation presented in this study is that a recently proposed technique (see [3]) is used to solve the MCDM problem corresponding to a benchmarking problem. To clarify the essence of the new technique used in this study, note that the multicriteria formulation is a typical starting point for theoretical and practical analyses of decision-making problems. Correspondingly, based on the fundamental concept of Pareto optimality, several methods and computational procedures have been developed to solve MCDM problems (see, e.g., the overviews [4][5][6][7][8] and, more recently, [9][10][11]). However, unlike single-objective optimization, a characteristic feature of Pareto optimality is that the set of Pareto-optimal alternatives is typically large. In addition, all these Pareto-optimal alternatives must be considered mathematically equal (equally "good"). Correspondingly, the problem of choosing a specific Pareto-optimal alternative for implementation arises because the final decision must usually be unique. Hence, additional factors must be considered to aid decision-makers in selecting specific or more favorable alternatives from the set of Pareto-optimal solutions. Therefore, we build a special score matrix for the MCDM problem, which allows us to construct the corresponding ranking of the alternatives [3]. The score matrix can be built in different ways, but we use the simplest and most natural method: in this study, the score matrix counts how many times one alternative is better than another according to the criteria.
Hence, the proposed approach may yield an "objective" ranking method and provide an "accurate" ranking of the alternatives for MCDM. Correspondingly, the best-ranked alternative from the Pareto set is declared a "true" solution to the MCDM problem. The approach presented in this study for solving MCDM problems is useful when no decision-making authority is available or when the relative importance of the various criteria has not been evaluated in advance.
Finally, we demonstrate the possibilities of the proposed method in a case study based on the computational and experimental results for benchmarking differential evolution (DE) algorithms presented by Sala et al. [12]. Specifically, we benchmark nine DE algorithms on a set of 50 test problems using the random-sampling-equivalent expected run time (ERT_RSE) performance metric. Through a numerical investigation, we demonstrate that the solution results of the MCDM problem obtained using the methods proposed in this study are quite competitive.

Contributions.
This paper makes the following main contributions:
(1) The concept of the benchmarking context is introduced according to [2], and it is confirmed that a one-to-one correspondence exists between the set of benchmarking contexts and the set of MCDM problems.
(2) The ranking-theory approach is proposed for solving MCDM problems corresponding to a given benchmarking context [3].
(3) The approach proposed in this article is tested on a known literature dataset for benchmarking DE algorithms (see [12]), and the possibility of effectively solving benchmarking problems is fully confirmed.
Without claiming to be a complete review, we present a brief overview of the literature on the benchmarking problem in the context of optimization problems. Generally, the consideration of a benchmarking problem is motivated by various reasons, such as selecting the best solver (algorithm, software, etc.) for some class of problems, testing proposed novel solvers, and evaluating solver performance under different option settings. For example, early contributions to the benchmarking of optimization algorithms are considered in [13]. The results achieved at an early stage in the development of the subject can be judged from the work of the following researchers: Nash and Nocedal [14], Billups et al. [15], Conn et al. [16], Sandu et al. [17], Mittelmann [18], Vanderbei and Shanno [19], and Bondarenko et al. [20]. The beginning of a new stage of development is associated with the research work of Dolan and Moré [21], in which the performance profile comparison technique was proposed.
This technique is now prevalent (but see, e.g., Gould and Scott [22]). Along with the performance profile comparison method, other more direct approaches have also been used in modern research. An idea of modern research in the area under consideration can be obtained from the following examples: Moles et al. [23], Mittelmann [24], Benson et al. [25], Kämpf et al. [26], Foster et al. [27], Rios and Sahinidis [28], Weise et al. [29], Sala et al. [12], and Cheshmi et al. [30]. A critical overview of the current state of the subject area was provided by Beiranvand et al. [1].
To conclude this brief overview, we note that this study focuses on benchmarking solvers of optimization problems only. However, the concept of benchmarking has a much broader context (see, e.g., https://en.wikipedia.org/wiki/Benchmarking). The approach proposed in this article is quite general and can also be applied in other areas, but we do not consider this possibility here.

Notation.
Throughout the article, the following general notation is used: N is the set of natural numbers; for a natural number n ∈ N, R^n denotes the n-dimensional vector space, and ‖·‖_p is the l_p-norm in R^n. If not otherwise mentioned, we identify a finite set A with the set N(n) = {1, . . . , n}, where n = |A| is the cardinality of set A. We also introduce notations for special vectors and sets for any n ∈ N; in particular, R^n_+ ⊂ R^n is the positive orthant. When necessary, we also identify a matrix Π ∈ R^{n×m} with the map Π: N(n) × N(m) ⟶ R. For a matrix Π ∈ R^{n×m}, we denote its transpose by Π^T ∈ R^{m×n}.

Outline.
The remainder of this paper is structured as follows. In Section 2, all necessary theoretical preliminaries regarding the MCDM problem (Section 2.1) and ranking-theory methods for solving MCDM problems (Section 2.2) are presented. Section 3 introduces the concept of benchmarking contexts and discusses its relationship with the MCDM problem. In Section 4, the case-study problem of DE algorithm benchmarking is investigated numerically. Finally, the conclusions are presented in Section 5.

Multicriteria Decision-Making
Problems. We use the following notation from the general theory of multicriteria optimization [31]. We consider the MCDM problem 〈A, C〉, where A = {a_1, . . . , a_m} is a set of alternatives, and C = {c_1, . . . , c_n} is a set of criteria, that is, c_i: A ⟶ R, i = 1, . . . , n. Hence, we introduce the decision matrix X = [x_ij], where x_ij = c_j(a_i) is the performance measure of alternative i ∈ N(m) on criterion j ∈ N(n). Without loss of generality, we assume that a lower value is preferable for each criterion (i.e., each criterion is nonbeneficial; see [32]), and the goal of the decision-making procedure is to minimize all criteria simultaneously. Furthermore, A is the set of admissible alternatives, and the map c→ = (c_1, . . . , c_n): A ⟶ R^n is the criterion map (correspondingly, c→(A) ⊂ R^n is the set of admissible values of the criteria). A point ξ^I = (ξ^I_1, . . . , ξ^I_n) ∈ R^n, where ξ^I_j = min_{a∈A} c_j(a), j ∈ N(n), is called the ideal point. An ideal point is considered attainable if an alternative a^I ∈ A exists such that ξ^I = c→(a^I). The following concepts are also associated with the criterion map and the set of alternatives. An alternative a* ∈ A is Pareto-optimal (efficient) if no a ∈ A exists such that c_j(a) ≤ c_j(a*) for all j ∈ N(n) and c_k(a) < c_k(a*) for some k ∈ N(n). The set of all efficient alternatives is denoted by A_e and is called the Pareto set. Correspondingly, c→(A_e) is called the efficient front.
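Since the Pareto set A_e is central to everything that follows, a short sketch of how it can be computed from a decision matrix may be useful; this is a direct transcription of the definition above (all criteria minimized), not code from the paper.

```python
def pareto_set(X):
    """Indices of Pareto-optimal rows of decision matrix X (all criteria
    minimized): row i is efficient if no other row dominates it."""
    def dominates(a, b):
        # a dominates b: a <= b componentwise, with at least one strict <
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))
    return [i for i, row in enumerate(X)
            if not any(dominates(other, row)
                       for j, other in enumerate(X) if j != i)]
```

For example, `pareto_set([[1, 3], [2, 2], [3, 1], [3, 3]])` returns `[0, 1, 2]`: the first three rows are mutually nondominated, while the last is dominated by `[2, 2]`.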
Pareto optimality is an appropriate concept for solutions to MCDM problems in general. However, the set A_e of Pareto-optimal alternatives is typically large, and all alternatives from A_e must be considered "equally good solutions," whereas the final decision must be unique. Hence, additional factors must be considered to aid in selecting specific or more favorable alternatives from the set A_e. We cannot provide a detailed analysis of these methods; however, interested readers can become acquainted with them through the overviews [4][5][6][7][8]. Furthermore, we consider only the method proposed by Gogodze [3], without diminishing the value of more classical methods.

Ranking Methods and Their Applications to MCDM Problems.
This section provides a brief overview of the basic concepts of ranking theory (see, e.g., [33] for further details) and presents the necessary formal definitions. For a natural number N and a matrix S ∈ R^{N×N}, the pair (N(N), S) is a ranking problem. We assume (conditionally) that the elements of N(N) are athletes (or sports teams) who compete in matches between themselves. Moreover, M(i, j) denotes a joint match for each pair of athletes (i, j), 1 ≤ i, j ≤ N, and we interpret entry S_ij, 1 ≤ i, j ≤ N, of matrix S as the total score of athlete i against athlete j in match M(i, j). In addition, athlete i scored against athlete j in match M(i, j) if S_ij > 0, and athlete i has beaten athlete j in match M(i, j) if S_ij > S_ji. Based on the introduced notation, we define the following concepts. A weak order on the set N(N) is a transitive and complete relation. A map R_N is called a ranking method if, for any given ranking problem (N(N), S), R_N(S) is a weak order on the set N(N). Any vector r = (r_1, . . . , r_N) ∈ R^N can be considered a rating vector for the elements of N(N), in the sense that each r_i, 1 ≤ i ≤ N, can be interpreted as a measure of the performance of player i ∈ N(N). For the ranking problem (N(N), S), a rating vector induces a ranking in the natural way: i is ranked at least as high as j whenever r_i ≥ r_j. For illustrative purposes, we consider only a few of the many ranking methods discussed in the literature. All of these methods are induced by their corresponding rating vectors. The considered ranking methods originate from different areas, such as athlete/team ranking in sports, citation indices, and website ranking. Hence, all of them reflect some (as a rule, intuitive) human experience regarding the solution concept of the ranking problem. A brief overview of the ranking methods used in this article is provided in the Appendix.
We can now unite all the information described above and demonstrate that, for any MCDM problem, we can construct the necessary matrices (e.g., S, P(S), and A(S)) and, therefore, apply a suitable ranking method to solve the MCDM problem. To simplify the perception of the constructions described below, we use sports terminology. We assume that 〈A, C〉 is an MCDM problem (see Section 2.1) with a set of alternatives A = {a_1, . . . , a_m} and a set of nonbeneficial criteria C = {c_1, . . . , c_n} and that the decision-making goal is to minimize the criteria simultaneously. To construct matrix S, we imagine that the number of athletes is N = m and that they compete in an n-athlon (i.e., each match M(i, j), 1 ≤ i, j ≤ m, includes competitions in n different disciplines). For illustrative purposes, we use the simplest method for score calculation: for criterion k ∈ N(n), we set s^k_ij = 1 if c_k(a_i) < c_k(a_j) and s^k_ij = 0 otherwise, and S_ij = Σ_{k∈N(n)} s^k_ij. Thus, for criterion k ∈ N(n), the equality s^k_ij = 1 means that c_k(a_i) < c_k(a_j) and the alternative a_i (i.e., athlete i ∈ N(m)) receives one point (i.e., athlete i ∈ N(m) wins the competition in discipline k ∈ N(n)). Correspondingly, S_ij indicates the total number of wins of athlete i against athlete j, and S = [S_ij] is the score matrix for the set of alternatives.
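The score-matrix construction just described can be sketched directly; this is a plain transcription of the counting rule (one point per criterion won), with ties awarding no points to either side.

```python
def score_matrix(X):
    """S[i][j] = number of criteria on which alternative i strictly beats
    alternative j (a lower value wins, since criteria are nonbeneficial)."""
    m = len(X)
    return [[sum(1 for xi, xj in zip(X[i], X[j]) if xi < xj) if i != j else 0
             for j in range(m)] for i in range(m)]
```

For two alternatives with criterion values `[1, 5]` and `[2, 4]`, each wins on exactly one criterion, so `S = [[0, 1], [1, 0]]`.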
Thus, we can define an auxiliary matrix Π(S). Furthermore, using matrix Π(S) and a well-known transformation, we can construct a (row) stochastic matrix P = P(S). The introduced matrix Π(S) can be interpreted as the adjacency matrix of a directed graph Γ(A, C) associated with the MCDM problem 〈A, C〉 and is called the adjacency matrix of the MCDM problem 〈A, C〉. Correspondingly, matrix P(S) can be interpreted as the transition probability matrix of the Markov chain determined by the graph Γ(A, C). Moreover, we can construct a reciprocal matrix of pairwise comparisons A(S) = [a_ij], i, j = 1, . . . , m, for the MCDM problem 〈A, C〉. Based on the facts presented in this section, the following procedure for solving the MCDM problem under consideration, 〈A, C〉, can be formulated: (i) the score matrix S is built from the decision matrix; (ii) using the score matrix S, the alternatives from set A are ranked using a ranking method R (R ∈ {R_S, R_N, R_B, R_C, R_K, R_PF, R_GM, . . .}; see, e.g., the Appendix); (iii) the alternative from the Pareto set, A_e, ranked best by method R is declared the R-solution of the considered MCDM problem.
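The exact formulas for P(S) and A(S) are not reproduced above, so the sketch below shows only one plausible form of each auxiliary matrix: rows of S normalized to sum to one (with a uniform row substituted for an alternative that wins nothing, the standard fix for dangling states), and a Laplace-smoothed reciprocal matrix. Both constructions are assumptions for illustration, not the paper's exact definitions.

```python
def stochastic_matrix(S):
    """Row-stochastic P built from score matrix S; a zero row (an
    alternative that beats nobody) becomes the uniform row. This is only
    one plausible form of the transformation the text alludes to."""
    n = len(S)
    P = []
    for row in S:
        t = sum(row)
        P.append([x / t for x in row] if t > 0 else [1.0 / n] * n)
    return P

def reciprocal_matrix(S):
    """Reciprocal pairwise-comparison matrix A(S): a_ij > 0, a_ii = 1,
    a_ij = 1/a_ji. Here a_ij = (S_ij + 1) / (S_ji + 1), a common smoothed
    choice (assumed, not the paper's exact formula)."""
    n = len(S)
    return [[(S[i][j] + 1.0) / (S[j][i] + 1.0) for j in range(n)]
            for i in range(n)]
```

Both matrices can then feed a Markov-chain (PageRank-style) rating or an AHP-style rating, respectively.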

Benchmarking Problem
We consider a set P of problems, a set S of solvers, and a function J: S × P ⟶ R, the assessment function (performance metric). The terms "solver," "problem," and "assessment function" are used conditionally, only to simplify interpretation, although this is not strictly necessary (and, as we observe below, can even lead to terminological inconsistency). Furthermore, we assume for definiteness that high and low values of J correspond to the worst and best cases, respectively, and for convenience, we interpret J(s, p) as the cost of solving problem p ∈ P by solver s ∈ S. Moreover, the following condition is assumed: solver s ∈ S solves problem p ∈ P better than solver s′ ∈ S if J(s, p) < J(s′, p). Thus, we can introduce the following definition, which is sufficient for many real-world applications. Definition 1. A triple 〈S, P, J〉 is a (solver) benchmarking context if and only if S and P are finite sets (called the set of solvers and the set of problems, respectively), J: S × P ⟶ R is a function (called the assessment function, or performance evaluation metric), and the following assumptions hold: (A0) |S| = m and |P| = n for some m, n ∈ N; (A1) J(s, p) ≥ 0 for all s ∈ S and p ∈ P. The presented concept is quite general and, as mentioned, emphasizes that the set of solvers, the set of problems, and the assessment function must be considered closely related objects for the benchmarking goal and not independent ones. Assumption (A0) establishes that sets S and P have sizes m and n, respectively, and Assumption (A1) establishes the nonnegativity of the assessment function. Moreover, because sets S and P are finite, Assumption (A1) does not limit the generality of our considerations. Generally, the selection of the components of a benchmarking context 〈S, P, J〉 is based on the research questions motivated by the goal of the benchmarking analysis. However, the choice of sets S and P is often a disputable issue in the practice of certain applications. In contrast, the situation is relatively straightforward for the choice of the assessment function, J, at least in computer science (see, e.g., [34]).
For example, the following indicators are often used in this case: running time (e.g., the CPU time [35]), reliability (i.e., the solver's ability to successfully solve several problems, e.g., the success rate [36]), and others. Moreover, the case when the assessment J is a mapping into R^l, where l ∈ N (i.e., a multiple criterion), can also be considered, but we do not delve into this issue. Next, we consider the benchmarking context 〈S, P, J〉 as given and introduce the following definition: Definition 2. For a given (solver) benchmarking context 〈S, P, J〉, we define the function J*: P × S ⟶ R as follows: J*(p, s) = J(s, p), ∀p ∈ P, ∀s ∈ S. We call J* the adjoint (to J) assessment function, and 〈P, S, J*〉 the adjoint to the 〈S, P, J〉 benchmarking context, or the problem benchmarking context (corresponding to the solver benchmarking context 〈S, P, J〉).
Definition 2 is easily validated as correct (i.e., J* is an assessment function in the sense of Definition 1). The terminological inconsistency noted above now appears: in the benchmarking context 〈P, S, J*〉, the set of solvers is the set P, which is the set of problems in the sense of the benchmarking context 〈S, P, J〉. We hope that this does not create any problems in understanding the text below.
We now assume that a benchmarking context 〈S, P, J〉 is given and build a corresponding MCDM problem 〈A, C〉 as follows: A = S is the set of alternatives, and C = {c_p | p ∈ P}, c_p(·) = J(·, p): A ⟶ R, ∀p ∈ P, is the set of criteria. Hence, we define the decision matrix as a matrix whose elements exhibit the performance of different alternatives (i.e., solvers) with respect to various criteria (i.e., problems) through the assessment function. From Assumption (A1), c_p(s) ≥ 0 for all s ∈ S and all p ∈ P. Conversely, we assume that 〈A, C〉, where A = {a_1, . . . , a_m} and C = {c_1, . . . , c_n}, is a given MCDM problem such that c_k(a) ≥ 0 for all a ∈ A = N(m) and all k ∈ N(n). Hence, for P = N(n), S = N(m), and J(i, k) = c_k(a_i), ∀i ∈ N(m), ∀k ∈ N(n), the triple 〈S, P, J〉 is a benchmarking context corresponding to the MCDM problem 〈A, C〉. The correspondences described above are one-to-one and reciprocal. Thus, we have proven that the following proposition holds. Proposition 1. A one-to-one mapping exists between benchmarking contexts and MCDM problems with nonnegative criteria.
To summarize the results of this section and achieve greater clarity of presentation, we formulate the proposed approach to solving benchmarking problems in an algorithmic form. We assume that the considered benchmarking problem has already been formalized as a benchmarking context 〈S, P, J〉, where S is a set of solvers, P is a set of problems, and J is an assessment function. The flowchart of the algorithm is presented in Figure 1. All elements of the Pareto set, A_e, are considered equally "good" solvers (in the sense of Pareto optimality). However, the R ranking allows a detailed classification to define the "best of the good," the "worst of the good," and other intermediate "good" solvers.
Thus, S = {S_1, . . . , S_9} is the set of solvers. The set of problems P = {P_1, . . . , P_50} comprises 50 problems, each defined by the dimension indicator d = 30, 50 and by the test function types F_1, . . . , F_25, as listed in Table 2.
Step 3: An appropriate ranking method R is chosen (see, e.g., the Appendix).
Step 4: Using the score matrix S, the alternatives from set A are ranked using ranking method R.
Output: The alternatives from the Pareto set, A_e, ranked using method R are declared the R-solutions of the benchmarking problem 〈A(S), C(P, J)〉.

Table 2: Test function types (fragment).
F_7: Shifted rotated Griewank's function
F_8: Shifted rotated Ackley's function (with the global optimum on the bounds)
F_9: Shifted Rastrigin's function
F_10: Shifted rotated Rastrigin's function
F_11: Shifted rotated Weierstrass function
F_12: Schwefel's problem 2.13
F_13: Expanded extended Griewank's plus Rosenbrock's function
F_14: Shifted rotated expanded Schaffer's F_6
F_15: Hybrid composition function
F_16: Rotated hybrid composition function
F_17: Rotated hybrid composition function (with noise in fitness function)
F_18: Rotated hybrid composition function
F_19: Rotated hybrid composition function (with the global optimum on the bounds)
F_20: Rotated hybrid composition function (with a narrow basin for the global optimum)
F_21: Rotated hybrid composition function
F_22: Rotated hybrid composition function (with a high condition number matrix)
F_23: Noncontinuous rotated hybrid composition function
F_24: Rotated hybrid composition function
F_25: Rotated hybrid composition function

A description of the assessment function used by Sala et al. [12] is as follows. First, the expected run time (ERT), a widely used performance metric for optimization algorithms, is defined as

ERT = mean(M_τ) + ((1 − q)/q) · N_max,  q = N_succes / N_total,

where τ indicates a reference threshold value, M_τ is the number of function evaluations required to reach an objective value better than τ (in successful runs), N_max denotes the maximum number of function evaluations per optimization run, N_succes represents the number of successful runs, N_total is the total number of runs, and q denotes the so-called success rate [45]. The ERT is interpreted as the expected number of function evaluations an algorithm needs to reach an objective function threshold for the first time. A threshold or success criterion is required for the ERT performance measure.
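The symbols above suffice to compute the ERT from raw run data. The sketch below uses the usual aggregation (each failed run charged its full budget N_max); it is consistent with the quantities named in the text but is not claimed to be the paper's exact implementation.

```python
def expected_run_time(successful_evals, n_total, n_max):
    """ERT estimator: q = N_success / N_total; each failed run is charged
    its full budget n_max. Returns +inf if no run ever met the criterion."""
    n_success = len(successful_evals)
    if n_success == 0:
        return float("inf")  # success criterion never reached
    q = n_success / n_total
    mean_success = sum(successful_evals) / n_success
    return mean_success + (1.0 - q) / q * n_max
```

For instance, two successful runs at 100 and 300 evaluations out of four total runs with N_max = 1000 give an ERT of 1200 evaluations.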
However, unlike conventional optimization problems (where the ERT criterion is usually related to reaching the value of the known global optimum within a specified tolerance), for difficult optimization problems, the probability of coming close to the global optimum is negligible, and a more suitable success criterion is required. Moreover, all compared algorithms must meet the success criterion at least a few times for a qualitative performance comparison using the ERT on difficult optimization problems. Correspondingly, Sala et al. [12] used as the success criterion reaching the target value corresponding to the expected value of the best objective function value obtained from uniform random sampling (1000 samples). Next, the estimation of the expected objective value E_RSE(f) for a test function f is based on 100 repetitions. Finally, the ERT with respect to this objective function value limit is referred to as ERT_RSE for test function f. The dataset of ERT_RSE estimations [12] for the above-described solvers and problems is presented in Table 3.
Thus, the benchmarking context 〈S, P, J〉, where S = {s_1, . . . , s_9}, P = {p_1, . . . , p_50}, and J is the ERT_RSE assessment, is fully defined. Hence, following Section 3, the MCDM problem associated with the benchmarking problem under consideration is fully defined, with a set of alternatives A = S = N(9), a set of (nonbeneficial) criteria C = P = N(50), and a primary decision matrix Z = [z_ij], i ∈ N(9), j ∈ N(50), obtained, for writing convenience, by transposing the matrix presented in Table 3. Hence, the MCDM problem associated with the benchmarking context 〈S, P, J〉 (i.e., the solver benchmarking problem) is fully defined. The benchmarking context 〈P, S, J*〉 is defined analogously; the assessment function J* is represented by the decision matrix Z* (the transpose of the decision matrix Z defined above). Hence, the MCDM problem associated with this benchmarking context (i.e., the benchmarking problem for the problems) is also fully defined.

Calculation Results.
In this section, we present a brief description of the calculation results. All case-study calculations were performed in the MATLAB environment on standard equipment (a laptop with a 2.59 GHz CPU, 8 GB RAM, and a 64-bit operating system) and required a few seconds (4.87 s for the solver benchmarking and 5.04 s for the problem benchmarking) to compute all considered rankings without special code optimization measures. First, we consider the solver benchmarking problem and explain the construction of the normalized decision matrix by transforming the primary dataset (see, e.g., [32]).
For the primary decision matrix Z = [z_ij], we define the normalized decision matrix X = [x_ij] as x_ij = (z_ij − l_j)/(u_j − l_j), where u_j = max_{i∈N(9)} z_ij, l_j = min_{i∈N(9)} z_ij, j ∈ N(50). For the solver benchmarking problem, we consider all criteria to be nonbeneficial (i.e., to be minimized): a solver is better if it solves a given problem in less time (ERT_RSE).
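This columnwise min-max normalization is easy to mis-index, so a small sketch may help; the constant-column guard (a column with u_j = l_j maps to zero) is an added safety measure not mentioned in the text.

```python
def normalize(Z):
    """Columnwise min-max normalization of the primary decision matrix:
    x_ij = (z_ij - l_j) / (u_j - l_j); a constant column maps to 0.0."""
    m, n = len(Z), len(Z[0])
    lo = [min(Z[i][j] for i in range(m)) for j in range(n)]
    hi = [max(Z[i][j] for i in range(m)) for j in range(n)]
    return [[(Z[i][j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
             for j in range(n)] for i in range(m)]
```

After normalization, every criterion lies in [0, 1], so the score-matrix counts are unaffected (the transformation is monotone) but the matrix is easier to inspect.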
To illustrate this, we present the score matrix for the solver benchmarking problem in Table 4. Table 5 presents the obtained R S , R N , R B , R C , R K , R PF , and R GM ranks for the solver benchmarking problem.
Analogously, we consider the problem benchmarking but define the normalized decision matrix X* = [x*_ij] from the corresponding primary decision matrix Z* = [z*_ij] (the transpose of Z) in the same way: x*_ij = (z*_ij − l_j)/(u_j − l_j), where u_j and l_j are the columnwise maximum and minimum of Z*. For the problem benchmarking, we also assume that all criteria are nonbeneficial (i.e., to be minimized). Again, a problem is better (i.e., easier) for a given solver if it is solved in less time (ERT_RSE) by this solver. Table 6 presents the R_S, R_N, R_B, R_C, R_K, R_PF, and R_GM ranks for the problem benchmarking (the score matrix for the problem benchmarking is not presented).

Discussion.

As Table 5 indicates, the results of solver ranking using the considered methods (R_S, R_N, R_B, R_C, R_K, R_PF, and R_GM) are somewhat similar. This observation was confirmed quantitatively by considering the Spearman correlations between ranks (Table 7), where the correlations of the solver ranks for the R_S, R_N, R_B, R_C, R_K, R_PF, and R_GM rankings are presented. As Table 7 demonstrates, the R_S, R_N, R_B, R_C, R_K, R_PF, and R_GM ranks are strongly correlated with each other. Analogously, Table 8 reflects the interrelation between the ranks for the problem benchmarking.
In particular, for the problem benchmarking as well, the R_S, R_N, R_B, R_C, R_K, R_PF, and R_GM ranks are strongly correlated with each other.
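The Spearman correlations reported in Tables 7 and 8 can be reproduced from any two rank vectors with the classical tie-free formula; the sketch below assumes the rank vectors contain no ties (ties would require the rank-averaging variant).

```python
def spearman(r1, r2):
    """Spearman rank correlation for two tie-free rank vectors:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))
```

Two identical rankings give 1.0, and two exactly reversed rankings give -1.0.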
Regarding the results of the correlation analysis, the observed similarity of the ranking results for the R_S, R_N, R_B, R_C, R_K, R_PF, and R_GM ranking methods appears very intriguing, given that these methods have completely different areas of origin and underlying ideas (see the corresponding scholium in the Appendix). It is also interesting to consider the Pareto optimization results (see the solvers and problems marked in gray in Tables 5 and 6, respectively). In particular, from Table 5, all considered solvers are Pareto-optimal (i.e., they are considered "equally good" in the considered benchmarking context). We believe that this is due to the large (compared with the number of solvers) number of problems (i.e., too many criteria exist in the corresponding MCDM problem); accordingly, each solver is good in "its own way." However, the ranking methods enable the establishment of an appropriate hierarchy among the solvers. Analogously, Table 6 demonstrates that the Pareto-optimal problems are allocated to different groups or clusters, with similar problems belonging to the same clusters. The ranking methods also make it possible to establish an appropriate hierarchy among the problems. Summarizing the results of the case-study investigation, we conclude the following: (i) The results of the calculations (Table 5) confirm that the SQG-DE algorithm (solver S_9) is the best in the considered benchmarking context (for comparison, see [12]), and this conclusion holds for all rankings used in this study, despite their quite different natures. Moreover, the worst results are shown by DE2 (solver S_2) according to all considered ranking methods except Neustadt's method, and by DE (solver S_1) according to Neustadt's ranking method.
(ii) Unlike Sala et al. [12], where an analysis of the problems was not carried out, our calculations also indicate (Table 6) that the best problems in the considered benchmarking context (in the sense of a lower value of the considered metric) are the shifted sphere function in Dimension 50 (problem 26) and a rotated hybrid composition function (in Table 6, the Pareto-optimal problems are marked in gray). We stress that these results were obtained using only the ranking-theory methods, without an analysis of any statistical indicators of the assessment function values, as is currently practiced (see, e.g., the related literature overview in the Introduction).

Conclusions
In this study, we presented a new MCDM technique for solving decision-making problems arising in benchmarking. Our investigation was based on the concept of a benchmarking context, presented in detail, and on the observation that a benchmarking problem is an MCDM problem. Correspondingly, to solve benchmarking problems successfully, an extensive array of MCDM methods can be used. We also presented a new approach to the MCDM problem solution based on ranking-theory methods. The corresponding ranks are obtained by constructing a special score matrix. We emphasize that this method derives the appropriate ranks directly from the decision matrix and does not use preliminary assessments conducted by external experts or other methods. Therefore, the technique presented in this study is useful when the relative importance of the various criteria has not been evaluated in advance. As a case study, the benchmarking problem of DE algorithms was considered based on the data presented by Sala et al. [12]. A detailed numerical investigation was conducted using various ranking methods. Moreover, the resulting ranks were compared for both solvers and problems. The results demonstrate that the method presented in this study is competitive and generates relevant solutions.
Referring to the analysis presented in this study, we conclude the following: (i) The results of applying MCDM methods to aid the solution of benchmarking problems based on the proposed approach are encouraging. (ii) The proposed approach provides a constructive view of the benchmarking problem solution, identifying the "best" and "worst" cases and ordering all intermediate cases. (iii) The proposed approach is easily implementable because of its simplicity and flexibility. Moreover, the approach is sufficiently general and can be successfully used to investigate benchmarking problems in other application areas.
However, this study has limitations because we provided a tool for benchmarking only in the case in which the benchmarking context is given (i.e., when the sets of solvers (problems), problems (solvers), and performance metrics are given). Issues regarding the selection of the benchmarking context components remain unresolved: the literature does not contain clear and direct recommendations regarding the correct selection of solvers, problems, and performance metrics. Hence, further investigation in this direction will be helpful.

Appendix

A.4. Colley Method. Using the score matrix S = [S_ij], 1 ≤ i, j ≤ N, we define the quantities w_i (wins of i), l_i (losses of i), n_i (games played by i), and n_ij (games between i and j); obviously, n_i = w_i + l_i = Σ_{j=1}^{N} n_ij, 1 ≤ i ≤ N. The Colley rating vector, r_C, is obtained as the solution of the equation C r_C = v_C, and the ranking defined by the rating vector r_C is called the R_C rank.
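The Colley system C r_C = v_C can be written out explicitly; the sketch below uses the standard Colley matrix (C_ii = 2 + n_i, C_ij = -n_ij for i ≠ j, v_i = 1 + (w_i - l_i)/2), which is assumed here since the paper's display of the system is not reproduced above, and solves it with plain Gaussian elimination.

```python
def colley_rating(wins, losses, games):
    """Colley's method: solve C r = v with C_ii = 2 + n_i, C_ij = -n_ij
    (i != j), v_i = 1 + (w_i - l_i) / 2; games[i][j] = n_ij."""
    n = len(wins)
    C = [[(2 + wins[i] + losses[i]) if i == j else -games[i][j]
          for j in range(n)] for i in range(n)]
    v = [1 + (wins[i] - losses[i]) / 2 for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(C[r][k]))
        C[k], C[p], v[k], v[p] = C[p], C[k], v[p], v[k]
        for r in range(k + 1, n):
            f = C[r][k] / C[k][k]
            for c in range(k, n):
                C[r][c] -= f * C[k][c]
            v[r] -= f * v[k]
    r = [0.0] * n
    for k in range(n - 1, -1, -1):
        r[k] = (v[k] - sum(C[k][c] * r[c] for c in range(k + 1, n))) / C[k][k]
    return r
```

For two teams with a single game won by the first, the ratings come out as 0.625 and 0.375, averaging the Colley-characteristic value of 0.5.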
A.5. Keener Method. We describe the Keener method [46] as follows: let N be the number of athletes/teams, and let S = [S_ij], i, j = 1, . . . , N, be the corresponding score matrix. The Keener matrix K = [K_ij], i, j = 1, . . . , N, is defined by normalizing the scores pairwise (e.g., K_ij = (S_ij + 1)/(S_ij + S_ji + 2) in a common smoothed variant). Correspondingly, the rating vector for the Keener method, r_K, is obtained as a solution of the eigenvalue problem K r_K = λ r_K, and the ranking defined by the rating vector r_K is called the R_K rank.
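Because the original display of the Keener matrix is not reproduced above, the sketch below uses the common Laplace-smoothed variant, K_ij = (S_ij + 1)/(S_ij + S_ji + 2), and extracts the principal eigenvector by power iteration; treat both choices as assumptions for illustration.

```python
def keener_rating(S, iters=200):
    """Keener-style rating: smooth the score matrix into
    K_ij = (S_ij + 1) / (S_ij + S_ji + 2) (an assumed common variant) and
    take the principal eigenvector of K via power iteration."""
    n = len(S)
    K = [[(S[i][j] + 1.0) / (S[i][j] + S[j][i] + 2.0) for j in range(n)]
         for i in range(n)]
    r = [1.0 / n] * n
    for _ in range(iters):
        r = [sum(K[i][j] * r[j] for j in range(n)) for i in range(n)]
        s = sum(r)
        r = [x / s for x in r]   # normalize to sum 1 each step
    return r
```

Since K is strictly positive, the Perron-Frobenius theorem guarantees a unique positive principal eigenvector, so the iteration converges.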
A.6. Analytical Hierarchy Process. The analytical hierarchy process (AHP) is a well-known decision-making method [47]. Many modifications of this method exist, but we restrict ourselves to two of them: the AHP Perron-Frobenius version (AHP_PF) and the AHP geometric mean version (AHP_GM), briefly described below. A major problem related to the AHP is the inconsistency problem (of a pairwise comparison matrix). We do not discuss this problem here because of its technical nature and consider the AHP only as a procedure for constructing a rating vector. Let us assume again that N is the number of athletes/teams to be ranked based on the score matrix S = [S_ij], i, j = 1, . . . , N. We also assume that the score matrix S allows the construction of a matrix A = A(S) that is a reciprocal matrix of pairwise comparisons. Recall that a matrix A = [a_ij], i, j = 1, . . . , N, is called a reciprocal matrix of pairwise comparisons if it has the following properties: a_ij > 0, a_ii = 1, a_ij = a_ji^{−1}, ∀i, j ∈ {1, . . . , N}. Note also that, for a positive reciprocal matrix A, its principal eigenvalue λ_max satisfies λ_max ≥ N, and if λ_max ≠ N, we have an inconsistency problem. The AHP_PF rating vector r_PF is defined as the solution of the eigenvalue problem A r_PF = λ_max r_PF with the principal eigenvalue λ_max, and the corresponding ranking is called the R_PF rank. On the other hand, the AHP_GM rating vector r_GM = (r^GM_1, . . . , r^GM_N) is defined by the geometric means of the rows, r^GM_i = (Π_{j=1}^{N} a_ij)^{1/N}, and the corresponding ranking is called the R_GM rank.
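Both AHP rating vectors are straightforward to compute from a positive reciprocal matrix A; the sketch below normalizes each rating to sum to one, a conventional but inessential choice.

```python
def ahp_gm_rating(A):
    """AHP geometric-mean rating: r_i = (prod_j a_ij)^(1/N),
    normalized to sum to one."""
    n = len(A)
    r = []
    for row in A:
        p = 1.0
        for x in row:
            p *= x
        r.append(p ** (1.0 / n))
    s = sum(r)
    return [x / s for x in r]

def ahp_pf_rating(A, iters=200):
    """AHP Perron-Frobenius rating: principal eigenvector of the positive
    reciprocal matrix A, obtained by power iteration."""
    n = len(A)
    r = [1.0 / n] * n
    for _ in range(iters):
        r = [sum(A[i][j] * r[j] for j in range(n)) for i in range(n)]
        s = sum(r)
        r = [x / s for x in r]
    return r
```

For a perfectly consistent 2x2 matrix such as [[1, 2], [0.5, 1]], both versions return the same rating (2/3, 1/3), as the theory predicts for consistent matrices.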
Data Availability

The data of Sala et al. [12] were used to support this study.

Conflicts of Interest
The authors declare no conflicts of interest regarding this article.