Degree-Constrained k-Minimum Spanning Tree Problem

variant of it which is obtained while embedding a Q-learning strategy. We also propose a pure Q-learning algorithm that is competitive with the ACO ones. Finally, we conduct substantial numerical experiments using benchmark input graph instances from TSPLIB and randomly generated ones with uniform and Euclidean distance costs with up to 400 nodes. Our numerical results indicate that the proposed models and algorithms allow obtaining optimal and near-optimal solutions, respectively. Moreover, we report better solutions than CPLEX for the large-size instances. Ultimately, the empirical evidence shows that the proposed Q-learning strategies can bring considerable improvements.


Introduction
Intelligent communication systems will be required in the next decades to provide low-cost connectivity within many application domains involving network structures in the form of ring, tree, and star topologies [1][2][3], for instance, when designing network structures in telecommunications, facility location, electrical power systems, water and transportation networks, and for networks to be constructed under the Internet of Things (IoT) paradigm [2][3][4][5][6][7][8][9][10]. A particular example related to wireless sensor networks concerns density control methods [1,11,12]. These methods allow reducing energy consumption in sensor networks by means of activation or deactivation mechanisms, which mainly consist of putting some of the nodes of the network into sleep mode while ensuring sensing operations, communication, and connectivity [1][2][3][12][13][14][15][16].
Let G(V, E) represent a simple undirected complete graph with set of nodes V = {1, ..., n} and set of edges E = {1, ..., m}. In this paper, we consider the degree-constrained k-minimum spanning tree problem (DCkMST), which consists of finding a minimum-cost subtree of G formed with at least k ≤ n vertices of V, where each vertex has a degree lower than or equal to d ∈ {2, ..., k − 2}. Notice that, for d = 2, DCkMST reduces to finding a minimum-cost Hamiltonian path with cardinality k. The DCkMST problem generalizes two classical combinatorial optimization problems, namely, the degree-constrained and k-minimum spanning tree problems (resp., DCMST and kMST). Recently, a variant of DCkMST was studied in [17], where the authors assume that there exists a predefined root node that need not satisfy the degree constraint. In this paper, we consider the more general case where all nodes must satisfy this condition. Proofs related to the NP-hardness of DCMST and kMST can be found, respectively, in [18][19][20]. Other applications related to the kMST problem can be consulted in [17,20]. Notice that the degree constraints ensure that no overloaded nodes are present in the network.
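To make the definition concrete, the feasibility conditions (a tree on at least k vertices, every vertex degree at most d) can be checked mechanically. The following sketch is illustrative only and not part of the paper's formulations; the function name is ours:

```python
from collections import defaultdict

def is_feasible_dckmst(edges, k, d):
    """Check whether `edges` induces a tree on at least k vertices
    in which every vertex has degree at most d (DCkMST feasibility)."""
    deg = defaultdict(int)
    adj = defaultdict(set)
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
        adj[i].add(j)
        adj[j].add(i)
    nodes = list(deg)
    # A tree on |V'| vertices has exactly |V'| - 1 edges and is connected.
    if len(nodes) < k or len(edges) != len(nodes) - 1:
        return False
    if any(deg[v] > d for v in nodes):
        return False
    # Connectivity via depth-first search from an arbitrary node.
    seen, stack = set(), [nodes[0]]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(adj[v] - seen)
    return len(seen) == len(nodes)
```

For example, a path on 4 nodes is feasible for k = 4, d = 2, whereas a star on 4 nodes is not, since its center has degree 3.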
It is easy to check from the literature that the DCkMST problem has not been addressed in depth so far. Consequently, our main contributions in this paper can be highlighted as follows. First, we propose three mixed-integer linear programming (MILP) models for the DCkMST problem and derive for each one an equivalent formulation using the handshaking lemma [21][22][23]. More precisely, we propose four compact polynomial formulations and two exponential models. Two of the compact models are formulated based on a Miller-Tucker-Zemlin-constrained approach, whilst the two remaining ones are flow-based models. We solve our exponential models with an exact iterative method adapted from [2,24]. To obtain feasible solutions by alternative means, we further propose ant colony optimization and variable neighborhood search (resp., ACO and VNS for short) algorithms for d = 2 and d = 3, respectively [25][26][27][28][29]. We choose the VNS and ACO metaheuristics as they are well-known techniques in the operations research community which have proved highly efficient at finding feasible solutions for many hard combinatorial optimization problems [25][26][27][28][29]. In particular, within each iteration of the VNS algorithm, for each randomly generated k-tuple of vertices, we obtain degree-constrained spanning trees by using a modified penalty approach proposed in [30]. For the VNS algorithm, we further introduce an embedded Q-learning strategy that allows performing a random local search based on the experience of previously found solutions [31]. The Q-learning approach is a simple reinforcement learning technique initially proposed in [31] that allows an agent interacting with its environment to learn, from the experience of previous occurrences, the best action to choose from a set of actions in order to maximize its revenue.
The underlying idea in doing so is to construct optimization methods with learning capabilities in order to make them robust, self-adaptive, and independent from decision-makers. Notice that there has recently been growing interest in the operations research community in developing novel machine learning-based optimization methods for solving hard combinatorial optimization problems [32]. Consequently, embedding a Q-learning strategy in a random local search algorithm is a novel contribution to the literature. Our embedded Q-learning strategy is then compared with a traditional VNS near-far random local search procedure [28]. Another paper dealing with a similar embedded reinforcement learning strategy using VNS is [33]. More precisely, the authors propose a reactive search VNS method that learns, from the experience of previous trials, the order in which different local search heuristics must be applied in order to obtain better solutions. Their method is applied and tested on the symmetric traveling salesman problem, obtaining good results in terms of solution quality and speed. The embedded Q-learning strategy in our VNS approach is different, as we use it as a learning mechanism to perform a random local search. Besides, it does not require specialized local search methods and, consequently, can be extended straightforwardly to other combinatorial optimization problems. Similarly, we also compare the proposed ACO algorithm with an adapted version of the generic Ant-Q method proposed in [27]. Finally, we propose a pure Q-learning-based algorithm that is competitive with both ACO methods. We report numerical results for Euclidean benchmark instances from [34] and for randomly generated ones with both uniform and Euclidean distance costs with up to 400 nodes. The paper is organized as follows: in Section 2, we present some related work, whilst in Section 3, we present the MILP formulations.
Then, in Section 4, we present and explain each proposed algorithm. Next, in Section 5, we conduct substantial numerical experiments in order to compare all the proposed models and algorithms. Finally, in Section 6, we give the main conclusions of the paper and provide some insights into future research.

Related Work
As mentioned in Section 1, the DCkMST problem generalizes both the DCMST and kMST problems simultaneously. Consequently, previous related work is mainly focused on these graph combinatorial optimization problems. Notice that our work in this paper corresponds to an extended version of the preliminary work reported in [21]. However, we now generalize the formulations presented in [21] and allow each model to obtain feasible solutions with at least k vertices instead of using a unique value of k. Thus, the proposed models in this paper are more faithful to the definition of the kMST problem [20]. In addition, we propose several algorithmic approaches for the DCkMST problem. Finally, we report numerical results for complete graph instances using degree values d ∈ {2, 3}. Notice that solving complete graph instances of the DCMST problem with degree values d ∈ {2, 3} implies solving the hardest type of instances reported in the literature [35].
Regarding the complexity of the kMST problem, it has been proved NP-hard by reduction from the Steiner tree problem for variable values of k [19,20]. Consequently, it is hard to obtain an approximation ratio better than 96/95 [36]. In fact, the best-known approximation ratio for this problem is 2, as reported in [37]. Some recent metaheuristic approaches are due to [38,39]. In particular, the authors in [39] propose a new hybrid algorithm based on tabu search and ant colony optimization, leading to better numerical results than the state-of-the-art values reported in the literature. Similarly, in [38], the authors propose a hybrid approach using a memetic algorithm where the genetic operator is based on a dynamic programming method.
Numerical results show that their proposed method outperforms several existing algorithms in terms of solution quality and accuracy. On the other hand, the DCMST problem has been studied extensively in the literature, including exact and heuristic approaches such as Lagrangian relaxations, approximation algorithms, branch-and-cut, and metaheuristics [22,30,35,[40][41][42][43][44]. In particular, the authors in [35] propose a Lagrangian-based heuristic for the DCMST problem that uses a greedy construction heuristic followed by a heuristic improvement procedure. They report extensive computational experiments and indicate that their solution method is competitive with the best heuristics and metaheuristics proposed in the literature; instances with up to 2000 nodes are reported. Similarly, the authors in [22] propose a branch-and-cut algorithm for this problem, dealing with instances with as many as 800 nodes. Concerning metaheuristic approaches, a novel genetic algorithm is proposed in [41]. More precisely, the authors first transform the problem into a two-objective minimum spanning tree problem with a preference between the objectives. Then, new crossover, mutation, and selection operators, together with a local search approach, are derived based on the preference between the two objectives. Finally, they show that their algorithm converges with probability one to a globally optimal solution. A more recent VNS approach is presented in [45], where the authors develop ideas that enhance the performance of VNS by guiding the search in the shaking phase using second-order algorithms and skewed functions. They conduct computational experiments on hard benchmark instances from the literature and improve upon the best-known solutions. Their solutions also significantly reduce the gaps with respect to tight lower bounds reported in the literature.
Other variants of the DCMST problem, such as the multiperiod DCMST and the min-degree-constrained minimum spanning tree problems, are presented in [46,47].
In this paper, our VNS approach for the DCkMST problem performs both a traditional near-far random local search procedure, as in the reduced VNS presented in [28,29], and a novel one which consists of embedding a Q-learning strategy [31] for each randomly generated k-tuple of vertices of V. In both cases, we use a modified penalty heuristic approach proposed in [30] in order to find spanning trees satisfying the degree constraints. Finally, our ACO algorithms are adapted from [25][26][27] and incorporate an adaptive mechanism that allows obtaining consecutive k-Hamiltonian paths within each iteration in order to find feasible solutions. Practical applications related to the DCMST include the design of computer and communication networks, electric circuits, integrated circuits, road networks, and energy networks, among many others [17,35]. Notice that, aside from these applications, all the proposed models and algorithms could also be extended and adapted for multiagent systems with varying graph topologies in order to deal with self-organization and congestion control in communication networks [48,49].

Mathematical Formulations
In this section, for the sake of clarity, we first present a feasible solution for the DCkMST problem, and then, we present and explain the proposed MILP models.

A Feasible Solution for the DCkMST Problem.
In Figure 1(a), we depict an input complete graph instance with 10 nodes where the coordinates of each node are randomly generated within a square area of 1 km², whereas in Figures 1(b) and 1(c), we show feasible solutions obtained for the instance in Figure 1(a) using degree values of 3 and 2, respectively. For the sake of clarity, in Figures 1(b) and 1(c), we only draw the edges and highlight in green the nodes which are part of the solutions, i.e., nodes {1, 2, 3, 5, 6, 7}. Notice that the minimum cost of an optimal solution in Figure 1(b) should be lower than or equal to that of an optimal solution in Figure 1(c), since the latter is also a feasible solution for a degree value of 3. Conversely, the solution in Figure 1(b) is not feasible for a degree value of 2.

Proposition 1.
The following statements are true: (1) If k = n and d ≤ n − 2, the DCkMST problem reduces to the DCMST problem. (2) If k < n and d ≥ k − 1, the DCkMST problem reduces to the kMST problem. (3) If k = n and d ≥ n − 1, the problem reduces to the classical minimum spanning tree (MST) problem, which can be solved in polynomial time by a greedy algorithm [50,51].
From Proposition 1, it is easy to see that the DCkMST problem is NP-hard, as it generalizes both the DCMST and kMST problems. Hereafter, we present the MILP models.

MILP Models.
In order to formulate a first MILP model, let H = (V, A) be the directed graph obtained from G(V, E), where each edge (i, j) ∈ E is replaced by the two arcs (i, j) and (j, i) ∈ A. Thus, we have the following proposition.

Proposition 2.
Model P1 allows us to obtain an optimal solution for the DCkMST problem with minimum cost, where P = (P_ij), for each (i, j) ∈ A, is an input symmetric matrix denoting the connectivity costs between nodes i, j ∈ V. We also define the binary variable z_ij to be equal to one if and only if arc (i, j) ∈ A belongs to the resulting spanning tree and z_ij = 0 otherwise.
Proof. Notice that the total connectivity cost is minimized in (1). Next, notice that constraints (2) and (3) ensure that at least k nodes from set V belong to the output solution and that the number of arcs equals the number of nodes minus one, respectively. For this purpose, we define the binary variable x_j for each j ∈ V, where x_j = 1 if and only if node j is in the solution and x_j = 0 otherwise. Subsequently, constraints (4)-(7) ensure that the solution does not contain subtours. For this purpose, we also define the nonnegative variable u_j for each j ∈ V. To see how these constraints avoid cycles, assume that the subset of nodes a_1, a_2, ..., a_t (t ≤ k) is part of the solution tree and that these nodes are connected in this order. Then, the set of arcs (a_1, a_2), (a_2, a_3), ..., (a_{t−1}, a_t) is also part of the solution tree and z_{a_1,a_2} = z_{a_2,a_3} = · · · = z_{a_{t−1},a_t} = 1. Also, by constraint (7), we have the following relation among the u_j variables: u_{a_1} < u_{a_2} < · · · < u_{a_t}. Notice that no node in this sequence can have any other incoming arc; otherwise, we would violate constraint (6), which ensures that each node has at most one incoming arc if it belongs to the solution. In particular, if node a_1 is the root node, then no incoming arc can be connected to it either, since this would imply u_{a_t} < u_{a_1}, which is a contradiction. Finally, notice that a node belonging to one branch of the resulting tree cannot be connected to any node of a sequence forming another branch, as this is also prevented by constraint (6). Consequently, the resulting digraph must be acyclic. Notice that constraints (4) and (5) only apply to the nodes which are part of the solution by restricting the values of each variable u_j, j ∈ V.
Also, the degree constraints (8) and (9) ensure that the number of incoming plus outgoing arcs of each node is at most d ∈ {2, ..., k − 2}, where k ≥ 4. Notice that the degree constraints are redundant if a particular node j ∈ V is not part of the solution, i.e., if x_j = 0. In fact, this also implies that Σ_{i|(i,j)∈A} z_ij + Σ_{i|(j,i)∈A} z_ji = 0. Next, constraint (10) indicates that at most one of the arcs (i, j), (j, i) ∈ A can belong to the solution, and only if node i ∈ V belongs to the solution. Constraint (11) is a domain constraint for the decision variables. Finally, observe that an undirected subtree can be obtained by dropping the direction of each arc in the resulting digraph, as required. □

Proposition 3.
Constraint (7) is tighter than constraint (35) presented in [2].
Proof. Constraint (35) in [2] can be rewritten equivalently and compared term by term with constraint (7), which dominates it. More details related to constraint (7), when used for the traveling salesman problem and extended to other types of vehicle routing problems, can be consulted in [52]. Notice that model P1 is a directed-graph formulation constructed from a Miller-Tucker-Zemlin characterization [2,3,13]. □

Proposition 4 (see [23]).
In a finite simple undirected graph, the sum of the degrees over all vertices v ∈ V is twice the number of edges.
Proposition 4 is known as the handshaking lemma, and it is also valid for directed graphs; in the latter case, the degree of each vertex is counted as the number of incoming plus outgoing arcs. Consequently, this proposition allows us to rewrite the degree constraints (8) and (9) in P1 equivalently as constraints (15)-(19), where δ_j, for each j ∈ V, is a continuous nonnegative variable denoting the degree of each vertex. We thus obtain another MILP model that we denote hereafter by P1^h. Notice that the variable δ_j, for each j ∈ V, need not be defined as an integer variable; this is ensured by the left-hand side of constraint (15) and the right-hand side of constraint (16), which are both integer valued. Constraints (17) and (18) restrict the value of δ_j between 1 and d if and only if node j ∈ V is part of the solution; otherwise, δ_j = 0. Finally, constraint (19) is a domain constraint for each variable δ_j, j ∈ V.
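To make this concrete, one plausible form of the handshaking-based degree constraints, reconstructed from the description above (the displayed equations are not reproduced here, so the exact coefficients are an assumption), is:

```latex
% Degree variable defined via the handshaking count of incident arcs:
\delta_j \;=\; \sum_{i \,:\, (i,j) \in A} z_{ij} \;+\; \sum_{i \,:\, (j,i) \in A} z_{ji},
\qquad j \in V,
% Active nodes have degree between 1 and d; inactive nodes have degree 0:
x_j \;\le\; \delta_j \;\le\; d\,x_j, \qquad \delta_j \ge 0, \qquad j \in V.
```

Under this form, integrality of δ_j follows automatically from the integrality of the z and x variables, as stated in the text.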
In order to formulate a flow-based model for the DCkMST problem, let H′ = (V ∪ {r}, A ∪ A_r) be an expanded digraph obtained from H = (V, A) by adding an artificial root node r and a set of arcs A_r with zero costs from r to every node v ∈ V. The underlying idea is to construct an arborescence rooted at r, spanning all nodes in the solution with exactly one arc leaving r. For this purpose, we expand the vectors of node and arc variables to x ∈ {0, 1}^{n+1} and z ∈ {0, 1}^{(n+1)²}, respectively. We also define continuous nonnegative flow variables f ∈ [0, ∞)^{(n+1)²}, where f_ij denotes the amount of flow on arc (i, j) ∈ A ∪ A_r. Finally, if we denote the number of nodes in the solution by a nonnegative variable λ, then the idea is to send λ units of flow from r to these nodes, one unit of flow to each of them. Consequently, a flow-based model can be verified by means of the following proposition.

Proposition 5.
Model P2 allows us to obtain an optimal solution for the DCkMST problem with minimum cost, where the objective function is the same as for P1.
Proof. In order to prove the correctness of model P2, we first make the following observation.

Observation 1. The number of leaf nodes (nodes with degree equal to one) in any spanning tree with n ≥ 3 vertices is at least 2 and at most n − 1, corresponding to the cases of a line and a star graph, respectively.

Next, notice that constraints (20) and (21) guarantee that λ is at least k units of flow and equal to the number of active nodes in the resulting subtree, respectively, whilst constraint (22) states that the total number of arcs in the solution equals the number of active nodes minus one. Constraints (23) and (24) ensure that the total amount of flow going out of node r equals λ and that it must be sent through a unique arc (r, j) ∈ A_r, j ∈ V, respectively. Similarly, constraint (25) states that the incoming minus the outgoing flow equals one if node v ∈ V is part of the solution and zero otherwise. Constraint (26) ensures that the flow from i to j equals zero if and only if arc (i, j) ∈ A ∪ A_r is not part of the solution; otherwise, it is at most (n − 1) units. Constraints (27) and (28) are the degree constraints. Notice that the node index in each sum of these constraints ranges over all arcs (i, j) ∈ A ∪ A_r, in contrast to the node indexes in constraints (8) and (9), which only range over the arcs (i, j) ∈ A. These degree constraints are valid since Observation 1 implies that the artificial node r can always be connected to a leaf node of the resulting subtree without affecting the maximum degree value d imposed on each node v ∈ V. Next, constraint (29) forces the root node r to always be active. The remaining constraints are the same as for P1, including constraint (30), which is a domain constraint for the decision variables. Finally, notice that the simultaneous conditions imposed by constraints (22)-(26) ensure that the resulting digraph is a tree [3,13]. Again, an undirected subtree can be obtained by dropping the directions of the arcs. □

Proposition 6.
Constraints (27) and (9) are tighter than constraints (8) and (28), respectively.
Proof. Let us assume that node t ∈ V is part of the resulting subtree. Then, comparing the corresponding degree sums, which range over A ∪ A_r in constraints (27) and (28) and over A in constraints (8) and (9), shows that constraints (27) and (9) dominate constraints (8) and (28), respectively. □

Analogously as for P1, we can replace the degree constraints (27) and (28) in P2 by the set of constraints (15)-(19), leading to another MILP formulation we denote hereafter by P2^h. Notice that, from the constructions of the above formulations, an exponential model P3 can also be written, in which inequalities (33) represent subtour elimination constraints. Similarly, as for the above models P1 and P2, we replace the degree constraints (34) and (35) in P3 by the set of constraints (15)-(19); we denote this alternative exponential model by P3^h. Hereafter, we also denote the corresponding linear programming (LP) relaxations of P1, P1^h, P2, and P2^h by LP1, LP1^h, LP2, and LP2^h, respectively.

Proposed Algorithms
In this section, we first present and explain the pseudocode for the exact iterative approach used to solve the exponential models P3 and P3^h. Then, for the sake of clarity, we present the modified penalty approach proposed in [30], which allows obtaining feasible solutions for the DCMST problem. Next, we present our VNS and ACO algorithms using both traditional and embedded Q-learning random local search strategies. Finally, we present a pure Q-learning-based algorithm.

Exact Iterative Algorithm.
As mentioned in Section 1, we solve our models P3 and P3^h with an exact iterative method adapted from [2,24]. The method is simple and can be described as follows. It consists of first solving the MILP model P3 (or P3^h) without subtour elimination constraints (SECs).
Then, new cycles are obtained from the current optimal solution with a depth-first search procedure [53]. Next, for each cycle found, we write a new constraint of the form (33) and add it to the feasible set of P3 (or P3^h). Finally, the model P3 (or P3^h), including all the newly added inequalities, is reoptimized. This process continues until the current solution does not contain any cycle; when this condition is met, the optimal solution has been reached. This procedure is depicted in Algorithm 1.
The convergence proof of this method is reported in Theorem 2 in [24]. Notice that, according to Theorem 2 in [24], the solutions obtained within each iteration of Algorithm 1 are lower bounds for the optimal solution of the problem. Further notice that a depth-first search algorithm will always find cycles within each iteration of Algorithm 1 if at least one exists in the resulting digraph. This fact ensures the convergence of the method in a finite number of iterations (see Property 1 in [24]). We refer the reader to the works in [2,24] for further details on how cycles are obtained.
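The cycle-extraction step inside Algorithm 1 can be sketched as follows; the MILP reoptimization is delegated to a solver and omitted, and the function name `find_cycle` is our own. It returns one directed cycle of the support digraph of the current z solution, from which a constraint of the form (33) can be written:

```python
def find_cycle(arcs):
    """Return one directed cycle (as a list of nodes) in the digraph
    given by `arcs`, or None if the digraph is acyclic.
    Iterative DFS with an explicit color map (0=new, 1=open, 2=done)."""
    adj = {}
    for i, j in arcs:
        adj.setdefault(i, []).append(j)
        adj.setdefault(j, [])
    color, parent = {v: 0 for v in adj}, {}
    for root in adj:
        if color[root]:
            continue
        stack = [(root, iter(adj[root]))]
        color[root] = 1
        while stack:
            v, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:
                color[v] = 2
                stack.pop()
            elif color[nxt] == 0:
                color[nxt] = 1
                parent[nxt] = v
                stack.append((nxt, iter(adj[nxt])))
            elif color[nxt] == 1:
                # Back arc (v, nxt) closes a cycle: unwind the parents.
                cycle, w = [v], v
                while w != nxt:
                    w = parent[w]
                    cycle.append(w)
                cycle.reverse()
                return cycle
    return None
```

Calling this routine repeatedly (removing or cutting off each cycle found) yields all the cycles to be separated in one iteration.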

Modified Penalty Approach.
In particular, within each iteration of our VNS algorithms, for each randomly generated k-tuple of vertices, we obtain degree-constrained spanning trees by using a modified penalty approach proposed in [30]. The procedure is depicted in Algorithm 2 and is explained as follows.
Initially, we find a minimum spanning tree using the Kruskal method [53]. Then, we enter a while loop and do the following. First, we check whether each node satisfies the degree condition. If this is not the case for a particular node, then we update the weights of all arcs incident to it using the formula P_ij = P_ij + θ((P_ij − eMin)/(eMax − eMin)) · eMax, where θ is an arbitrary parameter drawn from the interval (0, 1), and eMin and eMax are the smallest and largest weights of the current spanning tree, respectively. Subsequently, we symmetrize the updated entries of matrix P in order to avoid choosing previous arcs with opposite directions. Finally, if there still exists a node violating the degree condition, we continue the process and find a new minimum spanning tree with the Kruskal method. This continues until a feasible spanning tree is obtained in which all nodes satisfy the degree condition. As can be observed, Algorithm 2 mainly consists of iteratively penalizing the edges (arcs) incident to the vertices with degree violations until a feasible spanning tree for the problem is obtained.
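A minimal sketch of this loop, assuming the cost matrix is given as a list of lists and using the penalty formula quoted above (function names and the default θ = 0.5 are our own choices, and an iteration cap guards against non-termination):

```python
import copy

def kruskal(n, cost):
    """Minimum spanning tree on nodes 0..n-1 via Kruskal with union-find."""
    parent = list(range(n))
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v
    tree = []
    for w, i, j in sorted((cost[i][j], i, j)
                          for i in range(n) for j in range(i + 1, n)):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

def penalized_dcmst(n, cost, d, theta=0.5, max_iter=1000):
    """Modified penalty heuristic [30]: repeatedly compute an MST and
    penalize edges incident to degree-violating nodes until every node
    has degree <= d."""
    P = copy.deepcopy(cost)
    for _ in range(max_iter):
        tree = kruskal(n, P)
        weights = [P[i][j] for i, j in tree]
        e_min, e_max = min(weights), max(weights)
        deg = [0] * n
        for i, j in tree:
            deg[i] += 1
            deg[j] += 1
        bad = [v for v in range(n) if deg[v] > d]
        if not bad:
            return tree
        span = (e_max - e_min) or 1.0  # avoid division by zero
        for v in bad:
            for u in range(n):
                if u != v:
                    P[v][u] += theta * ((P[v][u] - e_min) / span) * e_max
                    P[u][v] = P[v][u]  # keep the matrix symmetric
    return None
```

On a small instance where the unconstrained MST is a star, the penalties progressively price out the center's edges until a degree-feasible tree appears.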

VNS Algorithms.
We now present our VNS algorithms, first with the classical near-far random local search strategy and then with an embedded Q-learning strategy.

Classical Random Local Search Strategy.
Our VNS procedure is depicted in Algorithm 3. As can be observed, the method requires an instance of the DCkMST problem and a particular degree value d ≥ 3. Since in this paper we only consider degree values d ∈ {2, 3}, we solve instances only for d = 3 with this VNS algorithm. The method is simple and can be described as follows. First, it checks whether k equals n; if so, we execute Algorithm 2 with parameters n, P, and d, which means the algorithm finds a feasible solution for the DCMST problem including all nodes of set V. Otherwise, if k < n, we randomly choose an initial k-tuple of vertices from V to form the set As, and let Ac be its complement.
Next, we construct the submatrix P′ with the rows and columns of P corresponding to the nodes in As and execute Algorithm 2 with parameters k, P′, and d. This allows us to save an initial feasible solution together with its objective function value. Notice that the cardinality of set As is k, whilst the cardinality of set Ac is n − k. The current sets As and Ac are also saved in AsOP and AcOP, respectively. Subsequently, we perform the following steps iteratively while the CpuTime variable is less than or equal to the maximum CPU time allowed, which is controlled with parameter maxTime. Inside this loop, we randomly interchange an element of As with an element of Ac a predefined number of times, which is controlled with parameter indK, initially set to one. Then, we check whether a better solution is obtained. If so, we save the new solution and its objective function value, update sets AsOP and AcOP, and reset the CpuTime variable to zero. Otherwise, we keep the previous best solution and go back to the interchange step. Notice that parameter indK controls the number of swap moves the algorithm performs between sets As and Ac; this step corresponds to the classical near-far random local search strategy of VNS algorithms [28,29]. The larger the number of swap moves, the wider the feasible space being explored; conversely, the smaller the number of moves, the closer the solutions being tested. In Algorithm 3, this is handled with the indK variable, which is increased by one unit each time a worse solution is obtained, up to a predefined value of indKMax − 1. We reset this parameter to one each time a better solution is obtained, in order to restart the search from a local to a wider feasible region. Finally, if indK equals indKMax, then we also reset indK to one.
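The interchange (shaking) step above can be sketched as follows; the evaluation of the resulting k-tuple via Algorithm 2 is abstracted away, and the function name is ours:

```python
import random

def shake(As, Ac, indK, rng=random):
    """Swap `indK` randomly chosen elements between the incumbent
    k-tuple As and its complement Ac (near-far shaking in reduced VNS).
    Small indK explores near the incumbent; large indK jumps farther."""
    As, Ac = list(As), list(Ac)
    for _ in range(indK):
        a = rng.randrange(len(As))
        b = rng.randrange(len(Ac))
        As[a], Ac[b] = Ac[b], As[a]
    return As, Ac
```

Each call preserves the cardinalities |As| = k and |Ac| = n − k, so the shaken tuple can be fed directly to the penalty heuristic for evaluation.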

Embedded Q-Learning Strategy.
Our Q-learning strategy is simple and consists of embedding a reinforcement learning approach in Algorithm 3 instead of using the classical random local search. Notice that the Q-learning approach was initially proposed in [31]. This approach mainly consists of an agent, a set of states S, and a set A of actions per state whereby, by performing an action a ∈ A, the agent moves from one state to another according to a reward value; the goal of the agent is to maximize its total future reward. The potential reward an agent can obtain is computed as a weighted sum of the expected rewards of all future steps starting from any of the current states. In order to embed this Q-learning approach in Algorithm 3, before entering the inner while loop of Algorithm 3, we need to initialize the matrix variable Q = Q(indKM, indKM) with zero entries and also to declare the two scalar variables indKA and indicator.
In particular, the value of the indicator variable, which can be true or false, is used to decide whether a piece of the code is performed, whereas variables indK and indKA represent the number of swap moves, from one to indKM, to be performed between sets As and Ac. Thus, the underlying idea is to move from a particular number of swap moves to another by using the aforementioned reinforcement learning strategy. Under this setting, the values of indKA and indK represent two consecutive states where an agent can be at different moments. For example, consider the case where indKA = 2 and indK = 8; we should read this as an agent moving from state 2 to state 8. In our problem, it means we first interchange 2 elements and then 8 elements between sets As and Ac. Consequently, we can replace the inner code of the Else statement corresponding to the classical random local search step of Algorithm 3 with the code of Algorithm 4. By doing so, we provide Algorithm 3 with learning capabilities for performing the random local search. In Algorithm 4, the parameters λ, μ, and Prob ∈ (0, 1) are the learning rate, the discount factor, and the probability of moving randomly from one state to another [31]. Finally, the function expressions Rand, Randint(indKM), and argmax_{j∈{1,...,indKM}} Q(indKA, j) return a random fractional value between zero and one, a random integer value between one and indKM, and the column position j with the current maximum value in matrix Q while moving from state indKA, respectively.
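The state transition and update used by the embedded strategy follow the generic one-step Q-learning rule of [31]; the sketch below uses our own function names, and the reward argument is an assumption about how the improvement signal is fed in:

```python
import random

def next_state(Q, indKA, prob, rng=random):
    """Choose the next number of swap moves: with probability `prob`
    jump to a random state in 1..indKM, otherwise act greedily on
    row indKA of the Q matrix (epsilon-greedy policy)."""
    indKM = len(Q)
    if rng.random() < prob:
        return rng.randint(1, indKM)
    row = Q[indKA - 1]
    return max(range(indKM), key=lambda j: row[j]) + 1

def q_update(Q, indKA, indK, reward, lam, mu):
    """One-step Q-learning update for the transition indKA -> indK:
    Q(s,a) <- (1-lam)*Q(s,a) + lam*(reward + mu*max_a' Q(a,a'))."""
    s, a = indKA - 1, indK - 1
    Q[s][a] = (1 - lam) * Q[s][a] + lam * (reward + mu * max(Q[a]))
```

Here the new number of swap moves plays the role of both the action taken and the next state, matching the state chain described in the text.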

ACO Algorithms.
We now present the ACO algorithms. More precisely, we first present our classical ACO-based strategy and then an adapted version of the generic Ant-Q algorithm, which is based on a Q-learning strategy [26,27]. Finally, we present a pure Q-learning-based algorithm. All these methods allow finding feasible solutions for the DCkMST problem while using a degree value of d = 2.

Classical Ant Colony Optimization Strategy.
Our first procedure is depicted in Algorithm 5. The method requires an instance of the DCkMST problem. In step one, it initializes all required parameters and variables. In particular, ρ, α, β, and q denote the pheromone evaporation coefficient, the influence of the pheromone matrix, the influence of the visibility matrix, and a constant real value, respectively. Each entry of the pheromone matrix τ = (τ_ij), for all i, j ∈ V, represents the amount of pheromone deposited by the ants on edge (i, j) for a state transition from node i to j. Notice that each of these entries is initialized to a random value between zero and one using the Rand function. Similarly, the visibility matrix η = (η_ij), for all i, j ∈ V, represents the desirability of edge (i, j) when choosing the state transition from i to j; this value is computed as η_ij = 1/P_ij. The set of ants is denoted by Ants. The variables CpuTime, iter, LBest, and BestCPU denote the current CPU time, the number of iterations, the best objective function value found so far, and the exact CPU time at which this solution was obtained. Recall that, within ACO algorithms, each ant constructs a tour solution starting from a particular initial node while covering all nodes of the graph. For this purpose, each ant a selects the next edge in its tour according to the probability p^a_ij = (τ_ij^α · η_ij^β) / Σ_{l∈R_a} (τ_il^α · η_il^β), for j ∈ R_a, where R_a denotes the set of nodes not yet visited by ant a. As can be observed, this probability depends on both the pheromone and visibility values [26]. Subsequently, in step two of Algorithm 5, we enter a while loop in which we perform the following steps. First, we initialize to zero each entry of the matrix Δτ = (Δτ^a_ij), i, j ∈ V, a ∈ Ants, where each entry represents the amount of pheromone to be deposited by ant a ∈ Ants. Next, for each ant, we construct a tour T_a starting from a randomly assigned node and compute its length L_a.
Then, for each arc in T_a, we deposit the amount of pheromone of ant a, computed as Δτ_ij^a = (q/L_a). Then, we compute the shortest subtour of length k within T_a and save it as the best tour found so far if its objective function value is less than LBest. For this purpose, we denote by L_a^k(t) the length of k consecutive nodes in T_a starting from node t.
Also notice that the variable CpuTime is reset to zero each time a better solution is obtained. This allows Algorithm 5 to run for another maxTime units of time in the hope of finding better solutions. Next, we update each arc of the pheromone matrix τ according to parameter ρ and matrix Δτ. Finally, when the while loop finishes, we return the best feasible solution obtained together with its objective function value.
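The transition rule above can be sketched in a few lines. This is a minimal illustration rather than the authors' Matlab implementation; the function and variable names are hypothetical, and the pheromone and visibility matrices are plain nested lists indexed by node:

```python
import random

def next_node(i, unvisited, tau, eta, alpha, beta):
    """Roulette-wheel selection of the next node j from the unvisited set
    R_a, with probability proportional to tau[i][j]^alpha * eta[i][j]^beta."""
    weights = [tau[i][j] ** alpha * eta[i][j] ** beta for j in unvisited]
    total = sum(weights)
    r = random.uniform(0.0, total)
    acc = 0.0
    for j, w in zip(unvisited, weights):
        acc += w
        if r <= acc:
            return j
    return unvisited[-1]  # numerical safeguard against floating-point drift
```

The cumulative-sum loop implements the standard roulette wheel: nodes with larger combined pheromone and visibility weight are proportionally more likely to be chosen.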

Ant-Q-Based Approach.
Now, we present our adapted version of the generic Ant-Q algorithm, which is based on the Q-learning strategy [25,27]. This procedure is depicted in Algorithm 6.
Similarly to Algorithm 5, in step one of the Ant-Q Algorithm 6, we first initialize the required parameters and variables. In this case, the parameters α and β weigh the relative importance of the learned AQ values and the visibility values, respectively. Parameter q is a nonnegative constant, whilst parameters Prob, λ, and μ represent a probability value, a learning step, and a discount factor, respectively. The variables CpuTime, iter, LBest, and BestCPU are defined analogously to Algorithm 5. In particular, the matrices AQ = (AQ(i, j)) and Δ = (Δ(i, j)), for all i, j ∈ V, denote the Q-learning matrix and the delayed reinforcement matrix associated with the Ant-Q learning method [31].
In step two of Algorithm 6, we enter a while loop and perform the following substeps. First, we construct a tour for each ant a ∈ Ants. This is performed iteratively by choosing a new unvisited node j either randomly or according to the following expression:

j = argmax_{u ∈ R_a} { [AQ(T_a(i), u)]^α · [η(T_a(i), u)]^β },

where T_a(i) denotes the current ith position of ant a in its tour T_a and R_a denotes its set of unvisited nodes. Then, we update the learning matrix AQ according to all the edges of each ant tour and save the best solution obtained with cardinality k. Next, we proceed with the reinforcement steps, which are applied to all the edges belonging to each ant tour and to the best solution found so far [25,27]. Finally, the algorithm returns the best feasible solution obtained and its objective function value.
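The pseudo-random-proportional choice and the Ant-Q value update described above can be sketched as follows. This is an illustrative reading of the generic Ant-Q rules from the literature, not the paper's exact Algorithm 6; all names are hypothetical:

```python
import random

def choose_next(i, unvisited, AQ, eta, alpha, beta, prob):
    """Pseudo-random-proportional rule: explore a random unvisited node with
    probability prob, otherwise exploit the node maximizing AQ^alpha * eta^beta."""
    if random.random() < prob:
        return random.choice(unvisited)
    return max(unvisited, key=lambda u: AQ[i][u] ** alpha * eta[i][u] ** beta)

def antq_update(AQ, i, j, lam, mu, reinforcement, candidates):
    """Ant-Q value update: blend the old value with the delayed reinforcement
    plus the discounted best learned value reachable from the next state j."""
    future = max((AQ[j][u] for u in candidates), default=0.0)
    AQ[i][j] = (1 - lam) * AQ[i][j] + lam * (reinforcement + mu * future)
```

In the generic scheme, `antq_update` is applied with zero reinforcement after every move (discounted evaluation) and with the delayed reinforcement Δ(i, j) once the tours have been compared.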

A Pure Q-Learning-Based Algorithm.
Now, we present a pure Q-learning-based approach that allows obtaining feasible solutions for the DCkMST problem when d = 2. This method is depicted in Algorithm 7 and described as follows.
Similarly to Algorithm 6, the first step of Algorithm 7 consists of initializing the parameters λ, μ, and Prob, which represent the learning rate, the discount factor, and a given probability value, respectively. We also define and initialize the required variables CpuTime, iter, LBest, BestCPU, and Q. These variables allow us to handle the current CPU time, the number of iterations, the best objective function value obtained so far, the CPU time of the best solution found, and the Q-learning matrix, respectively. Next, in step two of Algorithm 7, we iteratively construct a unique tour T with cardinality n and evaluate each subtour composed of k consecutive nodes in T. These steps are performed while the current CPU time is lower than the maximum allowed value, denoted by maxTime. In order to obtain the best subtour of cardinality k from each tour T generated, we let L^k(t) denote the length of k consecutive nodes in T starting from node t, and let L^k = min{L^k(1), . . . , L^k(n − k + 1)} be the minimum length obtained. Notice that the index t ∈ {1, . . . , n − k + 1}; otherwise, no subtour of cardinality k can be obtained from T. If a better solution is obtained, we save its length in LBest, the subtour solution in T_k, and the iteration at which this new solution is obtained, and we reset the current CPU time to zero. The latter allows the algorithm to search for another maxTime units of time. Also, notice that the construction of tour T requires adding at each step a new unvisited node u, either randomly or with maximum value Q(j, u), where j denotes the last node added to T. For this purpose, we remove node u from V at each step. The unvisited node u is chosen randomly if the Rand function generates a value in the interval [0; 1] lower than Prob; otherwise, it is chosen according to the maximum value in the Q-learning matrix.
Next, if set V is not empty, we update the corresponding entry in the Q-learning matrix as follows:

Q(j, u) = (1 − λ) · Q(j, u) + λ · [(1/P(j, u)) + μ · max_{v ∈ V} Q(u, v)].     (39)

Notice that equation (39) is analogous to the classical Q-learning update [31]. In particular, the term (1/P(j, u)) represents the reward obtained when going from node j to node u ∈ V. Finally, the algorithm returns the best feasible solution obtained and its objective function value.
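A compact sketch of the tour construction with the update (39) and of the best k-window extraction might look as follows. Names and data layout are hypothetical, and P is assumed to be a cost matrix with nonzero off-diagonal entries:

```python
import random

def build_tour(P, Q, lam, mu, prob):
    """Construct one tour over all n nodes: pick the next node at random with
    probability prob, otherwise greedily from Q; then apply the Q-learning
    update with reward 1/P[j][u], in the spirit of equation (39)."""
    n = len(P)
    nodes = list(range(n))
    j = nodes.pop(random.randrange(n))
    tour = [j]
    while nodes:
        if random.random() < prob:
            u = random.choice(nodes)
        else:
            u = max(nodes, key=lambda v: Q[j][v])
        nodes.remove(u)
        tour.append(u)
        if nodes:  # update only while a successor state exists
            best_next = max(Q[u][v] for v in nodes)
            Q[j][u] = (1 - lam) * Q[j][u] + lam * (1.0 / P[j][u] + mu * best_next)
        j = u
    return tour

def best_k_window(tour, P, k):
    """Return the cheapest subpath of k consecutive nodes in the tour,
    i.e. the minimizer of L^k(t) over t = 1, ..., n - k + 1."""
    best_len, best_sub = float("inf"), None
    for t in range(len(tour) - k + 1):
        window = tour[t:t + k]
        length = sum(P[window[s]][window[s + 1]] for s in range(k - 1))
        if length < best_len:
            best_len, best_sub = length, window
    return best_sub, best_len
```

Each call to `build_tour` both produces a candidate tour and refines Q, so later tours increasingly exploit what earlier tours learned.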

Numerical Experiments
In this section, we conduct substantial numerical experiments in order to compare all the proposed models and algorithms. For this purpose, we implement Matlab programs using the CPLEX 12.7 solver [54]. First, we consider randomly generated instances with n ∈ { } nodes, where each entry of the symmetric matrix P = (P_ij), (i, j) ∈ A, is randomly drawn from the interval (0; 100). Then, we further consider four Euclidean benchmark instances from TSPLIB [34], referred to as "Berlin52," "gr96," "ch150," and "a280," with n ∈ {52, 96, 150, 280} nodes, respectively. All these instances are solved for a number of active nodes k ≤ n ranging from k = 10 up to k = 400 for the instances with random costs and up to k = 280 for the Euclidean ones.

Numerical Results for the MILP Models.
In Tables 1-4, we present numerical results for P_1, P_1^h, P_2, and P_2^h. All these tables present the same column information. More precisely, column 1 shows the instance number, whilst columns 2 and 3 present the number of nodes of each graph instance and the value of k, respectively. Next, in columns 4-8 and 9-13, we present the optimal or best solution obtained with each model in at most 2 hours of CPU time, the number of branch and bound nodes, the CPU time in seconds, the optimal solution of each LP relaxation, and its CPU time in seconds. Finally, columns 14 and 15 present gaps, which we compute as [(Opt − LP)/Opt] * 100, where Opt and LP refer to the optimal solutions found with the MILP and LP models, respectively. Also, notice that each row in Tables 1 and 3, and likewise in Tables 2 and 4, corresponds to the same instance.
We limit CPLEX to 2 hours of CPU time for the random input graphs, whilst for the Euclidean ones, we limit CPLEX to 1 hour in order to avoid CPLEX running out of memory. Consequently, each reported solution is optimal if its CPU time is less than 2 hours; otherwise, it corresponds to the best solution obtained with CPLEX in at most 2 hours. Subsequently, in Tables 5 and 6, we present numerical results for both P_3 and P_3^h. These two tables also present the same column information. In particular, columns 1-3 present the same information as in Tables 1-4. Next, in columns 4-8 and 9-13, we report the optimal or best solution obtained with Algorithm 1 in at most 1 hour of CPU time, the number of branch and bound nodes, the CPU time in seconds, the number of cycles added to each model, and the number of iterations.
The number of branch and bound nodes, in this case, corresponds to the sum of all branched nodes over the iterations of Algorithm 1. Notice that Theorem 2 in [24] ensures that the solutions obtained within each iteration of Algorithm 1 are lower bounds for the optimal solution of the problem. Consequently, an optimal solution is obtained when the corresponding CPU time is less than 1 hour. Notice that obtaining tight lower bounds is of crucial importance when developing exact methods, as it allows speeding up the process significantly. Further notice that models P_1, P_1^h, P_2, and P_2^h obtain upper bounds for the problem if the optimal solution cannot be reached within 2 hours. Thus, we provide an interval where the optimal solution lies.
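The cut-generation loop of Algorithm 1 can be sketched as follows, with the MILP solve abstracted behind a placeholder callback (`solve_relaxation` is hypothetical); the depth-first search mirrors the cycle detection the algorithm performs on the resulting digraph:

```python
def find_cycles(succ):
    """Depth-first search for cycles in a digraph given as an adjacency
    dict {node: [successors]}; returns a list of node cycles."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in succ}
    stack, cycles = [], []

    def dfs(v):
        color[v] = GRAY
        stack.append(v)
        for w in succ.get(v, []):
            if color[w] == GRAY:        # back edge: recover the cycle
                cycles.append(stack[stack.index(w):] + [w])
            elif color[w] == WHITE:
                dfs(w)
        stack.pop()
        color[v] = BLACK

    for v in succ:
        if color[v] == WHITE:
            dfs(v)
    return cycles

def solve_iteratively(solve_relaxation):
    """Skeleton of Algorithm 1: re-solve the model, adding one cut of the
    form (33) per detected cycle, until the optimal digraph is cycle-free.
    `solve_relaxation(cuts)` stands in for the actual MILP solver call."""
    cuts = []
    while True:
        succ, objective = solve_relaxation(cuts)
        cycles = find_cycles(succ)
        if not cycles:
            return succ, objective
        cuts.extend(cycles)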
From Table 1, we first observe that both P_1 and P_1^h allow obtaining the optimal solution of the problem for most of the instances in less than 2 hours. In particular, for the degree value d = 3, we see that P_1 and P_1^h cannot solve instances #12 and #11-#12, respectively, whilst for d = 2, all the instances are solved to optimality. Regarding the CPU times, for d = 3, we see that P_1^h requires a higher amount of CPU time, whilst for d = 2, we obtain similar values with both models. Next, we observe that the number of branch and bound nodes is slightly lower for P_1 than for P_1^h. We also see that the LP bounds are near-optimal and exactly the same for both models, which is confirmed by the gaps. Notice that all the LP relaxations are solved in less than 3 seconds. Finally, we observe that the CPU time values required to solve both models increase with k.
Regarding the Euclidean graph instances presented in Table 2, we observe that most of them cannot be solved to optimality in one hour. These instances proved to be very difficult to solve, which is the main reason we limit CPLEX to one hour in order to avoid running out of memory. The difficulty in solving these instances is also reflected in the number of branch and bound nodes and in the gaps obtained, which are significantly higher compared to Table 1. Notice that these gaps decrease with k, which evidences that the DCkMST problem is harder to solve than the classical DCMST problem, at least for these instances. Next, we observe that the CPU time values of the LP relaxations are higher than those reported in Table 1 and that the degree values do not seem to affect the performance of each model. We further see that the optimal or best objective function values obtained for the instance "ch150" are significantly higher for d = 2. Finally, we observe that the objective function values obtained with P_1 for the instances #7-#10, #12 and #6-#9, #11 are lower than those obtained with P_1^h for d = 3 and d = 2, respectively. However, the opposite occurs for the instances #6, #11 and #10, #12 when using d = 3 and d = 2, respectively.
From Table 3, we mainly observe that, for d = 3, only P_2 allows obtaining the optimal solution of the problem for all the instances in less than 2 hours. Notice that P_2^h only failed to find a feasible solution for instance #10, whilst for d = 2, neither P_2 nor P_2^h can solve all the instances to optimality. However, P_2 can still solve more instances than P_2^h. Regarding the number of branch and bound nodes and the LP times, we observe smaller and slightly higher values than in Table 1, respectively. Finally, we observe that the LP bounds are near-optimal, which is confirmed by the gaps. In Table 4, we observe that P_2 and P_2^h can solve more instances to optimality than P_1 and P_1^h in Table 2, respectively. However, in this case, both P_2 and P_2^h cannot find a feasible solution for some of the instances. Next, we observe smaller values for the branch and bound nodes and CPU times with similar

Data: a problem instance of P_3 (or P_3^h).
Result: the optimal solution of P_3 (or P_3^h).
Solve P_3 (resp., P_3^h) without using any constraint of the form (33).
Find cycles in the resulting digraph using a depth-first search algorithm.
While (cycles remain in the optimal solution of P_3 (resp., P_3^h)) do
  For each cycle found, write a new constraint of the form (33) and add it to the feasible set of P_3 (resp., P_3^h).
  Solve P_3 (resp., P_3^h).
Return: the optimal solution found and the objective function value.
ALGORITHM 1: Iterative procedure for solving P_3 (or P_3^h).

Data: a problem instance of the DCMST problem.
Result: a feasible solution for the DCMST problem.
Find a minimum spanning tree using the Kruskal method [53].
Set SW = true.
Compute the degree of node j and save this value in variable d_j.
If (d_j > d) then
  SW = true.
  Update all incident arcs (i, j) ∈ A of node j as follows:
  Find a minimum spanning tree using the Kruskal method [53].
Return the feasible solution obtained and its objective function value.
ALGORITHM 2: Modified penalty approach to obtain feasible degree-constrained spanning trees.
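A possible reading of Algorithm 2 in code form is sketched below. Since the text does not reproduce the exact penalty formula for the incident arcs, the multiplicative update cost ← cost · (1 + θ) is an assumption, labeled as such in the code:

```python
def kruskal(n, cost):
    """Minimum spanning tree on a complete graph via Kruskal's method
    with union-find; returns the list of tree edges."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted((cost[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    tree = []
    for c, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

def degree_constrained_tree(n, cost, d, theta=0.1, max_rounds=100):
    """Penalty heuristic in the spirit of Algorithm 2: while some node j
    exceeds degree d, inflate the costs of its incident edges and recompute
    the MST. The update cost *= (1 + theta) is an assumption; the paper's
    exact penalty formula is not reproduced in the text."""
    work = [row[:] for row in cost]
    for _ in range(max_rounds):
        tree = kruskal(n, work)
        deg = [0] * n
        for i, j in tree:
            deg[i] += 1
            deg[j] += 1
        overloaded = [j for j in range(n) if deg[j] > d]
        if not overloaded:
            return tree
        for j in overloaded:
            for i in range(n):
                if i != j:
                    work[i][j] *= 1 + theta
                    work[j][i] *= 1 + theta
    return tree  # best effort if the degree bound was not met
```

Inflating only the arcs incident to overloaded nodes steers successive MSTs away from high-degree hubs without forbidding any edge outright.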

Data: an instance of the DCkMST problem using degree d = 3.
Result: a feasible solution and its objective function value.
If (k = n) then
  Execute Algorithm 2 using parameters (n, P, d).
  Save the feasible solution found and its objective function value.
Else
  Classical random local search strategy:
  Generate an initial random k-tuple of vertices. Let As denote this set of vertices and Ac its complement; P′ = P(As, As).
  Execute Algorithm 2 using parameters (k, P′, d).
  Save the initial feasible solution found and its objective function value.
  AsOP = As, AcOP = Ac, indK = 1, cont = 0, CpuTime = 0.
  While (CpuTime ≤ maxTime) do
    iter = iter + 1
    For i = 1 to indK do
      Interchange randomly an element of As with an element of Ac.
      P′ = P(As, As); execute Algorithm 2 using parameters (k, P′, d).
    If (a better solution is obtained) then
      Save the new solution and set AsOP = As, AcOP = Ac, indK = 1.
Return best feasible solution obtained and its objective function value.
ALGORITHM 3: VNS algorithm for the DCkMST problem using the classical random local search strategy.
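The interchange-based search of Algorithm 3 can be sketched as follows, with the call to Algorithm 2 on the submatrix P(As, As) abstracted behind an `evaluate` callback; the neighborhood-size handling is a simplified reading of the indK logic, and all names are hypothetical:

```python
import random
import time

def vns_random_search(nodes, k, evaluate, max_time=1.0, ind_k_max=10):
    """Skeleton of the VNS with random local search: `evaluate(subset)`
    stands in for running the Algorithm 2 heuristic on the cost submatrix
    P(As, As) and returning the resulting tree/tour length."""
    As = random.sample(nodes, k)
    Ac = [v for v in nodes if v not in As]
    best_val, best_set = evaluate(As), As[:]
    ind_k, start = 1, time.time()
    while time.time() - start < max_time:
        cand_as, cand_ac = As[:], Ac[:]
        for _ in range(ind_k):  # neighborhood size = number of swaps
            i, j = random.randrange(k), random.randrange(len(cand_ac))
            cand_as[i], cand_ac[j] = cand_ac[j], cand_as[i]
        val = evaluate(cand_as)
        if val < best_val:  # improvement: accept and restart the neighborhood
            best_val, best_set = val, cand_as[:]
            As, Ac, ind_k = cand_as, cand_ac, 1
        else:               # no improvement: enlarge the neighborhood
            ind_k = ind_k % ind_k_max + 1
    return best_set, best_val
```

The incumbent set is kept when a candidate fails, which plays the role of the AsOP/AcOP restore step in the pseudocode.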
Generate an initial random k-tuple of vertices. Let As denote this set of vertices and Ac its complement; P′ = P(As, As).
Execute Algorithm 2 using parameters (k, P′, d).
Save the initial feasible solution found and its objective function value.
AsOP = As, AcOP = Ac, indK = 1, indKA = 1, Q = zeros(indKM, indKM), indicator = false, CpuTime = 0.
While (CpuTime ≤ maxTime) do
  iter = iter + 1
  For i = 1 to indK do
    Interchange randomly an element of As with an element of Ac.
    Set P′ = P(As, As) and run Algorithm 2 using parameters (k, P′, d).
  If (a better solution is obtained) then
    Save the new solution and set AsOP = As, AcOP = Ac, indicator = true.
    iterOp = iter, Optime = Optime + CpuTime, CpuTime = 0.
  Else
    As = AsOP, Ac = AcOP.
ALGORITHM 4: VNS algorithm for the DCkMST problem using the embedded Q-learning strategy.

orders of magnitude for the LP relaxations when compared to Table 2. Finally, we observe that the LP bounds are not tight, which is again confirmed by the gap columns.
From Table 5, we mainly observe that models P_3 and P_3^h, which are both solved with Algorithm 1, allow solving to optimality the instances #1-#9 and #1-#12 for d = 3 and d = 2, respectively. In particular, we see that these instances are harder to solve for d = 3 than for d = 2. This fact is reflected in the number of branch and bound nodes, the CPU times of the LP relaxations, the number of iterations, and the number of cycles added to each exponential model, which are clearly lower when d = 2. Notice that, for instances #10-#12 and the degree value d = 3, we only report the best objective function values obtained, which are in fact lower bounds. Regarding the Euclidean instances reported in Table 6, we observe that the instances #1-#3, #12 and #1-#8, #11-#12 are solved to optimality for d = 3 and d = 2, respectively. We also see in Table 6 that the instances are harder to solve for d = 3 than for d = 2. Notice that, for d = 2, we solve almost all the instances to optimality, with the exception of instances #9 and #10, for which we obtain lower bounds. Again, this observation can be verified by looking at the number of branch and bound nodes, the CPU times of the LP relaxations, the number of iterations, and the number of cycles added to each proposed model, which are smaller when d = 2. Finally, from the numerical results presented in Tables 1-6, we can conclude that model P_1 outperforms the other ones, as it allows obtaining either an optimal solution or a best upper bound for most of the tested instances. Notice that the flow and exponential models also perform well in terms of optimality, but for the large-size instances, the flow models deteriorate rapidly, whilst the exponential ones cannot be handled efficiently in terms of CPU times. Consequently, in Tables 7 and 8, we present numerical results for P_1 for large random and Euclidean input graph instances with up to 400 and 280 nodes, respectively.
In particular, in Table 7, these numerical results are reported for d = 3, whereas in Table 8, they are obtained using a degree value of d = 2.
From Tables 7 and 8, we observe similar trends as in Tables 1-4. More precisely, we observe that P_1 can solve almost all random input graph instances to optimality in less than 2 hours for both degree values. Only instance #8 could not be solved to optimality, although an upper bound is reported for this particular instance. In contrast, none of the Euclidean instances can be solved to optimality in less than 1 hour of CPU time; for these instances, only upper bounds are reported as well. The difficulty in solving the Euclidean instances can also be observed in the gap columns, which report significantly higher values as a consequence of the LP bounds obtained. In contrast, the LP bounds obtained for the random graph instances are near-optimal. Finally, we observe from both Tables 7 and 8 that the number of branch and bound nodes is significantly smaller for d = 2 than for d = 3 and that the LP times are slightly higher for d = 2.

Data: an instance of the DCkMST problem using degree d = 2.
Result: a feasible solution and its objective function value.
Step 1: Initialize parameters ρ, α, β, q and the set of ants (Ants).
Construction of each ant tour:
  Randomly assign an initial position j ∈ V to ant a according to graph G and construct a tour T_a using all remaining nodes of G.
Find best solution:
  Compute the length L_a of T_a.
  Denote by L_a^k(t) the length of k consecutive nodes in T_a starting from node t.
  Compute L_a^k = min{L_a^k(1), . . . , L_a^k(n − k + 1)}.
  If (LBest > L_a^k) then
    Set LBest = L_a^k and save the best tour of length k in T_k.
    iterOp = iter, BestCPU = BestCPU + CpuTime, CpuTime = 0.
Update accumulated pheromone matrix.
Return best feasible solution found and its objective function value.
ALGORITHM 5: Classical ACO algorithm for the DCkMST problem.

Numerical Results for VNS Algorithms.
In Tables 9 and 10, we present numerical results obtained with the proposed VNS algorithms for both random and Euclidean input graph instances while using a degree value of d = 3. Recall that our second VNS approach consists of replacing the code lines of Algorithm 3 corresponding to the random local search strategy with the code of Algorithm 4, which represents the embedded Q-learning strategy. Hereafter, we denote these two VNS approaches by VNS_R and VNS_Q. In Algorithms 3 and 4, we arbitrarily set the maximum CPU time allowed to maxTime = 250 seconds, whereas the input parameters indKMax and indKM are set to a value of 10. Similarly, the required parameters of Algorithm 4 were calibrated to λ = 0.25, μ = 0.25, and Prob = 0.15. Finally, the reward value and parameter θ of Algorithm 2 were set to Reward = 1 and θ = 0.1, respectively. In particular, in Table 9, we report numerical results for the instances presented in Tables 1-6, whereas in Table 10, we report numerical results for the large-size instances presented in Table 7. In both tables, columns 4-15 and 2-13 contain exactly the same column information, respectively. More precisely, in Table 10, columns 1 to 3 present the instance number, the number of nodes, and the value of k. Next, in columns 4 and 5, we report the minimum objective function and CPU time values reported for each instance in Tables 1-4. Subsequently, in columns 6-10 and 11-15, we report the results obtained with VNS_R and VNS_Q, respectively.

Data: an instance of the DCkMST problem using degree d = 2.
Result: a feasible solution and its objective function value.
Step 1: Initialize parameters α, β, q, Prob, λ, μ, and the set of ants (Ants).
ForEach a ∈ Ants do
  If (Rand < Prob) then randomly choose an unvisited node of the ant and add it to T_a.
Find best solution:
ForEach a ∈ Ants do
  Denote by L_a^k(t) the length of k consecutive nodes in T_a starting from node t.
  Compute L_a^k = min{L_a^k(1), . . . , L_a^k(n − k + 1)}.
  If (LBest > L_a^k) then
    Set LBest = L_a^k and save the best tour of length k in T_k.
    iterOp = iter, BestCPU = BestCPU + CpuTime, CpuTime = 0.
ALGORITHM 6: Ant-Q-based algorithm for the DCkMST problem.

Data: an instance of the DCkMST problem using degree d = 2.
Result: a feasible solution and its objective function value.
Step 1: Initialize parameters λ, μ, Prob.
Set LBest = L^k and save the best tour of length k in T_k.
iterOp = iter, BestCPU = BestCPU + CpuTime, CpuTime = 0.
Return best feasible solution found and its objective function value.
ALGORITHM 7: Pure Q-learning-based approach for the DCkMST problem.

From Tables 9 and 10, we first observe that the objective function values of the initial solutions are significantly higher than the best ones, which clearly evidences the effectiveness of both VNS approaches. Notice that when k = n, the problem reduces to the classical DCMST problem; consequently, for these instances, the solutions are obtained only with Algorithm 2. Next, we observe that the CPU times required by both VNS methods are larger for the random instances than for the Euclidean ones. From Table 9, we further observe that the solutions obtained with VNS_Q outperform those obtained with VNS_R for both random and Euclidean instances, although, in general, both VNS procedures obtain near-optimal solutions, which is confirmed by the gap columns. In contrast, we see that the objective function values reported in Table 10 are not tight for the random instances when compared to those obtained by the MILP models; still, the gaps reported for VNS_Q are significantly better than those for VNS_R. However, for the Euclidean instances reported in Table 10, both VNS approaches obtain better solutions than the MILP models. The latter can be verified by the negative gaps, which show that the objective function values obtained with the VNS algorithms are significantly lower than those obtained with the MILP models.

Numerical Results for ACO Algorithms.
Now, we report numerical results obtained with Algorithms 5 and 6 for random and Euclidean instances while using a degree value of d = 2. More precisely, in Table 11, we report numerical results for the instances presented in Tables 1-6, whereas in Table 12, we report numerical results for the large-size instances presented in Table 8. Hereafter, we denote by ACO_R and ACO_Q the two ant colony optimization approaches presented in Algorithms 5 and 6, respectively. Recall that the first one is based on the classical ACO metaheuristic [26], whilst the latter is based on the Ant-Q algorithm proposed in [25,27]. In Algorithm 5, we calibrated the input parameters to α = 0.5, β = 2, ρ = 0.9, and q = 1, whereas in Algorithm 6, we calibrated them to α = 1, β = 4, q = 1, Prob = 0.1, λ = 0.15, and μ = 0.15. Finally, in both Algorithms 5 and 6, we arbitrarily set the maximum CPU time and the number of ants to maxTime = 250 seconds and |Ants| = 20, respectively. The legend of Table 11 is as follows. Columns 1 to 3 present the instance number, the number of nodes of the input graph, and the value of k. Next, in columns 4 and 5, we present the minimum objective function and CPU time values reported in Tables 1-6; we repeat this information for the sake of clarity. Next, in columns 6-10 and 11-15, we report the objective function values of the initial solutions, the objective function values of the best solutions, the CPU times in seconds, the numbers of iterations, and the gaps, which are computed as [(ACO − Best)/Best] * 100, respectively. Similarly, the first three columns of Table 12 report the instance number and the best objective function values and CPU times reported in Table 8; again, this information is repeated for comparison purposes. Finally, the legends of columns 4-8 and 9-13 are exactly the same as those of columns 6-10 and 11-15 in Table 11.
From Table 11, we observe that the best solutions obtained with both ACO approaches are near-optimal and far from the initial solutions obtained. These facts prove the effectiveness of the proposed methods. Next, we see that the CPU times are lower for ACO_R than for ACO_Q for most of the instances, in particular for the larger ones. Regarding the number of iterations, in general, we observe similar orders of magnitude for both methods. Finally, we observe that the gaps obtained with ACO_R are tighter than those obtained with ACO_Q for both the random and Euclidean instances; in particular, for the large instances, these values are significantly better. Notice also that we obtain negative gaps for some of the large Euclidean instances, which clearly evidences that the solutions obtained with the ACO methods significantly outperform those obtained with the MILP models.
From Table 12, we mainly observe that the solutions obtained with both ACO algorithms for the random instances are significantly worse than those reported in Table 11. In contrast, for all the Euclidean instances, the solutions obtained with the ACO approaches are significantly better than those obtained with P_1, which is again confirmed by the negative gaps obtained.

Numerical Results for the QL Algorithm.
In Tables 13 and 14, we present numerical results obtained with Algorithm 7 (denoted QL) for both random and Euclidean instances while using a degree value of d = 2. In Tables 13 and 14, we report numerical results for the same instances presented in Tables 1-6 and for the instances of Table 8, respectively. The legend of Table 13 is as follows: columns 1-3 present the instance number, the number of nodes (or the name of the Euclidean instance), and the value of k, respectively. Notice that the name of each Euclidean instance ends with the number of nodes it contains; for example, the instance "Berlin52" has 52 nodes. Next, columns 4 and 5 report the best solution and the minimum CPU time in seconds obtained with the MILP models. Finally, columns 6-10 report the initial solution obtained with Algorithm 7, its best solution found, the CPU time in seconds, the number of iterations, and the gaps obtained, which are computed as [(QL − Best)/Best] * 100, respectively. The legend of Table 14 is exactly the same as that of Table 13 after removing columns 2 and 3.
From Table 13, we first observe that the initial solutions obtained with Algorithm 7 are considerably worse than those obtained with the ACO methods. However, we also see that the best solutions obtained with it are near-optimal and competitive with the ACO methods. Notice that this fact is relevant, as it clearly shows the effectiveness of Algorithm 7, which rests mainly on its learning capability and simplicity. Concerning the gaps reported in Table 13, we observe that, for some random and Euclidean instances, Algorithm 7 allows obtaining tighter gaps than ACO_Q in less CPU time.
Finally, we observe that the number of iterations required by Algorithm 7 is significantly larger than for the ACO methods. This can be explained by the fact that each iteration of Algorithm 7 requires considerably less computational effort.
From Table 14, we observe that the distance between the initial and best solutions obtained with Algorithm 7 is even larger than for the instances presented in Table 13. This confirms again that the learning capability of Algorithm 7 is effective. Next, we see that the CPU time required by Algorithm 7 is significantly lower than that required by ACO_Q for the random instances. We also see that the gaps reported in Table 14 are tighter than those reported in Table 12 for the ACO_Q approach for all random instances. In particular, for the random instances #2 and #3 in Table 14, the gaps obtained by Algorithm 7 are smaller than those reported for both ACO methods in Table 12. Regarding the Euclidean instances, the gaps obtained are still competitive when compared to those reported for the ACO methods. In fact, we also obtain negative gaps, which means that we outperform the best solutions obtained with the MILP models in less than 1 hour. Finally, from Table 14, we observe that the gap obtained for Euclidean instance #4 outperforms both gaps reported for the ACO methods in Table 12.

Average Numerical Results for the Proposed Algorithms.
In order to give more insight into the behavior of the proposed algorithms, in Figures 2 and 3, we report average upper bounds and CPU times in seconds obtained with both VNS_R and VNS_Q for different values of k. More precisely, for each figure, we randomly generate 20 large-size instances with n = 300 nodes for each value of k. In particular, in Figure 2, these averages are reported for instances using random costs, whereas in Figure 3, they are reported for Euclidean instances. From Figures 2 and 3, we mainly observe that the average upper bounds obtained with VNS_Q are significantly smaller than those obtained with VNS_R. This fact clearly shows that the embedded Q-learning strategy of VNS_Q outperforms the classical near-far local search approach. Next, we further notice that the average CPU times are significantly smaller for VNS_Q than for VNS_R when solving the Euclidean instances. In contrast, the VNS_Q approach requires a significantly larger amount of CPU time for the instances with random costs. Analogously, in Figures 4 and 5, we report average upper bounds and CPU times obtained with ACO_R, ACO_Q, and QL; in Figure 4, these averages are reported for instances with random costs, whereas in Figure 5, they are reported for Euclidean instances. From Figures 4 and 5, we mainly observe that the average upper bounds obtained with ACO_Q are slightly larger than those obtained by QL. In turn, the average upper bounds obtained with QL are larger than those obtained with ACO_R. We further notice that this trend is more evident when solving graph instances with Euclidean distance costs, whereas these bounds seem to be closer for the instances with random costs. Finally, we observe that the average CPU time values are significantly smaller for ACO_R than for QL and ACO_Q; similarly, QL requires less CPU time than ACO_Q. In conclusion, ACO_R outperforms both QL and ACO_Q, whereas the QL method outperforms the ACO_Q approach.
Figure 2: Average upper bounds and CPU times in seconds obtained with VNS_R and VNS_Q while varying k for random complete graph instances of size n = 300 using d = 3.

Conclusions
In this paper, we consider the degree-constrained k-cardinality minimum spanning tree problem, which emerges as a combination of two classical combinatorial optimization problems, namely, the degree-constrained and k-minimum spanning tree problems. One can see from the literature that this problem has not yet been studied in depth. This leads us to propose three mixed-integer linear programming models, for which we derive equivalent formulations by using the handshaking lemma. In order to obtain near-optimal solutions, we further propose ant colony optimization, variable neighborhood search, and pure Q-learning-based algorithms. In particular, for each proposed metaheuristic, we obtain new algorithms by embedding a Q-learning strategy. We conduct substantial numerical experiments using benchmark input graph instances from TSPLIB and randomly generated ones with random uniform and Euclidean distance costs with up to 400 nodes. From the numerical results obtained, our main conclusions can be listed as follows: (1) We observe that, in general, the proposed models based on the Miller-Tucker-Zemlin constraints are more robust than the flow ones, as they allow obtaining optimal solutions or best upper bounds for all the instances. In particular, we see that the flow models allow us to solve to optimality Euclidean instances with up to 100 nodes, which is not possible with the other models; however, the flow models cannot provide an upper bound for some of the instances with higher dimensions. Similarly, our proposed exponential models also show good performance in terms of optimality, but again they cannot be handled efficiently when solving large-size instances; they do, however, provide an interval where the optimal solution lies. We further conclude that it is not evident whether the performance of the proposed models improves or deteriorates when using the handshaking lemma.
Finally, we observe that the Miller-Tucker-Zemlin-based models allow us to obtain optimal solutions for instances with up to 400 nodes when using random costs and degree values d ∈ {2, 3}. In contrast, they cannot solve all the Euclidean instances to optimality.
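To make the discussion of point (1) concrete, MTZ-style formulations for tree problems of this kind can be sketched schematically as follows. The notation below (arc variables x_ij, node-selection variables y_i, level variables u_i) is an illustrative assumption for exposition, not the paper's exact formulation:

```latex
% Schematic MTZ-style constraints for a degree-constrained k-node tree.
% x_{ij} = 1 if arc (i,j) is in the tree, y_i = 1 if node i is selected,
% u_i = level of node i in the (implicitly rooted) tree.
\begin{align*}
u_j &\ge u_i + 1 - k\,(1 - x_{ij}) && \forall (i,j) \in A && \text{(forbids directed cycles)}\\
\sum_{j:(i,j)\in A} x_{ij} + \sum_{j:(j,i)\in A} x_{ji} &\le d\, y_i && \forall i \in V && \text{(degree bound on selected nodes)}\\
\sum_{(i,j)\in A} x_{ij} &= k - 1 && && \text{(a tree on $k$ nodes has $k-1$ edges)}
\end{align*}
% The handshaking lemma states that the node degrees of any tree on k
% nodes sum to 2(k-1); summing the degree constraints over all selected
% nodes therefore yields an equivalent aggregated formulation, which is
% the kind of reformulation referred to in the text.
```

The degree bound with d = 2 specializes the model to searching for a minimum-cost Hamiltonian path on k nodes, matching the observation in the introduction.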
(2) Concerning the VNS algorithms, we observe that the objective function values obtained with both VNS algorithms are significantly lower than those of their initial solutions. This clearly evidences the effectiveness of our VNS approaches. In general, the two VNS procedures obtain near-optimal solutions, and even better solutions than CPLEX for the large-size instances. Next, we see that the CPU times required by these algorithms are larger for the random instances than for the Euclidean ones. Notice that the Euclidean instances are significantly harder to solve with the MILP models. We can also conclude that the Q-learning strategy embedded in our VNS algorithm allows us to obtain better solutions than the classical near-far random local search strategy.
This suggests that the construction of optimization methods with learning capabilities, in order to make them robust, self-adaptive, and independent of decision-makers, is worth further investigation.
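The idea of embedding a Q-learning strategy into VNS, as described above, can be illustrated with a minimal skeleton in which a one-state Q-value per neighborhood is learned from the improvement each neighborhood yields. All names, hyperparameters, and the reward definition here are illustrative assumptions, not the paper's exact algorithm:

```python
import random

def vns_q(initial, neighborhoods, objective, max_iter=500,
          alpha=0.1, eps=0.2, seed=0):
    """Schematic VNS loop: the neighborhood used to perturb the incumbent
    is chosen epsilon-greedily from learned Q-values, and each Q-value is
    updated with the improvement (reward) the neighborhood produced."""
    rng = random.Random(seed)
    q = [0.0] * len(neighborhoods)          # one Q-value per neighborhood
    best, best_val = initial, objective(initial)
    for _ in range(max_iter):
        if rng.random() < eps:              # explore a random neighborhood
            i = rng.randrange(len(neighborhoods))
        else:                               # exploit the best-scoring one
            i = max(range(len(neighborhoods)), key=lambda j: q[j])
        cand = neighborhoods[i](best, rng)  # shake / local-search move
        val = objective(cand)
        reward = best_val - val             # positive iff we improved
        q[i] += alpha * (reward - q[i])     # simple Q-value update
        if val < best_val:
            best, best_val = cand, val
    return best, best_val
```

On a toy one-dimensional objective this converges quickly, since the neighborhood that keeps improving the incumbent accumulates the highest Q-value and is selected most often.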
(3) Regarding the ACO algorithms, we observe that both methods obtain near-optimal solutions, which clearly demonstrates their effectiveness. In general, we obtain better solutions for the Euclidean instances than for the random ones. In particular, the quality of the Euclidean solutions improves significantly for large-size instances of the problem, for which both ACO methods obtain better solutions than CPLEX. Next, we observe that the solutions obtained with ACO R are of better quality than those obtained with ACO Q for both random and Euclidean instances of the problem. Concerning our pure Q-learning approach, proposed to solve instances with degree d = 2, we observe that the solutions obtained for random and Euclidean instances are competitive with those obtained with the ACO methods. In particular, the CPU time required by this method is significantly lower than that required by ACO Q. Then, we further see that, for some of the instances, the gaps obtained by the pure Q-learning approach are tighter than those obtained with ACO Q. We also obtain better solutions than the CPLEX solver used to solve the MILP model. Finally, notice that this pure Q-learning method is simple and versatile, and can thus be adapted to other combinatorial optimization problems in a straightforward manner.
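A pure Q-learning approach for the d = 2 case, where the goal is a low-cost path on k nodes, can be sketched as tabular Q-learning over (current node, next node) pairs. The hyperparameters, reward definition (negative edge cost), and greedy rule below are illustrative assumptions, not the paper's exact method:

```python
import random

def q_learning_path(cost, k, episodes=2000, alpha=0.1, gamma=0.9,
                    eps=0.2, seed=0):
    """Tabular Q-learning sketch for the d = 2 case of DCkMST: repeatedly
    grow a k-node path, using negative edge costs as rewards, and keep
    the cheapest path seen across all episodes."""
    rng = random.Random(seed)
    n = len(cost)
    Q = [[0.0] * n for _ in range(n)]       # Q[state][action]
    best_path, best_cost = None, float("inf")
    for _ in range(episodes):
        start = rng.randrange(n)
        path, visited = [start], {start}
        while len(path) < k:
            s = path[-1]
            cand = [v for v in range(n) if v not in visited]
            if rng.random() < eps:          # explore
                a = rng.choice(cand)
            else:                           # exploit: cheap edge + high Q
                a = min(cand, key=lambda v: cost[s][v] - Q[s][v])
            r = -cost[s][a]                 # cheaper edges -> higher reward
            nxt = [v for v in cand if v != a]
            target = max((Q[a][v] for v in nxt), default=0.0)
            Q[s][a] += alpha * (r + gamma * target - Q[s][a])
            path.append(a)
            visited.add(a)
        c = sum(cost[u][v] for u, v in zip(path, path[1:]))
        if c < best_cost:
            best_path, best_cost = path, c
    return best_path, best_cost
```

Because each episode only needs O(k·n) work, runs of this kind are cheap, which is consistent with the low CPU times reported above for the pure Q-learning method.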
As future research, we plan to propose new formulations for the degree-constrained k-minimum spanning tree problem with applications in network design.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.