Optimizing Network Controllability with Minimum Cost

In this paper, the issue of optimally modifying the structure of a directed network to guarantee its structural controllability is investigated. Given a directed network, in order to obtain a structurally controllable system, a framework for finding the minimum number of directed edges that need to be added to the network is proposed. After we get these edge-addition configurations, we further calculate the network cost of each optimization scheme and choose the one with the minimum cost. Our main contribution is twofold: first, we provide an algorithm able to find all optimal network modifications in polynomial time; second, we provide a way to calculate the cost of optimizing the network based on the node betweenness. Numerical simulations are given to illustrate the theoretical results.


Introduction
The ultimate goal of complex network research is to find effective means to control network behavior and make it serve human beings. Controllability is a basic concept in control theory, which quantifies the ability to steer a dynamical system from any initial state to any final state in finite time [1]. In the past decade, the issue of network controllability for complex dynamical systems has attracted increasing attention and has become a focal topic in interdisciplinary research. Numerous works have been reported from rather diverse perspectives on such topics as structural controllability [2,3]; exact controllability [18]; edge dynamics [19][20][21]; optimization [22][23][24]; control energy [25,26]; and robustness [27,28].
In the study of network controllability, we usually rely on the theory of structural controllability [31][32][33][34][35][36][37]. If there is a matrix pair that is controllable, all structurally equivalent matrix pairs are controllable except for special ill-conditioned cases [31]. Recently, those results have been applied to the controllability analysis of directed complex networks [2,3,16,19,22,23] from a graph-theoretic perspective. Note that it is very effective to analyze network controllability by using tools developed under the background of structural control theory [31].
Optimization of the network controllability is of prime importance in real applications. Generally speaking, given a network which is structurally uncontrollable, we can make it structurally controllable through two strategies: (i) add external input signals to the original network [16] and (ii) add new edges to the network topology [23]. Wang et al. provided a method to change the structure of a complex network to make the system structurally controllable when only a single driver node was considered [22]. Zhang and Zhou considered three related problems on determining the minimal cost structural perturbations, including edge additions, edge deletions, and input deletions to make a networked system structurally controllable/uncontrollable [24]. Chen et al. proposed an approach to adding minimum directed edges to the original network so as to ensure structural controllability [23].
Motivated by the above discussions, a minimum-cost optimization method to guarantee structural controllability is investigated in this paper. It should be emphasized that, differing from [23], this work proposes a new method to optimize the network topology and thus ensure network controllability, and it also provides a way to calculate the total cost of optimizing the network. By contrast, [23] only gives a method to optimize the network topology without considering the optimization cost; calculating that cost is the major point of this work. In [27], Zhang et al. considered the problem of network cost. Although a measurement index of edge cost was given therein, no simple and effective method was provided to calculate the total network cost. Compared with the previous works, we not only address the problem of optimizing network controllability but also propose a way to calculate the cost of optimizing the network.
The main contributions of this article are as follows. (i) We propose a new method to optimize the network topology so as to ensure network controllability. (ii) We propose an algorithm to solve the optimal edge-addition configuration problem. (iii) After obtaining all the edge-addition configurations, we introduce network cost measurement indexes to calculate the cost of optimizing the network, on the basis of which the optimal edge-addition configuration with minimum cost can be determined. The results of this paper can provide both theoretical and technical guidance for the analysis and control of real complex networks. They shed some light on the transformation of a structurally uncontrollable network into a structurally controllable one at low cost. For example, in a power network, transmission lines with the lowest cost can be set up among substations to safely and efficiently control the entire power network.
The rest of the paper is organized as follows. Section 2 introduces the notation and terminology used in this paper. Problem formulation and preliminaries on graph theory are introduced in Section 3. The main results are given in Section 4. In Section 5, a network cost index is given to determine the minimum-cost edge-addition configuration. Finally, the summary of this paper and the prospect of future research are presented in Section 6.

Notations
In this paper, $\mathbb{R}$ denotes the set of real numbers, $\mathbb{R}^m$ is the space of real $m$-vectors, and $\mathbb{R}^{m \times n}$ is the space of $m \times n$ real matrices. For a set $S$, its cardinality is denoted by $|S|$.
A directed graph $G = (V, E)$ consists of a node set $V = \{1, 2, \ldots, n\}$ and an edge set $E = \{(i, j)\}$. Here, $(i, j) \in E$ implies that there exists a directed edge from node $i$ to node $j$, and $i$ and $j$ are called the parent node and the child node, respectively. We can also say that the tail node $i$ is pointing toward the head node $j$. For a digraph $G$, a directed path of length $k + 1$ from node $i$ to node $j$ is defined as a sequence of distinct edges of the form $(i, i_1), (i_1, i_2), \ldots, (i_k, j)$, in which all nodes $i, i_1, \ldots, i_k, j$ are distinct. Here, node $i$ is called the beginning node and $j$ the end node of the directed path.
A directed graph is said to be strongly connected if there exists a directed path between any two nodes. A strongly connected component (SCC) is a maximal subgraph G s that is strongly connected. Particularly, a source SCC has no incoming edges from another SCC.
A digraph $G$ contains a dilation if there is a subset of nodes $S \subset V$ such that the neighbor set of $S$, denoted by $T(S)$, has fewer nodes than $S$ itself, i.e., $|T(S)| < |S|$. Here, $T(S)$ is the set of nodes $j$ for which there is a directed edge from node $j$ to some node in $S$. Notice that a digraph $G$ contains no dilation if each node has its own independent parent node. Intuitively, a dilation is a subgraph containing a relatively large number of nodes that are "dominated" by a small number of other nodes.

Problem Statement and Preliminaries
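Consider a linear time-invariant (LTI) networked dynamical system described by
$$\dot{x}(t) = A x(t) + B u(t), \qquad (1)$$
where $x(t) \in \mathbb{R}^n$ is the state vector, $u(t) \in \mathbb{R}^m$ is the input vector, $B \in \mathbb{R}^{n \times m}$ is the input matrix identifying the nodes that are directly controlled, and $A = (a_{ij}) \in \mathbb{R}^{n \times n}$ is the adjacency matrix of the underlying network. The overall networked system described by (1) can be denoted by the matrix pair $(A, B)$.

Definition 1. Linear network (1) is said to be state controllable if, for any initial state $x(t_0) \in \mathbb{R}^n$ and any final state $x(t_f) \in \mathbb{R}^n$, there exist a finite time $t_1$ and an input $u(t)$ that drives the state from $x(t_0)$ to $x(t_f)$ at time $t_1$.
If networked system (1) is state controllable, we can say that the matrix pair $(A, B)$ is state controllable.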
Definition 2 (see [16,31]). A linear control system $(A, B)$ is a structured system if the elements in $A$ and $B$ are either fixed zeros or independent nonzero parameters. Both $A$ and $B$ are called structured matrices.
In this paper, it is assumed that we only know the structure of the matrices $A$ and $B$. This means that we know which elements in the matrices are fixed to zero and consequently which elements are nonzero free parameters.

Definition 3. A linear control system $(A, B)$ is structurally controllable if we can assign values to the nonzero parameters in $A$ and $B$ such that the resulting system is state controllable in the sense of Kalman, as defined in Definition 1.
A structured system can be represented by a directed graph whose nodes denote the (state and input) variables and whose edges indicate the connections between variables [31]. In this paper, a structured system $(A, B)$ is denoted by a directed graph $G(A, B) = (V, E)$, in which $V = V_A \cup V_B$ is the node set and $E = E_{V_A,V_A} \cup E_{V_B,V_A}$ is the edge set. In particular, $V_A = \{x_1, x_2, \ldots, x_n\}$ is the set of state nodes, corresponding to the $n$ nodes in the original network; $V_B = \{u_1, u_2, \ldots, u_m\}$ is the set of input nodes, corresponding to the $m$ inputs; $E_{V_A,V_A} = \{(x_i, x_j) \mid a_{ji} \neq 0\}$ is the set of edges between state nodes; and $E_{V_B,V_A} = \{(u_i, x_j) \mid b_{ji} \neq 0\}$ is the set of edges between input nodes and state nodes. Throughout the paper, we suppose that any input signal is applied to only one node, referred to as a driver node. A state node is reachable if there is a directed path from some input node to this state node. Similarly, a node set is reachable if each node in the set is reachable. Notice that, in the remainder of the paper, unless otherwise specified, reachability is only used for the state nodes.
In a digraph, an edge subset $M$ is a matching if no two edges in $M$ share a common parent node or a common child node. A matching of maximum size is called a maximum matching.
The maximum matching of a digraph can be obtained by mapping the digraph to its bipartite representation. Consider a directed network $G(A, B)$, whose bipartite representation is denoted by $B(A, B)$. That is, each state node $x_i$ of the original digraph is split into two nodes $x_i^+$ and $x_i^-$. To describe the relationship between the digraph and its bipartite graph, we use a signal-notation mapping $f$ that maps directed edges of the system digraph into undirected edges of the system bipartite graph.
Definition 4. The element $r_{ij} = 1$ in the matrix $R \in \mathbb{R}^{n \times n}$ if there is a directed path from node $i$ to node $j$ ($i \neq j$), and $r_{ij} = 0$ otherwise. Set $r_{ii} = 1$, $i = 1, 2, \ldots, n$. The matrix $R$ is called the reachable matrix.
If only one external input is applied to node 1, then the first row of the matrix R can be used to determine which nodes are unreachable.
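As an illustration of Definition 4, the following minimal sketch builds a reachable matrix with Warshall's transitive closure and reads the unreachable nodes off its first row. The function names and the 4-node network are hypothetical, and nodes are indexed from 0, so node 1 of the text is index 0 here.

```python
def reachable_matrix(n, edges):
    """Return R with R[i][j] = 1 if node j can be reached from node i."""
    # r_ii = 1 by definition; every edge gives a path of length one.
    R = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        R[i][j] = 1
    # Warshall's algorithm: allow node k as an intermediate stop.
    for k in range(n):
        for i in range(n):
            if R[i][k]:
                for j in range(n):
                    if R[k][j]:
                        R[i][j] = 1
    return R


def unreachable_from_first(R):
    """Column indices of the 0 elements in the first row of R."""
    return [j for j, r in enumerate(R[0]) if r == 0]


# Hypothetical network: the input is applied to node 0.
edges = [(0, 1), (1, 0), (2, 3), (3, 1)]
R = reachable_matrix(4, edges)
unreachable = unreachable_from_first(R)
print(R[0], unreachable)
```

For this example the first row is `[1, 1, 0, 0]`, so nodes 2 and 3 are unreachable from the driver node. The cubic-time closure is chosen for brevity; a breadth-first search from the driver node would suffice when only the first row is needed.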

Definition 5. The element $p_{ij} = 1$ in the matrix $P$ if edge $(i, j)$ is one of the matching edges of a maximum matching of the bipartite graph. The matrix $P$ is called the maximum matching matrix.
The maximum matching of a directed graph is not unique. Therefore, the corresponding maximum matching matrix $P$ is not unique. The number of nonzero elements in $P$ equals the number of matching edges in the maximum matching, and each row and each column has at most one nonzero element. If the $j$-th column consists entirely of zero elements, node $j$ in the network does not have its own independent parent node.
Definition 6. Consider a directed network in which only one external input signal is applied to node 1. If $n = \sum_{j=1}^{n} r_{1j}$, where $r_{1j}$ are the entries of the first row of $R$, then such a reachable matrix $R$ is called a $1-R$ matrix. Obviously, if the reachable matrix $R$ of a network is a $1-R$ matrix, then all the state nodes in the network are reachable.
Definition 7. Consider a directed network in which only one external input signal is applied to node 1. If the maximum matching matrix $P$ has a unique nonzero element in each column except for the first column, then such a maximum matching matrix $P$ is called a $1-P$ matrix. Obviously, if the maximum matching matrix $P$ of a network is a $1-P$ matrix, then there is no dilation in the network.
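To make Definitions 5 and 7 concrete, the sketch below computes one maximum matching with the standard augmenting-path method (Kuhn's algorithm, a swapped-in textbook technique, not necessarily the procedure used in this paper) and tests the $1-P$ property. It assumes that the bipartite representation includes the input node as an extra parent of the driver node; the function names and the example networks are hypothetical, and nodes are indexed from 0.

```python
def maximum_matching_matrix(n, edges, driver=0):
    """Return the n x n state block P of one maximum matching."""
    # children[i] lists the child copies that parent copy i may be matched to.
    children = [[] for _ in range(n + 1)]
    for i, j in edges:
        children[i].append(j)
    children[n].append(driver)  # index n plays the role of the input node u

    parent_of = [-1] * n  # parent_of[j] = i means edge (i, j) is matched

    def try_assign(i, visited):
        # Search for an augmenting path starting at parent copy i.
        for j in children[i]:
            if j not in visited:
                visited.add(j)
                if parent_of[j] == -1 or try_assign(parent_of[j], visited):
                    parent_of[j] = i
                    return True
        return False

    for i in range(n + 1):
        try_assign(i, set())

    P = [[0] * n for _ in range(n)]
    for j, i in enumerate(parent_of):
        if 0 <= i < n:  # edges matched to the input node are not part of P
            P[i][j] = 1
    return P


def is_one_P(P):
    """True if every column except the first has exactly one nonzero entry."""
    n = len(P)
    return all(sum(P[i][j] for i in range(n)) == 1 for j in range(1, n))


# Hypothetical network with a dilation: nodes 1 and 3 share the only parent 0.
P1 = maximum_matching_matrix(4, [(0, 1), (1, 2), (0, 3)])
# Adding edge (2, 3) gives node 3 its own independent parent.
P2 = maximum_matching_matrix(4, [(0, 1), (1, 2), (0, 3), (2, 3)])
print(is_one_P(P1), is_one_P(P2))
```

Including the input node in the matching makes the test independent of which maximum matching is found, because the driver node can always take the input node as its parent.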
A necessary and sufficient condition for the structural controllability of an LTI system is given as follows [31].
Lemma 1 (see [31]). The pair $(A, B)$ is structurally controllable if and only if the following two conditions are satisfied simultaneously: (1) there are no unreachable state nodes in $G(A, B)$; (2) $G(A, B)$ contains no dilation.
Then, we have the following controllability criterion.
In this paper, given a structurally uncontrollable directed network, we study the problem of adding the fewest edges to the topology so as to obtain a structurally controllable system. After we get these optimal edge-addition configurations, we need to calculate the network cost of each optimization scheme and choose the one with the minimum cost. In summary, the problem is given as follows.
Problem 1. Given a structurally uncontrollable pair $(A, B)$, find
$$A^* = \arg\min_{\tilde{A}} \|\tilde{A}\|_0 \quad \text{s.t. } (A + \tilde{A}, B) \text{ is structurally controllable}, \qquad (4)$$
where $\|\tilde{A}\|_0$ denotes the number of nonzero elements in the matrix $\tilde{A}$.
If $(A + \tilde{A}, B)$ is structurally controllable, we refer to the matrix $\tilde{A}$ as an effective perturbed matrix and to $A^*$ in (4) as the modified matrix. The aim of this paper is to provide a characterization of all possible modified matrices by using graph-theoretical tools and to design an algorithm to obtain such a solution.
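For very small networks, the minimum edge-addition problem can be cross-checked by exhaustive search. The sketch below is only such a brute-force baseline, not the polynomial-time algorithm of this paper: it tests the two conditions of Lemma 1 (reachability from the input node, and absence of a dilation via a matching that gives every state node its own parent) for edge sets of increasing size. All names are hypothetical, self-loops are excluded as candidates by assumption, and a single driver node with index 0 is assumed.

```python
from itertools import combinations


def is_structurally_controllable(n, edges, driver=0):
    children = [[] for _ in range(n + 1)]
    for i, j in edges:
        children[i].append(j)
    children[n].append(driver)  # index n is the input node u

    # Condition (1): every state node is reachable from the input node.
    seen, stack = {n}, [n]
    while stack:
        for j in children[stack.pop()]:
            if j not in seen:
                seen.add(j)
                stack.append(j)
    if len(seen) != n + 1:
        return False

    # Condition (2): no dilation, i.e. a maximum matching covers all state nodes.
    parent_of = [-1] * n

    def try_assign(i, visited):
        for j in children[i]:
            if j not in visited:
                visited.add(j)
                if parent_of[j] == -1 or try_assign(parent_of[j], visited):
                    parent_of[j] = i
                    return True
        return False

    matched = sum(try_assign(i, set()) for i in range(n + 1))
    return matched == n


def minimum_edge_additions(n, edges, driver=0):
    """All smallest sets of added edges that make the network controllable."""
    existing = set(edges)
    candidates = [(i, j) for i in range(n) for j in range(n)
                  if i != j and (i, j) not in existing]
    for k in range(len(candidates) + 1):
        found = [set(extra) for extra in combinations(candidates, k)
                 if is_structurally_controllable(n, edges + list(extra), driver)]
        if found:
            return found
    return []


# Hypothetical network: all nodes reachable, but nodes 1 and 3 form a dilation.
base = [(0, 1), (1, 2), (0, 3)]
configs = minimum_edge_additions(4, base)
print(configs)
```

For this example one added edge suffices, and three single-edge configurations are returned. The search grows exponentially with the number of candidate edges, so it is useful only as a sanity check on toy instances.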

Network Topology Optimization to Ensure Structural Controllability
Note that the system digraph is denoted by $G(A, B) = (V, E)$. Therefore, given an effective perturbed matrix $\tilde{A}$, we can relate a digraph to the perturbed system, obtained by adding a set of edges $\tilde{E}$ to $E$. Since the matrix $\tilde{A}$ is closely related to the edge set $\tilde{E}$, we can rewrite Problem 1 in a different way.
Problem 2. Given the system digraph $G(A, B)$, find a minimum-size set of added edges $\tilde{E}$ such that the reachable matrix of the new digraph is a $1-R$ matrix and the maximum matching matrix is a $1-P$ matrix.
Additionally, define a feasible edge-addition configuration as a set of directed edges that is a feasible solution of Problem 2. The solutions to Problem 2 are given in this section. First, a definition is introduced to describe the smallest set of edges needed to achieve reachability, i.e., to satisfy condition (1) in Lemma 1. Let $G(A, B) = (V, E)$ be the system digraph. The set of state nodes $V_A$ can be divided into two sets based on their reachability, namely, $V_A = \mathcal{R} \cup N$, where $\mathcal{R}$ is the set of reachable nodes and $N$ is the set of unreachable nodes. In addition, assume that there are $r$ source SCCs that are unreachable, whose node sets are denoted by $N_1, N_2, \ldots, N_r \subseteq N$. In order to make the nodes in these unreachable source SCCs reachable, we need to add a new edge from a reachable node to a node in each source SCC so that all the nodes in the source SCC become reachable. Moreover, since the source SCC has outgoing edges pointing to other nodes, the unreachable nodes that are connected to the source SCC will also become reachable.
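The partition into reachable and unreachable nodes and the identification of unreachable source SCCs can be sketched as follows. This is a minimal illustration that groups nodes into SCCs by mutual reachability (a cubic-time shortcut chosen for brevity, in place of Tarjan's or Kosaraju's algorithm). The function name and the 6-node network are hypothetical; the network is constructed only so that its unreachable sets have the same shape as those quoted later for Figure 2, and it is not the network of that figure. Nodes are indexed from 0.

```python
def unreachable_source_sccs(n, edges, driver=0):
    """Node sets of source SCCs that cannot be reached from the driver node."""
    # Transitive closure by Warshall's algorithm.
    reach = [[i == j for j in range(n)] for i in range(n)]
    for i, j in edges:
        reach[i][j] = True
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True

    # Two nodes share an SCC exactly when each can reach the other.
    scc_of = [None] * n
    sccs = []
    for i in range(n):
        if scc_of[i] is None:
            comp = [j for j in range(n) if reach[i][j] and reach[j][i]]
            for j in comp:
                scc_of[j] = len(sccs)
            sccs.append(comp)

    # A source SCC has no incoming edge from another SCC.
    has_incoming = [False] * len(sccs)
    for i, j in edges:
        if scc_of[i] != scc_of[j]:
            has_incoming[scc_of[j]] = True

    return [comp for c, comp in enumerate(sccs)
            if not has_incoming[c] and not reach[driver][comp[0]]]


# Hypothetical network with the input applied to node 0.
edges = [(0, 1), (1, 3), (2, 3), (4, 5), (5, 4), (5, 3)]
sources = unreachable_source_sccs(6, edges)
print(sources)
```

Here the result is `[[2], [4, 5]]`: one new edge into each of these two components is enough to make every node reachable, which is the role of the connected edges discussed next.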

Definition 8. If a set $S_E$ is made up of connected edges, then $S_E$ is called the connected edge set. Here, a connected edge refers to an edge connecting a reachable node to an unreachable node.
Algorithm 1 is illustrated in Figure 1. The connected edge set contains the minimum number of added edges required to ensure that all the state nodes are reachable. Obviously, the connected edge set can only satisfy condition (1) in Lemma 1 and cannot guarantee the structural controllability of the networked system. To ensure structural controllability of the system, the edge additions must satisfy two conditions: (i) they contain a set of connected edges, and (ii) the "tail" node of each new edge is not used as an independent parent node in the maximum matching, while the "head" node of the edge has no independent parent node.
Theorem 2. Let $M$ be a maximum matching, let $U_o(M) = \{v_i^o : i \in \{1, 2, \ldots, n_o\}\}$ be a node set in which each node is not used as an independent parent node, and let $U_r(M) = \{v_i^r : i \in \{1, 2, \ldots, n_r\}\}$ be a node set with no independent parent nodes. A set $\tilde{E}$ is a feasible edge-addition configuration if and only if it contains the union of two sets: a set of connected edges, and a set of edges whose tail nodes belong to $U_o(M)$ and whose head nodes $v_i^r$, $i \in \{1, 2, \ldots, n_r\}$, belong to $U_r(M)$.
Theorem 2 provides some feasible edge-addition configurations, but we need to find the optimal one among them. Therefore, the first task is to select the optimal solution from these feasible solutions. From the above discussion, it can be found that, after determining the maximum matching of a bipartite graph, if the unmatched nodes (nodes without independent parent nodes) happen to be distributed in different source SCCs, then the added edges meet both conditions in Lemma 1, which is exactly what is needed. To explore this situation, we introduce the following concepts.

Definition 9. Consider a directed network $G(A, B)$, whose bipartite representation is denoted by $B(A, B)$. Let $M$ be a maximum matching associated with $B(A, B)$. Moreover, let $U_r(M)$ be the set of nodes that have no independent parent nodes. If an unreachable source SCC contains at least one node $i \in U_r(M)$, then such an unreachable source SCC is called an ideal source SCC.
Whether an unreachable source SCC is an ideal source SCC depends mainly on the specific maximum matching. Because there may be more than one maximum matching corresponding to a directed network, whether a node has an independent parent node cannot be determined until a particular maximum matching is fixed.

Definition 10. The number $N_s$ of the directed network $G(A, B)$ is defined as the maximum number of ideal source SCCs over all the maximum matchings.

We can determine a maximum matching attaining N s using Algorithm 2.
We take Figure 2 as an example to illustrate Algorithm 2. From the reachable matrix $R$ corresponding to the digraph in Figure 2(a), the unreachable node set can be determined as $N = \{x_3, x_5, x_6\}$ by the positions of the 0 elements in the first row of $R$. Moreover, there are two unreachable source SCCs (red box), whose node sets are $N_1 = \{x_3\}$ and $N_2 = \{x_5, x_6\}$, respectively. Then, we label columns 3, 5, and 6 of $R$. Figure 2(b) shows the bipartite representation of the original directed network (Figure 2(a)). In order to make the column ordinals corresponding to all 0 columns in the maximum matching matrix $P$ coincide with the marked column ordinals as much as possible, an ideal maximum matching $M$ is determined in Figure 2(c), with corresponding maximum matching matrix $P^*$. There are at most two 0 columns in $P^*$ that are consistent with the marked column ordinals; the corresponding node $x_3$ is located in $N_1$ and node $x_5$ is located in $N_2$, so $N_s = 2$.
If all the state nodes that are not used as independent parent nodes are unreachable, then additional edges are needed to satisfy condition (1) in Lemma 1. Therefore, in this case, calculating $N_s$ according to Algorithm 2 does not necessarily lead to an optimal configuration of added edges. To illustrate this statement, we take Figure 3 as an example.
Next, we propose Algorithm 3 to solve Problem 2. Algorithm 3 is mainly divided into the following four steps:
Step 1. All the state nodes in the directed network are classified into a reachable node set and an unreachable node set, based on node reachability.
Step 2. Determine the ideal maximum matching to get $N_s$. If there exist some unreachable nodes that are not used as independent parent nodes in the ideal maximum matching, then we alter the matching by finding a directed path rooted at the input node.
Step 3. Add some edges to satisfy Lemma 1. These edges start at reachable nodes that are not used as independent parent nodes and end at nodes that have no independent parent nodes in unreachable source SCCs.
Step 4. If there are unreachable nodes that are not used as independent parent nodes, then we need to add a set of connected edges to ensure that both conditions of Lemma 1 are satisfied.
Consider a structurally uncontrollable system $(A, B)$ that contains unreachable nodes and/or dilations. We need to optimize the network topology to ensure structural controllability by adding edges. Algorithm 3 is given to obtain the optimal edge-addition configuration that solves Problem 2.
ALGORITHM 2: Determine the ideal maximum matching to get $N_s$.
(1) Write the reachable matrix $R$ of the directed network, and determine the unreachable node set $N$ in the network by the positions (column ordinals) of the 0 elements in the first row.
(2) Find the unreachable source SCCs.
(3) Select the nodes located in the source SCCs from the unreachable node set $N$ and mark their column ordinals.
(4) Use the marked column ordinals to identify an ideal maximum matching $M$. Its corresponding maximum matching matrix is $P^*$. The column ordinals corresponding to all 0 columns in the matrix $P^*$ need to match the marked column ordinals as much as possible.
(5) According to Step 3, an ideal maximum matching matrix $P^*$ can be obtained. From the matrix $P^*$, the nodes corresponding to the matching column ordinals can be found.
(6) Based on the distribution of the nodes found in Step 5 in the source SCCs, $N_s$ can be calculated.
Next, an example in Figure 4 is given to illustrate Algorithm 3.

Network Optimization Cost
We have solved the optimal edge-addition configuration problem; however, there are multiple potential edge-addition configurations that ensure structural controllability. From the application perspective, the lowest-cost configuration is usually selected as the final optimization solution. Therefore, we present Problem 3 based on Problem 2, taking the network cost into account. In order to solve Problem 3, we introduce an edge cost measurement index to calculate the edge cost and thus obtain the cost of the whole network.
Figure 3: The maximum matching of a directed graph is not unique, and different maximum matchings will result in different feasible edge-addition configurations. In (a), the initial system digraph $G(A, B)$ is given. The red edges in (b) and (d) form two different maximum matchings. The red edges in (c) and (e) are determined by the maximum matchings in (b) and (d), respectively. In (c), after determining the maximum matching, node $x_4$ has no independent parent node and node $x_2$ has not been used as a parent node. So, we need to add the edge $(x_2, x_4)$ to satisfy condition (2) of Lemma 1. Since node $x_2$ is unreachable, we also need to add the edge $(x_1, x_2)$ to satisfy condition (1) of Lemma 1. Then, we have $E_1 = \{(x_1, x_2), (x_2, x_4)\}$. In (e), after determining the maximum matching, node $x_4$ has no independent parent node and node $x_1$ has not been used as a parent node. So, we can add the edge $(x_1, x_4)$ to satisfy both conditions of Lemma 1, i.e., $E_2 = \{(x_1, x_4)\}$. Therefore, $E_2$ is an optimal edge-addition configuration but $E_1$ is not.

Input: A directed network $G(A, B)$;
(1) All the state nodes in the network are classified into a reachable node set $\mathcal{R}$ and an unreachable node set $N$. Then, determine the unreachable source SCCs in the directed network $G(A, B)$.
...
where the beginning node of each $L_i$ is in some unreachable source SCC and the end node is not used as a separate parent node;
(10) Let $Q = \{q_1, q_2, \ldots, q_n\}$ and $Z = \{z_1, z_2, \ldots, z_n\}$, where $q_i$ and $z_i$ are the beginning and end nodes of each path $L_i$, respectively;
(11) Let $E^* \leftarrow \emptyset$, $k \leftarrow 1$;
...
In addition, we need to adopt a simple and practical method to calculate the cost of the network and determine a minimum-cost configuration that ensures controllability, based on the optimal edge-addition configurations.

Problem 3. Consider a directed network $G(A, B)$. Find an optimal edge-addition configuration $E^*$ s.t. the new directed network $G(A + \tilde{A}, B)$ contains neither unreachable nodes nor dilations. Also, the cost of the new directed network must be the lowest one.

Main Idea.
Consider a structurally uncontrollable directed network $G(A, B)$. The optimal edge-addition configurations are obtained by using Algorithm 3. The first step of calculating the network optimization cost is to obtain the load of each node in the network. Note that the nature of node load is consistent with the betweenness centrality of the node. The betweenness centrality of a node is the proportion of shortest paths that pass through the node among all shortest paths. Intuitively, the betweenness centrality reflects the importance of the node as a "bridge." Therefore, the initial load on each node can be denoted by its betweenness centrality [27]. We can calculate the betweenness centrality of each node with the Pajek software after importing a directed network.
There is a nonlinear relationship between the load of a node and its capacity [38,39], so we can determine the node capacity by this nonlinear relation. The cost of a node can be measured by its node capacity in the network. We take the larger of the two node capacities as the cost of the edge that connects these two nodes [40]. In this paper, we calculate the network costs of all optimal edge-addition configurations and then choose the one with the lowest network cost as the optimal edge-addition configuration. The specific calculation process of network cost is given as follows:
Step 1. Node load can be measured by the betweenness centrality
$$C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}},$$
where $C_B(v)$ denotes the betweenness centrality of node $v$, $\sigma_{st}(v)$ denotes the number of shortest directed paths ($s \to t$) that pass through node $v$, and $\sigma_{st}$ is the number of shortest directed paths from node $s$ to node $t$.
Step 2. There is a nonlinear relationship between node load and node capacity, in which $\mathrm{Cap}(v)$ is the capacity of node $v$ and $\alpha > 0$, $\beta > 0$. Since there is a positive correlation between node load and capacity, set $\alpha = \beta = 1$, which determines the node capacity.
Step 3. Use the index of node capacity to measure the node cost
$$\mathrm{Cost}(v) = \mathrm{Cap}(v),$$
where $\mathrm{Cost}(v)$ denotes the cost of node $v$.
Step 4. Compare the capacities of the two nodes of an edge, and take the larger one as the capacity of the edge (edge cost)
$$\mathrm{Cost}(l_{ij}) = \max\{\mathrm{Cap}(i), \mathrm{Cap}(j)\},$$
where $\mathrm{Cost}(l_{ij})$ is the cost of edge $l_{ij}$.
Step 5. Calculate the network cost of each configuration according to Step 4
$$\mathrm{Cost(Net)} = \sum \mathrm{Cost}(l_{ij}), \qquad (15)$$
where $\mathrm{Cost(Net)}$ denotes the cost of the whole network.
Figure 4: An example with state nodes $x_1, \ldots, x_8$ illustrating Algorithm 3. We first decompose the directed graph according to the first step of Algorithm 3. In (b), we provide $B(A, B)$, the bipartite graph of the directed graph, to attain a maximum matching $M'$ (red edges) according to Step 2 of Algorithm 3. According to the maximum matching, nodes $x_4$, $x_6$, and $x_7$ have no independent parent node, $U_r(M') = \{x_4, x_6, x_7\}$. The nodes $x_2$, $x_3$, $x_5$, and $x_6$ are not used as independent parent nodes, $U_o(M') = \{x_2, x_3, x_5, x_6\}$. According to Step 3 of Algorithm 3, reachable node $x_2$ is not used as a parent node, $x_2 \in \mathcal{R}$, $x_3, x_5, x_6 \in N$. Therefore, we need to pick nodes in $U_o(M')$ and $U_r(M')$, respectively, and they form edges that make nodes $x_3$, $x_5$, and ...
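The cost calculation in Steps 1 to 5 can be sketched in a few lines. The sketch below computes unnormalized betweenness centrality with Brandes' algorithm for unweighted directed graphs (instead of Pajek, whose values may be normalized), and it makes two assumptions that are not confirmed by the text: the capacity relation is taken to be $\mathrm{Cap}(v) = \alpha \cdot \mathrm{Load}(v)^{\beta}$, and the network cost is summed over all edges of the modified network. Function names and example networks are hypothetical, with nodes indexed from 0.

```python
from collections import deque


def betweenness(n, edges):
    """Unnormalized betweenness centrality (Brandes, directed, unweighted)."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
    cb = [0.0] * n
    for s in range(n):
        order, preds = [], [[] for _ in range(n)]
        sigma = [0] * n   # number of shortest paths from s
        dist = [-1] * n
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = [0.0] * n
        for w in reversed(order):  # accumulate dependencies back toward s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return cb


def network_cost(n, edges, alpha=1.0, beta=1.0):
    load = betweenness(n, edges)                       # Step 1
    cap = [alpha * x ** beta for x in load]            # Step 2 (ASSUMED form)
    cost = cap                                         # Step 3: Cost(v) = Cap(v)
    edge_cost = {(i, j): max(cost[i], cost[j]) for i, j in edges}  # Step 4
    return sum(edge_cost.values())                     # Step 5 (sum over all edges)


path_cost = network_cost(4, [(0, 1), (1, 2), (2, 3)])

# Compare three hypothetical single-edge configurations on one base network.
base = [(0, 1), (1, 2), (0, 3)]
costs = [network_cost(4, base + [e]) for e in [(2, 1), (2, 3), (3, 1)]]
print(path_cost, costs)
```

With $\alpha = \beta = 1$ the capacity equals the load, so the ranking of configurations depends only on the betweenness values. A normalized betweenness would rescale every cost by the same factor and leave the ranking unchanged.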

Data Processing. In Figure 4(a), the initial directed network $G(A, B)$ is given. The optimal edge-addition configurations $E^*$ are obtained by Algorithm 3. The new directed network resulting from the first configuration scheme is shown in Figure 5. Figure 6 shows the curve of the state of each node over time.
We import this new directed network $G(A + \tilde{A}, B)$ into the Pajek software to calculate the betweenness centrality of each node. The original data of betweenness centrality of each node are shown in Table 1. In Table 2, we collate the data of node load, node capacity, edge cost, and network cost according to each step described in Section 5.1. Then, we get the network cost of the first configuration scheme.
The new directed network resulting from the second configuration scheme is shown in Figure 7. The original data of betweenness centrality of each node are shown in Table 3. Similarly, we can obtain the data of node load, node capacity, edge cost, and network cost, as shown in Table 4.
The new directed network resulting from the third configuration scheme is shown in Figure 8. The original data of betweenness centrality of each node are shown in Table 5. Similarly, we can obtain the data of node load, node capacity, edge cost, and network cost, as shown in Table 6.
The new directed network resulting from the fourth configuration scheme is shown in Figure 9. The original data of betweenness centrality of each node are shown in Table 7. Furthermore, we can obtain the data of node load, node capacity, edge cost, and network cost, as shown in Table 8.
Comparing the network costs of the above four configuration schemes, we choose the fourth scheme as the optimal edge-addition configuration so as to get the solution of Problem 3.

Illustrative Example.
In [23], a directed network as shown in Figure 10 is considered. The authors proposed 14 edge-addition configurations, i.e., $E^* = \{(x_2, x_{10}), (x_9, x_5), (x_i, x_j)\}$, $i \in \{1, \ldots, 6, 10\}$, $j \in \{7, 8\}$. However, they did not indicate which one is the optimal edge-addition configuration with the lowest cost. Using the results of our work, the cost of each optimization scheme can be calculated, and finally a scheme $E^*_8 = \{(x_2, x_{10}), (x_9, x_5), (x_4, x_8)\}$ with the lowest cost can be selected to ensure the structural controllability of the network.
Table 1: The original data of betweenness centrality of each node in Figure 5. We calculate the betweenness centrality value of eight nodes in Figure 5 by Pajek software.
Table 3: The original data of betweenness centrality of each node in Figure 7. We calculate the betweenness centrality value of eight nodes in Figure 7 by Pajek software.
Figure 7: The new directed network resulting from the second configuration scheme $E^*_2 = \{(x_2, x_4), (x_5, x_7), (x_3, x_6)\}$.
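The count of 14 configurations follows from 7 admissible tail nodes times 2 admissible head nodes for the third edge. The short sketch below enumerates them; the variable names and the lexicographic ordering are assumptions made for illustration.

```python
# Two edges are common to every configuration reported from [23].
fixed = [(2, 10), (9, 5)]
tails = [1, 2, 3, 4, 5, 6, 10]   # admissible i for the third edge (x_i, x_j)
heads = [7, 8]                   # admissible j for the third edge

configurations = [fixed + [(i, j)] for i in tails for j in heads]
print(len(configurations))       # 7 * 2 = 14
print(configurations[7])         # eighth configuration under this ordering
```

Under this ordering, the eighth configuration has the third edge $(x_4, x_8)$, which coincides with the scheme $E^*_8$ selected in the text.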

Conclusions
In this paper, we have solved the problem of how to optimize the network topology to ensure structural controllability. Given a structurally uncontrollable directed network, Algorithm 3 presents all possible edge-addition configurations.
After determining the optimal edge-addition configuration, a network cost index is given to choose the lowest cost configuration.
In the future, we can combine the two strategies of adding edges and adding external input signals to ensure network controllability and choose the scheme with the highest benefit by comparing the costs of several strategies. In addition, we can extend a single directed network to the topology design of a multiplex network [29,41] so as to ensure the structural controllability of the multiplex network.
Table 5: The original data of betweenness centrality of each node in Figure 8. We calculate the betweenness centrality value of eight nodes in Figure 8 by Pajek software.
Figure 9: The new directed network resulting from the fourth configuration scheme $E^*_4 = \{(x_2, x_4), (x_6, x_7), (x_3, x_6)\}$.
Table 7: The original data of betweenness centrality of each node in Figure 9. We calculate the betweenness centrality value of eight nodes in Figure 9 by Pajek software.
Figure 10: A directed network.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.