Communities detection in multiplex networks using optimization. Study case: Employment in Mexico during the COVID-19 pandemic

In this work we present a methodology based on the robust coloring problem (RCP) and the vertex cover problem (VCP) in order to find the communities in multiplex networks. For this, we consider that the RCP finds a partial detection based on the similarity of connected and unconnected nodes and using VCP we look for that


Introduction
In recent years, the analysis of several characteristics of multilayer and multiplex networks has been of great importance to scientists. At the same time, since 2020 the world has been affected by the pandemic caused by the disease known as COVID-19, and although most of its effects were related to the health of the population, the pandemic has also caused various socioeconomic effects across the world.
In this work we focus on the detection of communities in multiplex networks and as an applied case study, we analyze the dynamics of employment in Mexico, characterizing it as a complex network of multiple layers (multiplex network).
The detection of communities aims to identify modules or groups with one or more properties in common, typically based solely on the information encoded in the topology of the network. However, we present a novel methodology to identify communities without using topological information.

Complex systems
At the moment, there is no precise and accurate definition of what a complex system is. However, complex systems are composed of several elements connected in a certain way, and from those relationships some properties arise, such as:
1. Emergence: It is the essential characteristic of complex systems, and it arises from the difference between analyzing each component separately and analyzing the entire system. That is, emergent behavior consists of the properties that arise from the interactions of all the elements that make up the system and that cannot be seen or predicted by analyzing each element one by one.
2. Self-organization: It is the characteristic that allows the system to coordinate and synchronize all its elements and processes autonomously, without requiring an internal or external agent to direct these activities.
3. Null predictability: This property follows from the two previous ones, since emergent behavior and self-organization make the behaviors and dynamics of complex systems difficult to predict.

Complex networks
The usual representation of complex systems is through network science (complex network analysis). In recent years, the study of the structure of complex networks and their applications has become an essential research topic [1][2][3] because most systems of daily life can be modeled as complex networks, for example: social networks, transport, electrical or communications networks, and epidemic propagation, among others [4][5][6].
A complex network is a network with non-trivial topological characteristics, which do not occur in simple networks, such as: degree distributions, high local cohesiveness (measured through the clustering coefficients, etc.), community structures, hierarchical structures, among others [7].
In recent years, the study of multilayer networks has been emphasized thanks to the fact that most real systems have structures with multiple types of links or interactions between nodes [8], for example: multimodal transport systems, biological systems, social networks and numerous communication modes [9].

Multiplex networks
On the other hand, multiplex networks are a particular class of multilayer networks, which were introduced to better model complex real-world systems [10][11][12]. The main characteristic of multiplex networks is that all the nodes in each layer are replicated in the other layers, and there is a direct link between each node and its replicas to denote the relationship. Formally, let $G_P = (G_\alpha, C)$, $\forall\, \alpha \in \{1, \dots, M\}$, be a multiplex network where:
• $G_\alpha = (X_\alpha, E_\alpha)$ is a monoplex network called layer $\alpha$, where $X_\alpha$ and $E_\alpha$ are the set of nodes and the set of links in layer $\alpha$, respectively.
• $C$ is the set of connections between nodes in different layers.
The elements of $C$ are called cross layers and the elements of each $E_\alpha$ are called intralayer connections of $G_P$.
The importance of this type of network is that we can work on different characteristics and relationships for each element and thus perform a multicriteria analysis, since, as described above, all the elements (nodes) are found in all the layers. However, because this multiplicity must be considered, the metrics and methodologies initially developed for the analysis of single-layer networks are difficult to adapt. Therefore, analyzing a system as a multiplex network has a higher degree of complexity than analyzing a single-layer system.

Employment and COVID-19
Measuring the economy and employment during COVID-19 has created a series of obstacles and opportunities for the private sector and academia. In this work we approach the problem from the perspective of complex networks and optimization.
In particular, in Mexico millions of workers have been forced to stay at home, telecommute or face consequences of the crisis such as low wages or layoffs since the end of March 2020, when the health emergency was declared due to the COVID-19 epidemic. Furthermore, after more than two years (mid-2022) the crisis continues to affect a large number of people in the country. Therefore, in this paper we seek to analyze the dynamics of employment in Mexico before and during the COVID-19 pandemic.
The rest of this paper is organized as follows: Section 2 reviews the main works related to the idea of this paper. In Section 3, we present the strategy that we used to model the employment networks and the development of the proposed methodology for the detection of communities. In Section 4, we present the numerical classification of the nodes of the employment networks, and in Section 5 we discuss the results and the limitations of the study.

Related work
The elements belonging to a complex system are known to play different roles; consequently, identifying the most influential elements (nodes) and the communities that are formed through their interactions is of great importance given the applications to real-world problems. In this section, we present the main works related to the detection of communities in complex systems and to the analysis of employment during the COVID-19 pandemic in the health sector.

COVID-19 and employment
To analyze COVID-19 and employment, most works use statistical, sociological or economic approaches. In this subsection we describe some related works.
• Cohen [13] summarizes existing and new ways to track employment statistics. However, as the author comments, the results leave more questions than answers.
• Blustein et al. [14] explore how the unemployment crisis caused by COVID-19 may differ from previous spells of unemployment, examining the nature of the pain evoked by the parallel loss of work and loss of life using a psychological approach.
Many of the works found in the literature analyze particular productive sectors, specifically health and the informal economy, for example:
• Llop-Gironés et al. [15] describe that the working and employment conditions of many nurses around the world are precarious, and that the current pandemic has increased the visibility of their vulnerability to factors that are harmful to their health.
• Webb et al. [16] look at how the pandemic affects those in informal employment, given that they often receive less government support than those in formal employment.
• In a different analysis, Webb et al. [16] target workers in flexible employment relationships (e.g. temporary agency work and other forms of contract labour, as well as new forms of work, such as in the informal economy).
• Gezici et al. [17] analyze the probability of being unemployed among white men, white women, black men, black women, and Hispanic men and women in the US, with a focus on the analysis of racism.
Specifically for Mexico, there are some works on job loss that use purely economic approaches; the main ones are:
• Samaniego [18] analyzes the necessary and immediate measures to protect the employment and income of workers who have become unemployed through the use of all available instruments.
• Ruiz-Ramirez [19] analyzes the situation of employment and underemployment in Mexico in the period immediately before and during COVID-19. The results show that, in the current situation of confinement and in view of the reduction in economic activity, people expelled from the formal sector cannot find employment in the informal sector either, so the labor market has been negatively impacted in both sectors.
• Su et al. [26] and Han et al. [27] propose node classification algorithms that are based on the identification of structural holes. A structural hole is the empty space that is produced when a node connected to multiple local bridges (multiple communities) is removed.
• Rossi et al. [28] present a unified framework for coloring large and complex networks and a parallel algorithm for coloring sub-graphs. In general, this work shows that the proposed coloring methods are accurate, with near-optimal solutions, fast and scalable for large networks, and flexible to use in a variety of applications.
• Wang et al. [29] present a modified efficiency centrality which considers the influence of the average degree of all nodes and the average distance of the network.
• J. Zhao et al. [30] show a novel method to improve PageRank based on the structural similarity of nodes calculated by the Kullback-Leibler divergence (which is a non-symmetric measure of the difference between two probability distributions).
• Liao et al. [31] present a social network community detection algorithm based on the similarity measure between pairs of vertices and the b-coloring of the graph. This algorithm guarantees polynomial time complexity, making it suitable for detecting communities in large and complex networks.
• Yang et al. [32] present an algorithm that can obtain the closeness between adjacent nodes and non-adjacent nodes depending on the interaction time of the nodes and the delay of their jumps.
• Ma and Fan [33] consider the local information (cliques) of the communities. The algorithm is based on the assumption that cliques are the core of communities, since a clique captures the local characteristics of the community. The results reported by the authors indicate that the proposed algorithm detects overlapping communities effectively.
In addition, there are some methodologies where the structural information is obtained using dynamic processes and iterative refinement methods to explore the structural properties, for example in the work of Lü et al. [34], the authors present a method to predict links through the eigenvector centrality.
On the other hand, Rahimi et al. [35] describe that most algorithms developed for community detection rely on single-objective optimization methods, which can be inefficient for complex networks.
Therefore, they propose a new multi-objective community detection method based on a modified version of particle swarm optimization, called MOPSO-Net. Experiments on real-world and synthetic networks confirmed a significant improvement in terms of normalized mutual information (NMI) and modularity compared to similar recent approaches.
These methodologies obtain information about the topology and structure of the network and the results are very reliable; however, in most cases, networks need to meet certain special characteristics, so these strategies are not applicable to all types of network models.
Finally, in the survey presented by Maji et al. [36] the authors conclude that heuristics that mainly use local structure, such as low job rating for faster and easier computation, offer a less competitive result than exact algorithms.

Materials and methods
In this section, we present the main characteristics used in the study, the structural metrics analysis and modeling of the networks.

Employment networks
As presented previously, the objective of this work is to analyze the changes and characteristics per year; that is, we build multiplex networks in which each year is a layer and all the states are present in each of the layers.
In addition, the intra-layer links are the relationships in a given year and the inter-layer relationships occur only between replica states, that is, with their analog in the different years.
Then, the number of jobs generated per year must be considered for the years 2018 and 2019 (before the pandemic) and the years 2020 and 2021 (during the pandemic).
For modeling the employment networks, we use the information available on the INEGI web page, and the links for each year are generated using the Mahalanobis distance. The Mahalanobis distance quantifies how similar two observations are (the smaller the distance, the greater the similarity). In contrast with the Euclidean distance, the Mahalanobis distance considers the correlation between the random variables [37,38].
Then, based on the calculation of this distance, the relationships of the states are given by the number of characteristics in which they are similar, and their quantification is obtained as follows:
• The Mahalanobis distance between each pair of states is calculated.
• The median of Mahalanobis distances is calculated.
• For each pair of states with a lower distance than the median, a link is added.
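The thresholding step above can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes the pairwise Mahalanobis distances have already been computed, and the function name `links_below_median` and the 4-state distance matrix are hypothetical.

```python
from itertools import combinations
from statistics import median


def links_below_median(dist):
    """Return the pairs (i, j), i < j, whose distance is below the median.

    dist: symmetric matrix (list of lists) of pairwise distances,
    assumed here to be precomputed Mahalanobis distances.
    """
    n = len(dist)
    pairs = list(combinations(range(n), 2))
    # The median is taken over the distinct pairs of states only.
    threshold = median(dist[i][j] for i, j in pairs)
    return [(i, j) for i, j in pairs if dist[i][j] < threshold]


# Hypothetical distances between 4 states (illustrative values only).
example_dist = [
    [0.0, 1.0, 4.0, 2.0],
    [1.0, 0.0, 3.0, 5.0],
    [4.0, 3.0, 0.0, 6.0],
    [2.0, 5.0, 6.0, 0.0],
]
example_links = links_below_median(example_dist)
```

With these values the median of the six pairwise distances is 3.5, so only the three closest pairs are linked.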
Based on the above, we obtain the links between the states with a high similarity for each year. Thus, in order to analyze the dynamics and behavior for the established periods, we use the multiplex network structure, where:
• The nodes of each layer represent the states of the Mexican Republic. Thus, each layer has 32 nodes.
• The intralayer relationships are obtained using the process described above.
• The interlayer relationships occur between replica nodes. Since the 32 states of the Mexican Republic are present in all the periods studied, each state is linked to its replicas in each layer.

Methodology
In this subsection we present the proposed methodology to identify communities in multiplex networks, its exemplification with a simple multiplex network and the resolution method.

Adaptation of robust coloring problem and vertex cover to identify communities in multiplex networks
For this adaptation, we consider the robust coloring problem (RCP) and the vertex cover problem (VCP).
As for RCP, we take the distances of the edges as the penalties (the greater the distance, the greater the socioeconomic difference between the states). Therefore, using the complementary network, we look for the coloration with the lowest rigidity and the highest similarity.
In addition, with VCP applied to the complementary graph, we can see how similar the nodes (for our study case, the states of the Mexican Republic) are regarding the characteristics to be analyzed. By avoiding the penalties of the non-edges (edges of the complementary graph) of RCP, we can generate communities of similar elements.
Thus, once we have the way to generate the networks and the idea of modeling the problem, we can define the following. Given a multiplex network $\mathcal{M} = (V, L, P, M)$, where:
• $V$ is the set of nodes that represent the components of the system.
• $L$ is the set of layers that represent different types of relationships or interactions in the system.
• $P$ is the set of links that represent the relationships (we have $GP_{L} \in P$ as the links of a certain layer).
• $M$ is the set of networks of each monolayer system (networks of interactions of a particular type between the nodes).
Then, given the distance matrix $d^{L}_{i,j}$ and the penalty matrix $p^{L}_{i,j} \ge 0$, $\{i,j\} \in \bar{P}$, the problem can be addressed. The rigidity of a k-coloration $c$, $R(c)$, is the sum of the penalties $p^{L}_{i,j}$ assigned to the pairs $\{i,j\} \in \bar{P}$ whose endpoints receive the same color. Thus, it is sought that the network has a minimum rigidity and, following the idea of the coverage problem, that the difference between the states that belong to the same color class is minimal. That is, they form the minimum subset $S$ of $V$, with $S(V) = \sum_{v \in V} y_v$, such that for each edge $\{i,j\}$ of the set $\bar{P}$, either node $i$ or node $j$ belongs to $S$.
Therefore, we have the following mathematical programming model, subject to (s.a.):
$$\sum_{i=1}^{n} x_{i,c} \ge 1 \qquad (4)$$
where $i \in \{1, \dots, n\}$, and each $y_u + y_v$ indicates that at least one of the nodes $u$ or $v$ is in the coverage of the non-edges $\{u, v\} \in \bar{P}$.
So the constraint set (2) ensures that each node is assigned a color; the constraints (3) ensure that adjacent nodes have different colors; the constraint set (4) ensures that all k colors are used; the constraint set (5) ensures that each non-edge is covered; and, finally, the constraint (6) indicates that each node either is or is not in the coverage of the non-edges ($\bar{P}$).
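One possible reading of the model, consistent with the descriptions of constraints (2) to (6) above, is sketched below. It is a hedged reconstruction under stated assumptions rather than the authors' exact formulation: the form of the objective (1), the binary variables $x_{i,c}$ (equal to 1 when node $i$ receives color $c$) and $y_v$ (equal to 1 when node $v$ is in the coverage), and the quantifier ranges are all assumptions. Here $P$ denotes the edges of the network being colored and $\bar{P}$ its non-edges.

```latex
\begin{align}
\min \quad & \sum_{\{i,j\} \in \bar{P}} p^{L}_{i,j} \sum_{c=1}^{k} x_{i,c}\, x_{j,c}
             \;+\; \sum_{v \in V} y_v \tag{1} \\
\text{s.t.} \quad
& \sum_{c=1}^{k} x_{i,c} = 1 \quad \forall\, i \in \{1,\dots,n\} \tag{2} \\
& x_{i,c} + x_{j,c} \le 1 \quad \forall\, \{i,j\} \in P,\; c \in \{1,\dots,k\} \tag{3} \\
& \sum_{i=1}^{n} x_{i,c} \ge 1 \quad \forall\, c \in \{1,\dots,k\} \tag{4} \\
& y_u + y_v \ge 1 \quad \forall\, \{u,v\} \in \bar{P} \tag{5} \\
& y_v \in \{0,1\} \quad \forall\, v \in V \tag{6}
\end{align}
```

Under this reading, the first term of (1) is the rigidity $R(c)$ and the second term is the size of the coverage set; how the two terms are weighted against each other is left open here.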
In order to verify the operation of this model, we present an example with a multiplex network of 5 nodes and two layers, whose individual layers are shown below.
Suppose we have the following multiplex network example:
• Layer 1
Table 1 Adjacency matrix layer 1.
The supra-adjacency matrix of the network is given in Table 3. Based on Table 3, we can see that the supra-adjacency matrix is made up of the adjacency matrices of the individual layers, related through the identity matrix for each of the nodes belonging to the multiplex network.
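The block structure just described (layer adjacency matrices on the diagonal, identity matrices coupling the replicas) can be sketched in a few lines. This is an illustrative sketch only: the function name `supra_adjacency` and the 3-node layers are hypothetical and are not the matrices of the example, and coupling every pair of layers is an assumption that coincides with the two-layer case.

```python
def supra_adjacency(layers):
    """Build the supra-adjacency matrix of a multiplex network.

    layers: list of M square 0/1 adjacency matrices of size n x n.
    Node i of layer a receives the global index a * n + i.
    """
    m, n = len(layers), len(layers[0])
    size = m * n
    supra = [[0] * size for _ in range(size)]
    # Diagonal blocks: the intralayer links of each layer.
    for a, layer in enumerate(layers):
        for i in range(n):
            for j in range(n):
                supra[a * n + i][a * n + j] = layer[i][j]
    # Off-diagonal blocks: identity matrices linking each node to its replicas.
    # Assumption: every pair of distinct layers is coupled.
    for a in range(m):
        for b in range(m):
            if a != b:
                for i in range(n):
                    supra[a * n + i][b * n + i] = 1
    return supra


# Hypothetical 3-node, 2-layer multiplex network (illustrative values only).
layer_1 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
layer_2 = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
supra = supra_adjacency([layer_1, layer_2])
```

In this sketch node 3 is the replica of node 0, node 4 of node 1, and node 5 of node 2, following the same numbering convention as the notes in the example.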
Therefore, in order to graphically show the multiplex network, we show Figure 1.
Fig. 1 Adjacency matrix and multiplex network (graphic).
**Note: It is important to mention that node 5 is a replica of node 0, node 6 is a replica of node 1, node 7 is a replica of node 2, node 8 is a replica of node 3, and node 9 is a replica of node 4.
Therefore, the adjacency matrices and the complementary multiplex network can be viewed as follows.
Table 5 Complementary layer 2 adjacency matrix.
The supra-adjacency matrix of the complementary multiplex network is given in Table 6. Based on Table 6, we can see that the supra-adjacency matrix is made up of the adjacency matrices of the individual complementary layers, related through the identity matrix for each of the nodes belonging to the multiplex network.
Therefore, in order to graphically show the multiplex network, we present Figure 2.
**Note: It is important to mention that, as in the previous case, node 5 is a replica of node 0, node 6 is a replica of node 1, node 7 is a replica of node 2, node 8 is a replica of node 3, and node 9 is a replica of node 4.
Based on the previous model and the complementary multiplex network presented in Figure 2, we can consider the following penalty matrix of P (Table 7), whose entries are the inverse of the distance between each pair of nodes. We proceed to carry out the practical example:
• First, we find a valid coloring for the complementary multiplex network. As we can see in Figure 3, the coloring is valid considering the connections of each node in all the layers, and 4 colors are necessary to carry it out.
However, as we mentioned above, the coloration should be done minimizing the penalties with the original network. Then, based on the information shown in Table 7 and the coloration presented in Figure 3, we show Figure 4. In Figure 4, we can see the penalties (marked as X). Therefore, we sum the values 0.4 (yellow link) and 0.3 (green link) to obtain the penalties for Color 1 (nodes 1, 3, 6 and 8, all with color C1).
Then, based on the idea of the coverage set problem for the complementary multiplex network, we obtain the coverage set shown in Figure 5. Therefore, as we can see in Figure 5, we obtain a classification of the elements (nodes) that considers the various types of relationships or changes in time between the connections of the elements (given by the layers of the network).
However, we can see that the number of groups (colors) needed to classify the communities is less than that obtained by RCP. The above is achieved through a classification based on multicriteria analysis (with RCP), and the dynamics over time are obtained thanks to the multiplicity of layers. On the other hand, VCP minimizes the number of groups, so we can improve the classification based on the similar characteristics between the groups formed by RCP.
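The three checks used in the worked example (a valid coloring, the penalties accumulated by same-colored pairs, and a coverage set) can be sketched as follows. The function names and the small 4-node instance are hypothetical and only illustrate the definitions; they are not the network of Figures 3 to 5.

```python
def is_valid_coloring(edges, color):
    """True if no edge joins two nodes that share a color."""
    return all(color[u] != color[v] for u, v in edges)


def rigidity(penalty, color):
    """Sum of the penalties of the penalized pairs whose endpoints share a color."""
    return sum(p for (u, v), p in penalty.items() if color[u] == color[v])


def is_vertex_cover(edges, cover):
    """True if every edge has at least one endpoint in `cover`."""
    return all(u in cover or v in cover for u, v in edges)


# Hypothetical instance: edges of the colored network and penalized pairs.
edges = [(0, 1), (1, 2), (2, 3)]
penalty = {(0, 2): 0.4, (1, 3): 0.3, (0, 3): 0.5}
color = {0: "C1", 1: "C2", 2: "C1", 3: "C2"}

valid = is_valid_coloring(edges, color)
total_penalty = rigidity(penalty, color)
covered = is_vertex_cover(edges, {1, 2})
```

In this instance the pairs (0, 2) and (1, 3) share a color, so their penalties 0.4 and 0.3 are added, while the pair (0, 3) contributes nothing.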

Resolution method
In this work, the adaptation of RCP and VCP was solved using an adaptation of a Genetic Algorithm (GA) [39] developed in Python with the following set of control parameters:
• Number of generations = 100.
• Crossover rate = 0.65.
• Mutation rate = 0.33.
GA is a technique based on genetic operators, and it is necessary to establish the structure of the mutation. In this work, we use mutations that change the color of the selected gene (chosen randomly) to another color. For example, consider a solution with 3 colors for a network of 9 nodes. We consider the genes one by one and, based on the mutation rate, we generate a random number $r_1$ between 0 and 1 (continuous); if $r_1$ is less than the mutation rate, we change the value of the gene by randomly choosing a number between 0 and the number of nodes. For example, a mutation of such a vector can produce a vector with 4 colors. On the other hand, the crossover works as a one-point operator using two parents. A random position between 1 and the number of nodes is obtained and a cut is made at that position in the vector of each of the parents.
Therefore, the first part of the first parent and the second part of the second parent produce child 1. Similarly, the second part of parent 1 and the first part of parent 2 produce child 2.
For example, taking the above vectors and position 4, we produce two children, Child 1 and Child 2. Based on the above, we can see that the genetic operators satisfy the sexual reproduction and adaptation of individuals.
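The two operators described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the use of color labels 0 to n-1, and the two 9-gene parent vectors are hypothetical.

```python
import random


def mutate(chromosome, mutation_rate, rng):
    """Gene-by-gene mutation of a color vector.

    For each gene a number r1 in [0, 1) is drawn; if r1 is below the
    mutation rate, the gene is replaced by a random color label.
    Assumption: labels range over 0..n-1, where n is the number of nodes.
    """
    n = len(chromosome)
    mutated = list(chromosome)
    for gene in range(n):
        if rng.random() < mutation_rate:
            mutated[gene] = rng.randrange(n)
    return mutated


def one_point_crossover(parent_1, parent_2, point):
    """Cut both parents at `point` and swap the second parts."""
    child_1 = parent_1[:point] + parent_2[point:]
    child_2 = parent_2[:point] + parent_1[point:]
    return child_1, child_2


# Hypothetical parents: 9 nodes colored with 3 colors (labels 0, 1, 2).
parent_1 = [0, 1, 2, 0, 1, 2, 0, 1, 2]
parent_2 = [2, 2, 1, 1, 0, 0, 2, 1, 0]
child_1, child_2 = one_point_crossover(parent_1, parent_2, 4)

rng = random.Random(0)  # fixed seed so that runs are repeatable
mutant = mutate(parent_1, 0.33, rng)
```

Because a mutated gene may receive any label between 0 and n-1, a mutation can increase the number of colors in use, which matches the remark that a 3-color vector can become a 4-color one.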

Analysis of results
In this section, we show the values for the main structural metrics and the analysis about the communities for the employment networks.
In order to understand better the results shown in the following subsections, each multiplex network is made up of 4 networks, where the first layer belongs to the year 2018, the second to the year 2019, the third to the year 2020 and the fourth to the year 2021.
Therefore, in Figure 6 we show the adjacency matrix and the graph of the multiplex network formed by the aforementioned networks. It is important to mention that in Figure 6 each node has its replicas. For example, the first state (identifier 0) has its replicas shown as 0, 32, 64 and 96, and the second state (identifier 1) has its replicas shown as 1, 33, 65 and 97. This is repeated analogously up to the last state (identifier 31, because identifiers start at 0 and there are 32 states in Mexico), whose replicas are shown as 31, 63, 95 and 127.
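The replica numbering above follows a simple offset rule, sketched here for reference. The helper names `replica_ids` and `locate` are hypothetical; the constants come from the text (32 states, four layers for 2018 to 2021).

```python
N_STATES = 32
YEARS = [2018, 2019, 2020, 2021]


def replica_ids(state_id, n_states=N_STATES, n_layers=len(YEARS)):
    """Global identifiers of a state in every layer (one layer per year)."""
    return [state_id + layer * n_states for layer in range(n_layers)]


def locate(global_id, n_states=N_STATES):
    """Map a global identifier back to (state identifier, year)."""
    layer, state_id = divmod(global_id, n_states)
    return state_id, YEARS[layer]


first_state = replica_ids(0)
last_state = replica_ids(31)
where = locate(65)
```

For instance, identifier 65 corresponds to the state with identifier 1 in the third layer (year 2020).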

Communities using RCP-VC
For our study case, the objective is to obtain the coloration that is valid both for the complementary network and for the original network. Therefore, Figure 7 shows the multiplex complementary network of the original network presented in Figure 6. It is important to mention that in the multiplex complementary network (Figure 7), the intralayer links of each layer are the complement of those of the original layer; however, the replica links are maintained.
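The construction of the complementary multiplex network can be sketched as below. This is an illustrative sketch with hypothetical function names and a hypothetical 3-node layer: each layer is complemented on its own, and the replica links are untouched because they are kept outside the layer matrices.

```python
def complement_layer(adj):
    """Complement of one layer: flip every off-diagonal entry, no self-loops."""
    n = len(adj)
    return [[0 if i == j else 1 - adj[i][j] for j in range(n)] for i in range(n)]


def complement_multiplex(layers):
    """Complement every layer independently; replica links are not stored
    in the layer matrices, so they are maintained unchanged."""
    return [complement_layer(layer) for layer in layers]


# Hypothetical 3-node layer (illustrative values only).
layer = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
complement = complement_layer(layer)
```

Applying `complement_layer` twice returns the original layer, which is a quick sanity check on the construction.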
If we consider the penalty matrix (distance between the states based on their characteristics), we look for the valid coloration for the complementary multiplex network.
Then, for our study case we need 23 colors to satisfy the coloration and it is important to mention that the number of colors is high compared to the number of nodes because the coloration is considered to be valid for the original network and the most important thing is that it is valid for the multiplicity of layers in both networks (original and complementary).
Therefore, the classification of the 32 Mexican states is given in Table 8. Based on Table 8, we can see the distribution of similarities (using RCP). However, if we apply the idea of VCP, we need only 9 colors.
It is important to mention that, if we consider these 9 colors, the coloring obtained is no longer valid. However, it is the basis for a correct classification.
In other words, the 23-coloration is the first approximation of the formation of subsets by links, and once VCP is applied, we obtain a classification of the elements based on similarity, which decreases the number of groups.
Therefore, we can verify that a detection of communities is achieved without considering the calculation of structural or topological metrics of the network. Now, Table 9 shows the classification by communities obtained by the proposed methodology in the study case.
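Based on the information in Table 9, we can see that the classification is now done in 9 communities. Therefore, we can verify what characteristics the elements of each of them share and thus analyze the dynamics (changes) of employment in Mexico during the COVID-19 pandemic. Now, we present the most important characteristics of the numerical results (classification by communities) using the social approach:
• States such as Mexico City, the State of Mexico, Jalisco and Hidalgo were affected by the COVID-19 pandemic in terms of job creation and preservation.
• States such as Michoacán, Aguascalientes and Zacatecas were not affected by the COVID-19 pandemic in terms of job creation and preservation, but they are exporters of perishable products.
• States such as Veracruz, Tlaxcala, Guerrero, Baja California and Oaxaca were affected by the drop in tourism. However, due to various activities that produce food supplies, there was no great impact on job creation.
• States such as Nuevo León, Tabasco, Sinaloa, Sonora and Baja California Sur were affected by the reduction in manufacturing production. However, other productive sectors, such as livestock, fishing and agriculture, were not affected.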
Therefore, we can see that using the complex networks approach to identify elements with similar characteristics (based on the identification of communities) can help to classify and observe phenomena from different fields.

Conclusions and future work
In this work, we present a methodology based on the adaptation of the robust coloring problem (RCP) and the vertex cover problem (VCP) to find communities in multiplex networks. In order to verify whether this methodology can be applied to real-world problems, we used the information about employment in Mexico before and during the COVID-19 pandemic.
The results show that the proposed methodology is capable of identifying communities in multiplex networks, which, due to their multiplicity of layers, present a greater challenge when identifying similarity relationships between nodes (given that there are multiple replicas of each one).
The above is achieved through a classification based on multicriteria analysis (with RCP), and the dynamics over time are obtained thanks to the multiplicity of layers. On the other hand, VCP minimizes the number of groups, so we can improve the classification based on the similar characteristics between the groups formed by RCP.
The next step of this research is based on the specific analysis of the productive sectors: Primary, secondary and tertiary for all the states of the Mexican Republic and thus be able to have an idea of which were the most affected or benefited by the COVID-19 pandemic.
Finally, in order to verify if the methodology is correct and works for various types of networks, it is sought to apply it to networks with different characteristics and structural properties.

Fig. 3 Valid coloring for the complementary multiplex network.

Fig. 4 Penalties of the original multiplex network with the coloring of the complementary multiplex network.

Fig. 5 Nodes that belong to the coverage set of the complementary multiplex network.

Table 3 Multiplex network of layers 1 and 2.

Table 6 Multiplex network of complementary layers 1 and 2.

Table 7 Matrix of penalties $p_{i,j}$.

Table 8 Classification by communities of the 32 Mexican states.

Table 9 Classification by communities obtained from the adaptation of RCP and VCP.
• States such as Mexico City, the State of Mexico, Jalisco and Hidalgo were affected by the COVID-19 pandemic in terms of job creation and preservation.
• States such as Michoacán, Aguascalientes and Zacatecas were not affected by the COVID-19 pandemic in terms of job creation and preservation, but they are exporters of perishable products.
• States such as Veracruz, Tlaxcala, Guerrero, Baja California and Oaxaca