Sensitivity of Importance Metrics for Critical Digital Services Graph to Service Operators’ Self-Assessment Errors

Interdependency of critical digital services can be modeled as a graph with an exactly known structure but with edge weights subject to estimation errors. We use standard and custom centrality indexes to measure the vulnerability of each service. The vulnerabilities of all nodes in the graph are aggregated in a number of ways into a single network vulnerability index for services whose operation is critical for the state. This study compares the sensitivity of various centralities, combined with various aggregation methods, to errors in edge weights reported by service operators. We find that many of those combinations are quite robust and can be used interchangeably to reflect various perceptions of network vulnerability. We use graphs of source files' dependencies for a number of open-source projects, as a good analogy for the real critical services graph, which will remain confidential.


Introduction
Correct operation of digital services and infrastructures has long since become critical for societies and therefore demands coordinated actions for maintenance and incident response.
The Directive on Security of Network and Information Systems (NIS [1]), by the European Parliament, provides a framework for coherent implementation of security measures by European Union member states. Due to the scale and dynamics of digital networks, effective and efficient protection of their operation must be assisted by intelligent decision support systems operating at the national level. Such systems should be (i) complete, i.e., possessing information about all critical services in the country; (ii) automated, i.e., minimizing the human factor in daily operations as well as in network model construction; and (iii) coupled, i.e., exchanging information at the international level. Researchers, industry, and regulators stay aware of the above challenges and accordingly come up with ideas for such systems (cf., e.g., [2, 3] and references therein). Notably, the Polish government is supporting the National Cybersecurity Platform (NPC), an R&D project whose goal is to address the first two of the above issues, i.e., actually implement and deploy a system supporting security operation centers (SOCs). A crucial phase of NPC operation is the creation of a graph modeling interdependent digital services run by various operators. This process is done semiautomatically from the SOC perspective: service dependencies are discovered in depth-first search fashion, by interviewing subsequent operators with online questionnaires.
Apart from privacy and organizational obstacles, filling in a questionnaire can be a challenge of its own for an operator. For a given service of their own, an operator is asked to report services preconditioning its correct operation, and to provide estimates of their impact on that service in terms of confidentiality, integrity, and availability (CIA) [4]. While the former is quite straightforward (as it can be based on inspection of business contracts, service level agreements (SLAs), invoices, or any other formal documents), measuring the magnitude of service dependencies is prone to errors and bias. On the other hand, the national critical services network model is built exactly from this information. The model includes routines for vulnerability calculation for each service. Vulnerabilities in turn get combined into a scalar index of overall network vulnerability.
Our goal is to examine how sensitive the above process is to incorrect information about mutual service impact as reported by operators, under the assumption that the structure of the network is known fully and correctly. Such information is crucial because the scalar index value will be reported to SOCs and, consequently, will play the role of the main threat indicator.
We organized the paper as follows. A network model of services is presented in the remaining part of this section. A suite of methods for calculation of service vulnerability and for aggregation of vulnerabilities into a scalar vulnerability index is described in Section 2. It is followed by a discussion of results (Section 3), and we conclude in Section 4.
The network of interdependent digital services is modeled as a directed graph

G = (V, E),    (1)

where V is a list of ordered vertices representing services, and E is a list of ordered edges: e_ij ∈ E if operation of service v_i influences operation of service v_j. The impact of such influence is defined by the operator of service v_j on a discrete scale from 1 to 10. All the information about the graph structure and service impact can be expressed conveniently by the adjacency matrix A, whose element a_ij is equal to the impact value, or zero if there is no edge e_ij. Here, we assume to operate with respect to only one impact aspect, e.g., how much the loss of service i availability influences service j availability.
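As a concrete illustration, the adjacency matrix A can be assembled from the reported edges as in the following sketch (function and variable names are ours, not part of the NPC system):

```python
import numpy as np

def build_adjacency(n_services, reported_edges):
    """reported_edges: iterable of (i, j, impact) tuples, meaning service i
    influences service j with an impact on the discrete 1..10 scale."""
    A = np.zeros((n_services, n_services))
    for i, j, impact in reported_edges:
        if not 1 <= impact <= 10:
            raise ValueError("impact must be on the 1-10 scale")
        A[i, j] = impact
    return A

# Example: service 0 influences service 1 with impact 7,
# service 1 influences service 2 with impact 3.
A = build_adjacency(3, [(0, 1, 7), (1, 2, 3)])
```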
There can be nine such aspects in total, one for each ordered pair of the C, I, and A attributes. It is possible to combine them all into one scalar coefficient, when some assumptions on their meaning are made, e.g., if one considers them as probabilities.
Such a graph model extension, with edge weights actually represented by a matrix of up to nine aspects of impact, demands developing new graph algorithms, or picking one of the aspects, as is done in this paper. It makes the model universal enough to accommodate both digital services and physical infrastructure elements. In the latter case, one refers to just the availability aspect. For example, availability of a backup power supply may influence availability and integrity of a physical access control system; hence, an operator has to address the influence in two aspects: A ⟼ A and A ⟼ I.
Topology of a service graph represents the existence of service interdependencies, while edge weights stand for the intensity of those interdependencies. When combined, they make it possible to calculate the overall vulnerability of each service. There are many ways such vulnerability could be formulated; we express its definition as

r = Φ(A),    (2)

where Φ is some function defined over the adjacency matrix that computes the vector r of vulnerabilities of the respective services. While r contains complete information about the vulnerability of each service, a single scalar index c of overall network vulnerability would be much more convenient in everyday use. Like individual vulnerabilities, its calculation can be accomplished in many ways; we denote this process as

c = Γ(r),    (3)

where Γ is some function defined over the vulnerability vector. The major practical problem concerns the credibility of c, which is computed indirectly from A, whose values are not objective.
They come from the questionnaires and are a result of a self-assessment process by service operators, whose accuracy depends on their cybersecurity awareness and the maturity of the methodologies used in service impact estimation. An objective approach to vulnerability estimation would require excessive provocative tests on critical services or postmortem analyses, both of which are costly and undesirable.
Therefore, we must assume that, contrary to the structure of service dependencies, which is known and correct, the reported impact values ã_ij differ from the true ones by some errors:

ã_ij = min(10, max(1, a_ij + ξ)),    (4)

where ξ is a realization of a random variable with uniform discrete distribution U{−N, N}. Here, N is the maximum impact estimation error on the ten-star rating scale. Note that in (4), we curb the disturbed rating within the original scale of one to ten stars. Consequently, we denote the vulnerabilities of services calculated for the reported values as r̃ = Φ(Ã). Star ratings have been commonplace practice in many fields where user feedback is required. While facilitating the questioning process from a psychological perspective, they complicate analysis of the statistical properties of responses, as has been reported in [5]. The same authors claim that scales with more than seven stars provide too many possibilities and spoil the quality of a poll. Likewise, providing the respondent with a scale with an odd number of stars prompts a safe and lazy option of hitting the middle of the scale, which also reduces response quality.
In our case, we kept the original 10-star scale as proposed by the NPC risk-analysis team. Such a scale leaves the operator no "middle" option, unlike grade "3" on a 5-star scale. Indeed, we do not want operators to answer neutrally because, as opposed to, e.g., hotel ranking, there is no "neutral" answer other than the absence of the edge connecting the two services. Moreover, a finer scale makes room for elaborating more precise instructions on self-assessment and answering in the future. As regards the choice of distribution for ξ, it came from papers [5, 6]. The cited authors applied disturbances of a moderate scale of one to two stars only.
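The disturbance model of equation (4) can be sketched in a few lines; the following is our illustrative implementation (names are ours), not the code used in the study:

```python
import numpy as np

def disturb(A, N, rng=None):
    """Shift every nonzero rating by xi drawn uniformly from {-N, ..., N}
    and clip the result back into the 1..10 star scale; keep the graph
    structure (zero entries of A) intact."""
    rng = np.random.default_rng() if rng is None else rng
    xi = rng.integers(-N, N + 1, size=A.shape)   # U{-N, N}
    A_tilde = np.clip(A + xi, 1, 10)             # curb within the scale
    return np.where(A > 0, A_tilde, 0)           # structure is error-free

A = np.array([[0.0, 9.0],
              [2.0, 0.0]])
A_t = disturb(A, N=2, rng=np.random.default_rng(0))
```

Each nonzero entry a stays within [max(1, a − N), min(10, a + N)], so the disturbed matrix is always a valid rating matrix over the same edge set.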
The main aim of this paper is to evaluate the sensitivity of various definitions of service vulnerability, Φ, and of importance aggregation functions, Γ, to errors in user assessment of service impacts.

Importance Definitions.
There exist a number of recognized and widely known definitions of vertex structural importance that can be used as candidates for Φ. In the parlance of networks, they are usually called node centralities [7]. Some of them are trivial, like node degree; they are useful but out of scope of this study, as they do not consider link weights, i.e., impact values. Some others are related to network flow maximization problems [8]. They are also inappropriate here because software malfunctions, unlike flows, are indivisible and, on the contrary, replicable. This is why we decided to consider the following three ways to calculate service vulnerability:

(i) Φ_PR Page Rank. Values of r are the fixed point of iterative multiplication by H, where H is the adjacency matrix A normalized so that the sum of elements in each column of H equals one. Vulnerability of a service calculated this way therefore reflects the vulnerability of all other services that the service depends on. Such was exactly the original idea of web page rank calculation by the Google founders [9]. In our case, a service is the counterpart of a web page. Note, however, that such normalization, necessary from a theoretical point of view, weakens the impact of vertices with high outdegree. While reasonable for a user clicking through web pages, this assumption does not necessarily hold in the case of, e.g., spreading failures, as they may affect dependent services equally strongly, independently of their number.

(ii) Φ_RC Reach Centrality. Values of r represent the fraction of all services whose operation may affect a given service. To account for service impact, a weighted variant is used [10]. Originally, any v_i affecting v_j increases r_j by 1/(|V| − 1). In the weighted version, this amount depends on the average link weight on the shortest path from v_i to v_j, in relation to the average link weight in the graph. With such an approach, a kind of weighted impact summation is performed for each service, however without concern for important structural properties of the graph such as, for example, the existence of bridges.

(iii) Φ_MI Maximum Input. Values of r are the solution of fixed-point iteration (6), which calculates centralities like page rank does, however taking into account only the currently most important input of each service. Algorithm (6) is repeated until convergence, guaranteed by curbing the outcome within the [1, 10] interval, consistent with our rating scheme. Finally, a strongest-impact path is created for each dependent service, which identifies the most crucial parts of the graph and, accordingly, the service vulnerabilities. However, it ignores all relations outside the path, even if they stay close to the path in terms of their importance.
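As an illustration of the page-rank-style calculation, a minimal power-iteration sketch is given below. It assumes the convention a_ij = impact of service i on service j (so column j of A collects the inputs of service j), omits the damping factor, and normalizes r at each step; this is our simplified reading, not the paper's exact algorithm:

```python
import numpy as np

def pagerank_vulnerability(A, max_iter=500, tol=1e-9):
    """Power iteration on H, the column-normalized impact matrix."""
    col = A.sum(axis=0)
    # Normalize each column of A to sum to one, leaving empty columns zero.
    H = np.divide(A, col, out=np.zeros_like(A, dtype=float), where=col > 0)
    n = A.shape[0]
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_new = H.T @ r          # r_j gathers vulnerability of j's inputs
        s = r_new.sum()
        if s > 0:
            r_new /= s           # keep r on a comparable scale
        if np.abs(r_new - r).max() < tol:
            return r_new
        r = r_new
    return r

A = np.array([[0.0, 5.0, 0.0],
              [0.0, 0.0, 7.0],
              [3.0, 0.0, 0.0]])
r = pagerank_vulnerability(A)
```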
Service vulnerabilities calculated above are based on incoming edges and in fact have the meaning of service susceptibility to failure.

Aggregation Functions.
Vulnerabilities can be aggregated by equation (3) into a single network vulnerability index c in many ways. Here, we propose three of them:

(i) Γ_AV, the mean of r: it represents the overall level of service vulnerabilities, without regard for their distribution. While providing a good measure of overall vulnerability, it hides the existence of extraordinarily vulnerable services in the network.

(ii) Γ_50, the median: it represents the typical value of service vulnerability in the network, i.e., it discards extreme values.

(iii) Γ_MX, the maximum: contrary to Γ_50, the service with the largest vulnerability is picked, regardless of the vulnerability of the other ones.
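The three aggregations map directly onto standard statistics; a minimal sketch (the Levenshtein-based measure introduced later is treated separately):

```python
import numpy as np

# Scalar aggregation functions Gamma over the vulnerability vector r.
GAMMA = {
    "AV": np.mean,    # overall level; hides extreme services
    "50": np.median,  # typical value; discards extremes
    "MX": np.max,     # worst single service
}

r = np.array([2.0, 3.0, 3.0, 9.5])
c = {name: g(r) for name, g in GAMMA.items()}
# c["AV"] = 4.375, c["50"] = 3.0, c["MX"] = 9.5
```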

Sensitivity of Vulnerability to Self-Assessment Errors.
For any instance of the reported impact matrix, Ã_m, we can calculate the corresponding r̃_m and, finally, the vulnerability index c̃_m, using any combination of the Φ's and Γ's provided above.
Then, we can calculate the difference between the vulnerabilities calculated for reported and for real impact values:

δ_m = c̃_m − c.    (7)
In the context of the difference between two sets of services, we may introduce yet another measure, based on the difference in ordering of the most important services: δ_m(Φ, Γ_L5). It uses the Levenshtein distance [11] to compare the contents and order of the first five most important services in r and in r̃.
The Levenshtein distance counts the number of edit operations needed to convert one sequence into another. In our case, five-element sequences are compared. The edit operations are insertion, deletion, and change of a single element in a sequence. For example, if r = [0, 1, 3, 4, 6, 5] and r̃_m = [1, 0, 3, 4, 5, 6], the five most important services would be (r_5, r_6, r_4, r_3, r_2) and (r_6, r_5, r_4, r_3, r_1), respectively. It takes three operations to transform one sequence into the other: two for swapping r_5 with r_6, and one for replacing r_2 with r_1; therefore, the edit distance equals three.
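The worked example above can be reproduced with a textbook dynamic-programming implementation of the Levenshtein distance (our own sketch; the helper names are hypothetical):

```python
def levenshtein(a, b):
    """Classic DP edit distance over two sequences."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def top5(r):
    """Indices (1-based, as in the text) of the five largest entries of r."""
    order = sorted(range(len(r)), key=lambda i: r[i], reverse=True)
    return [i + 1 for i in order[:5]]

r = [0, 1, 3, 4, 6, 5]
r_tilde = [1, 0, 3, 4, 5, 6]
levenshtein(top5(r), top5(r_tilde))  # -> 3, as in the worked example
```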

Used Networks.

In practice, the service graph G and the reported impact values Ã are compiled after a laborious process of questioning service operators about their services' relationship structure and relationship intensity. A sample real graph of services made this way is presented in Figure 1. Reconstruction of service dependencies between operators is particularly hard, since such information is often considered confidential. The collected data are inherently sensitive, because they may serve for improving network reliability as well as for attacking its weakest points. Such an observation has been made previously in the case of critical infrastructure modeling and holds also for digital services. The papers [12, 13] cover sector-wise interdependency analysis and summarize modeling approaches, respectively. All the authors express their concern about the privacy of the collected data; consequently, only a small fraction of interdependencies is presented in [12]. Similarly, we decided to carry out our study on networks whose operation is partially analogous to the interplay of digital services, instead of the real network.
We found that networks of source code dependencies are a close analogy. First, they represent software components, though on a much smaller scale. Second, the dependencies between modules can be relatively easily tracked by static code analysis. Third, failure or malfunction of one software module influences the operation of all modules that depend on it, although to different degrees. Fourth, module dependencies in open-source projects appear not in a predefined way but represent the current needs of programmers, as already reported in [14]. Finally, dependencies between source code modules, as well as between essential services, can be relatively easily traced, while their intensity cannot.
All networks analyzed in this study describe software module dependencies in JavaScript (JS) projects available from the hosting platform github.com. Dependencies have been found using the static code analysis tool Madge (https://www.npmjs.com/package/madge). Project properties are given in Table 1. The projects differ in size; moreover, some of them happen to have circular dependencies in the code, which also happens for real digital services. A sample graph of dependencies is shown in Figure 2.
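Madge can emit the dependency map as JSON (e.g., `npx madge --json src/`), mapping each module to the list of modules it depends on. Since a failure in a dependency propagates to its dependents, an edge is drawn from dependency to dependent. The snippet below uses invented sample data for illustration:

```python
import json

# Invented sample of Madge's --json output: module -> its dependencies.
madge_json = '{"app.js": ["util.js", "db.js"], "db.js": ["util.js"], "util.js": []}'
deps = json.loads(madge_json)

modules = sorted(deps)
idx = {m: i for i, m in enumerate(modules)}
# Edge (influencer, influenced): the dependency affects the dependent.
edges = [(idx[d], idx[m]) for m, ds in deps.items() for d in ds]
# Impact values (edge weights) must still be assigned separately.
```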

Results and Discussion
Formula (7) calculates the vulnerability estimation error for a single realization of Ã. To assess the error in a statistical sense, one would need to calculate analytically how ξ affects Ã, r̃, and finally, δ. In this paper, we rather present results of a cursory estimation of δ, based on random sampling of δ_m for a number of M samples, m ∈ 1, 2, ..., M. We calculate four statistics from the sample distributions of δ: θ_AE, θ_RE, θ_AD, and θ_RD (the mean absolute error, the mean relative error, the standard deviation of error, and the relative standard deviation of error, respectively). All the reasoning provided above concerns a single instance of A, whose values are chosen randomly. In order to draw more general conclusions about the properties of a chosen combination of Φ and Γ, we need to repeat the calculations for a number of test cases. Let us call them experiments: in each experiment, nonzero values of a new impact factor matrix A are chosen and disturbed using equation (4). Finally, all θ's are calculated accordingly. Sample graphical results from two series of 1,000 experiments each, for the Airbnb network, are given in Figure 3. In all our analyses, from now on, the number of experiments will be equal to the number of samples in each experiment, M.
Figures 3(a) and 3(b) show the varying character of vulnerability errors. In some respects, the two examples shown bear similarity; e.g., c and the average of δ are negatively correlated. (Intuitively, the more high-score links in the network, the less important is an error by one star in impact estimation by the service operator.) Next, some configurations result in a more discrete error distribution, as in case (b), where the switching nature of the median manifests in striped dot patterns. Finally, the histograms show how variable the vulnerability errors are across experiments. For example, we see that in case (a) they are quite stable, clustered closely around one value, while in case (b) they show much bigger variability.
Results in Figure 3 justify the need for a deeper inspection of the nature of the observed errors. However, to compare the sensitivity of many networks in the multidimensional parameter space of Φ's, Γ's, and N's, we have to develop a simpler approach. We propose to calculate and compare the average values of the θ's, i.e., θ_AE, θ_RE, θ_AD, and θ_RD, over all performed experiments. Such averaged indicators are collected in Tables 2-6, each table for a different project.

Table 1: Properties of projects used for analysis.

The figures given in Tables 2-6 cover all combinations of five graphs, three importance indices Φ, four importance aggregation functions Γ, and two amplitudes of estimation error N. Basically, we search this space to find valuable combinations of Φ's and Γ's. A valuable combination is characterized by

(i) a small total error Δ for all considered projects and values of N: we want the approach to be independent of graph structure;

(ii) a big sensitivity S to changes of N, for all projects (pick the worst case): we want operators' errors of estimation to really influence the value of the overall metrics θ;

(iii) a small standard deviation Σ of error, for all projects (pick the worst case): we want small variance of the θ's, in general.

Candidate combinations of Φ and Γ should therefore be in general tolerant to imprecise information provided by operators but, at the same time, sensitive to the scale of such lack of precision. Moreover, it is desirable that errors in network vulnerability calculated by such a combination do not vary widely. We check the last two requirements with respect to the worst results found for the analyzed projects. Results of such three-criteria scoring are presented in Figure 4, projected on three planes. The axes have been selected or adjusted so that markers located near an axis correspond to combinations that perform better. The visual comparison provided in Figure 4 does not strictly determine the optimum combination, but it makes it possible to observe that, in general, performance indices do not vary widely, at least so that linear axis scaling suffices to reveal the differences. Secondly, markers cluster mainly with respect to their color, which means that the choice of aggregation method Γ is more important than the choice of the algorithm for importance index calculation.
As the analyzed combinations form a cloud in 3D space, we may find a Pareto front, i.e., a set of nondominated combinations. They are

(i) (Φ_RC, Γ_AV), the average of reach centrality;
(ii) (Φ_PR, Γ_AV), the average of page rank;
(iii) (Φ_PR, Γ_50), the median of page rank;
(iv) (Φ_PR, Γ_MX), the maximum of page rank;
(v) (Φ_MI, Γ_AV), the average of maximum input importance.
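Nondominated filtering in the three-criteria space (small Δ, large S, small Σ) can be sketched as follows; the scores fed to it here are illustrative, not the paper's measured values:

```python
import numpy as np

def pareto_front(points):
    """points: rows of (Delta, S, Sigma); Delta and Sigma are minimized,
    S is maximized. Returns indices of nondominated rows."""
    # Flip the sign of S so every criterion becomes 'smaller is better'.
    P = points * np.array([1.0, -1.0, 1.0])
    keep = []
    for i, p in enumerate(P):
        dominated = any(
            np.all(q <= p) and np.any(q < p)
            for j, q in enumerate(P) if j != i
        )
        if not dominated:
            keep.append(i)
    return keep

# Illustrative scores: the third combination dominates the first two.
scores = np.array([[1.0, 5.0, 1.0],
                   [2.0, 4.0, 2.0],
                   [0.5, 6.0, 0.5]])
pareto_front(scores)  # -> [2]
```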

Conclusions
It should be reminded that the research reported here was done in the context of a large project aiming to build a nationwide model of the critical services network. While the integrity of the resulting graph can be ensured by careful automated inspection of the questionnaires filled in by service operators, the reported impact estimates between services will be biased and inherently erroneous. Therefore, it was worth studying the sensitivity of some candidate synthetic metrics of overall network vulnerability with respect to incorrect input. We felt it correct to use networks of software module dependencies because of their functional and structural similarity to the network of critical services, let alone that such real networks will probably remain confidential. The study shows that all three proposed formulas for individual service vulnerability calculation are valuable. This is a rather positive observation, as each of them has its own specifics and can be used under various circumstances. Also, almost all proposed ways of aggregating vulnerability into a single vulnerability index are useful (except the Levenshtein distance, which shows much variation and has turned out to be useless). Naturally, combinations of formulas appropriate for capturing "extreme" phenomena, such as (Φ_MI, Γ_MX), will show variability.
The main takeaway is that it is safe to apply mean or median aggregation of individual service vulnerabilities, whatever the formula for importance calculation. Such an aggregated value may serve as a single, comprehensive vulnerability index. Note that, while robust to errors in graph edge weights, it will be affected by major structural graph changes, e.g., edge removal as a result of a failure detected in real time. Our previous work has shown that networks of autonomous systems (AS) can be really badly affected by just one link failure, contrary to the widespread belief in Internet robustness [15].
One should remember that the results reported here are based on the sound assumption of an analogy between critical services and software modules. This assumption will eventually get verified in practice, once the national cybersecurity platform is operational and filled with data. We look forward to comparing the properties of the vulnerability calculation formulas evaluated here by random sampling with careful expert judgment and postmortem analyses for the real services graph.
(i) Mean absolute error, θ_AE = (1/M) Σ_m |δ_m|
(ii) Mean relative error, θ_RE = θ_AE / c
(iii) Standard deviation of error, θ_AD = stdev(δ)
(iv) Standard deviation of error, relative to the true value, θ_RD = θ_AD / c

They all are comprehensive measures of how errors of operators' impact estimation affect errors of network vulnerability, given any of the proposed formulas for Φ and Γ.
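These four statistics can be computed directly from the sampled errors; a minimal sketch with illustrative numbers (function name is ours):

```python
import numpy as np

def theta_stats(delta, c):
    """delta: sampled errors delta_m; c: true network vulnerability."""
    theta_AE = np.mean(np.abs(delta))  # mean absolute error
    theta_RE = theta_AE / c            # relative to the true value
    theta_AD = np.std(delta)           # standard deviation of error
    theta_RD = theta_AD / c            # relative standard deviation
    return theta_AE, theta_RE, theta_AD, theta_RD

delta = np.array([0.2, -0.1, 0.4, 0.0])  # illustrative sampled errors
stats = theta_stats(delta, c=5.0)
```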

Figure 3: Sample results from two series of 1,000 experiments each for the Airbnb network.
Figure 1: Graph of real dependencies between 33 services run by 17 operators in 3 branches of national economy.