A Grey Theory Based Approach to Big Data Risk Management Using FMEA



Introduction
In recent years, big data has rapidly developed into an important topic that has attracted great attention from industry and society in general [1]. The big data concept and its applications have emerged from the increasing volumes of external and internal data in organizations, and big data differs from other databases in four aspects: volume, velocity, variety, and value. Volume refers to the amount of data, velocity refers to the speed with which data can be analyzed and processed, variety describes the different kinds and sources of data that may be structured, and value refers to valuable discoveries hidden in large datasets [2]. The emphasis in big data analytics is on how data is stored in a distributed fashion that allows it to be processed in parallel on many computing nodes in distributed environments across clusters of machines [3].
Given the significance that big data has for business applications and the increasing interest in various fields, some relevant works should be mentioned. Reference [4] argued that consumer analytics lies at the junction of big data and consumer behavior and highlighted the importance of interpreting the data generated from big data. Reference [5] examined the role of big data in facilitating access to financial products for economically active low-income families and microenterprises in China. Reference [6] investigated the roles of big data and business intelligence (BI) in the decision-making process. Reference [7] presented a novel active learning method based on extreme learning machines with inherent properties that make handling big data highly attractive. Reference [8] developed a selection algorithm based on evolutionary computation that uses the MapReduce paradigm to obtain subsets of features from big datasets. Reference [9] discussed the advancement of big data technology, including the generation, management, and analysis of data. Finally, [10] gave a brief overview of big data problems, including opportunities and challenges, current techniques, and technologies.
Big data processing begins with data being transmitted from different sources to storage devices and continues with the implementation of preprocessing, process mining and analysis, and decision-making [6]. Much of this processing takes place in parallel, which increases the risk of attack; guarding against such attacks is what big data management seeks to do [11].
Over the last few years, several researchers have proposed solutions for mitigating security threats. In [12], a taxonomy of events and scenarios was developed and the ranking of alternatives based on the criticality of the risk was provided by means of event tree analysis combined with fuzzy decision theory. Reference [13] developed a mathematical model to solve the problem according to the risk management paradigm and thereby provided managers with additional insights for making optimal decisions. There has also been research on the use of large network traces for mitigating security threats [14].
However, research analyzing the risks associated with big data is lacking. Moreover, from this perspective, information security measures are becoming more important due to the increasingly public nature of multiple sources. Hence, many issues related to big data applications can be addressed first by identifying the possible occurrences of failure and then by evaluating them. Consequently, this paper proposes the use of a specific Failure Mode and Effects Analysis (FMEA) method and Grey Theory, which allows for risk assessment at the crucial stages of the big data process. This model considers both mathematical rigor, which is necessary to ensure the robustness of the model, and the judgments of those involved in the process, given the subjective characteristics of the types of assessments made. This paper contributes to the literature in the following aspects. First, it offers new insights into how the different characteristics of big data are linked to risk in information security. Second, it provides a risk analysis model based on a multidimensional perspective of big data risk.
The first section of the paper discusses big data and information security issues. Then, the discussion that follows relates to existing methodologies for information security and background information, which are necessary for developing the proposed approach. Next, we introduce the methodology and present a real case that illustrates how the methodology validates the proposed approach. Finally, the discussion presents the limitations of the research, suggested areas for further study, and concluding remarks.

Big Data and Methodologies for Risk Management.
As mentioned before, big data has different characteristics in terms of variety, velocity, value, and volume compared to classic databases. Consequently, big data risk management is more complex and is becoming one of the greatest concerns in the area of information security. Data availability and confidentiality are currently two top priorities regarding big data.
Recently, several works relating to big data and security have been published. Reference [15] proposed a new type of digital signature that is specifically designed for a graph-based big data system. To ensure the security of outsourced data, [16] developed an efficient ID-based auditing protocol for cloud data integrity using ID-based cryptography. In order to solve the problem of data integrity, [17] proposed a remote data-auditing technique based on algebraic signature properties for a cloud storage system that incurs minimal computational and communication costs. Reference [18] presented a risk assessment process that includes both risk arising from the interference of unauthorized information and issues related to failures in risk-aware access control systems.
There are many methods and techniques with respect to big data risk management. Table 1 lists and briefly describes qualitative methodologies for risk analysis.
Some approaches based on quantitative methods have also been proposed. Reference [19] presented an approach to the risk management of security information, encompassing FMEA and Fuzzy Theory. Reference [20] developed an analysis model to simultaneously define the risk factors and their causal relationships based on the knowledge from observed cases and domain experts. Reference [21] proposed a new method called the Information Security Risk Analysis Method (ISRAM) based on a quantitative approach.
As can be seen, the purpose of big data security mechanisms is to provide protection against malicious parties. Hence, researchers have also identified several forms of attacks and vulnerabilities regarding big data. Reference [22] investigated key threats that target VoIP hosts. Reference [23] analyzed the impact of malicious servers on different trust and reputation models in wireless sensor networks. Reference [24] examined a cloud architecture where different services are hosted on virtualized systems on the cloud by multiple cloud customers. Also, [25] outlined a discussion of the security and privacy challenges of cloud computing.
In this context, attacks themselves are becoming more and more sophisticated. Moreover, attackers also have easier access to ready-made tools that enable them to exploit platform vulnerabilities more effectively. For these reasons, security risks arise in a big data environment from high volumes of data from multiple sources, complex data sharing, and accessibility-related issues. Therefore, there is an increasing need to develop new techniques for big data risk analysis.

Table 1

| Qualitative methodology | Description | Reference |
|---|---|---|
| | Comprises three stages; the first two stages identify and analyze the risks to the system and the third stage recommends how these risks should be managed. | [26] |
| Expert system for security risk analysis and management (RAMeX) | Proposes examining the risk assessment portion of the risk management process in seven steps: define the problem, identify threats, determine the probability of occurrence, identify existing security, assess the business impact, assess security countermeasures, and generate report. | [27] |
| Facilitated risk analysis process (FRAP) | The process involves analyzing one system of the business operation at a time and convening a team of individuals who have business information needs and technical staff who have a detailed understanding of potential vulnerabilities of the system and related controls. | [28] |
| Information risk analysis methodologies (IRAM) | Provides three phases; first phase: conduct a comprehensive assessment of the business impact and determine the business security; second phase: assess threat and vulnerability of incidents occurring in a system; third phase: control selection. | [29] |
| Operationally critical threat, asset, and vulnerability evaluation (OCTAVE) | Organized into four phases: develop understanding of risk to the business, create a profile of each information asset that establishes clear boundaries and identify its security requirements, identify threats to each information asset, and mitigate this risk. | [30] |

Failure Mode and Effects Analysis (FMEA).
FMEA was first proposed by NASA in 1963. The main objective of FMEA is to discover, prevent, and correct potential failure modes, failure causes, failure effects, and problem areas affecting a system [31]. According to FMEA, the risk priorities of failure modes are generally determined through the risk priority number (RPN), the product of the occurrence (O), severity (S), and detection (D) ratings of a failure mode:

\[
\mathrm{RPN} = O \times S \times D.
\]

Based on [33,34], the classic proposal uses the 10-point linguistic scale for evaluating the O, S, and D factors. This scale is described in Tables 2, 3, and 4 for each risk factor. The failure modes with higher RPNs, which are viewed as more important, should be corrected with higher priorities than those with lower RPNs.
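The classic RPN prioritization can be illustrated with a minimal sketch, assuming the standard FMEA definition RPN = O × S × D on 10-point scales. The `FailureMode` class, the function name, and all scores below are hypothetical and are not taken from the paper's tables.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FailureMode:
    """A failure mode with hypothetical 1-10 ratings for O, S, and D."""
    name: str
    occurrence: int
    severity: int
    detection: int

    def rpn(self) -> int:
        # Validate that every rating lies on the 10-point scale.
        for score in (self.occurrence, self.severity, self.detection):
            if not 1 <= score <= 10:
                raise ValueError("ratings must lie between 1 and 10")
        # Classic risk priority number: product of the three ratings.
        return self.occurrence * self.severity * self.detection


def rank_by_rpn(modes: List[FailureMode]) -> List[FailureMode]:
    """Order failure modes from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn(), reverse=True)


# Hypothetical ratings, for illustration only.
modes = [
    FailureMode("FM-A", occurrence=5, severity=2, detection=7),
    FailureMode("FM-B", occurrence=7, severity=7, detection=8),
    FailureMode("FM-C", occurrence=4, severity=5, detection=5),
]
ranked = rank_by_rpn(modes)
for mode in ranked:
    print(mode.name, mode.rpn())
```

Because the three ratings are simply multiplied, each factor carries the same implicit weight, which is the limitation that the grey approach discussed later is meant to address.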
The FMEA method has been applied to many engineering areas. Reference [35] extended the application of FMEA to risk management in the construction industry using combined fuzzy FMEA and fuzzy Analytic Hierarchy Process (AHP). Reference [36] described failures of the fuel feeding system that frequently occur in the sugar and pharmaceutical industries [37]. Reference [38] proposed FMEA for electric power grids, such as solar photovoltaics. Reference [39] presented a basis for prioritizing health care problems.
According to [40], the traditional FMEA method cannot assign different weightings to the risk factors of O, S, and D and therefore may not be suitable for real-world situations. For these authors, introducing Grey Theory to the traditional FMEA enables engineers to allocate the relative importance of the risk factors O, S, and D based on the research and their experience. In general, the major advantages of applying the grey method to FMEA are the following capabilities: assigning different weightings to each factor and not requiring any type of utility function [41].

Detection (D) scale:

| Rating | Detection | Description |
|---|---|---|
| 10 | No chance of detection | There is no known mechanism for detecting the failure. |
| 9-8 | Very remote/unreliable | The failure can be detected only with thorough inspection and this is not feasible or cannot be readily done. |
| 7-6 | Remote | The error can be detected with manual inspection but no process is in place, so detection is left to chance. |
| 5 | Moderate chance of detection | There is a process for double checks or inspection but it is not automated and/or is applied only to a sample and/or relies on vigilance. |
| 4-3 | High | There is 100% inspection or review of the process but it is not automated. |
| 2 | Very high | There is 100% inspection of the process and it is automated. |
| 1 | Almost certain | There are automatic "shut-offs" or constraints that prevent failure. |
References [32,33] pointed out that the use of Grey Theory within the FMEA framework is practicable and can be accomplished. Reference [42] examined the ability to predict tanker equipment failure. Reference [43] proposed an approach that is expected to help service managers manage service failures. Thus, Grey Theory is one approach employed to improve the evaluation of risk.

Grey Theory.
Grey Theory, introduced by [44], is a methodology that is used to solve uncertainty problems; it allows one to deal with systems that have imperfect or incomplete information or that even lack information. Grey Theory comprises grey numbers, grey relations (which this paper uses in the form of Grey Relational Analysis, GRA), and grey elements. These three essential components are used to replace classical mathematics [45].
In grey system theory, a system with information that is certain is called a white system; a system with information that is totally unknown is called a black system; a system with partially known and partially unknown information is called a grey system [46]. Reference [47] argued that grey system theory has recently been receiving increasing attention in the field of decision-making and has been successfully applied to many important problems featuring uncertainty, such as supplier selection [48,49], medical diagnosis [50], work safety [40], portfolio selection [51], and classification algorithms evaluation and selection [52].
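According to [53], a grey system is defined as a system containing uncertain information presented by a grey number and grey variables. Another important definition is that of a grey set $G$ (of a universal set $X$), which is defined by its two mappings $\overline{\mu}_G(x)$ and $\underline{\mu}_G(x)$ as follows:

\[
\overline{\mu}_G(x)\colon X \to [0, 1], \qquad \underline{\mu}_G(x)\colon X \to [0, 1],
\]

where $\overline{\mu}_G(x) \ge \underline{\mu}_G(x)$, $x \in X$, $X = \mathbb{R}$, and $\overline{\mu}_G(x)$ and $\underline{\mu}_G(x)$ are the upper and lower membership functions in $G$, respectively.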
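A grey number is the most fundamental concept in grey system theory and can be defined as a number with uncertain information. Therefore, a white number is a real number $x \in \mathbb{R}$, and a grey number, written as $\otimes x$, refers to an indeterminate real number that takes its possible values from within an interval or a discrete set of numbers. In other words, a grey number $\otimes x$ is defined as an interval with a known lower limit and a known upper limit, that is, as $\otimes x = [\underline{x}, \overline{x}]$. Supposing there are two different grey numbers denoted by $\otimes x_1 = [\underline{x}_1, \overline{x}_1]$ and $\otimes x_2 = [\underline{x}_2, \overline{x}_2]$, the mathematical operation rules of general grey numbers are as follows:

\[
\begin{aligned}
\otimes x_1 + \otimes x_2 &= [\underline{x}_1 + \underline{x}_2,\ \overline{x}_1 + \overline{x}_2], \\
\otimes x_1 - \otimes x_2 &= [\underline{x}_1 - \overline{x}_2,\ \overline{x}_1 - \underline{x}_2], \\
\otimes x_1 \times \otimes x_2 &= [\min(\underline{x}_1 \underline{x}_2, \underline{x}_1 \overline{x}_2, \overline{x}_1 \underline{x}_2, \overline{x}_1 \overline{x}_2),\ \max(\underline{x}_1 \underline{x}_2, \underline{x}_1 \overline{x}_2, \overline{x}_1 \underline{x}_2, \overline{x}_1 \overline{x}_2)], \\
\otimes x_1 \div \otimes x_2 &= \otimes x_1 \times \left[\frac{1}{\overline{x}_2},\ \frac{1}{\underline{x}_2}\right], \quad 0 \notin \otimes x_2.
\end{aligned}
\]

GRA is a part of Grey Theory and can be used together with various correlated indicators to evaluate and analyze the performance of complex systems [54,55]. In fact, GRA has been successfully used in FMEA and its results have been proven to be satisfactory. Compared to other methods, GRA has competitive advantages: it has shown the ability to process uncertainty and to deal effectively with multi-input systems, discrete data, and data incompleteness [55]. In addition, [41] argues that results generated by the combination of Grey Theory and FMEA are more unbiased than those of traditional FMEA, and [42] claims that combining Fuzzy Theory and Grey Theory with FMEA leads to more useful and practical results.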
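The interval view of a grey number described above can be sketched in a few lines of Python. This is only an illustration that assumes the standard interval arithmetic rules for grey numbers; the `GreyNumber` class and the example values are hypothetical and not part of the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GreyNumber:
    """An interval grey number with known lower and upper limits."""
    lower: float
    upper: float

    def __post_init__(self) -> None:
        if self.lower > self.upper:
            raise ValueError("lower limit must not exceed upper limit")

    def __add__(self, other: "GreyNumber") -> "GreyNumber":
        return GreyNumber(self.lower + other.lower, self.upper + other.upper)

    def __sub__(self, other: "GreyNumber") -> "GreyNumber":
        # The smallest result pairs our lower limit with the other's upper limit.
        return GreyNumber(self.lower - other.upper, self.upper - other.lower)

    def __mul__(self, other: "GreyNumber") -> "GreyNumber":
        products = [
            self.lower * other.lower,
            self.lower * other.upper,
            self.upper * other.lower,
            self.upper * other.upper,
        ]
        return GreyNumber(min(products), max(products))

    def __truediv__(self, other: "GreyNumber") -> "GreyNumber":
        # Division is undefined when the divisor interval contains zero.
        if other.lower <= 0 <= other.upper:
            raise ZeroDivisionError("divisor interval contains zero")
        return self * GreyNumber(1 / other.upper, 1 / other.lower)


# Hypothetical grey numbers, for illustration only.
a = GreyNumber(2, 4)
b = GreyNumber(1, 3)
total = a + b
difference = a - b
product = a * b
quotient = a / b
```

A white number corresponds to the degenerate case in which the lower and upper limits coincide.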
GRA is an impact evaluation model that measures the degree of similarity or difference between two sequences based on the degree of their relationship. In GRA, a global comparison between two sets of data is undertaken instead of using a local comparison by measuring the distance between two points [56]. Its basic principle is that if a comparability sequence translated from an alternative has a higher grey relational degree between the reference sequence and itself, then the alternative will be the better choice. Therefore, the analytic procedure of GRA normally consists of four parts: generating the grey relational situation, defining the reference sequence, calculating the grey relational coefficient, and finally calculating the grey relational degree [55,57]. The comparative sequence denotes the sequences that should be evaluated by GRA and the reference sequence is the original reference that is compared with the comparative sequence. Normally, the reference sequence is defined as a vector consisting of $(1, 1, \ldots, 1)$. GRA aims to find the alternative that has the comparability sequence that is the closest to the reference sequence [43].

Critical Analysis.
Big data comprises complex data that is massively produced and managed in geographically dispersed repositories [63]. Such complexity motivates the development of advanced management techniques and technologies for dealing with the challenges of big data. Moreover, how best to assess the security of big data is an emerging research area that has attracted abundant attention in recent years. Existing security approaches carry out checking on data processing in diverse modes. The ultimate goal of these approaches is to preserve the integrity and privacy of data and to undertake computations in single and distributed storage environments irrespective of the underlying resource margins [11].
However, as discussed in [11], traditional data security technologies are no longer adequate for solving big data security problems completely. These technologies are unable to provide dynamic monitoring of how data and security are protected. In fact, they were developed for static datasets, but data is now changing dynamically [64]. Thus, it has become hard to implement effective privacy and security protection mechanisms that can handle large amounts of data in complex circumstances.
In general, FMEA has been extensively used for examining potential failures in many industries. Moreover, FMEA together with Fuzzy Theory and/or Grey Theory has been widely and successfully used in the risk management of information systems [12], equipment failure [42], and failure in services [43].
Because the modeling of complex dynamic big data requires methods that combine human knowledge and experience as well as expert judgment, this paper uses GRA to evaluate the level of uncertainty associated with assessing big data in the presence or absence of threats. It also provides a structured approach in order to incorporate the impact of risk factors for big data into a more comprehensive definition of scenarios with negative outcomes and facilitates the assessment of risk by breaking down the overall risk to big data. Finally, its efficient evaluation criteria can help enterprises reduce the risks associated with big data.
Therefore, from a security and privacy perspective, big data is different from other traditional data and requires a different approach. Many of the existing methodologies and preferred practices cannot be extended to support the big data paradigm. Big data appears to have similar risks and exposures to traditional data. However, there are several key areas where they are dramatically different.
In this context, variety and volume translate into higher risks of exposure in the event of a breach due to variability in demand, which requires a versatile management platform for storing, processing, and managing complex data. In addition, the new paradigm for big data presents data characteristics at different levels of granularity and big data projects often encompass heterogeneous components. Another point of view states that new types of data are uncovering new privacy implications, with few privacy laws or guidelines to protect that information.

The Proposed Model
In this paper, an approach to big data risk management using GRA has been developed to analyze the dimensions that are critical to big data, as described by [65], based on FMEA and [31,32]. The approach proposed is presented in Figure 1.
The new big data paradigm needs to work with far more than the traditional subsets of internal data. This paradigm incorporates a large volume of unstructured information, looks for nonobvious correlations that might drive new hypotheses, and must work with data that flow into the organization in real time and that require real-time analysis and response. Therefore, in this paper, we analyzed the processing characteristics of the IBM Big Data Platform for illustrative purposes, but it is important to note that all big data platforms are vulnerable to both external and internal threats. Since our analysis model, based on the probability of the occurrence of failure, covers a wide view of the architecture of big data, it is suitable for analyzing other platforms, such as cloud computing infrastructures [66] and platforms from business scenarios [67]. Finally, our model considers the possible occurrence of failures in the distributed data and then we consider its implementation in a distributed way.

Expert Knowledge or Past Data regarding Previous Failures.
The first step in the approach consists of expert identification or use of past data. The expert is the person who knows the enterprise systems and their vulnerability and is thus able to assess the information security risk of the organization in terms of the four dimensions [65]. One may also identify a group of experts in this step, and the analysis is accomplished by considering a composition of their judgments or the use of a dataset of past failures. The inclusion of an expert system in the model is also encouraged.
According to [68], an expert is someone with multiple skills who understands the working environment and has substantial training in and knowledge of the system being evaluated. Risk management models have widely used expert knowledge to provide value judgments that represent the expert's perceptions and/or preferences. For instance, [69] provides evidence obtained from two unbiased and independent experts regarding the risk of release of a highly flammable gas near a processing facility. References [70,71] explore a risk measure of underground vaults that considers the consequences of arc faults using a single expert's a priori knowledge. Reference [19] proposes information security risk management using FMEA, Fuzzy Theory, and expert knowledge. Reference [72] analyzes the risk probability of an underwater tunnel excavation using the knowledge of four experts.

Determination and Evaluation of Potential Failure Modes (FMEA).
In general, this step concerns the determination of the failure modes associated with the big data dimensions (Figure 2) in terms of their vulnerabilities. Each dimension is described in Table 5.
Furthermore, these dimensions can be damaged by various associated activities. Table 6 presents failure modes relating to the vulnerability of big data for each dimension.

| Dimension | Description |
|---|---|
| Identification and access management | Given the opportunity to increase knowledge by accessing big data, it is necessary that only authorized persons can access it; thus, big data requires confidentiality and authenticity; to address this problem, [58] mentioned that sometimes both are needed simultaneously; this source recommended and proposed three different schemes: an encryption scheme, a signature scheme, and a sign-encryption scheme |
| Device and application registration | Data provenance refers to information about the history of a creation process; in other words, it refers to a mechanism that can be used to validate whether input data is coming from an authenticated source to guarantee a degree of information integrity [59]; then, provenance-related security and trustworthiness issues also arise in the system [60]; they include the registration of devices in machine-to-machine (M2M) and Internet-of-Things (IoT) networks, which can be considered one of the major issues in the area of security [61] |
| Infrastructure management | As big data physical infrastructures increase, difficulties associated with designing effective physical security also arise; thus, we use the term "system health" to describe the intersection of the information worker and the nominal conditions for infrastructure management monitoring of big data for security purposes, which include technical issues regarding the interoperability of services [62] |
| Data governance | Data governance can ensure appropriate controls without inhibiting the speed and flexibility of innovative big data approaches and technologies, which need to be established for different management levels with a clear security strategy |

Figure 2 (labels: Big data security; Identification and access management; Data governance; Infrastructure management).

In fact, the determination of the failure modes is achieved using the FMEA methodology, and each failure mode is evaluated regarding its occurrence (O), severity (S), and detection (D).

Establish Comparative Series.
An information series with $n$ decision factors, such as chance of occurrence, severity of failure, or chance of lack of detection, can be expressed as follows:

\[
x_i = \bigl(x_i(1), x_i(2), \ldots, x_i(n)\bigr). \tag{4}
\]

These comparative series can be provided by an expert or any dataset of previous failures, based on the scales described in Tables 2-4.

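Obtain the Difference between the Comparative Series and the Standard Series.
To discover the degree of the grey relationship, the difference between the score of the decision factors and the norm of the standard series must be determined and expressed by a matrix calculated by

\[
D_0 =
\begin{bmatrix}
\Delta_{01}(1) & \Delta_{01}(2) & \cdots & \Delta_{01}(n) \\
\Delta_{02}(1) & \Delta_{02}(2) & \cdots & \Delta_{02}(n) \\
\vdots & \vdots & \ddots & \vdots \\
\Delta_{0m}(1) & \Delta_{0m}(2) & \cdots & \Delta_{0m}(n)
\end{bmatrix},
\qquad
\Delta_{0i}(k) = \lvert x_0(k) - x_i(k) \rvert,
\]

where $x_0$ is the standard series, $x_i$ is the comparative series of failure mode $i$, $k$ indexes the $n$ decision factors, and $m$ is the number of failure modes in the analysis [31].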

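Compute the Grey Relational Coefficient.
The grey relational coefficient is calculated by

\[
\gamma\bigl(x_0(k), x_i(k)\bigr) = \frac{\Delta_{\min} + \zeta \Delta_{\max}}{\Delta_{0i}(k) + \zeta \Delta_{\max}},
\]

where $\Delta_{0i}(k)$ is the difference between the standard series and the comparative series of failure mode $i$ for decision factor $k$, $\Delta_{\min}$ and $\Delta_{\max}$ are the smallest and largest of these differences, and $\zeta$ is an identifier, normally set to 0.5 [31]. It only affects the relative value of risk, not the priority.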

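Determine the Degree of Relation.
Before finding the degree of relation, the relative weight of the decision factors is first decided so that it can be used in the following formulation [31]. In general, it is calculated by

\[
\Gamma(x_0, x_i) = \sum_{k=1}^{n} \beta_k \, \gamma\bigl(x_0(k), x_i(k)\bigr),
\]

where $\gamma\bigl(x_0(k), x_i(k)\bigr)$ is the grey relational coefficient of failure mode $i$ for decision factor $k$, $\beta_k$ is the risk factors' weighting and, as a result, $\sum_{k=1}^{n} \beta_k = 1$.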
Rank the Priority of Risk.
This step consists of dimension ordering. Based on the degree of relation between the comparative series and the standard series, a relational series can be constructed. The greater the degree of relation, the smaller the effect of the cause [31].
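The steps above can be sketched end to end in Python. The sketch assumes the usual GRA formulation: a standard series of all ones, absolute differences, a coefficient of the form (Δmin + ζΔmax)/(Δ + ζΔmax), and a weighted sum for the degree of relation. The function name and the O, S, D scores are hypothetical; they are not the expert's ratings from the illustrative example.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def grey_relational_ranking(
    scores: Dict[str, Sequence[float]],
    weights: Optional[Sequence[float]] = None,
    zeta: float = 0.5,
) -> Tuple[Dict[str, float], List[str]]:
    """Return the degree of relation per failure mode and a risk ranking."""
    n_factors = len(next(iter(scores.values())))
    if weights is None:
        # Equal weighting of the risk factors when none is supplied.
        weights = [1.0 / n_factors] * n_factors
    if len(weights) != n_factors or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must match the factors and sum to 1")

    # Standard series: the lowest (best) score for every factor.
    standard = [1.0] * n_factors
    # Difference between each comparative series and the standard series.
    deltas = {
        name: [abs(x0 - x) for x0, x in zip(standard, series)]
        for name, series in scores.items()
    }
    flat = [d for row in deltas.values() for d in row]
    d_min, d_max = min(flat), max(flat)

    degrees: Dict[str, float] = {}
    for name, row in deltas.items():
        if d_max == 0:
            # Every score equals the standard series; all coefficients are 1.
            coeffs = [1.0] * n_factors
        else:
            coeffs = [(d_min + zeta * d_max) / (d + zeta * d_max) for d in row]
        degrees[name] = sum(w * c for w, c in zip(weights, coeffs))

    # A smaller degree of relation means a larger effect, so it ranks first.
    ranking = sorted(degrees, key=degrees.get)
    return degrees, ranking


# Hypothetical (O, S, D) scores on the 10-point scales.
scores = {
    "FM-A": (5, 2, 7),
    "FM-B": (7, 7, 8),
    "FM-C": (4, 5, 5),
    "FM-D": (2, 9, 3),
}
degrees, ranking = grey_relational_ranking(scores)
for name in ranking:
    print(name, round(degrees[name], 6))
```

With these scores and equal weights, the failure mode whose ratings lie farthest from the standard series receives the smallest degree of relation and therefore the highest priority.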

An Illustrative Example
To demonstrate the applicability of our proposition based on FMEA and Grey Theory, an example based on a real context is presented in this section. The steps performed are the same as shown in Figure 1, explained in Section 3. Following these steps, the expert selected for this study is a senior academic with more than 20 years' experience. She holds a Ph.D. degree in information systems (IS), has published 12 papers in this field, and also has experience as a consultant in IS to companies in the private sector.
In the following step of the proposed model, the four dimensions associated with the potential failures of big data are represented according to Figure 2 and described in Table 5. Furthermore, Table 6 presents the failure modes relating to the vulnerability of big data for each dimension. Based on these potential failures, Tables 7 and 8 show the establishment of comparative and standard series for occurrence, severity, and detection, respectively.
To proceed to a grey relational analysis of potential accidents, it is necessary to obtain the difference between comparative series and standard series, according to (4). Table 9 shows the result of this difference.
In order to rank the priority of risk, it is necessary to compute both the grey relational coefficient (Table 10) and the degree of relation (Table 11) using (5), (6), and (7). As stated above, the greater the degree of relation, the smaller the effect of the cause. Assuming equal weights for risk factors, Table 11 also presents the degree of grey relation for each failure mode and dimension and the final ranking.
From the analysis of failures using the proposed approach, we have shown that big data is mainly in need of structured policies for data governance. This result was expected because the veracity and provenance of data are fundamental to information security; otherwise, the vulnerabilities may be catastrophic or big data may have little value for the acquisition of knowledge. Data governance is also an aspect that requires more awareness because it deals with large amounts of data and directly influences operational costs.
Since the model works with a recommendation rather than a solution and compatible recommendations depend on expert knowledge, it is important to test the robustness of this information and therefore to conduct sensitivity analysis. Thus, different weightings, based on the context, may also be used for different risk factors, as suggested by [33]. Table 12 presents a sensitivity analysis conducted in order to evaluate the performance and validity of the results of the model. As can be seen, the final ranking of risk is the same for all the different weightings tested (±10%).
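A sensitivity check of this kind can be sketched as follows. The sketch assumes that "±10%" means scaling one risk-factor weight by 0.9 or 1.1 and rescaling the remaining weights so that they still sum to 1; the paper does not spell out the exact perturbation scheme, and the helper names and scores are hypothetical.

```python
from typing import Dict, List, Sequence


def degree_of_relation(
    scores: Dict[str, Sequence[float]],
    weights: Sequence[float],
    zeta: float = 0.5,
) -> Dict[str, float]:
    """Weighted grey relational degree against a standard series of ones.

    Assumes at least one score differs from 1, so the largest difference
    is positive.
    """
    deltas = {name: [abs(1 - x) for x in row] for name, row in scores.items()}
    flat = [d for row in deltas.values() for d in row]
    d_min, d_max = min(flat), max(flat)
    return {
        name: sum(
            w * (d_min + zeta * d_max) / (d + zeta * d_max)
            for w, d in zip(weights, row)
        )
        for name, row in deltas.items()
    }


def perturbed_weights(base: Sequence[float], index: int, factor: float) -> List[float]:
    """Scale one weight by `factor` and rescale the others to keep the sum at 1."""
    new_weight = base[index] * factor
    scale = (1 - new_weight) / (1 - base[index])
    return [new_weight if i == index else w * scale for i, w in enumerate(base)]


def ranking(scores: Dict[str, Sequence[float]], weights: Sequence[float]) -> List[str]:
    """Failure modes ordered from smallest to largest degree of relation."""
    degrees = degree_of_relation(scores, weights)
    return sorted(degrees, key=degrees.get)


# Hypothetical (O, S, D) scores and equal baseline weights.
scores = {
    "FM-A": (5, 2, 7),
    "FM-B": (7, 7, 8),
    "FM-C": (4, 5, 5),
    "FM-D": (2, 9, 3),
}
base = [1 / 3, 1 / 3, 1 / 3]
baseline = ranking(scores, base)

# The ranking is considered stable if no single-weight perturbation changes it.
stable = all(
    ranking(scores, perturbed_weights(base, i, factor)) == baseline
    for i in range(len(base))
    for factor in (0.9, 1.1)
)
print(baseline, stable)
```

A ranking that survives every perturbation suggests that the recommendation does not hinge on the exact weighting chosen by the expert.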

Discussion and Conclusions
The main difficulties in big data security risk analysis involve the volume of data and the variety of data connected to different databases. From the perspective of security and privacy, traditional databases have governance controls and a consolidated auditing process, while big data is at an early stage of development and hence continues to require structured analysis to address threats and vulnerabilities. Moreover, there is not yet enough research into risk analysis in the context of big data. Thus, security is one of the most important issues for the stability and development of big data. Aiming to identify the risk factors and the uncertainty associated with the propagation of vulnerabilities, this paper proposed a systematic framework based on FMEA and Grey Theory, more precisely GRA. This systematic framework allows for an evaluation of risk factors and their relative weightings in a linguistic, as opposed to a precise, manner for evaluation of big data failure modes. This is in line with the uncertain nature of the context. In fact, according to [40], the traditional FMEA method cannot assign different weightings to the risk factors of O, S, and D and therefore may not be suitable for real-world situations. These authors pointed out that introducing Grey Theory into the traditional FMEA method enables engineers to allocate relative importance to the O, S, and D risk factors based on research and their own experience. Another advantage of this proposal is that it requires less effort on the part of experts, who can make accurate judgments using linguistic terms based on their experience or on datasets relating to previous failures.
Based on the above information, the use of our proposal is justified to identify and assess big data risk in a quantitative manner. Moreover, this study comprises various security characteristics of big data using FMEA: it analyzes four dimensions, identification and access management, device and application registration, infrastructure management, and data governance, as well as 20 subdimensions that represent failure modes. Therefore, this work can be expected to serve as a guideline for managing big data failures in practice.
It is worth stating that the results highlight the need for greater awareness of data governance for ensuring appropriate controls. In this context, a challenge to the process of governing big data is to categorize, model, and map data as it is captured and stored, mainly because of the unstructured nature of the volume of information. Then, one role of data governance in the information security context is to allow for the information that contributes to reporting to be defined consistently across the organization in order to guide and structure the most important activities and to help clarify decisions. Briefly, analyzing data from the distant past to decide on a current situation does not mean that the data has higher value. From another perspective, increasing volume does not guarantee confidence in decisions, and one may use tools such as data mining and knowledge discovery, proposed in [73], to improve the decision process.
Indeed, the concept of storage management is a critical point, especially when volumes of data that exceed the storage capacity are considered [11]. In fact, the emphasis of big data analytics is on how data is stored in a distributed fashion, for example, in traditional databases or in a cloud [66]. When a cloud is used, data can be processed in parallel on many computing nodes, in distributed environments across clusters of machines [3]. In conclusion, big data security must be seen as an important and challenging feature, capable of generating significant limitations. For instance, several electronic devices that enable communication via networks, especially via the Internet, and which place great emphasis on mobile trends allow for an increase in volume, variety, and even speed of data, which can thereby be defined as big

Figure 1: Flowchart of the proposed FMEA and Grey Theory based approach.

Table 10: Grey relational coefficient.
Intentional access to network services, for example, proxy servers: 0…
Failure of audit review of implemented policies and information security: 0…

Table 11: The degree of grey relation for each failure mode and each dimension and the final rank.

| Failure mode | Degree of relation |
|---|---|
| Secret password divulged to any other user | 0.708333 |
| A1.4: Intentional access to network services, for example, proxy servers | 0… |
| Unauthorized readout of data stored on a remote LAN | 0.654762 |
| A4: Data governance | |
| A4.1: Failure of interpretation and analysis of data | 0… |
| A4.2: Failure of audit review of implemented policies and information security | 0.528499 |
| A4.3: Failure to maximize new business value | 0.526515 |
| A4.4: Failure of real-time demand forecasts | 0.484848 |

Table 1: Qualitative methodologies for risk analysis.

Table 5: Description of dimensions.

Table 6: Failure modes associated with each dimension of big data.

Table 9: Difference between comparative series and standard series.