An Empirical Assessment and Validation of Redundancy Metrics Using Defect Density as Reliability Indicator

Language-dependent software metrics are proposed as quantitative measures to assess internal quality factors, such as cohesion and complexity, at both method and class levels. External quality factors like reliability and maintainability are in general predicted using different metrics of internal attributes. A literature review shows a lack of software metrics proposed for reliability measurement and prediction. In this context, a suite of four semantic language-independent metrics was proposed by Mili et al. (2014) to assess program redundancy using the Shannon entropy measure. The main objective of these metrics is to monitor program reliability. Despite their important purpose, they are manually computed and only theoretically validated. Therefore, this paper aims to assess the redundancy metrics and empirically validate them as significant reliability indicators. As software reliability is an external attribute that cannot be directly evaluated, we employ other measurable quality factors that represent direct reflections of this attribute. Among these factors, defect density is widely used to measure and predict software reliability based on software metrics. Therefore, a linear regression technique is used to show the usefulness of these metrics as significant indicators of software defect density. A quantitative model is then proposed to predict software defect density based on redundancy metrics in order to monitor software reliability.


Introduction
Software quality is one of the main concerns of all organizations using software systems. According to [1], it refers to the capability of a software process to produce a software product of good quality. In both software process and product, quality is described through a set of attributes or characteristics that may be internal or external. Concerning product quality, reliability is identified as one of the most important software quality attributes. Different techniques, including fault prevention, fault removal, fault tolerance, and fault forecasting, are defined to produce reliable software systems [2,3]. Furthermore, literature shows that software reliability, like other external attributes, is difficult to measure directly [4][5][6]. Thus, it is generally measured and predicted based on other quality attributes, like defect density and fault-proneness, identified as direct reflections of reliability [7]. These attributes are directly measurable through software metrics and widely used to validate different suites of software metrics. Examples include complexity, cohesion, defect density, and fault-proneness [6,[8][9][10].
Software metrics are quantitative measures used to evaluate, improve, and predict different software quality attributes [11][12][13][14][15]. Numerous software metrics (or combinations of them) are proposed to assess internal attributes like complexity, cohesion, and size, as well as fault count and defect density. External attributes like reliability and maintainability are generally expressed through internal attributes using the metrics proposed to measure them. For instance, reliability can be reflected through cohesion, coupling, and complexity using their related metrics [5].
Continuous research studies focusing on the assessment and prediction of software reliability are needed. In this context, a suite of four semantic (language-independent) metrics was proposed in [16] to assess program redundancy in order to monitor reliability. However, these metrics are theoretically presented and manually computed for basic arithmetic operations. Furthermore, it is unclear how they can be computed for more complex programs (functions and classes). Using basic programs containing simple operations, we cannot assume that the metrics are related to program redundancy. Thus, an empirical assessment based on more complex software projects is needed to convincingly show that the proposed metrics are related to program redundancy. Moreover, concerning the hypothesis that these metrics are useful indicators of software reliability, there are no concrete examples or evidence of how they can be linked to reliability. Therefore, empirical studies focusing on the concrete relationship between these metrics and various reliability factors are required.
The authors in [17,18] focused on the empirical validation of the redundancy metrics as measures of programs' mutant rates. The manual assessment of these metrics for basic arithmetic operations and certain types of computation-based programs represents the main limitation of these studies.
Software metrics validation consists of identifying their utility by studying the relationship between the metrics and a quality attribute. The quality attributes most used as reliability indicators are defect density and fault-proneness [6, 8-10, 14, 19].
This relationship can be exploited to propose a quantitative model helping to predict a quality attribute through these metrics [9,14].
To perform this validation, we used an empirical dataset including the different metrics' values and the defect density attribute. Metrics values are computed using a set of classes taken from Apache projects. The defect density attribute is obtained using a fault injection procedure on the same classes. Using the linear regression technique, this relationship is exploited to propose a quantitative model that reflects program defect density through these metrics. This model is also useful to predict defect density for further datasets.
This paper is organized as follows: Section 2 presents the related works and motivation. Section 3 discusses the proposed empirical assessment approach. Section 4 outlines the empirical validation of redundancy metrics as defect density predictors. Section 5 analyses the experimental results. Conclusion and perspectives are reported in Section 6.

Related Works and Motivation
This section focuses on software reliability measurement, software redundancy, and entropy concepts. Furthermore, it describes the redundancy metrics suite.

Software Reliability.
Software reliability is one of the most important software quality attributes [11,20]. It is defined from two main sides: (i) The mathematical side considers reliability as the probability of failure-free operation for a specified period of time in a specified environment [3]. (ii) The broader side considers reliability as the degree to which a system, product, or component performs specified functions under specified conditions for a specified period of time [2]. It is described by combining different subcharacteristics, which are maturity, availability, fault tolerance, and recoverability [3].
Different techniques, namely, fault prevention, fault removal, fault tolerance, and fault forecasting, are defined to improve software reliability [11,21,22]. Fault prevention mechanisms help to prevent faults; however, they cannot guarantee the avoidance of all software faults [23]. Consequently, further protective mechanisms like fault removal are required. For [22,24], fault removal techniques are important and critical for software reliability. However, these techniques cannot guarantee the elimination of all faults, since they are based on software testing and formal inspections, which in turn have their own problems. Besides, Mili and Tchier [22] argued that while removing one fault, other ones can appear. They also note that some faults cannot be removed since they are not sensitized or are masked (actual states are equal to the stated ones), but these faults can still cause software failure. Consequently, fault tolerance and fault forecasting techniques are required to complement fault avoidance and removal in order to reduce faults and improve reliability [23][24][25].
Bansiya et al. [19] noted that reliability is one of the high-level quality attributes that are abstract concepts and cannot be directly observed and measured. Different models based on direct metrics are proposed to predict various external quality attributes, including reliability. Chidamber and Kemerer (C&K) [19,[26][27][28][29] proposed a suite of metrics used for reliability prediction. The proposed reliability prediction models use software metrics (called independent variables) to evaluate measurable reliability attributes (called dependent variables) like defect density, fault-proneness, and defect count [4, 8-10, 14, 30]. Mili et al. [16] also proposed a suite of four metrics to monitor program reliability based on redundancy.

Software Redundancy.
The redundancy concept was first used in hardware systems. It provides additional physical copies of components, i.e., processors and memories, to improve their reliability [31]. Different forms of redundancy were defined in software systems, like information redundancy (code redundancy), functional redundancy, and time redundancy [22,32,33]: (i) Information redundancy indicates the excess of information, expressed in Shannon bits, used to represent the state of a program [34]. (ii) Functional redundancy consists of using the same program specification to generate different algorithm or program versions performing the same functionality [33]. (iii) Time redundancy consists of using additional time to repeat the execution of a failed process [24,31].
A literature review shows that redundancy can be used in different applications. For instance, in 1976, redundancy was exploited through the N-version programming technique to achieve reliable systems [35]. This technique was also used to compare the probability of failure between single-version and N-version systems. Furthermore, in [21], the authors noted that redundancy is useful for self-checking programs. In addition, in 2003, Lyu [25] showed that redundancy structured in mutation testing is exploited through the injection of faults in numerous program versions. Furthermore, in 2015, redundancy was also used to identify the semantic similarity of code fragments [36]. The redundancy metrics proposed by Mili et al. [16,22] assess the program information redundancy provided by the different states of the program [34]. A program state is given by the set of variables manipulating related data, and its entropy reflects the uncertainty about the outcome of these variables. To develop a better understanding of the redundancy metrics, we present the terminology related to program states: (i) State space: the set of values that the declared program variables may take [16]. (ii) Initial state space: the state of the program (function/class) represented by the input variables [22]. (iii) Current state (actual state): the set of states that the program may be in at any given point [16,37]. (iv) Final state space: the state of the program produced by the output variables [22]. (v) State redundancy: arises when the representation of the program state allows a wider range of values than needed to represent the different states [32].
These definitions are illustrated in the following example program:

{
  int s, x, result;  // state space of the program
  s = 2;             // initial state of the program
  s = s + 1;         // internal state 1 of the program
  s = 2 * s;         // internal state 2 of the program
  s = s % 3;         // internal state 3 of the program
  s = s + 12;        // final state of the program
}

Entropy Concept.
As mentioned, the redundancy metrics were defined based on the Shannon entropy measure. Thus, for a given random variable X that takes its values in a finite set, its entropy is the function denoted by H(X) and defined by [34,38]

H(X) = − Σ_i p(x_i) log2 p(x_i),   (1)

where p(x_i) is the probability that X = x_i. Intuitively, this function measures (in bits) the uncertainty pertaining to the outcome of X and takes its maximum value H(X) = log2(N) when the probability distribution is uniform, where N is the cardinality of X.
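For illustration, the entropy definition above can be sketched in a few lines of Python (a minimal illustration of the Shannon entropy formula, not part of the original study):

```python
import math

def entropy(probabilities):
    """Shannon entropy H(X) = -sum(p(x) * log2(p(x))), in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A uniform distribution over N = 8 outcomes reaches the maximum log2(8) = 3 bits.
print(entropy([1 / 8] * 8))  # 3.0

# A nonuniform distribution carries less uncertainty.
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75
```

The uniform case is the one assumed throughout the redundancy metrics suite, where the entropy of a variable reduces to its bit width.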

Redundancy Metrics Suite.
Redundancy metrics were defined based on the Shannon entropy measure of program code. Mili et al. [16] adopted the hypothesis of a uniform probability distribution in order to perform the analytical evaluation of these metrics. The authors in [16,17] argued that, for a random variable X that takes the values of a 32-bit integer, N equals 2^32 and log2(N) is then merely equal to 32 bits. They assumed the uniform probability throughout; then, the entropy of any program variable is basically the number of bits in that variable:

H(X) = log2(N) = number of bits used to store X.   (2)

Four metrics were defined, which are initial state redundancy, final state redundancy, functional redundancy, and noninjectivity. To define these metrics, the following assumptions were made by Mili et al. [16]: (i) The probability distribution of the different variables is uniform. (ii) Variables are 32 bits in size. (iii) Metrics were computed at the method level. These methods manipulate input and output variables of integer type. This means the input states of programs were represented by the declared variables, and the output states were represented by the modified states of these variables.

Initial and Final State Redundancy Metrics.
We recall that the state of a given program g is defined by its declared variables. It is very common to declare ranges of values for these variables that are much wider than really required. For instance, the age of an employee is generally declared as an integer variable. However, only a restricted range, i.e., between 0 and 120, is really required. This means that 7 bits are sufficient to store the age variable, whereas the typical 32-bit size of an integer variable is used. The unused bits measure the code redundancy.
Thus, state redundancy represents the gap between the declared state and the actual state (really used) of the program [16,18,22].
Mathematically, let S be the declared state of the program (its variables) and σ be its actual state (the actual values of these variables); then, the state redundancy is given by the difference between their respective entropies, denoted by H(S) and H(σ). The program moves from its initial state (σ_i) to its final state (σ_f); then, two state redundancy measures, namely, initial state redundancy (ISR) and final state redundancy (FSR), were defined by the following equations [18] (Table 1):

ISR = (H(S) − H(σ_i)) / H(S),   (3)
FSR = (H(S) − H(σ_f)) / H(S).   (4)

Functional Redundancy (FR).
According to [18], the functional redundancy metric is a function from initial states to final states. It reflects how initial states are mapped to final states. It also identifies how initial states are affected by input data and how final states are projected onto output data. Mathematically, for a program (function) g, FR is the ratio of the output data delivered by g prorated to the input data received by g:

FR = H(Y) / H(X),   (5)

where H(Y) is the entropy of the output data delivered by g and H(X) is the entropy of the input data passed through parameters, global variables, and read statements.
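The state redundancy definitions can be sketched as follows (an illustrative Python fragment of ours, using the age example above; the authors' own computations are done in Java scripts):

```python
def state_redundancy(declared_bits, actual_bits):
    """State redundancy (H(S) - H(sigma)) / H(S):
    the fraction of the declared entropy left unused, entropies in bits."""
    return (declared_bits - actual_bits) / declared_bits

# An age declared as a 32-bit int but actually ranging over 0..120 (7 bits):
print(state_redundancy(32, 7))  # 0.78125
```

Depending on whether the actual entropy of the initial or the final state is plugged in, the same ratio yields ISR or FSR.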
Considering the same previous example, H(S) = 96 bits. The random variable Y is defined by the integer variable z, represented by 32 bits. Then, H(Y) = log2(2^32) = 32 bits. H(X) is the entropy of the input data received by g, represented by the two integer variables x and y. Then, H(X) = 2 × log2(2^32) = 64 bits, and FR is given by FR = H(Y)/H(X) = 32/64 = 0.5.
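The FR computation for this example can be sketched as follows (illustrative, with entropies expressed in bits under the uniform-distribution assumption):

```python
def functional_redundancy(output_entropy_bits, input_entropy_bits):
    """FR = H(Y) / H(X): output entropy prorated to input entropy."""
    return output_entropy_bits / input_entropy_bits

# Two 32-bit integer inputs (H(X) = 64 bits) and one 32-bit output (H(Y) = 32 bits):
print(functional_redundancy(32, 64))  # 0.5
```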

Noninjectivity (NI).
According to [28], a major source of program (function) redundancy is its noninjectivity. An injective function is a function whose value changes whenever its argument does. A function is noninjective when it maps several distinct arguments (initial states σ_i) onto the same image (final states σ_f). Mathematically, NI is the conditional entropy of the initial state given the final state: if we know the final state, how much uncertainty do we have about the initial state? According to [34], this conditional entropy equals the difference between the entropies of these two states. Hence, NI was defined as

NI = (H(σ_i) − H(σ_f)) / H(σ_i).   (6)

Using the previous example, NI is equal to (44 − 6)/44 ≈ 0.86. A summary of the presented metrics is given in Table 2.
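The NI computation can be sketched as follows (an illustrative fragment of ours, reproducing the running example's numbers):

```python
def noninjectivity(initial_entropy_bits, final_entropy_bits):
    """NI = (H(sigma_i) - H(sigma_f)) / H(sigma_i), entropies in bits."""
    return (initial_entropy_bits - final_entropy_bits) / initial_entropy_bits

# With H(sigma_i) = 44 bits and H(sigma_f) = 6 bits, as in the running example:
print(round(noninjectivity(44, 6), 2))  # 0.86
```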

Motivation and Objective.
As mentioned, the main objective of the redundancy metrics defined in [16] is to monitor product reliability.
This makes them important measures, since software reliability is one of the most important quality attributes. However, the different metrics composing this suite are theoretically presented and manually computed for basic arithmetic operations. Furthermore, it is unclear how they can be computed for more complex programs. With very simple operation programs, we cannot assume that the metrics are related to programs' redundancy. Therefore, we need to perform an empirical assessment based on more complex software programs (functions/classes) to convincingly show that the proposed metrics are related to program redundancy. In addition, there are no concrete examples or evidence of how these metrics can be linked to reliability.
Thus, empirical studies focusing on the concrete relationship between these metrics and various reliability factors are required. Accordingly, the objectives of this paper are twofold: (i) First, we aim to propose an approach to empirically assess the mentioned redundancy metrics. Solving this issue consists of considering complex programs taken from real-world software projects rather than being limited to basic operations and certain types of computation-based programs [29,39]. The basic idea is to generate an empirical database including, for each program (function/class), the values of the redundancy metrics. (ii) Second, we aim to propose an empirical validation of these metrics as reliability indicators by considering the defect density attribute as a direct reflection of software reliability [5,7]. The basic idea is to exploit the generated database to study the relationship between the metrics and defect density using regression techniques [8,28,40].
Therefore, we propose in Section 3 an empirical assessment approach to compute the different metrics. In addition, we present in Section 4 an empirical validation approach to demonstrate the concrete relationship between these metrics and software reliability.

Empirical Assessment of Entropy-Based Redundancy Metrics
The proposed redundancy metrics were computed manually in [16] at the function level for simple examples and for specific data types and values, e.g., the greatest common divisor of two integer variables. Thus, in this paper, we consider more complex examples taken from real-world software projects to automatically compute these metrics at the class level, since software projects are organized in classes. Three main steps are used: (1) Selection of software classes: in this step, we have selected the different classes from which the metrics will be generated. (2) Computation of the redundancy metrics: once the different classes are selected, we have used appropriate scripts to compute these metrics. (3) Construction of the database: the implementation of the two previous steps yields an empirical database that contains, for each class, the values of the different metrics. These steps are detailed in the following sections.

Selection of Software Classes.
According to Radjenović et al. [29] and Kumar et al. [39], the software repositories used to validate most software metrics are of three main types: (i) Private/commercial repositories: this type of repository is used and maintained by companies for organizational use. In these repositories, source code and other related information, like fault datasets, are not available [29]. (ii) Partially public repositories: in these repositories, Radjenović et al. [29] noted that only the product source code and the related software faults are available, whereas the values of software metrics are usually unavailable; hence, they need to be calculated from the available source code and then mapped to their fault information. According to [29], this mapping may lead to biased results. (iii) Public repositories: in these repositories, the values of software metrics and other information like software faults are usually available, which justifies their use in many research projects [27]. Some examples of these public repositories include the PRedictOr Models In Software Engineering (PROMISE (http://promise.site.uottawa.ca/SERepository/datasets-page.html)) repository of NASA projects, the Software-artifact Infrastructure Repository (SIR (http://sir.csc.ncsu.edu/portal/index.php)), and the Bug Prediction Dataset (BPD (http://bug.inf.usi.ch/index.php)).
To perform the empirical assessment of the redundancy metrics, we have focused on repositories containing programs of input/output type, as explained above [16].
Given that computing the redundancy metrics requires the availability of the source code, private and commercial repositories were not considered, since their programs' source code is not available. Therefore, we have focused on partially public and public repositories. A literature review [7,41] shows that most studies focused on metrics validation have used programs (classes or methods) taken from NASA projects like CM1, JM1, and KC1. In this context, the authors in [40] showed that, out of 64 metrics' validation studies performed from 1991 to 2013, NASA projects were the most used (60%), followed by PROMISE repository datasets (15%) and other open-source projects (12%). In our research project, we have proceeded as follows: (1) First, we have considered NASA and then PROMISE projects, for which different information is available, including the values of software metrics. Furthermore, the fault datasets of these projects are also available; in addition, for each class, we can identify whether it is fault-free or not (true/false). However, we have decided not to use this repository because the source code, which is mandatory for our study since we need it to compute the redundancy metrics, is unavailable. (2) Next, we have focused on a set of open-source projects; some of them are not of input/output type, like SIR's Velocity and Camel projects, while others do not include source code, such as the BPD repository. A literature review [39] shows that the Apache Commons library, which includes different Java projects of input/output type with available source code, was also used to validate software metrics. Besides, in this repository, the unit tests related to the classes are available.
Consequently, we have selected the Apache Commons products library, which respects all our requirements and hypotheses. Then, from this repository, we have considered a set of 43 classes (see Table 3) containing functions manipulating variables in the input and output states.
A description of each class and its related function is available at http://commons.apache.org/proper/commons-math/javadocs/api-3.6/.

Automatic Metrics Computing and Database Construction.
The process we used to compute the redundancy metrics (ISR, FSR, FR, and NI) is summarized in Figure 1, which presents the different steps used to compute these metrics. These steps are the same for the different selected classes presented in Table 3. To compute the metrics, we have used the Eclipse development environment (version: Neon.3 Release (4.6.3)). As the selected source code is organized in classes, we present below how the redundancy metrics are computed at the class level. The computing process was developed using the following steps.

Compute Program State Space H(S).
To compute the state space entropy H(S), we have first identified, for each class, the input/output functions manipulating the different states of the class variables. Next, we have computed H(S) as the maximum entropy of all function variables (input/output). For a better understanding of how H(S) and the other metrics are computed, an example of the used script is illustrated in Figure 2.
In Figure 2, H(S) is computed as shown in line 33, and its value is equated with the maximum entropy of the input and output variables used in lines 24 to 31. e input data related to these variables are randomly generated as shown in lines 34 to 36.

Compute H(σ_i) and ISR Metric.
H(σ_i) reflects the initial entropy of the input variables and the maximum entropy of the output ones. To compute the entropy of a variable (the exact number of used bits), as presented by equation (2), a Java function called sizeOfBits is used. More details about this function are presented in Appendix A (see Figure 3). H(σ_i) is computed as illustrated in lines 48 and 49 of Figure 2. Using equation (3), the ISR metric value is deduced as illustrated in line 64 of Figure 2.
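The sizeOfBits helper itself appears in Appendix A; as a rough analogue of its role (an assumption of ours, not the authors' Java code), the minimal bit width of a non-negative integer can be obtained in Python via int.bit_length:

```python
def size_of_bits(value):
    """Minimal number of bits needed to represent a non-negative integer.
    A Python analogue of the paper's Java sizeOfBits helper (our assumption)."""
    return max(1, int(value).bit_length())

print(size_of_bits(120))  # 7 -> an age in 0..120 fits in 7 bits
print(size_of_bits(2))    # 2
```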

Compute FR and NI Metrics.
To compute the FR and NI metrics, we have used the FR and NI equations presented in Section 2.

Generate Metrics Values to Excel Files.
Note that the metrics values were generated over 1000 iterations to test different possibilities of random inputs, as shown in lines 19 and 20 of Figure 2. To store the values of these metrics, we have exported them to .xls files, as shown in lines 16 and 91. Then, we have computed, for each class, the average of the 1000 generated metrics values to construct the final database. A part of this database is illustrated in Figure 4. The presented process can also be used to compute the redundancy metrics at the function level. Thus, an example of the used script is illustrated in Figure 5 of Appendix A. The source of the metrics generator is available at https://gitlab.com/dalilaamara/redundancymetrics.
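The per-class averaging step can be sketched as follows (an illustrative fragment with made-up numbers; the actual generator is the Java project linked above):

```python
import statistics

def average_metrics(rows):
    """Average per-iteration metric tuples (ISR, FSR, FR, NI) column-wise."""
    return tuple(statistics.mean(column) for column in zip(*rows))

# Two made-up iterations for one class (the real generator uses 1000):
iterations = [(0.78, 0.80, 0.50, 0.86), (0.74, 0.82, 0.48, 0.88)]
averages = average_metrics(iterations)
print(tuple(round(value, 2) for value in averages))  # (0.76, 0.81, 0.49, 0.87)
```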

Empirical Validation of Semantic Metrics as Reliability Indicators
According to [5], a valid metric is one whose values are statistically associated with a quality attribute. The empirical validation aims not only to identify the utility of a proposed metric and to make comparisons with other metrics but also to identify which metrics are not useful. Reliability can be reflected by other measurable attributes, including defect density and fault-proneness. In our research, we have focused on the defect density attribute, because the fault-proneness attribute only indicates whether a class contains faults (1) or not (0), and all classes in the constructed redundancy database contain faults.
6 Scientific Programming

Formulation of Research Hypotheses.
To study the relationship between the redundancy metrics and the software defect density attribute, the following hypotheses are designed: (i) H1: the ISR redundancy metric is significant as a software defect density indicator. (ii) H2: the FR redundancy metric is significant as a software defect density indicator. (iii) H3: the NI redundancy metric is significant as a software defect density indicator. (iv) H4: ISR, FR, and NI (or a combination of them) are jointly indicators of software defect density.
Through these hypotheses, we aim to verify whether a relationship between the different metrics and the defect density attribute exists. Once a significant correlation between the redundancy metrics and defect density is identified, it can be stated that these metrics are useful to monitor software reliability.

Software Defect Density.
Defect density (DD) is defined as the number of defects divided by thousands of lines of delivered code [5,42,43]. It is given as follows [5,7]:

DD = number of defects / product size (in KLOC).   (9)

(i) The product size is in general measured in terms of thousands of lines of code (KLOC) [42,43]. (ii) According to [5], defect counts can include post-release failures, residual faults (all faults discovered after release), all known faults, and the set of faults discovered after some arbitrary fixed point in the software life cycle (e.g., after unit testing).
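The defect density computation can be sketched as follows (illustrative values, not taken from the study's dataset):

```python
def defect_density(defect_count, lines_of_code):
    """DD = number of defects per thousand lines of delivered code (KLOC)."""
    return defect_count / (lines_of_code / 1000)

# A hypothetical 2,500-line class with 10 known defects:
print(defect_density(10, 2500))  # 4.0
```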
Once the quality attribute is identified, the next step consists of studying the existence of a relationship between this attribute, as the dependent variable, and the different redundancy metrics, as the independent variables.

Empirical Validation Approach.
A software metric shall be validated [26], as validation helps to identify the best metrics providing the required information leading to the metrics' purpose [14].
Different studies [4,19,26] detail various software metric suites' validation. Table 4 shows a comparison between the common validation approaches based on their objectives, process validation, and used repositories.
As illustrated in Table 4, different studies were proposed to validate software metrics as appropriate indicators of various quality factors like defect density, maintainability, and fault-proneness. We have observed the following: (i) The different validation approaches were based on three main steps: dataset collection, dataset analysis and model building, and model performance evaluation. (ii) The data related to software metrics and the quality attribute considered to validate them are available in public datasets, including NASA projects. Therefore, our proposed methodology to validate the redundancy metrics consists of the following steps: (1) First, we have collected data related to the dependent and independent variables, represented, respectively, by defect density and the redundancy metrics. As explained above, the redundancy metrics are computed for a set of classes selected from the Apache Commons library. For these classes, defect density was not available; thus, we have used a defect injection procedure to compute this attribute for the same classes. (2) Second, we have studied the independent and joint impact of the different redundancy metrics on software defect density using data analysis tools. (3) Third, we have proposed a defect density predictive model based on the redundancy metrics.

Defect Density Data Collection.
Based on equation (9), defect density is derived from the number of faults in the source code divided by thousands of lines of code (KLOC) for the classes presented in Table 3.

KLOC Computing.
To compute the KLOC measure, we have used the Metrics tool [47]. Within the Eclipse environment, this tool provides, for each of the used classes, the number of lines of code, as illustrated in Figure 6.

Defect Count Computing.
As mentioned, we have used the Apache Commons Math library, which includes only the source code and the associated unit tests. Thus, we used a fault injection procedure to obtain the values of this measure. One of the well-known fault injection techniques is mutation testing, which consists of automatically seeding into each class' code a number of faults (or mutations). The new classes are called mutants. Then, tests are run, and two possible cases are presented [48,49]: either the injected fault is detected by the tests, or it is masked. Fault injection is performed using automated mutation tools like MuJava, MuEclipse, PiTest, and many more [50]. In our research work, PiTest (https://pitest.org/) is used within the Maven (https://maven.apache.org/) environment. To inject faults, we have proceeded as illustrated in Figure 7. Figure 7(b) presents a report of the faults injected in the Erf class, especially for the erf(double, double) function. The green lines indicate that the injected fault is detected, whereas the pink ones indicate that it is masked. Based on these reports, the number of injected faults is used to compute the defect density measure for each class. The final structure of the obtained database contains, for each class, the values of the four metrics and the defect density attribute.
This database is available at https://gitlab.com/dalilaamara/redundancymetrics/.

Among the validation studies compared in Table 4, one collected historical data on software metrics and static code attributes (size and number of methods), predicted the defect density attribute using simple and multiple linear regression applied to static metrics, and evaluated the results with the R-squared performance measure. Another, cited in [46], used the C&K metrics to predict software maintainability, taking the number of lines changed per class as the maintainability criterion; the user interface system (UIMS) and quality evaluation system (QUES) provided three years of historical data on the number of lines changed per class, the C&K metrics were extracted using metrics extraction tools, a neurogenetic algorithm (a hybrid of neural network and genetic algorithm) was applied to estimate maintainability from these metrics, and its performance was evaluated using the mean absolute error (MAE), mean absolute relative error (MARE), root mean square error (RMSE), and standard error of the mean (SEM) measures.

Figure 6: Example of LOC computing.

Redundancy Metrics Data Collection.
We have presented in the previous section the procedure to inject faults into the source code of the different classes selected from the Commons Math library. To test whether the metrics values can be affected by the injected faults, we have computed the redundancy metrics after the fault injection process. Thus, we have adopted the following steps: (i) Step 1: we have selected the mutators (fault types) presented in Table 5; further details are available at https://pitest.org/quickstart/mutators/. (ii) Step 2: we have computed the values of the redundancy metrics using formulas (3) to (6) presented in Section 2 for each injected type of mutators (faults). (iii) Step 3: the constructed dataset contains the values of the redundancy metrics for each type of the selected mutators. It will be used to determine whether the values of the redundancy metrics remain unchanged when faults are injected.

Dataset Analysis.
The dataset analysis phase requires other important steps like data normalization/standardization and correlation analysis. Normalization and standardization are feature scaling transformations. According to [51], when there is a large difference between the maximum and minimum values, e.g., 0.01 and 1000, the values should be rescaled into the range [0, 1]. In this study, the used metrics are defined in such a way that they already range between 0 and 1 [18]. Correlation analysis is required to identify the association between the different independent variables (redundancy metrics) in order to retain only significant ones (not intercorrelated). To test the redundancy metrics' correlation, we have used the Python language. The results are illustrated in Figure 8.
Correlation coefficients between the independent variables are analyzed based on Hopkins' statements [52], according to which coefficients below 0.1 are considered insubstantial. Figure 8 indicates a strong significant correlation between the NI and FSR metrics, since their correlation coefficient is equal to 0.99. Thus, one of these variables will be omitted.
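The correlation screening step can be sketched in Python as follows; the metric values below are made up for illustration and are not taken from our dataset:

```python
import numpy as np

# Hypothetical metric values for six classes (illustrative only).
metrics = {
    "ISR": [0.42, 0.38, 0.55, 0.31, 0.47, 0.60],
    "FSR": [0.20, 0.35, 0.12, 0.28, 0.41, 0.25],
    "FR":  [0.75, 0.52, 0.66, 0.80, 0.58, 0.71],
    "NI":  [0.21, 0.36, 0.13, 0.29, 0.42, 0.26],
}
names = list(metrics)

# Pearson correlation matrix between the independent variables.
corr = np.corrcoef(np.array([metrics[n] for n in names]))
print(np.round(corr, 2))

# Flag pairs whose coefficient is near 1, as for NI and FSR (0.99 in
# Figure 8); one metric of such a pair is dropped before regression.
threshold = 0.9
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        if abs(corr[i, j]) >= threshold:
            print(f"highly correlated: {names[i]} ~ {names[j]} "
                  f"({corr[i, j]:.2f})")
```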

Experiments and Results
In this section, we evaluate the usefulness of the redundancy metrics as predictors of defect density to test the stated hypotheses. According to [7,30], regression techniques are used since the defect density attribute we aim to predict is represented by quantitative values.

Experiments.
As mentioned, linear regression (simple and multiple) can be used to predict defect density based on the redundancy metrics. Thus, we present the general form of each regression type as follows: (i) The general form of the simple linear regression model is presented as

Y = β0 + β1 X + ε. (10)

(ii) The general form of the multiple linear regression model is presented as

Y = β0 + β1 X1 + β2 X2 + · · · + βn Xn + ε. (11)

In the presented equations, Y represents the dependent variable, the Xi represent the independent variables, the βi are the estimated parameters, and epsilon (ε) is the random error. Using the presented formulas (10) and (11), three main experiments are performed: (iii) Experiment 1: before studying the linear regression between the redundancy metrics and defect density, we have tested whether the values of the redundancy metrics are affected by the injected faults. Therefore, for each metric, we have represented its variation with the five selected mutators described in Table 4. (iv) Experiment 2: in this experiment, we have used univariate linear regression to test the hypotheses H1 to H3 presented in Section 4.1. (v) Experiment 3: in this experiment, multivariate linear regression is used to test the hypothesis H4.
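Formulas (10) and (11) can be instantiated on synthetic data as a minimal sketch; the coefficient values below are illustrative assumptions, not the estimates of Table 7:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic predictors in [0, 1], mimicking two redundancy metrics.
isr = rng.uniform(0, 1, 50)
ni = rng.uniform(0, 1, 50)

# Synthetic dependent variable: defect density driven by the two
# predictors plus a random error epsilon, as in formula (11).
y = 0.1 + 0.7 * isr + 0.5 * ni + rng.normal(0, 0.05, 50)

# Multiple linear regression via ordinary least squares:
# columns are [1, X1, X2] -> estimates [beta0, beta1, beta2].
X = np.column_stack([np.ones_like(isr), isr, ni])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated coefficients:", beta.round(3))

# Coefficient of determination (R-squared).
y_hat = X @ beta
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
print("R-squared:", round(1 - ss_res / ss_tot, 3))
```

Simple regression, formula (10), is the special case with a single predictor column next to the intercept column.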

Experiment 1: Variation in the Redundancy Metrics with the Injected Faults.
We have studied the variation in the ISR, FSR, and NI redundancy metrics for each type of injected mutators over a set of classes selected from the constructed dataset (see Figure 9). We have focused only on these metrics, since the FR redundancy metric is computed using the maximum entropy of the input and output data, which is insensitive to any change in the variables' states (see equation (6)). Results are depicted in Figure 9, which shows some variation in the values of the redundancy metrics for the different mutators. This variation indicates that mutators of different types affect the redundancy of the source code. Thus, the state of the variables used to assess this redundancy changes with the injected faults. Figure 10 illustrates a part of the mutated source code of the BesselJ class.
In line 446, two injected faults of the Math mutators survived; these faults consist of replacing, respectively, the multiplication with a division and the subtraction with an addition. Therefore, the state of the p variable is modified, which affects the state of the subsequent instructions. In line 464, other types of mutators were injected, namely Mut 1, Mut 5, and Mut 2. These mutators affect the state of the different variables used by the program, showing a variation in the redundancy metrics, as their values depend on the variables' states.
In experiment 1, the variability of the redundancy metrics for the different mutators is negligible in some classes, as these faults cause little change in the number of bits needed to represent the variable states. For instance, in Figure 10, line 446, if we consider the variables en = 2, plast = 1, x = 1, and pold = 1, then the value of the p variable before and after the injection of the Math mutators (replace multiplication with division and subtraction with addition) is equal to 1. Thus, the redundancy provided by the state of the p variable remains unchanged. In line 465, if we consider l = 7, then the states of the ncalc variable before and after the Math mutator (replace subtraction with addition) are, respectively, 6 and 8. The entropy required to represent these states is 3 bits. Therefore, there is no variation in the redundancy metrics between these two states (6 and 8). This explains the similar values of the redundancy metrics for the different mutators presented in Figure 9.

Experiment 2: Univariate Linear Regression.
The univariate linear regression uses only one independent variable (one of the presented redundancy metrics) to predict defect density. Thus, we perform univariate linear regressions to test separately the first three hypotheses presented above. Results of this experiment are illustrated in Table 6 and are analyzed based on the p value measure. This measure is defined as the probability of error, i.e., the significance level that is used to accept or reject a hypothesis [7]. Two possible cases are presented: (i) To accept the hypothesis, the p value must be less than or equal to 0.05. (ii) Otherwise, the hypothesis is rejected.
Taking defect density as the dependent variable and the redundancy metrics as predictors, and based on the previous statements, the results in Table 6 are summarized as follows: (i) For H1, the p value is 0.000 and less than 0.05, so the H1 hypothesis is accepted, which means that the ISR metric can be considered a significant defect density predictor. (ii) For FR, the p value is 0.000. Thus, we can accept the hypothesis H2, which indicates that the FR redundancy metric can be considered a significant predictor of the defect density attribute. (iii) For NI, the p value is 0.000. Using the previous statements, the hypothesis H3 is also accepted, which means that NI can be considered a significant predictor of the defect density attribute.

Experiment 3: Multivariate Linear Regression.
Once the univariate linear regressions are performed to identify the relationship between each redundancy metric and defect density separately, we aim in this experiment to join these metrics and study their common effect on defect density. For this, we have tested the multivariate regression for the hypothesis H4 based on equation (11). Results are summarized in Figure 11. Taking defect density as the dependent variable and the redundancy metrics as predictors, the results in Figure 11 show the following: (i) For ISR and NI, the p values are less than 0.05 and equal, respectively, to 0.011 and 0.000. Consequently, the hypothesis H4, which supposes that the redundancy metrics are useful as defect density predictors, is accepted for ISR and NI. (ii) For the FR metric, the p value is greater than 0.05 and equal to 0.189. Therefore, this metric is omitted from the multivariate regression, and only ISR and NI are considered significant predictors of the defect density attribute.

Model Performance Evaluation.
We present in this section the overall evaluation of the linear regression model and a summary of results: (i) Model performance evaluation: model evaluation is required to assess the significance of the model and to identify whether it fits the data well. Among the performance evaluation measures, the coefficient of determination (R-squared score) is the most used [41,53]. This measure represents the proportion of variance in the dependent variable that can be predicted from the independent variables. A summary of results for the different hypotheses above, based on the presented performance measures and model parameters, is illustrated in Table 7.
For the different previous hypotheses and based on the presented performance evaluation measures, Table 7 shows the following: (i) For H1, H2, and H3, the R-squared values indicate that 38.8%, 47.3%, and 65.8% of the variability of defect density is predicted separately by ISR, FR, and NI, respectively. The obtained R-squared values for these hypotheses are moderate and indicate that using the redundancy metrics as separate independent variables explains the variation in defect density as a dependent variable only moderately. (ii) For H4, the adjusted R-squared shows that ISR and NI jointly predict 73.6% of the variability of defect density. This indicates that, overall, the performed multiple regression can significantly predict defect density, so this multiple regression between the redundancy metrics and defect density is justified.
To sum up, the combined impact of the ISR and NI redundancy metrics is analyzed, and the results show that this multiple regression is justified. Thus, using these metrics jointly further improves the prediction of defect density. The application of multiple linear regression provides a model that reflects the relationship between the redundancy metrics and defect density and can be depicted, based on the beta coefficients presented in Table 7, by the following equation:

defect density = β0 + 0.727 × ISR + 0.462 × NI,

where β0 is the estimated intercept. Using the previous statements, we can note the following: (i) For ISR, the regression coefficient is positive and equal to 0.727. This means that, for each 1-unit increase in the ISR metric, defect density increases by 0.727 units. (ii) For NI, the regression coefficient is positive and equal to 0.462. Consequently, for each 1-unit increase in the NI variable, the defect density variable increases by 0.462 units.
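Applying the fitted model can be sketched as follows; the intercept `beta0` is left as a placeholder parameter, since its value from Table 7 is not reproduced in this text:

```python
def predict_defect_density(isr, ni, beta0=0.0):
    # Fitted multiple regression using the beta coefficients of
    # Table 7: 0.727 for ISR and 0.462 for NI. beta0 stands in for
    # the intercept reported with the model.
    return beta0 + 0.727 * isr + 0.462 * ni

# A 1-unit increase in ISR raises the prediction by 0.727 units,
# holding NI constant.
delta = predict_defect_density(1.0, 0.3) - predict_defect_density(0.0, 0.3)
print(round(delta, 3))  # 0.727
```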

Discussion and Threats to Validity.
Reliability is in general predicted using predictive models that are developed from two basic elements: software metrics and software faults [8,54]. The proposed linear regression model is justified and can be used to predict the defect density of new datasets based on their redundancy measures. This defect prediction model can serve as an early quality indicator for developers and testing teams to manage and control test execution activities. For different code alterations, different values of the ISR and NI redundancy metrics are obtained, and the variation in these values can affect the defect density attribute.
We have obtained promising results, proposing the validated ISR and NI redundancy metrics as significant reliability indicators. However, we have noted several threats to validity. First, the proposed redundancy metrics are semantic, as they depend on the program functionality; each program (function or class) has its state represented by the manipulated variables. Hence, each time the variables used in the program's input state change, the output state changes, and the values of the redundancy metrics change too. Therefore, the computing process described in Section 3 is not automated and must be carried out separately for each program. Second, larger training datasets and optimized model parameters improve prediction performance [55], so our dataset can be extended to enhance the performance of the proposed prediction model. Third, the literature on software metrics validation [7,10,12,13] shows that numerous quality attributes can usually be used to validate a software metric. Hence, we could use other reliability subcharacteristics, like fault-proneness, to show the utility of the redundancy metrics as reliability indicators.

Conclusion and Perspectives
The initial state redundancy, final state redundancy, noninjectivity, and functional redundancy metrics were proposed to assess code redundancy in order to monitor software reliability. However, all of these metrics are manually computed and were only theoretically presented. In this research, we aim at empirically assessing and validating these metrics as significant reliability indicators. We have used the defect density attribute as a direct reflection of software reliability to reach our objective.
We have built an empirical database including a set of Java classes taken from the Commons Math library, the values of all related redundancy metrics, and defect density as a direct reliability indicator. This database has allowed us to empirically assess and validate the redundancy metrics as reliability indicators.
Regression techniques have been used to propose a predictive model based on the defect density attribute as the dependent variable and the initial state redundancy and noninjectivity metrics as independent variables. The proposed model is useful for testers and developers and can be used to predict defect density and to monitor software reliability on further datasets.
As the initial state redundancy metric only measures the program redundancy in its initial and final states, without considering the redundancy of its internal states, we propose, in future work, to improve this metric by considering the internal states in order to reflect the overall program redundancy. In addition, we envision developing an automated support tool for computing the redundancy metrics, in order to improve the performance of the computing process.

A
In this appendix, we present examples of the scripts used to perform the empirical assessment and the validation of the redundancy metrics as useful indicators of software reliability. Figure 3 presents an example of the sizeOfBits function, which identifies the entropy needed for the variables used in the different program (function/class) states.
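As a rough illustration of what such a function computes, here is our own Python approximation (not the script of Figure 3), assuming a variable's entropy is the number of bits needed to encode its possible states:

```python
import math

def size_of_bits(num_states):
    # Shannon entropy (in bits) needed to encode a variable that can
    # take num_states equally likely values: ceil(log2(num_states)).
    if num_states <= 1:
        return 0
    return math.ceil(math.log2(num_states))

# A 32-bit int declared but used only as a flag in {0, 1} carries 1 bit
# of entropy; the unused bits are the redundancy the metrics exploit.
print(size_of_bits(2))      # 1
print(size_of_bits(256))    # 8
print(size_of_bits(2**32))  # 32
```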
In Figure 5, the different redundancy metrics are computed at the method level. The computing process is the same as for the class level; however, at the function level, only one function and its associated input and output variables are considered.

B
This appendix presents the process of mutation (fault) injection. Three main steps were adopted to seed faults into the Java programs. These steps consist of executing three Maven command lines: (1) mvn install: installs the Maven packages into the local repository; various actions are printed, ending with a build success result, as shown in Figure 12.
(2) mvn test: required to compile the test classes, as shown in Figure 13. (3) mvn org.pitest:pitest-maven:mutationCoverage: used to run the mutations; a build success result is obtained, as shown in Figure 14.

Data Availability
The datasets used to perform our empirical research work are available at https://gitlab.com/dalilaamara/redundancymetrics. This link is also included in the manuscript.

Conflicts of Interest
The authors declare that they have no conflicts of interest.