A New Generated Family of Distributions: Statistical Properties and Applications with Real-Life Data

Several standard distributions can be used to model lifetime data. Nevertheless, a number of these datasets from diverse fields such as engineering, finance, the environment, biological sciences, and others may not fit the standard distributions. As a result, there is a need to develop new distributions that incorporate a high degree of skewness and kurtosis while improving the degree of goodness-of-fit in empirical distributions. In this study, by applying the T-X method, we proposed a new flexible generated family, the Ramos-Louzada Generator (RL-G) with some relevant statistical properties such as quantile function, raw moments, incomplete moments, measures of inequality, entropy, mean and median deviations, and the reliability parameter. The RL-G family has the ability to model “right,” “left,” and “symmetric” data as well as different shapes of the hazard function. The maximum likelihood estimation (MLE) method has been used to estimate the parameters of the RL-G. The asymptotic performance of the MLE is assessed by simulation analysis. Finally, the flexibility of the RL-G family is demonstrated through the application of three real complete datasets from rainfall, breaking stress of carbon fibers, and survival times of hypertension patients, and it is evident that the RL-Weibull, which is a special case of the RL-G family, outperformed its submodels and other distributions.


Introduction
Choosing an appropriate statistical distribution for modeling and analyzing data is critical in order to draw more accurate conclusions. Many statistical distributions have been proposed to match different data forms over the years. Using conventional distributions for fitting these datasets may produce erroneous findings. As a result, there is a clear need for modifications to the standard distributions. The literature on probability distribution methods contains various extensions and generalizations of continuous, discrete, symmetric, and asymmetric distributions. Regarding the main methods of generating probability distributions and classes of probability distributions, Lee et al. [1] stated that the transformation technique, differential equation technique, and quantile method are three groups of methods developed prior to 1980, and those proposed after 1980 may be categorized as combination methods because these techniques attempt to develop new distributions through the combination of existing ones or by adding additional parameters to an existing distribution. Several studies have proposed using different generated classes to increase the number of parameters in distributions. The resulting distributions have found application in modeling data across various fields of study, such as environmental sciences, economics, and engineering. Some popular generators available in the literature include the exponentiated generated family by [2], the Marshall-Olkin-G by [3], the Kumaraswamy-G by [4], Beta-G by [5], Weibull-X by [6], Weibull-G by [7], the Lomax generator proposed by [8], the Topp-Leone generated family introduced by [9], the Lindley generator by [10], the Chen-G class by [11], the Burr III Topp-Leone-G by [12], the odd Burr-III family by [13], Marshall-Olkin Burr X family by [14], the Topp-Leone odd Lindley-G family by [15], and many others.
In [16], the Ramos-Louzada (RL) distribution, a one-parameter continuous distribution, was introduced for modeling lifetime data. The study demonstrated that the RL distribution performs better than some well-known lifetime distributions such as Lindley and exponential distributions. However, the RL distribution is limited to right-skewed lifetime data with an increasing failure rate. Therefore, it is essential to propose an extension or generalization of the RL to introduce flexibility in modeling different lifetime data with "symmetric" and "asymmetric" shapes and "monotonic" and "non-monotonic" failure rate functions. The generalized Ramos-Louzada (GRL) distribution produced in [17] is the first extension of the RL distribution. In [18], the discrete RL distribution was developed and proposed. This study adopts the T-X method introduced by [19] to develop the Ramos-Louzada Generator (RL-G), which is capable of producing new distributions that are extensions or generalizations of the RL distribution. Therefore, for any continuous random variable $X$, by applying the T-X method defined in (1), the cumulative distribution function (CDF) of $X$ can be expressed using the RL-G:
$$F(x;\omega) = \int_{c}^{Z[G(x;\epsilon)]} r(t;\theta)\,dt, \tag{1}$$
where $\omega = (\theta, \epsilon)$ is a parameter vector, $r(t;\theta)$ is the PDF of the generator random variable $T$, $Z[G(x;\epsilon)]$ is an expression that depends on the CDF of the random variable $X$, and $c$ is a real number.
The remaining part of the study is structured in the following manner: In Section 2, the CDF, PDF, and hazard rate function of the RL-G family of distributions are presented. Section 3 presents the mixture representation of the RL-G density functions. In Section 4, we have derived the statistical properties of the RL-G family. Parameter estimation for the proposed family of distributions is discussed in Section 5. Some special distributions of the RL-G family are discussed in Section 6. Section 7 presents Monte Carlo simulation analysis on the asymptotic performance of the MLE. Applications of the proposed distribution to three real datasets to demonstrate its flexibility and usefulness are captured in Section 8, while Section 9 presents the conclusion of the study.

The Ramos-Louzada Generated Family of Distributions
Let equations (2) and (3) denote the CDF and PDF of the RL distribution, respectively, let $G(x;\epsilon)$ be the CDF of the baseline distribution, and let $\epsilon$ be the vector of parameters associated with that CDF. The proposed RL-G densities are obtained by using the T-X approach in (1) with $Z[G(x;\epsilon)] = -\log[1 - G(x;\epsilon)]$, $x > 0$. Substituting (2) and (3) into this relation yields the CDF of the proposed RL-G family, equation (7). The RL-G family PDF, equation (8), is the derivative of (7) with respect to $x$, and the survival and hazard functions follow from (7) and (8) as $S(x) = 1 - F(x)$ and $h(x) = f(x)/S(x)$.
Some basic motivations for using RL-G densities are as follows:
(i) The properties of the baseline densities are enhanced
(ii) An extended form of the baseline model is generated with the introduction of extra parameter(s).
The proof of this proposition is shown in the appendix.
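The functions described in this section can be sketched explicitly under one assumption that the text here does not restate: that the RL distribution of [16] has the CDF and PDF written in the first two lines below, with $\theta \ge 2$. The shorthand $\bar G(x) = 1 - G(x;\epsilon)$ is introduced only for this sketch. With $c = 0$ and $Z = -\log \bar G(x)$ in (1), the remaining lines follow by substitution and differentiation, so they should be read as a consistency check and not as the authors' typeset equations.

```latex
% Assumed RL distribution, t > 0, \theta \ge 2
F_{RL}(t;\theta) = 1 - \frac{\theta^{2}-\theta+t}{\theta(\theta-1)}\,e^{-t/\theta},
\qquad
f_{RL}(t;\theta) = \frac{\theta^{2}-2\theta+t}{\theta^{2}(\theta-1)}\,e^{-t/\theta}.

% RL-G family obtained with Z = -\log \bar G(x)
F_{RL-G}(x;\theta,\epsilon)
  = 1 - \frac{\theta^{2}-\theta-\log \bar G(x)}{\theta(\theta-1)}\,\bar G(x)^{1/\theta},

f_{RL-G}(x;\theta,\epsilon)
  = g(x;\epsilon)\,\frac{\theta^{2}-2\theta-\log \bar G(x)}{\theta^{2}(\theta-1)}\,
    \bar G(x)^{1/\theta-1},

S(x) = \frac{\theta^{2}-\theta-\log \bar G(x)}{\theta(\theta-1)}\,\bar G(x)^{1/\theta},
\qquad
h(x) = \frac{g(x;\epsilon)\,\bigl[\theta^{2}-2\theta-\log \bar G(x)\bigr]}
            {\theta\,\bar G(x)\,\bigl[\theta^{2}-\theta-\log \bar G(x)\bigr]}.
```

Because $F_{RL-G}$ is the RL CDF evaluated at $-\log \bar G(x)$, it increases from 0 to 1 as $G$ does, which is the content of the validity proposition.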

RL-G Family in Mixture Representation
This form of representation plays a very important role in deriving some statistical properties of the RL-G densities.
The generalized binomial series (valid for $|z| < 1$, with $t > 0$ a real noninteger exponent) and the power series expansion of the logarithm are applied to (8). The PDF of the RL-G family in (8) then takes the linear form (13). Thus, (13) represents an infinite linear combination of exp-G densities of the baseline density. The linear representation form of the RL-G facilitates the derivation of other statistical properties of the RL-G density. Integrating (13) with respect to $x$ produces the corresponding linear representation form of the CDF of the RL-G family.
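One way to see how such a linear form arises is sketched below. It assumes the RL-G density $f = g\,[\theta^{2}-2\theta-\log(1-G)]\,(1-G)^{1/\theta-1}/[\theta^{2}(\theta-1)]$ and writes $h_{k}(x) = k\,g(x;\epsilon)\,G(x;\epsilon)^{k-1}$ for the exp-G density with power parameter $k$. The constants $\eta_a$ and $\kappa_{a,b}$ below are derived for this sketch and may be arranged differently from those in (13).

```latex
% Series used, for |z| < 1
(1-z)^{s} = \sum_{a=0}^{\infty} (-1)^{a}\binom{s}{a} z^{a},
\qquad
-\log(1-z) = \sum_{b=0}^{\infty} \frac{z^{b+1}}{b+1}.

% Applying both with z = G(x;\epsilon) and s = 1/\theta - 1
f_{RL-G}(x;\theta,\epsilon)
  = \sum_{a=0}^{\infty} \eta_{a}\, h_{a+1}(x)
  + \sum_{a=0}^{\infty}\sum_{b=0}^{\infty} \kappa_{a,b}\, h_{a+b+2}(x),

\eta_{a} = \frac{(-1)^{a}\,(\theta-2)}{\theta(\theta-1)(a+1)}\binom{1/\theta-1}{a},
\qquad
\kappa_{a,b} = \frac{(-1)^{a}}{\theta^{2}(\theta-1)(b+1)(a+b+2)}\binom{1/\theta-1}{a}.
```

The factors $(a+1)$ and $(a+b+2)$ in the denominators come from normalizing $g\,G^{a}$ and $g\,G^{a+b+1}$ into exp-G densities.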

Some Relevant Statistical Properties of the RL-G Family
In this section, we have derived some relevant statistical properties of the RL-G family. These include the quantile function, the raw (noncentral) moments, measures of inequality, the entropy measure, mean and median deviations, and the reliability parameter.
4.1. The Quantile Function. By definition, the quantile function $Q(p)$ of the RL-G family solves $F(Q(p)) = p$ for $0 < p < 1$, with $F$ the CDF in (7). The solution is expressed through the negative branch of the Lambert function, denoted by $W_{-1}$; see [20].
Writing $y = 1 - G(Q(p);\epsilon)$ for that solution, the RL-G family quantile function is represented as $Q(p) = G^{-1}(1 - y)$, where $G^{-1}$ denotes the inverse of the baseline distribution $G(x;\epsilon)$.
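The inversion can be illustrated with a short Python sketch. It assumes the RL-G CDF $F = 1 - [\theta^{2}-\theta-\log(1-G)]\,(1-G)^{1/\theta}/[\theta(\theta-1)]$ with $\theta \ge 2$, for which solving $F = p$ gives $-\log y = -\theta\,W_{-1}\bigl((p-1)(\theta-1)e^{1-\theta}\bigr) - \theta^{2} + \theta$. The function names are hypothetical, the Lambert branch is computed by plain bisection instead of a library routine, and the Weibull baseline is used only as an example.

```python
import math

def lambert_w_minus1(y, iters=200):
    """Negative branch W_{-1}(y) for -1/e <= y < 0, found by bisection.

    On (-inf, -1] the map w -> w*exp(w) decreases from 0 to -1/e,
    so the root of w*exp(w) = y is bracketed by [-745, -1].
    """
    lo, hi = -745.0, -1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(mid) > y:
            lo = mid   # mid lies to the left of the root
        else:
            hi = mid   # mid lies at or to the right of the root
    return 0.5 * (lo + hi)

def rlg_quantile(p, theta, baseline_inv):
    """Quantile of the assumed RL-G CDF: Q(p) = G^{-1}(1 - y)."""
    arg = (p - 1.0) * (theta - 1.0) * math.exp(1.0 - theta)
    z = -theta * lambert_w_minus1(arg) - theta**2 + theta  # z = -log(y)
    z = max(z, 0.0)                                        # guard against round-off
    return baseline_inv(-math.expm1(-z))                   # 1 - y = 1 - exp(-z)

def weibull_inv(u, alpha, beta):
    """Inverse of the Weibull CDF G(x) = 1 - exp(-alpha * x**beta)."""
    return (-math.log1p(-u) / alpha) ** (1.0 / beta)

def rlw_cdf(x, theta, alpha, beta):
    """Assumed RLW CDF, used here only to check the round trip."""
    z = alpha * x**beta
    return 1.0 - (theta**2 - theta + z) / (theta * (theta - 1.0)) * math.exp(-z / theta)

theta, alpha, beta = 2.1, 2.6, 1.5
for p in (0.1, 0.5, 0.9):
    x = rlg_quantile(p, theta, lambda u: weibull_inv(u, alpha, beta))
    print(p, x, rlw_cdf(x, theta, alpha, beta))  # last column should reproduce p
```

Feeding uniform random numbers through `rlg_quantile` gives an inverse-transform sampler, an alternative to root-finding on the CDF.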

4.2. Moments.
In statistical analysis, the kurtosis, mean, skewness, and variance are measures that can be computed using the noncentral moments of a distribution.
If $X \sim \text{RL-G}$, then the $r$th moment is defined as $\mu_r' = E(X^r) = \int_0^\infty x^r f_{RL-G}(x;\theta,\epsilon)\,dx$. Substituting (13) into this definition and simplifying gives the $r$th noncentral moment as the series in (18). Alternatively, (18) can be expressed in terms of the baseline quantile function $Q_G(z) = G^{-1}(z;\epsilon)$: suppose $G(x;\epsilon) = z$ in (18); then $x = Q_G(z)$ and $dz = g(x;\epsilon)\,dx$, so that each integral over $x$ becomes an integral over $0 < z < 1$.
4.3. Incomplete Moment. This statistical property plays an essential role in the computation of the mean and median deviations, inequality and entropy measures, and residual life of a random variable.
The RL-G family $r$th incomplete moment is defined as $m_r(t) = \int_0^t x^r f_{RL-G}(x;\theta,\epsilon)\,dx$. Setting (13) into the definition yields (22), a series in $\zeta_{r,a} = \int_0^t x^r g(x;\epsilon)\,G(x;\epsilon)^{a}\,dx$ and $\zeta_{r,a+b+1} = \int_0^t x^r g(x;\epsilon)\,G(x;\epsilon)^{a+b+1}\,dx$, with $\eta_a^{*}$ and $\kappa_{a,b}^{*}$ as defined in (18). Alternatively, the $r$th incomplete moment is expressed in terms of the baseline quantile function. Suppose $G(x;\epsilon) = z$ in (22); then $x = Q_G(z)$ and $dz = g(x;\epsilon)\,dx$. As $x \to 0$, $z \to 0$, and as $x \to t$, $z \to G(t;\epsilon)$, so the integrals in (22) run over $0 < z < G(t;\epsilon)$, with $\eta_a^{*}$ and $\kappa_{a,b}^{*}$ as defined before.
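4.4. Measures of Inequality. The Bonferroni and Lorenz curves are two of the most commonly used measures of inequality that are applied in various fields such as insurance, demography, reliability engineering, and economics. By definition, the Lorenz curve $L_F(x)$ of the RL-G family is
$$L_F(x) = \frac{1}{E(X)} \int_0^t x\,f_{RL-G}(x;\theta,\epsilon)\,dx,$$
where $E(X)$ is the mean and $\int_0^t x\,f_{RL-G}(x;\theta,\epsilon)\,dx$ is the first incomplete moment of the RL-G family, obtained by setting $r = 1$ in the incomplete moment's expression. Substituting that expression into the definition gives the series form of $L_F(x)$.
4.5. Mean and Median Deviations. The mean deviation about the mean $\mu$, denoted by $\pi_1$, for an RL-G random variable is defined by
$$\pi_1 = \int_0^\infty |x - \mu|\,f_{RL-G}(x;\theta,\epsilon)\,dx = 2\mu F(\mu) - 2\int_0^\mu x\,f_{RL-G}(x;\theta,\epsilon)\,dx, \tag{27}$$
where $\int_0^\mu x\,f_{RL-G}(x;\theta,\epsilon)\,dx$ is the first incomplete moment evaluated at $\mu$. Hence, the mean deviation about $\mu$ follows by substituting the series form of the first incomplete moment, with $\eta_a^{*}$ and $\kappa_{a,b}^{*}$ as defined before. The median deviation about the median $M$, denoted by $\pi_2$, for an RL-G random variable is defined by
$$\pi_2 = \int_0^\infty |x - M|\,f_{RL-G}(x;\theta,\epsilon)\,dx = \mu - 2\int_0^M x\,f_{RL-G}(x;\theta,\epsilon)\,dx.$$
Hence, the median deviation about the median is expressed in the same series form, with $\eta_a^{*}$ and $\kappa_{a,b}^{*}$ as defined before.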
The RL-G family has the Rényi entropy denoted by $I_R(\gamma)$, defined by $I_R(\gamma) = \frac{1}{1-\gamma}\log\int_0^\infty f_{RL-G}(x;\theta,\epsilon)^{\gamma}\,dx$, where $\gamma > 0$, $\gamma \ne 1$.
Substituting the density of the RL-G into the above definition and applying the generalized binomial expansion, followed by a power series expansion for powers of the logarithm whose constants $P_{k,l}$ are obtained recursively, gives the Rényi entropy in series form after simplification. The RL-G family Shannon entropy is defined by $I_X = E[-\log f_{RL-G}(X;\theta,\epsilon)]$. By setting (13) into this definition, $I_X$ is obtained in terms of $\eta_a$, $\kappa_{a,b}$, $h_{a+1}(x)$, and $h_{a+b+2}(x)$, which are defined in (13).
The Havrda and Charvat entropy for the RL-G family, with $\gamma > 0$ and $\gamma \ne 1$, involves the same integral $\int_0^\infty f_{RL-G}(x;\theta,\epsilon)^{\gamma}\,dx$ as the Rényi entropy and can therefore be expressed through the same series. Tsallis's generalized entropy for an RL-G random variable, again with $\gamma > 0$ and $\gamma \ne 1$, is obtained in the same way.
For the reliability parameter, two RL-G random variables, one of which is denoted $X_1$, play the roles of strength and stress. The stress-strength reliability parameter of the RL-G family of distributions is defined as the probability that the strength exceeds the stress, and its simplified integral form is given in (42).
Evaluating each of the integrals uses the generalized binomial series expansion and the result of [25] for a power series raised to a positive integer $n$, given in (43), where $b_{2,0} = a_0^{2}$ and the remaining coefficients are obtained recursively. The reliability parameter is then expressed as a series after simplification.
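The recursive step for a power series raised to a positive integer can be illustrated with the standard identity $\bigl(\sum_k a_k x^k\bigr)^n = \sum_m c_m x^m$, with $c_0 = a_0^{n}$ and $c_m = \frac{1}{m a_0}\sum_{k=1}^{m}(kn - m + k)\,a_k\,c_{m-k}$. That this is exactly the form and indexing used in [25] is an assumption, and the function name below is hypothetical.

```python
def power_series_power(a, n, terms):
    """Coefficients c_0..c_{terms-1} of (sum_k a[k] x^k)**n, assuming a[0] != 0.

    Uses c_0 = a_0**n and
    c_m = (1 / (m * a_0)) * sum_{k=1..m} (k*n - m + k) * a_k * c_{m-k}.
    """
    c = [a[0] ** n]
    for m in range(1, terms):
        s = 0.0
        for k in range(1, m + 1):
            a_k = a[k] if k < len(a) else 0.0  # coefficients beyond the list are zero
            s += (k * n - m + k) * a_k * c[m - k]
        c.append(s / (m * a[0]))
    return c

# (1 + 2x + 3x^2)^2 = 1 + 4x + 10x^2 + 12x^3 + 9x^4
print(power_series_power([1.0, 2.0, 3.0], 2, 5))
```

For $n = 2$ the first coefficient is $a_0^{2}$, matching the constant $b_{2,0}$ in the text.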

Maximum Likelihood Estimation of the RL-G Family
Suppose $x_1, x_2, \cdots, x_n$ is a random sample of size $n$ from the RL-G family; then the log-likelihood function for the parameter vector $(\theta, \epsilon)$ is given by $\ell(\theta,\epsilon) = \sum_{i=1}^{n} \log f_{RL-G}(x_i;\theta,\epsilon)$. Taking derivatives of $\ell$ with respect to $\theta$ and $\epsilon$ gives the score equations. These equations are set to zero and solved simultaneously by numerical techniques to obtain the maximum likelihood estimates.
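Written out, the log-likelihood and the score for $\theta$ take the following form, assuming the RL-G density $f = g\,[\theta^{2}-2\theta-\log \bar G]\,\bar G^{1/\theta-1}/[\theta^{2}(\theta-1)]$ with the shorthand $\bar G(x) = 1 - G(x;\epsilon)$ introduced for this sketch. The score for $\epsilon$ depends on the chosen baseline and is left in generic form.

```latex
\ell(\theta,\epsilon)
  = \sum_{i=1}^{n} \log g(x_i;\epsilon)
  + \sum_{i=1}^{n} \log\bigl[\theta^{2}-2\theta-\log \bar G(x_i)\bigr]
  + \Bigl(\frac{1}{\theta}-1\Bigr)\sum_{i=1}^{n} \log \bar G(x_i)
  - 2n\log\theta - n\log(\theta-1),

\frac{\partial \ell}{\partial \theta}
  = \sum_{i=1}^{n} \frac{2\theta-2}{\theta^{2}-2\theta-\log \bar G(x_i)}
  - \frac{1}{\theta^{2}}\sum_{i=1}^{n} \log \bar G(x_i)
  - \frac{2n}{\theta} - \frac{n}{\theta-1},

\frac{\partial \ell}{\partial \epsilon}
  = \sum_{i=1}^{n} \frac{\partial}{\partial \epsilon}\log f_{RL-G}(x_i;\theta,\epsilon).
```

Neither equation has a closed-form root in general, which is why a numerical optimizer is needed.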

Special Distributions of the RL-G Family
In this section, two special members of the RL-G family, the Ramos-Louzada Weibull (RLW) distribution and the Ramos-Louzada Kumaraswamy (RLKum) distribution, are derived, and the flexibility of these distributions is illustrated by displaying plots of their hazard rate and density functions at some parameter values. Simulation analysis and applications to real datasets of the RLW distribution are studied in later sections.
(1) Assume the baseline distribution is the Weibull, whose CDF and PDF are, respectively, given by $G(x;\alpha,\beta) = 1 - e^{-\alpha x^{\beta}}$ and $g(x;\alpha,\beta) = \alpha\beta x^{\beta-1} e^{-\alpha x^{\beta}}$, $x > 0$, $\alpha > 0$, $\beta > 0$, where $\alpha$ and $\beta$ are scale and shape parameters, respectively. The Ramos-Louzada Weibull (RLW) distribution is obtained by substituting $G(x;\alpha,\beta)$ and $g(x;\alpha,\beta)$ into equations (7) and (8), which gives the CDF and PDF of the new RLW distribution; the hazard rate function is the ratio of the PDF to the survival function. Figures 1 and 2, respectively, display the plots of the PDF and hazard rate function of the RLW distribution with various selections of parameter values. From Figure 1, the RLW distribution can take several forms, such as "left-skewed," almost "symmetric," "reversed J-shapes," and "right-skewed," and plots of the hazard rate function in Figure 2 illustrate various forms, such as "increasing," "decreasing," "J-shape," and "reversed J-shape." Submodels of the RLW distribution are as follows:
(i) When $\alpha = \beta = 1$, we have the RL distribution of [16].
(ii) When $\alpha = 1$, the Generalized RL (GRL) distribution proposed by [17] is obtained.
(iii) When $\beta = 1$, we obtain the Ramos-Louzada Exponential (RLE) distribution.
(iv) When $\beta = 2$, we obtain the Ramos-Louzada Rayleigh (RLR) distribution.
(2) Suppose that the baseline is the Kumaraswamy distribution, whose CDF and PDF are defined as $G(x) = 1 - (1 - x^{\beta})^{\alpha}$ and $g(x) = \alpha\beta x^{\beta-1}(1 - x^{\beta})^{\alpha-1}$, $0 < x < 1$, $\alpha > 0$, $\beta > 0$, where $\alpha$ and $\beta$ are shape parameters. Equations (53), (54), and (55), respectively, express the CDF, PDF, and failure rate function of the RLKum distribution. Figures 3 and 4, respectively, display plots of the RLKum PDF and hazard rate function at various selections of parameter values. From Figure 3, the RLKum distribution can take various forms, such as a "reversed J-shape," a "left-skewed" distribution, or a "J-shape." The hazard rate function plots in Figure 4 illustrate various shapes such as "decreasing," "increasing," "J-shape," "bathtub," and "inverted bathtub." Thus, the RLKum distribution is capable of modeling data with "non-monotonic" and "monotonic" hazard rate functions.
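The RLW functions can be sketched in Python by substituting the Weibull baseline into the RL-G family. The forms below assume the RL-G density $f = g\,[\theta^{2}-2\theta-\log(1-G)]\,(1-G)^{1/\theta-1}/[\theta^{2}(\theta-1)]$ with $\theta \ge 2$, so that $-\log(1-G) = \alpha x^{\beta}$. The function names are hypothetical, and the midpoint-rule integral is only a numerical sanity check of the validity proposition for one parameter choice.

```python
import math

def rlw_survival(x, theta, alpha, beta):
    """Assumed RLW survival function S(x) = 1 - F(x)."""
    z = alpha * x**beta  # z = -log(1 - G(x)) for the Weibull baseline
    return (theta**2 - theta + z) / (theta * (theta - 1.0)) * math.exp(-z / theta)

def rlw_pdf(x, theta, alpha, beta):
    """Assumed RLW density."""
    z = alpha * x**beta
    weibull_hazard = alpha * beta * x**(beta - 1.0)  # g(x) / (1 - G(x))
    return (weibull_hazard * (theta**2 - 2.0 * theta + z)
            / (theta**2 * (theta - 1.0)) * math.exp(-z / theta))

def rlw_hazard(x, theta, alpha, beta):
    """Hazard rate as the ratio of the density to the survival function."""
    return rlw_pdf(x, theta, alpha, beta) / rlw_survival(x, theta, alpha, beta)

def integrate_pdf(theta, alpha, beta, upper=12.0, steps=100_000):
    """Midpoint-rule check that the density integrates to about one."""
    h = upper / steps
    return h * sum(rlw_pdf((i + 0.5) * h, theta, alpha, beta) for i in range(steps))

print(integrate_pdf(2.1, 2.6, 1.5))        # should be close to 1
print(rlw_hazard(1.0, 2.1, 2.6, 1.5))      # hazard at x = 1
```

Setting `alpha = beta = 1` reduces `rlw_pdf` to the RL density, and `alpha = 1` gives the GRL submodel.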

Monte Carlo Simulation
In this section, a simulation analysis with sample sizes n = 50, 150, 200, 500, 800, and 1000 was performed to evaluate the properties of the ML estimators for the RLW distribution parameters by examining the average estimates (AV), the average bias (AB), and the root mean square error (RMSE) of the estimated parameters. The analysis was repeated N = 1500 times, with initial parameter values: (I) α = 2.6, β = 1.5, and θ = 2.1; and (II) α = 1.6, β = 0.5, and θ = 2.1. Random numbers are generated by solving the CDF of the RLW with the uniroot function in R software, and the estimates are obtained with the optim function in the same software. The AB, RMSE, and AV were computed over the N replications for each parameter ω ∈ {α, β, θ}. Table 1 displays the simulated results of AB, RMSE, and AV for the parameter values of the RLW distribution. It can be observed that, in all cases, the AB and RMSE decrease toward zero with increasing sample size. Furthermore, the AV of the estimators is quite close to the actual values. Hence, the maximum likelihood estimators and their asymptotic results perform well in estimating the RLW parameters. Alternative parameter choices can yield similar results.
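The three summaries can be sketched in Python as follows. The definitions used here are the common ones, AV as the mean of the estimates, AB as the mean signed deviation from the true value, and RMSE as the square root of the mean squared deviation; whether the study used exactly these forms (for instance a signed or an absolute bias) is an assumption, and the function name and the numbers in the example are hypothetical.

```python
import math

def simulation_summary(estimates, true_value):
    """Return (AV, AB, RMSE) of one parameter over N replications."""
    n = len(estimates)
    av = sum(estimates) / n                                    # average estimate
    ab = sum(e - true_value for e in estimates) / n            # average bias
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    return av, ab, rmse

# Made-up estimates of theta from four replications, true value 2.1
print(simulation_summary([2.0, 2.3, 2.1, 2.2], 2.1))
```

In a full study this function would be called once per parameter and per sample size, with the estimates returned by the optimizer.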

Application
Application of the RLW distribution to three datasets is demonstrated in this section. The goodness-of-fit statistics, namely the Cramér-von Mises (CVM), Anderson-Darling (AD), and Kolmogorov-Smirnov (KS) statistics, and model selection criteria such as the Bayesian information criterion (BIC), consistent Akaike information criterion (CAIC), and Akaike information criterion (AIC) of the RLW distribution, its nested models, and some other competing distributions were compared. In the first two applications, the RLW distribution was compared with its submodels, Nakagami (NAK) by [26], inverse Weibull (INW) by [27], Nadarajah and Haghighi (NH) by [28], and modified extended Chen (MEC) by [29]. In the third application, the following nonnested models were used: Marshall-Olkin exponential (MOEx) by [3], generalized exponential (GE) by [30], and generalized inverse Weibull (GIW) by [31].
The CDFs of the nonnested models, including (i) the generalized inverse Weibull, are given in the respective references.
8.1. Dataset 1: Rainfall Data. The data are the highest annual average monthly rainfall (in inches) recorded in Ghana's Ashanti region between 1989 and 2019. The dataset can be found in [32]. The dataset contains the following: 10.508, 7.614, 12.165, 11.201, 8.988, 8.594, 10.961, 8.350, 9.882, 11.720, 10.272, 9.311, 8.854, 9.819, and 11.863. A graphical representation of the hazard behavior of the dataset is displayed in Figure 5. The total time on test (TTT) plot indicates that the data have an increasing hazard rate.
Table 2 shows the ML estimates, standard errors, and p values of the parameters of the fitted distributions for the rainfall data. Two parameters of the RLW are statistically significant at the 5% significance level, except for the INW. The GRL, RL, NAK, and NH have all their estimated parameters statistically significant at the 5% significance level.
From Table 3, a better fit is provided by the RLW to the rainfall data compared to its submodels and the nonnested models because it has the maximum value of log-likelihood ℓ and the smallest CVM, AD, AIC, CAIC, and BIC.A close competitive model to the RLW is the NAK.
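The model selection criteria can be computed from a fitted model's maximized log-likelihood, its number of parameters k, and the sample size n. The sketch below uses the usual AIC and BIC formulas and takes CAIC to be the consistent AIC, $-2\ell + k(\log n + 1)$; whether the study used this form or the small-sample corrected AIC is an assumption, and the numbers in the example are made up.

```python
import math

def information_criteria(loglik, k, n):
    """AIC, BIC, and consistent AIC (assumed CAIC form) for one fitted model."""
    aic = -2.0 * loglik + 2.0 * k
    bic = -2.0 * loglik + k * math.log(n)
    caic = -2.0 * loglik + k * (math.log(n) + 1.0)
    return {"AIC": aic, "BIC": bic, "CAIC": caic}

# Hypothetical three-parameter fit to a sample of size 31
print(information_criteria(loglik=-50.0, k=3, n=31))
```

Smaller values indicate a better trade-off between fit and complexity, which is the sense in which the criteria are compared across models.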
From the likelihood ratio test (LRT) results in Table 4, significant differences exist between the RLW and its submodels, since their LRT statistic values are, respectively, greater than the critical values at the 5% level of significance. The graphs of the fitted PDFs versus the histogram of the data are displayed in Figure 6, and the fitted CDFs versus the empirical CDF are displayed in Figure 7. The fitted RLW curves follow the empirical density and CDF of the maximum annual rainfall data more closely than the other models.
8.2. Dataset 2: Hypertension Data. This dataset shows the survival periods in years before the development of hypertension for 119 patients randomly selected from the Bolgatanga Regional Hospital in Ghana's Upper East region. The dataset is in [33], and it has the following items: 71, 5, 39, 62, 52, 71, 38, 56, 35, 69, 34, 71, 66, 70, 52, 37, 35, 71, 73, 19, 74, 74, 75, 51, 76, 49, 19, 76, 78, 76, 76, 49, 47, 48, 48, 46, 46, 46, 41, 40, 43, 45, 47, 47, 44, 45, 46, 42, 43, 42, 20. Figure 8 shows a graphic depiction of the dataset using the hazard function. The RLW distribution can therefore be used to represent the data because the TTT plot shows that the hazard rate is increasing.
The parameter estimates for the fitted models for the hypertension data are shown in Table 5, along with their standard errors and p values.In addition, the estimated parameters for the GRL, RL, and NH models are all statistically significant at the 5% level of significance.Two other estimated parameters of the RLW are also significant at this level.Based on Table 6, the RLW distribution offers a better match to the hypertension data compared to its nested models and the other distributions since it has the least CVM, AD, KS, AIC, and CAIC as well as the highest log-likelihood value.
It is clear from the likelihood ratio test (LRT) results in Table 7 that there are significant differences between the RLW and its submodels, as each LRT statistic exceeds the critical value at the 5% level of significance.
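The comparison behind such a table can be sketched as follows: the LRT statistic is twice the difference between the maximized log-likelihoods of the full model and the nested submodel, and it is compared with the 95% quantile of a chi-square distribution whose degrees of freedom equal the number of restricted parameters. The function name is hypothetical and the log-likelihood values in the example are made up; the critical values are the standard chi-square 5% points.

```python
# Upper 5% critical values of the chi-square distribution by degrees of freedom
CHI2_CRITICAL_5PCT = {1: 3.841, 2: 5.991, 3: 7.815}

def likelihood_ratio_test(loglik_full, loglik_reduced, df):
    """Return the LRT statistic and whether the submodel is rejected at 5%."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, stat > CHI2_CRITICAL_5PCT[df]

# Made-up values: three-parameter full model against a two-parameter submodel
print(likelihood_ratio_test(-100.0, -103.0, df=1))
```

For the RLW with parameters θ, α, and β, a submodel that fixes one parameter has one degree of freedom, and the one-parameter RL submodel has two.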
Figures 9 and 10, respectively, show the fitted PDFs against the data's histogram and the fitted CDFs against the empirical CDF. The fitted RLW curves represent the empirical density and CDF of the hypertension data more accurately than the other models.
For the carbon fiber data, a graphical representation of the dataset using the hazard function is displayed in Figure 11. The TTT plot indicates that the data have an increasing hazard rate and hence can be modeled using the RLW distribution and the other competing models.
Table 8 exhibits the maximum likelihood estimates, standard errors, and p values of the parameters of the fitted models for the carbon fiber data.The estimated parameters of the RLW and the other distributions are significant at the 5% level of significance, except for one parameter of the INW.From Table 9, the RLW distribution provides a better fit to the carbon fiber data compared to its nested models and the other distributions because it has the smallest CVM, AD, KS, AIC, and CAIC and the greatest value of the log-likelihood.
The LRT results of Table 10 indicate that there is no significant difference between the RLW and the GRL distribution since the LRT statistics values are less than the critical values at the 5% level of significance.On the other hand, there is a significant difference between the RLW and the RLRa distribution.
The histogram of the data against the PDFs of the fitted models and the fitted CDFs vs. the empirical carbon fiber data are, respectively, exhibited in Figures 12 and 13.It is observed from the plots that the RLW densities depict the empirical density and CDF of the carbon fiber data more closely than the other models.

Conclusion
The lack of a family of distributions based on the RL distribution in the scientific literature served as motivation for our study. In this paper, a new family of distributions, the RL-G family, with its statistical properties such as the quantile function, the raw moments, the incomplete moments, measures of inequality, entropy, mean and median deviations, and the reliability parameter, is studied. The parameters of the proposed generator were estimated using the ML estimation method. The RLW and the RLKum are two special members of the RL-G family. The outcome of the simulation analysis indicates that the ML estimation method and its asymptotic properties performed quite well. Applications of the RLW from the RL-G family were carried out on three complete real datasets, and it is evident that the RLW outperformed its submodels and other distributions. As a result, the newly suggested family of distributions has a broader range of applications in a variety of areas. Despite the RL-G model's advantages, such as flexibility, generalization of the RL distribution, and the ability to provide superior fits to the datasets in comparison to the other models considered, it cannot be employed for discrete datasets, and the expressions of its estimators are difficult to reduce to a simple closed form.

(iii) The kurtosis of the resulting distributions is more flexible compared to the baseline model
(iv) Special models with various forms of the hazard rate function are defined
Proposition 1. The RL-G family is a valid PDF, for which it suffices that $\int_0^\infty f_{RL-G}(x;\theta,\epsilon)\,dx = 1$.


Figure 2: Plots of the hazard rate function of the RLW distribution for different parameter values.

Figure 4: Plots of the RLKum hazard rate function for arbitrary parameter values.

Figure 5: Plots of the TTT transform for the rainfall data.

Figure 9: Fitted PDFs vs. histogram of the hypertension data.

Figure 12: Fitted PDFs vs. histogram of the carbon fiber data.

Figure 13: Empirical vs. fitted CDFs of the carbon fiber data.

Table 1: Simulated results of AB, RMSE, and AV for RLW distribution.

Table 2: ML estimates, p values, and standard errors of parameters for the rainfall data.

Table 3: Information criteria and goodness-of-fit statistics.

Table 4: LRT statistics for the rainfall data.

Table 5: ML estimates, p values, and standard errors of parameters for the hypertension data.

Table 6: Information criteria and goodness-of-fit statistics.

Table 7: LRT statistics for the hypertension data.

Table 8: ML estimates, standard errors, and p values of parameters for the carbon fiber data.

Table 10: Likelihood ratio test statistics for the carbon fiber data.