Generalized Pareto Model: Properties and Applications in Neutrosophic Data Modeling

The Pareto distribution is widely used to model industrial, biological, engineering, and various other types of data. A new generalized model, namely the neutrosophic Pareto distribution (NPD), is developed in this article. The proposed model is a neutrosophic variant of the classical Pareto distribution, potentially useful for analyzing vague, unclear, indeterminate, or imprecise data. The proposed distribution is unimodal and skewed to the right. Several characteristics of the NPD are investigated under the neutrosophic framework. Expressions for basic properties such as the mean, variance, raw moments, and shape coefficients are obtained. The maximum likelihood approach is presented for estimating the imprecise distributional parameters of the proposed model. The extended notions of the NPD are explained with various key functions used in applied statistical methods. Finally, the practical benefits of the NPD are demonstrated by analyzing two real datasets.


Introduction
The distribution of extreme values for natural phenomena (e.g., earthquakes, winds, floods, waves, and temperatures) is relevant in a wide variety of practical situations. For instance, the distribution of high waves in the sea and the distribution of large floods at dams are important when designing the corresponding structures. Extreme value theory has grown rapidly in popularity in recent years as a result of this interest [1]. In the twenty-first century, there has been substantially increased interest in safety and in reducing losses from man-induced and natural disasters [2]. The combination of strongly felt social needs and the emergence of new theoretical methods has resulted in tremendous progress in this essential multidisciplinary field of research [3]. Note that the study of the statistical characteristics of various natural catastrophes is required not only for understanding the physical nature of the underlying processes but also for risk assessment [4]. Edwards and Das [5] provide a comprehensive list of major domains where heavy-tailed distributions have been shown to be useful. The Pareto distribution is one of the heavy-tailed distributions frequently encountered in physical systems to describe different natural disasters (e.g., volcanic eruptions, earthquakes, hurricanes, floods) [6]. The observed fluctuations in sea level, river flow discharges, asteroid craters, wind velocity, forest fire distributions, and some other natural disasters obey the Pareto power law and support the potential of a successful Pareto fitting model [7]. The Pareto distribution and its different variants are also especially well known in the literature for their ability to describe heavy-tailed data, which are commonly found in wealth distribution, actuarial science, life testing, finance, economics, engineering, and survival analysis [8]. A large range of socioeconomic variables have heavy-tailed distributions that are reasonably well fitted by the Pareto model [9].
The shape of income distributions is governed by some underlying law [10]. Rootzen et al. [11] also listed a number of additional quantities that have been measured in diverse physical, biological, technical, and social systems and for which the Pareto rule has proven to be a good fit. In short, several studies have asserted that the Pareto model is versatile in modeling many forms of data with heavy tails. A traditional method for analyzing extreme values in a population is based on a precisely characterized extreme value Pareto distribution. The customary Pareto approach is appropriate when the data consist of exact values and the distributional parameters are exactly defined [12]. However, this strategy has been criticized because treating data as exact loses information contained in the data. Measurements of quantitative variables always have a certain range of inaccuracy [13]. Apart from continuous measurements, there is an abundance of situations where exact reporting is impossible due to the irregular nature of the circumstances. For example, due to the fluctuating nature of water, the depth of an ocean cannot be exactly quantified but can only be approximated. This issue is remedied by using fuzzy and neutrosophic statistics rather than conventional statistical approaches [14]. The neutrosophic approach is a broader idea that combines the fuzzy set concept with the notion of a classical set [15]. The neutrosophic philosophy takes into account the presence of truth, falsity, and indeterminacy [16]. The notion of neutrosophy is now being utilized in a variety of application areas [17][18][19]. The obtained data may be unclear in a number of real-world scenarios. Several researchers have turned to neutrosophic philosophy to address the issues of incomplete data [20][21][22]. In the domain of neutrosophic statistics, conventional statistical approaches have been extended to treat ambiguous data rigorously.
New application areas for distribution theory are emerging and demanding further attention. The literature on statistical distributions is dense with strategies for generalizing continuous distributions in order to improve their ability to describe a variety of datasets.
This study presents the NPD within the neutrosophic framework, thereby enhancing the model's flexibility in dealing with uncertain datasets from a variety of real-world circumstances. This work aims to investigate the use and implementation of the NPD in healthcare data analysis and to demonstrate the practical advantages of the suggested model. The remainder of this work is structured as follows: Section 2 describes the proposed model and its key characteristics. Simulation studies, including the quantile function of the NPD, are explained in Section 3. The estimation process under neutrosophic logic is presented in Section 4. In Section 5, a concise explanation of significant theoretical findings is followed by some real-world examples. Section 6 summarizes the findings of the study.

Proposed Model with Some Useful Properties
This section gives an overview of the suggested distribution and presents it in a coherent framework. The following concepts link the proposed model to its uses in applied statistical methods. If the random variable $X$ with two parameters $\lambda_n$ and $\rho_n$ follows the Pareto model, then the density function ($DF_n$) of the proposed distribution is defined as
$$g_n(x) = \frac{\lambda_n \rho_n^{\lambda_n}}{x^{\lambda_n + 1}}, \quad x \ge \rho_n, \tag{1}$$
where $\lambda_n = [\lambda_l, \lambda_u]$ and $\rho_n = [\rho_l, \rho_u]$ are the neutrosophic shape and scale parameters, respectively, of the NPD. Note that the proposed model differs from the classical Pareto model, where the shape and scale parameters are precisely determined. When the indeterminate part is zero in the proposed model, that is, $\lambda_l = \lambda_u = \lambda$ and $\rho_l = \rho_u = \rho$, it becomes equivalent to the classical model. Various values of $\lambda_n$ and $\rho_n$ result in different density curves. A variety of density curves with different neutrosophic shape values and a fixed scale are plotted in Figure 1, which shows that different indeterminate values of the shape parameter result in different thick curves of the NPD. It is clear from Figure 1 that the $DF_n$ curves are not symmetric and are skewed to the right. The $DF_n$ curve is portrayed as a thick layer instead of a single curve in the neutrosophic framework. The layer width (shaded area) indicates the imprecision region, and the total area under each density curve in the layer is one. Another important function in probability theory applications is the neutrosophic cumulative function ($CF_n$). The $CF_n$, obtained by integrating the $DF_n$, is given by
$$G_n(x) = 1 - \left(\frac{\rho_n}{x}\right)^{\lambda_n}, \quad x \ge \rho_n.$$
The $CF_n$ gives the probability that the random variable takes a value no larger than a given value. Figure 2 depicts the $CF_n$ curves of the proposed model for various interval values of the shape parameter and a fixed value of the scale parameter.
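As a rough illustration of how the thick density and cumulative curves can be traced, the following Python sketch evaluates the classical Pareto density and distribution function at the endpoints of the interval parameters and takes the pointwise minimum and maximum. The function names (`pareto_pdf`, `pareto_cdf`, `endpoint_band`) are hypothetical, and the band between endpoint curves is an assumption about how the shaded layer is drawn, not a statement of the authors' plotting code.

```python
import numpy as np


def pareto_pdf(x, shape, scale):
    """Classical Pareto density: shape * scale**shape / x**(shape + 1) on x >= scale."""
    x = np.asarray(x, dtype=float)
    dens = shape * scale**shape / x ** (shape + 1)
    return np.where(x >= scale, dens, 0.0)


def pareto_cdf(x, shape, scale):
    """Classical Pareto distribution function: 1 - (scale / x)**shape on x >= scale."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= scale, 1.0 - (scale / x) ** shape, 0.0)


def endpoint_band(func, x, shape_n, scale_n):
    """Pointwise min and max of func over the four endpoint combinations
    of the interval parameters shape_n = (low, high) and scale_n = (low, high)."""
    curves = [func(x, lam, rho) for lam in shape_n for rho in scale_n]
    return np.minimum.reduce(curves), np.maximum.reduce(curves)


# Illustrative setting: shape interval [1, 2] with the scale fixed at 1.
x = np.linspace(1.0, 5.0, 9)
pdf_low, pdf_high = endpoint_band(pareto_pdf, x, (1.0, 2.0), (1.0, 1.0))
cdf_low, cdf_high = endpoint_band(pareto_cdf, x, (1.0, 2.0), (1.0, 1.0))
for xi, a, b in zip(x, cdf_low, cdf_high):
    print(f"x = {xi:.1f}  cumulative function in [{a:.4f}, {b:.4f}]")
```

Setting both endpoints of each interval equal collapses the band to a single curve, which corresponds to the classical Pareto case.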
In each panel of Figure 2, the $CF_n$ curve is nondecreasing and ranges from 0 to 1. The nondecreasing nature of the $CF_n$ reflects the fact that the $DF_n$ cannot be negative, which is true for any distribution. Another useful function in applied statistical methods is the probability that an individual survives beyond a certain period of time.
This function is referred to as the survival function or simply the survival rate. In the neutrosophic framework, the survival function ($SF_n$) of the proposed model is one minus the $CF_n$ and may be represented as follows:
$$S_n(x) = \left(\frac{\rho_n}{x}\right)^{\lambda_n}, \quad x \ge \rho_n.$$
The graph of the $SF_n$ is referred to as a survival curve. Figure 3 depicts the survival curve for the proposed NPD. A steep survival curve indicates a short survival period or a low survival rate, as seen in Figure 3(b). A flat or gradual survival curve indicates a longer survival period, as seen in Figure 3(a).
Mathematical Problems in Engineering
Another critical function in reliability analysis is the neutrosophic hazard function ($HF_n$), often known as the instantaneous failure rate. It is the ratio of the density function to the survival function and may be calculated as follows for the suggested model:
$$h_n(x) = \frac{\lambda_n \rho_n^{\lambda_n} x^{-(\lambda_n + 1)}}{(\rho_n / x)^{\lambda_n}} = \frac{\lambda_n}{x}, \quad x \ge \rho_n.$$
The function $h_n(x)$ gives the instantaneous failure rate of an individual or item at $x$, given survival up to $x$. In general, a hazard function may increase, decrease, stay constant, or reflect a more complex process. The graphical behavior of the hazard curve can be seen in Figure 4, which indicates the decreasing trend of the hazard curves of the proposed model.
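The survival and hazard functions can be bounded directly, because for the classical Pareto model the survival function is decreasing in the shape and increasing in the scale, while the hazard equals the shape divided by $x$. The sketch below relies on those monotonicity facts; the function names are hypothetical, and the hazard bounds are meant for $x$ at or above the upper scale endpoint.

```python
import numpy as np


def survival_interval(x, shape_n, scale_n):
    """Lower and upper survival curves for shape_n = (lam_l, lam_u), scale_n = (rho_l, rho_u).

    On x >= scale the survival (scale / x)**shape decreases in the shape and
    increases in the scale, so the bounds sit at opposite endpoints.
    """
    x = np.asarray(x, dtype=float)
    lam_l, lam_u = shape_n
    rho_l, rho_u = scale_n
    lower = np.where(x >= rho_l, (rho_l / x) ** lam_u, 1.0)
    upper = np.where(x >= rho_u, (rho_u / x) ** lam_l, 1.0)
    return lower, upper


def hazard_interval(x, shape_n):
    """Hazard bounds shape / x, valid for x at or above the upper scale endpoint."""
    x = np.asarray(x, dtype=float)
    lam_l, lam_u = shape_n
    return lam_l / x, lam_u / x


x = np.array([1.0, 2.0, 4.0])
s_low, s_high = survival_interval(x, (1.0, 2.0), (1.0, 1.0))
h_low, h_high = hazard_interval(x, (1.0, 2.0))
print("survival bounds:", s_low, s_high)
print("hazard bounds:  ", h_low, h_high)
```

Both hazard bounds fall as $x$ grows, which matches the decreasing hazard trend described for Figure 4.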
This section also investigates the theoretical background and presents some key distributional properties of the proposed NPD in the context of neutrosophic logic. The distributional properties under the parameterization given in (1) are as follows.
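Theorem 1. If $X$ follows the NPD, then $\lambda_n \rho_n / (\lambda_n - 1)$, $\lambda_n > 1$, is the mean value.
Proof. By definition, the mean of the NPD is given by
$$E(X) = \int_{\rho_n}^{\infty} x \, \frac{\lambda_n \rho_n^{\lambda_n}}{x^{\lambda_n + 1}} \, dx = \lambda_n \rho_n^{\lambda_n} \int_{\rho_n}^{\infty} x^{-\lambda_n} \, dx. \tag{5}$$
Evaluating the integral in Equation (5) for $\lambda_n > 1$ gives
$$E(X) = \lambda_n \rho_n^{\lambda_n} \, \frac{\rho_n^{1 - \lambda_n}}{\lambda_n - 1} = \frac{\lambda_n \rho_n}{\lambda_n - 1}.$$
So, hence proved. □
Theorem 2. If $X$ follows the NPD, then $\lambda_n \rho_n^{2} / [(\lambda_n - 1)^{2} (\lambda_n - 2)]$, $\lambda_n > 2$, is the variance.
Proof. The variance of the NPD is given by
$$V(X) = E(X^{2}) - [E(X)]^{2}. \tag{8}$$
Now,
$$E(X^{2}) = \int_{\rho_n}^{\infty} x^{2} \, \frac{\lambda_n \rho_n^{\lambda_n}}{x^{\lambda_n + 1}} \, dx = \lambda_n \rho_n^{\lambda_n} \int_{\rho_n}^{\infty} x^{1 - \lambda_n} \, dx. \tag{9}$$
Simplification of (9) provides
$$E(X^{2}) = \frac{\lambda_n \rho_n^{2}}{\lambda_n - 2}, \quad \lambda_n > 2.$$
Equation (8) thus becomes
$$V(X) = \frac{\lambda_n \rho_n^{2}}{\lambda_n - 2} - \frac{\lambda_n^{2} \rho_n^{2}}{(\lambda_n - 1)^{2}} = \frac{\lambda_n \rho_n^{2}}{(\lambda_n - 1)^{2} (\lambda_n - 2)}.$$
□
Theorem 3. If $X$ follows the NPD, then $2^{1/\lambda_n} \rho_n$ is the median value.
Proof. The median point can be derived from the distribution function as
$$1 - \left(\frac{\rho_n}{m_n}\right)^{\lambda_n} = \frac{1}{2}, \tag{12}$$
where $m_n$ denotes the neutrosophic median value. Solving (12) for $m_n$ yields
$$m_n = 2^{1/\lambda_n} \rho_n.$$
□
Theorem 4. If $X$ follows the NPD, then $\lambda_n \rho_n^{j} / (\lambda_n - j)$, $\lambda_n > j$, is the $j$th raw moment about the origin.
Proof. By definition, the $j$th moment of the NPD is given by
$$\mu'_j = E(X^{j}) = \int_{\rho_n}^{\infty} x^{j} \, \frac{\lambda_n \rho_n^{\lambda_n}}{x^{\lambda_n + 1}} \, dx. \tag{14}$$
From (14), we can write
$$\mu'_j = \frac{\lambda_n \rho_n^{j}}{\lambda_n - j}, \quad \lambda_n > j,$$
which is the required result, where $j = 1, 2, 3, \ldots$. This is a general expression for the $j$th raw moment about the origin of the NPD. □
By using the following relations, moments about the mean for the NPD can be derived as
$$\mu_2 = \mu'_2 - (\mu'_1)^{2}, \qquad \mu_3 = \mu'_3 - 3 \mu'_2 \mu'_1 + 2 (\mu'_1)^{3}, \qquad \mu_4 = \mu'_4 - 4 \mu'_3 \mu'_1 + 6 \mu'_2 (\mu'_1)^{2} - 3 (\mu'_1)^{4}.$$
[Figure legend: shape, scale = [1, 2], 1; shape, scale = [3, 4], 1; shape, scale = [5, 6]]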
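The closed forms for the mean, variance, median, and raw moments can be checked numerically with a short Python sketch. It assumes the standard Pareto results (raw moment $\lambda \rho^{j} / (\lambda - j)$ for $\lambda > j$ and median $2^{1/\lambda} \rho$) and evaluates them at the endpoints of the interval parameters; the function names are hypothetical. Because each of these quantities is monotone in the shape and in the scale, the smallest and largest endpoint values give the neutrosophic interval.

```python
def raw_moment(j, shape, scale):
    """j-th raw moment of the classical Pareto model; it exists only for shape > j."""
    if shape <= j:
        raise ValueError("the j-th raw moment exists only when shape > j")
    return shape * scale**j / (shape - j)


def npd_mean(shape, scale):
    return raw_moment(1, shape, scale)


def npd_variance(shape, scale):
    # Second raw moment minus the squared mean.
    return raw_moment(2, shape, scale) - npd_mean(shape, scale) ** 2


def npd_median(shape, scale):
    return 2.0 ** (1.0 / shape) * scale


def as_interval(func, shape_n, scale_n):
    """Evaluate func at all endpoint combinations and return (min, max)."""
    values = [func(lam, rho) for lam in shape_n for rho in scale_n]
    return min(values), max(values)


# Baseline setting used in the simulation study: shape [4, 6], scale [1, 1].
shape_n, scale_n = (4.0, 6.0), (1.0, 1.0)
print("mean    :", as_interval(npd_mean, shape_n, scale_n))
print("variance:", as_interval(npd_variance, shape_n, scale_n))
print("median  :", as_interval(npd_median, shape_n, scale_n))
```

Note that the larger shape endpoint produces the smaller mean, variance, and median, so the interval endpoints swap relative to the parameter interval.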

Simulation Analysis of the Proposed Model
In this section, a Monte Carlo technique is employed to generate random numbers that follow the NPD. In general, the Monte Carlo method refers to any technique for solving a problem that makes use of random outcomes. The objective is to test the theoretical findings listed in Section 2 by simulating random samples from the NPD with known parameter values. The inverse $CF_n$ approach has been employed as the most straightforward technique to simulate random numbers from the proposed model. This approach enables us to make use of a computer's built-in pseudo-random number generator. The inverse $CF_n$ of the proposed model is given by
$$Q_p = \rho_n (1 - U_i)^{-1/\lambda_n},$$
where the $U_i$ are randomly generated numbers from the uniform distribution on $(0, 1)$, and $Q_p$ is the desired percentile value of the proposed NPD. Suppose $10^{5}$ random samples are drawn according to the inverse $CF_n$ method from the proposed model with $\rho_n = [1, 1]$ and $\lambda_n = [4, 6]$. Analytical values are calculated from the results given in Section 2 with these baseline parameter values. Estimated values of different distribution properties along with exact results are provided in Table 1. Table 1 displays the descriptive metrics of the proposed model for known distributional parameter values. The descriptive measures of the simulated data are intervals due to the assumed indeterminacies in the parameters. The basic framework of the proposed model is validated by the strong agreement between simulated and analytical results.
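A minimal Python sketch of this simulation is given below. It assumes the inverse cumulative function $\rho (1 - U)^{-1/\lambda}$ of the classical Pareto model and applies it at the parameter endpoints with the same uniform draws; the seed, the function names, and the use of common random numbers are illustrative choices, not details taken from the study.

```python
import numpy as np


def npd_quantile(u, shape, scale):
    """Inverse cumulative function of the classical Pareto model."""
    return scale * (1.0 - u) ** (-1.0 / shape)


def simulate_endpoints(size, shape_n, scale_n, rng):
    """Draw endpoint samples that share the same uniform numbers.

    The quantile decreases in the shape and increases in the scale, so the
    pair (upper shape, lower scale) gives the pointwise smaller sample.
    """
    u = rng.uniform(size=size)  # values in [0, 1), so 1 - u is never zero
    lam_l, lam_u = shape_n
    rho_l, rho_u = scale_n
    low = npd_quantile(u, lam_u, rho_l)
    high = npd_quantile(u, lam_l, rho_u)
    return low, high


rng = np.random.default_rng(12345)  # arbitrary seed for reproducibility
low, high = simulate_endpoints(10**5, (4.0, 6.0), (1.0, 1.0), rng)

print("simulated mean   :", (low.mean(), high.mean()))
print("analytical mean  :", (6.0 / 5.0, 4.0 / 3.0))
print("simulated median :", (np.median(low), np.median(high)))
print("analytical median:", (2.0 ** (1.0 / 6.0), 2.0 ** (1.0 / 4.0)))
```

Higher-order sample statistics converge more slowly for heavy-tailed data, so the agreement for the variance and the shape coefficients is typically looser than for the mean and median at the same sample size.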

Estimation of Neutrosophic Parameters
In this section, the well-known maximum likelihood (ML) technique is used to estimate the neutrosophic parameters of the proposed NPD. The ML technique treats the parameters as unknown and forms the joint density of all observations in a dataset, which are assumed to be independent and identically distributed. Once the likelihood of the NPD is established, the maxima of the function are determined.
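These ML estimators are important from a statistical viewpoint because they are asymptotically unbiased and asymptotically attain minimum variance. Let $x_1, x_2, \ldots, x_m$ be independent and identically distributed observations from $m$ subjects that follow the parametric model given in (1); then, the joint density is given by
$$\prod_{i=1}^{m} g_n(x_i \mid \lambda_n, \rho_n) = \prod_{i=1}^{m} \frac{\lambda_n \rho_n^{\lambda_n}}{x_i^{\lambda_n + 1}}. \tag{22}$$
Taking the logarithm of (22) and denoting it by $\omega_n(x_i \mid \lambda_n, \rho_n)$,
$$\omega_n(x_i \mid \lambda_n, \rho_n) = \sum_{i=1}^{m} \log \frac{\lambda_n \rho_n^{\lambda_n}}{x_i^{\lambda_n + 1}}. \tag{23}$$
Simplification of (23) yields
$$\omega_n(x_i \mid \lambda_n, \rho_n) = m \log \lambda_n + m \lambda_n \log \rho_n - (\lambda_n + 1) \sum_{i=1}^{m} \log x_i. \tag{24}$$
Partially differentiating (24) with respect to $\lambda_n$ and equating to zero implies
$$\frac{\partial \omega_n}{\partial \lambda_n} = \frac{m}{\lambda_n} + m \log \rho_n - \sum_{i=1}^{m} \log x_i = 0. \tag{25}$$
The derivative of (24) with respect to $\rho_n$ is $m \lambda_n / \rho_n > 0$, so (24) is increasing in $\rho_n$ and is maximized at the largest value permitted by the constraint $\rho_n \le x_i$ for all $i$. Further solution of (25) provides the following estimates for the unknown parameters of the NPD:
$$\hat{\rho}_n = \min_{1 \le i \le m} x_i, \qquad \hat{\lambda}_n = \frac{m}{\sum_{i=1}^{m} \log (x_i / \hat{\rho}_n)}.$$
Note that $\hat{\rho}_n$ and $\hat{\lambda}_n$ take interval forms because of the imprecise sample data. Additionally, we analyze a simulated dataset to demonstrate how the estimation procedure works in the neutrosophic environment. A total of $10^{4}$ different random samples from the NPD are generated with $\lambda_n$ taken as $[4, 6]$, whereas the value of $\rho_n$ is fixed at $[1, 1]$. The behavior of the ML estimators of the unknown shape and scale parameters is also investigated in terms of the neutrosophic root mean square error (NRME). The NRME is estimated according to the following formula:
$$\mathrm{NRME} = \sqrt{\frac{1}{M} \sum_{j=1}^{M} (\hat{\delta}_j - \delta_j)^{2}},$$
where $\delta_j$ and $\hat{\delta}_j$ are, respectively, the actual and predicted values of the estimated parameter, and $M$ is the total number of simulation runs. The R packages EnvStats and Metrics have been utilized to estimate the model's parameters and calculate the values of the root mean square error. The estimated values of $\lambda_n$ at a fixed value of the scale parameter, along with the NRME values, are reported in Table 2. Table 2 shows that as the sample size increases, the value of the estimator tends to the benchmark value $[4, 6]$, and the NRME decreases to zero. This trend reveals that the neutrosophic ML estimators perform efficiently with larger sample sizes. The performance of the estimator of the scale parameter $\rho_n$ can be assessed in the same way, but the results are not presented here because they show a similar trend.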
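The simulation experiment can be approximated in Python as follows. This is a sketch under stated assumptions rather than the authors' R workflow: it uses NumPy instead of EnvStats and Metrics, it estimates the shape with the scale treated as known (as in the fixed-scale setting above), and the sample sizes, number of runs, and function names are illustrative.

```python
import numpy as np


def shape_mle(x, scale):
    """ML estimate of the Pareto shape when the scale is known: m / sum(log(x / scale))."""
    x = np.asarray(x, dtype=float)
    return x.size / np.sum(np.log(x / scale))


def rmse(estimates, true_value):
    """Root mean square error of the estimates around the true value."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((estimates - true_value) ** 2)))


def simulate_shape_mle(sample_size, shape, scale, runs, rng):
    """Return the average shape estimate and its RMSE over repeated samples."""
    u = rng.uniform(size=(runs, sample_size))
    samples = scale * (1.0 - u) ** (-1.0 / shape)  # inverse-CDF sampling
    estimates = np.array([shape_mle(row, scale) for row in samples])
    return float(estimates.mean()), rmse(estimates, shape)


rng = np.random.default_rng(2022)  # arbitrary seed
for m in (20, 100, 500):
    # Endpoints of the shape interval [4, 6] with the scale fixed at 1.
    (est_l, err_l), (est_u, err_u) = [
        simulate_shape_mle(m, lam, 1.0, 1000, rng) for lam in (4.0, 6.0)
    ]
    print(f"m = {m:4d}  estimate [{est_l:.3f}, {est_u:.3f}]  NRME [{err_l:.3f}, {err_u:.3f}]")
```

Reporting the estimate and the error separately for each endpoint yields interval-valued summaries in the same spirit as those described for Table 2.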

Applications of the Proposed Model
Two real datasets are utilized in this section to show how the proposed NPD may be implemented.

The Dioxin Data.
Dioxins are a class of very poisonous chemical substances that are dangerous to humans [23]. In the environment, dioxins pose a threat. Dioxins are a matter of concern due to their extremely hazardous potential impact on human health. Experimental studies revealed that they could impact negatively on reproductive, developmental, and immunological systems and organs in the human body [24].
Additionally, they can affect hormones and result in cancer. Once dioxins enter cells, they remain there for an extended period of time due to their chemical persistence and their ability to be absorbed by fatty tissue, where they are subsequently retained [25]. Dioxins are mostly produced as a byproduct of industrial operations; however, they can also be produced naturally. They are unintended byproducts of various manufacturing processes, including chlorine bleaching of paper pulp, smelting, and the production of some agricultural chemicals [26]. When it comes to dioxin discharge into the environment, unregulated waste incinerators are frequently the greatest offenders because of incomplete combustion. The vast majority of dioxins in the food supply are found in dairy products, meat, shellfish, and fish. That is why securing our food supply is so important. Although dioxins are formed locally, their environmental dispersion is worldwide. Dioxin emissions are therefore monitored by many countries on a regular basis. The total amount of dioxins emitted in Japan is monitored on a regular basis by the Ministry of Environment [27]. Because of good government policy, the quantity occurring in the ecosystem or in food is now extremely small; routine levels of ingestion are extremely unlikely to cause acute toxicity. The current levels of dioxins in Japan indicate an extremely low risk of cancer. To assess the safety of dioxin exposure, the tolerable daily intake (TDI) is employed as an indicator. The TDI is the quantity of a chemical substance, per kg of body weight per day, that may be safely absorbed into the body over a long period of time. Thus, the TDI is used as an indicator of whether long-term daily intake will have an effect on health [28].
The amount of dioxins absorbed from the average diet, estimated annually for the period 1998 to 2015, is published in the 2017 annual report on environmental statistics by the Ministry of Environment, Japan [29]. First, the fit of the Pareto distribution to the dioxin intake data is evaluated using a distribution fitting package in R. Figure 5 depicts basic probability plots and the empirical density. Considering the deviations of the points from the straight line in each graph, the Pareto distribution is judged to be an appropriate model for this dataset. The visualization plots in Figure 5 also show that the data are skewed. It is, therefore, possible to investigate the data in further depth by applying the proposed model. Although the intakes of dioxins are originally precise quantities, for demonstration purposes we assume uncertain sample values, as shown in Table 3. The imprecise data are formed according to the strategy devised in [30].
Because of uncertain values, traditional Pareto analysis of these types of data is inappropriate. It is possible to use the suggested NPD to summarize the data containing indeterminacies. Table 4 provides a descriptive summary of the consumption of dioxins from a typical diet using NPD. Table 4 shows the estimated neutrosophic measures based on the suggested model. All the estimated values are expressed as intervals because of indeterminacies inherent in the analyzed dataset.
Thus, the suggested model is more adaptable and capable of efficiently analyzing incomplete data or estimating the parameters from imprecise values.
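One simple way to obtain interval-valued estimates from interval-valued observations is to fit the classical Pareto model separately to the lower and upper endpoints and order the results. The Python sketch below illustrates that idea only; it is an assumption, not necessarily the procedure behind Table 4. The data are synthetic (18 values, matching the number of years from 1998 to 2015, but not the dioxin intakes), and the random widening used to create intervals is hypothetical and is not the strategy of [30].

```python
import numpy as np


def pareto_mle(x):
    """Standard Pareto ML estimates: scale = sample minimum, shape = m / sum(log(x / scale))."""
    x = np.asarray(x, dtype=float)
    scale_hat = x.min()
    shape_hat = x.size / np.sum(np.log(x / scale_hat))
    return shape_hat, scale_hat


def fit_interval_sample(lower, upper):
    """Fit each endpoint sample and return ordered (shape interval, scale interval)."""
    shape_a, scale_a = pareto_mle(lower)
    shape_b, scale_b = pareto_mle(upper)
    shape_n = (min(shape_a, shape_b), max(shape_a, shape_b))
    scale_n = (min(scale_a, scale_b), max(scale_a, scale_b))
    return shape_n, scale_n


# Synthetic crisp values from a Pareto model with shape 3 and scale 1 (illustration only).
rng = np.random.default_rng(7)
crisp = 1.0 * (1.0 - rng.uniform(size=18)) ** (-1.0 / 3.0)

# Hypothetical widening of each crisp value by up to 5 percent on either side.
lower = crisp * (1.0 - rng.uniform(0.0, 0.05, size=crisp.size))
upper = crisp * (1.0 + rng.uniform(0.0, 0.05, size=crisp.size))

shape_n, scale_n = fit_interval_sample(lower, upper)
print("shape estimate interval:", shape_n)
print("scale estimate interval:", scale_n)
```

The widening is random rather than proportional because the shape estimate is unchanged when every observation is multiplied by the same constant, in which case the shape interval would collapse to a single value.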

The Child Mortality Rate Data.
The second dataset used in this analysis provides the mortality rates of children under the age of five, covering the period 1995 to 2020 for Saudi Arabia. The information has been gathered from the source cited in [31]. Even though child mortality has decreased significantly, to extremely low rates, in many regions of the globe, it is still seen as a significant issue that requires great attention in national policies. Significant worldwide progress has been achieved since 1990 in lowering child fatalities. There has been remarkable progress in the health of children and adolescents in Saudi Arabia during the past two decades due to major factors such as malnutrition reduction, immunization against infectious illnesses, and diarrhea control [32]. The data from the source are crisp death rate values during the first five years of life. To illustrate the suggested distribution, neutrosophic data are created using the approach provided in [30]. The interval childhood mortality rates for the period 1995-2020 are given in Table 5.
The noticeable uncertainty in the estimates in Table 5 arises because the different estimation procedures typically used for reporting mortality rates preclude exact estimates. Depending on the extent of census errors and the estimating methodologies used, there are likely to be fluctuations in the estimates for any particular country. A distribution fitting tool in R is used to produce the basic probability plots to test the applicability of the Pareto distribution to average child death rates, as shown in Figure 6.
In Figure 6, the subjective visual examination of the data suggests that the Pareto distribution is a reasonable model for the mortality data as observations are very close to the straight line. As interval childhood mortality rates are utilized in this investigation, the conventional Pareto analysis is inapplicable. It is feasible to summarize data, including uncertainties using the proposed model. Using the suggested neutrosophic model, Table 6 displays a descriptive overview of the mortality statistics. Table 6 provides the estimated uncertainty bounds of some essential statistics based on the proposed distribution.
All estimated values are provided as intervals due to the intrinsic imprecision of the dataset being studied. Thus, the proposed model is more flexible and capable of evaluating an imprecise dataset more effectively.

Conclusions
The neutrosophic framework of the Pareto distribution and its applications in applied statistical methods are presented in this work. Statistical characteristics of the newly proposed model using neutrosophic logic have been widely explored. The key expressions for the suggested model, such as the cumulative function, hazard function, reliability function, and survival function, have been derived and discussed in detail. The ML estimates for the unknown parameters of the NPD have been developed. The theoretical characteristics of the proposed model have been evaluated using the Monte Carlo simulation approach. The effectiveness of the suggested NPD has been demonstrated by using a real dataset on average dioxin consumption collected from food samples throughout Japan.
Table 5: Interval estimates of infant death rates for the age less than five years.
A future study might concentrate on enhancing the capacity of the suggested distribution for various inference techniques and its utility for processing high-dimensional data.
Data Availability
The data that support the findings of this study are available within the manuscript.

Conflicts of Interest
The authors declare that they have no conflicts of interest.