Prediction and Elucidation of Triglycerides Levels Using a Machine Learning and Linear Fuzzy Modelling Approach

Introduction. Triglycerides are lipids composed of fatty acids that provide energy to the cell. These compounds are delivered to the body's cells via lipoproteins found in the bloodstream. Increased blood triglyceride levels have been associated with high-fat or high-carbohydrate diets. Generally, increased triglyceride levels occur in conjunction with other symptoms that are difficult to notice and recognize.
Objectives. The study's goal was to develop a model that explains and predicts the relationship between triglycerides and waist circumference, high-density lipoprotein (HDL), and hypertension status. The model was developed using qualitative predictor variables and incorporated data bootstrapping, multilayer perceptron neural networks, and fuzzy linear regression.
Materials and procedures. This was a public health study that combined retrospective data analysis with methodology development. The medical records of patients who attended outpatient clinics at Hospital Universiti Sains Malaysia (USM) were collected and analyzed to illustrate the methods developed. Patient data were screened and selected according to the inclusion and exclusion criteria. Triglycerides, HDL, waist circumference, and hypertension status were obtained from each patient's medical record. These variables were chosen with the aid of a clinical expert because of their clinical importance. RStudio was used to develop the syntax for the hybrid model, which defines the association between the examined variables. The purpose of this study is to create a technique for clinical trial design that utilizes bootstrapping, Qualitative Predictor Variables (QPV), Multiple Linear Regression (MLR), Artificial Neural Networks (ANNs), and Fuzzy Regression (FR). All analyses were performed using the newly developed R syntax. The research developed a fuzzy linear model that increased modelling performance by incorporating clinically significant factors and validating the variables via a Multilayer Perceptron (MLP).
Conclusion. The proposed technique appeared to be the best combination for modelling and prediction, as it incorporated the bootstrap, a Multilayer Feed Forward (MLFF) neural network, and fuzzy linear regression. The developed syntax is currently being evaluated and validated clinically. In terms of variable selection, modelling, and model validation, this strategy was superior to earlier approaches for fuzzy regression modelling.


Introduction
Triglycerides are lipids, made up of glycerol and fatty acids, that circulate throughout the body. Triglycerides account for most of the fats we ingest, including butter, margarine, and oils. Excess calories, alcohol, and sugar are converted to triglycerides and deposited in the body's fat cells. Triglycerides can be assessed fasting or nonfasting, with levels between 2 and 10 mmol/L indicating an elevated risk of cardiovascular disease and levels greater than 10 mmol/L indicating an increased risk of acute pancreatitis and possibly cardiovascular disease. Elevated triglyceride levels are significantly related to low HDL cholesterol levels and are frequently connected with other heart disease and stroke risk factors, such as obesity and metabolic syndrome. A prior study connected higher waist circumference with increased systolic and diastolic blood pressures, hyperglycemia, HDL cholesterol, and triglycerides [1,2]. Obesity and poorly controlled diabetes are the two most common causes of elevated triglycerides. Triglycerides can increase if someone is overweight and inactive, especially if they consume a lot of carbohydrates, sugary meals, or alcohol. Normal levels are below 150 mg/dL (1.69 mmol/L). Values from 150 mg/dL (1.69 mmol/L) to 199 mg/dL (2.25 mmol/L) are considered borderline high. Levels between 200 and 499 mg/dL (2.26-5.63 mmol/L) are considered high. Levels exceeding 500 mg/dL (5.64 mmol/L) are extremely high [1,3,4]. This article is significant for several reasons, most notably for modelling and inference. 
This article highlights the inherent limitations of most of the components of a developing hybrid model that integrates linear regression, data bootstrapping, neural networks, and a qualitative predictor variable. The second contribution of this study is that it demonstrates the utility of the bootstrap, multiple linear regression, and multilayer perceptron in self-control theory literature. This information would be highly beneficial for triglyceride management. Additionally, this study intends to provide helpful information and a better understanding of triglycerides in relation to the chosen variables: waist circumference, high-density lipoprotein, and blood pressure parameters. This effort will increase our understanding of the mechanical behaviour of triglyceride modelling, particularly with respect to methodological advancement [5-9]. The main purpose was to develop a model that determines and predicts these relationships effectively.
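The triglyceride cut-offs quoted in the Introduction can be expressed as a small classification routine. The following is only an illustrative sketch in Python (the study itself used R). The conversion factor of 88.57 mg/dL per mmol/L, the function names, and the treatment of exactly 500 mg/dL as the top category are assumptions of this sketch, not part of the original analysis.

```python
# Illustrative sketch only: names and the conversion factor are assumptions.
MG_DL_PER_MMOL_L = 88.57  # assumed triglyceride conversion factor


def mg_dl_to_mmol_l(mg_dl):
    """Convert a triglyceride concentration from mg/dL to mmol/L."""
    return mg_dl / MG_DL_PER_MMOL_L


def classify_triglycerides(mg_dl):
    """Map a triglyceride level in mg/dL to the categories quoted in the text."""
    if mg_dl < 150:
        return "normal"
    if mg_dl < 200:
        return "borderline high"
    if mg_dl < 500:
        return "high"
    # Assumption: a value of exactly 500 mg/dL is placed in the top category.
    return "extremely high"


if __name__ == "__main__":
    for level in (120, 180, 320, 650):
        mmol = mg_dl_to_mmol_l(level)
        print(level, "mg/dL =", round(mmol, 2), "mmol/L:", classify_triglycerides(level))
```

Keeping the thresholds in mg/dL and converting only for display avoids rounding differences between the two unit systems at the category boundaries.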

Data Collection.
This study assessed data from patients who visited the outpatient clinic of Hospital Universiti Sains Malaysia (USM). A total of 14 patients were enrolled. The data description for the study's selected variables is summarized in Table 1.

Study Design.
The methodology is based on a computational retrospective study design. The study case comprised Triglycerides ($Y$), Waist ($X_1$), High-Density Lipoprotein ($X_2$), and Hypertension status ($X_3$). In fuzzy linear regression, a qualitative predictor variable, represented by dummy variables, is used as an independent variable. The dummy variables must be created before conducting the analysis, and their number is one less than the total number of categories. A dummy variable takes the values 0 and 1 and is used to compare categories: a value of 1 indicates that the case falls into a particular category, whereas a value of 0 indicates that it does not. It is a nonsignificant dummy variable. The Universiti Sains Malaysia Research Ethics Committee (Human) (USM/JEPeM/17040225) approved the study. The patient's privacy and medical condition are both protected.
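The dummy-variable rule described above (one indicator fewer than the number of categories) can be illustrated with a short sketch. It is written in Python for illustration only, and it assumes three hypertension classes (normal, borderline, hypertension) with normal as the reference class. The names `HYPERTENSION_LEVELS` and `dummy_code` are hypothetical.

```python
# Illustrative sketch: the category order and the reference class are assumptions.
HYPERTENSION_LEVELS = ["normal", "borderline", "hypertension"]


def dummy_code(value, levels=HYPERTENSION_LEVELS):
    """Return the 0/1 indicators for `value`, dropping the first (reference) level.

    A variable with k categories yields k - 1 indicators; the reference
    category is encoded as all zeros.
    """
    if value not in levels:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == level else 0 for level in levels[1:]]


if __name__ == "__main__":
    for status in HYPERTENSION_LEVELS:
        borderline, hypertension = dummy_code(status)
        print(status, "-> Borderline =", borderline, ", Hypertension =", hypertension)
```

With this coding, each fitted coefficient for an indicator is interpreted relative to the reference class.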

Modelling of Computational Biometry.
The data were evaluated for links to triglyceride levels and examined using RStudio and the syntax developed for this study. The core of the methodology is a combined model that employs the bootstrap, multilayer perceptron (MLP), and multiple linear regression (MLR) techniques. The MLP is a frequently used type of feedforward artificial neural network with a three-layer architecture consisting of an input layer, a hidden layer, and an output layer [4,9-11].

Regression with Qualitative Predictor Variable.
A qualitative predictor variable with $a$ classes is represented by $a-1$ indicator variables taking the values 0 and 1. Indicator variables are also commonly called dummy variables or binary variables. If a qualitative variable has more than two classes, the regression model requires additional indicator variables. Suppose we have ($Y$), ($X_1$ = Waist), ($X_2$ = HDL), and ($X_3$ = Hypertension), where Hypertension is a variable with three classes: normal, borderline, and hypertension. It therefore needs two indicator variables, with normal as the reference class:

$$\text{Borderline} = \begin{cases} 1 & \text{if the patient is borderline} \\ 0 & \text{otherwise} \end{cases} \qquad \text{Hypertension} = \begin{cases} 1 & \text{if the patient is hypertensive} \\ 0 & \text{otherwise} \end{cases}$$

Multilayer Perceptron (MLP).
The multilayer perceptron is the most frequently used type of artificial neural network [9,10]. An MLP is composed of three layers: the input, the hidden, and the output. Because this analysis contains only one dependent variable, there is a single output node. The strength of this model is that the machine learning algorithm can also be used for complex nonlinear problems and can work even with smaller data sets. The equation

$$\hat{Y} = g_2\left(\sum_{j=1}^{2} n_j + E_2\right)$$

describes an MLP with $N$ input nodes, $H$ hidden nodes, and a single output node, as shown in Figure 1. Here $E_2$ is the bias for the output node and $g_2$ is an activation function. The value of a hidden node $n_j$ is given by

$$n_j = g_1\left(\sum_{i=1}^{N} v_{ji} x_i + E_1\right), \qquad j = 1, 2,$$

where $v_{ji}$ is the weight from input node $i$ to hidden node $j$, $E_1$ is the bias for hidden node $j$, $g_1$ is an activation function, and $x_i$ are the independent variables. The MLP's general architecture is represented in Figure 1. The variables selected by the MLP procedure are used as the input for the multiple linear regression [1,3,4,9].
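The forward pass of a single-hidden-layer MLP as described above can be sketched as follows. This is a minimal Python illustration, not the R syntax used in the study. The logistic hidden activation, the identity output activation, the per-node hidden biases, and the explicit output weights are assumptions of this sketch; the equations in the text show neither output weights nor a specific activation function.

```python
import math


def sigmoid(z):
    """Logistic activation (assumed here for the hidden layer)."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias,
                g_hidden=sigmoid, g_output=lambda z: z):
    """Compute the output of an MLP with one hidden layer and one output node.

    hidden_weights[j][i] plays the role of v_ji (input i to hidden node j),
    hidden_biases[j] the role of E_1, and output_bias the role of E_2.
    output_weights is an assumption of this sketch; setting every entry
    to 1.0 reproduces the unweighted sum over hidden nodes in the text.
    """
    hidden = [
        g_hidden(sum(v * xi for v, xi in zip(row, x)) + bias)
        for row, bias in zip(hidden_weights, hidden_biases)
    ]
    return g_output(sum(w * n for w, n in zip(output_weights, hidden)) + output_bias)


if __name__ == "__main__":
    # Hypothetical, already-scaled inputs: waist, HDL, borderline, hypertension.
    x = [0.4, 0.7, 0.0, 1.0]
    v = [[0.2, -0.1, 0.3, 0.5], [-0.4, 0.6, 0.1, 0.2]]  # made-up weights
    print(mlp_forward(x, v, [0.1, -0.2], [1.0, 1.0], 0.05))
```

The four inputs and two hidden nodes mirror the dimensions mentioned in the text; all numeric values are made up for the demonstration.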

Fuzzy Regression Model Using Possibilistic.
Regression analysis is a statistical technique for determining the relationship between variables that have a cause-and-effect relationship. A multiple linear regression model contains one dependent variable and any number of independent variables [4]. Multiple regression analysis attempts to account for the variation in the dependent variable using the concurrent variation in the independent variables. This section uses a fuzzy regression model to deduce the underlying relationship between triglycerides and the selected explanatory variables. A fuzzy regression model is used in a fuzzy environment to determine the functional relationship between dependent and independent variables. A fuzzy regression model can be written as

$$Y = Z_0 + Z_1 x_1 + Z_2 x_2 + \cdots + Z_k x_k ;$$

here, the explanatory variables $x_i$ are assumed to be precise. However, the response variable $Y$ is not crisp but fuzzy, as are the parameters. Our objective is to estimate these parameters. In the following discussion, assume that symmetric fuzzy numbers can be expressed as intervals. The model adopts the general form proposed by Tanaka et al. (1982):

$$\tilde{Y} = \tilde{A}_1 x_1 + \tilde{A}_2 x_2 + \cdots + \tilde{A}_n x_n ,$$

where $\tilde{Y}$ is the fuzzy output, $\tilde{A}_j$, $j = 1, 2, \ldots, n$, is a fuzzy coefficient (Figure 2), and $x = (x_1, x_2, \ldots, x_n)$ is an $n$-dimensional non-fuzzy input vector. The fuzzy coefficients were assumed to be triangular fuzzy numbers (TFNs) (Figure 3).
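To make the role of symmetric triangular fuzzy coefficients concrete, the sketch below evaluates a fuzzy linear model for crisp inputs. It is a Python illustration under stated assumptions: each coefficient is represented as a hypothetical (centre, spread) pair, the output spread is the sum of the coefficient spreads weighted by the absolute input values, and the coefficient values are made up. It shows prediction only and does not estimate coefficients from data.

```python
def fuzzy_predict(coefficients, x):
    """Evaluate a fuzzy linear model with symmetric triangular coefficients.

    coefficients: list of (centre, spread) pairs, one per crisp input.
    Returns (lower, centre, upper) of the triangular fuzzy output.
    """
    centre = sum(a * xi for (a, _), xi in zip(coefficients, x))
    spread = sum(c * abs(xi) for (_, c), xi in zip(coefficients, x))
    return centre - spread, centre, centre + spread


def membership(y, lower, centre, upper):
    """Triangular membership of a crisp value y in the fuzzy output."""
    spread = upper - centre
    if spread == 0:
        return 1.0 if y == centre else 0.0
    return max(0.0, 1.0 - abs(y - centre) / spread)


if __name__ == "__main__":
    # Made-up coefficients for two inputs, for illustration only.
    coeffs = [(1.0, 0.5), (2.0, 0.0)]
    lower, centre, upper = fuzzy_predict(coeffs, [2.0, 3.0])
    print(lower, centre, upper)
    print(membership(7.5, lower, centre, upper))
```

The lower and upper values correspond to the boundaries of the support interval, and the centre corresponds to the central tendency of the fuzzy output.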

BioMed Research International
Defining each fuzzy parameter by its mode and spread, $\tilde{A}_j = (a_j, c_j)$, and restricting consideration to the case where only the coefficients are fuzzy, we can write

$$\tilde{Y} = \sum_{j=1}^{n} (a_j, c_j)\, x_j .$$

This is a beneficial formulation because it explicitly portrays the modes and the spreads of the fuzzy parameters. In a subsequent section, we explore fuzzy independent variables.
In this case, the proposed fuzzy model is

$$\text{Triglycerides} = \tilde{A}_1\,\text{Waist} + \tilde{A}_2\,\text{HDL} + \tilde{A}_3\,\text{Borderline} + \tilde{A}_4\,\text{Hypertension}.$$

The result for the model is displayed in Table 2. The fuzzy regression was fitted using the R software. Figure 4 illustrates the entire step-by-step procedure for developing the statistical model. The clinical expert selects the variables to be used before the data collection process begins. The study's strength is that it examines a model that takes clinically relevant variables into account. Special consideration should be given to the data for the qualitative predictor variable (see section Qualitative Predictor Variable). Additional analysis is performed using the bootstrap procedure, which generates a sample of the same size as the original sample by drawing with replacement, so that some observations are included multiple times and others are omitted [6,7]. Data preparation is followed by the construction of a multilayer perceptron neural network (MLPNN), a linear model (LM), and a fuzzy regression (FR) model.
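The bootstrap step described above, resampling with replacement to the original sample size, can be sketched in a few lines. This is a Python illustration rather than the study's R syntax. The function names, the seed handling, the choice of the mean as the saved statistic, and the example values are assumptions.

```python
import random


def bootstrap_sample(data, rng):
    """Draw one bootstrap sample: same size as `data`, drawn with replacement."""
    return rng.choices(data, k=len(data))


def bootstrap_means(values, n_resamples=1000, seed=0):
    """Return the mean of each of `n_resamples` bootstrap samples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = []
    for _ in range(n_resamples):
        sample = bootstrap_sample(values, rng)
        means.append(sum(sample) / len(sample))
    return means


if __name__ == "__main__":
    # Made-up triglyceride values (mmol/L) for 14 hypothetical patients.
    tg = [1.2, 1.8, 2.4, 1.1, 3.0, 2.2, 1.6, 1.9, 2.7, 1.4, 2.0, 1.3, 2.5, 1.7]
    means = bootstrap_means(tg, n_resamples=2000)
    print("mean of bootstrap means:", sum(means) / len(means))
```

Because sampling is with replacement, a given observation may appear several times in one resample and not at all in another, which is what allows a small data set to be expanded for later modelling.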

Results
The study's goal was to develop a model that could explain the relationship between triglycerides and waist circumference, high-density lipoprotein (HDL), and hypertension status. In this section, the results are divided into three phases. The MLP architecture is shown in Figure 5, with four input nodes, one hidden layer, and one output node.

Phase III: Result for the Fuzzy Regression.
This paper provides only a fuzzy regression model of the relationships associated with triglycerides. The primary purpose of this paper is to demonstrate possible techniques that can be employed to explain such relationships. Table 3 shows the result of the fuzzy linear regression. The fuzzy regression model corresponding to the result can be written as follows.
A fuzzy regression based on the possibilistic model is proposed. Equations (8), (9), and (10) give the central tendency of the fuzzy regression model, the lower boundary of the model support interval, and the upper boundary of the model support interval, respectively. According to the findings, waist circumference, high-density lipoprotein (HDL), borderline status, and hypertension status contribute to triglyceride levels and are positively associated with them. All the variables were validated through the MLP neural network, for which the mean squared error (MSE-Network Projection) is 0.02377. The smallest MSE of the neural network model indicates the best combination of selected variables. The results suggest that, among HDL, borderline status, and hypertension status, the hypertension factor has the greatest influence on triglyceride levels.

Discussion and Conclusion
Triglycerides are a type of lipid that stores and transports energy. Serum TG is derived from two sources: intestinal absorption and liver synthesis. They are the most common type of fat digested in the body, and they can either be consumed or produced [2]. The goal of this research is to show that triglycerides and waist circumference are related. Edwina et al. published a study in 2018 in which they measured the waist circumference of 30 people with high triglycerides during their visit. According to that study, there is a strong correlation between triglyceride levels and waist circumference [12]. Our model suggested that HDL was directly associated with the increase of triglycerides; however, this result contradicts a study suggesting that increased triglyceride levels are associated with decreased HDL cholesterol levels [13], which in turn is associated with an increased risk of ischemic heart disease (IHD). Increased triglyceride levels in the presence of elevated LDL (bad) cholesterol or low HDL (good) cholesterol have been linked to fatty accumulation within the artery walls, increasing the risk of heart attack and stroke. The level of HDL cholesterol is inversely related to the level of triglycerides in the blood. In hypertriglyceridemic patients, HDL particles are more enriched in triglycerides than normal HDL particles, because the cholesterol ester in the lipoprotein core of the HDL particle is replaced with triglycerides. Triglyceride-enriched HDL is also more efficiently catabolized. Researchers have additionally described a link between high-density lipoprotein (HDL) cholesterol and triglycerides expressed as a ratio: the triglyceride level divided by the HDL "good" cholesterol level. The ratio of triglycerides to HDL "good" cholesterol should be less than 2; four is considered normal, and six is excessive. A lower ratio is preferable: as triglyceride levels fall and HDL levels rise, the ratio decreases [1,3,10]. 
The primary goals were to develop, validate, and test a regression modelling methodology. The primary objective of this project was to develop and implement techniques in the field of medical statistics by combining the bootstrapping procedure with artificial neural networks to validate variable selection and linear modelling to complete the modelling process. Clinical expert opinion is included in the variable selection process. The bootstrap proceeds in stages. First, it generates a mega file from the original data set. Second, it draws a large sample with replacement from that file. Third, it calculates and saves the sample statistics. Fourth, it repeats this process iteratively, sometimes thousands of times. Fifth, the data are prepared for the next procedure. The R syntax algorithm connects the application to the methodological concept. The first step is to select variables with the assistance of a medical expert. The bootstrap procedure is then applied to the selected data. At this point, 70% of the bootstrap data are designated as a training dataset and 30% as a testing dataset. The training dataset is used to construct the model, while the testing dataset is used to verify it. According to the multilayer perceptron analysis, waist circumference, HDL, and blood pressure status all have a significant effect on triglycerides. These results are in line with a recent study of elderly Japanese men [14] and with the conclusion that triglycerides are positively associated with blood pressure and with hypertension [15].
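The 70%/30% partition of the bootstrapped data into training and testing sets can be sketched as below. This is a Python illustration and not the study's R syntax. The shuffling step, the fixed seed, the rounding of the cut point, and the function name `train_test_split` are assumptions of this sketch.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle `rows` and split them into training and testing subsets.

    The cut point is round(train_fraction * len(rows)), so 100 rows with
    the default fraction give 70 training rows and 30 testing rows.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    rng = random.Random(seed)
    shuffled = list(rows)  # copy so the caller's data is left unchanged
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    bootstrap_rows = list(range(1000))  # stand-in for bootstrapped records
    train, test = train_test_split(bootstrap_rows)
    print(len(train), "training rows,", len(test), "testing rows")
```

Holding the testing rows out of model construction is what allows the prediction error reported later to reflect performance on data the model has not seen.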
The mean squared error derived from the multilayer perceptron analysis was determined after taking into account the training and test sets. It is preferable to obtain a result with the lowest PMSE value. As a result, the study was successful, and the decision-maker received the best possible results. Due to the incorporation of statistical formulations, computation using the developed R syntax, and the neuralnet package, the proposed methodology resulted in highly successful ANNs. The R neuralnet package includes the necessary components for building artificial neural networks (ANNs) with various hidden layers and neurons. The most difficult tasks are selecting appropriate input parameters, preparing the data, and standardizing it for the ANN. The purpose of this article is to develop a hybrid method that incorporates bootstrapping, qualitative predictor variables, multilayer perceptron analysis, and multiple linear regression. The R syntax for this methodology was designed to ensure that the researcher fully understands the illustration. In this study, triglycerides were the dependent variable, while waist circumference, HDL, and blood pressure were the independent variables. In the developed model, these factors emerged as the most significant. When performing multiple linear regression, the rule of qualitative predictor variables must be followed. In our study, the hybrid model leads to a conclusion that enables us to better understand the hybrid method's utility and relative contribution to the outcome. This finding has significant potential for further statistical modelling, for educational purposes, and for decision-makers among the stakeholders.
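The mean squared error used to compare candidate networks is the average of the squared differences between observed and predicted values. The sketch below is a Python illustration with made-up numbers; the function name and the example values are assumptions and do not reproduce the study's reported error.

```python
def mean_squared_error(observed, predicted):
    """Average squared difference between observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one observation is required")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


if __name__ == "__main__":
    # Hypothetical scaled triglyceride values from a held-out testing set.
    observed = [0.20, 0.35, 0.50, 0.65]
    predicted = [0.25, 0.30, 0.45, 0.80]
    print("test-set MSE:", mean_squared_error(observed, predicted))
```

When the error is computed on the testing set rather than the training set, a lower value indicates that the selected variables generalize better to unseen data.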
The proposed strategy and the acquired results demonstrate the superiority of the hybrid model technique given in this work.

Data Availability
All data are available within the manuscript.

Conflicts of Interest
The authors declare that they have no conflicts of interest.