Discrete Hyperparameter Optimization Model Based on Skewed Distribution

One of the main factors restricting the large-scale application of machine learning algorithms is the choice of hyperparameter values. Researchers have therefore developed many numerical optimization algorithms to make hyperparameter selection reliable. Building on previous studies, this study proposes a model that uses a skewed distribution (the gamma distribution) for hyperparameter fitting and combines the Bayesian estimation method with the Gauss hypergeometric function to derive a mathematically optimal solution for discrete hyperparameter selection. The results show that, under strict mathematical conditions, a reasonable expected value can be given for discrete hyperparameters. This heuristic parameter adjustment method based on prior conditions can improve the accuracy of some traditional models in experiments and thereby increase the application value of those models. An empirical study on a relevant dataset further supports the effectiveness of the proposed parameter adjustment strategy.


Introduction
The Pareto/NBD model was introduced by Schmittlein, Morrison, and Colombo. The practical problem behind the original model is the analysis of consumer behavior, especially repeat purchases of related products, so that managers can evaluate the future purchasing intentions of consumers and better arrange production and marketing activities. The model describes consumer behavior through two important components [1]. The first is the probability distribution (or probability density) of the consumer's survival over the entire product purchasing cycle; the second is the mathematical expectation of the number of purchases made by a randomly selected consumer in a given future period.
In constructing the Pareto/NBD model, the most critical step is to obtain the exact analytical expression of the conditional expectation through appropriate mathematical assumptions and rigorous reasoning [2][3][4]. This work is very difficult even for many researchers who are familiar with the SMC model. An important objective of this study is to verify some key conclusions about the Pareto/NBD model by deriving the relevant intermediate results (especially the probability distribution function and the probability density function) within the framework of the model. At the same time, the analytical framework of this model is combined with the discrete hyperparameter optimization problem of the SMBO (sequential model-based optimization) type, so that the whole model has an accurate mathematical description.
First, we need to introduce a special function, the Gauss hypergeometric function, whose power series form [5] is

$$ {}_2F_1(a, b; c; z) = \sum_{j=0}^{\infty} \frac{(a)_j (b)_j}{(c)_j} \frac{z^j}{j!}, $$

where $c \neq 0, -1, -2, \ldots$ and $(a)_j$ is the ascending factorial power (Pochhammer's symbol), $(a)_j = a(a+1)\cdots(a+j-1)$. The convergence of the series depends on the argument $z$: it converges for $|z| < 1$ and diverges for $|z| > 1$, and for $|z| = 1$ it converges if $c - a - b > 0$. The ascending factorial $(a)_j$ can be expressed as a ratio of two gamma functions:

$$ (a)_j = \frac{\Gamma(a+j)}{\Gamma(a)}. $$

With this property, the Gauss hypergeometric function can be written as

$$ {}_2F_1(a, b; c; z) = \frac{\Gamma(c)}{\Gamma(a)\Gamma(b)} \sum_{j=0}^{\infty} \frac{\Gamma(a+j)\Gamma(b+j)}{\Gamma(c+j)} \frac{z^j}{j!}. $$

This expression is still discrete, and it shows that the function is symmetric in the two parameters $a$ and $b$ [6]:

$$ {}_2F_1(a, b; c; z) = {}_2F_1(b, a; c; z). $$

The function also has a continuous representation, Euler's integral:

$$ {}_2F_1(a, b; c; z) = \frac{1}{B(b, c-b)} \int_0^1 t^{b-1} (1-t)^{c-b-1} (1-zt)^{-a} \, dt, \quad c > b > 0, $$

where $B(\cdot, \cdot)$ is the beta function.
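The relationship between the power series and Euler's integral can be checked numerically. The following is a minimal sketch in plain Python, not the implementation used in the study; the function names, the truncation tolerance, and the midpoint rule are illustrative assumptions.

```python
import math

def pochhammer(a, j):
    """Ascending factorial (a)_j = a (a+1) ... (a+j-1), with (a)_0 = 1."""
    result = 1.0
    for i in range(j):
        result *= a + i
    return result

def hyp2f1_series(a, b, c, z, tol=1e-15, max_terms=10_000):
    """Truncated power series for 2F1(a, b; c; z); only used here for |z| < 1."""
    if abs(z) >= 1:
        raise ValueError("this sketch only sums the series for |z| < 1")
    term, total = 1.0, 1.0
    for j in range(max_terms):
        # ratio of consecutive series terms: (a+j)(b+j) / ((c+j)(j+1)) * z
        term *= (a + j) * (b + j) / ((c + j) * (j + 1)) * z
        total += term
        if abs(term) < tol * abs(total):
            break
    return total

def hyp2f1_euler(a, b, c, z, n=20_000):
    """Midpoint-rule approximation of Euler's integral (requires c > b > 0)."""
    beta = math.gamma(b) * math.gamma(c - b) / math.gamma(c)
    h = 1.0 / n
    acc = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        acc += t ** (b - 1) * (1 - t) ** (c - b - 1) * (1 - z * t) ** (-a)
    return acc * h / beta

series_value = hyp2f1_series(2.5, 2.0, 5.0, 0.4)
euler_value = hyp2f1_euler(2.5, 2.0, 5.0, 0.4)
```

The two values should agree up to the quadrature error, and swapping the first two arguments of `hyp2f1_series` should leave the result unchanged, which mirrors the symmetry in a and b.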

Model Assumptions
Many aspects of the BG/NBD model are similar to the Pareto/NBD model, but there are important differences between the two.
The most important difference is when the hyperparameters are allowed to stop changing across experiments/tests. In the Pareto/NBD model, this can happen at any point in time, independently of when datasets are selected or changed. In the BG/NBD model, we assume that whether the hyperparameters keep changing is decided immediately after each new dataset test; that is, the continuous decision interval of the Pareto/NBD model is replaced by a discrete decision interval, which yields the beta-geometric (BG) model [7]. Overall, the new model requires the following five basic assumptions.
(1) While dataset testing still influences the hyperparameter, the number of hyperparameter changes follows a Poisson process with rate $\lambda$; that is,

$$ P(\Theta(d_t) = x \mid \lambda) = \frac{(\lambda d_t)^x e^{-\lambda d_t}}{x!}, \quad x = 0, 1, 2, \ldots, $$

where $\Theta(d_t)$ is the number of changes up to $d_t$ and $d_t = \| D_t - \emptyset \|$ represents the difference between the dataset and the empty set, which can be measured by the complexity or the size of the dataset. For a Poisson process, the interval between two consecutive dataset experiments follows an exponential distribution with parameter $\lambda$:

$$ g(d_j \mid d_{j-1}; \lambda) = \lambda \exp(-\lambda (d_j - d_{j-1})), \quad d_j > d_{j-1} \geq 0, $$

where $g$ stands for the probability density function.

(2) The parameter $\lambda$ follows a gamma distribution with parameters $r$ and $\alpha$:

$$ g(\lambda \mid r, \alpha) = \frac{\alpha^r \lambda^{r-1} \exp(-\lambda \alpha)}{\Gamma(r)}, \quad \lambda > 0, $$

where $r$ and $\alpha$ are known constants that are determined by exogenous factors and are not affected by the model itself.

(3) After each experiment/test, the hyperparameter $\theta$ stops changing with probability $p$. In other words, after any experiment/test, a judgment is made on whether $\theta$ will change again. The point at which $\theta$ stops changing therefore follows a geometric distribution with parameter $p$:

$$ P(\theta \text{ stops changing immediately after the } j\text{th experiment/test}) = p (1-p)^{j-1}, \quad j = 1, 2, 3, \ldots, $$

where $j$ is the number of experiments/tests.

(4) To ensure that the posterior distribution involving the hyperparameters has an analytic solution under Bayesian estimation, the parameter $p$ is also assumed to follow an exogenously given probability distribution, namely the beta distribution

$$ g(p \mid a, b) = \frac{p^{a-1} (1-p)^{b-1}}{B(a, b)}, \quad 0 \leq p \leq 1, $$

where $a$ and $b$ are known constants that are determined by exogenous factors and are not affected by the model itself.

(5) To eliminate the interaction between parameters and simplify the overall analysis, the parameters $\lambda$ and $p$ are assumed to be independent; that is, as the experiments/tests progress, $\lambda$ and $p$ do not affect each other.
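The five assumptions define a generative process, which can be sketched as a small simulation. This is an illustrative sketch only: the function name, the parameter values, and the horizon `d_T` are hypothetical, and the gamma distribution is taken in the shape/rate form with rate α.

```python
import random

def simulate_history(r, alpha, a, b, d_T, rng):
    """Draw one change history (d_1, ..., d_x) on (0, d_T] under assumptions (1)-(5)."""
    # (2) lambda ~ gamma(shape r, rate alpha); gammavariate expects a scale, so pass 1/alpha
    lam = rng.gammavariate(r, 1.0 / alpha)
    # (4) p ~ beta(a, b), drawn independently of lambda as required by (5)
    p = rng.betavariate(a, b)
    change_points, current = [], 0.0
    while True:
        # (1) exponential gap with rate lambda between consecutive changes
        current += rng.expovariate(lam)
        if current > d_T:
            break
        change_points.append(current)
        # (3) after each change, theta stops changing with probability p
        if rng.random() < p:
            break
    return lam, p, change_points

rng = random.Random(0)
lam, p, points = simulate_history(r=2.0, alpha=4.0, a=0.8, b=2.5, d_T=39.0, rng=rng)
```

Only the number of changes, the last change point, and the horizon are needed later, so a simulated history can be summarized as `(len(points), points[-1] if points else 0.0, 39.0)`.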

Derivation of Likelihood Function.
Consider a hyperparameter $\theta$ that has changed $x$ times in the known experimental/test interval $(0, d_T]$. These changes occur at the points $d_1, d_2, \ldots, d_x$ within the test interval, with $0 < d_1 < d_2 < \cdots < d_x \leq d_T$ on the number line.
To derive a complete likelihood function of the relevant parameters [8], we proceed in the following steps:

(1) The first change of the hyperparameter $\theta$ at $d_1$ follows the standard exponential model, so its likelihood is $\lambda \exp(-\lambda d_1)$.

(2) The likelihood that the second change occurs at $d_2$ is the probability that $\theta$ is still in the changing state after $d_1$ multiplied by the standard exponential likelihood of the interval between $d_1$ and $d_2$:

$$ (1-p) \lambda \exp(-\lambda (d_2 - d_1)). $$

(3) More generally, the likelihood that the $x$th change occurs at $d_x$ equals the probability that $\theta$ is still in the changing state after $d_{x-1}$ multiplied by the standard exponential likelihood:

$$ (1-p) \lambda \exp(-\lambda (d_x - d_{x-1})). $$

(4) Finally, to complete the likelihood function, we must account for the interval $(d_x, d_T]$ in which no change is observed. We assume that $\theta$ is in the changing state at the start of the experiment/test (the change itself may have zero or nonzero step length). The probability that $\theta$ stays constant over $(d_x, d_T]$ is then composed of two parts: either $\theta$ stopped changing immediately after the $x$th change, or $\theta$ remained in the changing state but no change happened to occur in $(d_x, d_T]$:

$$ p + (1-p) \exp(-\lambda (d_T - d_x)). $$

We can then obtain the likelihood function of $d_1, d_2, \ldots, d_x, d_T$ when $\lambda$ and $p$ are known:

$$ L(\lambda, p \mid d_1, \ldots, d_x, d_T) = \lambda e^{-\lambda d_1} \, (1-p) \lambda e^{-\lambda (d_2 - d_1)} \cdots (1-p) \lambda e^{-\lambda (d_x - d_{x-1})} \left\{ p + (1-p) e^{-\lambda (d_T - d_x)} \right\}. $$

As in the Pareto/NBD model above, the change in the hyperparameter $\theta$ is uncertain and random, and the prediction of the overall hyperparameter change depends on the history of the experiment/test only through the summary $(\Theta = x, d_x, d_T)$:

$$ L(\lambda, p \mid \Theta = x, d_x, d_T) = p (1-p)^{x-1} \lambda^x e^{-\lambda d_x} + (1-p)^x \lambda^x e^{-\lambda d_T}. $$

At the zero point of the initial experiment/test preparation stage, $\theta$ is in the changing state (only the step size may differ). The likelihood that $\theta$ makes no change over the whole history $(0, d_T]$ then follows directly from the exponential distribution:

$$ L(\lambda \mid \Theta = 0, d_T) = e^{-\lambda d_T}. $$

Combining both cases, the likelihood function for a single hyperparameter $\theta$ is

$$ L(\lambda, p \mid \Theta = x, d_x, d_T) = (1-p)^x \lambda^x e^{-\lambda d_T} + I_{x>0} \, p (1-p)^{x-1} \lambda^x e^{-\lambda d_x}, $$

where $I_{x>0} = 1$ when $x > 0$ and $I_{x>0} = 0$ otherwise.
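The step-by-step construction and the compact likelihood can be compared directly. The sketch below assumes the standard BG/NBD form of the likelihood given λ and p; the function names and the example values are illustrative, not taken from the study.

```python
import math

def likelihood_lambda_p(lam, p, x, d_x, d_T):
    """Compact form of L(lambda, p | Theta = x, d_x, d_T)."""
    still_changing = (1 - p) ** x * lam ** x * math.exp(-lam * d_T)
    if x == 0:
        return still_changing  # reduces to exp(-lam * d_T)
    stopped = p * (1 - p) ** (x - 1) * lam ** x * math.exp(-lam * d_x)
    return still_changing + stopped

def likelihood_from_steps(lam, p, change_points, d_T):
    """Same quantity built term by term from steps (1)-(4)."""
    if not change_points:
        return math.exp(-lam * d_T)
    value, previous = 1.0, 0.0
    for k, d in enumerate(change_points):
        # from the second change on, theta must have stayed in the changing state
        survive = 1.0 if k == 0 else (1 - p)
        value *= survive * lam * math.exp(-lam * (d - previous))
        previous = d
    # no change in (d_x, d_T]: stopped at d_x, or still changing but silent
    value *= p + (1 - p) * math.exp(-lam * (d_T - previous))
    return value

points = [1.0, 2.5, 4.0]  # hypothetical change points
compact = likelihood_lambda_p(0.8, 0.3, len(points), points[-1], 6.0)
stepwise = likelihood_from_steps(0.8, 0.3, points, 6.0)
```

Both functions should return the same value, which illustrates that only the count x, the last change point d_x, and the horizon d_T enter the likelihood.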

Derivation of the Probability Distribution of the Hyperparameter θ.
For the probability distribution of the hyperparameter $\theta$, let $\Theta(d_t)$ denote the number of changes of $\theta$ in an experiment/test interval of length $d_t$, so that $\Theta(d_t)$ is a random variable. The basic relationship between the experiment/test interval and the hyperparameter $\theta$ is

$$ \Theta(d_t) \geq x \iff T_x \leq d_t, $$

where $T_x$ is the point at which the $x$th change of $\theta$ occurs. From this relationship we obtain

$$ P(\Theta(d_t) = x) = P(\Theta(d_t) \geq x) - P(\Theta(d_t) \geq x+1) = P(T_x \leq d_t) - P(T_{x+1} \leq d_t). $$

The intervals between changes of $\theta$ follow an exponential distribution (see the model assumptions), so the distribution of the random variable $\Theta(d_t)$ is built from two parts: the probability of exactly $x$ changes while $\theta$ is still in the changing state, which is a Poisson probability, and the probability $P(T_x \leq d_t)$ that the $x$th change has arrived by $d_t$, which is the Erlang-$x$ arrival time distribution. We can therefore derive the distribution function of the random variable $\Theta(d_t)$ given $\lambda$ and $p$ [9]:

$$ P(\Theta(d_t) = x \mid \lambda, p) = (1-p)^x \frac{(\lambda d_t)^x e^{-\lambda d_t}}{x!} + I_{x>0} \, p (1-p)^{x-1} \left[ 1 - e^{-\lambda d_t} \sum_{j=0}^{x-1} \frac{(\lambda d_t)^j}{j!} \right]. $$
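A quick way to sanity-check a distribution of this kind is to confirm that its probabilities sum to one. The sketch below assumes the standard BG/NBD form (a Poisson term for the case in which θ is still changing plus an Erlang term for the case in which it has stopped); the function name and parameter values are illustrative.

```python
import math

def prob_changes(x, lam, p, d_t):
    """P(Theta(d_t) = x | lambda, p), assuming the standard BG/NBD form."""
    mu = lam * d_t
    poisson_x = mu ** x * math.exp(-mu) / math.factorial(x)
    value = (1 - p) ** x * poisson_x
    if x > 0:
        # Erlang-x CDF: probability that the x-th change has arrived by d_t
        erlang_cdf = 1.0 - math.exp(-mu) * sum(mu ** j / math.factorial(j) for j in range(x))
        value += p * (1 - p) ** (x - 1) * erlang_cdf
    return value

lam, p, d_t = 0.8, 0.3, 5.0
total = sum(prob_changes(x, lam, p, d_t) for x in range(100))
mean = sum(x * prob_changes(x, lam, p, d_t) for x in range(100))
```

With these values `total` should be numerically one, and `mean` should match the closed-form expectation $(1 - e^{-\lambda p d_t})/p$ of the same model.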

Derivation of Mathematical Expectations for the Hyperparameter θ.
Since changes in the hyperparameter $\theta$ follow a Poisson process with rate $\lambda$, the expected number of changes $E[\Theta(d_t)]$ within a given experimental/test interval is $\lambda d_t$, provided that $\theta$ is still in the changing state at $d_t$. If instead $\theta$ stops changing at some point $d_\tau \leq d_t$, no further changes occur after $d_\tau$, and the expected number of changes in $(0, d_\tau]$ is $\lambda d_\tau$. The probability that the stopping point $d_\tau$ lies beyond $d_t$ follows from the conditional probability model in $\lambda$ and $p$:

$$ P(d_\tau > d_t \mid \lambda, p) = \sum_{j=0}^{\infty} (1-p)^j \frac{(\lambda d_t)^j e^{-\lambda d_t}}{j!} = e^{-\lambda p d_t}. $$

This expression gives the probability density function of the stopping point $d_\tau$, namely $g(d_\tau \mid \lambda, p) = \lambda p \exp(-\lambda p d_\tau)$. It should be highlighted that the stopping point has an exponential distribution whose rate depends on the stopping probability $p$ applied after each experiment/test. In the Pareto/NBD model, by contrast, the stopping point is specified by its own probability density function, which carries no information about the probability $p$. We can then work out the conditional expectation of $\Theta(d_t)$ when $\lambda$ and $p$ are known:

$$ E[\Theta(d_t) \mid \lambda, p] = \lambda d_t \, P(d_\tau > d_t \mid \lambda, p) + \int_0^{d_t} \lambda \tau \, g(\tau \mid \lambda, p) \, d\tau. \tag{20} $$

Substituting the intermediate results into the above formula and simplifying gives a very concise formula:

$$ E[\Theta(d_t) \mid \lambda, p] = \frac{1}{p} - \frac{1}{p} e^{-\lambda p d_t}. $$
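The simplification can be verified numerically by evaluating the two parts of the expectation separately. This is a minimal sketch under the assumption that the stopping point has density $\lambda p e^{-\lambda p \tau}$; the function names and the midpoint rule are illustrative choices.

```python
import math

def expected_changes(lam, p, d_t):
    """Closed form E[Theta(d_t) | lambda, p] = (1 - exp(-lam * p * d_t)) / p."""
    return -math.expm1(-lam * p * d_t) / p

def expected_changes_numeric(lam, p, d_t, n=100_000):
    """Same expectation from its two parts, using a midpoint rule for the integral."""
    # theta still changing at d_t: probability exp(-lam*p*d_t), expected count lam*d_t
    still_changing = lam * d_t * math.exp(-lam * p * d_t)
    # theta stopped at tau <= d_t: density lam*p*exp(-lam*p*tau), expected count lam*tau
    h = d_t / n
    stopped = 0.0
    for i in range(n):
        tau = (i + 0.5) * h
        stopped += lam * tau * lam * p * math.exp(-lam * p * tau)
    return still_changing + stopped * h

closed = expected_changes(0.8, 0.3, 5.0)
numeric = expected_changes_numeric(0.8, 0.3, 5.0)
```

As p approaches zero the closed form tends to $\lambda d_t$, the mean of a plain Poisson process, which is a useful limiting check.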

Derivation of the Transcendental Posterior Expectation.
The previous section derived the expectation $E[\Theta(d_t) \mid \lambda, p]$ of the hyperparameter $\theta$ for known parameters $\lambda$ and $p$, that is, for a known arrival rate $\lambda$ of the test data and a known probability $p$. According to the earlier assumptions, however, $\lambda$ and $p$ are themselves random variables whose distribution families and parameters have been specified. This part therefore treats the case in which $\lambda$ follows a gamma distribution and $p$ follows a beta distribution, that is, it derives the expectation of $\Theta(d_t)$ in terms of the parameters $r, \alpha, a, b$ when assumptions (2) and (4) hold. The whole derivation is relatively complex, so the proofs are given in the appendix. This section lists the main results of the derivation.
(1) In Section 3.1 we derived the likelihood function of the parameters $\lambda$ and $p$ when the hyperparameter history and the whole experimental/test interval are known, in the form of formula (3). We now substitute the assumptions that $\lambda$ follows a gamma distribution with parameters $r, \alpha$ and that $p$ follows a beta distribution with parameters $a, b$ into that formula. Note that the whole parameter set $(r, \alpha, a, b)$ is treated as given. For a known history $(\Theta = x, d_x, d_T)$ over the experiment/test cycle, the likelihood function is

$$ L(r, \alpha, a, b \mid \Theta = x, d_x, d_T) = \frac{B(a, b+x)}{B(a, b)} \frac{\Gamma(r+x) \alpha^r}{\Gamma(r) (\alpha + d_T)^{r+x}} + I_{x>0} \frac{B(a+1, b+x-1)}{B(a, b)} \frac{\Gamma(r+x) \alpha^r}{\Gamma(r) (\alpha + d_x)^{r+x}}. $$

All four parameters $(r, \alpha, a, b)$ in the above formula can be estimated by maximum likelihood. For $N$ observed histories $(\Theta_i = x_i, d_{x_i}, d_{T_i})$ over the known experiment/test cycle, the log-likelihood function is

$$ LL(r, \alpha, a, b) = \sum_{i=1}^{N} \ln L(r, \alpha, a, b \mid \Theta_i = x_i, d_{x_i}, d_{T_i}). $$

This function can be maximized with general numerical optimization methods, by locating the stationary point of the first derivative and solving for the extreme value, which presumes that the negative log-likelihood is convex.

(2) The conditional probability distribution of the hyperparameter $\theta$ has the form shown in formula (4). Since the distributions of $\lambda$ and $p$ have been specified and their parameters determined, formula (4) can be expressed as a probability distribution conditional on the known parameter set $(r, \alpha, a, b)$:

$$ P(\Theta(d_t) = x \mid r, \alpha, a, b) = \frac{B(a, b+x)}{B(a, b)} \frac{\Gamma(r+x)}{\Gamma(r) \, x!} \left( \frac{\alpha}{\alpha + d_t} \right)^r \left( \frac{d_t}{\alpha + d_t} \right)^x + I_{x>0} \frac{B(a+1, b+x-1)}{B(a, b)} \left[ 1 - \left( \frac{\alpha}{\alpha + d_t} \right)^r \sum_{j=0}^{x-1} \frac{\Gamma(r+j)}{\Gamma(r) \, j!} \left( \frac{d_t}{\alpha + d_t} \right)^j \right]. $$

(3) Finally, starting from the conditional expectation of the hyperparameter $\theta$ given $\lambda$ and $p$ in formula (5), and using the gamma distribution of $\lambda$ and the beta distribution of $p$, we obtain the expectation of $\theta$ for any experiment/test interval length $d_t$ when the parameter set $(r, \alpha, a, b)$ is known, which can also be called the posterior conditional expectation:

$$ E[\Theta(d_t) \mid r, \alpha, a, b] = \frac{a+b-1}{a-1} \left[ 1 - \left( \frac{\alpha}{\alpha + d_t} \right)^r {}_2F_1\!\left( r, b; a+b-1; \frac{d_t}{\alpha + d_t} \right) \right], $$

where ${}_2F_1(\cdot)$ is the Gauss hypergeometric function; see the proof for the derivation and the model introduction for the definition of the function.
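The marginal likelihood in item (1) and the log-likelihood built from it can be sketched as follows, assuming the standard BG/NBD form. The function names and the `(x, d_x, d_T)` tuple layout are hypothetical, and log-gamma is used only to avoid overflow for large arguments.

```python
import math

def log_beta(u, v):
    """Logarithm of the beta function B(u, v)."""
    return math.lgamma(u) + math.lgamma(v) - math.lgamma(u + v)

def marginal_likelihood(r, alpha, a, b, x, d_x, d_T):
    """L(r, alpha, a, b | Theta = x, d_x, d_T), assuming the standard BG/NBD form."""
    log_common = math.lgamma(r + x) - math.lgamma(r) + r * math.log(alpha) - log_beta(a, b)
    still_changing = math.exp(log_common + log_beta(a, b + x) - (r + x) * math.log(alpha + d_T))
    if x == 0:
        return still_changing
    stopped = math.exp(log_common + log_beta(a + 1, b + x - 1) - (r + x) * math.log(alpha + d_x))
    return still_changing + stopped

def log_likelihood(params, histories):
    """Sum of log-likelihoods over histories given as (x, d_x, d_T) tuples."""
    r, alpha, a, b = params
    return sum(math.log(marginal_likelihood(r, alpha, a, b, x, d_x, d_T))
               for x, d_x, d_T in histories)

histories = [(0, 0.0, 39.0), (1, 12.0, 39.0), (3, 30.5, 39.0)]  # hypothetical data
value = log_likelihood((2.0, 4.0, 3.0, 2.5), histories)
```

Any general-purpose numerical optimizer could then be applied to the negative of `log_likelihood` over positive values of the four parameters; the choice of optimizer is left open here.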
It is important to note that the final expression of the posterior conditional expectation of the hyperparameter $\theta$ requires a value of the Gauss hypergeometric function; this value does not depend on $\theta$ itself. Because the parameter set $(r, \alpha, a, b)$ can be obtained by conventional maximum-likelihood estimation and the experiment/test interval length is also known, the value of the Gauss hypergeometric function is fully determined. Compared with other estimators, expressing the posterior conditional expectation of $\theta$ through the Gauss hypergeometric function is more direct and simple, and the function can be approximated by a truncated series (a polynomial in its argument). This is very convenient for later simulation: it greatly reduces the computational complexity and allows the corresponding parameter selection to be solved in a timely manner.
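The truncated-series evaluation mentioned above can be sketched as follows. The code assumes the standard BG/NBD closed form for the expectation and a > 1, and it checks the result against a direct numerical integral over p; all names and parameter values are illustrative.

```python
import math

def hyp2f1(a, b, c, z, tol=1e-15, max_terms=100_000):
    """Truncated power series for 2F1(a, b; c; z), used here only for 0 <= z < 1."""
    term, total = 1.0, 1.0
    for j in range(max_terms):
        term *= (a + j) * (b + j) / ((c + j) * (j + 1)) * z
        total += term
        if abs(term) < tol * abs(total):
            break
    return total

def expected_changes(r, alpha, a, b, d_t):
    """E[Theta(d_t) | r, alpha, a, b] via the hypergeometric closed form (a > 1)."""
    z = d_t / (alpha + d_t)
    shrink = (alpha / (alpha + d_t)) ** r
    return (a + b - 1) / (a - 1) * (1 - shrink * hyp2f1(r, b, a + b - 1, z))

def expected_changes_numeric(r, alpha, a, b, d_t, n=200_000):
    """Midpoint-rule integral over p after integrating lambda out analytically."""
    beta_ab = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    h, acc = 1.0 / n, 0.0
    for i in range(n):
        p = (i + 0.5) * h
        given_p = (1 - (alpha / (alpha + p * d_t)) ** r) / p  # E[Theta | r, alpha, p]
        acc += given_p * p ** (a - 1) * (1 - p) ** (b - 1)
    return acc * h / beta_ab

closed = expected_changes(2.0, 4.0, 3.0, 2.5, 6.0)
numeric = expected_changes_numeric(2.0, 4.0, 3.0, 2.5, 6.0)
```

Because the argument $d_t/(\alpha + d_t)$ always lies in [0, 1), the series converges, although more terms are needed as the argument approaches one.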
Of course, the above is a probability model of how the hyperparameter $\theta$ takes its values, based only on the known summary information $(\Theta = x, d_x, d_T)$. Beyond accommodating as many known conditions and as much historical information as possible in order to reproduce the structure of past data, the final purpose of establishing the BG/NBD model is prediction: given a new experiment/test dataset, the model should use the known datasets and their relationship with the parameter $\theta$ to give an approximate estimate of the parameter on the new dataset [10]. Predicting the hyperparameter for a new dataset is the key application of the whole model and our main concern.
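Once the history $(\Theta = x, d_x, d_T)$ of the experiment/test has been observed, we can give an approximate posterior conditional expectation of the hyperparameter $\theta$ over a new experiment/test interval of length $d_t$. The detailed derivation is given in the proof below; here we give the concrete mathematical expression of this posterior conditional expectation:

$$ E[\Upsilon(d_t) \mid \Theta = x, d_x, d_T, r, \alpha, a, b] = \frac{ \dfrac{a+b+x-1}{a-1} \left[ 1 - \left( \dfrac{\alpha + d_T}{\alpha + d_T + d_t} \right)^{r+x} {}_2F_1\!\left( r+x, b+x; a+b+x-1; \dfrac{d_t}{\alpha + d_T + d_t} \right) \right] }{ 1 + I_{x>0} \, \dfrac{a}{b+x-1} \left( \dfrac{\alpha + d_T}{\alpha + d_x} \right)^{r+x} }, $$

where $\Upsilon(d_t)$ is the number of changes of $\theta$ in the new interval $(d_T, d_T + d_t]$, $(\Theta = x, d_x, d_T)$ stands for the known experimental/test information, $(r, \alpha, a, b)$ stands for the parameter set obtained by calculation, and ${}_2F_1(\cdot)$ stands for the Gauss hypergeometric function.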
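A prediction for a new interval can be sketched in a few lines once the four parameters have been estimated. The code below assumes the standard BG/NBD conditional expectation and a > 1; the function names and the example history are hypothetical.

```python
import math

def hyp2f1(a, b, c, z, tol=1e-15, max_terms=100_000):
    """Truncated power series for 2F1(a, b; c; z), used here only for 0 <= z < 1."""
    term, total = 1.0, 1.0
    for j in range(max_terms):
        term *= (a + j) * (b + j) / ((c + j) * (j + 1)) * z
        total += term
        if abs(term) < tol * abs(total):
            break
    return total

def conditional_expected_changes(r, alpha, a, b, x, d_x, d_T, d_t):
    """Expected number of changes in (d_T, d_T + d_t] given the history (x, d_x, d_T)."""
    z = d_t / (alpha + d_T + d_t)
    shrink = ((alpha + d_T) / (alpha + d_T + d_t)) ** (r + x)
    numerator = (a + b + x - 1) / (a - 1) * (
        1 - shrink * hyp2f1(r + x, b + x, a + b + x - 1, z)
    )
    denominator = 1.0
    if x > 0:
        # weight of the scenario in which theta already stopped changing at d_x
        denominator += a / (b + x - 1) * ((alpha + d_T) / (alpha + d_x)) ** (r + x)
    return numerator / denominator

recent = conditional_expected_changes(2.0, 4.0, 3.0, 2.5, x=3, d_x=35.0, d_T=39.0, d_t=10.0)
stale = conditional_expected_changes(2.0, 4.0, 3.0, 2.5, x=3, d_x=5.0, d_T=39.0, d_t=10.0)
```

With the same number of past changes, a history whose last change is recent (`recent`) yields a larger prediction than one whose last change lies far back (`stale`), because a long silent period makes it more likely that θ has already stopped changing.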
Once again, the formula uses the Gauss hypergeometric function in the same form as before. Its parameters are all known, so its value can be computed precisely. To reduce the computational burden during simulation, the function is in practice approximated by a polynomial, which is inexpensive to evaluate. The remaining parts of the formula are simple numerical calculations and add no significant computational burden.

The appendix focuses on the mathematical description of the key steps in the model derivation. On the one hand, mathematical rigour supports the credibility of the model; on the other hand, it provides the mathematical basis for transforming the mathematical model into a computer model. The main goals are the derivation of the expectation $E[\Theta(d_t)]$ of the hyperparameter $\theta$ and, ultimately, of the predictive posterior expectation of $\theta$. The key intermediate tool in these derivations is the Euler integral for the Gauss hypergeometric function:

$$ {}_2F_1(a, b; c; z) = \frac{1}{B(b, c-b)} \int_0^1 t^{b-1} (1-t)^{c-b-1} (1-zt)^{-a} \, dt, \quad c > b. $$

Derivation of Mathematical Expectation E[Θ(d t )] for the Hyperparameter θ.
To get the mathematical expectation $E[\Theta(d_t)]$ of the hyperparameter $\theta$, we use the conditional expectation of $\theta$ for known $\lambda$ and $p$ together with the assumptions that $\lambda$ follows a gamma distribution with parameters $r, \alpha$ and that $p$ follows a beta distribution with parameters $a, b$. First, we substitute the probability density $g(\lambda \mid r, \alpha) = \alpha^r \lambda^{r-1} \exp(-\lambda \alpha) / \Gamma(r)$, $\lambda > 0$, into formula (4) and integrate over $\lambda$ to get a preliminary expression:

$$ E[\Theta(d_t) \mid r, \alpha, p] = \frac{1}{p} - \frac{\alpha^r}{p (\alpha + p d_t)^r}. $$

Next, we take the expectation over the beta probability density $g(p \mid a, b) = p^{a-1} (1-p)^{b-1} / B(a, b)$, $0 \leq p \leq 1$. We first calculate the definite integral

$$ \int_0^1 \frac{1}{p} \frac{p^{a-1} (1-p)^{b-1}}{B(a, b)} \, dp = \frac{B(a-1, b)}{B(a, b)} = \frac{a+b-1}{a-1}. $$

A more complex definite integral is then needed:

$$ \int_0^1 \frac{\alpha^r}{p (\alpha + p d_t)^r} \frac{p^{a-1} (1-p)^{b-1}}{B(a, b)} \, dp. $$

To bring this integral into the form of the Gauss hypergeometric function, we make the substitution $q = 1 - p$, so that $dp = -dq$. The integral becomes

$$ \frac{\alpha^r}{B(a, b)} \int_0^1 q^{b-1} (1-q)^{a-2} (\alpha + d_t - q d_t)^{-r} \, dq = \left( \frac{\alpha}{\alpha + d_t} \right)^r \frac{1}{B(a, b)} \int_0^1 q^{b-1} (1-q)^{a-2} \left( 1 - \frac{d_t}{\alpha + d_t} q \right)^{-r} dq. $$

Applying the Euler integral form of the Gauss hypergeometric function with parameters $(r, b; a+b-1; d_t / (\alpha + d_t))$, the definite integral can be written as

$$ \left( \frac{\alpha}{\alpha + d_t} \right)^r \frac{B(a-1, b)}{B(a, b)} \, {}_2F_1\!\left( r, b; a+b-1; \frac{d_t}{\alpha + d_t} \right), $$

where ${}_2F_1(\cdot)$ is the Gauss hypergeometric function.
Finally, the expectation of the hyperparameter $\theta$ is obtained:

$$ E[\Theta(d_t) \mid r, \alpha, a, b] = \frac{a+b-1}{a-1} \left[ 1 - \left( \frac{\alpha}{\alpha + d_t} \right)^r {}_2F_1\!\left( r, b; a+b-1; \frac{d_t}{\alpha + d_t} \right) \right]. \tag{33} $$

If the hyperparameter $\theta$ is still in the changing state at $d_T$, then the conditional expectation of the corresponding random variable $\Upsilon(d_t)$, the number of changes in the following interval of length $d_t$, can be expressed as (A1)

$$ E[\Upsilon(d_t) \mid \lambda, p] = \frac{1}{p} - \frac{1}{p} e^{-\lambda p d_t}. $$

Derivation of the Conditional Expectation
Next, we need the probability that the hyperparameter $\theta$ is still in the changing state at the start of the new experiment/test sequence. According to the assumption in Section 2, every experiment/test sequence starts in the changing state, which ensures that the model is defined on a complete probability space. Hence, if no change has been observed, $\theta$ is still in the changing state at $d_T$ with probability one:

$$ P(\text{changing at } d_T \mid \Theta = 0, d_T, \lambda, p) = 1. $$

For the experiment/test interval $(0, d_T]$ with $x > 0$, the conditional probability that $\theta$ is still in the changing state, given the history $(\Theta = x, d_x, d_T)$ and the parameters $\lambda$ and $p$, is

$$ P(\text{changing at } d_T \mid \Theta = x, d_x, d_T, \lambda, p) = \frac{(1-p) e^{-\lambda (d_T - d_x)}}{p + (1-p) e^{-\lambda (d_T - d_x)}}. \tag{36} $$

Here the numerator is the probability that $\theta$ remains in the changing state after $d_x$ but makes no change in the interval $(d_x, d_T]$, and the denominator is the total probability that no change is observed from $d_x$ to the end of the history, as in assumption (4) in Section 3.1.
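Here we apply a small trick of conditional probability: we multiply the above expression by "1", written as

$$ \frac{(1-p)^{x-1} \lambda^x e^{-\lambda d_x}}{(1-p)^{x-1} \lambda^x e^{-\lambda d_x}} = 1, $$

which gives an important intermediate result (A2):

$$ P(\text{changing at } d_T \mid \Theta = x, d_x, d_T, \lambda, p) = \frac{(1-p)^x \lambda^x e^{-\lambda d_T}}{L(\lambda, p \mid \Theta = x, d_x, d_T)}, $$

where $L(\lambda, p \mid \Theta = x, d_x, d_T)$ is the likelihood function given in formula (3). Note that at $x = 0$ the A2 formula equals 1.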
We multiply the two important intermediate results A1 and A2 to obtain (A3):

$$ E[\Upsilon(d_t) \mid \Theta = x, d_x, d_T, \lambda, p] = \frac{(1-p)^x \lambda^x e^{-\lambda d_T} \left( \frac{1}{p} - \frac{1}{p} e^{-\lambda p d_t} \right)}{L(\lambda, p \mid \Theta = x, d_x, d_T)} = \frac{p^{-1} (1-p)^x \lambda^x e^{-\lambda d_T} - p^{-1} (1-p)^x \lambda^x e^{-\lambda (d_T + p d_t)}}{L(\lambda, p \mid \Theta = x, d_x, d_T)}. $$

It should be noted that A1 applies only when $\theta$ is still in the changing state at $d_T$. This is the premise for the existence of the posterior probability and the posterior expectation on the new interval $(d_T, d_T + d_t]$: the history must connect to the new interval through a hyperparameter $\theta$ that can still change. Because the above result involves the parameters $\lambda$ and $p$, it is still an incomplete expression of the posterior expectation of the random variable $\Upsilon(d_t)$. Therefore, according to the model assumptions in Section 2, $g(\lambda \mid r, \alpha) = \alpha^r \lambda^{r-1} \exp(-\lambda \alpha) / \Gamma(r)$, $\lambda > 0$, and $g(p \mid a, b) = p^{a-1} (1-p)^{b-1} / B(a, b)$, $0 \leq p \leq 1$. Since both probability densities are continuous, we can integrate A3 over them, which brings the uncertainty about $\lambda$ and $p$ into the posterior expectation and expresses it explicitly through the parameter set $(r, \alpha, a, b)$ (A4):

$$ E[\Upsilon(d_t) \mid \Theta = x, d_x, d_T, r, \alpha, a, b] = \int_0^1 \int_0^\infty E[\Upsilon(d_t) \mid \Theta = x, d_x, d_T, \lambda, p] \; g(\lambda, p \mid \Theta = x, d_x, d_T, r, \alpha, a, b) \, d\lambda \, dp, $$

where $g(\lambda, p \mid \Theta = x, d_x, d_T, r, \alpha, a, b)$ is the joint posterior probability density of the parameters $\lambda$ and $p$.
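According to the basic theory of Bayesian estimation and Bayesian optimization, and given that the parameters $\lambda$ and $p$ are independent, their joint posterior probability density can be expressed as (A5)

$$ g(\lambda, p \mid \Theta = x, d_x, d_T, r, \alpha, a, b) = \frac{L(\lambda, p \mid \Theta = x, d_x, d_T) \, g(\lambda \mid r, \alpha) \, g(p \mid a, b)}{L(r, \alpha, a, b \mid \Theta = x, d_x, d_T)}. $$

Substituting the key intermediate results A5 and A3 into formula A4, the posterior expectation of the hyperparameter $\theta$ can be further deduced (A6):

$$ E[\Upsilon(d_t) \mid \Theta = x, d_x, d_T, r, \alpha, a, b] = \frac{\mathsf{A} - \mathsf{B}}{L(r, \alpha, a, b \mid \Theta = x, d_x, d_T)}, $$

where the two quantities $\mathsf{A}$ and $\mathsf{B}$ (not to be confused with the beta function $B(\cdot, \cdot)$) represent the following two results.

For $\mathsf{A}$ (A7),

$$ \mathsf{A} = \int_0^1 \int_0^\infty p^{-1} (1-p)^x \lambda^x e^{-\lambda d_T} \, g(\lambda \mid r, \alpha) \, g(p \mid a, b) \, d\lambda \, dp = \frac{B(a-1, b+x)}{B(a, b)} \frac{\Gamma(r+x) \alpha^r}{\Gamma(r) (\alpha + d_T)^{r+x}}. $$

For $\mathsf{B}$,

$$ \mathsf{B} = \int_0^1 \int_0^\infty p^{-1} (1-p)^x \lambda^x e^{-\lambda (d_T + p d_t)} \, g(\lambda \mid r, \alpha) \, g(p \mid a, b) \, d\lambda \, dp = \frac{\Gamma(r+x) \alpha^r}{\Gamma(r) B(a, b)} \int_0^1 p^{a-2} (1-p)^{b+x-1} (\alpha + d_T + p d_t)^{-(r+x)} \, dp. $$

To match the Euler integral form of the Gauss hypergeometric function, we again substitute $q = 1 - p$, with the corresponding change $dp = -dq$ of the differential. The result is

$$ \mathsf{B} = \frac{\Gamma(r+x) \alpha^r}{\Gamma(r) B(a, b) (\alpha + d_T + d_t)^{r+x}} \int_0^1 q^{b+x-1} (1-q)^{a-2} \left( 1 - \frac{d_t}{\alpha + d_T + d_t} q \right)^{-(r+x)} dq. \tag{44} $$

Using the Euler integral form of the Gauss hypergeometric function with parameters $(r+x, b+x; a+b+x-1; d_t / (\alpha + d_T + d_t))$, which correspond to the terms of the integral above, we get (A8)

$$ \mathsf{B} = \frac{B(a-1, b+x)}{B(a, b)} \frac{\Gamma(r+x) \alpha^r}{\Gamma(r) (\alpha + d_T + d_t)^{r+x}} \, {}_2F_1\!\left( r+x, b+x; a+b+x-1; \frac{d_t}{\alpha + d_T + d_t} \right). $$

Substituting the likelihood $L(r, \alpha, a, b \mid \Theta = x, d_x, d_T)$ of formula (6) and the two key results A7 and A8 into formula A6 and simplifying, we obtain the desired result: under the BG/NBD model, the posterior expectation of the hyperparameter $\theta$ has an analytical solution, namely

$$ E[\Upsilon(d_t) \mid \Theta = x, d_x, d_T, r, \alpha, a, b] = \frac{ \dfrac{a+b+x-1}{a-1} \left[ 1 - \left( \dfrac{\alpha + d_T}{\alpha + d_T + d_t} \right)^{r+x} {}_2F_1\!\left( r+x, b+x; a+b+x-1; \dfrac{d_t}{\alpha + d_T + d_t} \right) \right] }{ 1 + I_{x>0} \, \dfrac{a}{b+x-1} \left( \dfrac{\alpha + d_T}{\alpha + d_x} \right)^{r+x} }. \tag{46} $$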

Dataset of an Industry Competition.
Text mining of the competition entries requires complete project information, so two groups of projects are excluded: (1) Most projects in the prototype design stage are at an early stage and have problems such as incomplete application materials. These projects are excluded from the mining and modeling process. (2) Projects without declaration materials (project documents) cannot pass the preliminary examination. Enterprises without declaration materials are therefore also excluded.
Finally, 143 projects are retained for topic mining. The case content and the project output value are mined separately to extract important information and to explore the application fields, current situation, and key points of value promotion of the industrial Internet. The main purpose of using this dataset is to adjust the discrete hyperparameters of the relevant prediction or classification models and to compare the results before and after the adjustment, from which the experimental conclusion is drawn.

Word Segmentation.
The Jieba Chinese word segmentation component [10][11][12] is used. Industrial Internet terms such as VR, AR, and smart park are added to the Jieba dictionary, and each document is segmented in precise mode, which is the default. During segmentation, function words and meaningless symbols are removed. To unify the expression of professional vocabulary, a thesaurus is used to merge synonyms, which improves the later topic extraction. For example, "artificial intelligence" and "AI" are synonyms and are merged into "AI". Unimportant words are then filtered out according to their part of speech, and only the relevant words are retained.
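The post-processing steps applied to the segmented tokens (synonym merging and removal of function words and symbols) can be sketched without any external library. In the pipeline described above the token list would come from Jieba; here it is written out by hand, and the thesaurus, the stopword list, and the function name are hypothetical. Part-of-speech filtering is omitted.

```python
def normalize_tokens(tokens, synonyms, stopwords):
    """Merge synonyms into a canonical form and drop stopwords and bare symbols."""
    cleaned = []
    for token in tokens:
        # map each token to its canonical spelling if the thesaurus knows it
        token = synonyms.get(token.lower(), token)
        if token.lower() in stopwords:
            continue
        if not any(ch.isalnum() for ch in token):
            continue  # punctuation and other meaningless symbols
        cleaned.append(token)
    return cleaned

SYNONYMS = {"artificial intelligence": "AI", "ai": "AI"}  # hypothetical thesaurus
STOPWORDS = {"the", "of", "and"}                          # hypothetical stopword list

tokens = ["the", "artificial intelligence", "platform", ",", "Ai", "and", "VR"]
result = normalize_tokens(tokens, SYNONYMS, STOPWORDS)
```

Merging synonyms before topic extraction keeps the same concept from being split across several vocabulary entries, which matters for a corpus as small as the one used here.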

LDA Topic Extraction.
The generated dictionary and corpus are converted to the input format of the model, and the open-source Gensim package [13] is used to construct the topic model and estimate its parameters. The default parameter values are a = 0.37 and B = 0.02. The optimal number of topics k is determined according to the perplexity of the model. In theory, the k with the lowest perplexity should be selected, but with a small corpus, a larger number of topics may lead to overfitting. Therefore, the number of topics is adjusted by also considering the degree of overlap between topics in the visualization results. Finally, the optimal number of topics is 4 for the case content and 2 for the output value, and there is no overlap between topics.

Basic Model.
According to the results of index analysis and LDA topic mining [14][15][16], projects that are in the prototype design stage or lack application materials are eliminated. Finally, 143 projects are retained for the modeling stage, of which 67.83% enter the second round. Considering that the preliminary evaluation stage covers many projects, is not easily affected by external factors, and better reflects the development trend of the related industries, this paper uses XGBoost and multinomial NB to model and predict the preliminary evaluation results of the 2020 industry competition. The discrete hyperparameters of the two models are the highest polynomial degree (multinomial NB) and the number of root nodes (XGBoost).

Evaluation Index.
Since the task is classification, the confusion matrix is selected as the basic evaluation tool, and the evaluation indexes are derived from the confusion matrix, as given in Table 1.
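Indexes of this kind are simple ratios of the four confusion matrix counts. The sketch below computes accuracy, precision, recall, and F1 for a binary task; since the contents of Table 1 are not reproduced here, this particular selection of indexes is an assumption, and the function names and example labels are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def derived_indexes(tp, fp, fn, tn):
    """Common indexes derived from the confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [1, 1, 1, 0, 0, 1]  # hypothetical labels: 1 = entered the second round
y_pred = [1, 0, 1, 0, 1, 1]
counts = confusion_counts(y_true, y_pred)
scores = derived_indexes(*counts)
```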

Comparison of Experimental Results.
In this modeling process, the multinomial naive Bayes model (multinomial NB) and the ensemble model (XGBoost) are selected as training models (the mathematical details of the two methods are not described here). 70% of the samples are used for model training and 30% for testing. The parameters are tuned on the training set with threefold cross-validation, and AUC is used as the metric during parameter tuning. Figures 1 and 2 show the performance of the models after parameter adjustment, which gives a reference for their actual prediction performance. On the test set, the effect of the model is as follows.
Through the comparison of experiments, we find the following: (1) In terms of the classification prediction results, XGBoost is clearly better than the traditional Bayesian model in both the original configuration and the adjusted configuration. (2) The method of adjusting discrete hyperparameters based on a skewed distribution is effective. The empirical results show that both XGBoost and the Bayesian model improve on the relevant evaluation indexes under the new hyperparameter adjustment strategy.
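The evaluation protocol described in this section (a 70/30 split, threefold cross-validation on the training part, and AUC as the tuning metric) can be sketched with the standard library alone. The classifiers themselves are not included; the function names, the random seed, and the rounding of the test size are assumptions of this sketch.

```python
import random

def train_test_split(n_samples, test_fraction, rng):
    """Shuffle sample indices and split them into train and test parts."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

def k_fold(indices, k):
    """Yield (train, validation) index lists for k roughly equal folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 1/2)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

rng = random.Random(0)
train_idx, test_idx = train_test_split(143, 0.30, rng)  # 143 retained projects
folds = list(k_fold(train_idx, 3))
```

For each candidate value of a discrete hyperparameter, a model would be fitted on the training part of every fold and scored with `auc` on the validation part, and the value with the best average score would then be evaluated once on `test_idx`.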

Conclusion
Through a combination of mathematical analysis and empirical research, this study focuses on the optimization of the hyperparameters of machine learning models under a skewed distribution, especially discrete hyperparameters.
Through strict mathematical assumptions and derivation, the expected value of the discrete hyperparameters is obtained, and the result is then tested on an empirical dataset. The experimental results show that this heuristic parameter adjustment method is feasible and effective for XGBoost and Bayesian models.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The author declares that there are no conflicts of interest.