Research on Audit Data Analysis and Decision Tree Algorithm for Benefit Distribution of Enterprise Financing Alliance

When traditional methods are used to analyze the audit data of an enterprise financing alliance, problems arise such as long modeling time and low accuracy of the alliance's interest distribution algorithm. Therefore, this paper proposes a method, based on the decision tree algorithm, for analyzing the interest distribution of enterprise financing alliances from audit data. The audit data collection process of the enterprise financing alliance is described, and the continuous attributes of the audit data are discretized with the C4.5 algorithm. We then analyze the enterprise financing alliance audit data: inconsistencies are removed through data cleaning, and the analysis is finally carried out with an improved C4.5 algorithm. The experimental results show that this method shortens the modeling time and improves the accuracy of the interest distribution algorithm of the enterprise financing alliance. We achieved an average accuracy of 84.7% with the C4.5 algorithm, compared with 84.35% with NBTree.


Introduction
Studies have shown that establishing financing cooperation alliances among small and medium-sized enterprises (SMEs) is an effective way to solve SMEs' financing difficulties. In practice, however, as SME financing alliances continue to develop, their instability has gradually become apparent, and an imbalanced distribution of benefits within the alliance has become one of the factors behind this instability [1]. At this stage, most research on benefit distribution builds on game theory; for example, the expected payoffs of pairwise games among the three parties under different situations have been used to study whether banks, SMEs, and guarantee institutions participate in cooperation under incomplete information.
With intensifying economic globalization and technological change, the conditions for enterprise survival and development have changed fundamentally in many respects: technology levels keep changing, the market environment changes rapidly, product life cycles are shorter, customer demand is more personalized, diversified, and differentiated, and requirements for product quality and service are higher [2]. In this complex and fiercely competitive environment, innovation has become the key for enterprises that want to secure a place and improve their competitive advantage. The Fifth Plenary Session of the 18th CPC Central Committee proposed that innovation is the primary driving force of development and that innovation-driven development is a major strategy for enterprises facing the future [3].
In the current situation, a single enterprise has limited resources. To meet customers' personalized needs, shorten product life cycles, and reduce risks, it is difficult for an enterprise to rely entirely on its own strength, so it needs to cooperate with other enterprises that have complementary resources. To analyze the audit data on benefit distribution in enterprise financing alliances, this paper introduces the decision tree algorithm. The ID3 algorithm builds a decision tree and uses mutual information to find the attribute with the largest amount of information in the database. The C4.5 algorithm is used to create a data mining model that predicts the classification authenticity of records; records whose predicted results differ from the actual allocation are considered suspicious. The rest of the paper is organized as follows. Section 2 discusses the computer-based audit process. Section 3 gives an overview of the decision tree algorithm and its use in audit data analysis and describes data acquisition. Section 4 presents audit data analysis based on the C4.5 algorithm. Section 5 discusses the experiments and results, and Section 6 concludes the paper.

Computer Audit
Computer audit is also known as computer-aided audit. Since the term "computer audit" first appeared, its concept has been understood only vaguely; with the continuous development of theory and practice, that understanding has deepened. At present, computer audit mainly covers two aspects. First, auditors use audit software to check the accounting system, that is, to audit whether the economic activities and accounting records of the audited unit are true, correct, legal, and effective. The content is basically the same as in traditional auditing, but the means have changed; this is generally called computer-aided audit. Second, the electronic information system itself is tested and evaluated: audit evidence is collected and evaluated to judge the security of the resources and assets related to the auditee's information system and the integrity of its data and system [4]. Figure 1 describes the main process of auditing the audited unit's electronic data with computer audit technology.
Pretrial investigation: by understanding the organizational structure of the audited unit, the auditor learns how its information systems are distributed and used. Key subsystems are identified according to the audit objectives and investigated in greater depth, and the audit data requirements are put forward.
Audit data collection: according to the data requirements from the pretrial investigation, the objects and means of data collection are determined, and the overall information of the audit objects is obtained [5].
Data collation: because the auditee's information system is complex and data may have been deliberately concealed or fabricated, the collected data must be cleaned and transformed. If necessary, data from multiple sources are integrated to ensure data integrity and consistency and to lay the foundation for subsequent audit work [6].
Audit analysis: for the integrated, complete, and consistent data stored in the database, an analysis model is established according to the audit focus, and the data are analyzed at different levels and from different angles.
Extension, implementation, and evidence collection: because the analysis is performed subjectively by people, analysis deviations or even errors are inevitable. Based on the results of the audit analysis, evidence is collected for the problems found, and the clues found are further verified and followed up [7].
The above process shows that data collation and modeling analysis are the two key links most closely related to the audit results. How they are carried out directly affects the quality of the audit database and, in turn, has a significant impact on audit analysis [8].

Audit Data Analysis Based on Decision Tree Algorithm

ID3 Decision Tree Algorithm.
A decision tree is a tree structure, similar to a binary or multiway tree, that is used for predictive modeling of discrete and continuous attributes. It uses the attributes of the samples as nodes and the attribute values as branches, in a process similar to a flowchart: each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf node represents a class or class distribution. The tree is built recursively, top-down, by divide and conquer. If all examples in the training sample set belong to the same class, they form a leaf node labeled with that class [9]. Otherwise, a test attribute is chosen by some criterion, and the sample set is divided into subsets (internal nodes) according to the values of the test attribute, so that the samples in each subset share the same value of that attribute. Each subset is then processed recursively until a stopping condition is met, for example, all samples in a subset belong to one class. The root node is the attribute with the largest amount of information over all samples, each intermediate node is the attribute with the largest amount of information within the sample subset of the subtree rooted at that node, and each leaf node is a class value of the samples [10].
The ID3 learning algorithm forms a decision tree through a selection window and uses mutual information (information gain) from information theory to find the attribute with the largest amount of information in the database. It creates a decision tree node for that attribute and then creates one branch for each distinct value of the attribute. The process of creating lower nodes and branches is repeated on each branch subset [11]. The pseudocode of Generate_decision_tree, which generates a decision tree from the given training data, is as follows.
Input: training samples, represented by discrete-valued attributes; the set of candidate attributes, attribute_list.
Output: a decision tree.
Method: Generate_decision_tree(samples, attribute_list)
(1) Create a node N;
(2) if all samples belong to the same class C then // the class label attribute has the value C
(3) return N as a leaf node labeled with class C;
(4) if attribute_list is empty then
(5) return N as a leaf node labeled with the most common class in samples; // the class label value with the most samples
(6) select best_attribute, the attribute in attribute_list with the highest information gain; // the best partition attribute
(7) label node N with best_attribute;
(8) for each known value a_i of best_attribute // partition samples by best_attribute
(9) grow a branch from node N for the condition best_attribute = a_i;
(10) let s_i be the set of samples for which best_attribute = a_i; // one partition
(11) if s_i is empty then
(12) attach a leaf labeled with the most common class in samples;
(13) else attach the node returned by Generate_decision_tree(s_i, attribute_list − best_attribute). // recursive call on s_i; best_attribute is removed from the candidate attributes
The advantages of the ID3 algorithm are as follows:
(1) The basic principle of the algorithm is clear.
(2) Classification is fast.
(3) It is a practical algorithm for learning from examples.
The ID3 algorithm also has some disadvantages:
(1) It is biased: the number of values of each attribute affects its information gain, so attributes with many values tend to be favored.
(2) It is sensitive to noise: a small problem in the training data can change the result.
(3) The probability of error increases as the number of classes grows.
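Disadvantage (1) can be made concrete with a short calculation. The sketch below uses hypothetical toy data: `record_id` is unique per record and therefore useless for prediction, while `has_property` is genuinely informative. It compares plain information gain (the ID3 criterion) with the gain ratio, which divides the gain by the entropy of the attribute's own value distribution (the split information) and is the information gain rate referred to later for C4.5.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of any list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_and_gain_ratio(column, labels):
    """Return (information gain, gain ratio) of one attribute column."""
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(column)  # entropy of the attribute's own value distribution
    return gain, (gain / split_info if split_info > 0 else 0.0)


labels = ["suspicious"] * 3 + ["genuine"] * 5
record_id = [f"r{i}" for i in range(8)]       # unique per record: useless for prediction
has_property = ["yes"] * 4 + ["no"] * 4

for name, column in [("record_id", record_id), ("has_property", has_property)]:
    gain, ratio = gain_and_gain_ratio(column, labels)
    print(f"{name}: gain={gain:.3f}, gain ratio={ratio:.3f}")
```

Here `record_id` has the higher information gain (about 0.954 against about 0.549), so ID3 would split on it, while `has_property` has the higher gain ratio (about 0.549 against about 0.318), which is the correction C4.5 applies.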

Application of Decision Tree Algorithm in Audit.
Following the work plan of the audit department, auditors will audit the authenticity of a public rental housing allocation. At present, eligibility for public rental housing is judged mainly by whether the applicant owns real estate, the size and purpose of that real estate, and the number of family members, and the specific situation often requires field visits and investigation. Although some audit doubts have been found with this standard, many others have been missed. For example, some recipients of affordable housing transfer ownership of their real estate to relatives or parents shortly before the audit so that it is not discovered during the audit process [12]. To improve the efficiency of field visits and investigation and to prevent false reporting (such as real estate information fraud), relevant data from multiple units should be collected, classification rules should be mined from the identified doubtful cases of affordable housing allocation, and the authenticity of normally allocated recipients should be predicted, so that suspicious recipients can be selected for extended audit [13].
At present, there is no precedent for authenticity prediction in public rental housing audits. After discussion and analysis, we decided to collect vehicle, industrial and commercial registration, and endowment insurance information in addition to real estate and provident fund loan information, so as to generate a decision tree and extract rules for predicting authenticity.

Scientific Programming
In terms of specific operation, the first step is to randomly divide all public rental housing records into two parts: a test data set and a training data set. The second step is to apply the C4.5 algorithm to the training data set to create a data mining model [14]. The third step is to use the created model to "predict" the classification authenticity of the test data set and obtain the records whose predicted results differ from the actual allocation. The fourth step is to derive ideas for improving the C4.5 algorithm from research on decision tree improvements, obtain the improved result, compare it with the original result, and evaluate its accuracy [15]. In Step 5, records whose predicted results differ from the actual allocation are considered suspicious data, treated as abnormal conditions, and subjected to in-depth audit, analysis, and investigation.
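Steps 1, 2, 3, and 5 above amount to a split, train, predict, and flag loop. The sketch below shows that loop under clearly stated assumptions: records are dicts, the label field `allocation` and the attribute `property` are hypothetical, and, to keep the example short, the model is a hypothetical one-attribute majority-vote stand-in (`train_stump`) rather than C4.5. Any classifier exposing the same "record in, label out" interface could be substituted.

```python
import random
from collections import Counter, defaultdict


def train_stump(rows, label_key, attr):
    """Hypothetical stand-in for a C4.5 model: majority class per value of one attribute."""
    votes = defaultdict(Counter)
    for row in rows:
        votes[row[attr]][row[label_key]] += 1
    default = Counter(row[label_key] for row in rows).most_common(1)[0][0]
    table = {value: counts.most_common(1)[0][0] for value, counts in votes.items()}
    return lambda row: table.get(row[attr], default)


def flag_suspicious(records, label_key, attr, test_fraction=0.3, seed=0):
    """Steps 1-3 and 5: split at random, train, predict the test set, return mismatches."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)                  # step 1: random split
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]
    model = train_stump(train, label_key, attr)            # step 2: build the model
    # Steps 3 and 5: records whose prediction differs from the recorded allocation
    return [r for r in test if model(r) != r[label_key]]


records = (
    [{"id": f"a{i}", "property": "yes", "allocation": "rejected"} for i in range(20)]
    + [{"id": f"b{i}", "property": "no", "allocation": "allocated"} for i in range(20)]
    # hypothetical anomalies: owns property yet was allocated housing
    + [{"id": f"x{i}", "property": "yes", "allocation": "allocated"} for i in range(5)]
)
for r in flag_suspicious(records, "allocation", "property"):
    print(r["id"])  # only anomalous x* records that landed in the test set
```

A fixed `seed` makes the split reproducible, which matters when flagged records are handed to auditors for follow-up.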

Data Acquisition.
Audit data collection is the prerequisite for applying data mining in auditing and an important step in obtaining the electronic data of the audited unit [16]. Audit practice shows that acquisition can be carried out in two steps: first, accurately select the electronic data to be collected; second, collect the target data effectively. The process is shown in Figure 2.

Data Selection Strategy.
To efficiently select the data of concern to auditors from the auditee's many complex data, collection should follow three principles.
(1) The selected data shall meet the requirements of the audit scheme. The audit scheme describes the main work content and requirements of the audit. Analyzing the audit scheme to master these contents and requirements, and determining the object and scope of data collection accordingly, is the first step of data collection [17].
(2) Data selection shall be based on a full understanding of the auditee's information system and its business processes. The object and scope of collection are determined by analyzing how data are generated and used in the business processes, combined with the needs of the audit.
(3) Data selection should not be limited to the auditee. Using external data for correlation analysis is an effective means. There are often specific relationships between the data of different departments, such as between enterprise reports and tax department records, or between enterprise sales data and Golden Tax Project data. In the audit process, external data can be used not only to verify the auditee's data but also, when necessary, to replace the auditee's data for audit [18]. For example, in this audit, in addition to the real estate and provident fund loan data of the audited units, relevant data from the vehicle, industrial and commercial registration, endowment insurance, and other departments were also collected to support effective audit analysis.
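Principle (3) in practice means linking the auditee's records to external department data. The sketch below assumes, hypothetically, that every source carries a personal ID number and uses the illustrative field names `id_no`, `owner_id`, `legal_rep_id`, `person_id`, and `payment_base`; real tables will differ, and matching on a clean shared key is itself an assumption that usually needs its own cleaning step.

```python
from collections import Counter


def enrich_with_external(applicants, vehicles, businesses, pensions):
    """Attach external-department indicators to each applicant, keyed by ID number."""
    vehicle_count = Counter(v["owner_id"] for v in vehicles)
    business_owners = {b["legal_rep_id"] for b in businesses}
    pension_base = {p["person_id"]: p["payment_base"] for p in pensions}
    return [
        {
            **a,
            "vehicles": vehicle_count.get(a["id_no"], 0),
            "has_business": a["id_no"] in business_owners,
            "payment_base": pension_base.get(a["id_no"]),  # None if no record found
        }
        for a in applicants
    ]


applicants = [{"id_no": "1"}, {"id_no": "2"}]
vehicles = [{"owner_id": "1"}, {"owner_id": "1"}]
businesses = [{"legal_rep_id": "2"}]
pensions = [{"person_id": "1", "payment_base": 5000}]
for row in enrich_with_external(applicants, vehicles, businesses, pensions):
    print(row)
```

The enriched rows can then serve as input attributes for the decision tree, so that vehicle ownership or a business registration becomes visible even when the housing data alone look clean.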

Data Cleaning.
Audit data cleaning is an important part of computer data audit. Its main work is to remove attributes that are irrelevant or redundant to the mining topic, smooth noisy data, and fill in missing values, in preparation for subsequent audit data analysis. Although most current classification algorithms (such as C4.5) can handle noise and missing values themselves, doing this step well before data mining helps reduce bias in the learning process. Data cleaning can be performed before or after data conversion. Audit data cleaning mainly includes validating input data, correcting wrong values, replacing null values (e.g., with "0"), ensuring that data values fall within their domains, eliminating redundant data, and resolving data conflicts, all of which strongly affect the results of data mining [19]. Depending on the data conditions, there are three main processing methods:
(1) Completing incomplete data.
(2) Processing noisy data:
(a) Binning: values are smoothed by consulting their "neighbors" (i.e., the surrounding values). The sorted values are distributed into "buckets" or "bins," and smoothing by bin means or by bin boundaries is mainly used.
(b) Clustering: outliers are detected by clustering [20]. The data are clustered without supervision, and records that do not fall into any cluster are considered outliers and can be treated as noise.
(c) Combined computer and manual inspection: computer detection and manual inspection are combined to find outliers and other suspect values in the data set and clean them appropriately.
(3) Correcting inconsistent data: in addition to missing data, the original data contain inconsistencies. By analyzing the correlations between data attributes, integrity constraints can be defined to detect and correct inconsistent data. For example, observation and analysis of the intermediate table of public rental housing audit data show that some attributes lack values; for instance, the dkye (loan balance) attribute has no corresponding values in some records. In this case, the tuples can be ignored, that is, these records are deleted. If the dkrq (loan date) value in the table is earlier than the KHRQ (account opening date) value, such inconsistencies shall be noted and corrected to clean the data.
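Two of the cleaning operations above, binning-based smoothing from (2)(a) and the tuple-level checks from (3), can be sketched as follows. Assumptions: bins are equal-depth (equal-frequency), ties in boundary smoothing go to the lower boundary, the sample values are a textbook-style illustration rather than audit data, and `dkye`, `dkrq`, and `KHRQ` hold Python values with `None` for a missing balance and `date` objects for dates. Following the text, date conflicts are set aside for correction rather than altered automatically.

```python
from datetime import date


def equal_depth_bins(values, depth):
    """Sort the values and cut them into consecutive bins of `depth` items."""
    data = sorted(values)
    return [data[i:i + depth] for i in range(0, len(data), depth)]


def smooth_by_means(bins):
    """Replace every value in a bin with the bin mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value with the nearer bin boundary (ties go to the lower one)."""
    return [[b[0] if v - b[0] <= b[-1] - v else b[-1] for v in b] for b in bins]


def clean_loan_records(records):
    """Drop tuples with a missing dkye; set aside tuples whose dkrq precedes KHRQ."""
    kept, conflicts = [], []
    for r in records:
        if r.get("dkye") is None:            # missing loan balance: ignore the tuple
            continue
        if r["dkrq"] < r["KHRQ"]:            # loan date before account opening date
            conflicts.append(r)              # to be noted and corrected by the auditor
        else:
            kept.append(r)
    return kept, conflicts


bins = equal_depth_bins([4, 8, 15, 21, 21, 24, 25, 28, 34], depth=3)
print(smooth_by_means(bins))       # [[9.0, 9.0, 9.0], [22.0, 22.0, 22.0], [29.0, 29.0, 29.0]]
print(smooth_by_boundaries(bins))  # [[4, 4, 15], [21, 21, 24], [25, 25, 34]]

loans = [
    {"dkye": 1000, "dkrq": date(2018, 5, 1), "KHRQ": date(2018, 1, 1)},
    {"dkye": None, "dkrq": date(2018, 5, 1), "KHRQ": date(2018, 1, 1)},
    {"dkye": 2000, "dkrq": date(2017, 1, 1), "KHRQ": date(2018, 1, 1)},
]
kept, conflicts = clean_loan_records(loans)
print(len(kept), len(conflicts))   # 1 1
```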

Data Conversion.
In the original data obtained in this paper, the personal habits of the staff during data entry made the formats of many fields inconsistent. To better carry out data mining, the data formats are first unified through data conversion.
For example, fields such as the issuing date and the house purchase date are uniformly converted to a date format.
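Unifying date fields can be done by trying a list of known input formats in turn. The formats listed below are hypothetical examples of what inconsistent manual entry might produce; the real list has to be derived from the raw tables. Values that match none of them raise an error so they can be reviewed instead of being silently guessed.

```python
from datetime import datetime

# Hypothetical input formats; extend the tuple to match what the raw tables contain.
KNOWN_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y.%m.%d", "%Y%m%d", "%Y年%m月%d日")


def to_date(raw):
    """Parse a date string written in any of KNOWN_FORMATS into a datetime.date."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {raw!r}")


for raw in ["2019-03-15", "2019/3/15", "2019.03.15", "20190315", "2019年3月15日"]:
    print(to_date(raw).isoformat())  # 2019-03-15 each time
```

Order matters when formats overlap: the more explicit, separator-based formats are tried before the compact `%Y%m%d` form.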
In this paper, the C4.5 algorithm can be used to discretize continuous attributes. In the existing public rental housing information table, the payment base and the construction area are continuous, so they need to be discretized. Discretization can also provide a hierarchical or multidimensional division of attribute values, which is called a concept hierarchy. The main methods are binning, histogram analysis, cluster analysis, entropy-based discretization, and segmentation by natural partitioning. In this paper, we use binning to discretize the attribute values. The payment base is the contribution base of endowment insurance. To better carry out data mining, the payment base and the construction area are each divided into three bins. Discretization is carried out in Weka.
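Both discretization routes mentioned above can be sketched briefly. `equal_width_bins` splits a continuous attribute into three equal-width intervals (an assumption: the text says three bins but not whether they are equal-width or equal-frequency), and `best_threshold` shows the C4.5-style alternative of choosing a binary cut by information gain, here using the midpoint between adjacent distinct sorted values, a common variant. The construction areas and labels are hypothetical, and this is not the Weka implementation used in the paper.

```python
import math
from collections import Counter


def equal_width_bins(values, k=3):
    """Assign each value to one of k equal-width intervals over [min, max] (0-based index)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)
    return [min(int((v - lo) / width), k - 1) for v in values]  # max goes in the last bin


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_threshold(values, labels):
    """C4.5-style binary split of a continuous attribute: try a cut between each pair of
    adjacent distinct sorted values and keep the one with the highest information gain."""
    pairs = sorted(zip(values, labels))
    n, base = len(pairs), entropy(labels)
    best_cut, best_gain = None, -1.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                      # no cut between equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best_gain:
            best_cut, best_gain = (pairs[i - 1][0] + pairs[i][0]) / 2, gain
    return best_cut, best_gain


area = [42.0, 55.5, 60.0, 78.0, 90.0, 120.0]   # hypothetical construction areas (square metres)
labels = ["allocated"] * 3 + ["rejected"] * 3
print(equal_width_bins(area))        # [0, 0, 0, 1, 1, 2]
print(best_threshold(area, labels))  # (69.0, 1.0)
```

Equal-width binning ignores the class labels, whereas the threshold search uses them; the latter is why C4.5 can discretize continuous attributes on its own during tree construction.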

Audit Data Analysis Based on the Improved C4.5 Algorithm
In this paper, the C4.5 algorithm is improved accordingly.
A degree of interest λ is introduced into the weighting process. It expresses the user's interest in uncertain knowledge and is a fuzzy concept. It usually reflects prior knowledge about a given transaction, including domain knowledge and expert suggestions, and its size is determined by the decision-maker according to prior or domain knowledge. The improved C4.5 algorithm modifies the original attribute selection criterion. When the information gain of an attribute is calculated, a positive interest reduces the attribute's support for decision-making: the entropy increases, so the information gain and the information gain ratio both decrease. Conversely, a negative interest increases the attribute's support: the entropy decreases, so the information gain and the information gain ratio both increase.

The size of λ is determined by |T_i| of the attribute, and introducing the interest λ modifies formula (2) (or formula (1)) accordingly. The degree of interest can increase or decrease the support of the selected attributes for decision-making; a positive degree of interest corresponds to λ > 0. In use, the information gain of the appropriate attributes should be corrected, and the correction changes dynamically with |T_i| of each attribute and with the size |T| of the training set. The steps by which the improved C4.5 algorithm selects classification attributes are shown in Figure 3.
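The modified formulas themselves are not reproduced in this text, so the sketch below uses a clearly hypothetical stand-in that only mirrors the qualitative behavior described above: the conditional entropy Info_A(T) is multiplied by (1 + λ) before the gain and gain ratio are computed. With λ > 0 the entropy term grows and the gain ratio falls; with −1 < λ < 0 it shrinks and the gain ratio rises. This is not the paper's formula, and the data and names are illustrative only.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of any list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def interest_gain_ratio(column, labels, lam=0.0):
    """Gain ratio with a HYPOTHETICAL interest adjustment: the conditional entropy
    Info_A(T) is scaled by (1 + lam). lam > 0 raises the entropy term and lowers the
    gain; -1 < lam < 0 lowers it and raises the gain. Not the paper's formula."""
    if lam <= -1:
        raise ValueError("lam must be greater than -1")
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    info_a = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    gain = entropy(labels) - (1 + lam) * info_a
    split_info = entropy(column)
    return gain / split_info if split_info > 0 else 0.0


labels = ["suspicious", "suspicious", "suspicious", "genuine", "genuine", "suspicious"]
vehicle = ["yes", "yes", "yes", "no", "no", "no"]
for lam in (-0.2, 0.0, 0.2):
    print(lam, round(interest_gain_ratio(vehicle, labels, lam), 4))
```

Because the adjustment scales only the entropy term, λ has no effect on attributes whose subsets are already pure; any real implementation should follow the paper's own formula and its dependence on |T_i| and |T|.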

Performance Analysis Experiment of Decision Tree Algorithm
We selected 11 sample data sets from those published on the Weka website to compare the performance of the decision tree algorithms C4.5, CART, and NBTree in Weka. The 11 sample sets differ in size, and all of them contain both discrete and continuous attributes, so they test the performance of each algorithm comprehensively. The experimental sample data are shown in Table 1; among them, the meta, optdigits, and aggregation data sets are large. These 11 data sets were classified with each of the three algorithms in Weka.

Evaluation Index of Decision Tree Algorithm.
Among the many decision tree algorithms, some handle discrete data, some handle continuous data, some aim for the shortest modeling time, and some aim for the best classification. Decision tree algorithms may differ in the data types they process, their modeling mechanism, how they construct the tree, how they express classification rules, and so on. So how should an appropriate classification method be selected in a specific affordable housing audit to improve audit efficiency? We next analyze the performance of the decision tree algorithms on the Weka platform. Two indexes are mainly used to compare and evaluate decision tree algorithms:
(1) Accuracy. Classification accuracy is the primary criterion for judging an algorithm, and high accuracy provides reliable classification information.
(2) Calculation speed. This is the computational cost of generating and using the model; the lower the cost, the more efficient the algorithm.
Table 2 shows that the accuracy of the interest distribution algorithm of the enterprise financing alliance differs across methods. As Table 3 shows, C4.5 has the fastest modeling speed, CART is second, and NBTree is the slowest. When the data set is large, as with meta, optdigits, and aggregation, the time required by CART increases rapidly, whereas C4.5 remains relatively stable.
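Weka reports both indexes itself; the stand-alone sketch below only illustrates how they are measured. It times model generation and model use separately with `time.perf_counter` and computes accuracy as the fraction of correctly classified test records. The majority-class baseline (`fit_majority`, `predict_majority`) and the `allocation` field are hypothetical placeholders for whichever decision tree is being compared.

```python
import time
from collections import Counter


def evaluate(fit, predict, train, test, label_key):
    """Return (accuracy, seconds to build the model, seconds to classify the test set)."""
    start = time.perf_counter()
    model = fit(train)                                   # cost of generating the model
    build_seconds = time.perf_counter() - start
    start = time.perf_counter()
    predictions = [predict(model, row) for row in test]  # cost of using the model
    predict_seconds = time.perf_counter() - start
    correct = sum(p == row[label_key] for p, row in zip(predictions, test))
    return correct / len(test), build_seconds, predict_seconds


# Hypothetical baseline: always predict the majority class of the training set.
def fit_majority(rows):
    return Counter(r["allocation"] for r in rows).most_common(1)[0][0]


def predict_majority(model, row):
    return model


train = [{"allocation": "rejected"}] * 3 + [{"allocation": "allocated"}]
test = [{"allocation": "rejected"}, {"allocation": "allocated"}]
accuracy, build_s, predict_s = evaluate(fit_majority, predict_majority, train, test, "allocation")
print(f"accuracy={accuracy:.2f}, build={build_s:.6f}s, predict={predict_s:.6f}s")
```

Wall-clock timings on tiny data are dominated by noise; meaningful comparisons, such as those in Table 3, need realistic data sizes and repeated runs.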

Conclusion
This paper presents an audit data analysis method for the interest distribution of enterprise financing alliances based on the decision tree algorithm. The decision tree model obtained with the improved C4.5 algorithm better matches the benefit distribution rules of enterprise financing alliances, and the experiments verify the effectiveness and rationality of the improved algorithm. However, the experiments also show that, although the improved model is more accurate, it increases the complexity of the decision tree.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare no conflicts of interest.