Application of Decision Tree-Based Classification Algorithm on Content Marketing

Traditional content marketing methods rely heavily on market requirements but rarely achieve accurate marketing predictions when those requirements become numerous. Machine learning-based approaches are now widely used in many fields because their training process allows them to cope with big-data problems. In this paper, decision tree-based methods, which intrinsically follow the process of human decision making, are introduced to the field of content marketing. Specifically, this paper considers a well-known method, called C4.5, which deals well with continuous values. Based on four validation metrics, experimental results obtained from several machine learning-based methods indicate that the C4.5-based decision tree method can handle the content marketing dataset. The results show that the decision tree-based method can provide reasonable and accurate suggestions for content marketing.


1. Introduction
Content marketing is normally regarded as a management process responsible for identifying, anticipating, and satisfying customer requirements profitably in the context of content delivered via electronic channels [1]. The design of research methods in content marketing has received plenty of attention for more than 100 years. Traditionally, content marketing strategies are determined based on different technological developments, market requirements and expectations, growth in knowledge, and so on [2]. Machine learning-based methods have received more attention in the past several decades because they have the ability to analyze historical data and predict future potential behaviors and activities more effectively [3,4]. When new data become available, managers can obtain relatively accurate estimations from machine learning prediction models to tackle underlying challenges, such as missing outcomes and stimulus sampling [5]. Due to the fast development of machine learning, current content marketing strategies tend to be digitally driven, meaning that companies pay more attention to consumers' concentration on the Internet or their historical habits [6]. In this regard, the data scale and dimensionality become huge, so traditional marketing methods are no longer suitable. The major motivation of this paper is therefore to investigate the potential of machine learning methods in dealing with the high dimensionality of attributes of marketing data.
Based on different data requirements, machine learning methods can be divided into three categories [7]. The first category refers to supervised learning-based methods that require an independent training process and predict testing data using an already learned model. Representative supervised models include sparse/collaborative representation [8-10], support vector machines [11-13], ensemble learning [14-17], and so on. The second category refers to unsupervised learning-based models that do not demand training samples and determine all classes by considering the correlations among samples. Unsupervised models are also called clustering methods, such as K-means [18,19], ISODATA [20,21], fuzzy C-means [22-24], and so on. The third category refers to semi-supervised learning-based models that simultaneously consider labeled and unlabeled samples in the training stage. Representative methods include semi-supervised support vector machines [25] and deep generative models [26].
Among the above-mentioned machine learning methods, decision trees are among the most discussed scalable multivariate methods in machine learning, and they intrinsically follow the process of human decision making. Unlike emerging deep learning-based methods [27-31] that require extensive data and computational support, decision trees consider the latent structure of the training data. That is to say, they split the training samples into bins, each of which is verified after the most discriminative split variable is selected by means of a specific metric, such as information gain, the Gini index, or entropy. Besides, decision tree-based methods show at least three advantages. First, the decision tree model is comparatively easy to follow and implement, meaning that users do not have to master complex background knowledge of the training procedures. Second, decision tree-based learning models require little sample preparation, a step that is often tedious and time-consuming. The main reason is that the decision tree-based model treats all sample attributes equally as the learning foundation, which makes it possible to handle a relatively large data scale.
Third, the effectiveness and robustness of the decision tree model can be easily verified and measured. Figure 1 visually illustrates the process of classification using a decision tree.
Chou [32] proposed a classification and regression tree method, called CART, in 1984. This method is a representative nonparametric learning strategy that produces classification and regression trees based on the status of the dependent variable. In 1986, Quinlan proposed the iterative dichotomiser 3 method, called ID3 [33]. In this method, information gain is used to determine the potential of the next nodes, meaning that nodes with high information gain will be split. Xiaoliang et al. [34] examined Quinlan's well-known C4.5 method, an extension of ID3, in 2009. In contrast to ID3, C4.5 adopts the gain ratio (information gain rate) as its splitting criterion. Besides, it also deals well with continuous inputs because it chooses thresholds and splits attributes whose values are higher than the threshold. In 2012, Patil et al. [35] further extended C4.5 and ID3 and proposed the C5.0 classification method. C5.0 not only has lower time and memory consumption than C4.5 and ID3 but is also efficient, as it splits nodes on the field with the maximum information gain. Recently, decision tree-based machine learning methods have appeared in different variants, such as random forest [36,37], gradient boosting decision trees [38], regression decision trees [39,40], and so on.
Among the many decision tree-based methods, this paper specifically considers C4.5 and applies it to content marketing data analysis. This method is widely used because its classification rules follow the human thinking process and it produces desirable results. Based on four validation metrics, we conduct several experiments on a bank content marketing dataset to verify the performance of C4.5 and the comparison methods. The main contribution of this paper is the introduction of a decision tree-based method to the content marketing field to efficiently improve marketing capability.

2.1. Validation Metrics.
Validation metrics, including entropy, information gain, the Gini index, the gain ratio, and so on, are the most important standards for determining whether the current node should be split. Entropy measures the uncertainty of a set of random variables. When random variables have a high entropy value, their uncertainty is equally high and the current node should be further divided into two nodes. The entropy of a random variable $X$ is given as follows:

$$H(X) = -\sum_{i=1}^{n} p_i \log p_i,$$

where $p_i$ denotes the probability of the $i$-th class and $\log$ denotes the base-2 logarithm; these symbols have the same meaning in the subsequent formulas. Information gain represents the difference between the entropy obtained before performing a split and that obtained after performing it. In this regard, the split that produces the maximum information gain is preferred, as it indicates better classification performance. The information gain is

$$g(X, A) = H(X) - H(X \mid A),$$

where $H(X \mid A)$ denotes the conditional entropy of $X$ given attribute $A$:

$$H(X \mid A) = \sum_{a \in A} p(a)\, H(X \mid A = a).$$

Figure 1: Visual description of the classification process in content marketing behaviors under the decision tree framework. Based on a splitting metric, each node is quantified. When the leaf nodes satisfy the requirements, they are selected as a category; otherwise, they are allocated to different classes.
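To make these definitions concrete, the following minimal Python sketch implements entropy, information gain, and the Gini index (defined in the next subsection) for label arrays. It is our own illustration; the paper itself provides no code.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy H(X) = -sum_i p_i * log2(p_i) over class frequencies."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, attribute):
    """Information gain g(X, A) = H(X) - H(X | A) for splitting on attribute A."""
    labels, attribute = np.asarray(labels), np.asarray(attribute)
    h_cond = 0.0
    for a in np.unique(attribute):
        subset = labels[attribute == a]
        h_cond += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - h_cond

def gini(labels):
    """Gini index 1 - sum_i p_i^2; higher values mean higher impurity."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)
```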

The Gini index represents the uncertainty of the data; high Gini values mean high sample uncertainty. It can be defined as follows:

$$\operatorname{Gini}(X) = 1 - \sum_{i=1}^{n} p_i^2.$$

2.2. C4.5-Based Decision Tree Method.

The C4.5 algorithm is normally seen as an important variant of the traditional ID3 algorithm and contains the following representative improvements over ID3: (1) C4.5 selects split attributes through the information gain rate (gain ratio); (2) C4.5 can handle both discrete and continuous attributes; (3) C4.5 performs pruning after the construction of the decision tree is finished; (4) when missing values occur, C4.5 can still maintain its performance. When C4.5 constructs the decision tree, the attributes with a high information gain rate are adopted for splitting the current node. As this recursion proceeds, the calculated information gain becomes smaller; at each node, the attribute with the highest information gain rate is chosen for splitting. The gain ratio is given by

$$\operatorname{GainRatio}(X, A) = \frac{g(X, A)}{\operatorname{IV}(A)},$$

where $\operatorname{IV}(\cdot)$ denotes the intrinsic value of an attribute and $p(a \mid A)$ denotes the ratio of value $a$ in attribute $A$. The $\operatorname{IV}(\cdot)$ operator is defined as

$$\operatorname{IV}(A) = -\sum_{a \in A} p(a \mid A) \log p(a \mid A).$$

When the attribute type is discrete, there is no need to discretize the data; when the attribute type is continuous, the data need to be discretized. The C4.5 algorithm handles the discretization of continuous attributes. The core idea is to arrange the $N$ attribute values in ascending order and divide them into two parts by dichotomy. The information gain corresponding to each division is calculated, and the splitting threshold is selected according to the largest gain. The detailed process is as follows (a minimal code sketch follows the list):

(1) All data samples on the node are arranged from small to large according to the specific values of the continuous attribute to obtain its sorted attribute values.

(2) Each midpoint between two adjacent sorted attribute values is taken as a candidate threshold. Based on this threshold, two subsets are produced, i.e., the samples whose values are no greater than the threshold and those whose values are greater. Then, we calculate the information gain ratio and go to step (3).

(3) By calculating the information gain ratio over all candidate thresholds, we obtain the two optimal splitting subsets corresponding to the maximal gain ratio and record the corresponding splitting threshold.
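The following sketch implements steps (1)-(3) on top of the entropy helpers above. The function names and the boolean-split representation are our own choices, not the paper's.

```python
def gain_ratio(labels, split):
    """Gain ratio g(X, A) / IV(A) for a boolean split indicator array."""
    labels, split = np.asarray(labels), np.asarray(split)
    gain = information_gain(labels, split)
    p = np.bincount(split.astype(int), minlength=2) / len(split)
    p = p[p > 0]
    iv = -np.sum(p * np.log2(p))
    return gain / iv if iv > 0 else 0.0

def best_threshold(values, labels):
    """Dichotomize a continuous attribute: test every midpoint between
    consecutive distinct values and keep the threshold with the maximal
    gain ratio, following steps (1)-(3) above."""
    values = np.asarray(values, dtype=float)
    distinct = np.unique(values)                      # step (1): sort
    midpoints = (distinct[:-1] + distinct[1:]) / 2.0  # step (2): candidates
    best_t, best_gr = None, -1.0
    for t in midpoints:                               # step (3): maximize
        gr = gain_ratio(labels, values <= t)
        if gr > best_gr:
            best_t, best_gr = t, gr
    return best_t, best_gr
```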
2.3. Pruning Process in C4.5.

The establishment of the decision tree depends heavily on the training samples, so the tree tends to fit the given training data very closely. The resulting model, however, is often too complex for the testing data, triggering low classification accuracy, a.k.a. overfitting. Therefore, simplifying the decision model is highly desirable, which leads to a crucial step: pruning.
The C4.5 algorithm adopts the PEP (pessimistic error pruning) method. The PEP method was proposed by Quinlan and is a top-down pruning method. It determines whether to prune a subtree according to the error rates before and after pruning, so it does not need a separate pruning dataset.
For a leaf node, assume that it involves $N$ samples with $E$ errors. Then, its error ratio is $(E + p)/N$, where $p$ denotes a penalty factor (commonly set to 0.5). For a subtree with $L$ leaf nodes, its error ratio is

$$\text{error ratio} = \frac{\sum_{i=1}^{L} E_i + L \cdot p}{\sum_{i=1}^{L} N_i},$$

where $E_i$ denotes the number of misclassified samples in the $i$-th leaf node and $N_i$ denotes the total number of samples in the $i$-th leaf node. Suppose the subtree misclassifies a sample with a value of 1 and correctly classifies a sample with a value of 0; then the number of misclassifications of the subtree follows a binomial distribution over these Bernoulli trials, so its statistics, i.e., mean and standard deviation, can be obtained by

$$\text{error mean} = \text{error ratio} \times N, \qquad \text{error std} = \sqrt{\text{error ratio} \times (1 - \text{error ratio}) \times N},$$

where $N = \sum_{i=1}^{L} N_i$. After replacing the subtree by a single leaf node, its error ratio can be determined by

$$\text{error ratio}' = \frac{E' + p}{N'},$$

where $E' = \sum_{i=1}^{L} E_i$ and $N' = \sum_{i=1}^{L} N_i$. Note that the number of misclassifications of the leaf node also follows a binomial distribution. The mean number of misclassifications of the leaf node is defined as $\text{error mean}' = \text{error ratio}' \times N'$.

Here, pruning can be performed if the following condition is satisfied:

$$\text{error mean} + \text{error std} \geq \text{error mean}'.$$
In order to visually describe the process of applying the C4.5 decision tree to given data, this paper provides the corresponding flowchart (see Figure 2).

Figure 2: Flowchart of the C4.5 decision tree. In the first step, N nodes are created. In the second step, the algorithm starts a training process in which the attributes and their gain ratios are evaluated. After obtaining the gain ratio of each node, the nodes with high gain ratios are selected for further splitting by iteratively executing the aforementioned steps.

3.1. Experimental Parameters.
The entire experiment in this paper was conducted on a PC with an Intel Core i7-10700 processor at 2.90 GHz and 16 GB of RAM; the GPU is an NVIDIA GeForce GTX 1660 Super. The comparison algorithms involve nearest neighbors (the number of nearest neighbors is fixed at 3), linear SVM (the penalty parameter is fixed at 0.025), a neural network (the regularization parameter is fixed at 1, with the other parameters left at the default settings of the sklearn package), Naive Bayes, and random forest (the maximum depth is fixed at 5). A minimal sklearn setup is sketched below.
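The following sketch reproduces this lineup with sklearn. Two assumptions are worth flagging: scikit-learn ships no C4.5 implementation, so DecisionTreeClassifier (a CART-style tree) serves as the stand-in, and max_iter=1000 for the MLP is our own choice to ensure convergence.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

classifiers = {
    "Nearest Neighbors": KNeighborsClassifier(n_neighbors=3),
    "Linear SVM": SVC(kernel="linear", C=0.025),
    "Neural Network": MLPClassifier(alpha=1, max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(max_depth=5),
    "Decision Tree": DecisionTreeClassifier(max_depth=4),  # CART stand-in for C4.5
}
```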

3.2. Datasets.
This paper considers a bank campaign content marketing dataset, originally derived from a Portuguese banking institution. The dataset has 50 user features collected across multiple content marketing campaigns based on traditional phone calls, with the goal of determining whether users are interested in purchasing the product in the near future. Figure 3 visually describes the correlations among several features using a heatmap.
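Assuming the dataset is available as a CSV file (the file name below is hypothetical; the paper does not state the exact source file), a correlation heatmap such as Figure 3 can be produced as follows:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical file name; adjust to the actual dataset location.
df = pd.read_csv("bank_marketing.csv")

# Pairwise correlations among the numeric user features, as in Figure 3.
corr = df.select_dtypes(include="number").corr()
sns.heatmap(corr, cmap="coolwarm", center=0)
plt.tight_layout()
plt.show()
```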

3.3. Evaluation Metrics.
In order to accurately assess the experimental performance, this paper uses four standard evaluation metrics derived from the confusion matrix: accuracy, precision, recall, and F1 score. The sketch below shows one way to compute them.
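A compact way to compute the four metrics with sklearn (a sketch; the paper does not specify its own tooling for this step):

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

def evaluate(y_true, y_pred):
    """The four confusion-matrix-based metrics used in this paper."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }
```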

3.4. Experimental Results.
Figure 5 displays the resulting trend of the decision tree method when the max depth is varied from 1 to 15. As can be seen from Figures 5(a) and 5(b), when the max depth is fixed at 4, the decision tree method shows the best training accuracy under both the Gini index and the gain ratio metric. The max depth of a decision tree is the maximum depth to which the tree is allowed to grow; a deeper tree yields a more complex model. Specifically, in the training stage, increasing the max depth will always decrease (or at least not increase) the training error, at the cost of overfitting the training data. Hence, in this paper, we fix the max depth of the decision tree at 4. Figure 6 visually illustrates the node construction process of the decision tree. Because the number of useful features is too large, the max depth of the visualized tree is fixed at 3 for better readability. As can be seen from Figure 6, the splitting manner highly depends on the chosen metric and a threshold: when the Gini index of a node is higher than the threshold, it is split into two new nodes. A depth sweep and tree visualization are sketched below.
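The following sketch reproduces the depth sweep of Figure 5 and the shallow visualization of Figure 6, assuming X_train and y_train are already prepared. Note that sklearn offers "gini" and "entropy" criteria but no gain-ratio option, so "entropy" serves as the closest stand-in here.

```python
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Depth sweep as in Figure 5 (X_train, y_train assumed available).
for criterion in ("gini", "entropy"):          # "entropy" stands in for gain ratio
    for depth in range(1, 16):
        tree = DecisionTreeClassifier(criterion=criterion, max_depth=depth)
        tree.fit(X_train, y_train)
        print(criterion, depth, tree.score(X_train, y_train))

# Shallow tree (depth 3) for a readable visualization, as in Figure 6.
viz = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
plot_tree(viz, filled=True)
plt.show()
```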
Generally, each classifier pays different attention to the features, which means that not all features are equally important in model training. In order to visually display the importance of the features in the content marketing dataset, this paper provides comparisons among linear SVM, the decision tree, and random forest. It can be seen from Figure 7 that, for the random forest method, the top 10 features appear to play an equally important role in the training model; the relatively low weights also indicate that most of the features are heavily used. The sketch below shows how such importances can be extracted.
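A sketch of how the per-model feature importances compared in Figure 7 can be obtained, again assuming X_train and y_train from the earlier sketches; the importance measures (absolute SVM weights, impurity-based tree importances) are standard sklearn conventions rather than the paper's stated procedure.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Fit the three models on the (assumed) training split.
svm = SVC(kernel="linear", C=0.025).fit(X_train, y_train)
tree = DecisionTreeClassifier(max_depth=4).fit(X_train, y_train)
forest = RandomForestClassifier(max_depth=5).fit(X_train, y_train)

svm_importance = np.abs(svm.coef_).ravel()      # |weights| of the linear SVM
tree_importance = tree.feature_importances_     # impurity-based importances
forest_importance = forest.feature_importances_

top10 = np.argsort(forest_importance)[::-1][:10]  # ten most important features
```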
In this paper, 20% of the total samples are selected as training samples and the remaining 80% are treated as testing samples (as sketched below). Table 1 tabulates the overall experimental results obtained from the six classification models under the four validation metrics. As can be found from Table 1, the decision tree captures the best overall predictive performance, as it has the highest metric values. Compared with the decision tree, the other algorithms, such as the neural network and random forest, achieve the optimal results with regard to the F1 score and recall, respectively. In addition, the nearest neighbors method has the second-best F1 score, linear SVM achieves the second-best recall and F1 score, the neural network has the suboptimal accuracy, Naive Bayes has the suboptimal precision, and random forest has the suboptimal F1 score. The execution time of all algorithms is also displayed in Table 1. As can be found from this table, nearest neighbors and linear SVM require the highest computational time, while the others cost much less. Figure 8 visually displays the decision boundaries of the six classification methods. From these figures, we find that, compared with Figures 8(a)-8(d), Figures 8(e) and 8(f) show more distinct boundaries. Figure 9 further adds the ROC curves obtained from four representative classification methods. As can be seen from Figure 9, even though the decision tree has a lower area under the ROC curve than the nearest neighbors method, it still shows competitive performance.
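A sketch of the main experiment, assuming the feature matrix X, labels y, and the classifiers dictionary and evaluate() helper from the earlier sketches:

```python
from sklearn.model_selection import train_test_split

# 20% of the samples for training and 80% for testing, as in Table 1.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.2, random_state=0)

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(name, evaluate(y_test, clf.predict(X_test)))
```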
To verify the impact of different ratios of training samples on the experimental results of the six algorithms, this paper performs an experiment on the content marketing dataset with the training ratio varying from 5% to 25% (see the sketch below). Figure 10 provides a visual comparison of the six classifiers under the different ratios of training samples. Compared with the other methods, the decision tree shows the best results for all training sample ratios. The results obtained from Naive Bayes seem disappointing, primarily because it assumes that the attributes of the samples are independent; when the sample attributes are correlated, its performance degrades. Besides, it cannot learn interactions among features, which strongly limits its experimental performance.
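The ratio sweep of Figure 10 can be sketched as follows, reusing the names from the previous sketch; clone() gives each run a fresh, unfitted copy of every classifier.

```python
from sklearn.base import clone
from sklearn.model_selection import train_test_split

# Vary the training ratio from 5% to 25%, as in Figure 10.
for ratio in (0.05, 0.10, 0.15, 0.20, 0.25):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=ratio, random_state=0)
    for name, clf in classifiers.items():
        model = clone(clf).fit(X_tr, y_tr)   # fresh copy per run
        print(ratio, name, evaluate(y_te, model.predict(X_te)))
```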

Conclusions
Traditional content marketing strategies rely heavily on empirical knowledge such as market requirements and expectations. However, when the content marketing process faces large numbers of diverse users across different societies or communities, the user data become huge, making traditional content marketing strategies difficult to apply. Machine learning-based methods have the ability to analyze historical data and predict future potential behaviors and activities more effectively. Among the many machine learning-based methods, the decision tree has received particular attention from content marketing managers, as it intrinsically follows the process of human decision making. To verify the performance of the decision tree on content marketing data, this paper considers a well-known decision tree method, called C4.5, and compares it with five other machine learning methods: nearest neighbors, linear SVM, neural network, Naive Bayes, and random forest. Based on four validation metrics, experiments were conducted on a bank content marketing dataset under different experimental scenarios and settings. The experimental results obtained from the six methods indicate that the decision tree can handle the content marketing dataset, meaning that it can provide reasonable and accurate content marketing suggestions for managers.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.