A Precision Marketing Strategy of e-Commerce Platform Based on Consumer Behavior Analysis in the Era of Big Data

In order to develop a more efficient and accurate marketing strategy for consumers’ purchase behavior, this paper establishes a user value model by modeling and learning the user historical data of e-commerce enterprises. The improved K-means algorithm is used to cluster the purchase behavior of users, and the customer value matrix is constructed from two dimensions of consumption frequency and average consumption amount. Finally, e-commerce users are classified into four categories by marking points. The test results show that the improved K-means algorithm is stable and efficient, and the analysis of user clustering characteristics is helpful to develop more accurate marketing strategies.


Introduction
With the growth of computer storage capacity, the amount of data that people produce has increased exponentially. In this context, the traditional business marketing model is less and less able to adapt to a complex and changeable social environment, especially in the e-commerce industry, which relies on the Internet. In addition, the number of customers is large, which makes it more and more difficult to carry out marketing activities aimed at them. Moreover, with the continuous improvement of online shopping systems, consumers' psychology and forms of consumption are changing, and traditional marketing methods fall far short of meeting the needs of today's consumers [1,2]. At this stage, many e-commerce companies have no clear target customers in their marketing: all users are targeted with promotion in large quantity and with full coverage, so users receive a large number of undifferentiated messages. In the long run, this may provoke user resistance and cause the loss of target users. This not only wastes resources and capital but also leaves the effect of marketing without a quantitative evaluation method, so introducing precision marketing based on big data technology into the e-commerce industry is a general trend [3,4].
Data mining technology has become popular abroad in recent years, and applying it to precision marketing is a growing trend. Zhu and Gao [5] studied the method of customer segmentation based on data mining. Qiujin et al. [6] tried to reexamine precision marketing from the perspective of the Internet. They proposed that Internet technology can be used to accurately locate customers in the implementation of precision marketing, carry out targeted marketing activities for different users, and use precision marketing to achieve the growth of enterprise benefits. Wu and Yao [7] studied the application of data mining technology in marketing activities. They pointed out that data mining technology can provide more accurate information for enterprises and has certain guiding significance for enterprises in formulating marketing strategies, so as to improve the profits of enterprises. Zhao [8] and Sophie et al. [9] pointed out that, against the background of big data, enterprises should pay attention to data collection when conducting marketing activities because the role of data is crucial in the era of big data, and they predicted the future development of data mining in the field of precision marketing. Yu [10] studied marketing strategy based on the recorded data of users on the Internet in the context of big data.
The current research on the construction of e-commerce focuses only on the optimization of data processing, and little attention is paid to the optimization of data mining processes, such as the construction and selection of data features. The traditional method of predicting a user's purchase behavior is mainly manual (non-machine) prediction, which uses accumulated professional knowledge in related fields to predict the future behavior of users. The disadvantage of this method is that it is too subjective and its prediction accuracy is not high [11,12]. In addition, the user classification obtained by machine learning lacks corresponding research on marketing strategy.
Therefore, this paper first establishes a new user value model by modeling and learning the user history data of e-commerce enterprises, so as to accurately locate the target user group of the enterprise, which helps enterprises achieve the goal of precision marketing. Second, through feature construction on the user historical data, the importance of each feature is obtained algorithmically, and the main features are reasonably explained, which can provide more effective guidance for enterprises in formulating marketing strategies for different users.

Data Mining of Consumer Behavior.
The core of data mining is to analyze the characteristics of consumer behavior and, based on the characteristics obtained, to manage and study consumer groups that share common characteristics [13]. In analyzing consumer behavior characteristics, the application of data mining technology follows the process shown in Figure 1:
(1) Clarify the subject and the main research methods and, in combination with the needs of business development, determine the research objectives.
(2) According to the data collected, choose an appropriate consumer behavior index system.
(3) Prepare the data, which includes data selection and data cleaning. Because so many consumer behaviors can be studied, the consumer behavior model must be selected according to the availability of data.
(4) According to the data mining objectives and the characteristics of the data itself, choose a mining algorithm, which can be improved on the basis of the original algorithm.
(5) Analyze the data to prepare for the establishment of the model.
(6) Establish the model, analyze each index in the model, and finally obtain the research results.
(7) According to the results of data mining, on the one hand, put forward specific strategies to guide precision marketing; on the other hand, according to consumers' feedback on the use of the data mining results, continue to adjust and optimize the data mining analysis methods.

Problem Description.
This paper establishes a new user value model by modeling and learning the user history data of e-commerce enterprises, so as to accurately locate the target user group of the enterprise, which helps enterprises achieve the goal of precision marketing.
The traditional user value model is the RFM model. However, the three indicators of the RFM model, R (the time of the user's most recent consumption), F (the user's consumption frequency), and M (the user's monetary amount), are not applicable in the e-commerce industry. Moreover, the RFM model has an inherent problem: the F index and the M index may be correlated. If a user consumes more frequently, the sum of his consumption amount will be correspondingly high. In addition, in the traditional model, every indicator is given the same importance, so the indicators have weak explanatory power for user value. Therefore, this paper establishes a new DQMC model, in which D is the difference between the last shopping and the average interval between shopping; Q is the purchase quantity of the user in the latest period of time; M is the average monetary amount of the user in the latest period; and C is the purchase conversion rate of the user in the latest period of time.
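The four indicators can be illustrated with a minimal sketch for a single user. The record layout (purchase dates with amounts, plus a count of browsing sessions) and the function name `dqmc` are hypothetical, and the exact formulas are assumptions read from the verbal definitions: D is taken here as the average interval between purchases minus the number of days since the last purchase, and C as the number of purchases divided by the number of visits.

```python
from datetime import date

def dqmc(purchases, visits, today):
    """Return (D, Q, M, C) for one user.

    purchases: list of (date, amount) tuples in the statistical period
               (hypothetical layout; at least two purchases are assumed).
    visits:    number of browsing sessions in the same period (assumed > 0).
    today:     reference date that closes the statistical period.
    """
    dates = sorted(d for d, _ in purchases)
    # Average number of days between consecutive purchases.
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    avg_gap = sum(gaps) / len(gaps)
    # Assumed reading of D: positive while the user is still inside the
    # usual shopping cycle, negative once the cycle has been exceeded.
    d_value = avg_gap - (today - dates[-1]).days
    q_value = len(purchases)                                    # purchase quantity
    m_value = sum(amount for _, amount in purchases) / q_value  # average amount
    c_value = q_value / visits                                  # conversion rate
    return d_value, q_value, m_value, c_value

# Toy data, not taken from the paper's dataset.
history = [(date(2021, 1, 1), 120.0),
           (date(2021, 1, 11), 80.0),
           (date(2021, 1, 21), 100.0)]
print(dqmc(history, visits=12, today=date(2021, 1, 26)))
```

Under this reading, a user who buys every ten days and last bought five days ago has D = 5, which matches the later interpretation of D as the remaining normal purchase time.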
In addition, the traditional RFM model divides users into several categories manually and sets an interval for each category. If the value of a user falls into the interval of a category, the user belongs to that category.
This classification method is greatly influenced by human subjective factors. For the same data, different people may obtain different classification results, and it is difficult to find a standard for judging whether the model is good. In view of the above shortcomings, this paper introduces a clustering algorithm to classify users.
The clustering algorithm can avoid the influence of subjective factors on the results and has an intuitive and stable evaluation index for the model. Most importantly, the model is highly stable; that is, for the same dataset, the same results can be obtained every time the model runs. The user classification process of the e-commerce industry is shown in Figure 2:
(1) Collect transaction data related to users in the e-commerce industry, preprocess the data, remove the abnormal data in the dataset, and then standardize the preprocessed data.
(2) Calculate the values of D, Q, M, and C from the data to form the value matrix of users.
(3) Calculate the importance of each index in the matrix by principal component analysis and use it as the weight of the index.
(4) Cluster the matrix with the clustering algorithm to obtain the corresponding user categories.
(5) Use the weights calculated in step (3) to calculate the value of each category and each user.
(6) Use user value to verify the clustering effect.
(7) For each category, according to the characteristics of its users, propose specific marketing strategies to achieve precision marketing.

Data Preprocessing.
Two kinds of datasets are used in this paper: a user comment dataset and a user behavior dataset.
The attribute values of these two datasets are described in Table 1.
According to the attributes of the dataset, the preprocessing steps are missing value processing, data standardization, and feature selection.

Missing Value Processing.
For this dataset, we mainly use two methods to deal with missing values. The first is deletion: when the proportion of missing values of an attribute is less than 5%, we directly delete the records with missing values. The second is mean interpolation: if the number of missing values is too large, we fill them with the mean or the mode, according to the statistical type of the attribute.
The simple deletion method is convenient and effective and is suitable for most sample data. When the number of samples increases, the mean interpolation method can better ensure the authenticity of the data.
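The two rules can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout (a list of dictionaries with `None` marking a missing value) and the function name `handle_missing` are assumptions, and only mean filling for numeric attributes is shown (mode filling for categorical attributes would follow the same pattern).

```python
from statistics import mean

def handle_missing(rows, column, threshold=0.05):
    """Delete or mean-fill missing values (None) in one numeric column.

    rows:      list of dicts (hypothetical record layout).
    threshold: share of missing values below which records are deleted.
    """
    missing = sum(1 for r in rows if r[column] is None)
    if missing == 0:
        return list(rows)
    if missing / len(rows) < threshold:
        # Few gaps: drop the incomplete records.
        return [r for r in rows if r[column] is not None]
    # Many gaps: replace each one with the mean of the observed values.
    fill = mean([r[column] for r in rows if r[column] is not None])
    return [{**r, column: fill} if r[column] is None else r for r in rows]

# Toy records: one of four values is missing (25%), so it is mean-filled.
rows_small = [{"amount": 10.0}, {"amount": None}, {"amount": 30.0}, {"amount": 20.0}]
filled = handle_missing(rows_small, "amount")
print([r["amount"] for r in filled])
```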

Data Standardization.
Because some attributes in the dataset do not have obvious maximum and minimum values, it is difficult to ensure that the standardized dataset retains the information of the original dataset if the min-max method is used. Therefore, Z-score normalization is used to standardize the dataset.

Mathematical Problems in Engineering
The Z-score method is the most commonly used. It differs from the min-max method above, which must normalize the original dataset to the range [0, 1]; the result of the Z-score method is not always in this range. Suppose there is a sequence $x_1, x_2, x_3, \ldots, x_n$. To normalize this sequence, the formula is as follows:

$$x_i' = \frac{x_i - \bar{x}}{\sigma}, \qquad i = 1, 2, \ldots, n, \tag{1}$$

where $\bar{x}$ is the mean of the sequence and $\sigma$ is its standard deviation. After normalization, the mean value of the sequence becomes 0 and the variance becomes 1, so the purpose of removing the data dimension is achieved.
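The Z-score transformation can be checked with a short sketch. The function name `z_score` is illustrative, and the population standard deviation is assumed so that the transformed sequence has variance exactly 1.

```python
from statistics import fmean, pstdev

def z_score(values):
    """Standardize a sequence to zero mean and unit (population) variance."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("a constant sequence cannot be standardized")
    return [(x - mu) / sigma for x in values]

# Toy sequence with mean 5 and population standard deviation 2.
scaled = z_score([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(scaled)
```

Unlike min-max scaling, the output is not confined to [0, 1]: values below the mean become negative and values far above the mean exceed 1.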

Feature Selection.
In this paper, principal component analysis is used to calculate the weight of each index. Principal component analysis is a method for simplifying datasets: through a linear transformation, it obtains a set of linearly uncorrelated variables. The steps are as follows:
(1) Centralize the raw data, that is, subtract from each data point the mean value of its feature.
(2) Solve the covariance matrix of the features. If there are n features in the original data, the covariance matrix formed is an n-order matrix. For the four indicators D, Q, M, and C in this paper, the covariance matrix formed by these four indicators is as follows:

$$\begin{pmatrix}
\operatorname{cov}(D, D) & \operatorname{cov}(D, Q) & \operatorname{cov}(D, M) & \operatorname{cov}(D, C) \\
\operatorname{cov}(Q, D) & \operatorname{cov}(Q, Q) & \operatorname{cov}(Q, M) & \operatorname{cov}(Q, C) \\
\operatorname{cov}(M, D) & \operatorname{cov}(M, Q) & \operatorname{cov}(M, M) & \operatorname{cov}(M, C) \\
\operatorname{cov}(C, D) & \operatorname{cov}(C, Q) & \operatorname{cov}(C, M) & \operatorname{cov}(C, C)
\end{pmatrix} \tag{2}$$

where cov is the covariance, and if the two variables in brackets are the same, it is the variance of that variable.
(3) Eigenvalue decomposition. Solving for the eigenvalues and eigenvectors of the above covariance matrix yields four eigenvalues, each of which corresponds to an indicator. In the decomposition, $W$ represents the matrix composed of the eigenvectors and $\lambda$ represents the column vector composed of the eigenvalues.
(4) The corresponding eigenvalue of each index is taken as the weight of that index.
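For orientation, the eigenvalue decomposition in step (3) is conventionally written as shown below. This is a sketch in standard notation rather than the authors' stated formula: the symbol $\Sigma$ for the covariance matrix and the diagonal matrix $\Lambda$ are assumptions introduced here, and the last expression, which rescales the eigenvalues into weights that sum to one, is an optional normalization that the text does not prescribe.

```latex
\Sigma W = W \Lambda, \qquad
\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \lambda_3, \lambda_4), \qquad
w_j = \frac{\lambda_j}{\sum_{k=1}^{4} \lambda_k}, \quad j = 1, \ldots, 4.
```

Because the covariance matrix is symmetric and positive semidefinite, all four eigenvalues are real and nonnegative, so they can serve as weights without sign problems.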

Value Matrix.
Customer segmentation is usually based on the interval, frequency, and amount of consumption behavior. However, within the same time period, consumption frequency and total consumption amount are multicollinear, so this paper transforms the total consumption amount into the average consumption amount. Since the consumption data come from the same shopping mall, the mall's back-end system can obtain the average consumption amount for each consumer's membership ID by summarizing the consumption records. The data can be tracked repeatedly, which allows merchants to retrieve the average consumption amount for different time periods according to different needs. Moreover, segmenting customers along the three dimensions of interval, frequency, and amount produces a larger number of customer groups. Therefore, the two dimensions of consumption frequency and average consumption amount are selected to simplify the result of customer segmentation, so that all customers are distributed over the two-dimensional customer value matrix shown in Figure 3.
Users are clustered according to the above four indexes. The user index matrix constructed in this paper regards each row as a point in 4-dimensional space. Because the dataset used in this paper is small, the K-means algorithm is adopted after considering the running speed and the difficulty of model interpretation.
Before clustering, the value of each user is calculated by formula (4) as a weighted combination of the user's indicators, where $(w_1, w_2, w_3, w_4)$ are the eigenvalues obtained by principal component analysis and $V$ is the value matrix of the users. After clustering, the user value matrix of each category is calculated according to formula (4).
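A minimal sketch of the user value calculation is given below. It assumes that the value is the weighted sum of the four standardized indicators, with one row of the value matrix per user; the function name `user_values`, the toy indicator values, and the weights are all illustrative and are not taken from the paper.

```python
def user_values(value_matrix, weights):
    """Weighted sum of the D, Q, M, C columns for every user (row)."""
    return [sum(w * x for w, x in zip(weights, row)) for row in value_matrix]

# Hypothetical standardized indicators (D, Q, M, C) for three users.
V = [[0.5, 1.0, -0.2, 0.3],
     [-1.0, -0.5, 0.0, -0.4],
     [1.2, 0.8, 1.5, 0.9]]
# Illustrative weights; in the paper these come from principal component analysis.
w = (0.4, 0.3, 0.2, 0.1)

scores = user_values(V, w)
print(scores)
```

Averaging these scores within each cluster gives a per-category value that can be compared across categories.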

K-Means Clustering.
In order to solve the above problem of how to distinguish the types of customers, this paper proposes, based on the value matrix, a method of changing the initial center points in K-means clustering. The initial center points are restricted to different regions, that is, different types of customers are located in the value matrix, and the categories of customers are distinguished by marking points. If customers are clustered into $k$ classes, assume that there are $a$ classes along the abscissa and $b$ classes along the ordinate, with $a \times b = k$. Find the maximum coordinate $(x_{\max}, y_{\max})$ and the minimum coordinate $(x_{\min}, y_{\min})$. First, the abscissas of the center points of the different regions are obtained according to formula (5).

Similarly, the ordinates are obtained according to formula (6). Therefore, the coordinates of the center point of clustering region $k_{ij}$ are given by formula (7). As shown in Figure 4, assuming that $k = 4$, then $a = 2$ and $b = 2$. According to formula (7), four regional centers $k_{11}$, $k_{12}$, $k_{21}$, and $k_{22}$ can be obtained.
Then the initial center points of the K-means algorithm are randomly generated in the corresponding regions. This not only ensures effective implementation but, because the initial cluster centers are not simply fixed at the regional center points, also makes the clustering results more dynamic and allows different types of customers to be distinguished by their position.
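The regional centers can be illustrated with the following sketch. It rests on an assumption that is not confirmed by the text: each axis is divided into equal-width intervals, and the center of the i-th interval lies at min + (2i − 1)(max − min)/(2 · count). The function name `region_centers` is hypothetical.

```python
def region_centers(x_min, x_max, y_min, y_max, a, b):
    """Centers k_ij of an a-by-b grid of equal-sized regions.

    Assumption: each axis is split into equal-width intervals, and the
    center of interval i lies at min + (2i - 1) * (max - min) / (2 * count).
    """
    xs = [x_min + (2 * i - 1) * (x_max - x_min) / (2 * a) for i in range(1, a + 1)]
    ys = [y_min + (2 * j - 1) * (y_max - y_min) / (2 * b) for j in range(1, b + 1)]
    # Keys are the (i, j) indices of the regions, starting from 1.
    return {(i + 1, j + 1): (x, y)
            for i, x in enumerate(xs) for j, y in enumerate(ys)}

# k = 4 with a = 2 and b = 2 on a toy coordinate range.
centers = region_centers(0.0, 8.0, 0.0, 4.0, a=2, b=2)
print(centers)
```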
The K-means clustering algorithm uses the Euclidean distance between each point and the center points for clustering:
(1) Find the maximum $(x_{\max}, y_{\max})$ and the minimum $(x_{\min}, y_{\min})$ of all points, and initialize $k$ points around the regional centers $k_{ij}$ as cluster centroids.
(2) Assign each point to the corresponding cluster according to its Euclidean distance from the centroids.
(3) Calculate the average value in each cluster as the new cluster centroid.
(4) Judge whether the centroids have changed. If they have not changed, the clustering has converged; otherwise, return to step (2).
(5) Output the clustering results.
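The steps above can be sketched in plain Python for two-dimensional points. This is a minimal illustration under stated assumptions, not the authors' code: the regions are taken to be the cells of an equal-width a-by-b grid over the bounding box of the data, one initial centroid is drawn uniformly at random inside each cell, and the function names `init_in_regions` and `kmeans` are hypothetical.

```python
import math
import random

def init_in_regions(points, a, b, rng):
    """Draw one random initial centroid inside each cell of an a-by-b grid."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    dx, dy = (max(xs) - x_min) / a, (max(ys) - y_min) / b
    return [(x_min + (i + rng.random()) * dx, y_min + (j + rng.random()) * dy)
            for i in range(a) for j in range(b)]

def kmeans(points, a, b, max_iter=100, seed=0):
    """K-means with k = a * b clusters and region-constrained initialization."""
    rng = random.Random(seed)
    centroids = init_in_regions(points, a, b, rng)
    labels = []
    for _ in range(max_iter):
        # Step (2): assign every point to its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Step (3): recompute each centroid as the mean of its members.
        updated = []
        for c, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append((sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members)))
            else:
                updated.append(old)  # an empty cluster keeps its centroid
        # Step (4): stop once the centroids no longer change.
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

# Toy data: four groups of three points near the corners of a square.
points = [(1.0, 1.0), (1.5, 1.2), (0.8, 1.4),
          (1.0, 9.0), (1.2, 8.7), (0.9, 9.3),
          (9.0, 1.0), (8.8, 1.3), (9.2, 0.9),
          (9.0, 9.0), (8.7, 9.1), (9.3, 8.8)]
labels, centroids = kmeans(points, a=2, b=2)
print(labels)
```

Because each initial centroid starts in a known cell, the cluster index already indicates which region of the value matrix a cluster came from, which is the "marking points" idea described above. The fixed `seed` makes the sketch repeatable; a different seed may lead to a different local optimum.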

Effectiveness of Clustering Algorithm
Evaluation Index.
In order to verify the effectiveness of the algorithm in this paper, two standard test datasets from the UCI database, Breast Cancer and Iris Plants, are selected for the experiment, and the spectral clustering method commonly used in related research is compared with the K-means algorithm in this paper. Many studies have proposed internal quality evaluation and relative quality evaluation methods for assessing unsupervised clustering results. However, most of these indicators measure only compactness or only separation, so their assessment is one-sided. Therefore, it is necessary to evaluate the clustering results comprehensively with an overall index that measures compactness and separation at the same time.
The silhouette coefficient $s(i)$ considers both compactness and separation, and its calculation is shown in formula (8):

$$s(i) = \frac{b_i - a_i}{\max(a_i, b_i)}, \tag{8}$$

where $a_i$ represents the average distance between sample $i$ and the other samples in its own class $C$. The smaller $a_i$ is, the more sample $i$ should belong to this class. $b_i$ represents the minimum over $j$ of $b_{ij}$, the average distance between sample $i$ and all samples in another class $C_j$. The larger $b_i$ is, the less sample $i$ belongs to other classes. The value of $s(i)$ lies between −1 and 1, and the closer $s(i)$ is to 1, the more reasonable the clustering is.
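A minimal sketch of the silhouette computation follows, written in plain Python with Euclidean distances. The function name `silhouette` is illustrative, at least two clusters are assumed, and assigning a score of 0 to a sample in a single-member cluster is a common convention adopted here rather than something stated in the text.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient over all samples (needs >= 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for single-member clusters
            continue
        # a_i: mean distance to the other members of the same cluster
        # (the distance from p to itself contributes 0 to the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b_i: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in members) / len(members)
                for other, members in clusters.items() if other != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy data: two tight pairs that lie far apart score close to 1.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette(toy_points, [0, 0, 1, 1])
bad = silhouette(toy_points, [0, 1, 0, 1])
print(good, bad)
```

Grouping the nearby points together gives a coefficient near 1, whereas pairing each point with a distant one gives a negative coefficient, which is the behavior the index is meant to capture.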

Analysis of Cluster Efficiency.
In order to classify e-commerce users more accurately, the clustering results under different numbers of clusters are discussed. The results are shown in Figure 5.
According to the elbow rule, as the number of categories increases, the average distortion degree becomes lower and lower, but the improvement becomes smaller and smaller, so the obvious inflection point in the graph indicates the best number of clusters. According to Figure 5, when the number of clusters is 5, there is an obvious inflection point. The clustering efficiency of the algorithm with different numbers of clusters is shown in Figures 6 and 7.
As the number of clusters increases, the running time of the K-means clustering algorithm on the different datasets increases linearly. After testing multiple groups of datasets, the processing time of the improved clustering algorithm for 500 groups of data remains at about 0.3 seconds, and the silhouette coefficient remains between 0.35 and 0.5. According to the definition of the silhouette coefficient, the closer its value is to 1, the more reasonable the clustering is. When the data are divided into four categories, the silhouette coefficient is the largest. Therefore, it is reasonable to divide users into four categories.
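The quantity plotted under the elbow rule can be sketched as follows. The definition used here, the mean squared Euclidean distance from each point to its assigned centroid, is one common reading of "average distortion degree" and is an assumption; the function name `average_distortion` and the toy data are illustrative.

```python
import math

def average_distortion(points, labels, centroids):
    """Mean squared Euclidean distance from each point to its centroid."""
    total = sum(math.dist(p, centroids[lab]) ** 2
                for p, lab in zip(points, labels))
    return total / len(points)

# Toy data: two pairs of points that lie far apart.
pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# One cluster centered on the overall mean versus two well-placed clusters.
one_cluster = average_distortion(pts, [0, 0, 0, 0], [(5.0, 1.0)])
two_clusters = average_distortion(pts, [0, 0, 1, 1], [(0.0, 1.0), (10.0, 1.0)])
print(one_cluster, two_clusters)
```

Plotting this value against the number of clusters gives the elbow curve: the drop from one to two clusters is large in the toy data, and further splits would bring only small improvements.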

Comparison of Different Algorithms.
The operation of the different algorithms is shown in Table 2.
It can be seen from Table 2 that, first, in terms of clustering accuracy, the accuracy rates of K-means on the two datasets are 85.4% and 89.3%, respectively, which indicates a good clustering effect. The accuracy rates of spectral clustering are 66.7% and 90%, respectively, so its clustering effect is less stable than that of K-means. Second, the running time of K-means on the two datasets is 19.8 ms and 21.9 ms, while that of spectral clustering is 103.8 ms and 118.1 ms, respectively. The running time of spectral clustering is therefore more than five times that of the K-means algorithm, and its clustering efficiency is low. This is mainly because the spectral clustering algorithm involves the similarity matrix between data points and cannot achieve better accuracy and clustering efficiency when the dimension of the dataset is high. However, the e-commerce platform has a large transaction volume, a large data scale, and complex dimensions, which requires high clustering efficiency.
Therefore, the K-means algorithm is the more efficient choice for dividing customer behavior according to the needs of e-commerce.

Experimental Data.
The data used in this paper come from the sales data of a clothing store on the Taobao platform from November 2020 to June 2021, totaling 57666 items.
Feature D describes whether the user is within the normal shopping cycle. A negative value means that the normal shopping cycle has been exceeded; the larger the value, the more normal purchase time remains. According to the average value of feature D in the table above, users in the five categories are on average currently within the normal shopping cycle, but the value for the fourth class of users is large, while the values for the other three classes are small. The minimum values of feature D show that the first, second, and third classes all contain users who have exceeded the normal shopping cycle, which shows that the fourth class is similar to the fifth category in terms of quality.
Feature Q describes the average number of purchases made by users in the statistical period. From the average values of feature Q, we can see that the fourth category has the highest number of purchases, far exceeding the average values of the other three categories. Therefore, the fourth class of users may be loyal users of the store.
Feature M describes the average purchase amount of users in the statistical period. It can be seen from the data in the table above that the values of the third and fourth categories are much larger than those of the other three categories, especially the average and minimum values.
Feature C describes the purchase conversion rate of users. From the data in the table above, it can be seen that the purchase conversion rate of the third and fourth types of users is high, so these may be loyal users.

Precision Marketing Strategy
By comparing the average value of each index for each type of user with the average value over all data, the following can be seen. For the first type of users, the purchase quantity and conversion rate are higher than the average, but the normal shopping cycle and consumption amount are far lower than the average. These users are potential users for the store, and they have no clear purpose in their purchasing activities. In addition, they have insufficient loyalty to the store, so their shopping cycle and consumption amount are small. However, if the correct targeted marketing strategy is adopted for such users, they have the potential to become loyal users. For the second type of users, every index is lower than the average. These users are of little value to the store and may already have been lost. Therefore, this type of users is named lost customers. For the third type of users, the D index is low, but the other indicators are basically above the average. These users are important to the store, but there is a certain risk of loss. Therefore, this type of users is defined as focus-to-recover users. For the fourth type of users, the indicators are basically higher than the average, indicating that such users are loyal to the store and are its high-value users.
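The naming rules described above can be sketched as a small function that compares the mean indicators of a cluster with the overall means. The strict above/below comparisons are a simplification of the "basically above" wording in the text, and the function name `label_segment`, the dictionary layout, and the "unclassified" fallback are assumptions made for illustration.

```python
def label_segment(cluster_mean, overall_mean):
    """Name a cluster by comparing its D, Q, M, C means with the overall means."""
    above = {k: cluster_mean[k] > overall_mean[k] for k in ("D", "Q", "M", "C")}
    if all(above.values()):
        return "loyal high-value users"
    if not any(above.values()):
        return "lost customers"
    if not above["D"] and above["Q"] and above["M"] and above["C"]:
        return "focus-to-recover users"
    if above["Q"] and above["C"] and not above["D"] and not above["M"]:
        return "potential users"
    return "unclassified"  # pattern not covered by the four described types

# Toy standardized means; the overall mean of standardized data is 0.
overall = {"D": 0.0, "Q": 0.0, "M": 0.0, "C": 0.0}
print(label_segment({"D": -0.8, "Q": 0.6, "M": -0.5, "C": 0.4}, overall))
```

Once each cluster has a name, a store could attach one campaign type to each label, for example retention offers for focus-to-recover users.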

Conclusion
This paper classifies e-commerce platform customers by data mining technology, calculates the value of each type of customer, and formulates different marketing strategies for different types of customers. A new user value model is established.
The improved K-means algorithm is used to cluster the purchase behavior of users, and the customer categories are distinguished by marking points.
The test results show that the algorithm is efficient, and when the data are divided into four categories, the silhouette coefficient is the largest, so it is reasonable to divide users into four categories.
The cluster analysis of consumer behavior can help the e-commerce platform to tap potential customers, maintain high-value customers, and analyze the reasons for customer churn, so as to formulate more accurate marketing strategies.
However, this paper included different categories of users in the clustering without specific statistics on the number of users in each category. This part of the research will be carried out in follow-up work.

Data Availability
The dataset can be accessed upon request to the corresponding author.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Figure 8: The value scores of various types of customers.

Table 2: Operation of the algorithm.