Evaluation and Analysis of Intelligent Logistics Distribution Using the Expectation-Maximization Algorithm Calculation Model

This article addresses the problem that missing data reduce the accuracy of distribution path planning in traditional logistics distribution planning and management. It applies the expectation-maximization (EM) algorithm, an effective data addition algorithm, to the intelligent logistics distribution system to improve the overall efficiency and management quality of logistics distribution. First, the concept of intelligent logistics and the composition and main functions of the intelligent logistics system are introduced. Then, the core idea of the EM algorithm and its applications in intelligent logistics are described. The logistics distribution of a chain company is taken as an example. Finally, the advantages and disadvantages of the intelligent logistics system based on the EM algorithm are compared with those of traditional intelligent logistics systems based on variable neighborhood search (VNS), Tabu search (TS), and ant colony optimization (ACO). The performance test results show that the EM algorithm obtains the optimal solution 7 times, more often than the other three algorithms. Its convergence speed is slightly lower than that of ACO, but there is no obvious difference. In the actual case application, the intelligent logistics distribution system based on the EM algorithm processes orders quickly and efficiently. The average processing time of each order is 1.78 min, which is 0.237 min less than that of VNS and only 0.122 min more than that of ACO. These results indicate that the intelligent logistics distribution system based on the EM algorithm is more efficient overall. The study provides a new idea for the efficient distribution of enterprise logistics.


Introduction
With the upgrading of Internet technology and the vigorous development of the logistics industry, a new logistics management mode is favored by more and more logistics enterprises, that is, intelligent logistics. It is a new logistics operation mode promoted by the development of science and technology and e-commerce. It adds an "intelligent system" based on traditional logistics, which can ensure the efficient operation of logistics and reduce the operation cost [1].
Ding et al. pointed out that intelligent logistics was an effective way to deal with the challenges of rapidly changing customer expectations, seize the opportunities brought by new technologies, and promote the development of new business models [2]. With the continuous innovation of sensing technology, communication technology, and computer technology, the Internet of things technology has also been applied to more infrastructure such as environmental monitoring, biomedicine, and intelligent wear. Song et al. emphasized that the Internet of things could create a data ocean with the assistance of various mathematical analysis technologies and explore the complex relationship among transactions represented by these data. These characteristics help to promote the development of intelligent logistics [3]. Humayun et al. proposed a layered framework based on Internet of things and blockchain for intelligent logistics, providing intelligent logistics and transportation systems. The advantages of the Internet of things and blockchain in logistics and transportation were highlighted through real case studies [4]. The most important thing in the intelligent logistics system is to improve distribution efficiency, so distribution path planning is quite crucial. The main algorithms used in logistics distribution planning are variable neighborhood search (VNS), Tabu search (TS), and ant colony optimization (ACO). Li et al. transformed the logistics scheduling problem into a mixed-integer linear programming problem, proposed a special coding method suitable for small-scale problems, and used the variable neighborhood search algorithm framework to generate the approximate optimal solution of the problem. The important parameters were calibrated through experiments and the algorithm's robustness was analyzed. Experimental results show that the algorithm is effective [5].
Temucin and Tuzkaya minimized the total logistics cost and total delay and maximized the total average capacity utilization through the metaheuristic method based on TS. Numerical analysis shows the effectiveness of the proposed method [6]. Calabrò et al. adopted an ACO algorithm to solve the vehicle routing problem of inbound logistics. The effectiveness of this method in cost reduction and scheduling was verified by actual data, which provided useful suggestions for the large-scale operation of freight services [7].
In summary, worldwide research on intelligent logistics and distribution paths has achieved significant breakthroughs, but most of the problems studied and the results obtained are confined to traditional logistics. In the traditional logistics system, observation data may be missing due to observation conditions, instrument faults, human errors, accidents, improper downloading and uploading, improper storage, and other factors. Logistics distribution planning algorithms are highly dependent on data, and the amount and quality of data directly determine an algorithm's accuracy. Expectation-maximization (EM) is an effective data addition algorithm, which can provide important support for solving the problem of missing data in logistics management and improving the quality of logistics distribution management. Based on this, first, the relevant theories of intelligent logistics distribution and the EM algorithm are summarized. Next, an innovative logistics algorithm based on EM is put forward. Finally, the effectiveness of this method is verified by an actual case.
This exploration provides important support for consolidating the market position of logistics enterprises, enhancing their core competitiveness, and strengthening their brand influence.

Intelligent Logistics Distribution and Application of the EM Algorithm
2.1. Intelligent Logistics Distribution. Intelligent logistics distribution uses integrated intelligent technologies such as RFID and sensors to give the logistics distribution system a human-like ability to think, so that it can solve basic problems in the distribution process and ensure the normal progress of distribution. In other words, the system can analyze and make decisions according to the relevant information provided by the logistics information platform. Intelligent logistics emphasizes the management of logistics activities with the help of the dynamic information of machinery and equipment and the Internet. It combines informatization with thing-to-thing interconnection. Through more refined management, logistics distribution becomes more automatic, advanced, and intelligent. Its purpose is to make logistics operations more efficient, use resources more fully, and enrich the forms of value expression to improve the overall level of the logistics industry [8][9][10]. Figure 1 shows its main distribution functions.
In Figure 1, the first is the stocking and out-of-warehouse function. The intelligent logistics uses an intelligent storage system to realize the automatic picking and stock out of goods, which avoids the error of manual operation and improves efficiency. The second is the delivery function. The global positioning system is used to realize the real-time monitoring of vehicles and goods. The third is the delivery terminal service function. On the basis of fulfilling their basic responsibilities, enterprises also need to provide convenient value-added services for customers to enhance their goodwill toward enterprises. The last is the information processing function. The information is collected and processed through the computer terminal, and big data are adopted to analyze customer preferences and push relevant demand information. Figure 2 shows the flow of the intelligent logistics distribution management system. Figure 2 shows that the intelligent logistics distribution management system is a management platform serving logistics distribution enterprises based on the geographic information system (GIS), global positioning system (GPS), and Internet of things. Its functions include real-time monitoring, two-way communication, vehicle scheduling, real-time information query of goods, and planning distribution route. First, mobile phones and fixed-line broadband receive real-time information through the Internet, optical fiber network, and wireless modem. Then, the command and control center obtains the position, speed, and cargo information of the vehicle through the GPS satellite; the internal distribution system replans the route, and then, the global system for mobile communications' (GSM) wireless communication network transmits the scheme to the on-board GPS terminal. Finally, the GPS terminal will intelligently prompt the next driving path of the vehicle [11][12][13]. Figure 3 shows the main functions of the intelligent logistics scheduling platform.
Figure 3 suggests that the intelligent logistics scheduling platform provides functions such as the graphical display of goods information, order information, road information, and network nodes; the distribution of customer points and their attributes; the number and order number of goods required; the online query of vehicle speed and location, environment, batch, and quantity of goods; and the start and end time and sequence number of the services received by each customer. Moreover, the platform can calculate and display the optimized path according to the goods information selected by a given customer, feed back real-time road conditions, collect real-time traffic information, capture orders, and collect other information related to traveling vehicles.

EM Algorithm.
The EM algorithm is a data addition algorithm. It augments the observed data sequence with latent data and makes up the lost data through mathematical methods, turning incomplete data into complete data. This conversion of incomplete data into complete data is the defining feature of the EM algorithm [14].
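As a rough illustration of how EM turns incomplete data into complete data, the following minimal sketch fills missing values of one variable using a second, fully observed variable. It assumes a bivariate normal model, uses `None` as the missing-value marker, and the function name `em_fill` is hypothetical; none of these choices come from the case study, and the sketch is not the implementation used in the experiments.

```python
from typing import List, Optional, Sequence


def em_fill(xs: Sequence[float], ys: Sequence[Optional[float]],
            n_iter: int = 100) -> List[float]:
    """Fill missing ys (None) by EM, assuming (x, y) is bivariate normal.

    xs must be fully observed and not constant; ys needs some observed values.
    """
    n = len(xs)
    mu_x = sum(xs) / n
    var_x = sum((x - mu_x) ** 2 for x in xs) / n
    observed = [y for y in ys if y is not None]
    # Starting values: moments of the observed ys, no correlation with x yet.
    mu_y = sum(observed) / len(observed)
    var_y = sum((y - mu_y) ** 2 for y in observed) / len(observed)
    cov_xy = 0.0
    filled: List[float] = []
    for _ in range(n_iter):
        slope = cov_xy / var_x
        resid_var = max(var_y - slope * cov_xy, 0.0)
        # E-step: expected y and y^2 for each record under current parameters.
        ey, ey2 = [], []
        for x, y in zip(xs, ys):
            if y is None:
                m = mu_y + slope * (x - mu_x)
                ey.append(m)
                ey2.append(m * m + resid_var)
            else:
                ey.append(float(y))
                ey2.append(float(y) ** 2)
        # M-step: re-estimate the parameters from the expected statistics.
        mu_y = sum(ey) / n
        cov_xy = sum(x * e for x, e in zip(xs, ey)) / n - mu_x * mu_y
        var_y = max(sum(ey2) / n - mu_y ** 2, 0.0)
        filled = ey
    return filled
```

For example, `em_fill([1, 2, 3, 4, 5, 6], [3, 5, None, 9, None, 13])` returns the observed values unchanged and replaces each `None` with its conditional expectation given `x` under the fitted model.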

2.2.1. Common EM Acceleration Algorithms. Parameter-expanded expectation-maximization (PX-EM) algorithm: PX-EM accelerates the convergence of the EM algorithm. The original model is embedded in a larger model by adding a parameter $\alpha$ and appropriate functions, and the original model is recovered when $\alpha$ takes a special value $\alpha_0$ [15]. If the parameter to be estimated in the original model is $\theta$, the parameter of the expanded model is $\varphi = (\theta_*, \alpha)$, where $\theta_*$ and $\theta$ have the same dimension. For a known transformation $R$ with $\theta = R(\theta_*, \alpha)$, the parameters of the expanded model are arranged so that the observed data $Y$ carry no information about $\alpha$, that is,

$$f(Y \mid \theta) = f_x(Y \mid \theta_*, \alpha'), \quad \theta = R(\theta_*, \alpha'), \tag{1}$$

where $f_x$ is the density function of the expanded model under any $\alpha'$, and the complete-data model of the original problem is recovered at $\alpha = \alpha_0$. The PX-EM algorithm is a simple modification of the EM algorithm, obtained by changing the $t$-th iteration. The PX-E step is calculated as follows:

$$Q(\varphi \mid \varphi^{(t)}) = E\left[\log f_x(Y_0 \mid \varphi) \mid Y, \varphi^{(t)}\right], \quad \varphi^{(t)} = (\theta^{(t)}, \alpha_0).$$

The PX-M step is calculated as follows:

$$\varphi^{(t+1)} = \arg\max_{\varphi} Q(\varphi \mid \varphi^{(t)}), \quad \theta^{(t+1)} = R(\theta_*^{(t+1)}, \alpha^{(t+1)}).$$

Here $Y_0$ denotes the complete data. Each iteration of PX-EM increases the observed-data likelihood $f(Y \mid \theta)$, and its convergence properties are consistent with those of the standard EM algorithm.

Data Loss Mode and Mechanism.
At present, some methods for dealing with missing data apply only to specific missing patterns, which limits their use. Therefore, it is necessary to understand the missing mode of the data, which is generally divided into the single value missing mode and the arbitrary missing mode. If all missing values belong to the same attribute, the mode is the single value missing mode. This situation is relatively simple, but it is rare in complex data. If the missing values belong to different attributes, the mode is the arbitrary missing mode; filling data of this type is more complex and must be analyzed case by case. Generally, missing data are divided into three categories: missing at random, missing completely at random, and missing not at random. Figure 4 shows a specific description.
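The distinction between the two missing modes can be checked mechanically. The sketch below is only an illustration: it assumes records are stored as rows of equal length with `None` marking a missing value, and the function name `missing_mode` is hypothetical.

```python
from typing import Optional, Sequence


def missing_mode(rows: Sequence[Sequence[Optional[float]]]) -> str:
    """Classify a dataset by which attributes (columns) contain missing values."""
    # Collect the index of every column that has at least one missing entry.
    columns_with_missing = {
        j for row in rows for j, value in enumerate(row) if value is None
    }
    if not columns_with_missing:
        return "complete"
    if len(columns_with_missing) == 1:
        return "single value missing"
    return "arbitrary missing"
```

A dataset whose gaps all fall in one column is reported as the single value missing mode; gaps spread over two or more columns are reported as the arbitrary missing mode.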

Missing Data Processing Methods Commonly Used in the Measurement
(1) Filling Method: According to auxiliary or latent information, a mathematical method is adopted to determine a reasonable estimate that replaces each missing value, making the data more complete.
Then, the whole dataset is processed by conventional methods. The filling method should be chosen according to the data at hand. Filling methods can generally be divided into two categories according to the number of estimates constructed for each missing value. One is single imputation and the other is multiple imputation [16]. Figure 5 shows the common types of single imputation.
Multiple imputation is developed from single imputation and replaces each missing value with a series of possible values. It identifies the pattern in the data and randomly generates values that can replace the missing data, rather than replacing each missing value with a single value. By using the parameter distribution and the relationships among the variables with missing data, it estimates uncertain information more accurately than single imputation. The multiple imputation method has high reliability, fully considers the uncertain information contained in the missing data, and greatly reduces the amount of data calculation. Therefore, it has become the most widely used data filling method at present. According to the missing mode and the variable type, the multiple imputation method can use trend scoring, random regression filling, and Markov chain Monte Carlo (MCMC) models to fill the missing data [17][18][19]. The trend scoring (propensity score) method divides the observation data into several groups and fills the missing values of each group with the self-service (bootstrap) method. Incomplete data are processed in the following steps. First, a logistic regression model is fitted for the variable $X_i$ with missing values:

$$\operatorname{logit}(p_i) = \beta_0 + \beta_1 X_1 + \cdots + \beta_{i-1} X_{i-1},$$

where $p_i = P(R_i = 0 \mid X_1, \ldots, X_{i-1})$ and $R_i = 0$ indicates that $X_i$ is missing. Next, the trend score of each observation with missing data on variable $X_i$ is calculated. According to the trend score, the observation data are divided into a fixed number of groups, and the number of groups is determined by the number of observations. Finally, the missing data in each group are estimated and filled by the approximate Bayesian bootstrap method. The above steps are repeated until every $X_i$ is filled.

Figure 4: Missing data mechanisms. Missing at random: missing data are only related to complete variables. Missing completely at random: no systematic differences with nonmissing data. Missing not at random: missing data are only related to incomplete variables.
MCMC is a method based on Bayesian inference. This method cycles through two steps, namely, a filling step and a posterior step.
The data are corrected in real time and updated to fill the missing data [20]. With missing information, the posterior probability density of the parameter is as follows:

$$p(\theta \mid y_{obs}) = \int p(\theta \mid y_{obs}, y_{mis})\, p(y_{mis} \mid y_{obs})\, dy_{mis}. \tag{5}$$

In equation (5), $p(\theta \mid y_{obs}, y_{mis})$ is the posterior probability density under complete observation data. It can be evaluated only when the gaps in the observed data $y_{obs}$ have been completely filled. Similarly, the posterior probability $p(\theta \mid y_{obs})$ given the observation data alone cannot be obtained directly. Only after inference and simulation can the incomplete observation data be supplemented and the parameter then be estimated.
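Given the current parameter draw $\theta^{(t)}$, the filling step draws the missing data from the conditional distribution $p(Y_{mis} \mid Y_{obs}, \theta^{(t)})$, and the posterior step then draws a new parameter value from the complete-data posterior:

$$Y_{mis}^{(t+1)} \sim p(Y_{mis} \mid Y_{obs}, \theta^{(t)}), \qquad \theta^{(t+1)} \sim p(\theta \mid Y_{obs}, Y_{mis}^{(t+1)}).$$

Repeating the two steps produces the chain $(Y_{mis}^{(1)}, \theta^{(1)}), (Y_{mis}^{(2)}, \theta^{(2)}), \ldots$, which converges to the stationary distribution $p(Y_{mis}, \theta \mid Y_{obs})$. Once the chain has reached this distribution, the filling value of each missing datum is taken as an approximately independent draw from it.

Random regression filling method: suppose there is a linear regression relationship among the observation data $(Y_1, Y_2, \ldots, Y_k)$. If observations of $Y_k$ are missing, the fitting model is as follows:

$$Y_k = \beta_0 + \beta_1 Y_1 + \cdots + \beta_{k-1} Y_{k-1},$$

where $\beta = (\beta_0, \beta_1, \ldots, \beta_{k-1})$ is the coefficient vector of the regression model. The final filling value is calculated as follows:

$$Y_k^{*} = \beta_0^{*} + \beta_1^{*} Y_1 + \cdots + \beta_{k-1}^{*} Y_{k-1} + \sigma^{*} \varepsilon,$$

where $\sigma^{*}$ is the standard deviation (the square root of the variance) of the regression model, $\varepsilon$ is a standard normal random error, and $\beta^{*}$ represents the regression coefficients used for filling after $n$ replacements. The random regression filling method reflects the uncertainty of the missing data and the filling value by adding a residual term that follows a normal or other distribution.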
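The random regression filling idea can be sketched in a few lines for the simplest case of one predictor. This is an illustrative simplification rather than the procedure used in the study: it fits ordinary least squares on the complete cases and adds one normal residual draw per filled value, whereas a full multiple imputation would also redraw the coefficients and the variance for each replacement. The function name `stochastic_regression_fill`, the `seed` argument, and the use of `None` for missing values are assumptions.

```python
import random
from typing import List, Optional, Sequence


def stochastic_regression_fill(xs: Sequence[float],
                               ys: Sequence[Optional[float]],
                               seed: int = 0) -> List[float]:
    """Fill missing ys with a regression prediction plus a normal residual."""
    rng = random.Random(seed)
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    m = len(pairs)  # needs at least three complete cases
    mean_x = sum(x for x, _ in pairs) / m
    mean_y = sum(y for _, y in pairs) / m
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    b1 = sxy / sxx                      # slope estimated from complete cases
    b0 = mean_y - b1 * mean_x           # intercept
    rss = sum((y - b0 - b1 * x) ** 2 for x, y in pairs)
    sigma = (rss / (m - 2)) ** 0.5      # residual standard deviation
    return [
        float(y) if y is not None else b0 + b1 * x + sigma * rng.gauss(0.0, 1.0)
        for x, y in zip(xs, ys)
    ]
```

When the complete cases lie exactly on a line, the residual standard deviation is zero and the filled values coincide with the regression prediction; with noisy data, repeated calls with different seeds give different plausible fills, which is what lets the method express uncertainty.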
(2) Gray Model Method: This method analyzes time series and establishes an equation from the number sequence, that is, a model composed of a single-variable first-order differential equation. It is a prediction method that models the data after transforming the original irregular sequence into a more regular generated sequence [21]. The data processing steps are as follows. First, the order ratio of the original sequence $x^{(0)} = \{x^{(0)}(t) \mid t = 1, 2, \ldots, n\}$ is calculated as follows:

$$\sigma(k) = \frac{x^{(0)}(k-1)}{x^{(0)}(k)}, \quad k = 2, 3, \ldots, n.$$

Whether $\sigma(k)$ falls into the tolerance interval is judged as the basis for modeling. Next, the data that fail the order ratio test are transformed. For a sequence that cannot pass the order ratio test, the data need to be processed by translation transformation, logarithmic transformation, square root transformation, or other related transformations. Then, a cumulative transformation is conducted on the qualified data to generate a new sequence as follows:

$$x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i), \quad k = 1, 2, \ldots, n,$$

where $x^{(1)}$ is the one-time accumulation sequence. Then, the mean sequence $z^{(1)}(k) = 0.5\,(x^{(1)}(k) + x^{(1)}(k-1))$ of $x^{(1)}$ is calculated for $k = 2, 3, \ldots, n$. Intermediate parameters are calculated as follows:

$$C = \sum_{k=2}^{n} z^{(1)}(k), \quad D = \sum_{k=2}^{n} x^{(0)}(k), \quad E = \sum_{k=2}^{n} z^{(1)}(k)\, x^{(0)}(k), \quad F = \sum_{k=2}^{n} \left(z^{(1)}(k)\right)^{2}.$$

The obtained intermediate parameters are used to calculate the model parameters:

$$a = \frac{CD - (n-1)E}{(n-1)F - C^{2}}, \quad b = \frac{DF - CE}{(n-1)F - C^{2}}.$$

The final model is as follows:

$$\hat{x}^{(1)}(k+1) = \left(x^{(0)}(1) - \frac{b}{a}\right) e^{-ak} + \frac{b}{a},$$

where $\hat{x}^{(1)}(k+1)$ is the calculated value of the accumulated sequence model and $x^{(0)}(1)$ is the first datum of the original sequence. The summation principle is as follows. The first datum of the original sequence is the first datum of the generated column. The sum of the first and second raw data is the second datum of the generated column. The sum of the third datum of the original sequence and the second datum of the generated column is the third datum of the generated column. Following this rule, the newly generated column can be obtained. The restored sequence data, obtained by inverse accumulation, are as follows:

$$\hat{x}^{(0)}(k+1) = \hat{x}^{(1)}(k+1) - \hat{x}^{(1)}(k).$$
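The gray model steps can be sketched as a small GM(1,1) routine. The code is a hedged illustration under several assumptions: the tolerance interval $(e^{-2/(n+1)}, e^{2/(n+1)})$ used in the order ratio test is the commonly quoted choice and is not stated in the text, the model parameters $a$ and $b$ are obtained by least squares on $x^{(0)}(k) + a\,z^{(1)}(k) = b$, and the function names `level_ratios_ok` and `gm11` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def level_ratios_ok(x0: Sequence[float]) -> bool:
    """Order ratio test: every x0(k-1)/x0(k) must lie in the tolerance interval."""
    n = len(x0)
    low, high = math.exp(-2.0 / (n + 1)), math.exp(2.0 / (n + 1))
    return all(low < x0[k - 1] / x0[k] < high for k in range(1, n))


def gm11(x0: Sequence[float], steps: int = 1) -> Tuple[List[float], List[float]]:
    """Fit GM(1,1); return (restored in-sample values, out-of-sample forecasts)."""
    n = len(x0)
    # One-time accumulation sequence x1.
    x1, total = [], 0.0
    for value in x0:
        total += value
        x1.append(total)
    # Mean sequence z(k) of adjacent accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    # Intermediate sums for the least-squares fit of x0(k) + a*z(k) = b.
    c = sum(z)
    d = sum(x0[1:])
    e = sum(zk * xk for zk, xk in zip(z, x0[1:]))
    f = sum(zk * zk for zk in z)
    denom = (n - 1) * f - c * c
    a = (c * d - (n - 1) * e) / denom
    b = (d * f - c * e) / denom

    def x1_hat(k: int) -> float:
        # Time response of the accumulated sequence; k = 0 is the first point.
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    # Inverse accumulation restores values on the original scale.
    restored = [float(x0[0])] + [x1_hat(k) - x1_hat(k - 1)
                                 for k in range(1, n + steps)]
    return restored[:n], restored[n:]
```

A typical use is to call `level_ratios_ok` first and, if it passes, call `gm11(series, steps=1)` to obtain the fitted values and a one-step-ahead value that could stand in for a missing observation at the end of the series.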

Logistics Distribution Steps.
Figure 6 shows a complete logistics distribution process. The first step is to divide the basic delivery area. The customer locations are systematically analyzed and divided into regions, and then each customer is assigned to a delivery area to prepare for later distribution decisions. The second step is vehicle stowage. Because distribution goods have different attributes, goods with different attributes must be distinguished before distribution to ensure safe distribution and improve distribution efficiency. In this way, the distribution methods and tools can be determined quickly and accurately after an order is received. The third step is to arrange vehicles. The company needs to determine the type and tonnage of the distribution vehicles. The fourth step is to determine the delivery order. The fifth step is to choose the distribution route. The route and delivery time shall be determined according to actual factors such as the geographical location of customers, the traffic conditions during delivery, the route with the shortest distance and the lowest cost, and, when necessary, the special requirements of some customers or of the actual environment on the delivery time, vehicle model, and order. The last step is to deliver the goods to the customer.

Case Analysis.
The distribution system between nine branches and the distribution center of a chain company is selected as an example to discuss the application effect of the EM algorithm. Given the hard time window, if an area is divided too large, some order points will exceed the specified time and cannot be served. This hinders the fleet from completing its tasks on time and increases the company's overall operating cost [22]. Besides, it is stipulated that the fleet dispatched by the company shall complete the distribution of each area. Vehicles cannot distribute across regions, the time window requirements should be strictly followed, and rejection is not allowed except under special circumstances. In addition, it is assumed that the driving speed of vehicles is not lower than the average driving speed of the road, and that the status, position, and speed of these vehicles are monitored in real time by the central distribution system. The real-time information of any vehicle is available at any time, and the vehicle path can be replanned at any time. Figure 7 shows the distribution of specific branches.
In Figure 7, the distribution center is set to 0, and the nine branches are set from 1 to 9.

Algorithm Comparison.
As mentioned above, at present, the most commonly used algorithms in logistics distribution mainly include VNS, TS, and ACO. In order to prove the feasibility of the proposed algorithm, the three algorithms are compared with the EM algorithm introduced.
The comparative experiments cover convergence performance, path length, vehicle time consumption, and cost saving, and are carried out without changing the parameters.
For the VNS algorithm, the initial Tabu length is set to 1, the maximum interval of average solution repetition is 1, the proportion of Tabu length increase is 1.1, the proportion of Tabu length decrease is 0.9, the maximum number of iterations is 300, the maximum number of solution repetitions is 3, and the maximum number of solutions in the Tabu table is 6. For the TS algorithm, the initial Tabu table length is set to 4 and the neighborhood size is 10, which is gradually adjusted according to the search process. If the feasible solution is not improved after the number of moves reaches a certain level, local cycles may occur. In that case, the length of the Tabu table needs to be increased. If all moves are prohibited, the neighborhood size needs to be increased. For the ACO algorithm, the number of ants is 30, the pheromone volatilization coefficient is 0.3, and the amount of information released by ants after completing one cycle is 50. The cycle stops when the difference between the optimal solutions obtained from two adjacent cycles is less than 0.01. The value of the heuristic factor α is 1, and the expected heuristic factor β is 3.
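To show how the listed ACO settings fit together, the following sketch runs a basic ant colony search for a single closed route that starts at the distribution center (node 0). It is an illustration only: it ignores vehicle capacity and the hard time windows of the case study, the cap `max_cycles` and the argument `seed` are added assumptions, and the function names `tour_length` and `aco_route` are hypothetical. The defaults mirror the stated parameters (30 ants, volatilization coefficient 0.3, released information 50, α = 1, β = 3, stop when adjacent cycle optima differ by less than 0.01).

```python
import random
from typing import List, Sequence, Tuple


def tour_length(tour: Sequence[int], dist: Sequence[Sequence[float]]) -> float:
    """Length of the closed route that returns to its starting node."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def aco_route(dist: Sequence[Sequence[float]], n_ants: int = 30, rho: float = 0.3,
              q: float = 50.0, alpha: float = 1.0, beta: float = 3.0,
              tol: float = 0.01, max_cycles: int = 200,
              seed: int = 0) -> Tuple[List[int], float]:
    """Basic ant colony search; node 0 is the distribution center."""
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]           # pheromone on each edge
    best_tour: List[int] = []
    best_len = float("inf")
    prev_cycle_best = None
    for _ in range(max_cycles):
        tours = []
        for _ in range(n_ants):
            tour = [0]
            unvisited = set(range(1, n))
            while unvisited:
                i = tour[-1]
                candidates = sorted(unvisited)
                # Attractiveness: pheromone^alpha * (1 / distance)^beta.
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                           for j in candidates]
                j = rng.choices(candidates, weights=weights)[0]
                tour.append(j)
                unvisited.remove(j)
            tours.append((tour_length(tour, dist), tour))
        # Evaporate, then let every ant deposit q / (its route length).
        for i in range(n):
            for j in range(n):
                tau[i][j] *= 1.0 - rho
        for length, tour in tours:
            for k in range(n):
                u, v = tour[k], tour[(k + 1) % n]
                tau[u][v] += q / length
                tau[v][u] += q / length
        cycle_len, cycle_tour = min(tours)
        if cycle_len < best_len:
            best_len, best_tour = cycle_len, cycle_tour
        # Stop when two adjacent cycles give nearly the same optimum.
        if prev_cycle_best is not None and abs(prev_cycle_best - cycle_len) < tol:
            break
        prev_cycle_best = cycle_len
    return best_tour, best_len
```

The function expects a symmetric matrix of positive distances between the center and the branches; the returned route lists each node once, beginning with node 0, together with its closed-route length.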

Algorithm Running Environment.
The algorithm is realized by MATLAB simulation software. Table 1 shows the operating environment.

Basic Statistics of Logistics Distribution.
The average speed and real-time vehicle speed of different road types among all branches are obtained through measurement. Figure 8 presents the specific results. Figure 8 shows that the average speed of the class I road is the highest, which is 60 km/h, and the real-time vehicle speed is 45 km/h. The main sections include 3-5 and 4-0-7. The average speed of the class II road is 40 km/h and the real-time vehicle speed is 30 km/h. The main sections include 0-1, 1-9, and 5-7. All other sections are class III roads. Figure 9 shows the distance among branch stores (km) and the demand for goods (ton). Figure 9 displays that the branch with the farthest distribution distance is store 3, followed by store 6. Store 7 needs the most goods, followed by store 1. It suggests that there is no linear relationship between the distance from each store to the distribution center and its demand. When designing the vehicle route, priority should be given to store 3 and store 6, which are farthest away, and store 7 and store 1, which are in greatest demand.

Performance Comparison of Four Algorithms.
Each algorithm is tested independently 30 times. The optimal solution, the worst solution, and the number of times the optimal solution is obtained are recorded for each algorithm, and the convergence results of each algorithm are counted. Figure 10 displays the experimental results. Figure 10 shows that the optimal solution and the worst solution of the four algorithms are basically the same under the same scale. The index with the largest difference is the number of times the optimal solution is obtained. The VNS algorithm obtains the optimal solution only three times. TS and ACO obtain the optimal solution the same number of times, 6 times each. The EM algorithm is the best and obtains the optimal solution 7 times.
Each of the four algorithms is then run 12 times to search for the optimal path. Figure 11 shows the calculation results. Figure 11 shows that the optimal path length obtained is 67.5. VNS does not obtain the optimal path, and TS obtains the optimal path in the third run. ACO obtains the optimal path after 5 runs. The EM algorithm obtains the optimal path in the second run and obtains the optimal path most often in the subsequent runs.

Comparison of Application Examples of Four Algorithms.
Without changing the parameter setting, the time used (min) and the number of vehicles used by the four algorithms are compared when the number of transportation branch stores is 3, 6, and 12, respectively. Figure 12 displays the specific results. The average number of orders that can be processed by one vehicle in the four methods is 12.963, 15, 15, and 21.17. Figure 13 displays that the average time consumed per order by the four methods is 2.017 min, 1.80 min, 1.658 min, and 1.78 min, respectively. The efficiency of the VNS algorithm is the lowest. Although the average time consumption per order of the EM algorithm is not the lowest, the EM algorithm is the most efficient algorithm overall. Finally, the advantages and disadvantages of the four algorithms are considered in terms of evaluation factors such as the saved distribution distance, the fuel cost, and the comprehensive cost of distribution. Figure 13 shows the specific results.
As shown in Figure 13, considering the fuel cost, the EM algorithm saves the most, which is 3.17 yuan. Considering the comprehensive cost of distribution, the comprehensive cost of the TS algorithm is basically the same as that of the original scheme, whereas the optimized designs of the other three algorithms reduce the comprehensive cost of the original scheme to varying degrees. Among them, the EM algorithm saves the most cost, which is 7.81 yuan.

Results and Discussion.
The EM algorithm is compared with the VNS, TS, and ACO algorithms in terms of convergence performance, path length, vehicle time consumption, and cost saving.
The results are as follows: (1) The VNS algorithm obtains the optimal solution only three times, indicating that the algorithm easily converges to a local minimum. The TS algorithm and the ACO algorithm obtain the optimal solution the same number of times, 6 times each. The EM algorithm is the best and obtains the optimal solution 7 times. In terms of convergence, the ACO algorithm has the fastest convergence speed, followed by the EM algorithm, and the VNS algorithm has the slowest convergence speed. (2) The VNS algorithm does not obtain the optimal path. The TS algorithm obtains the optimal path in the third run, but in the later runs it obtains the optimal path less often. The ACO algorithm obtains the optimal path after 5 runs. The EM algorithm obtains the optimal path in the second run and obtains the optimal path most often in the later runs. Therefore, the effect of the EM algorithm is better. (3) The average time consumed per order by the four methods is 2.017 min, 1.80 min, 1.658 min, and 1.78 min, respectively. The efficiency of the VNS algorithm is the lowest. Although the average time consumption per order of the EM algorithm is not the lowest, taken together, the EM algorithm is the most efficient algorithm. (4) Considering the fuel cost, the EM algorithm can reach the optimal path faster, and its cost saving is the largest, which is 3.17 yuan. The optimized design of the distribution route reduces the company's fuel cost, energy consumption, exhaust emission, and environmental pollution. Considering the comprehensive cost of distribution, the comprehensive cost of the TS algorithm is basically the same as that of the original scheme. After the optimization design of the other three algorithms, the comprehensive cost is reduced to varying degrees compared with the original scheme. Among them, the EM algorithm saves the most cost, which is 7.81 yuan.
Combined with the actual situation, the company should not consider only one factor in choosing the distribution scheme but should comprehensively weigh factors such as vehicles, manpower, distance, and fuel cost. Hence, the comprehensive cost should be the basis for the company to decide the distribution scheme. If the operation time allows, the result obtained by the EM algorithm is the best; its convergence speed is slightly slower than that of the ACO algorithm, but there is no significant difference.

Conclusion
In the traditional logistics distribution management system, observation data may be missing due to observation conditions, instrument failure, human error, accidents, improper download and upload, improper storage, and other factors. This has an adverse impact on the overall quality of logistics planning and management. Based on this, first, the relevant theories of intelligent logistics distribution and the EM algorithm are summarized. Next, the intelligent logistics distribution scheme based on the EM algorithm is proposed. Finally, the algorithms widely used in research on logistics distribution planning are selected as controls, and the effectiveness of this method is verified by an actual case. The results show that the convergence speed of this method is second only to that of ACO, with no significant difference between the two. It obtains the optimal path earliest and most often. Its overall efficiency is the highest when processing orders. The fuel cost and comprehensive cost it saves are the largest. It can be concluded that, when the operation time allows, the result obtained by the EM algorithm is the best. A limitation of this research is that road condition information and traffic rules and regulations are ignored in the case study. Therefore, follow-up research should make reasonable improvements in combination with the specific situation to further enhance the advantages of the algorithm. Applying this method to an actual logistics distribution management system can contribute to improving the quality of enterprise logistics distribution management and enhancing core competitiveness.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.