Deep Learning Based Purchase Forecasting for Food Producer-Retailer Team Merchandising

Food waste from expired products has long been an important issue. In Taiwan, more than 36,000 metric tons of unopened expired food, worth more than $130 million, are thrown away by retail stores each year. Insufficient inventory costs retailers business opportunities, whilst excessive inventory results in discarded merchandise. Foods with a short shelf life are particularly vulnerable. Typically, a food producer and a retailer form a team merchandising (MD) arrangement. The team MD mechanism is responsible for ensuring safety and quality, not for forecasting demand. This study uses artificial neural networks (ANNs) to analyze sales data and forecast purchase volume in response to changes in store environment, weather, events, and consumer attributes. The study object is a type of cream puff with a short shelf life that is cobranded by a retailer. According to the experimental results, the proposed model effectively reduces the error in purchasing; the mean-square percentage error (MSPE) of the forecast values is less than 6%. The importance of this study lies in promoting the team MD's green energy management capabilities in food production and verifiably achieving the goal of environmental sustainability.


Introduction
Retail stores are the last mile to consumers, and they have a significant impact on the economic momentum of a country's domestic demand. Foods with a short shelf life account for a major share of the products sold in retail stores. These foods are unique and are decisive to the profit or loss of the stores. Insufficient inventory, for example, can result in the loss of commercial opportunities, whilst excessive inventory results in discarded products. Changes in the environment, weather, events, and consumer attributes at a single retail location can all affect the sales of items with a short shelf life. If these foods expire, they cannot be sold or eaten, which wastes both the food and the energy consumed in the whole manufacturing process.
This is contrary to our goal of environmental sustainability and energy conservation. Food waste is a global issue, and many developing countries have invested in research on food waste [1,2]. Foods with a short shelf life are usually cobranded by the retail stores and brands, and the manufacturers join to form a team MD to control food quality. However, in a team MD, the manufacturer is only responsible for controlling food safety and quality [3] and has no role in the sales of the foods. Therefore, the retail stores must carry out the sales analysis and forecasting for foods with a short shelf life [4][5][6] to create their strategies for purchasing from the manufacturer in the team MD. The retail stores face a dilemma when they make these plans. On the one hand, they must increase inventories as much as possible to secure revenue. On the other hand, they must avoid overstocking so that food is not wasted due to expiration. Several reasons may cause foods with a short shelf life in retail stores to turn into discarded products. One, the purchased goods expire before they are sold. Two, the stores plan their purchases solely based on monthly average sales data. Three, consumer demand is misestimated. Therefore, it is critical for the retail stores to accurately forecast the sale quantity of foods with a short shelf life, determine purchasing strategies [7], and share information with producers in the team MD to help them understand demand.
To solve the above problems, this study, from the standpoint of team MD, determines the appropriate retail locations and items with a limited shelf life before offering a forecast model for discussion. The suggested method efficiently reduces the fluctuation of purchasing errors, provides retailers with a precise forecast of sale quantity, and provides feedback to manufacturers in the team MD, decreasing waste caused by overproduction. The rest of this article is organized as follows: Section 2 presents the literature review of purchase forecasting, Section 3 presents the forecasting method adopted in this study, Section 4 presents the experimental results and discussions, and the conclusion is in Section 5.

Literature Review
External factors such as changes in social formation and family structure, together with food producers' more mature food preservation technology, have resulted in an increase in both demand and supply of goods with a short shelf life in recent years [8,9]. In the past, research on retail sales forecasting focused on inventory management and estimating purchase demand. When the expiration date is factored in, the situation becomes more complicated. As a result, the importance of forecasting increases if the sales forecasting targets foods with a short shelf life. When it comes to sales forecast models, most retail stores rely on manual estimation to determine order quantity. According to Adebanjo and Mann [10], most food industry businesses continue to perform poorly in terms of sales forecasting. The manufacturing industry, for its part, is dealing with the market's trend of globalization. Manufacturing firms place a high value on their market positions and go to great lengths to reduce production and product development costs. According to Porter [11], a company has a competitive advantage over its competitors when it can "create higher values for its customers" and when the revenue generated by creating such values exceeds the costs. When an enterprise contacts its suppliers, distributors, and customers, the C2B model [12,13] of the enterprise can connect the entire industry's value chain [14]. For its efficiency, this concept favors a system known as Just-In-Time (JIT) [15][16][17], which is also known as the Toyota production system. Sohal et al. [18] believed that JIT can help reduce waste in all aspects of manufacturing. Willis and Houston [19] pointed out that the JIT method can shorten the delivery time, reduce the inventory and material costs, and improve the quality, reliability, and flexibility of delivery time.
In their research, Lawrence and Lewis [20] discovered that just-in-time delivery of suppliers and supplier participation in improvement activities are important factors of JIT procurement technology, demonstrating the importance of the C2B model. Furthermore, Schonberger and Gilbert [21] believed that if a supplier provides ingredients and materials to a consumer, it is possible that the supplier only possesses the required inventory at the time of delivery in the JIT system. Based on the above findings, JIT can be regarded as a Collaborative Planning, Forecasting, and Replenishment (CPFR) method [22][23][24], which is a practical method used in supply chains.
CPFR can connect the processes of manufacturing order management, material forecasting, and replenishment, allowing partners to coordinate and use key materials efficiently throughout the manufacturing process. Two early adopters of CPFR, Wal-Mart and Warner-Lambert [25], initiated an experiment based on this approach to jointly forecast the replenishment of Listerine mouthwash. By CPFR, they successfully reduced the floor stock from 98% to 87%, shortened the delivery time from 21 days to 11 days, and shortened the inventory days to less than two weeks. Sagar [26] believed that CPFR is a reasonable concept aimed at transforming the traditional supply chains into a series of C2B-driven processes. According to Simchi-Levi et al. [27], CPFR is an information and communication service based on digitalization and intellectualization that uses a joint forecast model to improve the efficiency of collaboration.
It is more important to achieve fast delivery and low inventory when selling goods with "short shelf lives." As a result, the most valued characteristics of integrated distribution channels and manufacturers are typically included in the form of team MD. However, forecasting consumer demand and providing consumers with fresh products that can be enjoyed instantly and quickly is an important responsibility of team MD. In Taiwanese retail stores, store managers typically shoulder this responsibility, even though there are no appropriate technologies, tools, or research models for the retail store to forecast the demand for goods. When introducing the CPFR system to the team MD, the Voluntary Inter-Industry Commerce Standards (VICS) recommends that enterprises follow a five-step guideline: evaluate the current situation of the enterprise and its partners, determine the goals and scope of CPFR practice, adjust procedures and technologies to ensure collaboration, implement plans through collaborative procedures and technologies, and evaluate the efficiency and consider follow-up activities. Although Schonberger and Gilbert [21] determined the input data and output results of each stage, those results are still unable to encourage the supply chain members to fully utilize their core competitiveness. The main points put forward by Venkatraman et al. [28] are also used to illustrate and evaluate the effectiveness of manufacturers [29].
In a relevant study that focuses on forecast models, Chen and Qu [30] proposed integrating grey relation analysis and a multilayer function link network for the sales forecasting of the retail sector. Grey relation analysis can help select essential aspects that influence sales, while multilayer function link networks can provide prediction modes that are faster and produce more accurate results. Au et al. [31] proposed an evolutionary neural network-based sales forecasting system that exploits the connections of an incomplete evolutionary neural network to accelerate the convergence rate and produce more accurate results than traditional SARIMA. Sun et al. [32] forecasted fashion retail sales using a new extreme learning machine method based on artificial neural networks. It not only produced better outcomes than conventional artificial neural networks but also dramatically sped up learning and convergence. Aburto and Weber [33] forecasted sales using a hybrid intelligent system that combined an autoregressive integrated moving average (ARIMA) model with a multilayer perceptron (MLP) neural network and improved the forecasting accuracy of a Chilean supermarket's restocking system. Forecasts are considered the foundation of plans. The results of forecasting are often used as a foundation for budgeting, production capacity, sales, production, inventory, manpower, and procurement. Forecasting accuracy is considerably more critical in a market with volatile demand. Sales forecasting, one of the data mining techniques, usually involves looking for usable trends or patterns in historical data using various statistical or regression methods. ARIMA is a simple probability prediction method proposed by Lee and Tong [34] that has been widely used in the fields of finance, economics, and social sciences since its publication.
For large data sets with linear distributions, the Box-Jenkins technique provides extraordinarily good forecasting capabilities [35]. Meanwhile, artificial neural networks are formed by connecting multiple artificial neurons that imitate biological neural networks, creating learning networks with high-speed computing capabilities, high memory capacity, learning skills, and fault tolerance. According to the research of Chen and Yao, artificial neural networks have far better forecasting ability and efficiency than standard statistical methods [36]. In comparison to statistical models, an artificial neural network does not require a specific functional form to be specified, and the data are not bound to a specific statistical distribution hypothesis; thus, it has a broader application field in dealing with difficult problems [37].
This study proposes a forecast model for the purchasing strategy of a single good. It is hoped that in the future, the purchasing strategy of this good can be extended to the purchase forecasting of other goods in the related product classes. A product class is a group of products that are highly relevant to consumers and can be substituted for one another. However, the in-store space of a retail store is limited. A single product is the smallest unit of a product class. When any single product is out of stock, the product class may not necessarily be out of stock. However, every good with a short shelf life is a part of the product class structure. Once the purchasing strategy for a single product with a short shelf life is efficient, it becomes simpler to build a purchasing strategy for the product class, and this would be a direction for future research.

Forecasting Method
In this section, we will explain how to perform the preprocessing of data before the analysis and introduce the forecasting model used.

Data Collection and Preprocessing.
Retail stores that are members of team MD are often large. It is significantly advantageous for stores of such sizes to undertake sales forecasting of goods and develop purchasing strategies. Usually, such retail stores are part of the franchising business. Due to limited resources, the scope of this research has been confined to a single franchise firm, and the analysis has been undertaken using deidentified data. First, by observing the locations of different retail stores, it was found that hot items among different groups of people are diverse in different locations (Table 1).
Next, retail store managers in commercial districts were questioned based on the study's research goal to explore the purchasing methods of items with a limited shelf life. The research object was a type of cream puff that is cobranded by the retailer and a brand, chosen for the following reasons:
(i) The item is a packaged dessert/food and is a cobranded product of a team MD
(ii) The item is a hot-selling product with a high revenue share and hence is worthy of forecasting as an indicator
(iii) In order to provide a better flavor, not many food additives are added to the cream puffs, and hence, the shelf life is short
(iv) In the future, more products of the same product class will be launched; hence, the analysis model of this item is exemplary
After determining the retail business and the item to be researched, we analyzed the physical sales data and relevant information, such as weather, available in the point-of-sale (POS) system of the retail stores and selected applicable criteria for the investigation.
Although much data is available for analysis, an assessment is required to determine whether each type of data can actually be collected. Furthermore, among the collected data, not all factors have a high correlation with the target of the forecast. Therefore, before data analysis, database normalization and dimensionality reduction with principal component analysis (PCA) [38][39][40] are required to retain the eigenvalues with higher correlations to the target of the forecast, take the eigenvalues as the variables, and obtain the dependent variable for sales forecasting.
Among the collectible data, the POS sales data and open data are the most accurate. Therefore, the following variables (Table 2) are chosen as the eigenvalues for the analysis of this study.
Data preprocessing is required for all eigenvalues in order to discover the factors with the highest correlations to the forecast aim. PCA is used to reduce dimensionality. In the correlation matrix, the eigenvalues with the highest correlation coefficients to the dependent variable are selected. In other words, the contribution of each eigenvalue (i.e., the degree of reducing the variations) is tested to determine whether the eigenvalue should remain in the model; if not, it will be removed.
During the data preprocessing with PCA, it was found that the following are the eigenvalues with high correlations: the sales of the previous day, the sales of the seventh day before, whether there is a Buy One, Get One free offer, whether there is any discount for customers using mobile payment, whether it rains, and whether it is a weekend, indicating that their degrees of influence are more significant. Hence, Variable01, Variable07, Variable30, Variable31, Variable32, and Variable35 are chosen as the eigenvalue variables of the forecasting model.
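The screening step described above can be illustrated with a minimal sketch. It builds lagged sales columns named after Variable01 to Variable28 and ranks them by absolute Pearson correlation with the same-day sales. This is a plain correlation screen, not the PCA procedure itself, and the function names (`lag_matrix`, `rank_by_correlation`) and the synthetic weekly pattern are assumptions made only for illustration.

```python
import numpy as np


def lag_matrix(sales, max_lag=28):
    """Build lag features: column k-1 holds the sales k days before each target day."""
    sales = np.asarray(sales, dtype=float)
    rows = range(max_lag, len(sales))
    X = np.array([[sales[t - k] for k in range(1, max_lag + 1)] for t in rows])
    y = sales[max_lag:]  # same-day sales, the dependent variable
    return X, y


def rank_by_correlation(X, y, names, top_k=6):
    """Return the top_k feature names by absolute Pearson correlation with y."""
    scores = {}
    for j, name in enumerate(names):
        col = X[:, j]
        if col.std() == 0:  # a constant column has no defined correlation
            continue
        scores[name] = abs(np.corrcoef(col, y)[0, 1])
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Synthetic daily sales with a strict weekly cycle (made-up numbers, not study data).
weekly_pattern = [10, 12, 11, 13, 20, 30, 28]
demo_sales = [weekly_pattern[t % 7] for t in range(60)]
names = [f"Variable{k:02d}" for k in range(1, 29)]
X_demo, y_demo = lag_matrix(demo_sales)
print(rank_by_correlation(X_demo, y_demo, names, top_k=4))
```

On real data, binary indicators such as promotions, rain, or weekends could be appended as extra columns and ranked in the same way.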

Forecasting Model. An artificial neural network is currently one of the most popular artificial intelligence (AI) analysis techniques. It is a deep learning method, commonly used in classification analysis and forecasting [41][42][43][44].
Table 2: Variables chosen as eigenvalues for the analysis.
Variable01: single-day sales quantity of the previous day
Variable02: single-day sales quantity of the second day before
Variable03: single-day sales quantity of the third day before
...
Variable28: single-day sales quantity of the 28th day before
Variable29: whether there is a discount on the single item in the retail store
Variable30: whether there is a buy one, get one free offer in the retail store
Variable31: whether there is any discount for customers using mobile payment
Variable32: whether it rains
(Open data)
Variable33: temperature
Variable34: humidity
Variable35: whether it is a weekend

In contrast to statistical and regression models, an artificial neural network can accommodate nonlinear relationships between its input and output variables, which compensates for the shortcomings of linear regression analysis. The most representative artificial neural networks include the convolutional neural network (CNN), the recurrent neural network (RNN), and the back-propagation neural network (BPN). CNN is well suited to image processing, whereas RNN excels at speech recognition. BPN has been used by many scholars for forecasting analysis; therefore, BPN is used as the research model in this study. A BPN, like any other deep learning method, includes at least an input layer, a hidden layer, and an output layer. Using the hidden layers, BPN transforms the independent variables transferred from the input layer with a nonlinear function. Next, the output layer applies a nonlinear function again to the output of the hidden layers. The above steps are repeated, and a forecasting model is finally obtained after repeated learning. Due to the existence of hidden layers, such a model is called deep learning; the more hidden layers, the deeper the learning depth. The steps used in this study are briefly described below. In the training phase, the hidden layers produce the output vector H_h. We then calculate the corrections of the weighted value ΔW_hy and the bias Δθ_y, starting from the initial weighting values W_xh and W_hy. Here W_ij mimics the strength of the connection between the i-th and j-th neurons, and ΔW is the correction applied to it. δ is the amount of difference between the processing unit connected to W and the upper-level processing unit. The weighted value and the bias of the output layer are then updated. The formula of the output layer is as follows:

\[ W^{hy}_{hj} = W^{hy}_{hj} + \Delta W^{hy}_{hj}, \qquad \theta^{y}_{j} = \theta^{y}_{j} + \Delta\theta^{y}_{j}. \]
The model is deemed trained after repeated calculations of formula (6) until convergence is achieved. The parameters include the learning rate η, the momentum α, the initial weighting values W_xh and W_hy and the initial bias values θ_h and θ_y of the network (set with random numbers), the input training sample X and target output value T, and the inferred output value Y of the hidden layers and the output layer.
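For orientation, one conventional formulation of a single-hidden-layer BPN with sigmoid activation, learning rate η, and momentum α is sketched below. It is an assumption, not necessarily the exact set of formulas used in this study; the index names i, h, j and the iteration counter n are illustrative, and the bias is subtracted inside the net input by convention.

```latex
\begin{align*}
net_h &= \sum_i W^{xh}_{ih} X_i - \theta^{h}_{h}, &
H_h &= f(net_h) = \frac{1}{1 + \exp(-net_h)}, \\
net_j &= \sum_h W^{hy}_{hj} H_h - \theta^{y}_{j}, &
Y_j &= f(net_j) = \frac{1}{1 + \exp(-net_j)}, \\
\delta_j &= (T_j - Y_j)\, Y_j\, (1 - Y_j), &
\delta_h &= H_h (1 - H_h) \sum_j W^{hy}_{hj}\, \delta_j, \\
\Delta W^{hy}_{hj}(n) &= \eta\, \delta_j H_h + \alpha\, \Delta W^{hy}_{hj}(n-1), &
\Delta \theta^{y}_{j}(n) &= -\eta\, \delta_j + \alpha\, \Delta \theta^{y}_{j}(n-1), \\
\Delta W^{xh}_{ih}(n) &= \eta\, \delta_h X_i + \alpha\, \Delta W^{xh}_{ih}(n-1), &
\Delta \theta^{h}_{h}(n) &= -\eta\, \delta_h + \alpha\, \Delta \theta^{h}_{h}(n-1).
\end{align*}
```

Under this convention the corrections follow from gradient descent on the squared error between T and Y, and each parameter is then updated by adding its correction, as in the output-layer update W = W + ΔW, θ = θ + Δθ.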
In the validation phase, the learning rate, momentum, weighting values W_xh and W_hy, bias values θ_h and θ_y, input sample X, target output value T, and inferred output value Y of the hidden layers and the output layer are the same as those of the training phase, but the output of the hidden layers (Formulas (7) and (8)) and the inferred output value of the output layer (Formulas (9) and (10)) are calculated after the error function of the network has been minimized. Usually, the error function is used to fine-tune the learning quality of the model.
\[ Y_j = f(net_j) = \frac{1}{1 + \exp(-net_j)}. \tag{10} \]

The convergence criterion in the experiment is a mean-square error of less than 0.05 or more than 1000 iterations.
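The training loop and stopping rule described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the hidden-layer size, learning rate, momentum value, weight initialization range, batch averaging, and synthetic data are all assumptions, and the targets are assumed to be scaled into (0, 1) so that they match the sigmoid output of formula (10).

```python
import numpy as np


def sigmoid(net):
    """Logistic activation, Y = 1 / (1 + exp(-net)), as in formula (10)."""
    return 1.0 / (1.0 + np.exp(-net))


def forward(X, params):
    """Forward pass: input layer -> hidden layer -> output layer."""
    W_xh, W_hy, theta_h, theta_y = params
    H = sigmoid(X @ W_xh - theta_h)
    Y = sigmoid(H @ W_hy - theta_y)
    return H, Y


def train_bpn(X, T, n_hidden=4, eta=0.5, alpha=0.5,
              mse_tol=0.05, max_iter=1000, seed=0):
    """Batch back-propagation with momentum (hyperparameters are assumed values)."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], T.shape[1]
    # Random initial weights and biases: W_xh, W_hy, theta_h, theta_y.
    params = [rng.uniform(-0.5, 0.5, (n_in, n_hidden)),
              rng.uniform(-0.5, 0.5, (n_hidden, n_out)),
              rng.uniform(-0.5, 0.5, n_hidden),
              rng.uniform(-0.5, 0.5, n_out)]
    steps = [np.zeros_like(p) for p in params]
    for n_iter in range(1, max_iter + 1):
        H, Y = forward(X, params)
        mse = float(np.mean((T - Y) ** 2))
        if mse < mse_tol:  # stop: mean-square error below the tolerance
            break
        delta_y = (T - Y) * Y * (1.0 - Y)                  # output-layer error term
        delta_h = H * (1.0 - H) * (delta_y @ params[1].T)  # back-propagated to hidden layer
        directions = [X.T @ delta_h / len(X),
                      H.T @ delta_y / len(X),
                      -delta_h.mean(axis=0),
                      -delta_y.mean(axis=0)]
        for k in range(4):
            steps[k] = eta * directions[k] + alpha * steps[k]  # correction with momentum
            params[k] = params[k] + steps[k]                   # W = W + dW, theta = theta + dtheta
    return params, mse, n_iter


# Synthetic demonstration: 160 samples, 6 inputs, target already scaled into (0, 1).
rng = np.random.default_rng(1)
X_demo = rng.random((160, 6))
T_demo = 0.2 + 0.6 * X_demo[:, :1]
params, mse, n_iter = train_bpn(X_demo, T_demo)
print(f"stopped after {n_iter} iterations, MSE = {mse:.4f}")
```

The loop ends as soon as either condition of the convergence criterion holds, so the returned iteration count never exceeds `max_iter`.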
In this study, the results of the proposed deep learning forecasting model are compared with the historical data, and the error of the forecasting results serves as the basis for the team MD retailers' strategies for purchasing from manufacturers, so that foods with a short shelf life are not wasted while the operational goals of the retailers are still met.

Experimental Results and Discussions
Due to food safety concerns, fewer preservatives are added to several goods with a short shelf life, so that they are more favorable to the public. Keeping this in mind, large retail stores have successively launched foods of their own brands or cobranded products. The suppliers/distributors of foods with a short shelf life care most about ingredients, temperature, and hygiene quality controls during production and delivery. To ensure the uniformity of food safety, the operators usually form a team MD. In the actual operation of the distribution side, the purchasing strategies for foods with a short shelf life are crucial. To ensure food safety, it is determined that these foods can only have a short shelf life. Whenever the foods expire, they must be discarded, which causes waste. Conversely, if a store purchases insufficient goods, the consumers will not be able to buy the goods they want in the store, so business opportunities will be lost. An important responsibility of team MD is to ensure safety and quality but not to assist in forecasting consumption. Therefore, the formulation of purchasing strategies should take sales forecasting into account, and the responsibility of sales forecasting falls on the retail store operators. If a retail store can forecast the sales of foods with a short shelf life accurately, it can prevent the waste of unsold foods on the one hand and, by sharing the information with the manufacturer, prevent the manufacturer from overproducing the foods on the other hand. The significance lies in enhancing the sustainable operation ability of team MD in the field of green energy in food production.
We selected a large convenience store chain in Taiwan as the research object. The convenience store chain cooperates with a brand to produce and sell strawberry cream puffs. Strawberry cream puffs are a type of food with a short shelf life that accounts for a relatively high proportion of sales and is purchased at a high frequency. As a result, manually determining the difference between production and sales data is challenging. The data collected in this study are from the points of sale of the retail stores; there are 178 pieces of data in total, of which 160 are used for training and 18 for verification. The data include the sales data of the previous 30 days, open data such as temperature and humidity, and the discount offers in the POS system.
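The 160/18 partition can be sketched as below, assuming (this is not stated in the text) that the records are ordered by date and that the last 18 days are held out for verification. The function name `chronological_split` is hypothetical.

```python
def chronological_split(records, n_train=160):
    """Split date-ordered records into an earlier training part and a later holdout part."""
    if not 0 < n_train < len(records):
        raise ValueError("n_train must be between 1 and len(records) - 1")
    return records[:n_train], records[n_train:]


# Placeholder records: integers stand in for the 178 daily observations.
daily_records = list(range(178))
train_part, verify_part = chronological_split(daily_records)
print(len(train_part), len(verify_part))
```

Keeping the split chronological avoids training on days that come after the days being forecast.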
After the data processing with dimensionality reduction, it is found that sales of the previous day (Variable01), sales of the seventh day before (Variable07), whether there is a Buy One, Get One free offer for the item (Variable30), whether there is any discount for customers using mobile payment (Variable31), whether it rains (Variable32), and whether it is a weekend (Variable35) are the variables with high correlations. We also used Stepwise Selection in SAS software to filter the variables. The 6 variables selected in this way were the same as the 6 primary factors extracted with PCA, and because there is no homogeneous literature, we decided to use these 6 primary factors as the eigenvalues in the experiment. A variable enters the model when its p value reaches the 0.500 significance level and stays in the model when its p value reaches the 0.0500 significance level; the variables are selected accordingly. The analysis values are listed in Table 3.
Taking mean absolute percentage error (MAPE) and mean-square percentage error (MSPE) as the measures for the forecasting error comparison, the absolute error value of MAPE should be between 15% and 18%, whereas that of MSPE should be between 4% and 6%. In this study, the performance of the artificial neural network sales forecasting model is better. Table 4 compares the forecasting errors of the artificial neural network forecasting model and the regression sales forecasting model, and it clearly shows that the overall error of the error indicators is lower in the artificial neural network forecasting model. In 1982, Lewis proposed a set of criteria for identifying the predictive power of models based on MAPE values, classified into four levels, noting that less than 10% is a very accurate result and between 10% and 20% is good. This is the goal of our study. Therefore, the artificial neural network forecasting model can be used in the context of foods with a short shelf life sold by team MD.
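The two error measures can be computed as in the sketch below. The MSPE definition used here (mean of squared relative errors, expressed in percent) is an assumption, since the text does not spell out its formula; the two lower levels of the Lewis scale (20% to 50% reasonable, above 50% inaccurate) follow the commonly cited version of the scale rather than this text, and the sample numbers are made up.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Actual values must be nonzero."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def mspe(actual, forecast):
    """Mean-square percentage error, in percent (assumed definition)."""
    return 100.0 * sum(((a - f) / a) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def lewis_level(mape_percent):
    """Map a MAPE value to the four-level interpretation attributed to Lewis (1982)."""
    if mape_percent < 10:
        return "highly accurate"
    if mape_percent <= 20:
        return "good"
    if mape_percent <= 50:
        return "reasonable"
    return "inaccurate"


# Made-up daily sales and forecasts, for illustration only.
actual_sales = [100, 200, 150]
forecast_sales = [90, 220, 150]
m = mape(actual_sales, forecast_sales)
print(f"MAPE = {m:.2f}%, MSPE = {mspe(actual_sales, forecast_sales):.2f}%, {lewis_level(m)}")
```

Both measures divide by the actual sales, so days with zero sales would need separate handling.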
From Figure 1, it can be seen that, based on the 18-day sales forecast conducted with the model of this study, the results of the neural network forecasting model are similar to the sales record of the retail stores, indicating that this method can provide effective information for purchasing strategies. The data also reveal significant disparities between the forecasts and the actual sales on December 25, January 6, and January 8. After observing the sales data and the eigenvalues with higher correlations, it is found that although December 25 was a Wednesday, it was a national holiday in Taiwan, so the actual sales quantity was higher. This eigenvalue, however, was not included in the 178 pieces of data of this study, resulting in a forecast error. It was also observed that although the next Wednesday (January 1) was a national holiday, there was no significant forecast error between the predicted value of this study and the actual sales. This is because the eigenvalues of this study include "sales of the seventh day before (Variable07)" after dimensionality reduction. As December 25, seven days before January 1, was also a national holiday, the forecast is quite accurate. Similarly, January 8, the following Wednesday, was not a national holiday, yet a higher forecast error still appears, because the day seven days before January 8 was January 1, a national holiday.
In terms of different data comparisons, the forecast model should effectively reduce the difference between the number of purchase orders and the sales on the day. Two sets of data were compared in this study. The first set is the ratio of the difference between the number of purchase orders and the sales on the day, and the second is the ratio of the difference between the results of the forecasting method and the sales, as shown below. The data in Figure 2 demonstrate that, on most days, the disparities between the number of purchase orders and the sales were bigger than the differences between the forecast method and the sales, except on December 25 and January 1. A curve closer to 0 percent represents data that is closer to the quantity that should be purchased and is therefore better suited to serving as the basis for purchase forecasting. The experimental data displayed in the figure not only demonstrate that the suggested forecasting method is practical but also suggest that adding an eigenvalue variable for whether the day is a holiday would improve the forecast model's accuracy. In other words, the method proposed by this study flattens the curve and stabilizes the fluctuation of the purchase error, thereby decreasing the purchase errors.
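The two ratios compared above can be computed as in the following sketch. It assumes that each difference is divided by the same-day sales and expressed in percent, which the text does not state explicitly; the function names and all numbers are made up for illustration.

```python
def relative_gap(quantity, sales):
    """Percentage gap between a quantity (orders or forecast) and same-day sales."""
    return [100.0 * (q - s) / s for q, s in zip(quantity, sales)]


def mean_abs_gap(quantity, sales):
    """Average distance of the gap curve from 0 percent."""
    gaps = relative_gap(quantity, sales)
    return sum(abs(g) for g in gaps) / len(gaps)


# Made-up daily figures, not study data.
sales = [100, 100, 80]
purchase_orders = [120, 90, 100]
model_forecast = [105, 98, 84]

print("orders vs sales:  ", relative_gap(purchase_orders, sales))
print("forecast vs sales:", relative_gap(model_forecast, sales))
print(mean_abs_gap(purchase_orders, sales) > mean_abs_gap(model_forecast, sales))
```

A smaller mean absolute gap corresponds to a curve that stays closer to 0 percent.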
This study analyzed and discussed the results from the perspective of team MD. When the actual purchase quantity recorded in the POS system of the retail stores is compared with the quantity forecast by this model, the proposed model takes into consideration the eigenvalues of the previous day, the seventh day before, as well as temperature and humidity, so the purchasing strategies suggested by this study are superior to the actual purchase quantities. However, the results show that there will be a larger inaccuracy on national holidays because they are not considered as an influencing factor. Overall, the suggested method may successfully reduce the fluctuation of purchasing errors, provide retailers with an accurate projection of sale volumes, and provide feedback to manufacturers in team MD, reducing the waste caused by overproduction.

Conclusion
The retail and service sectors are important industries of a country's domestic market and are the lifeline of the economy.
Therefore, more mature retail and service industries indicate a country's progress and development. If the retail sector pays greater attention to the issue of environmental protection, the country can have a more positive image. In addition, the output value of the retail sector is larger than that of other industries. If the sector can achieve success in green sustainable development, it will contribute considerably to a greener environment.
This study proposes a strategy model for green sustainable management and focuses the discussion on team MD formed by the retail sector and food manufacturers.
Analysis and forecasting were conducted on the retailer's side, and the purchasing strategies were adjusted and shared with the manufacturer to produce the appropriate quantity of foods, avoid overpurchase that leads to waste, and, at the same time, prevent the loss of business opportunities due to insufficient purchase. When choosing the product, we visited the physical retail stores and chose a type of cobranded cream puff with a short shelf life and a high revenue ratio as the research object. The 35 eigenvalues were reduced to six significant influencing factors, which served as the variables in the forecasting model and were analyzed using the artificial neural network. The MSPE value of the forecast values of the retail store item is less than 6%. The ratio of the difference between the number of purchase orders and the sales on the day and the ratio of the difference between the results of the forecasting method and the sales were also compared. The method used in this study effectively reduces the error fluctuations in purchasing, thereby reducing the error in purchasing. Because many factors influence retail sales and the data are hard to access or mostly unavailable, this study used a type of cobranded cream puff with a short shelf life and a high revenue ratio as the research object. In addition to yielding useful forecast results, the conclusions can also be used for the management of other food items in similar product classes and can enhance the sustainable operation ability of team MD in the field of green energy in food production.

Data Availability
The data used to support the findings of this study have not been made available because the data are stored in a data warehouse on the corporate intranet.

Conflicts of Interest
The authors declare that they have no conflicts of interest.