Intelligent Sales Prediction for Pharmaceutical Distribution Companies: A Data Mining Based Approach

One of the problems of pharmaceutical distribution companies (PDCs) is how to control inventory levels so as to avoid the costs of excessive inventory without losing customers to drug shortages. Consequently, the purpose of this study is to propose a novel method to forecast sales of PDCs. The presented method is a combination of network analysis tools and time series forecasting methods. Because each drug has too few past sales records, an exploratory network based analysis is conducted to find clique sets and group members, so that comembers' sales data can be used in each drug's sales prediction. Afterwards, time series sales forecasting models were built with three different approaches: ARIMA methodology, neural network, and an advanced hybrid neural network approach. By using the past records of each drug and its comembers, the proposed hybrid method captures both linear and nonlinear sales patterns accurately. The performance of the proposed method was evaluated on a real dataset provided by one of the leading PDCs in Iran. The results indicated that the proposed method is able to cope with a low number of past records while forecasting medicine sales accurately.


Introduction
Precise sales prediction is an essential and inexpensive way for a company to increase its profits, decrease its costs, and achieve greater flexibility to changes. In other words, exact sales forecasting is used to capture the tradeoff between customer demand satisfaction and inventory costs [1]. In the pharmaceutical industry in particular, successful sales forecasting systems can be very beneficial, due to the short shelf-life of many pharmaceutical products and the importance of product quality, which is closely related to human health [2].
PDCs face several challenges, including large inventories, increased competition, and tough regulations that limit advertising. They have to meet their customers' needs by delivering the right amount of medicines to the right place at the right time. In PDCs, both shortage and surplus of goods can lead to loss of income. Accordingly, one of the problems in PDCs is deciding what quantity of each drug should be kept in inventory.
Distribution of pharmaceutical products in Iran is mainly carried out by independent wholesalers, who buy stock for their own account from manufacturers and sell to their customers (mainly hospitals and pharmacies). Distribution is done by either full-line or short-line wholesalers. Full-line wholesalers distribute the full range of available pharmaceutical products and have central depots in big provinces, while short-line wholesalers trade in a selection of products only. Figure 1 presents a schematic of the pharmaceutical distribution channel in Iran. As shown in Figure 1, pharmaceutical manufacturers regularly purchase the required raw material and product packages from foreign suppliers. After production, they sell their finished products to distributors through face-to-face visits, web ordering, or teleordering. However, in some cases, distribution companies belong to the manufacturers and distribute their own products directly, which is shown by a direct link from manufacturer to distributors in Figure 1. Furthermore, importers typically sell the imported products to distributors through face-to-face visits, web ordering, or teleordering. However, importing companies sometimes belong to distributors, and these distributors sell their own imported medicines. That is why there are two links between importer and distributor in Figure 1. Afterward, pharmaceutical distributors sell the medicines, through teleordering or face-to-face visits, to hospitals, clinics, and pharmacies, where patients obtain their required drugs. Most PDCs in Iran still use heuristic or simple statistical methods for their sales forecasting. With access to past sales data and the use of data mining techniques, companies, and pharmaceutical distribution centers in particular, can make more accurate and reliable predictions of future sales. Since sales prediction must be performed with high accuracy and in a short time, manual or traditional methods are not practical. Consequently, it is preferable to apply data mining techniques to enhance the accuracy of sales prediction and to speed up the process.
In this research, we collected the required data from a large PDC, which dispenses medicines to customers in a number of provinces in Iran. After receiving orders, this company, like other pharmaceutical distributors, is committed to supplying the required drugs to provinces within 24 hours, to cities within 48 hours, and to remote areas within 72 hours. In keeping with its market-leading position, this company needs to hold large product inventories in order to meet customers' demand, as shortage of drugs is not acceptable in this industry. Like other PDCs in Iran, this company usually keeps inventories covering the needs of the next 1.5-2 months. This causes many excessive costs and investments for Iranian PDCs. Inventory control, transportation, and financial costs account for a high percentage of total expenses in PDCs. Generally, PDCs buy products from manufacturers and pay for them at once, but they sell products and receive the related money gradually. This gap causes them undesired expenses. Precise monthly sales prediction would shorten or even eliminate this gap. Therefore, most distributors in this industry are looking for modern and accurate methods to predict future sales in order to reduce undesired inventory costs and increase profits while keeping their customers satisfied.
Because of the particular characteristics of medicine sales, such as new items with few past sales records and a great diversity of medicines, the common existing forecasting methods are not effective for pharmaceutical companies, since these methods require many past sales records of each item to predict accurately. Consequently, the objective of this research is to develop a novel and accurate sales forecasting model for pharmaceutical products, by means of data mining approaches, that overcomes the problem of having numerous kinds of medicines and not having enough past sales records of each medicine.
The remainder of this paper is organized as follows. Section 2 reviews the previous literature. Section 3 presents the research methodology, including the overall research process, the methods applied in each step of the graph based analysis and sales prediction, and the results obtained in each step. Finally, the conclusion and contributions of the study are discussed.

Forecasting on Time Series Data
Studies on forecasting include many works that have examined alternative methods to find out which ones are the most efficient. Since 1970, the methodologies applied in sales forecasting have normally been time series methods that can be categorized as linear or nonlinear, according to the nature of the model they depend on [2]. Linear models, such as autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) [3], are successful and well-known methodologies [4], but their prediction ability is restricted by their assumption of linear behavior. Consequently, they are not always acceptable [5], especially in nonlinear time series forecasting cases [6]. Accordingly, several forecasting techniques have evolved over the last decades; each technique has its specific advantages and drawbacks compared to other approaches. During the 1980s, two critical developments influenced the evolution of time series study. On the one hand, advances in computer science made it possible to use more complex algorithms. On the other hand, the evolution of machine learning techniques, such as artificial neural networks, led to significant progress in the time series domain [7].
Neural networks are among the best-known models in the time series forecasting domain, since they have been confirmed as an efficient method for prediction with nonlinear input and output variables and are able to approximate any function under certain conditions [8]. In other words, if the data are influenced by nonlinear behavior, linear methods are not capable of modeling the underlying nonlinear relationships. Accordingly, nonlinear methodologies like artificial neural networks are more appropriate than conventional statistical methods [6,9,10]. Recently, artificial neural networks have been extensively applied in forecasting, and specifically in time series forecasting, with considerably good prediction performance [11][12][13][14]. The popularity of neural networks is attributed to their distinctive capability to simulate a wide range of underlying nonlinear behaviors and to cover a broad variety of fields [15][16][17]. Therefore, numerous studies have applied neural networks to time series forecasting and compared them with conventional methods. However, the outcomes are not distinctly in favor of one specific methodology [2,18].

Comparisons of Different Time Series Forecasting Models.
Various comparative studies between traditional models and neural networks have been carried out, but outcomes differ as to whether ANNs are better than linear methods in forecasting [19,20]. Numerous researchers have offered empirical evidence on the comparative advantage of one method over the other in different forecasting cases [8,16,18,21]. The majority of these studies report mixed results regarding the effectiveness of ANNs compared with traditional models like ARIMA [4]. The following survey presents comparisons of diverse time series forecasting methods developed during 1991-2008.
Since the introduction of neural networks, only a few studies have concluded that traditional methods perform better than, or at least as well as, neural networks. For instance, Tang et al. [22] reported the results of a study that compared the performance of neural networks and traditional methods in time series forecasting. Their research demonstrated that for time series with long memory, ANNs and Box-Jenkins models had the same performance. Furthermore, Foster et al. [23] noticed that exponential smoothing is better than neural networks in forecasting yearly data, and comparable in forecasting quarterly data. Although it appears that neural networks may outperform conventional statistical methods in forecasting time series with trend and seasonal patterns, Nelson et al. [21] found that neural networks cannot properly model the seasonal patterns in their data. Also, Callen et al. [24] showed that the forecasting performance of linear time series models was better than that of neural networks even when the data were nonlinear. Moreover, Church and Curram [25] and Ntungo and Boyd [26] demonstrated that neural networks performed nearly the same as econometric and ARIMA approaches. Afterwards, Heravi et al. [27] showed that linear models generate more reliable prediction results than neural networks for European industrial production series.
In contrast, many researchers concluded that neural networks had better forecasting performance than linear methods. As an illustration, in a comparative study of the performance of neural networks and conventional methods in forecasting time series, Tang et al. [22] found that ANNs surpassed the Box-Jenkins models in short term forecasting. Then, Chakraborty et al. [28] applied a neural network approach to multivariate time series analysis and forecasted flour prices in three cities in the USA. According to their outcomes, the neural network approach was better than classic methods. In a succeeding study, Ansuj et al. [29] compared the ARIMA model with interventions and an ANN model in examining the behavior of sales in a medium size corporation. The outcomes showed that the ANN model was more precise. Hill et al. [8] also demonstrated that neural networks were considerably superior to traditional techniques when predicting quarterly and monthly data. Agrawal and Schorling [30] compared the test results of neural networks with a multinomial logit forecasting model and showed that neural networks could predict brand shares more accurately. Furthermore, Yip et al. [31] examined the application of neural networks in sales forecasting. Using several measures of accuracy, their evaluation confirmed that neural networks predict better than time series smoothing methods.
Afterwards, Zhang et al. [20] carried out a comprehensive review of the literature on the use of ANNs in various forecasting domains. In less than one-third of the cases, ANNs were equivalent to linear methods, and in about two-thirds of the cases, ANNs were better. In 1998, Kuo and Xue [32] showed that their artificial intelligence models could find nonlinear relationships better than conventional time series methods. Elkateb et al. [33] also compared neural networks with ARIMA models in peak load forecasting. The results of their study demonstrated that neural networks had better forecasting performance than ARIMA models. In fact, neural networks have surpassed conventional forecasting techniques under certain conditions (Hornik et al.; cited by [34]), for example, when there are nonlinearities (White and Stinchcombe; cited by [34]) and when there are considerable interactions among inputs [35].
Subsequently, Ainscough and Aronson [36] compared neural networks and regression analysis, as a linear model, in modeling and predicting the effects of retailer activity on the sales of particular products using scanner data. According to the results of their study, neural networks performed better than the regression model. Then, Qi [37] reported that ANNs are very likely to do better than other methods when the data used are as recent as possible. Although statistical techniques have been proven effective for a long time, they still have definite drawbacks [10,32,38]. For example, when the data are influenced by particular conditions, like promotion, the prediction results of conventional methods are unsatisfactory [18,38]. In addition, it is reasonable to expect a noteworthy level of nonlinearity in sales behavior [9]. Since traditional methods are not capable of modeling nonlinear relationships, most researchers apply neural network models to cope with forecasting problems. Although the contribution of neural networks compared to traditional methods seems smaller in some cases [39], their strong capability to model nonlinear relations and their adaptability are highly attractive for most forecasting subjects [40]. Therefore, advanced methods like neural networks could serve as more appropriate approximators [40] and would be more suitable for time series sales forecasting than linear models like ARIMA.
However, Adya and Collopy [19] reported in their review paper that, despite the growing application of neural networks in prediction, and more specifically in business prediction, opinions and outcomes concerning their contribution are mixed. They concluded that assessing studies in this area is complicated, owing to the lack of clear criteria. That is, although neural networks have attracted growing attention in the forecasting domain, resulting in competent applications in time series sales forecasting [41], some studies point out that no single approach works best in every condition and that combining diverse methods is an efficient and effective way to improve forecasting accuracy [5]. Consequently, some recent studies offered good explanations of hybrid ARIMA-ANN models or combinations of other conventional and ANN techniques [5,[42][43][44][45]. As an illustration, Zhang [5] recommended a hybrid ARIMA-ANN approach in which ARIMA was applied to model the linear part and ANNs were used to model the prediction errors. He showed that the hybrid method outperformed both separate methodologies. Furthermore, Kuo et al. [46] proposed a hybrid algorithm based on a radial basis function (RBF) neural network for sales prediction. Their hybrid of particle swarm and genetic algorithm based optimization (HPSGO) combined the advantages of particle swarm optimization (PSO) and the genetic algorithm to improve the learning performance of the RBF neural network. The outcomes showed that the HPSGO algorithm performed better than PSO, the genetic algorithm, and the Box-Jenkins model [46]. Also, Khashei and Bijari [47] proposed a new hybrid neural network model. They applied an artificial neural network ($p$, $d$, $q$) model to time series forecasting in order to obtain a more precise forecasting model than neural networks alone [47]. The results on three real data sets indicated that the proposed model was an appropriate way to improve the forecasting accuracy achieved by neural networks. Both theoretical and empirical results have confirmed that combining various models is a successful way of enhancing the performance of forecasting models [47]. Subsequently, a novel hybrid model merging autoregressive fractionally integrated moving average (ARFIMA) and feed forward neural networks (FNN) was offered by Aladag et al. [48] to examine time series tourism data. They compared their hybrid approach with other methods and found that it was clearly superior in terms of forecasting precision [48]. Furthermore, Wang et al. [4] suggested a hybrid model integrating the benefits of ARIMA and ANNs in modeling both the linear and nonlinear behaviors in the data set. Their hybrid model was examined on three sets of real data, that is, the Wolf's sunspot data, the Canadian lynx data, and the IBM stock price data. The results pointed to the superiority of their combined methodology in obtaining more precise forecasts than existing approaches [4]. Accordingly, it appears logical to apply hybrid models in most sales forecasting domains.
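The decomposition behind the hybrid ARIMA-ANN idea attributed to Zhang [5] above can be written compactly. The symbols below ($L_t$, $N_t$, $e_t$, $f$, $n$) are notation chosen here for illustration and are not taken from this paper.

```latex
\begin{align}
y_t &= L_t + N_t
  && \text{(series = linear part + nonlinear part)} \\
e_t &= y_t - \hat{L}_t
  && \text{(residuals of the fitted ARIMA model)} \\
e_t &= f(e_{t-1}, e_{t-2}, \ldots, e_{t-n}) + \varepsilon_t
  && \text{(neural network $f$ fitted to the residuals)} \\
\hat{y}_t &= \hat{L}_t + \hat{N}_t
  && \text{(combined forecast, with $\hat{N}_t$ the network output)}
\end{align}
```

Here $\hat{L}_t$ is the ARIMA forecast, $n$ is the number of lagged residuals fed to the network, and $\varepsilon_t$ is the remaining random error.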
According to the presented survey, numerous prediction methods have been offered, and each method has its specific advantages and disadvantages in comparison with other techniques. However, none of the reviewed studies described the application of hybrid linear and nonlinear neural networks in forecasting. They also did not offer a technique for handling the problem of not having enough past records for forecasting. Owing to the specific constraints of pharmaceutical products, namely, numerous new items with few historical data, existing forecasting methods are generally inappropriate. This motivates the development of a novel hybrid approach that combines linear and nonlinear methods and their respective strengths. Accordingly, this research examined the use of a hybrid neural network that takes each medicine's past records and its group members' past records as inputs to make precise sales predictions for PDCs.

Research Methodology
As demonstrated in Figure 2, the overall procedure of this research consists of data collection and preparation, exploratory analysis, graph based analysis, data sampling for model fitting and testing, model building or sales prediction, model evaluation, and finally, conclusion. In this section, each step and the methods associated with each stage are explained.

Data Collection and Preparation.
To predict the sales of a company's products over an appropriate time horizon, past sales records of a selected PDC (Pakhsh Hejrat Co.) were collected. This company provided us with monthly sales data for nearly 1200 kinds of medicines sold to different provinces or centers of Iran over three years. The database of Pakhsh Hejrat Co. includes the name and code of medicines, sales number, name and code of centers, name of manufacturers, and price and monthly date of sales. For the purpose of this research, the code, date, and number of sold products were selected from the database. Subsequently, the 217 kinds of drugs that had been sold in all 36 months were extracted. For convenience, the medicine codes were replaced by the numbers 1-217. These 217 kinds of drugs were then used in the grouping and model building phases.
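The selection and recoding step can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `select_complete_series`, the record layout `(drug_code, month_index, units_sold)`, and the reading of "sold in all 36 months" as "positive units in every month" are all assumptions made here.

```python
from collections import defaultdict

def select_complete_series(records, n_months=36):
    """Keep only drugs sold in every month and recode them as 1..K.

    records: iterable of (drug_code, month_index, units_sold) tuples,
             with month_index running from 1 to n_months (assumed layout).
    Returns (series, recode): series maps the new id to a list of monthly
    sales, recode maps the original drug code to the new id.
    """
    monthly = defaultdict(dict)
    for code, month, units in records:
        # Sum sales of the same drug and month across all centers.
        monthly[code][month] = monthly[code].get(month, 0) + units

    # A drug qualifies only if it has positive sales in every month.
    complete = sorted(
        code for code, by_month in monthly.items()
        if all(by_month.get(m, 0) > 0 for m in range(1, n_months + 1))
    )
    recode = {code: i for i, code in enumerate(complete, start=1)}
    series = {
        recode[code]: [monthly[code][m] for m in range(1, n_months + 1)]
        for code in complete
    }
    return series, recode
```

With the dataset described above, `n_months=36` would be expected to leave 217 series, numbered 1 to 217.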

Exploratory Analysis.
After data preprocessing, an exploratory analysis was conducted in order to better understand the nature of the data. The exploratory analysis consists of the following steps.

Data Visualization.
In this part, sales plots and surface plots of the time series data for all medicines were drawn. For example, Figure 3 shows the time series plot of drug 24.
In this plot, we can see a considerable downward trend with nonlinear fluctuations around it. The plot also shows a weak seasonality, and its mean increases as the number of months increases. The variance also changes over time. The sales plots of the drugs showed that (1) some sales series had an upward trend, some had a downward trend, and some did not have any trend, (2) some series were stationary, but the others were not, (3) some series were seasonal, yet the others were nonseasonal, and (4) the variance also changed through time for most products.
As a result, it is not possible to apply a single model to all drugs; these medicines are different products with diverse characteristics. However, it may be possible to make use of the similar sales behavior of some of them in their sales prediction. As an informal way to explore the relationships between variables, surface plots for all medicines were also drawn in order to see whether the relationships are linear or nonlinear. After examining the relationships of all drugs, we can say that although there are linear relationships in some cases, the relationships among medicines are mainly nonlinear.
To summarize, the exploratory analysis led to the conclusions that (1) most medicines had different and specific characteristics and sales behavior, (2) it was impractical to build a single prediction model for all medicines, and (3) there were both linear and nonlinear relationships among sales records, but they were mostly nonlinear. Thus, it is logical to apply mainly nonlinear or even hybrid models in this research and to consider linear models as the second priority.

Graph Based Analysis of Medicines' Networks.
In the current study, the forecasting problem has specific features that distinguish it from similar cases: (1) there are relatively many different kinds of products, (2) there are few past sales records for each product, (3) there are complementary and substitutable relationships in the consumption of products, and (4) there may be irregular and nonlinear patterns in the consumption behavior of most products. These conditions exist in a PDC that has to provide many and various kinds of medicines to its customers. In these situations, ARIMA and most linear statistical time series forecasting methods are not recommended.
In forecasting problems (including sales forecasting), it is necessary to identify the influential variables and model their behavior. ARIMA methodology emphasizes identifying and modeling the change patterns in the target variable. It assumes that the trends and fluctuations of the target variable reflect the combined effect of all influential but unobserved variables. Accordingly, it is logical to use past records of the target variable in order to predict its future values. However, the question has always been raised whether the change patterns of a single variable can properly reflect the resultant effects of all influential variables or whether it would be better to use a set of the most explanatory variables instead of a single variable. Vector autoregression (VAR) methodology was therefore introduced to capture the evolution of and the interdependencies among several time series, generalizing the univariate autoregressive models. All variables in a VAR model are treated symmetrically: the model contains an equation for each variable that explains its development based on its own lags and the lags of all other variables in the model. However, the heavy requirements, strong assumptions, and conditions of this method limit its usage in most cases.
In fact, this research shows experimentally that, in cases with the aforementioned conditions (1 to 4), using past records of the target variable together with past records of some potentially explanatory variables does not give worse prediction results than using the single variable alone. However, the key question is this: when there are many different items, as with medicines, the past sales records of which medicines should be used to predict the future sales of other drugs? To answer this question, we introduced a novel approach, which was to conduct a graph based analysis in order to find groups of medicines that had similar sales behavior.
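For reference, a VAR model of order $p$ for $n$ series has the standard form below; the symbols ($\mathbf{y}_t$, $\mathbf{c}$, $A_k$, $\boldsymbol{\varepsilon}_t$) are the usual textbook notation, not notation defined in this paper.

```latex
\begin{equation}
\mathbf{y}_t = \mathbf{c} + \sum_{k=1}^{p} A_k \, \mathbf{y}_{t-k} + \boldsymbol{\varepsilon}_t,
\qquad \mathbf{y}_t \in \mathbb{R}^{n}, \quad A_k \in \mathbb{R}^{n \times n}
\end{equation}
```

Each of the $n$ equations carries $np + 1$ coefficients. As a rough illustration with the figures of this dataset, $n = 217$ medicines and $p = 1$ would give 218 coefficients per equation, while 36 monthly records leave only 35 usable observations, which is one way to see why an unrestricted VAR over all medicines is impractical here.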

Graph Based Analysis.
In this research, a graph based analysis was performed to find groups of medicines that have similar sales behavior or high sales cross-correlation, and then to use the past sales records of comembers in each other's sales prediction. The proposed graph based analysis permits one to visualize a dataset by representing its components as vertices and observing definite relationships among them. One can simply visualize a graph as a set of vertices and edges connecting them [49]. To explain the network analysis part, the following symbols and notations are needed:
$P = \{p_1, p_2, \ldots, p_{|P|}\}$: set of products whose sales must be predicted.
$S_i = \{s_{i1}, s_{i2}, \ldots, s_{iT}\}$: sales records of item $i$ in the period between 1 and $T$.
A graph representation of medicines is based on the cross-correlations of their sales. The graph is constructed as follows: a vertex is associated with each medicine, and two vertices are connected by an edge if the correlation coefficient of the corresponding pair of drugs (calculated over a certain period of time) exceeds a specified threshold. Let $G = (V, E)$ be an undirected graph with the set of $n$ vertices $V$ and the set of edges $E = \{(i, j) \mid i, j \in V \text{ and } c_{ij} > \theta\}$ [50], where $c_{ij}$ and $\theta$ are the correlation between drugs $i$ and $j$ and the adopted threshold, respectively. The graph $G = (V, E)$ is said to be connected if there is a path from any vertex to any other vertex in the graph. The degree of a vertex is the number of edges emerging from it [50]. A clique within a graph is a set of vertices that are all connected to one another [50]. In this research, it is proposed to find clique sets of medicines in order to identify sets of items that show harmonious changes. Accordingly, the following steps were followed.
Step 1. First, the cross-correlation matrix $C = [c_{ij}]_{|P| \times |P|}$ of the variables was built for the related period. The elements of this matrix are the correlation coefficients $c_{ij} = \text{correlation}(S_i, S_j)$ between the past sales records of products $i$ and $j$.
Step 2. Then, the adjacency matrix $A = [a_{ij}]_{|P| \times |P|}$, which contains 0 and 1 values, was built so that if the correlation $c_{ij}$ between the past sales records of products $i$ and $j$ was greater than $\theta$, the element $a_{ij}$ of the adjacency matrix would be 1, and otherwise 0. This matrix is the adjacency matrix of a graph in which each node represents a product. A value of 1 in the adjacency matrix for two products indicates a relatively strong relationship between their past sales records, and 0 indicates a weak relationship.
Step 3. The graph equivalent to matrix $A$ was built with an appropriate value of the threshold $\theta$. The aim was to choose the threshold that gave an appropriate number of cliques and the best combination of cliques (medicines' groups). Therefore, as a heuristic approach, it was desirable to satisfy all the following criteria simultaneously: (1) a large number of cliques, (2) modest variance or standard deviation of the cliques' size, (3) high mean degree of the nodes, (4) nearly identical distribution of nodes' degrees (low variance or standard deviation of degrees).
Accordingly, it was preferred to maximize an index combining these criteria. This index was calculated for the thresholds $\theta = 0.3, 0.35, 0.4, \ldots, 0.9$ and is presented in Figure 4. Heuristically, the best threshold was $\theta = 0.65$. Consequently, the adjacency matrix for $\theta = 0.65$ was the input of the graph based analysis. It was then feasible to group medicines according to their cross-correlation or their mutual relationship. This grouping would help to produce improved and accurate sales forecasts.
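Steps 1 and 2, together with the degree statistics used as selection criteria in Step 3, can be sketched as follows. This is an illustrative sketch under stated assumptions: it uses the zero-lag Pearson correlation as the correlation measure, the function names (`pearson`, `correlation_matrix`, `adjacency_matrix`, `degree_summary`) are hypothetical, and it does not reproduce the selection index itself, whose formula is not given in the text.

```python
from math import sqrt
from statistics import mean, pstdev

def pearson(x, y):
    """Zero-lag Pearson correlation of two equally long sales series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series is treated as uncorrelated
    return sxy / sqrt(sxx * syy)

def correlation_matrix(series):
    """Step 1: symmetric matrix of pairwise correlations."""
    n = len(series)
    c = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            c[i][j] = c[j][i] = pearson(series[i], series[j])
    return c

def adjacency_matrix(c, theta):
    """Step 2: entry is 1 when the correlation exceeds theta, else 0.
    The diagonal is set to 0 so that the graph has no self-loops."""
    n = len(c)
    return [[1 if i != j and c[i][j] > theta else 0 for j in range(n)]
            for i in range(n)]

def degree_summary(a):
    """Mean and standard deviation of node degrees (criteria 3 and 4)."""
    degrees = [sum(row) for row in a]
    return mean(degrees), pstdev(degrees)

if __name__ == "__main__":
    # Invented toy data: four short sales series.
    sales = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 11], [5, 3, 4, 1, 2], [3, 3, 3, 3, 3]]
    c = correlation_matrix(sales)
    for theta in (0.3, 0.65, 0.9):
        print(theta, degree_summary(adjacency_matrix(c, theta)))
```

Sweeping `theta` over 0.3, 0.35, ..., 0.9 and recording these statistics, along with the number and sizes of cliques, would give the raw quantities from which a threshold can be chosen.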

Clique Sets.
Clique sets (groups) of medicines were detected with a threshold of 0.65 by means of the Gephi software. In the analysis of the medicines' graph, a relatively high correlation threshold was chosen to ensure that only edges corresponding to significantly correlated pairs of drugs were considered. The results, shown in Table 1, state which members (drugs) are in each clique. In total, 63 cliques or groups were found, each containing medicines that have cross-correlations of more than 0.65 with each other. In this phase, similar and harmonious changes in the sales behavior of medicines were used to find clique sets of drugs. After detecting the clique sets, the network of medicines was visualized to summarize the findings of the previous section (Figure 5). Figure 5 shows which drugs are connected to each other. For instance, some drugs, such as drug 115, were isolated and not connected to any other drug. These medicines had distinct consumption or sales behavior; accordingly, their sales could not be predicted by using sales records of other medicines. However, some medicines, like 24, 174, and 103, were pivotal drugs, as they were connected to many other drugs. These key drugs had consumption or sales behavior similar to that of many other drugs. Thus, the sales records of various drugs could be used in the sales prediction of these critical products. In addition, their sales records could be employed in the prediction of the many other medicines connected to them. Consequently, the output of this section helps to overcome the problem of not having enough past sales records of each medicine, since it becomes possible to make predictions by using the past records of each drug and its group members as input variables.
Finding clique sets (groups) of medicines that have similar changes in their past sales records also has substantial applications in pharmaceutical sales management and warehousing. Finding groups of products that have similar sales behavior makes it possible to use the sales records of group members as indicators or input variables for each other's sales prediction. In addition, managers may identify which items are likely to have a similar sales trend or sales behavior in the future in order to set up similar sales projections for them. This grouping of products, based on their sales behavior, would also help managers decide which items should be placed adjacent to each other in transportation and warehouses in order to optimize the storage and transport of products.
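Clique detection on the thresholded graph can be illustrated with the basic Bron-Kerbosch algorithm for maximal cliques. This is a swapped-in technique for illustration only: the paper reports using Gephi, and whether its 63 groups are maximal cliques is an assumption here. The function names `maximal_cliques` and `comember_inputs` and the toy graph are hypothetical.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion.

    adj: dict mapping each node to the set of its neighbours
         (undirected graph, no self-loops).
    Isolated nodes are returned as single-member cliques.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to add, x: already explored nodes
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)

def comember_inputs(cliques, drug):
    """Drugs sharing at least one clique with `drug`; their past sales
    would be the extra input variables for forecasting `drug`."""
    members = set()
    for clique in cliques:
        if drug in clique:
            members.update(clique)
    members.discard(drug)
    return sorted(members)

if __name__ == "__main__":
    # Invented toy graph: 1-2-3 form a triangle, 3-4 an edge, 5 is isolated.
    graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}, 5: set()}
    found = maximal_cliques(graph)
    print(found)
    print(comember_inputs(found, 3))
```

In this toy graph, node 3 plays the role of a pivotal drug (it shares cliques with three other nodes), while node 5 corresponds to an isolated drug with no comembers to borrow records from.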

Building Sales Forecasting Models.
In this research, both linear models (ARIMA) and nonlinear models (hybrid neural networks) were applied. Their performance was also compared based on various evaluation criteria.

ARIMA Methodology for Time Series Forecasting.
ARIMA($p$, $d$, $q$) models, developed by Box and Jenkins [3], have been widely used for time series forecasting. The three kinds of parameters in the model are the autoregressive parameters ($p$), the number of differencing passes ($d$), and the moving average parameters ($q$). In this research, ARIMA methodology was applied in order to compare its performance with our proposed methodology. To fit ARIMA to the available time series, the following steps were executed.
(1) Transforming the series to make it stationary (finding the $d$ parameter): in order to determine the necessary level of differencing, the time series plot, autocorrelogram, and partial autocorrelogram diagrams were examined. For instance, for a sample drug, drug 24, the series became approximately stationary after log transformation and differencing twice.
(2) Identification of the initial $p$ and $q$ parameters: based on visual examination of the autocorrelation and partial autocorrelation plots, the best model for the sales data can be recognized. For example, for drug 24, the best model was ARIMA (2, 2, 0). To be sure of the results of this step, the following steps were followed: the method of [51] and the exact maximum likelihood method according to Melard [52] were examined and compared for all products by computing the SS (sum of squares) of residuals. In some cases, the first method had better results, and in other cases the second one. For drug 24, the exact maximum likelihood method performed better. The result of this step shows that the best model for the time series sales data of drug 24 was (1, 2, 0).
(3) Forecasting: to avoid overfitting and to evaluate the model, the dataset was divided into two subsets: one for training (32 months) and one for testing and performance evaluation (4 months). To evaluate the performance of the model, the residuals, mean squared error (MSE), and mean absolute error (MAE) of the test data were calculated. The forecasting results for the test data are presented in Table 2.
(4) Diagnosis of the residuals: a proper assessment of the model is to plot the residuals, check them for any systematic trends, and inspect their autocorrelogram. The normal probability plot of the residuals confirmed their normality. If the goodness of the model is not confirmed in this step, all previous steps must be repeated. For the time series sales data of drug 24, the autocorrelation and partial autocorrelation functions of the residuals showed no serial dependency between residuals. This step confirmed the goodness of the model obtained in the previous step. Although the best ARIMA model for the sales data of drug 24 was chosen, its prediction accuracy was not satisfactory. In the next section, our proposed sales forecasting methodology is explained.
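The stationarity transformation in step (1) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the 36-month series is synthetic, the function names `difference` and `autocorrelation` are hypothetical helpers, and the sample autocorrelation is computed directly rather than with a statistics package. Fitting the ARIMA model itself is not shown.

```python
from math import exp, log

def difference(series, passes=1):
    """Apply first differencing the given number of times."""
    for _ in range(passes):
        series = [later - earlier for earlier, later in zip(series, series[1:])]
    return series

def autocorrelation(series, lag):
    """Sample autocorrelation of a series at a given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom

# Hypothetical 36-month series whose logarithm follows a quadratic trend.
sales = [50.0 * exp(0.01 * t * t) for t in range(36)]
log_sales = [log(v) for v in sales]

# Log transform followed by two differencing passes (d = 2).
# Each pass shortens the series by one observation.
stationary = difference(log_sales, passes=2)

# A large, slowly decaying autocorrelation in the undifferenced series
# is the usual visual hint that differencing is needed.
acf_before = [autocorrelation(log_sales, k) for k in range(1, 4)]
print(len(stationary), acf_before)
```

For this synthetic series the twice-differenced log values are constant, so the trend is fully removed; real sales data would instead leave a noisy series whose correlograms guide the choice of $p$ and $q$.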

A Hybrid Neural Network Approach for Time Series Forecasting.
In this research, hybrid neural networks were used because neither a purely linear nor a purely nonlinear model is adequate for sales data. The approach originates from Zhang [40], who combined the ARIMA methodology with ANNs in order to model the linear part with ARIMA and the nonlinear part with ANNs. However, the basic idea of the hybrid approach applied in this research is to let a linear ANN model the linear part and a nonlinear ANN model the nonlinear part, and then to combine the results of both models. The model and its forecasting process are explained in more detail below.
The Hybrid Model. The hybrid model can be written as [40] $y_t = L_t + N_t + e_t$, where $y_t$ is the time series observation at time period $t$; $L_t$ and $N_t$ are the linear and nonlinear components of the time series, respectively; and $e_t$ is the random error term. The model building process involved three main steps: (1) fitting a linear neural network model to the time series under study, (2) building a nonlinear neural network model based on the residuals from the linear neural network model, and (3) combining the linear neural network prediction and the nonlinear neural network result to form the final forecast. The purpose of combining these models was to take advantage of the unique modeling capability of each individual model and thereby improve forecasting performance considerably. If the estimated component from the linear model is $\hat{L}_t$ and the estimated nonlinear component is $\hat{N}_t$, then the combined forecast $\hat{y}_t$ will be [40] $\hat{y}_t = \hat{L}_t + \hat{N}_t$.
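The three-step structure of the hybrid model can be sketched as follows. This is a simplified illustration and not the models built in the study: a least-squares line on the previous observation stands in for the linear neural network, a least-squares fit on the squared previous observation stands in for the nonlinear neural network, and the series is a synthetic example. All function names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares fit of y ~ a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((u - mean_x) ** 2 for u in x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope

def mse(observed, predicted):
    """Mean squared error between two equally long sequences."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def hybrid_fit(series):
    """Return targets, linear-only fits, and combined hybrid fits."""
    lagged, target = series[:-1], series[1:]
    # Step 1: linear component L_hat from the previous observation.
    a, b = fit_line(lagged, target)
    linear = [a + b * u for u in lagged]
    # Step 2: nonlinear component N_hat fitted to the linear residuals,
    # using the squared lag as a simple nonlinear feature (a stand-in).
    residuals = [v - l for v, l in zip(target, linear)]
    squared = [u * u for u in lagged]
    c, d = fit_line(squared, residuals)
    nonlinear = [c + d * z for z in squared]
    # Step 3: combined forecast y_hat = L_hat + N_hat.
    combined = [l + n for l, n in zip(linear, nonlinear)]
    return target, linear, combined

# Hypothetical toy series with a nonlinear dependence on the previous value.
series = [0.3]
for _ in range(35):
    series.append(3.7 * series[-1] * (1.0 - series[-1]))

target, linear, combined = hybrid_fit(series)
linear_mse = mse(target, linear)
hybrid_mse = mse(target, combined)
print(linear_mse, hybrid_mse)
```

Because the second stage is fitted to the residuals of the first, the in-sample error of the combined fit cannot exceed that of the linear fit alone. Whether the gain carries over to unseen data has to be checked on a holdout sample.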
Step 1 (data selection). As stated, 217 selected medicines were grouped into 63 cliques. It should be noted that the resulting cliques have overlapping members, so that a drug may be a member of two or more cliques. To build the forecasting models, the validity of the proposed methodology had to be examined on different products. Thus, drugs with various degrees of connectivity were selected, and Table 3, which explicitly presents each drug with its group members, was built. This table was one of the main tools of the sales prediction phase. For instance, to predict the future sales of drug 24, in addition to its own past sales records, the sales records of drugs 17, 19, 20, 22, 23, 30, 32, 37, 38, 39, 41, 56, 95, 102, 103, 127, 132, 174, and 199 could be used. In contrast, drugs 31 and 86 have no comembers, and their sales can be forecast only from their own past sales. For comparison, hybrid neural network models were built with two different approaches: (1) using just each drug's own past records and (2) using each drug's own past records together with those of its group members.
Step 3 (model building (hybrid neural networks)). This step is illustrated as follows.
Building Linear ANN. In this step, for all 21 drugs and for both approaches, a linear neural network model was first fitted to the time series under study and the residuals were computed.
Building Nonlinear ANN. In this part, ANNs were applied to the residuals from the linear model (which contained some nonlinearity) to estimate the probable nonlinear components. Different types of network, such as linear, PNN or GRNN, radial basis function, and one-, two-, and three-layer perceptron networks, were examined. In the experiment, various numbers of hidden units were tried in order to find the best network architecture. For the activation functions, linear and logistic sigmoid functions were tried. Finally, for training, various learning algorithms including BP, KM, KN, CG, and PI were examined automatically.
Step 4 (combining both models). Finally, the two models were combined to make a forecast. The previous steps were carried out with 25 different neural network models that differed in architecture and learning algorithm. Then, the best-fitted model for each drug was extracted. As Tables 4 and 5 show, the prediction performance improved greatly when the partners' past records were used as well. For instance, Table 4, which reports the sales prediction of drug 24 from its own records, shows MSE = 4094.8 and MAE = 51.5. However, Table 5, which reports the sales prediction of drug 24 from its own and its comembers' records, shows MSE = 1973.5 and MAE = 33, a great improvement in prediction accuracy. Comparing these results with those of the ARIMA model in the previous section (MSE = 140793.3 and MAE = 369.8417) makes the differences among the three methods clear. The forecasting results for drug 24 show that the novel method considerably outperforms both of the other methods.
Accuracy measures of ARIMA (the first approach) and of both hybrid ANN approaches, one using each drug's own records (the second approach) and one using its own and its comembers' records (the third approach), were calculated and compared for 21 selected drugs (see Table 6). Comparisons involving the third approach cover 19 drugs, presumably because drugs 31 and 86 have no comembers. In terms of MSE, the second approach outperformed the first in 18 of 21 cases, the third outperformed the first in 17 of 19 cases, and the third outperformed the second in all 19 cases. In terms of MAE, the second approach outperformed the first in 18 of 21 cases, the third outperformed the first in 17 of 19 cases, and the third outperformed the second in 17 of 19 cases. Overall, both MSE and MAE were generally smaller for the third approach than for the other two approaches, and smaller for the second approach than for the first.
This research verified that by applying network analysis tools to group medicines, using the records of group members as input variables, and combining linear and nonlinear models, forecasting performance can be considerably improved for two reasons: (1) grouping products and using group members' past sales records solved the problem of insufficient past sales records for each product, and (2) the hybrid approach studied in this research overcame the limitation of a purely linear or purely nonlinear modeling approach while taking advantage of the unique capability of each to capture different patterns in the data. In addition, it has been stated repeatedly in the related literature that increasing the amount of past data improves the accuracy of time series forecasting results. The experimental observations in this research confirm this claim. Consequently, the evidence confirmed that the proposed forecasting method (hybrid neural networks fed with both each drug's past records and its partners' past records) performed noticeably better than both hybrid ANNs that use only each drug's own past records and the ARIMA methodology.

Conclusion
Because PDCs carry many new items with few past records and a great diversity of medicines, common prediction methods are largely inappropriate or ineffective for them. The basic objective of this research was to offer a novel and precise sales prediction method to help companies, especially PDCs, forecast product sales and tune inventory management policies in order to prevent the costs of excessive inventory and the loss of customers due to drug shortages. To validate the proposed method, three years of monthly sales data were gathered from Pakhsh Hejrat Co. In the data preprocessing phase, the raw data were prepared to suit the research objectives. Subsequently, an exploratory analysis was conducted to better characterize the data. Next, a comprehensive graph-based analysis was performed to find clique sets and group members and to visualize the network of drugs. Afterwards, sales forecasting models were built with three different approaches: (1) ARIMA methodology for time series forecasting, (2) a hybrid neural network approach for time series forecasting using each drug's past records, and (3) a hybrid neural network approach for time series forecasting using each drug's past records and its group members' past records. This research revealed that by grouping products, using the records of group members as input variables, and combining linear and nonlinear models, forecasting performance can be significantly improved for two reasons: (1) grouping products and using group members' past sales records solved the problem of insufficient past sales records for each product, and (2) the hybrid approach studied in this research overcame the limitation of a purely linear or purely nonlinear modeling approach while taking advantage of the unique capability of each to capture different patterns in the data.
As a methodological contribution, hybrid neural networks were used in this research to let a linear ANN model the linear components and a nonlinear ANN model the nonlinear components, and then to merge the results of both models. In addition, this research introduced a novel method of grouping products so that group members' past sales data can be used in each other's sales prediction to increase its accuracy. The introduced scheme outperformed two previously known methods: (1) ARIMA modeling and (2) building ANNs using only each drug's own past records. The empirical contribution of this research can be considered from two points of view. Firstly, a real problem and an actual company (Pakhsh Hejrat Co.) with its genuine sales data were chosen, the sales prediction results were compared with unseen sales data of this company, and acceptable outcomes were achieved. Secondly, 8 managers and experts from 4 major PDCs (Pakhsh Hejrat, Armaghan Daroo, Pakhsh Ferdows, and Daroo Pakhsh), which together account for more than 70% of the total sales of medicines to drug stores and hospitals in Iran, were interviewed. All of the managers and experts showed interest in the introduced method of finding group members and forecasting sales. They strongly believed that finding groups of products with similar changes in their past sales records would also help managers optimize the ordering, storage, transport, and delivery of products. Accordingly, empirical support was obtained for the proposed methodology.

Further Research.
In this research, clique sets were found based on sales cross-correlation, which captures only linear covariation. However, medicines probably also covary nonlinearly with each other. Thus, it is proposed to find clique sets based on other similarities or harmonies in the changes of sales behavior, so that nonlinear relationships are discovered as well. In other words, it is reasonable to test nonlinear relationships, or even a combination of linear and nonlinear relationships, in order to obtain more efficient cliques and more accurate forecasts. Moreover, networks of medicines can be a good source for exploring the relationships between medicines. The roles of some medicines in the network and the various types of relationships between them can be analyzed from the network analysis point of view, and it would be helpful to consider the therapeutic attributes of medicines in order to find groups of similar medicines. Furthermore, running and comparing other approaches, such as evolving neural networks and evolving fuzzy rules, would be very valuable; this was beyond the scope of the present research and is left for further research.

Figure 3: Time series plot of sales data for drug 24.

Figure 4: Heuristic indices changes for different values of .

4.1. Contributions. If research contributions are classified into three categories, namely theoretical, methodological, and empirical, it can be stated that this research makes methodological and empirical contributions.

Table 2: Four months' forecasting results for the original data.

Table 4: Forecasting results and accuracy measures of test data for drug 24 (with its own past records).

Table 5: Forecasting results and accuracy measures of test data for drug 24 (with its own and its comembers' past records).

Step 2 (split data into train and test set). The data set was divided into two data sets. The last 4 months' data was reserved as the holdout sample (test data) to be used for forecasting evaluation. The rest of the data (in-sample) was used for model selection and estimation. The in-sample data were further divided into a training sample and a validation sample. The validation sample consists of the last 4 months of in-sample observations, while the training sample consists of the remaining observations.
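The split described in Step 2 can be expressed as a small helper. This is a minimal sketch: the function name `split_series` and the placeholder month indices are hypothetical, and the only facts taken from the text are the chronological order and the 4-month holdout and validation samples.

```python
def split_series(series, holdout=4, validation=4):
    """Chronological split into training, validation, and holdout samples."""
    if holdout <= 0 or validation <= 0:
        raise ValueError("holdout and validation must be positive")
    if len(series) <= holdout + validation:
        raise ValueError("series too short for the requested split")
    in_sample, test = series[:-holdout], series[-holdout:]
    train, valid = in_sample[:-validation], in_sample[-validation:]
    return train, valid, test

# Hypothetical example: 36 monthly observations labeled 1..36.
months = list(range(1, 37))
train, valid, test = split_series(months)
print(len(train), len(valid), len(test))
```

The split is taken from the end of the series rather than at random, so that the validation and holdout samples always lie after the data used for estimation.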
3.5. Model Evaluation. Once a model has been generated and tested, its performance should be evaluated. In this study, two forecast error measures, namely, mean squared error ($\mathrm{MSE} = \sum_t (y_t - \hat{y}_t)^2 / n$) and mean absolute error ($\mathrm{MAE} = \sum_t |y_t - \hat{y}_t| / n$), were employed for model evaluation and model comparison, where $y_t$ and $\hat{y}_t$ are the observed and predicted values and $n$ is the number of forecasts. Tables 4 and 5 show the best-fitted model, training algorithms, and predicted values in comparison with observed values, residuals $e_t = y_t - \hat{y}_t$ (difference between real and predicted values), $e_t^2$, $|e_t|$, MSE, and MAE for drug 24.
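The two error measures are straightforward to compute. The sketch below uses made-up observed and predicted values for a 4-month test period; the numbers are illustrative only and are not taken from the tables.

```python
def mse(observed, predicted):
    """Mean squared error: average of squared residuals."""
    return sum((y - f) ** 2 for y, f in zip(observed, predicted)) / len(observed)

def mae(observed, predicted):
    """Mean absolute error: average of absolute residuals."""
    return sum(abs(y - f) for y, f in zip(observed, predicted)) / len(observed)

# Hypothetical observed and predicted sales for four test months.
observed = [100, 120, 90, 110]
predicted = [90, 125, 100, 110]

print(mse(observed, predicted))  # 56.25
print(mae(observed, predicted))  # 6.25
```

MSE penalizes large residuals more heavily than MAE, which is why the two measures can rank competing models differently for the same drug.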

Table 6: Comparisons of the conducted methods.