A New Empirical Model for Short-Term Forecasting of the Broadband Penetration: A Short Research in Greece

This paper presents a short study of the overall broadband penetration in Greece. A new empirical deterministic model is proposed for the short-term forecasting of cumulative broadband adoption. The fitting performance of the model is compared with that of widely used diffusion models for the cumulative adoption of new telecommunication products, namely the Logistic, Gompertz, Flexible Logistic (FLOG), Box-Cox, Richards, and Bass models. The models are fitted to official broadband penetration data for Greece. The comparison suggests that the empirical model yields satisfactory statistical indicators for both fitting and forecasting performance. The results also stress the need for further research and performance analysis of the model in other, more mature broadband markets.


Introduction
The diffusion of innovative products is the process by which an innovation is adopted by society. An innovative product, after it has been widely adopted by society, can be the basis for developing other innovations. Modern diffusion theory, concerning the construction of diffusion models for product adoption, has been a topic of academic interest since the 1950s. The literature in this area is extensive. In this brief historical overview, some important papers are presented.
Most of the diffusion models used in market studies are S-shaped curves. S-shaped curves (sigmoids) describe the growth of various phenomena in physics, biology, and the social sciences [1]. The Logistic model was used by Griliches (1957) to explain the adoption of hybrid corn in the USA [2]. The linear form of the model was also used by Mansfield (1961) [3,4]. A model derived from a study of plant growth is that of Richards (1959) [5,6]. The theory of the diffusion of new products adopted by a social system was presented by Rogers (1962) [7].
Rogers considers five categories according to the time response of consumers in purchasing new products. These consumer profiles are: innovators, early adopters, early majority, late majority, and laggards. This study is a standard reference for diffusion theory. The theory of Rogers is a description of the life cycle of a new product [7].
The Gompertz forecasting model was proposed by Gregg, Hassel, and Richardson (1964) [8]. According to Chow (1967), the Gompertz model is better than the Logistic model in explaining computer demand [9,10]. Bass (1969) proposed a growth model that accurately forecast peak color TV sales [11]. The Bass model introduces a coefficient of innovation into the diffusion equation, which makes it suitable for modeling product adoption [11,12]. Bewley and Fiebig (1988) used the flexible logistic growth (FLOG) model, as well as the Box-Cox model, with applications in telecommunications [13,14].
In conclusion, it can be said that the objective of diffusion models is to describe and to predict market trends [1].

Diffusion Models
In general, diffusion models are deterministic time functions, and they use historical data to estimate the parameters of the diffusion process of a product's life cycle. These models are used to forecast the cumulative adoption of a new technology [15].
Diffusion models are divided into, at least, two categories according to whether the level of saturation is constant (nondynamic models) or changing over time (dynamic models).
The differential equation that describes the fundamental diffusion model has the following formulation:
$$\frac{dY(t)}{dt} = f(t)\,[A - Y(t)], \qquad (1)$$
where $A$ is the estimated diffusion saturation level for time $t$, $Y(t)$ is the diffusion penetration, and the function $f(t)$ is the diffusion coefficient. In dynamic diffusion models, the saturation factor $A$ is dependent on time. The difference $A - Y(t)$ corresponds to the cumulative number of potential adopters, $N(t)$, at time $t$. So, (1) could be written as
$$\frac{dY(t)}{dt} = f(t)\,N(t). \qquad (2)$$
In this paper the formulations of the Gompertz, Logistic, Richards, Flexible Logistic (FLOG), and Box-Cox models are used, as well as that of the Bass model. A short review of these models follows.
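As a minimal numerical sketch of the fundamental diffusion equation, assuming it has the standard form dY/dt = f(t)(A − Y(t)), the following Python code integrates it with a forward-Euler scheme. The function name `integrate_diffusion`, the constant diffusion coefficient, and all numeric values are illustrative assumptions and are not taken from the paper.

```python
import math


def integrate_diffusion(f, A, y0, t_end, dt=0.001):
    """Forward-Euler integration of dY/dt = f(t) * (A - Y).

    f     : diffusion coefficient as a function of time
    A     : saturation level (constant, i.e. a nondynamic model)
    y0    : initial penetration Y(0)
    t_end : final time
    dt    : integration step (hypothetical default)
    """
    steps = int(round(t_end / dt))
    y, t = y0, 0.0
    for _ in range(steps):
        y += dt * f(t) * (A - y)  # growth is proportional to potential adopters A - Y
        t += dt
    return y


# Illustrative values only: 25% saturation, constant coefficient 0.3 per period.
A = 25.0
rate = 0.3
y_num = integrate_diffusion(lambda t: rate, A, y0=0.0, t_end=10.0)

# For a constant coefficient the equation has the closed form A * (1 - exp(-rate * t)).
y_exact = A * (1.0 - math.exp(-rate * 10.0))
```

With a constant coefficient the curve is a saturating exponential rather than a sigmoid. The S-shape of the models reviewed in this section arises when f(t) itself depends on time or on the current penetration.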

Gompertz Model. The Gompertz model is described by the following equations: where $Y = Y(t)$ is the estimated diffusion level at time $t$, $B < 0$, and parameter $A$ is the saturation level [16]. The parameters that determine the model are $A$, $B$, and $C$. Parameter $B$ is related to the year in which diffusion reaches the point of inflection, and parameter $C$ measures the "diffusion speed". Parameter $D$ in (4) is a constant, and the corresponding model is called Gompertz with constant.
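For orientation, the sketch below evaluates one common parameterisation of the Gompertz curve, Y(t) = A·exp(B·exp(−C·t)) with B < 0. This form is consistent with the parameter roles described above, but whether it matches the paper's equations exactly is an assumption. The function names and numeric values are hypothetical.

```python
import math


def gompertz(t, A, B, C):
    """Gompertz curve Y(t) = A * exp(B * exp(-C * t)), assuming B < 0 and C > 0."""
    return A * math.exp(B * math.exp(-C * t))


def gompertz_inflection_time(B, C):
    """Time at which the assumed curve has its inflection point.

    Setting the second derivative to zero gives B * exp(-C * t) = -1,
    hence t = ln(-B) / C, which is why B is tied to the inflection date.
    """
    return math.log(-B) / C


# Illustrative parameters only.
A, B, C = 25.0, -5.0, 0.4
t_star = gompertz_inflection_time(B, C)
y_star = gompertz(t_star, A, B, C)  # equals A / e for this parameterisation
```

Under this assumed form the inflection occurs at a penetration of A/e, about 37% of saturation, which makes the Gompertz curve asymmetric around its inflection point.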

Logistic Model. The general form of the logistic model is given by
where $f(t) = -C - D\,t(m, k)$ and $C, D, m, k \neq 0$ are constants. Various functions derive from the logistic model, such as the following.

Fisher-Pry Model.
When the parameters of the logistic model are $C = 0$ and $D \neq 0$ and the time is $t(m, k) = t$, the model is known as the linear logistic or Fisher-Pry model. The linear logistic model has an inflection point that occurs when the diffusion level is half of its saturation level.
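The half-saturation property can be checked numerically. The sketch below assumes the logistic model has the usual form Y(t) = A / (1 + exp(f(t))) with f(t) = −D·t for the Fisher-Pry case. That form, the function name, and the parameter values are assumptions made for illustration.

```python
import math


def fisher_pry(t, A, D):
    """Linear logistic (Fisher-Pry) curve, assuming Y = A / (1 + exp(-D * t))."""
    return A / (1.0 + math.exp(-D * t))


# Illustrative parameters only.
A, D = 25.0, 0.8

# Time grid from -10 to 10 in steps of 0.01.
ts = [i * 0.01 - 10.0 for i in range(2001)]

# Central-difference estimate of the growth rate at each grid point.
growth = [fisher_pry(t + 0.005, A, D) - fisher_pry(t - 0.005, A, D) for t in ts]

# The inflection point is where the growth rate is largest.
i_max = max(range(len(ts)), key=growth.__getitem__)
t_inflect = ts[i_max]
y_inflect = fisher_pry(t_inflect, A, D)
```

Here `y_inflect` lands at A/2 up to the grid resolution, matching the statement above.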

Flexible Logistic Model.
The form of the Flexible Logistic model (FLOG) derives from the logistic, where $C \neq 0$, $D \neq 0$, and $t(m, k)$ is a function of time with parameters $m \neq 0$ and $k \neq 0$ [16]. The FLOG model locates the point of inflection anywhere between its upper and lower bounds [4].

Bass Model. The Bass model includes two categories of adopters: the innovators, at the early stage of the diffusion, and the imitators, afterwards. $Y = Y(t)$, the diffusion level at time $t$, is expressed as where $A$ is the saturation level of adoption, $p \neq 0$ is the innovation coefficient, and $q \neq 0$ is the imitation coefficient [6,11].
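A minimal sketch of the Bass curve follows, using the widely cited closed form Y(t) = A·(1 − e^{−(p+q)t}) / (1 + (q/p)·e^{−(p+q)t}). It is an assumption that the paper's expression is written in exactly this way. The function name and the coefficient values are hypothetical.

```python
import math


def bass(t, A, p, q):
    """Cumulative Bass adoption with saturation A, innovation p, imitation q."""
    e = math.exp(-(p + q) * t)
    return A * (1.0 - e) / (1.0 + (q / p) * e)


# Illustrative coefficients only (not estimated from the Greek dataset).
A, p, q = 25.0, 0.03, 0.38

# When q > p, the adoption rate peaks at t* = ln(q / p) / (p + q),
# where the cumulative level is A * (q - p) / (2 * q).
t_peak = math.log(q / p) / (p + q)
y_peak = bass(t_peak, A, p, q)
```

The innovation coefficient p drives adoption at t = 0, when there are no adopters to imitate, while q dominates once a base of adopters exists.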

General Aspect of the Model. An empirical model for short-term forecasting of the broadband penetration is proposed, described by the following difference equation: where $Y_n = Y(t_n)$ is the estimated penetration level at time $t_n$. For $t_0 = 0$ the initial penetration $Y_0$ is taken. The function $C_n$ measures the "adoption rate" at time $t_n$, given by where $k \neq 0$, $k \in \mathbb{R}$, and the model's time $t_n$ is given by: where $n \in \mathbb{N}^*$ (the natural numbers without 0) is the current time of the observation, $n_{STEP} \in \mathbb{R}$ is the step between $n - 1$ and $n$, and $D \in \mathbb{R}^*$ (the real numbers without 0). Also, an equivalent mathematical expression of (9) that can be used is The continuous format of (9) is given by where, for $t_0 = 0$, we take the initial diffusion level
Regression analysis shows that, in many cases, the $k$ factor ranges between 2 and 3. In the special case that $k = 3$, the solution of (13) gives: where $\mathrm{Li}_s(z)$ represents the polylogarithm function of order $s$ and argument $z$, defined by the infinite series
$$\mathrm{Li}_s(z) = \sum_{j=1}^{\infty} \frac{z^j}{j^s}.$$
In this paper, the discrete format of this model is used, described in (9) or (12), (10), and (11). It should be noted that the penetration growth rate is proportional to the number of potential adopters and to the likelihood $f(t_n)$ that the total number of adopters is $N(t_n)$ at time $t_n$. This can be written in discrete format as
Exponent $k$ relates to the behavior of the time function that describes the total number of potential adopters. According to the empirical model, $N(t_n)$ is given by The $B$ factor is tied to the speed of the cumulative adoption and to the saturation level of the diffusion model. Specifically, the $B$ factor is related to the probability distribution of the model. The discrete probability $p_n$ of total potential adopters $[N(t_n)]$ at time $t_n$ is presented in (18). Equation (18) is the discrete result of the reasonable assumption that the rate of change of the probability over time is negatively proportional to the probability $p_n$ and the $B$ factor [11].
The continuous format of the differential equation that describes this assumption is given by
$$\frac{dp(t)}{dt} = -B\,p(t).$$
The normalized discrete probability function for time $t_n$ is given in The factor $1/(1 - e^{-B t_n})$ is the sum of the geometric series
$$\sum_{m=0}^{\infty} e^{-m B t_n} = \frac{1}{1 - e^{-B t_n}}.$$
Replacing (21) in (16) yields which is equivalent to (10). Finally, by replacing (22) in (12), the following is obtained: The difference $t_{m+1} - t_m$ is a constant number equal to $D$. So, (23) becomes which is equivalent to (12). Consequently, parameter $D$ relates to the time period of the growth as well as to the slope of the adoption rate curve (Figure 1). Figure 2 shows the influence of the model parameters on its performance.
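The two series that appear in this derivation, the polylogarithm and the geometric series behind the factor 1/(1 − e^{−B·t_n}), can be checked with truncated sums. The sketch below is only a numerical illustration. The function names, the truncation length, and the sample values of B and t are assumptions.

```python
import math


def polylog(s, z, terms=200):
    """Truncated series Li_s(z) = sum_{j>=1} z**j / j**s, valid for |z| < 1."""
    return sum(z ** j / j ** s for j in range(1, terms + 1))


def geometric_sum(B, t, terms=200):
    """Truncated geometric series sum_{m>=0} exp(-m * B * t), valid for B * t > 0."""
    return sum(math.exp(-m * B * t) for m in range(terms))


# Known special case: Li_1(z) = -ln(1 - z), so Li_1(0.5) = ln 2.
li1_half = polylog(1, 0.5)

# The geometric series should reproduce the closed-form factor 1 / (1 - exp(-B * t)).
B, t = 0.5, 1.0  # illustrative values only
series_value = geometric_sum(B, t)
closed_form = 1.0 / (1.0 - math.exp(-B * t))
```

Both sums converge quickly for the sample values, so 200 terms are far more than needed. For B·t close to zero the geometric series converges slowly and the closed form should be used instead.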

Dataset.
The actual data used in this short analysis concern the quarterly total broadband connections and the percentage of broadband connections per 100 inhabitants in Greece, from December 2004 until June 2010 (see Figure 3) [25]. Figure 3 is based on data from the National Telecommunications and Post Commission (EETT) [21-23], as well as data from the Observatory for the Greek Information Society [24].

Fitting and Forecasting Method
Regression analysis is used to fit sampled data to a model. Curve fitting is done using the ordinary least squares (OLS) method [12]. The objective of the OLS method is to minimize the sum of squared errors (SSE) between the data points $(t_i, O(t_i))$ and the model-evaluated points $(t_i, Y(t_i; a_1, a_2, \ldots, a_k))$:
$$SSE_{OLS} = \sum_{i=1}^{n} \left[ O(t_i) - Y(t_i; a_1, a_2, \ldots, a_k) \right]^2,$$
where $Y(t_i; a_1, a_2, \ldots, a_k)$ is a time function with a set of $k$ parameters $\{a_j\}$, and there are $n > k$ known data points $(t_i, O(t_i))$ [12]. In forecasting, parameter estimation is usually focused on the time interval near the last observed data points. Thus, the weighted least squares (WLS) method is used [12]. The objective of the WLS method is to minimize the weighted sum of squared errors between the points of the dataset $(t_i, O(t_i))$ and the model-estimated points $(t_i, Y(t_i; a_1, a_2, \ldots, a_k))$:
$$SSE_{WLS} = \sum_{i=1}^{n} \left[ w_i \left( O(t_i) - Y(t_i; a_1, a_2, \ldots, a_k) \right) \right]^2,$$
where $w_i = 1/\sigma_i$ is the weight and $\sigma_i^2$ is the variance of the $i$th observation [12].
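The two objective functions can be sketched in a few lines of Python. The sketch assumes that each weight w_i = 1/σ_i multiplies its residual before squaring, which is the usual convention for weighted least squares. The function names and the sample numbers are hypothetical and are not the Greek dataset.

```python
def sse_ols(observed, fitted):
    """Ordinary sum of squared errors between observations and model values."""
    return sum((o - y) ** 2 for o, y in zip(observed, fitted))


def sse_wls(observed, fitted, sigmas):
    """Weighted sum of squared errors with weights w_i = 1 / sigma_i.

    A smaller sigma_i gives the i-th observation more influence on the fit.
    """
    return sum(((o - y) / s) ** 2 for o, y, s in zip(observed, fitted, sigmas))


# Illustrative data only: four observations and the values of some candidate model.
observed = [1.0, 2.0, 4.0, 7.0]
fitted = [1.5, 2.0, 4.0, 6.5]

# Smaller sigmas on the latest points emphasise the most recent observations,
# which is the motivation given above for using WLS in forecasting.
sigmas = [2.0, 2.0, 1.0, 0.5]

ols_value = sse_ols(observed, fitted)          # 0.25 + 0.25 = 0.5
wls_value = sse_wls(observed, fitted, sigmas)  # 0.0625 + 1.0 = 1.0625
```

In this example the early and late residuals are equally large, so OLS treats them alike, whereas the weighted objective penalises the miss on the last observation sixteen times more heavily than the miss on the first.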

Fitting Results.
Table 1 lists the fitted curves, sorted according to the sum of squared errors $SSE_{OLS}$.
The forecasting model shows the best fitting performance. It should be noted that the statistical indices concern the whole dataset. The fitting performance of the forecasting model improves for any subset of the dataset, especially for the latest data. The Mean Absolute Error (MAE), the Root Mean Squared Error (RMSE), and the Mean Absolute Percentage Error (MAPE) are statistics for evaluating the overall quality of a regression model [26]. The MAE is the average of the absolute values of the residuals. The MAE of the empirical model is the smallest among all the models, and this reduction is observed mainly in the latest data. MAE is very similar to RMSE, but less sensitive to large errors. The RMSE, which is the square root of the average squared distance of a data point from the fitted line, is also the smallest for the empirical model. The MAPE is the average of the absolute residuals expressed as a percentage of the observed values.
It can be seen that the MAE of the forecasting model is the smallest. In addition, the Gompertz (with constant), Gompertz, and Richards models behave well. This performance improves on the time interval near the last observed data points. Figure 5 shows the fitted curves of the models over the period from December 2004 until June 2010.
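The three indicators can be computed directly from their standard definitions, as in the following sketch. The function names and the sample values are hypothetical and do not reproduce the figures in the tables.

```python
import math


def mae(observed, fitted):
    """Mean Absolute Error: average absolute residual."""
    return sum(abs(o - y) for o, y in zip(observed, fitted)) / len(observed)


def rmse(observed, fitted):
    """Root Mean Squared Error: square root of the average squared residual."""
    return math.sqrt(sum((o - y) ** 2 for o, y in zip(observed, fitted)) / len(observed))


def mape(observed, fitted):
    """Mean Absolute Percentage Error, in percent; observations must be nonzero."""
    return 100.0 * sum(abs((o - y) / o) for o, y in zip(observed, fitted)) / len(observed)


# Illustrative values only.
observed = [10.0, 20.0, 40.0]
fitted = [11.0, 18.0, 40.0]

mae_value = mae(observed, fitted)    # (1 + 2 + 0) / 3
rmse_value = rmse(observed, fitted)  # sqrt((1 + 4 + 0) / 3)
mape_value = mape(observed, fitted)  # 100 * (0.1 + 0.1 + 0) / 3
```

Because RMSE squares the residuals before averaging, the single residual of 2 raises it above the MAE, which illustrates why MAE is described as less sensitive to large errors.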
The overall fitting performance of the models is satisfactory, as is the correlation coefficient between the fitted and the actual data. This coefficient varies between 0.999285392 (forecasting model) and 0.998073643 (logistic model). When the parameters of the models are calculated by the OLS regression method, the R squared ($R^2$) indicator, which is the square of the correlation coefficient, depends on the number of the parameters. So, the indicator's range varies
In the literature, MAPE is regarded as a reliable indicator for evaluating the prediction performance of models. The proposed model performs better than the others. Specifically, the empirical model achieves good values of the MAPE, MAE, and MSE indicators. In general, it could be argued that the models that achieve good indicators, especially MAPE, are those of the Richards and Gompertz family.
In Figure 6, the aforementioned performance is graphically presented, for a forecasting period of one year.

Future Trends.
The performance of the models using the dataset (23 data points) is presented here.The WLS method was also applied here.The parameters of the forecasting model are presented in Table 3.
According to the MAPE indicator, the forecasting curves are sorted in Table 4.
According to the MAPE indicator, the forecasting model, Gompertz with constant, Richards, and Gompertz perform better than the others. According to $SSE_{WLS}$, the models with the best performance are the forecasting model, FLOG, Logistic (with constant), and Bass. The model's estimate of the broadband (BB) penetration in Greece in June 2012 (two years ahead) is approximately 22.09%. The most optimistic estimate comes from the Gompertz model (BB penetration 22.34%), and the most pessimistic from the Logistic model, with 19.46%. For a forecasting period of one year, the empirical model estimates a BB penetration of 20.73%, Gompertz 20.85%, Gompertz (with constant) 20.75%, and Richards 20.34% (Figure 7).

Future Implementation of the Empirical Model
As noted earlier, the adoption of an innovative technology by society follows a sigmoid curve. So, the empirical model could be applied to general-purpose forecasting of time series that follow a sigmoid curve. Applying it to different markets would yield, through regression analysis, different parameter estimates.
The time horizon of the forecast depends on the dataset density.Generally, it should be noted that the reliability of the diffusion models depends on the number of the time series data.This principle governs the empirical model.So the implementation of the model should be chosen based on this principle.
Specifically, the coverage of the FTTH (Fiber To The Home) technology is growing rapidly, according to the OECD [27]. So, a suggestion for a future implementation of the model (when enough historical data are available) would be the forecast of the FTTH technology penetration in a country or in a geographical sector (e.g., Europe, Asia, America, etc.).

Conclusion
In this paper, a new short-term forecasting model for the overall broadband penetration was introduced. The forecasting model fitted the observed data well. The residuals, as well as the RMSE, MAE, and MAPE indicators of the empirical model, were satisfactory. The statistical indicators of the model's forecasting behaviour, namely MAE, RMSE, and MAPE, were also satisfactory.
Future study about the performance of the empirical model for the broadband penetration, in different markets, should be considered.The comparison of the behaviour of the proposed model in different markets can offer a better estimation of the model's parameters and the correlation with financial and social indices.

Figure 1: The effect of the parameters on the adoption rate, as well as on the broadband penetration.

Figure 3: Broadband penetration in Greece (total connections and percentage of the broadband penetration per 100 inhabitants).

Figure 5: The overall fitting performance of the models.

Figure 6: The performance of the models for a forecast period of one year.

Figure 7: The overall forecasting performance of the models.

Table 1: Results of the fitting process for the models.

Table 2: Results of the forecasting process for the models (forecasting period: one year).

Table 3: The estimation of the parameters for the forecasting model.

Table 4: Results of the forecasting process for the models.