This paper proposes a modified Genetic Programming method for forecasting the population of mobile telecommunications subscribers. The method extends the hybrid Genetic Programming (hGP) method by inserting diffusion models used for technological forecasting, namely the Logistic, Gompertz, and Bass models as well as the BiLogistic and LogInLog models, into the initial population. In addition, these functions and models expand the function set of hGP. Applying the method in combination with macroeconomic indicators such as Gross Domestic Product per Capita (GDPpC) and Consumer Prices Index (CPI) leads to forecasting models and scenarios for medium- and long-term predictability. The forecasting module of the program has also been improved through the multilevel use of statistical indices as fitness functions and model selection indices. Applied to the datasets of mobile subscribers in the Organisation for Economic Co-operation and Development (OECD) countries, the modified hGP shows very satisfactory forecasting performance.
Forecasting is an endogenous process intertwined with the evolution of science. Forecasting methodology is divided into two categories: qualitative and quantitative. Qualitative methods employ the judgment of a group of experts to produce forecasts [
Quantitative methods include time series methods, which use the past trend of a variable to forecast its future values, and causal methods, which, besides the past-trend assumption, also examine the correlation of the variable with other indicators.
The adoption of innovative technologies by a society, such as the adoption of mobile telecommunications, has been discussed extensively, and several widely used forecasting models have been proposed. The diffusion processes and the resulting models are described in the literature [
The most commonly used diffusion models are Gompertz, Logistic, and Bass [
In addition to the time response, we investigate the relationship between the produced models and macroeconomic indicators such as GDPpC and CPI. The core of this work is the expansion, modification, and implementation of the hybrid Genetic Programming (hGP) method presented in [
Genetic Programming (GP) is a generalization of the Genetic Algorithm (GA), a heuristic method that employs the Darwinian principle of natural selection to find an appropriate solution to a well-defined problem; in GP, every produced solution corresponds to a new program [
The paper is structured as follows. First, the GP method and the diffusion models are briefly presented. An analysis of the hGP technique follows, together with a description of its modifications and extensions. The next section analyses the results of the modified hGP implementation. After that, we discuss the forecasting results, and, finally, the conclusions are presented.
GP was introduced by Koza in [
The steps of GP are generally the following. First, GP produces an initial population of random programs (solutions) composed of the functions and terminals of the problem. The next step iteratively performs the following substeps until a termination criterion is satisfied: execution of each program and assignment of a fitness value according to the precision of each solution [
Rogers [
A diffusion process is described by dynamic or nondynamic models, depending on whether the saturation level (“carrying capacity”) changes over time [
The Logistic model is the solution of the differential equation (
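Because the equation itself is not reproduced in this text, the following minimal sketch assumes the standard form of the logistic model, dN/dt = b·N·(1 - N/M), whose closed-form solution is N(t) = M / (1 + exp(-b(t - t0))). The symbols M (saturation level), b (growth rate), and t0 (inflection time) are our assumption and may differ from the paper's notation. The code checks numerically that the closed form satisfies the differential equation.

```python
import math


def logistic(t: float, m: float, b: float, t0: float) -> float:
    """Closed-form logistic curve N(t) = m / (1 + exp(-b (t - t0)))."""
    return m / (1.0 + math.exp(-b * (t - t0)))


def logistic_rhs(n: float, m: float, b: float) -> float:
    """Right-hand side of the logistic ODE: dN/dt = b N (1 - N / m)."""
    return b * n * (1.0 - n / m)


def central_difference(f, t: float, h: float = 1e-5) -> float:
    """Numerical derivative of f at t using a central difference."""
    return (f(t + h) - f(t - h)) / (2.0 * h)


if __name__ == "__main__":
    m, b, t0 = 1.2, 0.6, 6.0  # illustrative values; t counted in years from the first observation
    for t in (0.0, 6.0, 12.0):
        n = logistic(t, m, b, t0)
        slope = central_difference(lambda s: logistic(s, m, b, t0), t)
        print(f"t={t:4.1f}  N={n:.4f}  numeric dN/dt={slope:.6f}  ODE={logistic_rhs(n, m, b):.6f}")
```

At t = t0 the curve passes through M/2, which is also the point of fastest growth.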
The Gompertz model has been used extensively in forecasting [
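As a sketch, assume one common Gompertz parameterization, N(t) = M·exp(-exp(-b(t - t0))), which solves dN/dt = b·N·ln(M/N). The paper distinguishes Gompertz I and Gompertz II forms that are not reproduced here, so this is not necessarily either of them. Unlike the symmetric Logistic curve, which inflects at M/2, this curve inflects at M/e (about 37% of saturation), which makes it suitable for asymmetric adoption paths.

```python
import math


def gompertz(t: float, m: float, b: float, t0: float) -> float:
    """Gompertz curve N(t) = m * exp(-exp(-b (t - t0))), inflecting at t = t0."""
    return m * math.exp(-math.exp(-b * (t - t0)))


def gompertz_rhs(n: float, m: float, b: float) -> float:
    """Right-hand side of the Gompertz ODE: dN/dt = b N ln(m / N)."""
    return b * n * math.log(m / n)


if __name__ == "__main__":
    m, b, t0 = 1.2, 0.5, 6.0  # illustrative values only
    share = gompertz(t0, m, b, t0) / m
    print(f"share of saturation at the inflection point: {share:.4f}")  # 1/e, about 0.3679
```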
Also, a variation of (
Bass proposes that the adopters of a new product in a market fall into two major categories: innovators and imitators. The diffusion process starts with the adoption of the new product or innovative technology by the innovators, after which the imitators follow.
The cumulative adoption of the new technology
In (
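Since the equations are truncated here, the sketch below assumes the standard Bass formulation, dN/dt = (p + q·N/m)(m - N), with market potential m, coefficient of innovation p, and coefficient of imitation q (our symbols). Splitting the right-hand side exposes the innovator term p(m - N) and the imitator term q(N/m)(m - N) described above.

```python
import math
from typing import Tuple


def bass_cumulative(t: float, m: float, p: float, q: float) -> float:
    """Cumulative adopters N(t) of the standard Bass model, with N(0) = 0.

    m: market potential (saturation level)
    p: coefficient of innovation (external influence)
    q: coefficient of imitation (internal influence)
    """
    e = math.exp(-(p + q) * t)
    return m * (1.0 - e) / (1.0 + (q / p) * e)


def bass_new_adopters(n: float, m: float, p: float, q: float) -> Tuple[float, float]:
    """Split dN/dt = (p + q N/m)(m - N) into innovator and imitator contributions."""
    innovators = p * (m - n)
    imitators = q * (n / m) * (m - n)
    return innovators, imitators


if __name__ == "__main__":
    m, p, q = 1.0, 0.03, 0.4  # illustrative coefficients, not fitted values
    for t in (1.0, 5.0, 10.0):
        n = bass_cumulative(t, m, p, q)
        inn, imi = bass_new_adopters(n, m, p, q)
        print(f"t={t:4.1f}  N={n:.3f}  innovators={inn:.4f}  imitators={imi:.4f}")
```

Early on the innovator term dominates; as N grows, the imitator term takes over, which produces the S-shape.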
In some cases, the overall life of a product, such as mobile telecommunications, has many phases (generations). For this purpose, we employ the BiLogistic curve, which is the sum of two Logistic curves [
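A minimal sketch of the BiLogistic curve as the sum of two Logistic curves, each with its own saturation level, growth rate, and midpoint. The per-pulse parameterization (m, b, t0) is our assumption. The overall saturation level is the sum of the two pulse saturation levels.

```python
import math
from typing import Tuple

Pulse = Tuple[float, float, float]  # (m, b, t0): saturation, growth rate, midpoint


def logistic(t: float, m: float, b: float, t0: float) -> float:
    """Single logistic pulse m / (1 + exp(-b (t - t0)))."""
    return m / (1.0 + math.exp(-b * (t - t0)))


def bilogistic(t: float, first: Pulse, second: Pulse) -> float:
    """BiLogistic curve: the sum of two logistic pulses (e.g. two technology generations)."""
    return logistic(t, *first) + logistic(t, *second)


if __name__ == "__main__":
    first_gen = (0.6, 0.9, 3.0)    # illustrative first pulse
    second_gen = (0.5, 0.7, 11.0)  # a later, overlapping second pulse
    for t in (0.0, 6.0, 12.0, 20.0):
        print(t, round(bilogistic(t, first_gen, second_gen), 4))
```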
In this case, the saturation level is time dependent
This model describes the diffusion process when an innovative technology has created generations which are not clearly separated [
The hybrid Genetic Programming method in fitting and forecasting was presented in a previous work [
Flowchart of the modified hGP.
This stage of the method contains the preparation steps for the program execution process [
In the second step, the terminal set
The next step is to define the fitness function for each solution. Various statistical indicators can be used for the fitness function during the evaluation process. Following the previous implementation of the hGP [
In the fitting process, each chromosome is evaluated with the Sum of Squared Error (SSE), as in
In (
In forecasting, the fitness function refers to the weighted sum of squared error (wSSE) function, as in
In this function, a weight
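The sketch below computes SSE and a weighted SSE. The actual weighting used in wSSE is not reproduced in this text, so the linearly increasing weights w_i = i/n (which give recent observations more influence) are a hypothetical choice for illustration only; `linear_recency_weights` is our own name.

```python
from typing import List, Sequence


def sse(actual: Sequence[float], fitted: Sequence[float]) -> float:
    """Sum of squared errors between observations and model output."""
    if len(actual) != len(fitted):
        raise ValueError("series must have the same length")
    return sum((a - f) ** 2 for a, f in zip(actual, fitted))


def wsse(actual: Sequence[float], fitted: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted SSE: sum_i w_i * (a_i - f_i)^2."""
    if not (len(actual) == len(fitted) == len(weights)):
        raise ValueError("series and weights must have the same length")
    return sum(w * (a - f) ** 2 for a, f, w in zip(actual, fitted, weights))


def linear_recency_weights(n: int) -> List[float]:
    """Hypothetical weighting w_i = i / n for i = 1..n (the latest point has weight 1)."""
    return [i / n for i in range(1, n + 1)]
```

With these weights, the same error costs four times more on the last of four points than on the first, whereas plain SSE treats both identically.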
Finally, the maximum number of generations is defined to end the execution of the GP.
As mentioned before, the function set of the modified hGP is extended compared to hGP. Apart from the primary arithmetic functions set
Representation of chromosomes in modified hGP as strings.
Representation of chromosomes in modified hGP as parse trees.
The expressions of the randomly created solutions are built from the following primary block format, in which each part is randomly chosen.
Block:
The solutions of the initial population are combinations of randomly chosen functions, variables, constants, and primary blocks. In addition, the optimized Logistic, Gompertz I, Gompertz II, Bass, BiLogistic, and LogInLog diffusion models are inserted into the population. The parameters of the diffusion models are optimized by nonlinear regression analysis using the Levenberg-Marquardt algorithm [
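The dependency-free sketch below illustrates Levenberg-Marquardt parameter fitting; it is not the authors' code. It fits only the Logistic curve, assumes the parameterization N(t) = m/(1 + exp(-b(t - t0))), uses analytic derivatives, and applies Marquardt's damping, solving (J^T J + lam·diag(J^T J)) step = J^T r, decreasing lam after an accepted step and increasing it after a rejected one. In practice, a library routine such as SciPy's `scipy.optimize.curve_fit` with `method="lm"` would normally be used instead.

```python
import math
from typing import List, Sequence, Tuple

Params = Tuple[float, float, float]  # (m, b, t0): saturation, growth rate, inflection time


def logistic(t: float, p: Params) -> float:
    m, b, t0 = p
    return m / (1.0 + math.exp(-b * (t - t0)))


def jacobian_row(t: float, p: Params) -> List[float]:
    """Analytic partial derivatives of the logistic curve w.r.t. (m, b, t0)."""
    m, b, t0 = p
    e = math.exp(-b * (t - t0))
    d = (1.0 + e) ** 2
    return [1.0 / (1.0 + e), m * e * (t - t0) / d, -m * b * e / d]


def solve(a: List[List[float]], rhs: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def sse(ts: Sequence[float], ys: Sequence[float], p: Params) -> float:
    try:
        return sum((y - logistic(t, p)) ** 2 for t, y in zip(ts, ys))
    except OverflowError:  # a wild trial step can overflow exp(); treat it as a bad step
        return math.inf


def levenberg_marquardt(ts, ys, p0: Params, lam: float = 1e-3, max_iter: int = 500):
    """Minimise the SSE of a logistic fit with Marquardt-scaled damping."""
    p, cost = p0, sse(ts, ys, p0)
    for _ in range(max_iter):
        rows = [jacobian_row(t, p) for t in ts]
        res = [y - logistic(t, p) for t, y in zip(ts, ys)]
        jtj = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        jtr = [sum(r[i] * e for r, e in zip(rows, res)) for i in range(3)]
        # (J^T J + lam * diag(J^T J)) step = J^T r
        a = [[jtj[i][j] * (1.0 + lam) if i == j else jtj[i][j] for j in range(3)] for i in range(3)]
        step = solve(a, jtr)
        trial = (p[0] + step[0], p[1] + step[1], p[2] + step[2])
        trial_cost = sse(ts, ys, trial)
        if trial_cost < cost:   # accept: move towards Gauss-Newton behaviour
            p, cost, lam = trial, trial_cost, lam / 10.0
        else:                   # reject: move towards gradient-descent behaviour
            lam *= 10.0
            if lam > 1e12:      # no further progress is possible
                break
    return p, cost


if __name__ == "__main__":
    ts = [float(k) for k in range(13)]               # 13 yearly points, t = 0 for the first year
    ys = [logistic(t, (1.0, 0.8, 6.0)) for t in ts]  # synthetic, noise-free data
    print(levenberg_marquardt(ts, ys, (0.9, 0.6, 5.5)))
```

On real, noisy data the result depends on the starting values, which is one reason why seeding the population with already optimized diffusion models is attractive.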
In modified hGP, each chromosome is a string of characters and corresponds to a program that is a possible solution to the problem [
The parse tree consists of nodes of two types: terminal and nonterminal nodes. The terminal nodes (leaves) of the tree contain the variables or the constants, whereas the nonterminal nodes contain the modified hGP functions [
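To illustrate how a chromosome string maps to a parse tree, the sketch below assumes a hypothetical encoding: space-separated tokens in prefix (Polish) notation, a small arithmetic function set with protected division, and terminals that are either numeric constants or variable names such as t, GDPpC, and CPI. The paper's own string format and function set (including the diffusion-model blocks) may differ.

```python
import math
from typing import Dict, List, Tuple, Union

# Hypothetical function set: symbol -> (arity, implementation).
FUNCTIONS = {
    "+": (2, lambda a, b: a + b),
    "-": (2, lambda a, b: a - b),
    "*": (2, lambda a, b: a * b),
    "/": (2, lambda a, b: a / b if b != 0 else 1.0),  # protected division
    "exp": (1, lambda a: math.exp(min(a, 50.0))),      # argument clipped to avoid overflow
}

Node = Union[str, float, Tuple]


def parse(tokens: List[str], pos: int = 0) -> Tuple[Node, int]:
    """Build a parse tree from prefix tokens; return (node, next position)."""
    tok = tokens[pos]
    if tok in FUNCTIONS:  # nonterminal node: parse one subtree per argument
        children = []
        nxt = pos + 1
        for _ in range(FUNCTIONS[tok][0]):
            child, nxt = parse(tokens, nxt)
            children.append(child)
        return (tok, *children), nxt
    try:
        return float(tok), pos + 1  # terminal: constant
    except ValueError:
        return tok, pos + 1         # terminal: variable name


def evaluate(node: Node, env: Dict[str, float]) -> float:
    """Evaluate a parse tree with variable values taken from env."""
    if isinstance(node, tuple):
        fn = FUNCTIONS[node[0]][1]
        return fn(*(evaluate(child, env) for child in node[1:]))
    if isinstance(node, float):
        return node
    return env[node]


if __name__ == "__main__":
    tree, _ = parse("+ * 0.5 t / GDPpC CPI".split())
    print(tree)  # ('+', ('*', 0.5, 't'), ('/', 'GDPpC', 'CPI'))
    print(evaluate(tree, {"t": 2.0, "GDPpC": 3.0, "CPI": 1.5}))  # 3.0
```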
As stated above, the best solution is selected according to (
Representation of the solutions’ sorted list in the selection operation of the modified hGP.
In tournament selection, a number of solutions from the sorted solutions’ list are selected at random and, then, the best is chosen for the crossover or mutation operation.
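A minimal sketch of tournament selection over the sorted list, assuming it is ordered best first (lowest error first), so the winner of a tournament is simply the entry with the smallest sampled index. The tournament size k and the function name are illustrative.

```python
import random
from typing import Sequence, Tuple, TypeVar

S = TypeVar("S")


def tournament_select(sorted_pop: Sequence[Tuple[float, S]], k: int,
                      rng: random.Random) -> Tuple[float, S]:
    """Sample k distinct positions and return the best-ranked (smallest index) entry."""
    winner = min(rng.sample(range(len(sorted_pop)), k))
    return sorted_pop[winner]


if __name__ == "__main__":
    ranked = [(0.1, "a"), (0.4, "b"), (0.9, "c"), (1.5, "d"), (2.0, "e")]  # (error, solution)
    rng = random.Random(42)
    print([tournament_select(ranked, 2, rng)[1] for _ in range(10)])
```

Larger k increases selection pressure; with k = 2 the worst solution can never win, but weaker solutions still get occasional chances, which preserves diversity.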
In the crossover operation, two parents are randomly selected by the tournament selection process from the list of solutions sorted by best fitness value.
In each parent solution, a crossover point is randomly chosen. The substrings of the two parents beginning at their crossover points are interchanged, and the children (offspring) are generated. The crossover operation is presented in Figures
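The sketch below shows crossover on a hypothetical prefix-token encoding. We assume that "the substring beginning at the crossover point" is the complete subtree rooted there; swapping whole subtrees keeps both children syntactically valid. The function set and function names are illustrative.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical function set with arities; any other token is a terminal (arity 0).
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "exp": 1}


def subtree_end(tokens: Sequence[str], start: int) -> int:
    """Return the index just past the subtree that starts at `start` (prefix order)."""
    open_slots, pos = 1, start
    while open_slots > 0:
        open_slots += ARITY.get(tokens[pos], 0) - 1
        pos += 1
    return pos


def subtree_crossover(p1: Sequence[str], p2: Sequence[str],
                      rng: random.Random) -> Tuple[List[str], List[str]]:
    """Swap the subtrees rooted at one random crossover point in each parent."""
    i, j = rng.randrange(len(p1)), rng.randrange(len(p2))
    e1, e2 = subtree_end(p1, i), subtree_end(p2, j)
    child1 = list(p1[:i]) + list(p2[j:e2]) + list(p1[e1:])
    child2 = list(p2[:j]) + list(p1[i:e1]) + list(p2[e2:])
    return child1, child2


if __name__ == "__main__":
    rng = random.Random(1)
    mum = "+ * 0.5 t / GDPpC CPI".split()
    dad = "exp - t 2".split()
    kid1, kid2 = subtree_crossover(mum, dad, rng)
    print(" ".join(kid1), "|", " ".join(kid2))
```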
Crossover of the modified hybrid Genetic Programming method (string representation).
Crossover of the modified hybrid Genetic Programming method (parse tree representation).
In the mutation process, a solution is chosen from the tournament list by tournament selection. Once again, a point of the string that holds a function is randomly chosen. The mutation replaces the chosen function from the
The mutation operation is presented in Figures
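Because the sentence describing the replacement is cut off, the sketch below assumes a point mutation that swaps the chosen function for a different one of the same arity, so the tree shape stays valid. The function set and function name are hypothetical.

```python
import random
from typing import List, Sequence

# Hypothetical function set with arities; terminals (variables, constants) are not listed.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "exp": 1, "log": 1}


def point_mutation(tokens: Sequence[str], rng: random.Random) -> List[str]:
    """Replace one randomly chosen function token with another of the same arity."""
    mutant = list(tokens)
    positions = [k for k, tok in enumerate(tokens) if tok in ARITY]
    if not positions:  # a lone terminal: nothing to mutate
        return mutant
    k = rng.choice(positions)
    options = [f for f, a in ARITY.items() if a == ARITY[tokens[k]] and f != tokens[k]]
    if options:
        mutant[k] = rng.choice(options)
    return mutant


if __name__ == "__main__":
    rng = random.Random(7)
    print(" ".join(point_mutation("+ * 0.5 t exp t".split(), rng)))
```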
Mutation of the modified hybrid Genetic Programming method (string representation).
Mutation of the modified hybrid Genetic Programming method (parse tree representation).
The fitness function of each individual in the modified hGP method is the Sum of Squared Error (SSE) for the fitting process, as in (
The statistical indices in the modified hGP [
MAPE is presented in (
MSE, MAE, and RMSE are presented in (
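Since the equations are not reproduced here, the sketch below uses the standard definitions of MAPE, MSE, RMSE, and MAE over n points. MAPE is returned as a fraction; many texts multiply it by 100, and this text does not confirm which scaling the tables use. The function name is our own.

```python
import math
from typing import Dict, Sequence


def error_indices(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Standard accuracy indices; MAPE as a fraction (multiply by 100 for percent)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty series of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    return {
        "MAPE": sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
    }


if __name__ == "__main__":
    print(error_indices([1.0, 2.0, 4.0], [1.5, 2.0, 3.0]))
```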
In addition, this study has deployed a format inspired by the Bayesian Information Criterion (BIC) [
In (
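The exact BIC-inspired format is not reproduced in this text. A common least-squares form, assumed here, is BIC = n·ln(SSE/n) + k·ln(n) for n data points and k estimated parameters. Lower values are better, and the k·ln(n) term penalizes additional parameters.

```python
import math


def bic_least_squares(sse: float, n: int, k: int) -> float:
    """BIC for a least-squares fit with Gaussian errors: n ln(SSE/n) + k ln(n)."""
    if sse <= 0 or n <= 0:
        raise ValueError("SSE and n must be positive")
    return n * math.log(sse / n) + k * math.log(n)


if __name__ == "__main__":
    # With 11 training points, halving the SSE does not pay for four extra parameters.
    print(bic_least_squares(0.01, 11, 3), bic_least_squares(0.005, 11, 7))
```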
It should be noted that, in the final selection of the appropriate forecasting model, we use the half of the dataset preceding the last observed data point.
In this section, the macroeconomic indicators Gross Domestic Product per Capita (GDPpC) and normalized Consumer Prices Index (CPI) are presented. GDPpC is a macroeconomic index of a country's productivity and should not be considered an index of personal income.
According to [
In general, the CPI is a weighted average of the prices of basic consumer goods. In this study, the CPI is based on the individual consumption expenditure of households, excluding energy and food [
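Normalizing an index to a base year divides every value by the base-year value. The sketch below assumes the base year is scaled to 1 (pass scale=100 for the usual 2005 = 100 convention); the function name is ours, and any sample values are hypothetical, not OECD data.

```python
from typing import Dict


def rebase(index: Dict[int, float], base_year: int, scale: float = 1.0) -> Dict[int, float]:
    """Rescale an index series so that the base-year value equals `scale`."""
    base = index[base_year]
    return {year: scale * value / base for year, value in index.items()}


if __name__ == "__main__":
    cpi = {2004: 98.0, 2005: 100.0, 2006: 102.5}  # hypothetical values
    print(rebase(cpi, 2005))
```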
Yearly growth of the GDPpC (in units of $100,000) and CPI (normalized to the year 2005).
It should be noted that, after 2008 (the “economic crisis year”), the OECD GDPpC decreased in 2009, whereas the CPI proved more resilient.
According to [
Also, in 2003, the next generation (3G) of mobile networks, the Universal Mobile Telecommunications System (UMTS), emerged with the first video calls and was later (around 2006) upgraded to High Speed Packet Access (HSPA), with data rates of 14 Mbps in the downlink and 5.76 Mbps in the uplink. HSPA was then upgraded to HSPA+, with theoretical rates of 168 Mbps and 22 Mbps in the downlink and uplink, respectively, and data services such as video, mobile email, and music. In 2009, Long-Term Evolution (LTE) was launched for commercial use, while a new generation (4G) of technology is coming [
OECD mobile subscribers (in billions).
It should be noted that the number of mobile subscribers grows with the evolution of technology generations.
The results are analysed in order to provide satisfactory predictions for the total number of mobile subscribers, which consists of mobile contract and mobile prepay subscribers in OECD countries, as well as for mobile 3G subscribers.
This study applies the modified hGP to four different datasets: the total yearly number of OECD mobile subscribers, the yearly number of mobile contract subscribers, the yearly number of mobile prepay subscribers, and the yearly number of mobile 3G subscribers. The observation period spans 1997 to 2009 and comprises 13 data points.
Table
Initialization parameters of the modified hGP.
Initialization parameters of the modified hGP

Maximum number of generations  500 
Evaluation function  SSE 
Upper limit of the precision for the candidates for crossover and mutation  0.5 
The fitting performance of the first modified hGP model for the total number of OECD subscribers, according to its fitness value (SSE), is presented in Figure
Statistical indices in the fitting process of the modified hGP model for the total number of OECD subscribers.
Statistical indices of the modified hGP for OECD mobile subscribers
Model name  MAPE  SSE  MSE  RMSE  MAE
Modified hGP model  n/a  n/a  n/a  n/a  n/a
The fitting performance of the modified hGP model for the total number of OECD subscribers.
Residuals of the modified hGP model in fitting the total number of OECD subscribers.
The corresponding modified hGP model has the following format:
As can be seen, this method combines variables such as GDPpC or CPI with the independent variable of time. In Table
The fitting performance and the residuals for the remaining data sets are presented in Figures
Statistical indices in the fitting process of the modified hGP model for the OECD mobile contract subscribers.
Statistical indices of the modified hGP for OECD mobile contract subscribers
Model name  MAPE  SSE  MSE  RMSE  MAE
Modified hGP model  0.00051034  n/a  n/a  0.0002335  0.000187453
Statistical indices in the fitting process of the modified hGP model for the OECD mobile prepay subscribers.
Statistical indices of the modified hGP for OECD mobile prepay subscribers
Model name  MAPE  SSE  MSE  RMSE  MAE
Modified hGP model  0.000405  n/a  n/a  n/a  n/a
Statistical indices in the fitting process of the modified hGP model for the OECD 3G mobile subscribers.
Statistical indices of the modified hGP for OECD 3G mobile subscribers
Model name  MAPE  SSE  MSE  RMSE  MAE
Modified hGP model  n/a  n/a  n/a  n/a  n/a
The fitting performance of the modified hGP model for the OECD mobile contract subscribers.
Residuals of the modified hGP model in fitting the OECD mobile contract subscribers.
The fitting performance of the modified hGP model for the OECD mobile prepay subscribers.
Residuals of the modified hGP model in fitting the OECD mobile prepay subscribers.
The fitting performance of the modified hGP model for the OECD 3G mobile subscribers.
The corresponding modified hGP model for contract subscribers has the following format:
It should be noted that this method combines variables such as GDPpC or CPI with the independent variable of time and with various diffusion-model blocks. The model behaves adequately in the fitting process. The error performance in fitting is depicted in Figure
The corresponding modified hGP model for prepay subscribers has the following format:
Once again, the model behaves adequately in the fitting process. The error performance in fitting is depicted in Figure
Finally, the corresponding modified hGP model for 3G subscribers has the following format:
The modified hGP model again yields a satisfactory fitting performance. The error performance in fitting is depicted in Figure
Residuals of the modified hGP model in fitting the OECD 3G mobile subscribers.
This section presents the forecasting results of the models generated by the modified hGP method, together with those of the diffusion models for comparison. As mentioned before, the statistical indicator wSSE has been used for the forecasting process.
The initialization parameters for the execution of the modified hGP are presented in Table
Initialization parameters of the modified hGP.
Initialization parameters of the modified hGP

Maximum number of generations  500 
Evaluation function  wSSE 
Upper limit of the precision for the candidates for crossover and mutation  2 
The forecasting performance of the modified hGP (forecast period: 2 years ahead).
The corresponding modified hGP model, which exhibits BiLogistic behavior and is time-dependent, has the following format:
The forecasting performance of the optimized diffusion models, according to their fitness value (wSSE) for the 11 training points, is presented in Figure
Statistical indices in the forecasting process of the modified hGP and diffusion models for the total number of OECD subscribers.
Statistical indices of the modified hGP and diffusion models in forecasting the total number of OECD mobile subscribers
Model name  MAPE  wSSE  MSE  RMSE  MAE
Modified hGP  0.01271669  0.000226  n/a  0.006957  0.005281
Logistic  0.064808  0.009513  0.00148  0.038466  0.034057
Gompertz I  0.04259  0.004252  0.000732  0.027062  0.023146
Gompertz II  0.034539  0.004214  0.000574  0.023967  0.019402
Bass  0.034737  0.004238  0.000575  0.023983  0.019426
LogInLog  0.036436  0.023129  0.002048  0.045259  0.024329
BiLogistic  0.018125  0.000281  n/a  0.008764  0.007522
The forecasting performance of the diffusion models (forecast window—2 years ahead) for the total number of OECD subscribers.
Considering Table
Residuals of the modified hGP model in forecasting (2-year window) the total number of OECD subscribers.
The forecasting performance of the modified hGP (forecast window: 2 years ahead) for the number of OECD mobile contract subscribers.
The performance of the modified hGP model in forecasting OECD mobile contract subscribers is depicted in Figure
The forecasting performance of the diffusion models is presented in Figure
Statistical indices in the forecasting process of the modified hGP and diffusion models for the number of OECD mobile contract subscribers.
Statistical indices of the modified hGP and diffusion models in forecasting the number of OECD mobile contract subscribers
Model name  MAPE  wSSE  MSE  RMSE  MAE
Modified hGP  0.004613  n/a  n/a  0.003572  0.002164
Logistic  0.028749  0.001562  0.00019  0.013771  0.011197
Gompertz I  0.02024  0.000547  n/a  0.008898  0.007826
Gompertz II  0.011585  0.000148  n/a  0.005422  0.004472
Bass  0.011497  0.000149  n/a  0.005415  0.004456
LogInLog  0.011592  0.00034  n/a  0.006392  0.004874
BiLogistic  0.008066  0.000116  n/a  0.004239  0.003334
The forecasting performance of the diffusion models (forecast window—2 years ahead) for the number of OECD mobile contract subscribers.
Residuals of the modified hGP model in forecasting (2-year window) the number of OECD mobile contract subscribers.
The forecasting performance of the modified hGP (forecast window: 2 years ahead) for the number of OECD mobile contract subscribers.
From Table
The performance of the modified hGP model in forecasting OECD mobile prepay subscribers is depicted in Figure
The forecasting performance of the diffusion models is presented in Figure
Statistical indices in the forecasting process of the modified hGP and diffusion models for the number of OECD mobile prepay subscribers.
Statistical indices of the modified hGP and diffusion models in forecasting the number of OECD mobile prepay subscribers
Model name  MAPE  wSSE  MSE  RMSE  MAE
Modified hGP model  0.493749  0.000725  0.000363  0.019041  0.013627
Logistic  0.615593  0.006608  0.001076  0.032806  0.029652
Gompertz I  0.361444  0.003908  0.000617  0.024838  0.022462
Gompertz II  0.239243  0.001791  0.00032  0.017901  0.015845
Bass  0.24657  0.001829  0.000323  0.01798  0.015852
LogInLog  0.066263  0.010377  0.000887  0.029782  0.014094
BiLogistic  0.030654  0.000938  n/a  0.008933  0.004127
The forecasting performance of the diffusion models (forecast window—2 years ahead) for the number of OECD mobile prepay subscribers.
Table
Residuals of the modified hGP model in forecasting (2-year window) the number of OECD mobile prepay subscribers.
Finally, the performance of the modified hGP model in forecasting OECD mobile 3G subscribers is depicted in Figure
The forecasting performance of the modified hGP (forecast window: 2 years ahead) for the number of OECD mobile 3G subscribers.
The forecasting performance of the diffusion models is presented in Figure
Statistical indices in the forecasting process of the modified hGP and diffusion models for the number of OECD mobile 3G subscribers.
Statistical indices of the modified hGP and diffusion models in forecasting the number of OECD mobile 3G subscribers
Model name  MAPE  wSSE  MSE  RMSE  MAE
Modified hGP model  17.10841  n/a  n/a  0.006422  0.005037
Logistic  17.11114  n/a  n/a  0.006423  0.005038
Gompertz I  13.05382  0.000887  0.000132  0.011469  0.008277
Gompertz II  4.674663  0.004189  0.000538  0.023185  0.012231
Bass  4.66249  0.004268  0.000548  0.023399  0.012315
LogInLog  17.11736  n/a  n/a  0.006425  0.005019
BiLogistic  1.760939  0.011841  0.002242  0.047353  0.041684
The forecasting performance of the diffusion models (forecast window—2 years ahead) for the number of OECD mobile 3G subscribers.
Table
Residuals of the modified hGP model in forecasting (2-year window) the number of OECD mobile 3G subscribers.
The forecasting results of the models generated by the modified hGP method are compared with those derived by the ARIMA method. As mentioned before, the statistical indicator wSSE has been used for the forecasting process.
ARIMA is an acronym for AutoRegressive Integrated Moving Average. The
In (
The
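For reference, the standard ARIMA(p, d, q) model can be written in backshift notation as below; since the equations in this text are truncated, the notation is ours and may not match the paper's. Here B is the backshift operator, d the number of differences, φ_i the autoregressive and θ_j the moving-average coefficients, c a constant, and ε_t white noise. Sign conventions for the θ_j differ between textbooks and software packages.

```latex
\Bigl(1-\sum_{i=1}^{p}\phi_i B^{i}\Bigr)(1-B)^{d}\,y_t
  = c+\Bigl(1+\sum_{j=1}^{q}\theta_j B^{j}\Bigr)\varepsilon_t,
\qquad B^{k}y_t = y_{t-k}
```

For example, with d = 1 the autoregressive and moving-average parts are fitted to the first differences y_t - y_{t-1}.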
The ARIMA models derived with Gretl (Gnu Regression, Econometrics and Time-series Library) for the aforementioned datasets are depicted below. The forecasting performance of the same modified hGP models and of the ARIMA models for the total, contract, prepay, and 3G OECD mobile subscribers is depicted in Figures
Statistical indices in the forecasting process of the modified hGP and ARIMA model for the total number of OECD subscribers.
Statistical indices of the modified hGP and ARIMA model in forecasting the total number of OECD mobile subscribers
Model name  MAPE  MSE  RMSE  MAE
Modified hGP  0.003932  n/a  0.002652  0.004845
ARIMA  0.053389  0.000591421  0.024319  0.066797
Statistical indices in the forecasting process of the modified hGP and ARIMA model for the total number of OECD contract mobile subscribers.
Statistical indices of the modified hGP and ARIMA model in forecasting the total number of OECD contract mobile subscribers
Model name  MAPE  MSE  RMSE  MAE
Modified hGP  0.001999  n/a  0.00063  0.001388
ARIMA  0.022104  n/a  0.008629  0.015574
Statistical indices in the forecasting process of the modified hGP and ARIMA model for the total number of OECD prepay mobile subscribers.
Statistical indices of the modified hGP and ARIMA model in forecasting the total number of OECD prepay mobile subscribers
Model name  MAPE  MSE  RMSE  MAE
Modified hGP  0.00302  n/a  0.001312  0.001628
ARIMA  0.078342  0.00114  0.033759  0.042227
Statistical indices in the forecasting process of the modified hGP and ARIMA model for the total number of OECD 3G mobile subscribers.
Statistical indices of the modified hGP and ARIMA model in forecasting the total number of OECD 3G mobile subscribers
Model name  MAPE  MSE  RMSE  MAE
Modified hGP  0.014951  n/a  0.002216  0.00381
ARIMA  0.151172  0.000407353  0.020183  0.041896
Comparison of the forecasting performance of the modified hGP with the ARIMA model (forecast period: 2 years ahead).
Comparison of the forecasting performance of the modified hGP with the ARIMA model (forecast window: 2 years ahead) for the number of OECD mobile contract subscribers.
Comparison of the forecasting performance of the modified hGP with the ARIMA model (forecast window: 2 years ahead) for the number of OECD mobile prepay subscribers.
Comparison of the forecasting performance of the modified hGP with the ARIMA model (forecast window: 2 years ahead) for the number of OECD 3G mobile subscribers.
Considering Table
From Table
Also, in Tables
Overall, the modified hGP achieves better statistical indices than the ARIMA model for the predicted data points.
The proposed method has been tested for stability and robustness. The program was executed 20 times on the same dataset of mobile subscribers. The mean gap between the best and worst solutions decreased as the number of generations increased, and the curve of the total average fitness value per generation also decreased. It should be noted that in Table
The modified hGP parameters for the testing process.
Initialization parameters of the modified hGP for the testing process

Number of runs (executions of the program)  20 
Maximum number of generations  300 
Evaluation function  wSSE 
Target upper limit of the precision for the solution candidates  0.1 
Testing the robustness of the modified hGP in forecasting.
The difference between the worst and the best solutions decreases; in particular, after the 25th generation, the indices above converge. The mean wSSE gap between the worst and best solutions starts at 0.004146752 and ends at 0.000165101.
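The sketch below computes the two convergence diagnostics described above from hypothetical per-run records. We assume runs[r][g] holds the wSSE values of the whole population at generation g of run r, and read "gap" as worst minus best within a run, averaged over the runs; other readings (for example, comparing the best solutions across runs) are possible.

```python
from statistics import mean
from typing import List, Sequence

Runs = Sequence[Sequence[Sequence[float]]]  # runs[r][g] = wSSE values of the population


def mean_worst_best_gap(runs: Runs) -> List[float]:
    """Per generation: (worst wSSE - best wSSE) within each run, averaged over runs."""
    n_gen = len(runs[0])
    return [mean(max(run[g]) - min(run[g]) for run in runs) for g in range(n_gen)]


def total_average_fitness(runs: Runs) -> List[float]:
    """Per generation: mean wSSE over every individual of every run."""
    n_gen = len(runs[0])
    return [mean(v for run in runs for v in run[g]) for g in range(n_gen)]


if __name__ == "__main__":
    toy = [  # 2 runs x 2 generations x 3 individuals, illustrative numbers only
        [[0.9, 0.5, 0.3], [0.4, 0.2, 0.1]],
        [[0.7, 0.6, 0.1], [0.3, 0.3, 0.1]],
    ]
    print(mean_worst_best_gap(toy), total_average_fitness(toy))
```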
Introducing GDPpC and CPI in addition to the time variable leads to causal forecasting models, and the method thus provides a scenario-based approach to forecasting. To study the future of mobile subscriptions in OECD countries, three scenarios are presented according to the growth of GDPpC and CPI.
The pessimistic scenario assumes a continuing crisis, in which the GDPpC and CPI growth rates do not increase. The second is a moderate growth scenario, and the last is the optimistic scenario, in which GDPpC and CPI increase.
A variety of models is generated by the modified hGP method. According to the Bayesian criterion as well as the wSSE criterion, two models that combine all the variables (GDPpC, CPI, and time) are chosen. Figures
The forecasting performance (5 years ahead) of the modified hGP models: the pessimistic scenario.
The forecasting performance (5 years ahead) of the modified hGP models: the moderate scenario.
The forecasting performance (5 years ahead) of the modified hGP models: the optimistic scenario.
In Table
Expressions and statistical indices of the two modified hGP models in forecasting the total number of OECD subscribers.
Modified hGP models in forecasting the total number of OECD mobile subscribers

Model name  Model’s expression  MAPE  wSSE  BIC 
First modified hGP model

0.004733 




Second modified hGP model

0.021657  0.000433 

The BIC depends on the number of parameters. Models generated with a single variable, such as time-dependent models, often have better BIC performance, but not always better forecasting performance.
In contrast, multivariable models with an adequate BIC yield adequate forecasting performance.
In the pessimistic scenario, the first model (the most likely) reaches 1.538472 billion OECD mobile subscribers in 2014, while the second reaches a total of 1.415613 billion subscribers. In this scenario, the GDPpC and CPI growth is kept unchanged.
In the moderate scenario (the most likely scenario), the first model reaches 1.8 billion OECD mobile subscribers in 2014, while the second reaches 1.58 billion. This scenario assumes an average GDPpC growth rate of 2.5% and an average CPI growth rate of 1.7%.
In the optimistic scenario, the first model reaches 1.948 billion OECD mobile subscribers in 2014, while the second reaches 1.686 billion. This scenario assumes an average GDPpC growth rate of about 4.4% and an average CPI growth rate of 2.2%.
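The scenario forecasts require future GDPpC and CPI inputs. A minimal sketch, assuming constant compound annual growth from the 2009 level: the moderate and optimistic rates are the averages quoted in the scenario descriptions, while reading the pessimistic case as zero growth, the normalized base level of 1.0, and the function name `project` are our assumptions. The projected covariates would then be fed into the selected models, whose expressions are not reproduced here.

```python
from typing import Dict, List, Tuple

# Annual (GDPpC, CPI) growth rates per scenario; pessimistic = zero growth is an assumption.
SCENARIOS: Dict[str, Tuple[float, float]] = {
    "pessimistic": (0.0, 0.0),
    "moderate": (0.025, 0.017),
    "optimistic": (0.044, 0.022),
}


def project(base: float, rate: float, years: int) -> List[float]:
    """Compound growth: value after k years = base * (1 + rate) ** k, for k = 1..years."""
    return [base * (1.0 + rate) ** k for k in range(1, years + 1)]


if __name__ == "__main__":
    gdp_2009, cpi_2009 = 1.0, 1.0  # hypothetical normalized 2009 levels
    for name, (g, c) in SCENARIOS.items():
        gdp, cpi = project(gdp_2009, g, 5), project(cpi_2009, c, 5)  # 2010 to 2014
        print(f"{name:12s} GDPpC 2014: {gdp[-1]:.4f}  CPI 2014: {cpi[-1]:.4f}")
```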
This paper is a modification of our previous work [
The proposed method differs from hGP in several respects. First, the set of diffusion models is extended with the BiLogistic and LogInLog models in addition to the Logistic, Gompertz, and Bass models, which improves the forecast horizon for long-term forecasting. Second, the function set of the method is extended by inserting new functions and function blocks; with this technique, chromosomes with complicated syntactic expressions can be represented by short expression strings. Third, tournament selection is used for the crossover and mutation operations to increase the algorithm's efficiency. Finally, a Bayesian-inspired criterion has been implemented which, in combination with wSSE, improves the final selection of the forecasting models.
In general, the method can be considered a forecasting tool that produces both time-dependent models and multivariable causal models for long-term forecasting. The method was compared with the ARIMA model and achieved satisfactory performance, and its robustness has been analyzed. In future work, the method will be applied to more datasets and compared with other prediction methods.
The authors declare that there is no conflict of interests regarding the publication of this paper.
The authors wish to express their acknowledgments to Professor Imed Kacem, University of Lorraine, France, for his constructive comments and suggestions, which helped to improve the quality of this paper.