Long-Term Exchange Rate Probability Density Forecasting Using Gaussian Kernel and Quantile Random Forest

In the midst of macro-economic uncertainties, accurate long-term exchange rate forecasting is crucial for decision-making and planning. To measure the uncertainty associated with the exchange rate and to obtain additional information about the future exchange rate, a hybrid model based on quantile regression forest and Gaussian kernel (GQRF) is constructed. A quarterly dataset of the KSh/USD exchange rate and macro-economic variables from 2007 to 2016 is used. The forecast horizon spans from 2013 to 2016. With a prediction interval coverage probability of 95% and a prediction interval normalized average width of 29.6493%, the constructed model has a very high coverage probability. The method used to determine the probabilistic forecasts is important for achieving forecasts with correct coverage. The probability density forecasting model for the exchange rate gave significant information: the probability distribution of the forecasted results. In this way, uncertainties around the forecast can be evaluated because the complete exchange rate distribution is forecasted. GQRF is efficient because it preserves the uncertainty about the variance linked to each point, which is important for exchange rate forecasting. Using the constructed model, probabilities of exceedance, such as the probability of the future exchange rate exceeding the average exchange rate for the year, can be computed. This paper also adds to the scarce literature on exchange rate probability density forecasting using machine learning techniques.


Introduction
Exchange rate forecasting is a challenging task and remains a vital research area for financial institutions, economists, foreign currency hedgers, speculators, traders, and all professionals in the foreign exchange (forex) market. Central banks, governments, and stakeholders in the forex market take exchange rate forecasts into account when making critical economic decisions. These decisions affect the future movements of a country's economy. The expected value of the exchange rate influences the cash flows of all foreign transactions. Hence, forecasting the movement of the exchange rate is beneficial to investors, trade, businesses, and policy makers. Most multinational corporations (MNCs) use exchange rate forecasts to decide whether to hedge future payables and receivables that are denominated in foreign currencies. MNCs also use exchange rate forecasts for capital budgeting decisions, earnings assessment, and short- and long-term financing decisions. However, there is significant doubt about the likelihood of accurate forecasts because of the uncertainty of the major macro-economic variables used in forecasting the exchange rate. This has resulted in exchange rate uncertainty in the forex market, with significant effects on the trade (international and domestic) sector and the domestic investments of most countries.
As indicated by [1], medium-term exchange rate uncertainty affects the trade flows of most industrial countries, with the exception of the United States of America. Reference [2] examined the effects of exchange rate uncertainty on the growth rate of firms' profits and concluded that an increase in exchange rate uncertainty results in greater variability in the growth rate of a firm's profit. Using Croatia and Cyprus as a case study, [3] analysed the effect of exchange rate uncertainty on sectoral exports. The authors indicated that there were significant negative effects of exchange rate uncertainty on exports. The relationship between Japanese firms' exposure to exchange rate uncertainty and their risk management practice was examined by [4]. It was concluded that firms with considerable dependency on sales in foreign markets have significant foreign exchange exposure that is determined by the market. Reference [5] studied the impact of exchange rate uncertainty on exports for selected sub-Saharan African countries (The Gambia, Ghana, Kenya, Madagascar, Mauritius, Mozambique, Nigeria, Sierra Leone, Tanzania, Uganda, and Zambia) from 2013 to 2014. Their study revealed a negative effect of exchange rate uncertainty on exports in the short term and a positive effect in the long term.
A separate strand of the literature has examined the effects of exchange rate fluctuations on domestic investment [6][7][8]. Exchange rate uncertainties have two contrasting impacts on the economy and on the overall future prospects of investment in a country [9]. That is, imports become comparatively inexpensive in contrast with exported goods. Consequently, the marginal profit of investing an extra unit of capital is certain, owing to the lower revenue generated by domestic firms operating locally and internationally. In their study, [6] found that exchange rate fluctuations indirectly affect domestic investment because of their effect on domestic and international trade. Reference [7] investigated the effects of exchange rate uncertainties on foreign direct investment in selected countries in sub-Saharan Africa. They observed a strong, significant negative effect of exchange rate uncertainty on foreign direct investment. That is, when investors are uncertain about the performance of a country's currency in the long term, they are inclined to move their investments out of that country to more stable markets. In examining the impact of exchange rate uncertainty on domestic investment for selected emerging markets and developing economies (EMDEs), [8] found a positive and significant impact of exchange rate uncertainty on domestic investment for these EMDEs. The above literature confirms the effects of exchange rate uncertainties on domestic investment, foreign direct investment, the trading activities and growth of industries, international and domestic trade, and the economy of a country as a whole. It is therefore imperative to forecast the exchange rate accurately in the midst of uncertainties to support long- and short-term decision making by key stakeholders of the economy and by investors.
Using quarterly exchange rate and other macro-economic data for Kenya as a case study, we propose a probability density forecasting model that captures the uncertainties associated with the exchange rate. The constructed model is a hybrid of quantile regression forest and Gaussian kernel (GQRF). Quantile regression forest has been used in the areas of health and agriculture. Reference [10] used quantile regression forest to predict the drug response of cancer patients and to evaluate the prediction reliability. Using out-of-bag validation, they asserted that their QRF model attained a higher prediction accuracy for the drug response. Recently, [11] applied a hybrid model of quantile regression forest and the Epanechnikov kernel function to capture weather uncertainties in crop yield forecasting. Their model performed well in predicting crop yields in the midst of extreme and uncertain weather conditions. The contributions made in this paper are as follows: (1) a reliable, efficient, and accurate probability density forecasting model using quantile regression forest and Gaussian kernel is proposed and implemented to capture the uncertainty of the exchange rate. The total conditional probability density curve of the exchange rate is constructed, and all the selected observed exchange rate values lie within the forecasted probability density curve. (2) Using the proposed model, probabilities of exceedance can be computed, such as the probability of the future exchange rate exceeding the average exchange rate for the year. To the best of our knowledge, the proposed probability density forecasting model is the first of its kind in the literature. The paper is structured as follows: Section 2 presents the theoretical concepts of the paper. An overview of quantile regression, random forest, and the proposed model (quantile regression forest and kernel density estimation) is presented in this section.
Section 3 provides the evaluation metrics used to assess the performance of the point and interval predictions. In Section 4, a case study using the exchange rate between the United States dollar (USD) and the Kenya Shilling (KSh) and selected macro-economic variables is presented. Data analysis and evaluation results are presented in this section. The conclusion is presented in Section 5.

Quantile Regression.
Conventional regression techniques infer the relationship between a real-valued response variable $Y$ and a predictor variable $X$ by finding an estimate $\mu(x)$ of the conditional mean $E(Y \mid X = x)$ of the response. In contrast to conventional regression techniques, quantile regression (QR) neither assumes a particular parametric distribution for the response variable nor assumes a constant variance for it. Quantile regression is a statistical approach for estimating and making inferences on conditional quantiles of the response variable [12]. The standard QR problem can be defined as in equation (1):

$$Q_{\alpha}(\alpha \mid X_i) = X_i^{\top} b, \tag{1}$$

where $X_i$ are the predictors/features, $b$ is a vector of parameters for quantile $\alpha$, and $Q_{\alpha}(\alpha \mid X_i) = \inf\{y : F(y \mid X_i) \ge \alpha\}$ is the conditional $\alpha$th quantile of the exchange rate distribution ($y_i$). In equation (1), the conditional distribution function $F(y \mid X = x)$ is the probability that $Y$ is smaller than or equal to $y \in \mathbb{R}$, that is,

$$F(y \mid X = x) = P(Y \le y \mid X = x). \tag{2}$$

The vector of parameters is estimated by minimizing the loss function for a particular $\alpha$th quantile,

$$\hat{b} = \arg\min_{b} \sum_{i=1}^{n} \left(\alpha - \mathbb{1}\{y_i < X_i^{\top} b\}\right)\left(y_i - X_i^{\top} b\right), \tag{3}$$

where $n$ is the total number of sample data points, $\mathbb{1}$ is the indicator function, and $X_i = [x_{1i}, \ldots, x_{ki}]$ are the independent variables.
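To make the quantile loss concrete, the following minimal Python sketch evaluates the check (pinball) loss $\rho_{\alpha}(u) = u\,(\alpha - \mathbb{1}\{u < 0\})$ with $u = y - q$, and searches for the best constant prediction. The function names and the exchange rate values are hypothetical and chosen only for illustration; this is not the estimation routine used in the paper.

```python
def pinball_loss(y_true, y_pred, alpha):
    """Mean check (pinball) loss at quantile level alpha, with 0 < alpha < 1."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        u = y - q
        # Under-predictions (u >= 0) cost alpha * u;
        # over-predictions (u < 0) cost (1 - alpha) * |u|.
        total += u * (alpha - (1.0 if u < 0 else 0.0))
    return total / len(y_true)


def best_constant_quantile(y, alpha):
    """Sample value that minimises the pinball loss among constant predictions."""
    return min(sorted(y), key=lambda c: pinball_loss(y, [c] * len(y), alpha))


# Made-up KSh/USD levels, for illustration only.
rates = [80.1, 82.4, 85.0, 86.3, 88.9, 90.2, 95.7, 101.2, 102.5]
median_fit = best_constant_quantile(rates, 0.5)
upper_fit = best_constant_quantile(rates, 0.9)
print(median_fit, upper_fit)
```

With $\alpha = 0.5$ the minimiser is the sample median, which is why the asymmetric weights are what steer the fit towards other quantiles.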

Random Forest.
Random forest (RF) is an example of a machine learning (ML) ensemble algorithm based on bagging (an abbreviation of "bootstrap aggregating"). In bagging, the learner comprises an ensemble of distinct base learners, and each base learner (tree) is trained on a bootstrap sample drawn from the full set of observations. That is, bagging draws P subsamples with replacement from the training data and trains the model on those subsamples. As a result, every tree in the forest is built on about 63% of the available training data, and the remaining data can be used for testing. Hence, RF has no need of an explicit test sample. The remaining data are called out-of-bag (OOB) data. The performance of each model on the OOB data, when averaged, gives an estimate of the accuracy of the bagged models. Predictions made using the OOB data are called OOB predictions. The principal advantage of bagging is that regularization is built in, so that it only needs an appropriate choice of parameters of the base algorithms. Bootstrapping techniques such as random forest are appropriate for small-sample datasets that are prone to overfitting [13]. Now assume that $T = \{(y_i, x_i)\}_{i=1}^{N}$ is the set of training data and that a technique is available for building a predictor $Q(x, T)$ from the given training data, where $y_i \in \mathbb{R}$ is the target variable and $x_i \in \mathbb{R}^d$ is the set of predictors. In bagging, a sequence of training datasets $T_{B,1}, T_{B,2}, \ldots, T_{B,P}$, each of the same size as $T$, is created by bootstrapping from $T$. $P$ predictors are built such that the $p$th predictor $Q(x, T_{B,p})$ depends on the $p$th bootstrap training dataset. When these predictors are aggregated, the resulting predictor can be significantly more accurate than a single predictor [14].
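The figure of about 63% follows from sampling with replacement: a given observation is missed by all $N$ draws with probability $(1 - 1/N)^N \approx e^{-1} \approx 0.37$, so roughly 63% of distinct observations end up in each bootstrap sample. The short Python sketch below checks this by simulation. The sample size of 40 and the helper names are assumptions made for illustration only.

```python
import random


def bootstrap_split(n, rng):
    """Draw n indices with replacement; return (in-bag draws, sorted OOB indices)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    oob = sorted(set(range(n)) - set(in_bag))
    return in_bag, oob


def mean_in_bag_fraction(n, n_trees, seed=0):
    """Average share of distinct observations seen by each bootstrap sample."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trees):
        _, oob = bootstrap_split(n, rng)
        total += 1.0 - len(oob) / n
    return total / n_trees


n = 40  # illustrative sample size (an assumption, not taken from the paper)
simulated = mean_in_bag_fraction(n, n_trees=2000)
expected = 1.0 - (1.0 - 1.0 / n) ** n
print(round(simulated, 3), round(expected, 3))
```

The OOB indices returned for each tree are exactly the observations on which that tree can be evaluated without a separate test set.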

Gaussian Quantile Regression Forest (GQRF)
2.3.1. Quantile Regression Forest. Quantile regression forest (QRF) is a consistent algorithm that provides a non-parametric and explicit technique for estimating conditional quantiles for high-dimensional predictor variables [15]. Quantile regression forest is a machine learning technique based on random forest and quantile regression. The prediction of a random forest can be viewed as a weighted mean of the observed response variables. As in random forest, trees are grown in quantile regression forests. The conditional distribution is then estimated from the weighted distribution of the observed response variable. The weights assigned to the observed responses are the same as in the random forest algorithm.
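Consider the conditional distribution of the response variable $Y$ given the predictor variable $X = x$ based on the tree $T(\vartheta)$, where $\vartheta$ is the random parameter that determines how the tree is grown. Now, assume that the leaf containing $x$ is represented as $\ell_n(x, \vartheta)$. It follows that the weight $\omega_i(x, \vartheta)$ is specified as

$$\omega_i(x, \vartheta) = \frac{\mathbb{1}\{X_i \in \ell_n(x, \vartheta)\}}{\#\{j : X_j \in \ell_n(x, \vartheta)\}}. \tag{4}$$

Let the $T$ trees of the random forest be represented by $\vartheta_1, \ldots, \vartheta_T$, and let $\omega_i(x)$ be the average of $\omega_i(x, \vartheta)$ over all trees,

$$\omega_i(x) = \frac{1}{T} \sum_{t=1}^{T} \omega_i(x, \vartheta_t). \tag{5}$$

It follows that

$$\hat{F}(y \mid X = x) = \sum_{i=1}^{n} \omega_i(x)\, \mathbb{1}\{Y_i \le y\}. \tag{6}$$

The estimate of the conditional quantile $Q_{\alpha}(x)$ for $0 < \alpha < 1$ can be computed as

$$\hat{Q}_{\alpha}(x) = \inf\{y : \hat{F}(y \mid X = x) \ge \alpha\}. \tag{7}$$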
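As a rough illustration of how the weights and the conditional quantile fit together, the Python sketch below assumes that the leaf membership of each training point in each tree is already known. Each tree gives equal weight to the training points that share a leaf with the query point, the weights are averaged over trees, and the quantile is the smallest response whose cumulative weight reaches $\alpha$. The leaf assignments, response values, and function names are hypothetical; a real forest would supply the leaf assignments.

```python
def qrf_weights(leaf_ids_train, leaf_ids_query):
    """Average per-tree leaf weights.

    leaf_ids_train[t][i] is the leaf of training point i in tree t;
    leaf_ids_query[t] is the leaf that the query point x falls into in tree t.
    """
    n_trees = len(leaf_ids_train)
    n = len(leaf_ids_train[0])
    weights = [0.0] * n
    for leaves, query_leaf in zip(leaf_ids_train, leaf_ids_query):
        members = [i for i, leaf in enumerate(leaves) if leaf == query_leaf]
        for i in members:
            # Equal share within the leaf, then averaged over trees.
            weights[i] += 1.0 / (len(members) * n_trees)
    return weights


def weighted_quantile(y, weights, alpha):
    """Smallest y whose cumulative weight reaches alpha (weighted empirical CDF)."""
    cumulative = 0.0
    for value, w in sorted(zip(y, weights)):
        cumulative += w
        if cumulative >= alpha - 1e-12:  # small tolerance for rounding
            return value
    return max(y)


# Toy example: five training responses and two trees (all values made up).
y_train = [80.0, 85.0, 90.0, 95.0, 100.0]
train_leaves = [[0, 0, 1, 1, 1], [0, 1, 1, 1, 2]]
query_leaves = [1, 1]
w = qrf_weights(train_leaves, query_leaves)
q10, q50, q90 = (weighted_quantile(y_train, w, a) for a in (0.1, 0.5, 0.9))
print(w, q10, q50, q90)
```

The sketch assumes every query leaf contains at least one training point, which holds for leaves of a fitted tree.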

Kernel Density Estimation.
The basic kernel density estimator can be written in a compact form as [16]:

$$\hat{f}_h(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - x_i) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right), \tag{8}$$

where $K_h(\cdot) = K(\cdot/h)/h$ is called the scaled kernel. The statistical analysis of kernel estimators is simpler than that of histograms. Normally,

$$K(x; \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{x^2}{2\sigma^2}\right), \tag{9}$$

where $\sigma > 0$ is the bandwidth or smoothing parameter, playing the role of $h$ in equation (8). A smaller bandwidth gives a much more wobbly density estimate, whereas a large bandwidth loses much more detail. Plug-in methods, cross-validation, and rules of thumb are all types of bandwidth selectors. Different kernels such as triangular, triweight, cosine, uniform, Gaussian, Epanechnikov, and others can be used for kernel density estimation. In this study, we use the Gaussian kernel for kernel density estimation because its bandwidth can be computed automatically by a rule of thumb [16]. To generate a total probability density, the density estimation is repeated for different values of $x$.

This paper constructs a hybrid model for probability density forecasting. First, for out-of-bag prediction, the keep.inbag parameter of the QRF model is set to "True" and the node size is set to 5. The obtained conditional quantiles are used as the inputs of kernel density estimation. The Gaussian kernel function is combined with the normal reference density ("nrd0") bandwidth selection method to implement the probability density forecasting. Figure 1 gives the realization flowchart of the constructed hybrid model for the probability density forecasting.
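The density step can be sketched in a few lines of Python. The sketch assumes that a set of predicted conditional quantiles is already available (the values below are made up) and uses a rule-of-thumb bandwidth of the form $0.9 \min(\mathrm{sd}, \mathrm{IQR}/1.34)\, n^{-1/5}$, which is our understanding of the rule behind the "nrd0" selector. The function names are hypothetical, and the result is not guaranteed to match R's `density()` output digit for digit.

```python
import math
import statistics


def bw_nrd0(x):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    sd = statistics.stdev(x)
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    spread = min(sd, (q3 - q1) / 1.34)
    if spread == 0:
        # Fallbacks for degenerate samples.
        spread = sd or abs(x[0]) or 1.0
    return 0.9 * spread * len(x) ** -0.2


def gaussian_kde(samples, h):
    """Return f(x) = 1/(n*h) * sum K((x - x_i)/h) with a standard normal K."""
    norm = len(samples) * h * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) / norm

    return density


# Made-up conditional quantiles for one forecast quarter (illustration only).
quantiles = [91.0, 93.5, 95.2, 96.8, 98.1, 99.4, 101.0, 103.7, 106.9]
h = bw_nrd0(quantiles)
density = gaussian_kde(quantiles, h)
print(round(h, 3), round(density(98.0), 4))
```

Evaluating `density` on a grid of exchange rate values gives the full forecast density curve for that quarter.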

Evaluation Metrics for Point Prediction Errors.
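Different evaluation metrics have been used to estimate the accuracy and efficiency of point forecasts. In this paper, we use the mean absolute percentage error (MAPE), the root mean square error (RMSE), and R-squared ($R^2$). The smaller the values of MAPE and RMSE, the better the prediction model. The higher the value of $R^2$, the better the goodness of fit. MAPE, RMSE, and $R^2$ are mathematically defined in equations (10)-(12) as follows:

$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - p_i}{y_i} \right|, \tag{10}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - p_i)^2}, \tag{11}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - p_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \tag{12}$$

where $i$ indexes the quarterly exchange rate observations, $n$ is the number of quarterly observations over the forecast period, $y_i$ and $p_i$ are respectively the $i$th actual and predicted exchange rate, and $\bar{y}$ is the mean of the actual values.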
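For reference, the three point-forecast metrics can be computed as in the Python sketch below, using their standard definitions with MAPE expressed as a fraction rather than a percentage. The function names and the three data points are hypothetical and serve only as an illustration.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, expressed as a fraction."""
    return sum(abs(y - p) / abs(y) for y, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


# Made-up actual and predicted exchange rates.
actual = [100.0, 110.0, 120.0]
predicted = [102.0, 108.0, 123.0]
print(mape(actual, predicted), rmse(actual, predicted), r_squared(actual, predicted))
```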

Evaluation Metrics for Prediction Intervals.
Different metrics can be used to estimate the efficiency of prediction intervals. Prediction interval coverage probability and prediction interval normalized average width are the metrics used to evaluate the reliability of prediction intervals in this paper. Prediction interval coverage probability (PICP) is defined as in equation (13):

$$\mathrm{PICP} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}_i, \tag{13}$$

where $\mathbb{1}_i$ is a Boolean variable determined as

$$\mathbb{1}_i = \begin{cases} 1, & y_i \in [L_i, U_i], \\ 0, & \text{otherwise}, \end{cases} \tag{14}$$

$L_i$ and $U_i$ are the lower and upper bounds of the $i$th prediction interval, and $n$ is the total number of forecasts. The higher the PICP, the better. That is, a larger PICP means that a greater number of the actual values are covered by the forecasted prediction interval. However, the value of PICP can be increased simply by increasing the width of the prediction interval, so the widths of prediction intervals should also be measured to ensure their quality. A narrow prediction interval is more informative than a wider one. Prediction interval normalized average width (PINAW) is defined as follows:

$$\mathrm{PINAW} = \frac{1}{n \left(\max(y_i) - \min(y_i)\right)} \sum_{i=1}^{n} (U_i - L_i), \tag{15}$$

where $\min(y_i)$ and $\max(y_i)$ are the minimum and maximum of the actual/target values.
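A minimal Python sketch of the two interval metrics is given below, taking PICP as the share of actual values inside their interval and PINAW as the mean interval width divided by the range of the actual values. The interval bounds (here called `lower` and `upper`) and all numbers are hypothetical.

```python
def picp(actual, lower, upper):
    """Share of actual values that fall inside their prediction interval."""
    covered = sum(1 for y, lo, hi in zip(actual, lower, upper) if lo <= y <= hi)
    return covered / len(actual)


def pinaw(actual, lower, upper):
    """Mean interval width, normalised by the range of the actual values."""
    value_range = max(actual) - min(actual)
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(actual)
    return mean_width / value_range


# Made-up actual values and interval bounds.
actual = [80.0, 85.0, 90.0, 100.0]
lower = [78.0, 86.0, 85.0, 95.0]
upper = [84.0, 90.0, 95.0, 104.0]
print(picp(actual, lower, upper), pinaw(actual, lower, upper))
```

Reporting both numbers together guards against a high coverage that is obtained only by making the intervals very wide.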

Data Selection and Description.
The performance of the exchange rate is driven by different economic factors such as stock price returns, consumer price index, interest rate, money supply, foreign direct investment, international and domestic trade, and real gross domestic product. There are complex interactions between these factors, hence it is very difficult to predict the exchange rate accurately. Following the work of [17], we selected macro-economic variables (features) that cover several dimensions of interest: consumer price index, interest rate, money supply, foreign direct investment, and real gross domestic product. Table 1 presents the definition of the variables used in this study and the function of each variable in the proposed model. Quarterly secondary data on money supply, foreign exchange rates, and weighted lending interest rates were sourced from the Kenyan Central Bank. Quarterly secondary data on foreign direct investment, consumer price index (inflation), and gross domestic product were taken from the Kenyan National Bureau of Statistics. Summary statistics for the sampled quarterly period from 2007 to 2016 are given in Table 2. From the table, the averages of the foreign exchange rate, CPI, lending rate by commercial banks, money supply, foreign direct investments for the sample period under study were 83.81, 125.47, 15.79, 824287.6, 1513157, and 525046.5 respectively. From the nonparametric locally estimated scatterplot smoothing (loess) curve (dark grey line in the plots) fitted to the data in Figure 2, there is an upward trend in all the variables over the period of interest.
The Anderson-Darling and Shapiro-Wilk tests are used to test the normality of the residuals of the exchange rate data. From Table 3, the p value of the EX residuals is less than the significance level of 0.05. As a result, we can reject the null hypothesis of normality and conclude that the distribution of the residuals for EX is significantly different from the normal distribution. The departure from normality of EX is visually presented in Figures 3(a) and 3(b), where the density plot and box plot are shown, and in the normal quantile-quantile (QQ) plot (Figure 4). For instance, there are outliers in the boxplot and the residuals are skewed. Using the Non-Constant Error Variance (NCV) and Breusch-Pagan tests for homogeneity of the residuals, the p value was found to be less than the significance level of 0.05 in both tests. We can therefore reject the null hypothesis of constant error variance (homogeneity) and conclude that the residuals are heteroscedastic (non-constant variance). From the above, it is clear that some major assumptions of the ordinary least squares (OLS) regression model are violated. We can therefore employ a hybrid model using quantile regression and random forest to reveal the heterogeneous effect of the features on the target variable (exchange rate) at different quantiles.

Data Normalization.
The numeric values of the datasets are changed to a common scale, without distorting the differences in the range of the values, using min-max normalization. The normalized values lie within the range [0, 1] and are computed using equation (16). The normalized values are changed back to the magnitude of the actual/observed data using the antinormalization equation (see equation (17)).

$$x_{\mathrm{norm}} = \frac{x - \min(x)}{\max(x) - \min(x)}, \tag{16}$$

$$x = x_{\mathrm{norm}} \left(\max(x) - \min(x)\right) + \min(x), \tag{17}$$

where $x_{\mathrm{norm}}$ is the normalized input value, and $\min(x)$ and $\max(x)$ are respectively the minimum and maximum value of the inputs. R statistical software was used in implementing all the forecasting approaches.
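The normalization and its inverse can be sketched in Python as follows. The helper names are hypothetical, the input values are made up, and the guard for a constant series is an addition of this sketch that the text does not discuss.

```python
def min_max_normalize(x):
    """Scale values to [0, 1]; also return (min, max) for the inverse step."""
    lo, hi = min(x), max(x)
    if hi == lo:
        raise ValueError("min-max normalization is undefined for a constant series")
    return [(v - lo) / (hi - lo) for v in x], (lo, hi)


def min_max_denormalize(x_norm, bounds):
    """Map normalized values back to the original scale."""
    lo, hi = bounds
    return [v * (hi - lo) + lo for v in x_norm]


# Made-up exchange rate levels.
raw = [70.0, 85.0, 100.0]
scaled, bounds = min_max_normalize(raw)
restored = min_max_denormalize(scaled, bounds)
print(scaled, restored)
```

Keeping the minimum and maximum from the training data is what allows forecasts made on the normalized scale to be mapped back to exchange rate units.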

Experimental Results and Discussion.
The efficiency of the proposed Gaussian kernel quantile regression forest is assessed in this section using the quarterly data described in Section 4.1. The probability density of Kenya's exchange rate is predicted given the values of the selected macro-economic indicators from a scenario setting. Figure 5 presents the visual prediction results for the out-of-bag (OOB) prediction. The OOB prediction values were obtained through the implementation of bootstrapping. The "quantregForest" package [18] of the R statistical software was used. For optimal performance of the quantile regression forest in this paper, we chose 1100 trees (T = 1100). Figure 6 shows the mean square error and the R-squared plot, which indicates that the optimal number of trees was 200. The number of features selected from the entire set of features for each node of the regression tree was 2.
The out-of-bag prediction and node size parameters were set to "True" and 5 respectively. Evaluation metrics for the point forecasts using the mean and median are presented in Table 4. The MAPE value for the median prediction (0.0217) is smaller than the MAPE value of the mean prediction (0.0265). The RMSE value of the mean prediction is 3.119, compared with 2.7146 for the median prediction. With an R-squared value of 0.9428, the point forecast of the exchange rate using the median fits the data relatively better than the mean. From Table 4, it can be concluded that the point forecast of the exchange rate using the median is better than the point forecast using the mean. Generally, the fit of the model to the data is very high in all cases. This is evident in the R-squared values of 0.9248 and 0.9428 recorded by the point forecasts of the mean and median, respectively. Figure 7 shows the quarterly exchange rate for the sample period under study. For each quarterly period, the red dots represent the observed quarterly exchange rate. The black line represents the forecasted exchange rate at the 99.5th percentile. This is the line above which 0.5% of the forecasted exchange rates would fall and below which 99.5% of all forecasted exchange rates are expected to fall. The OOB predictions and prediction intervals based on the observed exchange rate show that the median prediction always falls within the prediction intervals. In addition, 95% of the observed exchange rate values fall within the prediction interval. This shows that QRF can accurately forecast the exchange rate. The main insight gained from visualizing the prediction intervals in Figure 8 concerns the reliability of the predictions, since the length of the prediction intervals varies strongly across observations. For the majority of the observations, the prediction is more reliable than for the others (only two observations).
Table 5 (see appendix) presents the out-of-bag forecasting results, the relative percentage error, and the forecasting error of GQRF using the quarterly exchange rate data. To evaluate the performance of the proposed Gaussian quantile regression forest, we computed the values of PICP and PINAW. Table 6 presents the values of PICP and PINAW. PICP recorded a high value of 95% and PINAW a value of 29.6493%. This indicates that the constructed prediction intervals capture the actual exchange rate values very well. The small value of PINAW also supports the efficiency of the constructed prediction intervals. For the purpose of illustration, the probability density curves for four years are presented in Figure 9. The probability density curves give the full probability distribution of the future exchange rate. For each density plot, the average of the quarterly observed data for the year is marked with a red line in the plot. The average of the quarterly exchange rate always falls within the probability density curve. It can therefore be concluded that the proposed GQRF model can accurately depict exchange rate uncertainty. The forecasted quarterly exchange rate densities (see Figure 9) can be used to compute probabilities of exceedance, for instance, the probability that the exchange rate will be more than the average exchange rate of 98.6975 KSh/USD for the year 2015. This exceedance probability is the area under the density curve to the right of the chosen threshold, and it is calculated by integrating the density function in Figure 9 over that region. The probability of the exchange rate exceeding the observed average exchange rate of 98.6975 KSh/USD in 2015 is 0.2899. The probability of the exchange rate exceeding 101.5 KSh/USD is 0.1584 in 2015 and 0.7530 in 2016. This means that the exchange rate is less likely to exceed 101.5 KSh/USD in 2015 than in 2016.
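For a Gaussian kernel density estimate, the exceedance probability has a closed form: with samples $x_i$ and bandwidth $h$, $P(X > t) = \frac{1}{n}\sum_{i} \left[1 - \Phi\!\left((t - x_i)/h\right)\right]$, where $\Phi$ is the standard normal distribution function. The Python sketch below uses this identity. The samples, bandwidth, and thresholds are made up, so the output does not reproduce the probabilities reported above.

```python
import math


def exceedance_probability(samples, h, threshold):
    """P(X > threshold) under a Gaussian KDE with bandwidth h built on samples."""
    tail = 0.0
    for s in samples:
        z = (threshold - s) / h
        # 1 - Phi(z), written with the complementary error function.
        tail += 0.5 * math.erfc(z / math.sqrt(2.0))
    return tail / len(samples)


# Made-up forecast quantiles and bandwidth (illustration only).
samples = [95.0, 100.0, 105.0]
h = 2.0
p_at_centre = exceedance_probability(samples, h, 100.0)
p_high = exceedance_probability(samples, h, 101.5)
print(round(p_at_centre, 4), round(p_high, 4))
```

Because the density is a mixture of normal components, no numerical integration is needed, and the probability decreases monotonically as the threshold rises.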

Conclusion
Considering a long-term exchange rate forecasting problem, this study proposes a novel forecasting method that can draw the total conditional probability density curve of the future exchange rate. Quantile regression forest is used to build a non-linear quantile regression probabilistic forecasting model. A direct plug-in bandwidth selector and a Gaussian kernel are used to draw the probability density curve. The significance of the proposed Gaussian quantile regression forest technique is its ability to capture the uncertainty of the future exchange rate using probability density forecasting functions. Using a case study of the exchange rate between the United States Dollar (USD) and the Kenya Shilling (KSh), we were able to show that the proposed model performs well. The values of PICP and PINAW revealed the high performance and accuracy of the prediction intervals built by the Gaussian quantile regression forest. The probability of exceedance for a given exchange rate value can also be computed using the proposed model.

Data Availability
Data for this work are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.