Research Article Tourism Growth Prediction Based on Deep Learning Approach

Conventional tourism demand prediction models currently face several challenges due to the excessive number of search intensity indices that are used as indicators of tourism demand. This work presents a framework for deep learning-based monthly prediction of the volume of Macau tourist arrivals. The main objective of this study is to predict tourism growth using a deep learning algorithm that extracts new features. The results showed that the adopted deep learning framework performed better than the artificial neural network and support vector regression models. Practitioners can rely on the relevant features identified by the developed framework to understand the nature of the relationships between the predictive factors of tourist demand and the actual volume of tourist arrivals.


Introduction
Most countries depend on the tourism sector for economic growth, as it creates jobs and contributed about 10.4% of the global gross domestic product (GDP) as of 2019 [1]. Prediction of tourism demand (TD) is one of the critical activities in the tourism sector because precise forecasts are needed for operational decisions on pricing, resources, staffing, revenue, and capacity management [2,3]. Precise TD prediction is also required by governments to facilitate planning of destination infrastructure, operational flexibility, and environmental quality control [4]. Both quantitative and qualitative techniques are used in TD forecasting. The qualitative techniques depend on intuition, experience, and understanding of a specific destination market and exhibit poor adaptability [5]. The quantitative techniques are used mainly in the prediction of tourist arrivals based on historical records and other determinants of tourism volume [6].
Most of the existing studies in TD prediction are based on the quantitative approach; they normally fit a model to training data consisting of historical tourist arrival volumes (TAVs) and other TD predictors [7,8]. Advances in Web technology have made search engines an essential tool for tourists planning their trips, especially for getting relevant information on their areas of interest. Search intensity indices (SIIs) have been recognized as a potential TD indicator in the destination market [9,10] and have been examined by many researchers for TD prediction [11]. SII data are important for accurate TD prediction, even though some practitioners have reported challenges in using them with the conventional prediction models. Two common practical barriers exist. (i) The first issue is related to feature selection. Rosselló-Nadal and He [12] stated that many factors are considered potential TD predictors, such as the exchange rate, travel cost, tourism prices, and several SII data. The increasing number of potentially influential predictors makes the training data sparse in the feature space, meaning that not enough data are available for building accurate models.
According to Wang et al. [13], training most prediction models on data with numerous explanatory factors is a tedious task; hence, feature engineering is an essential step in building a prediction model, as it provides the set of relevant features that will improve the performance of the developed model [14]. Although some factors have established meanings, numerous potential keywords may be of interest to end users in terms of tourism marketing. Feature selection for TD prediction is currently based mainly on existing knowledge of the destination tourism market, and selecting effective features demands significant human effort [15]. (ii) The second issue is related to lag order selection. Even though numerous TD prediction methods have adopted SII data, few studies focus on determining the lag relationship between the time series. Some of the current studies investigated the unpredictability hypothesis using either the Granger causality test or Pearson correlation coefficients [16]; in these studies, the hypothesis is investigated by examining the extent of the relationship between the lagged values of a factor and the volume of tourist arrivals.
However, the reliability of both the Granger causality test and Pearson correlation coefficients is still in doubt in the face of nonlinear underlying relationships [17]. Therefore, the effective selection of the potential data relationships will enhance the development of highly accurate prediction models. Time-series and AI models can perform well in prediction tasks, but their feature selection depends on existing knowledge of the destination market. The performance of the existing prediction methods is not satisfactory in every destination market due to the impact of numerous real-world situations, especially when adopting numerous SII data as TD indicators, which may demand significant domain expertise to limit ambiguity. In addition, each SII variable has a different number of effective lags; the complexity increases in the presence of related problems, such as platform and language bias [18]. The emergence of deep learning techniques offers a solution to most of the above barriers, as they provide the means for achieving accurate TD prediction [19]. Deep learning is an extension of ANN techniques with multiple processing layers for modelling nonlinear relationships; it has influenced many methods owing to its built-in feature selection capability. The deep learning network is also relevant in time-series analysis owing to its flexibility and its ability to capture nonlinear relationships. Long-term dependencies can be specifically handled and learnt by the recurrent neural network (RNN), long short-term memory (LSTM), and attention mechanism models. Hence, deep learning is considered an alternative solution to TD prediction. This work proposes a deep learning technique for TD prediction that simultaneously addresses the practical issues mentioned earlier.
The remainder of this manuscript is structured as follows. The related work section reviews the existing methods. The deep learning section presents its properties, followed by the difference between deep learning and the conventional neural network, and then the proposed method and its details. The results and discussion section presents the analysis and the results, and the conclusion section summarizes the research.

Related Work
The need for accurate TD prediction is enormous as it provides the necessary tourism-related information to researchers and practitioners; such information aids decisions on activities such as resource allocation and the identification of risks and opportunities. This section presents a review of the existing literature on TD prediction; the deep learning technique is also reviewed as it is the basis for this work.

Tourism Demand Prediction.
The existing TD prediction-related studies can be classified into two approaches: qualitative and quantitative. The qualitative approach mainly depends on experience and knowledge of a specific domain; hence, such methods are sometimes considered "artistic," and they exhibit low generalization potential [20]. The quantitative approach has been the method of choice for estimating the quantitative relationships among the many observed data in tourism. Quantitative methods develop models from historical data to predict potential tourist arrival volumes. The performance of the quantitative approach can be improved in two main ways: the first is to introduce more related parameters that can motivate tourism-related travel, and the second is to adopt more complex models that can accurately generalize future trends. The construction of TD prediction models mainly depends on input factors that are highly related to TD and have no missing values. The available TD prediction factors can be grouped in different ways using different criteria. They can be grouped into determinants and indicators based on the nature of their influence on TD. Determinants are the major prediction parameters. Traditional economic theories, such as utility theory and the theory of consumption behaviour, hold that TD is influenced by both qualitative and quantitative economic factors, but most TD prediction models do not consider the qualitative economic factors because they are difficult to quantify. Such models rely mainly on the quantitative economic factors because these can be measured and used as features in prediction methods. Considering the nature of TD, the consideration of only economic factors is not enough. Some works have previously focused on the impact of noneconomic determinants on travel motivation, as well as the impact of travel motivation on destination choice.
Kumar and Kumar [21], for instance, introduced qualitative noneconomic determinants such as the climate index, special events, and the leisure time index. These determinants have been classified into pull, push, and resistance factors according to their relationship with the specific domain; the pull factors are attributes of the destination tourism market only [22], while the push factors have more influence on the source market [23]. The resistance factors, on the other hand, comprise factors restricting travel from the source to the destination market, such as relative prices and perceived corruption [24].
The determinants from economic theory mainly influence TD. The TD prediction accuracy can be improved by introducing into the prediction model some leading parameters that are seen as secondary factors [25]. The advancement of Web technology has made it possible for most tourists to search for important information before embarking on their trips; such information may relate to destination selection, hotel reservations, flight bookings, and activity planning. The attention of tourists is reflected in the SII data; hence, they are effective TD indicators that need to be incorporated in TD prediction models. The study in [26] analysed the relationship between tourists' search words on US cities and the level of attractiveness of those cities. From the analysis, SII data were found to be important indicators of tourism scale in the destination market. Furthermore, Hilal et al. [27] predicted the TD of Hong Kong from 9 source countries using the Google Trends Index; the study attested to the importance of SII data for such tasks.
Yang et al. [28] stated that SII data portray tourists' preferences and offer high-frequency data that reflect timely changes in people's choices. The relevance of SII data to the prediction of hotel occupancy was also presented by Li et al. [29]. These attributes make SII-based models better than univariate time-series models, as they can handle the problems associated with abrupt variations in econometric patterns [30].
Search engines are selected mainly on the basis of their popularity in the targeted tourism market. Google and Baidu provide the SII data commonly used in the existing literature [31,32]. However, the Baidu Index provides daily search volumes, while Google Trends provides a normalized index every week or month. Several prediction models are available in the field of TD prediction. In [8], the authors suggested that these methods be classified into time-series, AI, and econometric models, but TD prediction is usually done using time-series and econometric models [33].
Most of the common prediction methods are extensions of the autoregressive moving average (ARMA) model [12], while the complex ones, such as the Bayesian model, Markov-switching model, and generalized dynamic factor model, are developed for better performance [22,24]. The existing techniques in this group rely on historical time-series patterns to determine the relationships between TD predictors and the tourist arrival volumes. When building a predictive model, the main task is to introduce the best set of features that will reduce the prediction errors as measured by performance metrics such as the mean absolute error (MAE), root mean squared error (RMSE), or mean absolute percentage error (MAPE). The AI models are the soft computing and machine learning methods used in TD prediction. The study by Law and Au [34] built on multivariate regression analyses to develop neural network models for identifying nonlinear relationships. Furthermore, Zhang et al. [16] improved the comprehensibility of TD prediction models using the rough set approach. According to Chakrabarty [35], evolutionary computing methods can readily be used to predict the monthly arrival of tourists at the Balearic Islands of Spain. Machine learning methods, having proved effective and efficient, were used to evaluate the distribution of people during tourism [34]. Hybrid models that integrate different models have recently been shown to provide better results [20,36]. However, the "No Free Lunch" concept holds that no single technique performs well in all scenarios, as every method has limitations in certain scenarios [37]. Econometric and time-series models normally depend on a stable economic structure and historical patterns, while AI models depend on the size and quality of the available training data.

Deep Learning (DL).
Successful prediction of TD using AI models, such as SVR and neural networks, has been reported. The study by Chen et al. [38] succeeded in training deep network models through greedy layer-wise pretraining for a wide range of practical applications. Since its development, deep learning has found numerous practical applications, ranging from pattern recognition to natural language processing and image recognition [39]. It has also been used in predicting sequential data problems [40].
This section discusses two common deep network techniques that have shown great efficiency in time-series prediction. The reviewed deep learning techniques are the RNN and LSTM, together with the attention mechanism. The RNN is a popular deep network [41] that processes data elements through selective passage of information across time steps. This attribute makes it suitable for TD prediction, because the structure of time-series data provides important context information. As presented in Figure 1, the input x and the output y of the RNN can each be a single data point, but here both are time-series data. The memory of the RNN is preserved in the hidden-layer neuron, which captures all the information that has been previously processed. The neuron output is generated from the previous hidden state and the current input via a feedback loop mechanism. The RNN can establish relations across a range of sequence elements; it has also been found efficient on sequential data that are not time series [42].
LSTM was developed as an extension of the RNN; it was built with not just a recurrent learning unit but also several gates for capturing the longer-term and shorter-term states from the earliest and most recent units, respectively.
This feature has enabled the use of LSTM in solving time-series prediction problems. The attention mechanism is a feature selection technique that works alongside other deep learning (DL) models. The model can assign different weights to different inputs to learn their relevance, without the need to determine it prior to model fitting.
The incorporation of the attention mechanism into the LSTM makes it ideal for TD prediction because it offers a solution to both the prediction and the feature selection problems. Figure 2 shows an example of the LSTM structure.
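The feedback loop of the RNN described above can be sketched as a minimal scalar recurrence. This is an illustrative sketch only and not the authors' implementation: the function name `rnn_forward`, the tanh activation, and the weight values are assumptions chosen for the example.

```python
import math

def rnn_forward(xs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a scalar RNN over a sequence and return all hidden states.

    The weights are hypothetical; a trained network would learn them.
    """
    h = 0.0  # initial hidden state, i.e. the "memory" of the network
    states = []
    for x in xs:
        # The new state depends on the current input and the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# A single pulse at the first step still influences the later states.
states = rnn_forward([1.0, 0.0, 0.0])
```

Although only the first input is nonzero, the later hidden states remain positive and decay gradually, which is the memory effect that the feedback loop provides.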

Rationale of This Work.
Much progress has been made in the prediction of TD, but these developments are yet to be reflected in the performance of feature selection processes for TD prediction. Although the performance of TD prediction models depends on the features selected as the training data, TD prediction still faces two practical limitations. The first is associated with feature selection. For secondary TD prediction factors, feature selection mainly means query selection, with the aim of collecting search engine keywords that are related to tourism. However, the long tail in SII data means that there are numerous search queries with small search intensity, reflecting unique travel experiences [22]. Some irrelevant and redundant features can be removed using common sense to arrive at an ideal set of related subqueries, but this demands much human effort. The second limitation is related to lag order selection. The lag order determines the nature of the relationship between time-series data. Hence, lag order selection is an important preliminary step in TD prediction; it is done using methods such as the Granger causality test [27].
Often, such tests do not capture nonlinear relationships or explain the underlying confounding effects [29]. A wrong lag order selection will invalidate the subsequent steps of building a prediction model.
Deep learning is an aspect of artificial intelligence that is considered a potential alternative to the existing TD prediction models owing to two unique properties: (i) the ability to naturally learn highly nonlinear correlations and (ii) the ability to automatically select appropriate features at different network layers through its built-in feature selection mechanism. Moreover, the lag selection effort can be reduced by exploiting the temporally local correlation between TD and its predictors; this can carefully select the best features from the raw input data. These properties make deep learning (DL) a possible solution to the over-reliance on domain expertise. Hence, the aim of the proposed work is to develop a deep learning model for TD prediction, using the deep network framework for automatic extraction of the relevant features with suitable lag orders from various potential features.
In the proposed method, deep learning uses the extracted features and is structured as a neural network with a variable number of nodes and layers. The selected features determine the number of layers and nodes and control the system accordingly.
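The conventional lag order selection discussed above can be illustrated by scanning the Pearson correlation between lagged values of a factor and the arrival series. The sketch below uses synthetic data and hypothetical names (`lagged_correlations`, `search_index`); it shows only the linear, correlation-based procedure, which is exactly the kind of test that may miss nonlinear relationships.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def lagged_correlations(factor, arrivals, max_lag):
    """Correlate factor[t - lag] with arrivals[t] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        x = factor[: len(factor) - lag]  # factor values shifted back by `lag`
        y = arrivals[lag:]               # arrivals aligned with those values
        result[lag] = pearson(x, y)
    return result

# Synthetic data: arrivals follow the search index with a 2-step delay.
search_index = [3, 7, 4, 9, 2, 8, 5, 1, 6, 10, 4, 7]
arrivals = [0, 0] + [2 * v + 1 for v in search_index[:-2]]
corr = lagged_correlations(search_index, arrivals, max_lag=4)
best_lag = max(corr, key=lambda k: abs(corr[k]))
```

Here the scan recovers the delay of two steps because the synthetic relationship is linear; with a nonlinear relationship the same scan could rank the lags incorrectly.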

Deep Learning vs. Neural Network.
A neural network feeds the information to be processed into the input layer and then into the hidden layers, which are limited in number. The number of nodes is also fixed in a conventional neural network. In deep learning, the number of hidden layers responsible for the processing is variable. What distinguishes deep learning is the variation in the number of hidden layers as well as in the number of nodes in each layer. In the proposed method, deep learning also contains feedback between the hidden layers and the output layer.

Methodology
A deep learning-based conceptual framework for TD prediction is proposed in this study; the following subsections provide the details of the adopted deep learning framework for addressing the abovementioned problems. The proposed deep learning-based model for TD prediction was developed according to the following steps. (1) The first step is the identification of the search engine platforms. While planning their travels, tourists normally use search engines to get information about their potential destination, and different source markets are expected to use different search engines. This is why the search engine platforms must be identified first. Google, for instance, is the commonest search engine in most of the English-speaking regions of the world, while Baidu and Yandex are common in China and Russia, respectively [29]. (2) The second step is the data collection phase. The monthly tourist arrival volume needs to be obtained from reliable sources. Most TD prediction variables can be sourced from various resources, depending on the data that are available. The collection of other factors, such as SII data, involves the following substeps: (i) identification of the initial search terms for the potential destination market and expansion of the set of potential keywords that reflect the points of interest of tourists in the source market, using Google Trends related queries. Although a feature selection step is not required in the deep learning framework applied in the subsequent step, it is still useful to remove most of the irrelevant factors that relate only loosely to the volume of tourist arrivals; the remaining selection is done automatically by the deep learning framework.
Given the limitation of the linear Pearson correlation coefficient (PCC), factors with minor associations can be prefiltered using the maximal information coefficient (MIC). The MIC is based on the concept that if two features are related, a grid can be drawn on their scatter plot that partitions the data so as to capture the relationship. The quantity
$$\mathrm{MIC} - \rho^2, \tag{1}$$
where $\rho$ represents the PCC, can be used as a natural nonlinearity measure. For a high value of MIC, a large value of equation (1) indicates a nonlinear relationship.
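The contrast between the PCC and a grid-based dependence measure can be sketched as follows. The function `grid_dependence` is a hypothetical, simplified stand-in for the MIC: it computes normalised mutual information on a single fixed equal-width grid, whereas the actual MIC searches over many grid resolutions. The data are synthetic.

```python
import math
from collections import Counter

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def grid_dependence(xs, ys, bins=4):
    """Normalised mutual information on one fixed grid (not the full MIC)."""
    def to_bins(vals):
        lo, hi = min(vals), max(vals)
        # Clamp so that the maximum value falls into the last bin.
        return [min(bins - 1, int((v - lo) / (hi - lo) * bins)) for v in vals]

    bx, by = to_bins(xs), to_bins(ys)
    n = len(xs)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    mi = sum(c / n * math.log(c * n / (px[i] * py[j]))
             for (i, j), c in pxy.items())
    return mi / math.log(bins)

# A purely nonlinear relationship: y = x^2 on a symmetric range.
xs = [i / 100 for i in range(-100, 101)]
ys = [x * x for x in xs]
rho = pearson(xs, ys)
score = grid_dependence(xs, ys)
nonlinearity = score - rho ** 2  # analogue of the MIC minus squared PCC
```

For this quadratic relationship the PCC is close to zero while the grid-based score is clearly positive, so a PCC-only prefilter would discard a factor that the grid-based measure retains.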

Deep Network Structure.
The proposed deep network architecture and its articulation with historical time-series TD data are presented in this section. The integration of the attention mechanism into the LSTM network is also detailed; the attention mechanism gives the degree of interest in the various factors. Tourism demand prediction is mainly aimed at predicting the tourist arrival volume based on historical multivariate factors. The input is formally represented as the fully observed set of feature vectors
$$X = (x_1, x_2, \ldots, x_T). \tag{2}$$
The corresponding tourist arrival volume is represented as
$$Y = (y_1, y_2, \ldots, y_T), \tag{3}$$
where $T$ represents the total number of time steps, say the number of weeks or months in the gathered database, $y_t$ represents the tourist arrival volume (TAV) at time step $t$, and $x_t$ is the vector of multivariate factors at time step $t$. The prediction of TD uses the time series of multivariate parameters $(x_l)_{l=1}^{L}$ and the real TAV $(y_l)_{l=1}^{L}$ as model inputs for the construction of a model $F$ that predicts $y$ at future time steps:
$$(\hat{y}_l)_{l=L+1}^{L+\Delta} = F\big((x_l)_{l=1}^{L}, (y_l)_{l=1}^{L}\big). \tag{4}$$
This expression differs from those of autoregressive models, where the availability of $(x_l)_{l=L+1}^{L+\Delta}$ is normally assumed when predicting $(y_l)_{l=L+1}^{L+\Delta}$, as both are designed for modelling the relationship between conditions and their consequences. Long-term dependencies can be handled by the RNN, but RNN training is sensitive to unstable (vanishing or exploding) gradients. The LSTM addresses this issue by providing memory blocks in its recurrent connections. Each block contains a memory cell that stores the temporal state of the network, and three gates, named after their role in the data flow, control the network: the forget, remember (input), and inference (output) gates. They ensure that weak signals are blocked from flowing through the network. Figure 3 depicts the LSTM framework.
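The formulation above, in which the past L steps of factors and arrivals are used to predict the next Δ arrival values, can be sketched as a windowing routine. The function name `make_windows` and the toy data are hypothetical and serve only to show how training samples could be laid out.

```python
def make_windows(features, arrivals, L, delta):
    """Pair the past L steps of (x, y) with the next `delta` arrival values."""
    samples = []
    T = len(arrivals)
    for end in range(L, T - delta + 1):
        past_x = features[end - L:end]        # (x_l) for the last L steps
        past_y = arrivals[end - L:end]        # (y_l) for the last L steps
        future_y = arrivals[end:end + delta]  # targets for the next delta steps
        samples.append((past_x, past_y, future_y))
    return samples

# Hypothetical monthly data: two factors per month, T = 6 months.
features = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50], [6, 60]]
arrivals = [100, 110, 120, 130, 140, 150]
samples = make_windows(features, arrivals, L=3, delta=2)
```

Note that no future factor values appear in the inputs, which is the point on which this formulation differs from models that assume future factors are available.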
Given the time series as input, the LSTM encodes it into a set of hidden states $(h_l)_{l=1}^{L}$. LSTM is based on the concept that a few gates are implemented at each time step to regulate the flow of information through the sequence; this enables accurate capturing of any long-range dependency. For any time step $l$, capturing long-range dependencies requires updating the hidden state $h_l$ from the data at the current time step $x_l$ and the hidden state at the preceding time step $h_{l-1}$, through the input gate $z_l$, the forget gate $f_l$, the output gate $o_l$, and the memory cell $c_l$, using the LSTM update equations [17], which include the memory cell update
$$c_l = f_l * c_{l-1} + z_l * \tanh\left(W_{xi} x_l + W_{hi} h_{l-1} + b_c\right),$$
where $\sigma$ and $\tanh$ are activation functions and $*$ denotes element-wise multiplication. The terms $b$ and $W$ are the LSTM parameters learned through training. The result of equation (9) is used to compute the output of the linear regression layer. As mentioned in equation (7), $W_r$ denotes the weights of the linear regression layer. The proposed deep network (DN) structure for TD is shown in Figure 4. The framework combines the LSTM with the attention mechanism. Identifying the lag or lead relations among the time series is important in TD prediction because the lead effect varies with the delay period. The LSTM models the long-term dependencies in the time-series data, while the attention mechanism indicates which parts of the input sequence are effective.
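One LSTM update can be sketched for the scalar case as follows. The gate names follow the text above, but the gate equations used here are the commonly used LSTM formulation, which is an assumption; the parameter values and the name `lstm_step` are hypothetical and do not come from the study.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def lstm_step(x, h_prev, c_prev, p):
    """One scalar LSTM step; `p` maps a gate name to (w_x, w_h, b)."""
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    z = sigmoid(pre("input"))    # input gate z_l
    f = sigmoid(pre("forget"))   # forget gate f_l
    o = sigmoid(pre("output"))   # output gate o_l
    c = f * c_prev + z * math.tanh(pre("cell"))  # memory cell c_l
    h = o * math.tanh(c)         # hidden state h_l
    return h, c

# Hypothetical parameters shared by all gates, for illustration only.
params = {name: (0.5, 0.5, 0.0)
          for name in ("input", "forget", "output", "cell")}
h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.2]:
    h, c = lstm_step(x, h, c, params)
```

The forget gate scales how much of the previous memory cell is kept, and the input gate scales how much of the new candidate value is written, which is how weak signals are kept out of the cell state.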
This structure makes it possible to capture two critical aspects of TD prediction: (i) the temporal relationship between demand and the different factors and (ii) the significance of the factors for TD, as expressed by their weights. Therefore, the long-term temporal dependency between the tourist arrival volumes and the different factors can be detected automatically by using the LSTM with the attention mechanism. The size of the input is (m * d * n), where (m * d) is the size of the data selected for training and n is the maximum lag order specified by the user. All fully connected (dense) layers are built according to the attention mechanism.
Then, the most relevant information in the lead series is selected by means of attention scores computed with the weights $W_e^1$, $W_e^2$, and $W_e^3$, which are learned by the framework. The vector $e_l$ contains the weights that measure the significance of the features of the lead series at time $l$ ($x_l$), and $a_l$ is the normalized $e_l$. Thus, the lead series $x_l$ is multiplied by the attention weights $a_l$: $x_l * a_l$. The LSTM takes $x_l$ and $h_{l-1}$ as input and updates the hidden state $h_l$ at time $l$. The vector $c_l$ is obtained by summing the products, and the final result is generated by the linear layer. The LSTM and the attention component can be trained one by one.
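The weighting of the lead series by normalized attention scores can be sketched as below. The text states only that the scores are normalized; the use of a softmax here is an assumption, and the score and feature values are hypothetical.

```python
import math

def softmax(scores):
    """Normalise scores into positive weights that sum to one."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def apply_attention(x, scores):
    """Scale each feature in x by its normalised attention weight."""
    weights = softmax(scores)
    return weights, [xi * ai for xi, ai in zip(x, weights)]

x_l = [0.8, 0.1, 0.5]  # hypothetical feature values at one time step
e_l = [2.0, 0.0, 1.0]  # hypothetical unnormalised attention scores
a_l, weighted = apply_attention(x_l, e_l)
```

Features with larger scores keep more of their magnitude after weighting, and the weights themselves can be inspected to see which factors the model considers relevant at that time step.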

Empirical Study.
The proposed conceptual framework was empirically evaluated through the prediction of the monthly TAV in Macau, a special administrative region of China that is located across the Pearl River Delta from Hong Kong. The economy of this region mainly depends on its gaming and tourism sectors, so an accurate and timely prediction of the TAV is important to sustain the economy of the region. The TAV of Macau in this study was predicted using secondary indicators such as SII data due to the lack of reliable data sources and expertise.

Performance Evaluation.
The proposed model was evaluated for TD prediction performance by comparison with conventional prediction models that included the SVR, ANN, ARIMA, and ARIMAX models [18,20]. The baseline prediction with the conventional models used the TAV from 12 months earlier as the estimate of $y_{k+1}$, that is, $\hat{y}_{k+1} = y_{k-11}$. For the ANN and SVR models, the input data were the data from the past 12 months, $(x_l, y_l)_{l=k-11}^{k}$, for the prediction of $y_{k+1}$. The ANN model was built with a sigmoid activation function and one hidden layer; it was trained using the backpropagation algorithm. The ARIMA and ARIMAX models use an AR order of p, d rounds of differencing to achieve a stationary series, and an MA order of q; they are trained on the tourist data from the past 12 months for the prediction of the next 12-month TAV series, and the training data are gradually increased during the process of model validation. ARIMA differs from ARIMAX in using only the TAV for the prediction task, while ARIMAX requires other external factors, $(x_l)_{l=1}^{T}$. Features that contribute less to the TAV were eliminated using the MIC because such a large number of features cannot be handled by the SVR and ANN. Walk-forward model validation was employed to mimic a real-world scenario where new TAVs are made available monthly for the prediction of the TAV for the following month. The best prediction model is the one that achieves the lowest error values. The robustness of the developed model in this study was ensured by repeating the walk-forward validation process for the benchmarking models five times in the prediction of 12-month TAVs from 2016 to 2019. Note that the comparative models (SVR and ANN) have no expert-crafted features, as their input features were obtained automatically from the MIC filtering process. Both models were further implemented using the features selected by the deep learning model; these variants are referred to as SVR + F and ANN + F.
Tables 1-3 summarize the global MAPE, MAE, and RMSE of the comparative models over a five-year period. The tables show that the deep network performs better, achieving lower errors in the 4 consecutive years. Table 4 shows the result of the one-tailed t-test for the MAPE, MAE, and RMSE of the proposed and comparative models in this study at α = 0.05. The results showed that the null hypothesis, which states that the mean values of the proposed DLM and the comparative model are equal, is rejected. This is supported by the lower MAPE, MAE, and RMSE values of the DLM compared to the comparative models. The SVR and ANN with no feature selection performed comparably to the conventional methods. However, the proposed DLM, which required no feature selection process, performed better in terms of prediction than the other models. Note that the performance of SVR and ANN was improved by incorporating the DLM-selected features into their framework; the MAPE of SVR reduced to 5.086% from 6.482%, while that of ANN reduced to 5.922% from 14.319%. Hence, the feature selection capability of the proposed approach is validated.
Although the DLM, ARIMA, and ARIMAX all require no preselected features, the ability of the DLM to autoselect relevant features from the raw SII data improved its performance beyond that of the ARIMA model. A further comparison was made using only the Baidu keywords and Macau tourist arrivals from Mainland China. Tables 5-7 show the MAPE, MAE, and RMSE of the comparative models within the studied five-year period. These tables also show similar improvements in performance; for instance, there were decreases in the MAPE. The results in Table 8 further supported the better performance of the proposed DLM compared to the benchmark models.
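The walk-forward validation and the 12-month-lag baseline described above can be sketched together with the three error measures. The metric formulas are the standard definitions of MAE, RMSE, and MAPE; the function names and the arrival figures are hypothetical and are not the study's data.

```python
import math

def mae(actual, pred):
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def rmse(actual, pred):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mape(actual, pred):
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, pred)) / len(actual)

def walk_forward_seasonal_naive(series, start, season=12):
    """At each month k >= start, predict series[k] from series[k - season]."""
    actual, pred = [], []
    for k in range(start, len(series)):
        history = series[:k]           # only data available before month k
        pred.append(history[-season])  # value observed 12 months earlier
        actual.append(series[k])
    return actual, pred

# Hypothetical monthly arrivals: two identical years, then a year 10% higher.
year = [100, 90, 95, 110, 120, 130, 150, 160, 125, 115, 105, 140]
series = year + year + [v * 1.1 for v in year]
actual, pred = walk_forward_seasonal_naive(series, start=24)
errors = {"MAE": mae(actual, pred),
          "RMSE": rmse(actual, pred),
          "MAPE": mape(actual, pred)}
```

Each prediction uses only the history available before the month being predicted, which is what distinguishes walk-forward validation from a single train and test split. Any other model could replace the baseline inside the loop.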
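The size of the improvement obtained with the DLM-selected features can be expressed as a relative reduction in MAPE. The percentages below are derived here from the two pairs of MAPE values quoted above and are not themselves reported in the tables:

```latex
\[
\text{SVR: } \frac{6.482 - 5.086}{6.482} \approx 21.5\%,
\qquad
\text{ANN: } \frac{14.319 - 5.922}{14.319} \approx 58.6\%.
\]
```

On this measure the ANN benefits considerably more from the selected features than the SVR does.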

Conclusion
The tourism sector requires accurate and timely demand predictions for making informed sector-related decisions. Previous studies have focused on time-series, econometric, and AI models for this task; however, the performance of these conventional models depends on the quality of the selected features. The feature selection process and the determination of lag orders are domain specific and demand significant human effort. This study proposed a deep learning-based approach to the selection of the relevant features for better performance of the predictive models. The evaluations showed better performance of the proposed deep network framework compared to the conventional methods, possibly for two reasons. First, the deep network model can mimic the natural biological system: the successive network layers extract low-level features from the initial input layer for the subsequent abstraction of the high-level features that capture the semantic relationships between features in the succeeding layers. Second, the LSTM has an attention mechanism that automatically selects the relevant features at each time step.
This study made two important contributions to TD prediction. The first is the development of a systematic conceptual framework for TD prediction and the validation of its TD prediction capability. The proposed model considered all the available TD prediction factors and required no human intervention in terms of feature selection. The second contribution is the use of the attention score for the interpretation of the trained deep network architecture. This gives practitioners a new way of updating their TD prediction based on a set of relevant indicators at different time steps. The outcome of this work also suggests the ability of the proposed DLM to select a set of relevant features and determine their suitable lag orders. With these contributions, this work can be extended in two directions. First, different types of indicators, such as blogs and tweets, can be incorporated into the TD prediction task, which can address the issue of nonavailability of training data, as the DLM allows direct usage of these media data. Second, feature sets with suitable lag orders can serve as inputs to other TD prediction models. The combination of the DLM and the conventional prediction models can enable further theory development.
Future work can increase the number of extracted features and make the deep neural network more flexible in terms of the number of nodes and layers. Prediction for a large number of variables is also required, and training should be carried out on larger datasets.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.