Meta-Learning Enhanced Trade Forecasting: A Neural Framework Leveraging Efficient Multicommodity STL Decomposition

In the dynamic global trade environment, accurately predicting trade values of diverse commodities is challenged by unpredictable economic and political changes. This study introduces the Meta-TFSTL framework, an innovative neural model that integrates Meta-Learning Enhanced Trade Forecasting with efficient multicommodity STL decomposition to adeptly navigate the complexities of forecasting. Our approach begins with STL decomposition to partition trade value sequences into seasonal, trend, and residual elements, identifying a potential 10-month economic cycle through the Ljung–Box test. The model employs a dual-channel spatiotemporal encoder for processing these components, ensuring a comprehensive grasp of temporal correlations. By constructing spatial and temporal graphs leveraging correlation matrices and graph embeddings and introducing fused attention and multitasking strategies at the decoding phase, Meta-TFSTL surpasses benchmark models in performance. Additionally, integrating meta-learning and fine-tuning techniques enhances shared knowledge across import and export trade predictions. Ultimately, our research significantly advances the precision and efficiency of trade forecasting in a volatile global economic scenario.


Introduction
Understanding the dynamics of global trade is paramount, as it interlinks with worldwide economic growth. Unforeseen economic or political shocks, such as the recent COVID-19 pandemic, can profoundly affect international trade. For instance, in the first half of 2020, China experienced a 6.3% decrease in total import and export value, compared to the previous year, due to pandemic-related disruptions (National Development and Reform Commission). Conflicts between countries, like Russia and Ukraine, also lead to economic disturbances, including inflation and reduced demand. These occurrences underline the significance of trade in economic growth and the necessity for accurate analysis and forecasting of foreign trade to mitigate potential adverse impacts on the global economy.
Forecasting trade is inherently complex due to its susceptibility to unpredictable political and economic events and the multifaceted nature of trading various commodities. Moreover, trade data are typically presented on a monthly basis, which means that the data volume is limited. Consequently, utilizing conventional deep learning techniques on these data often leads to overfitting. Nevertheless, accurate forecasting is crucial for both businesses and policymakers, as it aids in making informed decisions to mitigate adverse impacts on the global economy.
Historically, trade forecasting has employed traditional statistical techniques, such as ARIMA and fuzzy time series [1]. However, there has been a discernible shift towards machine learning and deep learning models [2], known for their enhanced accuracy in capturing complex data patterns.
The incorporation of deep learning has marked substantial advancements in time series forecasting [3][4][5]. Models like the Long- and Short-term Time series network (LSTNet) [6] have proven effective in multivariate time series forecasting, ranging from solar plant energy outputs to traffic congestion predictions.
Recent explorations into spatiotemporal sequence forecasting offer promising avenues for trade forecasting. Treating import and export trades as nodes in spatiotemporal tasks and utilizing deep learning models like Convolutional Long Short-Term Memory (ConvLSTM), Spatiotemporal Graph Convolutional Networks (STGCNs), and Graph WaveNet [7][8][9] can possibly enhance trade prediction accuracy. Techniques such as Spatiotemporal Graph to Sequence (STG2Seq) [10] may further contribute to this evolving field, emphasizing the importance of continued research and development to adapt to the dynamic nature of global trade.
From these perspectives, we propose a neural model integrating Meta-Learning Enhanced Trade Forecasting with efficient multicommodity STL decomposition (Meta-TFSTL), which leverages the Transformer architecture and STL (Seasonal and Trend decomposition using Loess) decomposition, coupled with a dual-channel graph embedding with Meta-Learning. The structure of Meta-TFSTL is illustrated in Figure 1. The main contributions of this work are as follows:
(1) Novel Trade Forecasting Neural Network. We develop a novel neural network that leverages the Transformer architecture and STL decomposition to capture intricate relationships and dependencies among various commodities and enhance the model's generalization ability.
(2) Construction of Temporal and Spatial Graphs. We construct commodity spatial and temporal graphs based on Spearman correlation coefficients and temporal features, respectively, and employ graph embedding methods to capture the nodes' position and association and obtain high-dimensional representation vectors.
(3) Time Series Interpretability with Attention. We use the self-attention mechanism of the Transformer architecture to capture drastic changes in the seasonal and trend components of the trade data, crucial for identifying sudden events and enhancing the interpretability of the time series neural network.
(4) Meta-Learning for Enhanced Generalization. We integrate meta-learning techniques in response to the limited volume of monthly import and export data, aiming to enhance the model's proficiency in extracting insights from smaller datasets. Specifically, we adopt few-shot learning, a facet of meta-learning, to train our model such that it effectively generalizes to previously unseen datasets after minimal exposure to training examples.
(5) Meta Knowledge Adaptation in Import and Export Forecasting through Meta-Learning. Building on the hypothesis that import and export value series can inform predictions for each other, we employ meta-learning to pretrain on one domain (either import or export) and subsequently fine-tune on the other. The enhanced performance achieved through this meta knowledge adaptation approach, as compared to direct training on the target domain, reaffirms the existence of shared knowledge between imports and exports. This demonstrates the efficacy of meta-learning in harnessing this shared knowledge for improved forecasting accuracy.
(6) Advanced Performance relative to State of the Art. Our model, Meta-TFSTL, outperforms advanced models on trade datasets (import and export), achieving lower Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE). This results in more accurate trade forecasting, highlighting its practical superiority.

Related Work
2.1. Traditional Trade Forecasting. Trade forecasting plays a pivotal role in economic analysis and policymaking, offering insights into future trade trends and patterns. Over the years, numerous studies have assessed and compared various forecasting methods for predicting trade exports, imports, or both. These methods span from conventional statistical techniques such as autoregressive integrated moving average (ARIMA), fuzzy time series, and support vector regression (SVR), to more sophisticated approaches incorporating machine learning and deep learning technologies [11][12][13].
Classic time series models like ARIMA have been widely adopted for trade forecasting [14][15][16]. Research by Hasin et al. [17] and Farooqi [18] demonstrated ARIMA's efficacy in trade prediction. However, Fattah et al. [19] highlighted ARIMA's challenges, including its requirement for a large number of observations. Despite its capabilities, the exclusive use of ARIMA has declined due to its limitations in addressing modern data complexities.
Fuzzy time series models have also emerged as an alternative for trade forecasting. Wong et al. [20] assessed the performance of multivariate fuzzy time series models against traditional time series models for forecasting Taiwan's exports. Their findings suggested that fuzzy time series models could surpass ARIMA in short-term forecasting, although ARIMA was more adept for long-term predictions. Wang [21] conducted a similar comparison, concluding that fuzzy time series models offered superior accuracy for short-term forecasting but were limited in long-term projections. Wong et al. [22] demonstrated the effectiveness of fuzzy time series models in forecasting Taiwan's export volumes, showing more accurate predictions than ARIMA.
International Journal of Intelligent Systems
While fuzzy time series models excel in short-term forecasting, they struggle with long-term trends. In contrast, Support Vector Regression (SVR) outperforms both traditional and fuzzy models, especially with complex, nonlinear datasets. Guanghui [23] found SVR superior to ARIMA in demand forecasting. Studies by Lu and Wang [24] and Wu [25] emphasized SVR's accuracy in product demand predictions. Kuo and Li [26] enhanced SVR's performance by integrating it with various algorithms. However, SVR requires careful parameter tuning and can be computationally intensive.
Multivariate time series models, such as ARIMAX, are favored in trade forecasting for their ability to incorporate multiple variables [27][28][29]. Despite their prevalence, these models assume linear relationships and can be computationally demanding. Conversely, recent studies have shown machine learning models outperforming traditional time series methods in trade forecasting [30][31][32]. These models excel with complex, nonlinear data but necessitate extensive training data and parameter tuning. Their effectiveness largely depends on the quality of feature extraction.

Deep Learning in Trade Forecasting.
Recent research has delved into the potential of deep learning for trade forecasting, attributed to its capacity for automatic feature extraction and high representation ability [33][34][35]. Lloret et al. [3] introduced models based on CNN and ED-RNN, both outperforming traditional methods in forecasting disaggregated freight flows. Similarly, Shen et al. [4] utilized an LSTM network for predicting trade volumes of 23 countries, demonstrating its superiority over conventional statistical models.
In the broader domain of time series forecasting, which includes trade forecasting, there has been a shift towards leveraging advanced deep learning models to improve predictive accuracy. For instance, Qin et al. [36] introduced the Dual-Stage Attention-Based Recurrent Neural Network (DA-RNN) in 2017, designed to enhance the effectiveness and interpretability of time series predictions. Subsequently, Lai et al. [6] proposed the Long- and Short-term Time series network (LSTNet) in 2018, a framework aimed at tackling the challenges associated with multivariate time series forecasting, crucial for predictions in various sectors such as solar plant energy output and traffic congestion. Following this, Oreshkin et al. [37] developed N-BEATS in 2019, a deep neural architecture specifically for univariate time series point forecasting. More recently, in 2021, Lim et al. [5] proposed the Temporal Fusion Transformer (TFT), an innovative attention-based architecture for multihorizon forecasting, combining high performance with interpretability.

Spatiotemporal Sequence Forecasting.
As the field of forecasting evolves and the complexity of data patterns increases, spatiotemporal sequence forecasting has become increasingly important. Li et al. [38] proposed a model that combines Graph Convolutional Networks (GCNs) with Recurrent Neural Networks (RNNs) to capture spatial and temporal dependencies, respectively. Yu et al. [8] introduced STGCN, utilizing graph convolutional filters and 1D convolutional neural networks. Wu et al. [9] proposed Graph WaveNet, incorporating a WaveNet-based architecture and GCN with attention mechanisms. Zhang et al. [39] developed deep spatiotemporal residual networks for citywide crowd flow prediction. Seo et al. [40] introduced structured sequence modeling with graph convolutional recurrent networks. Zheng et al. [41] presented a spatiotemporal sequence prediction model for large-scale traffic data using deep learning approaches. Nevavuori et al. [42] employed spatiotemporal deep learning architectures with UAV data for crop yield prediction, achieving promising results with a 3D-CNN architecture.

Optimization-Based Meta-Learning in Trade Forecasting.
Meta-learning, or "learning to learn," has emerged as a powerful paradigm for training models to quickly adapt to new tasks with minimal data. In trade forecasting, where data patterns can vary significantly, meta-learning offers a robust solution. The Model-Agnostic Meta-Learning (MAML) algorithm [43] is a pioneering method in this domain, designed to be model-agnostic and applicable to any model trained with gradient descent. Reptile [44] and Almost No Inner Loop (ANIL) [45] further extend this concept, offering scalable and efficient solutions.
To the best of our knowledge, meta-learning is yet to be extensively applied in the domain of import and export forecasting. This presents both a challenge, due to the unique complexities of trade data influenced by myriad global factors, and an opportunity, given meta-learning's adaptability to new tasks with sparse data. The unexplored application of meta-learning in this area holds significant promise, suggesting that its integration could revolutionize trade forecasting by tapping into the vast potential of emerging markets and new product categories.
Despite progress, trade prediction faces several challenges. Firstly, the heterogeneous nature of commodities and their susceptibility to unforeseen events complicate accurate forecasting. Secondly, most works in trade forecasting have not utilized spatiotemporal forecasting, thus missing out on capturing potentially crucial dependencies and relationships in the data. Thirdly, many trade forecasting scenarios suffer from limited data availability, especially for emerging markets or niche commodities. These challenges underscore the continuous need for innovation and methodological refinement to improve forecasting accuracy. In response to these challenges, we propose Meta-TFSTL, a novel spatiotemporal sequence forecasting neural network with Meta-Learning designed to capture intricate relationships and dependencies among various commodities, enhance model generalization, and provide more accurate and timely guidance for trade forecasting.

$\in \mathbb{R}^{T_2 \times N}$, and its ground truth is denoted by
3.2. Self-Attention Mechanism. The self-attention mechanism is a widely employed attention technique that allows each token in a sequence to gather information from all other tokens. The inputs are represented by queries, keys, and values, each with dimension $d$. The mechanism computes the dot products between the query and all keys, divides each product by $\sqrt{d}$, and applies a softmax function to obtain the weights for the values:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{(QW_Q)(KW_K)^{\top}}{\sqrt{d}}\right)(VW_V),$$
where $W_Q$, $W_K$, and $W_V$ are the learnable projection parameters and $\mathrm{Attention}(\cdot)$ denotes the self-attention operation.
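A minimal sketch of scaled dot-product self-attention as described above, written with plain Python lists so that each step is visible. The helper names (`matmul`, `softmax`, `self_attention`) and the identity projection matrices are illustrative assumptions, not the paper's implementation.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention; returns (output, attention weights)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d = len(q[0])                                  # query/key dimension
    k_t = [list(col) for col in zip(*k)]           # transpose of K
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]     # one distribution per token
    return matmul(weights, v), weights

# Toy input: 3 tokens with 2 features each.
# Identity projections are an assumption made only for readability.
identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = self_attention(x, identity, identity, identity)
```

Each row of `weights` is a probability distribution over all tokens, which is what lets every position gather information from every other position.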
3.3. STL Decomposition. STL decomposition, i.e., Seasonal and Trend decomposition using Loess [46], is a renowned technique for breaking down a time series into seasonal, trend, and residual components. The seasonal component identifies regular patterns recurring at fixed intervals, the trend component depicts the long-term direction, and the residual component captures random fluctuations that remain after extracting the seasonal and trend components.
The STL decomposition process includes two iterative phases: an internal cycle and an external cycle. The internal cycle, aimed at fitting the trend and determining the seasonal component, involves six steps: (1) Remove the Trend. Calculate a detrended series (
This equation delineates the standard inner-loop optimization in optimization-based meta-learning (OBML), involving $\tau$ steps of gradient descent on the loss $l(\phi_{\theta_{<L}}^{\top}(X_i)\,w, Y_i)$ with a learning rate $\lambda$. Lin et al. [47] empirically showed that keeping $w$ static during training does not compromise ANIL's performance, suggesting that optimizing $w$ in the outer loop is nonessential. Hence, ANIL's training objective, excluding the outer-loop optimization of $w$, becomes
Finally, in the Dual-channel Multitasking decoder, the

Methodology
, which are aggregated through Adaptive Event Fusion interaction to capture the latent representation of dual-scale temporal patterns,

Time Series Decomposition Layer.
Given the dynamic nature of global trade, our model's architecture is designed to respond to sudden economic shifts or political events that may impact trade patterns. Through the application of STL decomposition, the model disentangles the input data $X$ into trend ($X_{tr}$), seasonal ($X_s$), and residual ($X_r$) components, as shown in the following equation:
$$X = X_{tr} + X_s + X_r.$$
The trend component is tasked with capturing long-term changes, whereas the seasonal component addresses cyclical fluctuations. In instances where sudden events occur within a specific cycle, such anomalies trigger the temporal attention module, which focuses on the trend component, to detect deviations from established patterns. Concurrently, the variability in the seasonal component increases, reflecting the immediate effects of these events on trade dynamics. This response is further refined in the subsequent Full Graph Attention module, which captures anomalies by leveraging the interconnectedness and dependencies across various commodities and timeframes.
This dual-channel processing, combining trend stability with seasonal variability, renders the model particularly sensitive to abrupt economic or political occurrences. The focus of the temporal attention module on the trend component ensures that long-term shifts are accurately detected, while the increased variability in the seasonal component signals short-term disruptions. The Full Graph Attention module enhances this mechanism by providing a broader context, allowing for a more nuanced understanding of and response to sudden changes.
Thus, the model supports rapid adaptation to such changes, ensuring its forecasting capabilities remain robust and responsive in the ever-changing landscape of global trade. This design underscores our model's readiness to navigate the complexities of trade forecasting amidst economic variability and unforeseen events.
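To make the additive split into trend, seasonal, and residual components concrete, the sketch below uses a classical moving-average decomposition as a simplified stand-in. It is not Loess-based STL, and the function name `additive_decompose` and the toy series are assumptions for illustration. It shares one property with the decomposition layer: the three parts sum back to the input.

```python
def additive_decompose(x, period):
    """Split x into trend, seasonal and residual parts so that
    x[t] = trend[t] + seasonal[t] + resid[t] for every t.
    Simplified moving-average stand-in, NOT Loess-based STL."""
    n = len(x)
    half = period // 2
    # Trend: moving average over a window that is truncated at the edges.
    trend = []
    for t in range(n):
        window = x[max(0, t - half): min(n, t + half + 1)]
        trend.append(sum(window) / len(window))
    detrended = [x[t] - trend[t] for t in range(n)]
    # Seasonal: mean of the detrended values at each phase of the cycle.
    phase_means = []
    for p in range(period):
        vals = detrended[p::period]
        phase_means.append(sum(vals) / len(vals))
    seasonal = [phase_means[t % period] for t in range(n)]
    # Residual: whatever trend and seasonality do not explain.
    resid = [detrended[t] - seasonal[t] for t in range(n)]
    return trend, seasonal, resid

# Toy monthly series: linear growth plus a square wave with a 10-step cycle.
x = [0.5 * t + (3.0 if t % 10 < 5 else -3.0) for t in range(40)]
trend, seasonal, resid = additive_decompose(x, period=10)
```

In practice a library STL implementation would replace this function; the downstream channels only rely on receiving the trend and seasonal series separately.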

Dual-Channel Spatiotemporal Encoder.
The dual-channel spatiotemporal encoder, as shown in Figure 2, represents an innovative approach to trade forecasting, addressing the complexities of trade data through a comprehensive model. The "dual-channel" aspect of the encoder is designed to separately model the trend and seasonal components derived from STL decomposition, allowing for a nuanced understanding and representation of both long-term trends and cyclical patterns within trade data. This separation ensures that each component's unique characteristics are accurately captured and utilized for forecasting.
Moreover, the "spatiotemporal" nature of the encoder incorporates both time slots with Struc2Vec graph embedding and a temporal attention mechanism, along with dilated causal convolution. This integration enables the model to capture not just the temporal correlations present within the trade data but also the spatial relationships between different commodities or markets. By embedding time slots and employing graph embedding techniques, the encoder enriches the model's input with both global and local patterns, significantly enhancing its forecasting capability.
Incorporating spatial and temporal graphs into the encoder addresses the limitations of previous single-method approaches, which may not fully capture the intricate dependencies and dynamics present in complex trade data. The dual-channel spatiotemporal encoder's design philosophy is rooted in the need for a more robust and flexible modeling technique that can adeptly navigate the multifaceted nature of global trade, thereby offering a substantial improvement over traditional forecasting models.

Dual-Channel Temporal Pattern Recognition.
Unlike previous works that employed single methods such as LSTM to model complex temporal patterns in entangled financial sequences, our approach utilizes both a temporal convolutional layer and temporal attention to capture the temporal correlations of trend and seasonal components.
The trend component ($X_{tr}$), embodying long-term patterns, is captured using temporal attention after being decomposed by the Time Series Decomposition layer via STL decomposition, as illustrated in Figure 2. This process considers global relationships across the entire series, thereby effectively capturing the overall trend. Conversely, the seasonal component ($X_s$), representing short-term patterns and specific events, is best modeled using temporal convolutional layers that focus on local patterns. These layers operate on the decomposed seasonal component, accurately capturing seasonal patterns and sudden events. This combination leverages the strengths of both methods, enabling comprehensive and accurate forecasting of entangled financial sequences. The distinct processing of the $X_{tr}$ and $X_s$ components through the dual-channel spatiotemporal encoder ensures a nuanced approach to modeling the intricate dynamics of trade data.
The temporal convolutional layer employed in this study is a one-dimensional convolution that slides over the input by skipping values at specific strides, as illustrated in Figure 2. Theoretically, given a one-dimensional sequence input $x \in \mathbb{R}^{T}$ and a filter $f \in \mathbb{R}^{J}$, the temporal convolution operation between $x$ and $f$ at time step $t$ is defined as
$$x \star f(t) = \sum_{j=0}^{J-1} f(j)\, x(t - c \times j),$$
where $c$ is the dilation factor. The temporal convolution layer for the seasonal component is represented by
$$X_s^{conv} = \mathrm{ReLU}(\Theta \star X_s + b),$$
where $\Theta$ and $b$ are learnable parameters and $\mathrm{ReLU}(\cdot)$ denotes the rectified linear unit. Moreover, we apply masked self-attention to the temporal dimension of the trend component. This approach is motivated by the trend component's inherent stability, allowing it to clearly represent long-term trends:
$$X_{tr}^{tatt} = \mathrm{Concat}(ta_1, \ldots, ta_n, \ldots, ta_N), \quad \text{where } ta_n = \mathrm{Attention}(X_{tr}^{n}, X_{tr}^{n}, X_{tr}^{n}). \tag{7}$$
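The dilated causal convolution can be sketched in a few lines. The version below assumes the standard form in which the output at step $t$ sums filter taps applied to inputs $c$ steps apart, with zeros assumed before the start of the series. The function names and the example filter are illustrative, not taken from the paper's code.

```python
def dilated_causal_conv(x, f, dilation):
    """y[t] = sum_j f[j] * x[t - dilation * j], treating x as zero before t = 0.
    Causal: the output at t never looks at inputs later than t."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, weight in enumerate(f):
            idx = t - dilation * j
            if idx >= 0:                 # zero padding on the left
                acc += weight * x[idx]
        out.append(acc)
    return out

def relu(values):
    """Rectified linear unit applied elementwise."""
    return [max(0.0, v) for v in values]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
f = [1.0, -1.0]                          # kernel size J = 2; hypothetical weights
y = dilated_causal_conv(x, f, dilation=2)
activated = relu(y)
```

With this filter and dilation 2, each output is the change relative to two steps earlier, which illustrates how a larger dilation lets a small kernel compare values across a longer stretch of the seasonal cycle.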

Global Spatial Feature Extraction.
As depicted in the Graph Construction module of Figure 2, for spatial correlation of commodity series, we initially considered adopting the vanilla Graph Attention Network (GAT) to dynamically calculate weights between connected nodes. However, the spatial receptive field of the vanilla GAT is limited to immediate neighbors. Thus, we utilized the full GAT to dynamically capture global spatial dependence by performing self-attention on the spatial dimension of $X_s^{conv}$ and $X_{tr}^{tatt}$. This approach enables the model to understand and leverage the complex spatial relationships within the trade data, enhancing its predictive capability by incorporating a broader context of intercommodity influences.
This equation combines raw trade data ($X$) with spatial ($\rho^{spa\prime}$) and temporal ($\rho^{tem\prime}$) graph embeddings, enhancing the model's ability to capture spatial dependencies and temporal dynamics in trade data. By incorporating these embeddings, the model benefits from a richer representation that can take into account the interactions of commodities over time and space. This integrated approach enriches the model's inputs by taking advantage of the nuanced relationships and changing patterns inherent in global trade, thereby enhancing the model's generalization capabilities and its adaptability to complex trade data scenarios.
In the specific frequency decoder, we predict the future of the trend and seasonal components output by the dual-channel encoder using a predictor. The trend and seasonal information are integrated through fusion attention and multitasking supervision.
The trend component, generally more stable and less volatile than the seasonal component, provides a reliable basis for model training, helping stabilize the training process. Moreover, the trend, embodying the persistent and stable global patterns in the data, plays a crucial role in overall predictions. While the seasonal component is pivotal for capturing short-term fluctuations and sudden events, its inherent uncertainty and volatility make it less reliable for overall predictions. Therefore, by prioritizing the supervision of the trend component, the model can extract more valuable and reliable information for long-term predictions, crucial for decision making in many practical applications.

Decomposed Temporal Feature Fusion.
The goal is not merely to predict the trend and seasonal components but to forecast the future trade value series based on these components and other factors. We propose a fusion attention mechanism, as illustrated in Figure 1
The model utilizes $L_1$ loss for supervision. By leveraging knowledge from the more stable trend component, the model effectively enhances its capability to learn the long-term trends of the commodity value sequence, thus improving performance. Consequently, the optimization objective of Meta-TFSTL is to minimize the loss function shown in the following equation:
This loss function computes the $L_1$ distance loss between the predicted and actual values for each timestep $t$ and each node $n$, where $y_t^{n}$ represents the actual value and $\hat{y}_t^{n}$ denotes the predicted value for each commodity in the subsequent months. Moreover, $y_{tr,t}^{n}$ signifies the real trend component and $\hat{y}_{tr,t}^{n}$ its predicted counterpart. Minimizing this loss function enables the model to better fit the future trends in value, thereby enhancing model performance.
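A minimal sketch of a multitask loss consistent with the description above: one L1 term on the forecast values and one on the trend component. The unweighted sum, the mean normalization, and the `trend_weight` parameter are assumptions, since the text does not state how the two terms are scaled or combined.

```python
def l1(pred, true):
    """Mean absolute error over a (timesteps x nodes) grid given as nested lists."""
    total = sum(abs(p - y) for p_row, y_row in zip(pred, true)
                for p, y in zip(p_row, y_row))
    count = sum(len(row) for row in true)
    return total / count

def multitask_l1(y_pred, y_true, trend_pred, trend_true, trend_weight=1.0):
    """L1 loss on the value forecast plus L1 loss on the trend forecast.
    trend_weight is a hypothetical knob; the source does not give a weighting."""
    return l1(y_pred, y_true) + trend_weight * l1(trend_pred, trend_true)

# Two future timesteps (rows) for two commodities (columns); made-up numbers.
y_true = [[10.0, 20.0], [12.0, 22.0]]
y_pred = [[11.0, 19.0], [12.0, 25.0]]
trend_true = [[9.0, 19.0], [10.0, 20.0]]
trend_pred = [[9.0, 21.0], [10.0, 20.0]]
loss = multitask_l1(y_pred, y_true, trend_pred, trend_true)
```

Supervising the trend separately gives the optimizer a smoother target alongside the noisier full series, which matches the motivation given for prioritizing the trend component.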

Meta-Learning Framework for Trade Forecasting.
Faced with a new commodity or a shift in economic conditions, Meta-TFSTL applies its meta-learned knowledge for initial predictions, demonstrating the model's quick adaptability by fine-tuning on a limited dataset specific to the new context. This adaptation mechanism is crucial for maintaining high forecasting accuracy in the dynamic and unpredictable domain of global trade, emphasizing our approach's effectiveness in navigating evolving market trends and economic shifts.
In trade forecasting, a task $T_i$ is defined as the prediction of trade values under specific economic conditions. Each task comprises a support set $S_i$ for training and a query set $Q_i$ for testing, illustrating the model's readiness for various forecasting scenarios. For task $T_i$, the support set $S_i$ includes $n$ pairs of historical trade data and corresponding trade values, represented as $S_i = \{(X_j, Y_j)\}_{j=1}^{n}$. The meta-learning framework categorizes model parameters into two groups: $\theta_{<L}$, the parameters of all layers except the last, termed Backbone parameters, and $w$, the last layer or task-specific head parameters, termed Output parameters. The main objective is to optimize $\theta_{<L}$ across multiple tasks, allowing $w$ to be rapidly adjusted for each specific task $T_i$.
The adaptation process for task $T_i$ is as follows:
$$w_i^{*} = w - \alpha \nabla_{w} \mathcal{L}_{T_i}(S_i; \theta_{<L}, w),$$
where $\alpha$ is the learning rate for the inner-loop optimization and $\mathcal{L}_{T_i}(S_i; \theta_{<L}, w)$ is the loss computed on the support set $S_i$ for task $T_i$ using the current model parameters.
After adjusting $w$ to $w_i^{*}$ for task $T_i$, the model's performance on the query set $Q_i$ informs the update of $\theta_{<L}$ based on overall task performance:
$$\theta_{<L} \leftarrow \theta_{<L} - \beta \nabla_{\theta_{<L}} \sum_{i} \mathcal{L}_{T_i}(Q_i; \theta_{<L}, w_i^{*}),$$
where $\beta$ is the learning rate for outer-loop optimization and $\mathcal{L}_{T_i}(Q_i; \theta_{<L}, w_i^{*})$ is the loss on the query set $Q_i$ for task $T_i$, using the adapted parameters $w_i^{*}$. This iterative two-phase optimization process (adjusting $w$ for each task, then updating $\theta_{<L}$ based on aggregated task performance) enables Meta-TFSTL to acquire generalized parameters $\theta_{<L}^{*}$, facilitating rapid adaptation to new tasks. This capability significantly enhances the model's forecasting accuracy, especially for new commodities or changing economic conditions.
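The two-phase procedure can be illustrated on a deliberately tiny model: a scalar "backbone" parameter `theta` and a scalar head `w`, with prediction `w * theta * x` and a squared-error loss. Everything here (the toy model, the loss, the function names, and the first-order approximation that ignores how the adapted head depends on `theta`) is an assumption for illustration and not the Meta-TFSTL implementation.

```python
def predict(theta, w, x):
    """Toy model: scalar backbone feature theta * x followed by scalar head w."""
    return w * (theta * x)

def loss(theta, w, data):
    """Mean squared error over a list of (x, y) pairs."""
    return sum((predict(theta, w, x) - y) ** 2 for x, y in data) / len(data)

def grad_w(theta, w, data):
    """Gradient of the loss with respect to the head w."""
    return sum(2 * (predict(theta, w, x) - y) * theta * x for x, y in data) / len(data)

def grad_theta(theta, w, data):
    """Gradient of the loss with respect to the backbone theta (w held fixed)."""
    return sum(2 * (predict(theta, w, x) - y) * w * x for x, y in data) / len(data)

def adapt_head(theta, w, support, alpha, steps=1):
    """Inner loop: only the head w is updated, using the support set."""
    for _ in range(steps):
        w = w - alpha * grad_w(theta, w, support)
    return w

def meta_step(theta, w, tasks, alpha, beta):
    """Outer loop (first-order): update the backbone from the query losses
    evaluated at each task's adapted head; w itself stays fixed."""
    total = 0.0
    for support, query in tasks:
        w_star = adapt_head(theta, w, support, alpha)
        total += grad_theta(theta, w_star, query)
    return theta - beta * total

# Two toy tasks (y = 2x and y = 3x), each with a support set and a query set.
tasks = [
    ([(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]),
    ([(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0)]),
]
theta, w = 1.0, 1.0
for _ in range(20):
    theta = meta_step(theta, w, tasks, alpha=0.05, beta=0.001)
```

Keeping `w` fixed in the outer loop mirrors the observation attributed to Lin et al. [47] that optimizing the head there is nonessential. The `steps` argument of `adapt_head` corresponds to the number of adaptations per iteration.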

Experiments
In this study, we use the values of imported (or exported) commodities from the first 10 time steps to predict the values in the subsequent 2 time steps. These datasets are then chronologically split into training (70%), validation (20%), and test (10%) sets. Performance of all methods is evaluated using three standard metrics, namely, MAE, RMSE, and MAPE.
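The windowing and splitting scheme can be sketched as follows. The function names are hypothetical, and applying the 70/20/10 split to the list of windows (rather than to the raw series before windowing) is an assumption; the text only states that the split is chronological.

```python
def make_windows(series, n_in=10, n_out=2):
    """Pair each run of n_in past steps with the n_out steps that follow it."""
    samples = []
    for start in range(len(series) - n_in - n_out + 1):
        history = series[start:start + n_in]
        target = series[start + n_in:start + n_in + n_out]
        samples.append((history, target))
    return samples

def chronological_split(samples, train=0.7, val=0.2):
    """Split in time order without shuffling; the remainder is the test set."""
    n_train = int(len(samples) * train)
    n_val = int(len(samples) * val)
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])

series = list(range(60))   # stand-in for 60 monthly values of one commodity
samples = make_windows(series)
train_set, val_set, test_set = chronological_split(samples)
```

Because the split is not shuffled, every test window starts later than every training window, so evaluation always targets months the model has not seen.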

Experimental Settings
5.2.1. Baselines. In this paper, we benchmark the performance of our proposed Meta-TFSTL model against a comprehensive suite of established baseline models. These baselines span from traditional statistical methods to the latest neural network architectures in time series forecasting. Our selection includes a total of 11 models, providing a broad overview of the field's evolution and current state of the art. Here is a brief overview of each model, including its publication year to highlight recent advancements:
(1) LastValuePredictor: a basic forecasting method using the most recent observation to predict future values. This approach serves as a simple baseline for comparison.
(2) ARIMA [49] (1976): a well-established statistical method for time series forecasting, known for its effectiveness in capturing linear relationships and trends.
(3) VAR [50] (1980): a model that captures linear interdependencies among multiple time series, widely used in econometrics and financial analysis.
(4) Bagging [51] (1996): an ensemble technique that improves the stability and accuracy of machine learning algorithms by combining multiple models.
(5) LSTM [52] (1997): a recurrent neural network architecture designed to learn long-term dependencies, marking a significant advancement in sequence modeling.
(6) GRU [53]
This diverse set of baselines, especially including models from the last three years (DeepAR, DeepVAR, N-Hits, and TFT), ensures that our comparison covers a wide spectrum of time series forecasting methodologies, from classical approaches to cutting-edge neural network models.
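As a reference point for the simplest baseline and the three evaluation metrics, the sketch below implements a last-value forecast together with MAE, RMSE, and MAPE. The function names and the sample numbers are illustrative. MAPE as written assumes that no true value is zero.

```python
import math

def last_value_forecast(history, horizon=2):
    """Repeat the most recent observation for every future step."""
    return [history[-1]] * horizon

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (true values must be nonzero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

history = [100.0, 110.0, 120.0]   # made-up trade values
actual = [125.0, 150.0]
forecast = last_value_forecast(history)
scores = (mae(actual, forecast), rmse(actual, forecast), mape(actual, forecast))
```

Any learned model should beat these scores to justify its added complexity, which is why the last-value predictor is a useful floor in the comparison.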

Experimental Settings.
In this work, we implemented the Meta-TFSTL model using the PyTorch framework and trained it using the Adam optimizer for a total of 1000 iterations, each iteration including 5 adaptations. Within the Meta-TFSTL model, the number of heads $e$ in the attention mechanism was set to 1, with an initial dimension $d_e$ of 128. Additionally, the number of layers $L$ in the spatiotemporal encoder was set to 2. To capture cyclical time dependencies, we employed dilated causal convolution layers with a kernel size of $J = 2$. The initial learning rate was set to 0.001, adjusted with a decay rate of 0.1. Dropout was also incorporated, with a dropout rate of 0.2, to mitigate the risk of overfitting in the model.
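For reproducibility, the settings listed above can be gathered in one place. The container below is only a sketch: the class name and field names are hypothetical, while the values are those stated in the text. The schedule on which the learning-rate decay is applied is not specified, so only the decay factor is recorded.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetaTFSTLConfig:
    """Hyperparameters reported in the experimental settings (field names are hypothetical)."""
    meta_iterations: int = 1000          # total training iterations
    adaptations_per_iteration: int = 5   # inner-loop adaptations per iteration
    attention_heads: int = 1             # number of heads e
    hidden_dim: int = 128                # initial dimension d_e
    encoder_layers: int = 2              # layers L in the spatiotemporal encoder
    conv_kernel_size: int = 2            # kernel size J of the dilated causal convolution
    learning_rate: float = 1e-3          # initial learning rate
    lr_decay: float = 0.1                # decay rate; schedule not stated in the text
    dropout: float = 0.2
    input_steps: int = 10                # history length used for prediction
    output_steps: int = 2                # forecast horizon

config = MetaTFSTLConfig()
```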

Training Environment.
In this study, we utilized a computer equipped with two V100 GPUs and a Hygon C86 7380 32-core Processor CPU as our training environment. Each GPU has 32 GB of available memory, offering robust parallel computing capabilities to expedite the training of deep learning models.

Results. This study conducted experiments to investigate the Meta-TFSTL model, addressing the following six research questions:
RQ1. How should the periodicity and robustness of the STL time series decomposition be chosen and determined?
RQ2. How are the support set and query set selected and determined for the meta-learning algorithm ANIL?
RQ3. Does Meta-TFSTL outperform the baseline models, and what role does meta-learning play in enhancing the model's performance?
RQ4. How do different components of Meta-TFSTL (e.g., sequence decomposition methods and graph embeddings) impact its performance?
RQ5. How do hyperparameters influence the performance of Meta-TFSTL?
RQ6. Is our proposed Meta-TFSTL more efficient than baseline models?

Determination of Periodicity and Robustness in STL Decomposition (RQ1)
(1) Periodically Determined.STL time series decomposition dissects a time series into seasonal components X s , trend components X tr , and residual components X r .As mentioned earlier in Subsection 4.2, our focus in this inquiry is chiefy on the residual component, which captures the random fuctuations in the series that are not explained by its trend or seasonality.
To identify the dominant cycle in import (or export) value series, we examined periods ranging from 2 to 50. For each candidate period, we applied the STL decomposition and ran the Ljung-Box test with 10 lags on the residuals. A period was considered suitable if the residuals of all series showed white noise characteristics, indicating that the seasonal and trend components have effectively captured most of the series information.
After contrasting the test outcomes across varying periods, we observed that the residuals for all product series passed the Ljung-Box test when the period was set to 10 months (i.e., period = 10). This period can be construed as the typical cyclicity for import (or export) value series. The results of the 10-lag Ljung-Box test under this period for all the import/export product value series are delineated in Table 2.
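The Ljung-Box statistic behind this screening can be written in a few lines. The sketch below is illustrative only: the 5% significance level, the use of 10 degrees of freedom, and the helper names are assumptions, as the text does not state the threshold applied.

```python
def ljung_box_q(x, lags):
    """Ljung-Box statistic Q = n (n + 2) * sum_{k=1..lags} rho_k^2 / (n - k)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, lags + 1):
        # Sample autocorrelation of the series at lag k.
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q


# 95% quantile of the chi-square distribution with 10 degrees of freedom.
CHI2_95_DF10 = 18.307


def looks_like_white_noise(residual, lags=10, critical=CHI2_95_DF10):
    """True when the test does not reject the white-noise hypothesis."""
    return ljung_box_q(residual, lags) < critical


# A strongly periodic "residual" is rejected as white noise.
alternating = [1.0 if t % 2 == 0 else -1.0 for t in range(40)]
print(ljung_box_q(alternating, 10), looks_like_white_noise(alternating))
```

A period search in this spirit would loop over candidate periods, decompose every series, and keep the periods for which all residuals pass this check.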
From the perspective of national development and openness, trade series for commodities not only exhibit clear periodicity but also show an upward trend over time, reflecting the impact of globalization and economic growth on trade activities. In this context, STL becomes particularly important as it can precisely decompose economic time series into periodic and trend components, thus showing good adaptability to trade data [56].
Applying STL decomposition to all import commodities, as shown in Figure 3, and selecting a period of 10 months for analysis, we can not only clearly see the long-term growth patterns in the trend components of each commodity but also observe the regularity of periodic fluctuations. This identification of periodicity not only validates the accuracy of the chosen period length but also provides key prior knowledge for the construction of prediction models based on temporal attention. Particularly, the regular fluctuations observed in the seasonal components provide clear guidance for temporal convolutional layers in capturing periodic changes, ensuring that the model can effectively adapt to and recognize the periodic features in time series data.
Observing Figure 3, we can identify periods where the seasonal components exhibit noticeable shifts from previous trends, denoted as "Significant Regimes" (highlighted in purple in the figure). This observation aligns with our further analysis of the seasonal components in STL decomposition, emphasizing the model's ability to discern substantial market fluctuations during specific periods. For instance, the rise in Natural Gas (NG) imports in 2021 reflects China's energy demand and policies to reduce air pollution by shifting from coal to cleaner energy sources (US Energy Information Administration). Similarly, the increase in Metal Ores and Concentrates (Metal) imports from March 2020 onwards aligns with China's economic recovery efforts and infrastructure projects post-COVID-19 (Reuters (2021) "China 2020 iron ore imports hit record on robust post-virus demand"). The growth in Grain imports from January 2020 is attributed to securing food supplies amid global uncertainties (World Grain (2020) "China imports record amount of grains in 2020"), while the spike in Coal and Lignite (Coal) imports by July 2020 corresponds to the demand for energy as the economy recovered (Reuters (2020) "China's July coal imports surge on heatwaves, power use"). The volatility in Automobile and parts (Auto) imports between July 2017 and June 2021 could be due to domestic demand shifts, tariff adjustments, and global supply disruptions, particularly due to the pandemic and trade tensions (U.S. Department of Commerce "China-Automotive Industry"). The initial decline and subsequent rapid increase in Crude Oil (Crude) imports from January 2020 reflect global oil price fluctuations, strategic reserves replenishment, and support for domestic recovery (Reuters (2021) "China 2020 crude oil imports surge to record on buying binge"). These periods of significant changes in commodities imports underscore the STL decomposition's effectiveness in capturing the dynamics of the market, providing valuable insights for the model's attention mechanisms to focus on and learn from these key market changes.
Therefore, the application of STL decomposition in the analysis of multicommodity trade data showcases its superiority in revealing and utilizing the seasonal, trend, and random fluctuation components in time series data. This not only provides a solid foundation for subsequent model design and prediction but also offers a new perspective and method for understanding complex market behaviors.
(2) Comparative Analysis of Robust and Nonrobust STL Decomposition. STL time series decomposition can be run in two modes: robust and nonrobust. The robust decomposition is less sensitive to outliers or anomalies: it reweights the observations in its locally weighted regression (LOESS) fits so that points with large residuals contribute less, which limits the impact of anomalies on the decomposition results. In contrast, the nonrobust decomposition omits this reweighting, so it is more susceptible to outliers and can be adversely influenced by anomalous values. For a particular commodity series, the trend, seasonal, and residual components from both robust and nonrobust decompositions are illustrated in Figure 4 (with a period of 10, focusing on the imported Cu as an exemplary commodity).
From Figure 4, it can be discerned that the trend component from the robust decomposition is smoother, illustrating its insensitivity to anomalies. Conversely, the nonrobust decomposition's trend component exhibits more pronounced local fluctuations, which contradict the trend component's role in capturing overall tendencies. Examining the seasonal component, the robust decomposition's seasonal fluctuations appear more pronounced. This can be primarily attributed to the reduced fluctuations captured by the robust decomposition's trend component. As a result, sudden events or transient information might be incorporated into the seasonal features. Consequently, the seasonal component absorbs more volatility, reflecting sudden incidents in trade, aligning with the designed role of the seasonal component to detect periodic and abrupt events.
Upon careful observation and interpretation, the robust STL decomposition emerges as the more suitable method. Its trend component aptly captures the overall tendencies without being hindered by transient information, while the
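The mechanism that makes the robust variant insensitive to anomalies can be sketched as follows. In the original STL formulation (Cleveland et al., 1990), each outer pass assigns bisquare robustness weights based on the current residuals. The code is a standalone illustration with hypothetical names and toy residuals, not the decomposition code used in this study.

```python
from statistics import median


def bisquare_robustness_weights(residuals):
    """Bisquare robustness weights as used between STL outer-loop passes.

    h = 6 * median(|r|); weight = (1 - (|r| / h)^2)^2 if |r| < h, else 0.
    """
    abs_res = [abs(r) for r in residuals]
    h = 6.0 * median(abs_res)
    if h == 0.0:
        # Median absolute residual is zero: fall back to equal weights
        # (a simplification made for this sketch).
        return [1.0] * len(residuals)
    weights = []
    for a in abs_res:
        u = a / h
        weights.append((1.0 - u * u) ** 2 if u < 1.0 else 0.0)
    return weights


residuals = [0.1, -0.2, 0.1, 5.0, -0.1]   # one obvious outlier at index 3
weights = bisquare_robustness_weights(residuals)
print(weights)
```

The outlier receives zero weight and therefore no longer pulls the next trend fit toward itself, which is consistent with the smoother robust trend in Figure 4.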

Determination of Support and Query Sets in the Meta-Learning Algorithm ANIL (RQ2).
From a meta-knowledge adaptation perspective, our aim is for the model to adapt to more complex scenarios. Therefore, we designate the more intricate situations as the query set [43]. The advantage of enhancing generalization through meta-learning is evident in that training a model on a known data distribution (support set) can yield favorable results on an unknown data distribution (query set). Temporally speaking, the forecasting process often encompasses periods that are relatively straightforward to predict, as well as more challenging intervals. The overall performance can be adversely affected by these harder-to-predict time spans, leading to suboptimal model outcomes. To address this challenge, we strategically design our approach to leverage the strengths of meta-learning.
Adopting this approach for meta-learning modeling more effectively captures the intricate characteristics of the data. Initially, we employ the ARIMA algorithm to model all commodities in import and export, computing the monthly MAPE for each commodity. A subset of the results is illustrated in Figure 5.
We systematically computed the monthly MAPE between the predicted and actual values for all imported and exported commodities. Additionally, we derived the average MAPE across all commodities. By counting, for each month, the commodities whose monthly MAPE exceeded their average MAPE, we obtained a cumulative count, as illustrated in Figure 6. This metric serves as an indicator, highlighting specific months that are inherently more challenging to forecast than others. From this analysis, it is clear that the ARIMA forecasts for import and export commodity values are more accurate from April to September, with fewer instances where the monthly MAPE exceeds the average MAPE. Conversely, the months from January to March and October to December present greater forecasting challenges, likely influenced by global events such as New Year, Chinese Lunar New Year, and Christmas, which can disrupt trade patterns. Given these insights, we designate April to September as the support set and the remaining months as the query set for meta-learning.
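The counting procedure can be sketched as below. The data layout (one MAPE value per calendar month per commodity), the choice of six support months, and all function names are assumptions made for illustration, and the toy values are not results from this study.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def count_hard_months(monthly_mape):
    """monthly_mape: {commodity: {calendar_month (1-12): MAPE}}.

    Returns {month: number of commodities whose MAPE in that month exceeds
    that commodity's own average MAPE}.
    """
    counts = {m: 0 for m in range(1, 13)}
    for per_month in monthly_mape.values():
        average = sum(per_month.values()) / len(per_month)
        for month, value in per_month.items():
            if value > average:
                counts[month] += 1
    return counts


def split_support_query(counts, n_support=6):
    """Months with the fewest exceedances form the support set."""
    ranked = sorted(counts, key=lambda m: (counts[m], m))
    return sorted(ranked[:n_support]), sorted(ranked[n_support:])


# Toy data: mid-year months are easy (5%), the rest are hard (15%).
toy = {name: {m: (5.0 if 4 <= m <= 9 else 15.0) for m in range(1, 13)}
       for name in ("Cu", "Coal")}
counts = count_hard_months(toy)
support, query = split_support_query(counts)
print(support, query)
```

On the toy input the split reproduces the April to September support set described above; with real ARIMA errors the ranking would come from the counts plotted in Figure 6.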

Performance Comparison and Meta Knowledge Adaptation (RQ3)
(1) Performance Comparison. From Tables 3 and 4, both the TFSTL and Fine-Tuned Meta-TFSTL models excel in predicting import and export value series. Notably, the Fine-Tuned Meta-TFSTL surpasses TFSTL, demonstrating effective knowledge adaptation through fine-tuning.
For imported commodity value series, traditional models such as LastValuePredictor, ARIMA, and VAR tend to have higher error rates, with VAR underperforming significantly. This discrepancy may stem from the series' inherent nonlinearities and a lack of manually engineered features. Machine learning models show reliable results, with Bagging being noteworthy. Deep learning models generally perform comparably, but N-Beats edges ahead, potentially due to its sequence decomposition approach. This indirectly supports the use of STL-based decomposition in TFSTL and Meta-TFSTL.
For the exported commodity value series, traditional models show varied results, with VAR's performance being notably poor. Among deep learning models, while differences are minimal, N-Beats holds a slight edge, reaffirming its efficacy in such prediction tasks.
(2) Meta Knowledge Adaptation. Building on the premise that the Meta-TFSTL model leverages the predictability of certain months to establish a foundational understanding of import trends and subsequently fine-tunes this knowledge with the more challenging months, we further explored its adaptability.
Shen et al. [4] posited the potential of leveraging economic formulas to predict export data using import data and vice versa, achieving commendable results. This highlighted plausible knowledge adaptation between import and export data, suggesting that reusing such knowledge could enhance prediction accuracy. To empirically validate this hypothesis, we adopted a meta-learning approach in our study to harness this knowledge adaptation.
Building on this foundation, our experiments with the Meta-TFSTL model for both import and export predictions were designed to use the training and validation sets from one domain and fine-tune on the validation set of the other. Specifically, for import predictions, we trained on the export dataset and fine-tuned using the import validation set, reducing MAPE by nearly 2 percentage points relative to the TFSTL model. Conversely, for export predictions, the model was initially trained on the import dataset and fine-tuned with the export validation set, reducing MAPE by nearly 5 percentage points relative to the TFSTL model. This approach not only validated
The superior prediction performance for imported commodities over exported ones may reflect the relative stability of domestic demand influencing imports, the steadying impact of long-term tariffs and trade agreements, and the more exhaustive data acquisition for imports due to mandatory customs checks.
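The fine-tuning step can be pictured with a toy example in the spirit of ANIL, where only the output head is adapted while the feature extractor stays fixed. Everything below is a hypothetical sketch: the two-feature backbone, the learning rate, and the synthetic target data are assumptions, and the outer meta-update of the full algorithm is omitted.

```python
def backbone(x, body):
    """Frozen feature extractor: maps a scalar input to two features."""
    return [body[0] * x, body[1] * x * x]


def predict(x, body, head):
    return sum(h * f for h, f in zip(head, backbone(x, body)))


def mse(data, body, head):
    return sum((predict(x, body, head) - y) ** 2 for x, y in data) / len(data)


def adapt_head(data, body, head, lr=0.05, steps=5):
    """Inner-loop adaptation: gradient steps on the head only."""
    head = list(head)                     # copy; the body is never modified
    for _ in range(steps):
        grad = [0.0] * len(head)
        for x, y in data:
            feats = backbone(x, body)
            err = sum(h * f for h, f in zip(head, feats)) - y
            for i, f in enumerate(feats):
                grad[i] += 2.0 * err * f / len(data)
        head = [h - lr * g for h, g in zip(head, grad)]
    return head


body = [1.0, 1.0]                         # stands in for source-domain training
head = [1.0, 1.0]                         # head initialisation to be adapted
# Synthetic "target domain": y = 2.0 * x + 0.5 * x^2.
target = [(x, 2.0 * x + 0.5 * x * x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]

before = mse(target, body, head)
adapted = adapt_head(target, body, head, steps=5)
after = mse(target, body, adapted)
print(before, after)
```

Five adaptation steps mirror the five adaptations per iteration mentioned in the implementation details; the error on the target data falls while the shared representation is left untouched.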

Ablation Study (RQ4).
To investigate the effectiveness of various components of Meta-TFSTL, we compared it with six distinct variants. The ablation study results, presented in Tables 5 and 6, are organized into two sections to evaluate the effectiveness of different components within the Meta-TFSTL framework. The upper section of each table, above the line, comprises variants that employ alternative decomposition methods, including classical X-11 decomposition, Variational Mode Decomposition (VMD), and Discrete Wavelet Transform (DWT). The lower section assesses models from which key components have been removed, such as spatial and temporal graphs, the time series decomposition layer, or the fusion attention mechanism. This structured comparison highlights the integral role of these components, with the complete Meta-TFSTL model outperforming all its variants on import and export forecasting tasks, thereby underscoring the composite model's robustness and efficiency.
The ablation study reveals that while the Meta-TFX11 and Meta-TFVMD variants offer alternative approaches by employing X-11 decomposition and Variational Mode Decomposition (VMD), respectively, they do not match the performance of the full Meta-TFSTL model. The Meta-TFX11 variant, despite utilizing the classical X-11 decomposition method for adjusting seasonal fluctuations, may not be as effective in capturing the nonlinear and complex patterns present in trade value series, leading to its lower performance. Similarly, the Meta-TFVMD variant, while adept at decomposing the trade value series into intrinsic mode functions, might oversimplify the intricate economic trends and seasonal dynamics, which are crucial for accurate forecasting. This simplification could be the reason for its suboptimal results compared to Meta-TFSTL. Furthermore, the Meta-TFWavelet variant significantly underperforms relative to Meta-TFSTL, likely due to the reduction in time steps after the Discrete Wavelet Transform (DWT) and the potential loss of series information during inverse filtering for upsampling. Additionally, wavelet decomposition may not capture economic trends and seasonal fluctuations as effectively as STL, contributing to Meta-TFWavelet's inferior performance.
The "-G," "-F," and "-DF" models do not perform as well as the Meta-TFSTL model, likely due to the absence of graph embedding information, the replacement of fusion attention, and the omission of the disentangling flow layer, respectively. These components are crucial for the model's capability in information integration, complex pattern modeling, and relationship extraction, underscoring their importance within the model.
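The reduction in time steps mentioned for the wavelet variant is easy to see with a single-level Haar transform. The Haar wavelet is chosen here only because it is the simplest case; the text does not state which wavelet the Meta-TFWavelet variant uses, and the function names are hypothetical.

```python
import math


def haar_dwt(x):
    """One level of the Haar DWT; len(x) must be even."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleaves the reconstructed pairs."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / s, (a - d) / s])
    return x


series = [4.0, 2.0, 5.0, 7.0]
approx, detail = haar_dwt(series)
exact = haar_idwt(approx, detail)                 # uses both coefficient sets
smoothed = haar_idwt(approx, [0.0] * len(detail)) # detail discarded
print(len(series), len(approx), smoothed)
```

Each coefficient sequence has half as many time steps as the input, and upsampling from the approximation alone returns pairwise averages rather than the original values, which illustrates the information loss discussed above.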

Parameter Sensitivity Analysis (RQ5).
Figure 7 presents the results of a parameter sensitivity analysis for merchandise import and export value sequences. The top row of three graphs relates to the import model's hyperparameter variations, while the bottom row pertains to the export model. For the import model, the hidden layer size and batch size were varied within a search space of [32, 64, 128, 256], while for the export model, the search space was extended to [32, 64, 128, 256, 512]. The import model achieves minimum prediction error with both hidden layer size and batch size set at 64, suggesting that further increases may lead to overfitting and decreased predictive performance. Conversely, the optimal outcome for the export model is achieved with a hidden layer size of 256 and a batch size of 128, indicating a higher predictive complexity for exports. Additionally, the performance of Meta-TFSTL improves with an increasing number of layers, stabilizing at a count of 2.

Model Scalability and Computation Cost (RQ6)
(1) Model Scalability. The scalability of neural network models is a crucial factor in their applicability to time series forecasting, particularly as the volume of data available for training grows (Figure 8). Our analysis, leveraging MAPE as the performance metric, reveals Meta-TFSTL's consistent superiority in scalability and predictive accuracy across all evaluated dataset sizes. Starting with a MAPE of 14.86% at 20% dataset size, Meta-TFSTL improves steadily, achieving a MAPE of 10.13% at full dataset utilization. This contrasts with other models, which, despite showing improvements, do not match the efficiency and accuracy of Meta-TFSTL, highlighting its robustness and effectiveness in leveraging larger data volumes for enhanced forecasting accuracy.
(2) Computation Cost. The results from Figure 9 highlight a significant disparity in computational costs, manifesting through both speed and parameter count. Conventional RNN architectures, like LSTM and GRU, display moderate speeds with subpar performance. Their relatively smaller parameter count makes them computationally lightweight and simpler in design. On the other hand, models like DeepAR and DeepVAR seem to prioritize model intricacy with a more compact parameter footprint. However, their elevated MAE suggests potential compromises in their performance. The N-Beats model showcases a notably high parameter count, hinting at a complex model architecture. This complexity, however, does not necessarily translate to superior performance as its MAE is middling. The N-Hits and TFT models strike a balance between speed, performance, and parameter count.
Interestingly, the Meta-TFSTL model emerges as a frontrunner in terms of performance, boasting the lowest MAE. With its highly parallelized design and transformer-based architecture, it achieves the fastest speed among the models, despite its substantial parameter tally. Such a design choice is justifiable in applications where precision is paramount, even if it means increased computational overhead within a given timeframe.

Enhancing Trade Forecasting through Meta Knowledge Adaptation: A Meta-TFSTL Case Study.
Tables 3 and 4  significantly noticeable in both import and export sectors, as illustrated in Figures 10 and 11, with the model achieving exceptionally low MAPEs for commodities such as Cu (7.07%) and Agri (6.07%) in imports, and OAP (3.82%) and PP (4.99%) in exports, highlighting its predictive accuracy. The model's robustness is particularly noteworthy in its handling of commodities known for their market volatility, such as Coal and Textile in exports, with MAPEs of 13.21% and 23.22%, respectively. This showcases Meta-TFSTL's capability to forecast effectively within unpredictable commodity markets, where its adaptability and analytical prowess are paramount.
The essence of the Meta-TFSTL model's success lies in its adaptation of knowledge between import and export data, leveraging inherent patterns to enhance predictions. This adaptability is key, demonstrating the model's analytical capabilities and consistent performance over baseline models in volatile market conditions. It also underscores the critical role of knowledge adaptation in forecasting market trends accurately. Through meta-learning, Meta-TFSTL delivers dependable forecasts, crucial for strategic decision making, thus underscoring its value in commodities trading.

Discussion
While our Meta-TFSTL model demonstrates promising results in forecasting trade values, its practical applicability in real-world scenarios entails navigating a complex landscape of data availability, model interpretability, and adaptability to sudden market changes. Below, we detail the model's real-world applicability and delineate pivotal challenges alongside prospective enhancements.
In conclusion, Meta-TFSTL represents a significant advance in trade forecasting. However, to fully realize its practical utility, it is essential to address these challenges through focused improvements, leveraging interdisciplinary collaboration and innovation to enhance the model's real-world applicability and inform strategic trade policy and economic planning.

Conclusion
In this study, we introduced Meta-TFSTL, a novel neural model that integrates Meta-Learning Enhanced Trade Forecasting with efficient multicommodity STL decomposition. Empirical evaluations demonstrated Meta-TFSTL's superiority over baseline models, offering significant improvements in forecasting accuracy with the added benefit of computational efficiency. Through strategic application of STL decomposition, dual-channel spatiotemporal encoding, and the use of Struc2Vec graph embedding for constructing spatial and temporal graphs, Meta-TFSTL successfully merges insights from trend and seasonal components. This is further augmented by the adoption of fused attention and multisupervision strategies during the decoding phase. Incorporating meta-learning and fine-tuning methodologies, we have established a framework for effective knowledge adaptation between import and export trade predictions, leveraging the shared insights across these domains. Looking ahead, we plan to introduce more complex methodologies to enhance the model's capabilities, ensuring that Meta-TFSTL continues to set benchmarks in trade forecasting accuracy and computational efficiency.
Economics, and the Emerging Interdisciplinary Project of CUFE.

Figure 1: The architecture of the proposed Meta-TFSTL. FC: fully connected layer. In ANIL, the model's feature-extracting backbone remains static (Regular Optimization), while the output head undergoes gradient descent updates (Meta-Learning-Based Optimization). This ensures stable foundational representations while allowing rapid task-specific adaptations.

4.4. Dual-Channel Multitasking Decoder. To transform the representation encoded by the dual-channel encoder into future representations for multistep import and export commodity value series prediction, this study employs a predictor (i.e., a fully connected layer) on the time dimension of $X^{gat}_{tr}$ and $X^{gat}_{s} \in \mathbb{R}^{T_1 \times N \times d}$. Through the predictor, future representations of the trend and seasonal components, $\hat{Y}^{f}_{tr}$ and $\hat{Y}^{f}_{s} \in \mathbb{R}^{T_2 \times N \times d}$, are obtained. Subsequently, fusion attention and multitasking supervision merge the information of trend and seasonal components, acquiring knowledge through the supervision of the trend component.
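One way to picture how attention can merge the two channels is the sketch below, written for a single node with the trend representation acting as the query. It uses standard scaled dot-product attention. Summing the two attention outputs is an assumption made purely for illustration, and the function names are hypothetical; this is not the exact fusion formula of Meta-TFSTL.

```python
import math


def softmax(scores):
    peak = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over time steps (lists of d-dim vectors)."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


def fuse(trend, seasonal):
    """Trend is the query in both attentions; outputs are summed (assumption)."""
    from_trend = attention(trend, trend, trend)
    from_seasonal = attention(trend, seasonal, seasonal)
    return [[a + b for a, b in zip(r1, r2)]
            for r1, r2 in zip(from_trend, from_seasonal)]


trend = [[1.0, 0.0], [0.0, 1.0]]            # T2 = 2 future steps, d = 2
seasonal = [[0.5, 0.5], [0.5, 0.5]]
fused = fuse(trend, seasonal)
print(fused)
```

Because the query comes from the trend channel in both attentions, the trend decides which future time steps of each channel are emphasised, matching the description of trend-guided fusion above.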

5.1. Dataset. This study utilizes the monthly trade value series of imported and exported commodities from China between January 2005 and January 2023, sourced from Global Trade Flow (https://gtf.sinoimex.com). The trade values are tallied once a month and are denominated in US dollars. This dataset encompasses all of China's trade commodities in recent years. Based on the 2022 customs duty specifications, inspection and quarantine codes set by China Customs, and the globally accepted HS8 codes, this study categorizes the commodities into 13 major classes for both imports and exports. The commodities and their respective abbreviations are presented in Table 1.

Figure 3: STL decomposition analysis across multiple commodities showcasing the identified 10-month cyclicality and the distinct upward trends, reinforcing STL's adaptability in capturing economic series characteristics.

Figure 4: Robust and nonrobust STL decomposition of the value sequence of imported commodity Cu.

Figure 5: The red horizontal line represents the Mean Absolute Percentage Error (MAPE) of the ARIMA prediction results across the entire dataset (comprising the training, validation, and test sets). The blue bars depict the MAPE values for specific months, highlighting several months whose MAPE significantly exceeds the overall MAPE. These months are of particular interest and warrant further investigation.

(1) Meta-TFX11 (Trade Forecasting via X-11-Decomposition-based Networks): this variant uses the classical X-11 decomposition method [57] for analyzing and adjusting seasonal fluctuations in the trade value series.
(2) Meta-TFVMD (Trade Forecasting via Variational Mode Decomposition-based Networks): this model employs Variational Mode Decomposition (VMD) [58] for decomposing the trade value series into a set of intrinsic mode functions.
(3) Meta-TFWavelet (Trade Forecasting via Wavelet-Decomposition-based Networks): this variant employs the Discrete Wavelet Transform (DWT) [59] instead of STL for decomposing the trade value series.
(4) w/o G: a version of Meta-TFSTL without both spatial and temporal graphs.
(5) w/o D: a version of Meta-TFSTL without the time series decomposition layer.
(6) w/o F: a version of Meta-TFSTL where fusion attention is replaced with additive operations.

Figure 8: Comparative scalability analysis of neural network models for time series forecasting, highlighting Meta-TFSTL's superior performance.

Figure 9: Performance (y axis), speed (x axis), and parameter count (size of the circles) of methods.
, which integrates the representations of the trend and seasonal components, $\hat{Y}^{f}_{tr}$ and $\hat{Y}^{f}_{s}$, into the future trade value sequence $\hat{Y}^{f} \in \mathbb{R}^{T_2 \times N \times d}$. This mechanism captures future internal dependencies by considering the trend component as the query, extracting useful long-term and short-term information from both the trend and seasonal components within two temporal attentions. The fusion attention mechanism is expressed as follows: $\hat{Y}^{f} = \mathrm{Concat}(fa_1, \ldots, fa_n, \ldots, fa_N)$, where $fa_n$ = Attention($\hat{Y}$

Table 1: Abbreviations for imported and exported commodities.

Table 3: Performance comparison of various models for imported commodity value series.
Bold: best; underline: second best; italics: best in baseline.

Table 4: Performance comparison of various models for exported commodity value series.

Table 5: Ablation study results for imports.

Table 6: Ablation study results for exports.
6.1. Real-World Applicability of the Model
(1) Data Availability and Quality. The performance of Meta-TFSTL heavily relies on access to accurate, detailed, and current trade data. Challenges such as delays in data collection, inconsistencies across international trade databases, and restrictive data policies can hinder model effectiveness. Enhancing collaborations with global trade organizations and exploring alternative data sources, like satellite imagery, could improve data quality and enrich model inputs.
(2) Model Interpretability. The ability to interpret model predictions is crucial for trade policy and economic decision making. Despite its accuracy, the complex architecture of Meta-TFSTL may not be easily understandable, emphasizing the need to incorporate Explainable Artificial Intelligence (XAI) techniques to clarify the model's predictive processes and build trust among stakeholders.
(3) Adaptability to Market Fluctuations. The dynamic nature of global trade, influenced by geopolitical, economic, and policy changes, requires a model that can quickly adapt. Integrating live economic indicators and sentiment analysis could enhance Meta-TFSTL's responsiveness, allowing for timely model updates in response to changing global trends.

(1) Bilateral Trade Dynamics. The model might not fully capture the complexities of bilateral trade agreements and policies. Developing a more nuanced approach that considers tariff negotiations, trade barriers, and bilateral agreements could offer a deeper understanding of global trade flows.
(2) Market Scalability. While Meta-TFSTL shows promising results for China's trade data, extending its applicability to diverse economic systems and trade regulations worldwide is challenging. Future research should aim to test and adapt the model across different global markets to achieve broad applicability and scalability.