A Deep Transfer NOx Emission Inversion Model of Diesel Vehicles with Multisource External Influence

Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230088, China; School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China; School of Computer Science and Technology, Anhui University, Hefei 230601, China; Department of Automation, University of Science and Technology of China, Hefei 230026, China; Institute of Advanced Technology, University of Science and Technology of China, Hefei 230088, China; Key Laboratory of Technology in GeoSpatial Information Processing and Application System, Chinese Academy of Sciences, Beijing 100192, China; Key Laboratory of Environmental Optics & Technology, Anhui Institute of Optics and Fine Mechanics, Chinese Academy of Sciences, Hefei 230031, China; Institute of Advanced Technology, University of Science and Technology of China, Hefei 230088, China


Introduction
With the rapid development of China's urbanization and economy, China's motor vehicle fleet is growing rapidly, and China has been the world's largest producer and market for motor vehicles for eleven consecutive years. At the same time, air pollution caused by motor vehicle emissions is becoming increasingly serious. It has become an important source of air pollution in large and medium-sized cities in China, as well as an important cause of fine particulate matter and photochemical smog pollution. As an important tool for the quantitative accounting of mobile source pollution, an emission inventory can support air pollution control measures and traceability analysis. Current mobile source emission regulation relies mainly on the annual vehicle inspection; single-vehicle emission testing takes a long time, and the test results cannot fully reflect the actual emissions of vehicles on the road, which makes dynamic regulation of mobile source emissions difficult. By installing on-board diagnostics (OBD) on tested vehicles [1], after-treatment exhaust emissions can be monitored in real time, providing data support for constructing dynamic mobile source emission inventories. However, because of data privacy concerns and equipment installation costs, monitoring equipment cannot be installed on all on-road vehicles, and problems such as deliberate data tampering and equipment failure often lead to missing monitoring values, which greatly limits the application of OBD monitoring data in mobile source emission management. Therefore, for precise regulation of mobile source emissions, it is important to make better use of OBD monitoring data through reliable analysis of the features affecting emission detection and accurate prediction of missing monitoring data.
Existing emission estimation methods for mobile sources are mainly divided into average-speed-based models and actual-driving-cycle-based models. The former usually build a statistical regression model of pollution emissions based on the average speed of a fleet of vehicles and are typically used to estimate macrolevel traffic pollution emissions within a specific region (administrative district or city) for a specific time period (usually a quarter or a year). Typical models are the MOBILE model developed by the US Environmental Protection Agency (US EPA) [2], the EMFAC model developed by the California Air Resources Board (CARB) [3], and the COPERT model developed by the European Commission (EC) [4]. These models obtain emission features from standard bench test cycles and characterize vehicle emissions in terms of mean values such as average speed and average emission features, while ignoring the effects of actual road operating conditions, driving behavior, and vehicle dynamics on vehicle emissions. Driving-cycle-based emission models analyze the emissions of vehicles under different driving cycles over the complete driving process, using multidimensional operating-condition data such as the instantaneous speed and acceleration recorded while the vehicle is driving. They are suitable for analyzing single-vehicle emissions or calculating emissions for a specific set of roads. The main models in this category are the IVE model [5] and the CMEM model [6] developed by the University of California, Riverside (UCR), the MOVES model [7] developed by the EPA (which can estimate emissions based on both average speed and driving cycles), and the EMIT model [8] developed by the Massachusetts Institute of Technology (MIT).
Due to the lack of basic vehicle testing data, domestic research on emission factor models started late, and default values from foreign models were used directly in assessing local vehicle pollution emissions, resulting in large estimation errors. In recent years, with the development and application of vehicle emission monitoring systems, actual on-road emission features can be obtained and used to correct foreign emission models [9]. Quirama et al. [10] used PEMS to construct an energy-based microtrip operating model and estimate the actual energy consumption and exhaust emissions of a fleet in a given region. Tsinghua University developed an emission factor model for the Beijing vehicle fleet (EMBEV) based on mature foreign emission models, which integrates average speed and driving cycle to obtain both macro- and micro-emission factors [11]. Wang et al. [12] used a sequential decision strategy based on low-frequency vehicle GPS trajectories to estimate roadway speed and combined it with a microscopic emission model to estimate vehicle CO2 emissions. Wang et al. [13] considered the influence of the vehicle's historical operating state and constructed a microscopic emission model based on a BP neural network using short-time driving cycles. The traditional driving-cycle-based emission model uses manually designed parameters such as vehicle speed and acceleration to characterize the relationship between the driving cycle and pollution emissions, but it ignores engine operating state information and represents driving cycle characteristics inadequately, which makes it difficult to estimate the exhaust emissions of vehicles with missing monitoring data under different driving conditions.
With the boom in machine learning and deep learning research, some scholars have begun to introduce artificial intelligence techniques into mobile source emission estimation. Chen et al. [14] used a quantile regression forest on vehicle remote sensing data for NO prediction. Xu et al. [15] established a spatiotemporal graph convolutional multifusion network for predicting regional vehicle emissions in Hefei city. Xu et al. [16] combined a deep moment residual early-late fusion network with semisupervised geographically weighted regression to predict regional emissions as spatiotemporal series data. Altug and Kucuk [17] trained XGBoost with engine speed, engine torque, pedal position, and vehicle speed data as inputs to predict NOx emissions and compared it with elastic net and LSTM, showing its high accuracy. Fei et al. [18] proposed a multicomponent fusion temporal network to predict CO emissions considering multiple complex features. Xu et al. [19] constructed a mobile source emission prediction model based on a deep neural network to map vehicle transient operating conditions to pollution emissions. They further proposed a deep correction model for remote emission sensing data based on COPERT emission features, constructing a three-layer AutoEncoder network to extract features from multisource heterogeneous data such as meteorological data, road network data, traffic flow data, and urban functional areas [20].
In actual vehicle emission detection systems, differences in vehicle driving conditions, engine operating conditions, and driving behavior patterns make it impossible to ensure that the emission monitoring data of different vehicles always follow the same distribution. However, a traditional machine learning emission model usually assumes that the training set and testing set are drawn from the same data distribution, and a unified emission model is used to estimate different types of vehicles, ignoring the difference in monitoring data distribution. As shown in Figure 1, Label_S represents the NOx emission values of the diesel vehicle used for training the NOx prediction model on one kind of diesel vehicle; it is the label of the training set in the regression model. Label_T represents the NOx emission values of the diesel vehicle that is expected to reuse the knowledge of the prediction model obtained on the Label_S dataset; it is the label of the dataset for another kind of diesel vehicle in the regression model. The NOx emission distributions of the source domain and the target domain are different, which degrades the performance of a supervised model based on the independent and identically distributed assumption. The transfer learning technique [21] can provide a solution for constructing exhaust emission prediction models under different driving conditions by transferring knowledge from the data-complete source domain to the data-sparse target domain.
Inspired by transfer learning, a novel NOx emission inversion prediction method for diesel vehicles is proposed in this paper. Specifically, it is a deep transfer learning (DTL)-based model that first uses Spearman correlation analysis and Lasso feature selection to select, from multiple influence factors (e.g., throttle state and engine-related states), the factors highly correlated with NOx emission. Then, a stacked sparse AutoEncoder is used to map the emission data of different vehicle working conditions into the same feature space, and the distributions of these features are aligned by minimizing the maximum mean discrepancy (MMD) in the feature space. Finally, we validate the proposed method on real-world diesel vehicle OBD data. The results show that the proposed DTL model outperforms several deep learning (DL) methods, indicating that DTL based on multiple sources of external influences has great potential for diesel vehicle NOx emission prediction when monitoring data are insufficient. The rest of the article is organized as follows. Section 2 discusses the related works. Section 3 describes the construction of the DTL model. In Section 4, several experiments are conducted. The conclusions and future research are drawn in Section 5.

Related Works
2.1. Lasso. Using features unrelated to the prediction target as input variables increases the complexity and reduces the explanatory power of a regression model, so the relevant initial features must be selected. The least absolute shrinkage and selection operator (Lasso), proposed by Tibshirani [22], is a commonly used variable selection method in machine learning. It achieves variable selection by adding an L1-norm penalty so that the coefficients of some input variables are driven to exactly 0 during training. The loss function is as follows:
$$J(w) = \sum_{i=1}^{N} \left( y_i - w^{\top} x_i \right)^2 + \lambda \lVert w \rVert_1,$$
where $x_i$ and $y_i$ are the features and label of the $i$th of $N$ samples, $w$ is the coefficient vector, and λ is the penalty coefficient; the larger its value, the fewer variables are retained. Cross-validation is usually used to determine its optimal value.
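The effect of the penalty coefficient λ can be illustrated with a minimal sketch of Lasso solved by cyclic coordinate descent. This is not the implementation used in the paper; the solver choice, the function names, and the toy data are assumptions made for illustration only.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t (the proximal step of the L1 penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise sum_i (y_i - w.x_i)^2 + lam * sum_j |w_j| coordinate by coordinate."""
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    for _ in range(n_iter):
        for j in range(n_features):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n_samples):
                partial = sum(X[i][k] * w[k] for k in range(n_features) if k != j)
                rho += X[i][j] * (y[i] - partial)
            norm_j = sum(row[j] ** 2 for row in X)
            w[j] = soft_threshold(rho, lam / 2.0) / norm_j if norm_j > 0 else 0.0
    return w


# Hypothetical toy data: the label depends only on the first feature.
X = [[1.0, 0.5], [2.0, -0.3], [3.0, 0.2], [4.0, -0.4]]
y = [2.0, 4.0, 6.0, 8.0]
for lam in (0.0, 1.0, 200.0):
    print(lam, lasso_coordinate_descent(X, y, lam))
```

With a moderate λ the coefficient of the irrelevant second feature stays at exactly 0, and with a very large λ every coefficient is removed, which matches the statement that a larger λ retains fewer variables.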

2.2. SAE. AutoEncoder (AE) is a symmetric single-hidden-layer neural network [23]. It consists of an encoding module and a decoding module, where the encoding module runs from the input layer to the hidden layer, and the decoding module runs from the hidden layer to the output layer. After training, it copies the input to the output to the maximum extent possible, and the hidden layer provides an abstract representation of the input features in the feature space. The AE structure is shown in Figure 2. The input $x$ contains $m$ features, and the feature $h$ of the hidden layer is expressed as
$$h = f(w_1 x + b_1),$$
where $w_1$ is the weight from the input layer to the hidden layer, $b_1$ is the bias from the input layer to the hidden layer, and $f$ is an activation function; in this paper, we choose the sigmoid. The reconstructed feature $\hat{x}$ can be expressed as
$$\hat{x} = f(w_2 h + b_2),$$
where $w_2$ is the weight from the hidden layer to the output layer and $b_2$ is the bias from the hidden layer to the output layer. To ensure that $\hat{x}$ restores $x$ to the maximum extent, the following loss function is used:
$$J_{AE} = \frac{1}{2N} \sum_{i=1}^{N} \lVert \hat{x}_i - x_i \rVert^2,$$
where $N$ is the number of training samples. When the number of hidden layer neurons is smaller than the number of inputs, the AutoEncoder achieves data compression. However, a plain AutoEncoder may simply copy the input to the output during training, which makes it difficult to obtain meaningful feature representations. Recent research compensates for this drawback by adding constraints to the traditional AutoEncoder, resulting in various novel AutoEncoders, such as the Denoising AutoEncoder (DAE) [24], Sparse AutoEncoder (SAE) [25], and Variational AutoEncoder (VAE) [26].
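The encoding, decoding, and reconstruction steps described above can be sketched in a few lines. This is only an illustrative forward pass with hypothetical weights; the function names and the 1/(2N) scaling of the squared reconstruction error are assumptions, not details confirmed by the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W v + b for a weight matrix given as a list of rows."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) + b_i
            for row, b_i in zip(W, b)]


def encode(x, W1, b1):
    """Hidden features h = f(w1 x + b1) with a sigmoid activation."""
    return [sigmoid(z) for z in affine(W1, b1, x)]


def decode(h, W2, b2):
    """Reconstruction x_hat = f(w2 h + b2)."""
    return [sigmoid(z) for z in affine(W2, b2, h)]


def reconstruction_loss(batch, W1, b1, W2, b2):
    """Mean squared reconstruction error over the batch, scaled by 1/2."""
    total = 0.0
    for x in batch:
        x_hat = decode(encode(x, W1, b1), W2, b2)
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return total / (2.0 * len(batch))


# Hypothetical 3-feature input compressed to 2 hidden units (all weights zero).
W1, b1 = [[0.0] * 3 for _ in range(2)], [0.0, 0.0]
W2, b2 = [[0.0] * 2 for _ in range(3)], [0.0, 0.0, 0.0]
print(encode([1.0, 0.0, 0.5], W1, b1))
print(reconstruction_loss([[1.0, 0.0, 0.5]], W1, b1, W2, b2))
```

Training would adjust the weights by backpropagation to reduce this loss; the sketch stops at evaluating it.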
In SAE, KL divergence is added as a sparse penalty term to force only some of the neurons in the hidden layer to be activated. The KL divergence is expressed as follows:
$$KL(\rho \,\|\, \hat{\rho}_j) = \rho \log \frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_j},$$
where ρ represents the target probability of a hidden layer neuron being activated, which is generally set to a value close to 0, and $\hat{\rho}_j$ is the actual activation probability of the $j$th neuron in the hidden layer, which is expressed as follows:
$$\hat{\rho}_j = \frac{1}{N} \sum_{i=1}^{N} f_j(x_i),$$
where $N$ is the number of samples and $f_j(x_i)$ represents the activation of hidden layer neuron $j$ when the input is the $i$th sample.
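As a small worked example of the sparsity penalty, the sketch below computes the average activation of each hidden neuron over a batch and the resulting KL penalty. The function names and the sample activations are hypothetical.

```python
import math


def mean_activation(hidden_batch):
    """Average activation of each hidden neuron over all samples."""
    n = len(hidden_batch)
    n_hidden = len(hidden_batch[0])
    return [sum(h[j] for h in hidden_batch) / n for j in range(n_hidden)]


def kl_sparsity_penalty(rho, rho_hat):
    """Sum over hidden neurons of KL(rho || rho_hat_j) for Bernoulli activations."""
    return sum(
        rho * math.log(rho / r) + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - r))
        for r in rho_hat
    )


# Hypothetical sigmoid activations of two hidden neurons for two samples.
rho_hat = mean_activation([[0.1, 0.9], [0.3, 0.5]])
print(rho_hat, kl_sparsity_penalty(0.05, rho_hat))
```

The penalty is zero only when every neuron's average activation equals ρ and grows as neurons become more active than the target, which is what pushes the hidden representation toward sparsity.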
In addition, to prevent the network from overfitting, an L2-norm weight penalty is added to the loss function. In summary, the loss function of SAE is as follows:
$$J_{SAE} = J_{AE} + \alpha \sum_{j} KL(\rho \,\|\, \hat{\rho}_j) + \beta \lVert W \rVert_2^2,$$
where $J_{AE}$ is the reconstruction loss of the AutoEncoder, $W$ denotes the network weights, and α and β are the penalty coefficients of the sparse term and the weight term, respectively.
2.3. Stacked AutoEncoder. Compared to a single AutoEncoder, a stacked AutoEncoder can obtain hidden features that are more suitable for complex regression tasks. The stacked sparse AutoEncoder uses layer-wise unsupervised pretraining [27]. Specifically, after a simple sparse AutoEncoder is trained, the features of its hidden layer are used as the input to train a new sparse AutoEncoder, which can be described as n ⟶ m ⟶ n ⟹ m ⟶ k ⟶ m ⟹ · · · ⟹ s ⟶ t ⟶ s. When the required number of layers is reached, all hidden layers are combined in order to form a stacked sparse AutoEncoder.
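The layer-wise chain n ⟶ m ⟶ n ⟹ m ⟶ k ⟶ m can be made concrete with a short sketch that lists the shape of each AutoEncoder trained in turn. The helper names are hypothetical, and the expansion factor of 5 and the 6 input features are taken from the methodology section as an assumption about how every layer is sized.

```python
def layerwise_pretraining_plan(n_inputs, n_layers, expansion=5):
    """Return (input, hidden, output) sizes of each AutoEncoder trained in turn."""
    plan = []
    width = n_inputs
    for _ in range(n_layers):
        hidden = width * expansion
        plan.append((width, hidden, width))
        # The hidden features of this AutoEncoder feed the next one.
        width = hidden
    return plan


def stacked_encoder_dims(plan):
    """Layer sizes of the encoder obtained by stacking the trained hidden layers."""
    return [plan[0][0]] + [hidden for _, hidden, _ in plan]


plan = layerwise_pretraining_plan(n_inputs=6, n_layers=3)
print(plan)
print(stacked_encoder_dims(plan))
```

Only the encoder halves are kept when stacking, so the final network maps the original inputs through successively wider hidden layers.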

Domain Adaptation.
Domain adaptation (DA) [28] is a popular transfer learning method. It maps source features and target features with different distributions into the same space and draws the two distributions close together in that feature space, thus achieving distribution alignment. The objective function trained on the source data in the feature space can then be transferred to the target domain.
There are three main types of DA methods in deep learning: discrepancy-based domain adaptation, adversarial-based domain adaptation, and reconstruction-based domain adaptation.
Discrepancy-based domain adaptation measures the difference between the source and target domains with a chosen metric and aligns the two domains by minimizing this metric. In deep domain adaptation, Tzeng et al. [29] proposed a new CNN structure that performs domain adaptation by adding an adaptation layer and an MMD-based loss function and performs well on vision tasks; Zellinger et al. [30] proposed central moment discrepancy (CMD), which performs domain adaptation by aligning the central moments of each order between domains; Li et al. [31] proposed a DTN based on MMD that adapts both the marginal distribution and the conditional distribution, which shows advantages in image classification and recognition as well as text classification.
Adversarial-based domain adaptation is mainly achieved through adversarial training against a discriminator, where the generator aligns source and target data in the feature space. Tzeng et al. [32] combined discriminative modeling, untied weight sharing, and a GAN loss to propose Adversarial Discriminative Domain Adaptation (ADDA); Hoffman et al. [33] proposed Cycle-Consistent Adversarial Domain Adaptation (CyCADA), which performs cross-domain adaptation at both the pixel level and the feature level while ensuring semantic consistency. Shen et al. [34] proposed the WDGRL method, which optimizes the feature extraction network to reduce the Wasserstein distance in an adversarial manner.
Reconstruction-based domain adaptation performs domain adaptation by reconstructing the data, which ensures that the learned features retain the information of the input. Glorot et al. [35] proposed domain adaptation based on stacked denoising AutoEncoders (SDA) to extract higher-order semantic information; Bousmalis et al. [36] proposed the DSN framework, which uses a private encoder for each domain and a shared encoder to separate domain-specific features from common features, reconstructs the source and target data from these three encoder outputs with a common decoder, and uses the shared features for transfer.

Figure 2: The structure of AutoEncoder (encoder and decoder).

MMD.
Maximum mean discrepancy (MMD) is frequently used in transfer learning as a common means of measuring the difference between two domains. It is a kernel learning method that maps the original data into a Hilbert space and then measures the distance between the distributions of the two domains [37]. The specific metric formula is as follows:
$$\mathrm{MMD}[\mathcal{F}, X, Y] = \left\lVert \frac{1}{n_X} \sum_{i=1}^{n_X} k(x_i) - \frac{1}{n_Y} \sum_{j=1}^{n_Y} k(y_j) \right\rVert_{\mathcal{H}}^2,$$
where $k(\cdot)$ is the mapping of the original data into the Reproducing Kernel Hilbert Space (RKHS) $\mathcal{H}$, $X$ and $Y$ denote the samples of the two distributions (of sizes $n_X$ and $n_Y$), and $\mathcal{F}$ is the set of mapping functions.
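To make the MMD measure concrete, the following sketch computes the standard biased estimate of the squared MMD through the kernel trick, so the RKHS mapping never has to be evaluated explicitly. The Gaussian (RBF) kernel, its bandwidth, and the toy samples are assumptions; the paper does not state which kernel was used.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def mmd_squared(X, Y, kernel=rbf_kernel):
    """Biased estimate: mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""
    def mean_kernel(A, B):
        return sum(kernel(a, b) for a in A for b in B) / (len(A) * len(B))

    return mean_kernel(X, X) + mean_kernel(Y, Y) - 2.0 * mean_kernel(X, Y)


# Hypothetical source samples and two shifted copies standing in for target data.
source = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
near = [[a + 0.1, b + 0.1] for a, b in source]
far = [[a + 5.0, b + 5.0] for a, b in source]
print(mmd_squared(source, source), mmd_squared(source, near), mmd_squared(source, far))
```

The estimate is zero for identical samples and increases as the two samples move apart, which is why minimizing it pulls the source and target feature distributions together.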

Methodology
In this section, we introduce the details of the proposed model. As shown in Figure 3, we propose a deep transfer learning (DTL) model for NOx emissions from diesel vehicles based on multisource external influences. Spearman correlation analysis and Lasso-based feature selection are first used to find the features strongly correlated with diesel vehicle NOx emission. After that, a stacked sparse AutoEncoder is designed to extract the common hidden features of the source and target domains. The data of different vehicle models are aligned by minimizing the MMD distance between the source and target domains. Finally, the NOx emission model is transferred between vehicles with different data distributions.

Data Description.
The OBD data of diesel vehicles collected in Hefei in 2020 include license plate, terminal number, data date, engine speed, actual output torque percentage, engine water temperature, engine oil temperature, after-treatment downstream NOx value, after-treatment downstream oxygen percentage, atmospheric pressure, environmental temperature, after-treatment exhaust gas mass flow rate, urea tank level percentage, urea tank temperature, vehicle speed, gas pedal opening, word trip mileage, total mileage, engine instantaneous fuel injection, engine instantaneous fuel consumption rate, average engine fuel consumption, engine fuel consumption for a single trip, cumulative engine fuel consumption, battery voltage, fuel tank level, cumulative engine running time, longitude, latitude, SCR upstream temperature, and SCR downstream temperature. Table 1 compares the detailed parameters of the source domain diesel vehicle and the target domain diesel vehicle. To improve data quality, we preprocessed the data, including deduplication, outlier removal, and removal of irrelevant features. After preprocessing, the data statistics of the source domain diesel vehicle and the target domain diesel vehicle are shown in Tables 2 and 3.

Relevant Features Selection.
There are many features affecting NOx monitoring downstream of the diesel vehicle after-treatment system. For the preprocessed source data, we calculate the Spearman correlation coefficients between the downstream NOx value and the candidate features, such as after-treatment downstream oxygen percentage, engine speed, and engine water temperature, subject these coefficients to hypothesis testing at p = 0.05 (t = 1.645), and remove the external features that are not correlated with NOx emission; the remaining ones form the new feature set. The specific values of the Spearman coefficient and the t value are shown in Table 4, from which it can be seen that engine oil temperature, ambient temperature, urea tank temperature, and urea tank level percentage are not related to NOx emission at the critical value t = 1.645.
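The screening step can be sketched as follows: compute the Spearman coefficient of each candidate feature against NOx, convert it to a t statistic, and keep the feature only if the statistic exceeds the critical value. The use of the absolute t value, the column names, and the toy numbers are assumptions for illustration, not the paper's data or exact test procedure.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return result


def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column carries no rank information
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman coefficient = Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)); infinite for a perfect correlation."""
    if abs(r) >= 1.0 - 1e-12:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def select_features(columns, target, t_crit=1.645):
    """Keep features whose |t| exceeds the critical value."""
    return [name for name, values in columns.items()
            if abs(t_statistic(spearman(values, target), len(target))) > t_crit]


# Hypothetical toy columns, not values from the paper's dataset.
nox = [10, 20, 30, 40, 50, 60, 70, 80]
columns = {"oxygen_pct": [8, 7, 6, 5, 4, 3, 2, 1],
           "urea_level": [1, 8, 2, 7, 3, 6, 4, 5]}
print(select_features(columns, nox))
```

In this toy case the strongly (negatively) correlated column is kept, while the weakly correlated one falls below the critical value and is dropped.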
After the new features were identified, the Lasso algorithm was used to calculate the regression coefficient of each feature with respect to NOx, and the features whose coefficients were not 0 were taken as the final features. The Lasso coefficients of the features retained after Spearman correlation analysis are shown in Table 5; the Lasso coefficient of vehicle speed is 0, so vehicle speed is removed from the final features.
The new source data consisting of the features retained by Spearman and Lasso processing are denoted as $nX_S$. Their feature set $\mathrm{feature}_{nX_S}$ is subdivided by source into vehicle engine-related, vehicle throttle-related, and vehicle after-treatment system-related features, and the specific classification is shown in Table 6.
To ensure that the source features and the target features are the same, we take the feature set $\mathrm{feature}_{nX_S}$ of $nX_S$ as the benchmark and intersect it with the feature set $\mathrm{feature}_{X_T}$ of the target domain. The resulting feature set $\mathrm{feature}_{nX_T}$ forms the new target domain data $nX_T$, which is expressed as
$$\mathrm{feature}_{nX_T} = \mathrm{feature}_{nX_S} \cap \mathrm{feature}_{X_T}, \quad \mathrm{feature}_{nX_S} \subseteq \mathrm{feature}_{X_T}. \tag{9}$$
Since the monitoring elements for diesel vehicle emissions are consistent, $\mathrm{feature}_{nX_S}$ is a subset of $\mathrm{feature}_{X_T}$.
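The feature alignment above amounts to a set intersection that preserves the order of the source features. The sketch below is a minimal illustration with hypothetical column names; the explicit check that the source features form a subset of the target features mirrors the condition stated in the text.

```python
def align_target_features(source_selected, target_available):
    """Return the target feature list matching the selected source features."""
    available = set(target_available)
    missing = [name for name in source_selected if name not in available]
    if missing:
        # The method assumes the selected source features are a subset of the target's.
        raise ValueError(f"target data lacks features: {missing}")
    return [name for name in source_selected if name in available]


# Hypothetical column names for illustration.
source_selected = ["engine_speed", "torque_pct", "gas_pedal_opening"]
target_available = ["vehicle_speed", "gas_pedal_opening", "engine_speed", "torque_pct"]
print(align_target_features(source_selected, target_available))
```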

DTL.
After screening the external factors correlated with diesel vehicle NOx emissions, we obtain source and target data that share the same feature set and are highly correlated with vehicle NOx emissions. The features are engine speed, actual output torque percentage, engine water temperature, gas pedal opening, after-treatment downstream oxygen percentage, and after-treatment exhaust gas mass flow rate. Domain adaptation is achieved by projecting both datasets through a deep transfer network into a common space with a high-dimensional sparse representation and minimizing the MMD distance, which represents the difference in distribution between the source and target data.

Stacked Sparse AutoEncoder.
We take $nX_S$ as the input of the first layer of the stacked sparse AutoEncoder. The number of hidden layer neurons is set to 5 times the number of input features, α = 2, β = 0.01, and the target activation probability ρ of the hidden layer neurons is 0.05. We optimize the loss function $J_{SAE}$ by backpropagation and save the hidden layer weights after the network converges. Then, we use the hidden layer feature data as input and train a new sparse AutoEncoder according to the above steps. When the required number of stacked layers k is reached, the saved hidden layers are stacked. Table 7 shows the hidden feature dimensions for different numbers of stacked layers.
Journal of Advanced Transportation

Weight Sharing.
In order to learn the common hidden features of the source and target domains more quickly and efficiently, we use weight sharing as a means to transfer the weights of each layer of the stacked sparse AutoEncoder trained with the source data to the final deep transfer network.
Figure 3: The architecture of the proposed method.
Weight sharing is a common means in deep transfer learning [38][39][40]. After pretraining a stacked sparse AutoEncoder with source data, the weights $w_i$ and biases $b_i$ of each layer are shared with the new stacked sparse AutoEncoder to complete the weight transfer, described as
$$w_i^n = w_i, \quad b_i^n = b_i,$$
where $w_i^n$ and $b_i^n$ are the $i$th hidden layer weight and bias of the new stacked sparse AutoEncoder, and $w_i$ and $b_i$ are the $i$th hidden layer weight and bias of the trained stacked sparse AutoEncoder, respectively.
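The weight transfer can be sketched as copying each pretrained layer's parameters into a new network that is then fine-tuned. The dictionary-based layer representation and the function name are hypothetical; a deep copy is used here so that later updates to the new network do not alter the pretrained one, which is a design choice of this sketch rather than something stated in the paper.

```python
import copy


def share_weights(pretrained_layers):
    """Initialise a new stacked encoder with w_i^n = w_i and b_i^n = b_i."""
    return [{"W": copy.deepcopy(layer["W"]), "b": copy.deepcopy(layer["b"])}
            for layer in pretrained_layers]


# Hypothetical pretrained encoder with two hidden layers.
pretrained = [
    {"W": [[0.1, -0.2], [0.3, 0.4]], "b": [0.0, 0.1]},
    {"W": [[0.5, 0.6]], "b": [-0.1]},
]
transfer_net = share_weights(pretrained)
transfer_net[0]["W"][0][0] = 9.9  # fine-tuning the copy leaves the original intact
print(pretrained[0]["W"][0][0], transfer_net[0]["W"][0][0])
```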

Feature Transfer Learning.
To mix the source domain and the target domain into the same domain in the feature space, we feed $nX_S$ and $nX_T$ together into the new stacked sparse AutoEncoder as inputs and use MMD as the loss function. Therefore, the loss function of the stacked sparse AutoEncoder-based deep transfer network is as follows:
$$J_{DTL} = \mathrm{MMD}\left(\phi(nX_S), \phi(nX_T)\right),$$
where $\phi(\cdot)$ denotes the hidden features produced by the network at the layer where MMD is applied. By continuously minimizing the MMD, the distributions of the target domain and the source domain are effectively brought closer together in the new feature space. With the help of backpropagation, the gradient of this loss is used to update the network parameters.

Target Domain NOx Prediction.
The source data and the target data are projected onto the feature space by the deep transfer network, and the domain adaptation is completed. The transformation of the original features through the deep transfer network can be described by the following equation:
$$x_i \longrightarrow [x_{1,i}, x_{2,i}, \ldots, x_{s,i}],$$
where $x_i$ is the original feature in column $i$, $[x_{1,i}, x_{2,i}, \ldots, x_{s,i}]$ is the representation of $x_i$ in the feature space, and $s$ is a manually set parameter.
Since the NOx values downstream of the after-treatment are affected by nonlinear features, such as engine speed, oxygen percentage downstream of the after-treatment, and engine water temperature, we chose to use a BP neural network to build a regression prediction model.
After feature transfer, we divide $[\mathrm{Rec}X_S, Y_S]$ into training and validation sets at a ratio of 8 : 2, and $[\mathrm{Rec}X_T, Y_T]$ is used as the test set to construct a BP neural network model with two hidden layers. The mean square error (MSE) is chosen as the loss function of the whole regression network, the mean absolute error (MAE) is chosen as the evaluation index, and Adam is used as the optimizer. After the whole network converges, it is evaluated on the test set.
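The 8 : 2 division of the transferred source data can be sketched as below. The paper does not say whether the split is random or sequential; shuffling with a fixed seed is an assumption of this sketch, and the function name and toy data are hypothetical.

```python
import random


def split_source(features, labels, train_ratio=0.8, seed=0):
    """Shuffle sample indices and split them into training and validation sets."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    cut = int(round(len(indices) * train_ratio))

    def pick(idx):
        return [features[i] for i in idx], [labels[i] for i in idx]

    return pick(indices[:cut]), pick(indices[cut:])


# Hypothetical transferred source features and NOx labels.
rec_x_s = [[float(i)] for i in range(10)]
y_s = [10.0 * i for i in range(10)]
(train_x, train_y), (val_x, val_y) = split_source(rec_x_s, y_s)
print(len(train_x), len(val_x))
```

The target-domain data are not touched by this split because they serve only as the test set.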

Evaluation Metrics.
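We use mean absolute error (MAE) and root mean squared error (RMSE) to evaluate the prediction of NOx emissions. They are calculated as follows:
$$\mathrm{MAE} = \frac{1}{m} \sum_{i=1}^{m} \left| y_i - \hat{y}_i \right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2},$$
where $m$ is the number of samples, $y_i$ is the true value of the label, and $\hat{y}_i$ is the predicted value of the label.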
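Both metrics follow directly from their definitions, as the short sketch below shows on hypothetical values.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical true and predicted NOx values.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 6.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred))
```

Because RMSE squares the errors before averaging, a single large miss raises it more than it raises MAE, so reporting both gives a fuller picture of the error distribution.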

MMD Settings.
MMD can be placed at different positions in the network. We add MMD at different layers of the stacked sparse AutoEncoder for domain adaptation and compare the MAE and RMSE of the predicted values to select the optimal position as the final model setup. Table 8 compares the MAE and RMSE of the predicted values for sparse AutoEncoders with different numbers of layers and with MMD added at different layers, where n in SAE (n) represents the number of stacked sparse AutoEncoder layers, and the number in the DTL term represents the layer at which MMD is added. Since the feature dimension increases exponentially as layers are stacked, which also increases the training time, we only compare stacked sparse AutoEncoders with up to 3 layers. Based on the results in Table 8, we choose the sparse AutoEncoder with three stacked layers and add MMD at the second layer for domain adaptation.

Model Performance.
To verify the effectiveness of the proposed model, we compare its results with those of a traditional deep learning model. The traditional deep learning (DL) model assumes that the source domain and the target domain belong to the same distribution; it uses the source data as the training and validation sets to train a BP neural network and the target data as the test set. Figure 4 shows the predictions for 100 randomly selected data points, for which the proposed DTL model has a smaller prediction error and a better fit to the true values than the traditional DL model.
To further validate the effectiveness of the model, we conducted experiments using DTL and DL on the dataset without relevant feature screening (the effect is shown in Figure 5), keeping the training, validation, and test sets consistent with the previous experiments. In the regression prediction stage, Random Forest, Support Vector Regression (SVR), and AdaBoost were also tried as regressors. Table 9 compares the prediction results of the models, where DL represents the deep learning prediction model without considering the influence of external features on NOx emissions, DTL represents the deep transfer learning prediction model without considering the influence of external features on NOx emissions, nDL represents the deep learning prediction model considering the influence of external features on NOx emissions, and nDTL represents the deep transfer learning prediction model considering the influence of external features on NOx emissions.
Table 9 shows that the prediction error of the DL model without feature transfer is significantly higher than that of the DTL model, which demonstrates the effectiveness of feature transfer in unsupervised prediction. It can also be concluded that the data after feature screening are more favorable for predicting diesel vehicle NOx concentration and that, in this experiment, the neural network model performs better than the traditional machine learning models.
Figure 6 visualizes the source diesel vehicle and target diesel vehicle data before and after feature transfer using t-SNE dimensionality reduction. Figure 6(a) shows the distribution of the engine speed, real-time output torque percentage, engine water temperature, gas pedal opening, after-treatment downstream oxygen percentage, and after-treatment exhaust gas mass flow rate features of the source diesel vehicle and the target diesel vehicle after t-SNE dimensionality reduction. Figure 6(b) shows the distribution, after t-SNE dimensionality reduction, of the features reconstructed from the above features by the proposed deep transfer learning framework. The figure shows that, after training the DTL model, the data distributions of the source and target diesel vehicles are largely mixed in one domain, so domain adaptation is achieved.

Exploring the Influencing Features of NOx Emission.
The above experiments selected relevant features that have a significant effect on NOx and used them for effective prediction with the DTL model. To further investigate which category of features has more influence on NOx concentration, we trained DTL separately on each category of features according to the source division of influencing features in Table 6, predicted NOx, and obtained the MAE and RMSE of the predictions, as shown in Table 10.
The indicators in Table 10 show that DTL predicts best with the throttle-related features; that is, the degree of opening and closing of the throttle pedal has a great influence on the NOx emissions of diesel vehicles during driving. In the real world, the acceleration response of diesel vehicles is poor. If the gas pedal is pressed sharply at low speed, the fuel supply per cycle increases sharply, but because of the poor response, the engine speed does not increase much. This results in relatively weak air turbulence, which prolongs the combustion process, increases incomplete combustion, and eventually leads to increased NOx emission. When the throttle is released sharply, the sudden closing of the throttle causes engine combustion conditions to deteriorate and the engine to work unstably, and NOx emission increases. Therefore, the driver should operate the throttle smoothly when driving and avoid abruptly pressing or releasing the throttle pedal. Both engine-related and throttle-related features can be controlled in real time by the driver during driving. Proper driving behavior can greatly reduce pollutant emissions: when road and environmental conditions permit, a steady speed should be maintained and frequent speed changes should be avoided. In addition, NOx emissions from diesel vehicles can be effectively reduced through after-treatment systems.

Conclusion
In this paper, we propose a deep AutoEncoder transfer inversion model for NOx emission prediction of diesel vehicles that integrates multiple sources of external influences. The model transfers NOx emission patterns among different diesel vehicles and thereby improves the accuracy of diesel vehicle NOx emission prediction. For the OBD data of diesel vehicles, the features related to NOx emissions are selected using Spearman correlation analysis and Lasso feature selection. The selected features, namely engine speed, actual output torque percentage, engine water temperature, gas pedal opening, after-treatment downstream oxygen percentage, and after-treatment exhaust gas mass flow rate, have a strong correlation with NOx emissions. The designed DTL framework with distribution alignment jointly trains the network on diesel vehicles that have these strongly correlated features together with the corresponding NOx emission values and on diesel vehicles that have only these features, so that the data of different vehicle categories converge to the same distribution in the feature space. The objective function is then trained in the feature space using the diesel vehicle data that contain NOx emission values and transferred to the diesel vehicles without NOx emission values, which provides an effective method for predicting NOx from unlabeled diesel vehicle data. The analysis of external features from different sources shows that vehicle throttle-related features have a large impact on diesel vehicle NOx emissions, and reasonable control of the throttle state during driving is an important means of controlling diesel NOx emissions.
Future research can be extended in the following ways. (1) In potential feature extraction, other methods can be tried to find the abstract representation of the original features in the feature space. (2) In feature transfer, the MMD metric is used to measure the difference in distribution between two domains, and a more appropriate metric can be selected in later studies based on the dataset.

Data Availability
The data used to support the findings of this study have not been made available because of data ownership issues.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.