Deep Learning-Based Big Data Analytics for Internet of Vehicles: Taxonomy, Challenges, and Research Directions

The Internet of Vehicles (IoV) is a developing technology attracting attention from industry and academia. Hundreds of millions of vehicles are projected to be connected within IoV environments by 2035. Each vehicle in the environment is expected to generate massive amounts of data. Currently, surveys on leveraging deep learning (DL) in the IoV within the context of big data analytics (BDA) are scarce. In this paper, we present a survey and explore the theoretical perspective of the role of DL in the IoV within the context of BDA. The study unveils substantial research opportunities that cut across DL, IoV, and BDA. Exploring DL in the IoV within BDA is a nascent research area that requires active attention from researchers to fully understand the emerging concept. The survey proposes a model of an IoV environment integrated into the cloud and equipped with a high-performance computing server, a DL architecture, and Apache Spark for data analytics. Current developments, challenges, and opportunities for future research are presented. This study can guide expert and novice researchers in further developing the application of DL in the IoV within the context of BDA.


Introduction
By 2025, the massive ecosystem of the Internet of Things (IoT) is projected to pave the way for 100 billion connections. Thus, the IoT can revolutionize future industries [1]. The IoT has been extended to the Internet of Vehicles (IoV) [2] through the incorporation of intelligent transportation systems (ITS) for enhanced services [3]. The IoV allows vehicles to communicate with their internal and external environments. Vehicles can share information in different forms. For example, vehicles can communicate with sensors, road infrastructure, other vehicles, and the Internet [4]. The building blocks of the IoV are the connected vehicles.
The IoV evolution is driven by the dynamic mobile communication system with capabilities of gathering, sharing, processing, computing, and securing the release of information [5].
Over 90% of road accidents are caused by human errors. This finding has prompted the emergence of autonomous vehicles to eliminate drunk driving, sleeping while driving, and other human errors. As next-generation vehicles that usher in a new frontier of vehicle revolution worldwide, autonomous vehicles can reduce traffic congestion and improve energy efficiency. Different vehicle manufacturing companies, such as Volkswagen, Waymo, Tesla, Hyundai, Mercedes Benz, Baidu, BMW, and Ford, conduct test runs of autonomous vehicles. These autonomous vehicles will be merged into the IoV. Autonomous vehicles need to communicate with the internal and external environments for safety and smooth driving. These vehicles have attracted the attention of academia and industry because of their positive impact on society [6]. Reference [7] argued that the autonomous vehicle market is presently growing and is expected to hit USD 131.9 billion in 2019.
Autonomous vehicles are expected to flood European public roads by 2021 [2]. In China, 8.6 million autonomous vehicles are anticipated to hit public roads by 2035. Out of the 8.6 million vehicles, 3.4 million will be fully autonomous while the others will remain semiautonomous [8]. In the USA, hundreds of autonomous vehicles are expected to start operating on public roads in the near future. Autonomous vehicles equipped with sensors for communications to realize the IoV experience are estimated to account for 25% of the global automobile market between 2015 and 2040 [8]. Reference [9] pointed out that 200 sensors are projected to be embedded in each vehicle in 2020 to cope with the increasing communications with the environments. These sensors are expected to generate massive amounts of data. Reference [10] estimated that in 2021, 380 million connected vehicles will be running on public roads, and each was projected to generate 25 GB of data every hour. Reference [5] argued that the IoV will generate more information than the telecommunication industry. For instance, the smart processes of collecting, processing, and releasing dynamic traffic information emanating from various sources within a city will require a petabyte-scale system. Therefore, the IoV ushers in the big data arena.
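As a rough check of the scale implied by the figures quoted from [10], the aggregate volume can be computed directly. The sketch below assumes decimal units (1 EB = 10^9 GB) and that every connected vehicle generates data at the same time; both are simplifying assumptions for illustration and are not taken from the cited source.

```python
# Back-of-the-envelope estimate using the figures quoted from [10].
# Assumptions (not from the source): decimal units, all vehicles active at once.
CONNECTED_VEHICLES = 380_000_000   # connected vehicles projected for 2021
GB_PER_VEHICLE_HOUR = 25           # data generated per vehicle per hour
GB_PER_EXABYTE = 10**9             # 1 EB = 10^9 GB in decimal units


def fleet_volume_eb(vehicles: int, gb_per_hour: float, hours: float = 1.0) -> float:
    """Return the aggregate data volume in exabytes for the given duration."""
    return vehicles * gb_per_hour * hours / GB_PER_EXABYTE


hourly_eb = fleet_volume_eb(CONNECTED_VEHICLES, GB_PER_VEHICLE_HOUR)
print(f"{hourly_eb:.1f} EB per hour")
```

Under these assumptions, the fleet would produce about 9.5 EB per hour, which is consistent with the claim that the IoV belongs to the big data arena.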
Deep learning (DL) plays a critical role in big data analytics (BDA) because of its capacity to process big data to uncover knowledge from complex systems [11]. DL searches for the network elements or features with respect to the input data by mimicking how the human brain operates to generate the best solution [12]. Different from conventional techniques, DL can deal with raw natural data [13]. Deep neural networks have won multiple awards in pattern recognition competitions [14]. In machine learning, DL is currently the most active theme [15]. DL is expected to record more successes in the near future because its architecture requires minimal human effort in engineering [13].
Despite the success of DL in different domains and the unprecedented attention it currently receives from researchers, the empirical exploration of DL in the IoV within the context of BDA is highly limited in the literature. We believe that exploring DL in the IoV within the context of BDA can improve the effectiveness and efficiency of the IoV as a key component in decision making.
The IoV is an emerging concept in its early stage. Therefore, a theoretical viewpoint is required to guide the effective empirical applications of DL in the IoV within the context of big data.
We present a survey and a theoretical perspective on leveraging DL in the IoV within the context of BDA. The intention is to stimulate the research community to focus on exploring DL in the IoV within the context of BDA. This approach can unveil valuable knowledge from the large-scale data expected to be generated from the IoV. Exploring the theoretical aspect of big data is crucial [16] for its empirical application. The remainder of this paper is organized as follows. Section 2 presents the rudiments of DL. Section 3 presents the concept and new taxonomy of the IoV. Section 4 emphasizes the case studies of the IoV. Section 5 introduces the BDA platform that supports DL in the IoV. The role of DL in the IoV in the context of BDA is presented in Section 6. Section 7 presents the proposed model of the IoV integrated into the cloud equipped with a high-performance computing server, DL models, and Apache Spark. Section 8 outlines the research challenges and future research opportunities. Lastly, the concluding remarks are presented in Section 9.

Deep Learning Architecture and Applications
In this section, we provide a brief description of DL and of some major DL architectures and their variants, given the limited scope of this study. The main application domains of DL architectures are outlined. A simple taxonomy of the architecture and applications of DL is presented in Figure 1.
The major DL architectures discussed are as follows: deep belief network (DBN), generative adversarial network (GAN), and convolutional neural network (ConvNet).
DL is the branch of machine learning that allows computers to learn from experience and comprehend the hierarchy of a concept in the world [17]. DL includes computational models that permit the composition of multiple layers of processing elements to learn the representation in datasets with multiple levels of abstraction. DL uses the backpropagation algorithm to uncover complex structure in large-scale datasets. The DL algorithms and architectures newly proposed in the literature are geared toward minimal human effort in engineering [13].

Deep Belief Network.
The DBN architecture (Figure 2) is a deep ANN that comprises a sequential arrangement of unsupervised restricted Boltzmann machines (RBMs). We discuss the basic concepts of the RBM for easy understanding of the DBN and how it works to achieve its goal. The RBM is the major building block of the DBN. The RBM is a stochastic two-layered ANN that has hidden and visible layers, as shown in Figure 3. It is restricted because connections between neurons on the same layer are not allowed. The data representation in the RBM occurs in the visible units, and the learning that represents features capturing the higher-order correlation in the experimental data occurs in the hidden layer. The visible and hidden layers are connected by a matrix of symmetric weights $W$ [18]. The computation of the weights in the RBM assumes that the probability distribution of the input vector $x$ can be expressed as follows:

$$p(x; W) = \frac{1}{Z(W)} e^{-\ell(x; W)},$$

where $Z(W) = \sum_{x} e^{-\ell(x; W)}$ is the normalizing constant. In the architecture of the DBN, the hidden layer of one RBM is visible to the subsequent RBM [19]. The main idea of the DBN is that the weight $W$ learned by the RBM defines both $p(v \mid h, W)$ and the prior distribution over the hidden vectors $p(h \mid W)$. Accordingly, the probability of generating $v$ can be expressed as follows:

$$p(v) = \sum_{h} p(h \mid W)\, p(v \mid h, W).$$

When $W$ is learned, $p(v \mid h, W)$ is maintained. However, $p(h \mid W)$ is substituted by a superior model of the aggregated posterior over the hidden vectors [18]. The weights of the DBN are learned using the contrastive divergence approach to avoid being stuck in local minima and to improve the speed of convergence, contrary to the typical Markov Chain Monte Carlo approach [20]. The training of the DBN involves two phases, namely, unsupervised and supervised.
Unsupervised training is performed by the contrastive divergence algorithm to determine the initial weights using sample input data, whereas supervised training is performed by the backpropagation algorithm to obtain the final and optimal weights [21,22]. The DBN can be used to reduce high-dimensional data to low-dimensional data without losing accuracy [23].
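The unsupervised phase can be illustrated with a minimal sketch of one-step contrastive divergence (CD-1) for a single binary RBM. The update rule is the standard CD-1 rule rather than a procedure taken from the cited works, and the toy patterns, layer sizes, learning rate, and helper names (`hidden_probs`, `visible_probs`, `cd1_update`) are hypothetical choices made only for illustration.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sample(probs, rng):
    """Draw binary states from independent Bernoulli probabilities."""
    return [1.0 if rng.random() < p else 0.0 for p in probs]


def hidden_probs(v, W, c):
    # p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i * W[i][j])
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    # p(v_i = 1 | h) = sigmoid(b_i + sum_j W[i][j] * h_j)
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]


def cd1_update(v0, W, b, c, lr, rng):
    """Apply one CD-1 update in place and return the reconstruction error."""
    ph0 = hidden_probs(v0, W, c)
    h0 = sample(ph0, rng)
    pv1 = visible_probs(h0, W, b)      # one Gibbs step back to the visible layer
    ph1 = hidden_probs(pv1, W, c)
    for i in range(len(b)):
        for j in range(len(c)):
            # positive-phase statistics minus negative-phase statistics
            W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
        b[i] += lr * (v0[i] - pv1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])
    return sum((v0[i] - pv1[i]) ** 2 for i in range(len(b)))


rng = random.Random(0)
data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 10   # hypothetical toy patterns
n_visible, n_hidden = 6, 2
W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
b = [0.0] * n_visible
c = [0.0] * n_hidden

errors = []
for epoch in range(200):
    errors.append(sum(cd1_update(v, W, b, c, 0.1, rng) for v in data))
print(f"reconstruction error: first epoch {errors[0]:.2f}, last epoch {errors[-1]:.2f}")
```

In a DBN, the hidden activations of a trained RBM would serve as the visible data of the next RBM in the stack, and the resulting weights would then be fine-tuned with backpropagation in the supervised phase.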

Deep Recurrent Belief Network.
The DBN suffers from a vanishing gradient as the delay increases, which makes learning over long time delays difficult. A deep recurrent belief network (DRBN) with distributed time delay is proposed to avoid this problem. A Gaussian network is applied to initialize the weights of every hidden neuron. Markov Chain Monte Carlo is used to evolve the dynamic Gaussian Bayesian network over the training samples to initialize the weights of the hidden neurons [19].

Discriminative Deep Belief Network with Ant Colony Optimization.
The DBN and the discriminative (D) feature of backpropagation are combined to produce the DDBN. The optimum parameters of the DDBN are selected automatically without human intervention through ant colony optimization (ACO) to avoid the laborious trial and error in DDBN parameter selection. The resulting algorithm combines DBN, D, and ACO to obtain the DDBN-ACO [24].

Adaptive Fractional Deep Belief Network.
Adaptive fractional DBN (AFDBN) is another variant of the DBN. In this variant, fractional calculus is used to generate the learning weights to obtain an optimal weight suitable for yielding optimum results. Learning by fractional theory is conducted by derivative theory [25].

Quaternion-Improved Harmony Search Deep Belief Network.
Quaternion-improved harmony search (QHS) is applied to fine-tune the parameters of the DBN (QHS-DBN) in quaternion search space. The harmony search algorithm is selected because of its efficiency in optimization. Moreover, the harmony search algorithm updates probable solutions one by one in a single iteration, not all at once. Thus, it is suitable for the fine-tuning of the DBN parameters [26].

Self-Organizing Deep Belief Network.
The self-organizing DBN (SODBN) based on the growing and pruning algorithm is the integration of the self-organizing ANN and the DBN. Different from the original DBN, the SODBN simultaneously considers its structure and learning algorithm. The best number of hidden layers and units can be determined automatically by the SODBN, and weight adjustment is performed during the dynamic self-organizing process of the structure [27].

Competitive Deep Belief Network.
The competitive DBN (CDBN) is constructed by introducing a competitive learning mechanism into the DBN. The competitive learning algorithm improves the discriminative information of the deep features among the groups [28].

Continuous Deep Belief Network.
The continuous DBN (CoDBN) deals with continuous data instead of the discrete data handled by the standard RBM. The continuous RBM (CRBM) is designed by introducing zero-mean Gaussian noise to the visible layer of the RBM. Thus, the RBM improves its capability to deal with continuous data. The CoDBN is constructed by sequentially arranging CRBMs and can work with continuous data [21,29,30].

Cost-Sensitive Adaptive Differential Evolution Deep Belief Network.
The cost-sensitive DBN (CS-DBN) with adaptive differential evolution (ADE) is introduced to eliminate the problem of the classical DBN in dealing with imbalanced data. The DBN does not work effectively on imbalanced data because it assumes an equal cost for every class. The misclassification cost is optimized before being embedded into the DBN to create the CS-DBN. The parameters of the CS-DBN are updated using adaptive differential evolution to construct the CS-DBN-ADE [15].

Generative Adversarial Network.
The GAN, shown in Figure 4, is a class of feedforward ANN. It is composed of two feedforward ANNs, namely, the generator (G) and the discriminator (D). The two networks, G and D, compete against each other. The adversary D evaluates the quality of the new candidates produced by G. The G model generates forged data from a random uniform space, whereas D differentiates between the forged data and the original data. The feedback from D in distinguishing forged from original data assists G in generating data with good precision without making reference to the original data. Thus, the G model is refined. This approach is the main idea behind the GAN. G and D are deep ANN models comprising many layers. The connections in the deep ANN model are arranged in such a manner that the output of the neurons in each layer is the input of the neurons in the next layer [31]. The objective of G is to learn the probability distribution of the training data to generate forged data as close as possible to the original data. On the contrary, the objective of the adversarial D is to distinguish the forged data from the original data, which penalizes G for generating detectable forgeries. This process continues, with G and D improving their abilities, until an equilibrium is reached where forged and original data cannot be distinguished. G is trained to deceive D into believing that the generated data are actual data. The training is a minimax game: G is trained to minimize the adversarial objective, whereas D is trained to maximize it.
This approach results in a competitive battle between G and D. A challenging issue in training the GAN is the lack of stability between its component networks. When the performance of G is significantly better than that of D in the competition, the complete GAN fails. In the initial stage, D gains superior performance to G, and G has to struggle to compete with D [31]. The core GAN capability includes image synthesis [32]. The GAN is also effective in fraud detection [33].
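The competition between G and D can be made concrete with the standard minimax value function of the original GAN formulation, V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))], which the text above describes only in words. The sketch below estimates V from hypothetical discriminator outputs; the probabilities are made-up values and the function name `value_function` is an assumption for illustration.

```python
import math


def value_function(d_real, d_fake):
    """Estimate V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))] from D's outputs."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# Hypothetical discriminator outputs (probability that a sample is original).
strong_d = value_function(d_real=[0.9, 0.95, 0.85], d_fake=[0.1, 0.05, 0.2])
fooled_d = value_function(d_real=[0.5, 0.5, 0.5], d_fake=[0.5, 0.5, 0.5])

print(f"D separates the data well: V = {strong_d:.3f}")
print(f"D cannot tell them apart:  V = {fooled_d:.3f}")
```

D is trained to push V up, whereas G is trained to push it down. When D outputs 0.5 for every sample, V equals -2 log 2, which corresponds to the equilibrium where forged and original data cannot be distinguished.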

Coupled Generative Adversarial Network.
The coupled GAN (CoGAN) mitigates the requirement for tuples of corresponding images in datasets from different domains.
The CoGAN has the ability to learn the joint distribution without requiring tuples of corresponding images. The joint distribution can be learned with samples drawn from the marginal distributions. A weight-sharing constraint is enforced to limit the capacity of the network and to favor the joint distribution solution over the product of the marginal distributions [34].

Coupled Generative Adversarial Stacked Autoencoder.
Coupled generative adversarial stacked autoencoder (CoGASA) is introduced to overcome the limitations of CoGAN, such as the inability to handle noisy datasets, high computational cost, and lack of potential for real-world applications.
The CoGASA can transfer data from one domain to another without difficulty in handling noisy datasets, and it has a lower computational cost [35].

Stacked Generative Adversarial Networks.
Stacked GAN (SGAN) comprises a stack of GANs in a top-down hierarchical representation. Each GAN in the stack is trained to generate lower-level representations conditioned on higher-level representations. The architecture also has a conditional loss for using conditional information and an entropy loss for maximizing a variational lower bound on the conditional entropy of the G outputs [36].

Conditional Generative Adversarial Network.
Conditional GAN (CGAN) is a variant of GAN that introduces a condition to the discriminator and generator. The CGAN is achieved by feeding extra information to the discriminator and the generator as an extra layer of input. The condition introduced to the GAN has the advantage of providing representation for multimodal data generation [37].
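One common way to feed the extra information, sketched below, is to append an encoding of the condition to the input of both networks. The one-hot class label, the vector sizes, and the helper names are hypothetical; the cited work may encode the condition differently.

```python
import random


def one_hot(label: int, num_classes: int) -> list:
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def generator_input(noise, label, num_classes):
    """Condition the generator by appending the label encoding to the noise."""
    return list(noise) + one_hot(label, num_classes)


def discriminator_input(sample, label, num_classes):
    """Condition the discriminator by appending the same encoding to the sample."""
    return list(sample) + one_hot(label, num_classes)


rng = random.Random(0)
z = [rng.uniform(-1.0, 1.0) for _ in range(4)]   # hypothetical 4-dimensional noise
g_in = generator_input(z, label=2, num_classes=3)
d_in = discriminator_input([0.2, 0.7], label=2, num_classes=3)
print(len(g_in), g_in[-3:])
print(len(d_in), d_in[-3:])
```

Because both networks see the same condition, G can be asked for a sample of a particular class and D judges the sample against that class, which is what enables multimodal generation.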

Deep Convolutional Generative Adversarial Network.
Deep convolutional GAN (DCovGAN) is created to extend the supervised ConvNet to an unsupervised deep ConvNet + GAN. The spatial down- and upsampling operators in the DCovGAN are learned during training through strided and fractionally strided convolutions. The DCovGAN is a strong DL architecture for unsupervised learning [38].

Convolutional Neural Network.
ConvNet was proposed in [39] and subsequently modified as LeNet-5 to improve its effectiveness and efficiency [40] for the classification of handwritten digits. The architecture of the ConvNet comprises input, hidden, and output layers, as shown in Figure 5. The hidden layers of the ConvNet are composed of convolutional, pooling, fully connected, and normalization layers, and individual features are usually extracted by different layers of the ConvNet in a high-dimensional structure [41]. When an input is supplied to the ConvNet, the convolutional layer applies convolutional operations to the input before the result is passed to the next layer. Each neuron in a feature map is connected to a receptive field in the previous layer. The convolution imitates the response of a neuron to a visual stimulus. The role of the convolutional layer is to reduce the high number of free parameters required for training the ConvNet, especially for large inputs such as images. Thus, the ConvNet allows the entire network to be deep with few parameters. Therefore, the vanishing gradient problem associated with training the classical deep ANN using the backpropagation algorithm is alleviated. Global or local pooling may be included in the convolutional network. The pooling layer integrates the results yielded by a cluster of neurons into a single neuron at the subsequent layer. Maximum (max) or average pooling can be applied to each of the clusters at the previous layer [40]. The fully connected layer in the ConvNet connects every neuron in one layer to every neuron in another layer. The weights of the ConvNet are shared in the convolutional layer to reduce memory footprint and improve performance [40].
The ConvNet requires an activation function to introduce nonlinearity in the network.
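The interplay of weight sharing, nonlinearity, and pooling can be sketched on a toy input. The image values and the 2 x 2 filter are hypothetical, and ReLU is used here only as one possible activation function; the text does not name a specific one.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(feature_map):
    """Apply the ReLU nonlinearity elementwise (an assumed choice of activation)."""
    return [[max(0, value) for value in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Summarize each non-overlapping size x size cluster by its maximum."""
    return [[max(feature_map[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in range(0, len(feature_map[0]) - size + 1, size)]
            for r in range(0, len(feature_map) - size + 1, size)]


image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [0, 2, 1, 0, 1],
]
kernel = [[1, -1], [1, -1]]   # hypothetical 2 x 2 vertical-edge filter

features = relu(conv2d_valid(image, kernel))
pooled = max_pool(features)
print(features)
print(pooled)
```

The shared kernel uses only 4 weights to produce the 4 x 4 feature map, whereas a fully connected mapping from the 25 inputs to 16 outputs would need 400 weights. Max pooling then reduces the 4 x 4 map to 2 x 2.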

Dilated Convolutional Neural Network.
The dilated ConvNet (DConvNet) has an additional hyperparameter introduced to the convolutional layer of the ConvNet. In the DConvNet, zeros are inserted between the filter elements to increase the size of the network receptive field.
This approach can provide room for the DConvNet to cover a large amount of relevant information [43].
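The effect of inserting zeros can be sketched as follows. The extra hyperparameter is called `rate` here (commonly known as the dilation rate), the kernel values are hypothetical, and a square kernel is assumed.

```python
def dilate_kernel(kernel, rate):
    """Insert (rate - 1) zeros between neighbouring elements of a square filter."""
    k = len(kernel)
    size = k + (k - 1) * (rate - 1)          # effective receptive field per side
    dilated = [[0] * size for _ in range(size)]
    for i in range(k):
        for j in range(k):
            dilated[i * rate][j * rate] = kernel[i][j]
    return dilated


kernel = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # hypothetical 3 x 3 filter
for rate in (1, 2, 3):
    side = len(dilate_kernel(kernel, rate))
    print(f"rate {rate}: receptive field {side} x {side}, still 9 weights")
```

A 3 x 3 filter therefore covers a 5 x 5 region at rate 2 and a 7 x 7 region at rate 3 without any additional trainable weights.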

Recurrent Convolutional Neural Network.
The recurrent ConvNet (RConvNet) can use a large input context while the capacity of the model remains limited. Different from the classical techniques, the RConvNet does not rely on segmentation or task-specific features. As the size of the context increases with the built-in recurrence, the system learns to detect and correct its own errors [44].

Tiled Convolutional Neural Network.
The tiled ConvNet (TConvNet) tiles and multiplies feature maps to enable the model to learn different types of invariance. The tiling and multiplication of the feature maps allow the model to learn rotational and scale-invariant features, contrary to the ConvNet [45].

Network in Network Convolutional Neural Network.
The network in network ConvNet (NINConvNet) uses micro networks in place of the linear filter of the convolutional layer. This approach gives the NINConvNet a greater capability of approximating abstract representations than the classical ConvNet. The main building block of the NINConvNet is the micro network; the stack of the micro networks forms the NINConvNet [46].

Symmetric Convolutional Neural Network.
The symmetric ConvNet (SConvNet), contrary to the ConvNet, performs convolution and deconvolution operations in a symmetric manner to improve segmentation performance.
The SConvNet can also perform automatic mandible segmentation from the original data [47].

Deep Recurrent Neural Network.
The basic recurrent neural network (RNN) is a neural network architecture that accepts an input sequence and computes the hidden and output vector sequences by iteration. For a given input vector sequence $x = (x_1, \ldots, x_T)$, hidden vector sequence $h = (h_1, \ldots, h_T)$, and output vector sequence $y = (y_1, \ldots, y_T)$, the iterations run from $t = 1$ to $T$. The deep RNN (DRNN), in turn, is built by stacking multiple RNN hidden layers on top of one another. In this approach, the output sequence of one layer forms the input sequence of the subsequent layer [50,51].
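The iteration over the sequence and the stacking of hidden layers can be sketched with scalar states. The tanh update rule is the common Elman-style recurrence and is an assumption here, as are the weights, the input sequence, and the function names.

```python
import math


def rnn_layer(inputs, w_in, w_rec, bias):
    """Run one recurrent layer over a sequence and return its hidden sequence."""
    h_prev = 0.0
    hidden = []
    for x_t in inputs:                        # iterate t = 1, ..., T
        h_t = math.tanh(w_in * x_t + w_rec * h_prev + bias)
        hidden.append(h_t)
        h_prev = h_t
    return hidden


def deep_rnn(inputs, layers, w_out):
    """Stack recurrent layers: each layer's hidden sequence feeds the next."""
    sequence = inputs
    for w_in, w_rec, bias in layers:
        sequence = rnn_layer(sequence, w_in, w_rec, bias)
    return [w_out * h_t for h_t in sequence]  # output sequence y_1, ..., y_T


x = [0.5, -1.0, 0.25, 0.0]                    # hypothetical input sequence, T = 4
layers = [(1.0, 0.5, 0.0), (0.8, -0.3, 0.1)]  # hypothetical (w_in, w_rec, bias) per layer
y = deep_rnn(x, layers, w_out=2.0)
print([round(v, 3) for v in y])
```

The output sequence has the same length as the input sequence, and each output depends on all earlier inputs through the hidden states of both layers.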

Deep Feedforward Neural Network.
The deep feedforward neural network (DFNN) is the architecture of the ANN that has multiple hidden layers. It is different from the shallow ANN composed of only three layers, namely, input, hidden, and output. The DFNN is carefully constructed to avoid the local minima problem. The number of hidden layers increases the complexity of the DFNN because many parameters are required to be tuned. However, the DFNN can effectively deal with large-scale datasets because recent empirical and theoretical works indicated that local minima are not a serious issue [13].

Application of Deep Learning Architecture.
In this section, the applications of the DL architecture are briefly presented. Recently, the application of DL in image analysis, speech recognition, and text understanding has demonstrated outstanding success. DL applies supervised and unsupervised learning techniques to learn multiple-level representations and features in hierarchical architectures to solve classification and pattern recognition problems [15]. The DL architectures presented in the previous section have demonstrated excellent performance in different application domains. The DL architecture can be applied in image processing [52,53], natural language processing [54,55], video analysis [56], text analysis [57], scene [57], object detection [58,59], speech processing [60,61], and dimension reduction [23].

Internet of Vehicles
The great revolution that emerged from the Internet has provided the opportunity to connect people at an exceptional magnitude and speed. The success of the Internet revolution has brought about significant opportunities that are changing the methods by which various objects communicate. This rapid development considers the interconnection between objects to realize a smart city, where a device interacts with other connected devices. This communication is achieved through seamless ubiquitous sensing, emerging technologies, and the availability of a scalable platform for large data analysis. At present, objects, such as smartphones, vehicles, laptops and tablets, TVs, and other handheld devices, change our surroundings, making them very interactive and informative [62,63]. Through modern communication, smart devices create a network of interconnected objects with real-time interactions. The growth in the number of devices and the nature of the global network architecture, which includes all existing heterogeneous networks, have shaped our experience.
This universal network of things has been identified as a future Internet presently shaped as the IoT [64].
The IoT serves as an enabling environment where sensors and actuator objects interact seamlessly and provide progressively more suitable platforms for data exchange. The recent advancement and adoption of various wireless communication technologies have positioned the IoT as a promising technology, which benefits from the potential prospects provided through Internet technology.
The IoT technology has brought about the development of intelligent systems, which include but are not limited to smart retail, smart water, smart energy, smart grids, smart healthcare, smart homes, and smart transportation [62,63]. The IoT has created interfaces for smart devices to be connected to a global network with the ability to render services from other connected devices [65]. The IoT enables seamless integration of a heterogeneous network of devices through the use of intelligent interfaces. Therefore, one of the key objectives of the IoT is interoperability among heterogeneous devices [62,64]. The emergence of IoT technology has opened up many new research and development areas.
The IoV is an innovation activated by the IoT, and this domain evolves from vehicular ad hoc networks (VANETs) to build smart vehicles within smart cities [66]. At present, the number of connected vehicles has witnessed exponential growth, and according to [67], a significant number of vehicles are expected to have an Internet connection. The global vehicular traffic was projected to escalate to 300,000 exabytes toward the end of 2020. This significant increase in vehicular data results from the advancement in vehicular telematics applications, including in-vehicle infotainment and ITS [68]. Conventional VANETs use the vehicle as a node for transmitting or relaying traffic information between vehicles and infrastructures using vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communications. Many vehicular applications, including ITS services and safety applications, have leveraged the potential offered by the increasing connectivity of modern vehicles. For instance, V2V communication enables the sharing of information among vehicles for the propagation of safety messages. Conversely, V2I communication enables the collection of information from different infrastructure facilities [68]. The IoV can be visualized using the three layers shown in Figure 6.
This architecture considers the IoV from the network connection perspective [5].
The taxonomy of the IoV communication system with various information flows is presented in Figure 7.
This taxonomy presents the different types of interactions that exist between vehicles and other devices. In addition, it identifies the information flow in each IoV communication category as well as the emerging technologies utilized by each communication type. The communication types involve vehicle-to-vehicle (V2V) communications, vehicle-to-infrastructure (V2I) communications, vehicle-to-roadside unit communications, vehicle-to-sensor (V2S) communications, vehicle-to-personal device (V2D) communications, vehicle-to-pedestrian (V2P) communications, and vehicle-to-home (V2H) communications.

Case Studies
This section briefly points out case studies involving the IoV. Five case studies are presented for readers to appreciate the level of progress in the concept of the IoV. The summary of the case studies is presented in Table 1 and discussed in the subsequent section.

5G Internet of Vehicles.
A 5G IoV has been built by Nokia in Wuzhen Town, China. The 5G IoV has a 5 km test route that can be used by three vehicles. Two different scenarios are tested on the route. The first scenario shows the vehicle signaling a warning while slowing down when a different car 1000 m away makes an emergency stop. The second scenario shows the capability of the vehicle to issue accurate instructions on changing or packing a lane in complex circumstances.
The Nokia 5G IoV solution Car2X has reduced the delay between the vehicle and the mobile communication networks from the current 1 s to less than 20 ms [69].
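The practical meaning of this latency reduction can be illustrated by the distance a vehicle covers while a message is in transit. The cruising speed of 100 km/h is a hypothetical value chosen for illustration and is not part of the reported test.

```python
def distance_during_delay(speed_kmh: float, delay_s: float) -> float:
    """Distance in metres travelled before a delayed message arrives."""
    return speed_kmh * 1000.0 / 3600.0 * delay_s


SPEED_KMH = 100.0   # hypothetical cruising speed, not from the source
for label, delay in (("1 s", 1.0), ("20 ms", 0.020)):
    print(f"{label} delay: {distance_during_delay(SPEED_KMH, delay):.2f} m")
```

At this assumed speed, the vehicle travels roughly 28 m during a 1 s delay but only about half a metre during a 20 ms delay, which is why the lower latency matters for emergency-stop warnings.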

Internet-of-Vehicle Platform of Huawei Technologies.
In promoting innovation in the IoV, Huawei Technologies has developed a connected car solution on its platform, OceanConnect IoV.
The connected car solution provides transport-oriented services, such as data, interconnection, fleet, and security. It has secure network access. The connected car solution generates new value streams and offers flexible adaptation for multiple terminals, as well as the collection and analysis of large-scale data. The connected car solution has been commercialized with partners including FAW Group, Zain in the Kingdom of Saudi Arabia, and Axiata in Malaysia. The connected car solution won the best "IoV Innovation Award" at the World Intelligent Vehicle Conference 2017 [70].

Big Data Analytic Platforms That Support Deep Learning in the Internet of Vehicles
The large-scale data generated in the IoV environment from various sources, such as cameras and sensors, need to be processed. DL can be used for the processing of the IoV big data. BDA platforms that support DL are required for the analysis of IoV big data. In this section, we present Apache Spark, which supports DL, and other BDA platforms that support machine learning, such as Hadoop, AzureML, and BigML (Figure 8). DL is a branch of machine learning that can solve classification, prediction, and clustering problems in IoV environments.

Apache Spark.
Spark is a big data processing framework that supports streaming, machine learning, and graph processing [75]. It is an open-source framework developed to overcome some of the limitations of Hadoop MapReduce. Spark processes large amounts of data in memory, storing the data in resilient distributed datasets; as a result, it is faster than the MapReduce framework in terms of data processing. Moreover, Spark supports real-time analysis. Reference [76] presented Spark's open-source distributed machine learning library, MLlib. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives. Moreover, MLlib provides a high-level API in several languages that leverages Spark's rich ecosystem to simplify the development of end-to-end machine learning pipelines. Reference [77] discussed DL over Apache Spark for mobile BDA. The authors showed how Spark can perform distributed DL in a MapReduce fashion. Each Spark worker learns a partial deep model on a partition of the entire mobile big data. Then, the master deep model is built by averaging the parameters of all partial models.
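The parameter-averaging idea described for [77] can be sketched without a cluster. The code below is plain Python and does not use the Spark API; a one-variable linear model stands in for the deep model, and the data, partitioning scheme, learning rate, and function names are all hypothetical.

```python
def train_partial_model(partition, epochs=2000, lr=0.05):
    """Fit y = w * x + b on one data partition with plain gradient descent."""
    w, b = 0.0, 0.0
    n = len(partition)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in partition) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in partition) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def average_parameters(partial_models):
    """Build the master model by averaging each parameter across workers."""
    return [sum(values) / len(partial_models) for values in zip(*partial_models)]


# Hypothetical data following y = 3x + 1, split into three partitions.
data = [(x / 10.0, 3.0 * (x / 10.0) + 1.0) for x in range(30)]
partitions = [data[i::3] for i in range(3)]

partial_models = [train_partial_model(p) for p in partitions]  # one per "worker"
master = average_parameters(partial_models)
print([round(p, 2) for p in master])
```

In a real deployment, each call to `train_partial_model` would run on a separate worker over its own partition, and only the learned parameters, not the data, would be sent back for averaging.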

Hadoop.
Hadoop has emerged as an important framework for "distributed processing of large datasets across clusters of machines" [78]. Many Hadoop-related projects have been developed over the years to support the framework, such as Hive, Pig, Tez, Zookeeper, and Mahout. Mahout is one of the distributed linear algebra frameworks for scalable machine learning [79]. Moreover, "scalable advanced massive online analysis" is an open-source platform for data mining and machine learning similar to Mahout, which supports Hadoop for streaming big data processing [80]. Reference [81] discussed Twitter's integration of machine learning into the Hadoop platform. The main idea is to utilize Pig extensions to offer predictive analytic capabilities. The authors identified various techniques related to stochastic gradient descent for supervised classification through online learning and ensemble methods, which can scale out to large amounts of data. Recently, DL networks based on backpropagation have been implemented with one hidden layer in Mahout to learn arbitrary decision boundaries. Moreover, different machine learning algorithms, including neural networks, and parallel programming methods, such as MapReduce, are mapped to improve processing speed.

AzureML.
AzureML is a collaborative machine learning platform for predictive analytics on big data, which allows easy development of predictive models and APIs. AzureML provides numerous distinctive features, such as easy operationalization, versioning, collaboration, and integration of user code [82]. Reference [83] offered a technique for cloud-based AzureML named Generalized Flow, which accepts binary and multiclass classification datasets and processes them to maximize the overall classification accuracy. The performance of the technique is tested on datasets based on the optimized classification model. The authors used three public datasets and a local dataset to evaluate the proposed flow. The results on the public datasets showed an accuracy of 97.5%. Furthermore, the concept has become indispensable in big data technologies. For example, AzureML supports neural networks for regression, two-class classification, and multiclass classification.

BigML.
BigML provides highly scalable ML and predictive analysis services on the cloud (Martin & Ortega). The goal of BigML is to offer a set of services that are easy to use and seamless to integrate. BigML has been used in many studies for predictive analytics and DL because of its robustness and its simple, user-friendly interface. For example, [84] conducted a study on the distinguishing features of human footprint images to offer deep analysis using BigML. The idea is to exploit the human footprint for personal identification using many fuzzy rules for predictive analysis. A total of 440 footprint images are verified for data quality. GPUs have been applied to speed up the performance. Moreover, [85] presented a predictive analysis of the places most affected by dengue in Malaysia using the BigML platform to provide early warning and raise public awareness. The study is based on a decision tree model built on BigML to support classification. Moreover, [86] analyzed game features and acquisition, retention, and monetization strategies as primary drivers of mobile game application success.

Harnessing Deep Learning in the Internet of Vehicles in the Context of Big Data Analytics
In the IoV, fully autonomous, semiautonomous, and conventional vehicles equipped with IoV technologies operate in the environment.
The IoV can support big data acquisition, storage, transmission, and computing. Big data can improve the effectiveness and efficiency of the IoV through network characterization, performance analysis, and protocol design [68]. Big data come in different formats, many of them unstructured. Unstructured data can take the form of text, images, videos, and graphics. The unstructured component constitutes 75% of big data [87]. DL is a popular tool for big data processing [77,88] because of its outstanding results in different applications [26]. The DL architecture is complex and capable of working on the big data generated from the IoV. These complex networks work better than the simple structure of the ANN [89]. DL has shown promising performance in unstructured data analytics, for example, in visual object classification, speech recognition, natural language processing, and information retrieval, as reported in the literature [90].
In the case of autonomous vehicles, DL is highly required because it learns from experience. Even though almost all foreseeable scenarios are fully automated, DL is required to capture new scenarios and perform analytics on the data accumulated from the cameras and sensors. This approach enables the vehicles in the IoV environment to make critical decisions that can avoid collisions and possible loss of life. Progress in sensor networks and communication technology has prompted the gathering of big data. Exploiting big data provides sufficient training objects; as a result, the performance of DL improves. Training large-scale DL models for big data feature learning requires high-performance systems and architectures, such as graphics processing units (GPUs) and CPU clusters [15]. A recent study has shown that the evolutionary ANN has potential application in the IoV. Chen et al. [91] demonstrated that the evolutionary ANN can predict rear-end collisions within the IoV environment. Hence, it can help in the development of an effective rear-end collision detection system for vehicles in IoV environments. Kong et al. [92] proposed the application of the DBN for short-term traffic flow prediction within the IoV environment. The study is motivated by the accumulation of big data in the IoV, which shallow artificial neural network algorithms cannot handle. The DBN is applied to short-term traffic flow prediction, and it performs better than the baseline algorithms. Wang et al. [93] proposed a DL model for optimal workload allocation to improve vehicle energy consumption in the IoV. The DL model provides enhanced energy efficiency and improves the latency of the network. Ning et al. [94] proposed a ConvNet to improve the speed of data transmission and enhance content sharing among vehicles in the IoV environment. The ConvNet is applied to data transmission by exploiting the tri-relationship between vehicles. The results indicate the efficiency of the proposed ConvNet in terms of latency, message delivery, and percentage of connected devices.
Ning et al. [95] hybridized the motif-based method (MBM) and ConvNet (MBM-ConvNet) for D2D communication in the IoV. The MBM clusters the intelligent mobile devices in buses and with passengers in a triangular manner, whereas the ConvNet predicts the D2D connection. The MBM-ConvNet model performed better than the pair discovery scheme, the social-aware approach, and the MBM alone. The issue here is that the edge network may not be suitable for emergencies. Gulati et al. [96] hybridized an energy estimation scheme (EES), the Wiener process model (WPM), and ConvNet (EES-WPM-ConvNet) to ensure enhanced throughput and reduced latency for data transmission in the IoV. The EES checks a vehicle's energy level for connectivity by comparing it with a threshold value, the WPM estimates vehicle connectivity, and the ConvNet predicts the ideal vehicle pairs for data transmission. The proposed EES-WPM-ConvNet performed better than the EES-WPM. The challenge is that connectivity is allowed only for vehicles with a sufficient amount of energy.
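The threshold check attributed to the EES in [96] can be sketched as a simple filter that runs before any pairing stage. The sketch below is only an illustration: the vehicle IDs, energy values, threshold, and the use of "greater than or equal to" as the comparison are assumptions, and the ConvNet that would rank the candidate pairs is not modeled.

```python
from itertools import combinations


def eligible_vehicles(energy_levels, threshold):
    """Return the IDs of vehicles whose energy level meets the threshold."""
    return sorted(vid for vid, level in energy_levels.items() if level >= threshold)


def candidate_pairs(energy_levels, threshold):
    """List every pair of eligible vehicles as input to a later pairing stage."""
    return list(combinations(eligible_vehicles(energy_levels, threshold), 2))


# Hypothetical normalized energy levels for four vehicles.
levels = {"v1": 0.82, "v2": 0.35, "v3": 0.61, "v4": 0.50}
pairs = candidate_pairs(levels, threshold=0.5)
```

Here `v2` falls below the threshold and is excluded from every candidate pair, which mirrors the limitation noted above: vehicles with insufficient energy are not allowed to connect.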
Wang et al. [93] hybridized a greedy algorithm (GA) and ConvNet with a simulated annealing (SA) algorithm (GA-ConvNet with SA) to reduce the energy consumption of both vehicles and roadside units (RSUs) in the IoV. The GA selects the server with minimum power consumption for processing the queuing requests, the SA searches for the global optimal solution for the initialization phase of the ConvNet, and the ConvNet predicts the optimal workload allocation for the computational facilities. The results showed that the GA consumes less power than the ConvNet and SA, whereas the ConvNet has the least network delay compared with the GA and SA. However, the paper assumed that all vehicles move in a straight line. Ning et al. [94] hybridized the Edmonds-Karp algorithm (EKA) and DRL with deep Q-network (DQN) (EKA-DDQN) to minimize the amount of energy consumed during computational offloading in the IoV. The EKA ensures flow redirection among RSUs, while the DDQN minimizes the overall energy consumption. The results showed that the EKA performed better than the greedy method and the exhaustive method, while the DDQN outperformed Q-learning and cloudlet computing models. The issue is that if the rate of data offloading exceeds the computational capability of the vehicles, for example, 80 MB, the rate of energy consumption increases rapidly.
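For readers unfamiliar with the Edmonds-Karp algorithm named above, the following is a minimal sketch of the standard algorithm (maximum flow via shortest augmenting paths found by breadth-first search). It is not the implementation used in [94]; the node names and link capacities are hypothetical and only suggest how flows between RSUs might be modeled.

```python
from collections import deque


def edmonds_karp(capacity, source, sink):
    """Return the maximum flow from source to sink.

    capacity maps each node to a dict {neighbor: link capacity}.
    """
    if source == sink:
        raise ValueError("source and sink must differ")

    # Residual graph: forward edges carry the capacity, reverse edges start at 0.
    residual = {source: {}, sink: {}}
    for u, neighbors in capacity.items():
        residual.setdefault(u, {})
        for v, cap in neighbors.items():
            residual[u][v] = residual[u].get(v, 0) + cap
            residual.setdefault(v, {}).setdefault(u, 0)

    max_flow = 0
    while True:
        # Breadth-first search finds an augmenting path with the fewest hops.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return max_flow  # no augmenting path is left

        # Walk back from the sink to find the bottleneck capacity of the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push the bottleneck along the path and update the reverse edges.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        max_flow += bottleneck


# Hypothetical RSU topology; capacities are in arbitrary units.
rsu_links = {
    "rsu_1": {"rsu_2": 10, "rsu_3": 5},
    "rsu_2": {"rsu_3": 15, "edge_server": 10},
    "rsu_3": {"edge_server": 10},
}
total = edmonds_karp(rsu_links, "rsu_1", "edge_server")
```

In this toy topology, the links leaving `rsu_1` limit the total flow that can reach `edge_server`, so redirecting traffic through `rsu_3` is what allows the second augmenting path to be used.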
Liu et al. [97] hybridized the practical Byzantine fault tolerant algorithm (PBFTA) and a DRL-based scheme in blockchain-enabled (PBFTA-DRL-BCE) IoV for performance optimization. The PBFTA appends a particular block to the blockchain through agreement on the block that is most recently produced. The DRL adapts the block producers (BPs), block size (BS), and block interval (BI) in the PBFTA-DRL-BCE to various instances of the IoV so as to maximize throughput. The PBFTA-DRL-BCE model performs better than the PBFTA-DRL-BCE without BP selection, PBFTA-DRL-BCE with fixed BS, PBFTA-DRL-BCE with fixed BI, and the existing static scheme. However, the transactional throughput was unstable at some points during the learning process.
Dai et al. [98] hybridized DRL based on the deep deterministic policy gradient (DDPG) and the Manhattan grid model (MGM) for edge caching and content delivery latency reduction in the IoV. The MGM defines the direction of a vehicle's movement, while the DDPG optimizes vehicle edge caching and minimizes content delivery latency. The results show that the DDPG scheme performed better than random edge caching without bandwidth allocation, optimization of edge caching, and content delivery without bandwidth allocation schemes. Kong et al. [92] proposed a deep belief network (DBN) model for short-term traffic flow prediction in the smart multimedia system (SMS) in the IoV. The DBN model predicts short-term traffic flow in the SMS for SMS-to-driver communication. The results show that the DBN model performed better than the ANN, backpropagation, support vector regression machine (SVRM), and autoregressive moving average. The issues here are that the DBN model cannot handle a large-scale dataset of up to 10 million records for feature mining and prediction, and if data complexity and randomness increase in the traffic, the output cannot be ascertained.
Goudarzi et al. [99] hybridized the DBN, backpropagation algorithm (BPA), and firefly algorithm (FFA) (DBN-BPA-FFA) for traffic flow prediction in the IoV. The FFA optimizes the DBN topology and learning rate parameters, the BPA fine-tunes the weight parameters of the RBMs, and the optimized DBN predicts the traffic flow. The results show that DRBM-FFA performed better than the autoregressive integrated moving average (ARIMA), multilayer perceptron (MLP) optimized by FFA (MLP-FFA), and ARIMA optimized by particle swarm optimization (PSO) (ARIMA-PSO). It is assumed that traffic behaviors are concurrent at peak periods. Sharma et al. [100] proposed a deep neural network (DNN) for a security system in the IoV. The DNN detects and thwarts various cyberattacks. The DNN scheme performed better than the traditional security system (TSS).
Deep reinforcement learning [101] can also play a vital role within the IoV environments because of the complexity of real-world driving. In autonomous vehicles, DL, a high-performance computing system, and advanced algorithms are required for the vehicles to adapt to changing situations. This adaptation can be achieved through 3D high-definition maps. The cameras and sensors in the autonomous vehicles generate large-scale data for compilation. The data must be analyzed to keep the vehicle moving in its lane. Without high-definition maps containing geocoded data and DL that uses this information, fully autonomous driving remains a mirage and stagnates in Europe [8]. Artificial intelligence software and DL models are used in Baidu's AutoBrain to train computers to drive the same way as humans [102].

Deep Learning for IoV in the Context of Big Data Analytics Compared to Other AI Techniques.
Unlike DL architectures, other artificial intelligence (AI) techniques, such as the shallow neural network, support vector machine, fuzzy systems, random forest, and k-nearest neighbor, typically witness deteriorating performance as the amount of data increases, which makes them unfit for BDA. As discussed in Ali et al. [3], the support vector machine has the challenge of dealing with a fast authentication mechanism for large-scale IoV architecture. The fuzzy system has the limitation of dealing with IoV multimedia communications. Shallow algorithms, such as the random forest, multilayer perceptron, and AdaBoost, face the challenge of securing decisions for safety in V2X traffic.
In addition, other AI techniques require separate feature extraction techniques before the data are fed to the algorithm for processing, which increases computational cost and requires human intervention. By contrast, DL has an embedded automatic feature extraction mechanism that eliminates the need for extra feature extraction techniques, thereby reducing the effort of data engineering. Therefore, DL has an advantage in BDA over other AI techniques. It is well known in the literature that the DL architecture, specifically ConvNet, has proven to be outstanding in image processing compared to other AI techniques. DL also has the advantage of dealing with natural unlabeled data better than other AI techniques.
Furthermore, the application of DL in BDA has the following strengths: ability to generate intrinsic features, effective processing of unlabeled data, high accuracy in providing results, and efficiency with multimodal data [77]. We discuss these strengths in the context of the IoV as follows.
Accuracy in the IoV is a crucial issue because the vehicles in the IoV environment depend on the decision of the DL system. Accurate analysis can prevent chaos on public roads that can lead to accidents, injury, and possibly death. For example, inaccurate capturing of a new scenario by the DL system might cause a fatal accident in the IoV environment. Automated driving maps record 3D road map data, which locate the vehicle position to within a few centimeters. The vehicle detects and follows other vehicles with a high level of accuracy, recognizes lanes, and measures distance and speed. This condition typically occurs when the object and environmental technology of the car is enabled [8]. The DL system plays a significant role in this circumstance. The sensors embedded in the vehicles within the IoV environment generate data with intrinsic features because the data are obtained from the sensors. BDA requires intrinsic features, and DL has the ability to generate the intrinsic features required by BDA. Such features are a characteristic of sensor data. High-level features can be learned automatically by DL without manual intervention.
A large portion of the data generated from the sensors embedded in vehicles in the IoV environment is natural data. Unlike conventional machine learning techniques that require significant engineering work, DL can effectively deal with natural unlabeled data with minimal human intervention. Thus, human effort in labelling data is minimized. The sensors generate a variety of data (images, audio, and speech), and DL can work with multimodal input data.

Proposed Model of IoV Integrated into the Cloud Equipped with High-Performance Computing Server, Deep Learning Models, and Apache Spark
The paper proposes a model that integrates the IoV into a cloud equipped with a high-performance computing server, large-scale DL models, and Apache Spark. Low-end devices are limited in their ability to run large-scale DL models for data processing [15]. Therefore, the computers in the vehicles within the IoV environment are limited in their ability to run large-scale DL models for processing the massive data expected to be generated from an IoV environment with millions of vehicles. Reference [8] suggested that the computers in the vehicles within the IoV environment should be connected to a cloud processing platform for instantaneous data integration and move to the selected terminal. Figure 9 shows the networks of wireless access technology involving vehicles and the Internet, as well as the heterogeneous network commonly referred to as the IoV. The figure shows the representation of the IoV in a large-scale distributed environment in terms of wireless communication of various devices. In the proposed model, the IoV environment comprises autonomous, semiautonomous, and conventional vehicles equipped with IoV technologies.
Autonomous vehicles are equipped with sensors for self-controlling, self-driving operation and for monitoring road conditions, energy consumption, tire pressure, traffic information, water temperature, speed control, and parking services. As the sophistication of autonomous vehicles increases, the number of sensors in the vehicle increases at the same rate. A single vehicle is expected to be equipped with 200 sensors by 2020 given the increase in communication between the vehicle and its surrounding environments. Semiautonomous and conventional vehicles equipped with IoV technologies within the IoV environment are also equipped with sensors. However, the number of sensors in semiautonomous and conventional vehicles can differ from that in autonomous vehicles because the latter are more sophisticated.
These embedded sensors in the vehicles generate diverse and complex data in real time, at a fast rate, and on a massive scale, given that vehicles with large numbers of sensors are gaining acceptance and continue to increase exponentially. These data are generated from the IoV through sensors, cameras, road infrastructure, vehicles, homes, the Internet, pedestrians, and personal devices that can provide information about the IoV environment. Such a dataset from the IoV environment has extremely high dimension and is unstructured. The data are transferred in real time to the cloud equipped with large-scale DL models, the Apache Spark platform, and a high-performance computing server equipped with multiple GPUs for processing IoV big data and storage in the cloud, as shown in Figure 9.
The DL model requires a large-scale dataset as the main component in solving classification, clustering, and prediction problems related to the big data from the IoV environment. The data generated from the IoV could include speech, visual objects, signals, audio, video, and text. DL performs excellently in processing such data (Section 2).
We propose GPUs for the high-performance computing server because studies [15,103] have shown that processing large-scale data based on DL is more effective and efficient when run on a GPU than on a CPU. Currently, a special processor for DL is under development and is expected to run DL experiments faster than the GPU to reduce computational time [15]. Apache Spark is the proposed big data platform for the large-scale DL to process the big data generated from the IoV environment because this BDA platform supports DL. The results of the analytics can be forwarded to the relevant companies, such as automobile makers and insurance companies, for use in making crucial decisions, such as designing new business models, detecting component malfunction, reducing the number of recalled vehicles, and predicting component failure. The autonomous vehicles can use the results in making decisions, such as predicting rear-end collisions and warning about lane changes. This application can prevent unexpected failure prior to occurrence. These decisions are uniform throughout the entire IoV environment, thereby improving the reliability, effectiveness, and efficiency of the IoV environment.
The IoV involves extensive communications (Section 3). These communications continue to increase as the number of vehicles within the IoV environment grows. Extensive research and test runs of the IoV in complex and challenging environments can also create opportunities for new communications within the IoV environment.

Internet-of-Vehicle Dataset Problem.
Performing data analysis in the IoV requires datasets generated from the IoV environment. However, the IoV is an emerging concept mostly pilot-tested by different companies.
The data required by researchers to apply DL for carrying out meaningful analysis in the IoV within the context of big data are scarce, so working in this area of DL is challenging. Most IoV technologies are in the trial phase; thus, releasing data to third-party researchers for analysis is difficult. Data are the key component of DL; without data, the algorithm is ineffective even with an excellent DL architecture. Therefore, we suggest building a public IoV data repository for use by the research community to run the proposed DL algorithms in the context of the IoV. The availability of a public IoV repository will encourage studies on DL within IoV environments. With such a public repository, additional effective, robust, and efficient models of the IoV can be built for further improvement. In-depth research is required to fully understand the emerging concept and advance the state of the art. Data analysis is a key ingredient in the timely resolution of potential challenges in the IoV. As an alternative, PTV VISSIM, a leading simulator for microscopic traffic [104], can be used to create an IoV environment similar to [105] and generate relevant IoV data for the DL application. The OceanConnect IoV platform developed by Huawei is a good platform for researchers to explore.

Limited Deep Learning Approaches in the Internet of Vehicles.
Despite the excellent performance of DL in different application domains as revealed in the literature, very few studies have applied the DL architecture in the IoV for data analysis. Although the IoV is projected to generate large-scale data (Section 1), the application of the DL architecture in IoV data analysis is highly limited in the literature. The application of DL in the IoV to solve problems is in its infancy. The following DL concepts remain unexplored in the context of the IoV: GAN, efficient inference techniques, attention models, memory-augmented neural networks, transfer learning, biologically plausible deep networks, and few-shot learning. The DL, IoV, and big data research communities should devote massive effort to the application of DL in the IoV within the context of BDA. The DL architecture deserves exploration on datasets acquired from real-life or simulated IoV environments. This approach can provide insights into the development of effective and efficient IoV models, the concept of which is yet to be fully understood. The parameters of the DL architecture should be tuned through metaheuristic optimization in the context of the IoV because this approach is effective in solving problems, as shown by [106].

Restriction of Uniform Decision in the Internet-of-Vehicle Environment.
Compatibility is another challenge facing the smooth operation of the IoV. The communications between vehicles, especially V2V, are challenging. The V2V technologies in two vehicles must be compatible before the vehicles can exchange data. If the V2V technology in one vehicle is not compatible with the V2V technology in another vehicle, the two vehicles cannot communicate within the IoV environment. The data generated from the vehicles and transferred to a central processing system for data analysis cannot easily provide a uniform decision to the vehicles within the IoV environment. The data generated from a particular vehicle obtain a decision different from the data generated from vehicles with different V2V technologies. Therefore, a uniform decision is restricted to vehicles with compatible V2V technologies. The V2V technology in the IoV environment should be compatible among all the vehicles in the same environment to provide smooth communications. The data generated from the vehicle sensors and cameras can then be transferred to a central location for analysis by DL on the BDA platform in a high-performance computing platform. As a result, the decision taken from the analysis will be uniform, and all the vehicles will benefit from the decision.

Unknown Effect of Autonomous Vehicles on the Internet of Vehicles.
The effect of autonomous vehicles on the traffic operations and infrastructure of the transportation system is unknown [104]. Lack of IoV data limits the understanding of the effect and its consequences. In addition, the IoV is an emerging technology, and its idea is not fully understood [64]. However, an ongoing initiative aims to fully operate it in the near future. The IoV currently attracts unprecedented attention from the industry and the academia. This concept requires in-depth research and development to fully understand the effect of autonomous vehicles on traffic operations and transportation infrastructure. The DL research community can consider research in this direction given that determining the effect of autonomous vehicles on traffic operations and transportation infrastructure of the IoV can assist in developing a robust IoV infrastructure that can accommodate autonomous vehicles comfortably.

Loss of Internet Connectivity Can Cause Missing Data Points.
The IoV heavily depends on Internet connectivity for its smooth and efficient operation. A loss of Internet connectivity between fast-moving vehicles in the IoV can cause the loss of data points in the data generated from the IoV environment. The loss of Internet connectivity can be caused by extreme weather, natural disaster, and interruption as a result of limited Internet coverage. Reference [68] argued that the high mobility in the IoV environment can cause frequent interruptions of Internet connectivity. Thus, the quality of the data generated from the IoV environment is affected, thereby resulting in noisy large-scale data. Processing noisy data requires extra effort to improve the quality of the data. However, obtaining a real-life dataset without missing points is difficult. DL models are capable of handling datasets with missing data points. We recommend the application of DL models to handle the IoV dataset with missing data points for BDA. Unlike expert systems, DL models do not require complete information to perform.
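One common way to present incomplete sensor windows to a neural network is to fill the gaps with a placeholder value and append a binary mask that marks which readings were actually observed. The sketch below illustrates that preprocessing step only; it is an assumption about how missing IoV data points could be encoded, not a method described in the surveyed works, and the sample readings are hypothetical.

```python
def mask_and_fill(readings, fill_value=0.0):
    """Turn a sensor window with gaps into a fixed-length numeric input vector.

    Missing readings are given as None. The result is the filled values
    followed by a mask (1.0 = observed, 0.0 = missing).
    """
    values = [fill_value if r is None else float(r) for r in readings]
    mask = [0.0 if r is None else 1.0 for r in readings]
    return values + mask


def missing_ratio(readings):
    """Fraction of readings lost, e.g. during a connectivity interruption."""
    if not readings:
        return 0.0
    return sum(r is None for r in readings) / len(readings)


# Hypothetical speed readings (m/s) with two samples lost in transmission.
window = [13.9, None, 14.2, None]
features = mask_and_fill(window)
lost = missing_ratio(window)
```

The mask lets a model distinguish a genuine zero reading from a missing one, and `missing_ratio` can serve as a simple data-quality indicator for deciding whether a window is worth analyzing at all.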

New Perspective Based on Deep Learning for Solving Challenges Raised in [68].
IoV big data are sourced in different forms, and preprocessing is required. Sourcing different data from different sources is expected in the IoV, and bigger data from a larger scope of the IoV can also be collected [68]. Because DL does not require extra preprocessing techniques to process data, we recommend applying DL algorithms to process the IoV big data in conjunction with the framework proposed in Section 7, as shown in Figure 9. Big data can be collected from the IoV network protocol [68] and analyzed using deep learning to gain insight. The new insight from the data analytics can be used to improve the efficiency, quality, security, and effectiveness of the IoV network protocol. It is reported that the data collected from the roads by the vehicles can be aggregated to form HD maps [68]. Because the HD maps are in the form of images, they can be processed via deep learning, especially the ConvNet architecture, to extract value from the HD images for improving the overall vehicular mobility in the IoV, thereby improving the services rendered by big data powered IoV services.

Security Challenges in the Internet-of-Vehicle Environment.
The security of an IoV network is vital, and some of the possible attacks on the IoV are discussed as follows.

Ransomware Attacks.
Ransomware attacks are classified into three types, namely, crypto, locker, and crypto-locker. The crypto ransomware works by applying encryption schemes to device data. The locker ransomware works by restricting user access to system functionalities, whereas the crypto-locker ransomware both encrypts and locks devices. This attack is dangerous because the device data and functionality could be compromised. The device is only released back to the user after a ransom has been paid via a blockchain-based online payment system, such as Bitcoin [107]. The threat of ransomware attacks in the IoV environment can be devastating once the IoV is fully deployed because a vehicle could be hijacked through remote connectivity via IoV protocols until a ransom is paid. Vehicles can also be hijacked to commit crimes by impersonation given that the crypto-locker ransomware can encrypt and lock devices and compromise computerized vehicular systems. In addition, the IoV is a source of big data generation given that vehicle signals, vehicle routing information, packing information, and GPS information should be securely stored in massive storage facilities, such as the cloud infrastructure, big data processing, distribution, e-commerce, and IoV transactions [108]. These storage facilities are also a potential target for ransomware attacks, leading to the compromise of data integrity.

DDoS/DoS Attacks.
DDoS/DoS attacks in the IoV environment are used to flood a target vehicle with unsolicited traffic to deny legitimate communication. This attack can lead to system jamming, malfunction, or failure, which will eventually cause vehicular accidents in an IoV environment. A number of DDoS/DoS detection and prevention algorithms have been presented for the VANET environment [109][110][111]. However, these algorithms are ineffective in an IoV situation. One of the best efforts in detecting DDoS/DoS attacks in the IoV was presented by [112]. The study introduced a broadcast authentication protocol called Paralleling Broadcast Authentication Protocol, which aims at improving energy efficiency and providing network security in the uninterrupted communication between vehicles in the IoV environment.
DoS and DDoS attacks can drain the resources of a target autonomous vehicle. Bandwidth resources can be very limited for IoV entities depending on the nature of vehicular communication. Exhaustion of the bandwidth for a certain time interval can make the server or an autonomous vehicle inaccessible during that time. The resources of an autonomous vehicle include processing capacity, number of ports, memory, and storage space. Therefore, exhaustion of the available resources of the autonomous vehicle can lead to an adverse state of the vehicle during which the cybercriminals can compromise the confidentiality, availability, and integrity of the data in the autonomous vehicle [113]. In the IoV environment, DoS/DDoS attacks are frequently achieved in two ways, namely, reflection and amplification methods. In the reflection method, the attacker sends packets to many endpoints with the IP address of the target vehicle forged as the source address of the packets. This method is deployed by cybercriminals to hide the trail of the attacker. In the amplification method, a small number of packets sent by the cybercriminals triggers an enormous number of packets directed to the targeted vehicle. The amplification method is often used together with the reflection method to launch a huge attack against an unsuspecting autonomous vehicle.

Malware and Spyware Attacks.
Malware is generally referred to as viruses or worms. ese viruses are generally propagated via outside unit software and firmware updates.
Malware can affect autonomous vehicles in IoV environment, thereby permitting remote enemies to gain access and control the target vehicles. Remote access spyware combines with the innovative communication services that VANETs convey to the IoV, which is likely to gain access of the autonomous vehicle to interrupt vital facilities and services. Isolated malware threats are commonly established and have been revealed in test beds to put drivers and passengers at risk. Recent studies on spyware targeting vehicles have revealed that the spread of spyware is likely to be realized via weaknesses in the in-built systems deployed to analyze vehicles throughout the service period. e broad consequence is that many vehicles within the IoV network may be infected given that the malware or spyware is transmitted via trusted service platform, possibly infecting a complete product line [114].
Reference [115] introduced a verification method to ensure that only a verified IoV user can use the autonomous car. The authors also used a cloud-based vehicle malware defense mechanism to address the malware and spyware challenges. However, the main issue is the maintenance of updated patches and signature files in the IoV vehicles.
The DDoS/DoS, ransomware, malware, spyware, and MITM attacks are dynamic in nature. Therefore, the attacks can change to disguise themselves and bypass the security system in the Internet of Vehicles. DL models are adaptive in nature and can adjust to new circumstances. Therefore, we suggest applying DL models to develop a powerful adaptive intrusion detection system that can detect dynamic security threats in the IoV environment. Thus, the impact of the DDoS/DoS, ransomware, malware, spyware, and MITM attacks in the IoV can be minimized.

Conclusions
We present a survey on leveraging DL in the IoV within the context of BDA. The relationship that exists between DL, IoV, and BDA has been unveiled to provide researchers with a clear perspective on the empirical application of DL in the IoV within the context of BDA. The results show that empirical works on DL in the IoV are highly limited and that public repository data for the IoV are unavailable to researchers. The paper presents current development issues, potential challenges, and new directions for emerging research on DL in the IoV within the context of BDA. We believe that this study can help expert researchers to easily identify areas that require solutions, and novice researchers can use it as a benchmark.

Abbreviations
SA: Simulated annealing
SConvNet: Symmetric convolutional neural network
SGAN: Stacked generative adversarial network
SMS: Smart multimedia system
SODBN: Self-organizing DBN
SVRM: Support vector regression machine
TConvNet: Tiled convolutional neural network
TSS: Traditional security system
TVs: Televisions
USA: United States of America
VANETs: Vehicular ad hoc networks
V2D: Vehicle-to-device
V2H: Vehicle-to-home
V2I: Vehicle-to-infrastructure
V2P: Vehicle-to-pedestrian
V2R: Vehicle-to-roadside units
V2S: Vehicle-to-sensor
V2V: Vehicle-to-vehicle
V2X: Vehicle-to-everything
WAVE: Wireless access for vehicular environment
Wi-Fi: Wireless fidelity
WiMAX: Worldwide interoperability for microwave access
WPM: Wiener process model
3G LTE: Third generation long term evolution
3D: Three-dimensional
4G LTE: Fourth generation long term evolution
5G LTE: Fifth generation long term evolution.

Data Availability
No data were used to support the findings of this study.

Conflicts of Interest
The authors declare that there are no conflicts of interest.