An Efficient Outlier Detection with Deep Learning-Based Financial Crisis Prediction Model in Big Data Environment

Big Data, the Internet of Things (IoT), cloud computing (CC), and related technologies are increasingly combined in social and business interactions, and big data technologies improve the handling of financial data for businesses. At present, an effective tool can be used to forecast the financial failures and crises of small and medium-sized enterprises. Financial crisis prediction (FCP) plays a major role in a country's economic phenomena: accurate forecasting of the number and probability of failures is an indicator of the development and strength of national economies. Normally, distinct approaches are designed for effective FCP. However, classifier efficiency, predictive accuracy, and data legality are often suboptimal for practical applications. In this view, this study develops an oppositional ant lion optimizer-based feature selection with machine learning-enabled classification (OALOFS-MLC) model for FCP in a big data environment. For big data management in the financial sector, the Hadoop MapReduce tool is used. In addition, the presented OALOFS-MLC model designs a new OALOFS algorithm to choose an optimal subset of features, which helps to achieve improved classification results. Furthermore, the deep random vector functional link network (DRVFLN) model is used to perform the classification process. Experimental validation of the OALOFS-MLC approach was conducted using a benchmark dataset, and the results demonstrated the superiority of the OALOFS-MLC algorithm over recent approaches.


Introduction
With the dynamic expansion of the financial marketplace, enterprises can raise lower-cost capital from the financial marketplace to accelerate their development, and investors use the processes of the financial marketplace to finance and acquire high revenues [1]. However, current companies are confronting progressively unforgiving marketplace environments, and risk continuously poses problems for operators. The features of the current enterprise environment are mostly reflected in the rapid development of information technology, economic globalization, changes in business models, management methods, and customer orientation. These factors are influenced by technology, society, economy, and politics. The existing operation of current companies is a process in which different types of risks are endlessly produced and resolved [2]. Over the last few years, the most important constraint on deploying resourceful tools has been highly pertinent to small and medium-sized enterprises (SMEs) for predicting economic faults and business losses. SMEs need business management to observe the modus operandi and inspect whether it is suited to attaining the determined objectives [3]. This model is characterized by a series of firm rules, and some approaches, called "controls," guarantee the structure of the enterprise organization. Finally, the requirement for intermittent assessments is stimulated [4]. As a result, detecting and estimating the development of corporate entities, made difficult by their high dynamism, proves to be a complicated process. It is nevertheless important, since it serves as an inspection of efficiency in the economy [5].
Over the last few years, with the spread of economic crises among businesses all over the world, enterprises have paid more attention to the field of FCP [6]. For a financial or company organization, it is vital to build early and reliable predictive models for forecasting the possible risk of economic failure. FCP usually yields a binary classification model that is resolved rationally [7,8]. The outcome of the classification model is the failure or nonfailure status of the enterprise [9]. Previously, several classification methods have been designed with different areas of concern for FCP. Usually, the proposed predictive methods are classified into Artificial Intelligence (AI) or statistical methods [10].
El-Kenawy et al. [11] present a modified binary grey wolf optimization (MbGWO) based on stochastic fractal search (SFS) for identifying essential features while attaining a balance of exploration and exploitation. The diffusion procedure of SFS refines the optimal solution of the modified GWO by utilizing a Gaussian-distributed random walk in the development procedure. Sankhwar et al. [12] establish a new predictive structure for FCP using a hybrid of improved grey wolf optimization (IGWO) and a fuzzy neural classifier (FNC). The proposed IGWO-based FS approach was utilized for discovering optimal features in the financial data, and the FNC was utilized for classification. The authors in [13] present Bolasso (Bootstrap-Lasso), which chooses consistent and relevant features from a pool of features. Consistent feature selection (FS) was defined as the robustness of the selected features to alterations in the dataset. The shortlist of features created by Bolasso was then fed to several classifier techniques, such as K-NN, SVM, RF, and NB, to test their prediction accuracy. Kim et al. [14] present a globally optimized SVM, denoted GOSVM, a new hybrid SVM approach structured for jointly optimizing FS, sample selection, and kernel parameters.
That study applies a GA to concurrently optimize several heterogeneous design factors of SVMs. Ghosh et al. [15] present a wrapper-filter combination of ACO, in which subset evaluation is established using a filter approach rather than a wrapper approach to reduce computational complexity. A memory for keeping optimal ants and a feature-dimension-dependent pheromone update have also been employed to perform FS from a multiobjective perspective. The presented method was evaluated on several real-life datasets, obtained from the UCI-ML repository and the NIPS2003 FS challenge, using KNN and MLP techniques.
This study develops an oppositional ant lion optimizer-based feature selection with machine learning-enabled classification (OALOFS-MLC) model for FCP in a big data environment.
(i) To handle the big data in the financial sector, the Hadoop MapReduce tool is employed
(ii) The proposed OALOFS-MLC model designs a novel OALOFS technique to choose an optimal subset of features, which helps in attaining improved classification results
(iii) The deep random vector functional link network (DRVFLN) model is exploited to perform the classification process
(iv) The experimental validation of the OALOFS-MLC algorithm was performed using a benchmark dataset
The remainder of this paper is organized as follows: Section 2 describes the proposed model, Section 3 describes the results and discussion, and Section 4 concludes the paper.

The Proposed Model
In this study, a novel OALOFS-MLC model was established for FCP in a big data environment. The presented OALOFS-MLC model designs a novel OALOFS technique to choose an optimal subset of features, which helps in attaining improved classification results. Furthermore, the DRVFLN model is exploited to perform the classification process. Figure 1 depicts the block diagram of the OALOFS-MLC approach.

Hadoop MapReduce.
Hadoop is a group of tools and technologies under considerable development; applications of the Hadoop technology stack are prominent among open-source solutions [16]. MapReduce is the building block of Hadoop. It is a parallel programming framework used for solving problems of similar operation and analysis on large-scale datasets. The term MapReduce derives from its two fundamental procedures: the mapping process (Map) and the inductive aggregation process (Reduce). MapReduce executes the process simultaneously on a set of worker nodes. Every node runs the same code to process the data assigned to it, without inter-node data communication. MapReduce frees developers from reasoning about the underlying infrastructure while designing large-scale data processing applications; its consistent operation-based interface considerably decreases development complexity and improves development efficiency.
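The Map and Reduce phases described above can be illustrated with a minimal in-process sketch; Hadoop would run the map function on many worker nodes in parallel and group the emitted key-value pairs by key before invoking the reduce function. The word-count task and all function names here are illustrative, not part of the proposed model.

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    """Map: emit (key, value) pairs; here, one count per word in a record."""
    return [(word, 1) for word in record.split()]

def shuffle(pairs):
    """Group values by key, as the MapReduce framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: aggregate all the values emitted for one key."""
    return key, sum(values)

records = ["credit risk data", "big data risk"]
pairs = chain.from_iterable(map_phase(r) for r in records)
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
# counts["data"] == 2 and counts["risk"] == 2
```

Because each map call touches only its own record and each reduce call touches only one key's values, both phases parallelize across nodes with no shared state, which is the property the framework exploits.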

Design of OALOFS Technique.
In this study, the presented OALOFS-MLC model designs a novel OALOFS technique to choose an optimal subset of features, which helps in attaining improved classification results. Reference [17] proposed the ant lion optimizer (ALO), a nature-inspired metaheuristic approach that simulates the hunting mechanism antlions use to catch their prey. Constructing traps, the random walk (RW) of ants, catching ants, rebuilding traps, and the entrapment of ants in traps are the different mechanisms of the ALO. The antlion is commonly known as the doodlebug; larvae and adults are the two metamorphic phases of its life cycle. The steps of the algorithm are given below.
Step 1. Initialization: An initial population of ants X_Ant = (x_1, x_2, ..., x_N) and of antlions X_Antlion = (x_1, x_2, ..., x_N) is generated within the search region of the parameters, where N denotes the population size. The fitness value of each ant and antlion is evaluated, the best antlion is identified and designated as the elite, and the maximum number of iterations is fixed as max_iter.
Step 2. Constructing the trap: For each ant, an antlion is selected by roulette wheel selection according to its fitness to construct the trap for that ant [18].
Step 3. RW of ants: The ant moves randomly in search of food, which can be formulated as

X(t) = [0, cumsum(2r(t_1) - 1), cumsum(2r(t_2) - 1), ..., cumsum(2r(t_n) - 1)], (1)

where cumsum denotes the cumulative sum, n refers to the maximum number of iterations, t indicates the step of the RW, and r(t) denotes a stochastic function defined as

r(t) = 1 if rand > 0.5, and 0 otherwise, (2)

where rand is a random number generated uniformly within the range [0, 1]. To keep the ants inside the search region, the RW of (1) is normalized as

X_i^t = ((X_i^t - a_i)(d_i^t - c_i^t)) / (b_i - a_i) + c_i^t, (3)

where a_i and b_i are the minimum and maximum of the RW of the i-th variable, and c_i^t and d_i^t are the lower and upper bounds of the i-th variable at iteration t, respectively. The RW is normalized for every parameter. Each ant walks randomly around an antlion selected by the roulette wheel (R_A) and around the elite antlion (R_E); both walks are normalized and applied.
Step 4. Trapping of ants: The trapping of ants in the antlion's pit is modeled by shifting the bounds of the RW toward the selected antlion:

c_i^t = Antlion_j^t + c^t,  d_i^t = Antlion_j^t + d^t, (4)

where c^t and d^t are the minimum and maximum of all variables at iteration t, and i and j indicate the indices of the selected ant and antlion, respectively.
Step 5. Sliding of ants toward the antlion: once an ant tries to escape, the antlion throws sand toward the edge of the trap so that the ant slides toward it. This is modeled by adaptively shrinking the bounds of the ant's RW:

c^t = c^t / I,  d^t = d^t / I, (5)

where I = 10^w · (t / max_iter), t indicates the current iteration, max_iter denotes the maximum number of iterations, and w is a constant that depends on the iteration:

w = 2 when t > 0.1 max_iter, 3 when t > 0.5 max_iter, 4 when t > 0.75 max_iter, 5 when t > 0.9 max_iter, and 6 when t > 0.95 max_iter. (6)

Step 6. Catching prey and reconstructing the pit: The fitness of the new location of the ant is evaluated. When the ant becomes fitter than the corresponding antlion, the ant has been caught, and the antlion updates its position to that of the ant to rebuild the trap for the next hunt:

x_AL^t = x_a^t if f(x_a^t) is better than f(x_AL^t). (7)
In (7), x_AL represents the location of the antlion. This procedure is regarded as catching the prey and reconstructing the pit at the location where there is a higher probability of catching ants in the following iterations.
Step 7. Elitism: This is the procedure of maintaining the location of the optimal antlion (elite) throughout the optimization. Each ant walks randomly around both the selected antlion and the elite, and its new position is taken as the average of the two normalized walks:

x_a^t = (R_A^t + R_E^t) / 2, (8)

where t denotes the current iteration and x_a^t indicates the location of the ant.
Step 8. Update the elite when an antlion becomes fitter than the elite.
Step 9. Terminate when the stopping condition is met; otherwise, return to Step 3 to start the next iteration.
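The random walk of Step 3 and its normalization in (1)-(3) can be sketched as follows; this is an illustrative fragment under the standard ALO equations, and the function names and bounds are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_walk(n_iter):
    # Eq. (1)-(2): X(t) = [0, cumsum(2 r(t_1) - 1), ..., cumsum(2 r(t_n) - 1)],
    # where r(t) is 1 if a uniform random number exceeds 0.5 and 0 otherwise,
    # so each step is +1 or -1.
    steps = np.where(rng.random(n_iter) > 0.5, 1.0, -1.0)
    return np.concatenate(([0.0], np.cumsum(steps)))

def normalize_walk(walk, c, d):
    # Eq. (3): min-max rescale the walk into the current bounds [c, d]
    # of one parameter, keeping the ant inside the search region.
    a, b = walk.min(), walk.max()
    if b == a:                        # degenerate walk: pin to the lower bound
        return np.full_like(walk, c)
    return (walk - a) * (d - c) / (b - a) + c

walk = normalize_walk(random_walk(100), c=-1.0, d=1.0)
assert -1.0 <= walk.min() and walk.max() <= 1.0
```

In the full algorithm, one such normalized walk is generated around the roulette-selected antlion (R_A) and one around the elite (R_E) for every ant and every dimension, and the ant's new position is their average, as in (8).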
To improve the efficiency and performance of the ALO, this study presents a revised edition of the technique using the concept of opposition-based learning (OBL). As noted above, the ALO, as a member of the family of population-based optimization algorithms, initiates a set of primary solutions and tries to improve them toward the optimal solution. In the absence of prior knowledge about the solution, random initialization is applied to generate the candidate solutions (the first positions). The convergence speed and performance are strongly associated with the distance of the initial solutions from the best solution; in other words, the process performs better when the randomly created solutions yield lower objective function values. Based on this observation, and to increase both the chance of finding the global optimum and the convergence speed of the typical ALO, this study presents a revised edition of the approach named OALO. In OALO, at the initial iteration of the process, after the first random solutions are produced, the opposite position of every solution is generated according to the concept of the opposite number.
To determine the new initial population, it is essential to describe the concept of the opposite number. Let an n-dimensional vector x = (x_1, x_2, ..., x_n) be given, with

x_i ∈ [a_i, b_i], i = 1, 2, ..., n. (9)

Then, the opposite point of x_i, represented as x̄_i, is defined as follows:

x̄_i = a_i + b_i - x_i. (10)

To employ the concept of the opposite number in the initial population of OALO, consider x_i, a randomly produced solution in the N-dimensional problem space (that is, a solution candidate). For that random solution, its opposite x̄_i is produced by (10). Next, the two solutions (i.e., x_i and x̄_i) are evaluated by the objective function f, and whichever of the two is fitter is retained in the population. The fitness function used for feature selection combines the classification error with the subset size,

fitness = α · ErrorRate + (1 - α) · (|S| / |F|),

where ErrorRate signifies the classifier error rate using the chosen features, |S| is the number of selected features, |F| is the total number of features, and α controls the relative significance of classifier quality and subset length. During the experiments, α is fixed to 0.9.
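The opposition-based initialization described above can be sketched as follows. This is an illustrative fragment: the sphere function stands in for the actual feature-selection fitness, and all names and bounds are our assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)

def opposite(population, lower, upper):
    # Eq. (10): the opposite of x_i in [lower_i, upper_i] is lower_i + upper_i - x_i.
    return lower + upper - population

def obl_initialize(n, dim, lower, upper, fitness):
    # Generate a random population, form its opposite, and keep, per index,
    # whichever of the pair has the better (lower) fitness value.
    pop = rng.uniform(lower, upper, size=(n, dim))
    opp = opposite(pop, lower, upper)
    keep = fitness(pop) <= fitness(opp)           # minimization problem
    return np.where(keep[:, None], pop, opp)

# Toy objective (sphere function, to be minimized); it stands in for
# alpha * ErrorRate + (1 - alpha) * |S| / |F| with alpha = 0.9.
sphere = lambda x: (x ** 2).sum(axis=1)
pop = obl_initialize(n=10, dim=5, lower=-5.0, upper=5.0, fitness=sphere)
```

Since opposition is an involution (the opposite of the opposite is the original point), each retained row is guaranteed to be at least as fit as its discarded counterpart, which is exactly the property OALO exploits to start closer to the optimum.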

Data Classification Process.
Finally, the DRVFLN model is exploited to perform the classification process. The DRVFLN network is a deep extension of shallow RVFL networks that incorporates deep representation learning. The input to every layer of the stack is the output of the prior layer, so every layer constructs an internal representation of the input data [19]. Consider a stack of L hidden layers (HL), all with the same number of hidden nodes N. To simplify notation, the bias term is neglected in the equations. Figure 2 depicts the framework of the RVFLN technique.
The first hidden layer is H^(1) = g(X W^(1)), and all layers l > 1 are defined by

H^(l) = g(H^(l-1) W^(l)), (11)

where W^(1) ∈ R^(d×N) and W^(l) ∈ R^(N×N) denote the weight matrices between the input and the first HL and between consecutive HLs, respectively. These hidden-neuron parameters (biases and weights) are generated randomly in a suitable range and kept fixed during the training stage; g signifies the nonlinear activation function. The input to the output layer is then defined as

D = [H^(1), H^(2), ..., H^(L), X]. (12)

This structure corresponds to the RVFL network: the input to the output layer combines the nonlinear features of the stacked HLs with the original input features through the direct link. The output is then defined as

Y = D β_d, (13)

where the output weight matrix β_d ∈ R^((NL+d)×K) (K: the number of classes) has to be solved for. In (12) and (13), the DRVFLN output is a linear combination of the features and the output-layer weight matrix β_d, whose number of rows equals the number of features from the HLs plus the input layer.
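The forward pass described above can be sketched in a few lines of NumPy. This is a minimal illustration under the stated description (random fixed hidden weights, direct input link, linear output); in practice β_d would be solved in closed form, e.g. by ridge regression, and the dimensions, seed, and activation below are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
d, N, L, K = 8, 16, 3, 2         # input dim, nodes per layer, layers, classes

relu = lambda z: np.maximum(z, 0.0)   # nonlinear activation g

# Random, untrained hidden weights: W(1) in R^{d x N}, W(l) in R^{N x N} for l > 1.
weights = [rng.uniform(-1.0, 1.0, (d, N))]
weights += [rng.uniform(-1.0, 1.0, (N, N)) for _ in range(L - 1)]

def drvfln_forward(X, beta):
    h, feats = X, []
    for W in weights:
        h = relu(h @ W)          # H(l) = g(H(l-1) W(l))
        feats.append(h)
    D = np.hstack(feats + [X])   # direct link: stack all HLs with the raw input
    return D @ beta              # linear output layer, beta in R^{(N L + d) x K}

X = rng.normal(size=(5, d))
beta = rng.normal(size=(N * L + d, K))
scores = drvfln_forward(X, beta)  # class scores of shape (5, K)
```

Because only β_d is trained while all hidden weights stay random and fixed, fitting the network reduces to one linear least-squares problem, which is what makes the RVFL family fast to train.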

Experimental Validation
The experimental validation of the OALOFS-MLC model is tested using two datasets, namely, the German credit [20] and Australian credit [21] datasets. The former dataset includes 1,000 samples and 24 features; the latter holds 690 instances with 14 features. Table 1 offers the number of features selected by the OALOFS-MLC model on the applied datasets.
The table values indicate that the OALOFS-MLC model selected a total of 12 features for the German credit dataset and 9 features for the Australian credit dataset. Table 4 offers a detailed comparative examination of the FCP outcomes of the OALOFS-MLC model with recent models on the German credit dataset [22]. Figure 5 provides a comparative study of the OALOFS-MLC model in terms of sens_y, spec_y, and F-score. The figure indicates that the OALOFS-MLC model reached maximum classification performance. Regarding sens_y, the OALOFS-MLC model achieved a higher sens_y of 97.36%; the PIOFS system, ACOFS approach, GWOFS methodology, and PSOFS model obtained lower sens_y values of 95.43%, 90.12%, 85.73%, and 81.28%, respectively. Also, regarding spec_y, the OALOFS-MLC approach gained a superior spec_y of 97.06%; the PIOFS system, ACOFS approach, GWOFS methodology, and PSOFS model obtained lower spec_y values of 95.06%, 90.82%, 89.48%, and 83.02%, respectively. In terms of F-score, the OALOFS-MLC system achieved a higher F-score of 97.31%; the PIOFS system, ACOFS approach, GWOFS methodology, and PSOFS model obtained lower F-scores of 94.88%, 92.93%, 89.31%, and 79.17%, respectively. Figure 6 illustrates a comparison study of the OALOFS-MLC model with recent techniques in terms of accu_y, MCC, and kappa. The figure shows that the OALOFS-MLC approach obtained maximal classification performance. In terms of accu_y, the OALOFS-MLC algorithm achieved a superior accu_y of 98.75%, while the PIOFS system, ACOFS approach, GWOFS methodology, and PSOFS model obtained minimal accu_y values of 95.23%, 90.81%, 89.31%, and 79.42%, respectively. Moreover, concerning MCC, the OALOFS-MLC algorithm achieved a higher MCC of 96.13%, whereas the PIOFS system, ACOFS approach, GWOFS methodology, and PSOFS model obtained lower MCC values of 95.47%, 92.12%, 87.97%, and 80.22%, respectively.
In addition, in terms of kappa, the OALOFS-MLC system achieved a higher kappa of 96.19%, whereas the PIOFS system, ACOFS approach, GWOFS algorithm, and PSOFS methodology obtained lower kappa values of 94.24%, 91.94%, 85.98%, and 80.36%, respectively. The training accuracy (TA) and validation accuracy (VA) attained by the OALOFS-MLC approach on the German credit dataset are demonstrated in Figure 7. The experimental outcome implies that the OALOFS-MLC system gained maximum values of TA and VA; in particular, the VA appeared to be higher than the TA. The training loss (TL) and validation loss (VL) achieved by the OALOFS-MLC algorithm on the German credit dataset are established in Figure 8. The experimental outcome infers that the OALOFS-MLC methodology attained the least values of TL and VL; in particular, the VL appeared to be lower than the TL. Table 5 offers a detailed comparative investigation of the FCP outcomes of the OALOFS-MLC algorithm with recent systems on the Australian credit dataset. Figure 9 provides a comparative study of the OALOFS-MLC system with recent methodologies in terms of sens_y, spec_y, and F-score. The figure indicates that the OALOFS-MLC model reached higher classification performance. In terms of sens_y, the OALOFS-MLC system achieved a superior sens_y of 97.41%, whereas the PIOFS system, ACOFS approach, GWOFS methodology, and PSOFS model obtained lower sens_y values of 95.36%, 91.18%, 90.43%, and 83.51%, respectively. Also, for spec_y, the OALOFS-MLC system achieved a higher spec_y of 96.53%, whereas the PIOFS system, ACOFS approach, GWOFS methodology, and PSOFS model obtained minimal spec_y values of 94.71%, 91.28%, 85.52%, and 79.93%, respectively. Eventually, concerning F-score, the OALOFS-MLC methodology achieved a higher F-score of 97.92%, whereas the PIOFS system, ACOFS approach, GWOFS methodology, and PSOFS model obtained decreased F-scores of 94.61%, 90.65%, 89.07%, and 79.06%, respectively.
Concerning accu_y, the OALOFS-MLC system achieved a higher accu_y of 98.50%, whereas the PIOFS system, ACOFS approach, and GWOFS methodology obtained lower values. The TA and VA attained by the OALOFS-MLC approach on the Australian credit dataset are demonstrated in Figure 11. The experimental outcome shows that the OALOFS-MLC methodology gained maximal values of TA and VA; in particular, the VA appeared superior to the TA. The TL and VL attained by the OALOFS-MLC approach on the Australian credit dataset are established in Figure 12.
The experimental outcome signifies that the OALOFS-MLC system accomplished minimal values of TL and VL; in particular, the VL appeared to be lower than the TL.
From the detailed results and discussion, it can be stated that the OALOFS-MLC model has shown effective performance on FCP.

Conclusion
In this study, a novel OALOFS-MLC model was established for FCP in a big data environment. To handle the big data in the financial sector, the Hadoop MapReduce tool is employed. Besides, the presented OALOFS-MLC model designs a novel OALOFS algorithm for choosing an optimal subset of features, which helps in attaining improved classification results. Furthermore, the DRVFLN model is exploited to perform the classification process. The experimental validation of the OALOFS-MLC approach was performed using a benchmark dataset, and the outcomes highlighted the superiority of the OALOFS-MLC model over recent approaches.
Thus, the presented OALOFS-MLC model can be exploited as an effective tool for FCP in the big data environment. In the future, outlier detection and data clustering approaches can be applied to FCP.

Data Availability
All data are available in the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.