Maximum Entropy Principle Based on Bank Customer Account Validation Using the Spark Method


Introduction
Significant developments in information and communication technology over the last several decades have brought about changes in a number of domains, including international commerce. As a result, trade, economic, banking, customs, and other operations have been transformed. Advancements in computing and intelligent methods have also facilitated the development of diverse approaches for distinct IT operations and activities. E-commerce is one of the most significant economic successes of information technology, and quick and simple Internet connection has also created an ideal environment for commercial and economic transactions by giving users access to the virtual world [1]. Online payment methods, e-banking, and e-commerce infrastructure are some of the crucial concerns. Any banking service that uses electronic tools and does not need the consumer to be physically present at a certain location is considered electronic banking [2].
Banks are affected by developments resulting from financial liberalisation and globalisation, and they are responding by increasing the range of services they provide to clients. Even though there is more competition for banking services, banks are working to boost their clients' adoption of online banking [3]. Because of this rivalry, more and more banks are seeking new locations in which to extend their services, and the number of banks providing Internet banking services is rising daily. More clients can be reached, and a wider range of services can be offered, thanks to Internet banking. Customers can obtain financial services more quickly and efficiently and save time by managing their accounts online. Organisations have quickly adopted information systems because of the evident advantages of information technology, which include improved business accuracy and speed, worldwide quality, cost reduction, and customer satisfaction [4].
In practice, e-banking helps businesses save expenses and maintain their competitive edge over conventional banks [3]. Despite the advantages of Internet banking and the large number of Iranians using the Internet, statistics show that less than 40% of cardholders at 5% of each bank make online purchases, and only 5% of customers use the bank's Internet services, indicating that the full potential of cyberspace is not being utilised to provide these services [4]. The researchers' conclusions indicate that if customers reject or underutilize the new banking technology and services, banks will not get any return on their investments. They want to obtain cutting-edge facilities and technologies [5]. Banks need to come up with new competitive tactics in this environment, and one factor that affects such strategies is consumer behaviour. Because the growth of e-banking relies on consumer acceptance, it is crucial to investigate customer behaviour and the variables influencing e-banking adoption.
Receiving and paying money is one of the most crucial functions of any bank. This may be carried out online or offline, using the bank's Internet services or by visiting one of the bank's branches. To detect every potentially questionable activity, these transactions must be managed and monitored. However, the massive amount of transactional data makes it very difficult for staff to monitor them manually. As a result, these transactional data are monitored intelligently using machine learning-based techniques. Data mining techniques are one aspect of machine learning that can recognise and extract new knowledge and information from data. The identification of questionable activities in bank accounts and the identification of money laundering, which calls for specialised oversight, are two of the most significant applications of transactional data analysis. To this end, this study presents a learning-based approach for the analysis and detection of money laundering, which requires collecting data on bank financial transactions. The data used in this study come from a Mellat Bank branch in Iran in February 2016. The dataset contains 3500 user-performed transactions from that month, including both legitimate and questionable ones. The data have no output labels, only transaction input values and times; thus, the learning process is unsupervised.
The main approach of this research is to use the principles of data mining, which reversibly tries to learn the data in general. Therefore, a new and optimal method will be used in combination. The spiking neural network (SNN) is used as part of training and testing on the data. Although this approach has been suggested before and used for a variety of purposes, on its own it is not a good choice, since the spark structure may also extract characteristics. In this study, however, the method has been refined: a method called the maximum entropy principle (MEP) is used to improve it. The MEP method can achieve dimension reduction, feature selection, and feature extraction by considering entropy criteria and iteration loops, which increases the execution speed of the Spark method and can improve the ROC curve and the AUC. These evaluation criteria can assess the accuracy of detection in financial transaction datasets. The training method is unsupervised, which increases the speed of implementation in the training and testing phases. The proposed method is expected to have a functional advantage over previous methods. The main contribution of this article is therefore a structural method for money laundering analysis and detection using SNN-MEP as unsupervised learning. The rest of this paper is organised as follows. Section 2 reviews the literature, based on recent articles with similar goals. Section 3 proposes the new SNN-MEP method. Section 4 presents the simulation and analyzes the results. Section 5 concludes with the main discussion of this research. In general, the principles of some of the literature are reviewed for data mining in [6,7], SNN in [8], and transactions in databases and datasets in [9][10][11][12][13].
We address the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) for binary classification problems in the clinical domain. A boosting strategy for maximising the AUC serves as the foundation for our statistical approach, which combines many feature variables. In this iterative process, the feature variables from several basic classifiers are dynamically integrated into one strong classifier. Through the use of a penalty term for nonsmoothness, we examine regularisation to prevent the algorithm from overfitting the data. This regularisation technique helps us better understand the relationship between each feature variable and the binary outcome variable, and it also enhances classification performance. We provide evidence of the value of score plots created componentwise using the boosting approach. We demonstrate the usefulness of our strategy by describing two simulation studies and an actual data analysis.
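As an illustration of the AUC criterion used throughout this paper, the following minimal sketch computes the AUC as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function name `auc_score` and the toy labels and scores are assumptions made for illustration; they are not taken from the study's data.

```python
def auc_score(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = questionable transaction, 0 = legitimate transaction
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_score(labels, scores))  # 0.75
```

The pairwise formulation is quadratic in the number of examples, which is acceptable for a few thousand transactions; a rank-based computation would be preferable for much larger datasets.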

Literature Review
2.1. Internet Banking Systems. One study examined the factors that determine users' acceptance of the Internet banking system [14]. There are not many studies in the field of money laundering detection, but a series of related work has been carried out that can lead to further work. Therefore, this section explains a series of similar methods that can help to detect money laundering and other banking irregularities. To examine the factors affecting the acceptance of Internet banking, they used the variables of trust, experience of using the Internet, use of other banking services, perceived ease, and usefulness. In their study, they found that despite investments in the use of information technology in the field of banking, some users do not use it even though they have access to the technology. They also found that trust is an important and effective element in consumer behaviour and determines the success of technologies such as e-commerce and transaction analysis. A study on Internet banking implementation found that the full capacity of cyberspace is not used to provide these services in Turkey [15]. Trust means that a person believes that using online banking is safe and that nothing threatens his or her privacy. They found that security and privacy concerns were the biggest barriers to online banking in the country. Trust is also much more important and complex in Internet banking than in traditional banking because of its virtual environment. Therefore, to make online purchases, customers must trust the online business and payment through the bank. Without trust, the consumer will refuse any online transaction.
Journal of Computer Networks and Communications
In a study entitled mobile acceptance survey in China, the authors found that users, especially in developing countries, are accustomed to face-to-face monetary transactions and are more cautious about using online banking [16]. They also found that users adopt technology when they find it useful, so banks should inform their customers about the benefits of Internet banking over traditional banking. Banks should inform customers that Internet banking services can help increase their productivity, make it easier to communicate with the bank, and improve their business performance.
In a study entitled definitive Internet banking in Greece [17], the following results were obtained: perceived usefulness has a high impact on the attitude towards Internet banking services; perceived usefulness has a significant effect on the intention to use Internet banking; and there is a strong, significant relationship between attitude and the intention to use Internet banking. There is also a significant relationship between trust in Internet banking and the intention to use it. In addition, customers' perceptions of usefulness, credibility, and ease of use play a major role in the acceptance of Internet banking. The authors concluded that if the attitude of users could be changed, the acceptance of Internet banking would also increase.
In [18], entitled outlier data recognition algorithm using big data processing and Internet of Things (IoT) architecture, the authors presented an outlier data recognition procedure using the K-means algorithm and big data processing with Hadoop and Mahout. In another article, entitled detection and prediction of distance-based outlier data [19], the authors examined a distance-based outlier detection method that finds the top outliers in an unlabeled dataset and provides a subset of it, called the solving set, which can be used to predict whether new, unseen objects are outliers. By considering only a subset of all the pairwise distances from the dataset, the solving set contains a sufficient number of points to detect the top outliers. The properties of the solving set are investigated, and algorithms for computing it with subquadratic time requirements are presented.
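To make the idea of distance-based outlier detection concrete, the sketch below scores each point by the distance to its k-th nearest neighbour and reports the highest-scoring points. This is a brute-force illustration that computes all pairwise distances; it is not the solving-set algorithm of [19], and the function names and toy points are assumptions.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_outliers(points, k=2, n=1):
    """Return the indices of the n points with the largest outlier scores."""
    scores = knn_outlier_scores(points, k)
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return order[:n]


# Hypothetical two-feature records; the last one lies far from the rest
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0), (8.0, 8.0)]
print(top_outliers(points, k=2, n=1))  # [4]
```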

2.2. Data Management Systems.
In a two-step clustering process for outlier data detection [20], a two-step clustering algorithm is proposed for outlier detection. In phase 1, the classic K-means algorithm is modified using the heuristic that if a new input pattern is sufficiently far from all cluster centers, it is assigned as a new cluster center. As a result, the data points in the same cluster are most likely either all outliers or all nonoutliers. In phase 2, a minimum spanning tree is constructed and its longest edge is removed. Finally, the small clusters, that is, the trees with fewer nodes, are selected as outliers.
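The phase 1 heuristic can be sketched as a single pass over the data in which a point opens a new cluster whenever it is farther than a threshold from every existing center. This is a simplified illustration under stated assumptions: the distance threshold, the single-pass update, and the final selection of small clusters by size are choices made here, and the minimum spanning tree step of phase 2 is omitted.

```python
import math


def phase_one_clusters(points, threshold):
    """Assign each point to its nearest center, or open a new cluster
    when the point is farther than `threshold` from every center."""
    centers, members = [], []
    for p in points:
        if centers:
            d, idx = min((math.dist(p, c), i) for i, c in enumerate(centers))
        else:
            d, idx = float("inf"), -1
        if d > threshold:
            centers.append(tuple(p))
            members.append([p])
        else:
            members[idx].append(p)
            n = len(members[idx])
            # Move the center to the mean of the cluster's members
            centers[idx] = tuple(sum(col) / n for col in zip(*members[idx]))
    return centers, members


def small_clusters(members, max_size=1):
    """Clusters with at most `max_size` members are treated as outlier candidates."""
    return [m for m in members if len(m) <= max_size]


# Hypothetical records: four nearby points and one distant point
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (10.0, 10.0), (0.2, 0.2)]
centers, members = phase_one_clusters(points, threshold=2.0)
print(small_clusters(members))
```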
The remainder of this section reviews a series of methods that have been presented to date. The main concern in lending organisations is credit management and the assessment of customer needs, which involves data breaches, theft detection, and other lending scams. Data breach is a critical issue in global lending security. In the United Kingdom, 93% of large organisations and 87% of small organisations suffer from data breaches [21]. The average cost of a data breach in the UK reported in [22] is about $1.4 million, and the recovery period is normally about 9 months and 3 days. Although various technical solutions for accreditation and needs assessment have been offered in various banks in recent years, and some are being upgraded, this issue is still considered important [23]. Many approaches in this field are based on neural network methods, which are among the newest methods proposed [24][25][26][27][28][29]; they have their own advantages and disadvantages, the latter including computational complexity, long execution time, lack of generalizability, and inadequate accuracy.
Additionally, a neural network-based approach using a Portugal-related dataset is provided to forecast banks' marketing performance [30]. In another approach presented in [31], bank customers are classified using a Random Forest algorithm in order to assess their needs and creditworthiness for a loan. In [32], the identification and prioritization of criteria affecting the benchmarking of bank customers using the hierarchical analysis process is presented. The results of this research can show the needs of customers in lending. This research uses banking data from Mazandaran province, Iran, but it has been suggested that the approach can be applied to different data. Another case study, conducted for a bank in Tehran province, Iran, is presented in [33], which proposed a method applicable to various datasets. This method classifies bank customers' data with a data mining and data clustering approach in order to identify people in need of a loan and perform accreditation in the bank. In [34], the accreditation and needs assessment of bank customers with a data mining approach are considered in order to estimate the growth of the fund as well as the number of people in need of loans. The use of logic functions is proposed in that research. The results show high accuracy for the proposed method, but its computational complexity is also high. In [35], the authors examined the factors affecting the intention to use the bank's mobile payment system. These findings show that attitude, mental norm, and behavioural control have a positive effect on the intention to use the mobile payment system.
In [36], the authors examined the effect of mobile banking on electronic customer satisfaction. A sample of 360 customers out of 400 who used mobile banking services at Jordan Ahli Bank, Union Bank, HSBC Bank, and Sarmayeh Bank was studied, and the hypothesis was tested through simple regression. The results indicated that the overall dimensions of mobile banking services have a statistically significant effect on customer satisfaction, so the use of mobile banking services is effective in achieving electronic customer satisfaction. The simple regression also indicated that privacy and accessibility are more influential than the other dimensions of mobile banking.

2.3. Electronic Banking Systems.
In [37], some factors influencing the acceptance of electronic banking were surveyed and analysed from the perspective of bank customers. Preliminary data were collected from 387 valid questionnaires that were randomly distributed among customers of 26 licensed Jordanian banks. Multiple regression analysis was used to test the hypotheses. The main finding of this study is that uncertainty avoidance has an important and positive effect on perceived ease of use and perceived usefulness. Perceived risk has a greater impact on customer attitudes, which in turn affects customers' intentions to use e-banking. Building on the idea expressed in this research, four reference articles from adjacent areas, [38][39][40][41], are relevant to the proposed approach. Since detecting money laundering and unauthorized transactions is an anomaly detection problem based on a dataset, data mining methods from research in other fields can be adopted. For example, the authors of [38] objectively improved the efficiency of a bank branch. The proposed method is a robust, multivariate clustering, presented as a method called data envelopment analysis (DEA). Its main structure is clustering with a complete data analysis covering each field of the data. The data coverage structure of that work is used in the present research. In [39], big data are clustered by defining membership functions and fuzzy variables as type-II fuzzy sets, which can be used for modeling and simulation to obtain suspicious as well as uncertain data. That research is not in the field of banking systems, but its method can be used here. In [40], a three-cluster big data ensemble approach was trained and tested on a total of 19 real-world big datasets and achieved interesting results on all of them. The algorithm optimizes K-means clustering in an ensemble manner that reduces processing costs and improves data mining operations. The three-cluster structure can be used in this research. Also, in [41], anomalies are detected in a wireless sensor network (WSN) dataset, which can reveal intrusion and fraud. A sampled window structure is modeled to improve data clustering and to identify any category of anomalous operation. Although the scope of that research is network systems, its clustering structure, together with integration and windowing operations, can be used with the method in this project to improve the combined SNN and MEP model.
Some other studies have been conducted in the field of bank customer validation. The most recent of these, [42], presented multiclass learning from label proportions for classifying bank customers. In that research, a multiclass extreme learning machine (ELM) is used for learning from label proportions (LLP). With a neural network-like structure, ELM has a higher computational speed and better generalizability. As a result, it is convenient for dealing with large-scale, multiclass problems. In order to keep the model stable as the bag size increases, a small number of labelled samples are added to the model, called LLP-ELM, in a semisupervised learning framework. Experiments show that this improvement is beneficial for large class sizes. In practice, the proposed algorithm is worth considering for multiclass learning from label proportions in bank customer classification and other multiclass scenarios. A descriptive study on customer satisfaction with mBoB (Mobile Bhutan Bank) services examined the level of satisfaction and the challenges facing mBoB users in Bhutan [43]. For this purpose, preliminary data were collected using a structured questionnaire consisting of 28 questions. A convenience sampling method was used to select 150 respondents around the site. The collected data were analyzed using descriptive statistics such as simple frequency, percentage, and mean. The study indicated that ease of use, security, and 24-hour access are statistically significant factors with positive effects on customer satisfaction with mBoB.
Customer segmentation based on Internet banking data has been studied in [44]. Clustering is an unsupervised data mining technique that can be used to segment customers. That research builds clustering models on customer profile data from XYZ Internet banking. Clustering was performed using the K-means and K-medoids methods based on the recency, frequency, and monetary (RFM) score of the customers' online banking transactions. A knowledge discovery methodology was used. The performance of both methods was measured and compared. The results showed that the K-means method performed better than the K-medoids method in terms of the cluster distance (AWC). According to the Davies-Bouldin index, K-means also performs slightly better than K-medoids. In [45], the authors proposed a novel machine learning method for money laundering detection based on the Python programming language. Detecting money laundering using neural networks in order to stop terrorism is suggested in [46]. In [47,48], the authors used another machine learning method to fight money laundering in banks. In [49], a new method was proposed for detecting suspicious activities in bank data for money laundering detection as a new application and prototype. Two different antiterrorism models for money laundering detection were also proposed, in [50] for the Bitcoin (BTC) cryptocurrency index and in [51][52][53] for BankX in the United Kingdom.
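The Davies-Bouldin index mentioned above compares clusterings by the ratio of within-cluster scatter to between-centroid distance, with lower values indicating more compact and better separated clusters. The following sketch computes it for clusters given as lists of points; the function names and the toy RFM-like triples are assumptions for illustration and are not data from [44].

```python
import math


def centroid(cluster):
    """Component-wise mean of the points in one cluster."""
    n = len(cluster)
    return tuple(sum(col) / n for col in zip(*cluster))


def davies_bouldin(clusters):
    """Davies-Bouldin index: lower values mean compact, well-separated clusters."""
    cents = [centroid(c) for c in clusters]
    # Scatter: average distance of a cluster's members to its own centroid
    scatter = [
        sum(math.dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Hypothetical (recency, frequency, monetary) scores for four customers
good = [[(1, 1, 1), (1, 2, 1)], [(9, 9, 9), (9, 8, 9)]]
bad = [[(1, 1, 1), (1, 2, 1), (9, 9, 9)], [(9, 8, 9)]]
print(davies_bouldin(good) < davies_bouldin(bad))  # True
```

The sketch assumes at least two clusters with distinct centroids; coincident centroids would cause a division by zero.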

Proposed Method
In the proposed model, the SNN with sparks first performs clustering and then classification at the data level. The MEP method then reduces dimensions and selects and extracts features by considering entropy criteria and iterative loops based on the classification results, which increases the execution speed of the SNN, can help to improve the ROC curve, the AUC, and other evaluation criteria, and allows financial transactions in datasets to be examined accurately. The proposed approach therefore has two parts: the first comprises clustering and classification operations with the SNN, and the second comprises dimension reduction, feature selection, and feature extraction with MEP.

... which is a member of the clusters in the global model. In most existing clustering results, a cluster is represented by a single set, which divides the global set $U$ into two regions. From a decision perspective, representing a cluster by a single set means that the objects in the set definitely belong to the cluster and the objects outside the set definitely do not. This is a typical result of bilateral decisions, and in this research it is called cluster representation based on bilateral decisions. However, a two-way representation cannot indicate that an object may belong to a cluster, nor can it intuitively indicate the amount of data infiltration during cluster formation. It therefore makes more sense to represent a cluster with three regions rather than two, and a cluster representation based on tripartite decisions is proposed [40]. In contrast to the usual cluster representation, this study represents a tripartite cluster $C$ as a pair of sets [40]:

$$C = (\mathrm{Co}(C), \mathrm{Fr}(C)) \tag{1}$$

Here, $\mathrm{Co}(C) \subseteq U$ and $\mathrm{Fr}(C) \subseteq U$. Let $\mathrm{Tr}(C) = U - \mathrm{Co}(C) - \mathrm{Fr}(C)$. Then $\mathrm{Co}(C)$, $\mathrm{Fr}(C)$, and $\mathrm{Tr}(C)$ are naturally three regions: the core region, the margin region, and the trivial region, respectively. If $x \in \mathrm{Co}(C)$, the object $x$ belongs to the cluster $C$; if $x \in \mathrm{Fr}(C)$, $x$ may belong to $C$; and if $x \in \mathrm{Tr}(C)$, $x$ does not belong to $C$. These subsets have a series of properties, given in equation (2) [40]. If $\mathrm{Fr}(C) = \emptyset$, the representation of $C$ in equation (1) reduces to $C = \mathrm{Co}(C)$, which is a single set, with $\mathrm{Tr}(C) = U - \mathrm{Co}(C)$; this represents bilateral decisions. In other words, the single-set representation is a special case of the tripartite representation. In addition, according to equation (2), it is sufficient to represent a cluster by its core region and margin region.

For $1 \le k \le K$, a clustering can be specified by two properties [40]. Property (i) implies that a cluster cannot be empty, which makes a cluster physically meaningful. Property (ii) states that each object of $U$ must either definitely belong to a cluster or possibly belong to a cluster, which ensures that every object is properly clustered. A family of clusters formed by tripartite representations is given in equation (4) [40], and the corresponding family formed under bilateral clusters is given in equation (5) [40]. Soft clustering and hard clustering are distinguished by the conditions in equation (6), which apply for $k \neq t$ [40]: if equation (6) holds, the clustering is called soft; otherwise, it is called hard. When a condition of equation (6) is met, there must be at least one object belonging to more than one cluster. Obviously, a tripartite representation has advantages: the single-set representation is a special case of it, and it intuitively shows which objects form the core of a cluster and which form its edge. This representation diversifies the overlap and reduces the search space when focusing on objects in the overlap or border.

Here, an evaluation-based tripartite cluster model is introduced that generates the three regions using an evaluation function and a pair of thresholds. This model partially addresses the problem of mining the three regions of a global set of banking data simultaneously. It is assumed that there is a threshold pair $(\alpha, \beta)$ with $\alpha \ge \beta$. Although evaluations based on a total order are restrictive, they have a computational advantage. By comparing evaluation values with the pair of thresholds, three regions in the bank data can be obtained for initial processing for clustering. Based on the evaluation function $v(x, C_k)$, tripartite decisions are made as in [40].

Based on the evaluation function $v(x, C_k)$, a new algorithmic structure can be proposed in Spark. To formulate the evaluation function, we can refer to similarity measures or distance measures, probability, possibility functions, fuzzy membership functions, Bayesian validation measures, submeasures, and so on. This study, using the evaluation function $v(x, C_k)$, attempts to optimize Spark's orthogonality in a spiking neural network with a conditional random field (CRF). The spiking neural network presented in this research is highly flexible: in addition to a linear function for activating the cells (neurons, or sparks) in the hidden layer, it can use nonlinear activation functions such as sigmoid or sinusoidal functions, as well as nondifferentiable and discontinuous activation functions. By default, Spark is defined by an equation in which $\beta_i$ represents the weights between the input layer and the hidden layer, $\beta_j$ represents the weights between the output layer and the input layer, $b_j$ is the threshold value (bias) of the neurons in the hidden layer, and $g(\cdot)$ is the activation function. The input-layer weights $w_{i,j}$ and the biases $b_j$ are randomly assigned. The activation function $g(\cdot)$, the number of input-layer neurons $n$, and the number of hidden-layer neurons $m$ are assigned at the beginning. With this information, if the known parameters in the overall equation are combined and rearranged, the output layer can be expressed in terms of them.

In all training-based algorithms, the main goal is to minimise the error as much as possible. The error function of the Spark output $y_p$ is computed against the actual output $y_{\mathrm{main}}$ and can be divided into the training part $\sum_{k}^{s} (y_{\mathrm{main}} - y_p)$ and the test part $\left\| \sum_{k}^{s} (y_{\mathrm{main}} - y_p)^2 \right\|$. For both functions, the predicted output $y_p$ should equal the actual output $y_{\mathrm{main}}$. When this condition is satisfied, the unknown parameter is determined. Although sparks have been used to consider the local dependencies of labels, they are not sufficient for banking data, mainly because anatomical structures have complex shapes that are difficult to model. In addition, the temporal or spatial relevance of banking data also plays an important role in clustering and should be considered in the method. It is therefore better to correct the probability map obtained by Spark. The matrix $H$ may well be non-square, meaning that the number of samples in the training phase may not equal the total number of data features. Therefore, inverting $H$ and finding the weights $\beta$ is a major challenge. To overcome this challenge in Spark, a fully connected CRF matrix is used, which can provide an approximate inverse for non-invertible matrices and can perform clustering with high accuracy and speed compared with the original method.
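The evaluation-based tripartite assignment described in this section can be sketched as follows. The sketch assumes the usual thresholding rule of three-way clustering: an object goes to the core region when its evaluation exceeds α, to the margin region when it lies between β and α, and to the trivial region otherwise. The exact strictness of the inequalities, the function name, and the similarity function used as the evaluation are assumptions, since the corresponding equation is not reproduced in the text.

```python
def three_way_regions(objects, evaluate, alpha, beta):
    """Split objects into the core, margin and trivial regions of one cluster.

    `evaluate(x)` plays the role of v(x, C_k); the inequalities used here
    (v > alpha, beta <= v <= alpha, v < beta) are an assumed convention.
    """
    if alpha < beta:
        raise ValueError("expected alpha >= beta")
    core, margin, trivial = [], [], []
    for x in objects:
        v = evaluate(x)
        if v > alpha:
            core.append(x)      # x definitely belongs to the cluster
        elif v >= beta:
            margin.append(x)    # x may belong to the cluster
        else:
            trivial.append(x)   # x does not belong to the cluster
    return core, margin, trivial


# Hypothetical evaluation: similarity to a cluster centred at 0
def similarity(x):
    return 1.0 / (1.0 + abs(x))


core, margin, trivial = three_way_regions(
    [0.1, 0.5, 2.0, 9.0], similarity, alpha=0.6, beta=0.3
)
print(core, margin, trivial)  # [0.1, 0.5] [2.0] [9.0]
```

Setting α equal to β and excluding equality collapses the margin region, which recovers the bilateral (single-set) representation as a special case.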
In recent years, CRFs have been widely used in many data mining and processing applications because of their good performance in modeling complex dependencies in spatial data. To cluster banking data, CRFs can be used not only to model the relationship between data features but also to model the dependency between local data. As mentioned earlier, CRFs have been formulated as neural networks for various data mining and processing operations. However, training them in that form is cumbersome and computationally complex. In contrast, in this paper, CRFs are used as a processing method. Using a fully connected CRF matrix and layer, $\beta^{*}$ is the output matrix and $H^{*}$ is the fully connected CRF inverse of $H$. Therefore, with this Spark optimization, called CRF-Spark in this section, the problem of the output weights in Spark is solved and converted to $B^{*} = H^{*}$. In general, CRF-Spark becomes a chain of repeated modules over time in the training phase. CRF-Spark can work like a conveyor, that is, it can add information to or remove information from neurons. Unlike deep learning structures and other models, such as the support vector machine or naive Bayes, no weight updating is performed during training. CRF-Spark can identify features during data mining. An appropriate model is learned by minimising the CRF energy function, whose pairwise term is a weighted combination of Gaussian kernels. Here $M = 2$ is the number of Gaussian kernels and $w^{(m)}$ is the weight of Gaussian kernel $m$; $\mu(y_i, y_j) = [y_i \neq y_j]$ is the label compatibility function. The kernel $k^{(1)}$ is the appearance kernel, which tries to assign the same class labels to neighbouring data with the same intensity, and $k^{(2)}$ is the smoothness kernel, which aims at removing unnecessary regions. In these kernels, $e_i$ and $e_j$ are the intensities of data $i$ and $j$, $s_i$ and $s_j$ are the corresponding spatial coordinates, $f_i$ and $f_j$ are the feature vectors of each data pair, namely, intensity and spatial information, and $\theta_\alpha$, $\theta_\beta$, and $\theta_c$ are the Gaussian kernel parameters. However, some parts of the data clusters may not be processed and mined properly in this way, so the algorithm is further optimized in layers. In general, the Spark method begins with an input layer containing a number of neurons (spikes).
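The two Gaussian kernels and the label compatibility function described above can be sketched as follows. The sketch assumes the form commonly used for fully connected CRFs: an appearance kernel that depends on both position and intensity, and a smoothness kernel that depends on position only. The exact kernel expressions, the default weights, and the use of a transaction's position in a feature space as its "spatial coordinates" are assumptions, because the corresponding equations are not reproduced in the text.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def appearance_kernel(s_i, s_j, e_i, e_j, theta_alpha, theta_beta):
    """k1: large when two records are close in position and similar in intensity."""
    return math.exp(
        -squared_distance(s_i, s_j) / (2 * theta_alpha ** 2)
        - (e_i - e_j) ** 2 / (2 * theta_beta ** 2)
    )


def smoothness_kernel(s_i, s_j, theta_c):
    """k2: large when two records are close in position, regardless of intensity."""
    return math.exp(-squared_distance(s_i, s_j) / (2 * theta_c ** 2))


def pairwise_potential(y_i, y_j, s_i, s_j, e_i, e_j,
                       weights=(1.0, 1.0), thetas=(1.0, 1.0, 1.0)):
    """mu(y_i, y_j) times the weighted sum of the M = 2 kernels."""
    if y_i == y_j:
        return 0.0  # mu = [y_i != y_j]: no penalty for equal labels
    w1, w2 = weights
    theta_alpha, theta_beta, theta_c = thetas
    return (
        w1 * appearance_kernel(s_i, s_j, e_i, e_j, theta_alpha, theta_beta)
        + w2 * smoothness_kernel(s_i, s_j, theta_c)
    )


# Two nearby, similar records with different labels are penalised more
# than two distant, dissimilar records with different labels.
near = pairwise_potential(0, 1, (0.0, 0.0), (0.1, 0.0), 1.0, 1.1)
far = pairwise_potential(0, 1, (0.0, 0.0), (5.0, 5.0), 1.0, 9.0)
print(near > far)  # True
```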
Next, the training and test layers consist of convolutional, pooling, and fully connected layers with CRF. A softmax function is then applied, followed by an output layer that presents the result. The windows of the training layers are matrices: 9 × 9 in the convolutional layer, 7 × 7 in the pooling layer with two random poolings, and 5 × 5 in the maximum pooling section. The fully connected layer is a CRF, as described in the previous section, with a 9 × 9 window. The softmax function uses a 7 × 7 window.
The initial Spark training and processing takes place in the training layer, which consists of interconnected convolutional and pooling layers: first a convolutional layer, followed by a random pooling layer, another convolutional layer, and a maximum pooling layer. At the end of this training layer, there is a fully connected CRF layer. Outside the training layer, a softmax function performs the optimization with MEP. It should be noted that the number of neurons in each segment is decisive. The main framework of the clustering operation with the Spark model is shown in Figure 1.
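The two pooling operations in this layer sequence can be illustrated with a small sketch. It assumes non-overlapping square windows and interprets "random pooling" as stochastic pooling, in which one activation per window is sampled with probability proportional to its value; both are assumptions, and the function names and toy matrix are hypothetical.

```python
import random


def pool2d(matrix, size, reducer):
    """Apply `reducer` to each non-overlapping size x size window of `matrix`."""
    rows, cols = len(matrix), len(matrix[0])
    out = []
    for r in range(0, rows - size + 1, size):
        out_row = []
        for c in range(0, cols - size + 1, size):
            window = [
                matrix[r + dr][c + dc] for dr in range(size) for dc in range(size)
            ]
            out_row.append(reducer(window))
        out.append(out_row)
    return out


def make_random_reducer(seed=0):
    """Stochastic pooling: sample one activation, weighted by its value.

    Assumes non-negative activations; an all-zero window falls back to a
    uniform choice.
    """
    rng = random.Random(seed)

    def reducer(window):
        if sum(window) <= 0:
            return rng.choice(window)
        return rng.choices(window, weights=window, k=1)[0]

    return reducer


feature_map = [
    [1, 2, 0, 1],
    [3, 4, 1, 0],
    [0, 1, 5, 6],
    [2, 0, 7, 8],
]
print(pool2d(feature_map, 2, max))  # [[4, 1], [2, 8]]
print(pool2d(feature_map, 2, make_random_reducer(seed=0)))
```

Maximum pooling is deterministic, whereas the random variant returns a different element of each window depending on the seed.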

Part 2: MEP for SNN Optimization Based on Classification Structure
A general distribution of the Spark structure is created, and the entropy of a distribution $p(n)$ is defined following [41], first in its general form and then, if the distribution is discrete, in its discrete form [41]. In these definitions, $p$ is a probability value and $p(n)$ is a maximum entropy-based distribution. The principle of maximum entropy states that the most appropriate distribution for modeling a given dataset is the one with the highest entropy among all distributions that satisfy the constraints of prior knowledge. These constraints are always given as a relation based on the intended distribution. Entropy maximisation is used to reduce dimensions and to select and extract features in order to improve clustering and to change the result into a classification structure, following the formulation in [41]. In this formulation, $G$ is the mapping in terms of the distribution, $E$ is the expected value for the test data, and $Z$ is the result of the Spark operation, which represents the entropy for the next maximisation step. $D$ is the distance between the clusters created in the Spark operation; it originates from the global set $U$ and is meant to turn the operation into a classification structure and create classes that specify the type of banking operation of the data. $R^{m}$ is a real set of entropy to be maximised, and $U$, the global set of clusters based on the Spark model, is a member of $R^{m}$.
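The maximum entropy principle can be illustrated with a small numerical sketch. It assumes the usual discrete Shannon entropy, H(p) = −Σ p(n) log p(n), which may differ in notation from the form used in [41]; the function name and the example distributions are hypothetical.

```python
import numpy as np

def shannon_entropy(p):
    """Discrete Shannon entropy H(p) = -sum p(n) log p(n), in nats."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()      # normalise so the values form a distribution
    nz = p[p > 0]        # 0 * log 0 is taken as 0
    return float(-(nz * np.log(nz)).sum())

uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.70, 0.10, 0.10, 0.10]
certain = [1.0, 0.0, 0.0, 0.0]

h_uniform = shannon_entropy(uniform)
h_peaked = shannon_entropy(peaked)
h_certain = shannon_entropy(certain)
```

With no constraint other than normalisation, the uniform distribution has the highest entropy of the three, which is why it is the maximum entropy choice when nothing else is known about the data.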

Simulation and Results
The data are the result of observation and testing and were obtained from Mellat Bank, Iran, in February 2016. When the data enter the program as input, they are normalized. The data need to enter the SNN as a whole. The data are placed in the input layer completely: each column is a data attribute, each row is a spike (neuron), and the sum of these spikes is called the sparks. The total number of features placed in the input layer is 9. Data pass from the input layer to the hidden (middle) layers of the SNN, where training operations are performed. These layers include the convolutional layer with a 3 × 3 windowing filter, followed by the random pooling layer with a 3 × 3 windowing filter. The total number of features in this middle layer in the training phase is 18. In the training phase of the SNN, the initial clustering operation is performed along with training, but 18 features are too many and should be reduced. Therefore, the MEP algorithm is executed in the softmax function, which performs the feature extraction operations (dimension reduction, feature selection, and extracting the best features). In this section, the windowing filter in the softmax function is 9 × 9.
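The normalization step mentioned above can be sketched as follows. The text does not state which normalization is used, so this example assumes column-wise min-max scaling to [0, 1]; the function name and the small example matrix (rows as spikes, columns as attributes) are hypothetical.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each column (attribute) of X to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    # Constant columns would divide by zero; map them to 0 instead.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span

# Hypothetical input: 3 spikes (rows) with 2 attributes (columns).
X = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
X_norm = min_max_normalize(X)
```

Scaling every attribute to the same range keeps attributes with large numeric values, such as transaction amounts, from dominating the others in the input layer.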
The size of each windowing filter should be set individually; it depends on the design of the neural network and is determined experimentally. During the feature extraction operation, which has three main steps (dimension reduction, feature selection, and extracting the best features), the windowing dimensions should be slightly larger than those of the training layer. Once the MEP has been implemented in the softmax function of the SNN and feature extraction has been performed, 3 top features are selected. Then, an output layer converts the clustering mapping of the training phase, together with feature extraction by the MEP, into a classification mode. Thus, the two key categories in this study are the data that identify money laundering suspects and the healthy data that do not include money laundering. The general configuration of the SNN with the explanations provided can be seen in Figure 2.
There are nine primary data properties in the input layer and eighteen general data properties in the middle layer during the training phase, and the softmax function runs the MEP algorithm. During the training and clustering phases, the eighteen properties are reduced to three, and the same three features are used for classification in the output layer. There are two weights for the middle layer: the first weight is 0, the second weight, which interacts with the softmax function, is 1, and the bias is always the constant 1. The activation function in the clustering-based training phase is the sigmoid. In the softmax function with the MEP algorithm, both weight and bias are set to 1, and the activation function is linear in the feature extraction operation. In the two middle layers (convolutional and random pooling layers), training and clustering are performed, and in the softmax function, feature extraction is performed. In the output layer, the final classification is performed. Levenberg-Marquardt was used as the adaptive training algorithm, along with gradient descent with momentum (traingdx), and mean square error (MSE) was used as the performance assessment criterion over 100 iteration rounds. The efficiency rate of the entropy-based SNN-MEP can be seen in the performance output of this network, which is shown in Figure 3.
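The training criterion described above, gradient descent with momentum judged by MSE over 100 iterations, can be sketched on a toy problem. This is only an illustration on a linear model with synthetic data: it is not the SNN, it does not reproduce the adaptive learning rate of traingdx or Levenberg-Marquardt, and the learning rate, momentum value, and data are assumptions.

```python
import numpy as np

def train_gd_momentum(X, y, lr=0.05, momentum=0.9, epochs=100):
    """Fit linear weights by gradient descent with momentum, tracking MSE."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    velocity = np.zeros_like(w)
    history = []
    for _ in range(epochs):
        err = X @ w - y
        history.append(float(np.mean(err**2)))  # MSE before this update
        grad = 2.0 * X.T @ err / len(y)         # gradient of the MSE
        velocity = momentum * velocity - lr * grad
        w = w + velocity
    return w, history

# Synthetic data (assumption): 200 samples, 3 features, noiseless targets.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w
w, history = train_gd_momentum(X, y)
```

The `history` list plays the role of the performance curve: each entry is the MSE at one iteration, and a falling curve indicates that training is approaching the target.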
Over 1000 iterations, the minimum approaches the target, that is, the best network condition. The training states can also be seen in Figure 4.
As shown in the top portion of Figure 4, the gradient of the network was 0.0016979 across 1000 repetitions, with ascents in each component of the gradient.
Figure 5 shows the amount of money loaded over three-day spans in the main data, covering days 30 to 85 (a period in which the rate of money increase rises). Prediction on the data can help reveal embezzlement or any other intended action, such as money laundering. The red graph shows the loading of money into the data box. The blue graph shows the average load of money into the data box. The black graph shows the daily amount of money loaded into the data box, and the turquoise graph shows the minimum amount of money loaded into the data box. The output of the clustering phase, which shows whether there is noise in this phase, is given in Figure 6.
Figure 6 shows two main noise components. These can indicate money laundering or a high volume of features. Therefore, feature extraction operations are performed using the MEP method in the softmax function. The accuracy of the operation can then be measured; the overall output is shown in Figure 7.
Figure 7 shows accuracy in percent (out of 100%) over a period of 80 days of the total data. There are two graphs in this output: the blue graph is the approach proposed in this research, the SNN-MEP method, and the red graph is the SNN used alone. The proposed approach (blue) is somewhat better than the SNN alone (red). Both fall from a high to a low detection percentage between days 10 and 20 of observation. This suggests either that money laundering was carried out in this period, which drastically reduced the accuracy of the approach, or that high noise in the data affected clustering and classification in the combined operation phase. From the 20th to the 60th day, the accuracy increases; from the 61st to the 70th day, an increase and a decrease are observed; then, from the 70th to the 80th day, the accuracy increases again. Comparing Figure 6 with Figure 7, it can be concluded that there is a problem in the data that reduces the accuracy in some parts and increases it in others. The overall accuracy of the proposed hybrid approach on the 80th day is 87%, with an error of 13%, whereas the SNN alone is about 84.71% accurate on the 80th day, which shows the improvement of the proposed approach over the classic SNN. Next, the fitness function is considered for the proposed approach; its output is shown in Figure 8.
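The reported day-80 figures can be put side by side with a short calculation. The accuracy values are taken from the text; the variable names and the derived quantities (gain in percentage points and relative reduction in error) are illustrative additions.

```python
# Day-80 accuracies reported in the text, in percent.
proposed_accuracy = 87.0    # SNN-MEP
baseline_accuracy = 84.71   # SNN alone

proposed_error = 100.0 - proposed_accuracy
baseline_error = 100.0 - baseline_accuracy

# Absolute gain in percentage points and relative reduction in error.
gain_points = round(proposed_accuracy - baseline_accuracy, 2)
relative_error_reduction = 100.0 * (baseline_error - proposed_error) / baseline_error

print(f"Proposed error: {proposed_error:.2f}%")
print(f"Baseline error: {baseline_error:.2f}%")
print(f"Gain: {gain_points:.2f} percentage points")
print(f"Relative error reduction: {relative_error_reduction:.1f}%")
```

The gain is about 2.29 percentage points, which corresponds to the error falling from roughly 15.29% to 13%.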
In Figure 8, the red graph corresponds to the lower levels of the classification, the blue graph to the upper levels, and the green graph to the middle level. This shows that there is real noise in the data in the sparks and their properties: the blue and red graphs lie above and below the normal level (the green graph), which indicates an error. Figure 7 likewise shows that the proposed approach has a 13% error. Next, the fitness function of Figure 8 is evaluated; the output is shown in Figure 9.
In Figure 9, the fitness function ratio is evaluated on 50 datasets suspected of money laundering, which form the data pattern. The proposed approach identifies a larger share of the original data pattern than the classical approach (red graph) and yields a better analytical output for evaluation. These 50 datasets are large and noisy, and money laundering can be detected by reducing their number; this is done with 25 datasets, and the output is shown in Figure 10.
Based on this output, in 25 classes of 5 data from the total data, there is a general amount of money laundering and non-money laundering. The 4th dataset has high recognition by the proposed approach (blue graph) in the two sections, and the 3rd dataset shows the same behaviour in different parts of the data classes. Therefore, money laundering is seen in datasets 4 and 3 of the total data, and the rest of the data are healthy and contain no money laundering.

Conclusion
Bank customer accreditation is considered a critical process in the statistics of any bank or financial organisation. It is significant because user transactions must be identified in order to assess and differentiate between healthy and unhealthy currencies. This research provides a perceptive approach, consistent with its emphasis on the use of authentic Iranian bank data to identify money laundering. Annual money laundering efforts pose a danger to global economic stability. These businesses may profit from criminal activities, endangering the stability of international financial systems. For these reasons, money laundering is seen as a severe problem in many countries. Both academics and analysts have a specific interest in the use of software technology to enhance the detection of money laundering activities. While ensuring that they provide more sincere and efficient customer service, banks and other financial institutions are striving to ensure that unlawful conduct is discovered and stopped. In a better technology base in order to handle anti-money laundering activities efficiently. The method this research suggests for identifying money laundering uses an intelligent technology.

Figure 1: Clustering framework with the Spark model.

Figure 6: The output of the clustering phase to view the noise at this stage in the training phase.
Journal of Computer Networks and Communications
in which $P(y_i \mid I)$ is the probability obtained by Spark for each data $i$. While measuring the capabilities of two matrix pairs of CRF in the fully connected layer, it deals with the relationship between each data pair. In this regard, $u, p \in \{1, 2, \ldots, C_n\}$ are processing data and $i, j \in \{1, 2, \ldots, N\}$ are parts of the original data $I$. $\Psi_p(y_i) = -\log P(y_i \mid I)$ is a negative logarithmic probability.
Two different clustering and classification mappings for the spiking neural network (SNN) are used in this approach. This strategy uses the maximum entropy principle (MEP) technique since the SNN is unable to minimise the quantity of data during feature extraction and optimum feature selection. A training layer for feature extraction operations based on the MEP technique and a training layer for clustering utilising convolutional and random/maximum pooling layers are included in the softmax function in an SNN. Clustering and feature extraction mappings are identified as methods for data classification and money laundering detection. The present study's overall approach improves the diagnosis in datasets 3 and 4 of all the data in a time frame of 30 to 80 days, with an accuracy as high as 87%, compared to the previous technique, which is 84.71% accurate and employs just the SNN.