Mining outlier data helps guarantee access security and data scheduling in parallel databases and maintains the high-performance operation of real-time databases. Traditional mining methods generate abundant interference data, which reduces their accuracy, efficiency, and stability. This paper proposes a new outlier data mining method that analyzes real-time data features, obtains magnitude spectrum models of outlier data, establishes a decision-tree information chain transmission model for outlier data in the mobile Internet, obtains the information flows that carry outlier data in the information chains of a large real-time database, and clusters these flows. Based on the local characteristic time-scale parameters of the information flows, the phase position features of the outlier data before filtering are obtained; the decision-tree outlier-classification feature-filtering algorithm is then adopted to obtain the analytic signal and instantaneous amplitude and to derive the phase-frequency characteristics of the outlier data. Wavelet transform threshold denoising is applied to the signal, the data offset is analyzed, and the resulting detection filter model is corrected to realize outlier data mining. The simulation suggests that the method detects the feature response distribution of outlier data, reduces the response time, number of iterations, and mining error rate, and improves mining fitness and coverage.

With the rapid development of broadband wireless access (BWA) technologies and mobile terminals, the emerging mobile Internet integrates mobile communication with Internet access. "Mobile Internet" is a generic term for activities that combine Internet technologies, platforms, commercial patterns, and applications with mobile communication technologies. As the mobile Internet has developed, applying mobile technology to a large real-time database can significantly improve the database's operational efficiency. However, many potential safety hazards can arise when large real-time databases operate under mobile Internet conditions. Therefore, to avoid security vulnerabilities, effectively monitoring and mining outlier data in mobile Internet-based large real-time databases has become a hot research topic [

Traditionally, mining methods that integrate mapping perturbation search clustering with fuzzy C-means have been adopted for outlier data mining in large mobile Internet-based real-time databases. However, these methods neglect the complexities of large real-time databases under mobile Internet conditions and limit the efficiency with which outlier data can be detected in such databases [

To address the shortcomings of these traditional methods, this paper presents a method based on decision-tree outlier-classification feature-filtering detection for mining outlier data in a mobile Internet-based large real-time database. First, a magnitude spectrum model of outlier data is constructed by analyzing real-time data features. Based on this model, the preliminary signals of outlier data are analyzed to identify the information chains in which outlier data exist, which improves the efficiency of subsequent outlier detection. Second, a transmission model for decision-tree information chains carrying outlier data under mobile Internet conditions is constructed to precisely cluster mobile and scattered data and to obtain the information chains that contain outlier data; these chains are then introduced into the database in an orderly fashion, which supports the precision of subsequent mining. Third, the information flows that contain outlier data are extracted from the information chains in the large real-time database and clustered to further improve precision, reduce the computational complexity of mining, and lay the foundation for the later division of the confidence interval. Fourth, the phase position features of the prefiltered outlier data are obtained from these information flows, and a decision-tree outlier-classification feature-filtering algorithm is used to obtain the analytic signal and instantaneous amplitude. The decision-tree filtering method then screens out useless high- and low-frequency components to obtain the phase-frequency characteristics of the outlier data. Finally, wavelet transform (WT) threshold denoising is applied to the signal, which reduces the response time, number of iterations, and error rate of mining, improves mining fitness and precision, and corrects the detection filter model. The method improves the precise coverage and probability of outlier data mining and achieves outlier data mining in a mobile Internet-based large real-time database.

Assume that

The decision-tree cross-term information chain is adopted for data under the mobile Internet using the following expression:

In the above formula,

Formulas (

Under mobile Internet conditions, assume that

In the preceding formula,

Transmission model for outlier-data-carrying decision-tree information chain under mobile Internet conditions.

In Section

First, a time domain and spatial domain analysis are conducted for the information chains transmitted to the large real-time database. Formula (

Assume that

Second, based on acquiring the global frequency feature of the information flows, the C4.5 decision-tree model [

In the above formula,

The preceding formula describes the check-bit generated by outlier data-containing information flows in the information chains in the large real-time database that represent the data interference frequency. On this basis, formula (

Third, after preliminary data screening, dynamic mining is conducted for outlier data-carried information flows. The interference ratio is calculated for every decision-making root node

Finally, based on the processing steps above, the outlier data-containing information flows in the information chains of the large real-time database are obtained, forming a basis for the design of a decision-tree outlier-feature-classification algorithm-based outlier data mining method.
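The C4.5 decision-tree model referred to in the second step chooses splits by gain ratio. Because the formulas of this section did not survive in the present text, the following is only a minimal sketch of the standard C4.5 criterion, not the exact expression used in this paper. The attribute `band` and the label `is_outlier` are hypothetical toy data introduced for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Standard C4.5 gain ratio of a nominal attribute with respect to labels."""
    total = len(labels)
    # Partition the class labels by attribute value.
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    # Information gain = entropy before the split - weighted entropy after it.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: frequency band of a flow and whether it carries outliers.
band = ["high", "high", "low", "low", "mid", "mid"]
is_outlier = [True, True, False, False, True, False]
print(f"gain ratio of 'band': {gain_ratio(band, is_outlier):.4f}")
```

In a full tree builder, the attribute with the largest gain ratio would be chosen at each node; the interference ratio computed per root node in the third step is a separate quantity that is not reproduced here.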

After acquiring the outlier data-containing information flows as presented in the preceding sections, the information flows are clustered, and the MOC algorithm [

The (XB) index measures in-class compactness and interclass separation after the division of the outlier data-containing information flow clusters and is defined as the ratio of the internal compactness of the outlier data-containing information flow aggregate

Then, the optimized

During the iterative process of the MOC algorithm, outlier data-containing information flows are set first, before setting the weighting coefficient of every outlier data aggregate,

In formula (

Note that the preceding formula is used to calculate the cluster center of every outlier information flow aggregate during the clustering process. When the variance of the cluster center is below

In conclusion, the MOC algorithm is used to cluster outlier data-containing information flows in a large real-time database. The clustered information requires further data mining.
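The XB index used above to judge cluster quality can be illustrated with the commonly used Xie-Beni definition: membership-weighted within-cluster scatter divided by the number of points times the smallest squared distance between cluster centers. The sketch below assumes that definition and a fuzzifier `m = 2`; the paper's own formula and the internals of the MOC algorithm are not reproduced, and the sample points are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xie_beni(points, centers, memberships, m=2.0):
    """Xie-Beni index; memberships[i][j] is the degree to which
    points[j] belongs to the cluster with center centers[i]."""
    n = len(points)
    # Compactness: membership-weighted scatter of points around their centers.
    compactness = sum(
        (memberships[i][j] ** m) * sq_dist(points[j], centers[i])
        for i in range(len(centers))
        for j in range(n)
    )
    # Separation: smallest squared distance between two distinct centers.
    separation = min(
        sq_dist(centers[i], centers[k])
        for i in range(len(centers))
        for k in range(i + 1, len(centers))
    )
    return compactness / (n * separation)


# Hypothetical example: two tight, well-separated clusters with crisp memberships.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
centers = [(0.0, 0.5), (5.0, 5.5)]
memberships = [[1, 1, 0, 0], [0, 0, 1, 1]]
print(xie_beni(points, centers, memberships))
```

A smaller value indicates compact clusters that are far apart, so a clustering procedure would prefer the partition that minimizes the index.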

The procedures discussed in the previous section improve the method for outlier data mining in a mobile Internet-based large real-time database. The decision-tree outlier-classification feature-based filter algorithm is adopted in this paper. Essentially, decision-tree outlier-classification feature-filtering is a continuous process that moves from high-frequency filtering to low-frequency filtering [

In the above formula, any outlier data signal

For decision-tree classification feature-filtering, the above formula is adopted as a real signal orthogonal term. The corresponding analytic signal and instantaneous amplitude are obtained as follows:

In the above formulas,

The introduced decision-tree outlier-classification feature-filtering algorithm is adopted in this study combined with the threshold noise reduction method for wavelet transforms to reduce signal noise. The offset degree is analyzed. First, the feature state response distribution of outlier data contained by the information flows of the large real-time database is analyzed [

In addition, formula (

In the preceding formula,

To evaluate the efficiency of the improved method for outlier data mining in a mobile Internet-based large real-time database, a simulation experiment was conducted. The experiment was based on MATLAB simulation software. A large real-time database model was constructed. The selected signal model of data under mobile Internet conditions was a group of LFM signal frequency spectra with a frequency band of 2~10 kHz and a duration of 4 ms. The

Parameters used to classify outlier data with the decision-tree in large real-time databases.

Parameter | Explanation | Default |
---|---|---|
Binary splits | Whether nominal attributes are divided with binary splits. | False |
Confidence factor | Confidence factor used for pruning (smaller values lead to more pruning). | 0.25 |
MinNumObj | Minimum number of instances per leaf node. | 2 |
NumFolds | Number of folds used for reduced-error pruning; one fold is used for pruning, and the remaining data are used to construct the tree. | 3 |
ReducedErrorPruning | Whether the reduced-error pruning method is used. | False |
Seed | Random seed used to shuffle the data when reduced-error pruning is used. | 1 |
Unpruned | Whether the resulting tree is left unpruned. | False |

Parameters of the simulation experiments.

Parameter | Value |
---|---|
CPU amount of a single node | 4 |
Single CPU | Core i5 3.11 GHz |
Internal storage of a single node | 4 G |
Operating system | Windows 7 |
Hardware | 500 G |
Switched network | 200 M optical network |

According to the parameters in Tables

In the experiment based on the unimproved method, according to the offset degree analysis of the model for outlier data in a mobile Internet-based database, a decision-tree outlier-feature classification method was adopted for outlier data mining. The steps of the traditional method are as follows. First, a decision-tree-based information chain transmission model for outlier data in a mobile Internet-based large real-time database is constructed. Then, the decision-tree classification feature algorithm is used to mine the outlier data. Finally, the model for outlier data signals in the large real-time database is shown in Figure

Outlier data signal model obtained using the traditional method.

Interference term datasets were shown when mining outlier data, and there was substantial interference noise in the wide-ranging subspaces of the outlier data series, resulting in a low confidence coefficient of the mining algorithm. It was difficult to construct a model for outlier data mining in a mobile Internet-based large real-time database; consequently, the conditions shown in Figure

Therefore, after the transmission model for the outlier data-containing information chains is obtained, the information chains transmitted into the large real-time database require denoising and dynamic mining of the outlier data-containing information flows, followed by clustering of those flows. Only after the denoising and dynamic mining processes are complete can effective and precise outlier data mining be realized.

Using the proposed method, outlier data-containing information flows in the information chains of a large real-time database are obtained first. Then, the chirp-signal amplitude-frequency features are utilized to obtain the global frequency features, to model the outlier-classification features of the data, and to analyze the corresponding signals. On this basis, the enveloping features formed by outlier data-containing information flows are obtained from the information chains of the large real-time database, and their interference frequencies are acquired. Next, the probability weights are calculated, and after preliminary big-data screening, dynamic mining of the outlier data-containing information flows in the information chains yields the information content of the time domain waveform for certain frequency bands.
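The chirp amplitude-frequency features mentioned above can be illustrated with the LFM test signal described in the experimental setup (a 2~10 kHz band over 4 ms). The sketch below generates such a chirp and computes its magnitude spectrum with a plain DFT. The sampling rate `FS = 48000` Hz is an assumption made only for this example and is not taken from the paper, and the function names are illustrative.

```python
import cmath
import math


def lfm_chirp(f0, f1, duration, fs):
    """Real linear-FM chirp sweeping from f0 to f1 Hz over `duration` seconds."""
    n = int(round(duration * fs))
    k = (f1 - f0) / duration  # sweep rate in Hz per second
    return [math.cos(2 * math.pi * (f0 * t + 0.5 * k * t * t))
            for t in (i / fs for i in range(n))]


def magnitude_spectrum(x, fs):
    """Return (frequencies, magnitudes) for the non-negative DFT bins of x."""
    n = len(x)
    freqs, mags = [], []
    for b in range(n // 2 + 1):
        acc = sum(x[i] * cmath.exp(-2j * math.pi * b * i / n) for i in range(n))
        freqs.append(b * fs / n)
        mags.append(abs(acc))
    return freqs, mags


FS = 48_000  # assumed sampling rate in Hz (not specified in the text)
signal = lfm_chirp(2_000, 10_000, 0.004, FS)
freqs, mags = magnitude_spectrum(signal, FS)

# Fraction of spectral energy that falls inside the nominal 2-10 kHz band.
total_energy = sum(m * m for m in mags)
in_band = sum(m * m for f, m in zip(freqs, mags) if 2_000 <= f <= 10_000)
print(f"in-band energy fraction: {in_band / total_energy:.2f}")
```

Most of the energy should fall inside the swept band, which is why a magnitude spectrum is a natural first feature for locating the frequency range occupied by a flow.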

The transmission process of outlier data-containing information flows in the information chains of a large real-time database was sampled, and the input and output were set as Data In and Data Out. The sampling frequency for the global outlier data-containing information flows was 12.58 Hz, the sampling interval was 32.4 s, and a total of 1,024 sampling points were output stably [

Time domain sequence waveform of the original outlier data information flows.

To test the effects of the improved method, the CCPSWNIDA method (in this method, message requests are in proportion to message responses; abnormal conditions are monitored, and an alarm is raised once the percentage of outlier variation is found to be beyond the normal range, thereby achieving intrusion detection) (Ni, 2014) [

Time domain sequence waveform of outlier data-containing information flows obtained by the proposed method.

Time domain sequence waveform of outlier data-containing information flows obtained by the CCPSWNIDA method.

As shown in Figures

Later, the MOC algorithm was adopted to cluster the outlier data-containing information flows obtained from the presented method, and the clustering extraction result is shown in Figure

Directional clustering results of outlier data-containing information flows under the MOC algorithm.

Finally, outlier data mining was conducted for the outlier data-containing information flows using a top-down pattern. Further experiments were conducted using the method presented in this paper, which was obtained according to the model for outlier data-containing information chain transmission under mobile Internet and the model for acquiring outlier data-containing information flows in the information chains of a large real-time database.

First, the phase position features of the prefiltered outlier data were obtained. Then, the corresponding analytic signals and instantaneous amplitude values were obtained using the decision-tree outlier-classification feature-filtering algorithm, and the decision-tree filtering method was adopted to remove several high-frequency components before the outlier data and several low-frequency components after the outlier data. The phase-frequency features were then obtained. On this basis, the decision-tree outlier-classification feature-filtering algorithm was combined with the WT threshold denoising method to obtain the final model for outlier data detection and filtering. This process confirmed the generation of outlier data signals and yielded the feature state response distribution of the outlier data, as shown in Figure

Curve of outlier data feature state response distribution detected with dynamic scheduling of information flows (the proposed method).

Curve of outlier data feature state response distribution detected with dynamic scheduling of information flows (the traditional method).

In Figure

For the outlier data feature states, the times and errors of multithreading dynamic scheduling of information flows were compared between the traditional method and the proposed method. The maximum number of detected samples was set to 500, and the time and errors of outlier data feature state responses were compared between the two methods under different sample amounts. The experimental results are shown in Tables

Times and errors of outlier data feature state responses detected with the proposed method.

Detection sample amount | Detection time (s) | Error (%) |
---|---|---|
50 | 2.7 | 0.003 |
100 | 3.0 | 0.002 |
150 | 2.4 | 0.002 |
200 | 2.3 | 0.003 |
250 | 3.4 | 0.004 |
300 | 3.3 | 0.003 |
350 | 2.6 | 0.002 |
400 | 2.7 | 0.002 |
450 | 3.1 | 0.002 |
500 | 2.8 | 0.002 |

Times and errors of outlier data feature state responses detected with the traditional method.

Detection sample amount | Detection time (s) | Error (%) |
---|---|---|
50 | 6.9 | 0.1 |
100 | 6.4 | 0.08 |
150 | 7.4 | 0.09 |
200 | 7.8 | 0.07 |
250 | 8.0 | 0.06 |
300 | 10.2 | 0.07 |
350 | 6.7 | 0.04 |
400 | 7.3 | 0.06 |
450 | 9.7 | 0.05 |
500 | 8.2 | 0.04 |
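As a quick arithmetic check on the two tables above, the column means can be computed directly from the tabulated values. The short script below only restates those values; the variable names are chosen here for illustration.

```python
from statistics import mean

# Values copied from the two tables above (sample amounts 50 to 500).
proposed_time_s = [2.7, 3.0, 2.4, 2.3, 3.4, 3.3, 2.6, 2.7, 3.1, 2.8]
proposed_error_pct = [0.003, 0.002, 0.002, 0.003, 0.004,
                      0.003, 0.002, 0.002, 0.002, 0.002]
traditional_time_s = [6.9, 6.4, 7.4, 7.8, 8.0, 10.2, 6.7, 7.3, 9.7, 8.2]
traditional_error_pct = [0.1, 0.08, 0.09, 0.07, 0.06,
                         0.07, 0.04, 0.06, 0.05, 0.04]

# Mean detection time and mean error for each method.
mean_time_proposed = mean(proposed_time_s)
mean_time_traditional = mean(traditional_time_s)
mean_error_proposed = mean(proposed_error_pct)
mean_error_traditional = mean(traditional_error_pct)

print(f"mean detection time: proposed {mean_time_proposed:.2f} s, "
      f"traditional {mean_time_traditional:.2f} s")
print(f"mean error: proposed {mean_error_proposed:.4f} %, "
      f"traditional {mean_error_traditional:.3f} %")
```

Averaged over the ten sample amounts, the tabulated detection time is about 2.83 s for the proposed method against about 7.86 s for the traditional method, and the mean error is about 0.0025% against about 0.066%.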

Tables

Later, the number of iterations and the fitness of the outlier data state response detected by dynamic scheduling information flows were compared between the traditional and the proposed methods. For this comparison, there were 10 trials, and the number of iterations and the fitness were compared between the two methods under different sample amounts. The results are shown in Tables

Number of iterations and fitness of outlier data feature-state responses detected with the traditional method.

Test amount | Iterations | Fitness value |
---|---|---|
1 | 72 | 58 |
2 | 83 | 89 |
3 | 91 | 116 |
4 | 61 | 159 |
5 | 99 | 82 |
6 | 73 | 85 |
7 | 88 | 64 |
8 | 83 | 73 |
9 | 77 | 151 |
10 | 84 | 82 |
Average | 81.1 | 95.9 |

Number of iterations and fitness of outlier data feature-state responses detected with the proposed method.

Test amount | Iterations | Fitness value |
---|---|---|
1 | 26 | 62 |
2 | 17 | 67 |
3 | 19 | 67 |
4 | 18 | 65 |
5 | 16 | 64 |
6 | 21 | 65 |
7 | 19 | 63 |
8 | 17 | 65 |
9 | 17 | 64 |
10 | 24 | 69 |
Average | 19.4 | 65.1 |

An analysis of Tables

According to the significant features of outlier data feature response detected by the improved method-based dynamic scheduling information flows, an outlier data detection filtering model was obtained for subsequent experiments.
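The analytic signal and instantaneous amplitude used in the filtering step can be illustrated with a discrete Hilbert transform. The sketch below uses the usual frequency-domain construction (keep the DC bin, double the positive-frequency bins, zero the negative-frequency bins); it is a generic illustration, not the decision-tree filter of this paper, and all function names are chosen here.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (or its inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[i] * cmath.exp(sign * 2j * math.pi * k * i / n)
               for i in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def analytic_signal(x):
    """Return x + j*H{x}, where H is the discrete Hilbert transform."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        if 0 < k < n / 2:
            spectrum[k] *= 2      # positive frequencies are doubled
        elif k > n / 2:
            spectrum[k] = 0       # negative frequencies are removed
        # k == 0 (DC) and k == n/2 (Nyquist, even n) are left unchanged
    return dft(spectrum, inverse=True)


def instantaneous_amplitude(x):
    """Envelope of x: magnitude of the analytic signal."""
    return [abs(z) for z in analytic_signal(x)]


def instantaneous_phase(x):
    """Wrapped phase (radians) of the analytic signal."""
    return [cmath.phase(z) for z in analytic_signal(x)]


# Example: a pure tone with amplitude 0.5 has a flat envelope of 0.5.
tone = [0.5 * math.cos(2 * math.pi * 4 * i / 64) for i in range(64)]
envelope = instantaneous_amplitude(tone)
print(min(envelope), max(envelope))
```

The phase-frequency characteristics discussed in the text correspond to how the instantaneous phase evolves over time; differentiating the unwrapped phase would give an instantaneous frequency estimate.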

By further correcting the model of outlier data detection filtering, the probability confidence coefficient range was improved for outlier data mining, the offset degree and average of outlier data in an information flow were calculated, and the inverse wavelet transform was utilized to remove noise interference to finally realize outlier data mining in the information flow. To compare outcomes, the outlier data confidence coefficient and offset degree were adopted as the measurement indexes. The simulation result of the outlier data cloud picture was obtained, as shown in Figure

Cloud picture comparison of outlier data mining in information flows of a large real-time database.

Simulation of outlier data mining based on the proposed method

Simulation of outlier data mining based on the traditional method

The improved method presented in this paper was then extended to achieve precise mining of outlier data under one-time transmission in the outlier data-containing information flows in the information chains of a mobile Internet-based large real-time database.
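The wavelet threshold denoising step, in which thresholded coefficients are passed through the inverse wavelet transform to remove noise interference, can be sketched as follows. The text does not specify the wavelet basis or the threshold rule, so the Haar wavelet, soft thresholding, and the threshold value are assumptions made for this example only.

```python
import math


def haar_forward(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    s = math.sqrt(2)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert one Haar level, interleaving the reconstructed samples."""
    s = math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / s, (a - d) / s))
    return out


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; small ones become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def denoise(x, levels, threshold):
    """Decompose, soft-threshold the detail coefficients, and reconstruct.
    len(x) must be divisible by 2**levels."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_forward(approx)
        details.append(soft_threshold(d, threshold))
    for d in reversed(details):
        approx = haar_inverse(approx, d)
    return approx


# Hypothetical example: a step signal with small alternating noise.
noisy = [1.1, 0.9, 1.1, 0.9, 5.1, 4.9, 5.1, 4.9]
print([round(v, 6) for v in denoise(noisy, levels=2, threshold=0.5)])
```

The small detail coefficients produced by the noise fall below the threshold and are set to zero, while the large step between the two halves of the signal lives in the approximation coefficients and is preserved.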

The results of the outlier data mining are shown in Figure

Results of outlier data mining in a mobile Internet-based large real-time database.

The waveform in Figure

To verify the effectiveness of the outlier data mining method presented in this paper, compactness and clustering results of outlier data mining were adopted as the basic performance evaluation indexes of outlier data mining, and the improved method was compared with the traditional knowledge granularity method and support vector machine (SVM) method. Figure

Comparison of outlier data mining compactness.

Clustering results of outlier data.

Figure

In Figure

To further verify the high efficiency of the proposed outlier data mining method, outlier data mining accuracy, mining efficiency, mining result stability, and required mining time were adopted as indexes to evaluate the global performances of outlier data mining methods. The traditional knowledge granularity and support vector machine methods were adopted for comparisons. The results are shown in Figures

Comparison of outlier data mining accuracy.

Comparison of outlier data mining efficiency.

Comparison of outlier data mining stability.

Comparison of outlier data mining time.

As shown in Figure

It can be viewed by analyzing Figure

As shown in Figure

As shown in Figure

Outlier data mining in a mobile Internet-based large real-time database using traditional methods is prone to generating massive interference data, which reduces the accuracy, efficiency, and stability of data mining. To address these issues, this paper presents a method based on decision-tree outlier-classification feature-filtering detection for outlier data mining in a mobile Internet-based large real-time database. The method analyzes features of real-time data, obtains magnitude spectrum models of outlier data, and conducts a preliminary analysis of the outlier data signals. A decision-tree information chain transmission model for outlier data under mobile Internet conditions is established to precisely cluster mobile and scattered data, obtain outlier data-containing information chains, and transmit them into the real-time database in an orderly manner. Outlier data-containing information flows are obtained from the information chains in the large real-time database and then clustered to improve the precision of subsequent mining, reduce the complexity of mining computation, and lay a foundation for the subsequent division of the confidence interval. The mining method is then improved by acquiring the phase position features of prefiltered outlier data in the outlier data-containing information flows. The decision-tree outlier-classification feature-filtering algorithm is adopted to obtain the analytic signal and instantaneous amplitude, and the decision-tree filtering method is used to remove useless high- and low-frequency components and to obtain the phase-frequency features of the outlier data. The WT threshold denoising method is integrated to perform signal denoising, after which the data offset degree is analyzed. The results of the simulation experiments indicate that the method reduces the mining response time, the number of required iterations, and the error rate and improves the fitness of precise mining. After the detection filtering model is corrected, the simulation experiments indicate that the precise coverage and probability of outlier data mining are improved. This approach achieved outlier data mining in a mobile Internet-based large real-time database and yielded favorable mining results. With respect to performance, the proposed method was compared with the traditional knowledge granularity method and the support vector machine method regarding the compactness, accuracy, efficiency, stability, and time of outlier data mining as well as clustering. In these comparisons, the proposed method outperformed both baselines.

The authors declare that there are no conflicts of interest regarding the publication of this paper.

The research described in this paper was substantially supported by grants from the National Natural Science Foundation of China (no. 71471178), the State Key Program of National Natural Science Foundation of China (no. 71431006), the Projects of International Cooperation and Exchanges NSFC (no. 71210003), and the National Innovation Research Group Science Foundation for China (no. 70921001).