
Outlier Interval Detection (OID) is a crucial technique for analyzing spacecraft faults, locating exceptions, and implementing intelligent fault diagnosis systems. This paper proposes two OID algorithms for astronautical Time Series Data (TSD), namely, variance-based OID (VOID) and FFT and

Time Series Data (TSD) is a sequence of values observed over a period of time. Since an object is usually observed and recorded continuously for a long time, the accumulated TSD is often very large. A significant issue is how to mine latent or interesting knowledge from this huge amount of TSD and apply it to the future. Astronautical data (AD) is a typical example of big TSD, gathered by observing spacecraft frequently and continuously. AD mining techniques have many significant applications, including but not limited to analyzing satellites' working status, judging faults and errors, and forecasting their states in the near future. Because spacecraft cannot be touched or measured at close range, techniques for detecting outlier intervals in AD are important. Outlier intervals are the exceptional or unusual parts of a long TSD, and they carry abundant information about spacecraft faults or special events. Outlier Interval Detection can therefore provide foundational evidence for intelligent astronautical fault analysis, diagnosis, and prediction.

Although most spacecraft work well for most of their lifetimes, faults and abnormal situations occur occasionally. In other words, outlier intervals are usually buried in long stretches of regular data. It is therefore very tough for a person to find outliers and identify regularities in huge volumes of AD; human experts would need not only considerable effort and time, but also some luck. This paper proposes two algorithms that automatically detect outlier intervals in AD by means of data mining. These algorithms can not only improve analysis efficiency and the response performance of diagnosis systems, but also support spacecraft fault prediction and diagnosis systems as well as intelligent astronautical monitoring systems. Additionally, OID has broad prospects in industrial process monitoring, financial administration, medical data analysis, disaster alarms, network intrusion detection, credit card fraud detection, and so forth.

Generally, there are four approaches to Outlier Interval Detection: distance-based, statistics-based, deviation-based, and clustering-based.

(1) The distance-based OID first calculates the distances between objects and then defines outliers as the objects whose distances to others exceed a given threshold [

Angiulli et al. have done extensive research on this approach [

Chen et al. [

(2) The statistics-based OID approach requires prior knowledge of the probability distribution of the data, which is the foundation of the approach. However, for a specific sample space, it is usually hard to know the exact distribution of the data, so the key task of the approach is to perform extensive tests in order to find the most suitable distribution model. For example, Takeuchi and Yamanishi [

(3) The deviation-based OID extracts the main features of the central objects and then considers as outliers the objects whose features deviate remarkably from those centers. For example, Oliveira and Meira [

(4) The clustering-based OID treats outliers as by-products of clustering [

Some OID methods exploit the Fourier transform and the wavelet transform to extract data features and mine outliers in the transformed domain. For example, Rasheed et al. [

A single-variable TSD consists of the observed values of a single object or attribute over a period. Generally, a long TSD is divided into many intervals along the time scale, so an interval consists of a time span and the values observed within it. The intervals may all have the same span, such as a day, a week, or 10 days, and some applications employ unequal intervals. In this paper, however, a TSD is divided into intervals of identical length.

OID on a single-variable TSD aims to find the oddest

Examples of outlier intervals.

In real astronautical datasets, abnormally varying data often produce large fluctuation amplitudes. The degree of fluctuation is directly reflected by the variance within the interval, so the outlier score can be measured by the variance: the higher the variance, the odder the interval. It is easy to mine the top
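As a worked formulation of the variance score, assume (our notation, since the original symbols are not recoverable in this copy) a series $x_1, \dots, x_N$ split into $m$ intervals of equal length $n = N/m$. The score $s_j$ of interval $I_j$ is then its variance, and the top-$k$ outlier intervals are the $k$ intervals with the largest $s_j$:

```latex
I_j = \left(x_{(j-1)n+1}, \dots, x_{jn}\right), \qquad
\bar{x}_j = \frac{1}{n} \sum_{i=(j-1)n+1}^{jn} x_i, \qquad
s_j = \frac{1}{n} \sum_{i=(j-1)n+1}^{jn} \left(x_i - \bar{x}_j\right)^2, \qquad j = 1, \dots, m.
```

Because every interval has the same length $n$, using the sample variance (dividing by $n-1$) instead would rescale every $s_j$ by the same factor $n/(n-1)$ and leave the top-$k$ ranking unchanged.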

The time complexity of the standard variance definition is
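The textbook definition of the variance reads each interval twice: once for the mean and once for the squared deviations. As an illustration only, and not necessarily the formulation used in this paper, Welford's online algorithm obtains the same population variance in a single pass with good numerical stability. The function names below are hypothetical:

```python
def welford_variance(values):
    """Population variance of `values` in a single pass (Welford's algorithm)."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # second factor uses the updated mean
    if count == 0:
        raise ValueError("variance of an empty interval is undefined")
    return m2 / count


def two_pass_variance(values):
    """Population variance from the textbook definition: mean first, then deviations."""
    values = list(values)
    if not values:
        raise ValueError("variance of an empty interval is undefined")
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


if __name__ == "__main__":
    sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    print(welford_variance(sample), two_pass_variance(sample))
```

Both functions agree up to floating-point rounding; the single-pass form is convenient when interval values arrive as a stream.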

Pseudocode

Input:

Output: The top

(1)

(2) for each

(3)

(4)

(5) endfor

(6)
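The VOID steps above have lost their details in this copy. A minimal sketch of the idea as the text describes it (equal-length intervals scored by variance, top-k selected) is shown below, assuming the series is a 1-D array whose length is divisible by the number of intervals `m`. The function `void_top_k` and its parameters are hypothetical, not the paper's code:

```python
import numpy as np


def void_top_k(series, m, k):
    """Hypothetical VOID sketch: score m equal-length intervals by variance, return top-k.

    Assumes len(series) is divisible by m. Interval IDs are 1-based, matching
    the interval numbering used in the experiment tables.
    """
    x = np.asarray(series, dtype=float)
    if x.ndim != 1 or len(x) % m != 0:
        raise ValueError("expected a 1-D series whose length is a multiple of m")
    intervals = x.reshape(m, -1)          # row j holds interval j + 1
    scores = intervals.var(axis=1)        # population variance of each interval
    order = np.argsort(scores)[::-1][:k]  # indices of the k largest variances
    return [(int(j) + 1, float(scores[j])) for j in order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(0.0, 0.1, size=1200)           # 12 calm intervals of 100 points
    data[300:400] += rng.normal(0.0, 2.0, size=100)  # make interval 4 violent
    print(void_top_k(data, m=12, k=3))               # interval 4 should rank first
```

Reshaping into an `m`-row matrix lets NumPy compute all interval variances in one vectorized call, which mirrors the "for each interval" loop of the pseudocode.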

The time complexity of the VOID is

The VOID algorithm can quickly detect outlier intervals in the time domain. However, many real AD are periodic to some extent, and violent frequency fluctuations also indicate that something has happened. Moreover, violent fluctuations in the time domain lead to changes in the frequency domain. Figure

The frequency spectra of 12 intervals of the data shown in the Figure

To detect outliers at a finer granularity, more frequencies have to be taken into account. Therefore, the feature vector of an interval is built from the whole frequency band, from the lowest frequency to the highest, and the outlier score of an interval is measured by the distances between feature vectors instead of by variance. Moreover, an amplitude threshold is set to suppress noise: an amplitude is set to 0 if it is not higher than the threshold, and otherwise keeps its value.
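The feature construction described above can be sketched with NumPy's real FFT. This is a sketch under assumptions: the "amplitude" is taken to be the magnitude of `numpy.fft.rfft`, every bin from DC up to the Nyquist frequency is kept, and the threshold is applied to the raw magnitudes. The function name `spectrum_features` and the threshold value are illustrative, not taken from the paper:

```python
import numpy as np


def spectrum_features(series, m, threshold):
    """Hypothetical sketch: thresholded full-band FFT amplitudes for m equal intervals.

    Returns an array of shape (m, n // 2 + 1), where n is the interval length;
    row j is the feature vector of interval j + 1.
    """
    x = np.asarray(series, dtype=float)
    if x.ndim != 1 or len(x) % m != 0:
        raise ValueError("expected a 1-D series whose length is a multiple of m")
    intervals = x.reshape(m, -1)
    amplitudes = np.abs(np.fft.rfft(intervals, axis=1))  # all bins, DC to Nyquist
    # Noise suppression: amplitudes not higher than the threshold are set to 0.
    return np.where(amplitudes > threshold, amplitudes, 0.0)


if __name__ == "__main__":
    t = np.arange(400)
    signal = np.sin(2 * np.pi * t / 20)  # period of 20 samples
    feats = spectrum_features(signal, m=4, threshold=1.0)
    print(feats.shape)          # (4, 51)
    print(np.argmax(feats[0]))  # 5: five full periods fit in a 100-sample interval
```

With 100-sample intervals the real FFT yields 51 bins, and a pure sine with five periods per interval leaves a single non-zero feature after thresholding.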

The FKOID algorithm first divides the whole TSD into

The idea of the FKOID is inspired by the method of Grané and Veiga [

Input:

Output: The top

(1)

(2) for each

(3)

(4) endfor

(5) for each

(6)

(7) endfor

(8)
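The FKOID steps above are also incomplete in this copy. The sketch below follows what the text states (thresholded full-band spectra as feature vectors, a pairwise distance matrix, outlier scores derived from distances), but two details are our assumptions: Euclidean distance, and a score equal to the sum of distances to the `k_neighbors` nearest other intervals. The function `fkoid_top` and its parameters are hypothetical; the feature step is repeated so the block is self-contained:

```python
import numpy as np


def fkoid_top(series, m, threshold, k_neighbors, top):
    """Hypothetical FKOID sketch; returns [(interval_id, score), ...] with 1-based IDs.

    Assumptions: Euclidean distance between feature vectors, and a score equal
    to the sum of distances to the k_neighbors nearest other intervals.
    """
    x = np.asarray(series, dtype=float)
    if x.ndim != 1 or len(x) % m != 0:
        raise ValueError("expected a 1-D series whose length is a multiple of m")
    if not 1 <= k_neighbors < m:
        raise ValueError("k_neighbors must be between 1 and m - 1")
    # Step 1: thresholded full-band amplitude spectrum of each interval.
    amps = np.abs(np.fft.rfft(x.reshape(m, -1), axis=1))
    feats = np.where(amps > threshold, amps, 0.0)
    # Step 2: m x m Euclidean distance matrix between feature vectors.
    diff = feats[:, None, :] - feats[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Step 3: exclude self-distances, then sum the k smallest distances per row.
    np.fill_diagonal(dist, np.inf)
    knn = np.sort(dist, axis=1)[:, :k_neighbors]
    scores = knn.sum(axis=1)
    order = np.argsort(scores)[::-1][:top]
    return [(int(j) + 1, float(scores[j])) for j in order]


if __name__ == "__main__":
    t = np.arange(1200)
    data = np.sin(2 * np.pi * t / 20)  # 12 intervals of 100 samples, period 20
    # Interval 7 gains an extra component with a period of 5 samples.
    data[600:700] += 0.5 * np.sin(2 * np.pi * t[600:700] / 5)
    print(fkoid_top(data, m=12, threshold=1.0, k_neighbors=3, top=2))
```

The full distance matrix costs O(m^2) memory, which is acceptable for a modest number of intervals; summing only the nearest distances keeps a single odd interval from inflating the scores of the regular ones.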

The FKOID algorithm has to maintain a distance matrix and fetch the

We use six real astronautical datasets to test the algorithms. As shown in Figure

Top 4 outlier scores of both algorithms on the 6 real astronautical datasets; the number in parentheses is the ID of the interval.

| Algorithm | Rank | (a) | (b) | (c) | (d) | (e) | (f) |
|---|---|---|---|---|---|---|---|
| VOID | 1 | 1.2001 (3) | 1.7812 (4) | 131.66 (3) | 135.83 (5) | 0.0076 (5) | 5.4093 (10) |
| VOID | 2 | 1.0300 (9) | 1.6907 (9) | 77.06 (10) | 128.09 (10) | 0.0062 (1) | 5.3548 (7) |
| VOID | 3 | 0.6976 (10) | 1.4850 (3) | 72.81 (9) | 127.37 (3) | 0.0055 (4) | 5.3369 (8) |
| VOID | 4 | 0.3677 (4) | 1.3747 (10) | 21.37 (2) | 110.52 (9) | 0.0055 (9) | 5.1904 (9) |
| FKOID | 1 | 7211 (3) | 9013 (3) | 28163 (3) | 39757 (5) | 60.94 (4) | 1479 (1) |
| FKOID | 2 | 6515 (9) | 8856 (9) | 28083 (9) | 39619 (2) | 55.02 (1) | 1306 (10) |
| FKOID | 3 | 6027 (10) | 7145 (10) | 26842 (10) | 39182 (1) | 53.10 (5) | 1149 (7) |
| FKOID | 4 | 4862 (4) | 5746 (4) | 18941 (2) | 38768 (4) | 53.03 (9) | 1096 (2) |

The VOID results on the 6 real astronautical data where the symbol

The FKOID results on the 6 real astronautical data where the symbol

Based on the experimental results, the VOID algorithm suits cases in which the data vary only slightly in the regular situation but violently in the irregular situation. If the normal data fluctuate frequently and widely, the VOID algorithm will produce large errors. Additionally, the VOID algorithm is suitable for OID in the time domain, but it fails when the data vary calmly in the time domain yet violently in the frequency domain.
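To make the last point concrete, the sketch below uses synthetic data of our own construction (not one of the paper's datasets): a sine series in which interval 7 doubles its frequency but keeps its amplitude. The per-interval variances, which are what a variance score would rank, come out identical, while the dominant FFT bin of interval 7 moves:

```python
import numpy as np

t = np.arange(1200)
data = np.sin(2 * np.pi * t / 20)                    # period of 20 samples
data[600:700] = np.sin(2 * np.pi * t[600:700] / 10)  # interval 7: period of 10 samples
intervals = data.reshape(12, 100)                    # 12 equal-length intervals

variances = intervals.var(axis=1)  # variance-style outlier scores
peak_bins = np.argmax(np.abs(np.fft.rfft(intervals, axis=1)), axis=1)

print(np.round(variances, 6))  # every interval has variance 0.5, so none stands out
print(peak_bins)               # bin 5 everywhere except bin 10 for interval 7
```

A whole number of sine periods always has variance 0.5 regardless of frequency, which is why a time-domain variance score is blind to this kind of change.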

The FKOID algorithm can solve the problem of OID in the frequency domain. It is shown in Figure

OID techniques can quickly process TSD to find the oddest objects, which often indicate crucial exceptional events. They have broad prospects in astronautical applications, such as spacecraft fault prediction and diagnosis systems, intelligent astronautical monitoring systems, and other TSD-based systems.

This paper proposes two algorithms to detect outlier intervals in astronautical data. The VOID algorithm directly exploits the variance of the data to quickly detect outlier intervals in the time domain. The FKOID algorithm employs the full frequency band to build the feature vector of an interval and measures the outlier score by the distance sum of the

However, both algorithms assume intervals of identical length. This assumption is rather arbitrary in practice, because real outlier intervals may vary in length. Our future work will therefore study methods for unequal-length Outlier Interval Detection.

This research is supported by National Natural Science Foundation of China (Grant 60903123) and the Baidu Theme Research Plan on Large Scale Machine Learning and Data Mining.