K-Line Patterns' Predictive Power Analysis Using the Methods of Similarity Match and Clustering

Stock price prediction based on K-line patterns is the essence of candlestick technical analysis. However, academia disputes whether K-line patterns have predictive power. To help resolve the debate, this paper uses the data mining methods of pattern recognition, pattern clustering, and pattern knowledge mining to study the predictive power of K-line patterns. A similarity match model and a nearest neighbor-clustering algorithm are proposed to solve the problems of similarity matching and clustering of K-line series, respectively. The experiment tests the predictive power of the Three Inside Up and Three Inside Down patterns on the K-line series data of Shanghai 180 index component stocks over the latest 10 years. Experimental results show that (1) the predictive power of a pattern varies a great deal for different shapes and (2) each of the existing K-line patterns requires further classification based on shape features to improve prediction performance.


Introduction
A time series is a series of observations listed in time order. It is the most commonly encountered data type, touching almost every aspect of human life [1]. Examples include meteorological time series, time series of stock prices (stock time series for short), which are composed of stock price observations, and personal health time series, which consist of observations of blood pressure, temperature, white blood cell count, and so forth.
Research shows that time series have two important features. (a) Historical information affects the future trend [2]. That is, the historical values of observations exert an influence on the future values in the time series. This influence can be described by the time series' periodicity, nonstationarity, varying volatility, and so on. (b) History repeats itself [3]. That is to say, some special subseries repeat throughout the entire time series. Because of these two features, time series forecasting of all kinds has become an active research area, one branch of which is the prediction of stock time series, or stock prediction for short. Stock time series not only have the features of a typical time series; the trend of stock prices is also directly related to people's vital interests. Therefore, stock prediction has attracted a wide variety of researchers.
There are many technical analysis methods for stock prediction, the best known of which is candlestick technical analysis, also called K-line technology analysis in Asia. To study the fluctuation of stock prices more intuitively, people invented the candlestick chart (also called the K-line) to represent stock time series graphically. Taking a daily K-line as an example, a K-line represents the fluctuation of stock prices in one day; it not only shows the close price, open price, high price, and low price for the day but also reflects the difference and relative size between any two of these prices (all K-lines in this paper are daily K-lines unless otherwise indicated). If the K-lines of a stock are listed in time order, they form a series that reflects the fluctuation of the stock price over some period, called a K-line series. As each K-line consists of four prices, a K-line series is in essence a stock time series with four observations per time step. If a K-line subseries contains knowledge that can be used to predict the stock, it is called a K-line pattern series, or a K-line pattern for short. For instance, if the stock price often rises (or often falls) after a certain subseries appears, that subseries is a typical pattern series. Stock prediction based on K-line patterns is the essence of K-line technology analysis. How to mine K-line patterns and how to use them for prediction are its main research topics.
By manually observing the K-line series of the stock market (or the Japanese rice market), people have found many K-line patterns; the leading figure is the founder of the K-line, Munehisa Honma, a Japanese rice trader in the 18th century. The literature [4, 5] introduces the existing patterns and their features in detail, such as Three Inside Up (TIU), Three Inside Down (TID), and Doji. Some papers [6-10] conclude from experiments that the existing K-line patterns forecast stock trends well. Other papers [11-15] have studied stock prediction based on these patterns and have achieved some research results. However, a number of papers [5, 16-18] challenge these patterns' predictive power. They argue that K-line technology analysis violates the efficient market hypothesis, so stock investment based on K-line patterns is not feasible. Their experiments also show that the existing K-line patterns have no predictive power.
The above analysis shows that academia disputes whether K-line patterns have predictive power. However, few papers analyze why there are two opposing positions. Paper [19] also addresses the debate, but rather than analyzing the K-line patterns themselves, it attempts to answer the following question: are trend reversals accompanied more often by some types of candlesticks than by others? It finds that some types of candlesticks frequently appear close to trend-reversal regions, whereas others cannot be found in such regions. Although that research shows that K-line patterns exist, it does not explain why there is a debate on their predictive power.
After reviewing the relevant literature, this paper considers the main reason to be that the existing K-line patterns lack a rigorous mathematical definition. For example, the shadow length and body size are not clearly specified in the definitions of K-line patterns, which means that a K-line pattern can take many different shapes, and the predictive power of a pattern may vary considerably across those shapes. If we ignore the shape differences and study the predictive power of a pattern by treating all of its shapes as a whole, instead of classifying the pattern further based on its shape features, then the resulting estimate of its predictive power may be biased. For instance, suppose a TIU pattern has three shapes, A, B, and C, as shown in Figure 1, where shape A is the generic form of the TIU pattern and shapes B and C are infrequent forms. Suppose that shape A has predictive power, whereas shapes B and C do not. If we ignore the differences among the three shapes and study them as a whole, we may come to the wrong conclusion that the TIU pattern has no predictive power. However, if the three shapes are classified further based on shape features and studied separately, we reach the correct conclusion that the TIU pattern has predictive power only in shape A.
In addition, another reason is that, because the existing K-line patterns were mined manually, some of them may be spurious.
To resolve the debate and verify these two inferences, this paper studies the predictive power of K-line patterns using data mining methods such as pattern recognition, pattern clustering, pattern knowledge mining, and statistical analysis. The rest of this paper is organized as follows. In Section 2, we first briefly introduce the K-line, K-line technology analysis, and K-line patterns. Then we define the similarity match model and the nearest neighbor-clustering algorithm for K-line series. In Section 3, we define the method for mining patterns' predictive power. Section 4 presents the experimental results and discussion. Section 5 concludes the paper.

K-Line and K-Line Series Clustering
First, we give the mathematical definition of a K-line series. Let $KS_i$ represent the $i$-th K-line series of any stock, and let $K_t^i$ represent the $t$-th K-line in $KS_i$; then

$$KS_i = [K_1^i, K_2^i, \ldots, K_{|KS_i|}^i], \qquad K_t^i = (c_t^i, o_t^i, h_t^i, l_t^i),$$

where $|KS_i|$ is the number of elements in $KS_i$, also called the length of $KS_i$, and $c_t^i$, $o_t^i$, $h_t^i$, and $l_t^i$ are the $t$-th day's close price, open price, high price, and low price in $KS_i$, respectively. In this paper, the symbol "| |" indicates the number of elements in a set or series.

K-Line Introduction.
As defined in the literature [4-6], the K-line is drawn from four basic elements: close price, open price, high price, and low price. The part between the close price and the open price is drawn as a rectangle called the body of the K-line, the part between the high price and the body is drawn as a line called the upper shadow, and the part between the low price and the body is drawn as a line called the lower shadow. The figure consisting of the upper shadow, lower shadow, and body is called a K-line.
If the open price is lower than the close price, the K-line is called a Yang line, and the body is usually filled with white or green, as shown in Figure 2(a). If the open price is higher than the close price, the K-line is called a Yin line, and the body is usually filled with black or red, as shown in Figure 2(b). If the open price is equal to the close price, the K-line is called a Doji line, and the body collapses into a single horizontal line, as shown in Figure 2(c). Note that the body colors of Yin and Yang lines differ between the Chinese stock market and the European and American stock markets. In the Chinese stock market, the Yang line is red and the Yin line is green, whereas in European and American stock markets the Yang line is green and the Yin line is red.
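As a minimal sketch of the naming rules above, the following Python snippet classifies a daily K-line as a Yang, Yin, or Doji line. The `KLine` tuple and the `kline_type` function are illustrative names introduced here and do not appear in the original paper.

```python
from typing import NamedTuple


class KLine(NamedTuple):
    """One daily K-line: close, open, high, and low prices (illustrative type)."""
    close: float
    open: float
    high: float
    low: float


def kline_type(k: KLine) -> str:
    """Return 'yang' if close > open, 'yin' if close < open, otherwise 'doji'."""
    if k.close > k.open:
        return "yang"
    if k.close < k.open:
        return "yin"
    return "doji"


# Hypothetical prices, chosen only to exercise the three cases.
rising = KLine(close=10.5, open=10.0, high=10.8, low=9.9)
falling = KLine(close=9.6, open=10.0, high=10.1, low=9.5)
flat = KLine(close=10.0, open=10.0, high=10.3, low=9.8)
print(kline_type(rising), kline_type(falling), kline_type(flat))
```

Colors are deliberately left out of the sketch, since the color convention depends on the market, as noted above.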

K-Line Technology Analysis.
First, we introduce and define some key concepts of K-line technology analysis. Let $K_t$ represent the $t$-th day's K-line of any stock.
(1) Moving Average. It is the average of the stock price over some period. The three-day moving average at time $t$ is defined by

$$MA_3(t) = \frac{1}{3}\,(c_{t-2} + c_{t-1} + c_t),$$

where $c_t$ denotes the close price of $K_t$.
(2) K-Line Trend. It describes the trend of a K-line, which is either an uptrend or a downtrend. $K_t$ is said to be in a downtrend if

$$MA_3(t-6) > MA_3(t-5) > \cdots > MA_3(t)$$

with at most one violation of the inequalities. An uptrend is defined analogously.
(3) Stock Price Trend. It describes the general trend of stock prices over some period, which is either an uptrend or a downtrend. If the future trend of the stock price is rising, the market is called a bullish market. In contrast, if the future trend of the stock price is descending, the market is called a bearish market. Moreover, a more intense rising or descending trend indicates a more typical bullish or bearish market. The capability of a K-line pattern for predicting the bullish market and the bearish market is defined in formulas (17) and (18), respectively. Note that the concepts of "moving average" and "K-line trend" are defined in [6], while the concept of stock price trend is first defined in this paper.
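The moving average and trend definitions can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the default window of six consecutive inequalities follows the convention this sketch attributes to [6].

```python
from typing import Sequence


def ma3(close: Sequence[float], t: int) -> float:
    """Three-day moving average of close prices ending at index t (requires t >= 2)."""
    return (close[t - 2] + close[t - 1] + close[t]) / 3.0


def _trend(close: Sequence[float], t: int, window: int,
           max_violations: int, falling: bool) -> bool:
    # MA3(t - window) needs close[t - window - 2]; without enough history, report no trend.
    if t - window - 2 < 0:
        return False
    mas = [ma3(close, s) for s in range(t - window, t + 1)]
    if falling:
        violations = sum(1 for a, b in zip(mas, mas[1:]) if not a > b)
    else:
        violations = sum(1 for a, b in zip(mas, mas[1:]) if not a < b)
    return violations <= max_violations


def is_downtrend(close: Sequence[float], t: int,
                 window: int = 6, max_violations: int = 1) -> bool:
    """True if MA3 is strictly decreasing over the window, allowing a few violations."""
    return _trend(close, t, window, max_violations, falling=True)


def is_uptrend(close: Sequence[float], t: int,
               window: int = 6, max_violations: int = 1) -> bool:
    """True if MA3 is strictly increasing over the window, allowing a few violations."""
    return _trend(close, t, window, max_violations, falling=False)


# Hypothetical close prices: a steady decline with a single upward blip.
closes = [20, 19, 18, 17, 16, 18, 14, 13, 12]
print(is_downtrend(closes, t=8))
```

The single blip at index 5 produces exactly one violated inequality, so the series still counts as a downtrend under the one-violation tolerance.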

K-Line Patterns.
Many K-line patterns have been mined to date, as shown in the literature [4-6]. Owing to limited space, only the TIU and TID patterns are introduced here. Let $KS = [K_t, K_{t+1}, K_{t+2}]$ represent a three-day K-line series.
The conditions for KS to be a TIU pattern are as follows:
(1) $K_t$ is in a downtrend, and $c_t < o_t$.
(2) $c_{t+1} > o_{t+1}$, $o_t \ge c_{t+1} > c_t$, and $o_t \ge o_{t+1} > c_t$, where at most one of the two equalities holds. That is, the second day $t+1$ is a Yang line and must be contained within the body of the first day.
(3) $c_{t+2} > o_{t+2}$ and $c_{t+2} > o_t$. That is, the third day $t+2$ is a Yang line and closes above the open of the first day.
A standard TIU pattern is shown in Figure 3(a).
According to the existing literature, TIU is a trend-reversal pattern that gives a bullish market signal. This means that when the TIU pattern appears, stock prices are likely to move from a downtrend into an uptrend, that is, the market is likely to change from bearish to bullish, and stock prices will rise gradually.
The conditions for KS to be a TID pattern are as follows:
(1) $K_t$ is in an uptrend, and $c_t > o_t$.
(2) $c_{t+1} < o_{t+1}$, $c_t \ge o_{t+1} > o_t$, and $c_t \ge c_{t+1} > o_t$, where at most one of the two equalities holds. That is, the second day $t+1$ is a Yin line and must be contained within the body of the first day.
(3) $c_{t+2} < o_{t+2}$ and $c_{t+2} < o_t$. That is, the third day $t+2$ is a Yin line and its close is lower than the first day's open.
A standard TID pattern is shown in Figure 3(b).
According to the existing literature, TID is a trend-reversal pattern that gives a bearish market signal. This means that after the TID pattern appears, stock prices are likely to move from an uptrend into a downtrend, that is, the market is likely to change from bullish to bearish, and stock prices will fall gradually.
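The two sets of conditions can be expressed as simple predicates. The sketch below is one possible reading of conditions (1) to (3), under these assumptions: the trend test for the first day is computed elsewhere and passed in as a boolean, the first day of a TIU is a Yin line and the first day of a TID is a Yang line, and `KLine`, `is_tiu`, and `is_tid` are hypothetical names.

```python
from typing import NamedTuple


class KLine(NamedTuple):
    close: float
    open: float
    high: float
    low: float


def is_tiu(k1: KLine, k2: KLine, k3: KLine, in_downtrend: bool) -> bool:
    """Three Inside Up: Yin day in a downtrend, Yang day inside its body, Yang day closing above k1.open."""
    first = in_downtrend and k1.close < k1.open
    second = (k2.close > k2.open
              and k1.open >= k2.close > k1.close
              and k1.open >= k2.open > k1.close)
    # "At most one of the two equalities holds."
    equalities = int(k1.open == k2.close) + int(k1.open == k2.open)
    third = k3.close > k3.open and k3.close > k1.open
    return first and second and equalities <= 1 and third


def is_tid(k1: KLine, k2: KLine, k3: KLine, in_uptrend: bool) -> bool:
    """Three Inside Down: Yang day in an uptrend, Yin day inside its body, Yin day closing below k1.open."""
    first = in_uptrend and k1.close > k1.open
    second = (k2.close < k2.open
              and k1.close >= k2.open > k1.open
              and k1.close >= k2.close > k1.open)
    equalities = int(k1.close == k2.open) + int(k1.close == k2.close)
    third = k3.close < k3.open and k3.close < k1.open
    return first and second and equalities <= 1 and third


# Hypothetical three-day series shaped like a standard TIU pattern.
d1 = KLine(close=10.0, open=12.0, high=12.2, low=9.8)
d2 = KLine(close=11.5, open=10.5, high=11.7, low=10.3)
d3 = KLine(close=12.5, open=11.4, high=12.6, low=11.3)
print(is_tiu(d1, d2, d3, in_downtrend=True))
```

Note that the predicates constrain only open and close prices; shadow lengths and body sizes are left free, which is exactly why one pattern can take many shapes.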

Similarity Match of K-Line Series.
The similarity match of K-line series is an essential and basic task for K-line series clustering. In the literature, however, few papers focus on it. Only paper [20] studies a similarity match method and search algorithm for K-line series using image retrieval technology. In addition, papers [19, 21] propose similarity match models of K-line series based on the traditional Euclidean distance.
From the viewpoint of stock prediction, the similarity of K-line series refers to the trend similarity of the K-lines in the series. The K-line trend, however, is determined by the changes in close, open, high, and low prices and by the size relationship between the close price and the open price. Therefore, to match two K-line series, we should calculate the similarity of K-line price changes instead of the similarity of price values. As K-line price changes are not shown directly in the K-line chart, the similarity match models in the literature [19-21] use the distance between K-line prices rather than the distance between K-line price changes. These models are thus based on K-line price values rather than price changes, and they cannot accurately measure the similarity of stock price trends in K-line series. For this reason, this paper proposes a new similarity match model based on K-line price changes to measure the trend similarity between two K-line series. In this model, the similarity of K-line series is composed of two parts: the shape similarity, which is the similarity of the shape features of corresponding K-lines in the two series, and the position similarity, which is the similarity of the position features of corresponding K-lines in the two series. We therefore define the shape similarity model and the position similarity model of K-line series in turn, and then build the similarity model of the entire K-line series from these two models.
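A small numeric illustration of why price values are a poor basis for trend matching: take one stock closing at 10 and then 10.5 and another closing at 20 and then 21. Both rise by 5%, yet their price vectors are far apart. The helper names below are illustrative, and the Euclidean distance stands in for the price-based models only as a simplification.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def change_rates(closes: Sequence[float]) -> List[float]:
    """Day-over-day relative change of the close price."""
    return [(b - a) / a for a, b in zip(closes, closes[1:])]


closes_i = [10.0, 10.5]
closes_j = [20.0, 21.0]

price_distance = euclidean(closes_i, closes_j)
rate_distance = euclidean(change_rates(closes_i), change_rates(closes_j))
print(price_distance, rate_distance)
```

The distance on raw prices is large even though the two trends are identical, while the distance on change rates is zero, which is the behavior a trend-oriented similarity measure needs.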

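The Shape Similarity of K-Line Series.
According to the shape features of the K-line, this paper proposes using the shape distance to measure the shape similarity between two K-lines. First, based on the shape structure of the K-line, three components of K-line shape are extracted: the upper shadow, the lower shadow, and the body. Second, a similarity match method is defined for each of the three components. Finally, the shape similarity of a K-line is calculated by summing the similarities of the three components. Assuming that $K_t^i$ and $K_t^j$ denote the $t$-th day's K-lines of $KS_i$ and $KS_j$, respectively, the shape similarity model of K-line series is defined as follows.
(1) Let $US_i[t]$ denote the upper shadow length of $K_t^i$, as defined in the following formula:

$$US_i[t] = \frac{h_t^i - \max(o_t^i, c_t^i)}{c_{t-1}^i \times 0.1},$$

where $c_{t-1}^i \times 0.1$ is used to normalize the upper shadow length. According to the regulations of the Chinese A-share market, the daily fluctuation of a stock price cannot exceed 10% of the previous day's close price, so $c_{t-1}^i \times 0.1$ can be used to normalize the lengths of the K-line's upper shadow, lower shadow, and body. Let $Sim^{US}_{i,j}(t)$ denote the similarity between $US_i[t]$ and $US_j[t]$.
(2) Let $LS_i[t]$ denote the lower shadow length of $K_t^i$, as defined in the following formula:

$$LS_i[t] = \frac{\min(o_t^i, c_t^i) - l_t^i}{c_{t-1}^i \times 0.1}.$$

Let $Sim^{LS}_{i,j}(t)$ denote the similarity between $LS_i[t]$ and $LS_j[t]$.
(3) Let $B_i[t]$ denote the body length of $K_t^i$, as defined in the following formula:

$$B_i[t] = \frac{c_t^i - o_t^i}{c_{t-1}^i \times 0.1}.$$

Let $Sim^{Body}_{i,j}(t)$ denote the similarity between $B_i[t]$ and $B_j[t]$.
The shape similarity between $K_t^i$ and $K_t^j$ is then

$$Sim^{S}_{i,j}(t) = \omega_{Body}\, Sim^{Body}_{i,j}(t) + \omega_{US}\, Sim^{US}_{i,j}(t) + \omega_{LS}\, Sim^{LS}_{i,j}(t),$$

where $\omega_{Body}$, $\omega_{US}$, and $\omega_{LS}$ represent the weights of $Sim^{Body}_{i,j}(t)$, $Sim^{US}_{i,j}(t)$, and $Sim^{LS}_{i,j}(t)$, respectively. The shape similarity between $KS_i$ and $KS_j$ is

$$SSim_{i,j} = \sum_{t} \omega_t \, Sim^{S}_{i,j}(t),$$

where $\omega_t$ represents the weight of $Sim^{S}_{i,j}(t)$. Because each K-line can be given a different weight, K-line series with special shape features can be identified well.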
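The shape features and their weighted combination can be sketched as below. Several parts are assumptions made only for illustration: the body is taken as the signed difference between close and open, the per-component similarity `component_similarity` is a simple stand-in (this text does not give the model's actual component formula), and the weights are arbitrary values that sum to 1.

```python
import math
from typing import NamedTuple


class KLine(NamedTuple):
    close: float
    open: float
    high: float
    low: float


class ShapeFeatures(NamedTuple):
    upper_shadow: float
    lower_shadow: float
    body: float


def shape_features(k: KLine, prev_close: float) -> ShapeFeatures:
    """Normalize the shadows and body by 10% of the previous day's close."""
    unit = prev_close * 0.1
    top, bottom = max(k.open, k.close), min(k.open, k.close)
    return ShapeFeatures(
        upper_shadow=(k.high - top) / unit,
        lower_shadow=(bottom - k.low) / unit,
        body=(k.close - k.open) / unit,  # ASSUMPTION: signed, positive for a Yang line
    )


def component_similarity(a: float, b: float) -> float:
    # ASSUMPTION: stand-in similarity in (0, 1]; equals 1 only when a == b.
    return 1.0 / (1.0 + abs(a - b))


def shape_similarity(f1: ShapeFeatures, f2: ShapeFeatures,
                     w_body: float = 0.5, w_us: float = 0.25,
                     w_ls: float = 0.25) -> float:
    """Weighted sum of body, upper shadow, and lower shadow similarities."""
    return (w_body * component_similarity(f1.body, f2.body)
            + w_us * component_similarity(f1.upper_shadow, f2.upper_shadow)
            + w_ls * component_similarity(f1.lower_shadow, f2.lower_shadow))


# Two hypothetical K-lines at different price levels with the same relative shape.
a = shape_features(KLine(close=10.5, open=10.0, high=10.8, low=9.9), prev_close=10.0)
b = shape_features(KLine(close=21.0, open=20.0, high=21.6, low=19.8), prev_close=20.0)
print(shape_similarity(a, b))
```

Because every length is divided by the same fraction of the previous close, the two K-lines produce nearly identical features and a similarity close to 1, despite one stock trading at twice the price of the other.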

The Position Similarity of K-Line Series.
To compute the similarity between two K-line series, we consider not only the shape similarity but also the position similarity. If we considered only the shape similarity, two K-line series with the same shape features but different position features would be assigned the same similarity.
For example, suppose that the K-line series charts of $KS_i$ and $KS_j$ are as shown in Figure 4. According to the shape feature definition of the K-line, all of the corresponding K-lines of $KS_i$ and $KS_j$ have the same shape features. This means that $KS_i$ and $KS_j$ have identical shape features; that is, $SSim_{i,j} = 1$. However, as Figure 4 shows, the relative positions of $K_2^i$ and $K_2^j$ are different, although $K_1^i$ and $K_1^j$ have the same relative position in their K-line series. Therefore, the overall stock price trends of $KS_i$ and $KS_j$ are not identical; that is, $Sim_{i,j} < 1$. If we considered only the shape similarity, we would draw the wrong conclusion that $SSim_{i,j} = Sim_{i,j} = 1$.
To solve this problem, we introduce the concept of the K-line coordinate, which allows position matching of K-lines by defining each K-line's coordinate in its K-line series. In this paper, the sequence number of a K-line in the K-line series is called the $x$ coordinate of the K-line, and the increase range of the close price is called the $y$ coordinate of the K-line; in addition, the $y$ coordinate of the first K-line in the series is set to 1. The position similarity model of K-line series based on the K-line coordinate is defined as follows.
(1) Let $(x_t^i, y_t^i)$ denote the coordinate of $K_t^i$, where $x_t^i$ is the sequence number of $K_t^i$ in $KS_i$ and $y_t^i$ is derived from the increase range of the close price, with $y_1^i = 1$. Let $Sim^{P}_{i,j}(t)$ denote the similarity between the coordinates of $K_t^i$ and $K_t^j$.
(2) Let $PSim_{i,j}$ denote the position similarity between $KS_i$ and $KS_j$, as defined by

$$PSim_{i,j} = \sum_{t} \omega_t \, Sim^{P}_{i,j}(t),$$

where $\omega_t$ represents the weight of $Sim^{P}_{i,j}(t)$. Because each K-line can be given a different weight, K-line series with special coordinates can be identified well.
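The idea of a K-line coordinate can be illustrated with a short sketch. The exact formula for the $y$ coordinate is not reproduced in this text, so the code assumes one plausible reading: $x$ is the 1-based sequence number and $y$ is the close price relative to the first close, which makes the first $y$ equal to 1. The function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def kline_coordinates(closes: Sequence[float]) -> List[Tuple[int, float]]:
    """Return (x, y) per K-line: x is the 1-based index, y is close / first close.

    ASSUMPTION: y accumulates the relative change of the close price, starting at 1.
    """
    first = closes[0]
    return [(t + 1, c / first) for t, c in enumerate(closes)]


# Two hypothetical series that start at the same level; the second K-line of one
# sits above the first, while the second K-line of the other sits below it.
coords_i = kline_coordinates([10.0, 11.0])
coords_j = kline_coordinates([10.0, 9.0])
print(coords_i, coords_j)
```

Even if the individual K-lines of the two series had identical shapes, their second coordinates differ, so a position-aware similarity would no longer rate the series as identical.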

The Similarity of K-Line Series.
Finally, the similarity of K-line series is obtained from the shape similarity and the position similarity. The similarity match model between $KS_i$ and $KS_j$ is defined by

$$Sim_{i,j} = \omega_S \cdot SSim_{i,j} + \omega_P \cdot PSim_{i,j},$$

where $\omega_S$ and $\omega_P$ represent the shape similarity weight and the position similarity weight of the K-line series, respectively.

Cluster of K-Line Series.
A more accurate classification of K-line patterns can be obtained by clustering them with a nearest neighbor-clustering algorithm based on the similarity match model of K-line series. The K-line series' nearest neighbor-clustering algorithm (KNNCA) is described in Algorithm 1.
In addition, $|C_i|$ represents the number of elements in cluster $C_i$. As each K-line series is matched once with all of the K-line series stored in the clusters, the time complexity and space complexity of KNNCA are both $O(n^2)$.
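Algorithm 1 is not reproduced in this text, so the following is only a generic sketch of threshold-based nearest neighbor clustering, not the authors' KNNCA. Each item is compared with every item already clustered; it joins the cluster of its most similar neighbor if that similarity reaches a threshold and otherwise starts a new cluster. The function name, the `threshold` parameter, and the toy similarity are assumptions.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def nearest_neighbor_cluster(items: Sequence[T],
                             sim: Callable[[T, T], float],
                             threshold: float) -> List[List[T]]:
    """Assign each item to the cluster of its nearest already-clustered neighbor."""
    clusters: List[List[T]] = []
    for item in items:
        best_sim = float("-inf")
        best_cluster: Optional[List[T]] = None
        # Compare the item with every member of every existing cluster: O(n) per item.
        for cluster in clusters:
            for member in cluster:
                s = sim(item, member)
                if s > best_sim:
                    best_sim, best_cluster = s, cluster
        if best_cluster is not None and best_sim >= threshold:
            best_cluster.append(item)
        else:
            clusters.append([item])
    return clusters


# Toy data: numbers stand in for K-line series, with a stand-in similarity function.
toy_sim = lambda a, b: 1.0 / (1.0 + abs(a - b))
result = nearest_neighbor_cluster([1.0, 1.1, 5.0, 5.2, 1.05], toy_sim, threshold=0.8)
print(result)
```

Since every item is matched against all previously clustered items, the number of similarity evaluations grows quadratically with the number of items, consistent with the quadratic time complexity stated above.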

Mining of Patterns' Predictive Power
We can mine and analyze the patterns' predictive power according to the following steps.
(1) Pattern Recognition. Based on the definition of K-line patterns, we identify all the K-line series belonging to a pattern (such as TIU or TID); these series form a set (KSet).
(2) Pattern Clustering. We use the KNNCA algorithm to cluster KSet and obtain the set of clusters (CSet), in which different clusters represent different shapes of the same pattern.
(3) Knowledge Mining. We define some statistical indicators of stock prices and use them to mine stock prediction knowledge from each cluster.
A pattern's predictive power is obtained primarily by analyzing the trend of the pattern's consequent K-line series. Paper [22] found that K-line technology is suited to short-term investment prediction and that the most efficient time period for prediction is 10 days. Therefore, we mainly analyze the close price trend of the pattern's consequent K-line series over 10 days. Let $KS = [K_t, K_{t+1}, K_{t+2}]$ denote a three-day K-line pattern; its consequent K-line series is denoted by CKS. The statistical indicators of CKS are defined as follows.
(a) Let $c_k$ denote the $k$-th close price of CKS, let $P_u(c_k)$ denote the probability that the trend of $c_k$ is an uptrend, and let $P_d(c_k)$ denote the probability that the trend of $c_k$ is a downtrend. For a cluster $C_i$, $P_u(c_k)$ and $P_d(c_k)$ are calculated by

$$P_u(c_k) = \frac{|C_i^{u,k}|}{|C_i|}, \qquad P_d(c_k) = \frac{|C_i^{d,k}|}{|C_i|},$$

where $|C_i^{u,k}|$ represents the number of patterns in $C_i$ meeting the condition $c_k \ge c_{t+2}$, $|C_i^{d,k}|$ represents the number of patterns in $C_i$ meeting the condition $c_k \le c_{t+2}$, and $c_{t+2}$ represents the close price of $K_{t+2}$. $P_u(c_k) > 0.5$ indicates that the future trend of $c_k$ is rising, and $P_d(c_k) > 0.5$ indicates that the future trend of $c_k$ is descending.
(b) Let $P_u^n \in [0, 1]$ denote the probability that the close price will rise in the next $n$ days if the pattern appears, and let $P_d^n \in [0, 1]$ denote the probability that the close price will fall in the next $n$ days if the pattern appears.

Experiment One.
In Table 1, $C_0$ represents the cluster composed of all 1516 TIU patterns. Its $P_u^{10}$ is only 0.5, which means that TIU, taken as a whole, may be a spurious pattern for predicting a bullish market. However, after further classifying the TIU patterns, we can see that (1) $C_3$, $C_6$, $C_{16}$, and so forth have a strong capability for predicting a bullish market, because their $P_u^{10}$ values are all above 0.8; (2) $C_1$, $C_4$, $C_7$, and so forth have a moderate capability for predicting a bullish market, as their $P_u^{10}$ values lie between 0.5 and 0.7; and (3) $C_2$, $C_5$, $C_8$, and so forth have a weak capability for predicting a bullish market, as their $P_u^{10}$ values are all below 0.5. In particular, the $P_u^{10}$ of $C_2$ is only 0.1. Comparing the predictive power of $C_3$ and $C_2$, as shown in Figure 5, we can see that $C_3$ predicts a bullish market while $C_2$ predicts a bearish market, which means that their predictive power is opposite. The result of experiment one shows that (1) the predictive power of TIU varies a great deal for different shapes and (2) to become a better pattern for predicting a bullish market, the TIU pattern needs to be classified further. Both findings are consistent with the expected analysis.
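The counting behind indicator (a) of Section 3, on which cluster statistics of this kind are built, can be sketched in Python as follows. The data layout (one tuple per pattern occurrence, holding the pattern's last close and its consequent closes) and the function name are assumptions made for illustration, and the toy prices are invented.

```python
from typing import Sequence, Tuple

# One pattern occurrence: (close of the pattern's third day, closes of the following days).
Occurrence = Tuple[float, Sequence[float]]


def trend_probabilities(cluster: Sequence[Occurrence], k: int) -> Tuple[float, float]:
    """Return (P_u, P_d) for the k-th consequent close (k is 1-based).

    P_u is the share of occurrences whose k-th consequent close is >= the pattern's
    last close; P_d is the share where it is <= that close. Ties count toward both.
    """
    if not cluster:
        raise ValueError("cluster must contain at least one pattern occurrence")
    up = sum(1 for last_close, after in cluster if after[k - 1] >= last_close)
    down = sum(1 for last_close, after in cluster if after[k - 1] <= last_close)
    return up / len(cluster), down / len(cluster)


# Invented toy cluster with four occurrences and two consequent days each.
toy_cluster = [
    (10.0, [10.5, 11.0]),
    (20.0, [19.0, 21.0]),
    (5.0, [5.0, 4.0]),
    (8.0, [8.5, 7.5]),
]
print(trend_probabilities(toy_cluster, k=1))
print(trend_probabilities(toy_cluster, k=2))
```

Because both conditions include equality, an occurrence whose consequent close equals the pattern's last close is counted in both probabilities, so the two values need not sum to 1.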

Experiment Two.
The aim of the second experiment is to analyze the TID pattern's predictive power using the method defined in Section 3. First, based on the definition of TID, 1498 TID patterns are identified from the test data. Then we cluster these patterns using the KNNCA algorithm and obtain 572 clusters. We choose the 20 clusters with the most elements for statistical analysis, as shown in Table 2.
Similarly, the TID pattern may also be a spurious pattern for predicting a bearish market, because the $P_d^{10}$ of $C_0$ is 0, where $C_0$ represents the cluster composed of all 1498 TID patterns. Moreover, after further classifying the TID patterns, we can see that (1) except for $C_{11}$ and $C_{16}$, almost all of the clusters have a weak capability for predicting a bearish market, as their $P_d^{10}$ values are all below 0.5, and (2) even $C_{11}$ and $C_{16}$ do not have a high $P_d^{10}$, which is only 0.5 for both. Therefore, we conclude that the TID pattern is a spurious pattern, which is also consistent with the expected analysis.

Experiment Conclusion.
From the above experiments, we can draw the following conclusions. (1) The predictive power of a pattern varies a great deal for different shapes. Taking TIU as an example, some shapes of the TIU pattern have a strong capability for predicting a bullish market, while others have the opposite predictive power. Therefore, the predictive power of a pattern should be analyzed shape by shape. (2) There are definitely some spurious patterns among the existing K-line patterns. Therefore, to improve stock prediction performance based on K-line patterns, we need to classify the existing patterns further based on shape features, identify all the spurious patterns, and choose the patterns with stronger predictive power to predict the stock price.

Conclusion
Stock prediction is a popular research field in time series prediction. Stock price prediction based on K-line patterns is a primary technical analysis method and is widely used in practice, yet opinions about it differ in the academic world. To help resolve the debate, this paper uses data mining methods such as pattern recognition, similarity match, clustering, and statistical analysis to study the predictive power of K-line patterns.
Experimental results show that one reason for the debate is that the definitions of K-line patterns are loose and lack mathematical rigor. The other is that there are some spurious patterns among the existing K-line patterns. In addition, the method presented in this paper can be used not only to test the predictive power of patterns but also for K-line pattern mining and stock prediction. Future work is therefore as follows.
(1) It will be a necessary and significant task to identify all the spurious patterns using the proposed method.
(2) On the basis of the proposed method, we can study an automatic pattern mining method to discover more useful patterns for stock prediction.

Figure 1: Three kinds of shapes of TIU pattern.

Figure 3: The standard K-line series chart of TIU and TID.
For example, assume that two K-line series $KS_i$ and $KS_j$ need to be matched, and let $Sim_{i,j}$ indicate their similarity. Let $c_t^i = 10$, $c_{t+1}^i = 10.5$, $c_t^j = 20$, and $c_{t+1}^j = 21$. Let $RC_{t+1,i}$ indicate the close price change rate of $KS_i$ at day $t+1$, calculated by $(c_{t+1}^i - c_t^i)/c_t^i$, and let $SRC_{t+1,i,j}$ denote the similarity between $RC_{t+1,i}$ and $RC_{t+1,j}$. Then $RC_{t+1,i} = RC_{t+1,j} = 5\%$ and $SRC_{t+1,i,j} = 1$. The similarity match models in the literature [19-21] cannot produce the correct result $SRC_{t+1,i,j} = 1$. The same problem occurs when calculating the similarity of open, high, or low prices.

Figure 5: The comparison of predictive power between $C_3$ and $C_2$.