Stock price prediction based on K-line patterns is the essence of candlestick technical analysis. However, academia disagrees on whether K-line patterns have predictive power. To help resolve the debate, this paper uses the data mining methods of pattern recognition, pattern clustering, and pattern knowledge mining to study the predictive power of K-line patterns. A similarity match model and a nearest neighbor-clustering algorithm are proposed to solve the problems of similarity matching and clustering of K-line series, respectively. The experiment tests the predictive power of the Three Inside Up (TIU) and Three Inside Down (TID) patterns on the K-line series data of Shanghai 180 index component stocks over the latest 10 years. Experimental results show that
A time series is a series of observations listed in time order. It is the most commonly encountered data type, touching almost every aspect of human life [
Research shows that time series have two important features. (a) The historical information will affect the future trend [
There are many technical analysis methods for stock prediction, the best known of which is candlestick technical analysis, also called K-line technology analysis in Asia. To study the fluctuation of stock prices in a more intuitive way, people invented the candlestick chart (also called the K-line) to represent stock time series graphically. Taking a daily K-line as an example, a K-line represents the fluctuation of stock prices in one day. It not only shows the close price, open price, high price, and low price for the day but also reflects the difference and relative size between any two of these prices (all K-lines in this paper are daily K-lines unless otherwise indicated). If the K-lines of a stock are listed in time order, they form a series that reflects the fluctuation of the stock price over a period of time, which is called a K-line series. As each K-line consists of four prices, a K-line series is essentially a stock time series with four observations per time point.
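The description above can be made concrete with a minimal data-structure sketch. The names `KLine` and `KLineSeries` are illustrative assumptions and do not come from the paper; the only facts taken from the text are that a K-line carries four prices and that a K-line series lists K-lines in time order.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KLine:
    """One daily K-line: the four prices of a trading day (hypothetical type)."""
    open: float
    high: float
    low: float
    close: float

    def __post_init__(self) -> None:
        # The high bounds the open and close from above, the low from below.
        if not (self.low <= min(self.open, self.close)
                and max(self.open, self.close) <= self.high):
            raise ValueError("need low <= open, close <= high")


# A K-line series is simply the K-lines of one stock listed in time order.
KLineSeries = List[KLine]

series: KLineSeries = [
    KLine(open=10.0, high=10.6, low=9.8, close=10.4),
    KLine(open=10.4, high=10.9, low=10.3, close=10.7),
]
```

Validating the price ordering at construction time is a design choice of this sketch: it keeps malformed records out of any later similarity or clustering step.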
In a K-line series, if a K-line subseries contains knowledge that can be used to predict the stock, then this subseries is called a K-line pattern series, or a K-line pattern for short. For instance, if the stock price often rises (or falls) after a certain subseries appears, then this subseries is a typical pattern series. Stock prediction based on K-line patterns is the essence of K-line technology analysis. How to mine K-line patterns and how to use them for prediction are the main research topics of K-line technology analysis.
By manually observing the K-line series of the stock market (or the Japanese rice market), people have found many K-line patterns. The leading figure is the founder of the K-line, Munehisa Honma, a Japanese rice trader in the 18th century. The literature [
Based on the above analysis, it is obvious that there are some disputes on whether the K-line patterns have predictive power in academia. However, there are few papers analyzing the reason why there are two different positions regarding the patterns’ predictive power. Paper [
Through reviewing the relevant literature, this paper considers that the main reason is that the existing K-line patterns lack a rigorous mathematical definition. For example, the shadow length and body size are not clearly specified in the definitions of K-line patterns, so a single K-line pattern can take many different shapes, and its predictive power may vary considerably from shape to shape. If we ignore these differences and study the predictive power of a pattern by pooling all of its shapes instead of classifying the pattern further by its shape features, the results may be biased. For instance, a Three Inside Up (TIU) pattern has three shapes: shape A, shape B, and shape C, as shown in Figure
Three kinds of shapes of the TIU pattern: (a) shape A, (b) shape B, (c) shape C.
Another reason is that, as the existing K-line patterns were mined manually, there may be some spurious patterns among them.
In order to resolve the debate and verify the two inferences, this paper studies the predictive power of K-line patterns using data mining methods such as pattern recognition, pattern clustering, pattern knowledge mining, and statistical analysis. The rest of this paper is organized as follows. In Section
Firstly, we give the mathematical definition of K-line series. Let
As defined in literature [
If the open price of a K-line is lower than its close price, the K-line is called a Yang line, and its body is usually filled with white or green, as shown in Figure
K-line chart: (a) Yang line, (b) Yin line, (c) Doji line.
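The three K-line types named in the figure can be distinguished from the open and close prices alone. The sketch below is illustrative: the Yang rule (open lower than close) is stated in the text, while the Yin rule (open higher than close) and the Doji rule (open equal to close) follow the usual candlestick convention and are assumed here; the function name `classify_kline` is hypothetical.

```python
def classify_kline(open_price: float, close_price: float) -> str:
    """Return 'yang', 'yin', or 'doji' for one K-line.

    Yang: the close is above the open (rule stated in the text).
    Yin:  the close is below the open (usual convention, assumed).
    Doji: the open and close coincide (usual convention, assumed).
    """
    if open_price < close_price:
        return "yang"
    if open_price > close_price:
        return "yin"
    return "doji"


labels = [classify_kline(o, c) for o, c in [(10.0, 10.4), (10.4, 10.1), (10.1, 10.1)]]
```

In practice a Doji is often detected with a small tolerance rather than strict equality; that threshold would be a further assumption and is left out of this sketch.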
Firstly, we introduce and define some key concepts of K-line technology analysis. Let
It is noted that the concepts of “moving average” and “K-line trend” are defined by the paper [
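As a point of reference, a simple moving average of close prices can be computed as sketched below. This is the standard unweighted moving average, not necessarily the exact definition adopted from the cited paper; the function name and the choice to return `None` before a full window is available are assumptions made for illustration.

```python
from typing import List, Optional


def simple_moving_average(closes: List[float], window: int) -> List[Optional[float]]:
    """Unweighted moving average of the last `window` close prices.

    Positions that do not yet have a full window are reported as None.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    averages: List[Optional[float]] = []
    running_sum = 0.0
    for i, close in enumerate(closes):
        running_sum += close
        if i >= window:
            # Drop the price that just left the window.
            running_sum -= closes[i - window]
        averages.append(running_sum / window if i >= window - 1 else None)
    return averages


ma3 = simple_moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
```

A trend indicator is commonly built by comparing consecutive moving-average values, but the specific trend rule used in the paper is not reproduced here.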
Many K-line patterns have been mined up to now, as shown in literatures [
The conditions of
The standard K-line series charts of TIU and TID: (a) TIU, (b) TID.
According to the existing literature, TIU is a trend-reversal pattern that gives a bullish market signal. That is, when the TIU pattern appears, stock prices are likely to turn from a downtrend into an uptrend (the market changes from bearish to bullish) and then rise gradually.
The conditions of
According to the existing literature, TID is a trend-reversal pattern that gives a bearish market signal. That is, after the TID pattern appears, stock prices are likely to turn from an uptrend into a downtrend (the market changes from bullish to bearish) and then fall gradually.
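For orientation, the two patterns can be recognized with a few comparisons. The sketch below uses a common textbook reading of Three Inside Up and Three Inside Down and is an assumption, not the formal conditions of this paper: the first body has one color, the second body has the opposite color and lies inside the first body, and the third K-line continues in the direction of the second and closes beyond its close. The preceding-trend condition and any body-size or shadow-length thresholds are deliberately omitted, and all function names are hypothetical.

```python
from typing import Tuple

Candle = Tuple[float, float, float, float]  # (open, high, low, close)


def body_inside(inner: Candle, outer: Candle) -> bool:
    """True if the body of `inner` lies within the body of `outer`."""
    inner_lo, inner_hi = sorted((inner[0], inner[3]))
    outer_lo, outer_hi = sorted((outer[0], outer[3]))
    return outer_lo <= inner_lo and inner_hi <= outer_hi


def is_three_inside_up(c1: Candle, c2: Candle, c3: Candle) -> bool:
    """Textbook-style TIU check (trend precondition omitted)."""
    first_falls = c1[3] < c1[0]    # first K-line closes below its open
    second_rises = c2[3] > c2[0]   # second K-line closes above its open
    third_rises = c3[3] > c3[0]
    return (first_falls and second_rises and body_inside(c2, c1)
            and third_rises and c3[3] > c2[3])


def is_three_inside_down(c1: Candle, c2: Candle, c3: Candle) -> bool:
    """Mirror image of the TIU check (trend precondition omitted)."""
    first_rises = c1[3] > c1[0]
    second_falls = c2[3] < c2[0]
    third_falls = c3[3] < c3[0]
    return (first_rises and second_falls and body_inside(c2, c1)
            and third_falls and c3[3] < c2[3])


tiu = ((10.0, 10.1, 9.0, 9.2), (9.4, 9.8, 9.3, 9.7), (9.7, 10.3, 9.6, 10.2))
tid = ((9.2, 10.1, 9.1, 10.0), (9.8, 9.9, 9.4, 9.5), (9.5, 9.6, 8.9, 9.0))
```

Because this check ignores body size and shadow length, it accepts many visually different shapes as the same pattern, which is exactly the looseness that motivates clustering the matches by shape.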
The similarity match of K-line series is an essential and basic task for K-line series clustering. In the literature, however, few papers focus on the similarity match of K-line series. Only paper [
From the viewpoint of stock prediction, the similarity of K-line series refers to the trend similarity of the K-lines in the series. The K-line trend is determined by the changes in the close price, open price, high price, and low price, and by the relative size of the close price and the open price. Therefore, to match the similarity between two K-line series, we should calculate the similarity of K-line price changes instead of the similarity of price values. As the changes of K-line prices are not shown in the K-line chart, the distance between K-line prices rather than the distance between K-line price changes is used in the similarity match model of literature [
For example, assuming that there are two K-line series
Therefore, this paper proposes a new similarity match model based on K-line price changes to measure the trend similarity between two K-line series. In this model, the similarity of K-line series is composed of two parts: one is the shape similarity of K-line, which is the similarity of the corresponding K-line’s shape features in the two K-line series; the other is the position similarity of K-line, which is the similarity of the corresponding K-line’s position features in the two K-line series. Therefore, this paper will define K-line series’ shape similarity model and position similarity model, respectively. Then based on these two kinds of similarity models, the similarity model of the entire K-line series could be built.
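The motivation for matching on price changes rather than price values can be illustrated with a small sketch. It is not the similarity model of this paper: measuring each price's change relative to the previous day's close and using the Euclidean distance are both assumptions made for illustration, and the function names are hypothetical. The example compares a series with a copy whose prices are all doubled, so the two series have identical trends at different price levels.

```python
import math
from typing import List, Sequence, Tuple

Candle = Tuple[float, float, float, float]  # (open, high, low, close)


def euclidean(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    """Euclidean distance between two equally long series of 4-tuples."""
    return math.sqrt(sum((x - y) ** 2
                         for ra, rb in zip(a, b)
                         for x, y in zip(ra, rb)))


def price_changes(series: Sequence[Candle]) -> List[Tuple[float, ...]]:
    """Relative change of each of the four prices versus the previous close."""
    changes = []
    for prev, cur in zip(series, series[1:]):
        prev_close = prev[3]
        changes.append(tuple(p / prev_close - 1.0 for p in cur))
    return changes


a = [(10.0, 10.5, 9.9, 10.4), (10.4, 11.0, 10.3, 10.9), (10.9, 11.2, 10.6, 10.7)]
b = [tuple(2 * p for p in candle) for candle in a]  # same trend, double the price

value_distance = euclidean(a, b)                                # large
change_distance = euclidean(price_changes(a), price_changes(b))  # essentially zero
```

A distance on raw prices reports these two series as far apart, while a distance on price changes treats them as the same trend, which is the behavior a trend-oriented similarity match needs.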
According to the shape feature of K-line, this paper proposes using the shape distance to measure the shape similarity between two K-lines. Firstly, based on the shape structure of K-line, three components of K-line shape are extracted: the upper shadow shape, the lower shadow shape, and the body shape. Secondly, the similarity match methods of three shapes are defined, respectively. Finally, the shape similarity of K-line can be calculated by summing the three shapes’ similarity. Assuming that
Let
Let
Let
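The three shape components can be extracted and compared as in the sketch below. The formal definitions in this section are not reproduced: normalizing each component by the open price, keeping the body signed so that rising and falling bodies differ, and summing absolute differences are all assumptions made for illustration, and the function names are hypothetical.

```python
from typing import Tuple

Candle = Tuple[float, float, float, float]  # (open, high, low, close)


def shape_features(candle: Candle) -> Tuple[float, float, float]:
    """Upper shadow, lower shadow, and signed body, relative to the open price."""
    open_, high, low, close = candle
    upper_shadow = high - max(open_, close)
    lower_shadow = min(open_, close) - low
    body = close - open_  # positive for a rising body, negative for a falling one
    return (upper_shadow / open_, lower_shadow / open_, body / open_)


def shape_distance(k1: Candle, k2: Candle) -> float:
    """Sum of absolute differences of the three shape components (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(shape_features(k1), shape_features(k2)))


rising = (10.0, 10.6, 9.8, 10.4)
rising_doubled = (20.0, 21.2, 19.6, 20.8)  # same shape at twice the price level
falling = (10.4, 10.6, 9.8, 10.0)

same_shape = shape_distance(rising, rising_doubled)
different_shape = shape_distance(rising, falling)
```

Under these assumptions two K-lines with the same proportions have distance zero regardless of price level, while a rising and a falling K-line with equal shadows are still separated by their bodies.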
To compute the similarity between two K-line series, we consider not only the shape similarity of the K-lines but also their position similarity. If we considered only the shape similarity, two K-line series with the same shape features but different position features would be judged equally similar to any third series.
For example, supposing that the K-line series chart of
K-line series chart.
To solve this problem, the concept of a K-line coordinate is introduced, so that the position match of K-lines can be implemented by defining each K-line’s coordinate in the K-line series. In this paper, the sequence of K-line in the K-line series is called
Let
Finally, based on the shape similarity and position similarity, the similarity of K-line series could be obtained. Therefore, the similarity match model between
A more accurate classification of K-line patterns can be obtained by clustering them with the nearest neighbor-clustering algorithm based on the similarity match model of K-line series. The K-line series’ nearest neighbor-clustering algorithm (KNNCA) is described as shown in Algorithm
Assign initial value for parameters: Get
In addition,
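The general idea of nearest neighbor clustering can be sketched as follows. This is the generic textbook procedure, shown as an assumption rather than as the exact KNNCA steps: each item joins the cluster of its nearest already-clustered item if that distance is within a threshold, and otherwise starts a new cluster. The function name, the `threshold` parameter, and the toy distance are all hypothetical; in the setting of this paper the distance would come from the K-line series similarity match model.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def nearest_neighbor_clustering(
    items: Sequence[T],
    distance: Callable[[T, T], float],
    threshold: float,
) -> List[List[T]]:
    """Generic nearest neighbor clustering with a distance threshold."""
    clusters: List[List[T]] = []
    for item in items:
        best_cluster: Optional[List[T]] = None
        best_distance = float("inf")
        # Find the nearest item among everything clustered so far.
        for cluster in clusters:
            for member in cluster:
                d = distance(item, member)
                if d < best_distance:
                    best_distance, best_cluster = d, cluster
        if best_cluster is not None and best_distance <= threshold:
            best_cluster.append(item)  # close enough: join the neighbor's cluster
        else:
            clusters.append([item])    # too far from everything: new cluster
    return clusters


toy = [1.0, 1.2, 5.0, 5.1, 1.3]
groups = nearest_neighbor_clustering(toy, lambda x, y: abs(x - y), threshold=0.5)
```

The number of clusters is not fixed in advance; it follows from the threshold, which suits pattern data where the number of distinct shapes is unknown. The result can depend on the order in which items are processed.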
We can mine and analyze the patterns’ predictive power according to the following steps.
The pattern’s predictive power is obtained primarily by analyzing the trend of the pattern’s consequent K-line series. Paper [
(a) Let
(b) Let
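One simple way to summarize the trend of the consequent K-line series is sketched below. It is an illustrative assumption and not the statistic defined in steps (a) and (b): for each horizon of 1 to `max_horizon` days after a pattern ends, it reports the fraction of occurrences whose close price is higher than the close on the pattern's last day. The function and parameter names are hypothetical.

```python
from typing import List, Sequence


def rise_ratio_by_horizon(
    closes: Sequence[float],
    pattern_ends: Sequence[int],
    max_horizon: int = 10,
) -> List[float]:
    """Fraction of pattern occurrences followed by a higher close, per horizon.

    `pattern_ends` holds the index of the last K-line of each occurrence.
    Occurrences too close to the end of the data are skipped for that horizon.
    """
    ratios: List[float] = []
    for horizon in range(1, max_horizon + 1):
        usable = [i for i in pattern_ends if i + horizon < len(closes)]
        if not usable:
            ratios.append(float("nan"))  # no occurrence reaches this horizon
            continue
        rises = sum(1 for i in usable if closes[i + horizon] > closes[i])
        ratios.append(rises / len(usable))
    return ratios


closes = [10.0, 11.0, 12.0, 11.0, 10.0, 9.0]
ratios = rise_ratio_by_horizon(closes, pattern_ends=[0, 1], max_horizon=2)
```

A ratio well above 0.5 over many occurrences would suggest a bullish tendency and a ratio well below 0.5 a bearish one; for clusters with only a dozen or so occurrences such ratios are noisy and should be read with caution.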
Yahoo provides a finance stock API for downloading transaction data of the Chinese stock market, so the transaction data of the Chinese A-share market for any period can be acquired through it. To obtain representative test data, we select the K-line series data of Shanghai 180 index component stocks over the latest 10 years (from 2006-01-04 to 2016-08-24). For reasons of space, only the predictive power of the TIU and TID patterns is analyzed in the experiment. The parameters of the KNNCA algorithm are set as follows:
The aim of the first experiment is to analyze the TIU pattern’s predictive power based on the method defined in Section
The experiment result of TIU.
| | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 1516 | 0.493 | 0.486 | 0.518 | 0.495 | 0.475 | 0.488 | 0.481 | 0.479 | 0.484 | 0.479 | |
| | 79 | 0.405 | 0.532 | 0.557 | 0.532 | 0.544 | 0.506 | 0.468 | 0.506 | 0.43 | 0.443 | |
| | 42 | 0.405 | 0.429 | 0.5 | 0.452 | 0.405 | 0.452 | 0.429 | 0.476 | 0.524 | 0.476 | |
| | 32 | 0.5 | 0.594 | 0.594 | 0.625 | 0.594 | 0.531 | 0.625 | 0.594 | 0.563 | 0.531 | |
| | 25 | 0.6 | 0.6 | 0.52 | 0.48 | 0.48 | 0.4 | 0.48 | 0.48 | 0.6 | 0.64 | |
| | 23 | 0.435 | 0.478 | 0.609 | 0.565 | 0.522 | 0.391 | 0.478 | 0.348 | 0.391 | 0.391 | |
| | 22 | 0.455 | 0.636 | 0.545 | 0.5 | 0.545 | 0.545 | 0.591 | 0.591 | 0.636 | 0.636 | |
| | 21 | 0.333 | 0.571 | 0.524 | 0.476 | 0.619 | 0.857 | 0.762 | 0.714 | 0.429 | 0.048 | |
| | 21 | 0.524 | 0.476 | 0.524 | 0.429 | 0.476 | 0.476 | 0.429 | 0.476 | 0.429 | 0.429 | |
| | 21 | 0.524 | 0.476 | 0.524 | 0.429 | 0.381 | 0.286 | 0.381 | 0.429 | 0.381 | 0.429 | |
| | 19 | 0.105 | 0.368 | 0.421 | 0.421 | 0.368 | 0.421 | 0.474 | 0.474 | 0.579 | 0.421 | |
| | 19 | 0.579 | 0.474 | 0.421 | 0.368 | 0.263 | 0.368 | 0.316 | 0.263 | 0.211 | 0.211 | |
| | 17 | 0.588 | 0.294 | 0.471 | 0.353 | 0.412 | 0.471 | 0.471 | 0.412 | 0.471 | 0.412 | |
| | 15 | 0.6 | 0.533 | 0.467 | 0.467 | 0.467 | 0.467 | 0.533 | 0.533 | 0.6 | 0.6 | |
| | 15 | 0.533 | 0.6 | 0.667 | 0.6 | 0.533 | 0.6 | 0.6 | 0.667 | 0.6 | 0.6 | |
| | 14 | 0.5 | 0.071 | 0.5 | 0.429 | 0.429 | 0.357 | 0.286 | 0.286 | 0.286 | 0.429 | |
| | 14 | 0.643 | 0.571 | 0.714 | 0.714 | 0.714 | 0.5 | 0.643 | 0.571 | 0.571 | 0.714 | |
| | 14 | 0.714 | 0.857 | 0.857 | 0.857 | 0.786 | 0.786 | 0.571 | 0.643 | 0.643 | 0.643 | |
| | 14 | 0.357 | 0.571 | 0.429 | 0.5 | 0.357 | 0.571 | 0.429 | 0.429 | 0.5 | 0.5 | |
| | 13 | 0.385 | 0.462 | 0.692 | 0.615 | 0.615 | 0.692 | 0.692 | 0.769 | 0.769 | 0.769 | |
| | 12 | 0.5 | 0.667 | 0.667 | 0.667 | 0.75 | 0.583 | 0.583 | 0.667 | 0.583 | 0.583 | |
In Table
By comparing the predictive power of
The comparison of predictive power between
The aim of the second experiment is to analyze the TID pattern’s predictive power based on the method defined in Section
The experiment result of TID.
| | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 1498 | 0.421 | 0.451 | 0.461 | 0.449 | 0.437 | 0.449 | 0.459 | 0.444 | 0.427 | 0.435 | |
| | 78 | 0.449 | 0.5 | 0.436 | 0.436 | 0.372 | 0.359 | 0.449 | 0.397 | 0.359 | 0.346 | |
| | 72 | 0.431 | 0.486 | 0.542 | 0.458 | 0.431 | 0.431 | 0.417 | 0.417 | 0.361 | 0.306 | |
| | 44 | 0.364 | 0.364 | 0.364 | 0.386 | 0.273 | 0.386 | 0.364 | 0.386 | 0.386 | 0.341 | |
| | 39 | 0.513 | 0.359 | 0.359 | 0.333 | 0.231 | 0.282 | 0.308 | 0.282 | 0.282 | 0.282 | |
| | 23 | 0.174 | 0.261 | 0.304 | 0.391 | 0.348 | 0.391 | 0.391 | 0.435 | 0.435 | 0.391 | |
| | 23 | 0.348 | 0.478 | 0.522 | 0.478 | 0.435 | 0.435 | 0.391 | 0.478 | 0.478 | 0.435 | |
| | 21 | 0.429 | 0.429 | 0.476 | 0.429 | 0.619 | 0.524 | 0.524 | 0.476 | 0.476 | 0.571 | |
| | 18 | 0.444 | 0.333 | 0.444 | 0.5 | 0.5 | 0.278 | 0.222 | 0.333 | 0.333 | 0.333 | |
| | 18 | 0.333 | 0.667 | 0.611 | 0.389 | 0.444 | 0.444 | 0.389 | 0.389 | 0.389 | 0.389 | |
| | 18 | 0.333 | 0.222 | 0.278 | 0.167 | 0.278 | 0.389 | 0.444 | 0.389 | 0.5 | 0.556 | |
| | 18 | 0.444 | 0.444 | 0.278 | 0.333 | 0.389 | 0.222 | 0.278 | 0.333 | 0.444 | 0.389 | |
| | 16 | 0.438 | 0.375 | 0.5 | 0.563 | 0.688 | 0.688 | 0.625 | 0.5 | 0.438 | 0.563 | |
| | 15 | 0.267 | 0.267 | 0.467 | 0.4 | 0.267 | 0.6 | 0.533 | 0.467 | 0.4 | 0.4 | |
| | 13 | 0.385 | 0.308 | 0.308 | 0.385 | 0.308 | 0.462 | 0.231 | 0.231 | 0.308 | 0.308 | |
| | 12 | 0.25 | 0.333 | 0.583 | 0.667 | 0.667 | 0.583 | 0.5 | 0.5 | 0.417 | 0.5 | |
| | 12 | 0.5 | 0.333 | 0.25 | 0.25 | 0.5 | 0.667 | 0.417 | 0.5 | 0.5 | 0.5 | |
| | 12 | 0.25 | 0.833 | 0.75 | 0.667 | 0.583 | 0.5 | 0.5 | 0.417 | 0.417 | 0.583 | |
| | 11 | 0.364 | 0.273 | 0.273 | 0.364 | 0.545 | 0.455 | 0.545 | 0.545 | 0.455 | 0.545 | |
| | 11 | 0.364 | 0.455 | 0.273 | 0.364 | 0.364 | 0.364 | 0.455 | 0.455 | 0.455 | 0.364 | |
| | 11 | 0.273 | 0.364 | 0.273 | 0.364 | 0.364 | 0.364 | 0.364 | 0.364 | 0.182 | 0.273 | |
Similarly, the TID pattern may also be a spurious pattern for predicting a bearish market because
From the above experiments, we can draw the following conclusions. (1) The predictive power of a pattern varies a great deal across its shapes. Take TIU, for example: some shapes of the TIU pattern have a strong capability for predicting a bullish market, while others have the opposite predictive power. Therefore, the predictive power of a pattern should be analyzed separately for each of its shapes. (2) There are definitely some spurious patterns among the existing K-line patterns. Therefore, in order to improve the performance of stock prediction based on K-line patterns, we need to further classify the existing patterns by their shape features, identify all the spurious patterns, and choose the patterns with stronger predictive power to predict the stock price.
Stock prediction is a popular research field in time series prediction. Stock price prediction based on K-line patterns is a primary technical analysis method and is widely used in practice, yet opinions on it differ in the academic world. To help resolve the debate, this paper uses data mining methods such as pattern recognition, similarity matching, clustering, and statistical analysis to study the predictive power of K-line patterns. Experimental results show that one reason for the debate is that the definitions of K-line patterns are too loose and lack mathematical rigor. The other is that there are some spurious patterns among the existing K-line patterns. In addition, the method presented in the paper can be used not only to test the predictive power of patterns but also for K-line pattern mining and stock prediction. Therefore, the future work is as follows.
The authors declare that there are no conflicts of interest regarding the publication of this paper.
The Key Basic Research Foundation of Shanghai Science and Technology Committee, China (Grant no. 14JC1402203), and the Science and Technology Support Program of China (Grant no. 2015BAF10B01) financially supported this work.