Modeling Traders’ Behavior with Deep Learning and Machine Learning Methods: Evidence from BIST 100 Index

Although the vast majority of fundamental analysts believe that technical analysts' estimates and technical indicators used in these analyses are unresponsive, recent research has revealed that both professionals and individual traders are using technical indicators. A correct estimate of the direction of the financial market is a very challenging activity, primarily due to the nonlinear nature of the financial time series. Deep learning and machine learning methods, on the other hand, have achieved very successful results in many different areas where human beings are challenged. In this study, technical indicators were integrated into the methods of deep learning and machine learning, and the behavior of the traders was modeled in order to increase the accuracy of forecasting of the financial market direction. A set of technical indicators has been examined based on their application in technical analysis as input features to predict the oncoming (one-period-ahead) direction of the Istanbul Stock Exchange (BIST100) national index. To predict the direction of the index, Deep Neural Network (DNN), Support Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR) classification techniques are used. The performance of these models is evaluated on the basis of various performance metrics such as confusion matrix, compound return, and max drawdown.


Introduction
The efficiency of technical analysis, one of the oldest instruments used to predict market direction, has long been debated, and the discussion seems likely to continue. The main reason is that, to predict future market trends, technical analysis relies on information such as past prices and past volume rather than on fundamental analysis. This violates the classical market efficiency theory [1].
Investors are thought to be one of the most important drivers of volatility in stock prices as a result of repetitive patterned trading behavior. This leads to the idea that stock prices follow the trends that form the basis of technical analysis [2]. Although patterned trading behavior does not seem logical to some, investors are known to use it to predict market trends and future price movements effectively. The basic information that technical analysts use is volume and price. In technical analysis studies, the patterns in the historical stock exchange series arising from daily market activities are examined in order to predict future market movements.
Despite ongoing debate on the effectiveness of technical analysis, the emergence of traders applying technical analysis in practice may be even more motivating to carry out new studies in this area [3][4][5][6][7]. For example, a survey conducted on 692 fund managers indicates that 87% of them pay attention to technical analysis while making investment decisions [8].
In financial trading, technical and quantitative analysis uses mathematical and statistical tools to determine the most appropriate time for investors to initiate and close their orders, which means instructions for buying or selling on a trading venue. While these traditional approaches serve to some extent their purpose, new techniques emerging in computational intelligence such as machine learning and data mining have also been used to analyze financial information.
One of the main objectives of machine learning methods is to find hidden patterns in the data by using automatic or semiautomatic methods. Useful patterns allow us to make meaningful estimates on new data [9]. Machine learning techniques used in real life, such as time series analysis [10], communication [11], Internet traffic analysis [12], medical imaging [13], astronomy [14], document analysis [15], and biology [16], have demonstrated impressive performance in solving classification problems. While the vast majority of previous financial engineering research focuses on complex computational models such as Neural Networks [17][18][19][20] and Support Vector Machines [21,22], there is also research based on new deep learning models that yield better results in nonfinancial applications [23,24].
Deep learning is one of the machine learning methods that use past data to train models and make predictions from new data. Recent developments in deep learning have allowed computers to recognize and tag images, recognize and translate speech, be very successful in games that require skill, and even perform better than human beings [25]. In these applications, the goal is usually to train a computer to perform tasks that humans can do as well. Deep learning methods allow the task to be performed without human participation. This is valuable when the task could not be completed by human effort within a limited period of time, or when superhuman performance brings a large benefit, as in the case of medical diagnoses [26].
Current state-of-the-art applications of deep learning differ from market direction forecasting problems in many aspects. One of the most striking is that market forecasting is not a task that people can already do well. Unlike perceiving objects in a picture or understanding text in an image, people do not have an innate ability to choose a stock that will perform well in some future period. However, deep learning techniques may be useful for such selection problems because these techniques can, in essence, form any function mapping data to a return value. At least in theory, a deep learner can find a return value for a relationship among data, no matter how complex and nonlinear it is. This is far from the simple linear factor models of traditional financial economics, relatively coarse statistical arbitrage methods, and other quantitative asset management techniques [27].
In this study, we investigate the benefits of Deep Neural Network (DNN), Support Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR) classifiers in making decisions on market direction. In particular, we show whether these classification approaches can make trading consistent and profitable for a long period of time.
The main contribution of the study is the development of a deep learning model that takes OHLC prices and transaction costs into consideration, together with a comparison of its classification performance with the most commonly used machine learning methods in estimating the direction of a stock market index. The success of deep learning and machine learning methods may differ according to the inefficiencies of the markets [28]. This study investigates the case of Stock Exchange Istanbul and emerging markets. Another contribution of this study is the use of threshold values to control transaction costs in financial estimates. In some studies, transaction costs are not covered, although estimates seem profitable. It is known that when transaction costs are included, profitability may disappear [23]. To avoid this problem, the threshold level is dynamically adjusted according to the standard deviation of the profit distribution, and optimal values are selected to reduce the number of transactions in order to increase the return (profit on an investment) per transaction. Accordingly, the aim is to create profitable operations in the long run with the right combination of parameter values and an appropriate choice of training set size. The paper is organized as follows. Section 2 provides the related work and similar studies of deep learning and machine learning in making decisions on market direction. Section 3 briefly describes the methodology, general experimental setup, datasets, the attribute selection for feeding the models, the specific parameter settings to provide comprehensive information for deep learning, and other machine learning algorithms used in experiments and their use in future work. The results of the analysis of each of the trading scenarios are presented in Section 4. Finally, Section 5 concludes the study by providing the obtained results and future considerations.

Related Work
Researchers have intensified their studies on the direction of movements of various financial instruments using time series and machine learning methodologies. Both academic researchers and practitioners have developed financial trading strategies to make forecasts about future movements of the stock market index and transform the predictions into profits. This section summarizes research on stock prediction, covering methods that use technical indicators as features, traditional machine learning algorithms, studies done for the Istanbul Stock Exchange (ISE), and current methods that use deep learning algorithms in finance.
The majority of the studies on stock market prediction with machine learning algorithms use technical indicators as part of the training dataset. Neural Networks (NN) [17,18] and Support Vector Machines (SVM) are among the most widely used machine learning methods. There are also studies that use classification methods such as Decision Trees (DTs) [29], Random Forests (RFs) [30], Logistic Regression (LR) [31], and Naive-Bayes (NB) [32]. Patel et al. [33] focused on predicting future values of Indian stock market indices using Support Vector Regression (SVR), Artificial Neural Network (ANN), and Random Forest (RF). The best overall prediction performance is achieved by the SVR-ANN hybrid model. Accuracy in the range of 85-95% has been achieved for long-term prediction on stocks such as AAPL, MSFT, and Samsung using a Random Forest classifier in Khaidem's research [34]. Buy, hold, or sell decision prediction is performed on the Stock Exchange of Thailand (SET) by Boonpeng and Jeatrakul [35], comparing the performance of the traditional neural network with One vs. All (OAA) and One vs. One (OAO) neural networks (NN). With an average accuracy of 72.50%, OAA-NN showed better output than the OAO-NN and traditional NN models.
In order to improve the profitability and stability of trading that includes seasonality events, Booth et al. [36] introduced an automated trading system based on performance-weighted ensembles of random forests. Tests were done on a large sample of stocks from the DAX, and they found that recency-weighted ensembles of random forests produce superior results. The research in [37] investigated methods for predicting the direction of movement of stock and stock price index for Indian stock markets by comparing four machine learning prediction models: Artificial Neural Network (ANN), Support Vector Machine (SVM), Random Forest, and Naive-Bayes. It was found that Random Forest outperforms the other three prediction models on overall performance. Likewise, a hybridized framework of Support Vector Machine (SVM) with the K-Nearest Neighbor approach for the prediction of Indian stock market indices is proposed by Nayak et al. [38]. That paper investigates how to combine several techniques for predicting future stock values over horizons of 1 day, 1 week, and 1 month. It is pointed out that the proposed hybridized model can be used where there is a need for scaling high-dimensional data and better prediction capability.
Kara et al. [39] developed two efficient models based on two classification techniques, Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs), and compared their performances in predicting the direction of movement in the daily Istanbul Stock Exchange (ISE) National 100 Index. Ten technical indicators were selected as inputs of the proposed models. It was found that the ANN model performed significantly better than the SVM model. In Pekkaya's study [40], the results of Linear Regression and NN models were compared in predicting the YTL/USD exchange rate using macrovariables as input data. It is shown that NN gives better results. In [41], optimal subset indicators are selected with an ensemble feature selection approach in order to increase the performance of predicting the next day's stock price direction. A real dataset is obtained from the Istanbul Stock Exchange (ISE), and the subset is composed using technical and macroeconomic indicators. The results of this study show that the reduced dataset improves the next day's direction estimation. The effectiveness of using technical indicators, such as the simple moving average of closing price and momentum, in the Turkish stock market has been evaluated in Göçken's study [42]. Hybrid Artificial Neural Network (ANN) models based on Harmony Search (HS) and Genetic Algorithm (GA) are used in order to select the most relevant technical indicators in capturing the relationship between the technical indicators and the stock market. This study found that the HS-based ANN model performs better in stock market forecasting.
Prediction of the stock movement direction with Convolutional Neural Networks (CNN), one of the DNN methods most commonly used for analysing visual imagery [43], was first applied to predicting the intraday direction of ISE 100 stocks by Gunduz et al. [44]. The feature set is composed of different indicators, closing price, temporal information, and trading data, and instances are labeled by using hourly closing prices. The proposed classifier with seven layers outperforms both Logistic Regression and a CNN that utilizes randomly ordered features. Chong et al. [24] proposed a deep feature learning-based stock market prediction model as a case study using stock returns from the KOSPI market, the major stock market in South Korea. A time period of five minutes is used in order to evaluate the deep learning network's performance on market prediction at high frequencies.
The aim is to provide a comprehensive and objective assessment of both the advantages and drawbacks of deep learning algorithms for stock market analysis and prediction. The proposed model has been tested with covariance-based market structure analysis, and it is found that the proposed model improves covariance estimation effectively. From experimental results, practical and potentially useful directions are suggested for further investigation into how to use deep learning networks.
A simple method has been proposed to leverage financial news to predict stock movements by using the popular word embedding representation and deep learning techniques [45]. The authors used a DNN composed of 4 hidden layers with 1024 hidden nodes in each layer to predict a stock's price movement based on a variety of features. By adding features derived from financial news, they managed to decrease the error rate significantly.

Methodology
Our objective in this study is to use the best features and machine learning methods in order to model traders' behavior so that we can predict market direction. Big traders, including investment banks, hedge funds, and brokerage firms, build their proprietary trading software for stock trading. The methods used by these firms are kept confidential as trade secrets, which makes their comparison impossible. In our exploration of the best methods and strategies, we decided to use a rich set of features and deep learning methods in addition to traditional machine learning algorithms because of their success in many areas. As a deep learning framework, we use TensorFlow, a powerful open-source software library built by the Google Brain team to serve many different artificial intelligence tasks [46]. Our dataset is organized as a TensorFlow data structure for holding features, labels, and other parameters. Figure 1 illustrates the steps performed to predict market direction by using the TensorFlow framework. It starts with a preprocessing step that extracts features and performs normalization. While reading the dataset, a set of features and labels is defined. If there are string variables, they are encoded. After this step, the dataset is divided into two parts as training and testing datasets. The time series k-fold cross-validation method is used for evaluation. In this study, k is set to ten. Financial time series data is split into two parts as shown in Figure 2. In each cross-validation step, the training data gets bigger and includes all data prior to the testing data, whereas the size of the testing data stays the same. In the last cycle of cross-validation, the training dataset is nine times bigger than the test dataset. After a model is formed with the training dataset at each step, the model is tested with the testing dataset, and precision, accuracy, cumulative return, maximum drawdown, and return on investment are calculated.
Here, the ultimate goal is to achieve the highest precision, accuracy, cumulative return, and the lowest maximum drawdown. Hence, optimal parameters are obtained based on the trade-off between accuracy and cumulative return.
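The expanding-window scheme described above can be sketched with scikit-learn's TimeSeriesSplit. This is a minimal illustration on synthetic data, not the authors' TensorFlow pipeline; it assumes that "k = 10" means ten equal chronological blocks, which yields nine train/test steps.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Synthetic stand-in for a chronologically ordered feature matrix.
n_samples = 1000
X = np.arange(n_samples).reshape(-1, 1)

# Ten equal blocks give nine steps: the training window grows by one block
# per step while the test window keeps the same size.
tscv = TimeSeriesSplit(n_splits=9)
fold_sizes = []
for train_idx, test_idx in tscv.split(X):
    # All training observations precede the test observations in time.
    assert train_idx.max() < test_idx.min()
    fold_sizes.append((len(train_idx), len(test_idx)))

for step, (n_train, n_test) in enumerate(fold_sizes, start=1):
    print(f"step {step}: train={n_train}, test={n_test}")
```

In the last step the training window holds 900 observations and the test window 100, matching the nine-to-one ratio mentioned in the text.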

Classification Methods.
In this study, four types of data mining algorithms were used to compare the financial forecasting capabilities of the models. This section gives a brief description of the classification approaches which we have used.

DNN Classifier.
A Multilayer Perceptron (MLP) is composed of one input layer, one or more hidden layers, and one output layer. Every layer except the output layer includes a bias neuron and is fully connected to the next layer. When an ANN has two or more hidden layers, it is called a Deep Neural Network (DNN) [47].
For creating fully connected neural network layers, convenience functions of TensorFlow are used. The DNN classifier calls tf.estimator.DNNClassifier from the TensorFlow Python API [46]. This command builds a feedforward multilayer neural network that is trained with a set of labeled data in order to perform classification on similar, unlabeled data. We used ReLU as the activation function, and the regularization and normalization hyperparameters are optimized. The flexibility of neural networks is also one of their main drawbacks, since there are many hyperparameters to tweak. Not only can any imaginable network topology be used, but even in a simple MLP the number of layers, the number of neurons per layer, the type of activation function used in each layer, the weight initialization logic, and many other parameters can be modified. Therefore, to choose the best combination of hyperparameters for the DNN model, both grid search and randomized search are used. Since GridSearchCV evaluates all combinations, it can take a long time to find the best hyperparameters. For that reason, the hyperparameter adjustment process for the DNN is carried out in two steps. First, RandomizedSearchCV is used to narrow the range for each hyperparameter. Then GridSearchCV is applied using a grid based on the best values provided by RandomizedSearchCV.
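The two-step search can be sketched as follows. To keep the example self-contained, scikit-learn's MLPClassifier stands in for the TensorFlow classifier; the synthetic data, the candidate layer sizes, the range of the regularization parameter `alpha`, and the factor-of-three refinement grid are all illustrative assumptions, not the settings used in the study.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, TimeSeriesSplit
from sklearn.neural_network import MLPClassifier

# Synthetic data as a placeholder for the indicator features.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
cv = TimeSeriesSplit(n_splits=3)
base = MLPClassifier(activation="relu", max_iter=200, random_state=0)

# Step 1: sample a wide region cheaply to find a promising neighbourhood.
coarse = RandomizedSearchCV(
    base,
    param_distributions={
        "hidden_layer_sizes": [(16, 16), (32, 32), (64, 32)],
        "alpha": loguniform(1e-5, 1e-1),  # L2 regularization strength
    },
    n_iter=6,
    cv=cv,
    random_state=0,
)
coarse.fit(X, y)
best_alpha = coarse.best_params_["alpha"]

# Step 2: exhaustive grid centred on the best values from step 1.
fine_grid = {
    "hidden_layer_sizes": [coarse.best_params_["hidden_layer_sizes"]],
    "alpha": [best_alpha / 3, best_alpha, best_alpha * 3],
}
fine = GridSearchCV(base, param_grid=fine_grid, cv=cv)
fine.fit(X, y)
print(fine.best_params_, round(fine.best_score_, 3))
```

The random step costs a fixed number of fits regardless of how wide the ranges are, so the exhaustive step only has to cover a small grid.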

Logistic Regression.
In predicting the direction of the market, Logistic Regression is generally used to estimate the likelihood of a sample belonging to a particular class. For making it a binary classifier, the model predicts that the instance belongs to that class if the estimated probability is greater than 50%; otherwise, it does not.
The vectorized form of the probability estimated by the Logistic Regression model is shown in equation (1). The model computes a weighted sum of the input features plus a bias term and outputs the logistic of the result:

$$\hat{p} = h_\theta(x) = \sigma(\theta^T \cdot x) \tag{1}$$

As shown in equation (2), the logistic is a sigmoid function that outputs a number between 0 and 1:

$$\sigma(t) = \frac{1}{1 + e^{-t}} \tag{2}$$

After the probability $\hat{p} = h_\theta(x)$ has been estimated by the Logistic Regression model, the prediction $\hat{y}$ can be calculated easily using equation (3). When $\theta^T \cdot x$ is positive, the model predicts 1 (rise); otherwise, it predicts 0 (fall):

$$\hat{y} = \begin{cases} 1, & \hat{p} > 0.5 \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

The aim of training is to set the parameter vector $\theta$ so that the model estimates high probabilities for positive instances and low probabilities for negative instances. For this purpose, a cost function is used. The cost function over the whole training set is simply the average cost over all training instances. Since the cost function is convex, Gradient Descent is guaranteed to find the global minimum [47].
To implement LR, the Scikit-Learn Logistic Regression model is used. For model regularization, ℓ1 and ℓ2 penalties are implemented. In Scikit-Learn, the ℓ2 penalty is added by default [48].
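As a worked illustration of equations (1) to (3), the short NumPy sketch below computes the estimated probability and the resulting class for a few inputs. The parameter vector and the inputs are made-up values chosen only for the example; the first column of ones carries the bias term.

```python
import numpy as np

def sigmoid(t):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

def predict_proba(theta, X):
    """Estimated probability of the positive class (rise) for each row of X."""
    return sigmoid(X @ theta)

def predict(theta, X):
    """Predict 1 (rise) when the probability exceeds 0.5, else 0 (fall)."""
    return (predict_proba(theta, X) > 0.5).astype(int)

theta = np.array([-1.0, 2.0])      # hypothetical [bias, weight]
X = np.array([[1.0, 0.00],         # score = -1.0
              [1.0, 0.25],         # score = -0.5
              [1.0, 1.00]])        # score = +1.0
probabilities = predict_proba(theta, X)
classes = predict(theta, X)
print(probabilities.round(3), classes)
```

Only the last row has a positive score, so it is the only one classified as a rise.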
Even though hyperparameters are not so critical for LR, we wanted to be sure that we are using the best hyperparameters for our dataset; therefore, the hyperparameters are tuned. The best model was chosen by using GridSearchCV with a grid of the parameters to be tested in the model. Different solvers may show useful differences in performance or convergence. Therefore, the "newton-cg," "lbfgs," and "liblinear" solvers have been added to the grid. As penalty (regularization), ℓ1 and ℓ2 are used. Finally, the C parameter, which controls regularization strength, is added to the grid with the values 100, 10, 1.0, 0.1, and 0.01. The best parameters obtained for LR after running GridSearchCV are C = 0.01, penalty = "l2," and solver = "liblinear."
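A grid of this kind might be set up as sketched below. The data are synthetic and the cross-validation splitter is an assumption. Because the "newton-cg" and "lbfgs" solvers in scikit-learn do not support the ℓ1 penalty, the grid is written as two sub-grids so that only valid solver and penalty pairs are evaluated; how the original study handled this pairing is not stated.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic placeholder for the normalized indicator features.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

C_values = [100, 10, 1.0, 0.1, 0.01]
param_grid = [
    # These solvers accept only the l2 penalty.
    {"solver": ["newton-cg", "lbfgs"], "penalty": ["l2"], "C": C_values},
    # liblinear supports both l1 and l2.
    {"solver": ["liblinear"], "penalty": ["l1", "l2"], "C": C_values},
]

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=TimeSeriesSplit(n_splits=3),
)
search.fit(X, y)
print(search.best_params_)
```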

Random Forest.
Decision trees are one of the widely used machine learning methods to predict the direction of the stock market. Since there are extremely irregular patterns, trees need to grow very deep to learn these patterns, which can cause trees to overfit training sets. A slight noise in the data can cause the tree to grow in a completely different way. The reason for this is that decision trees have very low bias and high variance. Random Forest overcomes this problem by training multiple decision trees on different subspaces of the feature space at the cost of slightly increased bias [34]. The Random Forest algorithm introduces extra randomness when growing trees; instead of searching for the best feature when splitting a node, it searches for the best feature among a random subset of features [47]. This means none of the trees in the forest sees the entire training data. The data are recursively split into partitions. At a particular node, the split is done by asking a question on an attribute. The choice of the splitting criterion is based on an impurity measure such as Shannon entropy or Gini impurity. This results in greater tree diversity, which trades a higher bias for a lower variance, generally yielding an overall better model.
In the implementation of RF, the Scikit-Learn RandomForestClassifier class is used [48]. As recommended by Breiman, the Random Forest classifier is trained with 500 trees, each limited to a maximum of 16 nodes [49]. To improve the Random Forest classifier's performance, hyperparameters are tuned using a grid search.
There are more than fifteen parameters that can be tuned. We focused on the five most important of these parameters. The parameters placed on the grid are the number of trees in the forest, n_estimators: [100, 200, 300, 400]; the maximum depth of the tree, max_depth: [50, 60, 70, 80, 90]; the minimum number of samples required to split an internal node, min_samples_split: [8, 10, 12]; the minimum number of samples required at a leaf node, min_samples_leaf: [3, 4, 5]; and the number of features to consider when looking for the best split, max_features: [2, 3]. After the grid search is fitted to the data, the best parameters are obtained. The best parameters used in this research are n_estimators = 200, max_depth = 60, min_samples_split = 12, min_samples_leaf = 5, and max_features = 3.
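For illustration, a classifier with the reported best parameters can be instantiated as below. The data are synthetic and the single chronological split is a simplification of the cross-validation used in the study, so the printed accuracy says nothing about BIST 100.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic placeholder data; the first 400 rows play the role of the in-sample period.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
split = 400

rf = RandomForestClassifier(
    n_estimators=200,       # number of trees
    max_depth=60,           # maximum depth of each tree
    min_samples_split=12,   # samples needed to split an internal node
    min_samples_leaf=5,     # samples required at a leaf
    max_features=3,         # features considered at each split
    random_state=0,
)
rf.fit(X[:split], y[:split])
accuracy = rf.score(X[split:], y[split:])
print(f"out-of-sample accuracy: {accuracy:.3f}")
```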

Support Vector Machines.
For assigning new unseen objects into a particular category by training a model, SVM is one of the most used binary classifiers. The main idea of SVM is to establish a decision boundary (hyperplane) in which the correct separation of rising and falling samples is maximized [50]. A hyperplane for n-dimensional feature vectors $x = (x_1, \ldots, x_n)$ can be defined as in equation (4), where the sum of the elements will be greater than 0 on one side and less than 0 on the other:

$$\beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n = 0 \tag{4}$$

The class of each point $x_i$ can be denoted by $y_i \in \{-1, +1\}$. By maximizing the distance between the boundary and any point, we can get an optimal hyperplane. The best data splitting boundary is called the maximum margin hyperplane. Data points closest to the hyperplane are known as support vectors, and only these points are relevant to hyperplane selection. A linear Support Vector Classifier (SVC) cannot be applied to nonlinear functions. To solve this issue in SVM, a more general kernel function is applied as in equation (5), which leads to a quadratic programming (QP) optimization problem with linear constraints that can be solved by using a standard QP solver.
SVM is implemented through the Scikit-learn Python library of Pedregosa et al. using the LinearSVC package [48]. LinearSVC implements the "one-vs-the-rest" multiclass strategy; since we have only two classes, only one model is trained.
In order to improve the performance of SVM, we focused on tuning three major hyperparameters. Kernel, regularization (C), and gamma are the most important parameters that affect performance. These parameters are placed on the grid in order to be used by GridSearchCV for grid search. The model is evaluated for each combination of algorithm parameters specified in the grid. The hyperparameters used are as follows: C: [0.1, 1, 10, 100], gamma: [1, 0.1, 0.01, 0.001], and kernel: [rbf, poly, sigmoid]. After fitting GridSearchCV on the training data, the best estimators are acquired. The best hyperparameters used in this study are C = 10, gamma = 0.1, and kernel = rbf.
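For reference, the kernelized quadratic program is commonly written in the following dual form. This is the standard textbook formulation and may differ in notation from the authors' equation (5); the multipliers $\alpha_i$, the penalty $C$, and the kernel $K$ are symbols introduced here for illustration.

$$\max_{\alpha} \; \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{subject to} \quad 0 \le \alpha_i \le C, \quad \sum_{i=1}^{m} \alpha_i y_i = 0$$

With the radial basis function kernel, $K(x_i, x_j) = \exp(-\gamma \lVert x_i - x_j \rVert^2)$. The objective is quadratic in $\alpha$ and the constraints are linear, which is why a standard QP solver applies. Training points with $\alpha_i > 0$ are the support vectors.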
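The grid above can be evaluated with scikit-learn roughly as follows. Since the grid contains kernels and gamma, this sketch uses the kernel-capable SVC class; the synthetic data, the splitter, and the iteration cap are assumptions made to keep the example small and fast.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
# Scale to [0, 1] as in the paper's preprocessing. With real data the scaler
# should be fitted on the training window only to avoid look-ahead.
X = MinMaxScaler().fit_transform(X)

param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [1, 0.1, 0.01, 0.001],
    "kernel": ["rbf", "poly", "sigmoid"],
}
# max_iter is a safety cap for this demo, not a setting from the study.
search = GridSearchCV(SVC(max_iter=100000), param_grid, cv=TimeSeriesSplit(n_splits=3))
search.fit(X, y)
print(search.best_params_)
```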

Dataset.
In this study, nine years of BIST 100 index data ranging from January 2008 to December 2016 were obtained from Borsa Istanbul Datastore [51]. Although the BIST 100 data in the last few years are published with a time period of one second, the time period in the data we have obtained is ten seconds. Open-high-low-close (OHLC) prices were used to convert the dataset from ten seconds to different time periods. The conversion process is shown in Figure 3. Since the dataset is converted from a lower time period to higher time periods, it can be inferred that there is no missing data in the converted time periods.
For example, in the process of converting to an hourly dataset, the price at the beginning of the hour is taken as the open price, the maximum and minimum values in that hour are used as the high and low prices, and the last price value of the hour is used as the close price. In the same way, all the volumes in the hour were aggregated to obtain the total volume of that hour. We used open, high, low, and close prices and volume of index data for two-hour, hourly, and 30 min periods. The bihourly, hourly, and 30 min datasets are composed of 9157, 18314, and 33673 rows, respectively. An example of the hourly dataset is shown in Table 1. For each cross-validation fold, the in-sample period is used for training and the out-of-sample period is used for evaluating forecasting performance.
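The conversion from ten-second data to coarser bars can be sketched with pandas resampling. The tick data below are randomly generated and the column names `price` and `volume` are assumptions; the actual layout of the Borsa Istanbul files is not described in the text.

```python
import numpy as np
import pandas as pd

# Two hours of synthetic 10-second observations starting on the hour.
index = pd.date_range("2016-01-04 10:00:00", periods=720, freq=pd.Timedelta(seconds=10))
rng = np.random.default_rng(0)
ticks = pd.DataFrame(
    {
        "price": 70000 + rng.normal(0, 5, len(index)).cumsum(),
        "volume": rng.integers(1, 100, len(index)),
    },
    index=index,
)

def to_ohlcv(ticks, rule):
    """Aggregate ticks into bars: first/max/min/last price and summed volume."""
    bars = ticks["price"].resample(rule).ohlc()
    bars["volume"] = ticks["volume"].resample(rule).sum()
    return bars.dropna()  # drop periods with no observations

half_hourly = to_ohlcv(ticks, "30min")
hourly = to_ohlcv(ticks, "60min")
bihourly = to_ohlcv(ticks, "120min")
print(hourly)
```

Because every coarser bar is built from complete finer data, total volume is preserved across all three period lengths.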
When publications about stock predictions are reviewed, it is observed that technical indicators used in technical analysis are generally utilized to generate feature sets of prediction models [52]. Technical indicators are mathematical calculation methods used to analyze the prices of financial instruments. After some specific calculations on time series data, most of the indicators help investors to forecast price movement trends in the future. Some indicators, on the other hand, try to show whether a trend will continue or not. Indicators are calculated for a specific moment and period to enlighten the investors.
There are literally hundreds of technical indicators that can be used for forecasting. Some of these indicators extract similar information and produce similar signals. Selecting the right and diverse set of indicators is important so that a diverse set of measures can be used as features in the formation of prediction models. The names and descriptions of the selected technical indicators used in the study are given in Table 2. Similar abbreviations have been used for the definition of indicators in the studies of Kumar et al. [53] and Gündüz et al. [54]. We use the same naming conventions in this study.
After the selection of the technical indicators, we have to determine the time periods and the OHLC price data required to calculate these indicators. For example, the SMA, EMA, ROCP, and MOM indicators were calculated using the closing price of the BIST 100 index over the 3, 5, 10, 15, and 30 previous values of the time series for the two-hour, hourly, and 30 min periods. The WILLR, CCI, UO, and ATR indicators were found using the daily maximum, minimum, and closing prices of the BIST 100 index. These values are calculated using 4 time periods for WILLR and one time period for CCI, UO, and ATR. With the calculation of different indicators for different time periods, we obtained 97 features for each period of the BIST 100 index. After the features are composed, min-max normalization is applied to each feature as in equation (6), where $x = (x_1, \ldots, x_n)$ expresses a feature vector and $z_i$ is the normalized value of $x_i$. The same dataset was used for each model; the dataset does not change according to the model:

$$z_i = \frac{x_i - \min(x)}{\max(x) - \min(x)} \tag{6}$$

In this study, class labeling is determined based on returns calculated from the closing prices of the BIST 100 index's trading periods. $r_i$ symbolizes the return of the $i$-th trading period and $r_{i+1}$ symbolizes the return of the next trading period, where trading periods are defined as thirty minutes, one hour, and two hours, respectively. Also, $p_i$ denotes the closing price of the $i$-th trading period and $p_{i+1}$ denotes the closing price of the next trading period, as used in the following equation:

$$r_{i+1} = \frac{p_{i+1} - p_i}{p_i} \tag{7}$$

The class label for the $i$-th period, i.e., $y_i^R$ for Rise and $y_i^F$ for Fall, is set based on the following equations:

$$y_i^R = \begin{cases} 1, & r_{i+1} > \theta \\ 0, & \text{otherwise} \end{cases} \tag{8}$$

$$y_i^F = \begin{cases} 1, & r_{i+1} < -\theta \\ 0, & \text{otherwise} \end{cases} \tag{9}$$

In the class labeling equations, the threshold value $\theta$ is used to account for transaction costs and define targeted returns. Due to the transaction costs and risk of a stock exchange, investors are not willing to make too many transactions; at the least, each transaction is expected to cover its costs.
In order to filter out transactions where the return is less than the transaction cost, and also to evaluate the success of the system according to prediction performance and compound return, different threshold values are used. They were obtained by multiplying the standard deviation of returns by predetermined values. The predetermined values start from 0 and increase by 0.1 until they reach 0.5. In this way, six different threshold values were obtained.
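The feature construction, normalization, and threshold labeling described above can be sketched as follows. Several details are assumptions: the closing-price series is synthetic, only the four closing-price indicators are shown, the EMA uses the common span-based recursion, returns are taken as simple returns, and the standard deviation is computed over the whole sample rather than over each training window.

```python
import numpy as np
import pandas as pd

def make_features(close, windows=(3, 5, 10, 15, 30)):
    """Closing-price indicators over several look-back windows."""
    feats = {}
    for n in windows:
        feats[f"SMA_{n}"] = close.rolling(n).mean()                 # simple moving average
        feats[f"EMA_{n}"] = close.ewm(span=n, adjust=False).mean()  # exponential moving average
        feats[f"ROCP_{n}"] = close / close.shift(n) - 1.0           # rate of change (fraction)
        feats[f"MOM_{n}"] = close - close.shift(n)                  # momentum
    return pd.DataFrame(feats).dropna()

def min_max(features):
    """Scale each column to [0, 1]; with real data use training-set min/max only."""
    return (features - features.min()) / (features.max() - features.min())

def make_labels(close, k):
    """Rise/Fall labels with threshold theta = k * std of next-period returns."""
    r_next = (close.shift(-1) / close - 1.0).dropna()  # last period has no next return
    theta = k * r_next.std()
    rise = (r_next > theta).astype(int)
    fall = (r_next < -theta).astype(int)
    return rise, fall, theta

rng = np.random.default_rng(0)
close = pd.Series(70000 + rng.normal(0, 50, 300).cumsum())
X = min_max(make_features(close))

multipliers = [round(0.1 * j, 1) for j in range(6)]  # 0.0, 0.1, ..., 0.5
labels = {k: make_labels(close, k) for k in multipliers}
for k, (rise, fall, theta) in labels.items():
    print(f"k={k}: theta={theta:.5f}, rise={rise.sum()}, fall={fall.sum()}")
```

Raising the multiplier moves small moves out of both classes, so the number of Rise and Fall labels shrinks as the threshold grows.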

Performance Measures and Implementation of Prediction Model.
Predicting the market direction, whether it moves up or down, is equally important since traders can make a profit from both sides. Therefore, predicting index rise and index fall is modeled separately. In the first model, the system is trained to predict whether there will be a rise or not, and in the second model, the system is trained to predict whether there will be a fall or not. In order to overcome the transaction costs problem, we have used a dynamic threshold variable which helps us to eliminate small returns that are less than transaction costs. Evaluation metrics are needed to measure and compare the predictability of classifiers. To evaluate the performance and robustness of the proposed models, we have used performance metrics that are derived from the confusion matrix, such as accuracy, precision, and recall. To evaluate the model's performance from a financial return perspective, we have used compound return and return on investment metrics. Additionally, the max drawdown measurement is used for evaluating the model's risk of investment.
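The text does not give formulas for compound return and max drawdown, so the sketch below uses their usual definitions: compound return is the product of per-period growth factors minus one, and max drawdown is the largest peak-to-trough decline of the equity curve. The `strategy_returns` helper, its `cost` argument, and the long-only interpretation of a signal are hypothetical simplifications.

```python
import numpy as np

def compound_return(returns):
    """Total return from compounding a sequence of per-period returns."""
    return float(np.prod(1.0 + np.asarray(returns)) - 1.0)

def max_drawdown(returns):
    """Largest fractional drop of the equity curve from a running peak."""
    equity = np.concatenate(([1.0], np.cumprod(1.0 + np.asarray(returns))))
    running_peak = np.maximum.accumulate(equity)
    return float(np.max((running_peak - equity) / running_peak))

def strategy_returns(period_returns, signals, cost=0.0):
    """Earn the period return minus a per-trade cost when the signal is 1, else stay flat."""
    return np.asarray(signals) * (np.asarray(period_returns) - cost)

period_returns = [0.10, -0.20, 0.05]
print(compound_return(period_returns))   # equity path 1.10 -> 0.88 -> 0.924
print(max_drawdown(period_returns))      # drop from 1.10 to 0.88

traded = strategy_returns(period_returns, signals=[1, 0, 1], cost=0.001)
print(compound_return(traded))
```

Skipping the losing middle period turns a negative compound return into a positive one, while each executed trade gives up the assumed cost.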

Confusion Matrix.
In machine learning algorithms, classifiers' performance evaluation is mainly done by the confusion matrix. The numbers of true and false estimates are summarized by counting the values separately for each class. It provides a simple way to visualize the performance and robustness of an algorithm.
Since we aim to estimate gains that cover transaction costs and focus on eliminating small returns, we use the threshold structure as shown in equations (8) and (9). For the evaluation of upward movement predictions, the confusion matrix is shown in Table 3. And also for evaluating downward movement predictions, the confusion matrix is shown in Table 4. For upward movement, positive observation is Rise and negative observation is Not Rise. Similarly, for downward movement, positive observation is Fall and negative observation is Not Fall.
Assessments of performance and robustness of the proposed models are calculated based on these four values of the confusion matrix. Accuracy, precision, recall, and Fscore are among important measures which are calculated from these values.
Accuracy percentage calculation is given in equation (10). Since accuracy measures true orders and our dataset is unbalanced, only the model evaluation by accuracy will not be enough: In the trading model, false positive (FP) means that actually there is no opportunity for profit, but the model indicates that you need to enter into trade (buy or sell). In this case, you will lose money, which is the worst possible situation.
us, choosing a model with minimum FP is crucial. is can be achieved by maximizing precision. e calculation of the percentage of precision is shown in the following equation: On the other hand, false negative (FN) means that although there is an opportunity to make money from trade, the model does not indicate that. In this case, the opportunity to make money will have escaped, but it will not be perceived as a major problem as there is no expectation in trading to predict every movement of the market. Recall maximization indicates FN minimization. Percentage of recall calculation is shown in the following equation: Lastly, the F-score provides insights for the relation between precision and recall. Since precision and recall are prioritized equally, F1-score is used as F-score. e following equation provides definitions of F1-score:
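As an illustration of how these measures follow from the four confusion matrix counts, a minimal sketch is given below. It uses the standard definitions of accuracy, precision, recall, and F1-score expressed as percentages; the function name and the convention of returning 0 when a denominator is zero are assumptions made for the example.

```python
def classification_metrics(y_true, y_pred):
    """Compute confusion-matrix counts and percentage metrics for binary labels.

    Labels are 1 for the positive class (e.g., Rise) and 0 otherwise.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def pct(num, den):
        # Assumption: report 0.0 when the ratio is undefined.
        return 100.0 * num / den if den else 0.0

    precision = pct(tp, tp + fp)
    recall = pct(tp, tp + fn)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "accuracy": pct(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

In a trading setting, a model that never signals a trade has zero false positives but also zero precision under this convention, which is why precision is read together with recall.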

Compound Return.
Calculating the rate of return of our predictions correctly is one of our main concerns, since we assume that the entire capital is invested in each trade, with earlier profits retained and earlier losses not compensated. Compound return is one of the measures best suited to this purpose. Expressed as a percentage, it indicates the cumulative outcome of a series of profits and losses on the initial investment over a period of time. When evaluating an investment's return over a time period, average return is known to be a less suitable measure than compound return. This is because, with the average return, the individual returns are treated as independent of each other and the effect of each return is not carried over to the next step, so the success of the model cannot be determined clearly. Average return can be calculated from discrete returns, which are computed as shown in equation (14), where $P_t$ represents the price at time $t$ and $P_{t+1}$ represents the price at time $t + 1$. When calculating the average return, discrete returns are summed and divided by the number of periods. Such an aggregation of multiperiod performance is correct only if period returns are additive. Since discrete returns are multiplicative, it is not appropriate in this case. Thus, the correct aggregated performance is calculated using the compound return formula given in equation (15) [55].
At the beginning of each period, the trained models decide whether to enter a trade. If a model enters a trade, the trade is closed at the end of the period. For each trade, the discrete return is calculated. For example, when trading on the hourly time period, the model decides at the beginning of the hour whether to open an order or not, and at the end of the hour it closes the open order.
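The difference between the two aggregation methods can be seen in a small sketch. It assumes the standard definitions, namely a discrete return of $(P_{t+1} - P_t)/P_t$ and a compound return equal to the product of $(1 + r)$ over all trades minus one; the function names are illustrative only.

```python
def discrete_return(price_now, price_next):
    """Simple (discrete) return of one period."""
    return (price_next - price_now) / price_now


def average_return(returns):
    """Arithmetic mean of per-trade returns."""
    return sum(returns) / len(returns)


def compound_return(returns):
    """Return on initial capital when everything is reinvested in each trade."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r  # each trade scales the capital left by the previous one
    return growth - 1.0


# A 50% gain followed by a 50% loss averages to zero,
# yet the capital has actually shrunk by a quarter.
trades = [0.5, -0.5]
print(average_return(trades))   # 0.0
print(compound_return(trades))  # -0.25
```

The example shows why the average return can hide a loss that the compound return exposes when the full capital is reinvested.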

Maximum Drawdown.
The main concerns of an investment are capital protection and consistent estimations. Since maximum drawdown (MDD) is one of the most important measures of risk for a trading strategy, it plays a crucial role in evaluating the performance of the prediction model [56]. The MDD value is calculated as shown in equation (16), where $P$ represents the peak profit before the largest loss and $L$ represents the lowest value reached before a new profit peak is established. MDD expresses the difference between the highest capital level and the lowest capital level, where the highest capital level must occur before the lowest capital level. The maximum drawdown duration is the longest time it takes for the forecasting model to recover the capital loss [57]. The structure of MDD is illustrated in Figure 4. In this study, drawdowns are measured in percentage terms.
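A minimal sketch of the drawdown calculation is given below. It assumes the common relative form $(P - L)/P$, evaluated against the running peak of the capital curve, and a capital curve built by reinvesting all capital in each trade; the function names are hypothetical and the study's own formula may be stated differently.

```python
def equity_curve(trade_returns, initial=1.0):
    """Capital after each trade, assuming full reinvestment."""
    curve = [initial]
    for r in trade_returns:
        curve.append(curve[-1] * (1.0 + r))
    return curve


def max_drawdown(equity):
    """Largest relative drop from a running peak to a later trough."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        if value > peak:
            peak = value  # a new peak resets the reference level
        drawdown = (peak - value) / peak
        if drawdown > worst:
            worst = drawdown
    return worst
```

Because the peak is tracked as the curve is scanned from left to right, the trough is always measured against a peak that occurred before it, which matches the ordering requirement stated above.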

Experimental Results
Supply and demand help to determine the price of each security, that is, the willingness of participants (investors and traders) to trade. Buyers offer a maximum amount they are willing to pay, which is usually lower than the amount sellers ask. For a trade to take place, either the buyer raises the price or the seller reduces it. Accordingly, the price increases when purchases are made and decreases when sales are made. This shows that investors' decisions have a direct effect on the price.
As mentioned before, traders use technical analysis methods in decision making. The main idea of this study is that if the market's direction of movement is shaped by traders' transactions [2], and if the majority of traders use technical analysis methods in the decision-making process [8], then by training deep learning and classic machine learning methods on technical analysis indicators to estimate the market direction, we are actually modeling traders' behavior.
To strengthen this idea, we first had to choose the best timeframes. We started by examining high-frequency trading (HFT) studies [58]. In these studies, the processing time ranged from milliseconds to seconds, and we observed that market makers frequently use these strategies. As we do not have the appropriate infrastructure, we decided that these methods are not applicable to us, because transactions made within such short periods would not cover the costs.
In addition, we investigated studies in which deep learning and machine learning methods have been successfully applied. These studies attempt to predict over wider time frames, such as weekly, monthly, or annual estimates [30]. We did not find it appropriate to use these time intervals because the sample size decreases dramatically in these studies. We also think that predicting over such long horizons would be too risky.
After an extensive literature review, we decided to focus on intraday intervals rather than daily, weekly, or monthly time periods. Our decision is based on the fact that the vast majority of recent studies focus on intraday trading. Also, their cumulative returns are higher than those obtained on larger timeframes. These facts are the main reasons for focusing on an intraday investigation. In addition, the lower risk is another important factor that led us to examine intraday market direction prediction.
In order to compare the performance of the classification techniques on the prepared dataset, four different machine learning methods were used in three different time periods, and bidirectional (buy/sell) operations were tested on six different threshold values, resulting in a total of 144 aspects (4 × 3 × 2 × 6). To avoid the overfitting problem that may arise when designing a supervised classification model for predicting the direction of the index, k-fold cross-validation is applied to each aspect, with k set to ten. Across the deep learning and machine learning strategies, there are 48 different results in each period. The threshold was tested at six values, from 0 to 0.5 in steps of 0.1. The detailed evaluation of the BIST 100 index direction forecast performance concerning rise and fall is listed in Tables 5 and 6, respectively. We compare the predictive performance of Deep Neural Network (DNN), Support Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR) on the out-of-sample test set in terms of confusion matrix values, accuracy (acc. %), precision (pre. %), recall (rec. %), and the F1-score (f1 %). We also added maximum drawdown (mdd. %) and compound return (cmp.) to the performance evaluation metrics.
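The experimental design can be enumerated with a short sketch. The labels for models, periods, and directions are taken from the text, while the fold construction is an assumption: the study does not state whether folds are shuffled or contiguous, so contiguous folds are used here purely for illustration.

```python
from itertools import product

MODELS = ("DNN", "SVM", "RF", "LR")
PERIODS = ("30min", "1h", "2h")
DIRECTIONS = ("rise", "fall")
MULTIPLIERS = (0.0, 0.1, 0.2, 0.3, 0.4, 0.5)


def experiment_grid():
    """All (model, period, direction, threshold multiplier) combinations."""
    return list(product(MODELS, PERIODS, DIRECTIONS, MULTIPLIERS))


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous folds (assumed layout)."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


grid = experiment_grid()
print(len(grid))                                # 144
print(sum(1 for g in grid if g[1] == "30min"))  # 48
```

Each of the 144 combinations would then be trained and evaluated on every fold, and the per-fold metrics averaged.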
It may be misleading to use only traditional machine learning performance measures to evaluate a trading model's predictions. For trading applications, higher accuracy does not always mean higher profits. Any trading strategy, even one that appears profitable on paper, will ultimately lose money if its returns are not high enough to exceed the transaction costs associated with commissions, spreads, and slippage over a series of consecutive transactions. In particular, parameters such as threshold value, average return per transaction, maximum drawdown, and cumulative return are more appropriate measures for such a study [23].
Our main target was to investigate whether it is possible to predict the BIST 100 index consistently using deep learning and machine learning classification approaches. To support decisions on financial markets, the results are compared with the "buy and hold" strategy. The average return of the "buy and hold" strategy on the BIST 100 index over the test period is 15%, and the results in Tables 5 and 6 show that the compound returns of both DNN and the other methods outperform it.
From Tables 5 and 6, it can be seen that the results averaged over 10-fold cross-validation indicate an inverse relationship between accuracy and compound return. For instance, in Table 5, as the threshold value ranges from 0 to 0.5 for the DNN, precision and compound return (cmp.) decrease from 60 to 48 and from 3.34 to 1.12, whereas accuracy and average return (ret.) per trade increase from 58 to 77 and from 0.15 to 0.33, respectively. Additionally, increasing the time period from 30 min to 2 hours decreases compound return. The reason for the increase in accuracy and the decrease in compound return is that, by increasing the threshold value, we aim to minimize risk and maximize return per trade. The results indicate that we are reaching our goal of minimizing the number of trades and increasing return per trade. By targeting larger returns, we reduce the number of transactions and eliminate smaller returns, which reduces compound return. The number of correct predictions, however, increases, and accuracy increases with it. Recall decreases as we limit the number of trades.
Similar results can be seen in Table 6 where fall direction is predicted. As expected, DNN performs better in smaller time periods where there are more records. By decreasing the number of instances, DNN performance decreases. On the other hand, the performance of Random Forest and SVM increases. By using threshold and time period structure, we are enabling investors to weigh the potential reward against the risk to decide if the pain is worth the potential gain.
The results obtained for predicting price rise are reported in Table 5. All models were compared according to the test results corresponding to each threshold value. From Table 5, it can be concluded that the highest compound return with minimum drawdown and maximum precision can be achieved with the DNN model.
In order to minimize risk, investors aim to diversify their portfolios. To do so without purchasing many individual stocks, they invest in index funds instead. Even so, when the financial system suffers from erratic behavior and high volatility, this is reflected as a loss to investors. Figures 5 and 6 compare the BIST 100 index return with our DNN model's return over different time periods, one in which the index performs well and one in which it incurs a loss.
2009 is one of the most profitable periods of the BIST 100 index in the dataset. In Figure 5, the DNN model's results are compared with the BIST 100 index return for this period. In Figure 6, the results of the DNN model are compared with the BIST 100 index in 2016, when the index suffered a loss. As both results show, even investing in the index to achieve a diversified portfolio can lead to losses, whereas a more stable investment instrument can be obtained by investing according to our proposed deep learning and machine learning models. Regardless of whether the index performs well or poorly, the proposed model can reduce risk while increasing profit.
In predicting the direction of the BIST 100 index with deep learning and machine learning methods, we have noticed that the gains of true-positive trades are much higher than the losses of false-positive trades. The accuracy and precision of our test results are close to 60%; however, even if they were close to the 50% level, the system would remain profitable, since the gains from true-positive trades are greater than the losses from false-positive trades.
In most studies where deep learning and machine learning are applied, average return per trade is not evaluated [35,44,53]. Only a few studies include cumulative return in their results [23], and we could not find any study in which the average return per trade is compared.
As can be seen from Tables 5 and 6, the return percentage (ret. %) row is positive, and it increases as the threshold value and time period increase. In Table 5, when the threshold value is 0 and the time period is 30 minutes, the return percentage is 0.15, and it increases to 0.74 when the threshold value is 0.5 and the time period is 2 hours. Similarly, in Table 6, when the threshold value is 0 and the time period is 30 minutes, the return percentage is 0.19, and it increases to 0.58 when the threshold value is 0.5 and the time period is 2 hours. According to the results, we observe that DNN has a higher average return per transaction. The profit from the right decisions is greater than the loss from the wrong decisions, resulting in a higher compound return. Creating more profit from right decisions than is lost on wrong decisions is the main objective of money management. As Druckenmiller, who was a manager at Soros' Quantum Fund, says, "I've learned many things from George Soros, but perhaps the most significant is that it's not whether you're right or wrong that's important, but how much money you make when you're right and how much you lose when you're wrong" [59]. From our results, we can infer that by applying deep learning and machine learning methods to predicting BIST 100 direction, the profits of being right will be greater than the losses of being wrong.
From the results, we can see that, in smaller time periods, compound return is higher and max drawdown is lower. Therefore, by using smaller time periods, we can lower risk and increase profit. We used three different time intervals to compare estimation performance and compound return over different time periods. The selection of the time period can be optimized by trying different values, but time period optimization is not the main focus of this study. From our results, we can infer that, by optimizing time period selection, compound return can be increased and max drawdown can be decreased. One of the most important implications of our results is that deep learning and machine learning methods produce successful results when used in predicting market direction.
Our results indicate why large funds and experts are involved in using and studying deep learning and machine learning methods to predict financial markets [60].

Implications from Experiment Results.
To summarize the obtained results, although traditional machine learning techniques are still preferred mainstream methods in predictive analysis, recent research shows that these methods do not capture the properties of complex, nonlinear problems as well as deep learning methods. Accordingly, these experiments show that a deep learning algorithm, indirectly, has the capacity to produce an appropriate representation of information.
Even though the DNN model performs notably better than the other models according to the confusion matrix, compound return, max drawdown, and precision, we aimed to determine whether the observed differences are statistically significant. To compare the models statistically, the McNemar test is used, which captures the errors made by both models [61]. The null hypothesis states that the classifiers have a similar proportion of errors on the test set. In the McNemar test, the p value is below the given threshold (0.05) only for the DNN-RF comparison. Since the p value is 0.048, we can reject the null hypothesis and infer that the difference between the DNN and RF classifiers is statistically significant. However, statistical significance does not mean that the trading model will be successful, since a single large loss can wipe out multiple winnings. Therefore, return per trade should be considered in model evaluation.
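For readers unfamiliar with the McNemar test, the following sketch shows one common variant, the chi-square form with continuity correction. Which variant the study used is not stated, so this choice is an assumption, and the function names are hypothetical. The test looks only at the discordant cases, that is, the samples on which exactly one of the two classifiers is correct.

```python
from math import erfc, sqrt


def discordant_counts(y_true, pred_a, pred_b):
    """Count samples where exactly one of the two classifiers is correct."""
    b = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a == t and o != t)
    c = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a != t and o == t)
    return b, c  # b: only A correct, c: only B correct


def mcnemar_test(b, c):
    """Continuity-corrected McNemar statistic and its p value (1 degree of freedom)."""
    if b + c == 0:
        return 0.0, 1.0  # the classifiers never disagree
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(statistic / 2.0))
    return statistic, p_value
```

A p value below 0.05 would lead to rejecting the hypothesis that the two classifiers have the same error rate, but it says nothing about the size of the gains and losses of individual trades.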
Our results on market direction prediction suggest that better results are achieved with deep learning than classical machine learning methods. Nevertheless, the complex architecture of deep learning models must be considered. Even if advanced libraries such as TensorFlow and Keras are used, there is a need for comprehensive understanding and solid experimentation to use such models efficiently.

Conclusions
Recent trends have led to an increase in studies of models based on technical indicators, demonstrating that both professionals and individual investors use technical indicators. Assuming that most people make their investments according to technical indicators, we can confirm that technical indicators actually show the behaviour of investors. Accordingly, in this study, technical indicators applied to BIST 100 index data were used as input for modeling trader behaviour using deep learning and machine learning methods.

As can be deduced from the test results, the main contribution of this study is enabling traders to make profits even if there is negative news and index loss in the market with the deep learning model developed considering OHLC prices and transaction costs.
In this study, we have compared Deep Neural Network with Support Vector Machine, Random Forest, and Logistic Regression on predicting the direction of the BIST 100 index in intraday time periods using different threshold values. To test the robustness and performance of these models, empirical studies have been performed on these machine learning methods in three different time periods. Bidirectional (buy/sell) operations were tested on six different threshold values, resulting in a total of 144 aspects. To avoid the problem of overfitting, k-fold cross-validation is applied to each aspect, with k set to ten. Metrics such as accuracy, precision, recall, F1-score, compound return, and max drawdown have been used to evaluate the performance of these models on predicting direction.
Empirical findings suggest the superiority of the proposed DNN model at lower threshold values and smaller time periods when evaluated on compound return, average return per trade, and max drawdown. As the threshold value increases, the superiority of the DNN model over the other machine learning methods diminishes. Also, by using DNN and machine learning methods, we can achieve a model in which the number of true-positive orders is higher than that of false-positive orders. At the same time, the average return per trade of true-positive orders is higher than the average loss per trade of false-positive orders. The findings of this study suggest that even if the precision of the model is only close to 60 percent, using the model is profitable. Consider what these results mean in practice for a trader: six out of ten transactions proposed by the model will be in the right direction, and the return on these transactions will be higher than the cost of the faulty ones. If the investment is fixed, depending on the selected threshold value, if the model earns 100 TL from each correct transaction, it loses 75 TL from each faulty one. Accordingly, over ten transactions the model makes on average 600 TL (6 * 100) in profit and 300 TL (4 * 75) in losses, for a net profit of approximately 300 TL. These transactions are expected to take place within a few weeks, depending on the selected time period.
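The arithmetic of this example can be checked with a short sketch. The figures of 100 TL per correct trade and 75 TL per faulty trade are the illustrative values from the text; the break-even precision is a simple derived quantity added here for illustration, and the function names are hypothetical.

```python
def net_profit(wins, misses, gain_per_win, loss_per_miss):
    """Total profit over a batch of trades with fixed gain and loss sizes."""
    return wins * gain_per_win - misses * loss_per_miss


def break_even_precision(gain_per_win, loss_per_miss):
    """Share of correct trades at which gains and losses cancel out."""
    return loss_per_miss / (gain_per_win + loss_per_miss)


print(net_profit(6, 4, 100, 75))                # 300
print(round(break_even_precision(100, 75), 3))  # 0.429
```

Under these illustrative trade sizes, the strategy would break even at a precision of roughly 43 percent, which is consistent with the observation that a model with precision near 50 to 60 percent can still be profitable when gains per correct trade exceed losses per faulty trade.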

Data Availability
The "BIST100" data used to support the findings of this study were supplied by "Borsa İstanbul" under license and so cannot be made freely available. Requests for access to these data should be made to "Datastore-Borsa İstanbul" (https://datastore.borsaistanbul.com/).

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.