
RNN Stock Prediction

Keywords: Recurrent Neural Network, GRU, LSTM, Backtesting


Abstract:

This repository applies a stacked GRU / LSTM to generate stock trading signals, with the aim of making a profit from the stock markets. The model is written in Python and the neural network is coded with PyTorch. In many academic papers, the market trend of a stock price is predicted from technical analysis indicators, e.g. RSI, MACD, and EMA. This project uses these indicators together with the AI algorithm to predict future price movements, with quite interesting results. If you want to try it yourself, you can clone all the files from my GitHub post and run 'mainloop.py' for result generation and testing. The hyperparameters can be adjusted in this file as well. All results are saved for further analysis. The stock data is downloaded through the 'Yahoo Finance' API in the code. You need to choose a stock by its 'ticker' and the period that you want to investigate. The train/test split can also be adjusted in the mainloop.py file.

Model Training

 

Data Loader:

The downloaded stock price data is then passed on to generate the technical analysis indicators, including the short and long EMA lines, RSI, and MACD, as feature inputs. Since an RNN is a supervised learning model, a label column has to be added as the target output. To encode the trading signals, a binary label of 0 or 1 was used: 0 represented holding cash or selling the stock, while 1 represented buying or holding the stock. The labels depended on the percentage change of the current price relative to the previous price ('pct_change'). A 5-day average percentage change was used instead of the one-day change, because a shorter window is prone to short-term noise. The data loader allowed the model to retrieve a series of training data as input and the corresponding label as output.
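As a rough sketch, the labeling step above might look like the following, assuming a pandas Series of closing prices. The function name is hypothetical, and whether the 5-day window looks backward or forward is not stated, so a trailing average is used here:

```python
import pandas as pd

def make_labels(close: pd.Series, window: int = 5) -> pd.Series:
    """Binary trading labels: 1 = buy/hold stock, 0 = sell/hold cash.

    The label is based on the average percentage change of the closing
    price over `window` days, which smooths out one-day noise.
    """
    pct = close.pct_change()            # day-over-day percentage change
    avg = pct.rolling(window).mean()    # trailing 5-day average change
    return (avg > 0).astype(int)        # positive average change -> 1
```

The first `window` entries come out as 0 because the rolling average is undefined there; in practice those rows would be dropped before training.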

Neural Network:

Stacked GRU/LSTM


As the two networks work, and are coded, similarly, the GRU is used here as the example to explain the mechanism. A 3-layer stacked GRU was used. The input size of the GRU matched the original data shape, so no reshaping was needed. The hidden dimension in all layers was set to 25. To begin with, a hidden state had to be initialized as an additional input to the network. This hidden state was initialized with all zeros and a shape of (number of layers, batch size, hidden dimension). The output from the GRU layers had a shape of (batch size, hidden dimension × timesteps). Dropout with ratio 0.5 was then applied to avoid overfitting. To further extract the information learned by the GRU, a hidden linear layer with 8 neurons and ReLU activation was used. Finally, the output was reduced to a single unit and a sigmoid function was applied.
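A minimal PyTorch sketch of the stacked GRU described above. The class name, number of timesteps, and input feature count are placeholders, not the repository's actual values:

```python
import torch
import torch.nn as nn

class StackedGRU(nn.Module):
    """3-layer stacked GRU with hidden dim 25, dropout 0.5, an 8-neuron
    ReLU hidden layer, and a single sigmoid output unit."""

    def __init__(self, input_size: int, hidden_dim: int = 25,
                 num_layers: int = 3, timesteps: int = 10):
        super().__init__()
        self.num_layers = num_layers
        self.hidden_dim = hidden_dim
        self.gru = nn.GRU(input_size, hidden_dim,
                          num_layers=num_layers, batch_first=True)
        self.dropout = nn.Dropout(0.5)
        self.fc1 = nn.Linear(hidden_dim * timesteps, 8)
        self.fc2 = nn.Linear(8, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch = x.size(0)
        # hidden state initialized to zeros: (layers, batch, hidden dim)
        h0 = torch.zeros(self.num_layers, batch, self.hidden_dim)
        out, _ = self.gru(x, h0)          # (batch, timesteps, hidden dim)
        out = out.reshape(batch, -1)      # (batch, hidden dim x timesteps)
        out = self.dropout(out)           # dropout ratio 0.5
        out = torch.relu(self.fc1(out))   # 8-neuron ReLU layer
        return torch.sigmoid(self.fc2(out))  # single output in [0, 1]
```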

Loss Function and Optimization

The loss function implemented was the Mean Squared Error (MSE) between the predicted values and the target values:
Loss = Σ (Y_i − Y'_i)² / n ----(1)
where Y_i is the predicted value, Y'_i is the target value, and n is the batch size. The Adam optimization algorithm was used.

Validation

The training was carried out with validation applied. To avoid overfitting, training continued until a stopping criterion was met. There was a tolerance on both the training loss and the validation loss: when the number of epochs since the loss last improved exceeded the tolerance, the stopping criterion was triggered.

Hyperparameters

All the hyperparameters can be set and adjusted in the mainloop.py file.

Training Algorithm:

  1. Initialize all the hyperparameters
  2. Initialize the network with random weights
  3. For each epoch:
     a. Initialize the hidden state
     b. For each time step:
        i. Feed the data retrieved from the data loader to the neural network
        ii. Obtain the predicted values
        iii. Calculate the loss by Equation (1)
        iv. Carry out gradient descent with Adam optimization to update the network weights
     c. Calculate and save the validation loss
  4. If the minimum training loss or validation loss is more than 30 epochs old, stop training.
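The training algorithm above can be sketched roughly as follows. The function and loader names are hypothetical, and only the validation-loss patience is shown for brevity:

```python
import torch
import torch.nn as nn

def train(model, train_loader, val_loader,
          patience: int = 30, max_epochs: int = 500, lr: float = 1e-3):
    """MSE loss (Equation 1), Adam updates, and early stopping once the
    best validation loss is more than `patience` epochs old."""
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_val, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        model.train()
        for x, y in train_loader:          # one step per batch
            optimizer.zero_grad()
            loss = criterion(model(x), y)  # Equation (1)
            loss.backward()
            optimizer.step()               # Adam weight update
        model.eval()
        with torch.no_grad():              # validation loss for this epoch
            val_loss = sum(criterion(model(x), y).item()
                           for x, y in val_loader)
        if val_loss < best_val:
            best_val, best_epoch = val_loss, epoch
        if epoch - best_epoch >= patience:  # stopping criterion triggered
            break
    return best_val
```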

Model Testing

In the testing phase, the model was kept training, following the same practice as in the RL models. Most of the settings were kept the same as in the training phase. Since a sigmoid activation was applied to the output, it lay in the range [0, 1]. To turn it into an action, a threshold was applied: if the output was greater than the threshold, it was treated as 1, and as 0 otherwise.
Threshold = (max(Y_i) + min(Y_i)) / 2
where Y_i are the predicted values. Finally, the batch size was reduced to 15 in order to keep the network up to date.

Testing Algorithm:

  1. Initialize all the hyperparameters
  2. Initialize the network with the best weights obtained in training
  3. Initialize the hidden state
  4. For each time step:
     i. Feed the data retrieved from the data loader to the neural network
     ii. Obtain the predicted values
     iii. Calculate the loss by Equation (1)
     iv. Carry out gradient descent with Adam optimization to update the network weights
     v. Use the threshold equation above to calculate the predicted label
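The midpoint threshold rule described above can be sketched as follows (the function name is hypothetical):

```python
import torch

def threshold_labels(preds: torch.Tensor) -> torch.Tensor:
    """Convert sigmoid outputs in [0, 1] to binary trading signals using
    the midpoint threshold (max + min) / 2 over the predicted values."""
    thr = (preds.max() + preds.min()) / 2
    return (preds > thr).long()   # 1 = buy/hold stock, 0 = sell/hold cash
```

Because the threshold adapts to the spread of the model's outputs, signals remain meaningful even if the sigmoid outputs cluster away from 0.5.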

Backtesting

The model was backtested with the following evaluation metrics:

1. Sharpe Ratio

The Sharpe ratio is one of the most popular indices in the financial field. It can be defined as:

Sharpe Ratio = (R - Rf) / SD

where R is the return of the asset, Rf is the risk-free rate, and SD is the standard deviation of the asset value. The Sharpe ratio captures both the return and the standard deviation of the asset price, which makes it a good indicator for investors who want to maximize return while minimizing fluctuation. It has no fixed range, as both the return and the SD are unbounded. Note that the risk-free rate in the current market is close to zero, so Rf was neglected to simplify the calculation.
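A minimal sketch of the Sharpe ratio over a series of periodic returns, with the risk-free rate neglected as in the text (the function name is hypothetical):

```python
import statistics

def sharpe_ratio(returns, risk_free: float = 0.0) -> float:
    """Sharpe ratio of periodic returns: mean excess return divided by
    the (sample) standard deviation of the excess returns."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)
```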

2. Win Ratio

The win ratio was defined as the percentage of winning trades among all buy and sell trades. A good trading strategy maximizes this ratio. To better understand the win ratio in the dataset, the win ratios for all the datasets were analysed in Section 3.2.1.

3. Max Drawdown / Return

As the names suggest, the maximum drawdown and maximum return refer to the minimum and maximum percentage change of the asset value. By comparing these indicators, the risk of the trading strategy can be revealed. An alternative is the standard deviation of the asset price; however, the maximum drawdown / return presents the range of the PnL more directly.
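The maximum drawdown can be sketched as the largest peak-to-trough percentage drop of the asset value curve (the function name is hypothetical):

```python
def max_drawdown(equity) -> float:
    """Largest peak-to-trough percentage drop in an equity curve.
    Returns a non-positive number, e.g. -0.25 for a 25% drawdown."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)                   # running high-water mark
        worst = min(worst, (value - peak) / peak) # drop from the peak
    return worst
```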

4. Annualized Return

Different datasets contain different numbers of trading days. To compare the returns fairly, annualized returns were computed. The annualized return can be defined as:

Annualized Return = (1 + R)^(1/period) - 1

where R is the total return and period is the total investment time in years.
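A sketch of the formula above, assuming the period is measured in trading days and a 252-day trading year (that convention is an assumption, not stated in the text):

```python
def annualized_return(total_return: float, trading_days: int,
                      days_per_year: int = 252) -> float:
    """Annualized return from a total return earned over `trading_days`
    days: (1 + R)^(1/period) - 1, with the period in years."""
    years = trading_days / days_per_year
    return (1 + total_return) ** (1 / years) - 1
```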

Disclaimer: The above model is for academic research purposes only. Investors should not use it to make any investment decisions.
