forked from AI4Finance-Foundation/FinRL
Stock_NeurIPS2018_SB3.py
#%% md
# <a href="https://colab.research.google.com/github/AI4Finance-Foundation/FinRL/blob/master/FinRL_StockTrading_NeurIPS_2018.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
#%% md
# Deep Reinforcement Learning for Stock Trading from Scratch: Multiple Stock Trading
# * **PyTorch Version**
#%% md
# Content
#%% md
"""
Outline
* [1. Problem Definition](#0)
* [2. Getting Started - Load Python packages](#1)
    * [2.1. Install Packages](#1.1)
    * [2.2. Check Additional Packages](#1.2)
    * [2.3. Import Packages](#1.3)
    * [2.4. Create Folders](#1.4)
* [3. Download Data](#2)
* [4. Preprocess Data](#3)
    * [4.1. Technical Indicators](#3.1)
    * [4.2. Perform Feature Engineering](#3.2)
* [5. Build Environment](#4)
    * [5.1. Training & Trade Data Split](#4.1)
    * [5.2. User-defined Environment](#4.2)
    * [5.3. Initialize Environment](#4.3)
* [6. Implement DRL Algorithms](#5)
* [7. Backtesting Performance](#6)
    * [7.1. BackTestStats](#6.1)
    * [7.2. BackTestPlot](#6.2)
    * [7.3. Baseline Stats](#6.3)
    * [7.4. Compare to Stock Market Index](#6.4)
* [RLlib Section](#7)
Part 1. Problem Definition
This problem is to design an automated trading solution for multiple stock trading. We model the stock trading process as a Markov Decision Process (MDP). We then formulate our trading goal as a maximization problem.
The agent is trained using Deep Reinforcement Learning (DRL) algorithms. The components of the reinforcement learning environment are:
* Action: The action space describes the allowed actions through which the agent interacts with the environment. Normally, a ∈ A includes three actions: a ∈ {−1, 0, 1}, where −1, 0, 1 represent selling, holding, and buying one share. An action can also be carried out on multiple shares, so we use an action space {−k, ..., −1, 0, 1, ..., k}, where k denotes the number of shares. For example, "Buy 10 shares of AAPL" or "Sell 10 shares of AAPL" are 10 or −10, respectively.
* Reward function: r(s, a, s′) is the incentive mechanism for an agent to learn a better action. It is the change in portfolio value when action a is taken at state s and the agent arrives at the new state s′, i.e., r(s, a, s′) = v′ − v, where v′ and v represent the portfolio values at states s′ and s, respectively.
* State: The state space describes the observations that the agent receives from the environment. Just as a human trader needs to analyze various information before executing a trade, our trading agent observes many different features to better learn in an interactive environment.
* Environment: Dow 30 constituents
The data for the stocks used in this case study is obtained from the Yahoo Finance API. The data contains Open-High-Low-Close prices and volume.
"""
# import libraries
from __future__ import annotations
import datetime
import os
import sys
from pprint import pprint
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from finrl import config
from finrl import config_tickers
from finrl.agents.stablebaselines3.models import DRLAgent
from finrl.main import check_and_make_directories
from finrl.meta.data_processor import DataProcessor
from finrl.meta.env_stock_trading.env_stocktrading import StockTradingEnv
from finrl.meta.preprocessor.preprocessors import data_split
from finrl.meta.preprocessor.preprocessors import FeatureEngineer
from finrl.meta.preprocessor.yahoodownloader import YahooDownloader
from finrl.plot import backtest_plot
from finrl.plot import backtest_stats
from finrl.plot import get_baseline
from finrl.plot import get_daily_return
# matplotlib.use('Agg')
# %matplotlib inline
sys.path.append("../FinRL-Library")
import itertools
from finrl.config import (
    DATA_SAVE_DIR,
    TRAINED_MODEL_DIR,
    TENSORBOARD_LOG_DIR,
    RESULTS_DIR,
    TRAIN_START_DATE,
    TRAIN_END_DATE,
    TEST_START_DATE,
    TEST_END_DATE,
    TRADE_START_DATE,
    TRADE_END_DATE,
)
# check_and_make_directories() replaces the following:
#
# if not os.path.exists("./" + config.DATA_SAVE_DIR):
#     os.makedirs("./" + config.DATA_SAVE_DIR)
# if not os.path.exists("./" + config.TRAINED_MODEL_DIR):
#     os.makedirs("./" + config.TRAINED_MODEL_DIR)
# if not os.path.exists("./" + config.TENSORBOARD_LOG_DIR):
#     os.makedirs("./" + config.TENSORBOARD_LOG_DIR)
# if not os.path.exists("./" + config.RESULTS_DIR):
#     os.makedirs("./" + config.RESULTS_DIR)
check_and_make_directories(
    [DATA_SAVE_DIR, TRAINED_MODEL_DIR, TENSORBOARD_LOG_DIR, RESULTS_DIR]
)
# """
#
# <a id='2'></a>
# # Part 3. Download Data
# Yahoo Finance is a website that provides stock data, financial news, financial reports, etc. All the data provided by Yahoo Finance is free.
# * FinRL uses a class **YahooDownloader** to fetch data from Yahoo Finance API
# * Call Limit: Using the Public API (without authentication), you are limited to 2,000 requests per hour per IP (or up to a total of 48,000 requests a day).
#
# # -----
# class YahooDownloader:
#     Provides methods for retrieving daily stock data from
#     Yahoo Finance API
#
#     Attributes
#     ----------
#         start_date : str
#             start date of the data (modified from config.py)
#         end_date : str
#             end date of the data (modified from config.py)
#         ticker_list : list
#             a list of stock tickers (modified from config.py)
#
#     Methods
#     -------
#     fetch_data()
#         Fetches data from Yahoo Finance API
#
#
# %%
#
# # from config.py TRAIN_START_DATE is a string
# config.TRAIN_START_DATE
#
# #%%
#
# # from config.py TRAIN_END_DATE is a string
# config.TRAIN_END_DATE
#
# #%%
# """
df = YahooDownloader(
    start_date="2009-01-01",
    end_date="2021-10-31",
    ticker_list=config_tickers.DOW_30_TICKER,
).fetch_data()
print(f"config_tickers.DOW_30_TICKER: {config_tickers.DOW_30_TICKER}")
print(f"df.shape: {df.shape}")
# A bare expression prints nothing when this file runs as a script, so print it.
print(df.sort_values(["date", "tic"], ignore_index=True).head())
# """
# # Part 4: Preprocess Data
# Data preprocessing is a crucial step for training a high-quality machine learning model. We need to check for missing data and do feature engineering in order to convert the data into a model-ready state.
# * Add technical indicators. In practical trading, various kinds of information need to be taken into account, for example the historical stock prices, current holding shares, technical indicators, etc. Common examples are MACD and RSI; this script uses the indicators listed in config.INDICATORS.
# * Add turbulence index. Risk aversion reflects whether an investor will choose to preserve capital. It also influences one's trading strategy when facing different levels of market volatility. To control the risk in a worst-case scenario, such as the financial crisis of 2007–2008, FinRL employs the financial turbulence index, which measures extreme asset price fluctuation.
# """
fe = FeatureEngineer(
    use_technical_indicator=True,
    tech_indicator_list=config.INDICATORS,
    use_vix=True,
    use_turbulence=True,
    user_defined_feature=False,
)
processed = fe.preprocess_data(df)
list_ticker = processed["tic"].unique().tolist()
list_date = list(
    pd.date_range(processed["date"].min(), processed["date"].max()).astype(str)
)
# Build the full (date, ticker) grid so every ticker has a row on every date.
combination = list(itertools.product(list_date, list_ticker))
processed_full = pd.DataFrame(combination, columns=["date", "tic"]).merge(
    processed, on=["date", "tic"], how="left"
)
processed_full = processed_full[processed_full["date"].isin(processed["date"])]
processed_full = processed_full.sort_values(["date", "tic"])
processed_full = processed_full.fillna(0)
print(processed_full.sort_values(["date", "tic"], ignore_index=True).head(10))
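#%%
# A minimal sketch of the grid-filling step above on a toy frame, to show what
# the left merge and fillna(0) do. Assumptions: the tickers "AAA"/"BBB", the
# prices and the `_toy*` names are hypothetical. The sketch builds the grid from
# the dates already present rather than from a calendar date range, so the
# `isin` filter used above is not needed here.
import itertools

import pandas as pd

_toy = pd.DataFrame(
    {
        "date": ["2021-01-04", "2021-01-04", "2021-01-05"],
        "tic": ["AAA", "BBB", "AAA"],
        "close": [10.0, 20.0, 11.0],
    }
)
# Every (date, ticker) pair, including the missing ("2021-01-05", "BBB") row.
_toy_grid = pd.DataFrame(
    list(itertools.product(_toy["date"].unique(), _toy["tic"].unique())),
    columns=["date", "tic"],
)
# The left merge keeps all grid rows; the missing row gets NaN, then 0.
_toy_full = (
    _toy_grid.merge(_toy, on=["date", "tic"], how="left")
    .sort_values(["date", "tic"], ignore_index=True)
    .fillna(0)
)
print(_toy_full)
# Note that a filled row carries a close price of 0, which is a placeholder
# rather than a real quote.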
# """
# # Part 5. Design Environment
# Considering the stochastic and interactive nature of automated stock trading tasks, a financial task is modeled as a **Markov Decision Process (MDP)** problem. The training process involves observing stock price changes, taking an action, and calculating the reward so that the agent adjusts its strategy accordingly. By interacting with the environment, the trading agent derives a trading strategy that maximizes rewards over time.
#
# Our trading environments, based on the OpenAI Gym framework, simulate live stock markets with real market data according to the principle of time-driven simulation.
#
# The action space describes the allowed actions through which the agent interacts with the environment. Normally, action a includes three actions: {-1, 0, 1}, where -1, 0, 1 represent selling, holding, and buying one share. An action can also be carried out on multiple shares. We use an action space {-k,…,-1, 0, 1, …, k}, where k denotes the number of shares to buy and -k denotes the number of shares to sell. For example, "Buy 10 shares of AAPL" or "Sell 10 shares of AAPL" are 10 or -10, respectively. The continuous action space needs to be normalized to [-1, 1], since the policy is defined on a Gaussian distribution, which needs to be normalized and symmetric.
#
# # Training data split: 2009-01-01 to 2020-07-01
# # Trade data split: 2020-07-01 to 2021-10-31
# """
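#%%
# A minimal sketch of how a normalized action in [-1, 1] could map back to a
# share count, as described above. Assumption: the environment multiplies each
# normalized action by `hmax` (the per-trade share cap set in env_kwargs below)
# and truncates to whole shares. `scale_actions_demo` is a hypothetical helper,
# not a FinRL function.
import numpy as np


def scale_actions_demo(raw_actions, hmax):
    """Clip actions to [-1, 1], scale by hmax, and truncate to whole shares."""
    clipped = np.clip(raw_actions, -1.0, 1.0)
    return (clipped * hmax).astype(int)


# Positive values are buys and negative values are sells; 1.7 is clipped to 1.
_shares_demo = scale_actions_demo(np.array([0.5, -0.25, 0.0, 1.7]), hmax=100)
print(f"shares to trade (demo): {_shares_demo}")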
train = data_split(processed_full, "2009-01-01", "2020-07-01")
trade = data_split(processed_full, "2020-07-01", "2021-10-31")
print(f"len(train): {len(train)}")
print(f"len(trade): {len(trade)}")
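#%%
# A minimal sketch of the date split above on a toy frame. Assumption:
# data_split keeps rows with start <= date < end, so the boundary date
# 2020-07-01 lands in the trade set only. `split_by_date_demo` and the toy
# data are hypothetical and are not part of FinRL.
import pandas as pd


def split_by_date_demo(frame, start, end, date_col="date"):
    """Keep rows whose date falls in the half-open interval [start, end)."""
    mask = (frame[date_col] >= start) & (frame[date_col] < end)
    return frame[mask].sort_values([date_col, "tic"], ignore_index=True)


_split_toy = pd.DataFrame(
    {
        "date": ["2020-06-30", "2020-07-01", "2020-07-02"],
        "tic": ["AAA", "AAA", "AAA"],
        "close": [1.0, 2.0, 3.0],
    }
)
# ISO-formatted date strings compare correctly as plain strings.
_train_toy = split_by_date_demo(_split_toy, "2009-01-01", "2020-07-01")
_trade_toy = split_by_date_demo(_split_toy, "2020-07-01", "2021-10-31")
print(f"train dates (demo): {_train_toy['date'].tolist()}")
print(f"trade dates (demo): {_trade_toy['date'].tolist()}")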
#%%
print(f"train.tail(): {train.tail()}")
#%%
print(f"trade.head(): {trade.head()}")
#%%
print(f"config.INDICATORS: {config.INDICATORS}")
#%%
stock_dimension = len(train.tic.unique())
state_space = 1 + 2 * stock_dimension + len(config.INDICATORS) * stock_dimension
print(f"Stock Dimension: {stock_dimension}, State Space: {state_space}")
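#%%
# A worked example of the state-space formula above. Assumption: the leading 1
# is the cash balance and the 2 * stock_dimension term covers one price and one
# holding per stock. `state_dim_demo` and the counts passed to it are
# hypothetical and only illustrate the arithmetic.


def state_dim_demo(n_stocks, n_indicators):
    """Return 1 (cash) + 2 per stock (price, holding) + indicators per stock."""
    return 1 + 2 * n_stocks + n_indicators * n_stocks


# With 30 stocks and 8 indicators: 1 + 60 + 240 = 301 state entries.
_state_dim_demo = state_dim_demo(n_stocks=30, n_indicators=8)
print(f"state dimension (demo): {_state_dim_demo}")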
#%%
# Use two separate lists so that changing one cost list cannot alter the other.
buy_cost_list = [0.001] * stock_dimension
sell_cost_list = [0.001] * stock_dimension
num_stock_shares = [0] * stock_dimension
env_kwargs = {
    "hmax": 100,
    "initial_amount": 1000000,
    "num_stock_shares": num_stock_shares,
    "buy_cost_pct": buy_cost_list,
    "sell_cost_pct": sell_cost_list,
    "state_space": state_space,
    "stock_dim": stock_dimension,
    "tech_indicator_list": config.INDICATORS,
    "action_space": stock_dimension,
    "reward_scaling": 1e-4,
}
e_train_gym = StockTradingEnv(df=train, **env_kwargs)
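#%%
# A minimal sketch of how `reward_scaling` could enter the reward
# r(s, a, s') = v' - v from the problem definition. Assumption: the environment
# multiplies the change in portfolio value by reward_scaling so that rewards
# stay in a small numeric range. `scaled_reward_demo` and the portfolio values
# are hypothetical.


def scaled_reward_demo(value_before, value_after, reward_scaling):
    """Return the change in portfolio value multiplied by reward_scaling."""
    return (value_after - value_before) * reward_scaling


# A gain of 2,500 on a 1,000,000 portfolio gives a reward of about 0.25.
_scaled_reward_demo = scaled_reward_demo(1_000_000.0, 1_002_500.0, 1e-4)
print(f"scaled reward (demo): {_scaled_reward_demo}")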
#%% md
## Environment for Training
#%%
env_train, _ = e_train_gym.get_sb_env()
print(f"type(env_train): {type(env_train)}")
#%% md
# """
# # Part 6: Implement DRL Algorithms
# * The implementation of the DRL algorithms is based on **OpenAI Baselines** and **Stable Baselines**. Stable Baselines is a fork of OpenAI Baselines with a major structural refactoring and code cleanups.
# * The FinRL library includes fine-tuned standard DRL algorithms, such as DQN, DDPG,
# Multi-Agent DDPG, PPO, SAC, A2C and TD3. We also allow users to
# design their own DRL algorithms by adapting these DRL algorithms.
# """
agent = DRLAgent(env=env_train)
# """
# Model Training: 5 models (A2C, DDPG, PPO, TD3, SAC)
# """
### Model 1: A2C
#%%
agent = DRLAgent(env=env_train)
model_a2c = agent.get_model("a2c")
#%%
trained_a2c = agent.train_model(
    model=model_a2c, tb_log_name="a2c", total_timesteps=50000
)
### Model 2: DDPG
#%%
agent = DRLAgent(env=env_train)
model_ddpg = agent.get_model("ddpg")
#%%
trained_ddpg = agent.train_model(
    model=model_ddpg, tb_log_name="ddpg", total_timesteps=50000
)
### Model 3: PPO
agent = DRLAgent(env=env_train)
PPO_PARAMS = {
    "n_steps": 2048,
    "ent_coef": 0.01,
    "learning_rate": 0.00025,
    "batch_size": 128,
}
model_ppo = agent.get_model("ppo", model_kwargs=PPO_PARAMS)
#%%
trained_ppo = agent.train_model(
    model=model_ppo, tb_log_name="ppo", total_timesteps=50000
)
### Model 4: TD3
agent = DRLAgent(env=env_train)
TD3_PARAMS = {"batch_size": 100, "buffer_size": 1000000, "learning_rate": 0.001}
model_td3 = agent.get_model("td3", model_kwargs=TD3_PARAMS)
#%%
trained_td3 = agent.train_model(
    model=model_td3, tb_log_name="td3", total_timesteps=30000
)
### Model 5: SAC
agent = DRLAgent(env=env_train)
SAC_PARAMS = {
    "batch_size": 128,
    "buffer_size": 1000000,
    "learning_rate": 0.0001,
    "learning_starts": 100,
    "ent_coef": "auto_0.1",
}
model_sac = agent.get_model("sac", model_kwargs=SAC_PARAMS)
trained_sac = agent.train_model(
    model=model_sac, tb_log_name="sac", total_timesteps=60000
)
# """
# ## Trading
# Assume that we have $1,000,000 initial capital at 2020-07-01. We use the SAC model to trade the Dow Jones 30 stocks.
#
# #%% md
#
# ### Set turbulence threshold
# Set the turbulence threshold to be greater than the maximum of the in-sample turbulence data. If the current turbulence index is greater than the threshold, we assume that the current market is volatile.
# """
data_risk_indicator = processed_full[
    (processed_full.date < "2020-07-01") & (processed_full.date >= "2009-01-01")
]
insample_risk_indicator = data_risk_indicator.drop_duplicates(subset=["date"])
# Print the summaries; bare expressions show nothing when run as a script.
print(insample_risk_indicator.vix.describe())
print(insample_risk_indicator.vix.quantile(0.996))
print(insample_risk_indicator.turbulence.describe())
print(insample_risk_indicator.turbulence.quantile(0.996))
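#%%
# A minimal sketch of turning an in-sample quantile into a risk threshold, as
# the statistics above suggest. Assumptions: the toy series, the 0.75 quantile
# and the `>=` comparison are chosen for illustration only; the trading
# environment below is given a fixed threshold instead.
import pandas as pd

_risk_demo = pd.Series([10.0, 12.0, 11.0, 13.0, 50.0])
# Linear-interpolation quantile of the sorted values [10, 11, 12, 13, 50].
_threshold_demo = _risk_demo.quantile(0.75)
# Flag the days on which the risk indicator reaches the threshold.
_is_turbulent_demo = _risk_demo >= _threshold_demo
print(f"threshold (demo): {_threshold_demo}")
print(f"turbulent days (demo): {_is_turbulent_demo.tolist()}")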
# """
# ### Trade
#
# DRL models need to be updated periodically to take full advantage of the data. Ideally, we would retrain the model yearly, quarterly, or monthly and tune the parameters along the way. In this notebook, the in-sample data from 2009-01 to 2020-07 is used to tune the parameters only once, so some alpha decay is expected as the trading period lengthens.
#
# Numerous hyperparameters – e.g. the learning rate, the total number of samples to train on – influence the learning process and are usually determined by testing some variations.
#
# """
# trade = data_split(processed_full, '2020-07-01','2021-10-31')
e_trade_gym = StockTradingEnv(
    df=trade, turbulence_threshold=70, risk_indicator_col="vix", **env_kwargs
)
# env_trade, obs_trade = e_trade_gym.get_sb_env()
print(f"trade.head(): {trade.head()}")
df_account_value, df_actions = DRLAgent.DRL_prediction(
    model=trained_sac, environment=e_trade_gym
)
print(f"df_account_value.shape: {df_account_value.shape}")
print(f"df_account_value.tail(): {df_account_value.tail()}")
#%%
print(f"df_actions.head(): {df_actions.head()}")
#%% md
# """
# # # Part 7: Backtest Our Strategy
# Backtesting plays a key role in evaluating the performance of a trading strategy. An automated backtesting tool is preferred because it reduces human error. We usually use the Quantopian pyfolio package to backtest our trading strategies. It is easy to use and consists of various individual plots that provide a comprehensive picture of the performance of a trading strategy.
#
# #%% md
# """
# """
# # 7.1 BackTestStats
# pass in df_account_value, this information is stored in env class
# """
#%%
print("==============Get Backtest Results===========")
now = datetime.datetime.now().strftime("%Y%m%d-%Hh%M")
perf_stats_all = backtest_stats(account_value=df_account_value)
perf_stats_all = pd.DataFrame(perf_stats_all)
perf_stats_all.to_csv("./" + config.RESULTS_DIR + "/perf_stats_all_" + now + ".csv")
#%%
# baseline stats
print("==============Get Baseline Stats===========")
baseline_df = get_baseline(
    ticker="^DJI",
    start=df_account_value.loc[0, "date"],
    end=df_account_value.loc[len(df_account_value) - 1, "date"],
)
stats = backtest_stats(baseline_df, value_col_name="close")
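#%%
# A minimal sketch of two statistics that a backtest report typically includes,
# computed by hand from an account-value series. Assumptions: the toy values,
# a zero risk-free rate and 252 trading days per year. This is not how
# backtest_stats or pyfolio computes its figures; it only illustrates the
# quantities involved.
import numpy as np
import pandas as pd

_account_demo = pd.DataFrame(
    {
        "date": ["2021-01-04", "2021-01-05", "2021-01-06", "2021-01-07"],
        "account_value": [100.0, 110.0, 99.0, 108.9],
    }
)
# Daily simple returns; the first row has no previous value and is dropped.
_daily_ret_demo = _account_demo["account_value"].pct_change().dropna()
# Cumulative return over the whole period: last value / first value - 1.
_cum_ret_demo = (
    _account_demo["account_value"].iloc[-1]
    / _account_demo["account_value"].iloc[0]
    - 1
)
# Annualized Sharpe ratio: mean over sample standard deviation, times sqrt(252).
_sharpe_demo = np.sqrt(252) * _daily_ret_demo.mean() / _daily_ret_demo.std()
print(f"cumulative return (demo): {_cum_ret_demo:.4f}")
print(f"annualized Sharpe (demo): {_sharpe_demo:.4f}")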
#%%
print(f"backtest start date: {df_account_value.loc[0, 'date']}")
#%%
print(f"backtest end date: {df_account_value.loc[len(df_account_value) - 1, 'date']}")
#%% md
#
# <a id='6.2'></a>
# ## 7.2 BackTestPlot
#%%
print("==============Compare to DJIA===========")
# %matplotlib inline
# S&P 500: ^GSPC
# Dow Jones Index: ^DJI
# NASDAQ 100: ^NDX
backtest_plot(
    df_account_value,
    baseline_ticker="^DJI",
    baseline_start=df_account_value.loc[0, "date"],
    baseline_end=df_account_value.loc[len(df_account_value) - 1, "date"],
)