
Stock Trend Analysis Using Deep Learning

Project Overview

Target Audience

  • Financial Analysts and Traders looking for data-driven insights to enhance investment strategies and make more informed trading decisions.
  • Investors seeking tools to predict market trends and make well-informed decisions about their investments.

Data Description

We are using a stock prices dataset from the New York Stock Exchange. The dataset covers the period from 2010 to the end of 2016 and includes stock data for 501 companies listed on the S&P 500 index. It consists of 851,264 individual data points, each representing one day of stock market information for a specific company.

Key features used for prediction include:

  • Opening Price
  • Highest Price
  • Lowest Price
  • Adjusted Close Price
  • Trading Volume

The dataset includes adjustments for 140 stock splits, recorded in the file prices-split-adjusted.csv. Past prices are retroactively rescaled to the post-split price level, so historical prices remain directly comparable across the 2010–2016 period and a stock's performance can be analyzed consistently over time.
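
As a quick orientation, the split-adjusted file can be loaded and inspected with Pandas. This is a minimal sketch; the column names (date, symbol, open, close, low, high, volume) are an assumption based on the public NYSE dataset, not something specified in this README.

```python
import pandas as pd

# Load the split-adjusted price file (column names assumed: date, symbol,
# open, close, low, high, volume).
prices = pd.read_csv("prices-split-adjusted.csv", parse_dates=["date"])

print(prices.shape)                 # roughly 851,264 rows for 2010-2016
print(prices["symbol"].nunique())   # roughly 501 distinct companies
print(prices.head())
```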

Required Libraries

  • Pandas: for data manipulation and analysis.
  • NumPy: for numerical operations and handling arrays.
  • Scikit-learn: for building and evaluating machine learning models.
  • Matplotlib (pyplot): for creating visualizations such as charts and plots.
  • Seaborn: for statistical data visualization, providing an easy interface for complex plots.
  • SHAP: for model explainability, analyzing how individual features contribute to model predictions.
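
A one-line install and the corresponding imports might look like this. TensorFlow/Keras is included as an assumption, since the LSTM model described below needs a deep learning framework that is not in the list above.

```python
# pip install pandas numpy scikit-learn matplotlib seaborn shap tensorflow

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import shap
import tensorflow as tf
from sklearn.preprocessing import StandardScaler
```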

Methodology

Data Preprocessing

  • Created a binary target variable indicating whether the next day's closing price is higher (1) or lower (0) than the current day's closing price.
  • Addressed missing values, outliers, and inconsistencies within the dataset.
  • Engineered new features, daily_return and volatility, to enhance the predictive model.
  • Applied one-hot encoding to the categorical column.
  • Scaled the numerical features with StandardScaler so that all features contribute equally to the model (a code sketch of these steps follows this list).
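
A minimal sketch of these steps, assuming the raw column names date, symbol, open, close, low, high, and volume, a 20-day rolling window for volatility, and symbol as the categorical column (all assumptions not spelled out above):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Load the split-adjusted prices (column names assumed).
prices = pd.read_csv("prices-split-adjusted.csv", parse_dates=["date"])
prices = prices.sort_values(["symbol", "date"]).dropna()

# Binary target: 1 if the next day's close is higher than today's, else 0.
prices["target"] = (
    prices.groupby("symbol")["close"].shift(-1) > prices["close"]
).astype(int)

# Engineered features: daily return and rolling volatility (20-day window assumed).
prices["daily_return"] = prices.groupby("symbol")["close"].pct_change()
prices["volatility"] = (
    prices.groupby("symbol")["daily_return"]
    .rolling(20)
    .std()
    .reset_index(level=0, drop=True)
)
prices = prices.dropna()

# One-hot encode the categorical column (assumed to be the company symbol).
prices = pd.get_dummies(prices, columns=["symbol"])

# Scale the numerical features so they contribute equally (in a real pipeline,
# fit the scaler on the training split only to avoid leakage).
num_cols = ["open", "high", "low", "close", "volume", "daily_return", "volatility"]
prices[num_cols] = StandardScaler().fit_transform(prices[num_cols])
```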

Exploratory Data Analysis (EDA)

  • Plotted histograms to visualize the distribution and identify patterns, skewness, or outliers in the data.
  • Computed the correlation matrix for selected numerical features and the target variable.
  • Visualized the correlations using a heatmap to understand relationships between the features and the target (see the sketch below).
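
A sketch of these plots using Matplotlib and Seaborn, continuing from the preprocessing sketch above (the feature names remain assumptions):

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Numerical features to inspect (names follow the preprocessing sketch).
features = ["open", "high", "low", "close", "volume", "daily_return", "volatility"]

# Histograms to check distributions, skewness, and outliers.
prices[features].hist(bins=50, figsize=(12, 8))
plt.tight_layout()
plt.show()

# Correlation matrix of the selected features and the target, shown as a heatmap.
corr = prices[features + ["target"]].corr()
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation of features with the target")
plt.show()
```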

Model Selection and Training

  • An LSTM-based neural network is used for binary classification, where the model predicts whether the closing price will increase (1) or decrease (0) on the next day.
  • The model consists of two LSTM layers with dropout for regularization, followed by dense layers and a final sigmoid activation for binary classification.
  • The model is trained with early stopping to prevent overfitting.
  • The training process runs for a maximum of 300 epochs with a batch size of 512 (a code sketch of the full model follows this list).
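
A minimal Keras/TensorFlow sketch of this architecture. The framework choice, layer sizes, dropout rate, sequence length, and early-stopping patience are all assumptions; only the two-LSTM-plus-dropout structure, the sigmoid output, the 300-epoch cap, and the batch size of 512 come from the description above.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dropout, Dense
from tensorflow.keras.callbacks import EarlyStopping

# Placeholder sequences with shape (samples, timesteps, features); in the real
# pipeline these come from the preprocessed price data.
X_train = np.random.rand(1000, 30, 7).astype("float32")
y_train = np.random.randint(0, 2, size=1000)
X_val = np.random.rand(200, 30, 7).astype("float32")
y_val = np.random.randint(0, 2, size=200)

# Two LSTM layers with dropout, dense layers, and a sigmoid output for the
# binary up/down prediction. Layer sizes and dropout rate are illustrative.
model = Sequential([
    Input(shape=(X_train.shape[1], X_train.shape[2])),
    LSTM(64, return_sequences=True),
    Dropout(0.2),
    LSTM(32),
    Dropout(0.2),
    Dense(16, activation="relu"),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Early stopping to curb overfitting; the patience value is an assumption.
early_stop = EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)

history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=300,        # maximum number of epochs
    batch_size=512,
    callbacks=[early_stop],
)
```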


Main Findings

Results

  • As training progresses, both the training and validation loss gradually decrease, indicating that the model is learning and improving.
  • The validation accuracy shows a steady improvement, although it is still below 60%. This suggests that the model is able to make reasonable predictions, but there is room for improvement.
  • The training accuracy also shows a steady increase, reaching about 60% by epoch 153, while the validation accuracy stabilizes around 58%.

Learning Curves


The training accuracy increases steadily, reflecting the model's improving performance on the training data, while the validation accuracy rises up to a certain point (around epoch 50) and then stabilizes. The gap between the training and validation loss/accuracy suggests some degree of overfitting.
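
These curves can be reproduced from the History object returned by model.fit in the training sketch above:

```python
import matplotlib.pyplot as plt

# `history` is the object returned by model.fit() in the training sketch.
fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(12, 4))

# Training vs. validation loss per epoch.
ax_loss.plot(history.history["loss"], label="training loss")
ax_loss.plot(history.history["val_loss"], label="validation loss")
ax_loss.set_xlabel("epoch")
ax_loss.set_ylabel("loss")
ax_loss.legend()

# Training vs. validation accuracy per epoch.
ax_acc.plot(history.history["accuracy"], label="training accuracy")
ax_acc.plot(history.history["val_accuracy"], label="validation accuracy")
ax_acc.set_xlabel("epoch")
ax_acc.set_ylabel("accuracy")
ax_acc.legend()

plt.tight_layout()
plt.show()
```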

Model Evaluation

LSTM model evaluation metrics

The overall performance metrics show that the model is not yet well optimized. Recall is particularly low, and a ROC-AUC near 0.5 indicates that the model struggles to differentiate between the two classes.
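
A sketch of computing these metrics with Scikit-learn, continuing from the training sketch (the test data here is a placeholder, and the 0.5 probability threshold is an assumption):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Held-out test set with the same shape convention as training
# (placeholder data; replace with the real test sequences and labels).
X_test = np.random.rand(200, 30, 7).astype("float32")
y_test = np.random.randint(0, 2, size=200)

# Sigmoid outputs are probabilities; threshold at 0.5 (an assumed cut-off).
y_prob = model.predict(X_test).ravel()
y_pred = (y_prob >= 0.5).astype(int)

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1 score :", f1_score(y_test, y_pred))
print("ROC-AUC  :", roc_auc_score(y_test, y_prob))   # ~0.5 means near-chance separation
```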

Confusion Matrix


The model struggles to identify "Increase (1)" cases, as shown by the high number of false negatives (40,407). The imbalance between true positives and false negatives suggests poor recall for class "Increase (1)."
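
A sketch that produces the confusion matrix above and the ROC curve shown in the next section, continuing from the evaluation sketch:

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, roc_curve, roc_auc_score

# y_test, y_pred, and y_prob come from the evaluation sketch above.

# Confusion matrix: rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(
    cm, annot=True, fmt="d", cmap="Blues",
    xticklabels=["Decrease (0)", "Increase (1)"],
    yticklabels=["Decrease (0)", "Increase (1)"],
)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()

# ROC curve from the predicted probabilities, with the chance diagonal for reference.
fpr, tpr, _ = roc_curve(y_test, y_prob)
plt.plot(fpr, tpr, label=f"LSTM (AUC = {roc_auc_score(y_test, y_prob):.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="chance")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```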

ROC Curve

(ROC curve plot)

Credits

This project was developed collaboratively by the following team members:

  • Igor T.
      • Developed and set up the GitHub repository for the project.
      • Enhanced the data cleaning and feature selection code, refining indicators to improve model performance.
      • Contributed to the preparation of visualizations for model results and performance metrics.
  • Mehran Hassanzadeh
      • Built the deep learning model using LSTM (Long Short-Term Memory) for stock price prediction.
      • Applied advanced techniques for model optimization and tuning to improve prediction accuracy.
      • Implemented additional evaluation methods (e.g., cross-validation, hyperparameter tuning) to better assess and compare the models' performance.
      • Designed and executed comprehensive Exploratory Data Analysis (EDA) to visualize data patterns, identify trends, and select key features for model building.
  • Sidra Zain (SidraOB)
      • Assisted in preprocessing the dataset.
      • Coordinated and contributed to the final testing and model deployment process.
      • Authored the README documentation to clearly communicate the project setup, implementation steps, and model evaluation.
