Data Science and Machine Learning Projects, by Francesco Parisio

About

The repository contains a series of data science and machine learning projects that I carried out. The content is divided into subcategories, and each project is implemented as either a Jupyter notebook or a Python script.

Time series forecasting of CO₂ emissions

Summary: This project forecasts a 6.7% increase over one year in CO₂ emissions from natural gas-based electricity production in the United States. The forecast comes from an optimized and cross-validated SARIMAX model and carries an uncertainty of about 10%. The model's reliability and robustness are demonstrated by a Symmetric Mean Absolute Percentage Error (SMAPE) below 3% on the test data, and by a cross-validation in which 75% of the forecasts have a SMAPE below 15%. These forecasts can inform policymakers about the need for stronger measures to reduce CO₂ emissions.
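SMAPE has several variants in the literature. The following is a minimal sketch of one common definition. It assumes the denominator (|A| + |F|) / 2, which bounds the score between 0% and 200%. The notebook's exact variant may differ, and the values are made up for illustration.

```python
def smape(actual, forecast):
    """Symmetric MAPE in percent, using the (|A| + |F|) / 2 denominator.

    Assumption: a pair where both values are zero contributes zero error.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if not actual:
        raise ValueError("inputs must be non-empty")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        if denom != 0:
            total += abs(f - a) / denom
    return 100.0 * total / len(actual)


# Hypothetical monthly values, not the project's data
actual = [100.0, 110.0, 120.0]
forecast = [98.0, 112.0, 126.0]
print(f"SMAPE = {smape(actual, forecast):.2f}%")  # prints "SMAPE = 2.90%"
```

Because the error is scaled by the average magnitude of the actual and forecast values, SMAPE stays comparable across periods with different emission levels.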

Tools: Python, Statsmodels, Pandas, NumPy, Matplotlib, Itertools, tqdm, SciPy, Seaborn, SARIMAX, Time Series Analysis, Rolling-split Validation, Monte Carlo Simulations.
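Rolling-split validation moves the forecast origin forward through the series. At each step the model is refit on past data only and scored on the next window. The sketch below only generates the train/test index splits and summarizes the per-fold scores. Model fitting (e.g. SARIMAX) is left out. The helpers `rolling_splits` and `share_below`, the expanding-window default, and the fold sizes are illustrative assumptions.

```python
def rolling_splits(n_obs, initial_train, horizon, step=1, fixed_window=False):
    """Yield (train_indices, test_indices) for rolling-origin evaluation.

    Each fold trains on observations before the forecast origin and tests on
    the next `horizon` observations, so no future data leaks into training.
    With fixed_window=True the training window keeps length `initial_train`;
    otherwise it expands from index 0.
    """
    if initial_train < 1 or horizon < 1 or step < 1:
        raise ValueError("initial_train, horizon and step must be positive")
    origin = initial_train
    while origin + horizon <= n_obs:
        start = origin - initial_train if fixed_window else 0
        yield list(range(start, origin)), list(range(origin, origin + horizon))
        origin += step


def share_below(scores, threshold):
    """Fraction of folds whose error score is strictly below `threshold`."""
    return sum(s < threshold for s in scores) / len(scores)


# Hypothetical setup: 10 years of monthly data, 5-year initial window,
# 12-month horizon, origin moved forward 12 months per fold.
folds = list(rolling_splits(n_obs=120, initial_train=60, horizon=12, step=12))
for train_idx, test_idx in folds:
    # Here one would fit the model on train_idx and compute SMAPE on test_idx.
    print(len(train_idx), test_idx[0], test_idx[-1])

# Hypothetical per-fold SMAPE values (%), summarized against a 15% threshold
print(share_below([4.0, 9.5, 18.0], threshold=15.0))
```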

Recommendation system for Amazon products

Summary: I developed a series of recommendation systems for Amazon products using collaborative filtering, matrix factorization, and optimization techniques to achieve a minimum precision of 85%. The collaborative filtering models use cosine similarity with k-nearest neighbors (KNN) in both user-user and item-item variants, while the matrix factorization model uses the SVD algorithm. Each model is evaluated and tuned using precision, recall, and F1 score, with the user-user and SVD models achieving the best results.
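The following is a minimal pure-Python sketch of the user-user approach. It is not the project's Surprise-based code. Cosine similarity is computed over co-rated items. A rating is then predicted as the similarity-weighted mean over the k most similar users who rated the item. Swapping the roles of users and items gives the item-item variant. The ratings dictionary is hypothetical.

```python
import math


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity computed over the items both users have rated."""
    common = ratings_a.keys() & ratings_b.keys()
    if not common:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = math.sqrt(sum(ratings_a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings_b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict_user_user(ratings, user, item, k=2):
    """Similarity-weighted mean rating of `item` over the k most similar users.

    Returns None when no other user with positive similarity rated the item.
    """
    neighbours = [
        (cosine_similarity(ratings[user], ratings[other]), ratings[other][item])
        for other in ratings
        if other != user and item in ratings[other]
    ]
    neighbours = [(sim, r) for sim, r in neighbours if sim > 0]
    neighbours.sort(key=lambda pair: pair[0], reverse=True)
    top_k = neighbours[:k]
    if not top_k:
        return None
    return sum(sim * r for sim, r in top_k) / sum(sim for sim, _ in top_k)


# Hypothetical ratings: user -> {item: rating on a 1-5 scale}
ratings = {
    "u1": {"A": 5, "B": 3, "C": 4},
    "u2": {"A": 4, "B": 3, "C": 5, "D": 4},
    "u3": {"A": 1, "B": 5, "D": 2},
}
print(predict_user_user(ratings, "u1", "D", k=1))  # uses only u2, the closest neighbour
print(predict_user_user(ratings, "u1", "D", k=2))  # blends u2 and u3
```

The number of neighbours `k` is the main tuning knob. A small `k` follows the closest users closely, while a larger `k` smooths predictions toward a broader consensus.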

Tools: Python, Collections, Surprise, Scikit-learn, Pandas, NumPy, Matplotlib, Recommendation Systems, Collaborative Filtering, KNN, SVD.
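Precision and recall for recommenders are usually computed per user over the top-k recommendations. The sketch below follows one common convention, assumed here rather than taken from the project:

- An item is relevant if its true rating is at least `threshold`.
- An item is recommended if it is in the user's top k by estimated rating and its estimate also reaches `threshold`.

F1 is then computed from the user-averaged precision and recall. The data is hypothetical.

```python
from collections import defaultdict


def precision_recall_f1_at_k(predictions, k=10, threshold=3.5):
    """Mean precision@k and recall@k over users, plus the F1 of those means.

    `predictions` is a list of (user, item, true_rating, estimated_rating).
    """
    by_user = defaultdict(list)
    for user, _item, true_r, est in predictions:
        by_user[user].append((est, true_r))

    precisions, recalls = [], []
    for user_ratings in by_user.values():
        user_ratings.sort(key=lambda pair: pair[0], reverse=True)
        top_k = user_ratings[:k]
        n_relevant = sum(true_r >= threshold for _, true_r in user_ratings)
        n_recommended = sum(est >= threshold for est, _ in top_k)
        n_hit = sum(est >= threshold and true_r >= threshold for est, true_r in top_k)
        # Assumed convention: a metric with a zero denominator counts as 0.
        precisions.append(n_hit / n_recommended if n_recommended else 0.0)
        recalls.append(n_hit / n_relevant if n_relevant else 0.0)

    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical (user, item, true rating, estimated rating) tuples
preds = [
    ("u1", "A", 5.0, 4.6),
    ("u1", "B", 2.0, 4.0),
    ("u1", "C", 4.0, 3.0),
    ("u2", "A", 4.0, 4.2),
    ("u2", "D", 1.0, 1.5),
]
print(precision_recall_f1_at_k(preds, k=2, threshold=3.5))
```

Raising `threshold` makes recommendations more conservative, which tends to trade recall for precision. This is the lever behind a minimum-precision target.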

Customer churn prediction

Summary: I built a machine learning pipeline that automates exploratory data analysis (EDA) and trains two predictive models (Random Forest and Logistic Regression) to predict customer churn. Code quality is enforced through linting, formatting, and pre-commit Git hooks. The project serves as an example of MLOps best practices.

Tools: Python, Scikit-learn, Pandas, NumPy, Matplotlib, Pytest, Pylint, Black.
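To illustrate the testing side of such a pipeline, here is a hypothetical EDA helper together with pytest-style tests. The function name, the column names, and the `churned` flag are assumptions for illustration, not the project's code.

```python
from collections import Counter


def churn_rate_by_category(rows, column, target="churned"):
    """Return {category: share of rows in that category whose target is truthy}.

    Hypothetical EDA helper: `rows` is a list of dicts, e.g. parsed CSV records.
    """
    totals = Counter(row[column] for row in rows)
    churned = Counter(row[column] for row in rows if row[target])
    return {cat: churned[cat] / n for cat, n in totals.items()}


# Pytest collects functions whose names start with test_ and reports failing asserts.
def test_churn_rate_by_category():
    rows = [
        {"card": "Blue", "churned": True},
        {"card": "Blue", "churned": False},
        {"card": "Gold", "churned": False},
    ]
    assert churn_rate_by_category(rows, "card") == {"Blue": 0.5, "Gold": 0.0}


def test_empty_input_gives_empty_result():
    assert churn_rate_by_category([], "card") == {}
```

Running `pytest` on the file executes both `test_` functions. Keeping helpers small and pure like this makes them easy to check automatically before each commit, alongside formatting with Black and linting with Pylint.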

Convolutional neural networks to recognize digits

Summary: I developed a convolutional neural network (CNN) to identify house-number digits using the Street View House Numbers (SVHN) image dataset. By adding convolutional layers, batch normalization, and dropout, the model achieved over 90% accuracy. This deep learning approach effectively recognizes digits from labeled image datasets, demonstrating the versatility of CNNs in image recognition tasks.

Tools: Python, Scikit-learn, Pandas, NumPy, Matplotlib, TensorFlow, Keras, Artificial Neural Networks, Convolutional Neural Networks, Image Recognition.
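Adding convolutional layers, batch normalization, and dropout changes the model's size in predictable ways. The following is a minimal sketch that tracks output shapes and parameter counts for a hypothetical architecture. It assumes 32×32 single-channel inputs, "same"-padded 3×3 convolutions, and 2×2 max pooling. The layer sizes are illustrative, not the project's.

```python
import math


def conv_output_size(size, kernel, stride=1, padding="valid"):
    """Output length along one spatial axis, following Keras padding rules."""
    if padding == "same":
        return math.ceil(size / stride)
    if padding == "valid":
        return (size - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding!r}")


def summarize(input_shape, layers):
    """Track output shape and parameter count through a list of layer specs.

    Specs: ("conv", filters, kernel), ("pool", size), ("batchnorm",),
    ("dropout", rate), ("flatten",), ("dense", units). Assumptions for this
    sketch: convolutions use "same" padding; pooling uses stride = pool size.
    """
    shape, total = input_shape, 0
    for layer in layers:
        kind, params = layer[0], 0
        if kind == "conv":
            _, filters, k = layer
            h, w, c = shape
            params = (k * k * c + 1) * filters  # kernel weights + one bias per filter
            shape = (conv_output_size(h, k, padding="same"),
                     conv_output_size(w, k, padding="same"), filters)
        elif kind == "pool":
            _, p = layer
            h, w, c = shape
            shape = (conv_output_size(h, p, stride=p), conv_output_size(w, p, stride=p), c)
        elif kind == "batchnorm":
            params = 4 * shape[-1]  # gamma, beta (trainable) + moving mean, variance
        elif kind == "dropout":
            pass  # dropout masks activations during training; no parameters
        elif kind == "flatten":
            shape = (math.prod(shape),)
        elif kind == "dense":
            _, units = layer
            params = (shape[-1] + 1) * units  # weights + biases
            shape = (units,)
        else:
            raise ValueError(f"unknown layer: {kind!r}")
        total += params
        print(f"{kind:<10} {str(shape):<16} {params:>8}")
    return shape, total


# Hypothetical architecture, for illustration only
architecture = [
    ("conv", 16, 3), ("conv", 32, 3), ("pool", 2), ("batchnorm",),
    ("conv", 32, 3), ("conv", 64, 3), ("pool", 2), ("batchnorm",),
    ("flatten",), ("dense", 32), ("dropout", 0.5), ("dense", 10),
]
final_shape, n_params = summarize((32, 32, 1), architecture)
print("total parameters:", n_params)
```

For an equivalent Keras model, the total should correspond to the "Total params" line of `model.summary()`, which includes the non-trainable moving statistics of batch normalization. Note that most parameters sit in the first dense layer after flattening. Extra pooling before that layer is the usual way to keep a deeper network compact.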
