This is the final project for the Data Scientist Nanodegree, where our goal is to predict churn for a fictional streaming service called Sparkify.
Churn Prediction using PySpark
Students will build an ETL pipeline that extracts data from S3, stages it in Redshift, and transforms it into a set of dimensional tables for their analytics team.
This Git repo showcases my analysis of the Sparkify dataset using PySpark on Apache Spark in cluster mode, with JupyterLab running on Docker. The goal was to identify at-risk customers and develop retention strategies. The analysis tested multiple machine learning models and uncovered insights into customer behavior and churn patterns.
Cloud Data Warehouse of Sparkify Data using Redshift
Data Analysis in Spark to Identify Customer Churn for a fictional music service.
Sparkify project for predicting customer loyalty.
An ETL model designed using PostgreSQL for the Sparkify database 🗄, modeling user activity data to create a database and ETL pipeline 🔀 for a music streaming app 🎼.
Udacity Data Engineer Nanodegree: Project Data Lake
Project: Data Modeling with Cassandra