This is the coursework completed for the Data Science Career Track at Springboard.
Springboard is an intensive, full-time online data science program:
- 600+ hours of curriculum, including video, articles and hands-on projects
- Developed and continuously updated with and by industry experts to teach in-demand skills
- Capstone Projects
- Curriculum covers data wrangling, data storytelling, inferential statistics, data visualization, machine learning and big data
- Weekly 1-on-1 mentorship by industry experts
- Mentors matched to each student's profile and goals, delivering relevant insight from the industry
- All mentors are active data scientists at top technology companies
- 20% acceptance rate; only the best applicants are accepted
The goal is to cover core data science fundamentals and get students job ready.
Install the following modules for each unit:
| Unit | Modules |
| --- | --- |
| JSON Based Data Exercise | pandas, json, numpy |
| SQL Practice | Springboard SQL website |
| API Mini-Project | requests, json, pandas, statistics |
| Frequentist Statistics | scipy, numpy, pandas, numpy.random, matplotlib.pyplot |
| Bootstrap Statistics | pandas, numpy, numpy.random, matplotlib.pyplot |
| Bayesian Inference | pymc3, pandas, numpy, numpy.random, matplotlib.pyplot, scipy |
| Linear Regression Boston Housing Data Set | numpy, pandas, scipy, matplotlib, sklearn, seaborn |
| Heights and Weights Logistic Regression | numpy, scipy, matplotlib, pandas, seaborn, sklearn, warnings |
| Predicting Movie Ratings from Reviews Using Naive Bayes | glob, numpy, scipy, matplotlib, pandas, seaborn, six.moves |
| Customer Segmentation Using Clustering | pandas, sklearn, matplotlib, seaborn |
| Find 2-3 Job Titles WordCloud | wordcloud, re, string, collections, nltk, bokeh |
| Spark Mini-Project DataBricks | Databricks account (registration required), pyspark, Spark SQL |
| Take-Home Challenge Ultimate | pandas, json, plotly, bokeh, seaborn, matplotlib, numpy, sklearn, datetime, xgboost, keras |
| Take-Home Challenge Relax | glob, numpy, datetime, tqdm, collections, seaborn, bokeh, sklearn, xgboost |
Uses a World Bank dataset on a school quality improvement project in Ethiopia to practice data wrangling.
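As a rough illustration of this kind of JSON wrangling, the sketch below flattens nested records into rows and fills in missing names from a lookup. The sample records and the field names (`country`, `themes`, `code`, `name`) are hypothetical and are not the actual schema of the World Bank file; only the standard library is used here, although the exercise itself relies on pandas.

```python
import json
from collections import Counter

# Hypothetical sample records; the field names are illustrative assumptions,
# not the real schema of the World Bank dataset.
raw = """
[
  {"country": "Ethiopia", "themes": [{"code": "8", "name": "Human development"}]},
  {"country": "Kenya", "themes": [{"code": "8", "name": ""},
                                  {"code": "11", "name": "Environment"}]},
  {"country": "Ethiopia", "themes": [{"code": "11", "name": "Environment"}]}
]
"""

records = json.loads(raw)

# Flatten the nested "themes" lists into one row per (country, theme) pair.
rows = [
    {"country": rec["country"], "code": theme["code"], "name": theme["name"]}
    for rec in records
    for theme in rec["themes"]
]

# Build a code -> name lookup from rows that have a name, then fill the blanks.
code_to_name = {r["code"]: r["name"] for r in rows if r["name"]}
for r in rows:
    if not r["name"]:
        r["name"] = code_to_name.get(r["code"], "")

# Count records per country and rows per theme name.
country_counts = Counter(rec["country"] for rec in records)
theme_counts = Counter(r["name"] for r in rows)
```

With pandas, `pandas.json_normalize` would likely handle the flattening step in a single call.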
Uses Springboard's SQL website to wrangle data.
Uses Quandl to analyze stock prices from the Frankfurt Stock Exchange.
Part A covers the z-statistic, the t-statistic, and the central limit theorem.
Part B applies these ideas to a hospital medical charges dataset.
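For reference, a one-sample t-statistic can be computed by hand as sketched below. The `charges` values are made up for illustration and are not taken from the medical charges dataset; `one_sample_t` is a hypothetical helper name.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return the t-statistic for the null hypothesis that the population mean is mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error


# Made-up values, purely illustrative.
charges = [12.0, 15.0, 11.0, 14.0, 13.0]
t = one_sample_t(charges, mu0=12.0)
```

The z-statistic has the same form but uses a known population standard deviation in place of the sample estimate.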
Uses the same medical charges dataset, but with the bootstrap method instead.
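A minimal sketch of a percentile bootstrap confidence interval for the mean is shown below. It assumes resampling with replacement and a fixed seed for repeatability; the data values and the `bootstrap_ci` helper are illustrative and not part of the original exercise, which uses numpy.

```python
import random
import statistics


def bootstrap_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of data."""
    rng = random.Random(seed)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data)))
        for _ in range(n_resamples)
    )
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled means.
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up values, purely illustrative.
data = [10.0, 12.0, 9.0, 14.0, 11.0, 13.0, 15.0, 8.0]
lower, upper = bootstrap_ci(data)
```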
Uses the same medical charges dataset, but with Bayesian statistics.
This is a very quick run-through of some basic statistical concepts, adapted from Lab 4 in Harvard's CS109 course.
* Linear Regression Models
* Prediction using linear regression
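The two topics above can be sketched with ordinary least squares on a single feature. This is a hand-rolled illustration under simple assumptions (one predictor, made-up data), not the sklearn workflow used in the exercise; `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict y for a new x using the fitted line."""
    return slope * x + intercept


# Made-up data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
```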
Logistic regression mini-project, adapted from Lab 5 of CS109.
Analyzes basic text using movie reviews from a subset of Rotten Tomatoes data.
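To show the idea behind a Naive Bayes text classifier, here is a small multinomial version with Laplace smoothing written from scratch. The training sentences, the labels, and the helper names `train_nb` and `predict_nb` are all made up for illustration; the exercise itself uses a library implementation on real reviews.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs is a list of (text, label) pairs; returns the counts needed for prediction."""
    class_counts = Counter()            # number of documents per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict_nb(text, class_counts, word_counts, vocab, alpha=1.0):
    """Return the label with the highest log posterior, using Laplace smoothing."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][word]
            score += math.log((count + alpha) / (total_words + alpha * len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training reviews, purely illustrative.
training = [
    ("a great moving film", "fresh"),
    ("great acting and a great script", "fresh"),
    ("a dull boring film", "rotten"),
    ("boring plot and dull acting", "rotten"),
]
model = train_nb(training)
```

Calling `predict_nb("great script", *model)` classifies a new review with the trained counts.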
Clusters customers into groups based on marketing and newsletter/email campaign data, using unsupervised learning.
Uses word cloud visualizations of job postings to see what employers look for in candidates.
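A word cloud is driven by word frequencies, so the core preprocessing step can be sketched as below. The `posting` text and the tiny `STOPWORDS` set are made up for illustration; the exercise itself may use nltk stopwords instead.

```python
import re
import string
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"and", "the", "a", "in", "with", "of", "to", "for"}


def word_frequencies(text):
    """Lowercase the text, strip punctuation, and count the non-stopword words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    words = re.findall(r"[a-z]+", cleaned)
    return Counter(w for w in words if w not in STOPWORDS)


# Made-up job posting text.
posting = "Experience with Python and SQL. Strong Python skills and knowledge of statistics."
freqs = word_frequencies(posting)
```

A dictionary like `freqs` can then be passed to the wordcloud package's `WordCloud.generate_from_frequencies` to draw the cloud.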
Uses Spark to handle a large dataset on job types and payroll.
The company is similar to Uber or Lyft. The goal was to work out how to retain customers and convert more basic memberships to premium memberships.
Tries to predict which users will go on to become active users. The dataset resembles that of Slack or similar online workspace companies.
- Justin Huang