| date | duration | maintainer | order | title |
| --- | --- | --- | --- | --- |
| w02d02 | 90 | ultimatist | 2 | Linear Regression Code Intro |
By the end of this notebook, students should be able to:
- Visualize data
- Look for correlations and multicollinearity
- Explain how linear regression models work
- Interpret basic regression statistics such as R^2
- Do basic feature engineering and selection to improve models
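As a taste of the correlation and multicollinearity checks above, here is a minimal sketch on synthetic data (the column names `x1`, `x2`, `y` are hypothetical, not from the lab):

```python
import numpy as np
import pandas as pd

# Synthetic data: x2 is nearly collinear with x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + rng.normal(scale=0.1, size=200)
y = 3 * x1 + rng.normal(size=200)
df = pd.DataFrame({"x1": x1, "x2": x2, "y": y})

# The pairwise correlation matrix flags the problem:
# x1 and x2 move almost in lockstep.
corr = df.corr()
print(corr.round(2))
```

A correlation near 1 between two predictors is a warning sign that their individual coefficients will be unstable in a linear regression.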
Prerequisites:
- Linear Regression Theory Intro: students should have a foundational understanding of linear regression before attempting this lab
- Pickling
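The pickling prerequisite refers to Python's `pickle` module, which we can use to save a fitted model and reload it later; a minimal sketch, assuming a scikit-learn `LinearRegression`:

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

# Fit a tiny model on synthetic data.
X = np.arange(10).reshape(-1, 1)
y = 2 * X.ravel() + 1
model = LinearRegression().fit(X, y)

# Serialize the fitted model to bytes, then restore it.
blob = pickle.dumps(model)
restored = pickle.loads(blob)

# The restored model makes identical predictions.
assert np.allclose(restored.predict(X), model.predict(X))
```

In practice you would write the bytes to a file with `pickle.dump(model, open(path, "wb"))` so the model survives a notebook restart.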
The goal of this notebook is to guide students through an implementation of linear regression modeling. A prior understanding of the theory is essential. The lab should take about 90 minutes, and students should attempt all exercises in the car price predictor student section.
With this notebook, students will:

Create linear regression models in:
- statsmodels: a package best suited to running regressions with traditional R formula syntax
- scikit-learn: the main machine learning package we'll use throughout the course. It offers a multitude of machine learning algorithms and helpful pipeline tools. sklearn has a tremendous amount of functionality; to get the most out of this course, explore the depth of the documentation on your own, and notice how much more of it you understand as the course progresses.
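The two packages fit the same model through different interfaces; a minimal sketch on synthetic data (the column names are hypothetical):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.linear_model import LinearRegression

# Synthetic data: y is roughly 2x + 0.5.
rng = np.random.default_rng(1)
df = pd.DataFrame({"x": rng.normal(size=100)})
df["y"] = 2.0 * df["x"] + 0.5 + rng.normal(scale=0.1, size=100)

# statsmodels: R-style formula, rich statistical summary
# (coefficients, R^2, p-values via ols.summary()).
ols = smf.ols("y ~ x", data=df).fit()
print(ols.params)

# scikit-learn: estimator API (fit/predict), designed to plug
# into pipelines and model-selection tools.
lr = LinearRegression().fit(df[["x"]], df["y"])
print(lr.coef_, lr.intercept_)
```

Both recover nearly the same slope and intercept; statsmodels is the better choice when you want inference statistics, scikit-learn when you want to compose the model with preprocessing and cross-validation.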
Gain familiarity with the following:
- R formulas: a convenient way to encapsulate functional relationships for regressions
- seaborn: we'll use seaborn for visualization as we go along
- Variable preprocessing and polynomial regression with scikit-learn: we'll be "standardizing" or "normalizing" many of our variables to give the model better-behaved inputs. We'll also show how "linear" models can be extended to fit essentially any type of function by using functions of the original fields as inputs to the linear model.
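The preprocessing-plus-polynomial idea above can be sketched with a scikit-learn pipeline; this is an illustrative example on synthetic quadratic data, not the lab's actual dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data with a quadratic relationship.
rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X.ravel() ** 2 - X.ravel() + rng.normal(scale=0.2, size=200)

# PolynomialFeatures turns the "linear" model into a quadratic fit;
# StandardScaler keeps the features on comparable scales first.
model = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=2),
    LinearRegression(),
)
model.fit(X, y)
print(model.score(X, y))  # training R^2
```

The model is still linear in its coefficients; only the inputs were transformed, which is why ordinary least squares can fit the curve.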
```shell
conda install pandas numpy statsmodels seaborn scikit-learn
```