- Familiarity with the Python programming language
- Basic understanding of machine learning concepts
1. Data Cleaning
- Setting up local and remote repositories using GitHub
- Cleaning data with NumPy and pandas, following best practices
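
As a rough illustration of the kind of cleaning covered here, the following is a minimal sketch using pandas method chaining. The column names (`city`, `price`, `date`) and the values are hypothetical and chosen only for the example; they are not taken from the course dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data; column names and values are illustrative only.
raw = pd.DataFrame({
    "city": [" new delhi", "New Delhi ", "mumbai", np.nan],
    "price": ["4500", "4500", "n/a", "6200"],
    "date": ["2019-03-01", "2019-03-01", "2019-03-05", "2019-03-09"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of df without modifying the input."""
    return (
        df
        .assign(
            # Normalize whitespace and capitalization of text values.
            city=lambda d: d["city"].str.strip().str.title(),
            # Convert to numbers; unparseable entries such as "n/a" become NaN.
            price=lambda d: pd.to_numeric(d["price"], errors="coerce"),
            # Parse ISO-formatted date strings into datetime values.
            date=lambda d: pd.to_datetime(d["date"]),
        )
        # Rows that are identical after normalization are duplicates.
        .drop_duplicates()
        # Drop rows where the key text column is missing.
        .dropna(subset=["city"])
        .reset_index(drop=True)
    )


cleaned = clean(raw)
print(cleaned)
```

Duplicates are dropped only after the text is normalized, so rows that differ purely in whitespace or capitalization are treated as the same record.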
2. Exploratory Data Analysis
- Understanding the workflow of systematically analyzing datasets
- Understanding the various plots, statistical measures and hypothesis tests used to analyze datasets
- Exploring a custom EDA module that adds convenience and significantly reduces the complexity of analyzing datasets
- Performing in-depth analysis of various kinds of numeric, categorical and date-time variables
- Leveraging statistical measures, hypothesis tests, and univariate, bivariate and multivariate plots
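
To make the combination of statistical measures and hypothesis tests concrete, here is a small sketch that summarizes a numeric variable per category and then applies Welch's t-test from SciPy. The data is synthetic and the column names (`group`, `value`) are assumptions for illustration; this does not reproduce the custom EDA module mentioned above.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic data: two groups whose true means differ (10 vs. 13).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": np.repeat(["a", "b"], 50),
    "value": np.concatenate([rng.normal(10, 2, 50), rng.normal(13, 2, 50)]),
})

# Statistical measures of the numeric variable, split by category.
summary = df.groupby("group")["value"].agg(["mean", "median", "std"])
print(summary)

# Hypothesis test: do the two groups share the same mean?
# equal_var=False selects Welch's t-test, which does not assume equal variances.
a = df.loc[df["group"] == "a", "value"]
b = df.loc[df["group"] == "b", "value"]
t_stat, p_value = stats.ttest_ind(a, b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
```

A small p-value suggests the numeric variable differs between the categories; a bivariate plot such as a box plot per group would typically accompany this check.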
3. Feature Engineering and Data Preprocessing
- Understanding feature engineering techniques for different types of variables
- Creating scikit-learn compatible custom classes and functions
- Using advanced scikit-learn features for feature engineering and data preprocessing, such as:
  - Pipeline
  - FeatureUnion
  - FunctionTransformer
  - ColumnTransformer
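
The scikit-learn pieces listed above can be combined roughly as follows. This is a sketch under stated assumptions: `RareCategoryGrouper` is a hypothetical custom transformer written for this example, and the columns `city` and `price` are made up; none of it is taken from the course code.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler


class RareCategoryGrouper(BaseEstimator, TransformerMixin):
    """Hypothetical custom transformer: replace infrequent categories with 'other'."""

    def __init__(self, min_frequency=0.3):
        self.min_frequency = min_frequency

    def fit(self, X, y=None):
        X = pd.DataFrame(X)
        self.frequent_ = {}
        for col in X.columns:
            freq = X[col].value_counts(normalize=True)
            # Remember which categories are common enough to keep.
            self.frequent_[col] = set(freq[freq >= self.min_frequency].index)
        return self

    def transform(self, X):
        X = pd.DataFrame(X).copy()
        for col in X.columns:
            X[col] = X[col].where(X[col].isin(self.frequent_[col]), "other")
        return X


# Illustrative data only.
df = pd.DataFrame({
    "city": ["delhi", "delhi", "mumbai", "mumbai", "pune", "delhi"],
    "price": [4500.0, 5200.0, np.nan, 6100.0, 3900.0, 4800.0],
})

# Numeric branch: impute, then build two feature views side by side.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("features", FeatureUnion([
        ("scaled", StandardScaler()),
        ("log", FunctionTransformer(np.log1p)),
    ])),
])

# Categorical branch: group rare categories, then one-hot encode.
categorical = Pipeline([
    ("group_rare", RareCategoryGrouper(min_frequency=0.3)),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# sparse_threshold=0 asks ColumnTransformer to always return a dense array.
preprocessor = ColumnTransformer(
    [("num", numeric, ["price"]), ("cat", categorical, ["city"])],
    sparse_threshold=0,
)

features = preprocessor.fit_transform(df)
print(features.shape)
```

Because the custom class inherits from `BaseEstimator` and `TransformerMixin` and only stores its constructor argument in `__init__`, it can be cloned and nested inside `Pipeline` and `ColumnTransformer` like any built-in transformer.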
4. Model Training and Deployment
- Training and tuning a machine learning model on SageMaker
- Using S3 buckets for storage and EC2 for computing purposes
- Creating a web application from scratch and deploying it to the cloud using Streamlit