- Analyzed 1.5 million accident cases in Python and predicted accident severity from time, location, weather and traffic conditions.
- Visualized the number and severity of accident cases dynamically on interactive maps with the plotly and folium packages.
- Applied unsupervised machine learning techniques, including Principal Component Analysis (PCA) and k-means clustering, to reduce dimensionality and visualize the accident data.
- Implemented and evaluated machine learning models including Decision Tree, Random Forest, Naïve Bayes and Logistic Regression, and reached a testing accuracy of 89% by stacking several models.
Key Words: Python, Pandas, Data wrangling, EDA, Data visualization, Machine Learning, Classification
- Designed a 4-page interactive website with Node.js, React and HTML to help viewers discover new NBA players, teams and matches.
- Constructed a relational database from several large datasets and hosted it on the Relational Database Service (RDS) of Amazon Web Services (AWS).
- Programmed complex SQL queries that let users apply advanced filters to search for in-depth information.
- Built a team-building feature that lets users assemble their own five-player NBA team, with the total payroll and players’ photos displayed on top.
Key Words: SQL, JavaScript, Web Application, Relational Database, AWS, Data Wrangling, Query Optimization
- Implemented the CoAtNet Convolutional Neural Network (CNN) architecture to classify thousands of chest X-ray images of COVID-19, pneumonia and normal cases.
- Built the model with TensorFlow and scikit-learn in Google Colab and AWS SageMaker, and measured its performance with learning curves, a confusion matrix and testing error.
- Improved testing accuracy to 89% and reduced overfitting compared to the baseline ResNet model.
Key Words: TensorFlow, Machine Learning, Deep Learning, Convolutional Neural Network, Image Classification, CoAtNet, AWS SageMaker
- Created a dataset suitable for model-fitting to investigate the relationship between early life factors and adolescent wellbeing.
- Formulated a standard label from published studies to assess adolescent social and psychological wellbeing.
- Implemented and tested a variety of machine learning models in R, such as Linear Regression and Logistic Regression with LASSO regularization, Random Forest and Bagging.
- Visualized a decision tree from the Random Forest model to display interpretable relationships between early predictors of child wellbeing.
Key Words: R, dplyr, Data wrangling, EDA, Machine Learning, Random Forest