sundy1994/Yuxuan-Zhang-Portfolio

Yuxuan Zhang Portfolio

Overview

  • Analyzed 1.5 million accident cases using Python and predicted the severity of accidents based on time, location, weather and traffic conditions.
  • Visualized the number and severity of accident cases dynamically on interactive maps with the plotly and folium packages.
  • Applied unsupervised machine learning methods such as Principal Component Analysis (PCA) and k-means clustering to reduce dimensionality and visualize the accident data.
  • Implemented and evaluated machine learning models including Decision Tree, Random Forest, Naïve Bayes and Logistic Regression, reaching a testing accuracy of 89% by stacking several models.
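
To make the clustering step concrete, here is a minimal from-scratch sketch of k-means (Lloyd's algorithm) using only the Python standard library. It is an illustration under stated assumptions, not the project's code: the project most likely relied on a library implementation, and the `kmeans` function, its first-k initialisation and the `toy_points` data are all hypothetical.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm on a list of equal-length tuples.

    Initial centroids are simply the first k points, which keeps this
    sketch deterministic (real libraries use smarter initialisation).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data: two well-separated groups of (x, y) points.
toy_points = [
    (0.0, 0.0), (10.0, 10.0), (0.5, 0.2),
    (9.5, 10.3), (0.1, 0.6), (10.2, 9.8),
]
labels, centroids = kmeans(toy_points, k=2)
print(labels)
```

On real accident data the points would be the PCA-reduced feature vectors rather than hand-written coordinates, and the number of clusters would need to be chosen by inspection or a criterion such as the elbow method.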

Key Words: Python, Pandas, Data wrangling, EDA, Data visualization, Machine Learning, Classification

States with most accident cases

An interactive density map displaying severity of accidents in Philadelphia


Overview

  • Designed a 4-page interactive website with Node.js, React and HTML to help viewers discover new players, teams and matches in the NBA.
  • Constructed a relational database from several large datasets and hosted it on the Relational Database Service (RDS) of Amazon Web Services (AWS).
  • Programmed complex SQL queries so that users can apply advanced filters to search for in-depth information.
  • Built a team-building feature that lets users assemble their own five-player NBA team, with the total payroll and the players' photos displayed above it.
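
The filtering and team-building ideas above can be sketched as follows. This is only an illustration under assumptions: it uses Python's built-in `sqlite3` with an in-memory database as a stand-in for the RDS-hosted database, and the `players` table, its columns, the placeholder rows and the helper names `search_players` and `team_payroll` are all hypothetical rather than taken from the project.

```python
import sqlite3

# In-memory stand-in for the project's database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE players ("
    "name TEXT, team TEXT, position TEXT, "
    "salary INTEGER, points_per_game REAL)"
)
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?, ?, ?)",
    [
        # Placeholder rows; salary is in millions of dollars.
        ("Player A", "Team X", "G", 30, 25.1),
        ("Player B", "Team X", "F", 20, 18.4),
        ("Player C", "Team Y", "G", 12, 14.0),
        ("Player D", "Team Y", "C", 25, 21.7),
        ("Player E", "Team Z", "F", 8, 9.3),
        ("Player F", "Team Z", "G", 15, 16.2),
    ],
)


def search_players(conn, position=None, min_ppg=None, max_salary=None):
    """Build a parameterised query from whichever filters are supplied."""
    clauses, params = [], []
    if position is not None:
        clauses.append("position = ?")
        params.append(position)
    if min_ppg is not None:
        clauses.append("points_per_game >= ?")
        params.append(min_ppg)
    if max_salary is not None:
        clauses.append("salary <= ?")
        params.append(max_salary)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = (
        "SELECT name, team, salary FROM players"
        + where
        + " ORDER BY points_per_game DESC"
    )
    return conn.execute(sql, params).fetchall()


def team_payroll(conn, names):
    """Return the total salary of a user-selected team of exactly five."""
    if len(names) != 5:
        raise ValueError("a team must contain exactly 5 players")
    placeholders = ",".join("?" * len(names))
    row = conn.execute(
        f"SELECT SUM(salary) FROM players WHERE name IN ({placeholders})",
        list(names),
    ).fetchone()
    return row[0]


guards = search_players(conn, position="G", min_ppg=15.0)
payroll = team_payroll(
    conn, ["Player A", "Team placeholder"][:1]
    + ["Player B", "Player C", "Player D", "Player E"]
)
print(guards, payroll)
```

Binding the filter values as `?` parameters instead of concatenating them into the SQL string keeps user input out of the query text, which matters for any search form exposed on a website.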

Key Words: SQL, JavaScript, Web Application, Relational Database, AWS, Data Wrangling, Query Optimization

Team-building

Team Page


Overview

  • Implemented the CoAtNet Convolutional Neural Network (CNN) architecture to classify thousands of chest X-ray images from COVID-19, normal and pneumonia patients.
  • Built the model with TensorFlow and scikit-learn in Google Colab and AWS SageMaker, and measured its performance with learning curves, a confusion matrix and testing error.
  • Improved testing accuracy to 89% and reduced overfitting compared with the baseline ResNet model.
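
As a rough illustration of the evaluation metrics mentioned above, the sketch below computes a three-class confusion matrix, testing accuracy and testing error in plain Python. The class names, the helper functions and the `y_true` / `y_pred` labels are hypothetical toy values, not results from the project, which reports using TensorFlow and scikit-learn for this step.

```python
from collections import Counter

# Hypothetical class names for the three image categories.
CLASSES = ["covid", "normal", "pneumonia"]


def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in classes] for t in classes]


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Toy labels made up for illustration only.
y_true = ["covid", "covid", "normal", "normal",
          "pneumonia", "pneumonia", "pneumonia", "normal"]
y_pred = ["covid", "pneumonia", "normal", "normal",
          "pneumonia", "covid", "pneumonia", "normal"]

matrix = confusion_matrix(y_true, y_pred, CLASSES)
test_accuracy = accuracy(matrix)
test_error = 1 - test_accuracy

for name, row in zip(CLASSES, matrix):
    print(f"{name:>10}: {row}")
print(f"accuracy={test_accuracy:.2f}, error={test_error:.2f}")
```

Reading the off-diagonal cells shows which classes get confused with each other, which a single accuracy figure cannot reveal.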

Key Words: TensorFlow, Machine Learning, Deep Learning, Convolutional Neural Network, Image Classification, CoAtNet, AWS SageMaker

Learning Curve


Overview

  • Created a dataset suitable for model fitting to investigate the relationship between early-life factors and adolescent wellbeing.
  • Formulated a standard label from published studies to assess adolescent social and psychological wellbeing.
  • Implemented and tested a variety of machine learning models in R, such as Linear Regression and Logistic Regression with LASSO regularization, Random Forest and Bagging.
  • Visualized a decision tree from the Random Forest model that displays interpretable relationships between early predictors of child wellbeing.

Key Words: R, dplyr, Data wrangling, EDA, Machine Learning, Random Forest

About

Important Data Science Projects