Final Project: Building Machine Learning Models for Analyzing NHANES Mortality Data - Eryn Yuasa

Project for Johns Hopkins Bloomberg School of Public Health 140.644.01 - Statistical Machine Learning

Assignment

Full assignment instructions are found in the Final_Project_2022.html file.
"Problem: For participants 50 years and older, build a model to predict the mortality status, mortstat. You should explain all the steps. Please use both sensitivity and specificity to choose the best model. Explain which of the two performance measures makes more sense."

File Structure

This project was completed in R / R Markdown. Files include:

  • ML_final_Eryn_Yuasa.pdf: PDF of final project with code, comments, and major findings.
  • ML_final.Rmd: R Markdown file used to build the PDF version.
  • nhanes2003-2004.Rda: R data file containing the required dataset.

Overview of Analysis

I divided my final project into three sections: 1) Exploratory Data Analysis and Data Cleaning, 2) Modeling, and 3) Model Selection and Analysis. Using the NHANES 2003-2004 dataset, restricted according to the specifications in the project assignment, I considered 10 different models for predicting mortality status and recorded notes, sensitivity, and specificity for each. Of the models considered, the logistic regression model with 10 variables chosen by best subset selection was the one I preferred to move forward with.
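
For readers unfamiliar with the two performance measures used to compare the models, here is a minimal base R sketch of how sensitivity and specificity can be computed from observed and predicted labels. It is an illustration only, not the code from ML_final.Rmd: the helper name `sens_spec`, the 1 = deceased / 0 = alive coding, and the toy vectors are all assumptions made for this example, and no NHANES data is used.

```r
# Hypothetical helper (not taken from the project code):
# computes sensitivity and specificity from two 0/1 vectors.
# Assumed coding: 1 = deceased (positive class), 0 = alive.
sens_spec <- function(actual, predicted) {
  tp <- sum(actual == 1 & predicted == 1)  # true positives
  fn <- sum(actual == 1 & predicted == 0)  # false negatives
  tn <- sum(actual == 0 & predicted == 0)  # true negatives
  fp <- sum(actual == 0 & predicted == 1)  # false positives
  c(sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp))
}

# Toy labels for illustration only, not NHANES data
actual    <- c(1, 1, 1, 0, 0, 0, 0, 1)
predicted <- c(1, 0, 1, 0, 0, 1, 0, 1)

result <- sens_spec(actual, predicted)
print(result)
```

Sensitivity is the share of actual deaths the model flags, and specificity is the share of survivors it correctly leaves unflagged. Applying the same helper to each candidate model's predictions on a held-out set would give one comparable pair of numbers per model.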