Project for Johns Hopkins Bloomberg School of Public Health 140.644.01 - Statistical Machine Learning
Full assignment instructions are in the Final_Project_2022.html file.
"Problem: For participants 50 years and older, build a model to predict the mortality status, mortstat. You should explain all the steps. Please use both sensitivity and specificity to choose the best model. Explain which of the two performance measures makes more sense."
This project was completed in R / R Markdown. Files include:
- ML_final_Eryn_Yuasa.pdf: PDF of final project with code, comments, and major findings.
- ML_final.Rmd: R Markdown file used to build the PDF version.
- nhanes2003-2004.Rda: R data file containing the dataset used in the project.
I broke my final project up into three sections: 1) Exploratory Data Analysis and Data Cleaning, 2) Modeling, and 3) Model Selection and Analysis. Using the NHANES 2003-2004 dataset, restricted as specified in the project assignment, I considered 10 different models to predict mortality status and recorded notes, sensitivity, and specificity for each. Of the models considered, I chose the logistic regression model with 10 variables selected by best subset selection as the preferred model.
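
For readers unfamiliar with the two performance measures, here is a minimal sketch of how sensitivity and specificity can be computed for a logistic regression in base R. It is not the code from ML_final.Rmd. The data are simulated, the single predictor `age`, the helper `sens_spec`, and the thresholds are hypothetical, and the coding of `mortstat` as 1 = deceased and 0 = alive is an assumption. The metrics are computed on the same data used for fitting, which is only suitable for illustration.

```r
# Simulated toy data; NOT the NHANES 2003-2004 data used in the project.
set.seed(1)
n <- 500
age <- runif(n, 50, 85)                   # participants aged 50 and older
p_death <- plogis(-8 + 0.1 * age)         # hypothetical age effect on mortality
toy <- data.frame(
  age = age,
  mortstat = rbinom(n, 1, p_death)        # assumed coding: 1 = deceased, 0 = alive
)

# Logistic regression with one predictor (the project's model used 10 variables).
fit <- glm(mortstat ~ age, data = toy, family = binomial)
prob <- predict(fit, type = "response")   # fitted probabilities of mortstat == 1

# Hypothetical helper: sensitivity and specificity at a probability threshold.
sens_spec <- function(actual, prob, threshold = 0.5) {
  pred <- as.integer(prob >= threshold)
  tp <- sum(pred == 1 & actual == 1)      # deaths correctly predicted
  fn <- sum(pred == 0 & actual == 1)      # deaths missed
  tn <- sum(pred == 0 & actual == 0)      # survivors correctly predicted
  fp <- sum(pred == 1 & actual == 0)      # survivors flagged as deaths
  c(sensitivity = tp / (tp + fn), specificity = tn / (tn + fp))
}

# Lowering the threshold trades specificity for sensitivity.
print(sens_spec(toy$mortstat, prob, threshold = 0.5))
print(sens_spec(toy$mortstat, prob, threshold = 0.2))
```

Sensitivity is the share of actual deaths the model identifies, and specificity is the share of survivors it correctly classifies, so comparing the two at different thresholds shows the trade-off involved in choosing between models.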