Most of the materials in this class are derived from notebooks and activities developed by Ben Farr for his scientific computing class at the University of Oregon and by Stephen Taylor for his Astrostatistics class at Vanderbilt University.
This is a collection of notebooks and data, which will be added to throughout the term.
- Explore data on birth rates in the US: Birth Data Exploration.ipynb
- Start our introduction to sampling: Intro to Sampling.ipynb
- Wrap up our introduction to sampling: Intro to Sampling.ipynb
- Introduce Gaia data and explore the solar neighborhood: Solar Neighborhood w Gaia.ipynb
- Wrap up Intro to Sampling.ipynb
- Recall the Boltzmann distribution and use the Ising model to understand the behavior of a ferromagnet at finite temperature: Boltzmann and Ising (and some Metropolis).ipynb
- Make a quick-and-dirty observational HR diagram from our Gaia data set: Solar Neighborhood w Gaia.ipynb
- Introduce JAX and NumPyro for probabilistic model building and inference: Intro to NumPyro.ipynb
- Employ a mixture model in NumPyro to account for outliers in linear regression: Modeling Outliers w NumPyro.ipynb
- Build a sequentially more complex model with NumPyro to make inferences from CO2 concentrations in Mauna Loa, Hawaii: CO2 w NumPyro.ipynb
- Introduce concepts and vocabulary for machine learning: Intro to Machine Learning (w Gaia).ipynb
- Introduce logistic regression: Logistic Regression.ipynb
- Build and train a logistic regression classifier to identify quasars in SDSS observations: Logistic Regression w SDSS.ipynb
- Extend our use of logistic regression to handle more than two classes: Multiclass Classification.ipynb
- Wrap up our multi-class classification work from Week 6
- Introduce neural networks: Intro to Neural Networks.ipynb
- Introduce Flax and use a dense neural network layer to perform linear regression: Intro to Flax.ipynb
- Use Flax to construct a dense neural network for classifying handwritten digits: Dense Neural Network on MNIST Digits.ipynb
- Dense neural network classifier on M4: Dense Neural Network Classifier for M4.ipynb
- Intro to CNNs: Intro to CNNs.ipynb
- Volcanoes on Venus: VenusVolcanoes.ipynb
- Intro to Signal Processing: Signal Processing.ipynb
- Filters, Welch Method and Pulsars: Filters, Welch method, and Pulsars.ipynb
- Notch filters, Gaussian noise, and LIGO data: Notch filters, Gaussian noise, and LIGO data.ipynb
- Inferring black hole properties from gravitational wave observations: Calculating a Posterior Probability Density Function.ipynb
US Birth data from the Social Security Administration, prepared by FiveThirtyEight.
This data can be downloaded with a wget command:
mkdir -p ../data
wget -qO ../data/US_births_2000-2014_SSA.csv https://raw.githubusercontent.com/fivethirtyeight/data/master/births/US_births_2000-2014_SSA.csv
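Once the file is downloaded, a first pass over it might look like the sketch below, which averages births by day of the week using only the Python standard library. The column names `day_of_week` and `births` are assumptions about the CSV header, and `mean_births_by_weekday` is a hypothetical helper, not something defined in the course notebooks.

```python
import csv
from collections import defaultdict


def mean_births_by_weekday(csv_file):
    """Return {day_of_week: mean births} from an open CSV file object.

    Assumes the header contains 'day_of_week' and 'births' columns
    (an assumption about the file layout, not verified here).
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        day = int(row["day_of_week"])
        totals[day] += int(row["births"])
        counts[day] += 1
    # Average per weekday, with keys in ascending order
    return {day: totals[day] / counts[day] for day in sorted(totals)}
```

To try it on the downloaded file, open it with `open("../data/US_births_2000-2014_SSA.csv", newline="")` and pass the file object to `mean_births_by_weekday`.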
We will use the Gaia DR3 data release to explore the solar neighborhood. The data is available from the Gaia Archive. We will use the following query to get the data:
SELECT TOP 300000 phot_g_mean_mag+5*log10(parallax)-10 AS mg, bp_rp, parallax FROM gaiadr3.gaia_source
WHERE parallax_over_error > 10
AND parallax > 10
AND phot_g_mean_flux_over_error>50
AND phot_rp_mean_flux_over_error>20
AND phot_bp_mean_flux_over_error>20
AND phot_bp_rp_excess_factor < 1.3+0.06*power(phot_bp_mean_mag-phot_rp_mean_mag,2)
AND phot_bp_rp_excess_factor > 1.0+0.015*power(phot_bp_mean_mag-phot_rp_mean_mag,2)
AND visibility_periods_used>8
AND astrometric_chi2_al/(astrometric_n_good_obs_al-5)<1.44*greatest(1,exp(-0.4*(phot_g_mean_mag-19.5)))
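The `mg` column in the query is the absolute G magnitude, computed from the apparent magnitude and the parallax in milliarcseconds. A minimal sketch of the same arithmetic in Python is shown here. The function names are hypothetical, and the calculation assumes the distance is simply the inverse of the parallax, with no correction for extinction.

```python
import math


def distance_pc(parallax_mas):
    """Distance in parsecs from a parallax in milliarcseconds."""
    if parallax_mas <= 0:
        raise ValueError("parallax must be positive")
    return 1000.0 / parallax_mas


def absolute_g_mag(phot_g_mean_mag, parallax_mas):
    """Absolute magnitude, matching the query's
    phot_g_mean_mag + 5*log10(parallax) - 10 expression.

    With d = 1000 / parallax_mas parsecs, the distance modulus
    5*log10(d) - 5 equals 10 - 5*log10(parallax_mas).
    """
    if parallax_mas <= 0:
        raise ValueError("parallax must be positive")
    return phot_g_mean_mag + 5.0 * math.log10(parallax_mas) - 10.0
```

A star at exactly 10 pc (parallax of 100 mas) has equal apparent and absolute magnitudes, which is a quick sanity check on the formula.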
This data accompanies Hogg, Bovy, and Lang (2010). It can be downloaded directly with
!wget -O ../data/data_yerr.dat https://raw.githubusercontent.com/davidwhogg/DataAnalysisRecipes/master/straightline/src/data_yerr.dat
Monthly-averaged CO2 concentrations measured at Mauna Loa, Hawaii, hosted by NOAA:
!wget -q ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt -O ../data/co2_mm_mlo.txt
To introduce logistic regression we make use of some data used by Jordi Warmenhoven in their Coursera Machine Learning course.
!wget https://raw.githubusercontent.com/JWarmenhoven/Coursera-Machine-Learning/master/notebooks/data/ex2data1.txt -O ../data/ex2data1.txt
!wget https://raw.githubusercontent.com/JWarmenhoven/Coursera-Machine-Learning/master/notebooks/data/ex2data2.txt -O ../data/ex2data2.txt
This is data collected by the Sloan Digital Sky Survey (SDSS) relating to quasars. The catalogs we'll be using are part of PSU's astrostatistics data sets. We need three files, one for each spectroscopically confirmed class.
Spectroscopically confirmed stars:
!wget -q --no-check-certificate -O ../data/SDSS_stars.csv https://astrostatistics.psu.edu/MSMA/datasets/SDSS_stars.csv
white dwarfs:
!wget -q --no-check-certificate -O ../data/SDSS_wd.csv https://astrostatistics.psu.edu/MSMA/datasets/SDSS_wd.csv
and quasars:
!wget -q --no-check-certificate -O ../data/SDSS_quasar.dat https://astrostatistics.psu.edu/datasets/SDSS_quasar.dat
More info on the dataset can be found here.
We make use of two separate data products from the Gaia collaboration. First is a cluster catalog here, which is associated with this paper looking at the kinematics of many globular clusters. The full data release associated with the paper can be found here, and includes tables of members identified for each cluster they studied. The member table for M4 (NGC 6121) can be downloaded directly with:
wget http://cdsarc.u-strasbg.fr/ftp/J/A+A/616/A12/files/NGC6121-1.dat -O ../data/NGC6121-1.dat
Second, we use m4_gaia_source.csv, which was pulled from the Gaia data archive with the following query:
SELECT TOP 1000000 gaia_source.designation,gaia_source.source_id,gaia_source.ra,gaia_source.dec,gaia_source.parallax,gaia_source.parallax_error,gaia_source.parallax_over_error,gaia_source.pm,gaia_source.pmra,gaia_source.pmra_error,gaia_source.pmdec,gaia_source.pmdec_error,gaia_source.astrometric_n_good_obs_al,gaia_source.astrometric_chi2_al,gaia_source.visibility_periods_used,gaia_source.phot_g_mean_flux_over_error,gaia_source.phot_g_mean_mag,gaia_source.phot_bp_mean_flux_over_error,gaia_source.phot_bp_mean_mag,gaia_source.phot_rp_mean_flux_over_error,gaia_source.phot_rp_mean_mag,gaia_source.phot_bp_rp_excess_factor,gaia_source.bp_rp,gaia_source.radial_velocity,gaia_source.radial_velocity_error
FROM gaiadr3.gaia_source
WHERE
CONTAINS(
POINT('ICRS',gaiadr3.gaia_source.ra,gaiadr3.gaia_source.dec),
BOX('ICRS',246,-26.5,3,3)
)=1