Review comments on figure captions #148

Open
wants to merge 1 commit into
base: main
1 change: 0 additions & 1 deletion paper/figures/code/example_objects.py
@@ -1,5 +1,4 @@
from deepbench.image import SkyImage, ShapeImage
from deepbench.physics_object import HamiltonianPendulum, Pendulum
import matplotlib.pyplot as plt
import numpy as np

16 changes: 9 additions & 7 deletions paper/figures/code/example_pendulums.py
@@ -26,7 +26,9 @@
 # Plot that against the time and with scatter and line options
 pendulum_noiseless = pendulum.create_object(time, noiseless=True)
 subplots[0].plot(time, pendulum_noiseless, color="black")
-subplots[0].scatter(time, pendulum_noiseless, color="black", label="Noiseless")
+subplots[0].scatter(
+    time, pendulum_noiseless, color="black", label="Noiseless", marker=">"
+)

 # Use the noiseless=False to do the same with a noiseless pendulum
 pendulum_noisy = pendulum.create_object(time, noiseless=False)
@@ -46,22 +48,22 @@
     },
 )

-# Cacluate the pendulum positions and engeries
+# Calculate the pendulum positions and energies
 pendulum_data = pendulum.create_object(time)

 # Plot the line and scatterplot versions of the position wrt time
 subplots[1].plot(pendulum_data[4], pendulum_data[0], color="black")
 subplots[1].scatter(
-    pendulum_data[4], pendulum_data[0], color="black", label="Noiseless"
+    pendulum_data[4], pendulum_data[0], color="black", label="Noiseless", marker=">"
 )

-# Repeat the process with the noisely pendulum
+# Repeat the process with the noisy pendulum
 pendulum = HamiltonianPendulum(
     pendulum_arm_length=10.0,
     starting_angle_radians=np.pi / 4,
     acceleration_due_to_gravity=9.8,
     noise_std_percent={
-        "pendulum_arm_length": 0.2,
+        "pendulum_arm_length": 0.1,
         "starting_angle_radians": 0.0,
         "acceleration_due_to_gravity": 0.0,
     },
@@ -81,9 +83,9 @@
 # plot.set(xticks=[], yticks=[])

 plot.set_xlabel("Time (s)")
-plot.set_ylabel("X Position")
+plot.set_ylabel("X Position (m)")

 # Assign legend location
 subplots[1].legend(loc="center left", bbox_to_anchor=(1.02, 1))

-plt.savefig("../pendulums.png")
+plt.savefig("./pendulums.png")
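For readers without the package installed, the noiseless and noisy series compared in this script can be approximated with the standard library alone. The sketch below is not DeepBench's implementation: it assumes the small-angle solution θ(t) = θ₀ cos(√(g/L) t), assumes that a `noise_std_percent` value of 0.1 means a Gaussian standard deviation of 0.1 × L, and assumes the arm length is redrawn for every time sample. The names `pendulum_x` and `noisy_pendulum_x` are hypothetical.

```python
import math
import random


def pendulum_x(time_s, arm_length_m, start_angle_rad, g=9.8):
    """Horizontal bob position (m) under the small-angle approximation."""
    theta = start_angle_rad * math.cos(math.sqrt(g / arm_length_m) * time_s)
    return arm_length_m * math.sin(theta)


def noisy_pendulum_x(times, arm_length_m, start_angle_rad, length_noise_frac, seed):
    """Redraw the arm length for every time sample (an assumed noise model)."""
    rng = random.Random(seed)  # fixed seed so the noise can be regenerated
    positions = []
    for t in times:
        noisy_length = rng.gauss(arm_length_m, length_noise_frac * arm_length_m)
        positions.append(pendulum_x(t, noisy_length, start_angle_rad))
    return positions


# Same initial conditions as the script: L = 10 m, theta_0 = pi/4, g = 9.8 m/s^2
times = [0.25 * i for i in range(40)]
noiseless = [pendulum_x(t, 10.0, math.pi / 4) for t in times]
noisy = noisy_pendulum_x(times, 10.0, math.pi / 4, length_noise_frac=0.1, seed=42)
```

Calling `noisy_pendulum_x` again with the same seed returns the same series, which mirrors the saved-seed reproducibility described in the paper text.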
Binary file modified paper/figures/pendulums.png
100644 → 100755
32 changes: 16 additions & 16 deletions paper/paper.md
@@ -10,11 +10,11 @@ authors:
   - name: M. Voetberg
     orcid: 0009-0005-2715-4709
     equal-contrib: true
-    affiliation: "1"
-  - name: Ashia Livaudais
+    affiliation: "1"
+  - name: Ashia Livaudais
     orcid: 0000-0003-3734-335X
     equal-contrib: true
-    affiliation: "1"
+    affiliation: "1"
   - name: Becky Nevin
     orcid: 0000-0003-1056-8401
     equal-contrib: false
@@ -49,7 +49,7 @@ There are multiple open problems and issues that are critical for the machine le

The physical sciences community lacks sufficient datasets and software as benchmarks for the development of statistical and machine learning models. In particular, there currently does not exist simulation software that generates data underpinned by physical principles and that satisfies the following criteria:

-* multi-domain
+* multi-domain
* multi-purpose
* fast
* reproducible
@@ -60,13 +60,13 @@ The physical sciences community lacks sufficient datasets and software as benchm

## Related Work

-There are many benchmarks -- datasets and simulation software -- widely used for model building in machine learning, statistics, and the physical sciences.
+There are many benchmarks -- datasets and simulation software -- widely used for model building in machine learning, statistics, and the physical sciences.
First, benchmark datasets of natural images include MNIST `[@dengMnistDatabaseHandwritten2012c]`, CIFAR `[@krizhevskyCIFAR10CanadianInstitute2017a]`, Imagenet `[@russakovskyImageNetLargeScale2015a]`. Second, there are several large astronomical observation datasets -- CfA Redshift Survey `[@huchraSurveyGalaxyRedshifts1983]`, Sloan Digital Sky Survey `[@yorkSloanDigitalSky2000]`, and Dark Energy Survey `[@abbottDARKENERGYSURVEY]`. Third, many n-body cosmology simulation data sets serve as benchmarks -- e.g., Millennium `[@springelCosmologicalSimulationCode2005]`, Illustris `[@vogelsbergerIntroducingIllustrisProject2014b]`, EAGLE `[@schayeEAGLEProjectSimulating2015]`, Coyote `[@heitmannCoyoteUniversePrecision2010]`, Bolshoi `[@klypinDARKMATTERHALOS2011]`, CAMELS `[@villaescusa-navarroCAMELSProjectCosmology2021]`, Quijote `[@villaescusa-navarroQuijoteSimulations2020]`. Fourth, there have been multiple astronomy data set challenges that can be considered benchmarks for analysis comparison -- e.g., PLAsTiCC `[@hlozekResultsPhotometricLSST2020a]`, The Great08 Challenge `[@bridleHandbookGREAT08Challenge2009a]`, and the Strong Gravitational Lens Challenge `[@metcalfStrongGravitationalLens2019c]`. Fifth, there are multiple software that generate simulated data for astronomy and cosmology -- e.g., astropy `[@theastropycollaborationAstropyCommunityPython2013a]`, galsim `[@roweGalSimModularGalaxy2015]`, lenstronomy `[@birrerLenstronomyMultipurposeGravitational2018a]`, deeplenstronomy `[@morganDeeplenstronomyDatasetSimulation2021a]`, CAMB `[@CAMBInfo, @lewisEfficientComputationCMB2000]`, Pixell `[@WelcomePixellDocumentation]`, SOXs `[@SOXSSimulatedObservations]`. Finally, particle physics projects use standard codebases for simulations -- e.g., GEANT `[@piaGeant4ScientificLiterature2009]`, GENIE `[@andreopoulosGENIENeutrinoMonte2015]`, and PYTHIA `[@sjostrandPYTHIAEventGenerator2020]`. 
These simulations span wide ranges in speed, code complexity, and physical fidelity and detail. Unfortunately, these data and software lack a combination of critical features, including mechanistic models, speed, reproducibility, which are needed for more fundamental studies of statistical and machine learning models. The work in this paper is most closely related to SHAPES `[@wuVisualQuestionAnswering2016a]` because that work also uses collections of geometric objects with varying levels of complexity as a benchmark.




-# DeepBench Software
+# DeepBench Software

The **DeepBench** software simulates data for analysis tasks that require precise numerical calculations. First, the simulation models are fundamentally mechanistic -- based on relatively simple analytic mathematical expressions, which are physically meaningful. This means that for each model, the number of input parameters that determine a simulation output is small (<$10$ for most models). These elements make the software fast and the outputs interpretable -- conceptually and mathematically relatable to the inputs. Second, **DeepBench** also includes methods to precisely prescribe noise for inputs, which are propagated to outputs. This permits studies and the development of statistical inference models that require uncertainty quantification, which is a significant challenge in modern machine learning research. Third, the software framework includes features that permit a high degree of reproducibility: e.g., random seeds at every key stage of input, a unique identification tag for each simulation run, tracking and storage of metadata (including input parameters) and the related outputs. Fourth, the primary user interface is a YAML configuration file, which allows the user to specify every aspect of the simulation -- e.g., types of objects, numbers of objects, noise type, and number of classes. This feature -- which is especially useful when building and studying complex models like deep learning neural networks -- permits the user to incrementally decrease or increase the complexity of the simulation with a high level of granularity.

@@ -83,25 +83,25 @@ The **DeepBench** software simulates data for analysis tasks that require precis
* Readily extensible to new physics and outputs


-# Primary Modules
+# Primary Modules

-* Geometry objects: two-dimensional images generated with `matplotlib` `[@hunterMatplotlib2DGraphics2007b]`. The shapes include $N$-sided polygons, arcs, straight lines, and ellipses. They are solid, filled or unfilled two-dimensional shapes with edges of variable thickness.
-* Physics objects: one-dimensional profiles for two types of implementations of pendulum dynamics: one using Newtonian physics, the other using Hamiltonian.
-* Astronomy objects: two-dimensional images generated based on radial profiles of typical astronomical objects. The star object is created using the Moffat distribution provided by the AstroPy `[@theastropycollaborationAstropyCommunityPython2013a]` library. The spiral galaxy object is created with the function used to produce a logarithmic spiral `[@ringermacherNewFormulaDescribing2009a]`. The elliptical Galaxy object is created using the Sérsic profile provided by the AstroPy library. Two-dimensional models are representations of astronomical objects commonly found in data sets used for galaxy morphology classification.
-* Image: two-dimensional images that are combinations and/or concatenations of Geometry or Astronomy objects. The combined images are within `matplotlib` meshgrid objects. Sky images are composed of any combination of Astronomy objects, while geometric images comprise individual geometric shape objects.
-* Collection: Provides a framework for producing module images or objects at once and storing all parameters that were included in their generation, including exact noise levels, object hyper-parameters, and non-specified defaults.
+* Geometry objects: two-dimensional images generated with `matplotlib` `[@hunterMatplotlib2DGraphics2007b]`. The shapes include $N$-sided polygons, arcs, straight lines, and ellipses. They are solid, filled or unfilled two-dimensional shapes with edges of variable thickness.
+* Physics objects: one-dimensional profiles for two types of implementations of pendulum dynamics: one using Newtonian physics, the other using Hamiltonian.
+* Astronomy objects: two-dimensional images generated based on radial profiles of typical astronomical objects. The star object is created using the Moffat distribution provided by the AstroPy `[@theastropycollaborationAstropyCommunityPython2013a]` library. The spiral galaxy object is created with the function used to produce a logarithmic spiral `[@ringermacherNewFormulaDescribing2009a]`. The elliptical Galaxy object is created using the Sérsic profile provided by the AstroPy library. Two-dimensional models are representations of astronomical objects commonly found in data sets used for galaxy morphology classification.
+* Image: two-dimensional images that are combinations and/or concatenations of Geometry or Astronomy objects. The combined images are within `matplotlib` meshgrid objects. Sky images are composed of any combination of Astronomy objects, while geometric images comprise individual geometric shape objects.
+* Collection: Provides a framework for producing module images or objects at once and storing all parameters that were included in their generation, including exact noise levels, object hyper-parameters, and non-specified defaults.


All objects also come with the option to add noise to each object. For Physics objects -- i.e., the pendulum -- the user may add Gaussian noise to parameters: initial angle $\theta_0$, the pendulum length $L$, the gravitational acceleration $g$, the planet properties $\Phi = (M/r^2)$, and Newton's gravity constant $G$. Note that $g = G * \Phi = G * M/r^2$: all parameters in that relationship can receive noise. For Astronomy and Geometry Objects, which are images, the user can add Poisson or Gaussian noise to the output images. Finally, the user can regenerate the same noise using the saved random seed.


-# Example Outputs
+# Example Outputs

-![Example outputs of **DeepBench**, containing shapes, and astronomy objects. Variants include a single object, a noisy single object, two objects, and two noisy objects.](figures/example_objects.png)
+![Example outputs of **DeepBench**, containing geometric and astronomy objects. Variants include a single object, a noisy single object, two objects, and two noisy objects. The geometric outputs are produced with filled ellipses and outlined rectangles, with a gaussian noise overlay for the noisy variants. The astronomy outputs feature a star and an elliptical galaxy profile with similarly applied noise.](figures/example_objects.png)

-![Example physics simulations from **DeepBench**. Pendulums show noisy and non-noisy variants of the Newtonian (left) and Hamiltonian (right) mathematical simulations.](figures/pendulums.png)
+![Example physics simulations from **DeepBench**. Pendulums show noisy and noiseless variants of the Newtonian (left) and Hamiltonian (right) mathematical simulations. Both use initial conditions of an arm length of 10 meters and a starting angle of $\pi/4$. The noisy variants introduce uncertainty to these input parameters, along with the measurement of acceleration due to gravity.](figures/pendulums.png)

-# Acknowledgements
+# Acknowledgement

*M. Voetberg*: conceptualization, methodology, software, writing, project administration. *Ashia Livaudais*: conceptualization, methodology, software, writing, project administration. *Becky Nevin*: software, project administration. *Omari Paul*: software. *Brian Nord*: conceptualization, methodology, project administration, funding acquisition, supervision, writing.
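
The noise paragraph in the hunk above states that $g = G * \Phi = G * M/r^2$ and that every parameter in that relationship can receive Gaussian noise. A minimal worked sketch of that relationship follows, using rounded Earth values and only the standard library. The helper names `surface_gravity` and `sample_gravity` are hypothetical, and treating each noise level as a fraction of the parameter's value is an assumption, not the package's documented behaviour.

```python
import math
import random

G = 6.674e-11           # Newton's gravitational constant, m^3 kg^-1 s^-2
EARTH_MASS = 5.972e24   # kg (rounded)
EARTH_RADIUS = 6.371e6  # m (rounded)


def surface_gravity(big_g, phi):
    """g = G * Phi, where Phi = M / r^2 collects the planet properties."""
    return big_g * phi


def sample_gravity(big_g, phi, frac_std_g, frac_std_phi, n_samples, seed):
    """Draw g values with independent Gaussian noise on G and on Phi."""
    rng = random.Random(seed)  # seeded so the same noise can be regenerated
    return [
        surface_gravity(
            rng.gauss(big_g, frac_std_g * big_g),
            rng.gauss(phi, frac_std_phi * phi),
        )
        for _ in range(n_samples)
    ]


phi_earth = EARTH_MASS / EARTH_RADIUS**2
g_earth = surface_gravity(G, phi_earth)  # about 9.82 m/s^2

# 5% noise on Phi only: the relative spread of g should also be close to 5%
samples = sample_gravity(G, phi_earth, 0.0, 0.05, 20_000, seed=0)
mean_g = sum(samples) / len(samples)
std_g = math.sqrt(sum((s - mean_g) ** 2 for s in samples) / len(samples))
```

Because $g$ is a product, small independent fractional noise levels on $G$ and $\Phi$ combine approximately in quadrature, so noise on either input propagates directly into the pendulum's period through $\sqrt{g/L}$.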
