Domino experiment management leverages MLflow Tracking to log experiment parameters, metrics, and artifacts, while providing a Domino-native user experience to help you analyze your results. MLflow runs as a service in your Domino cluster, is fully integrated with your workspaces and jobs, and honors role-based access control. Existing MLflow experiments work out of the box with no code changes required.
This repository holds the code and instructions for demonstrating this capability, which is available in private preview in Domino 5.4 (public preview to be released in Domino 5.5).
MLflow is included in the Domino Standard Environments, so experiment tracking works out of the box. However, this example demonstrates the capability with PyTorch Lightning, so your environment must also have the PyTorch libraries installed. If you do not already have an environment with PyTorch, follow the instructions at the end of this document to create one.
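One quick way to check whether an existing environment already has what this example needs is to probe for the packages from a workspace terminal or notebook. The sketch below uses only the Python standard library; the helper name `missing_packages` and the list of module names are assumptions made for this illustration and are not part of the repository.

```python
from importlib.util import find_spec

# Import names assumed for this example: MLflow plus the PyTorch stack.
REQUIRED_MODULES = ["mlflow", "torch", "torchvision", "pytorch_lightning"]


def missing_packages(module_names):
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If anything is reported as missing, the environment likely needs to be rebuilt with the PyTorch libraries before running the example.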
Start by logging an experiment run from a workspace.
- Create a workspace using your PyTorch Lightning environment.
- Clone this repo into the directory and run the train.py script:
    git clone https://github.com/ddl-jwu/experiment-management
    python experiment-management/train.py
- Navigate to the Experiments page and select the mnist experiment.
- Click into the most recent run to monitor and evaluate the results that were just logged.
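The contents of train.py are not reproduced here. As a rough sketch of the kind of MLflow Tracking calls that produce a run like the one described above, the example below logs a set of parameters and per-epoch metrics to an experiment named `mnist`. The helper `log_training_run` and its arguments are hypothetical; `mlflow.set_experiment`, `mlflow.start_run`, `mlflow.log_params`, and `mlflow.log_metric` are standard MLflow calls. The `tracker` argument exists only so the sketch can be exercised without an MLflow server.

```python
try:
    import mlflow  # included in the Domino Standard Environments
except ImportError:  # lets the sketch be read and tested outside Domino
    mlflow = None


def log_training_run(params, epoch_metrics, experiment="mnist", tracker=mlflow):
    """Log hyperparameters once and each metric once per epoch.

    params:        dict of hyperparameter name -> value
    epoch_metrics: list of dicts, one per epoch, metric name -> float
    tracker:       the mlflow module, or any object exposing the same calls
    """
    if tracker is None:
        raise RuntimeError("mlflow is not installed in this environment")
    tracker.set_experiment(experiment)
    with tracker.start_run():
        tracker.log_params(params)
        for epoch, metrics in enumerate(epoch_metrics):
            for name, value in metrics.items():
                # step lets the UI plot the metric as a curve over epochs
                tracker.log_metric(name, value, step=epoch)
    return len(epoch_metrics)
```

Inside a Domino workspace or job, calling `log_training_run` with your own parameter dictionary and metric history should create a run under the `mnist` experiment, assuming the default tracking configuration provided by Domino.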
As your experiment matures, we recommend logging your experiments through jobs (rather than workspaces) to guarantee reproducibility:
- Make sure your changes in the workspace are committed, then start a new job from the Domino UI.
- In the File Name or Command section, run the training script that was cloned into the workspace earlier:
    python experiment-management/train.py
- Make sure your PyTorch Lightning environment is selected.
- Click Start to begin the job.
- Follow the same steps as above to monitor and evaluate the results.
To create an environment with PyTorch Lightning:
- Navigate to the Environments page in the Domino UI.
- Click Create Environment.
- Name your new environment.
- In the Base Environment / Image section, select Start from a custom base image.
- In the FROM line, enter quay.io/domino/compute-environment-images:latest.
- Set the environment’s visibility.
- Click Customize Before Building.
- In the Dockerfile Instructions section, add:
    RUN pip install torchvision==0.14.1 torch==1.13.1 pytorch-lightning==1.9.0 protobuf==4.21.12 --user
- In the Pluggable Workspace Tools section, add:
    jupyter:
      title: "Jupyter (Python, R, Julia)"
      iconUrl: "/assets/images/workspace-logos/Jupyter.svg"
      start: [ "/opt/domino/workspaces/jupyter/start" ]
      supportedFileExtensions: [ ".ipynb" ]
      httpProxy:
        port: 8888
        rewrite: false
        internalPath: "/{{ownerUsername}}/{{projectName}}/{{sessionPathComponent}}/{{runId}}/{{#if pathToOpen}}tree/{{pathToOpen}}{{/if}}"
        requireSubdomain: false
    jupyterlab:
      title: "JupyterLab"
      iconUrl: "/assets/images/workspace-logos/jupyterlab.svg"
      start: [ "/opt/domino/workspaces/jupyterlab/start" ]
      httpProxy:
        internalPath: "/{{ownerUsername}}/{{projectName}}/{{sessionPathComponent}}/{{runId}}/{{#if pathToOpen}}tree/{{pathToOpen}}{{/if}}"
        port: 8888
        rewrite: false
        requireSubdomain: false
    vscode:
      title: "vscode"
      iconUrl: "/assets/images/workspace-logos/vscode.svg"
      start: [ "/opt/domino/workspaces/vscode/start" ]
      httpProxy:
        port: 8888
        requireSubdomain: false
    rstudio:
      title: "RStudio"
      iconUrl: "/assets/images/workspace-logos/Rstudio.svg"
      start: [ "/opt/domino/workspaces/rstudio/start" ]
      httpProxy:
        port: 8888
        requireSubdomain: false
- Click Build to create the environment.
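Once the environment has been built, it can be useful to confirm from a workspace that the versions pinned in the Dockerfile instruction were actually installed. The sketch below is one possible check using the standard library's `importlib.metadata`; the helper names `installed_version` and `pin_mismatches` are hypothetical, and stripping a local version suffix (for example `+cu117`) is an assumption about how some PyTorch builds report their version.

```python
from importlib import metadata

# Versions pinned in the Dockerfile instruction for this environment.
PINNED = {
    "torchvision": "0.14.1",
    "torch": "1.13.1",
    "pytorch-lightning": "1.9.0",
    "protobuf": "4.21.12",
}


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def pin_mismatches(pinned, get_version=installed_version):
    """Map each distribution whose installed version differs from its pin
    to an (expected, found) tuple; found is None when it is not installed."""
    mismatches = {}
    for name, expected in pinned.items():
        found = get_version(name)
        # Ignore a local build suffix such as "+cu117" when comparing.
        base = found.split("+", 1)[0] if found else None
        if base != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = pin_mismatches(PINNED)
    for name, (expected, found) in problems.items():
        print(f"{name}: expected {expected}, found {found}")
    if not problems:
        print("All pinned versions are installed.")
```

An empty result suggests the build picked up the pinned versions; any entries point to packages worth checking in the environment's build logs.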