Thanks for considering contributing to MegaBlocks!
Issues tagged with the "good first issue" label are great options to start contributing.
If you have questions, join us on Slack; we'll be happy to help you!
We welcome contributions for bug fixes, new efficient methods you'd like to contribute to the community, or new models and datasets!
To set up the development environment on your local machine, run the commands below.
1. Install the dependencies needed for testing and linting the code:
pip install -e '.[all]'
2. Configure pre-commit, which automatically formats code before each commit:
pre-commit install
To submit a contribution:
1. Fork a copy of the MegaBlocks library to your own account.
2. Clone your fork locally and add the upstream MegaBlocks repository as a remote:
git clone git@github.com:<github_id>/megablocks.git
cd megablocks
git remote add upstream https://github.com/databricks/megablocks.git
3. Create a branch and make your proposed changes.
git checkout -b cool-new-feature
4. When you are ready, open a pull request against the upstream MegaBlocks repository!
We have some rough guidelines that will make your PR easier to review and more likely to get smoothly merged. Please don't let uncertainty or difficulty with any of these things stop you from opening a PR! We are happy to help you through them :)
- Self-contained title and description. Please include a concise title and clear PR description. The title should allow someone to understand what the PR changes or does at a glance. The description should allow someone to understand the contents of the PR without looking at the code.
- If the PR affects output that is displayed to a user of MegaBlocks (e.g. console logging or experiment tracker reporting), please include screenshots showing what the new output looks like. UX is important!
- Include tests. If you are fixing a bug, please add a test that would have caught it. If you are adding a new feature, please add unit tests for its individual components as well as a test that exercises the full functionality of the feature.
- Please consider whether your changes affect the example notebooks or large parts of the code base, and if so, run the daily tests locally (`pytest -m 'daily and not remote and not gpu and not vision and not doctest'`).
- `pre-commit` should help you handle formatting and type checking, but please make sure you have it installed as described above.
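The sketch below illustrates the "add a test that would have caught the bug" guideline. The function `expert_capacity` and its rounding rule are hypothetical and exist only for illustration; they are not part of the MegaBlocks API. The `test_` functions follow pytest's naming convention, so pytest would collect them, but they also run as plain Python.

```python
import math


def expert_capacity(num_tokens: int, num_experts: int, capacity_factor: float) -> int:
    """Hypothetical helper: maximum number of tokens each expert may receive.

    Illustrative only; this is NOT a MegaBlocks function.
    """
    if num_experts <= 0:
        raise ValueError("num_experts must be positive")
    # Round up so the combined capacity of all experts is never smaller
    # than num_tokens * capacity_factor. A buggy version using floor
    # division would silently drop tokens when the division is inexact.
    return math.ceil(num_tokens * capacity_factor / num_experts)


def test_capacity_rounds_up() -> None:
    # Regression test for the (hypothetical) floor-division bug:
    # 10 tokens over 4 experts needs 3 slots per expert, not 2.
    assert expert_capacity(10, 4, 1.0) == 3


def test_capacity_covers_all_tokens() -> None:
    # Broader check of the component's contract across many shapes.
    for num_tokens in range(1, 50):
        for num_experts in (1, 2, 3, 8):
            cap = expert_capacity(num_tokens, num_experts, 1.0)
            assert cap * num_experts >= num_tokens


def test_rejects_zero_experts() -> None:
    # Invalid input should raise instead of dividing by zero.
    try:
        expert_capacity(10, 0, 1.0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

The first test pins down the exact bug being fixed. The second checks the general property the function is supposed to guarantee, so related regressions are caught as well.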
MegaBlocks uses pytest-codeblocks to test all example code snippets. The pytest-codeblocks repository explains how to annotate code snippets; the annotations support most pytest configurations. For example, if a test requires model training, the GPU mark (`<!--pytest.mark.skip-->`) should be applied.
To test your changes locally, run:
make test  # run CPU tests
make test-gpu  # run GPU tests
cd docs && make doctest  # run doctests
Some of our checks test distributed training as well. To test these, run:
make test-dist WORLD_SIZE=2  # run 2-CPU distributed tests
make test-dist-gpu WORLD_SIZE=2  # run 2-GPU distributed tests
These tests run with the `composer` launcher. We also support `WORLD_SIZE=1`, which runs the tests with the `composer` launcher on a single device. See the Makefile for more information.
To run the pre-commit hooks manually (they check code formatting and type annotations), run `pre-commit run --all-files`.
To run the tests in the provided Docker containers:
1. `docker pull mosaicml/composer` (or an alternative image like `mosaicml/composer:latest_cpu`)
2. `docker run --rm -v "$(pwd)":/megablocks --user $(id -u):$(id -g) -it mosaicml/composer` (from the root of your MegaBlocks checkout, so the repository is mounted at `/megablocks`)
3. From inside the container:
   1. `cd /megablocks`
   2. `pip install -e .`
   3. `pytest <args>` or `make <args>` to run the desired tests
See the MegaBlocks Style Guide for guidelines on how to structure and format your code.
MegaBlocks aims to annotate all functions with type annotations (type hints were introduced in PEP 484; variable annotations in PEP 526). Don't worry if you are not a Python typing expert; open the pull request anyway, and we'll help you get the code into shape.
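For contributors new to Python typing, here is a minimal sketch of the annotation style described above. Every parameter and the return value carry a type hint (PEP 484), and the local variable uses the PEP 526 annotation syntax. The function `top_k_indices` is hypothetical and not part of MegaBlocks; it uses only the standard library so it runs anywhere.

```python
from __future__ import annotations

from typing import Sequence


def top_k_indices(scores: Sequence[float], k: int) -> list[int]:
    """Return the indices of the k largest scores, highest first.

    Hypothetical example of a fully annotated function.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # Sort indices by score, descending. sorted() is stable even with
    # reverse=True, so tied scores keep the lower index first.
    order: list[int] = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]
```

For example, `top_k_indices([0.1, 0.7, 0.2, 0.7], 2)` returns `[1, 3]`. Accepting `Sequence[float]` rather than `list[float]` lets callers pass tuples or other read-only sequences without a type checker complaining.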