Skip to content

Commit

Permalink
Merge branch 'main' into zloss
Browse files Browse the repository at this point in the history
  • Loading branch information
Muennighoff authored Sep 7, 2024
2 parents e430ad7 + 66d7894 commit 15be34e
Show file tree
Hide file tree
Showing 98 changed files with 3,312 additions and 1,805 deletions.
29 changes: 29 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
# What does this PR do?

<!--
Please briefly describe your change, including what problem the change fixes, and any context
necessary for understanding the change
-->

# What issue(s) does this change relate to?

<!--
Please include any issues related to this pull request, including 'Fixes' if the issue is resolved
by this pull request.
Example:
- Fixes #42
- Related to #1234
-->

# Before submitting
- [ ] Have you read the [contributor guidelines](https://github.com/databricks/megablocks/blob/dev/CONTRIBUTING.md)?
- [ ] Is this change a documentation change or typo fix? If so, skip the rest of this checklist.
- [ ] Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.
- [ ] Did you update any related docs and document your change?
- [ ] Did you update any related tests and add any new tests related to your change? (see [testing](https://github.com/databricks/megablocks/blob/dev/CONTRIBUTING.md#running-tests))
- [ ] Did you run the tests locally to make sure they pass?
- [ ] Did you run `pre-commit` on your change? (see the `pre-commit` section of [prerequisites](https://github.com/databricks/megablocks/blob/dev/CONTRIBUTING.md#prerequisites))

<!--
Thanks so much for contributing to MegaBlocks! We really appreciate it :)
-->
41 changes: 41 additions & 0 deletions .github/workflows/code-quality.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
name: Code Quality Checks
on:
push:
branches:
- main
- release/**
pull_request:
branches:
- main
- release/**
workflow_call:
workflow_dispatch:
# Cancel old runs when a new commit is pushed to the same branch if not on main or dev
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
defaults:
run:
working-directory: .
jobs:
code-quality:
runs-on: ubuntu-latest # TODO: switch to linux-ubuntu-latest later
timeout-minutes: 30
strategy:
matrix:
python_version:
- "3.11"
pip_deps:
- "[dev]"
steps:
- uses: actions/checkout@v3
- name: Get composite run steps repository
uses: actions/checkout@v3
with:
repository: mosaicml/ci-testing
ref: v0.1.2
path: ./ci-testing
- uses: ./ci-testing/.github/actions/code-quality
with:
python_version: ${{ matrix.python_version }}
pip_deps: ${{ matrix.pip_deps }}
16 changes: 9 additions & 7 deletions .github/workflows/pr-gpu.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -15,30 +15,32 @@ concurrency:
jobs:
pytest-gpu:
name: ${{ matrix.name }}
runs-on: ubuntu-latest # todo: switch to linux-ubuntu-latest later
if: github.repository_owner == 'databricks'
runs-on: ubuntu-latest # TODO: switch to linux-ubuntu-latest later
strategy:
fail-fast: false
matrix:
include:
- name: "python3.11-pytorch2.3.1-gpus1"
- name: "python3.11-pytorch2.4.0-gpus1"
gpu_num: 1
python_version: 3.11
container: mosaicml/pytorch:2.3.1_cu121-python3.11-ubuntu20.04
- name: "python3.11-pytorch2.3.1-gpus2"
container: mosaicml/pytorch:2.4.0_cu124-python3.11-ubuntu20.04
- name: "python3.11-pytorch2.4.0-gpus2"
gpu_num: 2
python_version: 3.11
container: mosaicml/pytorch:2.3.1_cu121-python3.11-ubuntu20.04
container: mosaicml/pytorch:2.4.0_cu124-python3.11-ubuntu20.04
steps:
- name: Run PR GPU tests
uses: mosaicml/ci-testing/.github/actions/[email protected].0
uses: mosaicml/ci-testing/.github/actions/[email protected].2
with:
name: ${{ matrix.name }}
container: ${{ matrix.container }}
python_version: ${{ matrix.python_version }}
gpu_num: ${{ matrix.gpu_num }}
git_repo: databricks/megablocks
pip_deps: "[all,testing]"
pytest_command: "coverage run -m pytest tests" # todo: remove tests from pytest tests when we delete all tests outside of MegaBlocks repo
pytest_command: "coverage run -m pytest tests"
# TODO: remove tests from pytest tests when we delete all tests in the MegaBlocks dir
pytest_markers: "gpu"
composer_package_name: mosaicml # Required as Composer is built from source
mcloud_timeout: 3600
Expand Down
118 changes: 118 additions & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,118 @@
# Copyright 2024 Databricks authors
# SPDX-License-Identifier: Apache-2.0

default_language_version:
python: python3
repos:
# - repo: local
# hooks:
# - id: pyright
# name: pyright
# entry: pyright
# language: node
# types: [python]
# pass_filenames: false
# args: [--warnings]
# additional_dependencies: ["[email protected]"]
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.2.2
hooks:
- id: ruff
args: [--fix, --exit-non-zero-on-fix]
- repo: https://github.com/google/yapf
rev: v0.32.0
hooks:
- id: yapf
name: yapf
description: A formatter for Python files.
entry: yapf
args: [-i, -vv, -p] # inplace
language: python
types: [python]
additional_dependencies:
- toml
- repo: https://github.com/hadialqattan/pycln
rev: v2.1.2
hooks:
- id: pycln
args: [. --all]
- repo: https://github.com/pycqa/isort
hooks:
- id: isort
rev: 5.12.0
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.3.0
hooks:
- id: check-added-large-files
- id: check-ast
- id: check-builtin-literals
- id: check-case-conflict
- id: check-docstring-first
- id: check-executables-have-shebangs
- id: check-json
- id: check-shebang-scripts-are-executable
- id: pretty-format-json
args:
- --autofix
- --no-sort-keys
- --indent=4
- --no-ensure-ascii
- id: check-merge-conflict
- id: check-symlinks
- id: check-toml
- id: check-vcs-permalinks
- id: check-xml
- id: check-yaml
- id: debug-statements
- id: destroyed-symlinks
- id: double-quote-string-fixer
- id: end-of-file-fixer
- id: fix-byte-order-marker
- id: mixed-line-ending
- id: trailing-whitespace
- repo: https://github.com/Lucas-C/pre-commit-hooks
rev: v1.5.4
hooks:
- id: insert-license
args:
- --license-filepath
- .pre-commit/FILE_HEADER
- --comment-style
- "#"
- --allow-past-years
types: [python]
- repo: https://github.com/PyCQA/docformatter
rev: v1.5.0
hooks:
- id: docformatter
args: [--in-place, --wrap-summaries=120, --wrap-descriptions=120]
- repo: https://github.com/PyCQA/pydocstyle
rev: 6.1.1
hooks:
- id: pydocstyle
name: pydocstyle
entry: pydocstyle
language: python
types: [python]
exclude: (.ci|.github)
additional_dependencies:
- toml
- repo: https://github.com/adrienverge/yamllint.git
rev: v1.28.0
hooks:
- id: yamllint
name: yamllint
description: This hook runs yamllint.
entry: yamllint
language: python
types: [file, yaml]
- repo: https://github.com/trufflesecurity/trufflehog.git
rev: v3.40.0
hooks:
- id: trufflehog
name: secret scan
exclude: tests/horrible_strings.py
entry: trufflehog filesystem ./
args:
- --only-verified
- --fail
2 changes: 2 additions & 0 deletions .pre-commit/FILE_HEADER
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Copyright 2024 Databricks
SPDX-License-Identifier: Apache-2.0
42 changes: 42 additions & 0 deletions .yamllint.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
yaml-files:
- "*.yaml"
- "*.yml"
- .yamllint

ignore: |
wandb
rules:
braces:
forbid: non-empty
brackets:
forbid: false
colons: enable
commas: enable
comments: enable
comments-indentation: enable
document-end:
present: false
document-start:
present: false
empty-lines: enable
empty-values: disable
hyphens: enable
indentation:
spaces: 2
indent-sequences: false
check-multi-line-strings: false
key-duplicates: enable
key-ordering: disable
line-length:
max: 120
allow-non-breakable-words: true
allow-non-breakable-inline-mappings: true
new-line-at-end-of-file: enable
new-lines: enable
octal-values: enable
quoted-strings:
quote-type: double
required: false
trailing-spaces: enable
truthy: disable
104 changes: 104 additions & 0 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,104 @@
# Contributing to MegaBlocks

Thanks for considering contributing to MegaBlocks!

Issues tagged with [good first issue](https://github.com/mosaicml/megablocks/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) are great options to start contributing.

If you have questions, join us on [Slack](https://join.slack.com/t/mosaicml-community/shared_invite/zt-w0tiddn9-WGTlRpfjcO9J5jyrMub1dg) -- we'll be happy to help you!

We welcome contributions for bug fixes, new efficient methods you'd like to contribute to the community, or new models and datasets!

## Prerequisites

To set up the development environment in your local box, run the commands below.

1\. Install the dependencies needed for testing and linting the code:

<!--pytest.mark.skip-->
```bash
pip install -e '.[all]'
```

2\. Configure [pre-commit](https://pre-commit.com/), which automatically formats code before
each commit:

<!--pytest.mark.skip-->
```bash
pre-commit install
```

## Submitting a Contribution

To submit a contribution:

1\. Fork a copy of the [MegaBlocks](https://github.com/databricks/megablocks) library to your own account.

2\. Clone your fork locally and add the megablocks repo as a remote repository:

<!--pytest.mark.skip-->
```bash
git clone [email protected]:<github_id>/megablocks.git
cd megablocks
git remote add upstream https://github.com/databricks/megablocks.git
```

3\. Create a branch and make your proposed changes.

<!--pytest.mark.skip-->
```bash
git checkout -b cool-new-feature
```

4\. When you are ready, submit a pull request into the megablocks repository!

## Pull request (PR) guidelines

We have some rough guidelines that will make your PR easier to review and more likely to get smoothly merged. Please don't let uncertainty or difficulty with any of these things stop you from opening a PR! We are happy to help you through them :)
* Self-contained title and description. Please include a concise title and clear PR description. The title should allow someone to understand what the PR changes or does at a glance. The description should allow someone to understand the contents of the PR _without_ looking at the code.
* If the PR affects output that is displayed to a user of MegaBlocks (e.g. console logging or experiment tracker reporting), please include screenshots showing what the new output looks like. UX is important!
* Include tests. If you are fixing a bug, please add a test that would've caught the bug. If you are adding a new feature, please add unit tests that test the various components of the feature, and also a test that tests the full functionality of the feature.
* Please consider whether your changes affect the example notebooks or large parts of the code base, and run the daily tests locally if so (`pytest -m 'daily and not remote and not gpu and not vision and not doctest'`)
* `pre-commit` should help you handle formatting and type checking, but please do make sure you have it installed as described [above](#prerequisites).

## Configuring README Code Snippets

MegaBlocks uses [pytest-codeblocks](https://github.com/nschloe/pytest-codeblocks) to test all example code snippets. The pytest-codeblocks repository explains how to annotate code snippets, which supports most `pytest` configurations. For example, if a test requires model training, the GPU mark (`<!--pytest.mark.skip-->`) should be applied.

## Running Tests

To test your changes locally, run:

* `make test` # run CPU tests
* `make test-gpu` # run GPU tests
* `cd docs && make doctest` # run doctests

Some of our checks test distributed training as well. To test these, run:

* `make test-dist WORLD_SIZE=2` # run 2-cpu distributed tests
* `make test-dist-gpu WORLD_SIZE=2` # run 2-gpu distributed tests

These tests run with the `composer` launcher. We also support `WORLD_SIZE=1`, which would run the tests with the `composer` launcher on a single device.

See the [Makefile](/Makefile) for more information.

If you want to run pre-commit hooks manually, which check for code formatting and type annotations, run `pre-commit run --all-files`

### Docker

To run the tests in the provided docker containers:

* `docker pull mosaicml/composer` (or an alternative image like `mosaicml/composer:latest_cpu`)
* `docker run --rm -v ./:/composer --user $(id -u):$(id -g) -it mosaicml/composer`
* from inside the container
* `cd /megablocks`
* `pip install -e .`
* `pytest <args>` or `make <args>` to run the desired tests


## Code Style & Typing

See the [MegaBlocks Style Guide](/STYLE_GUIDE.md) for guidelines on how to structure and format your code.

MegaBlocks aims to annotate all functions with type annotations (introduced in
[PEP 526](https://www.python.org/dev/peps/pep-0526/)). Don't worry if you are not a Python typing expert;
put in the pull request, and we'll help you with getting the code into shape.
2 changes: 1 addition & 1 deletion Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -6,4 +6,4 @@ RUN pip install flash-attn

ENV PYTHONPATH="/mount/megablocks/third_party/Megatron-LM:${PYTHONPATH}"

WORKDIR /mount/megablocks
WORKDIR /mount/megablocks
2 changes: 1 addition & 1 deletion LICENSE
Original file line number Diff line number Diff line change
Expand Up @@ -387,4 +387,4 @@
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
limitations under the License.
Loading

0 comments on commit 15be34e

Please sign in to comment.