As a project within the online-ml community, we have a public roadmap that lists what has been done, what we're currently doing, and what needs doing. There's also an icebox with high-level ideas that need framing. You're welcome to pick anything that takes your fancy and that you deem important. Feel free to open a discussion if you want to clarify a topic and/or want to be formally assigned a task on the board.
Of course, you're welcome to propose and contribute new ideas. In this case, we also encourage you to open a discussion so that we can align on the work to be done. It's generally a good idea to have a quick discussion before opening a pull request that is potentially out-of-scope.
The typical workflow for contributing to River is:
- Fork the master branch from the GitHub repository.
- Clone your fork locally.
- Commit changes.
- Push the changes to your fork.
- Send a pull request from your fork back to the original master branch.
Start by cloning the repository:
git clone https://github.com/online-ml/deep-river
Next, you'll need a Python environment. A nice way to manage your Python versions is to use pyenv, which can be installed here. Once you have pyenv, you can install the latest Python version River supports:
pyenv install -v $(cat .python-version)
Then install Poetry:
curl -sSL https://install.python-poetry.org | python3 -
Now you're set to install River and activate the virtual environment:
poetry install
poetry shell
Finally, install the pre-commit push hooks. This will run some code quality checks every time you push to GitHub.
pre-commit install --hook-type pre-push
You can optionally run pre-commit at any time like so:
pre-commit run --all-files
You're now ready to make some changes. We strongly recommend that you check out River's source code for inspiration before getting into the thick of it. How you make the changes is up to you, of course. However, we can give you some pointers on how to test your changes. Here is an example workflow that works for most cases:
- Create and open a Jupyter notebook at the root of the directory.
- Add the following in the code cell:
%load_ext autoreload
%autoreload 2
- The previous code will automatically reimport River for you whenever you make changes.
- For instance, if a change is made to regression.Regressor, then rerunning the following code doesn't require restarting the notebook:
from deep_river.regression import Regressor
from torch import nn


class MyModule(nn.Module):
    def __init__(self, n_features):
        super().__init__()

    def forward(self, X, **kwargs):
        # your transformation here
        return X


model = Regressor(module=MyModule)
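To give a feel for what the autoreload extension saves you from doing by hand, here is a much simplified sketch that uses only the standard library. It writes a throwaway module, imports it, edits the file, and calls importlib.reload to pick up the change. The module name scratch_module and its answer function are hypothetical and exist only for this illustration; the real extension is more thorough than this.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical throwaway module, created only for this demonstration.
    module_path = Path(tmp) / "scratch_module.py"
    module_path.write_text("def answer():\n    return 1\n")
    sys.path.insert(0, tmp)
    try:
        import scratch_module

        first = scratch_module.answer()

        # Simulate editing the source file. The new source has a different
        # size, so the cached bytecode is treated as stale on reload.
        module_path.write_text("def answer():\n    return 'edited'\n")
        importlib.invalidate_caches()
        importlib.reload(scratch_module)

        second = scratch_module.answer()
    finally:
        # Leave the interpreter state as we found it.
        sys.path.remove(tmp)
        sys.modules.pop("scratch_module", None)

print(first, second)
```

With the autoreload extension loaded, the notebook takes care of this reloading step before running each cell, so you only need to rerun the cell.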
- Pick a base class from the base.py file, which can be either DeepEstimator or RollingDeepEstimator.
- Check if any of the mixin classes from the base module apply to your implementation.
- Make sure you've implemented the required methods, with the following exceptions:
    - Stateless transformers do not require a learn_one method.
    - In case of a classifier, predict_one is implemented by default, but can be overridden.
- Add type hints to the parameters of the __init__ method.
- If possible, provide a default value for each parameter. If, for whatever reason, no good default exists, then implement the _unit_test_params method. This is a private method that is meant to be used for testing.
- Write a comprehensive docstring with example usage. Try to have empathy for new users when you do this.
- Check that the class you have implemented is imported in the __init__.py file of the module it belongs to.
- When you're done, run the utils.check_estimator function on your class and check that no exceptions are raised.
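As a rough illustration of the checklist above, the sketch below shows typed __init__ parameters with defaults, learn_one and predict_one methods, and a _unit_test_params method. To stay runnable without torch, it uses a plain Python class instead of DeepEstimator or RollingDeepEstimator. The class name, its parameters, and the convention that _unit_test_params is a classmethod yielding dictionaries of constructor arguments are assumptions made for this example.

```python
from typing import Dict, Iterator


class RunningMeanRegressor:
    """Toy regressor that predicts a decayed running mean of the targets.

    Hypothetical stand-in: a real contribution would inherit from
    DeepEstimator or RollingDeepEstimator rather than a plain class.
    """

    def __init__(self, decay: float = 0.9, initial_value: float = 0.0):
        # Every parameter has a type hint and a default value.
        self.decay = decay
        self.initial_value = initial_value
        self._mean = initial_value

    @classmethod
    def _unit_test_params(cls) -> Iterator[Dict[str, float]]:
        # Only needed when no good defaults exist; shown here for its shape.
        # Assumed convention: yield dictionaries of constructor arguments.
        yield {"decay": 0.5, "initial_value": 0.0}

    def learn_one(self, x: dict, y: float) -> None:
        # Blend the previous mean with the newly observed target.
        self._mean = self.decay * self._mean + (1 - self.decay) * y

    def predict_one(self, x: dict) -> float:
        return self._mean


# Build one model per test configuration and exercise it once.
for params in RunningMeanRegressor._unit_test_params():
    model = RunningMeanRegressor(**params)
    model.learn_one({"feature": 1.0}, 4.0)
    prediction = model.predict_one({"feature": 1.0})

print(prediction)
```

In a real contribution, the base class supplies most of the plumbing, and utils.check_estimator is what exercises the class.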
If you're adding a class or a function, then you'll need to add a docstring. We follow the Google docstring convention, so please do too.
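For reference, here is a small sketch of a Google-style docstring with Args, Returns, Raises, and Examples sections. The clip function is hypothetical and is not part of the library; it only serves to show the layout.

```python
def clip(value: float, lower: float = 0.0, upper: float = 1.0) -> float:
    """Restrict a value to a closed interval.

    Args:
        value: The number to clip.
        lower: Smallest value that can be returned.
        upper: Largest value that can be returned.

    Returns:
        The value itself if it lies between the bounds, else the nearest bound.

    Raises:
        ValueError: If `lower` is greater than `upper`.

    Examples:
        >>> clip(1.5)
        1.0
        >>> clip(-2.0, lower=-1.0)
        -1.0
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    return max(lower, min(value, upper))
```

Putting usage under an Examples section keeps the docstring useful to new users.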
To build the documentation, you need to install some extra dependencies:
poetry install --with docs
pip install git+https://github.com/MaxHalford/yamp
From the root of the repository, you can then run the make livedoc command to take a look at the documentation in your browser. This will run a custom script which parses all the docstrings and generates Markdown files that MkDocs can render.
poetry install
Unit tests
These tests absolutely have to pass.
pytest
Static typing
These tests absolutely have to pass.
mypy deep_river
Web dependent tests
This involves tests that need an internet connection, such as those in the datasets module, which require downloading some files. In most cases you probably don't need to run these.
pytest -m web
Notebook tests
You don't have to worry too much about these, as we only check them before each release. If you break them because you changed some code, then it's probably because the notebooks have to be modified, not the other way around.
make execute-notebooks
- Check out master
- Run make execute-notebooks just to be safe
- Run the benchmarks
- Bump the version in deep_river/__version__.py
- Bump the version in pyproject.toml
- Tag and date the docs/releases/unreleased.md file
- Commit and push
- Wait for CI to run the unit tests
- Push the tag:
DEEP_RIVER_VERSION=$(python -c "import deep_river; print(deep_river.__version__)")
echo $DEEP_RIVER_VERSION
git tag $DEEP_RIVER_VERSION
git push origin $DEEP_RIVER_VERSION
- Wait for CI to ship to PyPI and publish the new docs