category_encoders is a Python library in the scikit-learn-contrib family for encoding categorical variables as numeric values. Our goals are to:
- Provide a variety of different techniques for encoding
- Provide a common, familiar API to all of these encoders
- Support both numpy arrays and pandas dataframes equally
- Be an all-around nice project to use and contribute to
With that in mind, we welcome, and in fact would love, some help.
The preferred workflow for contributing to category_encoders is:
- Fork this repository into your own GitHub account.
- Clone the fork and install the project via poetry:
$ git clone git@github.com:YourLogin/category_encoders.git
$ cd category_encoders
$ poetry install
- Create a branch for your new awesome feature; do not work in the master branch:
$ git checkout -b new-awesome-feature
- Write some code, or docs, or tests.
- When you are done, submit a pull request.
This is still a very young project, but we do have a few guiding principles:
- Maintain semantics of the scikit-learn API
- Write detailed docstrings in numpy format
- Support pandas dataframes and numpy arrays as inputs
- Write tests
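As a rough illustration of these principles, here is a minimal sketch of what a contribution might look like. `ToyOrdinalEncoder`, its `unknown_value` parameter, and the test function are hypothetical names invented for this example and are not part of the library. To keep the sketch self-contained it depends only on numpy and pandas; a real encoder would presumably build on scikit-learn's base classes and the project's existing utilities.

```python
import numpy as np
import pandas as pd


class ToyOrdinalEncoder:
    """Map each category in every column to an integer code.

    Hypothetical example class; it is not part of category_encoders.

    Parameters
    ----------
    unknown_value : int, default=-1
        Code assigned to categories that were not seen during ``fit``.

    Attributes
    ----------
    mapping_ : dict
        Learned ``{column: {category: code}}`` lookup, set by ``fit``.
    """

    def __init__(self, unknown_value=-1):
        # scikit-learn convention: __init__ only stores parameters.
        self.unknown_value = unknown_value

    @staticmethod
    def _to_frame(X):
        # Accept both pandas DataFrames and 2-D numpy arrays.
        if isinstance(X, pd.DataFrame):
            return X
        return pd.DataFrame(np.asarray(X, dtype=object))

    def fit(self, X, y=None):
        """Learn one code per category, in order of first appearance.

        Parameters
        ----------
        X : pandas.DataFrame or numpy.ndarray, shape (n_samples, n_features)
        y : ignored

        Returns
        -------
        self : ToyOrdinalEncoder
        """
        X = self._to_frame(X)
        # Learned state gets a trailing underscore, as in scikit-learn.
        self.mapping_ = {
            col: {cat: code for code, cat in enumerate(X[col].unique())}
            for col in X.columns
        }
        return self

    def transform(self, X):
        """Replace categories with their learned integer codes.

        Parameters
        ----------
        X : pandas.DataFrame or numpy.ndarray, shape (n_samples, n_features)

        Returns
        -------
        pandas.DataFrame
            Encoded copy of ``X``; the input is left unmodified.
        """
        X = self._to_frame(X)
        encoded = {
            col: X[col].map(self.mapping_[col])
            .fillna(self.unknown_value)
            .astype(int)
            for col in X.columns
        }
        return pd.DataFrame(encoded, index=X.index)


def test_dataframe_and_array_inputs_agree():
    # pytest-style test: both supported input types must give the same codes.
    df = pd.DataFrame({"color": ["red", "blue", "red"]})
    arr = df.to_numpy()
    from_frame = ToyOrdinalEncoder().fit(df).transform(df)
    from_array = ToyOrdinalEncoder().fit(arr).transform(arr)
    assert from_frame.to_numpy().tolist() == [[0], [1], [0]]
    assert from_array.to_numpy().tolist() == [[0], [1], [0]]
```

The sketch always returns a DataFrame and expects the same input type at `fit` and `transform` time, which is a simplification made for brevity. The points it is meant to show are that `fit` returns `self`, learned attributes carry a trailing underscore, docstrings follow the numpy layout, and the test exercises both input types.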
To run the tests, use:
$ pytest
There are usually some issues on the project's GitHub page looking for contributors. If not, you're welcome to propose some ideas there. A great first step is often to just use the library and add to the examples directory. This helps us with documentation, and often helps to find things that would make the library better to use.