A Python data science cookiecutter for a `poetry` and `pyenv` user, which includes:
- A `src` folder where the main module for shared code should exist in `src/package_name`
- pre-commit hooks:
  - `autopep8`
  - `poetry export` to sync dependencies with a `requirements.txt` file
- A local `.python-version` for `pyenv`
- Folder structure similar to `kedro`
To create a new project folder that follows this cookiecutter template, run:
python -m cookiecutter git@github.com:banditkings/ds_cookie.git
or if using HTTPS auth:
python -m cookiecutter https://github.com/banditkings/ds_cookie.git
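The same template can also be rendered from Python rather than the command line. The following is a minimal sketch, assuming the template's prompts are named `repo_name` and `package_name` (inferred from the `{{ cookiecutter.repo_name }}` and `{{ cookiecutter.package_name }}` placeholders in the project layout); the helper names `build_context` and `create_project` are hypothetical, and the template may define further prompts that are not covered here.

```python
import keyword


def build_context(repo_name: str, package_name: str) -> dict:
    """Return answers for the (assumed) template prompts."""
    # The package ends up as an importable module under src/,
    # so reject names that Python could not import.
    if not package_name.isidentifier() or keyword.iskeyword(package_name):
        raise ValueError(f"{package_name!r} is not a valid Python package name")
    return {"repo_name": repo_name, "package_name": package_name}


def create_project(repo_name: str, package_name: str, output_dir: str = ".") -> str:
    """Render the template without interactive prompts; returns the new project path."""
    # Imported lazily so build_context works even when cookiecutter is not installed.
    from cookiecutter.main import cookiecutter

    return cookiecutter(
        "https://github.com/banditkings/ds_cookie.git",
        no_input=True,  # skip prompts and use extra_context plus template defaults
        extra_context=build_context(repo_name, package_name),
        output_dir=output_dir,
    )
```

Calling `create_project("my_ds_project", "my_package")` should be roughly equivalent to running the HTTPS command above and answering the prompts by hand; both example names are placeholders.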
{{ cookiecutter.repo_name }}              <- Git repo name
├── .gitignore                            <- Hidden file that prevents staging of unnecessary files to `git`
├── README.md                             <- The top-level README for developers using this project
│
├── data                                  <- Store raw data, features, etc - not committed to git
│   ├── 01_raw                            <-- Raw immutable data
│   ├── 02_intermediate                   <-- Typed data
│   ├── 03_primary                        <-- Domain model data
│   ├── 04_feature                        <-- Model features
│   ├── 05_model_input                    <-- Often called 'master tables'
│   ├── 06_models                         <-- Serialised models
│   ├── 07_model_output                   <-- Data generated by model runs
│   └── 08_reporting                      <-- Ad hoc descriptive cuts
│
├── notebooks                             <- Jupyter notebooks. Naming convention is a number (for ordering),
│                                            the creator's initials, and a short `-` delimited description, e.g.
│                                            `1.0-jqp-initial-data-exploration`.
│
├── pyproject.toml                        <- Poetry dependency and environment file
│
├── reports                               <- Generated analysis as HTML, PDF, LaTeX, etc
│   └── figures                           <- Generated graphics and figures to be used in reporting
│
└── src                                   <- Source code for this project
    ├── tests                             <- All tests for this package
    └── {{ cookiecutter.package_name }}   <- This package
        ├── data                          <- Scripts to download or generate data
        ├── features                      <- Scripts to turn raw data into features for modeling
        ├── models                        <- Scripts to train models and then use trained models to make
        │                                    predictions
        └── visualization                 <- Scripts to create exploratory and results oriented visualizations
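The numbered data folders and the notebook naming convention in the tree above are easy to get wrong when typed by hand. Here is a minimal sketch of two helpers that encode them; `data_path` and `notebook_name` are hypothetical names and are not part of the template itself.

```python
import re
from pathlib import Path

# Data layer folders, copied from the project tree.
DATA_LAYERS = (
    "01_raw", "02_intermediate", "03_primary", "04_feature",
    "05_model_input", "06_models", "07_model_output", "08_reporting",
)


def data_path(project_root, layer: str, filename: str) -> Path:
    """Build the path to a file inside one of the numbered data layers."""
    if layer not in DATA_LAYERS:
        raise ValueError(f"unknown data layer {layer!r}; expected one of {DATA_LAYERS}")
    return Path(project_root) / "data" / layer / filename


def notebook_name(number: str, initials: str, description: str) -> str:
    """Format a notebook name as <number>-<initials>-<hyphenated-description>."""
    # Lowercase the description and join its alphanumeric words with hyphens.
    slug = "-".join(re.findall(r"[a-z0-9]+", description.lower()))
    return f"{number}-{initials.lower()}-{slug}"
```

For example, `notebook_name("1.0", "JQP", "Initial data exploration")` produces the name used in the tree, `1.0-jqp-initial-data-exploration`, and `data_path(".", "01_raw", "sales.csv")` points at a file in the raw layer (the file name is illustrative only).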