"I bring order to chaos" - The Borg Queen, 2373
This guide defines the layout of the project: the files and folders that make it up and how they are to be named.
📦data-analytics-project-template | The project root folder
┣ 📂.vscode | VS Code generated content: settings for spellings.
┣ 📂artifacts | Work products of the Jupyter notebooks.
┣ 📂assets | Assets for the project, provided by the starter pack.
┣ 📂docs | Documentation for the project.
┃ ┗ 📂images | Images for the documentation.
┣ 📂junk-dna | Code and artifacts that didn't make it into the release.
┣ 📂maps | Map system files.
┃ ┗ 📂irl-adm1 | irl-adm1 maps files.
┣ 📂notebooks | Jupyter notebooks for the project.
┃ ┗ 📂script | nbautoexport folder: Its contents are auto generated.
┣ 📂python-package | Python source code for the Python package.
┃ ┗ 📂project_name | project_name Python package.
┣ 📂references | References for the project, with copies.
┗ 📜readme.md | The project readme file.
Directory Structure Legend
All directories, Jupyter notebooks, markdown files, images and other files are to be named in lower kebab-case. This rule does not apply to Python files in the directory python-package or to map system files in the subdirectories of maps.
Lower kebab-case is chosen because the project is available online as a data science portfolio project on GitHub and GitHub URLs are case-sensitive.
https://github.com/markcrowe-com/
- https://raw.githubusercontent.com/markcrowe-com/data-analytics-project-template/master/readme.md returns the contents of readme.md.
- https://raw.githubusercontent.com/markcrowe-com/data-analytics-project-template/master/README.md returns a 404: Not Found error.
filename |
---|
readme.md |
docs/assessment-criteria.md |
docs/images/gantt-chart.png |
assets/births-deaths-marriages-ireland-1960-2021 |
notebooks/notebook-1-01-example.ipynb |
filename | Reason |
---|---|
README.md | Capitalized filename |
assets/birthsdeathsmarriagesireland-1960-2021 | Difficult to read, words not separated with '-' |
Docs/Assessment-Criteria.md | Mixed case filename, capitalized first letters in words is PascalCase |
DOCS/ | Capitalized directory |
docs/images/gantt-chart.PNG | Capitalized file extension |
notebooks/notebook-1-01 example.ipynb | Space in name |
samples/samplePythonModule.py | Mixed case filename, Python filename words not separated with '_' |
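The naming rule above can be checked mechanically. The following is a minimal sketch, not part of the template: the function name `is_lower_kebab_case` and the regular expression are assumptions made for illustration. It accepts relative paths only, allows a leading dot (as in .vscode), and cannot detect words that were run together without '-', so a name such as birthsdeathsmarriagesireland-1960-2021 still passes. Paths under python-package and the subdirectories of maps are exempt from the rule and should not be passed to it.

```python
import re
from pathlib import PurePosixPath

# Lowercase letters and digits, in words joined by single hyphens.
KEBAB_PIECE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def is_lower_kebab_case(path: str) -> bool:
    """Return True if every directory and file name in a relative path is lower kebab-case.

    Hypothetical helper for illustration; it is not provided by the template.
    """
    for name in PurePosixPath(path).parts:
        # Allow one leading dot (.vscode, .gitignore), then check the stem and
        # each extension separately, so "gantt-chart.PNG" fails on "PNG".
        pieces = name.removeprefix(".").split(".")
        if not all(KEBAB_PIECE.match(piece) for piece in pieces):
            return False
    return True


if __name__ == "__main__":
    for candidate in ("docs/images/gantt-chart.png", "docs/images/gantt-chart.PNG"):
        print(candidate, is_lower_kebab_case(candidate))
```

The sketch requires Python 3.9 or later because it uses `str.removeprefix`.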
Each notebook filename should begin with notebook-S-NN-, where S is a unique number and NN is a unique two-digit number of the notebook. This code is used to order the notebooks in the project.
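To illustrate how the notebook-S-NN- prefix gives an ordering, here is a minimal sketch. The helper `notebook_sort_key` and its regular expression are hypothetical, and the sketch assumes S is written as one or more digits, which this guide does not state explicitly.

```python
import re

# notebook-S-NN-<rest>: S is assumed to be one or more digits, NN exactly two digits.
NOTEBOOK_PREFIX = re.compile(r"^notebook-(\d+)-(\d{2})-")


def notebook_sort_key(filename: str) -> tuple[int, int]:
    """Return (S, NN) as integers so notebooks sort numerically.

    Hypothetical helper for illustration; it is not provided by the template.
    """
    match = NOTEBOOK_PREFIX.match(filename)
    if match is None:
        raise ValueError(f"filename does not begin with notebook-S-NN-: {filename!r}")
    return int(match.group(1)), int(match.group(2))


names = [
    "notebook-2-02-example-better-code-population-dv.ipynb",
    "notebook-1-01-example-bad-code-population.ipynb",
    "notebook-2-01-example-better-code-population-eda.ipynb",
]
ordered = sorted(names, key=notebook_sort_key)
for name in ordered:
    print(name)
```

Sorting on the parsed integers rather than on the raw string keeps the order correct even if S ever grows beyond a single digit.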
Python files in the directory python-package are to be named in snake_case.
When working on this project use the following directory structure:
Files generated from the Jupyter notebooks should be placed in the directory artifacts.
The files provided in the directory assets were provided to start the project. They are not to be modified.
Documentation for the project is to be placed in the directory docs. Where possible, documentation is to be written in markdown.
Images used in the documentation should be placed in the directory docs/images.
Map system files for the project should be placed in the directory maps. Each map's files should be in its own subdirectory. These file names are not to be changed to conform to the filename conventions.
Jupyter notebooks for the project should be placed in the directory notebooks. Each notebook filename should begin with notebook-S-NN-, where S is a unique number and NN is a unique two-digit number of the notebook. It may be necessary to clear or delete files from the notebooks/script directory in the event a notebook is renamed or deleted.
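Stale exports left behind after a rename or deletion can be located with a short script. This is a sketch under one assumption: each exported file in notebooks/script shares its stem with the notebook it came from, as the file listing in this guide suggests. The function name `find_orphaned_scripts` is hypothetical and not part of the template or of nbautoexport.

```python
from pathlib import Path


def find_orphaned_scripts(notebooks_dir: Path, script_dir_name: str = "script") -> list[Path]:
    """Return exported .py files that no longer have a matching .ipynb notebook.

    Hypothetical helper for illustration; it only reports files and deletes nothing.
    """
    notebook_stems = {path.stem for path in notebooks_dir.glob("*.ipynb")}
    script_dir = notebooks_dir / script_dir_name
    return sorted(
        path for path in script_dir.glob("*.py") if path.stem not in notebook_stems
    )


if __name__ == "__main__":
    for orphan in find_orphaned_scripts(Path("notebooks")):
        print(orphan)  # review the list, then delete the files manually
```

Reporting rather than deleting is a deliberate choice here, so that nothing is removed before it has been reviewed.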
Python module files for the project_name Python package are kept in the directory python-package. These files and folders are to be named in snake_case.
References are to be listed using Harvard referencing style in the file /references/readme.md. Copies of the references are to be placed in the references directory, with filenames beginning with the date accessed, e.g. 2021-11nov-04-python-3-f-strings.md. Where possible, reference copies are to be saved as markdown or PDF.
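The date prefix can be generated rather than typed by hand. The sketch below infers the pattern (four-digit year, two-digit month followed by a lowercase three-letter month abbreviation, two-digit day) from the single example 2021-11nov-04-python-3-f-strings.md. The function name `reference_filename` and its parameters are hypothetical.

```python
from datetime import date

# Fixed English abbreviations, so the result does not depend on the system locale.
MONTH_ABBREVIATIONS = (
    "jan", "feb", "mar", "apr", "may", "jun",
    "jul", "aug", "sep", "oct", "nov", "dec",
)


def reference_filename(accessed: date, title_slug: str, extension: str = "md") -> str:
    """Build a reference copy filename that begins with the date accessed.

    Hypothetical helper; the pattern is inferred from one example in this guide.
    """
    month = f"{accessed.month:02d}{MONTH_ABBREVIATIONS[accessed.month - 1]}"
    return f"{accessed.year}-{month}-{accessed.day:02d}-{title_slug}.{extension}"


print(reference_filename(date(2021, 11, 4), "python-3-f-strings"))
# prints: 2021-11nov-04-python-3-f-strings.md
```

Putting the numeric month before the abbreviation means the files sort chronologically in a directory listing while remaining readable.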
To make the project interactive online each Jupyter Notebook will include a heading with the online editors in the first cell.
Two functions have been provided to generate this code.
create_jupyter_notebook_header(github_username: str, repository: str, notebook_filepath: str, branch: str)
print_jupyter_notebook_header_html(github_username: str, repository: str, notebook_filepath: str, branch: str)
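This guide does not show what the two functions produce or which online editors they link to. As a rough idea of the kind of links such a header could contain, the sketch below builds URLs for Google Colab and Binder from the same four parameters. The choice of those two editors and the function `build_online_editor_links` are assumptions made for illustration; this is not the template's implementation.

```python
def build_online_editor_links(
    github_username: str, repository: str, notebook_filepath: str, branch: str
) -> dict[str, str]:
    """Return links that open a GitHub-hosted notebook in online editors.

    Hypothetical stand-in for illustration; Colab and Binder are assumed editors.
    """
    repo_path = f"{github_username}/{repository}"
    return {
        "colab": (
            "https://colab.research.google.com/github/"
            f"{repo_path}/blob/{branch}/{notebook_filepath}"
        ),
        "binder": (
            f"https://mybinder.org/v2/gh/{repo_path}/{branch}"
            f"?filepath={notebook_filepath}"
        ),
    }


links = build_online_editor_links(
    github_username="markcrowe-com",
    repository="data-analytics-project-template",
    notebook_filepath="notebooks/notebook-1-01-example.ipynb",
    branch="master",
)
for editor, url in links.items():
    print(f"{editor}: {url}")
```

Because these URLs embed the notebook path verbatim, they are another place where the lower kebab-case rule matters: a filename with different capitalization would produce a broken link.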
Data source files are to be referenced from their online sources.
These files are used to build the Python package.
📦data-analytics-project-template
┣ 📂python-package
┃ ┣ 📂project_name
┃ ┃ ┣ 📜dataframe_labels.py
┃ ┃ ┣ 📜project_manager.py
┃ ┃ ┗ 📜__init__.py
┃ ┣ 📂tests
┃ ┃ ┗ 📜test_basic.py
┃ ┣ 📜license
┃ ┣ 📜pyproject.toml
┃ ┣ 📜readme.md
┃ ┣ 📜requirements.txt
┗ ┗ 📜setup.cfg
Python Package Setup
The script folders in notebooks and samples, and their contents, are generated by nbautoexport and pipreqs respectively. You should not need to modify these files. The .nbautoexport files are created by nbautoexport and are required for its functionality.
The .gitattributes file is configured to use pandoc for comparing Microsoft Word .docx files. The .gitignore file was created using gitignore.io and is used to ignore files that are not to be committed.
The files desktop.ico and desktop.ini are used to set the icon and name of the project folder on a Windows computer. The folder must be read-only for this setting to take effect.
📦data-analytics-project-template
┣ 📂.vscode
┣ 📂notebooks
┃ ┣ 📂script
┃ ┗ 📜.nbautoexport
┣ 📜.gitattributes
┣ 📜.gitignore
┣ 📜desktop.ico
┗ 📜desktop.ini
System Files
📦data-analytics-project-template
┣ 📂.vscode
┃ ┗ 📜settings.json
┣ 📂artifacts
┃ ┣ 📜group-skills.xlsx
┃ ┗ 📜readme.md
┣ 📂assets
┃ ┣ 📜2021-12Dec-11-population-estimates-1950-2021-pea01.csv
┃ ┗ 📜readme.md
┣ 📂docs
┃ ┣ 📂images
┃ ┃ ┣ 📜correlation-matrix-heatmap-pyramid.png
┃ ┃ ┗ 📜gantt-chart.jfif
┃ ┣ 📜assessment-criteria.md
┃ ┣ 📜build-python-package.md
┃ ┣ 📜build-requirements.md
┃ ┣ 📜code-style-guide.md
┃ ┣ 📜gantt-chart.md
┃ ┣ 📜install-nbautoexport.md
┃ ┣ 📜install-python-package.md
┃ ┣ 📜jupyter-notebook-layout-guide.md
┃ ┣ 📜knowledge-skills-abilities.md
┃ ┣ 📜notebook-managers.md
┃ ┣ 📜project-structure-guide.md
┃ ┣ 📜readme.md
┃ ┗ 📜template-todo.md
┣ 📂junk-dna
┃ ┗ 📜readme.md
┣ 📂notebooks
┃ ┣ 📂script
┃ ┃ ┣ 📜notebook-1-01-example-bad-code-population.py
┃ ┃ ┣ 📜notebook-2-01-example-better-code-population-eda.py
┃ ┃ ┣ 📜notebook-2-02-example-better-code-population-dv.py
┃ ┃ ┗ 📜requirements.txt
┃ ┣ 📜.nbautoexport
┃ ┣ 📜notebook-1-01-example-bad-code-population.ipynb
┃ ┣ 📜notebook-2-01-example-better-code-population-eda.ipynb
┃ ┣ 📜notebook-2-02-example-better-code-population-dv.ipynb
┃ ┗ 📜readme.md
┣ 📂python-package
┃ ┣ 📂project_name
┃ ┃ ┣ 📜dataframe_labels.py
┃ ┃ ┣ 📜project_manager.py
┃ ┃ ┗ 📜__init__.py
┃ ┣ 📂tests
┃ ┃ ┗ 📜test_basic.py
┃ ┣ 📜license
┃ ┣ 📜pyproject.toml
┃ ┣ 📜readme.md
┃ ┣ 📜requirements.txt
┃ ┗ 📜setup.cfg
┣ 📂references
┃ ┣ 📜2021-11nov-03-data-scientists-your-variable-names-are-awful.md
┃ ┣ 📜2021-11nov-03-pep-515-underscores-in-numeric-literals.md
┃ ┣ 📜2021-11nov-03-pep-8-style-guide-for-python-code.md
┃ ┣ 📜2021-11nov-04-python-3-f-strings.md
┃ ┣ 📜2021-11nov-04-to-camel-case-or-under-score.pdf
┃ ┗ 📜readme.md
┣ 📜.gitattributes
┣ 📜.gitignore
┣ 📜desktop.ico
┣ 📜desktop.ini
┣ 📜license
┗ 📜readme.md
Template footnote:
This project started from the template https://github.com/markcrowe-com/data-analytics-project-template. Permission is granted to reproduce for personal and educational use only. Commercial copying, hiring, lending is prohibited. In all cases this notice must remain intact. Template Author Mark Crowe Copyright © 2021, All rights reserved.