Clean .git history #317
Comments
One could test that now for our hackathon repo pypsa-meets-earth/pypsa-africa-hackathon#14 |
This is for cleaning additional big files, i.e. files above 1 MB. |
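To see which files in the history are above such a threshold, one option is to parse the output of the common recipe `git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)'`. The sketch below is only an illustration: the function name `large_blobs` and the 1,000,000-byte default are assumptions, not something used in this repository.

```python
from typing import List, Tuple


def large_blobs(batch_check_output: str, min_bytes: int = 1_000_000) -> List[Tuple[int, str]]:
    """Return (size, path) for every blob of at least min_bytes, largest first.

    Expects one object per line in the form "<type> <sha> <size> <path>",
    as printed by the git recipe named in the text above.
    """
    found = []
    for line in batch_check_output.splitlines():
        # Split at most 3 times so that paths containing spaces stay intact.
        parts = line.split(" ", 3)
        if len(parts) < 4 or parts[0] != "blob":
            continue  # skip commits, trees and malformed lines
        size = int(parts[2])
        if size >= min_bytes:
            found.append((size, parts[3]))
    return sorted(found, reverse=True)
```

The text fed to the function could be captured with `subprocess.run(..., capture_output=True, text=True)` or simply redirected to a file first.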
List of file types that could be cleaned: |
git-filter-repo seems to be the new tool to do the job. Docs can be found here. Note: actions with git filter-repo can be really destructive. Always make a copy of the original, perform dry runs, and check that the result is what you expect. After cleaning the repository, no clone of the old repository may push to it; otherwise the histories will be mixed up, resulting in a mess.
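A minimal sketch of the "copy first, dry-run first" routine, assuming Python is used to drive it. The helper names `backup_repo` and `filter_repo_command` are hypothetical. The only git-filter-repo options used are `--strip-blobs-bigger-than` and `--dry-run`, and the `1M` value is just an example threshold.

```python
import shutil
from pathlib import Path
from typing import List


def backup_repo(repo: Path, backup: Path) -> Path:
    """Copy the whole repository directory (including .git) before any rewrite."""
    if backup.exists():
        raise FileExistsError(f"backup target already exists: {backup}")
    shutil.copytree(repo, backup, symlinks=True)
    return backup


def filter_repo_command(max_blob_size: str = "1M", dry_run: bool = True) -> List[str]:
    """Build (but do not run) a git filter-repo call that strips big blobs."""
    cmd = ["git", "filter-repo", "--strip-blobs-bigger-than", max_blob_size]
    if dry_run:
        # With --dry-run, filter-repo leaves the repository unchanged.
        cmd.append("--dry-run")
    return cmd
```

The returned list could be passed to `subprocess.run(cmd, cwd=repo, check=True)` inside a fresh clone, and only re-run with `dry_run=False` once the dry-run result has been checked.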
|
Great tip in the Step-by-Step guide (called DISCUSSION here):
PyPSA-Africa needs to remain with dirty history at the beginning. |
I'm wondering whether we may rebase the old commits by squashing them: intermediate files that are created and deleted in the squashed commits may disappear from the history (hopefully). |
Some more info:
Whatever we go for, we should experiment first in a fork or a dummy repo |
#449 shows that even if we remove all the history and keep the latest code we are still close to 100 MB. It could be possible/essential to clean the current state of the notebook folder and images folder before sanitizing the history. |
I've tested on my fork at https://github.com/davide-f/pypsa-africa (branch main). When downloading the repository, the size is now about 50 MB, see image below. What has been done:
Note: To keep the notebooks empty, I've also proposed adding a pre-commit rule.
Note 2: I am not sure why the badge on this branch is not showing the size I'd expect; it may be due to some delay in updating. |
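What such a pre-commit rule checks can be sketched as follows. This is not the rule actually proposed for the repository; it is an illustration that assumes notebooks in nbformat v4 layout (a top-level `cells` list), and the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable, List


def notebook_has_outputs(nb: dict) -> bool:
    """True if any code cell still carries outputs or an execution count."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        if cell.get("outputs") or cell.get("execution_count") is not None:
            return True
    return False


def dirty_notebooks(paths: Iterable[str]) -> List[str]:
    """Return the paths of notebooks that were committed with outputs."""
    dirty = []
    for path in paths:
        nb = json.loads(Path(path).read_text(encoding="utf-8"))
        if notebook_has_outputs(nb):
            dirty.append(path)
    return dirty
```

A hook script could call `dirty_notebooks` on the staged `.ipynb` files and exit with a non-zero status if the list is non-empty, which would block the commit.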
I could reduce the repo size to 5.9 MB while keeping all the .py history.
Design note:
|
Update: I locally did what you did as well, Max: I removed the images folder and the corresponding history, and the size is now 38 MB. Very interesting approach; however, you removed all the notebooks.
Update 2: I added what you did to my repo (same link as before), but (a) keeping the notebooks and (b) fixing the missing images in the doc/image to match the images of the api_reference. Removing the OLD_* notebooks removes only 0.5 MB and around 100-200 commits. I'm not sure it is worth it; however, those notebooks are no longer needed. |
Proposal B: Keep Jupyter notebooks outside of the PyPSA-Earth repository and add clear documentation on their use.
Context & problem:
In my opinion, Jupyter notebooks are only useful if they are precompiled, so that the user knows what images or results to expect. Otherwise the user/developer will waste time debugging code, which was not necessary in the first place. The problem is that not all notebooks, or even no Jupyter notebooks at all, should be hosted on the
Design idea:
A side benefit: the PyPSA-Earth repository will be reduced from 360 MB to 6-25 MB at the current version.
Maybe another strong argument for this option: adding Jupyter notebooks later, if they are really needed, is not destructive, while removing Jupyter notebooks could be destructive (it requires filter-repo, and everyone needs to work on a new fork/clone). |
We decided to go for Proposal B. |
|
Just realised that our .git history (hidden files) is quite large (270 MB). One can clean the historic .ipynb files by clearing their outputs.
The last command from here does the job:
git filter-branch --tree-filter "python3 -m nbconvert --ClearOutputPreprocessor.enabled=True --inplace *.ipynb **/*.ipynb || true"
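For illustration only, this is roughly what clearing outputs means for a single notebook, written with the standard library instead of nbconvert. It assumes the nbformat v4 layout (a top-level `cells` list) and is not the nbconvert implementation. The name `strip_outputs` is hypothetical.

```python
import copy


def strip_outputs(nb: dict) -> dict:
    """Return a copy of a notebook dict with all code-cell outputs removed."""
    cleaned = copy.deepcopy(nb)  # leave the caller's notebook untouched
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop images, tables, printed text
            cell["execution_count"] = None  # reset the In[ ] counter
    return cleaned
```

Since embedded images and result tables live in `outputs`, removing them is what shrinks each historic notebook revision.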