-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shallow clone trees to save on memory usage #268
Conversation
Signed-off-by: Alex Goodman <[email protected]>
Benchmark Test ResultsBenchmark results from the latest changes vs base branch
|
Signed-off-by: Alex Goodman <[email protected]>
After talking with @willmurphyscode I'm going to hold off on merging in order to demonstrate a little more proof of this working as intended in syft. If all goes well I think this will land on the order of days 🎉 |
The test we still want to do is get a big image that has different file types for the same location, and edit the file, and prove that the filetree object doesn't get mutated. This can probably be done with the stereoscope example application. Basically, make an image that, for a given path, has layers where that path is a directory with children, then where it's a file, where it's a link, where it's a file with different contents, etc. Basically fuzzing the tree to prove that the structs that are now being shallow-copied won't be mutated. |
we decided to do additional testing before merging
I ran a test, and I think we have some differences here. Overview
Setup❯ tree .
.
├── Dockerfile
└── chimera
├── a.txt
└── b.txt FROM debian:trixie-slim
# makes /chimera/a.txt /chimera/b.txt
COPY . /
# make a layer where the dir is empty
RUN rm /chimera/a.txt /chimera/b.txt
RUN rmdir /chimera
RUN touch /chimera
RUN rm /chimera
RUN ln -s /etc/hostname /chimera
RUN unlink /chimera
RUN mkdir /chimera
Saving results like go run ./examples/basic.go localhost/aggressive > willtmp/before.txt
git checkout shallow-tree-clone
git rebase origin/main
go run ./examples/basic.go localhost/aggressive > willtmp/after-with-rebase.txt ResultsRunning --- willtmp/before.txt 2024-11-23 07:39:21
+++ willtmp/after-with-rebase.txt 2024-11-23 07:44:20
@@ -10336,8 +10336,6 @@
/bin/znew
/boot
/chimera
- /chimera/a.txt
- /chimera/b.txt
/dev
/etc
/etc/.pwd.lock I think some more investigation is needed before we merge this. A reasonable next step is probably to write a failing test that exhibits this bug and push it to this branch. The above testing proves there is a difference, but not much about what the cause is. |
Signed-off-by: Will Murphy <[email protected]>
I've pushed a test case that passes on |
Thanks for taking the due diligence here and proving that this doesn't quite work in the context of syft -- I'm going to close this for now. |
Based off of the inuse memory in stereoscope we spend of lot of memory on copying the filetree for squashing. An observation here is that all tree mutation operations always create or replace nodes, but never mutate existing nodes. This means that we can share references across the multiple trees created for the squashfs functionality for each layer.
Here is a snapshot of inuse memory today for image read for a sample image:
Here is a snapshot of inuse memory after using
map.Clone
to shallow clone existing trees:Partially fixes #233 in terms of inuse memory