Empty partitions are not empty #115

Open
tamasfe opened this issue Dec 29, 2024 · 1 comment

Comments

@tamasfe

tamasfe commented Dec 29, 2024

I'm building a job queue and evaluating persistent back-ends for it, and fjall seems to be a very interesting/exciting project that could potentially fit very well. A typical use-case is having a partition where pending jobs are inserted and something periodically goes through them and then removes them.

Using the default compaction settings, I eventually end up with partitions where len() and is_empty() return 0 and true respectively, but there is a 23MB segment that is never removed from disk.

The issue with this (other than disk usage, which I don't mind) is that even is_empty() seems to take up significant resources; with a few "empty" partitions like this, I end up with ~500MB to 1GB of memory usage while also being heavy on the CPU.

I assume this is because the segment is mostly full of tombstones that were not removed by compaction and fjall has to iterate over them to find the first live key-value that doesn't exist while having to read/decompress the entire segment each time.

Example flamegraph: (image not included)

Is there a way to force compaction to remove all dead records or is there a way to truncate a partition without recreating it completely? (I'd need this while the application is running)

I am also using transactions; I haven't checked the behaviour without them.

@marvin-j97
Collaborator

> I assume this is because the segment is mostly full of tombstones that were not removed by compaction and fjall has to iterate over them to find the first live key-value that doesn't exist while having to read/decompress the entire segment each time.

Indeed this is correct - it boils down to #56.
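
To illustrate why an emptiness check can be expensive in this situation, here is a minimal, self-contained sketch. It does not use fjall's API or reflect its internals: `Segment`, `Entry`, `is_empty_with_cost` and `build_deleted_queue` are hypothetical names, and a `BTreeMap` stands in for a sorted on-disk segment. The point is only that a scan for the first live key has to visit every tombstone when no live key exists.

```rust
use std::collections::BTreeMap;

/// A stored record is either a live value or a tombstone (deletion marker).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Entry {
    Value(Vec<u8>),
    Tombstone,
}

/// Toy stand-in for a sorted on-disk segment (hypothetical, not a fjall type).
struct Segment {
    entries: BTreeMap<Vec<u8>, Entry>,
}

impl Segment {
    /// Returns (is_empty, number of entries visited).
    /// The scan stops at the first live value; if there is none,
    /// every tombstone is visited before the answer is `true`.
    fn is_empty_with_cost(&self) -> (bool, usize) {
        let mut visited = 0;
        for entry in self.entries.values() {
            visited += 1;
            if matches!(entry, Entry::Value(_)) {
                return (false, visited);
            }
        }
        (true, visited)
    }
}

/// Simulates a queue where every job was inserted and later removed,
/// so only tombstones remain in the segment.
fn build_deleted_queue(jobs: u32) -> Segment {
    let mut entries = BTreeMap::new();
    for id in 0..jobs {
        let key = id.to_be_bytes().to_vec();
        entries.insert(key.clone(), Entry::Value(b"job".to_vec()));
        entries.insert(key, Entry::Tombstone); // the delete shadows the value
    }
    Segment { entries }
}

fn main() {
    let segment = build_deleted_queue(100_000);
    let (empty, visited) = segment.is_empty_with_cost();
    println!("empty = {empty}, entries visited = {visited}");
}
```

In this toy model the check reports an empty segment only after touching all 100,000 tombstones. A real segment would additionally have to read and decompress its blocks along the way, as described in the quoted text.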

A workaround is to periodically run a major compaction or rewrite certain disk segments into the last level, but those APIs are not exposed yet.
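
As a rough illustration of why compacting into the last level helps, the following sketch merges toy segments and drops tombstones only when the output is the last level, where no older data remains for them to shadow. This is a simplified model under stated assumptions, not fjall's compaction code: `compact`, `Segment` and `is_last_level` are hypothetical names, and sequence numbers and snapshots are ignored.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
enum Entry {
    Value(String),
    Tombstone,
}

/// Toy segment: sorted map of key -> newest entry in that segment.
type Segment = BTreeMap<u32, Entry>;

/// Merges segments given newest-first into a single segment.
/// If `is_last_level` is true, nothing older exists below the output,
/// so tombstones no longer shadow anything and can be dropped.
fn compact(newest_first: &[Segment], is_last_level: bool) -> Segment {
    let mut merged = Segment::new();
    for segment in newest_first {
        for (key, entry) in segment {
            // The first version seen for a key is the newest one; keep it.
            merged.entry(*key).or_insert_with(|| entry.clone());
        }
    }
    if is_last_level {
        merged.retain(|_, entry| *entry != Entry::Tombstone);
    }
    merged
}

fn main() {
    let mut old = Segment::new();
    let mut new = Segment::new();
    for id in 0..1_000 {
        old.insert(id, Entry::Value(format!("job-{id}")));
        new.insert(id, Entry::Tombstone); // every job was later removed
    }

    let partial = compact(&[new.clone(), old.clone()], false);
    let major = compact(&[new, old], true);
    println!("entries when tombstones are kept: {}", partial.len());
    println!("entries after last-level compaction: {}", major.len());
}
```

In this model the first merge still carries one tombstone per deleted job, while the last-level merge produces an empty segment, which is the state the asker expects for an "empty" partition.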
