Poor performance after server running for a few days #1282
can you check your system regarding
Initial stats (after a service restart and leaving it for a bit, just DAVx5 doing mobile syncing):
After 5 minutes whilst InfCloud is waiting for a scanning PROPFIND request:
After 10 minutes with calendars loaded in InfCloud:
Stats during those requests: radicale-2023-02-06-vmstat.log
Looks like your system has an I/O and/or CPU issue: the "idle" column drops from its usual 99 down to 0 while the "wait" column climbs from 0 to nearly 99 -> the process/OS is waiting for I/O responses. Potential reasons:
What is the output of
Also monitor using
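For anyone who wants to capture the same signal programmatically rather than eyeballing vmstat output, here's a minimal monitoring sketch. It assumes the third-party psutil package is installed, and the sampling interval and warning threshold are arbitrary choices:

```python
# Sketch: log a line every few seconds and flag when CPU time spent waiting on I/O spikes.
# Assumes the third-party "psutil" package (pip install psutil); "iowait" is Linux-only.
import time
import psutil

IOWAIT_WARN_PERCENT = 30  # arbitrary threshold for this sketch

while True:
    cpu = psutil.cpu_times_percent(interval=5)   # sample over 5 seconds
    iowait = getattr(cpu, "iowait", 0.0)         # only present on Linux
    line = f"idle={cpu.idle:5.1f}%  iowait={iowait:5.1f}%"
    if iowait >= IOWAIT_WARN_PERCENT:
        line += "  <-- storage is the likely bottleneck"
    print(time.strftime("%H:%M:%S"), line)
```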
This is a bare-metal home server, so no virtualisation here. CPU is a 2-core. Since the run above, I've rebooted after some other upgrades, and we're back to ~10 seconds for that scanning PROPFIND. It's possible the reboot just cleared some in-memory cruft, though I hadn't noticed any other software performing poorly before it. I'll keep tabs on it and see how the performance goes over time, but I'd welcome any other places to check.
So we're a couple of reboots later, and it seems that after about 7-10 days of uptime I start seeing noticeably slow or timing-out requests with Radicale. I've taken down a couple of "heavy" services to try to rule out system performance issues, though the server as a whole still seems fine, and besides some automated backups there isn't really anything else here with significant disk usage. From the potential reasons listed before:
I noted before that it's a 2-core system -- is there an expected / ballpark figure for the number of cores required (be it fixed, linear in the number of calendars, etc.)? Would reducing
Just wanted to check in again as the performance remains a problem, and is also now blocking my ability to sync on mobile: DAVx5 is consistently timing out during the sync (not sure if they've recently implemented a timeout, or the delays have crept up past its tolerance). I'm at the point where I'm considering alternative servers that don't use plain .ics files on disk, as I assume that's ultimately the bottleneck here, though being able to interact with the storage is very useful and I suspect any database-backed server will be harder to work with for ad-hoc tasks. Any answers to the open questions above (re. CPU cores,
Did this problem disappear since the last report?
No change, still seeing problematic disk usage. (I assume I shouldn't expect any changes on Radicale's side? The latest release is still 3.1.8 and the Arch package hasn't changed since last year.) I have switched from InfCloud to AgenDAV as my web client, which seems to handle the slowness a bit better in general, at the cost of some edits seemingly not going through if they time out.
Your collection has a size of over 1 GByte on disk? That's huge and potentially an explanation for the high "wait" value caused by long-running I/O. For testing purposes, can you insert an SSD and move the collection? Another way to check the speed would be to temporarily copy the collection to
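The target path was cut off above, but one way to run that comparison without an SSD is to copy the collection into a tmpfs mount and time a full read from both locations. This is only a sketch; /dev/shm and the collection path are assumptions to adjust for your setup:

```python
# Sketch: compare a full read of the collection on disk vs. a tmpfs (RAM) copy.
# The paths below are assumptions; for a fair on-disk number, run with a cold page
# cache (e.g. after "echo 3 > /proc/sys/vm/drop_caches" as root).
import shutil
import time
from pathlib import Path

SRC = Path("/var/lib/radicale/collections")   # assumed collection root
DST = Path("/dev/shm/radicale-collections")   # tmpfs lives in RAM

def read_everything(root: Path) -> float:
    start = time.monotonic()
    total = 0
    for path in root.rglob("*"):
        if path.is_file():
            total += len(path.read_bytes())
    elapsed = time.monotonic() - start
    print(f"{root}: {total / 1e6:.1f} MB read in {elapsed:.1f}s")
    return elapsed

on_disk = read_everything(SRC)                 # measure the disk first, before copying
if DST.exists():
    shutil.rmtree(DST)
shutil.copytree(SRC, DST, symlinks=True)
in_ram = read_everything(DST)
print(f"tmpfs is roughly {on_disk / max(in_ram, 1e-9):.1f}x faster for raw reads")
```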
I suspected the collection was on the large side, which is why I put the collection stats in the original issue and queried it a few times. Unfortunately I don't have a spare SSD I can hook up, but I can try running it in memory for a bit and see how that compares -- I just came across Profile-sync-daemon elsewhere, which might work for this.
What filesystem are you using? If "ext*" and there is a huge number of files in one single directory, this can also be a reason. In that case, try to create an additional partition with "xfs", copy over, and check the behaviour again.
I've currently got the largest calendar symlinked into It is otherwise on ext4, so with the large calendar in the tens of thousands of items and others in the thousands, I'm not sure how much of a hit is taken listing the collection directories (naive Would tidying up the cache help? Or perhaps putting all the caches in
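As a quick check of whether any single directory is large enough for the ext4 concern above to apply, something like the following sketch ranks directories by entry count. The collection-root path is an assumption based on Radicale's default filesystem layout:

```python
# Sketch: report the directories with the most entries under the collection root,
# since very large single directories are what the ext4 concern is about.
# The root path is an assumption; followlinks=True so symlinked calendars are included.
import os
from pathlib import Path

ROOT = Path("/var/lib/radicale/collections/collection-root")  # assumed layout

counts = []
for dirpath, dirnames, filenames in os.walk(ROOT, followlinks=True):
    counts.append((len(filenames) + len(dirnames), dirpath))

for count, dirpath in sorted(counts, reverse=True)[:10]:
    print(f"{count:>8}  {dirpath}")
```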
Hmm, the sync token directory has a large size. You can stop "radicale", remove the ".Radicale.cache" directory completely, and create it on Besides that, I would assume that without using Python's profiling capabilities it's not easy to detect what consumes the time. But you can work with a copy of on a different system to dig further. I would start with that 180-second call from the log you've posted above - and potentially dig into
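To illustrate that suggestion, here's a minimal cProfile sketch run against a copy of the collection. It profiles a plain filesystem scan with SHA-256 hashing as a stand-in for the slow request, not Radicale's own code paths, and the copy location is an assumption:

```python
# Sketch: profile a full scan of a copied collection to see where wall time goes
# (directory listing vs. file reads vs. hashing). This is a stand-in workload only;
# it does not exercise Radicale's own code. The path is an assumption.
import cProfile
import hashlib
import pstats
from pathlib import Path

COPY = Path("/tmp/collections-copy")  # assumed location of the copied collection

def scan(root: Path) -> int:
    hashed = 0
    for path in root.rglob("*.ics"):
        hashlib.sha256(path.read_bytes()).hexdigest()  # mimic item-cache hashing
        hashed += 1
    return hashed

profiler = cProfile.Profile()
profiler.enable()
items = scan(COPY)
profiler.disable()

print(f"hashed {items} items")
pstats.Stats(profiler).sort_stats("cumulative").print_stats(15)
```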
I was evaluating Radicale a few months ago (before the project became active again) for use as a CardDAV server for Contacts synchronisation on iOS/macOS. I had ~12k read-only contacts/users synchronised with ActiveDirectory via a script over the course of a few months, with ~100 changes per day on average. I spent some time profiling Radicale and the bottleneck was always the filesystem-based storage (especially when it lists the files in directories). To make it work in a more or less reliable manner:
Which filesystem type was in use?
This can be related to the filesystem type and its caching settings (long ago this was the reason why "squid" and also "postfix" implemented a multi-subdirectory structure for their file caches/queues).
Filed a separate issue: #1523 (in case anyone wants to contribute and send a PR).
The default is 30 days -- are there any general recommendations about a minimum or maximum?
Thank you for the hint. Created a new Wiki page: https://github.com/Kozea/Radicale/wiki/Performance-tuning
Feel free to contribute.
Can this be closed now?
Unfortunately I haven't had the time to properly dig into this. Last time I checked (on the upgrade to 3.2.0, I think) it was still struggling, so I ended up switching efforts to reducing calendar usage more generally -- these large calendars likely just aren't suitable for Radicale's disk-backed storage, and I've yet to find a database-backed alternative I like and can successfully configure. 🫤 Feel free to close if supporting abnormally-sized calendars isn't a priority; I suspect I'm in the minority here by shoehorning data into calendars!
Please try to reproduce with the optional alternative caching method "use_mtime_and_size_for_item_cache=True" implemented in the upcoming 3.3.2 by #1655. If it improves things, all fine and the issue was caused by reading each item file to calculate its SHA-256 hash before the cache lookup. If not, please reopen.
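For context on what the option changes conceptually, here is a rough sketch of the two cache-key strategies. This is not Radicale's actual implementation, and the function names are made up for illustration:

```python
# Sketch of the two cache-key strategies being compared, not Radicale's actual code.
# The old behaviour must read every item file to hash its content; the new option
# can validate a cache entry from stat() metadata alone, skipping the file read.
import hashlib
import os

def content_hash_key(path: str) -> str:
    # Requires reading the whole file on every cache validation.
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def mtime_and_size_key(path: str) -> str:
    # Only needs a stat() call -- no file read -- at the cost of trusting
    # that mtime+size change whenever the content changes.
    st = os.stat(path)
    return f"{st.st_mtime_ns}:{st.st_size}"
```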
Thanks for the heads up -- I've built a local Arch package for the 3.3.2 tag, restored my large calendars, left the cache migration running (took around 80 minutes on my collection, not sure what the expectation there is), and started the server back up. For now it does seem to be running much more smoothly: InfCloud can load the month view from scratch in 15 seconds, which is a significant improvement over the times in the OP. I'll let it simmer and see how it holds up after a few days/weeks of uptime.
I assume there are too many fsync calls during verification and regeneration of the cache... will dig into how to postpone them.
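Not necessarily how the fix referenced in the next comment works, but the general idea of postponing fsync during a bulk cache rebuild looks roughly like this sketch; the durability trade-off is noted in the comments:

```python
# Sketch of the general idea only -- not the actual change in the referenced commit.
# During a bulk cache rebuild, per-file fsync turns every write into a disk round
# trip; deferring durability to a single sync at the end trades crash safety of
# individual cache entries (which can be regenerated anyway) for throughput.
import os

def write_cache_entry(path: str, data: bytes, fsync_each: bool) -> None:
    with open(path, "wb") as f:
        f.write(data)
        if fsync_each:
            f.flush()
            os.fsync(f.fileno())  # one disk round trip per entry

def rebuild_cache(entries: dict[str, bytes]) -> None:
    for path, data in entries.items():
        write_cache_entry(path, data, fsync_each=False)
    os.sync()  # single flush of dirty pages once the whole rebuild is done
```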
Should be fixed by 4b1183a |
I'm seeing certain requests take minutes to complete. For example, loading my calendars in InfCloud:
I've bumped my nginx proxy timeout to keep the requests working, though it makes things difficult to use with the long delays.
Performance seems to vary somewhat (e.g. give or take a few minutes for a scan of all collections) from day to day despite no obvious difference in server load. Restarting Radicale helps (e.g. reducing the scan time to about 30 seconds initially) for a day or two, but the slowness creeps back in after that.
I'm not sure if I'm at the point where my calendars are just too hefty to work with as files, or if there's another bottleneck here.
Debug log snippet: radicale-2023-02-26.log (NB. different day to the table)
Collection stats
For context, as I'm not sure what constitutes a "large" collection, here are some stats on the collection root (I'm the only user):
Totals: 43695 items, 1.2GB
Server: Radicale 3.1.8 on Arch Linux