[Enhancement] Speed up server import process #1597
Comments
The way we check for duplicates is straightforward: A lighter way to check that would be to check the file name and verify that it does not already exist in the database. Proposition:
|
That could work. Skipping the hash computation might reduce the import time quite a bit. |
Each image in Lychee is associated with a row in the photos table in the database. There are multiple things that can take time:
To give you an idea, this is an illustration of the time required to access data on specific parts of your computer.
Even if those two may seem relatively fast as a single event, when you do a one-by-one process it will still be slow in the end. Unless we use a different strategy for such processing, the only gains to be made are from optimizing the sequential process. |
Yeah, if multi-threading was available, I'm sure quite a few things could be sped up. How much do you think disk speed determines the import process? In general, the different factors would be:
I wonder now what would have the most impact. |
Hello, I just wanted to add my 2 cents about this issue. I have a similar problem where syncing the files takes forever, not because of the number of files (which I am aware would increase the sync time no matter what) but because of the big files I host. I have quite a lot of files that are above 200MB each, and the CPU the server uses is just an Intel Celeron. Computing the hash of those big files takes an enormous amount of time, so much that I would consider it a waste of time and energy (in these hard times when power costs much more). I think just checking the names instead of the checksum would save a huge amount of time and energy (at the cost of Lychee not detecting real file changes, but my files are not subject to such changes, so I could afford that). If accuracy is an issue, maybe saving the exact number of bytes per file in the database and comparing it to the real files would be more efficient and a bit more accurate? I would think it is extremely unlikely that a modified file would have the exact same number of bytes as the previous one (but maybe I'm wrong). I started working on a dirty bash script for this purpose (that uses the API), and sadly I do not know PHP well enough to even try adding this functionality myself. Not having such a feature is really blocking for me, as I cannot start using Lychee at all unless I decide to add albums by hand, which would be really tedious... Everything is hosted on a SATA SSD, so it should be fast enough. |
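The name-plus-size check suggested in the comment above could look roughly like the following Python sketch. This is not Lychee's code (Lychee is written in PHP); `needs_import` is a hypothetical helper, and the `known_sizes` mapping is an assumed stand-in for a database column holding the byte count recorded at the last import.

```python
import os
import tempfile


def needs_import(path: str, known_sizes: dict[str, int]) -> bool:
    """Return True if the file looks new or changed, judged by name and size only.

    known_sizes is a hypothetical stand-in for a database lookup that maps
    file name -> byte count recorded at the last import.
    """
    name = os.path.basename(path)
    recorded = known_sizes.get(name)
    if recorded is None:
        return True  # name never seen before: treat as new
    # getsize reads file metadata only; the file contents are never read.
    return os.path.getsize(path) != recorded


def demo() -> tuple[bool, bool, bool]:
    """Check one 1024-byte file against three different 'database' states."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "photo.jpg")
        with open(path, "wb") as fh:
            fh.write(b"x" * 1024)
        unseen = needs_import(path, {})
        unchanged = needs_import(path, {"photo.jpg": 1024})
        resized = needs_import(path, {"photo.jpg": 2048})
        return unseen, unchanged, resized
```

The trade-off is the one already mentioned: an edit that keeps the byte count identical goes unnoticed, whereas a checksum would catch it at the cost of reading the whole file.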
Yeah, I'm having trouble importing large folders (a thousand or two images). Every time it hangs or the session times out, it has to start over. It seems like it should be able to track what was last processed and only restart for new items? Or for items added since the last run date? |
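One way to picture "track what was last processed" is an append-only checkpoint file, sketched below in Python. This is purely illustrative and not how Lychee works: `import_with_checkpoint`, `load_done`, and the `import_one` callback are hypothetical names, and the sketch assumes paths contain no newline characters.

```python
import os
import tempfile


def load_done(checkpoint: str) -> set[str]:
    """Read the set of already-imported paths (one path per line)."""
    if not os.path.exists(checkpoint):
        return set()
    with open(checkpoint, encoding="utf-8") as fh:
        return {line.rstrip("\n") for line in fh}


def import_with_checkpoint(paths, checkpoint, import_one):
    """Import each path once, recording successes so a rerun can resume."""
    done = load_done(checkpoint)
    processed = []
    with open(checkpoint, "a", encoding="utf-8") as log:
        for path in paths:
            if path in done:
                continue  # finished in an earlier run
            import_one(path)  # may raise, e.g. on a timeout
            log.write(path + "\n")
            log.flush()  # hand the record to the OS before moving on
            done.add(path)
            processed.append(path)
    return processed


def demo() -> list[str]:
    """Simulate a run that dies on the third file, then a second run."""
    with tempfile.TemporaryDirectory() as tmp:
        checkpoint = os.path.join(tmp, "done.txt")
        paths = ["a.jpg", "b.jpg", "c.jpg"]

        def flaky(path):
            if path == "c.jpg":
                raise TimeoutError("simulated session timeout")

        try:
            import_with_checkpoint(paths, checkpoint, flaky)
        except TimeoutError:
            pass
        # The second run skips a.jpg and b.jpg and only handles c.jpg.
        return import_with_checkpoint(paths, checkpoint, lambda p: None)
```

Because a path is logged only after its import succeeds, an interrupted run repeats at most the one file that was in flight.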
@bushibot Your use case would be better suited by using command line: https://github.com/LycheeOrg/Lychee/blob/master/app/Console/Commands/Sync.php |
I’d like to understand more about how to do that. It doesn’t seem quite as straightforward as opening up the console and typing sync.php. Getting data in has already turned into a multi-day project and it’s only getting worse… running the CLI in the background might make it more robust 😝 |
This should hopefully cover it: |
Cool, not sure how I missed that but thanks. |
Or not so easy. I'm running the Unraid Docker image. I opened a console to test but got back
|
To solve this problem, run the command from the directory that contains the 'artisan' file and pass the absolute path of the directory to be imported, like the following: root@038a06b0c059:/var/www/html/Lychee# php artisan lychee:sync /uploads/import/ |
To give some context, I have a photo library of around 350k photos and videos.
Luckily, Lychee can import things per symlink and skip duplicates, which makes it good for having these photos in just one location. The problem is the repeated importing. At least every two weeks, there are multiple new photos and videos added to that library. To show them in Lychee, an import from the server has to be run, which skips duplicates and makes symlinks.
The problem here is that checking for duplicates takes too long. With the current implementation, it would take hours upon hours of just going through everything to see if it's already present and in the same condition.
I'm not sure what exactly is done during the duplicate skip, but if it's just checking whether the file already exists, it seems to be quite inefficient at that.
If it does more, can that “more” be disabled? I don't “change” already present content, so the only thing that has to be checked for is new file paths.
I hope there is some solution to that; otherwise, it makes it difficult to use this tool for libraries of this size.
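To make the "only check for new file paths" idea concrete, here is a minimal Python sketch. It is an illustration under assumptions, not Lychee's implementation: `find_new_files` is a hypothetical helper, and `known_paths` stands in for the set of source paths already recorded in the database.

```python
import os
import tempfile


def find_new_files(root: str, known_paths: set[str]) -> list[str]:
    """Walk root and return the files whose full path is not yet known.

    Only directory listings are read; no file is opened or hashed.
    """
    new_files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if full not in known_paths:  # set lookup, O(1) on average
                new_files.append(full)
    return sorted(new_files)


def demo() -> list[str]:
    """Create one already-known and one new file, then list the new ones."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "2023"))
        old = os.path.join(root, "2023", "old.jpg")
        new = os.path.join(root, "2023", "new.jpg")
        for path in (old, new):
            open(path, "wb").close()
        found = find_new_files(root, {old})
        return [os.path.relpath(p, root) for p in found]
```

With this approach the cost of a repeated import scales with the directory listing, not with the total size of the files, but content changes under an unchanged path are not detected.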