Skip to content
This repository has been archived by the owner on Oct 4, 2024. It is now read-only.

Downloaded media files lose case in filenames if downloading albums without specifying any album filters #400

Open
mrjo118 opened this issue Dec 2, 2022 · 11 comments
Assignees
Labels

Comments

@mrjo118
Copy link

mrjo118 commented Dec 2, 2022

I was using gphotos-sync to download everything from my Google Photo account a couple of months ago and noticed this issue (I was using release 3.0.3 at the time).

If I run gphotos-sync without any filters on album or album-regex I found that all the files which get downloaded have their filenames converted to all lowercase. I noticed however when downloading a single album using --album or --album-regex, the case of the filenames are correctly retained on the downloaded media. In order to download my whole collection of photos and videos in a single run, I had to work around the issue by using --album-regex "^(.*)$" to effectively use a regex which matched all albums

@mrjo118 mrjo118 changed the title Media files lose case in filenames if downloading albums without specifying any filters Downloaded media files lose case in filenames if downloading albums without specifying any album filters Dec 2, 2022
@gilesknap
Copy link
Owner

I am bemused by this. There is an algorithm that adds '(2)' to the second instance of the same filename and this is affected by the folder structure as duplicate filenames in different folders are OK. But I can't immediately see why using the album reg would change that. I'll have a review of the code and see if I can see anything

@mrjo118
Copy link
Author

mrjo118 commented Jan 5, 2023

Might it be related to this? (line 135/258):
https://github.com/gilesknap/gphotos-sync/blob/main/src/gphotos_sync/GooglePhotosDownload.py#L135

I didn't check where settings.case_insensitive_fs is derived, but I am using Windows

@mrjo118
Copy link
Author

mrjo118 commented Jan 5, 2023

Having a deeper dig in the code I can see that when indexing the media it will take 2 different routes in the code depending on whether you specify --album-regex or not (or other varients of arguments).
-If you specify --album-regex it will go down the route of google_albums_sync.index_album_media
-If you don't specify --album-regex it will go down the route of google_photos_idx.index_photos_media

For the index_photos_media route, this will consider the value of case_insensitive_fs and set the to_lower attribute which eventually converts the filename to all lowercase if set to true or if unspecified and the derived value is true (which is the case for windows). While the album route does not utilise the case_insensitive_fs value at all.

The question is, is it necessary to convert the filename to all lowercase if the filesystem is detected as case insensitive? If so, why does the album route work without doing so, or is there a bug for an edge case in the album route?

@gilesknap
Copy link
Owner

Thanks for the detailed report.

When the filesystem is case insensitive we do need to change the case internally because the filename clash detection will fail otherwise. Therefore it is a bug if we don't do this in album names, however album names with just a case difference are reasonably unlikely so I have got away with it so far.

For you original issue I'm guessing that the code thinks you have a case insensitive FS when you don't? Usually windows is case insensitive but it does support other filesystems.

@mrjo118
Copy link
Author

mrjo118 commented Jan 5, 2023

The thing with Windows is the default behaviour is you're not allowed to have files with the same name but in different case (therefore it fails this check https://github.com/gilesknap/gphotos-sync/blob/main/src/gphotos_sync/Checks.py#L108), yet the case of filenames when naming a file is retained.

To be honest it's not ideal to be converting everything to lowercase in the final output, but I appreciate it could be a load of work to change the algorithm for filename conflict detection to retain the original case of filenames on a case insensitive filesystem.

If you didn't fancy changing that then I think it would be best to fix the GoogleAlbumsSync route to consider the case_insensitive_fs flag, and then document somewhere clearly that if the filesystem is case insensitive everything will be converted to lowercase, unless you pass '--case_insensitive_fs false' explicitly, with the disclaimer that doing so will mean files with the same filename in different case on Google Photo will overwrite each other

@gilesknap
Copy link
Owner

gilesknap commented Jan 5, 2023

re Windows filesystem: I'm not sure I knew that - or at least if I did then I made a bad choice on how to handle these filesystems when I wrote the clash handling.

I think we'd have to go with the latter fix you suggest.

I should have made the clash check case insensitive and preserve case in filenames. But, to make this change now would cause loads of updates to existing syncs and sorting out how to resolve those would be really fiddly.

@gilesknap
Copy link
Owner

Also the line 108 check covers other file systems that are fully case insensitive - not sure if they exist but I would not be surprised.

@gilesknap
Copy link
Owner

I'm going through issues today and getting a release out.

I'm putting this one on hold for a later release as its going to break lots of system tests and take some time to sort out.

@gilesknap gilesknap self-assigned this Mar 11, 2023
@gilesknap gilesknap added the bug label Mar 11, 2023
@hubert3
Copy link

hubert3 commented Jun 14, 2023

Also hitting this bug, running on macOS installed version 3.1.2 from pip today, the app indexes 732 photos but saves 1462, saving a duplicate for each file ending in (2)

06-15 00:04:19 WARNING  Indexing Google Photos Files ...
06-15 00:04:27 WARNING  indexed 732 items
06-15 00:04:27 WARNING  Downloading Photos ...
06-15 00:05:11 WARNING  Downloaded 732 Items, Failed 0, Already Downloaded 0
06-15 00:05:11 WARNING  Indexing Shared (titled) Albums ...
06-15 00:05:31 WARNING  Indexed 2 Shared (titled) Albums
06-15 00:05:31 WARNING  Indexing Albums ...
06-15 00:05:32 WARNING  Indexed 0 Albums
06-15 00:05:32 WARNING  Downloading Photos ...
06-15 00:06:47 WARNING  Downloaded 1462 Items, Failed 0, Already Downloaded 732
06-15 00:06:47 WARNING  Creating album folder links to media ...
06-15 00:06:47 WARNING  Created 730 new album folder links
06-15 00:06:47 WARNING  Done.

Using --album-regex "^(.*)$" as a workaround:

06-15 00:07:35 WARNING  Downloaded 0 Items, Failed 0, Already Downloaded 0
06-15 00:07:35 WARNING  Indexing Shared (titled) Albums ...
06-15 00:07:55 WARNING  Indexed 2 Shared (titled) Albums
06-15 00:07:55 WARNING  Indexing Albums ...
06-15 00:07:55 WARNING  Indexed 0 Albums
06-15 00:07:55 WARNING  Downloading Photos ...
06-15 00:09:10 WARNING  Downloaded 730 Items, Failed 0, Already Downloaded 0
06-15 00:09:10 WARNING  Creating album folder links to media ...
06-15 00:09:10 WARNING  Created 730 new album folder links
06-15 00:09:10 WARNING  Done.```

@gilesknap
Copy link
Owner

@hubert3 thanks for the report. I'll try to take a look at this again soon.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

3 participants