Equivalent of Windows folder sync on Linux. #49190

Open
Bockeman opened this issue Nov 10, 2024 · 4 comments
Labels
0. Needs triage Pending check for reproducibility or if it fits our roadmap enhancement

Comments

@Bockeman

How to use GitHub

  • Please use the 👍 reaction to show that you are interested into the same feature.
  • Please don't comment if you have no relevant information to add. It's just extra noise for everyone subscribed to this issue.
  • Subscribe to receive notifications on status change and new comments.

Is your feature request related to a problem? Please describe.
I'd like the folder synchronization to work on Linux like it does on Windows.

The file system on Windows is different from that on Linux, but Nextcloud does a very good job of storing Windows data on a Linux server. Admittedly, all the files on the Linux server are owned by the HTTP server user (such as apache), the permissions are probably coalesced, and Windows shortcuts are ignored.

Importantly, when a Windows client machine restores/fetches/updates files from the Linux server, they all appear to work seamlessly without any complaint from Windows. Under the hood, there may be differences between files created on one Windows machine, sync'd to the Linux server, then sync'd to another Windows machine. But to the user, it just works. A world-class, excellent product.

Now let's suppose the client machine (desktop or laptop) is running Linux (and that SELinux is not running or is at least permissive). The usage example, equivalent to the Windows usage example below, simply does not work. It cannot, because Linux relies upon so many features (ownership, permissions, symbolic links) that are not supported over the current WebDAV (or whatever) interface that Nextcloud uses.

Describe the solution you'd like
I'd like the richer Linux metadata to be encoded somehow and passed over the WebDAV interface. I'm not asking a Windows machine to handle this extra metadata: it should ignore it on download from the server, and not provide it for upload. That holds at least in the first instance; some features, like a Windows shortcut, could be mapped to a Linux symbolic link and vice versa, but even that creates issues which I want to keep out of bounds for this request.
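
As a rough illustration of what "encoded somehow" could mean, here is a minimal Python sketch that gathers a file's owner, group, permission bits, modification time and symlink target into a JSON string. The payload layout and the function names are assumptions made for this example; they are not a Nextcloud or WebDAV format, and extended attributes are left out to keep it short.

```python
import json
import os
import stat


def collect_metadata(path):
    """Gather per-file properties into a plain dict.

    Hypothetical payload layout, for illustration only.
    """
    st = os.lstat(path)  # lstat describes a symlink itself instead of following it
    meta = {
        "uid": st.st_uid,
        "gid": st.st_gid,
        "mode": stat.S_IMODE(st.st_mode),  # permission bits only
        "mtime_ns": st.st_mtime_ns,
        "is_symlink": stat.S_ISLNK(st.st_mode),
    }
    if meta["is_symlink"]:
        meta["link_target"] = os.readlink(path)
    return meta


def encode_metadata(path):
    """Serialise the metadata so it could travel alongside the file content."""
    return json.dumps(collect_metadata(path), sort_keys=True)
```

A client that does not understand the payload (a Windows client, say) could simply ignore it, which matches the "ignore on download, omit on upload" behaviour requested above.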

Describe alternatives you've considered
The Linux "rsync" does an excellent job of keeping the laptops aligned, provided this is a one-way sync, that is, rsync runs from the powered machine to the laptop. But even then, "rsync" is inadequate because it gets tripped up by temporary files (e.g. lock files) and temporary permission changes (only writable when the lock is held). I've experimented with trying to make rsync two-way (e.g. by taking turns in each direction), but this proved to be a disaster. It got even more tangled by the temporary files, and got into a complete mess if files or folders were moved/renamed. It actually turned out to be dangerous, removing folders from both ends. Ok, so my implementation may have had weaknesses, but rsync was not designed for this job: it does not have any conflict resolution mechanism. It is my understanding that Nextcloud handles a number of scenarios, like file in use, and does support conflict resolution.

Additional context
Here is a Windows usage example. In the office, a user has a Windows desktop with multiple large screens, various devices attached or nearby, and possibly network access to data storage servers. The Nextcloud client ensures that all files in selected folders are continuously sync'd with the Linux server. In addition to the typical Document, Picture and Video folders, this might include more exotic folders like AppData/Roaming/Thunderbird, or even a dynamic folder such as GoogleDrive [Yep, I've been syncing shared GoogleDrive storage with Nextcloud successfully for ages]. This user also has a Windows laptop that normally stays at home or is taken to other locations. When the laptop is switched on, Nextcloud sync's all the folders, and (once sync'd) the user can carry on where they left off in the office (barring the office-based hardware). Assuming the user checks that all files have been sync'd before powering off the laptop, upon return to the office the user can resume on the desktop "where they left off" on the laptop. There's an underlying assumption that the user only works on one machine at a time, the laptop or the desktop. But even in the case where the user accidentally leaves an application running, such as an email client, Nextcloud detects the potential conflict and provides a very reasonable way of resolving it.

The obvious Linux usage example is equivalent to the Windows usage example above, but all the client machines (desktop, laptop) are running an equivalent OS, Linux. I am not expecting any useful cross-platform synchronisation (apart from the existing trivial usage of files and folders that does not rely on the richer Linux file system metadata).

Here's a more challenging Linux usage example. The site has a farm of Linux machines and infrastructure. There are also a number of laptops which are normally connected. It is not cost effective to provide UPS for the whole infrastructure, mainly because a power cut is just one of many scenarios that could result in loss of service. When there is a power outage, all the powered Linux machines go down, but the laptops keep running for several hours (long enough for considered manual intervention contingencies). When power is restored, all machines essentially power up to an idle state waiting for manual confirmation (this is so much safer and more convenient than the calamity that is introduced by an uncoordinated power up). Meanwhile, the laptops remain running. It is these laptops that are used to coordinate bringing up the powered systems. If the laptops were running Nextcloud, then they would have an up-to-date record of essential information prior to the power cut, and would be immediately ready to coordinate bringing up the powered systems. Without Nextcloud, the bring-up process is convoluted and takes hours.

I know an awful lot of work is put in to keep Nextcloud functioning correctly and providing ever more powerful features. Thank you all. Sometimes, revisiting the foundations (of the fundamental two-way sync) might not be appealing. But think of the rewards and the potential, such as sync'ing other platforms like MacOS, Android, iOS, and then eventually cross-platform syncing! (There's a messy cross-platform solution where the additional metadata is stored in hidden files on any client machine that does not support particular metadata features -- but I'm sure there's a better solution waiting for someone to invent.)

@Bockeman Bockeman added 0. Needs triage Pending check for reproducibility or if it fits our roadmap enhancement labels Nov 10, 2024
@MrRinkana

  1. File metadata such as creation date, EXIF and such is already synchronised to all clients, no matter the OS (Windows, Linux, Mac, Android, iOS).
  2. Filesystem-specific metadata such as permissions (ownership), where the next block is, SELinux context, eventual checksums (resilient filesystems) and such should not be synchronised, and is not, to any OS. Every client just creates the files with its own user:group (of the local user/machine). It's like that on both Windows and Unix-based systems. It does not make sense, nor will it work, to have the same user:group (or other such filesystem metadata) owning the files on all systems. Wanting that is doing something wrong.
  3. Your text is long and I, at least, can't follow what you want. What exactly are you trying to do that's not working? What exactly do you want added?

To my knowledge the only practical difference between the Linux client and the Windows client is the support for virtual filesystems (files only downloaded when accessed), which is an important feature, but not something you seem to be describing/asking for at all, and that is tracked elsewhere.

@Bockeman
Author

Bockeman commented Nov 16, 2024

@MrRinkana Thank you for responding. Let me address the points you raise:

> 1. File metadata such as creation date, EXIF and such is already synchronised to all clients, no matter the OS (Windows, Linux, Mac, Android, iOS).

I am happy to assume that there is a basic level of synchronisation between clients, whatever the OS. I am going to create a test environment to pin down what is included. I'm concerned not only about creation date, but also about each of the other date properties: last access, last status change and last modification.

> 2. Filesystem-specific metadata such as permissions (ownership), where the next block is, SELinux context, eventual checksums (resilient filesystems) and such should not be synchronised, and is not, to any OS. Every client just creates the files with its own user:group (of the local user/machine). It's like that on both Windows and Unix-based systems. It does not make sense, nor will it work, to have the same user:group (or other such filesystem metadata) owning the files on all systems. Wanting that is doing something wrong.

As stated above "Now lets suppose the client machine (desktop or laptop) is running Linux (and that SELinux is not running or is at least permissive).", I am not expecting synchronisation of SELinux context, nor any filesystem context such as inode, number of hardlinks, next block and similar.

But I am expecting all the properties directly associated with the file itself to be synchronised: owner, group, permissions, all dates, attributes and extended attributes. This is contrary to your statement "should not be synchronised", which I would like to better understand (see below). The acid test is whether any standard Linux command (make, ls -lt, lsattr, tee, ">", cp, ...) would work in exactly the same way across all synchronised clients. There are many programs and UIs that I'd also expect to work seamlessly across synchronised clients, including IDEs, revision control systems, backups and snapshots.
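
One way to make that acid test concrete is to compare the properties of the same file on two synchronised copies. The following Python sketch is only an assumption about how such a check could look (the function name is made up for this example); it covers owner, group, permission bits and modification time, and leaves out attributes and extended attributes.

```python
import os
import stat


def metadata_differences(path_a, path_b):
    """Return the names of the properties that differ between two copies of a file."""
    a, b = os.lstat(path_a), os.lstat(path_b)
    checks = {
        "owner": (a.st_uid, b.st_uid),
        "group": (a.st_gid, b.st_gid),
        "permissions": (stat.S_IMODE(a.st_mode), stat.S_IMODE(b.st_mode)),
        "mtime": (a.st_mtime_ns, b.st_mtime_ns),
    }
    # Keep only the properties whose values disagree, in the order listed above.
    return [name for name, (x, y) in checks.items() if x != y]
```

An empty list means the two copies agree on everything checked; anything else names the properties a sync would have to carry across for tools like make or ls -lt to behave identically on both sides.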

There is an implication here: all the synchronised clients have the same OS (or possibly a compatible OS, such as Linux and MacOS). Standard Linux command interoperability across different OSes (e.g. Linux - Windows) is not required and does not make sense; but ideally, should files be synchronised across incompatible OSes, the incompatible metadata would somehow be preserved.

> 3. Your text is long and I, at least, can't follow what you want. What exactly are you trying to do that's not working? What exactly do you want added?

As stated above: "I'd like the folder synchronization to work on Linux like it does on Windows.".

And to be clear, that means owner:group, all date-times, permissions, attributes and extended attributes are required to make things work on Linux.

> To my knowledge the only practical difference between the Linux client and the Windows client is the support for virtual filesystems (files only downloaded when accessed), which is an important feature, but not something you seem to be describing/asking for at all, and that is tracked elsewhere.

Agreed. I am not interested in virtual filesystems at this time.

What should not be synchronised
You mentioned that some things (owner:group ...) should not be synchronised. I'd like to better understand what you mean by this.

[Please put aside the nuances of the English language for "should not", "must not", "may not", "cannot" and similar, as these may not be clear to many readers.]

Owner:group are not synchronised at present. I want them to be synchronised. What fundamental reasons do you have that might explain the technical feasibility or infeasibility of implementing synchronisation of such properties?

@MrRinkana

You don't have to worry about creation or modification time. Last access is commonly ignored on filesystems, or done lazily, but that's not related to Nextcloud.

I don't understand why you would want the owner:group to be synchronised. What is the use-case?

The tools like "make, cp, ls, > etc" will work the same on different systems with different or same users, exactly because the owner properties are not synchronised, but rather the files are owned by the user running the sync client on each system - allowing that user to read, list and modify the files. You will not get consistent behaviour if you sync the user:group, see example below.

As an example, user Bob:Bob on systemA might have uid:gid 123:123, but on systemB Bob:Bob might have uid:gid 124:164 because the accounts were created at different times or on different OS versions, or whatever. On Bob's laptop the user might be "Bobby". Bob's phone will have a bunch of different users, none chosen by Bob. Syncing the owner and group would mean that Bob can't open the files anywhere except on the original system, unless he uses the root account.

To make it worse, what if Bob wants to share a file with Anna? Anna's user on her system will definitely not be "Bob", making her unable to open it.
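
The mismatch can be sketched in a few lines of Python. The account tables below are hypothetical: the uids 123 and 124 follow the Bob example above, while the "backup" account on the second machine is invented here purely to show a numeric uid landing on the wrong account.

```python
# Hypothetical account tables for two machines (name -> uid).
SYSTEM_A = {"bob": 123}
SYSTEM_B = {"bob": 124, "backup": 123}


def owner_name(uid, accounts):
    """Look up which account name a numeric uid belongs to on one machine."""
    for name, account_uid in accounts.items():
        if account_uid == uid:
            return name
    return None


def is_owner(file_uid, user, accounts):
    """True if `user` is the owner of a file carrying `file_uid` on this machine."""
    return accounts.get(user) == file_uid


# A file created by bob on system A, with its numeric uid copied verbatim.
file_uid = SYSTEM_A["bob"]
```

With these tables, `is_owner(file_uid, "bob", SYSTEM_A)` holds, but on the second machine the same number belongs to a different account, so `is_owner(file_uid, "bob", SYSTEM_B)` does not.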

On the server the user account will be the web-server user, commonly "wwwrun". wwwrun must be the owner of the files so that it can serve them, and no - the web server can't run as multiple users just so that Bob, Anna, Elsa, Tom etc. can have their files owned by their own uids and gids.

Do you see how keeping the file's ownership and group does not make sense?

Btw, ACID compliance is a term from database resiliency; it is not about command line tools behaving similarly.

I still heavily doubt you have a valid use-case for syncing the user:groups, but if you're running some HA kubernetes or whatever docker swarm clusters where you can be sure of the users being identical across different nodes, then you're not looking for something like a cloud solution. Take instead a look at Ceph, GlusterFS or similar.

What you are requesting (synchronised user:groups) is not reasonable, nor possible for Nextcloud to support. This issue should be closed.

To be clear, it does not work like that on windows either, nor does any other cloud solution do it that way. Not even things like SMB synchronise ownership and group. Things will break if you synchronise ownership across different systems, not the opposite of being required to work.

@Bockeman
Author

@MrRinkana Thank you again for your comprehensive response. Let me address the points you raise:

> You don't have to worry about creation or modification time.

I think what you are saying is "Nextcloud folder synchronisation correctly handles creation-time and modification-time, so this is not a problem for folder sync on Linux [ie no need to worry about it]."

In principle I agree. Modification-time is important. The obvious use case is when the folder(s) contain(s) files that are relevant for make. A more subtle use case is when files are being updated more rapidly than can be uploaded from the client to the server; if nextcloud does not get this right, then older files could be downloaded from the server and overwrite a younger file on the client and/or create a conflict. However, although this is a real and observed problem (false conflicts), it applies to all OS's and is off-topic for this thread.
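
To picture why modification-time matters for the sync decision, here is a deliberately simplified Python sketch. It is an assumption made for illustration, not Nextcloud's actual algorithm (which this thread does not describe): each side's modification time is compared with the value recorded at the last successful sync.

```python
def classify(local_mtime, remote_mtime, last_synced_mtime):
    """Decide what a sync pass would do with one file, based only on mtimes."""
    local_changed = local_mtime != last_synced_mtime
    remote_changed = remote_mtime != last_synced_mtime
    if local_changed and remote_changed:
        return "conflict"  # both sides moved on since the last sync
    if local_changed:
        return "upload"    # the local copy is the one that changed
    if remote_changed:
        return "download"  # the server copy is the one that changed
    return "in_sync"
```

In this model, a file edited locally after the last sync is uploaded and never overwritten by the older server copy; the false conflicts mentioned above would arise when a second local edit lands before the first upload has been recorded.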

Having established that modification-time is important, there is another implementation feature that gets in the way: modification-time appears to be stored in two places. The visible storage is the modification-time of the file itself, and different OS's potentially handle such metadata/attributes differently, so Nextcloud must handle this correctly [I think it does]. The second place where the modification-time is stored is the MySQL database. I know this because on a client I attempted to synchronise files in a Linux folder where permissions on the server allowed file write/append, but attribute changes were denied. [Errors were generated, but hidden in log files and not adequately propagated to the user.] As a result, when the file was viewed via the web interface (which I assume interrogates the database) the file was reported as changed n hours ago, but the file attributes on the server showed an ancient modification-date. Because of this phenomenon I do worry about modification-time.

This particular problem (file modification-time different from database) was overcome by changing the owner:group and permissions of the folder and the files on the server, or something like that. I put this down to an isolated issue when a large amount of data was added directly to the server [followed by an occ files:scan --all] rather than being uploaded to the server from a nextcloud client. This does illustrate my concern with owner:group. I guess this bizarre situation could occur regardless of client OS and so is off-topic for this thread.

> Last access is commonly ignored on filesystems, or done lazily, but that's not related to Nextcloud.

Agreed.

> I don't understand why you would want the owner:group to be synchronised. What is the use-case?

> The tools like "make, cp, ls, > etc" will work the same on different systems with different or same users, exactly because the owner properties are not synchronised, but rather the files are owned by the user running the sync client on each system - allowing that user to read, list and modify the files. You will not get consistent behaviour if you sync the user:group, see example below.

> As an example, user Bob:Bob on systemA might have uid:gid 123:123, but on systemB Bob:Bob might have uid:gid 124:164 because the accounts were created at different times or on different OS versions, or whatever. On Bob's laptop the user might be "Bobby". Bob's phone will have a bunch of different users, none chosen by Bob. Syncing the owner and group would mean that Bob can't open the files anywhere except on the original system, unless he uses the root account.

> To make it worse, what if Bob wants to share a file with Anna? Anna's user on her system will definitely not be "Bob", making her unable to open it.

> On the server the user account will be the web-server user, commonly "wwwrun". wwwrun must be the owner of the files so that it can serve them, and no - the web server can't run as multiple users just so that Bob, Anna, Elsa, Tom etc. can have their files owned by their own uids and gids.

> Do you see how keeping the file's ownership and group does not make sense?

Thank you for your thoughts. You have caused me to re-evaluate how nextcloud is used/deployed. I have subsequently considered distinguishing between two broad scenarios:

  1. Each user (whatever OS) runs a nextcloud client on their machine. Client files are owned by the user running the nextcloud client. All of your arguments above apply. This makes perfect sense on a single user machine (typically Windows), but also works for a multi-user machine (typically Linux) provided each user runs a separate nextcloud client.
  2. Compute, storage and other resources are "pooled". For compute resources, each user logs in and has processes running on whatever machines are appropriate, but accessed via a single humble/dumb machine (desktop or laptop). Storage is provided via an appropriate file system, such as GlusterFS, and owner:group as well as mode (file permissions) are important. Also imagine user and non-user tools accessing this filesystem, from revision control through backup and snapshot. One such tool might be nextcloud which keeps selected folders on one filesystem (e.g. local storage on a laptop) synchronised with the corresponding folders on another filesystem (e.g. the pooled storage resource like GlusterFS). To make this work, the nextcloud client would need to be a superuser, and would need to replicate all of the meta-data for each file/folder (owner:group, permissions, attributes, extended attributes ...).

Some examples that fall into scenario 2.

  • A laptop that is taken off-site. A copy of selected folders of the multi-user filesystem is synchronised to the laptop local drive(s).
  • A laptop or server supported by UPS. Selected folders locally synchronised to maintain critical services.
  • A compute machine with fast local storage. Selected folders locally synchronised for performance.

In brief the scenarios are

  1. User synchronisation
  2. Filesystem synchronisation

> Btw, ACID compliance is a term from database resiliency; it is not about command line tools behaving similarly.

> I still heavily doubt you have a valid use-case for syncing the user:groups, but if you're running some HA kubernetes or whatever docker swarm clusters where you can be sure of the users being identical across different nodes, then you're not looking for something like a cloud solution. Take instead a look at Ceph, GlusterFS or similar.

I'm using GlusterFS. I looked at Ceph. UID/GID is identical across all nodes.

> What you are requesting (synchronised user:groups) is not reasonable, nor possible for Nextcloud to support. This issue should be closed.

I respect your opinion/position that synchronised user:groups is not reasonable, but I think I have outlined above a scenario and examples where, in my mind, what I am asking for is reasonable. I have to accept your better judgement that this might not be possible with nextcloud. But I'd like to keep the issue open in case someone does come up with a workable solution.

> To be clear, it does not work like that on windows either, nor does any other cloud solution do it that way. Not even things like SMB synchronise ownership and group. Things will break if you synchronise ownership across different systems, not the opposite of being required to work.

I'm using SMB and have to pay careful attention to the configuration to get owner:group and permissions right, so that the limited required access on either the Windows side or the Linux side works as desired.

Conclusion

For a single user, I'd like all the other non-user:group properties to be synchronised. (For example: permissions - some revision control systems change permissions to read-only or just user write, according to lock granted). I intend to run some tests to check current nextcloud capability.

I accept that owner:group does not fit into the current nextcloud synchronisation implementation. Though I note that nextcloud is not completely agnostic to owner:group, for example the --all option (for all users) in some occ commands.

For filesystem synchronisation, I think there might be a workaround. Each client node (containing a local filesystem that needs to be synchronised with the server) runs a nextcloud client process for each required user. All the files on the server are owned by the server process (wwwrun, apache, ...). The configuration for each user, e.g. ignored files, ensures that only that user's files are synchronised.

  • This is horrible and open to abuse, etc. But would work where users trust each other.
  • Consider this as a "seed" idea intended to stimulate discussion or even a workable solution.
  • I have not ruled out the possibility that the server process is a superuser and able to amend/keep owner:group according to client properties (assuming UID/GID alignment).
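
To make the per-user workaround a little more tangible, here is a minimal Python sketch that splits a folder tree by owning uid, so that each user's client could be told which paths to leave alone. The function names are made up for this example, and how such a list would actually be fed into the nextcloud client's ignore configuration is an assumption that is not worked out here.

```python
import os
from collections import defaultdict


def files_by_owner(root):
    """Map each owning uid to the relative paths of the files it owns under root."""
    owned = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            owned[os.lstat(full).st_uid].append(os.path.relpath(full, root))
    return {uid: sorted(paths) for uid, paths in owned.items()}


def paths_to_ignore(root, uid):
    """Relative paths that a client running as `uid` would skip (owned by others)."""
    return sorted(
        path
        for owner, paths in files_by_owner(root).items()
        if owner != uid
        for path in paths
    )
```

Each user's client would then sync everything under the shared root except the result of `paths_to_ignore(root, that_user_uid)`. The list would need regenerating whenever ownership changes, which is part of why this remains only a seed idea.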


2 participants