Syncing the current folder structure across separate acquisition machines. #373

Open
JoeZiminski opened this issue Apr 25, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@JoeZiminski
Member

A problem with acquiring data across machines is that the subject / session numbers can get out of 'alignment'. For example, you may have two acquisition machines (ephys, behaviour). On one, you accidentally create a 'sub-002' that is empty. Now when you automatically get the next subject, on that machine it is 'sub-003' and on the other it is 'sub-002'.
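
A minimal sketch of how the numbers drift apart, assuming the next subject is derived from the highest existing `sub-` number. `next_sub` is a hypothetical helper written for illustration, not datashuttle's own implementation:

```python
import re


def next_sub(folder_names):
    """Suggest the next 'sub-XXX' name from a list of existing folder names.

    Hypothetical helper for illustration only.
    """
    numbers = [
        int(match.group(1))
        for name in folder_names
        if (match := re.match(r"sub-(\d+)", name))
    ]
    next_number = max(numbers, default=0) + 1
    return f"sub-{next_number:03d}"


# Folder names as seen on each acquisition machine.
ephys = ["sub-001", "sub-002"]  # 'sub-002' was created by accident and is empty
behaviour = ["sub-001"]

print(next_sub(ephys))              # sub-003
print(next_sub(behaviour))          # sub-002
print(next_sub(ephys + behaviour))  # sub-003
```

The last call shows why a shared view of all folders helps: once both machines consult the same combined list, they receive the same suggestion.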

This is a general problem of acquisition pipelines across machines and not datashuttle-specific. However, it would be nice to include as many protections against it, and ways to mitigate it, in datashuttle as possible.

Some ways to handle this are:

  1. The current way, which is sub-optimal. When data is transferred from the local machines to central, central contains the most up-to-date project. When using get_next_sub or ses, this (by default) includes central, and so the correct recommendations should be made. However, it is not reasonable to expect that data across all acquisition machines is transferred immediately after the session is finished (e.g. there might be two mice back to back).

  2. A layer of protection can be added quite easily by writing a metadata file to hidden .datashuttle folders at the subject / session level whenever a folder is created. This could include the time at which the folder was created. Then checks can be run: for example, if ses-002 and ses-003 were created within 1 minute of each other, something is probably wrong.

  3. The best solution will be to write metadata detailing the local folder structure and send this to central immediately (i.e. when folders are created). Then central can always contain a total overview of folders on the project at any time. This could be updated either when (a) folders are created or (b) data is transferred. This is a bigger job and a lot of care will have to be taken to consider the many possible edge cases. It will be necessary to expose the central directory tree through the GUI / through a datashuttle method that prints the file tree.
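
The timestamp check in option 2 could look roughly like the sketch below. The metadata file name `creation.json`, its JSON layout and all function names are assumptions made for illustration; only the hidden `.datashuttle` folder and the 1-minute threshold come from the proposal above.

```python
import json
import tempfile
from datetime import datetime, timedelta, timezone
from pathlib import Path

META_DIR = ".datashuttle"    # hidden folder named in the proposal
META_FILE = "creation.json"  # hypothetical file name


def make_folder_with_metadata(folder, created=None):
    """Create `folder` and record its creation time in a hidden metadata file."""
    created = created or datetime.now(timezone.utc)
    meta_dir = Path(folder) / META_DIR
    meta_dir.mkdir(parents=True, exist_ok=True)
    (meta_dir / META_FILE).write_text(json.dumps({"created": created.isoformat()}))


def read_created(folder):
    """Read back the recorded creation time of `folder`."""
    data = json.loads((Path(folder) / META_DIR / META_FILE).read_text())
    return datetime.fromisoformat(data["created"])


def suspicious_pairs(parent, prefix="ses-", min_gap=timedelta(minutes=1)):
    """Return consecutive folders whose creation times are closer than `min_gap`."""
    folders = sorted(
        p for p in Path(parent).glob(f"{prefix}*")
        if (p / META_DIR / META_FILE).is_file()
    )
    pairs = []
    for earlier, later in zip(folders, folders[1:]):
        if abs(read_created(later) - read_created(earlier)) < min_gap:
            pairs.append((earlier.name, later.name))
    return pairs


with tempfile.TemporaryDirectory() as tmp:
    sub = Path(tmp) / "sub-001"
    t0 = datetime(2024, 4, 25, 10, 0, tzinfo=timezone.utc)
    make_folder_with_metadata(sub / "ses-001", t0)
    make_folder_with_metadata(sub / "ses-002", t0 + timedelta(hours=24))
    # Created 20 seconds after ses-002, so likely a mistake.
    make_folder_with_metadata(sub / "ses-003", t0 + timedelta(hours=24, seconds=20))
    flagged = suspicious_pairs(sub)
    print(flagged)  # [('ses-002', 'ses-003')]
```

Storing the time in a file rather than relying on filesystem timestamps keeps the check independent of how each operating system reports folder creation times.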

JoeZiminski added the enhancement label Apr 25, 2024
@adamltyson
Member

Some kind of "sync metadata" function would be great. Users could certainly run this before/after every session, and then sync the data itself later. As you say though, a lot of work.

@niksirbi
Member

I agree that the 3rd option, albeit more difficult, is the only permanent solution to this. We need to sync a representation of the project's tree structure without the actual files.
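
One possible shape for such a representation, as a rough sketch: each machine writes a JSON list of its folder paths (no files) to a shared location, and the lists are merged into one overview. The `<machine>_tree.json` naming, the location of the snapshot folder on central and all function names are hypothetical, and none of this is existing datashuttle API.

```python
import json
import tempfile
from pathlib import Path


def snapshot_tree(project_root):
    """Return sorted relative paths of all folders under `project_root`, skipping files."""
    root = Path(project_root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir())


def write_snapshot(project_root, central_meta_dir, machine_name):
    """Write this machine's folder snapshot to the shared metadata folder."""
    out = Path(central_meta_dir) / f"{machine_name}_tree.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(snapshot_tree(project_root), indent=2))
    return out


def merged_overview(central_meta_dir):
    """Combine the snapshots from every machine into one sorted folder list."""
    folders = set()
    for snapshot in Path(central_meta_dir).glob("*_tree.json"):
        folders.update(json.loads(snapshot.read_text()))
    return sorted(folders)


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    ephys, behaviour = tmp / "ephys", tmp / "behaviour"
    central_meta = tmp / "central" / ".datashuttle"  # hypothetical location

    (ephys / "rawdata" / "sub-001" / "ses-001").mkdir(parents=True)
    (ephys / "rawdata" / "sub-001" / "ses-001" / "recording.bin").write_bytes(b"\x00")
    (ephys / "rawdata" / "sub-002").mkdir(parents=True)  # accidental, empty
    (behaviour / "rawdata" / "sub-001" / "ses-001").mkdir(parents=True)

    write_snapshot(ephys, central_meta, "ephys")
    write_snapshot(behaviour, central_meta, "behaviour")
    overview = merged_overview(central_meta)
    print(overview)
    # ['rawdata', 'rawdata/sub-001', 'rawdata/sub-001/ses-001', 'rawdata/sub-002']
```

Because only folder names are written, the snapshot stays small enough to send on every folder creation, while the data itself can be transferred later.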
