A problem with acquiring data across machines is that the subject / session numbers can get out of 'alignment'. For example, you may have two acquisition machines (ephys, behaviour). On one, you accidentally create a 'sub-002' that is empty. Now when you automatically get the next subject, on one machine it is 'sub-003' and on the other 'sub-002'.
This is a general problem of acquisition pipelines across machines and not datashuttle-specific. However, it would be nice for datashuttle to include as many protections against this, and ways to mitigate it, as possible.
Some ways to handle this are:
The current way, which is sub-optimal. When data is transferred from the local machines to central, central contains the most up-to-date project. `get_next_sub` or `get_next_ses` (by default) include central, so the correct recommendations should be made. However, it is not reasonable to expect that data from all acquisition machines is transferred immediately after each session finishes (e.g. you might run two mice back to back). The idea is sketched below.
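For illustration only, a minimal sketch of the underlying idea (not the actual datashuttle implementation): infer the next subject number by scanning `sub-*` folders in both the local and central project paths. The paths, the zero-padding width, and direct filesystem access to central are all assumptions here.

```python
import re
from pathlib import Path

def next_sub(local_root: Path, central_root: Path, pad: int = 3) -> str:
    """Return the next sub-XXX name, considering both local and central folders."""
    nums = []
    for root in (local_root, central_root):
        if not root.is_dir():
            continue
        for folder in root.glob("sub-*"):
            match = re.fullmatch(r"sub-(\d+)", folder.name)
            if folder.is_dir() and match:
                nums.append(int(match.group(1)))
    return f"sub-{(max(nums) + 1 if nums else 1):0{pad}d}"

# e.g. next_sub(Path("~/my_project/rawdata").expanduser(),
#               Path("/mnt/central/my_project/rawdata"))  # hypothetical paths
```

If central is not reachable at the time (the scenario described above), only the local folders are seen and the numbers can still drift apart.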
A layer of protection could be added quite easily by writing a metadata file to a hidden .datashuttle folder at the subject / session level whenever a folder is created. This could include the time the folder was created. Checks could then be run against it, for example: if ses-002 and ses-003 were created within 1 minute of each other, they are probably wrong. A rough sketch follows.
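A rough sketch of what this could look like, assuming a hypothetical `.datashuttle` subfolder and `created.json` layout (neither exists in datashuttle today):

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def write_creation_metadata(folder: Path) -> None:
    """Record when a sub-/ses- folder was created, next to the data."""
    meta_dir = folder / ".datashuttle"  # hypothetical hidden metadata folder
    meta_dir.mkdir(parents=True, exist_ok=True)
    (meta_dir / "created.json").write_text(
        json.dumps({"created": datetime.now(timezone.utc).isoformat()})
    )

def flag_suspicious_sessions(sub_folder: Path, min_gap_s: int = 60):
    """Return pairs of consecutive ses- folders created suspiciously close together."""
    times = []
    for ses in sorted(sub_folder.glob("ses-*")):
        meta = ses / ".datashuttle" / "created.json"
        if meta.is_file():
            created = datetime.fromisoformat(json.loads(meta.read_text())["created"])
            times.append((ses.name, created))
    return [
        (name_a, name_b)
        for (name_a, t_a), (name_b, t_b) in zip(times, times[1:])
        if (t_b - t_a).total_seconds() < min_gap_s
    ]
```

The 1-minute threshold is arbitrary; anything short enough that a real session could not plausibly have happened in between would do.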
The best solution would be to write metadata describing the local folder structure and send it to central immediately (i.e. when folders are created). Central would then always contain a complete overview of the folders in the project. This could be updated either when a) folders are created or b) data is transferred. This is a bigger job and a lot of care will have to be taken to consider the many possible edge cases. It will also be necessary to expose the central directory tree through the GUI / through a datashuttle method that prints the file tree. A sketch of the idea is below.
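A sketch of the "sync a representation of the tree" idea. The snapshot location under a central `.datashuttle` folder, the per-machine file name, and direct filesystem access to central are all assumptions for illustration; in practice the upload would go through datashuttle's normal transfer mechanism.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def snapshot_local_tree(local_root: Path, machine_name: str) -> dict:
    """Capture the sub-/ses- folder structure without any data files."""
    folders = sorted(
        str(p.relative_to(local_root))
        for p in local_root.rglob("*")
        if p.is_dir()
        and not any(part.startswith(".") for part in p.relative_to(local_root).parts)
    )
    return {
        "machine": machine_name,
        "snapshot_time": datetime.now(timezone.utc).isoformat(),
        "folders": folders,
    }

def push_snapshot_to_central(local_root: Path, central_root: Path, machine_name: str) -> None:
    """Write the snapshot where every machine (and the GUI) could read it."""
    snapshot = snapshot_local_tree(local_root, machine_name)
    meta_dir = central_root / ".datashuttle" / "tree_snapshots"  # hypothetical location
    meta_dir.mkdir(parents=True, exist_ok=True)
    (meta_dir / f"{machine_name}.json").write_text(json.dumps(snapshot, indent=2))
```

With one snapshot file per acquisition machine, any machine could merge the snapshots with central's actual folders before suggesting the next subject / session number.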
Some kind of "sync metadata" function would be great. Users could certainly run this before/after every session, and then sync the data itself later. As you say though, a lot of work.
I agree that the 3rd option, albeit more difficult, is the only permanent solution to this. We need to sync a representation of the project's tree structure without the actual files.