📝 Add page on genomic harmonization #78
I think it's dangerous to call all BIX processing "harmonization". Harmonization is the first thing they do, but then they also do lots of secondary analysis processing after harmonization that isn't part of harmonization. Sometimes sequencing centers directly give us "harmonized" files, but calling all BIX-processed files the "harmonized" ones makes people not realize that. Those are source data, but they're still harmonized, and then we have a label collision where we have to say "ok, is it harmonized harmonized or just harmonized?" I'm not saying we need to change the bucket configuration, but can we expand the descriptions here to talk about both harmonization and post-harmonization analysis?
Also, you have both "Kids-First" and "Kids First".
8bd38d0 to 2578c50
Lol CheckTrailingWhitespace
2578c50 to 9a1bc73
I'll just second that the term "harmonization" is definitely overloaded. We're talking in the meeting right now about not using that term as much, but rather being a bit more specific about the workflow types and whether they are "functionally equivalent" or not, with the alignment phase being one of the easiest to make functionally equivalent.
genomic data is given below:

1) Sequencing center deposit data into S3 buckets provided by Kids First
2) Harmonization projects are set up in Cavatica as a workspace to harmonize
There is no mention of what Cavatica is in the handbook. Maybe include a short blurb on Cavatica (what it means for Kids First DRC) or at least provide a hyperlink to it (http://docs.cavatica.org/docs/getting-started). Could put it in the Cavatica Project Setup section?
Yeah I think this will come as another page somewhere in the docs at which point we can come back and cross-reference it.
=====================

.. figure:: /_static/images/genomic_data_flow.png
   :alt: Genomic data flow diagram
I think the bix team image should be coders rather than wet lab scientists 😁 unless my understanding of their duties is totally wrong lol
There's no good neck beard icons in draw.io though 😢
project by mounting the bucket to it with a read-only user.
6) Investigators are invited to the Cavatica project so that they may access
their harmonized data.
7) After the six-month embargo period has expired, the harmonized data is
What is the 6 month embargo period?
Receiving Data
--------------

Kids First will receive genomic data in an S3 bucket specific to a study.
What genomic data? Unaligned, aligned, or both?
Both. There's a note about it under the 'Harmonizing' section, not sure if it should be repeated here.
Kids First will receive genomic data in an S3 bucket specific to a study.
This data is transferred into the ``source/`` prefix within the bucket
by the sequencing site.
Is it really going to be the sequencing center or will it be the Kids First DRC that manages the transfer?
We've done both in the past, but I believe sequencing centers should be performing this going forward.
9a1bc73 to def5cf3
I think maybe this would be better titled as
An alternative is to split out harmonization starting from sequencing centers and ending with delivered aligned data from the analysis work by putting the two on different pages.
Adds a page on how genomic harmonization works in Cavatica.