📝 Add page on genomic harmonization #78

Open: 1 commit to be merged into master from genomic-harmonization


dankolbman
Contributor

Adds a page on how genomic harmonization works in cavatica.

@dankolbman dankolbman added the documentation label (Regarding developer or user documentation) Aug 8, 2019
@dankolbman dankolbman self-assigned this Aug 8, 2019
@dankolbman dankolbman requested a review from XuTheBunny August 8, 2019 15:01
Contributor

@fiendish fiendish left a comment

I think it's dangerous to call all BIX processing "harmonization". Harmonization is the first thing they do, but then they also do lots of secondary analysis processing after harmonization that isn't part of harmonization. Sometimes sequencing centers directly give us "harmonized" files, but calling all bix processed files the "harmonized" ones makes people not realize that. Those are source data, but they're still harmonized, and then we have a label collision where we have to say "ok, is it harmonized harmonized or just harmonized?" I'm not saying we need to change the bucket configuration, but can we expand the descriptions here to talk about both harmonization and post-harmonization analysis?

Also, you have both "Kids-First" and "Kids First".

docs/data/genomic_harmonization.rst Outdated Show resolved Hide resolved
@dankolbman dankolbman force-pushed the genomic-harmonization branch from 8bd38d0 to 2578c50 Compare August 9, 2019 18:39
@dankolbman dankolbman requested a review from fiendish August 9, 2019 19:39
@fiendish
Contributor

Lol CheckTrailingWhitespace

@dankolbman dankolbman force-pushed the genomic-harmonization branch from 2578c50 to 9a1bc73 Compare August 12, 2019 14:53
@dankolbman dankolbman requested a review from fiendish August 12, 2019 14:53
@allisonheath
Member

I'll just second that the term "harmonization" is definitely overloaded. We're talking in the meeting right now about not using that term as much, but rather being a bit more specific about the workflow types and whether they are "functionally equivalent" or not, with the alignment phase being one of the easiest to make functionally equivalent.

genomic data is given below:

1) Sequencing centers deposit data into S3 buckets provided by Kids First
2) Harmonization projects are set up in Cavatica as a workspace to harmonize
Member

There is no mention of what Cavatica is in the handbook. Maybe include a short blurb on Cavatica (what it means for Kids First DRC) or at least provide a hyperlink to it (http://docs.cavatica.org/docs/getting-started). Could put it in the Cavatica Project Setup section?

Contributor Author

Yeah I think this will come as another page somewhere in the docs at which point we can come back and cross-reference it.

=====================

.. figure:: /_static/images/genomic_data_flow.png
   :alt: Genomic data flow diagram
Member

I think the bix team image should be coders rather than wet lab scientists 😁 unless my understanding of their duties is totally wrong lol

Contributor Author

There's no good neck beard icons in draw.io though 😢

project by mounting the bucket to it with a read-only user.
6) Investigators are invited to the Cavatica project so that they may access
their harmonized data.
7) After the six-month embargo period has expired, the harmonized data is
Member

What is the 6 month embargo period?
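
For illustration only, here is a minimal sketch of how a six-month embargo window could be computed. The thread does not say when the embargo starts, so the `embargo_start` argument is an assumption, and the function names are hypothetical, not part of any Kids First tooling.

```python
import calendar
from datetime import date

# "Six-month embargo period" is taken from the page under review.
EMBARGO_MONTHS = 6


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month,
    so Aug 31 plus six months lands on the last day of February.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def embargo_end(embargo_start: date) -> date:
    """First day on which the embargo no longer applies (assumed semantics)."""
    return add_months(embargo_start, EMBARGO_MONTHS)


def is_under_embargo(embargo_start: date, today: date) -> bool:
    """True while `today` falls before the end of the embargo window."""
    return today < embargo_end(embargo_start)
```

Whichever event actually starts the clock would be passed as `embargo_start`; the page would still need to state that event explicitly.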

Receiving Data
--------------

Kids First will receive genomic data in an S3 bucket specific to a study.
Member

What genomic data? Unaligned, aligned, or both?

Contributor Author

Both. There's a note about it under the 'Harmonizing' section, not sure if it should be repeated here.


Kids First will receive genomic data in an S3 bucket specific to a study.
This data is transferred into the ``source/`` prefix within the bucket
by the sequencing site.
Member

Is it really going to be the sequencing center or will it be the Kids First DRC that manages the transfer?

Contributor Author

We've done both in the past, but I believe sequencing centers should be performing this going forward.
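
As a rough illustration of the bucket layout described in the excerpt above, the sketch below groups object keys by their top-level prefix so that everything deposited under ``source/`` can be told apart from other content. Only the ``source/`` prefix comes from the page; the ``harmonized/`` prefix, the file names, and the function names are hypothetical.

```python
from collections import defaultdict

# Prefix named in the page under review.
SOURCE_PREFIX = "source/"


def is_source_key(key: str) -> bool:
    """True if the object key sits under the source/ prefix."""
    return key.startswith(SOURCE_PREFIX)


def group_keys_by_prefix(keys):
    """Group object keys by top-level prefix ("" for keys with no prefix)."""
    groups = defaultdict(list)
    for key in keys:
        top = key.split("/", 1)[0] + "/" if "/" in key else ""
        groups[top].append(key)
    return dict(groups)


# Hypothetical keys; "harmonized/" is an assumed prefix, not confirmed here.
example_keys = [
    "source/sample1.bam",
    "source/reads/sample2.fastq.gz",
    "harmonized/sample1.cram",
    "README",
]
grouped = group_keys_by_prefix(example_keys)
```

Here `grouped["source/"]` holds the two deposited files, regardless of whether the sequencing center or the DRC performed the transfer.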

@dankolbman dankolbman force-pushed the genomic-harmonization branch from 9a1bc73 to def5cf3 Compare August 12, 2019 18:21
@dankolbman
Contributor Author

I think maybe this would be better titled as Genomic Harmonization and Analysis or Bioinformatics Workflows, encompassing everything that happens in Cavatica, starting with alignment (harmonization) and continuing to delivery of alignment and analysis results.

@dankolbman
Contributor Author

An alternative is to split harmonization (starting from sequencing centers and ending with delivered aligned data) from the analysis work by putting the two on different pages.

5 participants