Addition of Differential Expression package (DESeq2) #227

Open
reductiveminded opened this issue Jun 20, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@reductiveminded

Addition of Algorithm:

My name is Matthew Marino. I am a CFDE GlyGEN summer intern working under Jeet Vora and Rene Ranzinger. I am currently trying to create multiomics workflows starting with transcriptomics and proteomics data.
I am hoping to quantify differentially expressed (DE) genes from a counts matrix and its metadata, and to get a list of DE genes with their corresponding log2FC and q values, ranked by significance. From there, I plan to apply a similar method to LC-MS/MS proteomics and glycoproteomics datasets to assess the overlap/correspondence and to draw meaningful biological conclusions from the datasets.

To analyze differentially expressed genes between untreated and treated samples in an RNA-Seq counts matrix, DESeq2 (DOI: 10.1186/s13059-014-0550-8) does the following, briefly:

1.) Data input: a counts matrix and metadata describing the sample identifiers in each column (this appears to be similar to the add AnnData option).
2.) Combining both into a single data object (again similar to the AnnData option).
3.) Filtering out genes with low read counts based on row sums.
4.) Size factor calculation: for each sample, the median across genes of the ratio of each gene's count to the geometric mean of that gene's count across all samples.
5.) Dispersion and shrinkage estimation: models variance by calculating gene-wise dispersion estimates and fitting a trend line to them to capture the mean-dispersion relationship. The gene-wise estimates are then shrunk toward the trend line to improve their accuracy.
6.) Fitting a generalized linear model (GLM) with a negative binomial distribution, with a design based on the experimental conditions in the metadata.
7.) Hypothesis testing: a Wald test asks whether each log2FC differs from zero and provides p values. Multiple testing correction: the Benjamini-Hochberg method provides q (adjusted p) values, which control the false discovery rate among the genes called differentially expressed.
8.) Results table
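
To make steps 3 and 4 concrete, here is a minimal pure-Python sketch of low-count filtering and median-of-ratios size factors. It is an illustration under stated assumptions, not DESeq2's actual implementation: the function names, the dict-of-lists input layout, the `min_total=10` threshold, and the toy counts are all hypothetical.

```python
import math
from statistics import median


def filter_low_counts(counts, min_total=10):
    """Keep genes whose summed count across samples is at least min_total.

    The threshold of 10 is an assumed default for illustration only.
    """
    return {gene: row for gene, row in counts.items() if sum(row) >= min_total}


def size_factors(counts):
    """Median-of-ratios size factors, one per sample (column).

    For each gene, take the log ratio of its count in each sample to the
    gene's geometric mean across samples; the size factor of a sample is
    the exponentiated median of those log ratios over genes.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c <= 0 for c in row):
            continue  # a zero count gives a zero geometric mean; skip the gene
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


# Toy data (made up): rows are genes, columns are three samples.
counts = {
    "geneA": [10, 20, 40],
    "geneB": [100, 200, 400],
    "geneC": [5, 10, 20],
    "geneD": [0, 1, 0],  # removed by the row-sum filter
}
filtered = filter_low_counts(counts, min_total=10)
factors = size_factors(filtered)
```

Dividing each sample's counts by its size factor would then put the samples on a common scale before dispersion estimation.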

A quick note regarding the output: I am hoping that it can return a table that ranks the differentially expressed genes by their q value and still contains the gene name and log2FC of each. This will allow the user to see whether each gene is differentially expressed, in which direction, and at what level of statistical significance.
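
As a rough sketch of what such a table could look like, the pure-Python example below applies a Benjamini-Hochberg adjustment to a set of p values and sorts the rows by the adjusted value. The function names, the column keys (`gene`, `log2FC`, `pvalue`, `padj`), and the input numbers are hypothetical placeholders; in practice the log2FC and p values would come from the fitted model and the Wald test, which are not reproduced here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rank_results(rows):
    """Build a results table from (gene, log2FC, p value) tuples.

    Rows are sorted by adjusted p value, with the raw p value as tie-breaker.
    """
    q_values = benjamini_hochberg([p for _, _, p in rows])
    table = [
        {"gene": gene, "log2FC": lfc, "pvalue": p, "padj": q}
        for (gene, lfc, p), q in zip(rows, q_values)
    ]
    return sorted(table, key=lambda r: (r["padj"], r["pvalue"]))


# Placeholder inputs (made up), standing in for Wald test output.
rows = [
    ("geneA", 2.1, 0.01),
    ("geneB", -0.4, 0.04),
    ("geneC", 1.3, 0.03),
    ("geneD", -3.0, 0.005),
]
results = rank_results(rows)
for r in results:
    print(f'{r["gene"]}\t{r["log2FC"]:+.2f}\t{r["padj"]:.3g}')
```

The sign of `log2FC` gives the direction of change, and `padj` gives the significance after correction, so both questions can be read off a single row.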

This package is widely used by the community and is considered one of the most accurate ways to model variance between treated and untreated samples in RNA-Seq data.
Citations:
DOI: 10.1093/bib/bbt086
DOI: 10.1186/1471-2105-14-91

@AviMaayan
Collaborator

Hi @reductiveminded, next Tuesday at 1 PM ET we have a workshop about the Playbook platform. You can find information about it here: https://playbook-workflow-builder.cloud/events/2024-06-25

@u8sand added the enhancement label on Jul 11, 2024
3 participants