Clarify that there are 4, not 5 datasets #8

Open
shntnu opened this issue Nov 17, 2022 · 0 comments

shntnu commented Nov 17, 2022

Our abstract says:

"we provide a collection of four datasets with both gene expression and morphological profile data useful for developing and testing multimodal methodologies."

but the GitHub repo says:

"We have gathered the following five available data sets that had both Cell Painting morphological (CP) and L1000 gene expression (GE) profiles, preprocessed the data from different sources and in different formats in a unified .csv format."

We should clarify this, using the context below:

One of the chemical datasets (CDRP-BBBC047-Bray) has a subset of compounds that are known to be bioactive. We referred to this subset as CDRP-bio-BBBC036-Bray and reported the details independently for this dataset (Supplementary Data 1 and 2). We only used CDRP-bio and not the full CDRP set for the analysis, because we believe that the quality of CDRP is insufficient for either of these analyses given that very few data points remained after filtering for replicate reproducibility across both modalities (Supplementary Fig. 1).

@shntnu shntnu self-assigned this Nov 17, 2022