[Question]: Does the default_cdisc_join_keys contain exhaustive list of CDISC datasets? #258

Open
3 tasks done
vedhav opened this issue Jan 19, 2024 · 5 comments

Comments

@vedhav
Contributor

vedhav commented Jan 19, 2024

What is your question?

While testing default_cdisc_join_keys against the scda datasets, I was unable to find join keys for c("ADAB", "ADPC", "ADPP", "ADTR") in default_cdisc_join_keys, although these datasets are present in scda. Conversely, default_cdisc_join_keys contains join keys for c("ADSAFTTE", "ADCSSRS", "ADEQ5D5L"), which are missing from the scda datasets.

I assumed that the CDISC dataset formats define an exhaustive list of datasets (at least for a given version of the SDTM). My question is: do we need to extend default_cdisc_join_keys to include the datasets that are missing relative to scda? Perhaps we should also add all the datasets available in default_cdisc_join_keys to scda.

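```r
library(teal.data)  # provides default_cdisc_join_keys

# Dataset names covered by the default CDISC join keys
join_key_datasets <- default_cdisc_join_keys |> names()
# Dataset names in the latest scda synthetic data, converted to uppercase to match
latest_scda_names <- scda::synthetic_cdisc_data("latest") |> names() |> toupper()

setdiff(latest_scda_names, join_key_datasets)
# [1] "ADAB" "ADPC" "ADPP" "ADTR"
setdiff(join_key_datasets, latest_scda_names)
# [1] "ADSAFTTE" "ADCSSRS"  "ADEQ5D5L"
```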

Code of Conduct

  • I agree to follow this project's Code of Conduct.

Contribution Guidelines

  • I agree to follow this project's Contribution Guidelines.

Security Policy

  • I agree to follow this project's Security Policy.
@donyunardi
Contributor

donyunardi commented Jan 19, 2024

[Question]: Does the default_cdisc_join_keys contain exhaustive list of CDISC datasets?

No, it doesn't, and I don't think we should maintain an exhaustive list.

Analysis datasets are named using the ADXXXX convention, where the XXXX portion is sponsor-defined and depends on the product. As CDISC continues to evolve, keeping up with every new naming convention would be too laborious.

At the very least, we should cover the common ones, and at a quick glance, I think we already do:
https://github.com/insightsengineering/teal.data/blob/main/inst/cdisc_datasets/cdisc_datasets.yaml

@lcd2yyz
Can I get your opinion on this?

@lcd2yyz

lcd2yyz commented Jan 19, 2024

@donyunardi Great explanation! I confirm it's correct.

I actually feel we should maybe remove some from the list, because they are sponsor-defined dataset names rather than the common dataset names outlined in the CDISC standards or ADaM implementation guides. For example: ADAETTE, ADQLQC, ADCSSRS, ADEQ5D5L.

@khatril @shajoezhu @crazycatandy @telepath37 Can I get your opinion on the suggestion to drop these sponsor-defined datasets?

@telepath37

I agree - we should just keep the very common ADaM datasets in our list ("defaults") and allow users to define keys on their ADXXXX datasets if they want.
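As a minimal sketch of what that could look like on the user side, assuming the teal.data 0.4.x `join_keys()` / `join_key()` API and `c()` for merging `join_keys` objects: keep the package defaults for a common dataset and add keys for a dataset the defaults do not cover, such as ADTR from the list above. The ADTR primary key columns are an illustrative assumption, not taken from any standard.

```r
library(teal.data)

# Start from the package defaults for a common dataset (ADSL) and merge in
# user-defined keys for a dataset not covered by default_cdisc_join_keys.
jk <- c(
  default_cdisc_join_keys["ADSL"],
  join_keys(
    # Primary keys of ADTR (ASSUMED column set, for illustration only)
    join_key("ADTR", keys = c("STUDYID", "USUBJID", "PARAMCD", "AVISIT")),
    # Foreign keys linking ADTR back to ADSL
    join_key("ADSL", "ADTR", keys = c("STUDYID", "USUBJID"))
  )
)

names(jk)
```

With this split, the package would only need to ship keys for the common ADaM datasets, while keys for sponsor-defined ADXXXX datasets live in user code.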

@shajoezhu
Contributor

Thanks @lcd2yyz and @donyunardi

I also agree that we should trim this list and keep it minimal. The standards and implementations change all the time; if we impose too many restriction checks, the package becomes less user-friendly.

@khatril

khatril commented Jan 26, 2024

Thanks for the discussion and for putting this on our radar. I also agree with trimming the list back to the common datasets only and allowing users flexibility.
