Rework update submissions to allow re-submissions #256

Open
wants to merge 13 commits into
base: dev
Choose a base branch
from

Conversation

jessicarowell (Collaborator)

Description

This PR reworks the update_submissions module so that it is used to re-submit updates to BioSample, etc. The --update_submissions flag is now used to submit an updated metadata file. It requires submission_report.csv, containing valid accession IDs, to be present in its expected location, and it uploads new submission (XML) files that carry the updated metadata together with those accession IDs.
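The accession-ID step described above can be sketched as follows. This is a minimal sketch only. It assumes hypothetical column names (`sample_name`, `biosample_accession`) in both submission_report.csv and the updated metadata file, and the real module may store and match IDs differently. The sketch copies each sample's accession ID from the report onto the matching row of the updated metadata, and leaves the field blank where no accession exists.

```python
import csv
import io
from typing import Dict, List, TextIO

# Hypothetical column names; the real files may use different headers.
ID_COL = "sample_name"
ACC_COL = "biosample_accession"


def load_accessions(report: TextIO) -> Dict[str, str]:
    """Map sample name -> accession ID, skipping samples without one."""
    accessions: Dict[str, str] = {}
    for row in csv.DictReader(report):
        acc = (row.get(ACC_COL) or "").strip()
        if acc:
            accessions[row[ID_COL]] = acc
    return accessions


def attach_accessions(metadata: TextIO, accessions: Dict[str, str]) -> List[Dict[str, str]]:
    """Return the updated metadata rows with each sample's accession ID added."""
    rows = []
    for row in csv.DictReader(metadata):
        # Empty string where no accession exists (NCBI would see a new record).
        row[ACC_COL] = accessions.get(row[ID_COL], "")
        rows.append(row)
    return rows


if __name__ == "__main__":
    # SAMN00000001 is a dummy placeholder, not a real accession.
    report = io.StringIO("sample_name,biosample_accession\nS1,SAMN00000001\nS2,\n")
    metadata = io.StringIO("sample_name,host\nS1,human\nS2,mouse\n")
    for row in attach_accessions(metadata, load_accessions(report)):
        print(row)
```

Keying on sample name keeps the update independent of row order in either file. Rows that end up with an empty accession are exactly the ones NCBI would not be able to link to an existing record.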

A new fetch_submissions module runs after an initial submission or an update submission. It can also be run on its own with fetch_reports_only = true.
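The "fetch after submit, or fetch alone" behavior described above can be modeled as a mapping from flags to stages. This is a hypothetical sketch, not the subworkflow's actual Nextflow logic. Two parts of it are assumptions: that fetch_reports_only takes precedence over the other flags, and the argument names themselves, which follow the `--update_submission` spelling used in the test commands in this thread.

```python
from typing import List


def stages_to_run(submission: bool, update_submission: bool = False,
                  fetch_reports_only: bool = False) -> List[str]:
    """Hypothetical mapping from pipeline flags to submission stages.

    Assumes fetch_reports_only takes precedence over the other flags.
    """
    if fetch_reports_only:
        # Only pull reports for a submission that already exists.
        return ["fetch_submissions"]
    if not submission:
        return []
    first = "update_submissions" if update_submission else "initial_submissions"
    # Reports are fetched right after either kind of submission.
    return [first, "fetch_submissions"]


if __name__ == "__main__":
    print(stages_to_run(True))
    print(stages_to_run(True, update_submission=True))
    print(stages_to_run(True, fetch_reports_only=True))
```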

The subworkflow was reworked to support three independent submission-related functions:

  • Run an initial submission (and try to fetch the reports immediately)
  • Run an updated submission (submit a new metadata file, and try to fetch the reports immediately)
  • Try to fetch reports for an existing submission (the workflow will look for this submission in output_dir/submission_files/metadata_file_basename).
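Here is one way a fetch-only run could locate the existing submission. This is a minimal sketch that assumes `metadata_file_basename` means the metadata file name without its extension. The helper names and the example file name are hypothetical.

```python
from pathlib import Path


def submission_dir(output_dir: str, metadata_file: str) -> Path:
    """Build output_dir/submission_files/<metadata file name without extension>."""
    return Path(output_dir) / "submission_files" / Path(metadata_file).stem


def require_existing_submission(output_dir: str, metadata_file: str) -> Path:
    """Return the submission directory, or fail early if a fetch has nothing to read."""
    path = submission_dir(output_dir, metadata_file)
    if not path.is_dir():
        raise FileNotFoundError(f"No existing submission found at {path}")
    return path


if __name__ == "__main__":
    # Hypothetical metadata file name, for illustration only.
    print(submission_dir("test", "metadata/virus_metadata.xlsx"))
```

Failing early with the expected path in the error message makes a mismatched output_dir obvious. This is the most likely mistake when running the fetch step separately.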

Checklist

Go through the checklist below and place a ✔️ (an X inside the box) if completed.

General Checks

  • [ ] Have you run appropriate tests (unit/integration/end-to-end) to check logic across run environments (Conda/Docker/Singularity on Scicomp/AWS/NF Tower/Local)?
    Checked the test profile on Scicomp with Singularity.

    For each relevant configuration:

    • Can the program run completely through without erroring out?
    • Does it produce the expected outputs, given the inputs provided?
  • Have you conducted proper linting procedures?

    • Numpy formatted docstrings for functions
    • Comments explaining lines of code
    • Consistent and intuitive naming conventions for variables, functions, classes, methods, attributes, and scripts
    • Single empty line between class functions, two lines between non-class functions, and two lines between imports and code body
    • Camel case formatting for class names
  • [ ] Have you updated existing documentation (README.md, etc.) or created new ones within docs?
    Not done yet; this should be completed before the PR is merged.
    We need to document the fetch_reports_only flag and explain that it only fetches report.xml from the NCBI server, for instance when it was not fetched after the initial submission ran. We also need to clarify that update_submission is for updating a submission you have already made: it requires submission_report.csv (with valid accession IDs) and uploads new submission (XML) files with the updated metadata and the accession IDs. NCBI needs these IDs to link the update to the original record; without them, the update is submitted as a new record.

CDC Checks

  • Did you check for sensitive data, and remove any?
  • If you added or modified HTML, did you check that it was 508 compliant?

Are additional approvals needed for this change? If so, please mention them below:

Are there potential vulnerabilities or licensing issues with any new dependencies introduced? If so, please mention them below:

jessicarowell (Collaborator, Author) commented Jan 21, 2025

Test the three modes, for example with the virus test data:

  1. Initial submission (no change):
     nextflow run main.nf -profile singularity,test --species virus --output_dir test --submission_config ~/02.scratch/submission_config.yaml --submission true --submission_wait_time 1 --annotation true
  2. Fetch reports for this submission (make sure output_dir points to the same directory as in #1):
     nextflow run main.nf -profile singularity,test --species virus --output_dir test --submission_config ~/02.scratch/submission_config.yaml --submission true --submission_wait_time 1 --annotation true --fetch_reports_only
  3. Update submission: change some data in the metadata file from #1 and try to push the updated metadata to the NCBI test server:
     nextflow run main.nf -profile singularity,test --species virus --output_dir test --submission_config ~/02.scratch/submission_config.yaml --submission true --submission_wait_time 1 --annotation true --update_submission
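Runs 2 and 3 only work if they point at run 1's output_dir, so it can help to generate all three commands from one shared flag list. This is a convenience sketch, not part of the pipeline. The `build_command` function and the mode names are hypothetical, the flags are copied from the commands above, and the config path is a placeholder.

```python
import shlex
from typing import List

# Flags common to the three test runs above.
COMMON = ["nextflow", "run", "main.nf", "-profile", "singularity,test",
          "--species", "virus", "--submission", "true",
          "--submission_wait_time", "1", "--annotation", "true"]

# Extra flag per mode, as used in the commands above.
MODE_FLAGS = {"initial": [], "fetch": ["--fetch_reports_only"],
              "update": ["--update_submission"]}


def build_command(mode: str, output_dir: str, config: str) -> List[str]:
    """Assemble one test command; reuse output_dir so runs 2 and 3 find run 1's files."""
    return COMMON + ["--output_dir", output_dir,
                     "--submission_config", config] + MODE_FLAGS[mode]


if __name__ == "__main__":
    for mode in ("initial", "fetch", "update"):
        # shlex.join (Python 3.8+) prints a copy-pasteable shell command.
        print(shlex.join(build_command(mode, "test", "submission_config.yaml")))
```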

NOTE: #3 should fail if it doesn't find the accession ID in the submission report CSV file, but now that I think about it, it probably doesn't. If you can test this and let me know, I'll fix it. To update a submission, NCBI requires the accession IDs, which get pulled into submission_report.csv if they exist. On the test server we never get those, so I still need to work that out or do a production submission to properly assess this feature. I need to talk to NCBI before submitting to prod so I don't break anything.
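Below is a minimal sketch of the fail-fast check described in the note. It assumes hypothetical `sample_name` and `biosample_accession` columns in submission_report.csv. The real report layout, and the point in the workflow where such a check belongs, may differ.

```python
import csv
import io
from typing import Iterable, TextIO


class MissingAccessionError(RuntimeError):
    """Raised when an update is requested for samples that lack accession IDs."""


def check_accessions(report: TextIO, samples: Iterable[str],
                     id_col: str = "sample_name",
                     acc_col: str = "biosample_accession") -> None:
    """Fail unless every sample being updated has an accession ID in the report."""
    found = {row[id_col]: (row.get(acc_col) or "").strip()
             for row in csv.DictReader(report)}
    missing = [s for s in samples if not found.get(s)]
    if missing:
        # Without an accession, NCBI would create a new record instead of updating.
        raise MissingAccessionError(
            "No accession ID in submission_report.csv for: " + ", ".join(missing))


if __name__ == "__main__":
    # SAMN00000001 is a dummy placeholder, not a real accession.
    report = io.StringIO("sample_name,biosample_accession\nS1,SAMN00000001\nS2,\n")
    try:
        check_accessions(report, ["S1", "S2"])
    except MissingAccessionError as err:
        print(err)
```

The check reports every sample that is missing an accession, not just the first one. Samples absent from the report are treated the same as samples with an empty accession.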
