V1.2.2 Update #82

Merged
merged 3 commits into from
Sep 19, 2024
2 changes: 1 addition & 1 deletion README.Rmd
@@ -26,7 +26,7 @@ github_pages_url <- description$GITHUB_PAGES

<p style="font-size: 16px;"><em>Public Database Submission Pipeline</em></p>

**Beta Version**: v1.2.1. This pipeline is currently in Beta testing, and issues could appear during submission. Please use it at your own risk. Feedback and suggestions are welcome!
**Beta Version**: v1.2.2. This pipeline is currently in Beta testing, and issues could appear during submission. Please use it at your own risk. Feedback and suggestions are welcome!

**General Disclaimer**: This repository was created for use by CDC programs to collaborate on public health related projects in support of the [CDC mission](https://www.cdc.gov/about/organization/mission.htm). GitHub is not hosted by the CDC, but is a third party website used by CDC and its partners to share information and collaborate on software. CDC use of GitHub does not imply an endorsement of any one particular service, product, or enterprise.

2 changes: 1 addition & 1 deletion README.md
@@ -11,7 +11,7 @@

</p>

**Beta Version**: 1.2.1. This pipeline is currently in Beta testing, and
**Beta Version**: 1.2.2. This pipeline is currently in Beta testing, and
issues could appear during submission. Please use it at your own risk.
Feedback and suggestions are welcome\!

110 changes: 105 additions & 5 deletions config/gisaid/gisaid_FLU_schema.py
@@ -150,6 +150,56 @@
description="Additional information regarding patient (e.g. Patient infected while interacting with animal).",
title="Additional host information",
),
"gs-Sampling_Strategy": Column(
dtype="object",
checks=None,
nullable=True,
unique=False,
coerce=False,
required=False,
description="Sampling strategy for sequence (e.g. Baseline surveillance).",
title="Sampling strategy",
),
"gs-Sequencing_Strategy": Column(
dtype="object",
checks=None,
nullable=True,
unique=False,
coerce=False,
required=False,
description="Sequencing purpose for sample (e.g. DNA amplification).",
title="Sequencing strategy",
),
"gs-Sequencing_Technology": Column(
dtype="object",
checks=None,
nullable=True,
unique=False,
coerce=False,
required=False,
description="Add the sequencer brand and model (e.g. Illumina MiSeq, Sanger, Nanopore MinION).",
title="Sequencing technology",
),
"gs-Assembly_Method": Column(
dtype="object",
checks=None,
nullable=True,
unique=False,
coerce=False,
required=False,
description="Method or software used to assemble the sequence.",
title="Assembly Method",
),
"gs-Coverage": Column(
dtype="object",
checks=None,
nullable=True,
unique=False,
coerce=False,
required=False,
description="Average genome coverage (e.g. 50x, 100x, 1,000x).",
title="Average coverage",
),
"gs-Submitting_Sample_Id": Column(
dtype="object",
checks=None,
@@ -198,7 +248,17 @@
coerce=False,
required=False,
description="",
title="adamantanes resistance",
title="adamantanes resistance genotype",
),
"gs-Adamantanes_Resistance_pheno": Column(
dtype="object",
checks=None,
nullable=True,
unique=False,
coerce=False,
required=False,
description="",
title="adamantanes resistance phenotype",
),
"gs-Oseltamivir_Resistance_geno": Column(
dtype="object",
@@ -208,7 +268,17 @@
coerce=False,
required=False,
description="",
title="oseltamivir resistance",
title="oseltamivir resistance genotype",
),
"gs-Oseltamivir_Resistance_pheno": Column(
dtype="object",
checks=None,
nullable=True,
unique=False,
coerce=False,
required=False,
description="",
title="oseltamivir resistance phenotype",
),
"gs-Zanamivir_Resistance_geno": Column(
dtype="object",
@@ -218,7 +288,17 @@
coerce=False,
required=False,
description="",
title="zanamivir resistance",
title="zanamivir resistance genotype",
),
"gs-Zanamivir_Resistance_pheno": Column(
dtype="object",
checks=None,
nullable=True,
unique=False,
coerce=False,
required=False,
description="",
title="zanamivir resistance phenotype",
),
"gs-Peramivir_Resistance_geno": Column(
dtype="object",
@@ -228,7 +308,17 @@
coerce=False,
required=False,
description="",
title="peramivir resistance",
title="peramivir resistance genotype",
),
"gs-Peramivir_Resistance_pheno": Column(
dtype="object",
checks=None,
nullable=True,
unique=False,
coerce=False,
required=False,
description="",
title="peramivir resistance phenotype",
),
"gs-Other_Resistance_geno": Column(
dtype="object",
@@ -238,7 +328,17 @@
coerce=False,
required=False,
description="",
title="other resistances",
title="other resistances genotype",
),
"gs-Other_Resistance_pheno": Column(
dtype="object",
checks=None,
nullable=True,
unique=False,
coerce=False,
required=False,
description="",
title="other resistances phenotype",
),
"gs-Host_Gender": Column(
dtype="object",
2 changes: 1 addition & 1 deletion docs/app.json

Large diffs are not rendered by default.

31 changes: 28 additions & 3 deletions gisaid_handler.py
@@ -6,7 +6,7 @@

import shutil
import subprocess
from typing import Dict, Any, List, Optional, Match
from typing import Dict, Any, List, Optional, Match, Any
import os
import pandas as pd
import file_handler
@@ -15,6 +15,8 @@
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
import warnings
warnings.filterwarnings("ignore", 'This pattern has match groups')
import re

import upload_log
@@ -39,13 +41,15 @@ def create_gisaid_files(organism: str, database: str, submission_name: str, subm
gisaid_df["fn"] = "sequence.fsa"
first_cols = ["submitter", "fn", sample_name_column]
elif "FLU" in organism:
gisaid_df = gisaid_df.rename(columns = {"authors": "Authors", "collection_date": "Collection_Date"})
gisaid_df = gisaid_df.rename(columns = {"authors": "Authors"})
# Parse out dates into respective columns
gisaid_df[["Collection_Date", "Collection_Year", "Collection_Month"]] = gisaid_df["collection_date"].apply(process_flu_dates)
gisaid_df["Isolate_Id"] = ""
gisaid_df["Segment_Ids"] = ""
# Pivot FLU segment names from long form to wide form
gisaid_df["segment"] = "Seq_Id (" + gisaid_df["segment"].astype(str) + ")"
group_df = gisaid_df.pivot(index="Isolate_Name", columns="segment", values="sample_name").reset_index()
gisaid_df = gisaid_df.drop(columns=["sample_name", "segment"])
gisaid_df = gisaid_df.drop(columns=["sample_name", "segment", "collection_date"])
gisaid_df = gisaid_df.drop_duplicates(keep="first")
gisaid_df = gisaid_df.merge(group_df, on="Isolate_Name", how="inner", validate="1:1")
first_cols = ["Isolate_Id","Segment_Ids","Isolate_Name"]
@@ -58,6 +62,27 @@ def create_gisaid_files(organism: str, database: str, submission_name: str, subm
file_handler.create_fasta(database="GISAID", metadata=metadata, submission_dir=submission_dir)
shutil.copy(os.path.join(submission_dir, "sequence.fsa"), os.path.join(submission_dir, "orig_sequence.fsa"))
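
The long-to-wide pivot in the FLU branch above can be tried in isolation. The sketch below is illustrative only: the small DataFrame and the variable names `wide`, `isolates`, and `merged` are assumptions made for the example, while the `pivot`, `drop_duplicates`, and `merge` calls mirror the ones in the diff.

```python
import pandas as pd

# Hypothetical metadata: one row per segment, two segments per isolate
df = pd.DataFrame({
    "Isolate_Name": ["A/California/566912/2016"] * 2 + ["A/Texas/566913/2016"] * 2,
    "segment": ["HA", "NA", "HA", "NA"],
    "sample_name": [
        "A/California/566912/2016_HA",
        "A/California/566912/2016_NA",
        "A/Texas/566913/2016_HA",
        "A/Texas/566913/2016_NA",
    ],
    "Subtype": ["H3N2"] * 4,
})

# Turn each segment name into the wide-form column label
df["segment"] = "Seq_Id (" + df["segment"].astype(str) + ")"

# One row per isolate, one "Seq_Id (...)" column per segment
wide = df.pivot(index="Isolate_Name", columns="segment", values="sample_name").reset_index()

# Collapse the per-segment rows down to one row of isolate-level metadata
isolates = df.drop(columns=["sample_name", "segment"]).drop_duplicates(keep="first")

# validate="1:1" raises if an isolate still has more than one metadata row
merged = isolates.merge(wide, on="Isolate_Name", how="inner", validate="1:1")
print(merged)
```

Because the merge is validated as one-to-one, any isolate-level column that differs between segments of the same isolate would leave duplicate rows after `drop_duplicates` and cause the merge to fail rather than silently duplicate records.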

# FLU collection dates: partial dates (YYYY or YYYY-MM) go into separate year/month columns
def process_flu_dates(row: Any) -> pd.Series:
    # "row" is a single collection_date value passed in by Series.apply
    sections = row.strip().split("-")
    if len(sections) == 1:
        # Year only: YYYY
        full_date = ""
        year = sections[0]
        month = ""
    elif len(sections) == 2:
        # Year and month: YYYY-MM
        full_date = ""
        year = sections[0]
        month = sections[1]
    elif len(sections) == 3:
        # Complete date: YYYY-MM-DD
        full_date = row.strip()
        year = ""
        month = ""
    else:
        print(f"Error: Unable to process 'Collection_Date' column for FLU GISAID submission. The field should be in format 'YYYY', 'YYYY-MM', or 'YYYY-MM-DD'. Value unable to process: {row.strip()}", file=sys.stderr)
        sys.exit(1)
    return pd.Series([full_date, year, month])
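
A minimal sketch of how this date splitting behaves when applied to a column, assuming string inputs. The helper name `split_partial_date` and the sample DataFrame are hypothetical, and the helper raises `ValueError` where the pipeline's function prints an error and exits.

```python
import pandas as pd

def split_partial_date(value: str) -> pd.Series:
    """Hypothetical stand-in for process_flu_dates that raises instead of exiting."""
    text = value.strip()
    sections = text.split("-")
    if len(sections) == 1:
        # Year only: the full-date field stays empty
        full_date, year, month = "", sections[0], ""
    elif len(sections) == 2:
        # Year and month: the full-date field stays empty
        full_date, year, month = "", sections[0], sections[1]
    elif len(sections) == 3:
        # Complete date: year and month fields stay empty
        full_date, year, month = text, "", ""
    else:
        raise ValueError(f"Unrecognized collection date: {text!r}")
    return pd.Series([full_date, year, month])

df = pd.DataFrame({"collection_date": ["2016-12-28", "2016-11", "2016"]})

# Series.apply with a Series-returning function yields a three-column DataFrame,
# which is assigned positionally to the three new columns
df[["Collection_Date", "Collection_Year", "Collection_Month"]] = (
    df["collection_date"].apply(split_partial_date)
)
df = df.drop(columns=["collection_date"])
print(df)
```

A complete date fills only `Collection_Date`, while a partial date fills only `Collection_Year` (and `Collection_Month` when present), so the two sets of fields are never populated together for the same record.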


# Read output log from gisaid submission script
def process_gisaid_log(log_file: str, submission_dir: str) -> pd.DataFrame:
file_handler.validate_file(file_type="GISAID log", file_path=log_file)
2 changes: 1 addition & 1 deletion settings.py
@@ -12,7 +12,7 @@
PROG_DIR: str = os.path.dirname(os.path.abspath(__file__))

# SeqSender version
VERSION: str = "1.2.1 (Beta)"
VERSION: str = "1.2.2 (Beta)"

# Organism options with unique submission options
ORGANISM_CHOICES: List[str] = ["FLU", "COV", "POX", "ARBO", "OTHER"]
2 changes: 1 addition & 1 deletion shiny/app.py
@@ -20,7 +20,7 @@
header = (
ui.card_header(
ui.HTML(
"""<p><strong>Beta Version</strong>: 1.2.1. This pipeline is currently in Beta testing, and issues could appear during submission. Please use it at your own risk. Feedback and suggestions are welcome!</p>"""
"""<p><strong>Beta Version</strong>: 1.2.2. This pipeline is currently in Beta testing, and issues could appear during submission. Please use it at your own risk. Feedback and suggestions are welcome!</p>"""
)
),
)
10 changes: 10 additions & 0 deletions shiny/templates/config.gisaid.gisaid.FLU.schema_template.csv
@@ -12,15 +12,25 @@ gs-sub_province,Optional,"Local region name, county, territory, etc."
gs-Location_Additional_info,Optional,"Additional location information (e.g. Cruise Ship, Convention, Live animal market)."
gs-Host,Required,"Host species name. For Wastewater use ""Environment""."
gs-Host_Additional_info,Optional,Additional information regarding patient (e.g. Patient infected while interacting with animal).
gs-Sampling_Strategy,Optional,Sampling strategy for sequence (e.g. Baseline surveillance).
gs-Sequencing_Strategy,Optional,Sequencing purpose for sample (e.g. DNA amplification).
gs-Sequencing_Technology,Optional,"Add the sequencer brand and model (e.g. Illumina MiSeq, Sanger, Nanopore MinION)."
gs-Assembly_Method,Optional,Method or software used to assemble the sequence.
gs-Coverage,Optional,"Average genome coverage (e.g. 50x, 100x, 1,000x)."
gs-Submitting_Sample_Id,Optional,ID used by submitting lab.
gs-Originating_Lab_Id,Optional,ID used for originating lab.
gs-Originating_Sample_Id,Optional,ID used by originating lab.
gs-Antigen_Character,Optional,
gs-Adamantanes_Resistance_geno,Optional,
gs-Adamantanes_Resistance_pheno,Optional,
gs-Oseltamivir_Resistance_geno,Optional,
gs-Oseltamivir_Resistance_pheno,Optional,
gs-Zanamivir_Resistance_geno,Optional,
gs-Zanamivir_Resistance_pheno,Optional,
gs-Peramivir_Resistance_geno,Optional,
gs-Peramivir_Resistance_pheno,Optional,
gs-Other_Resistance_geno,Optional,
gs-Other_Resistance_pheno,Optional,
gs-Host_Gender,Required,"Synonym for ""Biological sex"". Should be ""Female"", ""Male"", or ""Unknown""."
gs-Host_Age,Required,Numeric age for host.
gs-Host_Age_Unit,Required,"Unit of time for host age (e.g. Year is ""Y"", Month is ""M"")."
34 changes: 17 additions & 17 deletions test_data/FLU/flu_gisaid_metadata.csv
@@ -1,17 +1,17 @@
sequence_name,gs-sample_name,organism,collection_date,authors,gs-Isolate_Name,gs-segment,gs-Subtype,gs-Lineage,gs-Passage_History,gs-Location,gs-province,gs-sub_province,gs-Location_Additional_info,gs-Host,gs-Host_Additional_info,gs-Submitting_Sample_Id,gs-Originating_Lab_Id,gs-Originating_Sample_Id,gs-Antigen_Character,gs-Adamantanes_Resistance_geno,gs-Oseltamivir_Resistance_geno,gs-Zanamivir_Resistance_geno,gs-Peramivir_Resistance_geno,gs-Other_Resistance_geno,gs-Adamantanes_Resistance_pheno,gs-Oseltamivir_Resistance_pheno,gs-Zanamivir_Resistance_pheno,gs-Peramivir_Resistance_pheno,gs-Other_Resistance_pheno,gs-Host_Age,gs-Collection_Month,gs-Collection_Year,gs-Host_Age_Unit,gs-Host_Gender,gs-Health_Status,gs-Note,gs-PMID
XX-566912_PB2,A/California/566912/2016_PB2,Influenza A virus,2016-12-28,John Doe; Jane Doe;,A/California/566912/2016,PB2,H3N2,,Original,United States,California,,,Human,,,3080,,,,,,,,,,,,,92,,,Y,F,Unknown,,
XX-566912_PB1,A/California/566912/2016_PB1,Influenza A virus,2016-12-28,John Doe; Jane Doe;,A/California/566912/2016,PB1,H3N2,,Original,United States,California,,,Human,,,3080,,,,,,,,,,,,,92,,,Y,F,Unknown,,
XX-566912_PA,A/California/566912/2016_PA,Influenza A virus,2016-12-28,John Doe; Jane Doe;,A/California/566912/2016,PA,H3N2,,Original,United States,California,,,Human,,,3080,,,,,,,,,,,,,92,,,Y,F,Unknown,,
XX-566912_HA,A/California/566912/2016_HA,Influenza A virus,2016-12-28,John Doe; Jane Doe;,A/California/566912/2016,HA,H3N2,,Original,United States,California,,,Human,,,3080,,,,,,,,,,,,,92,,,Y,F,Unknown,,
XX-566912_NP,A/California/566912/2016_NP,Influenza A virus,2016-12-28,John Doe; Jane Doe;,A/California/566912/2016,NP,H3N2,,Original,United States,California,,,Human,,,3080,,,,,,,,,,,,,92,,,Y,F,Unknown,,
XX-566912_NA,A/California/566912/2016_NA,Influenza A virus,2016-12-28,John Doe; Jane Doe;,A/California/566912/2016,NA,H3N2,,Original,United States,California,,,Human,,,3080,,,,,,,,,,,,,92,,,Y,F,Unknown,,
XX-566912_M,A/California/566912/2016_MP,Influenza A virus,2016-12-28,John Doe; Jane Doe;,A/California/566912/2016,MP,H3N2,,Original,United States,California,,,Human,,,3080,,,,,,,,,,,,,92,,,Y,F,Unknown,,
XX-566912_NS,A/California/566912/2016_NS,Influenza A virus,2016-12-28,John Doe; Jane Doe;,A/California/566912/2016,NS,H3N2,,Original,United States,California,,,Human,,,3080,,,,,,,,,,,,,92,,,Y,F,Unknown,,
XX-566913_PB2,A/Texas/566913/2016_PB2,Influenza A virus,2016-11-10,John Doe; Jane Doe;,A/Texas/566913/2016,PB2,H3N2,,Original,United States,Texas,,,Human,,,3081,,,,,,,,,,,,,21,,,Y,M,Unknown,,
XX-566913_PB1,A/Texas/566913/2016_PB1,Influenza A virus,2016-11-10,John Doe; Jane Doe;,A/Texas/566913/2016,PB1,H3N2,,Original,United States,Texas,,,Human,,,3081,,,,,,,,,,,,,21,,,Y,M,Unknown,,
XX-566913_PA,A/Texas/566913/2016_PA,Influenza A virus,2016-11-10,John Doe; Jane Doe;,A/Texas/566913/2016,PA,H3N2,,Original,United States,Texas,,,Human,,,3081,,,,,,,,,,,,,21,,,Y,M,Unknown,,
XX-566913_HA,A/Texas/566913/2016_HA,Influenza A virus,2016-11-10,John Doe; Jane Doe;,A/Texas/566913/2016,HA,H3N2,,Original,United States,Texas,,,Human,,,3081,,,,,,,,,,,,,21,,,Y,M,Unknown,,
XX-566913_NP,A/Texas/566913/2016_NP,Influenza A virus,2016-11-10,John Doe; Jane Doe;,A/Texas/566913/2016,NP,H3N2,,Original,United States,Texas,,,Human,,,3081,,,,,,,,,,,,,21,,,Y,M,Unknown,,
XX-566913_NA,A/Texas/566913/2016_NA,Influenza A virus,2016-11-10,John Doe; Jane Doe;,A/Texas/566913/2016,NA,H3N2,,Original,United States,Texas,,,Human,,,3081,,,,,,,,,,,,,21,,,Y,M,Unknown,,
XX-566913_M,A/Texas/566913/2016_MP,Influenza A virus,2016-11-10,John Doe; Jane Doe;,A/Texas/566913/2016,MP,H3N2,,Original,United States,Texas,,,Human,,,3081,,,,,,,,,,,,,21,,,Y,M,Unknown,,
XX-566913_NS,A/Texas/566913/2016_NS,Influenza A virus,2016-11-10,John Doe; Jane Doe;,A/Texas/566913/2016,NS,H3N2,,Original,United States,Texas,,,Human,,,3081,,,,,,,,,,,,,21,,,Y,M,Unknown,,
sequence_name,gs-sample_name,organism,collection_date,authors,gs-Isolate_Name,gs-segment,gs-Subtype,gs-Lineage,gs-Passage_History,gs-Location,gs-province,gs-sub_province,gs-Location_Additional_info,gs-Host,gs-Host_Additional_info,gs-Sampling_Strategy,gs-Sequencing_Strategy,gs-Sequencing_Technology,gs-Assembly_Method,gs-Coverage,gs-Submitting_Sample_Id,gs-Originating_Lab_Id,gs-Originating_Sample_Id,gs-Antigen_Character,gs-Adamantanes_Resistance_geno,gs-Oseltamivir_Resistance_geno,gs-Zanamivir_Resistance_geno,gs-Peramivir_Resistance_geno,gs-Other_Resistance_geno,gs-Adamantanes_Resistance_pheno,gs-Oseltamivir_Resistance_pheno,gs-Zanamivir_Resistance_pheno,gs-Peramivir_Resistance_pheno,gs-Other_Resistance_pheno,gs-Host_Age,gs-Host_Age_Unit,gs-Host_Gender,gs-Health_Status,gs-Note,gs-PMID
XX-566912_PB2,A/California/566912/2016_PB2,Influenza A virus,2016-12-28,John Doe; Jane Doe;,A/California/566912/2016,PB2,H3N2,,Original,United States,California,,,Human,,Baseline surveillance,,Illumina,FluAssembler,100x,,3080,,,,,,,,,,,,,92,Y,F,Unknown,,
XX-566912_PB1,A/California/566912/2016_PB1,Influenza A virus,2016-12-28,John Doe; Jane Doe;,A/California/566912/2016,PB1,H3N2,,Original,United States,California,,,Human,,Baseline surveillance,,Illumina,FluAssembler,100x,,3080,,,,,,,,,,,,,92,Y,F,Unknown,,
XX-566912_PA,A/California/566912/2016_PA,Influenza A virus,2016-12-28,John Doe; Jane Doe;,A/California/566912/2016,PA,H3N2,,Original,United States,California,,,Human,,Baseline surveillance,,Illumina,FluAssembler,100x,,3080,,,,,,,,,,,,,92,Y,F,Unknown,,
XX-566912_HA,A/California/566912/2016_HA,Influenza A virus,2016-12-28,John Doe; Jane Doe;,A/California/566912/2016,HA,H3N2,,Original,United States,California,,,Human,,Baseline surveillance,,Illumina,FluAssembler,100x,,3080,,,,,,,,,,,,,92,Y,F,Unknown,,
XX-566912_NP,A/California/566912/2016_NP,Influenza A virus,2016-12-28,John Doe; Jane Doe;,A/California/566912/2016,NP,H3N2,,Original,United States,California,,,Human,,Baseline surveillance,,Illumina,FluAssembler,100x,,3080,,,,,,,,,,,,,92,Y,F,Unknown,,
XX-566912_NA,A/California/566912/2016_NA,Influenza A virus,2016-12-28,John Doe; Jane Doe;,A/California/566912/2016,NA,H3N2,,Original,United States,California,,,Human,,Baseline surveillance,,Illumina,FluAssembler,100x,,3080,,,,,,,,,,,,,92,Y,F,Unknown,,
XX-566912_M,A/California/566912/2016_MP,Influenza A virus,2016-12-28,John Doe; Jane Doe;,A/California/566912/2016,MP,H3N2,,Original,United States,California,,,Human,,Baseline surveillance,,Illumina,FluAssembler,100x,,3080,,,,,,,,,,,,,92,Y,F,Unknown,,
XX-566912_NS,A/California/566912/2016_NS,Influenza A virus,2016-12-28,John Doe; Jane Doe;,A/California/566912/2016,NS,H3N2,,Original,United States,California,,,Human,,Baseline surveillance,,Illumina,FluAssembler,100x,,3080,,,,,,,,,,,,,92,Y,F,Unknown,,
XX-566913_PB2,A/Texas/566913/2016_PB2,Influenza A virus,2016-11-10,John Doe; Jane Doe;,A/Texas/566913/2016,PB2,H3N2,,Original,United States,Texas,,,Human,,Baseline surveillance,,Illumina,FluAssembler,100x,,3081,,,,,,,,,,,,,21,Y,M,Unknown,,
XX-566913_PB1,A/Texas/566913/2016_PB1,Influenza A virus,2016-11-10,John Doe; Jane Doe;,A/Texas/566913/2016,PB1,H3N2,,Original,United States,Texas,,,Human,,Baseline surveillance,,Illumina,FluAssembler,100x,,3081,,,,,,,,,,,,,21,Y,M,Unknown,,
XX-566913_PA,A/Texas/566913/2016_PA,Influenza A virus,2016-11-10,John Doe; Jane Doe;,A/Texas/566913/2016,PA,H3N2,,Original,United States,Texas,,,Human,,Baseline surveillance,,Illumina,FluAssembler,100x,,3081,,,,,,,,,,,,,21,Y,M,Unknown,,
XX-566913_HA,A/Texas/566913/2016_HA,Influenza A virus,2016-11-10,John Doe; Jane Doe;,A/Texas/566913/2016,HA,H3N2,,Original,United States,Texas,,,Human,,Baseline surveillance,,Illumina,FluAssembler,100x,,3081,,,,,,,,,,,,,21,Y,M,Unknown,,
XX-566913_NP,A/Texas/566913/2016_NP,Influenza A virus,2016-11-10,John Doe; Jane Doe;,A/Texas/566913/2016,NP,H3N2,,Original,United States,Texas,,,Human,,Baseline surveillance,,Illumina,FluAssembler,100x,,3081,,,,,,,,,,,,,21,Y,M,Unknown,,
XX-566913_NA,A/Texas/566913/2016_NA,Influenza A virus,2016-11-10,John Doe; Jane Doe;,A/Texas/566913/2016,NA,H3N2,,Original,United States,Texas,,,Human,,Baseline surveillance,,Illumina,FluAssembler,100x,,3081,,,,,,,,,,,,,21,Y,M,Unknown,,
XX-566913_M,A/Texas/566913/2016_MP,Influenza A virus,2016-11-10,John Doe; Jane Doe;,A/Texas/566913/2016,MP,H3N2,,Original,United States,Texas,,,Human,,Baseline surveillance,,Illumina,FluAssembler,100x,,3081,,,,,,,,,,,,,21,Y,M,Unknown,,
XX-566913_NS,A/Texas/566913/2016_NS,Influenza A virus,2016-11-10,John Doe; Jane Doe;,A/Texas/566913/2016,NS,H3N2,,Original,United States,Texas,,,Human,,Baseline surveillance,,Illumina,FluAssembler,100x,,3081,,,,,,,,,,,,,21,Y,M,Unknown,,