Commit

Bug fix for GISAID name overwriting GenBank name for FASTA file
dthoward96 committed Apr 15, 2024
1 parent 67d14c9 commit 18f77f1
Showing 50 changed files with 5,494 additions and 3,560 deletions.
35 changes: 35 additions & 0 deletions TEST_COV/config.yaml
@@ -0,0 +1,35 @@
Submission:
  NCBI:
    Username: username
    Password: password
    Table2asn: False
    Submission_Position: 2
    Description:
      Title: cov_test_submission
      Comment: This is a test submission
      Organization:
        '@role': owner
        '@type': institute
        Name: CDC
        Address:
          Affil: Centers for Disease Control and Prevention
          Div: Respiratory Viruses Branch, Division of Viral Diseases
          Street: 1600 Clifton Rd
          City: Atlanta
          Sub: GA
          Postal_code: 30329
          Country: USA
          Email: [email protected]
          Phone: ""
        Submitter:
          '@email': [email protected]
          '@alt_email':
          Name:
            First: Jane
            Last: Doe
  GISAID:
    Client-Id: TEST-EA76875B00C3
    Username: username
    Password: password
    Submission_Position: 1
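
Both database blocks in the config above carry a `Submission_Position` value. A minimal sketch of how such a value could be used to order submissions, assuming it is a simple ascending rank; the `submission_order` helper and the trimmed `config` dict are hypothetical and only mirror the two keys shown in the file, not the tool's actual logic:

```python
# Trimmed, hand-written stand-in for the parsed config.yaml (assumption:
# the real file would be loaded with a YAML parser into a dict like this).
config = {
    "Submission": {
        "NCBI": {"Submission_Position": 2},
        "GISAID": {"Submission_Position": 1},
    }
}


def submission_order(config):
    """Return database names sorted by their Submission_Position value."""
    databases = config["Submission"]
    return sorted(databases, key=lambda name: databases[name]["Submission_Position"])


order = submission_order(config)
print(order)
```

With the positions in this test config, GISAID sorts ahead of NCBI.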

3 changes: 3 additions & 0 deletions TEST_COV/metadata.csv
@@ -0,0 +1,3 @@
organism,collection_date,authors,ncbi-spuid,ncbi-spuid_namespace,ncbi-bioproject,bs-isolate,bs-package,bs-description,bs-collected_by, bs-host, bs-host_disease,bs-isolation_source,bs-geo_loc_name,bs-host_sex,bs-host_age,sra-file_location,sra-file_name,sra-library_name,sra-instrument_model,sra-library_strategy,sra-library_source,sra-library_selection,sra-library_layout,sra-library_construction_protocol,sra-loader,sequence_name,gb-seq_id,gb-subm_lab,gb-subm_lab_division,gb-subm_lab_addr,gb-authors,gb-publication_status,gb-publication_title,src-isolate,src-country,src-host,src-isolation_source,cmt-StructuredCommentPrefix,cmt-Assembly Method,cmt-Coverage,cmt-Sequencing Technology,cmt-StructuredCommentSuffix,gs-virus_name,gs-type,gs-passage,gs-location,gs-add_location,gs-host,gs-add_host_info,gs-sampling_strategy,gs-gender,gs-patient_age,gs-patient_status,gs-specimen,gs-outbreak,gs-last_vaccinated,gs-treatment,gs-seq_technology,gs-assembly_method,gs-coverage,gs-orig_lab,gs-orig_lab_addr,gs-provider_sample_id,gs-subm_lab,gs-subm_lab_addr,gs-subm_sample_id,gs-consortium,gs-comment,gs-comment_type
Severe acute respiratory syndrome coronavirus 2,3/28/2020,"Doe, John, R.; Doe, Jane;",SARS-CoV-2/human/USA/GA_2741/2020,CDC-OAMD,PRJNA512913,SARS-CoV-2/human/USA/GA_2741/2020,SARS-CoV-2.cl.1.0,Sars CoV2 Sequencing Baseline Constellation,Helix,Homo sapiens,COVID-19,nasal swab,United States: Georgia,Male,28,local,"fastq_1_R1.fastq.gz, fastq_1_R2.fastq.gz",Helix COVID-19 and Flu Test,Illumina NovaSeq 6000,AMPLICON,VIRAL RNA,RT-PCR,PAIRED,Helix Hybrid-Capture Test,latf-load,GA_2741,SARS-CoV-2/human/USA/GA_2741/2020,NIH,NCBI,"10 Center Dr, Bethesda, MD, USA 20895","Doe, John, R.; Doe, Jane;",unpublished,,SARS-CoV-2/human/USA/GA_2741/2020,USA: GA,Homo sapiens,nasal swab,Assembly-Data,Newbler v. 2.3,100x,Illumina ,Assembly-Data,hCoV-19/USA/GA-2741/2022,betacoronavirus,Original,North America/USA/Georgia,unknown,Human,,,unknown,unknown,unknown,,,,,Illumina NextSeq 550,,3000x,Bio1 Mel,"16 Info St, Mel, Aus",,Bio2 Mel,"32 Data St, Mel, Aus",,,,
Severe acute respiratory syndrome coronavirus 2,4/29/2020,"Doe, John; Doe, Jane;",SARS-CoV-2/human/USA/GA_3742/2020,CDC-OAMD,PRJNA512962,SARS-CoV-2/human/USA/GA_3742/2020,SARS-CoV-2.cl.1.0,Sars CoV2 Sequencing Baseline Constellation,Helix,Homo sapiens,COVID-20,nasal swab,United States: Georgia,Male,45,local,"fastq_2_R1.fastq.gz, fastq_2_R2.fastq.gz",Helix COVID-19 and Flu Test,Illumina NovaSeq 6000,AMPLICON,VIRAL RNA,RT-PCR,PAIRED,Helix Hybrid-Capture Test,latf-load,GA_2742,SARS-CoV-2/human/USA/GA_3742/2020,NIH,NCBI,"10 Center Dr, Bethesda, MD, USA 20895","Doe, John; Doe, Jane;",unpublished,,SARS-CoV-2/human/USA/GA_3742/2020,USA: GA,Homo sapiens,nasal swab,Assembly-Data,Newbler v. 2.3,100x,Illumina,Assembly-Data ,hCoV-19/USA/GA_2742/2022,betacoronavirus,Original,North America/USA/Georgia,unknown,Human,,,unknown,unknown,unknown,,,,,Illumina NextSeq 550,,3000x,Bio1 Mel,"16 Info St, Mel, Aus",,Bio2 Mel,"33 Data St, Mel, Aus",,,,
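
The metadata header groups columns by prefix (`bs-`, `sra-`, `gb-`, `src-`, `cmt-`, `gs-`), a few header names carry a leading space, and `collection_date` is written as M/D/YYYY while the generated submission files use YYYY-MM-DD. A rough sketch of reading such a file with the standard library, assuming those prefixes map to per-database groups; the `SAMPLE` text is a trimmed excerpt of the first row, and `split_by_prefix`, `to_iso_date`, and `load_rows` are hypothetical helper names:

```python
import csv
import io
from datetime import datetime

# Trimmed excerpt of the header and first row above (not the full 70 columns).
SAMPLE = (
    "organism,collection_date,bs-isolate, bs-host,gb-seq_id,gs-virus_name\n"
    "Severe acute respiratory syndrome coronavirus 2,3/28/2020,"
    "SARS-CoV-2/human/USA/GA_2741/2020,Homo sapiens,"
    "SARS-CoV-2/human/USA/GA_2741/2020,hCoV-19/USA/GA-2741/2022\n"
)

PREFIXES = ("bs-", "sra-", "gb-", "src-", "cmt-", "gs-")


def to_iso_date(value):
    """Convert an M/D/YYYY date string to YYYY-MM-DD."""
    return datetime.strptime(value.strip(), "%m/%d/%Y").strftime("%Y-%m-%d")


def split_by_prefix(row):
    """Group one row's columns by database prefix; unprefixed columns go to 'shared'."""
    groups = {"shared": {}}
    for column, value in row.items():
        for prefix in PREFIXES:
            if column.startswith(prefix):
                group = groups.setdefault(prefix.rstrip("-"), {})
                group[column[len(prefix):]] = value
                break
        else:
            groups["shared"][column] = value
    return groups


def load_rows(handle):
    """Read CSV rows, dropping the space that follows some delimiters."""
    reader = csv.DictReader(handle, skipinitialspace=True)
    return [split_by_prefix(row) for row in reader]


rows = load_rows(io.StringIO(SAMPLE))
print(rows[0]["gs"]["virus_name"], to_iso_date(rows[0]["shared"]["collection_date"]))
```

`skipinitialspace=True` is what lets ` bs-host` in the header resolve to `bs-host`.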
Binary file added TEST_COV/raw_reads/fastq_1_R1.fastq.gz
Binary file not shown.
Binary file added TEST_COV/raw_reads/fastq_1_R2.fastq.gz
Binary file not shown.
Binary file added TEST_COV/raw_reads/fastq_2_R1.fastq.gz
Binary file not shown.
Binary file added TEST_COV/raw_reads/fastq_2_R2.fastq.gz
Binary file not shown.
750 changes: 750 additions & 0 deletions TEST_COV/sequence.fasta

Large diffs are not rendered by default.
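
The commit title describes the GISAID name overwriting the GenBank name in the FASTA file. One way to avoid that class of bug, sketched here under assumptions and not taken from the actual fix, is to keep the parsed records keyed by `sequence_name` and render a separate FASTA per database from its own name map. The names come from the test metadata (`sequence_name`, `gb-seq_id`, `gs-virus_name`); the sequences are toy placeholders, and `read_fasta` and `render_fasta` are hypothetical helpers:

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def render_fasta(records, name_map):
    """Build FASTA text with renamed headers; the input records are not modified."""
    lines = []
    for name, sequence in records:
        lines.append(">" + name_map[name])
        lines.append(sequence)
    return "\n".join(lines) + "\n"


# Toy sequences; only the header names mirror the test metadata.
records = read_fasta(">GA_2741\nACGT\nTTGA\n>GA_2742\nGGCC\n")

genbank_names = {
    "GA_2741": "SARS-CoV-2/human/USA/GA_2741/2020",
    "GA_2742": "SARS-CoV-2/human/USA/GA_3742/2020",
}
gisaid_names = {
    "GA_2741": "hCoV-19/USA/GA-2741/2022",
    "GA_2742": "hCoV-19/USA/GA_2742/2022",
}

# Rendering GISAID first does not affect the GenBank output, because each
# call starts again from the original sequence_name keys.
gisaid_fasta = render_fasta(records, gisaid_names)
genbank_fasta = render_fasta(records, genbank_names)
print(genbank_fasta)
```

Because each output is derived from the untouched `records` list, the order in which the databases are processed cannot leak one database's names into another's file.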

90 changes: 90 additions & 0 deletions TEST_COV/submission_files/BIOSAMPLE/submission.xml
@@ -0,0 +1,90 @@
<?xml version='1.0' encoding='utf-8'?>
<Submission>
<Description>
<Title>cov_test_submission</Title>
<Comment>This is a test submission</Comment>
<Organization type="institute" role="owner">
<Name>CDC</Name>
<Contact email="[email protected]">
<Name>
<First>Jane</First>
<Last>Doe</Last>
</Name>
</Contact>
</Organization>
</Description>
<Action>
<AddData target_db="BioSample">
<Data content_type="xml">
<XmlContent>
<BioSample schema_version="2.0">
<SampleId>
<SPUID spuid_namespace="CDC-OAMD">SARS-CoV-2/human/USA/GA_2741/2020</SPUID>
</SampleId>
<Descriptor>
<Title>Sars CoV2 Sequencing Baseline Constellation</Title>
</Descriptor>
<Organism>
<OrganismName>Severe acute respiratory syndrome coronavirus 2</OrganismName>
</Organism>
<BioProject>
<PrimaryId db="BioProject">PRJNA512913</PrimaryId>
</BioProject>
<Package>SARS-CoV-2.cl.1.0</Package>
<Attributes>
<Attribute attribute_name="isolate">SARS-CoV-2/human/USA/GA_2741/2020</Attribute>
<Attribute attribute_name="collected_by">Helix</Attribute>
<Attribute attribute_name="host">Homo sapiens</Attribute>
<Attribute attribute_name="host_disease">COVID-19</Attribute>
<Attribute attribute_name="isolation_source">nasal swab</Attribute>
<Attribute attribute_name="geo_loc_name">United States: Georgia</Attribute>
<Attribute attribute_name="host_sex">Male</Attribute>
<Attribute attribute_name="host_age">28</Attribute>
<Attribute attribute_name="collection_date">2020-03-28</Attribute>
</Attributes>
</BioSample>
</XmlContent>
</Data>
<Identifier>
<SPUID spuid_namespace="CDC-OAMD_bs">SARS-CoV-2/human/USA/GA_2741/2020</SPUID>
</Identifier>
</AddData>
</Action>
<Action>
<AddData target_db="BioSample">
<Data content_type="xml">
<XmlContent>
<BioSample schema_version="2.0">
<SampleId>
<SPUID spuid_namespace="CDC-OAMD">SARS-CoV-2/human/USA/GA_3742/2020</SPUID>
</SampleId>
<Descriptor>
<Title>Sars CoV2 Sequencing Baseline Constellation</Title>
</Descriptor>
<Organism>
<OrganismName>Severe acute respiratory syndrome coronavirus 2</OrganismName>
</Organism>
<BioProject>
<PrimaryId db="BioProject">PRJNA512962</PrimaryId>
</BioProject>
<Package>SARS-CoV-2.cl.1.0</Package>
<Attributes>
<Attribute attribute_name="isolate">SARS-CoV-2/human/USA/GA_3742/2020</Attribute>
<Attribute attribute_name="collected_by">Helix</Attribute>
<Attribute attribute_name="host">Homo sapiens</Attribute>
<Attribute attribute_name="host_disease">COVID-20</Attribute>
<Attribute attribute_name="isolation_source">nasal swab</Attribute>
<Attribute attribute_name="geo_loc_name">United States: Georgia</Attribute>
<Attribute attribute_name="host_sex">Male</Attribute>
<Attribute attribute_name="host_age">45</Attribute>
<Attribute attribute_name="collection_date">2020-04-29</Attribute>
</Attributes>
</BioSample>
</XmlContent>
</Data>
<Identifier>
<SPUID spuid_namespace="CDC-OAMD_bs">SARS-CoV-2/human/USA/GA_3742/2020</SPUID>
</Identifier>
</AddData>
</Action>
</Submission>
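
Each sample in the file above is wrapped in its own `<Action>` element with the same nesting. As an illustration only, one such element could be assembled with the standard library's `xml.etree.ElementTree`; the `biosample_action` function and the shape of the `sample` dict are assumptions made for this sketch, and the `_bs` suffix on the identifier namespace simply mirrors what the generated file shows:

```python
import xml.etree.ElementTree as ET


def biosample_action(sample, namespace):
    """Build one <Action> element shaped like those in submission.xml."""
    action = ET.Element("Action")
    add_data = ET.SubElement(action, "AddData", target_db="BioSample")
    data = ET.SubElement(add_data, "Data", content_type="xml")
    content = ET.SubElement(data, "XmlContent")
    biosample = ET.SubElement(content, "BioSample", schema_version="2.0")

    sample_id = ET.SubElement(biosample, "SampleId")
    ET.SubElement(sample_id, "SPUID", spuid_namespace=namespace).text = sample["spuid"]
    descriptor = ET.SubElement(biosample, "Descriptor")
    ET.SubElement(descriptor, "Title").text = sample["title"]
    organism = ET.SubElement(biosample, "Organism")
    ET.SubElement(organism, "OrganismName").text = sample["organism"]
    bioproject = ET.SubElement(biosample, "BioProject")
    ET.SubElement(bioproject, "PrimaryId", db="BioProject").text = sample["bioproject"]
    ET.SubElement(biosample, "Package").text = sample["package"]

    attributes = ET.SubElement(biosample, "Attributes")
    for name, value in sample["attributes"].items():
        ET.SubElement(attributes, "Attribute", attribute_name=name).text = value

    # The generated file uses the namespace plus "_bs" for the identifier.
    identifier = ET.SubElement(add_data, "Identifier")
    ET.SubElement(identifier, "SPUID", spuid_namespace=namespace + "_bs").text = sample["spuid"]
    return action


sample = {
    "spuid": "SARS-CoV-2/human/USA/GA_2741/2020",
    "title": "Sars CoV2 Sequencing Baseline Constellation",
    "organism": "Severe acute respiratory syndrome coronavirus 2",
    "bioproject": "PRJNA512913",
    "package": "SARS-CoV-2.cl.1.0",
    "attributes": {"host": "Homo sapiens", "collection_date": "2020-03-28"},
}
action = biosample_action(sample, "CDC-OAMD")
print(ET.tostring(action, encoding="unicode"))
```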
Binary file added TEST_COV/submission_files/GENBANK/TEST_COV.zip
Binary file not shown.
88 changes: 88 additions & 0 deletions TEST_COV/submission_files/GENBANK/authorset.sbt
@@ -0,0 +1,88 @@
Submit-block ::= {
contact {
contact {
name name {
last "Doe",
first "Jane",
middle "",
initials "",
suffix "",
title ""
},
affil std {
affil "Centers for Disease Control and Prevention",
div "Respiratory Viruses Branch, Division of Viral Diseases",
city "Atlanta",
sub "GA",
country "USA",
street "1600 Clifton Rd",
email "[email protected]",
phone "",
postal-code "30329"
}
}
},
cit {
authors {
names std {
{
name name {
last "Doe",
first "John",
suffix "R."
}
},
{
name name {
last "Doe",
first "Jane"
}
}
},
affil std {
affil "Centers for Disease Control and Prevention",
div "Respiratory Viruses Branch, Division of Viral Diseases",
city "Atlanta",
sub "GA",
country "USA",
street "1600 Clifton Rd",
postal-code "30329"
}
}
},
subtype new
}
Seqdesc ::= pub {
pub {
gen {
cit "unpublished",
authors {
names std {
{
name name {
last "Doe",
first "John",
suffix "R."
}
},
{
name name {
last "Doe",
first "Jane"
}
}
}
},
title ""
}
}
}
Seqdesc ::= user {
type str "Submission",
data {
{
label str "AdditionalComment",
data str "Submission Title: TEST_COV"
}
}
}
3 changes: 3 additions & 0 deletions TEST_COV/submission_files/GENBANK/comment.cmt
@@ -0,0 +1,3 @@
SeqID	StructuredCommentPrefix	organism	collection_date	Assembly Method	Coverage	Sequencing Technology	StructuredCommentSuffix
SARS-CoV-2/human/USA/GA_2741/2020	Assembly-Data	Severe acute respiratory syndrome coronavirus 2	2020-03-28	Newbler v. 2.3	100x	Illumina	Assembly-Data
SARS-CoV-2/human/USA/GA_3742/2020	Assembly-Data	Severe acute respiratory syndrome coronavirus 2	2020-04-29	Newbler v. 2.3	100x	Illumina	Assembly-Data