SGN tomato data description

Data set metadata

Dataset title: Tomato gene models

Dataset description: The ITAG2.4 release of the official Solanum lycopersicum (cultivar Heinz 1706) genome annotation with 34,725 gene models, available from the Sol Genomics Network (SGN).

Download URL: ftp://ftp.solgenomics.net/genomes/Solanum_lycopersicum/annotation/ITAG2.4_release/ITAG2.4_gene_models.gff3

License: ?

Release/version: ITAG2.4 genome annotation (based on SL2.50 genome assembly)

Release issue date: 23-02-2014 (DD-MM-YYYY)

Distribution format: GFF3 (according to gff-version pragma)

Note: Strictly speaking, the GFF file does not comply with this specification.

MD5 checksum: 4bf947efde8b0f8101e3bcb9746e5986
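
A downloaded copy can be checked against the MD5 value listed above. The following is a minimal sketch using Python's standard `hashlib` module; the helper name `md5_of_file` and the local file path are assumptions made for illustration.

```python
import hashlib

# MD5 value listed above for ITAG2.4_gene_models.gff3
EXPECTED_MD5 = "4bf947efde8b0f8101e3bcb9746e5986"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical usage; the local path is an assumption:
# assert md5_of_file("ITAG2.4_gene_models.gff3") == EXPECTED_MD5
```

The same helper applies to the other files described on this page, using their respective checksums.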

Data record metadata

Example:

SL2.50ch00	ITAG_eugene	gene	16437	18189	.	+	.	Alias=Solyc00g005000;ID=gene:Solyc00g005000.2;Name=Solyc00g005000.2;from_BOGAS=1;length=1753
SL2.50ch00	ITAG_eugene	mRNA	16437	18189	.	+	.	ID=mRNA:Solyc00g005000.2.1;Name=Solyc00g005000.2.1;Note=Aspartic proteinase nepenthesin I (AHRD V1 **-- A9ZMF9_NEPAL)%3B contains Interpro domain(s)  IPR001461  Peptidase A1 ;Ontology_term=GO:0006508;Parent=gene:Solyc00g005000.2;from_BOGAS=1;interpro2go_term=GO:0006508;length=1753;nb_exon=2
SL2.50ch00	ITAG_eugene	exon	16437	17275	.	+	.	ID=exon:Solyc00g005000.2.1.1;Parent=mRNA:Solyc00g005000.2.1;from_BOGAS=1

GFF3 files are nine-column, tab-delimited, plain text files:

Column 1 "seqid": chromosome numbers (e.g. SL2.50ch00..ch12), mandatory

Note: Link the chromosomes to ENA/GenBank accessions (e.g. SL2.50ch01 -> CM001064.2).

Column 2 "source": data source (constant: ITAG_eugene, refers to Eugene gene predictor), mandatory

Column 3 "type": feature types (gene, mRNA, CDS, exon, intron, five_prime_UTR, three_prime_UTR), mandatory

Column 4 "start": start coordinate of the feature, mandatory

Column 5 "end": end coordinate of the feature, mandatory

Column 6 "score": not available (.)

Column 7 "strand": DNA strandedness (+/-), mandatory

Column 8 "phase": for CDS features, indicates where the next codon begins relative to the start of the feature (0, 1 or 2); '.' is used for all other features, including exons (see the example above), mandatory

Column 9 "attributes": contains key=value pairs separated by ;

ID: unique feature ID, redundantly prefixed with the feature type (e.g. gene:Solyc00g005000.2 or mRNA:Solyc00g005000.2.1); the web front-end inconsistently uses the non-prefixed form.
Name: the non-prefixed form of feature ID (e.g. Solyc00g005000.2 or Solyc00g005000.2.1)
Parent: refers to the parent ID of this (child) feature, indicates part-of relation (e.g. to group transcripts into genes or exons into transcripts)
Note: function annotation of transcripts based on homology to (plant) proteins in UniProtKB, domains/motifs in InterPro
Ontology_term, interpro2go_term, Sifter_term: cross-references to Gene Ontology term IDs
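
The column layout described above can be read with a few lines of Python. This is a minimal sketch, not a full GFF3 parser: the function name `parse_gff3_line` is an assumption, comment and pragma lines are not handled, and attribute values are percent-decoded only after splitting on `;` so that an encoded `%3B` inside a Note does not break the split.

```python
from urllib.parse import unquote

COLUMNS = ("seqid", "source", "type", "start", "end",
           "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Split one tab-delimited GFF3 record into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    record = dict(zip(COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attributes = {}
    for pair in record["attributes"].split(";"):
        if not pair:
            continue  # tolerate a trailing ';'
        key, _, value = pair.partition("=")
        attributes[key] = unquote(value)  # e.g. %3B -> ';'
    record["attributes"] = attributes
    return record


# Exon record taken from the example above
example = (
    "SL2.50ch00\tITAG_eugene\texon\t16437\t17275\t.\t+\t.\t"
    "ID=exon:Solyc00g005000.2.1.1;Parent=mRNA:Solyc00g005000.2.1;from_BOGAS=1"
)
rec = parse_gff3_line(example)
print(rec["type"], rec["attributes"]["Parent"])  # exon mRNA:Solyc00g005000.2.1
```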

Data set metadata

Dataset title: Tomato SGN genetic markers

Dataset description: The original dataset contains alignments to SGN unigenes, SGN marker sequences and SGN locus sequences. Only SGN markers are imported (in total 5077 ITAG_sgn_markers).

Download URL: ftp://ftp.solgenomics.net/genomes/Solanum_lycopersicum/annotation/ITAG2.4_release/ITAG2.4_sgn_data.gff3

License: ?

Release/version: ITAG2.4

Release issue date: 23-02-2014

Distribution format: GFF3 (according to gff-version pragma)

MD5 checksum: 939cf6f468eab5572653b626d5078aaa

Data record metadata

Example:

SL2.50ch00	ITAG_sgn_markers	match	3999461	4000061	0.989	-	.	Alias=SGN-M676;ID=gene1_0-i2;Name=SSR3;Note=marker name(s): SSR3%2C SGN-M676;Target=SGN-M676 1 601 +

Column 1 "seqid": chromosome numbers (e.g. SL2.50ch00..ch12), mandatory

Column 2 "source": data source (constant: ITAG_sgn_markers), mandatory

Column 3 "type": feature type (only match is used, although variant would be more correct)

Column 4 "start": start coordinate of the feature, mandatory

Column 5 "end": end coordinate of the feature, mandatory

Column 6 "score": not relevant

Column 7 "strand": not relevant

Column 8 "phase": not relevant

Column 9 "attributes": contains key=value pairs separated by ;

ID: unique marker ID (e.g. gene1_0-i2), mandatory
Name: marker as known in literature? (e.g. cLER-14-H18), mandatory
Alias: alternative name for the marker (e.g. SGN-M2995), mandatory
Note: concatenation of both Name and Alias values (redundant)
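
The redundancy of the Note attribute can be checked programmatically. The sketch below assumes that every Note follows the pattern seen in the example record, i.e. the prefix `marker name(s): ` followed by comma-separated names (the comma is encoded as `%2C` in the file); the helper names are hypothetical.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Turn the ';'-separated key=value pairs of column 9 into a dict."""
    attrs = {}
    for pair in column9.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attrs[key] = unquote(value)  # %2C -> ','
    return attrs


def note_is_redundant(attrs):
    """Return True if Note only repeats the Name and Alias values."""
    prefix = "marker name(s): "  # assumed from the example record
    note = attrs.get("Note", "")
    if not note.startswith(prefix):
        return False
    names = {name.strip() for name in note[len(prefix):].split(",")}
    return names == {attrs.get("Name"), attrs.get("Alias")}


# Column 9 of the example record above
column9 = ("Alias=SGN-M676;ID=gene1_0-i2;Name=SSR3;"
           "Note=marker name(s): SSR3%2C SGN-M676;Target=SGN-M676 1 601 +")
attrs = parse_attributes(column9)
print(note_is_redundant(attrs))  # True
```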

Data set metadata

Dataset title: Tomato SolCAP genetic markers

Dataset description: The dataset contains SolCAP genetic markers (in total 8760 SNPs).

Download URL: ftp://ftp.solgenomics.net/genomes/Solanum_lycopersicum/annotation/ITAG2.4_release/ITAG2.4_solCAP.gff3

License: ?

Release/version: ITAG2.4

Release issue date: 11-07-2014

Distribution format: GFF3 (according to gff-version pragma)

MD5 checksum: 6d1e291acfa8f20cf89438e521315c80

Data record metadata

Example:

SL2.50ch00	ITAG_sgn_markers	match	16728330	16728330	.	+	.	ID=solcap_snp_sl_100476;Name=solcap_snp_sl_100476;Alias=solcap_snp_sl_100476;Note=marker name(s): solcap_snp_sl_100476;Target=solcap_snp_sl_100476 1 1 +

Column 1 "seqid": chromosome numbers (e.g. SL2.50ch00..ch12), mandatory

Column 2 "source": data source (constant: ITAG_sgn_markers), mandatory

Column 3 "type": feature type match (although variant would be more correct)

Column 4 "start": start coordinate of the feature, mandatory

Column 5 "end": end coordinate of the feature, mandatory

Column 6 "score": not relevant

Column 7 "strand": not relevant

Column 8 "phase": not relevant

Column 9 "attributes": contains key=value pairs separated by ;

ID: unique marker ID (e.g. solcap_snp_sl_100476), mandatory
Name, Alias and Note: same as ID (redundant)
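
Because most attributes repeat the ID, a SolCAP record reduces to a chromosome, a position and a marker ID. The following sketch makes that reduction under the assumption that every SNP spans a single base (start equals end), as in the example record; the function name `snp_from_line` is hypothetical.

```python
def snp_from_line(line):
    """Reduce a SolCAP GFF3 record to (seqid, position, marker_id, redundant)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    seqid, start, end = fields[0], fields[3], fields[4]
    if start != end:
        # Assumption: SNPs are single-base features, as in the example
        raise ValueError("feature spans more than one base")
    attrs = dict(pair.split("=", 1) for pair in fields[8].split(";") if pair)
    marker_id = attrs["ID"]
    # Name and Alias are expected to repeat the ID
    redundant = attrs.get("Name") == marker_id and attrs.get("Alias") == marker_id
    return seqid, int(start), marker_id, redundant


# Example record from above
line = (
    "SL2.50ch00\tITAG_sgn_markers\tmatch\t16728330\t16728330\t.\t+\t.\t"
    "ID=solcap_snp_sl_100476;Name=solcap_snp_sl_100476;"
    "Alias=solcap_snp_sl_100476;Note=marker name(s): solcap_snp_sl_100476;"
    "Target=solcap_snp_sl_100476 1 1 +"
)
print(snp_from_line(line))
```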

Data set metadata

Dataset title: Wild tomato genome annotation

Dataset description: The genome of the stress-tolerant wild tomato species Solanum pennellii (Bolger et al. 2014).

Download URL: ftp://ftp.solgenomics.net/genomes/Solanum_pennellii/spenn_v2.0_gene_models_annot.gff

License: ?

Release/version: 2.0 (not official but deduced from the file name)

Release issue date: 27-08-2014

Distribution format: GFF3 (according to gff-version pragma)

MD5 checksum: 71158bb0bf7bb323644c52c0fde37dc8

Data record metadata

Example:


Spenn-ch01	AUGUSTUS	gene	6838142	6841569	0.2	+	.	ID=Sopen01g006750;Name=Sopen01g006750;
Spenn-ch01	AUGUSTUS	mRNA	6838142	6841569	0.2	+	.	ID=Sopen01g006750.1;Name=Sopen01g006750.1;Parent=Sopen01g006750;Note=Member of the R2R3 factor gene family. | myb domain protein 16 (MYB16) | CONTAINS InterPro DOMAIN/s: SANT, DNA-binding , Homeodomain-like , Myb, DNA-binding , Homeodomain-related , Myb transcription factor , HTH transcriptional regulator, Myb-type, DNA-binding | BEST Arabidopsis thaliana protein match is: myb domain protein 106;
Spenn-ch01	AUGUSTUS	exon	6838142	6838355	.	+	.

GFF3 files are nine-column, tab-delimited, plain text files:

Column 1 "seqid": chromosome numbers (e.g. Spenn-ch00..ch12), mandatory

Note: Link the chromosomes to ENA/GenBank. Apparently, there are three S. pennellii genome assemblies in ENA.

Column 2 "source": data source (constant: AUGUSTUS gene predictor), mandatory

Column 3 "type": feature types (gene, mRNA, CDS, exon, intron but no five_prime_UTR or three_prime_UTR), mandatory

Column 4 "start": start coordinate of the feature, mandatory

Column 5 "end": end coordinate of the feature, mandatory

Column 6 "score": values between 0 and 4 for gene, mRNA, CDS and intron features but '.' for exons

Column 7 "strand": DNA strandedness (+/-), mandatory

Column 8 "phase": for CDS features, indicates where the next codon begins relative to the start of the feature (0, 1 or 2); '.' is used for all other features, including exons (see the example above), mandatory

Column 9 "attributes": contains key=value pairs separated by ;

ID: unique feature ID (e.g. Sopen01g006750 or Sopen01g006750.1); note that, unlike gene and mRNA IDs, CDS/exon/intron IDs are prefixed with the feature type (e.g. cds:Sopen01g006750.1.1).
Name: same as ID (redundant)
Parent: refers to the parent ID of this (child) feature, indicates part-of relation (e.g. to group transcripts into genes or exons into transcripts)
Note: function annotation of transcripts but without reference to e.g. UniProtKB, InterPro or GO term accessions/IDs
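
Two small normalisation steps follow from the attribute description above: removing the feature-type prefix from CDS/exon/intron IDs, and splitting the free-text Note into its parts. This is a sketch under the assumption that Note parts are delimited by `|`, as in the example record; both helper names are hypothetical.

```python
def strip_type_prefix(feature_id):
    """Drop a leading 'type:' prefix, e.g. 'cds:Sopen01g006750.1.1'."""
    _prefix, sep, rest = feature_id.partition(":")
    return rest if sep else feature_id


def split_note(note):
    """Split a Note value on '|' and trim whitespace around each part."""
    return [part.strip() for part in note.split("|") if part.strip()]


print(strip_type_prefix("cds:Sopen01g006750.1.1"))  # Sopen01g006750.1.1
print(strip_type_prefix("Sopen01g006750.1"))        # Sopen01g006750.1

# Note value abridged from the mRNA example above (InterPro part omitted)
note = ("Member of the R2R3 factor gene family. | "
        "myb domain protein 16 (MYB16) | "
        "BEST Arabidopsis thaliana protein match is: myb domain protein 106")
for part in split_note(note):
    print(part)
```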

Data set metadata

Dataset title: Wild tomato SGN genetic markers

Dataset description: The dataset contains SGN genetic markers for S. pennellii (in total 2225).

Download URL: ftp://ftp.solgenomics.net/genomes/Solanum_pennellii/sgnMarkersSpenn.gff3

License: ?

Release/version: ?

Release issue date: 08-10-2014

Distribution format: GFF (non-standard, gff-version pragma missing)

MD5 checksum: 047c9b3b8ed0bd0f813bd897e502529b

Data record metadata

Example:

Spenn-ch12      sgn_markers     match   2621812 2622049 .       +       .       Alias=SGN-M1347;ID=T0028;Note=marker name(s): T0028 SGN-M1347 |identity=99.58|escore=2e-126

Column 1 "seqid": chromosome numbers (e.g. Spenn-ch00..ch12), mandatory

Column 2 "source": data source (constant: sgn_markers), mandatory

Column 3 "type": feature type match (although variant would be more correct)

Column 4 "start": start coordinate of the feature, mandatory

Column 5 "end": end coordinate of the feature, mandatory

Column 6 "score": not relevant

Column 7 "strand": not relevant

Column 8 "phase": not relevant

Column 9 "attributes": contains key=value pairs separated by ;

ID: unique marker ID (e.g. T0028), mandatory
- Note: 25 markers share an ID with another marker, so IDs are not strictly unique (e.g. P1).
Alias: alternative name/ID for the marker (e.g. SGN-M1347)
Note: contains both ID and Alias (redundant), followed by ill-formatted, '|'-delimited pairs |identity=...|escore=...
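
The two issues noted above (the `|`-delimited pairs appended to Note, and duplicate IDs) can be handled along the following lines. This is a minimal sketch; the helper names are hypothetical, and the ID list passed to `duplicate_ids` is made up for illustration rather than taken from the file.

```python
from collections import Counter


def split_marker_note(note):
    """Separate the marker names from the trailing '|key=value' pairs."""
    head, *tail = note.split("|")
    extras = dict(item.split("=", 1) for item in tail if "=" in item)
    return head.strip(), extras


def duplicate_ids(ids):
    """Return the IDs that occur more than once, with their counts."""
    counts = Counter(ids)
    return {marker_id: n for marker_id, n in counts.items() if n > 1}


# Note value from the example record above
note = "marker name(s): T0028 SGN-M1347 |identity=99.58|escore=2e-126"
names, extras = split_marker_note(note)
print(names)  # marker name(s): T0028 SGN-M1347
print(float(extras["identity"]), float(extras["escore"]))

# Illustrative ID list (not real data) showing how duplicates are reported
print(duplicate_ids(["T0028", "P1", "P1"]))  # {'P1': 2}
```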
