Mail: [email protected]
Research Gate: https://www.researchgate.net/profile/Gamaliel_Lopez-Leal
All samples were stored on the Bonampack server (132.248.220.35). The working directory is /space31/PGE/gamlopez/INCAN.
We first ran QIIME 2 in paired-end mode. First, activate the QIIME 2 environment:
source activate qiime2-2018.6
Then import the data:
nohup qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path manifest2.csv --output-path sequences.qza --source-format PairedEndFastqManifestPhred33 &
Following the QIIME 2 manual for paired-end data, two files are needed: the manifest file, which lists the FASTQ files and is used for the import, and the mapping (metadata) file, which is used in later steps. The manifest file looks like:
sample-id,absolute-filepath,direction
# Lines starting with '#' are ignored and can be used to create
# "comments" or even "comment out" entries
sample-1,$PWD/some/filepath/sample1_R1.fastq.gz,forward
sample-2,$PWD/some/filepath/sample2_R1.fastq.gz,forward
sample-1,$PWD/some/filepath/sample1_R2.fastq.gz,reverse
sample-2,$PWD/some/filepath/sample2_R2.fastq.gz,reverse
See the QIIME 2 importing tutorial for details: https://docs.qiime2.org/2018.11/tutorials/importing/
For the mapping file format, see: https://docs.qiime2.org/2018.2/tutorials/metadata/
Note: in the paired-end directory (Q2_results), the mapping file is named metadata.txt.
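Writing the manifest by hand is error-prone when there are many samples. A minimal sketch of a hypothetical bash helper (make_pe_manifest, not part of QIIME 2) that builds the paired-end manifest from one directory, assuming the files are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz as in the example above:

```shell
# Hypothetical helper (not part of QIIME 2): write a paired-end manifest for
# files named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz in one directory.
# Usage: make_pe_manifest <reads-dir> <manifest.csv>
make_pe_manifest() {
  local dir="$1" out="$2" abs r1 r2 sample
  abs="$(cd "$dir" && pwd)"                 # the manifest needs absolute paths
  echo "sample-id,absolute-filepath,direction" > "$out"
  for r1 in "$abs"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue                # no match: the glob stays literal
    sample="$(basename "$r1" _R1.fastq.gz)"
    r2="$abs/${sample}_R2.fastq.gz"
    if [ ! -e "$r2" ]; then
      echo "warning: no reverse file for $sample, skipped" >&2
      continue
    fi
    echo "$sample,$r1,forward" >> "$out"
    echo "$sample,$r2,reverse" >> "$out"
  done
}
```

For example, `make_pe_manifest raw_reads manifest2.csv`, where raw_reads is a placeholder directory name. A sample without a matching reverse file is skipped with a warning instead of being written as an incomplete pair.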
Denoise the reads with DADA2, trimming the first 20 nt and truncating at 200 nt in both the forward and reverse reads:
qiime dada2 denoise-paired --i-demultiplexed-seqs sequences.qza --o-table table-dada2.qza --o-representative-sequences rep-seqs-dada2.qza --o-denoising-stats stats-dada2.qza --p-trim-left-f 20 --p-trim-left-r 20 --p-trunc-len-f 200 --p-trunc-len-r 200 --p-n-threads 20
To visualize the report files:
nohup qiime metadata tabulate --m-input-file stats-dada2.qza --o-visualization stats-dada2.qzv &
nohup qiime feature-table summarize --i-table table-dada2.qza --o-visualization table-dada2.qzv --m-sample-metadata-file metadata.txt &
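To see at which DADA2 step the reads are lost, the per-sample counts in the stats table can be expressed as percentages of the input. A sketch of a hypothetical helper (dada2_retention), assuming the stats have been saved as a tab-separated file (for example, downloaded from stats-dada2.qzv) whose header names the columns input, filtered, denoised, merged and non-chimeric; the merged column only exists for paired-end runs, and rows starting with # are skipped:

```shell
# Hypothetical helper: per-sample percentage of input reads left after each
# DADA2 step. Expects a tab-separated stats table with a header row naming
# "input" and any of "filtered", "denoised", "merged", "non-chimeric".
# Usage: dada2_retention <stats.tsv>
dada2_retention() {
  awk -F '\t' '
    /^#/ { next }                              # skip "#q2:types" and comments
    !seen_header {                             # first remaining row is the header
      for (i = 1; i <= NF; i++) col[$i] = i
      n = split("filtered denoised merged non-chimeric", steps, " ")
      printf "sample"
      for (s = 1; s <= n; s++) if (steps[s] in col) printf "\t%s%%", steps[s]
      printf "\n"
      seen_header = 1
      next
    }
    {
      total = $(col["input"]) + 0
      printf "%s", $1
      for (s = 1; s <= n; s++)
        if (steps[s] in col)
          printf "\t%.1f", (total > 0 ? 100 * $(col[steps[s]]) / total : 0)
      printf "\n"
    }' "$1"
}
```

For example, `dada2_retention stats-dada2.tsv` (placeholder file name). A large drop in the filtered column would point mainly at quality filtering and truncation, while a large drop in the merged column would point at read merging.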
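Then build a phylogenetic tree: align the representative sequences with MAFFT, mask the alignment, infer a tree with FastTree and root it at the midpoint. Each command uses the output of the previous one, so wait for each job to finish before starting the next:
nohup qiime alignment mafft --i-sequences rep-seqs-dada2.qza --o-alignment aligned-rep-seqs.qza &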
nohup qiime alignment mask --i-alignment aligned-rep-seqs.qza --o-masked-alignment masked-aligned-rep-seqs.qza &
nohup qiime phylogeny fasttree --i-alignment masked-aligned-rep-seqs.qza --o-tree unrooted-tree.qza &
nohup qiime phylogeny midpoint-root --i-tree unrooted-tree.qza --o-rooted-tree rooted-tree.qza &
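The resulting trees can be visualized in iTOL (https://itol.embl.de/upload.cgi). iTOL expects a tree file such as Newick, and a .qza file is a zip archive, so upload the Newick tree stored inside it (data/tree.nwk) rather than the .qza itself.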
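Because every step in the tree-building chain reads the previous step's output, launching each line with & can start a step before its input exists. A sketch of a hypothetical bash helper (run_steps, not part of QIIME 2) that runs the commands strictly in sequence, logs their output and stops at the first failure:

```shell
# Hypothetical helper: run commands one after another, appending their output
# to a log, and stop at the first failing step. Each step starts only after
# the previous one has finished.
# Usage: run_steps <logfile> "<command 1>" "<command 2>" ...
run_steps() {
  local log="$1" step
  shift
  for step in "$@"; do
    echo ">>> $step" >> "$log"
    if ! bash -c "$step" >> "$log" 2>&1; then
      echo "step failed: $step" >> "$log"
      return 1
    fi
  done
}
```

One way to use it is to put the function and a call such as `run_steps tree.log "qiime alignment mafft ..." "qiime alignment mask ..."` (with the full commands from above) in a script and start it with `nohup bash script.sh &`, so the whole chain runs in the background as a single job.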
Before classifying the reads, the classifier has to be trained. Here we used the Greengenes database, following the QIIME 2 tutorial. First, download the training files (rep-seqs.qza is the tutorial's test data and is not used in the steps below):
wget -O "85_otu_taxonomy.txt" "https://data.qiime2.org/2017.7/tutorials/training-feature-classifiers/85_otu_taxonomy.txt"
wget -O "85_otus.fasta" "https://data.qiime2.org/2017.7/tutorials/training-feature-classifiers/85_otus.fasta"
wget -O "rep-seqs.qza" "https://data.qiime2.org/2017.7/tutorials/training-feature-classifiers/rep-seqs.qza"
Import the database, extract the region amplified by the primers, and train the classifier:
qiime tools import --type 'FeatureData[Sequence]' --input-path 85_otus.fasta --output-path 85_otus.qza
qiime tools import --type 'FeatureData[Taxonomy]' --source-format HeaderlessTSVTaxonomyFormat --input-path 85_otu_taxonomy.txt --output-path ref-taxonomy.qza
qiime feature-classifier extract-reads --i-sequences 85_otus.qza --p-f-primer GTGCCAGCMGCCGCGGTAA --p-r-primer GGACTACHVGGGTWTCTAAT --p-trunc-len 100 --o-reads ref-seqs.qza
qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads ref-seqs.qza --i-reference-taxonomy ref-taxonomy.qza --o-classifier classifier.qza
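The primers passed to extract-reads, GTGCCAGCMGCCGCGGTAA (515F) and GGACTACHVGGGTWTCTAAT (806R), contain IUPAC degenerate bases (for example M = A/C, H = A/C/T, V = A/C/G, W = A/T). A sketch of two hypothetical helpers that expand such a primer into a regular expression and count how many reads in a FASTQ file start with it; this can help check whether the primers are still present in the reads, which is relevant when choosing the --p-trim-left values used with DADA2:

```shell
# Hypothetical helpers: expand IUPAC degenerate bases in a primer into a
# POSIX extended regular expression, then count reads that start with it.
iupac_to_regex() {
  printf '%s\n' "$1" | sed \
    -e 's/R/[AG]/g'  -e 's/Y/[CT]/g'  -e 's/S/[CG]/g'   -e 's/W/[AT]/g' \
    -e 's/K/[GT]/g'  -e 's/M/[AC]/g'  -e 's/B/[CGT]/g'  -e 's/D/[AGT]/g' \
    -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Usage: count_primer_reads <reads.fastq[.gz]> <primer>
count_primer_reads() {
  local re
  re="^$(iupac_to_regex "$2")"
  case "$1" in
    *.gz) gzip -cd "$1" ;;
    *)    cat "$1" ;;
  esac | awk 'NR % 4 == 2' | grep -cE "$re" || true   # line 2 of each record is the sequence
}
```

For example, `count_primer_reads sample1_R1.fastq.gz GTGCCAGCMGCCGCGGTAA` (placeholder file name) prints the number of reads beginning with 515F.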
Classify the representative sequences and visualize the results with QIIME 2 View (https://view.qiime2.org/):
qiime feature-classifier classify-sklearn --i-classifier classifier.qza --i-reads rep-seqs-dada2.qza --o-classification taxonomy.qza
qiime metadata tabulate --m-input-file taxonomy.qza --o-visualization taxonomy.qzv
qiime taxa barplot --i-table table-dada2.qza --i-taxonomy taxonomy.qza --m-metadata-file metadata.txt --o-visualization taxa-bar-plots.qzv
qiime diversity core-metrics-phylogenetic --i-table table-dada2.qza --i-phylogeny rooted-tree.qza --p-sampling-depth 19 --m-metadata-file metadata.txt --output-dir core-diversity-phylogenetic_samplingdepth19
cd core-diversity-phylogenetic_samplingdepth19
qiime emperor plot --i-pcoa bray_curtis_pcoa_results.qza --m-metadata-file /space31/PGE/gamlopez/INCAN/Q2_results/metadata.txt --o-visualization pcoa-visualization.qzv
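core-metrics-phylogenetic rarefies every sample to --p-sampling-depth reads, and samples with fewer reads than that depth are dropped from the diversity analyses. A sketch of a hypothetical helper (samples_at_depth) that lists which samples would be dropped at a given depth, assuming a two-column CSV of sample ID and read count (for example, saved from the per-sample frequency table in table-dada2.qzv):

```shell
# Hypothetical helper: given a CSV of "sample-id,read-count" rows and a depth,
# list the samples with fewer reads than the depth (they would be dropped)
# and report how many samples are kept.
# Usage: samples_at_depth <counts.csv> <depth>
samples_at_depth() {
  awk -F ',' -v depth="$2" '
    $2 !~ /^[0-9.]+$/ { next }              # skip the header and non-numeric rows
    ($2 + 0 >= depth + 0) { kept++; next }
    { dropped++; print "dropped: " $1 }
    END { printf "kept %d of %d samples at depth %d\n", kept, kept + dropped, depth }' "$1"
}
```

For example, `samples_at_depth sample-frequencies.csv 19` (placeholder file name) shows which samples a given sampling depth would exclude.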
Note: with this protocol we lost 90% of the reads in each sample after running DADA2 (see the DADA2 report files). Therefore, we performed quality filtering and read merging with Trim Galore, FASTX-Toolkit and PEAR, and imported the merged reads into QIIME 2 as single-end data.
Trim Galore: https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/
FASTX-Toolkit: http://hannonlab.cshl.edu/fastx_toolkit/
PEAR: https://cme.h-its.org/exelixis/web/software/pear/
To import the merged reads:
nohup qiime tools import --type 'SampleData[SequencesWithQuality]' --input-path manifest-SE.csv --output-path single-end-sequence.qza --source-format SingleEndFastqManifestPhred33 &
The manifest file (manifest-SE.csv) looks like:
sample-id,absolute-filepath,direction
sample-1,$PWD/some/filepath/sample1_R1.fastq,forward
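The single-end manifest can be generated in a similar way. A sketch of a hypothetical helper (make_se_manifest), assuming one merged FASTQ file per sample whose name ends in a fixed suffix; the default .assembled.fastq is the name PEAR gives to assembled reads when run with -o <sample>, and a different suffix (such as _R1.fastq, as in the example manifest above) can be passed as the third argument:

```shell
# Hypothetical helper: write a single-end manifest with one merged FASTQ per
# sample; the sample ID is the file name without the suffix.
# Usage: make_se_manifest <reads-dir> <manifest.csv> [suffix]
make_se_manifest() {
  local dir="$1" out="$2" suffix="${3:-.assembled.fastq}" abs f
  abs="$(cd "$dir" && pwd)"                 # the manifest needs absolute paths
  echo "sample-id,absolute-filepath,direction" > "$out"
  for f in "$abs"/*"$suffix"; do
    [ -e "$f" ] || continue                 # no match: the glob stays literal
    echo "$(basename "$f" "$suffix"),$f,forward" >> "$out"
  done
}
```

For example, `make_se_manifest merged_reads manifest-SE.csv`, where merged_reads is a placeholder directory name.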
Next, denoise the merged reads with DADA2, truncating them at 240 nt:
nohup qiime dada2 denoise-single --i-demultiplexed-seqs single-end-sequence.qza --o-table table-dada2.qza --o-representative-sequences rep-seqs-dada2.qza --o-denoising-stats stats-dada2.qza --p-n-threads 20 --p-trunc-len 240
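With denoise-single, reads shorter than --p-trunc-len are discarded rather than kept at their original length, so with --p-trunc-len 240 any merged read under 240 nt is lost. A sketch of a hypothetical helper (reads_at_least) that counts how many reads in an uncompressed FASTQ file reach a given length, assuming standard four-line FASTQ records:

```shell
# Hypothetical helper: count reads in an uncompressed FASTQ file that are at
# least <len> nt long (standard 4-line records assumed).
# Usage: reads_at_least <reads.fastq> <len>
reads_at_least() {
  awk -v len="$2" '
    NR % 4 == 2 { total++; if (length($0) >= len + 0) ok++ }
    END { printf "%d of %d reads are >= %d nt\n", ok, total, len }' "$1"
}
```

For example, `reads_at_least sample1.assembled.fastq 240` (placeholder file name) gives a quick estimate of how many merged reads the truncation length keeps.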
Check the DADA2 report files:
nohup qiime metadata tabulate --m-input-file stats-dada2.qza --o-visualization stats-dada2.qzv &
nohup qiime feature-table summarize --i-table table-dada2.qza --o-visualization table-dada2.qzv --m-sample-metadata-file mapping.txt &
nohup qiime alignment mafft --i-sequences rep-seqs-dada2.qza --o-alignment aligned-rep-seqs.qza &
nohup qiime alignment mask --i-alignment aligned-rep-seqs.qza --o-masked-alignment masked-aligned-rep-seqs.qza &
nohup qiime phylogeny fasttree --i-alignment masked-aligned-rep-seqs.qza --o-tree unrooted-tree.qza &
nohup qiime phylogeny midpoint-root --i-tree unrooted-tree.qza --o-rooted-tree rooted-tree.qza &
As above, the trees can be uploaded to iTOL (https://itol.embl.de/upload.cgi) for visualization.
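Train the classifier as described above (or reuse classifier.qza if it is already available) and classify the representative sequences:
qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads ref-seqs.qza --i-reference-taxonomy ref-taxonomy.qza --o-classifier classifier.qza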
qiime feature-classifier classify-sklearn --i-classifier classifier.qza --i-reads rep-seqs-dada2.qza --o-classification taxonomy.qza
qiime metadata tabulate --m-input-file taxonomy.qza --o-visualization taxonomy.qzv
qiime taxa barplot --i-table table-dada2.qza --i-taxonomy taxonomy.qza --m-metadata-file mapping.txt --o-visualization taxa-bar-plots.qzv
nohup qiime diversity core-metrics-phylogenetic --i-table table-dada2.qza --i-phylogeny rooted-tree.qza --p-sampling-depth 100000 --m-metadata-file mapping.txt --output-dir core-diversity-phylogenetic_samplingdepth100mil &
cd core-diversity-phylogenetic_samplingdepth100mil
qiime emperor plot --i-pcoa bray_curtis_pcoa_results.qza --m-metadata-file /space31/PGE/gamlopez/INCAN/Q2_results-SE/mapping.txt --o-visualization pcoa-visualization.qzv
cd ..
nohup qiime diversity alpha-rarefaction --i-table table-dada2.qza --i-phylogeny rooted-tree.qza --p-max-depth 83732 --m-metadata-file mapping.txt --o-visualization alpha-rarefaction.qzv &
Note: the value 83732 corresponds to 75% of the reads in the sample with the fewest reads (after denoising). This depth was chosen so that all samples are included in the test.
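The --p-max-depth value described in the note above (75% of the reads in the sample with the fewest reads) can be computed from the per-sample read counts. A sketch of a hypothetical helper (max_depth_75), assuming a two-column CSV of sample ID and read count and rounding down to a whole number of reads:

```shell
# Hypothetical helper: 75% of the smallest per-sample read count, rounded
# down, from a CSV of "sample-id,read-count" rows (header rows are skipped).
# Usage: max_depth_75 <counts.csv>
max_depth_75() {
  awk -F ',' '
    $2 !~ /^[0-9.]+$/ { next }              # skip the header and non-numeric rows
    { v = $2 + 0; if (min == "" || v < min) min = v }
    END { printf "%d\n", int(min * 0.75) }' "$1"
}
```

For example, `max_depth_75 sample-frequencies.csv` (placeholder file name) prints the value to pass to --p-max-depth.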