Workflow error from V-pipe dry run #160

Open
robertsap opened this issue Sep 9, 2024 · 4 comments

@robertsap

Describe the bug
I am analyzing data from plant (cherry) samples, hoping to determine the viral quasispecies of Little Cherry Virus.
I set up my V-pipe workflow based on the SARS-CoV-2 tutorial.
However, when I attempt to run V-pipe, either as a dry run or fully, I get a "workflow error".

My questions:
I'm curious why there is a "missing input file" (for sam2bam and gunzip). I was not instructed to give any other files except the fastq.
Is the workflow error a bug, or something I am missing in my input/config files?

To Reproduce

  1. V-pipe configuration file used
general:
    virus_base_config: ""

input:
    datadir: /samples/
    samples_file: samples.tsv
    reference: "{VPIPE_BASEDIR}/resources/LChV-2/reference.fasta"
    genes_gff: "{VPIPE_BASEDIR}/../resources/LChV-2/genomic.gff"
    read_length: 150

output:
    datadir: /results/
    trim_primers: false
    snv: true
    local: true
    global: true
    visualization: true
    diversity: true
    QA: true
    upload: false
    dehumanized_raw_reads: false
  1. Samples TSV file used
samples
├── 22-L147
│   └── 230309
│       └── raw_data
│           ├── 22-L147_S3_R1.fastq
│           └── 22-L147_S3_R2.fastq
└── 22-L801
    └── 230309
        └── raw_data
            ├── 22-L801_S14_R1.fastq
            └── 22-L801_S14_R2.fastq

6 directories, 4 files

vi samples.tsv 
22-L147 22-L147
22-L801 22-L801
  1. Commands executed
./vpipe --dryrun
  1. See error
Building DAG of jobs...
WorkflowError:
MissingInputException: Missing input files for rule sam2bam:
    output: /results/22-L147/22-L147/alignments/REF_aln.bam, /results/22-L147/22-L147/alignments/REF_aln.bam.bai
    wildcards: file=/results/22-L147/22-L147/alignments/REF_aln
    affected files:
        /results/22-L147/22-L147/alignments/REF_aln.sam
WorkflowError:
    WorkflowError:
        MissingInputException: Missing input files for rule gunzip:
            output: /results/22-L147/22-L147/extracted_data/R1.fastq
            wildcards: file=/results/22-L147/22-L147/extracted_data/R1, ext=fastq
            affected files:
                /results/22-L147/22-L147/extracted_data/R1.fastq.gz
        MissingInputException: Missing input files for rule gunzip:
            output: /results/22-L147/22-L147/extracted_data/R1.fastq
            wildcards: file=/results/22-L147/22-L147/extracted_data/R1, ext=fastq
            affected files:
                /results/22-L147/22-L147/extracted_data/R1.fastq.gz
    CyclicGraphException: Cyclic dependency on rule convert_to_ref.

Expected behavior
Having followed the setup tutorial and the SARS-CoV-2 tutorial, I expected an output message either confirming that everything in my config file was in the right place or indicating where I would need to make changes.

Desktop

  • OS: Linux
  • Version: not sure? Installed using the quick install script from the tutorial on August 13th 2024
@DrYak
Member

DrYak commented Sep 19, 2024

Hi (and sorry for the slow answer, I was on holiday).

I notice that you're giving absolute paths in your configuration file (beginning with a slash /):

input:
   datadir: /samples/

#
output:
    datadir: /results/

As a result, V-pipe is trying to read and write files in the root directory of your workstation:

WorkflowError:
MissingInputException: Missing input files for rule sam2bam:
    output: /results/22-L147/22-L147/alignments/REF_aln.bam, /results/22-L147/22-L147/alignments/REF_aln.bam.bai
    wildcards: file=/results/22-L147/22-L147/alignments/REF_aln
    affected files:
       /results/22-L147/22-L147/alignments/REF_aln.sam

See the /results/22-L147/22-L147/… directories above.

I presume you should be using paths relative to your current working directory, like the tutorials do, so without a leading /, e.g.:

input:
   datadir: samples/
#           ^- no '/' here
#
output:
    datadir: results/
#            ^- no '/' here
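
To see where a datadir value would end up, a small shell sketch can help. The helper name `resolve_datadir` is hypothetical and not part of V-pipe; it only mimics the usual rule that a path starting with `/` is taken from the filesystem root, while anything else is resolved against the current working directory.

```shell
# Hypothetical helper (not part of V-pipe): print the location a
# datadir value would refer to when the workflow is started from $PWD.
resolve_datadir() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;            # absolute: taken from the filesystem root
        *)  printf '%s/%s\n' "$PWD" "$1" ;;  # relative: resolved against the current directory
    esac
}

resolve_datadir /samples/   # prints /samples/
resolve_datadir samples/    # prints <current directory>/samples/
```

Running it from the directory where `./vpipe` is started shows whether the samples directory that would be used is the one you intended.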

@DrYak
Member

DrYak commented Sep 19, 2024

Another problem is that V-pipe currently doesn't provide any information about Little Cherry Virus (see here for a list of available virus resources).

So this part is not going to work:

input:
    #
    reference: "{VPIPE_BASEDIR}/resources/LChV-2/reference.fasta"
    genes_gff: "{VPIPE_BASEDIR}/../resources/LChV-2/genomic.gff"

You will need to provide your own and change the configuration file accordingly, for example:

# create a resource directory in the current working directory:
mkdir -p resources/LChV-2/

# copy the files in there
cp …somewhere_where_you_have_the_files…/LChV-2/reference.fasta resources/LChV-2/
cp …somewhere_where_you_have_the_files…/LChV-2/genomic.gff resources/LChV-2/

and then edit the configuration file to point to this new resource directory you created:

input:
    #
    reference: "resources/LChV-2/reference.fasta"
    genes_gff: "resources/LChV-2/genomic.gff"
    #           ^- no leading '/': search in the current working directory.
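
Before the next dry run, it may be worth confirming that the two files are really where the configuration says they are. The following is a minimal sketch; `check_resources` is a hypothetical helper, not a V-pipe command, and it only tests that each path exists and is non-empty relative to the current working directory.

```shell
# Hypothetical pre-flight check (not part of V-pipe): report any
# configured resource file that is missing or empty.
check_resources() {
    status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Run from the directory where ./vpipe is started:
check_resources resources/LChV-2/reference.fasta resources/LChV-2/genomic.gff \
    && echo "resource files found"
```

The check says nothing about whether the FASTA or GFF content is valid; it only rules out a wrong path.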

(Of course, you could also install the files into your local copy of V-pipe, in which case you would have to add a missing .., because {VPIPE_BASEDIR} refers to the V-pipe/workflow/ directory due to a limitation of how Snakemake works.)

input:
    #
    # '..' missing here --------vv
    reference: "{VPIPE_BASEDIR}/../resources/LChV-2/reference.fasta"
    genes_gff: "{VPIPE_BASEDIR}/../resources/LChV-2/genomic.gff"

(NOTE: if you decide to modify V-pipe to add support for LChV-2, we would be interested in your pull request)

@DrYak DrYak self-assigned this Sep 19, 2024
@robertsap
Author

Thanks so much for your response! I hope you had a pleasant holiday :)

I made the necessary modifications to the directory paths in my config file; however, I am still getting the same error message.

config file:
general:
    virus_base_config: ""

input:
    datadir: samples/
    samples_file: samples.tsv
    reference: "{VPIPE_BASEDIR}/../resources/LChV-2/reference.fasta"
    genes_gff: "{VPIPE_BASEDIR}/../resources/LChV-2/genomic.gff"
    read_length: 150

output:
    datadir: results/
    trim_primers: false
    snv: true
    local: true
    global: true
    visualization: true
    diversity: true
    QA: true
    upload: false
    dehumanized_raw_reads: false

error message:
WorkflowError:
MissingInputException: Missing input files for rule sam2bam:
    output: results/22-L147/22-L147/alignments/REF_aln.bam, results/22-L147/22-L147/alignments/REF_aln.bam.bai
    wildcards: file=results/22-L147/22-L147/alignments/REF_aln
    affected files:
        results/22-L147/22-L147/alignments/REF_aln.sam
WorkflowError:
    WorkflowError:
        MissingInputException: Missing input files for rule gunzip:
            output: results/22-L147/22-L147/extracted_data/R1.fastq
            wildcards: file=results/22-L147/22-L147/extracted_data/R1, ext=fastq
            affected files:
                results/22-L147/22-L147/extracted_data/R1.fastq.gz
        MissingInputException: Missing input files for rule gunzip:
            output: results/22-L147/22-L147/extracted_data/R1.fastq
            wildcards: file=results/22-L147/22-L147/extracted_data/R1, ext=fastq
            affected files:
                results/22-L147/22-L147/extracted_data/R1.fastq.gz
    CyclicGraphException: Cyclic dependency on rule convert_to_ref.

As for the reference and GFF file locations, I did have them in the 'V-pipe/workflow' directory, so the paths should have worked. However, I did as you recommended, moved them into the 'resources' directory, and changed my config file to reflect the new paths (see above).

Thanks in advance for your patience. I'm a novice on the command line, so there may be something basic that I'm missing.

@DrYak
Member

DrYak commented Dec 30, 2024

Hello, sorry for the lack of an answer; I lost track of your issue.

The MissingInputException is raised by Snakemake whenever it doesn't know how to find some input files.

In the case of V-pipe, this most often happens when the raw input files cannot be found.
V-pipe expects a very precise layout of input files (see the "samples" section of config/README.md and the file config/config.html).
If the files are not placed according to this tree (mind the raw_data subdirectory!), or are not named {some file name}_R1.fastq.gz and {some file name}_R2.fastq.gz, V-pipe might fail to find them.

I would recommend having a look at the mass importers in utils (see utils/README.md); they can help lay out the files correctly and generate a samples.tsv.
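
As a quick manual check before running the importers, the layout rules above can be sketched in shell. `check_layout` is a hypothetical helper, not part of V-pipe, and it rests on two assumptions. First, it assumes samples.tsv has two whitespace-separated columns that map to `<datadir>/<column 1>/<column 2>/raw_data/`. Second, it accepts both `.fastq` and `.fastq.gz` endings, whereas the description above only names the `.gz` form.

```shell
# Hypothetical layout check (not part of V-pipe). Assumes the two columns of
# samples.tsv map to <datadir>/<column 1>/<column 2>/raw_data/.
check_layout() {
    datadir=$1
    samples_file=$2
    status=0
    if [ ! -r "$samples_file" ]; then
        echo "cannot read $samples_file"
        return 2
    fi
    # The '|| [ -n ... ]' part also handles a last line without a newline.
    while read -r sample batch rest || [ -n "$sample" ]; do
        if [ -z "$sample" ]; then continue; fi    # skip blank lines
        raw="$datadir/$sample/$batch/raw_data"
        if [ ! -d "$raw" ]; then
            echo "missing directory: $raw"
            status=1
            continue
        fi
        for mate in R1 R2; do
            found=0
            # Assumption: accept *_R1.fastq as well as *_R1.fastq.gz
            for f in "$raw"/*_"$mate".fastq*; do
                if [ -e "$f" ]; then found=1; fi
            done
            if [ "$found" -eq 0 ]; then
                echo "no *_$mate.fastq[.gz] file in $raw"
                status=1
            fi
        done
    done < "$samples_file"
    return "$status"
}

check_layout samples samples.tsv || echo "layout problems found"
```

Under the column-to-directory assumption, the tree and samples.tsv from the original report would be flagged: the second column is `22-L147`, so the sketch looks for `samples/22-L147/22-L147/raw_data`, while the files on disk sit under `samples/22-L147/230309/raw_data`.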
