Get gtdbtk to use dfast proteins as input. #142
Comments
To do the conversion, I had to change the batch file to point at the I also got the following warning:
@LeeBergstrand Thanks for this idea and the extra context! Just to confirm, are the key reasons for exposing the On my end, aside from those possible benefits, I can see the following possible disadvantages:
Weighing these advantages and disadvantages, I wonder if most users would not need to use
I think, given the issues you brought up and the fact that Prodigal runs fast enough, most of the advantages of running
gtdbtk has a `--genes` parameter that lets it use the output of other gene prediction pipelines, rather than running Prodigal itself. With this parameter, gtdbtk takes proteins as input (Ecogenomics/GTDBTk#571).
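As a rough illustration of what this could look like, here is a minimal sketch that writes a GTDB-Tk batch file pointing at protein FASTA files and builds a `classify_wf` command with `--genes`. The DFAST paths (`dfast/<genome>/protein.faa`), the helper names, and the choice to take the genome ID from the parent directory name are assumptions made for the example, not how the pipeline currently works. Other arguments that `classify_wf` requires vary between GTDB-Tk versions, so check `gtdbtk classify_wf --help` before relying on this.

```python
import csv
from pathlib import Path


def write_batchfile(protein_fastas, batchfile):
    """Write a tab-separated batch file: FASTA path, then genome ID.

    Assumption: each protein FASTA sits in a directory named after its
    genome, so the parent directory name is used as the genome ID.
    """
    with open(batchfile, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for fasta in protein_fastas:
            fasta = Path(fasta)
            writer.writerow([fasta.as_posix(), fasta.parent.name])


def build_classify_command(batchfile, out_dir, cpus=1, use_genes=False):
    """Build (but do not run) a gtdbtk classify_wf command as a list."""
    command = [
        "gtdbtk", "classify_wf",
        "--batchfile", str(batchfile),
        "--out_dir", str(out_dir),
        "--cpus", str(cpus),
    ]
    if use_genes:
        # Inputs are predicted proteins, so gene calling is skipped.
        command.append("--genes")
    return command


if __name__ == "__main__":
    # Hypothetical DFAST output locations, for illustration only.
    proteins = ["dfast/genomeA/protein.faa", "dfast/genomeB/protein.faa"]
    write_batchfile(proteins, "gtdbtk_batch.tsv")
    print(" ".join(build_classify_command("gtdbtk_batch.tsv", "gtdbtk_out",
                                          cpus=4, use_genes=True)))
```

Keeping `use_genes` off by default would leave the existing nucleotide-based behaviour unchanged unless a user opts in.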
I'm wondering whether the speed-up from skipping Prodigal inside the GTDB-Tk rule is worth it, because with `--genes` GTDB-Tk also skips the ANI and Mash steps and immediately starts placing the identified markers into the GTDB trees using pplacer. On my machine, pplacer runs out of memory. With my dataset, the pipeline currently skips the pplacer step most of the time: if the ANI screen finds a close match (that is, the organism is already in the tree), I think it skips the marker gene insertion step, which speeds up the pipeline.
@jmtsuji Would using `--genes` be useful for you as an optional approach?