Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Fairy #6647

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
Open

Add Fairy #6647

wants to merge 12 commits into from

Conversation

SantaMcCloud
Copy link
Contributor

FOR CONTRIBUTOR:

  • I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
  • License permits unrestricted use (educational + commercial)
  • This PR adds a new tool or tool collection
  • This PR updates an existing tool or tool collection
  • This PR does something else (explain below)

@SantaMcCloud
Copy link
Contributor Author

SantaMcCloud commented Dec 20, 2024

I know that there will be an error because of the format bcsp. Is it possible to add this format into the galaxy datatype?

I can do it but i have to read and maybe ask question about it. I tried to use the tabular format but with this the tool had always erros possible because the bcsp is a binary file.

@SantaMcCloud
Copy link
Contributor Author

SantaMcCloud commented Dec 20, 2024

i also get this error when i try to test fairy_cov.xml

(planemo) (fairy_env) sf373@LAPTOP-7RMLPR2D:~/sf373$ planemo test tools-iuc/tools/fairy/fairy_cov.xml 
Traceback (most recent call last):
  File "/home/sf373/sf373/planemo/bin/planemo", line 8, in <module>
    sys.exit(planemo())
  File "/home/sf373/sf373/planemo/lib/python3.7/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/sf373/sf373/planemo/lib/python3.7/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/sf373/sf373/planemo/lib/python3.7/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/sf373/sf373/planemo/lib/python3.7/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/sf373/sf373/planemo/lib/python3.7/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/sf373/sf373/planemo/lib/python3.7/site-packages/click/decorators.py", line 92, in new_func
    return ctx.invoke(f, obj, *args, **kwargs)
  File "/home/sf373/sf373/planemo/lib/python3.7/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/sf373/sf373/planemo/lib/python3.7/site-packages/planemo/cli.py", line 103, in handle_blended_options
    return f(*args, **kwds)
  File "/home/sf373/sf373/planemo/lib/python3.7/site-packages/planemo/commands/cmd_test.py", line 76, in cli
    return_value = test_runnables(ctx, runnables, original_paths=uris, **kwds)
  File "/home/sf373/sf373/planemo/lib/python3.7/site-packages/planemo/engine/test.py", line 13, in test_runnables
    test_data = engine.test(runnables, test_timeout=kwds.get("test_timeout"))
  File "/home/sf373/sf373/planemo/lib/python3.7/site-packages/planemo/engine/interface.py", line 79, in test
    test_cases = [t for tl in map(cases, runnables) for t in tl]
  File "/home/sf373/sf373/planemo/lib/python3.7/site-packages/planemo/engine/interface.py", line 79, in <listcomp>
    test_cases = [t for tl in map(cases, runnables) for t in tl]
  File "/home/sf373/sf373/planemo/lib/python3.7/site-packages/planemo/runnable.py", line 261, in cases
    test_dicts = tool_source.parse_tests_to_dict()
  File "/home/sf373/sf373/planemo/lib/python3.7/site-packages/galaxy/tool_util/parser/xml.py", line 649, in parse_tests_to_dict
    tests.append(_test_elem_to_dict(test_elem, i, profile))
  File "/home/sf373/sf373/planemo/lib/python3.7/site-packages/galaxy/tool_util/parser/xml.py", line 693, in _test_elem_to_dict
    outputs=__parse_output_elems(test_elem),
  File "/home/sf373/sf373/planemo/lib/python3.7/site-packages/galaxy/tool_util/parser/xml.py", line 718, in __parse_output_elems
    name, file, attributes = __parse_output_elem(output_elem)
  File "/home/sf373/sf373/planemo/lib/python3.7/site-packages/galaxy/tool_util/parser/xml.py", line 728, in __parse_output_elem
    file, attributes = __parse_test_attributes(output_elem, attrib, parse_discovered_datasets=True)
  File "/home/sf373/sf373/planemo/lib/python3.7/site-packages/galaxy/tool_util/parser/xml.py", line 781, in __parse_test_attributes
    attributes["object"] = json.loads(attrib.pop("value_json"))
  File "/home/sf373/anaconda3/envs/comebin_env/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/home/sf373/anaconda3/envs/comebin_env/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/sf373/anaconda3/envs/comebin_env/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I dont know how this error happen but i will have a look at this the next couple of days. Maybe i can fix it somehow

@bernt-matthias
Copy link
Contributor

I know that there will be an error because of the format bcsp. Is it possible to add this format into the galaxy datatype?

Yes. Is they documentation available?

@bernt-matthias
Copy link
Contributor

I dont know how this error happen but i will have a look at this the next couple of days. Maybe i can fix it somehow

I think value_json="normal_test.tsv" should be value="normal_test.tsv".

@SantaMcCloud
Copy link
Contributor Author

I know that there will be an error because of the format bcsp. Is it possible to add this format into the galaxy datatype?

Yes. Is they documentation available?

Yes and i found the right tab to it. I will read it and try it out!

@SantaMcCloud
Copy link
Contributor Author

I dont know how this error happen but i will have a look at this the next couple of days. Maybe i can fix it somehow

I think value_json="normal_test.tsv" should be value="normal_test.tsv".

yes i didnt saw this typo yesterday thank you for the hint :)

@SantaMcCloud
Copy link
Contributor Author

@bernt-matthias i have a question about adding a datatype. Do i have to add a sniffer or is this an optional function?

@SantaMcCloud
Copy link
Contributor Author

SantaMcCloud commented Dec 20, 2024

@bernt-matthias
Copy link
Contributor

@bernt-matthias i have a question about adding a datatype. Do i have to add a sniffer or is this an optional function?

Optional, I would say.

Copy link
Contributor

@bernt-matthias bernt-matthias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few more comments.

tools/fairy/fairy_cov.xml Outdated Show resolved Hide resolved
tools/fairy/fairy_cov.xml Outdated Show resolved Hide resolved
tools/fairy/fairy_cov.xml Outdated Show resolved Hide resolved
tools/fairy/fairy_cov.xml Outdated Show resolved Hide resolved
<param name="contig" type="data" format="fasta,fasta.gz" label="Input fasta contig file" help="Input the raw fasta contig file. It can be gzip!"/>
<param name="bcsp_file" type="data" format="bcsp" label="Input the pre-sketched file (.bcsp file)" help="This file will be generated with the fairy sketch tool."/>
<param argument="--minimum-ani" type="integer" optional="true" min="0" max="100" value="95" label="Set minimum ANI" help="Set the minimum adjusted ANI for the coverage calculation. CARE: only adjust it when you know what it does!"/>
<param argument="--min-number-kmers" type="integer" value="8" optional="true" label="Genome filter" help="Set the number for exclude genomes with less than this number of sampled k-mers."/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't undersand this parameter. For the default sequences with less than 8 kmers are removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did rewrote this help, maybe it is better now to understand but yes this paramtere should be a filter to filter out genomes with less k-mers then this paramter

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still do not understand this parameter. What does it mean if a genome has x k-mers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means that a genome have less then x (the value of the parameter) k-mers it will execlude from the algorithm. It should not appear in the coverage file generated by Fairy. It this more cleare? And sorry for this missunderstanding

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately no. I just do not understand what it means that a genome has a kmer. A genome of length n has n-k+1 kmers. There must be something special about the x kmers the help text is talking about.

Copy link
Contributor Author

@SantaMcCloud SantaMcCloud Dec 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This x stands for a lower border to filter out genomes with less then x k-mers to not appear in the coverage file.

For an example there are 3 genome in a contig. The first hast 5 k-mers, the second has 10 k-mers and the last one has 8 k-mers. We set this paramter to a 7 which means that the first genome with 5 k-mers will not be included since it hast less then 7 k-mers. When we set this to 2 then there will be no excluding of a genome since all 3 has more then 2 k-mers.

Is this explenation better?

tools/fairy/fairy_sketch.xml Show resolved Hide resolved
@SantaMcCloud
Copy link
Contributor Author

Will still fail since the datatype is not in galaxy yet

@SantaMcCloud
Copy link
Contributor Author

Hm okay i dont know why this test is still failing here. I did test it localy with cloning the 24.1 branch and use this as root for planemo and this worked fine

@bgruening
Copy link
Member

The test is fine, the linting fails.

@bgruening
Copy link
Member

.. WARNING: Repository name [Fairy] invalid. Repository names must contain only lower-case letters, numbers and underscore.

@SantaMcCloud
Copy link
Contributor Author

The test is fine, the linting fails.

oh mb then thought it was the test thank you for looking at it!

<param argument="--minimum-ani" type="integer" optional="true" min="0" max="100" value="95" label="Set minimum ANI" help="Set the minimum adjusted ANI for the coverage calculation. CARE: only adjust it when you know what it does!"/>
<param argument="--min-number-kmers" type="integer" value="8" optional="true" label="Genome filter" help="Set a filter to filter out genomes with less then x k-mer sampled."/>
<param argument="-c" type="integer" value="50" optional="true" label="Set subsampling rate" help="This value does not interact with the .bcsp file which was used as input."/>
<param argument="-k" type="select" label="Select k-mer size" help="This value does not interact with the .bcsp file which was used as input.">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand the help text and the above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the inputed contig file will be sketch aswell from in this part of the tool which also run the sketching algo. This value is one setting of this algo which means that the presketch file, which is the .bcsp file, will not be changed by this value since it was sketch by the other part of the tool befor hand by the same algo.

tools/fairy/fairy_cov.xml Outdated Show resolved Hide resolved
tools/fairy/fairy_cov.xml Outdated Show resolved Hide resolved
tools/fairy/fairy_cov.xml Outdated Show resolved Hide resolved
<option value="21">21</option>
</param>
<param argument="--min-spacing" type="integer" value="30" label="Set spacing between k-mers" help=" Minimum spacing between selected k-mers on the contigs."/>
<param argument="--full-contig-name" type="boolean" falsevalue="" truevalue="--full-contig-name" label="Full contig name"/>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this option do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a contig has a space in then when this option is not used the tool only take evreything befor this space as name and not using the hole name. When set this option on true evreything will be used. I add a help text to it!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#end if
&&

cp './res/${filename}' $output
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a comment, why the filename is chosen by the program? Its a bit confusing that only file_1 is used.

Suggested change
cp './res/${filename}' $output
cp './res/${filename}' '$output'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add a comment to it and this is because the tool used the first inputed file to name outpout file. There is an option to change the name but galaxy this option is not needed. To copy the outputfile then i used file_1 plus the new fromat, which is .bcsp to find the output and copy this to the output placeholder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tools/fairy/fairy_sketch.xml Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants