Subunit detection is returning wrong numbers #241

Porthmeus · 2024-11-07T16:49:03Z

I just spotted this while rewriting the code for diamond. If the subunits of an enzyme are labelled with letters, subunits I, V and X will get wrong numbers in complex_detection.R, because they are treated as Roman numerals.

So I would expect
A -> 1
B -> 2
C -> 3
...
I -> 9

but instead it does
I -> 1
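To make the expected behaviour concrete, here is a minimal Python sketch of one possible disambiguation rule. It is not the code in complex_detection.R; the function names and the heuristic (treat single capitals as alphabet positions as soon as the same enzyme also has a single-letter label that cannot be a Roman numeral, such as A or B) are my assumptions for illustration only.

```python
import re

# Roman numerals 1..39, which is plenty for subunit labels (assumption).
ROMAN_RE = re.compile(r"^(?=[IVX])(X{0,3})(IX|IV|V?I{0,3})$")
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10}


def roman_to_int(label):
    """Convert a valid Roman numeral made of I, V, X to an integer."""
    total = 0
    prev = 0
    for ch in reversed(label):
        value = ROMAN_VALUES[ch]
        if value < prev:
            total -= value  # subtractive notation, e.g. the I in IV
        else:
            total += value
            prev = value
    return total


def letter_to_int(label):
    """Map A -> 1, B -> 2, ..., I -> 9, ..."""
    return ord(label) - ord("A") + 1


def number_subunits(labels):
    """Assign numbers to all subunit labels found for ONE enzyme.

    Hypothetical heuristic: if any single-letter label is not a Roman
    numeral character, the whole set is taken to be alphabetic.
    """
    single = [l for l in labels if len(l) == 1 and "A" <= l <= "Z"]
    alphabetic = any(l not in ROMAN_VALUES for l in single)
    numbers = {}
    for label in labels:
        if alphabetic and label in single:
            numbers[label] = letter_to_int(label)
        elif ROMAN_RE.match(label):
            numbers[label] = roman_to_int(label)
        else:
            numbers[label] = None  # unknown scheme, left undecided
    return numbers
```

With this rule, `number_subunits(["A", "B", "I"])` numbers I as 9, while `number_subunits(["I", "II", "IV"])` still reads the labels as Roman numerals. A set that contains only I, V or X stays ambiguous and falls back to the Roman reading.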

You can test this, for example, with the single UniProt entry A0A7V5FFT7,
or with seq/Bacteria/unrev/1.6.5.3.fasta from the repository.

Furthermore, in the very same test case the extraction of the subunits can fail if one of the keywords for detecting subunits is preceded by a single capital letter. For example, if the FASTA header looks like this:
"UniRef50_U3TYP0 NADH dehydrogenase I chain F n=1 Tax=Plautia stali symbiont TaxID=891974 RepID=U3TYP0_9ENTR"
the script will extract: "I chain" as the subunit, instead of the expected "chain F"
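One way to get "chain F" here would be to look for a label after the keyword first and only fall back to a label before the keyword when nothing follows. The Python sketch below shows that ordering; the keyword list, the label pattern and the function name are assumptions for illustration and are not taken from the script.

```python
import re

# Hypothetical label and keyword sets; the real script may use others.
LABEL = r"(?:[IVX]+|[A-Z]|\d+|alpha|beta|gamma|delta|epsilon)"
KEYWORD = r"(?i:subunit|chain)"  # keywords matched case-insensitively

# "chain F": keyword followed by its label (preferred reading).
AFTER = re.compile(r"\b(" + KEYWORD + r")\s+(" + LABEL + r")\b")
# "alpha subunit": label followed by the keyword (fallback only).
BEFORE = re.compile(r"\b(" + LABEL + r")\s+(" + KEYWORD + r")\b")


def extract_subunit(header):
    """Return the subunit phrase from a FASTA header, or None."""
    match = AFTER.search(header)
    if match is None:
        # Nothing usable follows the keyword, so try the word before it.
        match = BEFORE.search(header)
    if match is None:
        return None
    return match.group(1) + " " + match.group(2)


header = (
    "UniRef50_U3TYP0 NADH dehydrogenase I chain F n=1 "
    "Tax=Plautia stali symbiont TaxID=891974 RepID=U3TYP0_9ENTR"
)
result = extract_subunit(header)
```

For the header above, `result` is "chain F", and a name such as "ATP synthase alpha subunit" still yields "alpha subunit" through the fallback.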

My current plan is to implement diamond only for the -p all option and to try to correct these errors there. I am reporting it here anyway, because I am not sure when, or whether, I will find the time to finish it.
