Subunit detection is returning wrong numbers #241

Open
Porthmeus opened this issue Nov 7, 2024 · 0 comments
Porthmeus commented Nov 7, 2024

I just spotted this while rewriting the code for diamond. If the enzyme's subunits are labelled with letters, subunits I, V and X will get wrong numbers in complex_detection.R, because they are treated as Roman numerals.

So I would expect
A -> 1
B -> 2
C -> 3
...
I -> 9

but instead it does
I -> 1
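
One possible way to resolve the ambiguity is to look at all subunit labels of a complex together: if any label is a single letter that is not a valid Roman numeral (A, B, F, ...), the whole set is treated as alphabetical. The sketch below is in Python, not the R of complex_detection.R, and `number_subunits` and its helpers are hypothetical names, not functions from the repository. A set such as I, V, X on its own stays ambiguous and is read as Roman numerals here.

```python
import re

# Strict Roman numeral pattern (1..3999); the lookahead rejects the empty string.
ROMAN_RE = re.compile(
    r"^(?=[MDCLXVI])M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"
)
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(label):
    """Convert a valid Roman numeral to an integer (subtractive notation)."""
    total = 0
    for ch, nxt in zip(label, label[1:] + " "):
        value = ROMAN_VALUES[ch]
        # A smaller value before a larger one is subtracted (IV = 4).
        total += -value if ROMAN_VALUES.get(nxt, 0) > value else value
    return total


def letter_to_int(label):
    """Map A -> 1, B -> 2, ..., I -> 9, ..."""
    return ord(label) - ord("A") + 1


def number_subunits(labels):
    """Hypothetical helper: number all subunit labels of one complex.

    Assumption: all labels of a complex follow the same nomenclature.
    """
    labels = [label.upper() for label in labels]
    all_single_letters = all(len(l) == 1 and "A" <= l <= "Z" for l in labels)
    all_roman = all(ROMAN_RE.match(l) for l in labels)

    # Single letters with at least one non-Roman letter: alphabetical scheme.
    if all_single_letters and not all_roman:
        return {l: letter_to_int(l) for l in labels}
    # Everything parses as a Roman numeral (includes the ambiguous I/V/X case).
    if all_roman:
        return {l: roman_to_int(l) for l in labels}
    raise ValueError(f"mixed or unknown subunit nomenclature: {labels}")
```

With labels A, B, C and I this gives I -> 9, while I, II and IV still map to 1, 2 and 4.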

You can test this, for example, with the single UniProt entry A0A7V5FFT7, or with seq/Bacteria/unrev/1.6.5.3.fasta from the repository.

Furthermore, in the very same test case, the extraction of the subunits can fail if one of the keywords for detecting subunits is preceded by a single capital letter. For example, if the FASTA header looks like this:
"UniRef50_U3TYP0 NADH dehydrogenase I chain F n=1 Tax=Plautia stali symbiont TaxID=891974 RepID=U3TYP0_9ENTR"
the script will extract "I chain" as the subunit instead of the expected "chain F".
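
A possible fix is to try the "keyword followed by identifier" form first and only fall back to "identifier followed by keyword" when nothing follows the keyword. This is again a Python sketch rather than the actual R code; the keyword list (`subunit`, `chain`), the identifier pattern and the name `extract_subunit` are assumptions for illustration, not what complex_detection.R uses.

```python
import re

# Assumed keyword list; the real script may use a different set.
KEYWORDS = ("subunit", "chain")
_KW = "|".join(KEYWORDS)
# Assumed identifier forms: Roman numeral (I, V, X only), single capital letter, or number.
_ID = r"(?:[IVX]+|[A-Z]|[0-9]+)"

# Keyword is matched case-insensitively, the identifier case-sensitively.
AFTER = re.compile(rf"\b((?i:{_KW}))\s+({_ID})\b")   # e.g. "chain F"
BEFORE = re.compile(rf"\b({_ID})\s+((?i:{_KW}))\b")  # e.g. "B subunit"


def extract_subunit(header):
    """Hypothetical helper: return the subunit phrase from a FASTA header, or None."""
    # Prefer an identifier that follows the keyword.
    match = AFTER.search(header)
    if match:
        return f"{match.group(1)} {match.group(2)}"
    # Fall back to an identifier in front of the keyword.
    match = BEFORE.search(header)
    if match:
        return f"{match.group(1)} {match.group(2)}"
    return None


header = (
    "UniRef50_U3TYP0 NADH dehydrogenase I chain F n=1 "
    "Tax=Plautia stali symbiont TaxID=891974 RepID=U3TYP0_9ENTR"
)
subunit = extract_subunit(header)
```

For the header above, `subunit` is "chain F". The "I chain" reading is only returned when no identifier follows the keyword.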

My current plan is to implement diamond only for the -p all option, where I will try to correct these errors. However, I thought I would report it here, as I am not sure when, or if, I will find the time to finish it.
