Slow calculation of ambiguity feature in MLLM #822
Thanks @RietdorfC! I'll see if anything can be done to speed up the slow ambiguity calculation, but this is also a symptom of matching having gone wrong in other ways.
Hi @RietdorfC , I've now implemented a new, hopefully much faster method for calculating the ambiguity feature in PR #825. Could you please test the code in that branch? I'm especially interested in
Hi @osma, I have found the token that was responsible for the large number of matches (and the corresponding matches). We will investigate this issue further. I will test your new method and report back to you as soon as possible. Best regards
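For anyone following along, here is a minimal sketch of the kind of computation involved. This is not Annif's actual implementation; it only illustrates (under assumed half-open `(start, end)` span semantics) how an ambiguity-like feature, i.e. how many candidate matches overlap each other, can be computed in O(n log n) with a sweep line instead of an O(n²) pairwise check, which is one plausible reason a document with a pathological token producing very many matches becomes slow:

```python
def ambiguity(spans):
    """Count overlapping pairs among half-open (start, end) spans.

    Hypothetical helper, not part of Annif: a sweep over sorted span
    endpoints counts, for each span that opens, how many spans are
    currently open. Spans that merely touch at an endpoint do not count
    as overlapping because a close event sorts before an open event at
    the same coordinate.
    """
    events = []
    for start, end in spans:
        events.append((start, 1))  # 1 = open (sorts after a close at the same point)
        events.append((end, 0))    # 0 = close
    events.sort()

    open_count = 0
    overlaps = 0
    for _, kind in events:
        if kind == 1:
            overlaps += open_count  # new span overlaps every currently open span
            open_count += 1
        else:
            open_count -= 1
    return overlaps

print(ambiguity([(0, 5), (3, 8), (10, 12)]))  # -> 1 (only the first two overlap)
```

With many thousands of matches triggered by a single token, the difference between quadratic and near-linear handling of such pair counts would easily account for processing times in the hundreds of seconds.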
Dear Osma, dear annif-team,
As discussed in the annif-users group (https://groups.google.com/g/annif-users/c/8d3AL4LAzBQ), I have added the debugging lines and performed the suggest operation with an MLLM model trained on the full GND vocabulary set we use (1.4M subjects), on a document with a long processing time (305.72 sec.). Please find the zipped tsets.jsonl file attached to this issue.
Best regards
Clemens
tsets.zip