-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarification on if snowball
(specifically python implemenation) is not thread-safe
#162
Comments
Found it in the docs https://github.com/zvelo/libstemmer/blob/78c149a3a6f262a35c7f7351d3f77b725fc646cf/README#L45
Apologies for the extra noise. |
I should point out that's not our repo you've linked to - it's a git repo created from an ancient tarball from the old website (snowball.tartarus.org) - it's essentially a decade-old (and, judging from their git history, barely maintained) fork, and definitely not a canonical source of information about current Snowball releases from our repo. You're also looking at the README for the C libstemmer, not Python. Your conclusion is right though, and I think it's currently true for all the target languages. The issue is that the Snowball cursor position, limits, slice, and any Snowball variables are stored in the object (or C |
Thanks for the clarification, appreciated.
I couldn't find a similar reference to Snowball not being thread-safe within this repo, but it is likely I have just not managed to find it. If there isn't anywhere in the docs that state that Snowball instances are not thread-safe and that to use in a multi-thread/process run one should instantiate a Snowball stemmer instance within each thread (as opposed to sharing one instance across multiple threads), then that could be a valuable addition to the docs to help future uses avoid such race-condition intermittent bugs that are particularly hard to track down. Thanks for all your time and energy that goes into this package. It is appreciated! |
As you say, we should document this in all the target language README files though. |
I misread you message, it's clear what is requested. |
A patch would be lovely. The target language README files are either in One thought - I don't write a lot of heavily multithreaded code, but I think it would probably be better advice to suggest creating one stemmer object per thread wanting to stem words - currently we talk about sharing one stemmer object and protecting it with a mutex or similar, but that adds contention on a single resource without a good reason to. |
#163 :) I concur(ed) with your thought on advising user to instantiate per thread is more user friendly than advising them to use a mutex/lock. |
Fixed by #163. |
Hi, we've been experiencing intermittent inconsistent outputs (i.e. bug) when using
snowball
withdask
multiprocessing. We can stop these bugs occurring by any of a) using a single process/thread, b) removing the stemming, or c) moving the instantiation of the stemmer inside the function which is being applied within threads.Could someone with expertise input on whether
snowball
is thread-safe or not?Might be related to #146 which seems to imply that the C# implementation is not thread safe.
The text was updated successfully, but these errors were encountered: