ldoce5viewer failed to create index #33

Open
reflectionalist opened this issue Dec 21, 2015 · 8 comments

@reflectionalist
Contributor

While creating the index, ldoce5viewer failed with the following error message. I have tried both Python 2 and Python 3 (with the corresponding versions of lxml and whoosh) on Netrunner, which is Arch/Manjaro-based, and indexing failed in both settings.

Error message for the Python 3 setting:
Finalizing...
Done.
Building the full text search index for headwords and phrases...
Error occurred
Traceback (most recent call last):
  File "/usr/lib/python3.5/site-packages/ldoce5viewer/qtgui/indexer.py", line 438, in run
    self._make_index()
  File "/usr/lib/python3.5/site-packages/ldoce5viewer/qtgui/indexer.py", line 397, in _make_index
    make_full_hp(scan_temp)
  File "/usr/lib/python3.5/site-packages/ldoce5viewer/qtgui/indexer.py", line 356, in make_full_hp
    label, path, prio, sortkey)
  File "/usr/lib/python3.5/site-packages/ldoce5viewer/fulltext.py", line 176, in add_item
    data=(label, path, prio, normalize_index_key(sortkey))
  File "/usr/lib/python3.5/site-packages/whoosh/writing.py", line 750, in add_document
    for tbytes, freq, weight, vbytes in items:
  File "/usr/lib/python3.5/site-packages/whoosh/fields.py", line 164, in index
    for tstring, freq, wt, vbytes in word_values(value, ana, **kwargs):
  File "/usr/lib/python3.5/site-packages/whoosh/formats.py", line 146, in word_values
    wordset = set(t.text for t in tokens(value, analyzer, kwargs))
  File "/usr/lib/python3.5/site-packages/whoosh/formats.py", line 125, in tokens
    gen = analyzer(value, **kwargs)
TypeError: 'NoneType' object is not callable
Removing files...
Failed to create index
Error message for the Python 2 setting:
Finalizing...
Done.
Building the full text search index for headwords and phrases...
Error occurred
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ldoce5viewer/qtgui/indexer.py", line 438, in run
    self._make_index()
  File "/usr/lib/python2.7/site-packages/ldoce5viewer/qtgui/indexer.py", line 397, in _make_index
    make_full_hp(scan_temp)
  File "/usr/lib/python2.7/site-packages/ldoce5viewer/qtgui/indexer.py", line 356, in make_full_hp
    label, path, prio, sortkey)
  File "/usr/lib/python2.7/site-packages/ldoce5viewer/fulltext.py", line 176, in add_item
    data=(label, path, prio, normalize_index_key(sortkey))
  File "/usr/lib/python2.7/site-packages/whoosh/writing.py", line 750, in add_document
    for tbytes, freq, weight, vbytes in items:
  File "/usr/lib/python2.7/site-packages/whoosh/fields.py", line 164, in index
    for tstring, freq, wt, vbytes in word_values(value, ana, **kwargs):
  File "/usr/lib/python2.7/site-packages/whoosh/formats.py", line 146, in word_values
    wordset = set(t.text for t in tokens(value, analyzer, kwargs))
  File "/usr/lib/python2.7/site-packages/whoosh/formats.py", line 125, in tokens
    gen = analyzer(value, **kwargs)
TypeError: 'NoneType' object is not callable
Removing files...
Failed to create index
@lyz1990

lyz1990 commented Dec 27, 2015

Removing python-whoosh 2.7.0 and installing python-whoosh 2.5.7 works for me (Ubuntu with Python 2.7).

@ghost

ghost commented Dec 27, 2015

I ran into the same error with Python 2.7.9 and whoosh 2.7.0.

I fixed it by wrapping the corresponding lines of indexer.py in try/except blocks:

diff --git a/ldoce5viewer/qtgui/indexer.py b/ldoce5viewer/qtgui/indexer.py
index 0fedba6..582e99f 100644
--- a/ldoce5viewer/qtgui/indexer.py
+++ b/ldoce5viewer/qtgui/indexer.py
@@ -352,8 +352,12 @@ class IndexingThread(QThread):
                     i += 1
                     if i % 10000 == 0:
                         self._message('{0} items added'.format(i))
-                    fulltext_hwdphr_maker.add_item(itemtype, content, asfilter,
+
+                    try:
+                        fulltext_hwdphr_maker.add_item(itemtype, content, asfilter,
                             label, path, prio, sortkey)
+                    except Exception as e:
+                        print ("itemtype=", itemtype, "content=", content, "label=", label, "path=", path, e)

             self._message('{0} items were added.'.format(i))
             self._message('Finalizing...')
@@ -379,8 +383,11 @@ class IndexingThread(QThread):
                     i += 1
                     if i % 10000 == 0:
                         self._message('{0} items added'.format(i))
-                    fulltext_defexa_maker.add_item(itemtype, content, asfilter,
-                            label, path, prio, sortkey)
+                    try:
+                        fulltext_defexa_maker.add_item(itemtype, content, asfilter,
+                                label, path, prio, sortkey)
+                    except Exception as e:
+                        print ("itemtype=", itemtype, "content=", content, "label=", label, "path=", path, e)

             self._message('{0} items were added.'.format(i))
             self._message('Finalizing...')

@reflectionalist
Contributor Author

@elfua What is the rationale behind the wrapping? Also, have you tested your patch for Python 3?

@ghost

ghost commented Dec 29, 2015

@reflectionalist: The rationale behind the wrapping is to capture the indexing errors that happen with certain records, as outlined in the stacktraces you posted.

It is a dirty workaround and not the proper way to fix the errors, but at least it allows the program to proceed even if something failed during indexing.
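The workaround boils down to a catch, record, and continue loop around each add call. Below is a minimal sketch of that pattern in isolation; `add_all_tolerant` and `flaky_add` are hypothetical names, and the real `add_item` takes more arguments (itemtype, content, asfilter, label, path, prio, sortkey). Unlike the patch, it keeps the failures in a list so the skipped records can be counted afterwards instead of only being printed.

```python
from typing import Callable, Iterable, List, Tuple


def add_all_tolerant(
    records: Iterable[tuple],
    add_item: Callable[..., None],
) -> Tuple[int, List[Tuple[tuple, Exception]]]:
    """Call add_item(*record) for each record, collecting failures instead of aborting.

    Returns the number of records added and a list of (record, exception) pairs.
    """
    added = 0
    failures: List[Tuple[tuple, Exception]] = []
    for record in records:
        try:
            add_item(*record)
        except Exception as exc:  # deliberately broad, as in the workaround patch
            failures.append((record, exc))
        else:
            added += 1
    return added, failures


# Usage with a hypothetical add function that rejects one record.
def flaky_add(itemtype, content):
    if content is None:
        raise ValueError("no content")


added, failures = add_all_tolerant(
    [("hwd", "about"), ("phr", None), ("hwd", "above")], flaky_add
)
print(added, len(failures))  # 2 1
```

As with the patch, this only hides the errors: every record in `failures` is missing from the resulting index.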

I have updated the print statements to be Python 3-compatible.

By the way, all of the whoosh exceptions triggered carry the message "Called start_doc when already in a doc". Googling the error returns this thread, with some explanation (in the last post) of what might be causing it.
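One possible explanation for why the caught exceptions carry this message is a writer that is left in an "inside a document" state after a failed add. The toy model below is a hypothetical sketch under that assumption; `ToyDocWriter` and its methods are illustrative and are not whoosh's API. The first document fails with the original `TypeError`, and every later call, even one with a working analyzer, fails in `start_doc`.

```python
class ToyDocWriter:
    """Hypothetical writer, NOT whoosh's real class.

    It encodes one assumption: start_doc() sets an "inside a document" flag
    that only finish_doc() clears, so an exception raised between the two
    leaves the flag set for the next document.
    """

    def __init__(self) -> None:
        self.in_doc = False
        self.docs = []

    def start_doc(self) -> None:
        if self.in_doc:
            raise Exception("Called start_doc when already in a doc")
        self.in_doc = True

    def finish_doc(self) -> None:
        self.in_doc = False

    def add_document(self, text, analyzer) -> None:
        self.start_doc()
        tokens = analyzer(text)  # TypeError here if analyzer is None
        self.docs.append(tokens)
        self.finish_doc()


writer = ToyDocWriter()
errors = []
# One broken document followed by two valid ones.
for analyzer in (None, str.split, str.split):
    try:
        writer.add_document("look up a word", analyzer)
    except Exception as exc:
        errors.append(str(exc))

print(errors)
```

In this model the two valid documents are rejected as well. If whoosh's writer behaves similarly, the try/except workaround would skip more than just the faulty records.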

@ghost

ghost commented Jan 1, 2016

To anyone running into the indexing issue: I can confirm that indexing works as expected with whoosh 2.5.7.

The whoosh version currently available from pip is 2.7.0. It is not compatible with the current ldoce5viewer and breaks with the error mentioned above. The patch I posted earlier is not a proper fix: it simply lets indexing continue while ignoring the errors, so some content will not be indexed for full-text search and will not appear in search results. Searching by headword still works, though.

Until @ciscorn updates the app to be compatible with whoosh 2.7+, the proper way to fix the indexing error is to downgrade whoosh to 2.5.7.

If you installed whoosh via pip, you should:

  1. Remove whoosh 2.7:
    sudo pip uninstall whoosh
  2. Explicitly install version 2.5.7:
    sudo pip install whoosh==2.5.7

For Python 3, use pip3 instead.

On Ubuntu, python-whoosh and python3-whoosh are currently at version 2.5.7-3, so using the repository package is another option. It might break on future version bumps, though, unless you explicitly pin the package version.
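To check whether an installation falls into the problematic range before indexing, a small script can read the installed whoosh version. This is a sketch assuming Python 3.8+ (for `importlib.metadata`); the function names are hypothetical, and the version bounds are only what this thread reports (2.5.7 works, 2.7.0 breaks, 2.7.4 or later reported fixed), not an official compatibility table.

```python
from importlib import metadata
from typing import Optional, Tuple

# Bounds taken from reports in this thread, not from whoosh's release notes.
# Versions between 2.7.0 and 2.7.4 are assumed to be affected.
FIRST_BROKEN = (2, 7, 0)
FIRST_FIXED = (2, 7, 4)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.7.4' into (2, 7, 4); stops at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:  # treat '2.7' as (2, 7, 0)
        parts.append(0)
    return tuple(parts)


def whoosh_is_known_broken(version: str) -> bool:
    return FIRST_BROKEN <= parse_version(version) < FIRST_FIXED


def installed_whoosh_version() -> Optional[str]:
    try:
        return metadata.version("whoosh")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_whoosh_version()
    if version is None:
        print("whoosh is not installed for this interpreter")
    elif whoosh_is_known_broken(version):
        print(f"whoosh {version} is in the range reported to break indexing")
    else:
        print(f"whoosh {version} is outside the reported broken range")
```

Run it with the same interpreter that runs ldoce5viewer (python3 vs python2 matters here, just as pip3 vs pip does).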

@hoanghadu

Thank you so much, elFua. That works for me.

@bzhao

bzhao commented Feb 11, 2016

I am using Arch Linux.
After running "pip install whoosh==2.5.7" and "pip3 install whoosh==2.5.7" successfully, index generation finished without errors.
However, when I look up the word "about", an error window pops up with the following content:

Traceback (most recent call last):
  File "/usr/lib/python3.5/site-packages/ldoce5viewer/qtgui/utils/soundplayer.py", line 107, in need_data
    appsrc.emit('push-buffer', Gst.Buffer.new_wrapped(self._data[:size]))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)

(This traceback is repeated six times in the error window.)
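The message itself is what Python produces when binary data is decoded as ASCII text, since 0xFF is not a valid ASCII byte. The snippet below only reproduces the message under a stated assumption about the audio format; it does not show where inside the PyGObject/GStreamer call the conversion happens.

```python
# Assumption: the pronunciation audio passed to GStreamer is MP3 data, and MP3
# frame headers begin with the sync byte 0xFF. These four bytes are made up to
# look like such a header; they are not taken from the LDOCE5 data.
mp3_like = bytes([0xFF, 0xFB, 0x90, 0x64])

message = ""
try:
    # Treating binary audio as ASCII text fails on the very first byte.
    mp3_like.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
# 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
```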

@linluc

linluc commented Aug 5, 2017

The error raised while indexing with whoosh 2.7:

gen = analyzer(value, **kwargs)
TypeError: 'NoneType' object is not callable

is caused by a regression in whoosh, as described here.

Solutions:

  1. Update to whoosh 2.7.4 or higher, or
  2. Edit "fields.py" in the whoosh package (ver. 2.7) by adding one line:
         self.analyzer = analysis.RegexAnalyzer(expression=expression)
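A toy model of this regression and of the one-line fix in option 2 follows. The class and helper names are hypothetical, not whoosh's classes, and the regex is an arbitrary placeholder. The point is only that a field whose `__init__` never assigns `self.analyzer` keeps the value `None`, so calling it raises exactly the `TypeError` from the traceback.

```python
import re
from typing import Callable, List, Optional

# Placeholder tokenizer pattern chosen for this sketch; not whoosh's default.
DEFAULT_PATTERN = re.compile(r"\S+")


def regex_analyzer(expression) -> Callable[[str], List[str]]:
    """Hypothetical stand-in for a regex analyzer: returns every match."""
    return lambda text: expression.findall(text)


class BrokenListField:
    """Toy field reproducing the regression: __init__ never assigns an analyzer."""

    analyzer: Optional[Callable[[str], List[str]]] = None

    def __init__(self, expression=None) -> None:
        expression = expression or DEFAULT_PATTERN
        # Missing line: self.analyzer = regex_analyzer(expression)

    def index(self, value: str) -> List[str]:
        # With analyzer still None, this raises the TypeError from the traceback.
        return self.analyzer(value)


class FixedListField(BrokenListField):
    """The same toy field with the one-line fix applied."""

    def __init__(self, expression=None) -> None:
        expression = expression or DEFAULT_PATTERN
        self.analyzer = regex_analyzer(expression)


try:
    BrokenListField().index("look up")
except TypeError as exc:
    print(exc)  # 'NoneType' object is not callable

print(FixedListField().index("look up"))  # ['look', 'up']
```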
