This repository has been archived by the owner on Mar 6, 2019. It is now read-only.

refactored singularize() to use SnowballStemmer; fixed the smartJoin() #11

Open: wants to merge 1 commit into base: master

@ramji-c ramji-c commented Jun 8, 2017

Summary of changes:

  1. Refactored the singularize() function in utils.py to use SnowballStemmer to convert plural words to their singular form.
  2. Fixed the smartJoin() function to use a regex for whitespace removal.
  3. Fixed a minor bug in convert_to_json.py, which was not compatible with the output of the roundtrip.sh script. import_data() now checks whether a confidence score is available during conversion.
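
For readers without the diff at hand, here is a minimal sketch of what regex-based whitespace cleanup in a join helper could look like. The name `smart_join_sketch` and the punctuation rules are assumptions for illustration only; the actual smartJoin() change is not shown in this conversation and may differ.

```python
import re


def smart_join_sketch(words):
    """Hypothetical helper: join tokens, then tidy spaces around punctuation."""
    text = " ".join(words)
    # Drop whitespace that precedes a comma or a closing parenthesis.
    text = re.sub(r"\s+([,)])", r"\1", text)
    # Drop whitespace that follows an opening parenthesis.
    text = re.sub(r"\(\s+", "(", text)
    return text


print(smart_join_sketch(["1", "cup", "flour", ",", "sifted"]))
print(smart_join_sketch(["2", "(", "14", "ounce", ")", "cans"]))
```

Using `\s+` rather than a literal space means runs of spaces or tabs between tokens are handled by the same two patterns.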

…(); updated convert_to_json.py to work with output of roundtrip.sh
@@ -96,41 +99,11 @@ def getFeatures(token, index, tokens):

 def singularize(word):
     """
-    A poor replacement for the pattern.en singularize function, but ok for now.
+    use nltk/Snowballstemmer to singularize word.

A stemmer will do a lot more than singularize words, e.g. "larger" becomes "large".


Wouldn't it be better to use nltk.WordNetLemmatizer?
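
To make the concern in this thread concrete, here is a rough sketch of a function that only strips plural endings and leaves every other word form alone. It is neither the PR's code nor an nltk API; `singularize_sketch` and its suffix rules are hypothetical, and it does not handle irregular plurals or words such as "tomatoes".

```python
def singularize_sketch(word):
    """Hypothetical: strip common English plural endings, nothing else."""
    lower = word.lower()
    # berries -> berry
    if len(lower) > 3 and lower.endswith("ies"):
        return word[:-3] + "y"
    # peaches -> peach, boxes -> box, glasses -> glass
    if lower.endswith(("ches", "shes", "sses", "xes")):
        return word[:-2]
    # cups -> cup, but leave words like "glass" or "asparagus" alone
    if lower.endswith("s") and not lower.endswith(("ss", "us", "is")):
        return word[:-1]
    # Anything else, including comparatives such as "larger", is unchanged.
    return word


for w in ["cups", "berries", "peaches", "larger", "asparagus"]:
    print(w, "->", singularize_sketch(w))
```

A suffix list like this is narrower than a stemmer by design: it trades coverage of irregular forms for the guarantee that non-plural words pass through untouched.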

3 participants