Protect community from confusable homoglyphs #1470
base: master
How often do they need to be updated? Could this be done in cron, or in the payday script? I'm not certain.
I think updating it every time the site is updated would be fair enough. The update script pulls its data from this page: http://ftp.unicode.org/Public/security/latest/confusables.txt (last updated in November 2018).
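For reference, confusables.txt is the Unicode data file defined by UTS #39. Each data line has the form "source ; target ; type # comment", where source and target are space-separated hexadecimal code points. The sketch below shows how that format can be read into a source-to-prototype mapping. It is only an illustration, assuming a local copy of the file; it is not how confusable_homoglyphs parses its data, and the two sample lines are written in the file's format rather than copied from it.

```python
def parse_confusables(lines):
    """Map each source string in confusables.txt lines to its prototype string."""
    mapping = {}
    for line in lines:
        # Drop a leading byte-order mark, then the trailing "# ..." comment.
        data = line.lstrip("\ufeff").split("#", 1)[0].strip()
        if not data:
            continue  # blank or comment-only line
        # Fields are "source ; target ; type"; code points are space-separated hex.
        source_hex, target_hex = (field.strip() for field in data.split(";")[:2])
        source = "".join(chr(int(cp, 16)) for cp in source_hex.split())
        target = "".join(chr(int(cp, 16)) for cp in target_hex.split())
        mapping[source] = target
    return mapping


# Illustrative lines in the confusables.txt format (not a verbatim excerpt).
SAMPLE = [
    "\ufeff# confusables.txt",
    "",
    "0430 ;\t0061 ;\tMA\t# CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A",
    "03BF ;\t006F ;\tMA\t# GREEK SMALL LETTER OMICRON -> LATIN SMALL LETTER O",
]

mapping = parse_confusables(SAMPLE)
```

With a downloaded copy, the same function can be fed an open file, e.g. `parse_confusables(open("confusables.txt", encoding="utf-8"))`, since trailing newlines are stripped along with the comments.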
for existing_name in all_names:
    if cls._unconfusable(name) == cls._unconfusable(existing_name):
        raise CommunityAlreadyExists
This is a very inefficient implementation: the more communities there are, the slower it becomes. Also, you're repeating an identical function call (cls._unconfusable(name)) in every loop iteration.
Fixed here: 9b496bc
You only fixed the repeated call, not the wider issue: this implementation is still inefficient.
Which part is the most inefficient?
- the database query?
- the unconfusable lookup?
- the unconfusable_string function?
Fetching all the existing community names from the database and then looping over them. That's linear complexity, O(n), where n is the number of existing communities.
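One common way to avoid the linear scan is to compute a normalized "skeleton" of each name once, index existing communities by that skeleton, and check a new name with a single hash lookup. The sketch below follows the UTS #39 skeleton definition (NFD, map each character to its prototype, NFD again), with an in-memory dict standing in for an indexed database column. CONFUSABLES, skeleton() and CommunityRegistry are hypothetical illustrations, not code from this PR; the real mapping would come from the confusables data.

```python
import unicodedata

# Hypothetical, tiny stand-in for the full confusables.txt mapping.
CONFUSABLES = {"\u0430": "a", "\u03bf": "o", "\u0440": "p"}


def skeleton(name):
    """Return the UTS #39-style skeleton of `name`."""
    decomposed = unicodedata.normalize("NFD", name)
    # Replace each character by its prototype (unchanged if not confusable).
    mapped = "".join(CONFUSABLES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


class CommunityAlreadyExists(Exception):
    pass


class CommunityRegistry:
    def __init__(self):
        # skeleton -> original name; dict membership is O(1) on average.
        self._by_skeleton = {}

    def create(self, name):
        key = skeleton(name)  # computed once per call, not once per existing name
        if key in self._by_skeleton:
            raise CommunityAlreadyExists(self._by_skeleton[key])
        self._by_skeleton[key] = name
```

Creating "python" and then "\u0440yth\u03bfn" (Cyrillic "р", Greek "ο") raises CommunityAlreadyExists("python"), because both names share the skeleton "python".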
We should probably do something like this instead:
possible_collisions = get_confusables(name)
possible_collisions.append(name)
collides_with = cursor.one("""
    SELECT name
      FROM communities
     WHERE name IN %s
     LIMIT 1
""", (tuple(possible_collisions),))
if collides_with:
    raise CommunityAlreadyExists(collides_with)
Strike that, the
OK, I removed the update in the
liberapay/utils/unconfusable.py
@@ -0,0 +1,11 @@
from confusable_homoglyphs import confusables


def unconfusable_string(name):
It would be nice to have a docstring explaining what this function does.
Done here: a73af92
Tell me if there is a specific format for that.
from confusable_homoglyphs import confusables


# Convert a Unicode string to its equivalent by replacing all confusable homoglyphs
# with their common/Latin equivalents
That's a comment, not a docstring.
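Per PEP 257, a docstring is a string literal written as the first statement of the function body. Unlike a # comment, Python keeps it at runtime in the function's __doc__ attribute, where help() and documentation tools pick it up. A minimal sketch of the shape; the body and the CONFUSABLES mapping are hypothetical, since the PR's actual implementation relies on confusable_homoglyphs data.

```python
# Hypothetical mapping used only to give the example a working body.
CONFUSABLES = {"\u0430": "a", "\u03bf": "o"}


def unconfusable_string(name):
    """Return `name` with each confusable homoglyph replaced by its Latin equivalent.

    Characters without a known confusable mapping are left unchanged.
    """
    return "".join(CONFUSABLES.get(ch, ch) for ch in name)
```

Running help(unconfusable_string) then prints the docstring, which a comment placed above the def would not do.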
This PR aims to fix #1469 by preventing the creation of communities whose names are confusable with existing ones.
I learned how to use the library during the implementation, but there is one thing I'm not sure how to do.
The documentation says that the data file has to be updated by running the following two commands:
I suppose the call to confusable_homoglyphs update has to be added to the Makefile during the make env step, am I right?