ssun32/clir_scripts
Used tools
----------
hanziconv: converts Traditional Chinese to Simplified Chinese
http://github.com/berniey/hanziconv

Stanford Segmenter: tokenizes Chinese text

filter.py
---------
Filters out ill-formed documents, such as Wikipedia:, Help:, and Portal: pages.

fix_query.py
------------
Regenerates the queries from the raw Wikipedia documents. Previously, title words were not properly deleted from the queries.

fix_rel.py
----------
Scans all the rel and document files and removes the lines that refer to deleted Wikipedia articles.
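
The filtering and rel clean-up steps described above could look roughly like the sketch below. It is not the code from filter.py or fix_rel.py. The function names, the prefix list, and the tab-separated rel layout (query id, document id, relevance) are all assumptions made for illustration.

```python
# Hypothetical sketch only: names, prefixes, and file layout are assumptions,
# not taken from the repository's scripts.
ILL_FORMED_PREFIXES = ("Wikipedia:", "Help:", "Portal:")


def is_well_formed(title):
    """Return True when the title does not start with a non-article prefix."""
    return not title.startswith(ILL_FORMED_PREFIXES)


def filter_documents(docs):
    """Keep only (doc_id, title) pairs whose title looks like a regular article."""
    return [(doc_id, title) for doc_id, title in docs if is_well_formed(title)]


def prune_rel_lines(rel_lines, kept_doc_ids):
    """Drop relevance lines whose document id is not among the kept documents."""
    kept = []
    for line in rel_lines:
        fields = line.rstrip("\n").split("\t")
        # Assumed layout: query_id <TAB> doc_id <TAB> relevance
        if len(fields) >= 2 and fields[1] in kept_doc_ids:
            kept.append(line)
    return kept
```

For example, filtering first and then passing the surviving document ids to prune_rel_lines keeps the rel file consistent with the document file:

    docs = [("1", "Beijing"), ("2", "Help:Contents")]
    kept_ids = {doc_id for doc_id, _ in filter_documents(docs)}
    rel = prune_rel_lines(["q1\t1\t1\n", "q1\t2\t1\n"], kept_ids)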