TODO list for keyboard search #87
Also: review feedback from @ermshiperete at https://community.software.sil.org/t/looking-for-testers-keyboard-search-refresh/3570/8
TODO:
@ermshiperete comments:
Further comments from @ermshiperete: Just found a bug:
Further comments from @ermshiperete:
This is by design. There is only one keyboard currently listed that supports those languages:
This is an advanced feature and is by design. There is now a hint to help you search by language code on the search page.
Correct. We don't currently support synonyms or abbreviations for country names. This would be a low priority feature I think; I don't want to maintain a database of synonyms for countries, and the ISO 3166-1 list does not include them. We use the ISO 3166-1 alpha-2 list, which is the most common format.
This is by design. "EuroLatin (SIL)" matches on "Swati" language rather than "Swabian", and the keyboard won't be shown twice. There are 13 different language names starting with "swa" in langtags.json and we don't want to show duplicates. Just keep typing if it hasn't found the name you are looking for 😉.
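Not showing a keyboard twice amounts to collapsing hits per keyboard and keeping only the best-weighted match. A minimal sketch, assuming each hit is a (keyboard id, matched language name, weight) tuple; the tuple shape, the function name and the sample ids and weights are all hypothetical, not the search's actual data model.

```python
def dedupe_best(hits: list[tuple[str, str, float]]) -> list[tuple[str, str, float]]:
    """Keep one hit per keyboard id: the one with the highest weight."""
    best: dict[str, tuple[str, float]] = {}
    for keyboard_id, language_name, weight in hits:
        current = best.get(keyboard_id)
        # Replace the stored hit only if this one is strictly better.
        if current is None or weight > current[1]:
            best[keyboard_id] = (language_name, weight)
    # Flatten back to tuples, highest weight first.
    return sorted(
        ((kid, name, w) for kid, (name, w) in best.items()),
        key=lambda hit: hit[2],
        reverse=True,
    )


hits = [
    ("kbd_a", "Swati", 40.0),
    ("kbd_a", "Swabian", 25.0),
    ("kbd_b", "Swahili", 30.0),
]
print(dedupe_best(hits))
```

Which language name is displayed for a keyboard then depends only on which of its matches scored highest, so a keyboard can appear under a name other than the one the user had in mind.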
I think this is mostly personal preference 😁.
This is correct. We normalise the BCP 47 language subtag from ISO 639-3 to ISO 639-1 (which gives us
This is a side-effect of the precise match signal, which pushes the exact string match of
I also, in some ways, prefer the nested search results... But this was the trade-off I made at the start of the design. The old search had too much complexity due to its multiple search result lists, and I think this simpler flat list of results matches what most users will expect (as they will be familiar with flat, Google-style searches).
This has been resolved in an earlier PR.
Okay, so this is actually a bit of a tricky one. For the embedded search, IPATotal would not show up. For the basic web search, we don't currently use the user's platform as a signal. The unexpected ordering here comes about because we multiply the match weight by the ln() of the download count (+2 for reasons). IPATotal currently wins out because its name starts with IPA as well as having IPA in the description, giving it a basic weight of 60 vs 35 for SIL IPA. The final weights are 286.24 and 225.48 respectively. We just need to download sil_ipa another 3000 times a month and it'll sort itself out 🙈. Perhaps that indicates that And with Changing this formula will break all my tests because all the weights change, so I am really not very keen 🤣... but will do if this is a good solution. Thoughts appreciated.
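Reading the explanation above as final weight = match weight × ln(downloads + 2), the quoted numbers can be checked. This sketch back-solves the download counts implied by the two final weights and estimates how many downloads sil_ipa would need to overtake IPATotal. The formula is inferred from the comment rather than taken from the search code, and the function names are hypothetical.

```python
import math


def final_weight(match_weight: float, downloads: int) -> float:
    """Final ranking weight: match weight scaled by ln(downloads + 2)."""
    return match_weight * math.log(downloads + 2)


def implied_downloads(match_weight: float, final: float) -> float:
    """Invert final_weight to recover the download count behind a score."""
    return math.exp(final / match_weight) - 2


def downloads_to_overtake(match_weight: float, target: float) -> int:
    """Smallest whole download count whose final weight is strictly above target."""
    return math.floor(implied_downloads(match_weight, target)) + 1


ipatotal = round(implied_downloads(60, 286.24))  # IPATotal: match weight 60
sil_ipa = round(implied_downloads(35, 225.48))   # SIL IPA: match weight 35
needed = downloads_to_overtake(35, 286.24)       # sil_ipa downloads to beat 286.24
print(ipatotal, sil_ipa, needed, needed - sil_ipa)
```

Under this reading, the implied counts are about 116 (IPATotal) and 626 (sil_ipa), and sil_ipa would need roughly 2,900 more downloads, which is consistent with the "another 3000" estimate. Because the download term is logarithmic, the match weight dominates: a 60 vs 35 gap takes roughly a thirtyfold difference in downloads to close.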
Finish keyboard install page (aka universal link infrastructure) for:
I think the staging site is using the BCP 47 tag On Keyman for Android alpha, I do a keyboard search for "sil_ipa" and install the keyboard. The sil_ipa keyboard shows up with the tag
Re sil_ipa.keyboard_info
sil_ipa.kps
This was deliberate at the time, because we had trouble installing
All remaining items extracted into separate issues, so closing this mega checklist.
FUTURE:
TODO:
Include hints on how to use search (advanced search link?) -- extrapolated from Doug's feedback: (FIX: fix: search history and hints keyman.com#149)
The new site gives 27 keyboards in response to fulfulde.
The old gives only 1, and it is not a good one.
However, the new site returns no results for the code ffm
whereas the old site gives one language, under which there are 4 keyboards.
Update database layer to use schemas for switching instead of separate databases. (FIX: refactor: use schemas instead of swappable databases #96, chore: cleanup global variables somewhat #97)
Separate deprecated keyboards out of default search: https://community.software.sil.org/t/looking-for-testers-keyboard-search-refresh/3570/3?u=marc_keyman (FIX: fix: skip deprecated keyboards #99, fix: skip deprecated keyboards keyman.com#147)
Strip out generic words such as “language” and “keyboard”. https://community.software.sil.org/t/looking-for-testers-keyboard-search-refresh/3570/5 (FIX: fix: strip common words from search keyman.com#148)
Skip the “page 1 of 0” when there are no results! https://community.software.sil.org/t/looking-for-testers-keyboard-search-refresh/3570/5 (FIX: fix: hide page statistics when few results keyman.com#146)
http://api.keyman.com.local/search?q=l:id:str-latn is matching nothing. Searching for str-latn returns str as expected, but loses the str-latn tag, which is needed for the keyboard itself. So something is mismatched here. (via slack https://keymanapp.slack.com/archives/C6Q9WS09G/p1593670613153900) (FIX: fix(search): use keyboard's BCP 47 tag in links #98)
Use https://keyman.com/go/package/download/ pattern for package download APIs on api.keyman.com. (https://docs.google.com/document/d/1rhgMeJlCdXCi6ohPb_CuyZd0PZMoSzMqGpv1A8cMFHY/edit#heading=h.mio0p3wdzdye)
Embedded search is a bit wasteful on space. Let's restructure it (e.g. Home link, keyman.com link could be together on right, with Keyboard Search and search box on left, all on one line) (Fix feat: rework download keyboard dialog style keyman.com#153, feat(windows): rework download keyboard dialog style keyman#3463)
Uncaught TypeError: Cannot read property 'q' of undefined
at init (search.js:282)
at load_search (search.js:311)
when using the embedded search; I searched for "khmer" and then clicked the first link. URL: http://keyman.com.local/keyboards/khmer_angkor?bcp47=km&embed=windows (Fix: fix: avoid missing packages with spaces or periods in id #150)
Set up internal links for back-end API calls to use internal.api.keyman.com (the non-proxied form of api.keyman.com); these avoid trips outside the datacenter for PHP calls to the backend. Also requires DNS setup. (Will be fixed in Performance improvements for keyboard search (and related) #107)
Deleting all text from search box doesn't reset search (FIX: fix: search history and hints keyman.com#149)
Verify that all appropriate indexes are in place (especially check short searches such as 'r', 'ra'). (Will be fixed in Performance improvements for keyboard search (and related) #107)
Non-Unicode keyboards are not listed as 'obsolete' yet (FIX: fix: non-unicode keyboards are obsolete too keyman.com#151)
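The “strip out generic words” item in the list above can be read as a small stop-word filter applied before the query reaches the search. A sketch, assuming whitespace-separated words and a hypothetical `COMMON_WORDS` set built from the two examples in that item (plus their plurals); the function name is also hypothetical.

```python
# Hypothetical stop-word list, based on the examples in the TODO item.
COMMON_WORDS = {"language", "languages", "keyboard", "keyboards"}


def strip_common_words(query: str) -> str:
    """Remove generic words, comparing case-insensitively."""
    words = query.split()
    kept = [w for w in words if w.casefold() not in COMMON_WORDS]
    # If every word was generic, keep the original query rather than
    # searching for nothing.
    return " ".join(kept) if kept else " ".join(words)


print(strip_common_words("Khmer keyboard"))
print(strip_common_words("thai Language keyboards"))
print(strip_common_words("Keyboard"))
```

The fallback matters for a query that consists only of a generic word: returning an empty string would turn it into an empty search instead of a search for that word.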
===========================================================================================================================
DONE:
Additional Notes: Notes on api.keyman.com changes for langtag consumption
example keyboard: burushaski_girminas, khw-latn "Khowar (Latin)"
ខ្មែរ finds zero results but ខ្មែ finds 7...
Show a 'popular keyboards' list for the empty search -- this can also be the search engine jumping-off point.
"Show obsolete keyboards" needs an indication of the change of status ("Hide obsolete keyboards") and needs to be outdented. Also
needs some thought with paginated results.
Too many pages leads to overwhelming number of page links at bottom (e.g. s:latin)
http://api.keyman.com.local/search?q=l gives 500
el_dinka appears to have non-canonical bcp47 codes -- search finds it with no trouble.
Show list of associated languages+scripts+countries in keyboard details (and related keyboards?)
For in-app download links, include information on searched language code (if available), for default language install (#1456)
Match fields in json should be integer or float where possible, not string! (and update schema accordingly)
Schema for match type should be restricted to the types actually used
Search "spa" vs "spanish" -- the weighting could be better. Similar "ger" vs "german". (probably need length-based match weight override)
REFACTOR: region vs country
REFACTOR: code vs id vs tag
Pagination
Need to give more detail on failed links (and make it easier to find in logs, so tweak the broken link search a node wrapper)
Searches for keyboard ids should work
Phrases are not working yet (need to split into either a phrase search or separate words)
Searches for bcp47 tags, scripts, regions should work
FAIL: http://api.keyman.com.local/search/2.0?f=1&q=l:%, c:%, etc.
Default search should return a FLAT LIST of KEYBOARDS ONLY with highlights. e.g. 'Thai' should return keyboards with 'Thai' in the name, in a language name, or in the country associated with the language.
Search results must be weighted (summed?)
a) match of primary language name 1.0
b) match of alternate language name 0.3
c) match of keyboard name or id 1.0
d) match of script name 1.0
e) match of country name 0.5
f) match on term in description 0.5
g) match quality (whole word match = 1.0, down to 0.1 for further distance? as a multiplicand)
SELECT * FROM t_langtag_name INNER JOIN CONTAINSTABLE(t_langtag_name, name, 'isabout (thai weight (1.0), "thai*" weight (0.5))') AS KEY_TBL ON t_langtag_name._id = KEY_TBL.[KEY] ORDER BY [RANK] DESC
5 / 5 = 1.0
4 / 5 = 0.8
1 / 5 = 0.2
NOTE: final weighting is different but ... let's see how it goes
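The weighting list above can be sketched as field weight × match quality, summed over every field a keyboard matched on. This assumes a plain sum (the notes themselves ask "summed?" and say the final weighting differs), and the field keys and `Match` type are hypothetical names for items a) to g).

```python
from dataclasses import dataclass

# Field weights a) to f) from the notes; the keys are hypothetical labels.
FIELD_WEIGHTS = {
    "language_name": 1.0,            # a) primary language name
    "alternate_language_name": 0.3,  # b) alternate language name
    "keyboard_name_or_id": 1.0,      # c) keyboard name or id
    "script_name": 1.0,              # d) script name
    "country_name": 0.5,             # e) country name
    "description": 0.5,              # f) term in description
}


@dataclass
class Match:
    field: str
    quality: float  # g) 1.0 for a whole-word match, down to 0.1


def score(matches: list[Match]) -> float:
    """Sum field weight * match quality over all matches for one keyboard."""
    return sum(FIELD_WEIGHTS[m.field] * m.quality for m in matches)


# e.g. 'Thai' as a whole-word language name match plus a weaker country match
print(score([Match("language_name", 1.0), Match("country_name", 0.5)]))
```

Summing rewards keyboards that match on several fields at once; taking the maximum instead would rank purely on the single best match.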
Can also specify a search:
?q=l:<term>
  search for keyboards that support a language, by name (does not check id)
?q=l:id:<id>
  search for keyboards that support a language, by BCP 47 id
?q=c:<term>
  search for keyboards that support languages within a country
?q=c:id:<id>
  search for keyboards that support languages within a country, by ISO 3166 id
?q=s:<term>
  search for keyboards that support a script
?q=s:id:<id>
  search for keyboards that support a script, by script id
?q=id:<id>
  search for keyboards that match the id
?q=legacy:<id>
  search for keyboards that match the legacy id (only one is returned!)
Should be able to specify alternate names?
Searches should match on NFKD with diacritics stripped.
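The prefixes listed above and the NFKD note can be sketched together: split the `q` value into a field, an "is this an id?" flag and a term, then fold text for comparison. The field labels, `parse_search_query` and `fold_for_matching` are hypothetical names, not the API's actual implementation.

```python
import unicodedata

# Field prefixes from the query syntax notes; the field names are hypothetical labels.
FIELD_PREFIXES = {"l": "language", "c": "country", "s": "script"}


def parse_search_query(q: str) -> tuple[str, bool, str]:
    """Split a raw q= value into (field, by_id, term)."""
    if q.startswith("legacy:"):
        return ("legacy", True, q[len("legacy:"):])
    if q.startswith("id:"):
        return ("keyboard", True, q[len("id:"):])
    head, sep, rest = q.partition(":")
    if sep and head in FIELD_PREFIXES:
        field = FIELD_PREFIXES[head]
        if rest.startswith("id:"):
            return (field, True, rest[len("id:"):])
        return (field, False, rest)
    # No recognised prefix: plain free-text search across all fields.
    return ("any", False, q)


def fold_for_matching(text: str) -> str:
    """NFKD-normalise, then drop characters with a nonzero combining class."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(parse_search_query("l:id:str-latn"))
print(parse_search_query("c:id:KH"))
print(fold_for_matching("Ñandú"))
```

Checking `unicodedata.combining` is a reasonable proxy for Latin-script diacritics, but it is a blunt rule; non-Latin scripts are worth testing separately before applying it to every query.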
Process: