Better fancify.sh
#3
Comments
One might also want to add accent marks on capital letters (e.g. […]). Also, we might eventually want to support several languages, so the language should be an input parameter. Given that the above use case and some functionality such as quote-pair matching might be a little complex to implement, I would tend to switch from a bash script to Python. WDYT?
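To make the "language as an input parameter" and "quote-pair matching" ideas concrete, here is a minimal Python sketch. It is not the repo's code: the function name `fancify_quotes`, the `QUOTES` table, and the no-break spaces inside the French guillemets are all assumptions made for illustration. It simply alternates opening and closing marks, so it only works on text whose straight quotes are balanced.

```python
# Hypothetical sketch: names and the language table are assumptions,
# not taken from fancify.sh.
NBSP = "\u00a0"  # no-break space (assumed spacing inside French guillemets)

QUOTES = {
    "en": ("“", "”"),
    "fr": ("«" + NBSP, NBSP + "»"),
    "de": ("„", "“"),
}


def fancify_quotes(text: str, lang: str = "en") -> str:
    """Replace straight double quotes with alternating opening/closing marks."""
    opening, closing = QUOTES[lang]
    out = []
    is_open = False  # True while we are inside a quoted span
    for ch in text:
        if ch == '"':
            out.append(closing if is_open else opening)
            is_open = not is_open
        else:
            out.append(ch)
    return "".join(out)
```

Naive alternation like this goes wrong as soon as a quote is unbalanced (for instance a multi-paragraph quotation), which is one reason a real implementation would probably need per-book checks.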
I was comparing the Alice ebook I had:
to the edition in this repo… I mean, sure, some standard mistakes could be addressed (like the one pointed out at the beginning of this issue). For instance, in Alice (Gutenberg edition), ellipse […]. Wouldn’t it be better to choose editions that are already well typeset?
+1
True. Some substitutions are generic, others are language-specific.
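One way to keep the language-specific rules apart from the generic ones is a per-language rule table. The sketch below is hypothetical: `LANGUAGE_RULES` and `apply_language_rules` are made-up names, and the French rule assumes the goal is a narrow no-break space before `?`, `:`, `;` and `!` (using the same space for all four marks is a simplification).

```python
import re

NNBSP = "\u202f"  # narrow no-break space

# Hypothetical rule table: language code -> list of (pattern, replacement).
LANGUAGE_RULES = {
    "fr": [
        # After a word character, replace any plain spaces before ? : ; !
        # with a single narrow no-break space.
        # Limitation: this would also hit URLs ("http:") and times ("12:30").
        (re.compile(r"(?<=\w) *([?:;!])"), NNBSP + r"\1"),
    ],
}


def apply_language_rules(text: str, lang: str) -> str:
    """Apply the rules registered for `lang`; unknown languages pass through."""
    for pattern, replacement in LANGUAGE_RULES.get(lang, []):
        text = pattern.sub(replacement, text)
    return text
```

Languages without an entry are returned unchanged, so generic substitutions can run first and this step can be layered on top.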
I’d bet
I agree it won’t be perfect. But even non-perfect typographic improvements would be welcome for conversational corpora such as Leipzig.
This is absolutely non-standard and should be fixed. I expect most books to require specific tuning/replacements.
I agree it would be better but I’m not sure we can find a significant book collection with an open licence.
Given that:
- æ, ’, “”, …, etc.
- ae instead of æ, ' instead of ’, ... instead of …, "" instead of “”, etc.

our corpus should be “fancified” before being transformed into a JSON dictionary, in order not to penalize keyboard layouts that have proper support for these special characters. That’s what the fancify.sh script (or the make fancy target) does. But this is still a work in progress; several substitutions are still missing, e.g.:
- “”, « », „“ depending on the language
- ?:;! in French
- ¿ sign in Spanish
- --
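As an illustration of the generic part, the two safest substitutions listed above could be sketched in Python as follows. This is not what fancify.sh actually does: `GENERIC_SUBSTITUTIONS` and `fancify_generic` are hypothetical names. `ae` is left out on purpose, because a blind replacement would also alter words such as "maestro", and `--` is left out because the issue does not say which dash it should become.

```python
# Hypothetical sketch of the generic (language-independent) substitutions.
GENERIC_SUBSTITUTIONS = [
    ("...", "…"),  # three ASCII dots -> ellipsis character
    ("'", "’"),    # straight apostrophe -> typographic apostrophe
]


def fancify_generic(text: str) -> str:
    """Apply each plain-to-typographic replacement in order."""
    for plain, fancy in GENERIC_SUBSTITUTIONS:
        text = text.replace(plain, fancy)
    return text
```

Note that the apostrophe rule treats every `'` as an apostrophe, so text that uses `'` as an opening single quote would still need book-specific tuning.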