Case-insensitive parsing and validation. #51
Comments
|
I agree with 2. |
Noting this here, as it might be helpful: https://docs.python.org/3/library/stdtypes.html#str.casefold |
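For reference, a minimal illustration of what `str.casefold` does compared to `str.lower`. It uses only the standard library; the helper name `same_ignoring_case` is made up for this sketch and is not part of the project.

```python
def same_ignoring_case(a: str, b: str) -> bool:
    """Caseless comparison via str.casefold (hypothetical helper)."""
    return a.casefold() == b.casefold()


# For plain ASCII input, lower() and casefold() give the same result.
print(same_ignoring_case("B3LYP", "b3lyp"))

# They differ for some non-ASCII characters: casefold() maps the German
# sharp s to "ss", whereas lower() leaves it unchanged.
print("Straße".lower() == "strasse")
print(same_ignoring_case("Straße", "STRASSE"))
```

For ASCII-only keywords the two are interchangeable; `casefold` only matters once non-ASCII input is allowed.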
Does it make sense to you that the case of quoted strings should be preserved? The way Getkw works, you can have quoted and unquoted strings. The latter can only look like valid C identifiers (the first character is an underscore or a letter, followed by more underscores and alphanumeric characters), whereas the former can be almost anything (Unicode too, I think). Most importantly, quoted strings can contain slashes, so this is the format one should use to specify file paths. I'd expect file paths to always be intended as case-sensitive, hence my question. |
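A rough sketch of the distinction described above, assuming unquoted strings follow the C-identifier pattern and quoted strings are delimited by matching single or double quotes. The regex and the function name `normalize_value` are illustrative assumptions, not the actual Getkw grammar.

```python
import re

# Assumption: an unquoted string looks like a C identifier.
UNQUOTED = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def normalize_value(token: str) -> str:
    """Lowercase unquoted tokens; return quoted strings verbatim, quotes stripped."""
    if len(token) >= 2 and token[0] == token[-1] and token[0] in ('"', "'"):
        return token[1:-1]  # quoted: keep the case, e.g. for file paths
    if UNQUOTED.fullmatch(token):
        return token.lower()  # unquoted: safe to normalize
    raise ValueError(f"not a valid unquoted string: {token!r}")


print(normalize_value("B3LYP"))
print(normalize_value('"/home/User/Mol.xyz"'))
```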
I agree that changing the case of paths is dangerous. Also, until now I did not realize that we might want to normalize everything - somehow I imagined that we would only normalize keys but not values. |
Yeah, it's thorny. Are all of these acceptable?
|
Every time I saw case insensitivity as a user (e.g. Fortran, old OS X file system (?)), I thought: "wow, what a terrible idea!" - but I guess we can offer the possibility. |
How will this work:
|
The current grammar will not be able to parse the former. |
How about we only offer case-insensitivity for keys, but preserve values? |
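A minimal sketch of the "keys only" option, assuming the parsed input is available as a nested `dict`. The function name `lower_keys` is hypothetical.

```python
from typing import Any


def lower_keys(tree: Any) -> Any:
    """Return a copy of a nested dict with all keys lowercased; values untouched."""
    if isinstance(tree, dict):
        return {str(key).lower(): lower_keys(value) for key, value in tree.items()}
    return tree  # leaf value: returned exactly as given


parsed = {"SCF": {"Functional": "B3LYP", "Guess_File": "/Tmp/Orbitals"}}
print(lower_keys(parsed))
```

Note that two keys differing only in case would silently collapse into one here; a real implementation would probably want to reject such input.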
How about this
Is it defined in the grammar what to accept? |
@stigrj All of these are valid for booleans as caseless literals already.
@bast I think I would still like |
Good point about the booleans, this is a frequent papercut. Also, floats in scientific notation should accept "D" and "E" as the exponent marker, in either case. |
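One way the float part could look, as a sketch only: Python's `float` already accepts `e`/`E`, so it is enough to map the Fortran-style `d`/`D` onto it before conversion. The name `parse_float` is an assumption for illustration.

```python
def parse_float(text: str) -> float:
    """Parse a float, also accepting 'D'/'d' as the exponent marker."""
    return float(text.strip().replace("D", "E").replace("d", "e"))


print(parse_float("1.0D-3"))
print(parse_float("2.5d2"))
print(parse_float("1E3"))
```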
Yes, b3lyp example is convincing. Also I would like to type cc-pvdz.
|
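To illustrate the b3lyp use case: a sketch of a caseless lookup that accepts any casing from the user and returns one canonical spelling. The list of functionals and the function name are made up for the example.

```python
# Hypothetical list of accepted functionals, in their canonical spelling.
FUNCTIONALS = ["B3LYP", "PBE0", "LDA"]
_BY_FOLDED = {name.casefold(): name for name in FUNCTIONALS}


def canonical_functional(user_value: str) -> str:
    """Map any casing of a known functional to its canonical spelling."""
    try:
        return _BY_FOLDED[user_value.casefold()]
    except KeyError:
        raise ValueError(f"unknown functional: {user_value!r}") from None


print(canonical_functional("b3lyp"))
print(canonical_functional("Pbe0"))
```

With this approach the user's input is never rewritten in place; only the comparison is caseless.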
Yes, I think the most realistic use case is to allow both
The keywords can always be set to follow a convention, like in MRChem:
It's not that easy with values |
About the "NO" interpreted as falsey - we had a nasty bug in a web thingy where we were parsing country codes for Nordic projects and NO (Norway) was interpreted as false :-) |
I think paths are the only case where we would never want to modify the case of the string. Are there any other examples? I feel adding a
@bast I was bitten by the same two days ago (in the randomized testing of lists of unquoted strings). Should we only allow true and false? I was trying to reproduce the original grammar: https://github.com/dev-cafe/libgetkw/blob/master/Python/getkw.py#L757 |
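A sketch of the stricter option floated here, accepting only `true` and `false` (in any casing) so that a value such as `NO` is never mistaken for a boolean. The function name `parse_bool` is hypothetical and this is not the grammar linked above.

```python
def parse_bool(token: str) -> bool:
    """Accept only 'true'/'false' in any casing; reject everything else."""
    folded = token.casefold()
    if folded == "true":
        return True
    if folded == "false":
        return False
    raise ValueError(f"not a boolean: {token!r}")


print(parse_bool("TRUE"))
print(parse_bool("False"))
```

A caller could fall back to treating the token as a plain string when `ValueError` is raised, which keeps country codes like `NO` intact.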
Bonus of having the |
How about adding actions to the keywords similar to predicates, so that you can get normalized values where you want them, like functionals and basis sets
A |
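The idea of per-keyword actions could be sketched roughly as follows. Everything here (the `ACTIONS` table, the keyword names, `apply_actions`) is a hypothetical illustration, not an existing API of the project.

```python
from typing import Callable, Dict

# Hypothetical per-keyword actions: keyword name -> function applied to the raw value.
ACTIONS: Dict[str, Callable[[str], str]] = {
    "functional": str.lower,
    "basis": str.lower,
    # no entry for "geometry_file": its value is kept verbatim
}


def apply_actions(raw: Dict[str, str]) -> Dict[str, str]:
    """Apply the registered action to each keyword's value, if there is one."""
    return {key: ACTIONS.get(key, lambda v: v)(value) for key, value in raw.items()}


print(apply_actions({"functional": "B3LYP", "geometry_file": "/Home/Mol.XYZ"}))
```

The trade-off is visible even in this small form: normalization is opt-in per keyword, so every keyword that wants it needs an entry.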
So much power... I like the idea, but I don't think it scales well (I need to do it for all keywords). I need to think about it more. |
After some more thinking I still like the idea of a As to "NO": I would not miss yes/no. I am still unsure whether I would miss on/off. |
Another thing that I would not like to case-normalize are checksums/hashes. |
Let's split the |
I like the suggestion of having two |
Name suggestions for the types? |
How about |
I have written some preliminary design docs. Let me know if you agree. See also my comment on #55 as to why, contrary to my initial enthusiasm, we do not need/want a path type. Case-sensitivity is a sensitive issue. It is desirable to have the following two examples be equally valid:
We can ensure case-insensitive comparisons by first normalizing the case of the input. The programmer would have the choice of normalizing to uppercase or lowercase. Hence, the above examples would both decay to either:
when normalizing to lowercase, or:
when normalizing to uppercase. Note how normalization happens on the left-hand (section and keyword labels) and the right-hand side (values). There are however cases where normalizing the case is not desirable. Two notable examples are filesystem paths and checksums. For these cases, the case from input needs to be respected. Once again, note that the case of default values in the We offer two string types:
|
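A sketch of how normalization with two string types might behave, under the assumptions of the design notes above: labels are always normalized, values only when their type asks for it, and the programmer picks upper or lower case. The type tags `NORMALIZED` and `VERBATIM` are placeholders; they are not the names proposed in this thread.

```python
from typing import Dict, Tuple

# Placeholder type tags, chosen only for this sketch.
NORMALIZED = "normalized"
VERBATIM = "verbatim"


def normalize(pairs: Dict[str, Tuple[str, str]], case: str = "lower") -> Dict[str, str]:
    """Normalize labels always, values only when tagged NORMALIZED.

    `pairs` maps a keyword label to a (type_tag, value) tuple.
    """
    if case not in ("lower", "upper"):
        raise ValueError("case must be 'lower' or 'upper'")
    fix = str.lower if case == "lower" else str.upper
    out = {}
    for label, (tag, value) in pairs.items():
        out[fix(label)] = fix(value) if tag == NORMALIZED else value
    return out


user_input = {
    "Functional": (NORMALIZED, "B3lyp"),
    "Checksum": (VERBATIM, "aB3f09"),
}
print(normalize(user_input, case="lower"))
print(normalize(user_input, case="upper"))
```

Paths and checksums would use the verbatim type, so their case survives regardless of the chosen normalization.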
Do we really need/want insensitive keywords and sections? Wouldn't it be more sensible to only normalize the How about three types:
|
Thanks for convincing arguments about |
Discussion with Stig and Roberto:
|
As pointed out by @stigrj it would be good to not force case-sensitivity upon the users. I see some options:
`api.lex` (and expose it in the CLI). Something like `--case upper` (`--case lower`).
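A sketch of what a `--case` option could look like on the command line, using `argparse` from the standard library. The helper names `build_parser` and `apply_case` are hypothetical, and how the option would be passed on to `api.lex` is left open, since its signature is not given here.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="hypothetical CLI sketch")
    # A default of None means: leave the input casing as it is.
    parser.add_argument(
        "--case",
        choices=["upper", "lower"],
        default=None,
        help="normalize the input to this case before parsing",
    )
    return parser


def apply_case(text: str, case: Optional[str] = None) -> str:
    """Normalize `text` to the requested case, or return it unchanged."""
    if case == "upper":
        return text.upper()
    if case == "lower":
        return text.lower()
    return text


args = build_parser().parse_args(["--case", "upper"])
print(apply_case("functional = b3lyp", args.case))
```

Making the option default to "no normalization" keeps case-sensitivity opt-out rather than forced, which matches the concern raised above.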