Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Glob: Make double asterisk match zero or more directories #192

Merged
merged 4 commits into from
Jul 28, 2021
Merged

Glob: Make double asterisk match zero or more directories #192

merged 4 commits into from
Jul 28, 2021

Conversation

eWert-Online
Copy link
Contributor

Fixes #185

I don't know, if this is a good implementation for the fix (I rather doubt it) but maybe it can be used as a starting point to further improve on.

As I already noted in the issue, the problem was, that the parser saw the /'s as explicitly set characters that needed to be present.
I am now checking, if the double asterisk has leading or trailing slashes and ignore them if there are any.
If the trailing slash is in the last place, I do not ignore it, so the test on line 94 does not fail.

If you have any suggestions for an implementation that is easier to understand / read, I would be happy to hear it 😄

@rgrinberg
Copy link
Member

What about the case where ** is part of a path element? E.g. foo/**bar. If I understand the spec correctly, this should be treated as *? I think we want to test this behavior.

lib/glob.ml Outdated
else if read '*'
then if double_asterisk && read '*'
then [ManyMany]
else [Many]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed this is confusing. How about we just keep one clause for double asterisk and detect the trailing ** case by checking that s.[!i - 2] = /

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is possible, because then the / is already added as an Exactly('/') piece.
The idea is, to not add the / as a piece, when a ** follows.

Copy link
Member

@rgrinberg rgrinberg Jul 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, but I still think what is problematic is that we are spreading the double asterisk clauses across so many branches. How about if we introduce the following primitive in addition to read:

(** [read_prefix prefix] will attempt to read [prefix] and will return [true] if it was successful.
   If it fails, it will not increment the read index. *)
val read_prefix : string -> bool

Then the code can be written as:

if double_asterisk && read_prefix "/**" then
   ..
else if read_prefix "**" then ..
 (* etc *)

Do you think this will help?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, this is much cleaner now!

I decided to name the function lookahead instead of read_prefix, as the word "prefix" for me suggests looking back from the current position and not ahead.
Would be fine with renaming it though, if you don't like "lookahead".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with the name lookahead is that it is reserved for peeking into available characters but not consuming it. Your lookahead function does the consuming. read_prefix isn't that great either, so feel free to come up with another name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of read_ahead?

@eWert-Online
Copy link
Contributor Author

eWert-Online commented Jul 15, 2021

What about the case where ** is part of a path element? E.g. foo/**bar. If I understand the spec correctly, this should be treated as *? I think we want to test this behavior.

I think this behaviour is already tested here:

assert (re_match (glob ~anchored "foo/**bar") "foo/far/oofbar");

...or do you mean another test like the following (which also passes currently)

assert (re_match    (glob ~anchored "foo/**bar") "foo/foobar"); 

@eWert-Online eWert-Online requested a review from rgrinberg July 28, 2021 10:42
@rgrinberg
Copy link
Member

Thanks!

@rgrinberg rgrinberg merged commit 42c84b0 into ocaml:master Jul 28, 2021
@nojb
Copy link
Contributor

nojb commented Aug 23, 2021

The project does not compile anymore after this PR (raise_no_trace should be raise_notrace).

rgrinberg added a commit to rgrinberg/opam-repository that referenced this pull request Aug 25, 2021
CHANGES:

* Add the `[:alpha:]` character class in `Re.Perl` (ocaml/ocaml-re#169)
* Double asterisk (`**`) in `Re.Glob` (ocaml/ocaml-re#172)
  Like `*` but also match `/` characters when `pathname` is set.
* Double asterisk should match 0 or more directories unless in trailing
  position. (ocaml/ocaml-re#192, fixes ocaml/ocaml-re#185)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Glob: Double asterisk is not matching zero directories
3 participants