-
Notifications
You must be signed in to change notification settings - Fork 574
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
i/prompting: implement path pattern matching #13866
i/prompting: implement path pattern matching #13866
Conversation
interfaces/prompting/patterns.go
Outdated
break | ||
} | ||
switch r { | ||
case '{': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm confused even more now. The spec does not mention {}
for path patterns, so why validate this? Also, there's inconsistency between ValidatePathPattern and MatchPathPatter, where the latter is oblivious to {}
, which means that a definition of a valid pattern is different for each of those calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MatchPathPattern
directly calls doublestar.Match
to do matching (discussed some here: https://docs.google.com/document/d/1tBnefdukP69EUJOlH8bgD2hrvZCYoE8-1ZlqRRYlOqc/edit?pli=1#heading=h.a0e0vfqho1ip). This should be called on expanded patterns, but this isn't technically necessary, since doublestar.Match
happily matches patterns with '{'
'}'
groups in them (see: https://pkg.go.dev/github.com/bmatcuk/doublestar#Match). Path pattern validation should occur on patterns before they are expanded, so they may contain '{'
'}'
-defined groups.
interfaces/prompting/patterns.go
Outdated
} | ||
case '\\': | ||
// Skip next rune | ||
_, _, err = reader.ReadRune() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't the escape sequence valid only for some of the characters that follow? eg what would a pattern like \c
mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to doublestar.Match
(https://pkg.go.dev/github.com/bmatcuk/doublestar#Match), \c
matches c
for any character c
. I'm not adjusting that behavior at all, at the moment.
matched, err := doublestar.Match(pattern, path) | ||
if err != nil { | ||
return false, err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't the pattern be validated prior to being used? Perhaps ValidatePathPatter should call doublestar.Match with say "/" as path, discard the result and only check the error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doublestar.ValidatePattern
is called as part of ValidatePathPattern
, so the pattern should be valid. This error should thus never occur when the PathPatternMatch
is called in practice, but it's a check in case PathPatternMatch
is called in isolation without being validated. So I'm inclined to leave the error return in place rather than discarding the error, for completeness.
1cab837
to
6e34c06
Compare
d414d4d
to
eff5c6d
Compare
Note: the path pattern validation in this PR may be no longer relevant due to changes in #13867, which was intended to be based on this PR but has now grown to replace manual path pattern validation with path pattern parsing, which validates and expands path patterns. |
Path pattern matching is implemented via the doublestar package, which emulates bash's globstar matching. Patterns may include '*' wildcard characters (which match any number of non-separator characters), '**' doublestars (which match zero or more subdirectories), '?' wildcard characters (which match exactly one non-separator character), and nested groups delimited by '{' and '}'. Notably, path patterns are *not* allowed to have character classes delimited by '[' and ']', nor inverted classes of the form "[^abc]". There is a limit on the number of groups allowed in path patterns, but up to that limit, groups may be arbitrarily nested or sequential. Signed-off-by: Oliver Calder <[email protected]> i/prompting: fix typo and add notes to remove test boilerplate Signed-off-by: Oliver Calder <[email protected]> i/prompting: use separate test suite for patterns Signed-off-by: Oliver Calder <[email protected]> i/prompting: improve unit test coverage Signed-off-by: Oliver Calder <[email protected]>
Rather than counting the number of groups and using it as a heuristic for the number of patterns into which a given path pattern will expand, instead compute the true number of expanded patterns and compare it against a set limit. Signed-off-by: Oliver Calder <[email protected]>
Signed-off-by: Oliver Calder <[email protected]>
…erns Signed-off-by: Oliver Calder <[email protected]>
Signed-off-by: Oliver Calder <[email protected]>
deff68c
to
e841e60
Compare
@olivercalder what's the status of this PR? Can it be reviewed now? |
@MiguelPires this PR is the basis for two PRs with possible implementations of path pattern expansion: #13867 and #14059. Both strip out the validation function from this PR, so it's possible/likely that this PR won't be merged on its own. Alternatively, I could strip out the validation function and no longer rebase the other PRs on this one when it changes. I think that might be the best approach. So it's ready for review, just ignore the validation part :) |
Please do, it'll make it easier to review the actual thing that you want to land. |
Signed-off-by: Oliver Calder <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
// `/**/`: `/foo/**`, and `/foo/**/` both match `/foo` and `/foo/`. We want to | ||
// override this behavior to make `/foo/**/` not match `/foo`. We also want to | ||
// override doublestar to make `/foo` match `/foo/`. | ||
func PathPatternMatch(pattern string, path string) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a blocker, but just a random idea as there's no control over what is passed as a pattern, but if the pattern expansion resulted in an opaque type eg struct with unexported fields, then you could guarantee that what is passed into this function as the pattern parameter is not a random string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this point, but it can be addressed together with the parsing/validation code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may relate to precedence as well, if we want to do something a bit more clever than comparing patterns rune-by-rune (e.g. strip common prefixes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks
// `/**/`: `/foo/**`, and `/foo/**/` both match `/foo` and `/foo/`. We want to | ||
// override this behavior to make `/foo/**/` not match `/foo`. We also want to | ||
// override doublestar to make `/foo` match `/foo/`. | ||
func PathPatternMatch(pattern string, path string) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this point, but it can be addressed together with the parsing/validation code
Path pattern matching is implemented via the doublestar package, which emulates bash's globstar matching. Patterns may include '*' wildcard characters (which match any number of non-separator characters), '**' doublestars (which match zero or more subdirectories), '?' wildcard characters (which match exactly one non-separator character), and nested groups delimited by '{' and '}'. Notably, path patterns are not allowed to have character classes delimited by '[' and ']', nor inverted classes of the form "[^abc]".
There is a limit on the number of groups allowed in path patterns, but up to that limit, groups may be arbitrarily nested or sequential.
This PR is split from #13730. This work is tracked internally by SNAPDENG-18541.