Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i/prompting: render path patterns variants using recursive descent parser #14059

Merged

Conversation

olivercalder
Copy link
Member

@olivercalder olivercalder commented Jun 7, 2024

Parse path patterns using a recursive descent parser such that all possible choices of options within groups can be rendered into distinct variants of the pattern, where each variant has no groups remaining. This parsing occurs when patterns are unmarshalled from JSON, so the parsing only needs to occur once.

This is done using a dedicated patterns package, which internally includes a tokenizer, parser, and renderer.

This is an alternative to #13867, and also relates to #13866 and #13868, as well as SNAPDENG-20094.

@olivercalder olivercalder added the Skip spread Indicate that spread job should not run label Jun 7, 2024
@olivercalder olivercalder requested a review from zyga June 7, 2024 23:57
@bboozzoo bboozzoo self-requested a review June 11, 2024 13:22
Copy link
Contributor

@bboozzoo bboozzoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks much nicer this way

interfaces/prompting/patterns/scan.go Outdated Show resolved Hide resolved
@olivercalder olivercalder force-pushed the prompting-path-pattern-parser branch from 0c8702c to 2939985 Compare June 13, 2024 03:44
@olivercalder olivercalder added Needs Samuele review Needs a review from Samuele before it can land and removed Skip spread Indicate that spread job should not run labels Jun 13, 2024
@olivercalder olivercalder reopened this Jun 13, 2024
@olivercalder olivercalder marked this pull request as ready for review June 13, 2024 03:51
Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please join some of the files in patterns, one struct per file is a bit too much

Copy link
Member Author

@olivercalder olivercalder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions for @zyga regarding the original code :)

interfaces/prompting/patterns/render.go Outdated Show resolved Hide resolved
interfaces/prompting/patterns/render.go Outdated Show resolved Hide resolved
interfaces/prompting/patterns/render.go Outdated Show resolved Hide resolved
interfaces/prompting/patterns/render.go Outdated Show resolved Hide resolved
@bboozzoo
Copy link
Contributor

needs a rebase now

Copy link
Contributor

@zyga zyga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick scan and this looks good to me. Since the PR is huge I will do a full pass later not to cannibalise other reviews.

@pedronis pedronis self-requested a review June 18, 2024 08:42
olivercalder and others added 11 commits June 18, 2024 09:35
Path patterns may include arbitrary nested groups, delimited by '{' and
'}', up to a limit on the total number of groups. In order to compare
the precedence of path patterns which match a given path, these path
patterns must be expanded until no groups remain, and thus the
particular group-free patterns which was resolved from the original
patterns when matching the path can be compared.

Signed-off-by: Oliver Calder <[email protected]>
Rather than separately validate and expand path patterns, storing the
result as a list of expanded patterns, parse a pattern into a
PathPattern type, which can dynamically render expanded path patterns as
needed with minimal overhead.

When path patterns are received from prompting clients, path patterns
can be unmarshalled and automatically validated, and any future use of
the pattern in-memory can use the pre-parsed PathPattern to iterate
through expanded path patterns without needing to explicitly expand and
store all path patterns.

Additionally, the new PathPattern type should be used in Constraints in
place of the old path pattern string.

Signed-off-by: Oliver Calder <[email protected]>
Rather than keep separate stacks for the sequences and paths which the
parser is currently inside, instead keep a single stack, to which the
existing sequence and a new group is added whenever a '{' rune is
encountered.

Then there is no need to no need for a variable to hold the current
group, peeking the stack yields the most recent group, to which the
current sequence can be added whenever a ',' or '}' is encountered.

When a '}' is encountered, the most recent group is popped off the
stack, the current sequence is added to it (completing the group), and
then the previous sequence is popped off the stack and the completed
group is added to it. From there, that previous sequence is now
considered the current sequence until another '{' is encountered.

Signed-off-by: Oliver Calder <[email protected]>
…erns

Co-authored-by: Oliver Calder <[email protected]>
Signed-off-by: Oliver Calder <[email protected]>
@olivercalder olivercalder force-pushed the prompting-path-pattern-parser branch from 225d8f9 to 048b233 Compare June 21, 2024 02:03
@olivercalder olivercalder requested a review from zyga June 21, 2024 05:07
Add a reference to the `renderNode` used to generate a given
`variantState` to that state itself. This allows methods on
`variantState` to be called without needing to pass as a parameter the
same `renderNode` which was used to generate the `variantState`.

Also, move the `Render` function to be a method on `variantState`
instead of `renderNode`. This makes sense semantically, since we render
particular variants, rather than nodes themselves, and makes sense
ergonomically since we now have a reference to the `renderNode` within
each `variantState`, so there is no need to pass parameters around for
nodes and variants which are required to be associated anyway.

Signed-off-by: Oliver Calder <[email protected]>
@olivercalder
Copy link
Member Author

@peronis @zyga the static checker is unhappy that I've used the patterns package for various test files which exercise internal types and methods. Is there a standard way we get around this?

@pedronis
Copy link
Collaborator

@peronis @zyga the static checker is unhappy that I've used the patterns package for various test files which exercise internal types and methods. Is there a standard way we get around this?

use the naming pattern XXX_internal_test.go for them, afaics from the linter docs

Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for the changes, some more comments

interfaces/prompting/patterns/parse.go Outdated Show resolved Hide resolved
interfaces/prompting/patterns/parse.go Outdated Show resolved Hide resolved
interfaces/prompting/patterns/render.go Outdated Show resolved Hide resolved
interfaces/prompting/patterns/render.go Outdated Show resolved Hide resolved
Add comment as such to the `literal` type definition, and have
`literal.NextVariant` return length 0 to make it consistent with other
`variantState` types.

Signed-off-by: Oliver Calder <[email protected]>
@olivercalder olivercalder changed the title i/prompting: expand path patterns using recursive descent parser i/prompting: render path patterns variants using recursive descent parser Jun 25, 2024
Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

interfaces/prompting/patterns/parse.go Show resolved Hide resolved
interfaces/prompting/patterns/patterns.go Outdated Show resolved Hide resolved
Comment on lines 59 to 61
if pattern[0] != '/' {
return fmt.Errorf("%s: pattern must start with '/'", prefix)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be caught by the parser at level 0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to interrogate inside text tokens once they've been tokenized, and there is precedent for scanner throwing errors about invalid characters, so I think I'd rather catch this by the scanner. But I agree, not in the calling function in patterns.go.

Comment on lines 56 to 58
if pattern == "" {
return fmt.Errorf("%s: pattern has length 0", prefix)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parser could complain about no tokens

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be a scanner check as well, as part of checking that the first character is '/'

Comment on lines 172 to 174
if strings.HasSuffix(pattern, "/") && !strings.HasSuffix(path, "/") {
return false, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you're discarding match result, then could this be done earlier?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first match result happens first to check that everything is valid, and throw an error if not. Then (assuming there was not an error), the match result is used to see if the pattern as-is matches the path. If there's no match, we then check if the pattern would match if it ended in a '/'. So the match result isn't really discarded, just some reorganization to handle the error first.

But the pattern should be known to be valid, since we rendered it, so perhaps this check could happen before the first match. And this would save the computation time of doublestar.Match if we know it doesn't match (by our standards).

totalLength := 0

for _, element := range v.elements {
totalLength += element.Length()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this may be getting called mulitple times, any way we could store totalLength in seqVariant to avoid redoing all the the work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think caching length of the current variant for a seqVariant helps, as literal and altVariant are O(1) (O(depth) total in variants). More importantly we nearly always if not always need to know the length immediately after calling InitialVariant, so we might as well return it as well, meaning we almost never need to call Length() directly -- only when we've changed a variant partway through a seq and need to compute the total length again -- but this is cached now, so much faster.

Copy link
Contributor

@bboozzoo bboozzoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

// parse validates the given pattern and parses it into a PathPattern from
// which expanded path patterns can be iterated, overwriting the receiver.
func (p *PathPattern) parse(pattern string) error {
prefix := fmt.Sprintf("cannot parse path pattern %q", pattern)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
prefix := fmt.Sprintf("cannot parse path pattern %q", pattern)

func ParsePathPattern(pattern string) (*PathPattern, error) {
pathPattern := &PathPattern{}
if err := pathPattern.parse(pattern); err != nil {
return nil, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return nil, err
return nil, fmt.Errorf("cannot parse path pattern %q: %w", pattern, err)

end even then I would question whether you need this prefix at all, since the caller knows the pattern already

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want the parse method to handle its error messages rather than ParsePathPattern, since parse is called by UnmarshalJSON and I think the error should be properly prefixed there as well.

Copy link
Contributor

@zyga zyga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, only one minor comment inline.

interfaces/prompting/patterns/render.go Outdated Show resolved Hide resolved
interfaces/prompting/patterns/render.go Outdated Show resolved Hide resolved
@olivercalder olivercalder added Squash-merge Please squash this PR when merging. and removed Needs Samuele review Needs a review from Samuele before it can land labels Jun 28, 2024
@olivercalder olivercalder merged commit 8fc33dd into canonical:master Jun 28, 2024
48 of 51 checks passed
@olivercalder olivercalder added this to the 2.65 milestone Aug 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Squash-merge Please squash this PR when merging.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants