support EBNF #50

VladimirAlexiev · 2024-03-22T13:24:37Z

I want to analyze this grammar for complexity, it's in EBNF: https://github.com/VladimirAlexiev/shacl/blob/shaclc-grammars/shacl-compact-syntax/grammar/shaclc-XText.ebnf.
It's also available in Eclipse XText: https://github.com/VladimirAlexiev/shacl/blob/shaclc-grammars/shacl-compact-syntax/grammar/shaclc-XText.xtext.

Is it possible for Grammophone to support one of these formats?

mdaines · 2024-03-23T13:56:24Z

I'd like Grammophone to do this. I think the obvious solution is to convert EBNF to BNF just after parsing by adding rules to the grammar. However, when considering this before, I wasn't sure how to present calculations with these additional rules.

For example, this grammar:

S -> x y* .

...could be converted internally to something like this:

S -> x S_ .
S_ -> y S_ | .

This would add S_ to the nonterminals table, parsing tables, etc. Maybe that's OK, but I was concerned about the results of the conversion being difficult to follow, especially for a more complicated grammar.

Perhaps the added nonterminals could be displayed differently, or the conversion to BNF could be shown explicitly somehow.

modulovalue · 2024-03-30T11:17:01Z

There's one huge issue that prevents a traditional EBNF grammar from being straightforwardly-useful with grammophone. The conversion between CFG and EBNF is not "lossless". There's some amount of ambiguity there as there are alternative interpretation strategies for the constructs that EBNF comes with.

For example, repetition operators can be interpreted to either mean a left recursive or a right recursive version.

Left recursive: on grammophone

Start -> S.
S -> a.
S -> S a.

Right recursive: on grammophone

Start -> S.
S -> a.
S -> a S.

Both describe the same language. Which one should be used?

There are more implementations possible, see this comment: #26 (comment)

There's no clear winner there so it's not clear which implementation should be used as each has the ability to offer different benefits wrt to the conflict and lookahead profile of a CFG.

Here are two separate solutions to the actual issue (which is to support a more concise grammar formalism)

the "Parametrization" approach

Start -> StarRightRec(a).
StarRightRec(X) -> X.
StarRightRec(X) -> X S.

Which would make it possible to define ones own constructs.

the "Native Rules" approach:

Start -> *rr(a).

where *rr/*lr is desugared into a fixed right recursive/left recursive form behind the scenes. The "semantics" of the star operator would be fixed and users of grammophone would precisely know what *rr or *lr is converted to.

I think the second approach is a much better fit for grammophone. The first one is very convenient, but much harder to implement.

Extra: 3. The "cutting-edge-infinite-resources" approach.

Grammophone could use e-graphs (https://egraphs-good.github.io) for finding the best version that leads to the fewest conflicts. This has not been done before and it is much harder, but I hope one day we'll have something like that.

modulovalue · 2024-03-30T11:49:03Z

However, when considering this before, I wasn't sure how to present calculations with these additional rules.

This would add S_ to the nonterminals table, parsing tables, etc. Maybe that's OK, but I was concerned about the results of the conversion being difficult to follow, especially for a more complicated grammar.

I think this could be achieved by taking a dataflow approach (either via DeRemer & Penello or via Pingali #27) and introducing unique nodes for the extended constructs. (Edit: Pingali discusses how a dataflow approach over a GFG can be used to find lookaheads.)

Regular Expression-to-NFA compilers can find firstsets/followsets just from the regular expression itself without desugaring those constructs into a CFG. I think this should also be doable on the CFG level.

mdaines · 2024-04-01T14:40:16Z

For example, repetition operators can be interpreted to either mean a left recursive or a right recursive version.

Of course, somehow I forgot about that...

I think the ambiguity has a benefit, though. Grammophone could choose an arbitrary interpretation but also allow the user to select one interactively (ie, not using syntax). This would allow for demonstrating the differences between the two.

So, given the example grammar:

S -> x y* .

Grammophone could indicate that y* is interpreted like this:

y* => S_ -> y S_ | .

(Let's say => means "the EBNF construct is interpreted as..." or something.)

But also allow the user to select this:

y* => S_ -> S_ y | .

I'm not sure where this would be presented. Maybe above the "Sanity Checks" section.

VladimirAlexiev mentioned this issue Mar 22, 2024

SHACLC: grok shaclc-XText and its advanced features w3c/shacl#52

Closed

VladimirAlexiev mentioned this issue Aug 2, 2024

SHACL-C: analyze EBNF grammar complexity w3c/shacl#67

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

support EBNF #50

support EBNF #50

VladimirAlexiev commented Mar 22, 2024

mdaines commented Mar 23, 2024

modulovalue commented Mar 30, 2024

modulovalue commented Mar 30, 2024 •

edited

Loading

mdaines commented Apr 1, 2024

support EBNF #50

support EBNF #50

Comments

VladimirAlexiev commented Mar 22, 2024

mdaines commented Mar 23, 2024

modulovalue commented Mar 30, 2024

modulovalue commented Mar 30, 2024 • edited Loading

mdaines commented Apr 1, 2024

modulovalue commented Mar 30, 2024 •

edited

Loading