Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support EBNF #50

Open
VladimirAlexiev opened this issue Mar 22, 2024 · 4 comments
Open

support EBNF #50

VladimirAlexiev opened this issue Mar 22, 2024 · 4 comments

Comments

@VladimirAlexiev
Copy link

I want to analyze this grammar for complexity, it's in EBNF: https://github.com/VladimirAlexiev/shacl/blob/shaclc-grammars/shacl-compact-syntax/grammar/shaclc-XText.ebnf.
It's also available in Eclipse XText: https://github.com/VladimirAlexiev/shacl/blob/shaclc-grammars/shacl-compact-syntax/grammar/shaclc-XText.xtext.

Is it possible for Grammophone to support one of these formats?

@mdaines
Copy link
Owner

mdaines commented Mar 23, 2024

I'd like Grammophone to do this. I think the obvious solution is to convert EBNF to BNF just after parsing by adding rules to the grammar. However, when considering this before, I wasn't sure how to present calculations with these additional rules.

For example, this grammar:

S -> x y* .

...could be converted internally to something like this:

S -> x S_ .
S_ -> y S_ | .

This would add S_ to the nonterminals table, parsing tables, etc. Maybe that's OK, but I was concerned about the results of the conversion being difficult to follow, especially for a more complicated grammar.

Perhaps the added nonterminals could be displayed differently, or the conversion to BNF could be shown explicitly somehow.

@modulovalue
Copy link

There's one huge issue that prevents a traditional EBNF grammar from being straightforwardly-useful with grammophone. The conversion between CFG and EBNF is not "lossless". There's some amount of ambiguity there as there are alternative interpretation strategies for the constructs that EBNF comes with.

For example, repetition operators can be interpreted to either mean a left recursive or a right recursive version.

Left recursive: on grammophone

Start -> S.
S -> a.
S -> S a.

Right recursive: on grammophone

Start -> S.
S -> a.
S -> a S.

Both describe the same language. Which one should be used?

There are more implementations possible, see this comment: #26 (comment)

There's no clear winner there so it's not clear which implementation should be used as each has the ability to offer different benefits wrt to the conflict and lookahead profile of a CFG.


Here are two separate solutions to the actual issue (which is to support a more concise grammar formalism)

  1. the "Parametrization" approach
Start -> StarRightRec(a).
StarRightRec(X) -> X.
StarRightRec(X) -> X S.

Which would make it possible to define ones own constructs.

  1. the "Native Rules" approach:
Start -> *rr(a).

where *rr/*lr is desugared into a fixed right recursive/left recursive form behind the scenes. The "semantics" of the star operator would be fixed and users of grammophone would precisely know what *rr or *lr is converted to.

I think the second approach is a much better fit for grammophone. The first one is very convenient, but much harder to implement.

Extra: 3. The "cutting-edge-infinite-resources" approach.

Grammophone could use e-graphs (https://egraphs-good.github.io) for finding the best version that leads to the fewest conflicts. This has not been done before and it is much harder, but I hope one day we'll have something like that.

@modulovalue
Copy link

modulovalue commented Mar 30, 2024

However, when considering this before, I wasn't sure how to present calculations with these additional rules.

This would add S_ to the nonterminals table, parsing tables, etc. Maybe that's OK, but I was concerned about the results of the conversion being difficult to follow, especially for a more complicated grammar.

I think this could be achieved by taking a dataflow approach (either via DeRemer & Penello or via Pingali #27) and introducing unique nodes for the extended constructs. (Edit: Pingali discusses how a dataflow approach over a GFG can be used to find lookaheads.)

Regular Expression-to-NFA compilers can find firstsets/followsets just from the regular expression itself without desugaring those constructs into a CFG. I think this should also be doable on the CFG level.

@mdaines
Copy link
Owner

mdaines commented Apr 1, 2024

For example, repetition operators can be interpreted to either mean a left recursive or a right recursive version.

Of course, somehow I forgot about that...

I think the ambiguity has a benefit, though. Grammophone could choose an arbitrary interpretation but also allow the user to select one interactively (ie, not using syntax). This would allow for demonstrating the differences between the two.

So, given the example grammar:

S -> x y* .

Grammophone could indicate that y* is interpreted like this:

y* => S_ -> y S_ | .

(Let's say => means "the EBNF construct is interpreted as..." or something.)

But also allow the user to select this:

y* => S_ -> S_ y | .

I'm not sure where this would be presented. Maybe above the "Sanity Checks" section.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants