"unrecognized input" after upgrade #208

Open
danabr opened this issue May 3, 2024 · 1 comment

Description

I recently upgraded from FsLexYacc 10.0 to 11.3.0, the latest release. After the upgrade, parsing the comment line // ä fails with "unrecognized input". Nothing else changed: the lexer and parser options and the lexer and parser definitions are the same as before the upgrade.

Repro steps

I have managed to create a small-ish reproducer:

Parser.fsy:

%token EOF
%token <string*FSharp.Text.Lexing.Position> IDENTIFIER

%start top
%type <string> top

%%

top: EOF { "hello" }

Lexer.fsl:

{
module Lexer

open FSharp.Text.Lexing
open Parser

let lexeme lexbuf = LexBuffer<char>.LexemeString lexbuf

}

let alpha = ['a' - 'z' 'A' - 'Z']
let swe = ['ä' 'Ä' 'ö' 'Ö' 'å' 'Å' ]
let letter = alpha | swe
let ident = letter+
let newline = ('\n' | "\r\n" )

rule token = parse
| "//"           { commentline lexbuf.StartPos lexbuf }
| ident          { IDENTIFIER(lexeme lexbuf, lexbuf.StartPos) }
| newline        { token lexbuf }
| eof            { EOF }
| _              { failwith "unknown token" }

and commentline p = parse
| newline        { token lexbuf }
| eof            { EOF }
| _              { commentline p lexbuf }

Program.fs:

open Parser
open Lexer

let input = "// ä"
let lexbuf = FSharp.Text.Lexing.LexBuffer<_>.FromString input
let result = Parser.top Lexer.token lexbuf

printfn "%s" result

FsLexYaccRepro.fsproj:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="FsLexYacc.Runtime" Version="11.3.0" />
    <PackageReference Include="FsLexYacc" Version="11.3.0" />
  </ItemGroup>

  <ItemGroup>
    <FsLex Include="Lexer.fsl">
      <OtherFlags>--unicode</OtherFlags>
    </FsLex>
    <FsYacc Include="Parser.fsy">
      <OtherFlags>--module Parser</OtherFlags>
    </FsYacc>
    <Compile Include="Parser.fs" />
    <Compile Include="Lexer.fs" />
    <Compile Include="Program.fs" />
  </ItemGroup>
</Project>

Expected behavior

When the program above is run with dotnet run, the output should be "hello".

Actual behavior

The program throws an exception with the following stack trace:

Unhandled exception. System.Exception: unrecognized input
   at FSharp.Text.Lexing.LexBuffer`1.EndOfScan() in /home/runner/work/FsLexYacc/FsLexYacc/src/FsLexYacc.Runtime/Lexing.fs:line 128
   at FSharp.Text.Lexing.UnicodeTables.scanUntilSentinel(LexBuffer`1 lexBuffer, Int32 state) in /home/runner/work/FsLexYacc/FsLexYacc/src/FsLexYacc.Runtime/Lexing.fs:line 448
   at Lexer.commentline(Position p, LexBuffer`1 lexbuf) in C:\cygwin64\home\daab\dev\FsLexYaccRepro\Lexer.fs:line 81
   at Lexer.token(LexBuffer`1 lexbuf) in C:\cygwin64\home\daab\dev\FsLexYaccRepro\Lexer.fs:line 18
   at [email protected](LexBuffer`1 lexbuf)
   at FSharp.Text.Parsing.Implementation.interpret[tok,a](Tables`1 tables, FSharpFunc`2 lexer, LexBuffer`1 lexbuf, Int32 initialState) in /home/runner/work/FsLexYacc/FsLexYacc/src/FsLexYacc.Runtime/Parsing.fs:line 346
   at FSharp.Text.Parsing.Tables`1.Interpret[char](FSharpFunc`2 lexer, LexBuffer`1 lexbuf, Int32 startState) in /home/runner/work/FsLexYacc/FsLexYacc/src/FsLexYacc.Runtime/Parsing.fs:line 498
   at Parser.engine[a](FSharpFunc`2 lexer, LexBuffer`1 lexbuf, Int32 startState) in C:\cygwin64\home\daab\dev\FsLexYaccRepro\Parser.fs:line 111
   at Parser.top[a](FSharpFunc`2 lexer, LexBuffer`1 lexbuf) in C:\cygwin64\home\daab\dev\FsLexYaccRepro\Parser.fs:line 113
   at <StartupCode$FsLexYaccRepro>.$Program.main@() in C:\cygwin64\home\daab\dev\FsLexYaccRepro\Program.fs:line 6

Note that parsing the input "// a" works fine. Also, parsing works if I remove ä from swe in Lexer.fsl.
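
To compare the two inputs side by side, Program.fs in the reproducer could be swapped for something like the sketch below. It assumes the generated Parser and Lexer modules from the project above; tryParse is a hypothetical helper introduced only for this comparison and is not part of the original reproducer.

```fsharp
open FSharp.Text.Lexing

// Hypothetical helper (not in the original reproducer): runs the generated
// parser on one input and returns either the result or the exception message.
let tryParse (input: string) =
    try
        let lexbuf = LexBuffer<char>.FromString input
        Ok (Parser.top Lexer.token lexbuf)
    with ex ->
        Error ex.Message

// Run the ASCII-only comment and the comment containing 'ä' back to back.
for input in [ "// a"; "// ä" ] do
    match tryParse input with
    | Ok result -> printfn "%A -> %s" input result
    | Error message -> printfn "%A -> failed: %s" input message
```

If the behaviour matches what is described here, on 11.3.0 the first input should print hello and the second should report the "unrecognized input" failure.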

danabr commented May 3, 2024

Bisection indicates that the regression was introduced with 48ec571 (break out core domain logic and generation into core libraries (#144), 2021-01-27).
