Delimited parsers #129

Open
Lysxia opened this issue Jun 16, 2017 · 2 comments

Comments

@Lysxia
Contributor

Lysxia commented Jun 16, 2017

A common situation is parsing an encoding that is prefixed by its length. You first parse the length as an integer n, and then you would like to run a (sub)parser p :: Parser a on only the next n bytes. I can think of two solutions available to users today:

  • Use take to get the ByteString and apply parseOnly p. However, we lose source position information in case the subparser fails, and we have to keep the whole ByteString in memory.

  • Wrap Parser (e.g., with a few monad transformers) to track things like the number of bytes read; that would allow combinators like the ones I have in mind. However, this is rather heavyweight to implement. Does an existing library already offer this? I also suspect this approach would have more overhead than necessary.
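For concreteness, the first option above might look roughly like the following sketch. The names isolateBytes and lengthPrefixed are hypothetical (they are not part of attoparsec), and the "decimal length, colon, payload" wire format is only an assumption made for illustration. The calls take, parseOnly, decimal and char are existing attoparsec functions.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Attoparsec.ByteString.Char8 as A
import Data.Attoparsec.ByteString.Char8 (Parser)

-- | Hypothetical helper (not an attoparsec API): run @p@ on the next @n@ bytes only.
-- The slice is materialised as a strict ByteString before @p@ runs, and any
-- error from the inner parse is relative to the slice, not the outer input.
isolateBytes :: Int -> Parser a -> Parser a
isolateBytes n p = do
  chunk <- A.take n
  either fail pure (A.parseOnly p chunk)

-- | Assumed illustrative format: a decimal length, a ':', then the payload.
lengthPrefixed :: Parser a -> Parser a
lengthPrefixed p = do
  n <- A.decimal
  _ <- A.char ':'
  isolateBytes n p
```

With this sketch, A.parseOnly (lengthPrefixed (A.many1 A.digit)) "3:123abc" yields Right "123". Both drawbacks mentioned in the first item are visible here: the payload is held in memory as one ByteString, and a failure inside p is re-raised through fail with only the inner parser's message.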

It would be nice for attoparsec to have combinators to delimit the input that a subparser gets to see, like span and splitAt in pipes-parse.

What do you think of such an addition? Is there a better solution?

@Lysxia
Contributor Author

Lysxia commented Jun 16, 2017

#48 and #95 were in this situation before, with solutions that correspond to the first item above.

@joeyh

joeyh commented Jan 10, 2019

I keep needing to do this kind of thing in my attoparsec parsers, and only on discovering this bug am I shaking the feeling that I'm somehow using attoparsec wrong to need to use parseOnly within a parser so frequently.

With me, it often comes up while writing something like Parser a -> Parser b, which needs to pick out the delimited data and run the sub-parser over it.

Checking endOfInput seems like one thing that can easily be gotten wrong when doing this. The example in #95 perhaps forgot to do that. I wonder if a combinator for this should require the sub-parser to consume all the input?
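To illustrate the endOfInput pitfall, here is a minimal sketch contrasting two variants of such a combinator. The names delimitedLenient and delimitedStrict are hypothetical and only for illustration; neither exists in attoparsec. The sketch relies on the fact that parseOnly does not itself require the parser to consume all of its input.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Attoparsec.ByteString.Char8 as A
import Data.Attoparsec.ByteString.Char8 (Parser)

-- | Hypothetical: run @p@ on the next @n@ bytes; bytes of the slice that
-- @p@ leaves unconsumed are silently dropped.
delimitedLenient :: Int -> Parser a -> Parser a
delimitedLenient n p =
  A.take n >>= either fail pure . A.parseOnly p

-- | Hypothetical: same, but fail unless @p@ consumes the whole slice.
delimitedStrict :: Int -> Parser a -> Parser a
delimitedStrict n p =
  A.take n >>= either fail pure . A.parseOnly (p <* A.endOfInput)
```

On the input "12ab", delimitedLenient 4 (A.decimal :: Parser Int) succeeds with 12 and discards "ab", while delimitedStrict 4 (A.decimal :: Parser Int) fails. Whether the strict behaviour should be the default is exactly the open design question here.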
