Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document the time and memory complexity of the library #34

Open
neilmayhew opened this issue Jul 23, 2019 · 3 comments
Open

Document the time and memory complexity of the library #34

neilmayhew opened this issue Jul 23, 2019 · 3 comments

Comments

@neilmayhew
Copy link

It's not clear whether this library is subject to the same exponential blowups as other RE implementations. Adding a note in the README would be very helpful.

This seems particularly relevant after the recent CloudFlare outage which was partly caused by a bad RE that led to crippling CPU usage.

@UnkindPartition
Copy link
Owner

There shouldn't be any exponential blowup scenarios here.

The precise complexity depends on how you use the library. If it's just matching, then I think it would be O(n) time and O(1) memory for any given regex.

For recognition, I'd predict O(n) memory because of the bookkeeping of which branch was taken at each step. Hopefully the runtime stays O(n), but I'm not sure.

If you're up for confirming these empirically and then preparing a README PR or filing issues (depending on the outcome), I'd gladly accept them.

@neilmayhew
Copy link
Author

I was hoping for more than empirical evidence, since you can never be sure there are no expressions that cause problems. I would be much happier with a logical argument based on the nature of the internal algorithm.

AFAICT from the haddocks, there's no way to do backtracking, so that eliminates an entire class of problems. However, is it possible to (inadvertently) create a pathological matcher/recognizer by including the equivalent of many empty in it?

@UnkindPartition
Copy link
Owner

I was hoping for more than empirical evidence, since you can never be sure there are no expressions that cause problems.

So empirical evidence wouldn't convince you yet a note in the README would? :)

Indeed, you can never be sure, period.

The code is sufficiently complex that a manual analysis/proof would require a lot of work and ingenuity. Besides, its runtime behaviour is in part determined by the way the compiler compiles higher-order functions.

However, is it possible to (inadvertently) create a pathological matcher/recognizer by including the equivalent of many empty in it?

It shouldn't be, but the easiest way to check is to try. (Or try to prove if you're so inclined.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants