Document the time and memory complexity of the library #34

neilmayhew · 2019-07-23T19:16:36Z

It's not clear whether this library is subject to the same exponential blowups as other RE implementations. Adding a note in the README would be very helpful.

This seems particularly relevant after the recent CloudFlare outage which was partly caused by a bad RE that led to crippling CPU usage.

UnkindPartition · 2019-07-23T19:31:04Z

There shouldn't be any exponential blowup scenarios here.

The precise complexity depends on how you use the library. If it's just matching, then I think it would be O(n) time and O(1) memory for any given regex.

For recognition, I'd predict O(n) memory because of the bookkeeping of which branch was taken at each step. Hopefully the runtime stays O(n), but I'm not sure.

If you're up for confirming these empirically and then preparing a README PR or filing issues (depending on the outcome), I'd gladly accept them.

neilmayhew · 2019-07-23T19:56:23Z

I was hoping for more than empirical evidence, since you can never be sure there are no expressions that cause problems. I would be much happier with a logical argument based on the nature of the internal algorithm.

AFAICT from the haddocks, there's no way to do backtracking, so that eliminates an entire class of problems. However, is it possible to (inadvertently) create a pathological matcher/recognizer by including the equivalent of many empty in it?

UnkindPartition · 2019-07-23T20:15:19Z

I was hoping for more than empirical evidence, since you can never be sure there are no expressions that cause problems.

So empirical evidence wouldn't convince you yet a note in the README would? :)

Indeed, you can never be sure, period.

The code is sufficiently complex that a manual analysis/proof would require a lot of work and ingenuity. Besides, its runtime behaviour is in part determined by the way the compiler compiles higher-order functions.

However, is it possible to (inadvertently) create a pathological matcher/recognizer by including the equivalent of many empty in it?

It shouldn't be, but the easiest way to check is to try. (Or try to prove if you're so inclined.)

UnkindPartition added the pr-welcome label Jan 31, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Document the time and memory complexity of the library #34

Document the time and memory complexity of the library #34

neilmayhew commented Jul 23, 2019

UnkindPartition commented Jul 23, 2019

neilmayhew commented Jul 23, 2019

UnkindPartition commented Jul 23, 2019

Document the time and memory complexity of the library #34

Document the time and memory complexity of the library #34

Comments

neilmayhew commented Jul 23, 2019

UnkindPartition commented Jul 23, 2019

neilmayhew commented Jul 23, 2019

UnkindPartition commented Jul 23, 2019