Fuzz.string only generates ascii strings #201

drathier · 2017-07-30T12:51:15Z

As mentioned in #198 and #200, the Fuzz.string fuzzer only generates ASCII characters in the range 32-126, which covers A-Za-z0-9, the space character and some special characters. It should generate any kind of string to make sure the code works with more characters. Even English-only users are impacted, as emoji aren't in ASCII 😿.

I think we should do a breaking change and make Fuzz.string generate characters from all of unicode. This will probably fail some test suites that previously only tested ascii strings, but that's a good thing, right?

The full unicode solution is however blocked while we wait for a new release of elm-lang/core. The bug has been fixed, but it's not released yet.

zkessin · 2017-07-31T08:50:42Z

I would suggest having a Fuzz.string and a Fuzz.utf8String or the like.

Imagine that you have a problem where the Hebrew string חה somehow becomes הח. If you are not familiar with Hebrew, that could be very confusing to debug, as you have two letters that look pretty similar swapping positions.

If we are going to do UTF-8/UTF-16, we want to make sure we do it really well.

I assume similar problems could happen with a number of scripts, but I happen to have a Hebrew keyboard handy.

drathier · 2017-07-31T09:05:21Z

I would like the default to be unicode, so Fuzz.string is unicode and Fuzz.asciiString is the current version. I was planning on doing a small subset of unicode that should find these bugs without running into homoglyph problems, right-to-left text and other things where the rendered output is vastly different from the actual string.

In JavaScript, the main things to worry about are characters that don't fit inside a single UTF-16 code unit, such as emoji, and combining characters (and maybe normalization for equality testing). I think ASCII, emoji and some European characters should be enough, without being too hard to debug.
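To make those two pitfalls concrete, here is a minimal TypeScript sketch. It is not elm-test code; the helper names are made up for illustration, and it only relies on standard JavaScript string methods.

```typescript
// Number of UTF-16 code units, which is what String.prototype.length reports.
function codeUnitCount(s: string): number {
  return s.length;
}

// Number of Unicode code points; Array.from iterates a string by code point.
function codePointCount(s: string): number {
  return Array.from(s).length;
}

// Naive reversal by code unit: splits surrogate pairs apart.
function reverseByCodeUnit(s: string): string {
  return s.split("").reverse().join("");
}

// Reversal by code point: keeps surrogate pairs together,
// but still moves combining marks away from their base character.
function reverseByCodePoint(s: string): string {
  return Array.from(s).reverse().join("");
}

// Equality after normalizing both sides to NFC.
function equalAfterNfc(a: string, b: string): boolean {
  return a.normalize("NFC") === b.normalize("NFC");
}

const cryingCat: string = "\u{1F63F}"; // 😿, outside the Basic Multilingual Plane
const decomposed: string = "e\u0301"; // "e" followed by a combining acute accent
const precomposed: string = "\u00E9"; // "é" as a single code point

console.log(codeUnitCount(cryingCat), codePointCount(cryingCat)); // 2 1
console.log(reverseByCodeUnit(cryingCat) === cryingCat); // false
console.log(reverseByCodePoint(cryingCat) === cryingCat); // true
console.log(decomposed === precomposed); // false
console.log(equalAfterNfc(decomposed, precomposed)); // true
```

A string fuzzer that emits at least one emoji and one combining sequence would exercise both of these cases.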

zkessin · 2017-07-31T09:08:22Z

Sounds good. We will probably eventually want a way to specify the character set, so if someone wants Hebrew/Greek/Arabic/Russian/Hindi etc. they will be able to have them.

drathier · 2017-07-31T09:30:08Z

I don't think we should let the user specify what character classes or character sets to use. That's one huge rabbit hole which could take tens of thousands of lines of code to implement in pure Elm. There are ranges of code points that can be used to select a code plane, but if you want whitespace, you'll have to manually list out the 8 different characters, and if you want mathematical characters, there's another set of ranges to use, and so on. For example, here are the code points of the Swedish alphabet: https://www.iana.org/domains/idn-tables/tables/se_sv-se_1.0.html

Since this is only for testing, I say we try to pick a subset which is easy to use when testing, but which covers "all" the special cases of unicode.
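As a rough illustration of what such a fixed subset could look like, here is a TypeScript sketch that draws characters from a handful of hard-coded code point ranges. The choice of ranges, the names, and the uniform weighting are all assumptions made for this example, not a proposal for the actual Fuzz.string implementation.

```typescript
// A named, inclusive range of Unicode code points.
type CodePointRange = { name: string; lo: number; hi: number };

// Hypothetical subset: small enough to debug, but it includes
// astral characters (emoji) and combining marks.
const ranges: CodePointRange[] = [
  { name: "printable ASCII", lo: 0x20, hi: 0x7e },
  { name: "Latin-1 Supplement, upper half", lo: 0xc0, hi: 0xff },
  { name: "combining diacritical marks", lo: 0x300, hi: 0x36f },
  { name: "emoticons", lo: 0x1f600, hi: 0x1f64f },
];

// Random integer in [lo, hi], given a source of numbers in [0, 1).
function randomInt(lo: number, hi: number, rand: () => number): number {
  return lo + Math.floor(rand() * (hi - lo + 1));
}

// Pick a range uniformly, then a code point uniformly within it.
function randomChar(rand: () => number = Math.random): string {
  const range = ranges[randomInt(0, ranges.length - 1, rand)];
  return String.fromCodePoint(randomInt(range.lo, range.hi, rand));
}

// Build a string of 0..maxLen code points (not code units).
function randomString(maxLen: number, rand: () => number = Math.random): string {
  const len = randomInt(0, maxLen, rand);
  let out = "";
  for (let i = 0; i < len; i++) {
    out += randomChar(rand);
  }
  return out;
}

console.log(JSON.stringify(randomString(10)));
```

Picking the range first keeps the small ranges from being drowned out by the large ones. A real fuzzer would also need a shrinking strategy, which this sketch leaves out.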

mgold · 2017-08-01T03:06:10Z

Since it sounds like Fuzz.string won't be removed, just changed, I'm removing the newly-renamed major-release-blocker label. Patch and minor releases can ship whenever they are ready.

drathier added fuzzers major-release-blocker labels Jul 30, 2017

drathier self-assigned this Jul 30, 2017

drathier added the blocked label Jul 30, 2017

mgold removed the major-release-blocker label Aug 1, 2017

drathier mentioned this issue Aug 6, 2017

Make Fuzz.string generate unicode strings #204

Open
