Make Fuzz.string generate unicode strings #204

drathier · 2017-08-06T13:59:40Z

This implements #201. It also replaces the implementation for #198 #200 (generate whitespace-only strings).

The main difference is that I changed the way we generate strings to make use of equivalence classes:

Choose how many classes of characters to use:
- single repeated character from a single character class
- single character class
- two classes
- all classes
Pick which character class(es) to use, according to the provided frequency
Generate strings of random length according to the previous selections.

The character classes are:

ASCII: same as before, characters in the range [32, 126] (inclusive)
Whitespace: same as before, space, \t, \n
Emoji: 🌈, ❤, 🔥
Combining diacritical marks: ~^¨, or rather ̃, ̂, ̈, (without the commas)

I chose these characters since I think they give people who write websites in English for English users a simple reason to support all of unicode, which helps a lot of people. Bilinguals, people whose names use non-ascii characters, and anyone who wants to use Emoji benefits from this. The characters are all familiar to English speakers, and they cover the most important classes in unicode. Emoji require two utf-16 code units, as does most (all?) Asian characters. Combining diacritical mark support covers all of extended Latin, i.e. all of Europe.

The only large character class remaining is inline right-to-left text, but I've left that one out because I have no idea what properties you'd test in such a script that aren't relevant to the rest of unicode. Arabic letters change shape depending on where in a sentance they're used, the byte order doesn't match the order the characters are presented on screen, and it becomes possible to generate invalid strings both in the generation and shrinking phases.

This change builds on the assumption that "abc" and "xyz" are probably going to trigger the same errors, because they share a lot of properties; a-z, lowercase, sorted, unique characters, same length, substring of the English alphabet.

I haven't written a custom shrinker for this subset of unicode, maybe I should do that?

I've previously been unsure if we want to release this right away, or wait for core to release a patch. It's not looking like we're getting a patch until the next major version is released, and 0.19 isn't in alpha yet.

Some test suites will break if they don't have the patch from core which fixes String.toList, and those users will have to use a non-core version of that function until core releases a patch, if they have this bug. The String.toList >> String.fromList identity seems to hold both with and without this bug, so it's not going to affect a lot of users.
Or we release it right away, because it's a bug in our users' codebases, and the purpose of testing is to find bugs.

mgold

It's not very helpful to break someone's test suite and the only way to fix it is to use a forked core. This looks good for the most part, but I think we should hold until the core fix is released. Adding blocked label.

mgold · 2017-08-08T02:12:47Z

src/Util.elm

+
+                -- all
+                , frequency pairs
+                    |> constant


This looks the same as the "single generator".

The single generator was incorrect before. It's fixed now :)

drathier · 2018-10-31T09:41:06Z

This should be safe to merge into 0.19 now. Elm doesn't break emojis in 0.19, but the language still doesn't handle reversal of unicode strings properly, it simply reverses the array of unicode code points.

let t = "😻🙀👌x\u{0303}y" in text (t ++ " <=> " ++ String.reverse t)

prints

😻🙀👌x̃y <=> ỹx👌🙀😻

mgold · 2018-10-31T15:17:25Z

I'll hit the approve button, but we should be careful about the release and have a plan to rollback if we break a lot of tests.

drathier · 2018-10-31T19:26:54Z

I'm getting very confused regarding elm-community/elm-test vs elm-explorations/test. Is this repo legacy 0.18, and going to be left as-is from now on, with new development being done in elm-explorations/test? If so, this PR should be merged into elm-explorations/test, not this repo.

mgold · 2018-11-01T05:08:10Z

Oh shoot, I got turned around and thought this was the active repo. It is not.

rtfeldman · 2018-11-01T12:44:37Z

Yeah, I think we should archive this repo.

Any objections?

drathier · 2018-11-01T12:54:38Z

Sure, as soon as we've migrated the PR's

Make Fuzz.string generate unicode strings.

05a86be

drathier added the fuzzers label Aug 6, 2017

drathier self-assigned this Aug 6, 2017

drathier requested review from rtfeldman and mgold August 6, 2017 13:59

mgold suggested changes Aug 8, 2017

View reviewed changes

mgold added the blocked label Aug 8, 2017

FrequencyList can now choose to pick values from single charset. Bugfix.

5f68136

drathier mentioned this pull request Dec 3, 2017

Non-printable characters aren't escaped #217

Open

drathier removed the blocked label Oct 31, 2018

mgold approved these changes Oct 31, 2018

View reviewed changes

drathier mentioned this pull request May 1, 2019

Make Fuzz.string generate unicode strings. elm-explorations/test#92

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make Fuzz.string generate unicode strings #204

Make Fuzz.string generate unicode strings #204

drathier commented Aug 6, 2017

mgold left a comment

mgold Aug 8, 2017

drathier Aug 10, 2017

drathier commented Oct 31, 2018

mgold commented Oct 31, 2018

drathier commented Oct 31, 2018

mgold commented Nov 1, 2018

rtfeldman commented Nov 1, 2018

drathier commented Nov 1, 2018

Make Fuzz.string generate unicode strings #204

Are you sure you want to change the base?

Make Fuzz.string generate unicode strings #204

Conversation

drathier commented Aug 6, 2017

mgold left a comment

Choose a reason for hiding this comment

mgold Aug 8, 2017

Choose a reason for hiding this comment

drathier Aug 10, 2017

Choose a reason for hiding this comment

drathier commented Oct 31, 2018

mgold commented Oct 31, 2018

drathier commented Oct 31, 2018

mgold commented Nov 1, 2018

rtfeldman commented Nov 1, 2018

drathier commented Nov 1, 2018