Skip to content
This repository has been archived by the owner on Oct 9, 2019. It is now read-only.

Make Fuzz.string generate unicode strings #204

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

drathier
Copy link
Collaborator

@drathier drathier commented Aug 6, 2017

This implements #201. It also replaces the implementation for #198 #200 (generate whitespace-only strings).

The main difference is that I changed the way we generate strings to make use of equivalence classes:

  1. Choose how many classes of characters to use:
    • single repeated character from a single character class
    • single character class
    • two classes
    • all classes
  2. Pick which character class(es) to use, according to the provided frequency
  3. Generate strings of random length according to the previous selections.

The character classes are:

  • ASCII: same as before, characters in the range [32, 126] (inclusive)
  • Whitespace: same as before, space, \t, \n
  • Emoji: 🌈, ❤, 🔥
  • Combining diacritical marks: ~^¨, or rather ̃, ̂, ̈, (without the commas)

I chose these characters since I think they give people who write websites in English for English users a simple reason to support all of unicode, which helps a lot of people. Bilinguals, people whose names use non-ascii characters, and anyone who wants to use Emoji benefits from this. The characters are all familiar to English speakers, and they cover the most important classes in unicode. Emoji require two utf-16 code units, as does most (all?) Asian characters. Combining diacritical mark support covers all of extended Latin, i.e. all of Europe.

The only large character class remaining is inline right-to-left text, but I've left that one out because I have no idea what properties you'd test in such a script that aren't relevant to the rest of unicode. Arabic letters change shape depending on where in a sentance they're used, the byte order doesn't match the order the characters are presented on screen, and it becomes possible to generate invalid strings both in the generation and shrinking phases.


This change builds on the assumption that "abc" and "xyz" are probably going to trigger the same errors, because they share a lot of properties; a-z, lowercase, sorted, unique characters, same length, substring of the English alphabet.

I haven't written a custom shrinker for this subset of unicode, maybe I should do that?

I've previously been unsure if we want to release this right away, or wait for core to release a patch. It's not looking like we're getting a patch until the next major version is released, and 0.19 isn't in alpha yet.

  • Some test suites will break if they don't have the patch from core which fixes String.toList, and those users will have to use a non-core version of that function until core releases a patch, if they have this bug. The String.toList >> String.fromList identity seems to hold both with and without this bug, so it's not going to affect a lot of users.
  • Or we release it right away, because it's a bug in our users' codebases, and the purpose of testing is to find bugs.

@drathier drathier self-assigned this Aug 6, 2017
Copy link
Member

@mgold mgold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not very helpful to break someone's test suite and the only way to fix it is to use a forked core. This looks good for the most part, but I think we should hold until the core fix is released. Adding blocked label.

src/Util.elm Outdated

-- all
, frequency pairs
|> constant
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks the same as the "single generator".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The single generator was incorrect before. It's fixed now :)

@mgold mgold added the blocked label Aug 8, 2017
@drathier
Copy link
Collaborator Author

This should be safe to merge into 0.19 now. Elm doesn't break emojis in 0.19, but the language still doesn't handle reversal of unicode strings properly, it simply reverses the array of unicode code points.

let t = "😻🙀👌x\u{0303}y" in text (t ++ " <=> " ++ String.reverse t)

prints

😻🙀👌x̃y <=> ỹx👌🙀😻

@drathier drathier removed the blocked label Oct 31, 2018
@mgold
Copy link
Member

mgold commented Oct 31, 2018

I'll hit the approve button, but we should be careful about the release and have a plan to rollback if we break a lot of tests.

@drathier
Copy link
Collaborator Author

I'm getting very confused regarding elm-community/elm-test vs elm-explorations/test. Is this repo legacy 0.18, and going to be left as-is from now on, with new development being done in elm-explorations/test? If so, this PR should be merged into elm-explorations/test, not this repo.

@mgold
Copy link
Member

mgold commented Nov 1, 2018

Oh shoot, I got turned around and thought this was the active repo. It is not.

@rtfeldman
Copy link
Member

Yeah, I think we should archive this repo.

Any objections?

@drathier
Copy link
Collaborator Author

drathier commented Nov 1, 2018

Sure, as soon as we've migrated the PR's

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants