Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Fuzz.string generate unicode strings. #92

Merged
merged 6 commits into from
May 10, 2020
Merged

Conversation

drathier
Copy link
Collaborator

@drathier drathier commented May 1, 2019

Copy of elm-community/elm-test#204. This PR was originally made for elm 0.18 back in August 2017. We delayed merging it since there was a bug with String.reverse where it didn't handle emojis or composite characters correctly. The bug has since been both fixed and published, so all 0.19 elm/core versions now reverse the strings we use in this fuzzer properly. There are still issues with the string reversal algorithm used in elm, but this new fuzzer doesn't generate strings that hit those cases.

@drathier drathier added the fuzzers Concerns randomness or simplifiers label May 1, 2019
@drathier drathier self-assigned this May 1, 2019
@drathier drathier requested review from rtfeldman and mgold May 1, 2019 17:40
@drathier
Copy link
Collaborator Author

drathier commented May 3, 2019

N.B. master already has a major change on it, renaming the exposed Shrink module to Simplify.

tests/src/FuzzerTests.elm Outdated Show resolved Hide resolved
@drathier
Copy link
Collaborator Author

Ready for another round of reviews. @mgold

@drathier drathier requested a review from mgold August 16, 2019 13:34
Copy link
Collaborator

@harrysarson harrysarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the fuzzer generate a sting containing two combining marks immediately after each other?

Would it be possible to unit test against it?

@drathier
Copy link
Collaborator Author

I think it can. Do we want to limit ourselves to just valid unicode strings? I bet for 99% of cases it would shrink down to a single one.

@harrysarson
Copy link
Collaborator

https://en.m.wikipedia.org/wiki/Combining_character

In Unicode, diacritics are always added after the main character (in contrast to some older combining character sets such as ANSEL), so it is possible to add several diacritics to the same character, although as of 2010, few applications support correct rendering of such combinations.

So multiple combination diacritics are valid!

@drathier
Copy link
Collaborator Author

If there's a bug that requires more than one diacritic mark in a row to trigger, it doesn't hurt if we find it, even if the diff is hard to read (hex output in diff helps).

If it's triggered by multiple combining characters in a row but doesn't require those exact characters, it'll shrink to something readable.

I think we should keep it as-is, mainly on the argument that it's easier to code/test/understand.

Copy link
Collaborator

@mgold mgold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is okay other than adding an unrelated method to the public API.

@drathier
Copy link
Collaborator Author

@mgold Are you thinking of this multi-line import exposing? Github folded it a bit unhelpfully. https://github.com/elm-explorations/test/pull/92/files#diff-adac3ec51c96ebc8ace1960bee9f3d35R51 There's an addition to the public interface of Fuzz.Internal, but that module isn't exposed, and I need that function in Fuzz.elm.

src/Fuzz.elm Outdated
, ( 1, Random.int 11 50 )
, ( 1, Random.int 50 1000 )
]
)
Copy link
Collaborator

@harrysarson harrysarson Aug 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this function slightly hard to grok and, whilst having a play with it to get my head round what the code does, rewrote it slightly. I believe the function might be slightly easier to read is this form:

string : Fuzzer String
string =
    let
        ( firstFreq, restFreqs ) =
            unicodeCharGeneratorFrequencies

        lengthGenerator =
            (Random.frequency
                ( 3, Random.int 1 10 )
                [ ( 0.2, Random.constant 0 )
                , ( 1, Random.int 11 50 )
                , ( 1, Random.int 50 1000 )
                ]
            )

        unicodeGenerator =
            frequencyList lengthGenerator firstFreq restFreqs
                |> Random.map String.fromList
    in
    custom unicodeGenerator Simplify.string 

Otherwise this patch looks good to me :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me :) Took it as you wrote it and ran it though elm-format.

@mgold
Copy link
Collaborator

mgold commented Aug 21, 2019

@drathier Ah, that's an import, not an exposing. Looks good!

@drathier drathier merged commit ff68cb8 into master May 10, 2020
@drathier drathier deleted the unicode-string-fuzzer branch May 10, 2020 18:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fuzzers Concerns randomness or simplifiers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants