Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add generator for existing atoms #310

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

Maria-12648430
Copy link

@Maria-12648430 Maria-12648430 commented Apr 6, 2023

This PR adds a new generator existing_atom/0 which does not create random new atoms (like atom/0 does) but only uses existing atoms. Also, a new option has been added, default_atom_generator with options atom or existing_atom, which will be used by the any/0 generator.


The main idea behind those additions is rooted in the fact that the number of atoms in an Erlang node is limited. The default atom generator, which produces new atoms at random, may exhaust this limit. This happened when we wrote a proptest suite for the lists module of OTP, which is quite large; and while we were able to work around this, it is obvious that a noble goal like property-testing all of OTP is nigh impossible this way.

The (existing) atom generator proposed in this PR does not generate new atoms but instead picks one of the existing ones at random.

As the any generator includes the (default) atom generator and is not easy to replace there, and is in turn used implicitly in many other generators like list/0, another aspect of this PR is the default_atom_generator option. This can be used to control which atom generator will be used by the any generator: the default one or the existing atom one.


As PropEr is quite big and I don't understand all of it, I'm not sure if I made all the necessary changes/additions in all places, and would be thankful for some advise.

@Maria-12648430
Copy link
Author

Last commit provides first tests.

@Maria-12648430 Maria-12648430 marked this pull request as ready for review April 18, 2023 11:02
@Maria-12648430
Copy link
Author

@kostis this is ready for a first review I think.

src/proper.erl Outdated Show resolved Hide resolved
@@ -1398,6 +1398,20 @@ sampleshrink_test_() ->
?_shrinksTo([a], Gen)},
?_test(proper_gen:sampleshrink(Gen))]}].

existing_atom_test() ->
Copy link
Collaborator

@kostis kostis Apr 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's very unclear what these tests really test and what their underlying assumptions are.
For example, their correctness crucially depends on not loading any other Erlang modules between the two calls to erlang:system_info(atom_count). Is this robust to e.g. executing all the PropEr tests in parallel?

Also, it's unclear why the default_atom_test is using the any() instead of the atom() generator. For example, the test trivially succeeds if any() happens to return terms other than atoms...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, the tests are brittle, meant as a first attempt =(

Executing PropEr tests in parallel is likely to break them. In fact, anything else that may be going on in the system may break them.

Also, it's unclear why the default_atom_test is using the any() instead of the atom() generator. For example, the test trivially succeeds if any() happens to return terms other than atoms...

The default_atom_generator option, by and large, decides what atom generator is used in the any generator, either atom or existing_atom (there is probably a better name for this). Anyway, this is the reason why any is used in the test. And yes, if any happens to generate only non-atoms, this test will pass. I tried with ?SUCHTHAT, but then the test fails if the generator fails to produce any atom.

I'm at a loss about what to do about this, or rather how to test it PropErly. Thankful for advice 🤷‍♀️

@@ -418,6 +420,53 @@ atom_gen(Size) ->
atom_rev(Atom) ->
{'$used', atom_to_list(Atom), Atom}.

%% @private
Copy link
Collaborator

@kostis kostis Apr 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two functions, and the PR in general, should describe the main idea of the existing atom generation algorithm and the assumptions it relies on. Also its limitations.

Having written that, if I understand what the existing_atom_gen() does, it is in fact not an atom generator (in the sense that proper_gen:atom_gen/1 is), but it simply returns a random atom from those (currently) loaded in the atom table, which it stores in its own map, right?

Since there is a magic way (the <<131, 75, Index:24>> trick) to access each of these atoms given an Index, and the upper limit of this index is known ( it is given by erlang:system_info(atom_count)), what is the point of first iterating through all these atoms putting them in an internal map, rather than randomly picking one Index in the range and returning the corresponding atom? (Perhaps after a num_tries if this process is unlucky to only end up in atoms starting with $.) What I am missing?

Incidentally, I do not like the magic constants 131, 75 and 24. They should be made appropriate defines and ideally obtained by the Erlang/OTP system somehow.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having written that, if I understand what the existing_atom_gen() does, it is in fact not an atom generator (in the sense that proper_gen:atom_gen/1 is), but it simply returns a random atom from those (currently) loaded in the atom table, which it stores in its own map, right?

That is correct, yes. "Find all existing atoms, (store them in the map,) pick one at random". The term "generator" is probably not the exact term to use here (I guess this is you objection?), but what is the correct term to use?

What I am missing?

Nothing I think ;) That is a great suggestion, I'll do that. How about we use a fallback atom like '' to return if all tries turn up atoms starting with $? We need to return something, right?

Copy link
Collaborator

@kostis kostis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For starters, address the three comments I've left in the code.

@Maria-12648430
Copy link
Author

For starters, address the three comments I've left in the code.

With the last commit, I have reverted the cosmetic changes and changed the generation algorithm as suggested here (the number of retries-if-starts-with-$ is currently hardcoded, should that be configurable you think?)

We are still undecided how to provide good tests here, so that is not part of the commit.

@Maria-12648430
Copy link
Author

Updated the PR description to clarify what this is all about.

@devstopfix
Copy link

@Maria-12648430 Excuse my intrusion, but I have had to deal with a production system that exhausted atoms and it took me weeks to find where it was happening - so this is of interest to me. (it turns out we had a custom JSON parser that was doing UUID to atom...)

How about an atom generator that will try not to exhaust atoms? Given a standard runtime without the +t flag:

:erlang.system_info(:atom_limit)
1048576

We can solve for 26^x < 1048576 -- https://www.wolframalpha.com/input?i=26%5Ex+%3C+1048576

Lets create an alphabet of chars which we will use to build atoms. Dropped some vowels to avoid some rude words but include o and e for ok and err:

# Alphabet of chars that make atoms (dropping some vowels)
alphabet = 'bcdefghjklmnopqrstvwxyz'
alphabet_size = length(alphabet)
char = :proper_types.elements(alphabet)

Solve the above equation to see the longest atoms we can make that will not exhaust the space:

atom_limit = :erlang.system_info(:atom_limit)
max_chars = floor(:math.log10(atom_limit) / :math.log10(alphabet_size))

Write a generator (excuse it being in Elixir, I couldn't find the equivalent LET in Erlang:

import PropCheck

gen_atom =
  let n <- :proper_types.integer(1, max_chars) do
    let a <- :proper_types.vector(n, char) do
      :erlang.list_to_atom(a)
    end
  end

Sample the generator:

:proper_gen.sample(gen_atom)

ntq
lnh
nlsm
ok
bq
s
l
lhtr
wf
xd
spkt

@Maria-12648430
Copy link
Author

@Maria-12648430 Excuse my intrusion

Hi @devstopfix, no worries, every opinion is appreciated 🙂

but I have had to deal with a production system that exhausted atoms and it took me weeks to find where it was happening - so this is of interest to me. (it turns out we had a custom JSON parser that was doing UUID to atom...)

Hm, I don't see how a different atom generator, or the current one for that matter, could have helped in detecting this? 😅

How about an atom generator that will try not to exhaust atoms? Given a standard runtime without the +t flag:

:erlang.system_info(:atom_limit)
1048576

We can solve for 26^x < 1048576 -- https://www.wolframalpha.com/input?i=26%5Ex+%3C+1048576

It's not quite as simple as that, is it? The above holds only for atoms (or strings or whatever) of length of exactly x, but really you are generating atoms of length 0...x (ie, the empty atom, atoms of length 1, atoms of length 2, and so on to x), so you have to find out the highest value of x that satisfies $$\left( \sum_{n=0}^x alphabet\_size^n \right) &lt; atom\_limit$$

Lets create an alphabet of chars which we will use to build atoms. Dropped some vowels to avoid some rude words but include o and e for ok and err:

I see no reason to exclude anything here, this is just for tests, I don't think the runtime will take offense no matter how much profanity you throw at it 😁 (and, just saying, by not excluding o and e, you would still allow for words like fool and even whore to be created 😅).
On the contrary, I would even extend the alphabet by allowing numbers and some special characters like _ and the like. Even uppercase letters could (should?) be included. Actually, atoms can be composed of almost anything, even Emojis. Like, '😛' is a valid atom.

Solve the above equation to see the longest atoms we can make that will not exhaust the space:

Given the polite (:grin:) alphabet of 23 letters (without a, i and u) and the default atom limit of 1048576, that would be 4, and not allow common atoms like undefined, infinity, true, false, error (I have never seen err in use, but that may be different in Elixir).

If we were using the full alphabet in lower- and uppercase + the numbers (so, 62 characters), it would be only 3, and equally not allow the above common atoms.

IMO, those are very severe limitations which make the proposed generator pretty useless (sorry, no offense 🥺)

ntq
lnh
nlsm
ok
bq
s
l
lhtr
wf
xd
spkt

(Put an IMO on most of what's in the following paragraphs 😉)

Thing is, none of those is any more likely to cause (and thereby help detect) problems than one of the others, except ok maybe. Put differently, one atom is not more likely to cause a problem simply because it is longer, or has a weirder name than another atom.

The common use case for atoms is to be matched on, not to be compared. So it doesn't matter if you throw ntq or nlsm at your code, it either is prepared for either or it isn't for any of them. By and large and corner cases aside, the set of possibly problematic atoms is very likely a subset of the existing atoms, and also common atoms like ok, error, true, false, undefined etc are more likely to be problematic than others like yecc_123.

However, another use case where the atom name is indeed relevant is in sorting (like, sorting a list of stuff which may be atoms), but even for such use cases, the set of existing atoms should contain ample specimens.


That all said, contrary to what the PRs title indicates, the problem I was having and which did lead up to it is actually not the atom generator by itself. It is not hard to tailor one your own. The problem is its implicit use in the any generator, which I find immensely useful, but by containing the unlimited atom generator, tends to exhaust the atom limit when employed freely. But the any generator is very hard to change or replicate with the common PropEr functionality however, I must have tried and failed several dozen times 😢

@devstopfix
Copy link

@Maria-12648430 no worries I guess it just comes down to the fact that the OTP team will be testing "atoms" so they appreciate a broad spectrum of weird and wonderful atoms, whereas I normally just need an atom that is fairly short and readable and won't blow up the BEAM when running a long test :-)

@Maria-12648430
Copy link
Author

Maria-12648430 commented Jun 29, 2023

@Maria-12648430 no worries I guess it just comes down to the fact that the OTP team will be testing "atoms" so they appreciate a broad spectrum of weird and wonderful atoms

Heh, I don't think they do, picking from the set of existing atoms is just fine 😉 Normally it's the users who come up with weird stuff. I'm waiting for the day when instead of ok and error we get to see '😃' and '😢' (j/k).

whereas I normally just need an atom that is fairly short and readable and won't blow up the BEAM when running a long test :-)

I guess what I was trying to say is that, if you already make up atoms for testing at random, where do you draw the line between what is generally reasonable to randomly generate with and what isn't?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants