[conv.lval] Add example of indeterminate values that are not valid for the type CWG2899 #7047

Open
Eisenwave opened this issue Jun 5, 2024 · 9 comments · May be fixed by #7049

@Eisenwave
Contributor

[conv.lval] p3.4 sentence 2 says:

If the result is an erroneous value ([basic.indet]) and the bits in the value representation are not valid for the object's type, the behavior is undefined.

It's not obvious how you would run into this case, given that memory is generally initialized for erroneous values, and this initialization can typically be done so that the values are valid.

To be honest, I don't understand how you can run into this case, and it's not going to be obvious to others who follow, so we should add an example of how this can happen.

@jensmaurer
Member

jensmaurer commented Jun 5, 2024

Your erroneous value might have used a bit-pattern that's not valid for the type, e.g. "2" for a bool. So, a simple

bool x; // not initialized

at block scope is the example.
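To make the `bool x;` case concrete, here is a minimal sketch that models it without ever forming an invalid bool. It assumes a hypothetical implementation that fills uninitialized automatic storage with the byte 0x02 (the "2" from the comment above) and, as on common ABIs, accepts only 0x00 and 0x01 as bool bit patterns. `kErroneousFill`, `is_valid_bool_bits`, and `to_bool_checked` are illustrative names, not anything from the standard.

```cpp
#include <cassert>
#include <cstring>

// Assumption: a one-byte bool (sizeof(bool) is implementation-defined).
static_assert(sizeof(bool) == 1, "sketch assumes a one-byte bool");

// Hypothetical byte an implementation might store as the erroneous value of
// an uninitialized automatic variable; the standard does not mandate it.
constexpr unsigned char kErroneousFill = 0x02;

// Assumption: only 0x00 (false) and 0x01 (true) are valid bit patterns for bool.
constexpr bool is_valid_bool_bits(unsigned char bits) {
    return bits == 0x00 || bits == 0x01;
}

// Forms a real bool only from valid bits. In real code, `bool x; bool y = x;`
// would read storage holding kErroneousFill, which is the case that
// [conv.lval] p3.4 sentence 2 makes undefined.
bool to_bool_checked(unsigned char bits) {
    assert(is_valid_bool_bits(bits) && "would be undefined behavior");
    bool b;
    std::memcpy(&b, &bits, sizeof b);
    return b;
}
```

Under these assumptions `is_valid_bool_bits(kErroneousFill)` is false, so the read does not yield true or false at all.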

@Eisenwave
Contributor Author

Hmm, I've kinda suspected that this could happen, and it's pretty unfortunate that it can.

On another note, I don't think that sentence 2 should be normative wording at all. Lvalue-to-rvalue conversion requires reading the value of an object, and by definition, a value must exist. A value representation of 0x02 for bool corresponds to no value, so reading the bool is impossible, and there is no "result" of an lvalue-to-rvalue conversion in the first place.

Two changes make sense to me:

  • Don't imply that this effect is limited to erroneous values by singling them out. In general, you cannot perform an lvalue-to-rvalue conversion when there is no value.
  • Turn the sentence into a note.

@jensmaurer
Member

Values can be invalid (e.g. trapping). For example, a pointer value that has a segment component where the segment no longer exists might trap when read.

The question here seems to be whether the value representation (i.e. set of bits) for a bool can have more than one bit. Presumably it can, for example one could say that 0x00 is false and 0xff is true. There are 8 bits to the value representation, but 0x01 is not a valid value for a bool.
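The hypothetical encoding above (0x00 is false, 0xff is true) can be modelled as a decoding function from the eight value-representation bits to an optional value. This is a sketch of that thought experiment only, not a description of any real compiler; `decode_hypothetical_bool` is an illustrative name.

```cpp
#include <cassert>
#include <optional>

// Hypothetical bool encoding: all 8 bits belong to the value representation,
// 0x00 means false and 0xff means true. Every other pattern, including 0x01,
// corresponds to no value and is reported as std::nullopt.
std::optional<bool> decode_hypothetical_bool(unsigned char bits) {
    switch (bits) {
        case 0x00: return false;
        case 0xff: return true;
        default:   return std::nullopt;  // bit pattern without a value
    }
}
```

Here 0x01 decodes to nothing, even though every one of its bits is part of the value representation.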

And yes, we believe that situation only makes it to [conv.lval] in the erroneous value case; it's caught (with UB) earlier in other situations.

"A value representation of 0x02 for bool corresponds to no value, so reading the bool is impossible, and there is no "result" of an lvalue-to-rvalue conversion in the first place."

I don't know what "impossible" means in standardese. The best I can come up with is "undefined behavior", which is exactly what we do here.

In practical terms, there is no question that

bool x; // at block scope

creates an object of type bool, and we must allow (for the erroneous behavior mechanics) for the bytes here to be initialized to something like 0xdeadbeef. Yet, we know there are implementations that will violate the "either true or false" semantics of bool when reading such a value from x. We have to allow for that.

@Eisenwave
Copy link
Contributor Author

The question here seems to be whether the value representation (i.e. set of bits) for a bool can have more than one bit.

Yes, I believe that's how it also works in the Itanium ABI. A bool is always required/guaranteed to be 0x00 or 0x01, which implies that the upper seven bits are not considered to be padding bits, but bits of the value representation that are always required to be zero.
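On such a target, the two valid patterns can be observed directly by copying a bool's object representation into a byte. A minimal sketch, assuming a one-byte bool as on Itanium-ABI targets (e.g. GCC or Clang on x86-64 Linux); `bool_bits` is a hypothetical helper. Copying *from* a valid bool is well defined; only forming a bool from a pattern like 0x02 is problematic.

```cpp
#include <cassert>
#include <cstring>

// Assumption: sizeof(bool) == 1; the standard leaves it implementation-defined.
static_assert(sizeof(bool) == 1, "sketch assumes a one-byte bool");

// Returns the object representation of a (valid) bool as a single byte.
unsigned char bool_bits(bool b) {
    unsigned char bits;
    std::memcpy(&bits, &b, sizeof bits);
    return bits;
}
```

On the assumed ABI, `bool_bits(false)` is 0x00 and `bool_bits(true)` is 0x01, so the upper seven bits are always zero for real values.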

And yes, we believe that situation only makes it to [conv.lval] in the erroneous value case; it's caught (with UB) earlier in other situations.

I believe it never makes it to the last sentence of [conv.lval], even for erroneous values. More explanation in #7051.

In short, [conv.lval] says that it "reads" the object ([defns.access]), but according to [defns.access], this by definition means reading the value. A "value" is by definition "one discrete element of an implementation-defined set of values". If the value representation 0x02 doesn't correspond to true or false, then by definition no value exists, and the first sentence of [conv.lval] already implies UB.

I don't know what "impossible" means in standardese. The best I can come up with is "undefined behavior", which is exactly what we do here.

Yeah, that is what I mean. You would run into UB before running into that [conv.lval] case, always, including for erroneous values.

Yet, we know there are implementations that will violate the "either true or false" semantics of bool when reading such a value from x. We have to allow for that.

Such implementations could consider any value representation other than 0x00 to correspond to true, for example. That's still perfectly valid, with the example (or other PRs I've made).
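Under the convention suggested here, every byte pattern decodes to some value, so no bit pattern is left without one. A minimal model, assuming a one-byte bool and `CHAR_BIT == 8`; the function names are hypothetical. Note that for this to stay consistent, `!b` has to be lowered as "compare with zero" rather than "flip the low bit".

```cpp
#include <cassert>
#include <climits>

static_assert(CHAR_BIT == 8, "sketch assumes 8-bit bytes");

// Hypothetical convention: 0x00 is false, every other byte is true.
bool decode_nonzero_is_true(unsigned char bits) { return bits != 0; }

// `!b` lowered consistently with that convention.
bool logical_not_consistent(unsigned char bits) { return !decode_nonzero_is_true(bits); }

// Counts how many of the 256 byte patterns decode to true.
int count_true_patterns() {
    int n = 0;
    for (int bits = 0; bits <= 0xff; ++bits)
        if (decode_nonzero_is_true(static_cast<unsigned char>(bits))) ++n;
    return n;
}
```

With this decoding, 255 patterns mean true, one means false, and a byte such as 0x02 is simply true.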

@jensmaurer
Member

If the value representation 0x02 doesn't correspond to true or false, by definition, no value exists, and the first sentence of [conv.lval] already implies UB.

We strive never to imply undefined behavior. Undefined behavior should be spelled as such whenever it appears.

Such implementations could consider any value representation other than 0x00 to correspond to true, for example.

That would be the easy case. No, they consider (in some situations) 0x02 as both true and false (or neither).
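One plausible way an implementation ends up treating 0x02 as "both true and false (or neither)" is that it lowers different bool operations differently, each lowering being correct only for 0x00 and 0x01. The sketch below models three such lowerings on a raw byte; they are hypothetical and not taken from any particular compiler.

```cpp
#include <cassert>

// `if (b)` lowered as "test the whole byte for nonzero".
bool branch_taken(unsigned char bits) { return bits != 0; }

// `!b` lowered as "flip the low bit"; only correct when bits is 0x00 or 0x01.
unsigned char logical_not(unsigned char bits) { return bits ^ 0x01; }

// `b == true` lowered as "compare the byte with 1".
bool equals_true(unsigned char bits) { return bits == 0x01; }
```

For 0x00 and 0x01 the three lowerings agree. For 0x02, both `b` and `!b` (0x03) take the branch, while `b == true` is false, so the program observes a bool that is both true and false, or neither, depending on how it is tested.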

@jensmaurer
Member

Would something like this (replacing "the result is" with "the object has")

If the object has an erroneous value ([basic.indet]) and the bits in the value representation are not valid for the object's type, the behavior is undefined.

help?

@Eisenwave
Contributor Author

Eisenwave commented Jun 5, 2024

We strive never to imply undefined behavior. Undefined behavior should be spelled as such whenever it appears.

I agree that this would be ideal. To be honest, I feel like the clearest way forward would be to re-introduce the notion of a "trap representation" (now called "non-value representation" in C23) into the standard. This concept already exists (e.g. a 0x02 bit pattern may be a value representation for bool but not correspond to any value); however, we don't give it a name, and this is making things harder and turning explicit UB into implied UB.
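As a rough illustration of how many such unnamed patterns exist, the sketch below classifies every byte as a bool representation, assuming the Itanium-style convention discussed above (0x00 is false, 0x01 is true), a one-byte bool, and `CHAR_BIT == 8`. The category names borrow C23's "non-value representation"; the enum and function names are hypothetical.

```cpp
#include <cassert>
#include <climits>

static_assert(CHAR_BIT == 8, "sketch assumes 8-bit bytes");

// Value: the pattern corresponds to true or false.
// NonValue: a bit pattern that corresponds to no value (C23 terminology).
enum class BoolRepr { Value, NonValue };

BoolRepr classify(unsigned char bits) {
    return (bits == 0x00 || bits == 0x01) ? BoolRepr::Value : BoolRepr::NonValue;
}

// Counts the non-value representations among all 256 byte patterns.
int count_non_value_representations() {
    int n = 0;
    for (int bits = 0; bits <= 0xff; ++bits)
        if (classify(static_cast<unsigned char>(bits)) == BoolRepr::NonValue) ++n;
    return n;
}
```

Under these assumptions, 254 of the 256 patterns, including 0x02, fall into the category that C23 names and C++ currently does not.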

No, they consider (in some situations) 0x02 as both true and false (or neither).

I don't really understand what that would mean in standardese (if it's both). Is 0x02 a value representation that corresponds to no value? Does it correspond to multiple values simultaneously? I don't believe such "quantum superstates" are allowed.

@Eisenwave
Contributor Author

Eisenwave commented Jun 5, 2024

Would something like

If the object has an erroneous value ([basic.indet]) and the bits in the value representation are not valid for the object's type, the behavior is undefined.

help?

Yes, that is substantially better. Wording the effect in terms of the input instead of a "result" that (to my understanding) never actually exists is a major improvement.

@jensmaurer
Member

I've created CWG2899 to address this.

jensmaurer changed the title from [conv.lval] Add example of indeterminate values that are not valid for the type to [conv.lval] Add example of indeterminate values that are not valid for the type CWG2899 on Jun 6, 2024