Specifications

Shachar Shemesh edited this page Oct 10, 2018 · 17 revisions

Lexical Analysis

Literals

Literals are values given explicitly in the source code.

Numeric Literals

All integer numeric literals may have any number of underscore characters (_) in them to space out the digits. The only exception is that the underscore may not be the first character of the literal.[1]

Decimal Integers

Decimal integers are denoted as a sequence of one or more decimal digits (in the range 0-9) or the underscore character. The first character of the literal may not be an underscore. The first character of the literal may be the digit 0 only if the literal contains no other digits.[2]
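
The decimal rule can be sketched as a small recognizer. This is only an illustration in Python, not part of the specification: the name is_decimal_literal is hypothetical, and the sketch assumes a literal reading of the rule, under which a lone 0 followed only by underscores is accepted.

```python
import re

# Hypothetical helper for illustration; the specification defines no such function.
# Either a single 0 (optionally followed by underscores only), or a non-zero
# digit followed by any mix of digits and underscores.
_DECIMAL = re.compile(r"0_*|[1-9][0-9_]*")


def is_decimal_literal(text: str) -> bool:
    """Return True if text satisfies the decimal integer rules as read above."""
    return _DECIMAL.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["0", "42", "1_000_000", "_42", "00755", "0_1"]:
        print(sample, is_decimal_literal(sample))
```

Under these assumptions 0, 42 and 1_000_000 are accepted, while _42 (a legal identifier), 00755 and 0_1 (a leading 0 followed by more digits) are rejected.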

Hexadecimal Integers

Hexadecimal integers start with 0x or 0X, followed by one or more hexadecimal digits (0-9 or a-f in either case). The digits MAY have any number of underscores between them, but the literal MAY NOT begin with an underscore or have an underscore between the leading 0 and the x.

All of the following are legal hexadecimal literals:

  • 0x0
  • 0x___12
  • 0xA
  • 0X12_

The following are not legal hexadecimal literals:

  • 0x___ (no digits)
  • _0x12 (a legal identifier)
  • 0_x12 (underscore between 0 and x)
  • 0xcovfefe ("o" and "v" are not hexadecimal digits).
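
The examples above can be checked mechanically. The following is a minimal sketch in Python, assuming (as the legal examples 0x___12 and 0X12_ suggest) that underscores may appear anywhere after the prefix as long as at least one digit is present. The name is_hex_literal is hypothetical.

```python
import re

# Hypothetical helper for illustration; not defined by the specification.
# Prefix 0x or 0X, optional underscores, one mandatory hexadecimal digit,
# then any mix of hexadecimal digits and underscores.
_HEX = re.compile(r"0[xX]_*[0-9a-fA-F][0-9a-fA-F_]*")


def is_hex_literal(text: str) -> bool:
    """Return True if text satisfies the hexadecimal rules as read above."""
    return _HEX.fullmatch(text) is not None


if __name__ == "__main__":
    legal = ["0x0", "0x___12", "0xA", "0X12_"]
    illegal = ["0x___", "_0x12", "0_x12", "0xcovfefe"]
    for sample in legal + illegal:
        print(sample, is_hex_literal(sample))
```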

Binary Integers

Binary integers start with 0b or 0B, followed by one or more binary digits (0 or 1). The digits MAY have any number of underscores between them, but the literal MAY NOT begin with an underscore or have an underscore between the leading 0 and the b.

All of the following are legal binary literals:

  • 0b0
  • 0b_0010_1101__1011_1000
  • 0b10_

The following are not legal binary literals:

  • 0b___ (no digits)
  • _0b11 (a legal identifier)
  • 0_b11 (underscore between 0 and b)
  • 0b12 (2 is not a binary digit)
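
To show how a value might be derived from such a literal, here is a hand-written scanner sketched in Python. It is an illustration only: parse_binary_literal is a hypothetical name, and the specification does not prescribe any particular scanning strategy.

```python
from typing import Optional


def parse_binary_literal(text: str) -> Optional[int]:
    """Return the value of a binary literal, or None if it is not legal.

    Hypothetical helper for illustration; not defined by the specification.
    """
    # The prefix must be exactly "0b" or "0B", with nothing in between.
    if len(text) < 3 or text[0] != "0" or text[1] not in ("b", "B"):
        return None
    value = 0
    seen_digit = False
    for ch in text[2:]:
        if ch == "_":
            continue  # underscores only space out the digits
        if ch not in ("0", "1"):
            return None  # e.g. the 2 in 0b12
        value = value * 2 + int(ch)
        seen_digit = True
    # A prefix followed only by underscores (0b___) has no digits.
    return value if seen_digit else None


if __name__ == "__main__":
    for sample in ["0b0", "0b_0010_1101__1011_1000", "0b10_", "0b___", "0b12"]:
        print(sample, parse_binary_literal(sample))
```

Callers should compare the result against None rather than test its truthiness, since the legal literal 0b0 has the value 0.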

Octal Integers

Octal integers start with 0o or 0O (a zero followed by the letter O), followed by one or more octal digits (0-7). The digits MAY have any number of underscores between them, but the literal MAY NOT begin with an underscore or have an underscore between the leading 0 and the O.[4] C defines octal literals as those starting with 0. This has proven dangerously confusing, both when programmers accidentally define a literal as octal when they intend it to be decimal, and when programmers intend to write an octal literal but leave the leading zero out.[3]

Due to the similarity between 0 and O, the programmer SHOULD place an underscore between the 0o sequence and the actual digits.

All of the following are legal octal literals:

  • 0o00123
  • 0o_13
  • 0O000

The following are not legal octal literals:

  • 0o___ (no digits)
  • _0o73 (a legal identifier)
  • 0_O11 (underscore between 0 and O)
  • 0o38 (8 is not an octal digit)
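
To see how the octal rule sits alongside the other integer forms, the sketch below classifies a candidate literal in Python. It is a rough illustration under the same literal reading of the rules used in this section. The name classify_integer_literal and the returned labels are hypothetical.

```python
import re
from typing import Optional

# Hypothetical rule table for illustration; not defined by the specification.
_RULES = [
    ("hexadecimal", re.compile(r"0[xX]_*[0-9a-fA-F][0-9a-fA-F_]*")),
    ("binary", re.compile(r"0[bB]_*[01][01_]*")),
    ("octal", re.compile(r"0[oO]_*[0-7][0-7_]*")),
    ("decimal", re.compile(r"0_*|[1-9][0-9_]*")),
]


def classify_integer_literal(text: str) -> Optional[str]:
    """Return the kind of integer literal, or None if text is not one."""
    for name, pattern in _RULES:
        if pattern.fullmatch(text):
            return name
    return None


if __name__ == "__main__":
    for sample in ["0o00123", "0o_13", "0O000", "0o38", "00755", "755"]:
        print(sample, classify_integer_literal(sample))
```

With these rules a C-style 00755 matches none of the forms, so it is reported as an error instead of being silently read as octal or decimal.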

String Literals

Compound Literals

Array Literals

TBD

Identifiers

Rationale

  1. ^ Otherwise it is not possible to know whether this is an integer literal or an identifier.
  2. ^ This is done to avoid confusion with C's octal literals.
  3. ^ As evidenced by programs on Unix creating files with the permission -wxrw--wt, which is what decimal 755 denotes, when they actually intended rwxr-xr-x (octal 755).
  4. ^ While 0 and O are confusingly similar, mistakes are easily spotted by the compiler. A programmer reading the literal might confuse this with two actual zeros, but that just looks like a C octal literal. A programmer writing 00755 instead of 0o755 will cause the compiler to complain that a decimal literal may not begin with 0.
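
The permission strings quoted in the rationale can be reproduced with a short Python sketch. It uses the standard library's stat.filemode. The wrapper permission_string is a hypothetical name introduced here for illustration.

```python
import stat


def permission_string(mode: int) -> str:
    """Render the permission bits of mode in ls style, e.g. rwxr-xr-x.

    Hypothetical helper for illustration. S_IFREG marks the mode as a regular
    file so that the leading file-type character is well defined; that
    character is then dropped.
    """
    return stat.filemode(stat.S_IFREG | mode)[1:]


if __name__ == "__main__":
    print(permission_string(755))    # decimal 755, i.e. octal 1363
    print(permission_string(0o755))  # octal 755
```

Decimal 755 equals octal 1363, which is why the sticky bit shows up in the first result.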