Specifications

Shachar Shemesh edited this page Feb 7, 2019 · 17 revisions

Lexical Analysis

Literals

Literals are values given explicitly in the source code.

Numeric Literals

All integer numeric literals may have any number of underscore characters (_) in them to space out the digits. The only exception is that the underscore may not be the first character of the literal.[1]

Decimal Integers

Decimal integers are written as a sequence of one or more decimal digits (0-9), optionally interspersed with underscore characters. The first character of the literal may not be an underscore. The first character may be the digit 0 only if the literal contains no other digits[2].
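
The decimal rules can be sketched as a small recognizer. This is only an illustration, not the reference lexer: the names DECIMAL_LITERAL and decimal_literal_value are hypothetical, and the sketch assumes that a lone 0 followed by underscores (such as 0_) is legal, since the text only forbids further digits.

```python
import re

# Assumption: "0" may be followed by underscores but not by more digits;
# any other literal starts with 1-9 and continues with digits or underscores.
DECIMAL_LITERAL = re.compile(r"0_*|[1-9][0-9_]*")


def decimal_literal_value(text):
    """Return the integer value of a decimal literal, or None if it is not legal."""
    if DECIMAL_LITERAL.fullmatch(text) is None:
        return None
    # Underscores only space out the digits; they carry no value.
    return int(text.replace("_", ""))


if __name__ == "__main__":
    for sample in ["0", "1_000_000", "12_", "_1", "0755"]:
        print(sample, decimal_literal_value(sample))
```

Under these assumptions 0, 1_000_000 and 12_ are accepted, while _1 (an identifier) and 0755 (leading zero) are rejected.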

Hexadecimal Integers

Hexadecimal integers start with 0x or 0X, followed by one or more hexadecimal digits (0-9 or a-f, in either case). The digits MAY have any number of underscores between them, but the literal MAY NOT begin with an underscore or have an underscore between the leading 0 and the x.

All of the following are legal hexadecimal literals:

  • 0x0
  • 0x___12
  • 0xA
  • 0X12_

The following are not legal hexadecimal literals:

  • 0x___ (no digits)
  • _0x12 (a legal identifier)
  • 0_x12 (underscore between 0 and x)
  • 0xcovfefe ("o" and "v" are not hexadecimal digits).
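
The legal and illegal examples above can be checked with a short sketch. The names HEX_LITERAL and hex_literal_value are hypothetical, and the regular expression is one possible reading of the rules rather than the specification's own definition.

```python
import re

# Prefix 0x/0X, optional underscores, at least one hex digit,
# then any mix of hex digits and underscores.
HEX_LITERAL = re.compile(r"0[xX]_*[0-9a-fA-F][0-9a-fA-F_]*")


def hex_literal_value(text):
    """Return the integer value of a hexadecimal literal, or None if it is not legal."""
    if HEX_LITERAL.fullmatch(text) is None:
        return None
    # Drop the two-character prefix and the spacing underscores before converting.
    return int(text[2:].replace("_", ""), 16)


if __name__ == "__main__":
    for sample in ["0x0", "0x___12", "0xA", "0X12_",
                   "0x___", "_0x12", "0_x12", "0xcovfefe"]:
        print(sample, hex_literal_value(sample))
```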

Binary Integers

Binary integers start with 0b or 0B, followed by one or more binary digits (0 or 1). The digits MAY have any number of underscores between them, but the literal MAY NOT begin with an underscore or have an underscore between the leading 0 and the b.

All of the following are legal binary literals:

  • 0b0
  • 0b_0010_1101__1011_1000
  • 0b10_

The following are not legal binary literals:

  • 0b___ (no digits)
  • _0b11 (a legal identifier)
  • 0_b11 (underscore between 0 and b)
  • 0b12 (2 is not a binary digit)
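
As an alternative to a regular expression, the binary rules can be sketched as a character-by-character scan that accumulates the value as it goes. The function name parse_binary_literal is hypothetical; the sketch only illustrates the rules stated above.

```python
def parse_binary_literal(text):
    """Return the value of a binary literal, or None if it is not legal."""
    # The literal must start with exactly "0b" or "0B": no leading underscore
    # and no underscore between the 0 and the b.
    if len(text) < 3 or text[0] != "0" or text[1] not in "bB":
        return None
    value = 0
    seen_digit = False
    for ch in text[2:]:
        if ch == "_":
            continue  # underscores only space out the digits
        if ch not in "01":
            return None  # e.g. the 2 in 0b12
        value = value * 2 + int(ch)
        seen_digit = True
    # A prefix followed only by underscores (0b___) has no digits.
    return value if seen_digit else None


if __name__ == "__main__":
    for sample in ["0b0", "0b_0010_1101__1011_1000", "0b10_",
                   "0b___", "_0b11", "0_b11", "0b12"]:
        print(sample, parse_binary_literal(sample))
```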

Octal Integers

Octal integers start with 0o or 0O (a zero followed by the letter Oh), followed by one or more octal digits (0-7). The digits MAY have any number of underscores between them, but the literal MAY NOT begin with an underscore or have an underscore between the leading 0 and the O.[4] C defines octal literals as those starting with 0. This has proven dangerously confusing, both when programmers accidentally define a literal as octal when they intend it to be decimal, and when programmers intend to write an octal literal but leave the leading zero out.[3]

Due to the similarity between 0 and O, the programmer SHOULD place an underscore between the 0o sequence and the actual digits.

All of the following are legal octal literals:

  • 0o00123
  • 0o_13
  • 0O000

The following are not legal octal literals:

  • 0o___ (no digits)
  • _0o73 (a legal identifier)
  • 0_O11 (underscore between 0 and O)
  • 0o38 (8 is not an octal digit)
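
The octal rules, together with the SHOULD recommendation about an underscore after the prefix, can be sketched as a checker that returns both a value and any style warnings. The names OCTAL_LITERAL and check_octal_literal and the warning text are hypothetical; how a real compiler would report the recommendation is not specified here.

```python
import re

# Prefix 0o/0O, optional underscores, at least one octal digit,
# then any mix of octal digits and underscores.
OCTAL_LITERAL = re.compile(r"0[oO]_*[0-7][0-7_]*")


def check_octal_literal(text):
    """Return (value, warnings) for a legal octal literal; raise ValueError otherwise."""
    if OCTAL_LITERAL.fullmatch(text) is None:
        raise ValueError(f"not a legal octal literal: {text!r}")
    warnings = []
    # SHOULD rule: an underscore right after the prefix separates the
    # look-alike 0 and o from the digits. Violating it is legal, so only warn.
    if text[2] != "_":
        warnings.append("consider an underscore after the 0o prefix")
    value = int(text[2:].replace("_", ""), 8)
    return value, warnings


if __name__ == "__main__":
    print(check_octal_literal("0o_13"))
    print(check_octal_literal("0o00123"))
```

In this sketch 0o_13 is accepted without warnings, while 0o00123 and 0O000 are accepted with a warning, because the recommendation is a SHOULD and not a MUST.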

String Literals

Compound Literals

Array Literals

TBD

Identifiers

Parser

MODULE -> GLOBAL_EXPRESSIONS_LIST

GLOBAL_EXPRESSIONS_LIST -> ϵ

GLOBAL_EXPRESSIONS_LIST -> GLOBAL_EXPRESSIONS_LIST GLOBAL_EXPRESSION

GLOBAL_EXPRESSION -> FUNC_DEF

FUNC_DEF -> reserved_def FUNC_DECL_BODY COMPOUND_EXPRESSION

FUNC_DECL_BODY -> identifier ( FUNC_DECL_ARGS ) FUNC_DECL_RET

FUNC_DECL_ARGS -> ϵ

FUNC_DECL_ARGS -> FUNC_DECL_ARGS_NONEMPTY

FUNC_DECL_ARGS_NONEMPTY -> FUNC_DECL_ARG

FUNC_DECL_ARGS_NONEMPTY -> FUNC_DECL_ARG , FUNC_DECL_ARGS_NONEMPTY

FUNC_DECL_ARG -> TYPE identifier

FUNC_DECL_RET -> ϵ

FUNC_DECL_RET -> '->' TYPE

STATEMENT -> EXPRESSION ;

STATEMENT -> VARIABLE_DEFINITION ;

EXPRESSION -> COMPOUND_EXPRESSION

EXPRESSION -> LITERAL

EXPRESSION -> identifier

EXPRESSION -> TBD // operators and function calls

LITERAL -> literal_string

LITERAL -> literal_int

LITERAL -> literal_fp

COMPOUND_EXPRESSION -> { STATEMENT_LIST }

COMPOUND_EXPRESSION -> { STATEMENT_LIST EXPRESSION }

STATEMENT_LIST -> ϵ

STATEMENT_LIST -> STATEMENT_LIST STATEMENT

TYPE -> identifier

VARIABLE_DEFINITION -> reserved_def VARIABLE_DECL_BODY

VARIABLE_DEFINITION -> reserved_def VARIABLE_DECL_BODY = EXPRESSION

VARIABLE_DECL_BODY -> identifier : TYPE
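
To illustrate how the productions read, here is a minimal recursive-descent sketch for FUNC_DECL_BODY alone, working on a list of already-separated tokens. Everything outside the grammar is an assumption: the function name parse_func_decl_body, the use of Python's str.isidentifier as a stand-in for the identifier token (the Identifiers section is still empty), and the type name Int in the usage example.

```python
def parse_func_decl_body(tokens):
    """Parse `identifier ( FUNC_DECL_ARGS ) FUNC_DECL_RET` from a list of tokens.

    Returns (name, [(type, arg_name), ...], return_type_or_None).
    Raises SyntaxError if the tokens do not match the productions.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect_identifier():
        nonlocal pos
        tok = peek()
        # Assumption: str.isidentifier approximates the identifier token.
        if tok is None or not tok.isidentifier():
            raise SyntaxError(f"expected identifier, got {tok!r}")
        pos += 1
        return tok

    def expect(symbol):
        nonlocal pos
        if peek() != symbol:
            raise SyntaxError(f"expected {symbol!r}, got {peek()!r}")
        pos += 1

    name = expect_identifier()
    expect("(")
    args = []
    if peek() != ")":  # FUNC_DECL_ARGS -> FUNC_DECL_ARGS_NONEMPTY
        while True:
            arg_type = expect_identifier()  # TYPE -> identifier
            arg_name = expect_identifier()  # FUNC_DECL_ARG -> TYPE identifier
            args.append((arg_type, arg_name))
            if peek() != ",":
                break
            pos += 1  # consume the comma and parse the next argument
    expect(")")
    return_type = None
    if peek() == "->":  # FUNC_DECL_RET -> '->' TYPE
        pos += 1
        return_type = expect_identifier()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return name, args, return_type


if __name__ == "__main__":
    print(parse_func_decl_body(
        ["add", "(", "Int", "a", ",", "Int", "b", ")", "->", "Int"]))
```

The loop form replaces the right-recursive FUNC_DECL_ARGS_NONEMPTY production; it accepts the same token sequences, and a trailing comma is rejected because an argument must follow every comma.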

Rationale

  1. ^ Otherwise it is not possible to know whether this is an integer literal or an identifier.
  2. ^ This is done to avoid confusion with C's octal literals.
  3. ^ As evidenced by programs on Unix creating files with the permission -wxrw--wt (the mode that results from decimal 755) when rwxr-xr-x (octal 755) was intended.
  4. ^ While 0 and O are confusingly similar, mistakes are easily spotted by the compiler. A programmer reading the literal might mistake the prefix for two actual zeros, but that just looks like a C octal literal. A programmer who writes 00755 instead of 0o755 will get a compiler error stating that a decimal literal may not begin with 0.
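
The permission mix-up described in the rationale (decimal 755 versus octal 755) can be verified with Python's standard stat module. The helper name permission_string is hypothetical; stat.filemode and stat.S_IFREG are standard library names.

```python
import stat


def permission_string(mode):
    """Render permission bits the way `ls -l` does, without the file-type character."""
    # S_IFREG marks a regular file so filemode() emits "-" as the type
    # character, which is then sliced off.
    return stat.filemode(stat.S_IFREG | mode)[1:]


if __name__ == "__main__":
    print(oct(755))                  # decimal 755 expressed in octal
    print(permission_string(0o755))  # what the programmer intended
    print(permission_string(755))    # what a missing octal prefix produces
```

Decimal 755 is octal 1363, which sets the sticky bit and yields -wxrw--wt, while octal 755 yields the intended rwxr-xr-x.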