Skip to content

Language Design and Theory of Computation

andychu edited this page Jul 20, 2020 · 25 revisions

Undecidable Parsing

Deciding whether to print "syntax error" shouldn't require simulating a Turing machine.

  • Parsing Bash is Undecidable
    • Bash, Perl, and Make can't be statically parsed because they interleave parsing and execution.
    • C++ can't be statically parsed because it has two Turing complete computation mechanisms -- templates at compile time and normal code at runtime -- and parsing sits in between them.

Accidentally Turing Complete

  • Shell wasn't Turing complete when first designed.

  • Make wasn't Turing complete when first designed. See Example Code in Shell, Awk, and Make.

  • sedlisp is proof that sed is Turing complete. Not sure what constructs are used.

  • Accidentally Turing Complete -- nice comprehensive list, including some games which we don't care about here

  • What about TeX and PostScript? I suspect those were both intended to be Turing Complete, although the syntax is odd.

  • CSS3 proven to be Turing Complete --

  • printf is Turing complete. %n specifier writes to memory?

    When we control the arguments to printf(), it is possible to obtain Turing-complete computation. We show this formally in Appendix B by giving calls to printf() which create logic gates.

  • x86 MMU Fault Handling

    • trapcc -- Compute with 0 instructions on Intel!

    This is a proof by construction that the Intel MMU's fault handling mechanism is Turing complete. We have constructed an assembler that translates 'Move, Branch if Zero, Decrement' instructions to C source that sets up various processor control tables. After this code has executed, the CPU computes by attempting to fault without ever executing a single instruction.

Context-Free Grammars are not Powerful Enough in Practice

Parsing is Difficult

TODO: Move to Parsing Is Difficult page?

Humans are better than computers at parsing programming languages.

To-do: collect instances of people arguing for grammars . For tools, IDEs?

Yacc isn't Powerful Enough in Practice

  • Zinc Abstract Machine paper for CamlLight: the yacc parser had ~50 reduce/reduce conflicts and ~500 shift/reduce conflicts, or it had to be written in a convoluted style. Problem: Caml doesn't have closing delimiters.

  • CoffeeScript video: "Rise of Transpilers": "cheating" by adding fake tokens in the input

  • JRuby: cheating by massaging output

  • How Clang handles the type/variable name ambiguity of C++ -- My gripe is not with the grammar itself (although I admit it's needlessly complex), it's with the inability of Yacc-generated LALR(1) parsers to parse it without considerable hacks. As I've mentioned numerous times before, industrial-strength compilers for C/C++ exist after all, so they do manage to somehow parse these languages.

Alternative idea: parsers should be architected as pure functions from tokens to AST... (Doesn't matter what computational model) And then you can instrument them and empirically determine backtracking/efficiency ?

  • TODO Review/Summarize papers
    • Yakker -- scannerless, code available in OCaml
    • Niall -- for data formats, length prefixes and checksums, etc.
    • Pure Declarative Syntax Lost and Regained (GLR). From Nix authors.
    • Langsec: this is a different but related issue. CFG is the wrong model. Models should be based on reality.
      • finite vs infinite languages. Length prefixes.

Sweet spot? Context-sensitive / Turing complete, linear time lexing, and statically parseable

Lexing and Parsing should be Separate.

PEGs combine them, only for reasons of presentation.

Had this discussion twice: at work about GLR, and maybe about Gazelle. And on HN: https://news.ycombinator.com/item?id=13042835

Wrote up a wiki page that was popular: Why Lexing and Parsing Should Be Separate

Related Issues

  • Turing completeness vs. side effects/capabilities. Both of these are design issues and shouldn't be conflated.
  • Whether there is abstraction power like functions. I think Awk was Turing complete at first, but it didn't have functions. Related lobsters thread.
Clone this wiki locally