24. Changing CPython’s Grammar

24.1. Abstract

There’s more to changing Python’s grammar than editing Grammar/Grammar. This document aims to be a checklist of places that must also be fixed.

It is probably incomplete. If you see omissions, submit a bug or patch.

This document is not intended to be an instruction manual on Python grammar hacking, for several reasons.

24.2. Rationale

People are getting this wrong all the time; it took well over a year before someone noticed that adding the floor division operator (//) broke the parser module.

24.3. Checklist

Note: sometimes things mysteriously don’t work. Before giving up, try make clean.

  • Grammar/Grammar: OK, you’d probably worked this one out. :-) After changing it, run make regen-grammar to regenerate Include/graminit.h and Python/graminit.c. (This runs Python’s parser generator, Parser/pgen.)
  • Grammar/Tokens is the place to add new token types. After changing it, run make regen-token to regenerate Include/token.h, Parser/token.c, Lib/token.py and Doc/library/token-list.inc. If you change both Grammar and Tokens, run make regen-token before make regen-grammar.
  • Parser/Python.asdl may need changes to match the Grammar. Then run make regen-ast to regenerate Include/Python-ast.h and Python/Python-ast.c.
  • Parser/tokenizer.c contains the tokenization code. This is where you would add a new type of comment or string literal, for example.
  • Python/ast.c will need changes to create the AST objects involved with the Grammar change.
  • The compiler may need changes too; it is covered on its own page, Design of CPython’s Compiler.
  • The parser module. Add some of your new syntax to test_parser, then bang on Modules/parsermodule.c until the test passes.
  • Add some usage of your new syntax to test_grammar.py.
  • Certain changes may require tweaks to the library module pyclbr.
  • Lib/tokenize.py needs changes to match changes to the tokenizer.
  • Lib/lib2to3/Grammar.txt may need changes to match the Grammar.
  • Documentation must be written!
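
Once the regenerated files are built, a quick sanity check is to confirm that the tokenizer library and the parser agree on the new syntax. The following is only a minimal sketch, not part of the checklist itself: it uses the existing floor division operator (//) as a stand-in for a new operator, and the helper names operator_tokens and binop_name are hypothetical. It relies only on the standard tokenize, token and ast modules.

```python
import ast
import io
import token
import tokenize


def operator_tokens(source):
    """Return (exact token name, text) pairs for the operators in source."""
    # Hypothetical helper: tokenizes with Lib/tokenize.py, not the C tokenizer.
    readline = io.StringIO(source).readline
    return [
        (token.tok_name[tok.exact_type], tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.type == token.OP
    ]


def binop_name(source):
    """Parse one binary expression and return its AST operator class name."""
    # Hypothetical helper: assumes source is a single binary operation.
    tree = ast.parse(source, mode="eval")
    return type(tree.body.op).__name__


# "//" is used here only because it is an operator that already exists.
print(operator_tokens("7 // 2"))  # [('DOUBLESLASH', '//')]
print(binop_name("7 // 2"))       # FloorDiv
```

For a real grammar change, substitute a snippet of the new syntax and run the script with the freshly built interpreter. A SyntaxError or an unexpected token name suggests that one of the items above was missed.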