Implements new katscript v3 grammar using a new "packrat" parser. See last comments in #240 (closed) for details.
- Finish model compilation:
  - Graph filler
  - Handling of self-referencing parameters
  - Tests for self-referencing parameters
  - Graph -> tokens
  - Model -> graph
  - How do we want to handle symbols in info parameters? (See below)
- Update syntax documentation
- Developer documentation: overview of how the parser works, how to add new components, etc.
- Finish documentation syntax generator (fix `finesse_sphinx`; could use a different generator)
- Restore disabled unparse examples in
- Show arguments that cause errors in
- CLI: add parser debug CLI commands
- Fuzzing tests: generate lines of valid script and check they parse/build
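For the fuzzing item, a minimal sketch of the idea: generate random lines that should be syntactically valid and check each one is accepted. Everything here is invented for illustration (the element names and the `looks_parseable` stand-in); the real test would call the actual kat script parser instead.

```python
import random

# Assumed element names, purely for illustration.
COMPONENTS = ["mirror", "laser", "modulator"]

def random_line(rng):
    """Generate a random, syntactically valid-looking kat script line."""
    name = rng.choice(COMPONENTS)
    label = f"{name[0]}{rng.randint(0, 99)}"
    args = " ".join(str(round(rng.uniform(0, 10), 3)) for _ in range(rng.randint(1, 3)))
    return f"{name} {label} {args}"

def looks_parseable(line):
    # Stand-in for the real parser call: check the basic token shape.
    tokens = line.split()
    return len(tokens) >= 2 and tokens[0] in COMPONENTS

rng = random.Random(0)
lines = [random_line(rng) for _ in range(100)]
assert all(looks_parseable(line) for line in lines)
```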
Differences in behaviour from the Python API
As per the behaviour decided in the telecon that discussed !40 (closed) (noted there), specifying `R=x.T` in kat script (i.e. a reference to another parameter without the `&` reference operator) makes the parser resolve `x.T`'s value and assign it to `R` directly during parsing, passing the value (now a float or whatever) into the Python constructor. Meanwhile, if using the Python API directly and the user specifies `R=x.T`, it should throw an error saying the user should specify either the parameter's current value or an explicit reference.
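A toy sketch of this split behaviour, using invented stand-in classes (not Finesse's real `Parameter` or element API):

```python
# Hypothetical sketch of the decided behaviour; names are invented.
class Parameter:
    def __init__(self, value):
        self.value = value

class Mirror:
    def __init__(self, T):
        # Python API path: passing a bare parameter object is an error;
        # the user must pass the value or an explicit reference instead.
        if isinstance(T, Parameter):
            raise ValueError("pass the parameter's value or an explicit reference")
        self.T = Parameter(T)

x = Mirror(0.1)
# Kat script path: the parser resolves the reference to its value
# before calling the constructor, so this succeeds.
R = Mirror(x.T.value)
assert R.T.value == 0.1

# Direct Python API path: passing the parameter itself raises.
raised = False
try:
    Mirror(x.T)
except ValueError:
    raised = True
assert raised
```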
Changes to core
`Model.run` defaults to `Noxaxis` if not specified
Previously the parser set the analysis to `Noxaxis()` if it wasn't specified in kat script, but this of course didn't do anything for Python API users. I figured it made more sense for `Model.run` to handle this default, so I moved the behaviour there.
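A minimal sketch of what moving the default into `Model.run` amounts to (class and method bodies are invented stand-ins, not Finesse's real implementation):

```python
# Stand-in classes to illustrate the default moving out of the parser.
class Noxaxis:
    def run(self, model):
        return f"noxaxis result for {model.name}"

class Model:
    def __init__(self, name):
        self.name = name
        self.analysis = None  # the parser no longer injects a default here

    def run(self, analysis=None):
        # Default to Noxaxis when no analysis was given either here or in
        # the parsed script, so Python API users get the default too.
        analysis = analysis or self.analysis or Noxaxis()
        return analysis.run(self)

model = Model("demo")
assert model.run() == "noxaxis result for demo"
```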
Some checking tools like `pyflakes` had to be updated because I used some walrus operators. Development environments will need to be updated when this gets merged.
Symbols in info parameters
In the example below, the `order` parameter is an expression involving a symbol. However, `order` is not a model parameter, just an info parameter, so this will (probably) fail at the simulation stage. My opinion is that the parser should allow this since it's still possible to do via the Python API. We should probably then catch this earlier than the simulation, such as in the validation stage I discuss in #246.
```
var order 2
modulator mod1 10M 0.1 1+&order.value
```
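A hypothetical sketch of what such a validation-stage check could look like; the parameter sets and the symbol-detection heuristic are entirely invented, not Finesse's API:

```python
# Invented parameter classification for illustration only.
MODEL_PARAMETERS = {"f", "midx"}   # parameters the simulation can vary
INFO_PARAMETERS = {"order"}        # informational only

def validate(element_kwargs):
    """Reject symbolic expressions assigned to info parameters."""
    errors = []
    for name, value in element_kwargs.items():
        # Crude stand-in heuristic: '&' marks a symbolic reference.
        is_symbolic = isinstance(value, str) and "&" in value
        if name in INFO_PARAMETERS and is_symbolic:
            errors.append(f"info parameter '{name}' cannot be symbolic: {value!r}")
    return errors

errs = validate({"f": "10M", "midx": "0.1", "order": "1+&order.value"})
assert errs == ["info parameter 'order' cannot be symbolic: '1+&order.value'"]
```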
Registering custom elements
`KatSpec` is no longer a singleton. This means it's slightly more difficult to register custom elements etc. than it was before. However, we might consider a different approach altogether for custom components, such as specifying their Python path / file path in the config. Needs discussion.
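To illustrate the non-singleton consequence, a toy sketch (the `KatSpec` shown here and its `register_element` method are invented stand-ins; the real class and registration API may differ):

```python
# Stand-in KatSpec: registrations now live on an instance, not a
# process-wide singleton, so each spec must be configured separately.
class KatSpec:
    def __init__(self):
        self.elements = {}

    def register_element(self, directive, cls):
        self.elements[directive] = cls

class MyCustomElement:
    pass

spec = KatSpec()
spec.register_element("my_element", MyCustomElement)
assert spec.elements["my_element"] is MyCustomElement

# A fresh spec knows nothing about the custom element.
assert "my_element" not in KatSpec().elements
```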
Why use a packrat parser?
- Infinite lookahead at the expense of more memory, but memory is cheap now. An entire program in memory is tiny compared to a photo taken with your smartphone...
- One function per grammar rule; recursive stack based descent, in contrast to LALR etc. that build a parsing table (push-down automaton).
- Lets you easily parse e.g. functions and statements that differ only in what's inside the parentheses; an LALR parser would only look at the `(` and not know what to do.
- Useful features not found in LALR etc.: lookahead assertions (like in regex), which let you deal with contextful grammars.
- Conceptually simpler to debug. Shift/reduce conflicts in LALR can be pretty hard to figure out.
- Allows left recursion in grammar rules, like `key_value_list <- key_value_list ',' key_value_list`; LALR would refuse to build the parser.
- Syntax errors can be detected by implementing rules just like the real rules (e.g. a rule that matches args following kwargs), rather than having to work the error out from the error handler.
- Blog posts/videos from Guido van Rossum on PEG parsing cover these points.
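The "one function per grammar rule" and memoisation points above can be illustrated with a toy packrat parser for a simple expression grammar. This is purely illustrative and unrelated to the actual kat script grammar:

```python
# Toy packrat parser: one method per grammar rule, with results memoised
# by (rule, position) so backtracking never re-parses the same span.
# Grammar:
#   expr <- term '+' expr / term
#   term <- NUMBER / '(' expr ')'
import re

class Packrat:
    def __init__(self, text):
        self.tokens = re.findall(r"\d+|[+()]", text)
        self.memo = {}

    def _rule(self, name, pos, fn):
        key = (name, pos)
        if key not in self.memo:
            self.memo[key] = fn(pos)
        return self.memo[key]  # (value, next_pos) or None on failure

    def expr(self, pos):
        def parse(pos):
            left = self.term(pos)
            if left is None:
                return None
            value, pos2 = left
            # Try "term '+' expr" first; fall back to the plain term.
            if pos2 < len(self.tokens) and self.tokens[pos2] == "+":
                rest = self.expr(pos2 + 1)
                if rest is not None:
                    rvalue, pos3 = rest
                    return value + rvalue, pos3
            return value, pos2
        return self._rule("expr", pos, parse)

    def term(self, pos):
        def parse(pos):
            if pos < len(self.tokens) and self.tokens[pos].isdigit():
                return int(self.tokens[pos]), pos + 1
            if pos < len(self.tokens) and self.tokens[pos] == "(":
                inner = self.expr(pos + 1)
                if inner and inner[1] < len(self.tokens) and self.tokens[inner[1]] == ")":
                    return inner[0], inner[1] + 1
            return None
        return self._rule("term", pos, parse)

result = Packrat("1+(2+3)").expr(0)
assert result == (6, 7)  # value 6, all 7 tokens consumed
```

The memo table is what makes the unlimited backtracking affordable: each (rule, position) pair is parsed at most once, trading memory for linear-time parsing.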
Why use a new tokeniser?
- Need to keep raw token values around for re-generation. SLY and friends usually throw this information away at some point, e.g. by converting a number token's raw string straight into a float.
- Also needed to be able to display token values in errors correctly; tabs need to be assigned a given number of spaces.
- Using the same container objects for tokens as for parser productions makes error messages easier to build.
- Lots of "magic" happens in SLY's tokeniser that is hard to understand; the new tokeniser is conceptually simpler.
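The "keep the raw value" point can be sketched as a token container that stores the original source text alongside the converted value (the `Token` class and `tokenise_number` helper here are invented for illustration, not the new tokeniser's actual API):

```python
from dataclasses import dataclass

@dataclass
class Token:
    type: str
    raw: str       # exactly what appeared in the source, kept for unparsing
    value: object  # converted form used by the parser

def tokenise_number(text):
    # "10M"-style SI suffixes: keep the raw string, convert for the value.
    suffixes = {"M": 1e6, "k": 1e3}
    if text[-1] in suffixes:
        return Token("NUMBER", text, float(text[:-1]) * suffixes[text[-1]])
    return Token("NUMBER", text, float(text))

tok = tokenise_number("10M")
assert tok.value == 10e6   # converted value for the parser
assert tok.raw == "10M"    # raw text survives for script re-generation
```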