
Remove query parsers' dependence on database state

There are several database query mini-languages written with the pyparsing module. The very bad decision was made to make these languages depend on the state of the database, by treating things like labels and pipeline names as reserved words. This means any addition to the set of these values requires recompiling the parser, so in practice it is recompiled for every query. Speed considerations aside, this adds serious complexity to the parsers, and it means the parser can be broken by adding a badly named or non-unique value to one of the tables.
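
The coupling can be illustrated with a minimal, stdlib-only sketch. It is not the real pyparsing grammar. The query syntax (bare names joined by `and`), `build_lexer`, `QuerySyntaxError` and the `labels`/`pipelines` sets are all hypothetical stand-ins for the actual tables. The point is that the token vocabulary is derived from table contents, so it has to be rebuilt whenever the tables change, and one duplicated value is enough to break it.

```python
class QuerySyntaxError(Exception):
    """Raised when a query (or the grammar itself) cannot be built or tokenized."""


def build_lexer(labels, pipelines):
    """Return a tokenizer whose vocabulary is the *current* database contents.

    Hypothetical sketch: because the vocabulary comes from the tables, this must
    be rebuilt whenever a label or pipeline is added, i.e. once per query.
    """
    kinds = {}
    for kind, names in (("label", labels), ("pipeline", pipelines)):
        for name in names:
            if name in kinds:
                # A value present in two tables makes the token set ambiguous,
                # so no lexer can be built at all.
                raise QuerySyntaxError(f"{name!r} is both a {kinds[name]} and a {kind}")
            kinds[name] = kind

    def tokenize(query):
        tokens = []
        for word in query.split():
            if word == "and":
                # Checked first, so a label literally named "and" can never match.
                tokens.append(("AND", word))
            elif word in kinds:
                tokens.append((kinds[word].upper(), word))
            else:
                # A well-formed but unknown name is indistinguishable from garbage.
                raise QuerySyntaxError(f"unexpected token {word!r}")
        return tokens

    return tokenize


lex = build_lexer(labels={"bug", "urgent"}, pipelines={"nightly"})
print(lex("bug and nightly"))
# [('LABEL', 'bug'), ('AND', 'and'), ('PIPELINE', 'nightly')]

try:
    # Adding "nightly" as a label breaks every query, even ones that never mention it.
    build_lexer(labels={"bug", "nightly"}, pipelines={"nightly"})
except QuerySyntaxError as exc:
    print(exc)  # 'nightly' is both a label and a pipeline
```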

A much better approach would be to add a generic "identifier" token to the language, and then resolve it against the database state at code-generation time.

To use Python as an analogy, consider what happens if one tries to access an undefined variable:

>>> foo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'foo' is not defined

Note that this isn't a SyntaxError: Python knows foo is a valid identifier, it just isn't bound to anything. The parser read my statement without issue, and it was the later name-resolution step (which Python performs at run time) that correctly identified the missing name. This should be how our query language works as well.
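
A minimal sketch of that two-phase design, under the same hypothetical assumptions: names joined by `and`, with in-memory `labels`/`pipelines` sets standing in for the tables. `parse`, `resolve`, `Identifier`, `ResolutionError` and the identifier regex are illustrative only, not existing code. The parser accepts any identifier-shaped word and never touches the database. A separate resolution step binds each name and reports unknown or ambiguous names, much like Python's `NameError`.

```python
import re
from dataclasses import dataclass

# Hypothetical identifier shape; the real language may allow other characters.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


class QuerySyntaxError(Exception):
    """The query text is malformed; independent of database contents."""


class ResolutionError(Exception):
    """The query is well formed but a name is unknown or ambiguous (cf. NameError)."""


@dataclass(frozen=True)
class Identifier:
    name: str


def parse(query):
    """Parse 'name and name and ...' into Identifier nodes without any DB access."""
    words = query.split()
    if not words:
        raise QuerySyntaxError("empty query")
    nodes = []
    # Identifiers sit at even positions, the fixed keyword "and" at odd positions.
    for position, word in enumerate(words):
        if position % 2:
            if word != "and":
                raise QuerySyntaxError(f"expected 'and', got {word!r}")
        elif word == "and" or not IDENT.fullmatch(word):
            raise QuerySyntaxError(f"expected an identifier, got {word!r}")
        else:
            nodes.append(Identifier(word))
    if len(words) % 2 == 0:
        raise QuerySyntaxError("query ends with 'and'")
    return nodes


def resolve(nodes, labels, pipelines):
    """Code-generation phase: bind each identifier against the current DB state."""
    resolved = []
    for node in nodes:
        kinds = [kind for kind, names in (("label", labels), ("pipeline", pipelines))
                 if node.name in names]
        if not kinds:
            raise ResolutionError(f"name {node.name!r} is not defined")
        if len(kinds) > 1:
            raise ResolutionError(f"name {node.name!r} is ambiguous ({' or '.join(kinds)})")
        resolved.append((kinds[0], node.name))
    return resolved


tree = parse("bug and nightly")  # no database access needed
print(resolve(tree, labels={"bug"}, pipelines={"nightly"}))
# [('label', 'bug'), ('pipeline', 'nightly')]

try:
    resolve(parse("bug and typo"), labels={"bug"}, pipelines={"nightly"})
except ResolutionError as exc:
    print(exc)  # name 'typo' is not defined
```

Because the grammar is fixed, in this sketch it never needs rebuilding when tables change, and a value that appears in two tables only causes an error in queries that actually mention it.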
