Remove query parsers' dependence on database state
There are several database querying mini-languages written using the pyparsing module. It was a very bad decision to have these languages depend on the state of the database by making things like labels and pipeline names reserved words. Any addition to the set of these values requires recompiling the parser, so in practice it is recompiled for every query. Speed considerations aside, this adds serious complexity to the parsers, and it means the parser can be broken by adding a badly named or non-unique value to one of the tables.
A much better approach would be to add a generic "identifier" token to the language. Then at code-generation time it would be resolved based on the database state.
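A minimal sketch of that split, assuming a made-up query form of "<field> = <name>". The grammar, the UnknownNameError class, the generate() function, the known_names mapping (standing in for a database lookup), and the SQL-like output are all hypothetical and not taken from the existing parsers. The point is only that the grammar is built once and never looks at the database:

```python
from pyparsing import Suppress, Word, alphanums, alphas, oneOf

# Generic identifier token: the grammar never consults the database,
# so it can be built once at import time and reused for every query.
identifier = Word(alphas + "_", alphanums + "_")

# Hypothetical query form: <field> = <identifier>, e.g. "label = urgent".
field = oneOf("label pipeline")
query = field("field") + Suppress("=") + identifier("name")


class UnknownNameError(Exception):
    """Raised at code-generation time when a name is not in the database."""


def generate(query_text, known_names):
    """Parse with the static grammar, then resolve the name.

    known_names stands in for a database lookup: {field: {name: row_id}}.
    Malformed input fails in parseString with a pyparsing ParseException;
    well-formed input with an unknown name fails here instead.
    """
    parsed = query.parseString(query_text, parseAll=True)
    ids = known_names.get(parsed["field"], {})
    if parsed["name"] not in ids:
        raise UnknownNameError(
            "%s %r is not defined" % (parsed["field"], parsed["name"])
        )
    # Hypothetical output: a parameterized SQL fragment and its arguments.
    return "%s_id = ?" % parsed["field"], [ids[parsed["name"]]]
```

With this shape, adding a new label to the database changes only what known_names contains. The parser is untouched, and a badly named value can at worst be unreachable from a query, not break the grammar.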
To use Python as an analogy, consider what happens when one tries to access an undefined variable:
>>> foo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'foo' is not defined
Note that this isn't a SyntaxError: Python knows foo is a valid identifier, it is just unbound. The parser read my statement without issue, and the missing name was only caught afterwards, when Python tried to look it up. This is how our query language should work as well.
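The two phases in the Python analogy can be observed separately with the built-in compile() and eval(). This is only an illustration of the analogy, not of the query parsers themselves. The "<query>" filename is an arbitrary placeholder.

```python
# Parsing alone: compile() accepts any syntactically valid expression,
# whether or not the names in it are bound.
code = compile("foo", "<query>", "eval")

# Name resolution happens later, when the compiled code is evaluated.
try:
    eval(code, {})
    outcome = "resolved"
except NameError as exc:
    outcome = "NameError: %s" % exc

# A malformed expression, by contrast, is rejected by the parser itself.
try:
    compile("foo bar", "<query>", "eval")
    syntax_ok = True
except SyntaxError:
    syntax_ok = False
```

Here outcome records a NameError and syntax_ok ends up False: the unbound name passes the parser and fails only on lookup, while the malformed input never gets past parsing.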