These trees are rewritten to contain only the nodes ultimately needed by the Executor. This means that nodes not expected in later stages of query execution can't be printed, for example INSERT/UPDATE/DELETE nodes, since these are all later aggregated into a single node.

In short, give pyparsing a try: it will most likely be powerful enough to do what you need, and its simple integration with Python (easy callbacks and error handling) will make the experience pretty painless.
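To make the "callbacks and error handling" point concrete, here is a minimal sketch of a pyparsing grammar for a tiny SELECT subset. The grammar, the `parse_select` helper and the shape of its return value are illustrative assumptions, not code from any of the projects mentioned in this thread; it needs the third-party `pyparsing` package.

```python
from pyparsing import (
    CaselessKeyword, Group, Optional, ParseException, Suppress,
    Word, ZeroOrMore, alphanums, alphas, nums, oneOf,
)

SELECT = CaselessKeyword("select")
FROM = CaselessKeyword("from")
WHERE = CaselessKeyword("where")

identifier = Word(alphas + "_", alphanums + "_")

# Callback (parse action): turn the matched digits into an int.
integer = Word(nums)
integer.setParseAction(lambda tokens: int(tokens[0]))

comparison = oneOf("= != < > <= >=")
column_list = Group(identifier + ZeroOrMore(Suppress(",") + identifier))
condition = Group(identifier + comparison + integer)

select_stmt = (
    SELECT + column_list("columns")
    + FROM + identifier("table")
    + Optional(WHERE + condition("where"))
)


def parse_select(sql):
    """Hypothetical helper: parse a tiny SELECT subset into a dict."""
    try:
        result = select_stmt.parseString(sql, parseAll=True)
    except ParseException as exc:
        # Error handling: re-raise with the failure position attached.
        raise ValueError(f"invalid query: {exc}") from exc
    return {
        "columns": list(result["columns"]),
        "table": result["table"],
        "where": list(result["where"]) if "where" in result else None,
    }
```

Calling `parse_select("SELECT name, age FROM users WHERE age > 30")` gives the column names, the table name and the condition as plain Python values; anything outside this small subset raises `ValueError`.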

This poster on the pyparsing wiki just reported completing a SQL SELECT parser; perhaps you could contact him/her for help, suggestions, or even the code.

I am looking into using ANTLR to produce an AST that represents the SQL as a relational algebra expression. I have never implemented a parser before, and I would therefore like some advice on how to best implement a SQL parser and evaluator.
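As a rough illustration of the target representation, independent of how the parser itself is built, a relational algebra tree and its evaluator could look like the sketch below. The node classes (`Relation`, `Selection`, `Projection`) and the `evaluate` function are hypothetical names chosen for this example; it does not use ANTLR and covers only selection and projection over in-memory rows.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

Row = Dict[str, object]


@dataclass(frozen=True)
class Relation:
    """Leaf node: a named base table."""
    name: str


@dataclass(frozen=True)
class Selection:
    """Keep only the rows for which the predicate holds (WHERE)."""
    predicate: Callable[[Row], bool]
    source: "Node"


@dataclass(frozen=True)
class Projection:
    """Keep only the listed columns (the SELECT list)."""
    columns: Tuple[str, ...]
    source: "Node"


Node = Union[Relation, Selection, Projection]


def evaluate(node: Node, tables: Dict[str, List[Row]]) -> List[Row]:
    """Evaluate a relational algebra tree against in-memory tables."""
    if isinstance(node, Relation):
        return list(tables[node.name])
    if isinstance(node, Selection):
        return [row for row in evaluate(node.source, tables)
                if node.predicate(row)]
    if isinstance(node, Projection):
        return [{col: row[col] for col in node.columns}
                for row in evaluate(node.source, tables)]
    raise TypeError(f"unknown node type: {type(node).__name__}")


# SELECT name FROM users WHERE age > 30
tree = Projection(
    ("name",),
    Selection(lambda row: row["age"] > 30, Relation("users")),
)
tables = {"users": [{"name": "Ann", "age": 41}, {"name": "Bob", "age": 25}]}
```

A parser, whether generated by ANTLR or written by hand, would then only need to build such a tree; evaluation stays a separate, easily testable step.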

Update: I implemented a simple SQL parser using pyparsing.

It provides support for parsing, splitting and formatting SQL statements.

Visit the project page for additional information and documentation.

Postgres already offers tooling to create a query tree from a given query string. This approach isn't without caveats, though, since the nodeToString code is used only for printing out planned query trees.

In my case, I essentially only needed a WHERE clause. I tried Booleano (a boolean expression parser) written with pyparsing but ended up using pyparsing from scratch.

```python
from itertools import chain
import re


def sequence(*funcs):
    # Base case: an empty sequence matches without consuming anything.
    if len(funcs) == 0:
        def result(src):
            yield (), src
        return result

    # Apply the first parser, then the rest of the sequence to what is left.
    def result(src):
        for arg1, src in funcs[0](src):
            for others, src in sequence(*funcs[1:])(src):
                yield (arg1,) + others, src
    return result


# number_regex = re.compile(r"(-? ...   (the pattern is cut off in the source)
```

```python
# ... (the listing picks up in the middle of a function)
    )(src):
        yield [value] + values, src
        return
    for value, src in parse_value(src):
        yield [value], src


parse_left_curly_bracket = parse_word("{")
parse_right_curly_bracket = parse_word("}")
parse_empty_object = sequence(parse_left_curly_bracket, parse_right_curly_bracket)


def parse_object(src):
    for _, src in parse_empty_object(src):
        yield {}, src
        return
    for (_, items, _), src in sequence(
        parse_left_curly_bracket,
        parse_comma_separated_keyvalues,
        parse_right_curly_bracket,
    )(src):
        yield items, src


parse_colon = parse_word(":")


def parse_keyvalue(src):
    for (key, _, value), src in sequence(
        parse_string, parse_colon, parse_value
    )(src):
        yield {key: value}, src


def parse_comma_separated_keyvalues(src):
    for (keyvalue, _, keyvalues), src in sequence(
        parse_keyvalue,
        parse_comma,
        parse_comma_separated_keyvalues,  # recursion again here, don't miss it
    )(src):
        keyvalue.update(keyvalues)
        yield keyvalue, src
        return
    for keyvalue, src in parse_keyvalue(src):
        # unfortunately, Python cannot return another generator from a generator
        yield keyvalue, src


def parse(s):
    # our tokenizer strips whitespace after tokens but does not tolerate it before them
    s = s.strip()
    match = list(parse_value(s))
    if len(match) != 1:
        # I'll talk about debugging another time :)
        raise ValueError("not a valid JSON string")
    result, src = match[0]
    if src.strip():
        # we parsed something, but there is still text left in the string
        raise ValueError("not a valid JSON string")
    return result
```

I am fully aware that this is just a new form for good old recursive descent; but if programming is an art, isn't form important in it, if not on a par with content, then at least to a degree close to it?

As usual, don't hesitate to message me privately about any inaccuracies or spelling, grammatical and factual errors you find, otherwise I will burn with shame!
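Since the listing above survives only in fragments, here is a small self-contained sketch of the same idea: parsers as generators that yield `(value, remaining_text)` pairs, combined with a `sequence` helper. It parses bracketed lists of integers rather than JSON. `parse_word` and `parse_regex` are assumed implementations written for this example (the original definitions are not shown), and unlike the fragments it tries every alternative instead of returning after the first match.

```python
import re


def parse_word(word):
    """Parser for a literal; strips whitespace after the match."""
    def result(src):
        if src.startswith(word):
            yield word, src[len(word):].lstrip()
    return result


def parse_regex(pattern, convert):
    """Parser for a regex match, converted with `convert`."""
    regex = re.compile(pattern)

    def result(src):
        match = regex.match(src)
        if match:
            yield convert(match.group(0)), src[match.end():].lstrip()
    return result


def sequence(*funcs):
    """Run parsers one after another, yielding a tuple of their values."""
    if not funcs:
        def result(src):
            yield (), src
        return result

    def result(src):
        for first, rest in funcs[0](src):
            for others, remaining in sequence(*funcs[1:])(rest):
                yield (first,) + others, remaining
    return result


parse_number = parse_regex(r"-?\d+", int)
parse_comma = parse_word(",")


def parse_numbers(src):
    # Longer alternative: number, comma, more numbers (recursive).
    for (value, _, values), rest in sequence(
        parse_number, parse_comma, parse_numbers
    )(src):
        yield [value] + values, rest
    # Shorter alternative: a single number.
    for value, rest in parse_number(src):
        yield [value], rest


parse_list = sequence(parse_word("["), parse_numbers, parse_word("]"))


def parse(text):
    """Return the list of integers, or raise ValueError."""
    for (_, values, _), rest in parse_list(text.strip()):
        if not rest:  # accept only if the whole input was consumed
            return values
    raise ValueError("not a valid list of integers")
```

Because every parser is a generator, a failed alternative simply yields nothing and the caller moves on to the next one, which is how backtracking falls out of plain `for` loops here.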