我想解析一条SQL Select语句,它具有像MySQL这样的普通SQL方言所具有的所有功能.我寻找了用Python编写的解析库,但找不到可以完成这项工作的解析库.我的意思是我找到了一些解析库,但它们只能解析基本的SELECT语句(FROM和WHERE,甚至不能进行ORDER BY).因此,作为替代,我制作了自己的解析器(我知道这根本不是一个很好的解决方案).我花了几个小时来处理它,但我一直收到一个奇怪的错误,不知道如何处理它. 在我展示代码之前,我只想提到,如果您知道一个能够解析SQL语句(不仅是SELECT语句,还可以是CREATE TABLE、INSERT等)的Python库,请让我知道.
以下是我的语言语法字符串:
select_grammar = """
start: select_statement ";"
select_statement: "SELECT" column_list "FROM" table_list join_list? where_clause? groupby_clause? having_clause? orderby_clause?
column_list: "*" | column_expr ("," column_expr)*
column_expr: function_call | column_name | subquery
column_name: (table_name ".")? NAME ("AS" NAME)?
table_name: NAME ("AS" NAME)?
function_call: NAME "(" function_args ")" ("AS" NAME)?
function_args: expression ("," expression)*
where_clause: "WHERE" condition
groupby_clause: "GROUP BY" column_expr ("," column_expr)*
having_clause: "HAVING" logical_expr
orderby_clause: "ORDER BY" order_column ("," order_column)*
order_column: column_expr ["ASC" | "DESC"]?
condition: logical_expr
logical_expr: logical_term
| logical_expr "AND" logical_term
| logical_expr "OR" logical_term
| "NOT" logical_term
logical_term: comparison_expr
| "(" logical_expr ")"
| subquery
comparison_expr: expression OPERATOR expression
| expression "IS" ("NULL" | "NOT NULL")
expression: (table_name ".")? NAME | INT | string | function_call | subquery
table_list: table_name ("," table_name)* | subquery
subquery: "(" select_statement ")"
join_list: join_expr+
join_expr: join_type (table_name | subquery) "ON" condition
join_type: "INNER JOIN" | "LEFT JOIN" | "RIGHT JOIN" | "FULL JOIN"
string: ESCAPED_STRING | /'[^']*'/
OPERATOR: ">" | "<" | ">=" | "<=" | "=" | "!="
%import common.CNAME -> NAME
%import common.INT
%import common.ESCAPED_STRING
%import common.WS
%ignore WS
"""
我还创建了Transformer类,如下所示:
@v_args(inline=True)
class SelectTransformer(Transformer):
def start(self, *args):
print("start result: ", args)
return Tree("SELECT statement", args)
def column_list(self, *args):
return args
def column_expr(self, *args):
return args[0] if len(args) == 1 else args
def function_call(self, name, args, alias=None):
return (name, args, alias)
def subquery(self, value):
print("Subquery:", value)
def where_clause(self, condition=None):
return condition
def groupby_clause(self, *args):
return args
def having_clause(self, condition=None):
return condition
def orderby_clause(self, *args):
return args
def order_column(self, *args):
return args
def condition(self, *args):
return args
def logical_expr(self, *args):
return args
def logical_term(self, *args):
return args
def comparison_expr(self, *args):
return args
def expression(self, *args):
return args[0] if len(args) == 1 else args
def column_name(self, *args):
if len(args) == 1:
return args[0] # No alias present
elif len(args) == 3:
return args[0], args[2] # Alias present, return a tuple
else:
return args
def table_list(self, *args):
return args
def join_list(self, *args):
return args
def join_expr(self, *args):
return args
def join_type(self, *args):
return args
def subquery(self, *args):
return args
def string(self, value):
return value.strip("'")
def table_name(self, *args):
if len(args) == 1:
return args[0] # No alias present
elif len(args) == 3:
return args[0], args[2] # Alias present, return a tuple
else:
return args
我不知道这是否重要,我还创建了一个小函数来很好地显示最终的树:
def format_ast(ast, level=0):
result = ""
indent = " " * level
if isinstance(ast, tuple):
for item in ast:
result += format_ast(item, level + 1)
elif isinstance(ast, Token):
result += f"{indent}{ast.type}, Token('{ast.value}')\n"
elif isinstance(ast, Tree):
result += f"{indent}Tree({ast.data}), [\n"
for child in ast.children:
result += format_ast(child, level + 1)
result += f"{indent}]\n"
else:
result += f"{indent}{ast}\n"
return result
下面是我正在分析的语句:
sql_query = 'SELECT ' \
'name AS alias, ' \
'COUNT(age) AS age_alias, ' \
'(SELECT department_name FROM departments WHERE department_id = employees.department_id) ' \
'FROM employees AS emp, department ' \
'INNER JOIN departments AS dep ON employees.department_id = departments.id ' \
'LEFT JOIN other_table AS ot ON other_table.id = employees.table_id ' \
'WHERE age > 25 ' \
'GROUP BY age, name ' \
'HAVING COUNT(age) > 1 ' \
'ORDER BY name ASC, age DESC;'
我执行的代码是这样的:
parser = Lark(select_with_joins_grammar, parser='lalr', transformer=SelectTransformer())
tree = parser.parse(sql_query)
# Print the custom export format
print(format_ast(tree))
这个问题与我的类SelectTransformer的Join_type()方法有关.不知何故,*args总是空的,尽管理论上它应该包含(就像规则中定义的那样)"内联接"或"左联接"、"右联接"或"完全联接". 我的输出如下所示:
Tree(SELECT statement), [
Tree(select_statement), [
NAME, Token('name')
NAME, Token('alias')
NAME, Token('COUNT')
Tree(function_args), [
NAME, Token('age')
]
NAME, Token('age_alias')
Tree(select_statement), [
NAME, Token('department_name')
NAME, Token('departments')
NAME, Token('department_id')
OPERATOR, Token('=')
NAME, Token('employees')
NAME, Token('department_id')
]
NAME, Token('employees')
NAME, Token('emp')
NAME, Token('department')
NAME, Token('departments')
NAME, Token('dep')
NAME, Token('employees')
NAME, Token('department_id')
OPERATOR, Token('=')
NAME, Token('departments')
NAME, Token('id')
NAME, Token('other_table')
NAME, Token('ot')
NAME, Token('other_table')
NAME, Token('id')
OPERATOR, Token('=')
NAME, Token('employees')
NAME, Token('table_id')
NAME, Token('age')
OPERATOR, Token('>')
INT, Token('25')
NAME, Token('age')
NAME, Token('name')
NAME, Token('COUNT')
Tree(function_args), [
NAME, Token('age')
]
None
OPERATOR, Token('>')
INT, Token('1')
NAME, Token('name')
NAME, Token('age')
]
]
如您所见,没有显示任何联接类型. 我对解析比较陌生,所以我真的不知道该try 什么.