In the previous part of this series, we introduced the following 3 examples which have equivalent behaviour, and found that type checking alone is insufficient to verify the correctness of rewrites.
def guarded_handler(event: dict[str, str]) -> int | None:
if (payload := event.get("payload")) is None:
return None
return len(payload)
def ternary_handler(event: dict) -> int | None:
return None if (payload := event.get("payload")) is None else len(payload)
def handler(event: dict) -> int | None:
payload = event.get("payload")
if payload is None:
output = None
else:
output = len(payload)
return output
The 2nd and 3rd (ternary_handler
and handler
) are more alike than the 1st,
as they both have a single return value, so it makes sense for us to try to verify them first.
import ast
from enum import Enum
from pathlib import Path
class Handlers(Enum):
guarded = "guarded_handler.py"
ternary = "ternary_handler.py"
linear = "handler.py"
all_handler_programs = [ast.parse(Path(h.value).read_text()) for h in Handlers]
all_handler_functions = [tree.body[1] for tree in all_handler_programs]
assert all("handler" in f.name for f in all_handler_functions)
Comparing two Abstract Syntax Trees
I found some code to dict
-ify the AST on StackOverflow (ironically using early
returns itself), which I updated for Python 3, to pretty-print the ASTs for review:
def simplify(node):
"""
Modified version of AST pprinter https://stackoverflow.com/a/19598419/
"""
ignored_node_attributes = (
"lineno end_lineno col_offset end_col_offset ctx type_comment keywords"
)
if isinstance(node, ast.AST):
res = vars(node).copy()
for k in ignored_node_attributes.split():
res.pop(k, None)
for k, v in res.items():
res[k] = simplify(v)
res["__type__"] = type(node).__name__
return res
elif isinstance(node, list):
return list(map(simplify, node))
else:
return node
For the ternary handler it gives:
[{'value': {'test': {'left': {'target': {'id': 'payload', '__type__': 'Name'},
'value': {'func': {'value': {'id': 'event',
'__type__': 'Name'},
'attr': 'get',
'__type__': 'Attribute'},
'args': [{'value': 'payload',
'kind': None,
'__type__': 'Constant'}],
'__type__': 'Call'},
'__type__': 'NamedExpr'},
'ops': [{'__type__': 'Is'}],
'comparators': [{'value': None,
'kind': None,
'__type__': 'Constant'}],
'__type__': 'Compare'},
'body': {'value': None, 'kind': None, '__type__': 'Constant'},
'orelse': {'func': {'id': 'len', '__type__': 'Name'},
'args': [{'id': 'payload', '__type__': 'Name'}],
'__type__': 'Call'},
'__type__': 'IfExp'},
'__type__': 'Return'}]
And for the linear handler it gives:
[{'targets': [{'id': 'payload', '__type__': 'Name'}],
'value': {'func': {'value': {'id': 'event', '__type__': 'Name'},
'attr': 'get',
'__type__': 'Attribute'},
'args': [{'value': 'payload',
'kind': None,
'__type__': 'Constant'}],
'__type__': 'Call'},
'__type__': 'Assign'},
{'test': {'left': {'id': 'payload', '__type__': 'Name'},
'ops': [{'__type__': 'Is'}],
'comparators': [{'value': None,
'kind': None,
'__type__': 'Constant'}],
'__type__': 'Compare'},
'body': [{'targets': [{'id': 'output', '__type__': 'Name'}],
'value': {'value': None, 'kind': None, '__type__': 'Constant'},
'__type__': 'Assign'}],
'orelse': [{'targets': [{'id': 'output', '__type__': 'Name'}],
'value': {'func': {'id': 'len', '__type__': 'Name'},
'args': [{'id': 'payload', '__type__': 'Name'}],
'__type__': 'Call'},
'__type__': 'Assign'}],
'__type__': 'If'},
{'value': {'id': 'output', '__type__': 'Name'}, '__type__': 'Return'}]
From glancing between these trees, and thinking about what they represent, it seems simple enough to construct an equivalence from three simplifications:
-
The assignment with
:=
should be considered as a preceding assignment expression. This is represented in the AST by aNamedExpr
node. -
The ternary value assignment should be considered as equivalent to
if
/else
clauses. Both are represented in the AST by atest
node. -
The unnamed return value (with the ternary expression) should be considered as equivalent to the named value. That is, consider a 'direct' value in the return statement as equivalent to a reference.
That seems to break the problem down sufficiently to tackle it.
-
The
NamedExpr
node in the ternary handler can be pulled out of theleft
side of theIfExp
and be considered for purpose of comparison as a precedingAssign
node, for which it should match the same node in the linear handler. Where the value was used, it can be considered equivalent to any named variable with the samevalue
forargs
(i.e. the variable is not renamed). -
The
test
node in the ternary handler (which says simplypayload is None
):
{'target': {'id': 'payload', '__type__': 'Name'} ...
'ops': [{'__type__': 'Is'}], 'comparators': [{'value': None
matches the test
node in the linear handler:
{'test': {'left': {'id': 'payload', '__type__': 'Name'},
'ops': [{'__type__': 'Is'}], 'comparators': [{'value': None,
- The direct return value of the ternary handler
'body': {'value': None, 'kind': None, '__type__': 'Constant'},
'orelse': {'func': {'id': 'len', '__type__': 'Name'},
'args': [{'id': 'payload', '__type__': 'Name'}], '__type__': 'Call'},
is the same as the return value from the variable output
in the linear handler:
{'value': {'id': 'output', '__type__': 'Name'}, '__type__': 'Return'}]
which comes from one of two Assign
nodes in the body
and orelse
of the If
node.
'body': ... {'value': None, 'kind': None, '__type__': 'Constant'},
'orelse': ... {'func': {'id': 'len', '__type__': 'Name'},
'args': [{'id': 'payload', '__type__': 'Name'}], '__type__': 'Call'},
So far so easy to make rules for: but what about early returns?
Revisiting the guarded handler
Now we can return to the guarded handler:
[{'test': {'left': {'target': {'id': 'payload', '__type__': 'Name'},
'value': {'func': {'value': {'id': 'event',
'__type__': 'Name'},
'attr': 'get',
'__type__': 'Attribute'},
'args': [{'value': 'payload',
'kind': None,
'__type__': 'Constant'}],
'__type__': 'Call'},
'__type__': 'NamedExpr'},
'ops': [{'__type__': 'Is'}],
'comparators': [{'value': None,
'kind': None,
'__type__': 'Constant'}],
'__type__': 'Compare'},
'body': [{'value': {'value': None, 'kind': None, '__type__': 'Constant'},
'__type__': 'Return'}],
'orelse': [],
'__type__': 'If'},
{'value': {'func': {'id': 'len', '__type__': 'Name'},
'args': [{'id': 'payload', '__type__': 'Name'}],
'__type__': 'Call'},
'__type__': 'Return'}]
To get to the linear handler from this will require all of the 3 rules used for the ternary handler, plus:
- An
if
statement with areturn
statement inside entails an implicitelse
clause for everything that follows.
This lets it be 'translated' into a program with an if
/else
clause that returns once
with a variable assigned in the condition blocks rather than returning early.
The only other rule needed to 'rewrite' the AST into the linear handler's AST is the 3rd one we already defined (consider anonymous and named return values as equivalent).
Specifying the precise ast.AST
node conditions
To actually use these rules, we need to be more specific. We can rewrite them with reference to the AST nodes themselves as:
- An
ast.NamedExpr
walrus assignment ⇒ a precedingast.Assign
. - An
ast.Assignment
from anast.IfExp
ternary value ⇒ anast.If
if
/else
statement with 2ast.Assign
statements (inbody
andorelse
). - Unnamed
ast.Return
value + name ⇒ast.Return
byid
(named value) - An
ast.Compare
if
block with anast.Return
in the body ⇒ anast.orelse
clause for everything that follows.
This post is the 3rd of a series on Refactor verification, investigating how to verify the correctness of refactors (or automating the human error away). In the next section, we'll get into writing some real refactor rules, then come up with a schema to validate our rules without running them (so we can store and use them confidently).