Early returns

What are early returns and how can we verify the correctness of refactors to/from them?

A nice summary of the debate for and against early returns can be found on the Wikipedia page for 'structured programming'.

I tend to use early returns when I'm writing hurriedly. When I do, I transfer my 'short-term context' into an 'imperative' style of code. This can be more succinct, but it is ultimately less clear when re-read later, whether by me or by another person: the reader no longer has the original author's short-term context and must build it back up.

I repeatedly find that Python code which uses early returns can end up duplicating code at the return sites, and can even include unnecessary re-computation (presumably because it was unclear exactly what state was already available).

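The flip side of this duplication at the return sites is that single-exit functions are “easier to instrument”: anything you want to do with the result (logging, metrics, a final transformation) only has to be written once, just before the single return.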
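To make the instrumentation point concrete, here is a minimal sketch. The calls list is a hypothetical stand-in for whatever logging or metrics hook you would really use. With early returns the hook has to be repeated at every return site, whereas the single-exit version records the result in one place.

```python
from __future__ import annotations

# Hypothetical stand-in for a logging/metrics hook.
calls: list[tuple[str, int | None]] = []


def early_return_double(a: int | None) -> int | None:
    if a is None:
        calls.append(("early_return_double", None))  # instrumentation at exit 1
        return None
    result = a * 2
    calls.append(("early_return_double", result))  # duplicated at exit 2
    return result


def single_exit_double(a: int | None) -> int | None:
    if a is None:
        output = None
    else:
        output = a * 2
    calls.append(("single_exit_double", output))  # the only instrumentation point
    return output
```

Adding a third exit to early_return_double would mean remembering a third calls.append; single_exit_double needs no change to its instrumentation.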

Guard clauses

A common event-handling pattern is to run some basic checks on the event you've received before processing it. These checks are known as guard clauses (or preconditions). The main idea of the structured programming paradigm is that it's best to make the pre- and post-conditions of a program explicit.

Here's an example:

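Note: the int | None annotation syntax needs Python 3.10 or later. To run these examples on earlier versions, put from __future__ import annotations at the top of the module (it must come before any other import).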

def guarded_double(a: int | None) -> int | None:
    if a is None:
        return None
    return a * 2

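While a bare return may seem equivalent to return None, both PEP 8 and the type checker mypy disagree: PEP 8 asks that if any return statement in a function returns an expression, the others should explicitly say return None, and mypy reports "Return value expected" for a bare return in a function annotated to return int | None. Hence I use return None in place of return.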

Here's the same logic, refactored to use a single return statement:

def double(a: int | None) -> int | None:
    return None if a is None else (a * 2)

I can verify this refactor was correct by writing exhaustive test cases: there is one binary condition (whether a is None), so each function needs just two inputs, one per branch, giving two pairs of outputs that should be equal.

>>> guarded_double(None) is double(None) is None
True
>>> guarded_double(1) == double(1) == 2
True
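The two doctests cover both branches of the single condition. As a slightly more thorough sketch (plain assertions, no test framework assumed), you can compare the two implementations over a spread of inputs, which would also catch a refactor that broke only some int values. The helper name find_mismatches is hypothetical.

```python
from __future__ import annotations


def guarded_double(a: int | None) -> int | None:
    if a is None:
        return None
    return a * 2


def double(a: int | None) -> int | None:
    return None if a is None else (a * 2)


def find_mismatches(inputs: list[int | None]) -> list[int | None]:
    """Return the inputs on which the two implementations disagree."""
    return [a for a in inputs if guarded_double(a) != double(a)]


# None plus a spread of negative, zero and positive ints.
sample_inputs: list[int | None] = [None, *range(-10, 11)]
mismatches = find_mismatches(sample_inputs)
print(mismatches)  # → []
```

An empty list only says the functions agree on the inputs tried; it is evidence of equivalence, not proof.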

Another way to verify the correctness of this refactor is with overloaded signatures that can be type checked.

@typing.overload was explained nicely by Adam Johnson, who jokes "May type hints never overload you", which hints at the cognitive load of refactoring: you must essentially hold all of these overloaded function signatures in your head at once until you're finished.

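from typing import overload


@overload
def overloaded_double(a: int) -> int:
    ...

@overload
def overloaded_double(a: None) -> None:
    ...

def overloaded_double(a: int | None) -> int | None:
    # Implementation signature: must accept every overloaded case.
    if a is None:
        return None  # explicit, as a bare return fails mypy here
    return a * 2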

What we've done here isn't quite the same as writing out test cases (test cases are checked at runtime), but we have made the implicit overloading of the guard clause explicit in separate, overloaded function signatures.
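For comparison, here is a minimal sketch of the runtime counterpart: a hypothetical helper, follows_overloads, which checks the same contract the overloads state (int in, int out; None in, None out) by actually calling the function on sample inputs.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable
from typing import overload


@overload
def overloaded_double(a: int) -> int: ...
@overload
def overloaded_double(a: None) -> None: ...
def overloaded_double(a: int | None) -> int | None:
    if a is None:
        return None
    return a * 2


def follows_overloads(func: Callable[..., object], int_inputs: Iterable[int]) -> bool:
    """Hypothetical helper: check the int -> int and None -> None contract at runtime."""
    if func(None) is not None:
        return False
    return all(isinstance(func(a), int) for a in int_inputs)


print(follows_overloads(overloaded_double, range(-5, 6)))  # → True
```

Unlike the type check, this only covers the inputs passed in, but it does look at what the function body actually returns.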

mypy can then type check these overloaded signatures against the implementation signature (I've put the before and after versions on GitHub for comparison).

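If our function accidentally changed behaviour during the refactor (say, an int input could now give a None output), neither the overloaded signatures nor the simple one would let mypy detect this. mypy checks the function body only against the implementation signature, int | None, and a None return for an int input still satisfies that. The tool is simply not able to do this at present.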

Here is such a mutant. For brevity I'm only showing the refactored version this time, but you can find both versions on GitHub.

def double(a: int | None) -> int | None:
    if a is None:
        output = None
    elif a > 2:
        output = (a * 2)
    else:
        output = None  # mutation: ints <= 2 now return None
    return output

$ mypy mutants/
Success: no issues found in 2 source files

This means that if you're not careful carrying out your refactor, mypy gives you no way to check that you did it correctly; only exhaustive unit tests will.
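By way of contrast, a sketch of the kind of test that does catch this mutant: compare it against the original guarded version over a handful of inputs. mutant_double is a hypothetical name for the mutated double shown above, renamed so both versions can live in one file.

```python
from __future__ import annotations


def guarded_double(a: int | None) -> int | None:
    if a is None:
        return None
    return a * 2


def mutant_double(a: int | None) -> int | None:
    # The mutated refactor from above, renamed for this sketch.
    if a is None:
        output = None
    elif a > 2:
        output = a * 2
    else:
        output = None  # bug: ints <= 2 now map to None
    return output


inputs: list[int | None] = [None, -1, 0, 1, 2, 3, 10]
disagreements = [a for a in inputs if guarded_double(a) != mutant_double(a)]
print(disagreements)  # → [-1, 0, 1, 2]
```

The test only catches the bug because the inputs include values at or below the mutant's a > 2 threshold. A suite that checked just None and 10 would pass, so the choice of inputs matters as much as their number.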

In a real-world scenario, you probably wouldn't find the assumptions of the problem already laid out nicely in overloaded function signatures.

Even if you had done, the above example is trivial for another reason.

All of the necessary information is present in the argument types declared in the function signature. However, it's easy to come up with examples that would escape this sort of checking and render our mypy overloaded-signature method useless for verifying refactor correctness.

To illustrate, consider a function which measures the length of the "payload" entry in a dict.

def guarded_handler(event: dict[str, str]) -> int | None:
    if (payload := event.get("payload")) is None:
        return None
    return len(payload)

We can refactor it in the same way, but it quickly becomes a bit unsightly:

def ternary_handler(event: dict[str, str]) -> int | None:
    return None if (payload := event.get("payload")) is None else len(payload)

In real-world code you'd use if/else blocks for greater readability:

def handler(event: dict[str, str]) -> int | None:
    payload = event.get("payload")
    if payload is None:
        output = None
    else:
        output = len(payload)
    return output

These are three versions of the 'same' function, all applying identical logic to their input:

1. The guarded_handler function uses a guard clause, as previously covered.
2. The ternary_handler function uses a ternary conditional expression as its return value.
3. The handler function uses structured programming style (an if/else block with explicit assignments to a named variable, which becomes the return value).

How can we verify the equivalence of 1, 2, and 3?
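A partial answer, applying the testing approach from earlier, is sketched below: run all three handlers over a few hand-picked events and check that they agree. The choice of events is an assumption (missing key, empty payload, non-empty payload, extra keys), and agreement only holds for the inputs tried, not in general.

```python
from __future__ import annotations


def guarded_handler(event: dict[str, str]) -> int | None:
    if (payload := event.get("payload")) is None:
        return None
    return len(payload)


def ternary_handler(event: dict[str, str]) -> int | None:
    return None if (payload := event.get("payload")) is None else len(payload)


def handler(event: dict[str, str]) -> int | None:
    payload = event.get("payload")
    if payload is None:
        output = None
    else:
        output = len(payload)
    return output


# Hand-picked events: key missing, empty payload, non-empty payload, extra keys.
events: list[dict[str, str]] = [
    {},
    {"payload": ""},
    {"payload": "hello"},
    {"other": "x", "payload": "abc"},
]

results = [(guarded_handler(e), ternary_handler(e), handler(e)) for e in events]
print(results)  # → [(None, None, None), (0, 0, 0), (5, 5, 5), (3, 3, 3)]
assert all(g == t == h for g, t, h in results)
```

The empty-string event is included deliberately: a refactor that swapped the is None check for a truthiness check (if not payload) would return None instead of 0 there.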

This post is the 2nd of a series on Refactor verification, investigating how to verify the correctness of refactors (or automating the human error away). Read on for discussion of AST rewriting using the refactor Python library.