Rewriting imperative functions as dataclasses

Stepping through a simple rewrite, with justifications

Here's a trivial equivalent of some Python code I wrote:

def pastry_template(
    hours: int,
    fruit: str,
    flour: str = "self-raising",
    temp: int = 180,
    turn: bool = True,
    chop: bool = True,
):
    recipe = f"""
    First turn your oven to {temp}°C.
    Next, mix 100g of {flour} flour into the {'chopped ' if chop else ''}{fruit}.
    {'Turn once during cooking' if turn else ''}
    After {'an hour' if hours == 1 else f'{hours} hours'} take your pastry out to cool."""
    return recipe

This was useful to me as I had a lot of similar fruit pastry recipies to bake at work, and I got the right recipes, but it wasn't great code. Doing everything in one big f-string seemed wrong, and for example forced me to tolerate a blank line in recipes that didn't require turning.

The obvious solution I saw was a dataclass, which simplifies to:

from dataclasses import dataclass

@dataclass
class Pastry:
    hours: int
    fruit: str
    flour: str = "self-raising"
    temp: int = 180
    turn: bool = True
    chop: bool = True

    @property
    def recipe(self) -> str:
    return f"""
    First turn your oven to {temp}°C.
    Next, mix 100g of {flour} flour into the {'chopped ' if chop else ''}{fruit}.
    {'Turn once during cooking' if turn else ''}
    After {'an hour' if hours == 1 else f'{hours} hours'} take your pastry out to cool."""

def pastry_template(
    hours: int,
    fruit: str,
    flour: str = "self-raising",
    temp: int = 180,
    turn: bool = True,
    chop: bool = True,
):
    pastry = Pastry(hours=hours, fruit=fruit, flour=flour, temp=temp, turn=turn, chop=chop)
    return pastry.recipe

This is the strangler fig refactor pattern: the old interface is kept, all that changed is what happens inside. Once that's confirmed, the old interface can be stripped away if not needed. The kwarg passing is clearly redundant and begging to be deleted.

The correctness of a rewrite aside (which tests can verify, or use of a trustworthy automated refactoring tool), what I prefer about the new style is that it is clearer about the 'contract' and the computation.

I went down a Wikipedia rabbit hole recently and at the bottom found the Eiffel programming language, an object-oriented language written with the goal of

conceived in 1985 with the goal of increasing the reliability of commercial software development

and whose key characteristics include:

  • An object-oriented program structure in which a class serves as the basic unit of decomposition.
  • Design by contract tightly integrated with other language constructs.
  • ...
  • Static typing

Design by contract in brief:

prescribes that software designers should define formal, precise and verifiable interface specifications for software components, which extend the ordinary definition of abstract data types with preconditions, postconditions and invariants. These specifications are referred to as "contracts", in accordance with a conceptual metaphor with the conditions and obligations of business contracts.

I don't want to revive DbC, but I really appreciated this idea of preconditions and postconditions. The clear separation of these aspects does allow us to reason about code better.

My original function also did some comparisons between the bools involved, which is harder to illustrate for a pastry, but imagine the function had extra flags it compared like:

def pastry_template(
    allergens: bool = False,
    rare_ingredient: bool = False,
    prep_time_mins: int = 60,
):
    if not (rare_ingredient or allergens):
        no_risk = True
    if rare_ingredient or prep_time_mins > 60:
        high_effort = True

Dataclasses don't support this, so you'd put these checks in property methods too:

@dataclass
class Pastry:
    allergens: bool = False
    difficult: bool = False
    prep_time_mins: int = 60

    @property
    def no_risk(self) -> bool:
        return not (self.rare_ingredient or self.allergens)

    @property
    def high_effort(self) -> bool:
        return self.rare_ingredient or self.prep_time_mins > 60

...and these properties could be accessed in your recipe method on self rather than from the local namespace. This is the part I like: it makes the individual calculations that go into the 'one big procedure' separate, and nudges you to maintain a sense of state on the class, which in turn simplifies variable names (e.g. pastry_risk can be shortened to no_risk when it's on the Pastry class). That shedding of a noun becomes quite clarifying when you do it at the scale of an entire program.

By this nudge toward dataclasses, Python programs get nudged to store their state in a clearer way, and the computation upon that state is more readily separated as being available at instantiation or not.

When you split them apart like that, you are then led to move those classes out of the same module as the business logic operating on them entirely. This is how you end up with modules with names like data_model.py (perhaps a bad name, but sometimes appropriate and feels clear to me from reuse).

Crucially, I find code with this emphasis on clarity and separation easier to show others, and easier to revisit myself.

def __post_init__(self) -> None:
    """Fail fast if requirements not met!"""
    if recipe.high_effort and not self.no_risk:
        raise ValueError("Recipe must be less hassle!")

In reality this is the equivalent of asserting the paramaters do not specify incompatible aims for the program.


This post is the 2nd of Designing with dataclasses, a series on using Python dataclasses for clarity about where state lives and ease of reasoning about program behaviour. Read on for discussion of how you can use them with inheritance