Introduction

Dataclass tensions

I sense that the introduction of dataclasses into the standard library (in Python 3.7, released mid-2018 and, at the time of writing, with one more year left until end of life) has brought with it an implicit push to refactor code into a different form.

I say implicit because different tools have different affordances: each programming component lends itself to particular ways of being used, and ultimately to a particular library layout and architecture.

There are some observations to make about dataclasses themselves, as well as about their intersection with type annotations, via both the typing module and pydantic (a library for data validation using Python type hints).

As the name suggests, dataclasses are classes, but through using them I've become more attuned to the benefits of separating concerns between dynamic state built 'on the fly' and state that can be assigned at instantiation.
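To make that separation concrete, here is a minimal sketch (the Rectangle class is a hypothetical example, not from any library): the annotated fields are state assigned at instantiation, while the property is state computed on the fly from them each time it is read.

```python
from dataclasses import dataclass


@dataclass
class Rectangle:
    # Hypothetical example class.
    # State assigned at instantiation: each annotated field becomes an __init__ parameter.
    width: float
    height: float

    # State built 'on the fly': recomputed from the fields on every access,
    # so it is never stored and cannot fall out of sync with them.
    @property
    def area(self) -> float:
        return self.width * self.height


rect = Rectangle(3.0, 4.0)
print(rect.area)  # 12.0
rect.width = 5.0
print(rect.area)  # 20.0
```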

Typical Python classes have an initialisation method that runs at instantiation. However, a lot of code doesn't use classes at all.

I know some Python programmers who are averse to classes entirely and prefer what I call a 'procedural' style of Python (everything just passing through functions).

I should admit here that I used to be one such Python programmer myself, and only began using classes when I started to tackle more structured problems involving state (specifically, a tricky puzzle game), but they soon became a fixture of my programming on less recreational projects.

Not only do I sense a move away from this style (perhaps because I am myself following trends and nudges in the trajectory of the language), but I also see programmers who lean on it (e.g. passing kwargs around rather than storing state in classes) complaining about the friction created by typed Python: the opposite of an affordance, a barrier.

I believe the reason classes aren't so readily used is two-fold:

Firstly, classes require some level of forethought, whereas a function can be written in an imperative style:

text = "1"
x = int(text)
y = x + 1
z = y * 2

is imperative code that arrives at a long-winded 4. Each statement depends on the ones before it having run in order. Rewriting this code as a class nudges you to name each step as a method (properties here, as the computation is trivial), and in doing so the abstract function of each transformation is clarified, whereas in imperative code you tend to name things after what the output of that transformation represents:

class FancyNumber:
    def __init__(self, text: str = "1"):
        self.text = text

    @property
    def as_int(self) -> int:
        return int(self.text)

    @property
    def incremented_int(self) -> int:
        return self.as_int + 1

    @property
    def doubled_incremented_int(self) -> int:
        return self.incremented_int * 2

assert z == FancyNumber("1").doubled_incremented_int  # both evaluate to 4

This example is so trivial that the extra code looks like too much effort, but in code handling complex business logic (where correct operation and future adaptability matter, so confidence in correctness is valuable) it pays off: the code is easier to re-read, and it contributes to a clearer mental model within the context of a larger project.

Secondly, classes require you to write init methods, which can be tedious exercises in repeating the same variable names over and over.

class A:
    def __init__(self, foo: str, bar: str, baz: str):
        self.foo = foo
        self.bar = bar
        self.baz = baz

Dataclasses are just classes whose init method is 'written for you', which solves the second problem.
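For comparison, here is a sketch of the class A above rewritten as a dataclass. The decorator generates an `__init__` equivalent to the hand-written one from the annotated fields, and by default it also generates `__repr__` and `__eq__`.

```python
from dataclasses import dataclass


@dataclass
class A:
    # Each annotated class attribute becomes a field; the generated
    # __init__ takes them as parameters in this order: A(foo, bar, baz).
    foo: str
    bar: str
    baz: str


a = A("x", "y", "z")
print(a)                      # A(foo='x', bar='y', baz='z')
print(a == A("x", "y", "z"))  # True: the generated __eq__ compares fields
```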

As for the first objection to classes, it rests on an implicit assumption that an imperative code style is sustainable, and that mixing preconditions, invariants and the like into the main flow of the code will not degrade your ability to reason about the software as a whole.

I also find that, when writing a dataclass, you can achieve something that feels a lot like verifying that this part of the program meets its 'contract', by incorporating error checking into a 'post-init' method. I'll show examples below of doing this in practice.
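As a minimal sketch of that idea (a hypothetical dataclass reworking of FancyNumber, not the code from later in this series): `__post_init__` runs right after the generated `__init__`, so it can reject invalid input before the instance is ever used. `frozen=True` is an extra choice made here: it blocks ordinary reassignment of the field afterwards, so the check stays meaningful for the object's lifetime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FancyNumber:
    # Hypothetical dataclass version of FancyNumber.
    text: str = "1"

    def __post_init__(self) -> None:
        # Runs immediately after the generated __init__: enforce the 'contract'
        # that text parses as an integer, so every instance that exists is valid.
        try:
            int(self.text)
        except ValueError:
            raise ValueError(f"text must be an integer string, got {self.text!r}") from None

    @property
    def as_int(self) -> int:
        return int(self.text)

    @property
    def incremented_int(self) -> int:
        return self.as_int + 1

    @property
    def doubled_incremented_int(self) -> int:
        return self.incremented_int * 2


print(FancyNumber("1").doubled_incremented_int)  # 4

try:
    FancyNumber("one")
except ValueError as err:
    print(err)  # text must be an integer string, got 'one'
```

Because the check lives in one place, the properties can assume a valid `text` rather than each re-validating it.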


This post is the first in Designing with dataclasses, a series on using Python dataclasses for clarity about where state lives and ease of reasoning about program behaviour. Read on for discussion of Rewriting imperative functions as dataclasses.