Data Classes
A data class is a regular Python class that has basic data model methods like __init__(), __repr__(), and __eq__() implemented for you.
Introduced in Python 3.7, data classes typically contain mainly data, although there aren’t really any restrictions.
from dataclasses import dataclass

@dataclass
class DataClassCard:
    rank: str
    suit: str
They behave similarly to named tuples but come with many more features. At the same time, named tuples have some properties that are not necessarily desirable, such as:
- By design, a named tuple is a regular tuple, which can lead to subtle and hard-to-find bugs.
- It's hard to add default values to some fields.
- It's by nature immutable.
That being said, if you need your data structure to behave like a tuple, then a named tuple is a great alternative.
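As a rough illustration of the first point in the list above, the sketch below compares a named tuple and a data class against a plain tuple. NamedTupleCard is a name made up for this example; DataClassCard mirrors the class defined earlier.

```python
from collections import namedtuple
from dataclasses import dataclass

# Hypothetical named-tuple counterpart of DataClassCard, for illustration only.
NamedTupleCard = namedtuple("NamedTupleCard", ["rank", "suit"])


@dataclass
class DataClassCard:
    rank: str
    suit: str


# A named tuple is a tuple, so it compares equal to any tuple with the same values.
named_tuple_equals_tuple = NamedTupleCard("Q", "Hearts") == ("Q", "Hearts")

# A data class only compares equal to instances of the same class.
data_class_equals_tuple = DataClassCard("Q", "Hearts") == ("Q", "Hearts")

print(named_tuple_equals_tuple)  # True
print(data_class_equals_tuple)   # False
```

The first comparison succeeding is the kind of accidental match that can hide a bug when two unrelated named tuples happen to hold the same values.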
Advantages over regular classes⚑
- Simplify the class definition:
    @dataclass
    class DataClassCard:
        rank: str
        suit: str

    # Versus
    class RegularCard:
        def __init__(self, rank, suit):
            self.rank = rank
            self.suit = suit
- More descriptive object representation through a better default __repr__() method:
    >>> queen_of_hearts = DataClassCard('Q', 'Hearts')
    >>> queen_of_hearts
    DataClassCard(rank='Q', suit='Hearts')
    # Versus
    >>> queen_of_spades = RegularCard('Q', 'Spades')
    >>> queen_of_spades
    <__main__.RegularCard object at 0x7fb6eee35d30>
- Instance comparison out of the box through a better default __eq__() method:
    >>> queen_of_hearts == DataClassCard('Q', 'Hearts')
    True
    # Versus
    >>> queen_of_spades == RegularCard('Q', 'Spades')
    False
Usage⚑
Definition⚑
from dataclasses import dataclass

@dataclass
class Position:
    name: str
    lon: float
    lat: float
What makes this a data class is the @dataclass decorator. Beneath the class Position: line, simply list the fields you want in your data class.
The data class decorator supports the following parameters:
- init: Add .__init__() method? (Default is True.)
- repr: Add .__repr__() method? (Default is True.)
- eq: Add .__eq__() method? (Default is True.)
- order: Add ordering methods? (Default is False.)
- unsafe_hash: Force the addition of a .__hash__() method? (Default is False.)
- frozen: If True, assigning to fields raises an exception. (Default is False.)
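These parameters interact with each other. One interaction that is easy to overlook is hashing: with the default eq=True and frozen=False, instances are unhashable, while eq=True together with frozen=True generates a hash based on the fields. The sketch below shows this with two hypothetical classes, MutablePoint and FrozenPoint, that are not part of the original text.

```python
from dataclasses import dataclass


@dataclass
class MutablePoint:
    # Hypothetical class using the decorator defaults (eq=True, frozen=False).
    x: int
    y: int


@dataclass(frozen=True)
class FrozenPoint:
    # Hypothetical class: same fields, but frozen.
    x: int
    y: int


# With the defaults, __hash__ is set to None, so hash() raises TypeError.
try:
    hash(MutablePoint(1, 2))
    mutable_is_hashable = True
except TypeError:
    mutable_is_hashable = False

# A frozen data class gets a field-based __hash__, so equal instances hash alike.
frozen_hashes_match = hash(FrozenPoint(1, 2)) == hash(FrozenPoint(1, 2))

# Frozen instances can therefore be used as dictionary keys.
labels = {FrozenPoint(0, 0): "origin"}
```

If a mutable data class really has to be hashable, unsafe_hash=True forces a __hash__() method, at the risk of the hash changing when a field changes.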
Default values⚑
It's easy to add default values to the fields of your data class:
from dataclasses import dataclass

@dataclass
class Position:
    name: str
    lon: float = 0.0
    lat: float = 0.0
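A short usage sketch of the class above: fields with defaults can be omitted, passed by keyword, or passed positionally. The place names and coordinates are only sample values.

```python
from dataclasses import dataclass


@dataclass
class Position:
    name: str
    lon: float = 0.0
    lat: float = 0.0


# Only the field without a default is required.
null_island = Position("Null Island")

# A single default can be overridden by keyword.
greenwich = Position("Greenwich", lat=51.8)

# All fields can still be given positionally.
vancouver = Position("Vancouver", -123.1, 49.3)

print(null_island)  # Position(name='Null Island', lon=0.0, lat=0.0)
print(greenwich)    # Position(name='Greenwich', lon=0.0, lat=51.8)
```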
More complex default values can be defined through the use of functions. For example, the next snippet builds a French deck:
from dataclasses import dataclass, field
from typing import List

RANKS = '2 3 4 5 6 7 8 9 10 J Q K A'.split()
SUITS = '♣ ♢ ♡ ♠'.split()

def make_french_deck():
    return [PlayingCard(r, s) for s in SUITS for r in RANKS]

@dataclass
class PlayingCard:
    rank: str
    suit: str

@dataclass
class Deck:
    cards: List[PlayingCard] = field(default_factory=make_french_deck)
Using cards: List[PlayingCard] = make_french_deck() would introduce the mutable default argument anti-pattern. Instead, data classes use default_factory to handle mutable default values. To use it, you need the field() specifier, which customizes each field of a data class individually. It supports the following parameters:
- default: Default value of the field.
- default_factory: Function that returns the initial value of the field.
- init: Use field in .__init__() method? (Default is True.)
- repr: Use field in repr of the object? (Default is True.) For example, to hide a parameter from the repr, use lat: float = field(default=0.0, repr=False).
- compare: Include the field in comparisons? (Default is True.)
- hash: Include the field when calculating hash()? (Default is to use the same as compare.)
- metadata: A mapping with information about the field. It's not used by the data classes themselves but is available for you to attach information to fields. For example:
    from dataclasses import dataclass, field

    @dataclass
    class Position:
        name: str
        lon: float = field(default=0.0, metadata={'unit': 'degrees'})
        lat: float = field(default=0.0, metadata={'unit': 'degrees'})
  To retrieve the information use the fields() function.
    >>> from dataclasses import fields
    >>> fields(Position)
    (Field(name='name',type=<class 'str'>,...,metadata={}),
     Field(name='lon',type=<class 'float'>,...,metadata={'unit': 'degrees'}),
     Field(name='lat',type=<class 'float'>,...,metadata={'unit': 'degrees'}))
    >>> lat_unit = fields(Position)[2].metadata['unit']
    >>> lat_unit
    'degrees'
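To make the default_factory point concrete, here is a minimal sketch using two hypothetical classes, Hand and BrokenHand, that are not part of the original text. It shows that each instance gets its own list from the factory, and that the dataclass decorator refuses a plain list as a default value.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hand:
    # Hypothetical class: default_factory is called once per new instance.
    cards: List[str] = field(default_factory=list)


first = Hand()
second = Hand()
first.cards.append("Q♡")  # only the first hand changes

# Assigning a list directly as the default is rejected when the class is created.
try:
    @dataclass
    class BrokenHand:
        cards: List[str] = []

    list_default_rejected = False
except ValueError:
    list_default_rejected = True

print(first.cards)   # ['Q♡']
print(second.cards)  # []
```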
Type hints⚑
Data classes support type hints out of the box. Without a type hint, the field will not be a part of the data class.
While you need to add type hints in some form when using data classes, these types are not enforced at runtime. This is how typing in Python usually works: Python is and will always be a dynamically typed language.
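Both statements can be checked with a small sketch. The crs attribute is a hypothetical addition used only to show what happens to a name without a type hint; the odd constructor arguments are deliberately wrong for their annotations.

```python
from dataclasses import dataclass, fields


@dataclass
class Position:
    name: str
    lon: float
    lat: float
    crs = "WGS84"  # hypothetical: no annotation, so a plain class attribute, not a field


# The annotations are not checked at runtime, so mismatched values are accepted.
pos = Position(3.14, "pi day", 2018)

# Only annotated names become fields.
field_names = [f.name for f in fields(Position)]

print(field_names)  # ['name', 'lon', 'lat']
print(pos)          # Position(name=3.14, lon='pi day', lat=2018)
```

A static type checker run over this code would be expected to flag the constructor call, even though it runs without error.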
Adding methods⚑
Same as with a normal class.
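As a sketch of what that looks like, the example below adds a method to Position that estimates the distance to another position with the haversine formula. The method name distance_to and the Earth radius of 6371 km are assumptions made for this illustration.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Position:
    name: str
    lon: float = 0.0
    lat: float = 0.0

    def distance_to(self, other):
        """Approximate great-circle distance in km (haversine formula)."""
        earth_radius_km = 6371  # assumed mean Earth radius
        lam_1, lam_2 = radians(self.lon), radians(other.lon)
        phi_1, phi_2 = radians(self.lat), radians(other.lat)
        h = (sin((phi_2 - phi_1) / 2) ** 2
             + cos(phi_1) * cos(phi_2) * sin((lam_2 - lam_1) / 2) ** 2)
        return 2 * earth_radius_km * asin(sqrt(h))


oslo = Position("Oslo", 10.8, 59.9)
vancouver = Position("Vancouver", -123.1, 49.3)
distance_km = oslo.distance_to(vancouver)
```

The method is defined and called exactly as on a regular class; the decorator only adds the generated methods around it.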
Adding complex order comparison logic⚑
from dataclasses import dataclass

@dataclass(order=True)
class PlayingCard:
    rank: str
    suit: str

    def __str__(self):
        return f'{self.suit}{self.rank}'
After setting order=True in the decorator, instances of PlayingCard can be compared.
>>> queen_of_hearts = PlayingCard('Q', '♡')
>>> ace_of_spades = PlayingCard('A', '♠')
>>> ace_of_spades > queen_of_hearts
False
Data classes compare objects as if they were tuples of their fields. A Queen is higher than an Ace because Q comes after A in the alphabet.
>>> ('A', '♠') > ('Q', '♡')
False
To use more complex comparisons, we need to add the field .sort_index to the class. However, this field should be calculated from the other fields automatically. That's what the special method .__post_init__() is for. It allows for special processing after the regular .__init__() method is called.
from dataclasses import dataclass, field

RANKS = '2 3 4 5 6 7 8 9 10 J Q K A'.split()
SUITS = '♣ ♢ ♡ ♠'.split()

@dataclass(order=True)
class PlayingCard:
    sort_index: int = field(init=False, repr=False)
    rank: str
    suit: str

    def __post_init__(self):
        self.sort_index = (RANKS.index(self.rank) * len(SUITS)
                           + SUITS.index(self.suit))

    def __str__(self):
        return f'{self.suit}{self.rank}'
Note that .sort_index is added as the first field of the class. That way, the comparison is first done using .sort_index and only if there are ties are the other fields used. Using field(), you must also specify that .sort_index should not be included as a parameter in the .__init__() method (because it is calculated from the .rank and .suit fields). To avoid confusing the user about this implementation detail, it is probably also a good idea to remove .sort_index from the repr of the class.
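The following sketch repeats the class with the .sort_index field so that it runs on its own, then compares and sorts a few cards. The specific cards are sample values chosen for illustration.

```python
from dataclasses import dataclass, field

RANKS = '2 3 4 5 6 7 8 9 10 J Q K A'.split()
SUITS = '♣ ♢ ♡ ♠'.split()


@dataclass(order=True)
class PlayingCard:
    sort_index: int = field(init=False, repr=False)
    rank: str
    suit: str

    def __post_init__(self):
        # Rank dominates the ordering; the suit breaks ties within a rank.
        self.sort_index = (RANKS.index(self.rank) * len(SUITS)
                           + SUITS.index(self.suit))

    def __str__(self):
        return f'{self.suit}{self.rank}'


queen_of_hearts = PlayingCard('Q', '♡')
ace_of_spades = PlayingCard('A', '♠')
two_of_clubs = PlayingCard('2', '♣')

ace_beats_queen = ace_of_spades > queen_of_hearts
ordered = [str(card) for card in
           sorted([ace_of_spades, two_of_clubs, queen_of_hearts])]

print(ace_beats_queen)  # True
print(ordered)          # ['♣2', '♡Q', '♠A']
```

Because .sort_index is excluded from the repr, the queen still prints as PlayingCard(rank='Q', suit='♡').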
Immutable data classes⚑
To make a data class immutable, set frozen=True when you create it.
from dataclasses import dataclass

@dataclass(frozen=True)
class Position:
    name: str
    lon: float = 0.0
    lat: float = 0.0
In a frozen data class, you cannot assign values to the fields after creation:
>>> pos = Position('Oslo', 10.8, 59.9)
>>> pos.name
'Oslo'
>>> pos.name = 'Stockholm'
dataclasses.FrozenInstanceError: cannot assign to field 'name'
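When a changed value is needed, one option is to build a new instance rather than mutate the old one. The sketch below uses dataclasses.replace() from the standard library for that; the scenario itself is only an illustration and is not part of the original text.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Position:
    name: str
    lon: float = 0.0
    lat: float = 0.0


oslo = Position('Oslo', 10.8, 59.9)

# Direct assignment is blocked on a frozen instance.
try:
    oslo.name = 'Stockholm'
    assignment_blocked = False
except FrozenInstanceError:
    assignment_blocked = True

# replace() returns a new instance with the given fields changed.
renamed = replace(oslo, name='Stockholm')

print(oslo.name)     # Oslo
print(renamed.name)  # Stockholm
```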
Be aware that a frozen data class can still hold mutable fields:
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class ImmutableCard:
    rank: str
    suit: str

@dataclass(frozen=True)
class ImmutableDeck:
    cards: List[ImmutableCard]
Even though both ImmutableCard and ImmutableDeck are immutable, the list holding cards is not. You can therefore still change the cards in the deck:
>>> queen_of_hearts = ImmutableCard('Q', '♡')
>>> ace_of_spades = ImmutableCard('A', '♠')
>>> deck = ImmutableDeck([queen_of_hearts, ace_of_spades])
>>> deck
ImmutableDeck(cards=[ImmutableCard(rank='Q', suit='♡'), ImmutableCard(rank='A', suit='♠')])
>>> deck.cards[0] = ImmutableCard('7', '♢')
>>> deck
ImmutableDeck(cards=[ImmutableCard(rank='7', suit='♢'), ImmutableCard(rank='A', suit='♠')])
To avoid this, make sure all fields of an immutable data class use immutable types (but remember that types are not enforced at runtime). The ImmutableDeck should be implemented using a tuple instead of a list.
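A minimal sketch of that suggestion, reusing the ImmutableCard and ImmutableDeck names with a tuple-typed field. Since the annotation is not enforced, the caller still has to pass an actual tuple.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ImmutableCard:
    rank: str
    suit: str


@dataclass(frozen=True)
class ImmutableDeck:
    # A tuple does not support item assignment, unlike a list.
    cards: Tuple[ImmutableCard, ...]


deck = ImmutableDeck((ImmutableCard('Q', '♡'), ImmutableCard('A', '♠')))

try:
    deck.cards[0] = ImmutableCard('7', '♢')
    item_assignment_blocked = False
except TypeError:
    item_assignment_blocked = True

print(item_assignment_blocked)  # True
```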
Inheritance⚑
You can subclass data classes quite freely.
from dataclasses import dataclass

@dataclass
class Position:
    name: str
    lon: float
    lat: float

@dataclass
class Capital(Position):
    country: str
>>> Capital('Oslo', 10.8, 59.9, 'Norway')
Capital(name='Oslo', lon=10.8, lat=59.9, country='Norway')
Warning
This won't work if the base class has fields with default values unless all the subclass fields also have default values.
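The failure described in this warning can be reproduced with a short sketch. The class name CapitalWithDefault is made up for this example; the error is raised when the subclass is defined, not when it is instantiated.

```python
from dataclasses import dataclass


@dataclass
class Position:
    name: str
    lon: float = 0.0
    lat: float = 0.0


# A subclass field without a default would follow base fields with defaults
# in the generated __init__(), which Python does not allow.
try:
    @dataclass
    class Capital(Position):
        country: str

    definition_failed = False
except TypeError:
    definition_failed = True


# Giving the new field a default makes the definition valid.
@dataclass
class CapitalWithDefault(Position):
    country: str = 'Unknown'


oslo = CapitalWithDefault('Oslo', 10.8, 59.9, 'Norway')
print(definition_failed)  # True
```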
Warning
If you redefine a base class field in a subclass, it keeps its original position in the field order, ahead of the new subclass fields:
from dataclasses import dataclass

@dataclass
class Position:
    name: str
    lon: float = 0.0
    lat: float = 0.0

@dataclass
class Capital(Position):
    country: str = 'Unknown'
    lat: float = 40.0
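The resulting field order can be inspected with fields(). In this sketch the classes are repeated so that it runs on its own, and Madrid is just a sample value.

```python
from dataclasses import dataclass, fields


@dataclass
class Position:
    name: str
    lon: float = 0.0
    lat: float = 0.0


@dataclass
class Capital(Position):
    country: str = 'Unknown'
    lat: float = 40.0  # redefined: new default, same position as in Position


field_order = [f.name for f in fields(Capital)]
madrid = Capital('Madrid', country='Spain')

print(field_order)  # ['name', 'lon', 'lat', 'country']
print(madrid)       # Capital(name='Madrid', lon=0.0, lat=40.0, country='Spain')
```

Although lat is written after country in the subclass body, it stays third, so positional arguments are still read as name, lon, lat, country.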
Optimizing Data Classes⚑
Slots can be used to make classes faster and use less memory.
from dataclasses import dataclass

@dataclass
class SimplePosition:
    name: str
    lon: float
    lat: float

@dataclass
class SlotPosition:
    __slots__ = ['name', 'lon', 'lat']
    name: str
    lon: float
    lat: float
Slots are defined using .__slots__ to list the variables on a class. Variables or attributes not present in .__slots__ may not be defined. Furthermore, a slots class may not have default values.
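The restriction on undeclared attributes can be seen in a small sketch. The elevation attribute and the coordinates are hypothetical sample values used only to trigger the behavior.

```python
from dataclasses import dataclass


@dataclass
class SimplePosition:
    name: str
    lon: float
    lat: float


@dataclass
class SlotPosition:
    __slots__ = ['name', 'lon', 'lat']
    name: str
    lon: float
    lat: float


simple = SimplePosition('London', -0.1, 51.5)
slot = SlotPosition('Madrid', -3.7, 40.4)

# A regular instance stores attributes in a per-instance __dict__,
# so new attributes can be added freely.
simple.elevation = 11

# A slots instance has no __dict__, so undeclared attributes are rejected.
try:
    slot.elevation = 667
    extra_attribute_allowed = True
except AttributeError:
    extra_attribute_allowed = False

print(hasattr(simple, '__dict__'))  # True
print(hasattr(slot, '__dict__'))    # False
```

From Python 3.10 onward, the decorator also accepts a slots=True parameter that generates .__slots__ for you, which may be more convenient than listing the names by hand.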