Pydantic
Pydantic is a data validation and settings management library that uses Python type annotations.
pydantic enforces type hints at runtime, and provides user friendly errors when data is invalid.
Define how data should be in pure, canonical python; check it with pydantic.
Install⚑
pip install pydantic
If you use mypy, I highly recommend activating the pydantic plugin by adding the following to your pyproject.toml:
[tool.mypy]
plugins = [ "pydantic.mypy",]
[tool.pydantic-mypy]
init_forbid_extra = true
init_typed = true
warn_required_dynamic_aliases = true
warn_untyped_fields = true
Advantages and disadvantages⚑
Advantages:
- Perform data validation in an easy and nice way.
- Seamless integration with FastAPI and Typer.
- Nice way to export the data and data schema.
Disadvantages:
- You can't define cyclic relationships, so there is no way to emulate SQLAlchemy's backref function.
Models⚑
The primary means of defining objects in pydantic is via models (models are simply classes which inherit from BaseModel).
You can think of models as similar to types in strictly typed languages, or as the requirements of a single endpoint in an API.
Untrusted data can be passed to a model, and after parsing and validation pydantic guarantees that the fields of the resultant model instance will conform to the field types defined on the model.
Basic model usage⚑
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name = "Jane Doe"
User here is a model with two fields: id, which is an integer and is required, and name, which is a string and is not required (it has a default value). The type of name is inferred from the default value, and so a type annotation is not required.
user = User(id="123")
user here is an instance of User. Initialisation of the object will perform all parsing and validation; if no ValidationError is raised, you know the resulting model instance is valid.
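As a minimal sketch of what happens when validation succeeds and when it fails, the snippet below builds one valid and one invalid User. It annotates name explicitly, which is an assumption made so the example also runs on pydantic releases that reject unannotated fields; the rest mirrors the model above.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    id: int
    name: str = "Jane Doe"  # annotated here; the text above relies on inference


# The string "123" is coerced to the integer 123 during validation.
user = User(id="123")
print(user.id, user.name)

# Data that cannot be coerced raises a single ValidationError.
try:
    User(id="not a number")
except ValidationError as error:
    errors = error.errors()
    print(errors[0]["loc"])
```

The loc entry of the error points at the id field, which is the only one that failed.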
Model properties⚑
Models possess the following methods and attributes:
- dict(): returns a dictionary of the model's fields and values.
- json(): returns a JSON string representation of dict().
- copy(): returns a copy of the model (a shallow copy unless deep=True is passed).
- parse_obj(): very similar to the __init__ method of the model, used to import objects from a dict rather than keyword arguments. If the object passed is not a dict a ValidationError will be raised.
- parse_raw(): takes a str or bytes and parses it as json, then passes the result to parse_obj.
- parse_file(): reads a file and passes the contents to parse_raw. If content_type is omitted, it is inferred from the file's extension.
- from_orm(): loads data into a model from an arbitrary class.
- schema(): returns a dictionary representing the model as JSON Schema.
- schema_json(): returns a JSON string representation of schema().
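The following sketch exercises a few of these methods in a round trip. The User model is a hypothetical example, and the method names are the pydantic 1.x ones listed above; newer releases still accept them but may emit deprecation warnings.

```python
from pydantic import BaseModel


class User(BaseModel):
    id: int
    name: str


user = User(id=1, name="Ada")

# Export the model to a dictionary and to a JSON string.
as_dict = user.dict()
as_json = user.json()

# Import the data back through the parsing helpers.
restored_from_dict = User.parse_obj(as_dict)
restored_from_json = User.parse_raw(as_json)

print(restored_from_dict == user, restored_from_json == user)
```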
Recursive Models⚑
More complex hierarchical data structures can be defined using models themselves as types in annotations.
from typing import List
from pydantic import BaseModel

class Foo(BaseModel):
    count: int
    size: float = None

class Bar(BaseModel):
    apple = "x"
    banana = "y"

class Spam(BaseModel):
    foo: Foo
    bars: List[Bar]

m = Spam(foo={"count": 4}, bars=[{"apple": "x1"}, {"apple": "x2"}])
print(m)
# > foo=Foo(count=4, size=None) bars=[Bar(apple='x1', banana='y'),
# > Bar(apple='x2', banana='y')]
print(m.dict())
"""
{
'foo': {'count': 4, 'size': None},
'bars': [
{'apple': 'x1', 'banana': 'y'},
{'apple': 'x2', 'banana': 'y'},
],
}
"""
For self-referencing models, use postponed annotations.
Definition of two models that reference each other⚑
from typing import Optional
from pydantic import BaseModel

class A(BaseModel):
    b: Optional["B"] = None

class B(BaseModel):
    a: Optional[A] = None

A.update_forward_refs()
Although it doesn't work as expected!
Error Handling⚑
pydantic will raise ValidationError whenever it finds an error in the data it's validating.
!!! note Validation code should not raise ValidationError itself, but rather raise ValueError, TypeError or AssertionError (or subclasses of ValueError or TypeError) which will be caught and used to populate ValidationError.
One exception will be raised regardless of the number of errors found; that ValidationError will contain information about all the errors and how they happened. It does not, however, include the data that produced the error. A nice way of showing it is to capture the error and log the data yourself:
try:
    model = Model(
        state=state,
    )
except ValidationError as error:
    log.error(f'Error building model with state {state}')
    raise error
This creates a message that does not include the data that generated the i
You can access these errors in several ways:
- e.errors(): returns a list of the errors found in the input data.
- e.json(): returns a JSON representation of the errors.
- str(e): returns a human readable representation of the errors.
Each error object contains:
- loc: the error's location as a list. The first item in the list will be the field where the error occurred, and if the field is a sub-model, subsequent items will be present to indicate the nested location of the error.
- type: a computer-readable identifier of the error type.
- msg: a human readable explanation of the error.
- ctx: an optional object which contains values required to render the error message.
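To make the loc structure concrete, here is a small sketch with a nested model. Order and Line are hypothetical models invented for the illustration; the snippet only relies on errors() and on the loc, type and msg keys described above.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class Line(BaseModel):
    price: float


class Order(BaseModel):
    id: int
    lines: List[Line]


try:
    # Both the top-level id and the price of the second line are invalid.
    Order(id="abc", lines=[{"price": 1.5}, {"price": "free"}])
except ValidationError as error:
    found = error.errors()
    locations = [err["loc"] for err in found]
    for err in found:
        # The exact type and msg strings depend on the pydantic version.
        print(err["loc"], err["type"], err["msg"])
```

The nested error location contains the field name, the index in the list and the field of the sub-model, in that order.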
Custom Errors⚑
You can also define your own error classes, which can specify a custom error code, message template, and context:
from pydantic import BaseModel, PydanticValueError, ValidationError, validator

class NotABarError(PydanticValueError):
    code = "not_a_bar"
    msg_template = 'value is not "bar", got "{wrong_value}"'

class Model(BaseModel):
    foo: str

    @validator("foo")
    def value_must_equal_bar(cls, v):
        if v != "bar":
            raise NotABarError(wrong_value=v)
        return v

try:
    Model(foo="ber")
except ValidationError as e:
    print(e.json())
"""
[
{
"loc": [
"foo"
],
"msg": "value is not \"bar\", got \"ber\"",
"type": "value_error.not_a_bar",
"ctx": {
"wrong_value": "ber"
}
}
]
"""
Dynamic model creation⚑
There are some occasions where the shape of a model is not known until runtime. For this pydantic provides the create_model function to allow models to be created on the fly.
from pydantic import BaseModel, create_model

DynamicFoobarModel = create_model("DynamicFoobarModel", foo=(str, ...), bar=123)

class StaticFoobarModel(BaseModel):
    foo: str
    bar: int = 123
Here StaticFoobarModel and DynamicFoobarModel are identical.
Warning
See the note in Required Optional Fields for the distinction between an ellipsis as a field default and annotation-only fields. See samuelcolvin/pydantic#1047 for more details.
Fields are defined by either a tuple of the form (<type>, <default value>) or just a default value. The special keyword arguments __config__ and __base__ can be used to customize the new model. This includes extending a base model with extra fields.
from pydantic import BaseModel, create_model

class FooModel(BaseModel):
    foo: str
    bar: int = 123

BarModel = create_model(
    "BarModel",
    apple="russet",
    banana="yellow",
    __base__=FooModel,
)
print(BarModel)
# > <class 'BarModel'>
print(BarModel.__fields__.keys())
# > dict_keys(['foo', 'bar', 'apple', 'banana'])
Abstract Base Classes⚑
Pydantic models can be used alongside Python's Abstract Base Classes (ABCs).
import abc
from pydantic import BaseModel

class FooBarModel(BaseModel, abc.ABC):
    a: str
    b: int

    @abc.abstractmethod
    def my_abstract_method(self):
        pass
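As an illustrative sketch of how such an abstract model behaves, the snippet below defines a concrete subclass and checks that the abstract base cannot be instantiated. Shape and Square are hypothetical names chosen for the example.

```python
import abc

from pydantic import BaseModel


class Shape(BaseModel, abc.ABC):
    name: str

    @abc.abstractmethod
    def area(self) -> float:
        """Return the area of the shape."""


class Square(Shape):
    side: float

    def area(self) -> float:
        return self.side**2


# The concrete subclass validates its fields like any other model.
square = Square(name="tile", side=2)
print(square.area())

# The abstract base still has an unimplemented method, so Python refuses
# to instantiate it.
try:
    Shape(name="abstract")
    instantiated = True
except TypeError:
    instantiated = False
print(instantiated)
```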
Field Ordering⚑
Field order is important in models for the following reasons:
- Validation is performed in the order fields are defined; field validators can access the values of earlier fields, but not later ones
- Field order is preserved in the model schema
- Field order is preserved in validation errors
- Field order is preserved by .dict() and .json() etc.
As of v1.0 all fields with annotations (whether annotation-only or with a default value) will precede all fields without an annotation. Within their respective groups, fields remain in the order they were defined.
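A quick way to observe the ordering is through validation errors. In the sketch below, Point is a hypothetical model whose fields are declared as y and then x, while the keyword arguments are passed in the opposite order; the errors should come back in declaration order.

```python
from pydantic import BaseModel, ValidationError


class Point(BaseModel):
    y: int
    x: int


try:
    # Keyword order (x, y) differs from the declaration order (y, x).
    Point(x="east", y="north")
except ValidationError as error:
    error_order = [err["loc"][0] for err in error.errors()]

print(error_order)
```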
Field with dynamic default value⚑
When declaring a field with a default value, you may want it to be dynamic (i.e. different for each model instance). To do this, you may want to use a default_factory.
!!! info "In Beta" The default_factory argument is in beta; it was added to pydantic in v1.5 on a provisional basis. It may change significantly in future releases and its signature or behaviour will not be concrete until v2. Feedback from the community while it's still provisional would be extremely useful; either comment on #866 or create a new issue.
Example of usage:
from datetime import datetime
from uuid import UUID, uuid4
from pydantic import BaseModel, Field
class Model(BaseModel):
    uid: UUID = Field(default_factory=uuid4)
    updated: datetime = Field(default_factory=datetime.utcnow)

m1 = Model()
m2 = Model()
print(f"{m1.uid} != {m2.uid}")
# > 3b187763-a19c-4ed8-9588-387e224e04f1 != 0c58f97b-c8a7-4fe8-8550-e9b2b8026574
print(f"{m1.updated} != {m2.updated}")
# > 2020-07-15 20:01:48.451066 != 2020-07-15 20:01:48.451083
!!! warning The default_factory expects the field type to be set. Moreover if you want to validate default values with validate_all, pydantic will need to call the default_factory, which could lead to side effects!
Field customization⚑
Optionally, the Field function can be used to provide extra information about the field and validations. It has the following arguments:
- default: (a positional argument) the default value of the field. Since the Field replaces the field's default, this first argument can be used to set the default. Use ellipsis (...) to indicate the field is required.
- default_factory: a zero-argument callable that will be called when a default value is needed for this field. Among other purposes, this can be used to set dynamic default values. It is forbidden to set both default and default_factory.
- alias: the public name of the field.
- title: if omitted, field_name.title() is used.
- description: if omitted and the annotation is a sub-model, the docstring of the sub-model will be used.
- const: this argument must be the same as the field's default value if present.
- gt: for numeric values (int, float, Decimal), adds a validation of "greater than" and an annotation of exclusiveMinimum to the JSON Schema.
- ge: for numeric values, this adds a validation of "greater than or equal" and an annotation of minimum to the JSON Schema.
- lt: for numeric values, this adds a validation of "less than" and an annotation of exclusiveMaximum to the JSON Schema.
- le: for numeric values, this adds a validation of "less than or equal" and an annotation of maximum to the JSON Schema.
- multiple_of: for numeric values, this adds a validation of "a multiple of" and an annotation of multipleOf to the JSON Schema.
- min_items: for list values, this adds a corresponding validation and an annotation of minItems to the JSON Schema.
- max_items: for list values, this adds a corresponding validation and an annotation of maxItems to the JSON Schema.
- min_length: for string values, this adds a corresponding validation and an annotation of minLength to the JSON Schema.
- max_length: for string values, this adds a corresponding validation and an annotation of maxLength to the JSON Schema.
- allow_mutation: a boolean which defaults to True. When False, the field raises a TypeError if the field is assigned on an instance. The model config must set validate_assignment to True for this check to be performed.
- regex: for string values, this adds a Regular Expression validation generated from the passed string and an annotation of pattern to the JSON Schema.
- **: any other keyword arguments (e.g. examples) will be added verbatim to the field's schema.
!!! note pydantic validates strings using re.match, which treats regular expressions as implicitly anchored at the beginning. On the contrary, JSON Schema validators treat the pattern keyword as implicitly unanchored, more like what re.search does.
Instead of using Field, the fields property of the Config class can be used to set all of the arguments above except default.
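A minimal sketch of a few of these arguments in use follows. Product is a hypothetical model, and only max_length, gt, ge and le are used, since those are the arguments I'm confident behave the same across pydantic releases.

```python
from pydantic import BaseModel, Field, ValidationError


class Product(BaseModel):
    name: str = Field(..., max_length=10)  # required, at most 10 characters
    price: float = Field(..., gt=0)  # required, strictly positive
    discount: int = Field(0, ge=0, le=100)  # optional, between 0 and 100


product = Product(name="pen", price=1.5)
print(product.discount)

try:
    # Every field breaks its constraint, so three errors are collected.
    Product(name="a very long product name", price=-1, discount=150)
except ValidationError as error:
    failed_fields = [err["loc"][0] for err in error.errors()]
    print(failed_fields)
```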
Parsing data into a specified type⚑
Pydantic includes a standalone utility function parse_obj_as that can be used to apply the parsing logic used to populate pydantic models in a more ad-hoc way. This function behaves similarly to BaseModel.parse_obj, but works with arbitrary pydantic-compatible types.
This is especially useful when you want to parse results into a type that is not a direct subclass of BaseModel. For example:
from typing import List
from pydantic import BaseModel, parse_obj_as
class Item(BaseModel):
    id: int
    name: str

# `item_data` could come from an API call, eg., via something like:
# item_data = requests.get('https://my-api.com/items').json()
item_data = [{"id": 1, "name": "My Item"}]
items = parse_obj_as(List[Item], item_data)
print(items)
# > [Item(id=1, name='My Item')]
This function is capable of parsing data into any of the types pydantic can handle as fields of a BaseModel.
Pydantic also includes a similar standalone function called parse_file_as, which is analogous to BaseModel.parse_file.
Data Conversion⚑
pydantic may cast input data to force it to conform to model field types, and in some cases this may result in a loss of information. For example:
from pydantic import BaseModel
class Model(BaseModel):
    a: int
    b: float
    c: str

print(Model(a=3.1415, b=" 2.72 ", c=123).dict())
# > {'a': 3, 'b': 2.72, 'c': '123'}
This is a deliberate decision of pydantic, and in general it's the most useful approach. See here for a longer discussion on the subject.
Initialize attributes at object creation⚑
pydantic recommends using root validators, but it's difficult to understand how to use them and to debug the errors. You also don't have easy access to the default values of the model. I'd rather overwrite the __init__ method.
Overwriting the __init__ method⚑
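from pydantic import BaseModel

class Fish(BaseModel):
    name: str
    color: str

    def __init__(self, **kwargs):
        # Run the normal pydantic parsing and validation first.
        super().__init__(**kwargs)
        print("Fish initialization successful!")
        # complex_function is a placeholder for your own logic.
        self.color = complex_function()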
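If you want to create some of the attributes before validation, you can use the next snippet:
import sqlite3
from pathlib import Path
from pydantic import BaseModel, ConfigDict

class Sqlite(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)
    path: Path
    db: sqlite3.Cursor

    def __init__(self, **kwargs):
        # Build the cursor from the path before pydantic validates the fields.
        conn = sqlite3.connect(kwargs["path"])
        kwargs["db"] = conn.cursor()
        super().__init__(**kwargs)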
Using root validators⚑
If you want to initialize attributes of the object automatically at object creation, similar to what you'd do with the __init__ method of the class, you need to use root_validators.
import os
import sqlite3
from typing import Any, Dict

from pydantic import BaseModel, root_validator

class PypikaRepository(BaseModel):
    """Implement the repository pattern using the Pypika query builder."""

    connection: sqlite3.Connection
    cursor: sqlite3.Cursor

    class Config:
        """Configure the pydantic model."""

        arbitrary_types_allowed = True

    @root_validator(pre=True)
    @classmethod
    def set_connection(cls, values: Dict[str, Any]) -> Dict[str, Any]:
        """Set the connection to the database.

        Raises:
            ConnectionError: If there is no database file.
        """
        database_file = values["database_url"].replace("sqlite:///", "")
        if not os.path.isfile(database_file):
            raise ConnectionError(f"There is no database file: {database_file}")
        connection = sqlite3.connect(database_file)
        values["connection"] = connection
        values["cursor"] = connection.cursor()
        return values
I had to set arbitrary_types_allowed because the sqlite3 objects are not among the types that pydantic supports out of the box.
Set private attributes⚑
If you want to define some attributes that are not part of the model use PrivateAttr:
from datetime import datetime
from random import randint
from typing import Any
from pydantic import BaseModel, PrivateAttr

class TimeAwareModel(BaseModel):
    _processed_at: datetime = PrivateAttr(default_factory=datetime.now)
    _secret_value: int = PrivateAttr()

    def __init__(self, **data: Any) -> None:
        super().__init__(**data)
        # this could also be done with default_factory
        self._secret_value = randint(1, 5)

m = TimeAwareModel()
print(m._processed_at)
# > 2021-03-03 17:30:04.030758
print(m._secret_value)
# > 5
Define fields to exclude from exporting at config level⚑
This won't be necessary once version 1.9 is released, because you will be able to define the fields to exclude in the Config of the model using something like:
class User(BaseModel):
    id: int
    username: str
    password: str

class Transaction(BaseModel):
    id: str
    user: User
    value: int

    class Config:
        fields = {
            "value": {
                "alias": "Amount",
                "exclude": ...,
            },
            "user": {"exclude": {"username", "password"}},
            "id": {"dump_alias": "external_id"},
        }
The release is taking its time because the developer's gremlin and salaried work are taking up his time.
Update entity attributes with a dictionary⚑
To update a model with the data of a dictionary you can create a new object with the new data using the update argument of the copy method.
class FooBarModel(BaseModel):
    banana: float
    foo: str

m = FooBarModel(banana=3.14, foo="hello")
m.copy(update={"banana": 0})
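The sketch below spells out what the call returns, under the assumption that copy is available (it is the pydantic 1.x name; newer releases may emit a deprecation warning and offer model_copy instead). Note that, as far as I can tell, the values passed through update are not validated.

```python
from pydantic import BaseModel


class FooBarModel(BaseModel):
    banana: float
    foo: str


m = FooBarModel(banana=3.14, foo="hello")

# copy returns a new object; the original instance is left untouched.
updated = m.copy(update={"banana": 0})

print(m.banana, updated.banana, updated.foo)
```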
Lazy loading attributes⚑
Currently there is no official support for lazy loading model attributes.
You can define your own properties but when you export the schema they won't appear there. dgasmith has a workaround though.
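To illustrate the limitation, here is a minimal sketch with a plain Python property computed on access. Rectangle is a hypothetical model, and schema() is the pydantic 1.x method mentioned earlier in this page.

```python
from pydantic import BaseModel


class Rectangle(BaseModel):
    width: float
    height: float

    @property
    def area(self) -> float:
        # Computed every time it is accessed; it is not a model field.
        return self.width * self.height


rectangle = Rectangle(width=2, height=3)
print(rectangle.area)

# The property is missing from the exported schema.
schema_fields = set(Rectangle.schema()["properties"])
print(schema_fields)
```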
Load a pydantic model from json⚑
You can use the model_validate_json method that will validate and return an object with the loaded data.
from datetime import date
from pydantic import BaseModel, ConfigDict, ValidationError
class Event(BaseModel):
    model_config = ConfigDict(strict=True)
    when: date
    where: tuple[int, int]

json_data = '{"when": "1987-01-28", "where": [51, -1]}'
print(Event.model_validate_json(json_data))
#> when=datetime.date(1987, 1, 28) where=(51, -1)
try:
    Event.model_validate({'when': '1987-01-28', 'where': [51, -1]})
except ValidationError as e:
    print(e)
"""
2 validation errors for Event
when
Input should be a valid date [type=date_type, input_value='1987-01-28', input_type=str]
where
Input should be a valid tuple [type=tuple_type, input_value=[51, -1], input_type=list]
"""
Troubleshooting⚑
Ignore a field when representing an object⚑
Use repr=False. This is useful for properties that don't return a value quickly, for example if you save an sh background process.
import typing
from pydantic import BaseModel, Field

class Temp(BaseModel):
    foo: typing.Any
    boo: typing.Any = Field(..., repr=False)
Copy produces copy that modifies the original⚑
When copying a model, changing a mutable attribute (such as a list) in place on the copy also changes it on the original. This only happens if deep != True, because a shallow copy shares those objects with the original. To fix it use: model.copy(deep=True).
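The behaviour can be reproduced with the following sketch. Basket is a hypothetical model, and copy is the pydantic 1.x method used throughout this page.

```python
from typing import List

from pydantic import BaseModel


class Basket(BaseModel):
    fruits: List[str]


original = Basket(fruits=["apple"])

# A shallow copy shares the list object with the original model.
shallow = original.copy()
shallow.fruits.append("pear")
print(original.fruits)

# A deep copy gets its own list, so the original is not affected.
deep = original.copy(deep=True)
deep.fruits.append("kiwi")
print(original.fruits, deep.fruits)
```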
E0611: No name 'BaseModel' in module 'pydantic'⚑
Add to your pyproject.toml the following lines:
# --------- Pylint -------------
[tool.pylint.'MESSAGES CONTROL']
extension-pkg-whitelist = "pydantic"
Or if it fails, add to the line # pylint: extension-pkg-whitelist
.