SQLAlchemy
SQLAlchemy is the Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL.
I discourage you to use an ORM to manage the interactions with the database. Check the alternative solutions.
Creating an SQL Schema⚑
First of all it's important to create a diagram with the database structure, it will help you in the design and it's a great improvement of the project's documentation. I usually use ondras wwwsqldesigner through his demo, as it's easy to use and it's possible to save the data in your repository in an xml file.
I assume you've already set up your project to support sqlalchemy in your project. If not, do so before moving forward.
Mapping styles⚑
Modern SQLAlchemy features two distinct styles of mapper configuration. The “Classical” style is SQLAlchemy’s original mapping API, whereas “Declarative” is the richer and more succinct system that builds on top of “Classical”.
The Classical, on the other hand, doesn't lock your models to the ORM, something that we avoid when using the repository pattern.
If you aren't going to use the repository pattern, use the declarative way, otherwise use the classical one
Creating Tables⚑
If you simply want to create a table of association without any parameters, such as with a many to many relationship association table, use this type of object.
Declarative type⚑
class User(Base):
"""
Class to define the User model.
"""
__tablename__ = 'user'
id = Column(Integer, primary_key=True, doc='User ID')
name = Column(String, doc='User name')
def __init__(
self,
id,
name=None,
):
self.id = id
self.name = name
There are different types of fields to add to a table:
- Boolean:
is_true = Column(Boolean)
. - Datetime:
created_date = Column(DateTime, doc='Date of creation')
. - Float:
score = Column(Float)
- Integer:
id = Column(Integer, primary_key=True, doc='Source ID')
. - String:
title = Column(String)
. - Text:
long_text = Column(Text)
.
To make sure that a field can't contain nulls
set the nullable=False
attribute in the definition of the Column
. If you want the contents to be unique use unique=True
.
If you want to use the Mysql driver of SQLAlchemy make sure to specify the length of the colums, for example String(16)
. For reference this are the common lengths:
- url: 2083
- name: 64 (it occupies the same 2 and 255).
- email: 64 (it occupies the same 2 and 255).
- username: 64 (it occupies the same 2 and 255).
Classical type⚑
File: model.py
class User():
def __init__(self, id, name=None):
self.id = id
self.name = name
File: orm.py
from models import User
from sqlalchemy import (
Column,
MetaData,
String,
Table,
Text,
)
metadata = MetaData()
user = Table(
"user",
metadata,
Column("id", String(64), primary_key=True),
Column("name", String(64)),
)
def start_mappers():
mapper(User, user)
Creating relationships⚑
Joined table inheritance⚑
In joined table inheritance, each class along a hierarchy of classes is represented by a distinct table. Querying for a particular subclass in the hierarchy will render as a SQL JOIN
along all tables in its inheritance path. If the queried class is the base class, the default behavior is to include only the base table in a SELECT
statement. In all cases, the ultimate class to instantiate for a given row is determined by a discriminator column or an expression that works against the base table. When a subclass is loaded only against a base table, resulting objects by default will have base attributes populated at first; attributes that are local to the subclass will lazy load when they are accessed.
The base class in a joined inheritance hierarchy is configured with additional arguments that will refer to the polymorphic discriminator column as well as the identifier for the base class.
Declarative⚑
class Employee(Base):
__tablename__ = 'employee'
id = Column(Integer, primary_key=True)
name = Column(String(50))
type = Column(String(50))
__mapper_args__ = {
'polymorphic_identity': 'employee',
'polymorphic_on': type
}
class Engineer(Employee):
__tablename__ = 'engineer'
id = Column(Integer, ForeignKey('employee.id'), primary_key=True)
engineer_name = Column(String(30))
__mapper_args__ = {
'polymorphic_identity': 'engineer',
}
class Manager(Employee):
__tablename__ = 'manager'
id = Column(Integer, ForeignKey('employee.id'), primary_key=True)
manager_name = Column(String(30))
__mapper_args__ = {
'polymorphic_identity': 'manager',
}
Classical⚑
File: model.py
class Employee:
def __init__(self, name):
self.name = name
class Manager(Employee):
def __init__(self, name, manager_data):
super().__init__(name)
self.manager_data = manager_data
class Engineer(Employee):
def __init__(self, name, engineer_info):
super().__init__(name)
self.engineer_info = engineer_info
File: orm.py
metadata = MetaData()
employee = Table(
'employee',
metadata,
Column('id', Integer, primary_key=True),
Column('name', String(50)),
Column('type', String(20)),
Column('manager_data', String(50)),
Column('engineer_info', String(50))
)
mapper(Employee, employee,
polymorphic_on=employee.c.type,
polymorphic_identity='employee',
exclude_properties={'engineer_info', 'manager_data'})
mapper(Manager,
inherits=Employee,
polymorphic_identity='manager',
exclude_properties={'engineer_info'})
mapper(Engineer,
inherits=Employee,
polymorphic_identity='engineer',
exclude_properties={'manager_data'})
One to many⚑
from sqlalchemy.orm import relationship
class User(db.Model):
__tablename__ = 'user'
id = Column(Integer, primary_key=True)
posts = relationship('Post', back_populates='user')
class Post(db.Model):
id = Column(Integer, primary_key=True)
body = Column(String(140))
user_id = Column(Integer, ForeignKey('user.id'))
user = relationship('User', back_populates='posts')
In the tests of the Post
class, only check that the user
attribute is present.
Factoryboy supports the creation of Dependent objects direct ForeignKey.
Self referenced one to many⚑
class Task(Base):
__tablename__ = 'task'
id = Column(String, primary_key=True, doc='fulid of creation')
parent_id = Column(String, ForeignKey('task.id'))
parent = relationship('Task', remote_side=[id], backref='children')
Many to many⚑
# Association tables
source_has_category = Table(
'source_has_category',
Base.metadata,
Column('source_id', Integer, ForeignKey('source.id')),
Column('category_id', Integer, ForeignKey('category.id'))
)
# Tables
class Category(Base):
__tablename__ = 'category'
id = Column(String, primary_key=True)
contents = relationship(
'Content',
back_populates='categories',
secondary=source_has_category,
)
class Content(Base):
__tablename__ = 'content'
id = Column(Integer, primary_key=True, doc='Content ID')
categories = relationship(
'Category',
back_populates='contents',
secondary=source_has_category,
)
Self referenced many to many⚑
Using the followers
table as an association table.
followers = db.Table(
'followers',
Base.metadata,
Column('follower_id', Integer, ForeignKey('user.id')),
Column('followed_id', Integer, ForeignKey('user.id')),
)
class User(Base):
__tablename__ = 'user'
id = Column(Integer, primary_key=True)
followed = relationship(
'User',
secondary=followers,
primaryjoin=(followers.c.follower_id == id),
secondaryjoin=(followers.c.followed_id == id),
backref=db.backref('followers', lazy='dynamic'),
lazy='dynamic',
)
User
: Is the right side entity of the relationship. Since this is a self-referential relationship, The same class must be used on both sides.secondary
: configures the association table that is used for this relationship.primaryjoin
: Indicates the condition that links the left side entity (the follower user) with the association table. The join condition for the left side of the relationship is the user id matching thefollower_id
field of the association table. Thefollowers.c.follower_id
expression references thefollower_id
column of the association table.secondaryjoin
: Indicates the condition that links the right side entity (the followed user) with the association table. This condition is similar to the one forprimaryjoin
.backref
: Defines how this relationship will be accessed from the right side entity. From the left side, the relationship is namedfollowed
, so from the right side, the namefollowers
represent all the left side users that are linked to the target user in the right side. The additionallazy
argument indicates the execution mode for this query. A mode ofdynamic
sets up the query not to run until specifically requested.lazy
: same as withbackref
, but this one applies to the left side query instead of the right side.
Testing SQLAlchemy Code⚑
The definition of the database can be tested, I usually use them to test that the attributes are loaded and that the factory objects work as expected. Several steps need to be set to make it work:
-
Create the factory boy objects in
tests/factories.py
. -
Configure the tests to use a temporal sqlite database in the
tests/conftest.py
file with the following contents (changing{{ program_name }}
):from alembic.command import upgrade from alembic.config import Config from sqlalchemy.orm import sessionmaker import os import pytest import tempfile temp_ddbb = tempfile.mkstemp()[1] os.environ['{{ program_name }} _DATABASE_URL'] = 'sqlite:///{}'.format(temp_ddbb) # It needs to be after the environmental variable from {{ program_name }}.models import engine from tests import factories @pytest.fixture(scope='module') def connection(): ''' Fixture to set up the connection to the temporal database, the path is stablished at conftest.py ''' # Create database connection connection = engine.connect() # Applies all alembic migrations. config = Config('{{ program_name }}/migrations/alembic.ini') upgrade(config, 'head') # End of setUp yield connection # Start of tearDown connection.close() @pytest.fixture(scope='function') def session(connection): ''' Fixture to set up the sqlalchemy session of the database. ''' # Begin a non-ORM transaction and bind session transaction = connection.begin() session = sessionmaker()(bind=connection) factories.UserFactory._meta.sqlalchemy_session = session yield session # Close session and rollback transaction session.close() transaction.rollback()
-
Define an abstract base test class
BaseModelTest
defined as following in thetests/unit/test_models.py
file.
from {{ program_name }} import models
from tests import factories
import pytest
class BaseModelTest:
"""
Abstract base test class to refactor model tests.
The Children classes must define the following attributes:
self.model: The model object to test.
self.dummy_instance: A factory object of the model to test.
self.model_attributes: List of model attributes to test
Public attributes:
dummy_instance (Factory_boy object): Dummy instance of the model.
"""
@pytest.fixture(autouse=True)
def base_setup(self, session):
self.session = session
def test_attributes_defined(self):
for attribute in self.model_attributes:
assert getattr(self.model, attribute) == \
getattr(self.dummy_instance, attribute)
@pytest.mark.usefixtures('base_setup')
class TestUser(BaseModelTest):
@pytest.fixture(autouse=True)
def setup(self, session):
self.factory = factories.UserFactory
self.dummy_instance = self.factory.create()
self.model = models.User(
id=self.dummy_instance.id,
name=self.dummy_instance.name,
)
self.model_attributes = [
'name',
'id',
]
- Then create the
models
table. - Create an alembic revision
- Run
pytest
:python -m pytest
.
Exporting database to json⚑
import json
def dump_sqlalchemy(output_connection_string,output_schema):
""" Returns the entire content of a database as lists of dicts"""
engine = create_engine(f'{output_connection_string}{output_schema}')
meta = MetaData()
meta.reflect(bind=engine) # http://docs.sqlalchemy.org/en/rel_0_9/core/reflection.html
result = {}
for table in meta.sorted_tables:
result[table.name] = [dict(row) for row in engine.execute(table.select())]
return json.dumps(result)
Cloning an SQLAlchemy object⚑
The following function:
- Copies all the non-primary-key columns from the input model to a new model instance.
- Allows definition of specific arguments.
- Leaves the original model object unmodified.
def clone_model(model, **kwargs):
"""Clone an arbitrary sqlalchemy model object without its primary key values."""
table = model.__table__
non_primary_key_columns = [
column_name
for column_name in table.__mapper__.attrs..keys()
if column_name not in table.primary_key
]
data = {
column_name: getattr(model, column_name)
for column_name in non_pk_columns
}
data.update(kwargs)
return model.__class__(**data)