Python Snippets

Initialize a dataclass with kwargs

If you care about accessing attributes by name, or if you can't distinguish between known and unknown arguments during initialisation, then your last resort without rewriting __init__ (which pretty much defeats the purpose of using dataclasses in the first place) is writing a @classmethod:

from dataclasses import dataclass
from inspect import signature


@dataclass
class Container:
    user_id: int
    body: str

    @classmethod
    def from_kwargs(cls, **kwargs):
        # fetch the constructor's signature
        cls_fields = {field for field in signature(cls).parameters}

        # split the kwargs into native ones and new ones
        native_args, new_args = {}, {}
        for key, value in kwargs.items():
            if key in cls_fields:
                native_args[key] = value
            else:
                new_args[key] = value

        # use the native ones to create the class ...
        ret = cls(**native_args)

        # ... and add the new ones by hand
        for new_key, new_value in new_args.items():
            setattr(ret, new_key, new_value)
        return ret

Usage:

params = {'user_id': 1, 'body': 'foo', 'bar': 'baz', 'amount': 10}
Container(**params)  # still doesn't work, raises a TypeError
c = Container.from_kwargs(**params)
print(c.bar)  # prints: 'baz'

Replace a substring of a string

txt = "I like bananas"

x = txt.replace("bananas", "apples")

Parse an RFC2822 date

Interesting to test the accepted format of RSS dates.

>>> from email.utils import parsedate_to_datetime
>>> datestr = 'Sun, 09 Mar 1997 13:45:00 -0500'
>>> parsedate_to_datetime(datestr)
datetime.datetime(1997, 3, 9, 13, 45, tzinfo=datetime.timezone(datetime.timedelta(-1, 68400)))

Convert a datetime to RFC2822

Interesting as it's the accepted format of RSS dates.

>>> import datetime
>>> from email import utils
>>> nowdt = datetime.datetime.now()
>>> utils.format_datetime(nowdt)
'Mon, 10 Feb 2020 10:06:53 -0000'

Encode url

import typing
import urllib.parse

from pydantic import AnyHttpUrl

def _normalize_url(url: str) -> AnyHttpUrl:
    """Encode url to make it compatible with AnyHttpUrl."""
    return typing.cast(
        AnyHttpUrl,
        urllib.parse.quote(url, ":/"),
    )

The :/ is needed when you try to parse urls that have the protocol, otherwise https://www. gets transformed into https%3A//www..
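
As a quick check of the behaviour described above, this sketch compares the default `safe` value of `urllib.parse.quote` with `":/"`. The URL is made up for illustration.

```python
import urllib.parse

# Hypothetical URL, used only to illustrate the effect of `safe`.
url = "https://www.example.com/some path"

# By default only "/" is left untouched, so the ":" after the scheme is escaped.
default_quoted = urllib.parse.quote(url)

# Adding ":" to the safe characters keeps the scheme readable.
safe_quoted = urllib.parse.quote(url, safe=":/")

print(default_quoted)  # https%3A//www.example.com/some%20path
print(safe_quoted)     # https://www.example.com/some%20path
```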

Fix SIM113 Use enumerate

Use enumerate to get a running number over an iterable.

# Bad
idx = 0
for el in iterable:
    ...
    idx += 1

# Good
for idx, el in enumerate(iterable):
    ...

Define a property of a class

On Python 3.9 and 3.10 you can stack the decorators directly (chaining classmethod and property was deprecated in Python 3.11 and removed in 3.13):

class G:
    @classmethod
    @property
    def __doc__(cls):
        return f'A doc for {cls.__name__!r}'

If you're not, you can define the decorator classproperty:

from typing import Any, Callable


# N801: class name 'classproperty' should use CapWords convention, but it's a decorator.
# C0103: Class name "classproperty" doesn't conform to PascalCase naming style but it's
# a decorator.
class classproperty:  # noqa: N801, C0103
    """Define a class property.

    From Python 3.9 you can use the decorators directly.

    class G:
        @classmethod
        @property
        def __doc__(cls):
            return f'A doc for {cls.__name__!r}'
    """

    def __init__(self, function: Callable[..., Any]) -> None:
        """Initialize the decorator."""
        self.function = function

    # ANN401: Any not allowed in typings, but I don't know how to narrow the hints in
    # this case.
    def __get__(self, owner_self: Any, owner_cls: Any) -> Any:  # noqa: ANN401
        """Return the desired value."""
        return self.function(owner_cls)

But you'll run into pylint's W0143: Comparing against a callable, did you omit the parenthesis? (comparison-with-callable) warning when you compare the result of the property with anything, as pylint doesn't detect that it's a property instead of a method.

How to close a subprocess process

# `process` is the subprocess.Popen object of the running process
process.terminate()
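
A fuller sketch of the same idea, assuming the process was started with `subprocess.Popen`. The child command (a Python one-liner that sleeps) and the 5 second timeout are only illustrative choices.

```python
import subprocess
import sys

# Start a child process that would otherwise run for a minute.
process = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# Ask it to stop (SIGTERM on POSIX, TerminateProcess on Windows) ...
process.terminate()

try:
    # ... and wait for it to exit so it doesn't linger as a zombie.
    process.wait(timeout=5)
except subprocess.TimeoutExpired:
    # Fall back to a forced kill if it ignores the request.
    process.kill()
    process.wait()
```

Waiting after `terminate()` is what actually reaps the child; without it the return code is never collected.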

How to extend a dictionary

a.update(b)

How to Find Duplicates in a List in Python

numbers = [1, 2, 3, 2, 5, 3, 3, 5, 6, 3, 4, 5, 7]

duplicates = [number for number in numbers if numbers.count(number) > 1]
unique_duplicates = list(set(duplicates))

# Returns: [2, 3, 5]

If you want to count the number of occurrences of each duplicate, you can use:

from collections import Counter
numbers = [1, 2, 3, 2, 5, 3, 3, 5, 6, 3, 4, 5, 7]

counts = dict(Counter(numbers))
duplicates = {key:value for key, value in counts.items() if value > 1}

# Returns: {2: 2, 3: 4, 5: 3}

To remove the duplicates use a combination of list and set:

unique = list(set(numbers))

# Returns: [1, 2, 3, 4, 5, 6, 7]
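
Converting to a set doesn't guarantee that the original order is kept. If the order matters, one possible alternative is to go through `dict.fromkeys`, as in this small sketch that reuses the same `numbers` list:

```python
numbers = [1, 2, 3, 2, 5, 3, 3, 5, 6, 3, 4, 5, 7]

# dict keys are unique and keep insertion order (guaranteed since Python 3.7).
unique_in_order = list(dict.fromkeys(numbers))

# Returns: [1, 2, 3, 5, 6, 4, 7]
```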

How to decompress a gz file

import gzip
import shutil
with gzip.open('file.txt.gz', 'rb') as f_in:
    with open('file.txt', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

How to compress/decompress a tar file

import tarfile


def compress(tar_file, members):
    """
    Add files (`members`) to `tar_file` and compress it.
    """
    tar = tarfile.open(tar_file, mode="w:gz")

    for member in members:
        tar.add(member)

    tar.close()

def decompress(tar_file, path, members=None):
    """
    Extracts `tar_file` and puts the `members` to `path`.
    If members is None, all members on `tar_file` will be extracted.
    """
    tar = tarfile.open(tar_file, mode="r:gz")
    if members is None:
        members = tar.getmembers()
    for member in members:
        tar.extract(member, path=path)
    tar.close()

Parse XML file with beautifulsoup

You need both beautifulsoup4 and lxml:

import requests
from bs4 import BeautifulSoup

bs = BeautifulSoup(requests.get(url).text, "lxml")

Get a traceback from an exception

import traceback

# `e` is an exception object that you get from somewhere
traceback_str = ''.join(traceback.format_tb(e.__traceback__))

Change the logging level of a library

For example to change the logging level of the library sh use:

import logging

sh_logger = logging.getLogger("sh")
sh_logger.setLevel(logging.WARNING)

Get all subdirectories of a directory

[x[0] for x in os.walk(directory)]

Move a file

import os

os.rename("path/to/current/file.foo", "path/to/new/destination/for/file.foo")

IPv4 regular expression

import re

regex = re.compile(
    r"(?<![-\.\d])(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}"
    r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(?![\.\d])"
)
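
When you only need to validate a whole string, rather than search for addresses inside a longer text, the standard library's `ipaddress` module may be simpler than a regular expression. A minimal sketch, where `is_ipv4` is a hypothetical helper name:

```python
import ipaddress


def is_ipv4(text: str) -> bool:
    """Return True if `text` is a valid dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
    except ValueError:
        # AddressValueError is a subclass of ValueError.
        return False
    return True


is_ipv4("192.168.34.1")  # True
is_ipv4("256.1.1.1")     # False
```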

[Remove the elements of a list from another](https://stackoverflow.com/questions/4211209/remove-all-the-elements-that-occur-in-one-list-from-another)

>>> {1, 2, 6, 8} - {2, 3, 5, 8}
{1, 6}

Note, however, that sets do not preserve the order of elements, and cause any duplicated elements to be removed. The elements also need to be hashable. If these restrictions are tolerable, this may often be the simplest and highest performance option.
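
If the order or the duplicates of the first list matter, a list comprehension with a set used only for the lookup is one possible alternative. A small sketch with made-up lists (the elements of `b` still need to be hashable):

```python
a = [1, 2, 6, 8, 6]
b = [2, 3, 5, 8]

# Build the lookup set once so each membership test is O(1) on average.
to_remove = set(b)
result = [item for item in a if item not in to_remove]

# Returns: [1, 6, 6]
```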

Copy a directory

import shutil

shutil.copytree('bar', 'foo')

Copy a file

import shutil

shutil.copyfile(src_file, dest_file)

Capture the stdout of a function

import io
from contextlib import redirect_stdout

f = io.StringIO()
with redirect_stdout(f):
    do_something(my_object)
out = f.getvalue()

[Make a temporary directory](https://stackoverflow.com/questions/3223604/how-to-create-a-temporary-directory-and-get-its-path-file-name)

import tempfile

dirpath = tempfile.mkdtemp()

Change the working directory of a test

The following function-level fixture will change to the test case directory, run the test (yield), then change back to the calling directory to avoid side-effects.

import os
from typing import Any

import pytest
from _pytest.fixtures import SubRequest


@pytest.fixture(name="change_test_dir")
def change_test_dir_(request: SubRequest) -> Any:
    os.chdir(request.fspath.dirname)
    yield
    os.chdir(request.config.invocation_dir)

Where:

  • request is a built-in pytest fixture
  • fspath is the LocalPath to the test module being executed
  • dirname is the directory of the test module
  • request.config.invocation_dir is the folder from which pytest was executed
  • request.config.rootdir is the pytest root, which doesn't change based on where you run pytest. It's not used here, but it could be useful.

Any processes that are kicked off by the test will use the test case folder as their working directory and copy their logs, outputs, etc. there, regardless of where the test suite was executed.

Remove a substring from the end of a string

On Python 3.9 and newer you can use the removeprefix and removesuffix methods to remove an entire substring from either side of the string:

url = 'abcdc.com'
url.removesuffix('.com')    # Returns 'abcdc'
url.removeprefix('abcdc.')  # Returns 'com'

On Python 3.8 and older you can use endswith and slicing:

url = 'abcdc.com'
if url.endswith('.com'):
    url = url[:-4]

Or a regular expression:

import re
url = 'abcdc.com'
url = re.sub(r'\.com$', '', url)

Make a flat list of lists with a list comprehension

There is no nice way to do it :(. The best I've found is:

t = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
flat_list = [item for sublist in t for item in sublist]
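
Outside of a list comprehension, `itertools.chain.from_iterable` from the standard library does the same job. A short sketch with the same input:

```python
from itertools import chain

t = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]

# chain.from_iterable lazily yields the items of each sublist in turn.
flat_list = list(chain.from_iterable(t))

# Returns: [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Like the comprehension, it only flattens one level of nesting.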

Replace all characters of a string with another character

mystring = '_'*len(mystring)

Locate element in list

a = ['a', 'b']

index = a.index('b')

Transpose a list of lists

>>> l=[[1,2,3],[4,5,6],[7,8,9]]
>>> [list(i) for i in zip(*l)]
[[1, 4, 7], [2, 5, 8], [3, 6, 9]]

Check the type of a list of strings

from typing import Any


def _is_list_of_strings(data: Any) -> bool:
    """Check if data is a non-empty list of strings."""
    if data and isinstance(data, list):
        return all(isinstance(elem, str) for elem in data)
    else:
        return False

Install default directories and files for a command line program

I spent a long time trying, without success, to make setup.py create the required directories and files when running pip install.

Finally, I decided that the program itself should create the data once the FileNotFoundError exception is raised. That way you don't penalize the load time, because if the file or directory exists that code is not run.
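
A minimal sketch of that approach. The `load_config` helper, the `DEFAULT_CONFIG` contents and the paths are all hypothetical names chosen for the example, not part of any real program:

```python
import tempfile
from pathlib import Path

# Hypothetical default contents; a real program would ship its own.
DEFAULT_CONFIG = "verbose: false\n"


def load_config(config_path: Path) -> str:
    """Read the config file, creating it with defaults if it's missing."""
    try:
        return config_path.read_text()
    except FileNotFoundError:
        # Only reached on the first run, so normal startups pay no extra cost.
        config_path.parent.mkdir(parents=True, exist_ok=True)
        config_path.write_text(DEFAULT_CONFIG)
        return DEFAULT_CONFIG


# Usage with a throwaway directory instead of the user's real home.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "myprogram" / "config.yaml"
    first = load_config(path)   # creates the directory and the file
    second = load_config(path)  # reads the existing file
```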

Check if a dictionary is a subset of another

If you have two dictionaries big = {'a': 1, 'b': 2, 'c':3} and small = {'c': 3, 'a': 1}, and want to check whether small is a subset of big, use the next snippet:

>>> small.items() <= big.items()
True

As the code is not very common or intuitive, I'd add a comment to explain what you're doing.

When to use isinstance and when to use type

isinstance takes into account inheritance, while type doesn't. So if we have the next code:

class Shape:
    pass

class Rectangle(Shape):
    def __init__(self, length, width):
        self.length = length
        self.width = width
        self.area = length * width

    def get_area(self):
        return self.length * self.width

class Square(Rectangle):
    def __init__(self,length):
        Rectangle.__init__(self,length,length)

And we want to check whether an object a = Square(5) is exactly a Rectangle, and not an instance of one of its subclasses, we can't use isinstance because it returns True, as Square is a subclass of Rectangle:

>>> isinstance(a, Rectangle)
True

Instead, use a comparison with type:

>>> type(a) == Rectangle
False

Find a static file of a python module

Useful when you want to initialize a configuration file of a cli program when it's not present.

Imagine you have a setup.py with the next contents:

from setuptools import find_packages, setup

setup(
    name="pynbox",
    packages=find_packages("src"),
    package_dir={"": "src"},
    package_data={"pynbox": ["py.typed", "assets/config.yaml"]},
)

Then you could import the data with:

import pkg_resources

file_path = pkg_resources.resource_filename("pynbox", "assets/config.yaml")

Delete a file

import os
os.remove('demofile.txt')

Measure elapsed time between lines of code

import time

start = time.time()
print("hello")
end = time.time()
print(end - start)

Create combination of elements in groups of two

Using the combinations function in Python's itertools module:

>>> import itertools
>>> list(itertools.combinations('ABC', 2))
[('A', 'B'), ('A', 'C'), ('B', 'C')]

If you want the permutations use itertools.permutations.

Convert html to readable plaintext

pip install html2text

import html2text

with open("foobar.html") as html_file:
    html = html_file.read()
print(html2text.html2text(html))

Parse a datetime from a string

from dateutil import parser
parser.parse("Aug 28 1999 12:00AM")  # datetime.datetime(1999, 8, 28, 0, 0)

Install a python dependency from a git repository

With pip you can:

pip install git+https://github.com/path/to/repository@master

If you want to hard code it in your setup.py, you need to:

install_requires = [
  'some-pkg @ git+ssh://git@github.com/someorgname/pkg-repo-name@v1.1#egg=some-pkg',
]

But PyPI won't allow you to upload the package, as it will give you an error:

HTTPError: 400 Bad Request from https://test.pypi.org/legacy/
Invalid value for requires_dist. Error: Can't have direct dependency: 'deepdiff @ git+git://github.com/lyz-code/deepdiff@master'

It looks like this is a conscious decision on the PyPI side. Basically, they don't want pip to reach out to URLs outside their site when installing from PyPI.

An ugly patch is to install the dependencies in a PostInstall custom script in the setup.py of your program:

from subprocess import getoutput

from setuptools import setup
from setuptools.command.install import install


# ignore: cannot subclass install, has type Any. And what would you do?
class PostInstall(install):  # type: ignore
    """Install direct dependency.

    PyPI doesn't allow uploading packages with direct dependencies, so we need to
    install them manually.
    """

    def run(self) -> None:
        """Install dependencies."""
        install.run(self)
        print(getoutput("pip install git+https://github.com/lyz-code/deepdiff@master"))

setup(
    cmdclass={'install': PostInstall}
)

It may not work!

Last time I used this solution, when I added the library on a setup.py the direct dependencies weren't installed :S

Check or test directories and files

import os


def test_dir(directory):
    ''' Create the directory if it doesn't exist. '''
    os.makedirs(directory, exist_ok=True)


def test_file(filepath, mode):
    ''' Check if a file exists and is accessible with the given mode. '''

    def check_mode(os_mode, mode):
        if not (os.path.isfile(filepath) and os.access(filepath, os_mode)):
            raise IOError("Can't access the file with mode " + mode)

    # Compare strings with ==, not `is`, which checks object identity
    if mode == 'r':
        check_mode(os.R_OK, mode)
    elif mode == 'w':
        check_mode(os.W_OK, mode)
    elif mode == 'a':
        check_mode(os.R_OK, mode)
        check_mode(os.W_OK, mode)

Remove the extension of a file

os.path.splitext("/path/to/some/file.txt")[0]

Iterate over the files of a directory

import os

directory = '/path/to/directory'
for entry in os.scandir(directory):
    if (entry.path.endswith(".jpg")
            or entry.path.endswith(".png")) and entry.is_file():
        print(entry.path)

Create directory

import os

os.makedirs(directory, exist_ok=True)

Touch a file

from pathlib import Path

Path('path/to/file.txt').touch()

Get the first day of next month

import datetime

current = datetime.datetime(mydate.year, mydate.month, 1)
next_month = datetime.datetime(mydate.year + mydate.month // 12, mydate.month % 12 + 1, 1)

Get the week number of a datetime

datetime.datetime has an isocalendar() method, which returns a tuple containing the calendar week:

>>> import datetime
>>> datetime.datetime(2010, 6, 16).isocalendar()[1]
24

datetime.date.isocalendar() is an instance-method returning a tuple containing year, weeknumber and weekday in respective order for the given date instance.

Get the monday of a week number

A week number is not enough to generate a date; you need a day of the week as well. Add a default:

import datetime
d = "2013-W26"
r = datetime.datetime.strptime(d + '-1', "%Y-W%W-%w")

The -1 suffix and the -%w pattern tell the parser to pick the Monday of that week.
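
Note that %W counts weeks from the first Monday of the year, which is not the same as the ISO week number returned by isocalendar(). If the week number is an ISO one, a possible variant is to use the ISO directives %G, %V and %u (supported by strptime since Python 3.6) or date.fromisocalendar (Python 3.8+). A small sketch comparing both readings of the same string:

```python
import datetime

d = "2013-W26"

# %G = ISO year, %V = ISO week, %u = ISO weekday (1 is Monday).
iso_monday = datetime.datetime.strptime(d + "-1", "%G-W%V-%u")

# Equivalent on Python 3.8+, starting from numbers instead of a string.
same_monday = datetime.date.fromisocalendar(2013, 26, 1)

# For comparison, the %W-based parse of the same string.
w_monday = datetime.datetime.strptime(d + "-1", "%Y-W%W-%w")

print(iso_monday.date())  # 2013-06-24
print(same_monday)        # 2013-06-24
print(w_monday.date())    # 2013-07-01
```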

Get the month name from a number

>>> import calendar
>>> calendar.month_name[3]
'March'

Get ordinal from number

def int_to_ordinal(number: int) -> str:
    '''Convert an integer into its ordinal representation.

    int_to_ordinal(0)   => '0th'
    int_to_ordinal(3)   => '3rd'
    int_to_ordinal(122) => '122nd'
    int_to_ordinal(213) => '213th'

    Args:
        number: Number to convert

    Returns:
        ordinal representation of the number
    '''
    suffix = ['th', 'st', 'nd', 'rd', 'th'][min(number % 10, 4)]
    if 11 <= (number % 100) <= 13:
        suffix = 'th'
    return f"{number}{suffix}"

Group or sort a list of dictionaries or objects by a specific key

Python lists have a built-in list.sort() method that modifies the list in-place. There is also a sorted() built-in function that builds a new sorted list from an iterable.

Sorting basics

A simple ascending sort is very easy: just call the sorted() function. It returns a new sorted list:

>>> sorted([5, 2, 3, 1, 4])
[1, 2, 3, 4, 5]

Key functions

Both list.sort() and sorted() have a key parameter to specify a function (or other callable) to be called on each list element prior to making comparisons.

For example, here’s a case-insensitive string comparison:

>>> sorted("This is a test string from Andrew".split(), key=str.lower)
['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']

The value of the key parameter should be a function (or other callable) that takes a single argument and returns a key to use for sorting purposes. This technique is fast because the key function is called exactly once for each input record.

A common pattern is to sort complex objects using some of the object’s indices as keys. For example:

>>> from operator import itemgetter
>>> student_tuples = [
    ('john', 'A', 15),
    ('jane', 'B', 12),
    ('dave', 'B', 10),
]

>>> sorted(student_tuples, key=itemgetter(2))   # sort by age
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

The same technique works for objects with named attributes. For example:

>>> from operator import attrgetter
>>> class Student:
    def __init__(self, name, grade, age):
        self.name = name
        self.grade = grade
        self.age = age

    def __repr__(self):
        return repr((self.name, self.grade, self.age))

>>> student_objects = [
    Student('john', 'A', 15),
    Student('jane', 'B', 12),
    Student('dave', 'B', 10),
]

>>> sorted(student_objects, key=attrgetter('age'))   # sort by age
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

The operator module functions allow multiple levels of sorting. For example, to sort by grade then by age:

>>> sorted(student_tuples, key=itemgetter(1,2))
[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]

>>> sorted(student_objects, key=attrgetter('grade', 'age'))
[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]

Sort stability and complex sorts

Sorts are guaranteed to be stable. That means that when multiple records have the same key, their original order is preserved.

>>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]

>>> sorted(data, key=itemgetter(0))
[('blue', 1), ('blue', 2), ('red', 1), ('red', 2)]

Notice how the two records for blue retain their original order so that ('blue', 1) is guaranteed to precede ('blue', 2).

This wonderful property lets you build complex sorts in a series of sorting steps. For example, to sort the student data by descending grade and then ascending age, do the age sort first and then sort again using grade:

>>> s = sorted(student_objects, key=attrgetter('age'))     # sort on secondary key

>>> sorted(s, key=attrgetter('grade'), reverse=True)       # now sort on primary key, descending
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

This can be abstracted out into a wrapper function that can take a list and tuples of field and order to sort them on multiple passes.

>>> def multisort(xs, specs):
    for key, reverse in reversed(specs):
        xs.sort(key=attrgetter(key), reverse=reverse)
    return xs

>>> multisort(list(student_objects), (('grade', True), ('age', False)))
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

Get the attribute of an attribute

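To sort a list of objects in place by one of their attributes:

ut.sort(key=lambda x: x.count, reverse=True)

To return a new list instead, use the sorted() built-in function. The key function can reach into nested attributes, for example the id_ of each element's body:

newlist = sorted(ut, key=lambda x: x.body.id_, reverse=True)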

Iterate over an instance object's data attributes in Python

from dataclasses import dataclass


@dataclass(frozen=True)
class Search:
    center: str
    distance: str

se = Search('a', 'b')
for key, value in se.__dict__.items():
    print(key, value)
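
`__dict__` also lists attributes that aren't dataclass fields, and it isn't available on classes that define `__slots__`. A possible alternative is the `fields` and `asdict` helpers of the dataclasses module, sketched here with the same `Search` class:

```python
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class Search:
    center: str
    distance: str


se = Search("a", "b")

# fields() walks the declared dataclass fields in definition order.
for field in fields(se):
    print(field.name, getattr(se, field.name))

# asdict() returns the same data as a plain dictionary.
as_dictionary = asdict(se)  # {'center': 'a', 'distance': 'b'}
```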

Generate ssh key

pip install cryptography

from os import chmod
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.backends import default_backend as crypto_default_backend

private_key = rsa.generate_private_key(
    backend=crypto_default_backend(),
    public_exponent=65537,
    key_size=4096
)
pem = private_key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.TraditionalOpenSSL,
    encryption_algorithm=serialization.NoEncryption()
)

with open("/tmp/private.key", 'wb') as content_file:
    # Python 3 octal literals need the 0o prefix
    chmod("/tmp/private.key", 0o600)
    content_file.write(pem)

public_key = (
    private_key.public_key().public_bytes(
        encoding=serialization.Encoding.OpenSSH,
        format=serialization.PublicFormat.OpenSSH,
    )
    + b' user@email.org'
)
with open("/tmp/public.key", 'wb') as content_file:
    content_file.write(public_key)

Make multiline code look clean

If you need variables that contain multiline strings inside functions or methods, you need to remove the indentation:

def test():
    # end first line with \ to avoid the empty line!
    s = '''\
hello
  world
'''

Which is inconvenient as it breaks some editor source code folding and it's ugly for the eye.

The solution is to use textwrap.dedent():

import textwrap

def test():
    # end first line with \ to avoid the empty line!
    s = '''\
    hello
      world
    '''
    print(repr(s))          # prints '    hello\n      world\n    '
    print(repr(textwrap.dedent(s)))  # prints 'hello\n  world\n'

If you forget to add the trailing \ character of s = '''\ or use s = '''hello, you're going to have a bad time with black.

Play a sound

pip install playsound

from playsound import playsound
playsound('path/to/file.wav')

Deep copy a dictionary

import copy
d = { ... }
d2 = copy.deepcopy(d)

Find the root directory of a package

pyprojroot finds the root directory of your project and returns it as a pathlib object. You can pass the here function a path relative to the project root and, no matter which directory of the project you are working in, you will get the full path to the specified file.

Installation

pip install pyprojroot

Usage

from pyprojroot import here

here()

Check if an object has an attribute

if hasattr(a, 'property'):
    a.property

Check if a loop ends completely

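for loops can take an else block that runs only when the loop finishes without hitting a break statement. In the next example the loop breaks on its last element, so the else block is skipped and the message is never printed: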

for i in [1,2,3]:
    print(i)
    if i==3:
        break
else:
    print("for loop was not broken")
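
A typical use of the else block is a search loop. In this sketch, `find_first_even` is a hypothetical helper that relies on the else branch to detect that nothing was found:

```python
def find_first_even(numbers):
    """Return the first even number, or None if the loop never breaks."""
    for number in numbers:
        if number % 2 == 0:
            found = number
            break
    else:
        # Runs only when the loop finished without hitting `break`.
        found = None
    return found


find_first_even([1, 4, 5])  # 4
find_first_even([1, 3, 5])  # None
```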

Merge two lists

z = x + y

Merge two dictionaries

z = {**x, **y}

Create user defined exceptions

Programs may name their own exceptions by creating a new exception class. Exceptions should typically be derived from the Exception class, either directly or indirectly.

Exception classes are meant to be kept simple, only offering a number of attributes that allow information about the error to be extracted by handlers for the exception. When creating a module that can raise several distinct errors, a common practice is to create a base class for exceptions defined by that module, and subclass that to create specific exception classes for different error conditions:

class Error(Exception):
    """Base class for exceptions in this module."""


class ConceptNotFoundError(Error):
    """Transactions with unmatched concept."""

    def __init__(self, message: str, transactions: List[Transaction]) -> None:
        """Initialize the exception."""
        self.message = message
        self.transactions = transactions
        super().__init__(self.message)

Most exceptions are defined with names that end in “Error”, similar to the naming of the standard exceptions.

Import a module or its objects from within a python program

import importlib

module = importlib.import_module('os')
module_class = module.getcwd

relative_module = importlib.import_module('.model', package='mypackage')
class_to_extract = 'MyModel'
extracted_class = getattr(relative_module, class_to_extract)

The first argument specifies what module to import in absolute or relative terms (e.g. either pkg.mod or ..mod). If the name is specified in relative terms, then the package argument must be set to the name of the package which is to act as the anchor for resolving the package name (e.g. import_module('..mod', 'pkg.subpkg') will import pkg.mod).
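
To see the absolute and relative forms side by side without needing a package of your own, this sketch uses the standard library's json package as the anchor (chosen only because it's always available):

```python
import importlib

# Absolute import: equivalent to `import json.decoder`.
absolute = importlib.import_module("json.decoder")

# Relative import: ".decoder" is resolved against the anchor package "json".
relative = importlib.import_module(".decoder", package="json")

# Both calls return the same module object.
print(absolute is relative)  # True

# Fetch an object from the module by its name.
decoder_class = getattr(relative, "JSONDecoder")
print(decoder_class.__name__)  # JSONDecoder
```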

Get system's timezone and use it in datetime

To obtain timezone information in the form of a datetime.tzinfo object, use dateutil.tz.tzlocal():

from dateutil import tz
myTimeZone = tz.tzlocal()

This object can be used in the tz parameter of datetime.datetime.now():

from datetime import datetime
from dateutil import tz
localisedDatetime = datetime.now(tz = tz.tzlocal())

Capitalize a sentence

To change the caps of the first letter of the first word of a sentence use:

>>> sentence = "add funny Emojis"
>>> sentence[0].upper() + sentence[1:]
'Add funny Emojis'

The .capitalize() method also uppercases the first letter, but it lowercases the rest of the sentence. The .title() method capitalizes every word of the sentence.
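
The difference between the three options is easier to see with the same sentence:

```python
sentence = "add funny Emojis"

print(sentence[0].upper() + sentence[1:])  # Add funny Emojis
print(sentence.capitalize())               # Add funny emojis
print(sentence.title())                    # Add Funny Emojis
```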

Get the last monday datetime

import datetime

today = datetime.date.today()
last_monday = today - datetime.timedelta(days=today.weekday())


Last update: 2022-08-01