Skip to content

Profiling

Profiling is to find out where your code spends its time. Profilers can collect several types of information: timing, function calls, interruptions or cache faults. It can be useful to identify bottlenecks, which should be the first step when trying to optimize some code, or to study the evolution of the performance of your code.

Profiling types

There are two types of profiling:

Deterministic Profiling
All events are monitored. It provides accurate information but has a big impact on performance (overhead). It means the code runs slower under profiling. Its use in production systems is often impractical. This type of profiling is suitable for small functions.
Statistical profiling
Sampling the execution state at regular intervals to compute indicators. This method is less accurate, but it also reduces the overhead.

Profiling tools

The profiling tools you should use vary with the code you are working on. If you are writing a single algorithm or a small program, you should use a simple profiler like cProfile or even a fine-grained tool like line_profiler. In contrast, when you are optimizing a whole program, you may want to use a statistical profiler to avoid overhead, such as pyinstrument, or if you're debugging a running process, using py-spy.

Deterministic Profiling

cProfile

Python comes with two built-in modules for deterministic profiling: cProfile and profile. Both are different implementations of the same interface. The former is a C extension with relatively small overhead, and the latter is a pure Python module. As the official documentation says, the module profile would be suitable when we want to extend the profiler in some way. Otherwise, cProfile is preferred for long-running programs. Unfortunately, there is no built-in module for statistical profiling, but we will see some external packages for it.

$: python3 -m cProfile script.py

58 function calls in 9.419 seconds

Ordered by: standard namen

calls  tottime  percall  cumtime  percall filename:lineno(function)
   1    0.000    0.000    9.419    9.419 part1.py:1(<module>)
   51   9.419    0.185    9.419    0.185 part1.py:1(computation)
   1    0.000    0.000    9.419    9.419 part1.py:10(function1)
   1    0.000    0.000    9.243    9.243 part1.py:15(function2)
   1    0.000    0.000    0.176    0.176 part1.py:20(function3)
   1    0.000    0.000    9.419    9.419 part1.py:24(main)

Where:

ncalls
Is the number of calls. We should try to optimize functions that have a lot of calls or consume too much time per call.
tottime
The total time spent in the function itself, excluding sub calls. This is where we should look closely at. We can see that the function computation is called 51 times, and each time consumes 0.185s.
cumtime
Cumulative time. It includes sub calls.
percall
We have two “per call” metrics. The first one: total time per call, and the second one: cumulative time per call. Again, we should focus on the total time metric.

We can also sort the functions by some criteria, for example python3 -m cProfile -s tottime script.py.

Statistical profiling

Py-spy

Py-Spy is a statistical (sampling) profiler that lets you visualize the time each function consumes during the execution. An important feature is that you can attach the profiler without restarting the program or modifying the code, and has a low overhead. This makes the tool highly suitable for production code.

To install it, just type:

pip install py-spy

To test the performance of a file use:

py-spy top python3 script.py

To assess the performance of a runnin process, specify it's PID:

py-spy top --pid $PID

They will show a top like interface showing the following data:

GIL: 100.00%, Active: 100.00%, Threads: 1

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)
 61.00%  61.00%   10.50s    10.50s   computation (script.py:7)
 39.00%  39.00%    7.50s     7.50s   computation (script.py:6)
  0.00% 100.00%   0.000s    18.00s   <module> (script.py:30)
  0.00% 100.00%   0.000s    18.00s   function2 (script.py:18)
  0.00% 100.00%   0.000s    18.00s   main (script.py:26)
  0.00% 100.00%   0.000s    18.00s   function1 (script.py:12)

pyinstrument

It is similar to cProfile in the sense that we can’t attach the profiler to a running program, but that is where the similarities end, as pyinstrument doesn't track every function call that your program makes. Instead, it's recording the call stack every 1ms.

Install it with:

pip install pyinstrument

Use:

The advantages are that:

  • The output is far more attractive.
  • It has less overhead, so it distorts less the results.
  • Doesn't show the internal calls that make cProfiling result reading difficult.
  • It uses wall-clock time instead of CPU time. So it takes into account the IO time.
$: pyinstrument script.py

  _     ._   __/__   _ _  _  _ _/_   Recorded: 15:45:20  Samples:  51
 /_//_/// /_\ / //_// / //_'/ //     Duration: 4.517     CPU time: 4.516
/   _/                      v3.3.0

Program: script.py

4.516 <module>  script.py:2
└─ 4.516 main  script.py:25
   └─ 4.516 function1  script.py:11
      ├─ 4.425 function2  script.py:16
      │  └─ 4.425 computation  script.py:2
      └─ 0.092 function3  script.py:21
         └─ 0.092 computation  script.py:2

With the possibility to generate an HTML report.

The disadvantages are that it's only easy to profile python script files, not full packages.

You can also profile a chunk of code, which can be useful when developing or for writing performance tests.

from pyinstrument import Profiler

profiler = Profiler()
profiler.start()

# code you want to profile

profiler.stop()

print(profiler.output_text(unicode=True, color=True))

To explore the profile in a web browser, use profiler.open_in_browser(). To save this HTML for later, use profiler.output_html().

Introduce profiling in your test workflow

I run out of time, so here are the starting points:

The idea is to develop the following ideas:

  • How to integrate profiling with pytest.
  • How to compare benchmark results between CI runs.
  • Some guidelines on writing performance tests

And memray looks very promising.

References


Last update: 2022-04-21