Promql

Usage ⚑

Selecting series⚑

Select latest sample for series with a given metric name:

node_cpu_seconds_total

Select 5-minute range of samples for series with a given metric name:

node_cpu_seconds_total[5m]

Only series with given label values:

node_cpu_seconds_total{cpu="0",mode="idle"}

Complex label matchers (=: equality, !=: non-equality, =~: regex match, !~: negative regex match):

node_cpu_seconds_total{cpu!="0",mode=~"user|system"}

Select data from one day ago and shift it to the current time:

process_resident_memory_bytes offset 1d

Rates of increase for counters⚑

Per-second rate of increase, averaged over last 5 minutes:

rate(demo_api_request_duration_seconds_count[5m])

Per-second rate of increase, calculated over last two samples in a 1-minute time window:

irate(demo_api_request_duration_seconds_count[1m])

Absolute increase over last hour:

increase(demo_api_request_duration_seconds_count[1h])

Aggregating over multiple series⚑

Sum over all series:

sum(node_filesystem_size_bytes)

Preserve the instance and job label dimensions:

sum by(job, instance) (node_filesystem_size_bytes)

Aggregate away the instance and job label dimensions:

sum without(instance, job) (node_filesystem_size_bytes)

Available aggregation operators: sum(), min(), max(), avg(), stddev(), stdvar(), count(), count_values(), group(), bottomk(), topk(), quantile().

Time⚑

Get the Unix time in seconds at each resolution step:

time()

Get the age of the last successful batch job run:

time() - demo_batch_last_success_timestamp_seconds

Find batch jobs which haven't succeeded in an hour:

time() - demo_batch_last_success_timestamp_seconds > 3600

Snippets⚑

Run operation only on the elements that match a condition ⚑

Imagine we want to run the zfs_dataset_used_bytes - zfs_dataset_used_by_dataset_bytes operation only on the elements that match zfs_dataset_used_by_dataset_bytes > 200e3. You can do this with and:

zfs_dataset_used_bytes - zfs_dataset_used_by_dataset_bytes and zfs_dataset_used_by_dataset_bytes > 200e3

Substracting two metrics ⚑

To run binary operators between vectors you need them to match. Basically it means that it will only do the operation on the elements that have the same labels. Sometimes you want to do operations on metrics that don't have the same labels. In those cases you can use the on operator. Imagine that we want to substract the next vectors:

zfs_dataset_used_bytes{type='filesystem'}

And

sum by (hostname,filesystem) (zfs_dataset_used_bytes{type='snapshot'})

That only have in common the labels hostname and filesystem`.

You can use the next expression then:

zfs_dataset_used_bytes{type='filesystem'} - on (hostname, filesystem) sum by (hostname,filesystem) (zfs_dataset_used_bytes{type='snapshot'})

To learn more on Vector matching read this article

Generating range vectors from return values in Prometheus queries ⚑

Use the subquery-syntax

Warning: These subqueries are expensive, i.e. create very high load on Prometheus. Use recording-rules when you use these queries regularly.

Subquery syntax⚑

<instant_query>[<range>:<resolution>]

instant_query: A PromQL-function which returns an instant-vector).
range: Offset (back in time) to start the first subquery.
resolution: The size of each of the subqueries.

It returns a range-vector.

For example:

deriv(rate(varnish_main_client_req[2m])[5m:10s])

In the example above, Prometheus runs rate() (= instant_query) 30 times (the first from 5 minutes ago to -4:50, ..., the last -0:10 to now). The resulting range-vector is input to the deriv() function.

Troubleshooting⚑

Ranges only allowed for vector selectors ⚑

You may need to specify a subquery range such as [1w:1d].

Links⚑

Prometheus cheatsheet