Advanced PromQL - Configuring PromLens
Verify that your Prometheus instance and the services are being scraped by putting the query
rate(demo_api_request_duration_seconds_count{job="services"}[5m])
into the expression field and hitting ENTER. Select the GRAPH tab to see metric data. (Note: how
much data you see depends on how long your instance has been running):
Advanced PromQL - Nested query expressions
A PromQL expression is often not a single query, but a set of nested expressions, each one
evaluated and used as an argument or operand to the expression above it in the nested structure.
For example, putting the query expression from the previous exercise into PromLens and
executing it will generate a visual nested structure:
rate(demo_api_request_duration_seconds_count{job="services"}[5m])
Advanced PromQL - Results nested structure
This is the breakdown of the nested query expressions from our query: first, an embedded metric
query that delivers its results to the rate function, which generates the final results. Note that
PromLens breaks down the query expression automatically for you below the expression window:
Advanced PromQL - Testing nested queries
PromLens broke the query down into its nested parts, and you can click on the bottom metric line
under the rate function line to execute it and validate the results being returned:
Advanced PromQL - Exploring label dimension results
Also note that next to the query line you see links to each of the label dimensions that are
part of the results. To view them, click on one (below is an example of the two method label
results) and see how it opens a new expression window below that you can execute to validate the
results. It also breaks down the new query so you can explore it too:
Advanced PromQL - Explaining nested query expressions
A really nice feature of PromLens is the EXPLAIN tab. Remove the previous query by clicking
on the X on the right. Then select the EXPLAIN tab, followed by clicking on the lower metric
query expression to see it explained for you:
Advanced PromQL - Explaining basic functions
Also try clicking on the rate line for an explanation. In its most basic form, with just the
function name, you end up with the documentation's explanation:
Advanced PromQL - Exploring more complex expressions
Let's try a more complex nested query, shown below. Cut and paste it into your PromLens instance
to explore a bit more:
histogram_quantile(
  0.9,
  sum by(le, method, path) (
    rate(
      demo_api_request_duration_seconds_bucket{job="services"}[5m]
    )
  )
)
Advanced PromQL - Run each nested function
Try each nested function in the broken-down query in PromLens, viewing labels, results, and graphs
(where possible), and look at the explanations. Note, of course, that depending on how long your
instance has been running, results may differ from those shown here:
Theory intermezzo - PromQL expression result types
There are two concepts of expression type when talking about querying Prometheus, and it's crucial
that you understand the difference:
- metric type - as reported by a scraped target: counter, gauge, histogram, summary, or untyped.
- results type - data type of a PromQL expression: string, scalar, instant vector, or range vector.
PromQL has no concept of metric types. It's only concerned with expression result types.
Each PromQL expression has a type, and each function, operator, or other operation requires its
arguments to be of a certain expression type. String and scalar are straightforward for most, so
let's look at the instant vector and range vector types.
Theory intermezzo - Instant vector expression results
Instant vectors are a list of labeled time series with one sample for each series, all measured
at the same timestamp. They can result from a direct selection query of a time series metric, or
from any function or other transformation that returns one. This is an example of instant vector
results:
demo_cpu_usage_seconds_total{instance="localhost:8080", job="services", mode="idle"} 8310.383198889807
demo_cpu_usage_seconds_total{instance="localhost:8080", job="services", mode="system"} 3327.929701170597
demo_cpu_usage_seconds_total{instance="localhost:8080", job="services", mode="user"} 4984.887099939602
Theory intermezzo - Range vector expression results
Range vectors are a list of labeled time series with a range of samples over time for each
series. They can result from a literal range vector selection query, or from a subquery
expression. Useful when you want to aggregate over the behavior of a series over a specified
time window. For example, rate(demo_cpu_usage_seconds_total[5m]).
This is an example of range vector results:
demo_cpu_usage_seconds_total{instance="localhost:8080", job="services", mode="idle"} 1.9860637950939426
demo_cpu_usage_seconds_total{instance="localhost:8080", job="services", mode="system"} 0.794675335030005
demo_cpu_usage_seconds_total{instance="localhost:8080", job="services", mode="user"} 1.1934846860666322
Theory intermezzo - PromQL expression node types
There are 10 different node types, which are the types of queries or expressions you can write:
- number literals : 6.45
- string literals : "hello o11y" -- occur infrequently, used as parameter values to functions
- instant vector selectors : some_metric{job="services"} -- explained previously
- range vector selectors : some_metric{job="services"}[15m] -- explained previously
- aggregations : sum by(job) (some_metric) -- allow aggregating over multiple series, always yield an instant vector
- unary operators : -some_metric -- negates any scalar or instant vector values, returns the same type it was applied to
- binary operators : some_metric_1 + some_metric_2 -- returns a scalar if both operands are scalars, otherwise a vector
- function calls : rate(some_metric[15m]) -- take input parameters of varying types, return varying types
- sub-queries* : (expression)[1d:] -- take an instant vector expression as input, return a range vector
- parentheses expressions : (42) -- may return a string, scalar, instant vector, or range vector, depending on usage

* Sub-queries are covered later in this lab.
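As a small illustration of how these node types nest (the 0.2 threshold is purely illustrative),
the following expression combines a range vector selector, a function call, an aggregation, and a
binary operator comparing against a number literal:
sum by(path) (rate(demo_api_request_duration_seconds_count{job="services"}[5m])) > 0.2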
Theory intermezzo - Query evaluation times
As you've seen so far, PromQL queries use relative references to time, such as looking back five
minutes ([5m]). The question is: how do you specify an absolute graph time range, or a timestamp
at which to show query results in a table?
Time parameters are sent separately to the Prometheus Query API and depend on which type of query
you are using. Let's look at each of the two query types and explore how they tackle evaluation
times.
Theory intermezzo - Instant query evaluation time
The first type is called an instant query, used for table views showing results at a single point
in time. It has the following parameters:
- expression
- evaluation timestamp
The expression is evaluated at the given timestamp. For example, some_metric[15m] selects the last
15 minutes of data, but never into the future; [-15m] is not valid. To be precise, the metric data
is collected from the latest 15 minutes, meaning the data points need to be "at most 15m old, and
not stale" relative to the evaluation timestamp. All samples found within the last 15m are
returned with their output timestamps set to the evaluation timestamp. If there is a 15m gap in
samples for a query, then the query returns an empty result.
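As a minimal sketch of how these parameters reach the Prometheus Query API (using the workshop's
localhost:9090 instance; the timestamp value is only an example), an instant query is an HTTP call
to /api/v1/query with the expression and evaluation timestamp passed separately:
$ curl -G http://localhost:9090/api/v1/query \
    --data-urlencode 'query=rate(demo_api_request_duration_seconds_count{job="services"}[5m])' \
    --data-urlencode 'time=1643623922'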
Theory intermezzo - Range query evaluation times
The second type is called a range query, used for graphs showing expressions over a given time
range. It works just like a set of independent instant queries evaluated at each resolution step
over the given time range (though highly optimized in practice). It has the following parameters:
- expression
- start time
- end time
- resolution step
The expression is evaluated at every resolution step between the start and end time, stitching the
individually evaluated time slices together into a single range vector. Note that if there are no
samples in a given individual time frame, there will be gaps in the final series data used to
produce the range vector.
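A corresponding minimal sketch for the range query API (again assuming localhost:9090; the start,
end, and step values are only examples) is an HTTP call to /api/v1/query_range:
$ curl -G http://localhost:9090/api/v1/query_range \
    --data-urlencode 'query=rate(demo_api_request_duration_seconds_count{job="services"}[5m])' \
    --data-urlencode 'start=1643620322' \
    --data-urlencode 'end=1643623922' \
    --data-urlencode 'step=15s'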
Advanced PromQL - Exploring histogram metrics
Onwards to our more advanced PromQL, starting with histograms, one of the more complex metrics
to understand. You'll be looking at how histograms are represented in Prometheus, learn about
observation buckets, and how to approximate quantiles from your histograms. The two things
we'll be covering:
- histogram metrics - how to interpret them
- quantiles - a generalized form of a percentile and how to calculate them from histograms
Advanced PromQL - Histograms in Prometheus
Prometheus client libraries support histogram metrics, allowing a service (app) to record the
distribution of a stream of data values into a set of ranged buckets. Histograms usually track
latency measurements or response sizes. Prometheus histograms sample data on the client-side,
meaning that they count observed values using a number of configurable buckets and expose
buckets as individual counter time series.
Internally, Prometheus histograms are implemented as a group of counter time series that each
represent the current count for a given bucket. The per-bucket counters are cumulative in
Prometheus, meaning that buckets for larger ranges include the counts for all lower-ranged
buckets as well. Each histogram bucket time series has an le label
("less than or equal") and specifies the bucket's upper value boundary as a number encoded in a
string label value such as le="0.05"
for an upper boundary of 0.05 seconds.
Note that this adds an additional cardinality dimension to any existing labels you are tracking.
Advanced PromQL - Prometheus histogram representation
This is what a histogram representation might look like in Prometheus metrics format:
http_request_duration_seconds_bucket{le="0.025"} 20
http_request_duration_seconds_bucket{le="0.05"} 60
http_request_duration_seconds_bucket{le="0.1"} 90
http_request_duration_seconds_bucket{le="0.25"} 100
http_request_duration_seconds_bucket{le="+Inf"} 105
http_request_duration_seconds_sum 21.322
http_request_duration_seconds_count 105
Note here that the first bucket is reporting 20, the second is reporting 60 (that's 40 new
observations + the previous 20), the third bucket is reporting 90 (that's 30 new + the previous
60), and so on...
Advanced PromQL - Example histogram query
Let's look at a histogram for one combination of request dimensions, from one instance:
demo_api_request_duration_seconds_bucket{instance="localhost:8080",method="POST",path="/api/bar",status="200",job="services"}
You should see 26 series that each represent one observation bucket, identified by the le label,
as shown on the next slide.
Advanced PromQL - Results histogram query
Advanced PromQL - Moving on to quantiles
A histogram helps answer questions like "How many of my requests take longer than 300ms to
complete?" You answer this by configuring histogram buckets with 300ms boundaries and you're
off and visualizing.
More likely, you'd like answers to: "What is the latency under which 99% of my queries complete?"
This calls for a percentile, or quantile, answer. Note that in Prometheus a percentile and a
quantile are the same thing, except that percentiles are expressed in a 0 - 100 range while
quantiles are between 0 and 1. In this case, the 99th percentile would be equivalent to a quantile
of 0.99.
Advanced PromQL - The histogram_quantile function
If your histogram buckets are fine-grained enough, you can calculate quantiles from them using the
following function:
histogram_quantile(quantile_value, histogram_metric)
This function takes a quantile value between 0 and 1 and a histogram metric as inputs and outputs
the corresponding quantile values. See the documentation for more details on its inner workings;
the following slides show how the 90th percentile translates to a 0.9 quantile.
Advanced PromQL - Calculating 90% API latency (first try)
Using our demo service, let's try and calculate at what latency 90% of our API requests finish.
The first attempt might look like this:
histogram_quantile(0.9, demo_api_request_duration_seconds_bucket{job="services"})
The problem is that it's not very useful. The bucket counters reset when individual service
instances are restarted, and you usually want to see what the latency is measured over the last
five minutes rather than over the entire time of the metric. In the next slide you'll see how to
overcome this problem, or can you make a guess at the solution?
Advanced PromQL - Calculating 90% API latency (solution)
Apply a rate() function so that the calculation only takes into account the most recent increments
to the bucket counters within the last five minutes, while also dealing with counter resets
correctly. Calculate the 90th percentile API latency over the last 5 minutes like this:
histogram_quantile(0.9, rate(demo_api_request_duration_seconds_bucket{job="services"}[5m]))
Advanced PromQL - Results 90% API latency query
Advanced PromQL - Aggregating away dimensions
Our query shows the 90th percentile for every sub-dimension (job, instance, path, method, and
status), which is a lot of data we don't really need in this case. Let's try to apply some
cost savings for our organization and aggregate some of these labels away. Fortunately, you can
use Prometheus' sum aggregation operator with your histogram_quantile() during query time to
compute aggregated quantiles! The following aggregates away status and method dimensions:
histogram_quantile(
  0.9,
  sum without(status, method) (
    rate(demo_api_request_duration_seconds_bucket{job="services"}[5m])
  )
)
Advanced PromQL - Results aggregation latency query
Advanced PromQL - Filtering queries for thresholds
We can filter our query results based on sample values by using binary filtering operators such
as >, <, <=, >=, ==, and !=. This is a common use case where you want to apply a threshold in
alerting rules. Should the value sampled exceed a given threshold, you want to be alerted. In
the following example, we want to see the CPU usage in the USER space that exceeds a threshold
of 1.19 measured over the last 15 minutes, so we add a > 1.19 filter operator to the rate
expression:
rate(demo_cpu_usage_seconds_total{mode="user", job="services"}[15m]) > 1.19
Advanced PromQL - Setting user cpu threshold
Advanced PromQL - Adjusting the filter threshold
As you might have noticed, the threshold was too low; all data was above it! Let's adjust our
threshold and see how that looks for the CPU usage in the USER space that exceeds a new threshold
of 1.194 measured over the last 15 minutes. This should result in only showing some of the
CPU usage that exceeds our new threshold (feel free to play with the threshold value on your
machine as the data will differ):
rate(demo_cpu_usage_seconds_total{mode="user", job="services"}[15m]) > 1.194
Advanced PromQL - Results real cpu threshold
Advanced PromQL - Filtering with time series
Binary filter operators are not often used in graphs (note that some series disappear at
evaluation time), but they appear regularly in alerting conditions to indicate value thresholds.
What about filtering one time series result with another? Not a problem, as the comparison
operators also apply between series with identical label sets on both the left and right side of
the comparison. Note that on(), ignoring(), group_left(), and group_right() modifiers also work.
Let's test this by selecting the status="500" error rates for all label combinations where that
error rate is more than 1/50th (2%) of the total request rate for the same label set:
rate(demo_api_request_duration_seconds_count{status="500",job="services"}[5m]) * 50
> ignoring(status)
sum without(status) (rate(demo_api_request_duration_seconds_count{job="services"}[5m]))
Advanced PromQL - Results filtering time series
You'll have to ignore the status label in the matching since it is always 500 on one side and
missing on the other. The resulting graph looks a little strange since the series appear and
disappear based on their current error rate ratio at each step in the graph:
Advanced PromQL - Using boolean for filtering
There is a way to determine the result of a comparison operator without actually removing any
output series from the results. It's done by adding a bool modifier to the operator, which keeps
all series while setting the output sample value to 1 (true) or 0 (false).
To show which request rates out of a set are above or below 0.2 per second, we can query as
shown below. This results in a 0 or 1 for all label sets of the input series, allowing you to
turn numeric conditions into boolean output values:
rate(demo_api_request_duration_seconds_count{job="services"}[5m]) > bool 0.2
Advanced PromQL - Results boolean filtering
Advanced PromQL - The 3 set operators
Prometheus provides 3 set binary operators that work between instant vectors:
- AND (set intersection) — Alert on high error rates, but only if the corresponding
total rate is above some threshold.
- OR (set union) — Graph one set of series, but fill up missing values from another
series.
- UNLESS (set diff) — Alert on low disk space, unless it's a read-only filesystem.
Set operators try to find matching series between the left and right side based on identical
label sets, unless you provide an on() or ignoring() modifier to specify how matches should be
found.
Unlike mathematical operators, there is no group_left() or group_right() modifier for set
operators, as set operators always perform many-to-many matching. They always allow matching
series from either side to match multiple series on the other side.
Advanced PromQL - Merging sets of time series
For the AND operator, if a match is found, the left-hand-side series becomes part of the output.
If no matching series exists on the right, the series is omitted from the output.
Let's explore this by selecting any HTTP endpoints that have a 90th percentile latency higher than
50ms (0.05s), but only for the dimensional combinations that receive more than one request per
second. We use the histogram_quantile() function to calculate the 90th percentile latency for each
sub-dimension, then filter the results to retain only those combinations that also receive more
than one request per second:
histogram_quantile(0.9, rate(demo_api_request_duration_seconds_bucket{job="services"}[5m])) > 0.05
and
rate(demo_api_request_duration_seconds_count{job="services"}[5m]) > 1
Advanced PromQL - Results merging time series
Advanced PromQL - Using the union of sets
Let's build a union of two sets of time series. Prometheus provides the OR set operator, which
results in the series from the left-hand side of the operation, as well as any series from the
right-hand side which don't have matching label sets on the left. To see this at work, let's list
all request rates which are either below 10 or above 30, as follows:
rate(demo_api_request_duration_seconds_count{job="services"}[5m]) < 10
or
rate(demo_api_request_duration_seconds_count{job="services"}[5m]) > 30
Advanced PromQL - Results union of time series
Advanced PromQL - Thoughts on merging sets
A final note about filtering: as stated before, using value filters and set operations in graphs
leads to time series appearing and disappearing depending on whether they match the filter at any
given time step along the graph. Using this kind of filter logic is recommended only for
alerting rules.
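To make that concrete, here is a hedged sketch of what such an alerting rule could look like (the
rule file, group name, alert name, and 'for' duration are all hypothetical; the expression is the
error-rate comparison from earlier in this lab):
# hypothetical rule file, e.g. workshop-alert-rules.yml, referenced from rule_files: in the config
groups:
  - name: services_alerts
    rules:
      - alert: HighErrorRateRatio
        # Fires when 500-error rates exceed 2% of the total request rate for 5 minutes straight.
        expr: |
          rate(demo_api_request_duration_seconds_count{status="500",job="services"}[5m]) * 50
            > ignoring(status)
          sum without(status) (rate(demo_api_request_duration_seconds_count{job="services"}[5m]))
        for: 5m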
Exploring the UNLESS operator, which only keeps series from the left-hand side if equivalent label
sets do not exist on the right-hand side (the right side "cuts out" elements from the left side),
is left for you to explore on your own; a small starting sketch follows below. You're now able to
build set intersections, unions, and differences out of labeled time series.
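As a minimal, hedged starting point, this inverts the earlier AND example: it keeps only the
dimensional combinations with a 90th percentile latency above 50ms that do NOT receive more than
one request per second (the thresholds are simply reused from that example):
histogram_quantile(0.9, rate(demo_api_request_duration_seconds_bucket{job="services"}[5m])) > 0.05
unless
rate(demo_api_request_duration_seconds_count{job="services"}[5m]) > 1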
Advanced PromQL - Using metrics with timestamps
Metrics frequently expose timestamps, such as the last time that a batch job completed
successfully or when a machine was restarted. As you might expect, these times are represented
as Unix timestamps in seconds since January 1, 1970 UTC. Our demo services expose timestamps in
a few metrics, such as the last time a simulated batch job succeeded. The job simulates running
once per minute and failing 25% of the time. When it fails, the metric keeps its last value until
another successful run occurs. The raw timestamp can be graphed like this:
demo_batch_last_success_timestamp_seconds{job="services"}
Advanced PromQL - Graph of raw timestamp query
Advanced PromQL - Getting value from timestamps
Raw timestamps are not very useful, but if you know how old a timestamp is, you can spot batch
jobs that are overdue. To calculate this, subtract the timestamp in the metric from the current
time using the time() function to get the number of seconds since the last successful run of the
batch job:
time() - demo_batch_last_success_timestamp_seconds{job="services"}
Advanced PromQL - Graph using time()
Advanced PromQL - Manipulating your timestamps
You might need to convert your output from seconds into hours. This is done by dividing the
results by 3600. An expression like this is useful for both graphing and alerting:
(time() - demo_batch_last_success_timestamp_seconds{job="services"}) / 3600
Advanced PromQL - Results converting to hours
Advanced PromQL - Finding slow batch jobs
When visualizing the timestamp age like above, you receive a sawtooth graph, with linearly
increasing lines and regular resets to 0 when the batch job completes successfully. If a
sawtooth spike gets too large, this indicates a batch job that has not been completed in a
timely manner. You can set an alert on this by adding a > threshold filter to the expression and
alerting on the resulting time series. Let's try listing instances for which the batch job has
not completed in the last 1.5 minutes with the following query:
time() - demo_batch_last_success_timestamp_seconds{job="services"} > 1.5 * 60
Advanced PromQL - Results slow batch jobs
Advanced PromQL - Inspecting instance health
While scraping targets, Prometheus stores a synthetic (fake) sample with the metric name UP and
the JOB and INSTANCE labels of the scraped instance. If the scrape was successful, the value of
the sample is set to 1, or 0 if the scrape fails. Thus, we can easily query which instances are
currently "up" or "down" by querying for this metric (a minimal example follows below):
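A minimal sketch of that selector, using the services job from this workshop (the same selector
the count example later in this lab builds on):
up{job="services"}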
Advanced PromQL - Second services instance setup
Not very exciting with only one services demo instance running. Let's set up a second services
demo instance to help clarify the instance health checks. If you open a new terminal window and
start a second services demo instance with the following command, you'll get a second instance
on localhost:8088
(any unoccupied port will work if there is a conflict):
$ ./installs/demo/services_demo
$ podman run -p 8088:8080 prometheus_services_demo:v1
Advanced PromQL - Verify second instance
Verify that you have the metrics feed on localhost:8088/metrics:
# HELP demo_api_http_requests_in_progress The current number of API HTTP requests in progress.
# TYPE demo_api_http_requests_in_progress gauge
demo_api_http_requests_in_progress 1
# HELP demo_api_request_duration_seconds A histogram of the API HTTP request durations in seconds.
# TYPE demo_api_request_duration_seconds histogram
demo_api_request_duration_seconds_bucket{method="GET",path="/api/bar",status="200",le="0.0001"} 0
demo_api_request_duration_seconds_bucket{method="GET",path="/api/bar",status="200",le="0.00015000000000000001"} 0
demo_api_request_duration_seconds_bucket{method="GET",path="/api/bar",status="200",le="0.00022500000000000002"} 0
demo_api_request_duration_seconds_bucket{method="GET",path="/api/bar",status="200",le="0.0003375"} 0
demo_api_request_duration_seconds_bucket{method="GET",path="/api/bar",status="200",le="0.00050625"} 0
demo_api_request_duration_seconds_bucket{method="GET",path="/api/bar",status="200",le="0.000759375"} 0
demo_api_request_duration_seconds_bucket{method="GET",path="/api/bar",status="200",le="0.0011390624999999999"} 0
demo_api_request_duration_seconds_bucket{method="GET",path="/api/bar",status="200",le="0.0017085937499999998"} 0
demo_api_request_duration_seconds_bucket{method="GET",path="/api/bar",status="200",le="0.0025628906249999996"} 0
demo_api_request_duration_seconds_bucket{method="GET",path="/api/bar",status="200",le="0.0038443359374999994"} 0
...
Advanced PromQL - Add target to Prometheus
Now we need to reconfigure Prometheus to scrape this new instance, while keeping it under the
same JOB title. To do this, update the same Prometheus configuration file that you worked with
(workshop-prometheus.yml) and add the new services target to scrape. Note that for containers,
instead of localhost for the targets, you'll need to use the special hostname shown below so the
container can resolve the host machine's IP address:
# workshop config
global:
  scrape_interval: 5s

scrape_configs:
  # Scraping Prometheus.
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  # Scraping services demo.
  - job_name: "services"
    static_configs:
      - targets: ["host.containers.internal:8080"]
      - targets: ["host.containers.internal:8088"]
Advanced PromQL - Restart Prometheus instance
You can either use the previously taught HUP command to reload your Prometheus instance if running
a source install or, with container installations, rebuild the container image before restarting
the Prometheus instance with one of the following:
$ ./prometheus --config.file=support/workshop-prometheus.yml
$ podman run -p 9090:9090 workshop-prometheus:v2.54.1 --config.file=/etc/prometheus/workshop-prometheus.yml
Advanced PromQL - Inspecting all instance health
You should see two services instances both actively scraping:
Advanced PromQL - Stopping an instance
Now stop the second instance and check again to see a failure (0):
Advanced PromQL - Searching for down instances
Maybe you only want to see instances that are down:
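A minimal sketch of such a filter (matching the count query shown on the next slide):
up{job="services"} == 0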
Advanced PromQL - Count the down instances
Or maybe you are only interested in the number of down instances:
count by(job) (up{job="services"} == 0)
Advanced PromQL - Detecting missing series
A problem with the previous method for detecting missing instances is that you are looking at
sample series data. What if no series are present at all, such as when Prometheus is not scraping
a target due to misconfiguration, problems with service discovery, or bugs in the scraping
subsystem?
The ABSENT function can help: it takes an instant vector as its input and returns an empty result
when the input contains series, and a single output series (always having a sample value of 1)
when it does not. If the input is a direct vector selector, any equality matchers (=) contained
in the selector get translated into labels in the output.
With our second service on port 8088 still turned off, we know that the following will return an
empty output (there are still series from the services demo on port 8080):
absent(up{job="services"})
Advanced PromQL - Detecting absent series
You can help detect situations where you know that series should be present, but are not, by
testing for the existence of UP for a job that is not being scraped (does not even exist). You can
do this by using a JOB that you know does not exist, such as the following, which returns a series
with the label job="missing" and the value 1:
absent(up{job="missing"})
Intermezzo - Best practice usage of absent
For any job that you know should be present, it is a good practice to not only alert on UP being
0, but also on UP being completely absent for that job.
There is also a variant of ABSENT called ABSENT_OVER_TIME which takes a range vector and tells you
whether there were no samples over the entire time range of that input vector. This helps detect
situations where series should be present at regular intervals, but not necessarily all the time.
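A minimal sketch, assuming the workshop's services job and an arbitrary one-hour window (the
expression returns a series with value 1 only if no up samples existed at all during that hour):
absent_over_time(up{job="services"}[1h])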
Advanced PromQL - Smoothing graphs with aggregation
Previously we looked at how to aggregate metrics over label dimensions with SUM, AVG, and other
operators. All of these aggregate over multiple series at a single point in time. What about when
you want to aggregate over time within each series? This is often done to smooth out a spiky graph
or to get the maximum value of a series over time.
PromQL provides a set of functions that append _OVER_TIME() to their name to support this. Note
that not all aggregation operators have this variation; see the docs for details.
To see this in action, you first need to query the raw number of goroutines used in the services
demo to get a spiky graph:
go_goroutines{job="services"}
Advanced PromQL - Results spiky graph
Advanced PromQL - Smoothing spiky graphs
To smooth out this spiky graph, try averaging the number of goroutines over 10 minutes at every
point in the graph:
avg_over_time(go_goroutines{job="services"}[10m])
Advanced PromQL - Results smoothed graph
Advanced PromQL - Maximum services disk usage
Just for fun, let's try the MAX aggregation over time to determine the maximum disk usage by our
services demo over the last day (remember, there were two instances running at one time). Note in
the output you will see the services demo on both port 8080 and 8088 listed, even though the
second is not up and running at the moment; the values are calculated from data scraped while it
was running:
max_over_time(demo_disk_usage_bytes{job="services"}[1d])
Advanced PromQL - Results services disk usage
Advanced PromQL - Using sub-queries
A problem arises when we want to determine something like the maximum request rate of our
services demo over the last day... do you know what that problem is?
Up to now all of the _OVER_TIME functions required a range vector as input (like some_metric[10m])
selecting directly from the TSDB. Why won't this work for MAX_OVER_TIME when we try to find the
max request rate to our services demo over the last day? The simple answer is that the request
rate is not a raw data value we can select over time... it's a value derived from:
rate(demo_api_request_duration_seconds_count{job="services"}[5m])
Advanced PromQL - Naive max over time query
If, in our first attempt, we just pass the expression directly into MAX_OVER_TIME with a one-day
duration appended... we get an error (try it!):
max_over_time(
  rate(
    demo_api_request_duration_seconds_count{job="services"}[5m]
  )[1d] # oops.
)
You should see some error message like this:
Error: Expression incomplete or buggy: 4:3: parse error: ranges only allowed for vector selectors
Advanced PromQL - Sub-query for max requests
Using sub-queries for this use case allows you to first run the inner query over a range of time
at a specified resolution, and then compute the outer query on the sub-query's results. The
notation for sub-queries has an additional resolution argument added after a colon:
[DURATION:RESOLUTION]
You can now compute our max request rate over a day by telling Prometheus to evaluate the inner
expression over a range of one day, at a resolution of 15 seconds:
max_over_time(
  rate(
    demo_api_request_duration_seconds_count{job="services"}[5m]
  )[1d:15s] # inner query over 1 day, with a 15-second resolution.
)
Advanced PromQL - Results max request rates over 1 day
Intermezzo - Sub-query thoughts
You can leave off the resolution after the colon, which causes Prometheus to evaluate the inner
expression at the global default rule evaluation interval (the global evaluation_interval setting).
Also be aware that computing sub-queries over longer periods of time can become expensive.
Consider using recording rules to pre-record the derived expression and select from its results
rather than computing it every time your outer query runs.
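As a hedged sketch of that suggestion (the rule file name, group name, and recorded series name
below are hypothetical), a recording rule could pre-compute the inner rate expression so the outer
max_over_time() only has to select already-recorded samples:
# hypothetical rule file, e.g. workshop-recording-rules.yml, referenced from rule_files: in the config
groups:
  - name: services_recording_rules
    rules:
      # Pre-record the per-series request rate at every rule evaluation interval.
      - record: job:demo_api_request_rate:rate5m
        expr: rate(demo_api_request_duration_seconds_count{job="services"}[5m])
The outer query then becomes a plain range selection over the recorded series:
max_over_time(job:demo_api_request_rate:rate5m[1d])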
Advanced PromQL - Sorting query output
Up to now you've been querying metrics without really concerning yourself with how the output
looks, especially in the TABLE view. Note that GRAPH views automatically sort samples by their
timestamps and sample values along the X-axis and Y-axis.
Let's take a look at how we can sort output from our queries, sort to display only the largest
or top results, and sort to display the smallest or bottom results. To start with here is an
unsorted query of the per-path request rates:
sum by(path) (rate(demo_api_request_duration_seconds_count{job="services"}[5m]))
Advanced PromQL - Results unsorted query
Advanced PromQL - General sorting output
This output has only three results, but annoyingly they are displayed with the values in
an unorganized order. Let's tidy this up by adding SORT_DESC
to sort the results
from largest to smallest:
sort_desc(sum by(path) (rate(demo_api_request_duration_seconds_count{job="services"}[5m])))
Advanced PromQL - Results sorted descending
Advanced PromQL - Topk sorting output
Maybe you are only interested in the N biggest values in a results set. This can be achieved by
using the TOPK function, which takes the number of series to select from the instant vector
results. Let's try this on the previous query and only display the top two results:
topk(2, sum by(path) (rate(demo_api_request_duration_seconds_count{job="services"}[5m])))
Advanced PromQL - Results topk sorting
Advanced PromQL - Bottomk sorting output
The opposite is also possible, where you are only interested in the N smallest values in a results
set. This can be achieved by using the BOTTOMK function, which likewise takes the number of series
to select from the instant vector results. Let's try this on the same query to display just the
bottom two results:
bottomk(2, sum by(path) (rate(demo_api_request_duration_seconds_count{job="services"}[5m])))
Advanced PromQL - Results bottomk sorting
Intermezzo - Stabilizing topk and bottomk
Graphing the output of TOPK and BOTTOMK can yield surprising results due to the way PromQL is
evaluated. You might expect to see the K top or bottom series as averaged over the entire graph
time range — but instead, PromQL computes the K series separately for each resolution step along
the graph range, since a range query is evaluated as many independent instant queries at
subsequent resolution timestamps. Thus, the identity of the top or bottom K series can actually
vary over the range of the graph, and your graph may show more than K (partially present) series
in total if the series don't maintain an equal ordering over the whole graph range. The question
is how to avoid this?
To show only a stable K number of series that represent an overall top or bottom K for the graph,
there's a feature in PromQL for vector selectors supporting an @ TIMESTAMP modifier that causes
the selector to select data at a fixed absolute timestamp. Let's look at how this modifier works
and then use it to stabilize our own TOPK and BOTTOMK results.
Advanced PromQL - Using the @ modifier
Let's query to select the free memory bytes at an absolute Unix timestamp (in seconds):
demo_memory_usage_bytes{type="free"} @ 1643623922.444
Or you can anchor the selector either to the start or the end of the graph range:
demo_memory_usage_bytes{type="free"} @ start()
demo_memory_usage_bytes{type="free"} @ end()
Advanced PromQL - Stabilizing our graph output
With all this new knowledge, are you ready to try to stabilize your TOPK expression from the
previous exercises? Instead of directly selecting the top 2 request rates at every resolution
step, first select all rates in the usual fashion (without an @ modifier), but then filter them
using an and operator and a second selector that selects the overall highest 2 rates as measured
over the full graph range. This second selector needs to be anchored to the end of the graph range
and then look back over the entire graph range (1h in this example) to compute the overall largest
rates:
# first select all rates.
sum by(path) (rate(demo_api_request_duration_seconds_count{job="services"}[5m]))
and
# Select top 2 biggest rates for the 1-hour graph window.
topk(2, sum by(path) (rate(demo_api_request_duration_seconds_count{job="services"}[1h] @ end())))
Advanced PromQL - Results stabilized graph
Advanced PromQL - Results topk vs stabilized
When we put the graphs side-by-side for TOPK and @ END, you might notice that this doesn't change
much for the output of our very stable services demo. Rest assured, this has a big impact on more
noisy and rapidly changing input metrics. Below you see TOPK on the left and the stabilized @ END
on the right:
Advanced PromQL - Anomaly holiday detection
Anomaly detection is done by accessing past data in a time-shifted way to compare it to current
data. To see if something has changed, you compare today's request rate to that of one week ago.
To time-shift past data into the present, you can append an offset DURATION modifier to any range
or instant series selector.
Let's use our services demo, which exposes a counter DEMO_ITEMS_SHIPPED_TOTAL that tracks shipment
of items with a simulated "daily" traffic period of 5 minutes (so we don't have to wait a full day
to see the period). Take a look at its rate to see it's got a built-in dip which represents a
holiday:
rate(demo_items_shipped_total{instance="localhost:8080"}[1m])
Advanced PromQL - Results holiday graph
Advanced PromQL - Tracking the holiday metric
The services demo also happens to expose a 0 / 1 boolean metric that tells us whether it is
currently a holiday. When you run this, the graph shows square (absolute) spikes when a holiday
is in effect, which corresponded to dips in the previous rate graph. Comparing the holidays to the
shipped items rate, you'll notice that the rate decreases on holidays:
demo_is_holiday{instance="localhost:8080"}
Advanced PromQL - Results holiday graph
Advanced PromQL - First try anomaly detection
First you can try comparing the current shipment rate with the rate 7 "days" (7 * 5 minutes)
ago to see if anything is out of the ordinary. Normally, the ratio is around 1, but when either
the current day or the past day was a holiday, we get either a lower or a higher ratio than
normal:
rate(demo_items_shipped_total{instance="localhost:8080"}[1m])
/
rate(demo_items_shipped_total{instance="localhost:8080"}[1m] offset 35m)
Advanced PromQL - Results first try
Advanced PromQL - Refining anomaly detection
What you want is to ignore these lower and higher ratio spikes if the cause was just a holiday.
You can filter away the ratio at times when either the past or the present was a holiday by
appending an UNLESS
set operator:
(
    rate(demo_items_shipped_total{instance="localhost:8080"}[1m])
  /
    rate(demo_items_shipped_total{instance="localhost:8080"}[1m] offset 35m)
)
unless
(
    demo_is_holiday == 1
  or
    demo_is_holiday offset 35m == 1
)
Advanced PromQL - Results refining detection
Advanced PromQL - Finishing anomaly detection
A final refinement is to compare whether the "holiday-ness" today is the same as a week ago,
giving you a cleaner graph, mostly filtering away the ratio at times when there is either a
holiday at the current time or there was one in the past:
(
    rate(demo_items_shipped_total{instance="localhost:8080"}[1m])
  /
    rate(demo_items_shipped_total{instance="localhost:8080"}[1m] offset 35m)
)
unless
(
    demo_is_holiday
  !=
    demo_is_holiday offset 35m
)
Advanced PromQL - Results finishing detection
Lab completed - Results
Next up, relabeling metrics in Prometheus...
Contact - are there any questions?