Lab 7 - Prometheus

Lab Goal

This lab provides an understanding of how Prometheus uses service discovery to locate and scrape targets for metrics collection. You'll learn by setting up a service discovery mechanism that dynamically maintains a list of scrape targets.

Service discovery - It's been all static so far

Up to this point in this workshop, you've been statically configuring your Prometheus installation to scrape each target in its own job in the static_configs section of your workshop-prometheus.yml file.
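As a reminder of what that static approach looks like, here is a minimal sketch. The job names and target addresses are illustrative assumptions, not necessarily the ones used in your workshop file:

```yaml
# Static configuration: every target is listed by hand in its own job.
scrape_configs:
  - job_name: "prometheus"            # hypothetical job name
    static_configs:
      - targets: ["localhost:9090"]   # assumed local Prometheus address

  - job_name: "services"              # hypothetical job name
    static_configs:
      - targets: ["localhost:8080"]   # assumed address of a demo service
```

Every new or removed target means editing this file and reloading Prometheus.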

In real-world cloud native infrastructure, workloads scale dynamically, which makes such a static configuration practically impossible to maintain. You'll be faced with virtual machines on cloud providers, services and applications on container orchestrators such as Kubernetes, and microservice architectures that constantly change the set of targets you need to scrape.

Service discovery - Introducing dynamic discovery

Recall from the introduction that the Prometheus architecture includes service discovery as a way to leverage built-in mechanisms for discovering virtual machines on cloud providers, service and application instances on container orchestrators, and other generic lookup mechanisms (DNS, Consul, ZooKeeper, etc.).

Service discovery - Supporting dynamic discovery

Any of those methods can be added to a scrape_config section in your Prometheus configuration file to provide a dynamic list of targets that is continuously updated at runtime. Prometheus automatically stops scraping old instances and starts scraping new ones, which makes highly dynamic environments such as services running on Kubernetes manageable.
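As one possible illustration, the sketch below uses DNS-based discovery. The job name and the DNS record name are hypothetical; dns_sd_configs and its names, type, and refresh_interval fields are part of the Prometheus configuration format:

```yaml
scrape_configs:
  - job_name: "dns-discovered-services"      # hypothetical job name
    dns_sd_configs:
      # Hypothetical SRV record listing the host:port of each instance.
      - names: ["_metrics._tcp.example.internal"]
        type: SRV
        # How often Prometheus re-resolves the record to refresh the target list.
        refresh_interval: 30s
```

Instances added to or removed from the DNS record are picked up at the next refresh without touching the Prometheus configuration.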

Service discovery - Functions of discovery data

When Prometheus is configured to use a service discovery mechanism, it's using the provided discovery information for three purposes:

knowing what should exist
knowing how to pull metrics from targets that exist
knowing how to use associated target metadata

Service discovery - Knowing what should exist

Prometheus, as a monitoring system, needs to know what systems and services should be up and running at any given point in time. A key function of any service discovery mechanism that Prometheus uses is to continuously provide that information. With this information available to Prometheus, it's straightforward to use alerting rules together with Alertmanager to alert on any unreachable targets.
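A minimal sketch of such an alert follows, assuming a rules file that is loaded through rule_files in your Prometheus configuration. The group name, alert name, duration, and severity label are illustrative choices. The up metric is recorded by Prometheus for every scrape: 1 when the scrape succeeded and 0 when it failed.

```yaml
groups:
  - name: target-availability        # hypothetical group name
    rules:
      - alert: TargetDown            # hypothetical alert name
        # Fires for every discovered target whose scrapes are failing.
        expr: up == 0
        # Wait before firing so a single failed scrape does not alert.
        for: 5m
        labels:
          severity: critical         # illustrative label for routing
        annotations:
          summary: "Target {{ $labels.instance }} in job {{ $labels.job }} is unreachable"
```

Because the target list comes from service discovery, the rule covers newly discovered targets without any changes.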

Service discovery - Knowing how to pull metrics

Prometheus, as a monitoring system, needs to know more than whether a target system exists or not; it also needs to know how to pull metrics from it. You can imagine what Prometheus needs to know, such as:

host name
port number
protocol (http or https)
...any other information needed to reach or access the target

Most of this is provided by different service discovery mechanisms in target metadata and it enables Prometheus to fetch data from the target.

Service discovery - How to use target metadata

Many of the service discovery mechanisms you'll use provide metadata about each target (such as labels, annotations, service names, ready states, etc.). This metadata is used during the relabeling phase to filter targets, modify how targets are scraped, or map any metadata into final target labels.

As shown previously in this workshop, enriching target labels with discovery information through relabeling creates more useful time series data for eventual queries.
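The filtering and mapping uses can be sketched as follows, here assuming Consul-based discovery. The job name, Consul address, the metrics tag, and the service target label are hypothetical; the __meta_consul_tags and __meta_consul_service labels are provided by the Consul service discovery mechanism:

```yaml
scrape_configs:
  - job_name: "consul-services"          # hypothetical job name
    consul_sd_configs:
      - server: "localhost:8500"         # assumed local Consul agent
    relabel_configs:
      # Filter: keep only targets whose Consul tags include "metrics".
      # Tags are joined with "," and surrounded by it, hence the pattern.
      - source_labels: [__meta_consul_tags]
        regex: ".*,metrics,.*"
        action: keep
      # Map: copy the Consul service name into a final target label.
      - source_labels: [__meta_consul_service]
        target_label: service
```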

Service discovery - Target metadata and relabeling

During a previous lab in this workshop you were introduced to target metadata, normal labels, and how the relabeling phase can be used to modify target labels before persisting. When using service discovery, target sources provide normal labels and 'hidden' labels prefixed with a double underscore (__). These 'hidden' labels contain additional metadata about the target.

All 'hidden' labels are removed after the relabeling phase. Their values only make it into a target's final labels if you use relabeling to map them into normal labels, although they can still be used during relabeling to filter targets or change a target's scrape behavior.

Service discovery - Metadata labels affecting scrape behavior

It's possible to use the following metadata labels, as they are always provided by raw targets, to modify a target's scrape behavior:

__address__: Contains the target TCP address that should be scraped. It initially defaults to [host]:[port] provided by the service discovery mechanism. Prometheus sets the instance label to the value of __address__ after relabeling if you don't set the instance label explicitly to another value during relabeling.
__scheme__: Contains the HTTP scheme (http or https) with which the target should be scraped. Defaults to http.
__metrics_path__: Contains the HTTP path to scrape metrics from. Defaults to /metrics.

Another interesting label is the __param_[name] label, which allows you to send HTTP query parameters along with a scrape of any target. For example, you could set the __param_filter label to the value active to send the HTTP query parameter filter=active.
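The sketch below shows how these labels could be set through relabeling. The job name, target address, and metrics path are hypothetical. Each rule uses the default replace action with no source labels, which writes the replacement value into the target label:

```yaml
scrape_configs:
  - job_name: "secure-services"            # hypothetical job name
    static_configs:
      - targets: ["localhost:8443"]        # hypothetical target address
    relabel_configs:
      # Scrape over https instead of the default http.
      - target_label: __scheme__
        replacement: https
      # Scrape a non-default path instead of /metrics.
      - target_label: __metrics_path__
        replacement: /internal/metrics
      # Send the query parameter filter=active with every scrape.
      - target_label: __param_filter
        replacement: active
```

Under these assumptions the target is scraped at https://localhost:8443/internal/metrics?filter=active. In practice these rules would usually be conditioned on discovery metadata so that they only apply to some targets.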

Service discovery - Metadata (__meta_) labels

Finally, every service discovery mechanism can provide discovery-specific metadata about a target using labels starting with __meta_. For example, service discovery for Kubernetes provides:

__meta_kubernetes_pod_name label for each pod target
__meta_kubernetes_pod_ready label indicating whether the pod is in a ready state or not

There are numerous metadata labels available for each service discovery mechanism, and you are encouraged to explore their individual configuration documentation to find out more. One example is the Kubernetes service discovery configuration documentation.
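To show how the two Kubernetes labels mentioned above might be used, here is a minimal sketch. The job name and the pod target label are illustrative, and the configuration assumes Prometheus runs inside the cluster with permission to list pods:

```yaml
scrape_configs:
  - job_name: "kubernetes-pods"            # hypothetical job name
    kubernetes_sd_configs:
      - role: pod                          # discover one target per pod port
    relabel_configs:
      # Drop every pod that is not in a ready state.
      - source_labels: [__meta_kubernetes_pod_ready]
        regex: "true"
        action: keep
      # Persist the pod name as a normal label on the scraped series.
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```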

Service discovery - Options for demo environments

For the rest of this lab you will be setting up your Prometheus custom service discovery integration to watch a set of local files containing target information. You can then write custom code to update the target files and Prometheus will automatically adjust to any new targets. You'll be using the file-based service discovery mechanism to feed a changing list of custom targets to Prometheus during runtime.
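As a rough preview of what file-based discovery can look like, consider the sketch below. The job name, file path, target addresses, and the env label are hypothetical and may differ from what the lab uses; file_sd_configs with its files and refresh_interval fields is part of the Prometheus configuration format:

```yaml
scrape_configs:
  - job_name: "file-discovered-services"   # hypothetical job name
    file_sd_configs:
      - files:
          - "targets/*.yml"                # hypothetical path, globs are allowed
        # File changes are detected by watching the files; this is the
        # interval for additional periodic re-reads.
        refresh_interval: 1m

# A matching target file, for example targets/services.yml, could contain:
#
# - targets: ["localhost:8080", "localhost:8081"]
#   labels:
#     env: demo
```

Updating the target file is then enough to change what Prometheus scrapes, with no restart or reload of the main configuration.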

The lab exercise uses the services demo project to simulate several infrastructure environments, all to be monitored by a Prometheus instance. They all can be installed on your local machine in one of two ways, so please click on the option you want to use to continue with this workshop:

Lab 7 - Discovering Service Targets