Lab 2 - Installing Prometheus (container image)
Lab Goal
This lab guides you through installing Prometheus on your local machine using an open source
container image, then configuring and running it to start gathering metrics.
Prometheus - Container installation notes
This lab guides you through installing Prometheus using Podman, an open source container
tooling set that works on Mac OSX, Linux, and Windows systems. This workshop assumes you have
Podman installed, initialized, and running on your machine.
The rest of this lab is pictured using Mac OSX, but it is assumed you have enough knowledge of
your Linux or Windows systems to be able to achieve the same results. Also note that you can
run this workshop using Docker, but we'll let you sort out the details.
Prometheus - Verifying tooling installation
To start with a working container installation, you should see the following results for the
commands shown (note versions shown might differ for you):
$ podman -v
podman version 5.x <<<< MINIMUM VERSION REQUIRED
$ podman machine init
Downloading VM image: fedora-coreos-38.20230625.2.0-qemu.aarch64.qcow2.xz: done
Extracting compressed file
Image resized.
Machine init complete
$ podman machine start
Starting machine "podman-machine-default"
...
(more console output...)
...
Machine "podman-machine-default" started successfully
$ podman machine list
NAME                     VM TYPE  CREATED     LAST UP            CPUS
podman-machine-default*  qemu     2 days ago  Currently running  1
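If you would rather check the minimum version in a script than by eye, here is a minimal sketch. The helper names podman_major_version and meets_minimum are made up for this example, and it assumes the version line looks like the `podman -v` output shown above:

```shell
#!/bin/sh
# Extract the major version number from `podman -v` style output,
# e.g. "podman version 5.2.1" -> "5". (Hypothetical helper name.)
podman_major_version() {
    # $1: a version line such as "podman version 5.2.1"
    printf '%s\n' "$1" | sed -n 's/^podman version \([0-9][0-9]*\).*/\1/p'
}

# Succeed when the major version is at least the required minimum.
meets_minimum() {
    # $1: version line, $2: required major version
    major=$(podman_major_version "$1")
    [ -n "$major" ] && [ "$major" -ge "$2" ]
}

# Only query the real binary when Podman is actually installed.
if command -v podman >/dev/null 2>&1; then
    if meets_minimum "$(podman -v)" 5; then
        echo "Podman version is sufficient"
    else
        echo "Podman 5.x or newer is required" >&2
    fi
fi
```

The parsing is kept separate from the call to podman so you can try the helpers with any version string.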
Now that you have the tooling, let's get started installing Prometheus in a container...
Installation - Make a project directory
The first step is to open a console or terminal window and create an empty workshop directory
from the command line, something like this:
$ mkdir workshop-prometheus
$ cd workshop-prometheus
Setup - A workshop configuration
Using any editor you like, create a file named workshop-prometheus.yml containing the basic
Prometheus configuration shown in the YAML code below, which you can cut-and-paste (be sure to
save the results):
# workshop config
global:
  scrape_interval: 5s

# Scraping only Prometheus.
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
Configuration - Some thoughts on setup
Normally you are monitoring other targets over HTTP and scraping their endpoints, but we are
going to start with Prometheus as it also exposes its own metrics endpoint. Monitoring your
Prometheus server's health this way is only an exercise for this workshop. Also note that
scraping metrics every 5 seconds is a bit over the top; commonly you would see 10-60 seconds,
but we want our data to flow in a steady stream for this workshop.
Configuration - The global section
As you can imagine, the global section is used for global settings and default values. Here we
have just set the default scrape interval to be 5 seconds:
# workshop config
global:
  scrape_interval: 5s
Configuration - The scrape configs section
The scrape_configs section is where you tell Prometheus which targets to scrape for metrics.
In the beginning we will be listing each job for our targets manually, using a host:port
format:
# Scraping only Prometheus.
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
Note: production configurations would use service discovery integrations to find targets; more
on that later in this workshop.
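Since YAML is sensitive to indentation, it can save time to check the configuration file before building it into an image. The sketch below is one possible way to do that, not part of the lab itself. It assumes promtool is available at /bin/promtool inside the prom/prometheus image and that your Podman setup can mount the current directory (on SELinux systems you may need to add a :Z option to the volume). The DRY_RUN variable and the run and check_config helpers are invented for this example; by default the command is only printed, so set DRY_RUN=0 to really run it.

```shell
#!/bin/sh
# Assumption: DRY_RUN defaults to 1, so the command is printed, not executed.
DRY_RUN="${DRY_RUN:-1}"
CONFIG="workshop-prometheus.yml"   # configuration file name from this lab
IMAGE="prom/prometheus:v2.54.1"    # base image used in this lab

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run promtool from the Prometheus image against the local config file.
check_config() {
    run podman run --rm --entrypoint /bin/promtool \
        -v "$PWD/$CONFIG:/tmp/$CONFIG:ro" \
        "$IMAGE" check config "/tmp/$CONFIG"
}

check_config
```

Run it from inside your workshop-prometheus directory so the relative file name resolves.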
Installation - Creating container build file
Next, using any editor you like, create a file called Buildfile with the following contents,
which you can cut-and-paste:
FROM prom/prometheus:v2.54.1
ADD workshop-prometheus.yml /etc/prometheus
Installation - Building a container image
Now you can build your own container image with our custom configuration inserted:
$ podman build -t workshop-prometheus:v2.54.1 -f Buildfile
STEP 1/2: FROM prom/prometheus:v2.54.1
Resolving "prom/prometheus" using unqualified-search registries (/etc/containers/registries.conf.d/999-podman-machine.conf)
Trying to pull docker.io/prom/prometheus:v2.54.1...
Getting image source signatures
Copying blob sha256:54512e4fd08c47b3ca9a1a819b67275b22bf3a26be2d068369e330cc2dcaaa30
Copying blob sha256:ae2aea4a76d0bd645dd670e4146ffb8c6700e5cff84fffc00c57e0546b40e3fe
Copying blob sha256:9a4e2bd7c8b5eff5563de824d2eda1ab776820b49b8b3371404f796ff14bf5f2
Copying config sha256:dd202374baaf7882936f1edebed305ff0d1a7dfd9dfb6a09b198f550043d02d2
Writing manifest to image destination
Storing signatures
STEP 2/2: ADD workshop-prometheus.yml /etc/prometheus
COMMIT workshop-prometheus
--> a6433a85d68f
Successfully tagged localhost/workshop-prometheus:v2.54.1
a6433a85d68faa925da7673a2bbb3bd4a9674f3dca5061d9ae34b5ab1b4d1a77
Installation - Verifying built image
Looking at the images, we see the following:
$ podman images
REPOSITORY                     TAG      IMAGE ID      CREATED        SIZE
localhost/workshop-prometheus  v2.54.1  df411b583ae4  2 minutes ago  270 MB
docker.io/prom/prometheus      v2.54.1  4336a50d4ad9  6 days ago     270 MB
Prometheus - Start your metrics engines!
Now it's time to start the Prometheus server. We will point to our config file inside the image
we built using the --config.file flag:
$ podman run -p 9090:9090 workshop-prometheus:v2.54.1 --config.file=/etc/prometheus/workshop-prometheus.yml
...
ts=2024-09-03T08:30:37.085Z caller=main.go:601 level=info msg="No time or size retention was set so using the default time retention" duration=15d
ts=2024-09-03T08:30:37.085Z caller=main.go:645 level=info msg="Starting Prometheus Server" mode=server version="(version=2.54.1, branch=HEAD, revision=e6cfa720fbe6280153fab13090a483dbd40bece3)"
ts=2024-09-03T08:30:37.085Z caller=main.go:650 level=info build_context="(go=go1.22.6, platform=linux/arm64, user=root@812ffd741951, date=20240827-10:59:03, tags=netgo,builtinassets,stringlabels)"
ts=2024-09-03T08:30:37.086Z caller=main.go:651 level=info host_details="(Linux 6.8.11-300.fc40.aarch64 #1 SMP PREEMPT_DYNAMIC Mon May 27 15:22:03 UTC 2024 aarch64 6de96720c902 (none))"
ts=2024-09-03T08:30:37.086Z caller=main.go:652 level=info fd_limits="(soft=524288, hard=524288)"
ts=2024-09-03T08:30:37.086Z caller=main.go:653 level=info vm_limits="(soft=unlimited, hard=unlimited)"
ts=2024-09-03T08:30:37.087Z caller=web.go:571 level=info component=web msg="Start listening for connections" address=0.0.0.0:9090
ts=2024-09-03T08:30:37.087Z caller=main.go:1160 level=info msg="Starting TSDB ..."
ts=2024-09-03T08:30:37.088Z caller=tls_config.go:313 level=info component=web msg="Listening on" address=[::]:9090
ts=2024-09-03T08:30:37.088Z caller=tls_config.go:316 level=info component=web msg="TLS is disabled." http2=false address=[::]:9090
ts=2024-09-03T08:30:37.090Z caller=head.go:626 level=info component=tsdb msg="Replaying on-disk memory mappable chunks if any"
ts=2024-09-03T08:30:37.090Z caller=head.go:713 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=500ns
ts=2024-09-03T08:30:37.090Z caller=head.go:721 level=info component=tsdb msg="Replaying WAL, this may take a while"
ts=2024-09-03T08:30:37.090Z caller=head.go:793 level=info component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
ts=2024-09-03T08:30:37.090Z caller=head.go:830 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=11.083µs wal_replay_duration=278.918µs wbl_replay_duration=84ns chunk_snapshot_load_duration=0s mmap_chunk_replay_duration=500ns total_replay_duration=299.709µs
ts=2024-09-03T08:30:37.092Z caller=main.go:1181 level=info fs_type=XFS_SUPER_MAGIC
ts=2024-09-03T08:30:37.092Z caller=main.go:1184 level=info msg="TSDB started"
ts=2024-09-03T08:30:37.092Z caller=main.go:1367 level=info msg="Loading configuration file" filename=/etc/prometheus/workshop-prometheus.yml
ts=2024-09-03T08:30:37.092Z caller=main.go:1404 level=info msg="updated GOGC" old=100 new=75
ts=2024-09-03T08:30:37.092Z caller=main.go:1415 level=info msg="Completed loading of configuration file" filename=/etc/prometheus/workshop-prometheus.yml totalDuration=389.211µs db_storage=6.166µs remote_storage=709ns web_handler=250ns query_engine=1.083µs scrape=194.168µs scrape_sd=19.375µs notify=459ns notify_sd=375ns rules=1.167µs tracing=2.625µs
ts=2024-09-03T08:30:37.092Z caller=main.go:1145 level=info msg="Server is ready to receive web requests."
ts=2024-09-03T08:30:37.092Z caller=manager.go:164 level=info component="rule manager" msg="Starting rule manager..."
Prometheus - Adjusting settings live
Because the configuration file is built into our container image, any change to that
configuration (the file the --config.file flag points at) requires a new image to be built, the
old container stopped, and a new container started using the new image.
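The build, stop, and start cycle described here can be wrapped in a small script. This is only a sketch under a few assumptions: unlike the lab, it runs the container detached (-d) under a made-up container name so it can be removed by name, and the DRY_RUN variable and the run and redeploy helpers are invented for this example. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Assumption: DRY_RUN defaults to 1, so commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"
NAME="workshop-prometheus"   # hypothetical container name, not used in the lab

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Rebuild the image, remove the old container, start a new one.
redeploy() {
    # $1: image tag, for example v2.54.1
    run podman build -t "workshop-prometheus:$1" -f Buildfile .
    # Removing a container that does not exist yet is not an error here.
    run podman rm -f "$NAME" || true
    run podman run -d --name "$NAME" -p 9090:9090 "workshop-prometheus:$1" \
        --config.file=/etc/prometheus/workshop-prometheus.yml
}

redeploy v2.54.1
```

Running detached means the log no longer scrolls by in your terminal; `podman logs workshop-prometheus` shows it when you need it.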
Now let's see if our Prometheus server is up and running on our local machine by loading the
status page in our browser at http://localhost:9090, noting it needs to run a little bit to
collect some data from its own HTTP metrics endpoint.
Prometheus - The status page
Now try the dark mode feature by clicking on the half moon icon in the top right corner.
Prometheus - Status page (dark mode)
Prometheus - Live metrics endpoint
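The metrics endpoint at http://localhost:9090/metrics serves plain text, so you can also look at it from the command line. The sketch below pulls one value out of that text. The metric_value helper is made up for this example, the sample text uses invented values, and the helper only handles metrics without labels:

```shell
#!/bin/sh
# Print the value of an unlabelled metric from Prometheus text exposition
# format read on stdin. Comment lines (# HELP, # TYPE) never match because
# their first field is "#". (Hypothetical helper name.)
metric_value() {
    # $1: metric name without labels
    awk -v name="$1" '$1 == name { print $2 }'
}

# Made-up sample of what the endpoint returns.
sample='# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 42'

printf '%s\n' "$sample" | metric_value go_goroutines   # prints 42

# Against the running server from this lab (prints nothing if curl is
# missing or the server is not reachable).
if command -v curl >/dev/null 2>&1; then
    curl -s --max-time 2 http://localhost:9090/metrics 2>/dev/null \
        | metric_value go_goroutines
fi
```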
Prometheus - Checking your targets
After you configure a new Prometheus target to scrape and (re)start the Prometheus server,
validate it's running correctly by going to the status page, using the drop down menu at the
top labeled STATUS and selecting TARGETS:
Prometheus - Checking your targets
This shows you a list of the targets, in our case just one, featuring the scrape configuration
details. The most important field here is the STATE, where we want to see a green UP:
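The same target state is available from the Prometheus HTTP API at /api/v1/targets, which is handy when you want to check it from a script. The sketch below assumes the compact JSON that the API returns and uses grep instead of a real JSON parser (a tool such as jq would be more robust). The target_health helper is made up for this example and the sample response is invented and heavily trimmed:

```shell
#!/bin/sh
# Print the health of every target ("up", "down" or "unknown"), one per
# line, from the JSON of /api/v1/targets read on stdin.
# (Hypothetical helper name; relies on compact JSON without spaces.)
target_health() {
    grep -o '"health":"[a-z]*"' | cut -d'"' -f4
}

# Invented, trimmed response for illustration only.
sample='{"status":"success","data":{"activeTargets":[{"scrapeUrl":"http://localhost:9090/metrics","health":"up"}]}}'

printf '%s\n' "$sample" | target_health   # prints up

# Against the running server from this lab (prints nothing if curl is
# missing or the server is not reachable).
if command -v curl >/dev/null 2>&1; then
    curl -s --max-time 2 http://localhost:9090/api/v1/targets 2>/dev/null \
        | target_health
fi
```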
Prometheus - Bad target state
Let's break our configuration and see what that looks like in the targets status page. To do
this, open up your configuration file workshop-prometheus.yml and edit the scrape_configs
section to alter the target's port number as shown:
# Scraping only Prometheus.
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9099"]
Save and see next slide for applying the new configuration.
Installation - Building a new container image
Now you can rebuild your own container image with our broken configuration inserted:
$ podman build -t workshop-prometheus:bad-target -f Buildfile
STEP 1/2: FROM prom/prometheus:v2.54.1
STEP 2/2: ADD workshop-prometheus.yml /etc/prometheus
COMMIT workshop-prometheus
--> b63d3b6d2139
Successfully tagged localhost/workshop-prometheus:bad-target
b63d3b6d2139c3a28eeab4b8d65169a1b4d77da503c51a587340e0a1b0a52b8a
Installation - Verifying rebuilt image
Looking at the images, we see the new image was built just a moment ago and is listed alongside
the older image:
$ podman images
REPOSITORY                     TAG         IMAGE ID      CREATED             SIZE
localhost/workshop-prometheus  bad-target  00d6552169cf  About a minute ago  241 MB
localhost/workshop-prometheus  v2.54.1     d53a1e8ff3dc  7 minutes ago       241 MB
docker.io/prom/prometheus      v2.54.1     eb8939d5c174  2 hours ago         241 MB
Prometheus - Start your broken configuration
First you stop the running container (using CTRL-C as we are not running the container detached),
then start it again as we did before, but this time the newest rebuilt image will be used:
$ podman run -p 9090:9090 workshop-prometheus:bad-target --config.file=/etc/prometheus/workshop-prometheus.yml
...
ts=2024-09-03T08:30:37.085Z caller=main.go:601 level=info msg="No time or size retention was set so using the default time retention" duration=15d
ts=2024-09-03T08:30:37.085Z caller=main.go:645 level=info msg="Starting Prometheus Server" mode=server version="(version=2.54.1, branch=HEAD, revision=e6cfa720fbe6280153fab13090a483dbd40bece3)"
ts=2024-09-03T08:30:37.085Z caller=main.go:650 level=info build_context="(go=go1.22.6, platform=linux/arm64, user=root@812ffd741951, date=20240827-10:59:03, tags=netgo,builtinassets,stringlabels)"
ts=2024-09-03T08:30:37.086Z caller=main.go:651 level=info host_details="(Linux 6.8.11-300.fc40.aarch64 #1 SMP PREEMPT_DYNAMIC Mon May 27 15:22:03 UTC 2024 aarch64 6de96720c902 (none))"
ts=2024-09-03T08:30:37.086Z caller=main.go:652 level=info fd_limits="(soft=524288, hard=524288)"
ts=2024-09-03T08:30:37.086Z caller=main.go:653 level=info vm_limits="(soft=unlimited, hard=unlimited)"
ts=2024-09-03T08:30:37.087Z caller=web.go:571 level=info component=web msg="Start listening for connections" address=0.0.0.0:9090
ts=2024-09-03T08:30:37.087Z caller=main.go:1160 level=info msg="Starting TSDB ..."
ts=2024-09-03T08:30:37.088Z caller=tls_config.go:313 level=info component=web msg="Listening on" address=[::]:9090
ts=2024-09-03T08:30:37.088Z caller=tls_config.go:316 level=info component=web msg="TLS is disabled." http2=false address=[::]:9090
ts=2024-09-03T08:30:37.090Z caller=head.go:626 level=info component=tsdb msg="Replaying on-disk memory mappable chunks if any"
ts=2024-09-03T08:30:37.090Z caller=head.go:713 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=500ns
ts=2024-09-03T08:30:37.090Z caller=head.go:721 level=info component=tsdb msg="Replaying WAL, this may take a while"
ts=2024-09-03T08:30:37.090Z caller=head.go:793 level=info component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
ts=2024-09-03T08:30:37.090Z caller=head.go:830 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=11.083µs wal_replay_duration=278.918µs wbl_replay_duration=84ns chunk_snapshot_load_duration=0s mmap_chunk_replay_duration=500ns total_replay_duration=299.709µs
ts=2024-09-03T08:30:37.092Z caller=main.go:1181 level=info fs_type=XFS_SUPER_MAGIC
ts=2024-09-03T08:30:37.092Z caller=main.go:1184 level=info msg="TSDB started"
ts=2024-09-03T08:30:37.092Z caller=main.go:1367 level=info msg="Loading configuration file" filename=/etc/prometheus/workshop-prometheus.yml
ts=2024-09-03T08:30:37.092Z caller=main.go:1404 level=info msg="updated GOGC" old=100 new=75
ts=2024-09-03T08:30:37.092Z caller=main.go:1415 level=info msg="Completed loading of configuration file" filename=/etc/prometheus/workshop-prometheus.yml totalDuration=389.211µs db_storage=6.166µs remote_storage=709ns web_handler=250ns query_engine=1.083µs scrape=194.168µs scrape_sd=19.375µs notify=459ns notify_sd=375ns rules=1.167µs tracing=2.625µs
ts=2024-09-03T08:30:37.092Z caller=main.go:1145 level=info msg="Server is ready to receive web requests."
ts=2024-09-03T08:30:37.092Z caller=manager.go:164 level=info component="rule manager" msg="Starting rule manager..."
Prometheus - Verify bad target state
Back on the targets status page, we see that the target we configured is indeed broken:
Exercise: go back and fix this target before proceeding!
Prometheus - Exploring command-line flags
When you started Prometheus we mentioned a flag to point at our new configuration file. There is
a status page you can use to view all the flags that have been set (by you or by default
settings). Use the drop down menu again at the top labeled STATUS and select
COMMAND-LINE FLAGS:
Prometheus - Exploring command-line flags
This shows you a long list of flags with their current values and a search field for locating
one you might be interested in:
Prometheus - Searching for a flag
By filling in a search query, you can narrow down the long list to a specific area. Let's
explore the flag value for --config.file:
Prometheus - Time series database status
Next, there is a status page you can use to view your time series database, or TSDB, status.
Use the drop down menu again at the top labeled STATUS and select TSDB-STATUS:
Prometheus - Time series database status
This shows you some details for the time series being collected in an overview status line with
several tables below with cardinality status:
Prometheus - Exploring active configuration
Lastly, at least in this lab, you can verify the exact configuration being applied to your
Prometheus server. Use the drop down menu again at the top labeled STATUS and select
CONFIGURATION:
Prometheus - Exploring active configuration
This shows you your exact configuration, often including some defaults that you might not have
in your personal configuration file, yet are being used. There is even a handy copy-to-clipboard
button for you to grab it:
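The flags, TSDB, and configuration status pages each have a counterpart in the Prometheus HTTP API under /api/v1/status/, which is useful if you want the same information without a browser. Here is a small sketch that builds those URLs and reports the HTTP status code for each. The status_url helper and the PROM_URL variable are made up for this example:

```shell
#!/bin/sh
PROM_URL="http://localhost:9090"   # server address used in this lab

# Build the API URL for a status page. (Hypothetical helper name.)
status_url() {
    # $1: one of flags, tsdb, config
    printf '%s/api/v1/status/%s\n' "$PROM_URL" "$1"
}

for page in flags tsdb config; do
    url=$(status_url "$page")
    echo "GET $url"
    # Print only the HTTP status code; curl reports 000 when the server
    # cannot be reached. Skipped entirely when curl is not installed.
    if command -v curl >/dev/null 2>&1; then
        curl -s -o /dev/null -w '%{http_code}\n' --max-time 2 "$url" || true
    fi
done
```

Drop the `-o /dev/null -w ...` options to see the full JSON response instead of just the status code.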
Prometheus - Using the expression tooling
You can navigate back to the expression browser that lets you query your time series data
by clicking on the menu entry GRAPH. This is the default built-in query interface for running
Prometheus Query Language (PromQL) queries. Be sure you are in the TABLE tab:
Prometheus - Total samples ingested
The TABLE view shows the output of a PromQL query expression as a list of series. It's less
expensive to use than the other option, GRAPH, because you are not plotting the results in a
graph. Without worrying about the PromQL used (we'll explore that later in this workshop), let's
show the total number of samples ingested by our Prometheus server since it started:
prometheus_tsdb_head_samples_appended_total
Prometheus - Validating an expression
You will have noticed that there are three buttons to the right of the EXPRESSION entry field.
The first one can be used to validate the expression you entered; just click it, and if it's a
good expression you get a checkmark. Enter the previous slide's expression and click on the
first button to generate a checkmark:
Prometheus - Execute the expression query
After validating our expression, run it by clicking on the EXECUTE button:
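What the EXECUTE button does in the browser can also be done against the Prometheus HTTP API at /api/v1/query. The sketch below sends the same expression with curl and pulls the sample values out of the JSON response. It uses grep rather than a real JSON parser and assumes the compact JSON the API returns. The query_values helper is made up for this example and the sample response is invented:

```shell
#!/bin/sh
# Print the sample value of every series in an instant-query response read
# on stdin. In the API's JSON each "value" is a [timestamp, "value"] pair.
# (Hypothetical helper name; relies on compact JSON without spaces.)
query_values() {
    grep -o '"value":\[[0-9.]*,"[^"]*"' | cut -d'"' -f4
}

# Invented response for illustration only.
sample='{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"prometheus_tsdb_head_samples_appended_total"},"value":[1725352237.092,"12345"]}]}}'

printf '%s\n' "$sample" | query_values   # prints 12345

# Against the running server from this lab (prints nothing if curl is
# missing or the server is not reachable).
if command -v curl >/dev/null 2>&1; then
    curl -s --max-time 2 -G http://localhost:9090/api/v1/query \
        --data-urlencode 'query=prometheus_tsdb_head_samples_appended_total' \
        2>/dev/null | query_values
fi
```

If the query returns several series, one value is printed per line.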
Prometheus - Exploring the visualization
Let's explore by clicking on the GRAPH tab. Note the features included to visualize queries. I
have adjusted my longer running server to look at 12h of data here and am browsing the results:
Prometheus - Add another panel
Go back to the TABLE tab. Notice the ADD PANEL button at the bottom. Click on it to add another
query panel in which we will execute the following query expression. Let's look at the number of
samples ingested per second averaged over a 1m window of time:
rate(prometheus_tsdb_head_samples_appended_total[1m])
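To make the idea behind this expression a little more concrete, here is a rough sketch of a per-second rate over a window: the increase of the counter divided by the elapsed seconds. This is a simplification and not how Prometheus implements rate(), which also handles counter resets and extrapolates to the window boundaries. The simple_rate helper and the sample numbers are made up for this example:

```shell
#!/bin/sh
# Simplified per-second rate of a counter over a window:
# (last value - first value) / (last timestamp - first timestamp).
# Input on stdin: "timestamp_seconds value" per line, oldest first.
# (Hypothetical helper name; ignores counter resets.)
simple_rate() {
    awk 'NR == 1 { t0 = $1; v0 = $2 }
         { t1 = $1; v1 = $2 }
         END { if (NR > 1 && t1 > t0) printf "%g\n", (v1 - v0) / (t1 - t0) }'
}

# Made-up counter samples covering a 60 second window.
printf '%s\n' "0 100" "30 250" "60 400" | simple_rate   # prints 5
```

In this made-up window the counter grew by 300 in 60 seconds, so the result is 5 samples per second.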
Prometheus - Execute the second query
You can now see multiple expressions are possible:
Prometheus - Exploring second visualization
Intermezzo - Warning about pages
If you now select any of the status pages from the STATUS menu at the top and then return to the
expression query page using the GRAPH menu entry, you will notice that any extra panels you
added are gone.
Pro tip: you might want to work using browser tabs.
Prometheus - One last query
Assuming you went to another status page and back, we have a last query we will run here to
simulate the same query used to fill the system UP metric we viewed for our Prometheus target:
Prometheus - Execute the last query
You can now see the results as a boolean value suggesting it is really UP:
Prometheus - Exploring this silly visualization
This is a rather silly metric to visualize this way, but this is what it looks like going back
over 1d as my server was running during this lab's development (you can check my working hours!):
Lab completed - Results
Next up, exploring the query language...
Contact - are there any questions?