Lab 2 - Installing Prometheus (binary package)
Lab Goal
This lab guides you through installing Prometheus from a pre-compiled binary package on your
local machine, configuring it, and running it to start gathering metrics.
Prometheus - Installation notes
This lab guides you through installing Prometheus using one of the pre-compiled binaries. You
will see this done here for macOS, but links are provided for both Linux and Windows
systems. If you are using either of these operating systems, you are expected to know how to
apply the steps from this guide using the tooling provided by your local machine.
Installation - Make a project directory
The first step is to open a console or terminal window and use the command line to create an
empty workshop directory, something like this:
$ mkdir workshop-prometheus
$ cd workshop-prometheus
Installation - Download Prometheus
Next, download the Prometheus binary that matches your local machine's operating system and
unpack it in the workshop-prometheus directory:
- Prometheus 2.54.1 amd64 (macOS / Darwin)
- Prometheus 2.54.1 amd64 or i386 (Linux)
- Prometheus 2.54.1 amd64 or i386 (Windows)
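If you prefer to download from the command line, you can build the archive name from the version, operating system, and CPU architecture. The sketch below assumes the naming pattern used on the Prometheus GitHub releases page (prometheus-VERSION.OS-ARCH.tar.gz, with a .zip archive for Windows). The function name prom_archive_url is hypothetical, so double-check the resulting URL against the releases page.

```shell
# Hypothetical helper: print the release download URL for a version/OS/arch,
# assuming the GitHub release naming pattern prometheus-<ver>.<os>-<arch>.<ext>.
prom_archive_url() {
  local version="$1"   # e.g. 2.54.1
  local os="$2"        # darwin, linux or windows
  local arch="$3"      # e.g. amd64, 386 or arm64
  local ext="tar.gz"
  if [ "$os" = "windows" ]; then
    ext="zip"          # Windows releases are published as zip archives
  fi
  echo "https://github.com/prometheus/prometheus/releases/download/v${version}/prometheus-${version}.${os}-${arch}.${ext}"
}
```

For example, run `curl -LO "$(prom_archive_url 2.54.1 darwin amd64)"` from inside workshop-prometheus. The -L flag follows GitHub's redirect, and -O keeps the remote file name.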
Installation - Unpacking the binary
Unpacking the download should look something like this (note the version might
be different by the time you take this workshop):
$ tar -xzvf prometheus-2.54.1.darwin-amd64.tar.gz
x prometheus-2.54.1.darwin-amd64/
x prometheus-2.54.1.darwin-amd64/console_libraries/
x prometheus-2.54.1.darwin-amd64/console_libraries/prom.lib
x prometheus-2.54.1.darwin-amd64/console_libraries/menu.lib
x prometheus-2.54.1.darwin-amd64/NOTICE
x prometheus-2.54.1.darwin-amd64/promtool
x prometheus-2.54.1.darwin-amd64/prometheus.yml
x prometheus-2.54.1.darwin-amd64/LICENSE
x prometheus-2.54.1.darwin-amd64/prometheus
x prometheus-2.54.1.darwin-amd64/consoles/
x prometheus-2.54.1.darwin-amd64/consoles/node-disk.html
x prometheus-2.54.1.darwin-amd64/consoles/node-cpu.html
x prometheus-2.54.1.darwin-amd64/consoles/prometheus.html
x prometheus-2.54.1.darwin-amd64/consoles/prometheus-overview.html
x prometheus-2.54.1.darwin-amd64/consoles/node.html
x prometheus-2.54.1.darwin-amd64/consoles/index.html.example
x prometheus-2.54.1.darwin-amd64/consoles/node-overview.html
Installation - Exploring the tools
There are four items you just unpacked that are of interest to us:
- prometheus - the binary executable for Prometheus
- promtool - a command-line tool for validating configuration files
- prometheus.yml - a simple configuration for running Prometheus
- consoles directory - contains example console templates
We will make the most use of the Prometheus binary and the basic configuration file
in the rest of this workshop.
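Because promtool validates configuration, it is worth running it against a configuration file before you start or reload Prometheus. This is a minimal sketch that assumes you run it from the unpacked Prometheus directory. `promtool check config` is the actual subcommand. The check_config function and the PROMTOOL override variable are hypothetical conveniences.

```shell
# Hypothetical wrapper around "promtool check config".
# PROMTOOL lets you point at a promtool binary elsewhere (defaults to ./promtool).
check_config() {
  local promtool="${PROMTOOL:-./promtool}"
  local file="${1:-prometheus.yml}"
  # promtool exits non-zero if the file fails validation
  "$promtool" check config "$file"
}
```

Try `check_config prometheus.yml` now, and later `check_config workshop-prometheus.yml` once you have created the workshop configuration.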
Setup - Copy basic configuration
First we will make a copy of the provided configuration file for use in the rest of this
workshop. We need to move into the Prometheus directory and then make the copy:
$ cd prometheus-2.54.1.darwin-amd64
$ cp prometheus.yml workshop-prometheus.yml
Setup - Workshop configuration
Open the copied file workshop-prometheus.yml
and you should see a lot of comments
spread over the file (something like 25-30 lines). By default it's set up to scrape metrics
from itself every 15 seconds. Using your favorite editor, clean it up a bit so that it looks
like this (be sure to save the results):
# workshop config
global:
  scrape_interval: 5s
# Scraping only Prometheus.
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
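Instead of editing by hand, you can also write the same workshop configuration from the shell with a here-document. This is only a convenience sketch: the write_workshop_config function name is hypothetical, and it overwrites the target file without asking.

```shell
# Hypothetical helper: write the workshop configuration to a file
# (default workshop-prometheus.yml), overwriting any existing content.
write_workshop_config() {
  local file="${1:-workshop-prometheus.yml}"
  cat > "$file" <<'EOF'
# workshop config
global:
  scrape_interval: 5s
# Scraping only Prometheus.
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
EOF
}
```

YAML relies on indentation. Keep the two-space nesting exactly as shown, or Prometheus will reject the file.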
Configuration - Some thoughts on setup
Normally you monitor other targets over HTTP by scraping their metrics endpoints, but we are
going to start with Prometheus itself, as it also exposes its own metrics endpoint. Monitoring
your Prometheus server's health this way is just an exercise for this workshop. Also note that
scraping metrics every 5 seconds is a bit over the top; 10-60 seconds is more common, but we
want our data to flow in a steady stream for this workshop.
Configuration - The global section
As you can imagine, the global section is used for global settings and the default values
that other sections inherit. Here we have just set the default scrape interval to 5 seconds:
# workshop config
global:
  scrape_interval: 5s
Configuration - The scrape configs section
The scrape_configs section is where you tell Prometheus which targets to collect metrics
from. To begin with, we will list the targets for each job manually, using a host:port
format:
# Scraping only Prometheus.
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
Note: production configurations would typically use service discovery integrations to find
targets; more on that later in this workshop.
Prometheus - Start your metrics engines!
Now it's time to start the Prometheus server. We point it at our config file using the
--config.file flag; without it, Prometheus loads the default prometheus.yml file. Also note
that the time series database (TSDB) is stored on disk, by default in ./data. On macOS, you
might need to approve the binary in your security settings if it fails to start:
$ ./prometheus --config.file=workshop-prometheus.yml
...
ts=2024-09-03T07:28:19.024Z caller=main.go:601 level=info msg="No time or size retention was set so using the default time retention" duration=15d
ts=2024-09-03T07:28:19.024Z caller=main.go:645 level=info msg="Starting Prometheus Server" mode=server version="(version=2.54.1, branch=HEAD, revision=e6cfa720fbe6280153fab13090a483dbd40bece3)"
ts=2024-09-03T07:28:19.024Z caller=main.go:650 level=info build_context="(go=go1.22.6, platform=darwin/amd64, user=root@432dfbac62dc, date=20240827-10:56:36, tags=netgo,builtinassets,stringlabels)"
ts=2024-09-03T07:28:19.024Z caller=main.go:651 level=info host_details=(darwin)
ts=2024-09-03T07:28:19.025Z caller=main.go:652 level=info fd_limits="(soft=61440, hard=unlimited)"
ts=2024-09-03T07:28:19.025Z caller=main.go:653 level=info vm_limits="(soft=unlimited, hard=unlimited)"
ts=2024-09-03T07:28:19.028Z caller=web.go:571 level=info component=web msg="Start listening for connections" address=0.0.0.0:9090
ts=2024-09-03T07:28:19.028Z caller=main.go:1160 level=info msg="Starting TSDB ..."
ts=2024-09-03T07:28:19.031Z caller=tls_config.go:313 level=info component=web msg="Listening on" address=[::]:9090
ts=2024-09-03T07:28:19.031Z caller=tls_config.go:316 level=info component=web msg="TLS is disabled." http2=false address=[::]:9090
ts=2024-09-03T07:28:19.034Z caller=head.go:626 level=info component=tsdb msg="Replaying on-disk memory mappable chunks if any"
ts=2024-09-03T07:28:19.034Z caller=head.go:713 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=7.625µs
ts=2024-09-03T07:28:19.034Z caller=head.go:721 level=info component=tsdb msg="Replaying WAL, this may take a while"
ts=2024-09-03T07:28:19.035Z caller=head.go:793 level=info component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
ts=2024-09-03T07:28:19.035Z caller=head.go:830 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=55.458µs wal_replay_duration=994.875µs wbl_replay_duration=167ns chunk_snapshot_load_duration=0s mmap_chunk_replay_duration=7.625µs total_replay_duration=1.084708ms
ts=2024-09-03T07:28:19.036Z caller=main.go:1181 level=info fs_type=1a
ts=2024-09-03T07:28:19.036Z caller=main.go:1184 level=info msg="TSDB started"
ts=2024-09-03T07:28:19.036Z caller=main.go:1367 level=info msg="Loading configuration file" filename=../workshop-prometheus.yml
ts=2024-09-03T07:28:19.117Z caller=main.go:1404 level=info msg="updated GOGC" old=100 new=75
ts=2024-09-03T07:28:19.117Z caller=main.go:1415 level=info msg="Completed loading of configuration file" filename=../workshop-prometheus.yml totalDuration=80.777625ms db_storage=2.083µs remote_storage=2.75µs web_handler=542ns query_engine=1.542µs scrape=79.484375ms scrape_sd=37.75µs notify=2.834µs notify_sd=625ns rules=6.375µs tracing=68.333µs
ts=2024-09-03T07:28:19.117Z caller=main.go:1145 level=info msg="Server is ready to receive web requests."
ts=2024-09-03T07:28:19.117Z caller=manager.go:164 level=info component="rule manager" msg="Starting rule manager..."
Prometheus - Adjusting settings live
If you change the configuration file of a running Prometheus server, you can apply the
changes without a restart by sending the process a HUP signal or by triggering a reload
through the HTTP API (the HTTP reload is only available when the server was started with the
--web.enable-lifecycle flag). Changes to the command-line flags used to start the server
require a full server restart to take effect.
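Here is a minimal sketch of the HTTP reload. It assumes the server listens on localhost:9090 and was started with the --web.enable-lifecycle flag, because without that flag the /-/reload endpoint is disabled. The function name prom_reload_http and the PROM_URL variable are hypothetical.

```shell
# Hypothetical helper: ask a running Prometheus to reload its configuration file.
# Requires the server to have been started with --web.enable-lifecycle, e.g.
#   ./prometheus --config.file=workshop-prometheus.yml --web.enable-lifecycle
prom_reload_http() {
  local base="${PROM_URL:-http://localhost:9090}"
  # POST to the lifecycle reload endpoint; -f makes curl fail on HTTP errors
  # (for example when the lifecycle API is not enabled)
  curl -sf -X POST "${base}/-/reload"
}
```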
Now let's see if our Prometheus server is up and running on our local machine by loading the
status page in our browser at http://localhost:9090. Note that it needs to run for a little
while to collect some data from its own HTTP metrics endpoint.
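The server needs a moment before it is ready, so a script can poll the /-/ready endpoint before you open the browser or run a query. That endpoint returns HTTP 200 once Prometheus is ready to serve traffic. This sketch assumes curl is installed. The wait_for_prometheus name, the PROM_URL variable, and the default of 30 attempts are hypothetical.

```shell
# Hypothetical helper: poll the readiness endpoint once per second,
# giving up after the given number of attempts (default 30).
wait_for_prometheus() {
  local base="${PROM_URL:-http://localhost:9090}"
  local attempts="${1:-30}"
  while [ "$attempts" -gt 0 ]; do
    if curl -sf "${base}/-/ready" > /dev/null; then
      return 0   # server answered 200 OK: ready
    fi
    attempts=$((attempts - 1))
    sleep 1
  done
  return 1       # never became ready
}
```

For example: `wait_for_prometheus && echo "Prometheus is ready"`.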
Prometheus - The status page
Now try the dark mode feature by clicking on the half moon icon in the top right corner.
Prometheus - Status page (dark mode)
Prometheus - Live metrics endpoint
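The endpoint Prometheus scrapes from itself serves plain text in the Prometheus text exposition format at http://localhost:9090/metrics. The sketch below assumes curl is installed; it fetches that page and keeps only the lines for one metric. The helper name prom_self_metric is hypothetical.

```shell
# Hypothetical helper: print the lines for one metric name from the
# server's own /metrics endpoint (HELP/TYPE comment lines included).
prom_self_metric() {
  local base="${PROM_URL:-http://localhost:9090}"
  local name="$1"
  curl -s "${base}/metrics" | grep -E "^(# (HELP|TYPE) )?${name}[ {]"
}
```

For example, `prom_self_metric prometheus_tsdb_head_samples_appended_total` shows the raw counter that we query later in this lab.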
Prometheus - Checking your targets
After you configure a new target to scrape and (re)start the Prometheus server, validate that
it is being scraped correctly on the targets status page: open the drop-down menu labeled
STATUS at the top and select TARGETS:
Prometheus - Checking your targets
This shows you a list of the targets (in our case just one) with their scrape configuration
details. The most important field here is STATE, where we want to see a green UP:
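The same information is available from the HTTP API at /api/v1/targets. It returns JSON in which each active target has a health field of up, down, or unknown. This is a rough sketch that assumes curl is installed and uses grep instead of a JSON parser (a tool such as jq would be more robust). The name prom_target_health is hypothetical.

```shell
# Hypothetical helper: print the health of every active target, one per line.
# Uses grep/sed on the raw JSON, so it is a quick check rather than a real parser.
prom_target_health() {
  local base="${PROM_URL:-http://localhost:9090}"
  curl -s "${base}/api/v1/targets?state=active" \
    | grep -o '"health":"[a-z]*"' \
    | sed 's/"health":"\([a-z]*\)"/\1/'
}
```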
Prometheus - Bad target state
Let's break our configuration and see what that looks like on the targets status page. To do
this, open your configuration file workshop-prometheus.yml and edit the scrape_configs section
to change the target's port number as shown:
# Scraping only Prometheus.
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9099"]
Save the file and see the next slide for applying the new configuration.
Prometheus - Apply a new configuration
We mentioned that you can restart Prometheus, that is, stop it and start it again, when a new
configuration needs to be applied. That means losing a period of time series data collection,
so instead let's send a HUP (reload) signal using the kill command. First we find the Prometheus
server process ID (PID), then send the signal to it as shown below:
$ ps aux | grep prometheus
erics 94110 1:31PM 0:05.28 ./prometheus --config.file=workshop-prometheus.yml
erics 97648 2:43PM 0:00.00 grep prometheus
$ kill -s HUP 94110
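You can combine the PID lookup and the signal into one step with pgrep. This is a minimal sketch that assumes pgrep is available (it ships with macOS and most Linux distributions) and that the process is named prometheus. The reload_prometheus function name is hypothetical.

```shell
# Hypothetical helper: send SIGHUP to every process named exactly "prometheus",
# which makes Prometheus re-read its configuration file without restarting.
reload_prometheus() {
  local pids
  # -x matches the process name exactly, not names that merely contain it
  pids="$(pgrep -x prometheus)" || {
    echo "no running prometheus process found" >&2
    return 1
  }
  kill -s HUP $pids   # $pids left unquoted on purpose: one argument per PID
}
```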
Prometheus - Verify bad target state
Back on the targets status page, we see that the target we configured is indeed broken:
Exercise: go back and revert this target to port 9090, then apply the configuration again
before proceeding!
Prometheus - Exploring command-line flags
When you started Prometheus, we mentioned a flag to point at our new configuration file. There
is a status page you can use to view all the flags that have been set (either by you or by
default). Open the drop-down menu labeled STATUS at the top again and select
COMMAND-LINE FLAGS:
Prometheus - Exploring command-line flags
This shows you a long list of flags with their current values and a search field for locating
one you might be interested in:
Prometheus - Searching for a flag
By filling in a search query, you can narrow down the long list to a specific area. Let's look
up the value of the --config.file flag:
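Flag values can also be read from the HTTP API at /api/v1/status/flags. It returns a JSON object keyed by flag name, without the leading dashes. This is a grep-based sketch that assumes curl is installed. The name prom_flag is hypothetical, and the parsing does not handle escaped quotes inside values.

```shell
# Hypothetical helper: print the current value of one command-line flag,
# e.g. prom_flag config.file
prom_flag() {
  local base="${PROM_URL:-http://localhost:9090}"
  local pattern
  # escape dots so "config.file" only matches that literal flag name
  pattern="$(printf '%s' "$1" | sed 's/\./\\./g')"
  curl -s "${base}/api/v1/status/flags" \
    | grep -o "\"${pattern}\":\"[^\"]*\"" \
    | sed 's/^"[^"]*":"\(.*\)"$/\1/'
}
```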
Prometheus - Time series database status
Next, there is a status page you can use to view the status of your time series database
(TSDB). Open the STATUS drop-down menu at the top again and select TSDB-STATUS:
Prometheus - Time series database status
This shows you details about the time series being collected: an overview status line
followed by several tables with cardinality statistics:
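The figures on this page are also exposed by the HTTP API at /api/v1/status/tsdb. Its headStats section includes numSeries, the number of series in the in-memory head block. This is a small grep-based sketch that assumes curl is installed. The name prom_head_series is hypothetical.

```shell
# Hypothetical helper: print the number of series currently in the TSDB head
# block, read from the headStats section of /api/v1/status/tsdb.
prom_head_series() {
  local base="${PROM_URL:-http://localhost:9090}"
  curl -s "${base}/api/v1/status/tsdb" \
    | grep -o '"numSeries":[0-9]*' \
    | cut -d: -f2
}
```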
Prometheus - Exploring active configuration
Lastly, at least for this lab, you can verify the exact configuration being applied to your
Prometheus server. Open the STATUS drop-down menu at the top again and select CONFIGURATION:
Prometheus - Exploring active configuration
This shows you your exact configuration, often including default values that are not in your
own configuration file but are being used anyway. There is even a handy copy-to-clipboard
button to grab it:
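The active configuration is also returned by the HTTP API at /api/v1/status/config, as a single JSON string containing the YAML. This sketch assumes curl is installed and lists only the interval and timeout settings in effect, which makes the filled-in defaults easy to spot. The name prom_active_intervals is hypothetical.

```shell
# Hypothetical helper: list the scrape interval/timeout settings in the
# active configuration, including defaults Prometheus filled in for you.
prom_active_intervals() {
  local base="${PROM_URL:-http://localhost:9090}"
  # the YAML comes back as one JSON string with escaped newlines,
  # so grep -o pulls out each "key: value" pair individually
  curl -s "${base}/api/v1/status/config" \
    | grep -oE '(scrape_interval|scrape_timeout): [0-9]+[a-z]+'
}
```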
Prometheus - Using the expression tooling
You can navigate back to the expression browser, which lets you query your time series data,
by clicking on the GRAPH menu entry. This is the default built-in query interface for running
Prometheus Query Language (PromQL) queries. Be sure you are in the TABLE tab:
Prometheus - Total samples ingested
The TABLE view shows the result of a PromQL expression evaluated at a single point in time.
It is less expensive to use than the other option, GRAPH, because it does not have to evaluate
the expression over a time range and plot the resulting series. Without worrying about the
PromQL used (we'll explore that later in this workshop), let's show the total number of samples
ingested by our Prometheus server since it started:
prometheus_tsdb_head_samples_appended_total
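The TABLE tab runs an instant query. You can run the same kind of query against the HTTP API at /api/v1/query. This is a minimal sketch that assumes curl is installed. The helper name prom_query is hypothetical, and it prints the raw JSON response unparsed.

```shell
# Hypothetical helper: run an instant PromQL query against the HTTP API
# and print the raw JSON response.
prom_query() {
  local base="${PROM_URL:-http://localhost:9090}"
  # -G sends the --data-urlencode parameters as a URL query string (GET)
  curl -s -G "${base}/api/v1/query" --data-urlencode "query=$1"
}
```

For example: `prom_query 'prometheus_tsdb_head_samples_appended_total'`. The --data-urlencode option takes care of escaping characters such as brackets and braces in more complex expressions.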
Prometheus - Validating an expression
You will have noticed that there are three buttons to the right of the EXPRESSION entry
field. The first one validates the expression you entered: click it, and if the expression is
valid you get a checkmark. Enter the previous slide's expression and click the first button to
generate a checkmark:
Prometheus - Execute the expression query
After validating our expression, run it by clicking on the EXECUTE
button:
Prometheus - Exploring the visualization
Let's explore by clicking on the GRAPH tab. Note the features included for visualizing
queries. Here I have set my longer-running server to show 12h of data and am browsing the
results:
Prometheus - Add another panel
Go back to the TABLE tab. At the bottom there is an ADD PANEL button; click on it to add
another query panel in which we will execute the following expression. It shows the number of
samples ingested per second, averaged over a 1m window of time:
rate(prometheus_tsdb_head_samples_appended_total[1m])
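Here is one way to build intuition for what rate() returns. With our 5s scrape interval, a 1m window holds about 12 samples of this counter, and the per-second rate is roughly the increase across the window divided by the time between the first and last sample. The awk sketch below computes that simplified value and applies the counter-reset rule: a drop in value is treated as the counter restarting from zero. Unlike the real rate(), it does not extrapolate to the window edges, so its numbers will differ slightly. The name simple_rate is hypothetical, and its input is one "timestamp value" pair per line.

```shell
# Hypothetical helper: simplified per-second rate of a counter.
# Input: one "unix_timestamp value" pair per line, oldest first.
# Handles counter resets but, unlike PromQL rate(), does not extrapolate
# to the edges of the window, so results will differ slightly.
simple_rate() {
  awk '
    { t = $1 + 0; v = $2 + 0 }
    NR == 1 { first_t = t; prev = v; next }
    {
      # a drop in value means the counter reset to 0, so count v itself
      if (v < prev) { increase += v } else { increase += v - prev }
      prev = v
      last_t = t
    }
    END {
      if (NR < 2 || last_t == first_t) {
        print "need two or more samples with different timestamps" > "/dev/stderr"
        exit 1
      }
      print increase / (last_t - first_t)
    }'
}
```

For example, the samples 100, 160, and 220 taken 5 seconds apart give an increase of 120 over 10 seconds, which is 12 samples per second.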
Prometheus - Execute the second query
You can now see that multiple expressions can be run side by side:
Prometheus - Exploring second visualization
Intermezzo - Warning about pages
If you now select any of the status pages from the STATUS menu at the top and then return to
the expression query page using the GRAPH menu entry, you will notice that any extra panels you
added are gone.
Pro tip: you might want to open the status pages in separate browser tabs.
Prometheus - One last query
Assuming you went to another status page and back, there is one last query to run here. It
mirrors the up metric behind the UP state we viewed for our Prometheus target:
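The expression used in this step is not reproduced here. A natural candidate is up{job="prometheus"}, which is 1 when the target's last scrape succeeded and 0 when it failed. The sketch below assumes that expression and the instant query API. The name is_target_up is hypothetical, and its grep-based parsing only handles a single-series result.

```shell
# Hypothetical helper: succeed (exit 0) only if up{job="prometheus"} is 1.
# Assumes the query returns a single series; parses JSON with grep/sed.
is_target_up() {
  local base="${PROM_URL:-http://localhost:9090}"
  local value
  value="$(curl -s -G "${base}/api/v1/query" \
      --data-urlencode 'query=up{job="prometheus"}' \
    | grep -o '"value":\[[0-9.]*,"[0-9.]*"]' \
    | sed 's/.*,"\([0-9.]*\)"]/\1/')"
  [ "$value" = "1" ]
}
```

For example, `is_target_up && echo UP || echo DOWN` reports DOWN while the broken port 9099 configuration from earlier is active.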
Prometheus - Execute the last query
You can now see the result as a boolean-style value (1), suggesting it is really UP:
Prometheus - Exploring this silly visualization
This is a rather silly metric to visualize this way, but this is what it looks like going back
over 1d as my server was running during this lab's development (you can check my working hours!):
Lab completed - Results
Next up, exploring the query language...
Contact - are there any questions?