# Monitoring and Alerting

## Introduction

This section walks through installing Prometheus, Grafana, Nginx and Certbot to monitor your node server, and shows how to send alerts through Telegram and PagerDuty. The Prometheus steps are once again taken from DigitalOcean's guide [here](https://www.digitalocean.com/community/tutorials/how-to-install-prometheus-on-ubuntu-16-04), the Grafana steps from [here](https://www.digitalocean.com/community/tutorials/how-to-install-and-secure-grafana-on-ubuntu-20-04), the Blackbox Exporter steps from the corresponding guide, the Nginx steps from [here](https://www.digitalocean.com/community/tutorials/how-to-install-nginx-on-ubuntu-20-04), and finally the Certbot/Let's Encrypt steps from [here](https://www.digitalocean.com/community/tutorials/how-to-secure-nginx-with-let-s-encrypt-on-ubuntu-18-04).

## NGINX

Before we install Prometheus we will need to install NGINX to serve the HTTP traffic.

```bash
sudo apt update
sudo apt install nginx
```

Before testing Nginx, the firewall software needs to be adjusted to allow access to the service. Nginx registers itself as a service with `ufw` upon installation, making it straightforward to allow Nginx access.

List the application configurations that `ufw` knows how to work with by typing:

```bash
sudo ufw app list
```

You should get a listing of the application profiles:

```bash
Available applications:
  Nginx Full
  Nginx HTTP
  Nginx HTTPS
  OpenSSH
```

As demonstrated by the output, there are three profiles available for Nginx:

* **Nginx Full**: This profile opens both port 80 (normal, unencrypted web traffic) and port 443 (TLS/SSL encrypted traffic)
* **Nginx HTTP**: This profile opens only port 80 (normal, unencrypted web traffic)
* **Nginx HTTPS**: This profile opens only port 443 (TLS/SSL encrypted traffic)

{% hint style="info" %}
It is recommended that you enable the most restrictive profile that will still allow the traffic you’ve configured. We will choose 'Nginx Full' to begin with. Once testing is complete, you may want to restrict this further by allowing 'Nginx HTTPS' and deleting the 'Nginx Full' rule.
{% endhint %}

You can enable this by typing:

```bash
sudo ufw allow 'Nginx Full'
```

You can verify the change by typing:

```bash
sudo ufw status
```

The output will indicate which traffic is allowed:

![](/files/-MeV1_Pk6vGZTtEMEv6Q)

## Prometheus

For security purposes, we’ll begin by creating the Prometheus user account, **`prometheus`**. We’ll use this account throughout the tutorial to isolate the ownership on Prometheus’ core files and directories.

Create this user with the `--no-create-home` and `--shell /bin/false` options so that it can’t be used to log into the server.

```
sudo useradd --no-create-home --shell /bin/false prometheus
```

Before we download the Prometheus binaries, create the necessary directories for storing Prometheus’ files and data. Following standard Linux conventions, we’ll create a directory in `/etc` for Prometheus’ configuration files and a directory in `/var/lib` for its data.

```
sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus
```

Now, set the user and group ownership on the new directories to the **`prometheus`** user.

```
sudo chown prometheus:prometheus /etc/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus
```

With our user and directories in place, we can now download Prometheus and then create the minimal configuration file to run Prometheus for the first time.

### Download Prometheus

First, download and unpack the current stable version of Prometheus into your home directory. You can find the latest binaries along with their checksums on the [Prometheus download page](https://prometheus.io/download/).

```
cd ~
curl -LO https://github.com/prometheus/prometheus/releases/download/v2.28.1/prometheus-2.28.1.linux-amd64.tar.gz
```

Next, use the `sha256sum` command to generate a checksum of the downloaded file:

```
sha256sum prometheus-2.28.1.linux-amd64.tar.gz
```

Compare the output from this command with the checksum on the Prometheus download page to ensure that your file is both genuine and not corrupted.

![](/files/-MeVFyecFogdL_n7dgVb)
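Comparing two 64-character strings by eye is error-prone, so the comparison can also be left to `sha256sum` itself. The following is a minimal sketch, assuming GNU coreutils; `verify_sha256` is a hypothetical helper name, and `<published_checksum>` stands for the value you copy from the download page.

```bash
# Sketch: verify a downloaded archive against a published SHA-256 checksum.
# verify_sha256 is a hypothetical helper, not part of Prometheus.
verify_sha256() {
  local file="$1" expected="$2"
  # sha256sum --check reads "<checksum>  <filename>" lines from stdin;
  # --status suppresses output and reports the result via the exit code.
  printf '%s  %s\n' "$expected" "$file" | sha256sum --check --status
}

# Usage (paste the checksum shown on the download page):
#   verify_sha256 prometheus-2.28.1.linux-amd64.tar.gz <published_checksum> \
#     && echo "checksum OK" || echo "checksum MISMATCH"
```

The function returns a non-zero exit code on a mismatch, so it can gate the unpacking step in a script.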

Now, unpack the downloaded archive.

```bash
tar xvf prometheus-2.28.1.linux-amd64.tar.gz
```

This will create a directory called `prometheus-2.28.1.linux-amd64` containing two binary files (`prometheus` and `promtool`), `consoles` and `console_libraries` directories containing the web interface files, a license, a notice, and several example files.

Copy the two binaries to the `/usr/local/bin` directory.

```
sudo cp prometheus-2.28.1.linux-amd64/prometheus /usr/local/bin/
sudo cp prometheus-2.28.1.linux-amd64/promtool /usr/local/bin/
```

Set the user and group ownership on the binaries to the **`prometheus`** user created earlier.

```
sudo chown prometheus:prometheus /usr/local/bin/prometheus
sudo chown prometheus:prometheus /usr/local/bin/promtool
```

Copy the `consoles` and `console_libraries` directories to `/etc/prometheus`.

```
sudo cp -r prometheus-2.28.1.linux-amd64/consoles /etc/prometheus
sudo cp -r prometheus-2.28.1.linux-amd64/console_libraries /etc/prometheus
```

Set the user and group ownership on the directories to the **prometheus** user. Using the `-R` flag will ensure that ownership is set on the files inside the directory as well.

```
sudo chown -R prometheus:prometheus /etc/prometheus/consoles
sudo chown -R prometheus:prometheus /etc/prometheus/console_libraries
```

Lastly, remove the leftover files from your home directory as they are no longer needed.

```
rm -rf prometheus-2.28.1.linux-amd64.tar.gz prometheus-2.28.1.linux-amd64
```

Now that Prometheus is installed, we’ll create its configuration and service files in preparation of its first run.

### Configure Prometheus

In the `/etc/prometheus` directory, use `nano` or your favorite text editor to create a configuration file named `prometheus.yml`. For now, this file will contain just enough information to run Prometheus for the first time.

```bash
sudo nano /etc/prometheus/prometheus.yml
```

{% hint style="danger" %}
**Warning:** Prometheus’ configuration file uses the [YAML format](http://www.yaml.org/start.html), which forbids tabs for indentation and requires consistent indentation with spaces (this guide uses two spaces per level). Prometheus will fail to start if the configuration file is incorrectly formatted.
{% endhint %}

In the `global` settings, define the default interval for scraping metrics. Note that Prometheus will apply these settings to every exporter unless an individual exporter’s own settings override the globals.

{% code title="Prometheus config file part 1 - /etc/prometheus/prometheus.yml" %}

```
global:
  scrape_interval: 15s
```

{% endcode %}

This `scrape_interval` value tells Prometheus to collect metrics from its exporters every 15 seconds, which is long enough for most exporters.

Now, add Prometheus itself to the list of exporters to scrape from with the following `scrape_configs` directive:

Prometheus config file part 2 - /etc/prometheus/prometheus.yml

```
...
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
```

Prometheus uses the `job_name` to label exporters in queries and on graphs, so be sure to pick something descriptive here.

And, as Prometheus exports important data about itself that you can use for monitoring performance and debugging, we’ve overridden the global `scrape_interval` directive from 15 seconds to 5 seconds for more frequent updates.
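The override rule can be sketched in a few lines of shell. This is only an illustration of the precedence described above, not how Prometheus is implemented; `effective_interval` and `scrapes_per_minute` are hypothetical helper names.

```bash
# Sketch of the override rule: a job-level scrape_interval, when present,
# takes precedence over the global one. Both helpers are hypothetical and
# exist only for illustration.
effective_interval() {
  local global="$1" job="${2:-}"
  # Fall back to the global value when the job sets nothing.
  echo "${job:-$global}"
}

# Number of scrapes per minute for an interval given in whole seconds.
scrapes_per_minute() {
  echo $(( 60 / $1 ))
}

effective_interval 15s 5s   # job override applies
effective_interval 15s      # job inherits the global value
scrapes_per_minute 15
scrapes_per_minute 5
```

With the values used in this guide, the `prometheus` job is scraped 12 times per minute, while a job without its own setting would be scraped 4 times per minute.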

Lastly, Prometheus uses the `static_configs` and `targets` directives to determine where exporters are running. Since this particular exporter is running on the same server as Prometheus itself, we can use `localhost` instead of an IP address along with the default port, `9090`.

Your configuration file should now look like this:

{% code title="Prometheus config file - /etc/prometheus/prometheus.yml" %}

```
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
```

{% endcode %}

Save the file and exit your text editor.

Now, set the user and group ownership on the configuration file to the **prometheus** user created earlier.

```
sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml
```
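Because a formatting mistake prevents Prometheus from starting, it can help to check the file before the first run. The sketch below looks for tab characters; `yaml_has_no_tabs` is a hypothetical helper name. The `promtool` binary copied to `/usr/local/bin` earlier provides a fuller check through its `check config` subcommand, shown in the usage comment.

```bash
# Sketch: fail early if a YAML file contains tab characters.
# yaml_has_no_tabs is a hypothetical helper, not a Prometheus tool.
yaml_has_no_tabs() {
  local file="$1" tab
  tab="$(printf '\t')"
  # grep -n prints every line that contains a tab, with its line number.
  if grep -n "$tab" "$file"; then
    echo "tabs found in $file" >&2
    return 1
  fi
}

# Usage on the Monitoring server:
#   yaml_has_no_tabs /etc/prometheus/prometheus.yml \
#     && promtool check config /etc/prometheus/prometheus.yml
```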

With the configuration complete, we’re ready to test Prometheus by running it for the first time.

### Starting Prometheus

Start up Prometheus as the **`prometheus`** user, providing the path to both the configuration file and the data directory.

```
sudo -u prometheus /usr/local/bin/prometheus \
  --config.file /etc/prometheus/prometheus.yml \
  --storage.tsdb.path /var/lib/prometheus/ \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries
```

The output contains information about Prometheus’ loading progress, configuration file, and related services. It also confirms that Prometheus is listening on port `9090`.

```
level=info ts=2017-11-17T18:37:27.474530094Z caller=main.go:215 msg="Starting Prometheus" version="(version=2.0.0, branch=HEAD, revision=0a74f98628a0463dddc90528220c94de5032d1a0)"
level=info ts=2017-11-17T18:37:27.474758404Z caller=main.go:216 build_context="(go=go1.9.2, user=root@615b82cb36b6, date=20171108-07:11:59)"
level=info ts=2017-11-17T18:37:27.474883982Z caller=main.go:217 host_details="(Linux 4.4.0-98-generic #121-Ubuntu SMP Tue Oct 10 14:24:03 UTC 2017 x86_64 prometheus-update (none))"
level=info ts=2017-11-17T18:37:27.483661837Z caller=web.go:380 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2017-11-17T18:37:27.489730138Z caller=main.go:314 msg="Starting TSDB"
level=info ts=2017-11-17T18:37:27.516050288Z caller=targetmanager.go:71 component="target manager" msg="Starting target manager..."
level=info ts=2017-11-17T18:37:27.537629169Z caller=main.go:326 msg="TSDB started"
level=info ts=2017-11-17T18:37:27.537896721Z caller=main.go:394 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2017-11-17T18:37:27.53890004Z caller=main.go:371 msg="Server is ready to receive requests."
```

If you get an error message, double-check that you’ve used YAML syntax in your configuration file and then follow the on-screen instructions to resolve the problem.

Now, halt Prometheus by pressing `CTRL+C`, and then open a new `systemd` service file.

```
sudo nano /etc/systemd/system/prometheus.service
```

The service file tells `systemd` to run Prometheus as the **prometheus** user, using the configuration file at `/etc/prometheus/prometheus.yml` and storing its data in the `/var/lib/prometheus` directory. (The details of `systemd` service files are beyond the scope of this tutorial, but you can learn more at [Understanding Systemd Units and Unit Files](https://www.digitalocean.com/community/tutorials/understanding-systemd-units-and-unit-files#where-are-systemd-unit-files-found).)

Copy the following content into the file:

Prometheus service file - /etc/systemd/system/prometheus.service

```
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries

[Install]
WantedBy=multi-user.target
```

Finally, save the file and close your text editor.

To use the newly created service, reload `systemd`.

```
sudo systemctl daemon-reload
```

You can now start Prometheus using the following command:

```
sudo systemctl start prometheus
```

To make sure Prometheus is running, check the service’s status.

```
sudo systemctl status prometheus
```

The output tells you Prometheus’ status, main process identifier (PID), memory use, and more.

If the service’s status isn’t `active`, follow the on-screen instructions and re-trace the preceding steps to resolve the problem before continuing the tutorial.

```
● prometheus.service - Prometheus
   Loaded: loaded (/etc/systemd/system/prometheus.service; disabled; vendor preset: enabled)
   Active: active (running) since Fri 2017-07-21 11:40:40 UTC; 3s ago
 Main PID: 2104 (prometheus)
    Tasks: 7
   Memory: 13.8M
      CPU: 470ms
   CGroup: /system.slice/prometheus.service
...
```

When you’re ready to move on, press `Q` to quit the `status` command.

Lastly, enable the service to start on boot.

```
sudo systemctl enable prometheus
```

Now that Prometheus is up and running, we can configure it to scrape Node Exporter, which generates metrics about our Node server’s resources.

### Configure Prometheus to Scrape Node Exporter on the Node Server

Because Prometheus only scrapes exporters which are defined in the `scrape_configs` portion of its configuration file, we’ll need to add an entry for Node Exporter, just like we did for Prometheus itself.

{% hint style="info" %}
Before we do that however we need to open the firewall on the **Node** server to allow connections from the Monitoring server.
{% endhint %}

On the **Node** Server:

```
sudo ufw allow from <Your_Monitoring_Server_IP> to any port 9100
```

{% hint style="info" %}
Note: you may also need to open this port within your AWS security groups.
{% endhint %}

Open the configuration file.

```
sudo nano /etc/prometheus/prometheus.yml
```

At the end of the `scrape_configs` block, add a new entry called `node_exporter`.

Prometheus config file part 1 - /etc/prometheus/prometheus.yml

```
...
  - job_name: 'node_exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['<YOUR_NODE_SERVER_IP>:9100']
```

Because Node Exporter is running on the Node server, replace `<YOUR_NODE_SERVER_IP>` with that server’s IP address, followed by Node Exporter’s default port, `9100`.

Your whole configuration file should look like this:

Prometheus config file - /etc/prometheus/prometheus.yml

```
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node_exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['<YOUR_NODE_SERVER_IP>:9100']
```

Save the file and exit your text editor when you’re ready to continue.

Finally, restart Prometheus to put the changes into effect.

```
sudo systemctl restart prometheus
```

Once again, verify that everything is running correctly with the `status` command.

```
sudo systemctl status prometheus
```

If the service’s status isn’t set to `active`, follow the on-screen instructions and re-trace your previous steps before moving on.

```
● prometheus.service - Prometheus
   Loaded: loaded (/etc/systemd/system/prometheus.service; disabled; vendor preset: enabled)
   Active: active (running) since Fri 2017-07-21 11:46:39 UTC; 6s ago
 Main PID: 2219 (prometheus)
    Tasks: 6
   Memory: 19.9M
      CPU: 433ms
   CGroup: /system.slice/prometheus.service
```

We now have Prometheus installed, configured, and running. As a final precaution before connecting to the web interface, we’ll enhance our installation’s security with basic HTTP authentication to ensure that unauthorized users can’t access our metrics.

### Securing Prometheus

By default, Prometheus serves its web interface and API without authentication. (Releases from 2.24 onward can optionally enable basic authentication and TLS through a web configuration file, but this guide does not use that feature.) On the one hand, this means you’re getting a highly flexible system with fewer configuration restraints; on the other hand, it means it’s up to you to ensure that your metrics and overall setup are sufficiently secure.

For simplicity’s sake, we’ll use Nginx to add basic HTTP authentication to our installation, which both Prometheus and its preferred data visualization tool, Grafana, fully support.

Start by installing `apache2-utils`, which will give you access to the `htpasswd` utility for generating password files.

```
sudo apt-get update
sudo apt-get install apache2-utils
```

Now, create a password file by telling `htpasswd` where you want to store the file and which username `<username>` you’d like to use for authentication.

{% hint style="info" %}
**Note:** `htpasswd` will prompt you to enter and re-confirm the password you’d like to associate with this user. Also, make note of both the username and password you enter here, as you’ll need them to log into Prometheus when testing it later in this section.
{% endhint %}

```
sudo htpasswd -c /etc/nginx/.htpasswd <username>
```

The result of this command is a newly-created file called `.htpasswd`, located in the `/etc/nginx` directory, containing the username and a hashed version of the password you entered.

Next, configure Nginx to use the newly-created passwords.

First, make a Prometheus-specific copy of the default Nginx configuration file so that you can revert to the defaults later if you run into a problem.

```
sudo cp /etc/nginx/sites-available/default /etc/nginx/sites-available/prometheus
```

Then, open the new configuration file.

```
sudo nano /etc/nginx/sites-available/prometheus
```

Locate the `location /` block under the `server` block. It should look like this:

/etc/nginx/sites-available/default

```
...
    location / {
        try_files $uri $uri/ =404;
    }
...
```

As we will be forwarding all traffic to Prometheus, replace the `try_files` directive with the following content:

{% code title="/etc/nginx/sites-available/prometheus" %}

```
...
    location / {
        auth_basic "Prometheus server authentication";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass http://localhost:9090;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }
...
```

{% endcode %}

These settings ensure that users will have to authenticate at the start of each new session. Additionally, the reverse proxy will direct all requests handled by this block to Prometheus.
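To make the authentication step concrete, the sketch below builds the header a client sends for HTTP basic authentication. It is an illustration only: `basic_auth_header` is a hypothetical helper, and `alice` / `secret` are placeholder credentials rather than values from this guide.

```bash
# Sketch: build the Authorization header a client sends for HTTP basic
# authentication. basic_auth_header is a hypothetical helper; the
# credentials below are placeholders.
basic_auth_header() {
  local user="$1" pass="$2"
  # The header value is "Basic " followed by base64("user:password").
  # tr removes the line wrapping that base64 adds to long inputs.
  printf 'Authorization: Basic %s\n' \
    "$(printf '%s:%s' "$user" "$pass" | base64 | tr -d '\n')"
}

basic_auth_header alice secret
# prints: Authorization: Basic YWxpY2U6c2VjcmV0
```

Running `curl -u <username>:<password> http://your_server_ip/` makes `curl` send the same kind of header. Base64 is an encoding, not encryption, which is one reason to serve the site over HTTPS once testing is complete.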

When you’re finished making changes, save the file and close your text editor.

Now, deactivate the default Nginx configuration file by removing the link to it in the `/etc/nginx/sites-enabled` directory, and activate the new configuration file by creating a link to it.

```
sudo rm /etc/nginx/sites-enabled/default
sudo ln -s /etc/nginx/sites-available/prometheus /etc/nginx/sites-enabled/
```

Before restarting Nginx, check the configuration for errors using the following command:

```
sudo nginx -t
```

The output should indicate that the `syntax is ok` and the `test is successful`. If you receive an error message, follow the on-screen instructions to fix the problem before proceeding to the next step.

Output of Nginx configuration tests:

```
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
```

Then, reload Nginx to incorporate all of the changes.

```
sudo systemctl reload nginx
```

Verify that Nginx is up and running.

```
sudo systemctl status nginx
```

If your output doesn’t indicate that the service’s status is `active`, follow the on-screen messages and re-trace the preceding steps to resolve the issue before continuing.

{% code title="Output" %}

```
● nginx.service - A high performance web server and a reverse proxy server
   Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: en
   Active: active (running) since Mon 2017-07-31 21:20:57 UTC; 12min ago
  Process: 4302 ExecReload=/usr/sbin/nginx -g daemon on; master_process on; -s r
 Main PID: 3053 (nginx)
    Tasks: 2
   Memory: 3.6M
      CPU: 56ms
   CGroup: /system.slice/nginx.service
```

{% endcode %}

At this point, we have a fully-functional and secured Prometheus server, so we can log into the web interface to begin looking at metrics.

### Testing Prometheus

Prometheus provides a basic web interface for monitoring the status of itself and its exporters, executing queries, and generating graphs. But, due to the interface’s simplicity, the Prometheus team [recommends](https://prometheus.io/docs/visualization/browser/) [installing and using Grafana](https://prometheus.io/docs/visualization/grafana/) for anything more complicated than testing and debugging.

In this tutorial, we’ll use the built-in web interface to ensure that Prometheus and Node Exporter are up and running before moving on to install Blackbox Exporter and Grafana.

To begin, point your web browser to `http://your_server_ip`.

In the HTTP authentication dialogue box, enter the username and password you chose earlier.

![Prometheus Authentication](https://assets.digitalocean.com/articles/install-prometheus-on-ubuntu-16-04/Prometheus-Authentication.png)

Once logged in, you’ll see the **Expression Browser**, where you can execute and visualize custom queries.

![Prometheus Dashboard Welcome](https://assets.digitalocean.com/articles/install-prometheus-on-ubuntu-16-04/Prometheus-Dashboard-Welcome.png)

Before executing any expressions, verify the status of both Prometheus and Node Exporter by clicking first on the **Status** menu at the top of the screen and then on the **Targets** menu option. As we have configured Prometheus to scrape both itself and Node Exporter, you should see both targets listed in the `UP` state.

![](/files/-MeV7M3eIQak4Tnrof-G)

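If either exporter is missing or displays an error message, check the service’s status with the following commands. Run the first on the Monitoring server and the second on the **Node** server, where Node Exporter is installed: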

```
sudo systemctl status prometheus
```

```
sudo systemctl status node_exporter
```

The output for both services should report a status of `Active: active (running)`. If a service either isn’t active at all or is active but still not working correctly, follow the on-screen instructions and re-trace the previous steps before continuing.

## Installing Blackbox

The Blackbox exporter enables blackbox probing of endpoints over HTTP, HTTPS, DNS, TCP and ICMP. We can use it for checking the uptime status of both the Node and Monitoring servers.

### Create a Service User

For security purposes, we’ll create a **blackbox\_exporter** user account. We’ll use this account throughout the tutorial to run Blackbox Exporter and to isolate the ownership on appropriate core files and directories. This ensures Blackbox Exporter can't access and modify data it doesn't own.

Create this user with the `useradd` command, using the `--no-create-home` and `--shell /bin/false` flags so that it can’t be used to log into the server:

```
sudo useradd --no-create-home --shell /bin/false blackbox_exporter
```

With the user in place, let’s download and configure Blackbox Exporter.

### Installing Blackbox Exporter <a href="#step-2-installing-blackbox-exporter" id="step-2-installing-blackbox-exporter"></a>

First, download the latest stable version of Blackbox Exporter to your home directory. You can find the latest binaries along with their checksums on the [Prometheus Download page](https://prometheus.io/download/).

```
cd ~
curl -LO https://github.com/prometheus/blackbox_exporter/releases/download/v0.19.0/blackbox_exporter-0.19.0.linux-amd64.tar.gz
```

Before unpacking the archive, verify the file’s checksums using the following `sha256sum` command:

```
sha256sum blackbox_exporter-0.19.0.linux-amd64.tar.gz
```

Compare the output from this command with the checksum on the [Prometheus download page](https://prometheus.io/download/) to ensure that your file is both genuine and not corrupted:

![](/files/-MeVFauxnSrN2d-0gFvp)

If the checksums don’t match, remove the downloaded file and repeat the preceding steps to re-download the file.

When you’re sure the checksums match, unpack the archive:

```
tar xvf blackbox_exporter-0.19.0.linux-amd64.tar.gz
```

This creates a directory called `blackbox_exporter-0.19.0.linux-amd64`, containing the `blackbox_exporter` binary file, a license, and example files.

Move the binary file to the `/usr/local/bin` directory.

```
sudo mv ./blackbox_exporter-0.19.0.linux-amd64/blackbox_exporter /usr/local/bin
```

Set the user and group ownership on the binary to the **blackbox\_exporter** user, ensuring non-root users can’t modify or replace the file:

```
sudo chown blackbox_exporter:blackbox_exporter /usr/local/bin/blackbox_exporter
```

Lastly, we’ll remove the archive and unpacked directory, as they’re no longer needed.

```
rm -rf ~/blackbox_exporter-0.19.0.linux-amd64.tar.gz ~/blackbox_exporter-0.19.0.linux-amd64
```

Next, let’s configure Blackbox Exporter to probe endpoints over the HTTP protocol and then run it.

### Configuring and Running Blackbox Exporter <a href="#step-3-configuring-and-running-blackbox-exporter" id="step-3-configuring-and-running-blackbox-exporter"></a>

Let’s create a configuration file defining how Blackbox Exporter should check endpoints. We’ll also create a systemd unit file so we can manage Blackbox’s service using `systemd`.

We’ll specify the list of endpoints to probe in the Prometheus configuration in the next step.

First, create the directory for Blackbox Exporter’s configuration. Per Linux conventions, configuration files go in the `/etc` directory, so we’ll use this directory to hold the Blackbox Exporter configuration file as well:

```
sudo mkdir /etc/blackbox_exporter
```

Then set the ownership of this directory to the **blackbox\_exporter** user you created earlier:

```
sudo chown blackbox_exporter:blackbox_exporter /etc/blackbox_exporter
```

In the newly-created directory, create the `blackbox.yml` file which will hold the Blackbox Exporter configuration settings:

```
sudo nano /etc/blackbox_exporter/blackbox.yml
```

We’ll configure Blackbox Exporter to use the default `http` prober to probe endpoints. *Probers* define how Blackbox Exporter checks if an endpoint is running. The `http` prober checks endpoints by sending a HTTP request to the endpoint and testing its response code. You can select which HTTP method to use for probing, as well as which status codes to accept as successful responses. Other popular probers include the `tcp` prober for probing via the TCP protocol, the `icmp` prober for probing via the ICMP protocol and the `dns` prober for checking DNS entries.

For this tutorial, we’ll use the `http` prober to probe our endpoints with the HTTP `GET` method. By default, the prober treats status codes in the `2xx` range as valid, so we don’t need to provide a list of valid status codes.

We’ll configure a timeout of **5** seconds, which means Blackbox Exporter will wait 5 seconds for the response before reporting a failure. Depending on your application type, choose any value that matches your needs.
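The default success rule can be sketched as a simple range check. This only illustrates the rule described above and is not how Blackbox Exporter is implemented; `is_2xx` is a hypothetical helper name.

```bash
# Sketch of the default success rule of the http prober: any 2xx status
# counts as success. is_2xx is a hypothetical helper for illustration only.
is_2xx() {
  local code="$1"
  [ "$code" -ge 200 ] && [ "$code" -lt 300 ]
}

for code in 200 204 401 404 503; do
  if is_2xx "$code"; then
    echo "$code -> probe succeeds"
  else
    echo "$code -> probe fails"
  fi
done
```

Note that an endpoint protected by HTTP basic authentication answers an unauthenticated request with `401`, which this rule treats as a failure. Keep that in mind when choosing which endpoints to probe.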

{% hint style="info" %}
**Note:** Blackbox Exporter’s configuration file uses the [YAML format](http://www.yaml.org/start.html), which forbids tabs for indentation and requires consistent indentation with spaces (this guide uses two spaces per level). If the configuration file is formatted incorrectly, Blackbox Exporter will fail to start up.
{% endhint %}

Add the following configuration to the file:

/etc/blackbox\_exporter/blackbox.yml

```
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      valid_status_codes: []
      method: GET
```

You can find more information about the configuration options in [the Blackbox Exporter’s documentation](https://github.com/prometheus/blackbox_exporter/blob/master/CONFIGURATION.md).

Save the file and exit your text editor.

Before you create the service file, set the user and group ownership on the configuration file to the **blackbox\_exporter** user created earlier.

```
sudo chown blackbox_exporter:blackbox_exporter /etc/blackbox_exporter/blackbox.yml
```

Now create the service file so you can manage Blackbox Exporter using `systemd`:

```
sudo nano /etc/systemd/system/blackbox_exporter.service
```

Add the following content to the file:

/etc/systemd/system/blackbox\_exporter.service

```
[Unit]
Description=Blackbox Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=blackbox_exporter
Group=blackbox_exporter
Type=simple
ExecStart=/usr/local/bin/blackbox_exporter --config.file /etc/blackbox_exporter/blackbox.yml

[Install]
WantedBy=multi-user.target
```

This service file tells `systemd` to run Blackbox Exporter as the **blackbox\_exporter** user with the configuration file located at `/etc/blackbox_exporter/blackbox.yml`. The details of `systemd` service files are beyond the scope of this tutorial, but if you’d like to learn more see the [Understanding Systemd Units and Unit Files](https://www.digitalocean.com/community/tutorials/understanding-systemd-units-and-unit-files#where-are-systemd-unit-files-found) tutorial.

Save the file and exit your text editor.

Finally, reload `systemd` to use your newly-created service file:

```
sudo systemctl daemon-reload
```

Now start Blackbox Exporter:

```
sudo systemctl start blackbox_exporter
```

Make sure it started successfully by checking the service’s status:

```
sudo systemctl status blackbox_exporter
```

The output contains information about Blackbox Exporter’s process, including the main process identifier (PID), memory use, logs and more.

```
● blackbox_exporter.service - Blackbox Exporter
   Loaded: loaded (/etc/systemd/system/blackbox_exporter.service; disabled; vendor preset: enabled)
   Active: active (running) since Thu 2018-04-05 17:48:58 UTC; 5s ago
 Main PID: 5869 (blackbox_export)
    Tasks: 4
   Memory: 968.0K
      CPU: 9ms
   CGroup: /system.slice/blackbox_exporter.service
           └─5869 /usr/local/bin/blackbox_exporter --config.file /etc/blackbox_exporter/blackbox.yml
```

If the service’s status isn’t `active (running)`, follow the on-screen logs and retrace the preceding steps to resolve the problem before continuing the tutorial.

Lastly, enable the service to make sure Blackbox Exporter will start when the server restarts:

```
sudo systemctl enable blackbox_exporter
```

Now that Blackbox Exporter is fully configured and running, we can configure Prometheus to collect metrics about probing requests to our endpoint, so we can create alerts based on those metrics and set up notifications for alerts using Alertmanager.

### Configuring Prometheus To Scrape Blackbox Exporter <a href="#step-4-configuring-prometheus-to-scrape-blackbox-exporter" id="step-4-configuring-prometheus-to-scrape-blackbox-exporter"></a>

As mentioned in the previous section, the list of endpoints to be probed is located in the Prometheus configuration file as part of the Blackbox Exporter’s `targets` directive. In this step you’ll configure Prometheus to use Blackbox Exporter to probe the Nginx web server running on port `80` that you configured earlier, along with Node Exporter on the Node server.

Open the Prometheus configuration file in your editor:

```
sudo nano /etc/prometheus/prometheus.yml
```

At this point, it should look like the following:

/etc/prometheus/prometheus.yml

```
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node_exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['<YOUR_NODE_SERVER_IP>:9100']
```

At the end of the `scrape_configs` directive, add the following entry, which tells Prometheus to probe the endpoint running on the local port `80` and Node Exporter on the Node server using the Blackbox Exporter’s `http_2xx` module, configured in the previous section.

/etc/prometheus/prometheus.yml

```
...
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - http://localhost:80
        - http://<YOUR_NODE_SERVER_IP>:9100
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115
```
By default, Blackbox Exporter runs on port `9115` with metrics available on the `/probe` endpoint.

The `scrape_configs` configuration for Blackbox Exporter differs from the configuration for other exporters. The most notable difference is the `targets` directive, which lists the endpoints being probed instead of the exporter’s address. The exporter’s address is specified using the appropriate set of `__address__` labels.

You’ll find a detailed explanation of the `relabel` directives in the [Prometheus documentation](https://prometheus.io/docs/introduction/overview/).
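
To make the relabeling concrete: each entry in `targets` is copied into the `target` URL parameter, and the scrape itself is sent to the exporter at `localhost:9115`. The sketch below builds the equivalent request URL by hand. The helper name `probe_url` is hypothetical, and the target is not URL-encoded, so it only suits simple targets without `?` or `&`:

```shell
# Hypothetical helper: prints the probe URL for a target and module.
# Assumes Blackbox Exporter listens on localhost:9115, as configured above.
probe_url() {
  target="$1"
  module="${2:-http_2xx}"
  printf 'http://localhost:9115/probe?target=%s&module=%s\n' "$target" "$module"
}

probe_url http://localhost:80
```

Running `curl -s "$(probe_url http://localhost:80)" | grep '^probe_success'` on the monitoring server should then show `probe_success 1` when the probe succeeds and `probe_success 0` when it fails.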

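Your Prometheus configuration file will now look like this. Note that this version also lists `http://<YOUR_NODE_SERVER_IP>:9100` as a second probe target, so Blackbox Exporter checks the Node Exporter endpoint on your node server as well:

/etc/prometheus/prometheus.yml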

```
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node_exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['<YOUR_NODE_SERVER_IP>:9100']
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - http://localhost:80
        - http://<YOUR_NODE_SERVER_IP>:9100
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115
```

Save the file and close your text editor.
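
Before restarting, the edited file can be checked for syntax errors. This is a minimal sketch that assumes `promtool` (shipped in the Prometheus release archive) was copied onto your `PATH` during installation. The wrapper name `check_prometheus_config` is hypothetical:

```shell
# Hypothetical wrapper around `promtool check config`.
# $1 = config path (assumed default: /etc/prometheus/prometheus.yml)
check_prometheus_config() {
  config="${1:-/etc/prometheus/prometheus.yml}"
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found; skipping validation" >&2
    return 2
  fi
  promtool check config "$config"
}
```

Calling `check_prometheus_config` with no arguments validates the default file. A non-zero exit status means the file should be fixed before Prometheus is restarted.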

Restart Prometheus to put the changes into effect:

```
sudo systemctl restart prometheus
```

Make sure it’s running as expected by checking the Prometheus service status:

```
sudo systemctl status prometheus
```

If the service’s status isn’t `active (running)`, follow the on-screen logs and retrace the preceding steps to resolve the problem before continuing the tutorial.
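
Once Prometheus is running, the probe results can also be read back through its HTTP API. The sketch below assumes Prometheus listens on `localhost:9090`, as in the configuration above. The helper name `prom_query` is hypothetical:

```shell
# Hypothetical helper: runs an instant query against the Prometheus HTTP API.
# $1 = PromQL expression, $2 = base URL (assumed default: http://localhost:9090)
prom_query() {
  curl -sf --max-time 10 -G --data-urlencode "query=$1" \
    "${2:-http://localhost:9090}/api/v1/query"
}
```

For example, `prom_query 'probe_success{job="blackbox"}'` returns a JSON document with one result per probed target. It may take one scrape interval after the restart before any results appear.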

At this point, you’ve configured Prometheus to scrape metrics from Blackbox Exporter.

## Installing Grafana

[Grafana](https://grafana.com/grafana) is an open-source data visualization and monitoring tool that we will integrate with [Prometheus](https://prometheus.io/) to provide a graphical representation of the data being pulled from the Node Server.

It will require the following:

* A registered domain name from a domain registrar.
* An **A** record with `your_domain` pointing to your server’s public IP address.
* An **A** record with `www.your_domain` pointing to your server’s public IP address.
* Nginx installed and configured.
* A Let's Encrypt SSL certificate installed with Certbot.
* Port 443 open for the Monitoring server within your AWS security groups.
* You may also need to open port 80 within your AWS security groups temporarily until SSL has been configured and port 443 is available.

### Complete Nginx Configuration

The first step is to install Nginx which we have partially completed prior to installing Prometheus. We will pick up where we left off. These steps are taken from Digital Ocean's Nginx guide [here](https://www.digitalocean.com/community/tutorials/how-to-install-nginx-on-ubuntu-20-04).

We can check with the `systemd` init system to make sure the service is running by typing:

```
systemctl status nginx
```

You should receive output like the following, showing that the service is `active`:

```
● nginx.service - A high performance web server and a reverse proxy server
   Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2020-04-20 16:08:19 UTC; 3 days ago
     Docs: man:nginx(8)
 Main PID: 2369 (nginx)
    Tasks: 2 (limit: 1153)
   Memory: 3.5M
   CGroup: /system.slice/nginx.service
           ├─2369 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
           └─2380 nginx: worker process
```

### Setting Up Server Blocks (Recommended) <a href="#step-5-setting-up-server-blocks-recommended" id="step-5-setting-up-server-blocks-recommended"></a>

When using the Nginx web server, *server blocks* (similar to virtual hosts in Apache) can be used to encapsulate configuration details and host more than one domain from a single server. We will set up a domain called **your\_domain**, but you should **replace this with your own domain name**.

Nginx on Ubuntu 20.04 has one server block enabled by default that is configured to serve documents out of a directory at `/var/www/html`. While this works well for a single site, it can become unwieldy if you are hosting multiple sites. Instead of modifying `/var/www/html`, let’s create a directory structure within `/var/www` for our **your\_domain** site, leaving `/var/www/html` in place as the default directory to be served if a client request doesn’t match any other sites.

Create the directory for **your\_domain** as follows, using the `-p` flag to create any necessary parent directories:

```
sudo mkdir -p /var/www/your_domain/html
```

Next, assign ownership of the directory with the `$USER` environment variable:

```
sudo chown -R $USER:$USER /var/www/your_domain/html
```

The permissions of your web roots should be correct if you haven’t modified your `umask` value, which sets default file permissions. To ensure that your permissions are correct and allow the owner to read, write, and execute the files while granting only read and execute permissions to groups and others, you can input the following command:

```
sudo chmod -R 755 /var/www/your_domain
```
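
Mode `755` reads as three octal digits: `7` (read + write + execute) for the owner, then `5` (read + execute) for the group and for others. The sketch below shows this on a throwaway directory rather than your real web root. It assumes GNU `stat`, as shipped with Ubuntu:

```shell
# Demonstrate mode 755 on a scratch directory (not the real web root).
scratch="$(mktemp -d)"
chmod 755 "$scratch"
mode="$(stat -c '%a' "$scratch")"   # octal permission bits
echo "$mode"
ls -ld "$scratch"                   # the same bits in rwxr-xr-x form
rmdir "$scratch"
```

Running `stat -c '%a' /var/www/your_domain` against the real directory should report the same octal value once the `chmod` above has been applied.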

Next, create a sample `index.html` page using `nano` or your favorite editor:

```
nano /var/www/your_domain/html/index.html
```

Inside, add the following sample HTML:

/var/www/your\_domain/html/index.html

```
<html>
    <head>
        <title>Welcome to your_domain!</title>
    </head>
    <body>
        <h1>Success!  The your_domain server block is working!</h1>
    </body>
</html>
```

Save and close the file by typing `CTRL` and `X` then `Y` and `ENTER` when you are finished.

In order for Nginx to serve this content, it’s necessary to create a server block with the correct directives. Instead of modifying the default configuration file directly, let’s make a new one at `/etc/nginx/sites-available/your_domain`:

```
sudo nano /etc/nginx/sites-available/your_domain
```

Paste in the following configuration block, which is similar to the default, but updated for our new directory and domain name:

/etc/nginx/sites-available/your\_domain

```
server {
        listen 80;
        listen [::]:80;

        root /var/www/your_domain/html;
        index index.html index.htm index.nginx-debian.html;

        server_name your_domain www.your_domain;

        location / {
                try_files $uri $uri/ =404;
        }
}
```

Notice that we’ve updated the `root` configuration to our new directory, and the `server_name` to our domain name.

Next, let’s enable the file by creating a link from it to the `sites-enabled` directory, which Nginx reads from during startup:

```
sudo ln -s /etc/nginx/sites-available/your_domain /etc/nginx/sites-enabled/
```

Two server blocks are now enabled and configured to respond to requests based on their `listen` and `server_name` directives (you can read more about how Nginx processes these directives [here](https://www.digitalocean.com/community/tutorials/understanding-nginx-server-and-location-block-selection-algorithms)):

* `your_domain`: Will respond to requests for `your_domain` and `www.your_domain`.
* `default`: Will respond to any requests on port 80 that do not match the other block.

To avoid a possible hash bucket memory problem that can arise from adding additional server names, it is necessary to adjust a single value in the `/etc/nginx/nginx.conf` file. Open the file:

```
sudo nano /etc/nginx/nginx.conf
```

Find the `server_names_hash_bucket_size` directive and remove the `#` symbol to uncomment the line. If you are using nano, you can quickly search for words in the file by pressing `CTRL` and `W`.

/etc/nginx/nginx.conf

```
...
http {
    ...
    server_names_hash_bucket_size 64;
    ...
}
...
```

Save and close the file when you are finished.

Next, test to make sure that there are no syntax errors in any of your Nginx files:

```
sudo nginx -t
```

If there aren’t any problems, restart Nginx to enable your changes:

```
sudo systemctl restart nginx
```

Nginx should now be serving your domain name. You can test this by navigating to `http://your_domain`, where you should see something like this:

![Nginx first server block](https://assets.digitalocean.com/articles/nginx_server_block_1404/first_block.png)
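
The same check can be made from the command line, which is handy on a server without a browser. This is a minimal sketch using `curl`. The helper name `site_is_up` is hypothetical, and it treats only a final HTTP `200` (after following redirects) as success:

```shell
# Hypothetical helper: succeeds if the URL finally answers with HTTP 200.
site_is_up() {
  url="$1"
  # -L follows redirects; -w prints only the final status code.
  status="$(curl -s -L -o /dev/null --max-time 10 -w '%{http_code}' "$url")"
  [ "$status" = "200" ]
}
```

For example, `site_is_up http://your_domain && echo "server block is answering"` prints the message once DNS and the server block are both working.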

## Install Let's Encrypt SSL Certificates with Certbot

Let's Encrypt is a free SSL service that can be installed on Linux hosts as an easy way to secure websites. The installation steps are taken from Certbot's guide [here](https://certbot.eff.org/lets-encrypt/ubuntufocal-nginx).

Ensure your version of snapd is up to date:

```
sudo snap install core; sudo snap refresh core
```

Install Certbot:

```
sudo snap install --classic certbot
```

![](/files/-MeW9ylFR-i1zr9USnrv)

Run the following command to create a symbolic link so that the `certbot` command can be run from your `PATH`:

```
sudo ln -s /snap/bin/certbot /usr/bin/certbot
```

Run this command to get a certificate and have Certbot edit your Nginx configuration automatically to serve it, turning on HTTPS access in a single step.

```
sudo certbot --nginx
```

Select both domains by entering `1,2`:

![](/files/-Mfdx1uKHZ9cK2T3YeNU)

Choose to redirect HTTP to HTTPS:

![](/files/-MfdxKI4Bly_fdFC0Lmg)

Congrats! Your certificate has been installed.

![](/files/-MfdwnnUllBUj2Rq2jZK)

Optional: Test the certificate's strength using Qualys.

![](/files/-MfdwS5Zdl5NwJgpjVq7)

{% hint style="info" %}
Note: The certificate expires after three months and will normally renew automatically. You can confirm that automatic renewal works with the dry-run command below.
{% endhint %}

You can test automatic renewal for your certificates by running this command:

```
sudo certbot renew --dry-run
```
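
Alongside the dry run, the expiry date of the installed certificate can be checked directly with `openssl`. The helper name `cert_valid_for_days` is hypothetical. The path in the usage note assumes Certbot stored the certificate under a directory named after your domain, which is its usual layout:

```shell
# Hypothetical helper: succeeds if the certificate at $1 is still valid
# $2 days from now (default: 30 days).
cert_valid_for_days() {
  cert="$1"
  days="${2:-30}"
  openssl x509 -checkend "$((days * 86400))" -noout -in "$cert" >/dev/null
}
```

Certbot's certificate directory is normally readable by root only, so the quickest manual check is `sudo openssl x509 -noout -enddate -in /etc/letsencrypt/live/your_domain/fullchain.pem`, which prints the expiry date.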

### Configuring Grafana

Download the Grafana [GPG key](https://www.digitalocean.com/community/tutorials/how-to-use-gpg-to-encrypt-and-sign-messages) with [`wget`](https://www.gnu.org/software/wget/), then [pipe the output](https://www.digitalocean.com/community/tutorials/an-introduction-to-linux-i-o-redirection#pipes) to `apt-key`. This will add the key to your APT installation’s list of trusted keys, which will allow you to download and verify the GPG-signed Grafana package:

```
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
```

In this command, the option `-q` turns off the status update message for `wget`, and `-O -` writes the downloaded file to standard output instead of to disk. These two options ensure that only the contents of the downloaded file are piped to `apt-key`.

Next, add the Grafana repository to your APT sources:

```
sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"
```

Refresh your APT cache to update your package lists:

```
sudo apt update
```

You can now proceed with the installation:

```
sudo apt install grafana
```

Once Grafana is installed, use `systemctl` to start the Grafana server:

```
sudo systemctl start grafana-server
```

Next, verify that Grafana is running by checking the service’s status:

```
sudo systemctl status grafana-server
```

You will receive output similar to this:

```
● grafana-server.service - Grafana instance
   Loaded: loaded (/lib/systemd/system/grafana-server.service; disabled; vendor preset: enabled)
   Active: active (running) since Thu 2020-05-21 08:08:10 UTC; 4s ago
     Docs: http://docs.grafana.org
 Main PID: 15982 (grafana-server)
    Tasks: 7 (limit: 1137)
...
```

This output contains information about Grafana’s process, including its status, Main Process Identifier (PID), and more. `active (running)` shows that the process is running correctly.

Lastly, enable the service to automatically start Grafana on boot:

```
sudo systemctl enable grafana-server
```

You will receive the following output:

```
Synchronizing state of grafana-server.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable grafana-server
Created symlink /etc/systemd/system/multi-user.target.wants/grafana-server.service → /usr/lib/systemd/system/grafana-server.service.
```

This confirms that `systemd` has created the necessary symbolic links to autostart Grafana.

Grafana is now installed and ready for use. Next, you will secure your connection to Grafana with a reverse proxy and SSL certificate.

### Setting Up the Reverse Proxy <a href="#step-2-setting-up-the-reverse-proxy" id="step-2-setting-up-the-reverse-proxy"></a>

Using an SSL certificate will ensure that your data is secure by encrypting the connection to and from Grafana. But, to make use of this connection, you’ll first need to reconfigure Nginx as a reverse proxy for Grafana.

Open the Nginx configuration file you created when you set up the Nginx server block earlier in this guide. You can use any text editor, but for this tutorial we’ll use `nano`:

```
sudo nano /etc/nginx/sites-available/your_domain
```

Locate the following block:

/etc/nginx/sites-available/your\_domain

```
...
    location / {
        try_files $uri $uri/ =404;
    }
...
```

Because you already configured Nginx to communicate over SSL and because all web traffic to your server already passes through Nginx, you just need to tell Nginx to forward all requests to Grafana, which runs on port `3000` by default.

Delete the existing `try_files` line in this `location` block and replace it with the following `proxy_pass` option:

/etc/nginx/sites-available/your\_domain

```
...
    location / {
        proxy_pass http://localhost:3000;
    }
...
```

This will map the proxy to the appropriate port. Once you’re done, save and close the file by pressing `CTRL+X`, `Y`, and then `ENTER` if you’re using `nano`.

Now, test the new settings to make sure everything is configured correctly:

```
sudo nginx -t
```

You will receive the following output:

```
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
```

Finally, activate the changes by reloading Nginx:

```
sudo systemctl reload nginx
```

You can now access the default Grafana login screen by pointing your web browser to `https://your_domain`. If you’re unable to reach Grafana, verify that your firewall is set to allow traffic on port `443` and then retrace the previous instructions.

With the connection to Grafana encrypted, you can now implement additional security measures, starting with changing Grafana’s default administrative credentials.

### Updating Credentials <a href="#step-3-updating-credentials" id="step-3-updating-credentials"></a>

Because every Grafana installation uses the same administrative credentials by default, it is a best practice to change your login information as soon as possible. In this step, you’ll update the credentials to improve security.

Start by navigating to `https://your_domain` from your web browser. This will bring up the default login screen where you’ll see the Grafana logo, a form asking you to enter an **Email or username** and **Password**, a **Log in** button, and a **Forgot your password?** link.

![Grafana Login](https://assets.digitalocean.com/articles/67242/grafana_login.png)

Enter `admin` into both the **Email or username** and **Password** fields and then click on the **Log in** button.

On the next screen, you’ll be asked to make your account more secure by changing the default password:

![Change Password](https://assets.digitalocean.com/articles/67242/change_password.png)

Enter the password you’d like to start using into the **New password** and **Confirm new password** fields.

From here, you can click **Submit** to save the new information or press **Skip** to skip this step. If you skip, you will be prompted to change the password next time you log in.

In order to increase the security of your Grafana setup, click **Submit**. You’ll go to the **Welcome to Grafana** dashboard:

![Home Dashboard](https://assets.digitalocean.com/articles/67242/home_dashboard.png)

You’ve now secured your account by changing the default credentials. Next, you will make changes to your Grafana configuration so that nobody can create a new Grafana account without your permission.

### Disabling Grafana Registrations and Anonymous Access <a href="#step-4-disabling-grafana-registrations-and-anonymous-access" id="step-4-disabling-grafana-registrations-and-anonymous-access"></a>

Grafana provides options that allow visitors to create user accounts for themselves and preview dashboards without registering. When Grafana isn’t accessible via the internet or when it’s working with publicly available data like service statuses, you may want to allow these features. However, when using Grafana online to work with sensitive data, anonymous access could be a security problem. To fix this problem, make some changes to your Grafana configuration.

Start by opening Grafana’s main configuration file for editing:

```
sudo nano /etc/grafana/grafana.ini
```

Locate the following `allow_sign_up` directive under the `[users]` heading:

/etc/grafana/grafana.ini

```
...
[users]
# disable user signup / registration
;allow_sign_up = true
...
```

Enabling this directive with `true` adds a **Sign Up** button to the login screen, allowing users to register themselves and access Grafana.

Disabling this directive with `false` removes the **Sign Up** button and strengthens Grafana’s security and privacy.

Uncomment this directive by removing the `;` at the beginning of the line and then setting the option to `false`:

/etc/grafana/grafana.ini

```
...
[users]
# disable user signup / registration
allow_sign_up = false
...
```

Next, locate the following `enabled` directive under the `[auth.anonymous]` heading:

/etc/grafana/grafana.ini

```
...
[auth.anonymous]
# enable anonymous access
;enabled = false
...
```

Setting `enabled` to `true` gives non-registered users access to your dashboards; setting this option to `false` limits dashboard access to registered users only.

Uncomment this directive by removing the `;` at the beginning of the line and then setting the option to `false`:

/etc/grafana/grafana.ini

```
...
[auth.anonymous]
# enable anonymous access
enabled = false
...
```

Save the file and exit your text editor.
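
Because `enabled` appears under several headings in `grafana.ini`, a plain `grep` cannot tell which section a match belongs to. As a sketch, the hypothetical helper below reads one key from one section and skips commented-out lines. It handles simple `key = value` lines only and is not a full INI parser:

```shell
# Hypothetical helper: print the active value of KEY inside [SECTION] of FILE.
# Usage: ini_get FILE SECTION KEY
ini_get() {
  awk -v section="[$2]" -v key="$3" '
    /^\[/ { in_section = ($0 == section) }      # track the current [section]
    in_section && $0 !~ /^[;#]/ {               # ignore commented-out lines
      split($0, kv, "=")
      k = kv[1]; gsub(/^[ \t]+|[ \t]+$/, "", k)
      if (k == key) {
        v = substr($0, index($0, "=") + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", v)
        print v
      }
    }
  ' "$1"
}
```

After the edits above, `sudo cat /etc/grafana/grafana.ini > /tmp/grafana.ini` followed by `ini_get /tmp/grafana.ini users allow_sign_up` and `ini_get /tmp/grafana.ini auth.anonymous enabled` should each print `false`. Empty output means the line is still commented out.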

To activate the changes, restart Grafana:

```
sudo systemctl restart grafana-server
```

Verify that everything is working by checking Grafana’s service status:

```
sudo systemctl status grafana-server
```

Like before, the output will report that Grafana is `active (running)`.

Now, point your web browser to `https://your_domain`. To return to the login screen, bring your cursor to your avatar in the lower left of the screen and click on the **Sign out** option that appears.

Once you have signed out, verify that there is no **Sign Up** button and that you can’t sign in without entering login credentials.

At this point, Grafana is fully configured and ready for use.

### Add a Dashboard

We will add two dashboards:

1. The default Grafana '[**Node Exporter Full**](https://grafana.com/grafana/dashboards/1860)' dashboard and
2. The Radix team provided '**Radix Node**' dashboard (optional)

Note: There are more dashboards available from the Grafana [website](https://grafana.com/grafana/dashboards)

#### Configuring a Data Source

Before we import a dashboard we need to connect to our Prometheus data source. From the Grafana homepage, click the cog icon, then 'Data Sources', and select 'Prometheus'.

![Add the Prometheus data source](/files/-MeZrV0gQveVzKmCygUK)

Leave the HTTP URL as the default <http://localhost:9090>

![Choose the defaults](/files/-MeZryFlurEcke0g6E8O)

Scroll to the bottom and click 'Save & Test'

![Test the data source](/files/-MeZsYYosmU83cszaFXu)

#### Import Node Exporter Full Dashboard

The next step is to set up your Grafana dashboards.

Each dashboard on the Grafana website has an ID; the '[Node Exporter Full](https://grafana.com/grafana/dashboards/1860)' dashboard has an ID of `1860`. From the main Grafana window, click on the '+' icon, followed by 'Import', then enter the ID `1860` and click 'Load'.

![Import Node Exporter Dashboard](/files/-MeWV6rrGiBMjx3auXLF)

All going well, you should start seeing data populate the dashboard.

{% hint style="info" %}
Note: you may need to adjust the time range in the top right of the window to a few minutes until enough data has been collected.
{% endhint %}

![Node Exporter Full Dashboard](/files/-MeWVNKlInujvXSWDlJs)

#### Radix Node Dashboard (Optional)

![Betanet Radix Node Dashboard](/files/-MeZz0nvXo-QsRBj9LGg)

The Radix Team have provided their own Grafana dashboard which provides individual node and network wide metrics. Because our Node Server runs in a Docker environment there are additional steps we need to perform so that Prometheus can scrape the data within the Docker container. Please head to the link below to configure this dashboard.

{% content-ref url="/pages/-MeZwI\_93D015-sx2Nkg" %}
[Configure Radix Node Dashboard on Grafana](/install-and-configure-the-radix-validator-software/configure-radix-node-dashboard-on-grafana.md)
{% endcontent-ref %}

## Alerting

In this section we will configure alerts for two popular services:

1. Telegram - a free chat app
2. PagerDuty - an enterprise level incident management service (with a free tier)

### Telegram

We will configure Grafana to send alerts to a Telegram chat account.

#### Create the Telegram bot

First, search for 'BotFather' in Telegram.

![Search for BotFather](/files/-MeZOjSeND54ATNRgIOm)

Enter `/start` to see the list of available commands.

```
/start
```

Enter `/newbot` to request a new bot account:

```
/newbot
```

Give your bot a name. It must end with 'bot', e.g. `RadixMonitoring_bot`.

![Enter /start then /newbot](/files/-MeZOXEkjSGUqACsRfDg)

You will receive a response with your API token. Keep this secure and safe.

![Your API token](/files/-MeZQMW6MeLoZeTzBXgS)

You will also need your Chat ID. Search for 'Chat ID Echo' in Telegram.

![Your Chat ID](/files/-MeZPb5kgjWbC29opins)

Enter `/start` and you will receive your Chat ID in return.

```
/start
```

![Chat ID](/files/-MeZPqdCCyFMERfVi9hd)
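
With both values in hand, the token and Chat ID can be tried against the Telegram Bot API directly, independently of Grafana. The sketch below only builds the request URL. The helper name `telegram_api_url` is hypothetical and the two variables hold placeholders that you replace with your own values:

```shell
# Hypothetical helper: builds a Telegram Bot API URL from a token and a method name.
telegram_api_url() {
  printf 'https://api.telegram.org/bot%s/%s\n' "$1" "$2"
}

# Placeholder values; substitute your own API token and Chat ID.
BOT_TOKEN='<YOUR_API_TOKEN>'
CHAT_ID='<YOUR_CHAT_ID>'
telegram_api_url "$BOT_TOKEN" sendMessage
```

A test message can then be sent with `curl -s -d "chat_id=$CHAT_ID" --data-urlencode "text=Test alert" "$(telegram_api_url "$BOT_TOKEN" sendMessage)"`. A response containing `"ok":true` means both values are correct. If the call is rejected, open a chat with your new bot and send it `/start` first, since a bot cannot message a user who has never contacted it.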

Now head back to Grafana. From the homepage, open the Alerting page and select 'Notification channels'. Give the channel a name, e.g. Telegram, change the 'Type' to Telegram, and enter your API token and Chat ID.

![Configure the Telegram alerts](/files/-MeZWYoF5yzdJxkdG1hs)

Click Save and Test. All going well, you should receive an alert.

![](/files/-MeZWmHiOFmXZoRuEEP5)

#### Additional Bot Security

Now that the bot is set up, we will tighten its security. By default, anyone can add the bot to a group, which we don't want.

To disable this, open the chat with BotFather in Telegram and enter:

```
/setjoingroups
```

Select the bot whose setting you want to change. After selecting it, this message appears:

```
Enable - bot can be added to groups.
Disable - block group invitations, the bot can't be added to groups.
Current status is: ENABLED
```

As you can see, the current status of this feature is ENABLED. Choose the option Disable. The following message is displayed:

```
Success! The new status is: DISABLED.
```

![](/files/-MeZNzm7HISt2EwfFmwc)

### Create Grafana Alerts

By default the Grafana alerting functionality is not compatible with query variables. Both the Grafana Node Exporter and Radix Node Dashboard use variables, so we need to create a copy of any panel we would like to receive alerts from and edit its query, replacing each variable with its actual value. In the example below we copy the Validator 'UP/DOWN' status panel from the Radix Node Dashboard and configure the copy with the following settings:

![](/files/-MfccQaQB47M7H-ERyH4)

The important part is to edit the metrics field so that it specifies your node IP:

![](/files/-Mfcclk14ylHvyIRY7By)

If we replace any `instance` variables with `<NODE-SERVER-IP>:443` the Alert tab will appear and we can configure our alert.

![](/files/-MeZaJyHbNkX0Obju5oq)

Click the 'Alert' tab and:

* Give the rule a name
* Set the conditions. For this rule we can use WHEN sum() is below 1
* Configure the error handling as desired
* Send the notifications to Telegram and/or PagerDuty
* Include a message
* Save the Dashboard

![](/files/-MeZcdA3k6Z394H4udRg)

The first 'No Data' alert will be received almost instantaneously.

![No Data alert](/files/-Meuuojvm5epXmaQ6rmz)

The 'Down' alert takes approximately 10 minutes to trigger because Grafana goes through two stages:

1. An Amber stage, which starts 5 minutes after the server goes offline and lasts for 5 minutes.
2. A Red stage, which begins 10 minutes after the server goes offline, triggers the alert and sends the notification.

If we bring the server down and wait for 10 minutes we should see the following alert in Telegram. Bringing the server back up will send a second alert to notify that the server is now 'OK'.

![](/files/-MgzaXG8cqchaEqvjVR6)

Following the same procedure, similar alerts can be created for CPU, RAM, disk, etc., set to trigger when a metric falls below or exceeds a certain threshold, e.g. when CPU or RAM is above 75% utilisation.

![](/files/-Me_-mFIHGudJ0qtOpmP)

### PagerDuty

PagerDuty is a cloud-hosted incident management platform that integrates with many existing alerting and incident management services. It offers a free tier with limited functionality, which we will use to begin with. An additional benefit of PagerDuty is that it offers SMS, email and phone call notifications.

There are two key steps:

1. Sign-up for an account
2. Register a new 'Service' and integrate with Grafana (using their Prometheus integration key). Their Prometheus Integration guide is [here](https://www.pagerduty.com/docs/guides/prometheus-integration-guide/). Note: we only need the Integration key so we can copy that to the Grafana alerting section.

Once you have created a PagerDuty free tier account [here](https://www.pagerduty.com/), create a new 'Service'.

![](/files/-MeZT5UtGg07727R9Kdg)

Select the default 'Escalation Policy'

![](/files/-MeZTInOQsggLHSGkzJ4)

Select the default 'Intelligent' alert grouping

![](/files/-MeZTla6iJiiesDY70Q-)

Search for 'Prometheus' at the integrations step.

![Prometheus Integration](/files/-MeZU3ZyIqhJVXeTawP5)

Copy the 'Integration Key' and save it somewhere safe.

![](/files/-MeZUi8X9scj7O7G6lOT)

Now click on your Profile Avatar in the top right hand corner and select 'My Profile'. In the 'Contact Information' tab add your mobile number and/or an email address to receive SMS, Phone call and email alerts as desired.

Once complete, head back to Grafana and create another 'Notification Channel' for PagerDuty. Add your Integration Key, then save and test.

![Grafana Pager Duty Alerting](/files/-MeZVcN8ElESrlpIyCOW)

You should receive an alert via your chosen channels (SMS/Phone/Email) and your PagerDuty Incident dashboard should also update.

![PagerDuty Incident Dashboard](/files/-MeZW0IHC_ew9JxhPca3)
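
The Integration Key can also be exercised outside Grafana by posting a test event to PagerDuty's Events API v2. The sketch below only builds the JSON body and assumes the key from the Prometheus integration is accepted as the `routing_key`. The helper name `pagerduty_test_payload` is hypothetical, and the arguments must not contain double quotes because no JSON escaping is done:

```shell
# Hypothetical helper: builds a minimal Events API v2 "trigger" payload.
# $1 = integration (routing) key, $2 = summary text, $3 = source name
pagerduty_test_payload() {
  printf '{"routing_key":"%s","event_action":"trigger","payload":{"summary":"%s","source":"%s","severity":"info"}}\n' \
    "$1" "$2" "$3"
}

pagerduty_test_payload '<YOUR_INTEGRATION_KEY>' 'Test alert' 'monitoring-server'
```

To send it, run `curl -s -X POST -H 'Content-Type: application/json' -d "$(pagerduty_test_payload '<YOUR_INTEGRATION_KEY>' 'Test alert' 'monitoring-server')" https://events.pagerduty.com/v2/enqueue`. Be aware that this opens a real incident and notifies whoever is on the escalation policy, so resolve it afterwards.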

That's it! You have completed all the steps for adding monitoring and alerting to your Radix validator node.

### Additional Sources

{% embed url="<https://blog.timescale.com/blog/grafana-101-getting-started-with-alerting-recap-and-resources/>" %}

{% embed url="<https://blog.smuts.me/grafana-telegram-alerts/>" %}

{% embed url="<https://www.lvlup-stakepool.com/monitoring/alert_telegram.html>" %}

