Network Monitoring With Raspberry Pi

I’ve had a LOT of issues, over the last few years, with my Internet service. One of the most annoying things (of the many annoying things) in dealing with Internet service providers (cough Xfinity cough), is convincing them that, yes, there is actually a problem. Recently we had an issue where our internet would go out for up to 10 minutes at a time randomly, often several times a day. Because it’s random, when we got the service person out to look at it, it, of course, worked perfectly while he was here.

So, I set up a network monitoring system using two Raspberry Pis, collectd, prometheus and grafana. I used two Raspberry Pi 4s that I had lying around, one for the data collection (collectd) and publishing (prometheus), and the other for the display (grafana). I probably could have just used one, but I wanted to make sure that I didn’t overburden it with both tasks.

Setup

OS

I installed Raspberry Pi OS on both, because it works well and because I’m familiar with Debian, which we use at work. My general method for setting it up is:

  • Use the Raspberry Pi Imager to install the OS on an SD card, then boot the pi with the SD card
  • I’m lazy, so instead of plugging my keyboard and monitor into the card, I look for a new pi on the network, and SSH in using the default username and password
  • Once logged into the pi, I do the following:
    • Add myself as a user and disable the default user
    • Set the hostname
    • Install Avahi and get it running
    • Restart the pi

From there, the pi should be on my network as <hostname>.local - I set the hostname of one of them, the one that will host collectd and prometheus as “prometheus”, and the other to be “grafana”.

Collectd

Collectd configuration is pretty straightforward. I took the default collectd.conf file in /etc/collectd/collectd.conf, set the Hostname config value to prometheus, uncommented the lines for the syslog plugin (so I can get logs to tell me what’s going on), and configured the ping plugin like so:

<Plugin ping>
        Host "www.google.com"
        Interval 1.0
        Timeout 0.9
        TTL 255
        SourceAddress "192.168.0.163"
        Device "wlan0"
        MaxMissed -1
</Plugin>

This records the ping times and drop rate of pings to Google (the source address and network device will be different on your setup, obviously). Then, I just did sudo systemctl start collectd.

Prometheus

To configure Prometheus, I simply add a new job to the scrape_configs section of the prometheus.yml file:

- job_name: 'collectd'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    scrape_timeout: 5s

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['prometheus.local:9103']

That port, 9103, is the port that collectd publishes its data on.

Grafana

The Prometheus website has a good overview on how to configure Grafana to use prometheus as a data source. Configuring the dashboard is fairly straightforward as well.

Result

Once that was all done, I had my graphs:

Screenshot of usual network latency

Here’s an example of a network drop-out:

Network drop-out

You can see that the ping drop rate hit 1 (100%) for around 10 minutes or so.