To monitor user-related activity and network bandwidth across your Proxmox cluster, we'll extend the setup with additional exporters and configuration. We'll combine node_exporter, collectd with collectd_exporter, and the Proxmox VE exporter to gather the data. Here's the expanded setup:
Install the Proxmox VE exporter on each node:
wget https://github.com/znerol/prometheus-pve-exporter/releases/download/v2.2.2/prometheus-pve-exporter_2.2.2_all.deb
sudo dpkg -i prometheus-pve-exporter_2.2.2_all.deb
sudo apt-get install -f
Configure the exporter:
sudo nano /etc/default/prometheus-pve-exporter
Add the following content (adjust as needed):
PVE_CLUSTER_NODES="proxmox1,proxmox2,proxmox3"
PVE_USER=root@pam
PVE_PASSWORD=your_root_password
PVE_VERIFY_SSL=false
Start the exporter and enable it at boot:
sudo systemctl start prometheus-pve-exporter
sudo systemctl enable prometheus-pve-exporter
Install collectd:
sudo apt-get install collectd
Edit the collectd configuration:
sudo nano /etc/collectd/collectd.conf
Add or modify these plugins:
LoadPlugin interface
LoadPlugin network
<Plugin interface>
  Interface "vmbr0"
  IgnoreSelected false
</Plugin>
<Plugin network>
  <Server "192.168.1.100" "25826">
  </Server>
</Plugin>
Replace "vmbr0" with your main bridge interface and "192.168.1.100" with your Prometheus server IP.
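To find the interface name to put in the Interface line, the bridge devices on a node can be listed from sysfs. This is a minimal sketch, assuming a Linux host where each bridge exposes a "bridge" subdirectory under /sys/class/net; the function name find_bridges is illustrative and not part of collectd or Proxmox.

```python
import os


def find_bridges(sysfs_root="/sys/class/net"):
    """Return the sorted names of network interfaces that are Linux bridges.

    A bridge device has a "bridge" subdirectory under its sysfs entry.
    Returns an empty list when sysfs_root does not exist (e.g. non-Linux hosts).
    """
    if not os.path.isdir(sysfs_root):
        return []
    return sorted(
        name
        for name in os.listdir(sysfs_root)
        if os.path.isdir(os.path.join(sysfs_root, name, "bridge"))
    )
```

Calling find_bridges() on a Proxmox node should list names such as vmbr0; pick the one that carries your VM traffic.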
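Install collectd_exporter on the Prometheus server:
wget https://github.com/prometheus/collectd_exporter/releases/download/v0.5.0/collectd_exporter-0.5.0.linux-amd64.tar.gz
tar xvfz collectd_exporter-*.tar.gz
sudo mv collectd_exporter-*/collectd_exporter /usr/local/bin/
Create the service account that the unit file below runs as:
sudo useradd --system --no-create-home --shell /usr/sbin/nologin collectd_exporter
Create a systemd service for collectd_exporter:
sudo nano /etc/systemd/system/collectd_exporter.service
Add the following content: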
[Unit]
Description=Collectd Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=collectd_exporter
Group=collectd_exporter
Type=simple
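# Listen for collectd's network plugin on UDP 25826 (the <Server> port set in collectd.conf)
ExecStart=/usr/local/bin/collectd_exporter --collectd.listen-address=:25826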
[Install]
WantedBy=multi-user.target
Start the collectd-exporter:
sudo systemctl daemon-reload
sudo systemctl start collectd_exporter
sudo systemctl enable collectd_exporter
Update the Prometheus configuration:
sudo nano /etc/prometheus/prometheus.yml
Add these job configurations:
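  - job_name: 'proxmox'
    # prometheus-pve-exporter serves the Proxmox metrics on /pve rather than /metrics
    metrics_path: /pve
    static_configs:
      - targets: ['proxmox1:9221', 'proxmox2:9221', 'proxmox3:9221']
  - job_name: 'collectd'
    static_configs:
      - targets: ['localhost:9103']
Restart Prometheus:
sudo systemctl restart prometheus
In Grafana, import these dashboards: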
Create custom dashboards in Grafana for user-specific metrics. Here are some useful Prometheus queries. They use node_exporter metrics, so they assume node_exporter is running on every host you want to see and is scraped by Prometheus; the instance label identifies the host.
CPU usage per instance (in CPU cores):
sum(rate(node_cpu_seconds_total{mode!="idle"}[5m])) by (instance)
Memory usage per instance (percent):
(node_memory_MemTotal_bytes - node_memory_MemFree_bytes - node_memory_Buffers_bytes - node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100
Network bandwidth on vmbr0 (bytes per second):
rate(node_network_receive_bytes_total{device="vmbr0"}[5m])
rate(node_network_transmit_bytes_total{device="vmbr0"}[5m])
Disk I/O (bytes per second):
rate(node_disk_read_bytes_total[5m])
rate(node_disk_written_bytes_total[5m])
This enhanced setup will provide comprehensive monitoring of your Proxmox cluster.
You can further customize the Grafana dashboards to display the most relevant information for your use case. Remember to set up alerting in Grafana to notify you of any issues or threshold breaches.
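When choosing alert thresholds it helps to see what the queries above actually compute. The sketch below reproduces the memory percentage and a simplified per-second rate from raw counter samples. The sample numbers are made up, and simple_rate is a hypothetical helper that ignores the counter resets and window extrapolation that Prometheus's rate() handles.

```python
def memory_used_percent(total, free, buffers, cached):
    """Mirror of the memory query: (total - free - buffers - cached) / total * 100."""
    return (total - free - buffers - cached) / total * 100


def simple_rate(first_value, last_value, elapsed_seconds):
    """Average per-second increase of a counter between two samples.

    Simplification: no counter-reset handling and no extrapolation.
    """
    return (last_value - first_value) / elapsed_seconds


GIB = 1024 ** 3

# Hypothetical host: 16 GiB total, 4 GiB free, 1 GiB buffers, 3 GiB cached.
print(memory_used_percent(16 * GIB, 4 * GIB, 1 * GIB, 3 * GIB))  # 50.0

# Hypothetical counter: 1,500,000,000 bytes received over a 300 s window.
print(simple_rate(2_000_000_000, 3_500_000_000, 300))  # 5000000.0
```

A threshold such as "memory above 90" is therefore a percentage, while a network threshold is in bytes per second and needs multiplying by 8 to compare against a link speed in bits per second.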
This solution uses only open-source tools and provides deep insights into your Proxmox cluster's performance and user activities. You can continue to refine and expand this setup as your monitoring needs evolve.
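Before building dashboards, the queries can be checked directly against the Prometheus HTTP API. The following is a minimal sketch, assuming Prometheus is reachable at a base URL such as http://localhost:9090 (its default port); the function names are illustrative, and only the /api/v1/query endpoint and its JSON response shape come from Prometheus itself.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query_string = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string


def values_by_instance(response):
    """Map each series' instance label to its current value as a float.

    Expects the decoded JSON of an instant query whose result is a vector.
    """
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    return {
        series["metric"].get("instance", ""): float(series["value"][1])
        for series in response["data"]["result"]
    }


def query_prometheus(base_url, promql, timeout=10):
    """Run an instant query and return {instance: value}. Needs network access."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return values_by_instance(json.load(resp))
```

For example, query_prometheus("http://localhost:9090", 'sum(rate(node_cpu_seconds_total{mode!="idle"}[5m])) by (instance)') should return one entry per scraped host; an empty result usually means the exporter is not being scraped yet.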
sum(rate(node_cpu_seconds_total{mode!="idle"}[5m])) by (instance)
(node_memory_MemTotal_bytes - node_memory_MemFree_bytes - node_memory_Buffers_bytes - node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100
rate(node_network_receive_bytes_total{device="vmbr0"}[5m])
rate(node_network_transmit_bytes_total{device="vmbr0"}[5m])
rate(node_disk_read_bytes_total[5m])
rate(node_disk_written_bytes_total[5m])
create grafana dashboard
I'll guide you through creating a Grafana dashboard that includes these metrics. We'll create a dashboard with panels for CPU usage, memory usage, network bandwidth, and disk I/O.
Panel 1: CPU Usage per Instance
In the query editor (PromQL), paste:
sum(rate(node_cpu_seconds_total{mode!="idle"}[5m])) by (instance)
Panel 2: Memory Usage per Instance
In the query editor, paste:
(node_memory_MemTotal_bytes - node_memory_MemFree_bytes - node_memory_Buffers_bytes - node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100
Panel 3: Network Bandwidth
In the query editor, add two queries:
Query A:
rate(node_network_receive_bytes_total{device="vmbr0"}[5m])
Query B:
rate(node_network_transmit_bytes_total{device="vmbr0"}[5m])
Panel 4: Disk I/O
In the query editor, add two queries:
Query A:
rate(node_disk_read_bytes_total[5m])
Query B:
rate(node_disk_written_bytes_total[5m])
At the top of the dashboard, click on the settings icon (gear) and then "Variables" to add some variables that will make your dashboard more interactive:
Add a new variable named node, of type Query, with this query:
label_values(node_cpu_seconds_total, instance)
Add {instance=~"$node"} to each of your queries to make them filterable by node.
Here's an example of how your queries might look after adding the variable:
sum(rate(node_cpu_seconds_total{mode!="idle",instance=~"$node"}[5m])) by (instance)
(node_memory_MemTotal_bytes{instance=~"$node"} - node_memory_MemFree_bytes{instance=~"$node"} - node_memory_Buffers_bytes{instance=~"$node"} - node_memory_Cached_bytes{instance=~"$node"}) / node_memory_MemTotal_bytes{instance=~"$node"} * 100
rate(node_network_receive_bytes_total{device="vmbr0",instance=~"$node"}[5m])
rate(node_network_transmit_bytes_total{device="vmbr0",instance=~"$node"}[5m])
rate(node_disk_read_bytes_total{instance=~"$node"}[5m])
rate(node_disk_written_bytes_total{instance=~"$node"}[5m])
This dashboard will give you a good overview of CPU usage, memory usage, network bandwidth, and disk I/O across your Proxmox nodes. You can further customize it by adding more panels, adjusting the time range, or setting up alerts based on these metrics.
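The four panels and the node variable can also be generated as dashboard JSON and loaded through Grafana's import dialog instead of being clicked together by hand. This is a rough sketch only: the field names follow Grafana's dashboard JSON model as I understand it, the "timeseries" panel type and the dashboard title are assumptions, and no datasource is set, so Grafana would fall back to its default one.

```python
import json

# Panel titles and queries copied from the walkthrough, with the $node filter applied.
PANELS = [
    ("CPU Usage per Instance", [
        'sum(rate(node_cpu_seconds_total{mode!="idle",instance=~"$node"}[5m])) by (instance)',
    ]),
    ("Memory Usage per Instance", [
        '(node_memory_MemTotal_bytes{instance=~"$node"}'
        ' - node_memory_MemFree_bytes{instance=~"$node"}'
        ' - node_memory_Buffers_bytes{instance=~"$node"}'
        ' - node_memory_Cached_bytes{instance=~"$node"})'
        ' / node_memory_MemTotal_bytes{instance=~"$node"} * 100',
    ]),
    ("Network Bandwidth", [
        'rate(node_network_receive_bytes_total{device="vmbr0",instance=~"$node"}[5m])',
        'rate(node_network_transmit_bytes_total{device="vmbr0",instance=~"$node"}[5m])',
    ]),
    ("Disk I/O", [
        'rate(node_disk_read_bytes_total{instance=~"$node"}[5m])',
        'rate(node_disk_written_bytes_total{instance=~"$node"}[5m])',
    ]),
]


def build_dashboard(title="Proxmox Nodes"):
    """Return a dashboard dict with one panel per entry in PANELS."""
    panels = []
    for index, (panel_title, queries) in enumerate(PANELS):
        panels.append({
            "id": index + 1,
            "type": "timeseries",
            "title": panel_title,
            # Two panels per row on Grafana's 24-column grid.
            "gridPos": {"h": 8, "w": 12, "x": 12 * (index % 2), "y": 8 * (index // 2)},
            # Queries are labelled A, B, ... in order.
            "targets": [
                {"refId": chr(ord("A") + i), "expr": expr}
                for i, expr in enumerate(queries)
            ],
        })
    return {
        "title": title,
        "panels": panels,
        "templating": {"list": [{
            "name": "node",
            "type": "query",
            "query": "label_values(node_cpu_seconds_total, instance)",
            "multi": True,
            "includeAll": True,
        }]},
    }
```

Printing json.dumps(build_dashboard(), indent=2) gives text that can be pasted into the import dialog; if your Grafana version rejects a field, compare it against a dashboard exported from that same version.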