Implement complete monitoring infrastructure following rick-infra principles:

Components:
- VictoriaMetrics: Prometheus-compatible TSDB (7x less RAM usage)
- Grafana: Visualization dashboard with Authentik OAuth/OIDC integration
- node_exporter: System metrics collection (CPU, memory, disk, network)

Architecture:
- All services run as native systemd binaries (no containers)
- localhost-only binding for security
- Grafana uses native OAuth integration with Authentik (not forward_auth)
- Full systemd security hardening enabled
- Proxied via Caddy at metrics.jnss.me with HTTPS

Role Features:
- Unified metrics role (single role for complete stack)
- Automatic role mapping via Authentik groups:
  - authentik Admins OR grafana-admins -> Admin access
  - grafana-editors -> Editor access
  - All others -> Viewer access
- VictoriaMetrics auto-provisioned as default Grafana datasource
- 12-month metrics retention by default
- Comprehensive documentation included

Security:
- OAuth/OIDC SSO via Authentik
- All metrics services bind to 127.0.0.1 only
- systemd hardening (NoNewPrivileges, ProtectSystem, etc.)
- Grafana accessible only via Caddy HTTPS proxy

Documentation:
- roles/metrics/README.md: Complete role documentation
- docs/metrics-deployment-guide.md: Step-by-step deployment guide

Configuration:
- Updated rick-infra.yml to include metrics deployment
- Grafana port set to 3001 (Gitea uses 3000)
- Ready for multi-host expansion (designed for future node_exporter deployment to production hosts)
Metrics Role
Complete monitoring stack for rick-infra providing system metrics collection, storage, and visualization with SSO integration.
Components
VictoriaMetrics
- Purpose: Time-series database for metrics storage
- Type: Native systemd service
- Listen: 127.0.0.1:8428 (localhost only)
- Features:
- Prometheus-compatible API and PromQL
- Up to 7x less RAM usage than Prometheus, according to the VictoriaMetrics project
- Single binary deployment
- 12-month data retention by default
Grafana
- Purpose: Metrics visualization and dashboarding
- Type: Native systemd service
- Listen: 127.0.0.1:3000 (localhost only, proxied via Caddy)
- Domain: metrics.jnss.me
- Features:
- OAuth/OIDC integration with Authentik
- Role-based access control via Authentik groups
- VictoriaMetrics as default data source
node_exporter
- Purpose: System metrics collection
- Type: Native systemd service
- Listen: 127.0.0.1:9100 (localhost only)
- Metrics: CPU, memory, disk, network, systemd units
Architecture
┌─────────────────────────────────────────────────────┐
│ metrics.jnss.me (Grafana Dashboard) │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Caddy (HTTPS) │ │
│ │ ↓ │ │
│ │ Grafana (OAuth → Authentik) │ │
│ │ ↓ │ │
│ │ VictoriaMetrics (Prometheus-compatible) │ │
│ │ ↑ │ │
│ │ node_exporter (System Metrics) │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
Deployment
Prerequisites
- Caddy role deployed - Required for HTTPS proxy
- Authentik deployed - Required for OAuth/SSO
- Vault variables configured:
# In host_vars/arch-vps/vault.yml
vault_grafana_admin_password: "secure-admin-password"
vault_grafana_secret_key: "random-secret-key-32-chars"
vault_grafana_oauth_client_id: "grafana"
vault_grafana_oauth_client_secret: "oauth-client-secret-from-authentik"
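The values shown for the vault variables are placeholders and need to be replaced with real secrets. As a minimal sketch, assuming a Linux host with /dev/urandom and the usual core utilities, a 32-character hex string for vault_grafana_secret_key could be generated as follows. The gen_secret helper is hypothetical and not part of the role.

```shell
# Hypothetical helper: print a random lowercase hex string.
# $1: number of hex characters wanted (defaults to 32).
# Assumes /dev/urandom, head, od, tr, and cut are available.
gen_secret() (
  chars="${1:-32}"
  # Each random byte yields two hex characters; round up for odd lengths.
  bytes=$(( (chars + 1) / 2 ))
  hex=$(head -c "$bytes" /dev/urandom | od -An -tx1 | tr -d ' \n')
  # Trim to the exact requested length.
  printf '%s\n' "$hex" | cut -c "1-$chars"
)
```

Running `gen_secret 32` prints one candidate value, which can then be pasted into the vault file. The admin password and the OAuth client secret are better taken from a password manager and from Authentik respectively.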
Authentik Configuration
Before deployment, create an OAuth2/OIDC provider and a matching application in Authentik:
- Create Provider:
  - Name: Grafana
  - Type: OAuth2/OpenID Provider
  - Client ID: grafana
  - Client Secret: Generate and save to vault
  - Redirect URIs: https://metrics.jnss.me/login/generic_oauth
  - Signing Key: Auto-generated
- Create Application:
  - Name: Grafana
  - Slug: grafana
  - Provider: Select Grafana provider created above
- Create Groups (optional, for role mapping):
  - grafana-admins - Full admin access
  - grafana-editors - Can create/edit dashboards
  - Users without these groups get Viewer access
Deploy
# Deploy complete metrics stack
ansible-playbook rick-infra.yml --tags metrics
# Deploy individual components
ansible-playbook rick-infra.yml --tags victoriametrics
ansible-playbook rick-infra.yml --tags grafana
ansible-playbook rick-infra.yml --tags node_exporter
Verify Deployment
# Check service status
ansible homelab -a "systemctl status victoriametrics grafana node_exporter"
# Check metrics collection
curl http://127.0.0.1:9100/metrics # node_exporter metrics
curl http://127.0.0.1:8428/metrics # VictoriaMetrics metrics
curl http://127.0.0.1:8428/api/v1/targets # Scrape targets
# Access Grafana
curl -I https://metrics.jnss.me/ # Should redirect to Authentik login
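The raw /metrics output is long, so a quick sanity check can be easier to read than the full dump. The sketch below assumes the endpoint returns the standard Prometheus text exposition format (comment lines starting with #, one sample per line). Both helper names are hypothetical.

```shell
# Hypothetical helper: count sample lines in Prometheus text format on stdin.
# Comment lines (# HELP / # TYPE) and blank lines are ignored.
count_samples() {
  grep -c -v -e '^#' -e '^[[:space:]]*$'
}

# Hypothetical helper: succeed if stdin contains at least one sample
# for the exact metric name given as $1 (with or without labels).
has_metric() {
  grep -q -E "^$1(\{| )"
}
```

For example, `curl -s http://127.0.0.1:9100/metrics | count_samples` should print a non-zero number on a healthy host, and `curl -s http://127.0.0.1:9100/metrics | has_metric node_cpu_seconds_total` should exit with status 0.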
Usage
Access Dashboard
- Navigate to https://metrics.jnss.me
- Click "Sign in with Authentik"
- Authenticate via Authentik SSO
- Access granted based on Authentik group membership
Role Mapping
Grafana roles are automatically assigned based on Authentik groups:
- Admin: Members of grafana-admins group
  - Full administrative access
  - Can manage users, data sources, plugins
  - Can create/edit/delete all dashboards
- Editor: Members of grafana-editors group
  - Can create and edit dashboards
  - Cannot manage users or data sources
- Viewer: All other authenticated users
  - Read-only access to dashboards
  - Cannot create or edit dashboards
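The precedence between these roles (Admin first, then Editor, otherwise Viewer) can be illustrated with a small shell function. This is only a sketch of the decision order described above. Grafana itself evaluates the mapping through the JMESPath expression in the role defaults, and the function name here is hypothetical.

```shell
# Hypothetical helper mirroring the group-to-role precedence described above.
# Arguments: the user's Authentik group names, one per argument.
map_grafana_role() (
  role="Viewer"                                  # default for everyone else
  for group in "$@"; do
    case "$group" in
      grafana-admins)  echo "Admin"; exit 0 ;;   # Admin wins over any other group
      grafana-editors) role="Editor" ;;          # remember, but keep looking for admin
    esac
  done
  echo "$role"
)
```

A user in both grafana-editors and grafana-admins ends up as Admin, and a user with no matching group falls through to Viewer.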
Creating Dashboards
Grafana comes with VictoriaMetrics pre-configured as the default data source. Use PromQL queries:
# CPU usage (percent, per instance)
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Memory usage (bytes in use)
node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes
# Disk usage (percent, per filesystem)
100 - ((node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100)
# Network traffic (bytes received per second, per interface)
irate(node_network_receive_bytes_total[5m])
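Queries like these can also be tried from the command line before building a panel. The sketch below assumes VictoriaMetrics exposes the Prometheus-compatible /api/v1/query endpoint on the listen address documented above. VM_URL and vm_query are hypothetical names introduced for illustration.

```shell
# Assumption: VictoriaMetrics listens on the address documented in this README.
VM_URL="${VM_URL:-http://127.0.0.1:8428}"

# Hypothetical helper: run an instant PromQL query and print the raw JSON reply.
# $1: the PromQL expression. --data-urlencode handles braces, quotes, and spaces.
vm_query() {
  curl -sS --get "$VM_URL/api/v1/query" --data-urlencode "query=$1"
}
```

For example, `vm_query 'node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes'` should return a JSON document with one result per scraped instance. This must be run on the host itself, since the service only binds to localhost.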
Import Community Dashboards
- Browse dashboards at https://grafana.com/grafana/dashboards/
- Recommended for node_exporter:
- Dashboard ID: 1860 (Node Exporter Full)
- Dashboard ID: 11074 (Node Exporter for Prometheus)
- Import via Grafana UI: Dashboards → Import → Enter ID
Configuration
Customization
Key configuration options in roles/metrics/defaults/main.yml:
# Data retention
victoriametrics_retention_period: "12" # months
# Scrape interval
victoriametrics_scrape_interval: "15s"
# OAuth role mapping (JMESPath expression)
grafana_oauth_role_attribute_path: "contains(groups, 'grafana-admins') && 'Admin' || contains(groups, 'grafana-editors') && 'Editor' || 'Viewer'"
# Memory limits
victoriametrics_memory_allowed_percent: "60"
Adding Scrape Targets
Edit roles/metrics/templates/scrape.yml.j2:
scrape_configs:
# Add custom application metrics
- job_name: 'myapp'
static_configs:
- targets: ['127.0.0.1:8080']
labels:
service: 'myapp'
Operations
Service Management
# VictoriaMetrics
systemctl status victoriametrics
systemctl restart victoriametrics
journalctl -u victoriametrics -f
# Grafana
systemctl status grafana
systemctl restart grafana
journalctl -u grafana -f
# node_exporter
systemctl status node_exporter
systemctl restart node_exporter
journalctl -u node_exporter -f
Data Locations
/var/lib/victoriametrics/ # Time-series data
/var/lib/grafana/ # Grafana database and dashboards
/var/log/grafana/ # Grafana logs
/etc/victoriametrics/ # VictoriaMetrics config
/etc/grafana/ # Grafana config
Backup
VictoriaMetrics data is stored in /var/lib/victoriametrics:
# Stop service
systemctl stop victoriametrics
# Backup data
tar -czf victoriametrics-backup-$(date +%Y%m%d).tar.gz /var/lib/victoriametrics
# Start service
systemctl start victoriametrics
Grafana dashboards are stored in a SQLite database at /var/lib/grafana/grafana.db:
# Backup Grafana
systemctl stop grafana
tar -czf grafana-backup-$(date +%Y%m%d).tar.gz /var/lib/grafana /etc/grafana
systemctl start grafana
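Both backups follow the same pattern, so the archive step can be wrapped in one function that also checks the result is readable. This is a minimal sketch, not part of the role. The backup_dir name and its arguments are hypothetical, and unlike the commands above it stores paths relative to the parent directory rather than absolute paths.

```shell
# Hypothetical helper: archive a directory under a dated name and verify it.
# $1: directory to back up
# $2: existing destination directory for the archive
# $3: prefix for the archive file name
# Prints the archive path on success.
backup_dir() (
  src="$1"; dest="$2"; prefix="$3"
  archive="$dest/$prefix-backup-$(date +%Y%m%d).tar.gz"
  # -C keeps the archive paths relative to the parent of the source directory.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || exit 1
  # Listing the archive fails if it is truncated or corrupt.
  tar -tzf "$archive" >/dev/null || exit 1
  echo "$archive"
)
```

The service should still be stopped before and started after the call, for example `systemctl stop victoriametrics`, then `backup_dir /var/lib/victoriametrics /path/to/backups victoriametrics`, then `systemctl start victoriametrics`. The /path/to/backups directory is a placeholder.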
Security
Authentication
- Grafana protected by Authentik OAuth/OIDC
- Local admin account available for emergency access
- All services bind to localhost only
Network Security
- VictoriaMetrics: 127.0.0.1:8428 (no external access)
- Grafana: 127.0.0.1:3000 (proxied via Caddy with HTTPS)
- node_exporter: 127.0.0.1:9100 (no external access)
systemd Hardening
All services run with security restrictions:
- NoNewPrivileges=true
- ProtectSystem=strict
- ProtectHome=true
- PrivateTmp=true
- Read-only filesystem (except data directories)
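Whether a deployed unit actually carries these directives can be checked by scanning its unit file. The sketch below is a hypothetical helper that assumes each directive appears on its own line, spelled exactly as listed above. systemd also accepts equivalent spellings such as yes, which this simple check would report as missing.

```shell
# Hypothetical helper: read a unit file on stdin and print every hardening
# directive from the list above that is not present as an exact line.
# Exit status is 0 when all directives are found, 1 otherwise.
missing_hardening() (
  unit=$(cat)
  status=0
  for directive in NoNewPrivileges=true ProtectSystem=strict \
                   ProtectHome=true PrivateTmp=true; do
    # -x matches whole lines only, -F treats the directive as a fixed string.
    if ! printf '%s\n' "$unit" | grep -q -x -F "$directive"; then
      echo "$directive"
      status=1
    fi
  done
  exit "$status"
)
```

For example, `systemctl cat victoriametrics | missing_hardening` prints nothing when all four directives are set. For a broader assessment, `systemd-analyze security victoriametrics` reports an exposure score for the unit.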
Troubleshooting
Grafana OAuth Not Working
- Check Authentik provider configuration:
  # Verify redirect URI matches
  # https://metrics.jnss.me/login/generic_oauth
- Check Grafana logs:
  journalctl -u grafana -f
- Verify OAuth credentials in vault match Authentik
No Metrics in Grafana
- Check VictoriaMetrics scrape targets:
  curl http://127.0.0.1:8428/api/v1/targets
- Check node_exporter is running:
  systemctl status node_exporter
  curl http://127.0.0.1:9100/metrics
- Check VictoriaMetrics logs:
  journalctl -u victoriametrics -f
High Memory Usage
VictoriaMetrics is configured to use at most 60% of available memory. Lower the limit if the host is under memory pressure:
# In roles/metrics/defaults/main.yml
victoriametrics_memory_allowed_percent: "40" # Reduce to 40%