# Metrics Role

Complete monitoring stack for rick-infra providing system metrics collection, storage, and visualization with SSO integration.

## Components

### VictoriaMetrics

- **Purpose**: Time-series database for metrics storage
- **Type**: Native systemd service
- **Listen**: `127.0.0.1:8428` (localhost only)
- **Features**:
  - Prometheus-compatible API and PromQL
  - Up to 7x less RAM usage than Prometheus
  - Single binary deployment
  - 12-month data retention by default

### Grafana

- **Purpose**: Metrics visualization and dashboarding
- **Type**: Native systemd service
- **Listen**: `127.0.0.1:3000` (localhost only, proxied via Caddy)
- **Domain**: `metrics.jnss.me`
- **Features**:
  - OAuth/OIDC integration with Authentik
  - Role-based access control via Authentik groups
  - VictoriaMetrics as default data source

### node_exporter

- **Purpose**: System metrics collection
- **Type**: Native systemd service
- **Listen**: `127.0.0.1:9100` (localhost only)
- **Metrics**: CPU, memory, disk, network, systemd units

## Architecture

```
┌─────────────────────────────────────────────────────┐
│  metrics.jnss.me (Grafana Dashboard)                │
│ ┌─────────────────────────────────────────────────┐ │
│ │  Caddy (HTTPS)                                  │ │
│ │    ↓                                            │ │
│ │  Grafana (OAuth → Authentik)                    │ │
│ │    ↓                                            │ │
│ │  VictoriaMetrics (Prometheus-compatible)        │ │
│ │    ↑                                            │ │
│ │  node_exporter (System Metrics)                 │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
```

## Deployment

### Prerequisites

1. **Caddy role deployed** - Required for HTTPS proxy
2. **Authentik deployed** - Required for OAuth/SSO
3. **Vault variables configured**:

   ```yaml
   # In host_vars/arch-vps/vault.yml
   vault_grafana_admin_password: "secure-admin-password"
   vault_grafana_secret_key: "random-secret-key-32-chars"
   vault_grafana_oauth_client_id: "grafana"
   vault_grafana_oauth_client_secret: "oauth-client-secret-from-authentik"
   ```

### Authentik Configuration

Before deployment, create an OAuth2/OIDC provider in Authentik:

1. **Create Provider**:
   - Name: `Grafana`
   - Type: `OAuth2/OpenID Provider`
   - Client ID: `grafana`
   - Client Secret: Generate and save to vault
   - Redirect URIs: `https://metrics.jnss.me/login/generic_oauth`
   - Signing Key: Auto-generated

2. **Create Application**:
   - Name: `Grafana`
   - Slug: `grafana`
   - Provider: Select the Grafana provider created above

3. **Create Groups** (optional, for role mapping):
   - `grafana-admins` - Full admin access
   - `grafana-editors` - Can create/edit dashboards
   - Users without these groups get Viewer access

### Deploy

```bash
# Deploy complete metrics stack
ansible-playbook rick-infra.yml --tags metrics

# Deploy individual components
ansible-playbook rick-infra.yml --tags victoriametrics
ansible-playbook rick-infra.yml --tags grafana
ansible-playbook rick-infra.yml --tags node_exporter
```

### Verify Deployment

```bash
# Check service status
ansible homelab -a "systemctl status victoriametrics grafana node_exporter"

# Check metrics collection
curl http://127.0.0.1:9100/metrics         # node_exporter metrics
curl http://127.0.0.1:8428/metrics         # VictoriaMetrics metrics
curl http://127.0.0.1:8428/api/v1/targets  # Scrape targets

# Access Grafana
curl -I https://metrics.jnss.me/  # Should redirect to Authentik login
```

## Usage

### Access Dashboard

1. Navigate to `https://metrics.jnss.me`
2. Click "Sign in with Authentik"
3. Authenticate via Authentik SSO
4. Access granted based on Authentik group membership

### Role Mapping

Grafana roles are automatically assigned based on Authentik groups:

- **Admin**: Members of `grafana-admins` group
  - Full administrative access
  - Can manage users, data sources, plugins
  - Can create/edit/delete all dashboards

- **Editor**: Members of `grafana-editors` group
  - Can create and edit dashboards
  - Cannot manage users or data sources

- **Viewer**: All other authenticated users
  - Read-only access to dashboards
  - Cannot create or edit dashboards

### Creating Dashboards

Grafana comes with VictoriaMetrics pre-configured as the default data source. Use PromQL queries:

```promql
# CPU usage
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage
node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes

# Disk usage
100 - ((node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100)

# Network traffic
irate(node_network_receive_bytes_total[5m])
```

### Import Community Dashboards

1. Browse dashboards at https://grafana.com/grafana/dashboards/
2. Recommended for node_exporter:
   - Dashboard ID: 1860 (Node Exporter Full)
   - Dashboard ID: 11074 (Node Exporter for Prometheus)
3. Import via Grafana UI: Dashboards → Import → Enter ID

## Configuration

### Customization

Key configuration options in `roles/metrics/defaults/main.yml`:

```yaml
# Data retention
victoriametrics_retention_period: "12"  # months

# Scrape interval
victoriametrics_scrape_interval: "15s"

# OAuth role mapping (JMESPath expression)
grafana_oauth_role_attribute_path: "contains(groups, 'grafana-admins') && 'Admin' || contains(groups, 'grafana-editors') && 'Editor' || 'Viewer'"

# Memory limits
victoriametrics_memory_allowed_percent: "60"
```

### Adding Scrape Targets

Edit `roles/metrics/templates/scrape.yml.j2`:

```yaml
scrape_configs:
  # Add custom application metrics
  - job_name: 'myapp'
    static_configs:
      - targets: ['127.0.0.1:8080']
        labels:
          service: 'myapp'
```

## Operations

### Service Management

```bash
# VictoriaMetrics
systemctl status victoriametrics
systemctl restart victoriametrics
journalctl -u victoriametrics -f

# Grafana
systemctl status grafana
systemctl restart grafana
journalctl -u grafana -f

# node_exporter
systemctl status node_exporter
systemctl restart node_exporter
journalctl -u node_exporter -f
```

### Data Locations

```
/var/lib/victoriametrics/    # Time-series data
/var/lib/grafana/            # Grafana database and dashboards
/var/log/grafana/            # Grafana logs
/etc/victoriametrics/        # VictoriaMetrics config
/etc/grafana/                # Grafana config
```

### Backup

VictoriaMetrics data is stored in `/var/lib/victoriametrics`:

```bash
# Stop service
systemctl stop victoriametrics

# Backup data
tar -czf victoriametrics-backup-$(date +%Y%m%d).tar.gz /var/lib/victoriametrics

# Start service
systemctl start victoriametrics
```

Grafana dashboards are stored in a SQLite database at `/var/lib/grafana/grafana.db`:

```bash
# Backup Grafana
systemctl stop grafana
tar -czf grafana-backup-$(date +%Y%m%d).tar.gz /var/lib/grafana /etc/grafana
systemctl start grafana
```

## Security

### Authentication

- Grafana protected by Authentik OAuth/OIDC
- Local admin account available for emergency access
- All services bind to localhost only

### Network Security

- VictoriaMetrics: `127.0.0.1:8428` (no external access)
- Grafana: `127.0.0.1:3000` (proxied via Caddy with HTTPS)
- node_exporter: `127.0.0.1:9100` (no external access)

### systemd Hardening

All services run with security restrictions:

- `NoNewPrivileges=true`
- `ProtectSystem=strict`
- `ProtectHome=true`
- `PrivateTmp=true`
- Read-only filesystem (except data directories)

## Troubleshooting

### Grafana OAuth Not Working

1. Check Authentik provider configuration:

   ```bash
   # Verify redirect URI matches
   # https://metrics.jnss.me/login/generic_oauth
   ```

2. Check Grafana logs:

   ```bash
   journalctl -u grafana -f
   ```

3. Verify OAuth credentials in vault match Authentik

### No Metrics in Grafana

1. Check VictoriaMetrics scrape targets:

   ```bash
   curl http://127.0.0.1:8428/api/v1/targets
   ```

2. Check node_exporter is running:

   ```bash
   systemctl status node_exporter
   curl http://127.0.0.1:9100/metrics
   ```

3. Check VictoriaMetrics logs:

   ```bash
   journalctl -u victoriametrics -f
   ```

### High Memory Usage

VictoriaMetrics is configured to use at most 60% of available memory. Adjust if needed:

```yaml
# In roles/metrics/defaults/main.yml
victoriametrics_memory_allowed_percent: "40"  # Reduce to 40%
```

## See Also

- [VictoriaMetrics Documentation](https://docs.victoriametrics.com/)
- [Grafana Documentation](https://grafana.com/docs/)
- [node_exporter GitHub](https://github.com/prometheus/node_exporter)
- [PromQL Documentation](https://prometheus.io/docs/prometheus/latest/querying/basics/)
- [Authentik OAuth Integration](https://goauthentik.io/docs/providers/oauth2/)
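
## Appendix: Health Check Sketch

The manual `curl` checks from the Verify Deployment and Troubleshooting sections could be wrapped in a small script to run on the host after a deploy. The following is a minimal sketch, not part of the role: the function names `check_endpoint` and `check_metrics_stack` are hypothetical, the ports and paths are the ones listed in this README, and the Grafana probe assumes Grafana's `/api/health` endpoint is reachable on the local listen address.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical; ports and paths are taken
# from this README, except Grafana's /api/health, which is an assumption.

# Probe one HTTP endpoint and print an OK or FAIL line for it.
check_endpoint() {
  local name="$1" url="$2"
  if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
    echo "OK   $name ($url)"
  else
    echo "FAIL $name ($url)"
    return 1
  fi
}

# Probe every component; return non-zero if any probe fails.
check_metrics_stack() {
  local failed=0
  check_endpoint "node_exporter"   "http://127.0.0.1:9100/metrics"        || failed=1
  check_endpoint "victoriametrics" "http://127.0.0.1:8428/metrics"        || failed=1
  check_endpoint "scrape-targets"  "http://127.0.0.1:8428/api/v1/targets" || failed=1
  check_endpoint "grafana"         "http://127.0.0.1:3000/api/health"     || failed=1
  return "$failed"
}
```

After sourcing the file on the host, running `check_metrics_stack` prints one line per component and exits non-zero if any probe fails, which makes it usable as a post-deploy step. It only confirms that each endpoint answers; it does not check that scrape targets are healthy or that OAuth login works.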