Implement complete monitoring infrastructure following rick-infra principles:

Components:
- VictoriaMetrics: Prometheus-compatible TSDB (7x less RAM usage)
- Grafana: Visualization dashboard with Authentik OAuth/OIDC integration
- node_exporter: System metrics collection (CPU, memory, disk, network)

Architecture:
- All services run as native systemd binaries (no containers)
- localhost-only binding for security
- Grafana uses native OAuth integration with Authentik (not forward_auth)
- Full systemd security hardening enabled
- Proxied via Caddy at metrics.jnss.me with HTTPS

Role Features:
- Unified metrics role (single role for complete stack)
- Automatic role mapping via Authentik groups:
  - authentik Admins OR grafana-admins -> Admin access
  - grafana-editors -> Editor access
  - All others -> Viewer access
- VictoriaMetrics auto-provisioned as default Grafana datasource
- 12-month metrics retention by default
- Comprehensive documentation included

Security:
- OAuth/OIDC SSO via Authentik
- All metrics services bind to 127.0.0.1 only
- systemd hardening (NoNewPrivileges, ProtectSystem, etc.)
- Grafana accessible only via Caddy HTTPS proxy

Documentation:
- roles/metrics/README.md: Complete role documentation
- docs/metrics-deployment-guide.md: Step-by-step deployment guide

Configuration:
- Updated rick-infra.yml to include metrics deployment
- Grafana port set to 3001 (Gitea uses 3000)
- Ready for multi-host expansion (designed for future node_exporter deployment to production hosts)
Metrics Stack Deployment Guide
Complete guide to deploying the monitoring stack (VictoriaMetrics, Grafana, node_exporter) on rick-infra.
Overview
The metrics stack provides:
- System monitoring: CPU, memory, disk, network via node_exporter
- Time-series storage: VictoriaMetrics (Prometheus-compatible; its developers report up to 7x less RAM usage than Prometheus)
- Visualization: Grafana with Authentik SSO integration
- Access: `https://metrics.jnss.me` with role-based permissions
Architecture
User → metrics.jnss.me (HTTPS)
↓
Caddy (Reverse Proxy)
↓
Grafana (OAuth → Authentik for SSO)
↓
VictoriaMetrics (Time-series DB)
↑
node_exporter (System Metrics)
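All metrics services (VictoriaMetrics, Grafana, node_exporter) bind to 127.0.0.1 only, following rick-infra security principles. Caddy is the only component that listens publicly.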
Prerequisites
1. Caddy Deployed
ansible-playbook rick-infra.yml --tags caddy
2. Authentik Deployed
ansible-playbook rick-infra.yml --tags authentik
3. DNS Configuration
Ensure metrics.jnss.me points to arch-vps IP:
dig +short metrics.jnss.me # Should return 69.62.119.31
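To repeat this check from a script, you can wrap it in a function. The sketch below assumes `dig` is installed (from bind-tools or dnsutils). `check_dns` is a hypothetical helper name. The expected IP is the one given above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the role): succeed only if NAME has an
# A record equal to EXPECTED_IP. Assumes `dig` is installed.
check_dns() {
  local name="$1" expected_ip="$2" resolved
  # +short prints only answer records, one per line (CNAMEs, then addresses)
  resolved="$(dig +short "$name" A)"
  if grep -qxF "$expected_ip" <<<"$resolved"; then
    echo "OK: $name -> $expected_ip"
  else
    echo "FAIL: $name resolves to '${resolved:-nothing}', expected $expected_ip" >&2
    return 1
  fi
}
```

Usage: `check_dns metrics.jnss.me 69.62.119.31`. The function matches whole lines, so a CNAME chain that ends at the right address still passes.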
Step 1: Configure Authentik OAuth Provider
Create OAuth2/OIDC Provider
- Log in to Authentik at `https://auth.jnss.me`
- Navigate to Applications → Providers → Create
- Configure provider:
  - Name: `Grafana`
  - Type: `OAuth2/OpenID Provider`
  - Authentication flow: `default-authentication-flow`
  - Authorization flow: `default-provider-authorization-explicit-consent`
  - Client type: `Confidential`
  - Client ID: `grafana`
  - Client Secret: Click Generate and copy the secret
  - Redirect URIs: `https://metrics.jnss.me/login/generic_oauth`
  - Signing Key: Select auto-generated key
  - Scopes: `openid`, `profile`, `email`, `groups`
- Click Finish
Create Application
- Navigate to Applications → Create
- Configure application:
  - Name: `Grafana`
  - Slug: `grafana`
  - Provider: Select the `Grafana` provider created above
  - Launch URL: `https://metrics.jnss.me`
- Click Create
Create Groups (Optional)
For role-based access control:
- Navigate to Directory → Groups → Create
- Create groups:
  - `grafana-admins`: Full admin access to Grafana
  - `grafana-editors`: Can create/edit dashboards
  - All other users get Viewer access
- Add users to groups as needed
Step 2: Configure Vault Variables
Edit vault file:
ansible-vault edit host_vars/arch-vps/vault.yml
Add these variables:
# Grafana admin password (for emergency local login)
vault_grafana_admin_password: "your-secure-admin-password"
# Grafana secret key (generate with: openssl rand -base64 32; the output is 44 characters)
vault_grafana_secret_key: "your-random-secret-key"
# OAuth credentials from Authentik
vault_grafana_oauth_client_id: "grafana"
vault_grafana_oauth_client_secret: "paste-secret-from-authentik-here"
Save and close (:wq in vim).
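You can generate both random values before opening the vault. The sketch below assumes `openssl` is installed and falls back to `/dev/urandom` plus coreutils `base64` if it is not. `gen_secret` is a hypothetical helper, not part of the role. Encoding 32 random bytes as base64 gives a 44-character string.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print N random bytes (default 32) as one base64 line.
gen_secret() {
  local nbytes="${1:-32}"
  if command -v openssl >/dev/null 2>&1; then
    # openssl wraps long base64 output, so strip any newlines it adds
    openssl rand -base64 "$nbytes" | tr -d '\n'
  else
    # Fallback: raw bytes from the kernel CSPRNG, base64-encoded
    head -c "$nbytes" /dev/urandom | base64 | tr -d '\n'
  fi
  echo
}
```

Run `gen_secret` twice. Paste one value into `vault_grafana_admin_password` and the other into `vault_grafana_secret_key`. Do not reuse the same value for both.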
Step 3: Deploy Metrics Stack
Deploy all components:
ansible-playbook rick-infra.yml --tags metrics
This will:
- Install and configure VictoriaMetrics
- Install and configure node_exporter
- Install and configure Grafana with OAuth
- Deploy Caddy configuration for `metrics.jnss.me`
Expected output (task counts may differ, and `changed` drops on re-runs):
PLAY RECAP *******************************************************
arch-vps : ok=25 changed=15 unreachable=0 failed=0 skipped=0
Step 4: Verify Deployment
Check Services
SSH to arch-vps and verify services:
# Check all services are running
systemctl status victoriametrics grafana node_exporter
# Check service health
curl http://127.0.0.1:8428/health # VictoriaMetrics
curl http://127.0.0.1:9100/metrics # node_exporter
curl http://127.0.0.1:3001/api/health # Grafana (port 3001; Gitea already uses 3000)
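You can wrap the same checks in a small script that reports every endpoint instead of stopping at the first failure. This is a sketch:

- The endpoints are copied from the commands above.
- Grafana's port defaults to 3001, the value the metrics commit sets, and can be overridden through a hypothetical `GRAFANA_PORT` variable.
- `check_endpoint` and `check_all` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Endpoints copied from the manual checks. Grafana's port is an assumption
# (3001 per the metrics commit); override GRAFANA_PORT if yours differs.
GRAFANA_PORT="${GRAFANA_PORT:-3001}"
ENDPOINTS=(
  "VictoriaMetrics|http://127.0.0.1:8428/health"
  "node_exporter|http://127.0.0.1:9100/metrics"
  "Grafana|http://127.0.0.1:${GRAFANA_PORT}/api/health"
)

# Hypothetical helper: fail if the request errors or returns HTTP >= 400.
check_endpoint() {
  # -f: HTTP errors become a non-zero exit; -sS: quiet but still show errors;
  # --max-time keeps a hung service from blocking the whole run
  curl -fsS --max-time 5 -o /dev/null "$1"
}

# Check every endpoint; return non-zero if any of them failed.
check_all() {
  local entry name url failed=0
  for entry in "${ENDPOINTS[@]}"; do
    name="${entry%%|*}"   # text before the first "|"
    url="${entry#*|}"     # text after the first "|"
    if check_endpoint "$url"; then
      echo "OK    $name ($url)"
    else
      echo "FAIL  $name ($url)"
      failed=1
    fi
  done
  return "$failed"
}
```

Run `check_all` on arch-vps. It prints one OK or FAIL line per service, and its exit status tells you whether every endpoint responded.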
Check HTTPS Access
curl -I https://metrics.jnss.me
# Should return 200 or 302 (redirect to the Grafana login page)
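To make this check scriptable, compare only the status code. The sketch below assumes `curl` is available on the machine you run it from. `check_public_url` is a hypothetical helper, and it treats 200 and 302 as healthy, as stated above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if URL answers 200 or 302 (redirects are not followed).
check_public_url() {
  local url="${1:-https://metrics.jnss.me}" code
  # -o /dev/null discards the body; -w prints only the status code.
  # curl prints 000 when no HTTP response arrives (DNS, TLS or connection failure).
  code="$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$url" || true)"
  case "$code" in
    200|302) echo "OK: $url returned $code" ;;
    *) echo "FAIL: $url returned ${code:-no response}" >&2; return 1 ;;
  esac
}
```

Usage: `check_public_url https://metrics.jnss.me`. A `000` in the failure message points at DNS, TLS or Caddy rather than at Grafana itself.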
Check Metrics Collection
# Check VictoriaMetrics scrape targets
curl http://127.0.0.1:8428/api/v1/targets
# Should list the node_exporter target with "health": "up"
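The targets endpoint returns Prometheus-compatible JSON. Each entry under `data.activeTargets` has a `health` field (`up`, `down` or `unknown`). The sketch below counts targets by health without extra dependencies. It assumes that JSON layout. `count_target_health` is a hypothetical helper, and `jq` is the more robust choice if you have it installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read /api/v1/targets JSON on stdin, print "up=N down=M",
# and return non-zero if any target is down or no target is up.
count_target_health() {
  local json up down
  # Strip whitespace so "health": "up" and "health":"up" match alike
  json="$(tr -d '[:space:]')"
  up="$(grep -o '"health":"up"' <<<"$json" | wc -l | tr -d ' ')"
  down="$(grep -o '"health":"down"' <<<"$json" | wc -l | tr -d ' ')"
  echo "up=$up down=$down"
  [ "$down" -eq 0 ] && [ "$up" -gt 0 ]
}
```

Usage: `curl -s http://127.0.0.1:8428/api/v1/targets | count_target_health`. Targets reported as `unknown`, which usually means they have not been scraped yet, are not counted.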
Step 5: Access Grafana
- Navigate to `https://metrics.jnss.me`
- Click "Sign in with Authentik"
- Log in with your Authentik credentials
- You should be redirected to the Grafana dashboard
First login will:
- Auto-create your Grafana user
- Assign role based on Authentik group membership
- Grant access to default organization
Step 6: Verify Data Source
- In Grafana, navigate to Connections → Data sources
- Verify VictoriaMetrics is listed and default
- Click on VictoriaMetrics → Save & test
- Should show green "Data source is working" message
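You can run the same check from the shell through Grafana's HTTP API. The sketch rests on these assumptions:

- The emergency local admin account from Step 2 has the user name `admin`, and its password is passed in a hypothetical `GRAFANA_ADMIN_PASSWORD` variable.
- Grafana listens on 127.0.0.1:3001.
- The datasource is named `VictoriaMetrics`.
- `check_default_datasource` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm via Grafana's HTTP API that datasource NAME
# exists and is the default. Assumes local admin user "admin" with the
# password in GRAFANA_ADMIN_PASSWORD, and Grafana on 127.0.0.1:3001.
check_default_datasource() {
  local name="${1:-VictoriaMetrics}" json
  if [ -z "${GRAFANA_ADMIN_PASSWORD:-}" ]; then
    echo "set GRAFANA_ADMIN_PASSWORD first" >&2
    return 2
  fi
  json="$(curl -fsS --max-time 5 -u "admin:${GRAFANA_ADMIN_PASSWORD}" \
    "http://127.0.0.1:${GRAFANA_PORT:-3001}/api/datasources/name/${name}")" || return 1
  # Strip whitespace so the match does not depend on JSON formatting
  if tr -d '[:space:]' <<<"$json" | grep -q '"isDefault":true'; then
    echo "OK: datasource '$name' exists and is the default"
  else
    echo "FAIL: datasource '$name' exists but is not the default" >&2
    return 1
  fi
}
```

The password ends up on curl's command line, where other local users can see it in the process list. Run this only on a host you trust, or use the UI check above.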
Step 7: Create First Dashboard
Option 1: Import Community Dashboard (Recommended)
- Navigate to Dashboards → Import
- Enter dashboard ID: `1860` (Node Exporter Full)
- Click Load
- Select VictoriaMetrics as data source
- Click Import
You now have a comprehensive system monitoring dashboard!
Option 2: Create Custom Dashboard
- Navigate to Dashboards → New → New Dashboard
- Click Add visualization
- Select VictoriaMetrics data source
- Enter a PromQL query, e.g. CPU usage: `100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)`
- Click Apply
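The CPU query works in four steps:

1. It takes the per-second rate of idle CPU time for each core.
2. It averages those rates per instance, which gives a fraction between 0 and 1.
3. It scales the result to a percentage.
4. It subtracts that from 100, leaving the busy percentage.

Before building the panel, you can run the expression against VictoriaMetrics' Prometheus-compatible query API to confirm it returns one series per instance. The sketch below assumes VictoriaMetrics listens on 127.0.0.1:8428 as above. `vm_query` and `CPU_USAGE_QUERY` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: evaluate an expression against the local
# VictoriaMetrics /api/v1/query endpoint and print the raw JSON response.
vm_query() {
  # -G sends the --data-urlencode payload as a URL-encoded GET query string
  curl -fsS -G --max-time 10 "http://127.0.0.1:8428/api/v1/query" \
    --data-urlencode "query=$1"
}

# The CPU usage expression from the panel step (busy percentage per instance)
CPU_USAGE_QUERY='100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'
```

Usage: `vm_query "$CPU_USAGE_QUERY"`. A response with an empty `result` list means no matching series exist yet, which usually points at scraping rather than at the query.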
Step 8: Configure Alerting (Optional)
Grafana supports alerting on metrics. Configure via Alerting → Alert rules.
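Example alert for high CPU. It fires when average idle time drops below 20%, i.e. CPU usage is above 80%. It uses `rate` rather than `irate`, because the Prometheus documentation recommends `rate` for alerts:
avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100 < 20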
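A comparison such as `... < 20` drops every series that does not satisfy it. An empty query result therefore means the alert condition currently holds nowhere. The sketch below checks that from a query response. It assumes the Prometheus-compatible JSON response format. `alert_would_fire` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an /api/v1/query JSON response on stdin and
# succeed if the query succeeded and returned at least one series.
alert_would_fire() {
  local json
  # Strip whitespace so the match does not depend on JSON formatting
  json="$(tr -d '[:space:]')"
  [[ "$json" == *'"status":"success"'* && "$json" != *'"result":[]'* ]]
}
```

Usage, evaluating the example rule once by hand:

`curl -fsS -G http://127.0.0.1:8428/api/v1/query --data-urlencode 'query=avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100 < 20' | alert_would_fire && echo "would fire"`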
Troubleshooting
OAuth Login Fails
Symptom: Redirect to Authentik, but returns error after login
Solution:
- Verify the redirect URI in Authentik matches exactly: `https://metrics.jnss.me/login/generic_oauth`
- Check Grafana logs: `journalctl -u grafana -f`
- Verify the OAuth credentials in the vault match Authentik
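Authentik publishes an OpenID discovery document for each application at `/application/o/<slug>/.well-known/openid-configuration`. Fetching it is a quick way to confirm that the application slug and provider from Step 1 are wired up. The sketch below makes these assumptions:

- The slug is `grafana`.
- The provider uses Authentik's default per-provider issuer mode. With a global issuer the check would report a mismatch even though login works.

`check_oidc_discovery` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch Authentik's OIDC discovery document for an
# application slug and check that the issuer is derived from that slug.
# Assumes the provider uses per-provider (slug-based) issuers.
check_oidc_discovery() {
  local base="${1:-https://auth.jnss.me}" slug="${2:-grafana}" json
  json="$(curl -fsS --max-time 10 \
    "${base}/application/o/${slug}/.well-known/openid-configuration")" || {
    echo "FAIL: no discovery document for slug '${slug}'" >&2
    return 1
  }
  # Strip whitespace, then look for the expected issuer prefix literally
  if tr -d '[:space:]' <<<"$json" | grep -qF "\"issuer\":\"${base}/application/o/${slug}/"; then
    echo "OK: issuer matches slug '${slug}'"
  else
    echo "FAIL: unexpected issuer in discovery document" >&2
    return 1
  fi
}
```

A failure here points at the Authentik application or provider. A success combined with a login error points back at the redirect URI or the client credentials.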
No Metrics in Grafana
Symptom: Data source working, but no data in dashboards
Solution:
- Check VictoriaMetrics targets: `curl http://127.0.0.1:8428/api/v1/targets`
- Verify node_exporter is up: `systemctl status node_exporter`
- Check the time range in Grafana (top right); try "Last 5 minutes"
Can't Access metrics.jnss.me
Symptom: Connection timeout or SSL error
Solution:
- Verify DNS: `dig metrics.jnss.me`
- Check Caddy is running: `systemctl status caddy`
- Check Caddy logs: `journalctl -u caddy -f`
- Verify the Caddy site config was deployed: `ls /etc/caddy/sites/grafana.caddy`
Wrong Grafana Role
Symptom: User has wrong permissions (e.g., Viewer instead of Admin)
Solution:
- Verify the user is in the correct Authentik group (`grafana-admins` or `grafana-editors`)
- Log out of Grafana and log in again
- Check the role mapping expression in `roles/metrics/defaults/main.yml`:
  `grafana_oauth_role_attribute_path: "contains(groups, 'grafana-admins') && 'Admin' || contains(groups, 'grafana-editors') && 'Editor' || 'Viewer'"`
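To reason about which role a user ends up with, it helps to restate the JMESPath expression as plain branching logic. The Bash sketch below mirrors the expression shown above. It is an illustration only, since Grafana evaluates the real JMESPath. `grafana_role_for_groups` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the JMESPath role mapping, for illustration:
# the first matching rule wins, so grafana-admins beats grafana-editors.
grafana_role_for_groups() {
  local g is_admin=0 is_editor=0
  for g in "$@"; do
    # JMESPath contains() on a list compares whole elements, not substrings
    case "$g" in
      grafana-admins) is_admin=1 ;;
      grafana-editors) is_editor=1 ;;
    esac
  done
  if [ "$is_admin" -eq 1 ]; then
    echo "Admin"
  elif [ "$is_editor" -eq 1 ]; then
    echo "Editor"
  else
    echo "Viewer"
  fi
}
```

Two examples:

- A user in both `grafana-editors` and `grafana-admins` gets Admin.
- A user whose only group is named `grafana-admins-old` gets Viewer, because the match is on the exact group name.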
Next Steps
Add More Hosts
To monitor additional hosts (e.g., mini-vps):
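- Deploy node_exporter to the target host
- Make its metrics reachable by VictoriaMetrics. Either add the remote host to the VictoriaMetrics scrape config (node_exporter must then listen on an interface reachable from arch-vps rather than 127.0.0.1), or configure remote_write or federation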
Add Service Metrics
To monitor containerized services:
- Expose a `/metrics` endpoint in the application (port 8080 in this example)
- Add a scrape job to `roles/metrics/templates/scrape.yml.j2` with `job_name: 'myservice'` and `static_configs: [{targets: ['127.0.0.1:8080']}]`
- Redeploy the metrics role
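YAML indentation is easy to get wrong when you edit the template by hand. The sketch below prints a correctly indented static scrape job that you can paste in. It makes these assumptions:

- The template follows the Prometheus `scrape_configs:` layout that VictoriaMetrics accepts.
- `scrape_job_yaml` is a hypothetical helper.
- `myservice` and `127.0.0.1:8080` are the placeholders from the step above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one static scrape job, indented to sit under a
# top-level "scrape_configs:" key (assumed template layout).
scrape_job_yaml() {
  local job="$1" target="$2"
  cat <<EOF
  - job_name: '${job}'
    static_configs:
      - targets: ['${target}']
EOF
}
```

Usage: `scrape_job_yaml myservice 127.0.0.1:8080`. Paste the output under the template's existing `scrape_configs:` list, then redeploy.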
Set Up Alerting
- Configure notification channels in Grafana (Email, Slack, etc.)
- Create alert rules for critical metrics
- Set up on-call rotation if needed
Security Notes
- All metrics services run on localhost only
- Grafana is the only internet-facing component (via Caddy HTTPS)
- OAuth provides SSO with Authentik (no separate per-user Grafana passwords)
- systemd hardening enabled on all services
- Default admin account should only be used for emergencies
Resources
- VictoriaMetrics Docs: https://docs.victoriametrics.com/
- Grafana Docs: https://grafana.com/docs/
- PromQL Guide: https://prometheus.io/docs/prometheus/latest/querying/basics/
- Dashboard Library: https://grafana.com/grafana/dashboards/
- Authentik OAuth: https://goauthentik.io/docs/providers/oauth2/
Support
For issues specific to rick-infra metrics deployment:
- Check service logs: `journalctl -u <service> -f`
- Review the role README: `roles/metrics/README.md`
- Verify vault variables are correctly set
- Ensure the Authentik OAuth provider is properly configured