rick-infra/docs/metrics-deployment-guide.md
Joakim 1f3f111d88 Add metrics monitoring stack with VictoriaMetrics, Grafana, and node_exporter
Implement complete monitoring infrastructure following rick-infra principles:

Components:
- VictoriaMetrics: Prometheus-compatible TSDB (7x less RAM usage)
- Grafana: Visualization dashboard with Authentik OAuth/OIDC integration
- node_exporter: System metrics collection (CPU, memory, disk, network)

Architecture:
- All services run as native systemd binaries (no containers)
- localhost-only binding for security
- Grafana uses native OAuth integration with Authentik (not forward_auth)
- Full systemd security hardening enabled
- Proxied via Caddy at metrics.jnss.me with HTTPS

Role Features:
- Unified metrics role (single role for complete stack)
- Automatic role mapping via Authentik groups:
  - authentik Admins OR grafana-admins -> Admin access
  - grafana-editors -> Editor access
  - All others -> Viewer access
- VictoriaMetrics auto-provisioned as default Grafana datasource
- 12-month metrics retention by default
- Comprehensive documentation included

Security:
- OAuth/OIDC SSO via Authentik
- All metrics services bind to 127.0.0.1 only
- systemd hardening (NoNewPrivileges, ProtectSystem, etc.)
- Grafana accessible only via Caddy HTTPS proxy

Documentation:
- roles/metrics/README.md: Complete role documentation
- docs/metrics-deployment-guide.md: Step-by-step deployment guide

Configuration:
- Updated rick-infra.yml to include metrics deployment
- Grafana port set to 3001 (Gitea uses 3000)
- Ready for multi-host expansion (designed for future node_exporter deployment to production hosts)
2025-12-28 19:18:30 +01:00

Metrics Stack Deployment Guide

Complete guide to deploying the monitoring stack (VictoriaMetrics, Grafana, node_exporter) on rick-infra.

Overview

The metrics stack provides:

  • System monitoring: CPU, memory, disk, network via node_exporter
  • Time-series storage: VictoriaMetrics (Prometheus-compatible; the project advertises up to 7x lower RAM usage than Prometheus)
  • Visualization: Grafana with Authentik SSO integration
  • Access: https://metrics.jnss.me with role-based permissions

Architecture

User → metrics.jnss.me (HTTPS)
  ↓
Caddy (Reverse Proxy)
  ↓
Grafana (OAuth → Authentik for SSO)
  ↓
VictoriaMetrics (Time-series DB)
  ↑
node_exporter (System Metrics)

All metrics services bind to 127.0.0.1 only, following rick-infra security principles. Caddy is the only public entry point.

Prerequisites

1. Caddy Deployed

ansible-playbook rick-infra.yml --tags caddy

2. Authentik Deployed

ansible-playbook rick-infra.yml --tags authentik

3. DNS Configuration

Ensure metrics.jnss.me points to the arch-vps IP address:

dig metrics.jnss.me  # Should return 69.62.119.31
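
The DNS check can be scripted so that a mismatch fails loudly instead of relying on reading the output. The sketch below is one way to do that; `ip_in_answers` is a hypothetical helper name (not part of rick-infra) and it only compares the lines printed by `dig +short` against the expected address.

```shell
# Succeeds if any line on stdin is exactly the expected address.
# -x matches whole lines only, -F treats the address as a fixed string.
ip_in_answers() {
    grep -qxF -- "$1"
}

# Usage (needs network access, so it is left commented out):
# dig +short metrics.jnss.me A | ip_in_answers 69.62.119.31 && echo "DNS OK"
```

Matching whole lines avoids a false positive when the expected address is a substring of a different one.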

Step 1: Configure Authentik OAuth Provider

Create OAuth2/OIDC Provider

  1. Log in to Authentik at https://auth.jnss.me

  2. Navigate to Applications → Providers → Create

  3. Configure provider:

    • Name: Grafana
    • Type: OAuth2/OpenID Provider
    • Authentication flow: default-authentication-flow
    • Authorization flow: default-provider-authorization-explicit-consent
    • Client type: Confidential
    • Client ID: grafana
    • Client Secret: Click Generate and copy the secret
    • Redirect URIs: https://metrics.jnss.me/login/generic_oauth
    • Signing Key: Select auto-generated key
    • Scopes: openid, profile, email, groups
  4. Click Finish

Create Application

  1. Navigate to Applications → Create

  2. Configure application:

    • Name: Grafana
    • Slug: grafana
    • Provider: Select Grafana provider created above
    • Launch URL: https://metrics.jnss.me
  3. Click Create
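
Once the provider and application exist, it may help to confirm that Authentik publishes OIDC metadata for the `grafana` slug before touching Grafana. The sketch below builds the discovery URL; the `/application/o/<slug>/` path layout is an assumption about Authentik's per-application endpoints, so confirm it against the "OpenID Configuration URL" shown on the provider's detail page. `oidc_discovery_url` is a hypothetical helper name.

```shell
# Build the OIDC discovery URL for an Authentik application.
# $1 = Authentik base URL (a trailing slash is tolerated), $2 = application slug.
# The path layout is an assumption; verify it in the Authentik provider page.
oidc_discovery_url() {
    printf '%s/application/o/%s/.well-known/openid-configuration\n' "${1%/}" "$2"
}

# Usage (needs network access, so it is left commented out):
# curl -fsS "$(oidc_discovery_url https://auth.jnss.me grafana)"
```

A JSON document listing the authorization, token, and userinfo endpoints would indicate the provider is reachable.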

Create Groups (Optional)

For role-based access control:

  1. Navigate to Directory → Groups → Create

  2. Create groups:

    • grafana-admins: Full admin access to Grafana
    • grafana-editors: Can create/edit dashboards
    • All other users get Viewer access
  3. Add users to groups as needed

Step 2: Configure Vault Variables

Edit vault file:

ansible-vault edit host_vars/arch-vps/vault.yml

Add these variables:

# Grafana admin password (for emergency local login)
vault_grafana_admin_password: "your-secure-admin-password"

# Grafana secret key (generate with: openssl rand -base64 32, which prints a 44-character string)
vault_grafana_secret_key: "paste-generated-secret-key-here"

# OAuth credentials from Authentik
vault_grafana_oauth_client_id: "grafana"
vault_grafana_oauth_client_secret: "paste-secret-from-authentik-here"

Save and close (:wq in vim).
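
The random values can be generated before opening the vault. The following is a minimal sketch built around the `openssl rand -base64 32` command mentioned in the vault comments; `gen_secret` is a hypothetical helper name, and the `/dev/urandom` fallback is an addition for hosts without openssl.

```shell
# Print a base64-encoded secret built from N random bytes (default 32).
# Prefers openssl, as the vault comment suggests; falls back to /dev/urandom.
gen_secret() {
    if command -v openssl >/dev/null 2>&1; then
        openssl rand -base64 "${1:-32}"
    else
        head -c "${1:-32}" /dev/urandom | base64
    fi
}

# Usage: print a value to paste into the vault file.
# gen_secret
```

With the default of 32 bytes the output is a single 44-character line; larger sizes may wrap across several lines.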

Step 3: Deploy Metrics Stack

Deploy all components:

ansible-playbook rick-infra.yml --tags metrics

This will:

  1. Install and configure VictoriaMetrics
  2. Install and configure node_exporter
  3. Install and configure Grafana with OAuth
  4. Deploy Caddy configuration for metrics.jnss.me

Expected output:

PLAY RECAP *******************************************************
arch-vps : ok=25 changed=15 unreachable=0 failed=0 skipped=0
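
The exact `ok` and `changed` counts will vary between runs; what matters is that `failed` and `unreachable` are zero for every host. The sketch below checks that from captured playbook output. `recap_ok` and `deploy.log` are hypothetical names used only for illustration.

```shell
# Read ansible-playbook output on stdin and succeed only if the PLAY RECAP
# lists at least one host and every host has failed=0 and unreachable=0.
recap_ok() {
    awk '
        /^PLAY RECAP/ { in_recap = 1; next }
        in_recap && NF {
            hosts++
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^(failed|unreachable)=/) {
                    split($i, kv, "=")
                    if (kv[2] + 0 > 0) bad = 1
                }
            }
        }
        END { if (bad || hosts == 0) exit 1 }
    '
}

# Usage (left commented out because it runs a real deployment):
# ansible-playbook rick-infra.yml --tags metrics | tee deploy.log
# recap_ok < deploy.log && echo "deployment clean"
```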

Step 4: Verify Deployment

Check Services

SSH to arch-vps and verify services:

# Check all services are running
systemctl status victoriametrics grafana node_exporter

# Check service health
curl http://127.0.0.1:8428/health   # VictoriaMetrics
curl http://127.0.0.1:9100/metrics  # node_exporter
curl http://127.0.0.1:3001/api/health  # Grafana (port 3001; Gitea uses 3000)
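
The three health probes can be wrapped in one loop that reports each service and returns a non-zero status if any probe fails. This is a sketch, not part of the role: `check_endpoints` and the `PROBE` override are hypothetical names, and the Grafana port of 3001 follows the configuration note that Gitea already occupies 3000.

```shell
# Probe each "name url" pair read from stdin and report per-service status.
# PROBE holds the fetch command (curl by default); it is word-split on purpose.
# The function body runs in a subshell so its variables stay local.
check_endpoints() (
    probe="${PROBE:-curl -fsS --max-time 5 -o /dev/null}"
    failed=0
    while read -r name url; do
        [ -n "$name" ] || continue
        if $probe "$url" </dev/null 2>/dev/null; then
            echo "ok   $name"
        else
            echo "FAIL $name ($url)"
            failed=1
        fi
    done
    [ "$failed" -eq 0 ]
)

# Usage (run on arch-vps; commented out because it needs the services):
# check_endpoints <<'EOF'
# victoriametrics http://127.0.0.1:8428/health
# node_exporter   http://127.0.0.1:9100/metrics
# grafana         http://127.0.0.1:3001/api/health
# EOF
```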

Check HTTPS Access

curl -I https://metrics.jnss.me
# Should return 200 or 302 (redirect to Authentik)
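
To turn the status check into a pass/fail result, the first line of the `curl -I` response can be inspected. The helper below is a small sketch with a hypothetical name, `http_status_ok`; it accepts only the two codes named above.

```shell
# Read response headers on stdin; succeed if the first status line is 200 or 302.
# The status code is the second whitespace-separated field, e.g. "HTTP/2 302".
http_status_ok() {
    awk 'NR == 1 { ok = ($2 == 200 || $2 == 302) } END { exit !ok }'
}

# Usage (needs network access, so it is left commented out):
# curl -sI https://metrics.jnss.me | http_status_ok && echo "HTTPS OK"
```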

Check Metrics Collection

# Check VictoriaMetrics scrape targets
curl http://127.0.0.1:8428/api/v1/targets

# Should show node_exporter as "up"
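
The targets endpoint returns JSON in the Prometheus API shape, where each active target carries a `health` field. The sketch below checks that every target reports `up` using only grep, so it runs on a host without extra tools; a JSON parser such as jq would be more robust. `all_targets_up` is a hypothetical helper name.

```shell
# Read the /api/v1/targets JSON on stdin; succeed only if at least one target
# is present and every "health" value is "up".
all_targets_up() {
    states=$(grep -oE '"health"[[:space:]]*:[[:space:]]*"[a-z]+"')
    [ -n "$states" ] && ! printf '%s\n' "$states" | grep -qv '"up"$'
}

# Usage (run on arch-vps; commented out because it needs VictoriaMetrics):
# curl -s http://127.0.0.1:8428/api/v1/targets | all_targets_up && echo "all targets up"
```

An empty target list is treated as a failure, since it usually means the scrape config was not loaded.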

Step 5: Access Grafana

  1. Navigate to https://metrics.jnss.me
  2. Click "Sign in with Authentik"
  3. Log in with your Authentik credentials
  4. You should be redirected to the Grafana dashboard

First login will:

  • Auto-create your Grafana user
  • Assign role based on Authentik group membership
  • Grant access to default organization

Step 6: Verify Data Source

  1. In Grafana, navigate to Connections → Data sources
  2. Verify VictoriaMetrics is listed and default
  3. Click on VictoriaMetrics → Save & test
  4. Should show green "Data source is working" message

Step 7: Create First Dashboard

Option 1: Import a Community Dashboard

  1. Navigate to Dashboards → Import
  2. Enter dashboard ID: 1860 (Node Exporter Full)
  3. Click Load
  4. Select VictoriaMetrics as data source
  5. Click Import

You now have a comprehensive system monitoring dashboard!

Option 2: Create Custom Dashboard

  1. Navigate to Dashboards → New → New Dashboard
  2. Click Add visualization
  3. Select VictoriaMetrics data source
  4. Enter PromQL query:
    # CPU usage
    100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
    
  5. Click Apply
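
The same query can be tried against VictoriaMetrics directly before building a panel, which separates query problems from Grafana problems. The sketch below only builds the PromQL string with a configurable window; `cpu_usage_query` is a hypothetical helper, and the commented usage assumes the Prometheus-compatible `/api/v1/query` endpoint on port 8428.

```shell
# Print the CPU usage query from this guide, with the rate window as $1
# (defaults to 5m).
cpu_usage_query() {
    printf '100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[%s])) * 100)\n' "${1:-5m}"
}

# Usage (run on arch-vps; commented out because it needs VictoriaMetrics):
# curl -sG http://127.0.0.1:8428/api/v1/query \
#     --data-urlencode "query=$(cpu_usage_query 5m)"
```

`--data-urlencode` combined with `-G` lets curl handle the braces and quotes in the query.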

Step 8: Configure Alerting (Optional)

Grafana supports alerting on metrics. Configure via Alerting → Alert rules.

Example alert condition for high CPU (true when average idle time falls below 20%, i.e. usage above 80%):

avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100 < 20

Troubleshooting

OAuth Login Fails

Symptom: Grafana redirects to Authentik, but an error is returned after login

Solution:

  1. Verify redirect URI in Authentik matches exactly: https://metrics.jnss.me/login/generic_oauth
  2. Check Grafana logs: journalctl -u grafana -f
  3. Verify OAuth credentials in vault match Authentik
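
Grafana's log can be noisy, so it may help to narrow it to OAuth-related warnings and errors while reproducing the login. This is a rough sketch: `oauth_log_errors` is a hypothetical helper, and the `level=` field assumes Grafana's default key=value log format.

```shell
# Keep only warning/error log lines that mention OAuth (case-insensitive).
# Assumes logfmt-style lines containing level=warn or level=error.
oauth_log_errors() {
    grep -i 'oauth' | grep -E 'level=(warn|error)'
}

# Usage (run on arch-vps; commented out because it needs the journal):
# journalctl -u grafana --since "10 minutes ago" --no-pager | oauth_log_errors
```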

No Metrics in Grafana

Symptom: Data source working, but no data in dashboards

Solution:

  1. Check VictoriaMetrics targets: curl http://127.0.0.1:8428/api/v1/targets
  2. Verify node_exporter is up: systemctl status node_exporter
  3. Check time range in Grafana (top right) - try "Last 5 minutes"

Can't Access metrics.jnss.me

Symptom: Connection timeout or SSL error

Solution:

  1. Verify DNS: dig metrics.jnss.me
  2. Check Caddy is running: systemctl status caddy
  3. Check Caddy logs: journalctl -u caddy -f
  4. Verify the Caddy site config exists: ls /etc/caddy/sites/grafana.caddy

Wrong Grafana Role

Symptom: User has wrong permissions (e.g., Viewer instead of Admin)

Solution:

  1. Verify user is in correct Authentik group (grafana-admins or grafana-editors)
  2. Log out of Grafana and log in again
  3. Check role mapping expression in roles/metrics/defaults/main.yml:
    grafana_oauth_role_attribute_path: "contains(groups, 'grafana-admins') && 'Admin' || contains(groups, 'grafana-editors') && 'Editor' || 'Viewer'"
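
The precedence in that expression can be easy to misread: `grafana-admins` wins over `grafana-editors`, and everything else falls through to Viewer. The sketch below restates the same logic in plain shell to make the outcome for a given group list explicit. It is an illustration only, not how Grafana evaluates the expression, and `grafana_role` is a hypothetical name.

```shell
# Print the Grafana role implied by a list of Authentik group names.
# Admin takes priority over Editor; anything else maps to Viewer.
grafana_role() {
    role=Viewer
    for group in "$@"; do
        case "$group" in
            grafana-admins)  role=Admin; break ;;
            grafana-editors) role=Editor ;;
        esac
    done
    printf '%s\n' "$role"
}

# Usage:
# grafana_role staff grafana-editors
```

A user in both groups resolves to Admin regardless of the order in which the groups are listed.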
    

Next Steps

Add More Hosts

To monitor additional hosts (e.g., mini-vps):

  1. Deploy node_exporter to target host
  2. Update VictoriaMetrics scrape config to include remote targets
  3. Configure remote_write or federation

Add Service Metrics

To monitor containerized services:

  1. Expose /metrics endpoint in application (port 8080)
  2. Add scrape config in roles/metrics/templates/scrape.yml.j2:
    - job_name: 'myservice'
      static_configs:
        - targets: ['127.0.0.1:8080']
    
  3. Redeploy metrics role
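
Before adding the scrape job, it is worth confirming that the application's endpoint actually serves the metric you expect. The sketch below scans Prometheus exposition text for a sample of a given metric; `has_metric` is a hypothetical helper and `myservice_requests_total` is a made-up metric name used only for illustration.

```shell
# Succeed if the exposition text on stdin contains a sample for metric $1.
# A sample line starts with the metric name followed by "{" (labels) or a space,
# so comment lines (# HELP, # TYPE) and longer metric names do not match.
has_metric() {
    grep -qE "^$1(\{| )"
}

# Usage (commented out because it needs the service to be running):
# curl -s http://127.0.0.1:8080/metrics | has_metric myservice_requests_total
```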

Set Up Alerting

  1. Configure notification channels in Grafana (Email, Slack, etc.)
  2. Create alert rules for critical metrics
  3. Set up on-call rotation if needed

Security Notes

  • All metrics services run on localhost only
  • Grafana is the only internet-facing component (via Caddy HTTPS)
  • OAuth provides SSO with Authentik (no separate Grafana passwords)
  • systemd hardening enabled on all services
  • Default admin account should only be used for emergencies

Support

For issues specific to rick-infra metrics deployment:

  1. Check service logs: journalctl -u <service> -f
  2. Review role README: roles/metrics/README.md
  3. Verify vault variables are correctly set
  4. Ensure Authentik OAuth provider is properly configured