diff --git a/docs/metrics-deployment-guide.md b/docs/metrics-deployment-guide.md new file mode 100644 index 0000000..c0b69f9 --- /dev/null +++ b/docs/metrics-deployment-guide.md @@ -0,0 +1,311 @@ +# Metrics Stack Deployment Guide + +Complete guide to deploying the monitoring stack (VictoriaMetrics, Grafana, node_exporter) on rick-infra. + +## Overview + +The metrics stack provides: +- **System monitoring**: CPU, memory, disk, network via node_exporter +- **Time-series storage**: VictoriaMetrics (Prometheus-compatible, 7x less RAM) +- **Visualization**: Grafana with Authentik SSO integration +- **Access**: `https://metrics.jnss.me` with role-based permissions + +## Architecture + +``` +User → metrics.jnss.me (HTTPS) + ↓ +Caddy (Reverse Proxy) + ↓ +Grafana (OAuth → Authentik for SSO) + ↓ +VictoriaMetrics (Time-series DB) + ↑ +node_exporter (System Metrics) +``` + +All services run on localhost only, following rick-infra security principles. + +## Prerequisites + +### 1. Caddy Deployed +```bash +ansible-playbook rick-infra.yml --tags caddy +``` + +### 2. Authentik Deployed +```bash +ansible-playbook rick-infra.yml --tags authentik +``` + +### 3. DNS Configuration +Ensure `metrics.jnss.me` points to arch-vps IP: +```bash +dig metrics.jnss.me # Should return 69.62.119.31 +``` + +## Step 1: Configure Authentik OAuth Provider + +### Create OAuth2/OIDC Provider + +1. Login to Authentik at `https://auth.jnss.me` + +2. Navigate to **Applications → Providers** → **Create** + +3. 
Configure provider: + - **Name**: `Grafana` + - **Type**: `OAuth2/OpenID Provider` + - **Authentication flow**: `default-authentication-flow` + - **Authorization flow**: `default-provider-authorization-explicit-consent` + - **Client type**: `Confidential` + - **Client ID**: `grafana` + - **Client Secret**: Click **Generate** and **copy the secret** + - **Redirect URIs**: `https://metrics.jnss.me/login/generic_oauth` + - **Signing Key**: Select auto-generated key + - **Scopes**: `openid`, `profile`, `email`, `groups` + +4. Click **Finish** + +### Create Application + +1. Navigate to **Applications** → **Create** + +2. Configure application: + - **Name**: `Grafana` + - **Slug**: `grafana` + - **Provider**: Select `Grafana` provider created above + - **Launch URL**: `https://metrics.jnss.me` + +3. Click **Create** + +### Create Groups (Optional) + +For role-based access control: + +1. Navigate to **Directory → Groups** → **Create** + +2. Create groups: + - **grafana-admins**: Full admin access to Grafana + - **grafana-editors**: Can create/edit dashboards + - All other users get Viewer access + +3. Add users to groups as needed + +## Step 2: Configure Vault Variables + +Edit vault file: +```bash +ansible-vault edit host_vars/arch-vps/vault.yml +``` + +Add these variables: +```yaml +# Grafana admin password (for emergency local login) +vault_grafana_admin_password: "your-secure-admin-password" + +# Grafana secret key (generate with: openssl rand -base64 32) +vault_grafana_secret_key: "your-random-32-char-secret-key" + +# OAuth credentials from Authentik +vault_grafana_oauth_client_id: "grafana" +vault_grafana_oauth_client_secret: "paste-secret-from-authentik-here" +``` + +Save and close (`:wq` in vim). + +## Step 3: Deploy Metrics Stack + +Deploy all components: +```bash +ansible-playbook rick-infra.yml --tags metrics +``` + +This will: +1. Install and configure VictoriaMetrics +2. Install and configure node_exporter +3. Install and configure Grafana with OAuth +4. 
Deploy Caddy configuration for `metrics.jnss.me` + +Expected output: +``` +PLAY RECAP ******************************************************* +arch-vps : ok=25 changed=15 unreachable=0 failed=0 skipped=0 +``` + +## Step 4: Verify Deployment + +### Check Services + +SSH to arch-vps and verify services: +```bash +# Check all services are running +systemctl status victoriametrics grafana node_exporter + +# Check service health +curl http://127.0.0.1:8428/health # VictoriaMetrics +curl http://127.0.0.1:9100/metrics # node_exporter +curl http://127.0.0.1:3420/api/health # Grafana (grafana_listen_port) +``` + +### Check HTTPS Access + +```bash +curl -I https://metrics.jnss.me +# Should return 200 or 302 (redirect to Authentik) +``` + +### Check Metrics Collection + +```bash +# Check VictoriaMetrics scrape targets +curl http://127.0.0.1:8428/api/v1/targets + +# Should show node_exporter as "up" +``` + +## Step 5: Access Grafana + +1. Navigate to `https://metrics.jnss.me` +2. Click **"Sign in with Authentik"** +3. Log in with your Authentik credentials +4. You should be redirected to the Grafana dashboard + +First login will: +- Auto-create your Grafana user +- Assign a role based on Authentik group membership +- Grant access to the default organization + +## Step 6: Verify Data Source + +1. In Grafana, navigate to **Connections → Data sources** +2. Verify **VictoriaMetrics** is listed and marked as default +3. Click on VictoriaMetrics → **Save & test** +4. A green "Data source is working" message should appear + +## Step 7: Create First Dashboard + +### Option 1: Import Community Dashboard (Recommended) + +1. Navigate to **Dashboards → Import** +2. Enter dashboard ID: `1860` (Node Exporter Full) +3. Click **Load** +4. Select **VictoriaMetrics** as data source +5. Click **Import** + +You now have a comprehensive system monitoring dashboard! + +### Option 2: Create Custom Dashboard + +1. Navigate to **Dashboards → New → New Dashboard** +2. Click **Add visualization** +3. Select **VictoriaMetrics** data source +4. 
Enter PromQL query: + ```promql + # CPU usage + 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) + ``` +5. Click **Apply** + +## Step 8: Configure Alerting (Optional) + +Grafana supports alerting on metrics. Configure via **Alerting → Alert rules**. + +Example alert for high CPU (fires when idle CPU drops below 20%, i.e. usage above 80%): +```promql +avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100 < 20 +``` + +## Troubleshooting + +### OAuth Login Fails + +**Symptom**: Redirect to Authentik works, but an error is returned after login + +**Solution**: +1. Verify redirect URI in Authentik matches exactly: `https://metrics.jnss.me/login/generic_oauth` +2. Check Grafana logs: `journalctl -u grafana -f` +3. Verify OAuth credentials in vault match Authentik + +### No Metrics in Grafana + +**Symptom**: Data source working, but no data in dashboards + +**Solution**: +1. Check VictoriaMetrics targets: `curl http://127.0.0.1:8428/api/v1/targets` +2. Verify node_exporter is up: `systemctl status node_exporter` +3. Check time range in Grafana (top right) - try "Last 5 minutes" + +### Can't Access metrics.jnss.me + +**Symptom**: Connection timeout or SSL error + +**Solution**: +1. Verify DNS: `dig metrics.jnss.me` +2. Check Caddy is running: `systemctl status caddy` +3. Check Caddy logs: `journalctl -u caddy -f` +4. Verify the Caddy site config is deployed: `ls /etc/caddy/sites-enabled/grafana.caddy` + +### Wrong Grafana Role + +**Symptom**: User has wrong permissions (e.g., Viewer instead of Admin) + +**Solution**: +1. Verify user is in correct Authentik group (`grafana-admins` or `grafana-editors`) +2. Log out of Grafana and log in again +3. Check role mapping expression in `roles/metrics/defaults/main.yml`: + ```yaml + grafana_oauth_role_attribute_path: "(contains(groups, 'authentik Admins') || contains(groups, 'grafana-admins')) && 'Admin' || contains(groups, 'grafana-editors') && 'Editor' || 'Viewer'" + ``` + +## Next Steps + +### Add More Hosts + +To monitor additional hosts (e.g., mini-vps): + +1. Deploy node_exporter to target host +2. 
Update VictoriaMetrics scrape config to include remote targets +3. Configure remote_write or federation + +### Add Service Metrics + +To monitor containerized services: + +1. Expose a `/metrics` endpoint in the application (e.g., on port 8080) +2. Add scrape config in `roles/metrics/templates/scrape.yml.j2`: + ```yaml + - job_name: 'myservice' + static_configs: + - targets: ['127.0.0.1:8080'] + ``` +3. Redeploy metrics role + +### Set Up Alerting + +1. Configure contact points in Grafana (Email, Slack, etc.) +2. Create alert rules for critical metrics +3. Set up on-call rotation if needed + +## Security Notes + +- All metrics services run on localhost only +- Grafana is the only internet-facing component (via Caddy HTTPS) +- OAuth provides SSO with Authentik (no separate Grafana passwords) +- systemd hardening enabled on all services +- Default admin account should only be used for emergencies + +## Resources + +- **VictoriaMetrics Docs**: https://docs.victoriametrics.com/ +- **Grafana Docs**: https://grafana.com/docs/ +- **PromQL Guide**: https://prometheus.io/docs/prometheus/latest/querying/basics/ +- **Dashboard Library**: https://grafana.com/grafana/dashboards/ +- **Authentik OAuth**: https://goauthentik.io/docs/providers/oauth2/ + +## Support + +For issues specific to rick-infra metrics deployment: +1. Check service logs: `journalctl -u <service> -f` +2. Review role README: `roles/metrics/README.md` +3. Verify vault variables are correctly set +4. 
Ensure Authentik OAuth provider is properly configured diff --git a/rick-infra.yml b/rick-infra.yml index 18e96c8..b5e6b16 100644 --- a/rick-infra.yml +++ b/rick-infra.yml @@ -10,9 +10,11 @@ # - Authentik SSO/authentication # - Gitea git hosting # - Vaultwarden password manager +# - Metrics (VictoriaMetrics, Grafana, node_exporter) # # Usage: -# ansible-playbook playbooks/homelab.yml +# ansible-playbook rick-infra.yml +# ansible-playbook rick-infra.yml --tags metrics # - import_playbook: playbooks/security.yml - name: Deploy Homelab Infrastructure @@ -21,10 +23,16 @@ gather_facts: true tasks: - - name: Deploy Caddy + # - name: Deploy Caddy + # include_role: + # name: caddy + # tags: ['caddy'] + + - name: Deploy Metrics Stack include_role: - name: caddy - tags: ['caddy'] + name: metrics + tags: ['metrics', 'monitoring', 'grafana', 'victoriametrics'] + # - name: Deploy Authentik # include_role: # name: authentik diff --git a/roles/metrics/README.md b/roles/metrics/README.md new file mode 100644 index 0000000..19aba8c --- /dev/null +++ b/roles/metrics/README.md @@ -0,0 +1,325 @@ +# Metrics Role + +Complete monitoring stack for rick-infra providing system metrics collection, storage, and visualization with SSO integration. 
+ +## Components + +### VictoriaMetrics +- **Purpose**: Time-series database for metrics storage +- **Type**: Native systemd service +- **Listen**: `127.0.0.1:8428` (localhost only) +- **Features**: + - Prometheus-compatible API and PromQL + - Up to 7x less RAM usage than Prometheus (per VictoriaMetrics' own comparisons) + - Single binary deployment + - 12-month data retention by default + +### Grafana +- **Purpose**: Metrics visualization and dashboarding +- **Type**: Native systemd service +- **Listen**: `127.0.0.1:3420` (localhost only, proxied via Caddy) +- **Domain**: `metrics.jnss.me` +- **Features**: + - OAuth/OIDC integration with Authentik + - Role-based access control via Authentik groups + - VictoriaMetrics as default data source + +### node_exporter +- **Purpose**: System metrics collection +- **Type**: Native systemd service +- **Listen**: `127.0.0.1:9100` (localhost only) +- **Metrics**: CPU, memory, disk, network, systemd units + +## Architecture + +``` +┌─────────────────────────────────────────────────────┐ +│ metrics.jnss.me (Grafana Dashboard) │ +│ ┌─────────────────────────────────────────────────┐ │ +│ │ Caddy (HTTPS) │ │ +│ │ ↓ │ │ +│ │ Grafana (OAuth → Authentik) │ │ +│ │ ↓ │ │ +│ │ VictoriaMetrics (Prometheus-compatible) │ │ +│ │ ↑ │ │ +│ │ node_exporter (System Metrics) │ │ +│ └─────────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────┘ +``` + +## Deployment + +### Prerequisites + +1. **Caddy role deployed** - Required for HTTPS proxy +2. **Authentik deployed** - Required for OAuth/SSO +3. **Vault variables configured**: + ```yaml + # In host_vars/arch-vps/vault.yml + vault_grafana_admin_password: "secure-admin-password" + vault_grafana_secret_key: "random-secret-key-32-chars" + vault_grafana_oauth_client_id: "grafana" + vault_grafana_oauth_client_secret: "oauth-client-secret-from-authentik" + ``` + +### Authentik Configuration + +Before deployment, create an OAuth2/OIDC provider in Authentik: + +1. 
**Create Provider**: + - Name: `Grafana` + - Type: `OAuth2/OpenID Provider` + - Client ID: `grafana` + - Client Secret: Generate and save to vault + - Redirect URIs: `https://metrics.jnss.me/login/generic_oauth` + - Signing Key: Auto-generated + +2. **Create Application**: + - Name: `Grafana` + - Slug: `grafana` + - Provider: Select Grafana provider created above + +3. **Create Groups** (optional, for role mapping): + - `grafana-admins` - Full admin access + - `grafana-editors` - Can create/edit dashboards + - Users without these groups get Viewer access + +### Deploy + +```bash +# Deploy complete metrics stack +ansible-playbook rick-infra.yml --tags metrics + +# Deploy individual components +ansible-playbook rick-infra.yml --tags victoriametrics +ansible-playbook rick-infra.yml --tags grafana +ansible-playbook rick-infra.yml --tags node_exporter +``` + +### Verify Deployment + +```bash +# Check service status +ansible homelab -a "systemctl status victoriametrics grafana node_exporter" + +# Check metrics collection +curl http://127.0.0.1:9100/metrics # node_exporter metrics +curl http://127.0.0.1:8428/metrics # VictoriaMetrics metrics +curl http://127.0.0.1:8428/api/v1/targets # Scrape targets + +# Access Grafana +curl -I https://metrics.jnss.me/ # Should redirect to Authentik login +``` + +## Usage + +### Access Dashboard + +1. Navigate to `https://metrics.jnss.me` +2. Click "Sign in with Authentik" +3. Authenticate via Authentik SSO +4. 
Access granted based on Authentik group membership + +### Role Mapping + +Grafana roles are automatically assigned based on Authentik groups: + +- **Admin**: Members of `grafana-admins` group + - Full administrative access + - Can manage users, data sources, plugins + - Can create/edit/delete all dashboards + +- **Editor**: Members of `grafana-editors` group + - Can create and edit dashboards + - Cannot manage users or data sources + +- **Viewer**: All other authenticated users + - Read-only access to dashboards + - Cannot create or edit dashboards + +### Creating Dashboards + +Grafana comes with VictoriaMetrics pre-configured as the default data source. Use PromQL queries: + +```promql +# CPU usage +100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) + +# Memory usage +node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes + +# Disk usage +100 - ((node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100) + +# Network traffic +irate(node_network_receive_bytes_total[5m]) +``` + +### Import Community Dashboards + +1. Browse dashboards at https://grafana.com/grafana/dashboards/ +2. Recommended for node_exporter: + - Dashboard ID: 1860 (Node Exporter Full) + - Dashboard ID: 11074 (Node Exporter for Prometheus) +3. 
Import via Grafana UI: Dashboards → Import → Enter ID + +## Configuration + +### Customization + +Key configuration options in `roles/metrics/defaults/main.yml`: + +```yaml +# Data retention +victoriametrics_retention_period: "12" # months + +# Scrape interval +victoriametrics_scrape_interval: "15s" + +# OAuth role mapping (JMESPath expression) +grafana_oauth_role_attribute_path: "(contains(groups, 'authentik Admins') || contains(groups, 'grafana-admins')) && 'Admin' || contains(groups, 'grafana-editors') && 'Editor' || 'Viewer'" + +# Memory limits +victoriametrics_memory_allowed_percent: "30" +``` + +### Adding Scrape Targets + +Edit `roles/metrics/templates/scrape.yml.j2`: + +```yaml +scrape_configs: + # Add custom application metrics + - job_name: 'myapp' + static_configs: + - targets: ['127.0.0.1:8080'] + labels: + service: 'myapp' +``` + +## Operations + +### Service Management + +```bash +# VictoriaMetrics +systemctl status victoriametrics +systemctl restart victoriametrics +journalctl -u victoriametrics -f + +# Grafana +systemctl status grafana +systemctl restart grafana +journalctl -u grafana -f + +# node_exporter +systemctl status node_exporter +systemctl restart node_exporter +journalctl -u node_exporter -f +``` + +### Data Locations + +``` +/var/lib/victoriametrics/ # Time-series data +/var/lib/grafana/ # Grafana database and dashboards +/var/log/grafana/ # Grafana logs +/etc/victoriametrics/ # VictoriaMetrics config +/etc/grafana/ # Grafana config +``` + +### Backup + +VictoriaMetrics data is stored in `/var/lib/victoriametrics`: + +```bash +# Stop service +systemctl stop victoriametrics + +# Backup data +tar -czf victoriametrics-backup-$(date +%Y%m%d).tar.gz /var/lib/victoriametrics + +# Start service +systemctl start victoriametrics +``` + +Grafana dashboards are stored in the SQLite database at `/var/lib/grafana/grafana.db`: + +```bash +# Backup Grafana +systemctl stop grafana +tar -czf grafana-backup-$(date +%Y%m%d).tar.gz /var/lib/grafana /etc/grafana +systemctl start grafana +``` + +## 
Security + +### Authentication +- Grafana protected by Authentik OAuth/OIDC +- Local admin account available for emergency access +- All services bind to localhost only + +### Network Security +- VictoriaMetrics: `127.0.0.1:8428` (no external access) +- Grafana: `127.0.0.1:3420` (proxied via Caddy with HTTPS) +- node_exporter: `127.0.0.1:9100` (no external access) + +### systemd Hardening +All services run with security restrictions: +- `NoNewPrivileges=true` +- `ProtectSystem=strict` +- `ProtectHome=true` +- `PrivateTmp=true` +- Read-only filesystem (except data directories) + +## Troubleshooting + +### Grafana OAuth Not Working + +1. Check Authentik provider configuration: + ```bash + # Verify redirect URI matches + # https://metrics.jnss.me/login/generic_oauth + ``` + +2. Check Grafana logs: + ```bash + journalctl -u grafana -f + ``` + +3. Verify OAuth credentials in vault match Authentik + +### No Metrics in Grafana + +1. Check VictoriaMetrics scrape targets: + ```bash + curl http://127.0.0.1:8428/api/v1/targets + ``` + +2. Check node_exporter is running: + ```bash + systemctl status node_exporter + curl http://127.0.0.1:9100/metrics + ``` + +3. Check VictoriaMetrics logs: + ```bash + journalctl -u victoriametrics -f + ``` + +### High Memory Usage + +VictoriaMetrics limits its memory use to the percentage set by `victoriametrics_memory_allowed_percent`. 
Adjust if needed: + +```yaml +# In roles/metrics/defaults/main.yml +victoriametrics_memory_allowed_percent: "20" # Example: lower the limit to 20% +``` + +## See Also + +- [VictoriaMetrics Documentation](https://docs.victoriametrics.com/) +- [Grafana Documentation](https://grafana.com/docs/) +- [node_exporter GitHub](https://github.com/prometheus/node_exporter) +- [PromQL Documentation](https://prometheus.io/docs/prometheus/latest/querying/basics/) +- [Authentik OAuth Integration](https://goauthentik.io/docs/providers/oauth2/) diff --git a/roles/metrics/defaults/main.yml b/roles/metrics/defaults/main.yml new file mode 100644 index 0000000..d459317 --- /dev/null +++ b/roles/metrics/defaults/main.yml @@ -0,0 +1,178 @@ +--- +# ================================================================= +# Metrics Infrastructure Role - Complete Monitoring Stack +# ================================================================= +# Provides VictoriaMetrics, Grafana, and node_exporter as unified stack + +# ================================================================= +# VictoriaMetrics Configuration +# ================================================================= + +# Service Management +victoriametrics_service_enabled: true +victoriametrics_service_state: "started" + +# Version +victoriametrics_version: "1.105.0" + +# Network Security (localhost only) +victoriametrics_listen_address: "127.0.0.1:8428" + +# Storage Configuration +victoriametrics_data_dir: "/var/lib/victoriametrics" +victoriametrics_retention_period: "12" # months + +# User/Group +victoriametrics_user: "victoriametrics" +victoriametrics_group: "victoriametrics" + +# Performance Settings +victoriametrics_memory_allowed_percent: "30" +victoriametrics_storage_min_free_disk_space_bytes: "10GB" + +# Scrape Configuration +victoriametrics_scrape_config_dir: "/etc/victoriametrics" +victoriametrics_scrape_config_file: "{{ victoriametrics_scrape_config_dir }}/scrape.yml" +victoriametrics_scrape_interval: "15s" 
+victoriametrics_scrape_timeout: "10s" + +# systemd security +victoriametrics_systemd_security: true + +# ================================================================= +# Grafana Configuration +# ================================================================= + +# Service Management +grafana_service_enabled: true +grafana_service_state: "started" + +# Version +grafana_version: "11.4.0" + +# Network Security (localhost only - proxied via Caddy) +grafana_listen_address: "127.0.0.1" +grafana_listen_port: 3420 + +# User/Group +grafana_user: "grafana" +grafana_group: "grafana" + +# Directories +grafana_data_dir: "/var/lib/grafana" +grafana_logs_dir: "/var/log/grafana" +grafana_plugins_dir: "/var/lib/grafana/plugins" +grafana_provisioning_dir: "/etc/grafana/provisioning" + +# Domain Configuration +grafana_domain: "metrics.{{ caddy_domain }}" +grafana_root_url: "https://{{ grafana_domain }}" + +# Default admin (used only for initial setup) +grafana_admin_user: "admin" +grafana_admin_password: "{{ vault_grafana_admin_password }}" + +# Disable registration (OAuth only) +grafana_allow_signup: false +grafana_disable_login_form: false # Keep fallback login + +# OAuth/OIDC Configuration (Authentik) +grafana_oauth_enabled: true +grafana_oauth_name: "Authentik" +grafana_oauth_client_id: "{{ vault_grafana_oauth_client_id }}" +grafana_oauth_client_secret: "{{ vault_grafana_oauth_client_secret }}" + +# Authentik OAuth endpoints +grafana_oauth_auth_url: "https://{{ authentik_domain }}/application/o/authorize/" +grafana_oauth_token_url: "https://{{ authentik_domain }}/application/o/token/" +grafana_oauth_api_url: "https://{{ authentik_domain }}/application/o/userinfo/" + +# OAuth role mapping +grafana_oauth_role_attribute_path: "(contains(groups, 'authentik Admins') || contains(groups, 'grafana-admins')) && 'Admin' || contains(groups, 'grafana-editors') && 'Editor' || 'Viewer'" +grafana_oauth_allow_sign_up: true # Auto-create users from OAuth +grafana_oauth_scopes: "openid 
profile email groups" + +# Data Source Configuration +grafana_datasource_vm_enabled: true +grafana_datasource_vm_url: "http://{{ victoriametrics_listen_address }}" +grafana_datasource_vm_name: "VictoriaMetrics" + +# Security +grafana_systemd_security: true +grafana_cookie_secure: true +grafana_cookie_samesite: "lax" + +# Database (SQLite by default) +grafana_database_type: "sqlite3" +grafana_database_path: "{{ grafana_data_dir }}/grafana.db" + +# ================================================================= +# Node Exporter Configuration +# ================================================================= + +# Service Management +node_exporter_service_enabled: true +node_exporter_service_state: "started" + +# Version +node_exporter_version: "1.8.2" + +# Network Security (localhost only) +node_exporter_listen_address: "127.0.0.1:9100" + +# User/Group +node_exporter_user: "node_exporter" +node_exporter_group: "node_exporter" + +# Enabled collectors +node_exporter_enabled_collectors: + - cpu + - diskstats + - filesystem + - loadavg + - meminfo + - netdev + - netstat + - stat + - time + - uname + - vmstat + - systemd + +# Disabled collectors +node_exporter_disabled_collectors: + - mdadm + +# Filesystem collector configuration +node_exporter_filesystem_ignored_fs_types: + - tmpfs + - devtmpfs + - devfs + - iso9660 + - overlay + - aufs + - squashfs + +node_exporter_filesystem_ignored_mount_points: + - /var/lib/containers/storage/.* + - /run/.* + - /sys/.* + - /proc/.* + +# systemd security +node_exporter_systemd_security: true + +# ================================================================= +# Infrastructure Notes +# ================================================================= +# Complete monitoring stack: +# - VictoriaMetrics: Time-series database (Prometheus-compatible) +# - Grafana: Visualization with Authentik OAuth integration +# - node_exporter: System metrics collection +# +# Role mapping via Authentik groups: +# - grafana-admins: Full admin access 
+# - grafana-editors: Can create/edit dashboards +# - Default: Viewer access +# +# All services run on localhost only, proxied via Caddy diff --git a/roles/metrics/handlers/main.yml b/roles/metrics/handlers/main.yml new file mode 100644 index 0000000..a3a5b70 --- /dev/null +++ b/roles/metrics/handlers/main.yml @@ -0,0 +1,23 @@ +--- +- name: restart victoriametrics + ansible.builtin.systemd: + name: victoriametrics + state: restarted + daemon_reload: true + +- name: restart node_exporter + ansible.builtin.systemd: + name: node_exporter + state: restarted + daemon_reload: true + +- name: restart grafana + ansible.builtin.systemd: + name: grafana + state: restarted + daemon_reload: true + +- name: reload caddy + ansible.builtin.systemd: + name: caddy + state: reloaded diff --git a/roles/metrics/meta/main.yml b/roles/metrics/meta/main.yml new file mode 100644 index 0000000..1dbd0f6 --- /dev/null +++ b/roles/metrics/meta/main.yml @@ -0,0 +1,3 @@ +--- +dependencies: + - role: caddy diff --git a/roles/metrics/tasks/caddy.yml b/roles/metrics/tasks/caddy.yml new file mode 100644 index 0000000..9a40c9b --- /dev/null +++ b/roles/metrics/tasks/caddy.yml @@ -0,0 +1,9 @@ +--- +- name: Deploy Grafana Caddy configuration + ansible.builtin.template: + src: grafana.caddy.j2 + dest: /etc/caddy/sites-enabled/grafana.caddy + owner: caddy + group: caddy + mode: '0644' + notify: reload caddy diff --git a/roles/metrics/tasks/grafana.yml b/roles/metrics/tasks/grafana.yml new file mode 100644 index 0000000..f7f1698 --- /dev/null +++ b/roles/metrics/tasks/grafana.yml @@ -0,0 +1,90 @@ +--- +- name: Create Grafana system user + ansible.builtin.user: + name: "{{ grafana_user }}" + system: true + create_home: false + shell: /usr/sbin/nologin + state: present + +- name: Create Grafana directories + ansible.builtin.file: + path: "{{ item }}" + state: directory + owner: "{{ grafana_user }}" + group: "{{ grafana_group }}" + mode: '0755' + loop: + - "{{ grafana_data_dir }}" + - "{{ grafana_logs_dir 
}}" + - "{{ grafana_plugins_dir }}" + - "{{ grafana_provisioning_dir }}" + - "{{ grafana_provisioning_dir }}/datasources" + - "{{ grafana_provisioning_dir }}/dashboards" + - "{{ grafana_data_dir }}/dashboards" + - /etc/grafana + +- name: Download Grafana binary + ansible.builtin.get_url: + url: "https://dl.grafana.com/oss/release/grafana-{{ grafana_version }}.linux-amd64.tar.gz" + dest: "/tmp/grafana-{{ grafana_version }}.tar.gz" + mode: '0644' + register: grafana_download + +- name: Extract Grafana + ansible.builtin.unarchive: + src: "/tmp/grafana-{{ grafana_version }}.tar.gz" + dest: /opt + remote_src: true + creates: "/opt/grafana-v{{ grafana_version }}" + when: grafana_download.changed + +- name: Create Grafana symlink + ansible.builtin.file: + src: "/opt/grafana-v{{ grafana_version }}" + dest: /opt/grafana + state: link + +- name: Deploy Grafana configuration + ansible.builtin.template: + src: grafana.ini.j2 + dest: /etc/grafana/grafana.ini + owner: "{{ grafana_user }}" + group: "{{ grafana_group }}" + mode: '0640' + notify: restart grafana + +- name: Deploy VictoriaMetrics datasource provisioning + ansible.builtin.template: + src: datasource-victoriametrics.yml.j2 + dest: "{{ grafana_provisioning_dir }}/datasources/victoriametrics.yml" + owner: "{{ grafana_user }}" + group: "{{ grafana_group }}" + mode: '0644' + notify: restart grafana + when: grafana_datasource_vm_enabled + +- name: Deploy dashboard provisioning + ansible.builtin.template: + src: dashboards.yml.j2 + dest: "{{ grafana_provisioning_dir }}/dashboards/default.yml" + owner: "{{ grafana_user }}" + group: "{{ grafana_group }}" + mode: '0644' + notify: restart grafana + +- name: Deploy Grafana systemd service + ansible.builtin.template: + src: grafana.service.j2 + dest: /etc/systemd/system/grafana.service + owner: root + group: root + mode: '0644' + notify: restart grafana + +- name: Enable and start Grafana service + ansible.builtin.systemd: + name: grafana + enabled: "{{ grafana_service_enabled 
}}"
+    state: "{{ grafana_service_state }}"
+    daemon_reload: true
diff --git a/roles/metrics/tasks/main.yml b/roles/metrics/tasks/main.yml
new file mode 100644
index 0000000..92680c7
--- /dev/null
+++ b/roles/metrics/tasks/main.yml
@@ -0,0 +1,20 @@
+---
+# =================================================================
+# Metrics Stack Deployment
+# =================================================================
+
+- name: Deploy VictoriaMetrics
+  ansible.builtin.include_tasks: victoriametrics.yml
+  tags: [metrics, victoriametrics]
+
+- name: Deploy node_exporter
+  ansible.builtin.include_tasks: node_exporter.yml
+  tags: [metrics, node_exporter]
+
+- name: Deploy Grafana
+  ansible.builtin.include_tasks: grafana.yml
+  tags: [metrics, grafana]
+
+- name: Deploy Caddy configuration for Grafana
+  ansible.builtin.include_tasks: caddy.yml
+  tags: [metrics, caddy]
diff --git a/roles/metrics/tasks/node_exporter.yml b/roles/metrics/tasks/node_exporter.yml
new file mode 100644
index 0000000..f7cdb61
--- /dev/null
+++ b/roles/metrics/tasks/node_exporter.yml
@@ -0,0 +1,49 @@
+---
+- name: Create node_exporter system user
+  ansible.builtin.user:
+    name: "{{ node_exporter_user }}"
+    system: true
+    create_home: false
+    shell: /usr/sbin/nologin
+    state: present
+
+- name: Download node_exporter binary
+  ansible.builtin.get_url:
+    url: "https://github.com/prometheus/node_exporter/releases/download/v{{ node_exporter_version }}/node_exporter-{{ node_exporter_version }}.linux-amd64.tar.gz"
+    dest: "/tmp/node_exporter-{{ node_exporter_version }}.tar.gz"
+    mode: '0644'
+  register: node_exporter_download
+
+- name: Extract node_exporter binary
+  ansible.builtin.unarchive:
+    src: "/tmp/node_exporter-{{ node_exporter_version }}.tar.gz"
+    dest: /tmp
+    remote_src: true
+    creates: "/tmp/node_exporter-{{ node_exporter_version }}.linux-amd64"
+  when: node_exporter_download.changed
+
+- name: Copy node_exporter binary to /usr/local/bin
+  ansible.builtin.copy:
+    src: "/tmp/node_exporter-{{ node_exporter_version }}.linux-amd64/node_exporter"
+    dest: /usr/local/bin/node_exporter
+    owner: root
+    group: root
+    mode: '0755'
+    remote_src: true
+  when: node_exporter_download.changed
+
+- name: Deploy node_exporter systemd service
+  ansible.builtin.template:
+    src: node_exporter.service.j2
+    dest: /etc/systemd/system/node_exporter.service
+    owner: root
+    group: root
+    mode: '0644'
+  notify: restart node_exporter
+
+- name: Enable and start node_exporter service
+  ansible.builtin.systemd:
+    name: node_exporter
+    enabled: "{{ node_exporter_service_enabled }}"
+    state: "{{ node_exporter_service_state }}"
+    daemon_reload: true
diff --git a/roles/metrics/tasks/victoriametrics.yml b/roles/metrics/tasks/victoriametrics.yml
new file mode 100644
index 0000000..3c832a0
--- /dev/null
+++ b/roles/metrics/tasks/victoriametrics.yml
@@ -0,0 +1,66 @@
+---
+- name: Create VictoriaMetrics system user
+  ansible.builtin.user:
+    name: "{{ victoriametrics_user }}"
+    system: true
+    create_home: false
+    shell: /usr/sbin/nologin
+    state: present
+
+- name: Create VictoriaMetrics directories
+  ansible.builtin.file:
+    path: "{{ item }}"
+    state: directory
+    owner: "{{ victoriametrics_user }}"
+    group: "{{ victoriametrics_group }}"
+    mode: '0755'
+  loop:
+    - "{{ victoriametrics_data_dir }}"
+    - "{{ victoriametrics_scrape_config_dir }}"
+
+- name: Download VictoriaMetrics binary
+  ansible.builtin.get_url:
+    url: "https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v{{ victoriametrics_version }}/victoria-metrics-linux-amd64-v{{ victoriametrics_version }}.tar.gz"
+    dest: "/tmp/victoria-metrics-v{{ victoriametrics_version }}.tar.gz"
+    mode: '0644'
+  register: victoriametrics_download
+
+- name: Extract VictoriaMetrics binary
+  ansible.builtin.unarchive:
+    src: "/tmp/victoria-metrics-v{{ victoriametrics_version }}.tar.gz"
+    dest: /usr/local/bin
+    remote_src: true
+    creates: /usr/local/bin/victoria-metrics-prod
+  when: victoriametrics_download.changed
+
+- name: Set VictoriaMetrics binary permissions
+  ansible.builtin.file:
+    path: /usr/local/bin/victoria-metrics-prod
+    owner: root
+    group: root
+    mode: '0755'
+
+- name: Deploy VictoriaMetrics scrape configuration
+  ansible.builtin.template:
+    src: scrape.yml.j2
+    dest: "{{ victoriametrics_scrape_config_file }}"
+    owner: "{{ victoriametrics_user }}"
+    group: "{{ victoriametrics_group }}"
+    mode: '0644'
+  notify: restart victoriametrics
+
+- name: Deploy VictoriaMetrics systemd service
+  ansible.builtin.template:
+    src: victoriametrics.service.j2
+    dest: /etc/systemd/system/victoriametrics.service
+    owner: root
+    group: root
+    mode: '0644'
+  notify: restart victoriametrics
+
+- name: Enable and start VictoriaMetrics service
+  ansible.builtin.systemd:
+    name: victoriametrics
+    enabled: "{{ victoriametrics_service_enabled }}"
+    state: "{{ victoriametrics_service_state }}"
+    daemon_reload: true
diff --git a/roles/metrics/templates/dashboards.yml.j2 b/roles/metrics/templates/dashboards.yml.j2
new file mode 100644
index 0000000..17f0ef4
--- /dev/null
+++ b/roles/metrics/templates/dashboards.yml.j2
@@ -0,0 +1,12 @@
+apiVersion: 1
+
+providers:
+  - name: 'default'
+    orgId: 1
+    folder: ''
+    type: file
+    disableDeletion: false
+    updateIntervalSeconds: 10
+    allowUiUpdates: true
+    options:
+      path: {{ grafana_data_dir }}/dashboards
diff --git a/roles/metrics/templates/datasource-victoriametrics.yml.j2 b/roles/metrics/templates/datasource-victoriametrics.yml.j2
new file mode 100644
index 0000000..618576c
--- /dev/null
+++ b/roles/metrics/templates/datasource-victoriametrics.yml.j2
@@ -0,0 +1,12 @@
+apiVersion: 1
+
+datasources:
+  - name: {{ grafana_datasource_vm_name }}
+    type: prometheus
+    access: proxy
+    url: {{ grafana_datasource_vm_url }}
+    isDefault: true
+    editable: true
+    jsonData:
+      httpMethod: POST
+      timeInterval: 15s
diff --git a/roles/metrics/templates/grafana.caddy.j2 b/roles/metrics/templates/grafana.caddy.j2
new file mode 100644
index 0000000..9814429
--- /dev/null
+++ b/roles/metrics/templates/grafana.caddy.j2
@@ -0,0 +1,26 @@
+# Grafana Metrics Dashboard
+{{ grafana_domain }} {
+    reverse_proxy http://{{ grafana_listen_address }}:{{ grafana_listen_port }} {
+        header_up Host {host}
+        header_up X-Real-IP {remote_host}
+        header_up X-Forwarded-Proto https
+        header_up X-Forwarded-For {remote_host}
+        header_up X-Forwarded-Host {host}
+    }
+
+    # Security headers
+    header {
+        X-Frame-Options SAMEORIGIN
+        X-Content-Type-Options nosniff
+        X-XSS-Protection "1; mode=block"
+        Referrer-Policy strict-origin-when-cross-origin
+        Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
+    }
+
+    # Logging
+    log {
+        output file {{ caddy_log_dir }}/grafana.log
+        level INFO
+        format json
+    }
+}
diff --git a/roles/metrics/templates/grafana.ini.j2 b/roles/metrics/templates/grafana.ini.j2
new file mode 100644
index 0000000..ed48299
--- /dev/null
+++ b/roles/metrics/templates/grafana.ini.j2
@@ -0,0 +1,68 @@
+# Grafana Configuration
+# Managed by Ansible - DO NOT EDIT MANUALLY
+
+[paths]
+data = {{ grafana_data_dir }}
+logs = {{ grafana_logs_dir }}
+plugins = {{ grafana_plugins_dir }}
+provisioning = {{ grafana_provisioning_dir }}
+
+[server]
+http_addr = {{ grafana_listen_address }}
+http_port = {{ grafana_listen_port }}
+domain = {{ grafana_domain }}
+root_url = {{ grafana_root_url }}
+enforce_domain = true
+enable_gzip = true
+
+[database]
+type = {{ grafana_database_type }}
+{% if grafana_database_type == 'sqlite3' %}
+path = {{ grafana_database_path }}
+{% endif %}
+
+[security]
+admin_user = {{ grafana_admin_user }}
+admin_password = {{ grafana_admin_password }}
+secret_key = {{ vault_grafana_secret_key }}
+cookie_secure = {{ grafana_cookie_secure | lower }}
+cookie_samesite = {{ grafana_cookie_samesite }}
+disable_gravatar = true
+disable_initial_admin_creation = false
+
+[users]
+allow_sign_up = {{ grafana_allow_signup | lower }}
+allow_org_create = false
+auto_assign_org = true
+auto_assign_org_role = Viewer
+
+[auth]
+disable_login_form = {{ grafana_disable_login_form | lower }}
+oauth_auto_login = false
+
+{% if grafana_oauth_enabled %}
+[auth.generic_oauth]
+enabled = true
+name = {{ grafana_oauth_name }}
+client_id = {{ grafana_oauth_client_id }}
+client_secret = {{ grafana_oauth_client_secret }}
+scopes = {{ grafana_oauth_scopes }}
+auth_url = {{ grafana_oauth_auth_url }}
+token_url = {{ grafana_oauth_token_url }}
+api_url = {{ grafana_oauth_api_url }}
+allow_sign_up = {{ grafana_oauth_allow_sign_up | lower }}
+role_attribute_path = {{ grafana_oauth_role_attribute_path }}
+use_pkce = true
+{% endif %}
+
+[log]
+mode = console
+level = info
+
+[analytics]
+reporting_enabled = false
+check_for_updates = false
+check_for_plugin_updates = false
+
+[snapshots]
+external_enabled = false
diff --git a/roles/metrics/templates/grafana.service.j2 b/roles/metrics/templates/grafana.service.j2
new file mode 100644
index 0000000..daa55df
--- /dev/null
+++ b/roles/metrics/templates/grafana.service.j2
@@ -0,0 +1,36 @@
+[Unit]
+Description=Grafana visualization platform
+Documentation=https://grafana.com/docs/
+After=network-online.target
+Wants=network-online.target
+
+[Service]
+Type=simple
+User={{ grafana_user }}
+Group={{ grafana_group }}
+
+WorkingDirectory=/opt/grafana
+ExecStart=/opt/grafana/bin/grafana-server \
+    --config=/etc/grafana/grafana.ini \
+    --homepath=/opt/grafana
+
+Restart=on-failure
+RestartSec=5s
+
+# Security hardening
+{% if grafana_systemd_security %}
+NoNewPrivileges=true
+PrivateTmp=true
+ProtectSystem=strict
+ProtectHome=true
+ReadWritePaths={{ grafana_data_dir }} {{ grafana_logs_dir }}
+ProtectKernelTunables=true
+ProtectKernelModules=true
+ProtectControlGroups=true
+RestrictRealtime=true
+RestrictNamespaces=true
+LockPersonality=true
+{% endif %}
+
+[Install]
+WantedBy=multi-user.target
diff --git a/roles/metrics/templates/node_exporter.service.j2 b/roles/metrics/templates/node_exporter.service.j2
new file mode 100644
index 0000000..8b76b81
--- /dev/null
+++ b/roles/metrics/templates/node_exporter.service.j2
@@ -0,0 +1,42 @@
+[Unit]
+Description=Prometheus Node Exporter
+Documentation=https://github.com/prometheus/node_exporter
+After=network-online.target
+Wants=network-online.target
+
+[Service]
+Type=simple
+User={{ node_exporter_user }}
+Group={{ node_exporter_group }}
+
+ExecStart=/usr/local/bin/node_exporter \
+    --web.listen-address={{ node_exporter_listen_address }} \
+{% for collector in node_exporter_enabled_collectors %}
+    --collector.{{ collector }} \
+{% endfor %}
+{% for collector in node_exporter_disabled_collectors %}
+    --no-collector.{{ collector }} \
+{% endfor %}
+    --collector.filesystem.fs-types-exclude="{{ node_exporter_filesystem_ignored_fs_types | join('|') }}" \
+    --collector.filesystem.mount-points-exclude="{{ node_exporter_filesystem_ignored_mount_points | join('|') }}"
+
+Restart=on-failure
+RestartSec=5s
+
+# Security hardening
+{% if node_exporter_systemd_security %}
+NoNewPrivileges=true
+PrivateTmp=true
+ProtectSystem=strict
+ProtectHome=true
+ProtectKernelTunables=true
+ProtectKernelModules=true
+ProtectControlGroups=true
+RestrictRealtime=true
+RestrictNamespaces=true
+LockPersonality=true
+ReadOnlyPaths=/
+{% endif %}
+
+[Install]
+WantedBy=multi-user.target
diff --git a/roles/metrics/templates/scrape.yml.j2 b/roles/metrics/templates/scrape.yml.j2
new file mode 100644
index 0000000..46aad4b
--- /dev/null
+++ b/roles/metrics/templates/scrape.yml.j2
@@ -0,0 +1,22 @@
+global:
+  scrape_interval: {{ victoriametrics_scrape_interval }}
+  scrape_timeout: {{ victoriametrics_scrape_timeout }}
+  external_labels:
+    environment: '{{ "homelab" if inventory_hostname in groups["homelab"] else "production" }}'
+    host: '{{ inventory_hostname }}'
+
+scrape_configs:
+  # VictoriaMetrics self-monitoring
+  - job_name: 'victoriametrics'
+    static_configs:
+      - targets: ['{{ victoriametrics_listen_address }}']
+        labels:
+          service: 'victoriametrics'
+
+  # Node exporter for system metrics
+  - job_name: 'node'
+    static_configs:
+      - targets: ['{{ node_exporter_listen_address }}']
+        labels:
+          service: 'node_exporter'
+          instance: '{{ inventory_hostname }}'
diff --git a/roles/metrics/templates/victoriametrics.service.j2 b/roles/metrics/templates/victoriametrics.service.j2
new file mode 100644
index 0000000..4f32988
--- /dev/null
+++ b/roles/metrics/templates/victoriametrics.service.j2
@@ -0,0 +1,41 @@
+[Unit]
+Description=VictoriaMetrics time-series database
+Documentation=https://docs.victoriametrics.com/
+After=network-online.target
+Wants=network-online.target
+
+[Service]
+Type=simple
+User={{ victoriametrics_user }}
+Group={{ victoriametrics_group }}
+
+ExecStart=/usr/local/bin/victoria-metrics-prod \
+    -storageDataPath={{ victoriametrics_data_dir }} \
+    -retentionPeriod={{ victoriametrics_retention_period }} \
+    -httpListenAddr={{ victoriametrics_listen_address }} \
+    -promscrape.config={{ victoriametrics_scrape_config_file }} \
+    -memory.allowedPercent={{ victoriametrics_memory_allowed_percent }} \
+    -storage.minFreeDiskSpaceBytes={{ victoriametrics_storage_min_free_disk_space_bytes }}
+
+ExecReload=/bin/kill -HUP $MAINPID
+
+Restart=on-failure
+RestartSec=5s
+
+# Security hardening
+{% if victoriametrics_systemd_security %}
+NoNewPrivileges=true
+PrivateTmp=true
+ProtectSystem=strict
+ProtectHome=true
+ReadWritePaths={{ victoriametrics_data_dir }}
+ProtectKernelTunables=true
+ProtectKernelModules=true
+ProtectControlGroups=true
+RestrictRealtime=true
+RestrictNamespaces=true
+LockPersonality=true
+{% endif %}
+
+[Install]
+WantedBy=multi-user.target