Add metrics monitoring stack with VictoriaMetrics, Grafana, and node_exporter

Implement complete monitoring infrastructure following rick-infra principles:

Components:
- VictoriaMetrics: Prometheus-compatible TSDB (7x less RAM usage)
- Grafana: Visualization dashboard with Authentik OAuth/OIDC integration
- node_exporter: System metrics collection (CPU, memory, disk, network)

Architecture:
- All services run as native systemd binaries (no containers)
- localhost-only binding for security
- Grafana uses native OAuth integration with Authentik (not forward_auth)
- Full systemd security hardening enabled
- Proxied via Caddy at metrics.jnss.me with HTTPS

Role Features:
- Unified metrics role (single role for complete stack)
- Automatic role mapping via Authentik groups:
  - authentik Admins OR grafana-admins -> Admin access
  - grafana-editors -> Editor access
  - All others -> Viewer access
- VictoriaMetrics auto-provisioned as default Grafana datasource
- 12-month metrics retention by default
- Comprehensive documentation included

Security:
- OAuth/OIDC SSO via Authentik
- All metrics services bind to 127.0.0.1 only
- systemd hardening (NoNewPrivileges, ProtectSystem, etc.)
- Grafana accessible only via Caddy HTTPS proxy

Documentation:
- roles/metrics/README.md: Complete role documentation
- docs/metrics-deployment-guide.md: Step-by-step deployment guide

Configuration:
- Updated rick-infra.yml to include metrics deployment
- Grafana port set to 3001 (Gitea uses 3000)
- Ready for multi-host expansion (designed for future node_exporter deployment to production hosts)
This commit is contained in:
2025-12-28 19:18:30 +01:00
parent eca3b4a726
commit 1f3f111d88
19 changed files with 1345 additions and 4 deletions

View File

@@ -0,0 +1,311 @@
# Metrics Stack Deployment Guide
Complete guide to deploying the monitoring stack (VictoriaMetrics, Grafana, node_exporter) on rick-infra.
## Overview
The metrics stack provides:
- **System monitoring**: CPU, memory, disk, network via node_exporter
- **Time-series storage**: VictoriaMetrics (Prometheus-compatible, 7x less RAM)
- **Visualization**: Grafana with Authentik SSO integration
- **Access**: `https://metrics.jnss.me` with role-based permissions
## Architecture
```
User → metrics.jnss.me (HTTPS)
Caddy (Reverse Proxy)
Grafana (OAuth → Authentik for SSO)
VictoriaMetrics (Time-series DB)
node_exporter (System Metrics)
```
All services run on localhost only, following rick-infra security principles.
## Prerequisites
### 1. Caddy Deployed
```bash
ansible-playbook rick-infra.yml --tags caddy
```
### 2. Authentik Deployed
```bash
ansible-playbook rick-infra.yml --tags authentik
```
### 3. DNS Configuration
Ensure `metrics.jnss.me` points to arch-vps IP:
```bash
dig metrics.jnss.me # Should return 69.62.119.31
```
## Step 1: Configure Authentik OAuth Provider
### Create OAuth2/OIDC Provider
1. Login to Authentik at `https://auth.jnss.me`
2. Navigate to **Applications → Providers****Create**
3. Configure provider:
- **Name**: `Grafana`
- **Type**: `OAuth2/OpenID Provider`
- **Authentication flow**: `default-authentication-flow`
- **Authorization flow**: `default-provider-authorization-explicit-consent`
- **Client type**: `Confidential`
- **Client ID**: `grafana`
- **Client Secret**: Click **Generate** and **copy the secret**
- **Redirect URIs**: `https://metrics.jnss.me/login/generic_oauth`
- **Signing Key**: Select auto-generated key
- **Scopes**: `openid`, `profile`, `email`, `groups`
4. Click **Finish**
### Create Application
1. Navigate to **Applications****Create**
2. Configure application:
- **Name**: `Grafana`
- **Slug**: `grafana`
- **Provider**: Select `Grafana` provider created above
- **Launch URL**: `https://metrics.jnss.me`
3. Click **Create**
### Create Groups (Optional)
For role-based access control:
1. Navigate to **Directory → Groups****Create**
2. Create groups:
- **grafana-admins**: Full admin access to Grafana
- **grafana-editors**: Can create/edit dashboards
- All other users get Viewer access
3. Add users to groups as needed
## Step 2: Configure Vault Variables
Edit vault file:
```bash
ansible-vault edit host_vars/arch-vps/vault.yml
```
Add these variables:
```yaml
# Grafana admin password (for emergency local login)
vault_grafana_admin_password: "your-secure-admin-password"
# Grafana secret key (generate with: openssl rand -base64 32)
vault_grafana_secret_key: "your-random-32-char-secret-key"
# OAuth credentials from Authentik
vault_grafana_oauth_client_id: "grafana"
vault_grafana_oauth_client_secret: "paste-secret-from-authentik-here"
```
Save and close (`:wq` in vim).
## Step 3: Deploy Metrics Stack
Deploy all components:
```bash
ansible-playbook rick-infra.yml --tags metrics
```
This will:
1. Install and configure VictoriaMetrics
2. Install and configure node_exporter
3. Install and configure Grafana with OAuth
4. Deploy Caddy configuration for `metrics.jnss.me`
Expected output:
```
PLAY RECAP *******************************************************
arch-vps : ok=25 changed=15 unreachable=0 failed=0 skipped=0
```
## Step 4: Verify Deployment
### Check Services
SSH to arch-vps and verify services:
```bash
# Check all services are running
systemctl status victoriametrics grafana node_exporter
# Check service health
curl http://127.0.0.1:8428/health # VictoriaMetrics
curl http://127.0.0.1:9100/metrics # node_exporter
curl http://127.0.0.1:3000/api/health # Grafana
```
### Check HTTPS Access
```bash
curl -I https://metrics.jnss.me
# Should return 200 or 302 (redirect to Authentik)
```
### Check Metrics Collection
```bash
# Check VictoriaMetrics scrape targets
curl http://127.0.0.1:8428/api/v1/targets
# Should show node_exporter as "up"
```
## Step 5: Access Grafana
1. Navigate to `https://metrics.jnss.me`
2. Click **"Sign in with Authentik"**
3. Login with your Authentik credentials
4. You should be redirected to Grafana dashboard
First login will:
- Auto-create your Grafana user
- Assign role based on Authentik group membership
- Grant access to default organization
## Step 6: Verify Data Source
1. In Grafana, navigate to **Connections → Data sources**
2. Verify **VictoriaMetrics** is listed and default
3. Click on VictoriaMetrics → **Save & test**
4. Should show green "Data source is working" message
## Step 7: Create First Dashboard
### Option 1: Import Community Dashboard (Recommended)
1. Navigate to **Dashboards → Import**
2. Enter dashboard ID: `1860` (Node Exporter Full)
3. Click **Load**
4. Select **VictoriaMetrics** as data source
5. Click **Import**
You now have a comprehensive system monitoring dashboard!
### Option 2: Create Custom Dashboard
1. Navigate to **Dashboards → New → New Dashboard**
2. Click **Add visualization**
3. Select **VictoriaMetrics** data source
4. Enter PromQL query:
```promql
# CPU usage
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
```
5. Click **Apply**
## Step 8: Configure Alerting (Optional)
Grafana supports alerting on metrics. Configure via **Alerting → Alert rules**.
Example alert for high CPU:
```promql
avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100 < 20
```
## Troubleshooting
### OAuth Login Fails
**Symptom**: Redirect to Authentik, but returns error after login
**Solution**:
1. Verify redirect URI in Authentik matches exactly: `https://metrics.jnss.me/login/generic_oauth`
2. Check Grafana logs: `journalctl -u grafana -f`
3. Verify OAuth credentials in vault match Authentik
### No Metrics in Grafana
**Symptom**: Data source working, but no data in dashboards
**Solution**:
1. Check VictoriaMetrics targets: `curl http://127.0.0.1:8428/api/v1/targets`
2. Verify node_exporter is up: `systemctl status node_exporter`
3. Check time range in Grafana (top right) - try "Last 5 minutes"
### Can't Access metrics.jnss.me
**Symptom**: Connection timeout or SSL error
**Solution**:
1. Verify DNS: `dig metrics.jnss.me`
2. Check Caddy is running: `systemctl status caddy`
3. Check Caddy logs: `journalctl -u caddy -f`
4. Verify Caddy config loaded: `ls /etc/caddy/sites/grafana.caddy`
### Wrong Grafana Role
**Symptom**: User has wrong permissions (e.g., Viewer instead of Admin)
**Solution**:
1. Verify user is in correct Authentik group (`grafana-admins` or `grafana-editors`)
2. Logout of Grafana and login again
3. Check role mapping expression in `roles/metrics/defaults/main.yml`:
```yaml
grafana_oauth_role_attribute_path: "contains(groups, 'grafana-admins') && 'Admin' || contains(groups, 'grafana-editors') && 'Editor' || 'Viewer'"
```
## Next Steps
### Add More Hosts
To monitor additional hosts (e.g., mini-vps):
1. Deploy node_exporter to target host
2. Update VictoriaMetrics scrape config to include remote targets
3. Configure remote_write or federation
### Add Service Metrics
To monitor containerized services:
1. Expose `/metrics` endpoint in application (port 8080)
2. Add scrape config in `roles/metrics/templates/scrape.yml.j2`:
```yaml
- job_name: 'myservice'
static_configs:
- targets: ['127.0.0.1:8080']
```
3. Redeploy metrics role
### Set Up Alerting
1. Configure notification channels in Grafana (Email, Slack, etc.)
2. Create alert rules for critical metrics
3. Set up on-call rotation if needed
## Security Notes
- All metrics services run on localhost only
- Grafana is the only internet-facing component (via Caddy HTTPS)
- OAuth provides SSO with Authentik (no separate Grafana passwords)
- systemd hardening enabled on all services
- Default admin account should only be used for emergencies
## Resources
- **VictoriaMetrics Docs**: https://docs.victoriametrics.com/
- **Grafana Docs**: https://grafana.com/docs/
- **PromQL Guide**: https://prometheus.io/docs/prometheus/latest/querying/basics/
- **Dashboard Library**: https://grafana.com/grafana/dashboards/
- **Authentik OAuth**: https://goauthentik.io/docs/providers/oauth2/
## Support
For issues specific to rick-infra metrics deployment:
1. Check service logs: `journalctl -u <service> -f`
2. Review role README: `roles/metrics/README.md`
3. Verify vault variables are correctly set
4. Ensure Authentik OAuth provider is properly configured

View File

@@ -10,9 +10,11 @@
# - Authentik SSO/authentication
# - Gitea git hosting
# - Vaultwarden password manager
# - Metrics (VictoriaMetrics, Grafana, node_exporter)
#
# Usage:
# ansible-playbook playbooks/homelab.yml
# ansible-playbook rick-infra.yml
# ansible-playbook rick-infra.yml --tags metrics
# - import_playbook: playbooks/security.yml
- name: Deploy Homelab Infrastructure
@@ -21,10 +23,16 @@
gather_facts: true
tasks:
- name: Deploy Caddy
# - name: Deploy Caddy
# include_role:
# name: caddy
# tags: ['caddy']
- name: Deploy Metrics Stack
include_role:
name: caddy
tags: ['caddy']
name: metrics
tags: ['metrics', 'monitoring', 'grafana', 'victoriametrics']
# - name: Deploy Authentik
# include_role:
# name: authentik

325
roles/metrics/README.md Normal file
View File

@@ -0,0 +1,325 @@
# Metrics Role
Complete monitoring stack for rick-infra providing system metrics collection, storage, and visualization with SSO integration.
## Components
### VictoriaMetrics
- **Purpose**: Time-series database for metrics storage
- **Type**: Native systemd service
- **Listen**: `127.0.0.1:8428` (localhost only)
- **Features**:
- Prometheus-compatible API and PromQL
- 7x less RAM usage than Prometheus
- Single binary deployment
- 12-month data retention by default
### Grafana
- **Purpose**: Metrics visualization and dashboarding
- **Type**: Native systemd service
- **Listen**: `127.0.0.1:3000` (localhost only, proxied via Caddy)
- **Domain**: `metrics.jnss.me`
- **Features**:
- OAuth/OIDC integration with Authentik
- Role-based access control via Authentik groups
- VictoriaMetrics as default data source
### node_exporter
- **Purpose**: System metrics collection
- **Type**: Native systemd service
- **Listen**: `127.0.0.1:9100` (localhost only)
- **Metrics**: CPU, memory, disk, network, systemd units
## Architecture
```
┌─────────────────────────────────────────────────────┐
│ metrics.jnss.me (Grafana Dashboard) │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Caddy (HTTPS) │ │
│ │ ↓ │ │
│ │ Grafana (OAuth → Authentik) │ │
│ │ ↓ │ │
│ │ VictoriaMetrics (Prometheus-compatible) │ │
│ │ ↑ │ │
│ │ node_exporter (System Metrics) │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
```
## Deployment
### Prerequisites
1. **Caddy role deployed** - Required for HTTPS proxy
2. **Authentik deployed** - Required for OAuth/SSO
3. **Vault variables configured**:
```yaml
# In host_vars/arch-vps/vault.yml
vault_grafana_admin_password: "secure-admin-password"
vault_grafana_secret_key: "random-secret-key-32-chars"
vault_grafana_oauth_client_id: "grafana"
vault_grafana_oauth_client_secret: "oauth-client-secret-from-authentik"
```
### Authentik Configuration
Before deployment, create OAuth2/OIDC provider in Authentik:
1. **Create Provider**:
- Name: `Grafana`
- Type: `OAuth2/OpenID Provider`
- Client ID: `grafana`
- Client Secret: Generate and save to vault
- Redirect URIs: `https://metrics.jnss.me/login/generic_oauth`
- Signing Key: Auto-generated
2. **Create Application**:
- Name: `Grafana`
- Slug: `grafana`
- Provider: Select Grafana provider created above
3. **Create Groups** (optional, for role mapping):
- `grafana-admins` - Full admin access
- `grafana-editors` - Can create/edit dashboards
- Users without these groups get Viewer access
### Deploy
```bash
# Deploy complete metrics stack
ansible-playbook rick-infra.yml --tags metrics
# Deploy individual components
ansible-playbook rick-infra.yml --tags victoriametrics
ansible-playbook rick-infra.yml --tags grafana
ansible-playbook rick-infra.yml --tags node_exporter
```
### Verify Deployment
```bash
# Check service status
ansible homelab -a "systemctl status victoriametrics grafana node_exporter"
# Check metrics collection
curl http://127.0.0.1:9100/metrics # node_exporter metrics
curl http://127.0.0.1:8428/metrics # VictoriaMetrics metrics
curl http://127.0.0.1:8428/api/v1/targets # Scrape targets
# Access Grafana
curl -I https://metrics.jnss.me/ # Should redirect to Authentik login
```
## Usage
### Access Dashboard
1. Navigate to `https://metrics.jnss.me`
2. Click "Sign in with Authentik"
3. Authenticate via Authentik SSO
4. Access granted based on Authentik group membership
### Role Mapping
Grafana roles are automatically assigned based on Authentik groups:
- **Admin**: Members of `grafana-admins` group
- Full administrative access
- Can manage users, data sources, plugins
- Can create/edit/delete all dashboards
- **Editor**: Members of `grafana-editors` group
- Can create and edit dashboards
- Cannot manage users or data sources
- **Viewer**: All other authenticated users
- Read-only access to dashboards
- Cannot create or edit dashboards
### Creating Dashboards
Grafana comes with VictoriaMetrics pre-configured as the default data source. Use PromQL queries:
```promql
# CPU usage
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Memory usage
node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes
# Disk usage
100 - ((node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100)
# Network traffic
irate(node_network_receive_bytes_total[5m])
```
### Import Community Dashboards
1. Browse dashboards at https://grafana.com/grafana/dashboards/
2. Recommended for node_exporter:
- Dashboard ID: 1860 (Node Exporter Full)
- Dashboard ID: 11074 (Node Exporter for Prometheus)
3. Import via Grafana UI: Dashboards → Import → Enter ID
## Configuration
### Customization
Key configuration options in `roles/metrics/defaults/main.yml`:
```yaml
# Data retention
victoriametrics_retention_period: "12" # months
# Scrape interval
victoriametrics_scrape_interval: "15s"
# OAuth role mapping (JMESPath expression)
grafana_oauth_role_attribute_path: "contains(groups, 'grafana-admins') && 'Admin' || contains(groups, 'grafana-editors') && 'Editor' || 'Viewer'"
# Memory limits
victoriametrics_memory_allowed_percent: "60"
```
### Adding Scrape Targets
Edit `roles/metrics/templates/scrape.yml.j2`:
```yaml
scrape_configs:
# Add custom application metrics
- job_name: 'myapp'
static_configs:
- targets: ['127.0.0.1:8080']
labels:
service: 'myapp'
```
## Operations
### Service Management
```bash
# VictoriaMetrics
systemctl status victoriametrics
systemctl restart victoriametrics
journalctl -u victoriametrics -f
# Grafana
systemctl status grafana
systemctl restart grafana
journalctl -u grafana -f
# node_exporter
systemctl status node_exporter
systemctl restart node_exporter
journalctl -u node_exporter -f
```
### Data Locations
```
/var/lib/victoriametrics/ # Time-series data
/var/lib/grafana/ # Grafana database and dashboards
/var/log/grafana/ # Grafana logs
/etc/victoriametrics/ # VictoriaMetrics config
/etc/grafana/ # Grafana config
```
### Backup
VictoriaMetrics data is stored in `/var/lib/victoriametrics`:
```bash
# Stop service
systemctl stop victoriametrics
# Backup data
tar -czf victoriametrics-backup-$(date +%Y%m%d).tar.gz /var/lib/victoriametrics
# Start service
systemctl start victoriametrics
```
Grafana dashboards are stored in SQLite database at `/var/lib/grafana/grafana.db`:
```bash
# Backup Grafana
systemctl stop grafana
tar -czf grafana-backup-$(date +%Y%m%d).tar.gz /var/lib/grafana /etc/grafana
systemctl start grafana
```
## Security
### Authentication
- Grafana protected by Authentik OAuth/OIDC
- Local admin account available for emergency access
- All services bind to localhost only
### Network Security
- VictoriaMetrics: `127.0.0.1:8428` (no external access)
- Grafana: `127.0.0.1:3000` (proxied via Caddy with HTTPS)
- node_exporter: `127.0.0.1:9100` (no external access)
### systemd Hardening
All services run with security restrictions:
- `NoNewPrivileges=true`
- `ProtectSystem=strict`
- `ProtectHome=true`
- `PrivateTmp=true`
- Read-only filesystem (except data directories)
## Troubleshooting
### Grafana OAuth Not Working
1. Check Authentik provider configuration:
```bash
# Verify redirect URI matches
# https://metrics.jnss.me/login/generic_oauth
```
2. Check Grafana logs:
```bash
journalctl -u grafana -f
```
3. Verify OAuth credentials in vault match Authentik
### No Metrics in Grafana
1. Check VictoriaMetrics scrape targets:
```bash
curl http://127.0.0.1:8428/api/v1/targets
```
2. Check node_exporter is running:
```bash
systemctl status node_exporter
curl http://127.0.0.1:9100/metrics
```
3. Check VictoriaMetrics logs:
```bash
journalctl -u victoriametrics -f
```
### High Memory Usage
VictoriaMetrics is configured to use max 60% of available memory. Adjust if needed:
```yaml
# In roles/metrics/defaults/main.yml
victoriametrics_memory_allowed_percent: "40" # Reduce to 40%
```
## See Also
- [VictoriaMetrics Documentation](https://docs.victoriametrics.com/)
- [Grafana Documentation](https://grafana.com/docs/)
- [node_exporter GitHub](https://github.com/prometheus/node_exporter)
- [PromQL Documentation](https://prometheus.io/docs/prometheus/latest/querying/basics/)
- [Authentik OAuth Integration](https://goauthentik.io/docs/providers/oauth2/)

View File

@@ -0,0 +1,178 @@
---
# =================================================================
# Metrics Infrastructure Role - Complete Monitoring Stack
# =================================================================
# Provides VictoriaMetrics, Grafana, and node_exporter as unified stack
# =================================================================
# VictoriaMetrics Configuration
# =================================================================
# Service Management
victoriametrics_service_enabled: true
victoriametrics_service_state: "started"
# Version
victoriametrics_version: "1.105.0"
# Network Security (localhost only)
victoriametrics_listen_address: "127.0.0.1:8428"
# Storage Configuration
victoriametrics_data_dir: "/var/lib/victoriametrics"
victoriametrics_retention_period: "12" # months
# User/Group
victoriametrics_user: "victoriametrics"
victoriametrics_group: "victoriametrics"
# Performance Settings
victoriametrics_memory_allowed_percent: "30"
victoriametrics_storage_min_free_disk_space_bytes: "10GB"
# Scrape Configuration
victoriametrics_scrape_config_dir: "/etc/victoriametrics"
victoriametrics_scrape_config_file: "{{ victoriametrics_scrape_config_dir }}/scrape.yml"
victoriametrics_scrape_interval: "15s"
victoriametrics_scrape_timeout: "10s"
# systemd security
victoriametrics_systemd_security: true
# =================================================================
# Grafana Configuration
# =================================================================
# Service Management
grafana_service_enabled: true
grafana_service_state: "started"
# Version
grafana_version: "11.4.0"
# Network Security (localhost only - proxied via Caddy)
grafana_listen_address: "127.0.0.1"
grafana_listen_port: 3420
# User/Group
grafana_user: "grafana"
grafana_group: "grafana"
# Directories
grafana_data_dir: "/var/lib/grafana"
grafana_logs_dir: "/var/log/grafana"
grafana_plugins_dir: "/var/lib/grafana/plugins"
grafana_provisioning_dir: "/etc/grafana/provisioning"
# Domain Configuration
grafana_domain: "metrics.{{ caddy_domain }}"
grafana_root_url: "https://{{ grafana_domain }}"
# Default admin (used only for initial setup)
grafana_admin_user: "admin"
grafana_admin_password: "{{ vault_grafana_admin_password }}"
# Disable registration (OAuth only)
grafana_allow_signup: false
grafana_disable_login_form: false # Keep fallback login
# OAuth/OIDC Configuration (Authentik)
grafana_oauth_enabled: true
grafana_oauth_name: "Authentik"
grafana_oauth_client_id: "{{ vault_grafana_oauth_client_id }}"
grafana_oauth_client_secret: "{{ vault_grafana_oauth_client_secret }}"
# Authentik OAuth endpoints
grafana_oauth_auth_url: "https://{{ authentik_domain }}/application/o/authorize/"
grafana_oauth_token_url: "https://{{ authentik_domain }}/application/o/token/"
grafana_oauth_api_url: "https://{{ authentik_domain }}/application/o/userinfo/"
# OAuth role mapping
grafana_oauth_role_attribute_path: "(contains(groups, 'authentik Admins') || contains(groups, 'grafana-admins')) && 'Admin' || contains(groups, 'grafana-editors') && 'Editor' || 'Viewer'"
grafana_oauth_allow_sign_up: true # Auto-create users from OAuth
grafana_oauth_scopes: "openid profile email groups"
# Data Source Configuration
grafana_datasource_vm_enabled: true
grafana_datasource_vm_url: "http://{{ victoriametrics_listen_address }}"
grafana_datasource_vm_name: "VictoriaMetrics"
# Security
grafana_systemd_security: true
grafana_cookie_secure: true
grafana_cookie_samesite: "lax"
# Database (SQLite by default)
grafana_database_type: "sqlite3"
grafana_database_path: "{{ grafana_data_dir }}/grafana.db"
# =================================================================
# Node Exporter Configuration
# =================================================================
# Service Management
node_exporter_service_enabled: true
node_exporter_service_state: "started"
# Version
node_exporter_version: "1.8.2"
# Network Security (localhost only)
node_exporter_listen_address: "127.0.0.1:9100"
# User/Group
node_exporter_user: "node_exporter"
node_exporter_group: "node_exporter"
# Enabled collectors
node_exporter_enabled_collectors:
- cpu
- diskstats
- filesystem
- loadavg
- meminfo
- netdev
- netstat
- stat
- time
- uname
- vmstat
- systemd
# Disabled collectors
node_exporter_disabled_collectors:
- mdadm
# Filesystem collector configuration
node_exporter_filesystem_ignored_fs_types:
- tmpfs
- devtmpfs
- devfs
- iso9660
- overlay
- aufs
- squashfs
node_exporter_filesystem_ignored_mount_points:
- /var/lib/containers/storage/.*
- /run/.*
- /sys/.*
- /proc/.*
# systemd security
node_exporter_systemd_security: true
# =================================================================
# Infrastructure Notes
# =================================================================
# Complete monitoring stack:
# - VictoriaMetrics: Time-series database (Prometheus-compatible)
# - Grafana: Visualization with Authentik OAuth integration
# - node_exporter: System metrics collection
#
# Role mapping via Authentik groups:
# - grafana-admins: Full admin access
# - grafana-editors: Can create/edit dashboards
# - Default: Viewer access
#
# All services run on localhost only, proxied via Caddy

View File

@@ -0,0 +1,23 @@
---
- name: restart victoriametrics
ansible.builtin.systemd:
name: victoriametrics
state: restarted
daemon_reload: true
- name: restart node_exporter
ansible.builtin.systemd:
name: node_exporter
state: restarted
daemon_reload: true
- name: restart grafana
ansible.builtin.systemd:
name: grafana
state: restarted
daemon_reload: true
- name: reload caddy
ansible.builtin.systemd:
name: caddy
state: reloaded

View File

@@ -0,0 +1,3 @@
---
dependencies:
- role: caddy

View File

@@ -0,0 +1,9 @@
---
- name: Deploy Grafana Caddy configuration
ansible.builtin.template:
src: grafana.caddy.j2
dest: /etc/caddy/sites-enabled/grafana.caddy
owner: caddy
group: caddy
mode: '0644'
notify: reload caddy

View File

@@ -0,0 +1,90 @@
---
- name: Create Grafana system user
ansible.builtin.user:
name: "{{ grafana_user }}"
system: true
create_home: false
shell: /usr/sbin/nologin
state: present
- name: Create Grafana directories
ansible.builtin.file:
path: "{{ item }}"
state: directory
owner: "{{ grafana_user }}"
group: "{{ grafana_group }}"
mode: '0755'
loop:
- "{{ grafana_data_dir }}"
- "{{ grafana_logs_dir }}"
- "{{ grafana_plugins_dir }}"
- "{{ grafana_provisioning_dir }}"
- "{{ grafana_provisioning_dir }}/datasources"
- "{{ grafana_provisioning_dir }}/dashboards"
- "{{ grafana_data_dir }}/dashboards"
- /etc/grafana
- name: Download Grafana binary
ansible.builtin.get_url:
url: "https://dl.grafana.com/oss/release/grafana-{{ grafana_version }}.linux-amd64.tar.gz"
dest: "/tmp/grafana-{{ grafana_version }}.tar.gz"
mode: '0644'
register: grafana_download
- name: Extract Grafana
ansible.builtin.unarchive:
src: "/tmp/grafana-{{ grafana_version }}.tar.gz"
dest: /opt
remote_src: true
creates: "/opt/grafana-v{{ grafana_version }}"
when: grafana_download.changed
- name: Create Grafana symlink
ansible.builtin.file:
src: "/opt/grafana-v{{ grafana_version }}"
dest: /opt/grafana
state: link
- name: Deploy Grafana configuration
ansible.builtin.template:
src: grafana.ini.j2
dest: /etc/grafana/grafana.ini
owner: "{{ grafana_user }}"
group: "{{ grafana_group }}"
mode: '0640'
notify: restart grafana
- name: Deploy VictoriaMetrics datasource provisioning
ansible.builtin.template:
src: datasource-victoriametrics.yml.j2
dest: "{{ grafana_provisioning_dir }}/datasources/victoriametrics.yml"
owner: "{{ grafana_user }}"
group: "{{ grafana_group }}"
mode: '0644'
notify: restart grafana
when: grafana_datasource_vm_enabled
- name: Deploy dashboard provisioning
ansible.builtin.template:
src: dashboards.yml.j2
dest: "{{ grafana_provisioning_dir }}/dashboards/default.yml"
owner: "{{ grafana_user }}"
group: "{{ grafana_group }}"
mode: '0644'
notify: restart grafana
- name: Deploy Grafana systemd service
ansible.builtin.template:
src: grafana.service.j2
dest: /etc/systemd/system/grafana.service
owner: root
group: root
mode: '0644'
notify: restart grafana
- name: Enable and start Grafana service
ansible.builtin.systemd:
name: grafana
enabled: "{{ grafana_service_enabled }}"
state: "{{ grafana_service_state }}"
daemon_reload: true

View File

@@ -0,0 +1,20 @@
---
# =================================================================
# Metrics Stack Deployment
# =================================================================
- name: Deploy VictoriaMetrics
ansible.builtin.include_tasks: victoriametrics.yml
tags: [metrics, victoriametrics]
- name: Deploy node_exporter
ansible.builtin.include_tasks: node_exporter.yml
tags: [metrics, node_exporter]
- name: Deploy Grafana
ansible.builtin.include_tasks: grafana.yml
tags: [metrics, grafana]
- name: Deploy Caddy configuration for Grafana
ansible.builtin.include_tasks: caddy.yml
tags: [metrics, caddy]

View File

@@ -0,0 +1,49 @@
---
- name: Create node_exporter system user
ansible.builtin.user:
name: "{{ node_exporter_user }}"
system: true
create_home: false
shell: /usr/sbin/nologin
state: present
- name: Download node_exporter binary
ansible.builtin.get_url:
url: "https://github.com/prometheus/node_exporter/releases/download/v{{ node_exporter_version }}/node_exporter-{{ node_exporter_version }}.linux-amd64.tar.gz"
dest: "/tmp/node_exporter-{{ node_exporter_version }}.tar.gz"
mode: '0644'
register: node_exporter_download
- name: Extract node_exporter binary
ansible.builtin.unarchive:
src: "/tmp/node_exporter-{{ node_exporter_version }}.tar.gz"
dest: /tmp
remote_src: true
creates: "/tmp/node_exporter-{{ node_exporter_version }}.linux-amd64"
when: node_exporter_download.changed
- name: Copy node_exporter binary to /usr/local/bin
ansible.builtin.copy:
src: "/tmp/node_exporter-{{ node_exporter_version }}.linux-amd64/node_exporter"
dest: /usr/local/bin/node_exporter
owner: root
group: root
mode: '0755'
remote_src: true
when: node_exporter_download.changed
- name: Deploy node_exporter systemd service
ansible.builtin.template:
src: node_exporter.service.j2
dest: /etc/systemd/system/node_exporter.service
owner: root
group: root
mode: '0644'
notify: restart node_exporter
- name: Enable and start node_exporter service
ansible.builtin.systemd:
name: node_exporter
enabled: "{{ node_exporter_service_enabled }}"
state: "{{ node_exporter_service_state }}"
daemon_reload: true

View File

@@ -0,0 +1,66 @@
---
- name: Create VictoriaMetrics system user
ansible.builtin.user:
name: "{{ victoriametrics_user }}"
system: true
create_home: false
shell: /usr/sbin/nologin
state: present
- name: Create VictoriaMetrics directories
ansible.builtin.file:
path: "{{ item }}"
state: directory
owner: "{{ victoriametrics_user }}"
group: "{{ victoriametrics_group }}"
mode: '0755'
loop:
- "{{ victoriametrics_data_dir }}"
- "{{ victoriametrics_scrape_config_dir }}"
- name: Download VictoriaMetrics binary
ansible.builtin.get_url:
url: "https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v{{ victoriametrics_version }}/victoria-metrics-linux-amd64-v{{ victoriametrics_version }}.tar.gz"
dest: "/tmp/victoria-metrics-v{{ victoriametrics_version }}.tar.gz"
mode: '0644'
register: victoriametrics_download
- name: Extract VictoriaMetrics binary
ansible.builtin.unarchive:
src: "/tmp/victoria-metrics-v{{ victoriametrics_version }}.tar.gz"
dest: /usr/local/bin
remote_src: true
creates: /usr/local/bin/victoria-metrics-prod
when: victoriametrics_download.changed
- name: Set VictoriaMetrics binary permissions
ansible.builtin.file:
path: /usr/local/bin/victoria-metrics-prod
owner: root
group: root
mode: '0755'
- name: Deploy VictoriaMetrics scrape configuration
ansible.builtin.template:
src: scrape.yml.j2
dest: "{{ victoriametrics_scrape_config_file }}"
owner: "{{ victoriametrics_user }}"
group: "{{ victoriametrics_group }}"
mode: '0644'
notify: restart victoriametrics
- name: Deploy VictoriaMetrics systemd service
ansible.builtin.template:
src: victoriametrics.service.j2
dest: /etc/systemd/system/victoriametrics.service
owner: root
group: root
mode: '0644'
notify: restart victoriametrics
- name: Enable and start VictoriaMetrics service
ansible.builtin.systemd:
name: victoriametrics
enabled: "{{ victoriametrics_service_enabled }}"
state: "{{ victoriametrics_service_state }}"
daemon_reload: true

View File

@@ -0,0 +1,12 @@
apiVersion: 1
providers:
- name: 'default'
orgId: 1
folder: ''
type: file
disableDeletion: false
updateIntervalSeconds: 10
allowUiUpdates: true
options:
path: {{ grafana_data_dir }}/dashboards

View File

@@ -0,0 +1,12 @@
apiVersion: 1
datasources:
- name: {{ grafana_datasource_vm_name }}
type: prometheus
access: proxy
url: {{ grafana_datasource_vm_url }}
isDefault: true
editable: true
jsonData:
httpMethod: POST
timeInterval: 15s

View File

@@ -0,0 +1,26 @@
# Grafana Metrics Dashboard
{{ grafana_domain }} {
reverse_proxy http://{{ grafana_listen_address }}:{{ grafana_listen_port }} {
header_up Host {host}
header_up X-Real-IP {remote_host}
header_up X-Forwarded-Proto https
header_up X-Forwarded-For {remote_host}
header_up X-Forwarded-Host {host}
}
# Security headers
header {
X-Frame-Options SAMEORIGIN
X-Content-Type-Options nosniff
X-XSS-Protection "1; mode=block"
Referrer-Policy strict-origin-when-cross-origin
Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
}
# Logging
log {
output file {{ caddy_log_dir }}/grafana.log
level INFO
format json
}
}

View File

@@ -0,0 +1,68 @@
# Grafana Configuration
# Managed by Ansible - DO NOT EDIT MANUALLY
[paths]
data = {{ grafana_data_dir }}
logs = {{ grafana_logs_dir }}
plugins = {{ grafana_plugins_dir }}
provisioning = {{ grafana_provisioning_dir }}
[server]
http_addr = {{ grafana_listen_address }}
http_port = {{ grafana_listen_port }}
domain = {{ grafana_domain }}
root_url = {{ grafana_root_url }}
enforce_domain = true
enable_gzip = true
[database]
type = {{ grafana_database_type }}
{% if grafana_database_type == 'sqlite3' %}
path = {{ grafana_database_path }}
{% endif %}
[security]
admin_user = {{ grafana_admin_user }}
admin_password = {{ grafana_admin_password }}
secret_key = {{ vault_grafana_secret_key }}
cookie_secure = {{ grafana_cookie_secure | lower }}
cookie_samesite = {{ grafana_cookie_samesite }}
disable_gravatar = true
disable_initial_admin_creation = false
[users]
allow_sign_up = {{ grafana_allow_signup | lower }}
allow_org_create = false
auto_assign_org = true
auto_assign_org_role = Viewer
[auth]
disable_login_form = {{ grafana_disable_login_form | lower }}
oauth_auto_login = false
{% if grafana_oauth_enabled %}
[auth.generic_oauth]
enabled = true
name = {{ grafana_oauth_name }}
client_id = {{ grafana_oauth_client_id }}
client_secret = {{ grafana_oauth_client_secret }}
scopes = {{ grafana_oauth_scopes }}
auth_url = {{ grafana_oauth_auth_url }}
token_url = {{ grafana_oauth_token_url }}
api_url = {{ grafana_oauth_api_url }}
allow_sign_up = {{ grafana_oauth_allow_sign_up | lower }}
role_attribute_path = {{ grafana_oauth_role_attribute_path }}
use_pkce = true
{% endif %}
[log]
mode = console
level = info
[analytics]
reporting_enabled = false
check_for_updates = false
check_for_plugin_updates = false
[snapshots]
external_enabled = false

View File

@@ -0,0 +1,36 @@
[Unit]
Description=Grafana visualization platform
Documentation=https://grafana.com/docs/
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User={{ grafana_user }}
Group={{ grafana_group }}
WorkingDirectory=/opt/grafana
ExecStart=/opt/grafana/bin/grafana-server \
--config=/etc/grafana/grafana.ini \
--homepath=/opt/grafana
Restart=on-failure
RestartSec=5s
# Security hardening
{% if grafana_systemd_security %}
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths={{ grafana_data_dir }} {{ grafana_logs_dir }}
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictRealtime=true
RestrictNamespaces=true
LockPersonality=true
{% endif %}
[Install]
WantedBy=multi-user.target

View File

@@ -0,0 +1,42 @@
[Unit]
Description=Prometheus Node Exporter
Documentation=https://github.com/prometheus/node_exporter
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User={{ node_exporter_user }}
Group={{ node_exporter_group }}
ExecStart=/usr/local/bin/node_exporter \
--web.listen-address={{ node_exporter_listen_address }} \
{% for collector in node_exporter_enabled_collectors %}
--collector.{{ collector }} \
{% endfor %}
{% for collector in node_exporter_disabled_collectors %}
--no-collector.{{ collector }} \
{% endfor %}
--collector.filesystem.fs-types-exclude="{{ node_exporter_filesystem_ignored_fs_types | join('|') }}" \
--collector.filesystem.mount-points-exclude="{{ node_exporter_filesystem_ignored_mount_points | join('|') }}"
Restart=on-failure
RestartSec=5s
# Security hardening
{% if node_exporter_systemd_security %}
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictRealtime=true
RestrictNamespaces=true
LockPersonality=true
ReadOnlyPaths=/
{% endif %}
[Install]
WantedBy=multi-user.target

View File

@@ -0,0 +1,22 @@
global:
scrape_interval: {{ victoriametrics_scrape_interval }}
scrape_timeout: {{ victoriametrics_scrape_timeout }}
external_labels:
environment: '{{ "homelab" if inventory_hostname in groups["homelab"] else "production" }}'
host: '{{ inventory_hostname }}'
scrape_configs:
# VictoriaMetrics self-monitoring
- job_name: 'victoriametrics'
static_configs:
- targets: ['{{ victoriametrics_listen_address }}']
labels:
service: 'victoriametrics'
# Node exporter for system metrics
- job_name: 'node'
static_configs:
- targets: ['{{ node_exporter_listen_address }}']
labels:
service: 'node_exporter'
instance: '{{ inventory_hostname }}'

View File

@@ -0,0 +1,41 @@
[Unit]
Description=VictoriaMetrics time-series database
Documentation=https://docs.victoriametrics.com/
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User={{ victoriametrics_user }}
Group={{ victoriametrics_group }}
ExecStart=/usr/local/bin/victoria-metrics-prod \
-storageDataPath={{ victoriametrics_data_dir }} \
-retentionPeriod={{ victoriametrics_retention_period }} \
-httpListenAddr={{ victoriametrics_listen_address }} \
-promscrape.config={{ victoriametrics_scrape_config_file }} \
-memory.allowedPercent={{ victoriametrics_memory_allowed_percent }} \
-storage.minFreeDiskSpaceBytes={{ victoriametrics_storage_min_free_disk_space_bytes }}
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=5s
# Security hardening
{% if victoriametrics_systemd_security %}
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths={{ victoriametrics_data_dir }}
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictRealtime=true
RestrictNamespaces=true
LockPersonality=true
{% endif %}
[Install]
WantedBy=multi-user.target