Add metrics monitoring stack with VictoriaMetrics, Grafana, and node_exporter

Implement complete monitoring infrastructure following rick-infra principles:

Components:
- VictoriaMetrics: Prometheus-compatible TSDB (7x less RAM usage)
- Grafana: Visualization dashboard with Authentik OAuth/OIDC integration
- node_exporter: System metrics collection (CPU, memory, disk, network)

Architecture:
- All services run as native systemd binaries (no containers)
- localhost-only binding for security
- Grafana uses native OAuth integration with Authentik (not forward_auth)
- Full systemd security hardening enabled
- Proxied via Caddy at metrics.jnss.me with HTTPS

Role Features:
- Unified metrics role (single role for complete stack)
- Automatic role mapping via Authentik groups:
  - authentik Admins OR grafana-admins -> Admin access
  - grafana-editors -> Editor access
  - All others -> Viewer access
- VictoriaMetrics auto-provisioned as default Grafana datasource
- 12-month metrics retention by default
- Comprehensive documentation included

Security:
- OAuth/OIDC SSO via Authentik
- All metrics services bind to 127.0.0.1 only
- systemd hardening (NoNewPrivileges, ProtectSystem, etc.)
- Grafana accessible only via Caddy HTTPS proxy

Documentation:
- roles/metrics/README.md: Complete role documentation
- docs/metrics-deployment-guide.md: Step-by-step deployment guide

Configuration:
- Updated rick-infra.yml to include metrics deployment
- Grafana port set to 3001 (Gitea uses 3000)
- Ready for multi-host expansion (designed for future node_exporter deployment to production hosts)
311
docs/metrics-deployment-guide.md
Normal file
@@ -0,0 +1,311 @@
# Metrics Stack Deployment Guide

Complete guide to deploying the monitoring stack (VictoriaMetrics, Grafana, node_exporter) on rick-infra.

## Overview

The metrics stack provides:
- **System monitoring**: CPU, memory, disk, network via node_exporter
- **Time-series storage**: VictoriaMetrics (Prometheus-compatible, 7x less RAM)
- **Visualization**: Grafana with Authentik SSO integration
- **Access**: `https://metrics.jnss.me` with role-based permissions

## Architecture

```
User → metrics.jnss.me (HTTPS)
    ↓
Caddy (Reverse Proxy)
    ↓
Grafana (OAuth → Authentik for SSO)
    ↓
VictoriaMetrics (Time-series DB)
    ↑
node_exporter (System Metrics)
```

All services run on localhost only, following rick-infra security principles.
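
One way to confirm the localhost-only binding after deployment is to inspect the listening sockets. The sketch below is not part of the role: `non_loopback_listeners` is a hypothetical helper that filters `ss -tlnH` output for the ports you pass in and prints any socket bound to something other than a loopback address.

```bash
# Hypothetical helper (not shipped with the metrics role).
# Reads `ss -tlnH` output on stdin; the ports to check are the arguments.
# Prints each matching listener that is NOT bound to a loopback address.
non_loopback_listeners() (
  ports=" $* "
  while read -r _state _recvq _sendq local_addr _rest; do
    port="${local_addr##*:}"   # text after the last colon
    host="${local_addr%:*}"    # text before the last colon
    case "$ports" in
      *" $port "*) ;;          # a port we care about
      *) continue ;;
    esac
    case "$host" in
      127.*|"[::1]") ;;        # loopback: fine
      *) echo "$local_addr" ;; # anything else is exposed
    esac
  done
)
```

On the host, `ss -tlnH | non_loopback_listeners 8428 9100 <grafana-port>` should print nothing. Substitute the Grafana port your deployment actually uses (`grafana_listen_port`).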

## Prerequisites

### 1. Caddy Deployed
```bash
ansible-playbook rick-infra.yml --tags caddy
```

### 2. Authentik Deployed
```bash
ansible-playbook rick-infra.yml --tags authentik
```

### 3. DNS Configuration
Ensure `metrics.jnss.me` points to the arch-vps IP:
```bash
dig metrics.jnss.me  # Should return 69.62.119.31
```

## Step 1: Configure Authentik OAuth Provider

### Create OAuth2/OIDC Provider

1. Log in to Authentik at `https://auth.jnss.me`

2. Navigate to **Applications → Providers** → **Create**

3. Configure provider:
   - **Name**: `Grafana`
   - **Type**: `OAuth2/OpenID Provider`
   - **Authentication flow**: `default-authentication-flow`
   - **Authorization flow**: `default-provider-authorization-explicit-consent`
   - **Client type**: `Confidential`
   - **Client ID**: `grafana`
   - **Client Secret**: Click **Generate** and **copy the secret**
   - **Redirect URIs**: `https://metrics.jnss.me/login/generic_oauth`
   - **Signing Key**: Select auto-generated key
   - **Scopes**: `openid`, `profile`, `email`, `groups`

4. Click **Finish**

### Create Application

1. Navigate to **Applications** → **Create**

2. Configure application:
   - **Name**: `Grafana`
   - **Slug**: `grafana`
   - **Provider**: Select `Grafana` provider created above
   - **Launch URL**: `https://metrics.jnss.me`

3. Click **Create**

### Create Groups (Optional)

For role-based access control:

1. Navigate to **Directory → Groups** → **Create**

2. Create groups:
   - **grafana-admins**: Full admin access to Grafana
   - **grafana-editors**: Can create/edit dashboards
   - All other users get Viewer access

3. Add users to groups as needed

## Step 2: Configure Vault Variables

Edit vault file:
```bash
ansible-vault edit host_vars/arch-vps/vault.yml
```

Add these variables:
```yaml
# Grafana admin password (for emergency local login)
vault_grafana_admin_password: "your-secure-admin-password"

# Grafana secret key (generate with: openssl rand -base64 32)
vault_grafana_secret_key: "your-random-32-char-secret-key"

# OAuth credentials from Authentik
vault_grafana_oauth_client_id: "grafana"
vault_grafana_oauth_client_secret: "paste-secret-from-authentik-here"
```

Save and close (`:wq` in vim).

## Step 3: Deploy Metrics Stack

Deploy all components:
```bash
ansible-playbook rick-infra.yml --tags metrics
```

This will:
1. Install and configure VictoriaMetrics
2. Install and configure node_exporter
3. Install and configure Grafana with OAuth
4. Deploy Caddy configuration for `metrics.jnss.me`

Expected output:
```
PLAY RECAP *******************************************************
arch-vps : ok=25 changed=15 unreachable=0 failed=0 skipped=0
```

## Step 4: Verify Deployment

### Check Services

SSH to arch-vps and verify services:
```bash
# Check all services are running
systemctl status victoriametrics grafana node_exporter

# Check service health
curl http://127.0.0.1:8428/health      # VictoriaMetrics
curl http://127.0.0.1:9100/metrics     # node_exporter
curl http://127.0.0.1:3420/api/health  # Grafana (default grafana_listen_port; 3000 is Gitea)
```
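
The three health checks can be wrapped so that a failure is obvious at a glance. This is a minimal sketch, not something the role installs: `fetch` and `check_endpoint` are hypothetical names, and `fetch` is kept separate only so it can be replaced when no services are running.

```bash
# Hypothetical helpers (not part of the metrics role).
# fetch URL: succeeds only if the request completes with a non-error HTTP status.
fetch() { curl -fsS --max-time 5 "$1"; }

# check_endpoint NAME URL: prints "ok NAME" or "FAIL NAME";
# returns non-zero when the endpoint cannot be fetched.
check_endpoint() {
  if fetch "$2" >/dev/null 2>&1; then
    echo "ok $1"
  else
    echo "FAIL $1"
    return 1
  fi
}
```

For example, `check_endpoint victoriametrics http://127.0.0.1:8428/health` prints `ok victoriametrics` when the service answers. Run it once per service, using the Grafana port from `grafana_listen_port`.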

### Check HTTPS Access

```bash
curl -I https://metrics.jnss.me
# Should return 200 or 302 (redirect to Authentik)
```
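
To script this check rather than read the headers by eye, the status code can be extracted and compared against the two values named above. The helper names below are assumptions made for this sketch.

```bash
# Hypothetical helpers (not part of the metrics role).
# http_status: reads response headers (e.g. from `curl -sI URL`) on stdin and
# prints the status code from the first line, after stripping carriage returns.
http_status() {
  tr -d '\r' | awk 'NR == 1 { print $2 }'
}

# is_expected_status CODE: succeeds only for the codes this guide treats as healthy.
is_expected_status() {
  case "$1" in
    200|302) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `is_expected_status "$(curl -sI https://metrics.jnss.me | http_status)" && echo reachable`. curl can also print the code directly with `-o /dev/null -w '%{http_code}'`, which avoids parsing headers.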

### Check Metrics Collection

```bash
# Check VictoriaMetrics scrape targets
curl http://127.0.0.1:8428/api/v1/targets

# Should show node_exporter as "up"
```
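
The targets response is JSON, so "up" can be counted instead of searched for visually. The sketch below assumes the Prometheus-style response shape, where each active target carries a `"health"` field, and assumes compact JSON with no spaces after colons. `count_target_health` is a hypothetical helper; `jq` would be more robust if it is installed.

```bash
# Hypothetical helper (not part of the metrics role).
# Reads a /api/v1/targets JSON document on stdin and prints "<up> <total>",
# counting targets by their "health" field. Assumes compact JSON output.
count_target_health() (
  json=$(cat)
  total=$(printf '%s' "$json" | grep -o '"health":"[a-z]*"' | wc -l | tr -d ' ')
  up=$(printf '%s' "$json" | grep -o '"health":"up"' | wc -l | tr -d ' ')
  echo "$up $total"
)
```

Usage: `curl -s http://127.0.0.1:8428/api/v1/targets | count_target_health`. Both numbers should match when every target is being scraped successfully.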

## Step 5: Access Grafana

1. Navigate to `https://metrics.jnss.me`
2. Click **"Sign in with Authentik"**
3. Log in with your Authentik credentials
4. You should be redirected to the Grafana dashboard

First login will:
- Auto-create your Grafana user
- Assign role based on Authentik group membership
- Grant access to default organization

## Step 6: Verify Data Source

1. In Grafana, navigate to **Connections → Data sources**
2. Verify **VictoriaMetrics** is listed and default
3. Click on VictoriaMetrics → **Save & test**
4. Should show green "Data source is working" message

## Step 7: Create First Dashboard

### Option 1: Import Community Dashboard (Recommended)

1. Navigate to **Dashboards → Import**
2. Enter dashboard ID: `1860` (Node Exporter Full)
3. Click **Load**
4. Select **VictoriaMetrics** as data source
5. Click **Import**

You now have a comprehensive system monitoring dashboard!

### Option 2: Create Custom Dashboard

1. Navigate to **Dashboards → New → New Dashboard**
2. Click **Add visualization**
3. Select **VictoriaMetrics** data source
4. Enter PromQL query:
   ```promql
   # CPU usage
   100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
   ```
5. Click **Apply**

## Step 8: Configure Alerting (Optional)

Grafana supports alerting on metrics. Configure via **Alerting → Alert rules**.

Example alert for high CPU (fires when idle CPU drops below 20%, i.e. usage above 80%):
```promql
avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100 < 20
```

## Troubleshooting

### OAuth Login Fails

**Symptom**: Redirect to Authentik, but returns error after login

**Solution**:
1. Verify redirect URI in Authentik matches exactly: `https://metrics.jnss.me/login/generic_oauth`
2. Check Grafana logs: `journalctl -u grafana -f`
3. Verify OAuth credentials in vault match Authentik

### No Metrics in Grafana

**Symptom**: Data source working, but no data in dashboards

**Solution**:
1. Check VictoriaMetrics targets: `curl http://127.0.0.1:8428/api/v1/targets`
2. Verify node_exporter is up: `systemctl status node_exporter`
3. Check time range in Grafana (top right) - try "Last 5 minutes"

### Can't Access metrics.jnss.me

**Symptom**: Connection timeout or SSL error

**Solution**:
1. Verify DNS: `dig metrics.jnss.me`
2. Check Caddy is running: `systemctl status caddy`
3. Check Caddy logs: `journalctl -u caddy -f`
4. Verify Caddy config loaded: `ls /etc/caddy/sites/grafana.caddy`

### Wrong Grafana Role

**Symptom**: User has wrong permissions (e.g., Viewer instead of Admin)

**Solution**:
1. Verify the user is in the correct Authentik group (`authentik Admins` or `grafana-admins` for Admin, `grafana-editors` for Editor)
2. Log out of Grafana and log in again
3. Check role mapping expression in `roles/metrics/defaults/main.yml`:
   ```yaml
   grafana_oauth_role_attribute_path: "(contains(groups, 'authentik Admins') || contains(groups, 'grafana-admins')) && 'Admin' || contains(groups, 'grafana-editors') && 'Editor' || 'Viewer'"
   ```

## Next Steps

### Add More Hosts

To monitor additional hosts (e.g., mini-vps):

1. Deploy node_exporter to target host
2. Update VictoriaMetrics scrape config to include remote targets
3. Configure remote_write or federation

### Add Service Metrics

To monitor containerized services:

1. Expose `/metrics` endpoint in application (port 8080)
2. Add scrape config in `roles/metrics/templates/scrape.yml.j2`:
   ```yaml
   - job_name: 'myservice'
     static_configs:
       - targets: ['127.0.0.1:8080']
   ```
3. Redeploy metrics role

### Set Up Alerting

1. Configure contact points in Grafana (Email, Slack, etc.)
2. Create alert rules for critical metrics
3. Set up on-call rotation if needed

## Security Notes

- All metrics services run on localhost only
- Grafana is the only internet-facing component (via Caddy HTTPS)
- OAuth provides SSO with Authentik (no separate Grafana passwords)
- systemd hardening enabled on all services
- Default admin account should only be used for emergencies

## Resources

- **VictoriaMetrics Docs**: https://docs.victoriametrics.com/
- **Grafana Docs**: https://grafana.com/docs/
- **PromQL Guide**: https://prometheus.io/docs/prometheus/latest/querying/basics/
- **Dashboard Library**: https://grafana.com/grafana/dashboards/
- **Authentik OAuth**: https://goauthentik.io/docs/providers/oauth2/

## Support

For issues specific to rick-infra metrics deployment:
1. Check service logs: `journalctl -u <service> -f`
2. Review role README: `roles/metrics/README.md`
3. Verify vault variables are correctly set
4. Ensure Authentik OAuth provider is properly configured
@@ -10,9 +10,11 @@
 # - Authentik SSO/authentication
 # - Gitea git hosting
 # - Vaultwarden password manager
+# - Metrics (VictoriaMetrics, Grafana, node_exporter)
 #
 # Usage:
-# ansible-playbook playbooks/homelab.yml
+# ansible-playbook rick-infra.yml
+# ansible-playbook rick-infra.yml --tags metrics
 
 # - import_playbook: playbooks/security.yml
 - name: Deploy Homelab Infrastructure
@@ -21,10 +23,16 @@
   gather_facts: true
 
   tasks:
-    - name: Deploy Caddy
+    # - name: Deploy Caddy
+    #   include_role:
+    #     name: caddy
+    #   tags: ['caddy']
+
+    - name: Deploy Metrics Stack
       include_role:
-        name: caddy
-      tags: ['caddy']
+        name: metrics
+      tags: ['metrics', 'monitoring', 'grafana', 'victoriametrics']
 
     # - name: Deploy Authentik
     #   include_role:
     #     name: authentik
325
roles/metrics/README.md
Normal file
@@ -0,0 +1,325 @@
# Metrics Role

Complete monitoring stack for rick-infra providing system metrics collection, storage, and visualization with SSO integration.

## Components

### VictoriaMetrics
- **Purpose**: Time-series database for metrics storage
- **Type**: Native systemd service
- **Listen**: `127.0.0.1:8428` (localhost only)
- **Features**:
  - Prometheus-compatible API and PromQL
  - 7x less RAM usage than Prometheus
  - Single binary deployment
  - 12-month data retention by default

### Grafana
- **Purpose**: Metrics visualization and dashboarding
- **Type**: Native systemd service
- **Listen**: `127.0.0.1:3420` (localhost only, proxied via Caddy; set by `grafana_listen_port`)
- **Domain**: `metrics.jnss.me`
- **Features**:
  - OAuth/OIDC integration with Authentik
  - Role-based access control via Authentik groups
  - VictoriaMetrics as default data source

### node_exporter
- **Purpose**: System metrics collection
- **Type**: Native systemd service
- **Listen**: `127.0.0.1:9100` (localhost only)
- **Metrics**: CPU, memory, disk, network, systemd units

## Architecture

```
┌─────────────────────────────────────────────────────┐
│ metrics.jnss.me (Grafana Dashboard)                 │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Caddy (HTTPS)                                   │ │
│ │     ↓                                           │ │
│ │ Grafana (OAuth → Authentik)                     │ │
│ │     ↓                                           │ │
│ │ VictoriaMetrics (Prometheus-compatible)         │ │
│ │     ↑                                           │ │
│ │ node_exporter (System Metrics)                  │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
```

## Deployment

### Prerequisites

1. **Caddy role deployed** - Required for HTTPS proxy
2. **Authentik deployed** - Required for OAuth/SSO
3. **Vault variables configured**:
   ```yaml
   # In host_vars/arch-vps/vault.yml
   vault_grafana_admin_password: "secure-admin-password"
   vault_grafana_secret_key: "random-secret-key-32-chars"
   vault_grafana_oauth_client_id: "grafana"
   vault_grafana_oauth_client_secret: "oauth-client-secret-from-authentik"
   ```

### Authentik Configuration

Before deployment, create an OAuth2/OIDC provider in Authentik:

1. **Create Provider**:
   - Name: `Grafana`
   - Type: `OAuth2/OpenID Provider`
   - Client ID: `grafana`
   - Client Secret: Generate and save to vault
   - Redirect URIs: `https://metrics.jnss.me/login/generic_oauth`
   - Signing Key: Auto-generated

2. **Create Application**:
   - Name: `Grafana`
   - Slug: `grafana`
   - Provider: Select Grafana provider created above

3. **Create Groups** (optional, for role mapping):
   - `grafana-admins` - Full admin access
   - `grafana-editors` - Can create/edit dashboards
   - Users without these groups get Viewer access

### Deploy

```bash
# Deploy complete metrics stack
ansible-playbook rick-infra.yml --tags metrics

# Deploy individual components
ansible-playbook rick-infra.yml --tags victoriametrics
ansible-playbook rick-infra.yml --tags grafana
ansible-playbook rick-infra.yml --tags node_exporter
```

### Verify Deployment

```bash
# Check service status
ansible homelab -a "systemctl status victoriametrics grafana node_exporter"

# Check metrics collection
curl http://127.0.0.1:9100/metrics         # node_exporter metrics
curl http://127.0.0.1:8428/metrics         # VictoriaMetrics metrics
curl http://127.0.0.1:8428/api/v1/targets  # Scrape targets

# Access Grafana
curl -I https://metrics.jnss.me/  # Should redirect to Authentik login
```

## Usage

### Access Dashboard

1. Navigate to `https://metrics.jnss.me`
2. Click "Sign in with Authentik"
3. Authenticate via Authentik SSO
4. Access granted based on Authentik group membership

### Role Mapping

Grafana roles are automatically assigned based on Authentik groups:

- **Admin**: Members of the `grafana-admins` or `authentik Admins` group
  - Full administrative access
  - Can manage users, data sources, plugins
  - Can create/edit/delete all dashboards

- **Editor**: Members of `grafana-editors` group
  - Can create and edit dashboards
  - Cannot manage users or data sources

- **Viewer**: All other authenticated users
  - Read-only access to dashboards
  - Cannot create or edit dashboards
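
The precedence in `grafana_oauth_role_attribute_path` (Admin first, then Editor, otherwise Viewer) can be reproduced outside Grafana to reason about a given user's groups. The function below is an illustrative re-implementation in shell, not the JMESPath evaluation Grafana performs; `role_for_groups` is a hypothetical name.

```bash
# Hypothetical helper mirroring the role mapping in roles/metrics/defaults/main.yml.
# role_for_groups GROUP...: prints Admin, Editor, or Viewer for the given groups.
role_for_groups() (
  role="Viewer"
  for group in "$@"; do
    case "$group" in
      "authentik Admins"|grafana-admins)
        role="Admin" ;;
      grafana-editors)
        # Editor never downgrades an Admin match.
        [ "$role" = "Admin" ] || role="Editor" ;;
    esac
  done
  echo "$role"
)
```

For example, `role_for_groups grafana-editors grafana-admins` prints `Admin`, because the Admin clause wins regardless of group order.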

### Creating Dashboards

Grafana comes with VictoriaMetrics pre-configured as the default data source. Use PromQL queries:

```promql
# CPU usage (percent)
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage (bytes in use)
node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes

# Disk usage (percent)
100 - ((node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100)

# Network traffic (bytes received per second)
irate(node_network_receive_bytes_total[5m])
```

### Import Community Dashboards

1. Browse dashboards at https://grafana.com/grafana/dashboards/
2. Recommended for node_exporter:
   - Dashboard ID: 1860 (Node Exporter Full)
   - Dashboard ID: 11074 (Node Exporter for Prometheus)
3. Import via Grafana UI: Dashboards → Import → Enter ID

## Configuration

### Customization

Key configuration options in `roles/metrics/defaults/main.yml`:

```yaml
# Data retention
victoriametrics_retention_period: "12" # months

# Scrape interval
victoriametrics_scrape_interval: "15s"

# OAuth role mapping (JMESPath expression)
grafana_oauth_role_attribute_path: "(contains(groups, 'authentik Admins') || contains(groups, 'grafana-admins')) && 'Admin' || contains(groups, 'grafana-editors') && 'Editor' || 'Viewer'"

# Memory limits
victoriametrics_memory_allowed_percent: "30"
```

### Adding Scrape Targets

Edit `roles/metrics/templates/scrape.yml.j2`:

```yaml
scrape_configs:
  # Add custom application metrics
  - job_name: 'myapp'
    static_configs:
      - targets: ['127.0.0.1:8080']
        labels:
          service: 'myapp'
```

## Operations

### Service Management

```bash
# VictoriaMetrics
systemctl status victoriametrics
systemctl restart victoriametrics
journalctl -u victoriametrics -f

# Grafana
systemctl status grafana
systemctl restart grafana
journalctl -u grafana -f

# node_exporter
systemctl status node_exporter
systemctl restart node_exporter
journalctl -u node_exporter -f
```

### Data Locations

```
/var/lib/victoriametrics/  # Time-series data
/var/lib/grafana/          # Grafana database and dashboards
/var/log/grafana/          # Grafana logs
/etc/victoriametrics/      # VictoriaMetrics config
/etc/grafana/              # Grafana config
```

### Backup

VictoriaMetrics data is stored in `/var/lib/victoriametrics`:

```bash
# Stop service
systemctl stop victoriametrics

# Backup data
tar -czf victoriametrics-backup-$(date +%Y%m%d).tar.gz /var/lib/victoriametrics

# Start service
systemctl start victoriametrics
```

Grafana dashboards are stored in a SQLite database at `/var/lib/grafana/grafana.db`:

```bash
# Backup Grafana
systemctl stop grafana
tar -czf grafana-backup-$(date +%Y%m%d).tar.gz /var/lib/grafana /etc/grafana
systemctl start grafana
```
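
Both procedures follow the same stop, archive, start pattern, and a failed `tar` in the middle would leave the service stopped. The sketch below wraps the pattern so the service is started again even when archiving fails. `backup_service` and the `SYSTEMCTL` override are assumptions of this example, not part of the role.

```bash
# Hypothetical helper (not part of the metrics role).
# backup_service SERVICE SRC_DIR DEST_TARBALL
# Stops SERVICE, archives SRC_DIR into DEST_TARBALL, then starts SERVICE again
# whether or not tar succeeded. Set SYSTEMCTL to another command for a dry run.
backup_service() (
  service="$1"; src="$2"; dest="$3"
  systemctl_cmd="${SYSTEMCTL:-systemctl}"

  "$systemctl_cmd" stop "$service" || exit 1

  rc=0
  tar -czf "$dest" -C "$(dirname "$src")" "$(basename "$src")" || rc=$?

  "$systemctl_cmd" start "$service" || exit 1
  exit "$rc"
)
```

Usage: `backup_service victoriametrics /var/lib/victoriametrics "victoriametrics-backup-$(date +%Y%m%d).tar.gz"`. The archive stores the directory by its base name rather than its absolute path.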

## Security

### Authentication
- Grafana protected by Authentik OAuth/OIDC
- Local admin account available for emergency access
- All services bind to localhost only

### Network Security
- VictoriaMetrics: `127.0.0.1:8428` (no external access)
- Grafana: `127.0.0.1:3420` (proxied via Caddy with HTTPS)
- node_exporter: `127.0.0.1:9100` (no external access)

### systemd Hardening
All services run with security restrictions:
- `NoNewPrivileges=true`
- `ProtectSystem=strict`
- `ProtectHome=true`
- `PrivateTmp=true`
- Read-only filesystem (except data directories)
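
Whether a running unit actually carries these settings can be checked from `systemctl show`, which prints one `KEY=VALUE` line per property and renders booleans as `yes`/`no`. The helper below is a hypothetical sketch that reports any of the four settings listed above that are missing or different.

```bash
# Hypothetical helper (not part of the metrics role).
# Reads KEY=VALUE lines (as printed by `systemctl show -p ...`) on stdin and
# prints each expected hardening setting that is absent or has another value.
missing_hardening() (
  props=$(cat)
  for want in NoNewPrivileges=yes ProtectSystem=strict ProtectHome=yes PrivateTmp=yes; do
    printf '%s\n' "$props" | grep -qxF "$want" || echo "$want"
  done
)
```

Usage: `systemctl show grafana -p NoNewPrivileges -p ProtectSystem -p ProtectHome -p PrivateTmp | missing_hardening`. Empty output means all four settings are active. `systemd-analyze security grafana` gives a broader report.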

## Troubleshooting

### Grafana OAuth Not Working

1. Check Authentik provider configuration:
   ```bash
   # Verify redirect URI matches
   # https://metrics.jnss.me/login/generic_oauth
   ```

2. Check Grafana logs:
   ```bash
   journalctl -u grafana -f
   ```

3. Verify OAuth credentials in vault match Authentik

### No Metrics in Grafana

1. Check VictoriaMetrics scrape targets:
   ```bash
   curl http://127.0.0.1:8428/api/v1/targets
   ```

2. Check node_exporter is running:
   ```bash
   systemctl status node_exporter
   curl http://127.0.0.1:9100/metrics
   ```

3. Check VictoriaMetrics logs:
   ```bash
   journalctl -u victoriametrics -f
   ```

### High Memory Usage

VictoriaMetrics is configured to use at most 30% of available memory by default (`victoriametrics_memory_allowed_percent`). Lower it if needed:

```yaml
# In roles/metrics/defaults/main.yml
victoriametrics_memory_allowed_percent: "20" # Reduce to 20%
```

## See Also

- [VictoriaMetrics Documentation](https://docs.victoriametrics.com/)
- [Grafana Documentation](https://grafana.com/docs/)
- [node_exporter GitHub](https://github.com/prometheus/node_exporter)
- [PromQL Documentation](https://prometheus.io/docs/prometheus/latest/querying/basics/)
- [Authentik OAuth Integration](https://goauthentik.io/docs/providers/oauth2/)
178
roles/metrics/defaults/main.yml
Normal file
@@ -0,0 +1,178 @@
---
# =================================================================
# Metrics Infrastructure Role - Complete Monitoring Stack
# =================================================================
# Provides VictoriaMetrics, Grafana, and node_exporter as unified stack

# =================================================================
# VictoriaMetrics Configuration
# =================================================================

# Service Management
victoriametrics_service_enabled: true
victoriametrics_service_state: "started"

# Version
victoriametrics_version: "1.105.0"

# Network Security (localhost only)
victoriametrics_listen_address: "127.0.0.1:8428"

# Storage Configuration
victoriametrics_data_dir: "/var/lib/victoriametrics"
victoriametrics_retention_period: "12" # months

# User/Group
victoriametrics_user: "victoriametrics"
victoriametrics_group: "victoriametrics"

# Performance Settings
victoriametrics_memory_allowed_percent: "30"
victoriametrics_storage_min_free_disk_space_bytes: "10GB"

# Scrape Configuration
victoriametrics_scrape_config_dir: "/etc/victoriametrics"
victoriametrics_scrape_config_file: "{{ victoriametrics_scrape_config_dir }}/scrape.yml"
victoriametrics_scrape_interval: "15s"
victoriametrics_scrape_timeout: "10s"

# systemd security
victoriametrics_systemd_security: true

# =================================================================
# Grafana Configuration
# =================================================================

# Service Management
grafana_service_enabled: true
grafana_service_state: "started"

# Version
grafana_version: "11.4.0"

# Network Security (localhost only - proxied via Caddy)
grafana_listen_address: "127.0.0.1"
grafana_listen_port: 3420

# User/Group
grafana_user: "grafana"
grafana_group: "grafana"

# Directories
grafana_data_dir: "/var/lib/grafana"
grafana_logs_dir: "/var/log/grafana"
grafana_plugins_dir: "/var/lib/grafana/plugins"
grafana_provisioning_dir: "/etc/grafana/provisioning"

# Domain Configuration
grafana_domain: "metrics.{{ caddy_domain }}"
grafana_root_url: "https://{{ grafana_domain }}"

# Default admin (used only for initial setup)
grafana_admin_user: "admin"
grafana_admin_password: "{{ vault_grafana_admin_password }}"

# Disable registration (OAuth only)
grafana_allow_signup: false
grafana_disable_login_form: false # Keep fallback login

# OAuth/OIDC Configuration (Authentik)
grafana_oauth_enabled: true
grafana_oauth_name: "Authentik"
grafana_oauth_client_id: "{{ vault_grafana_oauth_client_id }}"
grafana_oauth_client_secret: "{{ vault_grafana_oauth_client_secret }}"

# Authentik OAuth endpoints
grafana_oauth_auth_url: "https://{{ authentik_domain }}/application/o/authorize/"
grafana_oauth_token_url: "https://{{ authentik_domain }}/application/o/token/"
grafana_oauth_api_url: "https://{{ authentik_domain }}/application/o/userinfo/"

# OAuth role mapping
grafana_oauth_role_attribute_path: "(contains(groups, 'authentik Admins') || contains(groups, 'grafana-admins')) && 'Admin' || contains(groups, 'grafana-editors') && 'Editor' || 'Viewer'"
grafana_oauth_allow_sign_up: true # Auto-create users from OAuth
grafana_oauth_scopes: "openid profile email groups"

# Data Source Configuration
grafana_datasource_vm_enabled: true
grafana_datasource_vm_url: "http://{{ victoriametrics_listen_address }}"
grafana_datasource_vm_name: "VictoriaMetrics"

# Security
grafana_systemd_security: true
grafana_cookie_secure: true
grafana_cookie_samesite: "lax"

# Database (SQLite by default)
grafana_database_type: "sqlite3"
grafana_database_path: "{{ grafana_data_dir }}/grafana.db"
|
||||||
|
|
||||||
|
# =================================================================
|
||||||
|
# Node Exporter Configuration
|
||||||
|
# =================================================================
|
||||||
|
|
||||||
|
# Service Management
|
||||||
|
node_exporter_service_enabled: true
|
||||||
|
node_exporter_service_state: "started"
|
||||||
|
|
||||||
|
# Version
|
||||||
|
node_exporter_version: "1.8.2"
|
||||||
|
|
||||||
|
# Network Security (localhost only)
|
||||||
|
node_exporter_listen_address: "127.0.0.1:9100"
|
||||||
|
|
||||||
|
# User/Group
|
||||||
|
node_exporter_user: "node_exporter"
|
||||||
|
node_exporter_group: "node_exporter"
|
||||||
|
|
||||||
|
# Enabled collectors
|
||||||
|
node_exporter_enabled_collectors:
|
||||||
|
- cpu
|
||||||
|
- diskstats
|
||||||
|
- filesystem
|
||||||
|
- loadavg
|
||||||
|
- meminfo
|
||||||
|
- netdev
|
||||||
|
- netstat
|
||||||
|
- stat
|
||||||
|
- time
|
||||||
|
- uname
|
||||||
|
- vmstat
|
||||||
|
- systemd
|
||||||
|
|
||||||
|
# Disabled collectors
|
||||||
|
node_exporter_disabled_collectors:
|
||||||
|
- mdadm
|
||||||
|
|
||||||
|
# Filesystem collector configuration
|
||||||
|
node_exporter_filesystem_ignored_fs_types:
|
||||||
|
- tmpfs
|
||||||
|
- devtmpfs
|
||||||
|
- devfs
|
||||||
|
- iso9660
|
||||||
|
- overlay
|
||||||
|
- aufs
|
||||||
|
- squashfs
|
||||||
|
|
||||||
|
node_exporter_filesystem_ignored_mount_points:
|
||||||
|
- /var/lib/containers/storage/.*
|
||||||
|
- /run/.*
|
||||||
|
- /sys/.*
|
||||||
|
- /proc/.*
|
||||||
|
|
||||||
|
# systemd security
|
||||||
|
node_exporter_systemd_security: true
|
||||||
|
|
||||||
# =================================================================
# Infrastructure Notes
# =================================================================
# Complete monitoring stack:
# - VictoriaMetrics: Time-series database (Prometheus-compatible)
# - Grafana: Visualization with Authentik OAuth integration
# - node_exporter: System metrics collection
#
# Role mapping via Authentik groups:
# - authentik Admins or grafana-admins: Full admin access
# - grafana-editors: Can create/edit dashboards
# - Default: Viewer access
#
# All services run on localhost only; Grafana is proxied via Caddy
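The `grafana_oauth_role_attribute_path` expression in these defaults checks the admin groups first, then the editor group, and falls back to Viewer. As a rough illustration of that precedence, the same logic can be sketched in plain Python. `map_grafana_role` is a hypothetical helper, not part of the role, and it assumes the `groups` claim arrives as a list of strings; Grafana itself evaluates the JMESPath expression against the OAuth userinfo payload.

```python
def map_grafana_role(groups):
    """Approximate the role mapping expressed by the JMESPath rule.

    Hypothetical helper for illustration only; it is not used by the role.
    """
    # Assumption: a missing or empty claim is treated as "no groups".
    groups = groups or []
    if "authentik Admins" in groups or "grafana-admins" in groups:
        return "Admin"
    if "grafana-editors" in groups:
        return "Editor"
    return "Viewer"


if __name__ == "__main__":
    for claim in (["authentik Admins"], ["grafana-editors"], ["staff"], []):
        print(claim, "->", map_grafana_role(claim))
```

Because the admin check comes first, a user who belongs to both `grafana-admins` and `grafana-editors` resolves to Admin.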
23
roles/metrics/handlers/main.yml
Normal file
@@ -0,0 +1,23 @@
---
- name: restart victoriametrics
  ansible.builtin.systemd:
    name: victoriametrics
    state: restarted
    daemon_reload: true

- name: restart node_exporter
  ansible.builtin.systemd:
    name: node_exporter
    state: restarted
    daemon_reload: true

- name: restart grafana
  ansible.builtin.systemd:
    name: grafana
    state: restarted
    daemon_reload: true

- name: reload caddy
  ansible.builtin.systemd:
    name: caddy
    state: reloaded
3
roles/metrics/meta/main.yml
Normal file
@@ -0,0 +1,3 @@
---
dependencies:
  - role: caddy
9
roles/metrics/tasks/caddy.yml
Normal file
@@ -0,0 +1,9 @@
---
- name: Deploy Grafana Caddy configuration
  ansible.builtin.template:
    src: grafana.caddy.j2
    dest: /etc/caddy/sites-enabled/grafana.caddy
    owner: caddy
    group: caddy
    mode: '0644'
  notify: reload caddy
90
roles/metrics/tasks/grafana.yml
Normal file
@@ -0,0 +1,90 @@
---
- name: Create Grafana system user
  ansible.builtin.user:
    name: "{{ grafana_user }}"
    system: true
    create_home: false
    shell: /usr/sbin/nologin
    state: present

- name: Create Grafana directories
  ansible.builtin.file:
    path: "{{ item }}"
    state: directory
    owner: "{{ grafana_user }}"
    group: "{{ grafana_group }}"
    mode: '0755'
  loop:
    - "{{ grafana_data_dir }}"
    - "{{ grafana_logs_dir }}"
    - "{{ grafana_plugins_dir }}"
    - "{{ grafana_provisioning_dir }}"
    - "{{ grafana_provisioning_dir }}/datasources"
    - "{{ grafana_provisioning_dir }}/dashboards"
    - "{{ grafana_data_dir }}/dashboards"
    - /etc/grafana

- name: Download Grafana binary
  ansible.builtin.get_url:
    url: "https://dl.grafana.com/oss/release/grafana-{{ grafana_version }}.linux-amd64.tar.gz"
    dest: "/tmp/grafana-{{ grafana_version }}.tar.gz"
    mode: '0644'
  register: grafana_download

- name: Extract Grafana
  ansible.builtin.unarchive:
    src: "/tmp/grafana-{{ grafana_version }}.tar.gz"
    dest: /opt
    remote_src: true
    creates: "/opt/grafana-v{{ grafana_version }}"
  when: grafana_download.changed

- name: Create Grafana symlink
  ansible.builtin.file:
    src: "/opt/grafana-v{{ grafana_version }}"
    dest: /opt/grafana
    state: link

- name: Deploy Grafana configuration
  ansible.builtin.template:
    src: grafana.ini.j2
    dest: /etc/grafana/grafana.ini
    owner: "{{ grafana_user }}"
    group: "{{ grafana_group }}"
    mode: '0640'
  notify: restart grafana

- name: Deploy VictoriaMetrics datasource provisioning
  ansible.builtin.template:
    src: datasource-victoriametrics.yml.j2
    dest: "{{ grafana_provisioning_dir }}/datasources/victoriametrics.yml"
    owner: "{{ grafana_user }}"
    group: "{{ grafana_group }}"
    mode: '0644'
  notify: restart grafana
  when: grafana_datasource_vm_enabled

- name: Deploy dashboard provisioning
  ansible.builtin.template:
    src: dashboards.yml.j2
    dest: "{{ grafana_provisioning_dir }}/dashboards/default.yml"
    owner: "{{ grafana_user }}"
    group: "{{ grafana_group }}"
    mode: '0644'
  notify: restart grafana

- name: Deploy Grafana systemd service
  ansible.builtin.template:
    src: grafana.service.j2
    dest: /etc/systemd/system/grafana.service
    owner: root
    group: root
    mode: '0644'
  notify: restart grafana

- name: Enable and start Grafana service
  ansible.builtin.systemd:
    name: grafana
    enabled: "{{ grafana_service_enabled }}"
    state: "{{ grafana_service_state }}"
    daemon_reload: true
20
roles/metrics/tasks/main.yml
Normal file
@@ -0,0 +1,20 @@
---
# =================================================================
# Metrics Stack Deployment
# =================================================================

- name: Deploy VictoriaMetrics
  ansible.builtin.include_tasks: victoriametrics.yml
  tags: [metrics, victoriametrics]

- name: Deploy node_exporter
  ansible.builtin.include_tasks: node_exporter.yml
  tags: [metrics, node_exporter]

- name: Deploy Grafana
  ansible.builtin.include_tasks: grafana.yml
  tags: [metrics, grafana]

- name: Deploy Caddy configuration for Grafana
  ansible.builtin.include_tasks: caddy.yml
  tags: [metrics, caddy]
49
roles/metrics/tasks/node_exporter.yml
Normal file
@@ -0,0 +1,49 @@
---
- name: Create node_exporter system user
  ansible.builtin.user:
    name: "{{ node_exporter_user }}"
    system: true
    create_home: false
    shell: /usr/sbin/nologin
    state: present

- name: Download node_exporter binary
  ansible.builtin.get_url:
    url: "https://github.com/prometheus/node_exporter/releases/download/v{{ node_exporter_version }}/node_exporter-{{ node_exporter_version }}.linux-amd64.tar.gz"
    dest: "/tmp/node_exporter-{{ node_exporter_version }}.tar.gz"
    mode: '0644'
  register: node_exporter_download

- name: Extract node_exporter binary
  ansible.builtin.unarchive:
    src: "/tmp/node_exporter-{{ node_exporter_version }}.tar.gz"
    dest: /tmp
    remote_src: true
    creates: "/tmp/node_exporter-{{ node_exporter_version }}.linux-amd64"
  when: node_exporter_download.changed

- name: Copy node_exporter binary to /usr/local/bin
  ansible.builtin.copy:
    src: "/tmp/node_exporter-{{ node_exporter_version }}.linux-amd64/node_exporter"
    dest: /usr/local/bin/node_exporter
    owner: root
    group: root
    mode: '0755'
    remote_src: true
  when: node_exporter_download.changed

- name: Deploy node_exporter systemd service
  ansible.builtin.template:
    src: node_exporter.service.j2
    dest: /etc/systemd/system/node_exporter.service
    owner: root
    group: root
    mode: '0644'
  notify: restart node_exporter

- name: Enable and start node_exporter service
  ansible.builtin.systemd:
    name: node_exporter
    enabled: "{{ node_exporter_service_enabled }}"
    state: "{{ node_exporter_service_state }}"
    daemon_reload: true
66
roles/metrics/tasks/victoriametrics.yml
Normal file
@@ -0,0 +1,66 @@
---
- name: Create VictoriaMetrics system user
  ansible.builtin.user:
    name: "{{ victoriametrics_user }}"
    system: true
    create_home: false
    shell: /usr/sbin/nologin
    state: present

- name: Create VictoriaMetrics directories
  ansible.builtin.file:
    path: "{{ item }}"
    state: directory
    owner: "{{ victoriametrics_user }}"
    group: "{{ victoriametrics_group }}"
    mode: '0755'
  loop:
    - "{{ victoriametrics_data_dir }}"
    - "{{ victoriametrics_scrape_config_dir }}"

- name: Download VictoriaMetrics binary
  ansible.builtin.get_url:
    url: "https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v{{ victoriametrics_version }}/victoria-metrics-linux-amd64-v{{ victoriametrics_version }}.tar.gz"
    dest: "/tmp/victoria-metrics-v{{ victoriametrics_version }}.tar.gz"
    mode: '0644'
  register: victoriametrics_download

- name: Extract VictoriaMetrics binary
  ansible.builtin.unarchive:
    src: "/tmp/victoria-metrics-v{{ victoriametrics_version }}.tar.gz"
    dest: /usr/local/bin
    remote_src: true
    # No `creates:` guard: the unversioned binary path already exists on upgrades and would skip the new release
  when: victoriametrics_download.changed

- name: Set VictoriaMetrics binary permissions
  ansible.builtin.file:
    path: /usr/local/bin/victoria-metrics-prod
    owner: root
    group: root
    mode: '0755'

- name: Deploy VictoriaMetrics scrape configuration
  ansible.builtin.template:
    src: scrape.yml.j2
    dest: "{{ victoriametrics_scrape_config_file }}"
    owner: "{{ victoriametrics_user }}"
    group: "{{ victoriametrics_group }}"
    mode: '0644'
  notify: restart victoriametrics

- name: Deploy VictoriaMetrics systemd service
  ansible.builtin.template:
    src: victoriametrics.service.j2
    dest: /etc/systemd/system/victoriametrics.service
    owner: root
    group: root
    mode: '0644'
  notify: restart victoriametrics

- name: Enable and start VictoriaMetrics service
  ansible.builtin.systemd:
    name: victoriametrics
    enabled: "{{ victoriametrics_service_enabled }}"
    state: "{{ victoriametrics_service_state }}"
    daemon_reload: true
12
roles/metrics/templates/dashboards.yml.j2
Normal file
@@ -0,0 +1,12 @@
apiVersion: 1

providers:
  - name: 'default'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: {{ grafana_data_dir }}/dashboards
12
roles/metrics/templates/datasource-victoriametrics.yml.j2
Normal file
@@ -0,0 +1,12 @@
apiVersion: 1

datasources:
  - name: {{ grafana_datasource_vm_name }}
    type: prometheus
    access: proxy
    url: {{ grafana_datasource_vm_url }}
    isDefault: true
    editable: true
    jsonData:
      httpMethod: POST
      timeInterval: 15s
26
roles/metrics/templates/grafana.caddy.j2
Normal file
@@ -0,0 +1,26 @@
# Grafana Metrics Dashboard
{{ grafana_domain }} {
    reverse_proxy http://{{ grafana_listen_address }}:{{ grafana_listen_port }} {
        header_up Host {host}
        header_up X-Real-IP {remote_host}
        header_up X-Forwarded-Proto https
        header_up X-Forwarded-For {remote_host}
        header_up X-Forwarded-Host {host}
    }

    # Security headers
    header {
        X-Frame-Options SAMEORIGIN
        X-Content-Type-Options nosniff
        X-XSS-Protection "1; mode=block"
        Referrer-Policy strict-origin-when-cross-origin
        Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
    }

    # Logging
    log {
        output file {{ caddy_log_dir }}/grafana.log
        level INFO
        format json
    }
}
68
roles/metrics/templates/grafana.ini.j2
Normal file
@@ -0,0 +1,68 @@
# Grafana Configuration
# Managed by Ansible - DO NOT EDIT MANUALLY

[paths]
data = {{ grafana_data_dir }}
logs = {{ grafana_logs_dir }}
plugins = {{ grafana_plugins_dir }}
provisioning = {{ grafana_provisioning_dir }}

[server]
http_addr = {{ grafana_listen_address }}
http_port = {{ grafana_listen_port }}
domain = {{ grafana_domain }}
root_url = {{ grafana_root_url }}
enforce_domain = true
enable_gzip = true

[database]
type = {{ grafana_database_type }}
{% if grafana_database_type == 'sqlite3' %}
path = {{ grafana_database_path }}
{% endif %}

[security]
admin_user = {{ grafana_admin_user }}
admin_password = {{ grafana_admin_password }}
secret_key = {{ vault_grafana_secret_key }}
cookie_secure = {{ grafana_cookie_secure | lower }}
cookie_samesite = {{ grafana_cookie_samesite }}
disable_gravatar = true
disable_initial_admin_creation = false

[users]
allow_sign_up = {{ grafana_allow_signup | lower }}
allow_org_create = false
auto_assign_org = true
auto_assign_org_role = Viewer

[auth]
disable_login_form = {{ grafana_disable_login_form | lower }}
oauth_auto_login = false

{% if grafana_oauth_enabled %}
[auth.generic_oauth]
enabled = true
name = {{ grafana_oauth_name }}
client_id = {{ grafana_oauth_client_id }}
client_secret = {{ grafana_oauth_client_secret }}
scopes = {{ grafana_oauth_scopes }}
auth_url = {{ grafana_oauth_auth_url }}
token_url = {{ grafana_oauth_token_url }}
api_url = {{ grafana_oauth_api_url }}
allow_sign_up = {{ grafana_oauth_allow_sign_up | lower }}
role_attribute_path = {{ grafana_oauth_role_attribute_path }}
use_pkce = true
{% endif %}

[log]
mode = console
level = info

[analytics]
reporting_enabled = false
check_for_updates = false
check_for_plugin_updates = false

[snapshots]
external_enabled = false
36
roles/metrics/templates/grafana.service.j2
Normal file
@@ -0,0 +1,36 @@
[Unit]
Description=Grafana visualization platform
Documentation=https://grafana.com/docs/
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User={{ grafana_user }}
Group={{ grafana_group }}

WorkingDirectory=/opt/grafana
ExecStart=/opt/grafana/bin/grafana-server \
    --config=/etc/grafana/grafana.ini \
    --homepath=/opt/grafana

Restart=on-failure
RestartSec=5s

# Security hardening
{% if grafana_systemd_security %}
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths={{ grafana_data_dir }} {{ grafana_logs_dir }}
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictRealtime=true
RestrictNamespaces=true
LockPersonality=true
{% endif %}

[Install]
WantedBy=multi-user.target
42
roles/metrics/templates/node_exporter.service.j2
Normal file
@@ -0,0 +1,42 @@
[Unit]
Description=Prometheus Node Exporter
Documentation=https://github.com/prometheus/node_exporter
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User={{ node_exporter_user }}
Group={{ node_exporter_group }}

ExecStart=/usr/local/bin/node_exporter \
    --web.listen-address={{ node_exporter_listen_address }} \
{% for collector in node_exporter_enabled_collectors %}
    --collector.{{ collector }} \
{% endfor %}
{% for collector in node_exporter_disabled_collectors %}
    --no-collector.{{ collector }} \
{% endfor %}
    --collector.filesystem.fs-types-exclude="{{ node_exporter_filesystem_ignored_fs_types | join('|') }}" \
    --collector.filesystem.mount-points-exclude="{{ node_exporter_filesystem_ignored_mount_points | join('|') }}"

Restart=on-failure
RestartSec=5s

# Security hardening
{% if node_exporter_systemd_security %}
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictRealtime=true
RestrictNamespaces=true
LockPersonality=true
ReadOnlyPaths=/
{% endif %}

[Install]
WantedBy=multi-user.target
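The `ExecStart` line in this unit template is assembled from the collector lists in the role defaults, and the two filesystem exclusions are joined with `|` into a single alternation pattern. The sketch below mimics that rendering in plain Python so the resulting flags are easy to inspect. `render_node_exporter_flags` is a hypothetical helper written for illustration; the role itself relies on Jinja2 templating, not on this function.

```python
def render_node_exporter_flags(listen_address, enabled, disabled,
                               fs_types, mount_points):
    """Build the node_exporter argument list in the same order as the template.

    Hypothetical helper for illustration; names mirror the role variables.
    """
    flags = [f"--web.listen-address={listen_address}"]
    # One --collector.<name> flag per enabled collector.
    flags += [f"--collector.{name}" for name in enabled]
    # One --no-collector.<name> flag per disabled collector.
    flags += [f"--no-collector.{name}" for name in disabled]
    # Jinja's join('|') corresponds to "|".join(...) in Python.
    flags.append('--collector.filesystem.fs-types-exclude="{}"'
                 .format("|".join(fs_types)))
    flags.append('--collector.filesystem.mount-points-exclude="{}"'
                 .format("|".join(mount_points)))
    return flags


if __name__ == "__main__":
    rendered = render_node_exporter_flags(
        "127.0.0.1:9100",
        ["cpu", "meminfo"],
        ["mdadm"],
        ["tmpfs", "devtmpfs"],
        ["/run/.*", "/sys/.*"],
    )
    print(" \\\n    ".join(rendered))
```

Printing the list with a trailing backslash per line gives a shape close to the continuation-style `ExecStart` in the unit file.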
22
roles/metrics/templates/scrape.yml.j2
Normal file
@@ -0,0 +1,22 @@
global:
  scrape_interval: {{ victoriametrics_scrape_interval }}
  scrape_timeout: {{ victoriametrics_scrape_timeout }}
  external_labels:
    environment: '{{ "homelab" if inventory_hostname in groups["homelab"] else "production" }}'
    host: '{{ inventory_hostname }}'

scrape_configs:
  # VictoriaMetrics self-monitoring
  - job_name: 'victoriametrics'
    static_configs:
      - targets: ['{{ victoriametrics_listen_address }}']
        labels:
          service: 'victoriametrics'

  # Node exporter for system metrics
  - job_name: 'node'
    static_configs:
      - targets: ['{{ node_exporter_listen_address }}']
        labels:
          service: 'node_exporter'
          instance: '{{ inventory_hostname }}'
41
roles/metrics/templates/victoriametrics.service.j2
Normal file
@@ -0,0 +1,41 @@
[Unit]
Description=VictoriaMetrics time-series database
Documentation=https://docs.victoriametrics.com/
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User={{ victoriametrics_user }}
Group={{ victoriametrics_group }}

ExecStart=/usr/local/bin/victoria-metrics-prod \
    -storageDataPath={{ victoriametrics_data_dir }} \
    -retentionPeriod={{ victoriametrics_retention_period }} \
    -httpListenAddr={{ victoriametrics_listen_address }} \
    -promscrape.config={{ victoriametrics_scrape_config_file }} \
    -memory.allowedPercent={{ victoriametrics_memory_allowed_percent }} \
    -storage.minFreeDiskSpaceBytes={{ victoriametrics_storage_min_free_disk_space_bytes }}

ExecReload=/bin/kill -HUP $MAINPID

Restart=on-failure
RestartSec=5s

# Security hardening
{% if victoriametrics_systemd_security %}
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths={{ victoriametrics_data_dir }}
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictRealtime=true
RestrictNamespaces=true
LockPersonality=true
{% endif %}

[Install]
WantedBy=multi-user.target