rick-infra/docs/authentik-deployment-guide.md
Joakim 3506e55016 Migrate to rootful container architecture with infrastructure fact pattern
Major architectural change from rootless user services to system-level (rootful)
containers to enable group-based Unix socket access for containerized applications.

Infrastructure Changes:
- PostgreSQL: Export postgres-clients group GID as Ansible fact
- Valkey: Export valkey-clients group GID as Ansible fact
- Valkey: Add socket-fix service to maintain correct socket group ownership
- Both: Set socket directories to 770 with client group ownership

Authentik Role Refactoring:
- Remove rootless container configuration (subuid/subgid, lingering, user systemd)
- Deploy Quadlet files to /etc/containers/systemd/ (system-level)
- Use dynamic GID facts in container PodmanArgs (--group-add)
- Simplify user creation to system user with infrastructure group membership
- Update handlers for system scope service management
- Remove unnecessary container security options (no user namespace isolation)

Container Template Changes:
- Pod: Remove --userns args, change WantedBy to multi-user.target
- Containers: Replace Annotation with PodmanArgs using dynamic GIDs
- Remove /dev/shm mounts and SecurityLabelDisable (not needed for rootful)
- Change WantedBy to multi-user.target for system services

Documentation Updates:
- Add ADR-005: Rootful Containers with Infrastructure Fact Pattern
- Update ADR-003: Podman + systemd for system-level deployment
- Update authentik-deployment-guide.md for system scope commands
- Update service-integration-guide.md with rootful pattern examples
- Document discarded rootless approach and rationale

Why Rootful Succeeds:
- Direct UID/GID mapping preserves supplementary groups
- Container process groups match host socket group ownership
- No user namespace remapping breaking permissions

Why Rootless Failed (Discarded):
- User namespace UID/GID remapping broke group-based socket access
- Supplementary groups remapped into subgid range didn't match socket ownership
- Even with --userns=host and keep_original_groups, permissions failed

Pattern Established:
- Infrastructure roles create client groups and export GID facts
- Application roles validate facts and consume in container templates
- Rootful containers run as dedicated users with --group-add for socket access
- System-level deployment provides standard systemd service management

Deployment Validated:
- Services in /system.slice/ ✓
- Process groups: 961 (valkey-clients), 962 (postgres-clients), 966 (authentik) ✓
- Socket permissions: 770 with client groups ✓
- HTTP endpoint responding ✓
2025-12-14 16:56:50 +01:00

# Authentik Deployment Guide
A guide to deploying the Authentik authentication server with native database services and Unix socket IPC in the rick-infra environment.
## Overview
This guide covers the complete deployment process for Authentik, a modern authentication and authorization server, integrated with:
- **Native PostgreSQL** - High-performance database with Unix socket IPC
- **Native Valkey** - Redis-compatible cache with Unix socket IPC
- **Podman Containers** - System-level container orchestration via systemd/Quadlet
- **Caddy Reverse Proxy** - TLS termination and forward authentication
## Architecture Summary
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────────────┐
│    Internet     │    │   Caddy Proxy   │    │   Authentik (systemd)   │
│                 │───▶│  auth.jnss.me   │───▶│     /system.slice/      │
│    HTTPS/443    │    │  TLS + Forward  │    │                         │
└─────────────────┘    │      Auth       │    │ ┌─────────────────────┐ │
                       └─────────────────┘    │ │ Pod + Server/Worker │ │
                                              │ │ User: 966:966       │ │
                                              │ │ Groups: 961,962     │ │
                                              │ └─────────────────────┘ │
                                              └─────────────────────────┘
                         ┌─────────────────────────────────┼─────────────────┐
                         │ Host Infrastructure             │                 │
                         │                                 ▼                 │
                         │   ┌─────────────┐   ┌─────────────────────────┐   │
                         │   │ PostgreSQL  │   │         Valkey          │   │
                         │   │ (Native)    │   │   (Redis-compatible)    │   │
                         │   │ Unix Socket │   │       Unix Socket       │   │
                         │   │ Group: 962  │   │       Group: 961        │   │
                         │   └─────────────┘   └─────────────────────────┘   │
                         └───────────────────────────────────────────────────┘
```
## Prerequisites
### Infrastructure Requirements
Before deploying Authentik, ensure the following infrastructure services are running:
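```bash
# Verify required services are active
ssh root@your-vps "systemctl is-active postgresql valkey caddy"
# Podman is a CLI runtime, not a long-running service; confirm it is installed
ssh root@your-vps "podman --version"
```
Expected output: `active` for each service, followed by the Podman version string.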
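The service check can be scripted so that it fails loudly when any unit is down. The following is a minimal sketch, assuming the states printed by `systemctl is-active` are piped in one per line; the function name `all_active` is hypothetical and not part of the rick-infra roles.

```bash
#!/usr/bin/env bash
# all_active: read unit states (one per line, as printed by
# `systemctl is-active unit1 unit2 ...`) from stdin and return
# non-zero if any state is something other than "active".
all_active() {
    local state seen=0
    while IFS= read -r state; do
        [ -z "$state" ] && continue
        seen=1
        if [ "$state" != "active" ]; then
            echo "unit not active: $state" >&2
            return 1
        fi
    done
    # Empty input means nothing was checked; treat that as a failure.
    [ "$seen" -eq 1 ]
}
```

A possible invocation is `ssh root@your-vps "systemctl is-active postgresql valkey caddy" | all_active`, which exits non-zero if any of the listed units is not running.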
### Required Infrastructure Components
1. **PostgreSQL Database**
   - Native systemd service with Unix socket enabled
   - Socket location: `/var/run/postgresql/.s.PGSQL.5432`
2. **Valkey Cache Service**
   - Native systemd service with Unix socket enabled
   - Socket location: `/var/run/valkey/valkey.sock`
3. **Podman Container Runtime**
   - Container runtime installed
   - systemd integration (Quadlet) configured
4. **Caddy Web Server**
   - TLS/SSL termination configured
   - API management enabled for dynamic configuration
### DNS Configuration
Ensure DNS records are configured for your authentik domain:
```bash
# Verify DNS resolution
dig +short auth.jnss.me
# Should return your VPS IP address
```
### Network Requirements
- **Port 80/443**: Open for HTTP/HTTPS traffic
- **Internal communications**: Unix sockets (no additional ports required)
## Vault Variables Setup
### Required Vault Variables
Create or update `host_vars/arch-vps/vault.yml` with the following encrypted variables:
```yaml
---
# Authentik Database Password
vault_authentik_db_password: "secure_random_password_32_chars"
# Authentik Secret Key (generate with: openssl rand -base64 32)
vault_authentik_secret_key: "your_generated_secret_key_here"
# Authentik Admin Password
vault_authentik_admin_password: "secure_admin_password"
# Infrastructure Dependencies (should already exist)
vault_valkey_password: "valkey_password"
```
### Generate Required Secrets
```bash
# Generate Authentik secret key
openssl rand -base64 32
# Generate secure passwords
openssl rand -base64 20
```
### Encrypt Vault File
```bash
# Encrypt the vault file
ansible-vault encrypt host_vars/arch-vps/vault.yml
# Verify vault variables
ansible-vault view host_vars/arch-vps/vault.yml
```
## Pre-deployment Validation
### Infrastructure Health Check
Run the following commands to verify infrastructure readiness:
```bash
# 1. Check PostgreSQL Unix socket
ssh root@your-vps "ls -la /var/run/postgresql/"
# Should show .s.PGSQL.5432 socket file
# 2. Check Valkey Unix socket
ssh root@your-vps "ls -la /var/run/valkey/"
# Should show valkey.sock file
# 3. Test PostgreSQL connectivity
ssh root@your-vps "sudo -u postgres psql -h /var/run/postgresql -c 'SELECT version();'"
# Should return PostgreSQL version
# 4. Test Valkey connectivity
ssh root@your-vps "redis-cli -s /var/run/valkey/valkey.sock ping"
# Should return "PONG"
# 5. Verify Caddy is responsive
curl -I https://jnss.me/
# Should return HTTP/2 200
```
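Beyond confirming that the socket files exist, it can help to assert the directory mode and group that the containers depend on (770 with the client group, per the rootful pattern). The following is a sketch only: `dir_matches` is a hypothetical helper, and it assumes GNU coreutils `stat` on the host.

```bash
#!/usr/bin/env bash
# dir_matches: succeed when DIR has exactly the octal MODE and group GROUP.
# Relies on GNU coreutils `stat -c` (standard on Linux hosts).
dir_matches() {
    local dir=$1 mode=$2 group=$3 actual
    actual=$(stat -c '%a %G' "$dir") || return 2
    if [ "$actual" != "$mode $group" ]; then
        echo "$dir: expected '$mode $group', found '$actual'" >&2
        return 1
    fi
}
```

Run on the VPS, `dir_matches /var/run/postgresql 770 postgres-clients` and `dir_matches /var/run/valkey 770 valkey-clients` would flag a directory whose ownership or mode has drifted.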
### Ansible Configuration Validation
```bash
# Test Ansible connectivity
ansible arch-vps -m ping
# Verify vault variables can be decrypted
ansible arch-vps -m debug -a "var=vault_authentik_secret_key" --ask-vault-pass
```
## Step-by-Step Deployment
### Step 1: Enable Authentik Role
Update `site.yml` to include the authentik role:
```yaml
- name: Deploy Core Infrastructure
  hosts: arch-vps
  become: true
  gather_facts: true
  roles:
    # Infrastructure dependencies handled automatically via meta/main.yml
    - role: authentik
      tags: ['authentik', 'auth', 'sso']
```
### Step 2: Execute Deployment
```bash
# Full deployment with verbose output
ansible-playbook -i inventory/hosts.yml site.yml --tags authentik --ask-vault-pass -v
# Alternative: Deploy with specific components
ansible-playbook -i inventory/hosts.yml site.yml --tags authentik,database,containers --ask-vault-pass
```
### Step 3: Monitor Deployment Progress
During deployment, monitor the following:
```bash
# Monitor deployment logs in real-time (separate terminal)
ssh root@your-vps "journalctl -f"
# Watch for authentik-specific services (system scope)
ssh root@your-vps "systemctl status authentik-pod authentik-server authentik-worker"
```
### Step 4: Verify Container Deployment
After deployment completion:
```bash
# Check systemd services (system scope)
ssh root@your-vps "systemctl list-units 'authentik*'"
# Verify service location
ssh root@your-vps "systemctl status authentik-server | grep CGroup"
# Expected: /system.slice/authentik-server.service
# Verify containers are running
ssh root@your-vps "podman ps"
# Check pod status
ssh root@your-vps "podman pod ps"
```
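The "service location" check can also be made machine-readable. A minimal sketch, assuming the control-group path is obtained with `systemctl show -p ControlGroup --value <unit>`; the helper name `in_system_slice` is hypothetical.

```bash
#!/usr/bin/env bash
# in_system_slice: succeed when a control-group path, as reported by
#   systemctl show -p ControlGroup --value <unit>
# lies under /system.slice/ (system scope rather than a user session).
in_system_slice() {
    case $1 in
        /system.slice/*) return 0 ;;
        *)               return 1 ;;
    esac
}
```

For example, `in_system_slice "$(systemctl show -p ControlGroup --value authentik-server)"` run on the VPS should succeed after a rootful deployment and fail if a leftover user-scope unit is still in use.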
### Step 5: Health Check Verification
```bash
# Test internal HTTP endpoint
ssh root@your-vps "curl -I http://127.0.0.1:9000/"
# Expected: HTTP/1.1 302 Found (redirect to login)
# Test external HTTPS endpoint
curl -I https://auth.jnss.me/
# Expected: HTTP/2 200 or 302
# Test authentik API endpoint
curl -s https://auth.jnss.me/api/v3/admin/version/
# Expected: JSON response with authentication error (proves API is responsive)
```
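The endpoint checks above are read by eye; for automation, the status code alone is enough. This sketch assumes that 200 and 302 are the only codes considered healthy, as in the expectations listed above. The helper names `http_ok` and `probe` are illustrative, not part of the deployment.

```bash
#!/usr/bin/env bash
# http_ok: succeed for the status codes this guide treats as healthy
# (200 OK or a 302 redirect to the login flow).
http_ok() {
    case $1 in
        200|302) return 0 ;;
        *)       return 1 ;;
    esac
}

# probe: print only the HTTP status code for URL, discarding the body.
# curl's -w '%{http_code}' write-out prints 000 when no response arrived.
probe() {
    curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$1"
}
```

A combined check might look like `http_ok "$(probe https://auth.jnss.me/)" || echo "authentik unreachable"`.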
## Post-deployment Configuration
### Initial Admin Access
1. **Access Web Interface**:
   ```bash
   # Open in browser
   https://auth.jnss.me/
   ```
2. **Admin Login**:
   - **Username**: `admin@auth.jnss.me` (or configured admin email)
   - **Password**: Value from `vault_authentik_admin_password`
3. **Verify Admin Access**:
   - Navigate to `/if/admin/`
   - Confirm admin interface loads successfully
   - Check system status in admin dashboard
### Essential Configuration Tasks
#### 1. Configure OAuth2 Provider
```bash
# Navigate to Applications → Providers → Create
# Provider Type: OAuth2/OpenID Provider
# Name: "Default OAuth2 Provider"
# Client Type: Confidential
# Authorization Grant Type: Authorization Code
# Redirect URIs: Add your application callback URLs
```
#### 2. Create Application
```bash
# Navigate to Applications → Applications → Create
# Name: "Your Application Name"
# Slug: "your-app"
# Provider: Select OAuth2 provider created above
# Launch URL: Your application URL
```
#### 3. Configure Forward Auth (for Caddy integration)
```bash
# Navigate to Applications → Providers → Create
# Provider Type: Proxy Provider
# Name: "Forward Auth Provider"
# External Host: https://your-service.jnss.me
# Internal Host: http://localhost:8080 (your service backend)
```
## Service Integration Examples
### Example 1: Protect Existing HTTP Service with Forward Auth
Add to your service's Caddy configuration:
```caddyfile
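# In /etc/caddy/sites-enabled/myservice.caddy
myservice.jnss.me {
    # Always route the outpost path to authentik's internal HTTP endpoint
    reverse_proxy /outpost.goauthentik.io/* http://127.0.0.1:9000
    # Forward authentication to authentik
    forward_auth http://127.0.0.1:9000 {
        uri /outpost.goauthentik.io/auth/caddy
        # authentik emits X-Authentik-* headers; capitalization matters
        copy_headers X-Authentik-Username X-Authentik-Name X-Authentik-Email X-Authentik-Groups
    }
    # Your service backend
    reverse_proxy localhost:8080
}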
```
### Example 2: OAuth2 Integration for Custom Applications
For applications that can handle OAuth2 directly:
```yaml
# Application configuration
OAUTH2_PROVIDER_URL: "https://auth.jnss.me/application/o/authorize/"
OAUTH2_TOKEN_URL: "https://auth.jnss.me/application/o/token/"
OAUTH2_USER_INFO_URL: "https://auth.jnss.me/application/o/userinfo/"
OAUTH2_CLIENT_ID: "your_client_id"
OAUTH2_CLIENT_SECRET: "your_client_secret"
OAUTH2_REDIRECT_URI: "https://yourapp.jnss.me/oauth/callback"
```
## Troubleshooting Guide
### Common Issues and Solutions
#### Issue: Containers fail to start with socket permission errors
**Symptoms**:
```
Error: failed to connect to database: permission denied
```
**Solution**:
```bash
# Check authentik user group membership
ssh root@your-vps "groups authentik"
# Should show: authentik postgres-clients valkey-clients
# Verify container process groups (PID taken from podman instead of grepping ps output)
ssh root@your-vps 'grep Groups /proc/$(podman inspect --format "{{.State.Pid}}" authentik-server)/status'
# Should show: Groups: 961 962 966 (valkey-clients postgres-clients authentik)
# Verify socket permissions
ssh root@your-vps "ls -la /var/run/postgresql/ /var/run/valkey/"
# Fix group membership if missing
ansible-playbook site.yml --tags authentik,user,setup --ask-vault-pass
```
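Because socket access depends entirely on the container process carrying the client GIDs, the `Groups:` line from `/proc/<pid>/status` is worth checking programmatically. A minimal sketch under that assumption; `has_gids` is a hypothetical helper, and the GIDs 961 and 962 are the values from this deployment, which may differ on other hosts.

```bash
#!/usr/bin/env bash
# has_gids: succeed when every GID given after the first argument appears
# in a "Groups:" line taken from /proc/<pid>/status.
has_gids() {
    local line=$1 gid gids
    shift
    # Drop the "Groups:" label and pad with spaces so that whole-number
    # matching also works for the first and last entries.
    gids=" ${line#Groups:} "
    gids=${gids//$'\t'/ }
    for gid in "$@"; do
        case $gids in
            *" $gid "*) ;;
            *) echo "missing supplementary group: $gid" >&2; return 1 ;;
        esac
    done
}
```

For instance, `has_gids "$(grep Groups /proc/<pid>/status)" 961 962` fails with a message naming the first missing group, which points directly at the `--group-add` argument that did not take effect.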
#### Issue: HTTP binding errors (address already in use)
**Symptoms**:
```
Error: bind: address already in use (port 9000)
```
**Solution**:
```bash
# Check what's using port 9000 (ss ships with iproute2; netstat needs the legacy net-tools package)
ssh root@your-vps "ss -tulpn | grep ':9000'"
# Stop conflicting services
ssh root@your-vps "systemctl stop authentik-pod"
# Restart with correct configuration
ansible-playbook site.yml --tags authentik,containers --ask-vault-pass
```
#### Issue: Database connection failures
**Symptoms**:
```
FATAL: database "authentik" does not exist
```
**Solution**:
```bash
# Recreate database and user
ansible-playbook site.yml --tags authentik,database --ask-vault-pass
# Verify database creation
ssh root@your-vps "sudo -u postgres psql -h /var/run/postgresql -c '\l'"
```
#### Issue: Cache connection failures
**Symptoms**:
```
Error connecting to Redis: Connection refused
```
**Solution**:
```bash
# Check Valkey service status
ssh root@your-vps "systemctl status valkey"
# Test socket connectivity
ssh root@your-vps "redis-cli -s /var/run/valkey/valkey.sock ping"
# Redeploy cache configuration if needed
ansible-playbook site.yml --tags authentik,cache --ask-vault-pass
```
### Diagnostic Commands
#### Container Debugging
```bash
# Check container logs
ssh root@your-vps "podman logs authentik-server"
ssh root@your-vps "podman logs authentik-worker"
# Inspect container configuration
ssh root@your-vps "podman inspect authentik-server"
# Check container user/group mapping
ssh root@your-vps "podman exec authentik-server id"
# Expected: uid=966(authentik) gid=966(authentik) groups=966(authentik),961(valkey-clients),962(postgres-clients)
```
#### Service Status Verification
```bash
# Check all authentik systemd services
ssh root@your-vps "systemctl status authentik-pod authentik-server authentik-worker"
# View service dependencies
ssh root@your-vps "systemctl list-dependencies authentik-pod"
# Verify services are in system.slice
ssh root@your-vps "systemctl status authentik-server | grep CGroup"
# Expected: /system.slice/authentik-server.service
```
#### Network Connectivity Testing
```bash
# Test internal HTTP binding
ssh root@your-vps "curl -v http://127.0.0.1:9000/"
# Test Caddy reverse proxy
ssh root@your-vps "curl -v http://127.0.0.1:80/ -H 'Host: auth.jnss.me'"
# Test external HTTPS
curl -v https://auth.jnss.me/
```
### Log Analysis
#### Key Log Locations
```bash
# Authentik application logs
ssh root@your-vps "cat /opt/authentik/logs/server.log"
ssh root@your-vps "cat /opt/authentik/logs/worker.log"
# systemd service logs
ssh root@your-vps "journalctl -u authentik-server -f"
ssh root@your-vps "journalctl -u authentik-worker -f"
# Caddy logs for reverse proxy issues
ssh root@your-vps "journalctl -u caddy -f"
```
#### Common Log Patterns
**Successful startup**:
```
INFO authentik.core.signals: authentik 2025.10.x starting
INFO authentik.core.models: Database version up-to-date
```
**Database connection success**:
```
INFO authentik.core.db: Connected to database via unix socket
```
**Cache connection success**:
```
INFO authentik.core.cache: Connected to cache via unix socket
```
## Performance Monitoring
### Resource Usage Monitoring
```bash
# Monitor container resource usage
ssh root@your-vps "podman stats"
# Monitor service memory usage
ssh root@your-vps "systemctl status authentik-server | grep Memory"
# Monitor database connections
ssh root@your-vps "sudo -u postgres psql -h /var/run/postgresql -c 'SELECT * FROM pg_stat_activity;'"
```
### Performance Optimization Tips
1. **Database Performance**:
   - Monitor PostgreSQL slow query log
   - Consider database connection pooling for high traffic
   - Regular database maintenance (VACUUM, ANALYZE)
2. **Cache Performance**:
   - Monitor Valkey memory usage and hit rate
   - Adjust cache TTL settings based on usage patterns
   - Consider cache warming for frequently accessed data
3. **Container Performance**:
   - Monitor container memory limits and usage
   - Optimize shared memory configuration if needed
   - Review worker process configuration
## Maintenance Tasks
### Regular Maintenance
#### Update Authentik Version
```yaml
# Update version in defaults or inventory
authentik_version: "2025.12.1" # New version
# Then deploy the update from the control machine:
#   ansible-playbook site.yml --tags authentik,containers --ask-vault-pass
```
#### Backup Procedures
```bash
# Database backup
ssh root@your-vps "sudo -u postgres pg_dump -h /var/run/postgresql authentik > /backup/authentik-$(date +%Y%m%d).sql"
# Media files backup
ssh root@your-vps "tar -czf /backup/authentik-media-$(date +%Y%m%d).tar.gz -C /opt/authentik media"
# Configuration backup (run from ansible control machine); copying keeps the vault encrypted
cp host_vars/arch-vps/vault.yml backup/authentik-vault-$(date +%Y%m%d).yml
```
#### Health Monitoring
Set up regular health checks:
```bash
#!/bin/bash
# Health check script
HEALTH_URL="https://auth.jnss.me/-/health/live/"
if ! curl -f -s "$HEALTH_URL" > /dev/null; then
    echo "Authentik health check failed"
    # Add alerting logic
fi
```
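A single failed request during a container restart would trigger a false alarm, so a health check usually benefits from a few retries. The following is a minimal sketch of such a wrapper; the `retry` function and its argument order are assumptions for illustration, not an existing rick-infra utility.

```bash
#!/usr/bin/env bash
# retry: run a command up to ATTEMPTS times, sleeping DELAY seconds
# between tries; succeed as soon as one attempt succeeds.
retry() {
    local attempts=$1 delay=$2 i
    shift 2
    for ((i = 1; i <= attempts; i++)); do
        "$@" && return 0
        # Skip the sleep after the final attempt.
        [ "$i" -lt "$attempts" ] && sleep "$delay"
    done
    return 1
}
```

In the script above, the `curl` test could then become `retry 3 10 curl -f -s -o /dev/null "$HEALTH_URL"`, which alerts only after three consecutive failures spaced ten seconds apart.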
### Security Maintenance
#### Certificate Monitoring
```bash
# Check certificate expiration
ssh root@your-vps "curl -vI https://auth.jnss.me/ 2>&1 | grep expire"
# Caddy handles renewal automatically, but monitor logs
ssh root@your-vps "journalctl -u caddy | grep -i cert"
```
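To turn the expiry date into something that can be alerted on, the timestamp can be converted into a number of remaining days. This sketch assumes GNU `date` (for `-d`) and an input such as the string curl prints after `expire date:`; `days_until` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# days_until: print whole days from now until the given timestamp.
# Relies on GNU `date -d`, which parses strings such as
# "Mar 14 12:00:00 2026 GMT" (the format curl prints after "expire date:").
days_until() {
    local expiry now
    expiry=$(date -u -d "$1" +%s) || return 1
    now=$(date -u +%s)
    echo $(( (expiry - now) / 86400 ))
}
```

A monitoring job could compare the result against a threshold, for example warning when `days_until "<expire date string>"` prints a value below 14. Since Caddy renews certificates automatically, a low value suggests that renewal is failing.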
#### Security Updates
```bash
# Update container images regularly
ansible-playbook site.yml --tags authentik,image-pull --ask-vault-pass
# Monitor for Authentik security advisories
# https://github.com/goauthentik/authentik/security/advisories
```
## Support and Resources
### Documentation References
- **Authentik Official Documentation**: https://docs.goauthentik.io/
- **rick-infra Architecture Decisions**: [docs/architecture-decisions.md](architecture-decisions.md)
- **Service Integration Guide**: [docs/service-integration-guide.md](service-integration-guide.md)
- **Security Model**: [docs/security-hardening.md](security-hardening.md)
### Community Resources
- **Authentik Community Forum**: https://community.goauthentik.io/
- **GitHub Issues**: https://github.com/goauthentik/authentik/issues
- **Discord Community**: https://discord.gg/jg33eMhnj6
### Emergency Procedures
#### Service Recovery
```bash
# Emergency service restart
ssh root@your-vps "systemctl restart authentik-pod"
# Fallback: Direct container management
ssh root@your-vps "podman pod restart authentik"
# Last resort: Full service rebuild
ansible-playbook site.yml --tags authentik --ask-vault-pass --limit arch-vps
```
#### Rollback Procedures
```bash
# Rollback to previous container version: first set this in defaults or inventory
#   authentik_version: "previous_working_version"
ansible-playbook site.yml --tags authentik,containers --ask-vault-pass
# Database rollback (if needed)
ssh root@your-vps "sudo -u postgres psql -h /var/run/postgresql authentik < /backup/authentik-backup.sql"
```
---
## Deployment Checklist
Use this checklist to ensure complete deployment:
### Pre-deployment
- [ ] Infrastructure services (PostgreSQL, Valkey, Caddy, Podman) running
- [ ] DNS records configured for auth.jnss.me
- [ ] Vault variables configured and encrypted
- [ ] Ansible connectivity verified
### Deployment
- [ ] Authentik role enabled in site.yml
- [ ] Deployment executed successfully
- [ ] Health checks passing
- [ ] Containers running and responsive
### Post-deployment
- [ ] Admin web interface accessible
- [ ] Initial admin login successful
- [ ] OAuth2 provider configured
- [ ] Test application integration
- [ ] Forward auth configuration tested
### Production Readiness
- [ ] Backup procedures implemented
- [ ] Monitoring and alerting configured
- [ ] Security review completed
- [ ] Documentation updated
- [ ] Team training completed
---
This guide covers deploying and maintaining Authentik in the rick-infra environment, built on the security and performance benefits of the native database + Unix socket architecture.