Migrate to rootful container architecture with infrastructure fact pattern

Switch from rootless user services to system-level (rootful) containers so that
containerized applications can reach host Unix sockets through group membership.

Infrastructure Changes:
- PostgreSQL: Export postgres-clients group GID as Ansible fact
- Valkey: Export valkey-clients group GID as Ansible fact
- Valkey: Add socket-fix service to maintain correct socket group ownership
- Both: Set socket directories to 770 with client group ownership

Authentik Role Refactoring:
- Remove rootless container configuration (subuid/subgid, lingering, user systemd)
- Deploy Quadlet files to /etc/containers/systemd/ (system-level)
- Use dynamic GID facts in container PodmanArgs (--group-add)
- Simplify user creation to system user with infrastructure group membership
- Update handlers for system scope service management
- Remove unnecessary container security options (no user namespace isolation)

Container Template Changes:
- Pod: Remove --userns args, change WantedBy to multi-user.target
- Containers: Replace Annotation with PodmanArgs using dynamic GIDs
- Remove /dev/shm mounts and SecurityLabelDisable (not needed for rootful)
- Change WantedBy to multi-user.target for system services

Documentation Updates:
- Add ADR-005: Rootful Containers with Infrastructure Fact Pattern
- Update ADR-003: Podman + systemd for system-level deployment
- Update authentik-deployment-guide.md for system scope commands
- Update service-integration-guide.md with rootful pattern examples
- Document discarded rootless approach and rationale

Why Rootful Succeeds:
- Direct UID/GID mapping preserves supplementary groups
- Container process groups match host socket group ownership
- No user namespace remapping to break socket permissions

Why Rootless Failed (Discarded):
- User namespace UID/GID remapping broke group-based socket access
- Supplementary groups remapped into subgid range didn't match socket ownership
- Even with --userns=host and keep_original_groups, permissions failed

Pattern Established:
- Infrastructure roles create client groups and export GID facts
- Application roles validate facts and consume in container templates
- Rootful containers run as dedicated users with --group-add for socket access
- System-level deployment provides standard systemd service management
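
The pattern above can be sketched in shell. This is a rough illustration under
stated assumptions, not the roles' actual implementation: the helper names
`group_add_args` and `gid_of` are hypothetical, and in the real roles the GIDs
travel as Ansible facts into the Quadlet PodmanArgs line rather than through a
script.

```shell
#!/bin/sh
# Hypothetical helper: turn a list of numeric GIDs into podman --group-add flags.
group_add_args() {
    args=""
    for gid in "$@"; do
        # Append each flag, separating entries with a single space.
        args="${args:+$args }--group-add $gid"
    done
    printf '%s\n' "$args"
}

# Hypothetical helper: resolve a group name to its numeric GID on the host.
# Prints nothing if the group does not exist.
gid_of() {
    getent group "$1" | cut -d: -f3
}

# Example using the GIDs reported in this commit's validation output.
group_add_args 961 962
```

On a host where the infrastructure roles have already run, the GIDs could be
looked up instead of hard-coded, for example
`group_add_args "$(gid_of valkey-clients)" "$(gid_of postgres-clients)"`. An
application role would still want to fail early if either lookup comes back
empty, which mirrors the "validate facts" step above.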

Deployment Validated:
- Services in /system.slice/ ✓
- Process groups: 961 (valkey-clients), 962 (postgres-clients), 966 (authentik) ✓
- Socket permissions: 770 with client groups ✓
- HTTP endpoint responding ✓
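
The process-group check above could be scripted roughly as follows. This is a
minimal sketch: `has_groups` is a hypothetical helper, and the sample line
reuses the GIDs listed above instead of reading a live process.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every required GID appears in a
# "Groups:" line of the form found in /proc/<pid>/status.
has_groups() {
    line=$1
    shift
    for want in "$@"; do
        found=no
        # Strip the "Groups:" label and let the shell split the GIDs on whitespace.
        for have in ${line#Groups:}; do
            [ "$have" = "$want" ] && found=yes
        done
        [ "$found" = yes ] || return 1
    done
    return 0
}

# Sample line using the GIDs from the validation output above.
sample="Groups: 961 962 966"
if has_groups "$sample" 961 962; then
    echo "socket groups present"
else
    echo "socket groups missing"
fi
```

Against a running container process, the first argument would come from
`grep '^Groups:' /proc/<pid>/status`, where `<pid>` is the process to inspect.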
2025-12-14 16:56:50 +01:00
parent 9e570ac2a3
commit 3506e55016
21 changed files with 587 additions and 288 deletions


@@ -8,18 +8,22 @@ This guide covers the complete deployment process for Authentik, a modern authen
- **Native PostgreSQL** - High-performance database with Unix socket IPC
- **Native Valkey** - Redis-compatible cache with Unix socket IPC
- **Rootless Podman** - Secure container orchestration via systemd/Quadlet
- **Podman Containers** - System-level container orchestration via systemd/Quadlet
- **Caddy Reverse Proxy** - TLS termination and forward authentication
## Architecture Summary
```
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Internet │ │ Caddy Proxy │ │ Authentik Pod
│ │───▶│ auth.jnss.me │───▶│
│ HTTPS/443 │ │ TLS + Forward │ │ Server + Worker
└─────────────────┘ │ Auth │ │ Containers
└─────────────────┘ └─────────────────┘
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────────
│ Internet │ │ Caddy Proxy │ │ Authentik (systemd)
│ │───▶│ auth.jnss.me │───▶│ /system.slice/
│ HTTPS/443 │ │ TLS + Forward │ │
└─────────────────┘ │ Auth │ │ ┌─────────────────────┐
└─────────────────┘ │ │ Pod + Server/Worker │ │
│ │ User: 966:966 │ │
│ │ Groups: 961,962 │ │
│ └─────────────────────┘ │
└─────────────────────────┘
┌─────────────────────────────────┼─────────────────┐
│ Host Infrastructure │ │
@@ -28,6 +32,7 @@ This guide covers the complete deployment process for Authentik, a modern authen
│ │ PostgreSQL │ │ Valkey │ │
│ │ (Native) │ │ (Redis-compatible) │ │
│ │ Unix Socket │ │ Unix Socket │ │
│ │ Group: 962 │ │ Group: 961 │ │
│ └─────────────┘ └─────────────────────────┘ │
└─────────────────────────────────────────────────────┘
```
@@ -56,8 +61,8 @@ Expected output: All services should show `active`
- Socket location: `/var/run/valkey/valkey.sock`
3. **Podman Container Runtime**
- Rootless container support configured
- systemd user session support enabled
- Container runtime installed
- systemd integration (Quadlet) configured
4. **Caddy Web Server**
- TLS/SSL termination configured
@@ -202,14 +207,18 @@ ssh root@your-vps "systemctl --user -M authentik@ status"
After deployment completion:
```bash
# Check systemd user services for authentik user
ssh root@your-vps "systemctl --user -M authentik@ list-units 'authentik*'"
# Check systemd services (system scope)
ssh root@your-vps "systemctl list-units 'authentik*'"
# Verify service location
ssh root@your-vps "systemctl status authentik-server | grep CGroup"
# Expected: /system.slice/authentik-server.service
# Verify containers are running
ssh root@your-vps "sudo -u authentik podman ps"
ssh root@your-vps "podman ps"
# Check pod status
ssh root@your-vps "sudo -u authentik podman pod ps"
ssh root@your-vps "podman pod ps"
```
### Step 5: Health Check Verification
@@ -329,7 +338,11 @@ Error: failed to connect to database: permission denied
```bash
# Check authentik user group membership
ssh root@your-vps "groups authentik"
# Should show: authentik postgres valkey
# Should show: authentik postgres-clients valkey-clients
# Verify container process groups
ssh root@your-vps "ps aux | grep authentik-server | head -1 | awk '{print \$2}' | xargs -I {} cat /proc/{}/status | grep Groups"
# Should show: Groups: 961 962 966 (valkey-clients postgres-clients authentik)
# Verify socket permissions
ssh root@your-vps "ls -la /var/run/postgresql/ /var/run/valkey/"
@@ -351,7 +364,7 @@ Error: bind: address already in use (port 9000)
ssh root@your-vps "netstat -tulpn | grep 9000"
# Stop conflicting services
ssh root@your-vps "systemctl --user -M authentik@ stop authentik-pod"
ssh root@your-vps "systemctl stop authentik-pod"
# Restart with correct configuration
ansible-playbook site.yml --tags authentik,containers --ask-vault-pass
@@ -398,27 +411,29 @@ ansible-playbook site.yml --tags authentik,cache --ask-vault-pass
```bash
# Check container logs
ssh root@your-vps "sudo -u authentik podman logs authentik-server"
ssh root@your-vps "sudo -u authentik podman logs authentik-worker"
ssh root@your-vps "podman logs authentik-server"
ssh root@your-vps "podman logs authentik-worker"
# Inspect container configuration
ssh root@your-vps "sudo -u authentik podman inspect authentik-server"
ssh root@your-vps "podman inspect authentik-server"
# Check container user/group mapping
ssh root@your-vps "sudo -u authentik podman exec authentik-server id"
ssh root@your-vps "podman exec authentik-server id"
# Expected: uid=966(authentik) gid=966(authentik) groups=966(authentik),961(valkey-clients),962(postgres-clients)
```
#### Service Status Verification
```bash
# Check all authentik systemd services
ssh root@your-vps "systemctl --user -M authentik@ status authentik-pod authentik-server authentik-worker"
ssh root@your-vps "systemctl status authentik-pod authentik-server authentik-worker"
# View service dependencies
ssh root@your-vps "systemctl --user -M authentik@ list-dependencies authentik-pod"
ssh root@your-vps "systemctl list-dependencies authentik-pod"
# Check user session status
ssh root@your-vps "loginctl show-user authentik"
# Verify services are in system.slice
ssh root@your-vps "systemctl status authentik-server | grep CGroup"
# Expected: /system.slice/authentik-server.service
```
#### Network Connectivity Testing
@@ -440,12 +455,12 @@ curl -v https://auth.jnss.me/
```bash
# Authentik application logs
ssh root@your-vps "sudo -u authentik cat /opt/authentik/logs/server.log"
ssh root@your-vps "sudo -u authentik cat /opt/authentik/logs/worker.log"
ssh root@your-vps "cat /opt/authentik/logs/server.log"
ssh root@your-vps "cat /opt/authentik/logs/worker.log"
# systemd service logs
ssh root@your-vps "journalctl --user -M authentik@ -u authentik-server -f"
ssh root@your-vps "journalctl --user -M authentik@ -u authentik-worker -f"
ssh root@your-vps "journalctl -u authentik-server -f"
ssh root@your-vps "journalctl -u authentik-worker -f"
# Caddy logs for reverse proxy issues
ssh root@your-vps "journalctl -u caddy -f"
@@ -475,10 +490,10 @@ INFO authentik.core.cache: Connected to cache via unix socket
```bash
# Monitor container resource usage
ssh root@your-vps "sudo -u authentik podman stats"
ssh root@your-vps "podman stats"
# Monitor service memory usage
ssh root@your-vps "systemctl --user -M authentik@ status authentik-server | grep Memory"
ssh root@your-vps "systemctl status authentik-server | grep Memory"
# Monitor database connections
ssh root@your-vps "sudo -u postgres psql -h /var/run/postgresql -c 'SELECT * FROM pg_stat_activity;'"
@@ -585,10 +600,10 @@ ansible-playbook site.yml --tags authentik,image-pull --ask-vault-pass
```bash
# Emergency service restart
ssh root@your-vps "systemctl --user -M authentik@ restart authentik-pod"
ssh root@your-vps "systemctl restart authentik-pod"
# Fallback: Direct container management
ssh root@your-vps "sudo -u authentik podman pod restart authentik"
ssh root@your-vps "podman pod restart authentik"
# Last resort: Full service rebuild
ansible-playbook site.yml --tags authentik --ask-vault-pass --limit arch-vps