Migrate to rootful container architecture with infrastructure fact pattern

Major architectural change from rootless user services to system-level (rootful)
containers to enable group-based Unix socket access for containerized applications.

Infrastructure Changes:
- PostgreSQL: Export postgres-clients group GID as Ansible fact
- Valkey: Export valkey-clients group GID as Ansible fact
- Valkey: Add socket-fix service to maintain correct socket group ownership
- Both: Set socket directories to 770 with client group ownership

Authentik Role Refactoring:
- Remove rootless container configuration (subuid/subgid, lingering, user systemd)
- Deploy Quadlet files to /etc/containers/systemd/ (system-level)
- Use dynamic GID facts in container PodmanArgs (--group-add)
- Simplify user creation to system user with infrastructure group membership
- Update handlers for system scope service management
- Remove unnecessary container security options (no user namespace isolation)

Container Template Changes:
- Pod: Remove --userns args, change WantedBy to multi-user.target
- Containers: Replace Annotation with PodmanArgs using dynamic GIDs
- Remove /dev/shm mounts and SecurityLabelDisable (not needed for rootful)
- Change WantedBy to multi-user.target for system services

Documentation Updates:
- Add ADR-005: Rootful Containers with Infrastructure Fact Pattern
- Update ADR-003: Podman + systemd for system-level deployment
- Update authentik-deployment-guide.md for system scope commands
- Update service-integration-guide.md with rootful pattern examples
- Document discarded rootless approach and rationale

Why Rootful Succeeds:
- Direct UID/GID mapping preserves supplementary groups
- Container process groups match host socket group ownership
- No user namespace remapping breaking permissions

Why Rootless Failed (Discarded):
- User namespace UID/GID remapping broke group-based socket access
- Supplementary groups remapped into subgid range didn't match socket ownership
- Even with --userns=host and keep_original_groups, permissions failed

Pattern Established:
- Infrastructure roles create client groups and export GID facts
- Application roles validate facts and consume in container templates
- Rootful containers run as dedicated users with --group-add for socket access
- System-level deployment provides standard systemd service management

Deployment Validated:
- Services in /system.slice/ ✓
- Process groups: 961 (valkey-clients), 962 (postgres-clients), 966 (authentik) ✓
- Socket permissions: 770 with client groups ✓
- HTTP endpoint responding ✓
2025-12-14 16:56:50 +01:00
parent 9e570ac2a3
commit 3506e55016
21 changed files with 587 additions and 288 deletions


@@ -8,6 +8,7 @@ This document records the significant architectural decisions made in the rick-i
- [ADR-002: Unix Socket IPC Architecture](#adr-002-unix-socket-ipc-architecture)
- [ADR-003: Podman + systemd Container Orchestration](#adr-003-podman--systemd-container-orchestration)
- [ADR-004: Forward Authentication Security Model](#adr-004-forward-authentication-security-model)
- [ADR-005: Rootful Containers with Infrastructure Fact Pattern](#adr-005-rootful-containers-with-infrastructure-fact-pattern)
---
@@ -270,8 +271,9 @@ podman exec authentik-server id
**Status**: ✅ Accepted
**Date**: December 2025
**Updated**: December 2025 (System-level deployment pattern)
**Deciders**: Infrastructure Team
**Technical Story**: Container orchestration solution for rootless, secure application deployment with systemd integration.
**Technical Story**: Container orchestration solution for secure application deployment with systemd integration.
### Context
@@ -284,40 +286,42 @@ Container orchestration options for a single-node infrastructure:
### Decision
We will use **Podman with systemd integration (Quadlet)** for container orchestration.
We will use **Podman with systemd integration (Quadlet)** for container orchestration, deployed as system-level services (rootful containers running as dedicated users).
### Rationale
#### Security Advantages
- **Rootless Architecture**: No privileged daemon required
- **No Daemon Required**: No privileged daemon attack surface
```bash
# Docker: Requires root daemon
sudo systemctl status docker
# Podman: Rootless operation
systemctl --user status podman
# Podman: Daemonless operation
podman ps # No daemon needed
```
- **No Daemon Attack Surface**: No long-running privileged process
- **User Namespace Isolation**: Each user's containers are isolated
- **Dedicated Service Users**: Containers run as dedicated system users (not root)
- **Group-Based Access Control**: Unix group membership controls infrastructure access
- **SELinux Integration**: Better SELinux support than Docker
#### systemd Integration Benefits
- **Native Service Management**: Containers as systemd services
- **Native Service Management**: Containers as system-level systemd services
```ini
# Quadlet file: ~/.config/containers/systemd/authentik.pod
# Quadlet file: /etc/containers/systemd/authentik.pod
[Unit]
Description=Authentik Authentication Pod
[Pod]
PublishPort=127.0.0.1:9000:9000
PublishPort=0.0.0.0:9000:9000
ShmSize=256m
[Service]
Restart=always
TimeoutStartSec=900
[Install]
WantedBy=default.target
WantedBy=multi-user.target
```
- **Dependency Management**: systemd handles service dependencies
- **Resource Control**: systemd resource limits and monitoring
@@ -327,11 +331,11 @@ We will use **Podman with systemd integration (Quadlet)** for container orchestr
- **Familiar Tooling**: Standard systemd commands
```bash
systemctl --user status authentik-pod
systemctl --user restart authentik-server
journalctl --user -u authentik-server -f
systemctl status authentik-pod
systemctl restart authentik-server
journalctl -u authentik-server -f
```
- **Boot Integration**: Services start automatically with user sessions
- **Boot Integration**: Services start automatically at system boot
- **Resource Monitoring**: systemd resource tracking
- **Configuration Management**: Declarative Quadlet files
@@ -345,7 +349,7 @@ We will use **Podman with systemd integration (Quadlet)** for container orchestr
```
┌─────────────────────────────────────────────────────────────┐
│ systemd User Session (authentik)
│ systemd System Services (/system.slice/)
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌───────────────┐ │
│ │ authentik-pod │ │ authentik-server│ │authentik-worker│ │
@@ -355,15 +359,23 @@ We will use **Podman with systemd integration (Quadlet)** for container orchestr
│ └────────────────────┼────────────────────┘ │
│ │ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Podman Pod (rootless) │ │
│ │ Podman Pod (rootful, dedicated user) │ │
│ │ │ │
│ │ ┌─────────────────┐ ┌─────────────────────────────────┐ │ │
│ │ │ Server Container│ │ Worker Container │ │ │
│ │ │ UID: 963 (host) │ │ UID: 963 (host) │ │ │
│ │ │ Groups: postgres│ │ Groups: postgres,valkey │ │ │
│ │ │ valkey │ │ │ │ │
│ │ │ User: 966:966 │ │ User: 966:966 │ │ │
│ │ │ Groups: 961,962 │ │ Groups: 961,962 │ │ │
│ │ │ (valkey,postgres)│ │ (valkey,postgres) │ │ │
│ │ └─────────────────┘ └─────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│ Group-based access to infrastructure
┌─────────────────────────────────────────────────────────────┐
│ Infrastructure Services │
│ PostgreSQL: /var/run/postgresql (postgres:postgres-clients)│
│ Valkey: /var/run/valkey (valkey:valkey-clients) │
└─────────────────────────────────────────────────────────────┘
```
@@ -396,19 +408,22 @@ Requires=authentik-pod.service
ContainerName=authentik-server
Image=ghcr.io/goauthentik/server:2025.10
Pod=authentik.pod
EnvironmentFile=%h/.env
User=%i:%i
Annotation=run.oci.keep_original_groups=1
EnvironmentFile=/opt/authentik/.env
User=966:966
PodmanArgs=--group-add 962 --group-add 961
# Volume mounts for sockets
# Volume mounts for sockets and data
Volume=/opt/authentik/media:/media
Volume=/opt/authentik/data:/data
Volume=/var/run/postgresql:/var/run/postgresql:Z
Volume=/var/run/valkey:/var/run/valkey:Z
[Service]
Restart=always
TimeoutStartSec=300
[Install]
WantedBy=default.target
WantedBy=multi-user.target
```
### User Management Strategy
@@ -418,20 +433,17 @@ WantedBy=default.target
- name: Create service user
user:
name: authentik
group: authentik
groups: [postgres-clients, valkey-clients]
system: true
shell: /bin/bash
home: /opt/authentik
create_home: true
- name: Add to infrastructure groups
user:
name: authentik
groups: [postgres, valkey]
append: true
- name: Enable lingering (services persist)
command: loginctl enable-linger authentik
```
**Note**: Infrastructure roles (PostgreSQL, Valkey) export client group GIDs as Ansible facts (`postgresql_client_group_gid`, `valkey_client_group_gid`) which are consumed by application container templates for dynamic `--group-add` arguments.
### Consequences
#### Positive
@@ -457,15 +469,20 @@ WantedBy=default.target
### Technical Implementation
```bash
# Container management (as service user)
systemctl --user status authentik-pod
systemctl --user restart authentik-server
# Container management (system scope)
systemctl status authentik-pod
systemctl restart authentik-server
podman ps
podman logs authentik-server
# Resource monitoring
systemctl --user show authentik-server --property=MemoryCurrent
journalctl --user -u authentik-server -f
systemctl show authentik-server --property=MemoryCurrent
journalctl -u authentik-server -f
# Verify container groups
ps aux | grep '[a]uthentik-server' | head -1 | awk '{print $2}' | \
xargs -I {} cat /proc/{}/status | grep Groups
# Output: Groups: 961 962 966
```
### Alternatives Considered
@@ -763,6 +780,268 @@ app.jnss.me {
---
## ADR-005: Rootful Containers with Infrastructure Fact Pattern
**Status**: ✅ Accepted
**Date**: December 2025
**Deciders**: Infrastructure Team
**Technical Story**: Enable containerized applications to access native infrastructure services (PostgreSQL, Valkey) via Unix sockets with group-based permissions.
### Context
Containerized applications need to access infrastructure services (PostgreSQL, Valkey) through Unix sockets with filesystem-based permission controls. The permission model requires:
1. **Socket directories** owned by service groups (`postgres-clients`, `valkey-clients`)
2. **Application users** added to these groups for access
3. **Container processes** must preserve group membership to access sockets
Two approaches were evaluated:
1. **Rootless containers (user namespace)**: Containers run in user namespace with UID/GID remapping
2. **Rootful containers (system services)**: Containers run as dedicated system users without user namespace UID/GID remapping
### Decision
We will use **rootful containers deployed as system-level systemd services** with an **Infrastructure Fact Pattern** where infrastructure roles export client group GIDs as Ansible facts for application consumption.
### Rationale
#### Why Rootful Succeeds
**Direct UID/GID Mapping**:
```bash
# Host: authentik user UID 966, groups: 966 (authentik), 961 (valkey-clients), 962 (postgres-clients)
# Container User=966:966 with PodmanArgs=--group-add 961 --group-add 962
# Inside container:
id
# uid=966(authentik) gid=966(authentik) groups=966(authentik),961(valkey-clients),962(postgres-clients)
# Socket access works:
ls -l /var/run/postgresql/.s.PGSQL.5432
# srwxrwx--- 1 postgres postgres-clients 0 ... /var/run/postgresql/.s.PGSQL.5432
```
**Group membership preserved**: Container process has GIDs 961 and 962, matching socket group ownership.
#### Why Rootless Failed (Discarded Approach)
**User Namespace UID/GID Remapping**:
```bash
# Host: authentik user UID 100000, subuid range 200000-265535
# Container User=%i:%i with --userns=host --group-add=keep-groups
# User namespace remaps:
# Host UID 100000 → Container UID 100000 (root in namespace)
# Host GID 961 → Container GID 200961 (remapped into subgid range)
# Host GID 962 → Container GID 200962 (remapped into subgid range)
# Socket ownership on host:
# srwxrwx--- 1 postgres postgres-clients (GID 962)
# Container process groups: 200961, 200962 (remapped)
# Socket expects: GID 962 (not remapped)
# Result: Permission denied ❌
```
**Root cause**: User namespace supplementary group remapping breaks group-based socket access even with `--userns=host`, `--group-add=keep-groups`, and `Annotation=run.oci.keep_original_groups=1`.
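Both outcomes above come down to the ordinary group permission check on the socket file. A minimal Python sketch of that check, taking the GIDs quoted above at face value. `group_can_connect` is a hypothetical helper written for illustration, not part of Podman or this repository, and it deliberately ignores the owner and other permission classes as well as the parent directory's search bit:

```python
import stat
from typing import Iterable


def group_can_connect(socket_mode: int, socket_gid: int, process_gids: Iterable[int]) -> bool:
    """Return True if the group permission class allows connecting.

    Connecting to a Unix socket needs write permission on the socket file,
    so only the group-write bit is checked. Owner/other classes and the
    parent directory's permissions are left out of this sketch.
    """
    return socket_gid in set(process_gids) and bool(socket_mode & stat.S_IWGRP)


# Values taken from the examples above (srwxrwx---, postgres-clients = 962).
SOCKET_MODE = 0o770
SOCKET_GID = 962

rootful_gids = {966, 961, 962}    # User=966:966 plus --group-add 961/962
remapped_gids = {200961, 200962}  # GIDs as described for the rootless attempt

print(group_can_connect(SOCKET_MODE, SOCKET_GID, rootful_gids))   # True
print(group_can_connect(SOCKET_MODE, SOCKET_GID, remapped_gids))  # False
```

The check only succeeds when the process holds the exact numeric GID that owns the socket, which is why any remapping of supplementary groups defeats it.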
### Infrastructure Fact Pattern
#### Infrastructure Roles Export GIDs
Infrastructure services create client groups and export their GIDs as Ansible facts:
```yaml
# PostgreSQL role: roles/postgresql/tasks/main.yml
- name: Create PostgreSQL client access group
group:
name: postgres-clients
system: true
- name: Get PostgreSQL client group GID
shell: "getent group postgres-clients | cut -d: -f3"
register: postgresql_client_group_lookup
changed_when: false
- name: Set PostgreSQL client group GID as fact
set_fact:
postgresql_client_group_gid: "{{ postgresql_client_group_lookup.stdout }}"
```
```yaml
# Valkey role: roles/valkey/tasks/main.yml
- name: Create Valkey client access group
group:
name: valkey-clients
system: true
- name: Get Valkey client group GID
shell: "getent group valkey-clients | cut -d: -f3"
register: valkey_client_group_lookup
changed_when: false
- name: Set Valkey client group GID as fact
set_fact:
valkey_client_group_gid: "{{ valkey_client_group_lookup.stdout }}"
```
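The `getent ... | cut -d: -f3` lookup used in both roles extracts the third colon-separated field of the group database entry. A small Python sketch of the same parsing, for checking the logic outside Ansible. `gid_from_getent_line` is a hypothetical helper name, and the sample entry is made up for illustration:

```python
def gid_from_getent_line(line: str) -> int:
    """Parse one `getent group` entry (name:password:GID:members)."""
    fields = line.strip().split(":")
    if len(fields) < 3 or not fields[2].isdigit():
        raise ValueError(f"not a valid group entry: {line!r}")
    return int(fields[2])


# Illustrative entry; the real GID differs per host, which is why it is
# exported as a fact instead of being hardcoded.
print(gid_from_getent_line("postgres-clients:x:962:authentik"))  # 962
```

On a live host, Python's standard `grp.getgrnam("postgres-clients").gr_gid` returns the same number without any parsing.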
#### Application Roles Consume Facts
Application roles validate and consume infrastructure facts:
```yaml
# Authentik role: roles/authentik/tasks/main.yml
- name: Validate infrastructure facts are available
assert:
that:
- postgresql_client_group_gid is defined
- valkey_client_group_gid is defined
fail_msg: |
Required infrastructure facts are not available.
Ensure PostgreSQL and Valkey roles have run first.
- name: Create authentik user with infrastructure groups
user:
name: authentik
groups: [postgres-clients, valkey-clients]
append: true
```
```ini
# Container template: roles/authentik/templates/authentik-server.container
[Container]
User={{ authentik_uid }}:{{ authentik_gid }}
PodmanArgs=--group-add {{ postgresql_client_group_gid }} --group-add {{ valkey_client_group_gid }}
```
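To make the template's output concrete, the following Python sketch builds the same `PodmanArgs` line from a list of GIDs. It only mirrors the string that the template above renders. `podman_group_args` is a hypothetical helper, and the GIDs 962 and 961 are the values reported for the validated host, not constants:

```python
from typing import Iterable, Union


def podman_group_args(gids: Iterable[Union[int, str]]) -> str:
    """Build a Quadlet PodmanArgs line with one --group-add per GID."""
    parts = []
    for gid in gids:
        gid = int(gid)  # facts captured from command stdout arrive as strings
        if gid < 0:
            raise ValueError(f"invalid GID: {gid}")
        parts.append(f"--group-add {gid}")
    return "PodmanArgs=" + " ".join(parts)


print(podman_group_args(["962", "961"]))
# PodmanArgs=--group-add 962 --group-add 961
```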
### Implementation Details
#### System-Level Deployment
```ini
# Quadlet files deployed to /etc/containers/systemd/ (not ~/.config/)
# Pod: /etc/containers/systemd/authentik.pod
[Unit]
Description=Authentik Authentication Pod
[Pod]
PublishPort=0.0.0.0:9000:9000
ShmSize=256m
[Service]
Restart=always
[Install]
# System target, not default.target
WantedBy=multi-user.target
```
```ini
# Container: /etc/containers/systemd/authentik-server.container
[Container]
User=966:966
PodmanArgs=--group-add 962 --group-add 961
Volume=/var/run/postgresql:/var/run/postgresql:Z
Volume=/var/run/valkey:/var/run/valkey:Z
```
#### Service Management
```bash
# System scope (not user scope)
systemctl status authentik-pod
systemctl restart authentik-server
journalctl -u authentik-server -f
# Verify container location
systemctl status authentik-server | grep CGroup
# CGroup: /system.slice/authentik-server.service ✓
```
### Special Case: Valkey Socket Group Fix
Valkey doesn't natively support socket group configuration (unlike PostgreSQL's `unix_socket_group`). A helper service ensures correct socket permissions:
```ini
# /etc/systemd/system/valkey-socket-fix.service
[Unit]
Description=Fix Valkey socket group ownership and permissions
BindsTo=valkey.service
After=valkey.service
[Service]
Type=oneshot
ExecStart=/bin/sh -c 'i=0; while [ ! -S /var/run/valkey/valkey.sock ] && [ $i -lt 100 ]; do sleep 0.1; i=$((i+1)); done'
ExecStart=/bin/chgrp valkey-clients /var/run/valkey/valkey.sock
ExecStart=/bin/chmod 770 /var/run/valkey/valkey.sock
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
```
Triggered by Valkey service:
```ini
# /etc/systemd/system/valkey.service (excerpt)
[Unit]
Wants=valkey-socket-fix.service
```
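The first `ExecStart` line of the helper unit is a bounded polling loop: it checks for the socket up to 100 times, 0.1 s apart, then moves on. The same logic as a Python sketch, for readers who find the one-line shell loop hard to scan. `wait_for_socket` is a hypothetical name and this is not how the unit itself is implemented:

```python
import os
import stat
import time


def wait_for_socket(path: str, attempts: int = 100, delay: float = 0.1) -> bool:
    """Poll until `path` exists and is a Unix socket, or attempts run out."""
    for _ in range(attempts):
        try:
            if stat.S_ISSOCK(os.stat(path).st_mode):
                return True
        except FileNotFoundError:
            pass  # socket not created yet
        time.sleep(delay)
    return False
```

Unlike the shell loop, this sketch reports a timeout through its return value. In the unit, a socket that never appears would surface as a failing `chgrp` in the next `ExecStart` line.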
### Consequences
#### Positive
- **Socket Access Works**: Group-based permissions function correctly
- **Security**: Containers run as dedicated users (not root), no privileged daemon
- **Portability**: Dynamic GID facts work across different hosts
- **Consistency**: Same pattern for all containerized applications
- **Simplicity**: No user namespace complexity, standard systemd service management
#### Negative
- **Not "Pure" Rootless**: Containers require root for systemd service deployment
- **Different from Docker**: Running containers as system services under dedicated users is a less familiar pattern than rootless user services
#### Neutral
- **System vs User Scope**: Different commands (`systemctl` vs `systemctl --user`) but equally capable
- **Deployment Location**: `/etc/containers/systemd/` vs `~/.config/` but same Quadlet functionality
### Validation
```bash
# Verify service location
systemctl status authentik-server | grep CGroup
# → /system.slice/authentik-server.service ✓
# Verify process groups
ps aux | grep '[a]uthentik' | head -1 | awk '{print $2}' | \
xargs -I {} cat /proc/{}/status | grep Groups
# → Groups: 961 962 966 ✓
# Verify socket permissions
ls -l /var/run/postgresql/.s.PGSQL.5432
# → srwxrwx--- postgres postgres-clients ✓
ls -l /var/run/valkey/valkey.sock
# → srwxrwx--- valkey valkey-clients ✓
# Verify HTTP endpoint
curl -I http://127.0.0.1:9000/
# → HTTP/1.1 302 Found ✓
```
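The process-group check above can also be scripted. A minimal Python sketch that parses the `Groups:` line of `/proc/<pid>/status` and compares it with the expected GIDs. `parse_groups` and `missing_groups` are hypothetical helpers, and the sample text imitates the kernel's format using the GIDs reported in this document:

```python
def parse_groups(status_text: str) -> set:
    """Extract supplementary GIDs from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            return {int(gid) for gid in line.split(":", 1)[1].split()}
    return set()


def missing_groups(status_text: str, required: set) -> set:
    """Return the required GIDs that the process does not hold."""
    return set(required) - parse_groups(status_text)


sample = "Name:\tpython\nGid:\t966\t966\t966\t966\nGroups:\t961 962 966 \n"
print(missing_groups(sample, {961, 962}))  # set()
```

Against a running container, the input would be the contents of `/proc/<pid>/status` for the container process; an empty result means both client groups are present.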
### Alternatives Considered
1. **Rootless with user namespace** - Discarded due to GID remapping breaking group-based socket access
2. **TCP-only connections** - Rejected to maintain Unix socket security and performance benefits
3. **Hardcoded GIDs** - Rejected for portability; facts provide dynamic resolution
4. **Directory permissions (777)** - Rejected for security; group-based access more restrictive
---
## Summary
These architecture decisions collectively create a robust, secure, and performant infrastructure:
@@ -771,6 +1050,7 @@ These architecture decisions collectively create a robust, secure, and performan
- **Unix Sockets** eliminate network attack vectors
- **Podman + systemd** delivers secure container orchestration
- **Forward Authentication** enables centralized security without application changes
- **Rootful Container Pattern** enables group-based socket access with infrastructure fact sharing
The combination results in an infrastructure that prioritizes security and performance while maintaining operational simplicity and reliability.