Migrate to rootful container architecture with infrastructure fact pattern

Major architectural change: move from rootless user services to system-level
(rootful) containers so that containerized applications can reach Unix sockets
through group membership.

Infrastructure Changes:
- PostgreSQL: Export postgres-clients group GID as Ansible fact
- Valkey: Export valkey-clients group GID as Ansible fact
- Valkey: Add socket-fix service to maintain correct socket group ownership
- Both: Set socket directories to 770 with client group ownership

Authentik Role Refactoring:
- Remove rootless container configuration (subuid/subgid, lingering, user systemd)
- Deploy Quadlet files to /etc/containers/systemd/ (system-level)
- Use dynamic GID facts in container PodmanArgs (--group-add)
- Simplify user creation to system user with infrastructure group membership
- Update handlers for system scope service management
- Remove unnecessary container security options (no user namespace isolation)

Container Template Changes:
- Pod: Remove --userns args, change WantedBy to multi-user.target
- Containers: Replace Annotation with PodmanArgs using dynamic GIDs
- Remove /dev/shm mounts and SecurityLabelDisable (not needed for rootful)
- Change WantedBy to multi-user.target for system services
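
The template changes above can be sketched as a single Ansible task that writes
a system-level Quadlet unit. This is an illustration under assumptions, not the
template shipped in this commit: the unit name, image and example_app_uid are
hypothetical, while valkey_client_group_gid and valkey_unix_socket_path are the
variables used by the Valkey role.

```yaml
# Illustrative sketch only; example-app, its image and example_app_uid are hypothetical.
- name: Deploy a rootful Quadlet container unit
  ansible.builtin.copy:
    dest: /etc/containers/systemd/example-app.container
    owner: root
    group: root
    mode: "0644"
    content: |
      [Unit]
      Description=Example application container
      After=valkey.service

      [Container]
      ContainerName=example-app
      Image=registry.example.com/example-app:latest
      User={{ example_app_uid }}
      PodmanArgs=--group-add {{ valkey_client_group_gid }}
      Volume={{ valkey_unix_socket_path | dirname }}:{{ valkey_unix_socket_path | dirname }}

      [Service]
      Restart=always

      [Install]
      WantedBy=multi-user.target
```

Because the unit lives under /etc/containers/systemd/, the generated service runs
in the system scope, so a systemd daemon reload is needed before it can be started.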

Documentation Updates:
- Add ADR-005: Rootful Containers with Infrastructure Fact Pattern
- Update ADR-003: Podman + systemd for system-level deployment
- Update authentik-deployment-guide.md for system scope commands
- Update service-integration-guide.md with rootful pattern examples
- Document discarded rootless approach and rationale

Why Rootful Succeeds:
- Direct UID/GID mapping preserves supplementary groups
- Container process groups match host socket group ownership
- No user namespace remapping to break permissions

Why Rootless Failed (Discarded):
- User namespace UID/GID remapping broke group-based socket access
- Supplementary groups remapped into subgid range didn't match socket ownership
- Even with --userns=host and keep_original_groups, permissions failed

Pattern Established:
- Infrastructure roles create client groups and export GID facts
- Application roles validate facts and consume in container templates
- Rootful containers run as dedicated users with --group-add for socket access
- System-level deployment provides standard systemd service management
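
A minimal sketch of the "validate facts" step in an application role, assuming
the fact name valkey_client_group_gid set by the Valkey role; the task wording
and failure message are illustrative, not taken from this commit.

```yaml
# Illustrative sketch; fails early if the infrastructure role has not run on this host.
- name: Validate that the Valkey role exported the client group GID
  ansible.builtin.assert:
    that:
      - valkey_client_group_gid is defined
      - valkey_client_group_gid | string | length > 0
      - valkey_client_group_gid | int > 0
    fail_msg: "valkey_client_group_gid is missing; run the Valkey role before this role."
```

Facts created with set_fact are scoped to the host and last for the playbook run,
so the infrastructure role has to run on the same host earlier in the same run.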

Deployment Validated:
- Services in /system.slice/ ✓
- Process groups: 961 (valkey-clients), 962 (postgres-clients), 966 (authentik) ✓
- Socket permissions: 770 with client groups ✓
- HTTP endpoint responding ✓
2025-12-14 16:56:50 +01:00
parent 9e570ac2a3
commit 3506e55016
21 changed files with 587 additions and 288 deletions

@@ -9,6 +9,19 @@
 # Note: Arch Linux's redis package (which provides Valkey) creates the 'valkey' user automatically
 # We don't need to create users - just ensure data directory permissions

+- name: Create Valkey client access group
+  group:
+    name: "{{ valkey_client_group }}"
+    system: true
+  when: valkey_client_group_create
+
+- name: Ensure valkey user is in client group
+  user:
+    name: valkey
+    groups: "{{ valkey_client_group }}"
+    append: true
+  when: valkey_client_group_create
+
 - name: Create Valkey configuration directory
   file:
     path: /etc/valkey
@@ -33,17 +46,8 @@
     path: "{{ valkey_unix_socket_path | dirname }}"
     state: directory
     owner: valkey
-    group: valkey
-    mode: '0775'
-  when: valkey_unix_socket_enabled
-
-- name: Ensure socket directory is accessible
-  file:
-    path: "{{ valkey_unix_socket_path | dirname }}"
-    owner: valkey
-    group: valkey
-    mode: '0775'
-    recurse: yes
+    group: "{{ valkey_client_group }}"
+    mode: '0770'
   when: valkey_unix_socket_enabled

 - name: Deploy Valkey configuration file
@@ -56,6 +60,43 @@
     backup: yes
   notify: restart valkey

+- name: Deploy Valkey systemd service file (with socket group management)
+  template:
+    src: valkey.service.j2
+    dest: /etc/systemd/system/valkey.service
+    mode: '0644'
+  notify:
+    - reload systemd
+    - restart valkey
+  when: valkey_client_group_create
+
+- name: Deploy Valkey socket group fix service
+  template:
+    src: valkey-socket-fix.service.j2
+    dest: /etc/systemd/system/valkey-socket-fix.service
+    mode: '0644'
+  notify:
+    - reload systemd
+  when: valkey_client_group_create and valkey_unix_socket_enabled
+
+- name: Enable Valkey socket group fix service
+  systemd:
+    name: valkey-socket-fix
+    enabled: true
+    daemon_reload: true
+  when: valkey_client_group_create and valkey_unix_socket_enabled
+
+- name: Get Valkey client group GID for containerized applications
+  shell: "getent group {{ valkey_client_group }} | cut -d: -f3"
+  register: valkey_client_group_lookup
+  changed_when: false
+  when: valkey_client_group_create
+
+- name: Set Valkey client group GID as fact
+  set_fact:
+    valkey_client_group_gid: "{{ valkey_client_group_lookup.stdout }}"
+  when: valkey_client_group_create and valkey_client_group_lookup.stdout is defined
+
 - name: Enable and start Valkey service
   systemd:
     name: valkey
@@ -64,13 +105,6 @@
     daemon_reload: true
   register: valkey_service_result

-- name: Wait for Valkey to be ready (TCP)
-  wait_for:
-    port: "{{ valkey_port }}"
-    host: "{{ valkey_bind }}"
-    timeout: 30
-  when: valkey_service_state == "started" and not valkey_unix_socket_enabled
-
 - name: Wait for Valkey socket file to exist
   wait_for:
     path: "{{ valkey_unix_socket_path }}"
@@ -102,13 +136,6 @@
     (valkey_socket_ping_noauth.stdout != "PONG") and
     ("NOAUTH" in (valkey_socket_ping_noauth.stdout + valkey_socket_ping_noauth.stderr) or valkey_socket_ping_noauth.rc != 0)

-- name: Test Valkey connectivity (TCP)
-  command: redis-cli -h {{ valkey_bind }} -p {{ valkey_port }} -a {{ valkey_password }} ping
-  register: valkey_ping_result_tcp
-  changed_when: false
-  failed_when: valkey_ping_result_tcp.stdout != "PONG"
-  when: valkey_service_state == "started" and not valkey_unix_socket_enabled
-
 - name: Test Valkey connectivity (Unix Socket)
   command: redis-cli -s {{ valkey_unix_socket_path }} -a {{ valkey_password }} ping
   register: valkey_ping_result_socket