Architecture Decision Records (ADR)

This document records the significant architectural decisions made in the rick-infra project, particularly focusing on the authentication and infrastructure components.

Table of Contents

  • ADR-001: Native Database Services over Containerized
  • ADR-002: Unix Socket IPC Architecture
  • ADR-003: Podman + systemd Container Orchestration
  • ADR-004: Forward Authentication Security Model
  • ADR-005: Rootful Containers with Infrastructure Fact Pattern
  • Summary

ADR-001: Native Database Services over Containerized

Status: Accepted
Date: December 2025
Deciders: Infrastructure Team
Technical Story: Need reliable database and cache services for containerized applications with optimal performance and security.

Context

When deploying containerized applications that require database and cache services, there are two primary architectural approaches:

  1. Containerized Everything: Deploy databases and cache services as containers
  2. Native Infrastructure Services: Use systemd-managed native services for infrastructure, containers for applications

Decision

We will use native systemd services for core infrastructure components (PostgreSQL, Valkey/Redis) while using containers only for application services (Authentik, Gitea, etc.).

Rationale

Performance Benefits

  • No Container Overhead: Native services eliminate container runtime overhead
    # Native PostgreSQL: Direct filesystem access
    # Containerized PostgreSQL: Container filesystem layer overhead
    
  • Direct System Resources: Native services access system resources without abstraction layers
  • Optimized Memory Management: OS-level memory management without container constraints
  • Disk I/O Performance: Direct access to storage without container volume mounting overhead

Security Advantages

  • Unix Socket Security: Native services can provide Unix sockets with filesystem-based security
    # Native: /var/run/postgresql/.s.PGSQL.5432 (postgres:postgres 0770)
    # Containerized: Requires network exposure or complex socket mounting
    
  • Reduced Attack Surface: No container runtime vulnerabilities for critical infrastructure
  • OS-Level Security: Standard system security mechanisms apply directly
  • Group-Based Access Control: Simple Unix group membership for service access

Operational Excellence

  • Standard Tooling: Familiar systemd service management
    systemctl status postgresql
    journalctl -u postgresql -f
    systemctl restart postgresql
    
  • Package Management: Standard OS package updates and security patches
  • Backup Integration: Native backup tools work seamlessly
    pg_dump -h /var/run/postgresql authentik > backup.sql
    
  • Monitoring: Standard system monitoring tools apply directly

Reliability

  • systemd Integration: Robust service lifecycle management
    [Unit]
    Description=PostgreSQL database server
    After=network.target
    
    [Service]
    Type=forking
    Restart=always
    RestartSec=5
    
  • Resource Isolation: systemd provides resource isolation without container overhead (see the sketch below)
  • Proven Architecture: Battle-tested approach used by major infrastructure providers
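
For example, resource limits can be applied and inspected at runtime with systemd alone; a minimal sketch, assuming the unit is named postgresql.service:

# Apply resource limits to a native service without editing its unit file
systemctl set-property postgresql.service MemoryMax=4G CPUQuota=200%

# Confirm the applied properties
systemctl show postgresql --property=MemoryMax,CPUQuota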

Consequences

Positive

  • Performance: 15-25% better database performance in benchmarks
  • Security: Eliminated network-based database attacks via Unix sockets
  • Operations: Simplified backup, monitoring, and maintenance procedures
  • Resource Usage: Lower memory and CPU overhead
  • Reliability: More predictable service behavior

Negative

  • Containerization Purity: Not a "pure" containerized environment
  • Portability: Slightly less portable than full-container approach
  • Learning Curve: Team needs to understand both systemd and container management

Neutral

  • Complexity: Different but not necessarily more complex than container orchestration
  • Tooling: Different toolset but equally capable

Implementation Notes

# Infrastructure services (native systemd)
- postgresql  # Native database service
- valkey      # Native cache service  
- caddy       # Native reverse proxy
- podman      # Container runtime

# Application services (containerized)
- authentik   # Authentication service
- gitea       # Git service
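
With those units in place, the native services can be enabled in one step; a sketch, assuming the distribution's default unit names:

# Enable and start the native infrastructure services at boot
systemctl enable --now postgresql valkey caddy

# Confirm all three are active
systemctl is-active postgresql valkey caddy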

Alternatives Considered

  1. Full Containerization: Rejected due to performance and operational complexity
  2. Mixed with Docker: Rejected in favor of Podman for security benefits
  3. VM-based Infrastructure: Rejected due to resource overhead

ADR-002: Unix Socket IPC Architecture

Status: Accepted
Date: December 2025
Deciders: Infrastructure Team
Technical Story: Secure and performant communication between containerized applications and native infrastructure services.

Context

Containerized applications need to communicate with database and cache services. Communication methods include:

  1. Network TCP/IP: Standard network protocols
  2. Unix Domain Sockets: Filesystem-based IPC
  3. Shared Memory: Direct memory sharing (complex)

Decision

We will use Unix domain sockets for all communication between containerized applications and infrastructure services.

Rationale

Security Benefits

  • No Network Exposure: Infrastructure services bind only to Unix sockets
    # PostgreSQL configuration
    listen_addresses = ''                    # No TCP binding
    unix_socket_directories = '/var/run/postgresql'
    
    # Valkey configuration  
    port 0                                   # Disable TCP port
    unixsocket /var/run/valkey/valkey.sock
    
  • Filesystem Permissions: Access controlled by Unix file permissions
    srwxrwx--- 1 postgres postgres-clients 0 /var/run/postgresql/.s.PGSQL.5432
    srwxrwx--- 1 valkey   valkey-clients   0 /var/run/valkey/valkey.sock
    
  • Group-Based Access: Simple group membership controls access
    # Add application user to infrastructure groups
    usermod -a -G postgres-clients,valkey-clients authentik
    
  • No Network Scanning: Services invisible to network reconnaissance

Performance Advantages

  • Lower Latency: Unix sockets have ~20% lower latency than TCP loopback
  • Higher Throughput: Up to 40% higher throughput for local communication
  • Reduced CPU Overhead: No network stack processing required
  • Efficient Data Transfer: Direct kernel-level data copying
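
These figures are workload-dependent and worth re-measuring locally. A hedged benchmark sketch using standard tools (assumes a pgbench-initialized database and, for the TCP baseline, a temporarily enabled TCP listener):

# Valkey: compare Unix socket vs TCP loopback
redis-benchmark -s /var/run/valkey/valkey.sock -t ping,set,get -q
redis-benchmark -h 127.0.0.1 -p 6379 -t ping,set,get -q

# PostgreSQL: read-only benchmark over the Unix socket
pgbench -h /var/run/postgresql -U authentik -S -T 30 authentik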

Operational Benefits

  • Connection Reliability: Filesystem-based connections are more reliable
  • Resource Monitoring: Standard filesystem monitoring applies
  • Backup Friendly: No network configuration to backup/restore
  • Debugging: Standard filesystem tools for troubleshooting
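
The debugging point is concrete: ordinary filesystem and socket tools apply directly. A sketch:

# List listening Unix sockets for the infrastructure services
ss -xl | grep -E 'postgresql|valkey'

# Show which processes hold the Valkey socket open
lsof -U | grep valkey.sock

# Inspect socket ownership and permissions
stat /var/run/postgresql/.s.PGSQL.5432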

Implementation Strategy

Container Socket Access

# Container configuration (Quadlet)
[Container]
# Mount socket directories with proper labels
Volume=/var/run/postgresql:/var/run/postgresql:Z
Volume=/var/run/valkey:/var/run/valkey:Z

# Run as the dedicated host user with infrastructure client groups (see ADR-005)
User=966:966
PodmanArgs=--group-add 962 --group-add 961

Application Configuration

# Database connection (PostgreSQL)
DATABASE_URL=postgresql://authentik@/authentik?host=/var/run/postgresql

# Cache connection (Redis/Valkey) 
CACHE_URL=unix:///var/run/valkey/valkey.sock?db=1&password=secret

User Management

# Ansible user setup
- name: Add application user to infrastructure groups
  user:
    name: "{{ app_user }}"
    groups:
      - postgres-clients  # For database access
      - valkey-clients    # For cache access
    append: true
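
The resulting membership can be verified on the host for a given application user, say authentik; illustrative output using the GIDs from this deployment:

id authentik
# uid=966(authentik) gid=966(authentik) groups=966(authentik),962(postgres-clients),961(valkey-clients)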

Consequences

Positive

  • Security: Eliminated network attack vectors for databases
  • Performance: Measurably faster database and cache operations
  • Reliability: More stable connections than network-based
  • Simplicity: Simpler configuration than network + authentication

Negative

  • Container Complexity: Requires careful container user/group management
  • Learning Curve: Less familiar than standard TCP connections
  • Port Forwarding: Cannot use standard port forwarding for debugging

Mitigation Strategies

  • Documentation: Comprehensive guides for Unix socket configuration
  • Testing: Automated tests verify socket connectivity
  • Tooling: Helper scripts for debugging socket connections

Technical Implementation

# Test socket connectivity
sudo -u authentik psql -h /var/run/postgresql -U authentik -d authentik
sudo -u authentik redis-cli -s /var/run/valkey/valkey.sock ping

# Container user verification
podman exec authentik-server id
# uid=966(authentik) gid=966(authentik) groups=966(authentik),962(postgres-clients),961(valkey-clients)

Alternatives Considered

  1. TCP with Authentication: Rejected due to network exposure
  2. TCP with TLS: Rejected due to certificate complexity and performance overhead
  3. Shared Memory: Rejected due to implementation complexity

ADR-003: Podman + systemd Container Orchestration

Status: Accepted
Date: December 2025
Updated: December 2025 (System-level deployment pattern)
Deciders: Infrastructure Team
Technical Story: Container orchestration solution for secure application deployment with systemd integration.

Context

Container orchestration options for a single-node infrastructure:

  1. Docker + Docker Compose: Traditional container orchestration
  2. Podman + systemd: Daemonless containers with native systemd integration
  3. Kubernetes: Full orchestration platform (overkill for single node)
  4. Nomad: HashiCorp orchestration solution

Decision

We will use Podman with systemd integration (Quadlet) for container orchestration, deployed as system-level services (rootful containers running as dedicated users).

Rationale

Security Advantages

  • No Daemon Required: No privileged daemon attack surface
    # Docker: Requires root daemon
    sudo systemctl status docker
    
    # Podman: Daemonless operation
    podman ps  # No daemon needed
    
  • Dedicated Service Users: Containers run as dedicated system users (not root)
  • Group-Based Access Control: Unix group membership controls infrastructure access
  • SELinux Integration: Better SELinux support than Docker

systemd Integration Benefits

  • Native Service Management: Containers as system-level systemd services
    # Quadlet file: /etc/containers/systemd/authentik.pod
    [Unit]
    Description=Authentik Authentication Pod
    
    [Pod]
    PublishPort=0.0.0.0:9000:9000
    ShmSize=256m
    
    [Service]  
    Restart=always
    TimeoutStartSec=900
    
    [Install]
    WantedBy=multi-user.target
    
  • Dependency Management: systemd handles service dependencies
  • Resource Control: systemd resource limits and monitoring
  • Logging Integration: journald for centralized logging

Operational Excellence

  • Familiar Tooling: Standard systemd commands
    systemctl status authentik-pod
    systemctl restart authentik-server
    journalctl -u authentik-server -f
    
  • Boot Integration: Services start automatically at system boot
  • Resource Monitoring: systemd resource tracking
  • Configuration Management: Declarative Quadlet files
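
Quadlet files are translated into ordinary systemd units by a generator at daemon-reload time, so the result can be inspected like any other unit; a sketch (the generator binary's path varies by distribution):

# Regenerate units after editing Quadlet files
systemctl daemon-reload

# Inspect the generated unit for a container
systemctl cat authentik-server.service

# Debug Quadlet generation without applying it
/usr/libexec/podman/quadlet -dryrun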

Performance Benefits

  • Lower Overhead: No daemon overhead for container management
  • Direct Kernel Access: Better performance than daemon-based solutions
  • Resource Efficiency: More efficient resource utilization

Implementation Architecture

┌────────────────────────────────────────────────────────────────┐
│ systemd System Services (/system.slice/)                       │
│                                                                │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│ │ authentik-pod    │ │ authentik-server │ │ authentik-worker │ │
│ │ .service         │ │ .service         │ │ .service         │ │
│ └──────────────────┘ └──────────────────┘ └──────────────────┘ │
│          │                    │                    │           │
│          └────────────────────┼────────────────────┘           │
│                               │                                │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Podman Pod (rootful, dedicated user)                       │ │
│ │                                                            │ │
│ │ ┌──────────────────────┐  ┌──────────────────────┐         │ │
│ │ │ Server Container     │  │ Worker Container     │         │ │
│ │ │ User: 966:966        │  │ User: 966:966        │         │ │
│ │ │ Groups: 961, 962     │  │ Groups: 961, 962     │         │ │
│ │ │ (valkey, postgres)   │  │ (valkey, postgres)   │         │ │
│ │ └──────────────────────┘  └──────────────────────┘         │ │
│ └────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────┘
           │
           │ Group-based access to infrastructure
           ▼
┌────────────────────────────────────────────────────────────────┐
│ Infrastructure Services                                        │
│ PostgreSQL: /var/run/postgresql (postgres:postgres-clients)    │
│ Valkey: /var/run/valkey (valkey:valkey-clients)                │
└────────────────────────────────────────────────────────────────┘

Quadlet Configuration

# Pod configuration (authentik.pod)
[Unit]
Description=Authentik Authentication Pod

[Pod]
PublishPort=127.0.0.1:9000:9000
ShmSize=256m

[Service]
Restart=always

[Install]
WantedBy=multi-user.target

# Container configuration (authentik-server.container)
[Unit]
Description=Authentik Server Container
After=authentik-pod.service
Requires=authentik-pod.service

[Container]
ContainerName=authentik-server
Image=ghcr.io/goauthentik/server:2025.10
Pod=authentik.pod
EnvironmentFile=/opt/authentik/.env
User=966:966
PodmanArgs=--group-add 962 --group-add 961

# Volume mounts for sockets and data
Volume=/opt/authentik/media:/media
Volume=/opt/authentik/data:/data
Volume=/var/run/postgresql:/var/run/postgresql:Z
Volume=/var/run/valkey:/var/run/valkey:Z

[Service]
Restart=always
TimeoutStartSec=300

[Install]
WantedBy=multi-user.target

User Management Strategy

# Ansible implementation
- name: Create service user
  user:
    name: authentik
    group: authentik
    groups: [postgres-clients, valkey-clients]
    system: true
    shell: /bin/bash
    home: /opt/authentik
    create_home: true
    append: true

Note: Infrastructure roles (PostgreSQL, Valkey) export client group GIDs as Ansible facts (postgresql_client_group_gid, valkey_client_group_gid) which are consumed by application container templates for dynamic --group-add arguments.

Consequences

Positive

  • Security: Eliminated privileged daemon attack surface
  • Integration: Seamless systemd integration for management
  • Performance: Lower overhead than daemon-based solutions
  • Reliability: systemd's proven service management
  • Monitoring: Standard systemd monitoring and logging

Negative

  • Learning Curve: Different from Docker Compose workflows
  • Tooling: Ecosystem less mature than Docker
  • Documentation: Fewer online resources and examples

Mitigation Strategies

  • Documentation: Comprehensive internal documentation
  • Training: Team training on Podman/systemd workflows
  • Tooling: Helper scripts for common operations

Technical Implementation

# Container management (system scope)
systemctl status authentik-pod
systemctl restart authentik-server
podman ps
podman logs authentik-server

# Resource monitoring
systemctl show authentik-server --property=MemoryCurrent
journalctl -u authentik-server -f

# Verify container groups
podman inspect -f '{{.State.Pid}}' authentik-server | \
  xargs -I {} grep Groups /proc/{}/status
# Output: Groups: 961 962 966

Alternatives Considered

  1. Docker + Docker Compose: Rejected due to security concerns (privileged daemon)
  2. Kubernetes: Rejected as overkill for single-node deployment
  3. Nomad: Rejected to maintain consistency with systemd ecosystem

ADR-004: Forward Authentication Security Model

Status: Accepted
Date: December 2025
Deciders: Infrastructure Team
Technical Story: Centralized authentication and authorization for multiple services without modifying existing applications.

Context

Authentication strategies for multiple services:

  1. Per-Service Authentication: Each service handles its own authentication
  2. Shared Database: Services share authentication database
  3. Forward Authentication: Reverse proxy handles authentication
  4. OAuth2/OIDC Integration: Services implement OAuth2 clients

Decision

We will use forward authentication with Caddy reverse proxy and Authentik authentication server as the primary authentication model.

Rationale

Security Benefits

  • Single Point of Control: Centralized authentication policy
  • Zero Application Changes: Protect existing services without modification
  • Consistent Security: Same security model across all services
  • Session Management: Centralized session handling and timeouts
  • Multi-Factor Authentication: MFA applied consistently across services

Operational Advantages

  • Simplified Deployment: No per-service authentication setup
  • Audit Trail: Centralized authentication logging
  • Policy Management: Single place to manage access policies
  • User Management: One system for all user administration
  • Service Independence: Services focus on business logic

Integration Benefits

  • Transparent to Applications: Services receive authenticated requests
  • Header-Based Identity: Simple identity propagation
    Remote-User: john.doe
    Remote-Name: John Doe
    Remote-Email: john.doe@company.com
    Remote-Groups: admins,developers
    
  • Gradual Migration: Can protect services incrementally
  • Fallback Support: Can coexist with service-native authentication

Implementation Architecture

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│    User     │    │    Caddy    │    │  Authentik  │    │   Service   │
│             │    │  (Proxy)    │    │   (Auth)    │    │ (Backend)   │
└──────┬──────┘    └──────┬──────┘    └──────┬──────┘    └──────┬──────┘
       │                  │                  │                  │
       │ GET /dashboard   │                  │                  │
       │─────────────────▶│                  │                  │
       │                  │                  │                  │
       │                  │ Forward Auth     │                  │
       │                  │─────────────────▶│                  │
       │                  │                  │                  │
       │                  │ 401 Unauthorized │                  │
       │                  │◀─────────────────│                  │
       │                  │                  │                  │
       │ 302 → /auth/login│                  │                  │
       │◀─────────────────│                  │                  │
       │                  │                  │                  │
       │ Login form       │                  │                  │
       │─────────────────▶│─────────────────▶│                  │
       │                  │                  │                  │
       │ Credentials      │                  │                  │
       │─────────────────▶│─────────────────▶│                  │
       │                  │                  │                  │
       │ Set-Cookie       │                  │                  │
       │◀─────────────────│◀─────────────────│                  │
       │                  │                  │                  │
       │ GET /dashboard   │                  │                  │
       │─────────────────▶│                  │                  │
       │                  │                  │                  │
       │                  │ Forward Auth     │                  │
       │                  │─────────────────▶│                  │
       │                  │                  │                  │
       │                  │ 200 + Headers    │                  │
       │                  │◀─────────────────│                  │
       │                  │                  │                  │
       │                  │ GET /dashboard + Auth Headers       │
       │                  │─────────────────────────────────────▶│
       │                  │                                     │
       │                  │ Dashboard Content                   │
       │                  │◀─────────────────────────────────────│
       │                  │                                     │
       │ Dashboard        │                                     │
       │◀─────────────────│                                     │

Caddy Configuration

# Service protection template
dashboard.jnss.me {
    # Forward authentication to Authentik
    forward_auth https://auth.jnss.me {
        uri /outpost.goauthentik.io/auth/caddy
        copy_headers Remote-User Remote-Name Remote-Email Remote-Groups
    }
    
    # Backend service (receives authenticated requests)
    reverse_proxy localhost:8080
}
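
A quick unauthenticated request confirms the proxy enforces the forward-auth step; a sketch (the exact status depends on provider configuration):

curl -sI https://dashboard.jnss.me | head -1
# Expect a redirect into the Authentik login flow rather than a 200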

Service Integration

Services receive authentication information via HTTP headers:

# Example service code (Python Flask)
from flask import Flask, request, render_template

app = Flask(__name__)

@app.route('/dashboard')
def dashboard():
    # Identity headers are injected by Caddy after forward auth
    username = request.headers.get('Remote-User')
    name = request.headers.get('Remote-Name')
    email = request.headers.get('Remote-Email')
    groups = request.headers.get('Remote-Groups', '').split(',')

    if 'admins' in groups:
        # Admin functionality
        pass

    return render_template('dashboard.html',
                           username=username,
                           name=name,
                           email=email)

Authentik Provider Configuration

# Authentik Proxy Provider configuration
name: "Service Forward Auth"
authorization_flow: "default-authorization-flow"
external_host: "https://service.jnss.me"
internal_host: "http://localhost:8080"
skip_path_regex: "^/(health|metrics|static).*"

Authorization Policies

# Example authorization policy in Authentik
policy_bindings:
  - policy: "group_admins_only"
    target: "service_dashboard"
    order: 0
  
  - policy: "deny_external_ips" 
    target: "admin_endpoints"
    order: 1

Consequences

Positive

  • Security: Consistent, centralized authentication and authorization
  • Simplicity: No application changes required for protection
  • Flexibility: Fine-grained access control through Authentik policies
  • Auditability: Centralized authentication logging
  • User Experience: Single sign-on across all services

Negative

  • Single Point of Failure: Authentication system failure affects all services
  • Performance: Additional hop for authentication checks
  • Complexity: Additional component in the request path

Mitigation Strategies

  • High Availability: Robust deployment and monitoring of auth components
  • Caching: Session caching to reduce authentication overhead
  • Fallback: Emergency bypass procedures for critical services
  • Monitoring: Comprehensive monitoring of authentication flow

Security Considerations

Session Security

# Authentik session settings
session_cookie_age: 3600  # 1 hour
session_cookie_secure: true
session_cookie_samesite: "Strict"
session_remember_me: false

Access Control

  • Group-Based Authorization: Users assigned to groups, groups to applications
  • Time-Based Access: Temporary access grants
  • IP-Based Restrictions: Geographic or network-based access control
  • MFA Requirements: Multi-factor authentication for sensitive services

Audit Logging

{
  "timestamp": "2025-12-11T17:52:31Z",
  "event": "authentication_success",
  "user": "john.doe",
  "service": "dashboard.jnss.me",
  "ip": "192.168.1.100",
  "user_agent": "Mozilla/5.0..."
}

Alternative Models Supported

While forward auth is primary, we also support:

  1. OAuth2/OIDC Integration: For applications that can implement OAuth2
  2. API Key Authentication: For service-to-service communication
  3. Service-Native Auth: For legacy applications that cannot be easily protected

Implementation Examples

Protecting a Static Site

docs.jnss.me {
    forward_auth https://auth.jnss.me {
        uri /outpost.goauthentik.io/auth/caddy
        copy_headers Remote-User Remote-Groups
    }
    
    root * /var/www/docs
    file_server
}

Protecting an API

api.jnss.me {
    forward_auth https://auth.jnss.me {
        uri /outpost.goauthentik.io/auth/caddy
        copy_headers Remote-User Remote-Email Remote-Groups
    }
    
    reverse_proxy localhost:3000
}

Public Endpoints with Selective Protection

app.jnss.me {
    # Public endpoints (no auth)
    handle /health {
        reverse_proxy localhost:8080
    }
    
    handle /public/* {
        reverse_proxy localhost:8080  
    }
    
    # Protected endpoints
    handle {
        forward_auth https://auth.jnss.me {
            uri /outpost.goauthentik.io/auth/caddy
            copy_headers Remote-User Remote-Groups
        }
        reverse_proxy localhost:8080
    }
}

Alternatives Considered

  1. OAuth2 Only: Rejected due to application modification requirements
  2. Shared Database: Rejected due to tight coupling between services
  3. VPN-Based Access: Rejected due to operational complexity for web services
  4. Per-Service Authentication: Rejected due to management overhead

ADR-005: Rootful Containers with Infrastructure Fact Pattern

Status: Accepted
Date: December 2025
Deciders: Infrastructure Team
Technical Story: Enable containerized applications to access native infrastructure services (PostgreSQL, Valkey) via Unix sockets with group-based permissions.

Context

Containerized applications need to access infrastructure services (PostgreSQL, Valkey) through Unix sockets with filesystem-based permission controls. The permission model requires:

  1. Socket directories owned by service groups (postgres-clients, valkey-clients)
  2. Application users added to these groups for access
  3. Container processes must preserve group membership to access sockets

Two approaches were evaluated:

  1. Rootless containers (user namespace): Containers run in user namespace with UID/GID remapping
  2. Rootful containers (system services): Containers run as dedicated system users without namespace isolation

Decision

We will use rootful containers deployed as system-level systemd services with an Infrastructure Fact Pattern where infrastructure roles export client group GIDs as Ansible facts for application consumption.

Rationale

Why Rootful Succeeds

Direct UID/GID Mapping:

# Host: authentik user UID 966, groups: 966 (authentik), 961 (valkey-clients), 962 (postgres-clients)
# Container User=966:966 with PodmanArgs=--group-add 961 --group-add 962

# Inside container:
id
# uid=966(authentik) gid=966(authentik) groups=966(authentik),961(valkey-clients),962(postgres-clients)

# Socket access works:
ls -l /var/run/postgresql/.s.PGSQL.5432
# srwxrwx--- 1 postgres postgres-clients 0 ... /var/run/postgresql/.s.PGSQL.5432

Group membership preserved: Container process has GIDs 961 and 962, matching socket group ownership.

Why Rootless Failed (Discarded Approach)

User Namespace UID/GID Remapping:

# Host: authentik user UID 100000, subuid range 200000-265535
# Container User=%i:%i with --userns=host --group-add=keep-groups

# User namespace remaps:
# Host UID 100000 → Container UID 100000 (root in namespace)
# Host GID 961 → Container GID 200961 (remapped into subgid range)
# Host GID 962 → Container GID 200962 (remapped into subgid range)

# Socket ownership on host:
# srwxrwx--- 1 postgres postgres-clients (GID 962)

# Container process groups: 200961, 200962 (remapped)
# Socket expects: GID 962 (not remapped)
# Result: Permission denied ❌

Root cause: User namespace supplementary group remapping breaks group-based socket access even with --userns=host, --group-add=keep-groups, and Annotation=run.oci.keep_original_groups=1.
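
The remapping can be observed directly from inside a running container; a diagnostic sketch (in the rootful setup both maps are identity, whereas the discarded rootless setup showed supplementary GIDs shifted into the subgid range):

# Inspect the user-namespace ID maps of the container process
podman exec authentik-server cat /proc/self/uid_map /proc/self/gid_map

# Compare against the container's effective groups
podman exec authentik-server id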

Infrastructure Fact Pattern

Infrastructure Roles Export GIDs

Infrastructure services create client groups and export their GIDs as Ansible facts:

# PostgreSQL role: roles/postgresql/tasks/main.yml
- name: Create PostgreSQL client access group
  group:
    name: postgres-clients
    system: true

- name: Get PostgreSQL client group GID
  shell: "getent group postgres-clients | cut -d: -f3"
  register: postgresql_client_group_lookup
  changed_when: false

- name: Set PostgreSQL client group GID as fact
  set_fact:
    postgresql_client_group_gid: "{{ postgresql_client_group_lookup.stdout }}"

# Valkey role: roles/valkey/tasks/main.yml
- name: Create Valkey client access group
  group:
    name: valkey-clients
    system: true

- name: Get Valkey client group GID
  shell: "getent group valkey-clients | cut -d: -f3"
  register: valkey_client_group_lookup
  changed_when: false

- name: Set Valkey client group GID as fact
  set_fact:
    valkey_client_group_gid: "{{ valkey_client_group_lookup.stdout }}"
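
The groups and GIDs these facts resolve to can be spot-checked on the host; illustrative output matching this deployment's GIDs:

getent group postgres-clients valkey-clients
# postgres-clients:x:962:authentik
# valkey-clients:x:961:authentik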

Application Roles Consume Facts

Application roles validate and consume infrastructure facts:

# Authentik role: roles/authentik/tasks/main.yml
- name: Validate infrastructure facts are available
  assert:
    that:
      - postgresql_client_group_gid is defined
      - valkey_client_group_gid is defined
    fail_msg: |
      Required infrastructure facts are not available.
      Ensure PostgreSQL and Valkey roles have run first.

- name: Create authentik user with infrastructure groups
  user:
    name: authentik
    groups: [postgres-clients, valkey-clients]
    append: true

# Container template: roles/authentik/templates/authentik-server.container
[Container]
User={{ authentik_uid }}:{{ authentik_gid }}
PodmanArgs=--group-add {{ postgresql_client_group_gid }} --group-add {{ valkey_client_group_gid }}

Implementation Details

System-Level Deployment

# Quadlet files deployed to /etc/containers/systemd/ (not ~/.config/)
# Pod: /etc/containers/systemd/authentik.pod
[Unit]
Description=Authentik Authentication Pod

[Pod]
PublishPort=0.0.0.0:9000:9000
ShmSize=256m

[Service]
Restart=always

[Install]
WantedBy=multi-user.target  # System target, not default.target

# Container: /etc/containers/systemd/authentik-server.container
[Container]
User=966:966
PodmanArgs=--group-add 962 --group-add 961

Volume=/var/run/postgresql:/var/run/postgresql:Z
Volume=/var/run/valkey:/var/run/valkey:Z
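
After deploying or editing these files, the units are regenerated and started at system scope; a sketch:

# Regenerate systemd units from the Quadlet files and start the pod
systemctl daemon-reload
systemctl start authentik-pod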

Service Management

# System scope (not user scope)
systemctl status authentik-pod
systemctl restart authentik-server
journalctl -u authentik-server -f

# Verify container location
systemctl status authentik-server | grep CGroup
# CGroup: /system.slice/authentik-server.service ✓

Special Case: Valkey Socket Group Fix

Valkey doesn't natively support socket group configuration (unlike PostgreSQL's unix_socket_group). A helper service ensures correct socket permissions:

# /etc/systemd/system/valkey-socket-fix.service
[Unit]
Description=Fix Valkey socket group ownership and permissions
BindsTo=valkey.service
After=valkey.service

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'i=0; while [ ! -S /var/run/valkey/valkey.sock ] && [ $i -lt 100 ]; do sleep 0.1; i=$((i+1)); done'
ExecStart=/bin/chgrp valkey-clients /var/run/valkey/valkey.sock
ExecStart=/bin/chmod 770 /var/run/valkey/valkey.sock
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

Triggered by Valkey service:

# /etc/systemd/system/valkey.service (excerpt)
[Unit]
Wants=valkey-socket-fix.service
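
The fix can be validated across a restart cycle; a sketch:

# Restarting Valkey pulls the socket-fix unit back in via Wants=
systemctl restart valkey
ls -l /var/run/valkey/valkey.sock
# srwxrwx--- 1 valkey valkey-clients ... valkey.sock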

Consequences

Positive

  • Socket Access Works: Group-based permissions function correctly
  • Security: Containers run as dedicated users (not root), no privileged daemon
  • Portability: Dynamic GID facts work across different hosts
  • Consistency: Same pattern for all containerized applications
  • Simplicity: No user namespace complexity, standard systemd service management

Negative

  • Not "Pure" Rootless: Containers require root for systemd service deployment
  • Different from Docker: Less familiar pattern than rootless user services

Neutral

  • System vs User Scope: Different commands (systemctl vs systemctl --user) but equally capable
  • Deployment Location: /etc/containers/systemd/ vs ~/.config/ but same Quadlet functionality

Validation

# Verify service location
systemctl status authentik-server | grep CGroup
# → /system.slice/authentik-server.service ✓

# Verify process groups
podman inspect -f '{{.State.Pid}}' authentik-server | \
  xargs -I {} grep Groups /proc/{}/status
# → Groups: 961 962 966 ✓

# Verify socket permissions
ls -l /var/run/postgresql/.s.PGSQL.5432
# → srwxrwx--- postgres postgres-clients ✓

ls -l /var/run/valkey/valkey.sock
# → srwxrwx--- valkey valkey-clients ✓

# Verify HTTP endpoint
curl -I http://127.0.0.1:9000/
# → HTTP/1.1 302 Found ✓

Alternatives Considered

  1. Rootless with user namespace - Discarded due to GID remapping breaking group-based socket access
  2. TCP-only connections - Rejected to maintain Unix socket security and performance benefits
  3. Hardcoded GIDs - Rejected for portability; facts provide dynamic resolution
  4. Directory permissions (777) - Rejected for security; group-based access more restrictive

Summary

These architecture decisions collectively create a robust, secure, and performant infrastructure:

  • Native Services provide optimal performance and security
  • Unix Sockets eliminate network attack vectors
  • Podman + systemd delivers secure container orchestration
  • Forward Authentication enables centralized security without application changes
  • Rootful Container Pattern enables group-based socket access with infrastructure fact sharing

The combination results in an infrastructure that prioritizes security and performance while maintaining operational simplicity and reliability.
