# Architecture Decision Records (ADR)

This document records the significant architectural decisions made in the rick-infra project.

---

## Unix Socket IPC Architecture

### Context

Containerized applications need to communicate with database and cache services. Communication methods include:

1. **Network TCP/IP**: Standard network protocols
2. **Unix Domain Sockets**: Filesystem-based IPC

### Decision

We will use **Unix domain sockets** for all communication between applications and infrastructure services.

### Rationale

#### Security Benefits

- **No Network Exposure**: Infrastructure services bind only to Unix sockets

  ```bash
  # PostgreSQL configuration
  listen_addresses = ''                            # No TCP binding
  unix_socket_directories = '/var/run/postgresql'

  # Valkey configuration
  port 0                                           # Disable TCP port
  unixsocket /var/run/valkey/valkey.sock
  ```

- **Filesystem Permissions**: Access controlled by Unix file permissions

  ```bash
  srwxrwx--- 1 postgres postgres 0 /var/run/postgresql/.s.PGSQL.5432
  srwxrwx--- 1 valkey   valkey   0 /var/run/valkey/valkey.sock
  ```

- **Group-Based Access**: Simple group membership controls access

  ```bash
  # Add application user to infrastructure groups
  usermod -a -G postgres,valkey authentik
  ```

- **No Network Scanning**: Services are invisible to network reconnaissance

#### Performance Advantages

- **Lower Latency**: Unix sockets have roughly 20% lower latency than TCP loopback
- **Higher Throughput**: Up to 40% higher throughput for local communication
- **Reduced CPU Overhead**: No network stack processing required
- **Efficient Data Transfer**: Direct kernel-level data copying

#### Operational Benefits

- **Connection Reliability**: Filesystem-based connections are more reliable
- **Resource Monitoring**: Standard filesystem monitoring applies
- **Backup Friendly**: No network configuration to back up or restore
- **Debugging**: Standard filesystem tools for troubleshooting

### Implementation Strategy

#### Container Socket Access

```ini
# Container configuration (Quadlet)
[Container]
# Mount socket directories with proper labels
Volume=/var/run/postgresql:/var/run/postgresql:Z
Volume=/var/run/valkey:/var/run/valkey:Z

# Preserve user namespace and groups
PodmanArgs=--userns=host
Annotation=run.oci.keep_original_groups=1
```

#### Application Configuration

```bash
# Database connection (PostgreSQL)
DATABASE_URL=postgresql://authentik@/authentik?host=/var/run/postgresql

# Cache connection (Redis/Valkey)
CACHE_URL=unix:///var/run/valkey/valkey.sock?db=1&password=secret
```

#### User Management

```yaml
# Ansible user setup
- name: Add application user to infrastructure groups
  user:
    name: "{{ app_user }}"
    groups:
      - postgres  # For database access
      - valkey    # For cache access
    append: true
```

### Consequences

#### Positive

- **Security**: Eliminated network attack vectors for databases
- **Performance**: Measurably faster database and cache operations
- **Reliability**: More stable connections than network-based alternatives
- **Simplicity**: Simpler configuration than network transport plus authentication

#### Negative

- **Container Complexity**: Requires careful container user/group management
- **Learning Curve**: Less familiar than standard TCP connections
- **Port Forwarding**: Cannot use standard port forwarding for debugging

#### Mitigation Strategies

- **Documentation**: Comprehensive guides for Unix socket configuration
- **Testing**: Automated tests verify socket connectivity
- **Tooling**: Helper scripts for debugging socket connections

### Technical Implementation

```bash
# Test socket connectivity
sudo -u authentik psql -h /var/run/postgresql -U authentik -d authentik
sudo -u authentik redis-cli -s /var/run/valkey/valkey.sock ping

# Container user verification
podman exec authentik-server id
# uid=963(authentik) gid=963(authentik) groups=963(authentik),968(postgres),965(valkey)
```

### Alternatives Considered

1. **TCP with Authentication**: Rejected due to network exposure
2.
**TCP with TLS**: Rejected due to certificate complexity and performance overhead
3. **Shared Memory**: Rejected due to implementation complexity

---

## ADR-003: Podman + systemd Container Orchestration

**Technical Story**: Container orchestration solution for secure application deployment with systemd integration.

### Context

Container orchestration options for a single-node infrastructure:

1. **Docker + Docker Compose**: Traditional container orchestration
2. **Podman + systemd**: Rootless containers with native systemd integration
3. **Kubernetes**: Full orchestration platform (overkill for a single node)
4. **Nomad**: HashiCorp orchestration solution

### Decision

We will use **Podman with systemd integration (Quadlet)** for container orchestration, deployed as system-level services (rootful containers running as dedicated users).

### Rationale

#### Security Advantages

- **No Daemon Required**: No privileged daemon attack surface

  ```bash
  # Docker: requires a root daemon
  sudo systemctl status docker

  # Podman: daemonless operation
  podman ps  # No daemon needed
  ```

- **Dedicated Service Users**: Containers run as dedicated system users (not root)
- **Group-Based Access Control**: Unix group membership controls infrastructure access
- **SELinux Integration**: Better SELinux support than Docker

#### systemd Integration Benefits

- **Native Service Management**: Containers as system-level systemd services

  ```ini
  # Quadlet file: /etc/containers/systemd/authentik.pod
  [Unit]
  Description=Authentik Authentication Pod

  [Pod]
  PublishPort=0.0.0.0:9000:9000
  ShmSize=256m

  [Service]
  Restart=always
  TimeoutStartSec=900

  [Install]
  WantedBy=multi-user.target
  ```

- **Dependency Management**: systemd handles service dependencies
- **Resource Control**: systemd resource limits and monitoring
- **Logging Integration**: journald for centralized logging

#### Operational Excellence

- **Familiar Tooling**: Standard systemd commands

  ```bash
  systemctl status authentik-pod
  systemctl restart authentik-server
  journalctl -u authentik-server -f
  ```

- **Boot Integration**: Services start automatically at system boot
- **Resource Monitoring**: systemd resource tracking
- **Configuration Management**: Declarative Quadlet files

#### Performance Benefits

- **Lower Overhead**: No daemon overhead for container management
- **Direct Kernel Access**: Better performance than daemon-based solutions
- **Resource Efficiency**: More efficient resource utilization

### Implementation Architecture

```
systemd system services (/system.slice/)
├── authentik-pod.service
├── authentik-server.service
└── authentik-worker.service
        │
        ▼
Podman pod (rootful, dedicated user)
├── Server container   User: 966:966   Groups: 961,962 (valkey, postgres)
└── Worker container   User: 966:966   Groups: 961,962 (valkey, postgres)
        │
        │ group-based access to infrastructure
        ▼
Infrastructure services
├── PostgreSQL: /var/run/postgresql (postgres:postgres-clients)
└── Valkey:     /var/run/valkey     (valkey:valkey-clients)
```

#### Quadlet Configuration

```ini
# Pod configuration (authentik.pod)
[Unit]
Description=Authentik Authentication Pod

[Pod]
PublishPort=127.0.0.1:9000:9000
ShmSize=256m

[Service]
Restart=always

[Install]
WantedBy=default.target
```

```ini
# Container configuration (authentik-server.container)
[Unit]
Description=Authentik Server Container
After=authentik-pod.service
Requires=authentik-pod.service

[Container]
ContainerName=authentik-server
Image=ghcr.io/goauthentik/server:2025.10
Pod=authentik.pod
EnvironmentFile=/opt/authentik/.env
User=966:966
PodmanArgs=--group-add 962 --group-add 961

# Volume mounts for sockets and data
Volume=/opt/authentik/media:/media
Volume=/opt/authentik/data:/data
Volume=/var/run/postgresql:/var/run/postgresql:Z
Volume=/var/run/valkey:/var/run/valkey:Z

[Service]
Restart=always
TimeoutStartSec=300

[Install]
WantedBy=multi-user.target
```

### User Management Strategy

```yaml
# Ansible implementation
- name: Create service user
  user:
    name: authentik
    group: authentik
    groups: [postgres-clients, valkey-clients]
    system: true
    shell: /bin/bash
    home: /opt/authentik
    create_home: true
    append: true
```

**Note**: Infrastructure roles (PostgreSQL, Valkey) export client group GIDs as Ansible facts (`postgresql_client_group_gid`, `valkey_client_group_gid`), which are consumed by application container templates for dynamic `--group-add` arguments.
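A sketch of how an application role might render its Quadlet unit from those exported facts — the task and template names here are illustrative, not the project's actual files:

```yaml
# Hypothetical deployment task consuming the exported GID facts
- name: Render authentik-server Quadlet unit
  template:
    src: authentik-server.container.j2  # template references the GID facts
    dest: /etc/containers/systemd/authentik-server.container
    owner: root
    group: root
    mode: "0644"
  notify: Reload systemd daemon
```

Because the GIDs are resolved at run time on each host, the rendered unit stays portable across machines where `postgres-clients` and `valkey-clients` happen to have different numeric GIDs.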
### Consequences

#### Positive

- **Security**: Eliminated privileged daemon attack surface
- **Integration**: Seamless systemd integration for management
- **Performance**: Lower overhead than daemon-based solutions
- **Reliability**: systemd's proven service management
- **Monitoring**: Standard systemd monitoring and logging

#### Negative

- **Learning Curve**: Different from Docker Compose workflows
- **Tooling**: Ecosystem less mature than Docker's
- **Documentation**: Fewer online resources and examples

#### Mitigation Strategies

- **Documentation**: Comprehensive internal documentation
- **Training**: Team training on Podman/systemd workflows
- **Tooling**: Helper scripts for common operations

### Technical Implementation

```bash
# Container management (system scope)
systemctl status authentik-pod
systemctl restart authentik-server
podman ps
podman logs authentik-server

# Resource monitoring
systemctl show authentik-server --property=MemoryCurrent
journalctl -u authentik-server -f

# Verify container groups
ps aux | grep authentik-server | head -1 | awk '{print $2}' | \
  xargs -I {} cat /proc/{}/status | grep Groups
# Output: Groups: 961 962 966
```

### Alternatives Considered

1. **Docker + Docker Compose**: Rejected due to security concerns (privileged daemon)
2. **Kubernetes**: Rejected as overkill for a single-node deployment
3. **Nomad**: Rejected to maintain consistency with the systemd ecosystem

---

## OAuth/OIDC and Forward Authentication Security Model

**Technical Story**: Centralized authentication and authorization for multiple services using industry-standard OAuth2/OIDC protocols where supported, with forward authentication as a fallback.

### Context

Authentication strategies for multiple services:

1. **Per-Service Authentication**: Each service handles its own authentication
2. **Shared Database**: Services share an authentication database
3. **OAuth2/OIDC Integration**: Services implement standard OAuth2/OIDC clients
4.
**Forward Authentication**: Reverse proxy handles authentication for services without OAuth support

### Decision

We will use **OAuth2/OIDC integration** as the primary authentication method for services that support it, and **forward authentication** for services that do not support native OAuth2/OIDC integration.

### Rationale

#### OAuth/OIDC as Primary Method

**Security Benefits**:

- **Standard Protocol**: Industry-standard authentication flow (RFC 6749, RFC 7636)
- **Token-Based Security**: Secure JWT tokens with cryptographic signatures
- **Proper Session Management**: Native application session handling with refresh tokens
- **Scope-Based Authorization**: Fine-grained permission control via OAuth scopes
- **PKCE Support**: Protection against authorization code interception attacks

**Integration Benefits**:

- **Native Support**: Applications designed for OAuth/OIDC work seamlessly
- **Better UX**: Proper redirect flows, logout handling, and token refresh
- **API Access**: OAuth tokens enable secure API integrations
- **Standard Claims**: The OpenID Connect userinfo endpoint provides standardized user data
- **Multi-Application SSO**: Proper single sign-on with token sharing

**Examples**: Nextcloud, Gitea, Grafana, and many modern applications

#### Forward Auth as Fallback

**Use Cases**:

- Services without OAuth/OIDC support
- Legacy applications that cannot be modified
- Static sites requiring authentication
- Simple internal tools

**Security Benefits**:

- **Zero Application Changes**: Protect existing services without modification
- **Header-Based Identity**: Simple identity propagation to the backend
- **Transparent Protection**: Services receive pre-authenticated requests

**Limitations**:

- **Non-Standard**: Not an industry-standard authentication protocol
- **Proxy Dependency**: All requests must flow through the authenticating proxy
- **Limited Logout**: Complex logout scenarios across services
- **Header Trust**: The backend must trust proxy-provided headers

#### Shared Benefits (Both Methods)

- **Single Point of Control**: Centralized authentication policy via Authentik
- **Consistent Security**: Same authentication provider across all services
- **Multi-Factor Authentication**: MFA applied consistently via Authentik
- **Audit Trail**: Centralized authentication logging
- **User Management**: One system for all user administration

### Implementation Architecture

#### OAuth/OIDC Flow (Primary Method)

Actors: User ↔ Service (OAuth app) ↔ Authentik (IdP).

1. The user requests the service; having no session, the service responds `302`, redirecting to the OAuth authorization endpoint.
2. The user requests `GET /authorize?client_id=...&redirect_uri=...` at Authentik.
3. Authentik presents a login form (if the user is not already authenticated); the user submits credentials.
4. Authentik responds `302` to the service callback with `?code=AUTH_CODE`.
5. The service receives `GET /callback?code=AUTH_CODE` and exchanges the code via `POST /token`.
6. Authentik returns an `access_token` and an `id_token` (JWT).
7. The service calls `GET /userinfo` and receives the user's claims.
8. The service sets a session cookie and redirects the user (`302`) to `/dashboard`.
9. The user requests `GET /dashboard` and receives the dashboard.

#### Forward Auth Flow (Fallback Method)

Actors: User ↔ Caddy (proxy) ↔ Authentik (forward auth) ↔ Service (backend).

1. The user requests `GET /`; Caddy issues a forward-auth subrequest to Authentik.
2. Authentik responds `401 Unauthorized`; Caddy redirects the user (`302`) to the auth endpoint.
3. Authentik presents a login form; the user submits credentials; Authentik sets a session cookie.
4. The user retries `GET /`; Caddy's forward-auth subrequest now returns `200` plus identity headers.
5. Caddy proxies the request to the backend with the identity headers attached.
6. The backend responds, and Caddy returns the content to the user.

### OAuth/OIDC Configuration Examples

#### Nextcloud OAuth Configuration

```php
// Nextcloud config.php
'oidc_login_provider_url' => 'https://auth.jnss.me/application/o/nextcloud/',
'oidc_login_client_id' => 'nextcloud-client-id',
'oidc_login_client_secret' => 'secret-from-authentik',
'oidc_login_auto_redirect' => true,
'oidc_login_end_session_redirect' => true,
'oidc_login_button_text' => 'Login with SSO',
'oidc_login_hide_password_form' => true,
'oidc_login_use_id_token' => true,
'oidc_login_attributes' => [
    'id' => 'preferred_username',
    'name' => 'name',
    'mail' => 'email',
    'groups' => 'groups',
],
'oidc_login_default_group' => 'users',
'oidc_login_use_external_storage' => false,
'oidc_login_scope' => 'openid profile email groups',
'oidc_login_proxy_ldap' => false,
'oidc_login_disable_registration' => false,
'oidc_login_redir_fallback' => true,
'oidc_login_tls_verify' => true,
```

#### Gitea OAuth Configuration

```ini
# Gitea app.ini
[openid]
ENABLE_OPENID_SIGNIN = false
ENABLE_OPENID_SIGNUP = false

[oauth2_client]
REGISTER_EMAIL_CONFIRM = false
OPENID_CONNECT_SCOPES = openid email profile groups
ENABLE_AUTO_REGISTRATION = true
USERNAME = preferred_username
EMAIL = email
ACCOUNT_LINKING = auto
```

**Authentik Provider Configuration** (Gitea):

- Provider Type: OAuth2/OpenID Provider
- Client ID: `gitea`
- Client Secret: Generated by Authentik
- Redirect URIs: `https://git.jnss.me/user/oauth2/Authentik/callback`
- Scopes: `openid`,
  `profile`, `email`, `groups`

#### Authentik OAuth2 Provider Settings

```yaml
# OAuth2/OIDC Provider configuration in Authentik
name: "Nextcloud OAuth Provider"
authorization_flow: "default-authorization-flow"
client_type: "confidential"
client_id: "nextcloud-client-id"
redirect_uris: "https://cloud.jnss.me/apps/oidc_login/oidc"
signing_key: "authentik-default-key"
property_mappings:
  - "authentik default OAuth Mapping: OpenID 'openid'"
  - "authentik default OAuth Mapping: OpenID 'email'"
  - "authentik default OAuth Mapping: OpenID 'profile'"
  - "Custom: Groups"  # Maps user groups to 'groups' claim
```

### Forward Auth Configuration Examples

#### Caddy Configuration for Forward Auth

```caddyfile
# whoami service with forward authentication
whoami.jnss.me {
    # Forward authentication to Authentik
    forward_auth https://auth.jnss.me {
        uri /outpost.goauthentik.io/auth/caddy
        copy_headers Remote-User Remote-Name Remote-Email Remote-Groups
    }

    # Backend service (receives authenticated requests)
    reverse_proxy localhost:8080
}
```

#### Authentik Proxy Provider Configuration

```yaml
# Authentik Proxy Provider for forward auth
name: "Whoami Forward Auth"
type: "proxy"
authorization_flow: "default-authorization-flow"
external_host: "https://whoami.jnss.me"
internal_host: "http://localhost:8080"
skip_path_regex: "^/(health|metrics).*"
mode: "forward_single"  # Single-application mode
```

#### Service Integration (Forward Auth)

Services receive authentication information via HTTP headers:

```python
# Example service code (Python Flask)
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route('/')
def index():
    username = request.headers.get('Remote-User')
    name = request.headers.get('Remote-Name')
    email = request.headers.get('Remote-Email')
    groups = request.headers.get('Remote-Groups', '').split(',')
    return render_template('index.html',
                           username=username, name=name,
                           email=email, groups=groups)
```

### Authorization Policies

Both OAuth and forward auth support Authentik authorization policies:

```yaml
# Example authorization policy bindings in Authentik
policy_bindings:
  - policy: "group_admins_only"
    target: "nextcloud_oauth_provider"
    order: 0
  - policy: "require_mfa"
    target: "gitea_oauth_provider"
    order: 1
  - policy: "internal_network_only"
    target: "whoami_proxy_provider"
    order: 0
```

### Decision Matrix: OAuth/OIDC vs Forward Auth

| Criteria | OAuth/OIDC | Forward Auth |
|----------|------------|--------------|
| **Application Support** | Requires native OAuth/OIDC support | Any application |
| **Protocol Standard** | Industry standard (RFC 6749, 7636) | Proprietary/custom |
| **Token Management** | Native refresh tokens, proper expiry | Session-based only |
| **Logout Handling** | Proper logout flow | Complex, proxy-dependent |
| **API Access** | Full API support via tokens | Header-only |
| **Implementation Effort** | Configure OAuth settings | Zero app changes |
| **User Experience** | Standard OAuth redirects | Transparent |
| **Security Model** | Token-based with scopes | Header trust model |
| **When to Use** | **Nextcloud, Gitea, modern apps** | **Static sites, legacy apps, whoami** |

### Consequences

#### Positive

- **Standards Compliance**: OAuth/OIDC uses industry-standard protocols
- **Security**: Multiple authentication options with appropriate security models
- **Flexibility**: The right tool for each service (OAuth when possible, forward auth when needed)
- **Auditability**: Centralized authentication logging via Authentik
- **User Experience**: Proper SSO across all services
- **Token Security**: OAuth provides secure token refresh and scope management
- **Graceful Degradation**: Forward auth available for services without OAuth support

#### Negative

- **Complexity**: Need to understand two authentication methods
- **Configuration Overhead**: OAuth requires per-service configuration
- **Single Point of Failure**: An Authentik outage affects all services
- **Learning Curve**: Team must understand OAuth flows and the forward auth model

#### Mitigation Strategies

- **Documentation**: Clear decision guide
  for choosing OAuth vs forward auth
- **Templates**: Reusable OAuth configuration templates for common services
- **High Availability**: Robust deployment and monitoring of Authentik
- **Monitoring**: Comprehensive monitoring of both authentication flows
- **Testing**: Automated tests for authentication flows

### Security Considerations

#### OAuth/OIDC Security

```yaml
# Authentik OAuth2 Provider security settings
authorization_code_validity: 60       # 1 minute
access_code_validity: 3600            # 1 hour
refresh_code_validity: 2592000        # 30 days
include_claims_in_id_token: true
signing_key: "authentik-default-key"
sub_mode: "hashed_user_id"
issuer_mode: "per_provider"
```

**Best Practices**:

- Use PKCE for all OAuth flows (protection against code interception)
- Implement proper token rotation (refresh tokens expire and rotate)
- Validate the `aud` (audience) and `iss` (issuer) claims in JWT tokens
- Use short-lived access tokens (1 hour)
- Store client secrets securely (Ansible Vault)

#### Forward Auth Security

```yaml
# Authentik Proxy Provider security settings
token_validity: 3600                  # 1-hour session
cookie_domain: ".jnss.me"
skip_path_regex: "^/(health|metrics|static).*"
```

**Best Practices**:

- Trust only Authentik-provided headers
- Validate that the `Remote-User` header exists before granting access
- Use HTTPS for all forward auth endpoints
- Implement proper session timeouts
- Strip user-provided authentication headers at the proxy

#### Access Control

- **Group-Based Authorization**: Users are assigned to groups, groups to applications
- **Policy Engine**: Authentik policies for fine-grained access control
- **MFA Requirements**: Multi-factor authentication for sensitive services
- **IP-Based Restrictions**: Geographic or network-based access control
- **Time-Based Access**: Temporary access grants via policies

#### Audit Logging

```json
{
  "timestamp": "2025-12-15T10:30:00Z",
  "event": "oauth_authorization",
  "user": "john.doe",
  "application": "nextcloud",
  "scopes": ["openid", "email", "profile",
"groups"], "ip": "192.168.1.100", "user_agent": "Mozilla/5.0..." } ``` ### Implementation Examples by Service Type #### OAuth/OIDC Services (Primary Method) **Nextcloud**: ```caddyfile cloud.jnss.me { reverse_proxy localhost:8080 } # OAuth configured within Nextcloud application ``` **Gitea**: ```caddyfile git.jnss.me { reverse_proxy localhost:3000 } # OAuth configured within Gitea application settings ``` #### Forward Auth Services (Fallback Method) **Whoami (test/demo service)**: ```caddyfile whoami.jnss.me { forward_auth https://auth.jnss.me { uri /outpost.goauthentik.io/auth/caddy copy_headers Remote-User Remote-Name Remote-Email Remote-Groups } reverse_proxy localhost:8080 } ``` **Static Documentation Site**: ```caddyfile docs.jnss.me { forward_auth https://auth.jnss.me { uri /outpost.goauthentik.io/auth/caddy copy_headers Remote-User Remote-Groups } root * /var/www/docs file_server } ``` **Internal API (no OAuth support)**: ```caddyfile api.jnss.me { forward_auth https://auth.jnss.me { uri /outpost.goauthentik.io/auth/caddy copy_headers Remote-User Remote-Email Remote-Groups } reverse_proxy localhost:3000 } ``` #### Selective Protection (Public + Protected Paths) ```caddyfile app.jnss.me { # Public endpoints (no auth required) handle /health { reverse_proxy localhost:8080 } handle /metrics { reverse_proxy localhost:8080 } handle /public/* { reverse_proxy localhost:8080 } # Protected endpoints (forward auth) handle /admin/* { forward_auth https://auth.jnss.me { uri /outpost.goauthentik.io/auth/caddy copy_headers Remote-User Remote-Groups } reverse_proxy localhost:8080 } # Default: protected handle { forward_auth https://auth.jnss.me { uri /outpost.goauthentik.io/auth/caddy copy_headers Remote-User Remote-Groups } reverse_proxy localhost:8080 } } ``` ### Alternatives Considered 1. **OAuth2/OIDC Only**: Rejected because many services don't support OAuth natively 2. **Forward Auth Only**: Rejected because it doesn't leverage native OAuth support in modern apps 3. 
**Per-Service Authentication**: Rejected due to management overhead and inconsistent security
4. **Shared Database**: Rejected due to tight coupling between services
5. **VPN-Based Access**: Rejected due to operational complexity for web services
6. **SAML**: Rejected in favor of modern OAuth2/OIDC standards

---

## Rootful Containers with Infrastructure Fact Pattern

**Technical Story**: Enable containerized applications to access native infrastructure services (PostgreSQL, Valkey) via Unix sockets with group-based permissions.

### Context

Containerized applications need to access infrastructure services (PostgreSQL, Valkey) through Unix sockets with filesystem-based permission controls. The permission model requires:

1. **Socket directories** owned by service groups (`postgres-clients`, `valkey-clients`)
2. **Application users** added to these groups for access
3. **Container processes** that preserve group membership to access the sockets

Two approaches were evaluated:

1. **Rootless containers (user namespace)**: Containers run in a user namespace with UID/GID remapping
2. **Rootful containers (system services)**: Containers run as dedicated system users without namespace isolation

### Decision

We will use **rootful containers deployed as system-level systemd services** with an **Infrastructure Fact Pattern**, where infrastructure roles export client group GIDs as Ansible facts for application consumption.

### Rationale

#### Why Rootful Succeeds

**Direct UID/GID Mapping**:

```bash
# Host: authentik user UID 966
# Groups: 966 (authentik), 961 (valkey-clients), 962 (postgres-clients)
# Container: User=966:966 with PodmanArgs=--group-add 961 --group-add 962

# Inside the container:
id
# uid=966(authentik) gid=966(authentik) groups=966(authentik),961(valkey-clients),962(postgres-clients)

# Socket access works:
ls -l /var/run/postgresql/.s.PGSQL.5432
# srwxrwx--- 1 postgres postgres-clients 0 ...
# /var/run/postgresql/.s.PGSQL.5432
```

**Group membership preserved**: The container process has GIDs 961 and 962, matching the socket group ownership.

#### Why Rootless Failed (Discarded Approach)

**User Namespace UID/GID Remapping**:

```bash
# Host: authentik user UID 100000, subuid range 200000-265535
# Container: User=%i:%i with --userns=host --group-add=keep-groups

# User namespace remapping:
#   Host UID 100000 → Container UID 100000 (root in namespace)
#   Host GID 961    → Container GID 200961 (remapped into subgid range)
#   Host GID 962    → Container GID 200962 (remapped into subgid range)

# Socket ownership on host:
#   srwxrwx--- 1 postgres postgres-clients (GID 962)

# Container process groups: 200961, 200962 (remapped)
# Socket expects: GID 962 (not remapped)
# Result: Permission denied ❌
```

**Root cause**: User-namespace remapping of supplementary groups breaks group-based socket access even with `--userns=host`, `--group-add=keep-groups`, and `Annotation=run.oci.keep_original_groups=1`.

### Infrastructure Fact Pattern

#### Infrastructure Roles Export GIDs

Infrastructure services create client groups and export their GIDs as Ansible facts:

```yaml
# PostgreSQL role: roles/postgresql/tasks/main.yml
- name: Create PostgreSQL client access group
  group:
    name: postgres-clients
    system: true

- name: Get PostgreSQL client group GID
  shell: "getent group postgres-clients | cut -d: -f3"
  register: postgresql_client_group_lookup
  changed_when: false

- name: Set PostgreSQL client group GID as fact
  set_fact:
    postgresql_client_group_gid: "{{ postgresql_client_group_lookup.stdout }}"
```

```yaml
# Valkey role: roles/valkey/tasks/main.yml
- name: Create Valkey client access group
  group:
    name: valkey-clients
    system: true

- name: Get Valkey client group GID
  shell: "getent group valkey-clients | cut -d: -f3"
  register: valkey_client_group_lookup
  changed_when: false

- name: Set Valkey client group GID as fact
  set_fact:
    valkey_client_group_gid: "{{ valkey_client_group_lookup.stdout }}"
```

#### Application Roles Consume Facts

Application roles validate and consume the infrastructure facts:

```yaml
# Authentik role: roles/authentik/tasks/main.yml
- name: Validate infrastructure facts are available
  assert:
    that:
      - postgresql_client_group_gid is defined
      - valkey_client_group_gid is defined
    fail_msg: |
      Required infrastructure facts are not available.
      Ensure the PostgreSQL and Valkey roles have run first.

- name: Create authentik user with infrastructure groups
  user:
    name: authentik
    groups: [postgres-clients, valkey-clients]
    append: true
```

```ini
# Container template: roles/authentik/templates/authentik-server.container
[Container]
User={{ authentik_uid }}:{{ authentik_gid }}
PodmanArgs=--group-add {{ postgresql_client_group_gid }} --group-add {{ valkey_client_group_gid }}
```

### Implementation Details

#### System-Level Deployment

```ini
# Quadlet files deployed to /etc/containers/systemd/ (not ~/.config/)

# Pod: /etc/containers/systemd/authentik.pod
[Unit]
Description=Authentik Authentication Pod

[Pod]
PublishPort=0.0.0.0:9000:9000
ShmSize=256m

[Service]
Restart=always

[Install]
WantedBy=multi-user.target  # System target, not default.target
```

```ini
# Container: /etc/containers/systemd/authentik-server.container
[Container]
User=966:966
PodmanArgs=--group-add 962 --group-add 961
Volume=/var/run/postgresql:/var/run/postgresql:Z
Volume=/var/run/valkey:/var/run/valkey:Z
```

#### Service Management

```bash
# System scope (not user scope)
systemctl status authentik-pod
systemctl restart authentik-server
journalctl -u authentik-server -f

# Verify container location
systemctl status authentik-server | grep CGroup
# CGroup: /system.slice/authentik-server.service ✓
```

### Special Case: Valkey Socket Group Fix

Valkey doesn't natively support socket group configuration (unlike PostgreSQL's `unix_socket_group`).
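For comparison, a minimal sketch of the direct PostgreSQL equivalent, assuming the group names used in this document:

```ini
# postgresql.conf — PostgreSQL can set the socket group and mode directly
unix_socket_group = 'postgres-clients'
unix_socket_permissions = 0770
```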
A helper service ensures correct socket permissions:

```ini
# /etc/systemd/system/valkey-socket-fix.service
[Unit]
Description=Fix Valkey socket group ownership and permissions
BindsTo=valkey.service
After=valkey.service

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'i=0; while [ ! -S /var/run/valkey/valkey.sock ] && [ $i -lt 100 ]; do sleep 0.1; i=$((i+1)); done'
ExecStart=/bin/chgrp valkey-clients /var/run/valkey/valkey.sock
ExecStart=/bin/chmod 770 /var/run/valkey/valkey.sock
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
```

Triggered by the Valkey service:

```ini
# /etc/systemd/system/valkey.service (excerpt)
[Unit]
Wants=valkey-socket-fix.service
```

### Consequences

#### Positive

- **Socket Access Works**: Group-based permissions function correctly
- **Security**: Containers run as dedicated users (not root), with no privileged daemon
- **Portability**: Dynamic GID facts work across different hosts
- **Consistency**: The same pattern applies to all containerized applications
- **Simplicity**: No user namespace complexity; standard systemd service management

#### Negative

- **Not "Pure" Rootless**: Containers require root for systemd service deployment
- **Different from Docker**: A less familiar pattern than rootless user services

#### Neutral

- **System vs User Scope**: Different commands (`systemctl` vs `systemctl --user`) but equally capable
- **Deployment Location**: `/etc/containers/systemd/` vs `~/.config/`, but the same Quadlet functionality

### Validation

```bash
# Verify service location
systemctl status authentik-server | grep CGroup
# → /system.slice/authentik-server.service ✓

# Verify process groups
ps aux | grep authentik | head -1 | awk '{print $2}' | \
  xargs -I {} cat /proc/{}/status | grep Groups
# → Groups: 961 962 966 ✓

# Verify socket permissions
ls -l /var/run/postgresql/.s.PGSQL.5432
# → srwxrwx--- postgres postgres-clients ✓
ls -l /var/run/valkey/valkey.sock
# → srwxrwx--- valkey valkey-clients ✓

# Verify HTTP endpoint
curl -I http://127.0.0.1:9000/
# → HTTP/1.1 302 Found ✓
```

### Alternatives Considered

1. **Rootless with user namespace**: Discarded because GID remapping breaks group-based socket access
2. **TCP-only connections**: Rejected to preserve the Unix socket security and performance benefits
3. **Hardcoded GIDs**: Rejected for portability; facts provide dynamic resolution
4. **Directory permissions (777)**: Rejected for security; group-based access is more restrictive. (Later revisited: permissions were changed to 777 after Nextcloud switched from root to www-data, which broke group-based access.)

---

## ADR-007: Multi-Environment Infrastructure Architecture

**Date**: December 2025
**Status**: Accepted
**Context**: Separation of homelab services from production client projects

### Decision

Rick-infra will manage two separate environments with different purposes and uptime requirements:

1. **Homelab Environment** (arch-vps)
   - Purpose: Personal services and experimentation
   - Infrastructure: Full stack (PostgreSQL, Valkey, Podman, Caddy)
   - Services: Authentik, Nextcloud, Gitea
   - Uptime requirement: Best effort
2.
**Production Environment** (mini-vps) - Purpose: Client projects requiring high uptime - Infrastructure: Minimal (Caddy only) - Services: Sigvild Gallery - Uptime requirement: High availability ### Rationale **Separation of Concerns**: - Personal experiments don't affect client services - Client services isolated from homelab maintenance - Clear distinction between environments in code **Infrastructure Optimization**: - Production runs minimal services (no PostgreSQL/Valkey overhead) - Homelab can be rebooted/upgraded without affecting clients - Cost optimization: smaller VPS for production **Operational Flexibility**: - Different backup strategies per environment - Different monitoring/alerting levels - Independent deployment schedules ### Implementation **Variable Organization**: ``` rick-infra/ ├── group_vars/ │ └── production/ # Production environment config │ ├── main.yml │ └── vault.yml ├── host_vars/ │ └── arch-vps/ # Homelab host config │ ├── main.yml │ └── vault.yml └── playbooks/ ├── homelab.yml # Homelab deployment ├── production.yml # Production deployment └── site.yml # Orchestrates both ``` **Playbook Structure**: - `site.yml` imports both homelab.yml and production.yml - Each playbook manually loads variables (Ansible 2.20 workaround) - Services deploy only to their designated environment **Inventory Groups**: ```yaml homelab: hosts: arch-vps: ansible_host: 69.62.119.31 production: hosts: mini-vps: ansible_host: 72.62.91.251 ``` ### Migration Example **Sigvild Gallery Migration** (December 2025): - **From**: arch-vps (homelab) - **To**: mini-vps (production) - **Reason**: Client project requiring higher uptime - **Process**: 1. Created backup on arch-vps 2. Deployed to mini-vps with automatic restore 3. Updated DNS (5 min downtime) 4. Removed from arch-vps configuration ### Consequences **Positive**: - Clear separation of personal vs. 
client services - Reduced blast radius for experiments - Optimized resource usage per environment - Independent scaling and management **Negative**: - Increased complexity in playbook organization - Need to manage multiple VPS instances - Ansible 2.20 variable loading requires workarounds - Duplicate infrastructure code (Caddy on both) **Neutral**: - Services can be migrated between environments with minimal friction - Backup/restore procedures work across environments - Group_vars vs. host_vars hybrid approach ### Future Considerations - Consider grouping multiple client projects on production VPS - Evaluate if homelab needs full infrastructure stack - Monitor for opportunities to share infrastructure between environments - Document migration procedures for moving services between environments ---
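As a starting point for the migration procedure mentioned in ADR-007, the Sigvild Gallery process can be sketched as shell steps. The playbook paths match the variable-organization tree above, but the `--tags` values are illustrative assumptions, not verified rick-infra tags:

```bash
# Hypothetical sketch of migrating a service between environments.
# Tag names (backup, sigvild) are assumptions; adapt to the real roles.

# 1. Back up the service on the source host (homelab)
ansible-playbook playbooks/homelab.yml --limit arch-vps --tags backup

# 2. Deploy to the target host; the role restores from the latest backup
ansible-playbook playbooks/production.yml --limit mini-vps --tags sigvild

# 3. Update DNS to point at the new host (brief downtime window),
#    then remove the service from the source host's config and re-run:
ansible-playbook playbooks/homelab.yml --limit arch-vps
```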