
Architecture Decision Records (ADR)

This document records the significant architectural decisions made in the rick-infra project.


Unix Socket IPC Architecture

Context

Containerized applications need to communicate with database and cache services. Communication methods include:

  1. Network TCP/IP: Standard network protocols
  2. Unix Domain Sockets: Filesystem-based IPC

Decision

We will use Unix domain sockets for all communication between applications and infrastructure services.

Rationale

Security Benefits

  • No Network Exposure: Infrastructure services bind only to Unix sockets
    # PostgreSQL configuration
    listen_addresses = ''                    # No TCP binding
    unix_socket_directories = '/var/run/postgresql'
    
    # Valkey configuration  
    port 0                                   # Disable TCP port
    unixsocket /var/run/valkey/valkey.sock
    
  • Filesystem Permissions: Access controlled by Unix file permissions
    srwxrwx--- 1 postgres postgres 0 /var/run/postgresql/.s.PGSQL.5432
    srwxrwx--- 1 valkey   valkey   0 /var/run/valkey/valkey.sock
    
  • Group-Based Access: Simple group membership controls access
    # Add application user to infrastructure groups
    usermod -a -G postgres,valkey authentik
    
  • No Network Scanning: Services invisible to network reconnaissance
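
The filesystem-permission model above can be demonstrated with a standalone sketch: create a Unix socket, restrict it to owner and group, and confirm that "other" users get no access. The temporary path stands in for directories like /var/run/postgresql.

```python
import os
import socket
import stat
import tempfile

# Create a Unix socket and apply the srwxrwx--- permission model shown above.
sock_dir = tempfile.mkdtemp()
sock_path = os.path.join(sock_dir, "demo.sock")

server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(sock_path)
os.chmod(sock_path, 0o770)  # owner + group only, like the sockets listed above

mode = os.stat(sock_path).st_mode
assert stat.S_ISSOCK(mode)       # the path is a socket inode
assert mode & stat.S_IRWXO == 0  # no permissions for "other"
print(stat.filemode(mode))       # srwxrwx---
server.close()
```

With this mode, only members of the owning group can connect, which is exactly what group membership (e.g. `usermod -a -G postgres,valkey`) grants.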

Performance Advantages

  • Lower Latency: Benchmarks commonly show Unix sockets with ~20% lower latency than TCP loopback
  • Higher Throughput: Local benchmarks commonly report up to 40% higher throughput than TCP loopback
  • Reduced CPU Overhead: No network stack processing required
  • Efficient Data Transfer: Direct kernel-level data copying
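
The exact figures depend on kernel and workload. A self-contained micro-benchmark sketch that ping-pongs a small payload over both transports (in-process echo servers, no external services) lets the difference be measured locally; no particular ordering is guaranteed on any given machine.

```python
import os
import socket
import tempfile
import threading
import time

def echo_once(server_sock):
    # Accept a single connection and echo everything back until EOF.
    conn, _ = server_sock.accept()
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)

def round_trips(client, n=2000):
    # Time n request/response round trips over an already-connected socket.
    start = time.perf_counter()
    for _ in range(n):
        client.sendall(b"ping")
        client.recv(4096)
    return time.perf_counter() - start

# Unix domain socket endpoint
path = os.path.join(tempfile.mkdtemp(), "bench.sock")
unix_srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
unix_srv.bind(path)
unix_srv.listen(1)
threading.Thread(target=echo_once, args=(unix_srv,), daemon=True).start()
unix_cli = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
unix_cli.connect(path)

# TCP loopback endpoint
tcp_srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp_srv.bind(("127.0.0.1", 0))
tcp_srv.listen(1)
threading.Thread(target=echo_once, args=(tcp_srv,), daemon=True).start()
tcp_cli = socket.create_connection(tcp_srv.getsockname())

unix_s = round_trips(unix_cli)
tcp_s = round_trips(tcp_cli)
print(f"unix: {unix_s:.4f}s  tcp loopback: {tcp_s:.4f}s")
unix_cli.close()
tcp_cli.close()
```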

Operational Benefits

  • Connection Reliability: Local connections cannot fail from ephemeral-port exhaustion or network misconfiguration
  • Resource Monitoring: Standard filesystem monitoring applies
  • Backup Friendly: No network configuration to back up or restore
  • Debugging: Standard filesystem tools for troubleshooting

Implementation Strategy

Container Socket Access

# Container configuration (Quadlet)
[Container]
# Mount socket directories with proper labels
Volume=/var/run/postgresql:/var/run/postgresql:Z
Volume=/var/run/valkey:/var/run/valkey:Z

# Preserve user namespace and groups
PodmanArgs=--userns=host
Annotation=run.oci.keep_original_groups=1

Application Configuration

# Database connection (PostgreSQL)
DATABASE_URL=postgresql://authentik@/authentik?host=/var/run/postgresql

# Cache connection (Redis/Valkey) 
CACHE_URL=unix:///var/run/valkey/valkey.sock?db=1&password=secret
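
The PostgreSQL DSN above follows the libpq convention that a `host` value beginning with `/` names the directory containing the socket rather than a network host. A quick parsing sketch makes that visible:

```python
from urllib.parse import urlsplit, parse_qs

# Same DSN shape as the DATABASE_URL above: no TCP host, and the "host"
# query parameter points at the socket directory.
dsn = "postgresql://authentik@/authentik?host=/var/run/postgresql"
parts = urlsplit(dsn)
params = parse_qs(parts.query)

assert not parts.hostname                        # no network host in the URL
assert parts.username == "authentik"
assert params["host"] == ["/var/run/postgresql"]
print("socket directory:", params["host"][0])
```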

User Management

# Ansible user setup
- name: Add application user to infrastructure groups
  user:
    name: "{{ app_user }}"
    groups:
      - postgres  # For database access
      - valkey    # For cache access
    append: true

Consequences

Positive

  • Security: Eliminated network attack vectors for databases
  • Performance: Measurably faster database and cache operations
  • Reliability: More stable connections than network-based
  • Simplicity: Simpler configuration than network + authentication

Negative

  • Container Complexity: Requires careful container user/group management
  • Learning Curve: Less familiar than standard TCP connections
  • Port Forwarding: Standard TCP port forwarding cannot reach socket-only services; debugging needs socket-aware tools (socat can bridge a socket to a TCP port when required)

Mitigation Strategies

  • Documentation: Comprehensive guides for Unix socket configuration
  • Testing: Automated tests verify socket connectivity
  • Tooling: Helper scripts for debugging socket connections

Technical Implementation

# Test socket connectivity
sudo -u authentik psql -h /var/run/postgresql -U authentik -d authentik
sudo -u authentik redis-cli -s /var/run/valkey/valkey.sock ping

# Container user verification
podman exec authentik-server id
# uid=963(authentik) gid=963(authentik) groups=963(authentik),968(postgres),965(valkey)

Alternatives Considered

  1. TCP with Authentication: Rejected due to network exposure
  2. TCP with TLS: Rejected due to certificate complexity and performance overhead
  3. Shared Memory: Rejected due to implementation complexity

ADR-003: Podman + systemd Container Orchestration

Technical Story: Container orchestration solution for secure application deployment with systemd integration.

Context

Container orchestration options for a single-node infrastructure:

  1. Docker + Docker Compose: Traditional container orchestration
  2. Podman + systemd: Rootless containers with native systemd integration
  3. Kubernetes: Full orchestration platform (overkill for single node)
  4. Nomad: HashiCorp orchestration solution

Decision

We will use Podman with systemd integration (Quadlet) for container orchestration, deployed as system-level services (rootful containers running as dedicated users).

Rationale

Security Advantages

  • No Daemon Required: No privileged daemon attack surface
    # Docker: Requires root daemon
    sudo systemctl status docker
    
    # Podman: Daemonless operation
    podman ps  # No daemon needed
    
  • Dedicated Service Users: Containers run as dedicated system users (not root)
  • Group-Based Access Control: Unix group membership controls infrastructure access
  • SELinux Integration: Better SELinux support than Docker

systemd Integration Benefits

  • Native Service Management: Containers as system-level systemd services
    # Quadlet file: /etc/containers/systemd/authentik.pod
    [Unit]
    Description=Authentik Authentication Pod
    
    [Pod]
    PublishPort=0.0.0.0:9000:9000
    ShmSize=256m
    
    [Service]  
    Restart=always
    TimeoutStartSec=900
    
    [Install]
    WantedBy=multi-user.target
    
  • Dependency Management: systemd handles service dependencies
  • Resource Control: systemd resource limits and monitoring
  • Logging Integration: journald for centralized logging

Operational Excellence

  • Familiar Tooling: Standard systemd commands
    systemctl status authentik-pod
    systemctl restart authentik-server
    journalctl -u authentik-server -f
    
  • Boot Integration: Services start automatically at system boot
  • Resource Monitoring: systemd resource tracking
  • Configuration Management: Declarative Quadlet files

Performance Benefits

  • Lower Overhead: No daemon overhead for container management
  • Direct Kernel Access: Better performance than daemon-based solutions
  • Resource Efficiency: More efficient resource utilization

Implementation Architecture

┌─────────────────────────────────────────────────────────────┐
│ systemd System Services (/system.slice/)                   │
│                                                             │
│ ┌─────────────────┐  ┌─────────────────┐  ┌───────────────┐ │
│ │ authentik-pod   │  │ authentik-server│  │authentik-worker│ │
│ │ .service        │  │ .service        │  │ .service       │ │
│ └─────────────────┘  └─────────────────┘  └───────────────┘ │
│           │                    │                    │       │
│           └────────────────────┼────────────────────┘       │
│                                │                            │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Podman Pod (rootful, dedicated user)                    │ │
│ │                                                         │ │
│ │ ┌─────────────────┐  ┌─────────────────────────────────┐ │ │
│ │ │ Server Container│  │ Worker Container                │ │ │
│ │ │ User: 966:966   │  │ User: 966:966                  │ │ │
│ │ │ Groups: 961,962 │  │ Groups: 961,962                │ │ │
│ │ │ (valkey,postgres)│ │ (valkey,postgres)              │ │ │
│ │ └─────────────────┘  └─────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
           │                                                  
           │ Group-based access to infrastructure            
           ▼                                                  
┌─────────────────────────────────────────────────────────────┐
│ Infrastructure Services                                     │
│ PostgreSQL: /var/run/postgresql (postgres:postgres-clients)│
│ Valkey: /var/run/valkey (valkey:valkey-clients)           │
└─────────────────────────────────────────────────────────────┘

Quadlet Configuration

# Pod configuration (authentik.pod)
[Unit]
Description=Authentik Authentication Pod

[Pod]
PublishPort=127.0.0.1:9000:9000
ShmSize=256m

[Service]
Restart=always

[Install]
WantedBy=default.target

# Container configuration (authentik-server.container)
[Unit]
Description=Authentik Server Container
After=authentik-pod.service
Requires=authentik-pod.service

[Container]
ContainerName=authentik-server
Image=ghcr.io/goauthentik/server:2025.10
Pod=authentik.pod
EnvironmentFile=/opt/authentik/.env
User=966:966
PodmanArgs=--group-add 962 --group-add 961

# Volume mounts for sockets and data
Volume=/opt/authentik/media:/media
Volume=/opt/authentik/data:/data
Volume=/var/run/postgresql:/var/run/postgresql:Z
Volume=/var/run/valkey:/var/run/valkey:Z

[Service]
Restart=always
TimeoutStartSec=300

[Install]
WantedBy=multi-user.target

User Management Strategy

# Ansible implementation
- name: Create service user
  user:
    name: authentik
    group: authentik
    groups: [postgres-clients, valkey-clients]
    system: true
    shell: /bin/bash
    home: /opt/authentik
    create_home: true
    append: true

Note: Infrastructure roles (PostgreSQL, Valkey) export client group GIDs as Ansible facts (postgresql_client_group_gid, valkey_client_group_gid) which are consumed by application container templates for dynamic --group-add arguments.

Consequences

Positive

  • Security: Eliminated privileged daemon attack surface
  • Integration: Seamless systemd integration for management
  • Performance: Lower overhead than daemon-based solutions
  • Reliability: systemd's proven service management
  • Monitoring: Standard systemd monitoring and logging

Negative

  • Learning Curve: Different from Docker Compose workflows
  • Tooling: Ecosystem less mature than Docker
  • Documentation: Fewer online resources and examples

Mitigation Strategies

  • Documentation: Comprehensive internal documentation
  • Training: Team training on Podman/systemd workflows
  • Tooling: Helper scripts for common operations

Technical Implementation

# Container management (system scope)
systemctl status authentik-pod
systemctl restart authentik-server
podman ps
podman logs authentik-server

# Resource monitoring
systemctl show authentik-server --property=MemoryCurrent
journalctl -u authentik-server -f

# Verify container groups
pgrep -f authentik-server | head -n1 | \
  xargs -I {} grep Groups /proc/{}/status
# Output: Groups: 961 962 966

Alternatives Considered

  1. Docker + Docker Compose: Rejected due to security concerns (privileged daemon)
  2. Kubernetes: Rejected as overkill for single-node deployment
  3. Nomad: Rejected to maintain consistency with systemd ecosystem

OAuth/OIDC and Forward Authentication Security Model

Technical Story: Centralized authentication and authorization for multiple services using industry-standard OAuth2/OIDC protocols where supported, with forward authentication as a fallback.

Context

Authentication strategies for multiple services:

  1. Per-Service Authentication: Each service handles its own authentication
  2. Shared Database: Services share authentication database
  3. OAuth2/OIDC Integration: Services implement standard OAuth2/OIDC clients
  4. Forward Authentication: Reverse proxy handles authentication for services without OAuth support

Decision

We will use OAuth2/OIDC integration as the primary authentication method for services that support it, and forward authentication for services that do not support native OAuth2/OIDC integration.

Rationale

OAuth/OIDC as Primary Method

Security Benefits:

  • Standard Protocol: Industry-standard authentication flow (RFC 6749, RFC 7636)
  • Token-Based Security: Secure JWT tokens with cryptographic signatures
  • Proper Session Management: Native application session handling with refresh tokens
  • Scope-Based Authorization: Fine-grained permission control via OAuth scopes
  • PKCE Support: Protection against authorization code interception attacks

Integration Benefits:

  • Native Support: Applications designed for OAuth/OIDC work seamlessly
  • Better UX: Proper redirect flows, logout handling, and token refresh
  • API Access: OAuth tokens enable secure API integrations
  • Standard Claims: OpenID Connect user info endpoint provides standardized user data
  • Multi-Application SSO: Proper single sign-on with token sharing

Examples: Nextcloud, Gitea, Grafana, many modern applications

Forward Auth as Fallback

Use Cases:

  • Services without OAuth/OIDC support
  • Legacy applications that cannot be modified
  • Static sites requiring authentication
  • Simple internal tools

Security Benefits:

  • Zero Application Changes: Protect existing services without modification
  • Header-Based Identity: Simple identity propagation to backend
  • Transparent Protection: Services receive pre-authenticated requests

Limitations:

  • Non-Standard: Not using industry-standard authentication protocols
  • Proxy Dependency: All requests must flow through authenticating proxy
  • Limited Logout: Complex logout scenarios across services
  • Header Trust: Backend must trust proxy-provided headers

Shared Benefits (Both Methods)

  • Single Point of Control: Centralized authentication policy via Authentik
  • Consistent Security: Same authentication provider across all services
  • Multi-Factor Authentication: MFA applied consistently via Authentik
  • Audit Trail: Centralized authentication logging
  • User Management: One system for all user administration

Implementation Architecture

OAuth/OIDC Flow (Primary Method)

┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│    User     │    │   Service   │    │  Authentik  │
│             │    │ (OAuth App) │    │   (IdP)     │
└──────┬──────┘    └──────┬──────┘    └──────┬──────┘
       │                  │                  │
       │ Access Service   │                  │
       │─────────────────▶│                  │
       │                  │                  │
       │                  │ No session       │
       │ 302 → OAuth      │                  │
       │◀─────────────────│                  │
       │                  │                  │
       │ GET /authorize?client_id=...&redirect_uri=...
       │──────────────────────────────────────▶│
       │                  │                  │
       │ Login form (if not authenticated)   │
       │◀────────────────────────────────────│
       │                  │                  │
       │ Credentials      │                  │
       │─────────────────────────────────────▶│
       │                  │                  │
       │ 302 → callback?code=AUTH_CODE       │
       │◀────────────────────────────────────│
       │                  │                  │
       │ GET /callback?code=AUTH_CODE        │
       │─────────────────▶│                  │
       │                  │                  │
       │                  │ POST /token      │
       │                  │  code=AUTH_CODE  │
       │                  │─────────────────▶│
       │                  │                  │
       │                  │ access_token     │
       │                  │ id_token (JWT)   │
       │                  │◀─────────────────│
       │                  │                  │
       │ Set-Cookie       │ GET /userinfo    │
       │ 302 → /dashboard │─────────────────▶│
       │◀─────────────────│                  │
       │                  │ User claims      │
       │                  │◀─────────────────│
       │                  │                  │
       │ GET /dashboard   │                  │
       │─────────────────▶│                  │
       │                  │                  │
       │ Dashboard        │                  │
       │◀─────────────────│                  │

Forward Auth Flow (Fallback Method)

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│    User     │    │    Caddy    │    │  Authentik  │    │   Service   │
│             │    │  (Proxy)    │    │  (Forward)  │    │ (Backend)   │
└──────┬──────┘    └──────┬──────┘    └──────┬──────┘    └──────┬──────┘
       │                  │                  │                  │
       │ GET /           │                  │                  │
       │─────────────────▶│                  │                  │
       │                  │                  │                  │
       │                  │ Forward Auth     │                  │
       │                  │─────────────────▶│                  │
       │                  │                  │                  │
       │                  │ 401 Unauthorized │                  │
       │                  │◀─────────────────│                  │
       │                  │                  │                  │
       │ 302 → /auth      │                  │                  │
       │◀─────────────────│                  │                  │
       │                  │                  │                  │
       │ Login form       │                  │                  │
       │──────────────────────────────────────▶│                  │
       │                  │                  │                  │
       │ Credentials      │                  │                  │
       │──────────────────────────────────────▶│                  │
       │                  │                  │                  │
       │ Set-Cookie       │                  │                  │
       │◀──────────────────────────────────────│                  │
       │                  │                  │                  │
       │ GET /           │                  │                  │
       │─────────────────▶│                  │                  │
       │                  │                  │                  │
       │                  │ Forward Auth     │                  │
       │                  │─────────────────▶│                  │
       │                  │                  │                  │
       │                  │ 200 + Headers    │                  │
       │                  │◀─────────────────│                  │
       │                  │                  │                  │
       │                  │ Proxy + Headers  │                  │
       │                  │─────────────────────────────────────▶│
       │                  │                  │                  │
       │                  │ Response         │                  │
       │                  │◀─────────────────────────────────────│
       │                  │                  │                  │
       │ Content          │                  │                  │
       │◀─────────────────│                  │                  │

OAuth/OIDC Configuration Examples

Nextcloud OAuth Configuration

// Nextcloud config.php
'oidc_login_provider_url' => 'https://auth.jnss.me/application/o/nextcloud/',
'oidc_login_client_id' => 'nextcloud-client-id',
'oidc_login_client_secret' => 'secret-from-authentik',
'oidc_login_auto_redirect' => true,
'oidc_login_end_session_redirect' => true,
'oidc_login_button_text' => 'Login with SSO',
'oidc_login_hide_password_form' => true,
'oidc_login_use_id_token' => true,
'oidc_login_attributes' => [
    'id' => 'preferred_username',
    'name' => 'name',
    'mail' => 'email',
    'groups' => 'groups',
],
'oidc_login_default_group' => 'users',
'oidc_login_use_external_storage' => false,
'oidc_login_scope' => 'openid profile email groups',
'oidc_login_proxy_ldap' => false,
'oidc_login_disable_registration' => false,
'oidc_login_redir_fallback' => true,
'oidc_login_tls_verify' => true,

Gitea OAuth Configuration

# Gitea app.ini
[openid]
ENABLE_OPENID_SIGNIN = false
ENABLE_OPENID_SIGNUP = false

[oauth2_client]
REGISTER_EMAIL_CONFIRM = false
OPENID_CONNECT_SCOPES = openid email profile groups
ENABLE_AUTO_REGISTRATION = true
USERNAME = preferred_username
EMAIL = email
ACCOUNT_LINKING = auto

Authentik Provider Configuration (Gitea):

  • Provider Type: OAuth2/OpenID Provider
  • Client ID: gitea
  • Client Secret: Generated by Authentik
  • Redirect URIs: https://git.jnss.me/user/oauth2/Authentik/callback
  • Scopes: openid, profile, email, groups

Authentik OAuth2 Provider Settings

# OAuth2/OIDC Provider configuration in Authentik
name: "Nextcloud OAuth Provider"
authorization_flow: "default-authorization-flow"
client_type: "confidential"
client_id: "nextcloud-client-id"
redirect_uris: "https://cloud.jnss.me/apps/oidc_login/oidc"
signing_key: "authentik-default-key"
property_mappings:
  - "authentik default OAuth Mapping: OpenID 'openid'"
  - "authentik default OAuth Mapping: OpenID 'email'"
  - "authentik default OAuth Mapping: OpenID 'profile'"
  - "Custom: Groups"  # Maps user groups to 'groups' claim

Forward Auth Configuration Examples

Caddy Configuration for Forward Auth

# whoami service with forward authentication
whoami.jnss.me {
    # Forward authentication to Authentik
    forward_auth https://auth.jnss.me {
        uri /outpost.goauthentik.io/auth/caddy
        copy_headers Remote-User Remote-Name Remote-Email Remote-Groups
    }
    
    # Backend service (receives authenticated requests)
    reverse_proxy localhost:8080
}

Authentik Proxy Provider Configuration

# Authentik Proxy Provider for forward auth
name: "Whoami Forward Auth"
type: "proxy"
authorization_flow: "default-authorization-flow"
external_host: "https://whoami.jnss.me"
internal_host: "http://localhost:8080"
skip_path_regex: "^/(health|metrics).*"
mode: "forward_single"  # Single application mode
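
The skip_path_regex above can be sanity-checked quickly. Note the pattern is only prefix-anchored, so it also matches /healthz-style paths:

```python
import re

# Same pattern as skip_path_regex in the provider config above.
skip = re.compile(r"^/(health|metrics).*")

assert skip.match("/health")
assert skip.match("/metrics/requests")
assert skip.match("/healthz")   # prefix match: /healthz also bypasses auth
assert not skip.match("/admin")
```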

Service Integration (Forward Auth)

Services receive authentication information via HTTP headers:

# Example service code (Python Flask)
@app.route('/')
def index():
    username = request.headers.get('Remote-User')
    name = request.headers.get('Remote-Name') 
    email = request.headers.get('Remote-Email')
    groups = request.headers.get('Remote-Groups', '').split(',')
    
    return render_template('index.html', 
                         username=username, 
                         name=name,
                         email=email,
                         groups=groups)

Authorization Policies

Both OAuth and Forward Auth support Authentik authorization policies:

# Example authorization policy in Authentik
policy_bindings:
  - policy: "group_admins_only"
    target: "nextcloud_oauth_provider"
    order: 0
  
  - policy: "require_mfa" 
    target: "gitea_oauth_provider"
    order: 1
    
  - policy: "internal_network_only"
    target: "whoami_proxy_provider"
    order: 0

Decision Matrix: OAuth/OIDC vs Forward Auth

| Criteria | OAuth/OIDC | Forward Auth |
|---|---|---|
| Application Support | Requires native OAuth/OIDC support | Any application |
| Protocol Standard | Industry standard (RFC 6749, 7636) | Proprietary/custom |
| Token Management | Native refresh tokens, proper expiry | Session-based only |
| Logout Handling | Proper logout flow | Complex, proxy-dependent |
| API Access | Full API support via tokens | Header-only |
| Implementation Effort | Configure OAuth settings | Zero app changes |
| User Experience | Standard OAuth redirects | Transparent |
| Security Model | Token-based with scopes | Header trust model |
| When to Use | Nextcloud, Gitea, modern apps | Static sites, legacy apps, whoami |

Consequences

Positive

  • Standards Compliance: OAuth/OIDC uses industry-standard protocols
  • Security: Multiple authentication options with appropriate security models
  • Flexibility: Right tool for each service (OAuth when possible, forward auth when needed)
  • Auditability: Centralized authentication logging via Authentik
  • User Experience: Proper SSO across all services
  • Token Security: OAuth provides secure token refresh and scope management
  • Graceful Degradation: Forward auth available for services without OAuth support

Negative

  • Complexity: Need to understand two authentication methods
  • Configuration Overhead: OAuth requires per-service configuration
  • Single Point of Failure: Authentik failure affects all services
  • Learning Curve: Team must understand OAuth flows and forward auth model

Mitigation Strategies

  • Documentation: Clear decision guide for choosing OAuth vs forward auth
  • Templates: Reusable OAuth configuration templates for common services
  • High Availability: Robust deployment and monitoring of Authentik
  • Monitoring: Comprehensive monitoring of both authentication flows
  • Testing: Automated tests for authentication flows

Security Considerations

OAuth/OIDC Security

# Authentik OAuth2 Provider security settings
authorization_code_validity: 60  # 1 minute
access_code_validity: 3600       # 1 hour
refresh_code_validity: 2592000   # 30 days
include_claims_in_id_token: true
signing_key: "authentik-default-key"
sub_mode: "hashed_user_id"
issuer_mode: "per_provider"

Best Practices:

  • Use PKCE for all OAuth flows (protection against interception)
  • Implement proper token rotation (refresh tokens expire and rotate)
  • Validate aud (audience) and iss (issuer) claims in JWT tokens
  • Use short-lived access tokens (1 hour)
  • Store client secrets securely (Ansible Vault)
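
The claim checks among these practices can be sketched as a small validator. This is a hedged illustration, not a replacement for a JWT library: it assumes the token's signature has already been verified, and all names and values are illustrative.

```python
import time

def validate_claims(payload, expected_iss, expected_aud, now=None):
    """Check iss / aud / exp on an already signature-verified token payload."""
    now = time.time() if now is None else now
    if payload.get("iss") != expected_iss:
        raise ValueError("issuer mismatch")
    aud = payload.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if expected_aud not in audiences:
        raise ValueError("audience mismatch")
    if now >= payload.get("exp", 0):
        raise ValueError("token expired")
    return True

claims = {
    "iss": "https://auth.jnss.me/application/o/nextcloud/",
    "aud": "nextcloud-client-id",
    "exp": time.time() + 3600,  # short-lived access token (1 hour)
}
assert validate_claims(claims,
                       "https://auth.jnss.me/application/o/nextcloud/",
                       "nextcloud-client-id")
```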

Forward Auth Security

# Authentik Proxy Provider security settings
token_validity: 3600  # 1 hour session
cookie_domain: ".jnss.me"
skip_path_regex: "^/(health|metrics|static).*"

Best Practices:

  • Trust only Authentik-provided headers
  • Validate Remote-User header exists before granting access
  • Use HTTPS for all forward auth endpoints
  • Implement proper session timeouts
  • Strip user-provided authentication headers at proxy
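
On the backend side, these practices amount to a small request guard (a hypothetical helper, not part of Authentik or Caddy). It assumes the proxy strips any client-supplied copies of these headers, so their presence implies Authentik set them; a missing Remote-User must be rejected rather than treated as anonymous.

```python
def authenticate(headers):
    """Derive identity from proxy-provided headers; reject if absent."""
    user = headers.get("Remote-User")
    if not user:
        # Forward auth did not run (or failed): never fall through to
        # an anonymous identity.
        raise PermissionError("missing Remote-User header")
    groups = [g for g in headers.get("Remote-Groups", "").split(",") if g]
    return {"user": user,
            "email": headers.get("Remote-Email"),
            "groups": groups}

identity = authenticate({"Remote-User": "john", "Remote-Groups": "admins,users"})
assert identity["groups"] == ["admins", "users"]

try:
    authenticate({})  # unauthenticated request: must be rejected
except PermissionError:
    print("rejected")
```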

Access Control

  • Group-Based Authorization: Users assigned to groups, groups to applications
  • Policy Engine: Authentik policies for fine-grained access control
  • MFA Requirements: Multi-factor authentication for sensitive services
  • IP-Based Restrictions: Geographic or network-based access control
  • Time-Based Access: Temporary access grants via policies

Audit Logging

{
  "timestamp": "2025-12-15T10:30:00Z",
  "event": "oauth_authorization",
  "user": "john.doe",
  "application": "nextcloud",
  "scopes": ["openid", "email", "profile", "groups"],
  "ip": "192.168.1.100",
  "user_agent": "Mozilla/5.0..."
}

Implementation Examples by Service Type

OAuth/OIDC Services (Primary Method)

Nextcloud:

cloud.jnss.me {
    reverse_proxy localhost:8080
}
# OAuth configured within Nextcloud application

Gitea:

git.jnss.me {
    reverse_proxy localhost:3000
}
# OAuth configured within Gitea application settings

Forward Auth Services (Fallback Method)

Whoami (test/demo service):

whoami.jnss.me {
    forward_auth https://auth.jnss.me {
        uri /outpost.goauthentik.io/auth/caddy
        copy_headers Remote-User Remote-Name Remote-Email Remote-Groups
    }
    reverse_proxy localhost:8080
}

Static Documentation Site:

docs.jnss.me {
    forward_auth https://auth.jnss.me {
        uri /outpost.goauthentik.io/auth/caddy
        copy_headers Remote-User Remote-Groups
    }
    
    root * /var/www/docs
    file_server
}

Internal API (no OAuth support):

api.jnss.me {
    forward_auth https://auth.jnss.me {
        uri /outpost.goauthentik.io/auth/caddy
        copy_headers Remote-User Remote-Email Remote-Groups
    }
    reverse_proxy localhost:3000
}

Selective Protection (Public + Protected Paths)

app.jnss.me {
    # Public endpoints (no auth required)
    handle /health {
        reverse_proxy localhost:8080
    }
    
    handle /metrics {
        reverse_proxy localhost:8080
    }
    
    handle /public/* {
        reverse_proxy localhost:8080  
    }
    
    # Protected endpoints (forward auth)
    handle /admin/* {
        forward_auth https://auth.jnss.me {
            uri /outpost.goauthentik.io/auth/caddy
            copy_headers Remote-User Remote-Groups
        }
        reverse_proxy localhost:8080
    }
    
    # Default: protected
    handle {
        forward_auth https://auth.jnss.me {
            uri /outpost.goauthentik.io/auth/caddy
            copy_headers Remote-User Remote-Groups
        }
        reverse_proxy localhost:8080
    }
}

Alternatives Considered

  1. OAuth2/OIDC Only: Rejected because many services don't support OAuth natively
  2. Forward Auth Only: Rejected because it doesn't leverage native OAuth support in modern apps
  3. Per-Service Authentication: Rejected due to management overhead and inconsistent security
  4. Shared Database: Rejected due to tight coupling between services
  5. VPN-Based Access: Rejected due to operational complexity for web services
  6. SAML: Rejected in favor of modern OAuth2/OIDC standards

Rootful Containers with Infrastructure Fact Pattern

Technical Story: Enable containerized applications to access native infrastructure services (PostgreSQL, Valkey) via Unix sockets with group-based permissions.

Context

Containerized applications need to access infrastructure services (PostgreSQL, Valkey) through Unix sockets with filesystem-based permission controls. The permission model requires:

  1. Socket directories owned by service groups (postgres-clients, valkey-clients)
  2. Application users added to these groups for access
  3. Container processes must preserve group membership to access sockets

Two approaches were evaluated:

  1. Rootless containers (user namespace): Containers run in user namespace with UID/GID remapping
  2. Rootful containers (system services): Containers run as dedicated system users without namespace isolation

Decision

We will use rootful containers deployed as system-level systemd services with an Infrastructure Fact Pattern where infrastructure roles export client group GIDs as Ansible facts for application consumption.

Rationale

Why Rootful Succeeds

Direct UID/GID Mapping:

# Host: authentik user UID 966, groups: 966 (authentik), 961 (valkey-clients), 962 (postgres-clients)
# Container User=966:966 with PodmanArgs=--group-add 961 --group-add 962

# Inside container:
id
# uid=966(authentik) gid=966(authentik) groups=966(authentik),961(valkey-clients),962(postgres-clients)

# Socket access works:
ls -l /var/run/postgresql/.s.PGSQL.5432
# srwxrwx--- 1 postgres postgres-clients 0 ... /var/run/postgresql/.s.PGSQL.5432

Group membership preserved: Container process has GIDs 961 and 962, matching socket group ownership.
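
A containerized application can verify this at startup with a simple self-check. A sketch (the GIDs 961/962 from the example above are illustrative; here the check is exercised against the current process's own groups):

```python
import os

def has_required_groups(required_gids):
    """True if this process carries all GIDs needed for socket access
    (e.g. postgres-clients, valkey-clients)."""
    current = set(os.getgroups()) | {os.getgid()}
    return set(required_gids) <= current

# Sanity check with a GID this process certainly has:
assert has_required_groups([os.getgid()])
```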

Why Rootless Failed (Discarded Approach)

User Namespace UID/GID Remapping:

# Host: authentik user UID 100000, subuid range 200000-265535
# Container User=%i:%i with --userns=host --group-add=keep-groups

# User namespace remaps:
# Host UID 100000 → Container UID 100000 (root in namespace)
# Host GID 961 → Container GID 200961 (remapped into subgid range)
# Host GID 962 → Container GID 200962 (remapped into subgid range)

# Socket ownership on host:
# srwxrwx--- 1 postgres postgres-clients (GID 962)

# Container process groups: 200961, 200962 (remapped)
# Socket expects: GID 962 (not remapped)
# Result: Permission denied ❌

Root cause: User namespace supplementary group remapping breaks group-based socket access even with --userns=host, --group-add=keep-groups, and Annotation=run.oci.keep_original_groups=1.

Infrastructure Fact Pattern

Infrastructure Roles Export GIDs

Infrastructure services create client groups and export their GIDs as Ansible facts:

# PostgreSQL role: roles/postgresql/tasks/main.yml
- name: Create PostgreSQL client access group
  group:
    name: postgres-clients
    system: true

- name: Get PostgreSQL client group GID
  shell: "getent group postgres-clients | cut -d: -f3"
  register: postgresql_client_group_lookup
  changed_when: false

- name: Set PostgreSQL client group GID as fact
  set_fact:
    postgresql_client_group_gid: "{{ postgresql_client_group_lookup.stdout }}"

# Valkey role: roles/valkey/tasks/main.yml
- name: Create Valkey client access group
  group:
    name: valkey-clients
    system: true

- name: Get Valkey client group GID
  shell: "getent group valkey-clients | cut -d: -f3"
  register: valkey_client_group_lookup
  changed_when: false

- name: Set Valkey client group GID as fact
  set_fact:
    valkey_client_group_gid: "{{ valkey_client_group_lookup.stdout }}"
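Both lookups reduce to the same `cut -d: -f3` extraction over `getent group` output. As a quick sanity check, the pipeline can be exercised on a sample line (the group membership shown is illustrative):

```shell
# Sample line in the format produced by `getent group <name>`:
# name:password:GID:member-list
line='postgres-clients:x:962:authentik'

# Field 3, colon-delimited, is the GID -- exactly what the Ansible task registers.
gid=$(printf '%s\n' "$line" | cut -d: -f3)
echo "$gid"
# → 962
```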

Application Roles Consume Facts

Application roles validate and consume infrastructure facts:

# Authentik role: roles/authentik/tasks/main.yml
- name: Validate infrastructure facts are available
  assert:
    that:
      - postgresql_client_group_gid is defined
      - valkey_client_group_gid is defined
    fail_msg: |
      Required infrastructure facts are not available.
      Ensure PostgreSQL and Valkey roles have run first.

- name: Create authentik user with infrastructure groups
  user:
    name: authentik
    groups: [postgres-clients, valkey-clients]
    append: true

# Container template: roles/authentik/templates/authentik-server.container
[Container]
User={{ authentik_uid }}:{{ authentik_gid }}
PodmanArgs=--group-add {{ postgresql_client_group_gid }} --group-add {{ valkey_client_group_gid }}
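Because `set_fact` values only exist after the exporting role has run, the play must order infrastructure roles before application roles. A minimal sketch of that ordering (host group and playbook layout are illustrative, not the repo's exact files):

```yaml
# Hypothetical play: infrastructure roles run first so their GID facts
# are defined by the time the application role asserts on them.
- hosts: homelab
  become: true
  roles:
    - postgresql   # exports postgresql_client_group_gid
    - valkey       # exports valkey_client_group_gid
    - authentik    # consumes both facts
```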

Implementation Details

System-Level Deployment

# Quadlet files deployed to /etc/containers/systemd/ (not ~/.config/)
# Pod: /etc/containers/systemd/authentik.pod
[Unit]
Description=Authentik Authentication Pod

[Pod]
PublishPort=0.0.0.0:9000:9000
ShmSize=256m

[Service]
Restart=always

[Install]
WantedBy=multi-user.target  # System target, not default.target

# Container: /etc/containers/systemd/authentik-server.container
[Container]
User=966:966
PodmanArgs=--group-add 962 --group-add 961

Volume=/var/run/postgresql:/var/run/postgresql:Z
Volume=/var/run/valkey:/var/run/valkey:Z

Service Management

# System scope (not user scope)
systemctl status authentik-pod
systemctl restart authentik-server
journalctl -u authentik-server -f

# Verify container location
systemctl status authentik-server | grep CGroup
# CGroup: /system.slice/authentik-server.service ✓

Special Case: Valkey Socket Group Fix

Valkey doesn't natively support socket group configuration (unlike PostgreSQL's unix_socket_group). A helper service ensures correct socket permissions:

# /etc/systemd/system/valkey-socket-fix.service
[Unit]
Description=Fix Valkey socket group ownership and permissions
BindsTo=valkey.service
After=valkey.service

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'i=0; while [ ! -S /var/run/valkey/valkey.sock ] && [ $i -lt 100 ]; do sleep 0.1; i=$((i+1)); done'
ExecStart=/bin/chgrp valkey-clients /var/run/valkey/valkey.sock
ExecStart=/bin/chmod 770 /var/run/valkey/valkey.sock
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

Triggered by Valkey service:

# /etc/systemd/system/valkey.service (excerpt)
[Unit]
Wants=valkey-socket-fix.service
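The polling loop packed into the first `ExecStart` line is easier to read as a standalone function. A sketch of the same pattern, using `-e` instead of `-S` so it can be exercised with a regular file rather than a socket:

```shell
# Poll for a path to appear, up to 100 x 0.1s = ~10 seconds.
# The unit above does the same with -S (socket test) instead of -e.
wait_for_path() {
  i=0
  while [ ! -e "$1" ] && [ "$i" -lt 100 ]; do
    sleep 0.1
    i=$((i + 1))
  done
  [ -e "$1" ]   # exit status reflects whether the path showed up
}

tmpfile="$(mktemp -u)"               # reserve a name; file does not exist yet
( sleep 0.3; touch "$tmpfile" ) &    # simulate the service creating its socket
wait_for_path "$tmpfile" && echo "found"
# → found
wait
rm -f "$tmpfile"
```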

Consequences

Positive

  • Socket Access Works: Group-based permissions function correctly
  • Security: Containers run as dedicated users (not root), no privileged daemon
  • Portability: Dynamic GID facts work across different hosts
  • Consistency: Same pattern for all containerized applications
  • Simplicity: No user namespace complexity, standard systemd service management

Negative

  • Not Rootless: Containers run as dedicated non-root users, but deploying and managing them as system services requires root
  • Different from Docker: Less familiar pattern than rootless user services

Neutral

  • System vs User Scope: Different commands (systemctl vs systemctl --user) but equally capable
  • Deployment Location: /etc/containers/systemd/ vs ~/.config/ but same Quadlet functionality

Validation

# Verify service location
systemctl status authentik-server | grep CGroup
# → /system.slice/authentik-server.service ✓

# Verify process groups (pgrep avoids matching the grep process itself)
pgrep -f authentik | head -n1 | \
  xargs -I {} grep '^Groups:' /proc/{}/status
# → Groups: 961 962 966 ✓

# Verify socket permissions
ls -l /var/run/postgresql/.s.PGSQL.5432
# → srwxrwx--- postgres postgres-clients ✓

ls -l /var/run/valkey/valkey.sock
# → srwxrwx--- valkey valkey-clients ✓

# Verify HTTP endpoint
curl -I http://127.0.0.1:9000/
# → HTTP/1.1 302 Found ✓

Alternatives Considered

  1. Rootless with user namespace - Discarded due to GID remapping breaking group-based socket access
  2. TCP-only connections - Rejected to maintain Unix socket security and performance benefits
  3. Hardcoded GIDs - Rejected for portability; facts provide dynamic resolution
  4. Directory permissions (777) - Rejected for security; group-based access is more restrictive. (This decision was later reversed: permissions went back to 777 after Nextcloud switched from root to www-data, which broke group-based access.)


ADR-007: Multi-Environment Infrastructure Architecture

Date: December 2025
Status: Accepted
Context: Separation of homelab services from production client projects

Decision

Rick-infra will manage two separate environments with different purposes and uptime requirements:

  1. Homelab Environment (arch-vps)

    • Purpose: Personal services and experimentation
    • Infrastructure: Full stack (PostgreSQL, Valkey, Podman, Caddy)
    • Services: Authentik, Nextcloud, Gitea
    • Uptime requirement: Best effort
  2. Production Environment (mini-vps)

    • Purpose: Client projects requiring high uptime
    • Infrastructure: Minimal (Caddy only)
    • Services: Sigvild Gallery
    • Uptime requirement: High availability

Rationale

Separation of Concerns:

  • Personal experiments don't affect client services
  • Client services isolated from homelab maintenance
  • Clear distinction between environments in code

Infrastructure Optimization:

  • Production runs minimal services (no PostgreSQL/Valkey overhead)
  • Homelab can be rebooted/upgraded without affecting clients
  • Cost optimization: smaller VPS for production

Operational Flexibility:

  • Different backup strategies per environment
  • Different monitoring/alerting levels
  • Independent deployment schedules

Implementation

Variable Organization:

rick-infra/
├── group_vars/
│   └── production/        # Production environment config
│       ├── main.yml
│       └── vault.yml
├── host_vars/
│   └── arch-vps/          # Homelab host config
│       ├── main.yml
│       └── vault.yml
└── playbooks/
    ├── homelab.yml        # Homelab deployment
    ├── production.yml     # Production deployment
    └── site.yml           # Orchestrates both

Playbook Structure:

  • site.yml imports both homelab.yml and production.yml
  • Each playbook manually loads variables (Ansible 2.20 workaround)
  • Services deploy only to their designated environment
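A sketch of what that structure implies for the production playbook (the explicit include_vars workaround is illustrative; the repo's actual task names and role names may differ):

```yaml
# Hypothetical production.yml excerpt: variables loaded explicitly as a
# workaround for group_vars auto-loading issues observed under Ansible 2.20.
- hosts: production
  become: true
  pre_tasks:
    - name: Load production variables manually
      include_vars:
        dir: "{{ playbook_dir }}/../group_vars/production"
  roles:
    - caddy            # production runs only the minimal stack
    - sigvild-gallery  # role name assumed for illustration
```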

Inventory Groups:

homelab:
  hosts:
    arch-vps:
      ansible_host: 69.62.119.31

production:
  hosts:
    mini-vps:
      ansible_host: 72.62.91.251

Migration Example

Sigvild Gallery Migration (December 2025):

  • From: arch-vps (homelab)
  • To: mini-vps (production)
  • Reason: Client project requiring higher uptime
  • Process:
    1. Created backup on arch-vps
    2. Deployed to mini-vps with automatic restore
    3. Updated DNS (5 min downtime)
    4. Removed from arch-vps configuration
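The automatic restore in step 2 is guarded so it only runs against an empty target, per the backup playbook's restore-logic fix. A hedged sketch of that guard, with illustrative paths and variable names (not the repo's exact tasks):

```yaml
# Hypothetical excerpt from backup-sigvild.yml's restore logic:
# check for existing data first, and only restore into an empty target.
- name: Check for existing gallery data
  stat:
    path: /srv/sigvild-gallery/data
  register: gallery_data

- name: Restore latest backup when no data is present
  unarchive:
    src: "{{ sigvild_backup_archive }}"
    dest: /srv/sigvild-gallery
    remote_src: true
  when: not gallery_data.stat.exists
```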

Consequences

Positive:

  • Clear separation of personal vs. client services
  • Reduced blast radius for experiments
  • Optimized resource usage per environment
  • Independent scaling and management

Negative:

  • Increased complexity in playbook organization
  • Need to manage multiple VPS instances
  • Ansible 2.20 variable loading requires workarounds
  • Duplicate infrastructure code (Caddy on both)

Neutral:

  • Services can be migrated between environments with minimal friction
  • Backup/restore procedures work across environments
  • Hybrid variable layout: group_vars for the production environment, host_vars for the homelab host (arch-vps)

Future Considerations

  • Consider grouping multiple client projects on production VPS
  • Evaluate if homelab needs full infrastructure stack
  • Monitor for opportunities to share infrastructure between environments
  • Document migration procedures for moving services between environments