Architecture Decision Records (ADR)
This document records the significant architectural decisions made in the rick-infra project.
Unix Socket IPC Architecture
Context
Containerized applications need to communicate with database and cache services. Communication methods include:
- Network TCP/IP: Standard network protocols
- Unix Domain Sockets: Filesystem-based IPC
Decision
We will use Unix domain sockets for all communication between applications and infrastructure services.
Rationale
Security Benefits
- No Network Exposure: Infrastructure services bind only to Unix sockets
# PostgreSQL configuration
listen_addresses = ''                            # No TCP binding
unix_socket_directories = '/var/run/postgresql'
# Valkey configuration
port 0                                           # Disable TCP port
unixsocket /var/run/valkey/valkey.sock
- Filesystem Permissions: Access controlled by Unix file permissions
srwxrwx--- 1 postgres postgres 0 /var/run/postgresql/.s.PGSQL.5432
srwxrwx--- 1 valkey valkey 0 /var/run/valkey/valkey.sock
- Group-Based Access: Simple group membership controls access
# Add application user to infrastructure groups
usermod -a -G postgres,valkey authentik
- No Network Scanning: Services invisible to network reconnaissance
Performance Advantages
- Lower Latency: Unix sockets have ~20% lower latency than TCP loopback
- Higher Throughput: Up to 40% higher throughput for local communication
- Reduced CPU Overhead: No network stack processing required
- Efficient Data Transfer: Direct kernel-level data copying
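These figures vary by workload and kernel. They can be reproduced on a given host by running the same read-only pgbench workload over both transports; a sketch, assuming pgbench is available, a scratch database named bench exists, and TCP is temporarily re-enabled for the loopback run (this setup normally disables it):
# Identical read-only workloads over each transport; compare the reported tps
pgbench -h /var/run/postgresql -U postgres -S -T 30 bench   # Unix socket
pgbench -h 127.0.0.1 -p 5432 -U postgres -S -T 30 bench     # TCP loopback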
Operational Benefits
- Connection Reliability: Local filesystem connections avoid port conflicts and transient network failures
- Resource Monitoring: Standard filesystem monitoring applies
- Backup Friendly: No network configuration to backup/restore
- Debugging: Standard filesystem tools for troubleshooting
Implementation Strategy
Container Socket Access
# Container configuration (Quadlet)
[Container]
# Mount socket directories with proper labels
Volume=/var/run/postgresql:/var/run/postgresql:Z
Volume=/var/run/valkey:/var/run/valkey:Z
# Preserve user namespace and groups
PodmanArgs=--userns=host
Annotation=run.oci.keep_original_groups=1
Application Configuration
# Database connection (PostgreSQL)
DATABASE_URL=postgresql://authentik@/authentik?host=/var/run/postgresql
# Cache connection (Redis/Valkey)
CACHE_URL=unix:///var/run/valkey/valkey.sock?db=1&password=secret
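The same DSNs can be smoke-tested from the application user before the container ever starts; a sketch using the standard clients:
# PostgreSQL: the URI form used in DATABASE_URL works directly with psql
sudo -u authentik psql "postgresql://authentik@/authentik?host=/var/run/postgresql" -c 'SELECT 1;'
# Valkey: same socket and database index as CACHE_URL
sudo -u authentik redis-cli -s /var/run/valkey/valkey.sock -n 1 ping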
User Management
# Ansible user setup
- name: Add application user to infrastructure groups
  user:
    name: "{{ app_user }}"
    groups:
      - postgres   # For database access
      - valkey     # For cache access
    append: true
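Afterwards, membership is easy to confirm on the host (output illustrative):
id authentik
# uid=963(authentik) gid=963(authentik) groups=963(authentik),968(postgres),965(valkey)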
Consequences
Positive
- Security: Eliminated network attack vectors for databases
- Performance: Measurably faster database and cache operations
- Reliability: More stable connections than network-based
- Simplicity: Simpler configuration than network + authentication
Negative
- Container Complexity: Requires careful container user/group management
- Learning Curve: Less familiar than standard TCP connections
- Port Forwarding: Cannot use standard port forwarding for debugging
Mitigation Strategies
- Documentation: Comprehensive guides for Unix socket configuration
- Testing: Automated tests verify socket connectivity
- Tooling: Helper scripts for debugging socket connections
Technical Implementation
# Test socket connectivity
sudo -u authentik psql -h /var/run/postgresql -U authentik -d authentik
sudo -u authentik redis-cli -s /var/run/valkey/valkey.sock ping
# Container user verification
podman exec authentik-server id
# uid=963(authentik) gid=963(authentik) groups=963(authentik),968(postgres),965(valkey)
Alternatives Considered
- TCP with Authentication: Rejected due to network exposure
- TCP with TLS: Rejected due to certificate complexity and performance overhead
- Shared Memory: Rejected due to implementation complexity
ADR-003: Podman + systemd Container Orchestration
Technical Story: Container orchestration solution for secure application deployment with systemd integration.
Context
Container orchestration options for a single-node infrastructure:
- Docker + Docker Compose: Traditional container orchestration
- Podman + systemd: Rootless containers with native systemd integration
- Kubernetes: Full orchestration platform (overkill for single node)
- Nomad: HashiCorp orchestration solution
Decision
We will use Podman with systemd integration (Quadlet) for container orchestration, deployed as system-level services (rootful containers running as dedicated users).
Rationale
Security Advantages
- No Daemon Required: No privileged daemon attack surface
# Docker: Requires root daemon
sudo systemctl status docker
# Podman: Daemonless operation
podman ps   # No daemon needed
- Dedicated Service Users: Containers run as dedicated system users (not root)
- Group-Based Access Control: Unix group membership controls infrastructure access
- SELinux Integration: Better SELinux support than Docker
systemd Integration Benefits
- Native Service Management: Containers as system-level systemd services
# Quadlet file: /etc/containers/systemd/authentik.pod
[Unit]
Description=Authentik Authentication Pod
[Pod]
PublishPort=0.0.0.0:9000:9000
ShmSize=256m
[Service]
Restart=always
TimeoutStartSec=900
[Install]
WantedBy=multi-user.target
- Dependency Management: systemd handles service dependencies
- Resource Control: systemd resource limits and monitoring
- Logging Integration: journald for centralized logging
Operational Excellence
- Familiar Tooling: Standard systemd commands
systemctl status authentik-pod
systemctl restart authentik-server
journalctl -u authentik-server -f
- Boot Integration: Services start automatically at system boot
- Resource Monitoring: systemd resource tracking
- Configuration Management: Declarative Quadlet files
Performance Benefits
- Lower Overhead: No daemon overhead for container management
- Direct Kernel Access: Better performance than daemon-based solutions
- Resource Efficiency: More efficient resource utilization
Implementation Architecture
┌────────────────────────────────────────────────────────────────┐
│            systemd System Services (/system.slice/)            │
│                                                                │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│ │  authentik-pod   │ │ authentik-server │ │ authentik-worker │ │
│ │     .service     │ │     .service     │ │     .service     │ │
│ └──────────────────┘ └──────────────────┘ └──────────────────┘ │
│          │                    │                    │           │
│          └────────────────────┼────────────────────┘           │
│                               │                                │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │            Podman Pod (rootful, dedicated user)            │ │
│ │                                                            │ │
│ │ ┌───────────────────┐ ┌──────────────────────────────────┐ │ │
│ │ │ Server Container  │ │ Worker Container                 │ │ │
│ │ │ User: 966:966     │ │ User: 966:966                    │ │ │
│ │ │ Groups: 961,962   │ │ Groups: 961,962                  │ │ │
│ │ │ (valkey,postgres) │ │ (valkey,postgres)                │ │ │
│ │ └───────────────────┘ └──────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────┘
                                │
                                │ Group-based access to infrastructure
                                ▼
┌────────────────────────────────────────────────────────────────┐
│                    Infrastructure Services                     │
│  PostgreSQL: /var/run/postgresql (postgres:postgres-clients)   │
│  Valkey:     /var/run/valkey     (valkey:valkey-clients)       │
└────────────────────────────────────────────────────────────────┘
Quadlet Configuration
# Pod configuration (authentik.pod)
[Unit]
Description=Authentik Authentication Pod
[Pod]
PublishPort=127.0.0.1:9000:9000
ShmSize=256m
[Service]
Restart=always
[Install]
WantedBy=default.target
# Container configuration (authentik-server.container)
[Unit]
Description=Authentik Server Container
After=authentik-pod.service
Requires=authentik-pod.service
[Container]
ContainerName=authentik-server
Image=ghcr.io/goauthentik/server:2025.10
Pod=authentik.pod
EnvironmentFile=/opt/authentik/.env
User=966:966
PodmanArgs=--group-add 962 --group-add 961
# Volume mounts for sockets and data
Volume=/opt/authentik/media:/media
Volume=/opt/authentik/data:/data
Volume=/var/run/postgresql:/var/run/postgresql:Z
Volume=/var/run/valkey:/var/run/valkey:Z
[Service]
Restart=always
TimeoutStartSec=300
[Install]
WantedBy=multi-user.target
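Quadlet units are produced by a systemd generator rather than enabled by hand, so deploying a new or changed file is a reload-and-start; a sketch:
# Regenerate units from /etc/containers/systemd/, then start the pod
systemctl daemon-reload
systemctl start authentik-pod.service
systemctl status authentik-server.service --no-pager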
User Management Strategy
# Ansible implementation
- name: Create service user
  user:
    name: authentik
    group: authentik
    groups: [postgres-clients, valkey-clients]
    system: true
    shell: /bin/bash
    home: /opt/authentik
    create_home: true
    append: true
Note: Infrastructure roles (PostgreSQL, Valkey) export client group GIDs as Ansible facts (postgresql_client_group_gid, valkey_client_group_gid) which are consumed by application container templates for dynamic --group-add arguments.
Consequences
Positive
- Security: Eliminated privileged daemon attack surface
- Integration: Seamless systemd integration for management
- Performance: Lower overhead than daemon-based solutions
- Reliability: systemd's proven service management
- Monitoring: Standard systemd monitoring and logging
Negative
- Learning Curve: Different from Docker Compose workflows
- Tooling: Ecosystem less mature than Docker
- Documentation: Fewer online resources and examples
Mitigation Strategies
- Documentation: Comprehensive internal documentation
- Training: Team training on Podman/systemd workflows
- Tooling: Helper scripts for common operations
Technical Implementation
# Container management (system scope)
systemctl status authentik-pod
systemctl restart authentik-server
podman ps
podman logs authentik-server
# Resource monitoring
systemctl show authentik-server --property=MemoryCurrent
journalctl -u authentik-server -f
# Verify container groups
pgrep -f authentik-server | head -1 | \
  xargs -I {} grep Groups /proc/{}/status
# Output: Groups: 961 962 966
Alternatives Considered
- Docker + Docker Compose: Rejected due to security concerns (privileged daemon)
- Kubernetes: Rejected as overkill for single-node deployment
- Nomad: Rejected to maintain consistency with systemd ecosystem
OAuth/OIDC and Forward Authentication Security Model
Technical Story: Centralized authentication and authorization for multiple services using industry-standard OAuth2/OIDC protocols where supported, with forward authentication as a fallback.
Context
Authentication strategies for multiple services:
- Per-Service Authentication: Each service handles its own authentication
- Shared Database: Services share authentication database
- OAuth2/OIDC Integration: Services implement standard OAuth2/OIDC clients
- Forward Authentication: Reverse proxy handles authentication for services without OAuth support
Decision
We will use OAuth2/OIDC integration as the primary authentication method for services that support it, and forward authentication for services that do not support native OAuth2/OIDC integration.
Rationale
OAuth/OIDC as Primary Method
Security Benefits:
- Standard Protocol: Industry-standard authentication flow (RFC 6749, RFC 7636)
- Token-Based Security: Secure JWT tokens with cryptographic signatures
- Proper Session Management: Native application session handling with refresh tokens
- Scope-Based Authorization: Fine-grained permission control via OAuth scopes
- PKCE Support: Protection against authorization code interception attacks
Integration Benefits:
- Native Support: Applications designed for OAuth/OIDC work seamlessly
- Better UX: Proper redirect flows, logout handling, and token refresh
- API Access: OAuth tokens enable secure API integrations
- Standard Claims: OpenID Connect user info endpoint provides standardized user data
- Multi-Application SSO: Proper single sign-on with token sharing
Examples: Nextcloud, Gitea, Grafana, and many other modern applications
Forward Auth as Fallback
Use Cases:
- Services without OAuth/OIDC support
- Legacy applications that cannot be modified
- Static sites requiring authentication
- Simple internal tools
Security Benefits:
- Zero Application Changes: Protect existing services without modification
- Header-Based Identity: Simple identity propagation to backend
- Transparent Protection: Services receive pre-authenticated requests
Limitations:
- Non-Standard: Not using industry-standard authentication protocols
- Proxy Dependency: All requests must flow through authenticating proxy
- Limited Logout: Complex logout scenarios across services
- Header Trust: Backend must trust proxy-provided headers
Shared Benefits (Both Methods)
- Single Point of Control: Centralized authentication policy via Authentik
- Consistent Security: Same authentication provider across all services
- Multi-Factor Authentication: MFA applied consistently via Authentik
- Audit Trail: Centralized authentication logging
- User Management: One system for all user administration
Implementation Architecture
OAuth/OIDC Flow (Primary Method)
┌─────────────┐   ┌─────────────┐   ┌─────────────┐
│    User     │   │   Service   │   │  Authentik  │
│             │   │ (OAuth App) │   │    (IdP)    │
└──────┬──────┘   └──────┬──────┘   └──────┬──────┘
       │                 │                 │
       │  Access Service │                 │
       │────────────────▶│                 │
       │                 │                 │
       │                 │  No session     │
       │  302 → OAuth    │                 │
       │◀────────────────│                 │
       │                 │                 │
       │  GET /authorize?client_id=...&redirect_uri=...
       │──────────────────────────────────▶│
       │                 │                 │
       │  Login form (if not authenticated)│
       │◀──────────────────────────────────│
       │                 │                 │
       │  Credentials    │                 │
       │──────────────────────────────────▶│
       │                 │                 │
       │  302 → callback?code=AUTH_CODE    │
       │◀──────────────────────────────────│
       │                 │                 │
       │  GET /callback?code=AUTH_CODE     │
       │────────────────▶│                 │
       │                 │                 │
       │                 │  POST /token    │
       │                 │  code=AUTH_CODE │
       │                 │────────────────▶│
       │                 │                 │
       │                 │  access_token   │
       │                 │  id_token (JWT) │
       │                 │◀────────────────│
       │                 │                 │
       │  Set-Cookie     │  GET /userinfo  │
       │ 302 → /dashboard│────────────────▶│
       │◀────────────────│                 │
       │                 │  User claims    │
       │                 │◀────────────────│
       │                 │                 │
       │  GET /dashboard │                 │
       │────────────────▶│                 │
       │                 │                 │
       │  Dashboard      │                 │
       │◀────────────────│                 │
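The POST /token step in the diagram is a plain form-encoded request. As a curl sketch against Authentik's token endpoint (the code value and client credentials are illustrative):
curl -s https://auth.jnss.me/application/o/token/ \
  -d grant_type=authorization_code \
  -d code=AUTH_CODE \
  -d redirect_uri=https://cloud.jnss.me/apps/oidc_login/oidc \
  -d client_id=nextcloud-client-id \
  -d client_secret=secret-from-authentik
# Returns JSON with access_token, id_token (JWT) and, if enabled, refresh_token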
Forward Auth Flow (Fallback Method)
┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐
│    User     │   │    Caddy    │   │  Authentik  │   │   Service   │
│             │   │   (Proxy)   │   │  (Forward)  │   │  (Backend)  │
└──────┬──────┘   └──────┬──────┘   └──────┬──────┘   └──────┬──────┘
       │                 │                 │                 │
       │  GET /          │                 │                 │
       │────────────────▶│                 │                 │
       │                 │                 │                 │
       │                 │  Forward Auth   │                 │
       │                 │────────────────▶│                 │
       │                 │                 │                 │
       │                 │ 401 Unauthorized│                 │
       │                 │◀────────────────│                 │
       │                 │                 │                 │
       │  302 → /auth    │                 │                 │
       │◀────────────────│                 │                 │
       │                 │                 │                 │
       │  Login form     │                 │                 │
       │──────────────────────────────────▶│                 │
       │                 │                 │                 │
       │  Credentials    │                 │                 │
       │──────────────────────────────────▶│                 │
       │                 │                 │                 │
       │  Set-Cookie     │                 │                 │
       │◀──────────────────────────────────│                 │
       │                 │                 │                 │
       │  GET /          │                 │                 │
       │────────────────▶│                 │                 │
       │                 │                 │                 │
       │                 │  Forward Auth   │                 │
       │                 │────────────────▶│                 │
       │                 │                 │                 │
       │                 │  200 + Headers  │                 │
       │                 │◀────────────────│                 │
       │                 │                 │                 │
       │                 │  Proxy + Headers│                 │
       │                 │──────────────────────────────────▶│
       │                 │                 │                 │
       │                 │  Response       │                 │
       │                 │◀──────────────────────────────────│
       │                 │                 │                 │
       │  Content        │                 │                 │
       │◀────────────────│                 │                 │
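The Forward Auth hop can be exercised directly against the outpost endpoint to observe both outcomes from the diagram; a sketch (exact headers depend on the provider configuration):
# Without a session cookie: the 401 branch of the diagram
curl -s -o /dev/null -w '%{http_code}\n' https://auth.jnss.me/outpost.goauthentik.io/auth/caddy
# With a valid Authentik session cookie the same request returns 200,
# plus the Remote-* headers that Caddy copies to the backend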
OAuth/OIDC Configuration Examples
Nextcloud OAuth Configuration
// Nextcloud config.php
'oidc_login_provider_url' => 'https://auth.jnss.me/application/o/nextcloud/',
'oidc_login_client_id' => 'nextcloud-client-id',
'oidc_login_client_secret' => 'secret-from-authentik',
'oidc_login_auto_redirect' => true,
'oidc_login_end_session_redirect' => true,
'oidc_login_button_text' => 'Login with SSO',
'oidc_login_hide_password_form' => true,
'oidc_login_use_id_token' => true,
'oidc_login_attributes' => [
'id' => 'preferred_username',
'name' => 'name',
'mail' => 'email',
'groups' => 'groups',
],
'oidc_login_default_group' => 'users',
'oidc_login_use_external_storage' => false,
'oidc_login_scope' => 'openid profile email groups',
'oidc_login_proxy_ldap' => false,
'oidc_login_disable_registration' => false,
'oidc_login_redir_fallback' => true,
'oidc_login_tls_verify' => true,
Gitea OAuth Configuration
# Gitea app.ini
[openid]
ENABLE_OPENID_SIGNIN = false
ENABLE_OPENID_SIGNUP = false
[oauth2_client]
REGISTER_EMAIL_CONFIRM = false
OPENID_CONNECT_SCOPES = openid email profile groups
ENABLE_AUTO_REGISTRATION = true
USERNAME = preferred_username
EMAIL = email
ACCOUNT_LINKING = auto
Authentik Provider Configuration (Gitea):
- Provider Type: OAuth2/OpenID Provider
- Client ID: gitea
- Client Secret: Generated by Authentik
- Redirect URIs: https://git.jnss.me/user/oauth2/Authentik/callback
- Scopes: openid, profile, email, groups
Authentik OAuth2 Provider Settings
# OAuth2/OIDC Provider configuration in Authentik
name: "Nextcloud OAuth Provider"
authorization_flow: "default-authorization-flow"
client_type: "confidential"
client_id: "nextcloud-client-id"
redirect_uris: "https://cloud.jnss.me/apps/oidc_login/oidc"
signing_key: "authentik-default-key"
property_mappings:
- "authentik default OAuth Mapping: OpenID 'openid'"
- "authentik default OAuth Mapping: OpenID 'email'"
- "authentik default OAuth Mapping: OpenID 'profile'"
- "Custom: Groups" # Maps user groups to 'groups' claim
Forward Auth Configuration Examples
Caddy Configuration for Forward Auth
# whoami service with forward authentication
whoami.jnss.me {
    # Forward authentication to Authentik
    forward_auth https://auth.jnss.me {
        uri /outpost.goauthentik.io/auth/caddy
        copy_headers Remote-User Remote-Name Remote-Email Remote-Groups
    }
    # Backend service (receives authenticated requests)
    reverse_proxy localhost:8080
}
Authentik Proxy Provider Configuration
# Authentik Proxy Provider for forward auth
name: "Whoami Forward Auth"
type: "proxy"
authorization_flow: "default-authorization-flow"
external_host: "https://whoami.jnss.me"
internal_host: "http://localhost:8080"
skip_path_regex: "^/(health|metrics).*"
mode: "forward_single" # Single application mode
Service Integration (Forward Auth)
Services receive authentication information via HTTP headers:
# Example service code (Python Flask)
from flask import Flask, request, render_template

app = Flask(__name__)

@app.route('/')
def index():
    # Identity headers injected by the authenticating proxy
    username = request.headers.get('Remote-User')
    name = request.headers.get('Remote-Name')
    email = request.headers.get('Remote-Email')
    groups = request.headers.get('Remote-Groups', '').split(',')
    return render_template('index.html',
                           username=username,
                           name=name,
                           email=email,
                           groups=groups)
Authorization Policies
Both OAuth and Forward Auth support Authentik authorization policies:
# Example authorization policy in Authentik
policy_bindings:
  - policy: "group_admins_only"
    target: "nextcloud_oauth_provider"
    order: 0
  - policy: "require_mfa"
    target: "gitea_oauth_provider"
    order: 1
  - policy: "internal_network_only"
    target: "whoami_proxy_provider"
    order: 0
Decision Matrix: OAuth/OIDC vs Forward Auth
| Criteria | OAuth/OIDC | Forward Auth |
|---|---|---|
| Application Support | Requires native OAuth/OIDC support | Any application |
| Protocol Standard | Industry standard (RFC 6749, 7636) | Proprietary/custom |
| Token Management | Native refresh tokens, proper expiry | Session-based only |
| Logout Handling | Proper logout flow | Complex, proxy-dependent |
| API Access | Full API support via tokens | Header-only |
| Implementation Effort | Configure OAuth settings | Zero app changes |
| User Experience | Standard OAuth redirects | Transparent |
| Security Model | Token-based with scopes | Header trust model |
| When to Use | Nextcloud, Gitea, modern apps | Static sites, legacy apps, whoami |
Consequences
Positive
- Standards Compliance: OAuth/OIDC uses industry-standard protocols
- Security: Multiple authentication options with appropriate security models
- Flexibility: Right tool for each service (OAuth when possible, forward auth when needed)
- Auditability: Centralized authentication logging via Authentik
- User Experience: Proper SSO across all services
- Token Security: OAuth provides secure token refresh and scope management
- Graceful Degradation: Forward auth available for services without OAuth support
Negative
- Complexity: Need to understand two authentication methods
- Configuration Overhead: OAuth requires per-service configuration
- Single Point of Failure: Authentik failure affects all services
- Learning Curve: Team must understand OAuth flows and forward auth model
Mitigation Strategies
- Documentation: Clear decision guide for choosing OAuth vs forward auth
- Templates: Reusable OAuth configuration templates for common services
- High Availability: Robust deployment and monitoring of Authentik
- Monitoring: Comprehensive monitoring of both authentication flows
- Testing: Automated tests for authentication flows
Security Considerations
OAuth/OIDC Security
# Authentik OAuth2 Provider security settings
authorization_code_validity: 60 # 1 minute
access_code_validity: 3600 # 1 hour
refresh_code_validity: 2592000 # 30 days
include_claims_in_id_token: true
signing_key: "authentik-default-key"
sub_mode: "hashed_user_id"
issuer_mode: "per_provider"
Best Practices:
- Use PKCE for all OAuth flows (protection against interception)
- Implement proper token rotation (refresh tokens expire and rotate)
- Validate aud (audience) and iss (issuer) claims in JWT tokens
- Use short-lived access tokens (1 hour)
- Store client secrets securely (Ansible Vault)
Forward Auth Security
# Authentik Proxy Provider security settings
token_validity: 3600 # 1 hour session
cookie_domain: ".jnss.me"
skip_path_regex: "^/(health|metrics|static).*"
Best Practices:
- Trust only Authentik-provided headers
- Validate that the Remote-User header exists before granting access
- Use HTTPS for all forward auth endpoints
- Implement proper session timeouts
- Strip user-provided authentication headers at proxy
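The last point deserves an explicit test: a client-supplied Remote-User must never reach the backend. Against the whoami service, which echoes its request headers, the check looks like this (sketch):
# Attempt header injection; the proxy must strip or overwrite it
curl -s -H 'Remote-User: attacker' https://whoami.jnss.me/ | grep -i 'remote-user'
# Expect a redirect to login or the Authentik-set identity, never "attacker"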
Access Control
- Group-Based Authorization: Users assigned to groups, groups to applications
- Policy Engine: Authentik policies for fine-grained access control
- MFA Requirements: Multi-factor authentication for sensitive services
- IP-Based Restrictions: Geographic or network-based access control
- Time-Based Access: Temporary access grants via policies
Audit Logging
{
  "timestamp": "2025-12-15T10:30:00Z",
  "event": "oauth_authorization",
  "user": "john.doe",
  "application": "nextcloud",
  "scopes": ["openid", "email", "profile", "groups"],
  "ip": "192.168.1.100",
  "user_agent": "Mozilla/5.0..."
}
Implementation Examples by Service Type
OAuth/OIDC Services (Primary Method)
Nextcloud:
cloud.jnss.me {
    reverse_proxy localhost:8080
}
# OAuth configured within Nextcloud application
Gitea:
git.jnss.me {
    reverse_proxy localhost:3000
}
# OAuth configured within Gitea application settings
Forward Auth Services (Fallback Method)
Whoami (test/demo service):
whoami.jnss.me {
    forward_auth https://auth.jnss.me {
        uri /outpost.goauthentik.io/auth/caddy
        copy_headers Remote-User Remote-Name Remote-Email Remote-Groups
    }
    reverse_proxy localhost:8080
}
Static Documentation Site:
docs.jnss.me {
    forward_auth https://auth.jnss.me {
        uri /outpost.goauthentik.io/auth/caddy
        copy_headers Remote-User Remote-Groups
    }
    root * /var/www/docs
    file_server
}
Internal API (no OAuth support):
api.jnss.me {
    forward_auth https://auth.jnss.me {
        uri /outpost.goauthentik.io/auth/caddy
        copy_headers Remote-User Remote-Email Remote-Groups
    }
    reverse_proxy localhost:3000
}
Selective Protection (Public + Protected Paths)
app.jnss.me {
    # Public endpoints (no auth required)
    handle /health {
        reverse_proxy localhost:8080
    }
    handle /metrics {
        reverse_proxy localhost:8080
    }
    handle /public/* {
        reverse_proxy localhost:8080
    }
    # Protected endpoints (forward auth)
    handle /admin/* {
        forward_auth https://auth.jnss.me {
            uri /outpost.goauthentik.io/auth/caddy
            copy_headers Remote-User Remote-Groups
        }
        reverse_proxy localhost:8080
    }
    # Default: protected
    handle {
        forward_auth https://auth.jnss.me {
            uri /outpost.goauthentik.io/auth/caddy
            copy_headers Remote-User Remote-Groups
        }
        reverse_proxy localhost:8080
    }
}
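A quick spot-check that the split behaves as intended (sketch; status codes illustrative):
curl -s -o /dev/null -w '%{http_code}\n' https://app.jnss.me/health   # 200, served directly
curl -s -o /dev/null -w '%{http_code}\n' https://app.jnss.me/admin/   # 302, redirected to Authentik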
Alternatives Considered
- OAuth2/OIDC Only: Rejected because many services don't support OAuth natively
- Forward Auth Only: Rejected because it doesn't leverage native OAuth support in modern apps
- Per-Service Authentication: Rejected due to management overhead and inconsistent security
- Shared Database: Rejected due to tight coupling between services
- VPN-Based Access: Rejected due to operational complexity for web services
- SAML: Rejected in favor of modern OAuth2/OIDC standards
Rootful Containers with Infrastructure Fact Pattern
Technical Story: Enable containerized applications to access native infrastructure services (PostgreSQL, Valkey) via Unix sockets with group-based permissions.
Context
Containerized applications need to access infrastructure services (PostgreSQL, Valkey) through Unix sockets with filesystem-based permission controls. The permission model requires:
- Socket directories owned by service groups (postgres-clients, valkey-clients)
- Application users added to these groups for access
- Container processes must preserve group membership to access sockets
Two approaches were evaluated:
- Rootless containers (user namespace): Containers run in user namespace with UID/GID remapping
- Rootful containers (system services): Containers run as dedicated system users without namespace isolation
Decision
We will use rootful containers deployed as system-level systemd services with an Infrastructure Fact Pattern where infrastructure roles export client group GIDs as Ansible facts for application consumption.
Rationale
Why Rootful Succeeds
Direct UID/GID Mapping:
# Host: authentik user UID 966, groups: 966 (authentik), 961 (valkey-clients), 962 (postgres-clients)
# Container User=966:966 with PodmanArgs=--group-add 961 --group-add 962
# Inside container:
id
# uid=966(authentik) gid=966(authentik) groups=966(authentik),961(valkey-clients),962(postgres-clients)
# Socket access works:
ls -l /var/run/postgresql/.s.PGSQL.5432
# srwxrwx--- 1 postgres postgres-clients 0 ... /var/run/postgresql/.s.PGSQL.5432
Group membership preserved: Container process has GIDs 961 and 962, matching socket group ownership.
Why Rootless Failed (Discarded Approach)
User Namespace UID/GID Remapping:
# Host: authentik user UID 100000, subuid range 200000-265535
# Container User=%i:%i with --userns=host --group-add=keep-groups
# User namespace remaps:
# Host UID 100000 → Container UID 100000 (root in namespace)
# Host GID 961 → Container GID 200961 (remapped into subgid range)
# Host GID 962 → Container GID 200962 (remapped into subgid range)
# Socket ownership on host:
# srwxrwx--- 1 postgres postgres-clients (GID 962)
# Container process groups: 200961, 200962 (remapped)
# Socket expects: GID 962 (not remapped)
# Result: Permission denied ❌
Root cause: User namespace supplementary group remapping breaks group-based socket access even with --userns=host, --group-add=keep-groups, and Annotation=run.oci.keep_original_groups=1.
Infrastructure Fact Pattern
Infrastructure Roles Export GIDs
Infrastructure services create client groups and export their GIDs as Ansible facts:
# PostgreSQL role: roles/postgresql/tasks/main.yml
- name: Create PostgreSQL client access group
  group:
    name: postgres-clients
    system: true

- name: Get PostgreSQL client group GID
  shell: "getent group postgres-clients | cut -d: -f3"
  register: postgresql_client_group_lookup
  changed_when: false

- name: Set PostgreSQL client group GID as fact
  set_fact:
    postgresql_client_group_gid: "{{ postgresql_client_group_lookup.stdout }}"

# Valkey role: roles/valkey/tasks/main.yml
- name: Create Valkey client access group
  group:
    name: valkey-clients
    system: true

- name: Get Valkey client group GID
  shell: "getent group valkey-clients | cut -d: -f3"
  register: valkey_client_group_lookup
  changed_when: false

- name: Set Valkey client group GID as fact
  set_fact:
    valkey_client_group_gid: "{{ valkey_client_group_lookup.stdout }}"
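On the host, these facts simply mirror what getent reports (output illustrative):
getent group postgres-clients valkey-clients
# postgres-clients:x:962:authentik
# valkey-clients:x:961:authentik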
Application Roles Consume Facts
Application roles validate and consume infrastructure facts:
# Authentik role: roles/authentik/tasks/main.yml
- name: Validate infrastructure facts are available
  assert:
    that:
      - postgresql_client_group_gid is defined
      - valkey_client_group_gid is defined
    fail_msg: |
      Required infrastructure facts are not available.
      Ensure PostgreSQL and Valkey roles have run first.

- name: Create authentik user with infrastructure groups
  user:
    name: authentik
    groups: [postgres-clients, valkey-clients]
    append: true

# Container template: roles/authentik/templates/authentik-server.container
[Container]
User={{ authentik_uid }}:{{ authentik_gid }}
PodmanArgs=--group-add {{ postgresql_client_group_gid }} --group-add {{ valkey_client_group_gid }}
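On a deployed host, the rendered unit carries the resolved numeric IDs and can be checked directly (sketch; 961/962 are this host's values):
grep -E '^(User|PodmanArgs)=' /etc/containers/systemd/authentik-server.container
# User=966:966
# PodmanArgs=--group-add 962 --group-add 961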
Implementation Details
System-Level Deployment
# Quadlet files deployed to /etc/containers/systemd/ (not ~/.config/)
# Pod: /etc/containers/systemd/authentik.pod
[Unit]
Description=Authentik Authentication Pod
[Pod]
PublishPort=0.0.0.0:9000:9000
ShmSize=256m
[Service]
Restart=always
[Install]
WantedBy=multi-user.target # System target, not default.target
# Container: /etc/containers/systemd/authentik-server.container
[Container]
User=966:966
PodmanArgs=--group-add 962 --group-add 961
Volume=/var/run/postgresql:/var/run/postgresql:Z
Volume=/var/run/valkey:/var/run/valkey:Z
Service Management
# System scope (not user scope)
systemctl status authentik-pod
systemctl restart authentik-server
journalctl -u authentik-server -f
# Verify container location
systemctl status authentik-server | grep CGroup
# CGroup: /system.slice/authentik-server.service ✓
Special Case: Valkey Socket Group Fix
Valkey doesn't natively support socket group configuration (unlike PostgreSQL's unix_socket_group). A helper service ensures correct socket permissions:
# /etc/systemd/system/valkey-socket-fix.service
[Unit]
Description=Fix Valkey socket group ownership and permissions
BindsTo=valkey.service
After=valkey.service
[Service]
Type=oneshot
ExecStart=/bin/sh -c 'i=0; while [ ! -S /var/run/valkey/valkey.sock ] && [ $i -lt 100 ]; do sleep 0.1; i=$((i+1)); done'
ExecStart=/bin/chgrp valkey-clients /var/run/valkey/valkey.sock
ExecStart=/bin/chmod 770 /var/run/valkey/valkey.sock
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
Triggered by Valkey service:
# /etc/systemd/system/valkey.service (excerpt)
[Unit]
Wants=valkey-socket-fix.service
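Restarting Valkey should pull the fix unit in and leave the socket group-accessible (sketch):
systemctl restart valkey.service
stat -c '%U:%G %a' /var/run/valkey/valkey.sock
# valkey:valkey-clients 770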
Consequences
Positive
- Socket Access Works: Group-based permissions function correctly
- Security: Containers run as dedicated users (not root), no privileged daemon
- Portability: Dynamic GID facts work across different hosts
- Consistency: Same pattern for all containerized applications
- Simplicity: No user namespace complexity, standard systemd service management
Negative
- Not "Pure" Rootless: Containers require root for systemd service deployment
- Different from Docker: Less familiar pattern than rootless user services
Neutral
- System vs User Scope: Different commands (systemctl vs systemctl --user) but equally capable
- Deployment Location: /etc/containers/systemd/ vs ~/.config/ but same Quadlet functionality
Validation
# Verify service location
systemctl status authentik-server | grep CGroup
# → /system.slice/authentik-server.service ✓
# Verify process groups
pgrep -f authentik | head -1 | \
  xargs -I {} grep Groups /proc/{}/status
# → Groups: 961 962 966 ✓
# Verify socket permissions
ls -l /var/run/postgresql/.s.PGSQL.5432
# → srwxrwx--- postgres postgres-clients ✓
ls -l /var/run/valkey/valkey.sock
# → srwxrwx--- valkey valkey-clients ✓
# Verify HTTP endpoint
curl -I http://127.0.0.1:9000/
# → HTTP/1.1 302 Found ✓
Alternatives Considered
- Rootless with user namespace - Discarded due to GID remapping breaking group-based socket access
- TCP-only connections - Rejected to maintain Unix socket security and performance benefits
- Hardcoded GIDs - Rejected for portability; facts provide dynamic resolution
- Directory permissions (777) - Rejected for security; group-based access is more restrictive. (This was later reversed: permissions were set to 777 after Nextcloud switched from root to www-data, which broke the group-based scheme.)
ADR-007: Multi-Environment Infrastructure Architecture
Date: December 2025
Status: Accepted
Context: Separation of homelab services from production client projects
Decision
Rick-infra will manage two separate environments with different purposes and uptime requirements:
- Homelab Environment (arch-vps)
  - Purpose: Personal services and experimentation
  - Infrastructure: Full stack (PostgreSQL, Valkey, Podman, Caddy)
  - Services: Authentik, Nextcloud, Gitea
  - Uptime requirement: Best effort
- Production Environment (mini-vps)
  - Purpose: Client projects requiring high uptime
  - Infrastructure: Minimal (Caddy only)
  - Services: Sigvild Gallery
  - Uptime requirement: High availability
Rationale
Separation of Concerns:
- Personal experiments don't affect client services
- Client services isolated from homelab maintenance
- Clear distinction between environments in code
Infrastructure Optimization:
- Production runs minimal services (no PostgreSQL/Valkey overhead)
- Homelab can be rebooted/upgraded without affecting clients
- Cost optimization: smaller VPS for production
Operational Flexibility:
- Different backup strategies per environment
- Different monitoring/alerting levels
- Independent deployment schedules
Implementation
Variable Organization:
rick-infra/
├── group_vars/
│   └── production/        # Production environment config
│       ├── main.yml
│       └── vault.yml
├── host_vars/
│   └── arch-vps/          # Homelab host config
│       ├── main.yml
│       └── vault.yml
└── playbooks/
    ├── homelab.yml        # Homelab deployment
    ├── production.yml     # Production deployment
    └── site.yml           # Orchestrates both
Playbook Structure:
- site.yml imports both homelab.yml and production.yml
- Each playbook manually loads variables (Ansible 2.20 workaround)
- Services deploy only to their designated environment
Inventory Groups:
homelab:
  hosts:
    arch-vps:
      ansible_host: 69.62.119.31
production:
  hosts:
    mini-vps:
      ansible_host: 72.62.91.251
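With this layout, each environment deploys independently (sketch; the inventory filename is illustrative):
ansible-playbook -i inventory.yml playbooks/homelab.yml      # arch-vps only
ansible-playbook -i inventory.yml playbooks/production.yml   # mini-vps only
ansible-playbook -i inventory.yml playbooks/site.yml         # both environments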
Migration Example
Sigvild Gallery Migration (December 2025):
- From: arch-vps (homelab)
- To: mini-vps (production)
- Reason: Client project requiring higher uptime
- Process:
  1. Created backup on arch-vps
  2. Deployed to mini-vps with automatic restore
  3. Updated DNS (5 min downtime)
  4. Removed from arch-vps configuration
Consequences
Positive:
- Clear separation of personal vs. client services
- Reduced blast radius for experiments
- Optimized resource usage per environment
- Independent scaling and management
Negative:
- Increased complexity in playbook organization
- Need to manage multiple VPS instances
- Ansible 2.20 variable loading requires workarounds
- Duplicate infrastructure code (Caddy on both)
Neutral:
- Services can be migrated between environments with minimal friction
- Backup/restore procedures work across environments
- Hybrid variable layout: group_vars/ for the production environment, host_vars/ for the homelab host
Future Considerations
- Consider grouping multiple client projects on production VPS
- Evaluate if homelab needs full infrastructure stack
- Monitor for opportunities to share infrastructure between environments
- Document migration procedures for moving services between environments