Architecture Decision Records (ADR)
This document records the significant architectural decisions made in the rick-infra project.
Unix Socket IPC Architecture
Context
Containerized applications need to communicate with database and cache services. Communication methods include:
- Network TCP/IP: Standard network protocols
- Unix Domain Sockets: Filesystem-based IPC
Decision
We will use Unix domain sockets for all communication between applications and infrastructure services.
Rationale
Security Benefits
- No Network Exposure: Infrastructure services bind only to Unix sockets
# PostgreSQL configuration
listen_addresses = ''                            # No TCP binding
unix_socket_directories = '/var/run/postgresql'
# Valkey configuration
port 0                                           # Disable TCP port
unixsocket /var/run/valkey/valkey.sock
- Filesystem Permissions: Access controlled by Unix file permissions
srwxrwx--- 1 postgres postgres 0 /var/run/postgresql/.s.PGSQL.5432
srwxrwx--- 1 valkey valkey 0 /var/run/valkey/valkey.sock
- Group-Based Access: Simple group membership controls access
# Add application user to infrastructure groups
usermod -a -G postgres,valkey authentik
- No Network Scanning: Services invisible to network reconnaissance
Performance Advantages
- Lower Latency: Unix sockets have ~20% lower latency than TCP loopback
- Higher Throughput: Up to 40% higher throughput for local communication
- Reduced CPU Overhead: No network stack processing required
- Efficient Data Transfer: Direct kernel-level data copying
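These figures vary by workload and kernel. They can be reproduced on a given host by running the same read-only pgbench workload over both transports; a sketch, assuming pgbench is available, a scratch database named bench exists, and TCP is temporarily re-enabled for the loopback run (this setup normally disables it):
# Identical read-only workloads over each transport; compare the reported tps
pgbench -h /var/run/postgresql -U postgres -S -T 30 bench   # Unix socket
pgbench -h 127.0.0.1 -p 5432 -U postgres -S -T 30 bench     # TCP loopback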
Operational Benefits
- Connection Reliability: Local filesystem connections avoid port conflicts and transient network failures
- Resource Monitoring: Standard filesystem monitoring applies
- Backup Friendly: No network configuration to backup/restore
- Debugging: Standard filesystem tools for troubleshooting
Implementation Strategy
Container Socket Access
# Container configuration (Quadlet)
[Container]
# Mount socket directories with proper labels
Volume=/var/run/postgresql:/var/run/postgresql:Z
Volume=/var/run/valkey:/var/run/valkey:Z
# Preserve user namespace and groups
PodmanArgs=--userns=host
Annotation=run.oci.keep_original_groups=1
Application Configuration
# Database connection (PostgreSQL)
DATABASE_URL=postgresql://authentik@/authentik?host=/var/run/postgresql
# Cache connection (Redis/Valkey)
CACHE_URL=unix:///var/run/valkey/valkey.sock?db=1&password=secret
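The same DSNs can be smoke-tested from the application user before the container ever starts; a sketch using the standard clients:
# PostgreSQL: the URI form used in DATABASE_URL works directly with psql
sudo -u authentik psql "postgresql://authentik@/authentik?host=/var/run/postgresql" -c 'SELECT 1;'
# Valkey: same socket and database index as CACHE_URL
sudo -u authentik redis-cli -s /var/run/valkey/valkey.sock -n 1 ping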
User Management
# Ansible user setup
- name: Add application user to infrastructure groups
  user:
    name: "{{ app_user }}"
    groups:
      - postgres   # For database access
      - valkey     # For cache access
    append: true
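Afterwards, membership is easy to confirm on the host (output illustrative):
id authentik
# uid=963(authentik) gid=963(authentik) groups=963(authentik),968(postgres),965(valkey)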
Consequences
Positive
- Security: Eliminated network attack vectors for databases
- Performance: Measurably faster database and cache operations
- Reliability: More stable connections than network-based
- Simplicity: Simpler configuration than network + authentication
Negative
- Container Complexity: Requires careful container user/group management
- Learning Curve: Less familiar than standard TCP connections
- Port Forwarding: Cannot use standard port forwarding for debugging
Mitigation Strategies
- Documentation: Comprehensive guides for Unix socket configuration
- Testing: Automated tests verify socket connectivity
- Tooling: Helper scripts for debugging socket connections
Technical Implementation
# Test socket connectivity
sudo -u authentik psql -h /var/run/postgresql -U authentik -d authentik
sudo -u authentik redis-cli -s /var/run/valkey/valkey.sock ping
# Container user verification
podman exec authentik-server id
# uid=963(authentik) gid=963(authentik) groups=963(authentik),968(postgres),965(valkey)
Alternatives Considered
- TCP with Authentication: Rejected due to network exposure
- TCP with TLS: Rejected due to certificate complexity and performance overhead
- Shared Memory: Rejected due to implementation complexity
ADR-003: Podman + systemd Container Orchestration
Technical Story: Container orchestration solution for secure application deployment with systemd integration.
Context
Container orchestration options for a single-node infrastructure:
- Docker + Docker Compose: Traditional container orchestration
- Podman + systemd: Rootless containers with native systemd integration
- Kubernetes: Full orchestration platform (overkill for single node)
- Nomad: HashiCorp orchestration solution
Decision
We will use Podman with systemd integration (Quadlet) for container orchestration, deployed as system-level services (rootful containers running as dedicated users).
Rationale
Security Advantages
- No Daemon Required: No privileged daemon attack surface
# Docker: Requires root daemon
sudo systemctl status docker
# Podman: Daemonless operation
podman ps   # No daemon needed
- Dedicated Service Users: Containers run as dedicated system users (not root)
- Group-Based Access Control: Unix group membership controls infrastructure access
- SELinux Integration: Better SELinux support than Docker
systemd Integration Benefits
- Native Service Management: Containers as system-level systemd services
# Quadlet file: /etc/containers/systemd/authentik.pod
[Unit]
Description=Authentik Authentication Pod
[Pod]
PublishPort=0.0.0.0:9000:9000
ShmSize=256m
[Service]
Restart=always
TimeoutStartSec=900
[Install]
WantedBy=multi-user.target
- Dependency Management: systemd handles service dependencies
- Resource Control: systemd resource limits and monitoring
- Logging Integration: journald for centralized logging
Operational Excellence
- Familiar Tooling: Standard systemd commands
systemctl status authentik-pod
systemctl restart authentik-server
journalctl -u authentik-server -f
- Boot Integration: Services start automatically at system boot
- Resource Monitoring: systemd resource tracking
- Configuration Management: Declarative Quadlet files
Performance Benefits
- Lower Overhead: No daemon overhead for container management
- Direct Kernel Access: Better performance than daemon-based solutions
- Resource Efficiency: More efficient resource utilization
Implementation Architecture
┌────────────────────────────────────────────────────────────────┐
│            systemd System Services (/system.slice/)            │
│                                                                │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│ │  authentik-pod   │ │ authentik-server │ │ authentik-worker │ │
│ │     .service     │ │     .service     │ │     .service     │ │
│ └──────────────────┘ └──────────────────┘ └──────────────────┘ │
│          │                    │                    │           │
│          └────────────────────┼────────────────────┘           │
│                               │                                │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │            Podman Pod (rootful, dedicated user)            │ │
│ │                                                            │ │
│ │ ┌───────────────────┐ ┌──────────────────────────────────┐ │ │
│ │ │ Server Container  │ │ Worker Container                 │ │ │
│ │ │ User: 966:966     │ │ User: 966:966                    │ │ │
│ │ │ Groups: 961,962   │ │ Groups: 961,962                  │ │ │
│ │ │ (valkey,postgres) │ │ (valkey,postgres)                │ │ │
│ │ └───────────────────┘ └──────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────┘
                                │
                                │ Group-based access to infrastructure
                                ▼
┌────────────────────────────────────────────────────────────────┐
│                    Infrastructure Services                     │
│  PostgreSQL: /var/run/postgresql (postgres:postgres-clients)   │
│  Valkey:     /var/run/valkey     (valkey:valkey-clients)       │
└────────────────────────────────────────────────────────────────┘
Quadlet Configuration
# Pod configuration (authentik.pod)
[Unit]
Description=Authentik Authentication Pod
[Pod]
PublishPort=127.0.0.1:9000:9000
ShmSize=256m
[Service]
Restart=always
[Install]
WantedBy=default.target
# Container configuration (authentik-server.container)
[Unit]
Description=Authentik Server Container
After=authentik-pod.service
Requires=authentik-pod.service
[Container]
ContainerName=authentik-server
Image=ghcr.io/goauthentik/server:2025.10
Pod=authentik.pod
EnvironmentFile=/opt/authentik/.env
User=966:966
PodmanArgs=--group-add 962 --group-add 961
# Volume mounts for sockets and data
Volume=/opt/authentik/media:/media
Volume=/opt/authentik/data:/data
Volume=/var/run/postgresql:/var/run/postgresql:Z
Volume=/var/run/valkey:/var/run/valkey:Z
[Service]
Restart=always
TimeoutStartSec=300
[Install]
WantedBy=multi-user.target
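Quadlet units are produced by a systemd generator rather than enabled by hand, so deploying a new or changed file is a reload-and-start; a sketch:
# Regenerate units from /etc/containers/systemd/, then start the pod
systemctl daemon-reload
systemctl start authentik-pod.service
systemctl status authentik-server.service --no-pager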
User Management Strategy
# Ansible implementation
- name: Create service user
  user:
    name: authentik
    group: authentik
    groups: [postgres-clients, valkey-clients]
    system: true
    shell: /bin/bash
    home: /opt/authentik
    create_home: true
    append: true
Note: Infrastructure roles (PostgreSQL, Valkey) export client group GIDs as Ansible facts (postgresql_client_group_gid, valkey_client_group_gid) which are consumed by application container templates for dynamic --group-add arguments.
Consequences
Positive
- Security: Eliminated privileged daemon attack surface
- Integration: Seamless systemd integration for management
- Performance: Lower overhead than daemon-based solutions
- Reliability: systemd's proven service management
- Monitoring: Standard systemd monitoring and logging
Negative
- Learning Curve: Different from Docker Compose workflows
- Tooling: Ecosystem less mature than Docker
- Documentation: Fewer online resources and examples
Mitigation Strategies
- Documentation: Comprehensive internal documentation
- Training: Team training on Podman/systemd workflows
- Tooling: Helper scripts for common operations
Technical Implementation
# Container management (system scope)
systemctl status authentik-pod
systemctl restart authentik-server
podman ps
podman logs authentik-server
# Resource monitoring
systemctl show authentik-server --property=MemoryCurrent
journalctl -u authentik-server -f
# Verify container groups
pgrep -f authentik-server | head -1 | \
  xargs -I {} grep Groups /proc/{}/status
# Output: Groups: 961 962 966
Alternatives Considered
- Docker + Docker Compose: Rejected due to security concerns (privileged daemon)
- Kubernetes: Rejected as overkill for single-node deployment
- Nomad: Rejected to maintain consistency with systemd ecosystem
OAuth/OIDC and Forward Authentication Security Model
Technical Story: Centralized authentication and authorization for multiple services using industry-standard OAuth2/OIDC protocols where supported, with forward authentication as a fallback.
Context
Authentication strategies for multiple services:
- Per-Service Authentication: Each service handles its own authentication
- Shared Database: Services share authentication database
- OAuth2/OIDC Integration: Services implement standard OAuth2/OIDC clients
- Forward Authentication: Reverse proxy handles authentication for services without OAuth support
Decision
We will use OAuth2/OIDC integration as the primary authentication method for services that support it, and forward authentication for services that do not support native OAuth2/OIDC integration.
Rationale
OAuth/OIDC as Primary Method
Security Benefits:
- Standard Protocol: Industry-standard authentication flow (RFC 6749, RFC 7636)
- Token-Based Security: Secure JWT tokens with cryptographic signatures
- Proper Session Management: Native application session handling with refresh tokens
- Scope-Based Authorization: Fine-grained permission control via OAuth scopes
- PKCE Support: Protection against authorization code interception attacks
Integration Benefits:
- Native Support: Applications designed for OAuth/OIDC work seamlessly
- Better UX: Proper redirect flows, logout handling, and token refresh
- API Access: OAuth tokens enable secure API integrations
- Standard Claims: OpenID Connect user info endpoint provides standardized user data
- Multi-Application SSO: Proper single sign-on with token sharing
Examples: Nextcloud, Gitea, Grafana, and many other modern applications
Forward Auth as Fallback
Use Cases:
- Services without OAuth/OIDC support
- Legacy applications that cannot be modified
- Static sites requiring authentication
- Simple internal tools
Security Benefits:
- Zero Application Changes: Protect existing services without modification
- Header-Based Identity: Simple identity propagation to backend
- Transparent Protection: Services receive pre-authenticated requests
Limitations:
- Non-Standard: Not using industry-standard authentication protocols
- Proxy Dependency: All requests must flow through authenticating proxy
- Limited Logout: Complex logout scenarios across services
- Header Trust: Backend must trust proxy-provided headers
Shared Benefits (Both Methods)
- Single Point of Control: Centralized authentication policy via Authentik
- Consistent Security: Same authentication provider across all services
- Multi-Factor Authentication: MFA applied consistently via Authentik
- Audit Trail: Centralized authentication logging
- User Management: One system for all user administration
Implementation Architecture
OAuth/OIDC Flow (Primary Method)
┌─────────────┐   ┌─────────────┐   ┌─────────────┐
│    User     │   │   Service   │   │  Authentik  │
│             │   │ (OAuth App) │   │    (IdP)    │
└──────┬──────┘   └──────┬──────┘   └──────┬──────┘
       │                 │                 │
       │  Access Service │                 │
       │────────────────▶│                 │
       │                 │                 │
       │                 │  No session     │
       │  302 → OAuth    │                 │
       │◀────────────────│                 │
       │                 │                 │
       │  GET /authorize?client_id=...&redirect_uri=...
       │──────────────────────────────────▶│
       │                 │                 │
       │  Login form (if not authenticated)│
       │◀──────────────────────────────────│
       │                 │                 │
       │  Credentials    │                 │
       │──────────────────────────────────▶│
       │                 │                 │
       │  302 → callback?code=AUTH_CODE    │
       │◀──────────────────────────────────│
       │                 │                 │
       │  GET /callback?code=AUTH_CODE     │
       │────────────────▶│                 │
       │                 │                 │
       │                 │  POST /token    │
       │                 │  code=AUTH_CODE │
       │                 │────────────────▶│
       │                 │                 │
       │                 │  access_token   │
       │                 │  id_token (JWT) │
       │                 │◀────────────────│
       │                 │                 │
       │  Set-Cookie     │  GET /userinfo  │
       │ 302 → /dashboard│────────────────▶│
       │◀────────────────│                 │
       │                 │  User claims    │
       │                 │◀────────────────│
       │                 │                 │
       │  GET /dashboard │                 │
       │────────────────▶│                 │
       │                 │                 │
       │  Dashboard      │                 │
       │◀────────────────│                 │
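The POST /token step in the diagram is a plain form-encoded request. As a curl sketch against Authentik's token endpoint (the code value and client credentials are illustrative):
curl -s https://auth.jnss.me/application/o/token/ \
  -d grant_type=authorization_code \
  -d code=AUTH_CODE \
  -d redirect_uri=https://cloud.jnss.me/apps/oidc_login/oidc \
  -d client_id=nextcloud-client-id \
  -d client_secret=secret-from-authentik
# Returns JSON with access_token, id_token (JWT) and, if enabled, refresh_token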
Forward Auth Flow (Fallback Method)
┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐
│    User     │   │    Caddy    │   │  Authentik  │   │   Service   │
│             │   │   (Proxy)   │   │  (Forward)  │   │  (Backend)  │
└──────┬──────┘   └──────┬──────┘   └──────┬──────┘   └──────┬──────┘
       │                 │                 │                 │
       │  GET /          │                 │                 │
       │────────────────▶│                 │                 │
       │                 │                 │                 │
       │                 │  Forward Auth   │                 │
       │                 │────────────────▶│                 │
       │                 │                 │                 │
       │                 │ 401 Unauthorized│                 │
       │                 │◀────────────────│                 │
       │                 │                 │                 │
       │  302 → /auth    │                 │                 │
       │◀────────────────│                 │                 │
       │                 │                 │                 │
       │  Login form     │                 │                 │
       │──────────────────────────────────▶│                 │
       │                 │                 │                 │
       │  Credentials    │                 │                 │
       │──────────────────────────────────▶│                 │
       │                 │                 │                 │
       │  Set-Cookie     │                 │                 │
       │◀──────────────────────────────────│                 │
       │                 │                 │                 │
       │  GET /          │                 │                 │
       │────────────────▶│                 │                 │
       │                 │                 │                 │
       │                 │  Forward Auth   │                 │
       │                 │────────────────▶│                 │
       │                 │                 │                 │
       │                 │  200 + Headers  │                 │
       │                 │◀────────────────│                 │
       │                 │                 │                 │
       │                 │  Proxy + Headers│                 │
       │                 │──────────────────────────────────▶│
       │                 │                 │                 │
       │                 │  Response       │                 │
       │                 │◀──────────────────────────────────│
       │                 │                 │                 │
       │  Content        │                 │                 │
       │◀────────────────│                 │                 │
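The Forward Auth hop can be exercised directly against the outpost endpoint to observe both outcomes from the diagram; a sketch (exact headers depend on the provider configuration):
# Without a session cookie: the 401 branch of the diagram
curl -s -o /dev/null -w '%{http_code}\n' https://auth.jnss.me/outpost.goauthentik.io/auth/caddy
# With a valid Authentik session cookie the same request returns 200,
# plus the Remote-* headers that Caddy copies to the backend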
OAuth/OIDC Configuration Examples
Nextcloud OAuth Configuration
// Nextcloud config.php
'oidc_login_provider_url' => 'https://auth.jnss.me/application/o/nextcloud/',
'oidc_login_client_id' => 'nextcloud-client-id',
'oidc_login_client_secret' => 'secret-from-authentik',
'oidc_login_auto_redirect' => true,
'oidc_login_end_session_redirect' => true,
'oidc_login_button_text' => 'Login with SSO',
'oidc_login_hide_password_form' => true,
'oidc_login_use_id_token' => true,
'oidc_login_attributes' => [
'id' => 'preferred_username',
'name' => 'name',
'mail' => 'email',
'groups' => 'groups',
],
'oidc_login_default_group' => 'users',
'oidc_login_use_external_storage' => false,
'oidc_login_scope' => 'openid profile email groups',
'oidc_login_proxy_ldap' => false,
'oidc_login_disable_registration' => false,
'oidc_login_redir_fallback' => true,
'oidc_login_tls_verify' => true,
Gitea OAuth Configuration
# Gitea app.ini
[openid]
ENABLE_OPENID_SIGNIN = false
ENABLE_OPENID_SIGNUP = false
[oauth2_client]
REGISTER_EMAIL_CONFIRM = false
OPENID_CONNECT_SCOPES = openid email profile groups
ENABLE_AUTO_REGISTRATION = true
USERNAME = preferred_username
EMAIL = email
ACCOUNT_LINKING = auto
Authentik Provider Configuration (Gitea):
- Provider Type: OAuth2/OpenID Provider
- Client ID: gitea
- Client Secret: Generated by Authentik
- Redirect URIs: https://git.jnss.me/user/oauth2/Authentik/callback
- Scopes: openid, profile, email, groups
Authentik OAuth2 Provider Settings
# OAuth2/OIDC Provider configuration in Authentik
name: "Nextcloud OAuth Provider"
authorization_flow: "default-authorization-flow"
client_type: "confidential"
client_id: "nextcloud-client-id"
redirect_uris: "https://cloud.jnss.me/apps/oidc_login/oidc"
signing_key: "authentik-default-key"
property_mappings:
- "authentik default OAuth Mapping: OpenID 'openid'"
- "authentik default OAuth Mapping: OpenID 'email'"
- "authentik default OAuth Mapping: OpenID 'profile'"
- "Custom: Groups" # Maps user groups to 'groups' claim
Forward Auth Configuration Examples
Caddy Configuration for Forward Auth
# whoami service with forward authentication
whoami.jnss.me {
    # Forward authentication to Authentik
    forward_auth https://auth.jnss.me {
        uri /outpost.goauthentik.io/auth/caddy
        copy_headers Remote-User Remote-Name Remote-Email Remote-Groups
    }
    # Backend service (receives authenticated requests)
    reverse_proxy localhost:8080
}
Authentik Proxy Provider Configuration
# Authentik Proxy Provider for forward auth
name: "Whoami Forward Auth"
type: "proxy"
authorization_flow: "default-authorization-flow"
external_host: "https://whoami.jnss.me"
internal_host: "http://localhost:8080"
skip_path_regex: "^/(health|metrics).*"
mode: "forward_single" # Single application mode
Service Integration (Forward Auth)
Services receive authentication information via HTTP headers:
# Example service code (Python Flask)
from flask import Flask, request, render_template

app = Flask(__name__)

@app.route('/')
def index():
    # Identity headers injected by the authenticating proxy
    username = request.headers.get('Remote-User')
    name = request.headers.get('Remote-Name')
    email = request.headers.get('Remote-Email')
    groups = request.headers.get('Remote-Groups', '').split(',')
    return render_template('index.html',
                           username=username,
                           name=name,
                           email=email,
                           groups=groups)
Authorization Policies
Both OAuth and Forward Auth support Authentik authorization policies:
# Example authorization policy in Authentik
policy_bindings:
  - policy: "group_admins_only"
    target: "nextcloud_oauth_provider"
    order: 0
  - policy: "require_mfa"
    target: "gitea_oauth_provider"
    order: 1
  - policy: "internal_network_only"
    target: "whoami_proxy_provider"
    order: 0
Decision Matrix: OAuth/OIDC vs Forward Auth
| Criteria | OAuth/OIDC | Forward Auth |
|---|---|---|
| Application Support | Requires native OAuth/OIDC support | Any application |
| Protocol Standard | Industry standard (RFC 6749, 7636) | Proprietary/custom |
| Token Management | Native refresh tokens, proper expiry | Session-based only |
| Logout Handling | Proper logout flow | Complex, proxy-dependent |
| API Access | Full API support via tokens | Header-only |
| Implementation Effort | Configure OAuth settings | Zero app changes |
| User Experience | Standard OAuth redirects | Transparent |
| Security Model | Token-based with scopes | Header trust model |
| When to Use | Nextcloud, Gitea, modern apps | Static sites, legacy apps, whoami |
Consequences
Positive
- Standards Compliance: OAuth/OIDC uses industry-standard protocols
- Security: Multiple authentication options with appropriate security models
- Flexibility: Right tool for each service (OAuth when possible, forward auth when needed)
- Auditability: Centralized authentication logging via Authentik
- User Experience: Proper SSO across all services
- Token Security: OAuth provides secure token refresh and scope management
- Graceful Degradation: Forward auth available for services without OAuth support
Negative
- Complexity: Need to understand two authentication methods
- Configuration Overhead: OAuth requires per-service configuration
- Single Point of Failure: Authentik failure affects all services
- Learning Curve: Team must understand OAuth flows and forward auth model
Mitigation Strategies
- Documentation: Clear decision guide for choosing OAuth vs forward auth
- Templates: Reusable OAuth configuration templates for common services
- High Availability: Robust deployment and monitoring of Authentik
- Monitoring: Comprehensive monitoring of both authentication flows
- Testing: Automated tests for authentication flows
Security Considerations
OAuth/OIDC Security
# Authentik OAuth2 Provider security settings
authorization_code_validity: 60 # 1 minute
access_code_validity: 3600 # 1 hour
refresh_code_validity: 2592000 # 30 days
include_claims_in_id_token: true
signing_key: "authentik-default-key"
sub_mode: "hashed_user_id"
issuer_mode: "per_provider"
Best Practices:
- Use PKCE for all OAuth flows (protection against interception)
- Implement proper token rotation (refresh tokens expire and rotate)
- Validate aud (audience) and iss (issuer) claims in JWT tokens
- Use short-lived access tokens (1 hour)
- Store client secrets securely (Ansible Vault)
Forward Auth Security
# Authentik Proxy Provider security settings
token_validity: 3600 # 1 hour session
cookie_domain: ".jnss.me"
skip_path_regex: "^/(health|metrics|static).*"
Best Practices:
- Trust only Authentik-provided headers
- Validate that the Remote-User header exists before granting access
- Use HTTPS for all forward auth endpoints
- Implement proper session timeouts
- Strip user-provided authentication headers at proxy
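The last point deserves an explicit test: a client-supplied Remote-User must never reach the backend. Against the whoami service, which echoes its request headers, the check looks like this (sketch):
# Attempt header injection; the proxy must strip or overwrite it
curl -s -H 'Remote-User: attacker' https://whoami.jnss.me/ | grep -i 'remote-user'
# Expect a redirect to login or the Authentik-set identity, never "attacker"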
Access Control
- Group-Based Authorization: Users assigned to groups, groups to applications
- Policy Engine: Authentik policies for fine-grained access control
- MFA Requirements: Multi-factor authentication for sensitive services
- IP-Based Restrictions: Geographic or network-based access control
- Time-Based Access: Temporary access grants via policies
Audit Logging
{
  "timestamp": "2025-12-15T10:30:00Z",
  "event": "oauth_authorization",
  "user": "john.doe",
  "application": "nextcloud",
  "scopes": ["openid", "email", "profile", "groups"],
  "ip": "192.168.1.100",
  "user_agent": "Mozilla/5.0..."
}
Implementation Examples by Service Type
OAuth/OIDC Services (Primary Method)
Nextcloud:
cloud.jnss.me {
    reverse_proxy localhost:8080
}
# OAuth configured within Nextcloud application
Gitea:
git.jnss.me {
    reverse_proxy localhost:3000
}
# OAuth configured within Gitea application settings
Forward Auth Services (Fallback Method)
Whoami (test/demo service):
whoami.jnss.me {
    forward_auth https://auth.jnss.me {
        uri /outpost.goauthentik.io/auth/caddy
        copy_headers Remote-User Remote-Name Remote-Email Remote-Groups
    }
    reverse_proxy localhost:8080
}
Static Documentation Site:
docs.jnss.me {
    forward_auth https://auth.jnss.me {
        uri /outpost.goauthentik.io/auth/caddy
        copy_headers Remote-User Remote-Groups
    }
    root * /var/www/docs
    file_server
}
Internal API (no OAuth support):
api.jnss.me {
    forward_auth https://auth.jnss.me {
        uri /outpost.goauthentik.io/auth/caddy
        copy_headers Remote-User Remote-Email Remote-Groups
    }
    reverse_proxy localhost:3000
}
Selective Protection (Public + Protected Paths)
app.jnss.me {
    # Public endpoints (no auth required)
    handle /health {
        reverse_proxy localhost:8080
    }
    handle /metrics {
        reverse_proxy localhost:8080
    }
    handle /public/* {
        reverse_proxy localhost:8080
    }
    # Protected endpoints (forward auth)
    handle /admin/* {
        forward_auth https://auth.jnss.me {
            uri /outpost.goauthentik.io/auth/caddy
            copy_headers Remote-User Remote-Groups
        }
        reverse_proxy localhost:8080
    }
    # Default: protected
    handle {
        forward_auth https://auth.jnss.me {
            uri /outpost.goauthentik.io/auth/caddy
            copy_headers Remote-User Remote-Groups
        }
        reverse_proxy localhost:8080
    }
}
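A quick spot-check that the split behaves as intended (sketch; status codes illustrative):
curl -s -o /dev/null -w '%{http_code}\n' https://app.jnss.me/health   # 200, served directly
curl -s -o /dev/null -w '%{http_code}\n' https://app.jnss.me/admin/   # 302, redirected to Authentik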
Alternatives Considered
- OAuth2/OIDC Only: Rejected because many services don't support OAuth natively
- Forward Auth Only: Rejected because it doesn't leverage native OAuth support in modern apps
- Per-Service Authentication: Rejected due to management overhead and inconsistent security
- Shared Database: Rejected due to tight coupling between services
- VPN-Based Access: Rejected due to operational complexity for web services
- SAML: Rejected in favor of modern OAuth2/OIDC standards
Rootful Containers with Infrastructure Fact Pattern
Technical Story: Enable containerized applications to access native infrastructure services (PostgreSQL, Valkey) via Unix sockets with group-based permissions.
Context
Containerized applications need to access infrastructure services (PostgreSQL, Valkey) through Unix sockets with filesystem-based permission controls. The permission model requires:
- Socket directories owned by service groups (postgres-clients, valkey-clients)
- Application users added to these groups for access
- Container processes must preserve group membership to access sockets
Two approaches were evaluated:
- Rootless containers (user namespace): Containers run in user namespace with UID/GID remapping
- Rootful containers (system services): Containers run as dedicated system users without namespace isolation
Decision
We will use rootful containers deployed as system-level systemd services with an Infrastructure Fact Pattern where infrastructure roles export client group GIDs as Ansible facts for application consumption.
Rationale
Why Rootful Succeeds
Direct UID/GID Mapping:
# Host: authentik user UID 966, groups: 966 (authentik), 961 (valkey-clients), 962 (postgres-clients)
# Container User=966:966 with PodmanArgs=--group-add 961 --group-add 962
# Inside container:
id
# uid=966(authentik) gid=966(authentik) groups=966(authentik),961(valkey-clients),962(postgres-clients)
# Socket access works:
ls -l /var/run/postgresql/.s.PGSQL.5432
# srwxrwx--- 1 postgres postgres-clients 0 ... /var/run/postgresql/.s.PGSQL.5432
Group membership preserved: Container process has GIDs 961 and 962, matching socket group ownership.
Why Rootless Failed (Discarded Approach)
User Namespace UID/GID Remapping:
# Host: authentik user UID 100000, subuid range 200000-265535
# Container User=%i:%i with --userns=host --group-add=keep-groups
# User namespace remaps:
# Host UID 100000 → Container UID 100000 (root in namespace)
# Host GID 961 → Container GID 200961 (remapped into subgid range)
# Host GID 962 → Container GID 200962 (remapped into subgid range)
# Socket ownership on host:
# srwxrwx--- 1 postgres postgres-clients (GID 962)
# Container process groups: 200961, 200962 (remapped)
# Socket expects: GID 962 (not remapped)
# Result: Permission denied ❌
Root cause: User namespace supplementary group remapping breaks group-based socket access even with --userns=host, --group-add=keep-groups, and Annotation=run.oci.keep_original_groups=1.
Infrastructure Fact Pattern
Infrastructure Roles Export GIDs
Infrastructure services create client groups and export their GIDs as Ansible facts:
# PostgreSQL role: roles/postgresql/tasks/main.yml
- name: Create PostgreSQL client access group
  group:
    name: postgres-clients
    system: true

- name: Get PostgreSQL client group GID
  shell: "getent group postgres-clients | cut -d: -f3"
  register: postgresql_client_group_lookup
  changed_when: false

- name: Set PostgreSQL client group GID as fact
  set_fact:
    postgresql_client_group_gid: "{{ postgresql_client_group_lookup.stdout }}"

# Valkey role: roles/valkey/tasks/main.yml
- name: Create Valkey client access group
  group:
    name: valkey-clients
    system: true

- name: Get Valkey client group GID
  shell: "getent group valkey-clients | cut -d: -f3"
  register: valkey_client_group_lookup
  changed_when: false

- name: Set Valkey client group GID as fact
  set_fact:
    valkey_client_group_gid: "{{ valkey_client_group_lookup.stdout }}"
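On the host, these facts simply mirror what getent reports (output illustrative):
getent group postgres-clients valkey-clients
# postgres-clients:x:962:authentik
# valkey-clients:x:961:authentik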
Application Roles Consume Facts
Application roles validate and consume infrastructure facts:
# Authentik role: roles/authentik/tasks/main.yml
- name: Validate infrastructure facts are available
  assert:
    that:
      - postgresql_client_group_gid is defined
      - valkey_client_group_gid is defined
    fail_msg: |
      Required infrastructure facts are not available.
      Ensure PostgreSQL and Valkey roles have run first.

- name: Create authentik user with infrastructure groups
  user:
    name: authentik
    groups: [postgres-clients, valkey-clients]
    append: true

# Container template: roles/authentik/templates/authentik-server.container
[Container]
User={{ authentik_uid }}:{{ authentik_gid }}
PodmanArgs=--group-add {{ postgresql_client_group_gid }} --group-add {{ valkey_client_group_gid }}
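On a deployed host, the rendered unit carries the resolved numeric IDs and can be checked directly (sketch; 961/962 are this host's values):
grep -E '^(User|PodmanArgs)=' /etc/containers/systemd/authentik-server.container
# User=966:966
# PodmanArgs=--group-add 962 --group-add 961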
Implementation Details
System-Level Deployment
# Quadlet files deployed to /etc/containers/systemd/ (not ~/.config/)
# Pod: /etc/containers/systemd/authentik.pod
[Unit]
Description=Authentik Authentication Pod
[Pod]
PublishPort=0.0.0.0:9000:9000
ShmSize=256m
[Service]
Restart=always
[Install]
WantedBy=multi-user.target # System target, not default.target
# Container: /etc/containers/systemd/authentik-server.container
[Container]
User=966:966
PodmanArgs=--group-add 962 --group-add 961
Volume=/var/run/postgresql:/var/run/postgresql:Z
Volume=/var/run/valkey:/var/run/valkey:Z
Service Management
# System scope (not user scope)
systemctl status authentik-pod
systemctl restart authentik-server
journalctl -u authentik-server -f
# Verify container location
systemctl status authentik-server | grep CGroup
# CGroup: /system.slice/authentik-server.service ✓
Special Case: Valkey Socket Group Fix
Valkey doesn't natively support socket group configuration (unlike PostgreSQL's unix_socket_group). A helper service ensures correct socket permissions:
# /etc/systemd/system/valkey-socket-fix.service
[Unit]
Description=Fix Valkey socket group ownership and permissions
BindsTo=valkey.service
After=valkey.service
[Service]
Type=oneshot
ExecStart=/bin/sh -c 'i=0; while [ ! -S /var/run/valkey/valkey.sock ] && [ $i -lt 100 ]; do sleep 0.1; i=$((i+1)); done'
ExecStart=/bin/chgrp valkey-clients /var/run/valkey/valkey.sock
ExecStart=/bin/chmod 770 /var/run/valkey/valkey.sock
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
Triggered by Valkey service:
# /etc/systemd/system/valkey.service (excerpt)
[Unit]
Wants=valkey-socket-fix.service
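Restarting Valkey should pull the fix unit in and leave the socket group-accessible (sketch):
systemctl restart valkey.service
stat -c '%U:%G %a' /var/run/valkey/valkey.sock
# valkey:valkey-clients 770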
Consequences
Positive
- Socket Access Works: Group-based permissions function correctly
- Security: Containers run as dedicated users (not root), no privileged daemon
- Portability: Dynamic GID facts work across different hosts
- Consistency: Same pattern for all containerized applications
- Simplicity: No user namespace complexity, standard systemd service management
Negative
- Not "Pure" Rootless: Containers require root for systemd service deployment
- Different from Docker: Less familiar pattern than rootless user services
Neutral
- System vs User Scope: Different commands (systemctl vs systemctl --user) but equally capable
- Deployment Location: /etc/containers/systemd/ vs ~/.config/ but same Quadlet functionality
Validation
# Verify service location
systemctl status authentik-server | grep CGroup
# → /system.slice/authentik-server.service ✓
# Verify process groups
pgrep -f authentik | head -1 | \
  xargs -I {} grep Groups /proc/{}/status
# → Groups: 961 962 966 ✓
# Verify socket permissions
ls -l /var/run/postgresql/.s.PGSQL.5432
# → srwxrwx--- postgres postgres-clients ✓
ls -l /var/run/valkey/valkey.sock
# → srwxrwx--- valkey valkey-clients ✓
# Verify HTTP endpoint
curl -I http://127.0.0.1:9000/
# → HTTP/1.1 302 Found ✓
Alternatives Considered
- Rootless with user namespace - Discarded due to GID remapping breaking group-based socket access
- TCP-only connections - Rejected to maintain Unix socket security and performance benefits
- Hardcoded GIDs - Rejected for portability; facts provide dynamic resolution
- Directory permissions (777) - Rejected for security; group-based access is more restrictive. (This was later reversed: permissions were set to 777 after Nextcloud switched from root to www-data, which broke the group-based scheme.)
ADR-007: Multi-Environment Infrastructure Architecture
Date: December 2025
Status: Accepted
Context: Separation of homelab services from production client projects
Decision
Rick-infra will manage two separate environments with different purposes and uptime requirements:
- Homelab Environment (arch-vps)
  - Purpose: Personal services and experimentation
  - Infrastructure: Full stack (PostgreSQL, Valkey, Podman, Caddy)
  - Services: Authentik, Nextcloud, Gitea
  - Uptime requirement: Best effort
- Production Environment (mini-vps)
  - Purpose: Client projects requiring high uptime
  - Infrastructure: Minimal (Caddy only)
  - Services: Sigvild Gallery
  - Uptime requirement: High availability
Rationale
Separation of Concerns:
- Personal experiments don't affect client services
- Client services isolated from homelab maintenance
- Clear distinction between environments in code
Infrastructure Optimization:
- Production runs minimal services (no PostgreSQL/Valkey overhead)
- Homelab can be rebooted/upgraded without affecting clients
- Cost optimization: smaller VPS for production
Operational Flexibility:
- Different backup strategies per environment
- Different monitoring/alerting levels
- Independent deployment schedules
Implementation
Variable Organization:
rick-infra/
├── group_vars/
│   └── production/        # Production environment config
│       ├── main.yml
│       └── vault.yml
├── host_vars/
│   └── arch-vps/          # Homelab host config
│       ├── main.yml
│       └── vault.yml
└── playbooks/
    ├── homelab.yml        # Homelab deployment
    ├── production.yml     # Production deployment
    └── site.yml           # Orchestrates both
Playbook Structure:
- site.yml imports both homelab.yml and production.yml
- Each playbook manually loads variables (Ansible 2.20 workaround)
- Services deploy only to their designated environment
Inventory Groups:
homelab:
  hosts:
    arch-vps:
      ansible_host: 69.62.119.31
production:
  hosts:
    mini-vps:
      ansible_host: 72.62.91.251
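With this layout, each environment deploys independently (sketch; the inventory filename is illustrative):
ansible-playbook -i inventory.yml playbooks/homelab.yml      # arch-vps only
ansible-playbook -i inventory.yml playbooks/production.yml   # mini-vps only
ansible-playbook -i inventory.yml playbooks/site.yml         # both environments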
Migration Example
Sigvild Gallery Migration (December 2025):
- From: arch-vps (homelab)
- To: mini-vps (production)
- Reason: Client project requiring higher uptime
- Process:
  1. Created backup on arch-vps
  2. Deployed to mini-vps with automatic restore
  3. Updated DNS (5 min downtime)
  4. Removed from arch-vps configuration
Consequences
Positive:
- Clear separation of personal vs. client services
- Reduced blast radius for experiments
- Optimized resource usage per environment
- Independent scaling and management
Negative:
- Increased complexity in playbook organization
- Need to manage multiple VPS instances
- Ansible 2.20 variable loading requires workarounds
- Duplicate infrastructure code (Caddy on both)
Neutral:
- Services can be migrated between environments with minimal friction
- Backup/restore procedures work across environments
- Hybrid variable layout: group_vars/ for the production environment, host_vars/ for the homelab host
Future Considerations
- Consider grouping multiple client projects on production VPS
- Evaluate if homelab needs full infrastructure stack
- Monitor for opportunities to share infrastructure between environments
- Document migration procedures for moving services between environments