Joakim e8b76c6a72 Update authentication documentation to reflect OAuth/OIDC as primary method
- Update architecture-decisions.md: Change decision to OAuth/OIDC primary, forward auth fallback
  - Add comprehensive OAuth/OIDC and forward auth flow diagrams
  - Add decision matrix comparing both authentication methods
  - Include real examples: Nextcloud/Gitea OAuth configs, whoami forward auth
  - Update rationale to emphasize OAuth/OIDC security and standards benefits

- Update authentication-architecture.md: Align with new OAuth-first approach
  - Add 'Choosing the Right Pattern' section with clear decision guidance
  - Swap pattern order: OAuth/OIDC (Pattern 1), Forward Auth (Pattern 2)
  - Update Example 1: Change Gitea from forward auth to OAuth/OIDC integration
  - Add emphasis on primary vs fallback methods throughout

- Update authentik-deployment-guide.md: Reflect OAuth/OIDC preference
  - Update overview to mention OAuth2/OIDC provider and forward auth fallback
  - Add decision guidance to service integration examples
  - Reorder examples: Nextcloud OAuth (primary), forward auth (fallback)
  - Clarify forward auth should only be used for services without OAuth support

This update ensures all authentication documentation consistently reflects the
agreed architectural decision: use OAuth/OIDC when services support it
(Nextcloud, Gitea, modern apps), and only use forward auth as a fallback for
legacy applications, static sites, or simple tools without OAuth capabilities.
2025-12-15 00:25:24 +01:00


# Architecture Decision Records (ADR)
This document records the significant architectural decisions made in the rick-infra project.

---
## Unix Socket IPC Architecture
### Context
Containerized applications need to communicate with database and cache services. Communication methods include:
1. **Network TCP/IP**: Standard network protocols
2. **Unix Domain Sockets**: Filesystem-based IPC
### Decision
We will use **Unix domain sockets** for all communication between applications and infrastructure services.
### Rationale
#### Security Benefits
- **No Network Exposure**: Infrastructure services bind only to Unix sockets
```
# PostgreSQL configuration (postgresql.conf)
listen_addresses = '' # No TCP binding
unix_socket_directories = '/var/run/postgresql'
# Valkey configuration (valkey.conf)
# Disable TCP port (valkey.conf does not accept trailing comments)
port 0
unixsocket /var/run/valkey/valkey.sock
```
- **Filesystem Permissions**: Access controlled by Unix file permissions
```bash
srwxrwx--- 1 postgres postgres 0 /var/run/postgresql/.s.PGSQL.5432
srwxrwx--- 1 valkey valkey 0 /var/run/valkey/valkey.sock
```
- **Group-Based Access**: Simple group membership controls access
```bash
# Add application user to infrastructure groups
usermod -a -G postgres,valkey authentik
```
- **No Network Scanning**: Services invisible to network reconnaissance
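The permission rule behind these bullets can be checked from Python. A minimal sketch, assuming Linux semantics: `connect()` on a Unix socket needs write permission on the socket file, and the kernel applies the owner bits if the caller owns the file, the group bits if one of its groups matches, and the "other" bits otherwise. `can_access_socket` is a hypothetical helper; it does not model ACLs, SELinux labels, root's override, or the search permission needed on parent directories.

```python
import os
import stat


def can_access_socket(path: str, uid: int, gids: set) -> bool:
    """Approximate the kernel's owner/group/other check for connect().

    Hypothetical helper: connect() on a Unix socket needs write permission
    on the socket file. ACLs, SELinux and directory search bits are ignored.
    """
    st = os.stat(path)
    if not stat.S_ISSOCK(st.st_mode):
        raise ValueError(f"{path} is not a Unix socket")
    if st.st_uid == uid:
        # Owner bits apply exclusively when the caller owns the file
        return bool(st.st_mode & stat.S_IWUSR)
    if st.st_gid in gids:
        # Group bits apply when any of the caller's groups matches
        return bool(st.st_mode & stat.S_IWGRP)
    return bool(st.st_mode & stat.S_IWOTH)
```

For the current process, pass `os.getuid()` and `set(os.getgroups()) | {os.getgid()}`; for the PostgreSQL socket above the path is `/var/run/postgresql/.s.PGSQL.5432`.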
#### Performance Advantages
- **Lower Latency**: Roughly 20% lower latency than TCP loopback (workload-dependent)
- **Higher Throughput**: Up to 40% higher throughput for local communication (workload-dependent)
- **Reduced CPU Overhead**: No network stack processing required
- **Efficient Data Transfer**: Direct kernel-level data copying
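Because the figures above depend on workload and hardware, a quick measurement on the target host is more useful than a quoted percentage. The sketch below times 1-byte ping-pong round trips over an AF_UNIX socket and over TCP loopback on the same machine. `round_trip_seconds` is a hypothetical helper, and Python's interpreter overhead narrows the gap, so treat the printed ratio as indicative only.

```python
import os
import socket
import tempfile
import threading
import time


def _echo(server: socket.socket) -> None:
    # Accept one client and echo bytes back until it disconnects
    conn, _ = server.accept()
    with conn:
        if conn.family == socket.AF_INET:
            conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        while data := conn.recv(64):
            conn.sendall(data)


def round_trip_seconds(family: int, rounds: int = 5000) -> float:
    """Mean time for a 1-byte request/response over a single connection."""
    if family == socket.AF_UNIX:
        address = os.path.join(tempfile.mkdtemp(), "bench.sock")
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(address)
    else:
        server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        server.bind(("127.0.0.1", 0))  # ephemeral port on loopback
        address = server.getsockname()
    server.listen(1)
    worker = threading.Thread(target=_echo, args=(server,), daemon=True)
    worker.start()

    with socket.socket(family, socket.SOCK_STREAM) as client:
        client.connect(address)
        if family == socket.AF_INET:
            # Disable Nagle so small writes are not delayed on TCP
            client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(rounds):
            client.sendall(b"x")
            client.recv(1)
        elapsed = time.perf_counter() - start
    worker.join()  # echo thread exits once the client has closed
    server.close()
    if family == socket.AF_UNIX:
        os.unlink(address)
    return elapsed / rounds


if __name__ == "__main__":
    unix = round_trip_seconds(socket.AF_UNIX)
    tcp = round_trip_seconds(socket.AF_INET)
    print(f"unix {unix * 1e6:.1f} us, tcp {tcp * 1e6:.1f} us, ratio {unix / tcp:.2f}")
```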
#### Operational Benefits
- **Connection Reliability**: No TCP-level failure modes such as port conflicts or firewall rules
- **Resource Monitoring**: Standard filesystem monitoring applies
- **Backup Friendly**: No network configuration to back up or restore
- **Debugging**: Standard filesystem tools for troubleshooting
### Implementation Strategy
#### Container Socket Access
```ini
# Container configuration (Quadlet)
[Container]
# Mount socket directories with proper labels
Volume=/var/run/postgresql:/var/run/postgresql:Z
Volume=/var/run/valkey:/var/run/valkey:Z
# Preserve user namespace and groups
PodmanArgs=--userns=host
Annotation=run.oci.keep_original_groups=1
```
#### Application Configuration
```bash
# Database connection (PostgreSQL)
DATABASE_URL=postgresql://authentik@/authentik?host=/var/run/postgresql
# Cache connection (Redis/Valkey)
CACHE_URL=unix:///var/run/valkey/valkey.sock?db=1&password=secret
```
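Both URLs point at socket files rather than hosts. As a sketch of how they resolve (hypothetical helpers, standard library only): libpq treats a `host` value that starts with `/` as a socket directory and connects to `.s.PGSQL.<port>` inside it, while the `unix://` cache URL carries the socket path and database number directly.

```python
from urllib.parse import parse_qs, urlsplit


def postgres_socket_path(url: str) -> str:
    """Socket file libpq uses for a URL like postgresql://user@/db?host=/dir."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    host = query.get("host", [parts.hostname or ""])[0]
    if not host.startswith("/"):
        raise ValueError("host is not a socket directory")
    port = parts.port or int(query.get("port", ["5432"])[0])
    # libpq looks for <directory>/.s.PGSQL.<port>
    return f"{host.rstrip('/')}/.s.PGSQL.{port}"


def valkey_socket_settings(url: str) -> tuple:
    """(socket path, database number) from unix:///path/to.sock?db=N."""
    parts = urlsplit(url)
    if parts.scheme != "unix":
        raise ValueError("expected a unix:// URL")
    db = int(parse_qs(parts.query).get("db", ["0"])[0])
    return parts.path, db


print(postgres_socket_path("postgresql://authentik@/authentik?host=/var/run/postgresql"))
# → /var/run/postgresql/.s.PGSQL.5432
print(valkey_socket_settings("unix:///var/run/valkey/valkey.sock?db=1&password=secret"))
# → ('/var/run/valkey/valkey.sock', 1)
```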
#### User Management
```yaml
# Ansible user setup
- name: Add application user to infrastructure groups
  user:
    name: "{{ app_user }}"
    groups:
      - postgres # For database access
      - valkey # For cache access
    append: true
```
### Consequences
#### Positive
- **Security**: Eliminated network attack vectors for databases
- **Performance**: Measurably faster database and cache operations
- **Reliability**: Fewer moving parts than network-based connections (no ports, firewall rules, or TLS)
- **Simplicity**: Simpler configuration than TCP listeners with network authentication
#### Negative
- **Container Complexity**: Requires careful container user/group management
- **Learning Curve**: Less familiar than standard TCP connections
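- **Port Forwarding**: Plain TCP port forwarding does not apply; remote debugging requires forwarding the socket itself (e.g. `ssh -L 5432:/var/run/postgresql/.s.PGSQL.5432 host`)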
#### Mitigation Strategies
- **Documentation**: Comprehensive guides for Unix socket configuration
- **Testing**: Automated tests verify socket connectivity
- **Tooling**: Helper scripts for debugging socket connections
### Technical Implementation
```bash
# Test socket connectivity
sudo -u authentik psql -h /var/run/postgresql -U authentik -d authentik
sudo -u authentik redis-cli -s /var/run/valkey/valkey.sock ping
# Container user verification
podman exec authentik-server id
# uid=963(authentik) gid=963(authentik) groups=963(authentik),968(postgres),965(valkey)
```
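Where `redis-cli`/`valkey-cli` is not installed (for example inside an application container), the same `PING` check can be done with the standard library by speaking RESP over the socket. A minimal sketch; `valkey_ping` is a hypothetical helper, and the `AUTH` step assumes password-only authentication as in the `CACHE_URL` above.

```python
import socket


def resp_encode(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def valkey_ping(path: str, password: str = "", timeout: float = 2.0) -> bool:
    """Return True if the server behind the Unix socket answers PING with PONG."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)
        with sock.makefile("rb") as reader:
            if password:
                sock.sendall(resp_encode("AUTH", password))
                if reader.readline() != b"+OK\r\n":
                    return False
            sock.sendall(resp_encode("PING"))
            return reader.readline() == b"+PONG\r\n"
```

Usage: `valkey_ping("/var/run/valkey/valkey.sock", password="secret")`, run as a user in the socket's group.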
### Alternatives Considered
1. **TCP with Authentication**: Rejected due to network exposure
2. **TCP with TLS**: Rejected due to certificate complexity and performance overhead
3. **Shared Memory**: Rejected due to implementation complexity
---
## ADR-003: Podman + systemd Container Orchestration
**Technical Story**: Container orchestration solution for secure application deployment with systemd integration.
### Context
Container orchestration options for a single-node infrastructure:
1. **Docker + Docker Compose**: Traditional container orchestration
2. **Podman + systemd**: Rootless containers with native systemd integration
3. **Kubernetes**: Full orchestration platform (overkill for single node)
4. **Nomad**: HashiCorp orchestration solution
### Decision
We will use **Podman with systemd integration (Quadlet)** for container orchestration, deployed as system-level services (rootful containers running as dedicated users).
### Rationale
#### Security Advantages
- **No Daemon Required**: No privileged daemon attack surface
```bash
# Docker: Requires root daemon
sudo systemctl status docker
# Podman: Daemonless operation
podman ps # No daemon needed
```
- **Dedicated Service Users**: Containers run as dedicated system users (not root)
- **Group-Based Access Control**: Unix group membership controls infrastructure access
- **SELinux Integration**: Better SELinux support than Docker
#### systemd Integration Benefits
- **Native Service Management**: Containers as system-level systemd services
```ini
# Quadlet file: /etc/containers/systemd/authentik.pod
[Unit]
Description=Authentik Authentication Pod
[Pod]
PublishPort=0.0.0.0:9000:9000
ShmSize=256m
[Service]
Restart=always
TimeoutStartSec=900
[Install]
WantedBy=multi-user.target
```
- **Dependency Management**: systemd handles service dependencies
- **Resource Control**: systemd resource limits and monitoring
- **Logging Integration**: journald for centralized logging
#### Operational Excellence
- **Familiar Tooling**: Standard systemd commands
```bash
systemctl status authentik-pod
systemctl restart authentik-server
journalctl -u authentik-server -f
```
- **Boot Integration**: Services start automatically at system boot
- **Resource Monitoring**: systemd resource tracking
- **Configuration Management**: Declarative Quadlet files
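The unit names used with `systemctl` are derived from the Quadlet file names, which is why `authentik.pod` is managed as `authentik-pod`. A small sketch of the default mapping, assuming no explicit service-name override in the Quadlet file; `quadlet_service_name` is a hypothetical helper.

```python
from pathlib import PurePath

# Default suffix Quadlet appends to the file stem for each unit type
QUADLET_SERVICE_SUFFIX = {
    ".container": "",
    ".kube": "",
    ".pod": "-pod",
    ".volume": "-volume",
    ".network": "-network",
}


def quadlet_service_name(filename: str) -> str:
    path = PurePath(filename)
    if path.suffix not in QUADLET_SERVICE_SUFFIX:
        raise ValueError(f"not a Quadlet unit file: {filename}")
    return f"{path.stem}{QUADLET_SERVICE_SUFFIX[path.suffix]}.service"


for name in ("authentik.pod", "authentik-server.container", "authentik-worker.container"):
    print(f"{name:28} -> {quadlet_service_name(name)}")
```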
#### Performance Benefits
- **Lower Overhead**: No daemon overhead for container management
- **Direct Kernel Access**: Better performance than daemon-based solutions
- **Resource Efficiency**: More efficient resource utilization
### Implementation Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ systemd System Services (/system.slice/)                        │
│                                                                 │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐  │
│ │ authentik-pod    │ │ authentik-server │ │ authentik-worker │  │
│ │ .service         │ │ .service         │ │ .service         │  │
│ └────────┬─────────┘ └────────┬─────────┘ └────────┬─────────┘  │
│          └────────────────────┼────────────────────┘            │
│                               │                                 │
│ ┌─────────────────────────────┴───────────────────────────────┐ │
│ │ Podman Pod (rootful, dedicated user)                        │ │
│ │                                                             │ │
│ │ ┌────────────────────┐  ┌────────────────────┐              │ │
│ │ │ Server Container   │  │ Worker Container   │              │ │
│ │ │ User: 966:966      │  │ User: 966:966      │              │ │
│ │ │ Groups: 961,962    │  │ Groups: 961,962    │              │ │
│ │ │ (valkey,postgres)  │  │ (valkey,postgres)  │              │ │
│ │ └────────────────────┘  └────────────────────┘              │ │
│ └─────────────────────────────────────────────────────────────┘ │
└───────────────────────────────┬─────────────────────────────────┘
                                │ Group-based access to infrastructure
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│ Infrastructure Services                                         │
│ PostgreSQL: /var/run/postgresql (postgres:postgres-clients)     │
│ Valkey:     /var/run/valkey     (valkey:valkey-clients)         │
└─────────────────────────────────────────────────────────────────┘
```
#### Quadlet Configuration
```ini
# Pod configuration (authentik.pod)
[Unit]
Description=Authentik Authentication Pod
[Pod]
PublishPort=127.0.0.1:9000:9000
ShmSize=256m
[Service]
Restart=always
[Install]
WantedBy=multi-user.target
```
```ini
# Container configuration (authentik-server.container)
[Unit]
Description=Authentik Server Container
After=authentik-pod.service
Requires=authentik-pod.service
[Container]
ContainerName=authentik-server
Image=ghcr.io/goauthentik/server:2025.10
Pod=authentik.pod
EnvironmentFile=/opt/authentik/.env
User=966:966
PodmanArgs=--group-add 962 --group-add 961
# Volume mounts for sockets and data
Volume=/opt/authentik/media:/media
Volume=/opt/authentik/data:/data
Volume=/var/run/postgresql:/var/run/postgresql:Z
Volume=/var/run/valkey:/var/run/valkey:Z
[Service]
Restart=always
TimeoutStartSec=300
[Install]
WantedBy=multi-user.target
```
### User Management Strategy
```yaml
# Ansible implementation
- name: Create service user
  user:
    name: authentik
    group: authentik
    groups: [postgres-clients, valkey-clients]
    system: true
    shell: /bin/bash
    home: /opt/authentik
    create_home: true
    append: true
```
**Note**: Infrastructure roles (PostgreSQL, Valkey) export client group GIDs as Ansible facts (`postgresql_client_group_gid`, `valkey_client_group_gid`) which are consumed by application container templates for dynamic `--group-add` arguments.
### Consequences
#### Positive
- **Security**: Eliminated privileged daemon attack surface
- **Integration**: Seamless systemd integration for management
- **Performance**: Lower overhead than daemon-based solutions
- **Reliability**: systemd's proven service management
- **Monitoring**: Standard systemd monitoring and logging
#### Negative
- **Learning Curve**: Different from Docker Compose workflows
- **Tooling**: Ecosystem less mature than Docker
- **Documentation**: Fewer online resources and examples
#### Mitigation Strategies
- **Documentation**: Comprehensive internal documentation
- **Training**: Team training on Podman/systemd workflows
- **Tooling**: Helper scripts for common operations
### Technical Implementation
```bash
# Container management (system scope)
systemctl status authentik-pod
systemctl restart authentik-server
podman ps
podman logs authentik-server
# Resource monitoring
systemctl show authentik-server --property=MemoryCurrent
journalctl -u authentik-server -f
# Verify container groups
pid=$(podman inspect --format '{{.State.Pid}}' authentik-server)
grep Groups /proc/"$pid"/status
# Output: Groups: 961 962 966
```
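The `Groups:` line checked above can also be parsed programmatically, for example in an automated test asserting that the container process carries the socket groups. A sketch with hypothetical helpers; the PID would come from `podman inspect --format '{{.State.Pid}}' authentik-server`, and reading `/proc` assumes Linux.

```python
from pathlib import Path


def process_groups(status_text: str) -> list:
    """Supplementary GIDs listed on the 'Groups:' line of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            return [int(gid) for gid in line.split()[1:]]
    raise ValueError("no 'Groups:' line found")


def missing_groups(pid: int, required: set) -> set:
    """GIDs from `required` that the process does not carry (Linux only)."""
    status = Path(f"/proc/{pid}/status").read_text()
    return set(required) - set(process_groups(status))
```

An empty result from `missing_groups(pid, {961, 962})` corresponds to the `Groups: 961 962 966` output shown above.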
### Alternatives Considered
1. **Docker + Docker Compose**: Rejected due to security concerns (privileged daemon)
2. **Kubernetes**: Rejected as overkill for single-node deployment
3. **Nomad**: Rejected to maintain consistency with systemd ecosystem
---
## OAuth/OIDC and Forward Authentication Security Model
**Technical Story**: Centralized authentication and authorization for multiple services using industry-standard OAuth2/OIDC protocols where supported, with forward authentication as a fallback.
### Context
Authentication strategies for multiple services:
1. **Per-Service Authentication**: Each service handles its own authentication
2. **Shared Database**: Services share authentication database
3. **OAuth2/OIDC Integration**: Services implement standard OAuth2/OIDC clients
4. **Forward Authentication**: Reverse proxy handles authentication for services without OAuth support
### Decision
We will use **OAuth2/OIDC integration** as the primary authentication method for services that support it, and **forward authentication** for services that do not support native OAuth2/OIDC integration.
### Rationale
#### OAuth/OIDC as Primary Method
**Security Benefits**:
- **Standard Protocol**: Industry-standard flows (OAuth 2.0, RFC 6749; PKCE, RFC 7636; OpenID Connect Core 1.0)
- **Token-Based Security**: ID tokens are signed JWTs (RFC 7519) that clients verify against the provider's published keys
- **Proper Session Management**: Native application session handling with refresh tokens
- **Scope-Based Authorization**: Fine-grained permission control via OAuth scopes
- **PKCE Support**: Protection against authorization code interception attacks
**Integration Benefits**:
- **Native Support**: Applications designed for OAuth/OIDC work seamlessly
- **Better UX**: Proper redirect flows, logout handling, and token refresh
- **API Access**: OAuth tokens enable secure API integrations
- **Standard Claims**: OpenID Connect user info endpoint provides standardized user data
- **Multi-Application SSO**: Single sign-on through the shared Authentik session; each application receives its own tokens
**Examples**: Nextcloud, Gitea, Grafana, many modern applications
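PKCE (RFC 7636) binds the authorization code to the client that requested it. The `S256` method is small enough to show in full: the client keeps a random `code_verifier` and sends `BASE64URL(SHA256(code_verifier))` as the `code_challenge`. A sketch using only the standard library; the function names are illustrative.

```python
import base64
import hashlib
import secrets


def new_code_verifier() -> str:
    # 32 random bytes -> 43 URL-safe characters (RFC 7636 allows 43 to 128)
    return secrets.token_urlsafe(32)


def s256_code_challenge(verifier: str) -> str:
    """BASE64URL(SHA256(ASCII(code_verifier))) without '=' padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The challenge travels in the `/authorize` request and the verifier in the later `POST /token`, so an intercepted code is useless without the verifier. `s256_code_challenge("dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk")` reproduces the RFC's Appendix B value `E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM`.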
#### Forward Auth as Fallback
**Use Cases**:
- Services without OAuth/OIDC support
- Legacy applications that cannot be modified
- Static sites requiring authentication
- Simple internal tools
**Benefits**:
- **Zero Application Changes**: Protect existing services without modification
- **Header-Based Identity**: Simple identity propagation to backend
- **Transparent Protection**: Services receive pre-authenticated requests
**Limitations**:
- **Non-Standard**: Not using industry-standard authentication protocols
- **Proxy Dependency**: All requests must flow through authenticating proxy
- **Limited Logout**: Complex logout scenarios across services
- **Header Trust**: Backend must trust proxy-provided headers
#### Shared Benefits (Both Methods)
- **Single Point of Control**: Centralized authentication policy via Authentik
- **Consistent Security**: Same authentication provider across all services
- **Multi-Factor Authentication**: MFA applied consistently via Authentik
- **Audit Trail**: Centralized authentication logging
- **User Management**: One system for all user administration
### Implementation Architecture
#### OAuth/OIDC Flow (Primary Method)
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│    User     │     │   Service   │     │  Authentik  │
│             │     │ (OAuth App) │     │    (IdP)    │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │                   │                   │
       │ Access Service    │                   │
       │──────────────────▶│                   │
       │                   │                   │
       │                   │ No session        │
       │ 302 → OAuth       │                   │
       │◀──────────────────│                   │
       │                   │                   │
       │ GET /authorize?client_id=...&redirect_uri=...
       │──────────────────────────────────────▶│
       │                   │                   │
       │ Login form (if not authenticated)     │
       │◀──────────────────────────────────────│
       │                   │                   │
       │ Credentials                           │
       │──────────────────────────────────────▶│
       │                   │                   │
       │ 302 → callback?code=AUTH_CODE         │
       │◀──────────────────────────────────────│
       │                   │                   │
       │ GET /callback?code=AUTH_CODE          │
       │──────────────────▶│                   │
       │                   │                   │
       │                   │ POST /token       │
       │                   │ code=AUTH_CODE    │
       │                   │──────────────────▶│
       │                   │                   │
       │                   │ access_token      │
       │                   │ id_token (JWT)    │
       │                   │◀──────────────────│
       │                   │                   │
       │                   │ GET /userinfo     │
       │                   │──────────────────▶│
       │                   │                   │
       │                   │ User claims       │
       │                   │◀──────────────────│
       │                   │                   │
       │ Set-Cookie        │                   │
       │ 302 → /dashboard  │                   │
       │◀──────────────────│                   │
       │                   │                   │
       │ GET /dashboard    │                   │
       │──────────────────▶│                   │
       │                   │                   │
       │ Dashboard         │                   │
       │◀──────────────────│                   │
```
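The first redirect in this flow is an ordinary URL. A sketch of how a client might build it; the endpoint `https://auth.jnss.me/application/o/authorize/` is an assumption about Authentik's URL layout and should be confirmed against the provider's `/.well-known/openid-configuration` document, and `authorization_request` is a hypothetical helper. The returned `state` must be stored and compared with the value echoed back to `/callback`.

```python
import secrets
from urllib.parse import urlencode

# Assumption: Authentik's authorization endpoint; confirm via the provider's
# /.well-known/openid-configuration document.
AUTHORIZE_ENDPOINT = "https://auth.jnss.me/application/o/authorize/"


def authorization_request(client_id: str, redirect_uri: str, code_challenge: str,
                          scope: str = "openid profile email") -> tuple:
    """Return (redirect URL, state) for the authorization-code flow with PKCE."""
    state = secrets.token_urlsafe(16)  # compare with ?state= on /callback
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
        "code_challenge": code_challenge,
        "code_challenge_method": "S256",
    })
    return f"{AUTHORIZE_ENDPOINT}?{query}", state
```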
#### Forward Auth Flow (Fallback Method)
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│    User     │     │    Caddy    │     │  Authentik  │     │   Service   │
│             │     │   (Proxy)   │     │  (Forward)  │     │  (Backend)  │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │                   │                   │                   │
       │ GET /             │                   │                   │
       │──────────────────▶│                   │                   │
       │                   │                   │                   │
       │                   │ Forward Auth      │                   │
       │                   │──────────────────▶│                   │
       │                   │                   │                   │
       │                   │ 401 Unauthorized  │                   │
       │                   │◀──────────────────│                   │
       │                   │                   │                   │
       │ 302 → /auth       │                   │                   │
       │◀──────────────────│                   │                   │
       │                   │                   │                   │
       │ Login form                            │                   │
       │──────────────────────────────────────▶│                   │
       │                   │                   │                   │
       │ Credentials                           │                   │
       │──────────────────────────────────────▶│                   │
       │                   │                   │                   │
       │ Set-Cookie                            │                   │
       │◀──────────────────────────────────────│                   │
       │                   │                   │                   │
       │ GET /             │                   │                   │
       │──────────────────▶│                   │                   │
       │                   │                   │                   │
       │                   │ Forward Auth      │                   │
       │                   │──────────────────▶│                   │
       │                   │                   │                   │
       │                   │ 200 + Headers     │                   │
       │                   │◀──────────────────│                   │
       │                   │                   │                   │
       │                   │ Proxy + Headers                       │
       │                   │──────────────────────────────────────▶│
       │                   │                   │                   │
       │                   │ Response                              │
       │                   │◀──────────────────────────────────────│
       │                   │                   │                   │
       │ Content           │                   │                   │
       │◀──────────────────│                   │                   │
```
### OAuth/OIDC Configuration Examples
#### Nextcloud OAuth Configuration
```php
// Nextcloud config.php
'oidc_login_provider_url' => 'https://auth.jnss.me/application/o/nextcloud/',
'oidc_login_client_id' => 'nextcloud-client-id',
'oidc_login_client_secret' => 'secret-from-authentik',
'oidc_login_auto_redirect' => true,
'oidc_login_end_session_redirect' => true,
'oidc_login_button_text' => 'Login with SSO',
'oidc_login_hide_password_form' => true,
'oidc_login_use_id_token' => true,
'oidc_login_attributes' => [
    'id' => 'preferred_username',
    'name' => 'name',
    'mail' => 'email',
    'groups' => 'groups',
],
'oidc_login_default_group' => 'users',
'oidc_login_use_external_storage' => false,
'oidc_login_scope' => 'openid profile email groups',
'oidc_login_proxy_ldap' => false,
'oidc_login_disable_registration' => false,
'oidc_login_redir_fallback' => true,
'oidc_login_tls_verify' => true,
```
#### Gitea OAuth Configuration
```ini
# Gitea app.ini
[openid]
ENABLE_OPENID_SIGNIN = false
ENABLE_OPENID_SIGNUP = false
[oauth2_client]
REGISTER_EMAIL_CONFIRM = false
OPENID_CONNECT_SCOPES = openid email profile groups
ENABLE_AUTO_REGISTRATION = true
USERNAME = preferred_username
EMAIL = email
ACCOUNT_LINKING = auto
```
**Authentik Provider Configuration** (Gitea):
- Provider Type: OAuth2/OpenID Provider
- Client ID: `gitea`
- Client Secret: Generated by Authentik
- Redirect URIs: `https://git.jnss.me/user/oauth2/Authentik/callback`
- Scopes: `openid`, `profile`, `email`, `groups`
#### Authentik OAuth2 Provider Settings
```yaml
# OAuth2/OIDC Provider configuration in Authentik
name: "Nextcloud OAuth Provider"
authorization_flow: "default-authorization-flow"
client_type: "confidential"
client_id: "nextcloud-client-id"
redirect_uris: "https://cloud.jnss.me/apps/oidc_login/oidc"
signing_key: "authentik-default-key"
property_mappings:
- "authentik default OAuth Mapping: OpenID 'openid'"
- "authentik default OAuth Mapping: OpenID 'email'"
- "authentik default OAuth Mapping: OpenID 'profile'"
- "Custom: Groups" # Maps user groups to 'groups' claim
```
### Forward Auth Configuration Examples
#### Caddy Configuration for Forward Auth
```caddyfile
# whoami service with forward authentication
whoami.jnss.me {
    # Forward authentication to Authentik
    forward_auth https://auth.jnss.me {
        uri /outpost.goauthentik.io/auth/caddy
        copy_headers Remote-User Remote-Name Remote-Email Remote-Groups
    }
    # Backend service (receives authenticated requests)
    reverse_proxy localhost:8080
}
```
#### Authentik Proxy Provider Configuration
```yaml
# Authentik Proxy Provider for forward auth
name: "Whoami Forward Auth"
type: "proxy"
authorization_flow: "default-authorization-flow"
external_host: "https://whoami.jnss.me"
internal_host: "http://localhost:8080"
skip_path_regex: "^/(health|metrics).*"
mode: "forward_single" # Single application mode
```
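`skip_path_regex` as written is a prefix match with a trailing `.*`, so it also exempts paths such as `/healthz` or `/metrics-admin`. The sketch below, assuming the regex is matched against the request path, compares that pattern with a tighter alternative that only exempts the exact paths and their sub-paths. Authentik's outpost is written in Go; for patterns this simple, Python's `re` and Go's regexp syntax behave the same.

```python
import re

# Pattern from the provider configuration above
LOOSE = re.compile(r"^/(health|metrics).*")
# Tighter variant: the exact path or anything beneath it
STRICT = re.compile(r"^/(health|metrics)(/|$)")


def bypasses_auth(pattern: re.Pattern, path: str) -> bool:
    return pattern.match(path) is not None


for path in ("/health", "/healthz", "/metrics/node", "/metrics-admin", "/admin"):
    print(f"{path:15} loose={bypasses_auth(LOOSE, path)!s:5} strict={bypasses_auth(STRICT, path)}")
```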
#### Service Integration (Forward Auth)
Services receive authentication information via HTTP headers:
```python
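# Example service code (Python Flask)
from flask import Flask, abort, render_template, request

app = Flask(__name__)


@app.route('/')
def index():
    # Identity headers are injected by Caddy from Authentik's forward-auth response
    username = request.headers.get('Remote-User')
    if not username:
        abort(401)  # refuse requests that did not pass forward auth
    name = request.headers.get('Remote-Name')
    email = request.headers.get('Remote-Email')
    # Skip empty entries so a missing header yields [] instead of ['']
    groups = [g for g in request.headers.get('Remote-Groups', '').split(',') if g]
    return render_template('index.html',
                           username=username,
                           name=name,
                           email=email,
                           groups=groups)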
```
### Authorization Policies
Both OAuth and Forward Auth support Authentik authorization policies:
```yaml
# Example authorization policy in Authentik
policy_bindings:
  - policy: "group_admins_only"
    target: "nextcloud_oauth_provider"
    order: 0
  - policy: "require_mfa"
    target: "gitea_oauth_provider"
    order: 1
  - policy: "internal_network_only"
    target: "whoami_proxy_provider"
    order: 0
```
### Decision Matrix: OAuth/OIDC vs Forward Auth
| Criteria | OAuth/OIDC | Forward Auth |
|----------|-----------|-------------|
| **Application Support** | Requires native OAuth/OIDC support | Any application |
| **Protocol Standard** | Industry standard (RFC 6749, 7636) | Proprietary/custom |
| **Token Management** | Native refresh tokens, proper expiry | Session-based only |
| **Logout Handling** | Proper logout flow | Complex, proxy-dependent |
| **API Access** | Full API support via tokens | Header-only |
| **Implementation Effort** | Configure OAuth settings | Zero app changes |
| **User Experience** | Standard OAuth redirects | Transparent |
| **Security Model** | Token-based with scopes | Header trust model |
| **When to Use** | **Nextcloud, Gitea, modern apps** | **Static sites, legacy apps, whoami** |
### Consequences
#### Positive
- **Standards Compliance**: OAuth/OIDC uses industry-standard protocols
- **Security**: Multiple authentication options with appropriate security models
- **Flexibility**: Right tool for each service (OAuth when possible, forward auth when needed)
- **Auditability**: Centralized authentication logging via Authentik
- **User Experience**: Proper SSO across all services
- **Token Security**: OAuth provides secure token refresh and scope management
- **Graceful Degradation**: Forward auth available for services without OAuth support
#### Negative
- **Complexity**: Need to understand two authentication methods
- **Configuration Overhead**: OAuth requires per-service configuration
- **Single Point of Failure**: Authentik failure affects all services
- **Learning Curve**: Team must understand OAuth flows and forward auth model
#### Mitigation Strategies
- **Documentation**: Clear decision guide for choosing OAuth vs forward auth
- **Templates**: Reusable OAuth configuration templates for common services
- **High Availability**: Robust deployment and monitoring of Authentik
- **Monitoring**: Comprehensive monitoring of both authentication flows
- **Testing**: Automated tests for authentication flows
### Security Considerations
#### OAuth/OIDC Security
```yaml
# Authentik OAuth2 Provider security settings
access_code_validity: "minutes=1"    # Authorization code lifetime
access_token_validity: "hours=1"
refresh_token_validity: "days=30"
include_claims_in_id_token: true
signing_key: "authentik-default-key"
sub_mode: "hashed_user_id"
issuer_mode: "per_provider"
```
**Best Practices**:
- Use PKCE for all OAuth flows (protection against interception)
- Implement proper token rotation (refresh tokens expire and rotate)
- Validate `aud` (audience) and `iss` (issuer) claims in JWT tokens
- Use short-lived access tokens (1 hour)
- Store client secrets securely (Ansible Vault)
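Validating `iss`, `aud` and expiry is mechanical once the token's signature has been verified. A minimal sketch of the claim checks described in OpenID Connect Core, section 3.1.3.7; it deliberately does not verify the signature, which should be done with a JOSE library against the provider's JWKS. `check_id_token_claims` is hypothetical, and the issuer URL in the example assumes Authentik's per-provider issuer mode.

```python
import time
from typing import Optional


class ClaimError(ValueError):
    pass


def check_id_token_claims(claims: dict, *, issuer: str, client_id: str,
                          leeway: int = 60, now: Optional[float] = None) -> None:
    """Check iss/aud/azp/exp/iat of an ID token whose signature was ALREADY
    verified against the provider's JWKS; this function does not verify it."""
    now = time.time() if now is None else now
    if claims.get("iss") != issuer:
        raise ClaimError(f"unexpected issuer {claims.get('iss')!r}")
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if client_id not in audiences:
        raise ClaimError("token was not issued for this client")
    if len(audiences) > 1 and claims.get("azp") != client_id:
        raise ClaimError("multiple audiences but azp is not this client")
    if "exp" not in claims or claims["exp"] + leeway <= now:
        raise ClaimError("token expired or has no exp claim")
    if claims.get("iat", now) - leeway > now:
        raise ClaimError("token issued in the future")
```

For example, `check_id_token_claims(claims, issuer="https://auth.jnss.me/application/o/nextcloud/", client_id="nextcloud-client-id")` raises `ClaimError` for a token minted for another client.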
#### Forward Auth Security
```yaml
# Authentik Proxy Provider security settings
token_validity: 3600 # 1 hour session
cookie_domain: ".jnss.me"
skip_path_regex: "^/(health|metrics|static).*"
```
**Best Practices**:
- Trust only Authentik-provided headers
- Validate `Remote-User` header exists before granting access
- Use HTTPS for all forward auth endpoints
- Implement proper session timeouts
- Strip user-provided authentication headers at proxy
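Stripping headers at the proxy is the primary control; the backend can add a second layer by honouring identity headers only when the request comes from the proxy's address. A minimal WSGI middleware sketch, assuming Caddy reaches the backend over loopback; `TrustedProxyHeaders` is hypothetical and the header names follow the Caddy examples in this document.

```python
from ipaddress import ip_address, ip_network

# WSGI environ keys for the Remote-* headers copied by Caddy
IDENTITY_KEYS = ("HTTP_REMOTE_USER", "HTTP_REMOTE_NAME",
                 "HTTP_REMOTE_EMAIL", "HTTP_REMOTE_GROUPS")


class TrustedProxyHeaders:
    """WSGI middleware: honour Remote-* headers only from trusted proxies
    and answer 401 when no Remote-User is present."""

    def __init__(self, app, trusted=("127.0.0.1/32", "::1/128")):
        self.app = app
        self.trusted = [ip_network(net) for net in trusted]

    def __call__(self, environ, start_response):
        peer = ip_address(environ.get("REMOTE_ADDR") or "0.0.0.0")
        if not any(peer in net for net in self.trusted):
            # Request did not come through the proxy: discard identity headers
            for key in IDENTITY_KEYS:
                environ.pop(key, None)
        if not environ.get("HTTP_REMOTE_USER"):
            start_response("401 Unauthorized", [("Content-Type", "text/plain")])
            return [b"authentication required\n"]
        return self.app(environ, start_response)
```

With Flask, it wraps the application via `app.wsgi_app = TrustedProxyHeaders(app.wsgi_app)`.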
#### Access Control
- **Group-Based Authorization**: Users assigned to groups, groups to applications
- **Policy Engine**: Authentik policies for fine-grained access control
- **MFA Requirements**: Multi-factor authentication for sensitive services
- **IP-Based Restrictions**: Geographic or network-based access control
- **Time-Based Access**: Temporary access grants via policies
#### Audit Logging
```json
{
  "timestamp": "2025-12-15T10:30:00Z",
  "event": "oauth_authorization",
  "user": "john.doe",
  "application": "nextcloud",
  "scopes": ["openid", "email", "profile", "groups"],
  "ip": "192.168.1.100",
  "user_agent": "Mozilla/5.0..."
}
```
### Implementation Examples by Service Type
#### OAuth/OIDC Services (Primary Method)
**Nextcloud**:
```caddyfile
cloud.jnss.me {
    reverse_proxy localhost:8080
}
# OAuth configured within Nextcloud application
```
**Gitea**:
```caddyfile
git.jnss.me {
    reverse_proxy localhost:3000
}
# OAuth configured within Gitea application settings
```
#### Forward Auth Services (Fallback Method)
**Whoami (test/demo service)**:
```caddyfile
whoami.jnss.me {
    forward_auth https://auth.jnss.me {
        uri /outpost.goauthentik.io/auth/caddy
        copy_headers Remote-User Remote-Name Remote-Email Remote-Groups
    }
    reverse_proxy localhost:8080
}
```
**Static Documentation Site**:
```caddyfile
docs.jnss.me {
    forward_auth https://auth.jnss.me {
        uri /outpost.goauthentik.io/auth/caddy
        copy_headers Remote-User Remote-Groups
    }
    root * /var/www/docs
    file_server
}
```
**Internal API (no OAuth support)**:
```caddyfile
api.jnss.me {
    forward_auth https://auth.jnss.me {
        uri /outpost.goauthentik.io/auth/caddy
        copy_headers Remote-User Remote-Email Remote-Groups
    }
    reverse_proxy localhost:3000
}
```
#### Selective Protection (Public + Protected Paths)
```caddyfile
app.jnss.me {
    # Public endpoints (no auth required)
    handle /health {
        reverse_proxy localhost:8080
    }
    handle /metrics {
        reverse_proxy localhost:8080
    }
    handle /public/* {
        reverse_proxy localhost:8080
    }
    # Protected endpoints (forward auth)
    handle /admin/* {
        forward_auth https://auth.jnss.me {
            uri /outpost.goauthentik.io/auth/caddy
            copy_headers Remote-User Remote-Groups
        }
        reverse_proxy localhost:8080
    }
    # Default: protected
    handle {
        forward_auth https://auth.jnss.me {
            uri /outpost.goauthentik.io/auth/caddy
            copy_headers Remote-User Remote-Groups
        }
        reverse_proxy localhost:8080
    }
}
```
### Alternatives Considered
1. **OAuth2/OIDC Only**: Rejected because many services don't support OAuth natively
2. **Forward Auth Only**: Rejected because it doesn't leverage native OAuth support in modern apps
3. **Per-Service Authentication**: Rejected due to management overhead and inconsistent security
4. **Shared Database**: Rejected due to tight coupling between services
5. **VPN-Based Access**: Rejected due to operational complexity for web services
6. **SAML**: Rejected in favor of modern OAuth2/OIDC standards
---
## Rootful Containers with Infrastructure Fact Pattern
**Technical Story**: Enable containerized applications to access native infrastructure services (PostgreSQL, Valkey) via Unix sockets with group-based permissions.
### Context
Containerized applications need to access infrastructure services (PostgreSQL, Valkey) through Unix sockets with filesystem-based permission controls. The permission model requires:
1. **Socket directories** owned by service groups (`postgres-clients`, `valkey-clients`)
2. **Application users** added to these groups for access
3. **Container processes** must preserve group membership to access sockets
Two approaches were evaluated:
1. **Rootless containers (user namespace)**: Containers run in user namespace with UID/GID remapping
2. **Rootful containers (system services)**: Containers run as dedicated system users without namespace isolation
### Decision
We will use **rootful containers deployed as system-level systemd services** with an **Infrastructure Fact Pattern** where infrastructure roles export client group GIDs as Ansible facts for application consumption.
### Rationale
#### Why Rootful Succeeds
**Direct UID/GID Mapping**:
```bash
# Host: authentik user UID 966, groups: 966 (authentik), 961 (valkey-clients), 962 (postgres-clients)
# Container User=966:966 with PodmanArgs=--group-add 961 --group-add 962
# Inside container:
id
# uid=966(authentik) gid=966(authentik) groups=966(authentik),961(valkey-clients),962(postgres-clients)
# Socket access works:
ls -l /var/run/postgresql/.s.PGSQL.5432
# srwxrwx--- 1 postgres postgres-clients 0 ... /var/run/postgresql/.s.PGSQL.5432
```
**Group membership preserved**: Container process has GIDs 961 and 962, matching socket group ownership.
#### Why Rootless Failed (Discarded Approach)
**User Namespace UID/GID Remapping**:
```bash
# Host: authentik user UID 100000, subuid range 200000-265535
# Container User=%i:%i with --userns=host --group-add=keep-groups
# User namespace remaps:
# Host UID 100000 → Container UID 100000 (root in namespace)
# Host GID 961 → Container GID 200961 (remapped into subgid range)
# Host GID 962 → Container GID 200962 (remapped into subgid range)
# Socket ownership on host:
# srwxrwx--- 1 postgres postgres-clients (GID 962)
# Container process groups: 200961, 200962 (remapped)
# Socket expects: GID 962 (not remapped)
# Result: Permission denied ❌
```
**Root cause**: User namespace supplementary group remapping breaks group-based socket access even with `--userns=host`, `--group-add=keep-groups`, and `Annotation=run.oci.keep_original_groups=1`.
### Infrastructure Fact Pattern
#### Infrastructure Roles Export GIDs
Infrastructure services create client groups and export their GIDs as Ansible facts:
```yaml
# PostgreSQL role: roles/postgresql/tasks/main.yml
- name: Create PostgreSQL client access group
  group:
    name: postgres-clients
    system: true
- name: Get PostgreSQL client group GID
  shell: "getent group postgres-clients | cut -d: -f3"
  register: postgresql_client_group_lookup
  changed_when: false
- name: Set PostgreSQL client group GID as fact
  set_fact:
    postgresql_client_group_gid: "{{ postgresql_client_group_lookup.stdout }}"
```
```yaml
# Valkey role: roles/valkey/tasks/main.yml
- name: Create Valkey client access group
  group:
    name: valkey-clients
    system: true
- name: Get Valkey client group GID
  shell: "getent group valkey-clients | cut -d: -f3"
  register: valkey_client_group_lookup
  changed_when: false
- name: Set Valkey client group GID as fact
  set_fact:
    valkey_client_group_gid: "{{ valkey_client_group_lookup.stdout }}"
```
#### Application Roles Consume Facts
Application roles validate and consume infrastructure facts:
```yaml
# Authentik role: roles/authentik/tasks/main.yml
- name: Validate infrastructure facts are available
  assert:
    that:
      - postgresql_client_group_gid is defined
      - valkey_client_group_gid is defined
    fail_msg: |
      Required infrastructure facts are not available.
      Ensure PostgreSQL and Valkey roles have run first.
- name: Create authentik user with infrastructure groups
  user:
    name: authentik
    groups: [postgres-clients, valkey-clients]
    append: true
```
```ini
# Container template: roles/authentik/templates/authentik-server.container
[Container]
User={{ authentik_uid }}:{{ authentik_gid }}
PodmanArgs=--group-add {{ postgresql_client_group_gid }} --group-add {{ valkey_client_group_gid }}
```
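The lookup-and-consume chain above can be mirrored in a few lines of Python, which is handy for ad-hoc checks on a host. A sketch with hypothetical helper names: `grp.getgrnam` resolves through NSS as `getent` does, and the result is rendered into the same `--group-add` arguments the template produces.

```python
import grp


def client_group_gid(name: str) -> int:
    """Python counterpart of `getent group <name> | cut -d: -f3`."""
    try:
        return grp.getgrnam(name).gr_gid
    except KeyError:
        raise LookupError(f"group {name!r} not found; has its infrastructure role run?") from None


def podman_group_args(*group_names: str) -> list:
    """Render --group-add arguments the way the container template does."""
    args = []
    for name in group_names:
        args += ["--group-add", str(client_group_gid(name))]
    return args
```

On a provisioned host, `" ".join(podman_group_args("postgres-clients", "valkey-clients"))` yields the value that ends up after `PodmanArgs=`.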
### Implementation Details
#### System-Level Deployment
```ini
# Quadlet files deployed to /etc/containers/systemd/ (not ~/.config/containers/systemd/)
# Pod: /etc/containers/systemd/authentik.pod
[Unit]
Description=Authentik Authentication Pod
[Pod]
PublishPort=0.0.0.0:9000:9000
ShmSize=256m
[Service]
Restart=always
[Install]
# System target, not default.target
WantedBy=multi-user.target
```
```ini
# Container: /etc/containers/systemd/authentik-server.container
[Container]
User=966:966
PodmanArgs=--group-add 962 --group-add 961
Volume=/var/run/postgresql:/var/run/postgresql:Z
Volume=/var/run/valkey:/var/run/valkey:Z
```
#### Service Management
```bash
# System scope (not user scope)
systemctl status authentik-pod
systemctl restart authentik-server
journalctl -u authentik-server -f
# Verify container location
systemctl status authentik-server | grep CGroup
# CGroup: /system.slice/authentik-server.service ✓
```
### Special Case: Valkey Socket Group Fix
Valkey doesn't natively support socket group configuration (unlike PostgreSQL's `unix_socket_group`). A helper service ensures correct socket permissions:
```ini
# /etc/systemd/system/valkey-socket-fix.service
[Unit]
Description=Fix Valkey socket group ownership and permissions
BindsTo=valkey.service
After=valkey.service
[Service]
Type=oneshot
# $$ passes a literal $ to the shell; systemd would otherwise expand $i itself
ExecStart=/bin/sh -c 'i=0; while [ ! -S /var/run/valkey/valkey.sock ] && [ $$i -lt 100 ]; do sleep 0.1; i=$$((i+1)); done'
ExecStart=/bin/chgrp valkey-clients /var/run/valkey/valkey.sock
ExecStart=/bin/chmod 770 /var/run/valkey/valkey.sock
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
```
Triggered by Valkey service:
```ini
# /etc/systemd/system/valkey.service (excerpt)
[Unit]
Wants=valkey-socket-fix.service
```
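For clarity, the three `ExecStart=` steps of the helper unit correspond to the following Python sketch (a hypothetical function, not part of the deployment). One difference is deliberate: it raises `TimeoutError` if the socket never appears, whereas the shell loop gives up after roughly ten seconds and leaves `chgrp` to fail.

```python
import os
import stat
import time


def fix_socket_permissions(path: str, gid: int, mode: int = 0o770,
                           timeout: float = 10.0, interval: float = 0.1) -> None:
    """Wait for `path` to exist as a Unix socket, then chgrp and chmod it."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if stat.S_ISSOCK(os.stat(path).st_mode):
                break
        except FileNotFoundError:
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{path} did not appear within {timeout} s")
        time.sleep(interval)
    os.chown(path, -1, gid)  # -1 keeps the current owner, like chgrp
    os.chmod(path, mode)
```

On the host it would run as root with the GID from `grp.getgrnam("valkey-clients").gr_gid`.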
### Consequences
#### Positive
- **Socket Access Works**: Group-based permissions function correctly
- **Security**: Containers run as dedicated users (not root), no privileged daemon
- **Portability**: Dynamic GID facts work across different hosts
- **Consistency**: Same pattern for all containerized applications
- **Simplicity**: No user namespace complexity, standard systemd service management
#### Negative
- **Not "Pure" Rootless**: Containers require root for systemd service deployment
- **Different from Docker**: Less familiar pattern than rootless user services
#### Neutral
- **System vs User Scope**: Different commands (`systemctl` vs `systemctl --user`) but equally capable
- **Deployment Location**: `/etc/containers/systemd/` vs `~/.config/containers/systemd/` but same Quadlet functionality
### Validation
```bash
# Verify service location
systemctl status authentik-server | grep CGroup
# → /system.slice/authentik-server.service ✓
# Verify process groups
pid=$(podman inspect --format '{{.State.Pid}}' authentik-server)
grep Groups /proc/"$pid"/status
# → Groups: 961 962 966 ✓
# Verify socket permissions
ls -l /var/run/postgresql/.s.PGSQL.5432
# → srwxrwx--- postgres postgres-clients ✓
ls -l /var/run/valkey/valkey.sock
# → srwxrwx--- valkey valkey-clients ✓
# Verify HTTP endpoint
curl -I http://127.0.0.1:9000/
# → HTTP/1.1 302 Found ✓
```
### Alternatives Considered
1. **Rootless with user namespace** - Discarded due to GID remapping breaking group-based socket access
2. **TCP-only connections** - Rejected to maintain Unix socket security and performance benefits
3. **Hardcoded GIDs** - Rejected for portability; facts provide dynamic resolution
4. **Directory permissions (777)** - Initially rejected for security, since group-based access is more restrictive. This was later reversed: permissions were changed to 777 after Nextcloud switched from running as root to `www-data`, which broke group-based access.
---