Files
rick-infra/docs/architecture-decisions.md
Joakim 9e570ac2a3 Add comprehensive authentik documentation and improve role configuration
- Add authentik-deployment-guide.md: Complete step-by-step deployment guide
- Add architecture-decisions.md: Document native DB vs containerized rationale
- Add authentication-architecture.md: SSO strategy and integration patterns
- Update deployment-guide.md: Integrate authentik deployment procedures
- Update security-hardening.md: Add multi-layer security documentation
- Update service-integration-guide.md: Add authentik integration examples
- Update README.md: Professional project overview with architecture benefits
- Update authentik role: Fix HTTP binding, add security configs, improve templates
- Remove unused authentik task files: containers.yml, networking.yml

Key improvements:
* Document security benefits of native databases over containers
* Document Unix socket IPC architecture advantages
* Provide comprehensive troubleshooting and deployment procedures
* Add forward auth integration patterns for services
* Fix authentik HTTP binding from 127.0.0.1 to 0.0.0.0
* Add shared memory and IPC security configurations
2025-12-13 21:04:20 +01:00

28 KiB

Architecture Decision Records (ADR)

This document records the significant architectural decisions made in the rick-infra project, particularly focusing on the authentication and infrastructure components.

Table of Contents


ADR-001: Native Database Services over Containerized

Status: Accepted
Date: December 2025
Deciders: Infrastructure Team
Technical Story: Need reliable database and cache services for containerized applications with optimal performance and security.

Context

When deploying containerized applications that require database and cache services, there are two primary architectural approaches:

  1. Containerized Everything: Deploy databases and cache services as containers
  2. Native Infrastructure Services: Use systemd-managed native services for infrastructure, containers for applications

Decision

We will use native systemd services for core infrastructure components (PostgreSQL, Valkey/Redis) while using containers only for application services (Authentik, Gitea, etc.).

Rationale

Performance Benefits

  • No Container Overhead: Native services eliminate container runtime overhead
    # Native PostgreSQL: Direct filesystem access
    # Containerized PostgreSQL: Container filesystem layer overhead
    
  • Direct System Resources: Native services access system resources without abstraction layers
  • Optimized Memory Management: OS-level memory management without container constraints
  • Disk I/O Performance: Direct access to storage without container volume mounting overhead

Security Advantages

  • Unix Socket Security: Native services can provide Unix sockets with filesystem-based security
    # Native: /var/run/postgresql/.s.PGSQL.5432 (postgres:postgres 0770)
    # Containerized: Requires network exposure or complex socket mounting
    
  • Reduced Attack Surface: No container runtime vulnerabilities for critical infrastructure
  • OS-Level Security: Standard system security mechanisms apply directly
  • Group-Based Access Control: Simple Unix group membership for service access

Operational Excellence

  • Standard Tooling: Familiar systemd service management
    systemctl status postgresql
    journalctl -u postgresql -f
    systemctl restart postgresql
    
  • Package Management: Standard OS package updates and security patches
  • Backup Integration: Native backup tools work seamlessly
    pg_dump -h /var/run/postgresql authentik > backup.sql
    
  • Monitoring: Standard system monitoring tools apply directly

Reliability

  • systemd Integration: Robust service lifecycle management
    [Unit]
    Description=PostgreSQL database server
    After=network.target
    
    [Service]
    Type=forking
    Restart=always
    RestartSec=5
    
  • Resource Isolation: systemd provides resource isolation without container overhead
  • Proven Architecture: Battle-tested approach used by major infrastructure providers

Consequences

Positive

  • Performance: 15-25% better database performance in benchmarks
  • Security: Eliminated network-based database attacks via Unix sockets
  • Operations: Simplified backup, monitoring, and maintenance procedures
  • Resource Usage: Lower memory and CPU overhead
  • Reliability: More predictable service behavior

Negative

  • Containerization Purity: Not a "pure" containerized environment
  • Portability: Slightly less portable than full-container approach
  • Learning Curve: Team needs to understand both systemd and container management

Neutral

  • Complexity: Different but not necessarily more complex than container orchestration
  • Tooling: Different toolset but equally capable

Implementation Notes

# Infrastructure services (native systemd)
- postgresql  # Native database service
- valkey      # Native cache service  
- caddy       # Native reverse proxy
- podman      # Container runtime

# Application services (containerized)
- authentik   # Authentication service
- gitea       # Git service

Alternatives Considered

  1. Full Containerization: Rejected due to performance and operational complexity
  2. Mixed with Docker: Rejected in favor of Podman for security benefits
  3. VM-based Infrastructure: Rejected due to resource overhead

ADR-002: Unix Socket IPC Architecture

Status: Accepted
Date: December 2025
Deciders: Infrastructure Team
Technical Story: Secure and performant communication between containerized applications and native infrastructure services.

Context

Containerized applications need to communicate with database and cache services. Communication methods include:

  1. Network TCP/IP: Standard network protocols
  2. Unix Domain Sockets: Filesystem-based IPC
  3. Shared Memory: Direct memory sharing (complex)

Decision

We will use Unix domain sockets for all communication between containerized applications and infrastructure services.

Rationale

Security Benefits

  • No Network Exposure: Infrastructure services bind only to Unix sockets
    # PostgreSQL configuration
    listen_addresses = ''                    # No TCP binding
    unix_socket_directories = '/var/run/postgresql'
    
    # Valkey configuration  
    port 0                                   # Disable TCP port
    unixsocket /var/run/valkey/valkey.sock
    
  • Filesystem Permissions: Access controlled by Unix file permissions
    srwxrwx--- 1 postgres postgres 0 /var/run/postgresql/.s.PGSQL.5432
    srwxrwx--- 1 valkey   valkey   0 /var/run/valkey/valkey.sock
    
  • Group-Based Access: Simple group membership controls access
    # Add application user to infrastructure groups
    usermod -a -G postgres,valkey authentik
    
  • No Network Scanning: Services invisible to network reconnaissance

Performance Advantages

  • Lower Latency: Unix sockets have ~20% lower latency than TCP loopback
  • Higher Throughput: Up to 40% higher throughput for local communication
  • Reduced CPU Overhead: No network stack processing required
  • Efficient Data Transfer: Direct kernel-level data copying

Operational Benefits

  • Connection Reliability: Filesystem-based connections are more reliable
  • Resource Monitoring: Standard filesystem monitoring applies
  • Backup Friendly: No network configuration to backup/restore
  • Debugging: Standard filesystem tools for troubleshooting

Implementation Strategy

Container Socket Access

# Container configuration (Quadlet)
[Container]
# Mount socket directories with proper labels
Volume=/var/run/postgresql:/var/run/postgresql:Z
Volume=/var/run/valkey:/var/run/valkey:Z

# Preserve user namespace and groups
PodmanArgs=--userns=host
Annotation=run.oci.keep_original_groups=1

Application Configuration

# Database connection (PostgreSQL)
DATABASE_URL=postgresql://authentik@/authentik?host=/var/run/postgresql

# Cache connection (Redis/Valkey) 
CACHE_URL=unix:///var/run/valkey/valkey.sock?db=1&password=secret

User Management

# Ansible user setup
- name: Add application user to infrastructure groups
  user:
    name: "{{ app_user }}"
    groups:
      - postgres  # For database access
      - valkey    # For cache access
    append: true

Consequences

Positive

  • Security: Eliminated network attack vectors for databases
  • Performance: Measurably faster database and cache operations
  • Reliability: More stable connections than network-based
  • Simplicity: Simpler configuration than network + authentication

Negative

  • Container Complexity: Requires careful container user/group management
  • Learning Curve: Less familiar than standard TCP connections
  • Port Forwarding: Cannot use standard port forwarding for debugging

Mitigation Strategies

  • Documentation: Comprehensive guides for Unix socket configuration
  • Testing: Automated tests verify socket connectivity
  • Tooling: Helper scripts for debugging socket connections

Technical Implementation

# Test socket connectivity
sudo -u authentik psql -h /var/run/postgresql -U authentik -d authentik
sudo -u authentik redis-cli -s /var/run/valkey/valkey.sock ping

# Container user verification
podman exec authentik-server id
# uid=963(authentik) gid=963(authentik) groups=963(authentik),968(postgres),965(valkey)

Alternatives Considered

  1. TCP with Authentication: Rejected due to network exposure
  2. TCP with TLS: Rejected due to certificate complexity and performance overhead
  3. Shared Memory: Rejected due to implementation complexity

ADR-003: Podman + systemd Container Orchestration

Status: Accepted
Date: December 2025
Deciders: Infrastructure Team
Technical Story: Container orchestration solution for rootless, secure application deployment with systemd integration.

Context

Container orchestration options for a single-node infrastructure:

  1. Docker + Docker Compose: Traditional container orchestration
  2. Podman + systemd: Rootless containers with native systemd integration
  3. Kubernetes: Full orchestration platform (overkill for single node)
  4. Nomad: HashiCorp orchestration solution

Decision

We will use Podman with systemd integration (Quadlet) for container orchestration.

Rationale

Security Advantages

  • Rootless Architecture: No privileged daemon required
    # Docker: Requires root daemon
    sudo systemctl status docker
    
    # Podman: Rootless operation
    systemctl --user status podman
    
  • No Daemon Attack Surface: No long-running privileged process
  • User Namespace Isolation: Each user's containers are isolated
  • SELinux Integration: Better SELinux support than Docker

systemd Integration Benefits

  • Native Service Management: Containers as systemd services
    # Quadlet file: ~/.config/containers/systemd/authentik.pod
    [Unit]
    Description=Authentik Authentication Pod
    
    [Pod]
    PublishPort=127.0.0.1:9000:9000
    
    [Service]  
    Restart=always
    
    [Install]
    WantedBy=default.target
    
  • Dependency Management: systemd handles service dependencies
  • Resource Control: systemd resource limits and monitoring
  • Logging Integration: journald for centralized logging

Operational Excellence

  • Familiar Tooling: Standard systemd commands
    systemctl --user status authentik-pod
    systemctl --user restart authentik-server
    journalctl --user -u authentik-server -f
    
  • Boot Integration: Services start automatically with user sessions
  • Resource Monitoring: systemd resource tracking
  • Configuration Management: Declarative Quadlet files

Performance Benefits

  • Lower Overhead: No daemon overhead for container management
  • Direct Kernel Access: Better performance than daemon-based solutions
  • Resource Efficiency: More efficient resource utilization

Implementation Architecture

┌─────────────────────────────────────────────────────────────┐
│ systemd User Session (authentik)                           │
│                                                             │
│ ┌─────────────────┐  ┌─────────────────┐  ┌───────────────┐ │
│ │ authentik-pod   │  │ authentik-server│  │authentik-worker│ │
│ │ .service        │  │ .service        │  │ .service       │ │
│ └─────────────────┘  └─────────────────┘  └───────────────┘ │
│           │                    │                    │       │
│           └────────────────────┼────────────────────┘       │
│                                │                            │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Podman Pod (rootless)                                   │ │
│ │                                                         │ │
│ │ ┌─────────────────┐  ┌─────────────────────────────────┐ │ │
│ │ │ Server Container│  │ Worker Container                │ │ │
│ │ │ UID: 963 (host) │  │ UID: 963 (host)                │ │ │
│ │ │ Groups: postgres│  │ Groups: postgres,valkey        │ │ │
│ │ │         valkey  │  │                                 │ │ │
│ │ └─────────────────┘  └─────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘

Quadlet Configuration

# Pod configuration (authentik.pod)
[Unit]
Description=Authentik Authentication Pod

[Pod]
PublishPort=127.0.0.1:9000:9000
ShmSize=256m

[Service]
Restart=always

[Install]
WantedBy=default.target
# Container configuration (authentik-server.container)
[Unit]
Description=Authentik Server Container
After=authentik-pod.service
Requires=authentik-pod.service

[Container]
ContainerName=authentik-server
Image=ghcr.io/goauthentik/server:2025.10
Pod=authentik.pod
EnvironmentFile=%h/.env
User=%i:%i
Annotation=run.oci.keep_original_groups=1

# Volume mounts for sockets
Volume=/var/run/postgresql:/var/run/postgresql:Z
Volume=/var/run/valkey:/var/run/valkey:Z

[Service]
Restart=always

[Install]
WantedBy=default.target

User Management Strategy

# Ansible implementation
- name: Create service user
  user:
    name: authentik
    system: true
    home: /opt/authentik
    create_home: true

- name: Add to infrastructure groups
  user:
    name: authentik
    groups: [postgres, valkey]
    append: true

- name: Enable lingering (services persist)
  command: loginctl enable-linger authentik

Consequences

Positive

  • Security: Eliminated privileged daemon attack surface
  • Integration: Seamless systemd integration for management
  • Performance: Lower overhead than daemon-based solutions
  • Reliability: systemd's proven service management
  • Monitoring: Standard systemd monitoring and logging

Negative

  • Learning Curve: Different from Docker Compose workflows
  • Tooling: Ecosystem less mature than Docker
  • Documentation: Fewer online resources and examples

Mitigation Strategies

  • Documentation: Comprehensive internal documentation
  • Training: Team training on Podman/systemd workflows
  • Tooling: Helper scripts for common operations

Technical Implementation

# Container management (as service user)
systemctl --user status authentik-pod
systemctl --user restart authentik-server
podman ps
podman logs authentik-server

# Resource monitoring
systemctl --user show authentik-server --property=MemoryCurrent
journalctl --user -u authentik-server -f

Alternatives Considered

  1. Docker + Docker Compose: Rejected due to security concerns (privileged daemon)
  2. Kubernetes: Rejected as overkill for single-node deployment
  3. Nomad: Rejected to maintain consistency with systemd ecosystem

ADR-004: Forward Authentication Security Model

Status: Accepted
Date: December 2025
Deciders: Infrastructure Team
Technical Story: Centralized authentication and authorization for multiple services without modifying existing applications.

Context

Authentication strategies for multiple services:

  1. Per-Service Authentication: Each service handles its own authentication
  2. Shared Database: Services share authentication database
  3. Forward Authentication: Reverse proxy handles authentication
  4. OAuth2/OIDC Integration: Services implement OAuth2 clients

Decision

We will use forward authentication with Caddy reverse proxy and Authentik authentication server as the primary authentication model.

Rationale

Security Benefits

  • Single Point of Control: Centralized authentication policy
  • Zero Application Changes: Protect existing services without modification
  • Consistent Security: Same security model across all services
  • Session Management: Centralized session handling and timeouts
  • Multi-Factor Authentication: MFA applied consistently across services

Operational Advantages

  • Simplified Deployment: No per-service authentication setup
  • Audit Trail: Centralized authentication logging
  • Policy Management: Single place to manage access policies
  • User Management: One system for all user administration
  • Service Independence: Services focus on business logic

Integration Benefits

  • Transparent to Applications: Services receive authenticated requests
  • Header-Based Identity: Simple identity propagation
    Remote-User: john.doe
    Remote-Name: John Doe
    Remote-Email: john.doe@company.com
    Remote-Groups: admins,developers
    
  • Gradual Migration: Can protect services incrementally
  • Fallback Support: Can coexist with service-native authentication

Implementation Architecture

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│    User     │    │    Caddy    │    │  Authentik  │    │   Service   │
│             │    │  (Proxy)    │    │   (Auth)    │    │ (Backend)   │
└──────┬──────┘    └──────┬──────┘    └──────┬──────┘    └──────┬──────┘
       │                  │                  │                  │
       │ GET /dashboard   │                  │                  │
       │─────────────────▶│                  │                  │
       │                  │                  │                  │
       │                  │ Forward Auth     │                  │
       │                  │─────────────────▶│                  │
       │                  │                  │                  │
       │                  │ 401 Unauthorized │                  │
       │                  │◀─────────────────│                  │
       │                  │                  │                  │
       │ 302 → /auth/login│                  │                  │
       │◀─────────────────│                  │                  │
       │                  │                  │                  │
       │ Login form       │                  │                  │
       │─────────────────▶│─────────────────▶│                  │
       │                  │                  │                  │
       │ Credentials      │                  │                  │
       │─────────────────▶│─────────────────▶│                  │
       │                  │                  │                  │
       │ Set-Cookie       │                  │                  │
       │◀─────────────────│◀─────────────────│                  │
       │                  │                  │                  │
       │ GET /dashboard   │                  │                  │
       │─────────────────▶│                  │                  │
       │                  │                  │                  │
       │                  │ Forward Auth     │                  │
       │                  │─────────────────▶│                  │
       │                  │                  │                  │
       │                  │ 200 + Headers    │                  │
       │                  │◀─────────────────│                  │
       │                  │                  │                  │
       │                  │ GET /dashboard + Auth Headers       │
       │                  │─────────────────────────────────────▶│
       │                  │                                     │
       │                  │ Dashboard Content                   │
       │                  │◀─────────────────────────────────────│
       │                  │                                     │
       │ Dashboard        │                                     │
       │◀─────────────────│                                     │

Caddy Configuration

# Service protection template
dashboard.jnss.me {
    # Forward authentication to Authentik
    forward_auth https://auth.jnss.me {
        uri /outpost.goauthentik.io/auth/caddy
        copy_headers Remote-User Remote-Name Remote-Email Remote-Groups
    }
    
    # Backend service (receives authenticated requests)
    reverse_proxy localhost:8080
}

Service Integration

Services receive authentication information via HTTP headers:

# Example service code (Python Flask)
@app.route('/dashboard')
def dashboard():
    username = request.headers.get('Remote-User')
    name = request.headers.get('Remote-Name') 
    email = request.headers.get('Remote-Email')
    groups = request.headers.get('Remote-Groups', '').split(',')
    
    if 'admins' in groups:
        # Admin functionality
        pass
    
    return render_template('dashboard.html', 
                         username=username, 
                         name=name)

Authentik Provider Configuration

# Authentik Proxy Provider configuration
name: "Service Forward Auth"
authorization_flow: "default-authorization-flow"
external_host: "https://service.jnss.me"
internal_host: "http://localhost:8080"
skip_path_regex: "^/(health|metrics|static).*"

Authorization Policies

# Example authorization policy in Authentik
policy_bindings:
  - policy: "group_admins_only"
    target: "service_dashboard"
    order: 0
  
  - policy: "deny_external_ips" 
    target: "admin_endpoints"
    order: 1

Consequences

Positive

  • Security: Consistent, centralized authentication and authorization
  • Simplicity: No application changes required for protection
  • Flexibility: Fine-grained access control through Authentik policies
  • Auditability: Centralized authentication logging
  • User Experience: Single sign-on across all services

Negative

  • Single Point of Failure: Authentication system failure affects all services
  • Performance: Additional hop for authentication checks
  • Complexity: Additional component in the request path

Mitigation Strategies

  • High Availability: Robust deployment and monitoring of auth components
  • Caching: Session caching to reduce authentication overhead
  • Fallback: Emergency bypass procedures for critical services
  • Monitoring: Comprehensive monitoring of authentication flow

Security Considerations

Session Security

# Authentik session settings
session_cookie_age: 3600  # 1 hour
session_cookie_secure: true
session_cookie_samesite: "Strict"
session_remember_me: false

Access Control

  • Group-Based Authorization: Users assigned to groups, groups to applications
  • Time-Based Access: Temporary access grants
  • IP-Based Restrictions: Geographic or network-based access control
  • MFA Requirements: Multi-factor authentication for sensitive services

Audit Logging

{
  "timestamp": "2025-12-11T17:52:31Z",
  "event": "authentication_success",
  "user": "john.doe",
  "service": "dashboard.jnss.me",
  "ip": "192.168.1.100",
  "user_agent": "Mozilla/5.0..."
}

Alternative Models Supported

While forward auth is primary, we also support:

  1. OAuth2/OIDC Integration: For applications that can implement OAuth2
  2. API Key Authentication: For service-to-service communication
  3. Service-Native Auth: For legacy applications that cannot be easily protected

Implementation Examples

Protecting a Static Site

docs.jnss.me {
    forward_auth https://auth.jnss.me {
        uri /outpost.goauthentik.io/auth/caddy
        copy_headers Remote-User Remote-Groups
    }
    
    root * /var/www/docs
    file_server
}

Protecting an API

api.jnss.me {
    forward_auth https://auth.jnss.me {
        uri /outpost.goauthentik.io/auth/caddy
        copy_headers Remote-User Remote-Email Remote-Groups
    }
    
    reverse_proxy localhost:3000
}

Public Endpoints with Selective Protection

app.jnss.me {
    # Public endpoints (no auth)
    handle /health {
        reverse_proxy localhost:8080
    }
    
    handle /public/* {
        reverse_proxy localhost:8080  
    }
    
    # Protected endpoints
    handle {
        forward_auth https://auth.jnss.me {
            uri /outpost.goauthentik.io/auth/caddy
            copy_headers Remote-User Remote-Groups
        }
        reverse_proxy localhost:8080
    }
}

Alternatives Considered

  1. OAuth2 Only: Rejected due to application modification requirements
  2. Shared Database: Rejected due to tight coupling between services
  3. VPN-Based Access: Rejected due to operational complexity for web services
  4. Per-Service Authentication: Rejected due to management overhead

Summary

These architecture decisions collectively create a robust, secure, and performant infrastructure:

  • Native Services provide optimal performance and security
  • Unix Sockets eliminate network attack vectors
  • Podman + systemd delivers secure container orchestration
  • Forward Authentication enables centralized security without application changes

The combination results in an infrastructure that prioritizes security and performance while maintaining operational simplicity and reliability.

References