- Update architecture-decisions.md: Change decision to OAuth/OIDC primary, forward auth fallback
  - Add comprehensive OAuth/OIDC and forward auth flow diagrams
  - Add decision matrix comparing both authentication methods
  - Include real examples: Nextcloud/Gitea OAuth configs, whoami forward auth
  - Update rationale to emphasize OAuth/OIDC security and standards benefits
- Update authentication-architecture.md: Align with new OAuth-first approach
  - Add 'Choosing the Right Pattern' section with clear decision guidance
  - Swap pattern order: OAuth/OIDC (Pattern 1), Forward Auth (Pattern 2)
  - Update Example 1: Change Gitea from forward auth to OAuth/OIDC integration
  - Add emphasis on primary vs fallback methods throughout
- Update authentik-deployment-guide.md: Reflect OAuth/OIDC preference
  - Update overview to mention OAuth2/OIDC provider and forward auth fallback
  - Add decision guidance to service integration examples
  - Reorder examples: Nextcloud OAuth (primary), forward auth (fallback)
  - Clarify forward auth should only be used for services without OAuth support
This update ensures all authentication documentation consistently reflects the
agreed architectural decision: use OAuth/OIDC when services support it
(Nextcloud, Gitea, modern apps), and only use forward auth as a fallback for
legacy applications, static sites, or simple tools without OAuth capabilities.
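
The decision rule described above can be sketched as a small function. This is only an illustration of the rule, not code from the repository: the `Service` type, its `supports_oidc` field, and the pattern labels are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Hypothetical description of a service to be placed behind SSO."""
    name: str
    supports_oidc: bool  # assumption: the service ships a native OAuth/OIDC client


def choose_auth_pattern(service: Service) -> str:
    """Prefer OAuth/OIDC; fall back to forward auth only when it is unsupported."""
    return "oauth-oidc" if service.supports_oidc else "forward-auth"


# Services named in the commit text, flagged the way the text classifies them.
services = [
    Service("nextcloud", supports_oidc=True),
    Service("gitea", supports_oidc=True),
    Service("whoami", supports_oidc=False),
]
for svc in services:
    print(f"{svc.name}: {choose_auth_pattern(svc)}")
```

Keeping the rule in one place mirrors the intent of the documentation change: forward auth is chosen only when the OAuth/OIDC check fails.
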
## New Features
- **Nextcloud Role**: Complete cloud storage deployment using Podman Quadlet
  - FPM variant with Caddy reverse proxy and FastCGI
  - PostgreSQL database via Unix socket
  - Valkey/Redis for app-level caching and file locking
  - Automatic HTTPS with Let's Encrypt via Caddy
  - Dual-root pattern: Caddy serves static assets, FPM handles PHP
- **Split Caching Strategy**: Redis caching WITHOUT Redis sessions
  - Custom redis.config.php template for app-level caching only
  - File-based PHP sessions for stability (avoids session lock issues)
  - Prevents cascading failures from session lock contention
  - Documented in role README with detailed rationale
## Infrastructure Updates
- **Socket Permissions**: Update PostgreSQL and Valkey to mode 777
  - Required for containers that switch users (root → www-data)
  - Nextcloud container loses supplementary groups on user switch
  - Security maintained via password authentication (scram-sha-256, requirepass)
  - Documented socket permission architecture in docs/
- **PostgreSQL**: Export client group GID as fact for dependent roles
- **Valkey**: Export client group GID as fact, update socket fix service
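
The 777 vs 770 trade-off above comes down to which permission class a process falls into once it has lost its supplementary groups. The following is a minimal sketch of that check, assuming classic POSIX owner/group/other semantics and ignoring root bypass, ACLs, and directory permissions. The UIDs and GIDs are illustrative placeholders, not values taken from this deployment.

```python
import stat


def can_connect(mode: int, owner_uid: int, owner_gid: int,
                proc_uid: int, proc_gids: set) -> bool:
    """Approximate the write-permission check applied when connecting to a Unix socket.

    Only the first matching class (owner, group, other) is consulted.
    """
    if proc_uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if owner_gid in proc_gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


# Illustrative IDs (assumptions): socket owned by uid 969 and client group 962;
# the application runs as www-data (uid 33, gid 33) after the user switch.
SOCKET_OWNER, CLIENT_GROUP, WWW_DATA = 969, 962, 33

with_group = can_connect(0o770, SOCKET_OWNER, CLIENT_GROUP, WWW_DATA, {WWW_DATA, CLIENT_GROUP})
after_switch_770 = can_connect(0o770, SOCKET_OWNER, CLIENT_GROUP, WWW_DATA, {WWW_DATA})
after_switch_777 = can_connect(0o777, SOCKET_OWNER, CLIENT_GROUP, WWW_DATA, {WWW_DATA})

print("770, client group kept:", with_group)
print("770, groups dropped on user switch:", after_switch_770)
print("777, groups dropped on user switch:", after_switch_777)
```

With mode 770 the process is only admitted while it still carries the client group; once the group is dropped it falls into the "other" class, which is why the commit relies on 777 plus password authentication.
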
## Documentation
- New: docs/socket-permissions-architecture.md
  - Explains 777 vs 770 socket permission trade-offs
  - Documents why group-based access doesn't work for user-switching containers
  - Provides TCP alternative for stricter security requirements
- Updated: All role READMEs with socket permission notes
- New: Nextcloud README with comprehensive deployment, troubleshooting, and Redis architecture documentation
## Configuration
- host_vars: Add Nextcloud vault variables and configuration
- site.yml: Include Nextcloud role in main playbook
## Technical Details
**Why disable Redis sessions?**
The official Nextcloud container enables Redis session handling whenever the REDIS_HOST
env var is set, which causes severe performance issues through the following failure chain:
1. Session lock contention under high concurrency (browser parallel asset requests)
2. Infinite lock retries (default lock_retries=-1) blocking FPM workers
3. Timeout orphaning: reverse proxy kills connection, worker keeps lock
4. Worker pool exhaustion: all 5 default workers blocked on same session lock
5. Cascading failure: new requests queue, more timeouts, more orphaned locks
Solution: Use file-based sessions (reliable and fast on a single server) while keeping
Redis for distributed caching and transactional file locking via a custom config file.
This retains the caching benefits without exposing FPM workers to Redis session lock contention.
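
The failure chain above can be illustrated with a toy, tick-based model of a worker pool in which every request needs the same session lock. This is a simplified sketch under stated assumptions, not a model of how PHP-FPM or the Redis session handler is actually implemented: the function name, the tick durations, and the arrival patterns are all hypothetical.

```python
def simulate(pool_size: int, arrivals: list, hold_ticks: int) -> list:
    """Return (busy_workers, queued_requests) for each tick.

    All requests belong to one session and are serialized by one lock.
    hold_ticks (>= 1) is how long the lock holder keeps the lock.
    """
    queued = 0        # requests waiting for a free worker
    waiting = 0       # workers blocked on the session lock
    holder_left = 0   # ticks until the lock holder finishes (0 = lock is free)
    history = []
    for new in arrivals:
        queued += new
        # Free workers pick up queued requests, then block on the lock.
        busy = waiting + (1 if holder_left else 0)
        picked = min(pool_size - busy, queued)
        queued -= picked
        waiting += picked
        # One blocked worker acquires the lock if it is free.
        if holder_left == 0 and waiting:
            waiting -= 1
            holder_left = hold_ticks
        history.append((waiting + (1 if holder_left else 0), queued))
        # The holder works for one tick; the lock frees when this reaches 0.
        if holder_left:
            holder_left -= 1
    return history


# Six parallel asset requests hit a pool of 5 workers; each holds the lock 3 ticks.
burst = simulate(pool_size=5, arrivals=[6, 0, 0, 0], hold_ticks=3)

# An effectively orphaned lock (never released within the run) while requests keep arriving.
orphaned = simulate(pool_size=5, arrivals=[5, 1, 1], hold_ticks=10**6)

print(burst)
print(orphaned)
```

In the burst case the pool stays fully occupied while only one request makes progress at a time; in the orphaned case no worker is ever freed and the queue grows with every new arrival, which is the cascading pattern listed in steps 3 to 5.
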
Tested: Fresh deployment on arch-vps (69.62.119.31)
Domain: https://cloud.jnss.me/
Changes:
- Migrate Authentik to journald logging (remove file-based logs)
- Update Gitea to use infrastructure variables for PostgreSQL access
- Add comprehensive logging documentation to deployment guide
- Add infrastructure variable pattern guide to integration docs
Authentik Logging:
- Remove LogDriver=k8s-file from server and worker containers
- Remove logs directory creation from user setup tasks
- Update deployment guide with journald examples and JSON log patterns
Gitea Infrastructure Variables:
- Add infrastructure dependencies section to role defaults
- Replace hardcoded paths with postgresql_unix_socket_directories variable
- Replace hardcoded 'postgres' group with postgresql_client_group variable
- Add infrastructure variable validation in tasks
- Remove manual socket permission override (handled by infrastructure)
Documentation:
- Add journald logging best practices to service integration guide
- Add infrastructure variable pattern documentation with Gitea example
- Update Authentik deployment guide with journald commands and JSON filtering
- Document benefits: centralized logging, single source of truth, maintainability
Validated on arch-vps:
- Authentik logs accessible via journalctl and podman logs (identical output)
- Gitea user added to postgres-clients group (GID 962)
- No PostgreSQL socket permission errors after service restart
Major architectural change from rootless user services to system-level (rootful)
containers to enable group-based Unix socket access for containerized applications.
Infrastructure Changes:
- PostgreSQL: Export postgres-clients group GID as Ansible fact
- Valkey: Export valkey-clients group GID as Ansible fact
- Valkey: Add socket-fix service to maintain correct socket group ownership
- Both: Set socket directories to 770 with client group ownership
Authentik Role Refactoring:
- Remove rootless container configuration (subuid/subgid, lingering, user systemd)
- Deploy Quadlet files to /etc/containers/systemd/ (system-level)
- Use dynamic GID facts in container PodmanArgs (--group-add)
- Simplify user creation to system user with infrastructure group membership
- Update handlers for system scope service management
- Remove unnecessary container security options (no user namespace isolation)
Container Template Changes:
- Pod: Remove --userns args, change WantedBy to multi-user.target
- Containers: Replace Annotation with PodmanArgs using dynamic GIDs
- Containers: Remove /dev/shm mounts and SecurityLabelDisable (not needed for rootful)
- Containers: Change WantedBy to multi-user.target for system services
Documentation Updates:
- Add ADR-005: Rootful Containers with Infrastructure Fact Pattern
- Update ADR-003: Podman + systemd for system-level deployment
- Update authentik-deployment-guide.md for system scope commands
- Update service-integration-guide.md with rootful pattern examples
- Document discarded rootless approach and rationale
Why Rootful Succeeds:
- Direct UID/GID mapping preserves supplementary groups
- Container process groups match host socket group ownership
- No user namespace remapping to break permissions
Why Rootless Failed (Discarded):
- User namespace UID/GID remapping broke group-based socket access
- Supplementary groups remapped into subgid range didn't match socket ownership
- Even with --userns=host and keep_original_groups, permissions failed
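
The remapping problem can be shown with a little arithmetic. The sketch below assumes the common rootless layout, where container GID 0 maps to the invoking user's own GID and container GIDs from 1 upward map onto the user's subordinate GID range. It covers only that default mapping, not the `--userns=host` variant mentioned above. The function name is hypothetical; the numeric values are taken from elsewhere in this log (authentik GID 966, postgres-clients GID 962, subgid range 200000:65536).

```python
def container_gid_to_host(container_gid: int, user_gid: int,
                          subgid_start: int, subgid_count: int) -> int:
    """Map a GID inside a rootless container to the host GID it represents.

    Assumed layout: container GID 0 -> the user's own GID,
    container GIDs 1..subgid_count -> the subordinate GID range.
    """
    if container_gid == 0:
        return user_gid
    if 1 <= container_gid <= subgid_count:
        return subgid_start + container_gid - 1
    raise ValueError(f"GID {container_gid} is not mapped in this namespace")


SOCKET_GROUP = 962  # postgres-clients on the host

# A process that joins group 962 *inside* the container is a different group on the host.
host_gid = container_gid_to_host(962, user_gid=966, subgid_start=200000, subgid_count=65536)

print("host GID seen by the kernel:", host_gid)
print("matches socket group:", host_gid == SOCKET_GROUP)
```

Because the kernel checks the host-side GID, the group membership that looks correct inside the container never matches the socket's group ownership.
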
Pattern Established:
- Infrastructure roles create client groups and export GID facts
- Application roles validate facts and consume in container templates
- Rootful containers run as dedicated users with --group-add for socket access
- System-level deployment provides standard systemd service management
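
The validate-then-consume step of this pattern can be sketched as follows. This is an illustrative Python rendering of what an application role's template logic does, not the actual Ansible or Jinja code: the fact names in `REQUIRED_FACTS` and the function name are hypothetical, while `PodmanArgs=` and `--group-add` are the Quadlet key and Podman flag named in this commit.

```python
# Hypothetical fact names; the real roles may export them under different keys.
REQUIRED_FACTS = ("postgresql_client_group_gid", "valkey_client_group_gid")


def render_podman_args(facts: dict) -> str:
    """Validate exported GID facts and render a Quadlet PodmanArgs line."""
    missing = [name for name in REQUIRED_FACTS if name not in facts]
    if missing:
        # Fail early, as an application role would, when infrastructure has not run.
        raise ValueError("missing infrastructure facts: " + ", ".join(missing))
    flags = " ".join(f"--group-add {int(facts[name])}" for name in REQUIRED_FACTS)
    return f"PodmanArgs={flags}"


# GIDs as reported in the validation notes of this commit.
line = render_podman_args({
    "postgresql_client_group_gid": 962,
    "valkey_client_group_gid": 961,
})
print(line)
```

Failing on a missing fact keeps the dependency explicit: an application role cannot silently deploy a container without the group membership its sockets require.
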
Deployment Validated:
- Services in /system.slice/ ✓
- Process groups: 961 (valkey-clients), 962 (postgres-clients), 966 (authentik) ✓
- Socket permissions: 770 with client groups ✓
- HTTP endpoint responding ✓
Resolves authentik deployment issues by implementing proper Podman Quadlet
configuration and fixing networking for external access through Caddy.
Core Fixes:
• Add missing [Install] sections to container Quadlet files for systemd service generation
• Fix pod references from 'systemd-authentik' to 'authentik.pod' for proper Quadlet linking
• Remove problematic --userns=host to use proper rootless user namespaces
• Configure subuid/subgid ranges for authentik user (200000:65536)
• Update networking to bind 0.0.0.0:9000 only (remove unnecessary HTTPS port 9443)
• Add AUTHENTIK_LISTEN__HTTP=0.0.0.0:9000 environment configuration
• Fix Caddy reverse proxy to use HTTP backend instead of HTTPS
Infrastructure Updates:
• Enhance PostgreSQL role with Unix socket configuration and user management
• Improve Valkey role with proper systemd integration and socket permissions
• Add comprehensive service integration documentation
• Update deployment playbooks with backup and restore capabilities
Security Improvements:
• Secure network isolation with Caddy SSL termination
• Reduced attack surface by removing direct HTTPS container exposure
• Proper rootless container configuration with user namespace mapping
Result: authentik now fully operational with external HTTPS access via auth.jnss.me
All systemd services (authentik-pod, authentik-server, authentik-worker) running correctly.