- Implement complete Vaultwarden deployment using Podman Quadlet
- PostgreSQL backend via Unix socket with 777 permissions
- Caddy reverse proxy with WebSocket support for live sync
- Control-node admin token hashing using argon2 (OWASP preset)
- Idempotent token hashing with deterministic salt generation
- Full Authentik SSO integration following official guide
- SMTP email configuration support (optional)
- Invitation-only user registration by default
- Comprehensive documentation with setup and troubleshooting guides
Technical Details:
- Container: vaultwarden/server:latest from Docker Hub
- Database: PostgreSQL via /var/run/postgresql socket
- Port: 8080 (localhost only, proxied by Caddy)
- Domain: vault.jnss.me
- Admin token: Hashed on control node with argon2id
- SSO: OpenID Connect with offline_access scope support
Role includes automatic argon2 installation on control node if needed.
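
The idempotent hashing described above can be sketched as follows. This is a minimal illustration, not the role's actual implementation: the seed string, the `context` label, and both function names are hypothetical, and the argon2 cost parameters are an assumption based on the OWASP minimum recommendation.

```python
import base64
import hashlib


def deterministic_salt(seed: str, context: str = "vaultwarden-admin-token") -> str:
    """Derive a stable, base64-encoded salt from a seed string.

    The same (seed, context) pair always yields the same salt, so hashing
    the same token twice gives the same result and the task reports no change.
    """
    digest = hashlib.sha256(f"{context}:{seed}".encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


def argon2_command(salt: str) -> list[str]:
    """Build an argon2 CLI invocation; the token itself is piped on stdin.

    Assumption: -k 19456 -t 2 -p 1 stands in for the "OWASP preset".
    """
    return ["argon2", salt, "-e", "-id", "-k", "19456", "-t", "2", "-p", "1"]


if __name__ == "__main__":
    salt = deterministic_salt("vault.jnss.me")  # hypothetical seed
    print(" ".join(argon2_command(salt)))
```

Because the salt no longer changes between runs, the encoded hash stays byte-identical and a templated config file is not rewritten on every play.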
Major architectural changes:
- Replace config file templating with unified OCC command script
- Remove custom_apps mount overlay that caused Caddy serving issues
- Implement script-based configuration for idempotency and clarity
Configuration improvements:
- Add email/SMTP support with master switch (nextcloud_email_enabled)
- Add OIDC/SSO integration with Authentik support
- Add apps installation (user_oidc, calendar, contacts)
- Enable group provisioning and quota management from OIDC
- Set nextcloud_oidc_unique_uid to false per Authentik docs
Files removed:
- nextcloud.config.php.j2 (replaced by OCC commands)
- redis.config.php.j2 (replaced by OCC commands)
- optimization.yml (merged into configure.yml)
Files added:
- configure-nextcloud.sh.j2 (single source of truth for config)
- configure.yml (deploys and runs configuration script)
Documentation:
- Add comprehensive OIDC setup guide with Authentik integration
- Document custom scope mapping and group provisioning
- Add email configuration examples for common providers
- Update vault variables documentation
- Explain two-phase deployment approach
Host configuration:
- Change admin user from 'admin' to 'joakim'
- Add admin email configuration
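
As a rough sketch of the script-based approach, the snippet below turns a settings mapping into `occ config:system:set` calls. The function name and the example keys are illustrative assumptions; the real configure-nextcloud.sh.j2 template is not reproduced here.

```python
import shlex


def occ_commands(settings: dict) -> list[str]:
    """Translate a flat settings mapping into `occ config:system:set` calls."""
    commands = []
    for key, value in settings.items():
        # bool must be checked before int because bool is a subclass of int
        if isinstance(value, bool):
            occ_type, text = "boolean", "true" if value else "false"
        elif isinstance(value, int):
            occ_type, text = "integer", str(value)
        else:
            occ_type, text = "string", str(value)
        commands.append(
            "php occ config:system:set "
            f"{shlex.quote(key)} --value={shlex.quote(text)} --type={occ_type}"
        )
    return commands


if __name__ == "__main__":
    # Example keys only; the role's actual settings may differ.
    for line in occ_commands({"default_phone_region": "NO", "maintenance_window_start": 1}):
        print(line)
```

Setting a value that is already in place leaves the configuration unchanged, which is what makes re-running such a script safe.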
Changes:
- Configure Gitea mailer with Titan Email SMTP settings
- Add SMTP_AUTH = PLAIN for authentication method specification
- Update SMTP password in vault (vault_gitea_smtp_password)
Email Status:
Currently non-functional due to SMTP authentication rejection by Titan Email
servers. Error: 535 5.7.8 authentication failed
Troubleshooting Performed:
- Tested both port 587 (STARTTLS) and 465 (SSL/TLS)
- Verified credentials work in webmail
- Tested AUTH PLAIN and AUTH LOGIN methods
- Removed conflicting TLS settings
- Both authentication methods rejected despite correct credentials
Root Cause:
The issue is NOT a Gitea configuration problem. Titan Email SMTP server
is rejecting all authentication attempts from the VPS (69.62.119.31)
despite credentials being correct and working in webmail.
Possible causes:
- SMTP access may need to be enabled in Hostinger control panel
- VPS IP may require whitelisting
- Account may need additional verification for SMTP access
- Titan Email plan may not include external SMTP access
Documentation:
Created comprehensive troubleshooting guide at:
docs/gitea-email-troubleshooting.md
Files Modified:
- roles/gitea/templates/app.ini.j2 (+1 line: SMTP_AUTH = PLAIN)
- docs/gitea-email-troubleshooting.md (new file, complete troubleshooting log)
- host_vars/arch-vps/vault.yml (updated SMTP password - not committed)
Next Steps:
- Check Hostinger control panel for SMTP/IMAP access toggle
- Test SMTP from different IP to rule out IP blocking
- Contact Hostinger/Titan support for SMTP access verification
- Consider alternative email providers if Titan SMTP unavailable
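
For the "test SMTP from a different IP" step, a small standalone probe along these lines may help. It is a sketch only: host, username and password are supplied by the caller, and the function names are hypothetical.

```python
import base64
import smtplib
import ssl


def auth_plain_payload(username: str, password: str) -> str:
    """Return the base64 string sent after `AUTH PLAIN` (RFC 4616)."""
    raw = f"\0{username}\0{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def try_login(host: str, port: int, username: str, password: str) -> str:
    """Attempt one SMTP login and report the outcome as a short string."""
    context = ssl.create_default_context()
    try:
        if port == 465:
            # Implicit TLS from the first byte
            client = smtplib.SMTP_SSL(host, port, timeout=10, context=context)
        else:
            # Plain connection upgraded with STARTTLS (typically port 587)
            client = smtplib.SMTP(host, port, timeout=10)
            client.starttls(context=context)
        with client:
            client.login(username, password)
        return "ok"
    except smtplib.SMTPAuthenticationError as exc:
        return f"auth rejected: {exc.smtp_code} {exc.smtp_error!r}"
    except (smtplib.SMTPException, OSError) as exc:
        return f"connection problem: {exc}"
```

Running `try_login` for ports 587 and 465 from both the VPS and another network would show whether the 535 response depends on the source IP.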
Major Changes:
- Add dual SSH mode system (passthrough default, dedicated fallback)
- Refactor domain configuration to use direct specification pattern
- Fix critical fail2ban security gap in dedicated mode
- Separate HTTP and SSH domains for cleaner Git URLs
- Restructure security playbook with modular nftables loader
  - Base rules loaded first, service rules second, drop rule last
- Add Gitea self-contained firewall management (port 2222)
- Add fail2ban protection for Gitea SSH brute force attacks
- Update documentation with new firewall architecture
- Create comprehensive Gitea deployment and testing guide
This enables self-contained service roles to manage their own firewall
rules without modifying the central security playbook. Each service
deploys rules to /etc/nftables.d/ which are loaded before the final
drop rule, maintaining the defense-in-depth security model.
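
The load order described above can be sketched as a small ordering function. The file names in the example are hypothetical; only the /etc/nftables.d/ directory comes from the commit.

```python
from pathlib import PurePosixPath


def nftables_load_order(base: str, service_files: list[str], final_drop: str) -> list[str]:
    """Return rule files in load order: base first, services sorted, drop last."""
    services = sorted(
        f for f in service_files
        if PurePosixPath(f).suffix == ".nft" and f not in (base, final_drop)
    )
    return [base, *services, final_drop]


if __name__ == "__main__":
    order = nftables_load_order(
        "/etc/nftables.d/00-base.nft",        # hypothetical name
        ["/etc/nftables.d/50-gitea.nft"],     # hypothetical name
        "/etc/nftables.d/99-drop.nft",        # hypothetical name
    )
    print("\n".join(order))
```

Keeping the drop rule in its own file that always sorts last is one way to let service roles add files without touching the central playbook.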
- Deploy /etc/containers/auth.json with GHCR credentials
- Support for private container image pulls
- Credentials encrypted in Ansible vault
- Used by devigo and other services pulling from private registries
- Updated documentation with authentication setup
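
For reference, the structure of a containers auth.json can be sketched like this. The username and token are placeholders, and the function name is hypothetical; in the role the values come from the Ansible vault.

```python
import base64
import json


def registry_auth_json(registry: str, username: str, token: str) -> str:
    """Render an auth.json document with one base64 "user:token" entry."""
    creds = base64.b64encode(f"{username}:{token}".encode("utf-8")).decode("ascii")
    return json.dumps({"auths": {registry: {"auth": creds}}}, indent=2)


if __name__ == "__main__":
    # Placeholder credentials, for illustration only
    print(registry_auth_json("ghcr.io", "example-user", "example-token"))
```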
## New Features
- **Nextcloud Role**: Complete cloud storage deployment using Podman Quadlet
  - FPM variant with Caddy reverse proxy and FastCGI
  - PostgreSQL database via Unix socket
  - Valkey/Redis for app-level caching and file locking
  - Automatic HTTPS with Let's Encrypt via Caddy
  - Dual-root pattern: Caddy serves static assets, FPM handles PHP
- **Split Caching Strategy**: Redis caching WITHOUT Redis sessions
  - Custom redis.config.php template for app-level caching only
  - File-based PHP sessions for stability (avoids session lock issues)
  - Prevents cascading failures from session lock contention
  - Documented in role README with detailed rationale
## Infrastructure Updates
- **Socket Permissions**: Update PostgreSQL and Valkey to mode 777
  - Required for containers that switch users (root → www-data)
  - Nextcloud container loses supplementary groups on user switch
  - Security maintained via password authentication (scram-sha-256, requirepass)
  - Documented socket permission architecture in docs/
- **PostgreSQL**: Export client group GID as fact for dependent roles
- **Valkey**: Export client group GID as fact, update socket fix service
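
The 777 versus 770 trade-off can be illustrated with a simplified model of the classic Unix permission check. This is a sketch under stated assumptions: it ignores root's bypass, ACLs and directory traversal, and the UID values (88, 33) are hypothetical; only GID 962 for postgres-clients appears elsewhere in this log.

```python
def class_bits(mode: int, owner_uid: int, owner_gid: int, uid: int, gids: set[int]) -> int:
    """Pick the rwx triplet that applies to a process (owner, group, or other)."""
    if uid == owner_uid:
        return (mode >> 6) & 0o7
    if owner_gid in gids:
        return (mode >> 3) & 0o7
    return mode & 0o7


def can_connect(mode: int, owner_uid: int, owner_gid: int, uid: int, gids: set[int]) -> bool:
    """Connecting to a Unix socket needs write permission on the socket file."""
    return bool(class_bits(mode, owner_uid, owner_gid, uid, gids) & 0o2)


if __name__ == "__main__":
    # Process still carries the client group: 770 is enough
    print(can_connect(0o770, 88, 962, 33, {33, 962}))
    # Supplementary groups dropped after the user switch: 770 fails
    print(can_connect(0o770, 88, 962, 33, {33}))
    # World-writable socket: works regardless of group membership
    print(can_connect(0o777, 88, 962, 33, {33}))
```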
## Documentation
- New: docs/socket-permissions-architecture.md
  - Explains 777 vs 770 socket permission trade-offs
  - Documents why group-based access doesn't work for user-switching containers
  - Provides TCP alternative for stricter security requirements
- Updated: All role READMEs with socket permission notes
- New: Nextcloud README with comprehensive deployment, troubleshooting, and Redis architecture documentation
## Configuration
- host_vars: Add Nextcloud vault variables and configuration
- site.yml: Include Nextcloud role in main playbook
## Technical Details
**Why disable Redis sessions?**
The official Nextcloud container enables Redis session handling via REDIS_HOST env var,
which causes severe performance issues:
1. Session lock contention under high concurrency (browser parallel asset requests)
2. Infinite lock retries (default lock_retries=-1) blocking FPM workers
3. Timeout orphaning: reverse proxy kills connection, worker keeps lock
4. Worker pool exhaustion: all 5 default workers blocked on same session lock
5. Cascading failure: new requests queue, more timeouts, more orphaned locks
Solution: Use file-based sessions (reliable, fast for single-server) while keeping
Redis for distributed cache and transactional file locking via custom config file.
This avoids the session-lock failure modes listed above without the complexity of debugging Redis sessions.
Tested: Fresh deployment on arch-vps (69.62.119.31)
Domain: https://cloud.jnss.me/
Changes:
- Migrate Authentik to journald logging (remove file-based logs)
- Update Gitea to use infrastructure variables for PostgreSQL access
- Add comprehensive logging documentation to deployment guide
- Add infrastructure variable pattern guide to integration docs
Authentik Logging:
- Remove LogDriver=k8s-file from server and worker containers
- Remove logs directory creation from user setup tasks
- Update deployment guide with journald examples and JSON log patterns
Gitea Infrastructure Variables:
- Add infrastructure dependencies section to role defaults
- Replace hardcoded paths with postgresql_unix_socket_directories variable
- Replace hardcoded 'postgres' group with postgresql_client_group variable
- Add infrastructure variable validation in tasks
- Remove manual socket permission override (handled by infrastructure)
Documentation:
- Add journald logging best practices to service integration guide
- Add infrastructure variable pattern documentation with Gitea example
- Update Authentik deployment guide with journald commands and JSON filtering
- Document benefits: centralized logging, single source of truth, maintainability
Validated on arch-vps:
- Authentik logs accessible via journalctl and podman logs (identical output)
- Gitea user added to postgres-clients group (GID 962)
- No PostgreSQL socket permission errors after service restart
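
As an example of the JSON filtering mentioned above, the sketch below reads lines produced by `journalctl -u authentik-server -o json` and keeps application events at or above a given level. It assumes the application writes one JSON object per log line with `level` and `event` keys; those field names and the function name are assumptions, not taken from the deployment guide.

```python
import json

LEVELS = ["debug", "info", "warning", "error", "critical"]


def authentik_events(journal_lines, min_level="warning"):
    """Yield (level, event) pairs at or above min_level."""
    threshold = LEVELS.index(min_level)
    for line in journal_lines:
        try:
            entry = json.loads(line)                 # journald record
            payload = json.loads(entry["MESSAGE"])   # application log line
            level = payload["level"]
            event = payload["event"]
        except (ValueError, KeyError, TypeError):
            continue  # skip records that are not JSON-formatted app logs
        if level in LEVELS and LEVELS.index(level) >= threshold:
            yield level, event


if __name__ == "__main__":
    import sys
    # Usage: journalctl -u authentik-server -o json | python3 this_script.py
    for level, event in authentik_events(sys.stdin):
        print(f"{level}: {event}")
```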
Major architectural change from rootless user services to system-level (rootful)
containers to enable group-based Unix socket access for containerized applications.
Infrastructure Changes:
- PostgreSQL: Export postgres-clients group GID as Ansible fact
- Valkey: Export valkey-clients group GID as Ansible fact
- Valkey: Add socket-fix service to maintain correct socket group ownership
- Both: Set socket directories to 770 with client group ownership
Authentik Role Refactoring:
- Remove rootless container configuration (subuid/subgid, lingering, user systemd)
- Deploy Quadlet files to /etc/containers/systemd/ (system-level)
- Use dynamic GID facts in container PodmanArgs (--group-add)
- Simplify user creation to system user with infrastructure group membership
- Update handlers for system scope service management
- Remove unnecessary container security options (no user namespace isolation)
Container Template Changes:
- Pod: Remove --userns args, change WantedBy to multi-user.target
- Containers: Replace Annotation with PodmanArgs using dynamic GIDs
- Remove /dev/shm mounts and SecurityLabelDisable (not needed for rootful)
- Change WantedBy to multi-user.target for system services
Documentation Updates:
- Add ADR-005: Rootful Containers with Infrastructure Fact Pattern
- Update ADR-003: Podman + systemd for system-level deployment
- Update authentik-deployment-guide.md for system scope commands
- Update service-integration-guide.md with rootful pattern examples
- Document discarded rootless approach and rationale
Why Rootful Succeeds:
- Direct UID/GID mapping preserves supplementary groups
- Container process groups match host socket group ownership
- No user namespace remapping breaking permissions
Why Rootless Failed (Discarded):
- User namespace UID/GID remapping broke group-based socket access
- Supplementary groups remapped into subgid range didn't match socket ownership
- Even with --userns=host and keep_original_groups, permissions failed
Pattern Established:
- Infrastructure roles create client groups and export GID facts
- Application roles validate facts and consume in container templates
- Rootful containers run as dedicated users with --group-add for socket access
- System-level deployment provides standard systemd service management
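
The "validate facts, then consume them in templates" step might look roughly like this. The fact names are hypothetical; the GIDs 962 and 961 are the ones reported in the validation notes.

```python
def podman_args_line(gid_facts: dict[str, int]) -> str:
    """Validate exported GID facts and render a Quadlet PodmanArgs line."""
    for name, gid in gid_facts.items():
        if not isinstance(gid, int) or gid <= 0:
            raise ValueError(f"infrastructure fact {name!r} is missing or invalid: {gid!r}")
    flags = " ".join(f"--group-add {gid}" for gid in gid_facts.values())
    return f"PodmanArgs={flags}"


if __name__ == "__main__":
    facts = {
        "postgresql_client_group_gid": 962,  # hypothetical fact name
        "valkey_client_group_gid": 961,      # hypothetical fact name
    }
    print(podman_args_line(facts))
```

Failing early on a missing fact keeps a misordered playbook from producing a container that starts but cannot reach its sockets.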
Deployment Validated:
- Services in /system.slice/ ✓
- Process groups: 961 (valkey-clients), 962 (postgres-clients), 966 (authentik) ✓
- Socket permissions: 770 with client groups ✓
- HTTP endpoint responding ✓
- Add git user to postgres group for Unix socket access
- Ensure PostgreSQL socket directory has proper permissions
- Add socket connectivity test before database operations
- Update database tasks to use explicit socket parameters
- Add missing database privileges grant task
Resolves timeout issue in 'waiting for gitea to be ready' task
caused by permission denied errors when accessing PostgreSQL
Unix socket. Follows same pattern as working Authentik role.
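
A socket connectivity check of the kind described above can be sketched as follows. It only confirms that the socket accepts a connection, not that authentication succeeds; the function name and defaults are assumptions, with the directory taken from the socket path mentioned earlier in this log.

```python
import os
import socket


def postgres_socket_reachable(socket_dir: str = "/var/run/postgresql", port: int = 5432) -> bool:
    """Return True if the PostgreSQL Unix socket accepts a connection."""
    # PostgreSQL names its socket file .s.PGSQL.<port> inside the socket directory
    path = os.path.join(socket_dir, f".s.PGSQL.{port}")
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.settimeout(3)
    try:
        client.connect(path)
        return True
    except OSError:
        # Covers permission denied, missing file, and refused connections
        return False
    finally:
        client.close()


if __name__ == "__main__":
    print(postgres_socket_reachable())
```

Run as the service user (for example the git user), this distinguishes a permission problem from a database that is simply slow to start.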
- Fix authentik Caddy template to use localhost instead of variable for consistency
- Improve Caddy installation logic with better conditional checks
- Fix version checking and plugin detection for more reliable deployments
- Add cleanup task condition for DNS challenge installations
These changes improve deployment reliability and consistency.
- Enable sigvild-gallery role in site.yml playbook
- Add backup configuration to host variables
- Integrate restore functionality into main sigvild-gallery tasks
- Add data protection logic to prevent accidental overwrites
- Enable gitea role for complete service deployment
This completes the sigvild-gallery service integration with backup/restore capabilities.
- Fix Unix timestamp conversion in restore.yml using proper strftime syntax
- Add service existence check before stopping sigvild-gallery service
- Fix systemd service template environment variable syntax error
- Add proper error handling for fresh deployments where service doesn't exist yet
Resolves service management failures during restoration on fresh VPS installations.
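
For comparison, the same kind of Unix timestamp conversion in plain Python looks like this. The exact expression used in restore.yml is not shown in this log, so the function name and format string below are illustrative only.

```python
from datetime import datetime, timezone


def format_backup_timestamp(epoch_seconds, fmt: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Convert a Unix timestamp (int or numeric string) to a UTC string."""
    moment = datetime.fromtimestamp(int(epoch_seconds), tz=timezone.utc)
    return moment.strftime(fmt)


if __name__ == "__main__":
    print(format_backup_timestamp(0))  # 1970-01-01 00:00:00
```

In Ansible the `strftime` filter takes the format string as its input and the epoch seconds as an argument, which is the reverse of the Python call order and an easy place to slip.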
- Change reverse_proxy from https:// to http:// backend
- Use authentik_http_port instead of authentik_https_port
- Remove unnecessary TLS transport configuration
- Remove health check for non-existent endpoint
This aligns the Ansible template with the working configuration
where authentik only serves HTTP internally and Caddy handles SSL.
Merge completed authentik Quadlet implementation that resolves all deployment
issues and enables external HTTPS access. This brings the working solution
developed and tested on authentik-quadlet-fix branch into main.
All systemd services now generate correctly and authentik is fully operational
at https://auth.jnss.me with proper SSL termination via Caddy.
Resolves authentik deployment issues by implementing proper Podman Quadlet
configuration and fixing networking for external access through Caddy.
Core Fixes:
• Add missing [Install] sections to container Quadlet files for systemd service generation
• Fix pod references from 'systemd-authentik' to 'authentik.pod' for proper Quadlet linking
• Remove problematic --userns=host to use proper rootless user namespaces
• Configure subuid/subgid ranges for authentik user (200000:65536)
• Update networking to bind 0.0.0.0:9000 only (remove unnecessary HTTPS port 9443)
• Add AUTHENTIK_LISTEN__HTTP=0.0.0.0:9000 environment configuration
• Fix Caddy reverse proxy to use HTTP backend instead of HTTPS
Infrastructure Updates:
• Enhance PostgreSQL role with Unix socket configuration and user management
• Improve Valkey role with proper systemd integration and socket permissions
• Add comprehensive service integration documentation
• Update deployment playbooks with backup and restore capabilities
Security Improvements:
• Secure network isolation with Caddy SSL termination
• Reduced attack surface by removing direct HTTPS container exposure
• Proper rootless container configuration with user namespace mapping
Result: authentik now fully operational with external HTTPS access via auth.jnss.me
All systemd services (authentik-pod, authentik-server, authentik-worker) running correctly.
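
The two Quadlet mistakes fixed above (no [Install] section, and a Pod= value that does not point at a .pod file) are easy to lint for. The checker below is a hypothetical sketch using the standard library's INI parser, not a tool used by the role; the image name in the example is a placeholder.

```python
import configparser


def quadlet_problems(unit_text: str) -> list[str]:
    """Return a list of problems found in a .container Quadlet file."""
    parser = configparser.ConfigParser(
        strict=False,          # Quadlet files may repeat keys such as Environment=
        interpolation=None,    # treat % literally
        delimiters=("=",),
    )
    parser.optionxform = str   # preserve key case
    parser.read_string(unit_text)
    problems = []
    if not parser.has_section("Install"):
        problems.append("missing [Install] section")
    pod = parser.get("Container", "Pod", fallback=None)
    if pod is not None and not pod.endswith(".pod"):
        problems.append(f"Pod={pod} does not reference a .pod Quadlet file")
    return problems


if __name__ == "__main__":
    sample = "[Container]\nImage=registry.example/app:latest\nPod=systemd-authentik\n"
    print(quadlet_problems(sample))
```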
- Created authentik.pod file for proper pod definition
- Removed superfluous authentik-pod.container file
- Updated container templates to reference pod correctly
- Issue: Quadlet still reports 'pod authentik is not Quadlet based'
- Container services not being generated (only pod service works)
- Implemented complete Valkey infrastructure role following PostgreSQL patterns
- Provides a Redis-compatible in-memory data structure store
- Configured for multi-application support with database isolation
- Security-focused: localhost-only binding, password auth, systemd hardening
- Arch Linux compatible: uses native Valkey package with Redis compatibility
- Database allocation strategy: DB 0 reserved, DB 1+ for applications
- Full systemd integration with security overrides and proper service management
- Redis client compatibility maintained for seamless application integration
- Ready for Authentik and future container workloads requiring cache services
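
The database allocation strategy (DB 0 reserved, DB 1 and up for applications) can be sketched as a simple assignment function. The function name and application list are illustrative; the limit of 16 reflects the default `databases` setting and is an assumption about this deployment.

```python
def allocate_databases(applications: list[str], max_databases: int = 16) -> dict[str, int]:
    """Assign each application its own database index, leaving DB 0 unused."""
    if len(set(applications)) != len(applications):
        raise ValueError("application names must be unique")
    if len(applications) > max_databases - 1:
        raise ValueError("not enough databases for all applications")
    return {name: index for index, name in enumerate(applications, start=1)}


if __name__ == "__main__":
    print(allocate_databases(["authentik", "nextcloud"]))
```

Assigning numbers from an ordered list keeps the mapping stable as long as new applications are appended rather than inserted.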
- Implements complete Gitea Git service following rick-infra self-contained architecture
- Uses PostgreSQL infrastructure role as dependency and manages own database/user
- Native Arch Linux installation via pacman packages
- Automatic database setup (gitea database and user creation)
- systemd service with security hardening and proper dependency management
- Caddy reverse proxy integration deployed to sites-enabled directory
- SSH server on port 2222 with automatic host key generation
- Production-ready with LFS support, security headers, and HTTPS via Caddy
- Follows simplified configuration approach with essential variables only
- Self-contained pattern: service manages complete setup independently
- Provides PostgreSQL server as shared database infrastructure
- Follows KISS principle with only essential configuration (11 variables vs 45 originally)
- Restricts superuser access to the Unix socket only
- Uses scram-sha-256 authentication for application users
- Includes systemd security hardening
- Applications manage their own databases/users via this infrastructure
- Production-ready with data checksums and localhost-only access