Add Nextcloud cloud storage role with split Redis caching strategy

## New Features

- **Nextcloud Role**: Complete cloud storage deployment using Podman Quadlet
  - FPM variant with Caddy reverse proxy and FastCGI
  - PostgreSQL database via Unix socket
  - Valkey/Redis for app-level caching and file locking
  - Automatic HTTPS with Let's Encrypt via Caddy
  - Dual-root pattern: Caddy serves static assets, FPM handles PHP

- **Split Caching Strategy**: Redis caching WITHOUT Redis sessions
  - Custom redis.config.php template for app-level caching only
  - File-based PHP sessions for stability (avoids session lock issues)
  - Prevents cascading failures from session lock contention
  - Documented in role README with detailed rationale
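
Since the official image switches to Redis sessions as soon as `REDIS_HOST` is set (see Technical Details), the split only holds while that variable stays out of the rendered env file. A minimal sketch of a guard for that, assuming a plain `KEY=VALUE` env file; the function names and the sample socket path are hypothetical and not part of the role:

```python
def active_env_keys(env_text: str) -> set[str]:
    """Return the variable names that are actually set (not commented out)."""
    keys = set()
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        keys.add(line.split("=", 1)[0].strip())
    return keys


def redis_sessions_would_be_enabled(env_text: str) -> bool:
    """True if REDIS_HOST is active, which triggers Redis session handling."""
    return "REDIS_HOST" in active_env_keys(env_text)


# Hypothetical rendered fragments for illustration only.
SAFE_ENV = "# REDIS_HOST=/run/valkey/valkey.sock\nNEXTCLOUD_UPDATE=1\n"
UNSAFE_ENV = "REDIS_HOST=/run/valkey/valkey.sock\nNEXTCLOUD_UPDATE=1\n"
```

A check like this could run in CI or as a post-deploy assertion; it only inspects the env file and says nothing about what `redis.config.php` configures.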

## Infrastructure Updates

- **Socket Permissions**: Set PostgreSQL and Valkey Unix sockets to mode 777
  - Required for containers that switch users (root → www-data)
  - Nextcloud container loses supplementary groups on user switch
  - Security maintained via password authentication (scram-sha-256, requirepass)
  - Documented socket permission architecture in docs/

- **PostgreSQL**: Export client group GID as fact for dependent roles
- **Valkey**: Export client group GID as fact, update socket fix service
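
The socket permission reasoning above can be illustrated with a simplified model of the POSIX permission check. This is a sketch only: the UIDs and GIDs are hypothetical, and it ignores directory traversal permissions, ACLs, and capabilities. Connecting to a Unix socket requires write permission on the socket file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    uid: int
    gid: int
    groups: frozenset = frozenset()  # supplementary groups


def can_connect(proc: Process, owner_uid: int, owner_gid: int, mode: int) -> bool:
    """Simplified check: does proc have write permission on the socket file?"""
    if proc.uid == 0:
        return True  # root bypasses the permission bits
    if proc.uid == owner_uid:
        return bool(mode & 0o200)
    if proc.gid == owner_gid or owner_gid in proc.groups:
        return bool(mode & 0o020)
    return bool(mode & 0o002)


# Hypothetical IDs: socket owned by uid 970, client group gid 971.
SOCKET_OWNER, CLIENT_GROUP = 970, 971

# www-data (uid/gid 33 here) after the container drops root, with and
# without the supplementary client group.
www_data_no_groups = Process(uid=33, gid=33)
www_data_with_group = Process(uid=33, gid=33, groups=frozenset({CLIENT_GROUP}))
```

With mode 770 the process that lost its supplementary groups is refused, while the same process with the client group would be accepted; with mode 777 both connect, which is why access control then rests on password authentication.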

## Documentation

- New: docs/socket-permissions-architecture.md
  - Explains 777 vs 770 socket permission trade-offs
  - Documents why group-based access doesn't work for user-switching containers
  - Provides TCP alternative for stricter security requirements

- Updated: All role READMEs with socket permission notes
- New: Nextcloud README with comprehensive deployment, troubleshooting, and Redis architecture documentation

## Configuration

- host_vars: Add Nextcloud vault variables and configuration
- site.yml: Include Nextcloud role in main playbook

## Technical Details

**Why disable Redis sessions?**

The official Nextcloud container enables Redis session handling via REDIS_HOST env var,
which causes severe performance issues:

1. Session lock contention under high concurrency (browser parallel asset requests)
2. Infinite lock retries (the image sets redis.session.lock_retries=-1) blocking FPM workers
3. Timeout orphaning: reverse proxy kills connection, worker keeps lock
4. Worker pool exhaustion: all 5 default workers blocked on same session lock
5. Cascading failure: new requests queue, more timeouts, more orphaned locks
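
The failure sequence above can be sketched as a toy discrete-time model. It is not a measurement of the real system: it assumes every request targets one session whose lock is never released, and that a worker retrying the lock never finishes. The arrival rate and tick count are made-up parameters.

```python
def simulate_pool(max_children: int = 5, arrivals_per_tick: int = 2, ticks: int = 4):
    """Return (busy_workers, queued_requests) after each tick."""
    busy = 0     # workers stuck retrying the orphaned session lock
    queued = 0   # requests waiting for a free worker
    history = []
    for _ in range(ticks):
        queued += arrivals_per_tick
        # Only idle workers can pick up queued requests.
        started = min(queued, max_children - busy)
        busy += started      # infinite retries: these workers never return
        queued -= started
        history.append((busy, queued))
    return history


if __name__ == "__main__":
    for tick, (busy, queued) in enumerate(simulate_pool(), start=1):
        print(f"tick {tick}: busy={busy} queued={queued}")
```

With these defaults all five workers are occupied by the third tick, and from then on the queue only grows, which matches steps 4 and 5.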

Solution: Use file-based sessions (reliable, fast for single-server) while keeping
Redis for distributed cache and transactional file locking via custom config file.

This keeps Redis-backed caching and file locking while avoiding the session lock failure mode described above.

Tested: Fresh deployment on arch-vps (69.62.119.31)
Domain: https://cloud.jnss.me/
2025-12-14 22:07:08 +01:00
parent 8e8aabd5e7
commit 4f8da38ca6
24 changed files with 1379 additions and 8 deletions

@@ -0,0 +1,50 @@
# Nextcloud Environment Configuration
# Generated by Ansible Nextcloud role
# =================================================================
# Database Configuration (PostgreSQL via Unix Socket)
# =================================================================
POSTGRES_HOST={{ postgresql_unix_socket_directories }}
POSTGRES_DB={{ nextcloud_db_name }}
POSTGRES_USER={{ nextcloud_db_user }}
POSTGRES_PASSWORD={{ nextcloud_db_password }}
# =================================================================
# Admin Account (Auto-configured on first run)
# =================================================================
NEXTCLOUD_ADMIN_USER={{ nextcloud_admin_user }}
NEXTCLOUD_ADMIN_PASSWORD={{ nextcloud_admin_password }}
# =================================================================
# Trusted Domains
# =================================================================
NEXTCLOUD_TRUSTED_DOMAINS={{ nextcloud_trusted_domains }}
# =================================================================
# Redis/Valkey Cache Configuration
# =================================================================
# Note: Nextcloud uses REDIS_* variables even for Valkey (Redis-compatible)
# Socket access works because infrastructure sockets use 777 permissions
# Note: These are intentionally commented out. Setting REDIS_HOST makes the official image enable Redis sessions, which caused slowdowns; Nextcloud uses file-based sessions instead.
# REDIS_HOST={{ valkey_unix_socket_path }}
# REDIS_HOST_PASSWORD={{ valkey_password }}
# =================================================================
# Reverse Proxy Configuration
# =================================================================
# These settings tell Nextcloud it's behind a reverse proxy (Caddy)
OVERWRITEPROTOCOL={{ nextcloud_overwriteprotocol }}
OVERWRITEHOST={{ nextcloud_domain }}
TRUSTED_PROXIES=127.0.0.1
# =================================================================
# PHP Configuration
# =================================================================
PHP_MEMORY_LIMIT={{ nextcloud_php_memory_limit }}
PHP_UPLOAD_LIMIT={{ nextcloud_php_upload_limit }}
# =================================================================
# Application Settings
# =================================================================
# Enable automatic updates during container restart
NEXTCLOUD_UPDATE=1