rick-infra/roles/valkey/tasks/main.yml
Joakim 4f8da38ca6 Add Nextcloud cloud storage role with split Redis caching strategy
## New Features

- **Nextcloud Role**: Complete cloud storage deployment using Podman Quadlet
  - FPM variant with Caddy reverse proxy and FastCGI
  - PostgreSQL database via Unix socket
  - Valkey/Redis for app-level caching and file locking
  - Automatic HTTPS with Let's Encrypt via Caddy
  - Dual-root pattern: Caddy serves static assets, FPM handles PHP

- **Split Caching Strategy**: Redis caching WITHOUT Redis sessions
  - Custom redis.config.php template for app-level caching only
  - File-based PHP sessions for stability (avoids session lock issues)
  - Prevents cascading failures from session lock contention
  - Documented in role README with detailed rationale
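
A minimal sketch of how such a template might be deployed from the Nextcloud role. The variable `nextcloud_config_dir`, the template file name, the file mode, and the `restart nextcloud` handler are hypothetical names chosen for illustration, not taken from the role itself:

```yaml
# Hypothetical task: paths, variable names, and the handler are illustrative assumptions.
- name: Deploy Redis config for app-level caching only (no session handler)
  template:
    src: redis.config.php.j2
    dest: "{{ nextcloud_config_dir }}/redis.config.php"
    mode: '0640'
  notify: restart nextcloud
```

Keeping the Redis settings in a separate config file means the session handler can stay file-based while caching and file locking still point at Valkey.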

## Infrastructure Updates

- **Socket Permissions**: Update PostgreSQL and Valkey to mode 777
  - Required for containers that switch users (root → www-data)
  - Nextcloud container loses supplementary groups on user switch
  - Access control relies on password authentication (PostgreSQL scram-sha-256, Valkey requirepass)
  - Documented socket permission architecture in docs/

- **PostgreSQL**: Export client group GID as fact for dependent roles
- **Valkey**: Export client group GID as fact, update socket fix service
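
The socket permission change described above could be verified after a run with a check along these lines. This is an illustrative sketch, not part of the role; it assumes the `valkey_unix_socket_path` variable used by the Valkey tasks, and the registered variable name is made up for the example:

```yaml
# Illustrative check (assumption: run after Valkey has started and created its socket).
- name: Inspect Valkey Unix socket
  stat:
    path: "{{ valkey_unix_socket_path }}"
  register: valkey_socket_stat

- name: Assert socket exists and is world-accessible
  assert:
    that:
      - valkey_socket_stat.stat.exists
      - valkey_socket_stat.stat.issock
      - valkey_socket_stat.stat.mode == '0777'
    fail_msg: "Valkey socket is missing or does not have mode 0777"
```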

## Documentation

- New: docs/socket-permissions-architecture.md
  - Explains 777 vs 770 socket permission trade-offs
  - Documents why group-based access doesn't work for user-switching containers
  - Provides TCP alternative for stricter security requirements

- Updated: All role READMEs with socket permission notes
- New: Nextcloud README with comprehensive deployment, troubleshooting, and Redis architecture documentation
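
For the TCP alternative mentioned above, a hedged sketch of host variables. The variable names are the ones the Valkey role's tasks reference; the values, and the assumption that the role's `valkey.conf.j2` template honors them this way, are illustrative:

```yaml
# host_vars sketch: values are assumptions, not the deployed configuration.
valkey_unix_socket_enabled: false   # skip the Unix socket tasks entirely
valkey_bind: "127.0.0.1"            # listen on loopback only
valkey_port: 6379                   # conventional Redis/Valkey port
```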

## Configuration

- host_vars: Add Nextcloud vault variables and configuration
- site.yml: Include Nextcloud role in main playbook

## Technical Details

**Why disable Redis sessions?**

The official Nextcloud container enables Redis session handling whenever the REDIS_HOST
env var is set, which causes severe performance issues:

1. Session lock contention under high concurrency (browser parallel asset requests)
2. Infinite lock retries (default lock_retries=-1) blocking FPM workers
3. Timeout orphaning: reverse proxy kills connection, worker keeps lock
4. Worker pool exhaustion: all 5 default workers blocked on same session lock
5. Cascading failure: new requests queue, more timeouts, more orphaned locks

Solution: Use file-based sessions (reliable, fast for single-server) while keeping
Redis for distributed cache and transactional file locking via custom config file.

This keeps the performance benefits of Redis caching while avoiding Redis session lock debugging.

Tested: Fresh deployment on arch-vps (69.62.119.31)
Domain: https://cloud.jnss.me/
2025-12-14 22:07:08 +01:00


---
# Valkey Infrastructure Role - Simplified Tasks

- name: Install Valkey
  pacman:
    name: valkey
    state: present

# Note: the Arch Linux valkey package creates the 'valkey' user automatically.
# We don't need to create users - just ensure data directory permissions.

- name: Create Valkey client access group
  group:
    name: "{{ valkey_client_group }}"
    system: true
  when: valkey_client_group_create

- name: Ensure valkey user is in client group
  user:
    name: valkey
    groups: "{{ valkey_client_group }}"
    append: true
  when: valkey_client_group_create

- name: Create Valkey configuration directory
  file:
    path: /etc/valkey
    state: directory
    mode: '0755'

- name: Check if Valkey data directory exists
  stat:
    path: "/var/lib/valkey"
  register: valkey_data_dir

- name: Ensure Valkey data directory permissions
  file:
    path: /var/lib/valkey
    state: directory
    owner: valkey
    group: valkey
    mode: '0750'

- name: Create Valkey Unix socket directory
  file:
    path: "{{ valkey_unix_socket_path | dirname }}"
    state: directory
    owner: valkey
    group: "{{ valkey_client_group }}"
    mode: '0777'
  when: valkey_unix_socket_enabled

- name: Deploy Valkey configuration file
  template:
    src: valkey.conf.j2
    dest: /etc/valkey/valkey.conf
    owner: valkey
    group: valkey
    mode: '0640'
    backup: true
  notify: restart valkey

- name: Deploy Valkey systemd service file (with socket group management)
  template:
    src: valkey.service.j2
    dest: /etc/systemd/system/valkey.service
    mode: '0644'
  notify:
    - reload systemd
    - restart valkey
  when: valkey_client_group_create

- name: Deploy Valkey socket group fix service
  template:
    src: valkey-socket-fix.service.j2
    dest: /etc/systemd/system/valkey-socket-fix.service
    mode: '0644'
  notify:
    - reload systemd
  when: valkey_client_group_create and valkey_unix_socket_enabled

- name: Enable Valkey socket group fix service
  systemd:
    name: valkey-socket-fix
    enabled: true
    daemon_reload: true
  when: valkey_client_group_create and valkey_unix_socket_enabled

- name: Get Valkey client group GID for containerized applications
  shell: "getent group {{ valkey_client_group }} | cut -d: -f3"
  register: valkey_client_group_lookup
  changed_when: false
  when: valkey_client_group_create

- name: Set Valkey client group GID as fact
  set_fact:
    valkey_client_group_gid: "{{ valkey_client_group_lookup.stdout }}"
  when: valkey_client_group_create and valkey_client_group_lookup.stdout is defined

- name: Enable and start Valkey service
  systemd:
    name: valkey
    enabled: "{{ valkey_service_enabled }}"
    state: "{{ valkey_service_state }}"
    daemon_reload: true
  register: valkey_service_result

- name: Wait for Valkey socket file to exist
  wait_for:
    path: "{{ valkey_unix_socket_path }}"
    timeout: 30
  when: valkey_service_state == "started" and valkey_unix_socket_enabled

- name: Wait for Valkey to be ready (Unix Socket) - Try without auth first
  command: redis-cli -s {{ valkey_unix_socket_path }} ping
  register: valkey_socket_ping_noauth
  until: >
    valkey_socket_ping_noauth.stdout == "PONG" or
    "NOAUTH" in (valkey_socket_ping_noauth.stdout + valkey_socket_ping_noauth.stderr)
  retries: 15
  delay: 2
  changed_when: false
  failed_when: false
  when: valkey_service_state == "started" and valkey_unix_socket_enabled

- name: Wait for Valkey to be ready (Unix Socket) - Try with auth if needed
  command: redis-cli -s {{ valkey_unix_socket_path }} -a {{ valkey_password }} ping
  register: valkey_socket_ping_auth
  until: valkey_socket_ping_auth.stdout == "PONG"
  retries: 5
  delay: 2
  changed_when: false
  failed_when: valkey_socket_ping_auth.rc != 0
  when: >
    valkey_service_state == "started" and valkey_unix_socket_enabled and
    (valkey_socket_ping_noauth.stdout != "PONG") and
    ("NOAUTH" in (valkey_socket_ping_noauth.stdout + valkey_socket_ping_noauth.stderr) or valkey_socket_ping_noauth.rc != 0)

- name: Test Valkey connectivity (Unix Socket)
  command: redis-cli -s {{ valkey_unix_socket_path }} -a {{ valkey_password }} ping
  register: valkey_ping_result_socket
  changed_when: false
  failed_when: valkey_ping_result_socket.stdout != "PONG"
  when: valkey_service_state == "started" and valkey_unix_socket_enabled

- name: Display Valkey infrastructure status
  debug:
    msg: |
      ✅ Valkey infrastructure ready!
      📡 Service: {% if valkey_unix_socket_enabled %}Unix Socket ({{ valkey_unix_socket_path }}){% else %}{{ valkey_bind }}:{{ valkey_port }}{% endif %}
      🔒 Auth: Password protected
      💾 Persistence: {{ 'RDB enabled' if valkey_save_enabled else 'Memory only' }}
      🗄️ Databases: {{ valkey_databases }} available (0-{{ valkey_databases - 1 }})
      🏗️ Ready for applications to configure Valkey usage
      📋 Application Integration:
        - Use database numbers 1-{{ valkey_databases - 1 }} for applications
        - Database 0 reserved for system/testing
        - {% if valkey_unix_socket_enabled %}Unix socket: {{ valkey_unix_socket_path }}{% else %}TCP: {{ valkey_bind }}:{{ valkey_port }}{% endif %}
        - Redis-compatible: applications can use REDIS_* or VALKEY_* env vars
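
The tasks above notify two handlers, `reload systemd` and `restart valkey`, which are not shown in this file. A minimal sketch of what a matching `handlers/main.yml` might contain follows. The handler names are taken from the `notify:` entries; the file location and the bodies are assumptions, not the role's actual handlers:

```yaml
---
# handlers/main.yml (hypothetical sketch; names match the notify: entries in the tasks)
- name: reload systemd
  systemd:
    daemon_reload: true

- name: restart valkey
  systemd:
    name: valkey
    state: restarted
```

Handlers run in the order they are defined, not the order they are notified, so listing `reload systemd` first lets a changed unit file be picked up before Valkey restarts.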