# Socket Permissions Architecture Decision

## Context

Rick-infra uses Unix domain sockets for PostgreSQL and Valkey (Redis) connections to maximize performance and security. Applications run in Podman containers and need to reach these infrastructure services through the sockets.

## Problem

Different container images use different user models:

1. **Authentik**: Runs as a single user (UID 966) from start to finish
2. **Nextcloud**: Starts as root, runs entrypoint scripts, then switches to www-data (UID 33)

When using `--group-add` with Podman:

- Supplementary groups are added to the **initial user** the container runs as
- Groups are **NOT inherited** when a process inside the container switches users; the switch resets supplementary groups to those of the target user
- As a result, Nextcloud's www-data process ends up without socket access

## Decision

**Use 777 permissions on Unix sockets** for PostgreSQL and Valkey.

## Rationale

### Why 777 Works

1. **Compatibility**: Any container user model can access the sockets
2. **Simplicity**: No complex user namespace mapping needed
3. **Security maintained**: Password authentication is still required
4. **Local-only**: Sockets are not exposed to the network

### Security Analysis

**What 777 allows:**

- ✅ Any local process can **attempt** to connect to the socket

**What 777 does NOT allow:**

- ❌ Authentication bypass: PostgreSQL requires username + password (scram-sha-256, as configured in `pg_hba.conf`)
- ❌ Network access: Sockets are local filesystem objects only
- ❌ Remote connections: Nothing is exposed beyond localhost

**Security layers:**

1. **Physical**: Server access required
2. **Process**: Must be running on the same host
3. **Authentication**: Must provide valid credentials
4. **Authorization**: Database/Redis permissions enforced

### Comparison to TCP Localhost

Using `127.0.0.1:5432` (TCP) has **identical security**:

- Localhost-only (not network-reachable)
- Requires authentication
- Any local process can attempt a connection

Socket 777 vs TCP localhost:

- **Same security model**: Both require credentials, and both are local-only
- **Different performance**: Sockets are faster (no TCP/IP stack overhead)
- **Different access control**: Sockets are additionally gated by filesystem permissions (which 777 opens to every local user), while a localhost TCP port has no per-user gate at all

## Alternatives Considered

### Alternative 1: Group-based Permissions (770)

**Implementation:**

```yaml
postgresql_unix_socket_permissions: "0770"
valkey_unix_socket_perm: "770"
```

**Why rejected:**

- Doesn't work for Nextcloud (www-data is not in the groups after the `su` switch)
- Requires every container to use `--group-add`
- Complex UID/GID management
- Breaks container user-switching patterns

### Alternative 2: User Namespace Mapping

**Implementation:**

```
--uidmap 33:963:1  # Map container www-data (33) to the host nextcloud user (963)
--gidmap 33:963:1
```

**Why rejected:**

- The container's root loses its privileges (can't run the entrypoint)
- Very complex configuration
- Fragile (breaks on image updates)
- Doesn't solve the fundamental user-switching problem

### Alternative 3: TCP on Localhost

**Implementation:**

```yaml
# PostgreSQL
postgresql_listen_addresses: "127.0.0.1"

# Valkey
valkey_bind: "127.0.0.1"
valkey_port: 6379
```

**Why not chosen (but a valid alternative):**

- ✅ Same security as socket 777
- ✅ No permission issues
- ❌ Abandons the Unix socket performance benefits
- ❌ Goes against the infrastructure design goal of socket-based connections

**Status:** Documented as an alternative, available for users who prefer it

### Alternative 4: Custom Entrypoint

**Implementation:** Create a wrapper that adds www-data to the required groups before starting PHP-FPM.

**Why rejected:**

- Requires a custom Dockerfile
- Maintenance burden
- Breaks on upstream image updates
- Fragile and complex

## Implementation

### Files Changed

1. `roles/postgresql/defaults/main.yml`: Set `postgresql_unix_socket_permissions: "0777"`
2. `roles/valkey/defaults/main.yml`: Set `valkey_unix_socket_perm: "777"`
3. Documentation updated in all affected role READMEs

### Migration Path

For existing deployments:

1. Update socket permissions: `chmod 777 /var/run/postgresql/.s.PGSQL.5432`
2. Update socket permissions: `chmod 777 /var/run/valkey/valkey.sock`
3. Re-apply the roles and restart the services. The sockets are recreated on restart, so the role configuration is what makes 777 persist; the manual `chmod` only applies until then.

## Consequences

### Positive

- ✅ Works with all container user models (root-switching, single-user, etc.)
- ✅ Simple to understand and maintain
- ✅ No complex UID/GID mapping required
- ✅ Standard pattern, well-documented (0777 is also PostgreSQL's default `unix_socket_permissions`)
- ✅ Authentication still enforced

### Negative

- ⚠️ Any local process can attempt a socket connection
- ⚠️ Requires clear documentation of the security model
- ⚠️ May surprise users expecting tighter filesystem permissions

### Neutral

- ℹ️ Same security model as TCP localhost
- ℹ️ Alternative (TCP) available for those who prefer it
- ℹ️ Follows the "make it work, make it right, make it fast" philosophy

## Validation

### Security Validation

1. **Authentication required**: ✅ Tested; a connection requires credentials
2. **Password protection**: ✅ scram-sha-256 authentication, with passwords managed via vault
3. **Local-only**: ✅ Sockets are filesystem objects, not network endpoints
4. **Process isolation**: ✅ Each service has a separate database/namespace

### Functional Validation

1. **Authentik**: ✅ Works with 777 sockets
2. **Nextcloud**: ✅ Works with 777 sockets (www-data can access them)
3. **Gitea**: ✅ Works with 777 sockets

## Monitoring

No additional monitoring required.
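
The Security Analysis hinges on what the mode bits actually gate. Per unix(7), connecting to a Unix stream socket on Linux requires write permission on the socket file and search permission on the directories leading to it. Choosing 0777 over 0770 therefore only decides which local users get as far as the authentication step.

No dedicated monitoring is needed, but a one-off spot check after a deploy or restart can confirm the sockets came up with the intended mode. The sketch below assumes Python 3 on Linux. `socket_access_report` is a hypothetical helper, not part of rick-infra. It demonstrates the check on a temporary socket; passing `/var/run/postgresql/.s.PGSQL.5432` or `/var/run/valkey/valkey.sock` instead would inspect the real ones.

```python
import os
import socket
import stat
import tempfile


def socket_access_report(path: str) -> dict:
    """Summarize who may connect() to the Unix socket at `path`.

    Hypothetical helper. Per unix(7), connecting to a stream socket on Linux
    requires write permission on the socket file (plus search permission on
    its directories). Credentials are checked afterwards, inside the protocol.
    """
    st = os.stat(path)
    if not stat.S_ISSOCK(st.st_mode):
        raise ValueError(f"{path} is not a Unix socket")
    mode = stat.S_IMODE(st.st_mode)
    # Only the immediate parent directory is checked; a full check would walk every ancestor.
    parent = os.stat(os.path.dirname(os.path.abspath(path)))
    return {
        "mode": f"{mode:04o}",
        "group_can_connect": bool(mode & stat.S_IWGRP),
        "others_can_connect": bool(mode & stat.S_IWOTH),
        "parent_searchable_by_others": bool(parent.st_mode & stat.S_IXOTH),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        os.chmod(tmp, 0o755)  # let "others" traverse the demo directory
        sock_path = os.path.join(tmp, "demo.sock")
        srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        srv.bind(sock_path)  # creates the socket file on disk
        try:
            # Compare the rejected group-based mode with the chosen one.
            for mode in (0o770, 0o777):
                os.chmod(sock_path, mode)
                print(socket_access_report(sock_path))
        finally:
            srv.close()
```

With 0770, `others_can_connect` is `False`. That is the situation Nextcloud's www-data process is in once it no longer holds the socket's group. With 0777 the connect succeeds, and access then depends only on credentials.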
Standard checks apply:

- Service authentication logs (failed login attempts)
- Connection monitoring via application logs
- Systemd service health

## Documentation

All relevant READMEs updated with:

- Explanation of the 777 permission choice
- Security rationale
- TCP alternative configuration
- Clear security model explanation

## Future Considerations

This decision can be revisited if:

1. Container orchestration changes (e.g., Kubernetes with different security contexts)
2. New containers with different user models emerge
3. Network isolation requirements change
4. Regulatory compliance requires stricter filesystem permissions

In such cases, the TCP alternative provides an equivalent security model without filesystem permission concerns.

## References

- [PostgreSQL Role README](../roles/postgresql/README.md)
- [Valkey Role README](../roles/valkey/README.md)
- [Nextcloud Role README](../roles/nextcloud/README.md)
- [Podman User Namespaces Documentation](https://docs.podman.io/en/latest/markdown/podman-run.1.html#userns-mode)
- [Unix Socket Security](https://www.man7.org/linux/man-pages/man7/unix.7.html)

---

**Decision Date**: December 14, 2025
**Status**: Accepted
**Reviewers**: rick-infra maintainers