Commit Graph

9 Commits

Author SHA1 Message Date
e8b76c6a72 Update authentication documentation to reflect OAuth/OIDC as primary method
- Update architecture-decisions.md: Change decision to OAuth/OIDC primary, forward auth fallback
  - Add comprehensive OAuth/OIDC and forward auth flow diagrams
  - Add decision matrix comparing both authentication methods
  - Include real examples: Nextcloud/Gitea OAuth configs, whoami forward auth
  - Update rationale to emphasize OAuth/OIDC security and standards benefits

- Update authentication-architecture.md: Align with new OAuth-first approach
  - Add 'Choosing the Right Pattern' section with clear decision guidance
  - Swap pattern order: OAuth/OIDC (Pattern 1), Forward Auth (Pattern 2)
  - Update Example 1: Change Gitea from forward auth to OAuth/OIDC integration
  - Add emphasis on primary vs fallback methods throughout

- Update authentik-deployment-guide.md: Reflect OAuth/OIDC preference
  - Update overview to mention OAuth2/OIDC provider and forward auth fallback
  - Add decision guidance to service integration examples
  - Reorder examples: Nextcloud OAuth (primary), forward auth (fallback)
  - Clarify forward auth should only be used for services without OAuth support

This update ensures all authentication documentation consistently reflects the
agreed architectural decision: use OAuth/OIDC when services support it
(Nextcloud, Gitea, modern apps), and only use forward auth as a fallback for
legacy applications, static sites, or simple tools without OAuth capabilities.
2025-12-15 00:25:24 +01:00
4f8da38ca6 Add Nextcloud cloud storage role with split Redis caching strategy
## New Features

- **Nextcloud Role**: Complete cloud storage deployment using Podman Quadlet
  - FPM variant with Caddy reverse proxy and FastCGI
  - PostgreSQL database via Unix socket
  - Valkey/Redis for app-level caching and file locking
  - Automatic HTTPS with Let's Encrypt via Caddy
  - Dual-root pattern: Caddy serves static assets, FPM handles PHP

- **Split Caching Strategy**: Redis caching WITHOUT Redis sessions
  - Custom redis.config.php template for app-level caching only
  - File-based PHP sessions for stability (avoids session lock issues)
  - Prevents cascading failures from session lock contention
  - Documented in role README with detailed rationale

## Infrastructure Updates

- **Socket Permissions**: Update PostgreSQL and Valkey to mode 777
  - Required for containers that switch users (root → www-data)
  - Nextcloud container loses supplementary groups on user switch
  - Security maintained via password authentication (scram-sha-256, requirepass)
  - Documented socket permission architecture in docs/

- **PostgreSQL**: Export client group GID as fact for dependent roles
- **Valkey**: Export client group GID as fact, update socket fix service

## Documentation

- New: docs/socket-permissions-architecture.md
  - Explains 777 vs 770 socket permission trade-offs
  - Documents why group-based access doesn't work for user-switching containers
  - Provides TCP alternative for stricter security requirements

- Updated: All role READMEs with socket permission notes
- New: Nextcloud README with comprehensive deployment, troubleshooting, and Redis architecture documentation

## Configuration

- host_vars: Add Nextcloud vault variables and configuration
- site.yml: Include Nextcloud role in main playbook

## Technical Details

**Why disable Redis sessions?**

The official Nextcloud container enables Redis session handling via REDIS_HOST env var,
which causes severe performance issues:

1. Session lock contention under high concurrency (browser parallel asset requests)
2. Infinite lock retries (default lock_retries=-1) blocking FPM workers
3. Timeout orphaning: reverse proxy kills connection, worker keeps lock
4. Worker pool exhaustion: all 5 default workers blocked on same session lock
5. Cascading failure: new requests queue, more timeouts, more orphaned locks

Solution: Use file-based sessions (reliable, fast for single-server) while keeping
Redis for distributed cache and transactional file locking via custom config file.

This provides optimal performance without the complexity of Redis session debugging.

Tested: Fresh deployment on arch-vps (69.62.119.31)
Domain: https://cloud.jnss.me/
2025-12-14 22:07:08 +01:00
8e8aabd5e7 Improve logging and infrastructure variable consistency
Changes:
- Migrate Authentik to journald logging (remove file-based logs)
- Update Gitea to use infrastructure variables for PostgreSQL access
- Add comprehensive logging documentation to deployment guide
- Add infrastructure variable pattern guide to integration docs

Authentik Logging:
- Remove LogDriver=k8s-file from server and worker containers
- Remove logs directory creation from user setup tasks
- Update deployment guide with journald examples and JSON log patterns

Gitea Infrastructure Variables:
- Add infrastructure dependencies section to role defaults
- Replace hardcoded paths with postgresql_unix_socket_directories variable
- Replace hardcoded 'postgres' group with postgresql_client_group variable
- Add infrastructure variable validation in tasks
- Remove manual socket permission override (handled by infrastructure)

Documentation:
- Add journald logging best practices to service integration guide
- Add infrastructure variable pattern documentation with Gitea example
- Update Authentik deployment guide with journald commands and JSON filtering
- Document benefits: centralized logging, single source of truth, maintainability

Validated on arch-vps:
- Authentik logs accessible via journalctl and podman logs (identical output)
- Gitea user added to postgres-clients group (GID 962)
- No PostgreSQL socket permission errors after service restart
2025-12-14 17:16:21 +01:00
3506e55016 Migrate to rootful container architecture with infrastructure fact pattern
Major architectural change from rootless user services to system-level (rootful)
containers to enable group-based Unix socket access for containerized applications.

Infrastructure Changes:
- PostgreSQL: Export postgres-clients group GID as Ansible fact
- Valkey: Export valkey-clients group GID as Ansible fact
- Valkey: Add socket-fix service to maintain correct socket group ownership
- Both: Set socket directories to 770 with client group ownership

Authentik Role Refactoring:
- Remove rootless container configuration (subuid/subgid, lingering, user systemd)
- Deploy Quadlet files to /etc/containers/systemd/ (system-level)
- Use dynamic GID facts in container PodmanArgs (--group-add)
- Simplify user creation to system user with infrastructure group membership
- Update handlers for system scope service management
- Remove unnecessary container security options (no user namespace isolation)

Container Template Changes:
- Pod: Remove --userns args, change WantedBy to multi-user.target
- Containers: Replace Annotation with PodmanArgs using dynamic GIDs
- Remove /dev/shm mounts and SecurityLabelDisable (not needed for rootful)
- Change WantedBy to multi-user.target for system services

Documentation Updates:
- Add ADR-005: Rootful Containers with Infrastructure Fact Pattern
- Update ADR-003: Podman + systemd for system-level deployment
- Update authentik-deployment-guide.md for system scope commands
- Update service-integration-guide.md with rootful pattern examples
- Document discarded rootless approach and rationale

Why Rootful Succeeds:
- Direct UID/GID mapping preserves supplementary groups
- Container process groups match host socket group ownership
- No user namespace remapping breaking permissions

Why Rootless Failed (Discarded):
- User namespace UID/GID remapping broke group-based socket access
- Supplementary groups remapped into subgid range didn't match socket ownership
- Even with --userns=host and keep_original_groups, permissions failed

Pattern Established:
- Infrastructure roles create client groups and export GID facts
- Application roles validate facts and consume in container templates
- Rootful containers run as dedicated users with --group-add for socket access
- System-level deployment provides standard systemd service management

Deployment Validated:
- Services in /system.slice/ ✓
- Process groups: 961 (valkey-clients), 962 (postgres-clients), 966 (authentik) ✓
- Socket permissions: 770 with client groups ✓
- HTTP endpoint responding ✓
2025-12-14 16:56:50 +01:00
9e570ac2a3 Add comprehensive authentik documentation and improve role configuration
- Add authentik-deployment-guide.md: Complete step-by-step deployment guide
- Add architecture-decisions.md: Document native DB vs containerized rationale
- Add authentication-architecture.md: SSO strategy and integration patterns
- Update deployment-guide.md: Integrate authentik deployment procedures
- Update security-hardening.md: Add multi-layer security documentation
- Update service-integration-guide.md: Add authentik integration examples
- Update README.md: Professional project overview with architecture benefits
- Update authentik role: Fix HTTP binding, add security configs, improve templates
- Remove unused authentik task files: containers.yml, networking.yml

Key improvements:
* Document security benefits of native databases over containers
* Document Unix socket IPC architecture advantages
* Provide comprehensive troubleshooting and deployment procedures
* Add forward auth integration patterns for services
* Fix authentik HTTP binding from 127.0.0.1 to 0.0.0.0
* Add shared memory and IPC security configurations
2025-12-13 21:04:20 +01:00
b42ee2a22b Fix: Complete authentik Quadlet implementation with networking solution
Resolves authentik deployment issues by implementing proper Podman Quadlet
configuration and fixing networking for external access through Caddy.

Core Fixes:
• Add missing [Install] sections to container Quadlet files for systemd service generation
• Fix pod references from 'systemd-authentik' to 'authentik.pod' for proper Quadlet linking
• Remove problematic --userns=host to use proper rootless user namespaces
• Configure subuid/subgid ranges for authentik user (200000:65536)
• Update networking to bind 0.0.0.0:9000 only (remove unnecessary HTTPS port 9443)
• Add AUTHENTIK_LISTEN__HTTP=0.0.0.0:9000 environment configuration
• Fix Caddy reverse proxy to use HTTP backend instead of HTTPS

Infrastructure Updates:
• Enhance PostgreSQL role with Unix socket configuration and user management
• Improve Valkey role with proper systemd integration and socket permissions
• Add comprehensive service integration documentation
• Update deployment playbooks with backup and restore capabilities

Security Improvements:
• Secure network isolation with Caddy SSL termination
• Reduced attack surface by removing direct HTTPS container exposure
• Proper rootless container configuration with user namespace mapping

Result: authentik now fully operational with external HTTPS access via auth.jnss.me
All systemd services (authentik-pod, authentik-server, authentik-worker) running correctly.
2025-12-04 19:42:31 +01:00
7c3b02e5ad Add Sigvild Gallery wedding photo application with automated deployment and improve Caddy plugin management 2025-11-18 22:33:56 +01:00
8162e789ee Simplify Caddy infrastructure to use file-based configuration instead of complex API registration system 2025-11-15 00:30:38 +01:00
0b6eea6113 Initial commit 2025-11-12 20:48:28 +01:00