14 Commits

Author SHA1 Message Date
ecbeb07ba2 Migrate sigvild-gallery to production environment
- Add multi-environment architecture (homelab + production)
- Create production environment (mini-vps) for client projects
- Create homelab playbook for arch-vps services
- Create production playbook for mini-vps services
- Move sigvild-gallery from homelab to production
- Restructure variables: group_vars/production + host_vars/arch-vps
- Add backup-sigvild.yml playbook with auto-restore functionality
- Fix restore logic to check for data before creating directories
- Add manual variable loading workaround for Ansible 2.20
- Update all documentation for multi-environment setup
- Add ADR-007 documenting multi-environment architecture decision
2025-12-15 16:33:33 +01:00
4f8da38ca6 Add Nextcloud cloud storage role with split Redis caching strategy
## New Features

- **Nextcloud Role**: Complete cloud storage deployment using Podman Quadlet
  - FPM variant with Caddy reverse proxy and FastCGI
  - PostgreSQL database via Unix socket
  - Valkey/Redis for app-level caching and file locking
  - Automatic HTTPS with Let's Encrypt via Caddy
  - Dual-root pattern: Caddy serves static assets, FPM handles PHP

- **Split Caching Strategy**: Redis caching WITHOUT Redis sessions
  - Custom redis.config.php template for app-level caching only
  - File-based PHP sessions for stability (avoids session lock issues)
  - Prevents cascading failures from session lock contention
  - Documented in role README with detailed rationale

## Infrastructure Updates

- **Socket Permissions**: Update PostgreSQL and Valkey to mode 777
  - Required for containers that switch users (root → www-data)
  - Nextcloud container loses supplementary groups on user switch
  - Security maintained via password authentication (scram-sha-256, requirepass)
  - Documented socket permission architecture in docs/

- **PostgreSQL**: Export client group GID as fact for dependent roles
- **Valkey**: Export client group GID as fact, update socket fix service

## Documentation

- New: docs/socket-permissions-architecture.md
  - Explains 777 vs 770 socket permission trade-offs
  - Documents why group-based access doesn't work for user-switching containers
  - Provides TCP alternative for stricter security requirements

- Updated: All role READMEs with socket permission notes
- New: Nextcloud README with comprehensive deployment, troubleshooting, and Redis architecture documentation

## Configuration

- host_vars: Add Nextcloud vault variables and configuration
- site.yml: Include Nextcloud role in main playbook

## Technical Details

**Why disable Redis sessions?**

The official Nextcloud container enables Redis session handling via REDIS_HOST env var,
which causes severe performance issues:

1. Session lock contention under high concurrency (browser parallel asset requests)
2. Infinite lock retries (default lock_retries=-1) blocking FPM workers
3. Timeout orphaning: reverse proxy kills connection, worker keeps lock
4. Worker pool exhaustion: all 5 default workers blocked on same session lock
5. Cascading failure: new requests queue, more timeouts, more orphaned locks

Solution: Use file-based sessions (reliable, fast for single-server) while keeping
Redis for distributed cache and transactional file locking via custom config file.

This provides optimal performance without the complexity of Redis session debugging.

Tested: Fresh deployment on arch-vps (69.62.119.31)
Domain: https://cloud.jnss.me/
2025-12-14 22:07:08 +01:00
3506e55016 Migrate to rootful container architecture with infrastructure fact pattern
Major architectural change from rootless user services to system-level (rootful)
containers to enable group-based Unix socket access for containerized applications.

Infrastructure Changes:
- PostgreSQL: Export postgres-clients group GID as Ansible fact
- Valkey: Export valkey-clients group GID as Ansible fact
- Valkey: Add socket-fix service to maintain correct socket group ownership
- Both: Set socket directories to 770 with client group ownership

Authentik Role Refactoring:
- Remove rootless container configuration (subuid/subgid, lingering, user systemd)
- Deploy Quadlet files to /etc/containers/systemd/ (system-level)
- Use dynamic GID facts in container PodmanArgs (--group-add)
- Simplify user creation to system user with infrastructure group membership
- Update handlers for system scope service management
- Remove unnecessary container security options (no user namespace isolation)

Container Template Changes:
- Pod: Remove --userns args, change WantedBy to multi-user.target
- Containers: Replace Annotation with PodmanArgs using dynamic GIDs
- Remove /dev/shm mounts and SecurityLabelDisable (not needed for rootful)
- Change WantedBy to multi-user.target for system services

Documentation Updates:
- Add ADR-005: Rootful Containers with Infrastructure Fact Pattern
- Update ADR-003: Podman + systemd for system-level deployment
- Update authentik-deployment-guide.md for system scope commands
- Update service-integration-guide.md with rootful pattern examples
- Document discarded rootless approach and rationale

Why Rootful Succeeds:
- Direct UID/GID mapping preserves supplementary groups
- Container process groups match host socket group ownership
- No user namespace remapping breaking permissions

Why Rootless Failed (Discarded):
- User namespace UID/GID remapping broke group-based socket access
- Supplementary groups remapped into subgid range didn't match socket ownership
- Even with --userns=host and keep_original_groups, permissions failed

Pattern Established:
- Infrastructure roles create client groups and export GID facts
- Application roles validate facts and consume in container templates
- Rootful containers run as dedicated users with --group-add for socket access
- System-level deployment provides standard systemd service management

Deployment Validated:
- Services in /system.slice/ ✓
- Process groups: 961 (valkey-clients), 962 (postgres-clients), 966 (authentik) ✓
- Socket permissions: 770 with client groups ✓
- HTTP endpoint responding ✓
2025-12-14 16:56:50 +01:00
9e570ac2a3 Add comprehensive authentik documentation and improve role configuration
- Add authentik-deployment-guide.md: Complete step-by-step deployment guide
- Add architecture-decisions.md: Document native DB vs containerized rationale
- Add authentication-architecture.md: SSO strategy and integration patterns
- Update deployment-guide.md: Integrate authentik deployment procedures
- Update security-hardening.md: Add multi-layer security documentation
- Update service-integration-guide.md: Add authentik integration examples
- Update README.md: Professional project overview with architecture benefits
- Update authentik role: Fix HTTP binding, add security configs, improve templates
- Remove unused authentik task files: containers.yml, networking.yml

Key improvements:
* Document security benefits of native databases over containers
* Document Unix socket IPC architecture advantages
* Provide comprehensive troubleshooting and deployment procedures
* Add forward auth integration patterns for services
* Fix authentik HTTP binding from 127.0.0.1 to 0.0.0.0
* Add shared memory and IPC security configurations
2025-12-13 21:04:20 +01:00
fe4efcbd5b Enable sigvild-gallery role and backup functionality
- Enable sigvild-gallery role in site.yml playbook
- Add backup configuration to host variables
- Integrate restore functionality into main sigvild-gallery tasks
- Add data protection logic to prevent accidental overwrites
- Enable gitea role for complete service deployment

This completes the sigvild-gallery service integration with backup/restore capabilities.
2025-12-07 21:21:50 +01:00
b42ee2a22b Fix: Complete authentik Quadlet implementation with networking solution
Resolves authentik deployment issues by implementing proper Podman Quadlet
configuration and fixing networking for external access through Caddy.

Core Fixes:
• Add missing [Install] sections to container Quadlet files for systemd service generation
• Fix pod references from 'systemd-authentik' to 'authentik.pod' for proper Quadlet linking
• Remove problematic --userns=host to use proper rootless user namespaces
• Configure subuid/subgid ranges for authentik user (200000:65536)
• Update networking to bind 0.0.0.0:9000 only (remove unnecessary HTTPS port 9443)
• Add AUTHENTIK_LISTEN__HTTP=0.0.0.0:9000 environment configuration
• Fix Caddy reverse proxy to use HTTP backend instead of HTTPS

Infrastructure Updates:
• Enhance PostgreSQL role with Unix socket configuration and user management
• Improve Valkey role with proper systemd integration and socket permissions
• Add comprehensive service integration documentation
• Update deployment playbooks with backup and restore capabilities

Security Improvements:
• Secure network isolation with Caddy SSL termination
• Reduced attack surface by removing direct HTTPS container exposure
• Proper rootless container configuration with user namespace mapping

Result: authentik now fully operational with external HTTPS access via auth.jnss.me
All systemd services (authentik-pod, authentik-server, authentik-worker) running correctly.
2025-12-04 19:42:31 +01:00
500224b5de Add Podman container infrastructure role for containerized services
- Implemented complete Podman infrastructure role following rick-infra patterns
- Minimal installation approach: only install podman, trust Arch dependency management
- Configured with crun runtime for optimal performance and security
- Security-focused: HTTPS-only registries, rootless containers, systemd hardening
- Registry support: docker.io, quay.io, ghcr.io with secure configurations
- Ready for service-specific users with isolated container environments
- Quadlet support for native systemd container management
- Container-to-host networking via bridge networks with host gateway access
- Foundation for future containerized services (Authentik, Nextcloud)
- Maintains rick-infra philosophy: infrastructure provides foundation, apps manage specifics
2025-11-20 22:11:44 +01:00
3b062edeb6 Add Valkey infrastructure role as Redis-compatible cache service
- Implemented complete Valkey infrastructure role following PostgreSQL patterns
- Provides 100% Redis-compatible high-performance data structure store
- Configured for multi-application support with database isolation
- Security-focused: localhost-only binding, password auth, systemd hardening
- Arch Linux compatible: uses native Valkey package with Redis compatibility
- Database allocation strategy: DB 0 reserved, DB 1+ for applications
- Full systemd integration with security overrides and proper service management
- Redis client compatibility maintained for seamless application integration
- Ready for Authentik and future container workloads requiring cache services
2025-11-19 22:20:54 +01:00
ddbdefd27f Add self-contained Gitea Git service with PostgreSQL integration
- Implements complete Gitea Git service following rick-infra self-contained architecture
- Uses PostgreSQL infrastructure role as dependency and manages own database/user
- Native Arch Linux installation via pacman packages
- Automatic database setup (gitea database and user creation)
- SystemD service with security hardening and proper dependency management
- Caddy reverse proxy integration deployed to sites-enabled directory
- SSH server on port 2222 with automatic host key generation
- Production-ready with LFS support, security headers, and HTTPS via Caddy
- Follows simplified configuration approach with essential variables only
- Self-contained pattern: service manages complete setup independently
2025-11-18 22:33:56 +01:00
762d00eebf Add simplified PostgreSQL infrastructure role for database services
- Provides PostgreSQL server as shared database infrastructure
- Follows KISS principle with only essential configuration (11 variables vs 45 originally)
- Implements maximum security with Unix socket-only superuser access
- Uses scram-sha-256 authentication for application users
- Includes SystemD security hardening
- Applications manage their own databases/users via this infrastructure
- Production-ready with data checksums and localhost-only access
2025-11-18 22:33:56 +01:00
7c3b02e5ad Add Sigvild Gallery wedding photo application with automated deployment and improve Caddy plugin management 2025-11-18 22:33:56 +01:00
8162e789ee Simplify Caddy infrastructure to use file-based configuration instead of complex API registration system 2025-11-15 00:30:38 +01:00
7788410bfc Complete production-ready Caddy infrastructure with security hardening
- Add comprehensive Caddy role with HTTPS/TLS, DNS challenges, and systemd security
- Implement optimized systemd overrides with enhanced security restrictions
- Create detailed documentation with usage examples and variable references
- Establish proper Ansible configuration with vault integration
- Update site.yml for infrastructure orchestration with role-based deployment
- Add host-specific configuration structure for scalable multi-environment setup
2025-11-12 22:36:34 +01:00
0b6eea6113 Initial commit 2025-11-12 20:48:28 +01:00