Commit Graph

32 Commits

Author SHA1 Message Date
2fe194ba82 Implement modular nftables architecture and Gitea SSH firewall management
- Restructure security playbook with modular nftables loader
- Base rules loaded first, service rules second, drop rule last
- Add Gitea self-contained firewall management (port 2222)
- Add fail2ban protection for Gitea SSH brute force attacks
- Update documentation with new firewall architecture
- Create comprehensive Gitea deployment and testing guide

This enables self-contained service roles to manage their own firewall
rules without modifying the central security playbook. Each service
deploys rules to /etc/nftables.d/ which are loaded before the final
drop rule, maintaining the defense-in-depth security model.
2025-12-16 21:45:22 +01:00
9b12225ec8 MILE: All current services confirmed working on a fresh arch vps 2025-12-16 20:40:25 +01:00
f40349c2e7 update authentik handlers
authentik-pod handles container orchestration, so no server and worker
handlers needed
2025-12-16 20:39:36 +01:00
4f8b46fa14 solve folder structure issue 2025-12-16 20:38:51 +01:00
a9f814d929 Add comprehensive devigo deployment documentation
- Complete deployment guide with architecture overview
- Documented all deployment steps and current status
- Added lessons learned from deployment challenges
- Comprehensive troubleshooting guide
- Maintenance procedures and security considerations
- Migration notes from arch-vps to mini-vps

Key sections:
- OAuth flow and authentication
- CI/CD deployment pipeline
- MIME type handling strategy
- Podman authentication for systemd
- Future improvements and technical debt tracking
2025-12-16 00:53:45 +01:00
44584c68f1 Add GitHub Container Registry authentication to Podman role
- Deploy /etc/containers/auth.json with GHCR credentials
- Support for private container image pulls
- Credentials encrypted in Ansible vault
- Used by devigo and other services pulling from private registries
- Updated documentation with authentication setup
2025-12-16 00:53:42 +01:00
0ecbb84fa5 Configure devigo service in production environment
- Added devigo role to production playbook
- Configured domains: devigo.no (apex), www.devigo.no, decap.jnss.me
- Set OAuth trusted origins for multi-domain support
- Integrated with existing Caddy and Podman infrastructure
2025-12-16 00:53:39 +01:00
1350d10a7c Add devigo deployment role for mini-vps production environment
- Created comprehensive devigo Ansible role with Podman Quadlet support
- Deployed devigo-site container (Hugo + nginx) via systemd
- Deployed devigo-decap-oauth OAuth2 proxy for Decap CMS
- Integrated with Caddy reverse proxy for HTTPS

Services deployed:
- devigo.no (apex domain, primary)
- www.devigo.no (redirects to apex)
- decap.jnss.me (OAuth proxy)

Key features:
- REGISTRY_AUTH_FILE environment for Podman GHCR authentication
- TRUSTED_ORIGINS (plural) for decapcms-oauth2 multi-origin support
- JavaScript-based Decap CMS initialization (eliminates YAML MIME dependency)
- nginx location block for YAML MIME type (text/yaml)
- Automated deployment via GitHub Actions CI/CD
- Comprehensive documentation with troubleshooting guide
- Architecture decision records

Fixes applied during deployment:
- OAuth origin trust validation (TRUSTED_ORIGINS vs TRUSTED_ORIGIN)
- MIME type handling strategy (location-specific vs server-level types block)
- Decap CMS initialization method (JavaScript vs link tag)
- Podman authentication for systemd services (REGISTRY_AUTH_FILE)

Testing status:
-  MIME types verified (HTML, CSS, YAML all correct)
-  OAuth authentication working
-  Container image pulls from private GHCR
-  Automated deployments functional
-  Site fully operational at devigo.no
2025-12-16 00:53:33 +01:00
ecbeb07ba2 Migrate sigvild-gallery to production environment
- Add multi-environment architecture (homelab + production)
- Create production environment (mini-vps) for client projects
- Create homelab playbook for arch-vps services
- Create production playbook for mini-vps services
- Move sigvild-gallery from homelab to production
- Restructure variables: group_vars/production + host_vars/arch-vps
- Add backup-sigvild.yml playbook with auto-restore functionality
- Fix restore logic to check for data before creating directories
- Add manual variable loading workaround for Ansible 2.20
- Update all documentation for multi-environment setup
- Add ADR-007 documenting multi-environment architecture decision
2025-12-15 16:33:33 +01:00
e8b76c6a72 Update authentication documentation to reflect OAuth/OIDC as primary method
- Update architecture-decisions.md: Change decision to OAuth/OIDC primary, forward auth fallback
  - Add comprehensive OAuth/OIDC and forward auth flow diagrams
  - Add decision matrix comparing both authentication methods
  - Include real examples: Nextcloud/Gitea OAuth configs, whoami forward auth
  - Update rationale to emphasize OAuth/OIDC security and standards benefits

- Update authentication-architecture.md: Align with new OAuth-first approach
  - Add 'Choosing the Right Pattern' section with clear decision guidance
  - Swap pattern order: OAuth/OIDC (Pattern 1), Forward Auth (Pattern 2)
  - Update Example 1: Change Gitea from forward auth to OAuth/OIDC integration
  - Add emphasis on primary vs fallback methods throughout

- Update authentik-deployment-guide.md: Reflect OAuth/OIDC preference
  - Update overview to mention OAuth2/OIDC provider and forward auth fallback
  - Add decision guidance to service integration examples
  - Reorder examples: Nextcloud OAuth (primary), forward auth (fallback)
  - Clarify forward auth should only be used for services without OAuth support

This update ensures all authentication documentation consistently reflects the
agreed architectural decision: use OAuth/OIDC when services support it
(Nextcloud, Gitea, modern apps), and only use forward auth as a fallback for
legacy applications, static sites, or simple tools without OAuth capabilities.
2025-12-15 00:25:24 +01:00
4f8da38ca6 Add Nextcloud cloud storage role with split Redis caching strategy
## New Features

- **Nextcloud Role**: Complete cloud storage deployment using Podman Quadlet
  - FPM variant with Caddy reverse proxy and FastCGI
  - PostgreSQL database via Unix socket
  - Valkey/Redis for app-level caching and file locking
  - Automatic HTTPS with Let's Encrypt via Caddy
  - Dual-root pattern: Caddy serves static assets, FPM handles PHP

- **Split Caching Strategy**: Redis caching WITHOUT Redis sessions
  - Custom redis.config.php template for app-level caching only
  - File-based PHP sessions for stability (avoids session lock issues)
  - Prevents cascading failures from session lock contention
  - Documented in role README with detailed rationale

## Infrastructure Updates

- **Socket Permissions**: Update PostgreSQL and Valkey to mode 777
  - Required for containers that switch users (root → www-data)
  - Nextcloud container loses supplementary groups on user switch
  - Security maintained via password authentication (scram-sha-256, requirepass)
  - Documented socket permission architecture in docs/

- **PostgreSQL**: Export client group GID as fact for dependent roles
- **Valkey**: Export client group GID as fact, update socket fix service

## Documentation

- New: docs/socket-permissions-architecture.md
  - Explains 777 vs 770 socket permission trade-offs
  - Documents why group-based access doesn't work for user-switching containers
  - Provides TCP alternative for stricter security requirements

- Updated: All role READMEs with socket permission notes
- New: Nextcloud README with comprehensive deployment, troubleshooting, and Redis architecture documentation

## Configuration

- host_vars: Add Nextcloud vault variables and configuration
- site.yml: Include Nextcloud role in main playbook

## Technical Details

**Why disable Redis sessions?**

The official Nextcloud container enables Redis session handling via REDIS_HOST env var,
which causes severe performance issues:

1. Session lock contention under high concurrency (browser parallel asset requests)
2. Infinite lock retries (default lock_retries=-1) blocking FPM workers
3. Timeout orphaning: reverse proxy kills connection, worker keeps lock
4. Worker pool exhaustion: all 5 default workers blocked on same session lock
5. Cascading failure: new requests queue, more timeouts, more orphaned locks

Solution: Use file-based sessions (reliable, fast for single-server) while keeping
Redis for distributed cache and transactional file locking via custom config file.

This provides optimal performance without the complexity of Redis session debugging.

Tested: Fresh deployment on arch-vps (69.62.119.31)
Domain: https://cloud.jnss.me/
2025-12-14 22:07:08 +01:00
8e8aabd5e7 Improve logging and infrastructure variable consistency
Changes:
- Migrate Authentik to journald logging (remove file-based logs)
- Update Gitea to use infrastructure variables for PostgreSQL access
- Add comprehensive logging documentation to deployment guide
- Add infrastructure variable pattern guide to integration docs

Authentik Logging:
- Remove LogDriver=k8s-file from server and worker containers
- Remove logs directory creation from user setup tasks
- Update deployment guide with journald examples and JSON log patterns

Gitea Infrastructure Variables:
- Add infrastructure dependencies section to role defaults
- Replace hardcoded paths with postgresql_unix_socket_directories variable
- Replace hardcoded 'postgres' group with postgresql_client_group variable
- Add infrastructure variable validation in tasks
- Remove manual socket permission override (handled by infrastructure)

Documentation:
- Add journald logging best practices to service integration guide
- Add infrastructure variable pattern documentation with Gitea example
- Update Authentik deployment guide with journald commands and JSON filtering
- Document benefits: centralized logging, single source of truth, maintainability

Validated on arch-vps:
- Authentik logs accessible via journalctl and podman logs (identical output)
- Gitea user added to postgres-clients group (GID 962)
- No PostgreSQL socket permission errors after service restart
2025-12-14 17:16:21 +01:00
3506e55016 Migrate to rootful container architecture with infrastructure fact pattern
Major architectural change from rootless user services to system-level (rootful)
containers to enable group-based Unix socket access for containerized applications.

Infrastructure Changes:
- PostgreSQL: Export postgres-clients group GID as Ansible fact
- Valkey: Export valkey-clients group GID as Ansible fact
- Valkey: Add socket-fix service to maintain correct socket group ownership
- Both: Set socket directories to 770 with client group ownership

Authentik Role Refactoring:
- Remove rootless container configuration (subuid/subgid, lingering, user systemd)
- Deploy Quadlet files to /etc/containers/systemd/ (system-level)
- Use dynamic GID facts in container PodmanArgs (--group-add)
- Simplify user creation to system user with infrastructure group membership
- Update handlers for system scope service management
- Remove unnecessary container security options (no user namespace isolation)

Container Template Changes:
- Pod: Remove --userns args, change WantedBy to multi-user.target
- Containers: Replace Annotation with PodmanArgs using dynamic GIDs
- Remove /dev/shm mounts and SecurityLabelDisable (not needed for rootful)
- Change WantedBy to multi-user.target for system services

Documentation Updates:
- Add ADR-005: Rootful Containers with Infrastructure Fact Pattern
- Update ADR-003: Podman + systemd for system-level deployment
- Update authentik-deployment-guide.md for system scope commands
- Update service-integration-guide.md with rootful pattern examples
- Document discarded rootless approach and rationale

Why Rootful Succeeds:
- Direct UID/GID mapping preserves supplementary groups
- Container process groups match host socket group ownership
- No user namespace remapping breaking permissions

Why Rootless Failed (Discarded):
- User namespace UID/GID remapping broke group-based socket access
- Supplementary groups remapped into subgid range didn't match socket ownership
- Even with --userns=host and keep_original_groups, permissions failed

Pattern Established:
- Infrastructure roles create client groups and export GID facts
- Application roles validate facts and consume in container templates
- Rootful containers run as dedicated users with --group-add for socket access
- System-level deployment provides standard systemd service management

Deployment Validated:
- Services in /system.slice/ ✓
- Process groups: 961 (valkey-clients), 962 (postgres-clients), 966 (authentik) ✓
- Socket permissions: 770 with client groups ✓
- HTTP endpoint responding ✓
2025-12-14 16:56:50 +01:00
9e570ac2a3 Add comprehensive authentik documentation and improve role configuration
- Add authentik-deployment-guide.md: Complete step-by-step deployment guide
- Add architecture-decisions.md: Document native DB vs containerized rationale
- Add authentication-architecture.md: SSO strategy and integration patterns
- Update deployment-guide.md: Integrate authentik deployment procedures
- Update security-hardening.md: Add multi-layer security documentation
- Update service-integration-guide.md: Add authentik integration examples
- Update README.md: Professional project overview with architecture benefits
- Update authentik role: Fix HTTP binding, add security configs, improve templates
- Remove unused authentik task files: containers.yml, networking.yml

Key improvements:
* Document security benefits of native databases over containers
* Document Unix socket IPC architecture advantages
* Provide comprehensive troubleshooting and deployment procedures
* Add forward auth integration patterns for services
* Fix authentik HTTP binding from 127.0.0.1 to 0.0.0.0
* Add shared memory and IPC security configurations
2025-12-13 21:04:20 +01:00
bf53700b7e Fix Gitea PostgreSQL socket permission issue
- Add git user to postgres group for Unix socket access
- Ensure PostgreSQL socket directory has proper permissions
- Add socket connectivity test before database operations
- Update database tasks to use explicit socket parameters
- Add missing database privileges grant task

Resolves timeout issue in 'waiting for gitea to be ready' task
caused by permission denied errors when accessing PostgreSQL
Unix socket. Follows same pattern as working Authentik role.
2025-12-11 19:33:49 +01:00
30a882b1e1 Improve infrastructure configuration and installation logic
- Fix authentik Caddy template to use localhost instead of variable for consistency
- Improve Caddy installation logic with better conditional checks
- Fix version checking and plugin detection for more reliable deployments
- Add cleanup task condition for DNS challenge installations

These changes improve deployment reliability and consistency.
2025-12-07 21:22:00 +01:00
fe4efcbd5b Enable sigvild-gallery role and backup functionality
- Enable sigvild-gallery role in site.yml playbook
- Add backup configuration to host variables
- Integrate restore functionality into main sigvild-gallery tasks
- Add data protection logic to prevent accidental overwrites
- Enable gitea role for complete service deployment

This completes the sigvild-gallery service integration with backup/restore capabilities.
2025-12-07 21:21:50 +01:00
4df87dd57f Fix: Service management errors in sigvild-gallery restoration
- Fix Unix timestamp conversion in restore.yml using proper strftime syntax
- Add service existence check before stopping sigvild-gallery service
- Fix systemd service template environment variable syntax error
- Add proper error handling for fresh deployments where service doesn't exist yet

Resolves service management failures during restoration on fresh VPS installations.
2025-12-07 21:21:31 +01:00
0507e3291d Fix: Update authentik Caddy template to use HTTP backend
- Change reverse_proxy from https:// to http:// backend
- Use authentik_http_port instead of authentik_https_port
- Remove unnecessary TLS transport configuration
- Remove health check for non-existent endpoint

This aligns the Ansible template with the working configuration
where authentik only serves HTTP internally and Caddy handles SSL.
2025-12-07 16:45:42 +01:00
b3c3fe5c56 Merge authentik-quadlet-fix: Integrate working authentik implementation
Merge completed authentik Quadlet implementation that resolves all deployment
issues and enables external HTTPS access. This brings the working solution
developed and tested on authentik-quadlet-fix branch into main.

All systemd services now generate correctly and authentik is fully operational
at https://auth.jnss.me with proper SSL termination via Caddy.
2025-12-04 19:43:36 +01:00
b42ee2a22b Fix: Complete authentik Quadlet implementation with networking solution
Resolves authentik deployment issues by implementing proper Podman Quadlet
configuration and fixing networking for external access through Caddy.

Core Fixes:
• Add missing [Install] sections to container Quadlet files for systemd service generation
• Fix pod references from 'systemd-authentik' to 'authentik.pod' for proper Quadlet linking
• Remove problematic --userns=host to use proper rootless user namespaces
• Configure subuid/subgid ranges for authentik user (200000:65536)
• Update networking to bind 0.0.0.0:9000 only (remove unnecessary HTTPS port 9443)
• Add AUTHENTIK_LISTEN__HTTP=0.0.0.0:9000 environment configuration
• Fix Caddy reverse proxy to use HTTP backend instead of HTTPS

Infrastructure Updates:
• Enhance PostgreSQL role with Unix socket configuration and user management
• Improve Valkey role with proper systemd integration and socket permissions
• Add comprehensive service integration documentation
• Update deployment playbooks with backup and restore capabilities

Security Improvements:
• Secure network isolation with Caddy SSL termination
• Reduced attack surface by removing direct HTTPS container exposure
• Proper rootless container configuration with user namespace mapping

Result: authentik now fully operational with external HTTPS access via auth.jnss.me
All systemd services (authentik-pod, authentik-server, authentik-worker) running correctly.
2025-12-04 19:42:31 +01:00
df4ae0eb17 WIP: Authentik role with Quadlet pod approach - debugging container service generation
- Created authentik.pod file for proper pod definition
- Removed superfluous authentik-pod.container file
- Updated container templates to reference pod correctly
- Issue: Quadlet still reports 'pod authentik is not Quadlet based'
- Container services not being generated (only pod service works)
2025-11-26 23:24:09 +01:00
dd62e93517 Switching over to using unix sockets for ICP 2025-11-26 16:07:48 +01:00
d814369c99 Add Authentik SSO service and refactor Valkey configuration to use native tools and consolidated systemd service 2025-11-22 21:36:23 +01:00
500224b5de Add Podman container infrastructure role for containerized services
- Implemented complete Podman infrastructure role following rick-infra patterns
- Minimal installation approach: only install podman, trust Arch dependency management
- Configured with crun runtime for optimal performance and security
- Security-focused: HTTPS-only registries, rootless containers, systemd hardening
- Registry support: docker.io, quay.io, ghcr.io with secure configurations
- Ready for service-specific users with isolated container environments
- Quadlet support for native systemd container management
- Container-to-host networking via bridge networks with host gateway access
- Foundation for future containerized services (Authentik, Nextcloud)
- Maintains rick-infra philosophy: infrastructure provides foundation, apps manage specifics
2025-11-20 22:11:44 +01:00
3b062edeb6 Add Valkey infrastructure role as Redis-compatible cache service
- Implemented complete Valkey infrastructure role following PostgreSQL patterns
- Provides 100% Redis-compatible high-performance data structure store
- Configured for multi-application support with database isolation
- Security-focused: localhost-only binding, password auth, systemd hardening
- Arch Linux compatible: uses native Valkey package with Redis compatibility
- Database allocation strategy: DB 0 reserved, DB 1+ for applications
- Full systemd integration with security overrides and proper service management
- Redis client compatibility maintained for seamless application integration
- Ready for Authentik and future container workloads requiring cache services
2025-11-19 22:20:54 +01:00
ddbdefd27f Add self-contained Gitea Git service with PostgreSQL integration
- Implements complete Gitea Git service following rick-infra self-contained architecture
- Uses PostgreSQL infrastructure role as dependency and manages own database/user
- Native Arch Linux installation via pacman packages
- Automatic database setup (gitea database and user creation)
- SystemD service with security hardening and proper dependency management
- Caddy reverse proxy integration deployed to sites-enabled directory
- SSH server on port 2222 with automatic host key generation
- Production-ready with LFS support, security headers, and HTTPS via Caddy
- Follows simplified configuration approach with essential variables only
- Self-contained pattern: service manages complete setup independently
2025-11-18 22:33:56 +01:00
762d00eebf Add simplified PostgreSQL infrastructure role for database services
- Provides PostgreSQL server as shared database infrastructure
- Follows KISS principle with only essential configuration (11 variables vs 45 originally)
- Implements maximum security with Unix socket-only superuser access
- Uses scram-sha-256 authentication for application users
- Includes SystemD security hardening
- Applications manage their own databases/users via this infrastructure
- Production-ready with data checksums and localhost-only access
2025-11-18 22:33:56 +01:00
7c3b02e5ad Add Sigvild Gallery wedding photo application with automated deployment and improve Caddy plugin management 2025-11-18 22:33:56 +01:00
8162e789ee Simplify Caddy infrastructure to use file-based configuration instead of complex API registration system 2025-11-15 00:30:38 +01:00
7788410bfc Complete production-ready Caddy infrastructure with security hardening
- Add comprehensive Caddy role with HTTPS/TLS, DNS challenges, and systemd security
- Implement optimized systemd overrides with enhanced security restrictions
- Create detailed documentation with usage examples and variable references
- Establish proper Ansible configuration with vault integration
- Update site.yml for infrastructure orchestration with role-based deployment
- Add host-specific configuration structure for scalable multi-environment setup
2025-11-12 22:36:34 +01:00
0b6eea6113 Initial commit 2025-11-12 20:48:28 +01:00