# Config System Redesign ## 1. Problem Statement `LoadConfig()` panics on the server because the config system assumes a writable config directory. On the server, `OPAL_CONFIG_DIR=/etc/opal` is read-only (systemd `ReadOnlyPaths`), `opal.yml` doesn't exist (Ansible only deploys `opal.env`), and the attempt to create it fails. The nil `*Config` propagates unchecked through `GetConfig()` callers to `BuildUrgencyCoefficients(nil)`, causing a nil-pointer panic. ### Panic chain ``` sortByUrgency() report.go:426 cfg, _ := GetConfig() ← error discarded, cfg = nil coeffs := BuildUrgencyCoefficients(cfg) return &UrgencyCoefficients{ Due: cfg.UrgencyDue, ← nil dereference → PANIC } ``` ### Root cause The config system was designed for the CLI (user-writable `~/.config/opal/`) and has three interacting issues: 1. **Write-on-read**: `LoadConfig()` creates directories and writes a default `opal.yml` as a side effect of *reading* config. This fails when the filesystem is read-only. 2. **Error swallowing**: All internal callers use `cfg, _ := GetConfig()`, turning a load failure into a nil-pointer panic instead of a graceful error. 3. **No mode awareness**: The same code path runs for CLI users (writable home dir, interactive, config file is useful) and for the server (read-only `/etc/opal`, headless, defaults are fine). --- ## 2. Current Architecture ```mermaid graph TD subgraph "Config Sources" ENV["opal.env
(systemd EnvironmentFile)"] YML["opal.yml
(Viper / YAML)"] DEFAULTS["Hardcoded defaults
(in LoadConfig)"] end subgraph "Loading Mechanisms" AUTH_LOAD["auth.LoadConfig()
os.Getenv() → auth.Config"] ENGINE_LOAD["engine.LoadConfig()
Viper → engine.Config"] end subgraph "Consumers" SERVER["API Server
(handlers, middleware)"] CLI["CLI Commands
(cmd/ package)"] end ENV --> AUTH_LOAD YML --> ENGINE_LOAD DEFAULTS --> ENGINE_LOAD AUTH_LOAD --> SERVER ENGINE_LOAD --> SERVER ENGINE_LOAD --> CLI ``` ### Two independent config subsystems | Aspect | `engine.Config` | `auth.Config` | |--------|----------------|---------------| | **Source** | `opal.yml` via Viper | Environment variables | | **Loaded by** | `engine.LoadConfig()` | `auth.LoadConfig()` | | **Caching** | Singleton (`globalConfig`) | None (re-read each call) | | **Write side effects** | Creates dir + file on load | None | | **Error model** | Returns `(*Config, error)` | Returns `*Config` (no error) | | **Used by** | CLI + Server (lazy) | Server only | ### `engine.Config` field categories | Category | Fields | CLI | Server | Notes | |----------|--------|-----|--------|-------| | **Display** | `DefaultFilter`, `DefaultSort`, `DefaultReport`, `ColorOutput` | Yes | No | Terminal-only | | **Date** | `WeekStartDay`, `DefaultDueTime` | Yes | Yes | Used by `ParseDate()` | | **Urgency** | 13 urgency coefficients | Yes | Yes | Core scoring logic | | **Limits** | `NextLimit` | Yes | Indirectly | Report limit | | **Sync** | 6 sync fields | Yes | No | Client-side sync config | Key observation: The server only needs **urgency coefficients** and **date settings** from `engine.Config`. It doesn't need display preferences or sync settings. But all 30 fields are loaded through the same mechanism. ### `engine.LoadConfig()` flow ``` 1. Check globalConfig singleton → return if cached 2. GetConfigDir() → resolve directory path 3. os.MkdirAll(configDir) ← FAILS on read-only FS 4. Viper.ReadInConfig() 5. If read fails: a. Viper.WriteConfigAs() ← FAILS on read-only FS b. Re-read written file 6. Unmarshal → Config struct 7. Cache in globalConfig ``` Steps 3 and 5a are the failure points on the server. ### Caller error handling audit | Caller | File:Line | Error handling | |--------|-----------|----------------| | `FormatTaskListWithFormat` | display.go:24 | `cfg, _ := GetConfig()` — **ignored** | | `formatMinimalLine` | display.go:119 | `cfg, _ := GetConfig()` — **ignored** | | `NextReport.SortFunc` | report.go:168 | `cfg, _ := GetConfig()` — **ignored** | | `NextReport.LimitFunc` | report.go:187 | `cfg, _ := GetConfig()` — **ignored** | | `sortByUrgency` | report.go:426 | `cfg, _ := GetConfig()` — **ignored** | | `PopulateUrgency` | task.go:769 | `cfg, _ := GetConfig()` — **ignored** | | `getWeekStart` | dateparse.go:484 | Checked — falls back to Monday | | All cmd/ callers | root.go, sync.go | Checked — exits on error | Every engine-internal caller except `dateparse.go` discards the error. --- ## 3. Additional Issues Found ### 3a. Config file clobbering `LoadConfig()` line 296-306 catches **any** `ReadInConfig` error (not just "file not found") and overwrites the config file with defaults. A YAML syntax error in the user's `opal.yml` silently destroys their customizations. ### 3b. `SaveConfig()` manual field sync Adding a new config field requires updating three places: 1. `Config` struct definition 2. `LoadConfig()` — `v.SetDefault(...)` call 3. `SaveConfig()` — `v.Set(...)` call Missing any one of these causes silent data loss on save or missing defaults. ### 3c. `auth.LoadConfig()` silent parse failures ```go jwtExpiry, _ := strconv.Atoi(getEnv("JWT_EXPIRY", "3600")) ``` If `JWT_EXPIRY=abc`, `Atoi` returns `(0, error)`, error is discarded, and `JWTExpiry` silently becomes 0 (tokens expire immediately). ### 3d. Insecure JWT default `JWT_SECRET` defaults to `"change-me-in-production"`. While `validateServerConfig()` checks for empty values, it doesn't check for the insecure default. ### 3e. Sync API key in plaintext `sync_api_key` is stored in `opal.yml` with no special file permissions. ### 3f. No env-var override of YAML config Viper's `AutomaticEnv()` is never called, so there's no way to override YAML config values via environment variables. This is an obstacle for 12-factor server deployments. --- ## 4. Design Options ### Option A: Minimal fix — graceful fallback to defaults **Change**: Make `LoadConfig()` return a `DefaultConfig()` when it can't read or write the config file, instead of returning nil. ```go func DefaultConfig() *Config { return &Config{ DefaultFilter: "status:pending", DefaultSort: "due,priority", // ... all defaults ... UrgencyDue: 12.0, // ... } } func LoadConfig() (*Config, error) { if globalConfig != nil { return globalConfig, nil } cfg, err := loadFromFile() if err != nil { // File not available — use defaults (common in server mode) cfg = DefaultConfig() } globalConfig = cfg return cfg, nil } ``` **Pros**: Smallest change, fixes the panic, no behavior change for CLI users. **Cons**: Doesn't address the architectural confusion. Error-swallowing callers remain. Write-on-read side effect remains for CLI. Three-source config stays. --- ### Option B: Separate load paths for CLI and server **Change**: Split `LoadConfig()` into two functions: - `LoadConfigFromFile()` — current behavior for CLI (read/write YAML) - `LoadConfigFromDefaults()` — returns `DefaultConfig()` for server The server startup code calls `LoadConfigFromDefaults()` eagerly during init, populating the singleton before any lazy `GetConfig()` call. ```go // cmd/server.go — in serverStartCmd.Run, before InitDB() engine.LoadConfigFromDefaults() ``` **Pros**: Fixes the panic. Server path never touches the filesystem for config. Explicit about which mode is running. CLI behavior unchanged. **Cons**: Two code paths to maintain. Server can't be customized without code changes (urgency tuning, etc.). --- ### Option C: Read-only load with env-var overrides (recommended) **Change**: Restructure `LoadConfig()` to: 1. Start with hardcoded defaults (always succeeds) 2. Layer YAML file on top **if it exists** (read-only, never create) 3. Layer environment variables on top (via Viper `AutomaticEnv`) 4. Never write to the filesystem as a side effect of loading ```go func LoadConfig() (*Config, error) { if globalConfig != nil { return globalConfig, nil } v := viper.New() v.SetConfigType("yaml") // 1. Hardcoded defaults (always present) setDefaults(v) // 2. YAML file overlay (optional, read-only) if configPath, err := GetConfigPath(); err == nil { v.SetConfigFile(configPath) if err := v.ReadInConfig(); err != nil { if _, ok := err.(viper.ConfigFileNotFoundError); !ok { // File exists but is malformed — report, don't clobber if _, statErr := os.Stat(configPath); statErr == nil { return nil, fmt.Errorf("config file %s is invalid: %w", configPath, err) } // File doesn't exist — that's fine, use defaults } } } // 3. Environment variable overlay v.SetEnvPrefix("OPAL") v.AutomaticEnv() cfg := &Config{} if err := v.Unmarshal(cfg); err != nil { return nil, fmt.Errorf("failed to unmarshal config: %w", err) } globalConfig = cfg return cfg, nil } ``` File creation moves to an explicit `InitConfig()` function, called only during `opal setup` and CLI first-run: ```go func InitConfig() error { configDir, err := GetConfigDir() if err != nil { return err } if err := os.MkdirAll(configDir, 0755); err != nil { return fmt.Errorf("failed to create config directory: %w", err) } return SaveConfig(DefaultConfig()) } ``` Environment variable mapping: | YAML key | Env var | |----------|---------| | `urgency_due_coefficient` | `OPAL_URGENCY_DUE_COEFFICIENT` | | `default_filter` | `OPAL_DEFAULT_FILTER` | | `week_start_day` | `OPAL_WEEK_START_DAY` | | ... | `OPAL_` | **Pros**: - Fixes the panic — loading never fails fatally (missing file = defaults) - Malformed YAML is reported, not silently clobbered - Server can tune urgency via `opal.env` without deploying `opal.yml` - 12-factor compatible (env vars override everything) - CLI behavior unchanged (reads existing `opal.yml` + env overrides) - Single code path for both CLI and server - No filesystem write side effects during load **Cons**: - `SaveConfig()` still needs the manual field-sync (separate improvement) - Slightly more Viper configuration --- ### Option D: Full restructure with typed config sections **Change**: Split `Config` into domain-specific sub-configs and unify the loading mechanism for all config (merge `auth.Config` into the same system). ```go type Config struct { Display DisplayConfig `mapstructure:"display"` Urgency UrgencyConfig `mapstructure:"urgency"` Sync SyncConfig `mapstructure:"sync"` Server ServerConfig `mapstructure:"server"` // absorbs auth.Config } ``` **Pros**: Clean separation. Single config system. Extensible. **Cons**: Large refactor. Breaks existing `opal.yml` format. `auth.Config` works fine as-is for its purpose. Over-engineered for the current problem. --- ## 5. Recommendation **Option C** — read-only load with env-var overrides. It fixes the immediate panic, eliminates the write-on-read footgun, enables server customization via environment variables, and does it with a focused change to one function. It doesn't over-engineer or restructure things that work. ### Implementation plan #### Step 1: Extract `DefaultConfig()` constructor Create a `DefaultConfig()` function that returns a `*Config` with all defaults populated. This is the single source of truth for default values — used by `LoadConfig()`, `InitConfig()`, and `SaveConfig()`. **Files**: `internal/engine/config.go` #### Step 2: Rewrite `LoadConfig()` to be read-only Remove directory creation and file writing. Layer: defaults → YAML (if exists) → env vars. Return `DefaultConfig()` on missing file instead of nil. Return error only on malformed YAML (file exists but can't be parsed). **Files**: `internal/engine/config.go` #### Step 3: Extract `InitConfig()` for file creation Move the "create config directory + write defaults" logic into `InitConfig()`. Call it from `initializeApp()` (CLI first-run path) and `opal setup`. **Files**: `internal/engine/config.go`, `cmd/root.go`, `cmd/setup.go` #### Step 4: Fix error-swallowing callers Two options, from least to most invasive: **4a (recommended)**: Make `GetConfig()` never return nil. Since `LoadConfig()` now always returns a valid `*Config` (defaults on failure), `GetConfig()` also always returns non-nil. The `cfg, _ := GetConfig()` pattern becomes safe. The error return is kept for callers that want to distinguish "loaded from file" vs "using defaults" but nil is never returned. **4b (defensive)**: Also make `BuildUrgencyCoefficients()` accept nil and return default coefficients. Belt-and-suspenders. **Files**: `internal/engine/config.go`, `internal/engine/urgency.go` #### Step 5: Enable env-var overrides via Viper Add `v.SetEnvPrefix("OPAL")` and `v.AutomaticEnv()` so that any config key can be overridden by `OPAL_`. Update `opal.env` in deployment docs to show urgency tuning examples. **Files**: `internal/engine/config.go`, `docs/deployment.md` #### Step 6: Eliminate `SaveConfig()` field duplication Replace the manual `v.Set()` calls with Viper's `mapstructure` round-trip or a reflect-based helper so new fields are automatically included. ```go func SaveConfig(cfg *Config) error { v := viper.New() v.SetConfigFile(configPath) v.SetConfigType("yaml") // Use mapstructure tags to populate viper from struct data, _ := mapstructure.Decode(cfg) // or manual marshal for k, val := range data { v.Set(k, val) } return v.WriteConfig() } ``` Alternatively, marshal to YAML directly with `yaml.Marshal` and write the bytes, bypassing Viper for the write path entirely. **Files**: `internal/engine/config.go` ### Deployment changes After this redesign, the Ansible role needs **no changes** — `opal.env` is sufficient. To customize urgency coefficients on the server, add lines to `opal.env`: ```bash # Server-side urgency tuning (optional) OPAL_URGENCY_DUE_COEFFICIENT=12.0 OPAL_URGENCY_AGE_MAX=180 ``` No `opal.yml` deployment needed. The server runs on defaults + env overrides. --- ## 6. Technical Decisions ### ADR-1: Config loading must not write to the filesystem - **Context**: `LoadConfig()` creates directories and writes `opal.yml` as a side effect. This fails on read-only filesystems (server) and is surprising behavior for a "load" function. - **Decision**: `LoadConfig()` becomes pure read. File creation moves to `InitConfig()`, called explicitly during setup/first-run. - **Alternatives**: Deploy `opal.yml` via Ansible; make `/etc/opal` writable. - **Consequences**: CLI first-run code must call `InitConfig()` explicitly. `IsFirstRun()` check remains in `initializeApp()`. ### ADR-2: Missing config file returns defaults, not an error - **Context**: On the server, `opal.yml` doesn't exist and doesn't need to. The current code treats this as a fatal error. - **Decision**: Missing file = use defaults silently. Malformed file = return error (don't clobber). `GetConfig()` never returns nil. - **Alternatives**: Require `opal.yml` everywhere; use separate load functions per mode. - **Consequences**: Callers that discard errors (`cfg, _ := GetConfig()`) are now safe. The error return still exists for callers that need to distinguish "file loaded" from "using defaults". ### ADR-3: Environment variables can override any config key - **Context**: The server is configured entirely via env vars (`opal.env` + systemd). Currently urgency coefficients can only be set via `opal.yml`. - **Decision**: Enable Viper's `AutomaticEnv()` with prefix `OPAL_`. Any YAML key `foo_bar` can be overridden by `OPAL_FOO_BAR`. - **Alternatives**: Add a server-specific config file; keep config values hardcoded for server. - **Consequences**: Layered config: defaults < YAML file < env vars. Aligns with 12-factor app principles. Deployment can tune behavior without deploying additional files. ### ADR-4: Do not merge auth.Config into engine.Config - **Context**: `auth.Config` and `engine.Config` are completely independent systems. Merging them would create a single unified config. - **Decision**: Keep them separate. `auth.Config` loads from env vars, works correctly, and is server-only. No reason to change it. - **Alternatives**: Unified config struct with sections. - **Consequences**: Two config types remain. This is acceptable because they serve different purposes, have different lifecycles, and the current `auth.Config` has no bugs.