Files
gems/docs/design/config-redesign.md
T
joakim 80ea17227d fix: prevent nil-panic on server and improve OAuth callback handling
Load config eagerly during server startup so sortByUrgency never
hits a nil config. Add nil-guard in BuildUrgencyCoefficients as
belt-and-suspenders defense. Fix OAuth callback to support both
GET and POST, and resolve issuer URLs properly with path.Dir.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 16:40:53 +01:00

16 KiB

Config System Redesign

1. Problem Statement

LoadConfig() panics on the server because the config system assumes a writable config directory. On the server, OPAL_CONFIG_DIR=/etc/opal is read-only (systemd ReadOnlyPaths), opal.yml doesn't exist (Ansible only deploys opal.env), and the attempt to create it fails. The nil *Config propagates unchecked through GetConfig() callers to BuildUrgencyCoefficients(nil), causing a nil-pointer panic.

Panic chain

sortByUrgency()                     report.go:426
  cfg, _ := GetConfig()             ← error discarded, cfg = nil
  coeffs := BuildUrgencyCoefficients(cfg)
    return &UrgencyCoefficients{
      Due: cfg.UrgencyDue,          ← nil dereference → PANIC
    }

Root cause

The config system was designed for the CLI (user-writable ~/.config/opal/) and has three interacting issues:

  1. Write-on-read: LoadConfig() creates directories and writes a default opal.yml as a side effect of reading config. This fails when the filesystem is read-only.
  2. Error swallowing: All internal callers use cfg, _ := GetConfig(), turning a load failure into a nil-pointer panic instead of a graceful error.
  3. No mode awareness: The same code path runs for CLI users (writable home dir, interactive, config file is useful) and for the server (read-only /etc/opal, headless, defaults are fine).

2. Current Architecture

graph TD
    subgraph "Config Sources"
        ENV["opal.env<br/>(systemd EnvironmentFile)"]
        YML["opal.yml<br/>(Viper / YAML)"]
        DEFAULTS["Hardcoded defaults<br/>(in LoadConfig)"]
    end

    subgraph "Loading Mechanisms"
        AUTH_LOAD["auth.LoadConfig()<br/>os.Getenv() → auth.Config"]
        ENGINE_LOAD["engine.LoadConfig()<br/>Viper → engine.Config"]
    end

    subgraph "Consumers"
        SERVER["API Server<br/>(handlers, middleware)"]
        CLI["CLI Commands<br/>(cmd/ package)"]
    end

    ENV --> AUTH_LOAD
    YML --> ENGINE_LOAD
    DEFAULTS --> ENGINE_LOAD

    AUTH_LOAD --> SERVER
    ENGINE_LOAD --> SERVER
    ENGINE_LOAD --> CLI

Two independent config subsystems

Aspect engine.Config auth.Config
Source opal.yml via Viper Environment variables
Loaded by engine.LoadConfig() auth.LoadConfig()
Caching Singleton (globalConfig) None (re-read each call)
Write side effects Creates dir + file on load None
Error model Returns (*Config, error) Returns *Config (no error)
Used by CLI + Server (lazy) Server only

engine.Config field categories

Category Fields CLI Server Notes
Display DefaultFilter, DefaultSort, DefaultReport, ColorOutput Yes No Terminal-only
Date WeekStartDay, DefaultDueTime Yes Yes Used by ParseDate()
Urgency 13 urgency coefficients Yes Yes Core scoring logic
Limits NextLimit Yes Indirectly Report limit
Sync 6 sync fields Yes No Client-side sync config

Key observation: The server only needs urgency coefficients and date settings from engine.Config. It doesn't need display preferences or sync settings. But all 30 fields are loaded through the same mechanism.

engine.LoadConfig() flow

1. Check globalConfig singleton → return if cached
2. GetConfigDir() → resolve directory path
3. os.MkdirAll(configDir)          ← FAILS on read-only FS
4. Viper.ReadInConfig()
5. If read fails:
   a. Viper.WriteConfigAs()        ← FAILS on read-only FS
   b. Re-read written file
6. Unmarshal → Config struct
7. Cache in globalConfig

Steps 3 and 5a are the failure points on the server.

Caller error handling audit

Caller File:Line Error handling
FormatTaskListWithFormat display.go:24 cfg, _ := GetConfig()ignored
formatMinimalLine display.go:119 cfg, _ := GetConfig()ignored
NextReport.SortFunc report.go:168 cfg, _ := GetConfig()ignored
NextReport.LimitFunc report.go:187 cfg, _ := GetConfig()ignored
sortByUrgency report.go:426 cfg, _ := GetConfig()ignored
PopulateUrgency task.go:769 cfg, _ := GetConfig()ignored
getWeekStart dateparse.go:484 Checked — falls back to Monday
All cmd/ callers root.go, sync.go Checked — exits on error

Every engine-internal caller except dateparse.go discards the error.


3. Additional Issues Found

3a. Config file clobbering

LoadConfig() line 296-306 catches any ReadInConfig error (not just "file not found") and overwrites the config file with defaults. A YAML syntax error in the user's opal.yml silently destroys their customizations.

3b. SaveConfig() manual field sync

Adding a new config field requires updating three places:

  1. Config struct definition
  2. LoadConfig()v.SetDefault(...) call
  3. SaveConfig()v.Set(...) call

Missing any one of these causes silent data loss on save or missing defaults.

3c. auth.LoadConfig() silent parse failures

jwtExpiry, _ := strconv.Atoi(getEnv("JWT_EXPIRY", "3600"))

If JWT_EXPIRY=abc, Atoi returns (0, error), error is discarded, and JWTExpiry silently becomes 0 (tokens expire immediately).

3d. Insecure JWT default

JWT_SECRET defaults to "change-me-in-production". While validateServerConfig() checks for empty values, it doesn't check for the insecure default.

3e. Sync API key in plaintext

sync_api_key is stored in opal.yml with no special file permissions.

3f. No env-var override of YAML config

Viper's AutomaticEnv() is never called, so there's no way to override YAML config values via environment variables. This is an obstacle for 12-factor server deployments.


4. Design Options

Option A: Minimal fix — graceful fallback to defaults

Change: Make LoadConfig() return a DefaultConfig() when it can't read or write the config file, instead of returning nil.

func DefaultConfig() *Config {
    return &Config{
        DefaultFilter:      "status:pending",
        DefaultSort:        "due,priority",
        // ... all defaults ...
        UrgencyDue:         12.0,
        // ...
    }
}

func LoadConfig() (*Config, error) {
    if globalConfig != nil {
        return globalConfig, nil
    }

    cfg, err := loadFromFile()
    if err != nil {
        // File not available — use defaults (common in server mode)
        cfg = DefaultConfig()
    }

    globalConfig = cfg
    return cfg, nil
}

Pros: Smallest change, fixes the panic, no behavior change for CLI users. Cons: Doesn't address the architectural confusion. Error-swallowing callers remain. Write-on-read side effect remains for CLI. Three-source config stays.


Option B: Separate load paths for CLI and server

Change: Split LoadConfig() into two functions:

  • LoadConfigFromFile() — current behavior for CLI (read/write YAML)
  • LoadConfigFromDefaults() — returns DefaultConfig() for server

The server startup code calls LoadConfigFromDefaults() eagerly during init, populating the singleton before any lazy GetConfig() call.

// cmd/server.go — in serverStartCmd.Run, before InitDB()
engine.LoadConfigFromDefaults()

Pros: Fixes the panic. Server path never touches the filesystem for config. Explicit about which mode is running. CLI behavior unchanged. Cons: Two code paths to maintain. Server can't be customized without code changes (urgency tuning, etc.).


Change: Restructure LoadConfig() to:

  1. Start with hardcoded defaults (always succeeds)
  2. Layer YAML file on top if it exists (read-only, never create)
  3. Layer environment variables on top (via Viper AutomaticEnv)
  4. Never write to the filesystem as a side effect of loading
func LoadConfig() (*Config, error) {
    if globalConfig != nil {
        return globalConfig, nil
    }

    v := viper.New()
    v.SetConfigType("yaml")

    // 1. Hardcoded defaults (always present)
    setDefaults(v)

    // 2. YAML file overlay (optional, read-only)
    if configPath, err := GetConfigPath(); err == nil {
        v.SetConfigFile(configPath)
        if err := v.ReadInConfig(); err != nil {
            if _, ok := err.(viper.ConfigFileNotFoundError); !ok {
                // File exists but is malformed — report, don't clobber
                if _, statErr := os.Stat(configPath); statErr == nil {
                    return nil, fmt.Errorf("config file %s is invalid: %w", configPath, err)
                }
                // File doesn't exist — that's fine, use defaults
            }
        }
    }

    // 3. Environment variable overlay
    v.SetEnvPrefix("OPAL")
    v.AutomaticEnv()

    cfg := &Config{}
    if err := v.Unmarshal(cfg); err != nil {
        return nil, fmt.Errorf("failed to unmarshal config: %w", err)
    }

    globalConfig = cfg
    return cfg, nil
}

File creation moves to an explicit InitConfig() function, called only during opal setup and CLI first-run:

func InitConfig() error {
    configDir, err := GetConfigDir()
    if err != nil {
        return err
    }
    if err := os.MkdirAll(configDir, 0755); err != nil {
        return fmt.Errorf("failed to create config directory: %w", err)
    }
    return SaveConfig(DefaultConfig())
}

Environment variable mapping:

YAML key Env var
urgency_due_coefficient OPAL_URGENCY_DUE_COEFFICIENT
default_filter OPAL_DEFAULT_FILTER
week_start_day OPAL_WEEK_START_DAY
... OPAL_<UPPER_SNAKE>

Pros:

  • Fixes the panic — loading never fails fatally (missing file = defaults)
  • Malformed YAML is reported, not silently clobbered
  • Server can tune urgency via opal.env without deploying opal.yml
  • 12-factor compatible (env vars override everything)
  • CLI behavior unchanged (reads existing opal.yml + env overrides)
  • Single code path for both CLI and server
  • No filesystem write side effects during load

Cons:

  • SaveConfig() still needs the manual field-sync (separate improvement)
  • Slightly more Viper configuration

Option D: Full restructure with typed config sections

Change: Split Config into domain-specific sub-configs and unify the loading mechanism for all config (merge auth.Config into the same system).

type Config struct {
    Display  DisplayConfig  `mapstructure:"display"`
    Urgency  UrgencyConfig  `mapstructure:"urgency"`
    Sync     SyncConfig     `mapstructure:"sync"`
    Server   ServerConfig   `mapstructure:"server"`  // absorbs auth.Config
}

Pros: Clean separation. Single config system. Extensible. Cons: Large refactor. Breaks existing opal.yml format. auth.Config works fine as-is for its purpose. Over-engineered for the current problem.


5. Recommendation

Option C — read-only load with env-var overrides.

It fixes the immediate panic, eliminates the write-on-read footgun, enables server customization via environment variables, and does it with a focused change to one function. It doesn't over-engineer or restructure things that work.

Implementation plan

Step 1: Extract DefaultConfig() constructor

Create a DefaultConfig() function that returns a *Config with all defaults populated. This is the single source of truth for default values — used by LoadConfig(), InitConfig(), and SaveConfig().

Files: internal/engine/config.go

Step 2: Rewrite LoadConfig() to be read-only

Remove directory creation and file writing. Layer: defaults → YAML (if exists) → env vars. Return DefaultConfig() on missing file instead of nil. Return error only on malformed YAML (file exists but can't be parsed).

Files: internal/engine/config.go

Step 3: Extract InitConfig() for file creation

Move the "create config directory + write defaults" logic into InitConfig(). Call it from initializeApp() (CLI first-run path) and opal setup.

Files: internal/engine/config.go, cmd/root.go, cmd/setup.go

Step 4: Fix error-swallowing callers

Two options, from least to most invasive:

4a (recommended): Make GetConfig() never return nil. Since LoadConfig() now always returns a valid *Config (defaults on failure), GetConfig() also always returns non-nil. The cfg, _ := GetConfig() pattern becomes safe. The error return is kept for callers that want to distinguish "loaded from file" vs "using defaults" but nil is never returned.

4b (defensive): Also make BuildUrgencyCoefficients() accept nil and return default coefficients. Belt-and-suspenders.

Files: internal/engine/config.go, internal/engine/urgency.go

Step 5: Enable env-var overrides via Viper

Add v.SetEnvPrefix("OPAL") and v.AutomaticEnv() so that any config key can be overridden by OPAL_<KEY>. Update opal.env in deployment docs to show urgency tuning examples.

Files: internal/engine/config.go, docs/deployment.md

Step 6: Eliminate SaveConfig() field duplication

Replace the manual v.Set() calls with Viper's mapstructure round-trip or a reflect-based helper so new fields are automatically included.

func SaveConfig(cfg *Config) error {
    v := viper.New()
    v.SetConfigFile(configPath)
    v.SetConfigType("yaml")

    // Use mapstructure tags to populate viper from struct
    data, _ := mapstructure.Decode(cfg) // or manual marshal
    for k, val := range data {
        v.Set(k, val)
    }
    return v.WriteConfig()
}

Alternatively, marshal to YAML directly with yaml.Marshal and write the bytes, bypassing Viper for the write path entirely.

Files: internal/engine/config.go

Deployment changes

After this redesign, the Ansible role needs no changesopal.env is sufficient. To customize urgency coefficients on the server, add lines to opal.env:

# Server-side urgency tuning (optional)
OPAL_URGENCY_DUE_COEFFICIENT=12.0
OPAL_URGENCY_AGE_MAX=180

No opal.yml deployment needed. The server runs on defaults + env overrides.


6. Technical Decisions

ADR-1: Config loading must not write to the filesystem

  • Context: LoadConfig() creates directories and writes opal.yml as a side effect. This fails on read-only filesystems (server) and is surprising behavior for a "load" function.
  • Decision: LoadConfig() becomes pure read. File creation moves to InitConfig(), called explicitly during setup/first-run.
  • Alternatives: Deploy opal.yml via Ansible; make /etc/opal writable.
  • Consequences: CLI first-run code must call InitConfig() explicitly. IsFirstRun() check remains in initializeApp().

ADR-2: Missing config file returns defaults, not an error

  • Context: On the server, opal.yml doesn't exist and doesn't need to. The current code treats this as a fatal error.
  • Decision: Missing file = use defaults silently. Malformed file = return error (don't clobber). GetConfig() never returns nil.
  • Alternatives: Require opal.yml everywhere; use separate load functions per mode.
  • Consequences: Callers that discard errors (cfg, _ := GetConfig()) are now safe. The error return still exists for callers that need to distinguish "file loaded" from "using defaults".

ADR-3: Environment variables can override any config key

  • Context: The server is configured entirely via env vars (opal.env + systemd). Currently urgency coefficients can only be set via opal.yml.
  • Decision: Enable Viper's AutomaticEnv() with prefix OPAL_. Any YAML key foo_bar can be overridden by OPAL_FOO_BAR.
  • Alternatives: Add a server-specific config file; keep config values hardcoded for server.
  • Consequences: Layered config: defaults < YAML file < env vars. Aligns with 12-factor app principles. Deployment can tune behavior without deploying additional files.

ADR-4: Do not merge auth.Config into engine.Config

  • Context: auth.Config and engine.Config are completely independent systems. Merging them would create a single unified config.
  • Decision: Keep them separate. auth.Config loads from env vars, works correctly, and is server-only. No reason to change it.
  • Alternatives: Unified config struct with sections.
  • Consequences: Two config types remain. This is acceptable because they serve different purposes, have different lifecycles, and the current auth.Config has no bugs.