fix: prevent nil-panic on server and improve OAuth callback handling

Load config eagerly during server startup so sortByUrgency never
hits a nil config. Add nil-guard in BuildUrgencyCoefficients as
belt-and-suspenders defense. Fix OAuth callback to support both
GET and POST, and resolve issuer URLs properly with path.Dir.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
2026-02-17 16:40:53 +01:00
parent c5a963bfd9
commit 80ea17227d
7 changed files with 526 additions and 59 deletions
+482
View File
@@ -0,0 +1,482 @@
# Config System Redesign
## 1. Problem Statement
`LoadConfig()` panics on the server because the config system assumes a writable
config directory. On the server, `OPAL_CONFIG_DIR=/etc/opal` is read-only
(systemd `ReadOnlyPaths`), `opal.yml` doesn't exist (Ansible only deploys
`opal.env`), and the attempt to create it fails. The nil `*Config` propagates
unchecked through `GetConfig()` callers to `BuildUrgencyCoefficients(nil)`,
causing a nil-pointer panic.
### Panic chain
```
sortByUrgency() report.go:426
cfg, _ := GetConfig() ← error discarded, cfg = nil
coeffs := BuildUrgencyCoefficients(cfg)
return &UrgencyCoefficients{
Due: cfg.UrgencyDue, ← nil dereference → PANIC
}
```
### Root cause
The config system was designed for the CLI (user-writable `~/.config/opal/`)
and has three interacting issues:
1. **Write-on-read**: `LoadConfig()` creates directories and writes a default
`opal.yml` as a side effect of *reading* config. This fails when the
filesystem is read-only.
2. **Error swallowing**: All internal callers use `cfg, _ := GetConfig()`,
turning a load failure into a nil-pointer panic instead of a graceful error.
3. **No mode awareness**: The same code path runs for CLI users (writable home
dir, interactive, config file is useful) and for the server (read-only
`/etc/opal`, headless, defaults are fine).
---
## 2. Current Architecture
```mermaid
graph TD
subgraph "Config Sources"
ENV["opal.env<br/>(systemd EnvironmentFile)"]
YML["opal.yml<br/>(Viper / YAML)"]
DEFAULTS["Hardcoded defaults<br/>(in LoadConfig)"]
end
subgraph "Loading Mechanisms"
AUTH_LOAD["auth.LoadConfig()<br/>os.Getenv() → auth.Config"]
ENGINE_LOAD["engine.LoadConfig()<br/>Viper → engine.Config"]
end
subgraph "Consumers"
SERVER["API Server<br/>(handlers, middleware)"]
CLI["CLI Commands<br/>(cmd/ package)"]
end
ENV --> AUTH_LOAD
YML --> ENGINE_LOAD
DEFAULTS --> ENGINE_LOAD
AUTH_LOAD --> SERVER
ENGINE_LOAD --> SERVER
ENGINE_LOAD --> CLI
```
### Two independent config subsystems
| Aspect | `engine.Config` | `auth.Config` |
|--------|----------------|---------------|
| **Source** | `opal.yml` via Viper | Environment variables |
| **Loaded by** | `engine.LoadConfig()` | `auth.LoadConfig()` |
| **Caching** | Singleton (`globalConfig`) | None (re-read each call) |
| **Write side effects** | Creates dir + file on load | None |
| **Error model** | Returns `(*Config, error)` | Returns `*Config` (no error) |
| **Used by** | CLI + Server (lazy) | Server only |
### `engine.Config` field categories
| Category | Fields | CLI | Server | Notes |
|----------|--------|-----|--------|-------|
| **Display** | `DefaultFilter`, `DefaultSort`, `DefaultReport`, `ColorOutput` | Yes | No | Terminal-only |
| **Date** | `WeekStartDay`, `DefaultDueTime` | Yes | Yes | Used by `ParseDate()` |
| **Urgency** | 13 urgency coefficients | Yes | Yes | Core scoring logic |
| **Limits** | `NextLimit` | Yes | Indirectly | Report limit |
| **Sync** | 6 sync fields | Yes | No | Client-side sync config |
Key observation: The server only needs **urgency coefficients** and **date
settings** from `engine.Config`. It doesn't need display preferences or sync
settings. But all 30 fields are loaded through the same mechanism.
### `engine.LoadConfig()` flow
```
1. Check globalConfig singleton → return if cached
2. GetConfigDir() → resolve directory path
3. os.MkdirAll(configDir) ← FAILS on read-only FS
4. Viper.ReadInConfig()
5. If read fails:
a. Viper.WriteConfigAs() ← FAILS on read-only FS
b. Re-read written file
6. Unmarshal → Config struct
7. Cache in globalConfig
```
Steps 3 and 5a are the failure points on the server.
### Caller error handling audit
| Caller | File:Line | Error handling |
|--------|-----------|----------------|
| `FormatTaskListWithFormat` | display.go:24 | `cfg, _ := GetConfig()`**ignored** |
| `formatMinimalLine` | display.go:119 | `cfg, _ := GetConfig()`**ignored** |
| `NextReport.SortFunc` | report.go:168 | `cfg, _ := GetConfig()`**ignored** |
| `NextReport.LimitFunc` | report.go:187 | `cfg, _ := GetConfig()`**ignored** |
| `sortByUrgency` | report.go:426 | `cfg, _ := GetConfig()`**ignored** |
| `PopulateUrgency` | task.go:769 | `cfg, _ := GetConfig()`**ignored** |
| `getWeekStart` | dateparse.go:484 | Checked — falls back to Monday |
| All cmd/ callers | root.go, sync.go | Checked — exits on error |
Every engine-internal caller except `dateparse.go` discards the error.
---
## 3. Additional Issues Found
### 3a. Config file clobbering
`LoadConfig()` line 296-306 catches **any** `ReadInConfig` error (not just
"file not found") and overwrites the config file with defaults. A YAML syntax
error in the user's `opal.yml` silently destroys their customizations.
### 3b. `SaveConfig()` manual field sync
Adding a new config field requires updating three places:
1. `Config` struct definition
2. `LoadConfig()``v.SetDefault(...)` call
3. `SaveConfig()``v.Set(...)` call
Missing any one of these causes silent data loss on save or missing defaults.
### 3c. `auth.LoadConfig()` silent parse failures
```go
jwtExpiry, _ := strconv.Atoi(getEnv("JWT_EXPIRY", "3600"))
```
If `JWT_EXPIRY=abc`, `Atoi` returns `(0, error)`, error is discarded, and
`JWTExpiry` silently becomes 0 (tokens expire immediately).
### 3d. Insecure JWT default
`JWT_SECRET` defaults to `"change-me-in-production"`. While `validateServerConfig()`
checks for empty values, it doesn't check for the insecure default.
### 3e. Sync API key in plaintext
`sync_api_key` is stored in `opal.yml` with no special file permissions.
### 3f. No env-var override of YAML config
Viper's `AutomaticEnv()` is never called, so there's no way to override YAML
config values via environment variables. This is an obstacle for 12-factor
server deployments.
---
## 4. Design Options
### Option A: Minimal fix — graceful fallback to defaults
**Change**: Make `LoadConfig()` return a `DefaultConfig()` when it can't read
or write the config file, instead of returning nil.
```go
func DefaultConfig() *Config {
return &Config{
DefaultFilter: "status:pending",
DefaultSort: "due,priority",
// ... all defaults ...
UrgencyDue: 12.0,
// ...
}
}
func LoadConfig() (*Config, error) {
if globalConfig != nil {
return globalConfig, nil
}
cfg, err := loadFromFile()
if err != nil {
// File not available — use defaults (common in server mode)
cfg = DefaultConfig()
}
globalConfig = cfg
return cfg, nil
}
```
**Pros**: Smallest change, fixes the panic, no behavior change for CLI users.
**Cons**: Doesn't address the architectural confusion. Error-swallowing callers
remain. Write-on-read side effect remains for CLI. Three-source config stays.
---
### Option B: Separate load paths for CLI and server
**Change**: Split `LoadConfig()` into two functions:
- `LoadConfigFromFile()` — current behavior for CLI (read/write YAML)
- `LoadConfigFromDefaults()` — returns `DefaultConfig()` for server
The server startup code calls `LoadConfigFromDefaults()` eagerly during init,
populating the singleton before any lazy `GetConfig()` call.
```go
// cmd/server.go — in serverStartCmd.Run, before InitDB()
engine.LoadConfigFromDefaults()
```
**Pros**: Fixes the panic. Server path never touches the filesystem for config.
Explicit about which mode is running. CLI behavior unchanged.
**Cons**: Two code paths to maintain. Server can't be customized without code
changes (urgency tuning, etc.).
---
### Option C: Read-only load with env-var overrides (recommended)
**Change**: Restructure `LoadConfig()` to:
1. Start with hardcoded defaults (always succeeds)
2. Layer YAML file on top **if it exists** (read-only, never create)
3. Layer environment variables on top (via Viper `AutomaticEnv`)
4. Never write to the filesystem as a side effect of loading
```go
func LoadConfig() (*Config, error) {
if globalConfig != nil {
return globalConfig, nil
}
v := viper.New()
v.SetConfigType("yaml")
// 1. Hardcoded defaults (always present)
setDefaults(v)
// 2. YAML file overlay (optional, read-only)
if configPath, err := GetConfigPath(); err == nil {
v.SetConfigFile(configPath)
if err := v.ReadInConfig(); err != nil {
if _, ok := err.(viper.ConfigFileNotFoundError); !ok {
// File exists but is malformed — report, don't clobber
if _, statErr := os.Stat(configPath); statErr == nil {
return nil, fmt.Errorf("config file %s is invalid: %w", configPath, err)
}
// File doesn't exist — that's fine, use defaults
}
}
}
// 3. Environment variable overlay
v.SetEnvPrefix("OPAL")
v.AutomaticEnv()
cfg := &Config{}
if err := v.Unmarshal(cfg); err != nil {
return nil, fmt.Errorf("failed to unmarshal config: %w", err)
}
globalConfig = cfg
return cfg, nil
}
```
File creation moves to an explicit `InitConfig()` function, called only during
`opal setup` and CLI first-run:
```go
func InitConfig() error {
configDir, err := GetConfigDir()
if err != nil {
return err
}
if err := os.MkdirAll(configDir, 0755); err != nil {
return fmt.Errorf("failed to create config directory: %w", err)
}
return SaveConfig(DefaultConfig())
}
```
Environment variable mapping:
| YAML key | Env var |
|----------|---------|
| `urgency_due_coefficient` | `OPAL_URGENCY_DUE_COEFFICIENT` |
| `default_filter` | `OPAL_DEFAULT_FILTER` |
| `week_start_day` | `OPAL_WEEK_START_DAY` |
| ... | `OPAL_<UPPER_SNAKE>` |
**Pros**:
- Fixes the panic — loading never fails fatally (missing file = defaults)
- Malformed YAML is reported, not silently clobbered
- Server can tune urgency via `opal.env` without deploying `opal.yml`
- 12-factor compatible (env vars override everything)
- CLI behavior unchanged (reads existing `opal.yml` + env overrides)
- Single code path for both CLI and server
- No filesystem write side effects during load
**Cons**:
- `SaveConfig()` still needs the manual field-sync (separate improvement)
- Slightly more Viper configuration
---
### Option D: Full restructure with typed config sections
**Change**: Split `Config` into domain-specific sub-configs and unify the
loading mechanism for all config (merge `auth.Config` into the same system).
```go
type Config struct {
Display DisplayConfig `mapstructure:"display"`
Urgency UrgencyConfig `mapstructure:"urgency"`
Sync SyncConfig `mapstructure:"sync"`
Server ServerConfig `mapstructure:"server"` // absorbs auth.Config
}
```
**Pros**: Clean separation. Single config system. Extensible.
**Cons**: Large refactor. Breaks existing `opal.yml` format. `auth.Config`
works fine as-is for its purpose. Over-engineered for the current problem.
---
## 5. Recommendation
**Option C** — read-only load with env-var overrides.
It fixes the immediate panic, eliminates the write-on-read footgun, enables
server customization via environment variables, and does it with a focused
change to one function. It doesn't over-engineer or restructure things that
work.
### Implementation plan
#### Step 1: Extract `DefaultConfig()` constructor
Create a `DefaultConfig()` function that returns a `*Config` with all defaults
populated. This is the single source of truth for default values — used by
`LoadConfig()`, `InitConfig()`, and `SaveConfig()`.
**Files**: `internal/engine/config.go`
#### Step 2: Rewrite `LoadConfig()` to be read-only
Remove directory creation and file writing. Layer: defaults → YAML (if
exists) → env vars. Return `DefaultConfig()` on missing file instead of nil.
Return error only on malformed YAML (file exists but can't be parsed).
**Files**: `internal/engine/config.go`
#### Step 3: Extract `InitConfig()` for file creation
Move the "create config directory + write defaults" logic into `InitConfig()`.
Call it from `initializeApp()` (CLI first-run path) and `opal setup`.
**Files**: `internal/engine/config.go`, `cmd/root.go`, `cmd/setup.go`
#### Step 4: Fix error-swallowing callers
Two options, from least to most invasive:
**4a (recommended)**: Make `GetConfig()` never return nil. Since `LoadConfig()`
now always returns a valid `*Config` (defaults on failure), `GetConfig()` also
always returns non-nil. The `cfg, _ := GetConfig()` pattern becomes safe. The
error return is kept for callers that want to distinguish "loaded from file" vs
"using defaults" but nil is never returned.
**4b (defensive)**: Also make `BuildUrgencyCoefficients()` accept nil and return
default coefficients. Belt-and-suspenders.
**Files**: `internal/engine/config.go`, `internal/engine/urgency.go`
#### Step 5: Enable env-var overrides via Viper
Add `v.SetEnvPrefix("OPAL")` and `v.AutomaticEnv()` so that any config key
can be overridden by `OPAL_<KEY>`. Update `opal.env` in deployment docs to
show urgency tuning examples.
**Files**: `internal/engine/config.go`, `docs/deployment.md`
#### Step 6: Eliminate `SaveConfig()` field duplication
Replace the manual `v.Set()` calls with Viper's `mapstructure` round-trip or
a reflect-based helper so new fields are automatically included.
```go
func SaveConfig(cfg *Config) error {
v := viper.New()
v.SetConfigFile(configPath)
v.SetConfigType("yaml")
// Use mapstructure tags to populate viper from struct
data, _ := mapstructure.Decode(cfg) // or manual marshal
for k, val := range data {
v.Set(k, val)
}
return v.WriteConfig()
}
```
Alternatively, marshal to YAML directly with `yaml.Marshal` and write the
bytes, bypassing Viper for the write path entirely.
**Files**: `internal/engine/config.go`
### Deployment changes
After this redesign, the Ansible role needs **no changes**`opal.env` is
sufficient. To customize urgency coefficients on the server, add lines to
`opal.env`:
```bash
# Server-side urgency tuning (optional)
OPAL_URGENCY_DUE_COEFFICIENT=12.0
OPAL_URGENCY_AGE_MAX=180
```
No `opal.yml` deployment needed. The server runs on defaults + env overrides.
---
## 6. Technical Decisions
### ADR-1: Config loading must not write to the filesystem
- **Context**: `LoadConfig()` creates directories and writes `opal.yml` as a
side effect. This fails on read-only filesystems (server) and is surprising
behavior for a "load" function.
- **Decision**: `LoadConfig()` becomes pure read. File creation moves to
`InitConfig()`, called explicitly during setup/first-run.
- **Alternatives**: Deploy `opal.yml` via Ansible; make `/etc/opal` writable.
- **Consequences**: CLI first-run code must call `InitConfig()` explicitly.
`IsFirstRun()` check remains in `initializeApp()`.
### ADR-2: Missing config file returns defaults, not an error
- **Context**: On the server, `opal.yml` doesn't exist and doesn't need to.
The current code treats this as a fatal error.
- **Decision**: Missing file = use defaults silently. Malformed file = return
error (don't clobber). `GetConfig()` never returns nil.
- **Alternatives**: Require `opal.yml` everywhere; use separate load functions
per mode.
- **Consequences**: Callers that discard errors (`cfg, _ := GetConfig()`) are
now safe. The error return still exists for callers that need to distinguish
"file loaded" from "using defaults".
### ADR-3: Environment variables can override any config key
- **Context**: The server is configured entirely via env vars (`opal.env` +
systemd). Currently urgency coefficients can only be set via `opal.yml`.
- **Decision**: Enable Viper's `AutomaticEnv()` with prefix `OPAL_`. Any YAML
key `foo_bar` can be overridden by `OPAL_FOO_BAR`.
- **Alternatives**: Add a server-specific config file; keep config values
hardcoded for server.
- **Consequences**: Layered config: defaults < YAML file < env vars. Aligns
with 12-factor app principles. Deployment can tune behavior without deploying
additional files.
### ADR-4: Do not merge auth.Config into engine.Config
- **Context**: `auth.Config` and `engine.Config` are completely independent
systems. Merging them would create a single unified config.
- **Decision**: Keep them separate. `auth.Config` loads from env vars, works
correctly, and is server-only. No reason to change it.
- **Alternatives**: Unified config struct with sections.
- **Consequences**: Two config types remain. This is acceptable because they
serve different purposes, have different lifecycles, and the current
`auth.Config` has no bugs.