fix: prevent nil-panic on server and improve OAuth callback handling
Load config eagerly during server startup so sortByUrgency never hits a nil config. Add nil-guard in BuildUrgencyCoefficients as belt-and-suspenders defense. Fix OAuth callback to support both GET and POST, and resolve issuer URLs properly with path.Dir. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,482 @@
|
||||
# Config System Redesign
|
||||
|
||||
## 1. Problem Statement
|
||||
|
||||
`LoadConfig()` panics on the server because the config system assumes a writable
|
||||
config directory. On the server, `OPAL_CONFIG_DIR=/etc/opal` is read-only
|
||||
(systemd `ReadOnlyPaths`), `opal.yml` doesn't exist (Ansible only deploys
|
||||
`opal.env`), and the attempt to create it fails. The nil `*Config` propagates
|
||||
unchecked through `GetConfig()` callers to `BuildUrgencyCoefficients(nil)`,
|
||||
causing a nil-pointer panic.
|
||||
|
||||
### Panic chain
|
||||
|
||||
```
|
||||
sortByUrgency() report.go:426
|
||||
cfg, _ := GetConfig() ← error discarded, cfg = nil
|
||||
coeffs := BuildUrgencyCoefficients(cfg)
|
||||
return &UrgencyCoefficients{
|
||||
Due: cfg.UrgencyDue, ← nil dereference → PANIC
|
||||
}
|
||||
```
|
||||
|
||||
### Root cause
|
||||
|
||||
The config system was designed for the CLI (user-writable `~/.config/opal/`)
|
||||
and has three interacting issues:
|
||||
|
||||
1. **Write-on-read**: `LoadConfig()` creates directories and writes a default
|
||||
`opal.yml` as a side effect of *reading* config. This fails when the
|
||||
filesystem is read-only.
|
||||
2. **Error swallowing**: All internal callers use `cfg, _ := GetConfig()`,
|
||||
turning a load failure into a nil-pointer panic instead of a graceful error.
|
||||
3. **No mode awareness**: The same code path runs for CLI users (writable home
|
||||
dir, interactive, config file is useful) and for the server (read-only
|
||||
`/etc/opal`, headless, defaults are fine).
|
||||
|
||||
---
|
||||
|
||||
## 2. Current Architecture
|
||||
|
||||
```mermaid
|
||||
graph TD
|
||||
subgraph "Config Sources"
|
||||
ENV["opal.env<br/>(systemd EnvironmentFile)"]
|
||||
YML["opal.yml<br/>(Viper / YAML)"]
|
||||
DEFAULTS["Hardcoded defaults<br/>(in LoadConfig)"]
|
||||
end
|
||||
|
||||
subgraph "Loading Mechanisms"
|
||||
AUTH_LOAD["auth.LoadConfig()<br/>os.Getenv() → auth.Config"]
|
||||
ENGINE_LOAD["engine.LoadConfig()<br/>Viper → engine.Config"]
|
||||
end
|
||||
|
||||
subgraph "Consumers"
|
||||
SERVER["API Server<br/>(handlers, middleware)"]
|
||||
CLI["CLI Commands<br/>(cmd/ package)"]
|
||||
end
|
||||
|
||||
ENV --> AUTH_LOAD
|
||||
YML --> ENGINE_LOAD
|
||||
DEFAULTS --> ENGINE_LOAD
|
||||
|
||||
AUTH_LOAD --> SERVER
|
||||
ENGINE_LOAD --> SERVER
|
||||
ENGINE_LOAD --> CLI
|
||||
```
|
||||
|
||||
### Two independent config subsystems
|
||||
|
||||
| Aspect | `engine.Config` | `auth.Config` |
|
||||
|--------|----------------|---------------|
|
||||
| **Source** | `opal.yml` via Viper | Environment variables |
|
||||
| **Loaded by** | `engine.LoadConfig()` | `auth.LoadConfig()` |
|
||||
| **Caching** | Singleton (`globalConfig`) | None (re-read each call) |
|
||||
| **Write side effects** | Creates dir + file on load | None |
|
||||
| **Error model** | Returns `(*Config, error)` | Returns `*Config` (no error) |
|
||||
| **Used by** | CLI + Server (lazy) | Server only |
|
||||
|
||||
### `engine.Config` field categories
|
||||
|
||||
| Category | Fields | CLI | Server | Notes |
|
||||
|----------|--------|-----|--------|-------|
|
||||
| **Display** | `DefaultFilter`, `DefaultSort`, `DefaultReport`, `ColorOutput` | Yes | No | Terminal-only |
|
||||
| **Date** | `WeekStartDay`, `DefaultDueTime` | Yes | Yes | Used by `ParseDate()` |
|
||||
| **Urgency** | 13 urgency coefficients | Yes | Yes | Core scoring logic |
|
||||
| **Limits** | `NextLimit` | Yes | Indirectly | Report limit |
|
||||
| **Sync** | 6 sync fields | Yes | No | Client-side sync config |
|
||||
|
||||
Key observation: The server only needs **urgency coefficients** and **date
|
||||
settings** from `engine.Config`. It doesn't need display preferences or sync
|
||||
settings. But all 30 fields are loaded through the same mechanism.
|
||||
|
||||
### `engine.LoadConfig()` flow
|
||||
|
||||
```
|
||||
1. Check globalConfig singleton → return if cached
|
||||
2. GetConfigDir() → resolve directory path
|
||||
3. os.MkdirAll(configDir) ← FAILS on read-only FS
|
||||
4. Viper.ReadInConfig()
|
||||
5. If read fails:
|
||||
a. Viper.WriteConfigAs() ← FAILS on read-only FS
|
||||
b. Re-read written file
|
||||
6. Unmarshal → Config struct
|
||||
7. Cache in globalConfig
|
||||
```
|
||||
|
||||
Steps 3 and 5a are the failure points on the server.
|
||||
|
||||
### Caller error handling audit
|
||||
|
||||
| Caller | File:Line | Error handling |
|
||||
|--------|-----------|----------------|
|
||||
| `FormatTaskListWithFormat` | display.go:24 | `cfg, _ := GetConfig()` — **ignored** |
|
||||
| `formatMinimalLine` | display.go:119 | `cfg, _ := GetConfig()` — **ignored** |
|
||||
| `NextReport.SortFunc` | report.go:168 | `cfg, _ := GetConfig()` — **ignored** |
|
||||
| `NextReport.LimitFunc` | report.go:187 | `cfg, _ := GetConfig()` — **ignored** |
|
||||
| `sortByUrgency` | report.go:426 | `cfg, _ := GetConfig()` — **ignored** |
|
||||
| `PopulateUrgency` | task.go:769 | `cfg, _ := GetConfig()` — **ignored** |
|
||||
| `getWeekStart` | dateparse.go:484 | Checked — falls back to Monday |
|
||||
| All cmd/ callers | root.go, sync.go | Checked — exits on error |
|
||||
|
||||
Every engine-internal caller except `dateparse.go` discards the error.
|
||||
|
||||
---
|
||||
|
||||
## 3. Additional Issues Found
|
||||
|
||||
### 3a. Config file clobbering
|
||||
|
||||
`LoadConfig()` line 296-306 catches **any** `ReadInConfig` error (not just
|
||||
"file not found") and overwrites the config file with defaults. A YAML syntax
|
||||
error in the user's `opal.yml` silently destroys their customizations.
|
||||
|
||||
### 3b. `SaveConfig()` manual field sync
|
||||
|
||||
Adding a new config field requires updating three places:
|
||||
1. `Config` struct definition
|
||||
2. `LoadConfig()` — `v.SetDefault(...)` call
|
||||
3. `SaveConfig()` — `v.Set(...)` call
|
||||
|
||||
Missing any one of these causes silent data loss on save or missing defaults.
|
||||
|
||||
### 3c. `auth.LoadConfig()` silent parse failures
|
||||
|
||||
```go
|
||||
jwtExpiry, _ := strconv.Atoi(getEnv("JWT_EXPIRY", "3600"))
|
||||
```
|
||||
|
||||
If `JWT_EXPIRY=abc`, `Atoi` returns `(0, error)`, error is discarded, and
|
||||
`JWTExpiry` silently becomes 0 (tokens expire immediately).
|
||||
|
||||
### 3d. Insecure JWT default
|
||||
|
||||
`JWT_SECRET` defaults to `"change-me-in-production"`. While `validateServerConfig()`
|
||||
checks for empty values, it doesn't check for the insecure default.
|
||||
|
||||
### 3e. Sync API key in plaintext
|
||||
|
||||
`sync_api_key` is stored in `opal.yml` with no special file permissions.
|
||||
|
||||
### 3f. No env-var override of YAML config
|
||||
|
||||
Viper's `AutomaticEnv()` is never called, so there's no way to override YAML
|
||||
config values via environment variables. This is an obstacle for 12-factor
|
||||
server deployments.
|
||||
|
||||
---
|
||||
|
||||
## 4. Design Options
|
||||
|
||||
### Option A: Minimal fix — graceful fallback to defaults
|
||||
|
||||
**Change**: Make `LoadConfig()` return a `DefaultConfig()` when it can't read
|
||||
or write the config file, instead of returning nil.
|
||||
|
||||
```go
|
||||
func DefaultConfig() *Config {
|
||||
return &Config{
|
||||
DefaultFilter: "status:pending",
|
||||
DefaultSort: "due,priority",
|
||||
// ... all defaults ...
|
||||
UrgencyDue: 12.0,
|
||||
// ...
|
||||
}
|
||||
}
|
||||
|
||||
func LoadConfig() (*Config, error) {
|
||||
if globalConfig != nil {
|
||||
return globalConfig, nil
|
||||
}
|
||||
|
||||
cfg, err := loadFromFile()
|
||||
if err != nil {
|
||||
// File not available — use defaults (common in server mode)
|
||||
cfg = DefaultConfig()
|
||||
}
|
||||
|
||||
globalConfig = cfg
|
||||
return cfg, nil
|
||||
}
|
||||
```
|
||||
|
||||
**Pros**: Smallest change, fixes the panic, no behavior change for CLI users.
|
||||
**Cons**: Doesn't address the architectural confusion. Error-swallowing callers
|
||||
remain. Write-on-read side effect remains for CLI. Three-source config stays.
|
||||
|
||||
---
|
||||
|
||||
### Option B: Separate load paths for CLI and server
|
||||
|
||||
**Change**: Split `LoadConfig()` into two functions:
|
||||
- `LoadConfigFromFile()` — current behavior for CLI (read/write YAML)
|
||||
- `LoadConfigFromDefaults()` — returns `DefaultConfig()` for server
|
||||
|
||||
The server startup code calls `LoadConfigFromDefaults()` eagerly during init,
|
||||
populating the singleton before any lazy `GetConfig()` call.
|
||||
|
||||
```go
|
||||
// cmd/server.go — in serverStartCmd.Run, before InitDB()
|
||||
engine.LoadConfigFromDefaults()
|
||||
```
|
||||
|
||||
**Pros**: Fixes the panic. Server path never touches the filesystem for config.
|
||||
Explicit about which mode is running. CLI behavior unchanged.
|
||||
**Cons**: Two code paths to maintain. Server can't be customized without code
|
||||
changes (urgency tuning, etc.).
|
||||
|
||||
---
|
||||
|
||||
### Option C: Read-only load with env-var overrides (recommended)
|
||||
|
||||
**Change**: Restructure `LoadConfig()` to:
|
||||
1. Start with hardcoded defaults (always succeeds)
|
||||
2. Layer YAML file on top **if it exists** (read-only, never create)
|
||||
3. Layer environment variables on top (via Viper `AutomaticEnv`)
|
||||
4. Never write to the filesystem as a side effect of loading
|
||||
|
||||
```go
|
||||
func LoadConfig() (*Config, error) {
|
||||
if globalConfig != nil {
|
||||
return globalConfig, nil
|
||||
}
|
||||
|
||||
v := viper.New()
|
||||
v.SetConfigType("yaml")
|
||||
|
||||
// 1. Hardcoded defaults (always present)
|
||||
setDefaults(v)
|
||||
|
||||
// 2. YAML file overlay (optional, read-only)
|
||||
if configPath, err := GetConfigPath(); err == nil {
|
||||
v.SetConfigFile(configPath)
|
||||
if err := v.ReadInConfig(); err != nil {
|
||||
if _, ok := err.(viper.ConfigFileNotFoundError); !ok {
|
||||
// File exists but is malformed — report, don't clobber
|
||||
if _, statErr := os.Stat(configPath); statErr == nil {
|
||||
return nil, fmt.Errorf("config file %s is invalid: %w", configPath, err)
|
||||
}
|
||||
// File doesn't exist — that's fine, use defaults
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// 3. Environment variable overlay
|
||||
v.SetEnvPrefix("OPAL")
|
||||
v.AutomaticEnv()
|
||||
|
||||
cfg := &Config{}
|
||||
if err := v.Unmarshal(cfg); err != nil {
|
||||
return nil, fmt.Errorf("failed to unmarshal config: %w", err)
|
||||
}
|
||||
|
||||
globalConfig = cfg
|
||||
return cfg, nil
|
||||
}
|
||||
```
|
||||
|
||||
File creation moves to an explicit `InitConfig()` function, called only during
|
||||
`opal setup` and CLI first-run:
|
||||
|
||||
```go
|
||||
func InitConfig() error {
|
||||
configDir, err := GetConfigDir()
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
if err := os.MkdirAll(configDir, 0755); err != nil {
|
||||
return fmt.Errorf("failed to create config directory: %w", err)
|
||||
}
|
||||
return SaveConfig(DefaultConfig())
|
||||
}
|
||||
```
|
||||
|
||||
Environment variable mapping:
|
||||
|
||||
| YAML key | Env var |
|
||||
|----------|---------|
|
||||
| `urgency_due_coefficient` | `OPAL_URGENCY_DUE_COEFFICIENT` |
|
||||
| `default_filter` | `OPAL_DEFAULT_FILTER` |
|
||||
| `week_start_day` | `OPAL_WEEK_START_DAY` |
|
||||
| ... | `OPAL_<UPPER_SNAKE>` |
|
||||
|
||||
**Pros**:
|
||||
- Fixes the panic — loading never fails fatally (missing file = defaults)
|
||||
- Malformed YAML is reported, not silently clobbered
|
||||
- Server can tune urgency via `opal.env` without deploying `opal.yml`
|
||||
- 12-factor compatible (env vars override everything)
|
||||
- CLI behavior unchanged (reads existing `opal.yml` + env overrides)
|
||||
- Single code path for both CLI and server
|
||||
- No filesystem write side effects during load
|
||||
|
||||
**Cons**:
|
||||
- `SaveConfig()` still needs the manual field-sync (separate improvement)
|
||||
- Slightly more Viper configuration
|
||||
|
||||
---
|
||||
|
||||
### Option D: Full restructure with typed config sections
|
||||
|
||||
**Change**: Split `Config` into domain-specific sub-configs and unify the
|
||||
loading mechanism for all config (merge `auth.Config` into the same system).
|
||||
|
||||
```go
|
||||
type Config struct {
|
||||
Display DisplayConfig `mapstructure:"display"`
|
||||
Urgency UrgencyConfig `mapstructure:"urgency"`
|
||||
Sync SyncConfig `mapstructure:"sync"`
|
||||
Server ServerConfig `mapstructure:"server"` // absorbs auth.Config
|
||||
}
|
||||
```
|
||||
|
||||
**Pros**: Clean separation. Single config system. Extensible.
|
||||
**Cons**: Large refactor. Breaks existing `opal.yml` format. `auth.Config`
|
||||
works fine as-is for its purpose. Over-engineered for the current problem.
|
||||
|
||||
---
|
||||
|
||||
## 5. Recommendation
|
||||
|
||||
**Option C** — read-only load with env-var overrides.
|
||||
|
||||
It fixes the immediate panic, eliminates the write-on-read footgun, enables
|
||||
server customization via environment variables, and does it with a focused
|
||||
change to one function. It doesn't over-engineer or restructure things that
|
||||
work.
|
||||
|
||||
### Implementation plan
|
||||
|
||||
#### Step 1: Extract `DefaultConfig()` constructor
|
||||
|
||||
Create a `DefaultConfig()` function that returns a `*Config` with all defaults
|
||||
populated. This is the single source of truth for default values — used by
|
||||
`LoadConfig()`, `InitConfig()`, and `SaveConfig()`.
|
||||
|
||||
**Files**: `internal/engine/config.go`
|
||||
|
||||
#### Step 2: Rewrite `LoadConfig()` to be read-only
|
||||
|
||||
Remove directory creation and file writing. Layer: defaults → YAML (if
|
||||
exists) → env vars. Return `DefaultConfig()` on missing file instead of nil.
|
||||
Return error only on malformed YAML (file exists but can't be parsed).
|
||||
|
||||
**Files**: `internal/engine/config.go`
|
||||
|
||||
#### Step 3: Extract `InitConfig()` for file creation
|
||||
|
||||
Move the "create config directory + write defaults" logic into `InitConfig()`.
|
||||
Call it from `initializeApp()` (CLI first-run path) and `opal setup`.
|
||||
|
||||
**Files**: `internal/engine/config.go`, `cmd/root.go`, `cmd/setup.go`
|
||||
|
||||
#### Step 4: Fix error-swallowing callers
|
||||
|
||||
Two options, from least to most invasive:
|
||||
|
||||
**4a (recommended)**: Make `GetConfig()` never return nil. Since `LoadConfig()`
|
||||
now always returns a valid `*Config` (defaults on failure), `GetConfig()` also
|
||||
always returns non-nil. The `cfg, _ := GetConfig()` pattern becomes safe. The
|
||||
error return is kept for callers that want to distinguish "loaded from file" vs
|
||||
"using defaults" but nil is never returned.
|
||||
|
||||
**4b (defensive)**: Also make `BuildUrgencyCoefficients()` accept nil and return
|
||||
default coefficients. Belt-and-suspenders.
|
||||
|
||||
**Files**: `internal/engine/config.go`, `internal/engine/urgency.go`
|
||||
|
||||
#### Step 5: Enable env-var overrides via Viper
|
||||
|
||||
Add `v.SetEnvPrefix("OPAL")` and `v.AutomaticEnv()` so that any config key
|
||||
can be overridden by `OPAL_<KEY>`. Update `opal.env` in deployment docs to
|
||||
show urgency tuning examples.
|
||||
|
||||
**Files**: `internal/engine/config.go`, `docs/deployment.md`
|
||||
|
||||
#### Step 6: Eliminate `SaveConfig()` field duplication
|
||||
|
||||
Replace the manual `v.Set()` calls with Viper's `mapstructure` round-trip or
|
||||
a reflect-based helper so new fields are automatically included.
|
||||
|
||||
```go
|
||||
func SaveConfig(cfg *Config) error {
|
||||
v := viper.New()
|
||||
v.SetConfigFile(configPath)
|
||||
v.SetConfigType("yaml")
|
||||
|
||||
// Use mapstructure tags to populate viper from struct
|
||||
data, _ := mapstructure.Decode(cfg) // or manual marshal
|
||||
for k, val := range data {
|
||||
v.Set(k, val)
|
||||
}
|
||||
return v.WriteConfig()
|
||||
}
|
||||
```
|
||||
|
||||
Alternatively, marshal to YAML directly with `yaml.Marshal` and write the
|
||||
bytes, bypassing Viper for the write path entirely.
|
||||
|
||||
**Files**: `internal/engine/config.go`
|
||||
|
||||
### Deployment changes
|
||||
|
||||
After this redesign, the Ansible role needs **no changes** — `opal.env` is
|
||||
sufficient. To customize urgency coefficients on the server, add lines to
|
||||
`opal.env`:
|
||||
|
||||
```bash
|
||||
# Server-side urgency tuning (optional)
|
||||
OPAL_URGENCY_DUE_COEFFICIENT=12.0
|
||||
OPAL_URGENCY_AGE_MAX=180
|
||||
```
|
||||
|
||||
No `opal.yml` deployment needed. The server runs on defaults + env overrides.
|
||||
|
||||
---
|
||||
|
||||
## 6. Technical Decisions
|
||||
|
||||
### ADR-1: Config loading must not write to the filesystem
|
||||
|
||||
- **Context**: `LoadConfig()` creates directories and writes `opal.yml` as a
|
||||
side effect. This fails on read-only filesystems (server) and is surprising
|
||||
behavior for a "load" function.
|
||||
- **Decision**: `LoadConfig()` becomes pure read. File creation moves to
|
||||
`InitConfig()`, called explicitly during setup/first-run.
|
||||
- **Alternatives**: Deploy `opal.yml` via Ansible; make `/etc/opal` writable.
|
||||
- **Consequences**: CLI first-run code must call `InitConfig()` explicitly.
|
||||
`IsFirstRun()` check remains in `initializeApp()`.
|
||||
|
||||
### ADR-2: Missing config file returns defaults, not an error
|
||||
|
||||
- **Context**: On the server, `opal.yml` doesn't exist and doesn't need to.
|
||||
The current code treats this as a fatal error.
|
||||
- **Decision**: Missing file = use defaults silently. Malformed file = return
|
||||
error (don't clobber). `GetConfig()` never returns nil.
|
||||
- **Alternatives**: Require `opal.yml` everywhere; use separate load functions
|
||||
per mode.
|
||||
- **Consequences**: Callers that discard errors (`cfg, _ := GetConfig()`) are
|
||||
now safe. The error return still exists for callers that need to distinguish
|
||||
"file loaded" from "using defaults".
|
||||
|
||||
### ADR-3: Environment variables can override any config key
|
||||
|
||||
- **Context**: The server is configured entirely via env vars (`opal.env` +
|
||||
systemd). Currently urgency coefficients can only be set via `opal.yml`.
|
||||
- **Decision**: Enable Viper's `AutomaticEnv()` with prefix `OPAL_`. Any YAML
|
||||
key `foo_bar` can be overridden by `OPAL_FOO_BAR`.
|
||||
- **Alternatives**: Add a server-specific config file; keep config values
|
||||
hardcoded for server.
|
||||
- **Consequences**: Layered config: defaults < YAML file < env vars. Aligns
|
||||
with 12-factor app principles. Deployment can tune behavior without deploying
|
||||
additional files.
|
||||
|
||||
### ADR-4: Do not merge auth.Config into engine.Config
|
||||
|
||||
- **Context**: `auth.Config` and `engine.Config` are completely independent
|
||||
systems. Merging them would create a single unified config.
|
||||
- **Decision**: Keep them separate. `auth.Config` loads from env vars, works
|
||||
correctly, and is server-only. No reason to change it.
|
||||
- **Alternatives**: Unified config struct with sections.
|
||||
- **Consequences**: Two config types remain. This is acceptable because they
|
||||
serve different purposes, have different lifecycles, and the current
|
||||
`auth.Config` has no bugs.
|
||||
Reference in New Issue
Block a user