Files
ClaudeDo/docs/superpowers/specs/2026-06-04-debug-logging-traceability-design.md
2026-06-04 18:20:37 +02:00

91 lines
6.3 KiB
Markdown

# Debug Logging & Frontend↔Backend Traceability — Design
**Date:** 2026-06-04
**Status:** Approved (pending spec review)
## Goal
Make debug logging rich enough to diagnose problems across the UI↔Worker boundary, while keeping the installed (production) build near-silent. Verbosity is decided by **build configuration, detected at runtime** — no runtime knob, no config field, no `#if DEBUG`:
- **Debug build** (Rider run button) → verbose, console + file.
- **Release build** (installed app) → minimal, file only.
## Decisions (from brainstorming)
1. **Mechanism:** runtime build-config detection via the entry assembly's `DebuggableAttribute` (JIT optimizer disabled ⇒ Debug build). A single `BuildConfig.IsDebug` helper drives ordinary `if` branching — no `#if DEBUG` directives. Rider's run button builds `Debug`; the installer ships `-c Release`.
2. **Scope:** Worker **and** App/Ui. The desktop side currently has no log sink at all — UI/IPC failures vanish today.
3. **Release behavior:** all three log `Warning`+ to file (not silent — capture crashes). Worker drops from its current `Information` to `Warning`.
4. **One shared log file** across both processes, unified timeline.
5. **Correlation:** TaskId-based (option A). Enrich log lines with `TaskId` when one is in scope. No changes to the SignalR contract (`IWorkerClient`/`WorkerHub` untouched → test fakes untouched).
## Verbosity matrix
| Process | Debug build | Release build |
|---|---|---|
| Worker | `Debug` level, console + shared file | `Warning` level, shared file |
| App/Ui | `Debug` level, console + shared file | `Warning` level, shared file |
## Shared log file
- Single daily-rolling file: `~/.todo-app/logs/claudedo-.log` (Serilog appends the date).
- `shared: true` on both processes' file sinks → Serilog coordinates multi-process writes via a global mutex.
- `retainedFileCountLimit: 2`.
- Each line is tagged with a `Process` property (`"worker"` / `"app"`) so the two sides are distinguishable in the interleaved timeline.
> The existing `worker-.log` is replaced by `claudedo-.log`. Task-run NDJSON (`{taskId}_run{n}.ndjson`) and `daily-prep.log` are **out of scope** — they are data streams, not diagnostic logs, and stay exactly as they are.
## Output template
```
[{Timestamp:HH:mm:ss.fff} {Level:u3}] {Process}/{SourceContext} [{TaskId}] {Message:lj}{NewLine}{Exception}
```
- `{Process}``worker` or `app`.
- `{SourceContext}` — the `ILogger<T>` category (the logging class), so you see *which* component spoke.
- `{TaskId}` — the correlation key, defaulted to `-` when no task is in scope (see enricher below).
## Traceability (TaskId correlation)
Use Serilog's `LogContext` (`.Enrich.FromLogContext()` on both processes) plus a small default enricher so `TaskId` is always present (renders `-` when absent — avoids the raw `{TaskId}` token leaking into output).
Push the property at the entry points where a task is in scope; all nested `ILogger<T>` calls inherit it automatically:
- **Worker:** wrap per-task execution in `TaskRunner` (the run/continue entry) with `using (LogContext.PushProperty("TaskId", task.Id))`. This covers the bulk of backend activity (runner, state transitions, worktree, planning) for free.
- **App/Ui:** push `TaskId` in `WorkerClient` task-targeted calls (e.g. RunNow / Cancel / Continue / review actions) so the UI side of a task action carries the same key.
Result: grep one `TaskId` in `claudedo-.log` and read the full UI→Worker→UI story in timestamp order.
This adds **no parameters** to the SignalR surface — correlation rides on the existing `taskId` arguments already present in those calls.
## Implementation surface
A single shared helper keeps the two processes' Serilog setup from drifting.
- **New project:** `ClaudeDo.Logging` — a small library both `ClaudeDo.App` and `ClaudeDo.Worker` reference (keeps `ClaudeDo.Data` free of any Serilog dependency). Contains:
- `BuildConfig.IsDebug` — checks the entry assembly's `DebuggableAttribute` (`IsJITOptimizerDisabled` ⇒ Debug build). Cached static.
- The output template and the default-TaskId enricher.
- `ConfigureLogger(LoggerConfiguration, processTag, logRoot)` — applies level/sink choices by branching on `BuildConfig.IsDebug` (Debug ⇒ `Debug` level + console + file; Release ⇒ `Warning` level + file only). Both processes call it so level/template/retention stay in sync.
- **Worker `Program.cs:34`:** replace the inline `UseSerilog` body with a call into the shared helper (`processTag = "worker"`).
- **App `Program.cs`:** add Serilog packages; build a logger via the shared helper (`Process = "app"`) and register it with `sc.AddLogging(b => b.AddSerilog(logger, dispose: true))`. App currently registers **no** logging at all, so this also makes `ILogger<T>` injection actually work UI-side. Remove/keep `.LogToTrace()` as appropriate (Avalonia internal trace, separate concern — leave it).
- **App shutdown:** flush/close the logger (`Log.CloseAndFlush()` or dispose via the container's existing `finally`).
### Packages to add (App project)
- `Serilog.Extensions.Logging` (bridge `ILogger` → Serilog)
- `Serilog.Sinks.File`
- `Serilog.Sinks.Console`
- (Worker already has Serilog + File sink; add `Serilog.Sinks.Console` for the Debug console output.)
## Testing
- This is logging wiring; per project policy, no tests that spawn the real Claude CLI and no heavy test scaffolding for log output.
- Light verification: a unit-level check that the default enricher yields `-` when no `TaskId` is pushed, and (if practical) that `ConfigureLogger` wires the expected sinks. `BuildConfig.IsDebug` reflects the test assembly's own build config, so it can't be flipped within one run — assert each branch by passing the level/flag explicitly rather than relying on the ambient value, or verify the Release path and smoke-test Debug manually from Rider.
- Manual smoke test (documented, not automated): run from Rider, confirm console + `claudedo-.log` show `Debug` lines with `Process`/`SourceContext`; run a task and confirm both `app` and `worker` lines share the same `[TaskId]`.
## Out of scope
- Runtime/config log-level knob.
- Per-call correlation IDs for non-task flows (connect, config edits, prep) — TaskId-only for now; revisit if a non-task flow proves to be a black hole.
- Changes to task-run NDJSON capture or `daily-prep.log`.
- Any change to `IWorkerClient` / `WorkerHub` signatures.