diff --git a/docs/superpowers/specs/2026-06-04-debug-logging-traceability-design.md b/docs/superpowers/specs/2026-06-04-debug-logging-traceability-design.md new file mode 100644 index 0000000..723a7e6 --- /dev/null +++ b/docs/superpowers/specs/2026-06-04-debug-logging-traceability-design.md @@ -0,0 +1,88 @@ +# Debug Logging & Frontend↔Backend Traceability — Design + +**Date:** 2026-06-04 +**Status:** Approved (pending spec review) + +## Goal + +Make debug logging rich enough to diagnose problems across the UI↔Worker boundary, while keeping the installed (production) build near-silent. Verbosity is decided by **build configuration** — no runtime knob, no config field: + +- **Debug build** (Rider run button) → verbose, console + file. +- **Release build** (installed app) → minimal, file only. + +## Decisions (from brainstorming) + +1. **Mechanism:** `#if DEBUG` build-config split. Rider builds `Debug`; the installer ships `Release`. +2. **Scope:** Worker **and** App/Ui. The desktop side currently has no log sink at all — UI/IPC failures vanish today. +3. **Release behavior:** App/Ui log `Warning`+ to file (not silent — capture crashes). Worker unchanged at `Information`. +4. **One shared log file** across both processes, unified timeline. +5. **Correlation:** TaskId-based (option A). Enrich log lines with `TaskId` when one is in scope. No changes to the SignalR contract (`IWorkerClient`/`WorkerHub` untouched → test fakes untouched). + +## Verbosity matrix + +| Process | Debug build | Release build | +|---|---|---| +| Worker | `Debug` level, console + shared file | `Information` level, shared file | +| App/Ui | `Debug` level, console + shared file | `Warning` level, shared file | + +## Shared log file + +- Single daily-rolling file: `~/.todo-app/logs/claudedo-.log` (Serilog appends the date). +- `shared: true` on both processes' file sinks → Serilog coordinates multi-process writes via a global mutex. +- `retainedFileCountLimit: 7` (matches current Worker retention). +- Each line is tagged with a `Process` property (`"worker"` / `"app"`) so the two sides are distinguishable in the interleaved timeline. + +> The existing `worker-.log` is replaced by `claudedo-.log`. Task-run NDJSON (`{taskId}_run{n}.ndjson`) and `daily-prep.log` are **out of scope** — they are data streams, not diagnostic logs, and stay exactly as they are. + +## Output template + +``` +[{Timestamp:HH:mm:ss.fff} {Level:u3}] {Process}/{SourceContext} [{TaskId}] {Message:lj}{NewLine}{Exception} +``` + +- `{Process}` — `worker` or `app`. +- `{SourceContext}` — the `ILogger` category (the logging class), so you see *which* component spoke. +- `{TaskId}` — the correlation key, defaulted to `-` when no task is in scope (see enricher below). + +## Traceability (TaskId correlation) + +Use Serilog's `LogContext` (`.Enrich.FromLogContext()` on both processes) plus a small default enricher so `TaskId` is always present (renders `-` when absent — avoids the raw `{TaskId}` token leaking into output). + +Push the property at the entry points where a task is in scope; all nested `ILogger` calls inherit it automatically: + +- **Worker:** wrap per-task execution in `TaskRunner` (the run/continue entry) with `using (LogContext.PushProperty("TaskId", task.Id))`. This covers the bulk of backend activity (runner, state transitions, worktree, planning) for free. +- **App/Ui:** push `TaskId` in `WorkerClient` task-targeted calls (e.g. RunNow / Cancel / Continue / review actions) so the UI side of a task action carries the same key. + +Result: grep one `TaskId` in `claudedo-.log` and read the full UI→Worker→UI story in timestamp order. + +This adds **no parameters** to the SignalR surface — correlation rides on the existing `taskId` arguments already present in those calls. + +## Implementation surface + +A single shared helper keeps the two processes' Serilog setup from drifting. + +- **New:** `ClaudeDo.Data/Logging/LoggingSetup.cs` (or similar shared location reachable by both App and Worker) exposing the output template, the default-TaskId enricher, and a `ConfigureLogger(LoggerConfiguration, Process, logRoot)` that applies the `#if DEBUG` level/sink choices. Both processes call it so level/template/retention stay in sync. + - *Placement note:* must live in a project both `ClaudeDo.App` and `ClaudeDo.Worker` reference. `ClaudeDo.Data` qualifies; if it should not take a Serilog dependency, use a tiny new shared project. Decide during planning. +- **Worker `Program.cs:34`:** replace the inline `UseSerilog` body with a call into the shared helper (`Process = "worker"`). +- **App `Program.cs`:** add Serilog packages; build a logger via the shared helper (`Process = "app"`) and register it with `sc.AddLogging(b => b.AddSerilog(logger, dispose: true))`. App currently registers **no** logging at all, so this also makes `ILogger` injection actually work UI-side. Remove/keep `.LogToTrace()` as appropriate (Avalonia internal trace, separate concern — leave it). +- **App shutdown:** flush/close the logger (`Log.CloseAndFlush()` or dispose via the container's existing `finally`). + +### Packages to add (App project) + +- `Serilog.Extensions.Logging` (bridge `ILogger` → Serilog) +- `Serilog.Sinks.File` +- `Serilog.Sinks.Console` +- (Worker already has Serilog + File sink; add `Serilog.Sinks.Console` for the Debug console output.) + +## Testing + +- This is logging wiring; per project policy, no tests that spawn the real Claude CLI and no heavy test scaffolding for log output. +- Light verification: a unit-level check that the shared helper produces the expected minimum level per build config (Debug vs Release) and that the default enricher yields `-` when no `TaskId` is pushed. If asserting on `#if DEBUG` is awkward in a single test run, verify the Release path (the one that matters for prod) and smoke-test Debug manually from Rider. +- Manual smoke test (documented, not automated): run from Rider, confirm console + `claudedo-.log` show `Debug` lines with `Process`/`SourceContext`; run a task and confirm both `app` and `worker` lines share the same `[TaskId]`. + +## Out of scope + +- Runtime/config log-level knob. +- Per-call correlation IDs for non-task flows (connect, config edits, prep) — TaskId-only for now; revisit if a non-task flow proves to be a black hole. +- Changes to task-run NDJSON capture or `daily-prep.log`. +- Any change to `IWorkerClient` / `WorkerHub` signatures.