diff --git a/docs/superpowers/specs/2026-06-01-worker-lifecycle-design.md b/docs/superpowers/specs/2026-06-01-worker-lifecycle-design.md
new file mode 100644
index 0000000..4d56dfc
--- /dev/null
+++ b/docs/superpowers/specs/2026-06-01-worker-lifecycle-design.md
@@ -0,0 +1,153 @@
+# Worker Lifecycle Redesign
+
+**Date:** 2026-06-01
+**Status:** Approved (design)
+
+## Problem
+
+The worker process has multiple competing owners, which collide in development and
+muddy production behavior:
+
+- The App auto-spawns its own worker on startup (`EnsureWorkerRunningAsync`,
+  `IslandsShellViewModel.cs:310`, called at line 224) ~4s after launch if it isn't
+  yet connected. In the IDE "Start Everything" multilaunch — which already runs the
+  worker via the `http` launch profile (`dotnet run`) — this produces a *second*
+  worker that fails to bind to `127.0.0.1:47821` and dies, surfacing a stray console
+  with a "failed to bind to address" error.
+- Production autostart uses a per-user logon **Scheduled Task** (`RegisterAutostartStep`
+  + `ScheduledTaskXml`), which the user wants to replace with a simpler Startup-folder
+  shortcut.
+- When the App can't reach the worker, the only feedback is a silent "Offline" pill in
+  the footer — no guidance to the user.
+
+## Goal
+
+Establish a single owner for the worker lifecycle and make connection failures
+actionable:
+
+1. The worker is owned **externally** — a per-user **Startup-folder shortcut** in
+   production (replacing the Scheduled Task), or the IDE in development.
+2. The App **only connects**; it never auto-spawns a worker.
+3. When the App can't connect, it shows a one-time prompt offering **Start Worker**,
+   **Rerun Installer**, or **Dismiss**, plus a clickable Offline pill to reopen it.
+
+## Non-Goals
+
+- No change to the IDE dev setup. The "Start Everything" multilaunch keeps running the
+  worker via the `http` profile (console with live logs); the duplicate/bind-error
+  worker disappears purely because the App no longer auto-spawns. Rider run configs live
+  in `.idea/.../workspace.xml` (per-user, gitignored) and are out of scope.
+- No change to the SignalR hub URL, port, reconnect policy, or the worker's
+  single-instance mutex.
+
+## Design
+
+### Component 1 — Installer: Scheduled Task → Startup-folder shortcut
+
+**`RegisterAutostartStep`** (`src/ClaudeDo.Installer/Steps/RegisterAutostartStep.cs`)
+- Replace the task-XML build + `schtasks /Create` with creation of a `.lnk` in the
+  per-user Startup folder (`Environment.SpecialFolder.Startup`) targeting
+  `{InstallDirectory}\worker\ClaudeDo.Worker.exe`. The worker is `WinExe`, so it launches
+  with no console window.
+- **Migration:** keep the existing legacy Windows-service removal, and **add** removal of
+  the old scheduled task: `schtasks.exe /Delete /TN "ClaudeDoWorker" /F` (best-effort),
+  so existing installs migrate cleanly to the shortcut model.
+
+**`StartWorkerStep`** (`src/ClaudeDo.Installer/Steps/StartWorkerStep.cs`)
+- Replace `schtasks /Run /TN ClaudeDoWorker` with a direct
+  `Process.Start(new ProcessStartInfo(workerExe) { UseShellExecute = true })`.
+
+**`StopWorkerStep`** (`src/ClaudeDo.Installer/Steps/StopWorkerStep.cs`)
+- Drop the `schtasks /End` call. Keep the existing install-dir-scoped process kill, which
+  is the real stop mechanism.
+
+**`UninstallRunner`** (`src/ClaudeDo.Installer/Core/UninstallRunner.cs`)
+- Keep the existing `schtasks /Delete` and `sc delete` (migration/legacy cleanup).
+- **Add** deletion of the Startup-folder `.lnk` alongside the existing Start Menu /
+  Desktop shortcut removal.
+
+**Shared shortcut helper**
+- Extract the `IShellLink` COM interop currently embedded in `CreateShortcutsStep` into a
+  shared `src/ClaudeDo.Installer/Core/ShortcutFactory.cs` (`CreateShortcut(path, target,
+  workingDir, description)`). Both `CreateShortcutsStep` and `RegisterAutostartStep` use it.
+
+**Cleanup**
+- Delete `src/ClaudeDo.Installer/Core/ScheduledTaskXml.cs` once unreferenced.
+
+The autostart shortcut name and location: `ClaudeDo Worker.lnk` in
+`Environment.SpecialFolder.Startup`, working directory `{InstallDirectory}\worker`.
+
+### Component 2 — App: stop auto-spawning the worker
+
+**`IslandsShellViewModel`** (`src/ClaudeDo.Ui/ViewModels/IslandsShellViewModel.cs`)
+- Remove the `_ = EnsureWorkerRunningAsync();` call (line 224) and the
+  `EnsureWorkerRunningAsync` method + its `_ensureRunningAttempted` flag.
+- Keep the worker-launch logic (`RestartWorkerService`, which finds the worker exe via
+  `WorkerLocator` and starts it) — it becomes the backing action for the prompt's
+  **Start Worker** button. The existing `RestartWorkerAsync` command stays.
+
+### Component 3 — App: connection-failure prompt
+
+**New dialog** `WorkerConnectionModalViewModel`
+(`src/ClaudeDo.Ui/ViewModels/Modals/WorkerConnectionModalViewModel.cs`) +
+`WorkerConnectionModalView` (`src/ClaudeDo.Ui/Views/Modals/`).
+- Buttons: **Start Worker**, **Rerun Installer**, **Dismiss**.
+- Uses the established dialog pattern: a `Func`
+  hook on `IslandsShellViewModel` set by `MainWindow` (mirroring `ShowAboutModal`), and
+  the dialog resolves a `TaskCompletionSource` on button press.
+- **Start Worker** → `WorkerLocator.Find()` + `Process.Start` (reuse the
+  `RestartWorkerService` path). **Rerun Installer** → `InstallerLocator.Find()` + launch
+  + `Environment.Exit(0)` (same pattern as the existing `UpdateNow` command).
+  **Dismiss** → close.
+
+**Trigger logic** (in `IslandsShellViewModel`)
+- A one-shot grace timer (~12s) started on construction/startup. When it elapses, if the
+  worker is still offline (`IsOffline` — not connected and not reconnecting) and the
+  prompt hasn't been shown yet (`_connectionPromptShown`), show the dialog once and set
+  the flag.
+- If the worker connects before the grace elapses, the prompt is never shown.
+
+**Clickable Offline pill** (`src/ClaudeDo.Ui/Views/MainWindow.axaml`)
+- Turn the footer status pill into a button bound to a command that opens the same dialog
+  on demand (independent of the one-shot flag), so the user can reopen guidance anytime
+  while offline.
+
+### Component 4 — Dev
+
+No code change (see Non-Goals).
+
+## Data Flow
+
+```
+Startup (production):
+  Windows logon -> Startup-folder .lnk -> ClaudeDo.Worker.exe (WinExe, mutex-guarded)
+  App launches -> WorkerClient connects to 127.0.0.1:47821
+    connected within grace -> Online pill, no prompt
+    still offline after ~12s -> WorkerConnectionModal (once)
+
+User clicks Offline pill (anytime offline) -> WorkerConnectionModal
+  Start Worker -> Process.Start(worker exe)
+  Rerun Installer -> Process.Start(installer), Environment.Exit(0)
+  Dismiss -> close
+```
+
+## Error Handling
+
+- Worker exe / installer not found (`Locator.Find()` returns null): the corresponding
+  dialog button is a no-op (consistent with existing `UpdateNow` behavior); the dialog
+  stays open so the user can pick another action.
+- Startup-shortcut creation failure in the installer: surfaced as a failed install step
+  (`StepResult.Fail`), same as the current task-registration failure path.
+- Legacy scheduled-task deletion is best-effort and never fails the install.
+
+## Testing
+
+- **`Installer.Tests`**: `RegisterAutostartStep` creates the Startup `.lnk` at the
+  expected path with the correct target, and issues the legacy-task delete command.
+  `UninstallRunner` removes the Startup `.lnk`.
+- **`Ui.Tests`**: prompt trigger logic — grace elapsed while offline shows the prompt
+  exactly once; a connection established before grace suppresses it; the clickable-pill
+  command opens the dialog regardless of the one-shot flag. (Abstract the dialog-show
+  hook so it can be asserted without real UI.)
+- **Manual**: dialog buttons (Start Worker / Rerun Installer / Dismiss) and the clickable
+  Offline pill in a running App.
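
The trigger logic in Component 3, together with the `Ui.Tests` note about abstracting the dialog-show hook, could be sketched roughly as below. This is only an illustration under stated assumptions: the spec places the logic inside `IslandsShellViewModel`, whereas `ConnectionPromptTrigger`, `RunGraceAsync`, and `ShowOnDemandAsync` are hypothetical names. The two delegates stand in for the `IsOffline` property and the dialog hook, whose real signatures the spec does not give.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the spec: it isolates the one-shot prompt
// rule so it can be exercised without real UI.
public sealed class ConnectionPromptTrigger
{
    private readonly Func<bool> _isOffline;   // assumed stand-in for IsOffline
    private readonly Func<Task> _showDialog;  // assumed stand-in for the dialog-show hook
    private int _connectionPromptShown;       // 0 = not shown yet, 1 = already shown

    public ConnectionPromptTrigger(Func<bool> isOffline, Func<Task> showDialog)
    {
        _isOffline = isOffline ?? throw new ArgumentNullException(nameof(isOffline));
        _showDialog = showDialog ?? throw new ArgumentNullException(nameof(showDialog));
    }

    // Grace path: wait out the grace period, then prompt at most once.
    public async Task RunGraceAsync(TimeSpan grace, CancellationToken ct = default)
    {
        await Task.Delay(grace, ct);

        // Connected (or reconnecting) by the time the grace elapsed: no prompt.
        if (!_isOffline()) return;

        // Set the flag atomically; a second caller sees 1 and does nothing.
        if (Interlocked.Exchange(ref _connectionPromptShown, 1) == 1) return;

        await _showDialog();
    }

    // Clickable Offline pill: opens the dialog regardless of the one-shot flag.
    public Task ShowOnDemandAsync() => _showDialog();
}
```

With this shape, a test can pass a counting delegate as the dialog hook and a short grace period such as `TimeSpan.FromMilliseconds(10)`, then assert how many times the hook ran. Keeping the pill path separate from the flag mirrors the spec's requirement that the user can reopen the dialog at any time while offline.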