docs: add worker lifecycle redesign spec
Startup-folder shortcut replaces the scheduled task; App only connects and prompts on connection failure instead of auto-spawning a worker. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
153
docs/superpowers/specs/2026-06-01-worker-lifecycle-design.md
Normal file
153
docs/superpowers/specs/2026-06-01-worker-lifecycle-design.md
Normal file
@@ -0,0 +1,153 @@
|
||||
# Worker Lifecycle Redesign
|
||||
|
||||
**Date:** 2026-06-01
|
||||
**Status:** Approved (design)
|
||||
|
||||
## Problem
|
||||
|
||||
The worker process has multiple competing owners, which collide in development and
|
||||
muddy production behavior:
|
||||
|
||||
- The App auto-spawns its own worker on startup (`EnsureWorkerRunningAsync`,
|
||||
`IslandsShellViewModel.cs:310`, called at line 224) ~4s after launch if it isn't
|
||||
yet connected. In the IDE "Start Everything" multilaunch — which already runs the
|
||||
worker via the `http` launch profile (`dotnet run`) — this produces a *second*
|
||||
worker that fails to bind to `127.0.0.1:47821` and dies, surfacing a stray console
|
||||
with a "failed to bind to address" error.
|
||||
- Production autostart uses a per-user logon **Scheduled Task** (`RegisterAutostartStep`
|
||||
+ `ScheduledTaskXml`), which the user wants to replace with a simpler Startup-folder
|
||||
shortcut.
|
||||
- When the App can't reach the worker, the only feedback is a silent "Offline" pill in
|
||||
the footer — no guidance to the user.
|
||||
|
||||
## Goal
|
||||
|
||||
Establish a single owner for the worker lifecycle and make connection failures
|
||||
actionable:
|
||||
|
||||
1. The worker is owned **externally** — a per-user **Startup-folder shortcut** in
|
||||
production (replacing the Scheduled Task), or the IDE in development.
|
||||
2. The App **only connects**; it never auto-spawns a worker.
|
||||
3. When the App can't connect, it shows a one-time prompt offering **Start Worker**,
|
||||
**Rerun Installer**, or **Dismiss**, plus a clickable Offline pill to reopen it.
|
||||
|
||||
## Non-Goals
|
||||
|
||||
- No change to the IDE dev setup. The "Start Everything" multilaunch keeps running the
|
||||
worker via the `http` profile (console with live logs); the duplicate/bind-error
|
||||
worker disappears purely because the App no longer auto-spawns. Rider run configs live
|
||||
in `.idea/.../workspace.xml` (per-user, gitignored) and are out of scope.
|
||||
- No change to the SignalR hub URL, port, reconnect policy, or the worker's
|
||||
single-instance mutex.
|
||||
|
||||
## Design
|
||||
|
||||
### Component 1 — Installer: Scheduled Task → Startup-folder shortcut
|
||||
|
||||
**`RegisterAutostartStep`** (`src/ClaudeDo.Installer/Steps/RegisterAutostartStep.cs`)
|
||||
- Replace the task-XML build + `schtasks /Create` with creation of a `.lnk` in the
|
||||
per-user Startup folder (`Environment.SpecialFolder.Startup`) targeting
|
||||
`{InstallDirectory}\worker\ClaudeDo.Worker.exe`. The worker is `WinExe`, so it launches
|
||||
with no console window.
|
||||
- **Migration:** keep the existing legacy Windows-service removal, and **add** removal of
|
||||
the old scheduled task: `schtasks.exe /Delete /TN "ClaudeDoWorker" /F` (best-effort),
|
||||
so existing installs migrate cleanly to the shortcut model.
|
||||
|
||||
**`StartWorkerStep`** (`src/ClaudeDo.Installer/Steps/StartWorkerStep.cs`)
|
||||
- Replace `schtasks /Run /TN ClaudeDoWorker` with a direct
|
||||
`Process.Start(new ProcessStartInfo(workerExe) { UseShellExecute = true })`.
|
||||
|
||||
**`StopWorkerStep`** (`src/ClaudeDo.Installer/Steps/StopWorkerStep.cs`)
|
||||
- Drop the `schtasks /End` call. Keep the existing install-dir-scoped process kill, which
|
||||
is the real stop mechanism.
|
||||
|
||||
**`UninstallRunner`** (`src/ClaudeDo.Installer/Core/UninstallRunner.cs`)
|
||||
- Keep the existing `schtasks /Delete` and `sc delete` (migration/legacy cleanup).
|
||||
- **Add** deletion of the Startup-folder `.lnk` alongside the existing Start Menu /
|
||||
Desktop shortcut removal.
|
||||
|
||||
**Shared shortcut helper**
|
||||
- Extract the `IShellLink` COM interop currently embedded in `CreateShortcutsStep` into a
|
||||
shared `src/ClaudeDo.Installer/Core/ShortcutFactory.cs` (`CreateShortcut(path, target,
|
||||
workingDir, description)`). Both `CreateShortcutsStep` and `RegisterAutostartStep` use it.
|
||||
|
||||
**Cleanup**
|
||||
- Delete `src/ClaudeDo.Installer/Core/ScheduledTaskXml.cs` once unreferenced.
|
||||
|
||||
The autostart shortcut name and location: `ClaudeDo Worker.lnk` in
|
||||
`Environment.SpecialFolder.Startup`, working directory `{InstallDirectory}\worker`.
|
||||
|
||||
### Component 2 — App: stop auto-spawning the worker
|
||||
|
||||
**`IslandsShellViewModel`** (`src/ClaudeDo.Ui/ViewModels/IslandsShellViewModel.cs`)
|
||||
- Remove the `_ = EnsureWorkerRunningAsync();` call (line 224) and the
|
||||
`EnsureWorkerRunningAsync` method + its `_ensureRunningAttempted` flag.
|
||||
- Keep the worker-launch logic (`RestartWorkerService`, which finds the worker exe via
|
||||
`WorkerLocator` and starts it) — it becomes the backing action for the prompt's
|
||||
**Start Worker** button. The existing `RestartWorkerAsync` command stays.
|
||||
|
||||
### Component 3 — App: connection-failure prompt
|
||||
|
||||
**New dialog** `WorkerConnectionModalViewModel`
|
||||
(`src/ClaudeDo.Ui/ViewModels/Modals/WorkerConnectionModalViewModel.cs`) +
|
||||
`WorkerConnectionModalView` (`src/ClaudeDo.Ui/Views/Modals/`).
|
||||
- Buttons: **Start Worker**, **Rerun Installer**, **Dismiss**.
|
||||
- Uses the established dialog pattern: a `Func<WorkerConnectionModalViewModel, Task>`
|
||||
hook on `IslandsShellViewModel` set by `MainWindow` (mirroring `ShowAboutModal`), and
|
||||
the dialog resolves a `TaskCompletionSource` on button press.
|
||||
- **Start Worker** → `WorkerLocator.Find()` + `Process.Start` (reuse the
|
||||
`RestartWorkerService` path). **Rerun Installer** → `InstallerLocator.Find()` + launch
|
||||
+ `Environment.Exit(0)` (same pattern as the existing `UpdateNow` command).
|
||||
**Dismiss** → close.
|
||||
|
||||
**Trigger logic** (in `IslandsShellViewModel`)
|
||||
- A one-shot grace timer (~12s) started on construction/startup. When it elapses, if the
|
||||
worker is still offline (`IsOffline` — not connected and not reconnecting) and the
|
||||
prompt hasn't been shown yet (`_connectionPromptShown`), show the dialog once and set
|
||||
the flag.
|
||||
- If the worker connects before the grace elapses, the prompt is never shown.
|
||||
|
||||
**Clickable Offline pill** (`src/ClaudeDo.Ui/Views/MainWindow.axaml`)
|
||||
- Turn the footer status pill into a button bound to a command that opens the same dialog
|
||||
on demand (independent of the one-shot flag), so the user can reopen guidance anytime
|
||||
while offline.
|
||||
|
||||
### Component 4 — Dev
|
||||
|
||||
No code change (see Non-Goals).
|
||||
|
||||
## Data Flow
|
||||
|
||||
```
|
||||
Startup (production):
|
||||
Windows logon -> Startup-folder .lnk -> ClaudeDo.Worker.exe (WinExe, mutex-guarded)
|
||||
App launches -> WorkerClient connects to 127.0.0.1:47821
|
||||
connected within grace -> Online pill, no prompt
|
||||
still offline after ~12s -> WorkerConnectionModal (once)
|
||||
|
||||
User clicks Offline pill (anytime offline) -> WorkerConnectionModal
|
||||
Start Worker -> Process.Start(worker exe)
|
||||
Rerun Installer -> Process.Start(installer), Environment.Exit(0)
|
||||
Dismiss -> close
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
- Worker exe / installer not found (`Locator.Find()` returns null): the corresponding
|
||||
dialog button is a no-op (consistent with existing `UpdateNow` behavior); the dialog
|
||||
stays open so the user can pick another action.
|
||||
- Startup-shortcut creation failure in the installer: surfaced as a failed install step
|
||||
(`StepResult.Fail`), same as the current task-registration failure path.
|
||||
- Legacy scheduled-task deletion is best-effort and never fails the install.
|
||||
|
||||
## Testing
|
||||
|
||||
- **`Installer.Tests`**: `RegisterAutostartStep` creates the Startup `.lnk` at the
|
||||
expected path with the correct target, and issues the legacy-task delete command.
|
||||
`UninstallRunner` removes the Startup `.lnk`.
|
||||
- **`Ui.Tests`**: prompt trigger logic — grace elapsed while offline shows the prompt
|
||||
exactly once; a connection established before grace suppresses it; the clickable-pill
|
||||
command opens the dialog regardless of the one-shot flag. (Abstract the dialog-show
|
||||
hook so it can be asserted without real UI.)
|
||||
- **Manual**: dialog buttons (Start Worker / Rerun Installer / Dismiss) and the clickable
|
||||
Offline pill in a running App.
|
||||
Reference in New Issue
Block a user