docs: add worker lifecycle redesign spec

Startup-folder shortcut replaces the scheduled task; App only connects and
prompts on connection failure instead of auto-spawning a worker.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
mika kuns
2026-06-01 11:55:08 +02:00
parent 926471da6b
commit 4963a726de


@@ -0,0 +1,153 @@
# Worker Lifecycle Redesign
**Date:** 2026-06-01
**Status:** Approved (design)
## Problem
The worker process has multiple competing owners, which collide in development and
muddy production behavior:
- The App auto-spawns its own worker on startup (`EnsureWorkerRunningAsync`,
`IslandsShellViewModel.cs:310`, called at line 224) ~4s after launch if it isn't
yet connected. In the IDE "Start Everything" multilaunch — which already runs the
worker via the `http` launch profile (`dotnet run`) — this produces a *second*
worker that fails to bind to `127.0.0.1:47821` and dies, surfacing a stray console
with a "failed to bind to address" error.
- Production autostart uses a per-user logon **Scheduled Task** (`RegisterAutostartStep`
+ `ScheduledTaskXml`), which the user wants to replace with a simpler Startup-folder
shortcut.
- When the App can't reach the worker, the only feedback is a silent "Offline" pill in
the footer — no guidance to the user.
## Goal
Establish a single owner for the worker lifecycle and make connection failures
actionable:
1. The worker is owned **externally** — a per-user **Startup-folder shortcut** in
production (replacing the Scheduled Task), or the IDE in development.
2. The App **only connects**; it never auto-spawns a worker.
3. When the App can't connect, it shows a one-time prompt offering **Start Worker**,
**Rerun Installer**, or **Dismiss**, plus a clickable Offline pill to reopen it.
## Non-Goals
- No change to the IDE dev setup. The "Start Everything" multilaunch keeps running the
worker via the `http` profile (console with live logs); the duplicate/bind-error
worker disappears purely because the App no longer auto-spawns. Rider run configs live
in `.idea/.../workspace.xml` (per-user, gitignored) and are out of scope.
- No change to the SignalR hub URL, port, reconnect policy, or the worker's
single-instance mutex.
## Design
### Component 1 — Installer: Scheduled Task → Startup-folder shortcut
**`RegisterAutostartStep`** (`src/ClaudeDo.Installer/Steps/RegisterAutostartStep.cs`)
- Replace the task-XML build + `schtasks /Create` with creation of a `.lnk` in the
per-user Startup folder (`Environment.SpecialFolder.Startup`) targeting
`{InstallDirectory}\worker\ClaudeDo.Worker.exe`. The worker is `WinExe`, so it launches
with no console window.
- **Migration:** keep the existing legacy Windows-service removal, and **add** removal of
the old scheduled task: `schtasks.exe /Delete /TN "ClaudeDoWorker" /F` (best-effort),
so existing installs migrate cleanly to the shortcut model.
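
The best-effort legacy-task removal described above could look roughly like the following. This is a minimal sketch, not the installer's actual code: the class and method names (`LegacyTaskCleanupSketch`, `TryDeleteLegacyTask`) are hypothetical, and only the `schtasks.exe` command line is taken from this spec.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper; names are illustrative and not taken from the installer source.
public static class LegacyTaskCleanupSketch
{
    public const string LegacyTaskName = "ClaudeDoWorker";

    // Builds the argument string for: schtasks.exe /Delete /TN "ClaudeDoWorker" /F
    public static string BuildDeleteArguments() =>
        $"/Delete /TN \"{LegacyTaskName}\" /F";

    // Best-effort: returns false on any failure (task missing, schtasks unavailable)
    // and never throws, so the calling install step cannot fail because of it.
    public static bool TryDeleteLegacyTask()
    {
        try
        {
            var psi = new ProcessStartInfo("schtasks.exe", BuildDeleteArguments())
            {
                UseShellExecute = false,
                CreateNoWindow = true,
            };
            using var process = Process.Start(psi);
            if (process is null) return false;
            process.WaitForExit();
            return process.ExitCode == 0;
        }
        catch (Exception)
        {
            return false;
        }
    }
}
```

The return value is informational only; a caller following this spec would ignore it and continue with shortcut creation.
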
**`StartWorkerStep`** (`src/ClaudeDo.Installer/Steps/StartWorkerStep.cs`)
- Replace `schtasks /Run /TN ClaudeDoWorker` with a direct
`Process.Start(new ProcessStartInfo(workerExe) { UseShellExecute = true })`.
**`StopWorkerStep`** (`src/ClaudeDo.Installer/Steps/StopWorkerStep.cs`)
- Drop the `schtasks /End` call. Keep the existing install-dir-scoped process kill, which
is the real stop mechanism.
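
For orientation, an install-dir-scoped kill can be sketched as below. This is an assumption-laden illustration, not the existing `StopWorkerStep` implementation: the class name, the `ClaudeDo.Worker` process name default, and the 5-second wait are all hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical sketch; the real StopWorkerStep may scope and kill differently.
public static class ScopedKillSketch
{
    // True when exePath lies inside installDirectory (case-insensitive, as on Windows).
    public static bool IsUnder(string exePath, string installDirectory)
    {
        var root = Path.GetFullPath(installDirectory)
            .TrimEnd(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar)
            + Path.DirectorySeparatorChar;
        return Path.GetFullPath(exePath).StartsWith(root, StringComparison.OrdinalIgnoreCase);
    }

    // Kills only worker processes whose executable lives under installDirectory.
    // Returns the number of processes it attempted to stop.
    public static int KillWorkersUnder(string installDirectory, string processName = "ClaudeDo.Worker")
    {
        var killed = 0;
        foreach (var process in Process.GetProcessesByName(processName))
        {
            using (process)
            {
                try
                {
                    var path = process.MainModule?.FileName;
                    if (path is null || !IsUnder(path, installDirectory)) continue;
                    process.Kill();
                    process.WaitForExit(5000);
                    killed++;
                }
                catch (Exception)
                {
                    // Access denied or the process already exited: skip it.
                }
            }
        }
        return killed;
    }
}
```

Scoping by executable path means a worker started from the IDE (outside the install directory) is left alone.
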
**`UninstallRunner`** (`src/ClaudeDo.Installer/Core/UninstallRunner.cs`)
- Keep the existing `schtasks /Delete` and `sc delete` (migration/legacy cleanup).
- **Add** deletion of the Startup-folder `.lnk` alongside the existing Start Menu /
Desktop shortcut removal.
**Shared shortcut helper**
- Extract the `IShellLink` COM interop currently embedded in `CreateShortcutsStep` into a
shared `src/ClaudeDo.Installer/Core/ShortcutFactory.cs` (`CreateShortcut(path, target,
workingDir, description)`). Both `CreateShortcutsStep` and `RegisterAutostartStep` use it.
**Cleanup**
- Delete `src/ClaudeDo.Installer/Core/ScheduledTaskXml.cs` once unreferenced.
**Autostart shortcut**
- Name and location: `ClaudeDo Worker.lnk` in `Environment.SpecialFolder.Startup`,
with working directory `{InstallDirectory}\worker`.
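
The paths involved can be derived as in the sketch below. The helper class and its method names are hypothetical; the shortcut name, the Startup special folder, and the `worker\ClaudeDo.Worker.exe` layout come from this spec.

```csharp
using System;
using System.IO;

// Hypothetical helper; only the names and folders stated in the spec are assumed fixed.
public static class AutostartPathsSketch
{
    public const string ShortcutName = "ClaudeDo Worker.lnk";

    // Full path of the autostart shortcut in the current user's Startup folder.
    public static string GetShortcutPath() =>
        Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Startup), ShortcutName);

    // Working directory for the shortcut: {InstallDirectory}\worker
    public static string GetWorkingDirectory(string installDirectory) =>
        Path.Combine(installDirectory, "worker");

    // Shortcut target: {InstallDirectory}\worker\ClaudeDo.Worker.exe
    public static string GetWorkerExePath(string installDirectory) =>
        Path.Combine(GetWorkingDirectory(installDirectory), "ClaudeDo.Worker.exe");
}
```

Both `RegisterAutostartStep` (create) and `UninstallRunner` (delete) would need the same shortcut path, so computing it in one place avoids the two drifting apart.
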
### Component 2 — App: stop auto-spawning the worker
**`IslandsShellViewModel`** (`src/ClaudeDo.Ui/ViewModels/IslandsShellViewModel.cs`)
- Remove the `_ = EnsureWorkerRunningAsync();` call (line 224) and the
`EnsureWorkerRunningAsync` method + its `_ensureRunningAttempted` flag.
- Keep the worker-launch logic (`RestartWorkerService`, which finds the worker exe via
`WorkerLocator` and starts it) — it becomes the backing action for the prompt's
**Start Worker** button. The existing `RestartWorkerAsync` command stays.
### Component 3 — App: connection-failure prompt
**New dialog** `WorkerConnectionModalViewModel`
(`src/ClaudeDo.Ui/ViewModels/Modals/WorkerConnectionModalViewModel.cs`) +
`WorkerConnectionModalView` (`src/ClaudeDo.Ui/Views/Modals/`).
- Buttons: **Start Worker**, **Rerun Installer**, **Dismiss**.
- Uses the established dialog pattern: a `Func<WorkerConnectionModalViewModel, Task>`
hook on `IslandsShellViewModel` set by `MainWindow` (mirroring `ShowAboutModal`), and
the dialog resolves a `TaskCompletionSource` on button press.
- **Start Worker** → `WorkerLocator.Find()` + `Process.Start` (reuse the
`RestartWorkerService` path). **Rerun Installer** → `InstallerLocator.Find()` + launch,
then `Environment.Exit(0)` (same pattern as the existing `UpdateNow` command).
**Dismiss** → close.
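
The hook-plus-`TaskCompletionSource` pattern described above can be sketched as follows. All type and member names here (`WorkerConnectionChoice`, `WorkerConnectionModalSketch`, `ShellSketch`, `PromptAsync`) are hypothetical stand-ins; the real `WorkerConnectionModalViewModel` and `IslandsShellViewModel` may be shaped differently.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical result type; the spec only names the three buttons.
public enum WorkerConnectionChoice { StartWorker, RerunInstaller, Dismiss }

// Sketch of the modal view model: each button resolves the same TaskCompletionSource.
public sealed class WorkerConnectionModalSketch
{
    private readonly TaskCompletionSource<WorkerConnectionChoice> _choice =
        new(TaskCreationOptions.RunContinuationsAsynchronously);

    // Completes once the user has pressed a button.
    public Task<WorkerConnectionChoice> Choice => _choice.Task;

    public void StartWorker() => _choice.TrySetResult(WorkerConnectionChoice.StartWorker);
    public void RerunInstaller() => _choice.TrySetResult(WorkerConnectionChoice.RerunInstaller);
    public void Dismiss() => _choice.TrySetResult(WorkerConnectionChoice.Dismiss);
}

// Sketch of the shell side: the window assigns the hook, the view model only invokes it.
public sealed class ShellSketch
{
    // Set by the window; stays null when no UI is attached (for example in tests).
    public Func<WorkerConnectionModalSketch, Task>? ShowWorkerConnectionModal { get; set; }

    public async Task<WorkerConnectionChoice?> PromptAsync()
    {
        var hook = ShowWorkerConnectionModal;
        if (hook is null) return null;

        var modal = new WorkerConnectionModalSketch();
        await hook(modal); // returns when the dialog has closed
        return modal.Choice.IsCompletedSuccessfully
            ? (WorkerConnectionChoice?)modal.Choice.Result
            : null;
    }
}
```

Because the view model never references a window type, a test can assign a fake hook that presses a button immediately and assert on the returned choice.
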
**Trigger logic** (in `IslandsShellViewModel`)
- A one-shot grace timer (~12s) started on construction/startup. When it elapses, if the
worker is still offline (`IsOffline` — not connected and not reconnecting) and the
prompt hasn't been shown yet (`_connectionPromptShown`), show the dialog once and set
the flag.
- If the worker connects before the grace period elapses, the prompt is never shown.
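
The trigger rules above can be expressed as a small class that is easy to unit-test. This is a sketch under assumptions: `ConnectionPromptTrigger` and its members are hypothetical names, the grace delay is injected so tests need not wait ~12s, and all calls are assumed to happen on the UI thread (no locking).

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch of the one-shot prompt trigger; not the actual view-model code.
public sealed class ConnectionPromptTrigger
{
    private readonly Func<bool> _isOffline;   // stands in for the IsOffline property
    private readonly Func<Task> _showPrompt;  // stands in for the dialog-show hook
    private bool _connectionPromptShown;

    public ConnectionPromptTrigger(Func<bool> isOffline, Func<Task> showPrompt)
    {
        _isOffline = isOffline;
        _showPrompt = showPrompt;
    }

    // Startup path: wait for the grace period, then prompt at most once if still offline.
    public async Task RunGraceAsync(Func<Task> gracePeriod)
    {
        await gracePeriod();
        if (!_isOffline() || _connectionPromptShown) return;
        _connectionPromptShown = true;
        await _showPrompt();
    }

    // Offline-pill path: opens the dialog on demand, ignoring the one-shot flag.
    public Task OpenOnDemandAsync() => _showPrompt();
}
```

In the App, the grace period would be supplied as something like `() => Task.Delay(TimeSpan.FromSeconds(12))`, while a test can pass `() => Task.CompletedTask`.
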
**Clickable Offline pill** (`src/ClaudeDo.Ui/Views/MainWindow.axaml`)
- Turn the footer status pill into a button bound to a command that opens the same dialog
on demand (independent of the one-shot flag), so the user can reopen guidance anytime
while offline.
### Component 4 — Dev
No code change (see Non-Goals).
## Data Flow
```
Startup (production):
Windows logon -> Startup-folder .lnk -> ClaudeDo.Worker.exe (WinExe, mutex-guarded)
App launches -> WorkerClient connects to 127.0.0.1:47821
connected within grace -> Online pill, no prompt
still offline after ~12s -> WorkerConnectionModal (once)
User clicks Offline pill (anytime offline) -> WorkerConnectionModal
Start Worker -> Process.Start(worker exe)
Rerun Installer -> Process.Start(installer), Environment.Exit(0)
Dismiss -> close
```
## Error Handling
- Worker exe / installer not found (`WorkerLocator.Find()` or `InstallerLocator.Find()`
returns null): the corresponding dialog button is a no-op (consistent with existing
`UpdateNow` behavior); the dialog stays open so the user can pick another action.
- Startup-shortcut creation failure in the installer: surfaced as a failed install step
(`StepResult.Fail`), same as the current task-registration failure path.
- Legacy scheduled-task deletion is best-effort and never fails the install.
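
The "button is a no-op when nothing is found" rule could be captured in one helper, sketched below. `LaunchSketch.TryLaunch` is a hypothetical name, and the locator is modelled as a `Func<string?>` because the exact signatures of `WorkerLocator.Find()` and `InstallerLocator.Find()` are not given in this spec.

```csharp
using System;

// Hypothetical helper; the locator and launcher are passed in so the rule is testable.
public static class LaunchSketch
{
    // Returns true when something was launched. False means the locator found nothing,
    // so the caller does nothing and leaves the dialog open.
    public static bool TryLaunch(Func<string?> find, Action<string> start)
    {
        var path = find();
        if (string.IsNullOrEmpty(path)) return false;
        start(path);
        return true;
    }
}
```

A button handler would then close the dialog (or call `Environment.Exit(0)` for the installer case) only when `TryLaunch` returns true.
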
## Testing
- **`Installer.Tests`**: `RegisterAutostartStep` creates the Startup `.lnk` at the
expected path with the correct target, and issues the legacy-task delete command.
`UninstallRunner` removes the Startup `.lnk`.
- **`Ui.Tests`**: prompt trigger logic — grace elapsed while offline shows the prompt
exactly once; a connection established before grace suppresses it; the clickable-pill
command opens the dialog regardless of the one-shot flag. (Abstract the dialog-show
hook so it can be asserted without real UI.)
- **Manual**: dialog buttons (Start Worker / Rerun Installer / Dismiss) and the clickable
Offline pill in a running App.