ClaudeDo/docs/superpowers/specs/2026-06-01-worker-lifecycle-design.md
mika kuns 4963a726de docs: add worker lifecycle redesign spec
Startup-folder shortcut replaces the scheduled task; App only connects and
prompts on connection failure instead of auto-spawning a worker.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 11:55:08 +02:00


Worker Lifecycle Redesign

Date: 2026-06-01
Status: Approved (design)

Problem

The worker process has multiple competing owners, which collide in development and muddy production behavior:

  • The App auto-spawns its own worker on startup (EnsureWorkerRunningAsync, IslandsShellViewModel.cs:310, called at line 224) ~4s after launch if it isn't yet connected. In the IDE "Start Everything" multilaunch — which already runs the worker via the http launch profile (dotnet run) — this produces a second worker that fails to bind to 127.0.0.1:47821 and dies, surfacing a stray console with a "failed to bind to address" error.
  • Production autostart uses a per-user logon Scheduled Task (RegisterAutostartStep + ScheduledTaskXml), which the user wants to replace with a simpler Startup-folder shortcut.
  • When the App can't reach the worker, the only feedback is a silent "Offline" pill in the footer — no guidance to the user.

Goal

Establish a single owner for the worker lifecycle and make connection failures actionable:

  1. The worker is owned externally — a per-user Startup-folder shortcut in production (replacing the Scheduled Task), or the IDE in development.
  2. The App only connects; it never auto-spawns a worker.
  3. When the App can't connect, it shows a one-time prompt offering Start Worker, Rerun Installer, or Dismiss, plus a clickable Offline pill to reopen it.

Non-Goals

  • No change to the IDE dev setup. The "Start Everything" multilaunch keeps running the worker via the http profile (console with live logs); the duplicate/bind-error worker disappears purely because the App no longer auto-spawns. Rider run configs live in .idea/.../workspace.xml (per-user, gitignored) and are out of scope.
  • No change to the SignalR hub URL, port, reconnect policy, or the worker's single-instance mutex.

Design

Component 1 — Installer: Scheduled Task → Startup-folder shortcut

RegisterAutostartStep (src/ClaudeDo.Installer/Steps/RegisterAutostartStep.cs)

  • Replace the task-XML build + schtasks /Create with creation of a .lnk in the per-user Startup folder (Environment.SpecialFolder.Startup) targeting {InstallDirectory}\worker\ClaudeDo.Worker.exe. The worker is WinExe, so it launches with no console window.
  • Migration: keep the existing legacy Windows-service removal, and add removal of the old scheduled task: schtasks.exe /Delete /TN "ClaudeDoWorker" /F (best-effort), so existing installs migrate cleanly to the shortcut model.
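The best-effort task removal could be sketched as follows. This is a minimal sketch, not the installer's actual code: the LegacyTaskCleanup class and its method names are hypothetical, and only the schtasks.exe /Delete /TN "ClaudeDoWorker" /F command line is taken from this spec. Every failure is reported as false rather than thrown, so the step can never fail the install.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration; not part of the existing installer.
public static class LegacyTaskCleanup
{
    // Builds the command from the spec: schtasks.exe /Delete /TN "ClaudeDoWorker" /F
    public static ProcessStartInfo BuildDeleteCommand(string taskName = "ClaudeDoWorker") =>
        new ProcessStartInfo("schtasks.exe", $"/Delete /TN \"{taskName}\" /F")
        {
            UseShellExecute = false,
            CreateNoWindow = true,
        };

    // Best-effort: returns true only if schtasks ran and exited with code 0.
    // A missing task, a missing schtasks.exe, or a timeout all yield false.
    public static bool TryDelete(string taskName = "ClaudeDoWorker")
    {
        try
        {
            using var process = Process.Start(BuildDeleteCommand(taskName));
            if (process is null) return false;
            if (!process.WaitForExit(10_000)) return false; // 10 s timeout is an assumption
            return process.ExitCode == 0;
        }
        catch (Exception)
        {
            // Swallowed on purpose: legacy cleanup must not fail the install.
            return false;
        }
    }
}
```

A caller would log the boolean result at most and continue with shortcut creation regardless.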

StartWorkerStep (src/ClaudeDo.Installer/Steps/StartWorkerStep.cs)

  • Replace schtasks /Run /TN ClaudeDoWorker with a direct Process.Start(new ProcessStartInfo(workerExe) { UseShellExecute = true }).

StopWorkerStep (src/ClaudeDo.Installer/Steps/StopWorkerStep.cs)

  • Drop the schtasks /End call. Keep the existing install-dir-scoped process kill, which is the real stop mechanism.

UninstallRunner (src/ClaudeDo.Installer/Core/UninstallRunner.cs)

  • Keep the existing schtasks /Delete and sc delete (migration/legacy cleanup).
  • Add deletion of the Startup-folder .lnk alongside the existing Start Menu / Desktop shortcut removal.

Shared shortcut helper

  • Extract the IShellLink COM interop currently embedded in CreateShortcutsStep into a shared src/ClaudeDo.Installer/Core/ShortcutFactory.cs (CreateShortcut(path, target, workingDir, description)). Both CreateShortcutsStep and RegisterAutostartStep use it.

Cleanup

  • Delete src/ClaudeDo.Installer/Core/ScheduledTaskXml.cs once unreferenced.

The autostart shortcut is named ClaudeDo Worker.lnk, lives in Environment.SpecialFolder.Startup, and uses {InstallDirectory}\worker as its working directory.
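The paths involved can be derived in one place so that RegisterAutostartStep and UninstallRunner agree on them. The sketch below assumes a hypothetical AutostartShortcut helper; the file name, the Startup folder, and the worker sub-directory come from this spec, while the class and method names are illustrative only. It deliberately leaves the .lnk creation itself to the shared shortcut helper.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; names are not from the existing code base.
public static class AutostartShortcut
{
    public const string FileName = "ClaudeDo Worker.lnk";

    // Full path of the shortcut in the per-user Startup folder.
    public static string GetShortcutPath() =>
        Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Startup), FileName);

    // Working directory of the shortcut: {InstallDirectory}\worker
    public static string GetWorkerDirectory(string installDirectory) =>
        Path.Combine(installDirectory, "worker");

    // Shortcut target: {InstallDirectory}\worker\ClaudeDo.Worker.exe
    public static string GetWorkerExePath(string installDirectory) =>
        Path.Combine(GetWorkerDirectory(installDirectory), "ClaudeDo.Worker.exe");

    // Uninstall side: removes the shortcut if present.
    // Returns false when nothing was deleted; never throws for I/O or access errors.
    public static bool TryRemove(string shortcutPath)
    {
        try
        {
            if (!File.Exists(shortcutPath)) return false;
            File.Delete(shortcutPath);
            return true;
        }
        catch (IOException) { return false; }
        catch (UnauthorizedAccessException) { return false; }
    }
}
```

The uninstaller would call AutostartShortcut.TryRemove(AutostartShortcut.GetShortcutPath()) next to the existing Start Menu / Desktop shortcut removal.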

Component 2 — App: stop auto-spawning the worker

IslandsShellViewModel (src/ClaudeDo.Ui/ViewModels/IslandsShellViewModel.cs)

  • Remove the _ = EnsureWorkerRunningAsync(); call (line 224) and the EnsureWorkerRunningAsync method + its _ensureRunningAttempted flag.
  • Keep the worker-launch logic (RestartWorkerService, which finds the worker exe via WorkerLocator and starts it) — it becomes the backing action for the prompt's Start Worker button. The existing RestartWorkerAsync command stays.

Component 3 — App: connection-failure prompt

New dialog WorkerConnectionModalViewModel (src/ClaudeDo.Ui/ViewModels/Modals/WorkerConnectionModalViewModel.cs) + WorkerConnectionModalView (src/ClaudeDo.Ui/Views/Modals/).

  • Buttons: Start Worker, Rerun Installer, Dismiss.
  • Uses the established dialog pattern: a Func<WorkerConnectionModalViewModel, Task> hook on IslandsShellViewModel set by MainWindow (mirroring ShowAboutModal), and the dialog resolves a TaskCompletionSource on button press.
  • Start Worker → WorkerLocator.Find() + Process.Start (reuse the RestartWorkerService path). Rerun Installer → InstallerLocator.Find() + launch + Environment.Exit(0) (same pattern as the existing UpdateNow command). Dismiss → close.
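A UI-free sketch of the TaskCompletionSource pattern for this dialog is shown below. All type and member names (WorkerConnectionChoice, WorkerConnectionDialogSketch) are hypothetical, and the three delegates stand in for WorkerLocator.Find(), InstallerLocator.Find(), and Process.Start so the logic can run without the real services. It assumes that a successful launch resolves the dialog, and that the caller awaiting Result performs Environment.Exit(0) after Rerun Installer; the real view model may split these responsibilities differently.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

public enum WorkerConnectionChoice { StartWorker, RerunInstaller, Dismiss }

// Hypothetical stand-in for WorkerConnectionModalViewModel, without any UI types.
public sealed class WorkerConnectionDialogSketch
{
    private readonly TaskCompletionSource<WorkerConnectionChoice> _result =
        new(TaskCreationOptions.RunContinuationsAsynchronously);
    private readonly Func<string?> _findWorker;     // stands in for WorkerLocator.Find()
    private readonly Func<string?> _findInstaller;  // stands in for InstallerLocator.Find()
    private readonly Action<string> _launch;        // stands in for Process.Start

    public WorkerConnectionDialogSketch(
        Func<string?> findWorker, Func<string?> findInstaller, Action<string> launch)
    {
        _findWorker = findWorker;
        _findInstaller = findInstaller;
        _launch = launch;
    }

    // Completes when a button resolves the dialog; the host awaits this to close the view.
    public Task<WorkerConnectionChoice> Result => _result.Task;

    public void StartWorker() => LaunchAndClose(_findWorker(), WorkerConnectionChoice.StartWorker);

    public void RerunInstaller() => LaunchAndClose(_findInstaller(), WorkerConnectionChoice.RerunInstaller);

    public void Dismiss() => _result.TrySetResult(WorkerConnectionChoice.Dismiss);

    private void LaunchAndClose(string? path, WorkerConnectionChoice choice)
    {
        // Locator returned null: button is a no-op and the dialog stays open.
        if (path is null) return;
        _launch(path);
        _result.TrySetResult(choice);
    }
}
```

TrySetResult is used instead of SetResult so that a second button press after the dialog has resolved is ignored rather than throwing.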

Trigger logic (in IslandsShellViewModel)

  • A one-shot grace timer (~12s) started on construction/startup. When it elapses, if the worker is still offline (IsOffline — not connected and not reconnecting) and the prompt hasn't been shown yet (_connectionPromptShown), show the dialog once and set the flag.
  • If the worker connects before the grace elapses, the prompt is never shown.
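The trigger rules can be isolated from the view model so they are easy to unit-test. The following is a sketch under stated assumptions: ConnectionPromptTrigger and its members are hypothetical names, isOffline stands in for the IsOffline property, and showPrompt stands in for the dialog-show hook. It also includes the on-demand path used by the Offline pill, which bypasses the one-shot flag.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical extraction of the prompt trigger logic; not the actual view model code.
public sealed class ConnectionPromptTrigger
{
    private readonly Func<bool> _isOffline;   // stands in for IsOffline
    private readonly Func<Task> _showPrompt;  // stands in for the dialog-show hook
    private bool _connectionPromptShown;

    public ConnectionPromptTrigger(Func<bool> isOffline, Func<Task> showPrompt)
    {
        _isOffline = isOffline;
        _showPrompt = showPrompt;
    }

    // One-shot grace timer: wait, then evaluate the offline state once.
    public async Task RunGraceAsync(TimeSpan grace, CancellationToken ct = default)
    {
        await Task.Delay(grace, ct);
        await OnGraceElapsedAsync();
    }

    // Shows the prompt only if still offline and not shown before.
    public async Task OnGraceElapsedAsync()
    {
        if (_connectionPromptShown || !_isOffline()) return;
        _connectionPromptShown = true;
        await _showPrompt();
    }

    // Offline pill: opens the dialog on demand, independent of the one-shot flag.
    public Task OpenOnDemandAsync() => _showPrompt();
}
```

In the view model this would be started once with something like RunGraceAsync(TimeSpan.FromSeconds(12)). The flag is not synchronized, so the sketch assumes all calls happen on the UI thread.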

Clickable Offline pill (src/ClaudeDo.Ui/Views/MainWindow.axaml)

  • Turn the footer status pill into a button bound to a command that opens the same dialog on demand (independent of the one-shot flag), so the user can reopen guidance anytime while offline.

Component 4 — Dev

No code change (see Non-Goals).

Data Flow

Startup (production):
  Windows logon -> Startup-folder .lnk -> ClaudeDo.Worker.exe (WinExe, mutex-guarded)
  App launches -> WorkerClient connects to 127.0.0.1:47821
    connected within grace      -> Online pill, no prompt
    still offline after ~12s    -> WorkerConnectionModal (once)

User clicks Offline pill (anytime offline) -> WorkerConnectionModal
  Start Worker     -> Process.Start(worker exe)
  Rerun Installer  -> Process.Start(installer), Environment.Exit(0)
  Dismiss          -> close

Error Handling

  • Worker exe / installer not found (Locator.Find() returns null): the corresponding dialog button is a no-op (consistent with existing UpdateNow behavior); the dialog stays open so the user can pick another action.
  • Startup-shortcut creation failure in the installer: surfaced as a failed install step (StepResult.Fail), same as the current task-registration failure path.
  • Legacy scheduled-task deletion is best-effort and never fails the install.

Testing

  • Installer.Tests: RegisterAutostartStep creates the Startup .lnk at the expected path with the correct target, and issues the legacy-task delete command. UninstallRunner removes the Startup .lnk.
  • Ui.Tests: prompt trigger logic — grace elapsed while offline shows the prompt exactly once; a connection established before grace suppresses it; the clickable-pill command opens the dialog regardless of the one-shot flag. (Abstract the dialog-show hook so it can be asserted without real UI.)
  • Manual: dialog buttons (Start Worker / Rerun Installer / Dismiss) and the clickable Offline pill in a running App.