ClaudeDo/docs/superpowers/specs/2026-05-29-worker-per-user-autostart-design.md
mika kuns 26c4e5771b feat(worker): run worker as per-user logon task instead of Windows service
A LocalSystem Windows service can't see the logged-in user's Claude CLI
authentication, so the worker now runs as the current user via a hidden
per-user logon Scheduled Task with restart-on-failure.

- Worker is WinExe (no console window) with a Serilog rolling file sink and
  a single-instance mutex so the logon task, app ensure-running, and Restart
  button can't fight over the SignalR port.
- Installer replaces the service steps (register/start/stop) with autostart
  task steps, migrates the legacy ClaudeDoWorker service away on update, and
  removes the task on uninstall. ServicePage drops the service-account UI.
- UI gains a WorkerLocator; the app ensures the worker is running at startup
  and the Restart button kills+relaunches this install's worker process.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-30 09:39:41 +02:00

Worker per-user autostart (drop Windows service)

Status: approved 2026-05-29
Author: brainstorm session (mika kuns + Claude)

Problem

The worker runs as a Windows service registered under LocalSystem. The worker shells out to the claude CLI, whose authentication is stored per-user (%USERPROFILE%\.claude). Under LocalSystem the worker uses the system profile and cannot see the user's Claude login, so task execution fails. The installer even exposes a "Current User" service-account radio that the backend rejects (RegisterServiceStep fails the install). Net effect: the only installable configuration cannot authenticate Claude.

Goal

Run the worker as the logged-in user so it inherits the user's Claude auth, starting automatically at logon and staying alive in the background (independent of the desktop app, so Prime/scheduled tasks fire when the UI is closed).

Decisions (locked)

  1. Lifetime: background from logon, always — independent of the UI.
  2. Mechanism: per-user logon Scheduled Task (schtasks), run only when the user is logged on (no stored password), hidden, with restart-on-failure.
  3. No console window: worker becomes WinExe; add a Serilog rolling file sink so worker diagnostics aren't lost.
  4. App ensures running: "Restart Worker" becomes process-based; on app startup, if SignalR doesn't connect within a few seconds, the app launches the worker.
  5. Auto-migrate: the installer detects and removes the old ClaudeDoWorker service, then registers the task. Uninstall removes the task + kills the worker process.

Non-goals

  • Cross-account elevation (admin elevates as a different account than the interactive user). Single-user / user-is-admin is assumed; the task targets the interactive user.
  • Running the worker when no user is logged on (that's the whole point — it must be a user session for Claude auth).

Component changes

ClaudeDo.Worker

  • ClaudeDo.Worker.csproj: <OutputType>WinExe</OutputType>. Add packages Serilog.AspNetCore and Serilog.Sinks.File.
  • Program.cs:
    • Remove builder.Host.UseWindowsService(...).
    • Configure Serilog file sink: path <LogRoot>/worker-.log, rollingInterval: Day, retainedFileCountLimit: 7, shared write. LogRoot comes from WorkerConfig (expand ~). Wire via builder.Host.UseSerilog(...).
    • Single-instance guard: at startup create new Mutex(true, @"Local\ClaudeDoWorker", out var createdNew). If !createdNew, log "another worker instance is already running" and exit 0. Hold the mutex for process lifetime. Local\ namespace = per user session, which is what we want.
  • CLI preflight (ClaudeCliPreflight) behavior unchanged.
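
The single-instance guard described above could look roughly like the following. This is a minimal sketch, not the actual Program.cs: the `SingleInstanceGuard` class and `TryAcquire` helper are hypothetical names introduced for illustration, and only the mutex name `Local\ClaudeDoWorker` comes from this spec.

```csharp
#nullable enable
using System;
using System.Threading;

// Hypothetical helper (not part of the existing codebase).
public static class SingleInstanceGuard
{
    // Mutex name from the spec; the Local\ prefix scopes it to the user session.
    public const string DefaultName = @"Local\ClaudeDoWorker";

    // Returns the owned mutex when this process is the first instance,
    // or null when the named mutex already existed (another instance is running).
    // The caller must keep the returned mutex alive for the process lifetime.
    public static Mutex? TryAcquire(string name = DefaultName)
    {
        var mutex = new Mutex(true, name, out bool createdNew);
        if (createdNew)
        {
            return mutex;
        }

        // We only opened a handle to the existing mutex; release the handle.
        mutex.Dispose();
        return null;
    }
}
```

In Program.cs the worker would call `TryAcquire()` before building the host, log "another worker instance is already running" and return 0 when the result is null, and otherwise hold the returned mutex in a variable that stays reachable until the process exits.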

ClaudeDo.Installer

  • New Steps/RegisterAutostartStep.cs (IInstallStep, "Register Autostart"):
    • Build a Task Scheduler definition XML (UTF-16) and register via schtasks /Create /TN "ClaudeDoWorker" /XML "<tmpfile>" /F.
    • XML shape:
      • Principals/Principal: UserId = current interactive user (WindowsIdentity.GetCurrent().Name), LogonType=InteractiveToken, RunLevel=LeastPrivilege.
      • Triggers/LogonTrigger with the same UserId.
      • Settings: Hidden=true, MultipleInstancesPolicy=IgnoreNew, StartWhenAvailable=true, ExecutionTimeLimit=PT0S, DisallowStartIfOnBatteries=false, StopIfGoingOnBatteries=false, RestartOnFailure with Interval (>= PT1M; Task Scheduler's minimum granularity is one minute) and Count=3.
      • Actions/Exec/Command: quoted path to <installDir>/worker/ClaudeDo.Worker.exe.
    • The XML builder is a pure function (string in → XML string out) so it is unit testable without admin.
  • MigrateServiceStep (or folded into RegisterAutostartStep as a first phase): detect the old service via sc query ClaudeDoWorker; if present, sc stop then sc delete (poll for clearance like the old RegisterServiceStep did). No-op when the service doesn't exist (fresh installs).
  • Rename StopServiceStep → StopWorkerStep and StartServiceStep → StartWorkerStep, reworked to be process/task based:
    • Stop: schtasks /End /TN ClaudeDoWorker (ignore errors) + kill any ClaudeDo.Worker process whose MainModule.FileName is under the install dir; wait for exit. This unlocks worker/ binaries before extract.
    • Start: schtasks /Run /TN ClaudeDoWorker (preferred — launches as the task principal). Used by fresh install (so the worker runs immediately rather than waiting for next logon) and by Settings "restart".
  • Pages/ServicePage/ServicePageViewModel.cs: remove IsLocalSystem/IsCurrentUser radios and ServiceAccount usage. Keep SignalR port, Claude CLI path, "Start at logon" toggle (AutoStart), restart delay (maps to task RestartOnFailure/Interval, clamped to >= 1 min). Update ServicePageView.xaml accordingly. Remove ServiceAccount from InstallContext.
  • RegisterServiceStep.cs: deleted (replaced by RegisterAutostartStep).
  • Pipelines (InstallPageViewModel):
    • Fresh: DownloadAndExtract → WriteConfig → InitDatabase → RegisterAutostart (incl. migration no-op) → CreateShortcuts → WriteUninstallRegistry → WriteInstallManifest → StartWorker.
    • Update: StopWorker → DownloadAndExtract → RegisterAutostart (migrates old service) → StartWorker → WriteInstallManifest → WriteUninstallRegistry.
  • DI (App.xaml.cs): register the renamed/new steps (concrete + IInstallStep where needed, following the existing double-registration pattern).
  • Core/UninstallRunner.cs: replace sc delete ClaudeDoWorker with schtasks /Delete /TN ClaudeDoWorker /F and kill the worker process; also sc delete the legacy service best-effort (in case an old service still lingers).
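
As an illustration of the pure XML builder in RegisterAutostartStep, here is a minimal sketch using System.Xml.Linq. The class name `AutostartTaskXml`, the method names, and the task schema version `1.2` are assumptions made for this example. The element values follow the "XML shape" bullets above, and the element order mirrors what Task Scheduler typically exports, which should be checked against the schema before relying on it.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical builder (names are illustrative, not from the codebase).
public static class AutostartTaskXml
{
    // XML namespace of Task Scheduler task definitions.
    private static readonly XNamespace Ns =
        "http://schemas.microsoft.com/windows/2004/02/mit/task";

    // Task Scheduler restarts at minute granularity, so clamp to >= 1 minute.
    public static TimeSpan ClampRestartInterval(TimeSpan requested) =>
        requested < TimeSpan.FromMinutes(1) ? TimeSpan.FromMinutes(1) : requested;

    public static string Build(string userId, string workerExePath, TimeSpan restartDelay)
    {
        // XmlConvert emits an xs:duration string for the clamped interval.
        string interval = XmlConvert.ToString(ClampRestartInterval(restartDelay));

        var task = new XElement(Ns + "Task",
            new XAttribute("version", "1.2"),
            new XElement(Ns + "Triggers",
                new XElement(Ns + "LogonTrigger",
                    new XElement(Ns + "Enabled", "true"),
                    new XElement(Ns + "UserId", userId))),
            new XElement(Ns + "Principals",
                new XElement(Ns + "Principal",
                    new XAttribute("id", "Author"),
                    new XElement(Ns + "UserId", userId),
                    new XElement(Ns + "LogonType", "InteractiveToken"),
                    new XElement(Ns + "RunLevel", "LeastPrivilege"))),
            new XElement(Ns + "Settings",
                new XElement(Ns + "MultipleInstancesPolicy", "IgnoreNew"),
                new XElement(Ns + "DisallowStartIfOnBatteries", "false"),
                new XElement(Ns + "StopIfGoingOnBatteries", "false"),
                new XElement(Ns + "StartWhenAvailable", "true"),
                new XElement(Ns + "Hidden", "true"),
                new XElement(Ns + "ExecutionTimeLimit", "PT0S"),
                new XElement(Ns + "RestartOnFailure",
                    new XElement(Ns + "Interval", interval),
                    new XElement(Ns + "Count", "3"))),
            new XElement(Ns + "Actions",
                new XAttribute("Context", "Author"),
                new XElement(Ns + "Exec",
                    // Quoted so install paths containing spaces survive.
                    new XElement(Ns + "Command", "\"" + workerExePath + "\""))));

        // XElement.ToString() omits the declaration, so prepend it manually.
        return "<?xml version=\"1.0\" encoding=\"UTF-16\"?>"
            + Environment.NewLine + task.ToString();
    }
}
```

Because the declaration says UTF-16, the step would need to write the temp file with a matching encoding (for example `Encoding.Unicode`) before passing it to `schtasks /Create /XML`.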

ClaudeDo.Ui / ClaudeDo.App

  • New Services/WorkerLocator.cs: resolve <installDir>/worker/ClaudeDo.Worker.exe by walking up the directory tree looking for install.json, then falling back to the registry InstallLocation (mirrors InstallerLocator).
  • ViewModels/IslandsShellViewModel.cs:
    • RestartWorkerService: drop System.ServiceProcess.ServiceController. Kill worker process(es) under the install dir, then Process.Start(workerExe).
    • Ensure-running: on startup, if the WorkerClient connection isn't established within ~4s, launch the worker via WorkerLocator + Process.Start. Guarded so it runs at most once per app session.
  • Remove the System.ServiceProcess package reference / usings if no longer used.
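
The WorkerLocator lookup could be sketched as below. This is an assumption-laden illustration rather than the real class: `WorkerLocatorSketch` and its `Resolve` signature are hypothetical, and the registry fallback is passed in as a delegate so the sketch stays testable without touching the registry.

```csharp
#nullable enable
using System;
using System.IO;

// Hypothetical sketch of the lookup order described in the spec.
public static class WorkerLocatorSketch
{
    // startDir: directory to start walking up from (e.g. the app's base directory).
    // registryInstallLocation: optional fallback that returns the registry
    // InstallLocation value, or null when it is not set.
    public static string? Resolve(string startDir, Func<string?>? registryInstallLocation = null)
    {
        // 1. Walk up the directory tree until a folder containing install.json is found.
        for (var dir = new DirectoryInfo(startDir); dir != null; dir = dir.Parent)
        {
            if (File.Exists(Path.Combine(dir.FullName, "install.json")))
            {
                var exe = WorkerExeUnder(dir.FullName);
                if (exe != null)
                {
                    return exe;
                }
            }
        }

        // 2. Fall back to the registry InstallLocation, if the caller supplied one.
        var fromRegistry = registryInstallLocation?.Invoke();
        return string.IsNullOrEmpty(fromRegistry) ? null : WorkerExeUnder(fromRegistry);
    }

    // Returns <installDir>/worker/ClaudeDo.Worker.exe when it exists, else null.
    private static string? WorkerExeUnder(string installDir)
    {
        var exe = Path.Combine(installDir, "worker", "ClaudeDo.Worker.exe");
        return File.Exists(exe) ? exe : null;
    }
}
```

Returning null instead of throwing keeps the ensure-running path simple: when no worker executable can be found, the app just skips the launch.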

Data flow

  • Logon: Task Scheduler starts ClaudeDo.Worker.exe in the user session → mutex acquired → Serilog file logging → SignalR hub on 127.0.0.1:47821 → app connects.
  • App start with worker down: app waits ~4s for SignalR; if absent, Process.Start worker → mutex acquired → hub up → app connects.
  • Duplicate launch (task + app race): second instance fails the mutex → logs → exits 0.
  • Restart Worker button: kill worker proc → relaunch → mutex reacquired.
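
The "app start with worker down" flow can be sketched as a wait-with-timeout followed by a single launch attempt. Everything named here is hypothetical: the `EnsureWorkerRunning` class, the `connected` task (assumed to complete when the WorkerClient's SignalR connection is established), and the `launchWorker` callback (assumed to wrap WorkerLocator + Process.Start).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of the app-side ensure-running guard.
public sealed class EnsureWorkerRunning
{
    private int _attempted; // 0 = not yet run, 1 = already run this app session

    // Returns true when a launch was attempted, false otherwise.
    public async Task<bool> RunAsync(Task connected, TimeSpan timeout, Action launchWorker)
    {
        // Guard: at most one ensure-running pass per app session.
        if (Interlocked.Exchange(ref _attempted, 1) == 1)
        {
            return false;
        }

        // Wait for whichever finishes first: the connection or the timeout.
        var winner = await Task.WhenAny(connected, Task.Delay(timeout));
        if (winner == connected && connected.IsCompletedSuccessfully)
        {
            return false; // worker is already up, nothing to do
        }

        try
        {
            launchWorker();
        }
        catch (Exception)
        {
            // Swallowed on purpose: the logon task is the primary mechanism.
        }
        return true;
    }
}
```

If the launched worker loses the race against the logon task, the single-instance mutex makes the duplicate exit with code 0, so the app does not need to coordinate with Task Scheduler.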

Error handling

  • schtasks/sc calls go through the existing ProcessRunner; non-zero exits surface as StepResult.Fail with the captured output, except for best-effort cleanup calls, whose failures are ignored.
  • Worker single-instance: losing the mutex is a normal, non-error exit (code 0).
  • App ensure-running: Process.Start failures are swallowed (the logon task is the primary mechanism; the app launch is a convenience).

Testing

  • Unit (no admin required):
    • Task-definition XML builder: asserts UserId, LogonType, Hidden, RestartOnFailure interval clamping, quoted command path.
    • WorkerLocator: path resolution via temp install.json.
    • Migration decision: given sc query output (exists / not-found), decide stop+delete vs no-op — keep the decision pure, mock ProcessRunner output.
    • Restart-delay → task interval clamping (< 1 min → PT1M).
  • Manual verification (post-build, on this machine):
    1. Update from installed 1.0.2-alpha: old service is removed (sc query ClaudeDoWorker → not found), task exists (schtasks /Query /TN ClaudeDoWorker), worker process runs as the user, app connects, no console window.
    2. Worker log file appears at ~/.todo-app/logs/worker-<date>.log.
    3. Kill worker → click Restart Worker in app → reconnects.
    4. Close app, confirm worker still running (Prime/queue alive); reopen app → connects.
    5. Log off / log on → worker autostarts.
    6. Uninstall → task gone, worker process gone (data kept unless opted out).
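
The pure migration decision listed under the unit tests might be as small as the sketch below. It rests on an assumption that should be verified on a real machine: that `sc query <name>` exits with the Win32 error code, so a missing service yields 1060 (ERROR_SERVICE_DOES_NOT_EXIST). Keying on the exit code rather than the output text avoids depending on localized messages. The `MigrationAction` enum, the `LegacyServiceMigration` class, and the extra `Fail` outcome are hypothetical additions for illustration.

```csharp
// Hypothetical outcome of inspecting the legacy ClaudeDoWorker service.
public enum MigrationAction
{
    NoOp,          // service not installed (fresh install)
    StopAndDelete, // service present: sc stop, then sc delete
    Fail           // unexpected sc query result; surface as a step failure
}

public static class LegacyServiceMigration
{
    // Win32 ERROR_SERVICE_DOES_NOT_EXIST.
    // Assumption: sc.exe returns this value as its process exit code.
    public const int ServiceDoesNotExist = 1060;

    // Pure decision: maps the exit code of "sc query ClaudeDoWorker" to an action.
    public static MigrationAction Decide(int scQueryExitCode)
    {
        switch (scQueryExitCode)
        {
            case 0:
                return MigrationAction.StopAndDelete;
            case ServiceDoesNotExist:
                return MigrationAction.NoOp;
            default:
                return MigrationAction.Fail;
        }
    }
}
```

A unit test can then feed canned exit codes from a mocked ProcessRunner into `Decide` without needing admin rights or a registered service.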

Risks

  • Task restart granularity is minutes, not the old seconds-level service restart. The worker's own long-running resilience + the app ensure-running cover short gaps; acceptable.
  • Elevated installer must target the interactive user. Using WindowsIdentity.GetCurrent().Name is correct when the user elevates themselves (the assumed single-user case). Documented non-goal otherwise.