feat(worker): run worker as per-user logon task instead of Windows service
A LocalSystem Windows service can't see the logged-in user's Claude CLI authentication, so the worker now runs as the current user via a hidden per-user logon Scheduled Task with restart-on-failure.

- Worker is WinExe (no console window) with a Serilog rolling file sink and a single-instance mutex so the logon task, app ensure-running, and Restart button can't fight over the SignalR port.
- Installer replaces the service steps (register/start/stop) with autostart task steps, migrates the legacy ClaudeDoWorker service away on update, and removes the task on uninstall. ServicePage drops the service-account UI.
- UI gains a WorkerLocator; the app ensures the worker is running at startup and the Restart button kills+relaunches this install's worker process.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -0,0 +1,165 @@
# Worker per-user autostart (drop Windows service)

Status: approved 2026-05-29
Author: brainstorm session (mika kuns + Claude)

## Problem

The worker runs as a Windows **service** registered under `LocalSystem`. The worker
shells out to the `claude` CLI, whose authentication is stored per-user
(`%USERPROFILE%\.claude`). Under `LocalSystem` the worker uses the system profile and
cannot see the user's Claude login, so task execution fails. The installer even exposes a
"Current User" service-account radio that the backend rejects (`RegisterServiceStep`
fails the install). Net effect: the only installable configuration cannot authenticate
Claude.

## Goal

Run the worker as the logged-in **user** so it inherits the user's Claude auth, starting
automatically at logon and staying alive in the background (independent of the desktop
app, so Prime/scheduled tasks fire when the UI is closed).

## Decisions (locked)

1. **Lifetime:** background from logon, always, independent of the UI.
2. **Mechanism:** per-user **logon Scheduled Task** (`schtasks`), run only when the user is
   logged on (no stored password), hidden, with restart-on-failure.
3. **No console window:** worker becomes `WinExe`; add a **Serilog rolling file sink** so
   worker diagnostics aren't lost.
4. **App ensures running:** "Restart Worker" becomes process-based; on app startup, if
   SignalR doesn't connect within a few seconds, the app launches the worker.
5. **Auto-migrate:** the installer detects and removes the old `ClaudeDoWorker` service,
   then registers the task. Uninstall removes the task and kills the worker process.

## Non-goals

- Cross-account elevation (admin elevates as a *different* account than the interactive
  user). Single-user / user-is-admin is assumed; the task targets the interactive user.
- Running the worker when no user is logged on (that's the whole point: it must be a user
  session for Claude auth).

---

## Component changes

### ClaudeDo.Worker

- **`ClaudeDo.Worker.csproj`**: `<OutputType>WinExe</OutputType>`. Add packages
  `Serilog.AspNetCore` and `Serilog.Sinks.File`.
- **`Program.cs`**:
  - Remove `builder.Host.UseWindowsService(...)`.
  - Configure Serilog file sink: path `<LogRoot>/worker-.log`, `rollingInterval: Day`,
    `retainedFileCountLimit: 7`, shared write. `LogRoot` comes from `WorkerConfig`
    (expand `~`). Wire via `builder.Host.UseSerilog(...)`.
  - **Single-instance guard:** at startup create
    `new Mutex(true, @"Local\ClaudeDoWorker", out var createdNew)`. If `!createdNew`, log
    "another worker instance is already running" and exit 0. Hold the mutex for process
    lifetime. The `Local\` namespace is scoped to the user's session, which is what we
    want.
  - CLI preflight (`ClaudeCliPreflight`) behavior unchanged.
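The single-instance guard described above could look roughly like the following. This is a minimal sketch, not the actual `Program.cs`: the `SingleInstanceGuard` class and its `TryAcquire` method are hypothetical names introduced for illustration; only the mutex name and the "exit 0 when not first" behavior come from this spec.

```csharp
#nullable enable
using System;
using System.Threading;

// Hypothetical helper (not part of the spec) wrapping the named-mutex check.
public static class SingleInstanceGuard
{
    // Mutex name from the spec; the "Local\" prefix scopes it to the current session.
    public const string MutexName = @"Local\ClaudeDoWorker";

    // Returns the owned mutex when this is the first instance,
    // or null when the name is already held by another instance.
    public static Mutex? TryAcquire(string name = MutexName)
    {
        var mutex = new Mutex(true, name, out bool createdNew);
        if (createdNew)
            return mutex;

        // We only opened a handle to an existing mutex and do not own it.
        mutex.Dispose();
        return null;
    }
}
```

In `Program.cs` the returned mutex would be kept in a local for the whole of `Main` (for example `using var guard = SingleInstanceGuard.TryAcquire();`), and a `null` result would log the "another worker instance is already running" message and return 0.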

### ClaudeDo.Installer

- **New `Steps/RegisterAutostartStep.cs`** (`IInstallStep`, "Register Autostart"):
  - Build a Task Scheduler **definition XML** (UTF-16) and register via
    `schtasks /Create /TN "ClaudeDoWorker" /XML "<tmpfile>" /F`.
  - XML shape:
    - `Principals/Principal`: `UserId` = current interactive user
      (`WindowsIdentity.GetCurrent().Name`), `LogonType=InteractiveToken`,
      `RunLevel=LeastPrivilege`.
    - `Triggers/LogonTrigger` with the same `UserId`.
    - `Settings`: `Hidden=true`, `MultipleInstancesPolicy=IgnoreNew`,
      `StartWhenAvailable=true`, `ExecutionTimeLimit=PT0S`,
      `DisallowStartIfOnBatteries=false`, `StopIfGoingOnBatteries=false`,
      `RestartOnFailure` with `Interval` (>= `PT1M`; Task Scheduler's minimum granularity
      is one minute) and `Count=3`.
    - `Actions/Exec/Command`: quoted path to `<installDir>/worker/ClaudeDo.Worker.exe`.
  - The XML builder is a **pure function** (string in → XML string out) so it is unit
    testable without admin.
- **`MigrateServiceStep`** (or folded into `RegisterAutostartStep` as a first phase):
  detect the old service via `sc query ClaudeDoWorker`; if present, `sc stop` then
  `sc delete` (poll for clearance like the old `RegisterServiceStep` did). No-op when the
  service doesn't exist (fresh installs).
- **Rename `StopServiceStep` → `StopWorkerStep`, `StartServiceStep` → `StartWorkerStep`**,
  reworked to be process/task based:
  - Stop: `schtasks /End /TN ClaudeDoWorker` (ignore errors) + kill any
    `ClaudeDo.Worker` process whose `MainModule.FileName` is under the install dir;
    wait for exit. This unlocks `worker/` binaries before extract.
  - Start: `schtasks /Run /TN ClaudeDoWorker` (preferred, since it launches as the task
    principal). Used by fresh install (so the worker runs immediately rather than waiting
    for next logon) and by Settings "restart".
- **`Pages/ServicePage/ServicePageViewModel.cs`**: remove `IsLocalSystem`/`IsCurrentUser`
  radios and `ServiceAccount` usage. Keep SignalR port, Claude CLI path, "Start at logon"
  toggle (`AutoStart`), restart delay (maps to task `RestartOnFailure/Interval`, clamped
  to >= 1 min). Update `ServicePageView.xaml` accordingly. Remove `ServiceAccount` from
  `InstallContext`.
- **`RegisterServiceStep.cs`**: deleted (replaced by `RegisterAutostartStep`).
- **Pipelines (`InstallPageViewModel`)**:
  - Fresh: DownloadAndExtract → WriteConfig → InitDatabase → **RegisterAutostart** (incl.
    migration no-op) → CreateShortcuts → WriteUninstallRegistry → WriteInstallManifest →
    **StartWorker**.
  - Update: **StopWorker** → DownloadAndExtract → **RegisterAutostart** (migrates old
    service) → **StartWorker** → WriteInstallManifest → WriteUninstallRegistry.
- **DI (`App.xaml.cs`)**: register the renamed/new steps (concrete + `IInstallStep` where
  needed, following the existing double-registration pattern).
- **`Core/UninstallRunner.cs`**: replace `sc delete ClaudeDoWorker` with
  `schtasks /Delete /TN ClaudeDoWorker /F` and kill the worker process; also `sc delete`
  the legacy service best-effort (in case an old service still lingers).
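As an illustration of the pure XML builder and the restart-delay clamping described above, here is a minimal sketch. The class name `AutostartTaskXml` and its method names are hypothetical. The namespace URI, the `version="1.2"` attribute, the `Author` principal id, and the element order follow the Task Scheduler schema as commonly exported by `schtasks /Query /XML`, but they are assumptions here and should be validated by actually registering the output with `schtasks /Create /XML`.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical pure builder: inputs in, task-definition XML string out.
public static class AutostartTaskXml
{
    // Task Scheduler definition namespace (assumed from exported task XML).
    private static readonly XNamespace Ns =
        "http://schemas.microsoft.com/windows/2004/02/mit/task";

    // Clamp to the one-minute minimum and format as an xs:duration string.
    public static string ToRestartInterval(TimeSpan delay)
    {
        var min = TimeSpan.FromMinutes(1);
        return XmlConvert.ToString(delay < min ? min : delay);
    }

    public static string Build(string userId, string workerExePath, TimeSpan restartDelay)
    {
        var task = new XElement(Ns + "Task",
            new XAttribute("version", "1.2"),
            new XElement(Ns + "Triggers",
                new XElement(Ns + "LogonTrigger",
                    new XElement(Ns + "Enabled", "true"),
                    new XElement(Ns + "UserId", userId))),
            new XElement(Ns + "Principals",
                new XElement(Ns + "Principal",
                    new XAttribute("id", "Author"),
                    new XElement(Ns + "UserId", userId),
                    new XElement(Ns + "LogonType", "InteractiveToken"),
                    new XElement(Ns + "RunLevel", "LeastPrivilege"))),
            new XElement(Ns + "Settings",
                new XElement(Ns + "MultipleInstancesPolicy", "IgnoreNew"),
                new XElement(Ns + "DisallowStartIfOnBatteries", "false"),
                new XElement(Ns + "StopIfGoingOnBatteries", "false"),
                new XElement(Ns + "StartWhenAvailable", "true"),
                new XElement(Ns + "Hidden", "true"),
                new XElement(Ns + "ExecutionTimeLimit", "PT0S"),
                new XElement(Ns + "RestartOnFailure",
                    new XElement(Ns + "Interval", ToRestartInterval(restartDelay)),
                    new XElement(Ns + "Count", "3"))),
            new XElement(Ns + "Actions",
                new XAttribute("Context", "Author"),
                new XElement(Ns + "Exec",
                    // Quoted so paths containing spaces survive.
                    new XElement(Ns + "Command", "\"" + workerExePath + "\""))));

        // The declaration says UTF-16, so the caller should write the temp
        // file with Encoding.Unicode to match.
        var declaration = new XDeclaration("1.0", "UTF-16", null);
        return declaration.ToString() + Environment.NewLine + task.ToString();
    }
}
```

Because the builder takes plain strings and a `TimeSpan`, a unit test can parse the result and assert on `UserId`, `Hidden`, the clamped `Interval`, and the quoted `Command` without touching Task Scheduler.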

### ClaudeDo.Ui / ClaudeDo.App

- **New `Services/WorkerLocator.cs`**: resolve `<installDir>/worker/ClaudeDo.Worker.exe`
  by walking up for `install.json` then registry `InstallLocation` (mirrors
  `InstallerLocator`).
- **`ViewModels/IslandsShellViewModel.cs`**:
  - `RestartWorkerService`: drop `System.ServiceProcess.ServiceController`. Kill worker
    process(es) under the install dir, then `Process.Start(workerExe)`.
  - **Ensure-running:** on startup, if the `WorkerClient` connection isn't established
    within ~4s, launch the worker via `WorkerLocator` + `Process.Start`. Guarded so it
    runs at most once per app session.
- Remove the `System.ServiceProcess` package reference / usings if no longer used.
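The walk-up part of `WorkerLocator` might be sketched as follows. The class and method names are hypothetical, the registry `InstallLocation` fallback is deliberately left to the caller, and the sketch assumes `install.json` sits in the install root next to the `worker/` folder, which this spec implies but does not state outright.

```csharp
#nullable enable
using System;
using System.IO;

// Hypothetical sketch of the install.json walk-up; registry fallback omitted.
public static class WorkerLocatorSketch
{
    // Walks from startDir toward the filesystem root looking for install.json.
    // Returns the worker exe path under that directory, or null if not found.
    public static string? FindWorkerExe(string startDir)
    {
        for (var dir = new DirectoryInfo(startDir); dir != null; dir = dir.Parent)
        {
            if (!File.Exists(Path.Combine(dir.FullName, "install.json")))
                continue;

            var exe = Path.Combine(dir.FullName, "worker", "ClaudeDo.Worker.exe");
            return File.Exists(exe) ? exe : null;
        }

        // No install.json above startDir: the caller would try the registry next.
        return null;
    }
}
```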

---

## Data flow

- **Logon:** Task Scheduler starts `ClaudeDo.Worker.exe` in the user session → mutex
  acquired → Serilog file logging → SignalR hub on `127.0.0.1:47821` → app connects.
- **App start with worker down:** app waits ~4s for SignalR; if absent, `Process.Start`
  worker → mutex acquired → hub up → app connects.
- **Duplicate launch (task + app race):** second instance fails the mutex → logs → exits 0.
- **Restart Worker button:** kill worker proc → relaunch → mutex reacquired.
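The "app start with worker down" flow can be sketched as a wait-with-timeout followed by a guarded launch. This is an illustration under assumptions: `EnsureWorkerRunning` is a hypothetical class, and the two delegates stand in for the real `WorkerClient` connection wait and the `WorkerLocator` + `Process.Start` launch, whose actual signatures this spec does not define.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of the once-per-session ensure-running check.
public sealed class EnsureWorkerRunning
{
    private readonly Func<Task> _waitForConnection; // stand-in: completes when connected
    private readonly Action _launchWorker;          // stand-in: locate exe + Process.Start
    private readonly TimeSpan _timeout;
    private int _attempted;                         // 0 = not yet run, 1 = already run

    public EnsureWorkerRunning(Func<Task> waitForConnection, Action launchWorker, TimeSpan timeout)
    {
        _waitForConnection = waitForConnection;
        _launchWorker = launchWorker;
        _timeout = timeout;
    }

    // Returns true only when this call attempted to launch the worker.
    public async Task<bool> RunOnceAsync()
    {
        // Guard: at most one attempt per app session, even with concurrent callers.
        if (Interlocked.Exchange(ref _attempted, 1) == 1)
            return false;

        var connected = _waitForConnection();
        var winner = await Task.WhenAny(connected, Task.Delay(_timeout));
        if (winner == connected)
            return false; // the worker answered in time, nothing to do

        try
        {
            _launchWorker();
        }
        catch (Exception)
        {
            // Swallowed on purpose: the logon task is the primary mechanism.
        }
        return true;
    }
}
```

If the logon task and this launch race each other, the worker's single-instance mutex resolves it: the second process exits 0 and the app connects to whichever instance won.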

## Error handling

- `schtasks`/`sc` calls go through the existing `ProcessRunner`; non-zero exits surface as
  `StepResult.Fail` with the captured output (except best-effort cleanup which is ignored).
- Worker single-instance: losing the mutex is a normal, non-error exit (code 0).
- App ensure-running: `Process.Start` failures are swallowed (the logon task is the primary
  mechanism; the app launch is a convenience).

## Testing

- **Unit (no admin required):**
  - Task-definition XML builder: asserts UserId, LogonType, Hidden, RestartOnFailure
    interval clamping, quoted command path.
  - `WorkerLocator`: path resolution via temp `install.json`.
  - Migration decision: given `sc query` output (exists / not-found), decide stop+delete vs
    no-op; keep the decision pure, mock `ProcessRunner` output.
  - Restart-delay → task interval clamping (`< 1 min` → `PT1M`).
- **Manual verification (post-build, on this machine):**
  1. Update from installed `1.0.2-alpha`: old service is removed (`sc query ClaudeDoWorker`
     → not found), task exists (`schtasks /Query /TN ClaudeDoWorker`), worker process runs
     as the user, app connects, no console window.
  2. Worker log file appears at `~/.todo-app/logs/worker-<date>.log`.
  3. Kill worker → click Restart Worker in app → reconnects.
  4. Close app, confirm worker still running (Prime/queue alive); reopen app → connects.
  5. Log off / log on → worker autostarts.
  6. Uninstall → task gone, worker process gone (data kept unless opted out).
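One way to keep the migration decision pure, as the unit-test list above asks, is to decide from the `sc query` exit code alone. The sketch below is hypothetical: `ServiceMigration` and `MigrationAction` are illustrative names, and it assumes `ProcessRunner` exposes the process exit code. It keys on exit code 1060 (Win32 `ERROR_SERVICE_DOES_NOT_EXIST`) rather than on the output text, which is localized.

```csharp
using System;

// Hypothetical outcome of the migration check.
public enum MigrationAction
{
    NoOp,          // legacy service not installed (fresh install)
    StopAndDelete  // legacy service present: sc stop, then sc delete
}

public static class ServiceMigration
{
    // Win32 ERROR_SERVICE_DOES_NOT_EXIST, returned by sc.exe as its exit code.
    private const int ServiceDoesNotExist = 1060;

    // Pure decision: the caller passes the exit code of `sc query ClaudeDoWorker`.
    public static MigrationAction Decide(int scQueryExitCode)
    {
        if (scQueryExitCode == 0)
            return MigrationAction.StopAndDelete;

        if (scQueryExitCode == ServiceDoesNotExist)
            return MigrationAction.NoOp;

        // Anything else (for example access denied) is unexpected; the step
        // would surface it as a failure instead of guessing.
        throw new InvalidOperationException(
            $"Unexpected 'sc query' exit code {scQueryExitCode}.");
    }
}
```

A test can then feed in `0`, `1060`, and an arbitrary other code without running `sc` or needing admin rights.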

## Risks

- **Task restart granularity is minutes**, not the old seconds-level service restart. The
  worker's own long-running resilience and the app's ensure-running check cover short
  gaps; acceptable.
- **Elevated installer must target the interactive user.** Using
  `WindowsIdentity.GetCurrent().Name` is correct when the user elevates themselves (the
  assumed single-user case). Documented non-goal otherwise.