From cf7a6e413c2a4206d4d98f74e247f70feb960cec Mon Sep 17 00:00:00 2001 From: Mika Kuns Date: Mon, 27 Apr 2026 10:52:55 +0200 Subject: [PATCH] docs(superpowers): add session prompts for worker state consolidation slices 2-6 Self-contained prompts to paste into fresh sessions, one per remaining slice. Each prompt includes scope, allowed transitions, caller-migration list, test expectations, and the conventional-commit message to use. --- ...ker-state-consolidation-session-prompts.md | 225 ++++++++++++++++++ 1 file changed, 225 insertions(+) create mode 100644 docs/superpowers/plans/2026-04-27-worker-state-consolidation-session-prompts.md diff --git a/docs/superpowers/plans/2026-04-27-worker-state-consolidation-session-prompts.md b/docs/superpowers/plans/2026-04-27-worker-state-consolidation-session-prompts.md new file mode 100644 index 0000000..ada1dd7 --- /dev/null +++ b/docs/superpowers/plans/2026-04-27-worker-state-consolidation-session-prompts.md @@ -0,0 +1,225 @@ +# Session Prompts — Worker State & Queue Consolidation Slices 2–6 + +Paste-ready prompts for each remaining slice. Run **one slice per session** so the diff stays reviewable and tests stay green between commits. Spec lives at `docs/superpowers/specs/2026-04-27-worker-state-and-queue-consolidation-design.md` — reference it when the prompt asks. + +**Common ground rules** (carry across all slices): + +- Direct on `main`, one commit per slice, conventional commit messages. +- Build green (`dotnet build src/ClaudeDo.Worker/ClaudeDo.Worker.csproj` + Data + Ui) before commit. +- Pre-existing test errors (TaskRunner/WorkerHub constructor drift in 4 test files) are **not** in scope to fix — they exist on `main` already. New compile errors my changes introduce ARE in scope. +- No drive-by refactors outside the slice's stated scope. +- New files must follow existing naming/folder conventions; legacy enum values stay until Slice 6. +- After each slice, update `~/.claude/projects/C--Private-ClaudeDo/memory/` if I learn something durable about the codebase. + +--- + +## Slice 2 — `TaskStateService` (centralized state machine) + +**Prompt to paste into a fresh session:** + +> Slice 2 of the worker state consolidation refactor. Spec: `docs/superpowers/specs/2026-04-27-worker-state-and-queue-consolidation-design.md` (sections 2 and 8). Slice 1 already landed (commit 7b737e6) — `TaskStatus` has `Idle`/`Cancelled`, `PlanningPhase` enum exists, `BlockedByTaskId` field exists. Legacy enum values still around. +> +> **Goal:** introduce `Worker/State/ITaskStateService` + `TaskStateService` as the single component that mutates `Status`, `PlanningPhase`, `BlockedByTaskId`. Migrate every existing caller. Mark repo `Mark*Async` helpers `internal`. +> +> **Public surface (verbatim from spec):** +> ```csharp +> Task EnqueueAsync(string taskId, CancellationToken ct); +> Task StartRunningAsync(string taskId, DateTime startedAt, CancellationToken ct); +> Task CompleteAsync(string taskId, DateTime finishedAt, string? result, CancellationToken ct); +> Task FailAsync(string taskId, DateTime finishedAt, string? error, CancellationToken ct); +> Task CancelAsync(string taskId, DateTime finishedAt, CancellationToken ct); +> Task ResetToIdleAsync(string taskId, CancellationToken ct); +> Task StartPlanningAsync(string parentId, CancellationToken ct); +> Task FinalizePlanningAsync(string parentId, CancellationToken ct); +> Task BlockOnAsync(string taskId, string predecessorTaskId, CancellationToken ct); +> Task UnblockAsync(string taskId, CancellationToken ct); +> Task RecoverStaleRunningAsync(string reason, CancellationToken ct); +> ``` +> +> **Allowed transition table:** see spec §2. Reject invalid transitions with `TransitionResult(false, "")` — no exceptions. Each transition is one atomic `ExecuteUpdate` with `WHERE Status = ` for TOCTOU-freedom. +> +> **Side effects after successful DB write** (do these inside the service so callers don't need to remember): +> - On any `→ Queued`: call `_queue.WakeQueue()` directly for now (Slice 3 will replace with `IQueueWaker`). Inject `QueueService` lazily via `Func` to break the DI cycle if needed. +> - On any successful transition: `_broadcaster.TaskUpdated(taskId)`. +> - On `Done`/`Failed`/`Cancelled` for a child task: invoke `_chain.OnChildFinishedAsync(taskId, finalStatus, ct)`. If it returns a next-task-id, call `UnblockAsync` on it. Then run `_repo.TryCompleteParentAsync(parentId, ct)`. +> +> **Important:** `BlockOnAsync` and `UnblockAsync` should write `BlockedByTaskId` directly. `EnqueueAsync` for a Planning child should keep `BlockedByTaskId` null when it's the head of the chain. The chain coordinator will compose these calls in Slice 4 — for now just expose the API. +> +> **Caller migration (mechanical — preserve current behavior):** +> - `TaskRunner.HandleSuccess` → replace `taskRepo.MarkDoneAsync` + `TryCompleteParentAsync` + `_chain.OnChildFinishedAsync` block with a single `_state.CompleteAsync(taskId, finishedAt, result, CancellationToken.None)`. +> - `TaskRunner.HandleFailure` → `_state.FailAsync(taskId, finishedAt, errorMarkdown, CancellationToken.None)`. +> - `TaskRunner.MarkFailed` (early-fail path) → same. +> - `TaskRunner.RunAsync` start of run → `_state.StartRunningAsync(taskId, startedAt, ct)`. +> - `StaleTaskRecovery.StartAsync` → `_state.RecoverStaleRunningAsync("worker restart", ct)`. +> - `TaskResetService.ResetAsync` → `_state.ResetToIdleAsync(taskId, ct)` for the status flip; service keeps owning worktree cleanup. +> - `PlanningSessionManager.StartAsync` (the `SetPlanningStartedAsync` call) → `_state.StartPlanningAsync(parentId, ct)`. The manager still owns token/session-dir setup; only the status flip moves. +> - `PlanningChainCoordinator.OnChildFinishedAsync` (the `next.Status = TaskStatus.Queued` write) → keep its existing logic but use `_state.UnblockAsync(next.Id, ct)` for the actual write. The Slice 4 rewrite finishes the rest. +> - `ExternalMcpService.UpdateTaskStatus` (status flip in the Queued case) → `_state.EnqueueAsync(taskId, ct)`. The Manual case stays as-is until Slice 6 since `Manual` is still a valid legacy value. +> +> **Repo helpers to mark `internal`:** `MarkRunningAsync`, `MarkDoneAsync`, `MarkFailedAsync`, `FlipAllRunningToFailedAsync`. Verify nothing outside `ClaudeDo.Worker.State` calls them after migration. (`Worker.Tests` may need `InternalsVisibleTo` — add it if so.) +> +> **DI wiring:** register `TaskStateService` as Singleton in `Program.cs` for both the main app and the external-MCP app. The service holds no per-request state. +> +> **Tests:** new file `tests/ClaudeDo.Worker.Tests/State/TaskStateServiceTests.cs`. At minimum: +> - Happy path for each transition (verify DB state + side-effect mocks invoked). +> - Reject path for each invalid transition (verify result + DB unchanged). +> - Concurrency: two parallel `StartRunningAsync` for the same `Queued` task → exactly one returns `Ok=true`. +> - Mock or fake the broadcaster, queue, and chain-coordinator dependencies. Use real SQLite for the DB (existing test pattern). +> +> Build all projects, run the worker test project (the 4 pre-existing constructor-drift errors are out of scope — but my changes shouldn't add new errors), commit as `refactor(worker/state): introduce TaskStateService and route mutations through it`. + +--- + +## Slice 3 — `IQueueWaker` + `IQueuePicker` + +**Prompt to paste into a fresh session:** + +> Slice 3 of the worker state consolidation refactor. Spec: `docs/superpowers/specs/2026-04-27-worker-state-and-queue-consolidation-design.md` (section 3). Slices 1 and 2 already landed. +> +> **Goal:** extract queue-wake and queue-pick from `QueueService` and `TaskRepository` into dedicated single-responsibility components. Make wakes automatic. +> +> **New components in `Worker/Queue/`:** +> - `IQueueWaker` (interface, `void Wake()`). Backed by `QueueWaker` singleton holding the existing `SemaphoreSlim`. Inject into `TaskStateService` (replaces the direct `QueueService` ref from Slice 2) and into `QueueService` itself. +> - `IQueuePicker` with `Task ClaimNextAsync(DateTime now, CancellationToken ct)`. Implementation `QueuePicker` moves the raw SQL out of `TaskRepository.GetNextQueuedAgentTaskAsync` and **adds a `blocked_by_task_id IS NULL` filter to the WHERE clause**. Order stays `sort_order ASC, created_at ASC` (verify the existing query — add ORDER BY if missing). Atomic `UPDATE … RETURNING` flips `Queued → Running` and writes `started_at`. +> +> **Caller updates:** +> - `TaskStateService` swaps its `Func` for `IQueueWaker`. The `→ Queued` side-effect now calls `_waker.Wake()`. +> - `QueueService.ExecuteAsync` calls `_picker.ClaimNextAsync` instead of `_taskRepo.GetNextQueuedAgentTaskAsync`. The slot-claim, broadcaster, and `WakeQueue()` after slot release stay where they are. +> - `WorkerHub.WakeQueue()` and `ExternalMcpService.WakeQueue` calls in app code → remove the explicit invocations. The state-service triggers waking automatically. **Keep** the SignalR/MCP endpoint that exposes `WakeQueue()` for diagnostics/manual use — that one delegates to `_waker.Wake()`. +> - `TaskRepository.GetNextQueuedAgentTaskAsync` becomes a thin shim that forwards to `IQueuePicker` for any remaining tests, OR delete it and update tests to use the picker. Prefer delete if tests are easy to migrate. +> +> **Tests:** new `tests/ClaudeDo.Worker.Tests/Queue/QueuePickerTests.cs`: +> - Skipped: `BlockedByTaskId` set; missing agent tag; `scheduled_for > now`; status not Queued. +> - Picked: correct order (`sort_order, created_at`). +> - Atomic claim: two parallel pickers → exactly one row returned non-null, the other null. +> +> Update existing `TaskRepositoryTests.GetNextQueuedAgentTaskAsync_*` tests if they exercised the removed method. +> +> Build, test, commit as `refactor(worker/queue): split queue waker and picker, auto-wake on enqueue`. + +--- + +## Slice 4 — Planning flow consolidation (kills the original bug) + +**Prompt to paste into a fresh session:** + +> Slice 4 of the worker state consolidation refactor. Spec: `docs/superpowers/specs/2026-04-27-worker-state-and-queue-consolidation-design.md` (section 4). Slices 1–3 already landed. **This slice eliminates the original "queue never picks up planning tasks" bug structurally.** +> +> **Goal:** one path through planning. Delete the dual-flow problem. +> +> **Changes:** +> - **Delete** `TaskRepository.FinalizePlanningAsync` entirely. Also delete its tests in `TaskRepositoryPlanningTests.cs`. +> - **Rewrite** `PlanningSessionManager.FinalizeAsync(taskId, queueAgentTasks, ct)`: +> 1. `_state.FinalizePlanningAsync(parentId, ct)` (sets parent `PlanningPhase=Finalized`, `Status=Idle`). +> 2. If `queueAgentTasks` is true, call the new `_chainCoordinator.SetupChainAsync(parentId, ct)`. +> 3. Existing worktree-cleanup + session-dir-deletion remains. +> 4. Return the count of children that ended up in the chain. +> - **Rename** `PlanningChainCoordinator.QueueSubtasksSequentiallyAsync` → `SetupChainAsync`. Make it `internal`. New behavior: +> - Eligibility check: children must be in `Status=Idle` (was `Manual` or `Planned` legacy values — keep tolerating those for one slice via OR). +> - Auto-attach `agent` tag to all children (already in WIP — keep that behavior). +> - For first child: `_state.EnqueueAsync(child[0].Id, ct)` (no BlockedBy, head of chain). +> - For rest: `_state.EnqueueAsync(child[i].Id, ct)` followed immediately by `_state.BlockOnAsync(child[i].Id, child[i-1].Id, ct)`. (Or: add a single `EnqueueBlockedAsync` helper to TaskStateService if call-site clutter bothers you.) +> - **Update** `PlanningChainCoordinator.OnChildFinishedAsync`: replace status-via-LINQ logic with: query for the next child where `BlockedByTaskId == childTaskId`, call `_state.UnblockAsync` on it. Drop the `Waiting` lookup entirely. +> - Audit `Status == TaskStatus.Waiting` in UI/tests — replace with `Status == Queued && BlockedByTaskId != null`. (UI changes confirmed against `TaskRowViewModel`, `TasksIslandViewModel` from Slice 1's WIP.) +> +> **Regression test:** new `tests/ClaudeDo.Worker.Tests/Planning/PlanningEndToEndTests.cs` (or extend existing) — `Active` parent + 3 drafts → call `FinalizeAsync(queueAgentTasks: true)` → assert within 200 ms the first child has `Status=Running` (queue picker claimed it) without anyone calling `WakeQueue()` manually. This was the bug the user originally reported. +> +> **Update** `PlanningMcpService.EditableStatuses` — replace `Waiting` with `Queued` (since blocked tasks are now `Queued + BlockedByTaskId`). Verify the MCP tool still gates on `parent.PlanningPhase == Active` (legacy: `parent.Status == Planning`). +> +> Build, test, commit as `feat(planning): consolidate finalize+chain via TaskStateService, fix queue pickup`. + +--- + +## Slice 5 — `OverrideSlotService` + folder reorg + +**Prompt to paste into a fresh session:** + +> Slice 5 of the worker state consolidation refactor. Spec: `docs/superpowers/specs/2026-04-27-worker-state-and-queue-consolidation-design.md` (section 5). Slices 1–4 already landed. +> +> **Goal:** split the override slot out of QueueService and reorganize `Worker/Services/` into domain folders. +> +> **`OverrideSlotService` (new in `Worker/Queue/`):** +> - Owns the `_overrideSlot` field, `RunNow(taskId)`, `ContinueTask(taskId, followUpPrompt)`, and the override-slot piece of `CancelTask`. +> - Status mutations go through `TaskStateService.StartRunningAsync` (non-atomic claim is fine; serialized by slot lock). +> - `QueueService.CancelTask` delegates to `OverrideSlotService.TryCancel` first, falls back to its own queue slot. +> - WorkerHub's `RunNow`/`ContinueTask`/`CancelTask` SignalR endpoints route to the new service via `OverrideSlotService` when applicable; keep the signatures stable. +> +> **Folder reorg** (use `git mv`, don't copy/delete): +> ``` +> Worker/State/ ← ITaskStateService.cs, TaskStateService.cs, TransitionResult.cs (already exist; no move needed if already there) +> Worker/Queue/ ← IQueueWaker.cs, QueueWaker.cs, IQueuePicker.cs, QueuePicker.cs, QueueService.cs, OverrideSlotService.cs, QueueSlotState.cs +> Worker/Lifecycle/ ← StaleTaskRecovery.cs, TaskResetService.cs, TaskMergeService.cs +> Worker/Worktrees/ ← WorktreeMaintenanceService.cs +> Worker/Agents/ ← AgentFileService.cs, DefaultAgentSeeder.cs +> Worker/Runner/ ← unchanged +> Worker/Planning/ ← unchanged +> Worker/External/ ← unchanged +> Worker/Hub/ ← unchanged +> ``` +> +> Update namespaces to match folders (existing convention: namespace == folder path under `ClaudeDo.Worker`). Delete the old `Worker/Services/` folder once empty. +> +> Update DI registrations in `Program.cs` (both apps) — most calls just need `using` updates. `OverrideSlotService` is a new singleton. +> +> Update test `using` statements to follow. +> +> Build, test, commit as `refactor(worker): extract OverrideSlotService and reorganize Worker/Services into domain folders`. + +--- + +## Slice 6 — Cleanup, legacy retirement, docs + +**Prompt to paste into a fresh session:** + +> Slice 6 (final) of the worker state consolidation refactor. Spec: `docs/superpowers/specs/2026-04-27-worker-state-and-queue-consolidation-design.md` (section 6 + slice plan). Slices 1–5 already landed. +> +> **Goal:** retire legacy enum values, backfill DB rows, update docs. +> +> **EF migration `RetireLegacyTaskStatus`:** +> ```sql +> UPDATE tasks SET status='idle' WHERE status IN ('manual', 'draft'); +> UPDATE tasks SET status='idle', planning_phase='active' WHERE status='planning'; +> UPDATE tasks SET status='idle', planning_phase='finalized' WHERE status='planned'; +> +> -- Waiting → Queued + blocked_by from sort_order: +> WITH ordered AS ( +> SELECT id, +> LAG(id) OVER (PARTITION BY parent_task_id ORDER BY sort_order, created_at) AS prev_id +> FROM tasks WHERE status='waiting' +> ) +> UPDATE tasks +> SET status='queued', +> blocked_by_task_id=(SELECT prev_id FROM ordered WHERE ordered.id=tasks.id) +> WHERE id IN (SELECT id FROM ordered); +> ``` +> Use `migrationBuilder.Sql(...)` for these. Down() is best-effort: `Cancelled` → `Failed`, `(idle, finalized)` → `planned`, `(idle, active)` → `planning`, `queued + blocked_by_task_id != null` → `waiting`. Document lossiness in a comment. +> +> **Code changes:** +> - Remove legacy values from `TaskStatus` enum: `Manual, Planning, Planned, Draft, Waiting`. +> - Strip the legacy branches from `TaskEntityConfiguration.StatusToString`/`StatusFromString`. +> - Default for `TaskEntity.Status` is `TaskStatus.Idle` (already correct after Slice 1's revert). +> - Audit + remap every remaining caller — they should already use new values from Slices 2–4, but search for any leftover `TaskStatus.Manual` etc. in: +> - tests (~10 files seed status — flip to `Idle`/`Queued`/etc.) +> - UI (`TaskRowViewModel.IsPlanningParent`, `IsDraft`, `CanOpenPlanningSession`, status maps — replace with `PlanningPhase` checks where appropriate) +> - any leftover guards in MCP/services +> - Mark `Mark*Async` repo helpers as `internal` if not already (Slice 2 should have done this — verify). +> +> **Docs to update:** +> - `src/ClaudeDo.Worker/CLAUDE.md` — new folder structure, new state-service flow, new wake mechanics, removal of legacy values. +> - `src/ClaudeDo.Data/CLAUDE.md` — TaskEntity new fields (`PlanningPhase`, `BlockedByTaskId`), retired legacy enum values, new tag-attach behavior. +> - `docs/plan.md` — update status flow section. +> - `docs/open.md` — close the "queue doesn't pick up planning tasks" item if it's tracked there; add any follow-ups discovered along the way. +> - Memory: update `~/.claude/projects/C--Private-ClaudeDo/memory/` with a new entry summarizing the new architecture (state-service + queue split + planning chain via blocked-by). +> +> **Sanity tests** — full test run. The 4 pre-existing constructor-drift errors should still be the only failures. If new ones surfaced from missed legacy-value remappings, fix them before commit. +> +> Build, full test run, commit as `refactor(data): retire legacy TaskStatus values and backfill existing rows`. + +--- + +## After Slice 6 + +- All 6 slices on `main`. +- The original bug ("queue doesn't pick up planning tasks") is structurally impossible. +- Worker has clear domain folders, single state-mutator, single queue-picker. +- Spec doc + this prompt file can be deleted or moved to `docs/superpowers/done/`.