docs(superpowers): add session prompts for worker state consolidation slices 2-6

Self-contained prompts to paste into fresh sessions, one per remaining slice.
Each prompt includes scope, allowed transitions, caller-migration list, test
expectations, and the conventional-commit message to use.
This commit is contained in:
Mika Kuns
2026-04-27 10:52:55 +02:00
parent 7b737e6717
commit cf7a6e413c

View File

@@ -0,0 +1,225 @@
# Session Prompts — Worker State & Queue Consolidation Slices 26
Paste-ready prompts for each remaining slice. Run **one slice per session** so the diff stays reviewable and tests stay green between commits. Spec lives at `docs/superpowers/specs/2026-04-27-worker-state-and-queue-consolidation-design.md` — reference it when the prompt asks.
**Common ground rules** (carry across all slices):
- Direct on `main`, one commit per slice, conventional commit messages.
- Build green (`dotnet build src/ClaudeDo.Worker/ClaudeDo.Worker.csproj` + Data + Ui) before commit.
- Pre-existing test errors (TaskRunner/WorkerHub constructor drift in 4 test files) are **not** in scope to fix — they exist on `main` already. New compile errors my changes introduce ARE in scope.
- No drive-by refactors outside the slice's stated scope.
- New files must follow existing naming/folder conventions; legacy enum values stay until Slice 6.
- After each slice, update `~/.claude/projects/C--Private-ClaudeDo/memory/` if I learn something durable about the codebase.
---
## Slice 2 — `TaskStateService` (centralized state machine)
**Prompt to paste into a fresh session:**
> Slice 2 of the worker state consolidation refactor. Spec: `docs/superpowers/specs/2026-04-27-worker-state-and-queue-consolidation-design.md` (sections 2 and 8). Slice 1 already landed (commit 7b737e6) — `TaskStatus` has `Idle`/`Cancelled`, `PlanningPhase` enum exists, `BlockedByTaskId` field exists. Legacy enum values still around.
>
> **Goal:** introduce `Worker/State/ITaskStateService` + `TaskStateService` as the single component that mutates `Status`, `PlanningPhase`, `BlockedByTaskId`. Migrate every existing caller. Mark repo `Mark*Async` helpers `internal`.
>
> **Public surface (verbatim from spec):**
> ```csharp
> Task<TransitionResult> EnqueueAsync(string taskId, CancellationToken ct);
> Task<TransitionResult> StartRunningAsync(string taskId, DateTime startedAt, CancellationToken ct);
> Task<TransitionResult> CompleteAsync(string taskId, DateTime finishedAt, string? result, CancellationToken ct);
> Task<TransitionResult> FailAsync(string taskId, DateTime finishedAt, string? error, CancellationToken ct);
> Task<TransitionResult> CancelAsync(string taskId, DateTime finishedAt, CancellationToken ct);
> Task<TransitionResult> ResetToIdleAsync(string taskId, CancellationToken ct);
> Task<TransitionResult> StartPlanningAsync(string parentId, CancellationToken ct);
> Task<TransitionResult> FinalizePlanningAsync(string parentId, CancellationToken ct);
> Task<TransitionResult> BlockOnAsync(string taskId, string predecessorTaskId, CancellationToken ct);
> Task<TransitionResult> UnblockAsync(string taskId, CancellationToken ct);
> Task<int> RecoverStaleRunningAsync(string reason, CancellationToken ct);
> ```
>
> **Allowed transition table:** see spec §2. Reject invalid transitions with `TransitionResult(false, "<reason>")` — no exceptions. Each transition is one atomic `ExecuteUpdate` with `WHERE Status = <expected>` for TOCTOU-freedom.
>
> **Side effects after successful DB write** (do these inside the service so callers don't need to remember):
> - On any `→ Queued`: call `_queue.WakeQueue()` directly for now (Slice 3 will replace with `IQueueWaker`). Inject `QueueService` lazily via `Func<QueueService>` to break the DI cycle if needed.
> - On any successful transition: `_broadcaster.TaskUpdated(taskId)`.
> - On `Done`/`Failed`/`Cancelled` for a child task: invoke `_chain.OnChildFinishedAsync(taskId, finalStatus, ct)`. If it returns a next-task-id, call `UnblockAsync` on it. Then run `_repo.TryCompleteParentAsync(parentId, ct)`.
>
> **Important:** `BlockOnAsync` and `UnblockAsync` should write `BlockedByTaskId` directly. `EnqueueAsync` for a Planning child should keep `BlockedByTaskId` null when it's the head of the chain. The chain coordinator will compose these calls in Slice 4 — for now just expose the API.
>
> **Caller migration (mechanical — preserve current behavior):**
> - `TaskRunner.HandleSuccess` → replace `taskRepo.MarkDoneAsync` + `TryCompleteParentAsync` + `_chain.OnChildFinishedAsync` block with a single `_state.CompleteAsync(taskId, finishedAt, result, CancellationToken.None)`.
> - `TaskRunner.HandleFailure` → `_state.FailAsync(taskId, finishedAt, errorMarkdown, CancellationToken.None)`.
> - `TaskRunner.MarkFailed` (early-fail path) → same.
> - `TaskRunner.RunAsync` start of run → `_state.StartRunningAsync(taskId, startedAt, ct)`.
> - `StaleTaskRecovery.StartAsync` → `_state.RecoverStaleRunningAsync("worker restart", ct)`.
> - `TaskResetService.ResetAsync` → `_state.ResetToIdleAsync(taskId, ct)` for the status flip; service keeps owning worktree cleanup.
> - `PlanningSessionManager.StartAsync` (the `SetPlanningStartedAsync` call) → `_state.StartPlanningAsync(parentId, ct)`. The manager still owns token/session-dir setup; only the status flip moves.
> - `PlanningChainCoordinator.OnChildFinishedAsync` (the `next.Status = TaskStatus.Queued` write) → keep its existing logic but use `_state.UnblockAsync(next.Id, ct)` for the actual write. The Slice 4 rewrite finishes the rest.
> - `ExternalMcpService.UpdateTaskStatus` (status flip in the Queued case) → `_state.EnqueueAsync(taskId, ct)`. The Manual case stays as-is until Slice 6 since `Manual` is still a valid legacy value.
>
> **Repo helpers to mark `internal`:** `MarkRunningAsync`, `MarkDoneAsync`, `MarkFailedAsync`, `FlipAllRunningToFailedAsync`. Verify nothing outside `ClaudeDo.Worker.State` calls them after migration. (`Worker.Tests` may need `InternalsVisibleTo` — add it if so.)
>
> **DI wiring:** register `TaskStateService` as Singleton in `Program.cs` for both the main app and the external-MCP app. The service holds no per-request state.
>
> **Tests:** new file `tests/ClaudeDo.Worker.Tests/State/TaskStateServiceTests.cs`. At minimum:
> - Happy path for each transition (verify DB state + side-effect mocks invoked).
> - Reject path for each invalid transition (verify result + DB unchanged).
> - Concurrency: two parallel `StartRunningAsync` for the same `Queued` task → exactly one returns `Ok=true`.
> - Mock or fake the broadcaster, queue, and chain-coordinator dependencies. Use real SQLite for the DB (existing test pattern).
>
> Build all projects, run the worker test project (the 4 pre-existing constructor-drift errors are out of scope — but my changes shouldn't add new errors), commit as `refactor(worker/state): introduce TaskStateService and route mutations through it`.
---
## Slice 3 — `IQueueWaker` + `IQueuePicker`
**Prompt to paste into a fresh session:**
> Slice 3 of the worker state consolidation refactor. Spec: `docs/superpowers/specs/2026-04-27-worker-state-and-queue-consolidation-design.md` (section 3). Slices 1 and 2 already landed.
>
> **Goal:** extract queue-wake and queue-pick from `QueueService` and `TaskRepository` into dedicated single-responsibility components. Make wakes automatic.
>
> **New components in `Worker/Queue/`:**
> - `IQueueWaker` (interface, `void Wake()`). Backed by `QueueWaker` singleton holding the existing `SemaphoreSlim`. Inject into `TaskStateService` (replaces the direct `QueueService` ref from Slice 2) and into `QueueService` itself.
> - `IQueuePicker` with `Task<TaskEntity?> ClaimNextAsync(DateTime now, CancellationToken ct)`. Implementation `QueuePicker` moves the raw SQL out of `TaskRepository.GetNextQueuedAgentTaskAsync` and **adds a `blocked_by_task_id IS NULL` filter to the WHERE clause**. Order stays `sort_order ASC, created_at ASC` (verify the existing query — add ORDER BY if missing). Atomic `UPDATE … RETURNING` flips `Queued → Running` and writes `started_at`.
>
> **Caller updates:**
> - `TaskStateService` swaps its `Func<QueueService>` for `IQueueWaker`. The `→ Queued` side-effect now calls `_waker.Wake()`.
> - `QueueService.ExecuteAsync` calls `_picker.ClaimNextAsync` instead of `_taskRepo.GetNextQueuedAgentTaskAsync`. The slot-claim, broadcaster, and `WakeQueue()` after slot release stay where they are.
> - `WorkerHub.WakeQueue()` and `ExternalMcpService.WakeQueue` calls in app code → remove the explicit invocations. The state-service triggers waking automatically. **Keep** the SignalR/MCP endpoint that exposes `WakeQueue()` for diagnostics/manual use — that one delegates to `_waker.Wake()`.
> - `TaskRepository.GetNextQueuedAgentTaskAsync` becomes a thin shim that forwards to `IQueuePicker` for any remaining tests, OR delete it and update tests to use the picker. Prefer delete if tests are easy to migrate.
>
> **Tests:** new `tests/ClaudeDo.Worker.Tests/Queue/QueuePickerTests.cs`:
> - Skipped: `BlockedByTaskId` set; missing agent tag; `scheduled_for > now`; status not Queued.
> - Picked: correct order (`sort_order, created_at`).
> - Atomic claim: two parallel pickers → exactly one row returned non-null, the other null.
>
> Update existing `TaskRepositoryTests.GetNextQueuedAgentTaskAsync_*` tests if they exercised the removed method.
>
> Build, test, commit as `refactor(worker/queue): split queue waker and picker, auto-wake on enqueue`.
---
## Slice 4 — Planning flow consolidation (kills the original bug)
**Prompt to paste into a fresh session:**
> Slice 4 of the worker state consolidation refactor. Spec: `docs/superpowers/specs/2026-04-27-worker-state-and-queue-consolidation-design.md` (section 4). Slices 13 already landed. **This slice eliminates the original "queue never picks up planning tasks" bug structurally.**
>
> **Goal:** one path through planning. Delete the dual-flow problem.
>
> **Changes:**
> - **Delete** `TaskRepository.FinalizePlanningAsync` entirely. Also delete its tests in `TaskRepositoryPlanningTests.cs`.
> - **Rewrite** `PlanningSessionManager.FinalizeAsync(taskId, queueAgentTasks, ct)`:
> 1. `_state.FinalizePlanningAsync(parentId, ct)` (sets parent `PlanningPhase=Finalized`, `Status=Idle`).
> 2. If `queueAgentTasks` is true, call the new `_chainCoordinator.SetupChainAsync(parentId, ct)`.
> 3. Existing worktree-cleanup + session-dir-deletion remains.
> 4. Return the count of children that ended up in the chain.
> - **Rename** `PlanningChainCoordinator.QueueSubtasksSequentiallyAsync` → `SetupChainAsync`. Make it `internal`. New behavior:
> - Eligibility check: children must be in `Status=Idle` (was `Manual` or `Planned` legacy values — keep tolerating those for one slice via OR).
> - Auto-attach `agent` tag to all children (already in WIP — keep that behavior).
> - For first child: `_state.EnqueueAsync(child[0].Id, ct)` (no BlockedBy, head of chain).
> - For rest: `_state.EnqueueAsync(child[i].Id, ct)` followed immediately by `_state.BlockOnAsync(child[i].Id, child[i-1].Id, ct)`. (Or: add a single `EnqueueBlockedAsync` helper to TaskStateService if call-site clutter bothers you.)
> - **Update** `PlanningChainCoordinator.OnChildFinishedAsync`: replace status-via-LINQ logic with: query for the next child where `BlockedByTaskId == childTaskId`, call `_state.UnblockAsync` on it. Drop the `Waiting` lookup entirely.
> - Audit `Status == TaskStatus.Waiting` in UI/tests — replace with `Status == Queued && BlockedByTaskId != null`. (UI changes confirmed against `TaskRowViewModel`, `TasksIslandViewModel` from Slice 1's WIP.)
>
> **Regression test:** new `tests/ClaudeDo.Worker.Tests/Planning/PlanningEndToEndTests.cs` (or extend existing) — `Active` parent + 3 drafts → call `FinalizeAsync(queueAgentTasks: true)` → assert within 200 ms the first child has `Status=Running` (queue picker claimed it) without anyone calling `WakeQueue()` manually. This was the bug the user originally reported.
>
> **Update** `PlanningMcpService.EditableStatuses` — replace `Waiting` with `Queued` (since blocked tasks are now `Queued + BlockedByTaskId`). Verify the MCP tool still gates on `parent.PlanningPhase == Active` (legacy: `parent.Status == Planning`).
>
> Build, test, commit as `feat(planning): consolidate finalize+chain via TaskStateService, fix queue pickup`.
---
## Slice 5 — `OverrideSlotService` + folder reorg
**Prompt to paste into a fresh session:**
> Slice 5 of the worker state consolidation refactor. Spec: `docs/superpowers/specs/2026-04-27-worker-state-and-queue-consolidation-design.md` (section 5). Slices 14 already landed.
>
> **Goal:** split the override slot out of QueueService and reorganize `Worker/Services/` into domain folders.
>
> **`OverrideSlotService` (new in `Worker/Queue/`):**
> - Owns the `_overrideSlot` field, `RunNow(taskId)`, `ContinueTask(taskId, followUpPrompt)`, and the override-slot piece of `CancelTask`.
> - Status mutations go through `TaskStateService.StartRunningAsync` (non-atomic claim is fine; serialized by slot lock).
> - `QueueService.CancelTask` delegates to `OverrideSlotService.TryCancel` first, falls back to its own queue slot.
> - WorkerHub's `RunNow`/`ContinueTask`/`CancelTask` SignalR endpoints route to the new service via `OverrideSlotService` when applicable; keep the signatures stable.
>
> **Folder reorg** (use `git mv`, don't copy/delete):
> ```
> Worker/State/ ← ITaskStateService.cs, TaskStateService.cs, TransitionResult.cs (already exist; no move needed if already there)
> Worker/Queue/ ← IQueueWaker.cs, QueueWaker.cs, IQueuePicker.cs, QueuePicker.cs, QueueService.cs, OverrideSlotService.cs, QueueSlotState.cs
> Worker/Lifecycle/ ← StaleTaskRecovery.cs, TaskResetService.cs, TaskMergeService.cs
> Worker/Worktrees/ ← WorktreeMaintenanceService.cs
> Worker/Agents/ ← AgentFileService.cs, DefaultAgentSeeder.cs
> Worker/Runner/ ← unchanged
> Worker/Planning/ ← unchanged
> Worker/External/ ← unchanged
> Worker/Hub/ ← unchanged
> ```
>
> Update namespaces to match folders (existing convention: namespace == folder path under `ClaudeDo.Worker`). Delete the old `Worker/Services/` folder once empty.
>
> Update DI registrations in `Program.cs` (both apps) — most calls just need `using` updates. `OverrideSlotService` is a new singleton.
>
> Update test `using` statements to follow.
>
> Build, test, commit as `refactor(worker): extract OverrideSlotService and reorganize Worker/Services into domain folders`.
---
## Slice 6 — Cleanup, legacy retirement, docs
**Prompt to paste into a fresh session:**
> Slice 6 (final) of the worker state consolidation refactor. Spec: `docs/superpowers/specs/2026-04-27-worker-state-and-queue-consolidation-design.md` (section 6 + slice plan). Slices 15 already landed.
>
> **Goal:** retire legacy enum values, backfill DB rows, update docs.
>
> **EF migration `RetireLegacyTaskStatus`:**
> ```sql
> UPDATE tasks SET status='idle' WHERE status IN ('manual', 'draft');
> UPDATE tasks SET status='idle', planning_phase='active' WHERE status='planning';
> UPDATE tasks SET status='idle', planning_phase='finalized' WHERE status='planned';
>
> -- Waiting → Queued + blocked_by from sort_order:
> WITH ordered AS (
> SELECT id,
> LAG(id) OVER (PARTITION BY parent_task_id ORDER BY sort_order, created_at) AS prev_id
> FROM tasks WHERE status='waiting'
> )
> UPDATE tasks
> SET status='queued',
> blocked_by_task_id=(SELECT prev_id FROM ordered WHERE ordered.id=tasks.id)
> WHERE id IN (SELECT id FROM ordered);
> ```
> Use `migrationBuilder.Sql(...)` for these. Down() is best-effort: `Cancelled` → `Failed`, `(idle, finalized)` → `planned`, `(idle, active)` → `planning`, `queued + blocked_by_task_id != null` → `waiting`. Document lossiness in a comment.
>
> **Code changes:**
> - Remove legacy values from `TaskStatus` enum: `Manual, Planning, Planned, Draft, Waiting`.
> - Strip the legacy branches from `TaskEntityConfiguration.StatusToString`/`StatusFromString`.
> - Default for `TaskEntity.Status` is `TaskStatus.Idle` (already correct after Slice 1's revert).
> - Audit + remap every remaining caller — they should already use new values from Slices 24, but search for any leftover `TaskStatus.Manual` etc. in:
> - tests (~10 files seed status — flip to `Idle`/`Queued`/etc.)
> - UI (`TaskRowViewModel.IsPlanningParent`, `IsDraft`, `CanOpenPlanningSession`, status maps — replace with `PlanningPhase` checks where appropriate)
> - any leftover guards in MCP/services
> - Mark `Mark*Async` repo helpers as `internal` if not already (Slice 2 should have done this — verify).
>
> **Docs to update:**
> - `src/ClaudeDo.Worker/CLAUDE.md` — new folder structure, new state-service flow, new wake mechanics, removal of legacy values.
> - `src/ClaudeDo.Data/CLAUDE.md` — TaskEntity new fields (`PlanningPhase`, `BlockedByTaskId`), retired legacy enum values, new tag-attach behavior.
> - `docs/plan.md` — update status flow section.
> - `docs/open.md` — close the "queue doesn't pick up planning tasks" item if it's tracked there; add any follow-ups discovered along the way.
> - Memory: update `~/.claude/projects/C--Private-ClaudeDo/memory/` with a new entry summarizing the new architecture (state-service + queue split + planning chain via blocked-by).
>
> **Sanity tests** — full test run. The 4 pre-existing constructor-drift errors should still be the only failures. If new ones surfaced from missed legacy-value remappings, fix them before commit.
>
> Build, full test run, commit as `refactor(data): retire legacy TaskStatus values and backfill existing rows`.
---
## After Slice 6
- All 6 slices on `main`.
- The original bug ("queue doesn't pick up planning tasks") is structurally impossible.
- Worker has clear domain folders, single state-mutator, single queue-picker.
- Spec doc + this prompt file can be deleted or moved to `docs/superpowers/done/`.