Files
ClaudeDo/docs/superpowers/plans/2026-04-27-worker-state-consolidation-session-prompts.md
Mika Kuns cf7a6e413c docs(superpowers): add session prompts for worker state consolidation slices 2-6
Self-contained prompts to paste into fresh sessions, one per remaining slice.
Each prompt includes scope, allowed transitions, caller-migration list, test
expectations, and the conventional-commit message to use.
2026-04-27 10:52:55 +02:00

226 lines
17 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Session Prompts — Worker State & Queue Consolidation Slices 26
Paste-ready prompts for each remaining slice. Run **one slice per session** so the diff stays reviewable and tests stay green between commits. Spec lives at `docs/superpowers/specs/2026-04-27-worker-state-and-queue-consolidation-design.md` — reference it when the prompt asks.
**Common ground rules** (carry across all slices):
- Direct on `main`, one commit per slice, conventional commit messages.
- Build green (`dotnet build src/ClaudeDo.Worker/ClaudeDo.Worker.csproj` + Data + Ui) before commit.
- Pre-existing test errors (TaskRunner/WorkerHub constructor drift in 4 test files) are **not** in scope to fix — they exist on `main` already. New compile errors my changes introduce ARE in scope.
- No drive-by refactors outside the slice's stated scope.
- New files must follow existing naming/folder conventions; legacy enum values stay until Slice 6.
- After each slice, update `~/.claude/projects/C--Private-ClaudeDo/memory/` if I learn something durable about the codebase.
---
## Slice 2 — `TaskStateService` (centralized state machine)
**Prompt to paste into a fresh session:**
> Slice 2 of the worker state consolidation refactor. Spec: `docs/superpowers/specs/2026-04-27-worker-state-and-queue-consolidation-design.md` (sections 2 and 8). Slice 1 already landed (commit 7b737e6) — `TaskStatus` has `Idle`/`Cancelled`, `PlanningPhase` enum exists, `BlockedByTaskId` field exists. Legacy enum values still around.
>
> **Goal:** introduce `Worker/State/ITaskStateService` + `TaskStateService` as the single component that mutates `Status`, `PlanningPhase`, `BlockedByTaskId`. Migrate every existing caller. Mark repo `Mark*Async` helpers `internal`.
>
> **Public surface (verbatim from spec):**
> ```csharp
> Task<TransitionResult> EnqueueAsync(string taskId, CancellationToken ct);
> Task<TransitionResult> StartRunningAsync(string taskId, DateTime startedAt, CancellationToken ct);
> Task<TransitionResult> CompleteAsync(string taskId, DateTime finishedAt, string? result, CancellationToken ct);
> Task<TransitionResult> FailAsync(string taskId, DateTime finishedAt, string? error, CancellationToken ct);
> Task<TransitionResult> CancelAsync(string taskId, DateTime finishedAt, CancellationToken ct);
> Task<TransitionResult> ResetToIdleAsync(string taskId, CancellationToken ct);
> Task<TransitionResult> StartPlanningAsync(string parentId, CancellationToken ct);
> Task<TransitionResult> FinalizePlanningAsync(string parentId, CancellationToken ct);
> Task<TransitionResult> BlockOnAsync(string taskId, string predecessorTaskId, CancellationToken ct);
> Task<TransitionResult> UnblockAsync(string taskId, CancellationToken ct);
> Task<int> RecoverStaleRunningAsync(string reason, CancellationToken ct);
> ```
>
> **Allowed transition table:** see spec §2. Reject invalid transitions with `TransitionResult(false, "<reason>")` — no exceptions. Each transition is one atomic `ExecuteUpdate` with `WHERE Status = <expected>` for TOCTOU-freedom.
>
> **Side effects after successful DB write** (do these inside the service so callers don't need to remember):
> - On any `→ Queued`: call `_queue.WakeQueue()` directly for now (Slice 3 will replace with `IQueueWaker`). Inject `QueueService` lazily via `Func<QueueService>` to break the DI cycle if needed.
> - On any successful transition: `_broadcaster.TaskUpdated(taskId)`.
> - On `Done`/`Failed`/`Cancelled` for a child task: invoke `_chain.OnChildFinishedAsync(taskId, finalStatus, ct)`. If it returns a next-task-id, call `UnblockAsync` on it. Then run `_repo.TryCompleteParentAsync(parentId, ct)`.
>
> **Important:** `BlockOnAsync` and `UnblockAsync` should write `BlockedByTaskId` directly. `EnqueueAsync` for a Planning child should keep `BlockedByTaskId` null when it's the head of the chain. The chain coordinator will compose these calls in Slice 4 — for now just expose the API.
>
> **Caller migration (mechanical — preserve current behavior):**
> - `TaskRunner.HandleSuccess` → replace `taskRepo.MarkDoneAsync` + `TryCompleteParentAsync` + `_chain.OnChildFinishedAsync` block with a single `_state.CompleteAsync(taskId, finishedAt, result, CancellationToken.None)`.
> - `TaskRunner.HandleFailure` → `_state.FailAsync(taskId, finishedAt, errorMarkdown, CancellationToken.None)`.
> - `TaskRunner.MarkFailed` (early-fail path) → same.
> - `TaskRunner.RunAsync` start of run → `_state.StartRunningAsync(taskId, startedAt, ct)`.
> - `StaleTaskRecovery.StartAsync` → `_state.RecoverStaleRunningAsync("worker restart", ct)`.
> - `TaskResetService.ResetAsync` → `_state.ResetToIdleAsync(taskId, ct)` for the status flip; service keeps owning worktree cleanup.
> - `PlanningSessionManager.StartAsync` (the `SetPlanningStartedAsync` call) → `_state.StartPlanningAsync(parentId, ct)`. The manager still owns token/session-dir setup; only the status flip moves.
> - `PlanningChainCoordinator.OnChildFinishedAsync` (the `next.Status = TaskStatus.Queued` write) → keep its existing logic but use `_state.UnblockAsync(next.Id, ct)` for the actual write. The Slice 4 rewrite finishes the rest.
> - `ExternalMcpService.UpdateTaskStatus` (status flip in the Queued case) → `_state.EnqueueAsync(taskId, ct)`. The Manual case stays as-is until Slice 6 since `Manual` is still a valid legacy value.
>
> **Repo helpers to mark `internal`:** `MarkRunningAsync`, `MarkDoneAsync`, `MarkFailedAsync`, `FlipAllRunningToFailedAsync`. Verify nothing outside `ClaudeDo.Worker.State` calls them after migration. (`Worker.Tests` may need `InternalsVisibleTo` — add it if so.)
>
> **DI wiring:** register `TaskStateService` as Singleton in `Program.cs` for both the main app and the external-MCP app. The service holds no per-request state.
>
> **Tests:** new file `tests/ClaudeDo.Worker.Tests/State/TaskStateServiceTests.cs`. At minimum:
> - Happy path for each transition (verify DB state + side-effect mocks invoked).
> - Reject path for each invalid transition (verify result + DB unchanged).
> - Concurrency: two parallel `StartRunningAsync` for the same `Queued` task → exactly one returns `Ok=true`.
> - Mock or fake the broadcaster, queue, and chain-coordinator dependencies. Use real SQLite for the DB (existing test pattern).
>
> Build all projects, run the worker test project (the 4 pre-existing constructor-drift errors are out of scope — but my changes shouldn't add new errors), commit as `refactor(worker/state): introduce TaskStateService and route mutations through it`.
---
## Slice 3 — `IQueueWaker` + `IQueuePicker`
**Prompt to paste into a fresh session:**
> Slice 3 of the worker state consolidation refactor. Spec: `docs/superpowers/specs/2026-04-27-worker-state-and-queue-consolidation-design.md` (section 3). Slices 1 and 2 already landed.
>
> **Goal:** extract queue-wake and queue-pick from `QueueService` and `TaskRepository` into dedicated single-responsibility components. Make wakes automatic.
>
> **New components in `Worker/Queue/`:**
> - `IQueueWaker` (interface, `void Wake()`). Backed by `QueueWaker` singleton holding the existing `SemaphoreSlim`. Inject into `TaskStateService` (replaces the direct `QueueService` ref from Slice 2) and into `QueueService` itself.
> - `IQueuePicker` with `Task<TaskEntity?> ClaimNextAsync(DateTime now, CancellationToken ct)`. Implementation `QueuePicker` moves the raw SQL out of `TaskRepository.GetNextQueuedAgentTaskAsync` and **adds a `blocked_by_task_id IS NULL` filter to the WHERE clause**. Order stays `sort_order ASC, created_at ASC` (verify the existing query — add ORDER BY if missing). Atomic `UPDATE … RETURNING` flips `Queued → Running` and writes `started_at`.
>
> **Caller updates:**
> - `TaskStateService` swaps its `Func<QueueService>` for `IQueueWaker`. The `→ Queued` side-effect now calls `_waker.Wake()`.
> - `QueueService.ExecuteAsync` calls `_picker.ClaimNextAsync` instead of `_taskRepo.GetNextQueuedAgentTaskAsync`. The slot-claim, broadcaster, and `WakeQueue()` after slot release stay where they are.
> - `WorkerHub.WakeQueue()` and `ExternalMcpService.WakeQueue` calls in app code → remove the explicit invocations. The state-service triggers waking automatically. **Keep** the SignalR/MCP endpoint that exposes `WakeQueue()` for diagnostics/manual use — that one delegates to `_waker.Wake()`.
> - `TaskRepository.GetNextQueuedAgentTaskAsync` becomes a thin shim that forwards to `IQueuePicker` for any remaining tests, OR delete it and update tests to use the picker. Prefer delete if tests are easy to migrate.
>
> **Tests:** new `tests/ClaudeDo.Worker.Tests/Queue/QueuePickerTests.cs`:
> - Skipped: `BlockedByTaskId` set; missing agent tag; `scheduled_for > now`; status not Queued.
> - Picked: correct order (`sort_order, created_at`).
> - Atomic claim: two parallel pickers → exactly one row returned non-null, the other null.
>
> Update existing `TaskRepositoryTests.GetNextQueuedAgentTaskAsync_*` tests if they exercised the removed method.
>
> Build, test, commit as `refactor(worker/queue): split queue waker and picker, auto-wake on enqueue`.
---
## Slice 4 — Planning flow consolidation (kills the original bug)
**Prompt to paste into a fresh session:**
> Slice 4 of the worker state consolidation refactor. Spec: `docs/superpowers/specs/2026-04-27-worker-state-and-queue-consolidation-design.md` (section 4). Slices 13 already landed. **This slice eliminates the original "queue never picks up planning tasks" bug structurally.**
>
> **Goal:** one path through planning. Delete the dual-flow problem.
>
> **Changes:**
> - **Delete** `TaskRepository.FinalizePlanningAsync` entirely. Also delete its tests in `TaskRepositoryPlanningTests.cs`.
> - **Rewrite** `PlanningSessionManager.FinalizeAsync(taskId, queueAgentTasks, ct)`:
> 1. `_state.FinalizePlanningAsync(parentId, ct)` (sets parent `PlanningPhase=Finalized`, `Status=Idle`).
> 2. If `queueAgentTasks` is true, call the new `_chainCoordinator.SetupChainAsync(parentId, ct)`.
> 3. Existing worktree-cleanup + session-dir-deletion remains.
> 4. Return the count of children that ended up in the chain.
> - **Rename** `PlanningChainCoordinator.QueueSubtasksSequentiallyAsync` → `SetupChainAsync`. Make it `internal`. New behavior:
> - Eligibility check: children must be in `Status=Idle` (was `Manual` or `Planned` legacy values — keep tolerating those for one slice via OR).
> - Auto-attach `agent` tag to all children (already in WIP — keep that behavior).
> - For first child: `_state.EnqueueAsync(child[0].Id, ct)` (no BlockedBy, head of chain).
> - For rest: `_state.EnqueueAsync(child[i].Id, ct)` followed immediately by `_state.BlockOnAsync(child[i].Id, child[i-1].Id, ct)`. (Or: add a single `EnqueueBlockedAsync` helper to TaskStateService if call-site clutter bothers you.)
> - **Update** `PlanningChainCoordinator.OnChildFinishedAsync`: replace status-via-LINQ logic with: query for the next child where `BlockedByTaskId == childTaskId`, call `_state.UnblockAsync` on it. Drop the `Waiting` lookup entirely.
> - Audit `Status == TaskStatus.Waiting` in UI/tests — replace with `Status == Queued && BlockedByTaskId != null`. (UI changes confirmed against `TaskRowViewModel`, `TasksIslandViewModel` from Slice 1's WIP.)
>
> **Regression test:** new `tests/ClaudeDo.Worker.Tests/Planning/PlanningEndToEndTests.cs` (or extend existing) — `Active` parent + 3 drafts → call `FinalizeAsync(queueAgentTasks: true)` → assert within 200 ms the first child has `Status=Running` (queue picker claimed it) without anyone calling `WakeQueue()` manually. This was the bug the user originally reported.
>
> **Update** `PlanningMcpService.EditableStatuses` — replace `Waiting` with `Queued` (since blocked tasks are now `Queued + BlockedByTaskId`). Verify the MCP tool still gates on `parent.PlanningPhase == Active` (legacy: `parent.Status == Planning`).
>
> Build, test, commit as `feat(planning): consolidate finalize+chain via TaskStateService, fix queue pickup`.
---
## Slice 5 — `OverrideSlotService` + folder reorg
**Prompt to paste into a fresh session:**
> Slice 5 of the worker state consolidation refactor. Spec: `docs/superpowers/specs/2026-04-27-worker-state-and-queue-consolidation-design.md` (section 5). Slices 14 already landed.
>
> **Goal:** split the override slot out of QueueService and reorganize `Worker/Services/` into domain folders.
>
> **`OverrideSlotService` (new in `Worker/Queue/`):**
> - Owns the `_overrideSlot` field, `RunNow(taskId)`, `ContinueTask(taskId, followUpPrompt)`, and the override-slot piece of `CancelTask`.
> - Status mutations go through `TaskStateService.StartRunningAsync` (non-atomic claim is fine; serialized by slot lock).
> - `QueueService.CancelTask` delegates to `OverrideSlotService.TryCancel` first, falls back to its own queue slot.
> - WorkerHub's `RunNow`/`ContinueTask`/`CancelTask` SignalR endpoints route to the new service via `OverrideSlotService` when applicable; keep the signatures stable.
>
> **Folder reorg** (use `git mv`, don't copy/delete):
> ```
> Worker/State/ ← ITaskStateService.cs, TaskStateService.cs, TransitionResult.cs (already exist; no move needed if already there)
> Worker/Queue/ ← IQueueWaker.cs, QueueWaker.cs, IQueuePicker.cs, QueuePicker.cs, QueueService.cs, OverrideSlotService.cs, QueueSlotState.cs
> Worker/Lifecycle/ ← StaleTaskRecovery.cs, TaskResetService.cs, TaskMergeService.cs
> Worker/Worktrees/ ← WorktreeMaintenanceService.cs
> Worker/Agents/ ← AgentFileService.cs, DefaultAgentSeeder.cs
> Worker/Runner/ ← unchanged
> Worker/Planning/ ← unchanged
> Worker/External/ ← unchanged
> Worker/Hub/ ← unchanged
> ```
>
> Update namespaces to match folders (existing convention: namespace == folder path under `ClaudeDo.Worker`). Delete the old `Worker/Services/` folder once empty.
>
> Update DI registrations in `Program.cs` (both apps) — most calls just need `using` updates. `OverrideSlotService` is a new singleton.
>
> Update test `using` statements to follow.
>
> Build, test, commit as `refactor(worker): extract OverrideSlotService and reorganize Worker/Services into domain folders`.
---
## Slice 6 — Cleanup, legacy retirement, docs
**Prompt to paste into a fresh session:**
> Slice 6 (final) of the worker state consolidation refactor. Spec: `docs/superpowers/specs/2026-04-27-worker-state-and-queue-consolidation-design.md` (section 6 + slice plan). Slices 15 already landed.
>
> **Goal:** retire legacy enum values, backfill DB rows, update docs.
>
> **EF migration `RetireLegacyTaskStatus`:**
> ```sql
> UPDATE tasks SET status='idle' WHERE status IN ('manual', 'draft');
> UPDATE tasks SET status='idle', planning_phase='active' WHERE status='planning';
> UPDATE tasks SET status='idle', planning_phase='finalized' WHERE status='planned';
>
> -- Waiting → Queued + blocked_by from sort_order:
> WITH ordered AS (
> SELECT id,
> LAG(id) OVER (PARTITION BY parent_task_id ORDER BY sort_order, created_at) AS prev_id
> FROM tasks WHERE status='waiting'
> )
> UPDATE tasks
> SET status='queued',
> blocked_by_task_id=(SELECT prev_id FROM ordered WHERE ordered.id=tasks.id)
> WHERE id IN (SELECT id FROM ordered);
> ```
> Use `migrationBuilder.Sql(...)` for these. Down() is best-effort: `Cancelled` → `Failed`, `(idle, finalized)` → `planned`, `(idle, active)` → `planning`, `queued + blocked_by_task_id != null` → `waiting`. Document lossiness in a comment.
>
> **Code changes:**
> - Remove legacy values from `TaskStatus` enum: `Manual, Planning, Planned, Draft, Waiting`.
> - Strip the legacy branches from `TaskEntityConfiguration.StatusToString`/`StatusFromString`.
> - Default for `TaskEntity.Status` is `TaskStatus.Idle` (already correct after Slice 1's revert).
> - Audit + remap every remaining caller — they should already use new values from Slices 24, but search for any leftover `TaskStatus.Manual` etc. in:
> - tests (~10 files seed status — flip to `Idle`/`Queued`/etc.)
> - UI (`TaskRowViewModel.IsPlanningParent`, `IsDraft`, `CanOpenPlanningSession`, status maps — replace with `PlanningPhase` checks where appropriate)
> - any leftover guards in MCP/services
> - Mark `Mark*Async` repo helpers as `internal` if not already (Slice 2 should have done this — verify).
>
> **Docs to update:**
> - `src/ClaudeDo.Worker/CLAUDE.md` — new folder structure, new state-service flow, new wake mechanics, removal of legacy values.
> - `src/ClaudeDo.Data/CLAUDE.md` — TaskEntity new fields (`PlanningPhase`, `BlockedByTaskId`), retired legacy enum values, new tag-attach behavior.
> - `docs/plan.md` — update status flow section.
> - `docs/open.md` — close the "queue doesn't pick up planning tasks" item if it's tracked there; add any follow-ups discovered along the way.
> - Memory: update `~/.claude/projects/C--Private-ClaudeDo/memory/` with a new entry summarizing the new architecture (state-service + queue split + planning chain via blocked-by).
>
> **Sanity tests** — full test run. The 4 pre-existing constructor-drift errors should still be the only failures. If new ones surfaced from missed legacy-value remappings, fix them before commit.
>
> Build, full test run, commit as `refactor(data): retire legacy TaskStatus values and backfill existing rows`.
---
## After Slice 6
- All 6 slices on `main`.
- The original bug ("queue doesn't pick up planning tasks") is structurally impossible.
- Worker has clear domain folders, single state-mutator, single queue-picker.
- Spec doc + this prompt file can be deleted or moved to `docs/superpowers/done/`.