Files
ClaudeDo/docs/superpowers/plans/2026-04-27-worker-state-consolidation-session-prompts.md
Mika Kuns cf7a6e413c docs(superpowers): add session prompts for worker state consolidation slices 2-6
Self-contained prompts to paste into fresh sessions, one per remaining slice.
Each prompt includes scope, allowed transitions, caller-migration list, test
expectations, and the conventional-commit message to use.
2026-04-27 10:52:55 +02:00

17 KiB
Raw Blame History

Session Prompts — Worker State & Queue Consolidation Slices 26

Paste-ready prompts for each remaining slice. Run one slice per session so the diff stays reviewable and tests stay green between commits. Spec lives at docs/superpowers/specs/2026-04-27-worker-state-and-queue-consolidation-design.md — reference it when the prompt asks.

Common ground rules (carry across all slices):

  • Direct on main, one commit per slice, conventional commit messages.
  • Build green (dotnet build src/ClaudeDo.Worker/ClaudeDo.Worker.csproj + Data + Ui) before commit.
  • Pre-existing test errors (TaskRunner/WorkerHub constructor drift in 4 test files) are not in scope to fix — they exist on main already. New compile errors my changes introduce ARE in scope.
  • No drive-by refactors outside the slice's stated scope.
  • New files must follow existing naming/folder conventions; legacy enum values stay until Slice 6.
  • After each slice, update ~/.claude/projects/C--Private-ClaudeDo/memory/ if I learn something durable about the codebase.

Slice 2 — TaskStateService (centralized state machine)

Prompt to paste into a fresh session:

Slice 2 of the worker state consolidation refactor. Spec: docs/superpowers/specs/2026-04-27-worker-state-and-queue-consolidation-design.md (sections 2 and 8). Slice 1 already landed (commit 7b737e6) — TaskStatus has Idle/Cancelled, PlanningPhase enum exists, BlockedByTaskId field exists. Legacy enum values still around.

Goal: introduce Worker/State/ITaskStateService + TaskStateService as the single component that mutates Status, PlanningPhase, BlockedByTaskId. Migrate every existing caller. Mark repo Mark*Async helpers internal.

Public surface (verbatim from spec):

Task<TransitionResult> EnqueueAsync(string taskId, CancellationToken ct);
Task<TransitionResult> StartRunningAsync(string taskId, DateTime startedAt, CancellationToken ct);
Task<TransitionResult> CompleteAsync(string taskId, DateTime finishedAt, string? result, CancellationToken ct);
Task<TransitionResult> FailAsync(string taskId, DateTime finishedAt, string? error, CancellationToken ct);
Task<TransitionResult> CancelAsync(string taskId, DateTime finishedAt, CancellationToken ct);
Task<TransitionResult> ResetToIdleAsync(string taskId, CancellationToken ct);
Task<TransitionResult> StartPlanningAsync(string parentId, CancellationToken ct);
Task<TransitionResult> FinalizePlanningAsync(string parentId, CancellationToken ct);
Task<TransitionResult> BlockOnAsync(string taskId, string predecessorTaskId, CancellationToken ct);
Task<TransitionResult> UnblockAsync(string taskId, CancellationToken ct);
Task<int> RecoverStaleRunningAsync(string reason, CancellationToken ct);

Allowed transition table: see spec §2. Reject invalid transitions with TransitionResult(false, "<reason>") — no exceptions. Each transition is one atomic ExecuteUpdate with WHERE Status = <expected> for TOCTOU-freedom.

Side effects after successful DB write (do these inside the service so callers don't need to remember):

  • On any → Queued: call _queue.WakeQueue() directly for now (Slice 3 will replace with IQueueWaker). Inject QueueService lazily via Func<QueueService> to break the DI cycle if needed.
  • On any successful transition: _broadcaster.TaskUpdated(taskId).
  • On Done/Failed/Cancelled for a child task: invoke _chain.OnChildFinishedAsync(taskId, finalStatus, ct). If it returns a next-task-id, call UnblockAsync on it. Then run _repo.TryCompleteParentAsync(parentId, ct).

Important: BlockOnAsync and UnblockAsync should write BlockedByTaskId directly. EnqueueAsync for a Planning child should keep BlockedByTaskId null when it's the head of the chain. The chain coordinator will compose these calls in Slice 4 — for now just expose the API.

Caller migration (mechanical — preserve current behavior):

  • TaskRunner.HandleSuccess → replace taskRepo.MarkDoneAsync + TryCompleteParentAsync + _chain.OnChildFinishedAsync block with a single _state.CompleteAsync(taskId, finishedAt, result, CancellationToken.None).
  • TaskRunner.HandleFailure_state.FailAsync(taskId, finishedAt, errorMarkdown, CancellationToken.None).
  • TaskRunner.MarkFailed (early-fail path) → same.
  • TaskRunner.RunAsync start of run → _state.StartRunningAsync(taskId, startedAt, ct).
  • StaleTaskRecovery.StartAsync_state.RecoverStaleRunningAsync("worker restart", ct).
  • TaskResetService.ResetAsync_state.ResetToIdleAsync(taskId, ct) for the status flip; service keeps owning worktree cleanup.
  • PlanningSessionManager.StartAsync (the SetPlanningStartedAsync call) → _state.StartPlanningAsync(parentId, ct). The manager still owns token/session-dir setup; only the status flip moves.
  • PlanningChainCoordinator.OnChildFinishedAsync (the next.Status = TaskStatus.Queued write) → keep its existing logic but use _state.UnblockAsync(next.Id, ct) for the actual write. The Slice 4 rewrite finishes the rest.
  • ExternalMcpService.UpdateTaskStatus (status flip in the Queued case) → _state.EnqueueAsync(taskId, ct). The Manual case stays as-is until Slice 6 since Manual is still a valid legacy value.

Repo helpers to mark internal: MarkRunningAsync, MarkDoneAsync, MarkFailedAsync, FlipAllRunningToFailedAsync. Verify nothing outside ClaudeDo.Worker.State calls them after migration. (Worker.Tests may need InternalsVisibleTo — add it if so.)

DI wiring: register TaskStateService as Singleton in Program.cs for both the main app and the external-MCP app. The service holds no per-request state.

Tests: new file tests/ClaudeDo.Worker.Tests/State/TaskStateServiceTests.cs. At minimum:

  • Happy path for each transition (verify DB state + side-effect mocks invoked).
  • Reject path for each invalid transition (verify result + DB unchanged).
  • Concurrency: two parallel StartRunningAsync for the same Queued task → exactly one returns Ok=true.
  • Mock or fake the broadcaster, queue, and chain-coordinator dependencies. Use real SQLite for the DB (existing test pattern).

Build all projects, run the worker test project (the 4 pre-existing constructor-drift errors are out of scope — but my changes shouldn't add new errors), commit as refactor(worker/state): introduce TaskStateService and route mutations through it.


Slice 3 — IQueueWaker + IQueuePicker

Prompt to paste into a fresh session:

Slice 3 of the worker state consolidation refactor. Spec: docs/superpowers/specs/2026-04-27-worker-state-and-queue-consolidation-design.md (section 3). Slices 1 and 2 already landed.

Goal: extract queue-wake and queue-pick from QueueService and TaskRepository into dedicated single-responsibility components. Make wakes automatic.

New components in Worker/Queue/:

  • IQueueWaker (interface, void Wake()). Backed by QueueWaker singleton holding the existing SemaphoreSlim. Inject into TaskStateService (replaces the direct QueueService ref from Slice 2) and into QueueService itself.
  • IQueuePicker with Task<TaskEntity?> ClaimNextAsync(DateTime now, CancellationToken ct). Implementation QueuePicker moves the raw SQL out of TaskRepository.GetNextQueuedAgentTaskAsync and adds a blocked_by_task_id IS NULL filter to the WHERE clause. Order stays sort_order ASC, created_at ASC (verify the existing query — add ORDER BY if missing). Atomic UPDATE … RETURNING flips Queued → Running and writes started_at.

Caller updates:

  • TaskStateService swaps its Func<QueueService> for IQueueWaker. The → Queued side-effect now calls _waker.Wake().
  • QueueService.ExecuteAsync calls _picker.ClaimNextAsync instead of _taskRepo.GetNextQueuedAgentTaskAsync. The slot-claim, broadcaster, and WakeQueue() after slot release stay where they are.
  • WorkerHub.WakeQueue() and ExternalMcpService.WakeQueue calls in app code → remove the explicit invocations. The state-service triggers waking automatically. Keep the SignalR/MCP endpoint that exposes WakeQueue() for diagnostics/manual use — that one delegates to _waker.Wake().
  • TaskRepository.GetNextQueuedAgentTaskAsync becomes a thin shim that forwards to IQueuePicker for any remaining tests, OR delete it and update tests to use the picker. Prefer delete if tests are easy to migrate.

Tests: new tests/ClaudeDo.Worker.Tests/Queue/QueuePickerTests.cs:

  • Skipped: BlockedByTaskId set; missing agent tag; scheduled_for > now; status not Queued.
  • Picked: correct order (sort_order, created_at).
  • Atomic claim: two parallel pickers → exactly one row returned non-null, the other null.

Update existing TaskRepositoryTests.GetNextQueuedAgentTaskAsync_* tests if they exercised the removed method.

Build, test, commit as refactor(worker/queue): split queue waker and picker, auto-wake on enqueue.


Slice 4 — Planning flow consolidation (kills the original bug)

Prompt to paste into a fresh session:

Slice 4 of the worker state consolidation refactor. Spec: docs/superpowers/specs/2026-04-27-worker-state-and-queue-consolidation-design.md (section 4). Slices 13 already landed. This slice eliminates the original "queue never picks up planning tasks" bug structurally.

Goal: one path through planning. Delete the dual-flow problem.

Changes:

  • Delete TaskRepository.FinalizePlanningAsync entirely. Also delete its tests in TaskRepositoryPlanningTests.cs.
  • Rewrite PlanningSessionManager.FinalizeAsync(taskId, queueAgentTasks, ct):
    1. _state.FinalizePlanningAsync(parentId, ct) (sets parent PlanningPhase=Finalized, Status=Idle).
    2. If queueAgentTasks is true, call the new _chainCoordinator.SetupChainAsync(parentId, ct).
    3. Existing worktree-cleanup + session-dir-deletion remains.
    4. Return the count of children that ended up in the chain.
  • Rename PlanningChainCoordinator.QueueSubtasksSequentiallyAsyncSetupChainAsync. Make it internal. New behavior:
    • Eligibility check: children must be in Status=Idle (was Manual or Planned legacy values — keep tolerating those for one slice via OR).
    • Auto-attach agent tag to all children (already in WIP — keep that behavior).
    • For first child: _state.EnqueueAsync(child[0].Id, ct) (no BlockedBy, head of chain).
    • For rest: _state.EnqueueAsync(child[i].Id, ct) followed immediately by _state.BlockOnAsync(child[i].Id, child[i-1].Id, ct). (Or: add a single EnqueueBlockedAsync helper to TaskStateService if call-site clutter bothers you.)
  • Update PlanningChainCoordinator.OnChildFinishedAsync: replace status-via-LINQ logic with: query for the next child where BlockedByTaskId == childTaskId, call _state.UnblockAsync on it. Drop the Waiting lookup entirely.
  • Audit Status == TaskStatus.Waiting in UI/tests — replace with Status == Queued && BlockedByTaskId != null. (UI changes confirmed against TaskRowViewModel, TasksIslandViewModel from Slice 1's WIP.)

Regression test: new tests/ClaudeDo.Worker.Tests/Planning/PlanningEndToEndTests.cs (or extend existing) — Active parent + 3 drafts → call FinalizeAsync(queueAgentTasks: true) → assert within 200 ms the first child has Status=Running (queue picker claimed it) without anyone calling WakeQueue() manually. This was the bug the user originally reported.

Update PlanningMcpService.EditableStatuses — replace Waiting with Queued (since blocked tasks are now Queued + BlockedByTaskId). Verify the MCP tool still gates on parent.PlanningPhase == Active (legacy: parent.Status == Planning).

Build, test, commit as feat(planning): consolidate finalize+chain via TaskStateService, fix queue pickup.


Slice 5 — OverrideSlotService + folder reorg

Prompt to paste into a fresh session:

Slice 5 of the worker state consolidation refactor. Spec: docs/superpowers/specs/2026-04-27-worker-state-and-queue-consolidation-design.md (section 5). Slices 14 already landed.

Goal: split the override slot out of QueueService and reorganize Worker/Services/ into domain folders.

OverrideSlotService (new in Worker/Queue/):

  • Owns the _overrideSlot field, RunNow(taskId), ContinueTask(taskId, followUpPrompt), and the override-slot piece of CancelTask.
  • Status mutations go through TaskStateService.StartRunningAsync (non-atomic claim is fine; serialized by slot lock).
  • QueueService.CancelTask delegates to OverrideSlotService.TryCancel first, falls back to its own queue slot.
  • WorkerHub's RunNow/ContinueTask/CancelTask SignalR endpoints route to the new service via OverrideSlotService when applicable; keep the signatures stable.

Folder reorg (use git mv, don't copy/delete):

Worker/State/        ← ITaskStateService.cs, TaskStateService.cs, TransitionResult.cs (already exist; no move needed if already there)
Worker/Queue/        ← IQueueWaker.cs, QueueWaker.cs, IQueuePicker.cs, QueuePicker.cs, QueueService.cs, OverrideSlotService.cs, QueueSlotState.cs
Worker/Lifecycle/    ← StaleTaskRecovery.cs, TaskResetService.cs, TaskMergeService.cs
Worker/Worktrees/    ← WorktreeMaintenanceService.cs
Worker/Agents/       ← AgentFileService.cs, DefaultAgentSeeder.cs
Worker/Runner/       ← unchanged
Worker/Planning/     ← unchanged
Worker/External/     ← unchanged
Worker/Hub/          ← unchanged

Update namespaces to match folders (existing convention: namespace == folder path under ClaudeDo.Worker). Delete the old Worker/Services/ folder once empty.

Update DI registrations in Program.cs (both apps) — most calls just need using updates. OverrideSlotService is a new singleton.

Update test using statements to follow.

Build, test, commit as refactor(worker): extract OverrideSlotService and reorganize Worker/Services into domain folders.


Slice 6 — Cleanup, legacy retirement, docs

Prompt to paste into a fresh session:

Slice 6 (final) of the worker state consolidation refactor. Spec: docs/superpowers/specs/2026-04-27-worker-state-and-queue-consolidation-design.md (section 6 + slice plan). Slices 15 already landed.

Goal: retire legacy enum values, backfill DB rows, update docs.

EF migration RetireLegacyTaskStatus:

UPDATE tasks SET status='idle' WHERE status IN ('manual', 'draft');
UPDATE tasks SET status='idle', planning_phase='active'    WHERE status='planning';
UPDATE tasks SET status='idle', planning_phase='finalized' WHERE status='planned';

-- Waiting → Queued + blocked_by from sort_order:
WITH ordered AS (
  SELECT id,
         LAG(id) OVER (PARTITION BY parent_task_id ORDER BY sort_order, created_at) AS prev_id
  FROM tasks WHERE status='waiting'
)
UPDATE tasks
SET status='queued',
    blocked_by_task_id=(SELECT prev_id FROM ordered WHERE ordered.id=tasks.id)
WHERE id IN (SELECT id FROM ordered);

Use migrationBuilder.Sql(...) for these. Down() is best-effort: CancelledFailed, (idle, finalized)planned, (idle, active)planning, queued + blocked_by_task_id != nullwaiting. Document lossiness in a comment.

Code changes:

  • Remove legacy values from TaskStatus enum: Manual, Planning, Planned, Draft, Waiting.
  • Strip the legacy branches from TaskEntityConfiguration.StatusToString/StatusFromString.
  • Default for TaskEntity.Status is TaskStatus.Idle (already correct after Slice 1's revert).
  • Audit + remap every remaining caller — they should already use new values from Slices 24, but search for any leftover TaskStatus.Manual etc. in:
    • tests (~10 files seed status — flip to Idle/Queued/etc.)
    • UI (TaskRowViewModel.IsPlanningParent, IsDraft, CanOpenPlanningSession, status maps — replace with PlanningPhase checks where appropriate)
    • any leftover guards in MCP/services
  • Mark Mark*Async repo helpers as internal if not already (Slice 2 should have done this — verify).

Docs to update:

  • src/ClaudeDo.Worker/CLAUDE.md — new folder structure, new state-service flow, new wake mechanics, removal of legacy values.
  • src/ClaudeDo.Data/CLAUDE.md — TaskEntity new fields (PlanningPhase, BlockedByTaskId), retired legacy enum values, new tag-attach behavior.
  • docs/plan.md — update status flow section.
  • docs/open.md — close the "queue doesn't pick up planning tasks" item if it's tracked there; add any follow-ups discovered along the way.
  • Memory: update ~/.claude/projects/C--Private-ClaudeDo/memory/ with a new entry summarizing the new architecture (state-service + queue split + planning chain via blocked-by).

Sanity tests — full test run. The 4 pre-existing constructor-drift errors should still be the only failures. If new ones surfaced from missed legacy-value remappings, fix them before commit.

Build, full test run, commit as refactor(data): retire legacy TaskStatus values and backfill existing rows.


After Slice 6

  • All 6 slices on main.
  • The original bug ("queue doesn't pick up planning tasks") is structurally impossible.
  • Worker has clear domain folders, single state-mutator, single queue-picker.
  • Spec doc + this prompt file can be deleted or moved to docs/superpowers/done/.