diff --git a/docs/superpowers/specs/2026-06-04-bundled-prompts-overhaul-design.md b/docs/superpowers/specs/2026-06-04-bundled-prompts-overhaul-design.md
index d727e66..839a4f5 100644
--- a/docs/superpowers/specs/2026-06-04-bundled-prompts-overhaul-design.md
+++ b/docs/superpowers/specs/2026-06-04-bundled-prompts-overhaul-design.md
@@ -125,6 +125,11 @@ Emit it as many times as needed — once per distinct blocker. Use it only for t
 blockers, not for routine decisions you can make yourself.
 ```
 
+> `system.md` also gains an **"Out-of-scope improvements"** section that tells the
+> agent to file follow-up work via the `SuggestImprovement` tool. That section is
+> defined in `2026-06-04-child-tasks-and-improvement-loop-design.md` and lands with
+> that feature.
+
 ### `planning-system.md`
 ```markdown
 You are the planning assistant for ClaudeDo. Your job is to break a task into
diff --git a/docs/superpowers/specs/2026-06-04-child-tasks-and-improvement-loop-design.md b/docs/superpowers/specs/2026-06-04-child-tasks-and-improvement-loop-design.md
new file mode 100644
index 0000000..d47cdb2
--- /dev/null
+++ b/docs/superpowers/specs/2026-06-04-child-tasks-and-improvement-loop-design.md
@@ -0,0 +1,156 @@
+# Reusable Child Tasks + Agent Improvement Loop — Design
+
+Date: 2026-06-04
+
+## Goal
+
+Let an executing task agent offload out-of-scope improvements it spots into
+**child tasks** that run automatically, so ClaudeDo can drive a self-improvement
+loop. Generalize the parent/child machinery that planning uses today into a
+reusable subsystem not bound to planning.
+
+Example: while implementing task X, Claude notices "this module should really be
+refactored, but that's out of scope" — instead of scope-creeping, it calls a tool
+that files the refactor as a child of X. The child runs on its own; once all of
+X's children finish, X surfaces for review with its whole tree visible.
+
+This builds on the bundled-prompts overhaul (`system.md` gains one instruction to
+use the offload tool). It is otherwise independent.
+
+## Lifecycle
+
+A new task status `WaitingForChildren` is added.
+
+```
+Running            → WaitingForReview     standalone success, no children (existing)
+Running            → WaitingForChildren   standalone success, ≥1 child    (new)
+Running            → Done                 planning child success          (existing)
+WaitingForChildren → WaitingForReview     all children terminal           (new)
+WaitingForChildren → Cancelled            cancel                          (new)
+```
+
+- Improvement-children are created `Idle` **during** the parent's run and stay
+  unqueued until the parent's own run finishes — this avoids the parent and a
+  child working the same repo concurrently.
+- When the parent's run succeeds and it has ≥1 non-terminal child, the parent goes
+  to `WaitingForChildren` and its children are enqueued (they then run under the
+  normal queue, governed by max-parallel — they are independent, not a forced
+  sequential chain like planning).
+- Children run automatically and reach `Done` on success without their own review
+  gate (a per-child review would stall the loop). Each child still produces its
+  own worktree/commit; those worktrees are surfaced under the parent for merge.
+- The parent advances to `WaitingForReview` once **all** children are terminal —
+  counting `Done`, `Failed`, and `Cancelled`, so a failed child can't wedge the
+  parent forever. Failed/cancelled children are flagged on the review card.
+
+Planning parents keep their existing behavior (parent → `Done` when its chain
+finishes); they do not use `WaitingForChildren`.
+
+## Consolidating the child subsystem
+
+Today child handling is planning-coupled. Generalize:
+
+- **`TaskRepository.CreateChildAsync`** — drop the `parent.PlanningPhase != None`
+  guard. A child can attach to any existing parent. (Planning callers are
+  unaffected; their parents have a planning phase.) The child sets
+  `ParentTaskId = parentId`; the caller decides `CreatedBy`.
+- **Child-completion coordinator** — generalize planning's
+  `OnChildFinishedAsync` / `TryCompleteParentAsync` into a single component that,
+  on any child reaching a terminal state, checks the parent and applies a
+  **completion policy**:
+  - *planning parent* → finalize/Done (existing chain advancement stays in the
+    planning layer: unblock the next chained child).
+  - *improvement parent* (in `WaitingForChildren`, all children terminal) →
+    `WaitingForReview`.
+- `TaskStateService` remains the sole writer of `Status` and owns the new
+  transitions (`SubmitForChildrenAsync`, the `WaitingForChildren → WaitingForReview`
+  advance).
+
+## The offload tool
+
+A narrow MCP tool exposed only to task runs (not the general external surface):
+
+```
+SuggestImprovement(title, description) → { childTaskId }
+```
+
+- The **server** stamps everything — the agent cannot choose the parent, the
+  status, or queue anything directly:
+  - `ParentTaskId = `
+  - `CreatedBy = ` (unambiguous "agent-suggested improvement"
+    marker — distinct from `null` user/planning tasks and `"mcp"` external tasks)
+  - `Status = Idle`, same `ListId` as the parent.
+- **One layer deep:** the tool rejects the call if the calling task already has a
+  `ParentTaskId` (a child cannot spawn children).
+
+### Knowing the caller's identity
+
+The always-on external `claudedo` MCP is shared and can't tell which task is
+calling. So task runs get a **per-run MCP identity**, mirroring planning's
+per-session token:
+
+- `TaskRunner` mints a per-run token and writes a run-scoped `.mcp.json` (or
+  reuses the global server with a token header) so the offload tool resolves
+  token → calling task id server-side. A `TaskRunMcpContextAccessor` exposes the
+  current task id to the tool, the same way `PlanningMcpContextAccessor` does.
+- This is the reliable path for both correct provenance and the one-layer-deep
+  guard — the id is never supplied by the model.
+
+`system.md` gains a short instruction (from the prompt-overhaul spec):
+
+```markdown
+## Out-of-scope improvements
+If you notice worthwhile work that is genuinely outside this task's scope
+(a refactor, a follow-up, tech debt), do NOT do it here. File it with
+SuggestImprovement(title, description) and stay focused on the task at hand.
+```
+
+## UI
+
+- **Collapsible tree:** children group under their parent (by `ParentTaskId`).
+  Improvement-children are visually marked as agent-suggested (via
+  `CreatedBy == parentId`).
+- **New status chip** for `WaitingForChildren` (e.g. amber "waiting on N
+  improvements") with its own color in `StatusColorConverter`.
+- **Review card** for a parent in `WaitingForReview` lists child outcomes
+  (done/failed) and exposes their worktrees for merge.
+
+## Data / migration
+
+- Add `WaitingForChildren` to the `TaskStatus` enum and its EF `ValueConverter`.
+  No new columns — `ParentTaskId` and `CreatedBy` already exist. No backfill
+  needed (no existing rows use the new value).
+
+## Touch points
+
+- `src/ClaudeDo.Data/Models/TaskStatus` (enum) + `TaskEntityConfiguration` — new value.
+- `src/ClaudeDo.Data/Repositories/TaskRepository.cs` — generalize `CreateChildAsync`.
+- `src/ClaudeDo.Worker/State/TaskStateService.cs` — `WaitingForChildren` transitions.
+- `src/ClaudeDo.Worker/Runner/TaskRunner.cs` — route to `WaitingForChildren` when
+  children exist; enqueue children on parent finish; mint per-run MCP token.
+- New: child-completion coordinator (generalized from planning) + the offload tool
+  (e.g. `TaskRunMcpService.SuggestImprovement`) + `TaskRunMcpContextAccessor` +
+  token auth (mirrors `PlanningTokenAuth`).
+- `src/ClaudeDo.Worker/Planning/*` — refactor planning to consume the shared
+  child-completion coordinator; keep chain-specific advancement local.
+- UI — tree grouping, `WaitingForChildren` chip/color, parent review card with
+  child outcomes.
+- Tests — offload tool stamps parent/createdBy + rejects nested calls;
+  parent → `WaitingForChildren` → `WaitingForReview` lifecycle; planning
+  regression (still reaches Done).
+
+## Open questions for review
+
+1. **Child review/merge:** children reach `Done` without review and leave
+   worktrees for manual merge. Is reviewing the whole tree at the parent enough,
+   or do you want per-child merge controls in the parent's review card? (Default:
+   surface child worktrees under the parent.)
+2. **Failed child:** parent still advances to `WaitingForReview` with the failure
+   flagged (default), vs. parent → `Failed` if any child failed.
+
+## Out of scope
+
+- Multi-level nesting (only one layer deep by design).
+- Per-list "disable improvement offload" toggle (could come later; the tool is
+  always available to top-level runs for now).
+- Changes to how planning sets up its sequential chain.
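
The improvement-parent lifecycle described in the new spec can be sketched as a small in-memory state machine. This is an illustration only: it is written in Python for brevity rather than in the project's own language, and `Board`, `Task`, `suggest_improvement`, `on_parent_run_succeeded`, and `on_child_finished` are hypothetical names, not the `TaskStateService` or coordinator API. The `caller_id` argument stands in for the task id that the server would resolve from the per-run token. Planning parents, the `Cancelled` transition, and the `CreatedBy` stamp (whose value the spec leaves blank) are not modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    IDLE = "Idle"
    RUNNING = "Running"
    WAITING_FOR_CHILDREN = "WaitingForChildren"
    WAITING_FOR_REVIEW = "WaitingForReview"
    DONE = "Done"
    FAILED = "Failed"
    CANCELLED = "Cancelled"


# A child in any of these states no longer blocks its parent.
TERMINAL = {Status.DONE, Status.FAILED, Status.CANCELLED}


@dataclass
class Task:
    id: str
    title: str
    description: str = ""
    status: Status = Status.IDLE
    parent_id: Optional[str] = None


class Board:
    """Hypothetical stand-in for the repository, queue, and state service."""

    def __init__(self) -> None:
        self.tasks: Dict[str, Task] = {}
        self.queue: List[str] = []  # ids handed to the normal queue

    def add(self, task: Task) -> Task:
        self.tasks[task.id] = task
        return task

    def children_of(self, parent_id: str) -> List[Task]:
        return [t for t in self.tasks.values() if t.parent_id == parent_id]

    def suggest_improvement(self, caller_id: str, title: str, description: str) -> str:
        # caller_id is assumed to come from the per-run token, never from the model.
        caller = self.tasks[caller_id]
        if caller.parent_id is not None:
            raise ValueError("one layer deep: a child cannot spawn children")
        child_id = f"{caller_id}.{len(self.children_of(caller_id)) + 1}"
        self.add(Task(child_id, title, description, Status.IDLE, parent_id=caller_id))
        return child_id  # created Idle and deliberately not queued yet

    def on_parent_run_succeeded(self, parent_id: str) -> None:
        parent = self.tasks[parent_id]
        pending = [c for c in self.children_of(parent_id) if c.status not in TERMINAL]
        if pending:
            # At least one non-terminal child: wait, and only now enqueue them.
            parent.status = Status.WAITING_FOR_CHILDREN
            self.queue.extend(c.id for c in pending)
        else:
            parent.status = Status.WAITING_FOR_REVIEW

    def on_child_finished(self, child_id: str, outcome: Status) -> None:
        if outcome not in TERMINAL:
            raise ValueError("outcome must be Done, Failed, or Cancelled")
        child = self.tasks[child_id]
        child.status = outcome
        parent = self.tasks[child.parent_id]
        all_terminal = all(s.status in TERMINAL for s in self.children_of(parent.id))
        # A failed or cancelled child still counts, so the parent cannot wedge.
        if parent.status is Status.WAITING_FOR_CHILDREN and all_terminal:
            parent.status = Status.WAITING_FOR_REVIEW
```

In this sketch the children are enqueued only in `on_parent_run_succeeded`, which mirrors the rule that a parent and its child never work in the same repo concurrently. The parent moves to `WaitingForReview` on the last child's terminal outcome, whatever that outcome is.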