docs: design for reusable child tasks + agent improvement loop

Agent offloads out-of-scope work via SuggestImprovement; children run automatically; new WaitingForChildren state; generalize planning's parent/child machinery.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-04 13:36:53 +02:00
parent ad7d74820a
commit 30b49d1071
2 changed files with 161 additions and 0 deletions
--- a/docs/superpowers/specs/2026-06-04-bundled-prompts-overhaul-design.md
+++ b/docs/superpowers/specs/2026-06-04-bundled-prompts-overhaul-design.md
@@ -125,6 +125,11 @@ Emit it as many times as needed — once per distinct blocker. Use it only for t
 blockers, not for routine decisions you can make yourself.
 ```

+> `system.md` also gains an **"Out-of-scope improvements"** section that tells the
+> agent to file follow-up work via the `SuggestImprovement` tool. That section is
+> defined in `2026-06-04-child-tasks-and-improvement-loop-design.md` and lands with
+> that feature.
+
 ### `planning-system.md`
 ```markdown
 You are the planning assistant for ClaudeDo. Your job is to break a task into
--- a/docs/superpowers/specs/2026-06-04-child-tasks-and-improvement-loop-design.md
+++ b/docs/superpowers/specs/2026-06-04-child-tasks-and-improvement-loop-design.md
@@ -0,0 +1,156 @@
+# Reusable Child Tasks + Agent Improvement Loop — Design
+
+Date: 2026-06-04
+
+## Goal
+
+Let an executing task agent offload out-of-scope improvements it spots into
+**child tasks** that run automatically, so ClaudeDo can drive a self-improvement
+loop. Generalize the parent/child machinery that planning uses today into a
+reusable subsystem not bound to planning.
+
+Example: while implementing task X, Claude notices "this module should really be
+refactored, but that's out of scope" — instead of scope-creeping, it calls a tool
+that files the refactor as a child of X. The child runs on its own; once all of
+X's children finish, X surfaces for review with its whole tree visible.
+
+This builds on the bundled-prompts overhaul (`system.md` gains one instruction to
+use the offload tool). It is otherwise independent.
+
+## Lifecycle
+
+A new task status `WaitingForChildren` is added.
+
+```
+Running            → WaitingForReview     standalone success, no children           (existing)
+Running            → WaitingForChildren   standalone success, ≥1 non-terminal child (new)
+Running            → Done                 planning child success                    (existing)
+WaitingForChildren → WaitingForReview     all children terminal                     (new)
+WaitingForChildren → Cancelled            cancel                                    (new)
+```
+
+- Improvement-children are created `Idle` **during** the parent's run and stay
+  unqueued until the parent's own run finishes — this avoids the parent and a
+  child working the same repo concurrently.
+- When the parent's run succeeds and it has ≥1 non-terminal child, the parent goes
+  to `WaitingForChildren` and its children are enqueued (they then run under the
+  normal queue, governed by max-parallel — they are independent, not a forced
+  sequential chain like planning).
+- Children run automatically and reach `Done` on success without their own review
+  gate (a per-child review would stall the loop). Each child still produces its
+  own worktree/commit; those worktrees are surfaced under the parent for merge.
+- The parent advances to `WaitingForReview` once **all** children are terminal —
+  counting `Done`, `Failed`, and `Cancelled`, so a failed child can't wedge the
+  parent forever. Failed/cancelled children are flagged on the review card.
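
The routing rules above can be sketched as two small pure functions. This is an illustration in Python, not the project's C# code: the status names come from the spec, while `status_after_successful_run`, `children_to_enqueue`, and the `is_planning_child` flag are hypothetical names chosen for the example.

```python
from enum import Enum


class Status(Enum):
    IDLE = "Idle"
    RUNNING = "Running"
    WAITING_FOR_CHILDREN = "WaitingForChildren"
    WAITING_FOR_REVIEW = "WaitingForReview"
    DONE = "Done"
    FAILED = "Failed"
    CANCELLED = "Cancelled"


# A failed or cancelled child counts as finished, so it cannot wedge the parent.
TERMINAL = {Status.DONE, Status.FAILED, Status.CANCELLED}


def status_after_successful_run(is_planning_child, child_statuses):
    """Pick the status a task moves to when its own run succeeds."""
    if is_planning_child:
        return Status.DONE  # existing planning behavior, no review gate
    if any(s not in TERMINAL for s in child_statuses):
        return Status.WAITING_FOR_CHILDREN
    return Status.WAITING_FOR_REVIEW


def children_to_enqueue(child_statuses):
    """Indices of still-Idle children to hand to the normal queue."""
    return [i for i, s in enumerate(child_statuses) if s is Status.IDLE]
```

Enqueueing happens only after the parent's run has finished, which is what keeps the parent and its children from working the same repo at the same time.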
+
+Planning parents keep their existing behavior (parent → `Done` when its chain
+finishes); they do not use `WaitingForChildren`.
+
+## Consolidating the child subsystem
+
+Today child handling is planning-coupled. Generalize:
+
+- **`TaskRepository.CreateChildAsync`** — drop the `parent.PlanningPhase != None`
+  guard. A child can attach to any existing parent. (Planning callers are
+  unaffected; their parents have a planning phase.) The child sets
+  `ParentTaskId = parentId`; the caller decides `CreatedBy`.
+- **Child-completion coordinator** — generalize planning's
+  `OnChildFinishedAsync` / `TryCompleteParentAsync` into a single component that,
+  on any child reaching a terminal state, checks the parent and applies a
+  **completion policy**:
+  - *planning parent* → finalize/Done (existing chain advancement stays in the
+    planning layer: unblock the next chained child).
+  - *improvement parent* (in `WaitingForChildren`, all children terminal) →
+    `WaitingForReview`.
+- `TaskStateService` remains the sole writer of `Status` and owns the new
+  transitions (`SubmitForChildrenAsync`, the `WaitingForChildren → WaitingForReview`
+  advance).
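
A minimal sketch of the completion policy, assuming an in-memory task object. It is written in Python for brevity and is not the actual coordinator: `Task`, `is_planning_parent`, and `on_child_finished` are hypothetical names, and the direct status assignment stands in for a call into `TaskStateService`, which the design keeps as the sole writer of `Status`.

```python
from dataclasses import dataclass, field

TERMINAL = {"Done", "Failed", "Cancelled"}


@dataclass
class Task:
    id: str
    status: str
    is_planning_parent: bool = False  # hypothetical flag for the example
    children: list = field(default_factory=list)


def on_child_finished(parent):
    """Apply the parent's completion policy after a child became terminal.

    Returns the parent's (possibly unchanged) status.
    """
    if not all(c.status in TERMINAL for c in parent.children):
        return parent.status  # siblings are still outstanding
    if parent.is_planning_parent:
        # Planning policy: the chain is finished, so the parent is Done.
        parent.status = "Done"
    elif parent.status == "WaitingForChildren":
        # Improvement policy: surface the whole tree for review.
        parent.status = "WaitingForReview"
    return parent.status
```

An improvement parent that is still `Running` is deliberately left alone here; its own run decides the next status when it finishes.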
+
+## The offload tool
+
+A narrow MCP tool exposed only to task runs (not the general external surface):
+
+```
+SuggestImprovement(title, description) → { childTaskId }
+```
+
+- The **server** stamps everything — the agent cannot choose the parent, the
+  status, or queue anything directly:
+  - `ParentTaskId = <calling task id>`
+  - `CreatedBy = <calling task id>`  (unambiguous "agent-suggested improvement"
+    marker — distinct from `null` user/planning tasks and `"mcp"` external tasks)
+  - `Status = Idle`, same `ListId` as the parent.
+- **One layer deep:** the tool rejects the call if the calling task already has a
+  `ParentTaskId` (a child cannot spawn children).
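
The stamping and the one-layer-deep guard can be sketched as follows. This is a Python illustration under stated assumptions, not the real MCP tool: the `Task` dataclass, the in-memory `tasks` dict, and `NestedImprovementError` are hypothetical, and `calling_task_id` is assumed to come from the server-side run context rather than from the model.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    id: str
    list_id: str
    title: str
    description: str = ""
    status: str = "Idle"
    parent_task_id: Optional[str] = None
    created_by: Optional[str] = None


class NestedImprovementError(Exception):
    """Raised when a child task tries to spawn its own child."""


def suggest_improvement(tasks, calling_task_id, title, description):
    """Create an Idle child of the calling task.

    Only title and description come from the agent; every other
    field is chosen by the server.
    """
    caller = tasks[calling_task_id]
    if caller.parent_task_id is not None:
        raise NestedImprovementError("a child task cannot spawn children")
    child = Task(
        id=uuid.uuid4().hex,
        list_id=caller.list_id,      # same list as the parent
        title=title,
        description=description,
        status="Idle",               # stays unqueued until the parent finishes
        parent_task_id=caller.id,
        created_by=caller.id,        # marks it as agent-suggested
    )
    tasks[child.id] = child
    return {"childTaskId": child.id}
```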
+
+### Knowing the caller's identity
+
+The always-on external `claudedo` MCP is shared and can't tell which task is
+calling. So task runs get a **per-run MCP identity**, mirroring planning's
+per-session token:
+
+- `TaskRunner` mints a per-run token and writes a run-scoped `.mcp.json` (or
+  reuses the global server with a token header) so the offload tool resolves
+  token → calling task id server-side. A `TaskRunMcpContextAccessor` exposes the
+  current task id to the tool, the same way `PlanningMcpContextAccessor` does.
+- This is the reliable path for both correct provenance and the one-layer-deep
+  guard — the id is never supplied by the model.
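
One way to picture the token → task id resolution is a small server-side registry. The sketch below is Python and purely illustrative: `RunTokenRegistry` and its methods are hypothetical, and revoking the token when the run ends is an assumption the spec does not state.

```python
import secrets


class RunTokenRegistry:
    """Maps opaque per-run tokens to the task id they were minted for."""

    def __init__(self):
        self._task_by_token = {}

    def mint(self, task_id):
        """Create a fresh random token for one task run."""
        token = secrets.token_urlsafe(32)
        self._task_by_token[token] = task_id
        return token

    def resolve(self, presented):
        """Return the calling task id, or None for an unknown token."""
        return self._task_by_token.get(presented)

    def revoke(self, token):
        """Forget a token (assumed here to happen when the run ends)."""
        self._task_by_token.pop(token, None)
```

Because the tool only ever sees the id returned by `resolve`, the model has no parameter through which it could claim to be a different task.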
+
+`system.md` gains a short instruction (the prompt-overhaul spec references this
+section but defers its definition to this document):
+
+```markdown
+## Out-of-scope improvements
+If you notice worthwhile work that is genuinely outside this task's scope
+(a refactor, a follow-up, tech debt), do NOT do it here. File it with
+SuggestImprovement(title, description) and stay focused on the task at hand.
+```
+
+## UI
+
+- **Collapsible tree:** children group under their parent (by `ParentTaskId`).
+  Improvement-children are visually marked as agent-suggested (via
+  `CreatedBy == parentId`).
+- **New status chip** for `WaitingForChildren` (e.g. amber "waiting on N
+  improvements") with its own color in `StatusColorConverter`.
+- **Review card** for a parent in `WaitingForReview` lists child outcomes
+  (done/failed/cancelled) and exposes their worktrees for merge.
+
+## Data / migration
+
+- Add `WaitingForChildren` to the `TaskStatus` enum and its EF `ValueConverter`.
+  No new columns — `ParentTaskId` and `CreatedBy` already exist. No backfill
+  needed (no existing rows use the new value).
+
+## Touch points
+
+- `src/ClaudeDo.Data/Models/TaskStatus` (enum) + `TaskEntityConfiguration` — new value.
+- `src/ClaudeDo.Data/Repositories/TaskRepository.cs` — generalize `CreateChildAsync`.
+- `src/ClaudeDo.Worker/State/TaskStateService.cs` — `WaitingForChildren` transitions.
+- `src/ClaudeDo.Worker/Runner/TaskRunner.cs` — route to `WaitingForChildren` when
+  children exist; enqueue children on parent finish; mint per-run MCP token.
+- New: child-completion coordinator (generalized from planning) + the offload tool
+  (e.g. `TaskRunMcpService.SuggestImprovement`) + `TaskRunMcpContextAccessor` +
+  token auth (mirrors `PlanningTokenAuth`).
+- `src/ClaudeDo.Worker/Planning/*` — refactor planning to consume the shared
+  child-completion coordinator; keep chain-specific advancement local.
+- UI — tree grouping, `WaitingForChildren` chip/color, parent review card with
+  child outcomes.
+- Tests — offload tool stamps parent/createdBy + rejects nested calls;
+  parent → `WaitingForChildren` → `WaitingForReview` lifecycle; planning
+  regression (still reaches Done).
+
+## Open questions for review
+
+1. **Child review/merge:** children reach `Done` without review and leave
+   worktrees for manual merge. Is reviewing the whole tree at the parent enough,
+   or do you want per-child merge controls in the parent's review card? (Default:
+   surface child worktrees under the parent.)
+2. **Failed child:** should the parent still advance to `WaitingForReview` with the
+   failure flagged (default), or should it go to `Failed` if any child failed?
+
+## Out of scope
+
+- Multi-level nesting (only one layer deep by design).
+- Per-list "disable improvement offload" toggle (could come later; the tool is
+  always available to top-level runs for now).
+- Changes to how planning sets up its sequential chain.