mika kuns 30b49d1071 docs: design for reusable child tasks + agent improvement loop

Agent offloads out-of-scope work via SuggestImprovement; children run
automatically; new WaitingForChildren state; generalize planning's
parent/child machinery.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-06-04 13:36:53 +02:00


Reusable Child Tasks + Agent Improvement Loop — Design

Date: 2026-06-04

Goal

Let an executing task agent offload out-of-scope improvements it spots into child tasks that run automatically, so ClaudeDo can drive a self-improvement loop. Generalize the parent/child machinery that planning uses today into a reusable subsystem not bound to planning.

Example: while implementing task X, Claude notices "this module should really be refactored, but that's out of scope" — instead of scope-creeping, it calls a tool that files the refactor as a child of X. The child runs on its own; once all of X's children finish, X surfaces for review with its whole tree visible.

This builds on the bundled-prompts overhaul (system.md gains one instruction to use the offload tool). It is otherwise independent.

Lifecycle

A new task status WaitingForChildren is added.

Running            → WaitingForReview     standalone success, no children            (existing)
Running            → WaitingForChildren   standalone success, ≥1 child               (new)
Running            → Done                 planning child success                     (existing)
Running            → Done                 improvement child success, no review gate  (new)
WaitingForChildren → WaitingForReview     all children terminal                      (new)
WaitingForChildren → Cancelled            cancel                                     (new)

Improvement-children are created Idle during the parent's run and stay unqueued until the parent's own run finishes — this avoids the parent and a child working the same repo concurrently.
When the parent's run succeeds and it has ≥1 non-terminal child, the parent moves to WaitingForChildren and its children are enqueued. From there they run under the normal queue, governed by max-parallel: they are independent of one another, not a forced sequential chain like planning's.
Children run automatically and reach Done on success without their own review gate (a per-child review would stall the loop). Each child still produces its own worktree/commit; those worktrees are surfaced under the parent for merge.
The parent advances to WaitingForReview once all children are terminal — counting Done, Failed, and Cancelled, so a failed child can't wedge the parent forever. Failed/cancelled children are flagged on the review card.

Planning parents keep their existing behavior (parent → Done when its chain finishes); they do not use WaitingForChildren.
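
The routing on a successful run, as described above, can be sketched roughly as follows. This is an illustrative Python sketch, not the project's C# code: the `Task` dataclass, `on_run_succeeded`, and the `enqueue` callback are hypothetical names, and planning parents (which keep their existing path) are deliberately not modelled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Status(Enum):
    IDLE = "Idle"
    RUNNING = "Running"
    WAITING_FOR_CHILDREN = "WaitingForChildren"
    WAITING_FOR_REVIEW = "WaitingForReview"
    DONE = "Done"
    FAILED = "Failed"
    CANCELLED = "Cancelled"


TERMINAL = {Status.DONE, Status.FAILED, Status.CANCELLED}


@dataclass
class Task:
    id: str
    status: Status = Status.IDLE
    parent_id: Optional[str] = None


def on_run_succeeded(task: Task, children: List[Task],
                     enqueue: Callable[[str], None]) -> Status:
    """Pick the next status for a task whose run just succeeded."""
    if task.parent_id is not None:
        # Children finish without their own review gate.
        task.status = Status.DONE
        return task.status
    pending = [c for c in children if c.status not in TERMINAL]
    if pending:
        # Children were held back (Idle) during the run; release them now.
        task.status = Status.WAITING_FOR_CHILDREN
        for child in pending:
            enqueue(child.id)
    else:
        task.status = Status.WAITING_FOR_REVIEW
    return task.status
```

Passing `enqueue` as a callback keeps the sketch independent of how the real queue works; the only point it makes is that children are released after the parent's run ends, never during it.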

Consolidating the child subsystem

Today child handling is planning-coupled. Generalize:

TaskRepository.CreateChildAsync — drop the parent.PlanningPhase != None guard. A child can attach to any existing parent. (Planning callers are unaffected; their parents have a planning phase.) The child sets ParentTaskId = parentId; the caller decides CreatedBy.
Child-completion coordinator — generalize planning's OnChildFinishedAsync / TryCompleteParentAsync into a single component that, on any child reaching a terminal state, checks the parent and applies a completion policy:
- planning parent → finalize/Done (existing chain advancement stays in the planning layer: unblock the next chained child).
- improvement parent (in WaitingForChildren, all children terminal) → WaitingForReview.
TaskStateService remains the sole writer of Status and owns the new transitions (SubmitForChildrenAsync, the WaitingForChildren → WaitingForReview advance).
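
A minimal sketch of the completion policy, under stated assumptions: it is written in Python for illustration only, `on_child_terminal` and the `is_planning` flag (a stand-in for the planning-phase check) are hypothetical, and a planning chain is treated as finished once every child is terminal, which simplifies whatever the planning layer actually does.

```python
from dataclasses import dataclass
from typing import List

TERMINAL = {"Done", "Failed", "Cancelled"}


@dataclass
class Task:
    id: str
    status: str
    is_planning: bool = False  # assumption: stands in for "has a planning phase"


def on_child_terminal(parent: Task, children: List[Task]) -> List[Task]:
    """Apply the parent's completion policy.

    Returns the failed/cancelled children once all children are terminal,
    or an empty list while at least one child is still non-terminal.
    """
    if any(c.status not in TERMINAL for c in children):
        return []  # still waiting on at least one child
    if parent.is_planning:
        parent.status = "Done"
    elif parent.status == "WaitingForChildren":
        parent.status = "WaitingForReview"
    # Any other parent status (e.g. still Running) is left untouched.
    return [c for c in children if c.status in ("Failed", "Cancelled")]
```

In the real design the status write would go through TaskStateService rather than a direct assignment; the sketch only shows the branching between the two policies.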

The offload tool

A narrow MCP tool exposed only to task runs (not the general external surface):

SuggestImprovement(title, description) → { childTaskId }

The server stamps everything — the agent cannot choose the parent, the status, or queue anything directly:
- ParentTaskId = <calling task id>
- CreatedBy = <calling task id> (unambiguous "agent-suggested improvement" marker — distinct from null user/planning tasks and "mcp" external tasks)
- Status = Idle, same ListId as the parent.
One layer deep: the tool rejects the call if the calling task already has a ParentTaskId (a child cannot spawn children).
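
The server-side stamping and the one-layer-deep guard might look roughly like this. It is a Python sketch for illustration, not the project's implementation: the in-memory `tasks` dict, the `Task` fields, and `NestedImprovementError` are hypothetical, and `caller_id` is assumed to come from the server's own context, never from the model.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Task:
    id: str
    title: str
    description: str
    list_id: str
    status: str = "Idle"
    parent_id: Optional[str] = None
    created_by: Optional[str] = None


class NestedImprovementError(Exception):
    """Raised when a child task tries to spawn its own child."""


def suggest_improvement(tasks: Dict[str, Task], caller_id: str,
                        title: str, description: str) -> Dict[str, str]:
    caller = tasks[caller_id]
    if caller.parent_id is not None:
        # One layer deep: a child cannot spawn children.
        raise NestedImprovementError(f"task {caller_id} is itself a child")
    child = Task(
        id=uuid.uuid4().hex,
        title=title,
        description=description,
        list_id=caller.list_id,   # same list as the parent
        status="Idle",            # stays unqueued until the parent finishes
        parent_id=caller.id,      # stamped by the server
        created_by=caller.id,     # marks it as agent-suggested
    )
    tasks[child.id] = child
    return {"childTaskId": child.id}
```

The agent only ever supplies `title` and `description`; everything else is derived from the caller, which is what keeps provenance trustworthy.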

Knowing the caller's identity

The always-on external claudedo MCP is shared and can't tell which task is calling. So task runs get a per-run MCP identity, mirroring planning's per-session token:

TaskRunner mints a per-run token and either writes a run-scoped .mcp.json or reuses the global server with a token header, so that the server resolves token → calling task id on behalf of the offload tool. A TaskRunMcpContextAccessor exposes the current task id to the tool, the same way PlanningMcpContextAccessor does.
This is the reliable path for both correct provenance and the one-layer-deep guard — the id is never supplied by the model.
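
One way to picture the token → task id resolution is the sketch below. It is Python for illustration only and makes several assumptions: `RunTokenRegistry`, `call_with_identity`, and `current_task_id` are hypothetical names, tokens live in memory, and revoking the token when the run ends is a guess rather than something the design states.

```python
import secrets
from contextvars import ContextVar
from typing import Callable, Dict, Optional, TypeVar

T = TypeVar("T")

# Plays the role of the context accessor: the tool reads the id from here.
_current_task_id: ContextVar[Optional[str]] = ContextVar(
    "current_task_id", default=None)


class RunTokenRegistry:
    def __init__(self) -> None:
        self._task_by_token: Dict[str, str] = {}

    def mint(self, task_id: str) -> str:
        """Create an unguessable token bound to one task run."""
        token = secrets.token_urlsafe(32)
        self._task_by_token[token] = task_id
        return token

    def resolve(self, token: str) -> Optional[str]:
        return self._task_by_token.get(token)

    def revoke(self, token: str) -> None:
        # Assumption: tokens are dropped when the run ends.
        self._task_by_token.pop(token, None)


def current_task_id() -> Optional[str]:
    return _current_task_id.get()


def call_with_identity(registry: RunTokenRegistry, token: str,
                       tool: Callable[[], T]) -> T:
    """Resolve the token server-side, then run the tool with the id in context."""
    task_id = registry.resolve(token)
    if task_id is None:
        raise PermissionError("unknown or expired run token")
    reset_handle = _current_task_id.set(task_id)
    try:
        return tool()
    finally:
        _current_task_id.reset(reset_handle)
```

The tool body never receives a task id as an argument, so there is nothing for the model to spoof; it can only present the token its run was given.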

system.md gains a short instruction (from the prompt-overhaul spec):

## Out-of-scope improvements
If you notice worthwhile work that is genuinely outside this task's scope
(a refactor, a follow-up, tech debt), do NOT do it here. File it with
SuggestImprovement(title, description) and stay focused on the task at hand.

UI

Collapsible tree: children group under their parent (by ParentTaskId). Improvement-children are visually marked as agent-suggested (via CreatedBy == parentId).
New status chip for WaitingForChildren (e.g. amber "waiting on N improvements") with its own color in StatusColorConverter.
Review card for a parent in WaitingForReview lists child outcomes (done/failed/cancelled) and exposes their worktrees for merge.
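
The grouping and marking rules above reduce to a few small functions. The following Python sketch is illustrative only: the function names are hypothetical, the chip wording follows the example text above, and counting only non-terminal children as "N" is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

TERMINAL = {"Done", "Failed", "Cancelled"}


@dataclass
class Task:
    id: str
    status: str
    parent_id: Optional[str] = None
    created_by: Optional[str] = None


def group_by_parent(tasks: List[Task]) -> List[Tuple[Task, List[Task]]]:
    """Pair each top-level task with its children, preserving input order."""
    children: Dict[str, List[Task]] = defaultdict(list)
    for t in tasks:
        if t.parent_id is not None:
            children[t.parent_id].append(t)
    return [(t, children.get(t.id, [])) for t in tasks if t.parent_id is None]


def is_agent_suggested(task: Task) -> bool:
    """Improvement-children carry CreatedBy == ParentTaskId."""
    return task.parent_id is not None and task.created_by == task.parent_id


def waiting_chip(children: List[Task]) -> str:
    # Assumption: N counts children that have not reached a terminal state.
    n = sum(1 for c in children if c.status not in TERMINAL)
    noun = "improvement" if n == 1 else "improvements"
    return f"waiting on {n} {noun}"
```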

Data / migration

Add WaitingForChildren to the TaskStatus enum and its EF ValueConverter. No new columns — ParentTaskId and CreatedBy already exist. No backfill needed (no existing rows use the new value).

Touch points

src/ClaudeDo.Data/Models/TaskStatus (enum) + TaskEntityConfiguration — new value.
src/ClaudeDo.Data/Repositories/TaskRepository.cs — generalize CreateChildAsync.
src/ClaudeDo.Worker/State/TaskStateService.cs — WaitingForChildren transitions.
src/ClaudeDo.Worker/Runner/TaskRunner.cs — route to WaitingForChildren when children exist; enqueue children on parent finish; mint per-run MCP token.
New: child-completion coordinator (generalized from planning) + the offload tool (e.g. TaskRunMcpService.SuggestImprovement) + TaskRunMcpContextAccessor + token auth (mirrors PlanningTokenAuth).
src/ClaudeDo.Worker/Planning/* — refactor planning to consume the shared child-completion coordinator; keep chain-specific advancement local.
UI — tree grouping, WaitingForChildren chip/color, parent review card with child outcomes.
Tests — offload tool stamps parent/createdBy + rejects nested calls; parent → WaitingForChildren → WaitingForReview lifecycle; planning regression (still reaches Done).

Open questions for review

Child review/merge: children reach Done without review and leave worktrees for manual merge. Is reviewing the whole tree at the parent enough, or do you want per-child merge controls in the parent's review card? (Default: surface child worktrees under the parent.)
Failed child: should the parent still advance to WaitingForReview with the failure flagged (default), or should it go to Failed if any child failed?

Out of scope

Multi-level nesting (only one layer deep by design).
Per-list "disable improvement offload" toggle (could come later; the tool is always available to top-level runs for now).
Changes to how planning sets up its sequential chain.
