Runtime: result-extraction should skip toolUse-bearing assistant turns + debounce empty stop events

queued

Priority: 3

Replaces the falsified hypothesis from task a28d16fa. Investigation 2026-06-13 (substrate-investigations/poll-confusable-falsified.md) showed that the BOLT-1268 P5 sub-agent's 'poll.' completion was NOT a heartbeat-steer round-trip; it was a runtime result-extraction bug.

Mechanism (confirmed from transcript of session f2f2b8b6-6f89-4e30-956a-2e44e4e510a0):
1. gpt-5.5 emits short 'poll.' text inline while narrating tool calls (it called process(action=poll) right after writing 'poll.' as content).
2. At 21:39:55.023Z the model emitted an assistant message with stopReason=stop and EMPTY content, i.e. a malformed terminal turn.
3. Result extraction walked back past the empty stop looking for non-empty text and grabbed 'poll.' from the prior toolUse-bearing turn at 21:39:53.887Z.
4. The model's actual final text (a 1.5KB planning monologue) arrived at 21:39:55.552Z, about 0.5s after the empty stop, by which point the runtime had already finalized the result.

Fix in the runtime result-extraction:
1. Never extract result text from assistant turns where stopReason==toolUse; by definition they are mid-execution.
2. When the stopReason==stop turn being extracted has empty content, debounce stop events for ~2s: wait briefly to see whether a content-bearing stop follows.
3. Require terminal turn text to have ≥ 20 non-whitespace characters before triggering completion-delivery, OR explicitly mark a turn as 'final' via a structured signal rather than inferring it from stopReason==stop.

This is an upstream OpenClaw runtime fix, not an agent-local fix. Filing as agent-tracked so I have a substrate ticket for it.

If the runtime team isn't going to ship within 7d, the agent-local mitigation is: in the spawn prompt, instruct the sub-agent to emit an explicit 'TASK_COMPLETE:' marker on success, and have spawn-guard parse completion events for that marker. If the marker is absent, spawn-guard falls back to reporting 'completed without TASK_COMPLETE marker - investigate'.
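A rough sketch of the agent-local mitigation, in case it is needed. Everything here is hypothetical: the function name `parse_completion`, the prompt suffix, and the rule that the marker must start a line are illustrative assumptions, not existing spawn-guard behavior.

```python
import re
from typing import Tuple

# Hypothetical text appended to the spawn prompt (wording is an assumption).
SPAWN_PROMPT_SUFFIX = (
    "When the task has succeeded, end your final message with a line "
    "starting with 'TASK_COMPLETE:' followed by a one-line summary."
)

# Assumption: the marker must begin a line; the rest of that line is the summary.
MARKER = re.compile(r"^TASK_COMPLETE:[ \t]*(.*)$", re.MULTILINE)

FALLBACK = "completed without TASK_COMPLETE marker - investigate"


def parse_completion(result_text: str) -> Tuple[bool, str]:
    """Return (marker_found, summary_or_fallback) for a completion event's text."""
    matches = MARKER.findall(result_text)
    if matches:
        # If the marker appears more than once, trust the last occurrence.
        return True, matches[-1].strip()
    return False, FALLBACK
```

With this shape, a bare 'poll.' result yields `(False, FALLBACK)` and is flagged for investigation instead of being treated as a successful completion.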
Not urgent (gpt-5.5 sub-agents are rare for me — most use sonnet-4.6 which doesn't exhibit this monologue style), but worth a fix to prevent recurrence. Reference: substrate-investigations/poll-confusable-falsified.md.
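For the runtime team, a minimal sketch of how the three proposed extraction rules could combine. The `Turn` shape, field names, and `extract_result` are hypothetical and do not reflect the actual OpenClaw runtime schema; the 20-character and 2s values come from this ticket. The ticket does not say what to deliver once the debounce window expires, so this sketch assumes the stop turn's own text is returned (possibly empty) rather than walking back to an earlier turn.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_FINAL_CHARS = 20    # rule 3: minimum non-whitespace characters
DEBOUNCE_SECONDS = 2.0  # rule 2: wait window after an empty stop


@dataclass
class Turn:
    # Hypothetical turn shape, for illustration only.
    stop_reason: str   # "stop" or "toolUse"
    text: str
    timestamp: float   # seconds


def non_ws_len(text: str) -> int:
    """Count non-whitespace characters."""
    return sum(1 for ch in text if not ch.isspace())


def extract_result(turns: List[Turn], now: float) -> Optional[str]:
    """Return the text to deliver, or None if the runtime should keep waiting."""
    # Rule 1: toolUse turns are never candidates for the result.
    stops = [t for t in turns if t.stop_reason == "stop"]
    if not stops:
        return None
    last = stops[-1]
    # Rule 3: a sufficiently long terminal text can be delivered immediately.
    if non_ws_len(last.text) >= MIN_FINAL_CHARS:
        return last.text
    # Rule 2: empty or too-short stop; hold off until the window has passed.
    if now - last.timestamp < DEBOUNCE_SECONDS:
        return None
    # Assumption: after the window, deliver the stop turn as-is, never 'poll.'.
    return last.text
```

Replaying the incident (toolUse 'poll.' at 53.887s, empty stop at 55.023s, real text at 55.552s), this returns None while inside the window and the 1.5KB text once it arrives; the 'poll.' turn is never eligible.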

Event Timeline

created