Runtime: result-extraction should skip toolUse-bearing assistant turns + debounce empty stop events
queuedAgent: richie-engineer
Priority: 3
Replaces the falsified hypothesis from task a28d16fa. Investigation 2026-06-13 (substrate-investigations/poll-confusable-falsified.md) showed the BOLT-1268 P5 sub-agent's 'poll.' completion was NOT a heartbeat-steer round-trip — it was a runtime result-extraction bug.
Mechanism (confirmed from transcript of session f2f2b8b6-6f89-4e30-956a-2e44e4e510a0):
1. gpt-5.5 narrates its tool calls inline with short text: it wrote 'poll.' as content and called process(action=poll) right after.
2. At 21:39:55.023Z the model emitted an assistant message with stopReason=stop and EMPTY content, i.e. a malformed terminal turn.
3. Result extraction walked back past the empty stop to find non-empty text and grabbed 'poll.' from the prior toolUse-bearing turn at 21:39:53.887Z.
4. The model's actual final text (a 1.5KB planning monologue) arrived at 21:39:55.552Z — 0.5s after the runtime had already finalized the result.
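A minimal sketch of the failure, assuming the extraction simply walks the transcript newest-to-oldest and takes the first non-empty text. The Turn shape and naive_extract are hypothetical names for illustration, not the OpenClaw runtime's real code; the timestamps and 'poll.' come from the transcript, and the monologue text is a placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    """Hypothetical, simplified shape of one assistant turn."""
    timestamp: str    # time of day as logged in the transcript
    stop_reason: str  # "toolUse" or "stop"
    text: str         # text content of the turn; may be empty


def naive_extract(turns: List[Turn]) -> Optional[str]:
    """Assumed buggy behaviour: newest-to-oldest, first non-empty text wins."""
    for turn in reversed(turns):
        if turn.text.strip():
            return turn.text
    return None


# What the runtime had seen when the empty stop arrived.
seen_at_finalize = [
    Turn("21:39:53.887Z", "toolUse", "poll."),
    Turn("21:39:55.023Z", "stop", ""),
]
# Placeholder standing in for the real 1.5KB planning monologue.
late_turn = Turn("21:39:55.552Z", "stop", "<final planning monologue>")

# Finalizing on the empty stop walks back into the toolUse turn.
premature_result = naive_extract(seen_at_finalize)
# Had the runtime waited for the late turn, it would have picked that instead.
complete_result = naive_extract(seen_at_finalize + [late_turn])

print(repr(premature_result))
print(repr(complete_result))
```

The walk-back itself is not wrong in isolation; the problem is that it treats narration attached to a toolUse turn as a candidate result and runs before the last turn has arrived.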
Fix in the runtime result-extraction:
1. Never extract result text from assistant turns where stopReason==toolUse — by definition they are mid-execution.
2. When a stopReason==stop turn has empty content, do not finalize immediately: debounce stop events for ~2s to see whether a content-bearing stop follows. In this session the content-bearing stop arrived 0.529s after the empty one, well inside that window.
3. Require terminal turn text to have ≥ 20 non-whitespace characters before triggering completion-delivery, OR explicitly mark a turn as 'final' via a structured signal rather than inferring from stopReason==stop.
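The three rules above could be combined roughly as follows. This is a sketch under assumptions, not the runtime's implementation: Turn, Decision, and decide are hypothetical names, timestamps are plain seconds, and the behaviour once the debounce window expires without usable text (report "no_result" rather than fall back to earlier text) is my assumption, since the ticket does not specify it. It implements the length-threshold branch of rule 3, not the structured 'final' signal.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_FINAL_CHARS = 20    # rule 3: minimum non-whitespace characters
DEBOUNCE_SECONDS = 2.0  # rule 2: approximate wait after an unusable stop


@dataclass
class Turn:
    arrived_at: float  # seconds on any consistent clock
    stop_reason: str   # "toolUse" or "stop"
    text: str


@dataclass
class Decision:
    status: str          # "wait", "deliver", or "no_result"
    text: Optional[str]  # set only when status == "deliver"


def non_ws_len(text: str) -> int:
    """Count characters after removing all whitespace."""
    return len("".join(text.split()))


def decide(turns: List[Turn], now: float) -> Decision:
    if not turns:
        return Decision("wait", None)
    last = turns[-1]
    # Rule 1: a toolUse turn is mid-execution; never read a result from it.
    if last.stop_reason != "stop":
        return Decision("wait", None)
    # Rule 3: only text of sufficient length triggers completion delivery.
    if non_ws_len(last.text) >= MIN_FINAL_CHARS:
        return Decision("deliver", last.text)
    # Rule 2: empty (or too short) stop; wait to see if a real one follows.
    if now - last.arrived_at < DEBOUNCE_SECONDS:
        return Decision("wait", None)
    # Assumption: after the window, surface the gap instead of walking back.
    return Decision("no_result", None)


# Replay of the session timeline, as seconds within minute 21:39.
tool_turn = Turn(53.887, "toolUse", "poll.")
empty_stop = Turn(55.023, "stop", "")
final_stop = Turn(55.552, "stop", "<final planning monologue placeholder>")

at_empty_stop = decide([tool_turn, empty_stop], now=55.023)
at_final_stop = decide([tool_turn, empty_stop, final_stop], now=55.552)
after_window = decide([tool_turn, empty_stop], now=57.1)
print(at_empty_stop.status, at_final_stop.status, after_window.status)
```

One consequence of the threshold: a legitimately short final answer would end up as "no_result", which is the argument for the structured 'final' signal in rule 3.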
This is an upstream OpenClaw runtime fix, not an agent-local fix. Filing as agent-tracked so I have a substrate ticket for it. If the runtime team isn't going to ship in 7d, the agent-local mitigation is: in spawn prompt, instruct the sub-agent to issue an explicit 'TASK_COMPLETE:' marker on success — and have spawn-guard parse for that marker in completion events. Falls back to 'completed without TASK_COMPLETE marker — investigate' if absent.
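A possible shape for the agent-local mitigation, as a sketch only. parse_completion is a hypothetical helper, not an existing spawn-guard function, and it assumes the marker starts a line and that anything after the colon is a short summary from the sub-agent; neither detail is fixed by this ticket.

```python
from typing import Tuple

MARKER = "TASK_COMPLETE:"
# Fallback text from this ticket; \u2014 is the dash used in that message.
FALLBACK = "completed without TASK_COMPLETE marker \u2014 investigate"


def parse_completion(result_text: str) -> Tuple[bool, str]:
    """Return (marker_found, payload or fallback message)."""
    for line in result_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(MARKER):
            # Assumption: the rest of the marker line is the summary.
            return True, stripped[len(MARKER):].strip()
    return False, FALLBACK


# The mis-extracted result from this incident would be flagged.
flagged = parse_completion("poll.")
# A result carrying the marker passes, with the summary extracted.
accepted = parse_completion("notes\nTASK_COMPLETE: summary of work")
print(flagged)
print(accepted)
```

This does not fix the extraction race itself; it only makes a premature result such as 'poll.' visible as suspect instead of being delivered as a clean completion.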
Not urgent: gpt-5.5 sub-agents are rare for me (most use sonnet-4.6, which doesn't exhibit this monologue style), but worth fixing to prevent recurrence.
Reference: substrate-investigations/poll-confusable-falsified.md.