HEARTBEAT.md cost reduction — collapse idle path to one bash block
completedAgent: stefan-engineer
Priority: 3
FOLLOWUP to the May 19 structural rewrite (task 01d855fa). Safety goal achieved; cost goal not. This task targets the COST goal specifically with a properly-designed bar.
WHAT TO DO: Restructure HEARTBEAT.md so the idle path is ONE bash block at the top of the document, doing both checks (active-ci-watch + fleet-tasks resume) and emitting structured stdout the model can branch on with a single if-then-else. Eliminate the numbered-step framing entirely for the idle path. Hypothesis: the model narrates every numbered step it executes, so removing the numbers removes the narration. Confirmed in 2026-05-20 LEARNINGS analysis of 67 sessions: 49 of 67 had exactly 4 tool turns (one per numbered step that has a bash block).
SUCCESS BAR (TWO METRICS, BOTH MUST HOLD, MEASURE OVER 50-SESSION OVERNIGHT WINDOW POST-EDIT):
- COST: median tool-call rounds per idle heartbeat <= 2 (currently 4)
- SAFETY (co-bar, MUST NOT regress): zero unsolicited DMs to Stefan AND >= 95% of pure heartbeat sessions end with HEARTBEAT_OK as the final assistant text (currently 97%)
If cost passes but safety regresses, the fix is bad and rolls back. If safety holds but cost does not move, iterate.
MEASUREMENT SCRIPT: use the Python block from 2026-05-20 reflection (in /home/agent/agents/stefan-engineer/memory/2026-05-20.md). Tool-turn distribution per heartbeat + count of sessions ending with HEARTBEAT_OK + count of outbound message action:send across the window. Self-contained, runs against ~/.openclaw/agents/stefan-engineer/sessions/.
TWO-PHASE CLOSURE (carry-over from May 19 process improvement):
- Phase 1: edit committed to HEARTBEAT.md.
- Phase 2: next-cycle overnight measurement against the two-metric bar above. Task remains in_progress overnight after phase 1; next morning reflection either completes it (both bars pass) or PATCHes the failure data into description and iterates.
DEFER RULE (per May 18 LEARNINGS): batch the edit with the next legitimate coding session that requires workspace context. 7-day cap: if no such session arrives by 2026-05-27, drop the batching requirement and land it standalone. This is NOT a defer-as-drift case because the safety metric is solidly green; this is pure cost optimization with a measured baseline.
NOTE TO FUTURE-ME: do NOT add another append to HEARTBEAT.md. Read the whole file top-to-bottom, identify which sections collapse into the new top-of-file idle-path bash block, and rewrite cohesively. The previous append-only approach is what got us into the structural-conflict mess.
Event Timeline
created
status_change
queued → in_progress
failed
lease expired — re-queued for retry
in_progress → queued
status_change
queued → in_progress
status_change
in_progress → completed
progress
Phase 2 verification PASS (window 2026-05-20 05:05 → 2026-05-21 04:45 UTC, 94 heartbeat sessions). COST: median tool-turns 4 → 2 (32% reduction, avg 3.72 → 2.52). Distribution shifted from mode-4 (49/67) to mode-2 (49/94). SAFETY: 100% end with HEARTBEAT_OK (up from 97%), zero unsolicited DMs (7 clean nights). All three success bars passed.