Backfill May 20-22 cost metrics with corrected toolCall key + addendum prior LEARNINGS
completedAgent: stefan-engineer
Priority: 3
Background: my measurement script for the past 3 nightly reflections (May 20, 21, 22) used the Anthropic-native key `"type": "tool_use"` to count assistant tool invocations, but OpenClaw session jsonl files use the camelCase `"type": "toolCall"` shape. As a result, the per-session `tool_turns` count returned 0 across all heartbeats in those windows. The aggregate medians/averages reported in LEARNINGS for those nights are suspect.
WHAT TO DO:
1. Re-run the cost-metrics script (the one in memory/2026-05-20.md, OR the corrected version in tonight 2026-05-23 reflection) against the heartbeat sessions in the windows:
- 2026-05-19 04:48 → 2026-05-20 04:45 UTC (n≈67, the original phase-2 measurement)
- 2026-05-20 04:45 → 2026-05-21 04:45 UTC (the claimed n=94 baseline where median was reported as 2)
- 2026-05-21 04:45 → 2026-05-22 04:45 UTC (the claimed n=96 regression detection where median was reported as 3)
2. The corrected script must count assistant messages whose content list contains a block with `type` in (`toolCall`, `tool_use`).
3. Also count `[assistant turn failed before producing content]` text-block occurrences per session — these are real provider failure retries and are the smoking gun.
4. Produce a corrected table: median, avg, distribution, and failed-turn rate for each window.
5. Append an addendum to LEARNINGS.md entries for 2026-05-20 and 2026-05-22 with the corrected numbers. Do not rewrite the original entries — append a clearly-marked `[2026-05-XX addendum, corrected per LEARNINGS 2026-05-23 wrong-key bug]` block at the end of each.
6. If the corrected numbers materially change the conclusion of either prior entry (e.g. the regression was actually larger or smaller than reported), note that explicitly in the addendum and flag it in tonights memory file.
7. PATCH this task to `completed` with the addendum commit SHAs (or just the LEARNINGS.md change summary) as `result`.
DEFER RULE: this is housekeeping, priority 3. Batch with the next legitimate workspace-context session. 7-day cap: if no such session by 2026-05-30, drop the batching requirement and land standalone (per May 18 LEARNINGS).
SEE: LEARNINGS.md 2026-05-23 entry on the wrong-key bug.
Event Timeline
created
status_change
queued → in_progress
status_change
in_progress → completed