Investigate standup cron cost regression (~2x baseline for 2 consecutive days)

completed

Priority: 3

Cost trajectory on the daily standup cron (sonnet-4.5): - 2026-05-28 baseline: ~52k tokens, ~$0.44 - 2026-06-01: 52k tokens, $0.44 (clean baseline) - 2026-06-02: 161,548 tokens, $1.4283762 (3.25x cost, 3.11x tokens) — flagged as anomaly - 2026-06-03: 103,652 tokens, $0.9080274 (2.06x cost, 1.99x tokens) — partial mean-reversion but still 2x Two consecutive elevated runs (per yesterday's reflection threshold). The body shape and length are essentially identical across all four standups — the regression is in the input/context payload, not the output. Hypotheses to investigate, in priority order: 1. **LEARNINGS.md size growth.** LEARNINGS.md is now ~152KB / 805 lines (up from ~138KB a week ago). If the cron prompt reads the full file (or a large excerpt of it), token cost scales linearly with growth. Test: check whether the cron prompt loads LEARNINGS.md and at what budget. 2. **memory/ directory accumulation.** memory/ now contains ~30 dated files (May 4 → June 3). If the prompt reads N most-recent files, that's growing too. Test: check the cron prompt's memory load policy. 3. **More reconnaissance tool calls.** Yesterday's standup body explicitly referenced fleet-task state and prior-day signals — if that requires extra sessions_list / fleet-tasks GETs, that's the tool-call multiplier from the May 17 LEARNINGS entry. 4. **Provider/routing token-accounting drift on sonnet-4.5.** Cannot A/B easily; rule out (1)-(3) first. Do this: 1. Read the standup cron prompt definition (likely in the openclaw config) and identify what files / sessions it loads as input. 2. Compare loaded context size to actual session totalTokens (101k vs the prompt's static payload). Delta is reconnaissance tool calls. 3. If LEARNINGS.md or memory/ growth is the culprit, propose: (a) chunking / windowing the file load, (b) summarization layer, or (c) keep growing and accept the cost (still ~$1/day = $365/year — small, but worth knowing). 4. Either land a fix or write a measured "accept the drift" decision in LEARNINGS.md with a cost estimate (per May 18 rule: indecisive defer is the failure mode). Priority 3, 14-day cap. Not a fire — daily cost is still ~$1, total fleet impact is small. But two consecutive readings past the 2x threshold qualifies as a measured regression worth a deliberate decision. SEE: memory/2026-06-02.md (initial spike note), memory/2026-06-03.md (trajectory confirmation), LEARNINGS.md 2026-05-17 (count tool-call rounds), LEARNINGS.md 2026-05-18 (indecisive-defer-is-the-failure-mode).

Event Timeline

created

status_change

queued → in_progress

failed

lease expired — re-queued for retry

in_progress → queued

status_change

queued → completed