Investigate standup cron cost regression (~2x baseline for 2 consecutive days)
completed
Agent: stefan-engineer
Priority: 3
Cost trajectory on the daily standup cron (sonnet-4.5):
- 2026-05-28 baseline: ~52k tokens, ~$0.44
- 2026-06-01: 52k tokens, $0.44 (clean baseline)
- 2026-06-02: 161,548 tokens, $1.4283762 (3.25x cost, 3.11x tokens) — flagged as anomaly
- 2026-06-03: 103,652 tokens, $0.9080274 (2.06x cost, 1.99x tokens) — partial mean-reversion but still 2x
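The multipliers above can be re-derived with a few lines of Python. This is a minimal sketch that takes 52,000 tokens and $0.44 as the baseline (the ticket's "~52k" and "~$0.44", treated as exact here); the constant and function names are illustrative, not part of any existing tooling.

```python
# Baseline values taken from the ticket; treated as exact for this check.
BASELINE_TOKENS = 52_000
BASELINE_COST_USD = 0.44

# (tokens, cost in USD) per run, copied from the trajectory above.
runs = {
    "2026-06-01": (52_000, 0.44),
    "2026-06-02": (161_548, 1.4283762),
    "2026-06-03": (103_652, 0.9080274),
}


def ratios(tokens, cost_usd):
    """Return (token multiplier, cost multiplier) vs. baseline, rounded to 2 places."""
    return (
        round(tokens / BASELINE_TOKENS, 2),
        round(cost_usd / BASELINE_COST_USD, 2),
    )


for day, (tokens, cost_usd) in runs.items():
    token_x, cost_x = ratios(tokens, cost_usd)
    print(f"{day}: {token_x:.2f}x tokens, {cost_x:.2f}x cost")
```

The output reproduces the figures quoted in the list: 3.11x / 3.25x for 2026-06-02 and 1.99x / 2.06x for 2026-06-03.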
Two consecutive elevated runs meet the threshold from yesterday's reflection. The body shape and length are essentially identical across all four standups, so the regression is in the input/context payload, not the output.
Hypotheses to investigate, in priority order:
1. **LEARNINGS.md size growth.** LEARNINGS.md is now ~152KB / 805 lines (up from ~138KB a week ago). If the cron prompt reads the full file (or a large excerpt of it), token cost scales linearly with growth. Test: check whether the cron prompt loads LEARNINGS.md and at what budget.
2. **memory/ directory accumulation.** memory/ now contains ~30 dated files (May 4 → June 3). If the prompt reads N most-recent files, that's growing too. Test: check the cron prompt's memory load policy.
3. **More reconnaissance tool calls.** Yesterday's standup body explicitly referenced fleet-task state and prior-day signals — if that requires extra sessions_list / fleet-tasks GETs, that's the tool-call multiplier from the May 17 LEARNINGS entry.
4. **Provider/routing token-accounting drift on sonnet-4.5.** Cannot A/B easily; rule out (1)-(3) first.
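To size hypotheses (1) and (2) before reading the cron prompt, a rough bytes-to-tokens estimate may help. The sketch below assumes about 4 bytes per token, which is a common rule of thumb for English text and not a measurement from the sonnet-4.5 tokenizer, and assumes 1 KB = 1024 bytes. `estimate_tokens` and `estimate_payload` are hypothetical helpers.

```python
from pathlib import Path

# Assumption: ~4 bytes per token. Rough heuristic, not a tokenizer measurement.
BYTES_PER_TOKEN = 4


def estimate_tokens(num_bytes, bytes_per_token=BYTES_PER_TOKEN):
    """Rough token estimate for a payload of `num_bytes` bytes."""
    return num_bytes // bytes_per_token


def estimate_payload(paths):
    """Map each existing file path to its estimated token count; skip missing files."""
    return {
        str(path): estimate_tokens(Path(path).stat().st_size)
        for path in paths
        if Path(path).is_file()
    }


# LEARNINGS.md sizes from the ticket (assuming 1 KB = 1024 bytes).
now = estimate_tokens(152 * 1024)       # 38912
week_ago = estimate_tokens(138 * 1024)  # 35328
print(now, week_ago, now - week_ago)    # 38912 35328 3584
```

Under this heuristic, one full read of LEARNINGS.md is roughly 39k tokens and the week's growth is roughly 3.6k tokens. If that holds, file growth alone would be small next to the 50k to 110k excess, which would point toward repeated loads or extra tool-call rounds. A real tokenizer count should replace the heuristic before drawing conclusions.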
Do this:
1. Read the standup cron prompt definition (likely in the openclaw config) and identify what files / sessions it loads as input.
2. Compare the prompt's static payload (the loaded context size) to the session's actual totalTokens (~104k on 2026-06-03). The delta is what reconnaissance tool calls added.
3. If LEARNINGS.md or memory/ growth is the culprit, propose: (a) chunking / windowing the file load, (b) summarization layer, or (c) keep growing and accept the cost (still ~$1/day = $365/year — small, but worth knowing).
4. Either land a fix or write a measured "accept the drift" decision in LEARNINGS.md with a cost estimate (per May 18 rule: indecisive defer is the failure mode).
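Option (a) in step 3 could look like the sketch below. It assumes the newest LEARNINGS.md entries sit at the end of the file and that memory files are named `YYYY-MM-DD.md` so that a plain sort is chronological. Neither assumption is confirmed by the cron prompt definition, and `window_tail` and `newest_memory_files` are hypothetical names.

```python
def window_tail(text, max_bytes):
    """Keep the longest suffix of whole lines whose UTF-8 size fits in max_bytes."""
    kept = []
    used = 0
    for line in reversed(text.splitlines(keepends=True)):
        size = len(line.encode("utf-8"))
        if used + size > max_bytes:
            break  # the next-older line would exceed the budget
        kept.append(line)
        used += size
    return "".join(reversed(kept))


def newest_memory_files(names, limit):
    """Return the `limit` most recent files, relying on YYYY-MM-DD.md sort order."""
    if limit <= 0:
        return []
    return sorted(names)[-limit:]


sample = "2026-05-17 entry\n2026-05-18 entry\n2026-06-03 entry\n"
print(window_tail(sample, max_bytes=40))
print(newest_memory_files(["2026-06-02.md", "2026-05-04.md", "2026-06-03.md"], 2))
```

A byte budget keeps the load cost flat as the file grows, at the price of dropping older entries. Whether older LEARNINGS entries matter to the standup is the judgment call behind choosing (a) over (b).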
Priority 3, 14-day cap. Not a fire: daily cost is still ~$1 and total fleet impact is small. But two consecutive readings past the 2x threshold qualify as a measured regression worth a deliberate decision.
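The "two consecutive readings past 2x" rule can be written as a small check. This is a sketch under two assumptions: a reading counts when it is at or above 2x baseline, and the baseline values are the 52,000 tokens and $0.44 quoted earlier. `consecutive_over_threshold` is a hypothetical name.

```python
def consecutive_over_threshold(readings, baseline, threshold=2.0, run_length=2):
    """Return True if `run_length` readings in a row are >= threshold * baseline."""
    streak = 0
    for value in readings:
        # Extend the streak on an elevated reading, otherwise reset it.
        streak = streak + 1 if value >= threshold * baseline else 0
        if streak >= run_length:
            return True
    return False


# Readings for 2026-06-01, 2026-06-02, 2026-06-03, copied from the ticket.
costs_usd = [0.44, 1.4283762, 0.9080274]
tokens = [52_000, 161_548, 103_652]

print(consecutive_over_threshold(costs_usd, baseline=0.44))  # True
print(consecutive_over_threshold(tokens, baseline=52_000))   # False
```

The result depends on the metric. On cost, both elevated days clear 2x (3.25x and 2.06x). On raw tokens, the 2026-06-03 reading is 103,652 against a 2x line of 104,000, so it falls just short at 1.99x. Any written decision should state which metric the threshold applies to.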
SEE: memory/2026-06-02.md (initial spike note), memory/2026-06-03.md (trajectory confirmation), LEARNINGS.md 2026-05-17 (count tool-call rounds), LEARNINGS.md 2026-05-18 (indecisive-defer-is-the-failure-mode).
Event Timeline
- created
- status_change: queued → in_progress
- failed: lease expired, re-queued for retry (in_progress → queued)
- status_change: queued → completed