Substrate: investigate why task 3d77d48d (PR #9904) lease expired during active CI-watch despite lease-renew helper
blockedAgent: richie-engineer
Priority: 3
Carried from 2026-05-27 reflection. Task 3d77d48d (BOLT-413 dashboard online-status, PR #9904) bounced through 5 lease expiries and even hit status=failed at one point while I was actively pushing commits and watching CI on 2026-05-26 between 16:33 and 21:13 UTC.
The lease-renew helper (bin/lease-renew.sh + HEARTBEAT.md STEP 0.5) is supposed to PATCH in_progress tasks to bump lease_expires_at every heartbeat. Yet retry_count climbed to 5 on this task during active work.
Hypotheses to investigate:
1. The heartbeat-side renew fires only for tasks that the resume endpoint response reports as in_progress. The task may already have flipped to failed/blocked by the time the heartbeat queries, in which case the renew never fires on the next cycle
2. CI-watch loops driven by direct human-request Slack sessions may not consistently re-pull the resume endpoint between commits, so the lease renew step is skipped
3. The 5-min lease TTL may be too short for the gap between consecutive heartbeats during heavy work. Heartbeats run on cron at ~15-min intervals, so a lease renewed at one heartbeat can expire about 10 minutes before the next one; any 5-min stretch without a renew is at risk
4. lease-renew may PATCH the wrong fields or not actually move lease_expires_at when called via the helper
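The timing concern in hypothesis 3 can be checked with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the 5-min TTL and ~15-min cron interval are the figures quoted in this ticket, not values read from the substrate config.

```python
def uncovered_minutes(ttl_min: float, heartbeat_min: float) -> float:
    """Minutes per heartbeat cycle during which the lease is already expired.

    Assumes the only renew happens at the heartbeat itself, so a lease
    renewed at t=0 lapses at t=ttl_min and stays lapsed until the next
    heartbeat at t=heartbeat_min.
    """
    return max(0.0, float(heartbeat_min) - float(ttl_min))


if __name__ == "__main__":
    # Figures from this ticket: 5-min TTL, ~15-min heartbeat cron.
    gap = uncovered_minutes(ttl_min=5, heartbeat_min=15)
    print(f"uncovered minutes per cycle: {gap}")
```

Under these assumptions any renew cadence longer than the TTL leaves a window in which the task can be reclaimed, regardless of how active the work is.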
Deliverable: write a short analysis to substrate-investigations/lease-renew-active-ciwatch-gap.md citing (a) the actual renew code path, (b) the timeline of task 3d77d48d events showing the expiry pattern, (c) the proposed fix (likely a shorter heartbeat cron interval for active tasks, a CI-watch-side renew tick, or both).
Do NOT change production substrate without escalation — this is investigation + recommendation only.
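For the write-up, one way to test hypothesis 4 without touching production is a read-back check around the renew call: read lease_expires_at, send the renew, read it again, and treat an unchanged value as a failure. This is a minimal sketch, not the helper's actual code. `renew_and_verify`, `LeaseNotExtended`, and the two callables are hypothetical stand-ins for however the task record is fetched and PATCHed.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable


class LeaseNotExtended(RuntimeError):
    """Raised when a renew call returns but lease_expires_at did not advance."""


def renew_and_verify(
    read_expiry: Callable[[], datetime],
    send_renew: Callable[[], None],
) -> datetime:
    """Send one renew and confirm the lease expiry actually moved forward.

    Both callables are assumptions: read_expiry returns the task's current
    lease_expires_at, send_renew performs the renew request.
    """
    before = read_expiry()
    send_renew()
    after = read_expiry()
    if after <= before:
        # A renew that "succeeds" without moving the expiry is a silent no-op.
        raise LeaseNotExtended(f"lease_expires_at unchanged at {after.isoformat()}")
    return after


if __name__ == "__main__":
    # In-memory stand-in for a task record; no real endpoint is called.
    task = {"lease_expires_at": datetime(2026, 5, 26, 16, 38, tzinfo=timezone.utc)}

    def fake_renew() -> None:
        task["lease_expires_at"] += timedelta(minutes=5)

    print(renew_and_verify(lambda: task["lease_expires_at"], fake_renew))
```

A CI-watch loop could call such a check on a cadence shorter than the lease TTL, which would cover hypotheses 2 and 3 as well; whether that belongs in the helper or in the CI-watch loop is left to the recommendation.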
No branch yet. Priority 3; will be picked up by a quiet-day heartbeat (same pattern as the 2026-05-18 keepAlive sub-agent ship).
Event Timeline
created
status_change
queued → in_progress
subagent_spawned
spawn claim: substrate lease-renew gap analysis
progress
investigation complete — see substrate-investigations/lease-renew-active-ciwatch-gap.md
subagent_completed
subagent done: investigation complete — H4 confirmed silent no-op endpoint bug, H3 + H2 structural
status_change
in_progress → blocked