Code / Tracked since May 19, 2026

Make duplicate cancel calls idempotent to eliminate spurious 409 conflicts

The pull request fixes a race in gateway run cancellation where two concurrent `cancel_run` requests could both pass the initial existence check, causing the second call to return HTTP 409 even though the run had already moved to `interrupted`. The router now re-reads run state after a failed cancel attempt and returns HTTP 202 when the run is already interrupted or already removed, while preserving 409 only for genuinely non-cancelable states.

cancel_run / gateway router / TOCTOU race / run status re-check

What Happened

  • The pull request fixes a race in gateway run cancellation where two concurrent `cancel_run` requests could both pass the initial existence check, causing the second call to return HTTP 409 even though the run had already moved to `interrupted`. The router now re-reads run state after a failed cancel attempt and returns HTTP 202 when the run is already interrupted or already removed, while preserving 409 only for genuinely non-cancelable states.
  • 1 evidence item attached for review.

What is Different

Before

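Two concurrent `cancel_run` requests could both pass the initial existence check, and the request that lost the race received HTTP 409 even though the run had already moved to `interrupted`.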

Now

Introduces an authoritative post-cancel status re-check in the cancellation flow: after `cancel()` returns false, the router fetches the current run record and maps `status == interrupted` or missing record to a successful idempotent 202 response, avoiding false conflict responses under concurrent cancel traffic.
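The re-check flow described above can be sketched roughly as follows. This is a minimal, framework-free illustration and not the pull request's actual code: `RunManager`, `Run`, `RunStatus`, the in-memory store, and the 404 returned for unknown runs are all assumptions made for the example. Only the mapping after a failed `cancel()` (interrupted or missing maps to 202, anything else to 409) comes from the summary.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class RunStatus(str, Enum):
    RUNNING = "running"
    INTERRUPTED = "interrupted"
    COMPLETED = "completed"


@dataclass
class Run:
    run_id: str
    status: RunStatus


class RunManager:
    """Hypothetical in-memory stand-in for the gateway's run manager."""

    def __init__(self) -> None:
        self._runs: Dict[str, Run] = {}

    def add(self, run: Run) -> None:
        self._runs[run.run_id] = run

    def get(self, run_id: str) -> Optional[Run]:
        return self._runs.get(run_id)

    def cancel(self, run_id: str) -> bool:
        """Return True only if this call moved the run to interrupted."""
        run = self._runs.get(run_id)
        if run is None or run.status is not RunStatus.RUNNING:
            return False
        run.status = RunStatus.INTERRUPTED
        return True


def cancel_run(manager: RunManager, run_id: str) -> int:
    """Return the HTTP status code the router would send."""
    if manager.get(run_id) is None:
        return 404  # assumption: unknown runs are rejected up front

    if manager.cancel(run_id):
        return 202  # this request performed the cancellation

    # cancel() returned False: re-read the run instead of assuming a conflict.
    current = manager.get(run_id)
    if current is None or current.status is RunStatus.INTERRUPTED:
        return 202  # another request already cancelled or removed the run
    return 409  # genuinely non-cancelable, e.g. completed
```

With this shape, a second `cancel_run` call on the same run returns 202 rather than 409, while a completed run still yields 409.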

Why Track This

Why It Matters

Operators retrying cancel requests on the same run can avoid misleading 409 failures, so orchestration and cleanup jobs are less likely to abort due to race-related false negatives. The change removes an intermittent operational failure mode by treating already-interrupted or already-cleared runs as idempotent success, then still returning 409 for truly non-cancelable states such as completed runs. Follow-up should watch whether this re-fetch path causes any measurable overhead under very high cancel concurrency and whether any completed/store-only cases still surface as unexpected conflicts.

What To Watch Next

  • Watch whether cancel_run becomes a repeated pattern.
  • Track follow-up changes around LLMOps.
  • Compare future signals against this evidence trail.
  • Re-check risk flags: double_cancel_requests_under_high_load, post_cancel_status_reread_latency, conflict_code_behavior_for_completed_runs.

Supporting Evidence