Code / Tracked since May 19, 2026

Make cancel endpoint idempotent for already-interrupted runs

Fixed a race in `cancel_run` where two concurrent cancel requests could both pass the pre-check and one call returned `409` after another had already interrupted the run; the API now re-checks state after `cancel()` fails and returns `202` when the run is already interrupted or already cleaned up.

bytedance/deer-flow / cancel_run / run status / interrupted

What Happened

  • Fixed a race in `cancel_run` where two concurrent cancel requests could both pass the pre-check and one call returned `409` after another had already interrupted the run; the API now re-checks state after `cancel()` fails and returns `202` when the run is already interrupted or already cleaned up.
  • 1 evidence item attached for review.

What is Different

Before

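Two concurrent cancel requests could both pass the pre-check in `cancel_run`; the request whose `cancel()` call failed then returned `409`, even though the other request had already interrupted the run.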

Now

Added explicit post-`cancel()` re-read logic in the run-cancel path: if cancellation fails, the handler now inspects the latest run record and returns idempotent success for `interrupted` or missing records (cleanup races), with conflict only for completed/error/timeout states.
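The re-read logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the deer-flow implementation: `RunStatus`, `RunStore`, and `status_after_cancel_attempt` are hypothetical names, `cancel()` is assumed to return `False` when the run can no longer be cancelled, and the pre-check that runs before `cancel()` is omitted.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class RunStatus(Enum):
    # Hypothetical status names; the real project may use different ones.
    RUNNING = "running"
    INTERRUPTED = "interrupted"
    SUCCESS = "success"
    ERROR = "error"
    TIMEOUT = "timeout"


@dataclass
class RunStore:
    """Toy in-memory store standing in for the real run manager."""
    runs: Dict[str, RunStatus] = field(default_factory=dict)

    def get(self, run_id: str) -> Optional[RunStatus]:
        return self.runs.get(run_id)

    def cancel(self, run_id: str) -> bool:
        # Assumed contract: True only if this call performed the interrupt.
        if self.runs.get(run_id) is RunStatus.RUNNING:
            self.runs[run_id] = RunStatus.INTERRUPTED
            return True
        return False


def status_after_cancel_attempt(store: RunStore, run_id: str) -> int:
    """Return the HTTP status for a cancel request that passed the pre-check."""
    if store.cancel(run_id):
        return 202  # this request interrupted the run
    # cancel() failed: re-read the latest state instead of assuming a conflict.
    latest = store.get(run_id)
    if latest is None or latest is RunStatus.INTERRUPTED:
        return 202  # already interrupted, or the record was already cleaned up
    return 409  # run already finished (success, error, or timeout)
```

With this shape, a second cancel request for the same run takes the re-read branch and still receives `202`, while a run that finished before either request arrived continues to yield `409`.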

Why Track This

Why It Matters

Operators and integrations that send duplicate cancel requests (for retries, timeouts, or concurrent controllers) now get a deterministic success response instead of an intermittent `409` when the run was already interrupted. This reduces false alarms and unnecessary re-cancel loops while still rejecting attempts to cancel work that has already finished. After a failed `cancel()` call, the gateway re-fetches the run state and maps `interrupted` or absent records to `202`, while preserving `409` for runs in a completed, error, or timeout state. Watch for clients that previously treated any `409` as a terminal cancel error, and monitor conflict-rate telemetry under high-concurrency cancellation storms to catch any remaining race regressions.
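On the client side, one way to act on the new status mapping is to classify cancel responses explicitly rather than treating every non-2xx status as the same failure. The sketch below is hypothetical: `CancelOutcome` and `classify_cancel_response` are illustrative names, and the choice to treat `429` and `5xx` as retryable is an assumption of this example, not something the change itself specifies.

```python
from enum import Enum


class CancelOutcome(Enum):
    ACCEPTED = "accepted"          # run is interrupted (or was cleaned up)
    ALREADY_FINISHED = "finished"  # run reached a finished state first
    RETRYABLE = "retryable"        # transient failure; retrying is reasonable
    FAILED = "failed"              # any other response


def classify_cancel_response(status_code: int) -> CancelOutcome:
    """Map a cancel endpoint status code to a client-side outcome."""
    if status_code == 202:
        # Duplicate cancels for an interrupted run also land here now.
        return CancelOutcome.ACCEPTED
    if status_code == 409:
        # The run completed, errored, or timed out; re-cancelling will not help.
        return CancelOutcome.ALREADY_FINISHED
    if status_code == 429 or status_code >= 500:
        # Assumed retry policy for this sketch only.
        return CancelOutcome.RETRYABLE
    return CancelOutcome.FAILED
```

Separating `ALREADY_FINISHED` from `FAILED` keeps a remaining `409` from triggering alerts or re-cancel loops, since it now signals only that the run ended before the cancel took effect.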

What To Watch Next

  • Watch whether bytedance/deer-flow becomes a repeated pattern.
  • Track follow-up changes around Agent Orchestration Platforms.
  • Compare future signals against this evidence trail.
  • Re-check risk flags: idempotent_cancel_handling_client_expectations, remaining_race_conditions_at_record_cleanup, high_concurrency_cancel_conflict_visibility, retry_backoff_interactions_with_202.

Supporting Evidence