Back to Signal Feed
CodeTracked since May 19, 2026

Stop streaming hang and endless retries on hard 429 rate limits

The pull request changes the OpenAI stream error path by adding a defensive `response.text()` wrapper with a 5-second hard timeout and by making retry logic aware of hard-quota signals, including large `Retry-After` values. This prevents the coding agent from getting stuck on non-200 responses and stops auto-retry loops when the provider signals a hard usage limit.

response.text()fetch interceptoropenai-nodeRetry-After

What Happened

  • The pull request changes the OpenAI stream error path by adding a defensive `response.text()` wrapper with a 5-second hard timeout and by making retry logic aware of hard-quota signals, including large `Retry-After` values. This prevents the coding agent from getting stuck on non-200 responses and stops auto-retry loops when the provider signals a hard usage limit.
  • The pull request changes the OpenAI stream error path by adding a defensive `response.text()` wrapper with a 5-second hard timeout and by making retry logic aware of hard-quota signals, including large `Retry-After` values. This prevents the coding agent from getting stuck on non-200 responses and stops auto-retry loops when the provider signals a hard usage limit.
  • 1 evidence item attached for review.

What is Different

Before

Scattered source updates, isolated context, and manual follow-up across multiple feeds.

Now

Introduced a quota-aware failure path for streaming completions: non-200 responses now return an error within 5 seconds instead of hanging, and auto-retry is disabled for hard quota indicators or excessive retry windows (over 300 seconds), while still permitting normal transient rate-limit recovery.

Why Track This

Why It Matters

Operators and users of the coding agent will see rate-limit failures fail fast instead of freezing in a perpetual "Working" state, so tasks no longer consume resources in endless retries when a provider has reached hard usage limits. The fix adds explicit extraction of `Retry-After` into error details and enforces a 5-second timeout on error-body reads, which reduces UI lockups and avoids retry storms on opaque upstream connection failures; teams should watch for regressions if providers rely on slower-error response patterns or if retry-window thresholds are too conservative for future billing or provider semantics.

Impact

Operators and users of the coding agent will see rate-limit failures fail fast instead of freezing in a perpetual "Working" state, so tasks no longer consume resources in endless retries when a provider has reached hard usage limits. The fix adds explicit extraction of `Retry-After` into error details and enforces a 5-second timeout on error-body reads, which reduces UI lockups and avoids retry storms on opaque upstream connection failures; teams should watch for regressions if providers rely on slower-error response patterns or if retry-window thresholds are too conservative for future billing or provider semantics.

What To Watch Next

  • Watch whether response.text() becomes a repeated pattern.
  • Track follow-up changes around AI Debugging and Error Localization.
  • Compare future signals against this evidence trail.
  • Re-check risk flags: watch_for_false_positive_hard_quota_keyword_matches, watch_timeout_too_short_for_slow_error_response_bodies.
Open Topic TimelineOpen Technical EventOpen Original Sourcewatch_for_false_positive_hard_quota_keyword_matches / watch_timeout_too_short_for_slow_error_response_bodies / watch_retry_after_parsing_breaks_with_provider_variants / watch_backoff_behavior_for_transient_429_cases

Supporting Evidence