CodeTracked since May 19, 2026

Stop streaming hang and endless retries on hard 429 rate limits

The pull request changes the OpenAI stream error path by adding a defensive `response.text()` wrapper with a 5-second hard timeout and by making retry logic aware of hard-quota signals, including large `Retry-After` values. This prevents the coding agent from getting stuck on non-200 responses and stops auto-retry loops when the provider signals a hard usage limit.

response.text()fetch interceptoropenai-nodeRetry-After

What Happened

The pull request changes the OpenAI stream error path by adding a defensive `response.text()` wrapper with a 5-second hard timeout and by making retry logic aware of hard-quota signals, including large `Retry-After` values. This prevents the coding agent from getting stuck on non-200 responses and stops auto-retry loops when the provider signals a hard usage limit.
The pull request changes the OpenAI stream error path by adding a defensive `response.text()` wrapper with a 5-second hard timeout and by making retry logic aware of hard-quota signals, including large `Retry-After` values. This prevents the coding agent from getting stuck on non-200 responses and stops auto-retry loops when the provider signals a hard usage limit.
1 evidence item attached for review.

What is Different

Before

Scattered source updates, isolated context, and manual follow-up across multiple feeds.

Now

Introduced a quota-aware failure path for streaming completions: non-200 responses now return an error within 5 seconds instead of hanging, and auto-retry is disabled for hard quota indicators or excessive retry windows (over 300 seconds), while still permitting normal transient rate-limit recovery.

Why Track This

Why It Matters

Operators and users of the coding agent will see rate-limit failures fail fast instead of freezing in a perpetual "Working" state, so tasks no longer consume resources in endless retries when a provider has reached hard usage limits. The fix adds explicit extraction of `Retry-After` into error details and enforces a 5-second timeout on error-body reads, which reduces UI lockups and avoids retry storms on opaque upstream connection failures; teams should watch for regressions if providers rely on slower-error response patterns or if retry-window thresholds are too conservative for future billing or provider semantics.

Impact

What To Watch Next

Watch whether response.text() becomes a repeated pattern.
Track follow-up changes around AI Debugging and Error Localization.
Compare future signals against this evidence trail.
Re-check risk flags: watch_for_false_positive_hard_quota_keyword_matches, watch_timeout_too_short_for_slow_error_response_bodies.

Open Topic Timeline Open Technical Event Open Original Sourcewatch_for_false_positive_hard_quota_keyword_matches / watch_timeout_too_short_for_slow_error_response_bodies / watch_retry_after_parsing_breaks_with_provider_variants / watch_backoff_behavior_for_transient_429_cases

Supporting Evidence

GITHUB PULL REQUESTHigh Trust

earendil-works/pi PR #4736: fix: resolve streaming hang on 429 rate limits and prevent useless auto-retries

A custom fetch interceptor now clones non-200 responses, overrides `text()` with a timeout, appends `Retry-After` to error messages, and `_isRetryableError` rejects retries for hard quota/error-window cases.