Code / Tracked since May 19, 2026

Fix coding-agent 429 handling to prevent stream hangs and endless retries

The PR introduces a defensive fetch path for OpenAI responses that enforces a 5-second timeout on non-200 `response.text()` reads and carries `Retry-After` into error text, then updates retry logic so 429 hard-quota cases (usage/balance/quota indicators or long Retry-After >300s) are not auto-retried. It adds regression coverage for hard-limit hangs versus transient recovery behavior.

openai-node / openai-completions.ts / agent-session.ts / 429 Too Many Requests

What Happened

  • The PR introduces a defensive fetch path for OpenAI responses that enforces a 5-second timeout on non-200 `response.text()` reads and carries `Retry-After` into the error text.
  • Retry logic is updated so that 429 hard-quota cases (usage, balance, or quota indicators, or a `Retry-After` longer than 300 seconds) are not auto-retried.
  • Regression coverage is added for hard-limit hangs versus transient recovery behavior.
  • 1 evidence item attached for review.

What is Different

Before

Scattered source updates, isolated context, and manual follow-up across multiple feeds.

Now

Implemented a concrete reliability fix: a custom fetch interceptor now prevents error-path stream hangs by racing `response.text()` against a 5-second timeout, and `_isRetryableError` now treats hard quota signals and long Retry-After waits as non-retryable conditions. Added regression tests to verify the session aborts on hard limits and still recovers from standard transient rate limiting.
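
The error-path read described above can be sketched as follows. This is a minimal illustration, not the PR's code: `withDefensiveErrorBody`, `readTextWithTimeout`, the placeholder string, and the `(Retry-After: …)` suffix format are all hypothetical names and choices, and only the 5-second figure and the non-200 condition come from the summary.

```typescript
// Hypothetical names throughout; only the 5-second figure comes from the summary.
const ERROR_BODY_TIMEOUT_MS = 5_000;
const TIMEOUT_PLACEHOLDER = "[error body unavailable: read timed out]";
// Statuses for which the Response constructor rejects a non-null body.
const NULL_BODY_STATUSES = new Set([204, 205, 304]);

type FetchLike = (...args: Parameters<typeof fetch>) => Promise<Response>;

// Race response.text() against a timer so a stalled error stream cannot hang the caller.
async function readTextWithTimeout(response: Response, timeoutMs: number): Promise<string> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<string>((resolve) => {
    timer = setTimeout(() => resolve(TIMEOUT_PLACEHOLDER), timeoutMs);
  });
  try {
    return await Promise.race([response.text(), timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Wrap a fetch implementation: 200 responses pass through untouched (so normal
// streaming is unaffected); other responses are cloned, read with a deadline,
// and rebuilt with any Retry-After value appended to the body text.
function withDefensiveErrorBody(
  baseFetch: FetchLike,
  timeoutMs: number = ERROR_BODY_TIMEOUT_MS,
): FetchLike {
  return async (...args) => {
    const response = await baseFetch(...args);
    if (response.status === 200) return response;

    const body = await readTextWithTimeout(response.clone(), timeoutMs);
    const retryAfter = response.headers.get("retry-after");
    const text = retryAfter === null ? body : `${body} (Retry-After: ${retryAfter})`;

    return new Response(NULL_BODY_STATUSES.has(response.status) ? null : text, {
      status: response.status,
      statusText: response.statusText,
      headers: response.headers,
    });
  };
}
```

Because the rebuilt response holds a plain string body, any later `text()` call on it resolves immediately. A fuller version would probably also abort the underlying request when the timer wins, which this sketch leaves out.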

Why Track This

Why It Matters

Operators and developers using the coding-agent should no longer see indefinite “Working” hangs when a rate limit reflects a hard usage ceiling: the session fails predictably instead of retrying to no effect, and auto-retry is reserved for transient throttling. The fix clones non-200 responses, applies a timeout to the `text()` read, and checks for hard-quota signals and large `Retry-After` values before any auto-retry. Watch for provider differences in `Retry-After` and error-message formats, because a mismatch could hide a genuine quota state or suppress a recoverable retry.
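
The retryability checks mentioned above might look roughly like the sketch below. It is an illustration under assumptions, not the PR's `_isRetryableError`: the function names, the keyword pattern, and the convention that the error text carries a `Retry-After: <value>` fragment are all hypothetical. The 300-second threshold and the usage/balance/quota indicators come from the summary. The parser accepts both forms that HTTP allows for `Retry-After` (a number of seconds or an HTTP-date), which is where provider format differences tend to show up.

```typescript
// Threshold from the summary: waits longer than this are treated as non-retryable.
const MAX_RETRY_AFTER_SECONDS = 300;
// Assumed keyword set; the actual indicators used by the PR are not shown.
const HARD_QUOTA_PATTERN = /usage|balance|quota/i;

// Parse a Retry-After value given either as seconds or as an HTTP-date.
// Returns undefined when the value cannot be interpreted.
function parseRetryAfterSeconds(value: string, now: number = Date.now()): number | undefined {
  const trimmed = value.trim();
  if (/^\d+(\.\d+)?$/.test(trimmed)) return Number(trimmed);
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return undefined;
  return Math.max(0, Math.ceil((dateMs - now) / 1000));
}

// Decide whether a 429 error is worth auto-retrying, based on its error text.
function isRetryable429(errorText: string, now: number = Date.now()): boolean {
  // Hard quota signals will not clear on their own, so retrying is pointless.
  if (HARD_QUOTA_PATTERN.test(errorText)) return false;

  // Assumed convention: the text contains "Retry-After: <value>".
  const match = /Retry-After:\s*([^)\n]+)/i.exec(errorText);
  if (match) {
    const seconds = parseRetryAfterSeconds(match[1], now);
    if (seconds !== undefined && seconds > MAX_RETRY_AFTER_SECONDS) return false;
  }
  return true;
}
```

An unparseable `Retry-After` value falls through to "retryable" here. That is one possible policy, and it is exactly the kind of case the format-variant risk flag is about; a stricter caller could treat an unknown value as non-retryable instead.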

What To Watch Next

  • Watch whether openai-node becomes a repeated pattern.
  • Track follow-up changes around AI Workflow Automation.
  • Compare future signals against this evidence trail.
  • Re-check risk flags: retry_after_format_variants, hard_quota_message_misclassification.
retry_after_format_variants / hard_quota_message_misclassification / timeout_false_positive_on_slow_responses / provider_specific_error_shape_changes

Supporting Evidence