Code · Tracked since May 22, 2026

Default Bedrock maxTokens to model limits to prevent silent 4k truncation

When callers do not supply `maxTokens`, the Bedrock provider now sends `inferenceConfig.maxTokens` taken from `model.maxTokens` instead of relying on the provider default. This removes a silent 4096-token output cap that ended long Anthropic Claude generations with `stopReason: "length"`.

Amazon Bedrock · inferenceConfig.maxTokens · model.maxTokens · Anthropic Claude Opus 4.7

What Happened

  • The change updates Bedrock requests so that when callers do not supply `maxTokens`, the provider sends `inferenceConfig.maxTokens` from `model.maxTokens` instead of relying on the provider default, removing a silent 4096-token output cap that triggered `stopReason: "length"` on long Anthropic Claude generations.
  • 1 evidence item attached for review.

What is Different

Before

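When callers omitted `maxTokens`, the request relied on the provider default, so a silent 4096-token output cap applied and long Anthropic Claude generations ended with `stopReason: "length"`.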

Now

Added a concrete fallback in the Bedrock provider request builder to populate `inferenceConfig.maxTokens` from `model.maxTokens` when user options omit `maxTokens`, then validated the fix with a real end-to-end test that reproduces and confirms removal of the length-stop truncation.
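The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the SDK's actual request builder: the `ModelInfo`, `CallOptions`, and `InferenceConfig` shapes, the `buildInferenceConfig` helper, and the example limit of 32000 are all assumptions made for the example. Only the `maxTokens`, `model.maxTokens`, and `inferenceConfig.maxTokens` names come from the change summary.

```typescript
// Hypothetical shapes for illustration; the real SDK types may differ.
interface ModelInfo {
  id: string;
  maxTokens: number; // the model's declared output-token limit
}

interface CallOptions {
  maxTokens?: number;
  temperature?: number;
}

interface InferenceConfig {
  maxTokens: number;
  temperature?: number;
}

// Hypothetical helper: always emits maxTokens, so the server-side
// default cap is never the value that decides where output stops.
function buildInferenceConfig(
  model: ModelInfo,
  options: CallOptions = {},
): InferenceConfig {
  const config: InferenceConfig = {
    // Caller's value wins; otherwise use the model's declared limit.
    maxTokens: options.maxTokens ?? model.maxTokens,
  };
  if (options.temperature !== undefined) {
    config.temperature = options.temperature;
  }
  return config;
}

// Example values are made up for the sketch.
const model: ModelInfo = { id: "example-model", maxTokens: 32000 };
const withDefault = buildInferenceConfig(model);
const withOverride = buildInferenceConfig(model, { maxTokens: 1024 });
```

Using `??` rather than `||` means only a missing value triggers the fallback, so an explicit caller-supplied limit is always respected.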

Why Track This

Why It Matters

Developers and operators who call Bedrock through this SDK no longer see long responses cut off mid-task at about 4096 tokens when they omit `maxTokens`, so multi-thousand-token outputs (coding, writing, and other long-form tasks) can complete more reliably. Previously, a missing `maxTokens` let Bedrock enforce a server-side default cap, which produced `stopReason: "length"`. The fix applies the model's declared token limit by default. Teams should watch for growth in output cost and latency now that long generations run to completion, and should verify that new or updated models still expose correct `maxTokens` values.
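One way to watch for both truncation regressions and output growth is to summarize stop reasons and output sizes over a batch of results. The sketch below is an assumption-heavy illustration: the `GenerationResult` shape, the `summarizeTruncation` helper, the `"stop"` value, and the sample numbers are hypothetical. Only the `"length"` stop reason is taken from the summary above.

```typescript
// Hypothetical result shape; field names are assumptions for this sketch.
interface GenerationResult {
  stopReason: string;
  outputTokens: number;
}

interface TruncationSummary {
  total: number;
  truncated: number;
  truncatedRate: number;
  maxOutputTokens: number;
}

// Counts results that stopped with "length" and records the largest output,
// which gives a rough signal for both truncation and cost growth.
function summarizeTruncation(results: GenerationResult[]): TruncationSummary {
  const truncated = results.filter((r) => r.stopReason === "length").length;
  const maxOutputTokens = results.reduce(
    (max, r) => Math.max(max, r.outputTokens),
    0,
  );
  return {
    total: results.length,
    truncated,
    truncatedRate: results.length === 0 ? 0 : truncated / results.length,
    maxOutputTokens,
  };
}

// Sample data is invented for the example.
const summary = summarizeTruncation([
  { stopReason: "stop", outputTokens: 1200 },
  { stopReason: "length", outputTokens: 4096 },
  { stopReason: "stop", outputTokens: 9800 },
]);
```

A rising `truncatedRate` would suggest the cap has returned, while a rising `maxOutputTokens` is the expected side effect of the fix and the thing to budget for.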

What To Watch Next

  • Watch whether Amazon Bedrock becomes a repeated pattern.
  • Track follow-up changes around AI Coding Agents.
  • Compare future signals against this evidence trail.
  • Re-check risk flags: watch_output_cost_growth_for_long_prompts, verify_model_maxTokens_values_for_new_provider_models, monitor_stopReason_length_regressions.
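Because the fallback depends on each model declaring a usable `maxTokens`, the verification flag could be backed by a simple registry check. This is only a sketch: the `ModelEntry` shape, the `findInvalidMaxTokens` helper, and the sample model IDs are hypothetical and do not reflect the SDK's real model catalog.

```typescript
// Hypothetical registry entry; maxTokens is optional to model missing data.
interface ModelEntry {
  id: string;
  maxTokens?: number;
}

// Returns the IDs of models whose maxTokens is missing, non-integer,
// or not positive, since any of these would make the fallback unsafe.
function findInvalidMaxTokens(models: ModelEntry[]): string[] {
  return models
    .filter((m) => {
      const value = m.maxTokens;
      return typeof value !== "number" || !Number.isInteger(value) || value <= 0;
    })
    .map((m) => m.id);
}

// Sample entries are invented for the example.
const invalid = findInvalidMaxTokens([
  { id: "model-a", maxTokens: 32000 },
  { id: "model-b" },
  { id: "model-c", maxTokens: 0 },
]);
```

Running a check like this when new provider models are added would surface entries that could otherwise send a missing or zero limit.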

Supporting Evidence