Stage: Expansion

Tool Calling

Track important changes in Tool Calling, including capabilities, product updates, adoption signals, risks, and evidence worth continued monitoring.

Signal Feed

Changes worth continued tracking

20 unique signals

issue | May 19, 2026, 9:18 AM
Serena may overwrite externally edited files due to stale open-file cache
Issue #1013 identified a correctness bug where Serena reused a cached file buffer in `open_file()` without checking disk freshness, so `replace_content` could apply regex edits to stale content and silently overwrite newer external changes.
Why It Matters: Developers who use Serena alongside other tools (for example Claude Code Edit/Read and git workflows) can lose real code changes without warning, because an edit tool may report success while writing stale content over a newer file version. Teams should monitor multi-tool sessions for silent overwrites and verify that future releases add explicit stale-buffer detection on every write path. Once fixed, operations like `replace_content` should write only against fresh file content, reducing the risk of accidental data loss and making mixed-workflow editing safer.
Final score 84 | Confidence 96 | 1 evidence item | Tags: oraios/serena, open_file, open_file_buffers, replace_content, EditedFileContext.get_original_content, find_symbol
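The stale-buffer check this signal calls for can be sketched as follows. This is a minimal illustration, not Serena's implementation: the `FileCache` class and its methods are hypothetical, and comparing modification time plus size is just one plausible freshness test.

```python
import os
import re
import tempfile


class FileCache:
    """Hypothetical buffer cache that revalidates against disk before reuse."""

    def __init__(self):
        self._buffers = {}  # path -> (mtime_ns, size, text)

    def read(self, path):
        st = os.stat(path)
        cached = self._buffers.get(path)
        # Reuse the buffer only if the file looks unchanged on disk.
        if cached and cached[0] == st.st_mtime_ns and cached[1] == st.st_size:
            return cached[2]
        with open(path, encoding="utf-8") as f:
            text = f.read()
        self._buffers[path] = (st.st_mtime_ns, st.st_size, text)
        return text

    def replace_content(self, path, pattern, repl):
        # Edit against revalidated content, never a blind cache hit.
        new_text = re.sub(pattern, repl, self.read(path))
        with open(path, "w", encoding="utf-8") as f:
            f.write(new_text)
        self._buffers.pop(path, None)  # force a fresh read next time
        return new_text
```

With this shape, an external edit made between `read` and `replace_content` changes the file's size or timestamp, so the regex runs on the new content rather than the cached copy.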
pull request | May 19, 2026, 9:57 AM
Preserve Vertex AI function IDs during genai deserialization
Restores the round-trip behavior for Vertex AI session history by mapping `aiplatformpb.FunctionCall.Id` and `aiplatformpb.FunctionResponse.Id` to `genai.FunctionCall.ID` and `genai.FunctionResponse.ID` when reading events back into `genai.Content`, so tool-call and tool-response links are not dropped.
Why It Matters: Agents and tools that replay Vertex AI session history with adk-go can keep tool calls tied to the correct responses across turns, which reduces misrouted follow-up actions and fragile multi-turn tool workflows in production. Because Vertex AI function IDs are mapped back into `genai` IDs during deserialization, ID pairing now survives the write-read cycle. Watch older or partially migrated sessions, and any other event conversion paths that might still skip the same ID fields.
Final score 81 | Confidence 98 | 1 evidence item | Tags: aiplatformToGenaiContent, genai.FunctionCall.ID, genai.FunctionResponse.ID, aiplatformpb.FunctionCall.Id, aiplatformpb.FunctionResponse.Id, session history, tool-call pairing
pull request | May 19, 2026, 7:58 PM
Strip DeepSeek reasoning fields when tool calls are present
This PR fixes a DeepSeek-only compatibility regression by auto-detecting models routed to `deepseek-reasoner` and removing `reasoning_effort`/`reasoning` from requests whenever `tools` are included, which prevents reasoner endpoints from rejecting tool-call traffic.
Why It Matters: Developers running DeepSeek tool-calling workflows (for example subagents) avoid immediate 400 failures on tool invocations, so assistant runs no longer stop at the first step for affected Flash/reasoner models. The change is implemented through compat-mode detection in `openai-completions-compat` and parameter stripping in `buildParams`. In the next cycle, watch for new DeepSeek model naming or routing changes, and verify that custom compat overrides do not unintentionally remove reasoning from non-targeted models.
Final score 81 | Confidence 95 | 1 evidence item | Tags: deepseek-v4-flash, deepseek-reasoner, api.deepseek.com, reasoning_effort, tools, OpenAICompat
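The parameter stripping described above might look roughly like this. It is a sketch under stated assumptions: `build_params` here is a Python stand-in for the PR's `buildParams`, and `routes_to_reasoner` is a hypothetical predicate replacing the real compat-mode detection.

```python
def build_params(model, params, routes_to_reasoner):
    """Return request params, dropping reasoning fields when tools are present.

    `routes_to_reasoner` is a hypothetical predicate standing in for the
    PR's compat-mode detection; the input dict is left unmodified.
    """
    out = dict(params)
    if out.get("tools") and routes_to_reasoner(model):
        # Reasoner endpoints reject these fields alongside tool calls.
        out.pop("reasoning_effort", None)
        out.pop("reasoning", None)
    return out
```

Requests without `tools`, and models the predicate does not match, pass through unchanged, which is the property the "non-targeted models" watch item is about.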
pull request | May 22, 2026, 6:25 AM
Emit completed tool responses before confirmation requests
In google/adk-go, `runOneStep` now yields the merged tool function-response event before yielding `adk_request_confirmation`. This ensures a completed tool result is written to session history even when a consumer stops at the confirmation boundary.
Why It Matters: Tool-call frameworks and operators using adk-go no longer lose completed tool outputs when execution pauses for user approval, so downstream state, replay, and audit flows stay consistent instead of stopping at a confirmation-only snapshot. The change emits the merged function-response event before the confirmation event; teams should check custom consumers for implicit ordering assumptions and add or adjust tests around stop-at-confirmation behavior.
Final score 81 | Confidence 97 | 1 evidence item | Tags: runOneStep, adk_request_confirmation, function-response event, session history, tool execution
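The ordering guarantee can be illustrated with a small generator. This is a Python sketch of the idea only; adk-go is written in Go, and the event dictionaries and both function names below are illustrative assumptions rather than its API.

```python
def run_one_step(tool_results, needs_confirmation):
    """Yield events for one step (illustrative stand-in for `runOneStep`)."""
    if tool_results:
        # Emit the completed tool output first so it reaches session history.
        yield {"type": "function_response", "results": list(tool_results)}
    if needs_confirmation:
        yield {"type": "adk_request_confirmation"}


def consume_until_confirmation(events, history):
    """A consumer that stops reading at the confirmation boundary."""
    for event in events:
        if event["type"] == "adk_request_confirmation":
            break
        history.append(event)
    return history
```

A consumer that breaks at the confirmation event still records the tool result, because that event was yielded first.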
pull request | May 19, 2026, 5:27 AM
Use supported `reasoning` interleaved field for Cerebras Zai-GLM
The PR changes how reasoning text is sent for Cerebras by adding `reasoning` as an allowed `interleaved.field` and making Zai-GLM models default to that field, replacing top-level `reasoning_content` that Cerebras rejects.
Why It Matters: Cerebras users running Kilo-Org/kilocode with Zai-GLM no longer hit request failures in reasoning workflows, so inference flows that include reasoning turns continue without breaking API calls. The fix replaces the unsupported top-level `reasoning_content` payload with the supported `reasoning` interleaved field for Cerebras. Operators should watch whether any existing clients still emit `reasoning_content` and whether other provider profiles need equivalent field mappings to avoid new compatibility regressions.
Final score 81 | Confidence 97 | 1 evidence item | Tags: Cerebras API, Zai-GLM, interleaved.field, reasoning_content, reasoning
pull request | May 19, 2026, 9:27 AM
Fix POSIX ACP CLI detection to avoid false-missing results after timeout
The PR replaces a brittle POSIX batch-only CLI availability check with a two-step strategy: it raises the batch `command -v` timeout to 8000ms, and on batch timeout it runs parallel per-CLI probes (each with 3000ms). This prevents a single slow PATH entry from forcing all built-in ACP CLIs to be reported as missing.
Why It Matters: Users and operators running AionUi in WSL, Docker, or Linux hosts with slow mounted PATH entries no longer see every ACP CLI reported as unavailable at startup, so installed tools stay selectable instead of the app silently falling back to a degraded workflow. The detector still prefers a fast batch probe in normal environments, but now recovers from a batch timeout by probing each CLI independently with a bounded 3 s check. Continue watching whether fallback frequency rises on very slow filesystems and whether per-CLI fallback meaningfully extends startup latency when many CLIs repeatedly time out.
Final score 81 | Confidence 95 | 1 evidence item | Tags: AcpDetector, POSIX batch CLI detection, command -v, safeExec, AIONUI ACP tools
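The two-step strategy can be sketched as a batch probe with a parallel per-CLI fallback. The sketch assumes two injected callables, `batch_probe` and `single_probe`, which are hypothetical; the real detector shells out to `command -v`, and the timeouts are left to those callables here.

```python
from concurrent.futures import ThreadPoolExecutor


def detect_clis(names, batch_probe, single_probe):
    """Return the set of available CLI names.

    batch_probe(names) -> iterable of found names; raises TimeoutError if slow.
    single_probe(name) -> bool; raises TimeoutError if slow.
    Both are hypothetical callables standing in for `command -v` calls.
    """
    try:
        return set(batch_probe(names))  # fast path for normal environments
    except TimeoutError:
        pass  # fall through to independent probes

    def probe(name):
        try:
            return name if single_probe(name) else None
        except TimeoutError:
            return None  # only this CLI is reported missing

    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        return {name for name in pool.map(probe, names) if name}
```

The key property is that a timeout in one per-CLI probe only affects that CLI, instead of marking every CLI as missing.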
pull request | May 20, 2026, 7:07 PM
Fix ADK console tool confirmations to return actionable yes/no results
Added dedicated console-side handling for `toolconfirmation.FunctionCallName` so operator confirmation prompts now return a `{"confirmed": bool}` payload instead of the generic `{"result": <text>}` fallback that `ctx.ToolConfirmation()` could not consume.
Why It Matters: Operators using the ADK console can now actually confirm or reject sensitive tool actions (for example, deleting records), so approval-driven workflows resume correctly instead of stalling on a confirmation response that was never usable. This matters most for teams running interactive human-in-the-loop (HITL) flows, because it restores a reliable manual control path. Continue to watch for regressions in other clients or scripts that still emit or expect the old `{"result": <text>}` response format.
Final score 81 | Confidence 97 | 1 evidence item | Tags: adk launcher console, tool confirmation interrupt, toolconfirmation.FunctionCallName, ctx.ToolConfirmation, FunctionCall.ID
pull request | May 20, 2026, 9:34 AM
LibreChat starts MCP OAuth flow before connect when tokens are missing
This fix changes MCP connection behavior for servers explicitly marked `requiresOAuth: true`: LibreChat now launches OAuth before attempting to connect, preventing a false "connected" state and subsequent tool-call failures for providers that accept anonymous handshakes but reject unauthenticated execution. It closes a real correctness gap where users reached a confusing dead-end after clicking Connect.
Why It Matters: Teams using MCP tools such as Google BigQuery in LibreChat now get the OAuth login step during connect, so they no longer see a green connection followed by an immediate "OAuth authentication required" error when the first tool runs. Authenticated calls either become available right after consent or fail early with a clear auth URL. This should reduce support friction and broken workflows; operator monitoring should focus on callback failures, token-store behavior, and repeated redirect loops caused by misconfigured OAuth apps or server URLs.
Final score 81 | Confidence 97 | 1 evidence item | Tags: MCPConnectionFactory.createConnection, requiresOAuth, oauthRequired, oauthHandled, oauthFailed, attemptToConnect, ParsedServerConfig.url
pull request | May 19, 2026, 1:16 PM
PraisonAI tool wrapper adds shared async bridge with cancellation-safe timeouts
This PR fixes a core async-safety problem in PraisonAI’s tool-calling wrapper by replacing per-runtime event-loop execution with a shared `_async_bridge` and adding explicit cancellation-aware timeout handling so run-time tool calls can finish/abort without leaking resources.
Why It Matters: Users and operators invoking tools through PraisonAI should see fewer stalled jobs and background leftovers when calls time out, because long-running or failing tool runs now end with an explicit timeout outcome and cleaned-up execution state. This is achieved by unifying async execution around `_async_bridge`, adding cancellation and `finally` cleanup on `run_sync` timeout paths, and applying a boundary-level `tool_timeout` guard. Watch for custom tools that do not propagate cancellation correctly, and for strict timeout settings that cut off legitimate long tasks under heavy concurrency.
Final score 81 | Confidence 94 | 1 evidence item | Tags: PraisonAI, InteractiveRuntime, _async_bridge, run_sync, tool_timeout
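A shared bridge with cancellation-aware timeouts could be sketched like this. It is an assumption-laden illustration, not PraisonAI's code: the `AsyncBridge` class is hypothetical, and only the general pattern (one background event loop, cancel on timeout) is taken from the description above.

```python
import asyncio
import threading
from concurrent.futures import TimeoutError as FutureTimeout


class AsyncBridge:
    """Hypothetical shared bridge: one background loop for all sync callers."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        threading.Thread(target=self._loop.run_forever, daemon=True).start()

    def run_sync(self, coro, timeout):
        """Run `coro` on the shared loop, cancelling it if `timeout` elapses."""
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        try:
            return future.result(timeout)
        except FutureTimeout:
            # Cancelling the future cancels the task on the loop, so the
            # coroutine sees CancelledError and can release its resources.
            future.cancel()
            raise TimeoutError(f"tool call exceeded {timeout}s") from None
```

Without the `future.cancel()` call, a timed-out coroutine would keep running on the loop after the caller gave up, which is the kind of leftover work this signal describes.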
pull request | May 21, 2026, 10:32 AM
Fix empty-ID OpenAI tool-call replays to prevent ghost calls
The PR fixes a replay-path correctness issue in Pi’s OpenAI-compatible provider layer by normalizing tool-call IDs during chat-completions/Responses replay: it merges argument-only streaming deltas into the active tool call, assigns deterministic non-empty IDs when ids are missing or pipe-prefixed, and drops orphan tool outputs without a matching emitted assistant tool call.
Why It Matters: Operators of agent workflows using Pi with OpenAI-compatible APIs can avoid unexpected failures when replaying tool calls across turns, because empty or malformed replay IDs no longer produce requests that the API rejects. The provider now rewrites missing or pipe-prefixed identifiers into stable non-empty IDs, merges argument-only deltas into the current call, and filters unmatched orphan outputs before sending the request payload, which should reduce intermittently broken turns in function-calling systems. Watch for rare fallback-ID collisions and for over-filtering, where a valid but delayed output is dropped and the caller has to retry.
Final score 80 | Confidence 96 | 1 evidence item | Tags: OpenAI-compatible provider, tool-call replay, chat-completions, Responses, tool_call_id, call_id
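Two of the three behaviors, fallback IDs and orphan filtering, can be sketched as below. The message shape, the `role` values, and the `call_fallback_N` naming are all assumptions made for illustration; argument-only delta merging is not shown, and Pi's actual data model may differ.

```python
def normalize_replay(messages):
    """Assign stable IDs to tool calls and drop unmatched tool outputs."""
    out, pending, counter = [], [], 0
    for msg in messages:
        if msg["role"] == "assistant_tool_call":
            raw = msg.get("id") or ""
            new_id = raw
            if not raw or raw.startswith("|"):
                # Deterministic fallback: depends only on position in replay.
                new_id = f"call_fallback_{counter}"
                counter += 1
            pending.append((raw, new_id))
            out.append({**msg, "id": new_id})
        elif msg["role"] == "tool_output":
            raw = msg.get("id") or ""
            match = next((p for p in pending if p[0] == raw), None)
            if match is None:
                continue  # orphan output with no emitted call: drop it
            pending.remove(match)
            out.append({**msg, "id": match[1]})
        else:
            out.append(msg)
    return out
```

Because the fallback ID is derived from replay order, the same transcript yields the same IDs on every replay, which is what keeps call and output paired.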
pull request | May 22, 2026, 6:24 AM
Preserve ThoughtSignature on ADK synthetic confirmation calls
This PR fixes a regression in ADK’s confirmation replay flow: synthetic `adk_request_confirmation` function-call parts were created without copying the original call’s `ThoughtSignature`, which caused Gemini thinking models to reject them with `400 INVALID_ARGUMENT`. The fix now ensures the replayed confirmation part inherits that signature.
Why It Matters: For applications using ADK tool calling with Gemini thinking models, replayed confirmation calls now avoid the `function call adk_request_confirmation is missing a thought_signature` failure, so conversations are less likely to stop mid-flow and require operator retries. The change updates synthetic confirmation construction in `internal/llminternal/functions.go` to look up the source call by function-call ID and transfer the signature when available. After rollout, monitor for any remaining replay paths that still drop signatures.
Final score 80 | Confidence 97 | 1 evidence item | Tags: generateRequestConfirmationEvent, adk_request_confirmation, ThoughtSignature, FunctionCall, Gemini thinking models, function-call replay
pull request | May 19, 2026, 8:55 AM
Fix dropped function-response events by restoring missing IDs in adk-go tool-call deserialization
Google adk-go PR #690 fixes a bug where `aiplatformToGenaiContent` failed to copy the `Id` field from `FunctionCall` and `FunctionResponse`, which caused function-response events to be silently dropped in multi-invocation tool-calling sessions that rely on non-empty IDs. A round-trip unit test was added to prevent the regression.
Why It Matters: Applications using adk-go tool calling will stop losing function responses during multi-invocation flows, so automated tool workflows can continue without silent drops or extra retry logic. The fix restores ID propagation from the serialized protobuf message parts, which the ID-based event matching path depends on. Teams should watch whether any other tool-call protobuf fields are still skipped in similar conversion paths, and confirm behavior after merge and rollout.
Final score 80 | Confidence 97 | 1 evidence item | Tags: aiplatformToGenaiContent, createAiplatformpbContent, FunctionCall, FunctionResponse, protobuf, Id field, tool calling, multi-invocation session, round-trip unit test
pull request | May 18, 2026, 8:22 PM
Rulesync enforces explicit Kilo subagent frontmatter validation
This pull request replaces the old alias-based handling of Kilo subagent frontmatter with a dedicated schema that explicitly lists supported Kilo fields and uses it for runtime parsing/validation, so invalid values are rejected at config load time instead of passing silently.
Why It Matters: Developers and operators using rulesync with Kilo subagents now see immediate configuration errors when they specify bad agent-frontmatter values, which reduces silent misconfiguration and helps avoid broken agent runs later in deployment pipelines. The update introduces `KiloSubagentFrontmatterSchema` as the authoritative field set for `displayName`, `temperature`, `model`, and related Kilo options, and makes `KiloSubagent.validate()` and `fromRulesyncSubagent` enforce it with Zod at runtime, while still allowing unknown future Kilo fields through loose passthrough. Watch for existing subagent configs that relied on permissive parsing, and monitor CI or startup logs after rollout for newly surfaced validation failures.
Final score 80 | Confidence 95 | 1 evidence item | Tags: rulesync, KiloSubagent, KiloSubagentFrontmatterSchema, Zod, fromRulesyncSubagent
pull request | May 18, 2026, 6:11 AM
Fix Copilot CLI hook compatibility so PeonPing no longer drops key agent events
The PR replaces the Copilot integration path with a unified event-handling fix: Copilot hooks are written directly under `~/.copilot/hooks` when available, the Copilot adapters now use explicit per-event translation instead of implicit remaps, and incoming payloads are normalized from camelCase aliases to the expected snake_case fields so events like `permissionRequest` are detected instead of being silently ignored.
Why It Matters: Copilot CLI users of PeonPing again receive audible cues for completion, permission prompts, and failure or notification signals, reducing the chance of missing critical prompts during agent sessions and making behavior more predictable for operators who rely on these hooks for task oversight. The change replaces brittle event remaps with explicit translation and a compatibility shim for upstream payload drift, while preserving existing event coverage. Watch whether future Copilot CLI payload schema changes introduce field-name variants beyond the 13 aliases, and whether any unhandled event names appear in real traffic.
Final score 80 | Confidence 94 | 1 evidence item | Tags: PeonPing, GitHub Copilot CLI 1.0.48-1, copilot hooks, adapters/copilot.sh, adapters/copilot.ps1, install.sh, install.ps1, peon.sh, peon.ps1
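The camelCase-to-snake_case shim can be approximated as follows. Note the swap: the PR is described as handling a fixed set of 13 aliases, whereas this sketch uses a generic regex conversion, and the field names in the usage note are hypothetical.

```python
import re


def to_snake(name):
    """Convert a camelCase key to snake_case (generic rule, not an alias table)."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


def normalize_payload(payload):
    """Return a copy of `payload` with camelCase keys renamed to snake_case."""
    out = {}
    for key, value in payload.items():
        snake = to_snake(key)
        # An explicit snake_case key wins over its camelCase alias.
        if snake == key or snake not in payload:
            out[snake] = value
    return out
```

For example, a payload carrying a hypothetical `hookEventName` key would come out with `hook_event_name`, so downstream handlers that only check the snake_case field still see the event.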
pull request | May 21, 2026, 10:34 PM
Add persistent Cursor MAX mode across model selection and session lifecycle
Adds end-to-end support for Cursor MAX mode by introducing a `:max` model selector flag and threading that state through model parsing/formatting, agent session state, and task-subagent startup so supported models can reliably run in 1M-context mode.
Why It Matters: Cursor users and operators can now keep expensive long-context runs in the intended 1M-context mode across model selection, resumes, and subagent creation, instead of being silently pulled back to a lower context setting mid-workflow. The implementation wires `:max` through selector parsing and formatting and through the session lifecycle APIs (`setCursorMaxMode`, `getCursorMaxMode`), and propagates the setting during startup and subagent initialization so behavior stays consistent. Track parser regressions with non-Cursor models, capability-detection mistakes that could enable or disable MAX incorrectly, and gaps in MAX inheritance when sessions are rehydrated.
Final score 80 | Confidence 93 | 1 evidence item | Tags: Cursor MAX mode, `:max` selector flag, SelectorFlags, AgentSession, task subagent session inheritance
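Selector parsing and formatting for the `:max` flag can be sketched as a round-trip pair. Both function names are hypothetical, and treating `:max` strictly as a trailing suffix is an assumption about the selector grammar.

```python
MAX_SUFFIX = ":max"


def parse_model_selector(selector):
    """Split a selector into (model, max_mode); only a trailing `:max` counts."""
    if selector.endswith(MAX_SUFFIX):
        return selector[: -len(MAX_SUFFIX)], True
    return selector, False


def format_model_selector(model, max_mode):
    """Inverse of parse_model_selector, so the flag survives save and resume."""
    return f"{model}{MAX_SUFFIX}" if max_mode else model
```

Keeping parse and format as exact inverses is what lets the flag persist when a session is stored and later rehydrated.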
pull request | May 19, 2026, 9:39 AM
Stop tool-call batch execution on abort in agent run loop
This change fixes tool-call handling so that when `ctx.abort()` is triggered, the agent execution loop checks `signal?.aborted` during prep/execution, exits early, returns aborted tool results, and halts the run after the current turn while still finalizing through `afterToolCall`.
Why It Matters: When a tool run is aborted, users and operators are less likely to see the workflow continue after cancellation, which reduces confusing partial behavior within a turn and improves the operational correctness of tool-driven sessions. Aborted calls now short-circuit instead of letting other queued calls proceed silently, and cleanup still runs through `afterToolCall`. Teams should watch multi-tool-call batches where non-aborted sibling calls remain in the session export, since these may be misread by UIs or downstream trace tooling.
Final score 79 | Confidence 96 | 1 evidence item | Tags: ctx.abort, signal?.aborted, agent tool-call loop, afterToolCall, tool-call session turn
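The abort check inside the batch loop can be sketched as follows. This is a Python approximation: `is_aborted` stands in for the `signal?.aborted` check, `after_tool_call` for the `afterToolCall` hook, and the result dictionaries are an assumed shape.

```python
def run_tool_batch(calls, is_aborted, after_tool_call):
    """Run (name, fn) pairs in order, short-circuiting once aborted.

    `is_aborted` and `after_tool_call` are hypothetical stand-ins for the
    abort signal and the finalization hook described above.
    """
    results = []
    for name, fn in calls:
        if is_aborted():
            # Do not execute; report the call as aborted instead.
            result = {"tool": name, "status": "aborted"}
        else:
            result = {"tool": name, "status": "ok", "output": fn()}
        after_tool_call(result)  # finalization still runs for every call
        results.append(result)
    return results
```

Every queued call still gets a result and a finalization callback, but no tool body runs after the abort is observed.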
issue | May 22, 2026, 9:27 AM
cc-connect opencode tool calls return blank outputs on Feishu under Windows/WSL
An open issue in cc-connect v1.2.1 reports that opencode mode can start sessions and complete turns but returns empty output for tool calls such as `whoami` and file reads, which means the assistant’s core command/result flow is no longer observable for end users, a critical functional regression for operators relying on tool execution.
Why It Matters: Operators using cc-connect as a Lark/Feishu assistant on Windows/WSL cannot get command output or file-read results, so automated or interactive tool-driven workflows become non-functional even though tasks appear to complete. The logs repeatedly show session start and complete events without visible response content, so teams should watch whether response frames are dropped after execution, especially in the stream/preview branch (`hasHandle=false`, `degraded=true`), and validate fixes in both the command and read-file paths in upcoming versions.
Final score 79 | Confidence 87 | 1 evidence item | Tags: cc-connect, opencode, Feishu, Windows 11, WSL, tool calling, v1.2.1
pull request | May 20, 2026, 10:55 AM
Accept string and numeric u32 values in Claude ambient tool inputs
The PR fixes a parser failure in Claude ambient tool calling where fields expected as u32 were sent as strings, causing ambient cycles to fail at deserialization and stop working.
Why It Matters: Users and operators relying on Claude ambient tool calls can keep ambient workflows running, because calls that send numeric fields as strings such as `"0"` no longer break parsing and abort the whole ambient cycle. The fix is a dual-acceptance parser path for targeted u32 fields, applied only to a small set of schedule-related inputs. Next, watch whether similar stringified numeric fields appear in other tool schemas, and track ambient-cycle failure and retry metrics after deployment to catch any remaining type-contract gaps.
Final score 79 | Confidence 97 | 1 evidence item | Tags: Claude tool calling, ambient mode, Serde, u32, string-or-u32 deserialization, EndCycleInput, ScheduleInput, ScheduleToolInput, NextScheduleInput
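The dual-acceptance rule can be shown in a few lines. The original fix is described as Serde-based, so this Python `parse_u32` helper is only an analogue of the idea, with a hypothetical name and error messages.

```python
U32_MAX = 2**32 - 1


def parse_u32(value):
    """Accept an int or a decimal string and return it as a u32-range int."""
    if isinstance(value, bool):
        raise ValueError("booleans are not u32 values")
    if isinstance(value, str):
        if not (value.isascii() and value.isdigit()):
            raise ValueError(f"not a decimal u32 string: {value!r}")
        value = int(value)
    if not isinstance(value, int) or not 0 <= value <= U32_MAX:
        raise ValueError(f"out of range for u32: {value!r}")
    return value
```

Both `0` and `"0"` parse to the same value, while negative numbers, floats, and empty strings are still rejected, so the type contract is loosened only in the one direction the fix targets.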
pull request | May 20, 2026, 10:27 AM
Canonicalize HERMES_ONLY_TOOLS filtering to prevent tool shadowing
This change adds a single authoritative `HermesToolFilter` implementation and integrates it into PraisonAI’s tool registry, CLI loading, and export flow so `HERMES_ONLY_TOOLS` now has deterministic behavior. It specifically addresses overlapping tool names across environments by defining clear visibility rules and diagnostics, with unknown or invalid tool lists handled consistently.
Why It Matters: Operators running PraisonAI across multiple environments should get more predictable tool behavior, because overlapping tool names are now filtered consistently and agents are less likely to execute the wrong tool in production workflows. PR #1700 introduces centralized allowlisting through `HERMES_ONLY_TOOLS` (all tools or an explicit list), logs startup diagnostics for collisions, and handles unknown tool names by warning and stripping them in dev mode and failing strictly in CI mode. Watch how existing deployments with legacy environment setups react to the stricter unknown-tool checks, and whether onboarding docs and env files are updated before broad rollout.
Final score 79 | Confidence 94 | 1 evidence item | Tags: HERMES_ONLY_TOOLS, HermesToolFilter, praisonaiagents/hermes_filter.py, tool registry, CLI tool loading
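The allowlist behavior can be sketched as a single filter function. Several details are assumptions: the comma-separated format of the list, treating an unset value as "all tools", and the `strict` flag standing in for CI mode; `HermesToolFilter` itself may be structured differently.

```python
def filter_tools(registry, only_tools=None, strict=False):
    """Return (allowed_tools, unknown_names) for an allowlist string.

    Assumes a comma-separated allowlist; an empty or missing value keeps all
    tools. `strict=True` models the CI-mode failure on unknown names.
    """
    if not only_tools:
        return dict(registry), []
    wanted = [name.strip() for name in only_tools.split(",") if name.strip()]
    unknown = [name for name in wanted if name not in registry]
    if unknown and strict:
        raise ValueError(f"unknown tools: {unknown}")
    # Dev-mode behavior: unknown names are stripped and reported to the caller.
    allowed = {name: registry[name] for name in wanted if name in registry}
    return allowed, unknown
```

Returning the unknown names alongside the allowed set lets the caller log a warning in dev mode without a second pass over the list.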
pull request | May 20, 2026, 11:30 PM
Gate tool-pair summarization to context windows of 64K or less
PR #9152 changes Goose so tool-pair summarization is skipped when context grows beyond 64K tokens, while leaving small-context behavior unchanged; users can still fully disable it with GOOSE_TOOL_PAIR_SUMMARIZATION=false.
Why It Matters: Operators running Goose in long sessions with many tool calls should see fewer quality drops, because summarization is no longer applied unconditionally at high context lengths. The cutoff is fixed at 64K tokens, so teams should monitor whether this threshold is too strict for some workloads and verify that the disable flag applies consistently in all deployment environments.
Final score 79 | Confidence 98 | 1 evidence item | Tags: tool-pair summarization, context window, 64K tokens, GOOSE_TOOL_PAIR_SUMMARIZATION
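The gate reduces to one predicate. The sketch below assumes that "64K" means 64,000 tokens and that the environment variable is compared case-insensitively against `false`; neither detail is confirmed by the signal, and Goose itself is not written in Python.

```python
ASSUMED_LIMIT_TOKENS = 64_000  # assumption: the signal only says "64K"


def should_summarize_tool_pairs(context_tokens, env, limit=ASSUMED_LIMIT_TOKENS):
    """Return True when tool-pair summarization should run."""
    flag = env.get("GOOSE_TOOL_PAIR_SUMMARIZATION", "true")
    if flag.strip().lower() == "false":
        return False  # explicit opt-out always wins
    return context_tokens <= limit
```

Passing the environment as a plain mapping keeps the check easy to test across deployment configurations, which is where the signal suggests verifying the disable flag.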

Topic Timeline

How the topic has changed over time

44 events

May 22, 2026, 10:06 AM
pull request
Fix dangling tool-call normalization for repeated tool IDs
The PR updates DanglingToolCallMiddleware so it keeps all ToolMessages sharing the same tool_call_id and processes them in occurrence order, preventing later matching tool outputs from being dropped during transcript normalization in summarized tool-calling sessions.
Contribution: Converted the middleware’s normalization logic from a single-item lookup per tool_call_id to ordered multi-match handling, so repeated IDs across assistant turns no longer cause later ToolMessages to be discarded.
Impact: Tool-enabled chat applications avoid silently losing a later tool result when the same tool_call_id appears in multiple turns, so developers and operators get stable multi-turn behavior instead of intermittently missing actions after summarization or compression steps. The middleware now tracks and consumes all matching ToolMessages in turn order, which should reduce hard-to-debug state corruption in tool flows. Monitor edge cases in very long compressed histories where repeated IDs are heavily interleaved.
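Ordered multi-match handling can be sketched with one queue per ID. The dictionary shapes and the `pair_tool_messages` name are illustrative assumptions, not the middleware's actual types.

```python
from collections import defaultdict, deque


def pair_tool_messages(tool_calls, tool_messages):
    """Pair each tool call with the next unused message sharing its ID."""
    by_id = defaultdict(deque)
    for msg in tool_messages:
        # One queue per ID preserves occurrence order for repeated IDs.
        by_id[msg["tool_call_id"]].append(msg)
    pairs = []
    for call in tool_calls:
        queue = by_id[call["id"]]
        # A call with no remaining message is dangling and pairs with None.
        pairs.append((call, queue.popleft() if queue else None))
    return pairs
```

A single-item lookup keyed by ID would keep only one message per ID; the queue keeps all of them and hands them out in the order they occurred.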
May 22, 2026, 9:34 AM
issue
Add configurable HTTP Webhook Sidecar for Clawd approval decisions
An issue proposes a single primary change: add a pluggable HTTP Webhook Sidecar so Clawd remote permission approval is no longer fixed to Telegram. The new flow would let Clawd send permission requests to a user-defined endpoint and receive allow/deny callbacks to resolve tool execution.
Contribution: Defines a concrete integration capability: a pluggable approval client that reuses existing sidecar behavior while replacing the hardcoded Telegram transport with any custom HTTP endpoint. The proposed behavior posts structured permission payloads (including requestId, sessionId, and tool info) and binds callback responses back into the existing permission resolution.
Impact: Operators who want approval on their own local devices or self-hosted bots (e.g., ESP32 button panels, Home Assistant, Node-RED, private Slack/Discord/Feishu bridges) could route Clawd approval traffic through their chosen infrastructure instead of being bound to Telegram, enabling broader local or air-gapped control workflows. The approval transport would move from a single vendor channel to configurable webhook targets with callbacks into resolvePermissionEntry, so follow-up monitoring should focus on callback security, endpoint authentication, requestId idempotency, and timeout and retry handling to avoid stuck approvals or mis-bound decisions.
May 22, 2026, 9:34 AM
commit burst
Bridge path-access approvals to the web dashboard
The update fixes a broken interactive path-approval flow by wiring `path_access` prompts end-to-end from the terminal side to the web UI, so the dashboard can now present and resolve the permission decision instead of leaving the session waiting.
ContributionImplemented a concrete path-approval bridge by adding a `path` active-modal variant, propagating pending path requests through `App.tsx` and `DashboardContext`, and extending the modal resolution API to accept `kind: path` plus explicit outcomes (`run_once`, `always_allow`, `deny`) so web operators can actually complete the approval flow.
ImpactWeb users and operators can now approve or deny file-access prompts from the web dashboard, so model runs that request sandbox-external paths no longer hang at a blank modal waiting state. The fix unifies path approval with the same TUI↔web modal handling used for other blocking gates, with explicit resolve actions routed through `/api/modal/resolve`, reducing stalled sessions and making permission outcomes visible during execution. Watch for reconnect/replay cases that could drop `pendingPath` state, and verify the new `always_allow`/`run_once` semantics never let high-risk path access be applied unexpectedly.
May 22, 2026, 9:27 AM
issue
cc-connect opencode tool calls return blank outputs on Feishu under Windows/WSL
An open issue in cc-connect v1.2.1 reports that opencode mode can start sessions and complete turns but returns empty output for tool calls such as `whoami` and file reads. The assistant’s core command/result flow is therefore not observable for end users, a critical functional regression for operators relying on tool execution.
ContributionPinpoints a concrete behavior defect in the opencode tool-response path where session lifecycle logs are present but response payloads are not delivered, which is vital for debugging and prioritizing a fix because it blocks verified command-based workflows rather than minor UI or config noise.
ImpactOperators using cc-connect as a Lark/Feishu assistant on Windows/WSL cannot get command output or file-read results, so automated or interactive tool-driven workflows become non-functional even though tasks appear to complete. The logs repeatedly show session start/complete events without visible response content, so teams should watch whether response frames are dropped after execution, especially in the stream/preview branch (`hasHandle=false`, `degraded=true`), and validate fixes in both command and read-file paths in upcoming versions.
May 22, 2026, 9:21 AM
security fix
Hard-block tool calls for Title, Summarize, and Compaction agents
Kilo-Org/kilocode changed its system-agent permission setup so Title, Summarize, and Compaction agents now enforce a deny-all tool policy at the agent level, preventing top-level allow rules from re-enabling tool calls for these agents.
ContributionIntroduced an explicit hardcoded `deny: "*"` rule for these system agents, fixing a permission-precedence bug where outer allow rules could override the intended prohibition and allow prohibited tool invocation.
ImpactOperators of workflows that use KiloCode’s Title, Summarize, or Compaction agents will see fewer broken outputs and tool-related misbehavior, because these agents are now blocked from attempting tool calls they are not supposed to make. This reduces unexpected external interactions during routine title generation and summarization/compaction tasks and is worth watching for regressions where any of these agents still needs legitimate tool access, along with any side effects if future changes reintroduce permissive rule overrides.
May 22, 2026, 6:25 AM
pull request
Emit completed tool responses before confirmation requests
In google/adk-go, `runOneStep` now yields the merged tool function-response event before yielding `adk_request_confirmation`. This ensures a completed tool result is written to session history even when a consumer stops at the confirmation boundary.
ContributionReordered the event stream so completed tool responses are emitted first, preserving session-history correctness for human-in-the-loop tool flows.
ImpactTool-call frameworks and operators using adk-go will no longer lose completed tool outputs when execution pauses for user approval, so downstream state, replay, and audit flows stay consistent instead of appearing to stop at a confirmation-only snapshot. Technically, the change emits the merged function-response event before the confirmation event; teams should watch custom consumers for implicit ordering assumptions and add/adjust tests around stop-at-confirmation behavior.
May 22, 2026, 6:24 AM
bug fix
Preserve ThoughtSignature on ADK synthetic confirmation calls
This PR fixes a regression in ADK’s confirmation replay flow: synthetic `adk_request_confirmation` function-call parts were created without copying the original call’s `ThoughtSignature`, which caused Gemini thinking models to reject them with `400 INVALID_ARGUMENT`. The fix now ensures the replayed confirmation part inherits that signature.
ContributionCopies the original model `FunctionCall`’s `ThoughtSignature` into synthetic confirmation parts during request confirmation generation, closing a compatibility gap with replayed model-role function calls and preventing signature-related rejection on Gemini thinking models.
ImpactFor applications using ADK tool-calling with Gemini thinking models, replayed confirmation calls now avoid the `function call adk_request_confirmation is missing a thought_signature` failure, so conversations are less likely to stop mid-flow and require operator retries. Technically, the change updates synthetic confirmation construction in `internal/llminternal/functions.go` to look up the source call by function-call ID and transfer the signature when available, which should be monitored after rollout for any remaining replay paths that might still drop signatures.
May 22, 2026, 12:42 AM
release
Hooks system added for pre/post tool call control
The v1.35.0 release introduces an extensible hooks framework around tool execution, including a PreToolUse denial hook, so tool invocations can be intercepted and controlled before and after running.
ContributionImplemented a new hook layer for tool calls in Goose that registers pre- and post-execution handlers, with explicit denial support for pre-tool checks. This gives integrations a concrete extension point to validate, block, or augment tool calls instead of relying only on implicit runtime behavior.
ImpactDevelopers and operators can now stop prohibited tool actions before they execute, which reduces accidental or unsafe automation behavior and shifts policy enforcement closer to where actions happen. The new hook points also make it easier to add auditing and custom validation around tool usage; teams should watch for misconfigured hooks that could over-block legitimate calls or add extra latency at high tool-call volume.
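A pre/post hook layer of this kind can be sketched generically. The code below is not Goose's hook API: `run_tool`, the hook signatures, and the return shape are all hypothetical, chosen only to show where a denial short-circuits execution and where post-execution auditing runs.

```python
def run_tool(name, args, tool, pre_hooks=(), post_hooks=()):
    """Run `tool` unless a pre-hook denies it (hypothetical hook contract).

    A pre-hook returns None to allow the call or a string reason to deny it.
    Post-hooks only run for calls that actually executed.
    """
    for hook in pre_hooks:
        reason = hook(name, args)
        if reason is not None:
            return {"denied": True, "reason": reason}

    result = tool(**args)

    for hook in post_hooks:
        hook(name, args, result)
    return {"denied": False, "result": result}


def deny_shell(name, args):
    # Example policy: block every call to a tool named "shell".
    return "shell access is disabled" if name == "shell" else None


audit_log = []


def record(name, args, result):
    audit_log.append((name, result))


blocked = run_tool("shell", {"cmd": "ls"}, lambda cmd: "ran",
                   pre_hooks=[deny_shell], post_hooks=[record])
allowed = run_tool("add", {"a": 1, "b": 2}, lambda a, b: a + b,
                   pre_hooks=[deny_shell], post_hooks=[record])
```

Because every pre-hook runs on every call, a slow or overly broad hook affects all tools, which is the over-blocking and latency risk noted above.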
May 21, 2026, 10:34 PM
pull request
Add persistent Cursor MAX mode across model selection and session lifecycle
Adds end-to-end support for Cursor MAX mode by introducing a `:max` model selector flag and threading that state through model parsing/formatting, agent session state, and task-subagent startup so supported models can reliably run in 1M-context mode.
ContributionImplemented MAX-mode as a first-class model capability (`maxMode`) that is parsed from model selectors, preserved in session state, applied in provider request construction, and inherited by subagent sessions, replacing ad-hoc flag handling with a shared selector/session flow tied to Cursor capability policy.
ImpactCursor users and operators can now keep expensive long-context runs in the intended 1M-context mode across model selection, resumes, and subagent creation, instead of being silently pulled back to a lower context setting mid-workflow. The implementation wires `:max` through selector parsing/formatting and session lifecycle APIs (`setCursorMaxMode`, `getCursorMaxMode`) while propagating policy during startup and subagent initialization, so behavior is consistent; track parser regressions with non-Cursor models, capability detection mistakes that could enable/disable MAX incorrectly, and any gaps in MAX inheritance when sessions are rehydrated.
May 21, 2026, 10:32 AM
pull request
Fix empty-ID OpenAI tool-call replays to prevent ghost calls
The PR fixes a replay-path correctness issue in Pi’s OpenAI-compatible provider layer by normalizing tool-call IDs during chat-completions/Responses replay: it merges argument-only streaming deltas into the active tool call, assigns deterministic non-empty IDs when IDs are missing or pipe-prefixed, and drops orphan tool outputs without a matching emitted assistant tool call.
ContributionImplemented deterministic replay normalization for tool-call identifiers and call reconstruction, covering both chat-completions and Responses flows, and added regression tests for empty-id ghost calls and pipe-prefixed IDs.
ImpactOperators of agent workflows using Pi with OpenAI-compatible APIs can avoid unexpected failures when replaying tool calls across turns, because empty or malformed replay IDs no longer produce requests that the API rejects. Concretely, the provider now rewrites missing or pipe-prefixed identifiers into stable non-empty IDs, merges orphaned argument-only deltas into the current call context, and filters unmatched orphan outputs before sending request payloads; this should reduce intermittent broken turns in function-calling systems. Watch for any rare fallback-ID collision or over-filtering case where a valid delayed output might be dropped and requires retry behavior from the caller.
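The ID normalization and orphan filtering described above might look roughly like this. It is a sketch under stated assumptions: the hashing scheme, the `call_` prefix, and both function names are invented for illustration, and the PR's actual derivation of fallback IDs is not described in this summary.

```python
import hashlib


def normalize_call_id(raw_id, index, name):
    """Return a stable non-empty ID (hypothetical scheme).

    IDs that are present and not pipe-prefixed pass through unchanged;
    anything else is replaced by a hash of the call's position and name,
    so replaying the same history yields the same ID.
    """
    if raw_id and not raw_id.startswith("|"):
        return raw_id
    digest = hashlib.sha256(f"{index}:{name}:{raw_id}".encode()).hexdigest()
    return f"call_{digest[:16]}"


def drop_orphan_outputs(outputs, emitted_ids):
    """Keep only tool outputs whose call ID was actually emitted."""
    return [o for o in outputs if o["call_id"] in emitted_ids]


emitted = {normalize_call_id("", 0, "read_file"), "call_abc"}
kept = drop_orphan_outputs(
    [{"call_id": "call_abc", "output": "ok"},
     {"call_id": "call_ghost", "output": "stale"}],
    emitted,
)
```

A truncated hash keeps IDs short but leaves a small collision risk, which is the fallback-ID collision case flagged for monitoring.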
May 21, 2026, 10:02 AM
pull request
PraisonAI enforces OpenAI-compatible BaseTool schemas at definition time
The PR strengthens PraisonAI tool-calling by validating BaseTool schemas against OpenAI expectations before they are used in chat/agent dispatch, adding schema round-trip and tool-list consistency checks so malformed schemas fail fast instead of propagating into runtime tool-call failures.
ContributionImplemented concrete schema checks that gate tool registration/dispatch: BaseTool now performs OpenAI-compatibility validation, validates JSON schema round-tripping for serialization parity, enforces tool-list consistency (including safer handling of duplicates), and reports actionable remediation hints.
ImpactDevelopers integrating tools with PraisonAI’s chat/agent workflows will get earlier, clearer failures when tool schemas are malformed, so broken tool calls are more likely to be caught during development or startup instead of interrupting live conversations. The change applies OpenAI-alignment checks in BaseTool.validate and the @tool decorator path, and adds round-trip plus list consistency validators to avoid silent dispatch mismatches; this should reduce debugging time and production incidents from tool schema drift. Continue monitoring whether existing custom tools need schema adjustments for strict JSON compatibility or naming consistency after adoption.
May 21, 2026, 9:23 AM
issue
Plugin-only install in context-mode breaks `ctx stats` tool access
`context-mode` users report that in the latest release, installing via plugin-only in OpenCode (without MCP server) no longer supports `ctx stats`, and the command fails with an unavailable-tool error instead of returning context statistics.
ContributionThe issue identifies a concrete regression in runtime behavior: plugin-only installation stops exposing the `context-mode_ctx_stats` tool, so OpenCode cannot execute `ctx stats` even though installation and upgrade steps succeed.
ImpactDevelopers and operators using `context-mode` through the plugin-only path cannot run `ctx stats`, so they lose a key diagnostics capability during sessions and may have to re-enable MCP server integration or switch install modes to avoid command failures. The error shows the platform no longer exposes `context-mode_ctx_stats` after plugin-only setup, while expected debug/validation artifacts like `ctx-debug.sh` are absent; monitor whether future releases restore plugin-only tool registration and whether Windows/PowerShell environments are especially affected.
May 21, 2026, 8:10 AM
dependency update
Upgrade anthropic-sdk-go to v1.44.1 with runner tool-call ownership fix
This Renovate PR updates `github.com/anthropics/anthropic-sdk-go` from v1.26.0 to v1.44.1, with the primary actionable change being the runner bug fix where unowned tool calls are now skipped.
ContributionUpgrades the project’s Anthropic Go SDK dependency to v1.44.1 and incorporates the fixed runner behavior that ignores tool calls not owned by `SessionToolRunner`, which changes tool execution semantics to prevent incorrect ownership-based dispatch during tool workflows.
ImpactDevelopers and services using this project’s Anthropic tool-calling path can avoid some incorrect tool-invocation handling after the SDK bump, reducing malformed tool call flows and related request instability. The updated `SessionToolRunner` ownership guard in v1.44.1 is intended to reduce cross-owner tool-call mishandling, but the large version jump (1.26→1.44) still needs end-to-end validation for any subtle API or behavioral compatibility regressions in existing integrations.
May 20, 2026, 11:30 PM
pull request
Gate tool-pair summarization to context windows of 64K or less
PR #9152 changes Goose so tool-pair summarization is skipped when context grows beyond 64K tokens, while leaving small-context behavior unchanged; users can still fully disable it with GOOSE_TOOL_PAIR_SUMMARIZATION=false.
ContributionAdded a context-length guard around the tool-pair summarization path so it runs only within 64K-context sessions, preventing the known long-session degradation while preserving short-context behavior and an explicit opt-out flag.
ImpactOperators of Goose in long sessions with many tool calls should see fewer harmful long-run behaviors because summarization is no longer applied unconditionally at high context lengths, reducing the chance of session quality drops. Technical follow-up: the cutoff is now fixed at 64K tokens, so teams should monitor whether this threshold is too strict for some workloads and verify the disable flag consistently applies in all deployment environments.
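The gate can be pictured as a two-part check. This is an illustrative sketch only: the function name is hypothetical, the exact token value behind "64K" is an assumption (64,000 here; it could equally be 65,536), and only the `GOOSE_TOOL_PAIR_SUMMARIZATION` variable name comes from the summary above.

```python
import os

# Assumption: "64K" is taken as 64,000 tokens for this sketch.
CONTEXT_LIMIT_TOKENS = 64_000


def should_summarize_tool_pairs(context_tokens, env=None):
    """Decide whether tool-pair summarization should run (hypothetical helper)."""
    env = os.environ if env is None else env
    flag = env.get("GOOSE_TOOL_PAIR_SUMMARIZATION", "true")
    if flag.strip().lower() == "false":
        return False  # explicit opt-out always wins
    return context_tokens <= CONTEXT_LIMIT_TOKENS


small = should_summarize_tool_pairs(32_000, env={})
large = should_summarize_tool_pairs(200_000, env={})
opted_out = should_summarize_tool_pairs(
    32_000, env={"GOOSE_TOOL_PAIR_SUMMARIZATION": "false"}
)
```

Checking the opt-out before the threshold means the flag behaves the same way at every context size.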
May 20, 2026, 7:07 PM
pull request
Fix ADK console tool confirmations to return actionable yes/no results
Added dedicated console-side handling for `toolconfirmation.FunctionCallName` so operator confirmation prompts now return a `{"confirmed": bool}` payload instead of the generic `{"result": <text>}` fallback that `ctx.ToolConfirmation()` could not consume.
ContributionImplemented a targeted fix for tool-confirmation interactions: the console now renders a dedicated confirmation prompt for tool-call interrupts and parses user responses into a boolean `confirmed` field, enabling `ctx.ToolConfirmation()` to correctly apply operator approval or rejection.
ImpactOperators using the ADK console can now actually confirm or reject sensitive tool actions (for example, deleting records), so approval-driven workflows resume correctly instead of appearing blocked by a confirmation response that was never usable. This matters most for teams running interactive HITL flows because it restores a reliable manual control path; continue to watch for regressions in other clients or scripts that still emit or expect the old `{"result": <text>}` response format.
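The shape change from `{"result": <text>}` to `{"confirmed": bool}` can be illustrated with a small parser. This is not the adk-go console code (which is Go); the function name and the accepted answer words are assumptions made for the example.

```python
def confirmation_payload(answer):
    """Map an operator's typed answer to a {"confirmed": bool} payload.

    Hypothetical helper: the accepted words are an assumption.
    """
    normalized = answer.strip().lower()
    if normalized in {"y", "yes"}:
        return {"confirmed": True}
    if normalized in {"n", "no"}:
        return {"confirmed": False}
    raise ValueError(f"expected yes or no, got {answer!r}")


approved = confirmation_payload(" Yes ")
rejected = confirmation_payload("n")
```

Rejecting unrecognized input, rather than defaulting to either outcome, avoids silently approving or denying a sensitive action.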
May 20, 2026, 6:41 PM
commit burst
LibreChat hardens data-retention cleanup for temporary chats and related artifacts
The burst’s main change is a unified retention fix: temporary/all retention policies are now applied consistently across conversations plus linked artifacts (files, tool calls, and shared links), with added expiry metadata and sweep logic to prevent stale data from being kept indefinitely.
ContributionImplemented end-to-end retention-semantics hardening by extending schema and cleanup logic for conversation-attached artifacts, propagating temporary-flag context from client paths into server checks, and correcting sweep/read behavior for all-retention and temporary modes.
ImpactAdmins and operators managing LibreChat chats can expect temporary or all-retention policies to remove session artifacts more reliably, so fewer private files/tool calls persist beyond intended periods. The change adds retention-aware paths for files, tool calls, and shared links with expiredAt/TTL handling and deterministic sweep behavior, so teams should monitor legacy-record cleanup, sweep overlap failures, and cleanup latency as data volume grows.
May 20, 2026, 6:07 PM
bug fix
Fix empty tool-call payloads to prevent Bedrock 400s
The commit burst resolves a production-breaking tool-calling path by changing how `/btw` and IRC ephemeral turns encode tools for LiteLLM→Bedrock. The fix makes `context.tools === undefined` the only case for the `hasToolHistory` sentinel, switches checks to length-based logic so explicit `context.tools = []` is treated as an intentional no-tool request, and drops redundant `tool_choice: 'none'` when no tools are available, preventing the empty `toolConfig` payload that triggered Bedrock request rejections.
ContributionFixed a concrete tool-routing logic bug in AI provider request construction: distinguished “caller omitted tools” from “caller passed empty tools,” gated tool history handling accordingly, and removed `tool_choice` when no tools are resolved so an empty tool array no longer emits an invalid directive.
ImpactOperators and developers running /btw/IRC background tool-calling flows will see fewer failed completions, because sessions that previously hit a 400 rejection after earlier tool interactions now keep working with explicit no-tool turns. Watch whether any provider path still expects an empty tools field, and verify no regression in the Anthropic-proxy compatibility branch (`hasToolHistory`) while continuing to validate mixed tool-history scenarios.
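The distinction between an omitted tool list and an explicitly empty one can be sketched as below. The original code is TypeScript (`undefined` versus `[]`); this example uses `None` versus `[]` as the analogue, and the function names and field layout are hypothetical. It does not reproduce the `hasToolHistory` sentinel logic itself, only the two checks the fix relies on.

```python
def caller_omitted_tools(tools):
    """True only when the caller passed no tools value at all."""
    return tools is None


def tool_request_fields(tools=None, tool_choice=None):
    """Build the tool-related request fields (hypothetical shape).

    With no tools resolved, neither `tools` nor `tool_choice` is emitted,
    so the request never carries an empty tool configuration.
    """
    fields = {}
    if tools:  # length-based check: None and [] both skip this branch
        fields["tools"] = list(tools)
        if tool_choice is not None:
            fields["tool_choice"] = tool_choice
    return fields


no_tools = tool_request_fields(tools=[], tool_choice="none")
with_tools = tool_request_fields(tools=[{"name": "search"}], tool_choice="auto")
```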
May 20, 2026, 3:29 PM
release
Pi CLI release/install path hardened with locked dependencies
v0.75.4 introduces supply-chain hardening for the Pi CLI install and release flow by adding a generated transitive dependency lockfile, enforcing checks for dependency pinning and lifecycle-script allowlists, and disabling lifecycle scripts during self-update/local release installs.
ContributionImplemented a concrete supply-chain safety path for CLI installs and updates: the release now ships a generated transitive dependency lockfile, blocks accidental lockfile drift, validates dependency pinning and allowed lifecycle scripts, and disables risky lifecycle hooks during self-update/local install flows with pre-release npm/Bun install smoke tests.
ImpactOperators and developers who install or auto-update the Pi CLI should see safer upgrades with fewer surprise dependency changes and reduced risk from unexpected install hooks, so rollouts are less likely to introduce build-time surprises or accidental package-level risk during deployment. The change is enforced through a release-time npm-shrinkwrap.json plus checks for pinned transitive dependencies and lifecycle allowlists, and by disabling lifecycle scripts in reinstall/update paths. Watch for false positives in environments using custom install hooks and monitor any regressions where legitimate packages are rejected by the new lockfile or allowlist rules.
May 20, 2026, 10:55 AM
pull request
Accept string and numeric u32 values in Claude ambient tool inputs
The PR fixes a parser failure in Claude ambient tool calling where fields expected as u32 were sent as strings, causing ambient cycles to fail at deserialization and stop working.
ContributionIntroduced custom deserializers that accept both string and u32 for numeric fields and applied them to EndCycleInput.memories_modified/compactions and wake_in_minutes in NextScheduleInput, ScheduleInput, and ScheduleToolInput, eliminating strict type-only parsing for these tool-call payloads.
ImpactUsers and operators relying on Claude ambient tool calls can now keep ambient workflows running, because calls that send numeric fields as strings such as `"0"` no longer break parsing and abort the whole ambient cycle. This reduces full ambient-mode outages after tool invocations. The technical fix is a dual-acceptance parser path for targeted u32 fields, applied only to a small set of schedule-related inputs. Next, watch whether similar stringified numeric fields appear in other tool schemas and track ambient-cycle failure/retry metrics after deployment to catch any remaining type-contract gaps.
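The dual-acceptance idea can be sketched in a few lines. The PR itself uses custom deserializers in its own codebase; the Python below is only an analogue, and `parse_u32` is a hypothetical name. It accepts either an integer or a string of ASCII digits and enforces the unsigned 32-bit range.

```python
U32_MAX = 2**32 - 1


def parse_u32(value):
    """Accept an int or a decimal string and return it as a u32-range int."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise TypeError("expected an integer or numeric string")
    if isinstance(value, int):
        number = value
    elif isinstance(value, str):
        if not (value.isascii() and value.isdigit()):
            raise ValueError(f"not an unsigned integer: {value!r}")
        number = int(value)
    else:
        raise TypeError("expected an integer or numeric string")
    if not 0 <= number <= U32_MAX:
        raise ValueError(f"out of u32 range: {number}")
    return number


from_string = parse_u32("0")
from_int = parse_u32(15)
```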
May 20, 2026, 10:27 AM
pull request
Canonicalize HERMES_ONLY_TOOLS filtering to prevent tool shadowing
This change adds a single authoritative `HermesToolFilter` implementation and integrates it into PraisonAI’s tool registry, CLI loading, and export flow so `HERMES_ONLY_TOOLS` now has deterministic behavior. It specifically addresses overlapping tool names across environments by defining clear visibility rules and diagnostics, with unknown or invalid tool lists handled consistently.
ContributionImplemented the canonical filter module for tool whitelisting in `praisonaiagents/hermes_filter.py`, defined concrete semantics for unset/empty/explicit values, added startup diagnostics, integrated filtering into tool registry and CLI loading paths, and added unit/integration test coverage and configuration docs.
ImpactOperators running PraisonAI across multiple environments should get more predictable tool behavior, because overlapping tool names are now filtered consistently and agents are less likely to execute the wrong tool in production workflows. Technically, PR #1700 introduces centralized whitelisting via `HERMES_ONLY_TOOLS` (all tools vs. explicit list), logs startup diagnostics for collisions, and applies dev-mode warning+strip plus CI-mode strict-fail behavior for unknown tool names. Watch next how existing deployments with legacy environment setups react to the stricter unknown-tool checks and whether onboarding docs and env files are updated before broad rollout.
May 20, 2026, 9:51 AM
commit burst
Lifecycle checkpoint transitions now gate high-risk tool mutations
The runtime now enters an explicit checkpoint state when a checkpoint is reached, so existing mutation gating applies during approval pauses and stopping at a checkpoint now cancels the lifecycle cleanly instead of leaving execution state behind.
ContributionIntroduced a concrete lifecycle behavior change: checkpoint entry now forces the state machine into `checkpoint`, enabling the existing high-risk mutation guard to run during paused review/revise/continue flows, and ensures checkpoint stop paths terminate via `lifecycle.cancel()`.
ImpactOperators and developers using the engineering lifecycle get safer checkpoint pauses, because risky commands and edits are now blocked while a checkpoint is waiting for approval, and stopping a checkpointed flow exits cleanly instead of lingering in a half-executing state. This should reduce unintended side effects during human review gates, while the next thing to watch is whether any legitimate checkpoint workflows are blocked too aggressively or any non-`App.tsx` code paths still skip the new transition.
May 20, 2026, 9:34 AM
pull request
LibreChat starts MCP OAuth flow before connect when tokens are missing
This fix changes MCP connection behavior for servers explicitly marked `requiresOAuth: true`: LibreChat now launches OAuth before attempting to connect, preventing a false "connected" state and subsequent tool-call failures for providers that accept anonymous handshakes but reject unauthenticated execution. It closes a real correctness gap where users reached a confusing dead-end after clicking Connect.
ContributionIntroduces a proactive OAuth gate: in createConnection, when `serverConfig.requiresOAuth === true` and no stored tokens exist, LibreChat emits `oauthRequired` before `attemptToConnect()`, awaits `oauthHandled` to proceed or `oauthFailed` to return auth guidance, and restricts this path to explicit OAuth-required servers. This directly replaces reactive 401-triggered auth, removes misleading successful handshakes, and preserves existing OAuth handler flow.
ImpactTeams using MCP tools such as Google BigQuery in LibreChat now get the OAuth login step during connect, so they no longer see a green connection followed by immediate "OAuth authentication required" when the first tool runs; authenticated calls become either available immediately after consent or fail early with a clear auth URL. This should reduce support friction and broken workflows, while operator monitoring should focus on callback failures, token-store behavior, and any repeated redirect loops for misconfigured OAuth apps or server URLs.
May 20, 2026, 7:22 AM
dependency update
opencode 1.15.5 update fixes plugin `ask` tool-call completion
This update in `dyoshikawa/rulesync` bumps `anomalyco/opencode` from v1.14.50 to v1.15.5, mainly to incorporate the release fix where plugin `ask` calls complete correctly instead of leaving tool invocations unresolved.
ContributionIntroduces a concrete bug fix in plugin execution by correcting `ask`-based tool calls so they can complete and report results instead of stalling.
ImpactDevelopers using plugin-driven flows are less likely to see stalled tool steps, so automated agent runs and interactive tasks are more likely to finish without manual retries or operator intervention. In technical terms, this release changes plugin `ask` behavior toward a resolvable completion path (promise-style behavior), which should reduce hanging tool interactions; watch whether custom plugins assume the prior unresolved `ask` pattern and whether this behavioral change affects their integration contracts.
May 19, 2026, 7:58 PM
pull request
Strip DeepSeek reasoning fields when tool calls are present
This PR fixes a DeepSeek-only compatibility regression by auto-detecting models routed to `deepseek-reasoner` and removing `reasoning_effort`/`reasoning` from requests whenever `tools` are included, which prevents reasoner endpoints from rejecting tool-call traffic.
ContributionAdds a new `OpenAICompat` flag (`disableReasoningWhenToolsPresent`) and DeepSeek auto-detection (`*-flash`, `deepseek-r1`, `deepseek-reasoner` on `api.deepseek.com`) so build-time parameter assembly drops `reasoning_effort` and `reasoning` whenever `tools` are present, with tests covering detection and request-body behavior.
ImpactDevelopers using DeepSeek tool-calling workflows (for example subagents) can avoid immediate 400 failures on tool invocations, so assistant runs no longer stop at the first step for affected Flash/reasoner models. The change is implemented via compat-mode detection in `openai-completions-compat` and parameter stripping in `buildParams`; in the next cycle, watch for new DeepSeek model naming/routing changes and verify custom compat overrides do not unintentionally remove reasoning from non-targeted models.
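The parameter stripping can be pictured as follows. The real change lives in a TypeScript `buildParams`; this Python analogue is a sketch, with `build_params` and its argument names chosen for illustration. Only the two field names and the flag's meaning come from the summary above.

```python
def build_params(base, tools, disable_reasoning_when_tools_present):
    """Assemble request parameters, dropping reasoning fields when required."""
    params = dict(base)  # copy so the caller's dict is not mutated
    if tools:
        params["tools"] = list(tools)
        if disable_reasoning_when_tools_present:
            params.pop("reasoning_effort", None)
            params.pop("reasoning", None)
    return params


base = {"model": "deepseek-reasoner", "reasoning_effort": "high"}
stripped = build_params(base, [{"name": "search"}], True)
untouched = build_params(base, [], True)
```

Reasoning fields are removed only when both conditions hold (tools present and the compat flag set), so requests without tools keep their reasoning settings.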
May 19, 2026, 3:17 PM
capability announcement
AWS publishes three implementation paths for Programmatic Tool Calling on Bedrock
The post’s core change is a concrete set of three implementation paths for Programmatic Tool Calling on Amazon Bedrock—self-hosted ECS sandbox, managed Bedrock AgentCore Code Interpreter, and an Anthropic SDK-compatible proxy—so teams can choose one integration route that fits their control, ops, and compatibility needs.
ContributionProvides a practical guide for adopting Bedrock tool-calling through three distinct architectures (self-hosted, managed, and SDK-compatible), reducing the need for teams to design separate custom orchestration from scratch.
ImpactDevelopers and operators can now add model tool-calling with less custom integration work because they can reuse existing Anthropic-style workflows or select a managed versus self-hosted deployment pattern based on governance and ops preferences, but teams should closely monitor security hardening in the ECS sandbox, proxy auth/routing correctness, and AgentCore service limits as they scale workloads across these paths.
May 19, 2026, 3:09 PM
pull request
Tighten Azure tool schema checks before sending tool calls
A single fix in the Azure/OpenAI adapter now enforces strict tool schema validation before chat-completion requests are sent and treats optional tool parameters as part of required schema validation, with a regression test covering run_skill_script optional-argument behavior.
ContributionThe change introduces stricter preflight validation in the Azure/OpenAI-compatible tool-calling path and updates schema requirements so optional tool arguments are explicitly represented during validation, reducing the chance of malformed tool calls being accepted and forwarded.
ImpactDevelopers building Azure-based agent tools with codecompanion.nvim will get more predictable tool-call behavior, because invalid or mismatched tool payloads are caught before hitting the chat-completion API, which helps avoid unexpected breakages in tool-driven workflows. The adapter now applies strict schema checks and marks optional arguments in the required validation path for Azure-compatible calls; operators should watch for new false-rejections of legitimate optional fields and watch downstream plugin telemetry for any increase in tool-call validation failures after rollout.
May 19, 2026, 1:16 PM
pull request
PraisonAI tool wrapper adds shared async bridge with cancellation-safe timeouts
This PR fixes a core async-safety problem in PraisonAI’s tool-calling wrapper by replacing per-runtime event-loop execution with a shared `_async_bridge` and adding explicit cancellation-aware timeout handling so run-time tool calls can finish/abort without leaking resources.
ContributionImplemented a single async-safety track for the tool wrapper: all internal tools now execute through a shared async bridge, `run_sync` timeout paths now cancel pending work with cleanup, and a timeout wrapper now returns explicit timeout results at the boundary to avoid silent hangs and leaked processes.
ImpactUsers and operators invoking tools through PraisonAI should see fewer stalled jobs and background leftovers when calls time out, because long-running or failing tool runs now terminate with explicit timeout outcomes and cleaned-up execution state. Technically, this is achieved by unifying async execution around `_async_bridge` plus cancellation/finally cleanup on `run_sync` timeout paths and a boundary-level `tool_timeout` guard; watch for edge cases where custom tools do not propagate cancellation correctly or where strict timeout settings may prematurely cut legitimate long tasks under heavy concurrency.
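Cancellation-aware timeouts of this kind can be illustrated with the standard library alone. This is not PraisonAI's `_async_bridge` or `run_sync`; `run_with_timeout` and the result shape are hypothetical. The sketch relies on `asyncio.wait_for`, which cancels the awaited task on timeout and waits for that cancellation to finish, so the tool's `finally` cleanup runs before the timeout result is returned.

```python
import asyncio


async def run_with_timeout(coro, timeout):
    """Await `coro`, returning an explicit timeout result instead of hanging."""
    try:
        value = await asyncio.wait_for(coro, timeout)
        return {"ok": True, "value": value}
    except asyncio.TimeoutError:
        return {"ok": False, "error": "tool_timeout"}


cleaned_up = []


async def slow_tool():
    try:
        await asyncio.sleep(10)  # stands in for a long-running tool
        return "done"
    finally:
        cleaned_up.append(True)  # cleanup runs on cancellation too


async def fast_tool():
    return "done"


timed_out = asyncio.run(run_with_timeout(slow_tool(), 0.05))
finished = asyncio.run(run_with_timeout(fast_tool(), 1.0))
```

A tool that catches and swallows `CancelledError` would defeat this pattern, which matches the caveat above about custom tools that do not propagate cancellation.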
May 19, 2026, 12:23 PM
software release
Forge adds local-agent guardrails that raise multi-step task reliability
Forge was introduced as an open-source reliability layer for self-hosted LLM tool-calling, adding model-agnostic runtime guardrails (retry nudges, step enforcement, error recovery, and VRAM-aware context handling) so the same 8B model can complete multi-step agentic workflows much more consistently without changing model weights.
ContributionIntroduces a concrete runtime control layer around local tool-calling agents that manages execution failures and context growth during multi-step runs, producing reported large reliability gains for 8B models (from ~53% to ~99% in its evaluation setup) without requiring model fine-tuning.
ImpactDevelopers and operators using local 8B agents can expect far fewer failed tool-use runs in multi-step workflows, which can reduce repeated retries and make smaller on-device models viable for production-like agent tasks that previously needed fallback to heavier/remote setups. This matters because the gain is attributed to guardrail behavior (retry nudges, strict step sequencing, recovery logic, and VRAM-aware context controls), so teams should next monitor whether those controls generalize across different toolchains and whether recovery logic introduces unacceptable latency or behavior drift in long-running automated agents.
May 19, 2026, 10:08 AM
pull request
Add Sandbox download_file API for binary retrieval
This PR adds a new `download_file` method to the sandbox abstraction and implements it in both `AioSandbox` and `LocalSandbox`, giving callers a unified way to read raw file bytes from sandboxed files. It also adds focused unit coverage for the async implementation’s normal, empty-file, lock, error, and chunked-read paths.
ContributionIntroduces a concrete capability gap fix: a new abstract sandbox API for downloading file contents is defined and implemented in two existing backends, enabling standardized binary output extraction instead of backend-specific workarounds.
ImpactDevelopers building multi-step workflows on Deer-flow can now download outputs (such as generated files) through a single sandbox API, reducing brittle ad-hoc extraction logic and making artifact handling easier to automate in downstream steps; teams should watch whether all sandbox backends consistently implement the new contract and whether error paths behave predictably for large or contested files. The new method is defined on `Sandbox` and implemented in `AioSandbox` and `LocalSandbox`, with tests added for empty files, locking, error propagation, and single-chunk behavior, so regressions in retrieval semantics should surface earlier during CI.
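A minimal sketch of the abstract-method-plus-backend shape described here, assuming a synchronous signature that takes a path and returns bytes. The names `Sandbox`, `LocalSandbox`, and `download_file` come from the PR summary; the signature, the `chunk_size` parameter, and the chunked loop are assumptions for illustration.

```python
from abc import ABC, abstractmethod


class Sandbox(ABC):
    @abstractmethod
    def download_file(self, path: str) -> bytes:
        """Return the raw bytes of a file inside the sandbox."""


class LocalSandbox(Sandbox):
    def __init__(self, chunk_size: int = 64 * 1024) -> None:
        self.chunk_size = chunk_size  # hypothetical tuning knob

    def download_file(self, path: str) -> bytes:
        chunks = []
        with open(path, "rb") as fh:
            # Read in bounded chunks; an empty read marks end of file.
            while chunk := fh.read(self.chunk_size):
                chunks.append(chunk)
        return b"".join(chunks)
```

Declaring the method on the base class means a backend that forgets to implement it fails at instantiation rather than at first use.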
May 19, 2026, 9:57 AM
pull request
Preserve Vertex AI function IDs during genai deserialization
Restores the round-trip behavior for Vertex AI session history by mapping `aiplatformpb.FunctionCall.Id` and `aiplatformpb.FunctionResponse.Id` to `genai.FunctionCall.ID` and `genai.FunctionResponse.ID` when reading events back into `genai.Content`, so tool-call and tool-response links are not dropped.
ContributionFixes a correctness issue where persisted Vertex AI session events lost function IDs on read-back, by explicitly preserving call/response IDs in the conversion path and adding `TestAiplatformToGenaiContentPreservesFunctionIDs` to guard this behavior.
ImpactAgents and tools that replay Vertex AI session history with ADK-go can keep tool calls tied to the correct responses across turns, reducing misrouted follow-up actions and fragile multi-turn tool workflows in production conversations. The change maps Vertex AI function IDs back into `genai` IDs during deserialization, so ID pairing now survives the write-read cycle; this should be watched for older or partially migrated sessions and for any other event conversion paths that might still skip the same ID fields.
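Why a dropped ID breaks call-to-response pairing can be shown with a small stand-in model. The real fix is in Go; the Python types below (`StoredPart`, `RuntimePart`) and the helpers `to_runtime` and `pair_by_id` are hypothetical and only mimic the shape of the problem.

```python
from dataclasses import dataclass


@dataclass
class StoredPart:
    """Stand-in for a persisted call or response part."""
    kind: str  # "call" or "response"
    id: str
    name: str


@dataclass
class RuntimePart:
    """Stand-in for the in-memory part rebuilt on read-back."""
    kind: str
    id: str
    name: str


def to_runtime(part: StoredPart, keep_id: bool = True) -> RuntimePart:
    # keep_id=False reproduces the bug: the ID is dropped on read-back.
    return RuntimePart(part.kind, part.id if keep_id else "", part.name)


def pair_by_id(parts):
    """Map each non-empty call ID to the name of its matching response."""
    calls = {p.id for p in parts if p.kind == "call" and p.id}
    return {p.id: p.name for p in parts if p.kind == "response" and p.id in calls}
```

With IDs preserved the response pairs with its call; with IDs dropped the pairing comes back empty, which is the silent failure mode the entry warns about.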
May 19, 2026, 9:39 AM
pull request
Stop tool-call batch execution on abort in agent run loop
This change fixes tool-call handling so that when `ctx.abort()` is triggered, the agent execution loop checks `signal?.aborted` during prep/execution, exits early, returns aborted tool results, and halts the run after the current turn while still finalizing through `afterToolCall`.
ContributionAdds an explicit abort guard in the tool-call execution path that breaks processing as soon as abort state is observed, returns explicit aborted tool outputs, and preserves post-tool-call finalization behavior.
ImpactWhen a tool run is aborted, users and operators are less likely to see a workflow continue after cancellation, which reduces confusing partial behavior across a turn and improves operational correctness of tool-driven sessions. Concretely, aborted calls now short-circuit instead of letting other queued calls proceed silently, and cleanup still runs via `afterToolCall`; teams should watch multi-tool-call batches where non-aborted sibling calls remain represented in session export for potential UI or downstream trace interpretation issues.
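The abort guard can be sketched as a check at the top of each loop iteration. This is an illustration under assumptions, not the project's code: `run_tool_batch` is a hypothetical name, a `threading.Event` stands in for the abort signal, and `after_tool_call` mirrors the `afterToolCall` hook named in the summary.

```python
import threading


def run_tool_batch(calls, abort: threading.Event, after_tool_call):
    """Run calls in order; once abort is set, remaining calls are not executed."""
    results = []
    for name, fn in calls:
        if abort.is_set():
            # Short-circuit: report an explicit aborted result instead of running.
            result = {"tool": name, "status": "aborted"}
        else:
            result = {"tool": name, "status": "ok", "output": fn()}
        # Finalization still runs for every call, aborted or not.
        after_tool_call(result)
        results.append(result)
    return results
```

Returning explicit aborted results keeps every requested call represented in the output, so downstream consumers can tell "not run" apart from "missing".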
May 19, 2026, 9:27 AM
pull request
Fix POSIX ACP CLI detection to avoid false-missing results after timeout
The PR replaces a brittle POSIX batch-only CLI availability check with a two-step strategy: it raises the batch `command -v` timeout to 8000ms, and on batch timeout it runs parallel per-CLI probes (each with 3000ms). This prevents a single slow PATH entry from forcing all built-in ACP CLIs to be reported as missing.
Contribution: Updated `AcpDetector.batchCheckCliAvailability` in `src/process/agent/acp/AcpDetector.ts` by introducing `POSIX_BATCH_TIMEOUT_MS = 8000`, adding `POSIX_PER_CLI_TIMEOUT_MS = 3000`, and adding fallback logic that runs isolated per-CLI `command -v` checks when the batch probe times out.
Impact: Users and operators running AionUi in WSL, Docker, or Linux hosts with slow mounted PATH entries will no longer see every ACP CLI reported as unavailable at startup, so installed tools remain selectable instead of the app silently falling back to a degraded workflow. The detector still prefers a fast batch probe in normal environments, and now recovers from a batch timeout by probing each CLI independently with a bounded 3 s check. Continue watching whether fallback frequency increases on very slow filesystems and whether the per-CLI fallback meaningfully extends startup latency when many CLIs repeatedly time out.
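The batch-then-fallback strategy can be sketched with a thread pool. The real detector is TypeScript; here `detect_clis`, `batch_probe`, and `single_probe` are hypothetical, and the probes are injected so the sketch does not depend on any particular shell command. The two default timeouts mirror the values quoted in the entry.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

BATCH_TIMEOUT_S = 8.0    # mirrors the 8000ms batch timeout
PER_CLI_TIMEOUT_S = 3.0  # mirrors the 3000ms per-CLI timeout


def detect_clis(names, batch_probe, single_probe,
                batch_timeout=BATCH_TIMEOUT_S, per_cli_timeout=PER_CLI_TIMEOUT_S):
    """Return {name: available}. Try one batch probe, then per-CLI probes."""
    # One worker for the batch probe plus one per CLI for the fallback.
    pool = ThreadPoolExecutor(max_workers=len(names) + 1)
    try:
        try:
            return pool.submit(batch_probe, names).result(timeout=batch_timeout)
        except FutureTimeout:
            pass  # batch probe was too slow; probe each CLI in isolation
        futures = {name: pool.submit(single_probe, name) for name in names}
        found = {}
        for name, future in futures.items():
            try:
                found[name] = future.result(timeout=per_cli_timeout)
            except FutureTimeout:
                found[name] = False  # only the slow CLI is reported missing
        return found
    finally:
        pool.shutdown(wait=False)
```

Python threads cannot be killed, so a slow probe keeps running in the background after its timeout; a process-based probe would be needed to terminate it.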
May 19, 2026, 9:18 AM
issue
Serena may overwrite externally edited files due to stale open-file cache
Issue #1013 identified a correctness bug where Serena reused a cached file buffer in `open_file()` without checking disk freshness, so `replace_content` could apply regex edits to stale content and silently overwrite newer external changes.
ContributionThe change request defines a concrete fix for stale-buffer safety: before returning a cached buffer, compare it with current on-disk state (mtime or content hash), invalidate cached buffers when changed, and force a fresh read for edit operations.
ImpactDevelopers using Serena alongside other tools (for example Claude Code Edit/Read and git workflows) can see real code changes disappear without warning, because an edit tool may report success while persisting stale content over newer file versions; teams should monitor multi-tool sessions for silent overwrite behavior and verify that future releases add explicit stale-buffer detection for all write paths. After the fix, operations like `replace_content` should only write against fresh file content, reducing risk of accidental data loss and making mixed-workflow editing safer.
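The freshness check proposed in the issue (compare the cached buffer with on-disk state before reuse) can be sketched with a content hash. This is not Serena's code: `BufferCache`, `CachedBuffer`, and `_digest` are hypothetical, and only the method name `open_file` echoes the issue.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class CachedBuffer:
    content: str
    digest: str  # SHA-256 of the bytes that were read from disk


def _digest(path: str) -> str:
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


class BufferCache:
    """Hypothetical cache that re-reads a file when its on-disk content changed."""

    def __init__(self) -> None:
        self._buffers = {}  # path -> CachedBuffer

    def open_file(self, path: str) -> str:
        current = _digest(path)
        cached = self._buffers.get(path)
        if cached is None or cached.digest != current:
            # Cache miss or stale buffer: force a fresh read.
            with open(path, "r", encoding="utf-8") as fh:
                cached = CachedBuffer(fh.read(), current)
            self._buffers[path] = cached
        return cached.content
```

Hashing reads the whole file on every call; comparing mtime first, as the issue also suggests, would be cheaper but can miss edits on filesystems with coarse timestamps.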
May 19, 2026, 8:55 AM
pull request
Fix dropped function-response events by restoring missing IDs in adk-go tool-call deserialization
Google adk-go PR #690 fixes a bug where `aiplatformToGenaiContent` failed to copy the `Id` field from `FunctionCall` and `FunctionResponse`, which caused function-response events to be silently dropped in multi-invocation tool-calling sessions that rely on non-empty IDs. A round-trip unit test was added to prevent the regression.
Contribution: Aligned deserialization with the existing serialization behavior by populating `Id` in `aiplatformToGenaiContent` for both `FunctionCall` and `FunctionResponse`, and added a round-trip test to lock in correct ID handling.
Impact: Applications using adk-go tool calling will stop losing function responses during multi-invocation flows, so automated tool workflows can continue without silent drops or extra retry logic. The fix restores ID propagation from the serialized protobuf message parts, which the ID-based event matching path depends on. Teams should check whether other tool-call protobuf fields are still skipped in similar conversion paths, and confirm behavior after merge and rollout.
May 19, 2026, 5:27 AM
pull request
Use supported `reasoning` interleaved field for Cerebras Zai-GLM
The PR changes how reasoning text is sent for Cerebras by adding `reasoning` as an allowed `interleaved.field` and making Zai-GLM models default to that field, replacing top-level `reasoning_content` that Cerebras rejects.
ContributionIntroduced API-compatibility fix for Cerebras by registering `reasoning` as a supported `interleaved.field` and switching default Zai-GLM behavior to send reasoning text through this supported path instead of `reasoning_content`.
ImpactCerebras users running Kilo-Org/kilocode with Zai-GLM no longer face request failures in reasoning workflows, so inference flows that include reasoning turns can continue without breaking API calls. This works by replacing the unsupported top-level `reasoning_content` payload with the supported inline `reasoning` interleaved field for Cerebras, so operators should watch whether any existing clients still emit `reasoning_content` and whether other provider profiles need equivalent field mappings to avoid new compatibility regressions.
May 19, 2026, 3:26 AM
documentation update
Align skill/plugin tutorials to the live schema validation workflow
The PR makes tutorial documentation match the current validator rules by replacing the old in-notebook self-checking guidance with canonical schema tooling references, updating key tutorial text from deprecated rules, and removing duplicate notebooks so contributors use one authoritative set of examples.
Contribution: One primary change: documentation canonicalization for contributor workflows. The PR redirects the validation tutorial to the official validator and spec sources, synchronizes 10 tutorial notebooks with current schema wording (for example required fields, tools format, and compatibility naming), and deletes duplicated notebook copies to prevent stale references.
Impact: New contributors creating Claude skills and plugins are less likely to follow obsolete instructions that cause immediate schema-validation failures, which reduces onboarding friction and rework. The PR shifts teaching from a hardcoded local validator sketch to the real validation path and corrects the most common wrong rules in the major tutorial set, so contributors can converge on acceptable skill definitions faster. Continue to monitor whether stale code cells outside the markdown pass, or assumptions about removed notebooks and notebook links, reintroduce outdated guidance as the schema evolves.
May 19, 2026, 3:02 AM
release
Add AWS credential_process profile support in oh-my-pi
v15.1.4 adds support for `credential_process` profile entries in oh-my-pi’s AWS credential resolver, so teams can authenticate AI workflows through command-based AWS credential providers without relying only on static env credentials.
ContributionImplemented profile-aware AWS credential resolution by extending the AI runtime credential resolver to execute and consume `credential_process` outputs, allowing dynamic/temporary credentials from profile-defined providers during tool and agent calls.
ImpactOperators using oh-my-pi with AWS-backed tool calls can keep using their existing enterprise credential provider setup, reducing auth setup breakage and startup failures when running AI tasks through profile-based credentials. This should lower manual credential wiring during deployments, but teams should monitor command output parsing errors, provider execution permission issues, and whether fallback to old static-resolution paths changes behavior in multi-profile environments.
May 19, 2026, 12:32 AM
pull request
Fix validator batch discovery for root-level SKILL.md in Anthropic-spec plugins
PR #744 adds a second SKILL discovery path in the validator so plugins using the Anthropic-spec layout (`<plugin_root>/SKILL.md` beside `.claude-plugin/plugin.json`) are no longer skipped in batch checks, and adds tests to cover both root-level and legacy nested layouts without regression.
ContributionIntroduced root-level Anthropic-spec skill-file discovery in `validate_skills_schema.py` by adding a second walk anchored at `.claude-plugin/plugin.json`, then merging results with the legacy nested scan using absolute-path deduplication; this directly changes validation behavior from “silent miss” to “count and grade,” while preserving legacy layout support.
ImpactRepository operators and plugin maintainers using batch validation will now see valid Anthropic-spec plugins counted correctly instead of being shown as having zero skills, so plugins are less likely to be mistaken as missing or low-quality during quality checks. The validator now performs an additional root-level scan and SKILL dedup path for mixed-layout plugins, which should reduce false negatives but may change orphan-count and skill-count metrics in CI/monitoring, so dashboards and alert thresholds should be watched for intentional count jumps and duplicate-reporting regressions.
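The two-scan discovery with absolute-path deduplication can be sketched with `pathlib`. This is not the code in `validate_skills_schema.py`: `discover_skill_files` is a hypothetical name, and the legacy nested layout is assumed here to be `skills/<name>/SKILL.md`, which the summary does not spell out.

```python
from pathlib import Path


def discover_skill_files(root: Path) -> list:
    """Collect SKILL.md files from nested and root-level plugin layouts."""
    found = set()
    # Legacy nested layout (assumed): skills/<name>/SKILL.md.
    for skill in root.rglob("skills/*/SKILL.md"):
        found.add(skill.resolve())
    # Root-level layout: SKILL.md beside .claude-plugin/plugin.json.
    for manifest in root.rglob(".claude-plugin/plugin.json"):
        candidate = manifest.parent.parent / "SKILL.md"
        if candidate.is_file():
            found.add(candidate.resolve())
    # Resolved absolute paths make a file found by both scans count once.
    return sorted(found)
```

Deduplicating on resolved paths is what keeps mixed-layout plugins from being double counted when both scans reach the same file.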
May 18, 2026, 8:22 PM
pull request
Rulesync enforces explicit Kilo subagent frontmatter validation
This pull request replaces the old alias-based handling of Kilo subagent frontmatter with a dedicated schema that explicitly lists supported Kilo fields and uses it for runtime parsing/validation, so invalid values are rejected at config load time instead of passing silently.
ContributionAdded a dedicated runtime-validated Kilo frontmatter schema and wired `KiloSubagent` parsing/validation to it, replacing fallback validation via the parent schema so supported Kilo settings become explicit and type-safe.
ImpactDevelopers and operators using rulesync with Kilo subagents will now see immediate configuration errors when they specify bad agent-frontmatter values, which reduces silent misconfiguration and helps avoid broken agent runs later in deployment pipelines. The update introduces `KiloSubagentFrontmatterSchema` as the authoritative field set for `displayName`, `temperature`, `model`, and related Kilo options, and makes `KiloSubagent.validate()`/`fromRulesyncSubagent` enforce it with Zod at runtime while still allowing unknown future Kilo fields via loose passthrough. Watch for any existing subagent configs that relied on permissive parsing, and monitor CI or startup logs after rollout for newly surfaced validation failures.
May 18, 2026, 3:00 PM
pull request
Add configurable OpenRouter base URL for Claude-Mem
The PR adds a new setting, `CLAUDE_MEM_OPENROUTER_BASE_URL`, that allows OpenRouter requests to be routed to any OpenAI-compatible `/v1` endpoint, while keeping the current OpenRouter default endpoint behavior when the setting is empty.
ContributionIntroduced a configurable OpenRouter base URL override and wired it into validation plus the settings UI, enabling deploy-time routing control to custom OpenAI-compatible endpoints with existing default fallback preserved.
ImpactDevelopers and operators using Claude-Mem can redirect OpenRouter calls to internal or custom-compatible gateways with a single setting, so they can adopt enterprise/private routing or proxy-based setups without rebuilding or patching worker bundles. Keep watching for per-environment endpoint compatibility and authentication/path issues, since custom `/v1` URLs may differ in auth headers, response formats, or rate-limit behavior even though the fallback default (`https://openrouter.ai/api/v1/chat/completions`) remains unchanged when unset.
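The override-with-default behavior can be sketched as a small resolver. The setting name and the default URL are taken from the entry; the function name `resolve_chat_url` and the rule of appending `/chat/completions` to the custom `/v1` base are assumptions, since the summary does not say how the path is joined.

```python
import os

DEFAULT_URL = "https://openrouter.ai/api/v1/chat/completions"


def resolve_chat_url(env=None) -> str:
    """Pick the chat-completions URL, falling back to the default when unset."""
    env = os.environ if env is None else env
    base = (env.get("CLAUDE_MEM_OPENROUTER_BASE_URL") or "").strip()
    if not base:
        return DEFAULT_URL  # empty or missing setting keeps current behavior
    # Assumed join rule: custom base ends in /v1, path is appended.
    return base.rstrip("/") + "/chat/completions"
```

Treating an empty or whitespace-only value as unset keeps existing deployments on the default endpoint without any configuration change.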
May 18, 2026, 6:11 AM
pull request
Fix Copilot CLI hook compatibility so PeonPing no longer drops key agent events
The PR replaces the Copilot integration path with a unified event-handling fix: Copilot hooks are written directly under `~/.copilot/hooks` when available, the Copilot adapters now use explicit per-event translation instead of implicit remaps, and incoming payloads are normalized from camelCase aliases to the expected snake_case fields so events like `permissionRequest` are detected instead of being silently ignored.
ContributionImplemented a compatibility and routing fix for Copilot CLI events: direct hook auto-wiring in install/uninstall scripts, explicit event mapping in adapters, and camelCase payload fallback (13 aliases) in both shell and PowerShell execution paths. This turns previously dropped events—especially `permissionRequest`—into detectable events for PeonPing.
ImpactCopilot CLI users of PeonPing now receive audible cues for completion, permission prompts, and failure/notification signals again, reducing the chance of missing critical workflow prompts during agent sessions; this also makes behavior more predictable for operators who rely on these hooks for task oversight. The change works by replacing brittle event remaps with explicit translation and a compatibility shim for upstream payload drift, while preserving existing handled event coverage. Watch for whether future Copilot CLI payload schema changes introduce new field-name variants beyond the 13 aliases and whether any unhandled event names appear in real traffic.
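The camelCase-to-snake_case fallback can be sketched as a key normalizer. The PR uses a fixed table of 13 aliases that the summary does not list, so this sketch swaps in a generic regex conversion instead; `to_snake`, `normalize_payload`, and the field names in the usage note are hypothetical.

```python
import re


def to_snake(key: str) -> str:
    """Convert camelCase to snake_case; snake_case input is returned unchanged."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", key).lower()


def normalize_payload(payload: dict) -> dict:
    """Rewrite camelCase keys to snake_case; an existing snake_case key wins."""
    out = {}
    for key, value in payload.items():
        snake = to_snake(key)
        # Skip a camelCase alias when the snake_case field is already present.
        if snake == key or snake not in payload:
            out[snake] = value
    return out
```

For example, `normalize_payload({"hookEventName": "permissionRequest"})` yields a `hook_event_name` key, so a handler that only reads snake_case fields still sees the event.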
May 18, 2026, 3:32 AM
pull request
Fix JSON escaping in Traycer phase mode tools config
A syntax error in `Traycer AI/phase_mode_tools.json` was fixed by escaping quotes in the `write_phases.description` entry around `cut-over`, which prevents tool-configuration parsing failures that could block usage of this repo’s tool definitions.
ContributionEscaped the embedded quotes in the `cut-over` text so the tool definition file is valid JSON and can be loaded by normal JSON parsers without manual repair.
ImpactOperators and developers loading this tool config will no longer hit startup/configuration failures from a malformed JSON file, so model/tooling integrations can initialize reliably instead of failing on startup. Technically, the change fixes invalid quoting in `write_phases.description` that previously triggered `JSONDecodeError` at line 295, enabling consistent `json.load` validation and deployment flows. Continue watching for other prompt/template updates that may reintroduce unescaped quotes, since a single malformed string can still break parser-based loading steps.
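The failure mode is easy to reproduce with the standard `json` module. The two strings below are made-up stand-ins for the `write_phases.description` value, not the actual file content, and `loads_or_error` is a hypothetical helper.

```python
import json

# Unescaped inner quotes end the string early, so the document is invalid.
broken = '{"description": "Plan the "cut-over" phase"}'
# Escaping the inner quotes keeps them inside the string value.
fixed = '{"description": "Plan the \\"cut-over\\" phase"}'


def loads_or_error(text: str):
    """Parse JSON, returning a short error string instead of raising."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        return f"JSONDecodeError: {exc.msg}"
```

Running the same parse in CI over every tool-definition file would catch a regression of this kind before it reaches users.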
May 9, 2026, 1:19 AM
release
cc-connect Bridge now requires a token when enabled
In v1.3.3-beta.2, Bridge mode was hardened so cc-connect enforces token-based access, preventing Bridge calls from proceeding without credentials.
ContributionAdded a credential check for Bridge mode that blocks unauthenticated Bridge requests and requires a configured token before Bridge actions can be used.
ImpactIntegrators and operators using Bridge will be safer against unauthorized access, because Bridge commands now fail unless a valid token is provided; this reduces the risk of accidental or malicious use of the Bridge endpoint, while rollout should focus on catching any existing automation that currently sends Bridge requests without tokens and validating token rotation/secret storage so legitimate workflows do not stop unexpectedly.
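A token gate of this kind can be sketched in a few lines. This is not cc-connect's code: `authorize_bridge` and both parameter names are hypothetical, and the constant-time comparison is a general hardening choice rather than something the release notes state.

```python
import hmac


def authorize_bridge(configured_token: str, presented_token: str) -> bool:
    """Allow a Bridge request only when a token is configured and matches."""
    if not configured_token:
        return False  # Bridge enabled without a token: refuse everything
    if not presented_token:
        return False  # unauthenticated request
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(configured_token.encode(), presented_token.encode())
```

Failing closed when no token is configured is what turns "token optional" into "token required".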

Evidence Trail

github_pull_request
bytedance/deer-flow PR #3143: fix(middleware): handle repeated tool call ids
Fix repeated tool_call_id handling by avoiding single-value map eviction and consuming matching ToolMessages in order.
Open Source
github_issue
Serena may overwrite externally edited files due to stale open-file cache
Issue #311 requests decoupling Clawd’s remote approval path from telegram-approval-sidecar.js by introducing a configurable HTTP endpoint flow: Clawd POSTs permission JSON to a user webhook and consumes a remote-decision callback (allow/deny) to proceed.
Open Source
github_commit_burst
esengine/DeepSeek-Reasonix commit burst: 10 commits in 7 days
Previously, when the model requested file access outside the sandbox, the TUI opened `PathConfirm` but the web dashboard stayed blank and the loop waited indefinitely; the new flow adds a web-resolvable `path` modal with `run_once`, `always_allow`, and `deny` actions.
Open Source

Source Coverage

github pull request: 30 events · 30 evidence items; 2 days ago
github release: 4 events · 4 evidence items; 2 days ago
github issue: 4 events · 4 evidence items; 2 days ago
github commit burst: 4 events · 4 evidence items; 2 days ago
rss feed: 1 event · 1 evidence item; 4 days ago
hacker news feed: 1 event · 1 evidence item; 5 days ago

Signal Feed

Changes worth continued tracking

20 unique signals

issue · May 19, 2026, 9:18 AM
Serena may overwrite externally edited files due to stale open-file cache
What Changed: Issue #1013 identified a correctness bug where Serena reused a cached file buffer in `open_file()` without checking disk freshness, so `replace_content` could apply regex edits to stale content and silently overwrite newer external changes.
Why It Matters: Developers using Serena alongside other tools (for example Claude Code Edit/Read and git workflows) can see real code changes disappear without warning, because an edit tool may report success while persisting stale content over newer file versions; teams should monitor multi-tool sessions for silent overwrite behavior and verify that future releases add explicit stale-buffer detection for all write paths. After the fix, operations like `replace_content` should only write against fresh file content, reducing risk of accidental data loss and making mixed-workflow editing safer.
Final score 84 · Confidence 96 · 1 evidence item · oraios/serena, open_file, open_file_buffers, replace_content, EditedFileContext.get_original_content, find_symbol
pull request · May 19, 2026, 9:57 AM
Preserve Vertex AI function IDs during genai deserialization
What Changed: Restores the round-trip behavior for Vertex AI session history by mapping `aiplatformpb.FunctionCall.Id` and `aiplatformpb.FunctionResponse.Id` to `genai.FunctionCall.ID` and `genai.FunctionResponse.ID` when reading events back into `genai.Content`, so tool-call and tool-response links are not dropped.
Why It Matters: Agents and tools that replay Vertex AI session history with ADK-go can keep tool calls tied to the correct responses across turns, reducing misrouted follow-up actions and fragile multi-turn tool workflows in production conversations. The change maps Vertex AI function IDs back into `genai` IDs during deserialization, so ID pairing now survives the write-read cycle; this should be watched for older or partially migrated sessions and for any other event conversion paths that might still skip the same ID fields.
Final score 81 · Confidence 98 · 1 evidence item · aiplatformToGenaiContent, genai.FunctionCall.ID, genai.FunctionResponse.ID, aiplatformpb.FunctionCall.Id, aiplatformpb.FunctionResponse.Id, session history, tool-call pairing
pull request · May 19, 2026, 7:58 PM
Strip DeepSeek reasoning fields when tool calls are present
What Changed: This PR fixes a DeepSeek-only compatibility regression by auto-detecting models routed to `deepseek-reasoner` and removing `reasoning_effort`/`reasoning` from requests whenever `tools` are included, which prevents reasoner endpoints from rejecting tool-call traffic.
Why It Matters: Developers using DeepSeek tool-calling workflows (for example subagents) can avoid immediate 400 failures on tool invocations, so assistant runs no longer stop at the first step for affected Flash/reasoner models. The change is implemented via compat-mode detection in `openai-completions-compat` and parameter stripping in `buildParams`; in the next cycle, watch for new DeepSeek model naming/routing changes and verify custom compat overrides do not unintentionally remove reasoning from non-targeted models.
Final score 81 · Confidence 95 · 1 evidence item · deepseek-v4-flash, deepseek-reasoner, api.deepseek.com, reasoning_effort, tools, OpenAICompat
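The parameter stripping described in this card can be sketched as a pure function over the request dictionary. The real change lives in a TypeScript `buildParams`; the Python `build_params` below is a hypothetical stand-in, and the substring check used to detect a reasoner model is an assumption, since the summary does not describe the detection rule.

```python
def build_params(model: str, params: dict) -> dict:
    """Drop reasoning fields for DeepSeek reasoner requests that include tools."""
    out = dict(params)  # never mutate the caller's request
    is_reasoner = "deepseek-reasoner" in model  # hypothetical detection rule
    if is_reasoner and out.get("tools"):
        out.pop("reasoning_effort", None)
        out.pop("reasoning", None)
    return out
```

Requests without tools, and requests to other models, pass through unchanged, which is the property the card asks teams to verify for custom overrides.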
pull request · May 22, 2026, 6:25 AM
Emit completed tool responses before confirmation requests
What Changed: In google/adk-go, `runOneStep` now yields the merged tool function-response event before yielding `adk_request_confirmation`. This ensures a completed tool result is written to session history even when a consumer stops at the confirmation boundary.
Why It Matters: Tool-call frameworks and operators using adk-go will no longer lose completed tool outputs when execution pauses for user approval, so downstream state, replay, and audit flows stay consistent instead of appearing to stop at a confirmation-only snapshot. Technically, the change emits the merged function-response event before the confirmation event; teams should watch custom consumers for implicit ordering assumptions and add/adjust tests around stop-at-confirmation behavior.
Final score 81 · Confidence 97 · 1 evidence item · runOneStep, adk_request_confirmation, function-response event, session history, tool execution
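Why the emission order matters can be shown with a generator and a consumer that stops at the confirmation event. The real code is Go; `run_one_step`, `consume_until_confirmation`, and the event dictionaries below are hypothetical, and only the `adk_request_confirmation` name comes from the card.

```python
def run_one_step(tool_response, needs_confirmation: bool):
    """Yield the completed tool response first, then the confirmation request."""
    if tool_response is not None:
        yield {"type": "function_response", "payload": tool_response}
    if needs_confirmation:
        yield {"type": "adk_request_confirmation"}


def consume_until_confirmation(events, history: list) -> None:
    """Record events until a confirmation request pauses the run."""
    for event in events:
        if event["type"] == "adk_request_confirmation":
            break  # the consumer stops here and waits for the operator
        history.append(event)
```

If the two `yield` statements were swapped, a consumer that breaks at the confirmation would never see the function response, which is the loss the change prevents.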
pull request · May 19, 2026, 5:27 AM
Use supported `reasoning` interleaved field for Cerebras Zai-GLM
What Changed: The PR changes how reasoning text is sent for Cerebras by adding `reasoning` as an allowed `interleaved.field` and making Zai-GLM models default to that field, replacing top-level `reasoning_content` that Cerebras rejects.
Why It Matters: Cerebras users running Kilo-Org/kilocode with Zai-GLM no longer face request failures in reasoning workflows, so inference flows that include reasoning turns can continue without breaking API calls. This works by replacing the unsupported top-level `reasoning_content` payload with the supported inline `reasoning` interleaved field for Cerebras, so operators should watch whether any existing clients still emit `reasoning_content` and whether other provider profiles need equivalent field mappings to avoid new compatibility regressions.
Final score 81 · Confidence 97 · 1 evidence item · Cerebras API, Zai-GLM, interleaved.field, reasoning_content, reasoning
pull request · May 19, 2026, 9:27 AM
Fix POSIX ACP CLI detection to avoid false-missing results after timeout
What Changed: The PR replaces a brittle POSIX batch-only CLI availability check with a two-step strategy: it raises the batch `command -v` timeout to 8000ms, and on batch timeout it runs parallel per-CLI probes (each with 3000ms). This prevents a single slow PATH entry from forcing all built-in ACP CLIs to be reported as missing.
Why It Matters: Users and operators running AionUi in WSL, Docker, or Linux hosts with slow mounted PATH entries will no longer see every ACP CLI reported as unavailable at startup, so installed tools remain selectable instead of the app silently falling back to a degraded workflow. The detector still prefers a fast batch probe in normal environments, and now recovers from a batch timeout by probing each CLI independently with a bounded 3 s check. Continue watching whether fallback frequency increases on very slow filesystems and whether the per-CLI fallback meaningfully extends startup latency when many CLIs repeatedly time out.
Final score 81 · Confidence 95 · 1 evidence item · AcpDetector, POSIX batch CLI detection, command -v, safeExec, AIONUI ACP tools
pull request · May 20, 2026, 7:07 PM
Fix ADK console tool confirmations to return actionable yes/no results
What Changed: Added dedicated console-side handling for `toolconfirmation.FunctionCallName` so operator confirmation prompts now return a `{"confirmed": bool}` payload instead of the generic `{"result": <text>}` fallback that `ctx.ToolConfirmation()` could not consume.
Why It Matters: Operators using the ADK console can now actually confirm or reject sensitive tool actions (for example, deleting records), so approval-driven workflows resume correctly instead of appearing blocked by a confirmation response that was never usable. This matters most for teams running interactive HITL flows because it restores a reliable manual control path; continue to watch for regressions in other clients or scripts that still emit or expect the old `{"result": <text>}` response format.
Final score 81 · Confidence 97 · 1 evidence item · adk launcher console, tool confirmation interrupt, toolconfirmation.FunctionCallName, ctx.ToolConfirmation, FunctionCall.ID
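The payload change in this card amounts to turning a typed answer into a boolean field. The sketch below assumes that "y" and "yes" count as confirmation; the function name `console_confirmation` and that parsing rule are hypothetical, while the `{"confirmed": bool}` shape is the one named in the PR summary.

```python
def console_confirmation(answer: str) -> dict:
    """Turn a typed console answer into a {"confirmed": bool} payload."""
    # Assumed rule: only an explicit yes confirms; anything else rejects.
    return {"confirmed": answer.strip().lower() in {"y", "yes"}}
```

Defaulting every unrecognized answer to a rejection is the conservative choice for prompts that guard destructive actions.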
pull request · May 20, 2026, 9:34 AM
LibreChat starts MCP OAuth flow before connect when tokens are missing
What Changed: This fix changes MCP connection behavior for servers explicitly marked `requiresOAuth: true`: LibreChat now launches OAuth before attempting to connect, preventing a false "connected" state and subsequent tool-call failures for providers that accept anonymous handshakes but reject unauthenticated execution. It closes a real correctness gap where users reached a confusing dead-end after clicking Connect.
Why It Matters: Teams using MCP tools such as Google BigQuery in LibreChat now get the OAuth login step during connect, so they no longer see a green connection followed by immediate "OAuth authentication required" when the first tool runs; authenticated calls become either available immediately after consent or fail early with a clear auth URL. This should reduce support friction and broken workflows, while operator monitoring should focus on callback failures, token-store behavior, and any repeated redirect loops for misconfigured OAuth apps or server URLs.
Final score 81 · Confidence 97 · 1 evidence item · MCPConnectionFactory.createConnection, requiresOAuth, oauthRequired, oauthHandled, oauthFailed, attemptToConnect, ParsedServerConfig.url
pull requestMay 19, 2026, 1:16 PM
PraisonAI tool wrapper adds shared async bridge with cancellation-safe timeouts
This PR fixes a core async-safety problem in PraisonAI’s tool-calling wrapper by replacing per-runtime event-loop execution with a shared `_async_bridge` and adding explicit cancellation-aware timeout handling so run-time tool calls can finish/abort without leaking resources.
Why It Matters: Users and operators invoking tools through PraisonAI should see fewer stalled jobs and background leftovers when calls time out, because long-running or failing tool runs now terminate with explicit timeout outcomes and cleaned-up execution state. Technically, this is achieved by unifying async execution around `_async_bridge` plus cancellation/finally cleanup on `run_sync` timeout paths and a boundary-level `tool_timeout` guard; watch for edge cases where custom tools do not propagate cancellation correctly or where strict timeout settings may prematurely cut legitimate long tasks under heavy concurrency.
Final score 81 · Confidence 94 · 1 evidence item
PraisonAI · InteractiveRuntime · _async_bridge · run_sync · tool_timeout
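The shared-bridge idea in the PraisonAI entry can be illustrated with a minimal Python sketch. It is not PraisonAI's actual `_async_bridge`; the class name `AsyncBridge`, the method signature, and the demo coroutines are hypothetical. It shows one background event loop serving all synchronous callers, with a timeout path that cancels the underlying task so its `finally` cleanup runs.

```python
import asyncio
import threading
from concurrent.futures import TimeoutError as FutureTimeout


class AsyncBridge:
    """Hypothetical shared bridge: one background event loop for all sync callers."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run_sync(self, coro, timeout: float):
        """Run a coroutine on the shared loop; cancel it if the deadline passes."""
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        try:
            return future.result(timeout=timeout)
        except FutureTimeout:
            # Cancelling the concurrent future also cancels the task on the loop.
            future.cancel()
            raise TimeoutError(f"tool call exceeded {timeout}s") from None

    def close(self) -> None:
        """Stop the shared loop and release its thread."""
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


cleaned_up = threading.Event()


async def slow_tool() -> str:
    """Demo tool that runs too long; its finally block stands in for cleanup."""
    try:
        await asyncio.sleep(10)
        return "done"
    finally:
        cleaned_up.set()


async def fast_tool() -> str:
    await asyncio.sleep(0)
    return "ok"
```

A caller would create one bridge per process and route every synchronous tool invocation through `run_sync`, rather than starting a new event loop per call. Tools that swallow `asyncio.CancelledError` would still defeat this cleanup, which matches the cancellation-propagation risk noted in the entry.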
pull request
May 21, 2026, 10:32 AM
Fix empty-ID OpenAI tool-call replays to prevent ghost calls
The PR fixes a replay-path correctness issue in Pi’s OpenAI-compatible provider layer by normalizing tool-call IDs during chat-completions/Responses replay: it merges argument-only streaming deltas into the active tool call, assigns deterministic non-empty IDs when IDs are missing or pipe-prefixed, and drops orphan tool outputs without a matching emitted assistant tool call.
Why It Matters: Operators of agent workflows using Pi with OpenAI-compatible APIs can avoid unexpected failures when replaying tool calls across turns, because empty or malformed replay IDs no longer produce requests that the API rejects. Concretely, the provider now rewrites missing or pipe-prefixed identifiers into stable non-empty IDs, merges orphaned argument-only deltas into the current call context, and filters unmatched orphan outputs before sending request payloads; this should reduce intermittent broken turns in function-calling systems. Watch for rare fallback-ID collisions and for over-filtering cases where a valid delayed output is dropped and the caller has to retry.
Final score 80 · Confidence 96 · 1 evidence item
OpenAI-compatible provider · tool-call replay · chat-completions · Responses · tool_call_id · call_id
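A rough Python sketch of the ID normalization described in the Pi entry follows. It is an illustration under assumptions, not Pi's code: the message shape (dicts with `role`, `tool_calls`, `tool_call_id`), the helper names, and the hash-based fallback ID scheme are all hypothetical, and the merging of argument-only streaming deltas is left out.

```python
import hashlib


def fallback_id(turn: int, index: int, name: str) -> str:
    """Derive a stable, non-empty ID from the call's position and tool name."""
    digest = hashlib.sha256(f"{turn}:{index}:{name}".encode()).hexdigest()[:12]
    return f"call_{digest}"


def normalize_replay(messages: list) -> list:
    """Rewrite empty or pipe-prefixed tool-call IDs and drop orphan tool outputs."""
    remap = {}       # original pipe-prefixed ID -> rewritten ID
    emitted = set()  # IDs of assistant tool calls kept in the output
    out = []
    for turn, msg in enumerate(messages):
        if msg["role"] == "assistant" and "tool_calls" in msg:
            calls = []
            for index, call in enumerate(msg["tool_calls"]):
                call_id = call.get("id") or ""
                if not call_id or call_id.startswith("|"):
                    new_id = fallback_id(turn, index, call["name"])
                    if call_id:
                        remap[call_id] = new_id
                    call_id = new_id
                emitted.add(call_id)
                calls.append({**call, "id": call_id})
            out.append({**msg, "tool_calls": calls})
        elif msg["role"] == "tool":
            original = msg.get("tool_call_id", "")
            call_id = remap.get(original, original)
            if call_id in emitted:
                out.append({**msg, "tool_call_id": call_id})
            # Otherwise the output has no matching assistant call and is dropped.
        else:
            out.append(msg)
    return out
```

Because the fallback ID depends only on position and tool name, replaying the same history produces the same IDs each time. The truncated hash also shows where the collision risk mentioned in the entry would come from.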
pull request
May 22, 2026, 6:24 AM
Preserve ThoughtSignature on ADK synthetic confirmation calls
This PR fixes a regression in ADK’s confirmation replay flow: synthetic `adk_request_confirmation` function-call parts were created without copying the original call’s `ThoughtSignature`, which caused Gemini thinking models to reject them with `400 INVALID_ARGUMENT`. The fix now ensures the replayed confirmation part inherits that signature.
Why It Matters: For applications using ADK tool-calling with Gemini thinking models, replayed confirmation calls now avoid the `function call adk_request_confirmation is missing a thought_signature` failure, so conversations are less likely to stop mid-flow and require operator retries. Technically, the change updates synthetic confirmation construction in `internal/llminternal/functions.go` to look up the source call by function-call ID and transfer the signature when available, which should be monitored after rollout for any remaining replay paths that might still drop signatures.
Final score 80 · Confidence 97 · 1 evidence item
generateRequestConfirmationEvent · adk_request_confirmation · ThoughtSignature · FunctionCall · Gemini thinking models · function-call replay
pull request
May 19, 2026, 8:55 AM
Fix dropped function-response events by restoring missing IDs in adk-go tool-call deserialization
Google adk-go PR #690 fixes a bug where `aiplatformToGenaiContent` failed to copy the `Id` field from `FunctionCall` and `FunctionResponse`, which caused function-response events to be silently dropped in multi-invocation tool-calling sessions that rely on non-empty IDs. A round-trip unit test was added to prevent the regression.
Why It Matters: Applications using adk-go tool calling will stop losing function responses during multi-invocation flows, so automated tool workflows can continue without silent drops or extra retry logic. The fix restores ID propagation from serialized protobuf message parts to match the expected ID-based event matching path; teams should watch whether any other tool-call protobuf fields still bypass ID mapping in similar conversion paths and confirm behavior after merge and rollout.
Final score 80 · Confidence 97 · 1 evidence item
aiplatformToGenaiContent · createAiplatformpbContent · FunctionCall · FunctionResponse · protobuf · Id field · tool calling · multi-invocation session · round-trip unit test
pull request
May 18, 2026, 8:22 PM
Rulesync enforces explicit Kilo subagent frontmatter validation
This pull request replaces the old alias-based handling of Kilo subagent frontmatter with a dedicated schema that explicitly lists supported Kilo fields and uses it for runtime parsing/validation, so invalid values are rejected at config load time instead of passing silently.
Why It Matters: Developers and operators using rulesync with Kilo subagents will now see immediate configuration errors when they specify bad agent-frontmatter values, which reduces silent misconfiguration and helps avoid broken agent runs later in deployment pipelines. The update introduces `KiloSubagentFrontmatterSchema` as the authoritative field set for `displayName`, `temperature`, `model`, and related Kilo options, and makes `KiloSubagent.validate()`/`fromRulesyncSubagent` enforce it with Zod at runtime while still allowing unknown future Kilo fields via loose passthrough. Watch for any existing subagent configs that relied on permissive parsing, and monitor CI or startup logs after rollout for newly surfaced validation failures.
Final score 80 · Confidence 95 · 1 evidence item
rulesync · KiloSubagent · KiloSubagentFrontmatterSchema · Zod · fromRulesyncSubagent
pull request
May 18, 2026, 6:11 AM
Fix Copilot CLI hook compatibility so PeonPing no longer drops key agent events
The PR replaces the Copilot integration path with a unified event-handling fix: Copilot hooks are written directly under `~/.copilot/hooks` when available, the Copilot adapters now use explicit per-event translation instead of implicit remaps, and incoming payloads are normalized from camelCase aliases to the expected snake_case fields so events like `permissionRequest` are detected instead of being silently ignored.
Why It Matters: Copilot CLI users of PeonPing now receive audible cues for completion, permission prompts, and failure/notification signals again, reducing the chance of missing critical workflow prompts during agent sessions; this also makes behavior more predictable for operators who rely on these hooks for task oversight. The change works by replacing brittle event remaps with explicit translation and a compatibility shim for upstream payload drift, while preserving existing handled event coverage. Watch for whether future Copilot CLI payload schema changes introduce new field-name variants beyond the 13 aliases and whether any unhandled event names appear in real traffic.
Final score 80 · Confidence 94 · 1 evidence item
PeonPing · GitHub Copilot CLI 1.0.48-1 · copilot hooks · adapters/copilot.sh · adapters/copilot.ps1 · install.sh · install.ps1 · peon.sh · peon.ps1
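The alias normalization in the PeonPing entry can be sketched as a small lookup table applied before event handling. The real adapters are shell and PowerShell scripts, and the entry does not list the 13 aliases, so the three alias pairs below are hypothetical placeholders; this Python sketch only illustrates the shape of the technique.

```python
# Hypothetical alias table: camelCase field name -> expected snake_case field name.
ALIASES = {
    "hookEventName": "hook_event_name",
    "sessionId": "session_id",
    "toolName": "tool_name",
}


def normalize_payload(payload: dict) -> dict:
    """Return a copy with known camelCase aliases renamed to snake_case.

    If both spellings are present, the existing snake_case value is kept.
    """
    normalized = dict(payload)
    for alias, canonical in ALIASES.items():
        if alias in normalized:
            value = normalized.pop(alias)
            normalized.setdefault(canonical, value)
    return normalized
```

An explicit table keeps unknown fields untouched, so a new upstream variant shows up as an unhandled key instead of being silently guessed at.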
pull request
May 21, 2026, 10:34 PM
Add persistent Cursor MAX mode across model selection and session lifecycle
Adds end-to-end support for Cursor MAX mode by introducing a `:max` model selector flag and threading that state through model parsing/formatting, agent session state, and task-subagent startup so supported models can reliably run in 1M-context mode.
Why It Matters: Cursor users and operators can now keep expensive long-context runs in the intended 1M-context mode across model selection, resumes, and subagent creation, instead of being silently pulled back to a lower context setting mid-workflow. The implementation wires `:max` through selector parsing/formatting and session lifecycle APIs (`setCursorMaxMode`, `getCursorMaxMode`) while propagating policy during startup and subagent initialization, so behavior is consistent; track parser regressions with non-Cursor models, capability detection mistakes that could enable/disable MAX incorrectly, and any gaps in MAX inheritance when sessions are rehydrated.
Final score 80 · Confidence 93 · 1 evidence item
Cursor MAX mode · `:max` selector flag · SelectorFlags · AgentSession · task subagent session inheritance
pull request
May 19, 2026, 9:39 AM
Stop tool-call batch execution on abort in agent run loop
This change fixes tool-call handling so that when `ctx.abort()` is triggered, the agent execution loop checks `signal?.aborted` during prep/execution, exits early, returns aborted tool results, and halts the run after the current turn while still finalizing through `afterToolCall`.
Why It Matters: When a tool run is aborted, users and operators are less likely to see a workflow continue after cancellation, which reduces confusing partial behavior across a turn and improves operational correctness of tool-driven sessions. Concretely, aborted calls now short-circuit instead of letting other queued calls proceed silently, and cleanup still runs via `afterToolCall`; teams should watch multi-tool-call batches where non-aborted sibling calls remain represented in session export for potential UI or downstream trace interpretation issues.
Final score 79 · Confidence 96 · 1 evidence item
ctx.abort · signal?.aborted · agent tool-call loop · afterToolCall · tool-call session turn
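The abort handling described in this entry can be sketched in Python, with a `threading.Event` standing in for the JavaScript abort signal. The function name `run_tool_batch`, the result shape, and the callback parameter are assumptions made for illustration. The sketch checks the signal before each call, records an aborted result instead of executing, and still runs the finalizer for every call.

```python
import threading


def run_tool_batch(calls, signal: threading.Event, after_tool_call):
    """Run (name, fn) pairs in order, short-circuiting once the signal is set."""
    results = []
    for name, fn in calls:
        if signal.is_set():
            # Abort requested: do not execute, but still report a result.
            result = {"tool": name, "status": "aborted"}
        else:
            result = {"tool": name, "status": "ok", "output": fn()}
        after_tool_call(result)  # finalization runs for aborted calls too
        results.append(result)
    return results
```

Calls that completed before the abort keep their normal results, which is the mixed-batch situation the entry suggests watching in session exports.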
issue
May 22, 2026, 9:27 AM
cc-connect opencode tool calls return blank outputs on Feishu under Windows/WSL
An open issue in cc-connect v1.2.1 reports that opencode mode can start sessions and complete turns but returns empty output for tool calls such as `whoami` and file reads, which means the assistant’s core command/result flow is no longer observable for end users, a critical functional regression for operators relying on tool execution.
Why It Matters: Operators using cc-connect as a Lark/Feishu assistant on Windows/WSL cannot get command output or file-read results, so automated or interactive tool-driven workflows become non-functional even though tasks appear to complete. The logs repeatedly show session start/complete events without visible response content, so teams should watch whether response frames are dropped after execution, especially in the stream/preview branch (`hasHandle=false`, `degraded=true`), and validate fixes in both command and read-file paths in upcoming versions.
Final score 79 · Confidence 87 · 1 evidence item
cc-connect · opencode · Feishu · Windows 11 · WSL · tool calling · v1.2.1
pull request
May 20, 2026, 10:55 AM
Accept string and numeric u32 values in Claude ambient tool inputs
The PR fixes a parser failure in Claude ambient tool calling where fields expected as u32 were sent as strings, causing ambient cycles to fail at deserialization and stop working.
Why It Matters: Users and operators relying on Claude ambient tool calls can now keep ambient workflows running, because calls that send numeric fields as strings, such as `"0"`, no longer fail parsing and abort the whole ambient cycle, which reduces complete ambient-mode outages after tool invocations. The technical fix is a dual-acceptance parser path for targeted u32 fields, applied only to a small set of schedule-related inputs. Next, watch whether similar stringified numeric fields appear in other tool schemas and track ambient-cycle failure/retry metrics after deployment to catch any remaining type-contract gaps.
Final score 79 · Confidence 97 · 1 evidence item
Claude tool calling · ambient mode · Serde · u32 · string-or-u32 deserialization · EndCycleInput · ScheduleInput · ScheduleToolInput · NextScheduleInput
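The dual-acceptance parsing described in this entry is implemented with Serde in the original project. The Python sketch below is only an analogue of the idea: accept either an integer or a decimal string, then enforce the u32 range. The function name `parse_u32` and the choice of exception types are assumptions.

```python
U32_MAX = 2**32 - 1


def parse_u32(value) -> int:
    """Accept an int or an ASCII decimal string and return it as a u32-range int."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise TypeError("expected integer or numeric string, got bool")
    if isinstance(value, int):
        number = value
    elif isinstance(value, str) and value.isascii() and value.isdigit():
        number = int(value)
    else:
        raise TypeError(f"expected integer or numeric string, got {value!r}")
    if not 0 <= number <= U32_MAX:
        raise ValueError(f"{number} is outside the u32 range")
    return number
```

Keeping the lenient path limited to specific fields, as the entry describes, avoids loosening the type contract for every numeric input.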
pull request
May 20, 2026, 10:27 AM
Canonicalize HERMES_ONLY_TOOLS filtering to prevent tool shadowing
This change adds a single authoritative `HermesToolFilter` implementation and integrates it into PraisonAI’s tool registry, CLI loading, and export flow so `HERMES_ONLY_TOOLS` now has deterministic behavior. It specifically addresses overlapping tool names across environments by defining clear visibility rules and diagnostics, with unknown or invalid tool lists handled consistently.
Why It Matters: Operators running PraisonAI across multiple environments should get more predictable tool behavior, because overlapping tool names are now filtered consistently and agents are less likely to execute the wrong tool in production workflows. Technically, PR #1700 introduces centralized whitelisting via `HERMES_ONLY_TOOLS` (all tools vs. explicit list), logs startup diagnostics for collisions, and applies dev-mode warning+strip plus CI-mode strict-fail behavior for unknown tool names. Watch next how existing deployments with legacy environment setups react to the stricter unknown-tool checks and whether onboarding docs and env files are updated before broad rollout.
Final score 79 · Confidence 94 · 1 evidence item
HERMES_ONLY_TOOLS · HermesToolFilter · praisonaiagents/hermes_filter.py · tool registry · CLI tool loading
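The allow-list behavior in this entry can be approximated with a short Python sketch. It is not the `HermesToolFilter` implementation: the function name, the comma-separated value format, and the rule that an unset or empty value means "all tools" are assumptions. It shows the two modes the entry names, warning and stripping unknown names in dev mode versus failing in strict mode.

```python
import logging
from typing import Optional

log = logging.getLogger("tool_filter")


def filter_tools(registered: dict, only_tools: Optional[str], strict: bool) -> dict:
    """Return the visible subset of registered tools for an allow-list string."""
    if not only_tools or not only_tools.strip():
        return dict(registered)  # assumption: unset or empty means all tools
    requested = [name.strip() for name in only_tools.split(",") if name.strip()]
    unknown = [name for name in requested if name not in registered]
    if unknown:
        if strict:
            # CI-style behavior: unknown names are a hard error.
            raise ValueError(f"unknown tools in allow-list: {unknown}")
        # Dev-style behavior: warn and strip the unknown names.
        log.warning("ignoring unknown tools: %s", unknown)
    return {name: registered[name] for name in requested if name in registered}
```

In practice the allow-list string would come from the `HERMES_ONLY_TOOLS` environment variable, and the strict flag would be derived from whether the process runs in CI.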
pull request
May 20, 2026, 11:30 PM
Gate tool-pair summarization to context windows of 64K or less
PR #9152 changes Goose so tool-pair summarization is skipped when context grows beyond 64K tokens, while leaving small-context behavior unchanged; users can still fully disable it with GOOSE_TOOL_PAIR_SUMMARIZATION=false.
Why It Matters: Operators of Goose in long sessions with many tool calls should see fewer harmful long-run behaviors because summarization is no longer applied unconditionally at high context lengths, reducing the chance of session quality drops. Technical follow-up: the cutoff is now fixed at 64K tokens, so teams should monitor whether this threshold is too strict for some workloads and verify the disable flag consistently applies in all deployment environments.
Final score 79 · Confidence 98 · 1 evidence item
tool-pair summarization · context window · 64K tokens · GOOSE_TOOL_PAIR_SUMMARIZATION
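The gating rule in the Goose entry reduces to two checks, sketched here in Python. Goose itself is not written in Python, and the source does not say whether "64K" means 64,000 or 65,536 tokens, so the `threshold` default below is a placeholder. The function name is hypothetical; only the `GOOSE_TOOL_PAIR_SUMMARIZATION` variable comes from the entry.

```python
import os


def should_summarize_tool_pairs(context_tokens: int, env=None, threshold: int = 64_000) -> bool:
    """Decide whether tool-pair summarization should run for this context size."""
    env = os.environ if env is None else env
    flag = env.get("GOOSE_TOOL_PAIR_SUMMARIZATION", "true").strip().lower()
    if flag == "false":
        return False  # explicit opt-out always wins
    # Placeholder cutoff: summarize only at or below the threshold.
    return context_tokens <= threshold
```

Passing the environment as a parameter makes the flag handling easy to verify in each deployment environment, which is one of the follow-ups the entry recommends.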

Topic Timeline

How the topic has changed over time

44 events

May 22, 2026, 10:06 AM
pull request
Fix dangling tool-call normalization for repeated tool IDs
The PR updates DanglingToolCallMiddleware so it keeps all ToolMessages sharing the same tool_call_id and processes them in occurrence order, preventing later matching tool outputs from being dropped during transcript normalization in summarized tool-calling sessions.
Contribution: Converted the middleware’s normalization logic from single-item lookup per tool_call_id to ordered multi-match handling, so repeated IDs across assistant turns no longer cause later ToolMessages to be discarded.
Impact: Tool-enabled chat applications can avoid silently losing a later tool result when the same tool_call_id appears in multiple turns, so developers and operators get stable multi-turn behavior instead of intermittent missing actions after summarization or compression steps. The middleware now tracks and consumes all matching ToolMessages by turn order, which should reduce hard-to-debug state corruption in tool flows, but it should be monitored for any edge cases with very long compressed histories where repeated IDs are heavily interleaved.
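The difference between single-item lookup and ordered multi-match handling can be shown with a small Python sketch. It does not reproduce DanglingToolCallMiddleware; the message shape and the function name `pair_tool_messages` are assumptions, and placeholder handling for calls that never received an output is omitted. Each tool_call_id maps to a queue of outputs, and each assistant call consumes the next one in occurrence order.

```python
from collections import defaultdict, deque


def pair_tool_messages(transcript: list) -> list:
    """Place each tool output after the assistant call that consumes it."""
    pending = defaultdict(deque)  # tool_call_id -> outputs in occurrence order
    for msg in transcript:
        if msg["role"] == "tool":
            pending[msg["tool_call_id"]].append(msg)

    out = []
    for msg in transcript:
        if msg["role"] == "tool":
            continue  # re-emitted below, next to its call
        out.append(msg)
        for call in msg.get("tool_calls", []):
            queue = pending[call["id"]]
            if queue:
                out.append(queue.popleft())
    return out
```

With a plain dictionary keyed by tool_call_id, the second output for a repeated ID would overwrite the first, which is the kind of loss the entry describes.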
May 22, 2026, 9:34 AM
issue
Add configurable HTTP Webhook Sidecar for Clawd approval decisions
An issue proposes a single primary change: add a pluggable HTTP Webhook Sidecar so Clawd remote permission approval is no longer fixed to Telegram. The new flow would let Clawd send permission requests to a user-defined endpoint and receive allow/deny callbacks to resolve tool execution.
Contribution: Defines a concrete integration capability: a pluggable approval client that reuses existing sidecar behavior while replacing the hardcoded Telegram transport with any custom HTTP endpoint. The proposed behavior posts structured permission payloads (including requestId/sessionId/tool info) and binds callback responses back into existing permission resolution.
Impact: Operators who want approval on their own local devices or self-hosted bots (e.g., ESP32 button panels, Home Assistant, Node-RED, private Slack/Discord/Feishu bridges) could route Clawd approval traffic through their chosen infrastructure instead of being bound to Telegram, enabling broader local or air-gapped control workflows. Technically, this means the approval transport is moved from a single vendor channel to configurable webhook targets with callbacks into resolvePermissionEntry, so follow-up monitoring should focus on callback security, endpoint authentication, requestId idempotency, and timeout/retry handling to avoid stuck approvals or mis-bound decisions.
May 22, 2026, 9:34 AM
commit burst
Bridge path-access approvals to the web dashboard
The update fixes a broken interactive path-approval flow by wiring `path_access` prompts end-to-end from the terminal side to the web UI, so the dashboard can now present and resolve the permission decision instead of leaving the session waiting.
Contribution: Implemented a concrete path-approval bridge by adding a `path` active-modal variant, propagating pending path requests through `App.tsx` and `DashboardContext`, and extending the modal resolution API to accept `kind: path` plus explicit outcomes (`run_once`, `always_allow`, `deny`) so web operators can actually complete the approval flow.
Impact: Web users and operators can now approve or deny file-access prompts from the web dashboard, so model runs that request sandbox-external paths no longer hang at a blank modal waiting state. The fix unifies path approval with the same TUI↔web modal handling used for other blocking gates, with explicit resolve actions routed through `/api/modal/resolve`, reducing stalled sessions and making permission outcomes visible during execution. Watch for reconnect/replay cases that could drop `pendingPath` state, and verify the new `always_allow`/`run_once` semantics never let high-risk path access be applied unexpectedly.
May 22, 2026, 9:27 AM
issue
cc-connect opencode tool calls return blank outputs on Feishu under Windows/WSL
An open issue in cc-connect v1.2.1 reports that opencode mode can start sessions and complete turns but returns empty output for tool calls such as `whoami` and file reads, which means the assistant’s core command/result flow is no longer observable for end users, a critical functional regression for operators relying on tool execution.
Contribution: Pinpoints a concrete behavior defect in the opencode tool-response path where session lifecycle logs are present but response payloads are not delivered, which is vital for debugging and prioritizing a fix because it blocks verified command-based workflows rather than minor UI or config noise.
Impact: Operators using cc-connect as a Lark/Feishu assistant on Windows/WSL cannot get command output or file-read results, so automated or interactive tool-driven workflows become non-functional even though tasks appear to complete. The logs repeatedly show session start/complete events without visible response content, so teams should watch whether response frames are dropped after execution, especially in the stream/preview branch (`hasHandle=false`, `degraded=true`), and validate fixes in both command and read-file paths in upcoming versions.
May 22, 2026, 9:21 AM
security fix
Hard-block tool calls for Title, Summarize, and Compaction agents
Kilo-Org/kilocode changed its system-agent permission setup so Title, Summarize, and Compaction agents now enforce a deny-all tool policy at the agent level, preventing top-level allow rules from re-enabling tool calls for these agents.
Contribution: Introduced an explicit hardcoded `deny: "*"` rule for these system agents, fixing a permission-precedence bug where outer allow rules could override the intended prohibition and allow prohibited tool invocation.
Impact: Operators of workflows that use KiloCode’s Title, Summarize, or Compaction agents will see fewer broken outputs and tool-related misbehavior, because these agents are now blocked from attempting tool calls they are not supposed to make. This reduces unexpected external interactions during routine title generation and summarization/compaction tasks and is worth watching for regressions where any of these agents still needs legitimate tool access, along with any side effects if future changes reintroduce permissive rule overrides.
May 22, 2026, 6:25 AM
pull request
Emit completed tool responses before confirmation requests
In google/adk-go, `runOneStep` now yields the merged tool function-response event before yielding `adk_request_confirmation`. This ensures a completed tool result is written to session history even when a consumer stops at the confirmation boundary.
Contribution: Reordered the event stream so completed tool responses are emitted first, preserving session-history correctness for human-in-the-loop tool flows.
Impact: Tool-call frameworks and operators using adk-go will no longer lose completed tool outputs when execution pauses for user approval, so downstream state, replay, and audit flows stay consistent instead of appearing to stop at a confirmation-only snapshot. Technically, the change emits the merged function-response event before the confirmation event; teams should watch custom consumers for implicit ordering assumptions and add/adjust tests around stop-at-confirmation behavior.
May 22, 2026, 6:24 AM
bug fix
Preserve ThoughtSignature on ADK synthetic confirmation calls
This PR fixes a regression in ADK’s confirmation replay flow: synthetic `adk_request_confirmation` function-call parts were created without copying the original call’s `ThoughtSignature`, which caused Gemini thinking models to reject them with `400 INVALID_ARGUMENT`. The fix now ensures the replayed confirmation part inherits that signature.
Contribution: Copies the original model `FunctionCall`’s `ThoughtSignature` into synthetic confirmation parts during request confirmation generation, closing a compatibility gap with replayed model-role function calls and preventing signature-related rejection on Gemini thinking models.
Impact: For applications using ADK tool-calling with Gemini thinking models, replayed confirmation calls now avoid the `function call adk_request_confirmation is missing a thought_signature` failure, so conversations are less likely to stop mid-flow and require operator retries. Technically, the change updates synthetic confirmation construction in `internal/llminternal/functions.go` to look up the source call by function-call ID and transfer the signature when available, which should be monitored after rollout for any remaining replay paths that might still drop signatures.
May 22, 2026, 12:42 AM
release
Hooks system added for pre/post tool call control
The v1.35.0 release introduces an extensible hooks framework around tool execution, including a PreToolUse denial hook, so tool invocations can be intercepted and controlled before and after running.
Contribution: Implemented a new hook layer for tool calls in Goose that registers pre- and post-execution handlers, with explicit denial support for pre-tool checks. This gives integrations a concrete extension point to validate, block, or augment tool calls instead of relying only on implicit runtime behavior.
Impact: Developers and operators can now stop prohibited tool actions before they execute, which reduces accidental or unsafe automation behavior and shifts policy enforcement closer to where actions happen. The new hook points also make it easier to add auditing and custom validation around tool usage; teams should watch for misconfigured hooks that could over-block legitimate calls or add extra latency at high tool-call volume.
May 21, 2026, 10:34 PM
pull request
Add persistent Cursor MAX mode across model selection and session lifecycle
Adds end-to-end support for Cursor MAX mode by introducing a `:max` model selector flag and threading that state through model parsing/formatting, agent session state, and task-subagent startup so supported models can reliably run in 1M-context mode.
Contribution: Implemented MAX-mode as a first-class model capability (`maxMode`) that is parsed from model selectors, preserved in session state, applied in provider request construction, and inherited by subagent sessions, replacing ad-hoc flag handling with a shared selector/session flow tied to Cursor capability policy.
Impact: Cursor users and operators can now keep expensive long-context runs in the intended 1M-context mode across model selection, resumes, and subagent creation, instead of being silently pulled back to a lower context setting mid-workflow. The implementation wires `:max` through selector parsing/formatting and session lifecycle APIs (`setCursorMaxMode`, `getCursorMaxMode`) while propagating policy during startup and subagent initialization, so behavior is consistent; track parser regressions with non-Cursor models, capability detection mistakes that could enable/disable MAX incorrectly, and any gaps in MAX inheritance when sessions are rehydrated.
May 21, 2026, 10:32 AM
pull request
Fix empty-ID OpenAI tool-call replays to prevent ghost calls
The PR fixes a replay-path correctness issue in Pi’s OpenAI-compatible provider layer by normalizing tool-call IDs during chat-completions/Responses replay: it merges argument-only streaming deltas into the active tool call, assigns deterministic non-empty IDs when IDs are missing or pipe-prefixed, and drops orphan tool outputs without a matching emitted assistant tool call.
Contribution: Implemented deterministic replay normalization for tool-call identifiers and call reconstruction, covering both chat-completions and Responses flows, and added regression tests for empty-id ghost calls and pipe-prefixed IDs.
Impact: Operators of agent workflows using Pi with OpenAI-compatible APIs can avoid unexpected failures when replaying tool calls across turns, because empty or malformed replay IDs no longer produce requests that the API rejects. Concretely, the provider now rewrites missing or pipe-prefixed identifiers into stable non-empty IDs, merges orphaned argument-only deltas into the current call context, and filters unmatched orphan outputs before sending request payloads; this should reduce intermittent broken turns in function-calling systems. Watch for rare fallback-ID collisions and for over-filtering cases where a valid delayed output is dropped and the caller has to retry.
May 21, 2026, 10:02 AM
pull request
PraisonAI enforces OpenAI-compatible BaseTool schemas at definition time
The PR strengthens PraisonAI tool-calling by validating BaseTool schemas against OpenAI expectations before they are used in chat/agent dispatch, adding schema round-trip and tool-list consistency checks so malformed schemas fail fast instead of propagating into runtime tool-call failures.
Contribution: Implemented concrete schema checks that gate tool registration/dispatch: BaseTool now performs OpenAI-compatibility validation, validates JSON schema round-tripping for serialization parity, enforces tool-list consistency (including safer handling of duplicates), and reports actionable remediation hints.
Impact: Developers integrating tools with PraisonAI’s chat/agent workflows will get earlier, clearer failures when tool schemas are malformed, so broken tool calls are more likely to be caught during development or startup instead of interrupting live conversations. The change applies OpenAI-alignment checks in BaseTool.validate and the @tool decorator path, and adds round-trip plus list consistency validators to avoid silent dispatch mismatches; this should reduce debugging time and production incidents from tool schema drift. Continue monitoring whether existing custom tools need schema adjustments for strict JSON compatibility or naming consistency after adoption.
May 21, 2026, 9:23 AM
issue
Plugin-only install in context-mode breaks `ctx stats` tool access
`context-mode` users report that in the latest release, installing via plugin-only in OpenCode (without MCP server) no longer supports `ctx stats`, and the command fails with an unavailable-tool error instead of returning context statistics.
ContributionThe issue identifies a concrete regression in runtime behavior: plugin-only installation stops exposing the `context-mode_ctx_stats` tool, so OpenCode cannot execute `ctx stats` even though installation and upgrade steps succeed.
ImpactDevelopers and operators using `context-mode` through the plugin-only path cannot run `ctx stats`, so they lose a key diagnostics capability during sessions and may have to re-enable MCP server integration or switch install modes to avoid command failures. The error shows the platform no longer exposes `context-mode_ctx_stats` after plugin-only setup, while expected debug/validation artifacts like `ctx-debug.sh` are absent; monitor whether future releases restore plugin-only tool registration and whether Windows/PowerShell environments are especially affected.
May 21, 2026, 8:10 AM
dependency update
Upgrade anthropic-sdk-go to v1.44.1 with runner tool-call ownership fix
This Renovate PR updates `github.com/anthropics/anthropic-sdk-go` from v1.26.0 to v1.44.1, with the primary actionable change being the runner bug fix where unowned tool calls are now skipped.
ContributionUpgrades the project’s Anthropic Go SDK dependency to v1.44.1 and incorporates the fixed runner behavior that ignores tool calls not owned by `SessionToolRunner`, which changes tool execution semantics to prevent incorrect ownership-based dispatch during tool workflows.
ImpactDevelopers and services using this project’s Anthropic tool-calling path can avoid some incorrect tool-invocation handling after the SDK bump, reducing malformed tool call flows and related request instability. The updated `SessionToolRunner` ownership guard in v1.44.1 is intended to reduce cross-owner tool-call mishandling, but the large version jump (1.26→1.44) still needs end-to-end validation for any subtle API or behavioral compatibility regressions in existing integrations.
May 20, 2026, 11:30 PM
pull request
Gate tool-pair summarization to context windows of 64K or less
PR #9152 changes Goose so tool-pair summarization is skipped when context grows beyond 64K tokens, while leaving small-context behavior unchanged; users can still fully disable it with GOOSE_TOOL_PAIR_SUMMARIZATION=false.
ContributionAdded a context-length guard around the tool-pair summarization path so it runs only within 64K-context sessions, preventing the known long-session degradation while preserving short-context behavior and an explicit opt-out flag.
ImpactOperators running Goose in long sessions with many tool calls should see less quality degradation, because summarization is no longer applied unconditionally at high context lengths. Technical follow-up: the cutoff is fixed at 64K tokens, so teams should monitor whether this threshold is too strict for some workloads and verify that the disable flag applies consistently in all deployment environments.
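The gate can be pictured as a single predicate. The sketch below is a Python illustration rather than Goose's implementation: the function name, the choice of 64,000 as the token value for "64K", and the exact parsing of the flag are assumptions; only the `GOOSE_TOOL_PAIR_SUMMARIZATION=false` opt-out comes from the entry above.

```python
import os

SUMMARIZATION_MAX_CONTEXT = 64_000  # assumed token value for "64K"


def should_summarize_tool_pairs(context_tokens: int, env=os.environ) -> bool:
    """Decide whether tool-pair summarization should run for this session."""
    # The explicit opt-out takes precedence over the size gate.
    flag = env.get("GOOSE_TOOL_PAIR_SUMMARIZATION", "true")
    if flag.strip().lower() == "false":
        return False
    return context_tokens <= SUMMARIZATION_MAX_CONTEXT
```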
May 20, 2026, 7:07 PM
pull request
Fix ADK console tool confirmations to return actionable yes/no results
Added dedicated console-side handling for `toolconfirmation.FunctionCallName` so operator confirmation prompts now return a `{"confirmed": bool}` payload instead of the generic `{"result": <text>}` fallback that `ctx.ToolConfirmation()` could not consume.
ContributionImplemented a targeted fix for tool-confirmation interactions: the console now renders a dedicated confirmation prompt for tool-call interrupts and parses user responses into a boolean `confirmed` field, enabling `ctx.ToolConfirmation()` to correctly apply operator approval or rejection.
ImpactOperators using the ADK console can now actually confirm or reject sensitive tool actions (for example, deleting records), so approval-driven workflows resume correctly instead of appearing blocked by a confirmation response that was never usable. This matters most for teams running interactive HITL flows because it restores a reliable manual control path; continue to watch for regressions in other clients or scripts that still emit or expect the old `{"result": <text>}` response format.
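A minimal sketch of the response mapping, written in Python for illustration only. The accepted answer strings and the `parse_confirmation` name are assumptions; the `{"confirmed": bool}` payload shape is the one named above.

```python
from typing import Optional

YES_ANSWERS = {"y", "yes"}
NO_ANSWERS = {"n", "no"}


def parse_confirmation(answer: str) -> Optional[dict]:
    """Map console input to {"confirmed": bool}; None means the prompt should repeat."""
    normalized = answer.strip().lower()
    if normalized in YES_ANSWERS:
        return {"confirmed": True}
    if normalized in NO_ANSWERS:
        return {"confirmed": False}
    return None
```

Returning `None` for unrecognized input keeps an ambiguous answer from being treated as either approval or rejection.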
May 20, 2026, 6:41 PM
commit burst
LibreChat hardens data-retention cleanup for temporary chats and related artifacts
The burst’s main change is a unified retention fix: temporary/all retention policies are now applied consistently across conversations plus linked artifacts (files, tool calls, and shared links), with added expiry metadata and sweep logic to prevent stale data from being kept indefinitely.
ContributionImplemented end-to-end retention-semantics hardening by extending schema and cleanup logic for conversation-attached artifacts, propagating temporary-flag context from client paths into server checks, and correcting sweep/read behavior for all-retention and temporary modes.
ImpactAdmins and operators managing LibreChat chats can expect temporary or all-retention policies to remove session artifacts more reliably, so fewer private files/tool calls persist beyond intended periods. The change adds retention-aware paths for files, tool calls, and shared links with expiredAt/TTL handling and deterministic sweep behavior, so teams should monitor legacy-record cleanup, sweep overlap failures, and cleanup latency as data volume grows.
May 20, 2026, 6:07 PM
bug fix
Fix empty tool-call payloads to prevent Bedrock 400s
The commit burst resolves a production-breaking tool-calling path by changing how `/btw` and IRC ephemeral turns encode tools for LiteLLM→Bedrock. The fix makes `context.tools === undefined` the only case for the `hasToolHistory` sentinel, switches checks to length-based logic so explicit `context.tools = []` is treated as an intentional no-tool request, and drops redundant `tool_choice: 'none'` when no tools are available, preventing the empty `toolConfig` payload that triggered Bedrock request rejections.
ContributionFixed a concrete tool-routing logic bug in AI provider request construction: distinguished “caller omitted tools” from “caller passed empty tools,” gated tool history handling accordingly, and removed `tool_choice` when no tools are resolved so an empty tool array no longer emits an invalid directive.
ImpactOperators and developers running /btw/IRC background tool-calling flows will see fewer failed completions, because sessions that previously hit a 400 rejection after earlier tool interactions now keep working with explicit no-tool turns. Watch whether any provider path still expects an empty tools field, and verify no regression in the Anthropic-proxy compatibility branch (`hasToolHistory`) while continuing to validate mixed tool-history scenarios.
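The distinction between "caller omitted tools" and "caller passed an empty list" can be sketched as below, with Python's `None` standing in for `undefined`. The function name and the `needs_tool_history_sentinel` key are hypothetical placeholders for whatever the real request builder does on that branch.

```python
from typing import Optional


def build_tool_params(tools: Optional[list], has_tool_history: bool) -> dict:
    """Build the tool-related part of a request body."""
    params = {}
    if tools is None:
        # Caller omitted tools: only this case may take the tool-history path.
        if has_tool_history:
            params["needs_tool_history_sentinel"] = True  # hypothetical marker
        return params
    if len(tools) > 0:
        params["tools"] = tools
    # tools == []: an intentional no-tool request, so neither `tools`
    # nor `tool_choice` is emitted.
    return params
```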
May 20, 2026, 3:29 PM
release
Pi CLI release/install path hardened with locked dependencies
v0.75.4 introduces supply-chain hardening for the Pi CLI install and release flow by adding a generated transitive dependency lockfile, enforcing checks for dependency pinning and lifecycle-script allowlists, and disabling lifecycle scripts during self-update/local release installs.
ContributionImplemented a concrete supply-chain safety path for CLI installs and updates: the release now ships a generated transitive dependency lockfile, blocks accidental lockfile drift, validates dependency pinning and allowed lifecycle scripts, and disables risky lifecycle hooks during self-update/local install flows with pre-release npm/Bun install smoke tests.
ImpactOperators and developers who install or auto-update the Pi CLI should see safer upgrades with fewer surprise dependency changes and reduced risk from unexpected install hooks, so rollouts are less likely to introduce build-time surprises or accidental package-level risk during deployment. The change is enforced through a release-time npm-shrinkwrap.json plus checks for pinned transitive dependencies and lifecycle allowlists, and by disabling lifecycle scripts in reinstall/update paths. Watch for false positives in environments using custom install hooks and monitor any regressions where legitimate packages are rejected by the new lockfile or allowlist rules.
May 20, 2026, 10:55 AM
pull request
Accept string and numeric u32 values in Claude ambient tool inputs
The PR fixes a parser failure in Claude ambient tool calling where fields expected as u32 were sent as strings, causing ambient cycles to fail at deserialization and stop working.
ContributionIntroduced custom deserializers that accept both string and u32 for numeric fields and applied them to EndCycleInput.memories_modified/compactions and wake_in_minutes in NextScheduleInput, ScheduleInput, and ScheduleToolInput, eliminating strict type-only parsing for these tool-call payloads.
ImpactUsers and operators relying on Claude ambient tool calls can now keep ambient workflows running, because calls that send numeric fields as strings such as `"0"` no longer fail deserialization and abort the whole ambient cycle, which reduces complete outages of ambient mode after tool invocations. The fix is a dual-acceptance parser path for targeted u32 fields, applied only to a small set of schedule-related inputs. Next, watch whether similar stringified numeric fields appear in other tool schemas, and track ambient-cycle failure/retry metrics after deployment to catch any remaining type-contract gaps.
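The dual-acceptance idea can be illustrated with a small Python helper. This is a sketch of the technique, not the PR's deserializer; `parse_u32` is a hypothetical name, and the rejection of booleans and signed strings is an assumption about what a strict u32 parser would want.

```python
U32_MAX = 2**32 - 1


def parse_u32(value) -> int:
    """Accept an int or a decimal string and return a value in u32 range."""
    if isinstance(value, bool):
        raise TypeError("booleans are not accepted as u32")
    if isinstance(value, str):
        text = value.strip()
        if not (text.isascii() and text.isdigit()):
            raise ValueError(f"not a decimal u32: {value!r}")
        value = int(text)
    if not isinstance(value, int):
        raise TypeError(f"expected int or str, got {type(value).__name__}")
    if not 0 <= value <= U32_MAX:
        raise ValueError(f"out of u32 range: {value}")
    return value
```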
May 20, 2026, 10:27 AM
pull request
Canonicalize HERMES_ONLY_TOOLS filtering to prevent tool shadowing
This change adds a single authoritative `HermesToolFilter` implementation and integrates it into PraisonAI’s tool registry, CLI loading, and export flow so `HERMES_ONLY_TOOLS` now has deterministic behavior. It specifically addresses overlapping tool names across environments by defining clear visibility rules and diagnostics, with unknown or invalid tool lists handled consistently.
ContributionImplemented the canonical filter module for tool whitelisting in `praisonaiagents/hermes_filter.py`, defined concrete semantics for unset/empty/explicit values, added startup diagnostics, integrated filtering into tool registry and CLI loading paths, and added unit/integration test coverage and configuration docs.
ImpactOperators running PraisonAI across multiple environments should get more predictable tool behavior, because overlapping tool names are now filtered consistently and agents are less likely to execute the wrong tool in production workflows. Technically, PR #1700 introduces centralized whitelisting via `HERMES_ONLY_TOOLS` (all tools vs. explicit list), logs startup diagnostics for collisions, and applies dev-mode warning+strip plus CI-mode strict-fail behavior for unknown tool names. Watch next how existing deployments with legacy environment setups react to the stricter unknown-tool checks and whether onboarding docs and env files are updated before broad rollout.
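A rough sketch of an allowlist filter with the two unknown-name behaviors described above (warn and strip in dev, fail in CI). The function is hypothetical and is not the `HermesToolFilter` API; the comma-separated format and the treatment of an unset or empty value as "all tools" are assumptions.

```python
import warnings
from typing import Optional


def filter_tools(registered: list, only: Optional[str], strict: bool) -> list:
    """Apply an allowlist string to the registered tool names."""
    if only is None or not only.strip():
        return list(registered)  # assumed: unset or empty means "all tools"
    wanted = [name.strip() for name in only.split(",") if name.strip()]
    unknown = [name for name in wanted if name not in registered]
    if unknown:
        message = f"unknown tools in allowlist: {', '.join(unknown)}"
        if strict:
            raise ValueError(message)  # CI-style behavior: fail fast
        warnings.warn(message)  # dev-style behavior: warn, then strip
    return [name for name in registered if name in wanted]
```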
May 20, 2026, 9:51 AM
commit burst
Lifecycle checkpoint transitions now gate high-risk tool mutations
The runtime now enters an explicit checkpoint state when a checkpoint is reached, so existing mutation gating applies during approval pauses and stopping at a checkpoint now cancels the lifecycle cleanly instead of leaving execution state behind.
ContributionIntroduced a concrete lifecycle behavior change: checkpoint entry now forces the state machine into `checkpoint`, enabling the existing high-risk mutation guard to run during paused review/revise/continue flows, and ensures checkpoint stop paths terminate via `lifecycle.cancel()`.
ImpactOperators and developers using the engineering lifecycle get safer checkpoint pauses, because risky commands and edits are now blocked while a checkpoint is waiting for approval, and stopping a checkpointed flow exits cleanly instead of lingering in a half-executing state. This should reduce unintended side effects during human review gates, while the next thing to watch is whether any legitimate checkpoint workflows are blocked too aggressively or any non-`App.tsx` code paths still skip the new transition.
May 20, 2026, 9:34 AM
pull request
LibreChat starts MCP OAuth flow before connect when tokens are missing
This fix changes MCP connection behavior for servers explicitly marked `requiresOAuth: true`: LibreChat now launches OAuth before attempting to connect, preventing a false "connected" state and subsequent tool-call failures for providers that accept anonymous handshakes but reject unauthenticated execution. It closes a real correctness gap where users reached a confusing dead-end after clicking Connect.
ContributionIntroduces a proactive OAuth gate: in createConnection, when `serverConfig.requiresOAuth === true` and no stored tokens exist, LibreChat emits `oauthRequired` before `attemptToConnect()`, awaits `oauthHandled` to proceed or `oauthFailed` to return auth guidance, and restricts this path to explicit OAuth-required servers. This directly replaces reactive 401-triggered auth, removes misleading successful handshakes, and preserves existing OAuth handler flow.
ImpactTeams using MCP tools such as Google BigQuery in LibreChat now get the OAuth login step during connect, so they no longer see a green connection followed by immediate "OAuth authentication required" when the first tool runs; authenticated calls become either available immediately after consent or fail early with a clear auth URL. This should reduce support friction and broken workflows, while operator monitoring should focus on callback failures, token-store behavior, and any repeated redirect loops for misconfigured OAuth apps or server URLs.
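The ordering change (authenticate first, connect second) can be sketched as follows. This is an illustrative Python outline with injected callbacks, not LibreChat's event-based code; `run_oauth`, `attempt_to_connect`, and the returned dictionary shape are hypothetical.

```python
from typing import Callable, Optional


def create_connection(
    server_config: dict,
    has_stored_tokens: bool,
    run_oauth: Callable[[], Optional[str]],
    attempt_to_connect: Callable[[], str],
) -> dict:
    """run_oauth returns None on success, or an auth URL to show on failure."""
    # Gate only servers that explicitly declare the requirement.
    if server_config.get("requiresOAuth") is True and not has_stored_tokens:
        auth_url = run_oauth()
        if auth_url is not None:
            return {"status": "auth_required", "auth_url": auth_url}
    return {"status": attempt_to_connect()}
```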
May 20, 2026, 7:22 AM
dependency update
opencode 1.15.5 update fixes plugin `ask` tool-call completion
This update in `dyoshikawa/rulesync` bumps `anomalyco/opencode` from v1.14.50 to v1.15.5, mainly to incorporate the release fix where plugin `ask` calls complete correctly instead of leaving tool invocations unresolved.
ContributionIntroduces a concrete bug fix in plugin execution by correcting `ask`-based tool calls so they can complete and report results instead of stalling.
ImpactDevelopers using plugin-driven flows are less likely to see stalled tool steps, so automated agent runs and interactive tasks are more likely to finish without manual retries or operator intervention. In technical terms, this release changes plugin `ask` behavior toward a resolvable completion path (promise-style behavior), which should reduce hanging tool interactions; watch whether custom plugins assume the prior unresolved `ask` pattern and whether this behavioral change affects their integration contracts.
May 19, 2026, 7:58 PM
pull request
Strip DeepSeek reasoning fields when tool calls are present
This PR fixes a DeepSeek-only compatibility regression by auto-detecting models routed to `deepseek-reasoner` and removing `reasoning_effort`/`reasoning` from requests whenever `tools` are included, which prevents reasoner endpoints from rejecting tool-call traffic.
ContributionAdds a new `OpenAICompat` flag (`disableReasoningWhenToolsPresent`) and DeepSeek auto-detection (`*-flash`, `deepseek-r1`, `deepseek-reasoner` on `api.deepseek.com`) so build-time parameter assembly drops `reasoning_effort` and `reasoning` whenever `tools` are present, with tests covering detection and request-body behavior.
ImpactDevelopers using DeepSeek tool-calling workflows (for example subagents) can avoid immediate 400 failures on tool invocations, so assistant runs no longer stop at the first step for affected Flash/reasoner models. The change is implemented via compat-mode detection in `openai-completions-compat` and parameter stripping in `buildParams`; in the next cycle, watch for new DeepSeek model naming/routing changes and verify custom compat overrides do not unintentionally remove reasoning from non-targeted models.
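The parameter stripping amounts to a small conditional. The Python sketch below mirrors the behavior described above; the signature is an assumption and does not reproduce the project's `buildParams`.

```python
def build_params(body: dict, disable_reasoning_when_tools_present: bool) -> dict:
    """Return request parameters, dropping reasoning fields when tools are sent."""
    params = dict(body)  # shallow copy so the caller's dict is left untouched
    if disable_reasoning_when_tools_present and params.get("tools"):
        params.pop("reasoning_effort", None)
        params.pop("reasoning", None)
    return params
```

Because the check uses `params.get("tools")`, a missing or empty `tools` list leaves the reasoning fields in place.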
May 19, 2026, 3:17 PM
capability announcement
AWS publishes three implementation paths for Programmatic Tool Calling on Bedrock
The post’s core change is a concrete set of three implementation paths for Programmatic Tool Calling on Amazon Bedrock—self-hosted ECS sandbox, managed Bedrock AgentCore Code Interpreter, and an Anthropic SDK-compatible proxy—so teams can choose one integration route that fits their control, ops, and compatibility needs.
ContributionProvides a practical guide for adopting Bedrock tool-calling through three distinct architectures (self-hosted, managed, and SDK-compatible), reducing the need for teams to design separate custom orchestration from scratch.
ImpactDevelopers and operators can now add model tool-calling with less custom integration work because they can reuse existing Anthropic-style workflows or select a managed versus self-hosted deployment pattern based on governance and ops preferences, but teams should closely monitor security hardening in the ECS sandbox, proxy auth/routing correctness, and AgentCore service limits as they scale workloads across these paths.
May 19, 2026, 3:09 PM
pull request
Tighten Azure tool schema checks before sending tool calls
A single fix in the Azure/OpenAI adapter now enforces strict tool schema validation before chat-completion requests are sent and treats optional tool parameters as part of required schema validation, with a regression test covering run_skill_script optional-argument behavior.
ContributionThe change introduces stricter preflight validation in the Azure/OpenAI-compatible tool-calling path and updates schema requirements so optional tool arguments are explicitly represented during validation, reducing the chance of malformed tool calls being accepted and forwarded.
ImpactDevelopers building Azure-based agent tools with codecompanion.nvim will get more predictable tool-call behavior, because invalid or mismatched tool payloads are caught before they reach the chat-completion API, which helps avoid unexpected breakages in tool-driven workflows. The adapter now applies strict schema checks and includes optional arguments in the required-field validation path for Azure-compatible calls; operators should watch for false rejections of legitimate optional fields and for any increase in tool-call validation failures in downstream plugin telemetry after rollout.
May 19, 2026, 1:16 PM
pull request
PraisonAI tool wrapper adds shared async bridge with cancellation-safe timeouts
This PR fixes a core async-safety problem in PraisonAI’s tool-calling wrapper by replacing per-runtime event-loop execution with a shared `_async_bridge` and adding explicit cancellation-aware timeout handling so run-time tool calls can finish/abort without leaking resources.
ContributionImplemented a single async-safety track for the tool wrapper: all internal tools now execute through a shared async bridge, `run_sync` timeout paths now cancel pending work with cleanup, and a timeout wrapper now returns explicit timeout results at the boundary to avoid silent hangs and leaked processes.
ImpactUsers and operators invoking tools through PraisonAI should see fewer stalled jobs and background leftovers when calls time out, because long-running or failing tool runs now terminate with explicit timeout outcomes and cleaned-up execution state. Technically, this is achieved by unifying async execution around `_async_bridge`, adding cancellation and `finally` cleanup on `run_sync` timeout paths, and adding a boundary-level `tool_timeout` guard; watch for edge cases where custom tools do not propagate cancellation correctly or where strict timeout settings cut off legitimate long tasks under heavy concurrency.
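One common way to build a shared bridge of this kind in Python is a single background event loop plus `asyncio.run_coroutine_threadsafe`. The sketch below shows that pattern under stated assumptions: it is not PraisonAI's `_async_bridge`, and the result dictionary shape is hypothetical.

```python
import asyncio
import threading
from concurrent.futures import TimeoutError as FutureTimeout

# One background loop shared by every synchronous caller (the "bridge").
_loop = asyncio.new_event_loop()
threading.Thread(target=_loop.run_forever, daemon=True).start()


def run_sync(coro, timeout: float) -> dict:
    """Run a coroutine on the shared loop and wait at most `timeout` seconds."""
    future = asyncio.run_coroutine_threadsafe(coro, _loop)
    try:
        return {"ok": True, "result": future.result(timeout)}
    except FutureTimeout:
        # Cancelling the future cancels the task on the loop, so the
        # coroutine's own finally blocks get a chance to clean up.
        future.cancel()
        return {"ok": False, "error": f"tool timed out after {timeout}s"}
```

A tool that swallows `asyncio.CancelledError` would defeat the cancellation step, which matches the edge case flagged above.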
May 19, 2026, 12:23 PM
software release
Forge adds local-agent guardrails that raise multi-step task reliability
Forge was introduced as an open-source reliability layer for self-hosted LLM tool-calling, adding model-agnostic runtime guardrails (retry nudges, step enforcement, error recovery, and VRAM-aware context handling) so the same 8B model can complete multi-step agentic workflows much more consistently without changing model weights.
ContributionIntroduces a concrete runtime control layer around local tool-calling agents that manages execution failures and context growth during multi-step runs, producing reported large reliability gains for 8B models (from ~53% to ~99% in its evaluation setup) without requiring model fine-tuning.
ImpactDevelopers and operators using local 8B agents can expect far fewer failed tool-use runs in multi-step workflows, which can reduce repeated retries and make smaller on-device models viable for production-like agent tasks that previously needed fallback to heavier/remote setups. This matters because the gain is attributed to guardrail behavior (retry nudges, strict step sequencing, recovery logic, and VRAM-aware context controls), so teams should next monitor whether those controls generalize across different toolchains and whether recovery logic introduces unacceptable latency or behavior drift in long-running automated agents.
May 19, 2026, 10:10 AM
pull request
Serena may overwrite externally edited files due to stale open-file cache
May 19, 2026, 10:08 AM
pull request
Add Sandbox download_file API for binary retrieval
This PR adds a new `download_file` method to the sandbox abstraction and implements it in both `AioSandbox` and `LocalSandbox`, giving callers a unified way to read raw file bytes from sandboxed files. It also adds focused unit coverage for the async implementation’s normal, empty-file, lock, error, and chunked-read paths.
ContributionIntroduces a concrete capability gap fix: a new abstract sandbox API for downloading file contents is defined and implemented in two existing backends, enabling standardized binary output extraction instead of backend-specific workarounds.
ImpactDevelopers building multi-step workflows on Deer-flow can now download outputs (such as generated files) through a single sandbox API, reducing brittle ad-hoc extraction logic and making artifact handling easier to automate in downstream steps; teams should watch whether all sandbox backends consistently implement the new contract and whether error paths behave predictably for large or contested files. The new method is defined on `Sandbox` and implemented in `AioSandbox` and `LocalSandbox`, with tests added for empty files, locking, error propagation, and single-chunk behavior, so regressions in retrieval semantics should surface earlier during CI.
May 19, 2026, 9:57 AM
bug fix
Preserve Vertex AI function IDs during genai deserialization
Restores the round-trip behavior for Vertex AI session history by mapping `aiplatformpb.FunctionCall.Id` and `aiplatformpb.FunctionResponse.Id` to `genai.FunctionCall.ID` and `genai.FunctionResponse.ID` when reading events back into `genai.Content`, so tool-call and tool-response links are not dropped.
ContributionFixes a correctness issue where persisted Vertex AI session events lost function IDs on read-back, by explicitly preserving call/response IDs in the conversion path and adding `TestAiplatformToGenaiContentPreservesFunctionIDs` to guard this behavior.
ImpactAgents and tools that replay Vertex AI session history with ADK-go can keep tool calls tied to the correct responses across turns, reducing misrouted follow-up actions and fragile multi-turn tool workflows in production conversations. The change maps Vertex AI function IDs back into `genai` IDs during deserialization, so ID pairing now survives the write-read cycle; this should be watched for older or partially migrated sessions and for any other event conversion paths that might still skip the same ID fields.
May 19, 2026, 9:39 AM
pull request
Stop tool-call batch execution on abort in agent run loop
This change fixes tool-call handling so that when `ctx.abort()` is triggered, the agent execution loop checks `signal?.aborted` during prep/execution, exits early, returns aborted tool results, and halts the run after the current turn while still finalizing through `afterToolCall`.
ContributionAdds an explicit abort guard in the tool-call execution path that breaks processing as soon as abort state is observed, returns explicit aborted tool outputs, and preserves post-tool-call finalization behavior.
ImpactWhen a tool run is aborted, users and operators are less likely to see a workflow continue after cancellation, which reduces confusing partial behavior across a turn and improves operational correctness of tool-driven sessions. Concretely, aborted calls now short-circuit instead of letting other queued calls proceed silently, and cleanup still runs via `afterToolCall`; teams should watch multi-tool-call batches where non-aborted sibling calls remain represented in session export for potential UI or downstream trace interpretation issues.
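The abort guard can be pictured as a check at the top of each iteration of the batch loop. In this Python sketch a `threading.Event` stands in for the abort signal; the function name and result shape are hypothetical, and running `after_tool_call` for aborted calls as well is an assumption about what "still finalizing" means.

```python
import threading


def run_tool_batch(calls, execute, after_tool_call, aborted: threading.Event):
    """Execute tool calls in order, short-circuiting once an abort is observed."""
    results = []
    for call in calls:
        if aborted.is_set():
            # Do not execute; report an explicit aborted result instead.
            result = {"id": call["id"], "status": "aborted"}
        else:
            result = execute(call)
        after_tool_call(call, result)  # finalization still runs for every call
        results.append(result)
    return results
```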
May 19, 2026, 9:27 AM
pull request
Fix POSIX ACP CLI detection to avoid false-missing results after timeout
The PR replaces a brittle POSIX batch-only CLI availability check with a two-step strategy: it raises the batch `command -v` timeout to 8000ms, and on batch timeout it runs parallel per-CLI probes (each with 3000ms). This prevents a single slow PATH entry from forcing all built-in ACP CLIs to be reported as missing.
ContributionUpdated `AcpDetector.batchCheckCliAvailability` in `src/process/agent/acp/AcpDetector.ts` by introducing `POSIX_BATCH_TIMEOUT_MS = 8000`, adding `POSIX_PER_CLI_TIMEOUT_MS = 3000`, and adding fallback logic that runs isolated per-CLI `command -v` checks when the batch probe times out.
ImpactUsers and operators running AionUi in WSL, Docker, or Linux hosts with slow mounted PATHs will no longer see all ACP CLIs disappear as unavailable at startup, so installed tools remain selectable and usable instead of silently falling back to a degraded workflow. Technical mechanism: the detector still prefers a fast batch probe for normal environments, but now recovers from timeout cases by probing each CLI independently with bounded 3s checks; continue watching whether fallback frequency increases on very slow filesystems and whether per-CLI fallback meaningfully extends startup latency when many CLIs repeatedly time out.
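The two-step strategy can be sketched in Python with the probes injected as callables, so the control flow is visible without shelling out. The constant names follow the entry above; `check_cli_availability`, the probe signatures, and the use of `TimeoutError` to signal a timed-out probe are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

POSIX_BATCH_TIMEOUT_MS = 8000
POSIX_PER_CLI_TIMEOUT_MS = 3000


def check_cli_availability(clis, batch_probe, single_probe) -> dict:
    """Try one batch probe; on timeout, probe each CLI independently."""
    try:
        return batch_probe(clis, POSIX_BATCH_TIMEOUT_MS)
    except TimeoutError:
        pass  # fall through to isolated per-CLI probes

    def probe(cli):
        try:
            return single_probe(cli, POSIX_PER_CLI_TIMEOUT_MS)
        except TimeoutError:
            return False  # only this CLI is reported missing

    # Run the per-CLI probes in parallel so one slow lookup cannot block the rest.
    with ThreadPoolExecutor(max_workers=max(1, len(clis))) as pool:
        return dict(zip(clis, pool.map(probe, clis)))
```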
May 19, 2026, 9:18 AM
issue
Serena may overwrite externally edited files due to stale open-file cache
Issue #1013 identified a correctness bug where Serena reused a cached file buffer in `open_file()` without checking disk freshness, so `replace_content` could apply regex edits to stale content and silently overwrite newer external changes.
ContributionThe change request defines a concrete fix for stale-buffer safety: before returning a cached buffer, compare it with current on-disk state (mtime or content hash), invalidate cached buffers when changed, and force a fresh read for edit operations.
ImpactDevelopers using Serena alongside other tools (for example Claude Code Edit/Read and git workflows) can see real code changes disappear without warning, because an edit tool may report success while persisting stale content over newer file versions; teams should monitor multi-tool sessions for silent overwrite behavior and verify that future releases add explicit stale-buffer detection for all write paths. After the fix, operations like `replace_content` should only write against fresh file content, reducing risk of accidental data loss and making mixed-workflow editing safer.
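The freshness check proposed in the issue can be sketched as a cache that compares on-disk metadata before returning a buffer. This is an illustrative Python class, not Serena's code: `FileBufferCache` is a hypothetical name, and comparing file size in addition to mtime is an added assumption.

```python
import os


class FileBufferCache:
    """Cache file contents, re-reading whenever the file changed on disk."""

    def __init__(self):
        self._buffers = {}  # path -> (mtime_ns, size, content)

    def open_file(self, path: str) -> str:
        stat = os.stat(path)
        cached = self._buffers.get(path)
        # Reuse the buffer only if the on-disk metadata is unchanged.
        if cached and cached[0] == stat.st_mtime_ns and cached[1] == stat.st_size:
            return cached[2]
        with open(path, encoding="utf-8") as fh:
            content = fh.read()
        self._buffers[path] = (stat.st_mtime_ns, stat.st_size, content)
        return content
```

A content hash would catch same-size edits made within the filesystem's timestamp granularity, at the cost of reading the file on every access.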
May 19, 2026, 8:55 AM
pull request
Fix dropped function-response events by restoring missing IDs in adk-go tool-call deserialization
Google adk-go PR #690 fixes a bug where `aiplatformToGenaiContent` failed to copy the `Id` field from `FunctionCall` and `FunctionResponse`, which caused function-response events to be silently dropped in multi-invocation tool-calling sessions that rely on non-empty IDs. A round-trip unit test was added to prevent the regression.
ContributionAligned deserialization with existing serialization behavior by populating `Id` in `aiplatformToGenaiContent` for both `FunctionCall` and `FunctionResponse`, and added a round-trip test to lock in correct ID handling in future changes.
ImpactApplications using adk-go tool calling will stop losing function responses during multi-invocation flows, so automated tool workflows can continue without silent drops or extra retry logic. The fix restores ID propagation from serialized protobuf message parts to match the expected ID-based event matching path; teams should watch whether any other tool-call protobuf fields still bypass ID mapping in similar conversion paths and confirm behavior after merge and rollout.
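The bug class is easy to reproduce in miniature: a serializer writes an ID and the matching deserializer forgets to read it back. The Python sketch below is only an analogy for the Go conversion path; the `FunctionCall` dataclass and both helper functions are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionCall:
    name: str
    args: dict = field(default_factory=dict)
    id: str = ""


def to_stored(call: FunctionCall) -> dict:
    """Serialize a call for session storage."""
    return {"name": call.name, "args": call.args, "id": call.id}


def from_stored(raw: dict) -> FunctionCall:
    """Deserialize a stored call, copying the ID back."""
    # Omitting `id` here is the kind of mistake that breaks call/response pairing.
    return FunctionCall(name=raw["name"], args=raw.get("args", {}), id=raw.get("id", ""))
```

A round-trip assertion such as `from_stored(to_stored(call)) == call` is the sort of test that guards against this regression.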
May 19, 2026, 5:27 AM
pull request
Use supported `reasoning` interleaved field for Cerebras Zai-GLM
The PR changes how reasoning text is sent for Cerebras by adding `reasoning` as an allowed `interleaved.field` and making Zai-GLM models default to that field, replacing top-level `reasoning_content` that Cerebras rejects.
ContributionIntroduced API-compatibility fix for Cerebras by registering `reasoning` as a supported `interleaved.field` and switching default Zai-GLM behavior to send reasoning text through this supported path instead of `reasoning_content`.
ImpactCerebras users running Kilo-Org/kilocode with Zai-GLM no longer face request failures in reasoning workflows, so inference flows that include reasoning turns can continue without breaking API calls. This works by replacing the unsupported top-level `reasoning_content` payload with the supported inline `reasoning` interleaved field for Cerebras, so operators should watch whether any existing clients still emit `reasoning_content` and whether other provider profiles need equivalent field mappings to avoid new compatibility regressions.
May 19, 2026, 3:26 AM
documentation update
Align skill/plugin tutorials to the live schema validation workflow
The PR makes tutorial documentation match the current validator rules by replacing the old in-notebook self-checking guidance with canonical schema tooling references, updating key tutorial text from deprecated rules, and removing duplicate notebooks so contributors use one authoritative set of examples.
ContributionSingle primary change: documentation canonicalization for contributor workflows by redirecting the validation tutorial to the official validator + spec sources and synchronizing 10 tutorial notebooks to current schema wording (for example required fields, tools format, and compatibility naming), then deleting duplicated notebook copies to prevent stale references.
ImpactNew contributors creating Claude skills and plugins are less likely to be guided by obsolete instructions that cause immediate schema-validation failures, reducing onboarding friction and rework while they follow the tutorials. Concretely, the PR shifts teaching from a hardcoded local validator sketch to the real validation path and updates the most common wrong rules in the major tutorial set, so contributors can converge on acceptable skill definitions faster; continue to monitor whether code cells left untouched by the markdown pass, or links that still assume the removed notebooks exist, reintroduce outdated guidance as the schema evolves.
May 19, 2026, 3:02 AM
release
Add AWS credential_process profile support in oh-my-pi
v15.1.4 adds support for `credential_process` profile entries in oh-my-pi’s AWS credential resolver, so teams can authenticate AI workflows through command-based AWS credential providers without relying only on static env credentials.
ContributionImplemented profile-aware AWS credential resolution by extending the AI runtime credential resolver to execute and consume `credential_process` outputs, allowing dynamic/temporary credentials from profile-defined providers during tool and agent calls.
ImpactOperators using oh-my-pi with AWS-backed tool calls can keep using their existing enterprise credential provider setup, reducing auth setup breakage and startup failures when running AI tasks through profile-based credentials. This should lower manual credential wiring during deployments, but teams should monitor command output parsing errors, provider execution permission issues, and whether fallback to old static-resolution paths changes behavior in multi-profile environments.
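For context, a `credential_process` command prints a JSON document with `Version`, `AccessKeyId`, `SecretAccessKey`, and optionally `SessionToken` and `Expiration`. The sketch below shows one way to run such a command and parse its output in Python; it is not oh-my-pi's resolver, and both function names and the returned key names are hypothetical.

```python
import json
import shlex
import subprocess


def parse_credential_process_output(stdout: str) -> dict:
    """Validate and normalize the JSON printed by a credential_process command."""
    data = json.loads(stdout)
    if data.get("Version") != 1:
        raise ValueError("unsupported credential_process Version")
    for key in ("AccessKeyId", "SecretAccessKey"):
        if not data.get(key):
            raise ValueError(f"missing {key}")
    return {
        "access_key_id": data["AccessKeyId"],
        "secret_access_key": data["SecretAccessKey"],
        "session_token": data.get("SessionToken"),  # absent for long-term keys
        "expiration": data.get("Expiration"),
    }


def run_credential_process(command: str) -> dict:
    """Execute the profile's command and parse what it prints to stdout."""
    result = subprocess.run(
        shlex.split(command), capture_output=True, text=True, check=True
    )
    return parse_credential_process_output(result.stdout)
```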
May 19, 2026, 12:32 AM
pull request
Fix validator batch discovery for root-level SKILL.md in Anthropic-spec plugins
PR #744 adds a second SKILL discovery path in the validator so plugins using the Anthropic-spec layout (`<plugin_root>/SKILL.md` beside `.claude-plugin/plugin.json`) are no longer skipped in batch checks, and adds tests to cover both root-level and legacy nested layouts without regression.
Contribution: Introduced root-level Anthropic-spec skill-file discovery in `validate_skills_schema.py` by adding a second walk anchored at `.claude-plugin/plugin.json`, then merging its results with the legacy nested scan using absolute-path deduplication. This changes validation behavior from “silent miss” to “count and grade” while preserving legacy layout support.
Impact: Repository operators and plugin maintainers using batch validation will now see valid Anthropic-spec plugins counted correctly instead of reported as having zero skills, so plugins are less likely to be mistaken for missing or low-quality during quality checks. The additional root-level scan and `SKILL.md` deduplication for mixed-layout plugins should reduce false negatives but may change orphan-count and skill-count metrics in CI and monitoring, so watch dashboards and alert thresholds for expected count jumps and for duplicate-reporting regressions.
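The two-scan-plus-deduplication approach described above can be sketched as follows. This is an illustration, not the code in `validate_skills_schema.py`: the function name is hypothetical, and the legacy nested layout is assumed here to be `skills/<name>/SKILL.md`, which the PR summary does not spell out.

```python
from pathlib import Path
from typing import List


def discover_skill_files(repo_root: Path) -> List[Path]:
    """Collect SKILL.md files from both layouts, deduplicated by absolute path."""
    found = set()

    # Scan 1: legacy nested layout (ASSUMED to be .../skills/<name>/SKILL.md).
    for skill in repo_root.rglob("skills/*/SKILL.md"):
        found.add(skill.resolve())

    # Scan 2: root-level layout, anchored at .claude-plugin/plugin.json.
    # The plugin root is the directory that contains .claude-plugin.
    for manifest in repo_root.rglob(".claude-plugin/plugin.json"):
        candidate = manifest.parent.parent / "SKILL.md"
        if candidate.is_file():
            found.add(candidate.resolve())

    # Resolved absolute paths in a set mean a file seen by both scans counts once.
    return sorted(found)
```

Deduplicating on resolved absolute paths is what keeps a mixed-layout plugin from being counted twice, which is the metric shift the impact note suggests watching.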
May 18, 2026, 8:22 PM
pull request
Rulesync enforces explicit Kilo subagent frontmatter validation
This pull request replaces the old alias-based handling of Kilo subagent frontmatter with a dedicated schema that explicitly lists supported Kilo fields and uses it for runtime parsing/validation, so invalid values are rejected at config load time instead of passing silently.
Contribution: Added a dedicated runtime-validated Kilo frontmatter schema and wired `KiloSubagent` parsing and validation to it, replacing fallback validation via the parent schema so that supported Kilo settings are explicit and type-safe.
Impact: Developers and operators using rulesync with Kilo subagents will now see immediate configuration errors when they specify invalid frontmatter values, which reduces silent misconfiguration and helps avoid broken agent runs later in deployment pipelines. The update introduces `KiloSubagentFrontmatterSchema` as the authoritative field set for `displayName`, `temperature`, `model`, and related Kilo options, and makes `KiloSubagent.validate()` and `fromRulesyncSubagent` enforce it with Zod at runtime while still allowing unknown future Kilo fields via loose passthrough. Watch for existing subagent configs that relied on permissive parsing, and monitor CI or startup logs after rollout for newly surfaced validation failures.
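The idea of "explicit known fields, loose passthrough for everything else" can be illustrated with a small plain-Python analog. This is not the Zod schema from the PR: the field types and the choice to treat every known field as optional are assumptions made for this sketch, and the function name is hypothetical.

```python
from typing import Any, Dict

# Known fields and the types ASSUMED for this sketch; the real schema may differ.
KNOWN_FIELDS = {
    "displayName": ((str,), "string"),
    "temperature": ((int, float), "number"),
    "model": ((str,), "string"),
}


def validate_kilo_frontmatter(frontmatter: Dict[str, Any]) -> Dict[str, Any]:
    """Reject wrongly typed known fields; pass unknown fields through untouched."""
    errors = []
    for name, (types, label) in KNOWN_FIELDS.items():
        if name not in frontmatter:
            continue  # known fields are treated as optional in this sketch
        value = frontmatter[name]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, types):
            errors.append(f"{name}: expected {label}, got {type(value).__name__}")
    if errors:
        raise ValueError("invalid Kilo frontmatter: " + "; ".join(errors))
    # Loose passthrough: unknown keys are kept rather than stripped or rejected.
    return dict(frontmatter)
```

Under these assumptions, a config with `temperature: "hot"` fails at load time, while a config carrying an unrecognized future option still loads.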
May 18, 2026, 3:00 PM
pull request
Add configurable OpenRouter base URL for Claude-Mem
The PR adds a new setting, `CLAUDE_MEM_OPENROUTER_BASE_URL`, that allows OpenRouter requests to be routed to any OpenAI-compatible `/v1` endpoint, while keeping the current OpenRouter default endpoint behavior when the setting is empty.
Contribution: Introduced a configurable OpenRouter base URL override and wired it into validation and the settings UI, enabling deploy-time routing to custom OpenAI-compatible endpoints while preserving the existing default as the fallback.
Impact: Developers and operators using Claude-Mem can redirect OpenRouter calls to internal or custom OpenAI-compatible gateways with a single setting, so they can adopt enterprise, private-routing, or proxy-based setups without rebuilding or patching worker bundles. Keep watching for per-environment endpoint compatibility and authentication or path issues: custom `/v1` URLs may differ in auth headers, response formats, or rate-limit behavior, even though the default (`https://openrouter.ai/api/v1/chat/completions`) remains unchanged when the setting is unset.
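A minimal sketch of the override-with-default behavior described above. The setting name and default URL come from the entry; the function name, the validation rule, and the assumption that `/chat/completions` is appended to a custom `/v1` base are illustrative guesses rather than Claude-Mem's actual logic.

```python
from urllib.parse import urlparse

DEFAULT_OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"


def resolve_chat_completions_url(settings: dict) -> str:
    """Return the request URL, falling back to the default when the override is empty."""
    base = (settings.get("CLAUDE_MEM_OPENROUTER_BASE_URL") or "").strip()
    if not base:
        return DEFAULT_OPENROUTER_URL

    parsed = urlparse(base)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"CLAUDE_MEM_OPENROUTER_BASE_URL is not a valid http(s) URL: {base!r}")

    # ASSUMPTION: the override is a /v1-style base and the chat path is appended to it.
    return base.rstrip("/") + "/chat/completions"
```

Failing fast on a malformed override surfaces misconfiguration at startup instead of at the first request.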
May 18, 2026, 6:11 AM
pull request
Fix Copilot CLI hook compatibility so PeonPing no longer drops key agent events
The PR replaces the Copilot integration path with a unified event-handling fix: Copilot hooks are written directly under `~/.copilot/hooks` when available, the Copilot adapters now use explicit per-event translation instead of implicit remaps, and incoming payloads are normalized from camelCase aliases to the expected snake_case fields so events like `permissionRequest` are detected instead of being silently ignored.
Contribution: Implemented a compatibility and routing fix for Copilot CLI events: direct hook auto-wiring in the install and uninstall scripts, explicit event mapping in the adapters, and a camelCase payload fallback (13 aliases) in both the shell and PowerShell execution paths. This turns previously dropped events, especially `permissionRequest`, into events PeonPing can detect.
Impact: Copilot CLI users of PeonPing again receive audible cues for completion, permission prompts, and failure or notification signals, reducing the chance of missing critical prompts during agent sessions; behavior also becomes more predictable for operators who rely on these hooks for task oversight. The change replaces brittle event remaps with explicit translation and adds a compatibility shim for upstream payload drift, while preserving coverage of the events already handled. Watch for future Copilot CLI payload schema changes that introduce field-name variants beyond the 13 aliases, and for unhandled event names appearing in real traffic.
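The camelCase fallback can be pictured as an alias table applied before event detection. The sketch below is in Python rather than the shell and PowerShell used by the PR, and the three alias pairs are hypothetical placeholders; the actual 13 aliases are not listed in this entry.

```python
from typing import Any, Dict

# HYPOTHETICAL alias pairs for illustration; the real table has 13 entries.
FIELD_ALIASES = {
    "hookEventName": "hook_event_name",
    "toolName": "tool_name",
    "sessionId": "session_id",
}


def normalize_payload(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Copy camelCase aliases onto their snake_case fields without overwriting them."""
    normalized = dict(payload)
    for camel, snake in FIELD_ALIASES.items():
        # An explicit snake_case value wins; the alias only fills a gap.
        if camel in normalized and snake not in normalized:
            normalized[snake] = normalized[camel]
    return normalized
```

With this shape, a payload that only carries the camelCase field still exposes the snake_case field that downstream event matching expects, and field names outside the table remain visible for the monitoring the impact note recommends.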
May 18, 2026, 3:32 AM
pull request
Fix JSON escaping in Traycer phase mode tools config
A syntax error in `Traycer AI/phase_mode_tools.json` was fixed by escaping quotes in the `write_phases.description` entry around `cut-over`, which prevents tool-configuration parsing failures that could block usage of this repo’s tool definitions.
Contribution: Escaped the embedded quotes around `cut-over` so the tool definition file is valid JSON and loads in standard JSON parsers without manual repair.
Impact: Operators and developers loading this tool config will no longer hit configuration failures from a malformed JSON file, so model and tooling integrations can initialize reliably. Technically, the change fixes invalid quoting in `write_phases.description` that previously triggered a `JSONDecodeError` at line 295, so `json.load` validation and deployment flows succeed consistently. Continue watching for other prompt or template updates that reintroduce unescaped quotes, since a single malformed string can still break parser-based loading steps.
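The failure mode is easy to reproduce with Python's standard `json` module. The description text below is a made-up stand-in, not the actual string from `phase_mode_tools.json`.

```python
import json
from typing import Any, Optional, Tuple

# Stand-in strings for illustration; the real description text differs.
BROKEN = '{"description": "Plan the "cut-over" phase"}'
FIXED = r'{"description": "Plan the \"cut-over\" phase"}'


def try_parse(text: str) -> Tuple[Optional[Any], Optional[str]]:
    """Return (value, None) on success or (None, error message) on a decode failure."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, f"{exc.msg} (line {exc.lineno}, column {exc.colno})"


if __name__ == "__main__":
    print(try_parse(BROKEN))  # the unescaped inner quote ends the string early
    print(try_parse(FIXED))   # the escaped quotes parse as part of the string
```

Running a check like `try_parse` over every tool-definition file in CI is one low-cost way to catch a reintroduced unescaped quote before it reaches a loader.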
May 9, 2026, 1:19 AM
release
cc-connect Bridge now requires a token when enabled
In v1.3.3-beta.2, Bridge mode was hardened so cc-connect enforces token-based access, preventing Bridge calls from proceeding without credentials.
Contribution: Added a credential check for Bridge mode that blocks unauthenticated Bridge requests and requires a configured token before Bridge actions can be used.
Impact: Integrators and operators using Bridge are better protected against unauthorized access, because Bridge commands now fail unless a valid token is provided, which reduces the risk of accidental or malicious use of the Bridge endpoint. During rollout, focus on catching existing automation that sends Bridge requests without tokens, and on validating token rotation and secret storage so legitimate workflows do not stop unexpectedly.
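A token gate of the kind described above might look like the following sketch. It is not cc-connect's implementation: the function name and the choice to raise when no token is configured are assumptions, and the constant-time comparison is a general hardening practice rather than something the release notes confirm.

```python
import hmac
from typing import Optional


def authorize_bridge_request(configured_token: Optional[str], presented_token: Optional[str]) -> bool:
    """Allow a Bridge request only when a token is configured and the caller presents it."""
    if not configured_token:
        # ASSUMPTION: an enabled Bridge with no token is a configuration error, not open access.
        raise RuntimeError("Bridge is enabled but no token is configured")
    if not presented_token:
        return False
    # compare_digest avoids leaking how much of the token matched through timing.
    return hmac.compare_digest(configured_token.encode(), presented_token.encode())
```

Existing automation that never sent a token would get `False` here, which is the breakage the rollout guidance above suggests looking for.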

Evidence Trail

github_pull_request
bytedance/deer-flow PR #3143: fix(middleware): handle repeated tool call ids
Fix repeated tool_call_id handling by avoiding single-value map eviction and consuming matching ToolMessages in order.
Open Source
github_issue
Serena may overwrite externally edited files due to stale open-file cache
Issue #311 requests decoupling Clawd’s remote approval path from telegram-approval-sidecar.js by introducing a configurable HTTP endpoint flow: Clawd POSTs permission JSON to a user webhook and consumes a remote-decision callback (allow/deny) to proceed.
Open Source
github_commit_burst
esengine/DeepSeek-Reasonix commit burst: 10 commits in 7 days
Previously, when the model requested file access outside the sandbox, the TUI opened `PathConfirm` but the web dashboard stayed blank and the loop waited indefinitely; the new flow adds a web-resolvable `path` modal with `run_once`, `always_allow`, and `deny` actions.
Open Source
github_issue
Serena may overwrite externally edited files due to stale open-file cache
Tool Calling has source-backed evidence attached to the latest tracked change.
Open Source

Source Coverage

github pull request: 30 events · 30 evidence items; 2 days ago
github release: 4 events · 4 evidence items; 2 days ago
github issue: 4 events · 4 evidence items; 2 days ago
github commit burst: 4 events · 4 evidence items; 2 days ago
rss feed: 1 event · 1 evidence item; 4 days ago
hacker news feed: 1 event · 1 evidence item; 5 days ago
