Added an explicit `maxDuration = 800` to both streaming MCP route handlers (`apps/api/src/app/api/agent/[transport]/route.ts` and `apps/api/src/app/api/v2/agent/[transport]/route.ts`), replacing the implicit Vercel default timeout path for these endpoints so MCP sessions can run for about 13 minutes before function recycle.
Why It Matters: Operators and users of Cursor MCP sessions should stop seeing active AI coding sessions drop at around 5 minutes, so workflows that rely on long-lived streaming responses can continue without abrupt interruption. The change does this by overriding the default 300-second Vercel timeout with an explicit 800-second budget; after deployment, teams should monitor for residual timeout errors near the new ceiling and verify no regression if sessions legitimately exceed ~13 minutes.
Final score 85 | Confidence 99 | 1 evidence item | Vercel, maxDuration, /api/agent/mcp, /api/v2/agent/mcp, MCP, WebStandardStreamableHTTPServerTransport
This PR adds a 30,000ms client-side timeout to `hostServiceCall` in `mcp-v2`, using `AbortController` instead of allowing relay fetches to run until Vercel kills the lambda at 300s; on abort it now throws a clear timeout error, and the timer is always cleared in `finally` so repeated requests do not leak resources. It also adds `HostServiceCallOptions.timeoutMs` to allow override by callers.
Why It Matters: Operators and tool users calling MCP endpoints such as `/api/agent/mcp` and `/api/v2/agent/mcp` should see stalled calls fail in about 30 seconds instead of hanging for 300 seconds, which directly reduces lambda starvation and makes host/offline relay failures visible as actionable tool errors rather than silent timeout storms (previously 83,850+ errors in 24h, 312 in the last hour). The change affects `agents_run`, `agents_list`, `workspaces_create`, and `workspaces_delete`; it improves incident response by surfacing a consistent timeout path and preventing long-running blocked handlers. Watch next: whether the 30s default is too short for legitimate high-latency relay paths and whether clients need tuned `timeoutMs` settings to avoid unnecessary aborts.
Final score 82 | Confidence 97 | 1 evidence item | hostServiceCall, AbortController, HostServiceCallOptions.timeoutMs, MCP, Vercel lambda timeout
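The client-side deadline described in this entry can be sketched in Python. The PR itself uses `AbortController` in TypeScript; this is only an analogue built on `asyncio.wait_for`, and `call_with_timeout` and `HostServiceTimeout` are hypothetical names, not identifiers from the PR.

```python
import asyncio

DEFAULT_TIMEOUT_S = 30.0  # mirrors the 30,000 ms default described in the entry


class HostServiceTimeout(Exception):
    """Hypothetical error for a relay call that exceeded its deadline."""


async def call_with_timeout(make_call, timeout_s=DEFAULT_TIMEOUT_S):
    """Await make_call(), giving up after timeout_s seconds.

    asyncio.wait_for cancels the inner task when the deadline passes, which
    stands in for AbortController.abort() plus the timer cleanup in `finally`.
    """
    try:
        return await asyncio.wait_for(make_call(), timeout=timeout_s)
    except asyncio.TimeoutError as exc:
        # Surface a clear, actionable error instead of hanging until the
        # platform kills the function.
        raise HostServiceTimeout(
            f"host service call timed out after {timeout_s}s"
        ) from exc
```

A caller with a legitimately slow relay path would pass a larger `timeout_s`, which corresponds to the per-call override the entry describes.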
This change fixes a production-impacting retry failure in mirrord by disabling HTTP/2 connection pooling in `ClientStore`; reusing pooled `SendRequest` objects after an idle GOAWAY left a closed sender in the pool, so subsequent retries repeatedly failed with `channel closed`, and forcing a new connection per request removes that failure loop.
Why It Matters: Operators and developers running mirrord in gRPC/HTTP/2 environments with short idle timeouts (for example Quarkus via Vert.x in Kubernetes meshes) should see fewer intermittent `channel closed` failures and more successful retries, reducing flaky behavior during local traffic-steering and integration testing. The update works by preventing `get_with_pooling()` from returning stale `http2::SendRequest` instances after GOAWAY-triggered closure, so each request starts with a fresh channel. Watch for connection-establishment cost under high request volume and whether a future targeted fix (e.g., discarding only closed senders) is needed to regain pooling benefits without reintroducing stale-channel reuse bugs.
Final score 81 | Confidence 99 | 1 evidence item | mirrord, ClientStore, HTTP/2, gRPC, GOAWAY, SendRequest, connection pooling
The pull request fixes a race in gateway run cancellation where two concurrent `cancel_run` requests could both pass the initial existence check, causing the second call to return HTTP 409 even though the run had already moved to `interrupted`. The router now re-reads run state after a failed cancel attempt and returns HTTP 202 when the run is already interrupted or already removed, while preserving 409 only for genuinely non-cancelable states.
Why It Matters: Operators retrying cancel requests on the same run can avoid misleading 409 failures, so orchestration and cleanup jobs are less likely to abort due to race-related false negatives. The change removes an intermittent operational failure mode by treating already-interrupted or already-cleared runs as idempotent success, then still returning 409 for truly non-cancelable states such as completed runs. Follow-up should watch whether this re-fetch path causes any measurable overhead under very high cancel concurrency and whether any completed/store-only cases still surface as unexpected conflicts.
Final score 81 | Confidence 98 | 1 evidence item | cancel_run, gateway router, TOCTOU race, run status re-check, HTTP 202, HTTP 409
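The re-check after a failed cancel can be sketched as below. This is a minimal illustration, not the gateway's code: the `RunStatus` values, the dict-based `store`, and the `try_cancel` callable are assumptions, and the initial existence check mentioned in the entry is omitted.

```python
from enum import Enum


class RunStatus(Enum):
    RUNNING = "running"
    INTERRUPTED = "interrupted"
    COMPLETED = "completed"


def cancel_status_code(store, run_id, try_cancel):
    """Return the HTTP status for a cancel request.

    store      -- hypothetical mapping of run_id to RunStatus
    try_cancel -- callable returning True if this call performed the cancel
    """
    if try_cancel(run_id):
        return 202
    # The cancel attempt failed. A concurrent request may have won the race,
    # so re-read the state before deciding this is a conflict.
    status = store.get(run_id)
    if status is None or status is RunStatus.INTERRUPTED:
        return 202  # already removed or already interrupted: idempotent success
    return 409  # genuinely non-cancelable, for example a completed run
```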
A deserialization bug in google/adk-go PR #690 was fixed so `aiplatformToGenaiContent` now copies the `Id` field for `FunctionCall` and `FunctionResponse`, matching the existing serialization path in `createAiplatformpbContent`; this prevents function response events from being dropped in multi-invocation tool-call sessions that require non-empty IDs.
Why It Matters: Developers building multi-turn tool-calling agents with this SDK will no longer see function responses vanish during a session, which reduces failed or incomplete tool workflows and the manual recovery they require. Since matching logic in session handling depends on non-empty IDs, the parser now preserves those IDs consistently across serialization/deserialization; continue watching for other protobuf message fields that can desynchronize similarly, and expand regression coverage if multi-turn tool-call routing starts failing in new invocation paths.
Final score 81 | Confidence 98 | 1 evidence item | aiplatformToGenaiContent, createAiplatformpbContent, FunctionCall, FunctionResponse, Id, protobuf, tool calls
The PR fixes a deterministic CI breakage by correcting the vendoring sync config for the slack-channel plugin: test files were excluded while `bun test` was still kept, causing `No tests found!` failures. The change updates `sources.yaml` so upstream `features/**` test suites are included and runnable in the mirror.
Why It Matters: CI maintainers for MCP plugins now get real test validation for slack-channel instead of a repeatable false failure, so policy/journal/supervisor regressions are more likely to be detected before merge. The prior configuration kept `package.json`’s `bun test` script but removed the referenced tests, which produced a deterministic `No tests found!` exit and let the job short-circuit, hiding later plugin checks; continue monitoring whether other vendored plugins have similar sync-rule drift and whether CI still masks failures before a collect-all-failures mode is fully enforced.
Final score 81 | Confidence 95 | 1 evidence item | sources.yaml, slack-channel, bun test, test (mcp-plugins), sync-external.mjs, features/**
This PR changes herdr’s vendored libghostty-vt Zig build to compile with a baseline CPU target so the nested `zig build` no longer emits instructions from the build host that can crash on older x86_64 machines.
Why It Matters: Operators using this package on Linux are less likely to see `SIGILL` startup failures when deployments run on older x86_64 CPUs, so they can avoid per-host pinning or emergency rollbacks for apparently random crashes. This is implemented by forcing baseline codegen in the vendored Zig build path used by `herdr`, and teams should still watch for regressions when the vendored library or Zig compiler defaults change so host-feature leakage does not reappear.
Final score 81 | Confidence 97 | 1 evidence item | herdr, libghostty-vt, Zig, build.rs, x86_64, SIGILL, -Dcpu=baseline
DeepSeek-Reasonix now detects Azure-hosted endpoints (`azure.com` in `baseUrl`) and removes the proprietary `extra_body.thinking` field from outgoing requests for those endpoints while still sending `reasoning_effort`, preventing the `Unrecognized request argument supplied: extra_body` 400 error on DeepSeek v4 calls.
Why It Matters: Users and operators calling Azure-hosted DeepSeek v4 models through this client should stop seeing failed reasoning requests due to a 400 validation error, so inference calls are more reliable without manual retries or request workarounds. Technically, this is a provider-compatibility fix that aligns Azure payloads with supported fields by removing `extra_body.thinking` while retaining `reasoning_effort`; monitor for additional Azure endpoint URL patterns that could bypass hostname detection and reintroduce unsupported-field failures.
Final score 81 | Confidence 97 | 1 evidence item | DeepSeek-Reasonix, DeepSeek v4, Azure Foundry endpoint, extra_body, reasoning_effort, baseUrl
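A rough sketch of the payload adjustment described in this entry follows. It assumes the substring check on `baseUrl` stated above; the function name `prepare_payload`, the shape of the `thinking` value, and the choice to drop an emptied `extra_body` entirely are illustrative assumptions rather than details confirmed by the PR.

```python
import copy


def prepare_payload(base_url, payload):
    """Return a copy of payload adjusted for the target endpoint.

    For Azure-hosted endpoints the `thinking` entry under `extra_body` is
    removed, while `reasoning_effort` is left untouched.
    """
    out = copy.deepcopy(payload)  # never mutate the caller's dict
    if "azure.com" in base_url:
        extra = out.get("extra_body")
        if isinstance(extra, dict):
            extra.pop("thinking", None)
            if not extra:
                # Assumption: an empty extra_body is dropped so the
                # unsupported argument is not sent at all.
                del out["extra_body"]
    return out
```

As the entry notes, a substring test like this misses Azure endpoints served from other hostnames, so the detection rule is the part most likely to need revisiting.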
The PR replaces the old recursive `run_tasks_base` executor with a queue-based worker pipeline (`run_worker_pipeline`) and adds `Task(workers=...)`/`Task(timeout=...)` control using `FixedWorkers` and `AdaptiveWorkers`, while adding a cross-run registry so adaptive tuning can resume from prior targets instead of re-discovering concurrency on every run.
Why It Matters: Operators running large cognee `cognify` jobs should see faster and more predictable batch completion with clearer failure handling, because failed items can be reported individually instead of failing the whole run. The PR reports a runtime drop on a 200-file benchmark from 108.7s baseline to ~94s on warm, resumed runs, and adds per-task timeouts plus adaptive worker control to reduce stalled execution, but teams should watch for adaptive scale oscillation under changing throttling and for stale cross-run registry targets after process restarts or environment changes.
Final score 80 | Confidence 93 | 1 evidence item | run_tasks, run_worker_pipeline, WorkerStrategy, FixedWorkers, AdaptiveWorkers, cross-run convergence registry, _AdaptivePool
This PR introduces a new AI-DLC resiliency extension that adds a 15-rule reliability baseline mapped to 11 of 13 AWS Well-Architected Reliability Pillar questions, gates requirements capture via an opt-in workflow plus an RTO/RPO follow-up, and provides template-based validation so reliability controls are checked through generated CloudFormation artifacts.
Why It Matters: Teams building applications with AI-DLC can enable the resiliency extension and receive explicit reliability requirements and safeguards (for example RTO/RPO targets, DR strategy, observability defaults, and health checks) before deployment, which reduces the operational risk of discovering resilience gaps only after production exposure. The change also brings measurable gains in reviewed templates (9/15 rules compliant vs 3/15 without it), but it should be tracked for template-review accuracy, whether the added context (~4,370 tokens) affects larger workflows, and what gaps remain from the excluded REL 2 and REL 3 areas.
This change delivers notebooklm-py 0.5.0 as a cleanup-focused API update: deprecated v0.3-era APIs and shims are removed, remaining deprecations for `mime_type` and `poll_interval` are explicitly scheduled for v0.6.0, and version-gate/docs were updated to enforce the migration path.
Why It Matters: Applications and integration scripts that still depend on old notebooklm-py calls will start failing immediately after upgrading to 0.5.0 unless they migrate to the new APIs, so teams get an earlier and cleaner signal that their wrappers are no longer aligned with the supported interface. The PR also closes a planned cleanup pass by removing old compatibility baggage and documenting the v0.6.0 deadline for remaining deprecations, so operators should monitor migration failures around `client.notebooks.share()`, removed `Source/Artifact` fields, and any remaining `poll_interval` usage while validating that error-paths behave normally with only the preserved RPC error aliases.
Final score 80 | Confidence 95 | 1 evidence item | notebooklm-py, 0.5.0 release, API deprecation, poll_interval, initial_interval, RPCError.rpc_id, RPCError.code
This PR introduces the `scripts/aidlc-codereviewer` package and `aidlc-code-reviewer` CLI, replacing manual, disconnected review steps with a single workflow that runs configured static analyzers plus AI critique and generates linked summary, technical, and business-logic reports for AIDLC code.
Why It Matters: Developers reviewing AIDLC-generated code can now run one command and get both technical and business-logic review outputs in one pass, which should reduce missed defects and manual review overhead before code moves further in the pipeline. The implementation ties built-in analyzers (bandit, ruff, mypy, radon, vulture) to Bedrock-powered agents and emits linked HTML/Markdown artifacts, but teams should monitor Bedrock permission/setup stability, auto-wrapper generation reliability for newly configured tools, and whether severity filtering continues to prevent non-security tools from surfacing HIGH/CRITICAL findings.
InsForge introduced per-IP, category-based write-rate limiting on mutating API routes and applied 429-aware backoff handling to Deno/Vercel provider write calls, so write traffic is intentionally paced before provider quotas are exhausted and quota breaches are surfaced as controlled rate-limit responses.
Why It Matters: Developers and operators can now see a clear, predictable 429 signal when write traffic is too aggressive, which reduces hidden cascading failures during deploy/update bursts and makes retry behavior safer for downstream clients. Concretely, write-heavy flows to Vercel/Deno are throttled and retried using provider headers (`Retry-After`, `X-RateLimit-Reset`) with bounded delay, so quota pressure is managed instead of producing abrupt 500-level outcomes; this should continue to be monitored for side effects on heavy automation workflows and for false-positive throttling on legitimate high-throughput pipelines.
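The header-driven, bounded delay mentioned in this entry might look roughly like the sketch below. Several details are assumptions rather than facts from the PR: `Retry-After` is treated as a number of seconds, `X-RateLimit-Reset` as a Unix timestamp in seconds, the cap is 60 seconds, and the exponential fallback is used when neither header parses. `backoff_delay` is a hypothetical name.

```python
import time

MAX_DELAY_S = 60.0  # hypothetical upper bound on any single wait


def backoff_delay(headers, attempt, now=None, max_delay=MAX_DELAY_S):
    """Compute how long to wait before retrying a 429 response."""
    now = time.time() if now is None else now
    delay = None

    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            delay = float(retry_after)  # assumed: delta-seconds form only
        except ValueError:
            delay = None

    if delay is None:
        reset = headers.get("X-RateLimit-Reset")
        if reset is not None:
            try:
                delay = float(reset) - now  # assumed: epoch seconds
            except ValueError:
                delay = None

    if delay is None:
        delay = 2.0 ** attempt  # fallback when the provider gives no hint

    # Clamp to [0, max_delay] so a bad header cannot stall the caller.
    return min(max(delay, 0.0), max_delay)
```

The lookup is case-sensitive because `headers` is a plain dict here; a real HTTP client's header object would normally handle casing.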
mirrord now rejects duplicate entries in incoming `port_mapping` and `listen_ports` before config processing continues, returning a clear duplicate-port error instead of accepting the config and silently dropping one mapping.
Why It Matters: Operators and developers configuring mirrord now get immediate, explicit feedback when a port is duplicated in `port_mapping` or `listen_ports`, which helps prevent hidden forwarding misconfigurations and missing routes during debugging or development sessions. The previous behavior depended on `BiMap` semantics that kept only one of the duplicates, so one intended mapping vanished without warning; now the failure is deterministic and visible. Next, watch for whether any valid workflows rely on duplicate-like definitions and ensure CI/error surfaces continue to block only truly conflicting configs.
Final score 80 | Confidence 97 | 1 evidence item | mirrord-config, port_mapping, listen_ports, BiMap, duplicate-port validation
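The duplicate check described in this entry can be illustrated in Python (mirrord itself is written in Rust, so this is only a sketch of the idea). It assumes a mapping is a list of port pairs in which neither side may repeat, which is what a bidirectional map requires; `validate_port_mapping` and `DuplicatePortError` are hypothetical names.

```python
class DuplicatePortError(ValueError):
    """Hypothetical error raised when a port appears twice in a mapping."""


def validate_port_mapping(pairs):
    """Reject a list of (left, right) port pairs if either side repeats.

    A bidirectional map keeps one entry per key on both sides, so a repeat
    would otherwise overwrite an earlier pair without any warning.
    """
    seen_left, seen_right = set(), set()
    for left, right in pairs:
        if left in seen_left:
            raise DuplicatePortError(f"duplicate port {left} on the left side")
        if right in seen_right:
            raise DuplicatePortError(f"duplicate port {right} on the right side")
        seen_left.add(left)
        seen_right.add(right)
    return dict(pairs)
```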
This change fixes a compatibility gap in DeepSeek tool-calling requests by adding auto-detection for `api.deepseek.com` models that route to `deepseek-reasoner` (such as `deepseek-v4-flash` and `deepseek-r1`-style IDs), then enabling a compat flag that removes `reasoning`/`reasoning_effort` whenever `tools` are present in the request, which prevents the reproduced 400 rejection path while keeping `deepseek-v4-pro` behavior unchanged.
Why It Matters: Tool-using LLM integrations that rely on DeepSeek flash/reasoner models in `oh-my-pi` no longer fail on the first tool-call turn, so operators and developers can keep their subagent workflows running without abrupt 400 errors and without forcing model fallbacks. The guard is implemented in parameter construction after model routing detection, and `deepseek-v4-pro` is explicitly excluded; monitor whether new DeepSeek model IDs or API behavior changes reduce the accuracy of the current ID-based routing heuristic and whether any model marked compatible starts requiring both reasoning and tools together.
Final score 79 | Confidence 94 | 1 evidence item | DeepSeek API, deepseek-v4-flash, deepseek-r1, deepseek-reasoner, OpenAICompat, reasoning_effort, tools, openai-completions-compat
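The guard in this entry can be sketched as a two-step check: decide whether the model routes to the reasoner, then drop the reasoning fields only when tools are present. The ID matching below is a guess at the heuristic based on the model names listed above, not the project's actual rule, and both function names are hypothetical.

```python
def routes_to_reasoner(base_url, model_id):
    """Hypothetical ID-based routing heuristic (an assumption, not the PR's)."""
    if "api.deepseek.com" not in base_url:
        return False
    if model_id == "deepseek-v4-pro":
        return False  # explicitly excluded according to the entry
    return (
        model_id in ("deepseek-v4-flash", "deepseek-reasoner")
        or model_id.startswith("deepseek-r1")
    )


def build_params(base_url, model_id, params):
    """Return request params with reasoning fields removed when required."""
    out = dict(params)  # shallow copy so the caller's dict is unchanged
    if out.get("tools") and routes_to_reasoner(base_url, model_id):
        out.pop("reasoning", None)
        out.pop("reasoning_effort", None)
    return out
```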
The PR fixes a crash path in `PromptHelper.truncate` and `ChatPromptHelper` by adding empty-input guards so `[]` inputs return an empty result instead of reaching `_get_available_chunk_size`, which previously divided by zero.
Why It Matters: Developers using llama_index with dynamic prompts or chat batches that can occasionally become empty will stop seeing hard failures in truncation/repack flows, so inference pipelines keep processing requests as no-op cases instead of aborting with runtime errors. This is implemented by returning `[]` immediately in `PromptHelper.truncate`, `ChatPromptHelper.atruncate`, and `ChatPromptHelper.arepack` when inputs are empty, which bypasses the buggy division step; monitor for any downstream code that previously depended on an exception for empty lists or that assumes a non-empty return contract.
Final score 79 | Confidence 98 | 1 evidence item | PromptHelper.truncate, ChatPromptHelper.atruncate, ChatPromptHelper.arepack, ZeroDivisionError
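The failure mode and the guard can be reproduced with a toy version of the computation. The functions below are simplified stand-ins, not llama_index code: they only assume that the per-chunk budget is the context size divided by the number of chunks.

```python
def available_chunk_size(context_window, num_chunks):
    """Toy stand-in for the size computation: divides by the chunk count."""
    return context_window // num_chunks  # ZeroDivisionError if num_chunks == 0


def truncate(chunks, context_window):
    """Trim each chunk to an equal share of the context window."""
    if not chunks:
        # Guard: an empty input is a no-op and never reaches the division.
        return []
    size = available_chunk_size(context_window, len(chunks))
    return [chunk[:size] for chunk in chunks]
```

Callers that previously relied on the exception to detect empty input would now receive an empty list instead, which is the behavior change the entry flags.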
This PR shifts LLM profile ownership from user-level storage to an organization-level store for SaaS/Cloud users, introducing org-scoped profile APIs and UI hooks so personal-org admins can create, update, list, activate, rename, and delete shared profiles.
Why It Matters: Organization admins and members on OpenHands SaaS can now keep LLM settings in one org place, so shared model credentials and runtime preferences can be reused across users instead of being recreated per account, which should reduce setup inconsistency and misconfiguration risk. The change migrates profiles to `org.llm_profiles` (EncryptedJSON), exposes org routes for profile lifecycle actions, and adds UI plumbing in the LLM settings screen for personal orgs with role-based access (owner/admin write, members read/activate). Watch next for rollout behavior in existing accounts: whether legacy per-user profile data is migrated cleanly, whether role checks hold under real org membership changes, and how team-org UI gaps affect users not yet supported by this first version.
Final score 78 | Confidence 95 | 1 evidence item | org.llm_profiles, EncryptedJSON, org_profiles router, org-level LLM profiles, EDIT_ORG_SETTINGS, VIEW_ORG_SETTINGS
Fixes a user-facing stale-state bug by treating authentication transitions as cache boundaries: when login/logout/auth errors or authenticated user-id changes occur, InsForge removes auth-scoped queries (apiKey, metadata, users, database tables, usage) so the dashboard recovers with the current user’s data instead of showing old or blank content.
Why It Matters: Users returning from idle or re-authenticating after tab switching will see their own dashboard status instead of blank screens or residual data from another session, which improves trust in what they are seeing and reduces operational confusion during routine navigation. Technically, auth-scoped queries for apiKey, metadata, users, tables, and usage are now canceled and removed on auth state changes, with timeout-capped requests enforced through AbortSignal.any. Continue watching for timeout-related false negatives on very slow networks and for any legitimate background requests that are unexpectedly canceled during rapid auth transitions.
This change adds explicit __all__ exports to four internal collaborator modules and pins those symbol lists with a new test (`test_tier_13_all_exports.py`), preventing silent internal import drift during the Tier-12→Tier-13 migration while keeping the public API unchanged.
Why It Matters: Developers and integrators depending on internal NotebookLM collaborator modules now get deterministic CI failures when expected imports disappear, so hidden breakages from internal refactors are caught during migration and fixed before release packaging rather than discovered in production. The PR enforces this through new __all__ exports and export-set tests, so teams can monitor for two follow-ups: intentional symbol removals that need coordinated test updates, and any downstream packages that still rely on undocumented private symbols outside this contract.
Final score 78 | Confidence 93 | 1 evidence item | __all__, _authed_transport.py, _rpc_executor.py, _conversation_cache.py, _cookie_persistence.py, test_tier_13_all_exports.py, Tier 13 migration
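An export-pinning test of the kind described here usually compares a module's `__all__` against a hard-coded expected set. The helper below is a minimal sketch of that idea; `export_drift` and the example module are hypothetical and do not come from the repository.

```python
import types


def export_drift(module, expected):
    """Compare a module's __all__ with the pinned set of names.

    Returns three sets: names that disappeared from __all__, names that were
    added without updating the pin, and names listed in __all__ that the
    module does not actually define.
    """
    declared = set(getattr(module, "__all__", ()))
    missing = set(expected) - declared
    unexpected = declared - set(expected)
    undefined = {name for name in declared if not hasattr(module, name)}
    return missing, unexpected, undefined


# Hypothetical stand-in for one of the internal collaborator modules.
example = types.ModuleType("_example_collaborator")
example.Transport = type("Transport", (), {})
example.__all__ = ["Transport"]
```

A pinning test would assert that all three returned sets are empty, so an intentional removal forces a matching update to the expected set.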
The PR introduces a native OrcaRouter integration in aider under the `orcarouter/` namespace, including model metadata lookup and request routing so users can call OrcaRouter models from the same aider workflow. It also wires token-limit and cost metadata from OrcaRouter into aider’s model info path with cached retrieval, and exposes startup validation for the new API key.
Why It Matters: Developers and operators using aider can now consume OrcaRouter’s catalog from the same command-line flow with one provider-specific setup key, which lowers integration friction when adding/switching vendor models and enables broader model experimentation without custom client changes; the follow-up risk is that stale or wrong cached model metadata could produce unexpected quota ceilings or billing mismatches for production prompts. The implementation maps `orcarouter/` requests to the OpenAI-compatible endpoint used by litellm, adds a 24-hour on-disk cache for pricing and token limits, and adds startup validation plus an `orcarouter-auto` alias and docs/tests to make adoption safer.
Final score 78 | Confidence 96 | 1 evidence item | OrcaRouter, aider, OrcaRouterModelManager, ModelInfoManager, Model.send_completion, litellm, ORCAROUTER_API_KEY, orcarouter-auto
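A 24-hour on-disk metadata cache like the one mentioned in this entry can be sketched as follows. The JSON layout, the `load_model_info` name, and the injected `fetch` callable are assumptions for illustration; aider's actual cache format is not described in the entry.

```python
import json
import time
from pathlib import Path

CACHE_TTL_S = 24 * 60 * 60  # 24 hours, as stated in the entry


def load_model_info(cache_path, fetch, now=None, ttl=CACHE_TTL_S):
    """Return model metadata from disk if fresh, otherwise refetch and store.

    fetch -- hypothetical callable that returns a JSON-serializable dict
    """
    now = time.time() if now is None else now
    path = Path(cache_path)
    if path.exists():
        try:
            cached = json.loads(path.read_text())
            if now - cached["fetched_at"] < ttl:
                return cached["models"]
        except (ValueError, KeyError, TypeError):
            pass  # unreadable or malformed cache: fall through and refetch
    models = fetch()
    path.write_text(json.dumps({"fetched_at": now, "models": models}))
    return models
```

The staleness risk raised in the entry maps directly onto `ttl`: pricing or limit changes upstream stay invisible until the cached file ages out or is deleted.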
The PR replaces the old recursive `run_tasks_base` executor with a queue-based worker pipeline (`run_worker_pipeline`) and adds `Task(workers=...)`/`Task(timeout=...)` control using `FixedWorkers` and `AdaptiveWorkers`, while adding a cross-run registry so adaptive tuning can resume from prior targets instead of re-discovering concurrency on every run.
Contribution: Introduces a configurable per-task concurrency engine in the public API and executor path, enabling fixed-order execution when required and adaptive worker scaling/rollback logic that can automatically increase, shrink, or reuse prior optimal concurrency for each task stage.
Impact: Operators running large cognee `cognify` jobs should see faster and more predictable batch completion with clearer failure handling, because failed items can be reported individually instead of failing the whole run. The PR reports a runtime drop on a 200-file benchmark from 108.7s baseline to ~94s on warm, resumed runs, and adds per-task timeouts plus adaptive worker control to reduce stalled execution, but teams should watch for adaptive scale oscillation under changing throttling and for stale cross-run registry targets after process restarts or environment changes.
This change bumps the qmd dependency in numtide/llm-agents.nix from version 2.1.0 to 2.5.1, so environments using this flake will install the newer qmd release by default.
Contribution: Bumped the managed qmd package version in the repository from 2.1.0 to 2.5.1, updating dependent Nix users to a newer upstream revision.
Impact: Developers and operators using numtide/llm-agents.nix to provision qmd now get a newer release automatically, which can alter qmd behavior or tooling compatibility in their workflows, so they should re-run qmd-based tests and validate prompts/CLI usage after merging. The update refreshes the dependency to upstream v2.5.1, so watch for regressions in existing scripts, reproducibility changes in pinned environments, and any integration assumptions tied to 2.1.0 defaults.
The key change in this burst is the qmd version upgrade, moving the repository’s pinned qmd package from 2.1.0 to 2.5.1, which changes the default qmd version served by numtide/llm-agents.nix.
Contribution: Upgraded the pinned qmd package version in the repository from 2.1.0 to 2.5.1, so consumers of this flake get a newer qmd upstream release through default inputs instead of staying on the older pinned line.
Impact: Teams and operators using numtide/llm-agents.nix as their default dependency set can get a newer qmd release automatically, which reduces manual pin management and keeps environments closer to current upstream behavior. Because this is a multi-version jump (2.1.0 → 2.5.1), deployment teams should watch for build or runtime regressions in scripts, CI workflows, and configs that depend on older qmd CLI or package behavior.
This PR updates the repository’s pinned amp revision from 0.0.1779222574-g8bb401 to 0.0.1779236441-g5063f4, changing the exact upstream version that users and operators get when using numtide/llm-agents.nix.
Contribution: The change updates the pinned amp commit that numtide/llm-agents.nix consumes, moving its dependency resolution to a newer upstream revision.
Impact: Users and operators of numtide/llm-agents.nix will start using a newer amp revision after this merge, so behavior, compatibility, and stability can shift based on upstream changes in that snapshot; teams should closely monitor builds and integration tests after rollout to catch regressions early. In practice, a bug fixed upstream may disappear without any local code change, while a compatibility break can surface as dependency resolution errors or runtime behavior changes, so a staged validation pass is advisable before broad deployment.
In PR #5056, llm-agents.nix moves ccusage packaging to a Rust-based approach while updating ccusage to 20.0.0, replacing the previous packaging path so it aligns with ccusage’s current Rust implementation.
Contribution: Aligned the ccusage definition in llm-agents.nix with ccusage’s Rust-based upstream by migrating the package recipe during the 19.0.3 -> 20.0.0 upgrade.
Impact: Nix users deploying llm-agents.nix should get ccusage upgrades that are less likely to break from packaging mismatch, because the package now tracks the project’s Rust-based upstream structure instead of a legacy packaging path. Continue monitoring build output for Rust toolchain compatibility, reproducibility across systems, and any downstream scripts that assume the previous ccusage package layout.
This pull request performs an automated dependency refresh that bumps the gitbutler pin in numtide/llm-agents.nix from 0.19.12 to 0.19.13.
ContributionRefreshed the repository’s dependency definition so users now consume gitbutler 0.19.13 instead of 0.19.12, keeping the flake package set aligned with the newer upstream release.
ImpactUsers and operators of llm-agents.nix can take advantage of the updated gitbutler version through the standard update path, which helps keep their automation stacks in sync with current upstream changes; after this bump, they should monitor whether existing prompts, hooks, or CI jobs behave differently and whether any configuration compatibility issues appear. By moving to 0.19.13, this change could reduce the risk of running against stale behavior, while also introducing potential subtle breakages if downstream modules depend on older gitbutler semantics.
The burst’s main change is a default-on strict decode mode for RPC schema handling (`NOTEBOOKLM_STRICT_DECODE=1`), so schema drift now surfaces as `UnknownRPCMethodError` instead of being silently collapsed to `None` by `safe_index`.
ContributionIntroduced a code-level behavior change that makes response-schema mismatch handling fail-fast by default in `safe_index`; added docs/ADRs plus tests covering the default-on path and `NOTEBOOKLM_STRICT_DECODE=0` override.
ImpactDevelopers and operators calling NotebookLM RPCs now get explicit runtime failures when batchexecute/response schemas change instead of silent null-like behavior, so broken integrations are detected earlier and less likely to corrupt downstream application logic. The system now defaults to strict decode and only allows legacy soft-mode behavior through `NOTEBOOKLM_STRICT_DECODE=0` for one release window, so teams should watch for clients that previously relied on `None` fallbacks, monitor new production error rates around `UnknownRPCMethodError`, and harden schema handling paths before strict mode is enforced broadly.
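The default-on switch described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the names `safe_index`, `UnknownRPCMethodError`, and `NOTEBOOKLM_STRICT_DECODE` come from the summary, while the function signature, the helper `strict_decode_enabled`, and the error message are assumptions.

```python
import os


class UnknownRPCMethodError(Exception):
    """Raised on a response-shape mismatch (name taken from the summary)."""


def strict_decode_enabled() -> bool:
    # Default-on: only an explicit "0" restores the legacy soft mode.
    return os.environ.get("NOTEBOOKLM_STRICT_DECODE", "1") != "0"


def safe_index(data, *path):
    """Walk nested lists/dicts along `path` (hypothetical signature)."""
    current = data
    for key in path:
        try:
            current = current[key]
        except (IndexError, KeyError, TypeError) as exc:
            if strict_decode_enabled():
                # Fail fast so schema drift is visible to the caller.
                raise UnknownRPCMethodError(
                    f"unexpected response shape at {key!r} in path {path!r}"
                ) from exc
            # Legacy soft mode: collapse the mismatch to None.
            return None
    return current
```

Callers that previously branched on a `None` result would, under this sketch, need either an explicit `except UnknownRPCMethodError` or the `NOTEBOOKLM_STRICT_DECODE=0` override during the transition.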
This change delivers notebooklm-py 0.5.0 as a cleanup-focused API update: deprecated v0.3-era APIs and shims are removed, remaining deprecations for `mime_type` and `poll_interval` are explicitly scheduled for v0.6.0, and version-gate/docs were updated to enforce the migration path.
ContributionImplemented the core deprecation-removal work by stripping legacy methods/properties/shims from the public and CLI API surface and converting the waiting API surface toward `initial_interval`, while retaining permanent aliases for `RPCError.rpc_id` and `RPCError.code` to reduce hard breakage in error handling.
ImpactApplications and integration scripts that still depend on old notebooklm-py calls will start failing immediately after upgrading to 0.5.0 unless they migrate to the new APIs, so teams get an earlier and cleaner signal that their wrappers are no longer aligned with the supported interface. The PR also closes a planned cleanup pass by removing old compatibility baggage and documenting the v0.6.0 deadline for remaining deprecations, so operators should monitor migration failures around `client.notebooks.share()`, removed `Source/Artifact` fields, and any remaining `poll_interval` usage while validating that error-paths behave normally with only the preserved RPC error aliases.
This change fixes a compatibility gap in DeepSeek tool-calling requests by adding auto-detection for `api.deepseek.com` models that route to `deepseek-reasoner` (such as `deepseek-v4-flash` and `deepseek-r1`-style IDs), then enabling a compat flag that removes `reasoning`/`reasoning_effort` whenever `tools` are present in the request, which prevents the reproduced 400 rejection path while keeping `deepseek-v4-pro` behavior unchanged.
ContributionIntroduced a concrete DeepSeek compatibility fix: a new `disableReasoningWhenToolsPresent` flag is auto-enabled for reasoner-bound model IDs on `api.deepseek.com`, and `buildParams` now strips `reasoning` and `reasoning_effort` whenever `tools` exist, with regression tests added to verify the exact keep/drop behavior and opt-out via explicit compat override.
ImpactTool-using LLM integrations that rely on DeepSeek flash/reasoner models in `oh-my-pi` no longer fail on the first tool-call turn, so operators and developers can keep their subagent workflows running without abrupt 400 errors and without forcing model fallbacks. The guard is implemented in parameter construction after model routing detection, and `deepseek-v4-pro` is explicitly excluded; monitor whether new DeepSeek model IDs or API behavior changes reduce the accuracy of the current ID-based routing heuristic and whether any model marked compatible starts requiring both reasoning and tools together.
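The keep/drop rule can be illustrated with a short Python sketch. The real change lives in the project's own code base; here the prefix list, the function names, and the `compat_override` parameter are hypothetical stand-ins for the `disableReasoningWhenToolsPresent` flag and `buildParams` described above.

```python
from urllib.parse import urlparse

# Assumed ID prefixes treated as reasoner-bound; the PR's actual heuristic may differ.
REASONER_PREFIXES = ("deepseek-reasoner", "deepseek-r1", "deepseek-v4-flash")


def detect_disable_reasoning(base_url: str, model_id: str) -> bool:
    """Auto-detect the compat flag from the host and the model ID."""
    host = urlparse(base_url).hostname or ""
    return host == "api.deepseek.com" and model_id.startswith(REASONER_PREFIXES)


def build_params(base_url, model_id, params, compat_override=None):
    """Return a copy of params, dropping reasoning fields when tools are present."""
    flag = (
        compat_override
        if compat_override is not None
        else detect_disable_reasoning(base_url, model_id)
    )
    out = dict(params)  # never mutate the caller's dict
    if flag and out.get("tools"):
        out.pop("reasoning", None)
        out.pop("reasoning_effort", None)
    return out
```

Passing `compat_override=False` models the explicit opt-out, and `deepseek-v4-pro` is left untouched because it matches none of the assumed prefixes.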
This change adds explicit __all__ exports to four internal collaborator modules and pins those symbol lists with a new test (`test_tier_13_all_exports.py`), preventing silent internal import drift during the Tier-12→Tier-13 migration while keeping the public API unchanged.
ContributionStabilized internal module APIs by declaring and enforcing explicit export lists for collaboration helpers, turning undocumented symbol drift into deterministic test failures instead of runtime import surprises.
ImpactDevelopers and integrators depending on internal NotebookLM collaborator modules now get deterministic CI failures when expected imports disappear, so hidden breakages from internal refactors are caught during migration and fixed before release packaging rather than discovered in production. The PR enforces this through new __all__ exports and export-set tests, so teams can monitor for two follow-ups: intentional symbol removals that need coordinated test updates, and any downstream packages that still rely on undocumented private symbols outside this contract.
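An export-pinning check of this kind might look like the sketch below. The module `collab_helpers`, its symbol `share_notebook`, and the helper `check_exports` are invented for illustration; the real test file pins the project's actual symbol lists.

```python
import types


def check_exports(module, expected):
    """Return a list of problems; an empty list means the contract holds."""
    declared = getattr(module, "__all__", None)
    if declared is None:
        return [f"{module.__name__} does not define __all__"]
    problems = []
    if set(declared) != set(expected):
        problems.append(
            f"export drift: expected {sorted(expected)}, got {sorted(declared)}"
        )
    for name in declared:
        if not hasattr(module, name):
            problems.append(f"__all__ lists undefined symbol {name!r}")
    return problems


# Stand-in for an internal collaborator module (hypothetical names).
collab = types.ModuleType("collab_helpers")
collab.share_notebook = lambda: None
collab.__all__ = ["share_notebook"]
```

A test would then assert `check_exports(collab, {"share_notebook"}) == []`, so an intentional removal has to update the pinned set in the same change.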
This PR changes herdr’s vendored libghostty-vt Zig build to compile with a baseline CPU target so the nested `zig build` no longer emits instructions from the build host that can crash on older x86_64 machines.
ContributionAdded an explicit compatibility safeguard in the nested Zig invocation by setting `-Dcpu=baseline` for vendored libghostty-vt, and retained the current `-Dtarget` handling needed for sandboxed source builds, so the built binary no longer depends on the build host’s CPU features on x86_64 targets.
ImpactOperators using this package on Linux are less likely to see `SIGILL` startup failures when deployments run on older x86_64 CPUs that lack the build host’s instruction-set extensions, so they can avoid per-host pinning or emergency rollbacks for apparently random crashes. This is implemented by forcing baseline codegen in the vendored Zig build path used by `herdr`; teams should still watch for regressions when the vendored library or Zig compiler defaults change so host-feature leakage does not reappear.
Google’s announcement of Gemini 3.5 Flash is being interpreted as a major commercial re-positioning of the Flash line: community pricing discussion reports $1.50/$9.00 per million input/output tokens, versus approximately $0.50/$3.00 for Gemini 3.0 Flash preview and around the same bracket as Gemini 2.5 Pro, indicating a clear per-token cost increase rather than a lower-cost Flash default.
ContributionThe release introduces a new Gemini 3.5 Flash model variant whose pricing is materially higher than prior Flash positioning, effectively creating a higher-cost Flash-tier choice for users who may accept increased spend for stronger model capability.
ImpactTeams building on Gemini should expect noticeably higher API spend at similar traffic volumes, so this is a direct budget and routing decision for product owners, not just a naming change. The model now looks closer to a premium tier, so production systems should review model-traffic split rules, budget caps, and fallback thresholds to avoid unexpected token-cost escalation. Operationally, the next thing to monitor is whether 3.5 Flash’s quality or latency improvements are sufficient to justify the cost move against cheaper alternatives, and whether Google publishes official pricing confirmation/docs revisions.
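To make the budget effect concrete, the arithmetic below compares the two price points quoted above for one example workload. The prices are the community-reported figures from this item, not confirmed list prices, and the traffic volume is an arbitrary assumption.

```python
def monthly_cost(input_tokens, output_tokens, price_in, price_out):
    """Cost in USD given per-million-token prices."""
    return input_tokens / 1_000_000 * price_in + output_tokens / 1_000_000 * price_out


# Assumed workload: 100M input tokens and 20M output tokens per month.
INPUT_TOKENS = 100_000_000
OUTPUT_TOKENS = 20_000_000

cost_new = monthly_cost(INPUT_TOKENS, OUTPUT_TOKENS, 1.50, 9.00)  # reported 3.5 Flash
cost_old = monthly_cost(INPUT_TOKENS, OUTPUT_TOKENS, 0.50, 3.00)  # approx. 3.0 Flash preview

print(cost_new, cost_old, cost_new / cost_old)  # 330.0 110.0 3.0
```

Because both the input and output prices are three times higher, the ratio stays at 3x regardless of the input/output mix.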
The new Gemini 3.5 Flash model is now the announced focus of the release, and community pricing reports indicate a sharp increase to about $1.50 input / $9.00 output per million tokens, making it materially more expensive than the prior Gemini 2.5 Flash pricing level.
ContributionIntroduces a new model tier version with a clearly higher publicized cost profile, creating a concrete change in how teams should provision and route Flash-class inference traffic.
ImpactOrganizations using Gemini for production inference can expect materially higher API spend with Gemini 3.5 Flash, so teams may need to tighten budget controls or keep older Flash-tier models in routing logic to prevent budget overruns. Watch whether the higher cost is offset by throughput/quality gains before broad migration, and monitor the reported immediate quota-exhaustion behavior in teamwork-preview mode because it may indicate launch-time capacity pressure that could block peak workloads.
This PR performs a single major change: it bumps the repo’s pnpm/action-setup dependency from v2 to v6 in CI automation, moving dependency installation flows to the newer action line so workflow runs are not locked to legacy action behavior.
ContributionThe change updates the pinned action version used by the project’s automation pipeline from v2 to v6, which is important because it unblocks CI/automation compatibility with newer package-manager versions (including pnpm 11-era behavior) and avoids staying on a deprecated dependency-management path.
ImpactDevelopers and operators maintaining ruvnet/ruflo should see fewer dependency-setup surprises in CI as the workflow aligns with a newer action stack instead of the old v2 setup path, but they should watch for failures tied to input defaults, cache-key behavior, and bootstrap path handling after the major action upgrade. Technically, this is a major-version bump of pnpm/action-setup; upstream changes in this range include pnpm 11 support and Node.js 24-era updates, so workflow assumptions that worked with v2 may need adjustment.
The SageMaker Python SDK announcement introduces v3.8.0 with new Feature Store pipeline capabilities, including guided support for Lake Formation governance and Apache Iceberg table property workflows for feature-pipeline setups.
ContributionIntroduced a new SDK release (v3.8.0) for SageMaker Feature Store that adds concrete pipeline-oriented capabilities and implementation examples for governance and table-property operations, reducing manual integration steps for end-to-end feature ingestion workflows.
ImpactMLOps teams managing feature pipelines can shorten setup time and reduce misconfiguration risk because the new SDK release packages ready-to-use guidance for governed Feature Store workflows, so they can focus on business features instead of hand-building policy/table-property plumbing. Operators should watch for Lake Formation permission behavior in existing environments and verify Iceberg table-property handling in production before fully standardizing the new v3.8.0 workflow across pipelines.
mirrord now rejects duplicate entries in incoming `port_mapping` and `listen_ports` before config processing continues, returning a clear duplicate-port error instead of accepting the config and silently dropping one mapping.
ContributionAdded a pre-check for duplicate source/target ports in `port_mapping` and `listen_ports` so conflicting network mappings are rejected during config generation with an explicit validation error.
ImpactOperators and developers configuring mirrord now get immediate, explicit feedback when a port is duplicated in `port_mapping` or `listen_ports`, which helps prevent hidden forwarding misconfigurations and missing routes during debugging or development sessions. The previous behavior depended on `BiMap` semantics that kept only one of the duplicates, so one intended mapping vanished without warning; now the failure is deterministic and visible. Next, watch for whether any valid workflows rely on duplicate-like definitions and ensure CI/error surfaces continue to block only truly conflicting configs.
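The pre-check can be sketched in Python as below. mirrord itself is written in Rust, so this only illustrates the idea: the pair representation, the exception type, and the message wording are assumptions, and the same check would apply to `listen_ports`.

```python
from collections import Counter


class DuplicatePortError(ValueError):
    """Hypothetical validation error for conflicting port entries."""


def find_duplicates(values):
    """Return the sorted values that occur more than once."""
    counts = Counter(values)
    return sorted(v for v, n in counts.items() if n > 1)


def validate_port_mapping(pairs):
    """Reject mappings where either side repeats, then build the map.

    A bidirectional map needs both sides unique; inserting a duplicate
    would otherwise silently replace an earlier entry.
    """
    dup_sources = find_duplicates(src for src, _ in pairs)
    dup_targets = find_duplicates(dst for _, dst in pairs)
    if dup_sources or dup_targets:
        raise DuplicatePortError(
            f"duplicate ports in port_mapping: "
            f"sources={dup_sources}, targets={dup_targets}"
        )
    return dict(pairs)
```

Running the check before building the map is what turns a silently dropped entry into a deterministic error.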
This PR introduces a new AI-DLC resiliency extension that adds a 15-rule reliability baseline mapped to 11 of 13 AWS Well-Architected Reliability Pillar questions, gates requirements capture via an opt-in workflow plus an RTO/RPO follow-up, and provides template-based validation so reliability controls are checked through generated CloudFormation artifacts.
ContributionIntroduces a concrete resiliency-by-design capability: a ruleset + opt-in flow that captures recovery objectives and DR strategy during requirements, then propagates those constraints into design/infrastructure stages and enforces them against templates via comparison and review tooling.
ImpactTeams building applications with AI-DLC can enable the resiliency extension and receive explicit reliability requirements and safeguards (for example RTO/RPO targets, DR strategy, observability defaults, and health checks) before deployment, which reduces the operational risk of discovering resilience gaps only after production exposure. The change also brings measurable gains in reviewed templates (9/15 rules compliant vs 3/15 without it), but it should be tracked for template-review accuracy, whether the added context (~4,370 tokens) affects larger workflows, and what gaps remain from the excluded REL 2 and REL 3 areas.
An open issue in databricks-solutions/ai-dev-kit requests running evaluation tests for the ai-ml-engineering experimental branch across five listed components, indicating an explicit pre-stability validation pass before broader use.
ContributionIntroduced a single coordinated validation request: execute the evaluation suite on the ai-ml-engineering experimental branch for multiple AI runtime components instead of treating those components separately.
ImpactRepository maintainers can use this issue as a coordination signal to run a shared quality gate on the experimental stack, reducing the chance that regressions in one of these modules are missed before adoption. The practical follow-up is to verify that each listed component is actually evaluated and to watch for failed or inconsistent results that could delay promotion or require rollback.
The pull request fixes a race in gateway run cancellation where two concurrent `cancel_run` requests could both pass the initial existence check, causing the second call to return HTTP 409 even though the run had already moved to `interrupted`. The router now re-reads run state after a failed cancel attempt and returns HTTP 202 when the run is already interrupted or already removed, while preserving 409 only for genuinely non-cancelable states.
ContributionIntroduces an authoritative post-cancel status re-check in the cancellation flow: after `cancel()` returns false, the router fetches the current run record and maps `status == interrupted` or missing record to a successful idempotent 202 response, avoiding false conflict responses under concurrent cancel traffic.
ImpactOperators retrying cancel requests on the same run can avoid misleading 409 failures, so orchestration and cleanup jobs are less likely to abort due to race-related false negatives. The change removes an intermittent operational failure mode by treating already-interrupted or already-cleared runs as idempotent success, then still returning 409 for truly non-cancelable states such as completed runs. Follow-up should watch whether this re-fetch path causes any measurable overhead under very high cancel concurrency and whether any completed/store-only cases still surface as unexpected conflicts.
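The re-check logic can be modeled with an in-memory store. This is a simplified sketch: `RunStore`, the status strings other than `interrupted`, and the 404 for the initial existence check are assumptions; only the 202/409 mapping follows the summary.

```python
from http import HTTPStatus


class RunStore:
    """Toy run store mapping run_id to a status string (hypothetical)."""

    def __init__(self, runs=None):
        self.runs = dict(runs or {})

    def get(self, run_id):
        return self.runs.get(run_id)

    def cancel(self, run_id):
        """Return True only if this call moved the run to 'interrupted'."""
        if self.runs.get(run_id) == "running":
            self.runs[run_id] = "interrupted"
            return True
        return False


def cancel_run(store, run_id):
    if store.get(run_id) is None:
        return HTTPStatus.NOT_FOUND  # assumed response for the existence check
    if store.cancel(run_id):
        return HTTPStatus.ACCEPTED
    # cancel() lost a race or the run is not cancelable: re-read the state.
    status = store.get(run_id)
    if status is None or status == "interrupted":
        return HTTPStatus.ACCEPTED  # idempotent success
    return HTTPStatus.CONFLICT  # genuinely non-cancelable, e.g. completed
```

Two back-to-back calls on a running run both return 202 here, while a completed run still yields 409.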
This change fixes a production-impacting retry failure in mirrord by disabling HTTP/2 connection pooling in `ClientStore`; reusing pooled `SendRequest` objects after an idle GOAWAY left a closed sender in the pool, so subsequent retries repeatedly failed with `channel closed`, and forcing a new connection per request removes that failure loop.
ContributionThe PR changes the mirrord HTTP/2 client path so that pooling is effectively turned off (`should_enable_connection_pooling() -> false`), ensuring each request uses a newly created connection instead of reusing possibly closed pooled senders, which restores retry effectiveness.
ImpactOperators and developers running mirrord in gRPC/HTTP/2 environments with short idle timeouts (for example Quarkus via Vert.x in Kubernetes meshes) should see fewer intermittent `channel closed` failures and more successful retries, reducing flaky behavior during local traffic-steering and integration testing. The update works by preventing `get_with_pooling()` from returning stale `http2::SendRequest` instances after GOAWAY-triggered closure, so each request starts with a fresh channel. Watch for connection-establishment cost under high request volume and whether a future targeted fix (e.g., discarding only closed senders) is needed to regain pooling benefits without reintroducing stale-channel reuse bugs.
This PR adds a 30,000ms client-side timeout to `hostServiceCall` in `mcp-v2`, using `AbortController` instead of allowing relay fetches to run until Vercel kills the lambda at 300s; on abort it now throws a clear timeout error, and the timer is always cleared in `finally` so repeated requests do not leak resources. It also adds `HostServiceCallOptions.timeoutMs` to allow override by callers.
ContributionIntroduced a concrete timeout guard for `host-service-client.ts` relay calls: fetch requests are bound to an `AbortController` with a default 30s timeout, emit explicit `Host <id> timed out after <ms> for <procedure>` errors when aborted, and clear timers in `finally` to avoid timer leaks.
ImpactOperators and tool users calling MCP endpoints such as `/api/agent/mcp` and `/api/v2/agent/mcp` should see stalled calls fail in about 30 seconds instead of hanging for 300 seconds, which directly reduces lambda starvation and makes host/offline relay failures visible as actionable tool errors rather than silent timeout storms (previously 83,850+ errors in 24h, 312 in the last hour). The change affects `agents_run`, `agents_list`, `workspaces_create`, and `workspaces_delete`; it improves incident response by surfacing a consistent timeout path and preventing long-running blocked handlers. Watch next: whether the 30s default is too short for legitimate high-latency relay paths and whether clients need tuned `timeoutMs` settings to avoid unnecessary aborts.
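The timeout guard is implemented in TypeScript with `AbortController`; the same pattern can be sketched in Python with `asyncio.wait_for`. The function and exception names below are hypothetical, and only the 30,000 ms default and the error message shape are taken from the summary.

```python
import asyncio


class HostTimeoutError(Exception):
    """Hypothetical error type for relay calls that exceed the budget."""


async def host_service_call(host_id, procedure, call, timeout_ms=30_000):
    """Await `call()` but give up after `timeout_ms` milliseconds.

    asyncio.wait_for cancels the pending call on timeout, which plays the
    role of aborting the fetch and clearing the timer in the original.
    """
    try:
        return await asyncio.wait_for(call(), timeout=timeout_ms / 1000)
    except asyncio.TimeoutError:
        raise HostTimeoutError(
            f"Host {host_id} timed out after {timeout_ms}ms for {procedure}"
        ) from None
```

A caller with a legitimately slow relay path would pass a larger `timeout_ms`, mirroring the `HostServiceCallOptions.timeoutMs` override.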
DeepSeek-Reasonix now detects Azure-hosted endpoints (`azure.com` in `baseUrl`) and removes the proprietary `extra_body.thinking` field from outgoing requests for those endpoints while still sending `reasoning_effort`, preventing the `Unrecognized request argument supplied: extra_body` 400 error on DeepSeek v4 calls.
ContributionIntroduced `_isAzureEndpoint()` and a payload gate in `buildPayload()` to conditionally strip `extra_body` for Azure hosts, and added a regression test for Azure `baseUrl` behavior so the compatibility change is preserved.
ImpactUsers and operators calling Azure-hosted DeepSeek v4 models through this client should stop seeing failed reasoning requests due to a 400 validation error, so inference calls are more reliable without manual retries or request workarounds. Technically, this is a provider-compatibility fix that aligns Azure payloads with supported fields by removing `extra_body.thinking` while retaining `reasoning_effort`; monitor for additional Azure endpoint URL patterns that could bypass hostname detection and reintroduce unsupported-field failures.
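A minimal Python sketch of the payload gate follows. The substring check on `azure.com` matches the summary; the function names are Python-style stand-ins for `_isAzureEndpoint()` and `buildPayload()`, and the payload fields shown are illustrative.

```python
def is_azure_endpoint(base_url: str) -> bool:
    # Substring match as described; other Azure URL patterns would slip through.
    return "azure.com" in base_url.lower()


def build_payload(base_url: str, payload: dict) -> dict:
    """Return a copy of payload without `extra_body` for Azure hosts."""
    out = dict(payload)
    if is_azure_endpoint(base_url):
        out.pop("extra_body", None)  # `reasoning_effort` is left in place
    return out
```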
Added an explicit `maxDuration = 800` to both streaming MCP route handlers (`apps/api/src/app/api/agent/[transport]/route.ts` and `apps/api/src/app/api/v2/agent/[transport]/route.ts`), replacing the implicit Vercel default timeout path for these endpoints so MCP sessions can run for about 13 minutes before function recycle.
ContributionThis change configures the MCP agent streaming routes with an explicit `maxDuration` value of 800 seconds (Vercel Pro cap), directly preventing platform-default 300-second timeouts from cutting off active Cursor MCP sessions during normal usage.
ImpactOperators and users of Cursor MCP sessions should stop seeing active AI coding sessions drop around 5 minutes, so workflows that rely on long-lived streaming responses can continue without abrupt session interruption. It does this by overriding the default 300-second Vercel timeout with an explicit 800-second budget; after deployment, teams should monitor for residual timeout errors near the new ceiling and verify no regression if sessions legitimately exceed ~13 minutes.
This change enables callers to pass explicit Google Cloud credentials when creating Anthropic model clients on Google Vertex AI, replacing the previous reliance on Application Default Credentials.
ContributionAdds an explicit credentials parameter path to Anthropic initialization on Vertex AI so applications can inject tenant-specific or service-account-specific Google credentials instead of always using environment defaults.
ImpactMulti-tenant applications and platform operators can now run Anthropic calls on Vertex AI with separate Google credentials per tenant, reducing the risk of cross-tenant credential mixing and helping enforce identity isolation without reconfiguring global ADC state. Watch for regressions where any request path still falls back to Application Default Credentials unexpectedly, and validate error handling when invalid or absent explicit credentials are supplied so failures are surfaced clearly instead of continuing with the wrong identity.
A deserialization bug in google/adk-go PR #690 was fixed so `aiplatformToGenaiContent` now copies the `Id` field for `FunctionCall` and `FunctionResponse`, matching the existing serialization path in `createAiplatformpbContent`; this prevents function response events from being dropped in multi-invocation tool-call sessions that require non-empty IDs.
ContributionThe fix restores round-trip integrity for tool-call event payloads by populating missing `Id` fields during deserialization, and adds a dedicated unit test to lock in this behavior so future changes cannot reintroduce silent response drops.
ImpactDevelopers building multi-turn tool-calling agents with this SDK will no longer see function responses vanish during a session, which reduces failed or incomplete tool workflows and the manual recovery they require. Since matching logic in session handling depends on non-empty IDs, the parser now preserves those IDs consistently across serialization/deserialization; continue watching for other protobuf message fields that can desynchronize similarly, and expand regression coverage if multi-turn tool-call routing starts failing in new invocation paths.
The PR introduces a dedicated cold-start performance regression suite (`TestPerf_*`) as a hard CI gate and a separate advisory benchmark track, focused on `agent-deck --help` and `--version` lifecycle paths that were previously ungated.
ContributionAdds enforceable startup performance guardrails by classifying lifecycle cases as COLD/WARM, applying concrete budget formulas (5x/3x local median with 1ms floor), and integrating the hard walltime gate into CI so regressions in startup latency are detected automatically.
ImpactDevelopers and operators can now get a fail-fast signal when agent-deck cold-start becomes slower, so release workflows can catch startup regressions before they affect users who rely on quick startup for help/version commands. The gate uses trimmed-mean timing, setup-excluded measurements, and an environment multiplier (`PERF_BUDGET_MULTIPLIER`, default 1.0 locally and 2.0 in CI) to reduce flake-driven failures while still rejecting sustained slowdown; teams should watch for budget drift from CI variance and validate whether the deferred storage-related lifecycle paths remain unguarded.
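The budget arithmetic can be sketched as follows. The 5x/3x factors, the 1ms floor, and the multiplier come from the summary; which factor applies to COLD versus WARM, the order in which floor and multiplier are applied, and the 10% trim fraction are assumptions.

```python
import statistics

# Assumed mapping of lifecycle class to budget factor.
FACTORS = {"COLD": 5.0, "WARM": 3.0}
FLOOR_MS = 1.0


def budget_ms(kind, local_median_ms, multiplier=1.0):
    """Budget = max(factor * local median, floor), scaled by the multiplier."""
    return max(FACTORS[kind] * local_median_ms, FLOOR_MS) * multiplier


def trimmed_mean(samples, trim=0.1):
    """Mean after dropping the lowest and highest `trim` fraction of samples."""
    ordered = sorted(samples)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.fmean(kept)


def within_budget(samples_ms, kind, local_median_ms, multiplier=1.0):
    return trimmed_mean(samples_ms) <= budget_ms(kind, local_median_ms, multiplier)
```

With `PERF_BUDGET_MULTIPLIER` at 2.0 in CI, a COLD case with a 10 ms local median would get a 100 ms budget under these assumptions, and a single outlier sample is discarded by the trim rather than failing the gate.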
This PR shifts LLM profile ownership from user-level storage to an organization-level store for SaaS/Cloud users, introducing org-scoped profile APIs and UI hooks so personal-org admins can create, update, list, activate, rename, and delete shared profiles.
ContributionIntroduced a concrete org-scoped LLM profile capability: a new database column on `org`, permissioned org-level profile endpoints, and frontend managers for listing, editing, activating, renaming, and deleting profiles, replacing the prior user-level storage path.
ImpactOrganization admins and members on OpenHands SaaS can now keep LLM settings in one org place, so shared model credentials and runtime preferences can be reused across users instead of being recreated per account, which should reduce setup inconsistency and misconfiguration risk. The change migrates profiles to `org.llm_profiles` (EncryptedJSON), exposes org routes for profile lifecycle actions, and adds UI plumbing in the LLM settings screen for personal orgs with role-based access (owner/admin write, members read/activate). Watch next for rollout behavior in existing accounts: whether legacy per-user profile data is migrated cleanly, whether role checks hold under real org membership changes, and how team-org UI gaps affect users not yet supported by this first version.
This pull request updates the dependency on `actions/github-script` from v7 to v9, which adopts the v9 `@actions/github`/`@octokit/core` stack and switches script behavior to use an injected `getOctokit` helper instead of import-style patterns that relied on `require('@actions/github')`.
ContributionMigrated the CI action dependency to v9 and enabled the new injected `getOctokit` script entrypoint, which changes how automation authors obtain authenticated GitHub API clients inside `github-script` steps and replaces v7-era runtime/import assumptions.
ImpactWorkflow operators and repo maintainers need to update existing `github-script` snippets that still use `require('@actions/github')` or redeclare `getOctokit`, because those CI steps can fail at runtime after the bump; the practical next step is to run integration tests after the upgrade to confirm all action scripts execute. The underlying migration is to the v9 ESM-based package stack and adds `ACTIONS_ORCHESTRATION_ID` in user-agent, so teams should also watch for SyntaxError-like breakages in scripts that rely on old internal `@actions/github` references.
Superset’s PR rewires workspace creation to use TanStack DB optimistic inserts (plus a txid-aware sync path) so in-flight workspace state is tracked in the main collection instead of a separate Zustand sidecar, with failures kept for retry.
ContributionChanged the workspace create flow to a synchronous, fire-and-forget `submit()` that immediately writes an optimistic workspace row into TanStack DB and relies on Electric-confirmed txid matching, removing the dedicated in-flight `workspace-creates` sidecar and adding a minimal local failure store for retry/dismiss behavior.
ImpactPeople creating workspaces should see a more predictable user flow: the new workspace row appears right away as loading, and failed creates now show a concrete retry path instead of turning into long waits or unclear UI states, so operators and users lose fewer minutes to apparent hangs. Technically, this is enabled by moving in-flight state into optimistic `v2Workspaces` rows with `$synced` tracking and fixing write-sync txid handling (`pg_current_xact_id()::xid::text`, null/no-op txid handling, and Electric shape-stream retry on non-fatal errors), which is intended to eliminate the prior 30-second await-timeout loops after create/update/delete races. Watch for whether auth-refresh retries, txid validation around edge-case xids, and failed-create recovery after restart introduce regressions under load.
This change makes secret behavior environment-aware: in cloud deployments, `INSFORGE_INTERNAL_URL` is no longer seeded and is rewritten to `INSFORGE_BASE_URL`, while OSS/non-cloud deployments keep the original internal URL flow for container routing. Unit tests were added to verify both branches so the behavior is locked in for each environment.
ContributionAdded an explicit environment-gated secret flow in backend seeding and function-secret resolution that removes the reserved internal URL secret from cloud environments and preserves OSS internal routing behavior.
ImpactCloud operators and function authors should get less confusing configuration behavior because the platform stops surfacing `INSFORGE_INTERNAL_URL` as a seeded secret in cloud mode, so they can rely on the intended base URL path and avoid misconfigured function endpoints while OSS users retain existing container-to-container behavior. Internally, this is enforced by skipping `INSFORGE_INTERNAL_URL` creation in `seedBackend` and applying the rewrite to `INSFORGE_BASE_URL` only when `isCloudEnvironment()` is true, with new unit tests; next, watch for any cloud scripts/docs still expecting the old secret name and for deployment flows where cloud detection is incorrectly inferred.
The pull request makes a single documentation-focused change by correcting two misspellings in run-llama/llama_index: `envrioment` → `environment` in the You.com API key field description and `psycopge2` → `psycopg2` in the YugabyteDB README. No functional logic, API, or runtime behavior was changed.
ContributionCorrected the textual accuracy of configuration and dependency documentation by fixing two field/driver names (`environment`, `psycopg2`) so developers can follow the docs without being misled by misspellings.
ImpactDevelopers integrating You.com or YugabyteDB with llama_index can follow the docs with fewer ambiguous instructions, which reduces the chance of configuration mistakes during setup caused by wrong variable or driver names. The PR is documentation-only, so there is no immediate model/runtime behavior impact; still, watch whether other references (README sections, examples, or related docs) keep propagating the same misspellings and cause recurring onboarding confusion.
Fixes a user-facing stale-state bug by treating authentication transitions as cache boundaries: when login/logout/auth errors or authenticated user-id changes occur, InsForge removes auth-scoped queries (apiKey, metadata, users, database tables, usage) so the dashboard recovers with the current user’s data instead of showing old or blank content.
ContributionIntroduced auth-aware lifecycle handling in the data/auth flow: AuthProvider now detects user identity transitions and removes user-scoped query data, cancels in-flight auth-scoped requests on auth transitions, and routes request cancellation through combined signals (caller + 30s timeout) to avoid hung or stale auth-bound operations.
ImpactUsers returning from idle or re-authenticating after tab switching will see their own dashboard status instead of blank screens or residual data from another session, which improves trust in what they are seeing and reduces operational confusion during routine navigation. Technically, auth-scoped queries for apiKey, metadata, users, tables, and usage are now canceled and removed on auth state changes, with timeout-capped requests enforced through AbortSignal.any. Continue watching for timeout-related false negatives on very slow networks and for any legitimate background requests that are unexpectedly canceled during rapid auth transitions.
The PR changes campfirein/byterover-cli so forked pull requests only run validation when they are explicitly marked with a `safe-to-test` label, preventing validation from starting automatically on unlabeled fork submissions.
Contribution: Introduced a label-based gate in the fork-PR validation path so validation jobs are conditionally executed only when a pull request has the `safe-to-test` label.
Impact: Repository operators and contributors can prevent unnecessary fork-PR validation runs, so CI queues stay cleaner and compute is spent on PRs that are explicitly approved for testing. The change makes test execution explicit at the label level in the validation workflow; monitor whether urgent changes are delayed because labels are missing or inconsistently applied, and whether attempts to bypass the label gate remain blocked.
The PR upgrades the CI release dependency from goreleaser/goreleaser-action 7.2.1 to 7.2.2, with the primary behavioral change being a fix in nightly mode to resolve and select the newest published release instead of potentially older artifacts.
Contribution: Updated the repository’s release automation dependency to goreleaser/goreleaser-action 7.2.2 and brought in the upstream fix that changes nightly release resolution logic to pick the most recently published release, preventing release jobs from resolving outdated candidates.
Impact: Maintainers can reduce the chance of publishing stale or inconsistent nightly builds, so release operators are less likely to push wrong artifact versions without noticing. The update to goreleaser/goreleaser-action v7.2.2 includes the nightly-resolution fix (`select newest published release`) and action dependency updates, so the release pipeline should now resolve nightly versions more correctly; watch for any subsequent changes in release tags or workflow permissions to ensure the new resolver keeps selecting intended artifacts after future workflow updates.
The PR introduces a native OrcaRouter integration in aider under the `orcarouter/` namespace, including model metadata lookup and request routing so users can call OrcaRouter models from the same aider workflow. It also wires token-limit and cost metadata from OrcaRouter into aider’s model info path with cached retrieval, and exposes startup validation for the new API key.
Contribution: Added a first-class OrcaRouter provider path in aider by implementing `OrcaRouterModelManager` for fetching and caching OrcaRouter model limits/costs, delegating `orcarouter/` model info resolution to it in `ModelInfoManager`, and routing `orcarouter/<vendor>/<model>` requests through the existing OpenAI-compatible litellm client with injected OrcaRouter API key and attribution headers.
Impact: Developers and operators using aider can now consume OrcaRouter’s catalog from the same command-line flow with one provider-specific setup key, which lowers integration friction when adding/switching vendor models and enables broader model experimentation without custom client changes; the follow-up risk is that stale or wrong cached model metadata could produce unexpected quota ceilings or billing mismatches for production prompts. The implementation maps `orcarouter/` requests to the OpenAI-compatible endpoint used by litellm, adds a 24-hour on-disk cache for pricing and token limits, and adds startup validation plus an `orcarouter-auto` alias and docs/tests to make adoption safer.
This PR dual-published the May 17 startaitools.com Tier 2 deep-dive on honest LLM performance benchmarking by converting Hugo frontmatter to Astro with no body edits, preserving the article content while adapting it to the repository’s publishing pipeline.
Contribution: Introduced a standardized, publishable guide for paid-API-in-the-loop benchmark design, adding machine-comparable input generation (seeded RNG corpus) and explicit API access gating patterns to reduce misleading benchmark behavior.
Impact: Teams benchmarking LLM systems that involve paid APIs now have a concrete, reusable guide that can make comparison runs more trustworthy across environments, so performance and cost decisions are less likely to be made from misleading numbers. The article also formalizes a double-gate workflow (`API_KEY` plus `EXPLICIT_OPT_IN`) with skipped-but-recorded record-shape handling, so future benchmark harnesses can expose consent/configuration issues earlier; continue watching whether the documented process is actually adopted in CI benchmark scripts and whether any environment differences still produce inconsistent paid-API sampling.
The PR fixes a deterministic CI breakage by correcting the vendoring sync config for the slack-channel plugin: test files were excluded while `bun test` was still kept, causing `No tests found!` failures. The change updates `sources.yaml` so upstream `features/**` test suites are included and runnable in the mirror.
Contribution: Updated the slack-channel upstream sync mapping by removing `*.test.ts`/`*.spec.ts` from `exclude` and adding `features/**` to `include` in `sources.yaml`, so synced test files such as `server.test.ts`, `features/gate-properties.test.ts`, and `features/runner.test.ts` are restored in the vendored mirror and actually executed in CI.
Impact: CI maintainers for MCP plugins now get real test validation for slack-channel instead of a repeatable false failure, so policy/journal/supervisor regressions are more likely to be detected before merge. The prior configuration kept `package.json`’s `bun test` script but removed the referenced tests, which produced a deterministic `No tests found!` exit and let the job short-circuit, hiding later plugin checks; continue monitoring whether other vendored plugins have similar sync-rule drift and whether CI still masks failures before a collect-all-failures mode is fully enforced.
The burst’s primary change is a security-related fix that scopes message conversation access so message operations are constrained to the correct chat context, helping prevent message leakage across conversations.
Contribution: Added/updated the message conversation access logic so requests are validated against the current conversation context, preventing operations from reading or acting on messages that do not belong to that scope.
Impact: Users with multiple active chats are less likely to see or modify messages from the wrong thread, reducing privacy exposure and context confusion in day-to-day chat operations; teams should monitor handoff and agent-driven flows for edge cases where legitimate cross-thread context is still expected. From a technical perspective, the update hardens message-route authorization by enforcing conversation-level scope checks, so follow-up should verify no bypass exists on less-traveled endpoints or regression in handoff paths.
The nightly release page for charmbracelet/crush now adds a clear artifact verification flow, instructing users to download `checksums.txt` and `checksums.txt.sigstore.json`, verify the checksum file with cosign signatures, and then validate downloaded binaries with sha256 sums.
Contribution: Introduces a concrete supply-chain integrity path in the release documentation: published checksum and Sigstore signature metadata, plus verified command examples, so consumers can authenticate and validate downloaded artifacts before use.
Impact: Operators and developers pulling nightly `charmbracelet/crush` binaries can validate what they install before execution, reducing the chance that an altered or corrupted artifact is used in build, test, or deployment flows. The release now bundles `checksums.txt` with `checksums.txt.sigstore.json` and specifies cosign verification against the goreleaser workflow identity plus hash checking with `sha256sum`, which helps gate CI/CD and local setups against tampering or accidental bad downloads. Watch for identity/issuer changes, failed verification signatures, and checksum mismatches in automated pipelines, as those are the first signals of packaging, trust-chain, or build process regressions.
InsForge introduced per-IP, category-based write-rate limiting on mutating API routes and applied 429-aware backoff handling to Deno/Vercel provider write calls, so write traffic is intentionally paced before provider quotas are exhausted and quota breaches are surfaced as controlled rate-limit responses.
Contribution: Implemented a concrete write-protection behavior change: all mutating endpoints now flow through category-based IP rate limits, and provider write calls now retry 429s with header-aware exponential backoff and standardize final exhaustion as a 429 RATE_LIMITED error instead of implicit failure.
Impact: Developers and operators can now see a clear, predictable 429 signal when write traffic is too aggressive, which reduces hidden cascading failures during deploy/update bursts and makes retry behavior safer for downstream clients. Concretely, write-heavy flows to Vercel/Deno are throttled and retried using provider headers (`Retry-After`, `X-RateLimit-Reset`) with bounded delay, so quota pressure is managed instead of producing abrupt 500-level outcomes; this should continue to be monitored for side effects on heavy automation workflows and for false-positive throttling on legitimate high-throughput pipelines.
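A header-aware, bounded backoff delay of the kind described above might be computed like this. The sketch is illustrative only: `retryDelayMs`, `HeaderReader`, `BackoffOptions`, and the default values are hypothetical, and it assumes `X-RateLimit-Reset` carries an epoch timestamp in seconds, which the summary does not state.

```typescript
// Minimal header reader so the helper works with fetch's Headers or a stub.
interface HeaderReader {
  get(name: string): string | null;
}

// Hypothetical tuning knobs; the defaults are illustrative, not InsForge's.
interface BackoffOptions {
  baseMs?: number;
  maxMs?: number;
  nowMs?: number;
}

function retryDelayMs(
  headers: HeaderReader,
  attempt: number,
  options: BackoffOptions = {},
): number {
  const { baseMs = 500, maxMs = 30_000, nowMs = Date.now() } = options;
  // Keep every delay within [0, maxMs] so a bad header cannot stall a caller.
  const clamp = (ms: number): number => Math.min(Math.max(ms, 0), maxMs);

  // Retry-After is either a number of seconds or an HTTP date.
  const retryAfter = headers.get("Retry-After");
  if (retryAfter !== null && retryAfter.trim() !== "") {
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds)) return clamp(seconds * 1000);
    const date = Date.parse(retryAfter);
    if (!Number.isNaN(date)) return clamp(date - nowMs);
  }

  // Assumption: X-RateLimit-Reset is an epoch timestamp in seconds.
  const reset = headers.get("X-RateLimit-Reset");
  if (reset !== null && reset.trim() !== "") {
    const resetSeconds = Number(reset);
    if (Number.isFinite(resetSeconds)) return clamp(resetSeconds * 1000 - nowMs);
  }

  // No usable header: fall back to exponential backoff.
  return clamp(baseMs * 2 ** attempt);
}
```

A caller would sleep for the returned delay after each 429 and, once its retry budget is spent, surface a 429 rather than a generic error.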
InsForge updated write-path throttling so calls that exhaust retry budgets now return an explicit `AppError(429, RATE_LIMITED)` instead of being remapped to a generic internal 500 error, and moved per-category write limits (functions/deployments/compute) to a live S3-backed JSON config with hourly refresh and startup fallback defaults.
Contribution: Changes the failure behavior of write requests under heavy load by surfacing throttling as a dedicated 429 response and by externalizing write quotas to a hot-reloadable config, eliminating the previous silent mapping to 500 and enabling non-code quota updates.
Impact: API clients and platform operators now get a clear throttling signal when write requests are over quota, so clients can back off instead of looping on opaque 500s and operators can tune write limits without redeploying. Technically, this aligns deno-subhosting retry-exhaustion handling with Vercel-style `RATE_LIMITED` signaling and adds a function-form rate-limit source that reads and refreshes per-category caps from S3 config; teams should continue monitoring sustained 429 trends, malformed/missing config behavior, and whether cap updates are applied on the next request as expected.
This PR introduces the `scripts/aidlc-codereviewer` package and `aidlc-code-reviewer` CLI, replacing manual, disconnected review steps with a single workflow that runs configured static analyzers plus AI critique and generates linked summary, technical, and business-logic reports for AIDLC code.
Contribution: Added a unified review stack (runner, common, agent, and tools layers) that orchestrates static checks, AI-assisted code/logic evaluation, and report generation in one command, including auto-generation and caching of tool wrappers and severity mapping that reserves HIGH/CRITICAL for security findings only.
Impact: Developers reviewing AIDLC-generated code can now run one command and get both technical and business-logic review outputs in one pass, which should reduce missed defects and manual review overhead before code moves further in the pipeline. The implementation ties built-in analyzers (bandit, ruff, mypy, radon, vulture) to Bedrock-powered agents and emits linked HTML/Markdown artifacts. Teams should monitor Bedrock permission/setup stability, auto-wrapper generation reliability for newly configured tools, and whether severity filtering continues to prevent non-security tools from surfacing HIGH/CRITICAL findings.
This PR hardens Codex MCP export by rebuilding Codex output from a stripped MCP server view and then re-merging only valid, Codex-specific `envVars`, so rulesync-only source fields are no longer emitted into Codex configuration. The change also rejects malformed `env_vars`, `enabled_tools`, and `disabled_tools` inputs and validates environment-variable names, which prevents invalid metadata from corrupting Codex-facing settings.
Contribution: Implemented a Codex-specific configuration path that strips unsupported fields before emission, filters and validates `envVars` against the Codex schema, and ignores malformed import arrays for `env_vars`/`enabled_tools`/`disabled_tools`. This concrete cleanup prevents non-Codex metadata from entering Codex output and is covered by regression tests for keeping `env_vars` while removing `targets`, `description`, and `exposed` in generated output.
Impact: Developers exporting MCP rules to Codex will get cleaner, less error-prone Codex configs, reducing integration friction and startup/configuration failures caused by unsupported metadata being included in generated output. The key next checks are whether stricter field filtering breaks any custom but still-valid legacy fields in user workflows and whether similar leaking appears in other MCP generators since the PR intentionally deferred a broader cross-generator contract pass.
The PR updates the evaluator dependency group by upgrading semgrep from 1.162.0 to 1.163.0 in awslabs/aidlc-workflows, which changes the code-scanning pipeline behavior toward faster startup and parsing, rather than introducing a new model or runtime feature.
Contribution: Upgraded semgrep in the evaluator dependency set (1.162.0 -> 1.163.0), which brings upstream changes for rule loading/validation and parsing: parallel rule validation/parsing for large rule sets, dependency-aware validation moved to core, faster JSON-based transitive reachability parsing, and reduced memory/false-positive pressure in name-resolution flows.
Impact: Teams running CI or local scans through the awslabs/aidlc-workflows evaluator should see scan jobs start faster and return results sooner on large rule sets, so engineers get quicker feedback and less idle wait time on lint/security checks. The upgrade’s mechanism is semgrep’s parallelized rule validation/parsing and core-side dependency-aware validation (plus the transitive-reachability JSON parse-path fix); keep watching for behavior shifts in rule-config failures or finding patterns, especially in Java/Kotlin/Scala projects, where name-resolution changes can alter diagnostics.
The PR fixes a crash path in `PromptHelper.truncate` and `ChatPromptHelper` by adding empty-input guards so `[]` inputs return an empty result instead of reaching `_get_available_chunk_size`, which previously divided by zero.
Contribution: Introduced a concrete correctness fix by short-circuiting three prompt-helper methods on empty sequences (`[]`) and preventing integer division by zero in `_get_available_chunk_size`; also added six targeted unit tests for empty-input and non-empty-input behavior.
Impact: Developers using llama_index with dynamic prompts or chat batches that can occasionally become empty will stop seeing hard failures in truncation/repack flows, so inference pipelines keep processing requests as no-op cases instead of aborting with runtime errors. This is implemented by returning `[]` immediately in `PromptHelper.truncate`, `ChatPromptHelper.atruncate`, and `ChatPromptHelper.arepack` when inputs are empty, which bypasses the buggy division step; monitor for any downstream code that previously depended on an exception for empty lists or that assumes a non-empty return contract.
The auth service changed /api/v1/auth/setup-status from a hard per-IP 60-second cooldown that returned HTTP 429 to most requests into a cached-result flow, which is important because it removes a real login breakage pattern where multiple browser tabs hammering the endpoint at once all receive spurious errors. This keeps the endpoint usable during reconnect storms while still limiting costly `count_admin_users` checks to once per IP per 60 seconds.
Contribution: Added a per-IP TTL response cache for setup-status so repeated requests within the cooldown window return the previously computed status instead of repeatedly triggering 429 throttling, preserving the original DB-throttling intent without blocking users.
Impact: Users logging in from multiple tabs or after a service restart are less likely to get blocked by fake rate-limit errors, so authentication can continue instead of stalling before the UI recovers. Technically, the endpoint now serves cached setup-status responses during the 60-second window and only executes the admin-count query once per IP in that period; operators should watch for stale auth-status cache behavior when admin-user data changes, and verify cache invalidation and keying do not create cross-user sharing or missed updates.
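The per-IP TTL cache described above can be illustrated with a small sketch. It is not the auth service's implementation: `TtlCache` and `getOrCompute` are hypothetical names, the compute callback stands in for the `count_admin_users` check, and the clock is passed in explicitly only to keep the example deterministic.

```typescript
// Hypothetical per-key TTL cache: the expensive check runs at most once
// per key per TTL window, and other callers get the cached result.
class TtlCache<V> {
  private readonly ttlMs: number;
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  getOrCompute(key: string, compute: () => V, now: number = Date.now()): V {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > now) {
      return hit.value; // still fresh: serve the cached status
    }
    const value = compute(); // expired or missing: recompute and store
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
    return value;
  }
}
```

Keying by client IP with a 60,000 ms TTL would match the window in the summary. A production version would also need to evict old keys and decide whether to invalidate entries when admin-user data changes.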
This pull request adds a new `databricks-mlflow-ml` skill that documents and standardizes the classic ML lifecycle on Databricks (train, register to Unity Catalog, batch score), closing a gap not covered by existing MLflow, serving, or Unity Catalog skills.
Contribution: Introduces a single reusable skill artifact (`SKILL.md` plus `references`, `GOTCHAS.md`, and `patterns-*.md`) focused on classic ML, with concrete workflow guidance for UC registration and batch inference, including explicit fixes for known breakages (for example, UC volume artifact location, registry URI targeting, alias-based promotion, and Lakeflow UDF placement).
Impact: Data scientists and ML operators running traditional ML on Databricks can now follow one validated, step-by-step playbook from training to batch scoring in Unity Catalog, which should reduce wasted time on recurring issues like models silently going to the wrong registry and scoring paths failing in Lakeflow pipelines. The skill’s added docs and examples pin this down with tested conventions for `mlflow.set_registry_uri('databricks-uc')`, UC `artifact_location` requirements, `@champion/@challenger` alias-based promotion, and `mlflow.pyfunc` loading patterns, and it also enforces correct dependency installation (`mlflow[databricks]`) outside managed clusters. Continue watching for residual failures from older 2.x DBR behavior (for example `artifact_path` usage) and from environments that omit the required UC extras.
InsForge v2.1.6 changes storage download links so release artifacts are served with an appended `?v=<etag>` parameter. This makes CDN caching aware of updated content and reduces stale artifact delivery after updates.
Contribution: Introduced versioned storage URLs by adding an ETag-derived `v` query parameter to download endpoints, so cache keys change when object content changes and updated artifacts are fetched reliably.
Impact: Deployment operators and services that consume InsForge-hosted files are less likely to pull stale artifacts from CDN caches after new uploads, reducing mixed-version rollouts and the resulting runtime inconsistencies. The update works by appending `?v=<etag>` to download URLs, forcing cache refresh behavior on cache layers keyed by query string; teams should watch for path-only caching proxies or clients that drop query parameters, because those paths can still mask fresh content and keep serving old copies.
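As a rough illustration of ETag-derived cache busting, the sketch below appends a `v` parameter to a download URL. `versionedUrl` is a hypothetical name, and stripping the weak-validator prefix and surrounding quotes from the ETag is an assumption of this sketch, not something the release notes specify.

```typescript
// Hypothetical helper: add ?v=<etag> so caches keyed by query string
// treat changed content as a new object.
function versionedUrl(rawUrl: string, etag: string): string {
  const url = new URL(rawUrl);
  // Assumption: drop the weak-validator prefix (W/) and the quotes.
  const version = etag.replace(/^W\//, "").replace(/"/g, "");
  url.searchParams.set("v", version); // preserves any existing parameters
  return url.toString();
}
```

Because the parameter only changes when the object's ETag changes, unchanged files keep their cached copies while new uploads get a fresh cache key.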
In v1.26.1, `@nanocollective/get-md` is upgraded so `node-llama-cpp` is only an optional peer dependency, which removes the transitive `node-llama-cpp` download path and deletes about 500 MB of platform-native binaries from default installs and the Nix closure.
Contribution: Converted the dependency chain to avoid a hard transitive requirement on `node-llama-cpp`, so the package no longer drags in ~500 MB of platform-specific native artifacts during install while preserving `fetch_url` behavior.
Impact: People installing or packaging nanocoder now avoid downloading and storing hundreds of megabytes of unnecessary native binaries, which makes install/build footprints smaller and reduces practical risks like slow CI jobs and disk-pressure failures; watch whether any downstream scripts still assume `node-llama-cpp` is present transitively despite it becoming optional. The change is implemented by upgrading `@nanocollective/get-md` to `^1.4.0` and switching `node-llama-cpp` to an optional peer dependency, and `fetch_url` continues using the standard HTML-to-Markdown path that does not need the removed converter.
The v3.3.0 release introduces explicit zip-slip and path traversal protections in the application flow, tightening handling of extracted or processed archives so crafted file paths cannot escape expected directories.
Contribution: Added path validation in file/archive handling to block traversal payloads from resolving outside the intended base path, directly preventing directory-escape behavior during model/import and update workflows.
Impact: Open Cowork operators and users get safer installs and model-update flows because malicious archive inputs are less likely to write files outside the app workspace, reducing practical risk of local file corruption or unauthorized file placement. The checks sit in the file-handling path; teams should watch for legitimate packages that fail under the stricter path rules and review any reports of traversal attempts or blocked edge-case archives.
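A common way to implement this kind of zip-slip check is to resolve each archive entry against the extraction root and reject anything that lands outside it. The sketch below shows that general technique with Node's `path` module. `resolveInside` is a hypothetical name, and the release notes do not say that Open Cowork's check works this way.

```typescript
import * as path from "node:path";

// Hypothetical guard: return the absolute target path for an archive entry,
// or throw if the entry would escape the extraction directory.
function resolveInside(baseDir: string, entryName: string): string {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, entryName);
  const rel = path.relative(base, target);
  // A relative path that climbs out of base starts with "..";
  // an absolute result means a different root or drive.
  const escapes =
    rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (escapes) {
    throw new Error(`Blocked path traversal in archive entry: ${entryName}`);
  }
  return target;
}
```

Entries such as `docs/readme.md` resolve normally, while `../../etc/passwd` or an absolute path throws before any file is written. A full extractor would also need to consider symlinks, which this sketch does not cover.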
Aider’s v0.86.0 release extends supported models to include GPT-5 families plus Grok-4, Gemini 2.5 Flash Lite (preview), and Kimi-k2 through existing model integration paths.
Contribution: Expanded Aider’s model compatibility list so these new model families are accepted in normal chat/model-selection usage, enabling users to run against GPT-5, Grok-4, Gemini, and Kimi-k2 without adding separate integration work.
Impact: Developers and operators can now switch Aider usage to newer models (including GPT-5, Grok-4, Gemini 2.5 Flash Lite, and Kimi-k2) within the same workflow, which can improve model choice for quality/cost balance without changing tooling; watch for provider-specific auth changes, quota/rate-limit behavior, and any model-output or tool-call differences that could surface only during early production use.