What Changed: Serena could cache file contents in `open_file_buffers` and reuse them without re-reading from disk, so `replace_content` might rewrite a file from outdated content after external edits. This is a correctness risk in MCP-based coding sessions; the reported impact includes silent deletion of 388 lines of user code. The key fix is to invalidate cached buffers when the on-disk file changes.
Why It Matters: In multi-tool development workflows, files edited outside Serena (for example by Claude Code Edit or git operations) can be silently overwritten by Serena’s edit tools, so operators can lose real edits while seeing a successful `OK`. With re-read-or-invalidate logic, mixed-tool edits become safer, but teams should still monitor for missed changes on filesystems with coarse timestamp resolution and add checks for near-concurrent edit races.
Final score 85 | Confidence 97 | 1 evidence item
Serena, replace_content, open_file, open_file_buffers, EditedFileContext.get_original_content, self._encoding, cache invalidation, file mtime
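To make the invalidation idea in the Serena item concrete, here is a minimal sketch of a buffer cache that re-reads a file when its on-disk metadata changes. It is not Serena’s actual code: `BufferCache` and `CachedBuffer` are hypothetical names, and comparing size alongside mtime is an added assumption meant to soften the coarse-timestamp caveat mentioned above.

```python
import os
from dataclasses import dataclass


@dataclass
class CachedBuffer:
    text: str
    mtime_ns: int
    size: int


class BufferCache:
    """Hypothetical stand-in for a cache such as `open_file_buffers`."""

    def __init__(self) -> None:
        self._buffers: dict[str, CachedBuffer] = {}

    def read(self, path: str) -> str:
        st = os.stat(path)
        cached = self._buffers.get(path)
        # Reuse the buffer only if mtime and size both still match the disk.
        if (
            cached is not None
            and cached.mtime_ns == st.st_mtime_ns
            and cached.size == st.st_size
        ):
            return cached.text
        # Otherwise the file changed (or was never cached): re-read it.
        with open(path, encoding="utf-8") as f:
            text = f.read()
        self._buffers[path] = CachedBuffer(text, st.st_mtime_ns, st.st_size)
        return text
```

A same-size edit that lands within one timestamp tick would still be missed by this check, which is why the monitoring advice above remains relevant.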
What Changed: The burst is dominated by a security hardening change that upgrades mistune, mako, and axios to patched versions, directly addressing CVE-2026-44897, CVE-2026-44307, and CVE-2026-42264 and reducing known attack surface in request and rendering paths.
Why It Matters: Developers and operators of OpenHands services now face fewer known security exposure points in Markdown parsing, template rendering, and external API request flows, which lowers the immediate risk of exploitation from crafted input during normal deployment. Teams should still run regression checks for Markdown/template behavior and HTTP client compatibility after rollout to catch any compatibility or policy-impacting changes.
Final score 84 | Confidence 96 | 1 evidence item
mistune 3.2.1, mako 1.3.12, axios 1.15.2, CVE-2026-44897, CVE-2026-44307, CVE-2026-42264
What Changed: Aider’s OpenRouter OAuth flow in `aider/onboarding.py` is updated so the credential file is no longer created with default permissive permissions: it now hardens `~/.aider` to `0o700`, writes `oauth-keys.env` using explicit `0o600` creation flags, and corrects the mode on existing token files after the write.
Why It Matters: On shared Linux/macOS systems, developers and operators using OpenRouter through Aider are less exposed to local token theft because the saved API key is no longer left world-readable. The fix enforces owner-only permissions (`0o700` for `~/.aider`, `0o600` for `oauth-keys.env`) by creating the file via `os.open` with an explicit mode and applying `os.chmod` on existing paths, which removes the `open()` + default-umask exposure and TOCTOU window. Non-Unix platforms retain prior behavior via caught `chmod` exceptions. Watch whether multi-user or containerized workstations still have legacy insecure `~/.aider` permission states after upgrading.
Final score 83 | Confidence 99 | 1 evidence item
aider/onboarding.py, OpenRouter OAuth, OPENROUTER_API_KEY, ~/.aider/oauth-keys.env, umask, open(), os.open, os.chmod, 0o600, 0o700
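The owner-only write pattern described in the Aider item can be sketched as follows. This is an illustration of the technique rather than the code in `aider/onboarding.py`; `write_private_file` is a hypothetical helper name.

```python
import os


def write_private_file(path: str, content: str) -> None:
    """Write `content` so that only the owning user can read it."""
    directory = os.path.dirname(path)
    os.makedirs(directory, mode=0o700, exist_ok=True)
    try:
        # Tighten a directory that already existed with looser permissions.
        os.chmod(directory, 0o700)
    except OSError:
        pass  # platforms without Unix permission semantics keep prior behavior

    # Create the file with 0o600 from the start instead of open() + chmod,
    # so there is no window in which it exists with umask-default permissions.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(content)

    try:
        # The mode passed to os.open only applies on creation; fix old files.
        os.chmod(path, 0o600)
    except OSError:
        pass
```

The final `os.chmod` matters because a token file created by an earlier version keeps its original mode when it is merely truncated and rewritten.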
What Changed: This change introduces a new Next.js middleware and shared host validator that enforce an allowlist on every `/api/*` request, so API handlers such as convert and deploy can only be called from loopback by default (`127.0.0.1`, `localhost`, `::1`) unless operators explicitly extend the host list or intentionally disable the gate via environment variables.
Why It Matters: Local users of html-anything are less exposed to silent drive-by compromises because API calls from attacker-controlled hostnames are now blocked before they can trigger dangerous actions, such as remote code-execution prompts to local agent CLIs or unauthorized Vercel-token writes. Operationally, if a user visits a malicious page while a dev server is running, the server now returns 403 for untrusted `Host` headers instead of spawning commands with skip-permission flags or accepting token swaps. Watch for two follow-ups: whether reverse-proxy setups correctly rewrite or validate `Host`, and whether anyone enables `HTML_ANYTHING_ALLOW_ANY_HOST=1` outside trusted boundaries, because that opt-out restores exposure to the same attack class.
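The allowlist check in the html-anything item is implemented in Next.js middleware; the sketch below only illustrates the validation logic, in Python for brevity. The function names and the `extra_hosts` / `allow_any` parameters are hypothetical stand-ins for the environment-variable controls the change describes.

```python
DEFAULT_ALLOWED_HOSTS = frozenset({"127.0.0.1", "localhost", "::1"})


def hostname_from_host_header(host_header: str) -> str:
    """Strip the optional port from a Host header value."""
    host = host_header.strip().lower()
    if host.startswith("["):  # bracketed IPv6, e.g. "[::1]:3000"
        end = host.find("]")
        return host[1:end] if end != -1 else ""
    if host.count(":") == 1:  # "name:port" or "ipv4:port"
        return host.rsplit(":", 1)[0]
    return host  # bare name, IPv4 address, or unbracketed IPv6


def is_allowed_host(host_header, extra_hosts=(), allow_any=False) -> bool:
    """Return True if a request with this Host header may reach /api/*."""
    if allow_any:  # explicit opt-out: restores the original exposure
        return True
    if not host_header:
        return False
    allowed = DEFAULT_ALLOWED_HOSTS | {h.lower() for h in extra_hosts}
    return hostname_from_host_header(host_header) in allowed
```

A middleware built on a check like this would answer 403 whenever `is_allowed_host` returns False, before any handler runs.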
What Changed: The PR adds `maxDuration = 800` to the two MCP streaming routes (`/api/agent/mcp` and `/api/v2/agent/mcp`) to override Vercel’s default 300s timeout. This is a direct fix for the reported forced disconnects that repeatedly dropped Cursor AI coding sessions at ~5 minutes.
Why It Matters: Cursor users running MCP-based AI coding sessions should no longer see abrupt session drops around the 5-minute mark, so long-running coding interactions are far less likely to be interrupted mid-task. With `maxDuration` explicitly set to 800 seconds in `apps/api/src/app/api/agent/[transport]/route.ts` and `apps/api/src/app/api/v2/agent/[transport]/route.ts`, the handlers override Vercel’s default timeout. Teams should continue watching whether sessions still fail near or after 13 minutes and whether timeout-driven reconnect or retry behavior changes for these routes in production traffic.
Final score 83 | Confidence 99 | 1 evidence item
Vercel, maxDuration, WebStandardStreamableHTTPServerTransport, /api/agent/mcp, /api/v2/agent/mcp, Cursor MCP sessions
What Changed: The change updates Bedrock requests so that when callers do not supply `maxTokens`, the provider sends `inferenceConfig.maxTokens` from `model.maxTokens` instead of relying on the provider default, removing a silent 4096-token output cap that triggered `stopReason: "length"` on long Anthropic Claude generations.
Why It Matters: Developers and operators using this SDK to call Bedrock no longer get long responses cut off mid-task at about 4096 tokens when they forget to pass `maxTokens`, so multi-thousand-token outputs (for coding, writing, and other long-form tasks) can complete more reliably. Previously, a missing `maxTokens` let Bedrock enforce a server-side default cap that caused `stopReason: "length"`; the fix now applies the model’s declared token limit by default. Teams should watch for output-cost and latency growth on long generations and verify that new or updated models still expose correct `maxTokens` values.
Final score 83 | Confidence 98 | 1 evidence item
Amazon Bedrock, inferenceConfig.maxTokens, model.maxTokens, Anthropic Claude Opus 4.7, stopReason=length, streamBedrock, e2e test
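The fallback described in the Bedrock item reduces to a small precedence rule: an explicit caller value wins, otherwise the model’s declared limit is sent. The sketch below assumes only the `maxTokens` key named in the item; `build_inference_config` and its parameters are hypothetical names, not the SDK’s API.

```python
from typing import Optional


def build_inference_config(
    requested_max_tokens: Optional[int],
    model_max_tokens: Optional[int],
) -> dict:
    """Choose the maxTokens value to send, preferring the caller's setting."""
    config: dict = {}
    if requested_max_tokens is not None:
        max_tokens = requested_max_tokens
    else:
        # Fall back to the model's declared limit rather than leaving the
        # field unset and inheriting the provider-side default cap.
        max_tokens = model_max_tokens
    if max_tokens is not None:
        config["maxTokens"] = max_tokens
    return config
```

If the model metadata carries no limit either, the field is simply omitted, which is why the item recommends verifying that model definitions expose correct `maxTokens` values.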
What Changed: PeonPing fixed Windows hook launch failures caused by unquoted path arguments in PowerShell `Start-Process -ArgumentList` calls, where `%USERPROFILE%` values containing spaces were split into truncated paths, so child scripts exited with errors while parents still reported success.
Why It Matters: Windows users whose account or profile path contains spaces will get hook audio and desktop notifications again instead of silent failures, so reminders and event alerts become visible and audible during normal operation. The fix preserves the full script path when spawning detached PowerShell children. As a next step, check for remaining unquoted path arguments in other detached `Start-Process` call sites, because the parent process still often does not inspect child exit status or stderr.
Final score 82 | Confidence 99 | 1 evidence item
PowerShell, Start-Process, -ArgumentList, -File, PeonPing hook scripts, Windows user profile path
What Changed: A reported issue in memsearch showed that after indexing, `memsearch search` can fail with `MilvusException` 101 (`Collection 'memsearch_chunks' is in state 'released'`) on newer `pymilvus`/`milvus-lite` versions because `_ensure_collection` in `memsearch/store.py` never loads the collection after attach/create. The proposed fix is to call `client.load_collection(collection_name)` in both collection branches (the `has_collection` path and the create path), restoring normal hybrid search behavior on current client versions.
Why It Matters: Users running memsearch indexing pipelines on modern `pymilvus`/`milvus-lite` installations can resume `search` immediately after indexing without hard failures, which removes a production break where retrieval stopped working and required manual dependency workarounds. Newer client versions no longer auto-load collections, so the fix aligns memsearch with the explicit load requirement. Teams should keep watching for environments where `load_collection` intermittently fails, because the patch handles load errors with a broad `try/except` that could mask retry-worthy root causes.
Final score 82 | Confidence 94 | 1 evidence item
memsearch, pymilvus, milvus-lite, MilvusException, _ensure_collection, client.load_collection
What Changed: This change fixes a handoff/compaction bug where a prior assistant tool call is summarized away, leaving a `tool_result` with no visible `tool_use` anchor and causing Anthropic API 400 failures on resume. The PR adds a second pass in `transformMessages` to track surviving `tool_use` IDs, drop truly orphaned `tool_result` entries, and preserve their payload in a synthesized `<stale-tool-result>` developer message so context is not silently lost.
Why It Matters: Operators running `oh-my-pi` resume/handoff flows with tool calling can recover previously wedged sessions, because retries no longer loop on the same malformed payload and no longer need repeated manual intervention. The PR converts orphaned outputs into structured developer notes while preserving their content for auditability, and it leaves legitimate deferred or aborted tool-call pairs untouched. Teams should still monitor for unexpected drops of delayed tool results if future compaction transforms alter message ordering or summarization behavior.
Final score 82 | Confidence 98 | 1 evidence item
Anthropic API, handoff compaction, tool_use, tool_result, transformMessages, stale-tool-result
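The two-pass repair described in the handoff/compaction item can be sketched on a simplified message list. The message and block shapes below are assumptions chosen for illustration, as are the function name and the closing `</stale-tool-result>` tag; the real `transformMessages` implementation is not shown in the source.

```python
def drop_orphaned_tool_results(messages: list) -> list:
    """Replace tool_result blocks whose tool_use was compacted away."""
    # Pass 1: collect the IDs of every tool_use block that survived.
    surviving = set()
    for message in messages:
        for block in message["content"]:
            if block["type"] == "tool_use":
                surviving.add(block["id"])

    # Pass 2: keep anchored results, turn orphans into developer notes.
    repaired = []
    for message in messages:
        kept, stale = [], []
        for block in message["content"]:
            is_orphan = (
                block["type"] == "tool_result"
                and block["tool_use_id"] not in surviving
            )
            (stale if is_orphan else kept).append(block)
        if kept:
            repaired.append({**message, "content": kept})
        for block in stale:
            note = f"<stale-tool-result>{block['content']}</stale-tool-result>"
            repaired.append(
                {"role": "developer", "content": [{"type": "text", "text": note}]}
            )
    return repaired
```

This simplification treats a `tool_use` anywhere in the history as an anchor; a stricter version would also check that it precedes its result.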
What Changed: Engram introduces read-side recall telemetry and promotion for stored observations by adding `recall_count` and `last_recalled_at`, then exposing a `/promoted` API and a `mem_promoted` tool so high-recall observations can be surfaced by frequency.
Why It Matters: Agents and tool integrators using Engram get faster access to the memories that are actually reused most often, improving context quality and reducing time spent re-surfacing stale or irrelevant observations during session startup. The system now tracks recall frequency on reads (`Search()` and `GetObservation()`), stores it in `recall_count`/`last_recalled_at`, and exposes promoted results via `/promoted` and `mem_promoted`, so consumer code can implement memory promotion policies without scanning all observations. Continue monitoring whether automated traffic inflates recall metrics, whether fire-and-forget increments silently miss updates, and whether promotion thresholds (`min_recalls`, `limit`) select useful context instead of noisy repeats.
Final score 82 | Confidence 92 | 1 evidence item
recall_count, last_recalled_at, PromotedObservations(), GET /promoted, mem_promoted MCP tool
What Changed: In PR #2499, the Codex adapter now removes `suppressOutput` from the serialized hook result so Codex CLI no longer receives a Claude-only field. This fixes the regression where every affected hook call could fail with `PostToolUse hook returned unsupported suppressOutput` when hooks shared a common `HookResult` object.
Why It Matters: Developers and operators using Codex through claude-mem will see tool/hook calls stop failing on unsupported output fields, so automated flows can continue without unexpected CLI aborts. The patch enforces a clean output contract in `src/cli/adapters/codex.ts` by filtering the Claude-only `suppressOutput` flag before stdout emission. The key follow-up is to keep monitoring other adapter changes for similar leakage of unsupported hook fields that could reintroduce CLI hard failures.
Final score 82 | Confidence 99 | 1 evidence item
Codex CLI, suppressOutput, buildBaseOutput(), src/cli/adapters/codex.ts, HookResult, PostToolUse
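The adapter-side filtering in the claude-mem item amounts to dropping fields the target CLI does not accept before serializing. The real adapter is TypeScript; this Python sketch only shows the shape of the idea, and `CLAUDE_ONLY_FIELDS`, `serialize_for_codex`, and the `continue` key in the usage note are hypothetical.

```python
import json

# Fields that only the Claude hook contract understands (per the item above).
CLAUDE_ONLY_FIELDS = frozenset({"suppressOutput"})


def serialize_for_codex(hook_result: dict) -> str:
    """Serialize a shared hook result without Claude-only fields."""
    cleaned = {
        key: value
        for key, value in hook_result.items()
        if key not in CLAUDE_ONLY_FIELDS
    }
    return json.dumps(cleaned)
```

For example, `serialize_for_codex({"continue": True, "suppressOutput": True})` emits only the `continue` key, so a shared result object can keep the flag for other adapters.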
What Changed: Issue #189 identifies a concrete reliability bug where cc-connect’s shared agent-read loop in `codexSession`/`claudeSession` uses Go `bufio.Scanner` for stdout parsing, so single-line outputs at or above roughly 64KB (e.g., JSON/JSONL) trigger `token too long` and terminate the session.
Why It Matters: It matters because repeated evidence-backed changes help separate durable movement from noisy update streams.
Final score 82 | Confidence 94 | 1 evidence item
cc-connect, codexSession, claudeSession, agent stdout, Go bufio.Scanner, readLoop, token too long
What Changed: This PR fixes a deterministic CI regression by updating the slack-channel sync block in `sources.yaml` to stop excluding `*.test.ts` and `*.spec.ts` and to explicitly include `features/**`, so the vendored `jeremylongshore/claude-code-slack-channel` now keeps its upstream test files while still syncing correctly.
Why It Matters: Developers and CI operators now get actual slack-channel plugin test results instead of a spurious failure caused by missing test files, so regressions in MCP plugin logic are less likely to be hidden and can be caught during normal CI runs. The root cause was a sync-rule mismatch in `sources.yaml`: the `exclude` rules dropped the tests while `bun test` still ran. The fix includes `features/**` and removes `*.test.ts`/`*.spec.ts` from the excludes. Continue monitoring whether other MCP plugin sync mirrors have similar rule drift and whether remaining short-circuit logic still hides failures in later plugin iterations.
Final score 82 | Confidence 98 | 1 evidence item
jeremylongshore/claude-code-plugins-plus-skills, jeremylongshore/claude-code-slack-channel, sources.yaml, bun test, sync-external.mjs, test (mcp-plugins) CI
What Changed: This change replaces five wildcard `NOPASSWD` sudoers entries (e.g., `useradd *`, `chpasswd`, `find *`) with a single `agor-user-admin` command path and routes privileged user, group, and symlink operations through that wrapper with strict validators, narrowing privileged-command exposure from broad shell-like sudo access to a constrained entry point.
Why It Matters: Operators and security teams can reduce the risk of privilege abuse from the agor daemon because user and password-management actions now go through a single audited root entry point instead of several unrestricted wildcard sudo commands that were vulnerable to flag, path, and password smuggling. This closes concrete attack surfaces around `useradd *`, `chpasswd`, and `find *`-style misuse while preserving normal functionality. Track any future privileged operation that might still call root tools directly or drift out of the wrapper’s allowlist.
Final score 82 | Confidence 95 | 1 evidence item
sudoers, agor-user-admin, AGOR_USER_ADMIN, wrapper validators, readlink -f, chpasswd, user/group/symlink management
What Changed: Agor now ensures commit identity is set consistently in PR environments by populating missing `GIT_COMMITTER_*` variables from configured `GIT_AUTHOR_*` values, instead of falling back to the executor host’s `~/.gitconfig`. This fixes author/committer splits that caused incorrect multiple PR authors, while preserving any explicitly configured `GIT_COMMITTER_*` settings.
Why It Matters: Developers and operators using Agor pipelines will see commits and PR metadata consistently attributed to the intended person, which reduces confusion from committer identities leaking in from the execution host and makes contribution history easier to audit. Under the hood, the environment resolver now falls back from missing committer variables to the provided author variables, so review workflows should stop showing split authorship. Watch whether any legacy jobs still depend on inheriting host `~/.gitconfig` committer settings.
Final score 82 | Confidence 97 | 1 evidence item
Agor, createUserProcessEnvironment, GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, GIT_COMMITTER_NAME, GIT_COMMITTER_EMAIL, ~/.gitconfig
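The committer fallback in the Agor item can be expressed as a small environment transform. The variable names are the standard git ones listed above; `fill_committer_identity` is a hypothetical helper, not the `createUserProcessEnvironment` code itself.

```python
def fill_committer_identity(env: dict) -> dict:
    """Copy GIT_AUTHOR_* into any GIT_COMMITTER_* variable that is unset."""
    result = dict(env)  # leave the caller's mapping untouched
    for field in ("NAME", "EMAIL"):
        author = result.get(f"GIT_AUTHOR_{field}")
        committer_key = f"GIT_COMMITTER_{field}"
        # Only fill gaps: an explicitly configured committer always wins.
        if author and not result.get(committer_key):
            result[committer_key] = author
    return result
```

With both pairs present in the child process environment, git has no reason to consult the host’s `~/.gitconfig` for the committer identity.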
What Changed: This PR adds `object` and `array` recipe parameter types and threads them through Goose’s existing build pipeline so template rendering can consume structured JSON values directly. Matching parameters are parsed from string inputs into `serde_json::Value` and rendered with a structured MiniJinja path, enabling native template access, iteration, and conditionals without changing the external `build_recipe_from_template` API surface.
Why It Matters: Recipe authors using goose-cli, goose-server, summon, or execute_commands can now write conditional and loop-heavy templates against nested signal fields (such as arrays of findings or severity metadata) without manually flattening inputs into single-string keys, which reduces template rewrites and integration friction for richer automation outputs. The implementation keeps existing callers compatible while parsing `input_type: object/array` payloads into structured values under `build_recipe`; non-structured parameters still use the legacy string path, so behavior remains stable for simple recipes. Watch for an increase in JSON-shape errors from producers, and ensure invalid nested payloads are caught early so template failures are debuggable before deployment.
Final score 82 | Confidence 97 | 1 evidence item
RecipeParameterInputType, build_recipe_from_template, MiniJinja, serde_json::Value, object, array
What Changed: This PR makes the live `/models` response authoritative for the Copilot model list so disabled or unavailable models no longer remain visible. It adds `filterModel` in `githubCopilotModelManagerOptions` to drop returned entries with `model_picker_enabled = false` or a non-enabled `policy.state`, and introduces `dynamicIsAuthoritative: true` in `ModelManagerOptions` so `resolveProviderModels` discards bundled `models.json` entries absent from live data before merge and cache write.
Why It Matters: Users and operators selecting Copilot models in oh-my-pi will stop seeing models they cannot actually run, so broken picks and `400 model_not_supported` failures caused by org-policy or account-setting mismatches should disappear once the live list is refreshed and cached. The change forces `resolveProviderModels` to treat live `/models` output as ground truth and removes stale entries from both in-memory and cold-start cache paths. Monitor for edge cases where temporary API truncation or schema changes in `policy` fields could over-prune valid models.
Final score 81 | Confidence 96 | 1 evidence item
GitHub Copilot, githubCopilotModelManagerOptions, filterModel, model_picker_enabled, policy.state, ModelManagerOptions, dynamicIsAuthoritative, /models API, resolveProviderModels, models.json
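The filter-then-merge behavior in the Copilot item can be sketched as follows. The actual code is TypeScript; this Python version is only a model of the logic, the entry shapes and the `id` key are assumptions, and treating the literal string `"enabled"` as the only enabled `policy.state` is a guess at what "non-enabled" means.

```python
def is_selectable(model: dict) -> bool:
    """Mirror of the described filter: hidden or policy-blocked models drop out."""
    if model.get("model_picker_enabled") is False:
        return False
    policy = model.get("policy")
    if policy is not None and policy.get("state") != "enabled":
        return False
    return True


def resolve_models(bundled: list, live: list, dynamic_is_authoritative: bool = True) -> list:
    """Merge bundled and live entries, optionally trusting only live data."""
    live_by_id = {m["id"]: m for m in live if is_selectable(m)}
    merged: dict = {}
    for model in bundled:
        # When live data is authoritative, bundled entries that the live
        # list does not confirm are discarded before the merge.
        if dynamic_is_authoritative and model["id"] not in live_by_id:
            continue
        merged[model["id"]] = model
    for model_id, model in live_by_id.items():
        merged[model_id] = {**merged.get(model_id, {}), **model}
    return list(merged.values())
```

The over-pruning risk named above is visible here: if `live` comes back truncated, every bundled entry missing from it disappears.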
What Changed: A single change in `firstLine()` fixes a correctness bug where slicing event subjects to `maxLen` could cut multibyte UTF-8 characters in half, writing invalid UTF-8 into `watcher_events.subject`. The function now backs up to the nearest valid boundary before storing subjects, with targeted tests added for UTF-8 edge cases (Cyrillic, em dash, emoji, and boundary alignment).
Why It Matters: Watchers and downstream bridges (including the Python sqlite poller) can now process events reliably without hitting UTF-8 decode crashes from poisoned `subject` rows, so forwarding no longer gets stuck re-reading the same bad row while the retry queue silently grows. In practice, truncation now lands on the nearest rune boundary (at most 3 trailing bytes removed per case), so event flow stays stable after non-ASCII subjects. Operators should still watch for other code paths that write `subject` values directly and for future changes to truncation length settings that might reintroduce boundary splitting.
Final score 81 | Confidence 99 | 1 evidence item
firstLine(), UTF-8 boundary trimming, watcher_events.subject, internal/watcher/webhook.go
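The boundary-safe truncation described in the `firstLine()` item is written in Go; the sketch below shows the same byte-level idea in Python. `truncate_utf8` is a hypothetical name, and the function assumes its input is valid UTF-8 and `max_len` is non-negative.

```python
def truncate_utf8(data: bytes, max_len: int) -> bytes:
    """Cut `data` to at most `max_len` bytes without splitting a character."""
    if len(data) <= max_len:
        return data
    cut = max_len
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx. If the first
    # excluded byte is a continuation byte, the cut falls inside a character,
    # so step back until the cut sits just before a lead byte.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]
```

Because a UTF-8 character is at most four bytes long, the loop steps back at most three times, matching the "at most 3 trailing bytes removed" bound stated above.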
What Changed: A single bug fix in adk-go PR #670 changes loadartifactstool to return artifact name lists as `[]any` (instead of `[]string`) and updates request handling to accept both `[]string` and `[]any` after a structpb round-trip, directly removing the protobuf serialization break that can block Vertex AI session persistence.
Why It Matters: Developers using Vertex AI sessions with `loadartifactstool` will avoid function-call response persistence errors, so tool invocations can continue and sessions are less likely to fail during normal operation. The root issue was a Go type mismatch between typed string slices and what protobuf struct conversion expects. Operators should still watch whether other tool response fields return typed slices (`[]string`, `[]int`, etc.), because those can reintroduce the same session-layer serialization failure.
Final score 81 | Confidence 96 | 1 evidence item
loadartifactstool.Run, structpb.NewStruct, ProcessRequest, []string, []any, Vertex AI session service
What Changed: The PR adds a startup guard for Ink-based runs that periodically clears Node performance entries so they no longer accumulate indefinitely in the global perf_hooks buffer during long-lived workflows.
Why It Matters: Long-running Nanocoder sessions with many subagent turns are less likely to hit hard crashes from `JavaScript heap out of memory`, so operators can keep automation sessions alive longer without manual restarts. The fix addresses a silent leak path: dev-mode React render timing calls and undici request timings produced performance entries that nothing in Nanocoder ever observed, so the global performance entry buffer grew without bound; it is now drained periodically. Monitor memory growth and crash incidence during very long runs to confirm the cleanup frequency remains sufficient under peak render/request load.
Discussion around the HN story converges on a single trend: AI coding helpers enable near-realtime iteration on code and UI changes, while the generated output can be low quality and create a maintenance burden, shifting developer effort from writing code to cleanup and review.
Contribution: The primary change is a workflow reframing: AI is being used more as an acceleration layer for exploration and scaffolding than as a reliable replacement for developers, with maintainability now depending heavily on human-led validation and refactoring.
Impact: Developers may ship prototypes and interface iterations faster, but teams that treat generated patches as production-ready will face more hidden fragility, refactors, and debt, so ops and engineering leaders should expect higher review overhead and longer stabilization cycles even if initial coding speed improves. The mechanism is that AI-generated solutions optimize for immediate completion and visible correctness, which can leave long-tail quality issues. Watch next for rising bug rates, rising maintenance load, and whether AI adoption reduces engineering headcount or simply increases the number of repositories, agents, and tasks each engineer must steward.
PR #662 introduces a copy-ready prompt intended to make installing and setting up context-mode easier for coding agents.
Contribution: Added a standardized, ready-to-use onboarding prompt that encapsulates the setup flow for context-mode, reducing the need to manually assemble install/configuration steps during agent-guided development.
Impact: Developers using coding agents can onboard context-mode more quickly with fewer setup errors, because a single copy-paste prompt handles the installation and initialization flow. The technical change is a new prompt template for agent workflows; operators should watch whether the prompt stays aligned with repository setup requirements over time and whether it works consistently across different coding-agent environments.
The main change is in the AI integration path: a default `max_tokens` value was set for Bedrock Claude requests, replacing implicit/unset behavior so generation length handling becomes deterministic by default.
Contribution: Introduces an explicit fallback `max_tokens` in the Bedrock Claude client configuration so calls no longer depend on undefined defaults, improving consistency of generated output sizing.
Impact: Developers and operators using the repo’s Bedrock Claude path will see more predictable completion lengths by default, which reduces surprise from inconsistent short responses and makes model behavior easier to manage in production and tests. This is achieved by applying a fixed default token cap when none is specified. Teams should monitor long-generation workflows, check any outputs that now approach the new default to avoid accidental truncation, and adjust overrides if needed.
This burst fixes a reliability gap in the MSBuild authoring skills by tightening how the Copilot plugin identifies and routes MSBuild review/fix requests, so the four core skills (target-authoring, property-patterns, item-management, extension-points) are selected more consistently and evaluated more accurately.
Contribution: Reworked the MSBuild skill definitions and eval harness by adding explicit MSBuild/.NET-only activation cues, explicit USE/DO NOT USE boundaries, domain-anchored prompts, and timeout/scenario adjustments that remove routing misses, overfitting behavior, and timeout-related evaluation failures.
Impact: Developers using dotnet/skills for Copilot-assisted MSBuild reviews are now more likely to get relevant, skill-specific guidance on build-file issues before merge, reducing the chance that dependency-chain or import mistakes slip into CI. The changes include shortening skill descriptions to fit SDK limits, clarifying activation scope, and improving diagnosis/fix prompts plus multi-file timeout budgets to cut false skips and premature failures. Watch for sustained activation rate after future skill additions and whether longer eval timeouts raise review latency or hide performance regressions.
The change unifies how the repository constructs file paths by cleaning up path-join usage across the codebase, replacing inconsistent ad hoc handling with consistent behavior to reduce fragile path-related bugs.
Contribution: Refactored path-join logic in all relevant code locations to use a single, consistent path handling pattern, directly addressing issue #4780 around path joining correctness in multi-location workflows.
Impact: Developers and operators using pi will see fewer crashes or broken file operations when workflows move across different mounts or filesystems, so path-based automation is less likely to fail mid-run and require manual reruns. The technical effect is a repository-wide standardization of path concatenation behavior (including cross-device cases), which should lower environment-specific inconsistencies; monitor for regressions in Windows/Unix separator handling, symlink and relative-path scenarios, and any external integrations that depended on previous inconsistent formats.
Centralized git-ref handling so both install and update paths share the same logic, then ensured existing source directories checkout the pinned ref and re-adding a source with a different ref replaces the prior settings entry.
Contribution: Introduced a shared `ensureGitRef` workflow across install/update operations, added directory reconciliation to re-checkout existing sources to the configured pinned ref, and changed settings replacement behavior to avoid stale ref metadata when a source is re-added with a new ref.
Impact: Developers and operators running the coding-agent source management flow will get consistently checked-out code after reinstalls or ref switches, so environments are less likely to run against an unintended revision or suffer the silent drift that causes hidden build/test mismatches. Monitor whether any existing callers still skip the shared ref-sync path, as those cases could still leave stale workspaces after ref updates.
The change updates Bedrock requests so that when callers do not supply `maxTokens`, the provider sends `inferenceConfig.maxTokens` from `model.maxTokens` instead of relying on the provider default, removing a silent 4096-token output cap that triggered `stopReason: "length"` on long Anthropic Claude generations.
Contribution: Added a concrete fallback in the Bedrock provider request builder to populate `inferenceConfig.maxTokens` from `model.maxTokens` when user options omit `maxTokens`, then validated the fix with a real end-to-end test that reproduces and confirms removal of the length-stop truncation.
Impact: Developers and operators using this SDK to call Bedrock no longer get long responses cut off mid-task at about 4096 tokens when they forget to pass `maxTokens`, so multi-thousand-token outputs (for coding, writing, and other long-form tasks) can complete more reliably. Previously, missing `maxTokens` let Bedrock enforce a server-side default cap that caused `stopReason:"length"`; the fix now applies the model’s declared token limit by default, and teams should watch for output-cost/latency growth on long prompts and verify new or updated models still expose correct `maxTokens` values.
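The fallback described in this entry can be sketched roughly as follows. This is a minimal Python illustration, not the SDK's actual request builder (the summary does not show it). The `Model` dataclass, `build_inference_config`, and the shape of the options dictionary are hypothetical; only the `maxTokens` field names come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Model:
    """Hypothetical model descriptor; only maxTokens matters for this sketch."""
    id: str
    maxTokens: int


def build_inference_config(model: Model, options: Optional[dict] = None) -> dict:
    """Return an inferenceConfig-like dict, falling back to the model's limit."""
    options = options or {}
    max_tokens = options.get("maxTokens")
    if max_tokens is None:
        # Caller omitted maxTokens: send the model's declared limit rather
        # than leaving the field unset and inheriting a server-side default.
        max_tokens = model.maxTokens
    return {"maxTokens": max_tokens}
```

An explicit caller value still wins under this sketch; the fallback only applies when `maxTokens` is absent.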
In issue #9255, the closed change centers on a single fix: when running CodeActAgent in CLIRuntime, OpenHands should switch to dedicated CLI prompts that remove browser-action guidance and direct the agent to command-line/file-based workflows.
Contribution: Introduced a CLI-specific prompt path (system/user prompt templates plus mode detection) for CodeActAgent so prompt selection is based on runtime capability, preventing the browser-action guidance from being shown when browsing is unavailable.
Impact: CLI users of OpenHands will stop seeing the agent suggest web-browser workflows that cannot execute, so interactions like image or web-content tasks will use shell/file/Python-oriented steps instead of failing unexpectedly. This directly cuts the “browser functionality is not implemented” failure path and reduces time spent in misleading prompt loops in command-line sessions. Next, watch whether runtime detection reliably switches prompt sets in all CLI entry paths and whether the new CLI templates stay up to date with supported local tool guidance.
Release v0.13.10 changes workspace behavior so mode selection is automatic from the currently open file tabs, removing the manual selector and eliminating reliance on stale persisted-mode state.
Contribution: Implemented context-aware workspace mode switching that infers conversation vs. fusion mode from tab state, and removed the legacy manual mode toggle plus persisted-mode file handling that could leave the UI in an outdated mode.
Impact: Developers can keep working through file-to-file transitions without manually toggling workspace mode, which reduces interruptions and mistaken mode usage during normal coding and chat workflows. The change now depends on live tab context instead of stored mode state; teams should monitor whether complex multi-tab sessions trigger incorrect auto-switches and whether users need a manual override path for cases where a fixed mode is still required.
Superset adds an agent-only verification pipeline that runs tsgo typecheck, Biome lint/format, and fallow audit checks when Claude Code or Codex sessions stop or attempt commit/push, blocking agent commits on failures. The new hooks are explicitly scoped by agent-session detection, so normal human git commits are not automatically gated by this path.
Contribution: Implemented a concrete pre-commit quality gate for AI-generated code by wiring typecheck, lint/format, and dead-code/boundary audits into agent lifecycle hooks and commit/push flow, preventing low-quality agent changes from proceeding without verification.
Impact: Teams using AI coding agents will see fewer low-quality agent patches (type errors, lint drift, or unnecessary/dead module changes) reaching manual review, which reduces cleanup cycles and the chance of broken or mis-scoped code being merged later. The new flow adds agent-only guardrails at stop/PreToolUse events and on commit/push, while leaving human commits unaffected to avoid workflow friction. Continue watching whether the agent-session detection correctly identifies all agent invocation modes, whether fallow audit has false positives in monorepo edits, and whether Codex/Claude hooks fire reliably in real sessions.
Codeg now fixes a permission-handling bug where the assistant could stay in a waiting animation after a permission decision, especially across local windows, reconnects, and auto-approval flows. The release adds a `PermissionResolved` event with `request_id` so the exact pending permission is retired immediately after response handling.
Contribution: Emit `AcpEvent::PermissionResolved { request_id }` from both permission-response branches and consume it in the pet snapshot, session-state snapshot, and frontend connection store so the corresponding pending entry is removed by id right away. This replaces the prior implicit wait-for-next-turn behavior that could pin the UI in Waiting and delay follow-up actions.
Impact: Users approving actions can continue working without the assistant appearing frozen in a waiting state, which reduces false stalls in normal chat, multi-window, and reconnect scenarios and avoids manual intervention to recover stuck permission UI states. Watch for any remaining permission paths that do not include request-id correlation, and monitor reconnect edge cases for late or duplicate events that could still clear the wrong pending permission.
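To illustrate the remove-by-id idea in this entry, here is a small Python sketch. It is not Codeg's code: `ConnectionStore` and its `pending` mapping are hypothetical stand-ins for the stores named above, and `PermissionResolved` is modeled as a plain dataclass carrying `request_id`.

```python
from dataclasses import dataclass, field


@dataclass
class PermissionResolved:
    """Stand-in for the event described above; carries the resolved request id."""
    request_id: str


@dataclass
class ConnectionStore:
    """Hypothetical store tracking pending permission requests by id."""
    pending: dict = field(default_factory=dict)

    def handle(self, event: PermissionResolved) -> None:
        # Remove only the matching entry. Unknown or duplicate ids are ignored,
        # so a late or repeated event cannot clear a different request.
        self.pending.pop(event.request_id, None)

    @property
    def waiting(self) -> bool:
        # The UI would show the waiting state only while entries remain.
        return bool(self.pending)
```

Keying removal on the id, rather than clearing "the current" pending entry, is what keeps duplicate or late events harmless in this sketch.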
In PR #2499, the Codex adapter now removes `suppressOutput` from the serialized hook result so Codex CLI no longer receives a Claude-only field. This fixes the regression where every affected hook call could fail with `PostToolUse hook returned unsupported suppressOutput` when hooks shared a common `HookResult` object.
Contribution: Changed the Codex output serialization path to strip `suppressOutput` before printing hook output, while keeping the Claude Code adapter behavior intact, and added/updated hook-output tests to lock in the corrected payload shape.
Impact: Developers and operators using Codex through claude-mem will see tool/hook calls stop failing on unsupported output fields, so automated flows can continue without unexpected CLI aborts. The patch enforces a clean output contract in `src/cli/adapters/codex.ts` by filtering the Claude-only `suppressOutput` flag before stdout emission; the key follow-up is to continue monitoring other adapter changes for similar leakage of unsupported hook fields that could reintroduce CLI hard failures.
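The field-stripping step can be pictured with the short Python sketch below. The real adapter is TypeScript and is not shown in the summary, so `serialize_for_codex` and the `CLAUDE_ONLY_FIELDS` set are hypothetical names; only `suppressOutput` is taken from the text.

```python
import json

# Assumption: suppressOutput is the only Claude-only field that needs removal;
# the summary names no others.
CLAUDE_ONLY_FIELDS = {"suppressOutput"}


def serialize_for_codex(hook_result: dict) -> str:
    """Serialize a shared hook result without fields the Codex CLI rejects."""
    # Build a filtered copy so the shared result object stays untouched for
    # other adapters that still need the field.
    cleaned = {k: v for k, v in hook_result.items() if k not in CLAUDE_ONLY_FIELDS}
    return json.dumps(cleaned)
```

Filtering at serialization time, rather than mutating the shared result, keeps the Claude Code path unchanged in this sketch.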
This PR changes oh-my-pi so macOS completion and ask notifications are no longer sent through the OSC 9/99 terminal escape path that was often dropped by the OS, but are instead dispatched via `alerter` or `terminal-notifier` when present. It also removes the foreground-mode guard in `sendCompletionNotification`, which previously prevented notifications from firing during normal interactive use.
Contribution: The change introduces a concrete notification path fix: macOS now uses notifier binaries (with per-app notification permission) instead of OSC stdout escapes that were silently discarded, removes an incorrect foreground-blocking condition, and converts notification inputs to a structured options shape so completion/ask events can reliably carry title/body/group metadata for richer handling.
Impact: macOS users of oh-my-pi now receive completion and input-request notifications during normal interactive sessions instead of silent non-delivery, so operators and developers can reliably notice finished jobs and pending prompts without manually polling terminal output, and click-to-focus can return them to the originating session pane. This is implemented through `alerter`/`terminal-notifier` dispatch on Darwin and the removal of the foreground suppression guard; deployments should keep watching environments where no notifier binary is installed (feature becomes no-op with a logged warning) and whether terminal multiplexer passthrough remains configured correctly for click routing.
This change introduces a dedicated `omp setup codex` command path that standardizes Codex onboarding in oh-my-pi by supporting existing credential import from `~/.codex/auth.json`, automatic device-code OAuth login when credentials are missing, and explicit setup-mode controls.
Contribution: Introduces a first-class Codex credential onboarding command in `setup` that can import existing local Codex CLI auth, run fresh device-code OAuth when needed, and verify credential state via dedicated check modes, with documented usage for the end-to-end setup path.
Impact: Developers using oh-my-pi with OpenAI Codex can configure and switch into Codex-capable workflows with less manual credential plumbing, reducing setup friction and avoiding brittle hand-edited auth steps. The command now centralizes onboarding behavior into explicit, scriptable modes (`--from-codex`, `--device`, `--check`, and JSON check output), while also documenting the flow for reliable operator onboarding. Watch for failures around imported credential parsing, permission handling of local auth files, and token lifecycle edge cases (e.g., expired or partially written credentials) that can still break unattended setup.
Adds Cursor MAX mode for 1M-context Cursor models by introducing a `:max` model selector flag and threading it through model discovery, parsing/formatting, session state, and subagent/session creation so MAX context behavior is consistently applied and restored.
Contribution: Introduces MAX mode as a first-class selector flag that is parsed, formatted, and carried as session state (`maxMode`), then consumed by model policy, agent session initialization, and task/subagent creation to keep context-mode decisions consistent across startup, resume, and delegation.
Impact: Cursor users can explicitly opt into MAX mode and keep that setting across sessions and delegated subagents, so long workflows are less likely to silently fall back to smaller context behavior or lose continuity during handoffs. Technically, the PR unifies `:max` handling through discovery, selector parsing/formatting, and session/runtime plumbing, and this should be watched for regressions in combined flag parsing (e.g., `:max` with thinking-level modifiers), context-policy fallback paths on session restore, and token/accounting behavior under large histories.
A targeted dependency patch to deepagents 0.6.3 restores automatic ID assignment for incoming messages that lack explicit IDs, fixing a regression introduced in 0.6.2.
Contribution: The change updates deepagents from 0.6.2 to 0.6.3 specifically to re-enable deterministic ID assignment for inbound messages that omit IDs, which was the broken behavior causing the regression.
Impact: Operators and maintainers of Open SWE pipelines can again keep inbound events correctly linked when messages arrive without predefined IDs, reducing the chance that processing logic misclassifies repeated or retried messages and corrupts review workflow state. Technically this is a regression fix in deepagents’ message-ID path, and teams should watch whether mixed integrations still emit identifier-less messages and whether any duplicate-threading anomalies appear immediately after upgrade.
v1.9.29 introduces an idle-timeout behavior that automatically stops inactive child sessions, reducing forgotten background sessions in Agent Deck workflows.
Contribution: Added an automatic idle-timeout session lifecycle control so dormant child sessions are terminated by default when inactive, improving session hygiene without requiring manual cleanup.
Impact: Developers and operators using Agent Deck will have inactive child sessions cleaned up automatically, so idle terminals stop consuming attention and resources, which helps keep multi-session coding workflows responsive and easier to manage. Technically, this release adds `--idle-timeout` session handling to stop dormant children after inactivity; teams should monitor whether legitimate long-pause tasks get killed unexpectedly and tune timeout values per workflow.
In Agent Deck v1.9.28, Claude session IDs are now persisted and rebound from SQLite so terminal AI sessions can resume with correct continuity instead of losing linkage after restart.
Contribution: Reworked session state handling to persist and rebind Claude session-id in SQLite, fixing the previous behavior where session identity could be dropped and context continuity was not reliably restored.
Impact: Developers and operators using Claude sessions in Agent Deck can recover interrupted sessions with fewer unexpected resets, which reduces wasted setup time and continuity loss during AI-assisted coding after restarts. The change is implemented by storing session-id state in SQLite and reloading it on rebind, but teams should watch for migration behavior of sessions created before v1.9.28, and monitor for rare cases where concurrent rebinds may reuse or conflict over session metadata.
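A persist-and-rebind flow of this kind might look like the Python sketch below, using the standard `sqlite3` module. The table name, column names, and function names are all assumptions made for illustration; the release note does not describe Agent Deck's actual schema.

```python
import sqlite3
from typing import Optional


def init(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per Agent Deck session instance.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sessions ("
        "instance_id TEXT PRIMARY KEY, claude_session_id TEXT NOT NULL)"
    )


def save_session_id(conn: sqlite3.Connection, instance_id: str, claude_session_id: str) -> None:
    # INSERT OR REPLACE overwrites the stored id on rebind instead of adding
    # a second row for the same instance.
    conn.execute(
        "INSERT OR REPLACE INTO sessions (instance_id, claude_session_id) VALUES (?, ?)",
        (instance_id, claude_session_id),
    )
    conn.commit()


def load_session_id(conn: sqlite3.Connection, instance_id: str) -> Optional[str]:
    # Returns None for sessions that never stored an id (for example, ones
    # created before persistence existed).
    row = conn.execute(
        "SELECT claude_session_id FROM sessions WHERE instance_id = ?",
        (instance_id,),
    ).fetchone()
    return row[0] if row else None
```

Returning `None` for unknown instances gives callers an explicit branch for the pre-upgrade sessions that the entry flags as a migration concern.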
Runtime introduced a team-focused workflow for shipping AI coding agents in managed sandboxes, positioning the platform so non-engineers can use Claude Code/Codex-style agents with shared team context and standardized integrations instead of requiring per-repo, one-off local setup from engineers.
Contribution: Runtime established a centralized model for coding-agent operations where engineering defines system instructions, skills, and scoped integrations once, then team members can reuse that setup through sandboxed agent sessions that feed proposed changes back into a reviewable development flow.
Impact: Teams with limited engineering bandwidth can now let non-engineers participate in coding tasks through AI agents with less dependency on experts for each repository, which could increase delivery speed for routine edits while keeping changes reviewable before production deployment. From a technical perspective, the platform’s shared context/sandbox approach aims to reduce local-environment drift and setup overhead, but rollout quality should be watched for PR correctness, sandbox isolation and credential safety, and the practical impact of its licensing model on adoption.
Discussion around Google’s Antigravity story indicates a major rollout that feels like a product reset for existing users rather than a compatible upgrade, with users reporting that prior setup and interaction patterns no longer map to the new app state. This is important because teams that depend on continuity in their coding environment may face unplanned migration work after each update, not smooth iteration gains.
Contribution: The primary change is a reported hard transition in Antigravity’s user-facing behavior: the update appears to invalidate prior local state and workflow expectations, so existing users cannot assume smooth continuity after upgrading.
Impact: Existing Antigravity users may see their local development setup become inconsistent after the update, which directly reduces productivity because time is spent repairing settings, extensions, and chat context instead of coding. This resembles a state-compatibility break, where extension paths and local SQLite-backed chat history become misaligned until manually reconstructed, increasing operational friction and data-continuity risk for teams relying on history and local customization. Operators and maintainers should watch for repeatable migration failures, reliance on unofficial recovery methods, and whether a supported rollback or state-preserving path appears before this disruption repeats.
Agent Deck v1.9.26 changes web session management so closing a session is no longer immediately destructive, reducing the chance that users lose active work when they close the wrong session.
Contribution: Introduced a non-destructive close flow for web sessions and paired it with Undo Delete, replacing immediate permanent deletion and preserving workflow continuity.
Impact: Developers using the Agent Deck web interface can close a session without permanently losing it, so accidental closes are recoverable and active coding workflows are less likely to be interrupted. The UI now records close as a reversible action and exposes undo; operators should watch for restore correctness under rapid close/undo sequences and whether session metadata stays intact after recovery.
The main change in this burst is a coding-agent feature that exposes an edit tool using a unified patch format, giving AI coding flows a direct path for applying code changes.
Contribution: Added a concrete coding-agent edit capability that publishes and applies changes via unified patch output, which standardizes how agent-generated edits are represented and consumed.
Impact: Developers using PI’s coding-agent can apply automated code fixes through a cleaner patch-based workflow, reducing friction in scripted edits and making integrations more predictable. Because edit actions flow through a unified patch pathway, downstream tooling can more reliably capture and replay changes. Watch whether patch generation stays correct in larger repos, whether any existing non-patch clients lose compatibility, and whether syntax inconsistencies or edge-case patch failures appear.
This PR changes PI’s coding-agent theme picker to prioritize the theme’s internal content name (instead of filename) when listing and deduplicating themes, preventing mixed or duplicate entries when a theme file name does not match its declared theme name.
Contribution: Adjusted theme loading/listing behavior so theme identity is based on the embedded theme name rather than the theme file path/name, aligning the selector and dedupe behavior with the actual content metadata that users care about.
Impact: Users selecting themes in PI now see consistent theme entries based on the theme’s declared name, so they are less likely to pick the wrong or duplicated theme when file names and metadata differ. The implementation now matches PI’s dedupe behavior (which already keyed on theme name) during listing, so operators should monitor whether any legitimate separately packaged themes share identical names and thus collide, and whether renaming theme files still triggers stale cache behavior in the selection menu.
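Listing by declared name rather than file name can be sketched as below. This is an illustrative Python version, not PI's implementation. `list_themes` and its input shape (file name mapped to parsed theme content) are hypothetical, and falling back to the file stem when no name is declared is an assumption.

```python
def list_themes(theme_files: dict) -> list:
    """Return theme names deduplicated by each theme's declared `name`.

    `theme_files` maps a file name to its parsed content (a dict). The file
    stem is used only when a theme declares no name (assumed fallback).
    """
    seen = set()
    names = []
    for filename, content in theme_files.items():
        name = content.get("name") or filename.rsplit(".", 1)[0]
        if name not in seen:
            # First theme with a given declared name wins; later files that
            # declare the same name are treated as duplicates.
            seen.add(name)
            names.append(name)
    return names
```

Under this rule two differently named files that both declare `dark` produce a single entry, which is also where the name-collision risk mentioned above comes from.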
In v1.9.25, Agent Deck adds MCP management endpoints with a matching web UI flow, enabling users to configure MCP integrations through the UI instead of relying on separate manual configuration paths.
Contribution: Introduced a new web-level MCP management path by adding HTTP endpoints and UI pages for MCP integration management in Agent Deck.
Impact: AI coding workflow operators can now add and update MCP integrations directly in Agent Deck’s web interface, reducing manual setup friction and lowering the chance of misconfiguring tools during onboarding. The release adds explicit MCP management endpoints and UI coverage in the web layer, formalizing management functionality that was previously missing. Teams should watch whether UI-driven config changes are validated and applied safely without forcing session restarts; next checks should focus on credential handling, error messaging quality, and state consistency between web-configured MCP settings and active terminal session behavior.
The patch fixes branch history metadata by storing the original summarized leaf ID as `fromId` in `BranchSummaryEntry`, instead of always copying `parentId`, so branch-navigation events now reflect actual source/summarized context.
Contribution: Corrects a branch-summary bookkeeping bug by decoupling `fromId` from `parentId` and persisting the original source leaf ID during branch summarization.
Impact: Extensions and tools that traverse conversation history can now preserve branch-specific state correctly after tree navigation, reducing incorrect state reconstruction and context drift in message-history workflows. Previously, branch summary events collapsed source and target IDs, which made it harder for downstream consumers to detect true navigation boundaries; the fix makes those boundaries explicit by preserving the original summarized leaf as `fromId`. Watch for any integrations that assumed old `fromId == parentId` behavior and for edge cases in existing saved sessions that may need reconciliation.
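The distinction between the two identifiers can be shown with a small Python sketch. `BranchSummaryEntry`, `fromId`, and `parentId` are named in the entry; the other fields and the `summarize_branch` helper are hypothetical, and the real type presumably carries more data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchSummaryEntry:
    id: str
    parentId: str  # where the summary entry attaches in the tree
    fromId: str    # the leaf whose branch was summarized
    summary: str   # hypothetical field holding the summary text


def summarize_branch(entry_id: str, parent_id: str, summarized_leaf_id: str, summary: str) -> BranchSummaryEntry:
    # Before the fix, fromId was effectively a copy of parentId. Here the
    # summarized leaf is recorded separately so consumers can tell where the
    # navigation came from.
    return BranchSummaryEntry(
        id=entry_id,
        parentId=parent_id,
        fromId=summarized_leaf_id,
        summary=summary,
    )
```

A consumer that previously assumed `fromId == parentId` would need to treat the two as independent values once entries are created this way.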
A user-reported crash shows the Crush agent path fails when executing a `write` tool call generated during a coding session: the tool payload contains invalid JSON (`unexpected end of JSON input` / `unterminated string`), so the assistant cannot complete the requested file write and returns `Bad Request` instead of running the action.
Contribution: This issue identifies a concrete correctness defect in Crush’s tool-call execution path: generated `write` arguments containing inline HTML/JS content are not consistently serialized as valid JSON, causing parse-time rejection before the tool runs.
Impact: Developers using Crush for AI-assisted code generation can see generation steps fail with little explanation: the assistant cannot execute `write` actions when the request payload is malformed, so the expected files are not produced and workflows stall. The failure appears in normal tool execution flow (`util.InfoTypeError` and `Bad Request`), so teams should monitor prompts with large multiline code blocks for recurrent payload truncation or escaping regressions and confirm whether newer model outputs increase the rate of invalid JSON in tool invocations.
Release v15.1.9 adds a hard timeout plus abort propagation for stalled web-search fetches, so stalled remote requests no longer block the AI workflow indefinitely.
Contribution: Adds timeout enforcement and abort-signal propagation in the web-search fetch path so stalled requests are canceled rather than silently hanging.
Impact: Developers and operators using web-search actions can keep assistant sessions usable under poor network conditions, because stalled outbound fetches now fail fast instead of tying up UI/agent flow for long periods. By propagating aborts through the fetch chain, the system can release waiting work sooner and reduce stuck-tool behavior, but teams should watch whether timeout thresholds are too aggressive for slow endpoints and verify retry policies so legitimate long responses are not dropped unexpectedly.
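The timeout-plus-cancellation pattern in this entry can be illustrated in Python with `asyncio.wait_for`, which cancels the wrapped coroutine when the deadline passes. This is an analogy to abort-signal propagation, not the release's actual code; `fetch_with_timeout` and the injected `fetch` callable are hypothetical.

```python
import asyncio


async def fetch_with_timeout(fetch, url: str, timeout_s: float):
    """Run `fetch(url)`, cancelling it if it exceeds `timeout_s` seconds."""
    try:
        # wait_for cancels the inner task on timeout, so the cancellation
        # reaches whatever the fetch coroutine is currently awaiting.
        return await asyncio.wait_for(fetch(url), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Surface a clear error to the caller instead of hanging.
        raise TimeoutError(f"web search fetch timed out after {timeout_s}s: {url}")
```

The caller decides what to do with the `TimeoutError` (retry, skip, or report), which is where the retry-policy question raised above would be handled.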
A security-focused pull request updates OpenHands’ Poetry dependency tooling to version 2.3.4 to remediate CVE-2026-41140.
Contribution: Introduces a concrete security patch by bumping the Poetry version used in the project, replacing a toolchain component associated with CVE-2026-41140.
Impact: Developers and operators running or integrating OpenHands have a lower immediate security risk from the package-management path covered by CVE-2026-41140, so environments are less likely to inherit a known dependency-management weakness. The change is proposed in an open PR and should be monitored for merge and release timing, CI/build stability after the Poetry upgrade, and any workflow regressions in dependency installation inside containerized setups.
The PR introduces a new `firepass` provider in oh-my-pi so Fire Pass users can authenticate with `omp /login firepass` using `fpk_` keys and access the single subscription model `kimi-k2.6-turbo`, with `firepass/kimi-k2.6-turbo` requests translated to `accounts/fireworks/routers/kimi-k2p6-turbo` to match Fire Pass endpoint constraints.
Contribution: Implemented a separate Fire Pass provider path instead of reusing the generic Fireworks provider, including key-specific login flow (`/login firepass`), constrained model cataloging, and provider-level request remapping so `fpk_` keys are sent to the router endpoint Fire Pass requires. A targeted wire-id test was added to validate the routing behavior.
Impact: Teams using Fire Pass in oh-my-pi can now log in with their `fpk_` subscription key and run `firepass/kimi-k2.6-turbo` directly, so the subscription becomes usable through the normal CLI flow instead of being blocked by the old cross-provider behavior; watch whether Fire Pass endpoint or routing contract changes break the single-model, single-endpoint path. Technically, this adds a dedicated `firepass` provider, isolates the model catalog to `kimi-k2.6-turbo`, and translates the model identifier to `accounts/fireworks/routers/kimi-k2p6-turbo` for compatible request routing.
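The identifier translation can be sketched as a lookup, shown here in Python for illustration. The two model strings are taken from the entry; the table structure, the `to_wire_model_id` name, and the error behavior are assumptions rather than the provider's real code.

```python
# Mapping taken from the summary above; keeping it in a table is an assumption.
FIREPASS_WIRE_IDS = {
    "kimi-k2.6-turbo": "accounts/fireworks/routers/kimi-k2p6-turbo",
}


def to_wire_model_id(selector: str) -> str:
    """Translate a `firepass/<model>` selector to the id sent on the wire."""
    provider, _, model = selector.partition("/")
    if provider != "firepass":
        raise ValueError(f"not a firepass selector: {selector}")
    try:
        return FIREPASS_WIRE_IDS[model]
    except KeyError:
        # Fire Pass exposes a single model, so anything else is rejected
        # locally rather than sent to the endpoint.
        raise ValueError(f"model not available on Fire Pass: {model}") from None
```

For example, `to_wire_model_id("firepass/kimi-k2.6-turbo")` yields the router path quoted in the entry.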
The PR fixes `omp update` rollback behavior so a failed replacement (wrong version or broken binary) is not kept in place: the backup executable is preserved until version verification succeeds, and on verification failure the previous binary is restored while the staged `<target>.new` file is cleaned up.
Contribution: Introduced a new update flow in `update-cli.ts` that defers deleting the backup binary until the new executable passes `printVerification(expectedVersion)`, then performs explicit cleanup and backup restore on failure; this is reinforced with regression coverage for rollback-on-verification-failure and stale `.new` cleanup.
Impact: CLI users and operators of `omp` now avoid ending up with a broken executable after an update, because failed or wrong-version replacements are no longer kept as the active install. The updater retains the backup until verification succeeds; if verification fails, it restores the previous binary and removes the failed staging file, which reduces downtime from self-update incidents and simplifies rollback during operational updates. Watch for remaining update-interruption scenarios (for example killed/cancelled updates or platform-specific path differences) that could bypass the same protection on non-macOS environments.
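The keep-the-backup-until-verified ordering can be sketched in Python as follows. The real flow lives in `update-cli.ts` and is not reproduced in the summary, so `replace_with_rollback`, the `.bak` suffix, and the `verify` callback are hypothetical; `verify` stands in for the version check mentioned above.

```python
import os
from pathlib import Path
from typing import Callable


def replace_with_rollback(target: Path, staged: Path, verify: Callable[[Path], bool]) -> bool:
    """Swap `staged` into `target`; restore the previous file if verify fails."""
    backup = target.with_name(target.name + ".bak")  # hypothetical backup name
    os.replace(target, backup)  # keep the old binary until the new one is verified
    try:
        os.replace(staged, target)
        ok = verify(target)
    except Exception:
        # Treat a failed swap or a crashing verifier as a failed update.
        ok = False
    if ok:
        backup.unlink()  # only now is the backup safe to delete
        return True
    os.replace(backup, target)      # roll back to the previous binary
    staged.unlink(missing_ok=True)  # clear any leftover staging file
    return False
```

This sketch does not cover a process killed between the two renames, which is the kind of interruption scenario the entry suggests watching.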
This update fixes a regression where disabled local providers were still probed: even with `disabledProviders` set to include `llama.cpp`, `lm-studio`, or `ollama`, the model registry still launched discovery work for them during refresh. The patch filters out disabled providers before adding implicit discoverable providers, before runtime discovery refresh, and before built-in provider descriptor/model-manager creation.
Contribution: Added provider-gating logic to the discovery pipeline so disabled providers are never added to implicit discovery configs, never refreshed as runtime discoveries, and never used to build provider descriptors/model managers, plus regression tests that verify disabled provider URLs are not fetched.
Impact: Operators and developers who disable local backends now avoid unexpected localhost calls when refreshing models, so local setups stay quieter and more stable instead of triggering needless network traffic to endpoints they turned off; watch for any newly introduced discovery paths that might still bypass this filter and re-enable probing. Practically, disabled providers are now filtered in registration and refresh stages (`#addImplicitDiscoverableProviders`, `#refreshRuntimeDiscoveries`, `#collectBuiltInModelManagerOptions`), with the new behavior validated by regression tests asserting an empty `disabledProbeUrls` list.
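The gating idea is to filter before any probe is issued. A minimal Python sketch follows. The function names and the provider-to-URL mapping are hypothetical; only the provider names and the `disabledProviders` setting come from the entry.

```python
def providers_to_probe(discoverable: dict, disabled_providers) -> dict:
    """Drop disabled providers before any discovery work is scheduled."""
    disabled = set(disabled_providers)
    return {name: url for name, url in discoverable.items() if name not in disabled}


def refresh(discoverable: dict, disabled_providers, probe) -> list:
    """Probe only enabled providers; return the URLs that were contacted."""
    probed = []
    for url in providers_to_probe(discoverable, disabled_providers).values():
        probe(url)  # `probe` is an injected callable, e.g. an HTTP GET
        probed.append(url)
    return probed
```

Applying the same filter at every stage that can start discovery is the point of the fix; this sketch shows a single stage only.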
Railway is publicly positioning its platform around AI coding agents and a pull-request-free development flow, highlighting this as the core direction for deployment and automation rather than traditional PR-based code review.
Contribution: Introduced a clear platform strategy shift toward agent-native operations, where AI coding agents become a first-class workflow with less dependence on PR handoffs for iterative development.
Impact: Developers and teams using Railway can ship AI-generated code changes with fewer manual PR bottlenecks, so release speed can improve, but teams should monitor quality control and governance because automated agent workflows remove review checkpoints that PRs traditionally provided. The accompanying scale claims and own-metal positioning indicate Railway is attempting to support high-volume AI coding operations, so operators should watch for stability of large agent workloads, auditability of agent-generated changes, and unexpected infrastructure or cost drift as usage grows.
Railway announced a strategic shift toward an agent-native cloud model, combining rapid adoption scale with a move to coding-agent-driven delivery over pull-request-heavy collaboration on its own-metal infrastructure.
Contribution: The primary change is the platform-level repositioning around coding agents as the main execution path for software delivery, signaling that Railway is trying to operationalize higher-throughput agent workflows while de-emphasizing traditional PR-based handoffs.
Impact: Teams using Railway can expect a faster, more automated path from task to deployment by relying more on AI coding agents and less on manual PR loops, which can raise velocity but also shifts quality and safety burden to policy, audit, and testing automation. The practical consequence is better delivery speed only if control points (review gates, permissions, and rollout validation) keep pace as agent activity grows, so operators should continue monitoring approval bypass risk, agent-authored change quality, and cloud cost growth tied to sustained agent spend.
Aider now sets `SSL_VERIFY=False` when `--no-verify-ssl` is used, instead of setting it to an empty string, so LiteLLM-backed providers like Gemini, OpenRouter, and Anthropic correctly receive `verify=False` and stop failing with `CERTIFICATE_VERIFY_FAILED` during HTTPS calls.
Contribution: Fixed a concrete SSL configuration bug by aligning the `SSL_VERIFY` environment value with LiteLLM’s expected boolean-like input, restoring the intended behavior of the user-facing `--no-verify-ssl` flag.
Impact: Users and operators running LiteLLM-backed inference or coding-tool calls with `--no-verify-ssl` can now avoid requests blocked by certificate verification errors, which is especially important in environments with private CAs or constrained networking. The fix writes `SSL_VERIFY=False` so LiteLLM’s `get_ssl_verify()` maps it to `verify=False` in its HTTP client; the empty string used previously was not interpreted as disabling verification. Watch for whether other code paths still ignore the flag or expose certificate-skip behavior outside trusted environments, since this mode is security-sensitive.
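Why the literal string matters can be shown with a simplified Python sketch. `read_ssl_verify` below is a hypothetical stand-in for a consumer of the variable and is not LiteLLM's `get_ssl_verify()`; it only assumes, as the entry implies, that `"False"` disables verification while an empty value does not.

```python
import os


def apply_no_verify_ssl(env=os.environ) -> None:
    """What the flag should do per the fix: write the literal string "False"."""
    env["SSL_VERIFY"] = "False"


def read_ssl_verify(env=os.environ):
    """Assumed reader: maps boolean-like strings to bools, else returns a path."""
    raw = env.get("SSL_VERIFY", "")
    if raw.lower() == "false":
        return False
    if raw == "" or raw.lower() == "true":
        # Unset or empty keeps verification enabled, which is why writing ""
        # did not disable certificate checks.
        return True
    return raw  # otherwise treated as a CA bundle path (assumption)
```

Passing a plain dictionary instead of `os.environ` makes the behavior easy to check without touching the real process environment.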
A long-time daily aider user opened an issue noting a perceived slowdown in releases and asking maintainers for explicit direction: upcoming major features/architecture changes, whether the project is now maintenance-focused, and where community help is most useful.
ContributionThe issue provides a concrete feedback signal that users need visible planning communication and contributor guidance, highlighting roadmap and maintenance transparency as the primary operational gap rather than a specific code or runtime change.
ImpactRegular aider users and teams depending on it for daily development workflows now face planning uncertainty, so they may hesitate to invest further in aider-specific process changes until direction is clarified. This is important because continued ambiguity can delay internal adoption decisions and trigger a fallback to alternate tools; monitor whether maintainers publish a public roadmap, clarify maintenance vs. feature priorities, and define actionable contribution paths.
This release adds a remote-session UI update where the preview pane now propagates per-session cost/usage information, giving users visibility into resource and spend impact while the remote agent runs.
ContributionImplements a UI fix that pushes cost and usage metadata from remote sessions into the preview pane, improving operational visibility for remote terminal workflows.
ImpactOperators can see cost and usage signals in the same remote session view, so unexpected spend or inefficient long-running remote runs are easier to notice before they become larger problems; this is especially useful for teams controlling multiple concurrent AI coding agents and doing manual budget monitoring. Continue to watch whether the displayed metrics stay synchronized with backend billing counters during rapid session churn and whether any latency is introduced in high-frequency preview updates.
The discussion signals a shift in AI coding workflows from relying on “smarter” agents to relying on runtime-enforced verification gates: model-generated changes should pass deterministic, typed checks before proceeding in the loop. This turns the control point from prompt quality to validated state transitions, so execution is governed by explicit constraints rather than model intent alone.
ContributionIntroduces and reinforces a concrete behavior change for AI coding systems: insert structural verification checkpoints (guards/invariants) in the agent workflow so each generated step is type-checked and tool-validated before it can execute or move the state forward.
ImpactFor teams building or operating AI coding platforms, adding formal verification gates can prevent unsafe or incorrect agent-generated edits from silently reaching CI and deployment, so developers get clearer rejection signals instead of hidden rollout failures. The mechanism is to enforce runtime gatekeeping through deterministic tooling and typed invariants, with the key follow-up question being how complete those invariants are in security- and correctness-critical paths versus adding extra friction to normal coding throughput.
Agent Deck v1.9.23 introduces a configurable Sessions/Preview split in the terminal UI, letting operators control how the session and preview panes are arranged.
ContributionThe release adds a concrete UI behavior change: users can now configure the split between Sessions and Preview panes, replacing the prior fixed layout and enabling task-specific workspace arrangement.
ImpactDevelopers using the terminal session manager can now organize sessions and previews in a way that matches their workflow, which can reduce context switching during AI-assisted coding, and operators should watch for whether split preferences persist correctly across restarts and screen-resize conditions.
oraios/serena proposes changing status handling so it is updated proactively inside the agent lifecycle instead of computed only when a request asks for it, keeping dashboard status in sync with current agent/task-queue state.
ContributionDefines a concrete status-management change: add `SerenaAgent._update_status` and invoke it during agent initialization and after each task execution (and task queue changes), with dashboard-mediated status emission and optional differentiation of project-level vs task-queue updates.
ImpactOperators and users reading the Serena dashboard will see fresher, more reliable task progress, reducing confusion from stale status displays when tasks are added, moved, or completed. The design moves status computation from lazy request-time recalculation to explicit updates in the execution path via a new `_update_status` hook tied to initialization and task/queue mutation points, which should make monitoring behavior easier to trust. Continue watching for any queue transitions that do not trigger the hook and whether higher-frequency updates introduce noticeable UI or runtime overhead.
v1.9.22 adds a fix for MCP-backed sessions by changing child-process cleanup to a single-snapshot discovery plus post-SIGKILL verification, so process teardown is more deterministic and less likely to leave orphaned MCP children behind.
ContributionIntroduces a session shutdown behavior change for MCP workers: child processes are captured in one snapshot for reap and verified after SIGKILL, reducing race-prone cleanup paths during session termination.
ImpactOperators of MCP-enabled Agent Deck sessions are less likely to see leftover helper processes after a session closes, which reduces resource leakage and avoids the buildup that can destabilize long-running coding agent workflows. The single-snapshot reap and post-SIGKILL check should improve cleanup reliability, but deployment teams should keep watching for any legitimate long-running MCP processes being killed during teardown and whether cleanup latency grows when sessions are started/ended in high volume.
This release introduces a security baseline by enabling repository and CI checks (CodeQL, Dependabot, govulncheck, and stricter golangci-lint) and adding SECURITY.md/CODEOWNERS, so security and dependency issues are surfaced earlier in the development path.
ContributionAdded repository-level and CI security guardrails in v1.9.21 by integrating multiple security tools and policy files, creating a clearer gate for risky code paths and vulnerable dependency updates before release.
ImpactDevelopers and release operators get earlier warnings about risky code changes and vulnerable dependencies, which makes it less likely that insecure changes are shipped to users. The release wires CodeQL, Dependabot, govulncheck, and strict golangci-lint into the repo process and adds SECURITY.md and CODEOWNERS, but teams should watch for new scan noise, stricter enforcement friction for contributors, and any increase in CI runtime that could delay publishing.
This burst includes a focused change to the CLI device-login flow: users can copy the login device code to the clipboard, reducing manual entry during authentication setup.
ContributionAdded clipboard-based support in the device authentication path so the login device code can be copied directly from terminal output instead of being retyped.
ImpactCLI users doing device-based authentication can copy the code to their clipboard and paste it on the target browser/device, which lowers setup friction and reduces mistakes that can block or delay sign-in. Continued tracking should focus on whether clipboard handling works consistently across supported OS/terminal environments (especially headless or permission-restricted sessions), since failures would push users back to slower manual copy workflows and could reintroduce authentication friction.
A leading HN discussion on building 100K+ lines of Rust with AI highlights a shift toward spec-first development: one LLM writes or plans code, another LLM repeatedly critiques the spec, and coding proceeds only after convergence to reduce flawed designs before implementation.
ContributionDefines a concrete AI coding practice in which LLMs are chained as a review loop on implementation specs, so weak or inconsistent designs are challenged before heavy code generation and only validated plans move into implementation.
ImpactRust teams using AI coding assistants can reduce late-stage surprise failures by validating requirements and design through cross-LLM critique before writing large code chunks, which can cut rework and refactor churn in long projects; teams should watch whether this guard actually lowers real-world defects in generated code, especially around Rust ownership/lifetime errors and overuse of fallback patterns like excessive cloning. The practical mechanism is a dual-model review and sign-off step plus stronger test practices, and the key follow-up signal is whether generated tests and reviews remain manually auditable rather than trusted as ground truth.
This change targets a real responsiveness failure: on Windows, Defender-related blocking during file operations can freeze the TUI. By moving likely synchronous file-system operations and image resizing out of the main path and into asynchronous/background execution during streaming, the PR aims to keep interactive sessions usable under these contention conditions.
ContributionRefactored streaming-adjacent tool behavior so common blocking filesystem calls are handled asynchronously and image resize handling is executed by a worker instead of the UI-facing flow, reducing direct main-thread blocking.
ImpactWindows users running the TUI are less likely to lose interactivity during streaming, so their coding sessions stay responsive instead of locking up when Defender interferes. The main loop is decoupled from these slow I/O and image-processing steps, which should lower freeze frequency and keep tool output flowing. Continue to watch for background-task backlog, any ordering or timing changes in resized outputs, and whether async failures are reported clearly enough when file operations are deferred.
The change documents the existing Pi coding agent across the CLI docs set (README, supported-agent lists, architecture integration checklist, and agent guide), adding Pi-specific limitations, transcript format details, and plugin hook guidance so users have a unified integration reference instead of piecing behavior together from source internals.
ContributionClosed a documentation gap by explicitly adding Pi to discovery and architecture documentation, including session-tree behavior, active-branch resolution, and transcript/file-based plugin integration requirements; this turns Pi usage from implicit, source-only knowledge into documented guidance.
ImpactDevelopers integrating Pi into entireio/cli workflows now have explicit, in-repo instructions, so setup and troubleshooting become more predictable and less error-prone than relying on inferred behavior. The practical path now includes documented transcript capture, hook lifecycle, and session caching expectations, which should reduce integration mistakes and support requests; teams should monitor whether docs stay synchronized with actual Pi implementation details, especially around transcript schema, branch resolution, and plugin file patterns.
This change introduces a PR-scoped babysitter agent in open-swe, adding metadata helpers plus a unified control surface (cron-driven polling, GitHub comment commands, and a dashboard cancel endpoint) so babysitting tasks can be managed per pull request.
ContributionImplemented PR-scoped babysitter orchestration that tracks agent state per pull request and exposes operator controls to start, stop, and cancel babysitting workflows through comments and UI actions.
ImpactRepository operators and reviewers can now control PR babysitting per pull request instead of relying on ad hoc manual monitoring, which can reduce forgotten or duplicated babysitter runs and make intervention clearer during long-lived reviews. The PR-level babysitter graph and metadata helpers pair scheduled polling with explicit command-driven control, and the cancel endpoint allows explicit termination from the dashboard; teams should watch for state divergence between scheduled polling and comment commands, cancellation idempotency under rapid command changes, and permission/access handling on dashboard actions.
The key change is a memory-safety fix that drains the performance entry buffer during extended Ink workloads, preventing buffer growth from pushing the process into out-of-memory termination.
ContributionAdded or adjusted performance-entry buffer handling so buffered telemetry is drained during long sessions instead of accumulating unboundedly, directly fixing a root cause of memory exhaustion.
ImpactOperators running Nano-Collective/nanocoder on long Ink sessions should see fewer out-of-memory crashes and less abrupt interruption, which improves session continuity for production inference or tooling workflows. After this change, teams should continue watching memory ceilings and session stability over multi-hour runs to confirm the drain cadence is effective and that needed diagnostics are still retained for debugging.
This pull request updates the numtide/llm-agents.nix dependency pin for amp from 0.0.1779222574-g8bb401 to 0.0.1779236441-g5063f4, so future Nix builds use a newer amp snapshot.
ContributionRefreshed the pinned amp version in the project’s dependency inputs, replacing the previous committed revision with a newer one.
ImpactDeployments that consume llm-agents.nix will now pull a newer amp revision on rebuild, so operators may receive dependency-level fixes or behavior changes without manually editing the pin. Concretely, the lock revision moved from 0.0.1779222574-g8bb401 to 0.0.1779236441-g5063f4; teams should watch downstream CI, integration tests, and reproducible rebuilds for compatibility or behavior regressions.
Agor now ensures commit identity is set consistently in PR environments by populating missing `GIT_COMMITTER_*` variables from configured `GIT_AUTHOR_*` values, instead of falling back to the executor host’s `~/.gitconfig`. This fixes author/committer splits that caused incorrect multiple PR authors, while preserving any explicitly configured `GIT_COMMITTER_*` settings.
ContributionImplemented an explicit env-resolution fix: when `GIT_AUTHOR_NAME`/`GIT_AUTHOR_EMAIL` are present but corresponding committer vars are absent, Agor auto-sets `GIT_COMMITTER_NAME`/`GIT_COMMITTER_EMAIL`; explicit committer vars are never overwritten. Added test coverage for full mirror, explicit override, partial mirror, and no-op behavior.
ImpactDevelopers and operators using Agor pipelines will see commits and PR metadata consistently attributed to the intended person, reducing confusion from unexpected committer identities appearing from the execution host and making auditability of contribution history more reliable. Under the hood, the environment resolver now performs identity fallback from missing committer vars to provided author vars, so review workflows should stop showing split authorship; watch whether any legacy jobs still depend on inheriting host `~/.gitconfig` committer settings.
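The mirroring rule can be sketched in a few lines. This is an illustrative Python version, not Agor’s own code; the function name `mirror_committer_identity` is hypothetical, and only the `GIT_AUTHOR_*` / `GIT_COMMITTER_*` variable names come from the change description.

```python
def mirror_committer_identity(env):
    """Return a copy of env with absent GIT_COMMITTER_* filled from GIT_AUTHOR_*."""
    resolved = dict(env)
    for field in ("NAME", "EMAIL"):
        author_value = resolved.get(f"GIT_AUTHOR_{field}")
        committer_key = f"GIT_COMMITTER_{field}"
        # Mirror only when the author value exists and no committer value was set.
        if author_value and committer_key not in resolved:
            resolved[committer_key] = author_value
    return resolved


full_mirror = mirror_committer_identity(
    {"GIT_AUTHOR_NAME": "Ada", "GIT_AUTHOR_EMAIL": "ada@example.com"}
)
explicit_override = mirror_committer_identity(
    {"GIT_AUTHOR_NAME": "Ada", "GIT_COMMITTER_NAME": "CI Bot"}
)
```

Here `full_mirror` gains both committer variables, while `explicit_override` keeps `CI Bot` as the committer name.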
The ccusage package definition in numtide/llm-agents.nix was migrated to match ccusage’s Rust-based implementation during the 19.0.3 to 20.0.0 update.
ContributionAligns the Nix packaging path for ccusage with its upstream Rust codebase, reducing the risk of using an incompatible non-Rust packaging workflow for the new 20.0.0 release.
ImpactNix users upgrading ccusage through llm-agents.nix should see packaging behavior that tracks upstream implementation changes more closely, so upgrades are less likely to fail due to stale or mismatched build assumptions. This update is a compatibility-focused rewrite of the package expression, so operators should watch whether rebuilds remain reproducible across supported platforms and whether Rust compiler or crate-version pinning in the flake causes transient build breakage; if those issues appear, they indicate follow-up fixes needed before broad rollout.
This PR adds `object` and `array` recipe parameter types and threads them through Goose’s existing build pipeline so template rendering can consume structured JSON values directly. Matching parameters are parsed from string inputs into `serde_json::Value` and rendered with a structured MiniJinja path, enabling native template access, iteration, and conditionals without changing the external `build_recipe_from_template` API surface.
ContributionIntroduced end-to-end structured parameter support by adding `object` and `array` to the parameter type enum, converting only those inputs from `HashMap<String, String>` into JSON values, and switching the renderer path to a structured MiniJinja flow for those templates.
ImpactRecipe authors using goose-cli, goose-server, summon, or execute_commands can now implement conditional and loop-heavy templates against nested signal fields (such as arrays of findings or severity metadata) without manually flattening inputs into one-string keys, reducing template rewrites and reducing integration friction for richer automation outputs. The implementation keeps existing callers compatible while parsing `input_type: object/array` payloads into structured values under `build_recipe`; non-structured parameters still use the legacy string path, so behavior remains stable for simple recipes. Watch for increase in JSON-shape errors from producers and ensure invalid nested payloads are caught early so template failures are debuggable before deployment.
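The parsing step can be pictured with a small Python sketch. Goose itself does this in Rust with `serde_json::Value`; the function name `parse_params` and the dictionary shapes below are assumptions made for illustration only.

```python
import json

STRUCTURED_TYPES = {"object", "array"}


def parse_params(declared_types, raw_values):
    """Parse only object/array parameters from strings; leave the rest as strings."""
    parsed = {}
    for name, raw in raw_values.items():
        input_type = declared_types.get(name, "string")
        if input_type not in STRUCTURED_TYPES:
            parsed[name] = raw  # legacy string path
            continue
        value = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
        expected = dict if input_type == "object" else list
        if not isinstance(value, expected):
            raise ValueError(f"parameter {name!r} must be a JSON {input_type}")
        parsed[name] = value
    return parsed


params = parse_params(
    {"findings": "array", "title": "string"},
    {"findings": '[{"severity": "high"}]', "title": "Report"},
)
```

Rejecting a shape mismatch at parse time, as sketched here, is one way to surface JSON-shape errors before a template is rendered.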
This PR makes the Copilot model list authoritative to the live `/models` response so disabled or unavailable models no longer remain visible. It adds `filterModel` in `githubCopilotModelManagerOptions` to drop returned entries with `model_picker_enabled = false` or non-enabled `policy.state`, and introduces `dynamicIsAuthoritative: true` in `ModelManagerOptions` so `resolveProviderModels` discards bundled `models.json` entries absent from live data before merge and cache write.
ContributionIntroduced two coordinated catalog guards: a policy-aware `filterModel` filter for live Copilot responses and an authoritative merge mode that replaces the static-then-overlay model merge with live-list pruning, ensuring bundled-only models cannot reappear in the picker or cache when the account API omits them.
ImpactUsers and operators selecting Copilot models in oh-my-pi will stop seeing models they cannot actually run, so broken picks and `400 model_not_supported` failures caused by org-policy or account setting mismatches should disappear once the live list is refreshed and cached. The change forces `resolveProviderModels` to treat live `/models` output as ground truth, removes stale entries from both in-memory and cold-start cache paths, and should be monitored for edge cases where temporary API truncation or schema changes in `policy` fields could over-prune valid models.
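The two guards can be sketched together. This Python version is illustrative rather than the project’s implementation: the `id` field, the treatment of a missing `policy` as allowed, and the function names are assumptions; only `model_picker_enabled`, `policy.state`, and the authoritative-merge idea come from the description above.

```python
def filter_model(entry):
    """Drop entries hidden from the picker or not enabled by policy."""
    if entry.get("model_picker_enabled") is False:
        return False
    policy = entry.get("policy")
    if policy is not None and policy.get("state") != "enabled":
        return False
    return True


def resolve_models(bundled, live, dynamic_is_authoritative):
    """Merge bundled and live model lists; return the sorted visible model ids."""
    live_by_id = {m["id"]: m for m in live if filter_model(m)}
    if dynamic_is_authoritative:
        # Bundled entries absent from the filtered live list are discarded.
        bundled = [m for m in bundled if m["id"] in live_by_id]
    merged = {m["id"]: m for m in bundled}
    merged.update(live_by_id)
    return sorted(merged)


bundled_models = [{"id": "model-old"}, {"id": "model-a"}]
live_models = [
    {"id": "model-a", "model_picker_enabled": True, "policy": {"state": "enabled"}},
    {"id": "model-b", "model_picker_enabled": False},
    {"id": "model-c", "policy": {"state": "disabled"}},
]
authoritative = resolve_models(bundled_models, live_models, True)
overlay = resolve_models(bundled_models, live_models, False)
```

With the authoritative flag set, only `model-a` survives; without it, the bundled-only `model-old` would still appear.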
This PR updates the pinned gitbutler version in numtide/llm-agents.nix from 0.19.12 to 0.19.13, replacing the previously used patch release in the Nix-based configuration.
ContributionBumps the repository’s gitbutler dependency pin to a newer patch version, so downstream users of this flake track the latest upstream release when they update this revision.
ImpactDevelopers and operators using llm-agents.nix can adopt gitbutler 0.19.13 without manual pin edits, which can reduce drift against upstream and make any upstream bugfixes immediately available in their workflows; the key next step is to observe whether repository-sync and agent orchestration behavior changes after rollout. This change is an automated package bump only, so validation should focus on CLI compatibility, hook/config behavior, and CI automation to catch any regressions introduced by the newer gitbutler release.
Introduces a per-tool approval policy system for the coding-agent so each tool is classified as auto-allowed, prompt-required, or denied according to its risk profile, with explicit action-based exception handling before execution.
ContributionImplements explicit safety logic that resolves tool execution decisions from configured risk levels and exception rules, replacing implicit/default behavior with deterministic pre-execution control for each tool.
ImpactOperators and teams using automated coding workflows get safer default behavior because risky tools can now be blocked or confirmed before they run, reducing accidental harmful commands in both interactive and CI-like automation flows. The new policy engine evaluates action exceptions plus config overrides and emits actionable blocks in non-interactive mode, while `--auto-approve` still bypasses prompts; watch for misconfigured exceptions and any production scripts that relied on old permissive defaults, since those are the highest risk areas for unintended execution changes.
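A rough sketch of such a resolver follows. All names here (`Decision`, `resolve_decision`, the policy and exception dictionaries) are hypothetical, and defaulting unknown tools to a prompt is an assumption; the sketch only mirrors the three risk classes, the action-level exceptions, the non-interactive block, and the auto-approve bypass described above.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    PROMPT = "prompt"
    DENY = "deny"


def resolve_decision(tool, action, policy, exceptions, interactive, auto_approve=False):
    """Resolve a pre-execution decision for one tool call."""
    # Action-specific exceptions take precedence over the tool-level policy.
    decision = exceptions.get((tool, action), policy.get(tool, Decision.PROMPT))
    if decision is Decision.PROMPT:
        if auto_approve:
            return Decision.ALLOW  # auto-approve bypasses prompts only
        if not interactive:
            return Decision.DENY  # nobody can confirm, so block
    return decision


policy = {"read_file": Decision.ALLOW, "shell": Decision.PROMPT}
exceptions = {("shell", "rm"): Decision.DENY}

ci_shell = resolve_decision("shell", "ls", policy, exceptions, interactive=False)
approved_shell = resolve_decision(
    "shell", "ls", policy, exceptions, interactive=False, auto_approve=True
)
blocked_rm = resolve_decision(
    "shell", "rm", policy, exceptions, interactive=True, auto_approve=True
)
```

In this sketch an explicit deny is never softened by auto-approve, which is a design choice of the example rather than a confirmed property of the change.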
The PR rewrites the reviewer agent prompt to prioritize precision over recall, adding stricter severity calibration, explicit non-flag categories for speculative claims, and evidence-first handling of cross-file or architectural findings.
ContributionChanges the reviewer behavior from broad flagging toward evidence-backed triage: speculative concurrency, security, and performance claims are de-prioritized, comment text is constrained to concise bug-first findings, and architectural findings are required to be grounded in repo/docs/code checks.
ImpactDevelopers using the Open SWE reviewer should receive fewer speculative or noisy findings, so review time is likely to shift from filtering false positives to addressing real defects, while teams should watch the next benchmark run to confirm recall does not regress as precision improves. The rewrite adds explicit calibration for severity (making high/low the normal case, medium the exception), a "things not to flag" section, and a mandatory evidence gate for cross-file or architectural claims, which can reduce over-reporting but may also hide lower-confidence issues if the evidence-gathering step is incomplete. Continue monitoring comment usefulness, missed-valid issues, and fallback behavior when DeepWiki or web lookups are unavailable.
This change makes Copilot authentication handling consistent by always sending `Copilot-Integration-Id: vscode-chat` on Copilot model requests and by storing a stable GitHub numeric user id from `GET /user` on login so repeated OAuth logins match existing credentials instead of creating duplicates.
ContributionAligns Copilot API requests with the correct integrator scope (`vscode-chat`) and makes Copilot credentials matchable by stable account identity, preventing `model_not_available_for_integrator` behavior from mixed token types and avoiding unbounded duplicate credential rows.
ImpactOperators and developers using oh-my-pi with GitHub Copilot will see more stable model availability and cleaner credential storage because logins now resolve to the same user identity and requests are tagged with the proper Copilot integrator context. Watch next for any new Copilot call path that bypasses the new header, and for intermittent `/user` fetch failures (currently silent) because those are the remaining conditions that can still produce duplicate rows or fallback credential behavior.
Version 0.5.0 of `notebooklm-py` completes a deprecation cleanup pass by removing v0.3-era public APIs and legacy shims (including deprecated sharing/file/polling surfaces), while updating docs and version-gate checks to enforce the deprecation timeline and keeping `RPCError.rpc_id` / `RPCError.code` as permanent aliases.
ContributionThis change implements the first phase of the deprecation plan by deleting deprecated type properties and compatibility shims, replacing `client.notebooks.share` usage with `client.sharing.set_public`, migrating waiting calls to `initial_interval`, and aligning docs/changelog/version-gate coverage so the deprecation removals become explicit and test-covered.
ImpactDevelopers and operators upgrading to notebooklm-py 0.5.0 must migrate integrations that still call removed sharing, source, or polling interfaces, otherwise notebook workflows can fail at runtime during share or completion-polling steps. The PR removes deprecated APIs (`client.notebooks.share`, `--mime-type`, `poll_interval`, deprecated type properties like `source_type`/`artifact_type`) and legacy shims, while preserving `RPCError.rpc_id` and `RPCError.code` as permanent aliases to reduce immediate breakage in existing error-handling code; watch upgrade logs for callers hitting removed names and verify no lingering dependency on removed properties before the v0.6 removal window.
This change updates artifact type filtering so unknown and quiz/flashcard artifact outputs are handled more correctly, reducing misclassification during NotebookLM processing.
ContributionAdjusted the core artifact-type filter to explicitly account for previously edge-case categories (unknown and quiz/flashcard artifact kinds), fixing classification behavior that could cause them to be routed incorrectly.
ImpactDevelopers and operators handling NotebookLM outputs will get more stable artifact pipelines for odd or newer artifact kinds, so less time is spent on manual cleanup when quiz/flashcard or unrecognized items appear. The update tightens filtering logic around these categories; teams should watch for integrations that depend on older fallback behavior for custom artifact labels, since those paths may now behave differently.
This change makes `safe_index` explicitly raise a `DeprecationWarning` when fallback to soft decode is triggered by explicit `NOTEBOOKLM_STRICT_DECODE=0`, and it adds targeted tests and documentation for the deprecation and behavior timeline.
ContributionAdded a concrete deprecation signal path for opting out of strict decode: explicit soft-mode fallback now warns at runtime, with unit tests covering warning/no-warning modes and documentation/changelog updates that define the configuration retirement timeline.
ImpactOperators and integrators using `NOTEBOOKLM_STRICT_DECODE=0` now get an early, visible notice that their current decode configuration is deprecated, so they can migrate before the v0.6.0 behavior change instead of discovering failures after upgrades. The warning also includes decoder identity and has verification coverage for strict and fallback paths, but teams should continue monitoring environments that still depend on silent soft-mode behavior and whether log/CI pipelines start surfacing deprecation noise from long-running workloads.
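The warning path might look roughly like the following. This is a sketch, not the library’s `safe_index`: the function name `safe_index_sketch`, its signature, and the `decoder` label are hypothetical, and only the `NOTEBOOKLM_STRICT_DECODE=0` opt-out and the `DeprecationWarning` come from the change description.

```python
import os
import warnings


def safe_index_sketch(data, index, decoder="example_decoder", env=None):
    """Index into data; warn and return None only when soft mode is explicitly set."""
    env = os.environ if env is None else env
    try:
        return data[index]
    except (IndexError, TypeError):
        if env.get("NOTEBOOKLM_STRICT_DECODE") == "0":
            warnings.warn(
                f"{decoder}: soft decode via NOTEBOOKLM_STRICT_DECODE=0 is deprecated",
                DeprecationWarning,
                stacklevel=2,
            )
            return None
        raise  # strict mode keeps the original error


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    soft_result = safe_index_sketch([1], 5, env={"NOTEBOOKLM_STRICT_DECODE": "0"})
```

Passing `env` explicitly keeps the example independent of the real process environment, which also makes the warning and no-warning cases easy to test.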
Added a new `antigravity` package at version 1.0.0 to llm-agents.nix, packaging the upstream Google Antigravity CLI binaries for Linux/Darwin x86_64 and aarch64 and exposing them as `agy` with an `antigravity` symlink.
ContributionIntroduced a concrete Nix packaging change: a dedicated `packages/antigravity` definition for v1.0.0, with official manifest-based binary sources, hash pinning, and a custom updater so the package can be installed reproducibly and refreshed from upstream releases.
ImpactDevelopers and operators using this flake can now add Google Antigravity CLI 1.0.0 via `nix` and run it through `agy`/`antigravity` immediately, reducing manual install steps and setup variance across Linux and macOS environments. The package is wired to official per-platform manifest endpoints and SHA512-pinned artifacts, then exposed through a convenience symlink, so the next thing to watch is whether upstream manifest format or checksums change and how reliably the updater keeps the package current across both architectures.
The pull request re-anchors the concrete client orchestration implementation as `_session.Session` (in `src/notebooklm/_session.py`) and converts `_core.py` into a legacy re-export shim to preserve existing private import compatibility.
ContributionDefined a stable new internal location (`_session.Session`) for the concrete implementation and kept `_core.py` as a compatibility layer, so internal callers can migrate to the new structure without breaking private callers.
ImpactDevelopers and operators that rely on notebooklm’s internal/private imports can keep existing integrations running while the codebase shifts maintenance to one canonical implementation location, reducing the immediate risk of integration breakage during rollout. The PR moves the concrete session logic into `_session.Session` and leaves `_core.py` as a compatibility re-export surface; this should make future edits easier to consolidate, but teams should watch for hidden dependencies on `_core` internals or monkeypatch surfaces (especially in private extension code and tests) because those can fail even when public APIs like `NotebookLMClient` are unchanged.
The PR updates `codecompanion.nvim`’s `claude` and `cli` parsers so relative `@` include paths are resolved against the directory of the file containing the include, instead of the current working directory, fixing include lookups for files located outside `cwd`; absolute paths and cases with unknown source directories are intentionally left unchanged.
ContributionFixes include-path resolution logic in both parser paths so `@`-style relative includes are looked up from the include file’s own directory, preventing wrong-file resolution when that file is outside the process working directory; adds parser tests to validate the behavior.
ImpactDevelopers using include-based prompt fragments will get the intended instruction files loaded consistently even when those files are stored outside the project working directory, so prompt context is less likely to go missing during daily tool usage. Technically, the `claude` and `cli` parsers now anchor relative include resolution to `source_dir` (with absolute-path behavior unchanged), and this should reduce workspace-dependent parsing failures; continue watching cases with ephemeral/virtual buffers, symlink-heavy paths, and unresolved `source_dir` states for any remaining include misses.
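The resolution rule can be sketched as follows. The plugin itself is written in Lua; this Python version uses POSIX-style paths, and the function name `resolve_include` is hypothetical.

```python
import posixpath


def resolve_include(include_path, source_dir=None):
    """Resolve a relative include against the including file's directory."""
    # Absolute paths and unknown source directories are left unchanged.
    if posixpath.isabs(include_path) or not source_dir:
        return include_path
    return posixpath.normpath(posixpath.join(source_dir, include_path))


relative = resolve_include("rules/style.md", "/home/user/notes")
parent = resolve_include("../shared/base.md", "/project/docs")
absolute = resolve_include("/etc/prompts/base.md", "/project/docs")
unknown = resolve_include("rules/style.md", None)
```

Because the working directory never enters the computation, the result is the same regardless of where the editor was started.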
Replaces the prior fuzzy 3-way stale-anchor recovery path with a strict two-tier process: edits apply directly when anchors match or after anchors are shifted through a structured diff map, and any remaining mismatch returns corrected anchors for an immediate retry. This removes the old mechanism that could silently mis-apply edits after file changes.
ContributionImplements a safer stale-anchor workflow in the hashline edit engine: when anchors match, edits apply immediately; when the file changed structurally, `computeLineShiftMap` shifts anchors and re-validates strictly; if mismatch remains, `HashlineMismatchError` now carries `remaps` and `buildCorrectedEdit` rewrites the model input so it can retry with corrected anchors instead of proceeding with fuzzy 3-way matching. It also hardens parser handling for bare line-number anchors and malformed range operators so anchor errors are deterministic and recoverable.
ImpactDevelopers and operators using the coding agent will see fewer silent bad writes when a file changes between read and edit, so merged results are more trustworthy and less likely to require manual rollback or forensic repair. This is achieved by removing the fuzzy 3-way merge recovery and enforcing direct/shifted strict-match paths before corrected-anchor feedback; watch for repeated fallback events under high-frequency concurrent edits because they can still increase read/patch churn and delay automation even without corrupting content.
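The shift-then-revalidate idea can be approximated with Python’s `difflib`. This is only a sketch of the general technique: it compares line text instead of hashes, and `line_shift_map` and `shift_anchor` are hypothetical names, not the engine’s `computeLineShiftMap`.

```python
import difflib


def line_shift_map(old_lines, new_lines):
    """Map 1-based old line numbers to new ones for lines that survived unchanged."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    shift = {}
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            shift[block.a + offset + 1] = block.b + offset + 1
    return shift


def shift_anchor(anchor_line, anchor_text, old_lines, new_lines):
    """Shift an anchor to its new line, then strictly re-validate its content."""
    new_line = line_shift_map(old_lines, new_lines).get(anchor_line)
    if new_line is None or new_lines[new_line - 1] != anchor_text:
        # No fuzzy fallback: the caller must re-read and retry.
        raise LookupError(f"anchor at line {anchor_line} could not be relocated")
    return new_line


old = ["a", "b", "c"]
inserted_above = ["x", "a", "b", "c"]
shifted = shift_anchor(2, "b", old, inserted_above)
```

When the anchored line was deleted or rewritten, the sketch raises instead of guessing, which corresponds to the strict behavior described above.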
Relaxed the browser-use Anthropic dependency from an exact `0.76.0` pin to `>=0.76.0,<1.0.0`, allowing downstream projects to install newer 0.x releases such as `0.102.0` and avoid dependency conflicts from strict version locking.
ContributionChanged the dependency constraint logic in the package setup so browser-use no longer hard-locks to `anthropic==0.76.0`; it now permits any Anthropic SDK in the 0.x line up to (but not including) 1.0, while keeping an upper bound to avoid unvetted major-version API breaks.
ImpactDevelopers using browser-use with newer Anthropic SDKs can upgrade without pip/uv install failures caused by a strict pin, so integration and deployment pipelines are less likely to stall on dependency resolution. The change replaces a single-version lock with a bounded semver range, which should reduce manual pin overrides and conflict-prone workarounds in projects that already track newer Anthropic minors. Next watch: whether any Anthropic 0.x updates introduce behavior changes before 1.0 and whether environments with strict lockfiles need synchronized refreshes after this change.
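To see why `0.102.0` falls inside `>=0.76.0,<1.0.0`, a simplified comparison helps. The sketch below handles plain `X.Y.Z` versions only and ignores pre-release tags; real installers apply the full Python version-specifier rules, and the helper names are hypothetical.

```python
def parse_version(text):
    """Turn 'X.Y.Z' into a tuple of ints so components compare numerically."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, lower="0.76.0", upper="1.0.0"):
    """Check lower <= version < upper using numeric component comparison."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


old_pin = satisfies("0.76.0")
newer_minor = satisfies("0.102.0")
next_major = satisfies("1.0.0")
```

Components compare as numbers, so minor version 102 is greater than 76 even though the string `"0.102.0"` sorts before `"0.76.0"` alphabetically.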
Aider’s OpenRouter OAuth flow in `aider/onboarding.py` is updated to prevent the credential file from being created with default permissive permissions: it now hardens `~/.aider` to `0o700`, writes `oauth-keys.env` using explicit `0o600` creation flags, and corrects mode on existing token files after write.
ContributionIt adds concrete filesystem hardening for stored API credentials by eliminating umask-dependent file creation, explicitly setting Unix mode bits on the Aider config directory and OAuth key file, and normalizing permissions on both first-run and re-run OAuth flows.
ImpactOn shared Linux/macOS systems, developers and operators using OpenRouter through Aider are less exposed to local token theft because the saved API key is no longer left world-readable; watch whether multi-user or containerized workstations still have legacy insecure `~/.aider` permission states after upgrading. The fix enforces owner-only permissions (`0o700` for `~/.aider`, `0o600` for `oauth-keys.env`) by creating the file via `os.open` with explicit mode and applying `os.chmod` on existing paths, which removes the `open()` + default-umask exposure and TOCTOU window, while non-Unix platforms retain prior behavior via caught `chmod` exceptions.
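The owner-only write pattern described above can be sketched in Python as follows. This is a minimal illustration of the technique under stated assumptions, not the code from `aider/onboarding.py`: the function name and parameters are hypothetical, while `os.open`, `os.fdopen`, and `os.chmod` are standard-library calls.

```python
import os
import stat
from pathlib import Path


def save_secret_owner_only(config_dir: Path, filename: str, content: str) -> Path:
    """Write a secret file that only the owning user can read or write."""
    config_dir.mkdir(parents=True, exist_ok=True)
    try:
        # Also repairs a directory that an earlier run created with a looser mode.
        os.chmod(config_dir, 0o700)
    except OSError:
        pass  # platforms without Unix mode bits keep their previous behavior

    path = config_dir / filename
    # Passing the mode at creation time means the file never exists with wider
    # permissions; the umask can only remove bits from 0o600, not add them.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(content)
    try:
        # The creation mode is ignored for a file that already existed,
        # so normalize it explicitly after the write.
        os.chmod(path, 0o600)
    except OSError:
        pass
    return path
```

On a Unix system, `stat.S_IMODE(path.stat().st_mode)` can be used afterwards to confirm the resulting mode bits.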
Nanocoder now installs a periodic `perf_hooks` cleanup hook in CLI startup so Ink sessions periodically clear marks, measures, and resource timings, preventing the performance entry buffer from growing without bound and causing long-run JavaScript heap OOM failures during subagent-heavy workflows.
ContributionAdded `source/utils/perf-buffer.ts` and wired `installPerfBufferGuard()` from `source/cli.tsx` so every interactive and run-mode Ink session starts with an automatic timer-based performance-buffer drain, eliminating unbounded retention of performance entries generated by React render timing and undici request timings.
ImpactLong-running Nanocoder users, especially those running many subagent turns in interactive sessions, should see far fewer sudden JavaScript heap OOM crashes, so operators can keep automation workflows alive for hours instead of restarting on memory failure. The guard runs as an unref’d 30-second interval that clears performance entries before rendering begins, which is low-impact because the entries were never consumed by Nanocoder itself; continue monitoring whether the memory gains extend to non-Ink paths and whether the cleanup interval remains effective or needs tuning under extreme request/render volume.
v1.9.20 introduces copilot session detection as a first-class feature and allows sessions to be resumed with preserved model and allow-all configuration, so operators can continue AI coding work from where it left off instead of rebuilding session context.
ContributionAdded a copilot workflow change that treats sessions as persistent entities, detects existing sessions, and restores them with prior model and allow-all settings when resumed.
ImpactDevelopers using Agent Deck’s copilot can resume a previously stopped coding session with the same model and settings intact, reducing wasted time from manual reconfiguration and lowering the chance of losing mid-task context after restarts. This is implemented through first-class session detection plus resume logic that rehydrates copilot state, so operations teams should monitor whether resuming after long idle periods or model changes can introduce stale state or partial restoration failures.
mirrord now adds upfront validation in config generation to reject duplicate incoming `port_mapping`, duplicate target ports, and duplicate `listen_ports` entries instead of silently accepting them. Previously, duplicates were collected into a `BiMap` where one mapping could be dropped without warning.
ContributionIntroduces explicit duplicate-port checks in the network config path so conflicting `incoming`/`listen` port definitions fail fast with a clear duplicate-port error, preventing silent loss of mappings during resolution.
ImpactOperators and developers using mirrord for port-forwarding will no longer face silently dropped port rules, so forwarding behavior is more predictable and they can fix bad configs immediately instead of debugging missing or wrong routes in a running session. The validation runs before `BiMap` construction, producing a deterministic configuration-time failure for duplicate incoming or listen ports, and teams should watch for any workflows that previously relied on last-entry-wins behavior when duplicate ports were present in checked-in configs or automation output.
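The fail-fast idea can be sketched as below. mirrord itself is written in Rust, so this Python version only illustrates the validation order (check for duplicates first, build the two-way map second); the error type and function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


class DuplicatePortError(ValueError):
    """Hypothetical error raised when a port appears more than once."""


def find_duplicates(ports: Iterable[int]) -> list[int]:
    """Return every port that occurs more than once, in ascending order."""
    counts = Counter(ports)
    return sorted(port for port, seen in counts.items() if seen > 1)


def validate_port_mapping(pairs: list[tuple[int, int]]) -> dict[int, int]:
    """Reject duplicate source or target ports before building the mapping."""
    dup_sources = find_duplicates(source for source, _ in pairs)
    dup_targets = find_duplicates(target for _, target in pairs)
    if dup_sources or dup_targets:
        raise DuplicatePortError(
            f"duplicate source ports {dup_sources}, duplicate target ports {dup_targets}"
        )
    # Only a duplicate-free list reaches the map, so no entry can be dropped silently.
    return dict(pairs)
```

Without the up-front check, building a map directly from `[(8080, 80), (8080, 81)]` would keep only one of the two entries, which is the silent loss the change removes.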
v1.9.19 updates Agent Deck’s Claude session flow so restarts are resumed more safely: the resume-from-summary picker is auto-confirmed and the restart path is made single-flight to avoid repeated restart attempts.
ContributionImplemented two concrete session-runtime fixes: automatic confirmation during the Claude "Resume from summary" path, and a single-flight restart guard that prevents duplicate restart actions after Claude exits.
ImpactAI coding teams running long-running Claude sessions through Agent Deck should see fewer restart interruptions and fewer duplicate relaunches, which reduces session churn during recoveries and lowers manual intervention during active work; watch for cases where auto-confirm suppresses an intentional user choice and whether any valid concurrent restart events are unintentionally blocked. Technically, the update normalizes the resume-picker flow and serializes restart handling in session management so one restart path is active at a time, which should make recovery behavior more deterministic after abrupt exits.
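A single-flight guard of the kind described can be sketched as follows. This is a generic Python illustration, not Agent Deck's implementation; the class and method names are hypothetical.

```python
import threading
from typing import Callable


class SingleFlightRestart:
    """Allow at most one restart to run at a time; extra requests are dropped."""

    def __init__(self, restart_fn: Callable[[], None]) -> None:
        self._restart_fn = restart_fn
        self._lock = threading.Lock()
        self._in_flight = False

    def request_restart(self) -> bool:
        """Run the restart and return True, or return False if one is already running."""
        with self._lock:
            if self._in_flight:
                return False
            self._in_flight = True
        try:
            # The lock is released while the restart runs, so concurrent
            # requests return immediately instead of queueing behind it.
            self._restart_fn()
        finally:
            with self._lock:
                self._in_flight = False
        return True
```

Dropping rather than queueing duplicate requests is one possible policy; it matches the goal of avoiding repeated relaunches, but it also means a restart that is legitimately requested while another is running is ignored, which is the blocked-restart case the entry suggests watching.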
An open issue reports a usability gap in the OpenMonoAgent CLI: users cannot paste copied content and cannot reference local files with an @/path/to style syntax, making the tool difficult to use for interactive agent workflows.
ContributionThe issue requests clipboard paste and local file reference input in CLI prompts, a concrete usability change that the current workflow does not support.
ImpactDevelopers using the CLI are forced to manually retype prompts and cannot quickly attach local files, so routine scripting and debugging sessions become slower and more error-prone. This indicates a clear input-flow friction risk: watch whether future CLI updates add clipboard ingestion and @/path-like path resolution, or whether teams adopt brittle manual workarounds that can reduce adoption of the tool.
The PR adds a README notice for AIDLC v2 and updates issue templates so reporters must select whether they are using v1 current or v2 alpha, keeping commit/hash version details in bug reports.
ContributionIntroduced explicit v2 alpha visibility in documentation and a version-selection field in issue templates, enabling reporters to classify submissions by release line and reducing ambiguity in triage while retaining commit-level traceability for bug reports.
ImpactUsers and maintainers can now separate AIDLC v1 and v2 alpha issue reports, so support and fixes are less likely to target the wrong release line; teams should monitor whether reporters consistently select the correct version and whether any v2 alpha issues still enter the v1 workflow. The change links README to the v2 branch, adds a version selector to bug/feature request/documentation forms, and keeps the existing commit/hash field for precise version tracking.
The PR rewires the task editor file tree to use a proper hierarchical `children` model with normalized POSIX relative paths, fixing a bug where Windows-style paths (e.g., backslashes) made nested files appear as duplicated root-level folder/file rows.
ContributionReplaced the flat file-tree renderer model with explicit per-folder `children` traversal and centralized path normalization for load/reveal/add/remove/watch operations, so nested paths are interpreted as `src -> routes -> ...` instead of a single flattened identity.
ImpactWindows users editing tasks in emdash should stop seeing duplicated folder/file rows in the file tree, so directory navigation becomes reliable and they are less likely to open the wrong path or lose context during workspace work. The implementation also reduces renderer confusion by feeding the UI a normalized, root-aware hierarchy and lazily loading folder contents, but it should be monitored for regressions in watch-event handling and path normalization when directories are added or removed during long sessions.
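The combination of path normalization and a per-folder `children` model can be sketched as follows. emdash is not written in Python, so this is only an illustration of the data-structure idea; the `FileNode` fields and helper names here are assumptions, and every leaf is treated as a file for simplicity.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath, PureWindowsPath


@dataclass
class FileNode:
    name: str
    path: str  # normalized POSIX-style path relative to the root
    is_dir: bool
    children: dict[str, "FileNode"] = field(default_factory=dict)


def to_posix(rel_path: str) -> str:
    """Normalize a relative path that may use backslashes or forward slashes."""
    return PureWindowsPath(rel_path).as_posix()


def build_tree(rel_paths: list[str]) -> FileNode:
    """Build a hierarchy in which each folder owns its children."""
    root = FileNode(name="", path="", is_dir=True)
    for raw in rel_paths:
        parts = PurePosixPath(to_posix(raw)).parts
        node = root
        for index, part in enumerate(parts):
            child = node.children.get(part)
            if child is None:
                is_leaf = index == len(parts) - 1
                child = FileNode(
                    name=part,
                    path="/".join(parts[: index + 1]),
                    is_dir=not is_leaf,
                )
                node.children[part] = child
            node = child
    return root
```

Because `src\routes\index.ts` and `src/routes/about.ts` normalize to the same `src/routes` prefix, both files end up under one `routes` node instead of producing separate root-level rows.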
This release adds Cursor as a first-class agent integration in Agent Deck’s terminal session manager, so users can run Cursor workflows through the same Agent Deck interface instead of separate manual tooling.
ContributionIntroduces native support for the Cursor command-line agent path as a first-class option in Agent Deck, reducing the need for external glue scripts and enabling direct orchestration of Cursor-based coding sessions from one tool.
ImpactDevelopers using Cursor for AI-assisted coding can manage Cursor-based sessions inside Agent Deck directly, which lowers setup friction and makes day-to-day automation less error-prone compared with custom wrapper workflows. This is likely implemented as native CLI dispatch integration, so teams should watch for Cursor version/flag compatibility issues, session-context propagation behavior, and authentication or environment-variable handling during migration from previous custom setups.
This change refactors the task editor file tree from a flat parent-index renderer to a normalized hierarchical `FileNode` model with per-folder `children`, and normalizes local relative paths to POSIX form so nested Windows paths are displayed as a proper directory tree instead of duplicated root-level wrappers.
ContributionImplemented a concrete file-tree behavior fix: store and render task file structures from nested `FileNode` children with stable POSIX-style relative paths, while applying path normalization across load/reveal/add/remove/watch operations and keeping folder expansion state, which directly removes duplicate wrapper rows and stabilizes tree visibility.
ImpactTask editor users, especially on Windows, will see a correct and stable file hierarchy when browsing or editing files, reducing confusion from duplicate folder/file rows and lowering the chance of selecting the wrong path during normal work; teams should continue watching for any path-related regressions in watch/reveal updates where normalized paths might miss or mis-map events. This is achieved by replacing the flat renderer with child-walked root visibility, enforcing path normalization in LocalFileSystem and file-store flows, and preserving `expandedPaths` while loading directory contents lazily on expand/reveal.
The commit burst includes a feature change to add MySQL IAM authentication support (INT-420), introducing schema/lint updates so mirrord users can configure MySQL access via IAM-style credentials instead of only static secrets.
ContributionIntroduces an IAM-based MySQL authentication path in mirrord, including configuration/schema handling and validation updates so database access can be expressed through IAM credentials.
ImpactDevelopers and operators using mirrord with MySQL can move to IAM-based authentication, reducing dependence on long-lived database passwords and making local-to-prod style workflows safer and cleaner for cloud environments. The change is currently represented by config/schema/lint updates, so teams should monitor whether IAM token retrieval, role/permission mapping, and rotation work reliably under real deployment patterns and watch for auth-failure regressions during integration testing.
Release v0.13.7 fixes a key configuration issue by ensuring the model selected in Codeg settings is actually applied when the app launches.
ContributionCodex changes startup configuration handling so env_json explicitly sets OPENAI_MODEL to the selected model, preventing the launch process from ignoring the user’s model choice.
ImpactUsers who switch to a specific model before starting Codeg now get that model loaded automatically, so automation and long-running coding sessions won’t quietly run on a wrong backend. The implementation makes env_json the authoritative source at launch, and you should watch for cases where external startup scripts also set OPENAI_MODEL, which could create unexpected precedence conflicts.
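The precedence concern can be made concrete with a small sketch. This is a hypothetical Python illustration of "settings win over the inherited environment", not Codeg's code; the function name, the merge order, and the placeholder model names are assumptions.

```python
import json


def build_launch_env(inherited: dict[str, str], env_json: str, selected_model: str) -> dict[str, str]:
    """Merge the inherited environment, stored env_json, and the selected model."""
    settings = json.loads(env_json) if env_json else {}
    if selected_model:
        # The model chosen in settings is written into the env_json values.
        settings["OPENAI_MODEL"] = selected_model
    env = dict(inherited)
    # Applied last, so env_json overrides anything set by an external script.
    env.update(settings)
    return env
```

With this ordering, an `OPENAI_MODEL` exported by a wrapper script is replaced by the value from settings; if the real launcher applies the two sources in the opposite order, the external value would win instead, which is the conflict worth checking.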
Release v15.1.7 adds a new Anthropic fast mode to the AI coding-agent and wires in automatic fallback logic, prioritizing responsiveness while reducing manual recovery when the fast path is unavailable.
ContributionIntroduced Anthropic fast-mode execution in the ai/coding-agent path with automatic fallback behavior when the fast path cannot be used.
ImpactDevelopers using the AI coding-agent can keep receiving coding-assistant responses more reliably, because fast-mode failures now fall back automatically instead of requiring manual intervention or blocking workflows. This is implemented as a new fast-mode path for Anthropic in the coding-agent with built-in auto-fallback to an alternate path on failure, which should improve continuity of interactive coding sessions. Teams should monitor how often fallback triggers, whether fallback changes response quality or latency, and whether provider-specific edge cases (timeouts, rate limits, permission drift) appear more often.
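A fast-path-with-fallback wrapper, including the fallback counter that the monitoring advice implies, might look like the sketch below. It is a generic Python illustration: the exception type, function names, and the `stats` dictionary are hypothetical and do not reflect the release's actual API.

```python
from typing import Callable


class FastModeUnavailable(Exception):
    """Hypothetical error signalling that the fast path cannot serve the request."""


def complete_with_fallback(
    prompt: str,
    fast: Callable[[str], str],
    standard: Callable[[str], str],
    stats: dict[str, int],
) -> str:
    """Try the fast path first; on failure, count the event and use the standard path."""
    try:
        return fast(prompt)
    except FastModeUnavailable:
        # Counting fallbacks makes it possible to see how often the fast path is missed.
        stats["fallbacks"] = stats.get("fallbacks", 0) + 1
        return standard(prompt)
```

Only the dedicated exception triggers the fallback here, so unrelated errors still surface to the caller instead of being masked by a silent retry.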
This pull request fixes a bug where the conversation interface could appear empty, restoring normal rendering when the conversation is in an empty state.
ContributionIntroduced a fix to the conversation rendering path for empty states, preventing the chat view from failing to display when there is no immediate conversation content.
ImpactUsers opening or returning to a chat can now see the conversation area instead of a blank screen, so they can continue interacting with the app without confusion or false failure reports. This likely improves day-to-day usability, but teams should watch for any remaining state-transition paths (e.g., loading to empty, switch-to-empty, and error-to-empty) that might still suppress rendering.