Code · Tracked since May 19, 2026

Persist interrupted LLM partial responses in run journal

In DeerFlow PR #3039, the run worker now buffers streamed `AIMessageChunk` data by stable message ID and, on `RunStatus.interrupted`, writes it as `llm.ai.partial` journal output so a stopped run can return partial assistant text instead of an empty run record after refresh.

RunJournal / AIMessageChunk / partial_ai_content / record_partial_ai_message

What Happened

  • In DeerFlow PR #3039, the run worker now buffers streamed `AIMessageChunk` data by stable message ID and, on `RunStatus.interrupted`, writes it as `llm.ai.partial` journal output so a stopped run can return partial assistant text instead of an empty run record after refresh.
  • 1 evidence item attached for review.

What is Different

Before

Streamed assistant text that was still in progress when a run was stopped was not persisted, so reloading the run after a refresh showed an empty run record.

Now

The run worker buffers streamed chunks during LLM generation and persists them when a run is manually stopped. It also records completed message IDs so that a message which finished normally is not written again as a partial entry. In-progress output that was previously lost becomes recoverable history.
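The buffering described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `Chunk`, `Journal`, `PartialBuffer`, and the `success` status are hypothetical stand-ins, and the real `AIMessageChunk` and `RunJournal` interfaces may look different. Only the `llm.ai.partial` event name and `RunStatus.interrupted` come from the summary.

```python
from dataclasses import dataclass, field
from enum import Enum


class RunStatus(Enum):
    success = "success"          # assumed name for normal completion
    interrupted = "interrupted"  # status named in the summary


@dataclass
class Chunk:
    """Stand-in for a streamed AIMessageChunk: stable ID plus a text fragment."""
    id: str
    content: str


@dataclass
class Journal:
    """Hypothetical in-memory journal; the real RunJournal API may differ."""
    events: list = field(default_factory=list)

    def write(self, event_type: str, message_id: str, content: str) -> None:
        self.events.append({"type": event_type, "id": message_id, "content": content})


class PartialBuffer:
    """Accumulates streamed text per message ID until the run ends."""

    def __init__(self) -> None:
        self._parts: dict[str, list[str]] = {}
        self._completed: set[str] = set()

    def on_chunk(self, chunk: Chunk) -> None:
        # Ignore late chunks for a message that already completed normally.
        if chunk.id in self._completed:
            return
        self._parts.setdefault(chunk.id, []).append(chunk.content)

    def on_complete(self, message_id: str) -> None:
        # Remember the ID and drop its buffer so it is never flushed as a partial.
        self._completed.add(message_id)
        self._parts.pop(message_id, None)

    def finalize(self, status: RunStatus, journal: Journal) -> None:
        # Flush partials only on interruption; always clear the buffer afterwards.
        if status is RunStatus.interrupted:
            for message_id, parts in self._parts.items():
                text = "".join(parts)
                if text and message_id not in self._completed:
                    journal.write("llm.ai.partial", message_id, text)
        self._parts.clear()


journal = Journal()
buf = PartialBuffer()
buf.on_chunk(Chunk("msg-1", "Hello"))
buf.on_complete("msg-1")                    # finished before the stop
buf.on_chunk(Chunk("msg-2", "The answer "))
buf.on_chunk(Chunk("msg-2", "is"))          # still streaming when stopped
buf.finalize(RunStatus.interrupted, journal)
print(journal.events)
```

In this sketch only `msg-2` reaches the journal, because `msg-1` completed before the stop. Clearing the buffer on every terminal status is a design choice of the sketch, intended to keep stale fragments from leaking into a later run.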

Why Track This

Why It Matters

Developers and operators who reload a run after an explicit stop now see the partial assistant response instead of an empty run record. This preserves diagnostic context and reduces the chance of having to replay or discard an interrupted session. The worker buffers chunked output and emits `llm.ai.partial` events in the interrupt finalization path, and it tracks completed message IDs to suppress duplicates when normal completion happened just before the stop. Watch next for partial-buffer cleanup and dedup behavior under rapid stop/retry cycles or high-concurrency streaming, since stale buffers or race windows could reintroduce duplicated or missing fragments.
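On the reload side, the duplicate concern can be illustrated with a small reader that rebuilds messages from journal events. This is a hedged sketch under assumed data shapes: the event dictionaries, `rebuild_messages`, and the `FINAL_EVENT` placeholder are hypothetical, since the summary names only the `llm.ai.partial` event.

```python
PARTIAL_EVENT = "llm.ai.partial"  # event name taken from the summary
FINAL_EVENT = "final"             # placeholder: the completed-message event is not named


def rebuild_messages(events: list[dict]) -> list[dict]:
    """Return one message per ID in first-seen order, preferring final text."""
    by_id: dict[str, dict] = {}
    for event in events:
        if event["type"] not in (PARTIAL_EVENT, FINAL_EVENT):
            continue
        is_partial = event["type"] == PARTIAL_EVENT
        existing = by_id.get(event["id"])
        # A partial never overwrites a final message with the same ID.
        if existing is not None and is_partial and not existing["partial"]:
            continue
        by_id[event["id"]] = {
            "id": event["id"],
            "content": event["content"],
            "partial": is_partial,
        }
    return list(by_id.values())


events = [
    {"type": FINAL_EVENT, "id": "msg-1", "content": "Hello"},
    {"type": PARTIAL_EVENT, "id": "msg-1", "content": "Hel"},  # raced duplicate
    {"type": PARTIAL_EVENT, "id": "msg-2", "content": "The answer "},
]
rebuilt = rebuild_messages(events)
print(rebuilt)
```

If a race did let a partial entry land after a completed one, a reader like this would still show each message once. Whether DeerFlow relies on write-side suppression alone or also guards on read is not stated in the summary.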

What To Watch Next

  • Watch whether RunJournal becomes a repeated pattern.
  • Track follow-up changes around AI Agents.
  • Compare future signals against this evidence trail.
  • Re-check risk flags: partial_buffer_cleanup_on_abort, high_concurrency_duplicate_suppression, long_running_stream_memory_growth.

Supporting Evidence