Code · Tracked since May 19, 2026

Trim watcher subject truncation at UTF-8 boundaries

A single change in `firstLine()` fixes a correctness bug: slicing event subjects to `maxLen` could cut a multibyte UTF-8 character in half and write invalid UTF-8 into `watcher_events.subject`. The function now backs up to the nearest valid boundary before the subject is stored, and targeted tests cover the UTF-8 edge cases (Cyrillic, em dash, emoji, and boundary alignment).

firstLine() · UTF-8 boundary trimming · watcher_events.subject · internal/watcher/webhook.go

What Happened

  • Bug: `firstLine()` sliced event subjects to `maxLen`, which could cut a multibyte UTF-8 character in half and write invalid UTF-8 into `watcher_events.subject`.
  • Fix: the function now backs up to the nearest valid boundary before the subject is stored.
  • Tests: targeted cases cover the UTF-8 edge cases (Cyrillic, em dash, emoji, and boundary alignment).
  • 1 evidence item attached for review.
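
The boundary back-up described above can be sketched in Go as follows. This is a minimal illustration, not the code in `internal/watcher/webhook.go`: the helper name `truncateAtRuneBoundary` is hypothetical, `maxLen` is assumed to be a byte count, and the input is assumed to be valid UTF-8 to begin with.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateAtRuneBoundary is a hypothetical helper, not the project's
// firstLine(). It caps s at maxLen bytes without splitting a UTF-8 character.
// It assumes s is valid UTF-8; it does not repair already-invalid input.
func truncateAtRuneBoundary(s string, maxLen int) string {
	if maxLen <= 0 {
		return ""
	}
	if len(s) <= maxLen {
		return s
	}
	cut := maxLen
	// s[cut] is the first byte that would be dropped. If it is a
	// continuation byte, the cut falls inside a character, so step back
	// until the cut sits on the first byte of a character.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut]
}

func main() {
	subject := "fix: Привет" // each Cyrillic letter takes 2 bytes
	naive := subject[:6]     // plain byte slice splits the first letter
	safe := truncateAtRuneBoundary(subject, 6)
	fmt.Println(utf8.ValidString(naive)) // false
	fmt.Println(utf8.ValidString(safe))  // true
	fmt.Printf("%q\n", safe)             // "fix: "
}
```

The sketch trades a slightly shorter subject for a string that is always safe to store and decode.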

What is Different

Before

`firstLine()` sliced event subjects to `maxLen` without checking UTF-8 boundaries, so a multibyte character could be cut in half and invalid UTF-8 written into `watcher_events.subject`.

Now

The watcher subject truncation path now keeps capped subject strings valid UTF-8, so multibyte input no longer produces corrupted DB rows. Regression tests assert both UTF-8 validity and that the result stays within the cap.
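
A table-driven check in the spirit of those regression tests might look like the sketch below. The case list, the `capSubject` stand-in, and the expected values are illustrative assumptions, not the project's actual test file, which would normally use the `testing` package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// capSubject is a hypothetical stand-in for the truncation inside
// firstLine(); it assumes maxLen counts bytes and s is valid UTF-8.
func capSubject(s string, maxLen int) string {
	if maxLen <= 0 {
		return ""
	}
	if len(s) <= maxLen {
		return s
	}
	cut := maxLen
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut]
}

type subjectCase struct {
	name   string
	input  string
	maxLen int
	want   string
}

// subjectCases mirrors the edge-case categories named in the summary.
func subjectCases() []subjectCase {
	return []subjectCase{
		{"cyrillic split", "Привет", 3, "П"},    // cap lands inside a 2-byte letter
		{"em dash split", "a\u2014b", 2, "a"},   // cap lands inside the 3-byte dash
		{"emoji split", "ok😀", 4, "ok"},         // cap lands inside the 4-byte emoji
		{"boundary aligned", "Привет", 4, "Пр"}, // cap already on a boundary
	}
}

// checkSubjectCases returns one message per failed case; empty means pass.
func checkSubjectCases() []string {
	var failures []string
	for _, c := range subjectCases() {
		got := capSubject(c.input, c.maxLen)
		switch {
		case !utf8.ValidString(got):
			failures = append(failures, c.name+": result is not valid UTF-8")
		case len(got) > c.maxLen:
			failures = append(failures, c.name+": result exceeds the cap")
		case got != c.want:
			failures = append(failures, fmt.Sprintf("%s: got %q, want %q", c.name, got, c.want))
		}
	}
	return failures
}

func main() {
	fmt.Println(len(checkSubjectCases()), "failures") // 0 failures
}
```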

Why Track This

Why It Matters

Watchers and downstream bridges (including the Python sqlite poller) can now process events without hitting UTF-8 decode crashes on poisoned `subject` rows, so forwarding no longer gets stuck re-reading the same bad row while the retry queue silently grows. In practice, a subject that would have been cut mid-character is now trimmed back to the nearest rune boundary, which removes at most 3 further trailing bytes, so event flow stays stable after non-ASCII subjects. Operators should still watch for other code paths that write `subject` values directly, and for future changes to truncation length settings that might reintroduce boundary splitting.
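
The 3-byte bound follows from UTF-8 itself: a character is at most 4 bytes long, so a cut that lands inside one has at most 3 bytes to step back over. The sketch below checks this by trying every possible cap on a few sample strings. The helper names are hypothetical, and the bound only holds when the input is valid UTF-8.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// bytesDropped reports how many bytes below maxLen a boundary-aware cut
// gives up for s. Hypothetical helper; assumes s is valid UTF-8.
func bytesDropped(s string, maxLen int) int {
	if maxLen <= 0 || len(s) <= maxLen {
		return 0
	}
	cut := maxLen
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return maxLen - cut
}

// maxDropped tries every cap from 1 to len(s)-1 and returns the worst case.
func maxDropped(s string) int {
	worst := 0
	for maxLen := 1; maxLen < len(s); maxLen++ {
		if d := bytesDropped(s, maxLen); d > worst {
			worst = d
		}
	}
	return worst
}

func main() {
	fmt.Println(maxDropped("ascii only")) // 0: every byte is a boundary
	fmt.Println(maxDropped("Привет"))     // 1: 2-byte characters
	fmt.Println(maxDropped("a\u2014b"))   // 2: 3-byte em dash
	fmt.Println(maxDropped("ok😀 done"))   // 3: 4-byte emoji
}
```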

What To Watch Next

  • Watch whether firstLine() becomes a repeated pattern.
  • Track follow-up changes around AI Coding Agents.
  • Compare future signals against this evidence trail.
  • Re-check risk flags: non_utf8_subject_injection_from_other_writers, boundary_regression_after_maxlen_changes, retry_queue_growth_if_other_columns_become_corrupt.

Supporting Evidence