Code · Tracked since May 19, 2026

Fix watcher subject truncation to preserve UTF-8 validity

The pull request changes `firstLine()` in `internal/watcher` so that when a subject is capped at `maxLen=200` bytes, the cut backs up to the nearest preceding UTF-8 rune boundary instead of slicing at an arbitrary byte offset. This keeps `watcher_events.subject` rows validly encoded for the Slack, ntfy, and webhook paths and prevents strict UTF-8 consumers from failing on truncated multi-byte characters.

`firstLine()` · `watcher_events.subject` · `internal/watcher/webhook.go` · `utf8.ValidString`

What Happened

  • 1 evidence item attached for review.

What is Different

Before

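`firstLine()` capped subjects with a raw byte slice at offset 200, so a cut landing inside a multi-byte character (Cyrillic, em-dash, emoji) wrote invalid UTF-8 into `watcher_events.subject`, where strict decoders such as the Python bridge could fail on it.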

Now

Implemented UTF-8 boundary-aware truncation in the shared `firstLine()` helper (used by `slack.go`, `ntfy.go`, and `webhook.go`), replacing raw byte slicing with a backward adjustment loop that removes at most 3 trailing bytes when a cut lands inside a multi-byte rune.

Why Track This

Why It Matters

Operators and integration developers running watcher consumers no longer risk a notification pipeline stalling on a single bad subject row: strict UTF-8 decoders (such as the Python bridge) can now read oversized multi-byte subjects without failing, so the queue keeps advancing. Technically, the fix removes the malformed trailing bytes left at the 200-byte truncation point and is validated against Cyrillic, em-dash, and emoji boundaries. Two open questions remain worth tracking: whether other event fields or upstream producers can still emit non-UTF-8 data, and whether retry logic should route failures to a dead-letter path instead of reprocessing them endlessly.
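The Cyrillic, em-dash, and emoji boundary cases can be checked with a small table-driven program. This is an illustrative sketch, not the PR's test suite: the case names, the one-byte ASCII prefix used to misalign each script against the 200-byte cap, and the hypothetical `capSubject` helper (restated here so the block stands alone) are assumptions. For each script it checks that a raw byte slice yields invalid UTF-8, that the boundary-aware cap stays valid, and that at most 3 bytes are dropped.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

const maxLen = 200

// capSubject (hypothetical) backs the cut up to the previous rune boundary.
func capSubject(s string, limit int) string {
	if len(s) <= limit {
		return s
	}
	cut := limit
	for i := 0; i < 3 && cut > 0 && !utf8.RuneStart(s[cut]); i++ {
		cut--
	}
	return s[:cut]
}

type boundaryCase struct {
	name string
	r    string // a single rune, repeated to overflow maxLen
	n    int    // number of repetitions
}

// A leading "a" shifts every rune by one byte, so the 200-byte cut
// lands mid-rune for each encoding width.
var cases = []boundaryCase{
	{"cyrillic", "\u0416", 101}, // Ж, 2 bytes
	{"em-dash", "\u2014", 67},   // 3 bytes
	{"emoji", "\U0001F680", 51}, // 4 bytes
}

// check compares a raw byte slice against the boundary-aware cap.
func check(c boundaryCase) (naiveValid, safeValid bool, dropped int) {
	s := "a" + strings.Repeat(c.r, c.n)
	naive := s[:maxLen]
	safe := capSubject(s, maxLen)
	return utf8.ValidString(naive), utf8.ValidString(safe), maxLen - len(safe)
}

func main() {
	for _, c := range cases {
		nv, sv, d := check(c)
		fmt.Printf("%-8s naive valid=%t safe valid=%t bytes dropped=%d\n", c.name, nv, sv, d)
	}
}
```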

What To Watch Next

  • Watch whether `firstLine()` becomes a repeated pattern.
  • Track follow-up changes around AI Agents.
  • Compare future signals against this evidence trail.
  • Re-check risk flags: strict_utf8_consumers_can_still_be_blocked_by_other_columns, truncated_subjects_shorten_unicode_content_by_up_to_3_bytes.
Risk flags: strict_utf8_consumers_can_still_be_blocked_by_other_columns / truncated_subjects_shorten_unicode_content_by_up_to_3_bytes / retry_queue_should_surface_and_isolate_corrupt_event_rows

Supporting Evidence