What Happened
- This change adds a dedicated daily evaluation module so teams can benchmark browser-agent runs in a repeatable, task-card-driven way instead of relying on ad-hoc scripts. Repeatable runs make quality drift easier to detect over time.
- 1 evidence item attached for review.