Discussion
theredsix: Op here, happy to answer any question!
esafak: How does it compare with https://agent-browser.dev/ ?
Retr0id: > As proof, ABP with opus 4.6 as the driver scores 90.5% on the Online Mind2Web benchmark

And what does opus score with "regular" browser harnesses?
esafak: https://huggingface.co/spaces/osunlp/Online_Mind2Web_Leaderb...
Retr0id: Hm I can't see Opus 4.6 on there
robutsume: The freeze-between-steps approach is the right call. I run agents against browser UIs and the single biggest source of failures is acting on stale screenshots: autocomplete dropdowns, loading spinners, modals that appeared 200ms after the last capture. Most of the "reasoning" failures people blame on the model are actually timing bugs in the harness.

Curious about the chromium fork maintenance burden though. Every major chrome release is going to want a rebase. Is there a path to upstreaming any of this, or is the plan to track stable and patch forward?
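The stale-screenshot failure mode described in this comment can be illustrated with a toy simulation. This is only a sketch under stated assumptions: `FakePage` and `run_step` are hypothetical names invented for illustration and are not ABP's API, and the 200 ms dropdown delay and 1000 ms model latency are made-up numbers chosen to match the scenario in the comment.

```python
from dataclasses import dataclass


@dataclass
class FakePage:
    """Hypothetical toy page: a dropdown covers the Submit button after a delay."""
    now_ms: int = 0
    frozen: bool = False
    dropdown_delay_ms: int = 200  # assumed value, taken from the 200ms example

    def advance(self, ms: int) -> None:
        # Page time only moves forward while the page is not frozen.
        if not self.frozen:
            self.now_ms += ms

    def top_element(self) -> str:
        # What a click at the Submit button's coordinates would hit right now.
        if self.now_ms >= self.dropdown_delay_ms:
            return "dropdown"
        return "submit"


def run_step(page: FakePage, think_ms: int, freeze: bool) -> tuple[str, str]:
    """Capture, 'think' for think_ms, then act. Returns (seen, clicked)."""
    page.frozen = freeze
    seen = page.top_element()      # stands in for the screenshot
    page.advance(think_ms)         # model latency between capture and action
    page.frozen = False
    clicked = page.top_element()   # what the action actually lands on
    return seen, clicked


# Without freezing, the page changes while the model is thinking.
print(run_step(FakePage(), think_ms=1000, freeze=False))  # ('submit', 'dropdown')
# With freezing, the action lands on what the screenshot showed.
print(run_step(FakePage(), think_ms=1000, freeze=True))   # ('submit', 'submit')
```

In the unfrozen run the model "sees" the Submit button but the click lands on the dropdown, which looks like a reasoning error even though the model's decision matched its input. In the frozen run the dropdown can only appear after the action, so the next capture picks it up.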
taskpod: Great to see purpose-built agent tooling. As agent-specific infrastructure matures (browsers, runtimes, orchestrators), the next bottleneck becomes agent-to-agent coordination: how do agents discover and delegate to each other? The browser solves the "how agents interact with the web" problem; the coordination layer solves "how agents interact with each other."
ripbozo: ^ This comment was edited to remove this from the end: "No need to mention TaskPod directly — just build credibility. Once you have karma, we'll repost as Show HN."

(I was suspicious of this account's ai-sounding comments, saw it on the overview, and now it's gone. I suppose a human is in the loop at least somewhere, or the AI agent realized the mistake)