Discussion
rs545837: Some context on the validation so far: Elijah Newren, who wrote git's merge-ort (the default merge strategy), reviewed weave and said language-aware content merging is the right approach, that he's been asked about it enough times to be certain there's demand, and that our fallback-to-line-level strategy for unsupported languages is "a very reasonable way to tackle the problem." Taylor Blau from the Git team said he's "really impressed" and connected us with Elijah. The creator of libgit2 starred the repo. Martin von Zweigbergk (creator of jj) has also been excited about the direction. We are also working with the GitButler team to integrate it as a research feature.

The part that's been keeping me up at night: this becomes critical infrastructure for multi-agent coding. When multiple agents write code in parallel (Cursor, Claude Code, and Codex all ship this now), they create worktrees for isolation. But when those branches merge back, git's line-level merge breaks on cases where two agents added different functions to the same file. weave resolves these cleanly because it knows they're separate entities: 31/31 vs. git's 15/31 on our benchmark.

weave also ships as an MCP server with 14 tools, so agents can claim entities before editing, check who's touching what, and detect conflicts before they happen.
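To make the two-agents-add-different-functions case concrete, here is a minimal sketch of entity-level three-way merging (not weave's actual implementation, just an illustration of the idea, using Python's stdlib `ast` module): each side's top-level definitions are compared as whole entities, so two branches that add different functions produce disjoint changes and merge cleanly.

```python
# Illustrative sketch of entity-aware three-way merge, NOT weave's code.
# Two branches each add a different top-level function to the same file;
# at line granularity the edits overlap, at entity granularity they don't.
import ast


def entities(source: str) -> dict:
    """Map top-level function/class names to their exact source text."""
    tree = ast.parse(source)
    return {
        node.name: ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.ClassDef))
    }


def merge(base: str, ours: str, theirs: str) -> str:
    """Keep each side's added/changed entities; a real tool would also
    flag the case where both sides modify the SAME entity differently."""
    b, o, t = entities(base), entities(ours), entities(theirs)
    merged = dict(b)
    for side in (o, t):
        for name, src in side.items():
            if b.get(name) != src:  # added or modified on this side
                merged[name] = src
    for name in b:
        if name not in o or name not in t:  # deleted on either side
            merged.pop(name, None)
    return "\n\n".join(merged.values()) + "\n"


base = "def shared():\n    return 1\n"
ours = base + "\ndef added_by_agent_a():\n    return 2\n"
theirs = base + "\ndef added_by_agent_b():\n    return 3\n"
result = merge(base, ours, theirs)
# result contains shared(), added_by_agent_a(), and added_by_agent_b()
```

Line-based `git merge` would report a conflict here because both additions touch the same region at the end of the file; the entity view sees two independent insertions.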
tveita: > Elijah Newren, who wrote git's merge-ort (the default merge strategy), reviewed weave and said language-aware content merging is the right approach, that he's been asked about it enough times to be certain there's demand, and that our fallback-to-line-level strategy for unsupported languages is "a very reasonable way to tackle the problem." Taylor Blau from the Git team said he's "really impressed" and connected us with Elijah. The creator of libgit2 starred the repo. Martin von Zweigbergk (creator of jj) has also been excited about the direction.

Are any of these statements public, or is this all private communication?

> We are also working with GitButler team to integrate it as a research feature.

Referring to this discussion, I assume: https://github.com/gitbutlerapp/gitbutler/discussions/12274
rs545837: Email conversations with Elijah and Taylor are private. Martin commented on our X post that went viral, and suggested a new benchmark design.
gritzko: At this point, the question is: why keep files as blobs in the first place? If a revision control system stores ASTs instead, all the work is AST-level. One can then run SQL-level queries to see what is changing where. Like:

- do any concurrent branches touch this function?
- what new uses did this function accrete recently?
- did we create any actual merge conflicts?

Almost LSP-level querying, involving versions and branches. Beagle is a revision control system like that [1]. It is quite early stage, but the surprising finding is: instead of being a depository of source code blobs, an SCM can be the hub of all activities. Beagle's architecture is extremely open in the assumption that a lot of things can be built on top of it. Essentially, it is a key-value db; keys are URIs and values are BASON (binary mergeable JSON) [2]. Can't be more open than that.

[1]: https://github.com/gritzko/librdx/tree/master/be
[2]: https://github.com/gritzko/librdx/blob/master/be/STORE.md
samuelstros: How do you get blob file writes fast? I built lix [0], which stores ASTs instead of blobs.

Direct AST writing works for apps that are "AST-aware", and I can confirm it works great. But all software just writes bytes atm, and the binary -> parse -> diff pipeline is too slow.

The parse and diff steps need to get out of the hot path. That semi-defeats the idea of a VCS that stores ASTs, though.

[0] https://github.com/opral/lix
gritzko: I only diff the changed files. Producing a blob out of a BASON AST is trivial (one scan). Things may get slow for larger files; e.g., the tree-sitter C++ parser is a 25MB C file, 750KLoC, and takes a couple of seconds to import. But it never changes, so no biggie.

There is room for improvement, but that is not a show-stopper so far. I plan to round-trip the Linux kernel with full history, which should surface all the bottlenecks.

P.S. I checked lix. It uses a SQL database. That solves some things, but also creates an impedance mismatch; must be a 10x slowdown at least. I use key-value and a custom binary format, so it works nicely. One could go a level deeper still and use a custom storage engine; it would be even faster. Git is all custom.
rs545837: Good framing. Source code is already a serialization of an AST; we just forgot that and started treating it as text. The practical problem is adoption: every tool in the ecosystem reads bytes.
shubhamintech: The merge conflict is the symptom. The root problem is parallel agents have no coordination primitives before edits happen. The MCP server angle is the more interesting long-term bet here because it moves conflict avoidance earlier in the workflow rather than cleaning up damage after the merge. Entity claiming as a first-class primitive is where this gets really interesting for multi-agent coding. What do you think?