Discussion
How we built a virtual filesystem for our Assistant
seanlinehan: This is definitely the way. There are good use cases for real sandboxes (if your agent is executing arbitrary code, it had better do so in an air-gapped environment). But the idea of spinning up a whole VM just to use unix IO primitives is way overkill. It makes way more sense to let the agent spit out unix-like tool calls and then use whatever your prod stack uses to do the IO.
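A minimal sketch of that dispatcher idea (all names hypothetical; the in-memory dict here stands in for whatever your prod stack actually uses - S3, a database, a docs CMS):

```python
# Hypothetical sketch: route unix-like agent tool calls to ordinary
# application-level IO instead of a real sandboxed shell.
import fnmatch

# Stand-in "prod storage": path -> file contents. In a real stack this
# could be S3, Postgres, or a docs CMS -- anything that can list and read.
DOCS = {
    "docs/getting-started.md": "# Getting Started\nInstall the CLI...",
    "docs/api/auth.md": "# Auth\nPass the API key in a header...",
}

def ls(pattern: str = "*") -> list:
    """List paths matching a glob, like `ls`."""
    return sorted(p for p in DOCS if fnmatch.fnmatch(p, pattern))

def cat(path: str) -> str:
    """Read one file, like `cat`."""
    return DOCS[path]

def grep(needle: str) -> list:
    """Return 'path: line' hits, like a very naive case-insensitive `grep -r`."""
    hits = []
    for path, text in DOCS.items():
        for line in text.splitlines():
            if needle.lower() in line.lower():
                hits.append(f"{path}: {line}")
    return hits

# The agent emits a command name plus arguments; we dispatch to plain functions.
COMMANDS = {"ls": ls, "cat": cat, "grep": grep}

def run(cmd: str, *args: str):
    return COMMANDS[cmd](*args)
```

No VM, no session lifetime, no sandbox billing: each "command" is just a function call against storage the application already trusts.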
softwaredoug: The real thing I think people are rediscovering with file-system-based search is that there are other kinds of semantic search than embedding-based retrieval. One that looks more like how a librarian organizes files into shelves based on the domain. We're rediscovering forms of search we've known about for decades. And it turns out they're more interpretable to agents. https://softwaredoug.com/blog/2026/01/08/semantic-search-wit...
whattheheckheck: Turns out the millions of people in knowledge work aren't librarians and they wing shit everywhere
wielebny: Someone simply assumed at some point that RAG must be based on vector search, and everyone followed.
softwaredoug: It’s something of a historical accident. We started with LLMs when everyone in search was building question-answering systems. Those architectures look like the vector DB + chunking we associate with RAG. Agents' ability to call tools, using any retrieval backend, calls that into question.
pboulos: I think this is a great approach for a startup like Mintlify. I do have skepticism around how practical this would be in some of the “messier” organisations where RAG stands to add the most value. From personal experience, getting RAG to work well in places where the structure of the organisation and the information contained therein is far from hierarchical or partition-able is a very hard task.
dmix: This puts a lot of LLM in front of the information discovery. That would require far more sophisticated prompting and guardrails. I'd be curious to see how people architect an LLM->document approach with tool calling, rather than RAG->reranker->LLM. I'm also curious what the response times are like since it's more variable.
skeptrune: Hmmm, the post is an attempt to explain that Mintlify migrated from embedding-retrieval->reranker->LLM to an agent loop with access to call POSIX tools as it desires. Perhaps we didn't provide enough detail?
tschellenbach: I think generally we are going from vector based search, to agentic tool use, and hierarchy based systems like skills.
skeptrune: Vector search has moved from a "complete solution" to just one tool among many which you should likely provide to an agent.
skeptrune: I think it's cool that LLMs can effectively do this kind of categorization on the fly at relatively large scale. When you give the LLM tools beyond just "search", it really is effectively cheating.
ctxc: haha, sweet. One of the cooler things I've read lately
mandeepj: > even a minimal setup (1 vCPU, 2 GiB RAM, 5-minute session lifetime) would put us north of $70,000 a year based on Daytona's per-second sandbox pricing ($0.0504/h per vCPU, $0.0162/h per GiB RAM)

$70k? How about we round off one zero and make it $7,000? That number still seems very high.
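For what it's worth, the quoted prices pin down the per-session cost, so the $70k figure is really a statement about volume. The back-of-envelope below is my arithmetic, not from the post; the implied session count is an inference:

```python
# Back-of-envelope arithmetic using the prices quoted above
# ($0.0504/h per vCPU, $0.0162/h per GiB RAM; 1 vCPU + 2 GiB, 5-min sessions).
VCPU_PER_HOUR = 0.0504
GIB_PER_HOUR = 0.0162

hourly = 1 * VCPU_PER_HOUR + 2 * GIB_PER_HOUR   # $0.0828/hour per sandbox
per_session = hourly * 5 / 60                   # ~$0.0069 per 5-minute session
implied_sessions = 70_000 / per_session         # ~10.1M sessions/year (~28k/day)
```

So $70k/year corresponds to roughly ten million sandbox sessions a year at those prices; whether that volume is realistic is the actual question.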
lstodd: Hm. I think a dedicated 16-core box with 64 GiB of RAM can be had for under $1000/year. It being dedicated, there are no limits on session lifetime, and it'd run 16 of those sessions no problem, so the real price should be around ~$70/year for that load.
bluegatty: It was the terminology that did that more than anything. The term 'RAG' just has a lot of consequential baggage. Unfortunately.
bluegatty: RAG shouldn't have been presented as a general context tool, but rather just as vector querying, one variation of search/query - and that's it. We were bitten by our own nomenclature. Just a small variation in the chosen acronym might have wrought a different outcome. Different ways to find context are welcome, we have a long way to go!
maille: Let's say I want a free, local or free-tier-LLM, simple solution to search information mostly from my emails and a little bit from text, doc, and pdf files. Are there any tools I should try to get Ollama or Gemini to reply using my own knowledge base?
ghywertelling: https://onyx.app/ This could be useful.
czhu12: Similar effort with PageIndex [1], which basically creates a table-of-contents-like tree. An LLM then traverses the tree to figure out which chunks are relevant to the context in the prompt.

1: https://github.com/VectifyAI/PageIndex
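A toy version of that traversal (my sketch, not PageIndex's actual API; the keyword scorer is a stand-in for a real LLM relevance call):

```python
# Toy table-of-contents tree: instead of searching flat chunks, descend
# the tree one level at a time, following the most relevant branch.
from dataclasses import dataclass, field

@dataclass
class Node:
    title: str
    text: str = ""                       # leaf content (the chunk)
    children: list = field(default_factory=list)

def score(query: str, node: Node) -> int:
    # Stub "LLM": count query words that appear in the section title.
    return sum(w in node.title.lower() for w in query.lower().split())

def traverse(query: str, node: Node) -> Node:
    """Walk the ToC, at each level following the highest-scoring child."""
    while node.children:
        node = max(node.children, key=lambda c: score(query, c))
    return node

toc = Node("Handbook", children=[
    Node("Billing", children=[
        Node("Invoices", text="Invoices are emailed on the 1st."),
        Node("Refunds", text="Refunds take 5-7 business days."),
    ]),
    Node("Security", children=[
        Node("API keys", text="Rotate keys every 90 days."),
    ]),
])
```

With a real LLM doing the scoring, each step is one small, interpretable decision ("which section would a librarian check?") rather than a nearest-neighbor lookup over embeddings.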
TeMPOraL: Right. R in RAG stands for retrieval, and for a brief moment initially, it meant just that: any kind of tool call that retrieves information based on query, whether that was web search, or RDBMS query, or grep call, or asking someone to look up an address in a phone book. Nothing in RAG implies vector search and text embeddings (beyond those in the LLM itself), yet somehow people married the acronym to one very particular implementation of the idea.
oceansky: I'm still using the old definition, never got the memo.
Galanwe: I am not familiar with the tech stack they use, but from an outsider's point of view, I was sort of expecting some kind of FUSE solution. Could someone explain why they went with a fake shell instead? There has to be a reason.
skeptrune: 100% agree a FUSE mount would be the way to go given more time and resources. Putting Chroma behind a FUSE adapter was my initial thought when I was implementing this, but it was way too slow. I think we would also need to optimize grep even if we had a FUSE mount. This was easier in our case because we didn't need 100% POSIX compatibility for our read-only docs use case: the agent only used a subset of bash commands to traverse the docs anyway. This also avoids any extra infra overhead or maintenance of EC2 nodes/sandboxes that the agent would have to use.
Galanwe: Makes sense, thanks for clarifying!