Discussion
Your File System Is Already a Graph Database
embedding-shape: I've been playing around with the same, but trying to use local models, since my Obsidian vault obviously contains a bunch of private things I'm not willing to share with for-profit companies. So far I have yet to find any small model that comes close to working as well as just codex or cc, even with 96GB of VRAM to play around with. I've started to think a fine-tuned model may be needed, specifically for "journal data retrieval" or something like that. Is anyone aware of existing models for this? I'd do it myself, but since I'm unwilling to send larger parts of my data to third parties, I'm struggling to collect actual data I could use for fine-tuning, ending up in a bit of a catch-22.
exossho: I can't remember how many file structures I've already tried... LLMs seem to be a great help here. I also used CC to organize my messy hard drive. Now I just need to find a good way to maintain the order...
freedomben: > Also used CC to organize my messy harddrive.

Do you still have your prompt, by chance, and would you be willing to share it? I took a stab at this and it didn't want to make many changes. I think I need to be more specific, but I'm not sure how to do that in a general way.
itake: I wonder though:

1. Why does the AI need that folder structure? Why not a flat list of files, letting the AI agent explore with BM25, grep, etc.?

2. Pre-computed compression vs. compression at query time. Karpathy (and you) are recommending pre-compressing and sorting the data into human-friendly buckets and language, based on hard-coded human opinions about how it might be queried. Why not just let the AI calculate this at run time? Many of these use cases have very few files, and for a low-traffic knowledge store it probably costs fewer tokens if you only tokenize the files you need.
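The query-time approach above is cheap to sketch. Here is a minimal BM25 scorer over an in-memory list of documents, standing in for a flat directory of notes; the documents, query, and parameter values (k1=1.5, b=0.75) are illustrative assumptions, not anything from the thread:

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Crude tokenizer: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in `docs` against `query` with classic BM25."""
    toks = [tokenize(d) for d in docs]
    avgdl = sum(len(t) for t in toks) / len(toks)
    n = len(docs)
    df = Counter()  # document frequency per term
    for t in toks:
        df.update(set(t))
    scores = []
    for t in toks:
        tf = Counter(t)  # term frequency within this document
        s = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            s += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(t) / avgdl))
        scores.append(s)
    return scores


# Hypothetical note contents standing in for files on disk.
docs = [
    "meeting notes from monday",
    "grocery list milk eggs",
    "planning meeting agenda and notes",
]
scores = bm25_scores("planning meeting", docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

An agent could run something like this (or plain grep) on demand, reading only the top-scoring files, instead of relying on a pre-built hierarchy.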
laurowyn: > Why does AI need that folder structure? Why not a flat list of files and let the AI agent explore with BM25 / grep, etc.

It doesn't. The human creating the files needs it, to make traversal easier as the file count grows. At 52k files, that's a horrendous list to scroll through to find the thing you're looking for. Meanwhile, an AI can just `find . -type f -exec whatever {} \;` and process the results however it needs. The human doesn't need to change the way they work to appease the magic rock in the box under the desk.
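The traversal described above really is a one-liner for the agent. A small sketch, with an invented `notes/` directory and contents (the real command after `-exec` would be whatever processing the agent needs):

```shell
# Build a tiny flat "vault" (hypothetical contents, for illustration only).
mkdir -p notes
printf 'quarterly planning meeting\n' > notes/a.txt
printf 'grocery list\n' > notes/b.txt

# Enumerate every file and keep only those mentioning the query term;
# the agent then reads just the matching files -- no hierarchy required.
find notes -type f -exec grep -l 'planning' {} \;
```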