Discussion
greenstevester/how-to-setup-ollama-on-a-macmini.md
redrove: There is virtually no reason to use Ollama over LM Studio or the myriad of other alternatives.Ollama is slower and they started out as a shameless llama.cpp ripoff without giving credit and now they “ported” it to Go which means they’re just vibe code translating llama.cpp, bugs included.
easygenes: Why is ollama so many people’s go-to? Genuinely curious, I’ve tried it but it feels overly stripped down / dumbed down vs nearly everything else I’ve used.Lately I’ve been playing with Unsloth Studio and think that’s probably a much better “give it to a beginner” default.
iLoveOncall: > There is virtually no reason to use Ollama over LM Studio or the myriad of other alternatives.Hmm, the fact that Ollama is open-source, can run in Docker, etc.?
alifeinbinary: I really like LM Studio when I can use it under Windows but for people like me with Intel Macs + AMD gpu ollama is the only option because it can leverage the gpu using MoltenVK aka Vulkan, unofficially. We're still testing it, hoping to get the Vulkan support in the main branch soon. It works perfectly for single GPUs but some edge cases when using multiple GPUs are unsupported until upstream support from MoltenVK comes through. But yeah, I agree, it wasn't cool to repackage Georgi's work like that.
robotswantdata: Why are you using Ollama? Just use llama.cppbrew install llama.cppuse the inbuilt CLI, Server or Chat interface. + Hook it up to any other app
gen6acd60af: LM Studio is closed source.And didn't Ollama independently ship a vision pipeline for some multimodal models months before llama.cpp supported it?
lousken: lm studio is not opensource and you can't use it on the server and connect clients to it?
jedisct1: LM Studio can absolutely run as as server.
meltyness: I feel like the READMEs for these 3 large popular packages already illustrate tradeoffs better than hacker news argument
greenstevester: Right. So Google released Gemma 4, a 26B mixture-of-experts model that only activates 4B parameters per token.It's essentially a model that's learned to do the absolute minimum amount of work while still getting paid. I respect that enormously.It scores 1441 on Arena Elo — roughly the same as Qwen 3.5 at 397B and Kimi k2.5 at 1100B.Ollama v0.19 switched to Apple's MLX framework on Apple Silicon. 93% faster decode.They've also improved caching so your coding agents don't have to re-read the entire prompt every time, about time I'd say.The gist covers the full setup: install, auto-start on boot, keep the model warm in memory.It runs on a 24GB Mac mini, which means the most expensive part of your local AI setup is still the desk you put it on.
krzyk: By desk you mean that "Mac mini"? Because it is pricey. In my country it is 1000 USD (from Apple for basic M4 with 24GB). My desk was 1/5th of that price.And considering that this Mac mini won't be doing anything else is there a reason why not just buy subscription from Claude, OpenAI, Google, etc.?Are those open models more performant compared to Sonnet 4.5/4.6? Or have at least bigger context?
boutell: Last night I had to install the VO.20 pre-release of ollama to use this model. So I'm wondering if these instructions are accurate.
diflartle: Ollama is good enough to dabble with, and getting a model is as easy as ollama pull <model name> vs figuring it out by yourself on hugging face and trying to make sense on all the goofy letters and numbers between the forty different names of models, and not needing a hugging face account to download.So you start there and eventually you want to get off the happy path, then you need to learn more about the server and it's all so much more complicated than just using ollama. You just want to try models, not learn the intricacies of hosting LLMs.
Bigsy: For MLX I'd guess.