Discussion
IonRouter
GodelNumbering: As an inference-hungry human, I am obviously hooked. Quick feedback:

1. The models/pricing page should perhaps be linked from the top, as that is the most interesting part for most users. You have mentioned some impressive numbers (e.g. GLM5 ~220 tok/s, $1.20 in · $3.50 out), but those are way down the page and many would miss them.

2. When looking for inference, I always look at three things: which models are supported, at which quantization, and what the cached input pricing is (this is way more important than headline pricing for agentic loops; a rough cost sketch follows this comment). You have the info about the first on the site but not the second and third. Would definitely like to know!
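To make the point about cached input pricing concrete, here is a minimal back-of-the-envelope sketch. The $1.20/$3.50 rates are the GLM5 numbers quoted above; the 90% cache discount, token counts, and loop shape are illustrative assumptions, not IonRouter's actual pricing.

    # Rough cost model for an agentic loop: each turn resends the growing
    # history, so most input tokens are cache hits after the first turn.
    # Rates: GLM5 headline prices from the comment above; the 90% cache
    # discount is an assumed figure, not a published one.
    INPUT_RATE  = 1.20 / 1_000_000   # $/token, uncached input
    CACHED_RATE = 0.12 / 1_000_000   # $/token, assumed cached input
    OUTPUT_RATE = 3.50 / 1_000_000   # $/token, output

    def loop_cost(turns: int, tokens_per_turn: int, cached: bool) -> float:
        """Total $ for a loop that resends the full history every turn."""
        total, history = 0.0, 0
        for _ in range(turns):
            if cached:
                total += history * CACHED_RATE + tokens_per_turn * INPUT_RATE
            else:
                total += (history + tokens_per_turn) * INPUT_RATE
            total += tokens_per_turn * OUTPUT_RATE   # this turn's output
            history += 2 * tokens_per_turn           # prompt + output accumulate
        return total

    print(f"no cache: ${loop_cost(50, 1_000, cached=False):.2f}")  # ~$3.18
    print(f"cached:   ${loop_cost(50, 1_000, cached=True):.2f}")   # ~$0.53

At the same headline rate, the cached run comes out roughly 6x cheaper here, because the accumulated history dominates the input bill. That is why cache pricing, not the per-token sticker price, drives agentic-loop costs.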
nylonstrung: Unless I misunderstood, it seems like this is trailing the Pareto frontier in cost and speed. Compare to providers like Fireworks: even with the OpenRouter 5% charge, it's not competitive.
Oras: The problem is well articulated, and it's a nice story for both cofounders.

One thing I don't get is why anyone would use a direct service that does the same thing as others when services such as OpenRouter let you use the same model from different providers. I would understand if your landing page mentioned only fine-tuning and custom models, but just listing the same open-source models, tps, and pricing doesn't tell me how you're different from other providers.

I remember using banana.dev a few years ago, and it had a very clear proposition at the time (serverless GPU with fast cold start).

I suppose positioning will take multiple iterations before you land on the right one. Good luck!
cmrdporcupine: Very cool. I see that "Deploy your finetunes, custom LoRAs, or any open-source model on our fleet" is behind "Book a call" -- any sense of what pricing will actually look like here? This seems like where your approach wins out: the ability to swap in a custom model more easily/cheaply.

Just curious how close we are to a world where I can fine-tune for my (low call volume) domain and then get it hosted. Right now this is not practical anywhere I've seen at the volumes I would be doing (which are really hobby level).
2uryaa: Thank you for the feedback! We will definitely redo the front page to reorganize the info and show quantizations more prominently. For reference, Kimi, GLM, and Minimax are NVFP4; the rest are FP8. I will make this more obvious on the site itself.