Discussion
Jack5500: Sadly no mention on regions.
bm-rf: Not seeing any pricing info on the models[1] page. Wonder how much of a lift this is over paying providers directly. Perhaps Cloudflare is doing this at cost? Also interesting that zero data retention is not on by default, and is not supported with all providers[2]. Finally, would be great if this could return OpenAI AND Anthropic style completions.

[1] https://developers.cloudflare.com/ai/models/
[2] https://developers.cloudflare.com/ai-gateway/features/unifie...
yoavm: Workers AI pricing is this: https://developers.cloudflare.com/workers-ai/platform/pricin...
throwpoaster: Anthropic gonna acquire Cloudflare for stock. Solves their infrastructure problems in one shot.
neya: I'm not ready for another rug pull, so please no :( I really enjoy Cloudflare's CDN.
mbtrucks: Can I set a hard cost limit? Otherwise I'm not interested. Don't be like Google's billing mess.
mbtrucks: Can I set a hard cost limit per day, with no drift? Otherwise I'm not interested.
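Until a platform-side cap exists, a per-day hard limit can be approximated on the client side. The sketch below is a hypothetical helper (`DailyBudget`, `reserve` and `settle` are illustrative names, not a Cloudflare API). It reserves the worst-case cost of a call before making it and settles the real cost afterwards, so concurrent requests cannot push spend past the cap. It assumes you can estimate a per-call upper bound (for example, max output tokens times the provider's price) and it tracks amounts in integer cents to avoid floating-point drift. Because it is in-memory, it only guards a single process or isolate; a global cap across many Worker isolates would need shared state such as a Durable Object.

```typescript
// Hypothetical client-side daily spend cap; all amounts are integer cents.
interface Reservation {
  day: string; // UTC day the reservation was made on, e.g. "2025-01-31"
  cents: number; // worst-case cost reserved for the call
}

class DailyBudget {
  private readonly capCents: number;
  private day = "";
  private spentCents = 0; // settled plus still-reserved spend for `day`

  constructor(capCents: number) {
    this.capCents = capCents;
  }

  // Start a fresh counter whenever the UTC calendar day changes.
  private roll(now: Date): void {
    const today = now.toISOString().slice(0, 10);
    if (today !== this.day) {
      this.day = today;
      this.spentCents = 0;
    }
  }

  // Reserve the worst case before calling the model; null means "refuse the request".
  reserve(maxCents: number, now: Date = new Date()): Reservation | null {
    this.roll(now);
    if (this.spentCents + maxCents > this.capCents) return null;
    this.spentCents += maxCents;
    return { day: this.day, cents: maxCents };
  }

  // Swap the reservation for the actual cost once the response has been billed.
  settle(r: Reservation, actualCents: number, now: Date = new Date()): void {
    this.roll(now);
    if (r.day === this.day) this.spentCents += actualCents - r.cents;
  }

  remainingCents(now: Date = new Date()): number {
    this.roll(now);
    return this.capCents - this.spentCents;
  }
}
```

Reserving before the call is what removes the drift: two requests issued at the same moment both count against the cap immediately, rather than only after their bills arrive.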
pprotas: Can't wait for the free tier!
yoavm: Workers AI has had a free tier since it launched, I think? See the pricing page I linked to above.
indigodaddy: So looks like the AI Platform free tier will have access to the open models only perhaps? And the 10,000 neuron thing? I don't see any mention of frontier models in the url you linked in the other comment ( https://news.ycombinator.com/item?id=47792538#47793142 )
samjs: Hey! I'm one of the engineers who built this :)

We'll be adding prices to the docs and the model catalog in the dashboard shortly.

In short: currently the pricing matches whatever the provider charges. You can buy unified billing credits [1] which charges a small processing fee.

> Finally, would be great if this could return OpenAI AND Anthropic style completions.

Agreed! This will be coming shortly. Currently we'll match the provider themselves, but we plan to make it possible to specify an API format when using LLMs.

[1]: https://developers.cloudflare.com/ai-gateway/features/unifie...
kylehotchkiss: No way! Cloudflare will buy Anthropic when the economy begins self-correcting. Looking forward to Workers AI getting all those H100s to run more Qwens.
whereistejas: This actually looks very useful. Cloudflare seems to be bringing together a great set of tools. Not to mention, D2 is literally the only SQLite-as-a-service solution out there whose reliability is great and whose free tier limits are generous.
kylehotchkiss: * D1, but agreed. I wish Cloudflare would offer a built-in D1-R2 backups system though! (Can be done with custom code in a worker, but wish it was first-party)
james2doyle: I find it really confusing that the Workers AI models on https://developers.cloudflare.com/workers-ai/models/ do not fully overlap with the ones on https://developers.cloudflare.com/ai/models/

Yes, you can see the same "hosted" ones on there, but when you look at the models endpoint, there are far fewer options in the "workers-ai/*" namespace. Is that intentional?
james2doyle: To clarify, I don’t see "workers-ai/@cf/google/gemma-4-26b-a4b-it" in the /models endpoint on gateway.ai.cloudflare.com, but it does seem to exist as a hosted model. Same with "workers-ai/@cf/nvidia/nemotron-3-120b-a12b", which I would expect to see.
BoorishBears: > For those who don’t use Workers, we’ll be releasing REST API support in the coming weeks, so you can access the full model catalog from any environment.

Cloudflare seems to be building for lock-in and I don't love it. I especially don't understand how you build an OpenRouter and only have bindings for your custom runtime at launch.
eis: D1 reliability has been bad in our experience. We've had queries hanging on their internal network layer for several seconds, sometimes double digits, over extended periods (on the order of weeks). Recently I've seen a few plain network exceptions; again, these are internal, between their worker and the D1 hosts. And many of the hung queries wouldn't even show up under traces in their observability dashboard, so unless you have your own timeout detection you wouldn't even know things are not working. It was hard to get someone on their side to take a look and actually acknowledge and understand the problem.

But even without the network issues that have plagued it, I would hesitate to build anything for production on it because it can't even do transactions, and the product manager for D1 openly stated they won't implement them [0]. Your only way to ensure data consistency is to use a Durable Object, which comes with its own costs and tradeoffs.

[0] https://github.com/cloudflare/workers-sdk/issues/2733#issuec...

The basic idea of D1 is great. I just don't trust the implementation.

For a hobby project it's a neat product for sure.
agentifysh: excellent! please make sure to include rate limit details as well.
messh: So, is this similar to openrouter?
mips_avatar: with Argo networking
mips_avatar: So it's basically just OpenRouter with Cloudflare Argo networking? I feel like they could do much more interesting stuff with their Replicate acquisition. Application-specific RL is getting so good, but there's no good way to deploy these models in a scalable way. Even providers like Fireworks, which claim to let you deploy LoRAs in a scalable way, can't do it. For now I literally have to host the base load for my application on a rack of 3090s in my garage, which seems silly, but it saves me $1k a month.
pizzly: Yes, with fewer models to choose from unless you bring your own model.
vladgur: Curious: which models are you able to run, and how many 3090s do they require at scale?
mips_avatar: 4 3090s with NVLink on each pair. Super fast inference on MoE models around 20-36B.
ignoramous: > And many of the hung queries wouldn't even show up under traces in their observability dashboard

How did you work around this problem? As in, how do you monitor for hung queries and cancel them?

> D1 reliability has been bad in our experience.

What about reads? We use D1 in prod & our traffic pattern may not be similar to yours (our workload is async queue-driven & so retries last in order of weeks), nor have we really observed D1 erroring out for extended periods or frequently.
Normal_gaussian: Yeah, this really sucks.

No-downtime snapshots would be best, but I'd be quite happy with a blocking backup on a set schedule that can be configured from the GUI, the CLI, or a config file. It's a huge PITA having to play 'trust me bro' with clients and their admins about custom workers and backups.

I currently stream it D1 dump -> worker (encrypt w/ key wrapping) -> R2 on a schedule, then have a container spin up once a day and create changesets from the dumps. An external tool pulls the dumps and changesets.
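The "encrypt w/ key wrapping" step can be sketched as envelope encryption. Each dump gets a fresh random data key and is encrypted with AES-256-GCM under that key. The data key is then itself encrypted ("wrapped") under a long-lived key-encryption key (KEK). This is a minimal illustration, not the commenter's code. It uses Node's `node:crypto` for brevity and wraps the data key with AES-GCM rather than RFC 3394 AES Key Wrap. Inside a Worker, the WebCrypto equivalents (`crypto.subtle.encrypt` with AES-GCM and `crypto.subtle.wrapKey`) are the more native fit, and the resulting object would be serialized and written with an R2 binding's `put()`. `encryptDump` and `decryptDump` are hypothetical names.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Ciphertext plus everything needed to decrypt it, except the key.
interface Sealed {
  iv: Buffer;
  tag: Buffer;
  data: Buffer;
}

interface EnvelopeBackup {
  wrappedKey: Sealed; // per-dump data key, encrypted under the KEK
  payload: Sealed; // the dump itself, encrypted under the data key
}

function seal(key: Buffer, plaintext: Buffer): Sealed {
  const iv = randomBytes(12); // fresh 96-bit nonce for every encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), data };
}

function open(key: Buffer, s: Sealed): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, s.iv);
  decipher.setAuthTag(s.tag); // final() throws if the tag does not verify
  return Buffer.concat([decipher.update(s.data), decipher.final()]);
}

// Encrypt one dump under a fresh data key, then wrap that key under the KEK.
function encryptDump(kek: Buffer, dumpSql: string): EnvelopeBackup {
  const dataKey = randomBytes(32);
  return {
    wrappedKey: seal(kek, dataKey),
    payload: seal(dataKey, Buffer.from(dumpSql, "utf8")),
  };
}

function decryptDump(kek: Buffer, backup: EnvelopeBackup): string {
  const dataKey = open(kek, backup.wrappedKey);
  return open(dataKey, backup.payload).toString("utf8");
}
```

Keeping the KEK outside R2 (for example as a Worker secret) means a leaked bucket alone does not expose the dumps. Rotating the KEK only requires re-wrapping the small data keys, not re-encrypting every dump.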
kinnth: OpenRouter works perfectly well for me, called from Cloudflare Workers. OpenRouter also has superior cascading and waterfalling if models are offline. Not sure they have that working in V1.

I love everything about OpenRouter. So kinda a fanboy.
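OpenRouter handles model fallback on its side; where a gateway lacks it, a client-side cascade is straightforward. A minimal sketch, assuming each candidate model is wrapped in a `call` function that rejects when the model is offline or errors. `firstAvailable` and the model names are hypothetical. A production version would likely fall through only on retryable failures (timeouts, 5xx, 429) rather than on every error.

```typescript
// One candidate model plus a function that calls it (rejects if it is unavailable).
interface Candidate<R> {
  model: string;
  call: () => Promise<R>;
}

// Try candidates in order and return the first success, recording why the others failed.
async function firstAvailable<R>(
  candidates: Candidate<R>[],
): Promise<{ model: string; result: R }> {
  const failures: string[] = [];
  for (const { model, call } of candidates) {
    try {
      return { model, result: await call() };
    } catch (err) {
      failures.push(`${model}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all models failed (${failures.join("; ")})`);
}
```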
wahnfrieden: No spending limit / no ability to set a budget, unlike Google or OpenAI. Be prepared for an eye-watering invoice if you have a bug or get hacked.

edit: Why downvote? It's correct, and it's a risk that competitors handle better, including for their CDN products (compared to Bunny CDN). Maybe you are just used to the risk and haven't felt the burn yourself yet. Or you have the mistaken notion that there is no price at which temporary downtime is worthwhile to avoid paying.
rl3: [delayed]
jonfromsf: Gilfoyle? Is that you?
mips_avatar: I think these gpus were actually used for bitcoin mining before I bought them
eis: > How did you work around this problem? As in, how do you monitor for hung queries and cancel them?

You just wrap your DB queries in your own timeout logic. You can then continue your business logic, but you can't truly cancel the query because, well, the communication layer for it is stuck and you can't kill it via a new connection. Your only choice is to abandon that query. Sometimes we could retry and it would immediately succeed, suggesting that the original query probably hit something like packet loss that wasn't handled properly by CF. That's easy when it's a read, but with writes it gets complicated fast and you have to ensure your writes are idempotent. And since they don't support transactions it's even more complex.

Aphyr would have a field day with D1, I'd imagine.

> What about reads? We use D1 in prod & our traffic pattern may not be similar to yours (our workload is async queue-driven & so retries last in order of weeks), nor have we really observed D1 erroring out for extended periods or frequently.

We have reads and writes, which most of the time are latency-sensitive (direct user feedback). A user interaction usually involves 3-5 queries, and they might need to run in sequence. When queries take 500ms+ the system starts to feel sluggish. When they take 2-3s it's very frustrating. The high latencies happened for both reads and writes: you could do a simple "SELECT 123" and it would hang. You could even reproduce that from the Cloudflare dashboard when it was in this degraded state.

From the comments of others who had similar issues, I think it heavily depends on the CF locations or D1 hosts. Most people are probably lucky and don't get one of the faulty D1 servers. But there are a few dozen people who were not so lucky; you can find them complaining on GitHub, on the CF forum, etc., but they are simply not heard. And you can find these complaints going back years.

This long timeframe without fixes to their network stack (networking is CF's bread and butter!), the refusal to implement transactions, the silence in their forum in response to cries for help, the absurdly low 10GB limit for databases... it all adds up. We made the decision to not implement any new product on D1 and just continue using proper databases. It's a shame, because workers + a close-by read replica could be absolutely great for latency. Paradoxically, it was the opposite outcome.
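A minimal sketch of the "wrap your DB queries in your own timeout logic" approach described above, in TypeScript as a Worker would run it. `withTimeout`, `isTimeout` and `readWithRetry` are hypothetical helpers. The query passed in could be something like `env.DB.prepare(sql).first()` with a D1 binding named `DB` (the binding name is an assumption). As the comment says, the timed-out query is abandoned rather than cancelled, so the retry helper is only safe for reads or for writes that have been made idempotent.

```typescript
// Reject with a recognizable error if `query` has not settled within `ms`.
// The underlying query is abandoned, not cancelled: it may still complete later.
async function withTimeout<T>(query: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`query did not settle within ${ms} ms`);
      err.name = "QueryTimeout";
      reject(err);
    }, ms);
  });
  try {
    return await Promise.race([query, deadline]);
  } finally {
    clearTimeout(timer); // stop the timer whichever side won
  }
}

function isTimeout(err: unknown): boolean {
  return err instanceof Error && err.name === "QueryTimeout";
}

// Retry on timeout up to `attempts` times; any other error propagates immediately.
// Only for reads or idempotent writes: a timed-out write may still have landed.
async function readWithRetry<T>(run: () => Promise<T>, ms: number, attempts: number): Promise<T> {
  for (let i = 1; ; i++) {
    try {
      return await withTimeout(run(), ms);
    } catch (err) {
      if (!isTimeout(err) || i >= attempts) throw err;
    }
  }
}
```

Logging every `isTimeout` hit also covers the observability gap mentioned earlier in the thread: hung queries become visible in your own logs even when they never show up in the platform's traces.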
switz: Workers runtime is open source and permissively licensed, fwiw: https://github.com/cloudflare/workerd
eis: Yes but that is just a tiny part of the whole CF worker ecosystem. The other services are not open source and so the lock-in is very very real. There are no API compatible alternatives that cover a good chunk of the services. If you build your application around workers and make use of the integrated services and APIs there is no way for you to switch to another provider because well, there is none.
mikeocool: Agreed -- except that all of their docs and marketing pitch it for use cases like "per-user, per-tenant or per-entity databases" -- which would be SO great.

But in practice, it's basically impossible to use that way in conjunction with Workers, since you have to bind every database you want to use to the Worker, and binding a new database requires redeploying the Worker.
AgentME: If you want to dynamically create sqlite databases, then moving to durable objects which are each backed by an sqlite database seems to be the way to go currently.
eis: And now you've put everything on the equivalent of a single NodeJS process running on a tiny VM. Next step: spread out over multiple durable objects but that means implementing a sharding logic. Complexity escalates very fast once you leave toy project territory.
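The "sharding logic" step usually starts as a stable hash from a tenant ID to one of N Durable Object names. A minimal sketch, assuming a fixed shard count. It uses a 32-bit FNV-1a hash over the string's UTF-16 code units (a simple variant chosen for illustration, not anything Cloudflare prescribes). The resulting name would be passed to a namespace binding's `idFromName()` and then `get()`. `fnv1a`, `shardName` and the `shard-N` naming scheme are hypothetical.

```typescript
// 32-bit FNV-1a over UTF-16 code units; deterministic across isolates and deploys.
function fnv1a(key: string): number {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits
  }
  return h >>> 0;
}

// Map a tenant to one of `shardCount` Durable Object names.
function shardName(tenantId: string, shardCount: number): string {
  return `shard-${fnv1a(tenantId) % shardCount}`;
}
```

This is where the complexity starts to escalate: with `hash % shardCount`, changing the shard count remaps most tenants to different objects, so growing the cluster means migrating data, adopting consistent hashing, or keeping an explicit tenant-to-shard table.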
bryden_cruz: Running a rack of 3090s in your garage to avoid provider lock-in/costs is the most Hacker News thing. Out of curiosity, what are you doing for uptime/failover? If you are running production traffic to that garage rack, does your app just degrade gracefully if your home internet drops, or do you have a cloud fallback?
brikym: There is always one thing that bites you because Cloudflare is different. I just built an AI game (sleuththetruth.com) and the primary reason it's so slow to prompt a new board is actually not because of AI latency. It's because CF workers have a limit of 6 connections (including spawned workers). There is no way to gulp down all the wiki images I want all at once. If I had put the backend on Railway I don't think I'd have this issue.
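The 6-connection ceiling turns image fetching into rounds: with 20 images and at most 6 in flight, the Worker needs at least ceil(20 / 6) = 4 sequential rounds of fetch latency, however fast each fetch is. Workers queue connections beyond the limit rather than failing them, so the sketch below does not raise the ceiling. It only makes the queueing explicit, which lets you, for example, order the list so the images needed for the first screen are fetched first. `mapWithLimit` is a hypothetical helper; `worker` would typically wrap `fetch(url)`.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight; results keep input order.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  // Each lane claims the next unprocessed index until the list is exhausted.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }
  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```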
handfuloflight: Which AI did you use to write this?
TheServitor: That's so brilliant that it's already a thing called openrouter!
rs_rs_rs_rs_rs: Yeah but the 10GB limit for D1 is crazy, can you really start building on that? Other than toy projects?
jillesvangurp: Most website content management systems would never get close to that size. If you need a bigger database, D1 is probably the wrong solution to begin with. 10GB can be millions of records depending on your table structure. But if you are gathering some survey data, running a CMS, etc. you probably should be fine with even just a few MB of data; which is probably the sweet spot for D1.
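A rough sizing of the 10GB limit, assuming an average of about 1 KB per row including index overhead (an illustrative figure; real row sizes vary widely with the schema):

$$\frac{10\ \text{GB}}{1\ \text{KB/row}} = \frac{10 \times 10^{9}\ \text{B}}{10^{3}\ \text{B/row}} = 10^{7}\ \text{rows}$$

So on the order of ten million rows at that size, or about a hundred million at roughly 100 bytes per row.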
hemangjoshi37a: The interesting question isn't "can CF run agent inference"; it's what the routing layer needs to look like for multi-turn workflows. Shipping agent systems to enterprise clients over the last year, I've found the bottleneck is never raw tokens/sec. It's (a) state checkpointing between tool calls, (b) cold-start latency on embedding/rerank models, (c) rate-limit coordination across concurrent agent loops. Does CF expose per-session state, or is it still stateless-per-request? Without that, you end up building the interesting part yourself.
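For point (c), a common building block is a token bucket shared by every agent loop that calls the same provider. A minimal in-memory sketch: `TokenBucket` and its parameters are hypothetical, and the clock is injectable for testing. State lives in a single process or isolate; coordinating loops across many Worker isolates would need the bucket to live somewhere shared, such as a Durable Object, which runs into the same per-session state question.

```typescript
// Hypothetical token bucket: bursts up to `capacity`, refilled at `refillPerSec` tokens/second.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private readonly now: () => number;
  private tokens: number;
  private last: number;

  constructor(capacity: number, refillPerSec: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.now = now;
    this.tokens = capacity; // start full so an idle system can burst
    this.last = now();
  }

  // Spend `cost` tokens if available; false tells the calling loop to back off and retry.
  tryTake(cost = 1): boolean {
    const t = this.now();
    const refill = ((t - this.last) / 1000) * this.refillPerSec;
    this.tokens = Math.min(this.capacity, this.tokens + refill);
    this.last = t;
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}
```

Each agent loop checks `tryTake()` before its next model call, so a burst from one loop slows the others down instead of tripping the provider's rate limit for everyone.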
strimoza: Interesting timing — I've been using Bunny CDN for video delivery and considering moving parts to Cloudflare. Anyone have experience comparing the two for media streaming specifically?
ncrmro: Turso/libSQL has been great for a PoC project so far.
lateral_cloud: Thanks ChatGPT
ascorbic: The interesting part is that you can use the same API with Workers AI models (hosted at the edge) and proxied models (OpenRouter-style).Disclaimer: I work at Cloudflare, but not on this.