Discussion
LLMs predict my coffee
amha: There's a simple differential equation often taught in intro calc courses, "Newton's Law of Cooling/Heating," which basically says that the rate of heat loss is proportional to the difference in temperature between a substance and its environment. I'm curious what that'd look like here. It's a very simple model, of course, not taking into account all the variables that Dynomight points out, but if a simple model can be nearly as predictive as more complex models... I'm also curious to see the details of the models that Dynomight's LLMs produced!
amelius: That model doesn't explain the relatively sharp drop in the beginning.
coder68: It does? There is a fast drop followed by a long decay, exponential in fact. The cooling rate is proportional to the temperature difference, so the drop is sharpest at the very beginning when the object is hottest.
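For readers who want to see the point about the steepest drop in numbers, here is a minimal sketch of Newton's law of cooling. The starting temperature (100), ambient temperature (20), and time constant (2500 s) are illustrative assumptions, not measurements from the article, and the function names are made up for this example.

```python
import math

def newton_cooling(t, T0=100.0, T_env=20.0, tau=2500.0):
    """Temperature at time t (seconds) under Newton's law of cooling.

    T0, T_env and tau are illustrative assumptions, not fitted values.
    """
    return T_env + (T0 - T_env) * math.exp(-t / tau)

def cooling_rate(t, T0=100.0, T_env=20.0, tau=2500.0):
    """dT/dt in degrees per second: proportional to the remaining gap to ambient."""
    return -(newton_cooling(t, T0, T_env, tau) - T_env) / tau

# The rate is largest in magnitude at t = 0 and shrinks as the gap closes.
for t in (0, 600, 1800, 3600):
    print(f"t={t:5d} s  T={newton_cooling(t):6.2f}  dT/dt={cooling_rate(t):.4f}")
```

With a single time constant the curve is one smooth exponential, so it has no separate fast phase; that is the distinction amelius raises in the next comment.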
amelius: I mean that initial drop doesn't look like it is part of the same exponential decay.
leecommamichael: ... and so another benchmark is born.
IncreasePosts: The water temperature drops quickly because the room-temperature ceramic mug is getting heated to near equilibrium with the water. If you used a vacuum-sealed mug (thermos) then the water temp would drop a bit but not much at all initially.
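A rough way to check this explanation is a two-body lumped model: coffee and mug each get one temperature, and heat flows coffee to mug, coffee to air, and mug to air. The sketch below integrates that with a simple forward Euler step. Every number in it (heat capacities, conductances, starting temperatures) is a hypothetical guess chosen only to be physically plausible, and the function name `simulate` is invented for this example.

```python
def simulate(total_s=1800.0, dt=0.1):
    """Forward-Euler integration of a coffee + mug lumped model.

    All parameters are illustrative guesses, not measured values.
    Returns a list of (time_s, T_coffee, T_mug) tuples.
    """
    T_air = 20.0        # ambient temperature, deg C
    T_coffee = 100.0    # freshly poured water, deg C
    T_mug = 20.0        # mug starts at room temperature, deg C
    C_coffee = 1000.0   # heat capacity of roughly 240 g of water, J/K
    C_mug = 250.0       # heat capacity of a roughly 300 g ceramic mug, J/K
    h_cm = 5.0          # coffee <-> mug conductance, W/K
    h_ca = 0.3          # coffee -> air conductance, W/K
    h_ma = 0.3          # mug -> air conductance, W/K

    history = [(0.0, T_coffee, T_mug)]
    steps = int(round(total_s / dt))
    for i in range(1, steps + 1):
        q_cm = h_cm * (T_coffee - T_mug)   # heat flow from coffee into mug, W
        q_ca = h_ca * (T_coffee - T_air)   # heat flow from coffee to air, W
        q_ma = h_ma * (T_mug - T_air)      # heat flow from mug to air, W
        T_coffee += dt * (-q_cm - q_ca) / C_coffee
        T_mug += dt * (q_cm - q_ma) / C_mug
        history.append((i * dt, T_coffee, T_mug))
    return history

history = simulate()
for seconds in (0, 60, 120, 600, 1800):
    t, Tc, Tm = history[int(round(seconds / 0.1))]
    print(f"t={t:6.0f} s  coffee={Tc:6.2f}  mug={Tm:6.2f}")
```

Because the system is two coupled linear equations, the coffee temperature comes out as a sum of two exponentials: a fast one while the mug warms up and a slow one as both lose heat to the air. With these guessed parameters most of the early drop happens in the first minute or two.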
3eb7988a1663: The appendix lists the equations transcribed from the raw answers.

LLM | T(t) | Cost
Kimi K2.5 (reasoning) | 20 + 52.9 exp(-t/3600) + 27.1 exp(-t/80) | $0.01
Gemini 3.1 Pro | 20 + 53 exp(-t/2500) + 27 exp(-t/149.25) | $0.09
GPT 5.4 | 20 + 54.6 exp(-t/2920) + 25.4 exp(-t/68.1) | $0.11
Claude 4.6 Opus (reasoning) | 20 + 55 exp(-t/1700) + 25 exp(-t/43) | $0.61 (eeek)
Qwen3-235B | 20 + 53.17 exp(-t/1414.43) | $0.009
GLM-4.7 (reasoning) | 20 + 53.2 exp(-t/2500) | $0.03
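To compare the quoted equations side by side, the sketch below evaluates each one at a few times. The coefficients are copied from the comment above; the assumption that t is in seconds and temperatures are in degrees Celsius is mine, since the quoted equations do not state units. The names `MODELS` and `predict` are invented for this example.

```python
import math

# Each model is a list of (amplitude, time_constant) terms added to an
# ambient offset of 20, as in the equations quoted above.
MODELS = {
    "Kimi K2.5 (reasoning)": [(52.9, 3600), (27.1, 80)],
    "Gemini 3.1 Pro": [(53, 2500), (27, 149.25)],
    "GPT 5.4": [(54.6, 2920), (25.4, 68.1)],
    "Claude 4.6 Opus (reasoning)": [(55, 1700), (25, 43)],
    "Qwen3-235B": [(53.17, 1414.43)],
    "GLM-4.7 (reasoning)": [(53.2, 2500)],
}

def predict(terms, t, ambient=20.0):
    """Evaluate ambient + sum(a * exp(-t / tau)) at time t (assumed seconds)."""
    return ambient + sum(a * math.exp(-t / tau) for a, tau in terms)

times = (0, 60, 300, 1800)
for name, terms in MODELS.items():
    temps = "  ".join(f"{predict(terms, t):6.2f}" for t in times)
    print(f"{name:28s} {temps}")
```

One thing this makes visible: at t = 0 the four two-term models all give 100, while the two single-term models start near 73, so the single-exponential answers cannot represent a fast initial drop from 100 at all.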
kurthr: It looks like a lot of them are missing something big. I'd think the two big ones are the evaporative cooling as you pour into the cup, and heating up the cup (by convection) itself. The convective cooling to the air is tertiary, but important (and conduction of the mug to the table probably isn't completely negligible). If there's only one exponential, they're definitely doing something wrong. I'd like to see a sensitivity study to see how much those terms would need to be changed to match within a few %. Exponentials are really tweaky!
andai: Is that what that first drop is? The cold cup stealing heat from the coffee?