Discussion
NanoGPT Slowrun: 10x Data Efficiency with Infinite Compute
littlestymaar: > Data efficiency matters because compute grows much faster than data [2]

(Referencing a paper from 2022.) I'm not convinced this is particularly true in today's world: if you have more compute, you can simply generate more, and higher-quality, artificial data. That's what all the labs have been doing since at least 2023.

Also, the post references Chinchilla-optimal training as a comparison baseline, but everyone has moved far beyond Chinchilla scaling; small models are routinely trained on 10-400 times more data (1-40T tokens) than the Chinchilla-optimal number.
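A rough sketch of the arithmetic behind that 10-400x claim, using the ~20 tokens-per-parameter rule of thumb from the Chinchilla paper (Hoffmann et al., 2022). The model sizes and token counts below are illustrative assumptions, not figures from the post:

    # Chinchilla rule of thumb: compute-optimal training uses
    # roughly 20 tokens per model parameter.
    CHINCHILLA_TOKENS_PER_PARAM = 20

    def chinchilla_ratio(params: float, tokens_trained: float) -> float:
        """How many times past Chinchilla-optimal a training run went."""
        optimal = params * CHINCHILLA_TOKENS_PER_PARAM
        return tokens_trained / optimal

    # Assumed example runs (approximate, for illustration only):
    examples = {
        "1B model on 1T tokens":  (1e9, 1e12),
        "8B model on 15T tokens": (8e9, 15e12),
    }

    for name, (params, tokens) in examples.items():
        ratio = chinchilla_ratio(params, tokens)
        print(f"{name}: {ratio:.0f}x Chinchilla-optimal")
    # -> 50x and 94x respectively, inside the 10-400x range above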
yorwba: Related: Discussion on the initial NanoGPT Slowrun announcement: https://news.ycombinator.com/item?id=47251259 (185 points 15 days ago, 39 comments)
sdpmas: thanks!
sdpmas: > you can simply generate more, and higher quality, artificial data

This is simply not true, and it's very clear if you look at continual learning, robotics, biology, etc. Each of those has enough economic incentive to spend 1000x compute if that led to much better results, but we just don't know how to do that.

Good point on Chinchilla, but our models are still absurdly large no matter what standard you compare them to.