Discussion
Introducing Ternary Bonsai: Top Intelligence at 1.58 Bits
yodon: So excited to see this - the big advantage of 1.58 bits is there are no multiplications at inference time, so you can run them on radically simpler and cheaper hardware.
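(To make the no-multiplication point concrete: with every weight constrained to {-1, 0, +1}, a matrix-vector product reduces to sums and differences of activations. A minimal illustrative sketch in Python, not Bonsai's actual kernel:)

    import numpy as np

    def ternary_matvec(W, x):
        # W holds only -1, 0, or +1; x is the activation vector.
        # Each output element is a running sum: add x[j] where the
        # weight is +1, subtract where it is -1, skip where it is 0.
        # No multiply instruction appears anywhere in the loop.
        out = np.zeros(W.shape[0], dtype=x.dtype)
        for i in range(W.shape[0]):
            for j in range(W.shape[1]):
                if W[i, j] == 1:
                    out[i] += x[j]
                elif W[i, j] == -1:
                    out[i] -= x[j]
        return out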
wmf: Yet again they're comparing against unquantized versions of other models. They would probably still win but by a much smaller size margin.
Dumbledumb: Wouldn't the margin be higher? Moving all the other models from unquantized to quantized would lower their performance, while Bonsai stays the same. I'd get it if this were about score per model size, but not for absolute performance.
Animats: At 4 bits, you could just have a hard-wired table lookup. Two 4 bit values in, 256 entry table. You can have saturating arithmetic and a post-processing function for free. Somebody must be building hardware like that.
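(A sketch of the kind of table Animats describes, assuming signed 4-bit operands and saturation back into the same range; real hardware might pick a different number format or post-processing step:)

    # Hypothetical 256-entry multiply table: two signed 4-bit values
    # (-8..7) in, one saturated 4-bit product out. The "post-processing
    # function" here is just clamping into the representable range.
    def saturate4(v):
        return max(-8, min(7, v))

    LUT = [0] * 256
    for a in range(-8, 8):
        for b in range(-8, 8):
            LUT[((a & 0xF) << 4) | (b & 0xF)] = saturate4(a * b)

    def mul4(a, b):
        # One table read replaces the multiplier entirely.
        return LUT[((a & 0xF) << 4) | (b & 0xF)]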
Animats: This makes sense. The 1-bit model implies needing 2x as many neurons, because you need an extra level to invert. But the ternary model still has a sign, just really low resolution.

(I've been reading the MMLU-Redux questions for electrical engineering [1]. They're very funny. Fifty years ago they might have been relevant. The references to the Intel 8085 date this to the mid-1970s. Moving coil meters were still a big thing back then. Ward-Leonard drives still drove some elevators and naval guns. This is supposed to be the hand-curated version of the questions. Where do they get this stuff? Old exams?)

[1] https://github.com/aryopg/mmlu-redux/blob/main/outputs/multi...
ericb: This is pretty cool! I would love to see even larger models shrunk down. If you got that into a couple gigs, what could you stuff into 20 gigs?