Discussion
jszymborski: This reminds me of the input gates of an LSTM.
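For reference, here is a minimal sketch of the standard LSTM input gate the comment alludes to: i_t = sigmoid(W_i x_t + U_i h_{t-1} + b_i), which produces per-dimension values in (0, 1) that scale how much new information is written to the cell state. The dimensions and weights below are toy values chosen purely for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_input_gate(x_t, h_prev, W_i, U_i, b_i):
    # Standard LSTM input gate: i_t = sigmoid(W_i @ x_t + U_i @ h_prev + b_i).
    # Outputs lie in (0, 1) and gate how much of the candidate cell state
    # gets written into memory at this timestep.
    return sigmoid(W_i @ x_t + U_i @ h_prev + b_i)

# Toy dimensions for illustration only.
rng = np.random.default_rng(0)
d_in, d_hidden = 4, 3
x_t = rng.normal(size=d_in)
h_prev = rng.normal(size=d_hidden)
W_i = rng.normal(size=(d_hidden, d_in))
U_i = rng.normal(size=(d_hidden, d_hidden))
b_i = np.zeros(d_hidden)

print(lstm_input_gate(x_t, h_prev, W_i, U_i, b_i))  # three values in (0, 1)
```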
jjcm: Two things stand out to me with this:

1. Drops compute required for training by ~20%. This approach won't just help the ever-escalating model sizes larger companies are pushing for; it means things like autoresearch can iterate on new model architectures faster.

2. WAY lower bandwidth requirements for inference. Means with approaches like this it should run on consumer hardware far better. It apparently requires 1/6th the memory bandwidth of a traditional approach for better results (see the back-of-envelope sketch after this comment).

This is a big improvement if it can be generalized. They're claiming it's a drop-in replacement, so it seems like it can be.
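A back-of-envelope sketch of why a 1/6th bandwidth requirement would matter: autoregressive decoding streams the full weight set from memory for every generated token, so token throughput scales roughly with available memory bandwidth. The model size, precision, and GPU bandwidth figures below are illustrative assumptions, not numbers from the paper.

```python
# Back-of-envelope: decode speed is memory-bandwidth bound because each
# generated token reads all weights from memory once.
# All numbers below are assumed for illustration.

params = 7e9           # hypothetical 7B-parameter model
bytes_per_param = 2    # fp16 weights
bandwidth = 500e9      # ~500 GB/s, a typical consumer-GPU figure

bytes_per_token = params * bytes_per_param            # 14 GB streamed per token
baseline_tok_s = bandwidth / bytes_per_token          # ~35.7 tok/s
reduced_tok_s = bandwidth / (bytes_per_token / 6)     # claimed 1/6th bandwidth need

print(f"baseline:        {baseline_tok_s:.1f} tok/s")
print(f"1/6th bandwidth: {reduced_tok_s:.1f} tok/s")  # ~214 tok/s
```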
com2kid: > 2. WAY lower bandwidth requirements for inference. Means with approaches like this it should run on consumer hardware far better. It apparently requires 1/6th the memory bandwidth of a traditional approach for better results.

That should be the headline right there. Giant size-60-font headline. Some people have PhDs in burying the lede!
talloaktrees: Except it's not true.
Murfalo: Amazingly, the first author is a high school student! https://nathanchen.me/public/About%20me.html