Discussion
Entropy-Guided KV Cache Summarization via Low-Rank Attention Reconstruction
bee_rider: Were there any downsides or difficulties? It would be sort of surprising if an SVD-based opportunity was missed (since it is such a familiar tool). But, your entropy and least-squares ideas are necessary to set that up, so I guess it makes sense that you’d find some new territory here.