Ml | Tech Blog

The paper

In this blog post, we’ll explore the paper "Mamba: Linear-Time Sequence Modeling with Selective State Spaces", which introduced a neural network architecture that bridges the gap between RNNs and Transformers. The authors are Albert Gu and Tri Dao. Tri Dao also worked on FlashAttention, which significantly improved Transformer efficiency and has been widely adopted in prominent deep learning libraries.

Motivation

In the paper, the authors argue that:

A fundamental problem of sequence modeling is compressing context into a smaller state ...