Recent Machine Learning Papers
Theoretical Constraints on the Expressive Power of $\mathsf{RoPE}$-based Tensor Attention Transformers
Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Mingda Wan
Tensor Attention extends traditional attention mechanisms by capturing high-order correlations across multiple modalities, addressing the limitations of classical matrix-based attention. Meanwhile, Rotary Position Embedding ($\mathsf{RoPE}$) has shown superior performance in encoding positional info...
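Since the abstract centers on $\mathsf{RoPE}$, here is a minimal sketch of standard rotary position embedding applied to query/key vectors, for readers unfamiliar with the mechanism. This is an illustrative reconstruction of the well-known RoPE formulation, not code from the paper; the function name `apply_rope` and the `base` parameter are assumptions.

```python
import numpy as np

def apply_rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply standard rotary position embedding (RoPE) to x of shape
    (seq_len, dim), dim even. Each consecutive feature pair at position
    pos is rotated by angle pos * base^(-2i/dim) for pair index i.
    (Illustrative sketch; names and defaults are assumptions.)"""
    seq_len, dim = x.shape
    assert dim % 2 == 0, "RoPE requires an even feature dimension"
    # Per-pair inverse frequencies: base^(-2i/dim) for i = 0 .. dim/2 - 1
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    # Position-dependent rotation angles, shape (seq_len, dim/2)
    angles = np.arange(seq_len)[:, None] * inv_freq[None, :]
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]      # split features into rotation pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin   # apply the 2-D rotation to each pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Rotating both queries and keys this way makes their dot product
# depend only on relative position, the property RoPE is known for.
q = apply_rope(np.random.randn(8, 16))
k = apply_rope(np.random.randn(8, 16))
```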