Theoretical Analysis of Positional Encodings in Transformer Models arxiv.org 32 points by PaulHoule 12 hours ago
semiinfinitely 8 hours ago Kinda disappointing that RoPE, the most common PE, is given about one sentence in this work and omitted from the analysis.
gsf_emergency_2 5 hours ago Maybe it's because RoPE by itself does nothing for model capacity?