Nah, I wouldn’t give up on these so easily. They still have applications and advantages over transformers, e.g., efficiency, where the quality might suffice given the reduced time/space complexity (the vanilla transformer is still O(n^2) in sequence length, and I have yet to find an efficient causal transformer variant of comparable quality.)
But when it comes to sequence modeling / reasoning about sequences, attention models are the hot shit right now, and transformers excel at that.
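
To make the complexity point concrete, here's a minimal numpy sketch (toy dimensions, weights like `Wq`/`Wh` are just illustrative placeholders): vanilla causal self-attention materializes an n x n score matrix, so both time and memory grow quadratically with sequence length, while a plain recurrent update carries a fixed-size state and does O(n) total work.

```python
import numpy as np

n, d = 1024, 64                      # sequence length, model dimension
x = np.random.randn(n, d)

# --- Vanilla causal self-attention: the QK^T score matrix is n x n ---
Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
q, k, v = x @ Wq, x @ Wk, x @ Wv
scores = q @ k.T / np.sqrt(d)        # shape (n, n): quadratic time and memory
mask = np.triu(np.ones((n, n), dtype=bool), k=1)
scores[mask] = -np.inf               # causal masking: no attending to the future
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
attn_out = weights @ v               # shape (n, d)

# --- A plain recurrent update: fixed-size state, one pass over the sequence ---
Wh, Wx = np.random.randn(d, d) * 0.1, np.random.randn(d, d) * 0.1
h = np.zeros(d)                      # O(1) state instead of an n x n matrix
rnn_out = np.empty((n, d))
for t in range(n):                   # O(n) total work, constant memory per step
    h = np.tanh(h @ Wh + x[t] @ Wx)
    rnn_out[t] = h
```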