To be fair, LLM technology is really making other fields obsolete. Nobody is going to bother making yet another shitty CNN, GRU, LSTM or whatever when we have the transformer architecture, and large transformer models that don't work with text (like large vision models) are looking like the future.
Nah, I wouldn’t give up on these so easily. They still have applications and advantages over transformers, e.g., efficiency: the quality might suffice given the reduced time/space complexity. (The vanilla transformer is still O(n^2) in sequence length, and I have yet to find an efficient causal transformer of comparable quality.)
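To make the complexity point concrete, here's a rough PyTorch sketch (not from this thread; the layer sizes and sequence lengths are made up for illustration, and wall-clock numbers depend heavily on hardware and how well the sequential GRU loop vs. the parallel attention kernel are optimized). The thing to notice is the n x n causal mask / score matrix that vanilla attention materializes, versus the fixed-size recurrent state the GRU carries:

```python
# Rough sketch: arbitrary sizes, timings vary a lot by hardware.
import time
import torch
import torch.nn as nn

d_model, n_heads, batch = 64, 4, 1
attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
gru = nn.GRU(d_model, d_model, batch_first=True)

with torch.no_grad():
    for n in (256, 512, 1024, 2048):
        x = torch.randn(batch, n, d_model)
        # Causal mask: this n x n structure is what makes vanilla attention O(n^2).
        mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)

        t0 = time.perf_counter()
        attn(x, x, x, attn_mask=mask)
        t_attn = time.perf_counter() - t0

        t0 = time.perf_counter()
        gru(x)  # fixed-size hidden state, one step per token -> O(n) overall
        t_gru = time.perf_counter() - t0

        print(f"n={n:5d}  attention {t_attn*1e3:8.1f} ms | GRU {t_gru*1e3:8.1f} ms")
```

The recurrent model's real win is the fixed-size per-token state at inference time and O(n) memory, even if raw throughput on a GPU can still favor attention because it parallelizes over the sequence.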
But when it comes to sequence modeling / reasoning about sequences, attention models are the hot shit, and transformers currently excel at that.