• Zacryon@feddit.org · 1 day ago

    Nah, I wouldn’t give up on these so easily. They still have applications and advantages over transformers, e.g. efficiency: their quality can be good enough given the reduced time/space complexity. (The vanilla transformer is still O(n²) in sequence length, and I have yet to find an efficient causal transformer of comparable quality.) A minimal sketch of where that quadratic cost comes from is below.
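
    To make the O(n²) point concrete, here’s a minimal sketch (plain NumPy, names are illustrative, not any particular library’s API) of single-head causal self-attention: every query scores every key, so the score matrix alone is n×n in the sequence length.

    ```python
    import numpy as np

    def causal_self_attention(x, Wq, Wk, Wv):
        """Single-head causal self-attention over a length-n sequence.

        x:           (n, d) input embeddings
        Wq, Wk, Wv:  (d, d) projection matrices
        """
        n, d = x.shape
        q, k, v = x @ Wq, x @ Wk, x @ Wv

        # The bottleneck: an (n, n) matrix of pairwise scores.
        # Time and memory both scale as O(n^2) in sequence length.
        scores = (q @ k.T) / np.sqrt(d)

        # Causal mask: position i may only attend to positions <= i.
        mask = np.triu(np.ones((n, n), dtype=bool), k=1)
        scores[mask] = -np.inf

        # Row-wise softmax over the allowed positions.
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v  # (n, d)

    # Usage: doubling n quadruples the size of `scores`.
    rng = np.random.default_rng(0)
    n, d = 8, 16
    x = rng.normal(size=(n, d))
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    print(causal_self_attention(x, Wq, Wk, Wv).shape)  # (8, 16)
    ```

    The more efficient architectures trade away exactly that dense n×n interaction, which is where the quality gap can show up.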

    But when it comes to sequence modelling and reasoning about sequences, attention models are the hot shit, and transformers currently excel at that.