• vivendi@programming.dev · 3 days ago

    To be fair, LLM technology really is making other fields obsolete. Nobody is going to bother making yet another shitty CNN, GRU, LSTM or something when we have the transformer architecture, and LLMs that do not work with text (like large vision models) are looking like the future.

    • Zacryon@feddit.org · 6 hours ago

      Nah, I wouldn’t give up on these so easily. They still have applications and advantages over transformers, e.g. efficiency, where the quality might suffice given the reduced time/space complexity (the vanilla transformer is still O(n^2) in sequence length, and I have yet to find an efficient causal transformer of comparable quality).
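
      To make the complexity point concrete, here’s a toy PyTorch sketch (my own illustration, with sequence length and width picked arbitrarily): vanilla self-attention materializes an n×n score matrix, so memory grows quadratically with sequence length, while a GRU only carries a fixed-size hidden state as it walks the sequence.

      ```python
      import torch
      import torch.nn as nn

      n, d = 4096, 256                  # sequence length, model width (arbitrary toy values)
      x = torch.randn(1, n, d)

      # Vanilla self-attention: the QK^T score matrix alone is (n, n) -> O(n^2) memory.
      q = k = v = x
      scores = q @ k.transpose(-2, -1) / d**0.5     # shape (1, n, n), quadratic in n
      attn_out = scores.softmax(dim=-1) @ v         # shape (1, n, d)

      # GRU: one pass over the sequence, constant-size recurrent state regardless of n.
      gru = nn.GRU(input_size=d, hidden_size=d, batch_first=True)
      gru_out, h_n = gru(x)                         # h_n has shape (1, 1, d)

      print(scores.shape, h_n.shape)                # torch.Size([1, 4096, 4096]) torch.Size([1, 1, 256])
      ```

      Double the sequence length and the attention score matrix quadruples, while the GRU’s recurrent state stays the same size; that’s the efficiency trade-off I mean.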

      But when it comes to sequence modeling and reasoning over sequences, attention models are the hot shit, and transformers currently excel at that.