• PhilipTheBucket@piefed.social · 13 hours ago

    Yeah, I get it. I don’t think it’s necessarily bad research or anything. I just feel like it might have been better to split it into two papers:

    1. Look at the funny LLM and how far off the rails it goes if you don’t keep it stable, let it “build on itself” iteratively over time, and don’t put the right boundaries on it
    2. How we should actually wrap an LLM in a sensible framework so it can pursue an “agent”-type task: what leads it off the rails and what doesn’t, and which of the various ideas for keeping it grounded actually work

    And yeah, obviously they can get confused or output counterfactuals or nonsense as a failure mode. What I meant was just that they don’t really do that in response to an overload / “DDoS” situation specifically. They might do it as a result of too much context or a badly set-up framework around them, sure.
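
    To make “right boundaries” concrete, here’s a minimal sketch of the kind of thing I mean (everything here is made up for illustration, not from the paper): an iterative agent loop with a context budget and an iteration cap, so the accumulated history can’t grow without bound.

    ```python
    MAX_TURNS = 10          # hard stop so the loop can't run away
    CONTEXT_BUDGET = 4000   # rough character budget for accumulated history

    def call_llm(prompt: str) -> str:
        """Hypothetical stand-in for whatever model API is in use."""
        return "model output for: " + prompt[-60:]

    def run_agent(task: str) -> list[str]:
        history: list[str] = []
        outputs: list[str] = []
        for _ in range(MAX_TURNS):
            # Drop the oldest turns until the context fits the budget;
            # a real framework might summarize instead of just dropping.
            while sum(len(h) for h in history) > CONTEXT_BUDGET:
                history.pop(0)
            prompt = task + "\n" + "\n".join(history)
            reply = call_llm(prompt)
            outputs.append(reply)
            history.append(reply)
            if "DONE" in reply:  # crude stopping condition
                break
        return outputs

    print(run_agent("Summarize the logs and say DONE when finished."))
    ```

    Without the two caps, each turn feeds a longer and longer history back into the model, which is exactly the “building on itself” drift I’m talking about.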

    • Sasha [They/Them]@lemmy.blahaj.zone · 13 hours ago

      I meant that they’re specifically not going for that, though. The experiment isn’t about improving the environment itself; it’s about improving the LLM. Otherwise they’d have spent the paper evaluating the effects of different environments, not different LLMs.