• kautau@lemmy.world · 1 day ago

    Ironically, I think that to truly train an LLM the way fascists would want, they’d need more content than exists: there isn’t enough original fascist revisionist writing, so they’d need an LLM to generate most or all of the training data, which would lead to model collapse: https://en.wikipedia.org/wiki/Model_collapse
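    A toy sketch of that failure mode, with made-up numbers: each “generation” is trained only on samples emitted by the previous generation, and the diversity of the data decays, which is the mechanism the linked article describes.

    ```python
    # Toy model-collapse simulation (illustrative only, not the paper's setup):
    # each generation is "trained" by fitting a Gaussian to the previous
    # generation's samples, then emits fresh samples from that fit.
    import random
    import statistics

    random.seed(0)
    data = [random.gauss(0.0, 1.0) for _ in range(100)]  # generation 0: real data

    for gen in range(1, 31):
        mu = statistics.fmean(data)
        sigma = statistics.stdev(data)
        # The next generation sees only the previous generation's output.
        data = [random.gauss(mu, sigma) for _ in range(100)]
        if gen % 5 == 0:
            print(f"gen {gen:2d}: mu={mu:+.3f} sigma={sigma:.3f}")
    # Any single run is noisy, but sigma tends toward zero over generations:
    # the tails of the original distribution are the first thing to vanish.
    ```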

      • Tar_Alcaran@sh.itjust.works · 1 day ago

        The big problem with training LLMs is that you need good data, but there’s so much of it that you can’t realistically separate all the “good” from all the “bad” by hand. So in practice you train on the set of all data, plus a much, much smaller set of tagged and vetted “good” data.
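        A minimal sketch of that mixed-data setup, with hypothetical corpus sizes and weights: train on everything, but upweight the small tagged set so the model actually sees it.

        ```python
        # Hypothetical sizes/weights illustrating "all data + a small tagged
        # set": curated docs are rare in the pool but heavily upweighted.
        import random
        from dataclasses import dataclass

        @dataclass
        class Example:
            text: str
            weight: float  # relative sampling weight during training

        crawl = [Example(f"crawl doc {i}", weight=1.0) for i in range(100_000)]
        curated = [Example(f"curated doc {i}", weight=200.0) for i in range(500)]

        pool = crawl + curated
        weights = [ex.weight for ex in pool]

        # Curated docs are ~0.5% of the pool, but 500 * 200.0 == 100_000 * 1.0,
        # so they carry half the total sampling mass: the model sees the tiny
        # tagged set about as often as the entire crawl combined.
        batch = random.choices(pool, weights=weights, k=8)
        for ex in batch:
            print(ex.text)
        ```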

    • NuXCOM_90Percent@lemmy.zip · 1 day ago (edited)

      No. They don’t need to generate data to train on. There is PLENTY of white supremacist hate shit out there already.

      The issue is one of labeling and weighting, which is a largely solved problem. It isn’t 100% solved and there will be isolated failures, but Grok breaks under even the most cursory poking.

      Don’t believe me? Go look at the crowd who can convert any image- or text-generating model into porn/smut/liveleak in nothing flat. Or, for a less horrifying version of that, look at how techniques like RAG can take generalized models and heavily weight them toward what you actually care about.
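      A bare-bones sketch of the RAG idea (bag-of-words retrieval standing in for real embeddings and a vector store, over a hypothetical three-document corpus): fetch the most relevant text and prepend it to the prompt, steering a general model without touching its weights.

      ```python
      # Minimal retrieval-augmented generation skeleton (illustrative only;
      # production systems use learned embeddings, not bag-of-words).
      import math
      from collections import Counter

      docs = [
          "RAG prepends retrieved context to the prompt of a general model",
          "model collapse happens when models train on model generated output",
          "fine tuning updates weights on a small curated dataset",
      ]

      def bow(text: str) -> Counter:
          return Counter(text.lower().split())

      def cosine(a: Counter, b: Counter) -> float:
          dot = sum(a[t] * b[t] for t in a)
          norm = math.sqrt(sum(v * v for v in a.values())) \
               * math.sqrt(sum(v * v for v in b.values()))
          return dot / norm if norm else 0.0

      def retrieve(query: str, k: int = 1) -> list[str]:
          q = bow(query)
          return sorted(docs, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

      query = "how does RAG use retrieved context in a prompt"
      context = "\n".join(retrieve(query))
      # Only the model's input changes; its weights stay generalized.
      print(f"Context:\n{context}\n\nQuestion: {query}")
      ```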

      Nah. This, like most things Musk, just highlights how grossly incompetent basically all of his companies are. Even SpaceX mostly coasts on being the only ones allowed to work on this stuff (RIP NASA and, to a lesser extent, JPL) and on poaching talent from everyone else to keep them from showing that.