• Tar_Alcaran@sh.itjust.works
    2 days ago

    The big problem with training LLMs is that you need good data, but there's so much of it that you can't realistically separate all the "good" data from the "bad" by hand. You have to use the full set of data, plus a much, much smaller set that has been tagged and marked as "good".
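
To make that concrete, here's a rough sketch of one way a small hand-tagged "good" set could be used to sift a much larger untagged pool. It's a toy illustration only, not how any real LLM pipeline works: the function names (`build_vocab`, `quality_score`, `filter_pool`), the word-overlap score, and the `0.5` threshold are all made up for the example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocab(good_docs):
    """Collect every token seen in the small hand-tagged 'good' set."""
    vocab = set()
    for doc in good_docs:
        vocab.update(tokenize(doc))
    return vocab

def quality_score(doc, vocab):
    """Fraction of a document's tokens that also occur in the 'good' vocabulary."""
    tokens = tokenize(doc)
    if not tokens:
        return 0.0  # empty documents get the lowest score
    return sum(t in vocab for t in tokens) / len(tokens)

def filter_pool(pool, good_docs, threshold=0.5):
    """Keep documents from the big untagged pool that look like the 'good' set.

    The threshold is an arbitrary assumption chosen for this toy example.
    """
    vocab = build_vocab(good_docs)
    return [doc for doc in pool if quality_score(doc, vocab) >= threshold]

# Tiny hand-tagged "good" set versus a larger untagged pool.
good = ["The cat sat on the mat.", "Training data should be clean and accurate."]
pool = ["The data should be accurate.", "xx9 zzz qqq buy now!!!", ""]
kept = filter_pool(pool, good)
```

A real setup would most likely swap the word-overlap score for a trained classifier, but the shape is the same: the small tagged set decides what gets kept from the huge untagged one.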