• brucethemoose@lemmy.world · 4 hours ago

    It’s ironic how conservative the spending actually is.

Awesome ML papers and ideas come out every week: low-power training/inference optimizations, fundamental changes in the math like BitNet's ternary weights (sketched below), new attention mechanisms, cool tools to make models more controllable, steerable, and grounded. This is all getting funded, right?

    No.

Universities and such are seeding and putting out all this research, but the big model trainers holding the purse strings and GPU clusters are not using it. They just keep releasing very similar, mostly bog-standard transformer models over and over again, bar a tiny expense for a little experiment here and there. In other words, it's gone full corporate: tiny, guaranteed incremental improvements without changing much, and no sharing with each other. It's hilariously inefficient, and it relies on lies and jawboning from people like Sam Altman.
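
For a concrete sense of one idea sitting on the shelf: BitNet b1.58 swaps full-precision weights for ternary values in {-1, 0, +1}, so the matrix multiplies that dominate inference collapse into additions and subtractions. Here's a minimal sketch of the absmean weight-quantization step in plain NumPy (the function name and toy usage are mine, not from the paper):

```python
import numpy as np

def bitnet_quantize_weights(w: np.ndarray, eps: float = 1e-5):
    """Ternary (1.58-bit) weight quantization in the style of BitNet b1.58.

    Each weight is scaled by the mean absolute value ("absmean"),
    then snapped to {-1, 0, +1}. Toy sketch of the idea only, not
    the paper's training recipe or kernels.
    """
    gamma = np.mean(np.abs(w)) + eps           # absmean scale factor
    w_q = np.clip(np.round(w / gamma), -1, 1)  # ternary weights
    return w_q.astype(np.int8), gamma

w = np.random.randn(4, 4).astype(np.float32)
w_q, gamma = bitnet_quantize_weights(w)
print(w_q)                             # entries in {-1, 0, 1}
print(np.abs(w - gamma * w_q).mean())  # rough quantization error
```

The point isn't these ten lines; it's that ideas like this are published and reproducible, and the labs with the compute to scale them mostly don't.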

DeepSeek is what happens when a company is smart but resource-constrained: an order of magnitude more efficient, and even their architecture was very conservative.