I started my IT career in 2011, and I have enjoyed it. I have gotten to do a lot of interesting stuff and meet interesting people, and I will treasure those memories forever.
But, starting with crypto, general computing turned from being:
“Wow, this machine can run so many apps at the same time!” or “Holy shit, those graphics look epic!” or “Amazing, this computer has really sped up that annoying task!”
To being:
“Yo! Look at how many numbers I can generate!”
That brought down my enthusiasm severely, but hey, figuring out solutions to problems was still fun.
Then came AI/LLMs.
And with it, a mountain of slop.
Finding help with an issue has gone from googling and reading help articles written by something with an actual brain to wading through rephrased manuals that only provide working answers to semi-standard questions.
Add to that a general push to use AI in anything and everything, no matter how little relevance it holds for the task at hand.
I also remember how AI was sold to us at first: we were promised it would do away with boring paperwork, so we could get on with our actual jobs.
What did we get? An AI that takes the fun and creative parts, leaving the paperwork for the workers.
We got an AI that we have to assume is stealing our work and data at every point, giving us shit work back, while being told that we should applaud it and be grateful for it.
And the worst thing, the worst thing is that people seem happy with it. I keep getting requests to buy another Copilot license or to add another AI service to our tenant, and I am sick of it!
We got an AI that somehow has slithered onto the golden throne and can’t be questioned.
I am not able to leave the tech market at this time, but I will focus on more tangible hobbies going forward.
This year, I have given myself a project: I will try to build a model railway in a suitcase, a Z-scale tiny world.
I have never done anything remotely like it, but I feel like I need something physical to take my mind off tech.
Sorry for the rant, but I just came off of a high from realizing and putting words to my feelings.


It really doesn’t suck at them. AI writes great code; I think we just want it to suck. It can’t magically generate a new Linux kernel, but the small tasks I’ve seen it do have all been mostly above average. (I have also seen some complete garbage, yes, but mostly it’s above average.)
“Small tasks” are the key there. It can write a small script or spin up application boilerplate very well, but it really struggles with long-term maintenance and new features on a complex application.
I spend about as much time telling the AI “no, not like that” as it supposedly saves me from not having to type the code manually.
It does have some value, but I’d put it around a 10% boost – not the 500% boost that senior leadership insists it delivers.
Right, but I think we’re kidding ourselves if we don’t think it’s going to get better. I have no doubt it will be able to magically generate a new Linux kernel.
You say that, but you have to remember that LLMs produce the average output of their training materials. Not the best, but the average. And there’s a lot of code out there that is simple. Only the outliers have the magic combination of conciseness AND quality AND complexity.
LLMs also have no understanding of context outside the immediate. Satire is completely opaque to them. Sarcasm is lost on them, by and large. And they have no way to differentiate between good and bad output. Or good and bad input, for that matter. Joke pseudocode is just as valid in their training corpus as dire warnings about insecure code.
I read a comment once that still rings true - “Hallucinations” are a misnomer. Everything an LLM puts out is a hallucination; it’s just that a lot of the time, it happens to be accurate. Eliminating that last percentage of inaccurate hallucinations is going to be nearly impossible.
I’d push back on your point here with a few things:
The primary one being: the code doesn’t need to be perfect or even above average – average is perfectly fine. The idea here is comparing the AI to a human, not to perfection. I see this constantly with AI and I find it a bit disingenuous.
I do truly believe what I said above will be possible within my career (I’m in my mid 30s), but it’s not really what I’m worried about right now. I think the current code I see being generated is generally “good enough”. I’m not comparing it to perfect: I’m comparing it to people.
I don’t see any reason you have to remove all hallucinations to get a good tool for autonomous development: humans aren’t perfect either. We compensate for that with processes and checking each other’s work, but plenty still falls through the cracks.
Have you seen output in which satirical code is actually included? I’m well aware of things like https://www.anthropic.com/research/small-samples-poison and the potential here. And do you not believe that either (a) these types of trivial issues would be caught by a person whose job was just to audit output or even (b) this type of issue could be caught by specially trained domain limited AIs designed to check output?
If this were true, then open source projects would have much less of an issue with pull requests from sloperators.
I wouldn’t expect to see it. Satirical code requires more thought than an LLM is capable of putting into its writing - you need to understand what is expected of whoever you’re satirizing, and then push that expectation a step further into the absurd. Without the context of something that is specifically being satirized, what you have instead is just incorrect code. And again, the LLM is incapable of valuing proper code over intentionally wrong code, so it’s going to poison the database to some extent.
And LLMs don’t drop big chunks of copy-pasted code from Stack Exchange like an intern would. They work one token at a time. (Which is why trying to get them to understand that quotations need to be all in one piece is a futile endeavor.)
Besides, ‘satirical code’ is just one example of the many things that can poison the training. I couldn’t even begin to enumerate all the things that could mess with it, and honestly I’m surprised that LLMs do as well as they do considering they likely have all sorts of cross-language screwball connections (which may be why they have such a tendency to make up libraries; they don’t necessarily understand that a common PHP library doesn’t exist in Java).
These issues could be caught by someone whose job it is to audit code, sure. The problem is that sloperators often don’t audit their own stuff well enough. They leave it to the open source repo’s admins. When pull requests from overeager noobs were infrequent, it wasn’t a problem; the admins could gently correct them, the repo would stay high-quality, the noob would learn, and everyone was fine. But now, sloperators are dumping low-quality pull requests on the repos faster than the admins can sort through them - because it now takes less time to produce slop code than it takes to determine whether the slop is worth including. The admins are swamped, because they can’t sort the wheat from the chaff fast enough.
A domain-limited AI designed to check output would be useful - if it could be trusted. Open-source project admins are some of the best coders out there, and they vastly outstrip the capabilities of LLMs. You’re suggesting that we replace THEM with an agent. They are in that position because they’re right far more often than they’re wrong when it comes to understanding the code as it exists, and how incoming code would impact it - or at least they’re right often enough to keep the project alive. LLMs will be worse at that job, I guarantee it. They’d be fast, but they’d be wrong too often. This is the primary issue with LLM agents.
I am not suggesting we replace anyone, least of all the open source community, so let’s not put words in my mouth.
This doesn’t follow for me. A good tool in the hands of a crappy user doesn’t suddenly make good output. I specifically said that LLMs write good code in a specific setting. Clearly, a random person generating thousands of lines at a time for a project they don’t understand isn’t that setting.
You seem to be very focused on crappy code generated by people who don’t know what they’re doing. The technology isn’t good enough for that, so yes, it won’t work in that setting; I agree.
Citation? I’m really asking because I’ve yet to hear about anything above a toy project that has had any verifiable success with AI code generation as a major component of their workflow.
As in a like-for-like improvement in code quality, security, bug occurrences and severity, developer efficiency, all that jazz - not just the standard “we’ve funnelled so much money into this that we are almost fiscally required to claim success”.
it’s not a dig, i really want to see one so i can find out how it was done.
Claude commits to GitHub with the same name no matter who uses it. You can see every single line of open source code it has written (for GitHub only of course): https://github.com/search?q=author%3Aclaude&type=commits&s=author-date&o=desc. Look around as you please, most of it is just fine.
People whom I know to be good developers have also shared their experiences with it and say yes, it has written good code for them. I’ve personally used ChatGPT for very mundane tasks and the code it output was more than adequate.
It introduces security bugs and subtle bugs at probably the same rate as a human (I have no “citation” there, just what I’ve seen). It needs to be “driven” by a human, yes, but it’s not clear for how long it will need to be, and even if it always does, personally I don’t want my job to be to “drive an AI”.
I appreciate the answer but that’s not at all what i asked.
I have anecdotes and personal experience i could cite but that’s not particularly helpful in a general sense.
Pointing to claude submissions in projects is actively less than helpful in this case because it only proves that single files in isolation look like they are well written, it gives no indication of overall project quality.
So in a very limited context the code generated for you personally was acceptable, that’s great, i’ve found much the same, but that’s a far cry from “AI writes great code; I think we just want it to suck.”
It’s somewhat my bad though: when i say “citation” i don’t need a full research paper (though that would be nice); i’d like something a bit more substantial than a “trust me bro”.
That’s a load-bearing “probably”. My experience has been the polar opposite of that: I’ve been involved in two major AI initiatives and both choked hard on security and domain bugs. That could very well be a project management or company-specific issue, hence the search for successful projects to compare.
My quest continues.
I didn’t say “trust me bro”, and showing Claude submissions is sufficient for analyzing code in the context I believe it is good: one file at a time and one task at a time. This is also the realm in which a human is good. You are welcome to look at the project as a whole to determine the “project quality” as well: it’s open source. But I’m not here to argue: I believe this tech, barely in its infancy, is already quite good and going to get better, and I’m already considering what it will do to my life. If you don’t, that’s fine.
I’ll add here that I find it very frustrating to talk about these “AI agents” and their code output, because it’s something we’re all close to and spent a lot of time learning. The concept of “a machine” getting “better than us” so quickly, against the background of an industry that is chomping at the bit to replace humans, makes these discussions inherently difficult and really emotional. I feel genuine sadness when I think about it. If the world were different we’d probably all be stoked. I don’t want the AI to be better than me, and I currently don’t believe it is, but I think:
I don’t think my job is currently on the chopping block: I don’t do development, I do security work. But I do think it will either be on the chopping block or fundamentally change sooner than I’m comfortable with.
That’s on me; I meant the equivalent of a “trust me bro”, in this case an anecdotal “me and the people I know all say…”
Yes, in the context you provided it makes sense, as a response to my question which specified examples of larger projects/workflows, it does not.
I’m not here to argue either; I asked a specific question and your answer didn’t really address any of it, i was just pointing that out.
I too find it frustrating but it seems for different reasons.
I really really dislike the way it’s being sold as a solution for things it’s in no way a solution for.
They do certain things fine, good even, but blanket statements like “their code is great” without appropriate qualifiers contribute to the validation of these bullshit sales-oriented claims of task competency.
1: agreed
2: then I think you are missing the fundamental limitations of the current approaches, but we can agree to disagree on this.
3: see 2
I agree about jobs on the chopping block, though i think that’s in large part due to poor due diligence and planning by management. But that’s nothing new; the same thing has happened and is still happening with offshoring (throwing more people at a problem generally won’t solve design and governance issues).
I also think the current systems aren’t capable of being a viable replacement for anything above junior-level stuff, if that (not that that doesn’t present its own problems).
I think the difference in opinion comes from my belief that LLMs and the current tooling around them aren’t fundamentally capable of replacing existing resources; it’s not that they just don’t have the power yet.
Putting increasingly large compute in a calculator won’t magically make it a spreadsheet application.
To your point then: what are your thoughts on this project? https://github.com/anthropics/claudes-c-compiler I’m not particularly interested in this use case right now but it seems more in line with what you’re interested in.
I think it shows a lot of limitations but also a lot of potential. I don’t personally think the AI needs to get the code perfect on the first go – it has to be compared to humans, and we definitely don’t get it right on the first go either.
Yes, of course. I think it’s important to look past the blowhards and think about what it’s actually doing: that is the perspective I’m trying to talk about this from.
My initial thoughts are that my original ask was this:
and the example you provided was a toy project used as a publicity stunt.
On the technical side i don’t know enough rust to be able to weigh in on the technical accuracy of the project.
The ability of current LLMs to churn out something that looks relatively good at first glance isn’t my point of contention; most of us know they can do that.
I’m just looking for a single medium to large project that is successfully being used in production (close to production is also fine) that was created with significant LLM involvement.
There is so much talk around this that the fact i haven’t come across any mention of a successful deliverable (in the context i mentioned) raises all sorts of red flags for me, personally.
I’m not trying to catch you out, it’s just that i haven’t seen one so i was wondering if you have, if you haven’t that’s fine, it’s not a trap.
Iterative progress is generally the way of things, but most non-trivial agentic workflows already work with iterative code generation and testing, so expecting a correct solution at the end of that process is more reasonable than you would think.
The difference between people and LLMs is the types of interactions you have with them: you can ask the LLM to explain why it did something, but if you’ve ever tried that, I’m sure you can understand why it’s not the same as the kind of answers you’d get from a person.
As am I; I’m not against LLM usage, I’m against the pretense that it has capabilities it does not, in fact, have.
Selling something on the basis of it being able to do something it can’t do is where the term “snake oil salesman” comes from.