I started my IT career in 2011, and I have enjoyed it. I have gotten to do a lot of interesting stuff and meet interesting people, and I will treasure those memories forever.
But, starting with crypto, general computing turned from being:
“Wow, this machine can run so many apps at the same time!” or “Holy shit, those graphics look epic!” or “Amazing, this computer has really sped up that annoying task!”
To being:
“Yo! Look at how many numbers I can generate!”
That brought down my enthusiasm severely, but hey, figuring out solutions to problems was still fun.
Then came AI/LLMs.
And with it, a mountain of slop.
Finding help with an issue has gone from googling and reading help articles written by something with an actual brain to wading through rephrased manuals that only provide working answers to semi-standard questions.
Add to that a general push to use AI in anything and everything, no matter how little relevance it holds for the task at hand.
I also remember how AI was sold to us at first: we were promised it would do away with boring paperwork, so we could get on with our actual jobs.
What did we get? An AI that takes the fun and creative parts, leaving the paperwork for the workers.
We got an AI that we have to assume is stealing our work and data at every point, giving us shit work back, while being told that we should applaud it and be grateful for it.
And the worst thing, the worst thing is that people seem happy with it. I keep getting requests to buy another Copilot license or to add another AI service to our tenant, and I am sick of it!
We got an AI that somehow has slithered onto the golden throne and can’t be questioned.
I am not able to leave the tech market at this time, but I will focus on more tangible hobbies going forward.
This year, I have given myself a project: I will try to build a model railway in a suitcase, a Z-scale tiny world.
I have never done anything remotely like it, but I feel like I need something physical to take my mind off tech.
Sorry for the rant, but I just came off of a high from realizing and putting words to my feelings.


It really doesn’t suck at them. AI writes great code; I think we just want it to suck. It can’t magically generate a new Linux kernel, but the small tasks I’ve seen it do have all been mostly above average. (I have also seen some complete garbage, yes, but mostly it’s above average.)
“Small tasks” are the key there. It can write a small script or spin up application boilerplate very well, but it really struggles with long-term maintenance and new features on a complex application.
I spend about as much time telling the AI “no, not like that” as it supposedly saves me from not having to type the code manually.
It does have some value, but I’d put it around a 10% boost – not the 500% boost that senior leadership insists it delivers.
Right, but I think we’re kidding ourselves if we don’t think it’s going to get better. I have no doubt it will be able to magically generate a new Linux kernel.
You say that, but you have to remember that LLMs produce the average output of their training materials. Not the best, but the average. And there’s a lot of code out there that is simple. Only the outliers have the magic combination of conciseness AND quality AND complexity.
LLMs also have no understanding of context outside the immediate. Satire is completely opaque to them. Sarcasm is lost on them, by and large. And they have no way to differentiate between good and bad output. Or good and bad input, for that matter. Joke pseudocode is just as valid in their training corpus as dire warnings about insecure code.
I read a comment once that still rings true - “Hallucinations” are a misnomer. Everything an LLM puts out is a hallucination; it’s just that a lot of the time, it happens to be accurate. Eliminating that last percentage of inaccurate hallucinations is going to be nearly impossible.
I’d push back on your point here with a few things:
The primary one being: the code doesn’t need to be perfect or even above average – average is perfectly fine. The idea here is comparing the AI to a human, not to perfection. I see this constantly with AI and I find it a bit disingenuous.
I do truly believe what I said above will be possible within my career (I’m in my mid 30s), but it’s not really what I’m worried about right now. I think the current code I see being generated is generally “good enough”. I’m not comparing it to perfect: I’m comparing it to people.
I don’t see any reason you have to remove all hallucinations to get a good tool for autonomous development: humans aren’t perfect either. We compensate for that with processes and checking each other’s work, but plenty still falls through the cracks.
Have you seen output in which satirical code is actually included? I’m well aware of things like https://www.anthropic.com/research/small-samples-poison and the potential here. And do you not believe that either (a) these types of trivial issues would be caught by a person whose job was just to audit output or even (b) this type of issue could be caught by specially trained domain limited AIs designed to check output?
If this were true, then open source projects would have much less of an issue with pull requests from sloperators.
I wouldn’t expect to see it. Satirical code requires more thought than an LLM is capable of putting into its writing - you need to understand what is expected of whoever you’re satirizing, and then push that expectation a step further into the absurd. Without the context of something that is specifically being satirized, what you have instead is just incorrect code. And again, the LLM is incapable of valuing proper code over intentionally wrong code, so it’s going to poison the database to some extent.
And LLMs don’t drop big chunks of copy-pasted code from Stack Exchange like an intern would. They work one token at a time. (Which is why trying to get them to understand that quotations need to be all in one piece is a futile endeavor.)
Besides, ‘satirical code’ is just one example of the many things that can poison the training. I couldn’t even begin to enumerate all the things that could mess with it, and honestly I’m surprised that LLMs do as well as they do considering they likely have all sorts of cross-language screwball connections (which may be why they have such a tendency to make up libraries; they don’t necessarily understand that a common PHP library doesn’t exist in Java).
These issues could be caught by someone whose job it is to audit code, sure. The problem is that sloperators often don’t audit their own stuff well enough. They leave it to the open source repo’s admins. When pull requests from overeager noobs were infrequent, it wasn’t a problem; the admins could gently correct them, the repo would stay high-quality, the noob would learn, and everyone was fine. But now, sloperators are dumping low-quality pull requests on the repos faster than the admins can sort through them - because it now takes less time to produce slop code than it takes to determine whether the slop is worth including. The admins are swamped, because they can’t sort the wheat from the chaff fast enough.
A domain-limited AI designed to check output would be useful - if it could be trusted. Open-source project admins are some of the best coders out there, and they vastly outstrip the capabilities of LLMs. You’re suggesting that we replace THEM with an agent. They are in that position because they’re right far more often than they’re wrong when it comes to understanding the code as it exists, and how incoming code would impact it - or at least they’re right often enough to keep the project alive. LLMs will be worse at that job, I guarantee it. They’d be fast, but they’d be wrong too often. This is the primary issue with LLM agents.
I am not suggesting we replace anyone, least of all the open source community, so let’s not put words in my mouth.
This doesn’t follow for me. A good tool in the hands of a crappy user doesn’t suddenly make good output. I specifically said that LLMs write good code in a specific setting. Clearly, a random person generating thousands of lines at a time for a project they don’t understand isn’t that setting.
You seem to be very focused on crappy code generated by people who don’t know what they’re doing. The technology isn’t good enough for that, so yes, it won’t work in that setting; I agree.
Citation? I’m really asking because I’ve yet to hear about anything above a toy project that has had any verifiable success with AI code generation as a major component of their workflow.
As in a like-for-like improvement in code quality, security, bug occurrences and severity, developer efficiency, all that jazz - not just the standard “we’ve funnelled so much money into this that we are almost fiscally required to claim success”.
it’s not a dig, i really want to see one so i can find out how it was done.
Claude commits to GitHub with the same name no matter who uses it. You can see every single line of open source code it has written (for GitHub only of course): https://github.com/search?q=author%3Aclaude&type=commits&s=author-date&o=desc. Look around as you please, most of it is just fine.
People whom I know to be good developers have also shared their experiences with it and say yes, it has written good code for them. I’ve personally used ChatGPT for very mundane tasks and the code it output was more than adequate.
It introduces security bugs and subtle bugs at probably the same rate as a human (I have no “citation” there, just what I’ve seen). It needs to be “driven” by a human, yes, but it’s not clear for how long it will need to be, and even if it always does, personally I don’t want my job to be to “drive an AI”.
I appreciate the answer but that’s not at all what i asked.
I have anecdotes and personal experience i could cite but that’s not particularly helpful in a general sense.
Pointing to claude submissions in projects is actively less than helpful in this case because it only proves that single files in isolation look like they are well written, it gives no indication of overall project quality.
So in a very limited context the code generated for you personally was acceptable, that’s great, i’ve found much the same, but that’s a far cry from “AI writes great code; I think we just want it to suck.”
It’s somewhat my bad though: when i say “citation” i don’t need a full research paper (though that would be nice); i’d like something a bit more substantial than a “trust me bro”.
That’s a load-bearing “probably”. My experience has been the polar opposite of that: I’ve been involved in two major AI initiatives and both choked hard on security and domain bugs. That could very well be a project management or company-specific issue, hence the search for successful projects to compare.
My quest continues.
I didn’t say “trust me bro”, and showing Claude submissions is sufficient for analyzing code in the context I believe it is good: one file at a time and one task at a time. This is also the realm in which a human is good. You are welcome to look at the project as a whole to determine the “project quality” as well: it’s open source. But I’m not here to argue: I believe this tech, barely in its infancy, is already quite good and going to get better, and I’m already considering what it will do to my life. If you don’t, that’s fine.
I’ll add here that I find it very frustrating to talk about these “AI agents” and their code output, because it’s something we’re all close to and spent a lot of time learning. The concept of “a machine” getting “better than us” so quickly, against the background of an industry that is chomping at the bit to replace humans, makes these discussions inherently difficult and really emotional. I feel genuine sadness when I think about it. If the world were different we’d probably all be stoked. I don’t want the AI to be better than me, and I currently don’t believe it is, but I think:
I don’t think my job is currently on the chopping block: I don’t do development, I do security work. But I do think it will either be on the chopping block or fundamentally change sooner than I’m comfortable with.
That’s on me; I meant the equivalent of a “trust me bro”, in this case an anecdotal “me and the people I know all say…”
Yes, in the context you provided it makes sense, as a response to my question which specified examples of larger projects/workflows, it does not.
I’m not here to argue either; I asked a specific question and your answer didn’t really address any of it, i was just pointing that out.
I too find it frustrating but it seems for different reasons.
I really really dislike the way it’s being sold as a solution for things it’s in no way a solution for.
They do certain things fine, good even, but blanket statements like “their code is great” without appropriate qualifiers contribute to the validation of these bullshit sales-oriented claims of task competency.
1: agreed
2: then I think you are missing the fundamental limitations of the current approaches, but we can agree to disagree on this.
3: see 2
I agree about jobs on the chopping block, though i think that’s in large part due to poor due diligence and planning by management. But that’s nothing new; the same thing has happened and is still happening with offshoring (throwing more people at a problem generally won’t solve design and governance issues).
I also think the current systems aren’t capable of being a viable replacement for anything above junior-level stuff, if that (not that that doesn’t present its own problems).
I think the difference in opinion comes from my belief that LLMs and the current tooling around them aren’t fundamentally capable of replacing existing resources; it’s not that they just don’t have the power yet.
Putting increasingly large compute in a calculator won’t magically make it a spreadsheet application.
To your point then: what are your thoughts on this project? https://github.com/anthropics/claudes-c-compiler I’m not particularly interested in this use case right now but it seems more in line with what you’re interested in.
I think it shows a lot of limitations but also a lot of potential. I don’t personally think the AI needs to get the code perfect on the first go – it has to be compared to humans, and we definitely don’t get it right on the first go either.
Yes, of course. I think it’s important to look past the blowhards and think about what it’s actually doing: that is the perspective I’m trying to talk about this from.
My initial thoughts are that my original ask was this:
and the example you provided was a toy project used as a publicity stunt.
On the technical side i don’t know enough rust to be able to weigh in on the technical accuracy of the project.
The ability of current LLMs to churn out something that looks relatively good at first glance isn’t my point of contention; most of us know they can do that.
I’m just looking for a single medium to large project that is successfully being used in production (close to production is also fine) that was created with significant LLM involvement.
There is so much talk around this that the fact i haven’t come across any mention of a successful deliverable (in the context i mentioned) raises all sorts of red flags for me, personally.
I’m not trying to catch you out, it’s just that i haven’t seen one so i was wondering if you have, if you haven’t that’s fine, it’s not a trap.
Iterative progress is generally the way of things, but most non-trivial agentic workflows already work with iterative code generation and testing, so expecting a correct solution at the end of that process is more reasonable than you would think.
The difference between people and LLMs is the types of interactions you have with them: you can ask the LLM to explain why it did something, but if you’ve ever tried that, I’m sure you can understand why it’s not the same as the kind of answers you’d get from a person.
As am I; I’m not against LLM usage, I’m against the pretense that it has capabilities it does not, in fact, have.
Selling something on the basis of it being able to do something it can’t do is where the term “snake oil salesman” comes from.