17 Comments
Michał:

I have very similar observations to yours, Oskar. I gave it a very simple task of improving code coverage in an existing codebase (a small Python project). I used Cursor. I was positively surprised by the degree of automation: it could run commands to get the current coverage percentage, rerun the tests to check for improvements, etc. Then the magic started to fade away. It struggled to write tests that actually improved coverage. When I steered it in a particular direction, it could make it work. The sad part is ... I could have done it way faster without struggling to describe "the obvious" sometimes. I still prefer to have fun writing the code, even the boilerplate :)
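
(For illustration, roughly what that loop looks like; the module and function names below are hypothetical, not from Michał's project:)

```python
# A minimal sketch of the coverage-improvement loop described above.
# The `calculator` module and its `divide` function are made up.
#
# Measure coverage first, then target the uncovered branches:
#   pytest --cov=calculator --cov-report=term-missing

import pytest
from calculator import divide  # hypothetical function under test


def test_divide_happy_path():
    assert divide(10, 2) == 5


def test_divide_by_zero():
    # The kind of "obvious" edge case that still needed spelling out.
    with pytest.raises(ZeroDivisionError):
        divide(10, 0)
```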

Oskar Dudycz:

I think that for coding it can give okay-ish results as long as it's touching a single file/class/concept; then it can be of some help with boilerplate. But then, as you said, the question is whether reviewing and fixing the code won't still take the same amount of time as writing it from scratch.

Tito George:

I agree too! Most of the time, a few lines of code get generated in seconds, and I end up spending the next few minutes correcting them. I have only used GitHub Copilot. One thing it does well is writing unit tests, in my case in Go. It's not 100% correct, but most of the boilerplate (ctrl+c, ctrl+v) code gets generated in bulk. There's no denying that these tools have increased the productivity of the average developer, but that is a problem: the quality is poor and the code is not performant, and the developer who committed it doesn't know how to tune it. The time saved while writing is lost during performance tuning. We had to rewrite APIs after finding issues in performance testing.
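
(The sketches in this thread use Python, so here is the pytest analogue of the Go table-driven test boilerplate meant above; the function under test is hypothetical:)

```python
# The repetitive case-table pattern that assistants generate in bulk,
# sketched with pytest.mark.parametrize as the Python analogue of
# Go's table-driven tests. `make_slug` is a made-up function.
import pytest
from myapp.text import make_slug  # hypothetical function under test


@pytest.mark.parametrize(
    ("raw", "expected"),
    [
        ("Hello World", "hello-world"),
        ("  padded  ", "padded"),
        ("already-a-slug", "already-a-slug"),
        ("", ""),
    ],
)
def test_make_slug(raw, expected):
    assert make_slug(raw) == expected
```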

Oskar Dudycz:

Yup, I think we hoped that practices that didn't work well for regular development would magically work with LLMs. And they don't: unclear design is still not a good practice, and generating a huge pile of code is unmanageable. I think these tools should be treated more as an extension of IDE capabilities; then they can be a useful help, but not a replacement for our work.

Greg H:

"A lot of us came to programming to express our creativity. The puzzle-solving, the flow state, and the satisfaction of building something with our own hands."

I stopped using AI the moment I felt this slipping away, and then turned against AI when I realized how blatantly it was stealing from all artists.

Oskar Dudycz:

Yup, I didn't even touch on this topic, which is also important. I think that stealing shouldn't be a business model, and it may be one of the reasons why this may collapse soon. Plus, soon models will need to learn from synthetic data, or from code generated by LLMs, which can be a downward spiral for quality. I think we're already seeing that with the ChatGPT 5 release.

Greg H:

can't come soon enough

Ynon:

I'm not using the tools the same way. What works for me is to think of AI as a "continuation machine" - once there's a good base, AI can build the rest.

So, first, what works:

1. Create another page in the GUI if the app already has multiple pages and good abstractions.

2. Connect to another backend endpoint.

3. Add a new event type to a system, with a slightly different UI.

4. Add new tests based on existing code, tests and fixtures.

5. Analyze code and suggest edge cases.

6. Scan the project to find a specific flow, or the possible causes of a bug in that flow.

One debugging example that saved me hours was when the AI found I had the wrong SameSite cookie policy. Another good catch was a missing ORDER BY clause in an SQL query. These are small things, but they can be useful in larger projects.
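
(To make those two catches concrete, a sketch of each bug class; this is illustrative Python, not the commenter's actual code:)

```python
# Bug class 1: an explicit SameSite cookie policy. A wrong value here
# (e.g. "Strict" when a cross-site redirect flow needs "Lax") silently
# breaks sessions. The `samesite` attribute needs Python 3.8+.
from http import cookies

c = cookies.SimpleCookie()
c["session"] = "abc123"
c["session"]["samesite"] = "Lax"  # the attribute that was set wrongly
c["session"]["secure"] = True
print(c["session"].OutputString())

# Bug class 2: a query that happens to return rows in insertion order on
# one database but not another; the fix is an explicit ORDER BY.
query = """
    SELECT id, created_at
    FROM events
    WHERE stream_id = ?
    ORDER BY created_at  -- the clause that was missing
"""
```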

Also, when working with dynamically typed languages like Ruby or JavaScript, the AI doesn't make typos and makes almost no type or name mistakes.

What doesn't work:

1. AI doesn't think, so refactoring in general works terribly or not at all.

2. AI creates terrible tests because it doesn't understand the actual use case or edge cases. It will test too much, or the wrong things.

3. AI is a terrible architect. It is too influenced by the existing code.

In general, use AI as an advanced copy-paster and you'll find you can get work done much faster. Let it "think" for you, and you'll just spend your life cleaning up senseless code.

Andrzej Zabost:

For me, it didn't even work as a copy-paster. I tried it in two scenarios: one where it was instructed to copy the same line to 40 files, and another where it was supposed to delete duplicated lines from two files and move them (deduplicated) into a third file. It failed miserably in both cases.

I'm not trying to say these tools are completely useless, though. There are tons of other scenarios where they work well enough. It's just that they certainly don't speed me up as much as I would like, and sometimes they even slow me down if I don't realize early enough that the given task was too difficult.

Architecture Corner:

I agree that the productivity gains are inflated for the real case of production code, as opposed to vibe coding. To begin with, developers do not spend 100% of their time on the mechanical task of typing. Whatever gains we can get apply only to the portion of the development effort that actually entails generating code.

I do not want to get into the comparison with a junior dev, but I will share my perspective: investing time in the requirements is actually a positive thing, if done properly.

I subscribe to "first solve the problem, then write the code", and spending some time doing exactly that helps you understand, with an LLM or not, what path to take.

As long as you do that in a continuous and incremental fashion, like you would in any "agile" approach, the risk of big design up front, or of Rational Rose-style waterfall, diminishes.

If I were to guess, I would say the factor for me is more like 1.25x than 10x, but still positive. At least for business applications; I can't speak to using it for developing frameworks or other kinds of software.

Things that _helped_ me:

- having my coding / architecture standards documented in .md instruction files (what libraries to use, my definitions of entities, value objects, security, infrastructure, etc.); see the sketch after this list

- using a spec / PRD

- asking it to do small iterations and update an execution plan
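
(For illustration, a minimal sketch of such an instruction file; the contents are hypothetical, not the commenter's actual standards:)

```markdown
# Coding and architecture standards (hypothetical example)

## Libraries
- Use the existing HTTP client wrapper; do not add new dependencies without asking.

## Domain definitions
- Entities have identity and a lifecycle; value objects are immutable and compared by value.

## Security and infrastructure
- Never log tokens, secrets, or personal data.
- Every new endpoint needs an authorization policy.

## Process
- Work in small iterations; after each one, update `EXECUTION_PLAN.md`
  with what was done and what comes next.
```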

Not perfect, and sometimes infuriating when something simple goes wrong, but mostly positive.

Oskar Dudycz:

Thanks for expanding!

What I tried this week after writing this article was:

- telling the LLM to brainstorm, focusing on asking questions and challenging my design, and noting down all questions and answers in a markdown file,

- answering those questions and treating Claude Code as a rubber duck that challenges me and helps me polish the rough corners,

- asking it to check, analyse, and give me options based on the code in my solutions,

- building a summary spec and doing some iterations on it.

When I knew what I wanted to achieve, and had a general vision of how, it helped me get it out of my head. Still, I implemented it on my own after having those steps outlined.

I think that's something I will play with more. An important observation is that I had to do it with Opus, not Sonnet, as discussing this with the "dumber" model was meaningless.

Shahar Barak:

Hey Oskar, great piece - the Gabriel analogy made me laugh! But I think you're being too harsh based on just two attempts. Remember your first week with vim or Spring Boot? There's a real learning curve with AI tools that takes weeks to develop the right intuitions.

The 10x gains are real but domain-specific (boilerplate, tests, docs - not complex architecture). And needing detailed specs isn't a bug, it's a feature - it forces clear thinking upfront. The "micromanagement" frustration fades once you develop a rhythm, like learning pair programming.

Your key insight is the expectation mismatch: you wanted autonomous completion ("handle this boring work while I did something else") but these tools work best as collaborative partners. The developers getting value have learned to work with AI, not expect it to work for them. Give it more time - the investment does pay off, just not immediately.

Szymon Bernad:

> "Dude, check those series, they're great! I mean, the first five seasons might not seem impressive if you're not into such series, but watch it, the 6th season will be mindblowing!"<

I don't think that's how it should work...

If you invest so much time changing your habits and rewiring your own brain to collaborate with an AI assistant, you can no longer speak of productivity gains; it's just a completely different activity. The main problem I have with this AI hype is that it's still running in limited-time-discount mode: the Big Techs are trying to make a disruptive product out of it, but don't yet know how, and are really unsure about the actual price tag. The entire thing should be more human-centric, and the actual costs (including the environmental toll) should be clear from the beginning.

Oskar Dudycz:

Yup, that was also one of the goals of this article: this model is not sustainable for LLM providers either. For now, they're promising more than they can deliver, and still at a discounted price. Maybe they will catch up at some point, but I doubt whether an LLM, by its nature, is actually a tool that could achieve that.

Szymon Bernad:

I'm really mad about how wasteful this hype is. IMO it shows in a nutshell everything that is wrong with this late stage of turbo-capitalism... And someone could rightfully argue that this is exactly how innovation is done these days.

Oskar Dudycz:

Actually, I was playing with it for longer than a week; that was more the timeframe of those specific "experiments". I agree that I should try harder, and that's what I will do, although, to be fair, I don't have much time to waste, so I need to be careful about how I use it.

They're definitely not even close to being autonomous yet.

Alex P:

Same experience.

I've been using Copilot at work for a good while, plus Cursor and JetBrains AI (AI Assistant and, recently, their new "agent" Junie).

Top achievements: Copilot cuts some boilerplate for me, scaffolds business code and unit tests, and adds logger calls.

ChatGPT: I've been using the Pro version since its release, and I can swear it's becoming lazier and dumber when it comes to coding. I wonder if this is done on purpose, to make it easier to sell specialized coding-assistant products.
