People claim they get 10x productivity boosts with AI coding tools. After my recent experiments with Claude Code, I'm starting to think we're not using these tools the same way. Or that they’re just lying. Or both. See my story, and learn why you should not be like Gabriel!
I have very similar observations to yours, Oskar. I gave it a very simple task: improving code coverage in an existing codebase (a small Python project). I used Cursor. I was positively surprised by the automation factor. It could run commands to get the current coverage percentage, run and rerun the tests to check for improvements, etc. Then the magic started to fade away. It struggled to write tests that actually improved coverage. I steered it in a direction and it could make it work. The sad part is ... I could have done it way faster myself, without struggling to describe "the obvious" sometimes. I still prefer to have fun writing the code, even the boilerplate :)
I agree too! Most of the time, a few lines of code get generated in seconds and I end up spending the next few minutes correcting them. I have only used GitHub Copilot. One thing it does well is writing unit tests, in my case in Go. It is not 100% correct, but most of the boilerplate (ctrl+c, ctrl+v) code gets generated in bulk. There is no denying that these tools increase the productivity of an average developer, but that is a problem. The quality is poor, the code is not performant, and the developer who committed it doesn't know how to tune it. The time saved while writing is lost during performance tuning. We had to rewrite APIs after finding issues in performance testing.
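To give a sense of the kind of boilerplate meant here, below is a rough sketch of a table-driven Go test. The `Sum` function and its cases are made up purely to show the repetitive shape these tools generate in bulk.

```go
package calc

import "testing"

// Sum is a made-up function under test, standing in for real project code.
func Sum(nums ...int) int {
	total := 0
	for _, n := range nums {
		total += n
	}
	return total
}

// TestSum shows the table-driven boilerplate assistants produce in bulk:
// a slice of named cases, a loop, and one subtest per case.
func TestSum(t *testing.T) {
	cases := []struct {
		name string
		in   []int
		want int
	}{
		{"empty", nil, 0},
		{"single value", []int{5}, 5},
		{"several values", []int{1, 2, 3}, 6},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := Sum(tc.in...); got != tc.want {
				t.Errorf("Sum(%v) = %d, want %d", tc.in, got, tc.want)
			}
		})
	}
}
```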
"A lot of us came to programming to express our creativity. The puzzle-solving, the flow state, and the satisfaction of building something with our own hands."
I stopped using AI the moment I felt this slipping away, and then turned against AI entirely when I realized how blatantly it was stealing from all artists.
I'm not using the tools the same way. What works for me is to think of AI as a "continuation machine" - once there's a good base, AI can build the rest.
So, first, what works:
1. Create another page in the GUI if the app already has multiple pages and good abstractions.
2. Connect to another backend endpoint.
3. Add a new event type to a system with slightly different UI.
4. Add new tests based on existing code, tests and fixtures.
5. Analyze code and suggest edge cases.
6. Scan the project to find a specific flow, or the possible causes of a bug in that flow.
One debugging example that saved me hours was when AI found I had the wrong SameSite cookie policy. Another good catch was a missing ORDER BY clause in an SQL query. These are small things, but they can be useful in larger projects.
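To make those two catches concrete, here is a rough sketch of both fixes, written in Go purely for illustration; the handler, cookie name, table, and query are all invented, not the actual code in question.

```go
package main

import (
	"database/sql"
	"net/http"
)

// setSessionCookie illustrates the SameSite catch: a session cookie that has
// to survive cross-site redirects (e.g. an OAuth callback) silently breaks
// under SameSite=Strict.
func setSessionCookie(w http.ResponseWriter, token string) {
	http.SetCookie(w, &http.Cookie{
		Name:     "session",
		Value:    token,
		Path:     "/",
		Secure:   true,
		HttpOnly: true,
		SameSite: http.SameSiteLaxMode, // was SameSiteStrictMode, so the cookie never arrived on redirects
	})
}

// latestEvents illustrates the ORDER BY catch: without it the database is
// free to return rows in any order, which only fails intermittently.
func latestEvents(db *sql.DB, limit int) (*sql.Rows, error) {
	return db.Query(
		"SELECT id, payload FROM events ORDER BY created_at DESC LIMIT $1", // ORDER BY was missing
		limit,
	)
}
```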
Also, when working with dynamically typed languages like Ruby or JavaScript, AI doesn't make typos and makes almost no type or name mistakes.
What doesn't work:
1. AI doesn't think, so refactoring in general works terribly or not at all.
2. AI creates terrible tests because it doesn't understand the actual use case or the edge cases. It will test too much, or not the right things.
3. AI is a terrible architect. It is too influenced by the existing code.
In general, use AI as an advanced copy-paster and you'll find you can get work done much faster. Let it "think" for you, and you'll just spend your life cleaning up senseless code.
I agree that the productivity gains are inflated for the real case of production code, as opposed to vibe coding. To begin with, developers do not spend 100% of their time on the mechanical task of typing. Whatever gains we can get apply to the portion of the development effort that is actually code generation.
I do not want to get into the comparison with a junior dev, but I would share my view that investing time in the requirements is actually a positive thing, if done properly.
I subscribe to "first solve the problem, then write the code", and spending some time doing exactly that helps you understand, with an LLM or not, what path to take.
As long as you do that in a continuous and incremental fashion, as you would in any "agile" approach, the risk of big design up front or a Rational Rose/waterfall process diminishes.
If I were to guess, I would say the factor for me is more like 1.25x than 10x, but still positive. At least for business applications; I can't speak to using it for developing frameworks or other kinds of software.
Things that _helped_ me:
- having my coding / architecture standards documented in .md instruction files (what libraries to use, my definitions of entities, value objects, security, infrastructure, etc.); see the sketch at the end of this comment
- using a spec / PRD
- asking it to do small iterations and update an execution plan
Not perfect, and sometimes infuriating when something simple goes wrong, but mostly positive.
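For the first point, here is a minimal sketch of what such an instruction file could look like; every rule below is an invented placeholder to show the shape, not a recommendation:

```markdown
# Coding / architecture standards (read before generating code)

- Prefer the standard library; do not add new dependencies without asking first.
- Entities have identity and are persisted; value objects are immutable and compared by value.
- Every endpoint requires authentication; never log tokens or personal data.
- Infrastructure code lives under /infrastructure; domain code must not import it.
- Work in small iterations: update PLAN.md after each step and wait for review before continuing.
```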