My First Experiences with Coding Agents
- Dom

I actually don’t remember when I signed up for ChatGPT (free) or Claude, but I still remember when I started to use coding agents.

August 2025

I had just changed departments and joined a new team whose very first task was to try out coding agents to build a completely new service from scratch. This was exciting for me. Until then, I had used ChatGPT as a better, personalized Google Search and GitHub Copilot in IntelliJ as a better autocomplete.

For me, the idea of an AI tool creating a whole new service was unfamiliar and strange. I don’t remember exactly which emotions I had back then, but I imagine I also felt a little frustrated and scared that coding agents might be about to replace human software engineers.

The problem we wanted to solve

The story was about a new service that manages numbers. Sounds trivial, right? It had a few constraints: numbers are year-, tenant- and user-bound, meaning that a user can’t declare the same number twice for a given year. Users should also be able to declare numbers freely within a certain range, and there should be a feature to generate the next n consecutive numbers. When a fragmented range is exhausted, the service should roll over and return the next available numbers, which are not necessarily consecutive.
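To make the uniqueness constraint concrete, here is a minimal in-memory sketch in Python. It is not our service code (that was Kotlin with a real database); the class name `NumberRegistry`, its methods, and the range bounds are hypothetical and exist only for illustration.

```python
class NumberRegistry:
    """Hypothetical in-memory stand-in for the number service."""

    def __init__(self, low, high):
        # Inclusive bounds of the range users may declare numbers in (assumed).
        self.low = low
        self.high = high
        # Maps (tenant, user, year) to the set of numbers declared under that key.
        self._declared = {}

    def declare(self, tenant, user, year, number):
        """Declare a number; reject out-of-range values and duplicates per key."""
        if not self.low <= number <= self.high:
            raise ValueError("number outside the allowed range")
        numbers = self._declared.setdefault((tenant, user, year), set())
        if number in numbers:
            raise ValueError("number already declared for this year")
        numbers.add(number)

    def declared(self, tenant, user, year):
        """Return the declared numbers for one key in ascending order."""
        return sorted(self._declared.get((tenant, user, year), set()))
```

The same number can be declared again in a different year or by a different user, because the key includes all three parts.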

The procedure

I joined the team after they had already created the scaffold manually (AWS CDK based infrastructure, an empty Spring Boot service in Kotlin, logging and authentication). A colleague of mine, let’s call him Gabriel, was very advanced with AI tools and took the lead. He refined the user story with ChatGPT until he had a proposal, fed that proposal to Claude Code (with Bedrock) and had it build the service in one shot.

After that, the team spent days reviewing the code and improving it manually. We also did a lot of manual testing, because we didn’t trust the implementation. And we were right: we identified a conceptual issue that degrades performance as the database fills up.

The system kept track of the numbers already declared and calculated the free ranges by querying them. The more numbers were declared, the more items the database query returned and the worse the performance became.
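The idea behind that first design can be sketched roughly like this. It is a simplified Python illustration, not the generated Kotlin code; the function name `free_ranges` and the list argument standing in for the database query result are assumptions.

```python
def free_ranges(declared, low, high):
    """Derive the free inclusive ranges in [low, high] from the declared numbers.

    `declared` stands in for the result of the database query, so the work
    done here grows with the number of declared numbers.
    """
    ranges = []
    start = low  # first number that might still be free
    for number in sorted(declared):
        if number < low or number > high:
            continue  # ignore values outside the allowed range
        if number > start:
            ranges.append((start, number - 1))  # gap before this declared number
        start = max(start, number + 1)
    if start <= high:
        ranges.append((start, high))  # free tail after the last declared number
    return ranges
```

Every call has to load and walk through all declared numbers, which is why this approach slows down as more numbers are declared.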

Manual refactoring

The next couple of days were dedicated to refactoring the AI-generated solution. We tried to fix the performance issue with some tweaks, but couldn’t resolve it. Calculating free gaps from the declared numbers was simply a bad design choice.

AI to the rescue

After we had finally given up on improving the performance through refactoring, we decided to create a completely new concept and let the AI agent implement the logic again. This time, we came up with an approach that stores the free ranges alongside the declared numbers. The ranges are used to calculate the next n free numbers, and it was no surprise to us that this time the performance stayed stable as the database grew.
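A rough Python sketch of that second idea, under the assumption that the free ranges are kept as a sorted list of inclusive (start, end) pairs. The function names `take_next` and `declare` are hypothetical, and the real implementation persisted the ranges in the database rather than in a list.

```python
def take_next(free, n):
    """Take the next n free numbers from sorted inclusive (start, end) ranges.

    Returns (numbers, remaining_ranges). When one range is exhausted, the
    result continues in the next range, so it may not be consecutive.
    """
    taken = []
    remaining = []
    for start, end in free:
        need = n - len(taken)
        if need == 0:
            remaining.append((start, end))  # untouched range
            continue
        size = end - start + 1
        count = min(need, size)
        taken.extend(range(start, start + count))
        if count < size:
            remaining.append((start + count, end))  # shrink the partly used range
    if len(taken) < n:
        raise ValueError("not enough free numbers")
    return taken, remaining


def declare(free, number):
    """Remove one freely chosen number by splitting the range that contains it."""
    for i, (start, end) in enumerate(free):
        if start <= number <= end:
            pieces = []
            if start < number:
                pieces.append((start, number - 1))
            if number < end:
                pieces.append((number + 1, end))
            return free[:i] + pieces + free[i + 1:]
    raise ValueError("number is not free")
```

In this sketch the work depends on the number of free ranges touched, not on how many numbers have been declared so far.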

It was definitely our fault to rely blindly on the agent’s design decisions. I think it happened because we didn’t tell the agent about non-functional requirements and performance constraints when we created the implementation proposal. Maybe the limitations of the model at that time were a factor, too.

I learned from that experiment that you really need to be careful and can’t trust AI-generated code blindly. I also came to believe that, with the code review approach and toolset I had at the time, AI-generated code is quite hard to review.

Still in August 2025

Preoccupied with the issues we had when reviewing the AI-generated code, I used a Tech Day¹ to start building my own code review tool, tailored to reviewing AI-generated code. The main idea of the tool was to have a single place where your review comments are stored. After a review was done, these comments could be exported as an input prompt for a coding agent.

I used a vibe-coding approach. I pretended to care about code and code quality, but in reality I didn’t. Over time, feature creep set in, and the code was not maintainable anymore. The project ended up, like all of my projects, in the graveyard of unfinished and unreleased projects.

I learned again, the hard way, that you really need to understand the code and the architecture and must improve them constantly. Otherwise the code degenerates over time and becomes unmaintainable.

November/December 2025

I don’t know exactly why I picked this date, but I remember that around that time I rarely wrote code myself. I was now using Codex with OpenAI models to write my code. This went on for a couple of weeks until my colleagues learned about it. Some of them were really surprised, while others (Gabriel) confirmed that they basically did the same.

This was the starting point for me to think about a team-wide AI strategy. In follow-up episodes, I’ll show you our engineering process before AI and explain in detail how we started to use AI across the team for a variety of tasks.

Footnotes

  1. A Tech Day takes place once a month and gives engineers time to spend on learning.