Can We Skip TDD with Modern AI? A Context Experiment
The Hook
Recently, some colleagues pitched me an idea: “LLMs today are so powerful that you can start directly from the implementation and it will work well. No need for TDD or other, more complicated XP techniques.”
It is a tempting thought. If an AI can generate a complete feature in seconds, is my approach of always starting from a test still relevant?
I decided to find out. I ran an experiment to see whether I could implement a complex feature by describing the task and letting GenAI create the application. My hypothesis was that TDD is still vital, but I wanted to see if the “Just Do It” method could prove me wrong.
The result? I confirmed exactly what I expected: TDD is one of the best ways to create context for an LLM.
Personal Context & Tools
For this experiment, I returned to a project I started in a previous article: “Does AI Need Clear Goals? My Experiment in Turning Vague Ideas into Code”.
My tool of choice was GPT-4.1 (via GitHub Copilot), utilizing its Agent mode to handle multi-file context. Usually, I treat the AI as a pair programmer, following structured collaboration methods I’ve discussed in “Pair-Authoring with an AI: A Case Study in Structured Collaboration”.
For this session, however, I acted as a “manager”: I gave requirements and approved plans, but explicitly skipped the “Red” phase of TDD. I let the AI write the code first.
The Failed Experiment
The task was Story #2346: Implement a “Day of Week Pricing Plan”. The requirements were clear: users needed to compare power usage costs based on the day of the week and rank price plans accordingly.
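To make the requirement concrete, here is a minimal sketch of the shape of behaviour the story asks for. The interface and method names are my own illustration, not the project’s actual API.

import java.math.BigDecimal;
import java.time.DayOfWeek;
import java.util.Map;

// Illustrative shape only: for a given smart meter and day of the week,
// return each price plan's projected cost so the plans can be ranked.
public interface DayOfWeekPricePlanComparator {

    Map<String, BigDecimal> costPerPlanForDay(String smartMeterId, DayOfWeek dayOfWeek);
}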
I approved the AI’s plan and let it generate the implementation. Here is where the “No TDD” approach started to show its cracks.
1. The “Ghost Method” Problem
After the AI implemented the service layer, my IDE lit up with errors. The AI used a method getDayOfWeekMultiplier(DayOfWeek) that didn’t exist. It “hallucinated” a method on the domain object because it was writing the service in isolation. I am usually fine with “Red” code, but this wasn’t TDD “Red”; this was just broken code requiring immediate fixes.
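Fixing it meant adding the method the AI had imagined onto the domain object after the fact. Here is a minimal sketch of what that could look like; the class and field names are assumptions for illustration, not the project’s actual code.

import java.math.BigDecimal;
import java.time.DayOfWeek;
import java.util.Map;

// Sketch of a price plan holding per-day multipliers (names are illustrative).
public class PricePlanSketch {

    private final Map<DayOfWeek, BigDecimal> dayOfWeekMultipliers;

    public PricePlanSketch(Map<DayOfWeek, BigDecimal> dayOfWeekMultipliers) {
        this.dayOfWeekMultipliers = dayOfWeekMultipliers;
    }

    // The method the AI "hallucinated" and we later had to add for real.
    public BigDecimal getDayOfWeekMultiplier(DayOfWeek dayOfWeek) {
        // Fall back to a neutral multiplier when the plan defines nothing for that day.
        return dayOfWeekMultipliers.getOrDefault(dayOfWeek, BigDecimal.ONE);
    }
}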
2. The Regression Nightmare
When we fixed the missing method, we broke the existing logic.
PricePlanTest > shouldReceiveMultipleExceptionalDateTimes() FAILED
Because we implemented the new logic on top of the old logic without a guiding test, the AI introduced regressions. It took several iterations just to get back to a working baseline.
3. The Context Disconnect
The real struggle happened during functional testing. I asked the AI to verify the endpoints. It generated a test that tried to hit the API, but the call returned a 404 Not Found. Why? The AI created a test that queried a Smart Meter ID, but it didn’t have the context. It forgot that in this application, a Smart Meter must be linked to a Price Plan via the AccountService first. The AI tried to guess the solution, attempting to call an API /account/link/{smart-metter-id} that didn’t even exist.
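For comparison, a test written first would have had to make that precondition explicit before touching the endpoint. Here is a rough sketch, assuming a Spring Boot application exercised through MockMvc; the endpoint path, meter id, and class names are illustrative, not the project’s actual values.

import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.get;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.status;

import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.autoconfigure.web.servlet.AutoConfigureMockMvc;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.web.servlet.MockMvc;

@SpringBootTest
@AutoConfigureMockMvc
class DayOfWeekComparisonFunctionalTest {

    @Autowired
    private MockMvc mockMvc;

    @Test
    void ranksPricePlansForAMeterThatIsLinkedToAPlan() throws Exception {
        // Precondition the AI kept missing: this meter must already be associated
        // with a price plan (seeded via the AccountService test data); otherwise
        // the comparison endpoint responds with 404 Not Found, exactly as we saw.
        String smartMeterId = "smart-meter-0";

        mockMvc.perform(get("/price-plans/compare-all/" + smartMeterId))
               .andExpect(status().isOk());
    }
}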
Principles That Actually Work
I eventually finished the task without TDD, but it required multiple rollbacks and context corrections. Through this struggle, I confirmed why TDD works:
Principle 1: Tests Are Context Anchors
The reason the AI failed the functional test setup was a lack of context. If I had written the test first, I would have been forced to set up the AccountService association immediately. The failing test provides the AI with a strict “Context Window” of what is required, as I explored in “The Context Window Paradox”.
Principle 2: Small Steps Prevent “Imagination”
When the AI doesn’t have enough context, it tries to imagine the answer. TDD forces small, verifiable steps. By skipping the test, I forced the AI to generate a large chunk of logic (Controller + Service) at once, increasing the surface area for hallucinations.
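In this story, a small-step “Red” could have been as little as one unit test pinning down the multiplier behaviour before any service or controller code existed. A sketch with AssertJ, reusing the illustrative PricePlanSketch class from earlier; none of these names come from the real project.

import static org.assertj.core.api.Assertions.assertThat;

import java.math.BigDecimal;
import java.time.DayOfWeek;
import java.util.Map;

import org.junit.jupiter.api.Test;

// A deliberately tiny first step: pin down one behaviour, let the AI implement
// just enough to make it pass, then move on to the next small step.
class DayOfWeekMultiplierTest {

    @Test
    void usesTheConfiguredMultiplierForThatDayAndDefaultsToOneOtherwise() {
        PricePlanSketch plan = new PricePlanSketch(Map.of(DayOfWeek.SATURDAY, new BigDecimal("0.5")));

        assertThat(plan.getDayOfWeekMultiplier(DayOfWeek.SATURDAY)).isEqualByComparingTo("0.5");
        assertThat(plan.getDayOfWeekMultiplier(DayOfWeek.MONDAY)).isEqualByComparingTo("1");
    }
}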
Unexpected Discovery
The most painful part of skipping TDD wasn’t the coding—it was the debugging.
When I finally added tests after the implementation to verify the logic, one failed with a confusing error:
Expecting actual: {FRIDAY=[...]} to contain key: MONDAY
This revealed a critical weakness of the “Test After” approach. When a test fails, you don’t know where the problem is: in the tests or in the business logic. It turned out to be an error in the test data (the date provided was a Friday, not a Monday). If I had written the test first, the AI would have generated the implementation based on that test data, and we wouldn’t have had this problem at all.
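One cheap way to make such fixtures self-verifying is to assert the day of the week on the test data itself, so a Friday masquerading as a Monday fails loudly at setup rather than deep inside a business assertion. A sketch with an illustrative date (2024-01-01 really is a Monday):

import static org.assertj.core.api.Assertions.assertThat;

import java.time.DayOfWeek;
import java.time.LocalDateTime;

import org.junit.jupiter.api.Test;

class TestDataSanityCheck {

    @Test
    void readingTimestampReallyFallsOnAMonday() {
        // Guard the fixture itself: this assertion documents the intended day
        // of the week instead of leaving it implicit in the raw date.
        LocalDateTime readingTime = LocalDateTime.of(2024, 1, 1, 10, 0);

        assertThat(readingTime.getDayOfWeek()).isEqualTo(DayOfWeek.MONDAY);
    }
}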
The Central Paradox
We tend to think that as AI gets smarter, we can think less. I touched on this in “Can We Think Less with AI?”.
But this experiment confirmed a paradox: To move faster with AI, you must slow down enough to write the test.
Can we avoid these loops of small context errors? Yes. TDD reduces complexity and creates trust between us and the AI. The test acts as a contract. Without it, you are just hoping the AI guesses your architectural constraints correctly.
Forward-Looking Conclusion
So, can we skip TDD? Yes, but you will spend more time adding the missing context manually.
The power of TDD is approaching a new peak in the AI era: tests create a POWERFUL CONTEXT for LLMs. Modern models like GPT-4.1 are powerful, but a better LLM does not remove context from the equation.
If you want to get the most out of your AI teammate, don’t just ask it to write code. Give it a failing test.

