<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Nik Malykhin]]></title><description><![CDATA[Reflections on platform engineering, developer experience, and the craft of modern software — plus the occasional analog side quest]]></description><link>https://www.nikmalykhin.com</link><image><url>https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png</url><title>Nik Malykhin</title><link>https://www.nikmalykhin.com</link></image><generator>Substack</generator><lastBuildDate>Fri, 10 Apr 2026 20:24:27 GMT</lastBuildDate><atom:link href="https://www.nikmalykhin.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Nik Malykhin]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[nik1379616@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[nik1379616@substack.com]]></itunes:email><itunes:name><![CDATA[Nik]]></itunes:name></itunes:owner><itunes:author><![CDATA[Nik]]></itunes:author><googleplay:owner><![CDATA[nik1379616@substack.com]]></googleplay:owner><googleplay:email><![CDATA[nik1379616@substack.com]]></googleplay:email><googleplay:author><![CDATA[Nik]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[𝗧𝗵𝗲 𝗣𝗿𝗮𝗴𝗺𝗮𝘁𝗶𝗰 𝗛𝗲𝘅𝗮𝗴𝗼𝗻: 𝗦𝗰𝗮𝗹𝗶𝗻𝗴 𝗗𝗲𝗰𝗼𝘂𝗽𝗹𝗶𝗻𝗴 𝘄𝗶𝘁𝗵𝗼𝘂𝘁 𝗖𝗼𝗺𝗽𝗹𝗲𝘅𝗶𝘁𝘆]]></title><description><![CDATA[&#120295;&#120309;&#120306; &#120295;&#120306;&#120315;&#120320;&#120310;&#120316;&#120315; &#120316;&#120315; &#120321;&#120309;&#120306; 
&#120295;&#120319;&#120302;&#120310;&#120313;]]></description><link>https://www.nikmalykhin.com/p/pragmatic-hexagon</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/pragmatic-hexagon</guid><pubDate>Tue, 24 Mar 2026 13:21:45 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3><strong>&#120295;&#120309;&#120306; &#120295;&#120306;&#120315;&#120320;&#120310;&#120316;&#120315; &#120316;&#120315; &#120321;&#120309;&#120306; &#120295;&#120319;&#120302;&#120310;&#120313;</strong></h3><p>In a professional kitchen, there is a concept called <em>mise en place</em>&#8212;everything in its place. You don&#8217;t start searing the scallops until every herb is chopped and every sauce is whisked. If you skip the prep to &#8220;save time,&#8221; you end up adjusting the recipe mid-saut&#233;, usually resulting in a frantic mess, ruined ingredients, and a dish that takes twice as long to serve.</p><p>Modern software development has a similar &#8220;popular choice&#8221;: start coding the logic immediately to show &#8220;progress.&#8221; But when we skip the architectural prep&#8212;the interfaces and boundaries&#8212;we aren&#8217;t moving fast; we are just building a kitchen we&#8217;ll have to tear down while the customers are waiting. I&#8217;ve watched engineers lose sight of the goal in the pursuit of a &#8220;perfect flow&#8221; that wasn&#8217;t grounded in discipline. 
If everyone says they want &#8220;clean code,&#8221; why does the system feel like it&#8217;s fighting us the moment we add a new story?</p><h3><strong>&#120294;&#120326;&#120320;&#120321;&#120306;&#120314; &#120282;&#120306;&#120316;&#120314;&#120306;&#120321;&#120319;&#120326;</strong></h3><p>The environment of this experiment is a standard <strong>Kotlin and Spring Boot</strong> stack. The landscape is defined by three distinct zones designed to minimize the &#8220;weight&#8221; of dependencies. To navigate this space, we use a rigid directory structure that acts as our map:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;78781d66-8266-4010-b6a8-432cfa8a8d42&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">app

&#9500;&#9472;&#9472; domain      &lt;-- THE HEART (POKOs only)
&#9474;   &#9500;&#9472;&#9472; model
&#9474;   &#9474;   &#9492;&#9472;&#9472; Data.kt     &lt;-- Pure Kotlin Data Class
&#9474;   &#9492;&#9472;&#9472; ports
&#9474;       &#9492;&#9472;&#9472; outgoing    &lt;-- Interfaces defining &#8220;What&#8221; we need
&#9474;           &#9500;&#9472;&#9472; DataPersistencePort.kt    &lt;-- SQL db
&#9474;           &#9492;&#9472;&#9472; DataStoragePort.kt        &lt;-- Object storage
&#9500;&#9472;&#9472; usecases    &lt;-- THE ORCHESTRATOR
&#9474;   &#9492;&#9472;&#9472; StoreDataUseCase.kt    &lt;-- Feature logic
&#9492;&#9472;&#9472; adapter     &lt;-- THE &#8220;HOW&#8221; (Infrastructure)
    &#9500;&#9472;&#9472; web         &lt;-- Inbound Adapter
    &#9474;   &#9500;&#9472;&#9472; DataController.kt
    &#9474;   &#9492;&#9472;&#9472; dto         &lt;-- Request/Response DTOs
    &#9474;       &#9492;&#9472;&#9472; WebMapper.kt    &lt;-- DTO &lt;-&gt; Domain mapping
    &#9500;&#9472;&#9472; sqldb       &lt;-- Outbound Adapter
    &#9474;   &#9500;&#9472;&#9472; entity
    &#9474;   &#9474;   &#9492;&#9472;&#9472; DataJpaEntity.kt    &lt;-- @Entity + JPA annotation
    &#9474;   &#9500;&#9472;&#9472; DataRepository.kt        &lt;-- Spring Data/CrudRepository
    &#9474;   &#9500;&#9472;&#9472; PersistenceMapper.kt     &lt;-- Entity &lt;-&gt; Domain mapping
    &#9474;   &#9492;&#9472;&#9472; PersistenceAdapter.kt    &lt;-- Impl DataPersistencePort
    &#9492;&#9472;&#9472; cloud       &lt;-- Outbound Adapter
        &#9492;&#9472;&#9472; ObjectStorageAdapter.kt</code></pre></div><p>&#10148; <strong>The Heart (Domain):</strong> Pure Kotlin Data Classes and business logic common to all usecases.</p><p>&#10148; <strong>The Orchestrator (Usecases):</strong> Where feature-specific logic lives and adapters are coordinated.</p><p>&#10148; <strong>The Infrastructure (Adapters):</strong> The &#8220;How&#8221; of the system&#8212;web controllers, JPA entities, and cloud storage clients.</p><p>The invisible boundary here is the <strong>Port</strong>. It&#8217;s an interface that defines &#8220;what&#8221; we need without caring &#8220;how&#8221; it&#8217;s done. In theory, this geometry should be light and flexible, yet many teams find it rigid because they misunderstand the direction of the signal.</p><h3><strong>&#120280;&#120314;&#120317;&#120310;&#120319;&#120310;&#120304;&#120302;&#120313; &#120280;&#120325;&#120317;&#120313;&#120316;&#120319;&#120302;&#120321;&#120310;&#120316;&#120315;</strong></h3><p>I moved from the &#8220;theoretical path&#8221; of perfect architecture to the &#8220;actual terrain&#8221; of daily PRs. The system showed its breaking point not in a crash, but in a silent failure of discipline: the <strong>Domain Import Leak</strong>.</p><p>&#10148; <strong>The Breaking Point:</strong> It usually starts when an engineer adds a domain service that directly imports an adapter: <code>import app.adapter.NewAdapter</code>.</p><p>&#10148; <strong>The Silent Failure:</strong> The code still passes tests. It still &#8220;works.&#8221; But the &#8220;Pure Domain&#8221; has been poisoned by infrastructure concerns.</p><p>&#10148; <strong>The Result:</strong> When the time inevitably comes to move that service to a usecase, the system reacts with extreme fatigue. 
We end up with PRs requiring the renaming of tens of files, leading to typos, package mismatches, and a massive mental load on reviewers.</p><h3><strong>&#120288;&#120302;&#120315;&#120302;&#120308;&#120310;&#120315;&#120308; &#120321;&#120309;&#120306; &#120294;&#120310;&#120308;&#120315;&#120302;&#120313;</strong></h3><p>The handoff between layers is where the &#8220;spaghetti&#8221; starts or ends. In my exploration, I found that the clarity of intent is often lost because teams are afraid of the &#8220;complexity&#8221; of an extra interface.</p><p>&#10148; <strong>Cognitive Load:</strong> Trying to refactor architecture in the middle of a feature story creates a &#8220;refactoring nightmare&#8221;.</p><p>&#10148; <strong>Signal-to-Noise:</strong> If you are 100% sure a logic block belongs in the domain, put it there. If not, the &#8220;cleaner&#8221; signal is to start in a <strong>Usecase</strong> and extract downward only when the need is proven.</p><p>&#10148; <strong>Direct Translation:</strong> To keep the signal clear, I&#8217;ve found it&#8217;s even acceptable to call a Port directly from a controller for simple cases. This avoids 1:1 &#8220;pass-through&#8221; mapping while keeping the adapter decoupled through the interface.</p><h3><strong>&#120298;&#120309;&#120302;&#120321; &#120280;&#120302;&#120319;&#120315;&#120306;&#120305; &#120295;&#120319;&#120322;&#120320;&#120321;?</strong></h3><p>After the stress test of &#8220;no time to decouple,&#8221; one principle remained standing: <strong>Mandatory Ports from the Start</strong>.</p><p>&#10148; <strong>Stability:</strong> The &#8220;price&#8221; of an interface at the start is effectively zero. It provides an immediate boundary that prevents the &#8220;import leak&#8221; and allows the domain to remain pure. 
&#10148; <strong>The New Baseline:</strong> My trusted navigation strategy is now <strong>TDD-driven Hexagon</strong>.</p><p>&#8226; <strong>Step 1:</strong> Define the Domain Model.</p><p>&#8226; <strong>Step 2:</strong> Build the Adapter and verify it with <strong>Testcontainers</strong> (SQL or Object Storage).</p><p>&#8226; <strong>Step 3:</strong> Finally, orchestrate it all in the Usecase or Controller using the Port interface.</p><h3><strong>&#120276;&#120304;&#120321;&#120310;&#120316;&#120315;&#120302;&#120303;&#120313;&#120306; &#120284;&#120315;&#120320;&#120310;&#120308;&#120309;&#120321;&#120320;</strong></h3><p>&#10148; <strong>Backlog (Failed the Stress Test):</strong></p><p>&#8226; &#8220;Refactoring-in-the-middle&#8221;: Changing architecture while delivering a story leads to mess and typos.</p><p>&#8226; Direct Adapter Imports: Any import app.adapter inside app.domain is a bug, not a feature.</p><p>&#10148; <strong>Merged (Trusted Toolkit):</strong></p><p>&#8226; <strong>Ports First:</strong> Always create the interface for 3rd party services or repositories immediately.</p><p>&#8226; <strong>Adapter-First Testing:</strong> Use Testcontainers to prove your &#8220;How&#8221; works before you worry about the &#8220;What&#8221; in your orchestration.</p><p>&#8226; <strong>Minimum Layers:</strong> Only add a Usecase layer if there is actual orchestration; otherwise, call the Port from the Controller.</p><p><strong>Final Wisdom:</strong> Clean architecture isn&#8217;t about having the most layers; it&#8217;s about having the most resilient boundaries. 
The &#8220;price&#8221; of an interface is nothing compared to the cost of a messy PR that no one wants to review.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe to get more practical guides on using GenAI tools effectively in software development work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[The 24-Inch Migration: Onboarding a 5-Year-Old to New Hardware]]></title><description><![CDATA[Learn how to apply software engineering principles&#8212;like look-ahead buffers and integration testing&#8212;to manage complex hardware migrations. 
Discover how to transition a junior rider to a new platform while protecting the Developer Experience (DX) and fostering long-term system ownership.]]></description><link>https://www.nikmalykhin.com/p/the-24-inch-migration-onboarding</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/the-24-inch-migration-onboarding</guid><pubDate>Tue, 17 Mar 2026 11:03:03 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!h0kn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97d9ca56-b3b9-4b46-8a6c-4a109979f92a_4080x3072.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the world of software, we often talk about &#8220;breaking changes.&#8221; You upgrade a core library, and suddenly the interfaces you relied on are deprecated, the latency spikes, and the system becomes unpredictable. Last week, I attempted a major version upgrade on my 5-year-old son&#8217;s primary transport layer: we moved from a 16-inch &#8220;legacy&#8221; bike to a <strong>Specialized Hotrock 24</strong>.</p><p>Physically, he was ready. He&#8217;s tall for his age, and the metrics suggested he could handle the 24-inch wheels. But as any Tech Lead knows, just because the hardware supports the requirements doesn&#8217;t mean the operator is ready to push to production.</p><h2>The System Architecture: Specialized Hotrock 24</h2><p>In this migration, the hardware selection was about finding the right <strong>Long Term Support (LTS)</strong> release. We skipped the 20-inch version entirely; in our roadmap, a 20-inch bike was a short-term patch that would only serve us for a year or two before hitting its end-of-life.</p><p>We went straight for the 24-inch platform as our LTS. 
To make this high-performance hardware compatible with a 5-year-old&#8217;s geometry, I chose the Hotrock for its low-slung frame&#8212;think of it as a <strong>compatibility layer</strong> or a &#8220;shim&#8221; that allows a smaller user to interface with a much larger system architecture.</p><h2>The Debugging Phase: Staging Environment (Weekend 1)</h2><p>We didn&#8217;t head straight for the trails. That would be like deploying a refactored monolith to 100% of users without a staging environment. We set up a 3x3 meter &#8220;Sandbox&#8221; in a parking lot to run our first integration tests.</p><h3>1. The Look-Ahead Buffer (The Square)</h3><p>The first bug we encountered was <strong>Visual Latency</strong>. He was looking at his front wheel&#8212;the equivalent of a system only processing the data packet currently in the buffer.</p><p><strong>The Fix:</strong> I implemented a new algorithm. <em>Start at Cone 1, look at Cone 2. When the front wheel enters the zone between 1 and 2, immediately point the sensors (eyes) toward Cone 3.</em> We were teaching him to process future state while executing current operations.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!h0kn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97d9ca56-b3b9-4b46-8a6c-4a109979f92a_4080x3072.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!h0kn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97d9ca56-b3b9-4b46-8a6c-4a109979f92a_4080x3072.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!h0kn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97d9ca56-b3b9-4b46-8a6c-4a109979f92a_4080x3072.jpeg 848w, https://substackcdn.com/image/fetch/$s_!h0kn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97d9ca56-b3b9-4b46-8a6c-4a109979f92a_4080x3072.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!h0kn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97d9ca56-b3b9-4b46-8a6c-4a109979f92a_4080x3072.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!h0kn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97d9ca56-b3b9-4b46-8a6c-4a109979f92a_4080x3072.jpeg" width="1456" height="1096" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/97d9ca56-b3b9-4b46-8a6c-4a109979f92a_4080x3072.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1096,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5694022,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nikmalykhin.com/i/191237160?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97d9ca56-b3b9-4b46-8a6c-4a109979f92a_4080x3072.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!h0kn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97d9ca56-b3b9-4b46-8a6c-4a109979f92a_4080x3072.jpeg 424w, https://substackcdn.com/image/fetch/$s_!h0kn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97d9ca56-b3b9-4b46-8a6c-4a109979f92a_4080x3072.jpeg 848w, https://substackcdn.com/image/fetch/$s_!h0kn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97d9ca56-b3b9-4b46-8a6c-4a109979f92a_4080x3072.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!h0kn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97d9ca56-b3b9-4b46-8a6c-4a109979f92a_4080x3072.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>2. The I/O Interrupt v1.0 (Stop-on-Line)</h2><p>We tested the &#8220;Stop&#8221; command with a simple line. At this stage, we kept the requirements low: just execute a <code>HALT</code> command exactly on the line. He passed this test without issues&#8212;the braking interface was working, even if it was still a bit binary.</p><h2>Scaling the System (Weekend 2)</h2><p>Once the basic &#8220;Look-Ahead&#8221; logic was cached, we increased the complexity of our tests.</p><h3>2.1 The I/O Interrupt v1.1 (The &#8220;No-Touch&#8221; Constraint)</h3><p>We refactored the stop-and-go drill. Now, he had to stop on the line and then resume driving <em>without</em> touching the floor. 
This was about refining balance and power delivery&#8212;moving from a simple halt to a complex state transition.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hCuS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e744fb6-a17b-4f74-99d6-acd5107c51b1_3072x4080.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hCuS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e744fb6-a17b-4f74-99d6-acd5107c51b1_3072x4080.jpeg 424w, https://substackcdn.com/image/fetch/$s_!hCuS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e744fb6-a17b-4f74-99d6-acd5107c51b1_3072x4080.jpeg 848w, https://substackcdn.com/image/fetch/$s_!hCuS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e744fb6-a17b-4f74-99d6-acd5107c51b1_3072x4080.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!hCuS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e744fb6-a17b-4f74-99d6-acd5107c51b1_3072x4080.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hCuS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e744fb6-a17b-4f74-99d6-acd5107c51b1_3072x4080.jpeg" width="1456" height="1934" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0e744fb6-a17b-4f74-99d6-acd5107c51b1_3072x4080.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1934,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4512976,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nikmalykhin.com/i/191237160?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e744fb6-a17b-4f74-99d6-acd5107c51b1_3072x4080.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hCuS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e744fb6-a17b-4f74-99d6-acd5107c51b1_3072x4080.jpeg 424w, https://substackcdn.com/image/fetch/$s_!hCuS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e744fb6-a17b-4f74-99d6-acd5107c51b1_3072x4080.jpeg 848w, https://substackcdn.com/image/fetch/$s_!hCuS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e744fb6-a17b-4f74-99d6-acd5107c51b1_3072x4080.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!hCuS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e744fb6-a17b-4f74-99d6-acd5107c51b1_3072x4080.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>3. The Slalom (Logic Branching)</h3><p>Finally, we introduced the Slalom. This was a true logic-branching exercise: navigating a sequence of four cones. It required high-frequency adjustments to his trajectory based on the &#8220;Look-Ahead&#8221; data he was now successfully processing.</p><h2>The &#8220;Merged PR&#8221;: Managing the Developer Experience (DX)</h2><p>The first weekend wasn&#8217;t a &#8220;success&#8221; by pure performance metrics. 
He failed several drills, the &#8220;build&#8221; felt shaky, and the cones remained largely un-navigated.</p><p>But here is the most important log entry: <strong>He didn&#8217;t get frustrated.</strong> In my day job, when a Junior Developer (or an AI agent like Jules) struggles with a new stack, the worst thing a Tech Lead can do is demand they stay until midnight to &#8220;fix the build.&#8221; That is how you accrue <strong>Human Technical Debt</strong>&#8212;you might get the code merged today, but you&#8217;ve poisoned the developer&#8217;s relationship with the codebase for tomorrow.</p><p>By applying a &#8220;Freedom of Decision&#8221; protocol and capping sessions at 15 minutes, we prioritized the <strong>Developer Experience</strong>. Because I didn&#8217;t push, he didn&#8217;t associate the new hardware with stress. We maintained a high &#8220;morale-to-output&#8221; ratio, ensuring he was excited to &#8220;reboot&#8221; the training the following weekend.</p><p><strong>The Feature:</strong> By the end of the second weekend, something clicked. It wasn&#8217;t about completing the drills perfectly&#8212;it was about the <em>feel</em>. The &#8220;Look-Ahead&#8221; algorithm was finally running in the background, and he started to feel comfortable on the new hardware.</p><h2>The Post-Deployment Cleanup: Ownership</h2><p>The real sign that the migration was a success came after the training was over. Without being asked, he started cleaning the bike himself.</p><p>In engineering, we call this <strong>Full-Cycle Ownership</strong>. It&#8217;s the moment a developer stops just writing code and starts caring about the health of the system they operate. Seeing a 5-year-old wipe down his own &#8220;hardware&#8221; after a successful sprint in the sandbox is the ultimate proof of engagement. 
He wasn&#8217;t just using the tool; he was owning it.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!940b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374b8953-b0fb-43b7-8456-6d931c84eb32_3072x4080.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!940b!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374b8953-b0fb-43b7-8456-6d931c84eb32_3072x4080.jpeg 424w, https://substackcdn.com/image/fetch/$s_!940b!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374b8953-b0fb-43b7-8456-6d931c84eb32_3072x4080.jpeg 848w, https://substackcdn.com/image/fetch/$s_!940b!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374b8953-b0fb-43b7-8456-6d931c84eb32_3072x4080.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!940b!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374b8953-b0fb-43b7-8456-6d931c84eb32_3072x4080.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!940b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374b8953-b0fb-43b7-8456-6d931c84eb32_3072x4080.jpeg" width="1456" height="1934" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/374b8953-b0fb-43b7-8456-6d931c84eb32_3072x4080.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1934,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3369257,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nikmalykhin.com/i/191237160?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374b8953-b0fb-43b7-8456-6d931c84eb32_3072x4080.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!940b!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374b8953-b0fb-43b7-8456-6d931c84eb32_3072x4080.jpeg 424w, https://substackcdn.com/image/fetch/$s_!940b!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374b8953-b0fb-43b7-8456-6d931c84eb32_3072x4080.jpeg 848w, https://substackcdn.com/image/fetch/$s_!940b!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374b8953-b0fb-43b7-8456-6d931c84eb32_3072x4080.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!940b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374b8953-b0fb-43b7-8456-6d931c84eb32_3072x4080.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h3>The Log:</h3><ul><li><p><strong>Hardware:</strong> Specialized Hotrock 24 (LTS Migration).</p></li><li><p><strong>Total Training Time:</strong> Two 15-minute sprints.</p></li><li><p><strong>Bugs Fixed:</strong> Visual Latency (Front-wheel staring).</p></li><li><p><strong>Post-Deployment:</strong> Automatic system maintenance (he cleaned the bike).</p></li><li><p><strong>Emotional ROI:</strong> High. 
The goal isn't to go fast on day one&#8212;it's to make sure that when we finally hit the trails, the pilot feels like the system belongs to him.</p></li></ul><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Refactoring life, one Side Quest at a time.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Lying Tests and the Silent Swallow: Hardening Legacy Java]]></title><description><![CDATA[Is your CI/CD pipeline telling you the truth, or is it just telling you what you want to hear?]]></description><link>https://www.nikmalykhin.com/p/lying-tests-and-the-silent-swallow</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/lying-tests-and-the-silent-swallow</guid><pubDate>Tue, 17 Mar 2026 08:00:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Is your CI/CD pipeline telling you the truth, or is it just telling you what you want to hear?</strong> </p><p>In many legacy projects, the build is &#8220;Green,&#8221; the tests pass, and the console shows no errors. Yet, the moment the application hits production, it fails. 
The culprit is often a &#8220;Lying Test&#8221;&#8212;a suite that passes not because the code works, but because the errors have been carefully hidden, logged to a void, or suppressed by a generic catch-all block.</p><p>How do you turn a &#8220;politely silent&#8221; codebase into one that fails loudly enough to be fixed?</p><h3>The &#8216;Before&#8217; State: Setting the Context</h3><p>In older Java applications (circa 2005), error handling was often synonymous with <code>e.printStackTrace()</code>. Developers used manual <code>main()</code> methods or early JUnit versions to &#8220;test&#8221; logic. When an exception occurred, the instinct was to keep the process running at all costs.</p><p>The &#8220;old way&#8221; of testing often looked like this:</p><ul><li><p><strong>The Silent Swallow:</strong> Generic <code>catch (Exception e)</code> blocks that log a message but do not rethrow or signal failure.</p></li><li><p><strong>Exit Code 0:</strong> Build scripts (Ant) that encounter a runtime error but still report a successful exit code, tricking the developer into thinking everything is fine.</p></li><li><p><strong>Manual Verification:</strong> Tests that require a human to read the console output to see if it &#8220;looks right,&#8221; rather than asserting a specific outcome.</p></li></ul><h3>Introducing the Core Concept: Honest Testing</h3><p><strong>Honest Testing</strong> is the process of stripping away the &#8220;safety blankets&#8221; of legacy error handling to force the application to <strong>Crash Loudly.</strong></p><p><strong>What is it?</strong> It is a &#8220;Hardening Phase&#8221; where you replace swallowed exceptions with meaningful failures and migrate manual checks to automated assertions.</p><p><strong>Why does it matter?</strong> You cannot refactor code you do not understand. If your tests are lying to you about the state of the system, any &#8220;improvement&#8221; you make is just a guess. 
Making the build <strong>RED</strong> is the first step toward making it truly <strong>GREEN.</strong></p><h3>Practical Applications &amp; Use Cases</h3><h4>Use Case A: Exposing the Silent Swallow</h4><p>The most common anti-pattern in legacy Java is the &#8220;Log and Forget&#8221; block. We must convert these into loud failures during the testing phase.</p><pre><code><code>// BEFORE: The Lying Code
public void storeData() {
    try {
        // critical logic
    } catch (Exception e) {
        System.out.println("Error happened, but let's keep going!");
    }
}

// AFTER: Honest Code for Testing
public void storeData() {
    try {
        // critical logic
    } catch (Exception e) {
        // Re-throwing as a RuntimeException forces the test to fail
        throw new RuntimeException("Hardened Failure: Data storage failed", e);
    }
}
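// With the hardened version above in place, a standard JUnit 5 test
// can pin the behavior down. (Sketch: the enclosing test class and the
// storeData() method under test are assumed from the example above.)
@Test
void storeDataFailsLoudly() {
    assertThrows(RuntimeException.class, () -&gt; storeData());
}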
</code></code></pre><p><em>Benefit: The test suite will now immediately catch failures that were previously invisible.</em></p><h4>Use Case B: From <code>main()</code> to JUnit 5</h4><p>Legacy projects often have &#8220;test&#8221; classes that are just <code>public static void main(String[] args)</code> methods. These don&#8217;t integrate with CI/CD.</p><pre><code><code>// Migrating to JUnit 5 Assertions
@Test
void testBackendConnection() {
    Backend b = new Backend("qbert.guba.com");
    // Instead of printing to console, we assert the state
    assertDoesNotThrow(() -&gt; b.connect(), "Connection should be stable");
    assertNotNull(b.getStatus(), "Status should be initialized");
}
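// The same pattern extends to exact-value checks: instead of reading
// console output, assert the value itself. (Sketch: the "CONNECTED"
// status string is a hypothetical example, not the real API.)
@Test
void testStatusReport() {
    Backend b = new Backend("qbert.guba.com");
    assertDoesNotThrow(() -&gt; b.connect(), "Connection should be stable");
    assertEquals("CONNECTED", b.getStatus(), "Assert the value, don't eyeball it");
}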
</code></code></pre><p><em>Benefit: Provides a quantifiable &#8220;Safety Net&#8221; that build tools like Gradle can interpret as a Pass/Fail signal.</em></p><h3>Common Pitfalls &amp; Misconceptions</h3><p><strong>The &#8220;Fear of Red&#8221; Pitfall:</strong> Many teams are terrified of a broken build. They think that if the build turns red, they&#8217;ve failed.</p><p><strong>The Truth:</strong> In legacy refactoring, a <strong>Red Build</strong> is a victory. It means you&#8217;ve finally found the boundaries of the system. You&#8217;ve moved from &#8220;unknown-unknowns&#8221; to &#8220;known-knowns.&#8221; Don&#8217;t rush to fix the red; use it as a map to find where the code is truly broken.</p><h3>Core Trade-offs &amp; Nuances</h3><ul><li><p><strong>The &#8220;Crash&#8221; Period:</strong> When you start hardening tests, the project might not compile or pass for days. This requires stakeholder buy-in&#8212;you are breaking the &#8220;illusion of stability&#8221; to find the &#8220;reality of the debt.&#8221;</p></li><li><p><strong>Log Noise:</strong> Hardening exceptions often results in massive stack traces in your logs. This is necessary labor; you have to clean the noise to find the signals.</p></li></ul><h3>Forward-Looking Conclusion</h3><p>A &#8220;Green Build&#8221; is only valuable if it is earned. By removing the &#8220;Silent Swallows&#8221; from your legacy Java project, you are performing a diagnostic surgery. It is painful, and it reveals the rot, but it is the only way to heal the codebase.</p><p>Once your tests are honest, you can finally apply modern AI tools and refactoring patterns with confidence. 
You aren&#8217;t just &#8220;hacking&#8221; anymore; you are <strong>Engineering.</strong></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe to get more practical guides on using GenAI tools effectively in software development work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Refactoring the Workshop]]></title><description><![CDATA[Rebuilding a bike maintenance stack from scratch&#8212;from professional roots to family essentials in Spain.]]></description><link>https://www.nikmalykhin.com/p/refactoring-the-workshop</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/refactoring-the-workshop</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Thu, 05 Mar 2026 16:29:41 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>The Migration Headache</h3><p>Ever tried to migrate a massive, stateful legacy system to a new cloud region with zero downtime? That was my life in 2024. 
But here&#8217;s the thing about technical debt: it follows you.</p><p>My &#8220;Legacy System&#8221; wasn&#8217;t the physical tools&#8212;I&#8217;d sold those off before leaving Israel. The debt was in my head. My experience as a consultant in a small local MTB shop in Saint-Petersburg fifteen years ago had programmed me with a &#8220;pro-shop&#8221; bias. When we lived in Israel, I acted on that bias and built a monolith: a massive toolset, a wheel balancing stand, the works. It was classic <strong>Over-engineering</strong>.</p><p>Now, standing in my garage in Spain with two <strong>Merida Big Nine 60s</strong> and my son&#8217;s <strong>Specialized Hotrock 24</strong>, I realized I didn&#8217;t need to rebuild the data center. I needed to refactor for efficiency. I needed a <strong>modular set of microservices</strong>.</p><h3>The &#8220;System Architecture&#8221;: A Modular Toolchain</h3><p>Instead of a &#8220;buy-it-all&#8221; approach, I&#8217;ve decoupled the maintenance into three high-performance modules.</p><h4>1. Edge Computing: The &#8220;On-The-Trail&#8221; Kit</h4><p>This is for high-availability fixes. If this service fails, the &#8220;user&#8221; (my oldest son) has a total system crash 5km from the trailhead. I&#8217;ve packed this &#8220;payload&#8221; into a <strong>SKYSPER 20L</strong> backpack, organized in <strong>Zip-lock bags</strong> for modular access:</p><ul><li><p><strong>The Processor</strong>: Crankbrothers M17 multi-tool.</p></li><li><p><strong>Error Handling</strong>: KMC Missing Links (9 and 7-speed) + 2x Pedro&#8217;s or Park Tool tire levers.</p></li><li><p><strong>Redundancy</strong>: Kenda tubes (29x2.2 and 24x1.95) + Park Tool GP-2 pre-glued patches.</p></li><li><p><strong>Hardware Peripherals</strong>: hand pump with manometer + 2x small microfiber towels (Dirty/Clean).</p></li><li><p><strong>On-the-fly Patches</strong>: Small 60ml Finish Line Dry Lube + a travel-size Teflon spray (MO-94/GT85).</p></li></ul><h4>2. 
Maintenance Scripts: The &#8220;Dry-Clean&#8221; Routine</h4><p>Think of this as your <code>cron</code> jobs. It runs weekly to prevent system degradation. Here is the <strong>deployment logic</strong>:</p><ol><li><p><strong>Mechanical Cleaning</strong>: Back-pedal the chain through a dry rag to remove &#8220;big&#8221; grit.</p></li><li><p><strong>Rinse (Optional)</strong>: If you hear &#8220;sand grinding&#8221; in the gears, flush it with water.</p></li><li><p><strong>Stanchion Wipe</strong>: Clean the shiny bits of the fork with a dedicated rag.</p></li><li><p><strong>The Teflon Interface (Conditional Logic)</strong>:</p><ul><li><p><code>if (no_rinse)</code>: Spray Teflon onto a rag (not the bike) to wipe the chain/bolts.</p></li><li><p><code>else if (rinse_performed)</code>: Protect the brakes and spray Teflon <strong>directly</strong> onto the chain for water displacement.</p></li></ul></li><li><p><strong>The Wipe-Down</strong>: Use that Teflon-soaked rag to wipe the chain and bolt heads. This microscopic film stops the Spanish salt air from &#8220;bit-rotting&#8221; your hardware.</p></li><li><p><strong>Re-Lube</strong>: Apply Finish Line Dry Lube to the rollers.</p></li><li><p><strong>Final Wipe</strong>: Wait 60 seconds for penetration, then wipe off excess.</p></li></ol><h4>3. Core Infrastructure: The &#8220;Yearly Service&#8221;</h4><p>This is the &#8220;bare metal&#8221; hardware needed for the deep dives.</p><ul><li><p><strong>Health Monitoring</strong>: A <strong>Chain Wear Indicator</strong>. If it hits 0.75, the chain is &#8220;deprecated&#8221; and needs replacement.</p></li><li><p><strong>The Interface</strong>: A thin-profile <strong>15mm Pedal Wrench</strong>. You can&#8217;t hack this with a standard DIY wrench.</p></li><li><p><strong>Environment Setup</strong>: A <strong>Floor-to-Frame Stand</strong>. 
I found one for &#8364;30 on Vinted&#8212;a small investment for a massive increase in &#8220;developer comfort.&#8221;</p></li><li><p><strong>JIT (Just-In-Time) Dependencies</strong>: Specialized tools like the Cassette Lockring Tool and Cable Cutters are in the &#8220;backlog.&#8221; I won&#8217;t buy them until the specific part needs a &#8220;version upgrade.&#8221;</p></li></ul><h3>The Bonus: &#8220;Season Deep Clean&#8221; (System Integrity Audit)</h3><p>Once a season, we need more than a script; we need a full <strong>System Audit</strong>. This is where we check for &#8220;memory leaks&#8221; and hardware degradation.</p><h4>The Audit Kit</h4><ul><li><p><strong>Garbage Collector</strong>: Bio-Degreaser (Finish Line EcoTech).</p></li><li><p><strong>The &#8220;Gherkin&#8221; Brush</strong>: A drivetrain detail brush with a &#8220;claw&#8221; for digging out grit.</p></li><li><p><strong>Linter Tool</strong>: Chain Wear Indicator.</p></li></ul><h4>The Protocol</h4><ol><li><p><strong>Pre-Wash &amp; Degrease</strong>: Remove the mud, then spray degreaser on the gears. Let the &#8220;Garbage Collector&#8221; run for 3 minutes.</p></li><li><p><strong>Scrub &amp; Rinse</strong>: Use the &#8220;Gherkin&#8221; claw to dig out grit. Rinse with low-pressure water.</p></li><li><p><strong>Water Displacement</strong>: While wet, spray Teflon on the chain, bolts, and derailleur springs to prevent oxidation.</p></li><li><p><strong>Dry</strong>: Use a microfiber towel. <strong>Crucial</strong>: If the chain isn&#8217;t dry, your lube won&#8217;t &#8220;deploy&#8221; correctly into the metal.</p></li><li><p><strong>Re-Lubrication</strong>: Apply one drop of Finish Line Dry Lube to each roller on the <strong>inside</strong> of the chain while back-pedaling.</p></li><li><p><strong>The Wipe-Down</strong>: Wait 60 seconds for the lube to soak into the &#8220;inner pins.&#8221; Then, use a clean rag to wipe off the excess.
The chain should be lubricated on the inside, but dry to the touch on the outside to prevent sand from sticking to the surface.</p></li></ol><h4>The Health Check (Static Analysis)</h4><ul><li><p><strong>Dependency Check</strong>: Use the Chain Wear Indicator. If it hits 0.75, the chain is <strong>deprecated</strong>&#8212;replace it.</p></li><li><p><strong>Brake Validation</strong>: Check for 1mm thickness. Safety is a non-negotiable fail-safe.</p></li><li><p><strong>Indexing</strong>: Shift through all gears. If it &#8220;clicks,&#8221; adjust the barrel adjuster by 0.5 turns (like fine-tuning a config file).</p></li><li><p><strong>Cable Integrity</strong>: Look for &#8220;blooming&#8221; silver wires. If a cable is untwisting, it&#8217;s about to <strong>crash</strong>. If shifting is &#8220;crunchy,&#8221; the cable is &#8220;dragging&#8221; in the housing&#8212;likely a rust/dirt bottleneck.</p></li><li><p><strong>Load Balancing</strong>: Spin the wheels. If they wobble &gt;3mm, they need balancing (truing).</p></li></ul><h3>The Debugging Phase: Ego vs. Reality</h3><p>The biggest &#8220;bug&#8221; I encountered was my own <strong>Professional Ego</strong>. Because I worked in that shop in Saint-Petersburg and maintained a &#8220;perfect&#8221; setup in Israel, I felt like a &#8220;junior&#8221; by not having every professional tool immediately.</p><p>I had to debug that thought process. In software, we call this <strong>YAGNI</strong> (You Ain&#8217;t Gonna Need It). For a Merida Big Nine 60, I can &#8220;debug&#8221; a wobbly wheel by watching it against the frame. I don&#8217;t need a $300 truing stand to verify a fix.</p><p>The real challenge is <strong>Onboarding the Junior Dev</strong> (my son). When his Hotrock 24 starts &#8220;clicking,&#8221; the <strong>latency</strong> between my coaching cue and his execution is high. Keeping his bike &#8220;clean&#8221; via these scripts reduces the &#8220;noise&#8221; in his learning process. 
A smooth drivetrain is just a better UI for a kid.</p><h3>The &#8220;Merged PR&#8221;: Log Summary</h3><p>The &#8220;monolith&#8221; workshop is officially decommissioned. It&#8217;s been replaced by a streamlined, purpose-built kit, neatly &#8220;containerized&#8221; in Zip-lock bags within a single backpack.</p><ul><li><p><strong>Status</strong>: Healthy.</p></li><li><p><strong>Packaging</strong>: All trail tools isolated in Zip-locks for weatherproofing.</p></li><li><p><strong>Uptime</strong>: All family bikes are 100% operational.</p></li><li><p><strong>Backlog</strong>: Need to keep an eye on the brake pads; we&#8217;re approaching a &#8220;major version&#8221; update there.</p></li></ul><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Refactoring life, one Side Quest at a time.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Environment Emulation: Using Docker as a Time Machine for Legacy Java]]></title><description><![CDATA[What do you do when the code is right, but the world has changed too much to run it? 
You&#8217;ve successfully compiled a 20-year-old Java app, but the moment you hit &#8220;Run,&#8221; it crashes.]]></description><link>https://www.nikmalykhin.com/p/environment-emulation-using-docker</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/environment-emulation-using-docker</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Tue, 03 Mar 2026 08:01:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>What do you do when the code is right, but the world has changed too much to run it?</strong> You&#8217;ve successfully compiled a 20-year-old Java app, but the moment you hit &#8220;Run,&#8221; it crashes. It&#8217;s looking for a server named <code>qbert.guba.com</code> that was decommissioned in 2011. It&#8217;s searching for a local directory belonging to a developer who left the company fifteen years ago.</p><p>How do you convince a digital &#8220;antique&#8221; that it&#8217;s still living in 2005?</p><h3>The &#8216;Before&#8217; State: Setting the Context</h3><p>In the early days of Java development, &#8220;Environment Variables&#8221; and &#8220;Configuration as Code&#8221; were often ignored in favor of hardcoded assumptions. 
Developers wrote code that relied on:</p><ul><li><p><strong>Static Network Topologies:</strong> Hardcoded hostnames in <code>.properties</code> files or even inside <code>.class</code> files.</p></li><li><p><strong>Personalized File Paths:</strong> Logic that pointed to <code>/Users/ericlambrecht/data</code>, making the code physically impossible to run on any other machine.</p></li><li><p><strong>Specific Hardware Quirks:</strong> Reliance on the way Intel processors handled certain operations, which breaks on modern ARM-based chips like Apple&#8217;s M-series.</p></li></ul><p>The &#8220;old way&#8221; to fix this was a massive refactoring effort to externalize configuration. But when you have thousands of lines of &#8220;spaghetti&#8221; code, you risk introducing more bugs than you fix.</p><h3>Introducing the Core Concept: Environment Emulation</h3><p><strong>Environment Emulation</strong> is the practice of using containerization to recreate a specific historical &#8220;reality&#8221; for your application. Instead of changing the code to fit the modern world, you change the world to fit the code.</p><p><strong>What is it?</strong> It&#8217;s a &#8220;Time Capsule&#8221; strategy where Docker mimics the network, filesystem, and CPU architecture the application expects.</p><p><strong>Why does it matter?</strong> It allows you to achieve a &#8220;Green Start&#8221; without touching a single line of legacy business logic. By stabilizing the environment first, you can verify that the code <em>can</em> work before you begin the dangerous work of refactoring it.</p><h3>Practical Applications &amp; Use Cases</h3><h4>Use Case A: Network Trickery (Docker Aliases)</h4><p>If your legacy code is hardcoded to look for <code>qbert.guba.com</code>, you don&#8217;t need to hunt through the source code. 
You can use Docker&#8217;s network aliases to point that &#8220;ghost&#8221; hostname to a local container or a mock service.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;markdown&quot;,&quot;nodeId&quot;:&quot;0aefdc7d-db5f-40c4-8841-fc3209dcea12&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-markdown"># docker-compose.yml
services:
  legacy-app:
    image: my-ancient-app:latest
    networks:
      backend:
        aliases:
          - qbert.guba.com  # The app thinks it found its long-lost server
networks:
  backend:</code></pre></div><p><em>Benefit: The application connects successfully without any code changes or </em><code>/etc/hosts</code><em> hacking on your host machine.</em></p><h4>Use Case B: Filesystem Mimicry (Volume Mapping)</h4><p>When code is locked to a specific path like <code>/Users/eric/data</code>, Docker volumes can &#8220;teleport&#8221; your modern project directory into that exact location inside the container.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;833c3f29-cd4e-4610-95cf-d1ca05c4eb25&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">docker run -v $(pwd)/data:/Users/ericlambrecht/data my-legacy-java-app</code></pre></div><p><em>Benefit: You satisfy hardcoded file requirements immediately, allowing the app to boot and pass its initial I/O checks.</em></p><h4>Use Case C: Hardware Realities (x86 on ARM)</h4><p>Older binaries or specific versions of the JVM (like early Java 6 or 8 builds) may behave unpredictably on Apple Silicon (ARM64). You can force Docker to emulate the original Intel environment.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;dockerfile&quot;,&quot;nodeId&quot;:&quot;16a91638-e465-40fc-8ce9-b94533cdf233&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-dockerfile"># Specify the platform to ensure 100% compatibility with legacy binaries
FROM --platform=linux/amd64 eclipse-temurin:8-jdk</code></pre></div><p><em>Benefit: Eliminates subtle &#8220;Heisenbugs&#8221; caused by CPU architecture differences.</em></p><h3>Common Pitfalls &amp; Misconceptions</h3><p><strong>The "Config-First" Trap:</strong> Many engineers think they must "clean up" the configuration files before they can run the app in Docker.</p><p><strong>The Fix:</strong> Don&#8217;t clean. <strong>Emulate.</strong> Use Docker to satisfy the app&#8217;s current (even if &#8220;ugly&#8221;) requirements. Once you have a running, testable container, you can then refactor the configuration into modern environment variables as a second, safer step.</p><h3>Core Trade-offs &amp; Nuances</h3><ul><li><p><strong>The &#8220;Magic&#8221; Burden:</strong> Environment emulation can feel like &#8220;magic&#8221; to new developers. If the <code>docker-compose.yml</code> isn&#8217;t well-documented, a newcomer won&#8217;t understand why the app is looking for a server that doesn&#8217;t exist.</p></li><li><p><strong>Performance:</strong> Running x86 images on ARM64 via emulation (QEMU) is slower than native execution. This is acceptable for refactoring and testing, but may not be ideal for high-performance production needs.</p></li></ul><h3>Forward-Looking Conclusion</h3><p>Modernization is an act of engineering, not just coding. 
By using Docker as a &#8220;Time Machine,&#8221; you stop fighting the environment and start observing the application&#8217;s actual behavior.</p><p>Once the &#8220;Time Capsule&#8221; is built, you have achieved the ultimate goal of the software archaeologist: <strong>Reproducibility.</strong> From here, you can move forward with confidence, knowing that any changes you make to the code are being tested against a stable, predictable reality.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe to get more practical guides on using GenAI tools effectively in software development work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[The Strangler Build: Modernizing Java Tooling with Gradle 7.6]]></title><description><![CDATA[What do you do when your build system is the primary blocker to your modernization? 
You want to introduce automated testing and containerized deployments, but your project is locked inside an opaque build.xml file.]]></description><link>https://www.nikmalykhin.com/p/the-strangler-build-modernizing-java</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/the-strangler-build-modernizing-java</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Tue, 17 Feb 2026 08:03:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>What do you do when your build system is the primary blocker to your modernization?</strong> You want to introduce automated testing and containerized deployments, but your project is locked inside an opaque <code>build.xml</code> file. It&#8217;s not necessarily that the file is thousands of lines long&#8212;it&#8217;s that it represents a &#8220;frozen&#8221; process. The fear of breaking a specific, undocumented Ant target often keeps teams stuck in the past, manually running builds because they don&#8217;t trust the automation.</p><h3>The &#8216;Before&#8217; State: Setting the Context</h3><p>In the early 2000s, <strong>Apache Ant</strong> was the industry standard. It was purely imperative: you wrote a &#8220;script&#8221; telling the computer exactly how to delete folders, copy files, and compile classes.</p><p>The problem isn&#8217;t just the age of the tool; it&#8217;s the <strong>lack of lifecycle</strong>. Unlike Maven or Gradle, Ant has no built-in concept of a &#8220;test&#8221; phase or a &#8220;package&#8221; phase unless someone manually scripted them. 
For many legacy projects, this resulted in a build process that is fragile, hard to replicate in CI/CD, and completely disconnected from modern dependency management.</p><h3>Introducing the Core Concept: The Tooling Strangler</h3><p>The <strong>Tooling Strangler</strong> applies the Strangler Fig pattern to your build infrastructure. Instead of attempting a &#8220;Big Bang&#8221; migration where you delete Ant and spend a week debugging a new Gradle script, you <strong>wrap</strong> the old logic.</p><p><strong>What is it?</strong> Using Gradle&#8217;s <code>ant.importBuild</code>, you surface your legacy Ant targets as native Gradle tasks.</p><p><strong>Why does it matter?</strong> It allows you to move to a modern CLI immediately. You get the benefits of the Gradle Wrapper (<code>./gradlew</code>), advanced caching, and build scans, while the actual heavy lifting is still performed by the original, proven Ant logic.</p><h3>Practical Applications &amp; Use Cases</h3><h4>Use Case A: The &#8220;Wrapper&#8221; Migration</h4><p>By importing the build, you can start adding modern features (like dependency management) around the old Ant tasks without changing the Ant file itself.</p><pre><code>// build.gradle
// Import the existing Ant logic
ant.importBuild 'build.xml'

// Add a modern dependency that Ant didn't know about
dependencies {
    implementation 'org.slf4j:slf4j-api:1.7.36'
    testImplementation 'org.junit.jupiter:junit-jupiter:5.9.1'
}
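// Wire the new JUnit 5 dependency into Gradle's test task.
// (Sketch: assumes the 'java' plugin is applied in this build; without
// useJUnitPlatform(), Gradle 7.6 won't discover JUnit 5 tests.)
test {
    useJUnitPlatform()
}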

// "Hook" a modern task into an old Ant target
tasks.named('compile') {
    doLast {
        println "Ant finished compiling. Gradle is now verifying the output..."
    }
}</code></pre><p><em>Benefit: Risk-free modernization. Your build stays &#8220;green&#8221; throughout the entire transition.</em></p><h4>Use Case B: The 7.6 &#8220;Goldilocks&#8221; Version</h4><p>In my experiments, I found that <strong>Gradle 7.6</strong> is the specific &#8220;sweet spot&#8221; for this work. Why?</p><ol><li><p><strong>JDK 8 Compatibility:</strong> It is the last major version that runs its own background processes (the daemon) natively on Java 8.</p></li><li><p><strong>Modern Features:</strong> It still supports the latest JUnit 5 platforms and Docker-ready plugins.</p></li><li><p><strong>The Bridge:</strong> It allows you to bridge the gap between a 2005 build logic and a 2026 deployment pipeline.</p></li></ol><h3>Common Pitfalls &amp; Misconceptions</h3><p><strong>The "Pure Gradle" Obsession:</strong> A common mistake is trying to make the <code>build.gradle</code> file "perfect" from day one. Developers often get stuck trying to replicate a weird Ant <code>copy</code> task in Gradle's DSL.</p><p><strong>The Fix:</strong> If the Ant task works, <strong>leave it in Ant.</strong> Use the Strangler Fig approach: only move tasks to Gradle when you actually need to change their logic or improve their performance.</p><h3>Core Trade-offs &amp; Nuances</h3><ul><li><p><strong>Dual Maintenance:</strong> For a period, you have both <code>build.xml</code> and <code>build.gradle</code>. You must treat the Gradle file as the new &#8220;entry point&#8221; for the team.</p></li><li><p><strong>Mindset Shift:</strong> You are moving from a &#8220;Scripting&#8221; mindset (Ant) to a &#8220;Task Graph&#8221; mindset (Gradle). 
Understanding how tasks depend on one another is more important than knowing the syntax.</p></li></ul><h3>Forward-Looking Conclusion</h3><p>Modernizing a build system doesn&#8217;t require a &#8220;demolition and rebuild.&#8221; By using <strong>Gradle 7.6</strong> as a wrapper for your legacy Ant scripts, you buy yourself the most valuable asset in refactoring: <strong>time.</strong> You get the project into a modern CI/CD pipeline on day one. Once the build is stabilized and automated, you can &#8220;strangle&#8221; the remaining Ant targets at your own pace.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe to get more practical guides on using GenAI tools effectively in software development work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[The Golden Bridge: Why Java 8 is the Ultimate Tool for Legacy Refactoring]]></title><description><![CDATA[When does &#8220;latest and greatest&#8221; become a liability? 
Imagine you&#8217;ve just inherited a &#8220;Big Ball of Mud&#8221;: a 20-year-old repository built with Ant, running on Java 1.5, and filled with raw types and swallowed exceptions.]]></description><link>https://www.nikmalykhin.com/p/the-golden-bridge-why-java-8-is-the</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/the-golden-bridge-why-java-8-is-the</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Mon, 16 Feb 2026 08:02:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>When does &#8220;latest and greatest&#8221; become a liability?</strong> Imagine you&#8217;ve just inherited a &#8220;Big Ball of Mud&#8221;: a 20-year-old repository built with Ant, running on Java 1.5, and filled with raw types and swallowed exceptions. Your instinct is to jump to Java 21 to get the latest performance gains and features. But when you try to compile, you&#8217;re met with thousands of breaking changes, deleted APIs, and a build system that refuses to acknowledge modern hardware.</p><p>How do you modernize a system that is too old to run, but too critical to fail?</p><h3>The &#8216;Before&#8217; State: Setting the Context</h3><p>In the world of &#8220;Software Archaeology,&#8221; we often encounter projects stuck in the mid-2000s. 
These applications are often:</p><ul><li><p><strong>Compiler-Locked:</strong> They rely on syntax (like certain raw-type configurations) that modern JDKs (11, 17, 21) simply won&#8217;t compile anymore.</p></li><li><p><strong>Environment-Fragile:</strong> They only &#8220;work on Bob&#8217;s machine&#8221; because Bob has a specific 2008-era Intel laptop and a prehistoric version of the JDK.</p></li><li><p><strong>Tooling-Limited:</strong> They use Ant or early Maven versions that don&#8217;t understand modern CI/CD pipelines or containerization.</p></li></ul><p>The &#8220;old way&#8221; of fixing this was the <strong>Big Bang Migration</strong>: a grueling six-month rewrite where you try to jump 15 years of evolution in one go. Most of these attempts end in failure, reverted commits, and exhausted teams.</p><h3>Introducing the Core Concept: The Golden Bridge</h3><p>The <strong>Golden Bridge</strong> methodology uses Java 8 not as a final destination, but as a strategic <strong>"Field Hospital."</strong> <strong>What is it?</strong> It is the practice of migrating ancient code (Java 1.4 - 1.6) specifically to Java 8 first, rather than the current LTS.<br><strong>Why does it matter?</strong> Java 8 sits at a unique historical intersection. 
It is the &#8220;Last of the Ancients&#8221; and the &#8220;First of the Moderns.&#8221; It provides a stable environment where you can fix the internal architecture of the code without the external environment fighting you.</p><p><strong>How does it work?</strong> </p><ol><li><p><strong>Dual-Compatibility:</strong> It supports the <code>-source 1.5</code> flag to compile ancient syntax while allowing you to use modern IDEs.</p></li><li><p><strong>Architecture Neutrality:</strong> It is the first version that runs natively on Apple Silicon (ARM64) via Zulu or Temurin builds, ending the reliance on old hardware.</p></li><li><p><strong>Tooling Support:</strong> It is fully supported by Gradle 7.6, which acts as the "Strangler Fig" for old Ant builds.</p></li></ol><h3>Practical Applications &amp; Use Cases</h3><h4>Use Case A: Compiling the &#8220;Uncompilable&#8221;</h4><p>Modern JDKs have removed many internal APIs and tightened the rules on source compatibility. Java 8 allows you to keep the old code running while you transition the build system.</p><pre><code>// In your build.gradle, you can target the past while living in the present
java {
    toolchain {
        languageVersion = JavaLanguageVersion.of(8)
    }
}</code></pre><p><em>Benefit: You get a green build in hours, not weeks.</em></p><h4>Use Case B: The Docker &#8220;Time Machine&#8221;</h4><p>By using Java 8, you can create a Docker image that mirrors the production environment exactly, but runs on a 2024 MacBook.</p><pre><code>FROM eclipse-temurin:8-jdk
# Declare the mount point for the 20-year-old hardcoded data path.
# VOLUME takes no host path; map it at run time, e.g.:
#   docker run -v /Users/original_dev/data:/data &lt;image&gt;
VOLUME /data
COPY . /app
WORKDIR /app
# The base image ships only the JDK; Ant itself must be installed
# (package name assumed for the Ubuntu-based Temurin image)
RUN apt-get update &amp;&amp; apt-get install -y --no-install-recommends ant
CMD ["ant", "test"]</code></pre><p><em>Benefit: Eliminates &#8220;Works on my machine&#8221; bugs immediately.</em></p><h3>Common Pitfalls &amp; Misconceptions</h3><p><strong>The "Destination" Trap:</strong> The biggest mistake is thinking that moving to Java 8 is "enough."</p><p>Java 8 is a <strong>bridge</strong>, not a home. If you stay there, you are still accumulating technical debt. The goal of the Golden Bridge is to get the code clean enough (removing raw types, fixing tests) so that the jump to Java 17 or 21 becomes a simple compiler flag change rather than a structural nightmare.</p><h3>Core Trade-offs &amp; Nuances</h3><ul><li><p><strong>The Cost:</strong> You have to maintain a specific legacy toolchain (like Gradle 7.6) because the newest versions of build tools have dropped support for Java 8.</p></li><li><p><strong>The Mindset:</strong> You must resist the urge to use Java 8 features (like Streams or Optionals) immediately. Your first goal is <strong>stabilization</strong>, not modernization. Adding new syntax to a &#8220;muddy&#8221; codebase only makes the archaeology harder.</p></li></ul><h3>Forward-Looking Conclusion</h3><p>Java 8 is the unique &#8220;Goldilocks&#8221; zone of the Java ecosystem. It&#8217;s old enough to understand where the code came from, and modern enough to work with the tools of today.</p><p>By treating Java 8 as your <strong>Golden Bridge</strong>, you turn a high-risk &#8220;archaeological dig&#8221; into a controlled engineering project. Use it to stabilize your build, containerize your environment, and harden your tests. 
Once the mud is washed away, the path to Java 21 will be wide open.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe to get more practical guides on using GenAI tools effectively in software development work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Does Delegating to AI Mean We Can Finally Be Lazy Managers?]]></title><description><![CDATA[I tested Google's Jules agent with two approaches: a vague "lazy" prompt and a detailed technical spec. The results reveal a paradox about AI autonomy and technical debt.]]></description><link>https://www.nikmalykhin.com/p/does-delegating-to-ai-mean-we-can</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/does-delegating-to-ai-mean-we-can</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Tue, 20 Jan 2026 08:00:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>1. The Hook</h3><p>We often sell AI adoption to our bosses (and ourselves) with the promise of speed. 
We imagine a future where we toss a vague request over the wall&#8212;&#8220;fix the build,&#8221; &#8220;export the data,&#8221; &#8220;optimize the query&#8221;&#8212;and the AI handles the rest while we grab a coffee.</p><p>But my recent experiments with Jules, Google&#8217;s new AI agent, suggest the opposite is true. The more &#8220;autonomy&#8221; I gave the AI, the more mediocre the code became. This leads to an uncomfortable question: <strong>Does effective AI delegation actually require </strong><em><strong>more</strong></em><strong> management overhead, not less?</strong></p><h3>2. Context &amp; Tools</h3><p>I&#8217;ve been experimenting with <strong><a href="https://jules.google/">Jules</a></strong>, testing its ability to act as a &#8220;Junior Developer&#8221; in my Spring Boot repository, <strong><a href="https://github.com/nikmalykhin/joyofenergy-java-jules/">joyofenergy-java</a></strong>.</p><p>In my previous explorations, I looked at <a href="https://www.nikmalykhin.com/p/pair-authoring-with-an-ai-a-case">Pair-Authoring with an AI</a> and the <a href="https://www.nikmalykhin.com/p/the-context-window-paradox-to-get?utm_source=publication-search">Context Window Paradox</a>. This time, I wanted to test the difference between <strong>Abdication</strong> (lazy delegation) and <strong>Navigation</strong> (structured delegation) when asking an agent to build a feature from scratch.</p><h3>3. The Failed Experiment: The &#8220;Friday Afternoon&#8221; Prompt</h3><p>I set up a scenario we&#8217;ve all faced: It&#8217;s Friday afternoon, I want a new feature shipped, and I don&#8217;t want to think about the implementation details.</p><p>I gave Jules the &#8220;Lazy Manager&#8221; prompt:</p><blockquote><p>&#8220;Jules, create an endpoint to export meter readings as a CSV file. Use the existing MeterReadingService.&#8221;</p></blockquote><p>I intentionally withheld constraints. 
I didn&#8217;t mention memory usage, libraries, or formatting.</p><p>The Result?</p><p>Technically, it worked. Jules created a CsvService, updated the controller, and passed the tests. But structurally, it was a time-bomb.</p><ul><li><p><strong>Memory Unsafety:</strong> It loaded the entire dataset into a <code>List</code> in memory before writing the response. For a smart meter with 100,000 readings, this is an <code>OutOfMemoryError</code> waiting to happen.</p></li><li><p><strong>Library Bloat:</strong> It generated a new service class (<code>CsvService</code>)  where a simple stream in the controller would have sufficed.</p></li><li><p><strong>Junior Mistakes:</strong> It used standard Java formatting without considering how a user would actually open the file in Excel.</p></li></ul><p>The &#8220;lazy&#8221; prompt produced &#8220;lazy&#8221; code: functional, but dangerous at scale. It validated my fear that <a href="https://www.nikmalykhin.com/p/does-more-powerful-ai-mean-slower?utm_source=publication-search">More Powerful AI Doesn&#8217;t Always Mean Faster Fixes</a>.</p><h3>4. Principles That Actually Work: The &#8220;Brief&#8221;</h3><p>I reset the experiment. This time, I treated Jules like a Senior Engineer would treat a Junior: I wrote a spec.</p><p>I uploaded a file named <a href="https://github.com/nikmalykhin/joyofenergy-java-jules/blob/add-specs-feature-csv-export/specs/feature-csv-export.md">feature-csv-export.md</a> containing strict constraints:</p><ol><li><p><strong>No New Dependencies:</strong> Do not add <code>apache-commons</code> or <code>opencsv</code>.</p></li><li><p><strong>Memory Safety:</strong> Do not load lists into memory; stream directly to the <code>HttpServletResponse</code>.</p></li><li><p><strong>Strict Formatting:</strong> Use <code>yyyy-MM-dd HH:mm</code>.</p></li></ol><p>I then prompted:</p><blockquote><p>&#8220;Jules, I&#8217;ve uploaded a spec file... 
Please refactor the implementation to strictly follow these constraints.&#8221; </p></blockquote><p>The Outcome:</p><p>The difference was night and day.</p><ul><li><p><strong>Architectural Safety:</strong> Jules implemented a streaming solution using <code>PrintWriter</code>, avoiding the memory bottleneck entirely.</p></li><li><p><strong>Dependency Management:</strong> It correctly added <code>jakarta.servlet-api</code> as a <code>compileOnly</code> dependency, respecting the &#8220;no runtime bloat&#8221; rule.</p></li><li><p><strong>Test Integrity:</strong> It initially failed to test the controller response correctly, but because I had defined the &#8220;correct&#8221; output in the spec, I could guide it to fix the assertion logic.</p></li></ul><h3>5. Unexpected Discovery: The &#8220;Spec&#8221; as a Guardrail</h3><p>The most surprising insight was that Jules didn&#8217;t just follow the instructions&#8212;it used the spec file as a defense mechanism against bad code.</p><p>When I ran the &#8220;Lazy&#8221; experiment, Jules defaulted to the path of least resistance (loading data into memory). When I provided the &#8220;Brief,&#8221; Jules shifted behavior entirely. It didn&#8217;t just write code; it <strong>navigated the constraints</strong>.</p><p>This confirms a theory I touched on in <a href="https://www.nikmalykhin.com/p/can-we-make-ai-code-assistants-smarter?utm_source=publication-search">Can We Make AI Code Assistants Smarter by Asking Them to Write Their Own Rules?</a> The AI performs best not when it has &#8220;creative freedom,&#8221; but when it is boxed in by rigid technical constraints. The &#8220;Senior Engineer&#8221; input  wasn&#8217;t the code I wrote, but the boundaries I set.</p><h3>6. 
The Central Paradox</h3><p>This brings us to the Delegation Paradox:</p><p>To get an AI agent to work autonomously, you must micromanage the requirements.</p><p>If you want to be &#8220;lazy&#8221; during the implementation phase (execution), you must be hyper-active during the definition phase (specification). You cannot abdicate both.</p><ul><li><p><strong>Abdication</strong> (Vague prompt) -&gt; Requires heavy code review and refactoring later.</p></li><li><p><strong>Navigation</strong> (Detailed spec) -&gt; Requires heavy upfront thought, but produces near-production-ready code.</p></li></ul><p>We aren&#8217;t thinking <em>less</em> with AI; we are shifting <em>when</em> we think.</p><h3>7. Forward-Looking Conclusion</h3><p>Tools like Jules are shifting the developer&#8217;s role from &#8220;writer of code&#8221; to &#8220;architect of constraints.&#8221;</p><p>If you treat your AI agent like a magic wand that reads your mind, you will build technical debt at record speeds. But if you treat it like a talented but literal-minded junior developer who needs a solid brief, it becomes a powerful force multiplier.</p><p>The future of engineering isn&#8217;t about writing the perfect function; it&#8217;s about writing the perfect spec.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! 
Subscribe to get more practical guides on using GenAI tools effectively in software development work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Can We Skip TDD with Modern AI? A Context Experiment]]></title><description><![CDATA[The Hook]]></description><link>https://www.nikmalykhin.com/p/can-we-skip-tdd-with-modern-ai-a</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/can-we-skip-tdd-with-modern-ai-a</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Tue, 09 Dec 2025 08:01:03 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>The Hook</h3><p>Recently, some colleagues pitched me an idea: &#8220;Today, LLMs are so powerful, you can start exactly from implementation and it will work well. No need to use TDD or other more complicated XP techniques&#8221;.</p><p>It is a tempting thought. If an AI can generate a complete feature in seconds, is my approach&#8212;always start from a test&#8212;still relevant?</p><p>I decided to put it to the test. I ran an experiment to see if I could implement a complex feature by describing the task and letting GenAI create the application. My hypothesis was that TDD is still vital, but I wanted to see if the &#8220;Just Do It&#8221; method could prove me wrong.</p><p>The result? 
I confirmed exactly what I expected: <strong>TDD is one of the best ways to create context for an LLM.</strong></p><h3>Personal Context &amp; Tools</h3><p>For this experiment, I returned to a project I started in a previous article: <a href="https://www.nikmalykhin.com/p/does-ai-need-clear-goals-my-experiment">&#8220;Does AI Need Clear Goals? My Experiment in Turning Vague Ideas into Code&#8221;</a>.</p><p>My tool of choice was <strong>GPT-4.1</strong> (via GitHub Copilot), utilizing its Agent mode to handle multi-file context. Usually, I treat the AI as a pair programmer, following structured collaboration methods I&#8217;ve discussed in <a href="https://www.nikmalykhin.com/p/pair-authoring-with-an-ai-a-case">&#8220;Pair-Authoring with an AI: A Case Study in Structured Collaboration&#8221;</a>.</p><p>But for this session, I acted as a &#8220;manager,&#8221; giving requirements and approving plans, but explicitly skipping the &#8220;Red&#8221; phase of TDD. I let the AI write the code first.</p><h3>The Failed Experiment</h3><p>The task was <strong>Story #2346</strong>: Implement a &#8220;Day of Week Pricing Plan&#8221;. The requirements were clear: users needed to compare power usage costs based on the day of the week and rank price plans accordingly.</p><p>I approved the AI&#8217;s plan and let it generate the implementation. Here is where the &#8220;No TDD&#8221; approach started to show its cracks.</p><p><strong>1. The &#8220;Ghost Method&#8221; Problem</strong> After the AI implemented the service layer, my IDE lit up with errors. The AI used a method <code>getDayOfWeekMultiplier(DayOfWeek)</code> that didn&#8217;t exist. It &#8220;hallucinated&#8221; a method on the domain object because it was writing the service in isolation. I am usually fine with &#8220;Red&#8221; code, but this wasn&#8217;t TDD &#8220;Red&#8221;&#8212;this was just broken code requiring immediate fixes.</p><p><strong>2. 
The Regression Nightmare</strong> When we fixed the missing method, we broke the existing logic.</p><blockquote><p>PricePlanTest &gt; shouldReceiveMultipleExceptionalDateTimes() FAILED</p></blockquote><p>Because we implemented the new logic <em>over</em> the old logic without a guiding test, the AI introduced regressions. We had to do several iterations just to get back to a baseline.</p><p><strong>3. The Context Disconnect</strong> The real struggle happened during Functional Testing. I asked the AI to verify the endpoints. It generated a test that tried to hit the API, but it returned a <strong>404 Not Found</strong>. Why? The AI created a test that queried a Smart Meter ID, but &#8220;it didn&#8217;t have a context!&#8221;. It forgot that in this application, a Smart Meter must be linked to a Price Plan via the <code>AccountService</code> first. The AI tried to guess the solution, attempting to call an API <code>/account/link/{smart-metter-id}</code> that didn&#8217;t even exist.</p><h3>Principles That Actually Work</h3><p>I eventually finished the task without TDD, but it required multiple rollbacks and context corrections. Through this struggle, I confirmed why TDD works:</p><p><strong>Principle 1: Tests Are Context Anchors</strong> The reason the AI failed the functional test setup was a lack of context. If I had written the test <em>first</em>, I would have been forced to set up the <code>AccountService</code> association immediately. The failing test provides the AI with a strict &#8220;Context Window&#8221; of what is required, as I explored in <a href="https://www.nikmalykhin.com/p/the-context-window-paradox-to-get?utm_source=publication-search">&#8220;The Context Window Paradox&#8221;</a>.</p><p><strong>Principle 2: Small Steps Prevent &#8220;Imagination&#8221;</strong> When the AI doesn&#8217;t have enough context, it tries to imagine the answer. TDD forces small, verifiable steps. 
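</p><p>To make that concrete, here is a minimal, self-contained sketch of such a step, using the &#8220;ghost method&#8221; from earlier (the weekend-multiplier rule and class layout are my assumptions, not code from the repository):</p><pre><code>import java.math.BigDecimal;
import java.time.DayOfWeek;

// Sketch: stating the expectation first anchors getDayOfWeekMultiplier
// to the domain object instead of letting it be "hallucinated" in a service.
public class PricePlanExpectation {

    static class PricePlan {
        BigDecimal getDayOfWeekMultiplier(DayOfWeek day) {
            // Assumed rule for this sketch only: weekends cost double
            return (day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY)
                    ? new BigDecimal("2") : BigDecimal.ONE;
        }
    }

    public static void main(String[] args) {
        PricePlan plan = new PricePlan();
        // The "red" step: these expectations are written before the method exists
        if (!plan.getDayOfWeekMultiplier(DayOfWeek.SATURDAY).equals(new BigDecimal("2")))
            throw new AssertionError("weekend multiplier");
        if (!plan.getDayOfWeekMultiplier(DayOfWeek.MONDAY).equals(BigDecimal.ONE))
            throw new AssertionError("weekday multiplier");
    }
}</code></pre><p>A test this small gives the model exactly one verifiable fact to satisfy at a time. 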
By skipping the test, I forced the AI to generate a large chunk of logic (Controller + Service) at once, increasing the surface area for hallucinations.</p><h3>Unexpected Discovery</h3><p>The most painful part of skipping TDD wasn&#8217;t the coding&#8212;it was the debugging.</p><p>When I finally added tests <em>after</em> the implementation to verify the logic, one failed with a confusing error:</p><blockquote><p>Expecting actual: {FRIDAY=[...]} to contain key: MONDAY</p></blockquote><p>This revealed a critical weakness of the &#8220;Test After&#8221; approach. When a test fails, you don&#8217;t know where the problem is: in the tests or in the business logic. It turned out to be an error in the test data (the date provided was a Friday, not a Monday). If I had written the test first, the AI would have generated the implementation <em>based</em> on that test data. We wouldn&#8217;t have had this problem at all.</p><h3>The Central Paradox</h3><p>We tend to think that as AI gets smarter, we can think less. I touched on this in <a href="https://www.nikmalykhin.com/p/can-we-think-less-with-ai?utm_source=publication-search">&#8220;Can We Think Less with AI?&#8221;</a>.</p><p>But this experiment confirmed a paradox: <strong>To move faster with AI, you must slow down enough to write the test.</strong></p><p>Can we avoid the loops of small context errors? Yes. TDD reduces complexity and creates trust between us and the AI. The test acts as a contract. Without it, you are just hoping the AI guesses your architectural constraints correctly.</p><h3>Forward-Looking Conclusion</h3><p>So, can we skip TDD? Yes, but you will spend more time adding additional context manually.</p><p>The power of TDD is approaching a new peak in the AI era: tests create a <strong>POWERFUL CONTEXT</strong> for LLMs. 
Modern models like GPT-4 are powerful, but a better LLM doesn&#8217;t remove the need for context.</p><p>If you want to get the most out of your AI teammate, don&#8217;t just ask it to write code. Give it a failing test.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe to get more practical guides on using GenAI tools effectively in software development work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Does "Extract Method" Actually Hurt Your Readability?]]></title><description><![CDATA[We&#8217;ve all been there.]]></description><link>https://www.nikmalykhin.com/p/does-extract-method-actually-hurt</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/does-extract-method-actually-hurt</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Tue, 25 Nov 2025 08:01:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We&#8217;ve all been there. A feature starts simple, maybe 20 lines. But after three or four iterations, that same function has ballooned to 200 lines, a tangled mess of nested <code>if-else</code> blocks. 
</p><p>Does that reality sound familiar?</p><p>When faced with this, we have two main choices. One way is to file it as tech debt, a task we&#8217;ll never <em>really</em> get to because we will always have more urgent priorities from the business. The other way was shown in the foundational book, <strong><a href="https://martinfowler.com/books/refactoring.html">Refactoring by Martin Fowler, with Kent Beck</a></strong>. This path treats refactoring as a continuous action, not a tech debt item in the backlog.</p><p>But if we choose to refactor continuously, what does that <em>really</em> mean, and are our tools helping or hurting?</p><h3>My Context and the &#8220;Easy&#8221; Button</h3><p>Working in a <strong>Java/Kotlin</strong> environment, my tool of choice is <strong>IntelliJ IDEA</strong>. It&#8217;s an incredibly powerful IDE with a host of features designed to help.</p><p>When facing a 200-line monster method, the most obvious solution is right in the refactoring menu: <strong>&#8220;<a href="https://www.jetbrains.com/help/idea/extract-method.html">Extract Method</a>&#8221;</strong>. It seems perfect. It makes the original method smaller, which is exactly what I want.</p><p>Right?</p><h3>Introducing the Core Concept: Readability-Driven Refactoring</h3><p>The main goal of refactoring shouldn&#8217;t just be &#8220;smaller methods.&#8221; For me, the main goals are <strong>readability</strong> and, secondarily, <strong>decoupling</strong>.</p><p>In fact, readability is arguably more important than adhering to a specific architecture or design pattern. While good architecture often improves readability, it&#8217;s not its primary goal. If I have a choice between perfect pattern adherence and readability, I will prefer readability. Working on a typical web application, it&#8217;s readability that helps me daily when I look at different parts of the code.</p><p>This is where the simple &#8220;Extract Method&#8221; tool falls short. 
It often just moves the mess, failing to improve readability.</p><p>A more powerful <em>technique</em> for guiding this process is <strong>Test-Driven Development (TDD)</strong>. Instead of just extracting code, we use TDD to <em>describe our expectations</em> for the new, refactored code <em>before</em> we write it. This small shift in process fundamentally changes the quality of the refactoring.</p><h3>Practical Application: A TDD-Led Refactoring</h3><p>Let&#8217;s look at a practical example.</p><h4>The Problem Code</h4><p>Imagine we have this block of code in a method. It&#8217;s searching for properties, then mapping them to calculate Avios points, with error handling mixed in .</p><pre><code>summaries =
    shc
    .psSearch(
        startDate = startDate,
        nights = nights,
        hotelCodes = it,
        adults = adultsParam,
        children = childrenParam,
        infants = infantsParam,
    ).toTypedArray()
    .mapNotNull { tbh -&gt;
        kotlin
            .runCatching {
                aviosEarn = aviosAdapter.calculateAviosEarn(BigDecimal(tbh.summary!!.totalPrice!!))
                tbh.toAccommodationSummary(aviosEarn)
            }.onFailure { e -&gt;
                SASAdapter.Companion.log.warn("Skipping", e)
            }
            .getOrNull()
    }.toList()</code></pre><h4>Common Pitfall: The &#8216;Extract Method&#8217; Trap</h4><p>If we use the &#8220;Extract Method&#8221; feature in our IDE, we get this:</p><p><strong>Original method:</strong></p><pre><code>summaries = requestSummariesAndCalculateAviosEarn(startDate, nights, it, adultsParam, childrenParam, infantsParam)</code></pre><p><strong>New private method:</strong></p><pre><code>private fun requestSummariesAndCalculateAviosEarn(
    startDate: LocalDate,
    nights: Int,
    it: List&lt;String&gt;,
    adultsParam: String,
    childrenParam: String,
    infantsParam: String,
): List&lt;AccommodationSummary&gt; =
    shc
        .psSearch(
            startDate = startDate,
            nights = nights,
            hotelCodes = it,
            adults = adultsParam,
            children = childrenParam,
            infants = infantsParam,
        ).toTypedArray()
        .mapNotNull { tbh -&gt;
            calculateAviosEarnAndMapToAccommodationSummary(tbh)
        }.toList()

private fun calculateAviosEarnAndMapToAccommodationSummary(tbh: TBH): AccommodationSummary? {
    var aviosEarn: Int 
    return runCatching { 
        aviosEarn =
            aviosAdapter.calculateAviosEarn(BigDecimal(tbh.summary!!.totalPrice!!)) 
        tbh.toAccommodationSummary(aviosEarn) 
    }.onFailure { e -&gt; 
log.warn("Skipping", e) 
    }
        .getOrNull() 
}</code></pre><p>Is this good? Not exactly. It makes the original method smaller, but it doesn&#8217;t improve readability. We&#8217;ve just created a new private method that takes a <em>mess</em> of parameters.</p><h3>The Better Way: The TDD-Led Flow</h3><p>Instead of using the IDE tool, let&#8217;s use the TDD <em>technique</em>.</p><ol><li><p><strong>Describe Expectations:</strong> We start by writing a test for the logic we <em>want</em> to have. We don&#8217;t want to just test a private method; this logic feels like it belongs in its own service.</p></li><li><p><strong>Define the &#8220;To-Be&#8221; Service:</strong> We&#8217;ll create a test for a new <code>SummaryAdapter</code>. At first, this service is &#8220;red&#8221; (it doesn&#8217;t exist).</p></li><li><p><strong>Discover the Parameter Problem:</strong> As we write the test and describe the method we want to call, we see the problem clearly: it needs too many parameters.</p></li><li><p><strong>The Solution:</strong> The test itself shows us what we need. Instead of passing 6 individual parameters, we should pass a single <code>SearchCriteria</code> object. We define this object as an expectation of our test.</p></li><li><p><strong>Implement:</strong> We now implement the new service, moving the logic from the old method.</p></li></ol><p><strong>The Result:</strong></p><p>By extracting the logic to a new service and passing a parameter object, the original code now looks like this:</p><pre><code>summaries = SummaryAdapter.requestSummariesAndCalculateAviosEarn(searchCriteria, it)</code></pre><p>Did we improve readability? Yes. 
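</p><p>As an illustration (only the parameter names come from the code above; the type and adapter shape are a sketch of what the test drives out, not the actual project code):</p><pre><code>import java.time.LocalDate

// One value object replaces six loose parameters
data class SearchCriteria(
    val startDate: LocalDate,
    val nights: Int,
    val adults: String,
    val children: String,
    val infants: String,
)

// The signature the test describes for the new service:
// fun requestSummariesAndCalculateAviosEarn(criteria: SearchCriteria, hotelCodes: List&lt;String&gt;): List&lt;AccommodationSummary&gt;</code></pre><p>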
And not just because the method is smaller, but because we are no longer passing an excessive number of parameters, as we were with the simple &#8220;Extract Method&#8221;.</p><h3>A Technique Over a Tool</h3><p>IDE tools are wonderful, and techniques like TDD are powerful.</p><p>Of course, we <em>could</em> have used the IDE tools to change the method signature, create a new class, and move the method there. What the tool <em>can&#8217;t</em> do is help us understand what we want to do in the first place. We can&#8217;t describe our expectations to the tool.</p><p>TDD gives us that option: <strong>we describe our expectations before the work</strong>. This key difference is what truly changes the quality of our refactoring.</p><p>By knowing different techniques, we can understand when and which tool to use. Don&#8217;t let the tool lead the refactoring; let your <em>technique</em> guide the tool.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe to get more practical guides on using GenAI tools effectively in software development work</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Does AI Need Clear Goals? 
My Experiment in Turning Vague Ideas into Code]]></title><description><![CDATA[We&#8217;re all told the same thing: AI needs clear, specific, and context-rich prompts to be useful.]]></description><link>https://www.nikmalykhin.com/p/does-ai-need-clear-goals-my-experiment</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/does-ai-need-clear-goals-my-experiment</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Tue, 11 Nov 2025 08:00:50 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We&#8217;re all told the same thing: AI needs clear, specific, and context-rich prompts to be useful. &#8220;Garbage in, garbage out.&#8221; This is especially true in engineering.</p><p>But what if your job isn&#8217;t to execute a clear task, but to <em>find</em> the task?</p><p>In my current work, we do a lot of research. Goals are not clear. We receive highly abstract, one-sentence ideas that need to be explored. This research is a necessary, messy process of discovery, and it&#8217;s full of &#8220;boilerplate&#8221; actions.</p><p>This got me thinking. We assume AI is for <em>execution</em>, but can we use it for <em>exploration</em>? 
What happens when you feed an AI a problem that you, the engineer, don&#8217;t even fully understand yet?</p><p>I ran an experiment to find out, starting with nothing but a single, vague sentence.</p><h3>My Setup: From Vague Idea to Boilerplate</h3><p>My goal was to see if I could use Generative AI to shepherd a &#8220;one-sentence idea&#8221; all the way to a foundational, runnable piece of code.</p><p>My toolkit was straightforward:</p><ul><li><p><strong>The Idea:</strong> A vague user story, &#8220;#2348: As an administrator I want to add a new tariff so that it can be advertised to users who may benefit&#8221;. This was perfect because it was so vague&#8212;what&#8217;s a &#8220;tariff&#8221;? How is it &#8220;advertised&#8221;?</p></li><li><p><strong>The &#8220;Analyst&#8221; AI:</strong> I used <strong>Gemini 2.5 Pro</strong> to act as a Product Owner and flesh out this vague idea.</p></li><li><p><strong>The &#8220;Developer&#8221; AI:</strong> I then used <strong>GitHub Copilot (GPT-4.1)</strong> in <strong>IntelliJ</strong> to write the boilerplate code.</p></li><li><p><strong>The Project:</strong> All this was done in the context of the Thoughtworks &#8220;<a href="https://www.thoughtworks.com/en-es/insights/blog/careers-at-thoughtworks/joi_application_process">Joy of Energy</a>&#8221; project, a Java Spring Boot application.</p></li></ul><p>The plan was a two-part workflow:</p><ol><li><p><strong>Part 1: AI as Business Analyst.</strong> Feed the vague story to Gemini and ask it to define the requirement.</p></li><li><p><strong>Part 2: AI as Boilerplate Generator.</strong> Feed the <em>AI-generated spec</em> to Copilot and ask it to write the code.</p></li></ol><h3>The Failed Experiment (That Was Actually a Success)</h3><p>My first attempts were a perfect illustration of the &#8220;AI is context-blind&#8221; problem. 
The &#8220;failure&#8221; wasn&#8217;t that the AI was useless; it&#8217;s that its first drafts were wrong in very specific, instructive ways.</p><p><strong>Failure 1: The AI &#8220;Product Owner&#8221; Became a Tech Lead</strong> I asked Gemini to act as a Product Owner and flesh out the story. It made a &#8220;very popular mistake&#8221;: it skipped the &#8220;what&#8221; and &#8220;why&#8221; and jumped straight to the &#8220;how.&#8221;</p><p>The <em>very first draft</em> of the spec it gave me wasn&#8217;t a user story; it was a technical task. It immediately suggested a <code>JPA @Entity</code> and defined fields like <code>id</code> as a <code>UUID</code>. It was already designing the database schema.</p><p>This is exactly what you <em>don&#8217;t</em> want from a user story, and it&#8217;s a common trap where the AI tries to be the engineer, not the analyst. As I&#8217;ve written before, the AI&#8217;s job is to reflect our needs, not just give us a technical answer (you can read more on that idea here: <a href="https://www.nikmalykhin.com/p/how-genai-helps-engineers-write-better">How GenAI Helps Engineers Write Better</a>).</p><p>I had to intervene, critique the output, and explicitly ask it to &#8220;Change database to more abstract system&#8221; to get the clean, implementation-agnostic user story and Acceptance Criteria (ACs) I actually needed.</p><p><strong>Failure 2: The AI &#8220;Developer&#8221; Was a Clumsy New Hire</strong> After I had a clean spec, I gave it to GitHub Copilot with a clear prompt: generate a POJO, an in-memory Service, and a Controller.</p><p>The code it generated was not &#8220;copy-paste and run&#8221;.</p><ul><li><p><strong>Wrong Package Structure:</strong> It invented a &#8220;by-feature&#8221; package structure (<code>com.joi.energy.tariff</code>). My project uses a &#8220;by-layer&#8221; structure (<code>uk.tw.energy.domain</code>, <code>uk.tw.energy.service</code>, etc.).</p></li><li><p><strong>Missing Dependencies:</strong> It correctly suggested using <code>jakarta.validation</code> annotations&#8212;a great idea!&#8212;but my project didn&#8217;t have that dependency.</p></li><li><p><strong>Minor (Human) Errors:</strong> It even forgot the <code>@Service</code> annotation on the <code>TariffService</code>, a simple mistake I&#8217;ve made myself a dozen times.</p></li></ul><p>If I were a junior engineer, I would have been blocked or, worse, just pasted it all in, breaking the project&#8217;s architecture.</p><h3>Principles That Actually Work</h3><p>These &#8220;failures&#8221; led me to the real principles of using AI for this kind of work.</p><p><strong>1. The AI is a &#8220;Demultiplicator,&#8221; Not a Supercharger</strong> This was my single most important insight. A supercharger just makes the engine spin <em>faster</em>. A demultiplicator (like a reduction gear) <em>changes the nature</em> of the work, trading raw speed for torque.</p><p>The AI is a demultiplicator for my brain.</p><p>When I was iterating on the user story, I didn&#8217;t think about &#8220;how to write these words or if it sounds good&#8221;. I was 100% focused on the <em>business goals</em>. The AI handled the <em>typing</em>, and I handled the <em>validating</em>. This is a profound shift. It took me 30 minutes to get a solid user story, not because I typed fast, but because I <em>thought</em> fast, using the AI&#8217;s draft as a disposable starting point.</p><p><strong>2. The Engineer&#8217;s New Job: Strategist and Context-Provider</strong> The AI&#8217;s mistakes weren&#8217;t stupid; they were <em>context-blind</em>. This reveals the engineer&#8217;s true role in an AI-augmented workflow: we are the &#8220;Reviewer and Strategist&#8221;.</p><p>My job wasn&#8217;t to write getters and setters. My job was to make two high-level strategic decisions:</p><ol><li><p>&#8220;The AI is right, <code>jakarta.validation</code> is a good idea. 
I will add that dependency&#8221;.</p></li><li><p>&#8220;The AI is wrong about the package structure. I will correct it to follow our existing pattern&#8221;.</p></li></ol><p>The AI&#8217;s &#8220;flawed&#8221; draft actually <em>forced</em> me to think strategically about my project&#8217;s architecture and dependencies.</p><p><strong>3. Embrace the &#8220;90% Win&#8221; and the Iterative Loop</strong> The AI&#8217;s output doesn&#8217;t need to be 100% perfect to be valuable. The boilerplate it generated, despite its flaws, was a &#8220;90% win&#8221;. It saved me from the &#8220;boring boilerplate&#8221; and the hours I would have spent on Stack Overflow as a junior engineer.</p><p>More importantly, the AI&#8217;s <em>mistakes</em> are part of the value. That wrong package structure? It&#8217;s a great &#8220;recommendation for reorganizing your project&#8221; and a perfect topic to bring to a team huddle.</p><h3>My Unexpected Discovery: &#8220;1:0 to AI&#8221;</h3><p>The most surprising moment came during the boilerplate generation. I asked for <em>three</em> files (POJO, Service, Controller). The AI gave me <em>four</em>.</p><p>It proactively and correctly created a <code>TariffType.java</code> Enum (<code>FLAT_RATE</code>, <code>TIME_OF_USE</code>).</p><p>This was a perfect &#8220;micro-improvement&#8221;. I called it &#8220;1:0 to AI&#8221;. I was so focused on the &#8220;big picture&#8221; of the architecture that I missed this small, obvious detail. This &#8220;separating of responsibilities&#8221; is incredibly powerful: the AI handles the small details while I focus on the larger strategic goals.</p><h3>The Central Paradox: AI&#8217;s Flaws Are Its Greatest Strength</h3><p>This leads to the central paradox: <strong>The AI is terrible at handling vague, abstract ideas... and yet, it&#8217;s the best tool I have for the job.</strong></p><p>Why? Because its value isn&#8217;t in <em>giving you the right answer</em>. 
Its value is in its ability to <em>instantly turn a &#8220;blank page&#8221; into a flawed, tangible draft that you can critique</em>.</p><p>The AI&#8217;s initial, flawed responses&#8212;the over-technical user story, the context-blind package structure&#8212;are its most valuable feature. They act as a mirror, forcing the engineer to <em>define</em> the context and <em>make</em> the strategic decisions. It can&#8217;t read your mind, so it forces you to figure out what&#8217;s in it.</p><p>Effective use doesn&#8217;t require a perfect prompt. It requires an engineer to stop acting like a <em>typist</em> and start acting like an <em>editor, a critic, and a strategist</em>.</p><h3>Conclusion: From Vague to Validated</h3><p>The AI didn&#8217;t <em>solve</em> my vague problem. It gave me the tools to solve it myself, faster and at a higher level of abstraction.</p><p>By delegating the &#8220;boring boilerplate code&#8221;, I was able to stay focused on the &#8220;big picture&#8221; and &#8220;business needs&#8221;. This workflow is a powerful way to accelerate research, allowing us to build, test, and throw away foundational ideas at a speed we couldn&#8217;t before.</p><p>The AI isn&#8217;t here to replace us. It&#8217;s here to take the routine work and free us to focus on the hard parts. It&#8217;s a &#8220;demultiplicator&#8221; that gives us the torque to move from a one-sentence idea to a validated, runnable foundation&#8212;flaws and all.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! 
Subscribe to get more practical guides on using GenAI tools effectively in software development work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[What If the ‘Cleanest’ Code Is the Wrong Solution?]]></title><description><![CDATA[In our continuing experiment with Trio Programming&#8212;two engineers and an AI&#8212;we decided to level up.]]></description><link>https://www.nikmalykhin.com/p/what-if-the-cleanest-code-is-the</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/what-if-the-cleanest-code-is-the</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Tue, 28 Oct 2025 08:00:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In our continuing experiment with <strong>Trio Programming</strong>&#8212;two engineers and an AI&#8212;we decided to level up. Our first session was a slow, painful grind of fixing our environment. This time, with a stable foundation, we aimed for speed. Our new strategy: write comprehensive tests ourselves, then give the AI the freedom to implement the solution in one big step.</p><p>The initial results were promising. The AI produced working code that passed our tests. But then, our instincts as seasoned developers kicked in. We saw the AI&#8217;s implementation&#8212;a simple <code>Map&lt;String, Object&gt;</code>&#8212;and reflexively identified it as a &#8220;code smell&#8221;. 
We spent the next hour trying to refactor it into a &#8220;cleaner,&#8221; more object-oriented design using the Composite pattern.</p><p>That&#8217;s when we fell into a trap. Our pursuit of clean code was leading us toward a solution that was elegant, sophisticated, and completely wrong. This led us to our second major discovery: <strong>In AI-augmented development, the biggest risk isn&#8217;t bad AI code, but good human intuition applied to the wrong problem.</strong></p><div><hr></div><h3>Our Setup: Aiming for a Bigger Step</h3><p>Our team remained the same: I (Nik) acted as the driver for <strong>GitHub Copilot</strong>, while Javier served as the strategic navigator. Having stabilized our Java, Spring Boot, and Gradle environment in the last session, we were ready to test a new hypothesis: if we write strong, expectation-focused tests, we can trust the AI with a larger implementation scope and move much faster.</p><p>The flow was simple:</p><ol><li><p>Human engineers write a small, focused test with clear assertions.</p></li><li><p>Let the AI generate the implementation code in a single, larger step to make the test pass.</p></li><li><p>Trust the tests to validate the AI&#8217;s work, rather than meticulously reviewing every line of generated code.</p></li></ol><h3>The Failed Experiment: Refactoring into a Corner</h3><p>The first part of the experiment worked. We added two tests for our hierarchy API, one for a root-only employee and one for a simple employee-supervisor relationship. We then prompted the AI: &#8220;tests looks good, let&#8217;s make postHierarchy method for passing all of them&#8221;.</p><p>The AI&#8217;s implementation worked, save for one minor edge case we quickly fixed. But we weren&#8217;t satisfied. 
The code returned a <code>Map&lt;String, Object&gt;</code>, and our developer brains screamed for type safety and better design.</p><ol><li><p><strong>The &#8220;Code Smell&#8221; Diagnosis:</strong> We prompted the AI with our concern: &#8220;maybe, response object will make the readability of the code better and will reduce smell of code?&#8221;. This initiated a refactoring plan to introduce a dedicated <code>HierarchyNode</code> class.</p></li><li><p><strong>Applying a Design Pattern:</strong> We pushed further, suggesting a more formal structure: &#8220;maybe we can apply composite pattern... to our response?&#8221;. The goal was to create a pure, object-oriented hierarchy and eliminate the <code>Map</code> entirely.</p></li><li><p><strong>The Collision with Reality:</strong> Our final prompt revealed the fatal flaw in our logic: &#8220;can we avoid to use Map if we will use Spring Boot which we have in our project?&#8221;.</p></li></ol><p>The AI&#8217;s response was the turning point. It patiently explained that given our requirement for dynamic JSON keys (e.g., <code>&#8220;Jonas&#8221;: { &#8220;Sophie&#8221;: ... }</code>), a <code>Map</code> or a structure that serializes like one was <strong>unavoidable</strong> with Spring Boot and its default Jackson serializer.</p><p>We had spent a significant part of our session chasing an elegant design that was fundamentally incompatible with the constraints of our framework and the explicit requirements of the kata. As I noted in my log, &#8220;we spend time trying to add something not workable to the code&#8221;. 
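To make the constraint concrete: in the required JSON, every employee name is a key, not a field value, so no class with fixed properties can model it, while a nested map can. A minimal sketch in Java (the extra &#8220;Nick&#8221; level and the hand-rolled renderer, standing in for Jackson&#8217;s default Map serialization, are illustrative assumptions):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HierarchySketch {
    // Build {"Jonas": {"Sophie": {"Nick": {}}}} - the names are JSON *keys*,
    // so a class with fixed fields cannot model them; a nested Map can.
    static Map<String, Object> hierarchy() {
        Map<String, Object> nick = new LinkedHashMap<>();
        Map<String, Object> sophie = new LinkedHashMap<>();
        sophie.put("Nick", nick);
        Map<String, Object> jonas = new LinkedHashMap<>();
        jonas.put("Sophie", sophie);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("Jonas", jonas);
        return root;
    }

    // Minimal renderer standing in for Jackson's default Map serialization.
    @SuppressWarnings("unchecked")
    static String toJson(Map<String, Object> node) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : node.entrySet()) {
            if (!first) sb.append(",");
            first = false;
            sb.append("\"").append(e.getKey()).append("\":")
              .append(toJson((Map<String, Object>) e.getValue()));
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        System.out.println(toJson(hierarchy())); // {"Jonas":{"Sophie":{"Nick":{}}}}
    }
}
```

Any Composite-style class would still have to serialize back into exactly this Map-like shape.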
The AI&#8217;s initial, simpler solution wasn&#8217;t a code smell; it was the correct, pragmatic solution from the start.</p><div><hr></div><h3>Principles That Actually Work</h3><p>This humbling experience confirmed our new hypothesis and revealed principles for a more effective human-AI workflow.</p><ol><li><p><strong>Focus on &#8220;What,&#8221; Not &#8220;How&#8221; (Test-Focused Development).</strong> Our initial strategy was correct. The most valuable role for the human developers is to define the <em>behavior</em> of the system through precise, comprehensive tests. When we focused on the expected JSON output, the AI produced correct code. When we focused on our preconceived notions of &#8220;good&#8221; internal implementation, we wasted time. The tests are the contract; the AI&#8217;s job is to fulfill it.</p></li><li><p><strong>The AI is a Mirror for System Constraints.</strong> The AI is more than a code generator; it&#8217;s an interactive expert on the toolchain. It didn&#8217;t just reject our idea; it explained <em>why</em> it wouldn&#8217;t work within the Spring Boot ecosystem. This prevented us from going further down a dead-end path. Use the AI not just to write code, but to validate your architectural assumptions against the framework&#8217;s reality.</p></li><li><p><strong>Codify Your Learnings into the System.</strong> A failed experiment is only a waste if you don&#8217;t learn from it. The most productive outcome of our refactoring dead-end was updating our <code>.github/copilot-instructions.md</code> file. We added an explicit refactoring protocol and guidance on when to challenge the AI&#8217;s use of patterns versus accepting framework constraints. This turns a session&#8217;s lesson into a permanent upgrade for the trio&#8217;s workflow.</p></li></ol><h3>Unexpected Discovery: AI Generalizes from Specifics</h3><p>After our refactoring detour, we returned to our Test-Focused workflow. 
We added much more complex tests, including one with multiple employees reporting to the same supervisor and another with a full four-level hierarchy.</p><p>The surprising part? <strong>The AI&#8217;s existing implementation passed these complex tests without any modifications</strong>. This revealed a powerful insight: the AI is remarkably good at generalizing a solution. It needed a few simple, specific test cases to establish the core logic. Once that logic was in place, it was robust enough to handle more complex scenarios automatically. The &#8220;big step&#8221; works, but it needs to be built on a foundation of small, clear examples.</p><h3>The Central Paradox of AI-Driven Speed</h3><p>This leads to the central paradox we uncovered in this session: <strong>To move faster with big, AI-generated implementation steps, you must first slow down and write smaller, more precise human-guided tests.</strong></p><p>Our desire for speed was not at odds with the discipline of TDD; it was enabled by it. The quality of the AI&#8217;s large-scale contribution was directly proportional to the quality of the small-scale expectations we defined. You cannot achieve reliable speed by simply telling the AI &#8220;build this feature.&#8221; You achieve it by saying &#8220;build something that satisfies these very specific, verifiable behaviors.&#8221;</p><h3>Conclusion: We Are Architects of Behavior, Not Just Code</h3><p>Our second session was a success, but not because we wrote code faster. It was a success because we learned how to trust our tests more than our own implementation habits. The &#8220;Test-Focused Development&#8221; rhythm&#8212;small tests by humans, big implementation by AI&#8212;feels right.</p><p>The dynamic is shifting. Our job is becoming less about crafting the perfect implementation and more about architecting the perfect set of expectations. 
We define the contract with rigorous tests, and the AI, our tireless third programmer, finds the most direct way to fulfill it&#8212;even if it&#8217;s not the way we would have written it ourselves.</p>]]></content:encoded></item><item><title><![CDATA[Does an AI Teammate Mean You Write Less Code?]]></title><description><![CDATA[We embarked on an experiment called Trio Programming: two engineers and an AI assistant building software together.]]></description><link>https://www.nikmalykhin.com/p/does-an-ai-teammate-mean-you-write</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/does-an-ai-teammate-mean-you-write</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Tue, 14 Oct 2025 07:00:37 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We embarked on an experiment called <strong>Trio Programming</strong>: two engineers and an AI assistant building software together. Our goal was to discover effective workflows for this new dynamic. We started with a simple code kata, a clear set of rules for our AI, and a straightforward tech stack. Our assumption was that with a powerful AI coder, we&#8217;d move through the logic faster than ever.</p><p>Instead, we spent almost the entire session without writing a single line of business logic. The AI wrote plenty of code, but it was all in service of fixing a development environment that kept breaking. 
This led us to a counterintuitive conclusion: <strong>adding an AI to the team doesn&#8217;t accelerate feature development; it brutally exposes foundational weaknesses in your environment and workflow.</strong></p><h3>Our Setup: An Experiment in Trio Programming</h3><p>Our team consisted of myself (Nik) acting as the &#8220;driver&#8221;&#8212;the one interacting directly with the AI&#8212;and my colleague Javier as the &#8220;navigator,&#8221; providing high-level direction and quality control. Our third programmer was <strong>GitHub Copilot</strong>, guided by a detailed set of custom instructions emphasizing a strict Test-Driven Development (TDD) cycle, small incremental changes, and explicit permissions before writing any code.</p><p>The plan was to tackle the &#8220;Hierarchy Kata&#8221;&#8212;a REST API for managing an employee hierarchy&#8212;using a pure stack: <strong>Core Java</strong>, <strong>JUnit 5</strong>, and <strong>Gradle</strong>. We wanted to keep things simple and avoid framework magic.</p><h3>The Experiment That Failed: A Cascade of Configuration Errors</h3><p>Our first mistake was idealism. We started with Core Java to avoid frameworks, but quickly realized the sheer amount of boilerplate needed for a simple REST endpoint was distracting us from the actual kata. We pivoted.</p><p>&#8220;Let&#8217;s delegate that work to Spring,&#8221; we decided, thinking it would get us back on track. This is where the real trouble began. Our session devolved into a frustrating, iterative battle with our own setup, guided by an AI that was helpful but lacked strategic oversight.</p><ol><li><p><strong>Missing Dependencies:</strong> We asked Copilot to generate a test for a Spring Boot controller. It correctly produced a test using <code>@WebMvcTest</code> and <code>MockMvc</code>. But when we ran <code>./gradlew build</code>, the build failed spectacularly with dozens of <code>cannot find symbol</code> and <code>package does not exist</code> errors. 
Our <code>build.gradle</code> file had JUnit, but none of the required Spring Boot test dependencies.</p></li><li><p><strong>Incorrect Dependency Configuration:</strong> We then asked Copilot to fix our Gradle file. It suggested adding the Spring dependencies, but the first attempt failed because we hadn&#8217;t defined a version number, leading to a <code>Could not find org.springframework.boot:spring-boot-starter-web:.</code> error. The next fix involved adding the dependencies to the <code>subprojects</code> block in our root <code>build.gradle</code>, as they weren&#8217;t being inherited by the kata&#8217;s module. Each step was a tiny, painful discovery.</p></li><li><p><strong>Classpath and Package Structure Hell:</strong> After fixing the build file, the errors persisted. The problem? Our test file, <code>HelloWorldControllerTest.java</code>, was in <code>src/main/java</code> instead of <code>src/test/java</code>. The test dependencies weren&#8217;t on the main classpath. Once we moved it, we hit yet another wall: <code>Unable to find a @SpringBootConfiguration</code>. Our test in the <code>com.kata.hierarchy</code> package couldn&#8217;t find the main application class located in <code>com.example.helloworld</code> because of how Spring&#8217;s component scanning works.</p></li></ol><p>The entire session was a cycle of: ask for code, watch the build fail, feed the error log back to the AI, and apply the suggested micro-fix. We weren&#8217;t programming; we were performing highly-structured, AI-assisted debugging on our own environment.</p><div><hr></div><h3>Principles That Actually Work</h3><p>This frustrating experience revealed three principles that are critical for effective AI-augmented development.</p><ol><li><p><strong>The Environment is Non-Negotiable.</strong> An unstable or poorly understood development environment will completely derail any attempt at Trio Programming. 
The AI can suggest fixes, but it can&#8217;t reason about your setup holistically. Before you can ask an AI to write a feature, the entire team&#8212;humans and AI&#8212;must operate on a rock-solid foundation where builds, tests, and dependencies are flawless.</p></li><li><p><strong>Human Navigation is Paramount.</strong> The session would have been a total failure without a human navigator. Javier&#8217;s role was crucial for steering the ship. He spotted issues in prompts, provided strategic direction (&#8220;let&#8217;s put it in a new package&#8221;), and kept the focus on the larger goal while I was in the weeds prompting the AI. As I noted in my log, &#8220;Speak, not only think - it&#8217;s a very strong pattern&#8221;. The AI is a powerful tool, but it needs a human strategist to be effective.</p></li><li><p><strong>Treat the AI as a System, Not Just a Coder.</strong> We started by giving the AI rules for writing code (TDD, small steps). But the real value came from using it as a diagnostic tool for a complex system that included our code, our build tool, and our framework. The prompts that worked best weren&#8217;t &#8220;implement this feature,&#8221; but rather &#8220;here is an error log, diagnose the problem and propose a minimal fix&#8221;.</p></li></ol><div><hr></div><h3>The Unexpected Discovery: The AI Reshapes Human Roles</h3><p>The most surprising insight was how the AI&#8217;s presence changed our own roles. My job as the &#8220;driver&#8221; became less about writing code and more about <strong>prompt engineering and AI flow control</strong>. I was focused on translating our navigator&#8217;s intent into precise instructions and context for the AI.</p><p>Javier&#8217;s &#8220;navigator&#8221; role expanded from guiding the code&#8217;s logic to <strong>managing the overall strategy and quality-controlling both my prompts and the AI&#8217;s output</strong>. This division of labor was incredibly effective. 
Having one person focused on the high-level goal while the other managed the human-AI interface prevented us from getting stuck. The AI didn&#8217;t just add a third programmer; it created a new, more specialized dynamic between the two human programmers.</p><h3>The Central Paradox of AI Collaboration</h3><p>Herein lies the paradox: <strong>The goal of using an AI is to abstract away complexity, but its immediate effect is to surface hidden complexities you&#8217;ve been ignoring.</strong></p><p>We thought we had a working Java setup. But the AI, by trying to follow our commands precisely and rapidly, immediately ran into every single flaw in our Gradle configuration and package structure. A human programmer might have found these issues slowly over time. The AI found them all at once, forcing a full stop.</p><p>Effective use of an AI programmer therefore requires:</p><ul><li><p>An <strong>impeccably configured and automated</strong> development environment.</p></li><li><p><strong>Deep human expertise</strong> in the underlying tools (Gradle, Spring), as the AI&#8217;s suggestions still need validation.</p></li><li><p>A workflow where humans provide <strong>strategic intent</strong>, not just tactical instructions.</p></li></ul><h3>Conclusion: Build Your Pipeline Before You Start the Assembly Line</h3><p>Our first Trio Programming session felt slow and, at times, unproductive. We wanted to build an API, but we ended up building a robust, multi-module Spring Boot Gradle configuration. But as Javier aptly put it, this process is like building a good CI/CD pipeline: it &#8220;reduces the price of mistakes&#8221; and gives you the confidence &#8220;to move forward faster&#8221;.</p><p>The lesson is clear. You can&#8217;t just drop an AI into an existing workflow and expect a productivity boost. You must first use the AI to stress-test and harden your foundations. 
The initial time investment is not spent on writing features, but on creating an environment so solid that the AI can finally be unleashed on the work you actually want it to do. We ended the day in a much safer, more robust place, ready for the real work to begin.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe to get more practical guides on using GenAI tools effectively in software development work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[From Buzzword to Practical Tool: A Developer's Guide to Generative AI]]></title><description><![CDATA[It seems like every week there&#8217;s a new AI tool that promises to change everything.]]></description><link>https://www.nikmalykhin.com/p/from-buzzword-to-practical-tool-a</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/from-buzzword-to-practical-tool-a</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Mon, 29 Sep 2025 09:29:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!NqMm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ed9aaf9-db9b-4fdc-9a53-44e4d55b72dc_915x337.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>It seems like every week there&#8217;s a new AI tool that promises to change everything. The hype is impossible to ignore. 
But behind the marketing, what are we actually dealing with? How do Large Language Models (LLMs) really work, and more importantly, what are the practical limitations that we, as developers, QAs, and analysts, need to understand to use them effectively and responsibly?</p><p>This article cuts through the noise to explain the core mechanics of Generative AI. We'll explore how these models "think," where they fail, and provide a set of practical heuristics for applying them in our work.</p><div><hr></div><h3>The 'Before' State: From Hard-Coded Rules to Learned Patterns</h3><p>To understand today's Generative AI, we have to look at its conceptual ancestors. The original dream of <strong>Artificial Intelligence</strong> (beginning in the 1950s) was about logic and explicit rules. The idea was to encode expert knowledge into a series of <code>IF &lt;condition&gt; THEN &lt;action&gt;</code> statements. This approach is far from obsolete; it&#8217;s still the backbone of many complex systems.</p><p>For example, I previously worked on a phishing detection team at a <strong>global cybersecurity company</strong>, where our core detection engine was a sophisticated, rule-based AI. We analyzed an email&#8217;s characteristics, and if the combined weighted score of all rules triggered a threshold, we marked it as malicious. That was our production system.</p><p>The first major evolution of this paradigm was <strong>Machine Learning</strong> (ML), which gained traction in the 1980s. Instead of engineers hand-crafting every rule, we could feed a system massive amounts of data and let it discover the patterns on its own. We don't tell a spam filter every possible suspicious word; we show it thousands of examples, and it <em>learns</em> the statistical characteristics of spam. These two ideas&#8212;rules and learning&#8212;are often used together. 
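The weighted-score mechanism described above can be sketched in a few lines; the rules, weights, and threshold below are invented for illustration, not the real detection engine:

```java
import java.util.List;
import java.util.function.Predicate;

public class RuleEngineSketch {
    // A rule: IF <condition> THEN contribute <weight> to the score.
    record Rule(String name, double weight, Predicate<String> condition) {}

    // Invented example rules - a production engine has many hundreds.
    static final List<Rule> RULES = List.of(
            new Rule("urgent-language", 0.4, email -> email.contains("urgent")),
            new Rule("suspicious-link", 0.5, email -> email.contains("http://")),
            new Rule("asks-for-password", 0.6, email -> email.contains("password"))
    );

    // Sum the weights of every rule that fires; flag if over the threshold.
    static boolean isMalicious(String email, double threshold) {
        double score = RULES.stream()
                .filter(r -> r.condition().test(email))
                .mapToDouble(Rule::weight)
                .sum();
        return score >= threshold;
    }

    public static void main(String[] args) {
        System.out.println(isMalicious("urgent: confirm your password", 0.8)); // true
        System.out.println(isMalicious("lunch at noon?", 0.8));                // false
    }
}
```

The ML layer mentioned next would, in effect, learn new rules and weights from labeled examples instead of waiting for an engineer to hand-write them.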
Our plan at the cybersecurity company was to layer ML on top of our rule engine to automatically spot new threats, rather than waiting for an engineer to write a new rule.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!NqMm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ed9aaf9-db9b-4fdc-9a53-44e4d55b72dc_915x337.png" width="915" height="337" alt=""></figure></div><div><hr></div><h3>Introducing the Core Concept: Generative AI</h3><p>The next leap came with <strong>Deep Learning</strong> in the 2010s, a type of ML that uses complex, multi-layered "neural networks" to find incredibly subtle patterns in data. This is the technology that powered the huge advances we saw in image recognition and speech-to-text.</p><p>That brings us to today's breakthrough: <strong>Generative AI</strong>.</p><ul><li><p><strong>What is it?</strong> Generative AI takes powerful Deep Learning models and flips their function. Instead of just <em>recognizing</em> patterns (e.g., "this image contains a cat"), it uses its understanding of those patterns to <em><strong>create</strong></em> new, original content (e.g., "generate a picture of a cat").
Large Language Models are a prime example of this capability.</p></li><li><p><strong>Why does it matter?</strong> The impact is massive because an estimated 80% of the world's data is unstructured text&#8212;emails, documents, support tickets, etc. LLMs are the first technology that can both process and generate human language at scale, creating a new human-computer interface where we can use <strong>natural language to express intent</strong>.</p></li><li><p><strong>How does it work?</strong> At its core, an LLM is a sophisticated pattern-matching machine built on a technology called the <strong>Transformer architecture</strong>. Its fundamental job is surprisingly simple: <strong>to predict the most statistically probable next word in a sequence</strong>. It's essentially a very powerful autocomplete. To do this, it relies on two key concepts:</p><ol><li><p><strong>Tokens</strong>: The model doesn't see words; it sees "tokens". Text is broken down into these building blocks&#8212;which can be words, parts of words, or punctuation. For example, <code>Generative AI is powerful</code> might become <code>["Gener", "ative", " AI", " is", " powerful"]</code>. A model's limits and API costs are all measured in tokens.</p></li><li><p><strong>The Context Window</strong>: This is the model's short-term memory. LLMs are <strong>stateless</strong>; they don't truly "remember" past conversations. With each prompt, the application sends the <em>entire conversation history</em> back to the model. This entire block of text must fit within the context window, which has a fixed token limit (e.g., 8k or 128k). 
If a conversation gets too long, the oldest messages are dropped, which is why the model seems to "forget" what was said earlier.</p></li></ol></li></ul><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!WrMT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5a82ca8-6b77-4f6f-b912-af406d66209c_547x325.png" width="547" height="325" alt=""></figure></div><div><hr></div><h3>Practical Applications: Choosing the Right Tool for the Job</h3><p>Different LLMs are trained with different goals, giving them unique strengths. Choosing the right one is a key engineering decision. The following list is not exhaustive, but it reflects the tools my team and I rely on for our daily work.</p><p>The main model families you'll encounter are:</p><ol><li><p><strong>OpenAI's GPT Series (</strong><code>GPT-4o</code><strong>, etc.)</strong>: Best known as a powerful all-rounder, excelling at tasks requiring strong <strong>logical reasoning</strong> and complex <strong>code generation</strong>.
This is often the go-to for debugging a tricky algorithm or scaffolding a new service.</p></li><li><p><strong>Anthropic's Claude Series (</strong><code>Claude 3.5 Sonnet</code><strong>, etc.)</strong>: Built with a heavy emphasis on safety and "Constitutional AI". Claude often produces more careful, <strong>nuanced writing</strong> and is a great choice for tasks like drafting detailed technical documentation or analyzing sensitive user feedback where tone and safety are paramount.</p></li><li><p><strong>Google's Gemini Series (</strong><code>Gemini 1.5 Pro &amp; Flash</code><strong>)</strong>: This family offers a trade-off. <strong>Gemini Pro</strong> is the high-power version focused on top-tier reasoning and advanced multi-modal capabilities. Its sibling, <strong>Gemini Flash</strong>, is optimized for speed and cost-efficiency, making it ideal for high-volume, lower-complexity tasks like chatbots or data extraction where low latency is critical.</p></li></ol><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!L3M0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c86c05-5966-4ec1-bc4f-1fbed57c2126_1176x332.png" width="1176" height="332" alt=""></figure></div><div><hr></div><h3>Common Pitfalls &amp; Misconceptions</h3><p>The architecture of LLMs leads to two fundamental limitations that every user must understand.</p><h4>1. Hallucinations: Plausible vs. Truthful</h4><p>Because an LLM's only job is to predict the next most probable token, it is optimized to generate text that is <strong>plausible</strong>, not text that is factually <strong>true</strong>. It has no internal knowledge base or concept of truth. If you ask it to find sources for a claim, it will generate a list of references that <em>looks</em> perfect&#8212;with authors, titles, and journals that fit the pattern&#8212;but the sources themselves may be completely fabricated.</p><p><strong>How to avoid it</strong>: Be <strong>professionally skeptical</strong>. Treat all outputs as a first draft. Always verify facts, test all code, and check any sources it provides.</p><h4>2. The Black Box Problem: Why vs. What</h4><p>We can make an LLM's output <strong>deterministic</strong> by setting a parameter called "temperature" to zero, meaning it will give the same output for the same input every time. So we can see <em>what</em> it did. However, we can't see <em>why</em> it chose one token over another in a way that is humanly understandable. The decision is a result of calculations across billions of parameters, not a logical decision tree we can audit.</p><p><strong>Why it matters</strong>: This makes it nearly impossible to debug why a model gives a strange answer. In high-stakes domains like finance or healthcare, it's difficult to trust a system when there is no transparent reasoning path.</p><div><hr></div><h3>Core Trade-offs: Free vs. Paid Models</h3><p>The difference between free and paid AI tools is not just about features; it's about the entire engine. The primary trade-off is <strong>cost vs.
capability</strong>.</p><ul><li><p><strong>Underlying Model</strong>: Free tiers typically use older, smaller, and less powerful models. Paid tiers give you access to the flagship models.</p></li><li><p><strong>Context Window</strong>: Paid models have much larger context windows (e.g., 128k+ tokens vs. 4k-16k), allowing you to work with larger documents and maintain longer conversations.</p></li><li><p><strong>Reasoning Ability</strong>: Premium models are significantly better at following complex, multi-step instructions. Less capable models are more prone to "laziness"&#8212;giving simplified answers, writing placeholder code, or telling you to do it yourself.</p></li></ul><p>For simple tasks, a free model may suffice. For complex development work, the limitations of a less capable model can become a significant bottleneck.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!zxVN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f85a36d-1c7b-4352-8f0b-64123b439909_523x204.png" width="523" height="204" alt=""></figure></div><div><hr></div><h3>Conclusion: 4 Heuristics for Using AI Responsibly</h3><p>Generative AI is a powerful tool, not magic. Understanding its mechanism&#8212;next-token prediction within a limited context window&#8212;is key to using it well. To ensure we are using these tools in a safe, effective, and responsible way, our team should always ask four questions before starting a task:</p><ol><li><p><strong>Do we have permission?</strong> Is the use of AI approved for this task by both the client and our company's policies? This is a non-negotiable first step.</p></li><li><p><strong>Are we exposing sensitive data?</strong> Does the prompt contain any client secrets, personal information, or confidential data? The answer must be no.</p></li><li><p><strong>How will we verify the output?</strong> What is our strategy for human review and testing? Whether it's a peer code review or a QA testing plan, a verification process is essential.</p></li><li><p><strong>Is this the right tool for the job?</strong> Is the model's speed, cost, and capability a good fit for this task?
This is about making a deliberate engineering trade-off.</p></li></ol><p>By embracing professional skepticism and applying these simple heuristics, we can move beyond the hype and begin using Generative AI as what it is: a powerful, imperfect, but profoundly useful new tool in our professional toolkit.</p>]]></content:encoded></item><item><title><![CDATA[Jules, My AI Junior Developer]]></title><description><![CDATA[Here&#8217;s a question that emerged from my recent work with AI coding agents: to get better, more autonomous results, do you need to treat the AI less like a senior peer and more like a junior developer?]]></description><link>https://www.nikmalykhin.com/p/jules-my-ai-junior-developer</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/jules-my-ai-junior-developer</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Mon, 15 Sep 2025 19:36:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0"
type="image/jpeg"/><content:encoded><![CDATA[<p>Here&#8217;s a question that emerged from my recent work with AI coding agents: to get better, more autonomous results, do you need to treat the AI less like a senior peer and more like a junior developer?</p><p>I recently spent time experimenting with Google's Jules, an AI agent designed to operate with a high degree of autonomy. The initial assumption was that I could delegate a series of tasks to a "senior" AI, provide it with a well-documented repository and clear instructions, and expect efficient execution. The experiment, however, surfaced a different, more nuanced reality about the operational model required to work effectively with today's AI agents.<br><br>My plan was to partition the work for a new project, <a href="https://github.com/nikmalykhin-tw/jules-foundation/tree/main">jules-foundation</a>, in a collaborative way:</p><ul><li><p><strong>Backend:</strong> Fully delegated to Jules.</p></li><li><p><strong>CI/CD:</strong> I would initiate the setup, and Jules would continue it.</p></li><li><p><strong>Frontend:</strong> A "ping-pong" approach where Jules would start, I would take over for a task using VSCode and GitHub Copilot, and then hand it back.</p></li></ul><p>This process was underpinned by a detailed <a href="https://github.com/nikmalykhin-tw/jules-foundation/blob/main/AGENTS.md">AGENTS.md</a> file, which codified principles from foundational software engineering texts to guide the agent's behavior. The results were illuminating, but not for the reasons I expected.</p><div><hr></div><h2>The Experiment's Stumbles</h2><p>The initial attempts at delegation quickly ran into issues that revealed the agent's limitations, not in its ability to write code, but in its judgment and awareness of context.</p><h3>1. 
The Hallucinating Generalist</h3><p>In the very first backend task&#8212;setting up a Kotlin and Micronaut project&#8212;Jules briefly defaulted to a completely different stack, attempting to implement the solution using Python and Poetry. It seemed to fall back on its generalized training data, where Python is a common choice for initial project setups. To its credit, the agent caught its own mistake and asked for confirmation before proceeding, but it was a stark reminder that even with specific instructions, the agent can be swayed by the statistical weight of its training data. It behaves like a junior developer who has a lot of theoretical knowledge but lacks the experience to apply it consistently in a specific context.</p><h3>2. The Context Pollution Problem</h3><p>My most significant error was continuing with the frontend task (<a href="https://github.com/nikmalykhin-tw/jules-foundation/issues/3">Task 3</a>) in the same chat I used for the backend (Task 1). After the first task was completed and merged, the <code>main</code> branch of the repository was updated. However, Jules, operating within its isolated chat context, was working off a stale version of the repository.</p><p>When asked to proceed, its lack of environmental awareness became clear. It stated: "I do not have a direct git pull command. My process is to complete the work and then use submit to propose the changes." Its proposed solution was to start over, re-implementing <em>both the backend and frontend tasks</em> from scratch. This demonstrated that long-running conversations spanning multiple, distinct tasks are unworkable. The context from previous work pollutes the agent's understanding of the current state.</p><h3>3. The Over-Eager Assistant</h3><p>During the frontend task, the instructions specified using simple HTML, Tailwind CSS, and Alpine.js, with Vite mentioned as an <em>optional</em> tool. 
Jules immediately planned to set up a full Vite project, concluding this was "the most professional and efficient way to approach this task." While a reasonable conclusion for a human engineer, it was a deviation from the core requirement of simplicity. It prioritized an optimized solution over adhering to the task's constraints, forcing me to update the <code>AGENTS.md</code> file with a strict <strong>"Technology Constraint Mandate"</strong> to prevent such deviations.</p><div><hr></div><h2>An Effective Operating Model</h2><p>Through these failures, a more effective workflow emerged. It centered on providing a rigid, well-defined operational framework rather than relying on the agent's "senior" judgment.</p><h3>1. One Task, One Context</h3><p>The <code>git pull</code> fiasco taught me the most important lesson: <strong>every new task requires a new, clean context</strong>. The effective workflow is atomic and mirrors standard development practice:</p><ol><li><p>Start a new "Jules task" for each new GitHub issue.</p></li><li><p>Provide the prompt, linking to the repository and the specific issue.</p></li><li><p>Let the agent fork the <em>current</em> main branch, implement the changes, and submit a pull request.</p></li><li><p>Review, merge, and close the task.</p></li><li><p>Repeat from step 1 for the next issue.</p></li></ol><p>This approach prevents context pollution and also mitigates the risk of git conflicts, as the agent is never in a position where it needs to reconcile its work with other changes made in parallel. The tasks must be designed to be sequential and independent.</p><h3>2. Define the Goal, Not Every Step</h3><p>My most successful interaction was with the first task, where the goal was clear and concise: "Create a Kotlin-based Micronaut application with a single GET endpoint that returns 'Hello, World!'." 
I didn't over-specify the steps, which allowed the agent to complete the task in just 7 minutes.</p><p>In contrast, my more prescriptive frontend task created blind spots. By trying to detail the steps, I inadvertently omitted small but crucial details, leading to initial friction. The key is to provide a <strong>clear objective and firm constraints</strong> but grant the agent the autonomy to handle the implementation details within those boundaries.</p><h3>3. The "Rules" Are the Scaffolding</h3><p>The foundational <code>AGENTS.md</code> file, which summarized core software engineering principles, was critical. Much like a <a href="https://nik1379616.substack.com/p/can-we-make-ai-code-assistants-smarter">well-crafted context can steer GitHub Copilot's suggestions</a>, these initial instructions act as a firm scaffolding for the agent's behavior. When failures occurred, I didn't just correct the agent in the chat; I updated the foundational rules. This ensures the learning is persistent and benefits all future tasks.</p><div><hr></div><h2>Unexpected Discovery: Simulating a Workflow</h2><p>A fascinating insight was how to enforce quality gates without giving the agent direct access to our environment. Jules can't <em>run</em> a pre-commit hook, but it can be instructed to <strong>simulate one</strong>.</p><p>I created a <strong>"Pre-Flight Simulation"</strong> mandate in the rules. 
Before submitting code, the agent must:</p><ol><li><p>Analyze the project's pre-commit configuration files.</p></li><li><p>Mentally review its generated code against every check defined in those files.</p></li><li><p>Provide a report confirming it performed the simulation.</p></li></ol><p>This approach improves code quality and reduces the cost of failed CI runs by shifting quality checks earlier in the process, even if only in simulation.</p><div><hr></div><h2>The Core Insight: Constraints Unlock Autonomy</h2><p>This leads to the core realization: <strong>to unlock the autonomy of an AI agent, you must constrain it with a rigid, machine-readable process.</strong></p><p>You can't treat it like a senior engineer with whom you can have a nuanced conversation. You have to manage it like a brilliant, lightning-fast, but utterly naive junior developer. It needs a "manager" to provide:</p><ul><li><p><strong>A Clear Definition of Done:</strong> The GitHub issue.</p></li><li><p><strong>Strict Rules of Engagement:</strong> The <code>AGENTS.md</code> file.</p></li><li><p><strong>An Isolated Work Environment:</strong> A new task for each new unit of work.</p></li></ul><p>The agent's value isn't in its judgment or experience, but in its speed and its ability to flawlessly execute a well-defined process within a tightly controlled environment.</p><div><hr></div><h2>Conclusion: We Are Becoming Architects of AI Workflows</h2><p>My experiment with Jules was a success, though not in the way I initially envisioned. The true leverage of these tools isn't just in code generation&#8212;it's in <strong>automating a workflow</strong>.</p><p>The real engineering work is shifting from pure implementation to architecting the system of rules, constraints, and processes that guide the AI. This has implications beyond just how developers work. 
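</p><p>One concrete shape this architecting can take is making task definitions machine-checkable before anything is delegated. A minimal sketch in Python (my own illustration; the required section names are hypothetical, not a Jules or GitHub convention):</p>

```python
# Gate delegation on a well-formed issue body (illustrative only;
# the section headers below are an invented convention).
REQUIRED_SECTIONS = ("## Goal", "## Constraints", "## Definition of Done")

def ready_for_agent(issue_body: str) -> list[str]:
    """Return the section headers the issue is still missing."""
    return [s for s in REQUIRED_SECTIONS if s not in issue_body]

issue = """## Goal
Create a Kotlin-based Micronaut application with a single GET endpoint.

## Constraints
Use only the technologies listed in AGENTS.md.
"""
print(ready_for_agent(issue))  # ['## Definition of Done'] -> not ready to delegate yet
```

<p>A gate like this turns the "clear definition of done" requirement into code rather than convention.</p><p>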
It elevates the importance of the <strong>Business Analyst</strong> function, as creating well-defined, atomized, and unambiguous tasks is now a prerequisite for effective AI delegation. We must not only learn new skills for interacting with AI but also adapt our entire development workflow to match the capabilities of these new tools. The future isn't about replacing developers but about providing them with powerful new forms of leverage, provided we are willing to become the architects and managers of our new AI team members.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe to get more practical guides on using GenAI tools effectively in software development work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Does More Powerful AI Mean Slower Fixes?]]></title><description><![CDATA[Is it possible that our most advanced AI coding assistants are actually slowing us down?]]></description><link>https://www.nikmalykhin.com/p/does-more-powerful-ai-mean-slower</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/does-more-powerful-ai-mean-slower</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Tue, 26 Aug 2025 15:16:04 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Is it possible that our most advanced AI coding assistants are actually slowing us down? This question felt absurd as my team was heads-down, polishing our UI for a major release. We were in the final stretch, tackling a long list of small, cosmetic changes&#8212;the kind of work that should be quick. Yet, I found my workflow clogged, not by the complexity of the tasks, but by the "helpfulness" of my AI partner.</p><div><hr></div><h3>My Setup: The Final Polish</h3><p>Our environment was standard: a React codebase, a Git workflow with peer reviews, and an integrated AI coding assistant. My goal was to rapidly work through a backlog of minor UI tickets. Any UI update is a form of refactoring, and for that, I strictly follow a philosophy of making changes in what Javi L&#243;pez aptly calls "<a href="https://www.google.com/search?q=https://javil.substack.com/p/a-lot-of-tiny-steps-16eaac27acb4">a lot of tiny steps</a>," a pattern also known in classic terms as <a href="https://wiki.c2.com/?RefactoringInVerySmallSteps">Refactoring In Very Small Steps</a>. This ensures each commit is atomic and easy for my teammates to review. I was relying on the AI&#8217;s "Agent mode"&#8212;<strong>its capability to autonomously modify the codebase</strong>&#8212;expecting it to align with this micro-step approach. The reality was quite different.</p><div><hr></div><h3>When 'Help' Became a Hindrance</h3><p>The core problem was that the AI agent consistently over-engineered solutions for trivial problems. It treated every request for a small change as an invitation to refactor the entire component. 
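</p><p>The damage shows up directly in diff size. One way to keep an AI suggestion honest against the "very small steps" principle is to measure it before accepting it — a sketch using Python's <code>difflib</code> (my own tooling idea, not a feature of any assistant; the ten-line budget is arbitrary):</p>

```python
import difflib

TINY_STEP_BUDGET = 10  # max changed lines for a "tiny step" (arbitrary threshold)

def changed_lines(before: str, after: str) -> int:
    """Count added and removed lines between two versions of a file."""
    diff = difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
    # Skip the '---'/'+++' file headers; count only real +/- hunk lines.
    return sum(
        1
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )

original = "button { color: red; }\n"
suggested = original + "@media (max-width: 50rem) { button { width: 100%; } }\n"
exceeds = changed_lines(original, suggested) > TINY_STEP_BUDGET
print(exceeds)  # False: a one-line addition stays within the budget
```

<p>Anything over the budget is a signal to reject the suggestion and re-prompt for a narrower change.</p><p>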
This isn't a failure of intelligence, but a <strong>misalignment of goals</strong>: my goal was a minimal diff, whereas the agent's goal is often holistic file correctness, aiming to fix all potential issues it identifies in one pass. Crucially, even when I gave it explicit, TDD-style instructions to <em>only</em> perform a single, minimal action, it still defaulted to making broad, sweeping changes.</p><h4>Example 1: A Simple CSS Tweak</h4><p>I needed to make a submit button full-width on mobile devices. A straightforward task.</p><p>The fix that was actually needed:</p><p>CSS</p><pre><code><code>@media (max-width: 50rem) {
  .formSubmitMobileWrapper button {
    width: 100%;
  }
}
</code></code></pre><p>I prompted the AI agent: "<em>Only add a new media query for screens under 50rem to the </em><code>.formSubmitMobileWrapper button</code><em> class to set its width to 100%. Do not touch any other code.</em>"</p><p>Despite the clear instruction, the agent generated a massive diff, rewriting existing desktop styles and restructuring the entire CSS class.</p><ul><li><p><strong>Time Wasted:</strong> I spent 15 minutes untangling the AI's suggestion, versus the 2 minutes it would have taken to write the CSS myself.</p></li><li><p><strong>Quality Issues:</strong> The generated code created a high cognitive load for code review. A teammate would have to ask, "Why did we refactor all the button styles just to change one mobile property?"</p></li><li><p><strong>Structural Problems:</strong> This approach created bloated commits, making our Git history noisy and directly violating the "very small steps" principle.</p></li></ul><h4>Example 2: A Minor Accessibility Improvement</h4><p>Next, I picked up a ticket to improve the accessibility of our card components. Again, I gave a precise instruction: "<em>Add a </em><code>role='region'</code><em> attribute to the parent div of the Card component.</em>"</p><p>Instead of a one-line change, the agent tried to rewrite half the component's JSX structure, arguing it was for "better semantic clarity" and completely ignoring my focused instruction.</p><div><hr></div><h3>Principles That Actually Work</h3><p>This friction forced me to re-evaluate how I was using the tool. I realized the key is to <strong>match the tool's capability to the task's scope</strong>. This led me to two guiding principles.</p><h4>1. Use AI Chat for Suggestions, Not Implementation</h4><p>For micro-changes, the AI's "Chat mode" is far more effective. 
By treating it as a context-aware search engine, I can ask for targeted advice.</p><ul><li><p><strong>Prompt:</strong> "<em>What's the best CSS to make this button full-width on mobile?</em>"</p></li><li><p><strong>Result:</strong> It gives me the precise, minimal code snippet I need. I copy, paste, and commit. The change is atomic and review is trivial.</p></li></ul><p>This keeps the developer in control and prevents the AI from making unsolicited "improvements." The benefits are clear: smaller pull requests and faster review cycles. This aligns with research from <a href="https://www.faros.ai/blog/ai-software-engineering">Faros AI</a>, which notes that while AI can boost individual developer throughput, it often leads to ballooning review queues. I've written more about this in my article, "<a href="https://nik1379616.substack.com/p/can-we-make-ai-code-assistants-smarter">Can we make AI code assistants smarter?</a>".</p><h4>2. Reserve AI Agents for Scaffolding and True Refactoring</h4><p>The autonomous "Agent mode" is incredibly powerful, but its strength lies in larger, well-defined tasks, not surgical strikes.</p><ul><li><p><strong>Good use case:</strong> "<em>Create a new React component for a user profile page with an avatar, name, and bio section. Include Storybook stories and a basic test.</em>"</p></li><li><p><strong>Bad use case:</strong> "<em>Add a </em><code>margin-top</code><em> to the avatar in the user profile component.</em>"</p></li></ul><p>Using an agent is best when the expected outcome is a significant amount of new or changed code.</p><p><em>This simple matrix illustrates the core principle: for small-scoped tasks, a suggestion-based AI interaction is most effective, while large-scoped tasks are better suited for autonomous AI execution.</em></p><div><hr></div><h3>Unexpected Discovery: AI Forced Me to Define "Small"</h3><p>The most surprising insight was that the AI forced me to be more precise in defining a "small change." 
My heuristic is now this: <strong>if the task's description is longer than the code I expect to write, use Chat mode.</strong></p><p>A task like "Make the button full-width on mobile" is a perfect example. The description is simple, and the code is just a few lines. The AI agent, however, interprets this as a symptom of a larger problem ("This component is not fully responsive") and tries to solve that instead. This mental checkpoint prevents me from accidentally turning a 5-minute task into a 30-minute ordeal.</p><div><hr></div><h3>The Autonomy vs. Precision Trade-Off</h3><p>This leads to a central, counterintuitive truth: <strong>the more autonomy you grant an AI coding assistant, the less precision you may get for small, targeted tasks.</strong></p><p>This isn't a paradox; it's a trade-off. Autonomous agents are optimized for holistic correctness. They don't just see the three lines of CSS you want to add; they see the entire file and its potential imperfections. Their goal is to bring the whole file into a state of grace, which directly conflicts with the goal of making a minimal, targeted change.</p><p>Effective use, therefore, requires the developer to:</p><ul><li><p><strong>Explicitly define the scope</strong> of the change before starting.</p></li><li><p><strong>Choose the right mode</strong> for the job (Chat vs. Agent).</p></li><li><p><strong>Maintain control</strong> and view the AI as a suggester, not an infallible executor, for routine work.</p></li></ul><div><hr></div><h3>A More Thoughtful Partnership</h3><p>My journey through pre-release UI tweaks taught me a crucial lesson. AI coding tools aren't a simple "on/off" switch for productivity. They are a suite of capabilities, each with an appropriate use case. 
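</p><p>That matching of tool to scope is simple enough to state as a toy decision rule (a sketch of the heuristic above, not a real tool; both example tasks are invented):</p>

```python
def choose_mode(task_description: str, expected_code: str) -> str:
    """Toy rule of thumb: when describing the change takes more text than
    the change itself, ask Chat mode for a snippet instead of the agent."""
    return "chat" if len(task_description) > len(expected_code) else "agent"

# A surgical accessibility tweak: the description outweighs the code.
print(choose_mode("Add role='region' to the Card component's parent div",
                  "role='region'"))  # chat

# Scaffolding a new component: the expected code dwarfs the description.
new_component = "function ProfilePage() { /* avatar, name, bio, stories, tests... */ }"
print(choose_mode("Create a user profile page component", new_component))  # agent
```

<p>Character counts are a crude proxy; the value of the rule is the mental checkpoint, not the arithmetic.</p><p>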
An autonomous agent is a powerful ally for building new things from the ground up, but for the delicate art of finishing and polishing, a simple chat-based suggestion is often faster, cleaner, and more respectful of my teammates' time. The real skill in this new era of software development is not just in writing clever prompts, but in having the wisdom to choose the right tool for the job.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe to get more practical guides on using GenAI tools effectively in software development work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[The Context Window Paradox: To Get More From Your AI, Give It Less]]></title><description><![CDATA[When I started working with LLMs that have massive, million-token context windows, I fell into a trap.]]></description><link>https://www.nikmalykhin.com/p/the-context-window-paradox-to-get</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/the-context-window-paradox-to-get</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Mon, 18 Aug 2025 14:02:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" 
type="image/jpeg"/><content:encoded><![CDATA[<p>When I started working with LLMs that have massive, million-token context windows, I fell into a trap. I figured that more context meant less work for me. I could just dump a project's entire history into the chat and trust the model to keep everything straight. It seemed logical. But as I found out while co-authoring a technical presentation with an AI, this assumption is not just wrong; it&#8217;s backward. My attempts to treat the AI like a partner with a perfect memory led to subtle but persistent quality issues. It was only when I started being deliberately restrictive with the context that the results genuinely improved. This led me to a counterintuitive conclusion: to get the most out of a large context window, you have to actively manage and constrain it.</p><div><hr></div><h2>My Setup: Crafting a Presentation with an LLM</h2><p>For this experiment, I set out to create a new slide deck, "Introduction to Agents," using <strong>Gemini 2.5 Pro</strong> as my authoring partner. My methodology is rooted in agile principles&#8212;starting with a high-level outline and iteratively building out the content. The goal is to offload the heavy lifting of drafting, allowing me to focus on the core narrative and technical accuracy.</p><p>I began with a highly structured initial prompt. This approach, refined from my <a href="https://nik1379616.substack.com/p/pair-authoring-with-an-ai-a-case">previous work on pair-authoring with AI</a>, was designed to establish clear rules of engagement for our collaboration from the start.</p><pre><code><code>Hello,

I need your help preparing a presentation. Here is the context and the way I would like to work with you.

1. The Goal:
My objective is to create a [e.g., 15-minute] presentation for [e.g., the AI For Software Delivery festival]. The final output should be a complete script, ready to be put onto slides.

2. Source Materials:
Here are the links and documents you should use...

3. Our Workflow (How We Will Work):
We will follow a structured, step-by-step process. I want to approve each step before we move to the next.
  Step 1: High-Level Plan...
  Step 2: Section Planning...
  Step 3: Slide-by-Slide Drafting...
  Step 4: Finalizing Sections...

4. Content Structure (What I Want for Each Slide):
A) On-Slide Text: ... minimalist and concise.
B) Accompanying Speech: ... comprehensive and explain the concepts...

5. Tone and Style:
... Professional and Neutral ... Not Promotional...

Your First Task:
Please review the source materials I provided and propose a high-level outline for the presentation.
</code></code></pre><p>This prompt was my attempt to front-load the process with as much structure as possible.</p><div><hr></div><h2>The Subtle Failure of a Well-Laid Plan</h2><p>Even with this detailed plan, I began noticing a subtle degradation in quality as our conversation grew longer. The failure wasn't catastrophic, but it manifested as a slow accumulation of small inconsistencies and extra work.</p><p>My chat history filled up with micro-corrections. At one point, the AI generated speech that claimed I worked at Thoughtworks on a project I had explicitly described from a previous job at Mimecast. The fix was a simple prompt:</p><pre><code><code>fix for B) Accompanying Speech (Revised):
###
For example, I previously worked on a phishing detection team here at Thoughtworks.
@@@
I worked in Mimecast, not in Thoughtworks.
###
</code></code></pre><p>Other times, I had to refine the AI's understanding of the scope: <code>We analyzed an email</code>, not <code>We'd analyze a URL</code>. While individually minor, the frequency of these small edits increased as the context grew. The AI's attention seemed to drift, anchored more by recent turns in the conversation than by the foundational context we had established. The presentation began to feel less like a single, coherent story and more like a collection of loosely related talking points that required my constant, vigilant correction.</p><div><hr></div><h2>Principles That Actually Work</h2><p>This experience led me to refine my approach, focusing on two principles that proved to be more effective.</p><h3>1. Enforce a Disciplined Pace (Decompose and Verify)</h3><p>My initial prompt already specified a slide-by-slide workflow. However, I discovered that I had to actively enforce this pace. At times, the AI would try to rush ahead, offering to generate multiple slides or an entire section at once. I learned to be firm, using prompts like <code>always wait my approve</code> and <code>before we will move to section 2, combine all approved slides and speeches for section 1</code>.</p><p>The key insight here was that <strong>it wasn't enough to state the plan; I had to actively rein in the AI's eagerness to generate</strong>. This forced discipline kept each unit of work small and verifiable. By confirming the correctness of each individual slide before moving on, I ensured the foundation for the next one was solid.</p><h3>2. Proactively Distill the Context</h3><p>This was the most impactful change. Acknowledging the limitations of the AI's attention, I began a practice of "proactive context distillation." After completing and approving a section, I would instruct the AI to summarize what we had just created.</p><pre><code><code>"Excellent. Please summarize the content of the two slides we just created for Section 1 into a concise overview."
</code></code></pre><p>This technique serves two purposes. First, it acts as a "context refresher" for the AI, collapsing the detailed turn-by-turn history into a dense, high-signal summary. As explained by IBM, a <a href="https://www.ibm.com/think/topics/context-window">context window</a> isn't a perfect memory; a clean summary places the most important information front and center. Second, it keeps the most relevant information active, preventing the model's attention from being diluted by the noise of a long conversation.</p><div><hr></div><h2>An Interesting Realization: It&#8217;s For Me, Not Just the AI</h2><p>Initially, I viewed context distillation as a trick to manage the AI. But a valuable realization was how much it improved my own thinking. The act of requesting and reviewing these summaries forced me to constantly re-evaluate the presentation's narrative arc. Was Section 2 a logical continuation of Section 1? Did the key messages connect?</p><p>This process mirrors what we know about human cognition. Our own <a href="https://en.wikipedia.org/wiki/Working_memory#Theories">working memory</a> is limited. By creating these summaries, I wasn't just helping the AI; I was building a better mental model for myself. The distillation process became a forcing function for clarity, ensuring that I, the human author, remained in firm control of the narrative.</p><div><hr></div><h2>Closing the Loop: Refining the Process with the AI</h2><p>After the presentation script was complete, I tried one final experiment. I asked the AI to become my process consultant.</p><pre><code><code>"Now, analyze our chat for understanding our pattern of communication and work under the presentation. After that, check my initial prompt... Do we need to improve the original prompt or does it look good enough?"
</code></code></pre><p>The AI analyzed our entire interaction and proposed an improved V2 of my starting prompt. The new version added subtle but important clarifications, such as describing the process as <code>iterative</code> and <code>flexible</code>. This idea of having the tool refine its own operating instructions isn't new to my workflow; I'd explored a <a href="https://nik1379616.substack.com/p/can-we-make-ai-code-assistants-smarter">similar pattern when getting Copilot to improve its own rules</a>. Applying it to a conversational authoring process, however, felt like a significant step forward. It&#8217;s a powerful demonstration of using the tool not just to execute a task, but to reflect on and improve the very process of collaboration itself.</p><div><hr></div><h2>The Central Paradox of AI Collaboration</h2><p>This leads to the central, counterintuitive truth I discovered: <strong>To effectively leverage an AI's massive context window for a complex project, you must actively manage and constrain the context you provide it.</strong></p><p>This paradox exists because a large context window is not a perfect memory. It is a probabilistic field of attention. Without deliberate guidance, the AI can get lost in the details, overweighting recent conversation turns and losing the foundational plot.</p><p>Effective use, therefore, requires more than just good prompting. It requires:</p><ul><li><p><strong>Active Context Management</strong>: You must act as the session manager.</p></li><li><p><strong>Enforced Discipline</strong>: A structured, step-by-step pace that you actively maintain.</p></li><li><p><strong>Strategic Summarization</strong>: Periodically compressing the state of the project to maintain focus.</p></li></ul><div><hr></div><h2>Conclusion: From Prompter to Director</h2><p>Treating a generative AI as a simple instruction-follower for complex work is a path to mediocre results. The initial promise of "thinking less" is a mirage. 
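</p><p>Mechanically, though, the discipline that works is small. A sketch of the distill-after-each-section loop, with a stub standing in for the model call and invented section titles (my own illustration, not code from the project):</p>

```python
# Distill each approved section; never carry the raw transcript forward.
foundation = "Goal: 'Introduction to Agents' deck. Tone: neutral. Approve each slide."

def summarize(draft: str) -> str:
    """Stub: pretend the model compresses an approved section to its gist."""
    return draft.splitlines()[0]

summaries = []
for section in ("Section 1: What is an agent?", "Section 2: Tools and loops"):
    draft = section + "\n...full slide text and speech, approved slide by slide..."
    summaries.append(summarize(draft))             # distill the approved work
    context = "\n".join([foundation] + summaries)  # what the next prompt sees
print(context)  # the foundation plus a few dense lines, not the whole transcript
```

<p>Each new prompt then sees the foundational instructions plus a handful of dense summaries instead of the full turn-by-turn history.</p><p>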
Instead, these tools invite us to think <em>differently</em>.</p><p>The real leverage comes from shifting our role from a mere prompter to that of a director. Our job is not just to provide instructions but to manage the state, curate the context, and guide the narrative. By enforcing a disciplined pace, proactively distilling the context, and even using the AI to refine our methods, we don't just mitigate the tool's weaknesses; we sharpen our own thinking. The AI provides the immense power of generation, but human oversight provides the coherence and intent that turns raw output into a valuable, finished product. This collaborative dance is the future of knowledge work.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! 
Subscribe to get more practical guides on using GenAI tools effectively in software development work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Can We Make AI Code Assistants Smarter by Asking Them to Write Their Own Rules?]]></title><description><![CDATA[It seems logical, doesn't it?]]></description><link>https://www.nikmalykhin.com/p/can-we-make-ai-code-assistants-smarter</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/can-we-make-ai-code-assistants-smarter</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Wed, 30 Jul 2025 10:12:58 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>It seems logical, doesn't it? To get better, more consistent output from an AI pair programmer, you should give it a clear set of instructions. My thinking went a step further: why reinvent the wheel? Why not just grab a comprehensive, battle-tested list of rules from another advanced tool and plug it into my setup?</p><p>This experiment was a continuation of my explorations into <a href="https://nik1379616.substack.com/p/ai-assisted-terraform-refactoring">AI-assisted development</a>, but it led me somewhere unexpected. The attempt to create a "plug-and-play" expert failed, but the lessons learned revealed a far more effective and collaborative way of working with AI.</p><h3>Personal Context &amp; Tools</h3><p>My daily workflow is centered in Visual Studio Code. 
The key players in this experiment were:</p><ul><li><p><strong>GitHub Copilot:</strong> The core AI assistant, used in both inline and chat modes.</p></li><li><p><strong>Copilot Instructions Feature:</strong> The ability to provide custom guidance via a <code>.vscode/copilot-instructions.md</code> file.</p></li><li><p><strong>My Goal:</strong> To make Copilot's suggestions adhere to my project's specific coding standards without constant manual correction.</p></li></ul><p>I typically follow a pragmatic approach to development, valuing consistency and clarity over dogmatic adherence to any single methodology. The goal was to codify this pragmatism into rules for my AI partner.</p><h3>The Failed Experiment: The "Universal Rulebook" Fallacy</h3><p>My hypothesis was simple: a good set of rules for one AI agent should be good for another. I had been using the Cursor editor with the popular <code>awesome-cursorrules</code> repository, and it worked well <em>in that environment</em>. The rules helped Cursor generate clean, consistent code. I assumed I could port this success over to GitHub Copilot in VS Code.</p><p>I copied a large chunk of these proven rules directly into my <code>.vscode/copilot-instructions.md</code> file. The result was a lesson in context. Rules are not universally portable.</p><ol><li><p><strong>Context-Blind Enforcement:</strong> Even on a relatively fresh project, the agent became overly aggressive. I had a rule like <code>"Always use arrow functions for React components."</code> While my project did use arrow functions, Copilot began to apply this rule with a sledgehammer, suggesting aggressive refactors on any function it encountered, often in ways that broke the subtle stylistic patterns my team had established. 
It lacked judgment, creating noise and unnecessary churn.</p></li><li><p><strong>Verbose and Noisy Suggestions:</strong> Another rule I implemented was, <code>"Ensure all functions are documented with detailed JSDoc comments."</code> When I then asked Copilot to help with a simple utility function, the output was technically correct but practically absurd. An illustrative example would look something like this:</p><p>TypeScript</p></li></ol><pre><code><code>/**
 * @param {string} str The string to capitalize.
 * @returns {string} The capitalized string.
 */
const capitalize = (str) =&gt; str.charAt(0).toUpperCase() + str.slice(1);
</code></code></pre><p>The documentation, forced by the rule, was longer than the code itself. This added cognitive overhead for simple, self-explanatory functions.</p><p>This approach abandoned the core idea of context-aware assistance. The specific negative outcomes were clear:</p><ul><li><p><strong>Quality issues:</strong> The AI-generated code felt alien. It followed the new, transplanted rules but ignored the project's existing patterns, creating a maintenance headache.</p></li><li><p><strong>Impact on iteration:</strong> Instead of accelerating development, the rigid rule set became a bottleneck. It was like working with an overzealous assistant who had memorized a textbook but had zero practical field experience.</p></li><li><p><strong>Quantifiable problems:</strong> I spent more time fighting, deleting, or manually editing Copilot's "helpful" but misguided suggestions than I would have spent just writing the code myself. My flow state was constantly broken.</p></li></ul><h3>Principles That Actually Work</h3><p>After deleting the entire instruction file in frustration, I started over. This time, I discovered a couple of principles that transformed the AI from a dogmatic rule-follower into a genuine collaborator.</p><h4>1. Co-Author Your Instructions <em>with</em> the AI</h4><p>Instead of pasting in a foreign set of rules, I used a built-in VS Code feature. In my empty <code>.vscode/copilot-instructions.md</code> file, I used the <strong>"Generate instructions..."</strong> command. Copilot analyzed the code in my current workspace and then proposed a set of instructions tailored to my project's reality.</p><p>It identified existing patterns and suggested rules to reinforce them. It was the complete opposite of my first experiment: instead of forcing my code to conform to the rules, the rules were generated to conform to my code. 
This aligns with the core tenets of <a href="https://martinfowler.com/bliki/TestDrivenDevelopment.html">Test-Driven Development (TDD)</a>, where the desired outcome (in this case, the existing code style) defines the path forward.</p><p><strong>Benefit:</strong> The rules are organic, project-specific, and context-aware from day one.</p><h4>2. Use the AI to Refine Its Own Instructions</h4><p>My initial mistake was treating instructions as a static document. The effective approach is to make the AI itself a partner in refining them. My new workflow is a continuous, AI-driven feedback loop:</p><p>After a pairing session with the agent, I switch from the inline mode to the <strong>Copilot Chat view</strong>. There, I prompt it to perform a self-assessment:</p><p><code>"Analyze our recent conversation. Based on the guidance and corrections I provided, suggest improvements to my .vscode/copilot-instructions.md file."</code></p><p>The AI then reviews our interaction&#8212;including the times I had to give it procedural hints like <code>"let's make a short overview before writing the code"</code>&#8212;and suggests new, more effective rules. This delegates the work of refining the instructions to the tool that will be using them. It's a powerful and efficient way to make the AI a better collaborator over time.</p><h3>Example: My Evolved Instruction File</h3><p>After a few of these refinement cycles, my <code>.vscode/copilot-instructions.md</code> file started to look less like a generic style guide and more like a practical collaboration agreement. It's a living document, but here&#8217;s a snapshot of what it contains:</p><pre><code># AWS Serverless Infrastructure as Code Guidelines

This project implements a serverless application using AWS Lambda and API Gateway, with infrastructure defined in Terraform. Follow these guidelines when making changes:

## Project Architecture

### Component Structure

```
src/              # Lambda function implementations
&#9500;&#9472;&#9472; hello_world.py     # Example Lambda handler
&#9492;&#9472;&#9472; [other_functions]  # Additional Lambda functions
terraform/        # Infrastructure definition
&#9500;&#9472;&#9472; modules/      # Reusable Terraform modules
&#9492;&#9472;&#9472; main.tf       # Main infrastructure configuration
tests/            # Integration tests
&#9492;&#9472;&#9472; test_api.py   # API endpoint tests
```

### Key Design Patterns

1. **Lambda Function Structure**:

   ```python
   # Standard Lambda handler pattern - follow this structure
   def lambda_handler(event, context):
       logger.info("Processing request")  # Always log entry
       # ... function logic ...
       return {
           'statusCode': 200,
           'body': json.dumps(result)
       }
   ```

2. **Terraform Module Usage**:
   ```hcl
   # Follow this pattern when adding new Lambda functions
   module "my_function" {
     source = "./modules/lambda"
     function_name = "&lt;service&gt;-&lt;action&gt;"
     source_file   = "../src/&lt;filename&gt;.py"
     handler       = "&lt;filename&gt;.lambda_handler"
     runtime       = "python3.9"
   }
   ```

## Code Quality Standards

- Prefer explicit, descriptive variable names over short, ambiguous ones
- Follow the existing project's coding style for consistency
- Use named constants instead of hardcoded values
  Example:

```python
# Good
MAX_API_RETRIES = 3
is_api_healthy = retry_count &lt; MAX_API_RETRIES

# Avoid
m = 3
healthy = n &lt; m
```

## Development Approach

- Don't invent changes beyond what's explicitly requested
- Follow security-first approach in all code modifications
- Don't modify files outside the requested scope
- Don't suggest improvements to files not mentioned in the task
  Example of focused scope:

```typescript
// Request: "Add email validation to User class"
// Good - only modifying requested file
class User {
  validateEmail(email: string): boolean {
    const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
    return emailRegex.test(email);
  }
}

// Avoid - suggesting changes to other files
// &#10060; "We should also update UserRepository.ts"
// &#10060; "Let's improve the existing validation in Utils.ts"
```

## Communication Style &amp; Protocol

## Step-by-Step Communication Pattern

```
1. User Request
   User: "Need to implement email validation"

2. Copilot Overview
   Copilot: "Overview: Adding email validation
   - Will create test for invalid email (15 lines)
   - Will implement validator (20 lines)
   Let's start with the test?"

3. User Approval
   User: "Looks good"

4. Implementation
   Copilot: *provides code in proper format*

5. Confirmation
   User: "OK" or "Looks good"
```

## Test-Driven Development Workflow

## TDD Cycle

```
&#9484;&#9472;&#9472; 1. Discuss Test Requirements
&#9474;   User: "Need password validation"
&#9474;   Copilot: "Let's test minimum length first"
&#9474;
&#9500;&#9472;&#9472; 2. Write Test (Red)
&#9474;   describe('PasswordValidator', () =&gt; {
&#9474;     it('requires minimum 8 characters', () =&gt; {...}
&#9474;   });
&#9474;
&#9500;&#9472;&#9472; 3. Implement Code (Green)
&#9474;   class PasswordValidator {
&#9474;     isValid(password: string): boolean {...}
&#9474;   }
&#9474;
&#9500;&#9472;&#9472; 4. Optional: Refactor
&#9474;   - Improve naming
&#9474;   - Remove duplication
&#9474;   - Enhance readability
&#9474;
&#9492;&#9472;&#9472; 5. Next Test or Complete
    - User approval required before proceeding
```

### Test Guidelines

- Write focused, single-purpose API integration tests
- Always use `get_api_url()` to fetch endpoints dynamically
- Test edge cases explicitly and include proper delays
- Use descriptive test names
  Example:

```python
# Good test names
def test_endpoint_returns_200_on_valid_input():
def test_endpoint_handles_empty_payload():
def test_endpoint_returns_404_on_invalid_path():

# Good patterns
def test_new_endpoint():
    api_url = get_api_url()
    time.sleep(5)  # Allow API Gateway to propagate
    response = requests.get(f"{api_url}/path")
    assert response.status_code == 200
    assert "expected_value" in response.text
```

### Running Tests

```bash
cd tests
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pytest -v
```</code></pre><h3>Unexpected Discovery: Guidance Trumps Raw Power</h3><p>Here's the most surprising insight: a well-instructed GitHub Copilot, even using the standard (and technically free) models, consistently produces more useful, contextually-aware code than a more advanced model like Claude 3.5 Sonnet working <em>without instructions</em>.</p><p>The raw power of a cutting-edge LLM often leads to more "creative" but less relevant code for the specific task at hand. My carefully curated instruction set, running on a standard model, was simply a better pair programmer for <em>my</em> project.</p><p><strong>Why this matters:</strong> It proves that effective AI assistance is less about the raw intelligence of the Large Language Model and more about the quality of the guidance you provide. Thoughtful engineering and context-setting can be more valuable than simply paying for a more powerful brain.</p><h3>The Central Paradox: To Think Less, You Must First Think More</h3><p>This leads to the central paradox of using AI assistants effectively: <strong>to offload cognitive work to an AI, you must first do the meta-cognitive work of codifying your own development philosophy and collaboration style.</strong></p><p>You can't just install a tool and expect it to read your mind. This paradox exists because AI assistants are not colleagues; they are incredibly sophisticated pattern matchers. 
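</p><p>To make that concrete, here is a sketch (hypothetical code, not taken from the project) of the difference instructions make for the Lambda handler pattern described earlier:</p><pre><code>import json
import logging

logger = logging.getLogger(__name__)

# Without instructions: a generic handler from the training data's
# most common patterns. It works, but ignores project conventions.
def handler_generic(event, context):
    return {"statusCode": 200, "body": json.dumps({"message": "ok"})}

# With instructions: matches the project's documented pattern --
# entry logging, explicit status code, JSON-encoded body.
def lambda_handler(event, context):
    logger.info("Processing request")  # always log entry, per the rules
    result = {"message": "ok"}
    return {"statusCode": 200, "body": json.dumps(result)}
</code></pre><p>Both functions satisfy the request; only one looks like it belongs in the codebase.</p><p>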
Without your explicit context, they will default to the most generic patterns from their training data.</p><p>Effective use actually requires:</p><ol><li><p><strong>Self-Awareness:</strong> A clear understanding of your own coding patterns and project conventions.</p></li><li><p><strong>Iterative Refinement:</strong> Treating the AI's instruction set as a project artifact that evolves over time.</p></li><li><p><strong>Collaborative Mindset:</strong> Shifting from "commanding" the AI to "guiding" its process.</p></li></ol><h3>Forward-Looking Conclusion</h3><p>The dream of a universal, plug-and-play rulebook for AI assistants is a dead end. As I found in my previous reflections on whether we can <a href="https://thoughtworks.medium.com/https-www-thoughtworks-com-insights-blog-generative-ai-do-developers-need-think-less-ai-203f608de4bb">think less with AI</a>, the answer is no&#8212;we must think differently.</p><p>The <code>copilot-instructions.md</code> file is not just a configuration; it's the DNA of your AI collaborator for a specific project. It should be checked into version control and evolve alongside your <code>README.md</code> and <code>package.json</code>.</p><p>Stop searching for the perfect list of rules to copy and paste. Start with an empty file, click "Generate instructions...", and then use the chat to refine them after each session. The most effective AI assistant isn't the one with the most powerful model, but the one you&#8217;ve taken the time to teach.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! 
Subscribe to get more practical guides on using GenAI tools effectively in software development work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Pair-Authoring with an AI: A Case Study in Structured Collaboration]]></title><description><![CDATA[Moving beyond simple text generation to a collaborative partnership with a Large Language Model.]]></description><link>https://www.nikmalykhin.com/p/pair-authoring-with-an-ai-a-case</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/pair-authoring-with-an-ai-a-case</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Thu, 10 Jul 2025 14:13:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Moving beyond simple text generation to a collaborative partnership with a Large Language Model.</p><h3><strong>1. The Tempting Proposition: "Just Write the Presentation"</strong></h3><p>Every complex project begins with a familiar challenge: a folder full of raw material and a blank canvas. In my case, it was a collection of dense Google Docs, a GitHub repository, and a clear goal&#8212;to create a polished 15-minute presentation for an upcoming conference.</p><p>The temptation, of course, was to turn to a Large Language Model and use what I call a "lazy prompt": <em>"Here are the documents, write me a 15-minute presentation."</em></p><p>But my past experiments in AI-assisted development have taught me a crucial lesson. 
This approach almost inevitably leads to what I've previously described as a "structural mess" in a recent article, <a href="https://www.thoughtworks.com/insights/blog/generative-ai/do-developers-need-think-less-ai">Do developers need to think less with AI?</a> The AI can generate content, but it cannot, on its own, generate a compelling structure from ambiguous source material.</p><p>Knowing this, I adopted a different role&#8212;not of a micromanager, but of a director and critic. My job wasn't to define every expectation in a vacuum. Instead, our process became a rapid, iterative loop: the AI would propose a draft, and I would review it, using my critique to constantly clarify and refine my expectations for the next generation, from the high-level outline down to the tone of a single sentence.</p><h3><strong>2. Two Principles for a Better Partnership</strong></h3><p>To avoid the disappointing results of lazy prompting, I applied a more disciplined approach based on two core principles. These principles weren't just about getting a better output; they were about creating a more effective and predictable collaborative process with the AI.</p><p><strong>1. Define the "Pairing Contract" Upfront</strong></p><p>Before generating any content, we established a clear "contract" that would govern our entire interaction. This wasn't a formal document, but a set of initial instructions that aligned the AI with my expectations. The key clauses of this contract were:</p><ul><li><p><strong>A Step-by-Step Workflow:</strong> We explicitly agreed to work in small, approved increments. The AI would propose a plan for a section or a draft for a slide, and I would approve or critique it before we moved on. 
This prevented the AI from getting too far ahead on a wrong path.</p></li><li><p><strong>A Dual-Output Structure:</strong> For every slide, I specified two distinct deliverables: minimalist <strong>on-slide text</strong> for readability and a more detailed <strong>accompanying speech</strong> for the narrative. This separation of concerns is critical for creating effective presentations.</p></li><li><p><strong>A Pre-Defined Tone:</strong> We established that the voice should be "professional and neutral," not "promotional." This simple directive guided the tone of every piece of generated text.</p></li></ul><p>This upfront alignment ensured the AI's "creative freedom" was always channeled within the bounds of my strategic goals for the project.</p><p><strong>2. A Lot of Tiny Steps</strong></p><p>Borrowing a key principle from effective software development, I resisted the urge to ask the AI to solve large problems in one go. Instead, we broke down the task of creating the presentation into a hierarchy of tiny, manageable steps:</p><ul><li><p>First, we debated and finalized the <strong>high-level outline</strong> of the entire presentation.</p></li><li><p>Next, we planned the narrative flow for <strong>one section at a time</strong>.</p></li><li><p>Then, we drafted the content for only <strong>one slide at a time</strong>.</p></li><li><p>Finally, once the core content was set, we refined the <strong>smallest details</strong>, like the wording of a single bullet point or the line breaks in a sentence for better visual balance.</p></li></ul><p>This micro-iterative approach kept the AI's output focused and reviewable. It allowed for constant course correction and ensured that every component, from the overall structure down to the final polish, met my expectations before we considered it complete.</p><h3><strong>3. 
An Unexpected Discovery: The Power of Separated Concerns</strong></h3><p>In any deep-dive process, some of the most valuable insights are the ones you don't anticipate. While I expected the step-by-step workflow to be effective, I underestimated the impact of one particular clause from our initial "Pairing Contract": the strict separation of <strong>On-Slide Text</strong> and <strong>Accompanying Speech</strong>.</p><p>Initially, this seemed like a simple, practical rule for creating a presentation. However, it quickly revealed itself to be a powerful disciplinary tool that forced a higher level of clarity throughout the entire process. Here's how:</p><ul><li><p><strong>It Forced Conciseness.</strong> The most common failure mode of presentations is a slide cluttered with text. By having a dedicated place for the detailed narrative (the speech), we were forced to be ruthless with the on-slide text. The question was no longer "What should this slide say?" but "What is the absolute minimum text needed on this slide while I am speaking?" This led to cleaner, more impactful visuals.</p></li><li><p><strong>It Clarified the Narrative.</strong> The dual-output structure forced us to constantly distinguish between the <em>visual aid</em> (the slide) and the <em>story</em> (the speech). At every step, we had to decide what the audience should <em>see</em> to anchor them, and what they should <em>hear</em> to understand the deeper context. This clarified the purpose of every single element we created.</p></li><li><p><strong>It Improved the AI's Output.</strong> Giving the AI two smaller, distinct tasks consistently produced better results than one larger, ambiguous task. A prompt like "Write three concise bullet points for this slide" yielded more useful output than "Create content for a slide about Topic X."</p></li></ul><p>What began as a simple formatting rule for a presentation revealed itself to be a core principle for effective AI collaboration in general. 
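</p><p>In practice, the dual-output request can be captured in a compact prompt template. The wording below is a hypothetical reconstruction, not the exact contract we used:</p><pre><code>For each slide, produce two deliverables:

1. On-slide text: at most 3 bullet points, at most 6 words each.
2. Accompanying speech: 60-90 seconds, professional and neutral tone,
   carrying the narrative detail the slide omits.
</code></pre><p>Two small, well-bounded tasks per slide, instead of one ambiguous one.</p><p>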
It is a powerful strategy for ensuring clarity and quality in any complex, multi-layered project. Ultimately, the AI didn't reduce my cognitive load; it shifted it. It took on the heavy lifting of drafting text, which freed me up&#8212;and required me&#8212;to focus entirely on the higher-level strategic thinking involved in shaping the narrative, structure, and tone.</p><h3><strong>Conclusion: Thinking Differently</strong></h3><p>This journey of creating a presentation with an AI partner confirmed a critical insight: the promise of these tools is not to reduce the need for human thought, but to change its nature. The most effective approach I've found combines the speed and pattern-recognition of the AI with the architectural thinking and quality standards that an experienced professional brings to the table.</p><p>Our success didn't come from a single, brilliant prompt. It came from a disciplined process: establishing a "pairing contract," taking a lot of tiny, deliberate steps, and enforcing a clear separation of concerns.</p><p>Ultimately, these tools are powerful amplifiers of our own capabilities, not replacements for our judgment. They excel at generating drafts, suggesting patterns, and handling the tactical work of filling in the details. However, they require a human director to provide thoughtful integration, strategic thinking, and a clear vision for the final product.</p><p>This isn't about thinking less&#8212;it's about learning to think more architecturally about the creative process itself. In my experience, the key to success is learning to direct these powerful tools effectively. 
As with any powerful tool, the final quality of the work lies not in the tool itself, but in the wisdom and discipline of its user.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe to get more practical guides on using GenAI tools effectively in software development work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Can We Think Less with AI?]]></title><description><![CDATA[A reflection on AI-assisted software development and the myth of effortless coding]]></description><link>https://www.nikmalykhin.com/p/can-we-think-less-with-ai</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/can-we-think-less-with-ai</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Thu, 03 Jul 2025 08:32:44 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>The Tempting Proposition</h3><p>The promise of AI-powered development tools is seductive: write natural language descriptions, get working code, move fast, ship features. Tools like GitHub Copilot and Cursor have become staples in many developers' workflows, mine included. 
I use VSCode with Copilot as my primary environment, with Cursor as my secondary platform for cross-checking and refining solutions.</p><p>But after several weeks of experimenting with different approaches to AI-assisted development, I've come to a counterintuitive conclusion: <strong>we cannot think less with AI.</strong></p><p>My initial approach was what I'd call "lazy prompting"&#8212;throwing poorly constructed, vague requests at AI tools and expecting magic. Here are examples of the kind of prompts I was using:</p><ol><li><p>"The resources created in the <code>rds.tf</code> file could be placed in their own module. How would you go about it? Which variables and outputs are necessary for the module to deliver the functionality?"</p></li><li><p>"Expose the ECS module output for ecr_url in root <code>outputs.tf</code> as well."</p></li></ol><p>These prompts, while technically clear, were asking the AI to make too many architectural decisions at once without proper context. Despite having established <a href="https://nik1379616.substack.com/i/166244759/setting-the-foundation-rules-based-ai-development">foundation rules for AI development</a>, I wasn't applying them consistently.</p><p>The results were consistently disappointing:</p><ul><li><p><strong>Information overload</strong>: AI would generate vast amounts of code that technically worked but was difficult to comprehend</p></li><li><p><strong>Constant rollbacks</strong>: Every 20 minutes, I found myself undoing changes and starting over</p></li><li><p><strong>Non-iterative code</strong>: The output was functional but rigid, making incremental improvements nearly impossible</p></li><li><p><strong>Structural mess</strong>: While the code fulfilled requirements, it lacked coherent architecture</p></li></ul><p>The code worked, but it wasn't <em>good</em> code. 
More importantly, it wasn't code I could build upon.</p><h3>Two Principles That Work</h3><p>I had been using these principles in my earlier AI development work, but my recent experiment with "lazy prompting" confirmed just how essential they are for effective AI assistance:</p><h4>1. Small Steps</h4><p>Rather than asking AI to solve large, complex problems in one go, I've learned to break work into smaller, focused increments. This aligns with the principle of taking <a href="https://blog.devgenius.io/a-lot-of-tiny-steps-16eaac27acb4">a lot of tiny steps</a> in software development. This approach:</p><ul><li><p>Keeps the AI's output manageable and reviewable</p></li><li><p>Allows for course corrections before investing too much time</p></li><li><p>Maintains code quality by preventing architectural drift</p></li><li><p>Enables better understanding of each component</p></li></ul><h4>2. Pair Programming with AI</h4><p>The most significant shift in my thinking came from treating AI as a pair programming partner rather than a code generator. This builds on the established benefits of <a href="https://martinfowler.com/articles/on-pair-programming.html">pair programming</a> while adapting them for AI collaboration. 
This means:</p><ul><li><p><strong>Active engagement</strong>: Continuously reviewing and questioning the AI's suggestions</p></li><li><p><strong>Collaborative iteration</strong>: Building solutions together rather than accepting wholesale output</p></li><li><p><strong>Maintained agency</strong>: Staying in control of architectural decisions and code quality</p></li><li><p><strong>Continuous learning</strong>: Understanding what the AI produces rather than blindly accepting it</p></li></ul><p>This approach mirrors the benefits of traditional pair programming&#8212;better code quality, knowledge sharing, and reduced bugs&#8212;while leveraging AI's strengths in pattern recognition and rapid prototyping.</p><h3>The Pomodoro Connection</h3><p>An unexpected discovery was how well the <a href="https://en.wikipedia.org/wiki/Pomodoro_Technique">Pomodoro Technique</a> complements AI-assisted development. In traditional pair programming, natural breaks occur when your partner steps away for coffee or other needs. These interruptions, while sometimes frustrating, provide valuable thinking time.</p><p>When pairing with AI, these natural breaks disappear. The AI never gets tired, never needs coffee, and never suggests taking a step back. This can lead to tunnel vision and mental fatigue. The Pomodoro Technique artificially introduces these crucial breaks, providing time to:</p><ul><li><p>Reflect on the direction of the work</p></li><li><p>Assess code quality objectively</p></li><li><p>Consider alternative approaches</p></li><li><p>Prevent the cognitive overload that comes from continuous AI interaction</p></li></ul><p>For VSCode users, extensions like "Pomodoro Timer" can help integrate these breaks directly into your development workflow.</p><h3>The Thinking Paradox</h3><p>The central paradox of AI-assisted development is this: tools that promise to reduce cognitive load actually require more disciplined thinking to use effectively. 
Success with AI development tools depends on:</p><ul><li><p><strong>Clear problem articulation</strong>: Better prompts lead to better solutions</p></li><li><p><strong>Architectural awareness</strong>: Understanding how generated code fits into the larger system</p></li><li><p><strong>Quality assessment</strong>: Evaluating AI output against engineering standards</p></li><li><p><strong>Strategic thinking</strong>: Knowing when to accept, modify, or reject AI suggestions</p></li></ul><h3>Moving Forward</h3><p>AI tools for software development are powerful amplifiers of human capability, not replacements for human judgment. They excel at generating boilerplate, suggesting patterns, and rapid prototyping. However, they require thoughtful integration into development workflows.</p><p>The most effective approach I've found combines the speed and pattern recognition of AI with the architectural thinking and quality standards that experienced developers bring. This isn't about thinking less&#8212;it's about thinking differently and more strategically.</p><p>The future of AI-assisted development isn't about replacing developer intelligence but augmenting it. The developers who thrive will be those who learn to think clearly about how to direct these powerful tools toward creating maintainable, understandable, and robust software.</p><p>As with any powerful tool, the key lies not in the tool itself but in the wisdom and discipline of its user.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! 
Subscribe to get more practical guides on using GenAI tools effectively in software development work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item></channel></rss>