Nik Malykhin: Production

De-Risking the Database Migration

Mon, 13 Jul 2026 19:10:44 GMT

The Monolithic Bottleneck: Stored Logic and On-Premises Lock-In

Engineering leaders in scaling enterprises frequently encounter a critical conflict between maintaining infrastructure stability and sustaining product delivery velocity. Pausing a product roadmap to execute a foundational migration introduces severe business risk, yet operating on restrictive legacy infrastructure caps long-term scalability. I encountered this exact tension during an architecture modernization initiative that required a dual-axis migration: transitioning a production environment from on-premises servers to the cloud while simultaneously converting the database engine from Oracle to Amazon Aurora PostgreSQL.

The primary technical bottleneck was a legacy reporting engine processing gigabytes of transactional data daily. At the core of this system sat a monolithic Oracle PL/SQL function spanning approximately 1,000 lines of code, dense with procedural loops, cursor operations, and deeply embedded business logic. This function generated the primary daily transaction report for the business. Because the function was tightly coupled to native Oracle behavior, a direct automated schema conversion was not feasible, and an immediate, comprehensive rewrite threatened to paralyze feature delivery across the entire engineering department.

Modernization initiatives fail when treated as isolated, monumental events that require feature freezes. Scalable architectural evolution requires embedding structural changes directly into the operational fabric of the existing team.

The Reference Architecture Strategy: The Initial Component and the Operational Tax

To reconcile the need for continuous feature delivery with the necessity of infrastructure modernization, I chose to avoid a broad, shallow migration approach. Instead, I selected a targeted strategy that prioritized isolating and resolving the single most complex, high-risk technical component first. This chosen approach centers on an initial reference component. By dedicating senior engineering resources to deconstruct the 1,000-line PL/SQL function, I could expose every fundamental compatibility friction point between Oracle and PostgreSQL under controlled conditions.

The deliberate trade-off in this approach involves a high upfront concentration of engineering effort on a single piece of code. However, the logical justification for this investment is the generation of reusable operational knowledge. Once the complex reporting function was successfully migrated and decoupled, I compiled the exact methodologies, automated scripts, and resolution steps into an internal, highly detailed technical tutorial.

Scaling Migration Capabilities Across Internal Teams

This tutorial functioned as a predictable operational tax on subsequent feature development. Rather than relying on specialized external infrastructure contractors, the client company used its existing engineering staff to migrate the remaining, less complex database modules. Because the engineering team possessed a definitive blueprint derived from the hardest technical problem, they could execute the remaining database conversions incrementally during standard sprint cycles. This strategy successfully preserved the main product roadmap while systematically hardening the underlying data tier.

The Technical Execution: Deconstruction, Emulation, and Decoupling

The execution phase required moving beyond the automated capabilities of basic schema migration tools. The initial phase focused on reverse-engineering the procedural logic without altering business outcomes. Because documentation for the legacy PL/SQL code was absent, I established an objective baseline by developing an automated load-testing framework. This suite generated extensive mock transactional datasets, passing them through the active Oracle environment to capture precise inputs and outputs. The client engineering team verified these test results, ensuring that the behavioral requirements of the report were completely preserved.
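The baseline check described above can be sketched as a golden-master comparison. This is a minimal illustration, not the framework used in the project: `ReportRow`, its fields, `Mismatch`, and `diffAgainstBaseline` are hypothetical names, and the real report will have had a different shape.

```kotlin
import java.math.BigDecimal

// Hypothetical shape of one line in the daily transaction report.
data class ReportRow(val accountId: String, val currency: String, val total: BigDecimal)

// One difference between the captured baseline and a candidate run.
// A null side means the row exists only in the other data set.
data class Mismatch(val key: String, val expected: ReportRow?, val actual: ReportRow?)

// Compares candidate output with the baseline captured from the legacy system.
// Rows are matched by accountId and currency; totals are compared numerically,
// so 10.50 and 10.5 count as equal.
fun diffAgainstBaseline(baseline: List<ReportRow>, candidate: List<ReportRow>): List<Mismatch> {
    fun key(row: ReportRow) = "${row.accountId}/${row.currency}"
    val expectedByKey = baseline.associateBy { key(it) }
    val actualByKey = candidate.associateBy { key(it) }
    return (expectedByKey.keys + actualByKey.keys).sorted().mapNotNull { k ->
        val expected = expectedByKey[k]
        val actual = actualByKey[k]
        val same = expected != null && actual != null &&
            expected.total.compareTo(actual.total) == 0
        if (same) null else Mismatch(k, expected, actual)
    }
}
```

An empty result means the candidate reproduces the baseline for that data set; any returned `Mismatch` points at a row to investigate before the migration proceeds.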

Integrating the Target Cloud Architecture

Once the data baseline was verified, the physical migration pipeline was established using a combination of managed cloud services and deliberate software decoupling. Data transfer from the on-premises database to AWS was managed via AWS Database Migration Service for schema baselines and ongoing change data capture replication. For the extraction, transformation, and loading phases of complex relational data subsets, I implemented AWS Glue jobs.

The primary architectural challenge lay in handling the specific structural capabilities of the legacy PL/SQL environment within an open-source database engine. I adopted a two-tiered resolution strategy to address this problem. For minor syntactic differences and native utility functions, I utilized the open-source orafce extension inside the Amazon Aurora PostgreSQL instance. This extension provides native compatibility layers for Oracle-specific components, including date utilities, conditional operations, and specific database packages, allowing mechanical translation of simpler logic strings.

Moving Logic from the Storage Tier to Compute Workers

For the complex procedural logic containing intensive loops and deep data mutations, I made a deliberate architectural decision to pull the code completely out of the database engine. Translating complex procedural routines directly into PostgreSQL PL/pgSQL often results in unscalable database CPU utilization. I extracted these core business calculation routines into standalone Scala jobs.

By migrating the processing logic to Scala, the database was restored to its optimal role as a high-performance transactional data store, rather than a heavy application processing server. The Scala application workers consumed the raw data from Aurora PostgreSQL, processed the transformations efficiently via structured memory management, and compiled the final daily reports.

Quantifying the Architecture: Financial and Performance Outcomes

The success of an infrastructure migration must ultimately be verified by objective operational metrics rather than structural elegance alone. By utilizing the initial reference component strategy and decoupling computation from storage, the architecture achieved clear, verifiable improvements in both system performance and operational expenditure.

True architectural optimization aligns infrastructure cost reductions directly with performance enhancements, validating technical changes through concrete business metrics.

Analyzing Cost and Computational Velocity

Transitioning away from the legacy environment immediately removed the substantial capital expense associated with commercial on-premises Oracle database licenses. By shifting the transactional and analytical workloads to a managed Amazon Aurora PostgreSQL environment combined with temporary compute workers for the Scala jobs, the client company reduced its total database infrastructure expenditures by approximately 10% to 15%.

Simultaneously, the performance of the core business report improved dramatically. The legacy PL/SQL function frequently caused read-write contention and locks on the primary database engine due to the sheer volume of daily transactions. The decoupled architecture, which isolated transactional writes in Aurora and shifted analytical transformations to the Scala runtime environment, executed the critical daily transaction report 40% faster. This optimization was definitively proven by running the initial verification load tests against the completed production cloud infrastructure, demonstrating that systematic, blueprint-driven hardening can occur without interrupting business velocity.

Designing AI-Driven Development Workflows

Tue, 30 Jun 2026 07:01:49 GMT

Evaluating the operational efficiency of engineering workflows is essential when integrating advanced language models into daily development cycles. To understand the practical boundaries of autonomous code generation, I conducted an implementation experiment using the GPT-5.3 Codex model. The objective was to complete a medium-sized user story involving the integration of SendGrid template rendering and storage capabilities into an established email notification use case.

This assessment contrasts two distinct methodologies: an out-of-the-box framework known as spec-kit, and a lightweight alternative designated as the custom workflow. The custom workflow utilizes chat mode during the initial kick-off and planning stages to generate specific tasks, subsequently shifting to Codex for the explicit implementation of those tasks guided by an AGENT.md operational file and localized skills. The goal is to determine whether spec-kit provides immediate utility without modification or if its inherent structural characteristics necessitate explicit customization.

Architectural Slicing and Pull Request Topography

The structural composition of code updates significantly influences the sustainability of continuous integration pipelines. During the experiment, the custom workflow isolated changes into modular components that aligned directly with my default hexagonal architecture. This approach generated six discrete pull requests. These updates averaged between three and four files each (20 files across the six pull requests), with the most extensive single update containing five files. Every file generated by this workflow contained exclusively functional implementation code, eliminating secondary artifact noise.

In contrast, the spec-kit framework approached the user story through vertical slicing, attempting to package complete functional business capabilities into each cycle. This strategy yielded four pull requests, but the internal volume of these updates was substantially larger, averaging seven to eight files per pull request. The most expansive update within this set encompassed thirteen distinct files.

From an engineering operations perspective, managing large pull requests introduces definitive maintenance challenges. Reviewing a thirteen-file modification requires deep contextual immersion and can easily exhaust a multi-hour block of defensive engineering time. Conversely, integrating five to seven highly compact pull requests throughout a standard working day introduces negligible cognitive friction, provided the changes remain small and structurally isolated. Notably, both workflows initially introduced an identical rendering bug involving SendGrid template helpers, which required a targeted corrective commit. This suggests that the structural layout of the pull requests, rather than initial code accuracy, serves as the primary differentiator in developer friction.

Discovery Mechanisms and Cognitive Loading

The preparation phase exposes a stark contrast in the type of mental energy required by each workflow. The custom workflow relies heavily on an interactive discovery process during the chat-based kick-off. The model proactively initiated a clarification and planning dialogue to map out the implementation requirements before generating the discrete tasks for Codex.

This interactive session translated into a substantial textual footprint. The initial clarification phase required three distinct iterations of questioning and answering, totaling eight pages and 2,281 words. This was immediately followed by the planning phase, which required two subsequent iterations and produced an additional eight pages and 2,370 words. Cumulatively, this chat dialogue generated 16 pages of standard layout text, or 4,651 words. Assuming an average conversational rate of 150 words per minute, this preparatory phase equates to a thirty-minute collaborative pair-programming session.

The spec-kit framework approaches preparation through localized, static document synthesis rather than ongoing verbal dialogue. Before initiating code generation, the tool compiled eight distinct analytical documents within the specs directory.

An examination of the generated specs directory reveals how this text is distributed across individual documents. The requirements checklist contains 149 words, while the OpenAPI contract specifying the interface changes takes up 102 words. The data model specification consists of 211 words, and the implementation plan spans 409 words. Additionally, the quickstart document contains 140 words, the research summary covers 260 words, the comprehensive technical specification comprises 1,021 words, and the final task breakdown document details 1,540 words.

The documentation total matches the 16-page volume of the conversational workflow but contains 3,832 words of highly dense technical material. When applying an analytical reading standard of 75 words per minute for complex documentation, reviewing this output demands roughly 50 minutes of solitary, rigorous technical analysis. This calculation excludes the initial setup interactions required to seed the tool.
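The two time estimates follow directly from the word counts quoted above. The short sketch below reproduces the arithmetic; `minutesFor`, `chatWords`, and `specWords` are illustrative names, and the paces of 150 and 75 words per minute are the article's own assumptions.

```kotlin
import kotlin.math.roundToInt

// Minutes needed to get through `words` at a given pace, rounded to the nearest minute.
fun minutesFor(words: Int, wordsPerMinute: Int): Int =
    (words.toDouble() / wordsPerMinute).roundToInt()

// Clarification dialogue plus planning dialogue from the custom workflow.
val chatWords = 2_281 + 2_370

// The eight documents spec-kit wrote into the specs directory.
val specWords = listOf(149, 102, 211, 409, 140, 260, 1_021, 1_540).sum()
```

With these inputs, `minutesFor(chatWords, 150)` gives 31 and `minutesFor(specWords, 75)` gives 51, which matches the thirty-minute and roughly fifty-minute figures in the text.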

Insight: Engaging in a collaborative, bidirectional technical dialogue yields lower cognitive fatigue than parsing dense, machine-generated analytical documentation independently. The conversational format allows an engineer to guide the discovery path dynamically, whereas the document-heavy approach demands prolonged, solitary code-review stamina.

Estimation Metrics and Delivery Impact

To contextualize project velocity, I utilize a standard estimation scale where one point equates to a minor task, three points represent half of a development iteration, and five points correspond to a full iteration block. Historically, the targeted user story would receive an empirical estimate of three story points.

By utilizing either AI-driven development environment, the effective complexity of the implementation dropped significantly, allowing the story to be re-estimated at two points. This finding aligns with observations gathered over a multi-month period: the strategic application of generative models consistently removes approximately one story point from medium-sized requirements.

However, this efficiency gain does not scale with story size. Removing one point cuts a three-point story by a third, but a five-point story by only a fifth, and a single-point reduction on a highly complex, five-point user story does not alter the fundamental delivery architecture or allow the task to be decomposed more effectively. For larger software initiatives, the exact return on investment provided by these autonomous tools requires further empirical evaluation.

Long-Term Repository Maintenance and Documentation Bloat

A critical consideration when adopting spec-kit out of the box is the long-term structural health of the code repository. Generating eight non-service documentation files for a single medium-sized user story introduces a noticeable maintenance tail.

Consider a baseline engineering department consisting of three to four development pairs. If these pairs collectively deliver approximately three completed user stories per development iteration across 26 annual iterations, the repository configuration changes dramatically over time. Under the unmodified spec-kit framework, with eight such files per story, this delivery velocity results in the accumulation of roughly 600 non-service Markdown and YAML files every year (3 stories × 26 iterations × 8 files = 624). Managing the lifecycle, accuracy, and relevance of hundreds of static documentation files creates an administrative burden that can quickly devalue the initial velocity gains of automated generation.

Chronological Integration Patterns

The Custom Workflow Evolution

The custom workflow distributed code modifications across six isolated, single-purpose commits containing exclusively functional code. The sequence began with a five-file commit implementing active SendGrid template retrieval, which introduced the core repository interface, its SendGrid implementation, an exception for missing templates, a version value object, and a corresponding repository error test. Next, a four-file commit introduced the Handlebars email template renderer by modifying the build configuration and adding the renderer service, the rendered email domain model, and the renderer test suite.

The third step was a three-file commit handling the storage of the rendered template within the document management system, which impacted the primary use case, the consumer contract test, and the use case test. To address a rendering bug, a two-file corrective commit added explicit support for SendGrid Handlebars helpers within the core rendering logic. This was followed by a three-file commit introducing global exception mapping using an exception handler advice and its corresponding integration test. The evolution concluded with a three-file commit aggregating final integration and regression verifications across the controller and use-case boundaries.

The Spec-Kit Framework Evolution

The spec-kit framework grouped its operations into broader, multi-file updates that combined documentation and implementation boundaries. The process opened with an eight-file initial commit compiling the prerequisite requirements, OpenAPI specifications, data models, plans, quickstart guides, research notes, technical specifications, and task manifests within the specs directory.

This was followed by a thirteen-file monolithic commit deploying the dynamic template rendering architecture, which simultaneously modified the build configuration, the task checklist, the exception handler advice, the SendGrid repository implementation, the missing template exception, the document management system store request, the rendered email domain model, the core use case, and their associated tests. The third phase was a four-file verification commit introducing test coverage for document management system failure scenarios. The cycle concluded with a five-file verification commit ensuring proper handling of missing templates, which updated integration tests, contract verifications, and serialization payloads.
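The per-commit file counts in the two chronologies above can be summarized with a few lines of Kotlin. This is only a sketch for checking the size figures; `SizeSummary` and `summarize` are illustrative names, and the lists simply restate the counts given in the text.

```kotlin
// Files touched per commit, in chronological order, as listed above.
val customWorkflowCommits = listOf(5, 4, 3, 2, 3, 3)
val specKitCommits = listOf(8, 13, 4, 5)

data class SizeSummary(
    val commits: Int,
    val totalFiles: Int,
    val mean: Double,
    val median: Double,
    val largest: Int,
)

// Basic size statistics for a series of commits.
fun summarize(fileCounts: List<Int>): SizeSummary {
    require(fileCounts.isNotEmpty()) { "at least one commit is required" }
    val sorted = fileCounts.sorted()
    val mid = sorted.size / 2
    val median =
        if (sorted.size % 2 == 0) (sorted[mid - 1] + sorted[mid]) / 2.0
        else sorted[mid].toDouble()
    return SizeSummary(sorted.size, sorted.sum(), sorted.average(), median, sorted.last())
}
```

For the custom workflow this yields 20 files over six commits (mean about 3.3, largest 5); for spec-kit it yields 30 files over four commits (mean 7.5, largest 13).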

Final Assessment: To Customize or Adopt As-Is

Returning to the original operational query: can spec-kit be utilized effectively without modification? The data suggests that an out-of-the-box deployment introduces distinct operational trade-offs that make customization necessary for long-term health.

While spec-kit succeeds in lowering short-term delivery complexity, its vertical slicing strategy creates overly large pull requests that challenge standard daily review workflows. Furthermore, the generation of extensive static documentation introduces systemic repository bloat that scales poorly across multiple engineering teams.

The ideal path forward requires a hybrid architecture. By customizing spec-kit to inherit the structural instructions of the custom workflow, we can merge the systematic rigor of automated planning with the clean, highly isolated pull request structure required by hexagonal architectures. Future efforts will focus on implementing custom skills within the agent configuration to restrict the generation of non-service files while preserving shared context between the conversational interface and the underlying code generation engine.

The Non-Transactional Reality of PostgreSQL Sequences

Tue, 23 Jun 2026 07:00:59 GMT

The Expectation of Monotonicity in Order Systems

When building an order management pipeline, the primary objective is to capture, validate, and permanently store transactional records such as customer purchases, financial ledgers, or invoices. This system represents a comprehensive infrastructure architecture rather than a simple database configuration because it operates as a multi-layered distributed pipeline. In a typical production environment, this architecture encompasses web servers ingesting thousands of concurrent requests, connection pools regulating database lifecycles, and downstream services like fulfillment, inventory, and accounting that ingest this data via asynchronous message queues.

To ensure absolute tracking, auditing, and predictability across these decoupled architectural boundaries, a common engineering assumption is that these transaction records will possess sequentially ordered identifiers, moving uniformly from one integer to the next without omission. In my implementation, I utilized PostgreSQL with a primary key defined as a big integer generated by default as an identity. This approach is widely recognized for its enterprise stability and seamless integration within robust backend data ecosystems.

During routine disaster recovery drills, however, the monitoring logs revealed an unexpected pattern in the primary key sequence. Instead of a continuous, gapless progression, the identifiers exhibited distinct omissions, appearing as a broken sequence with missing elements. A manual audit verified that no records were lost; every transaction was accounted for, yet the identifiers contained significant gaps. This discovery prompted a detailed investigation into the core mechanics of sequence manipulation within the PostgreSQL engine.

Simulating the Anomalies: Forward and Backward Jumps

To isolate the root causes of these numerical omissions, I constructed a controlled replication environment using Kotlin and the Exposed framework to simulate various infrastructure failure states.

The Forward Jump and the Mechanics of the Write-Ahead Log

The first scenario reproduces a sudden infrastructure termination. The configuration initializes a sequence and captures the initial increment within a standard database transaction block.

transaction {
    exec("CREATE SEQUENCE seq;")
    val firstVal = exec("SELECT nextval('seq');") { rs ->
        rs.next()
        rs.getLong(1)
    }

    println("Initial value: $firstVal")
}

Once the initial value is confirmed as one, the application executes an ungraceful process termination at the operating system level, targeted directly at the backend process identifier associated with the current database session.

fun triggerDatabaseProcessCrash() {
    transaction {
        val pid = exec("SELECT pg_backend_pid();") { rs ->
            rs.next()
            rs.getInt(1)
        }

        Runtime.getRuntime().exec("kill -9 $pid")
    }
}

Following this abrupt termination, the Kotlin application encounters a communication exception or a transient connection exception as the underlying connection pool loses its link to the server. Upon the re-establishment of a stable connection to the database instance, a subsequent call to the sequence reveals a substantial forward leap rather than the expected single increment.

transaction {
    val nextVal = exec("SELECT nextval('seq');") { rs ->
        rs.next()
        rs.getLong(1)
    }
    println("Value after crash recovery: $nextVal")
}

The resulting output is 34. This behavior is directly attributable to an internal optimization within the PostgreSQL source code, governed by the SEQ_LOG_VALS pre-allocation macro.

To minimize persistent disk write frequency and maximize concurrent scalability, the engine does not write to the Write-Ahead Log on every call. When a sequence needs a new log record, the engine logs a value 32 positions ahead of the one it returns, so the next 32 calls require no further log writes. When an ungraceful shutdown occurs, recovery can only restore the logged value (33 in this experiment), so the unassigned values up to that point are permanently lost and the sequence resumes from the boundary of the pre-allocated block, here 34.
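The forward jump can be reproduced with a small in-memory model. This is a simplified sketch of the idea behind PostgreSQL's SEQ_LOG_VALS constant (32), not the engine's actual code: `SequenceModel` and its members are hypothetical names, the sequence cache size is assumed to be 1, and the model assumes every log record has reached disk before the crash.

```kotlin
// Simplified model of a sequence that logs ahead instead of logging every call.
class SequenceModel(private val logAhead: Long = 32) {
    private var lastValue = 0L     // last value handed out to a caller
    private var walValue = 0L      // value stored in the most recent log record
    private var unloggedLeft = 0L  // values that can still be issued without a new log record

    fun nextval(): Long {
        val next = lastValue + 1
        if (unloggedLeft == 0L) {
            // Log a value `logAhead` beyond the one being returned,
            // so the following `logAhead` calls need no log write.
            walValue = next + logAhead
            unloggedLeft = logAhead
        } else {
            unloggedLeft--
        }
        lastValue = next
        return next
    }

    // After a crash only the log survives: the sequence resumes from the logged value.
    fun crashAndRecover() {
        lastValue = walValue
        unloggedLeft = 0
    }
}
```

Calling `nextval()` once returns 1 and logs 33; after `crashAndRecover()` the next call returns 34, the same jump observed in the experiment. Without a crash the model hands out 1, 2, 3 and so on with no gaps.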

The Backward Jump and Uncommitted States

An even more perplexing anomaly occurs when a sequence appears to move backward following a critical system failure. This state can be demonstrated by advancing a sequence multiple times within a single transaction block without executing a formal commit statement, followed by an immediate hard process termination.

import org.jetbrains.exposed.sql.transactions.transaction

fun demonstrateBackwardsJump() {
    transaction { exec("CREATE SEQUENCE seq;") }

    transaction {
        val v1 = exec("SELECT nextval('seq');") { it.next(); it.getLong(1) }
        val v2 = exec("SELECT nextval('seq');") { it.next(); it.getLong(1) }
        val v3 = exec("SELECT nextval('seq');") { it.next(); it.getLong(1) }
        println("Sequence values in-transaction: $v1, $v2, $v3")
        val pid = exec("SELECT pg_backend_pid();") { it.next(); it.getInt(1) }
        Runtime.getRuntime().exec("kill -9 $pid")
    }
}

Upon reconnecting to the database and invoking the next value, the system returns a value of one. This behavior emphasizes that engine sequences operate entirely outside standard transactional boundaries. The backend advances the sequence in memory and generates a Write-Ahead Log record, but that record is only guaranteed to reach disk when a transaction commits or the log is otherwise flushed. Because the transaction in this experiment never committed, the hard crash discarded the increments, and recovery restored the sequence to its last durable state.

Evaluating Alternatives: The Flawed Custom Counter Workaround

In an attempt to bypass the inherent gaps associated with standard database sequences, a developer might consider implementing a custom identity counter utilizing standard transactional tables and functions. The implementation typically involves creating an explicit sequence tracking table and an atomic update function.

CREATE TABLE MY_SEQ (ID BIGINT NOT NULL);

INSERT INTO MY_SEQ (ID) VALUES (0);

CREATE FUNCTION NEXT_VAL() RETURNS BIGINT
    LANGUAGE SQL AS
'UPDATE MY_SEQ SET ID = ID + 1 RETURNING ID';

While this mechanism successfully eliminates numerical gaps by utilizing the standard transactional engine, it introduces a massive performance penalty that makes it unviable for high-throughput applications. When the custom function executes the update statement, PostgreSQL applies an exclusive row-level lock to that single row within the tracking table and holds it until the calling transaction commits or rolls back. Consequently, every concurrent transaction across the entire application ecosystem must wait in a strict, single-file queue to obtain a new identifier.

If a single transaction requires 100 milliseconds to process its internal business logic prior to committing, all other concurrent threads are completely blocked for that duration. In a high-concurrency production environment, this structural bottleneck rapidly triggers database connection timeouts, thread starvation, and severe application latency. The trade-off between absolute numerical continuity and system throughput represents a deliberate choice where performance must be prioritized.
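The cost of that single-file queue is easy to estimate. The sketch below is back-of-the-envelope arithmetic under the assumption that every caller holds the counter row lock for the same duration; `maxIdsPerSecond` and `worstCaseWaitMillis` are illustrative helpers, not part of any library.

```kotlin
// Upper bound on identifiers issued per second when every caller must hold
// the same row lock for `holdMillis` (from the UPDATE until the commit).
fun maxIdsPerSecond(holdMillis: Long): Double = 1_000.0 / holdMillis

// Time the last of `concurrentCallers` spends waiting for the lock
// if all of them request an identifier at the same moment.
fun worstCaseWaitMillis(concurrentCallers: Int, holdMillis: Long): Long =
    (concurrentCallers - 1).coerceAtLeast(0) * holdMillis
```

With the 100-millisecond transactions from the example, `maxIdsPerSecond(100)` is 10.0 no matter how many application instances are running, and `worstCaseWaitMillis(50, 100)` shows that the last of 50 simultaneous callers waits 4,900 milliseconds before its own work even starts.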

Architectural Best Practices for Kotlin Applications

Managing non-transactional sequence behavior within Kotlin services requires a deliberate approach to application architecture and data flow design.

Essential Insight: Database sequences must be treated as internal, transient optimization helpers rather than durable, externally accurate identifiers.

When developing services that interface with PostgreSQL sequences, specific architectural practices must guide the implementation to prevent data degradation across the broader system. It is vital to never expose or distribute an identifier generated by a sequence to external systems, such as asynchronous message brokers or user-facing REST responses, until the enclosing database transaction has been successfully committed. If a system failure or network interruption occurs prior to the final commit, the row is never stored, and the sequence value may be skipped or, as the backward jump shows, issued again to a later transaction, leading to data inconsistencies or dangling references within external architectures.
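The commit-before-publish rule can be sketched as follows. Everything here is hypothetical scaffolding: `TransactionRunner`, `OrderService`, and the two function parameters stand in for whatever database and messaging clients a real service uses.

```kotlin
// Hypothetical abstraction over the database client.
interface TransactionRunner {
    // Runs `block` in a database transaction: commits on normal return,
    // rolls back and rethrows if `block` or the commit fails.
    fun <T> inTransaction(block: () -> T): T
}

class OrderService(
    private val tx: TransactionRunner,
    private val insertOrder: (String) -> Long,       // returns the sequence-generated id
    private val publishOrderCreated: (Long) -> Unit, // e.g. sends a message to a broker
) {
    fun placeOrder(payload: String): Long {
        // The id stays inside the service until the transaction has committed.
        val orderId = tx.inTransaction { insertOrder(payload) }
        // Reached only after a successful commit. If the transaction fails,
        // the exception propagates and nothing is published for that id.
        publishOrderCreated(orderId)
        return orderId
    }
}
```

The important detail is the position of `publishOrderCreated`: it sits after the transaction block rather than inside it. A crash between the commit and the publish can still lose the notification, so this sketch addresses premature exposure of identifiers, not guaranteed delivery.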

Furthermore, structuring database transaction boundaries to be as narrow and short-lived as possible limits the window of vulnerability for process crashes and mitigates the risk of unexpected numerical anomalies. If the core business domain dictates a strict requirement for guaranteed, immutable, and gapless identifiers that must survive catastrophic infrastructure failures, database-driven sequences must be abandoned entirely. In such scenarios, a dedicated identity reservation ledger specifically engineered to preserve state integrity across volatile failure scenarios provides the necessary durability. Where only uniqueness is required rather than numerical continuity, Universally Unique Identifiers are a simpler alternative.

Human Overwatch in AI Code Generation

Tue, 09 Jun 2026 07:01:54 GMT

Designing a Duplicate Protection Hashing Service

In a recent architectural assignment, I was tasked with implementing a duplicate-protection data hashing service for a high-throughput Kotlin application. This service operates within an enterprise runtime environment utilizing Java 21 and is deployed on AWS ECS. The fundamental requirement was to convert incoming JSON payloads into a stable, unique string representation to prevent duplicate processing within the system pipeline.

Initially, I gravitated toward the MD5 algorithm. This selection was informed by a historical performance convention, assuming MD5 would minimize latency for an internal data validation check where cryptographic security was not a driving requirement. Because I routinely leverage generative artificial intelligence to streamline production workflows, I provided an AI assistant with a prompt detailing these exact constraints: a high-throughput Kotlin service tasked with producing an MD5 hash from an incoming JSON payload.

The Baseline AI-Generated Implementation

The generative AI model delivered exactly what was requested based on those initial assumptions. It produced a complete, functional component wrapper that mapped incoming data to an MD5 hash. Because the initial prompt did not ask the AI to evaluate potential platform regressions or optimize memory allocations, the model generated a literal translation of the requested functional requirements, complete with standard boilerplate utilities.

import com.fasterxml.jackson.databind.JsonNode
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.databind.node.ArrayNode
import com.fasterxml.jackson.databind.node.ObjectNode
import org.springframework.stereotype.Component
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

@Component
class HashGenerator(
   private val objectMapper: ObjectMapper,
) {
   fun generate(
       conversationId: String?,
       payload: JsonNode,
   ): String {
       val normalizedConversationId = conversationId.orEmpty()
       val canonicalPayloadBytes = objectMapper.writeValueAsBytes(canonicalize(payload))
       val conversationBytes = normalizedConversationId.toByteArray(StandardCharsets.UTF_8)

       val md5 = MessageDigest.getInstance(MD5_ALGORITHM)
       md5.update(conversationBytes)
       md5.update(canonicalPayloadBytes)
       return md5.digest().toHexString()
   }

   private fun ByteArray.toHexString(): String {
       val chars = CharArray(size * 2)
       forEachIndexed { index, byte ->
           val value = byte.toInt() and 0xFF
           chars[index * 2] = HEX_CHARS[value ushr 4]
           chars[(index * 2) + 1] = HEX_CHARS[value and 0x0F]
       }
       return String(chars)
   }

   private fun canonicalize(node: JsonNode): JsonNode =
       when {
           node.isObject -> {
               val objectNode = node as ObjectNode
               val sortedFields =
                   objectNode
                       .fields()
                       .asSequence()
                       .toList()
                       .sortedBy { it.key }
               val canonicalObject = objectMapper.nodeFactory.objectNode()
               sortedFields.forEach { (key, value) ->
                   canonicalObject.set(key, canonicalize(value))
               }
               canonicalObject
           }
           node.isArray -> {
               val arrayNode = node as ArrayNode
               val canonicalArray = objectMapper.nodeFactory.arrayNode()
               arrayNode.forEach { item ->
                   canonicalArray.add(canonicalize(item))
               }
               canonicalArray
           }
           else -> node
       }

   private companion object {
       const val MD5_ALGORITHM = "MD5"
       val HEX_CHARS = "0123456789abcdef".toCharArray()
   }
}

The Conflict of Speed Versus Security

While the code executed correctly in the testing environment, it triggered a critical security flag during static code analysis in SonarQube. Our internal security champion mandated an immediate transition to SHA-256, citing systemic software vulnerabilities associated with MD5 collision risks. This requirement instigated a broader team discussion regarding the trade-offs between processing speed and cryptographic security within microservices.

To resolve this conflict, I conducted a deeper investigation into the execution paths of the hashing utility. The findings completely reframed the problem space. On a modern Java 21 runtime running on optimized cloud infrastructure, the execution variance between MD5 and SHA-256 is structurally negligible. The true computational bottlenecks were located within the data preprocessing layers rather than the mathematical operations of the message digest.

Key Insight: Upgrading an algorithm to meet security compliance requirements rarely degrades system performance when the surrounding data manipulation logic dominates the cost. The true latency hotspots frequently reside in object serialization and memory allocation patterns.
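A rough way to sanity-check the claim about digest cost is to time both algorithms against the same payload. The sketch below is not a rigorous benchmark (a harness such as JMH would be needed for defensible numbers). The helper name, payload size, and iteration count are assumptions chosen purely for illustration.

```kotlin
import java.security.MessageDigest

// Hypothetical helper: average nanoseconds per digest over `iterations` runs.
fun averageDigestNanos(algorithm: String, payload: ByteArray, iterations: Int): Long {
    // Warm-up pass so the JIT has a chance to compile the hot path before timing starts.
    repeat(iterations) { MessageDigest.getInstance(algorithm).digest(payload) }

    val start = System.nanoTime()
    repeat(iterations) { MessageDigest.getInstance(algorithm).digest(payload) }
    return (System.nanoTime() - start) / iterations
}

fun main() {
    // Assumed payload: 4 KiB of arbitrary bytes standing in for a serialized JSON body.
    val payload = ByteArray(4 * 1024) { (it % 251).toByte() }
    for (algorithm in listOf("MD5", "SHA-256")) {
        println("$algorithm: ${averageDigestNanos(algorithm, payload, 20_000)} ns per digest")
    }
}
```

The absolute numbers depend on the hardware and JVM; the useful comparison is between these figures and the time spent in canonicalization and serialization for the same payload.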

Identifying the True Microbenchmarking Hotspots

The profiling data isolated three specific architectural execution risks within the original code structure:

The canonicalization routine introduced deep recursion. Converting JSON fields into sequences, collecting them into lists, and sorting them generated an unsustainable volume of short-lived heap objects. This structure risks triggering frequent JVM Garbage Collection pauses under high throughput.
Jackson serialization via the writeValueAsBytes function consumed substantially more CPU cycles than any subsequent hashing operation. Transforming a newly instantiated object graph into a raw byte array is computationally expensive.
The manual byte-to-hex manipulation loop, while functional, missed the low-level optimizations provided by modern platform utilities.

Implementing Immediate Algorithmic and Structural Upgrades

The first step involved addressing the security non-compliance while cleaning up the obvious inefficiencies. I replaced the manual hex encoding with the native HexFormat utility introduced in Java 17 and further optimized in Java 21. Concurrently, I transitioned the algorithm to SHA-256, which leverages hardware acceleration on contemporary processors.

import com.fasterxml.jackson.databind.JsonNode
import com.fasterxml.jackson.databind.ObjectMapper
import org.springframework.stereotype.Component
import java.nio.charset.StandardCharsets
import java.security.MessageDigest
import java.util.HexFormat

@Component
class HashGenerator(
    private val objectMapper: ObjectMapper,
) {
    // HexFormat instances are immutable and thread-safe, so one formatter can be shared.
    private val hexFormatter = HexFormat.of()

    fun generate(
        conversationId: String?,
        payload: JsonNode,
    ): String {
        val normalizedConversationId = conversationId.orEmpty()
        val canonicalPayloadBytes = objectMapper.writeValueAsBytes(canonicalize(payload))
        val conversationBytes = normalizedConversationId.toByteArray(StandardCharsets.UTF_8)

        // MessageDigest is stateful and not thread-safe, so a fresh instance is created per call.
        val sha256 = MessageDigest.getInstance(SHA256_ALGORITHM)
        sha256.update(conversationBytes)
        sha256.update(canonicalPayloadBytes)

        return hexFormatter.formatHex(sha256.digest())
    }

    // canonicalize(node) is unchanged from the original implementation at this step.

    private companion object {
        const val SHA256_ALGORITHM = "SHA-256"
    }
}
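As a quick check that swapping the manual hex loop for HexFormat does not change the output format, the two encoders can be compared directly. This is a minimal, dependency-free sketch; the function names manualHex and platformHex are introduced here for illustration only.

```kotlin
import java.security.MessageDigest
import java.util.HexFormat

// The nibble-by-nibble encoding used by the original implementation.
fun manualHex(bytes: ByteArray): String {
    val hexChars = "0123456789abcdef".toCharArray()
    val chars = CharArray(bytes.size * 2)
    bytes.forEachIndexed { index, byte ->
        val value = byte.toInt() and 0xFF
        chars[index * 2] = hexChars[value ushr 4]
        chars[index * 2 + 1] = hexChars[value and 0x0F]
    }
    return String(chars)
}

// The JDK 17+ replacement; HexFormat.of() emits lowercase hex with no delimiters.
fun platformHex(bytes: ByteArray): String = HexFormat.of().formatHex(bytes)

fun main() {
    val digest = MessageDigest.getInstance("SHA-256").digest("example".toByteArray(Charsets.UTF_8))
    println(manualHex(digest) == platformHex(digest)) // prints "true"
    println(platformHex(digest).length)               // prints "64"
}
```

Because both encoders produce lowercase, undelimited hex, the only visible change for consumers is the digest length: 64 characters for SHA-256 instead of 32 for MD5.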

Optimizing the Canonicalization Routine

Resolving the security alert was an essential compliance milestone, but achieving production-grade execution required a total refactoring of the canonicalize function. To eliminate high allocation rates and latency spikes, I rewrote the structural transformation logic to treat heap memory defensively.

import com.fasterxml.jackson.databind.JsonNode
import com.fasterxml.jackson.databind.node.ArrayNode
import com.fasterxml.jackson.databind.node.ObjectNode
import java.util.TreeMap

private fun canonicalize(node: JsonNode): JsonNode =
    when {
        node.isObject -> {
            val sortedMap = TreeMap<String, JsonNode>()
            val fieldsIterator = node.fields()
            while (fieldsIterator.hasNext()) {
                val entry = fieldsIterator.next()
                sortedMap[entry.key] = canonicalize(entry.value)
            }
            ObjectNode(objectMapper.nodeFactory, sortedMap)
        }
        node.isArray -> {
            val canonicalArray = objectMapper.nodeFactory.arrayNode(node.size())
            for (item in node) {
                canonicalArray.add(canonicalize(item))
            }
            canonicalArray
        }
        else -> node
    }

Strategic Improvements in Memory Management

The refactored canonicalization pipeline rests on four design choices, each described in its own subsection.

Automatic Sorting via TreeMap

The original logic explicitly pulled object fields into a Kotlin sequence, forced them into a temporary list, and executed a sorting lambda. The optimized approach streams fields directly into a java.util.TreeMap. Operating on a red-black tree architecture, the TreeMap inherently handles alphabetical key sorting upon element insertion, completely eliminating intermediate collection lifecycles.
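The sort-on-insert behavior can be seen without Jackson. The sketch below is a simplified analogue in which plain Map and List values stand in for ObjectNode and ArrayNode; the function name canonicalizeValue and the sample data are assumptions for illustration, not part of the service.

```kotlin
import java.util.TreeMap

// Recursively copies nested maps into TreeMaps so keys iterate in natural String order.
fun canonicalizeValue(value: Any?): Any? =
    when (value) {
        is Map<*, *> -> {
            val sorted = TreeMap<String, Any?>()
            for ((key, child) in value) {
                sorted[key.toString()] = canonicalizeValue(child)
            }
            sorted
        }
        is List<*> -> {
            // Array order is significant, so elements keep their positions.
            val copy = ArrayList<Any?>(value.size)
            for (item in value) copy.add(canonicalizeValue(item))
            copy
        }
        else -> value
    }

fun main() {
    val first = linkedMapOf("b" to 1, "a" to linkedMapOf("z" to true, "y" to listOf(3, 2)))
    val second = linkedMapOf("a" to linkedMapOf("y" to listOf(3, 2), "z" to true), "b" to 1)

    println(first.toString() == second.toString()) // prints "false"
    println(canonicalizeValue(first))              // prints "{a={y=[3, 2], z=true}, b=1}"
    println(canonicalizeValue(first).toString() == canonicalizeValue(second).toString()) // prints "true"
}
```

The string form is used here as a stand-in for serialized bytes: two payloads that differ only in key order produce the same rendering once every level has passed through a TreeMap.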

Elimination of Lambda Allocations

Chains of functional methods like asSequence, toList, and sortedBy generate short-lived operational objects behind the scenes. In a high-throughput Spring Boot architecture, these objects increase the allocation rate and burden the garbage collector. Replacing functional abstractions with explicit while and for loops guarantees zero closure allocations inside the iteration logic.

Pre-Sized Array Allocation

The initial implementation created the array node through the no-argument factory method. By default, Jackson backs an ArrayNode with a list that starts at a small default capacity. When copying a heavily populated array, that list must grow repeatedly, each time allocating a larger backing array and copying the existing elements across. Passing node.size() as the initial capacity reserves the required space upfront.

Direct Constructor Instantiation

The default initialization sequence of a Jackson ObjectNode instantiates an internal LinkedHashMap before receiving data updates via the set method. The revised approach utilizes a public constructor that directly accepts the pre-populated TreeMap, reducing the required object instantiation operations by half.

The Production-Ready Hash Service

Combining these algorithmic upgrades and memory management adjustments results in a secure, performant, and enterprise-grade component.

import com.fasterxml.jackson.databind.JsonNode
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.databind.node.ArrayNode
import com.fasterxml.jackson.databind.node.ObjectNode
import org.springframework.stereotype.Component
import java.security.MessageDigest
import java.util.HexFormat
import java.util.TreeMap

@Component
class HashGenerator(
    private val objectMapper: ObjectMapper,
) {
    private val hexFormatter = HexFormat.of()

    fun generate(
        conversationId: String?,
        payload: JsonNode,
    ): String {
        val normalizedConversationId = conversationId.orEmpty()
        val canonicalPayloadBytes = objectMapper.writeValueAsBytes(canonicalize(payload))
        val conversationBytes = normalizedConversationId.toByteArray(java.nio.charset.StandardCharsets.UTF_8)

        val sha256 = MessageDigest.getInstance(SHA256_ALGORITHM)
        sha256.update(conversationBytes)
        sha256.update(canonicalPayloadBytes)
        
        return hexFormatter.formatHex(sha256.digest())
    }

    private fun canonicalize(node: JsonNode): JsonNode =
        when {
            node.isObject -> {
                val sortedMap = TreeMap<String, JsonNode>()
                val fieldsIterator = node.fields()
                while (fieldsIterator.hasNext()) {
                    val entry = fieldsIterator.next()
                    sortedMap[entry.key] = canonicalize(entry.value)
                }
                ObjectNode(objectMapper.nodeFactory, sortedMap)
            }
            node.isArray -> {
                val canonicalArray = objectMapper.nodeFactory.arrayNode(node.size())
                for (item in node) {
                    canonicalArray.add(canonicalize(item))
                }
                canonicalArray
            }
            else -> node
        }

    private companion object {
        const val SHA256_ALGORITHM = "SHA-256"
    }
}

The Intersection of Generative Artificial Intelligence and Enterprise Engineering

This optimization exercise highlights a critical reality regarding the application of generative artificial intelligence within enterprise software development. The initial AI-generated code was not technically broken. It accurately realized the precise constraints of the original prompt: it calculated an MD5 hash over an object payload. The defect was rooted in my own outdated assumptions about cryptographic overhead and the omission of strict platform analysis in the initial prompt requirements.

Takeaway: Generative artificial intelligence operates as an exceptional execution mechanism, but it lacks the contextual capacity to independently enforce enterprise-grade performance boundaries without human engineering overwatch. The true value of the technology lies in its capacity to serve as an interactive learning accelerator, contracting traditional research cycles from hours down to a matter of minutes.

When AI Breaks Database Parity

Nik — Tue, 02 Jun 2026 07:03:12 GMT

The Landscape of Database Selection and the Integration Testing Paradigm

According to global database engine rankings, relational models continue to dominate the software development landscape. The top positions are consistently occupied by Oracle, MySQL, Microsoft SQL Server, and PostgreSQL, with MongoDB following closely as the primary document-oriented alternative. In my own architectural designs, PostgreSQL serves as the primary relational database engine, complemented by Amazon Web Services S3 for object storage.

Previously, I explored the complexities of managing database migrations with Flyway. Today, I want to extend that conversation to address database integration testing and the critical requirement of environmental parity. For a considerable duration within Java and Kotlin development stacks, the H2 database engine served as the standard default for local execution and integration testing. As an in-memory, runtime-configured database, H2 provides seamless integration with the Spring Framework and requires zero external infrastructure installation. The engine also supports a dedicated PostgreSQL compatibility mode, which historically made it an appealing candidate for simulating a production environment during local development.

The Illusion of Compatibility and the Environmental Disparity Trap

While H2 excels as a lightweight runtime database when interactions are mediated entirely by abstract object-relational mapping frameworks, its feature parity with PostgreSQL falls short of complete functional duplication. The compatibility boundary rarely covers advanced native database capabilities, leading to subtle and disruptive behavioral deviations between development and production environments.

For instance, H2 supports the Oracle-style ROWNUM construct for numbering result rows, which does not exist in PostgreSQL; there the equivalent result requires the standard ROW_NUMBER() window function. Conversely, writing advanced queries that exploit native PostgreSQL functions or triggers quickly exposes the limitations of the compatibility mode. The critical nature of this gap becomes evident during schema migration lifecycle events.

During a recent project iteration, our development workflow required introducing an MD5 hashing mechanism to process historical records during a data migration phase. The PostgreSQL syntax accepts a simple byte array input for its native md5 function. When Flyway attempted to execute this migration script against the local H2 testing instance, the build failed immediately. The H2 engine does not recognize this function format, requiring an entirely different functional signature known as HASH, which demands an explicit algorithm string and expression parameters. This mismatch highlights the structural risk of relying on a simulated environment.

True environmental parity cannot be achieved by translating syntax at runtime; it requires validating software against the exact engine configuration slated for production deployment.

The Architectural Evolution of Local Infrastructure

The necessity of accepting the behavioral compromises of an in-memory database has been thoroughly eliminated by advancements in containerization and build-tool integration. The introduction of Docker fundamentally modified local engineering environments, a transformation subsequently extended to automated testing via the Testcontainers framework.

With the release of Spring Boot 3.1.0 in the spring of 2023, the framework introduced built-in, first-class configuration mechanisms for Testcontainers. This development eliminated the primary architectural justification for maintaining a split database architecture between testing and production. Even for projects maintaining simple data models, the modern tooling ecosystem removes the necessity of managing an alternate database dialect for local verification.
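A minimal sketch of what that first-class support looks like in a Kotlin test follows. It assumes the spring-boot-testcontainers module plus the Testcontainers junit-jupiter and postgresql modules (using Testcontainers 1.x package names) are on the test classpath and that Docker is available locally. The test class name is hypothetical, and the image tag simply mirrors the PostgreSQL version used for local development in this article.

```kotlin
import org.junit.jupiter.api.Test
import org.springframework.boot.test.context.SpringBootTest
import org.springframework.boot.testcontainers.service.connection.ServiceConnection
import org.testcontainers.containers.PostgreSQLContainer
import org.testcontainers.junit.jupiter.Container
import org.testcontainers.junit.jupiter.Testcontainers

@SpringBootTest
@Testcontainers
class PostgresParityIntegrationTest {

    @Test
    fun `application context starts against a real PostgreSQL engine`() {
        // If the context loads, any configured Flyway migrations have already run
        // against PostgreSQL rather than an in-memory substitute.
    }

    companion object {
        // @ServiceConnection lets Spring Boot derive the datasource URL, user, and
        // password from the running container instead of hand-written properties.
        @Container
        @ServiceConnection
        @JvmStatic
        val postgres = PostgreSQLContainer<Nothing>("postgres:17.9-alpine")
    }
}
```

With this in place, no H2 dependency or alternate dialect configuration is needed for the test profile.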

The Token Regression: Generative AI and Legacy Patterns

The availability of modern containerized alternatives raises a pertinent question as to why environmental disparity remains a topic of discussion in 2026. The emergence of generative artificial intelligence as a ubiquitous development tool provides the explanation. During a concurrent development phase involving the bootstrapping of four distinct microservices, my engineering team utilized GitHub Copilot to accelerate the generation of service skeletons and initial configuration manifests.

Because generative models predict output tokens based on historical training data, their recommendations are heavily weighted toward long-standing industry conventions. Due to the decade-long prominence of H2 in historical Spring tutorials and code repositories, the assistant recommended an in-memory H2 configuration for local development. The engineers initializing the services accepted this recommendation as a functional baseline, thereby reintroducing legacy environmental friction back into a modern development stack.

Generative code assistants operate on statistical probability derived from historical data, which can inadvertently cause architectural regressions by propagating legacy best practices into modern codebases.

Implementing Local Parity through Automation

To resolve the structural friction caused by mismatched database engines, we replaced the in-memory simulation with a containerized PostgreSQL instance dedicated to local execution. To ensure this change did not introduce manual overhead to the developer workflow, we integrated the container lifecycles directly into our build orchestration layer.

Declarative Local Infrastructure with Docker Compose

The local database environment is declared using a concise seventeen-line Docker Compose configuration. This manifest utilizes a lightweight Alpine Linux distribution of PostgreSQL 17.9 and includes an explicit readiness health check to ensure dependent tasks block until the database engine is fully initialized.

name: one_service

services:
  postgres:
    image: postgres:17.9-alpine
    container_name: one-service-postgres
    environment:
      POSTGRES_DB: one_service
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: admin
    ports:
      - "5432:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U admin -d one_service"]
      interval: 10s
      timeout: 5s
      retries: 5

This configuration allows developers to manage the entire infrastructure state directly from the terminal using standard compose lifecycle commands.

Automating Container Lifecycles within Gradle

To eliminate manual intervention entirely, we registered custom execution tasks within the Kotlin DSL build configuration file (build.gradle.kts). These tasks manage the container lifecycle programmatically, guaranteeing that the database is active during specific phases such as schema generation or local application execution.

val composeUpPostgres by tasks.registering(Exec::class) {
   group = "documentation"
   description = "Starts local Postgres container and waits until it is healthy"
   commandLine("docker", "compose", "up", "-d", "--wait", "--wait-timeout", "120", "postgres")
}

val composeStopPostgres by tasks.registering(Exec::class) {
   group = "documentation"
   description = "Stops local Postgres container after OpenAPI generation"
   commandLine("docker", "compose", "stop", "postgres")
}

By utilizing Gradle task graph dependencies, these infrastructure tasks are hooked automatically into the application build process. For example, generating OpenAPI documentation requires an active database to resolve the schema accurately. We map this dependency explicitly using the build task lifecycle.

tasks.named("generateOpenApiDocs") {
   dependsOn(composeUpPostgres)
   finalizedBy(composeStopPostgres)
   ...
}

This structural configuration ensures that the container initializes prior to the generation task and terminates cleanly upon completion, removing manual environmental variance from the automated workflow.

Ultimately, the architectural tools available mean there are very few justifications for maintaining an in-memory database simulation in a modern ecosystem. When automated assistants suggest these legacy configurations, human engineers must remain the final arbiters of architectural validity, recognizing that statistical probability does not always equate to engineering excellence.

A Systematic Approach to AI in Production

Tue, 19 May 2026 07:01:26 GMT

I have utilized generative AI tools such as ChatGPT and GitHub Copilot for several years, but the central question that has consistently occupied my research is how to effectively apply these technologies within a production environment. Through dozens of experiments, I have moved beyond simple code generation to delivering production-ready stories with minimal manual intervention. My objective is to transition from viewing AI as a mere novelty to integrating it into a functional triad programming model.

The Evolution Toward Triad Programming

In my experience, modern enterprise software cannot be developed in isolation; it requires a collaborative team effort. For roughly six months, I have explored the transition from traditional pair programming to triad programming, where an AI teammate joins the human pair to facilitate development. This transition requires a cultural shift within the team to move from treating AI as a buzzword to utilizing it as a practical tool.

The support of technical leadership is an important prerequisite for this shift. Without such backing, changing established team initiatives and workflows is difficult. To support this cultural change, we organized internal sessions and weekly two-hour workshops dedicated to demystifying the technology. By exploring how to master context and refine instructions, the team can eliminate the magical perception often associated with artificial intelligence and treat it as a predictable component of the engineering process.

Establishing the AI Environment through Context

Defining the AI environment is an ongoing challenge, especially given the limitations inherent in production workflows. For my current purposes, I define the environment as the context provided to the model, which effectively makes the AI environment equal to its instructions. Whether these instructions are provided through a prompt, a specific configuration file, or an MCP server, they serve as the foundational constraints for the AI's output.

I believe it is essential to manage the AI environment as closely as possible to the development process. This allows the team to remain agile and make necessary changes without creating disconnected silos of instruction.

A significant advantage of this approach is the ability to leverage existing, plain-English documentation rather than creating specialized AI adaptations. For example, I use the team's standard Confluence page for quality assurance and testing strategies as a direct instruction set. This documentation outlines requirements such as ensuring every acceptance criterion is covered by a test and avoiding complex end-to-end suites in favor of integration coverage. Decoupling the testing strategy from AI-specific formatting ensures that if the team updates their standards, the AI's context is automatically updated, while the documentation remains readable for non-engineering stakeholders.

Architectural Constraints and Testing Strategies

To reduce cognitive load and provide clear boundaries for the AI, my team established a strict architectural agreement for our services. We utilize a hexagonal architecture, which is documented in Confluence to ensure consistency when engineers rotate between different services. This structure includes a defined hierarchy of adapters, controllers, and domain use cases.

The current structure organizes components into clear packages such as:

com.todo.adapter.controller for handling external requests and DTOs
com.todo.adapter.supplier for repository adapters and external client configurations
com.todo.domain for core exceptions, models, and use cases

While this structure is optimized for organizational clarity rather than pure readability, it serves as a robust framework that prevents the AI from generating unexpected or hallucinated results. By grounding the AI in these established conventions, we save significant time that would otherwise be spent on custom instruction maintenance.
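To make the package boundaries concrete, here is a hypothetical single-file sketch of how a controller, a supplier adapter, and a domain use case relate. Every type name below (Todo, TodoSupplier, GetTodoUseCase, and so on) is invented for illustration; in the real layout each would live in its own package, and the controller would be a Spring component rather than a plain class.

```kotlin
// com.todo.domain: model, port, exception, and use case know nothing about HTTP or persistence.
data class Todo(val id: String, val title: String)

interface TodoSupplier { // port owned by the domain
    fun findById(id: String): Todo?
}

class TodoNotFoundException(id: String) : RuntimeException("Todo $id not found")

class GetTodoUseCase(private val supplier: TodoSupplier) {
    fun execute(id: String): Todo = supplier.findById(id) ?: throw TodoNotFoundException(id)
}

// com.todo.adapter.supplier: an outbound adapter implementing the domain port.
class InMemoryTodoSupplier(private val todos: Map<String, Todo>) : TodoSupplier {
    override fun findById(id: String): Todo? = todos[id]
}

// com.todo.adapter.controller: an inbound adapter translating domain results into DTOs.
data class TodoResponse(val id: String, val title: String)

class TodoController(private val getTodo: GetTodoUseCase) {
    fun get(id: String): TodoResponse = getTodo.execute(id).let { TodoResponse(it.id, it.title) }
}

fun main() {
    val supplier = InMemoryTodoSupplier(mapOf("1" to Todo("1", "Write tests")))
    val controller = TodoController(GetTodoUseCase(supplier))
    println(controller.get("1")) // prints "TodoResponse(id=1, title=Write tests)"
}
```

Because dependencies only point inward (adapters depend on the domain, never the reverse), an assistant asked to add a new adapter has a narrow, predictable set of types it is allowed to touch.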

The Practical Workflow: From Init Prompt to Autopilot

The bridge between our documentation and the code is the initialization prompt. I have found that the most effective flow involves using ChatGPT, which has integrated connections to Jira, Confluence, and our GitHub repositories. This allows me to create a prompt that references specific Jira stories and Confluence guidance pages directly.

When provided with these links, ChatGPT analyzes the story details, the codebase structure, and the architectural standards to generate a grounded implementation plan. This plan maps to actual ports and adapter conventions rather than generic advice. This approach also facilitates a dialogue between human pair partners, as the chat becomes a shared space for reaching an agreement before the final prompt is passed to GitHub Copilot.

Slicing and Iterative Implementation

A critical aspect of using AI in production is task slicing. To prevent the AI from attempting to generate non-existent dependencies, it is vital to isolate fragments of the story. For a simple task involving a controller, a use case, and a client, I follow an isolated sequence:

Implement a controller with a hard-coded response.
Implement the client that connects to the external service.
Develop the use case to bridge the domain model and the client.
Update the controller to utilize the new use case.

Each slice follows a rigorous autopilot loop within GitHub Copilot. I provide a specific instruction set that mandates a test-driven development cycle:

Analyze the task and the repository for alignment.
Create tests and mark them as skipped until the plan is approved.
Establish an implementation order for the tests.
Iterate through each test by removing the skip marker, implementing the code, and verifying the test passes.
Execute a full build, such as gradle clean build test, after each passing test to ensure overall system stability.

Human Oversight and Integration

Despite the high level of AI involvement, human oversight remains a non-negotiable requirement for production code. I request that Copilot organize the resulting files into commit groups that are easy for a person to understand before opening a pull request.

By keeping pull requests small and isolated, they remain manageable for human review, ensuring they meet specific client requirements and that the human engineers maintain a deep understanding of the codebase.

This workflow demonstrates that by leveraging existing organizational processes and treating AI as an integrated teammate rather than an external tool, we can deliver high-quality software with greater efficiency and consistency.

Diagnosing Observability Gaps in Blocking Controller Methods

Tue, 12 May 2026 07:02:00 GMT

In a distributed system, the invisibility of an expected log entry often signals a deeper divergence between execution flow and infrastructure expectations. During a recent implementation of a test email functionality within a Kotlin-based service, I encountered a scenario where logs in Datadog appeared for certain execution paths but remained absent for others. This inconsistency prompted an investigation into the interaction between the Kotlin when expression, blocking downstream calls, and the lifecycle of a request within the Datadog logging pipeline.

The target of this investigation was the sendTestEmail method located in the TestEmailController. The domain logic returns three distinct results: Success, FeatureTurnedOff, and Error. While the FeatureTurnedOff case consistently produced logs in the monitoring dashboard, the Success and Error outcomes frequently failed to emit the final confirmation log.

Analyzing the Execution Flow

The initial hypothesis centered on potential issues with the Kotlin when block or a misconfiguration of the Mapped Diagnostic Context (MDC). However, the technical finding revealed a more fundamental cause related to execution timing and the nature of the downstream service interaction.

The FeatureTurnedOff result is a short-circuit path. When the feature toggle is disabled, the use case returns a result immediately, allowing the controller to reach the final log statement and exit within a negligible timeframe. Conversely, both the Success and Error paths require a call to a downstream notification service. This call is implemented using a blocking mechanism via the .block() method on a reactive stream.

The discrepancy in log visibility was not a failure of the logging library but a consequence of the controller thread waiting on a blocking call. If the downstream service experienced latency or if the client closed the connection before the call completed, the final log statement was never reached or recorded.

This behavior was corroborated by Datadog errors indicating that the stream was closed by the client and that there were errors reading events. In environments utilizing the ssm-agent-worker, these interruptions can occur when the infrastructure or the initiating client terminates the request context before the application finishes its blocking operation.

Implementing a Robust Logging Lifecycle

To resolve the visibility gap, I restructured the logging strategy to separate request arrival from processing outcomes. By introducing a log statement immediately upon entry to the controller method, I ensured that a record exists regardless of how the downstream call performs.
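The effect of the entry log can be simulated without any logging framework. In this dependency-free sketch, a mutable list stands in for the log sink and the downstream lambda stands in for the blocking call; the function names are assumptions made for illustration.

```kotlin
// Only logs after the downstream call returns, mirroring the original controller.
fun handleWithoutEntryLog(log: MutableList<String>, downstream: () -> Unit) {
    downstream() // if this throws or never returns, the line below never runs
    log += "Test email request processed"
}

// Records arrival before any downstream behavior can interfere.
fun handleWithEntryLog(log: MutableList<String>, downstream: () -> Unit) {
    log += "Test email request received"
    downstream()
    log += "Test email request processed"
}

fun main() {
    val failing: () -> Unit = { throw IllegalStateException("stream closed by client") }

    val before = mutableListOf<String>()
    runCatching { handleWithoutEntryLog(before, failing) }
    println(before) // prints "[]"

    val after = mutableListOf<String>()
    runCatching { handleWithEntryLog(after, failing) }
    println(after) // prints "[Test email request received]"
}
```

In the failing case the first variant leaves no trace at all, while the second still shows that the request reached the controller.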

The revised implementation follows a deliberate pattern of enrichment and cleanup. I utilized MDC to attach structured metadata to the log records, which facilitates precise filtering in Datadog. It is essential to avoid generic MDC keys such as status, as these often conflict with reserved fields or common conventions in log aggregators. Instead, I opted for specific identifiers like testEmailOutcome and templateId.

Structured Implementation and MDC Hygiene

The following structure ensures that the MDC is populated at the start of the request and, crucially, cleared in a finally block to prevent context leakage between threads.

try {
    MDC.put("templateId", request.templateId)
    logger.info("Test email request received")

    val status =
        when (val result = sendTestEmailUseCase.execute(request.templateId)) {
            is SendTestEmailResult.Success -> {
                MDC.put("testEmailOutcome", "test email was sent successfully")
                HttpStatus.CREATED
            }

            is SendTestEmailResult.FeatureTurnedOff -> {
                MDC.put("testEmailOutcome", "feature toggle is off, test email was not sent")
                HttpStatus.ACCEPTED
            }

            is SendTestEmailResult.Error -> {
                MDC.put("testEmailOutcome", "test email failed to send")
                MDC.put("testEmailErrorMessage", result.cause.message ?: "unknown error")
                HttpStatus.INTERNAL_SERVER_ERROR
            }
        }

    logger.info("Test email request processed")

    return ResponseEntity
        .status(status)
        .body(SendTestEmailResponse(templateId = request.templateId))
} finally {
    MDC.clear()
}

This approach provides a clear narrative in the logs. The Test email request received log serves as a heartbeat, confirming the controller was reached. The final Test email request processed log confirms the blocking call completed and indicates which branch of the when logic was executed.
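Why the finally block matters can be shown with a small simulation. MDC state is held per thread, and servlet containers reuse pooled threads across requests, so anything left behind is visible to the next request handled by that thread. The FakeMdc object below is a hypothetical stand-in built on ThreadLocal, not the SLF4J API, and the key and value are illustrative.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors

// Stand-in for a per-thread diagnostic context.
object FakeMdc {
    private val context = ThreadLocal.withInitial { mutableMapOf<String, String>() }
    fun put(key: String, value: String) { context.get()[key] = value }
    fun get(key: String): String? = context.get()[key]
    fun clear() = context.get().clear()
}

// Runs two "requests" on the same pooled thread and returns what the second one observes.
fun secondRequestSees(clearAfterFirst: Boolean): String? {
    val pool = Executors.newSingleThreadExecutor()
    try {
        pool.submit(Runnable {
            try {
                FakeMdc.put("templateId", "welcome-email")
            } finally {
                if (clearAfterFirst) FakeMdc.clear()
            }
        }).get()
        return pool.submit(Callable<String?> { FakeMdc.get("templateId") }).get()
    } finally {
        pool.shutdown()
    }
}

fun main() {
    println(secondRequestSees(clearAfterFirst = false)) // prints "welcome-email"
    println(secondRequestSees(clearAfterFirst = true))  // prints "null"
}
```

Without the cleanup, the second request would be logged with the first request's templateId, which is exactly the kind of misleading record that makes Datadog filters unreliable.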

Interpreting Downstream Service Signals

Understanding the relationship between the application and the notification service is vital for interpreting the logs. For instance, an observed HTTP 400 Bad Request error from the notification service endpoint indicates that the feature toggle was active and the application successfully initiated the call. Because this is a terminal error from the downstream provider, the result maps to SendTestEmailResult.Error.

Logging the specific error message from the result cause into a dedicated MDC field allows for immediate debugging of downstream rejections without requiring a manual trace of the network call.

The introduction of the early log statement fixed the observability issue for all three execution paths. It provides a reliable controller-level record that the request was received before any slow or failing downstream behavior could interfere with the logging thread.

Conclusion on Implementation Choices

The decision to add a pre-call log and wrap the execution in a try-finally block was a logical response to the constraints of blocking I/O. While reactive, non-blocking patterns are often preferred, existing architectural constraints sometimes necessitate the use of .block(). In such cases, the primary responsibility of the developer is to ensure that the system remains observable even when execution is stalled.

By grounding the logging strategy in the lifecycle of the request rather than just the final outcome, I established a more resilient monitoring posture. The logs now clearly differentiate between request arrival, downstream processing, and final controller outcome, providing the necessary context to diagnose failures in a distributed environment.

The Preparation of the Machine

Tue, 28 Apr 2026 07:01:44 GMT

The Sim Racing Setup

I’ve spent enough time in this industry to know that the promise of “plug-and-play” is usually a lie told to people who don’t have to maintain the results. We’ve grown accustomed to our IDEs functioning almost perfectly the moment we install them, which has created a bit of a lazy habit in our collective psyche. We expect our tools to meet us where we are without any effort on our part. But when I look at the current state of Generative AI, I’m reminded much more of high-performance sim racing or building a custom PC. You can just plug a wheel into a desk and start driving, but you won’t actually feel the road, and you certainly won’t win any races. To get professional results, you have to embrace the preparation. The setup isn’t an annoying preamble; it is the work itself.

Hierarchies of Instruction

In my recent experiments, I’ve moved away from treating ChatGPT as a blank slate. Instead, I’ve been refining a two-tier configuration that relies on Project Instructions, which are specific directives tailored to a particular codebase or business domain that work in tandem with my global settings. I found that by splitting instructions between a global level—who I am and how I want to be spoken to—and a project level, I could stop the AI from hallucinating a generic solution. This isn’t about giving the AI a long list of rules to follow blindly. It’s about creating a runtime environment that respects the reality of my actual repository.

Slicing Against the Grain

There is a fundamental tension in how we break down work for a machine versus how we break it down for a human. In the agile world, we are taught the value of a Vertical Slice, which is a functional piece of work that touches every layer of the system to deliver a complete feature. When I am working with AI, however, I’ve found that this approach often leads to a mess. I’ve started practicing a methodology where I break a complex story into isolated, technical layers—repository, use case, then controller—as separate steps. I didn’t set out to slice the “layers of a pie” instead of the “slices of a cake” because I thought it was a better way to design software; I did it because I found it simply works better for the AI’s current reasoning capabilities. It’s an empirical adjustment. By forcing the AI to focus on one technical layer at a time, I prevent the logic from becoming a tangled knot of half-finished abstractions.

The Logic of Two Flows

Within these project instructions, I’ve found success by defining two distinct paths of interaction. I call these Flow-Based Prompts, a system where the AI knows whether we are in an analysis phase or an execution phase.

Flow 1: Analysis & Slicing
- Goal: Digest the Jira story and propose the technical slices.
- Output: A structured implementation plan.

Flow 2: Prompt Generation
- Goal: Create a specific instruction for GitHub Copilot.
- Output: An isolated prompt for a single technical layer.

In the first flow, the AI acts as a sounding board, helping me decompose a story and identify the technical boundaries. In the second flow, it transitions into a generator, producing the exact context needed for GitHub Copilot to write the code. This prevents the “handoff” problem where context gets lost between the chat window and the code editor. It ensures that when I move to my IDE, the instructions are already tailored to the specific slice of the system I am currently building.

The Evolutionary Tree

Of course, I’ve been skeptical of “perfectly automated” prompts that try to handle every edge case from the start. I’ve discarded that idea for now because, at this stage of my understanding, those prompts usually just add unnecessary weight and noise. However, I don’t think we are stuck here. I suspect that as we get better at this, our instruction sets will evolve into something more like a tree. The system won’t just be a static list of rules; it will be an adaptive structure that detects the current context of the work and branches out to provide exactly the right level of detail.

We are moving toward a future where the tool detects the type of instruction needed rather than requiring us to shout the same commands every morning.

For now, the manual setup is where the value lives. It’s the difference between a tool that guesses and a tool that knows.

Back to Reality

In the end, I’m keeping the slicing methodology and the dual-flow instruction setup in my toolkit. I’ve set aside the hunt for a “magic” prompt that solves everything in one go. Reality is messy, and our tools need to be flexible enough to reflect that. We should be skeptical of any AI workflow that promises to do the thinking for us. The real value is in the preparation—the configuration of the environment—that allows us to do our best thinking with a bit less friction.

The Shared Reality of the Database Ledger

Tue, 21 Apr 2026 07:01:15 GMT

I spent a good portion of the early 2000s staring into the flickering glow of a CRT monitor, trying to master the precise sequence of an RTS build order. In games like StarCraft, you didn’t just build a Factory on a whim; you followed a rigid, physical sequence of Supply Depots and Barracks. The real problem wasn’t just losing a match—it was the desync, a fatal error where one player’s game state no longer matched the other’s. When that happened, the shared reality of the match simply evaporated.

I found that managing a database schema with Flyway feels remarkably similar. We often treat database evolution as a fluid, agile process, but the underlying reality is much more rigid. When we move from the isolated “practice map” of local development to the high-stakes environment of a production database, we are moving into a space where the history of what we built is just as important as the current state. In this space, a mismatch between your code’s expectations and the database’s actual schema is the ultimate game-breaker.

The Migration Ledger

Flyway manages this by utilizing a migration-based approach, which means every change to the database—whether adding a table or altering a column—is captured in a versioned SQL script. It maintains a dedicated table called flyway_schema_history to track exactly which scripts have been executed. To ensure consistency, the system calculates a checksum, which is a digital fingerprint of the file’s content.

If I ever change a script after it has already run on a server, Flyway detects that the fingerprint has changed. This results in a checksum mismatch, and the system will stop the application from starting. This immutability is not a hurdle; it is a safety feature designed to prevent the database from entering an unknown state where the code expects one schema but the database has another.
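To make the fingerprint idea concrete, here is a minimal Java sketch of a ledger that records a checksum per applied version and refuses a script whose content has changed. It is a simplified model, not Flyway's implementation: the MigrationLedger class and its methods are hypothetical, and hashing the raw script text with CRC32 is an assumption made to keep the example short.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.CRC32;

// Illustrative model of a migration ledger; this is not Flyway's implementation.
class MigrationLedger {
    // version -> checksum recorded when the script was first applied
    private final Map<String, Long> applied = new LinkedHashMap<>();

    static long checksum(String scriptContent) {
        CRC32 crc = new CRC32();
        crc.update(scriptContent.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    void recordApplied(String version, String scriptContent) {
        applied.put(version, checksum(scriptContent));
    }

    // Throws if a script that already ran no longer matches its recorded fingerprint.
    void validate(String version, String currentContent) {
        Long recorded = applied.get(version);
        if (recorded == null) {
            return; // never applied, so there is nothing to compare against
        }
        long current = checksum(currentContent);
        if (recorded != current) {
            throw new IllegalStateException("Checksum mismatch for version " + version
                    + ": recorded=" + recorded + " current=" + current);
        }
    }
}
```

The important design choice is that validation fails loudly instead of silently re-running the edited script: the ledger treats history as immutable, and any disagreement is a reason to stop.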

Iteration in the Local Loop

The friction often begins when we forget that our local environment is a sandbox, not a permanent monument. On macOS, I found that using Docker and Testcontainers is the most reliable way to ensure a local database actually matches production. We can spin up a local container with a single command to test our build order:

docker run --name my-db -e POSTGRES_PASSWORD=pass -e POSTGRES_DB=mydb -p 5432:5432 -d postgres

This local container allows us to iterate quickly. In our build.gradle.kts configuration, we ensure that the cleanDisabled flag is set to false.

flyway {
    url = "jdbc:postgresql://localhost:5432/mydb"
    user = "postgres"
    password = "pass"
    cleanDisabled = false
}

This setup gives us a reset button. If I realize my first version of a script is flawed, I don’t create a second script to fix the first one locally. Instead, I edit the original script, run ./gradlew flywayClean, and then ./gradlew flywayMigrate. This ensures that my local state remains clean and my scripts remain concise before they are ever shared with the team.

The Virtue of Squashing

When working on a complex feature, I often end up with several different migration scripts as I refine the design. Merging all of them into the main branch is a mistake because it clutters the history with a “diary” of my trial and error. Instead, I practice squashing, the act of consolidating all logic from multiple feature-branch scripts into one single, optimized file.

Squashing improves readability, making it easier for a peer to review one coherent table creation rather than a series of renames and drops. It also improves performance, as fewer scripts mean faster deployment and test execution. Before I merge a Pull Request, I ensure my local database is cleaned and migrated one last time to verify that the final, squashed script works perfectly.

Constraints of the Persistent Environment

The danger arises when we attempt to treat a persistent environment, like Amazon Aurora, as if it were a local Docker container. Unlike our local sandbox, we cannot simply wipe a cloud database.

Triggering a clean command in a persistent environment is the ultimate “Game Over,” as it will drop all application data and cause a full service interruption.

Production database users usually lack the permissions to drop schemas anyway, which is a vital safety rail. However, errors still happen. PostgreSQL can roll back most schema changes inside a transaction, but a migration that runs outside one (for example, a script containing CREATE INDEX CONCURRENTLY) can fail midway and leave the database in a “half-built” state. When this happens, we must fix the script in the codebase and run ./gradlew flywayRepair. This command removes the failed entry from the history table and realigns the recorded checksums with the scripts on disk without deleting any application data, though sometimes manual SQL intervention is required to fix the table structure before the repair can succeed.

Discipline Over Magic

At the end of the day, database migrations are about the discipline you bring to the ledger rather than the tool itself. Flyway is a powerful engine, but it won’t save you from a messy build order or a lack of environmental parity. I’m keeping the practice of squashing and the strict use of containers in my toolkit, while setting aside any hope that these systems will ever be truly “set and forget”.

The reality is that database state is heavy and unforgiving. If you treat your migrations with the respect a shared reality demands, your deployments will become boring—which is exactly what we should strive for.

The Cognitive Cost of AI Delegation

Tue, 14 Apr 2026 07:02:37 GMT

The Brake-Fade on the Downhill (The Hook)

When you’re descending a steep technical trail on a mountain bike, your most precious resource isn’t your speed—it’s your biological energy and grip strength. If you spend the entire descent white-knuckling the brakes because you’re afraid of the terrain, you hit “brake fade.” The system overheats, your hands cramp, and by the time you reach the truly dangerous rock garden at the bottom, you have zero “focus capital” left to navigate it. You crash not because the trail was too hard, but because you wasted your resources on the easy parts.

In the professional world, GenAI is being marketed as the ultimate “e-bike” for our brains. The industry assumption is that more output equals more productivity. But if this “unlimited output” is the popular choice, why does it feel like I’m fighting the system? Why does receiving a perfectly formatted, AI-generated A4 page feel like a cognitive “crash” before I’ve even reached the conclusion?

The Architecture of the Proxy Mind (The Landscape)

The environment I’m navigating isn’t just a chat interface; it’s a Mind-to-Mind Pipeline where the AI acts as a middleware layer. We are dealing with a system defined by the following geometry:

[Input: Raw/Unorganized Chaos]

          ↓

[Processor: GenAI “Mind Extension”]

          ↓

[Output: Structured Narrative (High Volume)]

          ↓

[Buffer: Human Reviewer (The Fatigue Point)]

          ↓

[Destination: Recipient’s Attention Span]

The constraints here are rigid. The LLM has no “physical” weight, but its output carries massive cognitive weight. The dependencies are tightly coupled: if I delegate the “thinking” to the tool without managing the “output volume,” the invisible boundary of the recipient’s attention is breached. Data moves through this space quickly, but meaning gets trapped in the friction of the preamble.

The A4 Saturation Point (The Stress Test)

I moved my observations from the “theoretical path” to the “actual terrain” where people have many unread messages.

➤ The Breaking Point: The methodology of “Ask and Forward” failed at the third iteration. When I pushed a full A4 page of structured AI text to a colleague, the system showed immediate fatigue.

➤ The Silent Failure: The recipient didn’t tell me the text was too long. Instead, they “swallowed” the error—skimming the preamble, missing the critical “result of work” buried in the middle, and asking a question that was already answered in the text.

➤ The Observation: The gap between the “Structured Answer” provided by the AI and the actual Information Transferred was a massive chasm. While I didn’t measure the exact percentage, the observation was clear: the system was technically functioning, but the mission failed. The recipient’s focus simply didn’t survive the “A4 size” barrier.

The Noise Floor of the Preamble (The Handoff)

This is a failure of delegation. When we use AI to structure “unstructured vision,” we often translate our goal into an action that generates clutter rather than clarity.

➤ Signal-to-Noise: GenAI tools are programmed to be “helpful,” which means adding long, polite preambles and exhaustive summaries. This is the “noise floor”.

➤ Cognitive Load: By sending unedited AI responses, you aren’t saving time; you are just shifting the processing debt onto the recipient. You spend 10 seconds generating the text, but you force the recipient to spend minutes mining it for value. This eventually leads to a “system blackout” where people ignore messages entirely.

The Hard Character Limit (The Verification)

After observing these failures, only one principle remained standing: The Short Style Constraint.

➤ Stability: The only communication that survived the “skimming” reflex was the “Elevator Pitch” format. When forced into a tight container, the AI is actually better at its job. It stops “hallucinating value” through word count and starts organizing logic.

➤ The New Baseline: The trusted approach is the Init Prompt Constraint. I tell the system: “Structure my thoughts, but do not exceed 280 characters” or “Provide the result first, no preamble”.

➤ The Evolution: I no longer view AI as a “writer”; I view it as a compressor. The strategy has shifted from using AI to say more to using it to say exactly enough.

The Navigator’s Log (Actionable Insights)

➤ Backlog:

• The “A4-size” response: a legacy format that died with the printer.

• “Respectful” AI preambles: they are actually disrespectful to the recipient’s time.

• Trusting the human brain to catch errors in long AI texts after multiple iterations (brain laziness is a hardware feature, not a bug).

➤ Merged:

• The “Short Style” Init Prompt: Force the AI into a constraint before it generates a single word.

• Energy Conservation: Spend mental energy on the constraint, not on editing massive, verbose text.

• The Win-Win Protocol: If the sender spends less energy reviewing and the recipient spends less energy reading, the system remains stable.

Final Wisdom: In a world of infinite AI-generated noise, the most “premium” technical skill is the discipline to limit content. Be respectful to the system, or the system will stop listening.

The Pragmatic Hexagon: Scaling Decoupling without Complexity

Tue, 24 Mar 2026 13:21:45 GMT

The Tension on the Trail

In a professional kitchen, there is a concept called mise en place—everything in its place. You don’t start searing the scallops until every herb is chopped and every sauce is whisked. If you skip the prep to “save time,” you end up adjusting the recipe mid-sauté, usually resulting in a frantic mess, ruined ingredients, and a dish that takes twice as long to serve.

Modern software development has a similar “popular choice”: start coding the logic immediately to show “progress.” But when we skip the architectural prep—the interfaces and boundaries—we aren’t moving fast; we are just building a kitchen we’ll have to tear down while the customers are waiting. I’ve watched engineers lose sight of the goal in the pursuit of a “perfect flow” that wasn’t grounded in discipline. If everyone says they want “clean code,” why does the system feel like it’s fighting us the moment we add a new story?

System Geometry

The environment of this experiment is a standard Kotlin and Spring Boot stack. The landscape is defined by three distinct zones designed to minimize the “weight” of dependencies. To navigate this space, we use a rigid directory structure that acts as our map:

app

├── domain      <-- THE HEART (POKOs only)
│   ├── model
│   │   └── Data.kt     <-- Pure Kotlin Data Class
│   └── ports
│       └── outgoing    <-- Interfaces defining “What” we need
│           ├── DataPersistencePort.kt    <- SQL db
│           └── DataStoragePort.kt        <- Object storage
├── usecases    <-- THE ORCHESTRATOR
│   └── StoreDataUseCase.kt    <-- Feature logic
└── adapter     <-- THE “HOW” (Infrastructure)
    ├── web         <-- Inbound Adapter
    │   ├── DataController.kt
    │   └── dto         <-- Request/Response DTOs
    │       └── WebMapper.kt    <-- DTO <-> Domain mapping
    ├── sqldb       <-- Outbound Adapter
    │   ├── entity
    │   │   └── DataJpaEntity.kt    <-- @Entity + JPA annotation
    │   ├── DataRepository.kt        <-- Spring Data/CrudRepository
    │   ├── PersistenceMapper.kt     <-- Entity <-> Domain mapping
    │   └── PersistenceAdapter.kt    <-- Impl DataPersistencePort
    └── cloud       <-- Outbound Adapter
        └── ObjectStorageAdapter.kt

➤ The Heart (Domain): Pure Kotlin Data Classes and business logic common to all usecases.

➤ The Orchestrator (Usecases): Where feature-specific logic lives and adapters are coordinated.

➤ The Infrastructure (Adapters): The “How” of the system—web controllers, JPA entities, and cloud storage clients.

The invisible boundary here is the Port. It’s an interface that defines “what” we need without caring “how” it’s done. In theory, this geometry should be light and flexible, yet many teams find it rigid because they misunderstand the direction of the signal.
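A minimal sketch of that boundary, written in Java and kept framework-free so it runs on its own. The type names Data, DataPersistencePort, and StoreDataUseCase follow the directory map above; the method names (save, findById, store) and the in-memory adapter are assumptions for illustration, standing in for the real JPA-backed PersistenceAdapter.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Domain model: no framework or infrastructure imports.
final class Data {
    private final String id;
    private final String payload;

    Data(String id, String payload) {
        this.id = id;
        this.payload = payload;
    }

    String getId() { return id; }
    String getPayload() { return payload; }
}

// Outgoing port: states what the application needs, not how it is done.
interface DataPersistencePort {
    void save(Data data);
    Optional<Data> findById(String id);
}

// Use case: orchestrates through the port only and never sees an adapter type.
class StoreDataUseCase {
    private final DataPersistencePort persistence;

    StoreDataUseCase(DataPersistencePort persistence) {
        this.persistence = persistence;
    }

    Data store(String id, String payload) {
        Data data = new Data(id, payload);
        persistence.save(data);
        return data;
    }
}

// Hypothetical in-memory adapter standing in for the JPA-backed implementation.
class InMemoryPersistenceAdapter implements DataPersistencePort {
    private final Map<String, Data> rows = new HashMap<>();

    @Override
    public void save(Data data) {
        rows.put(data.getId(), data);
    }

    @Override
    public Optional<Data> findById(String id) {
        return Optional.ofNullable(rows.get(id));
    }
}
```

The dependency arrow points inward: the adapter implements an interface owned by the domain, so swapping the in-memory version for a database-backed one does not touch the use case.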

Empirical Exploration

I moved from the “theoretical path” of perfect architecture to the “actual terrain” of daily PRs. The system showed its breaking point not in a crash, but in a silent failure of discipline: the Domain Import Leak.

➤ The Breaking Point: It usually starts when an engineer adds a domain service that directly imports an adapter: import app.adapter.NewAdapter.

➤ The Silent Failure: The code still passes tests. It still “works”. But the “Pure Domain” has been poisoned by infrastructure concerns.

➤ The Result: When the time inevitably comes to move that service to a usecase, the system reacts with extreme fatigue. We end up with PRs requiring the renaming of tens of files, leading to typos, package mismatches, and a massive mental load on reviewers.
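Because the leak is silent, it helps to have something that makes it loud. The following is a minimal Java sketch of such a guard, assuming sources are available as a map from relative path to file content. DomainImportGuard is a hypothetical helper, and the text scan is deliberately naive; a dedicated architecture-testing library would be more robust.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical guard: scans source text instead of compiled classes to stay dependency-free.
class DomainImportGuard {
    // Returns the domain files that import anything from the adapter package.
    static List<String> findLeaks(Map<String, String> sourcesByPath) {
        List<String> leaks = new ArrayList<>();
        // TreeMap gives a stable, sorted report regardless of the input map type.
        for (Map.Entry<String, String> entry : new TreeMap<>(sourcesByPath).entrySet()) {
            String path = entry.getKey();
            if (!path.startsWith("app/domain/")) {
                continue; // only the domain has to stay free of adapter imports
            }
            for (String line : entry.getValue().split("\\R")) {
                if (line.trim().startsWith("import app.adapter.")) {
                    leaks.add(path);
                    break; // one offending import is enough to flag the file
                }
            }
        }
        return leaks;
    }
}
```

Run as part of the test suite, a non-empty result can fail the build, which turns the discipline problem into a visible red signal at PR time.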

Managing the Signal

The handoff between layers is where the “spaghetti” starts or ends. In my exploration, I found that the clarity of intent is often lost because teams are afraid of the “complexity” of an extra interface.

➤ Cognitive Load: Trying to refactor architecture in the middle of a feature story creates a “refactoring nightmare”.

➤ Signal-to-Noise: If you are 100% sure a logic block belongs in the domain, put it there. If not, the “cleaner” signal is to start in a Usecase and extract downward only when the need is proven.

➤ Direct Translation: To keep the signal clear, I’ve found it’s even acceptable to call a Port directly from a controller for simple cases. This avoids 1:1 “pass-through” mapping while keeping the adapter decoupled through the interface.

What Earned Trust?

After the stress test of “no time to decouple,” one principle remained standing: Mandatory Ports from the Start.

➤ Stability: The “price” of an interface at the start is effectively zero. It provides an immediate boundary that prevents the “import leak” and allows the domain to remain pure.

➤ The New Baseline: My trusted navigation strategy is now TDD-driven Hexagon.

• Step 1: Define the Domain Model.

• Step 2: Build the Adapter and verify it with Testcontainers (SQL or Object Storage).

• Step 3: Finally, orchestrate it all in the Usecase or Controller using the Port interface.

Actionable Insights

➤ Backlog (Failed the Stress Test):

• “Refactoring-in-the-middle”: Changing architecture while delivering a story leads to mess and typos.

• Direct Adapter Imports: Any import app.adapter inside app.domain is a bug, not a feature.

➤ Merged (Trusted Toolkit):

• Ports First: Always create the interface for 3rd party services or repositories immediately.

• Adapter-First Testing: Use Testcontainers to prove your “How” works before you worry about the “What” in your orchestration.

• Minimum Layers: Only add a Usecase layer if there is actual orchestration; otherwise, call the Port from the Controller.

Final Wisdom: Clean architecture isn’t about having the most layers; it’s about having the most resilient boundaries. The “price” of an interface is nothing compared to the cost of a messy PR that no one wants to review.

Lying Tests and the Silent Swallow: Hardening Legacy Java

Tue, 17 Mar 2026 08:00:27 GMT

Is your CI/CD pipeline telling you the truth, or is it just telling you what you want to hear?

In many legacy projects, the build is “Green,” the tests pass, and the console shows no errors. Yet, the moment the application hits production, it fails. The culprit is often a “Lying Test”—a suite that passes not because the code works, but because the errors have been carefully hidden, logged to a void, or suppressed by a generic catch-all block.

How do you turn a “politely silent” codebase into one that fails loudly enough to be fixed?

The ‘Before’ State: Setting the Context

In older Java applications (circa 2005), error handling was often synonymous with e.printStackTrace(). Developers used manual main() methods or early JUnit versions to “test” logic. When an exception occurred, the instinct was to keep the process running at all costs.

The “old way” of testing often looked like this:

The Silent Swallow: Generic catch (Exception e) blocks that log a message but do not rethrow or signal failure.
Exit Code 0: Build scripts (Ant) that encounter a runtime error but still report a successful exit code, tricking the developer into thinking everything is fine.
Manual Verification: Tests that require a human to read the console output to see if it “looks right,” rather than asserting a specific outcome.

Introducing the Core Concept: Honest Testing

Honest Testing is the process of stripping away the “safety blankets” of legacy error handling to force the application to Crash Loudly.

What is it? It is a “Hardening Phase” where you replace swallowed exceptions with meaningful failures and migrate manual checks to automated assertions.

Why does it matter? You cannot refactor code you do not understand. If your tests are lying to you about the state of the system, any “improvement” you make is just a guess. Making the build RED is the first step toward making it truly GREEN.

Practical Applications & Use Cases

Use Case A: Exposing the Silent Swallow

The most common anti-pattern in legacy Java is the “Log and Forget” block. We must convert these into loud failures during the testing phase.

// BEFORE: The Lying Code
public void storeData() {
    try {
        // critical logic
    } catch (Exception e) {
        System.out.println("Error happened, but let's keep going!");
    }
}

// AFTER: Honest Code for Testing
public void storeData() {
    try {
        // critical logic
    } catch (Exception e) {
        // Re-throwing as a RuntimeException forces the test to fail
        throw new RuntimeException("Hardened Failure: Data storage failed", e);
    }
}

Benefit: The test suite will now immediately catch failures that were previously invisible.

Use Case B: From `main()` to JUnit 5

Legacy projects often have “test” classes that are just public static void main(String[] args) methods. These don’t integrate with CI/CD.

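// Migrating to JUnit 5 Assertions
import static org.junit.jupiter.api.Assertions.assertDoesNotThrow;
import static org.junit.jupiter.api.Assertions.assertNotNull;

import org.junit.jupiter.api.Test;

class BackendConnectionTest {
    @Test
    void testBackendConnection() {
        Backend b = new Backend("qbert.guba.com");
        // Instead of printing to console, we assert the state
        assertDoesNotThrow(() -> b.connect(), "Connection should be stable");
        assertNotNull(b.getStatus(), "Status should be initialized");
    }
}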

Benefit: Provides a quantifiable “Safety Net” that build tools like Gradle can interpret as a Pass/Fail signal.
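That Pass/Fail signal ultimately reaches the build tool as a process exit code, which is where the “Exit Code 0” problem from earlier gets fixed. A minimal Java sketch, assuming a hypothetical HonestRunner wrapper around a legacy entry point:

```java
// Hypothetical wrapper: class and method names are assumptions for illustration.
class HonestRunner {
    // Maps the outcome of a job to an exit code instead of swallowing failures.
    static int run(Runnable job) {
        try {
            job.run();
            return 0; // success is reported only when nothing was thrown
        } catch (RuntimeException e) {
            System.err.println("Job failed: " + e.getMessage());
            return 1; // non-zero tells the build tool or CI that the step failed
        }
    }
}
```

A real main() would pass the result to System.exit, so a failing job can no longer produce a green build by accident.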

Common Pitfalls & Misconceptions

The “Fear of Red” Pitfall: Many teams are terrified of a broken build. They think that if the build turns red, they’ve failed.

The Truth: In legacy refactoring, a Red Build is a victory. It means you’ve finally found the boundaries of the system. You’ve moved from “unknown-unknowns” to “known-knowns.” Don’t rush to fix the red; use it as a map to find where the code is truly broken.

Core Trade-offs & Nuances

The “Crash” Period: When you start hardening tests, the project might not compile or pass for days. This requires stakeholder buy-in—you are breaking the “illusion of stability” to find the “reality of the debt.”
Log Noise: Hardening exceptions often results in massive stack traces in your logs. This is necessary labor; you have to clean the noise to find the signals.

Forward-Looking Conclusion

A “Green Build” is only valuable if it is earned. By removing the “Silent Swallows” from your legacy Java project, you are performing a diagnostic surgery. It is painful, and it reveals the rot, but it is the only way to heal the codebase.

Once your tests are honest, you can finally apply modern AI tools and refactoring patterns with confidence. You aren’t just “hacking” anymore; you are Engineering.

Environment Emulation: Using Docker as a Time Machine for Legacy Java

Nik — Tue, 03 Mar 2026 08:01:11 GMT

What do you do when the code is right, but the world has changed too much to run it? You’ve successfully compiled a 20-year-old Java app, but the moment you hit “Run,” it crashes. It’s looking for a server named qbert.guba.com that was decommissioned in 2011. It’s searching for a local directory belonging to a developer who left the company fifteen years ago.

How do you convince a digital “antique” that it’s still living in 2005?

The ‘Before’ State: Setting the Context

In the early days of Java development, “Environment Variables” and “Configuration as Code” were often ignored in favor of hardcoded assumptions. Developers wrote code that relied on:

Static Network Topologies: Hardcoded hostnames in .properties files or even inside .class files.
Personalized File Paths: Logic that pointed to /Users/ericlambrecht/data, making the code physically impossible to run on any other machine.
Specific Hardware Quirks: Reliance on the way Intel processors handled certain operations, which breaks on modern ARM-based chips like Apple’s M-series.

The “old way” to fix this was a massive refactoring effort to externalize configuration. But when you have thousands of lines of “spaghetti” code, you risk introducing more bugs than you fix.

Introducing the Core Concept: Environment Emulation

Environment Emulation is the practice of using containerization to recreate a specific historical “reality” for your application. Instead of changing the code to fit the modern world, you change the world to fit the code.

What is it? It’s a “Time Capsule” strategy where Docker mimics the network, filesystem, and CPU architecture the application expects.

Why does it matter? It allows you to achieve a “Green Start” without touching a single line of legacy business logic. By stabilizing the environment first, you can verify that the code can work before you begin the dangerous work of refactoring it.

Practical Applications & Use Cases

Use Case A: Network Trickery (Docker Aliases)

If your legacy code is hardcoded to look for qbert.guba.com, you don’t need to hunt through the source code. You can use Docker’s network aliases to point that “ghost” hostname to a local container or a mock service.

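# docker-compose.yml
services:
  legacy-app:
    image: my-ancient-app:latest
    networks:
      - backend
  mock-backend:  # hypothetical stand-in for the decommissioned server
    image: my-mock-backend:latest  # placeholder image name
    networks:
      backend:
        aliases:
          - qbert.guba.com  # The app thinks it found its long-lost server
networks:
  backend: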

Benefit: The application connects successfully without any code changes or /etc/hosts hacking on your host machine.

Use Case B: Filesystem Mimicry (Volume Mapping)

When code is locked to a specific path like /Users/ericlambrecht/data, Docker volumes can “teleport” your modern project directory into that exact location inside the container.

docker run -v "$(pwd)/data:/Users/ericlambrecht/data" my-legacy-java-app

Benefit: You satisfy hardcoded file requirements immediately, allowing the app to boot and pass its initial I/O checks.

Use Case C: Hardware Realities (x86 on ARM)

Older binaries or specific versions of the JVM (like early Java 6 or 8 builds) may behave unpredictably on Apple Silicon (ARM64). You can force Docker to emulate the original Intel environment.

# Pin the platform so legacy x86 binaries run under emulation on ARM hosts
FROM --platform=linux/amd64 eclipse-temurin:8-jdk

Benefit: Eliminates subtle “Heisenbugs” caused by CPU architecture differences.

Common Pitfalls & Misconceptions

The "Config-First" Trap: Many engineers think they must "clean up" the configuration files before they can run the app in Docker.

The Fix: Don’t clean. Emulate. Use Docker to satisfy the app’s current (even if “ugly”) requirements. Once you have a running, testable container, you can then refactor the configuration into modern environment variables as a second, safer step.

Core Trade-offs & Nuances

The “Magic” Burden: Environment emulation can feel like “magic” to new developers. If the docker-compose.yml isn’t well-documented, a newcomer won’t understand why the app is looking for a server that doesn’t exist.
Performance: Running x86 images on ARM64 via emulation (QEMU) is slower than native execution. This is acceptable for refactoring and testing, but may not be ideal for high-performance production needs.

Forward-Looking Conclusion

Modernization is an act of engineering, not just coding. By using Docker as a “Time Machine,” you stop fighting the environment and start observing the application’s actual behavior.

Once the “Time Capsule” is built, you have achieved the ultimate goal of the software archaeologist: Reproducibility. From here, you can move forward with confidence, knowing that any changes you make to the code are being tested against a stable, predictable reality.

The Strangler Build: Modernizing Java Tooling with Gradle 7.6

Nik — Tue, 17 Feb 2026 08:03:21 GMT

What do you do when your build system is the primary blocker to your modernization? You want to introduce automated testing and containerized deployments, but your project is locked inside an opaque build.xml file. It’s not necessarily that the file is thousands of lines long—it’s that it represents a “frozen” process. The fear of breaking a specific, undocumented Ant target often keeps teams stuck in the past, manually running builds because they don’t trust the automation.

The ‘Before’ State: Setting the Context

In the early 2000s, Apache Ant was the industry standard. It was purely imperative: you wrote a “script” telling the computer exactly how to delete folders, copy files, and compile classes.

The problem isn’t just the age of the tool; it’s the lack of lifecycle. Unlike Maven or Gradle, Ant has no built-in concept of a “test” phase or a “package” phase unless someone manually scripted them. For many legacy projects, this resulted in a build process that is fragile, hard to replicate in CI/CD, and completely disconnected from modern dependency management.

Introducing the Core Concept: The Tooling Strangler

The Tooling Strangler applies the Strangler Fig pattern to your build infrastructure. Instead of attempting a “Big Bang” migration where you delete Ant and spend a week debugging a new Gradle script, you wrap the old logic.

What is it? Using Gradle’s ant.importBuild, you surface your legacy Ant targets as native Gradle tasks.

Why does it matter? It allows you to move to a modern CLI immediately. You get the benefits of the Gradle Wrapper (./gradlew), advanced caching, and build scans, while the actual heavy lifting is still performed by the original, proven Ant logic.

Practical Applications & Use Cases

Use Case A: The “Wrapper” Migration

By importing the build, you can start adding modern features (like dependency management) around the old Ant tasks without changing the Ant file itself.

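// build.gradle
plugins {
    id 'java'  // provides the implementation/testImplementation configurations
}

repositories {
    mavenCentral()
}

// Import the existing Ant logic; prefix target names so they cannot
// clash with tasks the java plugin already defines (clean, jar, ...)
ant.importBuild('build.xml') { antTargetName -> 'ant-' + antTargetName }

// Add a modern dependency that Ant didn't know about
dependencies {
    implementation 'org.slf4j:slf4j-api:1.7.36'
    testImplementation 'org.junit.jupiter:junit-jupiter:5.9.1'
}

// "Hook" a modern task into an old Ant target (assumes build.xml defines "compile")
tasks.named('ant-compile') {
    doLast {
        println "Ant finished compiling. Gradle is now verifying the output..."
    }
}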

Benefit: Low-risk modernization. Your build can stay “green” throughout the entire transition.

Use Case B: The 7.6 “Goldilocks” Version

In my experiments, I found that Gradle 7.6 is the specific “sweet spot” for this work. Why?

JDK 8 Compatibility: It runs its own background process (the daemon) on Java 8. Gradle 9, by contrast, requires Java 17 or newer just to start.
Modern Features: It still supports the latest JUnit 5 platforms and Docker-ready plugins.
The Bridge: It allows you to bridge the gap between a 2005 build logic and a 2026 deployment pipeline.

Common Pitfalls & Misconceptions

The "Pure Gradle" Obsession: A common mistake is trying to make the build.gradle file "perfect" from day one. Developers often get stuck trying to replicate a weird Ant copy task in Gradle's DSL.

The Fix: If the Ant task works, leave it in Ant. Use the Strangler Fig approach: only move tasks to Gradle when you actually need to change their logic or improve their performance.

Core Trade-offs & Nuances

Dual Maintenance: For a period, you have both build.xml and build.gradle. You must treat the Gradle file as the new “entry point” for the team.
Mindset Shift: You are moving from a “Scripting” mindset (Ant) to a “Task Graph” mindset (Gradle). Understanding how tasks depend on one another is more important than knowing the syntax.

Forward-Looking Conclusion

Modernizing a build system doesn’t require a “demolition and rebuild.” By using Gradle 7.6 as a wrapper for your legacy Ant scripts, you buy yourself the most valuable asset in refactoring: time. You get the project into a modern CI/CD pipeline on day one. Once the build is stabilized and automated, you can “strangle” the remaining Ant targets at your own pace.

The Golden Bridge: Why Java 8 is the Ultimate Tool for Legacy Refactoring

Nik — Mon, 16 Feb 2026 08:02:13 GMT

When does “latest and greatest” become a liability? Imagine you’ve just inherited a “Big Ball of Mud”: a 20-year-old repository built with Ant, running on Java 1.5, and filled with raw types and swallowed exceptions. Your instinct is to jump to Java 21 to get the latest performance gains and features. But when you try to compile, you’re met with thousands of breaking changes, deleted APIs, and a build system that refuses to acknowledge modern hardware.

How do you modernize a system that is too old to run, but too critical to fail?

The ‘Before’ State: Setting the Context

In the world of “Software Archaeology,” we often encounter projects stuck in the mid-2000s. These applications are often:

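Compiler-Locked: They rely on source levels and APIs that modern JDKs (11, 17, 21) no longer accept. The -source 1.5 flag is gone, _ is no longer a legal identifier, and internal classes such as sun.misc.BASE64Encoder have been removed.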
Environment-Fragile: They only “work on Bob’s machine” because Bob has a specific 2008-era Intel laptop and a prehistoric version of the JDK.
Tooling-Limited: They use Ant or early Maven versions that don’t understand modern CI/CD pipelines or containerization.

The “old way” of fixing this was the Big Bang Migration: a grueling six-month rewrite where you try to jump 15 years of evolution in one go. Most of these attempts end in failure, reverted commits, and exhausted teams.

Introducing the Core Concept: The Golden Bridge

The Golden Bridge methodology uses Java 8 not as a final destination, but as a strategic "Field Hospital."

What is it? It is the practice of migrating ancient code (Java 1.4 - 1.6) specifically to Java 8 first, rather than the current LTS.

Why does it matter? Java 8 sits at a unique historical intersection. It is the “Last of the Ancients” and the “First of the Moderns.” It provides a stable environment where you can fix the internal architecture of the code without the external environment fighting you.

How does it work?

Dual-Compatibility: It supports the -source 1.5 flag to compile ancient syntax while allowing you to use modern IDEs.
Architecture Neutrality: It is the oldest version with native ARM64 builds. Zulu ships a Java 8 build for Apple Silicon, and Temurin 8 runs in ARM64 Linux containers, ending the reliance on old hardware.
Tooling Support: It is fully supported by Gradle 7.6, which acts as the "Strangler Fig" for old Ant builds.

Practical Applications & Use Cases

Use Case A: Compiling the “Uncompilable”

Modern JDKs have removed many internal APIs and tightened the rules on source compatibility. Java 8 allows you to keep the old code running while you transition the build system.

// In your build.gradle, you can target the past while living in the present
java {
    toolchain {
        languageVersion = JavaLanguageVersion.of(8)
    }
}

Benefit: You get a green build in hours, not weeks.

Use Case B: The Docker “Time Machine”

By using Java 8, you can create a Docker image that mirrors the production environment exactly, but runs on a 2024 MacBook.

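FROM eclipse-temurin:8-jdk
# The base image ships only the JDK, so install Ant for the legacy build
RUN apt-get update && apt-get install -y --no-install-recommends ant \
    && rm -rf /var/lib/apt/lists/*
# Map the 20-year-old hardcoded file paths to modern volumes.
# VOLUME accepts container paths only; bind the host path at run time:
#   docker run -v /Users/original_dev/data:/data <image>
VOLUME /data
COPY . /app
WORKDIR /app
CMD ["ant", "test"]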

Benefit: Eliminates “Works on my machine” bugs immediately.

Common Pitfalls & Misconceptions

The "Destination" Trap: The biggest mistake is thinking that moving to Java 8 is "enough."

Java 8 is a bridge, not a home. If you stay there, you are still accumulating technical debt. The goal of the Golden Bridge is to get the code clean enough (removing raw types, fixing tests) so that the jump to Java 17 or 21 becomes a simple compiler flag change rather than a structural nightmare.

Core Trade-offs & Nuances

The Cost: You have to maintain a specific legacy toolchain (like Gradle 7.6) because the newest versions of build tools have dropped support for Java 8.
The Mindset: You must resist the urge to use Java 8 features (like Streams or Optionals) immediately. Your first goal is stabilization, not modernization. Adding new syntax to a “muddy” codebase only makes the archaeology harder.

Forward-Looking Conclusion

Java 8 is the unique “Goldilocks” zone of the Java ecosystem. It’s old enough to understand where the code came from, and modern enough to work with the tools of today.

By treating Java 8 as your Golden Bridge, you turn a high-risk “archaeological dig” into a controlled engineering project. Use it to stabilize your build, containerize your environment, and harden your tests. Once the mud is washed away, the path to Java 21 will be wide open.

Does Delegating to AI Mean We Can Finally Be Lazy Managers?

Nik — Tue, 20 Jan 2026 08:00:59 GMT

1. The Hook

We often sell AI adoption to our bosses (and ourselves) with the promise of speed. We imagine a future where we toss a vague request over the wall (“fix the build,” “export the data,” “optimize the query”) and the AI handles the rest while we grab a coffee.

But my recent experiments with Jules, Google’s new AI agent, suggest the opposite is true. The more “autonomy” I gave the AI, the more mediocre the code became. This leads to an uncomfortable question: Does effective AI delegation actually require more management overhead, not less?

2. Context & Tools

I’ve been experimenting with Jules, testing its ability to act as a “Junior Developer” in my Spring Boot repository, joyofenergy-java.

In my previous explorations, I looked at Pair-Authoring with an AI and the Context Window Paradox. This time, I wanted to test the difference between Abdication (lazy delegation) and Navigation (structured delegation) when asking an agent to build a feature from scratch.

3. The Failed Experiment: The “Friday Afternoon” Prompt

I set up a scenario we’ve all faced: It’s Friday afternoon, I want a new feature shipped, and I don’t want to think about the implementation details.

I gave Jules the “Lazy Manager” prompt:

“Jules, create an endpoint to export meter readings as a CSV file. Use the existing MeterReadingService.”

I intentionally withheld constraints. I didn’t mention memory usage, libraries, or formatting.

The Result?

Technically, it worked. Jules created a CsvService, updated the controller, and passed the tests. But structurally, it was a time-bomb.

Memory Unsafety: It loaded the entire dataset into a List in memory before writing the response. For a smart meter with 100,000 readings, this is an OutOfMemoryError waiting to happen.
Library Bloat: It generated a new service class (CsvService) where a simple stream in the controller would have sufficed.
Junior Mistakes: It used standard Java formatting without considering how a user would actually open the file in Excel.

The “lazy” prompt produced “lazy” code: functional, but dangerous at scale. It validated my fear that More Powerful AI Doesn’t Always Mean Faster Fixes.

4. Principles That Actually Work: The “Brief”

I reset the experiment. This time, I treated Jules like a Senior Engineer would treat a Junior: I wrote a spec.

I uploaded a file named feature-csv-export.md containing strict constraints:

No New Dependencies: Do not add apache-commons or opencsv.
Memory Safety: Do not load lists into memory; stream directly to the HttpServletResponse.
Strict Formatting: Use yyyy-MM-dd HH:mm.

I then prompted:

“Jules, I’ve uploaded a spec file... Please refactor the implementation to strictly follow these constraints.”

The Outcome:

The difference was night and day.

Architectural Safety: Jules implemented a streaming solution using PrintWriter, avoiding the memory bottleneck entirely.
Dependency Management: It correctly added jakarta.servlet-api as a compileOnly dependency, respecting the “no runtime bloat” rule.
Test Integrity: It initially failed to test the controller response correctly, but because I had defined the “correct” output in the spec, I could guide it to fix the assertion logic.
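
To make the “Memory Safety” and “Strict Formatting” constraints concrete, here is a minimal sketch of row-by-row CSV streaming. It is not the code Jules produced: the Reading class, the column names, and CsvExportSketch are hypothetical stand-ins, and in a real controller the PrintWriter would come from the HttpServletResponse.

```java
import java.io.PrintWriter;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Iterator;
import java.util.Locale;

// Hypothetical reading type; the real project has its own domain class.
final class Reading {
    final LocalDateTime time;
    final double kilowatts;

    Reading(LocalDateTime time, double kilowatts) {
        this.time = time;
        this.kilowatts = kilowatts;
    }
}

final class CsvExportSketch {
    // Timestamp format required by the spec: yyyy-MM-dd HH:mm
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm", Locale.ROOT);

    // Writes one row at a time from an iterator, so no list of rows
    // is ever built up in memory on the export side.
    static void writeCsv(Iterator<Reading> readings, PrintWriter out) {
        out.print("time,reading\n"); // header names are an assumption
        while (readings.hasNext()) {
            Reading r = readings.next();
            out.print(FORMAT.format(r.time) + "," + r.kilowatts + "\n");
        }
        out.flush();
    }
}
```

The key design choice is the Iterator parameter: the method never asks for a collection, so the caller decides whether rows come from a database cursor, a stream, or a small in-memory list in a test.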

5. Unexpected Discovery: The “Spec” as a Guardrail

The most surprising insight was that Jules didn’t just follow the instructions—it used the spec file as a defense mechanism against bad code.

When I ran the “Lazy” experiment, Jules defaulted to the path of least resistance (loading data into memory). When I provided the “Brief,” Jules shifted behavior entirely. It didn’t just write code; it navigated the constraints.

This confirms a theory I touched on in Can We Make AI Code Assistants Smarter by Asking Them to Write Their Own Rules? The AI performs best not when it has “creative freedom,” but when it is boxed in by rigid technical constraints. The “Senior Engineer” input wasn’t the code I wrote, but the boundaries I set.

6. The Central Paradox

This brings us to the Delegation Paradox:

To get an AI agent to work autonomously, you must micromanage the requirements.

If you want to be “lazy” during the implementation phase (execution), you must be hyper-active during the definition phase (specification). You cannot abdicate both.

Abdication (Vague prompt) -> Requires heavy code review and refactoring later.
Navigation (Detailed spec) -> Requires heavy upfront thought, but produces near-production-ready code.

We aren’t thinking less with AI; we are shifting when we think.

7. Forward-Looking Conclusion

Tools like Jules are shifting the developer’s role from “writer of code” to “architect of constraints.”

If you treat your AI agent like a magic wand that reads your mind, you will build technical debt at record speeds. But if you treat it like a talented but literal-minded junior developer who needs a solid brief, it becomes a powerful force multiplier.

The future of engineering isn’t about writing the perfect function; it’s about writing the perfect spec.

Can We Skip TDD with Modern AI? A Context Experiment

Nik — Tue, 09 Dec 2025 08:01:03 GMT

The Hook

Recently, some colleagues pitched me an idea: “Today, LLMs are so powerful, you can start exactly from implementation and it will work well. No need to use TDD or other more complicated XP techniques”.

It is a tempting thought. If an AI can generate a complete feature in seconds, is my approach (always start from a test) still relevant?

I decided to check it. I ran an experiment to see if I could implement a complex feature by describing the task and letting GenAI create the application. My hypothesis was that TDD is still vital, but I wanted to see if the “Just Do It” method could prove me wrong.

The result? I confirmed exactly what I expected: TDD is one of the best ways to create context for an LLM.

Personal Context & Tools

For this experiment, I returned to a project I started in a previous article: “Does AI Need Clear Goals? My Experiment in Turning Vague Ideas into Code”.

My tool of choice was GPT-4.1 (via GitHub Copilot), utilizing its Agent mode to handle multi-file context. Usually, I treat the AI as a pair programmer, following structured collaboration methods I’ve discussed in “Pair-Authoring with an AI: A Case Study in Structured Collaboration”.

But for this session, I acted as a “manager,” giving requirements and approving plans, but explicitly skipping the “Red” phase of TDD. I let the AI write the code first.

The Failed Experiment

The task was Story #2346: Implement a “Day of Week Pricing Plan”. The requirements were clear: users needed to compare power usage costs based on the day of the week and rank price plans accordingly.

I approved the AI’s plan and let it generate the implementation. Here is where the “No TDD” approach started to show its cracks.

1. The “Ghost Method” Problem

After the AI implemented the service layer, my IDE lit up with errors. The AI used a method getDayOfWeekMultiplier(DayOfWeek) that didn’t exist. It “hallucinated” a method on the domain object because it was writing the service in isolation. I am usually fine with “Red” code, but this wasn’t TDD “Red”; this was just broken code requiring immediate fixes.

2. The Regression Nightmare

When we fixed the missing method, we broke the existing logic.

PricePlanTest > shouldReceiveMultipleExceptionalDateTimes() FAILED

Because we implemented the new logic over the old logic without a guiding test, the AI introduced regressions. We had to do several iterations just to get back to a baseline.

3. The Context Disconnect

The real struggle happened during Functional Testing. I asked the AI to verify the endpoints. It generated a test that tried to hit the API, but it returned a 404 Not Found. Why? The AI created a test that queried a Smart Meter ID, but “it didn’t have a context!” It forgot that in this application, a Smart Meter must be linked to a Price Plan via the AccountService first. The AI tried to guess the solution, attempting to call an API /account/link/{smart-metter-id} that didn’t even exist.

Principles That Actually Work

I eventually finished the task without TDD, but it required multiple rollbacks and context corrections. Through this struggle, I confirmed why TDD works:

Principle 1: Tests Are Context Anchors

The reason the AI failed the functional test setup was a lack of context. If I had written the test first, I would have been forced to set up the AccountService association immediately. The failing test provides the AI with a strict “Context Window” of what is required, as I explored in “The Context Window Paradox”.
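
A minimal sketch of what “forced to set up the association” looks like in a test-first flow. AccountServiceSketch and its link/pricePlanFor methods are hypothetical stand-ins, not the project's real AccountService API; the point is only that the arrange step makes the missing context explicit.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the project's AccountService:
// it links a smart meter to a price plan.
final class AccountServiceSketch {
    private final Map<String, String> pricePlanBySmartMeter = new HashMap<>();

    void link(String smartMeterId, String pricePlanId) {
        pricePlanBySmartMeter.put(smartMeterId, pricePlanId);
    }

    Optional<String> pricePlanFor(String smartMeterId) {
        return Optional.ofNullable(pricePlanBySmartMeter.get(smartMeterId));
    }
}

final class ContextAnchorSketch {
    // Mirrors what an endpoint would do: an unlinked meter has no plan (404).
    static int statusFor(AccountServiceSketch accounts, String smartMeterId) {
        return accounts.pricePlanFor(smartMeterId).isPresent() ? 200 : 404;
    }

    // The test written first: its arrange step spells out the link
    // that the AI otherwise had to guess.
    static void linkedMeterIsFound() {
        AccountServiceSketch accounts = new AccountServiceSketch();
        accounts.link("smart-meter-0", "price-plan-0"); // arrange
        if (statusFor(accounts, "smart-meter-0") != 200) { // assert
            throw new AssertionError("linked meter should resolve to a price plan");
        }
    }
}
```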

Principle 2: Small Steps Prevent “Imagination”

When the AI doesn’t have enough context, it tries to imagine the answer. TDD forces small, verifiable steps. By skipping the test, I forced the AI to generate a large chunk of logic (Controller + Service) at once, increasing the surface area for hallucinations.

Unexpected Discovery

The most painful part of skipping TDD wasn’t the coding—it was the debugging.

When I finally added tests after the implementation to verify the logic, one failed with a confusing error:

Expecting actual: {FRIDAY=[...]} to contain key: MONDAY

This revealed a critical weakness of the “Test After” approach. When a test fails, you don’t know where the problem is: in the tests or in the business logic. It turned out to be an error in the test data (the date provided was a Friday, not Monday). If I had written the test first, the AI would have generated the implementation based on that test data. We wouldn’t have had this problem at all.
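
This kind of fixture mistake is cheap to catch with a guard assertion on the test data itself. The sketch below is an illustration, not the project's test: the class name and the chosen dates are assumptions, and grouping by day of week only mimics the shape of the map in the failing assertion.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class DayOfWeekFixtureSketch {
    // Groups dates by their day of week, the same shape as the
    // {FRIDAY=[...]} map in the failing assertion.
    static Map<DayOfWeek, List<LocalDate>> byDayOfWeek(List<LocalDate> dates) {
        return dates.stream().collect(
                Collectors.groupingBy(LocalDate::getDayOfWeek, TreeMap::new, Collectors.toList()));
    }

    // Guard assertion: fail on the fixture before blaming the business logic.
    static void requireDay(LocalDate fixture, DayOfWeek expected) {
        if (fixture.getDayOfWeek() != expected) {
            throw new AssertionError(
                    "Fixture " + fixture + " is a " + fixture.getDayOfWeek() + ", not " + expected);
        }
    }
}
```

Calling requireDay(LocalDate.of(2025, 12, 5), DayOfWeek.MONDAY) fails immediately with a message that names the fixture, because 5 December 2025 is a Friday. The failure then points at the test data instead of the implementation.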

The Central Paradox

We tend to think that as AI gets smarter, we can think less. I touched on this in “Can We Think Less with AI?”.

But this experiment confirmed a paradox: To move faster with AI, you must slow down enough to write the test.

Can we avoid the loops of small context errors? Yes. TDD reduces complexity and creates trust between us and the AI. The test acts as a contract. Without it, you are just hoping the AI guesses your architectural constraints correctly.

Forward-Looking Conclusion

So, can we skip TDD? Yes, but you will spend more time adding additional context manually.

The power of TDD is approaching a new peak in the AI era: tests create a POWERFUL CONTEXT for LLMs. Modern models like GPT-4.1 are powerful, but a better LLM does not remove the need for context.

If you want to get the most out of your AI teammate, don’t just ask it to write code. Give it a failing test.

Does "Extract Method" Actually Hurt Your Readability?

Nik — Tue, 25 Nov 2025 08:01:13 GMT

We’ve all been there. A feature starts simple, maybe 20 lines. But after three or four iterations, that same function has ballooned to 200 lines, a tangled mess of nested if-else blocks.

Does that reality sound familiar?

When faced with this, we have two main choices. One way is to create tech debt, a task we’ll never really get to because we will always have more urgent priorities from the business. The other way was shown in the foundational book Refactoring by Martin Fowler, with contributions from Kent Beck. This path treats refactoring as a continuous action, not a tech debt item in the backlog.

But if we choose to refactor continuously, what does that really mean, and are our tools helping or hurting?

My Context and the “Easy” Button

Working in a Java/Kotlin environment, my tool of choice is IntelliJ IDEA. It’s an incredibly powerful IDE with a host of features designed to help.

When facing a 200-line monster method, the most obvious solution is right in the refactoring menu: “Extract Method”. It seems perfect. It makes the original method smaller, which is exactly what I want.

Right?

Introducing the Core Concept: Readability-Driven Refactoring

The main goal of refactoring shouldn’t just be “smaller methods.” For me, the main goals are readability and, secondarily, decoupling.

In fact, readability is arguably more important than adhering to a specific architecture or design pattern. While good architecture often improves readability, it’s not its primary goal. If I have a choice between perfect pattern adherence and readability, I will prefer readability. Working on a typical web application, it’s readability that helps me daily when I look at different parts of the code.

This is where the simple “Extract Method” tool falls short. It often just moves the mess, failing to improve readability.

A more powerful technique for guiding this process is Test-Driven Development (TDD). Instead of just extracting code, we use TDD to describe our expectations for the new, refactored code before we write it. This small shift in process fundamentally changes the quality of the refactoring.

Practical Application: A TDD-Led Refactoring

Let’s look at a practical example.

The Problem Code

Imagine we have this block of code in a method. It’s searching for properties, then mapping them to calculate Avios points, with error handling mixed in.

summaries =
    shc
    .psSearch(
        startDate = startDate,
        nights = nights,
        hotelCodes = it,
        adults = adultsParam,
        children = childrenParam,
        infants = infantsParam,
    ).toTypedArray()
    .mapNotNull { tbh ->
        kotlin
            .runCatching {
                aviosEarn = aviosAdapter.calculateAviosEarn(BigDecimal(tbh.summary!!.totalPrice!!))
                tbh.toAccommodationSummary(aviosEarn)
            }.onFailure { e ->
                SASAdapter.Companion.log.warn("Skipping", e)
            }
            .getOrNull()
    }.toList()

Common Pitfall: The ‘Extract Method’ Trap

If we use the “Extract Method” feature in our IDE, we get this:

Original method:

summaries = requestSummariesAndCalculateAviosEarn(startDate, nights, it, adultsParam, childrenParam, infantsParam)

New private method:

private fun requestSummariesAndCalculateAviosEarn(
    startDate: LocalDate,
    nights: Int,
    it: List<String>, // hotel codes; element type assumed
    adultsParam: String,
    childrenParam: String,
    infantsParam: String,
): List<AccommodationSummary> =
    shc
        .psSearch(
            startDate = startDate,
            nights = nights,
            hotelCodes = it,
            adults = adultsParam,
            children = childrenParam,
            infants = infantsParam,
        ).toTypedArray()
        .mapNotNull { tbh ->
            calculateAviosEarnAndMapToAccommodationSummary(tbh)
        }.toList()

private fun calculateAviosEarnAndMapToAccommodationSummary(tbh: TBH): AccommodationSummary? {
    var aviosEarn: Int
    return runCatching {
        aviosEarn =
            aviosAdapter.calculateAviosEarn(BigDecimal(tbh.summary!!.totalPrice!!))
        tbh.toAccommodationSummary(aviosEarn)
    }.onFailure { e ->
        log.warn("Skipping", e)
    }.getOrNull()
}

Is this good? Not exactly. It makes the original method smaller, but it doesn’t improve readability. We’ve just created a new private method that takes a mess of parameters.

The Better Way: The TDD-Led Flow

Instead of using the IDE tool, let’s use the TDD technique.

Describe Expectations: We start by writing a test for the logic we want to have. We don’t want to just test a private method; this logic feels like it belongs in its own service.
Define the “To-Be” Service: We’ll create a test for a new SummaryAdapter. At first, this service is “red” (it doesn’t exist).
Discover the Parameter Problem: As we write the test and describe the method we want to call, we see the problem clearly: it needs too many parameters.
The Solution: The test itself shows us what we need. Instead of passing 6 individual parameters, we should pass a single SearchCriteria object. We define this object as an expectation of our test.
Implement: We now implement the new service, moving the logic from the old method.
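
The parameter object from step 4 can be sketched as follows. This is an illustration under assumptions, written in Java to keep it self-contained (in Kotlin a data class would play the same role): the field names follow the six arguments of the example, and SummaryAdapterSketch with its placeholder return value is hypothetical, not the real adapter.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical parameter object: bundles the search arguments
// that previously travelled as separate parameters.
final class SearchCriteria {
    final LocalDate startDate;
    final int nights;
    final String adults;
    final String children;
    final String infants;

    SearchCriteria(LocalDate startDate, int nights, String adults, String children, String infants) {
        this.startDate = startDate;
        this.nights = nights;
        this.adults = adults;
        this.children = children;
        this.infants = infants;
    }
}

// Hypothetical adapter: the signature is the point, the body is a placeholder
// that returns one description per hotel code instead of calling a real search.
final class SummaryAdapterSketch {
    static List<String> requestSummaries(SearchCriteria criteria, List<String> hotelCodes) {
        List<String> summaries = new ArrayList<>();
        for (String code : hotelCodes) {
            summaries.add(code + " from " + criteria.startDate + " for " + criteria.nights + " nights");
        }
        return summaries;
    }
}
```

Writing the test first makes the call site the first thing you type, which is why the long parameter list becomes uncomfortable before any production code exists.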

The Result:

By extracting the logic to a new service and passing a parameter object, the original code now looks like this:

summaries = SummaryAdapter.requestSummariesAndCalculateAviosEarn(searchCriteria, it)

Did we improve readability? Yes. And not just because the method is smaller, but because we are no longer passing an excessive number of parameters, as we were with the simple “Extract Method”.

A Technique Over a Tool

IDE tools are wonderful, and techniques like TDD are powerful.

Of course, we could have used the IDE tools to change the method signature, create a new class, and move the method there. What the tool can’t do is help us understand what we want to do in the first place. We can’t describe our expectations to the tool.

TDD gives us that option: we describe our expectations before the work. This key difference is what truly changes the quality of our refactoring.

By knowing different techniques, we can understand when and which tool to use. Don’t let the tool lead the refactoring; let your technique guide the tool.

Does AI Need Clear Goals? My Experiment in Turning Vague Ideas into Code

Nik — Tue, 11 Nov 2025 08:00:50 GMT

We’re all told the same thing: AI needs clear, specific, and context-rich prompts to be useful. “Garbage in, garbage out.” This is especially true in engineering.

But what if your job isn’t to execute a clear task, but to find the task?

In my current work, we do a lot of research. Goals are not clear. We receive highly abstract, one-sentence ideas that need to be explored. This research is a necessary, messy process of discovery, and it’s full of “boilerplate” actions.

This got me thinking. We assume AI is for execution, but can we use it for exploration? What happens when you feed an AI a problem that you, the engineer, don’t even fully understand yet?

I ran an experiment to find out, starting with nothing but a single, vague sentence.

My Setup: From Vague Idea to Boilerplate

My goal was to see if I could use Generative AI to shepherd a “one-sentence idea” all the way to a foundational, runnable piece of code.

My toolkit was straightforward:

The Idea: A vague user story, “#2348: As an administrator I want to add a new tariff so that it can be advertised to users who may benefit”. This was perfect because it was so vague—what’s a “tariff”? How is it “advertised”?
The “Analyst” AI: I used Gemini 2.5 Pro to act as a Product Owner and flesh out this vague idea.
The “Developer” AI: I then used GitHub Copilot (GPT-4.1) in IntelliJ to write the boilerplate code.
The Project: All this was done in the context of the TW “Joy of Energy” project, a Java Spring Boot application.

The plan was a two-part workflow:

Part 1: AI as Business Analyst. Feed the vague story to Gemini and ask it to define the requirement.
Part 2: AI as Boilerplate Generator. Feed the AI-generated spec to Copilot and ask it to write the code.

The Failed Experiment (That Was Actually a Success)

My first attempts were a perfect illustration of the “AI is context-blind” problem. The “failure” wasn’t that the AI was useless; it’s that its first drafts were wrong in very specific, instructive ways.

Failure 1: The AI “Product Owner” Became a Tech Lead

I asked Gemini to act as a Product Owner and flesh out the story. It made a “very popular mistake”: it skipped the “what” and “why” and jumped straight to the “how.”

The very first draft of the spec it gave me wasn’t a user story; it was a technical task. It immediately suggested a JPA @Entity and defined fields like id as a UUID. It was already designing the database schema.

This is exactly what you don’t want from a user story, and it’s a common trap where the AI tries to be the engineer, not the analyst. As I’ve written before, the AI’s job is to reflect our needs, not just give us a technical answer (you can read more on that idea here: How GenAI Helps Engineers Write Better).

I had to intervene, critique the output, and explicitly ask it to “Change database to more abstract system” to get the clean, implementation-agnostic user story and Acceptance Criteria (ACs) I actually needed.

Failure 2: The AI “Developer” Was a Clumsy New Hire

After I had a clean spec, I gave it to GitHub Copilot with a clear prompt: generate a POJO, an in-memory Service, and a Controller.

The code it generated was not “copy-paste and run”.

Wrong Package Structure: It invented a “by-feature” package structure (com.joi.energy.tariff). My project uses a “by-layer” structure (uk.tw.energy.domain, uk.tw.energy.service, etc.).
Missing Dependencies: It correctly suggested using jakarta.validation annotations (a great idea!), but my project didn’t have that dependency.
Minor (Human) Errors: It even forgot the @Service annotation on the TariffService, a simple mistake I’ve made myself a dozen times.

If I were a junior engineer, I would have been blocked or, worse, just pasted it all in, breaking the project’s architecture.

Principles That Actually Work

These “failures” led me to the real principles of using AI for this kind of work.

1. The AI is a “Demultiplicator,” Not a Supercharger

This was my single most important insight. A supercharger just makes the engine spin faster. A demultiplicator (like a reduction gear) changes the nature of the work, trading raw speed for torque.

The AI is a demultiplicator for my brain.

When I was iterating on the user story, I didn’t think about “how to write these words or if it sounds good”. I was 100% focused on the business goals. The AI handled the typing, and I handled the validating. This is a profound shift. It took me 30 minutes to get a solid user story, not because I typed fast, but because I thought fast, using the AI’s draft as a disposable starting point.

2. The Engineer’s New Job: Strategist and Context-Provider

The AI’s mistakes weren’t stupid; they were context-blind. This reveals the engineer’s true role in an AI-augmented workflow: we are the “Reviewer and Strategist”.

My job wasn’t to write getters and setters. My job was to make two high-level strategic decisions:

“The AI is right, jakarta.validation is a good idea. I will add that dependency”.
“The AI is wrong about the package structure. I will correct it to follow our existing pattern”.

The AI’s “flawed” draft actually forced me to think strategically about my project’s architecture and dependencies.

3. Embrace the “90% Win” and the Iterative Loop

The AI’s output doesn’t need to be 100% perfect to be valuable. The boilerplate it generated, despite its flaws, was a “90% win”. It saved me from the “boring boilerplate” and the hours I would have spent on Stack Overflow as a junior engineer.

More importantly, the AI’s mistakes are part of the value. That wrong package structure? It’s a great “recommendation for reorganizing your project” and a perfect topic to bring to a team huddle.

My Unexpected Discovery: “1:0 to AI”

The most surprising moment came during the boilerplate generation. I asked for three files (POJO, Service, Controller). The AI gave me four.

It proactively and correctly created a TariffType.java Enum (FLAT_RATE, TIME_OF_USE).

This was a perfect “micro-improvement”. I called it “1:0 to AI”. I was so focused on the “big picture” of the architecture that I missed this small, obvious detail. This “separating of responsibilities” is incredibly powerful: the AI handles the small details while I focus on the larger strategic goals.
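
For readers who want to picture the boilerplate, here is a minimal sketch of the enum, the POJO, and the in-memory service, with the controller and Spring annotations left out. Only the enum constants come from the session; the Tariff fields and the InMemoryTariffService methods are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// The two constants are the ones the AI proposed.
enum TariffType { FLAT_RATE, TIME_OF_USE }

// Hypothetical POJO: the real spec may carry more fields.
final class Tariff {
    private final String name;
    private final TariffType type;

    Tariff(String name, TariffType type) {
        this.name = name;
        this.type = type;
    }

    String getName() { return name; }
    TariffType getType() { return type; }
}

// Hypothetical in-memory service; in the Spring project this class
// would carry the @Service annotation the AI forgot.
final class InMemoryTariffService {
    private final List<Tariff> tariffs = new ArrayList<>();

    Tariff add(String name, TariffType type) {
        Tariff tariff = new Tariff(name, type);
        tariffs.add(tariff);
        return tariff;
    }

    // Read-only view so callers cannot bypass add()
    List<Tariff> findAll() {
        return Collections.unmodifiableList(tariffs);
    }
}
```

Using an enum instead of a free-form string for the tariff type is exactly the kind of small detail that is easy to skip while thinking about package structure.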

The Central Paradox: AI’s Flaws Are Its Greatest Strength

This leads to the central paradox: The AI is terrible at handling vague, abstract ideas... and yet, it’s the best tool I have for the job.

Why? Because its value isn’t in giving you the right answer. Its value is in its ability to instantly turn a “blank page” into a flawed, tangible draft that you can critique.

The AI’s initial, flawed responses—the over-technical user story, the context-blind package structure—are its most valuable feature. They act as a mirror, forcing the engineer to define the context and make the strategic decisions. It can’t read your mind, so it forces you to figure out what’s in it.

Effective use doesn’t require a perfect prompt. It requires an engineer to stop acting like a typist and start acting like an editor, a critic, and a strategist.

Conclusion: From Vague to Validated

The AI didn’t solve my vague problem. It gave me the tools to solve it myself, faster and at a higher level of abstraction.

By delegating the “boring boilerplate code”, I was able to stay focused on the “big picture” and “business needs”. This workflow is a powerful way to accelerate research, allowing us to build, test, and throw away foundational ideas at a speed we couldn’t before.

The AI isn’t here to replace us. It’s here to take the routine work and free us to focus on the hard parts. It’s a “demultiplicator” that gives us the torque to move from a one-sentence idea to a validated, runnable foundation, flaws and all.

What If the ‘Cleanest’ Code Is the Wrong Solution?

Nik — Tue, 28 Oct 2025 08:00:38 GMT

In our continuing experiment with Trio Programming—two engineers and an AI—we decided to level up. Our first session was a slow, painful grind of fixing our environment. This time, with a stable foundation, we aimed for speed. Our new strategy: write comprehensive tests ourselves, then give the AI the freedom to implement the solution in one big step.

The initial results were promising. The AI produced working code that passed our tests. But then, our instincts as seasoned developers kicked in. We saw the AI’s implementation—a simple Map—and reflexively identified it as a “code smell”. We spent the next hour trying to refactor it into a “cleaner,” more object-oriented design using the Composite pattern.

That’s when we fell into a trap. Our pursuit of clean code was leading us toward a solution that was elegant, sophisticated, and completely wrong. This led us to our second major discovery: In AI-augmented development, the biggest risk isn’t bad AI code, but good human intuition applied to the wrong problem.

Our Setup: Aiming for a Bigger Step

Our team remained the same: I (Nik) acted as the driver for GitHub Copilot, while Javier served as the strategic navigator. Having stabilized our Java, Spring Boot, and Gradle environment in the last session, we were ready to test a new hypothesis: if we write strong, expectation-focused tests, we can trust the AI with a larger implementation scope and move much faster.

The flow was simple:

Human engineers write a small, focused test with clear assertions.
The AI generates the implementation in a single, larger step to make the test pass.
The team trusts the tests to validate the AI’s work, rather than meticulously reviewing every line of generated code.

The Failed Experiment: Refactoring into a Corner

The first part of the experiment worked. We added two tests for our hierarchy API, one for a root-only employee and one for a simple employee-supervisor relationship. We then prompted the AI: “tests looks good, let’s make postHierarchy method for passing all of them”.

The AI’s implementation worked, save for one minor edge case we quickly fixed. But we weren’t satisfied. The code returned a Map, and our developer brains screamed for type safety and better design.

The “Code Smell” Diagnosis: We prompted the AI with our concern: “maybe, response object will make the readability of the code better and will reduce smell of code?” This initiated a refactoring plan to introduce a dedicated HierarchyNode class.
Applying a Design Pattern: We pushed further, suggesting a more formal structure: “maybe we can apply composite pattern... to our response?” The goal was to create a pure, object-oriented hierarchy and eliminate the Map entirely.
The Collision with Reality: Our final prompt revealed the fatal flaw in our logic: “can we avoid to use Map if we will use Spring Boot which we have in our project?”

The AI’s response was the turning point. It patiently explained that, given our requirement for dynamic JSON keys (e.g., "Jonas": { "Sophie": ... }), a Map or a structure that serializes like one was unavoidable with Spring Boot and its default Jackson serializer.
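To make the constraint concrete, here is a minimal sketch in plain Java. It does not call Jackson; the two rendering methods are hand-written stand-ins that mimic how a property-based serializer treats a class (one JSON key per field) versus a Map (one JSON key per entry). The class name DynamicKeysSketch, the fields of HierarchyNode, and both helper methods are assumptions made for illustration, not the code from our session.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DynamicKeysSketch {

    // Composite-style node (hypothetical): its property names are fixed in the class.
    static class HierarchyNode {
        final String name;
        final List<HierarchyNode> children = new ArrayList<>();

        HierarchyNode(String name) {
            this.name = name;
        }
    }

    // Stand-in for property-based serialization: the keys are always "name" and "children".
    static String nodeToJson(HierarchyNode node) {
        String kids = node.children.stream()
                .map(child -> nodeToJson(child))
                .collect(Collectors.joining(",", "[", "]"));
        return "{\"name\":\"" + node.name + "\",\"children\":" + kids + "}";
    }

    // Stand-in for Map serialization: the keys are the employee names themselves.
    // Assumes names contain no characters that need JSON escaping.
    @SuppressWarnings("unchecked")
    static String mapToJson(Map<String, Object> node) {
        return node.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":" + mapToJson((Map<String, Object>) e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
    }

    public static void main(String[] args) {
        HierarchyNode jonasNode = new HierarchyNode("Jonas");
        jonasNode.children.add(new HierarchyNode("Sophie"));
        System.out.println(nodeToJson(jonasNode));
        // {"name":"Jonas","children":[{"name":"Sophie","children":[]}]}

        Map<String, Object> jonas = new LinkedHashMap<>();
        jonas.put("Sophie", new LinkedHashMap<String, Object>());
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("Jonas", jonas);
        System.out.println(mapToJson(root));
        // {"Jonas":{"Sophie":{}}}
    }
}
```

Only the second shape matches the expected output, because the names have to appear as keys rather than as values of a fixed "name" property. A class can still be used internally, but at the serialization boundary it has to expose its content the way a Map does.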

We had spent a significant part of our session chasing an elegant design that was fundamentally incompatible with the constraints of our framework and the explicit requirements of the kata. As I noted in my log, “we spend time trying to add something not workable to the code”. The AI’s initial, simpler solution wasn’t a code smell; it was the correct, pragmatic solution from the start.

Principles That Actually Work

This humbling experience confirmed our new hypothesis and revealed principles for a more effective human-AI workflow.

Focus on “What,” Not “How” (Test-Focused Development). Our initial strategy was correct. The most valuable role for the human developers is to define the behavior of the system through precise, comprehensive tests. When we focused on the expected JSON output, the AI produced correct code. When we focused on our preconceived notions of “good” internal implementation, we wasted time. The tests are the contract; the AI’s job is to fulfill it.
The AI is a Mirror for System Constraints. The AI is more than a code generator; it’s an interactive expert on the toolchain. It didn’t just reject our idea; it explained why it wouldn’t work within the Spring Boot ecosystem. This prevented us from going further down a dead-end path. Use the AI not just to write code, but to validate your architectural assumptions against the framework’s reality.
Codify Your Learnings into the System. A failed experiment is only a waste if you don’t learn from it. The most productive outcome of our refactoring dead-end was updating our .github/copilot-instructions.md file. We added an explicit refactoring protocol and guidance on when to challenge the AI’s use of patterns versus accepting framework constraints. This turns a session’s lesson into a permanent upgrade for the trio’s workflow.

Unexpected Discovery: AI Generalizes from Specifics

After our refactoring detour, we returned to our Test-Focused workflow. We added much more complex tests, including one with multiple employees reporting to the same supervisor and another with a full four-level hierarchy.

The surprising part? The AI’s existing implementation passed these complex tests without any modifications. This revealed a powerful insight: the AI is remarkably good at generalizing a solution. It needed a few simple, specific test cases to establish the core logic. Once that logic was in place, it was robust enough to handle more complex scenarios automatically. The “big step” works, but it needs to be built on a foundation of small, clear examples.
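The progression from simple to complex cases can be sketched as four plain Java checks against one small implementation. Everything here is an assumption for illustration: buildHierarchy is a hypothetical stand-in for the logic behind postHierarchy, not the code the AI generated; the input is assumed to be an employee-to-supervisor map in which a null supervisor marks a root-only employee; and the names other than Jonas and Sophie are invented. Cycles and multiple roots are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HierarchyChecks {

    // Hypothetical stand-in for the hierarchy logic.
    // Assumed input: employee -> supervisor; a null supervisor marks a root-only employee.
    static Map<String, Object> buildHierarchy(Map<String, String> supervisorOf) {
        // Create one (initially empty) subtree per person mentioned anywhere in the input.
        Map<String, Map<String, Object>> nodes = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : supervisorOf.entrySet()) {
            nodes.computeIfAbsent(e.getKey(), k -> new LinkedHashMap<>());
            if (e.getValue() != null) {
                nodes.computeIfAbsent(e.getValue(), k -> new LinkedHashMap<>());
            }
        }
        // Attach each subtree under its supervisor; people without one become roots.
        Map<String, Object> roots = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Object>> n : nodes.entrySet()) {
            String supervisor = supervisorOf.get(n.getKey());
            if (supervisor == null) {
                roots.put(n.getKey(), n.getValue());
            } else {
                nodes.get(supervisor).put(n.getKey(), n.getValue());
            }
        }
        return roots;
    }

    // Builds the input map from alternating employee, supervisor arguments.
    static Map<String, String> pairs(String... employeeThenSupervisor) {
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < employeeThenSupervisor.length; i += 2) {
            result.put(employeeThenSupervisor[i], employeeThenSupervisor[i + 1]);
        }
        return result;
    }

    // Compares on the expected output only; the internal design is not inspected.
    static void check(String label, Map<String, String> input, Map<String, Object> expected) {
        Map<String, Object> actual = buildHierarchy(input);
        if (!actual.equals(expected)) {
            throw new AssertionError(label + ": expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        // The two simple cases that establish the core logic.
        check("root only", pairs("Jonas", null), Map.of("Jonas", Map.of()));
        check("one supervisor", pairs("Sophie", "Jonas"),
                Map.of("Jonas", Map.of("Sophie", Map.of())));

        // The more complex cases, run against the unchanged implementation.
        check("shared supervisor", pairs("Pete", "Nick", "Barbara", "Nick"),
                Map.of("Nick", Map.of("Pete", Map.of(), "Barbara", Map.of())));
        check("four levels", pairs("Pete", "Nick", "Nick", "Sophie", "Sophie", "Jonas"),
                Map.of("Jonas", Map.of("Sophie", Map.of("Nick", Map.of("Pete", Map.of())))));

        System.out.println("all four hierarchy checks passed");
    }
}
```

The checks assert only on the resulting structure, which is why the same implementation can be swapped, refactored, or regenerated without touching them.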

The Central Paradox of AI-Driven Speed

This leads to the central paradox we uncovered in this session: To move faster with big, AI-generated implementation steps, you must first slow down and write smaller, more precise human-guided tests.

Our desire for speed was not at odds with the discipline of TDD; it was enabled by it. The quality of the AI’s large-scale contribution was directly proportional to the quality of the small-scale expectations we defined. You cannot achieve reliable speed by simply telling the AI “build this feature.” You achieve it by saying “build something that satisfies these very specific, verifiable behaviors.”

Conclusion: We Are Architects of Behavior, Not Just Code

Our second session was a success, but not because we wrote code faster. It was a success because we learned how to trust our tests more than our own implementation habits. The “Test-Focused Development” rhythm—small tests by humans, big implementation by AI—feels right.

The dynamic is shifting. Our job is becoming less about crafting the perfect implementation and more about architecting the perfect set of expectations. We define the contract with rigorous tests, and the AI, our tireless third programmer, finds the most direct way to fulfill it—even if it’s not the way we would have written it ourselves.

Nik Malykhin: Production

De-Risking the Database Migration

The Monolithic Bottleneck: Stored Logic and On-Premises Lock-In

The Reference Architecture Strategy: The Initial Component and the Operational Tax

Scaling Migration Capabilities Across Internal Teams

The Technical Execution: Deconstruction, Emulation, and Decoupling

Integrating the Target Cloud Architecture

Moving Logic from the Storage Tier to Compute Workers

Quantifying the Architecture: Financial and Performance Outcomes

Analyzing Cost and Computational Velocity

Designing AI-Driven Development Workflows

Architectural Slicing and Pull Request Topography

Discovery Mechanisms and Cognitive Loading

Estimation Metrics and Delivery Impact

Long-Term Repository Maintenance and Documentation Bloat

Chronological Integration Patterns

The Custom Workflow Evolution

The Spec-Kit Framework Evolution

Final Assessment: To Customize or Adopt As-Is

The Non-Transactional Reality of PostgreSQL Sequences

The Expectation of Monotonicity in Order Systems

Simulating the Anomalies: Forward and Backward Jumps

The Forward Jump and the Mechanics of the Write-Ahead Log

The Backward Jump and Uncommitted States

Evaluating Alternatives: The Flawed Custom Counter Workaround

Architectural Best Practices for Kotlin Applications

Human Overwatch in AI Code Generation

Designing a Duplicate Protection Hashing Service

The Baseline AI-Generated Implementation

The Conflict of Speed Versus Security

Identifying the True Microbenchmarking Hotspots

Implementing Immediate Algorithmic and Structural Upgrades

Optimizing the Canonicalization Routine

Strategic Improvements in Memory Management

Automatic Sorting via TreeMap

Elimination of Lambda Allocations

Pre-Sized Array Allocation

Direct Constructor Instantiation

The Production-Ready Hash Service

The Intersect of Generative Artificial Intelligence and Enterprise Engineering

When AI Breaks Database Parity

The Landscape of Database Selection and the Integration Testing Paradigm

The Illusion of Compatibility and the Environmental Disparity Trap

The Architectural Evolution of Local Infrastructure

The Token Regression: Generative AI and Legacy Patterns

Implementing Local Parity through Automation

Declarative Local Infrastructure with Docker Compose

Automating Container Lifecycles within the Gradle

A Systematic Approach to AI in Production

The Evolution Toward Triad Programming

Establishing the AI Environment through Context

Architectural Constraints and Testing Strategies

The Practical Workflow: From Init Prompt to Autopilot

Slicing and Iterative Implementation

Human Oversight and Integration

Diagnosing Observability Gaps in Blocking Controller Methods

Analyzing the Execution Flow

Implementing a Robust Logging Lifecycle

Structured Implementation and MDC Hygiene

Interpreting Downstream Service Signals

Conclusion on Implementation Choices

The Preparation of the Machine

The Sim Racing Setup

Hierarchies of Instruction

Slicing Against the Grain

The Logic of Two Flows

The Evolutionary Tree

Back to Reality

The Shared Reality of the Database Ledger

The Migration Ledger

Iteration in the Local Loop

The Virtue of Squashing

Constraints of the Persistent Environment

Discipline Over Magic

The Cognitive Cost of AI Delegation

The Brake-Fade on the Downhill (The Hook)

The Architecture of the Proxy Mind (The Landscape)

The A4 Saturation Point (The Stress Test)

The Noise Floor of the Preamble (The Handoff)

The Hard Character Limit (The Verification)


Use Case B: From `main()` to JUnit 5