<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Nik Malykhin: Production]]></title><description><![CDATA[Deep dives into GenAI, App modernization, and the philosophy of building systems with AI teammates.]]></description><link>https://www.nikmalykhin.com/s/production</link><image><url>https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png</url><title>Nik Malykhin: Production</title><link>https://www.nikmalykhin.com/s/production</link></image><generator>Substack</generator><lastBuildDate>Fri, 05 Jun 2026 18:42:19 GMT</lastBuildDate><atom:link href="https://www.nikmalykhin.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Nik Malykhin]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[nik1379616@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[nik1379616@substack.com]]></itunes:email><itunes:name><![CDATA[Nik]]></itunes:name></itunes:owner><itunes:author><![CDATA[Nik]]></itunes:author><googleplay:owner><![CDATA[nik1379616@substack.com]]></googleplay:owner><googleplay:email><![CDATA[nik1379616@substack.com]]></googleplay:email><googleplay:author><![CDATA[Nik]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[When AI Breaks Database Parity]]></title><description><![CDATA[The Landscape of Database Selection and the Integration Testing Paradigm]]></description><link>https://www.nikmalykhin.com/p/when-ai-breaks-database-parity</link><guid 
isPermaLink="false">https://www.nikmalykhin.com/p/when-ai-breaks-database-parity</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Tue, 02 Jun 2026 07:03:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>The Landscape of Database Selection and the Integration Testing Paradigm</h2><p>According to global database engine rankings, relational models continue to dominate the software development landscape. The top positions are consistently occupied by Oracle, MySQL, Microsoft SQL Server, and PostgreSQL, with MongoDB following closely as the primary document-oriented alternative. In my own architectural designs, PostgreSQL serves as the primary relational database engine, complemented by Amazon Web Services S3 for object storage.</p><p>Previously, I explored the complexities of managing database migrations with Flyway. Today, I want to extend that conversation to address database integration testing and the critical requirement of environmental parity. For a considerable duration within Java and Kotlin development stacks, the H2 database engine served as the standard default for local execution and integration testing. As an in-memory, runtime-configured database, H2 provides seamless integration with the Spring Framework and requires zero external infrastructure installation. 
The engine also supports a dedicated PostgreSQL compatibility mode, which historically made it an appealing candidate for simulating a production environment during local development.</p><h2>The Illusion of Compatibility and <br>the Environmental Disparity Trap</h2><p>While H2 excels as a lightweight runtime database when interactions are mediated entirely by abstract object-relational mapping frameworks, its feature parity with PostgreSQL falls short of complete functional duplication. The compatibility mode rarely covers advanced native database capabilities, leading to subtle and disruptive behavioral deviations between development and production environments.</p><p>For instance, H2 supports the non-standard <em>ROWNUM</em> row-numbering construct, which PostgreSQL does not provide; PostgreSQL relies on the standard <em>row_number()</em> window function instead. Conversely, writing advanced queries that exploit native PostgreSQL functions or triggers quickly exposes the limitations of the compatibility mode. The gap becomes most visible when schema migrations run.</p><p>During a recent project iteration, a data migration had to compute MD5 hashes over historical records. The native PostgreSQL <em>md5</em> function accepts a byte array argument directly. When Flyway attempted to execute this migration script against the local H2 testing instance, the build failed immediately. H2 does not recognize this function; it requires a different function, <em>HASH</em>, which takes an explicit algorithm name followed by the expression to hash. 
This mismatch highlights the structural risk of relying on a simulated environment.</p><blockquote><p>True environmental parity cannot be achieved by translating syntax at runtime; it requires validating software against the exact engine configuration slated for production deployment.</p></blockquote><h2>The Architectural Evolution of Local Infrastructure</h2><p>Advances in containerization and build-tool integration have removed the need to accept the behavioral compromises of an in-memory database. Docker fundamentally changed local engineering environments, and the Testcontainers framework later extended that change to automated testing.</p><p>With the release of Spring Boot 3.1.0 in the spring of 2023, the framework introduced built-in, first-class configuration mechanisms for Testcontainers. This development eliminated the primary architectural justification for maintaining a split database architecture between testing and production. Even for projects with simple data models, the modern tooling ecosystem removes the need to manage an alternate database dialect for local verification.</p><h2>The Token Regression: <br>Generative AI and Legacy Patterns</h2><p>The availability of modern containerized alternatives raises the question of why environmental disparity remains a topic of discussion in 2026. The emergence of generative artificial intelligence as a ubiquitous development tool provides the explanation. While bootstrapping four distinct microservices in parallel, my engineering team used GitHub Copilot to accelerate the generation of service skeletons and initial configuration manifests.</p><p>Because generative models predict output tokens based on historical training data, their recommendations are heavily weighted toward long-standing industry conventions. 
Due to the decade-long prominence of H2 in historical Spring tutorials and code repositories, the assistant recommended an in-memory H2 configuration for local development. The engineers initializing the services accepted this recommendation as a functional baseline, thereby reintroducing legacy environmental friction into a modern development stack.</p><blockquote><p>Generative code assistants operate on statistical probability derived from historical data, which can inadvertently cause architectural regressions by propagating legacy best practices into modern codebases.</p></blockquote><h2>Implementing Local Parity through Automation</h2><p>To resolve the structural friction caused by mismatched database engines, we replaced the in-memory simulation with a containerized PostgreSQL instance dedicated to local execution. To ensure this change did not introduce manual overhead to the developer workflow, we integrated the container lifecycles directly into our build orchestration layer.</p><h3>Declarative Local Infrastructure with Docker Compose</h3><p>The local database environment is declared using a concise seventeen-line Docker Compose configuration. This manifest uses the lightweight Alpine-based PostgreSQL 17.9 image and includes an explicit readiness health check (<em>pg_isready</em>) so that dependent tasks block until the database engine accepts connections.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;yaml&quot;,&quot;nodeId&quot;:&quot;22401e62-da02-42a4-88a3-f31e4be83d77&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-yaml">name: one_service

services:
 postgres:
   image: postgres:17.9-alpine
   container_name: one-service-postgres
   environment:
     POSTGRES_DB: one_service
     POSTGRES_USER: admin
     POSTGRES_PASSWORD: admin
   ports:
    - "5432:5432"
   healthcheck:
     test: ["CMD-SHELL", "pg_isready -U admin -d one_service"]
     interval: 10s
     timeout: 5s
     retries: 5</code></pre></div><p>This configuration allows developers to manage the entire infrastructure state directly from the terminal using standard Compose lifecycle commands such as <em>docker compose up -d</em> and <em>docker compose down</em>.</p><h3>Automating Container Lifecycles within Gradle</h3><p>To eliminate manual intervention entirely, we registered custom execution tasks within the Kotlin DSL build configuration file (build.gradle.kts). These tasks manage the container lifecycle programmatically, guaranteeing that the database is active during specific phases such as schema generation or local application execution.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;kotlin&quot;,&quot;nodeId&quot;:&quot;9e88bf8e-4446-408e-b49a-c5df17270af1&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-kotlin">val composeUpPostgres by tasks.registering(Exec::class) {
   group = "documentation"
   description = "Starts local Postgres container and waits until it is healthy"
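   // --wait blocks until the service reports healthy (it implies detached mode); --wait-timeout is given in seconds
   commandLine("docker", "compose", "up", "-d", "--wait", "--wait-timeout", "120", "postgres")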
}
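
// Sketch, not from the original build file: the same start task could also gate
// local application runs. "bootRun" is assumed to be the task contributed by the
// Spring Boot Gradle plugin; substitute your own run task if it differs.
// No stop task is attached here, so the container keeps running after the app exits.
tasks.named("bootRun") {
   dependsOn(composeUpPostgres)
}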

val composeStopPostgres by tasks.registering(Exec::class) {
   group = "documentation"
   description = "Stops local Postgres container after OpenAPI generation"
   commandLine("docker", "compose", "stop", "postgres")
}</code></pre></div><p>By utilizing Gradle task graph dependencies, these infrastructure tasks are hooked automatically into the application build process. For example, generating OpenAPI documentation requires an active database to resolve the schema accurately. We map this dependency explicitly using the build task lifecycle.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;kotlin&quot;,&quot;nodeId&quot;:&quot;0cf1bbb9-c2b9-45ac-9803-63677e5a62cf&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-kotlin">tasks.named("generateOpenApiDocs") {
   dependsOn(composeUpPostgres)
   finalizedBy(composeStopPostgres)
   ...
}</code></pre></div><p>This structural configuration ensures that the container initializes prior to the generation task and terminates cleanly upon completion, removing manual environmental variance from the automated workflow.</p><p>Ultimately, the architectural tools available mean there are very few justifications for maintaining an in-memory database simulation in a modern ecosystem. When automated assistants suggest these legacy configurations, human engineers must remain the final arbiters of architectural validity, recognizing that <em>statistical probability</em> does not always equate to engineering excellence.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! 
Subscribe to get more practical guides on using GenAI tools effectively in software development work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[A Systematic Approach to AI in Production]]></title><description><![CDATA[Implementing Triad Programming]]></description><link>https://www.nikmalykhin.com/p/a-systematic-approach-to-ai-in-production</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/a-systematic-approach-to-ai-in-production</guid><pubDate>Tue, 19 May 2026 07:01:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I have utilized generative AI tools such as ChatGPT and GitHub Copilot for several years, but the central question that has consistently occupied my research is how to effectively apply these technologies within a production environment. Through dozens of experiments, I have moved beyond simple code generation to delivering production-ready stories with minimal manual intervention. My objective is to transition from viewing AI as a mere novelty to integrating it into a functional triad programming model.</p><h2>The Evolution Toward Triad Programming</h2><p>In my experience, modern enterprise software cannot be developed in isolation; it requires a collaborative team effort. 
For roughly six months, I have explored the transition from traditional pair programming to triad programming, where an AI teammate joins the human pair to facilitate development. This transition requires a cultural shift within the team to move from treating AI as a buzzword to utilizing it as a practical tool.</p><p>The support of technical leadership is an important prerequisite for this shift. Without such backing, changing established team initiatives and workflows is difficult. To support this cultural change, we organized internal sessions and weekly two-hour workshops dedicated to demystifying the technology. By exploring how to master context and refine instructions, the team can eliminate the <em>magical</em> perception often associated with artificial intelligence and treat it as a predictable component of the engineering process.</p><h2>Establishing the AI Environment through Context</h2><p>Defining the AI environment is an ongoing challenge, especially given the limitations inherent in production workflows. For my current purposes, I define the environment as the context provided to the model, which effectively makes the AI environment equal to its instructions. Whether these instructions are provided through a prompt, a specific configuration file, or an MCP server, they serve as the foundational constraints for the AI's output.</p><blockquote><p>I believe it is essential to manage the AI environment as closely as possible to the development process. This allows the team to remain agile and make necessary changes without creating disconnected silos of instruction.</p></blockquote><p>A significant advantage of this approach is the ability to leverage existing, plain-English documentation rather than creating specialized AI adaptations. For example, I use the team's standard Confluence page for quality assurance and testing strategies as a direct instruction set. 
This documentation outlines requirements such as ensuring every acceptance criterion is covered by a test and avoiding complex end-to-end suites in favor of integration coverage. Decoupling the testing strategy from AI-specific formatting ensures that if the team updates their standards, the AI's context is automatically updated, while the documentation remains readable for non-engineering stakeholders.</p><h2>Architectural Constraints and Testing Strategies</h2><p>To reduce cognitive load and provide clear boundaries for the AI, my team established a strict architectural agreement for our services. We utilize a hexagonal architecture, which is documented in Confluence to ensure consistency when engineers rotate between different services. This structure includes a defined hierarchy of adapters, controllers, and domain use cases.</p><p>The current structure organizes components into clear packages such as:</p><ul><li><p>com.todo.adapter.controller <em>for handling external requests and DTOs</em></p></li><li><p>com.todo.adapter.supplier <em>for repository adapters and external client configurations</em></p></li><li><p>com.todo.domain <em>for core exceptions, models, and use cases</em></p></li></ul><p>While this structure is optimized for organizational clarity rather than pure readability, it serves as a robust framework that prevents the AI from generating unexpected or hallucinated results. By grounding the AI in these established conventions, we save significant time that would otherwise be spent on custom instruction maintenance.</p><h2>The Practical Workflow: From Init Prompt to Autopilot</h2><p>The bridge between our documentation and the code is the initialization prompt. I have found that the most effective flow involves using ChatGPT, which has integrated connections to Jira, Confluence, and our GitHub repositories. 
This allows me to create a prompt that references specific Jira stories and Confluence guidance pages directly.</p><p>When provided with these links, ChatGPT analyzes the story details, the codebase structure, and the architectural standards to generate a grounded implementation plan. This plan maps to actual ports and adapter conventions rather than generic advice. This approach also facilitates a dialogue between human pair partners, as the chat becomes a shared space for reaching an agreement before the final prompt is passed to GitHub Copilot.</p><h2>Slicing and Iterative Implementation</h2><p>A critical aspect of using AI in production is task slicing. To prevent the AI from attempting to generate non-existent dependencies, it is vital to isolate fragments of the story. For a simple task involving a controller, a use case, and a client, I follow an isolated sequence:</p><ol><li><p>Implement a controller with a hard-coded response.</p></li><li><p>Implement the client that connects to the external service.</p></li><li><p>Develop the use case to bridge the domain model and the client.</p></li><li><p>Update the controller to utilize the new use case.</p></li></ol><p>Each slice follows a rigorous autopilot loop within GitHub Copilot. I provide a specific instruction set that mandates a test-driven development cycle:</p><ol><li><p>Analyze the task and the repository for alignment.</p></li><li><p>Create tests and mark them as skipped until the plan is approved.</p></li><li><p>Establish an implementation order for the tests.</p></li><li><p>Iterate through each test by removing the skip marker, implementing the code, and verifying the test passes.</p></li><li><p>Execute a full build, such as <em>gradle clean build test</em>, after each passing test to ensure overall system stability.</p></li></ol><h2>Human Oversight and Integration</h2><p>Despite the high level of AI involvement, human oversight remains a non-negotiable requirement for production code. 
I request that Copilot organize the resulting files into commit groups that are easy for a person to understand before opening a pull request.</p><blockquote><p>By keeping pull requests small and isolated, they remain manageable for human review, ensuring they meet specific client requirements and that the human engineers maintain a deep understanding of the codebase.</p></blockquote><p>This workflow demonstrates that by leveraging existing organizational processes and treating AI as an integrated teammate rather than an external tool, we can deliver high-quality software with greater efficiency and consistency.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe to get more practical guides on using GenAI tools effectively in software development work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Diagnosing Observability Gaps in Blocking Controller Methods]]></title><description><![CDATA[In a distributed system, the invisibility of an expected log entry often signals a deeper divergence between execution flow and infrastructure expectations.]]></description><link>https://www.nikmalykhin.com/p/diagnosing-observability-gaps-in</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/diagnosing-observability-gaps-in</guid><pubDate>Tue, 12 May 2026 07:02:00 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In a distributed system, the invisibility of an expected log entry often signals a deeper divergence between execution flow and infrastructure expectations. During a recent implementation of a test email functionality within a Kotlin-based service, I encountered a scenario where logs in Datadog appeared for certain execution paths but remained absent for others. This inconsistency prompted an investigation into the interaction between the Kotlin <em>when</em> expression, blocking downstream calls, and the lifecycle of a request within the Datadog logging pipeline.</p><p>The target of this investigation was the <em>sendTestEmail</em> method located in the <em>TestEmailController</em>. The domain logic returns three distinct results: <em>Success</em>, <em>FeatureTurnedOff</em>, and <em>Error</em>. While the <em>FeatureTurnedOff</em> case consistently produced logs in the monitoring dashboard, the <em>Success</em> and <em>Error</em> outcomes frequently failed to emit the final confirmation log.</p><h2>Analyzing the Execution Flow</h2><p>The initial hypothesis centered on potential issues with the Kotlin <em>when</em> block or a misconfiguration of the Mapped Diagnostic Context (MDC). However, the technical finding revealed a more fundamental cause related to execution timing and the nature of the downstream service interaction.</p><p>The <em>FeatureTurnedOff</em> result is a short-circuit path. When the feature toggle is disabled, the use case returns a result immediately, allowing the controller to reach the final log statement and exit within a negligible timeframe. Conversely, both the <em>Success</em> and <em>Error</em> paths require a call to a downstream notification service. 
This call is implemented using a blocking mechanism via the <em>.block()</em> method on a reactive stream.</p><blockquote><p>The discrepancy in log visibility was not a failure of the logging library but a consequence of the controller thread waiting on a blocking call. If the downstream service experienced latency or if the client closed the connection before the call completed, the final log statement was never reached or recorded.</p></blockquote><p>This behavior was corroborated by Datadog errors indicating that the stream was closed by the client and that there were errors reading events. In environments utilizing the <em>ssm-agent-worker</em>, these interruptions can occur when the infrastructure or the initiating client terminates the request context before the application finishes its blocking operation.</p><h2>Implementing a Robust Logging Lifecycle</h2><p>To resolve the visibility gap, I restructured the logging strategy to separate request arrival from processing outcomes. By introducing a log statement immediately upon entry to the controller method, I ensured that a record exists regardless of how the downstream call performs.</p><p>The revised implementation follows a deliberate pattern of enrichment and cleanup. I utilized MDC to attach structured metadata to the log records, which facilitates precise filtering in Datadog. It is essential to avoid generic MDC keys such as <em>status</em>, as these often conflict with reserved fields or common conventions in log aggregators. 
Instead, I opted for specific identifiers like <em>testEmailOutcome</em> and <em>templateId</em>.</p><h3>Structured Implementation and MDC Hygiene</h3><p>The following structure ensures that the MDC is populated at the start of the request and, crucially, cleared in a finally block to prevent context leakage between threads.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;kotlin&quot;,&quot;nodeId&quot;:&quot;1e570632-6612-4578-afbb-aa1d5ab79ea8&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-kotlin">try {
    MDC.put("templateId", request.templateId)
    logger.info("Test email request received")

    val status =
        when (val result = sendTestEmailUseCase.execute(request.templateId)) {
            is SendTestEmailResult.Success -&gt; {
                MDC.put("testEmailOutcome", "test email was sent successfully")
                HttpStatus.CREATED
            }

            is SendTestEmailResult.FeatureTurnedOff -&gt; {
                MDC.put("testEmailOutcome", "feature toggle is off, test email was not sent")
                HttpStatus.ACCEPTED
            }

            is SendTestEmailResult.Error -&gt; {
                MDC.put("testEmailOutcome", "test email failed to send")
                MDC.put("testEmailErrorMessage", result.cause.message ?: "unknown error")
                HttpStatus.INTERNAL_SERVER_ERROR
            }
        }

    logger.info("Test email request processed")

    return ResponseEntity
        .status(status)
        .body(SendTestEmailResponse(templateId = request.templateId))
} finally {
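    // MDC.clear() wipes every key on this thread, including any set upstream; MDC.remove(key) would drop only the keys added above
    MDC.clear()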
}</code></pre></div><p>This approach provides a clear narrative in the logs. The <em>Test email request received</em> log serves as a heartbeat, confirming the controller was reached. The final <em>Test email request processed</em> log confirms the blocking call completed and indicates which branch of the <em>when</em> logic was executed.</p><h2>Interpreting Downstream Service Signals</h2><p>Understanding the relationship between the application and the notification service is vital for interpreting the logs. For instance, an observed HTTP 400 Bad Request error from the notification service endpoint indicates that the feature toggle was active and the application successfully initiated the call. Because this is a terminal error from the downstream provider, the result maps to <em>SendTestEmailResult.Error</em>.</p><blockquote><p>Logging the specific error message from the result cause into a dedicated MDC field allows for immediate debugging of downstream rejections without requiring a manual trace of the network call.</p></blockquote><p>The introduction of the early log statement fixed the observability issue for all three execution paths. It provides a reliable controller-level record that the request was received before any slow or failing downstream behavior could interfere with the logging thread.</p><h2>Conclusion on Implementation Choices</h2><p>The decision to add a pre-call log and wrap the execution in a try-finally block was a logical response to the constraints of blocking I/O. While reactive, non-blocking patterns are often preferred, existing architectural constraints sometimes necessitate the use of <em>.block()</em>. In such cases, the primary responsibility of the developer is to ensure that the system remains observable even when execution is stalled.</p><p>By grounding the logging strategy in the lifecycle of the request rather than just the final outcome, I established a more resilient monitoring posture. 
The logs now clearly differentiate between request arrival, downstream processing, and final controller outcome, providing the necessary context to diagnose failures in a distributed environment.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe to get more practical guides on using GenAI tools effectively in software development work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[The Preparation of the Machine]]></title><description><![CDATA[The Sim Racing Setup]]></description><link>https://www.nikmalykhin.com/p/the-preparation-of-the-machine</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/the-preparation-of-the-machine</guid><pubDate>Tue, 28 Apr 2026 07:01:44 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>The Sim Racing Setup</h2><p>I&#8217;ve spent enough time in this industry to know that the promise of &#8220;plug-and-play&#8221; is usually a lie told to people who don&#8217;t have to maintain the results. 
We&#8217;ve grown accustomed to our IDEs functioning almost perfectly the moment we install them, which has created a bit of a lazy habit in our collective psyche. We expect our tools to meet us where we are without any effort on our part. But when I look at the current state of Generative AI, I&#8217;m reminded much more of high-performance sim racing or building a custom PC. You <em>can</em> just plug a wheel into a desk and start driving, but you won&#8217;t actually feel the road, and you certainly won&#8217;t win any races. To get professional results, you have to embrace the preparation. The setup isn&#8217;t an annoying preamble; it is the work itself.</p><h2>Hierarchies of Instruction</h2><p>In my recent experiments, I&#8217;ve moved away from treating ChatGPT as a blank slate. Instead, I&#8217;ve been refining a two-tier configuration that relies on <strong>Project Instructions</strong>, which are specific directives tailored to a particular codebase or business domain that work in tandem with my global settings. I found that by splitting instructions between a global level&#8212;who I am and how I want to be spoken to&#8212;and a project level, I could stop the AI from hallucinating a generic solution. This isn&#8217;t about giving the AI a long list of rules to follow blindly. It&#8217;s about creating a runtime environment that respects the reality of my actual repository.</p><h2>Slicing Against the Grain</h2><p>There is a fundamental tension in how we break down work for a machine versus how we break it down for a human. In the agile world, we are taught the value of a <em>Vertical Slice</em>, which is a functional piece of work that touches every layer of the system to deliver a complete feature. When I am working with AI, however, I&#8217;ve found that this approach often leads to a mess. 
I&#8217;ve started practicing a methodology where I break a complex story into isolated, technical layers&#8212;repository, use case, then controller&#8212;as separate steps. I didn&#8217;t set out to slice the &#8220;layers of a pie&#8221; instead of the &#8220;slices of a cake&#8221; because I thought it was a better way to design software; I did it because I <em>found</em> it simply works better for the AI&#8217;s current reasoning capabilities. It&#8217;s an empirical adjustment. By forcing the AI to focus on one technical layer at a time, I prevent the logic from becoming a tangled knot of half-finished abstractions.</p><h2>The Logic of Two Flows</h2><p>Within these project instructions, I&#8217;ve found success by defining two distinct paths of interaction. I call these Flow-Based Prompts, a system where the AI knows whether we are in an analysis phase or an execution phase.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;markdown&quot;,&quot;nodeId&quot;:&quot;4e21f924-a4a8-492a-8ff4-3b27d2e07960&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-markdown">Flow 1: Analysis &amp; Slicing
- Goal: Digest the Jira story and propose the technical slices.
- Output: A structured implementation plan.

Flow 2: Prompt Generation
- Goal: Create a specific instruction for GitHub Copilot.
- Output: An isolated prompt for a single technical layer.</code></pre></div><p>In the first flow, the AI acts as a sounding board, helping me decompose a story and identify the technical boundaries. In the second flow, it transitions into a generator, producing the exact context needed for GitHub Copilot to write the code. This prevents the &#8220;handoff&#8221; problem where context gets lost between the chat window and the code editor. It ensures that when I move to my IDE, the instructions are already tailored to the specific slice of the system I am currently building.</p><h2>The Evolutionary Tree</h2><p>Of course, I&#8217;ve been skeptical of &#8220;perfectly automated&#8221; prompts that try to handle every edge case from the start. I&#8217;ve discarded that idea for now because, at this stage of my understanding, those prompts usually just add unnecessary weight and noise. However, I don&#8217;t think we are stuck here. I suspect that as we get better at this, our instruction sets will evolve into something more like a tree. The system won&#8217;t just be a static list of rules; it will be an adaptive structure that detects the current context of the work and branches out to provide exactly the right level of detail.</p><blockquote><p>We are moving toward a future where the tool detects the type of instruction needed rather than requiring us to shout the same commands every morning.</p></blockquote><p>For now, the manual setup is where the value lives. It&#8217;s the difference between a tool that guesses and a tool that knows.</p><h2>Back to Reality</h2><p>In the end, I&#8217;m keeping the slicing methodology and the dual-flow instruction setup in my toolkit. I&#8217;ve set aside the hunt for a &#8220;magic&#8221; prompt that solves everything in one go. Reality is messy, and our tools need to be flexible enough to reflect that. We should be skeptical of any AI workflow that promises to do the thinking for us. 
The real value is in the preparation&#8212;the configuration of the environment&#8212;that allows us to do our best thinking with a bit less friction.</p><div><hr></div><p><em><strong>Further Reading / Related Reflections</strong></em></p><ul><li><p><em><a href="https://www.nikmalykhin.com/p/pragmatic-hexagon">The Pragmatic Hexagon: scaling decoupling without complexity</a> </em></p></li><li><p><a href="https://help.openai.com/en/articles/10169521-projects-in-chatgpt">Projects in ChatGPT</a> </p></li></ul><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe to get more practical guides on using GenAI tools effectively in software development work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[The Shared Reality of the Database Ledger]]></title><description><![CDATA[I spent a good portion of the early 2000s staring into the flickering glow of a CRT monitor, trying to master the precise sequence of an RTS build order.]]></description><link>https://www.nikmalykhin.com/p/the-shared-reality-of-the-database</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/the-shared-reality-of-the-database</guid><pubDate>Tue, 21 Apr 2026 07:01:15 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I spent a good portion of the early 2000s staring into the flickering glow of a CRT monitor, trying to master the precise sequence of an RTS build order. In games like <em>StarCraft</em>, you didn&#8217;t just build a Factory on a whim; you followed a rigid, physical sequence of Supply Depots and Barracks. The real problem wasn&#8217;t just losing a match&#8212;it was the <em>desync</em>, a fatal error where one player&#8217;s game state no longer matched the other&#8217;s. When that happened, the shared reality of the match simply evaporated.</p><p>I found that managing a database schema with Flyway feels remarkably similar. We often treat database evolution as a fluid, agile process, but the underlying reality is much more rigid. When we move from the isolated &#8220;practice map&#8221; of local development to the high-stakes environment of a production database, we are moving into a space where the history of what we built is just as important as the current state. In this space, a mismatch between your code&#8217;s expectations and the database&#8217;s actual schema is the ultimate game-breaker.</p><h2>The Migration Ledger</h2><p>Flyway manages this by utilizing a <strong>migration-based approach</strong>, which means every change to the database&#8212;whether adding a table or altering a column&#8212;is captured in a versioned SQL script. It maintains a dedicated table called <code>flyway_schema_history</code> to track exactly which scripts have been executed. 
To ensure consistency, the system calculates a <em>checksum</em>, which is a digital fingerprint of the file&#8217;s content.</p><p>If I ever change a script after it has already run on a server, Flyway detects that the fingerprint has changed. This results in a checksum mismatch, and the system will stop the application from starting. This <em>immutability</em> is not a hurdle; it is a safety feature designed to prevent the database from entering an unknown state where the code expects one schema but the database has another.</p><h2>Iteration in the Local Loop</h2><p>The friction often begins when we forget that our local environment is a sandbox, not a permanent monument. On macOS, I found that using Docker and Testcontainers is the most reliable way to ensure a local database actually <em>matches</em> production. We can spin up a local container with a single command to test our build order:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;256984f0-912b-417a-8a71-87a1db5337d0&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">docker run --name my-db -e POSTGRES_PASSWORD=pass -p 5432:5432 -d postgres</code></pre></div><p>This local container allows us to iterate quickly. In our <code>build.gradle.kts</code> configuration, we ensure that the <code>cleanDisabled</code> flag is set to <code>false</code>.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;kotlin&quot;,&quot;nodeId&quot;:&quot;31c8acbf-13cb-4fc9-beaf-2d2a5a55d06f&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-kotlin">flyway {
    url = "jdbc:postgresql://localhost:5432/postgres" // default database created by the postgres image
    user = "postgres"
    password = "pass"
    cleanDisabled = false
}</code></pre></div><p>This setup gives us a reset button. If I realize my first version of a script is flawed, I don&#8217;t create a second script to fix the first one locally. Instead, I edit the original script, run <code>./gradlew flywayClean</code>, and then <code>./gradlew flywayMigrate</code>. This ensures that my local state remains clean and my scripts remain concise before they are ever shared with the team.</p><h2>The Virtue of Squashing</h2><p>When working on a complex feature, I often end up with several different migration scripts as I refine the design. Merging all of them into the main branch is a mistake because it clutters the history with a &#8220;diary&#8221; of my trial and error. Instead, I practice <em>squashing</em>, the act of consolidating all logic from multiple feature-branch scripts into one single, optimized file.</p><p>Squashing improves readability, making it easier for a peer to review one coherent table creation rather than a series of renames and drops. It also improves performance, as fewer scripts mean faster deployment and test execution. Before I merge a Pull Request, I ensure my local database is cleaned and migrated one last time to verify that the final, squashed script works perfectly.</p><h2>Constraints of the Persistent Environment</h2><p>The danger arises when we attempt to treat a <em>persistent environment</em>, like AWS Aurora, as if it were a local Docker container. Unlike our local sandbox, we cannot simply wipe a cloud database.</p><blockquote><p>Triggering a clean command in a persistent environment is the ultimate &#8220;Game Over,&#8221; as it will drop all application data and cause a full service interruption.</p></blockquote><p>Production database users usually lack the permissions to drop schemas anyway, which is a vital safety rail. However, errors still happen. 
Because not every PostgreSQL statement can run inside a transaction (<code>CREATE INDEX CONCURRENTLY</code> is one example), a failed script can still leave the database in a &#8220;half-built&#8221; state. When this happens, we must fix the script in the codebase and run <code>./gradlew flywayRepair</code>. This command realigns the history table with the new checksums without deleting any application data, though sometimes manual SQL intervention is required to fix the table structure before the repair can succeed.</p><h2>Discipline Over Magic</h2><p>At the end of the day, database migrations are about the discipline you bring to the ledger rather than the tool itself. Flyway is a powerful engine, but it won&#8217;t save you from a messy build order or a lack of environmental parity. I&#8217;m keeping the practice of squashing and the strict use of containers in my toolkit, while setting aside any hope that these systems will ever be truly &#8220;set and forget&#8221;.</p><p>The reality is that database state is heavy and unforgiving. If you treat your migrations with the respect a shared reality demands, your deployments will become boring&#8212;which is exactly what we should strive for.</p><div><hr></div><p><em><strong>Further Reading / Related Reflections</strong></em></p><ul><li><p><em><a href="https://martinfowler.com/articles/evodb.html">Evolutionary Database Design by Martin Fowler</a> </em></p></li><li><p><a href="https://documentation.red-gate.com/fd/choosing-the-right-approach-with-flyway-246972498.html">Choosing the right approach with Flyway</a> </p></li></ul><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! 
Subscribe to get more practical guides on using GenAI tools effectively in software development work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[The Cognitive Cost of AI Delegation]]></title><description><![CDATA[Reflections on the Attention Economy and AI Etiquette]]></description><link>https://www.nikmalykhin.com/p/the-cognitive-cost-of-ai-delegation</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/the-cognitive-cost-of-ai-delegation</guid><pubDate>Tue, 14 Apr 2026 07:02:37 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3><strong>The Brake-Fade on the Downhill (The Hook)</strong></h3><p>When you&#8217;re descending a steep technical trail on a mountain bike, your most precious resource isn&#8217;t your speed&#8212;it&#8217;s your <strong>biological energy</strong> and grip strength. If you spend the entire descent white-knuckling the brakes because you&#8217;re afraid of the terrain, you hit &#8220;brake fade.&#8221; The system overheats, your hands cramp, and by the time you reach the truly dangerous rock garden at the bottom, you have zero &#8220;focus capital&#8221; left to navigate it. You crash not because the trail was too hard, but because you wasted your resources on the easy parts.</p><p>In the professional world, GenAI is being marketed as the ultimate &#8220;ebike&#8221; for our brains. 
The industry assumption is that more output equals more productivity. But if this &#8220;unlimited output&#8221; is the popular choice, why does it feel like I&#8217;m fighting the system? Why does receiving a perfectly formatted, AI-generated A4 page feel like a cognitive &#8220;crash&#8221; before I&#8217;ve even reached the conclusion?</p><h3><strong>The Architecture of the Proxy Mind (The Landscape)</strong></h3><p>The environment I&#8217;m navigating isn&#8217;t just a chat interface; it&#8217;s a <strong>Mind-to-Mind Pipeline</strong> where the AI acts as a middleware layer. We are dealing with a system defined by the following geometry:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;b625c3e7-feba-451b-a272-85eddc0b8732&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">[Input: Raw/Unorganized Chaos]

          &#8595;

[Processor: GenAI &#8220;Mind Extension&#8221;]

          &#8595;

[Output: Structured Narrative (High Volume)]

          &#8595;

[Buffer: Human Reviewer (The Fatigue Point)]

          &#8595;

[Destination: Recipient&#8217;s Attention Span]</code></pre></div><p>The constraints here are rigid. The LLM has no &#8220;physical&#8221; weight, but its output carries massive <strong>cognitive weight</strong>. The dependencies are tightly coupled: if I delegate the &#8220;thinking&#8221; to the tool without managing the &#8220;output volume,&#8221; the invisible boundary of the recipient&#8217;s attention is breached. Data moves through this space quickly, but <strong>meaning</strong> gets trapped in the friction of the preamble.</p><h3><strong>The A4 Saturation Point (The Stress Test)</strong></h3><p>I moved my observations from the &#8220;theoretical path&#8221; to the &#8220;actual terrain&#8221; where people have many unread messages.</p><p>&#10148; <strong>The Breaking Point:</strong> The methodology of &#8220;Ask and Forward&#8221; failed at the third iteration. When I pushed a full A4 page of structured AI text to a colleague, the system showed immediate fatigue.</p><p>&#10148; <strong>The Silent Failure:</strong> The recipient didn&#8217;t tell me the text was too long. Instead, they &#8220;swallowed&#8221; the error&#8212;skimming the preamble, missing the critical &#8220;result of work&#8221; buried in the middle, and asking a question that was already answered in the text.</p><p>&#10148; <strong>The Observation:</strong> The gap between the &#8220;Structured Answer&#8221; provided by the AI and the actual <strong>Information Transferred</strong> was a massive chasm. While I didn&#8217;t measure the exact percentage, the observation was clear: the system was technically functioning, but the mission failed. The recipient&#8217;s focus simply didn&#8217;t survive the &#8220;A4 size&#8221; barrier.</p><h3><strong>The Noise Floor of the Preamble (The Handoff)</strong></h3><p>This is a failure of delegation. 
When we use AI to structure &#8220;unstructured vision,&#8221; we often translate our goal into an action that generates <strong>clutter</strong> rather than <strong>clarity</strong>.</p><p>&#10148; <strong>Signal-to-Noise:</strong> GenAI tools are programmed to be &#8220;helpful,&#8221; which means adding long, polite preambles and exhaustive summaries. This is the <strong>&#8220;noise floor&#8221;</strong>.</p><p>&#10148; <strong>Cognitive Load:</strong> By sending unedited AI responses, you aren&#8217;t saving time; you are just shifting the <strong>processing debt</strong> onto the recipient. You spend 10 seconds generating the text, but you force the recipient to spend minutes mining it for value. This eventually leads to a &#8220;system blackout&#8221; where people ignore messages entirely.</p><h3><strong>The Hard Character Limit (The Verification)</strong></h3><p>After observing these failures, only one principle remained standing: <strong>The Short Style Constraint</strong>.</p><p>&#10148; <strong>Stability:</strong> The only communication that survived the &#8220;skimming&#8221; reflex was the <strong>&#8220;Elevator Pitch&#8221;</strong> format. When forced into a tight container, the AI is actually better at its job. It stops &#8220;hallucinating value&#8221; through word count and starts organizing logic.</p><p>&#10148; <strong>The New Baseline:</strong> The trusted approach is the <strong>Init Prompt Constraint</strong>. I tell the system: &#8220;Structure my thoughts, but do not exceed 280 characters&#8221; or &#8220;Provide the result first, no preamble&#8221;.</p><p>&#10148; <strong>The Evolution:</strong> I no longer view AI as a &#8220;writer&#8221;; I view it as a <strong>compressor</strong>. 
The strategy has shifted from using AI to say more to using it to say exactly enough.</p><h3><strong>The Navigator&#8217;s Log (Actionable Insights)</strong></h3><p>&#10148; <strong>Backlog:</strong></p><ul><li><p>The &#8220;A4-size&#8221; response&#8212;a legacy format that died with the printer.</p></li><li><p>&#8220;Respectful&#8221; AI preambles&#8212;they are actually disrespectful to the recipient&#8217;s time.</p></li><li><p>Trusting the human brain to catch errors in long AI texts after multiple iterations (brain laziness is a hardware feature, not a bug).</p></li></ul><p>&#10148; <strong>Merged:</strong></p><ul><li><p><strong>The &#8220;Short Style&#8221; Init Prompt:</strong> Force the AI into a constraint <em>before</em> it generates a single word.</p></li><li><p><strong>Energy Conservation:</strong> Spend mental energy on the <strong>constraint</strong>, not on editing massive, verbose text.</p></li><li><p><strong>The Win-Win Protocol:</strong> If the sender spends less energy reviewing and the recipient spends less energy reading, the system remains stable.</p></li></ul><p><strong>Final Wisdom:</strong> In a world of infinite AI-generated noise, the most &#8220;premium&#8221; technical skill is the discipline to <strong>limit</strong> content. Be respectful to the system, or the system will stop listening.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! 
Subscribe to get more practical guides on using GenAI tools effectively in software development work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[𝗧𝗵𝗲 𝗣𝗿𝗮𝗴𝗺𝗮𝘁𝗶𝗰 𝗛𝗲𝘅𝗮𝗴𝗼𝗻: 𝗦𝗰𝗮𝗹𝗶𝗻𝗴 𝗗𝗲𝗰𝗼𝘂𝗽𝗹𝗶𝗻𝗴 𝘄𝗶𝘁𝗵𝗼𝘂𝘁 𝗖𝗼𝗺𝗽𝗹𝗲𝘅𝗶𝘁𝘆]]></title><description><![CDATA[&#120295;&#120309;&#120306; &#120295;&#120306;&#120315;&#120320;&#120310;&#120316;&#120315; &#120316;&#120315; &#120321;&#120309;&#120306; &#120295;&#120319;&#120302;&#120310;&#120313;]]></description><link>https://www.nikmalykhin.com/p/pragmatic-hexagon</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/pragmatic-hexagon</guid><pubDate>Tue, 24 Mar 2026 13:21:45 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3><strong>&#120295;&#120309;&#120306; &#120295;&#120306;&#120315;&#120320;&#120310;&#120316;&#120315; &#120316;&#120315; &#120321;&#120309;&#120306; &#120295;&#120319;&#120302;&#120310;&#120313;</strong></h3><p>In a professional kitchen, there is a concept called <em>mise en place</em>&#8212;everything in its place. You don&#8217;t start searing the scallops until every herb is chopped and every sauce is whisked. 
If you skip the prep to &#8220;save time,&#8221; you end up adjusting the recipe mid-saut&#233;, usually resulting in a frantic mess, ruined ingredients, and a dish that takes twice as long to serve.</p><p>Modern software development has a similar &#8220;popular choice&#8221;: start coding the logic immediately to show &#8220;progress.&#8221; But when we skip the architectural prep&#8212;the interfaces and boundaries&#8212;we aren&#8217;t moving fast; we are just building a kitchen we&#8217;ll have to tear down while the customers are waiting. I&#8217;ve watched engineers lose sight of the goal in the pursuit of a &#8220;perfect flow&#8221; that wasn&#8217;t grounded in discipline. If everyone says they want &#8220;clean code,&#8221; why does the system feel like it&#8217;s fighting us the moment we add a new story?</p><h3><strong>&#120294;&#120326;&#120320;&#120321;&#120306;&#120314; &#120282;&#120306;&#120316;&#120314;&#120306;&#120321;&#120319;&#120326;</strong></h3><p>The environment of this experiment is a standard <strong>Kotlin and Spring Boot</strong> stack. The landscape is defined by three distinct zones designed to minimize the &#8220;weight&#8221; of dependencies. To navigate this space, we use a rigid directory structure that acts as our map:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;78781d66-8266-4010-b6a8-432cfa8a8d42&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">app

&#9500;&#9472;&#9472; domain      &lt;-- THE HEART (POKOs only)
&#9474;   &#9500;&#9472;&#9472; model
&#9474;   &#9474;   &#9492;&#9472;&#9472; Data.kt     &lt;-- Pure Kotlin Data Class
&#9474;   &#9492;&#9472;&#9472; ports
&#9474;       &#9492;&#9472;&#9472; outgoing    &lt;-- Interfaces defining &#8220;What&#8221; we need
&#9474;           &#9500;&#9472;&#9472; DataPersistencePort.kt    &lt;-- SQL database
&#9474;           &#9492;&#9472;&#9472; DataStoragePort.kt        &lt;-- Object storage
&#9500;&#9472;&#9472; usecases    &lt;-- THE ORCHESTRATOR
&#9474;   &#9492;&#9472;&#9472; StoreDataUseCase.kt    &lt;-- Feature logic
&#9492;&#9472;&#9472; adapter     &lt;-- THE &#8220;HOW&#8221; (Infrastructure)
    &#9500;&#9472;&#9472; web         &lt;-- Inbound Adapter
    &#9474;   &#9500;&#9472;&#9472; DataController.kt
    &#9474;   &#9492;&#9472;&#9472; dto         &lt;-- Request/Response DTOs
    &#9474;       &#9492;&#9472;&#9472; WebMapper.kt    &lt;-- DTO &lt;-&gt; Domain mapping
    &#9500;&#9472;&#9472; sqldb       &lt;-- Outbound Adapter
    &#9474;   &#9500;&#9472;&#9472; entity
    &#9474;   &#9474;   &#9492;&#9472;&#9472; DataJpaEntity.kt    &lt;-- @Entity + JPA annotation
    &#9474;   &#9500;&#9472;&#9472; DataRepository.kt        &lt;-- Spring Data/CrudRepository
    &#9474;   &#9500;&#9472;&#9472; PersistenceMapper.kt     &lt;-- Entity &lt;-&gt; Domain mapping
    &#9474;   &#9492;&#9472;&#9472; PersistenceAdapter.kt    &lt;-- Impl DataPersistencePort
    &#9492;&#9472;&#9472; cloud       &lt;-- Outbound Adapter
        &#9492;&#9472;&#9472; ObjectStorageAdapter.kt</code></pre></div><p>&#10148; <strong>The Heart (Domain):</strong> Pure Kotlin Data Classes and business logic common to all usecases.</p><p>&#10148; <strong>The Orchestrator (Usecases):</strong> Where feature-specific logic lives and adapters are coordinated.</p><p>&#10148; <strong>The Infrastructure (Adapters):</strong> The &#8220;How&#8221; of the system&#8212;web controllers, JPA entities, and cloud storage clients.</p><p>The invisible boundary here is the <strong>Port</strong>. It&#8217;s an interface that defines &#8220;what&#8221; we need without caring &#8220;how&#8221; it&#8217;s done. In theory, this geometry should be light and flexible, yet many teams find it rigid because they misunderstand the direction of the signal.</p><h3><strong>&#120280;&#120314;&#120317;&#120310;&#120319;&#120310;&#120304;&#120302;&#120313; &#120280;&#120325;&#120317;&#120313;&#120316;&#120319;&#120302;&#120321;&#120310;&#120316;&#120315;</strong></h3><p>I moved from the &#8220;theoretical path&#8221; of perfect architecture to the &#8220;actual terrain&#8221; of daily PRs. The system showed its breaking point not in a crash, but in a silent failure of discipline: the <strong>Domain Import Leak</strong>.</p><p>&#10148; <strong>The Breaking Point:</strong> It usually starts when an engineer adds a domain service that directly imports an adapter: <code>import app.adapter.NewAdapter</code>.</p><p>&#10148; <strong>The Silent Failure:</strong> The code still passes tests. It still &#8220;works&#8221;. But the &#8220;Pure Domain&#8221; has been poisoned by infrastructure concerns.</p><p>&#10148; <strong>The Result:</strong> When the time inevitably comes to move that service to a usecase, the system reacts with extreme fatigue. 
We end up with PRs requiring the renaming of tens of files, leading to typos, package mismatches, and a massive mental load on reviewers.</p><h3><strong>&#120288;&#120302;&#120315;&#120302;&#120308;&#120310;&#120315;&#120308; &#120321;&#120309;&#120306; &#120294;&#120310;&#120308;&#120315;&#120302;&#120313;</strong></h3><p>The handoff between layers is where the &#8220;spaghetti&#8221; starts or ends. In my exploration, I found that the clarity of intent is often lost because teams are afraid of the &#8220;complexity&#8221; of an extra interface.</p><p>&#10148; <strong>Cognitive Load:</strong> Trying to refactor architecture in the middle of a feature story creates a &#8220;refactoring nightmare&#8221;.</p><p>&#10148; <strong>Signal-to-Noise:</strong> If you are 100% sure a logic block belongs in the domain, put it there. If not, the &#8220;cleaner&#8221; signal is to start in a <strong>Usecase</strong> and extract downward only when the need is proven.</p><p>&#10148; <strong>Direct Translation:</strong> To keep the signal clear, I&#8217;ve found it&#8217;s even acceptable to call a Port directly from a controller for simple cases. This avoids 1:1 &#8220;pass-through&#8221; mapping while keeping the adapter decoupled through the interface.</p><h3><strong>&#120298;&#120309;&#120302;&#120321; &#120280;&#120302;&#120319;&#120315;&#120306;&#120305; &#120295;&#120319;&#120322;&#120320;&#120321;?</strong></h3><p>After the stress test of &#8220;no time to decouple,&#8221; one principle remained standing: <strong>Mandatory Ports from the Start</strong>.</p><p>&#10148; <strong>Stability:</strong> The &#8220;price&#8221; of an interface at the start is effectively zero. It provides an immediate boundary that prevents the &#8220;import leak&#8221; and allows the domain to remain pure. 
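</p><p>A minimal sketch of that boundary might look like the following. It assumes hypothetical <code>id</code> and <code>payload</code> fields on <code>Data</code>, hypothetical <code>save</code> and <code>findById</code> methods on <code>DataPersistencePort</code>, and an illustrative in-memory adapter standing in for the real JPA-backed <code>PersistenceAdapter</code>. None of these details come from the actual project; the only point is that the usecase names the interface and never an adapter class.</p><pre><code class="language-kotlin">// Domain model: a plain Kotlin data class with no framework imports.
// The id and payload fields are assumptions made for this sketch.
data class Data(val id: String, val payload: String)

// Outgoing port: states what the domain needs, not how it is done.
// The method names are hypothetical.
interface DataPersistencePort {
    fun save(data: Data): Data
    fun findById(id: String): Data?
}

// Illustrative adapter that keeps only the most recent record in memory.
// A real adapter would wrap a Spring Data repository instead.
class InMemoryPersistenceAdapter : DataPersistencePort {
    private var stored: Data? = null

    override fun save(data: Data): Data {
        stored = data
        return data
    }

    override fun findById(id: String): Data? = stored?.takeIf { it.id == id }
}

// The usecase depends on the port only, so no adapter import is needed here.
class StoreDataUseCase(private val persistence: DataPersistencePort) {
    fun store(id: String, payload: String): Data {
        require(payload.isNotBlank()) { "payload must not be blank" }
        return persistence.save(Data(id, payload))
    }
}

fun main() {
    val adapter = InMemoryPersistenceAdapter()
    val useCase = StoreDataUseCase(adapter)
    val saved = useCase.store("42", "hello")
    println(saved)
    println(adapter.findById("42") == saved)
}
</code></pre><p>Because <code>StoreDataUseCase</code> only names the port in its constructor, swapping the in-memory adapter for one verified with Testcontainers should not require touching the domain or usecase packages.</p><p>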
&#10148; <strong>The New Baseline:</strong> My trusted navigation strategy is now <strong>TDD-driven Hexagon</strong>.</p><p>&#8226; <strong>Step 1:</strong> Define the Domain Model.</p><p>&#8226; <strong>Step 2:</strong> Build the Adapter and verify it with <strong>Testcontainers</strong> (SQL or Object Storage).</p><p>&#8226; <strong>Step 3:</strong> Finally, orchestrate it all in the Usecase or Controller using the Port interface.</p><h3><strong>&#120276;&#120304;&#120321;&#120310;&#120316;&#120315;&#120302;&#120303;&#120313;&#120306; &#120284;&#120315;&#120320;&#120310;&#120308;&#120309;&#120321;&#120320;</strong></h3><p>&#10148; <strong>Backlog (Failed the Stress Test):</strong></p><p>&#8226; &#8220;Refactoring-in-the-middle&#8221;: Changing architecture while delivering a story leads to mess and typos.</p><p>&#8226; Direct Adapter Imports: Any import app.adapter inside app.domain is a bug, not a feature.</p><p>&#10148; <strong>Merged (Trusted Toolkit):</strong></p><p>&#8226; <strong>Ports First:</strong> Always create the interface for 3rd party services or repositories immediately.</p><p>&#8226; <strong>Adapter-First Testing:</strong> Use Testcontainers to prove your &#8220;How&#8221; works before you worry about the &#8220;What&#8221; in your orchestration.</p><p>&#8226; <strong>Minimum Layers:</strong> Only add a Usecase layer if there is actual orchestration; otherwise, call the Port from the Controller.</p><p><strong>Final Wisdom:</strong> Clean architecture isn&#8217;t about having the most layers; it&#8217;s about having the most resilient boundaries. 
The &#8220;price&#8221; of an interface is nothing compared to the cost of a messy PR that no one wants to review.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe to get more practical guides on using GenAI tools effectively in software development work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Lying Tests and the Silent Swallow: Hardening Legacy Java]]></title><description><![CDATA[Is your CI/CD pipeline telling you the truth, or is it just telling you what you want to hear?]]></description><link>https://www.nikmalykhin.com/p/lying-tests-and-the-silent-swallow</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/lying-tests-and-the-silent-swallow</guid><pubDate>Tue, 17 Mar 2026 08:00:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Is your CI/CD pipeline telling you the truth, or is it just telling you what you want to hear?</strong> </p><p>In many legacy projects, the build is &#8220;Green,&#8221; the tests pass, and the console shows no errors. Yet, the moment the application hits production, it fails. 
The culprit is often a &#8220;Lying Test&#8221;&#8212;a suite that passes not because the code works, but because the errors have been carefully hidden, logged to a void, or suppressed by a generic catch-all block.</p><p>How do you turn a &#8220;politely silent&#8221; codebase into one that fails loudly enough to be fixed?</p><h3>The &#8216;Before&#8217; State: Setting the Context</h3><p>In older Java applications (circa 2005), error handling was often synonymous with <code>e.printStackTrace()</code>. Developers used manual <code>main()</code> methods or early JUnit versions to &#8220;test&#8221; logic. When an exception occurred, the instinct was to keep the process running at all costs.</p><p>The &#8220;old way&#8221; of testing often looked like this:</p><ul><li><p><strong>The Silent Swallow:</strong> Generic <code>catch (Exception e)</code> blocks that log a message but do not rethrow or signal failure.</p></li><li><p><strong>Exit Code 0:</strong> Build scripts (Ant) that encounter a runtime error but still report a successful exit code, tricking the developer into thinking everything is fine.</p></li><li><p><strong>Manual Verification:</strong> Tests that require a human to read the console output to see if it &#8220;looks right,&#8221; rather than asserting a specific outcome.</p></li></ul><h3>Introducing the Core Concept: Honest Testing</h3><p><strong>Honest Testing</strong> is the process of stripping away the &#8220;safety blankets&#8221; of legacy error handling to force the application to <strong>Crash Loudly.</strong></p><p><strong>What is it?</strong> It is a &#8220;Hardening Phase&#8221; where you replace swallowed exceptions with meaningful failures and migrate manual checks to automated assertions.</p><p><strong>Why does it matter?</strong> You cannot refactor code you do not understand. If your tests are lying to you about the state of the system, any &#8220;improvement&#8221; you make is just a guess. 
Making the build <strong>RED</strong> is the first step toward making it truly <strong>GREEN.</strong></p><h3>Practical Applications &amp; Use Cases</h3><h4>Use Case A: Exposing the Silent Swallow</h4><p>The most common anti-pattern in legacy Java is the &#8220;Log and Forget&#8221; block. We must convert these into loud failures during the testing phase.</p><pre><code><code>// BEFORE: The Lying Code
public void storeData() {
    try {
        // critical logic
    } catch (Exception e) {
        System.out.println("Error happened, but let's keep going!");
    }
}

// AFTER: Honest Code for Testing
public void storeData() {
    try {
        // critical logic
    } catch (Exception e) {
        // Re-throwing as a RuntimeException forces the test to fail
        throw new RuntimeException("Hardened Failure: Data storage failed", e);
    }
}
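
// SKETCH, not from the original codebase: a self-contained contrast of the two styles,
// assuming nothing beyond the JDK. HonestDemo and its method names are hypothetical.
class HonestDemo {
    // Swallows the failure: the caller cannot tell that anything went wrong.
    static boolean swallowed() {
        try {
            throw new IllegalStateException("disk full");
        } catch (Exception e) {
            System.out.println("Error happened, but let's keep going!");
        }
        return true; // reports "success" despite the failure
    }

    // Rethrows with the cause attached: a test calling this method fails.
    static void hardened() {
        try {
            throw new IllegalStateException("disk full");
        } catch (Exception e) {
            throw new RuntimeException("Hardened Failure: Data storage failed", e);
        }
    }
}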
</code></code></pre><p><em>Benefit: The test suite will now immediately catch failures that were previously invisible.</em></p><h4>Use Case B: From <code>main()</code> to JUnit 5</h4><p>Legacy projects often have &#8220;test&#8221; classes that are just <code>public static void main(String[] args)</code> methods. These don&#8217;t integrate with CI/CD.</p><pre><code><code>// Migrating to JUnit 5 Assertions
@Test
void testBackendConnection() {
    Backend b = new Backend("qbert.guba.com");
    // Instead of printing to console, we assert the state
    assertDoesNotThrow(() -&gt; b.connect(), "Connection should be stable");
    assertNotNull(b.getStatus(), "Status should be initialized");
}
</code></code></pre><p><em>Benefit: Provides a quantifiable &#8220;Safety Net&#8221; that build tools like Gradle can interpret as a Pass/Fail signal.</em></p><h3>Common Pitfalls &amp; Misconceptions</h3><p><strong>The &#8220;Fear of Red&#8221; Pitfall:</strong> Many teams are terrified of a broken build. They think that if the build turns red, they&#8217;ve failed.</p><p><strong>The Truth:</strong> In legacy refactoring, a <strong>Red Build</strong> is a victory. It means you&#8217;ve finally found the boundaries of the system. You&#8217;ve moved from &#8220;unknown-unknowns&#8221; to &#8220;known-knowns.&#8221; Don&#8217;t rush to fix the red; use it as a map to find where the code is truly broken.</p><h3>Core Trade-offs &amp; Nuances</h3><ul><li><p><strong>The &#8220;Crash&#8221; Period:</strong> When you start hardening tests, the project might not compile or pass for days. This requires stakeholder buy-in&#8212;you are breaking the &#8220;illusion of stability&#8221; to find the &#8220;reality of the debt.&#8221;</p></li><li><p><strong>Log Noise:</strong> Hardening exceptions often results in massive stack traces in your logs. This is necessary labor; you have to clean the noise to find the signals.</p></li></ul><h3>Forward-Looking Conclusion</h3><p>A &#8220;Green Build&#8221; is only valuable if it is earned. By removing the &#8220;Silent Swallows&#8221; from your legacy Java project, you are performing a diagnostic surgery. It is painful, and it reveals the rot, but it is the only way to heal the codebase.</p><p>Once your tests are honest, you can finally apply modern AI tools and refactoring patterns with confidence. 
You aren&#8217;t just &#8220;hacking&#8221; anymore; you are <strong>Engineering.</strong></p><div><hr></div>]]></content:encoded></item><item><title><![CDATA[Environment Emulation: Using Docker as a Time Machine for Legacy Java]]></title><description><![CDATA[What do you do when the code is right, but the world has changed too much to run it? 
You&#8217;ve successfully compiled a 20-year-old Java app, but the moment you hit &#8220;Run,&#8221; it crashes.]]></description><link>https://www.nikmalykhin.com/p/environment-emulation-using-docker</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/environment-emulation-using-docker</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Tue, 03 Mar 2026 08:01:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>What do you do when the code is right, but the world has changed too much to run it?</strong> You&#8217;ve successfully compiled a 20-year-old Java app, but the moment you hit &#8220;Run,&#8221; it crashes. It&#8217;s looking for a server named <code>qbert.guba.com</code> that was decommissioned in 2011. It&#8217;s searching for a local directory belonging to a developer who left the company fifteen years ago.</p><p>How do you convince a digital &#8220;antique&#8221; that it&#8217;s still living in 2005?</p><h3>The &#8216;Before&#8217; State: Setting the Context</h3><p>In the early days of Java development, &#8220;Environment Variables&#8221; and &#8220;Configuration as Code&#8221; were often ignored in favor of hardcoded assumptions. 
Developers wrote code that relied on:</p><ul><li><p><strong>Static Network Topologies:</strong> Hardcoded hostnames in <code>.properties</code> files or even inside <code>.class</code> files.</p></li><li><p><strong>Personalized File Paths:</strong> Logic that pointed to <code>/Users/ericlambrecht/data</code>, making the code physically impossible to run on any other machine.</p></li><li><p><strong>Specific Hardware Quirks:</strong> Reliance on the way Intel processors handled certain operations, which breaks on modern ARM-based chips like Apple&#8217;s M-series.</p></li></ul><p>The &#8220;old way&#8221; to fix this was a massive refactoring effort to externalize configuration. But when you have thousands of lines of &#8220;spaghetti&#8221; code, you risk introducing more bugs than you fix.</p><h3>Introducing the Core Concept: Environment Emulation</h3><p><strong>Environment Emulation</strong> is the practice of using containerization to recreate a specific historical &#8220;reality&#8221; for your application. Instead of changing the code to fit the modern world, you change the world to fit the code.</p><p><strong>What is it?</strong> It&#8217;s a &#8220;Time Capsule&#8221; strategy where Docker mimics the network, filesystem, and CPU architecture the application expects.</p><p><strong>Why does it matter?</strong> It allows you to achieve a &#8220;Green Start&#8221; without touching a single line of legacy business logic. By stabilizing the environment first, you can verify that the code <em>can</em> work before you begin the dangerous work of refactoring it.</p><h3>Practical Applications &amp; Use Cases</h3><h4>Use Case A: Network Trickery (Docker Aliases)</h4><p>If your legacy code is hardcoded to look for <code>qbert.guba.com</code>, you don&#8217;t need to hunt through the source code. 
You can use Docker&#8217;s network aliases to point that &#8220;ghost&#8221; hostname to a local container or a mock service.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;markdown&quot;,&quot;nodeId&quot;:&quot;0aefdc7d-db5f-40c4-8841-fc3209dcea12&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-markdown"># docker-compose.yml
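services:
  legacy-app:
    image: my-ancient-app:latest
    networks:
      - backend
  mock-backend:  # hypothetical stand-in for the decommissioned server
    image: my-mock-backend:latest  # assumed image name
    networks:
      backend:
        aliases:
          - qbert.guba.com  # The app thinks it found its long-lost server
networks: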
  backend:</code></pre></div><p><em>Benefit: The application connects successfully without any code changes or </em><code>/etc/hosts</code><em> hacking on your host machine.</em></p><h4>Use Case B: Filesystem Mimicry (Volume Mapping)</h4><p>When code is locked to a specific path like <code>/Users/ericlambrecht/data</code>, Docker volumes can &#8220;teleport&#8221; your modern project directory into that exact location inside the container.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;833c3f29-cd4e-4610-95cf-d1ca05c4eb25&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">docker run -v "$(pwd)/data":/Users/ericlambrecht/data my-legacy-java-app</code></pre></div><p><em>Benefit: You satisfy hardcoded file requirements immediately, allowing the app to boot and pass its initial I/O checks.</em></p><h4>Use Case C: Hardware Realities (x86 on ARM)</h4><p>Older binaries or specific versions of the JVM (like early Java 6 or 8 builds) may behave unpredictably on Apple Silicon (ARM64). You can force Docker to emulate the original Intel environment.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;dockerfile&quot;,&quot;nodeId&quot;:&quot;16a91638-e465-40fc-8ce9-b94533cdf233&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-dockerfile"># Pin the platform so legacy x86-64 binaries run as built (emulated on ARM hosts)
FROM --platform=linux/amd64 eclipse-temurin:8-jdk</code></pre></div><p><em>Benefit: Eliminates subtle &#8220;Heisenbugs&#8221; caused by CPU architecture differences.</em></p><h3>Common Pitfalls &amp; Misconceptions</h3><p><strong>The "Config-First" Trap:</strong> Many engineers think they must "clean up" the configuration files before they can run the app in Docker.</p><p><strong>The Fix:</strong> Don&#8217;t clean. <strong>Emulate.</strong> Use Docker to satisfy the app&#8217;s current (even if &#8220;ugly&#8221;) requirements. Once you have a running, testable container, you can then refactor the configuration into modern environment variables as a second, safer step.</p><h3>Core Trade-offs &amp; Nuances</h3><ul><li><p><strong>The &#8220;Magic&#8221; Burden:</strong> Environment emulation can feel like &#8220;magic&#8221; to new developers. If the <code>docker-compose.yml</code> isn&#8217;t well-documented, a newcomer won&#8217;t understand why the app is looking for a server that doesn&#8217;t exist.</p></li><li><p><strong>Performance:</strong> Running x86 images on ARM64 via emulation (QEMU) is slower than native execution. This is acceptable for refactoring and testing, but may not be ideal for high-performance production needs.</p></li></ul><h3>Forward-Looking Conclusion</h3><p>Modernization is an act of engineering, not just coding. 
By using Docker as a &#8220;Time Machine,&#8221; you stop fighting the environment and start observing the application&#8217;s actual behavior.</p><p>Once the &#8220;Time Capsule&#8221; is built, you have achieved the ultimate goal of the software archaeologist: <strong>Reproducibility.</strong> From here, you can move forward with confidence, knowing that any changes you make to the code are being tested against a stable, predictable reality.</p>]]></content:encoded></item><item><title><![CDATA[The Strangler Build: Modernizing Java Tooling with Gradle 7.6]]></title><description><![CDATA[What do you do when your build system is the primary blocker to your modernization? 
You want to introduce automated testing and containerized deployments, but your project is locked inside an opaque build.xml file.]]></description><link>https://www.nikmalykhin.com/p/the-strangler-build-modernizing-java</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/the-strangler-build-modernizing-java</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Tue, 17 Feb 2026 08:03:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>What do you do when your build system is the primary blocker to your modernization?</strong> You want to introduce automated testing and containerized deployments, but your project is locked inside an opaque <code>build.xml</code> file. It&#8217;s not necessarily that the file is thousands of lines long&#8212;it&#8217;s that it represents a &#8220;frozen&#8221; process. The fear of breaking a specific, undocumented Ant target often keeps teams stuck in the past, manually running builds because they don&#8217;t trust the automation.</p><h3>The &#8216;Before&#8217; State: Setting the Context</h3><p>In the early 2000s, <strong>Apache Ant</strong> was the industry standard. It was purely imperative: you wrote a &#8220;script&#8221; telling the computer exactly how to delete folders, copy files, and compile classes.</p><p>The problem isn&#8217;t just the age of the tool; it&#8217;s the <strong>lack of lifecycle</strong>. Unlike Maven or Gradle, Ant has no built-in concept of a &#8220;test&#8221; phase or a &#8220;package&#8221; phase unless someone manually scripted them. 
For many legacy projects, this resulted in a build process that is fragile, hard to replicate in CI/CD, and completely disconnected from modern dependency management.</p><h3>Introducing the Core Concept: The Tooling Strangler</h3><p>The <strong>Tooling Strangler</strong> applies the Strangler Fig pattern to your build infrastructure. Instead of attempting a &#8220;Big Bang&#8221; migration where you delete Ant and spend a week debugging a new Gradle script, you <strong>wrap</strong> the old logic.</p><p><strong>What is it?</strong> Using Gradle&#8217;s <code>ant.importBuild</code>, you surface your legacy Ant targets as native Gradle tasks.</p><p><strong>Why does it matter?</strong> It allows you to move to a modern CLI immediately. You get the benefits of the Gradle Wrapper (<code>./gradlew</code>), advanced caching, and build scans, while the actual heavy lifting is still performed by the original, proven Ant logic.</p><h3>Practical Applications &amp; Use Cases</h3><h4>Use Case A: The &#8220;Wrapper&#8221; Migration</h4><p>By importing the build, you can start adding modern features (like dependency management) around the old Ant tasks without changing the Ant file itself.</p><pre><code>// build.gradle
// Import the existing Ant logic
ant.importBuild 'build.xml'

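// Add a modern dependency that Ant didn't know about.
// Note: these configurations come from the 'java' plugin, and resolving them needs a
// repositories block (e.g. mavenCentral()); neither is shown in this snippet.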
dependencies {
    implementation 'org.slf4j:slf4j-api:1.7.36'
    testImplementation 'org.junit.jupiter:junit-jupiter:5.9.1'
}

// "Hook" a modern task into an old Ant target
tasks.named('compile') {
    doLast {
        println "Ant finished compiling. Gradle is now verifying the output..."
    }
}</code></pre><p><em>Benefit: Low-risk modernization. Your build stays &#8220;green&#8221; throughout the entire transition.</em></p><h4>Use Case B: The 7.6 &#8220;Goldilocks&#8221; Version</h4><p>In my experiments, I found that <strong>Gradle 7.6</strong> is the specific &#8220;sweet spot&#8221; for this work. Why?</p><ol><li><p><strong>JDK 8 Compatibility:</strong> It runs its own background processes (the daemon) natively on Java 8, which newer releases no longer guarantee (Gradle 9 needs Java 17 or later to start).</p></li><li><p><strong>Modern Features:</strong> It still supports the latest JUnit 5 platforms and Docker-ready plugins.</p></li><li><p><strong>The Bridge:</strong> It allows you to bridge the gap between 2005 build logic and a 2026 deployment pipeline.</p></li></ol><h3>Common Pitfalls &amp; Misconceptions</h3><p><strong>The "Pure Gradle" Obsession:</strong> A common mistake is trying to make the <code>build.gradle</code> file "perfect" from day one. Developers often get stuck trying to replicate a weird Ant <code>copy</code> task in Gradle's DSL.</p><p><strong>The Fix:</strong> If the Ant task works, <strong>leave it in Ant.</strong> Use the Strangler Fig approach: only move tasks to Gradle when you actually need to change their logic or improve their performance.</p><h3>Core Trade-offs &amp; Nuances</h3><ul><li><p><strong>Dual Maintenance:</strong> For a period, you have both <code>build.xml</code> and <code>build.gradle</code>. You must treat the Gradle file as the new &#8220;entry point&#8221; for the team.</p></li><li><p><strong>Mindset Shift:</strong> You are moving from a &#8220;Scripting&#8221; mindset (Ant) to a &#8220;Task Graph&#8221; mindset (Gradle). 
Understanding how tasks depend on one another is more important than knowing the syntax.</p></li></ul><h3>Forward-Looking Conclusion</h3><p>Modernizing a build system doesn&#8217;t require a &#8220;demolition and rebuild.&#8221; By using <strong>Gradle 7.6</strong> as a wrapper for your legacy Ant scripts, you buy yourself the most valuable asset in refactoring: <strong>time.</strong> You get the project into a modern CI/CD pipeline on day one. Once the build is stabilized and automated, you can &#8220;strangle&#8221; the remaining Ant targets at your own pace.</p>]]></content:encoded></item><item><title><![CDATA[The Golden Bridge: Why Java 8 is the Ultimate Tool for Legacy Refactoring]]></title><description><![CDATA[When does &#8220;latest and greatest&#8221; become a liability? 
Imagine you&#8217;ve just inherited a &#8220;Big Ball of Mud&#8221;: a 20-year-old repository built with Ant, running on Java 1.5, and filled with raw types and swallowed exceptions.]]></description><link>https://www.nikmalykhin.com/p/the-golden-bridge-why-java-8-is-the</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/the-golden-bridge-why-java-8-is-the</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Mon, 16 Feb 2026 08:02:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>When does &#8220;latest and greatest&#8221; become a liability?</strong> Imagine you&#8217;ve just inherited a &#8220;Big Ball of Mud&#8221;: a 20-year-old repository built with Ant, running on Java 1.5, and filled with raw types and swallowed exceptions. Your instinct is to jump to Java 21 to get the latest performance gains and features. But when you try to compile, you&#8217;re met with thousands of breaking changes, deleted APIs, and a build system that refuses to acknowledge modern hardware.</p><p>How do you modernize a system that is too old to run, but too critical to fail?</p><h3>The &#8216;Before&#8217; State: Setting the Context</h3><p>In the world of &#8220;Software Archaeology,&#8221; we often encounter projects stuck in the mid-2000s. 
These applications are often:</p><ul><li><p><strong>Compiler-Locked:</strong> They rely on old source levels (such as <code>-source 1.5</code>) and removed APIs that modern JDKs (11, 17, 21) simply won&#8217;t compile anymore.</p></li><li><p><strong>Environment-Fragile:</strong> They only &#8220;work on Bob&#8217;s machine&#8221; because Bob has a specific 2008-era Intel laptop and a prehistoric version of the JDK.</p></li><li><p><strong>Tooling-Limited:</strong> They use Ant or early Maven versions that don&#8217;t understand modern CI/CD pipelines or containerization.</p></li></ul><p>The &#8220;old way&#8221; of fixing this was the <strong>Big Bang Migration</strong>: a grueling six-month rewrite where you try to jump 15 years of evolution in one go. Most of these attempts end in failure, reverted commits, and exhausted teams.</p><h3>Introducing the Core Concept: The Golden Bridge</h3><p>The <strong>Golden Bridge</strong> methodology uses Java 8 not as a final destination, but as a strategic <strong>"Field Hospital."</strong></p><p><strong>What is it?</strong> It is the practice of migrating ancient code (Java 1.4 to 1.6) specifically to Java 8 first, rather than the current LTS.</p><p><strong>Why does it matter?</strong> Java 8 sits at a unique historical intersection. 
It is the &#8220;Last of the Ancients&#8221; and the &#8220;First of the Moderns.&#8221; It provides a stable environment where you can fix the internal architecture of the code without the external environment fighting you.</p><p><strong>How does it work?</strong> </p><ol><li><p><strong>Dual-Compatibility:</strong> It supports the <code>-source 1.5</code> flag to compile ancient syntax while allowing you to use modern IDEs.</p></li><li><p><strong>Architecture Neutrality:</strong> It is the first version that runs natively on Apple Silicon (ARM64) via Zulu or Temurin builds, ending the reliance on old hardware.</p></li><li><p><strong>Tooling Support:</strong> It is fully supported by Gradle 7.6, which acts as the "Strangler Fig" for old Ant builds.</p></li></ol><h3>Practical Applications &amp; Use Cases</h3><h4>Use Case A: Compiling the &#8220;Uncompilable&#8221;</h4><p>Modern JDKs have removed many internal APIs and tightened the rules on source compatibility. Java 8 allows you to keep the old code running while you transition the build system.</p><pre><code>// In your build.gradle, you can target the past while living in the present
java {
    toolchain {
        languageVersion = JavaLanguageVersion.of(8)
    }
}</code></pre><p><em>Benefit: You get a green build in hours, not weeks.</em></p><h4>Use Case B: The Docker &#8220;Time Machine&#8221;</h4><p>By using Java 8, you can create a Docker image that mirrors the production environment exactly, but runs on a 2024 MacBook.</p><pre><code>FROM eclipse-temurin:8-jdk
# Declare the 20-year-old hardcoded path as a mount point; bind the host
# directory at run time (docker run -v), since VOLUME cannot map host paths
VOLUME /Users/original_dev/data
COPY . /app
WORKDIR /app
# Assumes Ant is installed in the image; the base JDK image does not include it
CMD ["ant", "test"]</code></pre><p><em>Benefit: Eliminates &#8220;Works on my machine&#8221; bugs immediately.</em></p><h3>Common Pitfalls &amp; Misconceptions</h3><p><strong>The "Destination" Trap:</strong> The biggest mistake is thinking that moving to Java 8 is "enough."</p><p>Java 8 is a <strong>bridge</strong>, not a home. If you stay there, you are still accumulating technical debt. The goal of the Golden Bridge is to get the code clean enough (removing raw types, fixing tests) so that the jump to Java 17 or 21 becomes a simple compiler flag change rather than a structural nightmare.</p><h3>Core Trade-offs &amp; Nuances</h3><ul><li><p><strong>The Cost:</strong> You have to maintain a specific legacy toolchain (like Gradle 7.6) because the newest versions of build tools have dropped support for Java 8.</p></li><li><p><strong>The Mindset:</strong> You must resist the urge to use Java 8 features (like Streams or Optionals) immediately. Your first goal is <strong>stabilization</strong>, not modernization. Adding new syntax to a &#8220;muddy&#8221; codebase only makes the archaeology harder.</p></li></ul><h3>Forward-Looking Conclusion</h3><p>Java 8 is the unique &#8220;Goldilocks&#8221; zone of the Java ecosystem. It&#8217;s old enough to understand where the code came from, and modern enough to work with the tools of today.</p><p>By treating Java 8 as your <strong>Golden Bridge</strong>, you turn a high-risk &#8220;archaeological dig&#8221; into a controlled engineering project. Use it to stabilize your build, containerize your environment, and harden your tests. 
Once the mud is washed away, the path to Java 21 will be wide open.</p>]]></content:encoded></item><item><title><![CDATA[Does Delegating to AI Mean We Can Finally Be Lazy Managers?]]></title><description><![CDATA[I tested Google's Jules agent with two approaches: a vague "lazy" prompt and a detailed technical spec. The results reveal a paradox about AI autonomy and technical debt.]]></description><link>https://www.nikmalykhin.com/p/does-delegating-to-ai-mean-we-can</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/does-delegating-to-ai-mean-we-can</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Tue, 20 Jan 2026 08:00:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>1. The Hook</h3><p>We often sell AI adoption to our bosses (and ourselves) with the promise of speed. 
We imagine a future where we toss a vague request over the wall (&#8220;fix the build,&#8221; &#8220;export the data,&#8221; &#8220;optimize the query&#8221;) and the AI handles the rest while we grab a coffee.</p><p>But my recent experiments with Jules, Google&#8217;s new AI agent, suggest the opposite is true. The more &#8220;autonomy&#8221; I gave the AI, the more mediocre the code became. This leads to an uncomfortable question: <strong>Does effective AI delegation actually require </strong><em><strong>more</strong></em><strong> management overhead, not less?</strong></p><h3>2. Context &amp; Tools</h3><p>I&#8217;ve been experimenting with <strong><a href="https://jules.google/">Jules</a></strong>, testing its ability to act as a &#8220;Junior Developer&#8221; in my Spring Boot repository, <strong><a href="https://github.com/nikmalykhin/joyofenergy-java-jules/">joyofenergy-java</a></strong>.</p><p>In my previous explorations, I looked at <a href="https://www.nikmalykhin.com/p/pair-authoring-with-an-ai-a-case">Pair-Authoring with an AI</a> and the <a href="https://www.nikmalykhin.com/p/the-context-window-paradox-to-get?utm_source=publication-search">Context Window Paradox</a>. This time, I wanted to test the difference between <strong>Abdication</strong> (lazy delegation) and <strong>Navigation</strong> (structured delegation) when asking an agent to build a feature from scratch.</p><h3>3. The Failed Experiment: The &#8220;Friday Afternoon&#8221; Prompt</h3><p>I set up a scenario we&#8217;ve all faced: It&#8217;s Friday afternoon, I want a new feature shipped, and I don&#8217;t want to think about the implementation details.</p><p>I gave Jules the &#8220;Lazy Manager&#8221; prompt:</p><blockquote><p>&#8220;Jules, create an endpoint to export meter readings as a CSV file. Use the existing MeterReadingService.&#8221;</p></blockquote><p>I intentionally withheld constraints. 
I didn&#8217;t mention memory usage, libraries, or formatting.</p><p>The Result?</p><p>Technically, it worked. Jules created a CsvService, updated the controller, and passed the tests. But structurally, it was a time-bomb.</p><ul><li><p><strong>Memory Unsafety:</strong> It loaded the entire dataset into a <code>List</code> in memory before writing the response. For a smart meter with 100,000 readings, this is an <code>OutOfMemoryError</code> waiting to happen.</p></li><li><p><strong>Library Bloat:</strong> It generated a new service class (<code>CsvService</code>)  where a simple stream in the controller would have sufficed.</p></li><li><p><strong>Junior Mistakes:</strong> It used standard Java formatting without considering how a user would actually open the file in Excel.</p></li></ul><p>The &#8220;lazy&#8221; prompt produced &#8220;lazy&#8221; code: functional, but dangerous at scale. It validated my fear that <a href="https://www.nikmalykhin.com/p/does-more-powerful-ai-mean-slower?utm_source=publication-search">More Powerful AI Doesn&#8217;t Always Mean Faster Fixes</a>.</p><h3>4. Principles That Actually Work: The &#8220;Brief&#8221;</h3><p>I reset the experiment. This time, I treated Jules like a Senior Engineer would treat a Junior: I wrote a spec.</p><p>I uploaded a file named <a href="https://github.com/nikmalykhin/joyofenergy-java-jules/blob/add-specs-feature-csv-export/specs/feature-csv-export.md">feature-csv-export.md</a> containing strict constraints:</p><ol><li><p><strong>No New Dependencies:</strong> Do not add <code>apache-commons</code> or <code>opencsv</code>.</p></li><li><p><strong>Memory Safety:</strong> Do not load lists into memory; stream directly to the <code>HttpServletResponse</code>.</p></li><li><p><strong>Strict Formatting:</strong> Use <code>yyyy-MM-dd HH:mm</code>.</p></li></ol><p>I then prompted:</p><blockquote><p>&#8220;Jules, I&#8217;ve uploaded a spec file... 
Please refactor the implementation to strictly follow these constraints.&#8221;</p></blockquote><p>The Outcome:</p><p>The difference was night and day.</p><ul><li><p><strong>Architectural Safety:</strong> Jules implemented a streaming solution using <code>PrintWriter</code>, avoiding the memory bottleneck entirely.</p></li><li><p><strong>Dependency Management:</strong> It correctly added <code>jakarta.servlet-api</code> as a <code>compileOnly</code> dependency, respecting the &#8220;no runtime bloat&#8221; rule.</p></li><li><p><strong>Test Integrity:</strong> It initially failed to test the controller response correctly, but because I had defined the &#8220;correct&#8221; output in the spec, I could guide it to fix the assertion logic.</p></li></ul><h3>5. Unexpected Discovery: The &#8220;Spec&#8221; as a Guardrail</h3><p>The most surprising insight was that Jules didn&#8217;t just follow the instructions&#8212;it used the spec file as a defense mechanism against bad code.</p><p>When I ran the &#8220;Lazy&#8221; experiment, Jules defaulted to the path of least resistance (loading data into memory). When I provided the &#8220;Brief,&#8221; Jules shifted behavior entirely. It didn&#8217;t just write code; it <strong>navigated the constraints</strong>.</p><p>This confirms a theory I touched on in <a href="https://www.nikmalykhin.com/p/can-we-make-ai-code-assistants-smarter?utm_source=publication-search">Can We Make AI Code Assistants Smarter by Asking Them to Write Their Own Rules?</a> The AI performs best not when it has &#8220;creative freedom,&#8221; but when it is boxed in by rigid technical constraints. The &#8220;Senior Engineer&#8221; input wasn&#8217;t the code I wrote, but the boundaries I set.</p><h3>6. 
The Central Paradox</h3><p>This brings us to the Delegation Paradox:</p><p>To get an AI agent to work autonomously, you must micromanage the requirements.</p><p>If you want to be &#8220;lazy&#8221; during the implementation phase (execution), you must be hyper-active during the definition phase (specification). You cannot abdicate both.</p><ul><li><p><strong>Abdication</strong> (Vague prompt) -&gt; Requires heavy code review and refactoring later.</p></li><li><p><strong>Navigation</strong> (Detailed spec) -&gt; Requires heavy upfront thought, but produces near-production-ready code.</p></li></ul><p>We aren&#8217;t thinking <em>less</em> with AI; we are shifting <em>when</em> we think.</p><h3>7. Forward-Looking Conclusion</h3><p>Tools like Jules are shifting the developer&#8217;s role from &#8220;writer of code&#8221; to &#8220;architect of constraints.&#8221;</p><p>If you treat your AI agent like a magic wand that reads your mind, you will build technical debt at record speeds. But if you treat it like a talented but literal-minded junior developer who needs a solid brief, it becomes a powerful force multiplier.</p><p>The future of engineering isn&#8217;t about writing the perfect function; it&#8217;s about writing the perfect spec.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! 
Subscribe to get more practical guides on using GenAI tools effectively in software development work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Can We Skip TDD with Modern AI? A Context Experiment]]></title><description><![CDATA[The Hook]]></description><link>https://www.nikmalykhin.com/p/can-we-skip-tdd-with-modern-ai-a</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/can-we-skip-tdd-with-modern-ai-a</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Tue, 09 Dec 2025 08:01:03 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>The Hook</h3><p>Recently, some colleagues pitched me an idea: &#8220;Today, LLMs are so powerful, you can start exactly from implementation and it will work well. No need to use TDD or other more complicated XP techniques&#8221;.</p><p>It is a tempting thought. If an AI can generate a complete feature in seconds, is my approach&#8212;always start from a test&#8212;still relevant?</p><p>I decided to check it. I ran an experiment to see if I could implement a complex feature by describing the task and letting GenAI create the application. My hypothesis was that TDD is still vital, but I wanted to see if the &#8220;Just Do It&#8221; method could prove me wrong.</p><p>The result? 
I confirmed exactly what I expected: <strong>TDD is one of the best ways to create context for an LLM.</strong></p><h3>Personal Context &amp; Tools</h3><p>For this experiment, I returned to a project I started in a previous article: <a href="https://www.nikmalykhin.com/p/does-ai-need-clear-goals-my-experiment">&#8220;Does AI Need Clear Goals? My Experiment in Turning Vague Ideas into Code&#8221;</a>.</p><p>My tool of choice was <strong>GPT-4.1</strong> (via GitHub Copilot), utilizing its Agent mode to handle multi-file context. Usually, I treat the AI as a pair programmer, following structured collaboration methods I&#8217;ve discussed in <a href="https://www.nikmalykhin.com/p/pair-authoring-with-an-ai-a-case">&#8220;Pair-Authoring with an AI: A Case Study in Structured Collaboration&#8221;</a>.</p><p>But for this session, I acted as a &#8220;manager,&#8221; giving requirements and approving plans, but explicitly skipping the &#8220;Red&#8221; phase of TDD. I let the AI write the code first.</p><h3>The Failed Experiment</h3><p>The task was <strong>Story #2346</strong>: Implement a &#8220;Day of Week Pricing Plan&#8221;. The requirements were clear: users needed to compare power usage costs based on the day of the week and rank price plans accordingly.</p><p>I approved the AI&#8217;s plan and let it generate the implementation. Here is where the &#8220;No TDD&#8221; approach started to show its cracks.</p><p><strong>1. The &#8220;Ghost Method&#8221; Problem</strong> After the AI implemented the service layer, my IDE lit up with errors. The AI used a method <code>getDayOfWeekMultiplier(DayOfWeek)</code> that didn&#8217;t exist. It &#8220;hallucinated&#8221; a method on the domain object because it was writing the service in isolation. I am usually fine with &#8220;Red&#8221; code, but this wasn&#8217;t TDD &#8220;Red&#8221;&#8212;this was just broken code requiring immediate fixes.</p><p><strong>2. 
The Regression Nightmare</strong> When we fixed the missing method, we broke the existing logic.</p><blockquote><p>PricePlanTest &gt; shouldReceiveMultipleExceptionalDateTimes() FAILED</p></blockquote><p>Because we implemented the new logic <em>over</em> the old logic without a guiding test, the AI introduced regressions. We had to do several iterations just to get back to a baseline.</p><p><strong>3. The Context Disconnect</strong> The real struggle happened during Functional Testing. I asked the AI to verify the endpoints. It generated a test that tried to hit the API, but it returned a <strong>404 Not Found</strong>. Why? The AI created a test that queried a Smart Meter ID, but &#8220;it didn&#8217;t have a context!&#8221;. It forgot that in this application, a Smart Meter must be linked to a Price Plan via the <code>AccountService</code> first. The AI tried to guess the solution, attempting to call an API <code>/account/link/{smart-metter-id}</code> that didn&#8217;t even exist.</p><h3>Principles That Actually Work</h3><p>I eventually finished the task without TDD, but it required multiple rollbacks and context corrections. Through this struggle, I confirmed why TDD works:</p><p><strong>Principle 1: Tests Are Context Anchors</strong> The reason the AI failed the functional test setup was a lack of context. If I had written the test <em>first</em>, I would have been forced to set up the <code>AccountService</code> association immediately. The failing test provides the AI with a strict &#8220;Context Window&#8221; of what is required, as I explored in <a href="https://www.nikmalykhin.com/p/the-context-window-paradox-to-get?utm_source=publication-search">&#8220;The Context Window Paradox&#8221;</a>.</p><p><strong>Principle 2: Small Steps Prevent &#8220;Imagination&#8221;</strong> When the AI doesn&#8217;t have enough context, it tries to imagine the answer. TDD forces small, verifiable steps. 
By skipping the test, I forced the AI to generate a large chunk of logic (Controller + Service) at once, increasing the surface area for hallucinations.</p><h3>Unexpected Discovery</h3><p>The most painful part of skipping TDD wasn&#8217;t the coding&#8212;it was the debugging.</p><p>When I finally added tests <em>after</em> the implementation to verify the logic, one failed with a confusing error:</p><blockquote><p>Expecting actual: {FRIDAY=[...]} to contain key: MONDAY</p></blockquote><p>This revealed a critical weakness of the &#8220;Test After&#8221; approach. When a test fails, you don&#8217;t know where the problem is: in the tests or in the business logic. It turned out to be an error in the test data (the date provided was a Friday, not a Monday). If I had written the test first, the AI would have generated the implementation <em>based</em> on that test data. We wouldn&#8217;t have had this problem at all.</p><h3>The Central Paradox</h3><p>We tend to think that as AI gets smarter, we can think less. I touched on this in <a href="https://www.nikmalykhin.com/p/can-we-think-less-with-ai?utm_source=publication-search">&#8220;Can We Think Less with AI?&#8221;</a>.</p><p>But this experiment confirmed a paradox: <strong>To move faster with AI, you must slow down enough to write the test.</strong></p><p>Can we avoid the loops of small context errors? Yes. TDD reduces complexity and creates trust between us and the AI. The test acts as a contract. Without it, you are just hoping the AI guesses your architectural constraints correctly.</p><h3>Forward-Looking Conclusion</h3><p>So, can we skip TDD? Yes, but you will spend more time adding context manually.</p><p>The power of TDD is approaching a new peak in the AI era: tests create a <strong>POWERFUL CONTEXT</strong> for LLMs. 
Modern models like GPT-4.1 are powerful, but a better LLM does not remove the need for context.</p><p>If you want to get the most out of your AI teammate, don&#8217;t just ask it to write code. Give it a failing test.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe to get more practical guides on using GenAI tools effectively in software development work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Does "Extract Method" Actually Hurt Your Readability?]]></title><description><![CDATA[We&#8217;ve all been there.]]></description><link>https://www.nikmalykhin.com/p/does-extract-method-actually-hurt</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/does-extract-method-actually-hurt</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Tue, 25 Nov 2025 08:01:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We&#8217;ve all been there. A feature starts simple, maybe 20 lines. But after three or four iterations, that same function has ballooned to 200 lines, a tangled mess of nested <code>if-else</code> blocks. 
</p><p>Does that reality sound familiar?</p><p>When faced with this, we have two main choices. One is to log it as tech debt, a task we&#8217;ll never <em>really</em> get to because the business will always have more urgent priorities. The other was shown in the foundational book, <strong><a href="https://martinfowler.com/books/refactoring.html">Refactoring by Martin Fowler, with Kent Beck</a></strong>. This path treats refactoring as a continuous action, not a tech debt item in the backlog.</p><p>But if we choose to refactor continuously, what does that <em>really</em> mean, and are our tools helping or hurting?</p><h3>My Context and the &#8220;Easy&#8221; Button</h3><p>Working in a <strong>Java/Kotlin</strong> environment, my tool of choice is <strong>IntelliJ IDEA</strong>. It&#8217;s an incredibly powerful IDE with a host of features designed to help.</p><p>When facing a 200-line monster method, the most obvious solution is right in the refactoring menu: <strong>&#8220;<a href="https://www.jetbrains.com/help/idea/extract-method.html">Extract Method</a>&#8221;</strong>. It seems perfect. It makes the original method smaller, which is exactly what I want.</p><p>Right?</p><h3>Introducing the Core Concept: Readability-Driven Refactoring</h3><p>The main goal of refactoring shouldn&#8217;t just be &#8220;smaller methods.&#8221; For me, the main goals are <strong>readability</strong> and, secondarily, <strong>decoupling</strong>.</p><p>In fact, readability is arguably more important than adhering to a specific architecture or design pattern. While good architecture often improves readability, it&#8217;s not its primary goal. If I have a choice between perfect pattern adherence and readability, I will prefer readability. Working on a typical web application, it&#8217;s readability that helps me daily when I look at different parts of the code.</p><p>This is where the simple &#8220;Extract Method&#8221; tool falls short. 
It often just moves the mess, failing to improve readability.</p><p>A more powerful <em>technique</em> for guiding this process is <strong>Test-Driven Development (TDD)</strong>. Instead of just extracting code, we use TDD to <em>describe our expectations</em> for the new, refactored code <em>before</em> we write it. This small shift in process fundamentally changes the quality of the refactoring.</p><h3>Practical Application: A TDD-Led Refactoring</h3><p>Let&#8217;s look at a practical example.</p><h4>The Problem Code</h4><p>Imagine we have this block of code in a method. It&#8217;s searching for properties, then mapping them to calculate Avios points, with error handling mixed in.</p><pre><code>summaries =
    shc
    .psSearch(
        startDate = startDate,
        nights = nights,
        hotelCodes = it,
        adults = adultsParam,
        children = childrenParam,
        infants = infantsParam,
    ).toTypedArray()
    .mapNotNull { tbh -&gt;
        kotlin
            .runCatching {
                aviosEarn = aviosAdapter.calculateAviosEarn(BigDecimal(tbh.summary!!.totalPrice!!))
                tbh.toAccommodationSummary(aviosEarn)
            }.onFailure { e -&gt;
                SASAdapter.Companion.log.warn("Skipping", e)
            }
            .getOrNull()
    }.toList()</code></pre><h4>Common Pitfall: The &#8216;Extract Method&#8217; Trap</h4><p>If we use the &#8220;Extract Method&#8221; feature in our IDE, we get this:</p><p><strong>Original method:</strong></p><pre><code>summaries = requestSummariesAndCalculateAviosEarn(startDate, nights, it, adultsParam, childrenParam, infantsParam)</code></pre><p><strong>New private method:</strong></p><pre><code>private fun requestSummariesAndCalculateAviosEarn(
    startDate: LocalDate,
    nights: Int,
    it: List&lt;String&gt;,
    adultsParam: String,
    childrenParam: String,
    infantsParam: String,
): List&lt;AccommodationSummary&gt; =
    shc
        .psSearch(
            startDate = startDate,
            nights = nights,
            hotelCodes = it,
            adults = adultsParam,
            children = childrenParam,
            infants = infantsParam,
        ).toTypedArray()
        .mapNotNull { tbh -&gt;
            calculateAviosEarnAndMapToAccommodationSummary(tbh)
        }.toList()

private fun calculateAviosEarnAndMapToAccommodationSummary(tbh: TBH): AccommodationSummary? {
    var aviosEarn: Int 
    return runCatching { 
        aviosEarn =
            aviosAdapter.calculateAviosEarn(BigDecimal(tbh.summary!!.totalPrice!!)) 
        tbh.toAccommodationSummary(aviosEarn) 
    }.onFailure { e -&gt; 
        log.warn("Skipping", e)
    }
        .getOrNull() 
}</code></pre><p>Is this good? Not exactly. It makes the original method smaller, but it doesn&#8217;t improve readability. We&#8217;ve just created a new private method that takes a <em>mess</em> of parameters.</p><h3>The Better Way: The TDD-Led Flow</h3><p>Instead of using the IDE tool, let&#8217;s use the TDD <em>technique</em>.</p><ol><li><p><strong>Describe Expectations:</strong> We start by writing a test for the logic we <em>want</em> to have. We don&#8217;t want to just test a private method; this logic feels like it belongs in its own service.</p></li><li><p><strong>Define the &#8220;To-Be&#8221; Service:</strong> We&#8217;ll create a test for a new <code>SummaryAdapter</code>. At first, this service is &#8220;red&#8221; (it doesn&#8217;t exist).</p></li><li><p><strong>Discover the Parameter Problem:</strong> As we write the test and describe the method we want to call, we see the problem clearly: it needs too many parameters.</p></li><li><p><strong>The Solution:</strong> The test itself shows us what we need. Instead of passing 6 individual parameters, we should pass a single <code>SearchCriteria</code> object. We define this object as an expectation of our test.</p></li><li><p><strong>Implement:</strong> We now implement the new service, moving the logic from the old method.</p></li></ol><p><strong>The Result:</strong></p><p>By extracting the logic to a new service and passing a parameter object, the original code now looks like this:</p><pre><code>summaries = SummaryAdapter.requestSummariesAndCalculateAviosEarn(searchCriteria, it)</code></pre><p>Did we improve readability? Yes. 
And not just because the method is smaller, but because we are no longer passing an excessive number of parameters, as we were with the simple &#8220;Extract Method&#8221;.</p><h3>A Technique Over a Tool</h3><p>IDE tools are wonderful, and techniques like TDD are powerful.</p><p>Of course, we <em>could</em> have used the IDE tools to change the method signature, create a new class, and move the method there. What the tool <em>can&#8217;t</em> do is help us understand what we want to do in the first place. We can&#8217;t describe our expectations to the tool.</p><p>TDD gives us that option: <strong>we describe our expectations before the work</strong>. This key difference is what truly changes the quality of our refactoring.</p><p>By knowing different techniques, we can understand when and which tool to use. Don&#8217;t let the tool lead the refactoring; let your <em>technique</em> guide the tool.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe to get more practical guides on using GenAI tools effectively in software development work</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Does AI Need Clear Goals? 
My Experiment in Turning Vague Ideas into Code]]></title><description><![CDATA[We&#8217;re all told the same thing: AI needs clear, specific, and context-rich prompts to be useful.]]></description><link>https://www.nikmalykhin.com/p/does-ai-need-clear-goals-my-experiment</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/does-ai-need-clear-goals-my-experiment</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Tue, 11 Nov 2025 08:00:50 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We&#8217;re all told the same thing: AI needs clear, specific, and context-rich prompts to be useful. &#8220;Garbage in, garbage out.&#8221; This is especially true in engineering.</p><p>But what if your job isn&#8217;t to execute a clear task, but to <em>find</em> the task?</p><p>In my current work, we do a lot of research. Goals are not clear. We receive highly abstract, one-sentence ideas that need to be explored. This research is a necessary, messy process of discovery, and it&#8217;s full of &#8220;boilerplate&#8221; actions.</p><p>This got me thinking. We assume AI is for <em>execution</em>, but can we use it for <em>exploration</em>? 
What happens when you feed an AI a problem that you, the engineer, don&#8217;t even fully understand yet?</p><p>I ran an experiment to find out, starting with nothing but a single, vague sentence.</p><h3>My Setup: From Vague Idea to Boilerplate</h3><p>My goal was to see if I could use Generative AI to shepherd a &#8220;one-sentence idea&#8221; all the way to a foundational, runnable piece of code.</p><p>My toolkit was straightforward:</p><ul><li><p><strong>The Idea:</strong> A vague user story, &#8220;#2348: As an administrator I want to add a new tariff so that it can be advertised to users who may benefit&#8221;. This was perfect because it was so vague&#8212;what&#8217;s a &#8220;tariff&#8221;? How is it &#8220;advertised&#8221;?</p></li><li><p><strong>The &#8220;Analyst&#8221; AI:</strong> I used <strong>Gemini 2.5 Pro</strong> to act as a Product Owner and flesh out this vague idea.</p></li><li><p><strong>The &#8220;Developer&#8221; AI:</strong> I then used <strong>GitHub Copilot (GPT-4.1)</strong> in <strong>IntelliJ</strong> to write the boilerplate code.</p></li><li><p><strong>The Project:</strong> All this was done in the context of the Thoughtworks &#8220;<a href="https://www.thoughtworks.com/en-es/insights/blog/careers-at-thoughtworks/joi_application_process">Joy of Energy</a>&#8221; project, a Java Spring Boot application.</p></li></ul><p>The plan was a two-part workflow:</p><ol><li><p><strong>Part 1: AI as Business Analyst.</strong> Feed the vague story to Gemini and ask it to define the requirement.</p></li><li><p><strong>Part 2: AI as Boilerplate Generator.</strong> Feed the <em>AI-generated spec</em> to Copilot and ask it to write the code.</p></li></ol><h3>The Failed Experiment (That Was Actually a Success)</h3><p>My first attempts were a perfect illustration of the &#8220;AI is context-blind&#8221; problem. 
The &#8220;failure&#8221; wasn&#8217;t that the AI was useless; it&#8217;s that its first drafts were wrong in very specific, instructive ways.</p><p><strong>Failure 1: The AI &#8220;Product Owner&#8221; Became a Tech Lead</strong> I asked Gemini to act as a Product Owner and flesh out the story. It made a &#8220;very popular mistake&#8221;: it skipped the &#8220;what&#8221; and &#8220;why&#8221; and jumped straight to the &#8220;how.&#8221;</p><p>The <em>very first draft</em> of the spec it gave me wasn&#8217;t a user story; it was a technical task. It immediately suggested a <code>JPA @Entity</code> and defined fields like <code>id</code> as a <code>UUID</code>. It was already designing the database schema.</p><p>This is exactly what you <em>don&#8217;t</em> want from a user story, and it&#8217;s a common trap where the AI tries to be the engineer, not the analyst. As I&#8217;ve written before, the AI&#8217;s job is to reflect our needs, not just give us a technical answer (you can read more on that idea here: <a href="https://www.nikmalykhin.com/p/how-genai-helps-engineers-write-better">How GenAI Helps Engineers Write Better</a>).</p><p>I had to intervene, critique the output, and explicitly ask it to &#8220;Change database to more abstract system&#8221; to get the clean, implementation-agnostic user story and Acceptance Criteria (ACs) I actually needed.</p><p><strong>Failure 2: The AI &#8220;Developer&#8221; Was a Clumsy New Hire</strong> After I had a clean spec, I gave it to GitHub Copilot with a clear prompt: generate a POJO, an in-memory Service, and a Controller.</p><p>The code it generated was not &#8220;copy-paste and run&#8221;.</p><ul><li><p><strong>Wrong Package Structure:</strong> It invented a &#8220;by-feature&#8221; package structure (<code>com.joi.energy.tariff</code>). My project uses a &#8220;by-layer&#8221; structure (<code>uk.tw.energy.domain</code>, <code>uk.tw.energy.service</code>, etc.).
</p></li><li><p><strong>Missing Dependencies:</strong> It correctly suggested using <code>jakarta.validation</code> annotations&#8212;a great idea!&#8212;but my project didn&#8217;t have that dependency.</p></li><li><p><strong>Minor (Human) Errors:</strong> It even forgot the <code>@Service</code> annotation on the <code>TariffService</code>, a simple mistake I&#8217;ve made myself a dozen times.</p></li></ul><p>If I were a junior engineer, I would have been blocked or, worse, just pasted it all in, breaking the project&#8217;s architecture.</p><h3>Principles That Actually Work</h3><p>These &#8220;failures&#8221; led me to the real principles of using AI for this kind of work.</p><p><strong>1. The AI is a &#8220;Demultiplicator,&#8221; Not a Supercharger</strong> This was my single most important insight. A supercharger just makes the engine spin <em>faster</em>. A demultiplicator (like a reduction gear) <em>changes the nature</em> of the work, trading raw speed for torque.</p><p>The AI is a demultiplicator for my brain.</p><p>When I was iterating on the user story, I didn&#8217;t think about &#8220;how to write these words or if it sounds good&#8221;. I was 100% focused on the <em>business goals</em>. The AI handled the <em>typing</em>, and I handled the <em>validating</em>. This is a profound shift. It took me 30 minutes to get a solid user story, not because I typed fast, but because I <em>thought</em> fast, using the AI&#8217;s draft as a disposable starting point.</p><p><strong>2. The Engineer&#8217;s New Job: Strategist and Context-Provider</strong> The AI&#8217;s mistakes weren&#8217;t stupid; they were <em>context-blind</em>. This reveals the engineer&#8217;s true role in an AI-augmented workflow: we are the &#8220;Reviewer and Strategist&#8221;.</p><p>My job wasn&#8217;t to write getters and setters. My job was to make two high-level strategic decisions:</p><ol><li><p>&#8220;The AI is right, <code>jakarta.validation</code> is a good idea. 
I will add that dependency&#8221;.</p></li><li><p>&#8220;The AI is wrong about the package structure. I will correct it to follow our existing pattern&#8221;.</p></li></ol><p>The AI&#8217;s &#8220;flawed&#8221; draft actually <em>forced</em> me to think strategically about my project&#8217;s architecture and dependencies.</p><p><strong>3. Embrace the &#8220;90% Win&#8221; and the Iterative Loop</strong> The AI&#8217;s output doesn&#8217;t need to be 100% perfect to be valuable. The boilerplate it generated, despite its flaws, was a &#8220;90% win&#8221;. It saved me from the &#8220;boring boilerplate&#8221; and the hours I would have spent on Stack Overflow as a junior engineer.</p><p>More importantly, the AI&#8217;s <em>mistakes</em> are part of the value. That wrong package structure? It&#8217;s a great &#8220;recommendation for reorganizing your project&#8221; and a perfect topic to bring to a team huddle.</p><h3>My Unexpected Discovery: &#8220;1:0 to AI&#8221;</h3><p>The most surprising moment came during the boilerplate generation. I asked for <em>three</em> files (POJO, Service, Controller). The AI gave me <em>four</em>.</p><p>It proactively and correctly created a <code>TariffType.java</code> Enum (<code>FLAT_RATE</code>, <code>TIME_OF_USE</code>).</p><p>This was a perfect &#8220;micro-improvement&#8221;. I called it &#8220;1:0 to AI&#8221;. I was so focused on the &#8220;big picture&#8221; of the architecture that I missed this small, obvious detail. This &#8220;separating of responsibilities&#8221; is incredibly powerful: the AI handles the small details while I focus on the larger strategic goals.</p><h3>The Central Paradox: AI&#8217;s Flaws Are Its Greatest Strength</h3><p>This leads to the central paradox: <strong>The AI is terrible at handling vague, abstract ideas... and yet, it&#8217;s the best tool I have for the job.</strong></p><p>Why? Because its value isn&#8217;t in <em>giving you the right answer</em>. 
Its value is in its ability to <em>instantly turn a &#8220;blank page&#8221; into a flawed, tangible draft that you can critique</em>.</p><p>The AI&#8217;s initial, flawed responses&#8212;the over-technical user story, the context-blind package structure&#8212;are its most valuable feature. They act as a mirror, forcing the engineer to <em>define</em> the context and <em>make</em> the strategic decisions. It can&#8217;t read your mind, so it forces you to figure out what&#8217;s in it.</p><p>Effective use doesn&#8217;t require a perfect prompt. It requires an engineer to stop acting like a <em>typist</em> and start acting like an <em>editor, a critic, and a strategist</em>.</p><h3>Conclusion: From Vague to Validated</h3><p>The AI didn&#8217;t <em>solve</em> my vague problem. It gave me the tools to solve it myself, faster and at a higher level of abstraction.</p><p>By delegating the &#8220;boring boilerplate code&#8221;, I was able to stay focused on the &#8220;big picture&#8221; and &#8220;business needs&#8221;. This workflow is a powerful way to accelerate research, allowing us to build, test, and throw away foundational ideas at a speed we couldn&#8217;t before.</p><p>The AI isn&#8217;t here to replace us. It&#8217;s here to take the routine work and free us to focus on the hard parts. It&#8217;s a &#8220;demultiplicator&#8221; that gives us the torque to move from a one-sentence idea to a validated, runnable foundation&#8212;flaws and all.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! 
Subscribe to get more practical guides on using GenAI tools effectively in software development work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[What If the ‘Cleanest’ Code Is the Wrong Solution?]]></title><description><![CDATA[In our continuing experiment with Trio Programming&#8212;two engineers and an AI&#8212;we decided to level up.]]></description><link>https://www.nikmalykhin.com/p/what-if-the-cleanest-code-is-the</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/what-if-the-cleanest-code-is-the</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Tue, 28 Oct 2025 08:00:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In our continuing experiment with <strong>Trio Programming</strong>&#8212;two engineers and an AI&#8212;we decided to level up. Our first session was a slow, painful grind of fixing our environment. This time, with a stable foundation, we aimed for speed. Our new strategy: write comprehensive tests ourselves, then give the AI the freedom to implement the solution in one big step.</p><p>The initial results were promising. The AI produced working code that passed our tests. But then, our instincts as seasoned developers kicked in. We saw the AI&#8217;s implementation&#8212;a simple <code>Map&lt;String, Object&gt;</code>&#8212;and reflexively identified it as a &#8220;code smell&#8221;. 
We spent the next hour trying to refactor it into a &#8220;cleaner,&#8221; more object-oriented design using the Composite pattern.</p><p>That&#8217;s when we fell into a trap. Our pursuit of clean code was leading us toward a solution that was elegant, sophisticated, and completely wrong. This led us to our second major discovery: <strong>In AI-augmented development, the biggest risk isn&#8217;t bad AI code, but good human intuition applied to the wrong problem.</strong></p><div><hr></div><h3>Our Setup: Aiming for a Bigger Step</h3><p>Our team remained the same: I (Nik) acted as the driver for <strong>GitHub Copilot</strong>, while Javier served as the strategic navigator. Having stabilized our Java, Spring Boot, and Gradle environment in the last session, we were ready to test a new hypothesis: if we write strong, expectation-focused tests, we can trust the AI with a larger implementation scope and move much faster.</p><p>The flow was simple:</p><ol><li><p>Human engineers write a small, focused test with clear assertions.</p></li><li><p>Let the AI generate the implementation code in a single, larger step to make the test pass.</p></li><li><p>Trust the tests to validate the AI&#8217;s work, rather than meticulously reviewing every line of generated code.</p></li></ol><h3>The Failed Experiment: Refactoring into a Corner</h3><p>The first part of the experiment worked. We added two tests for our hierarchy API, one for a root-only employee and one for a simple employee-supervisor relationship. We then prompted the AI: &#8220;tests looks good, let&#8217;s make postHierarchy method for passing all of them&#8221;.</p><p>The AI&#8217;s implementation worked, save for one minor edge case we quickly fixed. But we weren&#8217;t satisfied. 
The code returned a <code>Map&lt;String, Object&gt;</code>, and our developer brains screamed for type safety and better design.</p><ol><li><p><strong>The &#8220;Code Smell&#8221; Diagnosis:</strong> We prompted the AI with our concern: &#8220;maybe, response object will make the readability of the code better and will reduce smell of code?&#8221; This initiated a refactoring plan to introduce a dedicated <code>HierarchyNode</code> class.</p></li><li><p><strong>Applying a Design Pattern:</strong> We pushed further, suggesting a more formal structure: &#8220;maybe we can apply composite pattern... to our response?&#8221; The goal was to create a pure, object-oriented hierarchy and eliminate the <code>Map</code> entirely.</p></li><li><p><strong>The Collision with Reality:</strong> Our final prompt revealed the fatal flaw in our logic: &#8220;can we avoid to use Map if we will use Spring Boot which we have in our project?&#8221;</p></li></ol><p>The AI&#8217;s response was the turning point. It patiently explained that given our requirement for dynamic JSON keys (e.g., <code>&quot;Jonas&quot;: { &quot;Sophie&quot;: ... }</code>), a <code>Map</code> or a structure that serializes like one was <strong>unavoidable</strong> with Spring Boot and its default Jackson serializer.</p><p>We had spent a significant part of our session chasing an elegant design that was fundamentally incompatible with the constraints of our framework and the explicit requirements of the kata. As I noted in my log, &#8220;we spend time trying to add something not workable to the code&#8221;. 
The AI&#8217;s initial, simpler solution wasn&#8217;t a code smell; it was the correct, pragmatic solution from the start.</p><div><hr></div><h3>Principles That Actually Work</h3><p>This humbling experience confirmed our new hypothesis and revealed principles for a more effective human-AI workflow.</p><ol><li><p><strong>Focus on &#8220;What,&#8221; Not &#8220;How&#8221; (Test-Focused Development).</strong> Our initial strategy was correct. The most valuable role for the human developers is to define the <em>behavior</em> of the system through precise, comprehensive tests. When we focused on the expected JSON output, the AI produced correct code. When we focused on our preconceived notions of &#8220;good&#8221; internal implementation, we wasted time. The tests are the contract; the AI&#8217;s job is to fulfill it.</p></li><li><p><strong>The AI is a Mirror for System Constraints.</strong> The AI is more than a code generator; it&#8217;s an interactive expert on the toolchain. It didn&#8217;t just reject our idea; it explained <em>why</em> it wouldn&#8217;t work within the Spring Boot ecosystem. This prevented us from going further down a dead-end path. Use the AI not just to write code, but to validate your architectural assumptions against the framework&#8217;s reality.</p></li><li><p><strong>Codify Your Learnings into the System.</strong> A failed experiment is only a waste if you don&#8217;t learn from it. The most productive outcome of our refactoring dead-end was updating our <code>.github/copilot-instructions.md</code> file. We added an explicit refactoring protocol and guidance on when to challenge the AI&#8217;s use of patterns versus accepting framework constraints. This turns a session&#8217;s lesson into a permanent upgrade for the trio&#8217;s workflow.</p></li></ol><h3>Unexpected Discovery: AI Generalizes from Specifics</h3><p>After our refactoring detour, we returned to our Test-Focused workflow. 
We added much more complex tests, including one with multiple employees reporting to the same supervisor and another with a full four-level hierarchy.</p><p>The surprising part? <strong>The AI&#8217;s existing implementation passed these complex tests without any modifications</strong>. This revealed a powerful insight: the AI is remarkably good at generalizing a solution. It needed a few simple, specific test cases to establish the core logic. Once that logic was in place, it was robust enough to handle more complex scenarios automatically. The &#8220;big step&#8221; works, but it needs to be built on a foundation of small, clear examples.</p><h3>The Central Paradox of AI-Driven Speed</h3><p>This leads to the central paradox we uncovered in this session: <strong>To move faster with big, AI-generated implementation steps, you must first slow down and write smaller, more precise human-guided tests.</strong></p><p>Our desire for speed was not at odds with the discipline of TDD; it was enabled by it. The quality of the AI&#8217;s large-scale contribution was directly proportional to the quality of the small-scale expectations we defined. You cannot achieve reliable speed by simply telling the AI &#8220;build this feature.&#8221; You achieve it by saying &#8220;build something that satisfies these very specific, verifiable behaviors.&#8221;</p><h3>Conclusion: We Are Architects of Behavior, Not Just Code</h3><p>Our second session was a success, but not because we wrote code faster. It was a success because we learned how to trust our tests more than our own implementation habits. The &#8220;Test-Focused Development&#8221; rhythm&#8212;small tests by humans, big implementation by AI&#8212;feels right.</p><p>The dynamic is shifting. Our job is becoming less about crafting the perfect implementation and more about architecting the perfect set of expectations. 
We define the contract with rigorous tests, and the AI, our tireless third programmer, finds the most direct way to fulfill it&#8212;even if it&#8217;s not the way we would have written it ourselves.</p>]]></content:encoded></item><item><title><![CDATA[Does an AI Teammate Mean You Write Less Code?]]></title><description><![CDATA[We embarked on an experiment called Trio Programming: two engineers and an AI assistant building software together.]]></description><link>https://www.nikmalykhin.com/p/does-an-ai-teammate-mean-you-write</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/does-an-ai-teammate-mean-you-write</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Tue, 14 Oct 2025 07:00:37 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We embarked on an experiment called <strong>Trio Programming</strong>: two engineers and an AI assistant building software together. Our goal was to discover effective workflows for this new dynamic. We started with a simple code kata, a clear set of rules for our AI, and a straightforward tech stack. Our assumption was that with a powerful AI coder, we&#8217;d move through the logic faster than ever.</p><p>Instead, we spent almost the entire session without writing a single line of business logic. The AI wrote plenty of code, but it was all in service of fixing a development environment that kept breaking. 
This led us to a counterintuitive conclusion: <strong>adding an AI to the team doesn&#8217;t accelerate feature development; it brutally exposes foundational weaknesses in your environment and workflow.</strong></p><h3>Our Setup: An Experiment in Trio Programming</h3><p>Our team consisted of me (Nik) acting as the &#8220;driver&#8221; (the one interacting directly with the AI) and my colleague Javier as the &#8220;navigator,&#8221; providing high-level direction and quality control. Our third programmer was <strong>GitHub Copilot</strong>, guided by a detailed set of custom instructions emphasizing a strict Test-Driven Development (TDD) cycle, small incremental changes, and explicit permission before writing any code.</p><p>The plan was to tackle the &#8220;Hierarchy Kata&#8221; (a REST API for managing an employee hierarchy) using a pure stack: <strong>Core Java</strong>, <strong>JUnit 5</strong>, and <strong>Gradle</strong>. We wanted to keep things simple and avoid framework magic.</p><h3>The Experiment That Failed: A Cascade of Configuration Errors</h3><p>Our first mistake was idealism. We started with Core Java to avoid frameworks, but quickly realized the sheer amount of boilerplate needed for a simple REST endpoint was distracting us from the actual kata. We pivoted.</p><p>&#8220;Let&#8217;s delegate that work to Spring,&#8221; we decided, thinking it would get us back on track. This is where the real trouble began. Our session devolved into a frustrating, iterative battle with our own setup, guided by an AI that was helpful but lacked strategic oversight.</p><ol><li><p><strong>Missing Dependencies:</strong> We asked Copilot to generate a test for a Spring Boot controller. It correctly produced a test using <code>@WebMvcTest</code> and <code>MockMvc</code>. But when we ran <code>./gradlew build</code>, the build failed spectacularly with dozens of <code>cannot find symbol</code> and <code>package does not exist</code> errors. 
Our <code>build.gradle</code> file had JUnit, but none of the required Spring Boot test dependencies.</p></li><li><p><strong>Incorrect Dependency Configuration:</strong> We then asked Copilot to fix our Gradle file. It suggested adding the Spring dependencies, but the first attempt failed because we hadn&#8217;t defined a version number, leading to a <code>Could not find org.springframework.boot:spring-boot-starter-web:.</code> error. The next fix involved adding the dependencies to the <code>subprojects</code> block in our root <code>build.gradle</code>, as they weren&#8217;t being inherited by the kata&#8217;s module. Each step was a tiny, painful discovery.</p></li><li><p><strong>Classpath and Package Structure Hell:</strong> After fixing the build file, the errors persisted. The problem? Our test file, <code>HelloWorldControllerTest.java</code>, was in <code>src/main/java</code> instead of <code>src/test/java</code>. The test dependencies weren&#8217;t on the main classpath. Once we moved it, we hit yet another wall: <code>Unable to find a @SpringBootConfiguration</code>. Our test in the <code>com.kata.hierarchy</code> package couldn&#8217;t find the main application class located in <code>com.example.helloworld</code>, because Spring Boot&#8217;s test support looks for the configuration class by searching upward from the test&#8217;s own package, and <code>com.example.helloworld</code> is not on that path.</p></li></ol><p>The entire session was a cycle of: ask for code, watch the build fail, feed the error log back to the AI, and apply the suggested micro-fix. We weren&#8217;t programming; we were performing highly structured, AI-assisted debugging on our own environment.</p><div><hr></div><h3>Principles That Actually Work</h3><p>This frustrating experience revealed three principles that are critical for effective AI-augmented development.</p><ol><li><p><strong>The Environment is Non-Negotiable.</strong> An unstable or poorly understood development environment will completely derail any attempt at Trio Programming. 
The AI can suggest fixes, but it can&#8217;t reason about your setup holistically. Before you can ask an AI to write a feature, the entire team (humans and AI) must operate on a rock-solid foundation where builds, tests, and dependencies are flawless.</p></li><li><p><strong>Human Navigation is Paramount.</strong> The session would have been a total failure without a human navigator. Javier&#8217;s role was crucial for steering the ship. He spotted issues in prompts, provided strategic direction (&#8220;let&#8217;s put it in a new package&#8221;), and kept the focus on the larger goal while I was in the weeds prompting the AI. As I noted in my log, &#8220;Speak, not only think - it&#8217;s a very strong pattern&#8221;. The AI is a powerful tool, but it needs a human strategist to be effective.</p></li><li><p><strong>Treat the AI as a System, Not Just a Coder.</strong> We started by giving the AI rules for writing code (TDD, small steps). But the real value came from using it as a diagnostic tool for a complex system that included our code, our build tool, and our framework. The prompts that worked best weren&#8217;t &#8220;implement this feature,&#8221; but rather &#8220;here is an error log, diagnose the problem and propose a minimal fix&#8221;.</p></li></ol><div><hr></div><h3>The Unexpected Discovery: The AI Reshapes Human Roles</h3><p>The most surprising insight was how the AI&#8217;s presence changed our own roles. My job as the &#8220;driver&#8221; became less about writing code and more about <strong>prompt engineering and AI flow control</strong>. I was focused on translating our navigator&#8217;s intent into precise instructions and context for the AI.</p><p>Javier&#8217;s &#8220;navigator&#8221; role expanded from guiding the code&#8217;s logic to <strong>managing the overall strategy and quality-controlling both my prompts and the AI&#8217;s output</strong>. This division of labor was incredibly effective. 
Having one person focused on the high-level goal while the other managed the human-AI interface prevented us from getting stuck. The AI didn&#8217;t just add a third programmer; it created a new, more specialized dynamic between the two human programmers.</p><h3>The Central Paradox of AI Collaboration</h3><p>Herein lies the paradox: <strong>The goal of using an AI is to abstract away complexity, but its immediate effect is to surface hidden complexities you&#8217;ve been ignoring.</strong></p><p>We thought we had a working Java setup. But the AI, by trying to follow our commands precisely and rapidly, immediately ran into every single flaw in our Gradle configuration and package structure. A human programmer might have found these issues slowly over time. The AI found them all at once, forcing a full stop.</p><p>Effective use of an AI programmer therefore requires:</p><ul><li><p>An <strong>impeccably configured and automated</strong> development environment.</p></li><li><p><strong>Deep human expertise</strong> in the underlying tools (Gradle, Spring), as the AI&#8217;s suggestions still need validation.</p></li><li><p>A workflow where humans provide <strong>strategic intent</strong>, not just tactical instructions.</p></li></ul><h3>Conclusion: Build Your Pipeline Before You Start the Assembly Line</h3><p>Our first Trio Programming session felt slow and, at times, unproductive. We wanted to build an API, but we ended up building a robust, multi-module Spring Boot Gradle configuration. But as Javier aptly put it, this process is like building a good CI/CD pipeline: it &#8220;reduces the price of mistakes&#8221; and gives you the confidence &#8220;to move forward faster&#8221;.</p><p>The lesson is clear. You can&#8217;t just drop an AI into an existing workflow and expect a productivity boost. You must first use the AI to stress-test and harden your foundations. 
The initial time investment is not spent on writing features, but on creating an environment so solid that the AI can finally be unleashed on the work you actually want it to do. We ended the day in a much safer, more robust place, ready for the real work to begin.</p>]]></content:encoded></item><item><title><![CDATA[From Buzzword to Practical Tool: A Developer's Guide to Generative AI]]></title><description><![CDATA[It seems like every week there&#8217;s a new AI tool that promises to change everything.]]></description><link>https://www.nikmalykhin.com/p/from-buzzword-to-practical-tool-a</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/from-buzzword-to-practical-tool-a</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Mon, 29 Sep 2025 09:29:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!NqMm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ed9aaf9-db9b-4fdc-9a53-44e4d55b72dc_915x337.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>It seems like every week there&#8217;s a new AI tool that promises to change everything. The hype is impossible to ignore. 
But behind the marketing, what are we actually dealing with? How do Large Language Models (LLMs) really work, and more importantly, what are the practical limitations that we, as developers, QAs, and analysts, need to understand to use them effectively and responsibly?</p><p>This article cuts through the noise to explain the core mechanics of Generative AI. We'll explore how these models "think," where they fail, and provide a set of practical heuristics for applying them in our work.</p><div><hr></div><h3>The 'Before' State: From Hard-Coded Rules to Learned Patterns</h3><p>To understand today's Generative AI, we have to look at its conceptual ancestors. The original dream of <strong>Artificial Intelligence</strong> (beginning in the 1950s) was about logic and explicit rules. The idea was to encode expert knowledge into a series of <code>IF &lt;condition&gt; THEN &lt;action&gt;</code> statements. This approach is far from obsolete; it&#8217;s still the backbone of many complex systems.</p><p>For example, I previously worked on a phishing detection team at a <strong>global cybersecurity company</strong>, where our core detection engine was a sophisticated, rule-based AI. We analyzed an email&#8217;s characteristics, and if the combined weighted score of all rules triggered a threshold, we marked it as malicious. That was our production system.</p><p>The first major evolution of this paradigm was <strong>Machine Learning</strong> (ML), which gained traction in the 1980s. Instead of engineers hand-crafting every rule, we could feed a system massive amounts of data and let it discover the patterns on its own. We don't tell a spam filter every possible suspicious word; we show it thousands of examples, and it <em>learns</em> the statistical characteristics of spam. These two ideas&#8212;rules and learning&#8212;are often used together. 
Our plan at the cybersecurity company was to layer ML on top of our rule engine to automatically spot new threats, rather than waiting for an engineer to write a new rule.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NqMm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ed9aaf9-db9b-4fdc-9a53-44e4d55b72dc_915x337.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NqMm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ed9aaf9-db9b-4fdc-9a53-44e4d55b72dc_915x337.png 424w, https://substackcdn.com/image/fetch/$s_!NqMm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ed9aaf9-db9b-4fdc-9a53-44e4d55b72dc_915x337.png 848w, https://substackcdn.com/image/fetch/$s_!NqMm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ed9aaf9-db9b-4fdc-9a53-44e4d55b72dc_915x337.png 1272w, https://substackcdn.com/image/fetch/$s_!NqMm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ed9aaf9-db9b-4fdc-9a53-44e4d55b72dc_915x337.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NqMm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ed9aaf9-db9b-4fdc-9a53-44e4d55b72dc_915x337.png" width="915" height="337" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3ed9aaf9-db9b-4fdc-9a53-44e4d55b72dc_915x337.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:337,&quot;width&quot;:915,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:134009,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nikmalykhin.com/i/174322894?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ed9aaf9-db9b-4fdc-9a53-44e4d55b72dc_915x337.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NqMm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ed9aaf9-db9b-4fdc-9a53-44e4d55b72dc_915x337.png 424w, https://substackcdn.com/image/fetch/$s_!NqMm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ed9aaf9-db9b-4fdc-9a53-44e4d55b72dc_915x337.png 848w, https://substackcdn.com/image/fetch/$s_!NqMm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ed9aaf9-db9b-4fdc-9a53-44e4d55b72dc_915x337.png 1272w, https://substackcdn.com/image/fetch/$s_!NqMm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ed9aaf9-db9b-4fdc-9a53-44e4d55b72dc_915x337.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<div><hr></div><h3>Introducing the Core Concept: Generative AI</h3><p>The next leap came with <strong>Deep Learning</strong> in the 2010s, a type of ML that uses complex, multi-layered "neural networks" to find incredibly subtle patterns in data. This is the technology that powered the huge advances we saw in image recognition and speech-to-text.</p><p>That brings us to today's breakthrough: <strong>Generative AI</strong>.</p><ul><li><p><strong>What is it?</strong> Generative AI takes powerful Deep Learning models and flips their function. Instead of just <em>recognizing</em> patterns (e.g., "this image contains a cat"), it uses its understanding of those patterns to <em><strong>create</strong></em> new, original content (e.g., "generate a picture of a cat"). 
Large Language Models are a prime example of this capability.</p></li><li><p><strong>Why does it matter?</strong> The impact is massive because an estimated 80% of the world's data is unstructured, and much of it is text: emails, documents, support tickets, and so on. LLMs are the first technology that can both process and generate human language at scale, creating a new human-computer interface where we can use <strong>natural language to express intent</strong>.</p></li><li><p><strong>How does it work?</strong> At its core, an LLM is a sophisticated pattern-matching machine built on a technology called the <strong>Transformer architecture</strong>. Its fundamental job is surprisingly simple: <strong>to predict the most statistically probable next word (strictly, the next token) in a sequence</strong>. It's essentially a very powerful autocomplete. To do this, it relies on two key concepts:</p><ol><li><p><strong>Tokens</strong>: The model doesn't see words; it sees "tokens". Text is broken down into these building blocks, which can be words, parts of words, or punctuation. For example, <code>Generative AI is powerful</code> might become <code>["Gener", "ative", " AI", " is", " powerful"]</code>. A model's context limit and API costs are both measured in tokens.</p></li><li><p><strong>The Context Window</strong>: This is the model's short-term memory. LLMs are <strong>stateless</strong>; they don't truly "remember" past conversations. With each prompt, the application sends the <em>entire conversation history</em> back to the model. This entire block of text must fit within the context window, which has a fixed token limit (e.g., 8k or 128k tokens). 
If a conversation gets too long, the oldest messages are dropped, which is why the model seems to "forget" what was said earlier.</p></li></ol></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WrMT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5a82ca8-6b77-4f6f-b912-af406d66209c_547x325.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WrMT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5a82ca8-6b77-4f6f-b912-af406d66209c_547x325.png 424w, https://substackcdn.com/image/fetch/$s_!WrMT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5a82ca8-6b77-4f6f-b912-af406d66209c_547x325.png 848w, https://substackcdn.com/image/fetch/$s_!WrMT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5a82ca8-6b77-4f6f-b912-af406d66209c_547x325.png 1272w, https://substackcdn.com/image/fetch/$s_!WrMT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5a82ca8-6b77-4f6f-b912-af406d66209c_547x325.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WrMT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5a82ca8-6b77-4f6f-b912-af406d66209c_547x325.png" width="547" height="325" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a5a82ca8-6b77-4f6f-b912-af406d66209c_547x325.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:325,&quot;width&quot;:547,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:19763,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nikmalykhin.com/i/174322894?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5a82ca8-6b77-4f6f-b912-af406d66209c_547x325.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WrMT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5a82ca8-6b77-4f6f-b912-af406d66209c_547x325.png 424w, https://substackcdn.com/image/fetch/$s_!WrMT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5a82ca8-6b77-4f6f-b912-af406d66209c_547x325.png 848w, https://substackcdn.com/image/fetch/$s_!WrMT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5a82ca8-6b77-4f6f-b912-af406d66209c_547x325.png 1272w, https://substackcdn.com/image/fetch/$s_!WrMT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5a82ca8-6b77-4f6f-b912-af406d66209c_547x325.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<div><hr></div><h3>Practical Applications: Choosing the Right Tool for the Job</h3><p>Different LLMs are trained with different goals, giving them unique strengths. Choosing the right one is a key engineering decision. The following list is not exhaustive, but it reflects the tools my team and I rely on for our daily work.</p><p>The main model families you'll encounter are:</p><ol><li><p><strong>OpenAI's GPT Series (</strong><code>GPT-4o</code><strong>, etc.)</strong>: Best known as a powerful all-rounder, excelling at tasks requiring strong <strong>logical reasoning</strong> and complex <strong>code generation</strong>. 
This is often the go-to for debugging a tricky algorithm or scaffolding a new service.</p></li><li><p><strong>Anthropic's Claude Series (</strong><code>Claude 3.5 Sonnet</code><strong>, etc.)</strong>: Built with a heavy emphasis on safety and "Constitutional AI". Claude often produces more careful, <strong>nuanced writing</strong> and is a great choice for tasks like drafting detailed technical documentation or analyzing sensitive user feedback where tone and safety are paramount.</p></li><li><p><strong>Google's Gemini Series (</strong><code>Gemini 1.5 Pro &amp; Flash</code><strong>)</strong>: This family offers a trade-off. <strong>Gemini Pro</strong> is the high-power version focused on top-tier reasoning and advanced multi-modal capabilities. Its sibling, <strong>Gemini Flash</strong>, is optimized for speed and cost-efficiency, making it ideal for high-volume, lower-complexity tasks like chatbots or data extraction where low latency is critical.</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!L3M0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c86c05-5966-4ec1-bc4f-1fbed57c2126_1176x332.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!L3M0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c86c05-5966-4ec1-bc4f-1fbed57c2126_1176x332.png 424w, https://substackcdn.com/image/fetch/$s_!L3M0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c86c05-5966-4ec1-bc4f-1fbed57c2126_1176x332.png 848w, 
https://substackcdn.com/image/fetch/$s_!L3M0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c86c05-5966-4ec1-bc4f-1fbed57c2126_1176x332.png 1272w, https://substackcdn.com/image/fetch/$s_!L3M0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c86c05-5966-4ec1-bc4f-1fbed57c2126_1176x332.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!L3M0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c86c05-5966-4ec1-bc4f-1fbed57c2126_1176x332.png" width="1176" height="332" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f8c86c05-5966-4ec1-bc4f-1fbed57c2126_1176x332.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:332,&quot;width&quot;:1176,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:54635,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nikmalykhin.com/i/174322894?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c86c05-5966-4ec1-bc4f-1fbed57c2126_1176x332.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!L3M0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c86c05-5966-4ec1-bc4f-1fbed57c2126_1176x332.png 424w, 
https://substackcdn.com/image/fetch/$s_!L3M0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c86c05-5966-4ec1-bc4f-1fbed57c2126_1176x332.png 848w, https://substackcdn.com/image/fetch/$s_!L3M0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c86c05-5966-4ec1-bc4f-1fbed57c2126_1176x332.png 1272w, https://substackcdn.com/image/fetch/$s_!L3M0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c86c05-5966-4ec1-bc4f-1fbed57c2126_1176x332.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><h3>Common Pitfalls &amp; Misconceptions</h3><p>The architecture of LLMs leads to two fundamental limitations that every user must understand.</p><h4>1. Hallucinations: Plausible vs. Truthful</h4><p>Because an LLM's only job is to predict the next most probable token, it is optimized to generate text that is <strong>plausible</strong>, not text that is factually <strong>true</strong>. It has no internal knowledge base or concept of truth. If you ask it to find sources for a claim, it will generate a list of references that <em>looks</em> perfect, with authors, titles, and journals that fit the pattern, but the sources themselves may be completely fabricated.</p><p><strong>How to avoid it</strong>: Be <strong>professionally skeptical</strong>. Treat all outputs as a first draft. Always verify facts, test all code, and check any sources it provides.</p><h4>2. The Black Box Problem: Why vs. What</h4><p>We can make an LLM's output largely <strong>deterministic</strong> by setting a parameter called "temperature" to zero, so that it always picks the most probable next token and gives the same output for the same input in most cases (hosted models can still show small run-to-run differences). So we can see <em>what</em> it did. However, we can't see <em>why</em> it chose one token over another in a way that is humanly understandable. The decision is a result of calculations across billions of parameters, not a logical decision tree we can audit.</p><p><strong>Why it matters</strong>: This makes it nearly impossible to debug why a model gives a strange answer. In high-stakes domains like finance or healthcare, it's difficult to trust a system when there is no transparent reasoning path.</p><div><hr></div><h3>Core Trade-offs: Free vs. Paid Models</h3><p>The difference between free and paid AI tools is not just about features; it's about the entire engine. The primary trade-off is <strong>cost vs. 
capability</strong>.</p><ul><li><p><strong>Underlying Model</strong>: Free tiers typically use older, smaller, and less powerful models. Paid tiers give you access to the flagship models.</p></li><li><p><strong>Context Window</strong>: Paid models have much larger context windows (e.g., 128k+ tokens vs. 4k-16k), allowing you to work with larger documents and maintain longer conversations.</p></li><li><p><strong>Reasoning Ability</strong>: Premium models are significantly better at following complex, multi-step instructions. Less capable models are more prone to "laziness"&#8212;giving simplified answers, writing placeholder code, or telling you to do it yourself.</p></li></ul><p>For simple tasks, a free model may suffice. For complex development work, the limitations of a less capable model can become a significant bottleneck.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zxVN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f85a36d-1c7b-4352-8f0b-64123b439909_523x204.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zxVN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f85a36d-1c7b-4352-8f0b-64123b439909_523x204.png 424w, https://substackcdn.com/image/fetch/$s_!zxVN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f85a36d-1c7b-4352-8f0b-64123b439909_523x204.png 848w, https://substackcdn.com/image/fetch/$s_!zxVN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f85a36d-1c7b-4352-8f0b-64123b439909_523x204.png 1272w, 
https://substackcdn.com/image/fetch/$s_!zxVN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f85a36d-1c7b-4352-8f0b-64123b439909_523x204.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zxVN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f85a36d-1c7b-4352-8f0b-64123b439909_523x204.png" width="523" height="204" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6f85a36d-1c7b-4352-8f0b-64123b439909_523x204.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:204,&quot;width&quot;:523,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:32616,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nikmalykhin.com/i/174322894?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f85a36d-1c7b-4352-8f0b-64123b439909_523x204.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zxVN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f85a36d-1c7b-4352-8f0b-64123b439909_523x204.png 424w, https://substackcdn.com/image/fetch/$s_!zxVN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f85a36d-1c7b-4352-8f0b-64123b439909_523x204.png 848w, https://substackcdn.com/image/fetch/$s_!zxVN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f85a36d-1c7b-4352-8f0b-64123b439909_523x204.png 1272w, 
https://substackcdn.com/image/fetch/$s_!zxVN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f85a36d-1c7b-4352-8f0b-64123b439909_523x204.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div><hr></div><h3>Conclusion: 4 Heuristics for Using AI Responsibly</h3><p>Generative AI is a powerful tool, not magic. Understanding its mechanism&#8212;next-token prediction within a limited context window&#8212;is key to using it well. To ensure we are using these tools in a safe, effective, and responsible way, our team should always ask four questions before starting a task:</p><ol><li><p><strong>Do we have permission?</strong> Is the use of AI approved for this task by both the client and our company's policies? This is a non-negotiable first step.</p></li><li><p><strong>Are we exposing sensitive data?</strong> Does the prompt contain any client secrets, personal information, or confidential data? The answer must be no.</p></li><li><p><strong>How will we verify the output?</strong> What is our strategy for human review and testing? Whether it's a peer code review or a QA testing plan, a verification process is essential.</p></li><li><p><strong>Is this the right tool for the job?</strong> Is the model's speed, cost, and capability a good fit for this task? 
This is about making a deliberate engineering trade-off.</p></li></ol><p>By embracing professional skepticism and applying these simple heuristics, we can move beyond the hype and begin using Generative AI as what it is: a powerful, imperfect, but profoundly useful new tool in our professional toolkit.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nikmalykhin.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe to get more practical guides on using GenAI tools effectively in software development work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Jules, My AI Junior Developer]]></title><description><![CDATA[Here&#8217;s a question that emerged from my recent work with AI coding agents: to get better, more autonomous results, do you need to treat the AI less like a senior peer and more like a junior developer?]]></description><link>https://www.nikmalykhin.com/p/jules-my-ai-junior-developer</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/jules-my-ai-junior-developer</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Mon, 15 Sep 2025 19:36:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" 
type="image/jpeg"/><content:encoded><![CDATA[<p>Here&#8217;s a question that emerged from my recent work with AI coding agents: to get better, more autonomous results, do you need to treat the AI less like a senior peer and more like a junior developer?</p><p>I recently spent time experimenting with Google's Jules, an AI agent designed to operate with a high degree of autonomy. The initial assumption was that I could delegate a series of tasks to a "senior" AI, provide it with a well-documented repository and clear instructions, and expect efficient execution. The experiment, however, surfaced a different, more nuanced reality about the operational model required to work effectively with today's AI agents.<br><br>My plan was to partition the work for a new project, <a href="https://github.com/nikmalykhin-tw/jules-foundation/tree/main">jules-foundation</a>, in a collaborative way:</p><ul><li><p><strong>Backend:</strong> Fully delegated to Jules.</p></li><li><p><strong>CI/CD:</strong> I would initiate the setup, and Jules would continue it.</p></li><li><p><strong>Frontend:</strong> A "ping-pong" approach where Jules would start, I would take over for a task using VSCode and GitHub Copilot, and then hand it back.</p></li></ul><p>This process was underpinned by a detailed <a href="https://github.com/nikmalykhin-tw/jules-foundation/blob/main/AGENTS.md">AGENTS.md</a> file, which codified principles from foundational software engineering texts to guide the agent's behavior. The results were illuminating, but not for the reasons I expected.</p><div><hr></div><h2>The Experiment's Stumbles</h2><p>The initial attempts at delegation quickly ran into issues that revealed the agent's limitations, not in its ability to write code, but in its judgment and awareness of context.</p><h3>1. 
The Hallucinating Generalist</h3><p>In the very first backend task&#8212;setting up a Kotlin and Micronaut project&#8212;Jules briefly defaulted to a completely different stack, attempting to implement the solution using Python and Poetry. It seemed to fall back on its generalized training data, where Python is a common choice for initial project setups. To its credit, the agent caught its own mistake and asked for confirmation before proceeding, but it was a stark reminder that even with specific instructions, the agent can be swayed by the statistical weight of its training data. It behaves like a junior developer who has a lot of theoretical knowledge but lacks the experience to apply it consistently in a specific context.</p><h3>2. The Context Pollution Problem</h3><p>My most significant error was continuing with the frontend task (<a href="https://github.com/nikmalykhin-tw/jules-foundation/issues/3">Task 3</a>) in the same chat I used for the backend (Task 1). After the first task was completed and merged, the <code>main</code> branch of the repository was updated. However, Jules, operating within its isolated chat context, was working off a stale version of the repository.</p><p>When asked to proceed, its lack of environmental awareness became clear. It stated: "I do not have a direct git pull command. My process is to complete the work and then use submit to propose the changes." Its proposed solution was to start over, re-implementing <em>both the backend and frontend tasks</em> from scratch. This demonstrated that long-running conversations spanning multiple, distinct tasks are unworkable. The context from previous work pollutes the agent's understanding of the current state.</p><h3>3. The Over-Eager Assistant</h3><p>During the frontend task, the instructions specified using simple HTML, Tailwind CSS, and Alpine.js, with Vite mentioned as an <em>optional</em> tool. 
Jules immediately planned to set up a full Vite project, concluding this was "the most professional and efficient way to approach this task." While a reasonable conclusion for a human engineer, it was a deviation from the core requirement of simplicity. It prioritized an optimized solution over adhering to the task's constraints, forcing me to update the <code>AGENTS.md</code> file with a strict <strong>"Technology Constraint Mandate"</strong> to prevent such deviations.</p><div><hr></div><h2>An Effective Operating Model</h2><p>Through these failures, a more effective workflow emerged. It centered on providing a rigid, well-defined operational framework rather than relying on the agent's "senior" judgment.</p><h3>1. One Task, One Context</h3><p>The <code>git pull</code> fiasco taught me the most important lesson: <strong>every new task requires a new, clean context</strong>. The effective workflow is atomic and mirrors standard development practice:</p><ol><li><p>Start a new "Jules task" for each new GitHub issue.</p></li><li><p>Provide the prompt, linking to the repository and the specific issue.</p></li><li><p>Let the agent fork the <em>current</em> main branch, implement the changes, and submit a pull request.</p></li><li><p>Review, merge, and close the task.</p></li><li><p>Repeat from step 1 for the next issue.</p></li></ol><p>This approach prevents context pollution and also mitigates the risk of git conflicts, as the agent is never in a position where it needs to reconcile its work with other changes made in parallel. The tasks must be designed to be sequential and independent.</p><h3>2. Define the Goal, Not Every Step</h3><p>My most successful interaction was with the first task, where the goal was clear and concise: "Create a Kotlin-based Micronaut application with a single GET endpoint that returns 'Hello, World!'." 
I didn't over-specify the steps, which allowed the agent to complete the task in just 7 minutes.</p><p>In contrast, my more prescriptive frontend task created blind spots. By trying to detail the steps, I inadvertently omitted small but crucial details, leading to initial friction. The key is to provide a <strong>clear objective and firm constraints</strong> but grant the agent the autonomy to handle the implementation details within those boundaries.</p><h3>3. The "Rules" Are the Scaffolding</h3><p>The foundational <code>AGENTS.md</code> file, which summarized core software engineering principles, was critical. Much like a <a href="https://nik1379616.substack.com/p/can-we-make-ai-code-assistants-smarter">well-crafted context can steer GitHub Copilot's suggestions</a>, these initial instructions act as a firm scaffolding for the agent's behavior. When failures occurred, I didn't just correct the agent in the chat; I updated the foundational rules. This ensures the learning is persistent and benefits all future tasks.</p><div><hr></div><h2>Unexpected Discovery: Simulating a Workflow</h2><p>A fascinating insight was how to enforce quality gates without giving the agent direct access to our environment. Jules can't <em>run</em> a pre-commit hook, but it can be instructed to <strong>simulate one</strong>.</p><p>I created a <strong>"Pre-Flight Simulation"</strong> mandate in the rules. 
Before submitting code, the agent must:</p><ol><li><p>Analyze the project's pre-commit configuration files.</p></li><li><p>Mentally review its generated code against every check defined in those files.</p></li><li><p>Provide a report confirming it performed the simulation.</p></li></ol><p>This approach improves code quality and reduces the cost of failed CI runs by shifting quality checks earlier in the process, even if only in simulation.</p><div><hr></div><h2>The Core Insight: Constraints Unlock Autonomy</h2><p>This leads to the core realization: <strong>to unlock the autonomy of an AI agent, you must constrain it with a rigid, machine-readable process.</strong></p><p>You can't treat it like a senior engineer with whom you can have a nuanced conversation. You have to manage it like a brilliant, lightning-fast, but utterly naive junior developer. It needs a "manager" to provide:</p><ul><li><p><strong>A Clear Definition of Done:</strong> The GitHub issue.</p></li><li><p><strong>Strict Rules of Engagement:</strong> The <code>AGENTS.md</code> file.</p></li><li><p><strong>An Isolated Work Environment:</strong> A new task for each new unit of work.</p></li></ul><p>The agent's value isn't in its judgment or experience, but in its speed and its ability to flawlessly execute a well-defined process within a tightly controlled environment.</p><div><hr></div><h2>Conclusion: We Are Becoming Architects of AI Workflows</h2><p>My experiment with Jules was a success, though not in the way I initially envisioned. The true leverage of these tools isn't just in code generation&#8212;it's in <strong>automating a workflow</strong>.</p><p>The real engineering work is shifting from pure implementation to architecting the system of rules, constraints, and processes that guide the AI. This has implications beyond just how developers work. 
It elevates the importance of the <strong>Business Analyst</strong> function, as creating well-defined, atomized, and unambiguous tasks is now a prerequisite for effective AI delegation. We must not only learn new skills for interacting with AI but also adapt our entire development workflow to match the capabilities of these new tools. The future isn't about replacing developers but about providing them with powerful new forms of leverage, provided we are willing to become the architects and managers of our new AI team members.</p>]]></content:encoded></item><item><title><![CDATA[Does More Powerful AI Mean Slower Fixes?]]></title><description><![CDATA[Is it possible that our most advanced AI coding assistants are actually slowing us down?]]></description><link>https://www.nikmalykhin.com/p/does-more-powerful-ai-mean-slower</link><guid isPermaLink="false">https://www.nikmalykhin.com/p/does-more-powerful-ai-mean-slower</guid><dc:creator><![CDATA[Nik]]></dc:creator><pubDate>Tue, 26 Aug 2025 15:16:04 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!-Ojx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d27381-c618-42b7-a15f-62e1d625e22d_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Is it possible that our most advanced AI coding assistants are actually slowing us down? This question felt absurd as my team was heads-down, polishing our UI for a major release. We were in the final stretch, tackling a long list of small, cosmetic changes&#8212;the kind of work that should be quick. Yet, I found my workflow clogged, not by the complexity of the tasks, but by the "helpfulness" of my AI partner.</p><div><hr></div><h3>My Setup: The Final Polish</h3><p>Our environment was standard: a React codebase, a Git workflow with peer reviews, and an integrated AI coding assistant. My goal was to rapidly work through a backlog of minor UI tickets. Any UI update is a form of refactoring, and for that, I strictly follow a philosophy of making changes in what Javi L&#243;pez aptly calls "<a href="https://www.google.com/search?q=https://javil.substack.com/p/a-lot-of-tiny-steps-16eaac27acb4">a lot of tiny steps</a>," a pattern also known in classic terms as <a href="https://wiki.c2.com/?RefactoringInVerySmallSteps">Refactoring In Very Small Steps</a>. This ensures each commit is atomic and easy for my teammates to review. I was relying on the AI&#8217;s "Agent mode"&#8212;<strong>its capability to autonomously modify the codebase</strong>&#8212;expecting it to align with this micro-step approach. The reality was quite different.</p><div><hr></div><h3>When 'Help' Became a Hindrance</h3><p>The core problem was that the AI agent consistently over-engineered solutions for trivial problems. It treated every request for a small change as an invitation to refactor the entire component. 
This isn't a failure of intelligence, but a <strong>misalignment of goals</strong>: my goal was a minimal diff, whereas the agent's goal is often holistic file correctness, aiming to fix all potential issues it identifies in one pass. Crucially, even when I gave it explicit, TDD-style instructions to <em>only</em> perform a single, minimal action, it still defaulted to making broad, sweeping changes.</p><h4>Example 1: A Simple CSS Tweak</h4><p>I needed to make a submit button full-width on mobile devices. A straightforward task.</p><p>The fix that was actually needed:</p><p>CSS</p><pre><code><code>@media (max-width: 50rem) {
  .formSubmitMobileWrapper button {
    width: 100%;
  }
}
</code></code></pre><p>I prompted the AI agent: "<em>Only add a new media query for screens under 50rem to the </em><code>.formSubmitMobileWrapper button</code><em> class to set its width to 100%. Do not touch any other code.</em>"</p><p>Despite the clear instruction, the agent generated a massive diff, rewriting existing desktop styles and restructuring the entire CSS class.</p><ul><li><p><strong>Time Wasted:</strong> I spent 15 minutes untangling the AI's suggestion, versus the 2 minutes it would have taken to write the CSS myself.</p></li><li><p><strong>Quality Issues:</strong> The generated code created a high cognitive load for code review. A teammate would have to ask, "Why did we refactor all the button styles just to change one mobile property?"</p></li><li><p><strong>Structural Problems:</strong> This approach created bloated commits, making our Git history noisy and directly violating the "very small steps" principle.</p></li></ul><h4>Example 2: A Minor Accessibility Improvement</h4><p>Next, I picked up a ticket to improve the accessibility of our card components. Again, I gave a precise instruction: "<em>Add a </em><code>role='region'</code><em> attribute to the parent div of the Card component.</em>"</p><p>Instead of a one-line change, the agent tried to rewrite half the component's JSX structure, arguing it was for "better semantic clarity" and completely ignoring my focused instruction.</p><div><hr></div><h3>Principles That Actually Work</h3><p>This friction forced me to re-evaluate how I was using the tool. I realized the key is to <strong>match the tool's capability to the task's scope</strong>. This led me to two guiding principles.</p><h4>1. Use AI Chat for Suggestions, Not Implementation</h4><p>For micro-changes, the AI's "Chat mode" is far more effective. 
By treating it as a context-aware search engine, I can ask for targeted advice.</p><ul><li><p><strong>Prompt:</strong> "<em>What's the best CSS to make this button full-width on mobile?</em>"</p></li><li><p><strong>Result:</strong> It gives me the precise, minimal code snippet I need. I copy, paste, and commit. The change is atomic and review is trivial.</p></li></ul><p>This keeps the developer in control and prevents the AI from making unsolicited "improvements." The benefits are clear: smaller pull requests and faster review cycles. This aligns with research from <a href="https://www.faros.ai/blog/ai-software-engineering">Faros AI</a>, which notes that while AI can boost individual developer throughput, it often leads to ballooning review queues. I've written more about this in my article, "<a href="https://nik1379616.substack.com/p/can-we-make-ai-code-assistants-smarter">Can we make AI code assistants smarter?</a>".</p><h4>2. Reserve AI Agents for Scaffolding and True Refactoring</h4><p>The autonomous "Agent mode" is incredibly powerful, but its strength lies in larger, well-defined tasks, not surgical strikes.</p><ul><li><p><strong>Good use case:</strong> "<em>Create a new React component for a user profile page with an avatar, name, and bio section. Include Storybook stories and a basic test.</em>"</p></li><li><p><strong>Bad use case:</strong> "<em>Add a </em><code>margin-top</code><em> to the avatar in the user profile component.</em>"</p></li></ul><p>Using an agent is best when the expected outcome is a significant amount of new or changed code.</p><p><em>This simple matrix illustrates the core principle: for small-scoped tasks, a suggestion-based AI interaction is most effective, while large-scoped tasks are better suited for autonomous AI execution.</em></p><div><hr></div><h3>Unexpected Discovery: AI Forced Me to Define "Small"</h3><p>The most surprising insight was that the AI forced me to be more precise in defining a "small change." 
My heuristic is now this: <strong>if the task's description is longer than the code I expect to write, Chat mode is the better choice.</strong></p><p>A task like "Make the button full-width on mobile" is a perfect example. The description is simple and the code is just a few lines, yet the prompt needed to keep an agent within scope ends up longer than the fix itself. Even then, the agent interprets the request as a symptom of a larger problem ("This component is not fully responsive") and tries to solve that instead. This mental checkpoint prevents me from accidentally turning a 5-minute task into a 30-minute ordeal.</p><div><hr></div><h3>The Autonomy vs. Precision Trade-Off</h3><p>This leads to a central, counterintuitive truth: <strong>the more autonomy you grant an AI coding assistant, the less precision you may get for small, targeted tasks.</strong></p><p>This isn't a paradox; it's a trade-off. Autonomous agents are optimized for holistic correctness. They don't just see the three lines of CSS you want to add; they see the entire file and its potential imperfections. Their goal is to bring the whole file into a state of grace, which directly conflicts with the goal of making a minimal, targeted change.</p><p>Effective use, therefore, requires the developer to:</p><ul><li><p><strong>Explicitly define the scope</strong> of the change before starting.</p></li><li><p><strong>Choose the right mode</strong> for the job (Chat vs. Agent).</p></li><li><p><strong>Maintain control</strong> and view the AI as a suggester, not an infallible executor, for routine work.</p></li></ul><div><hr></div><h3>A More Thoughtful Partnership</h3><p>My journey through pre-release UI tweaks taught me a crucial lesson. AI coding tools aren't a simple "on/off" switch for productivity. They are a suite of capabilities, each with an appropriate use case. 
An autonomous agent is a powerful ally for building new things from the ground up, but for the delicate art of finishing and polishing, a simple chat-based suggestion is often faster, cleaner, and more respectful of my teammates' time. The real skill in this new era of software development is not just in writing clever prompts, but in having the wisdom to choose the right tool for the job.</p>]]></content:encoded></item></channel></rss>