In a distributed system, the absence of an expected log entry often signals a deeper divergence between the actual execution flow and what the infrastructure assumes about it. While implementing a test email feature in a Kotlin-based service, I encountered a scenario where logs appeared in Datadog for certain execution paths but were absent for others. The inconsistency prompted an investigation into the interaction between the Kotlin when expression, blocking downstream calls, and the lifecycle of a request within the Datadog logging pipeline.
The target of this investigation was the sendTestEmail method located in the TestEmailController. The domain logic returns three distinct results: Success, FeatureTurnedOff, and Error. While the FeatureTurnedOff case consistently produced logs in the monitoring dashboard, the Success and Error outcomes frequently failed to emit the final confirmation log.
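For orientation, the result type has roughly the following shape. This is a sketch inferred from how the controller consumes it, not the verbatim domain definition:

sealed class SendTestEmailResult {
    object Success : SendTestEmailResult()
    object FeatureTurnedOff : SendTestEmailResult()
    data class Error(val cause: Throwable) : SendTestEmailResult()
}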
Analyzing the Execution Flow
The initial hypothesis centered on potential issues with the Kotlin when block or a misconfiguration of the Mapped Diagnostic Context (MDC). However, the root cause turned out to be more fundamental: execution timing and the nature of the downstream service interaction.
The FeatureTurnedOff result is a short-circuit path: when the feature toggle is disabled, the use case returns immediately, and the controller reaches the final log statement and exits within a negligible timeframe. Conversely, both the Success and Error paths call a downstream notification service, and that call blocks the controller thread via the .block() method on a reactive stream.
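To make the distinction concrete, the downstream call has roughly this shape. The client, endpoint, and payload names here are illustrative assumptions, not the actual service code; only the use of .block() mirrors the real implementation:

import org.springframework.web.reactive.function.client.WebClient

// Illustrative sketch: the notification call that parks the controller thread.
class NotificationClient(private val webClient: WebClient) {
    fun sendTestEmail(templateId: String) {
        webClient.post()
            .uri("/internal/test-email")                  // hypothetical endpoint
            .bodyValue(mapOf("templateId" to templateId)) // hypothetical payload
            .retrieve()
            .toBodilessEntity()
            .block() // the thread waits here; latency or a severed connection stalls everything after this point
    }
}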
The discrepancy in log visibility was not a failure of the logging library but a consequence of the controller thread waiting on a blocking call. If the downstream service experienced latency or if the client closed the connection before the call completed, the final log statement was never reached or recorded.
This behavior was corroborated by Datadog errors indicating that the stream was closed by the client and that there were errors reading events. In environments utilizing the ssm-agent-worker, these interruptions can occur when the infrastructure or the initiating client terminates the request context before the application finishes its blocking operation.
Implementing a Robust Logging Lifecycle
To resolve the visibility gap, I restructured the logging strategy to separate request arrival from processing outcomes. By introducing a log statement immediately upon entry to the controller method, I ensured that a record exists regardless of how the downstream call performs.
The revised implementation follows a deliberate pattern of enrichment and cleanup. I utilized MDC to attach structured metadata to the log records, which facilitates precise filtering in Datadog. It is essential to avoid generic MDC keys such as status, as these often conflict with reserved fields or common conventions in log aggregators. Instead, I opted for specific identifiers like testEmailOutcome and templateId.
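With specific keys in place, the outcomes become directly searchable. For example, assuming the MDC fields surface as log attributes, a Datadog log query such as @testEmailOutcome:"test email failed to send" isolates the failing branch without any free-text matching.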
Structured Implementation and MDC Hygiene
The following structure ensures that the MDC is populated at the start of the request and, crucially, cleared in a finally block to prevent context leakage between threads.
try {
    // Enrich the MDC first so even the earliest log line carries the template identifier.
    MDC.put("templateId", request.templateId)
    logger.info("Test email request received")

    val status =
        when (val result = sendTestEmailUseCase.execute(request.templateId)) {
            is SendTestEmailResult.Success -> {
                MDC.put("testEmailOutcome", "test email was sent successfully")
                HttpStatus.CREATED
            }
            is SendTestEmailResult.FeatureTurnedOff -> {
                MDC.put("testEmailOutcome", "feature toggle is off, test email was not sent")
                HttpStatus.ACCEPTED
            }
            is SendTestEmailResult.Error -> {
                MDC.put("testEmailOutcome", "test email failed to send")
                MDC.put("testEmailErrorMessage", result.cause.message ?: "unknown error")
                HttpStatus.INTERNAL_SERVER_ERROR
            }
        }

    logger.info("Test email request processed")
    return ResponseEntity
        .status(status)
        .body(SendTestEmailResponse(templateId = request.templateId))
} finally {
    // Always clear the MDC so context does not leak to the next request served by this thread.
    MDC.clear()
}

This approach provides a clear narrative in the logs. The "Test email request received" log serves as a heartbeat, confirming the controller was reached. The final "Test email request processed" log confirms the blocking call completed and indicates which branch of the when logic was executed.
Interpreting Downstream Service Signals
Understanding the relationship between the application and the notification service is vital for interpreting the logs. For instance, an observed HTTP 400 Bad Request error from the notification service endpoint indicates that the feature toggle was active and the application successfully initiated the call. Because this is a terminal error from the downstream provider, the result maps to SendTestEmailResult.Error.
Logging the specific error message from the result cause into a dedicated MDC field allows for immediate debugging of downstream rejections without requiring a manual trace of the network call.
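As a sketch of how the use case might fold such a rejection into the domain result, assuming Spring's WebClientResponseException and the hypothetical client from the earlier sketch:

import org.springframework.web.reactive.function.client.WebClientResponseException

// Illustrative sketch: translating a terminal downstream rejection into the domain result.
fun execute(templateId: String): SendTestEmailResult =
    try {
        notificationClient.sendTestEmail(templateId)
        SendTestEmailResult.Success
    } catch (ex: WebClientResponseException) {
        // A 400 Bad Request is a terminal rejection by the provider, not a transient fault,
        // so it maps to Error and its message lands in the testEmailErrorMessage MDC field.
        SendTestEmailResult.Error(cause = ex)
    }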
The introduction of the early log statement fixed the observability issue for all three execution paths. It provides a reliable controller-level record that the request was received before any slow or failing downstream behavior could interfere with the logging thread.
Conclusion on Implementation Choices
The decision to add a pre-call log and wrap the execution in a try-finally block was a logical response to the constraints of blocking I/O. While reactive, non-blocking patterns are often preferred, existing architectural constraints sometimes necessitate the use of .block(). In such cases, the primary responsibility of the developer is to ensure that the system remains observable even when execution is stalled.
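One way to honor that responsibility is to bound the blocking wait so the controller always regains its thread and reaches the final log statement. A minimal sketch, with an illustrative timeout value:

import java.time.Duration
import org.slf4j.LoggerFactory
import reactor.core.publisher.Mono

private val logger = LoggerFactory.getLogger("NotificationTimeout")

// Illustrative: a bounded block() so a stalled downstream cannot suppress the final log.
fun <T> Mono<T>.blockWithDeadline(deadline: Duration = Duration.ofSeconds(5)): T? =
    try {
        block(deadline) // throws IllegalStateException if no value arrives within the deadline
    } catch (ex: IllegalStateException) {
        logger.warn("Downstream call exceeded {}; treating as failure", deadline, ex)
        null
    }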
By grounding the logging strategy in the lifecycle of the request rather than just the final outcome, I established a more resilient monitoring posture. The logs now clearly differentiate between request arrival, downstream processing, and final controller outcome, providing the necessary context to diagnose failures in a distributed environment.
