Designing a Duplicate Protection Hashing Service
In a recent architectural assignment, I was tasked with implementing a duplicate-protection data hashing service for a high-throughput Kotlin application. This service operates within an enterprise runtime environment utilizing Java 21 and is deployed on AWS ECS. The fundamental requirement was to convert incoming JSON payloads into a stable, unique string representation to prevent duplicate processing within the system pipeline.
Initially, I gravitated toward the MD5 algorithm. This selection was informed by a historical performance convention, assuming MD5 would minimize latency for an internal data validation check where cryptographic security was not a driving requirement. Because I routinely leverage generative artificial intelligence to streamline production workflows, I provided an AI assistant with a prompt detailing these exact constraints: a high-throughput Kotlin service tasked with producing an MD5 hash from an incoming JSON payload.
The Baseline AI-Generated Implementation
The generative AI model delivered exactly what was requested based on those initial assumptions. It produced a complete, functional component wrapper that mapped incoming data to an MD5 hash. Because the initial prompt did not ask the AI to evaluate potential platform regressions or optimize memory allocations, the model generated a literal translation of the requested functional requirements, complete with standard boilerplate utilities.
import com.fasterxml.jackson.databind.JsonNode
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.databind.node.ArrayNode
import com.fasterxml.jackson.databind.node.ObjectNode
import org.springframework.stereotype.Component
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

@Component
class HashGenerator(
    private val objectMapper: ObjectMapper,
) {
    fun generate(
        conversationId: String?,
        payload: JsonNode,
    ): String {
        val normalizedConversationId = conversationId.orEmpty()
        val canonicalPayloadBytes = objectMapper.writeValueAsBytes(canonicalize(payload))
        val conversationBytes = normalizedConversationId.toByteArray(StandardCharsets.UTF_8)
        val md5 = MessageDigest.getInstance(MD5_ALGORITHM)
        md5.update(conversationBytes)
        md5.update(canonicalPayloadBytes)
        return md5.digest().toHexString()
    }

    private fun ByteArray.toHexString(): String {
        val chars = CharArray(size * 2)
        forEachIndexed { index, byte ->
            val value = byte.toInt() and 0xFF
            chars[index * 2] = HEX_CHARS[value ushr 4]
            chars[(index * 2) + 1] = HEX_CHARS[value and 0x0F]
        }
        return String(chars)
    }

    private fun canonicalize(node: JsonNode): JsonNode =
        when {
            node.isObject -> {
                val objectNode = node as ObjectNode
                val sortedFields =
                    objectNode
                        .fields()
                        .asSequence()
                        .toList()
                        .sortedBy { it.key }
                val canonicalObject = objectMapper.nodeFactory.objectNode()
                sortedFields.forEach { (key, value) ->
                    canonicalObject.set<JsonNode>(key, canonicalize(value))
                }
                canonicalObject
            }
            node.isArray -> {
                val arrayNode = node as ArrayNode
                val canonicalArray = objectMapper.nodeFactory.arrayNode()
                arrayNode.forEach { item ->
                    canonicalArray.add(canonicalize(item))
                }
                canonicalArray
            }
            else -> node
        }

    private companion object {
        const val MD5_ALGORITHM = "MD5"
        val HEX_CHARS = "0123456789abcdef".toCharArray()
    }
}
The Conflict of Speed Versus Security
While the code executed correctly in the testing environment, it triggered a critical security flag during static code analysis in SonarQube. Our internal security champion mandated an immediate transition to SHA-256, citing MD5's known collision weaknesses. This requirement prompted a broader team discussion about the trade-offs between processing speed and cryptographic security within microservices.
To resolve this conflict, I conducted a deeper investigation into the execution paths of the hashing utility. The findings completely reframed the problem space. On a modern Java 21 runtime running on optimized cloud infrastructure, the difference in execution time between MD5 and SHA-256 is negligible in practice. The true computational bottlenecks were located within the data preprocessing layers rather than the mathematical operations of the message digest.
Key Insight: Upgrading an algorithm to meet security compliance rarely degrades system performance noticeably when the surrounding data manipulation logic dominates the cost. The true latency hotspots frequently reside in object serialization and memory allocation patterns.
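A rough way to sanity-check this claim on a given machine is to time both algorithms over the same input. The sketch below is only an indicator, not a rigorous benchmark: the helper names timeDigest and digestLength are hypothetical, it uses only the JDK, and a trustworthy comparison would need a proper harness with warm-up and multiple runs.

```kotlin
import java.security.MessageDigest

// Times `iterations` digests of `input` with the given JCA algorithm name.
// Returns elapsed nanoseconds; a rough indicator only, not a benchmark result.
fun timeDigest(algorithm: String, input: ByteArray, iterations: Int): Long {
    val digest = MessageDigest.getInstance(algorithm)
    val start = System.nanoTime()
    repeat(iterations) {
        digest.update(input)
        digest.digest() // digest() also resets the instance for reuse
    }
    return System.nanoTime() - start
}

// Length in bytes of the digest the algorithm produces for `input`.
fun digestLength(algorithm: String, input: ByteArray): Int =
    MessageDigest.getInstance(algorithm).digest(input).size
```

Calling timeDigest("MD5", payload, 100_000) and timeDigest("SHA-256", payload, 100_000) a few times each, and discarding the first runs, gives a first impression of how small the gap is compared with the serialization work around it.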
Identifying the True Microbenchmarking Hotspots
The profiling data isolated three specific risks in the original code:
The canonicalization routine introduced deep recursion. Converting JSON fields into sequences, collecting them into lists, and sorting them generated an unsustainable volume of short-lived heap objects. This structure risks triggering frequent JVM Garbage Collection pauses under high throughput.
Jackson serialization via the writeValueAsBytes function consumed substantially more CPU cycles than any subsequent hashing operation. Transforming a newly instantiated object graph into a raw byte array is computationally expensive.
The manual byte-to-hex manipulation loop, while functional, missed the low-level optimizations provided by modern platform utilities.
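For the third point, one platform utility that covers this job is java.util.HexFormat, available since Java 17. The sketch below places a copy of the baseline loop next to the platform call so their output can be compared directly. The function names manualHex and platformHex are hypothetical, and the sketch demonstrates equivalence of output only, not relative speed.

```kotlin
import java.util.HexFormat

val HEX_CHARS = "0123456789abcdef".toCharArray()

// Manual encoding, mirroring the loop in the baseline implementation.
fun manualHex(bytes: ByteArray): String {
    val chars = CharArray(bytes.size * 2)
    bytes.forEachIndexed { index, byte ->
        val value = byte.toInt() and 0xFF
        chars[index * 2] = HEX_CHARS[value ushr 4]
        chars[index * 2 + 1] = HEX_CHARS[value and 0x0F]
    }
    return String(chars)
}

// Platform encoding; HexFormat.of() produces lowercase hex with no delimiter.
fun platformHex(bytes: ByteArray): String = HexFormat.of().formatHex(bytes)
```

Because both functions return the same lowercase string, swapping one for the other does not change any previously computed hash values for the same digest bytes.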
Implementing Immediate Algorithmic and Structural Upgrades
The first step involved addressing the security non-compliance while cleaning up the obvious inefficiencies. I replaced the manual hex encoding with the native HexFormat utility introduced in Java 17 and further optimized in Java 21. Concurrently, I transitioned the algorithm to SHA-256, which the JVM can accelerate with hardware intrinsics on processors that support SHA extensions.
import com.fasterxml.jackson.databind.JsonNode
import com.fasterxml.jackson.databind.ObjectMapper
import org.springframework.stereotype.Component
import java.nio.charset.StandardCharsets
import java.security.MessageDigest
import java.util.HexFormat

@Component
class HashGenerator(
    private val objectMapper: ObjectMapper,
) {
    // HexFormat instances are immutable and thread-safe, so one can be shared.
    private val hexFormatter = HexFormat.of()

    fun generate(
        conversationId: String?,
        payload: JsonNode,
    ): String {
        val normalizedConversationId = conversationId.orEmpty()
        val canonicalPayloadBytes = objectMapper.writeValueAsBytes(canonicalize(payload))
        val conversationBytes = normalizedConversationId.toByteArray(StandardCharsets.UTF_8)
        // MessageDigest is stateful and not thread-safe, so create one per call.
        val sha256 = MessageDigest.getInstance(SHA256_ALGORITHM)
        sha256.update(conversationBytes)
        sha256.update(canonicalPayloadBytes)
        return hexFormatter.formatHex(sha256.digest())
    }

    // canonicalize(node) is carried over unchanged from the baseline version at this stage.

    private companion object {
        const val SHA256_ALGORITHM = "SHA-256"
    }
}
Optimizing the Canonicalization Routine
Resolving the security alert was an essential compliance milestone, but achieving production-grade execution required a total refactoring of the canonicalize function. To eliminate high allocation rates and latency spikes, I rewrote the structural transformation logic to treat heap memory defensively.
import com.fasterxml.jackson.databind.JsonNode
import com.fasterxml.jackson.databind.node.ObjectNode
import java.util.TreeMap

// Member of HashGenerator; relies on the injected objectMapper.
private fun canonicalize(node: JsonNode): JsonNode =
    when {
        node.isObject -> {
            // TreeMap keeps keys in natural String order as they are inserted.
            val sortedMap = TreeMap<String, JsonNode>()
            val fieldsIterator = node.fields()
            while (fieldsIterator.hasNext()) {
                val entry = fieldsIterator.next()
                sortedMap[entry.key] = canonicalize(entry.value)
            }
            // Hand the populated map straight to ObjectNode instead of calling set() per field.
            ObjectNode(objectMapper.nodeFactory, sortedMap)
        }
        node.isArray -> {
            // Array order is significant, so elements are copied in place, pre-sized.
            val canonicalArray = objectMapper.nodeFactory.arrayNode(node.size())
            for (item in node) {
                canonicalArray.add(canonicalize(item))
            }
            canonicalArray
        }
        else -> node
    }
Strategic Improvements in Memory Management
The refactored canonicalization routine rests on four design choices, each covered in its own subsection.
Automatic Sorting via TreeMap
The original logic pulled object fields into a Kotlin sequence, copied them into a temporary list, and sorted that list with a key selector. The optimized approach inserts fields directly into a java.util.TreeMap. Backed by a red-black tree, the TreeMap keeps keys in their natural String order as they are inserted, which removes the intermediate list and the separate sort pass. It still allocates one tree entry per key, so the saving comes from the discarded temporary collections rather than from allocation-free sorting.
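The two approaches can be compared without Jackson by using a plain Map as a stand-in for the object's fields. In this minimal sketch the function names are hypothetical and the Int values are placeholders; it only shows that both routes yield the same key order.

```kotlin
import java.util.TreeMap

// Baseline style: materialize entries into a list, then sort a copy by key.
fun sortedKeysViaList(fields: Map<String, Int>): List<String> =
    fields.entries.asSequence().toList().sortedBy { it.key }.map { it.key }

// TreeMap style: keys come out in natural String order with no separate sort step.
fun sortedKeysViaTreeMap(fields: Map<String, Int>): List<String> {
    val sorted = TreeMap<String, Int>()
    for ((key, value) in fields) {
        sorted[key] = value
    }
    return sorted.keys.toList()
}
```

Note that natural String order compares UTF-16 code units, so an uppercase key such as "B" sorts before a lowercase "a". That is fine for hashing as long as every producer of the hash uses the same ordering.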
Elimination of Lambda Allocations
Chains of functional methods like asSequence, toList, and sortedBy generate short-lived objects behind the scenes, such as the sequence wrapper, the intermediate list, and the sorted copy. In a high-throughput Spring Boot architecture, these objects increase the allocation rate and burden the garbage collector. Replacing the functional chain with explicit while and for loops avoids those intermediate objects inside the iteration logic.
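The loop-based style can be tried without Jackson on the classpath. The following sketch is an illustration under stated assumptions, not the service code: nested Map and List values stand in for JsonNode, the names canonicalize and fingerprint are hypothetical, and toString() replaces real JSON serialization.

```kotlin
import java.security.MessageDigest
import java.util.HexFormat
import java.util.TreeMap

// Stand-in for JsonNode: nested Map<String, Any?>, List<Any?>, or scalar values.
fun canonicalize(value: Any?): Any? =
    when (value) {
        is Map<*, *> -> {
            val sorted = TreeMap<String, Any?>()
            for ((key, child) in value) {
                sorted[key.toString()] = canonicalize(child)
            }
            sorted
        }
        is List<*> -> {
            val result = ArrayList<Any?>(value.size) // pre-sized, element order preserved
            for (item in value) {
                result.add(canonicalize(item))
            }
            result
        }
        else -> value
    }

// Hashes the canonical form's toString(); a real service would serialize to JSON instead.
fun fingerprint(value: Any?): String {
    val bytes = canonicalize(value).toString().toByteArray(Charsets.UTF_8)
    return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(bytes))
}
```

Two maps that differ only in key order produce the same fingerprint, while reordering the elements of a list changes it, which matches the behavior expected from the Jackson-based routine.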
Pre-Sized Array Allocation
The initial implementation created the array node through the no-argument factory method, which leaves the underlying storage at its default capacity. When populating large arrays, that backing storage must grow repeatedly, allocating a larger array and copying the existing elements each time. Passing node.size() as the initial capacity allocates the required storage up front.
Direct Constructor Instantiation
The default initialization sequence of a Jackson ObjectNode instantiates an internal LinkedHashMap before receiving data updates via the set method. The revised approach uses a public constructor that directly accepts the pre-populated TreeMap, which skips both the extra map allocation and the per-field set calls.
The Production-Ready Hash Service
Combining these algorithmic upgrades and memory management adjustments results in a secure, performant, and enterprise-grade component.
import com.fasterxml.jackson.databind.JsonNode
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.databind.node.ObjectNode
import org.springframework.stereotype.Component
import java.nio.charset.StandardCharsets
import java.security.MessageDigest
import java.util.HexFormat
import java.util.TreeMap

@Component
class HashGenerator(
    private val objectMapper: ObjectMapper,
) {
    // HexFormat instances are immutable and thread-safe, so one can be shared.
    private val hexFormatter = HexFormat.of()

    fun generate(
        conversationId: String?,
        payload: JsonNode,
    ): String {
        val normalizedConversationId = conversationId.orEmpty()
        val canonicalPayloadBytes = objectMapper.writeValueAsBytes(canonicalize(payload))
        val conversationBytes = normalizedConversationId.toByteArray(StandardCharsets.UTF_8)
        // MessageDigest is stateful and not thread-safe, so create one per call.
        val sha256 = MessageDigest.getInstance(SHA256_ALGORITHM)
        sha256.update(conversationBytes)
        sha256.update(canonicalPayloadBytes)
        return hexFormatter.formatHex(sha256.digest())
    }

    private fun canonicalize(node: JsonNode): JsonNode =
        when {
            node.isObject -> {
                // TreeMap keeps keys in natural String order as they are inserted.
                val sortedMap = TreeMap<String, JsonNode>()
                val fieldsIterator = node.fields()
                while (fieldsIterator.hasNext()) {
                    val entry = fieldsIterator.next()
                    sortedMap[entry.key] = canonicalize(entry.value)
                }
                ObjectNode(objectMapper.nodeFactory, sortedMap)
            }
            node.isArray -> {
                // Array order is significant, so elements are copied in place, pre-sized.
                val canonicalArray = objectMapper.nodeFactory.arrayNode(node.size())
                for (item in node) {
                    canonicalArray.add(canonicalize(item))
                }
                canonicalArray
            }
            else -> node
        }

    private companion object {
        const val SHA256_ALGORITHM = "SHA-256"
    }
}
The Intersection of Generative Artificial Intelligence and Enterprise Engineering
This optimization exercise highlights a critical reality regarding the application of generative artificial intelligence within enterprise software development. The initial AI-generated code was not technically broken. It accurately realized the precise constraints of the original prompt: it calculated an MD5 hash over an object payload. The defect was rooted in my own outdated assumptions about cryptographic overhead and the omission of strict platform analysis in the initial prompt requirements.
Takeaway: Generative artificial intelligence operates as an exceptional execution mechanism, but it lacks the contextual capacity to independently enforce enterprise-grade performance boundaries without human engineering oversight. The true value of the technology lies in its capacity to serve as an interactive learning accelerator, contracting traditional research cycles from hours down to a matter of minutes.
