Jules, My AI Junior Developer
Here’s a question that emerged from my recent work with AI coding agents: to get better, more autonomous results, do you need to treat the AI less like a senior peer and more like a junior developer?
I recently spent time experimenting with Google's Jules, an AI agent designed to operate with a high degree of autonomy. The initial assumption was that I could delegate a series of tasks to a "senior" AI, provide it with a well-documented repository and clear instructions, and expect efficient execution. The experiment, however, surfaced a different, more nuanced reality about the operational model required to work effectively with today's AI agents.
My plan was to partition the work for a new project, jules-foundation, in a collaborative way:
Backend: Fully delegated to Jules.
CI/CD: I would initiate the setup, and Jules would continue it.
Frontend: A "ping-pong" approach where Jules would start, I would take over for a task using VSCode and GitHub Copilot, and then hand it back.
This process was underpinned by a detailed AGENTS.md file, which codified principles from foundational software engineering texts to guide the agent's behavior. The results were illuminating, but not for the reasons I expected.
The Experiment's Stumbles
The initial attempts at delegation quickly ran into issues that revealed the agent's limitations, not in its ability to write code, but in its judgment and awareness of context.
1. The Hallucinating Generalist
In the very first backend task—setting up a Kotlin and Micronaut project—Jules briefly defaulted to a completely different stack, attempting to implement the solution using Python and Poetry. It seemed to fall back on its generalized training data, where Python is a common choice for initial project setups. To its credit, the agent caught its own mistake and asked for confirmation before proceeding, but it was a stark reminder that even with specific instructions, the agent can be swayed by the statistical weight of its training data. It behaves like a junior developer who has a lot of theoretical knowledge but lacks the experience to apply it consistently in a specific context.
2. The Context Pollution Problem
My most significant error was continuing with the frontend task (Task 3) in the same chat I used for the backend (Task 1). After the first task was completed and merged, the main branch of the repository was updated. However, Jules, operating within its isolated chat context, was working off a stale version of the repository.
When asked to proceed, its lack of environmental awareness became clear. It stated: "I do not have a direct git pull command. My process is to complete the work and then use submit to propose the changes." Its proposed solution was to start over, re-implementing both the backend and frontend tasks from scratch. This demonstrated that long-running conversations spanning multiple, distinct tasks are unworkable. The context from previous work pollutes the agent's understanding of the current state.
3. The Over-Eager Assistant
During the frontend task, the instructions specified using simple HTML, Tailwind CSS, and Alpine.js, with Vite mentioned as an optional tool. Jules immediately planned to set up a full Vite project, concluding this was “the most professional and efficient way to approach this task.” While a reasonable conclusion for a human engineer, it was a deviation from the core requirement of simplicity. It prioritized an optimized solution over adhering to the task's constraints, forcing me to update the AGENTS.md file with a strict "Technology Constraint Mandate" to prevent such deviations.
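To give a sense of how such a rule reads, here is roughly what that mandate looks like in the rules file. The wording below is a paraphrased sketch, not the verbatim entry from my AGENTS.md:

```markdown
## Technology Constraint Mandate (paraphrased sketch)

- Use ONLY the technologies explicitly listed in the task or issue.
- Tools marked "optional" (e.g. Vite) MUST NOT be introduced unless the issue
  asks for them; prefer the simplest setup that satisfies the requirements.
- If you believe a different stack or tool is clearly better, STOP and ask for
  confirmation instead of deviating from the stated constraints.
```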
An Effective Operating Model
Through these failures, a more effective workflow emerged. It centered on providing a rigid, well-defined operational framework rather than relying on the agent's "senior" judgment.
1. One Task, One Context
The git pull fiasco taught me the most important lesson: every new task requires a new, clean context. The effective workflow is atomic and mirrors standard development practice:
Start a new "Jules task" for each new GitHub issue.
Provide the prompt, linking to the repository and the specific issue.
Let the agent fork the current main branch, implement the changes, and submit a pull request.
Review, merge, and close the task.
Repeat from step 1 for the next issue.
This approach prevents context pollution and also mitigates the risk of git conflicts, as the agent is never in a position where it needs to reconcile its work with other changes made in parallel. The tasks must be designed to be sequential and independent.
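In practice, each task prompt stays short and points at the issue rather than restating it. Something along these lines works well (the repository owner and issue number here are placeholders, for illustration only):

```text
Work on https://github.com/<owner>/jules-foundation, issue #12.
Fork the current main branch, implement only what the issue describes,
follow AGENTS.md, and submit a pull request when done.
```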
2. Define the Goal, Not Every Step
My most successful interaction was with the first task, where the goal was clear and concise: "Create a Kotlin-based Micronaut application with a single GET endpoint that returns 'Hello, World!'." I didn't over-specify the steps, which allowed the agent to complete the task in just 7 minutes.
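For context, the entire deliverable for that first task fits in a single controller. The sketch below is my own reconstruction of what such an endpoint looks like in Micronaut, not the code Jules actually produced; the package and class names are hypothetical:

```kotlin
package com.example

import io.micronaut.http.MediaType
import io.micronaut.http.annotation.Controller
import io.micronaut.http.annotation.Get

@Controller("/hello")
class HelloController {

    // Single GET endpoint returning the required plain-text response.
    @Get(produces = [MediaType.TEXT_PLAIN])
    fun index(): String = "Hello, World!"
}
```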
In contrast, my more prescriptive frontend task created blind spots. By trying to detail the steps, I inadvertently omitted small but crucial details, leading to initial friction. The key is to provide a clear objective and firm constraints but grant the agent the autonomy to handle the implementation details within those boundaries.
3. The "Rules" Are the Scaffolding
The foundational AGENTS.md file, which summarized core software engineering principles, was critical. Much like a well-crafted context can steer GitHub Copilot's suggestions, these initial instructions act as firm scaffolding for the agent's behavior. When failures occurred, I didn't just correct the agent in the chat; I updated the foundational rules. This ensures the learning is persistent and benefits all future tasks.
Unexpected Discovery: Simulating a Workflow
A fascinating insight was how to enforce quality gates without giving the agent direct access to our environment. Jules can't run a pre-commit hook, but it can be instructed to simulate one.
I created a "Pre-Flight Simulation" mandate in the rules. Before submitting code, the agent must:
Analyze the project's pre-commit configuration files.
Mentally review its generated code against every check defined in those files.
Provide a report confirming it performed the simulation.
This approach improves code quality and reduces the cost of failed CI runs by shifting quality checks earlier in the process, even if only in simulation.
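Again for illustration, the mandate in the rules file reads roughly like this (paraphrased, not the verbatim entry):

```markdown
## Pre-Flight Simulation Mandate (paraphrased sketch)

Before calling submit, you MUST:
1. Read the repository's pre-commit configuration files.
2. Review every file you generated or changed against each check defined
   there (formatting, linting, static analysis, etc.).
3. Include a short report in your final message confirming the simulation
   was performed and listing any check you could not satisfy.
```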
The Core Insight: Constraints Unlock Autonomy
This leads to the core realization: to unlock the autonomy of an AI agent, you must constrain it with a rigid, machine-readable process.
You can't treat it like a senior engineer with whom you can have a nuanced conversation. You have to manage it like a brilliant, lightning-fast, but utterly naive junior developer. It needs a "manager" to provide:
A Clear Definition of Done: The GitHub issue.
Strict Rules of Engagement: The AGENTS.md file.
An Isolated Work Environment: A new task for each new unit of work.
The agent's value isn't in its judgment or experience, but in its speed and its ability to flawlessly execute a well-defined process within a tightly controlled environment.
Conclusion: We Are Becoming Architects of AI Workflows
My experiment with Jules was a success, though not in the way I initially envisioned. The true leverage of these tools isn't just in code generation—it's in automating a workflow.
The real engineering work is shifting from pure implementation to architecting the system of rules, constraints, and processes that guide the AI. This has implications beyond just how developers work. It elevates the importance of the Business Analyst function, as creating well-defined, atomized, and unambiguous tasks is now a prerequisite for effective AI delegation. We must not only learn new skills for interacting with AI but also adapt our entire development workflow to match the capabilities of these new tools. The future isn't about replacing developers but about providing them with powerful new forms of leverage, provided we are willing to become the architects and managers of our new AI team members.