Claude Code, Codex & Gemini CLI: Shipping Agent-Written Code to Production

A new class of developer tool has gone mainstream in the last year: the agentic coding harness. Instead of autocompleting a line, these tools take a goal, read your codebase, run commands, edit files across the project and iterate until the task is done. The best known are Claude Code, OpenAI Codex, Gemini CLI, Cursor and Replit. They are astonishingly productive. They also quietly move the bottleneck: writing code is no longer the hard part - getting agent-written code to production safely is.

What is an agentic coding harness?

An agentic coding harness is an LLM wrapped in a loop that can act on your repository: it plans, edits multiple files, runs your tests and build, reads the errors, and tries again - with little or no human typing in between. That autonomy is the difference between a chat assistant that suggests snippets and a harness that ships a feature. It is also exactly why the output needs review: the agent optimizes for "the task looks done", not "this is safe to run for real customers".
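
As a rough illustration of that loop, the control flow looks something like the sketch below. This is not how any of these products is actually implemented: `propose_patch` and `run_checks` are hypothetical stand-ins for the model call and for your test/build command, and the iteration cap is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CheckResult:
    passed: bool
    output: str  # test/build output the agent reads when a check fails


def agent_loop(
    goal: str,
    propose_patch: Callable[[str, Optional[str]], str],  # hypothetical model call
    run_checks: Callable[[str], CheckResult],            # hypothetical test/build runner
    max_iterations: int = 5,
) -> List[str]:
    """Propose a patch, run the checks, feed the errors back, repeat.

    Returns every patch that was tried, in order. Raises RuntimeError if the
    checks never pass within max_iterations.
    """
    attempts: List[str] = []
    feedback: Optional[str] = None
    for _ in range(max_iterations):
        patch = propose_patch(goal, feedback)
        attempts.append(patch)
        result = run_checks(patch)
        if result.passed:
            # "Done" here only means the checks passed, not that the change is safe.
            return attempts
        feedback = result.output
    raise RuntimeError(f"checks still failing after {max_iterations} attempts")
```

The exit condition is the point of the sketch: the loop stops as soon as the checks it was given pass, so anything those checks do not cover is never examined.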

The tools, and where each one shines

Claude Code

Anthropic's terminal-native agent. Strong at multi-file reasoning, large-codebase navigation and long, structured tasks (refactors, migrations, test suites). It plans before it edits and is conservative about destructive actions, which makes it a favourite for working inside existing production code rather than greenfield demos.

OpenAI Codex

OpenAI's coding agent, available as a CLI and in the IDE. Fast at turning a written spec into a working implementation and good at well-scoped, self-contained tasks. Excellent throughput when the problem is clearly defined; the flip side is that under-specified prompts produce confident code that compiles and still does the wrong thing.

Gemini CLI

Google's open-source terminal agent with a very large context window, which helps when a change touches a lot of files at once or needs to reason over a big repository. Tightly integrated with Google's ecosystem. The large context is a strength for breadth and a risk for review depth: the more files a single change touches, the harder it is for a human to review all of it carefully.

Cursor

An AI-first editor rather than a pure CLI: agent mode plus inline editing in a familiar VS Code-style UI. The most approachable entry point for people who aren't living in the terminal, which is precisely why a lot of Cursor output reaches production without a senior pass.

Replit

An agent that lives inside Replit's cloud workspace: it writes code, provisions the database and deployment itself, exercises the app in a browser and fixes what it finds. The shortest path from idea to a running prototype with zero local setup - which is exactly why Replit-built projects so often reach production from people who don't write code at all.

Where agent-written code breaks in production

The failure modes are remarkably consistent across all five tools, because they share the same incentive - reach "it works" - and the same blind spots:

Security. Secrets committed to the repo or shipped to the browser, routes without authentication or authorization, missing input validation, no rate limiting. Independent scans of AI-built apps repeatedly find that a majority carry at least one critical flaw.
Reliability. Happy-path code that works in the demo and falls over on the inputs nobody prompted for - because there are no tests, error handling or monitoring to catch it before users do.
Cost & scaling. Naive database queries and oversized infrastructure that are invisible at ten users and a budget emergency at ten thousand.
Maintainability. Plausible-looking code that nobody on the team actually wrote or fully understands, so every later change is a gamble.
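
To make the "naive database queries" point concrete: one common shape is the N+1 pattern, where code runs one query for a list and then one more query per item. The sketch below is an illustration, not a claim about what any particular tool generates. It uses Python's built-in `sqlite3` module with hypothetical `users` and `orders` tables, and counts queries by hand.

```python
import sqlite3


def make_db(n_users: int) -> sqlite3.Connection:
    """Build a throwaway in-memory database with hypothetical users/orders tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        [(i, f"user{i}") for i in range(n_users)],
    )
    # Two orders of 10.0 per user.
    conn.executemany(
        "INSERT INTO orders (user_id, total) VALUES (?, ?)",
        [(i, 10.0) for i in range(n_users) for _ in range(2)],
    )
    return conn


def totals_naive(conn: sqlite3.Connection):
    """One query for the users, then one more per user: N + 1 round trips."""
    queries = 1
    totals = {}
    for user_id, name in conn.execute("SELECT id, name FROM users").fetchall():
        row = conn.execute(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE user_id = ?",
            (user_id,),
        ).fetchone()
        queries += 1
        totals[name] = row[0]
    return totals, queries


def totals_joined(conn: sqlite3.Connection):
    """The same result from a single aggregated JOIN."""
    rows = conn.execute(
        "SELECT u.name, COALESCE(SUM(o.total), 0) "
        "FROM users u LEFT JOIN orders o ON o.user_id = u.id "
        "GROUP BY u.id, u.name"
    ).fetchall()
    return dict(rows), 1
```

Both functions return the same totals, so a demo cannot tell them apart. The difference is that the naive version's query count grows with the number of users, which is why it only shows up under real load.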

None of this means the tools are bad. It means the harness gets you the first 80% - scaffolding, plumbing, a working happy path - and the remaining 20% is exactly the security, reliability and maintainability work that production demands. We wrote about that gap in detail in "Vibe-Coded vs Production-Ready".

A checklist to ship agent-written code safely

Whatever harness produced the code, the same review pass closes most of the gap:

Move every secret server-side and rotate anything that touched a prompt, a commit or the client bundle.
Put auth on every route - authentication and authorization - and validate all input at the boundary.
Add tests around the real paths, not just the demo flow, plus error handling and basic monitoring.
Profile the expensive queries and right-size infrastructure before launch, not after the bill arrives.
Have a senior engineer read the code the agent generated - the cheapest insurance against a confident-but-wrong implementation.
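
For the "auth on every route" and "validate at the boundary" items, one way to make the rule hard to forget is a deny-by-default route registry. The sketch below is framework-agnostic and every name in it (`route`, `dispatch`, `validate_amount`, the `/refund` path, the `admin` role) is hypothetical. It also assumes authentication has already happened upstream and produced the caller's set of roles, so it only shows the authorization check and the input validation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set


@dataclass
class Route:
    handler: Callable[[dict], dict]
    required_role: Optional[str]  # None means the route is explicitly public


ROUTES: Dict[str, Route] = {}


def route(path: str, *, required_role: Optional[str]):
    """Register a handler.

    required_role is keyword-only with no default, so a route that never
    states its auth requirement fails with a TypeError when it is defined.
    """
    def decorator(handler: Callable[[dict], dict]):
        ROUTES[path] = Route(handler, required_role)
        return handler
    return decorator


def validate_amount(payload: dict) -> int:
    """Boundary validation: reject anything that is not a positive integer."""
    amount = payload.get("amount")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, int) or amount <= 0:
        raise ValueError("amount must be a positive integer")
    return amount


@route("/health", required_role=None)
def health(payload: dict) -> dict:
    return {"ok": True}


@route("/refund", required_role="admin")
def refund(payload: dict) -> dict:
    return {"refunded": validate_amount(payload)}


def dispatch(path: str, user_roles: Set[str], payload: dict) -> dict:
    """Check authorization before the handler ever sees the payload."""
    entry = ROUTES[path]
    if entry.required_role is not None and entry.required_role not in user_roles:
        raise PermissionError(f"{path} requires role {entry.required_role!r}")
    return entry.handler(payload)
```

The design choice is that a public route has to say so (`required_role=None`), which gives a reviewer a short, searchable list of everything that is reachable without a role.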

Ship it with confidence

Agentic tools are a genuine leap - they are how a lot of great products will start. The discipline that turns that head start into a real product hasn't changed: security, reliability, and code your team can own. If you built something with Claude Code, Codex, Gemini CLI, Cursor, Replit or any other agent and you're not sure it's safe to launch, that's exactly what an audit is for.

IOTA audits, secures and ships agent-written apps to production - fixed prices, senior engineers, starting with an audit.