Claude Code: Triumphs, Trials & Trade-Offs
A deep dive into its architecture, standout features, and where it still falls short
Four months after Claude Code’s launch, we’re still seeing polarized opinions about the tool. Several of our teams are actively using it, while others are on the fence. So we decided to dig deep and get answers the only way that matters: by putting it to work.
Is Claude Code actually any good?
What can it do well, and where does it fall short?
How do you use it for optimal results?
Is it worth the price tag of $100-$200/month?
What is Claude Code?
As advertised - it’s a CLI coding agent that operates directly in your terminal, acting as an intelligent partner rather than just a code completion utility. It reads and understands your entire codebase, and claims to automate complex development workflows from start to finish.
Features
Smart CLI Agent: Executes multi-step tasks via natural language, directly in your terminal.
Full-Codebase Awareness: Understands your entire repo - no need to manually feed files.
File & Shell Access: Reads/writes files, runs tests, manages dependencies via CLI.
Git Integration: Handles commit messages, PRs, and history lookups.
Visual Understanding: Accepts screenshots/diagrams for UI and bug analysis.
Persistent Project Memory: Remembers context via CLAUDE.md and .claude/ memory files.
Reusable Commands: Define and share custom slash commands for common workflows.
Advanced Reasoning: “Extended thinking” mode for deep, multi-step logic.
Secure by Default: Runs locally with granular permissions for file/system access.
What’s the Internet Saying?
Predictably polarized. Engineers argue over capabilities and depth, while less technical folks are often dazzled. The consensus? It's impressive, but inconsistent.
Approach to this Evaluation
Not being satisfied with a check of the features from the outside, I also ventured deeper, to understand how the tool works under the hood. This eval includes:
A study of Claude Code’s architecture and how it works, with information gleaned both from developers who have looked under the hood, and from actually using the tool
A multi-task evaluation, exploring its limits under various conditions
Claude Code’s Architecture
Based on information gathered from sources both at Anthropic and outside, Claude Code appears to be a modular, multi-layered system that cleanly separates UI, core logic, AI integration, and tooling:
CLI/UI Layer: Built with what looks like Ink (React for the terminal), it parses input and renders components in the terminal.
Core Application: Orchestrates message flow, gathers context, enforces permissions.
AI Integration: Supports multiple backends (Claude, OpenAI, Gemini) with fallback and cost tracking.
Tool System: Plug-and-play access to File APIs, Shell, Git, MCP, Agent modules.
Agent Framework: Enables mini-agents with isolated contexts for parallel workflows.
Storage & Memory: Caches state and writes markdown memory files to .claude/.
Security: Sandboxing, command filters, permission prompts - a robust local-first model.
This layered, event-driven architecture ensures flexibility (multiple AI backends, dynamic tools), performance (streaming, caching, concurrent tool runs), and safety (strict validation and permission controls).
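To make the tool and permission layers more concrete, here is a minimal TypeScript sketch of a tool registry guarded by a permission check. It is an illustration only, not Claude Code's actual implementation: `ToolRegistry`, `PermissionPolicy`, the `approve` callback, and the tool names are all hypothetical.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not Claude Code internals.
type ToolResult = { ok: true; output: string } | { ok: false; error: string };

interface Tool {
  name: string;
  // True when the tool can change files or system state.
  mutating: boolean;
  run(input: string): ToolResult;
}

// Decides whether a tool call may proceed without asking the user.
type PermissionPolicy = (tool: Tool) => "allow" | "ask" | "deny";

class ToolRegistry {
  private tools = new Map<string, Tool>();
  private policy: PermissionPolicy;

  constructor(policy: PermissionPolicy) {
    this.policy = policy;
  }

  register(tool: Tool): void {
    this.tools.set(tool.name, tool);
  }

  // `approve` stands in for the interactive permission prompt.
  invoke(name: string, input: string, approve: (tool: Tool) => boolean): ToolResult {
    const tool = this.tools.get(name);
    if (!tool) return { ok: false, error: `unknown tool: ${name}` };

    const decision = this.policy(tool);
    if (decision === "deny") return { ok: false, error: `denied: ${name}` };
    if (decision === "ask" && !approve(tool)) {
      return { ok: false, error: `rejected by user: ${name}` };
    }
    return tool.run(input);
  }
}

// Example policy: read-only tools run freely, mutating tools need approval.
const policy: PermissionPolicy = (tool) => (tool.mutating ? "ask" : "allow");

const registry = new ToolRegistry(policy);
registry.register({
  name: "read_file",
  mutating: false,
  run: (path) => ({ ok: true, output: `contents of ${path}` }),
});
registry.register({
  name: "shell",
  mutating: true,
  run: (command) => ({ ok: true, output: `ran ${command}` }),
});
```

In this sketch, keeping the policy separate from the tools means new tools can be plugged in without touching the permission logic, which is the kind of separation the layered description suggests.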
Real-World Tests
For my setup, I added the context7 MCP server to help fetch the latest docs for all the tools and libraries that I needed:
claude mcp add --transport sse context7 https://mcp.context7.com/sse
Task 1 - Bootstrap a Next.js application from Zero to One
Build a simple prototype for a prompt → video generator: OpenAI for script generation, Gemini for text-to-video.
Prompts:
Download the latest docs for nextjs
Bootstrap a nextjs app from scratch
Add Clerk for authentication
With these super terse and lazy prompts, I got exactly what I wanted.
However, I did end up with some broken code on the first go. The middleware file was in the wrong directory.
Having fixed this manually, I proceeded to build a hero unit and dashboard, and add a Prisma schema.
More prompts:
I’m building a video tutorial app. Add a hero unit
Add a prisma schema for the video tutorial generator app. install prisma. Get the latest docs for prisma if required
Split up the logged in and logged out views into pages. dashboard view should be in a dashboard page and the root page should be the hero unit. add logged in status check to route the user accordingly
There is an infinite redirect loop on the dashboard page when the user is signed in. Fix it
Wire up the new tutorial button on the dashboard to a create tutorial page with a form that lets you create a tutorial. The form should accept a topic. When you type in the topic and hit submit, then it should generate a script for the user. The script should describe also the tutor's name and appearance, and the setting. Use Open AI gpt-4o to generate the script from the backend. The user may then modify the script. Once the script is ready, the user can hit generate. Generate should make a request to gemini video generation with Veo3 and generate a tutorial video based on the script. The character should be consistent through out the video.
Update the approach to use the docs here https://ai.google.dev/gemini-api/docs/video#javascript



🟢 Verdict
Handled terse prompts surprisingly well.
Output was usable, though a directory bug needed manual fixing.
Later tasks (like schema generation, auth, redirects) were mostly accurate and efficient.
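The infinite redirect loop mentioned in the prompts is a common failure mode when a signed-in check and a signed-out check both redirect. Below is a minimal sketch of a loop-free routing rule written as a pure function. The `/` and `/dashboard` paths come from the prompts above; `resolveRedirect` is a hypothetical helper, not the code Claude generated.

```typescript
// Hypothetical helper: returns the path to redirect to, or null to stay put.
function resolveRedirect(pathname: string, isSignedIn: boolean): string | null {
  const onDashboard =
    pathname === "/dashboard" || pathname.startsWith("/dashboard/");

  // Signed-out users may not see the dashboard: send them to the hero page.
  if (!isSignedIn && onDashboard) return "/";

  // Signed-in users landing on the root page go to the dashboard.
  if (isSignedIn && pathname === "/") return "/dashboard";

  // Everything else renders as-is. Returning null for a signed-in user who is
  // already on the dashboard is what prevents the loop.
  return null;
}
```

Because every redirect target is a page on which the same function returns null, following a redirect can never trigger a second one.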
Task 2 - Fix a bug on a complex existing codebase
For this one, I cloned a popular open source survey builder application - Formbricks. It’s a large Next.js application with a couple of dockerized services.
I will say that being open source with 10K stars, it did have well written documentation, and that could have been a helping factor.
Prompts:
Fix this issue:
Issue Summary
When the project name is changed, it does not update in the navigation sidebar or the project switcher. You need to manually reload the page to see the updated name.
Expected Behavior
All UI elements should reflect the updated project name automatically, without requiring a page reload.
Other information (incl. screenshots, Formbricks version, steps to reproduce,...)
Steps to reproduce:
Navigate to Project Settings (Configuration) > General
Edit the project name
Click the Update button
The project name updates in the settings but does not update in the sidebar or project selector
⚠️ Verdict:
The issue was first solved Web 1.0 style - by adding a refresh after updating the form. I had to provide Claude a bit of constructive criticism to get what I wanted.
Quick turnaround (~15 mins). Smarter than a junior dev, but needed nudging.
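For contrast with the reload-based fix, here is a framework-free sketch of the alternative: keep the project name in shared state that the sidebar and project switcher subscribe to. This is only an illustration of the idea; `createStore` and the label variables are hypothetical, and this is not the change that was applied to Formbricks.

```typescript
type Listener<T> = (value: T) => void;

// Hypothetical minimal store: holds one value and notifies subscribers on change.
function createStore<T>(initial: T) {
  let value = initial;
  const listeners = new Set<Listener<T>>();
  return {
    get: (): T => value,
    set(next: T): void {
      value = next;
      listeners.forEach((listener) => listener(value));
    },
    // Returns an unsubscribe function.
    subscribe(listener: Listener<T>): () => void {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
  };
}

const projectName = createStore("Old name");

// Stand-ins for the sidebar and project switcher UI elements.
let sidebarLabel = projectName.get();
let switcherLabel = projectName.get();
projectName.subscribe((name) => { sidebarLabel = name; });
projectName.subscribe((name) => { switcherLabel = name; });

// Called after the settings form saves successfully: no page reload needed.
projectName.set("New name");
```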
Task 3 - Refactor Code On a large existing codebase
I continued with the Formbricks repo, and broke this one down into two subtasks.
Task 3.1 Low impact chore - add improved logging in the auth module
Prompt:
Enhance Debug Logging in SAML SSO Authentication Flow
Task description: Currently, debugging issues in the SAML SSO authentication flow is not seamless. It often requires multiple back-and-forth communications with users to understand their setup and provide appropriate troubleshooting steps. To improve the debugging experience and expedite issue resolution, please add comprehensive debug logs at various key points in the SAML SSO authentication flow. Enhanced logging will help us quickly identify and diagnose issues without requiring extensive input from end users.
Requested Changes: Add detailed debug logs at relevant stages of the SAML SSO authentication process. Ensure logs capture essential configuration, errors, and state transitions to facilitate easier troubleshooting. Review existing log statements and enhance them where necessary for clarity and completeness
🟢 Verdict:
Simple, solid execution. ~45 minutes saved.
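As an illustration of the kind of change this task asks for, here is a small sketch of a structured debug-log helper that records the stage of the flow while masking sensitive values. The helper names, the stage name, and the field names are assumptions made for the example; they are not taken from Formbricks or from any SAML library.

```typescript
// Hypothetical helper: the key list is illustrative, not a real schema.
const SENSITIVE_KEYS = new Set(["password", "clientSecret", "samlResponse", "privateKey"]);

// Replaces the values of sensitive keys so they never reach the logs.
function redact(details: Record<string, unknown>): Record<string, unknown> {
  const safe: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(details)) {
    safe[key] = SENSITIVE_KEYS.has(key) ? "[REDACTED]" : value;
  }
  return safe;
}

// Produces one JSON line per event so logs stay easy to search.
function formatDebugLog(stage: string, details: Record<string, unknown>): string {
  return JSON.stringify({ level: "debug", flow: "saml-sso", stage, ...redact(details) });
}

console.log(formatDebugLog("callback-received", { tenant: "acme", samlResponse: "PHNhbWw..." }));
```

Redaction is a design choice added for this sketch: verbose logs in an authentication flow are only safe to share if secrets and raw assertions are kept out of them.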
Task 3.2 High impact refactor - rewrite template factory for better reusability
Prompt
Refactor Survey Templates to Use Builder Functions and Reduce Code Duplication
Task description
Context Our current apps/web/app/lib/templates.ts file contains a large amount of repetitive code for survey/question templates. SonarQube is flagging this as a major source of code duplication, which negatively impacts our code quality metrics and maintainability.
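To show what "builder functions" means in this context, here is a minimal sketch of the pattern. The `Question` shape, the builder names, and the default values are hypothetical simplifications; the real Formbricks template types are not reproduced here.

```typescript
// Hypothetical shapes, simplified for illustration.
interface Question {
  id: string;
  type: "openText" | "rating";
  headline: string;
  required: boolean;
  range?: number;
}

// Callers may override defaults, but not the id or the question type.
type QuestionOverrides = Partial<Omit<Question, "id" | "type">>;

// Shared defaults live in one place instead of being repeated per template.
function buildOpenTextQuestion(id: string, headline: string, overrides: QuestionOverrides = {}): Question {
  return { id, type: "openText", headline, required: true, ...overrides };
}

function buildRatingQuestion(id: string, headline: string, overrides: QuestionOverrides = {}): Question {
  return { id, type: "rating", headline, required: true, range: 5, ...overrides };
}

// A template now reads as a short list of builder calls.
const feedbackTemplate = {
  name: "Product feedback",
  questions: [
    buildRatingQuestion("q1", "How satisfied are you?"),
    buildOpenTextQuestion("q2", "What could we improve?", { required: false }),
  ],
};
```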
🟠 Verdict
The process was very involved. I had to constantly keep my Claude Code terminal active to give feedback or unblock it on various decisions.
The output was average. I could use it, but had to provide several follow-up prompts to get it into a shape that I deemed suitable.
It followed the coding standards in the repo, which was a pleasant surprise - not common for a new junior dev on your team
Task 4 - Rewrite an existing Next.js app in Python + Svelte
For this task - I used the Video Tutorial Generator app that I previously built with Claude Code.
Prompts:
Convert the Next.js app in video_tutorial to Python. Use Svelte for the frontend and Python with Fast API for the backend. Use poetry to manage dependencies, and PSQL with alembic for migrations
I ended up with a slightly broken backend and a working frontend, both of which I was able to get up and running within about 10 minutes.
👎 Outcome:
The newly ported app was riddled with problems
Generated code structure was unusable. Spent well over an hour prompting and reshaping the structure of the backend app
Clerk authentication was thrown away without approval, and a broken JWT implementation took its place. Required manual debugging to fix it
Gemini integration was completely stubbed out, and reduced to a canned output video
Emotion at the end of task 4 - frustration.
Takeaway: An Impressive, Flawed Glimpse into the Future
Claude Code is neither a silver bullet nor a gimmick - it’s a powerful CLI-based coding agent that lives somewhere in between.
Incredibly Smart and Inexplicably Dumb at the same time - It’ll architect an elegant backend, then decide to route your auth flow through a random file for no reason. Flashes of genius followed by baffling choices.
Blazing Fast for Prototyping - If you’re building from scratch, Claude Code is an absolute weapon. It eats boilerplate for breakfast and gets MVPs shipped fast.
Brilliant, But Needy - Think of it as a genius intern: sharp ideas, excellent recall, but weak decision making capabilities. You’ll babysit it more than you want to.
Refactors are sometimes garbage - For large codebases, it's hit-or-miss. You’ll often spend more time steering it than if you just did the job yourself.
Understands Context Like a Pro - Its architectural awareness and full-repo memory are impressive. This isn’t your average autocomplete gimmick.
Fails Hard on Complex Rewrites - Once it’s off the rails, it stays off the rails. Expect to step in and clean up.
Great Imitator, Weak Innovator - Follows repo conventions better than a junior dev, but don’t expect groundbreaking solutions.
Mediocre at Multi-Agent Orchestration - The agent framework sounds revolutionary. In practice? It's mostly just elaborate glorified scripting.
$200/mo Is Steep... But totally worth it - If you’re able to incorporate it into your daily workflow.
Not for the Lazy or the Naive - Claude Code punishes vague prompts and rewards specificity. If you’re not willing to drive, it won’t take you anywhere useful.
Verdict - Game-changing if you treat it like a co-pilot. A frustrating toy if you expect it to fly the plane.
How to Use Claude Code Effectively
Claude Code shines when used the right way, and flounders when pushed out of its depth. These strategies will help you get the most out of it without falling into common traps.
1. Stay Present in CLI Mode
Claude’s CLI agent is not designed to be fire-and-forget. It frequently pauses for clarifications or permission grants, requiring active participation.
🛠 Pro tip: Use it while doing low-effort work on your second screen so you can respond promptly.
⚠️ You can bypass prompts using --dangerously-skip-permissions, but as advertised - it’s dangerous.
2. Use GitHub Actions for Hands-Off Tasks
Claude Code also runs through GitHub Actions, offering a more passive alternative for routine automation.
However:
Setup is higher-effort compared to OpenAI Codex.
It consumes GitHub Actions compute minutes - beware hidden costs.
3. Be Specific, Or Be Disappointed
Claude Code often performs poorly with vague or lazy prompts. Production-grade results demand structure: specify coding patterns and architectural expectations.
💡 Bonus: Write a CODING_GUIDELINES.md or fill in .claude/CLAUDE.md. It will honor conventions like a diligent junior dev.
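One way to make that structure habitual is to assemble prompts from a small spec rather than typing them freehand. The sketch below is a hypothetical helper, not a Claude Code feature: `PromptSpec`, `buildPrompt`, and the section titles are assumptions, and the example values are drawn from the Task 4 failures described earlier.

```typescript
// Hypothetical helper for assembling a structured prompt.
interface PromptSpec {
  task: string;
  codingPatterns: string[];
  architecture: string[];
}

function buildPrompt(spec: PromptSpec): string {
  // Renders a titled bullet list, or nothing when the list is empty.
  const section = (title: string, items: string[]): string =>
    items.length === 0
      ? ""
      : `\n${title}:\n${items.map((item) => `- ${item}`).join("\n")}`;

  return (
    `Task: ${spec.task}` +
    section("Coding patterns", spec.codingPatterns) +
    section("Architectural expectations", spec.architecture)
  );
}

const prompt = buildPrompt({
  task: "Port the backend of video_tutorial to Python with FastAPI",
  codingPatterns: ["Use poetry to manage dependencies"],
  architecture: [
    "Keep Clerk for authentication; do not replace it",
    "Do not stub out the Gemini integration",
  ],
});
console.log(prompt);
```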
4. Use It for Greenfield Prototypes, Not Legacy Refactors
Claude Code is a weapon when starting from scratch. It handles scaffolding, boilerplate, and MVP logic quickly and coherently - getting your idea live fast.
But large-scale refactors? Expect to babysit. For example, rewriting template logic in a mature codebase required frequent steering and manual patching.
✅ Ideal: New Next.js app, backend boilerplate, scaffolded flows
❌ Risky: Large-scale code reuse, deep refactors, architectural overhauls
5. Avoid Cross-Stack Rewrites
One major failure mode: translating apps across languages or frameworks. When asked to rewrite a Next.js app into Python + Svelte, Claude stumbled hard:
Broken backend structure
Incorrect auth migration
Gemini integration stubbed into nonsense
If you’re hoping for a smooth port from one stack to another, don’t expect Claude Code to do it cleanly without heavy involvement.
6. Let It Handle Chores, Not Strategy
Claude excels at tasks like:
Adding logging
Fixing simple bugs
Updating schemas
Automating CLI-heavy tasks
But for anything requiring strategic judgment or creative thinking - like solving an ambiguous bug, rethinking architecture, or choosing between libraries - Claude needs a lot of hand-holding.
🧠 Think of it this way: Claude is a fantastic executor, but a mediocre architect. You lead, it follows.
So is it worth the leap from Cursor + GPT?
One area that I'd have liked to evaluate, but couldn't get around to: DevOps automation using cloud provider MCP servers. If you're using this actively, or would like to see a deep dive on that, let me know.