Building AI Agents: The Hard Part Isn't Code

Gopal Khadka
Feb 25, 2026
Everyone's building agents. Most are building them wrong.
Not because they can't code. Because they skip the decisions that matter.
I've built AI agents for research, data extraction, web scraping, and orchestration. The ones that failed didn't fail in implementation. They failed in design — questions I didn't ask before writing a single line.
Here's what I learned: the hard part of building AI agents isn't code. It's decisions.
Why Building AI Agents Fails at the Design Phase
Agent tutorials teach you frameworks. They don't teach you when to use them.
Should this be an agent or a workflow? Big model or small? How much autonomy? What happens when it fails?
These questions shape everything downstream. Skip them and you'll build something that works in demos and breaks in production.
Agent vs Workflow: The First Decision When Building AI Agents
This is the decision most people get wrong.
An agent decides what to do next. It has autonomy. It picks tools, loops, adjusts. Powerful — but unpredictable.
A workflow follows a fixed path. Step one, then step two, then step three. Predictable — but rigid.
The rule: if the steps are known upfront, use a workflow. If the LLM must decide what to do based on context, use an agent.
Most people default to agents when a workflow would do. They add complexity they don't need, then spend weeks debugging autonomy they never wanted.
Start with a workflow. Graduate to an agent only when you need the flexibility.
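To make the distinction concrete, here's a minimal sketch of both shapes. `call_llm` is a hypothetical stand-in for any LLM client, and the tools are placeholder Python functions — the point is the control flow, not the steps themselves.

```python
def summarize(text: str) -> str:
    return text[:100]      # placeholder step

def translate(text: str) -> str:
    return text.upper()    # placeholder step

# Workflow: the steps are fixed in code. Predictable, easy to test.
def workflow(text: str) -> str:
    summary = summarize(text)
    return translate(summary)

# Agent: the model picks the next step each turn. Flexible, harder to predict.
def agent(task: str, call_llm, tools: dict, max_turns: int = 5) -> str:
    state = task
    for _ in range(max_turns):
        choice = call_llm(f"Task: {task}\nState: {state}\nPick a tool or say DONE")
        if choice == "DONE":
            return state
        state = tools[choice](state)
    return state
```

Notice that the workflow's behavior is fully determined by its input, while the agent's behavior depends on what the model decides each turn. That difference is exactly what you're signing up for.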
The Guardrail Paradox for LLM Agents
Too few guardrails: the agent breaks on basics. Formats wrong. Types wrong. Misses obvious expectations.
Too many guardrails: the agent stops thinking. Give it strict examples and hard rules and it copies instead of learning. It imitates your patterns instead of understanding your intent.
The balance isn't a formula. It's taste — developed through iteration.
My approach: start loose, tighten where it fails. Don't over-specify upfront. Let the agent show you where it needs constraints.
The AI Agent Design Checklist: Questions Before Code
Before building any agent, answer these:
Architecture
Should this be an agent or a workflow? If steps are predictable, workflow. If decisions depend on context, agent. Don't add autonomy you don't need.
Does each agent have a single, clear responsibility? One agent, one job. Researcher researches. Parser parses. Orchestrator coordinates. Blur the lines and you'll debug for days.
Is an LLM call necessary here? If a Python function can do it precisely and deterministically, skip the LLM. Not everything needs intelligence. Some things need reliability.
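A concrete example of the third question: extracting email addresses from text. An LLM can do this; a regex does it faster, cheaper, and exactly the same way every time. (The pattern below is a simplified sketch, not a full RFC-compliant email matcher.)

```python
import re

# Mechanical extraction: no LLM call needed, fully reproducible.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def extract_emails(text: str) -> list[str]:
    return EMAIL.findall(text)
```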
Guardrails & Structure
Should the output be structured or prose? Prefer structured (JSON, schema, typed objects) when downstream code consumes it. Use prose only when humans read it directly.
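One way to enforce "structured, not prose" at the boundary — parse the model's JSON into a typed object so bad output fails loudly at the edge, not deep in the pipeline. The `ResearchResult` shape here is illustrative, not a prescribed schema.

```python
import json
from dataclasses import dataclass

@dataclass
class ResearchResult:
    topic: str
    sources: list[str]
    confidence: float

def parse_result(raw: str) -> ResearchResult:
    data = json.loads(raw)  # raises immediately on malformed output
    return ResearchResult(
        topic=str(data["topic"]),
        sources=[str(s) for s in data["sources"]],
        confidence=float(data["confidence"]),
    )
```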
How much autonomy should this agent have? Define the boundaries. Can it pick tools? Trigger subagents? Loop indefinitely? Autonomy without boundaries is chaos.
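Boundaries are easiest to enforce in the loop itself. A sketch: the agent may only use allowlisted tools, and it gets a hard cap on turns. Names here are illustrative.

```python
class BudgetExceeded(Exception):
    pass

def bounded_agent(decide, tools: dict, state, max_turns: int = 10):
    for _ in range(max_turns):
        action = decide(state)            # the LLM picks the next action
        if action == "DONE":
            return state
        if action not in tools:           # refuse anything off the allowlist
            raise PermissionError(f"tool not allowed: {action}")
        state = tools[action](state)
    raise BudgetExceeded("agent hit the turn limit")
```

The cap turns "loop indefinitely?" from an open question into an explicit number you chose.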
Are we providing enough context without overloading? Feed only what's needed. Too little context and it guesses wrong. Too much and it gets confused. Structure long context with clear sections or XML tags.
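Structuring context with tags can be as simple as a prompt builder like this sketch — task and documents in clearly delimited sections, so the model knows where one input ends and the next begins.

```python
def build_prompt(task: str, documents: list[str]) -> str:
    # Wrap each document in its own tag so boundaries are unambiguous.
    doc_blocks = "\n".join(
        f'<document index="{i}">\n{doc}\n</document>'
        for i, doc in enumerate(documents, 1)
    )
    return f"<task>\n{task}\n</task>\n<context>\n{doc_blocks}\n</context>"
```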
Failure Modes
What happens when parsing fails? It will fail. Plan for it. Retry? Fallback parser? Ask the LLM to try again? Decide before it happens.
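"Decide before it happens" can look like a layered parse: strict JSON first, then salvage a fenced block (a common failure mode where the model wraps JSON in markdown), then return `None` so the caller can re-prompt. A sketch:

```python
import json
import re

def parse_with_fallback(raw: str):
    try:
        return json.loads(raw)            # happy path: clean JSON
    except json.JSONDecodeError:
        pass
    # Fallback: the model wrapped its JSON in a markdown fence.
    match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", raw, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(1))
        except json.JSONDecodeError:
            pass
    return None                           # caller decides: retry, or abort
```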
How will we know if this agent is working? Define success. Latency? Accuracy? Completion rate? If you can't measure it, you can't improve it.
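Measurement doesn't need infrastructure on day one. A bare-bones tracker that wraps each run and records latency and completion is enough to start — metric names here are illustrative.

```python
import time

class AgentMetrics:
    def __init__(self):
        self.runs = []

    def record(self, fn, *args):
        start = time.perf_counter()
        try:
            result = fn(*args)
            ok = True
        except Exception:
            result, ok = None, False
        self.runs.append({"ok": ok, "latency_s": time.perf_counter() - start})
        return result

    def completion_rate(self) -> float:
        return sum(r["ok"] for r in self.runs) / len(self.runs)
```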
Choosing the Right Model for Your LLM Agent
Not every task needs your biggest model.
Small models for simple, deterministic tasks. Classification. Extraction. Formatting. Fast, cheap, reliable.
A classification call on a small model runs in 200ms for pennies. The same call on a frontier model takes 3 seconds and costs 50x more. Match the model to the task.
Large models for reasoning, planning, multi-step decisions. When the agent needs to think, let it think — but only then.
"Thinking mode" is expensive. Enable it for complex orchestration. Disable it for everything else.
Agentic Design Patterns: Small Agents, Clear Responsibilities
The best agent architectures look simple.
One orchestrator coordinates. Small specialized agents handle specific tasks — research, parsing, fetching, analysis. Each one does one thing well.
In practice: one orchestrator, one researcher, one parser. Three agents with clear boundaries — not thirteen with overlapping jobs.
The orchestrator delegates and combines. The subagents execute and return. Clean boundaries. Predictable behavior.
When an agent does too much, it fails in unpredictable ways. When responsibilities are clear, failures are obvious and fixable.
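The orchestrator-plus-subagents shape fits in a few lines. In this sketch the researcher and parser are stubs standing in for LLM-backed agents; the structure is what matters — each role does one thing, and only the orchestrator combines.

```python
def researcher(query: str) -> str:
    return f"raw notes about {query}"     # stub: would call a research agent

def parser(notes: str) -> dict:
    return {"summary": notes.strip()}     # stub: would call a parsing agent

def orchestrator(query: str) -> dict:
    notes = researcher(query)             # delegate
    result = parser(notes)                # delegate
    result["query"] = query               # combine
    return result
```

If the researcher breaks, you look at the researcher. That's the payoff of clean boundaries.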
Why Engineers Struggle With AI Agent Architecture
They're not bad at coding. They're skipping the design phase.
They start with frameworks instead of questions. They add agents when workflows would do. They over-engineer autonomy, then fight to constrain it.
The engineers who build reliable agents slow down before they speed up. They answer the hard questions first. Then the code writes itself.
The Checklist for Building AI Agents
Before you build, ask:
Agent or workflow?
Single responsibility per agent?
LLM call or Python function?
Structured output or prose?
How much autonomy? What are the limits?
Enough context? Too much?
What's the fallback when parsing fails?
How do we measure success?
Answer these first. Then write code.
Everyone's building agents. The ones that work were designed before they were coded.