Building an AI Agent from Scratch

If you were anywhere near tech in 2025, you heard "Agent" and "MCP" a million times, like I did. The hype and the layers of frameworks make them sound like magic.

But what's actually happening? The docs don't tell you. The code is wrapped in abstractions. We're debugging a black box, and I'd much rather work with a white box.

I needed to know: what's the minimum viable agent? Stripped down to basics. No inheritance hierarchies. No plugin systems. No middleware chains. Just the core loop that makes an agent tick.

How Agents Actually Work

Strip away the frameworks and you get three things:

// 1. LLM decides what to do next
const response = await llm.call(messages, tools)
// 2. Execute the tool it picked
const result = await runTool(response.toolName, response.args)
// 3. Feed result back, repeat until done
messages.push({role: 'tool', content: result})

That's it. That's the agent loop. Everything else is optimization.

The Core Loop

The LLM gets a prompt and tool definitions. It thinks, picks a tool, and returns structured output. You run that tool, take the output, and append it to the conversation. Then you call the LLM again. Keep looping until it says "done" or you hit max iterations.

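async function runAgent(llm, context, availableTools, maxIterations = 10) {
  let iterations = 0
  while (iterations < maxIterations) {
    // Ask the LLM for the next step, given the conversation so far
    const response = await llm.call(context, availableTools)
    if (response.done) {
      return response.finalAnswer
    }
    // Run the tool the LLM picked
    const toolResult = await executeToolCall(
      response.tool,
      response.arguments
    )
    // Append the result so the next LLM call can see it
    context.push({
      role: 'tool',
      name: response.tool,
      content: toolResult
    })
    iterations++
  }
  throw new Error('Hit max iterations without a final answer')
}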

No magic. Just message passing and function calls. Here is the same flow as a sequence diagram:

User     Agent                   LLM               MCP                Tools
 │  task   │                      │                 │                   │
 ├────────>│  context + schemas   │                 │                   │
 │         ├─────────────────────>│                 │                   │
 │         │                      │decide:tool+args │                   │
 │         │<─────────────────────┤                 │                   │
 │         │ {tool,args}validate  │                 │                   │
 │         ├──────────────────────┼────────────────>│execute tool(args) │
 │         │                      │                 ├──────────────────>│
 │         │                      │                 │  result           │run function
 │         │    append            │                 │<──────────────────┤
 │         │<─────────────────────┼─────────────────┤                   │
 │         │  result              │                 │                   │
 │         │ ╭────────────────────╮                 │                   │
 │         │ │ loop if !done      │                 │                   │
 │         │ ╰────┬───────────────╯                 │                   │
 │         │<─────╯                                 │                   │
 │         │  context + schemas   │                 │                   │
 │         │  prev_result         │                 │                   │
 │         ├─────────────────────>│ decide:next tool│                   │
 │         │                      │ OR done         │                   │
 │         │<─────────────────────┤                 │                   │
 │  answer │ {done:true}          │                 │                   │
 │<────────┤                      │                 │                   │
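To watch the loop run end to end without an API key, here is a minimal sketch that swaps the real model for a scripted fake. Everything in it is an assumption for illustration: `makeScriptedLlm`, the `add` tool, and the `{done, tool, arguments, finalAnswer}` response shape are made up to match the snippets above, not taken from any real client library.

```javascript
// Hypothetical stand-in for a real LLM client: replays a fixed script of decisions.
function makeScriptedLlm(script) {
  let turn = 0
  return {
    async call(context, availableTools) {
      // A real client would send context + tool schemas to a model API here.
      return script[turn++] ?? { done: true, finalAnswer: '(script exhausted)' }
    }
  }
}

// Tools are plain functions keyed by name (illustrative only).
const tools = {
  add: ({ a, b }) => String(a + b),
}

async function runAgent(llm, task, maxIterations = 5) {
  const context = [{ role: 'user', content: task }]
  for (let i = 0; i < maxIterations; i++) {
    const response = await llm.call(context, Object.keys(tools))
    if (response.done) return { answer: response.finalAnswer, context }
    // Run the chosen tool and feed its result back into the conversation.
    const toolResult = await tools[response.tool](response.arguments)
    context.push({ role: 'tool', name: response.tool, content: toolResult })
  }
  throw new Error(`No answer after ${maxIterations} iterations`)
}

const llm = makeScriptedLlm([
  { done: false, tool: 'add', arguments: { a: 2, b: 3 } },
  { done: true, finalAnswer: 'The sum is 5' },
])

runAgent(llm, 'What is 2 + 3?').then(({ answer, context }) => {
  console.log(answer)          // The sum is 5
  console.log(context.length)  // 2 (user message + one tool result)
})
```

Swapping the scripted fake for a real client should only change what happens inside `call`. The loop itself stays the same, which is the whole point.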

Tools Are Just Functions

Tools aren't special. They're functions with schemas. The LLM sees the schema and knows what args to pass.

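const fs = require('fs')

const tools = {
  readFile: {
    // Takes the args object the LLM returns, e.g. {path: 'notes.txt'}
    fn: ({path}) => fs.readFileSync(path, 'utf8'),
    // JSON Schema description the LLM sees
    schema: {
      name: 'readFile',
      description: 'Read file contents',
      parameters: {
        type: 'object',
        properties: {
          path: {type: 'string'}
        },
        required: ['path']
      }
    }
  }
}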

Schema goes to LLM. LLM returns tool name + args. You match the name, call the function. No framework needed.
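The "match the name, call the function" step could look something like the sketch below. It is one possible shape, not the only one: `registry`, `greet`, and `dispatch` are hypothetical names, each `fn` is assumed to take the args object as a single parameter, and the type check only covers string, number, and boolean. Errors come back as strings so they can go into the conversation as a tool result and the LLM gets a chance to correct itself.

```javascript
// Hypothetical registry: each entry pairs a function with its schema.
const registry = {
  greet: {
    fn: ({ name }) => `Hello, ${name}!`,
    schema: {
      name: 'greet',
      description: 'Greet someone by name',
      parameters: {
        type: 'object',
        properties: { name: { type: 'string' } },
        required: ['name'],
      },
    },
  },
}

const PRIMITIVES = ['string', 'number', 'boolean']

// Look up the tool by name, check its arguments against the schema, then call it.
function dispatch(toolName, args = {}) {
  // Own-property check so names like "constructor" don't resolve to built-ins.
  if (!Object.prototype.hasOwnProperty.call(registry, toolName)) {
    return `Error: unknown tool "${toolName}"`
  }
  const tool = registry[toolName]
  const { properties = {}, required = [] } = tool.schema.parameters
  for (const key of required) {
    if (!(key in args)) return `Error: missing argument "${key}"`
  }
  for (const [key, value] of Object.entries(args)) {
    const expected = properties[key]?.type
    // Only primitive types are checked in this sketch.
    if (PRIMITIVES.includes(expected) && typeof value !== expected) {
      return `Error: argument "${key}" should be ${expected}`
    }
  }
  return tool.fn(args)
}

console.log(dispatch('greet', { name: 'Ada' })) // Hello, Ada!
console.log(dispatch('nope', {}))               // Error: unknown tool "nope"
console.log(dispatch('greet', {}))              // Error: missing argument "name"
```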

What I Learned

Agents aren't complex. The loop is simple. Complexity comes from error handling, retries, context management, parallel tools. But the core? It's a while loop calling an LLM.

Frameworks hide this because they handle edge cases. But when you need to debug why your agent hallucinates tool calls, or runs in circles, or ignores results - you need to understand the loop.
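Once you own the loop, the "runs in circles" case is easy to catch. A rough sketch, assuming you call a guard right before executing each tool: `createLoopGuard` is a hypothetical helper that counts identical tool-plus-arguments pairs and reports when one shows up too often.

```javascript
// Hypothetical guard: counts identical (tool, arguments) pairs seen during one run.
function createLoopGuard(maxRepeats = 2) {
  const seen = new Map()
  return function check(tool, args) {
    // JSON.stringify is key-order sensitive, which is good enough for a sketch.
    const key = `${tool}:${JSON.stringify(args)}`
    const count = (seen.get(key) ?? 0) + 1
    seen.set(key, count)
    // True once the same call has been made more than maxRepeats times.
    return count > maxRepeats
  }
}

const isLooping = createLoopGuard(2)
console.log(isLooping('readFile', { path: 'a.txt' })) // false
console.log(isLooping('readFile', { path: 'a.txt' })) // false
console.log(isLooping('readFile', { path: 'a.txt' })) // true
console.log(isLooping('readFile', { path: 'b.txt' })) // false
```

What to do when the guard fires is a design choice: stop the run, or append a message telling the LLM it already has that result and should try something else.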

Building from scratch isn't about rejecting frameworks. It's about knowing what they do so you can actually use them.

The Code

The full implementation is ~150 lines. No dependencies except an LLM API client. It supports multiple tools, streaming, and context window management. Works. Ships. You can read every line and know exactly what happens.