
The n8n AI Agent Node: Complete Guide

Published on Apr 4, 2026 · By Anshul Namdev

What the AI Agent Node Actually Is

Most n8n nodes do one thing: take input, transform it, pass it on. The AI Agent node is different. It doesn't follow a fixed path. It reasons. You give it a goal, a set of tools, and a model, and it figures out the steps on its own. It can call tools, observe results, decide what to do next, and loop until the job is done or it hits a limit.

Under the hood, the AI Agent node is built on LangChain and implements what's called a ReAct loop: Reason, Act, Observe, Repeat. The LLM acts as the brain. It reads the incoming message, looks at the tools available, decides which one to call (if any), gets the result back, and decides whether the task is complete or whether it needs to do more. This loop can run multiple times per single user message.

That's the core distinction from a basic LLM node. A basic LLM node is a one-shot call: prompt in, text out. The AI Agent node is a loop: it can take 10 actions before it gives you a final answer.

Since n8n v1.82.0, there is only one agent type: the Tools Agent. Earlier versions had multiple agent types (Conversational Agent, ReAct Agent, OpenAI Functions Agent, Plan and Execute Agent). All of those have been consolidated into the single Tools Agent, which uses the LLM's native function/tool-calling interface. If you're on an older version, you may still see those options, but for any new workflow, the Tools Agent is what you want.

When to Use an Agent vs. a Simple LLM Call

This is the most important question to answer before you start building. Agents are powerful, but they're also slower, more expensive, and harder to debug than a direct LLM call. Don't reach for the AI Agent node by default.

| Scenario | Use Agent? | Why |
|---|---|---|
| Summarize a document | No | Single LLM call is faster and cheaper |
| Classify an email into categories | No | Fixed output, no tool use needed |
| Extract structured fields from text | No | LLM Chain + Structured Output Parser handles this |
| Answer a question that needs a live web search | Yes | Agent decides when and how to call the search tool |
| Multi-turn chat that remembers context | Yes | Agent + Memory maintains conversation state |
| Task requiring multiple API calls in unknown order | Yes | Agent reasons about which tool to call next |
| Fixed sequential pipeline (always same steps) | No | Standard workflow nodes are more predictable |
| Research task pulling from multiple sources | Yes | Agent can query, synthesize, and verify iteratively |
| Customer support bot with CRM lookups | Yes | Agent decides when to look up data vs. answer directly |
| Translate text to another language | No | One-shot LLM call, no reasoning loop needed |

The rule of thumb: if the AI needs to decide what to do next based on intermediate results, use an agent. If the steps are predictable and fixed, use a standard workflow with LLM nodes where needed. If you're still getting familiar with how n8n workflows are structured, the first workflow guide is a good starting point before diving into agents.

Your First Agent: Chat Trigger + AI Agent + Model

Before going deep on every option, let's build the simplest possible agent so you have a working mental model. Three nodes. Five minutes.

Minimal Agent Workflow: Chat Trigger → AI Agent, with an OpenAI Chat Model and Simple Memory connected as sub-nodes.

Step 1: Add a Chat Trigger node

This is your entry point. The Chat Trigger creates a built-in chat interface you can use directly in n8n for testing. In production, it can be embedded in a website or connected to a messaging platform. No configuration is needed to get started; just add it to the canvas.

Step 2: Add the AI Agent node

Click the + connector on the Chat Trigger and search for "AI Agent". Add it. You'll see the node has connection points along the bottom for sub-nodes: Chat Model, Memory, Tool, and Output Parser. These are not regular node connections; they're sub-node slots that extend the agent's capabilities.

Step 3: Connect a Chat Model

Click the + under "Chat Model" on the AI Agent node. Search for your provider: OpenAI, Anthropic, Google Gemini, Groq, Mistral, or others. Select it, add your API credentials, and choose a model. For OpenAI, gpt-4o-mini is a solid starting point. For Anthropic, claude-3-5-haiku is fast and affordable.

The Chat Model is the only required sub-node. Everything else (memory, tools, output parsers) is optional.

Step 4: Test it

Click the Chat button at the bottom of the canvas. A chat window opens. Type anything. The agent will respond using the connected model. You now have a working AI agent, no tools yet, just a direct LLM conversation. From here, every section below adds capability.

No tools = expensive chatbot. An agent without tools is just a chatbot with extra overhead. The real value of the AI Agent node comes from connecting tools that let it interact with the real world. We'll cover all of them below.

Import This Workflow

Copy the JSON for the minimal agent above and import it directly into n8n by pasting it onto the canvas with Ctrl+Shift+V. Just swap in your OpenAI credentials and you're running.

Node Parameters: The Prompt Section

Open the AI Agent node and you'll see two main parameters before you get to the Options section.

Prompt (Source)
Default: Take from previous node automatically

This controls where the agent gets its user message from. Two choices:

  • Take from previous node automatically: The agent looks for a field called chatInput in the incoming data. The Chat Trigger node outputs exactly this field, so if you're using Chat Trigger, leave this on the default and it just works.
  • Define below: You manually write the prompt, either as static text or as an n8n expression. Use this when the agent is triggered by something other than a chat (a webhook, a schedule, a form submission) and you want to construct the prompt dynamically from the incoming data.
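
For reference, when Prompt Source is left on the default, the incoming item from the Chat Trigger looks roughly like this (field values are illustrative; the agent reads the chatInput field, and sessionId is what memory nodes use to separate conversations):

```json
{
  "sessionId": "user-42",
  "chatInput": "Where is my order?"
}
```
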
Require Specific Output Format
Default: Off

When you turn this on, a new sub-node slot appears on the agent: Output Parser. This tells the agent it must return its response in a specific structured format rather than free-form text. When enabled, you connect one of three output parsers (covered in detail in the Output Parsers section below). Leave this off unless you specifically need structured JSON output from the agent.

Node Options: Every Setting Explained

Click Add Option at the bottom of the AI Agent node to reveal these settings. None are required; they're all refinements.

System Message
Default: "You are a helpful assistant"

This is the most important option in the entire node. The system message is a set of instructions sent to the LLM before any user message. It defines the agent's role, personality, constraints, what tools to use and when, and how to format responses. Think of it as the agent's job description and rulebook combined.

You can use n8n expressions here to make it dynamic. For example, inject the current user's name or account type from the incoming data to personalize the agent's behavior per request.

Max Iterations
Default: 10

The maximum number of times the agent's reasoning loop can run before it's forced to stop and return whatever it has. Each iteration may involve one LLM call plus one tool call. If your agent is doing complex multi-step research, you might need to raise this. If you're worried about runaway costs, lower it. For simple agents with one or two tools, 5 is usually plenty.

Return Intermediate Steps
Default: Off

When turned on, the agent's output includes every step it took, every tool call, every observation, every reasoning step, not just the final answer. This is extremely useful for debugging. In production, you'd typically leave this off and only enable it when something isn't working as expected. The intermediate steps appear in the node's output data and in the execution logs.

Automatically Passthrough Binary Images
Default: Off

When enabled, any binary image data in the incoming workflow data is automatically forwarded to the agent as image-type messages. This is what enables multimodal workflows. For example, a user uploads an image via a form, and the agent can see and analyze it. Requires a model that supports vision (GPT-4o, Claude 3, Gemini 1.5, etc.).

Enable Streaming
Default: On

When enabled, the agent sends its response back token by token as it generates it, rather than waiting until the full response is ready. This makes the chat feel much more responsive for long answers. For streaming to work, your trigger must support it. The Chat Trigger and Webhook node (with Response Mode set to Streaming) both do. If you're using a different trigger, disable this.

Tracing Metadata
Default: Empty

Add custom key-value pairs that get attached to tracing events for this agent. This is specifically for use with LangSmith or similar LLM observability tools. If you're not using a tracing platform, ignore this option entirely.

Writing a Good System Prompt

The system message is where most of the agent's behavior is defined. A vague system prompt produces unpredictable results. A well-structured one makes the agent consistent and reliable. Here's a template that works well:

## Role
You are [describe the role and purpose clearly].

## Tools Available
- [tool_name]: Use this when [specific situation]. Input: [what to pass]. Output: [what it returns].
- [tool_name]: Use this when [specific situation].

## Behavior Rules
- [Specific instruction, be concrete, not vague]
- [Constraint or boundary]
- [When to escalate or say you don't know]

## Response Format
[How to structure the final answer, length, tone, format]

## Error Handling
If a tool fails or returns no results, [what to do instead].

A few things that consistently improve agent behavior:

  • Tell the agent when to use each tool, not just that the tools exist. "Use the calendar tool when the user asks about scheduling or availability" is better than just listing the tool.
  • Set explicit constraints. "Never share order details without first verifying the customer's email" prevents the agent from doing things you don't want.
  • Define what "done" looks like. Without this, agents sometimes keep calling tools unnecessarily.
  • Use n8n expressions to inject dynamic context: Today's date is {{ $now.toFormat('yyyy-MM-dd') }}. This is especially useful for date-aware agents.
Common mistake: Writing a system prompt that's too long and contradictory. If you tell the agent to "be concise" and also "provide detailed explanations", it will be inconsistent. Keep instructions clear and non-conflicting. If the prompt is getting very long, that's usually a sign the agent is trying to do too many things and should be split into specialized sub-agents.

Supported Chat Models

The AI Agent node requires a Chat Model sub-node. The model must support function/tool calling, not all models do. Here are the officially supported providers and what to know about each. For a deeper comparison of cost, speed, and quality across these providers, see the Choosing the Right AI Model guide.

| Provider | Node Name | Best For | Notes |
|---|---|---|---|
| OpenAI | OpenAI Chat Model | General purpose, most reliable tool calling | GPT-4o for quality, GPT-4o-mini for cost. Most widely tested with n8n. |
| Anthropic | Anthropic Chat Model | Long context, nuanced reasoning, safety | Claude 3.5 Sonnet/Haiku. Excellent for complex instructions and large documents. |
| Google Gemini | Google Gemini Chat Model | Multimodal, Google ecosystem | Gemini 1.5 Pro/Flash. Strong vision capabilities. Free tier available. |
| Google Vertex AI | Google Vertex Chat Model | Enterprise Google Cloud deployments | Same models as Gemini but via Vertex AI API with enterprise billing. |
| Azure OpenAI | Azure OpenAI Chat Model | Enterprise compliance, data residency | OpenAI models hosted on Azure. Required for some regulated industries. |
| Groq | Groq Chat Model | Speed, extremely fast inference | Llama, Mixtral, Gemma models. Best for latency-sensitive workflows. |
| Mistral | Mistral Cloud Chat Model | European hosting, open-weight models | Mistral Large/Small. Good alternative to OpenAI with EU data residency. |
| Ollama | Ollama Chat Model | Local/self-hosted, privacy, no API costs | Runs models on your own hardware. Requires Ollama server. Tool calling support varies by model. |
| AWS Bedrock | AWS Bedrock Chat Model | AWS ecosystem, enterprise | Access to Claude, Llama, Titan, and others via AWS. |
| DeepSeek | DeepSeek Chat Model | Cost-effective, strong reasoning | Very competitive pricing. DeepSeek-R1 has strong reasoning capabilities. |

Each Chat Model sub-node has its own settings. The most important ones are:

  • Model: Which specific model to use (e.g., gpt-4o, claude-3-5-sonnet-20241022)
  • Temperature: Controls randomness. 0 = deterministic, 1 = creative. For agents doing factual tasks or tool use, keep this at 0 or 0.1. For creative tasks, 0.7 to 1.0.
  • Max Tokens: Maximum length of the model's response. Set this to avoid unexpectedly long (and expensive) outputs.
  • Timeout: How long to wait for a response before failing. Important for production workflows.
Fallback Model: Some Chat Model sub-nodes have an "Enable Fallback Model" option. When enabled, you connect a second backup model that the agent switches to if the primary model fails or is unavailable. Useful for production reliability.

Tools: What the Agent Can Do

Tools are what separate an AI Agent from a chatbot. Each tool is a capability the agent can invoke during its reasoning loop. The agent reads the tool's name and description, decides whether it needs it, calls it with the right parameters, and uses the result to continue reasoning.

You connect tools by clicking the + under "Tool" on the AI Agent node. You can connect as many tools as you want, but more tools means more decisions for the agent, which can lead to confusion and higher costs. Start with the minimum set needed.

Built-in Tool Nodes

n8n ships with several ready-to-use tool nodes that require no custom configuration:

Calculator (Math Operations)

Performs arithmetic. Use when the agent needs to compute exact numbers rather than estimate. No configuration needed.

Wikipedia (General Knowledge)

Searches Wikipedia for factual information. Good for general knowledge lookups without needing a paid search API.

Wolfram Alpha (Computational Knowledge)

Scientific queries, unit conversions, data analysis. Requires a Wolfram Alpha API key.

SerpAPI (Google Search)

Real-time web search via Google. Use when the agent needs current information. Requires a SerpAPI key.

Code (Execute JavaScript)

Runs custom JavaScript or Python. Use for data transformations or logic that no other tool handles.

HTTP Request (Call Any API)

The most flexible tool. Configure it to call any REST API. Define the description carefully so the agent knows when to use it.

App Integration Tools

The Tools Agent supports over 100 native app integrations as tools. These include Gmail, Google Sheets, Google Calendar, Slack, Notion, Airtable, HubSpot, Jira, GitHub, Postgres, MySQL, MongoDB, Stripe, Shopify, Telegram, Discord, and many more. Each one exposes specific operations (read, write, search, create) that the agent can call.

To add one, click + under Tool, search for the app name, and configure it with your credentials and the specific operation you want to expose. The agent will use the tool's name and description to decide when to call it.

The Workflow Tool: Your Most Powerful Option

The Call n8n Workflow tool lets your agent call another n8n workflow as a tool. This is the foundation of multi-agent architectures. You create specialized sub-workflows, one for CRM lookups, one for sending emails, one for database queries, and your main agent calls them as needed. Each sub-workflow can have its own credentials, error handling, and logic, completely isolated from the main agent.

Main Agent → (tool calls) → lookup-customer workflow, send-email workflow, check-inventory workflow

The $fromAI() Function: Dynamic Tool Parameters

When configuring a tool node (like Google Sheets or Gmail), you normally hardcode the parameters. But with $fromAI(), you let the agent fill in the parameters dynamically based on context.

// In a Google Sheets tool node, instead of hardcoding the row data:

// Name field:
{{ $fromAI("customerName", "The customer's full name from the conversation") }}

// Email field:
{{ $fromAI("email", "The customer's email address") }}

// Amount field:
{{ $fromAI("orderAmount", "The total order amount in USD", "number") }}

The $fromAI(key, description, type) function takes three arguments: a key name (what the AI uses to identify the value), an optional description (gives the AI context about what to look for), and an optional type hint (string, number, boolean, json). The AI model fills in the value from its context, from the conversation, from previous tool results, or by asking the user if it can't find it.

$fromAI() only works in tool nodes connected to an AI Agent. It doesn't work in regular workflow nodes or in the Code tool. Think of it as a way to make your tool nodes "AI-aware": the agent populates the fields rather than you hardcoding them.

Human-in-the-Loop for Tools

For sensitive tool calls, sending emails, deleting records, making payments, you can require human approval before the agent executes the tool. In the Tools Panel, there's a "Human review" section where you configure an approval channel (Chat, Slack, Telegram, email). When the agent wants to use a gated tool, the workflow pauses and sends an approval request. The reviewer approves or denies, and the workflow continues accordingly.

Memory: Giving the Agent Context

Without memory, every message to the agent starts a completely blank conversation. The agent has no idea what was said before. Memory fixes this by storing conversation history and making it available on each turn.

You connect a memory node by clicking the + under "Memory" on the AI Agent node. Here are all the memory types available:

| Memory Type | Persistence | Best For | Requires |
|---|---|---|---|
| Simple Memory | Session only (lost on restart) | Testing, demos, single-session interactions | Nothing, built into n8n |
| Window Buffer Memory | Session only | Conversations where you want a fixed context window | Nothing, built into n8n |
| Postgres Chat Memory | Persistent | Production chatbots, multi-session conversations | Postgres database |
| Redis Chat Memory | Persistent | High-performance production, fast reads | Redis server |
| Motorhead Memory | Persistent | Managed memory service with summarization | Motorhead server |
| Xata Memory | Persistent | Serverless Postgres-compatible storage | Xata account |
| Zep Memory | Persistent | Semantic memory with knowledge graph | Zep server |
| Vector Store Memory | Persistent | Semantic recall over long histories, RAG-style memory | Embeddings node + Vector Store node |

Simple Memory (Start Here)

Stores the full conversation history in n8n's in-memory store. Zero setup. Perfect for testing. Lost when the workflow restarts or n8n restarts. Configurable: set how many previous messages to keep (default is the last 5 interactions).

Window Buffer Memory (Cost Control)

Like Simple Memory but with a hard cap on how many messages are kept. When the window fills, the oldest messages are dropped. Use this to prevent context from growing indefinitely and driving up token costs.
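
The drop-oldest behavior can be sketched like this (an illustrative JavaScript model, not n8n's actual implementation):

```javascript
// Sketch of a fixed-size message window: once full, the oldest messages
// are dropped so the context sent to the LLM never grows past the cap.
class WindowBufferMemory {
  constructor(windowSize = 5) {
    this.windowSize = windowSize;
    this.messages = [];
  }
  add(message) {
    this.messages.push(message);
    // Keep only the most recent windowSize messages
    if (this.messages.length > this.windowSize) {
      this.messages = this.messages.slice(-this.windowSize);
    }
  }
  getContext() {
    return this.messages;
  }
}
```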

Postgres / Redis (Production Choice)

Stores conversation history in an external database. Survives restarts. Supports multiple concurrent users via Session ID. Use Postgres if you already have it. Use Redis if you need sub-millisecond read speeds.

Vector Store Memory (Semantic Recall)

On each turn, retrieves the most semantically relevant past messages, not just the most recent ones. Requires an Embeddings sub-node and a Vector Store sub-node. See the Vector Stores and RAG guide for setup.

All persistent memory types use a Session ID to separate conversations. This is critical. If you don't set a unique Session ID per user or conversation, all users will share the same memory. A common pattern:

// Session ID in Postgres/Redis Chat Memory node:
{{ $json.userId }}_{{ $json.conversationId }}

// Or for a simple single-user setup:
{{ $('Chat Trigger').item.json.sessionId }}

Memory and token costs: Every message stored in memory gets sent to the LLM on every turn. A conversation with 50 messages means 50 messages of context on message 51. This adds up fast. Use Window Buffer Memory to cap the context size, or use Vector Store Memory which only retrieves the most relevant messages rather than all of them.
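
The growth is easy to quantify with a rough back-of-the-envelope model (illustrative numbers; real counts depend on the model's tokenizer):

```javascript
// Every prior message plus the system prompt is resent on each turn,
// so per-turn context grows linearly with conversation length.
function contextTokensAtTurn(turn, tokensPerMessage, systemPromptTokens) {
  return systemPromptTokens + turn * tokensPerMessage;
}

// Total tokens sent across a whole conversation grows quadratically.
function totalTokensOverConversation(turns, tokensPerMessage, systemPromptTokens) {
  let total = 0;
  for (let t = 1; t <= turns; t++) {
    total += contextTokensAtTurn(t, tokensPerMessage, systemPromptTokens);
  }
  return total;
}
```

With a 500-token system prompt and roughly 100 tokens per message, a 50-turn conversation resends over 150,000 tokens of context in total, which is why capping the window matters.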

Chat Memory Manager

For advanced memory management beyond what the standard memory nodes offer, n8n provides the Chat Memory Manager node. Use it when you need to: inject custom messages into the agent's memory (to give it context it didn't learn from the conversation), check and reduce memory size programmatically, or manage memory in workflows where you can't attach a memory sub-node directly to the agent.

Output Parsers: Getting Structured Data

By default, the AI Agent returns free-form text. That's fine for chat. But if the agent's output needs to feed into downstream nodes, a database insert, an API call, a conditional branch, you need structured data. Output parsers enforce a specific format on the agent's response.

To use an output parser, first enable Require Specific Output Format in the node parameters. This reveals the Output Parser sub-node slot. Then connect one of the three available parsers.

Important caveat: Output parsers work more reliably with the Basic LLM Chain node than with the AI Agent node. The agent's multi-step reasoning loop can interfere with structured output generation. If you need guaranteed structured output, the recommended pattern is: let the agent do its reasoning, then pass its raw text response through a separate LLM Chain node with an output parser for the final formatting step.

Structured Output Parser

The most common parser. Forces the LLM to return a JSON object matching a schema you define. Two ways to configure it:

// Method 1: Provide a JSON example, the parser infers the schema
{
  "customerName": "Jane Smith",
  "orderTotal": 149.99,
  "items": ["Widget A", "Widget B"],
  "priority": "high"
}

// Method 2: Define an explicit JSON Schema for precise control
{
  "type": "object",
  "properties": {
    "customerName": { "type": "string" },
    "orderTotal": { "type": "number" },
    "priority": { "type": "string", "enum": ["low", "medium", "high"] }
  },
  "required": ["customerName", "orderTotal"]
}

Note: n8n's JSON Schema implementation doesn't support $ref for referencing other schemas. Keep schemas self-contained.

Auto-fixing Output Parser

A wrapper around the Structured Output Parser that adds resilience. When the LLM's output doesn't match the schema, instead of failing, this parser sends the malformed output back to the LLM with instructions to fix it. It requires a second Chat Model connection (the "fixing" model) and a Structured Output Parser connection (the schema to fix toward).

Use this in production systems where occasional parsing failures are unacceptable. Avoid it in cost-sensitive or time-critical workflows: each fix attempt adds an extra LLM call.
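
Conceptually, the retry behavior looks like this (a sketch of the idea, not the actual LangChain implementation; parse, validate, and on failure ask the fixing model to repair the output):

```javascript
// Sketch of auto-fixing parse logic: malformed output is sent back to a
// second "fixing" model along with the error, up to maxAttempts times.
function parseWithAutoFix(output, validate, fixingModel, maxAttempts = 2) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const parsed = JSON.parse(output);
      if (validate(parsed)) return parsed;
      throw new Error("Schema mismatch");
    } catch (err) {
      // Each retry costs one extra LLM call
      output = fixingModel(output, String(err));
    }
  }
  throw new Error("Could not produce valid structured output");
}
```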

Item List Output Parser

The simplest parser. Forces the LLM to return a plain list of items: strings, keywords, tags, categories. Use this when you need an array of simple values rather than a complex object. No schema definition needed; just connect it and the agent will return a JSON array.

| Parser | Output Format | Best For | Reliability |
|---|---|---|---|
| Structured Output Parser | JSON object matching your schema | Structured data for downstream nodes | Good with capable models |
| Auto-fixing Output Parser | JSON object (with retry on failure) | Production systems needing reliability | Best, retries on failure |
| Item List Output Parser | JSON array of strings | Tags, keywords, simple lists | Very good, simple format |

How the Agent Loop Works

Understanding the loop helps you debug unexpected behavior and write better system prompts. Here's exactly what happens when a message hits the AI Agent node:

1. RECEIVE
User message + memory history + system prompt assembled into context
2. THINK
LLM evaluates: should I call a tool, or respond directly?
3. ACT
If tool selected: n8n executes the tool and captures the result
4. OBSERVE
Tool result added to context. LLM evaluates: is the goal achieved?
↓ (if not done, loop back to THINK)
5. RESPOND
Final answer generated and returned. Memory updated.

Each pass through steps 2–4 counts as one iteration toward the Max Iterations limit. A simple question with no tool use takes one iteration. A research task that searches the web three times and synthesizes results takes four or more iterations.
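
The loop above can be sketched in a few lines of JavaScript (illustrative only; callModel and tools stand in for the Chat Model and Tool sub-nodes, not n8n's internals):

```javascript
// Minimal agent loop: THINK, then either RESPOND or ACT + OBSERVE and repeat,
// bounded by maxIterations just like the node's Max Iterations option.
function runAgent(callModel, tools, userMessage, maxIterations = 10) {
  const context = [{ role: "user", content: userMessage }]; // RECEIVE
  for (let i = 0; i < maxIterations; i++) {
    const decision = callModel(context);                    // THINK
    if (decision.type === "final") {
      return decision.content;                              // RESPOND
    }
    const result = tools[decision.tool](decision.input);    // ACT
    context.push({ role: "tool", content: result });        // OBSERVE, loop
  }
  return "Stopped: Max Iterations reached";
}
```

A fake model and a single calculator-style tool are enough to trace the RECEIVE → THINK → ACT → OBSERVE → RESPOND flow end to end.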

Agent Architecture Patterns

As your use cases grow, the architecture you choose matters. Here are the four patterns that cover most real-world scenarios. If you want to go further and have Claude build these architectures for you automatically, check out the Automate n8n with Claude guide.

Single Agent (one agent, multiple tools)

The simplest setup. One AI Agent node with all tools connected. Good for prototypes and straightforward tasks with fewer than 5–6 tools. Gets unwieldy fast as complexity grows.

Routing Pattern (classify → route → specialist)

A classifier (simple LLM call or Switch node) categorizes the incoming request and routes it to a specialized agent. Each specialist has focused tools and a tight system prompt. Best when you have clear, distinct request categories.

Orchestrator Pattern (master agent → sub-agents)

A master agent has Workflow tools that call specialized sub-workflows, each containing its own AI Agent. The master decides which specialist to invoke. Scales well for enterprise use cases with many domains.

Sequential Chain (Agent A → Agent B → Agent C)

Agents process in stages, each handling one phase of a pipeline. Research → Analysis → Report is a classic example. Each agent receives the previous agent's output as its input.

Start with a single agent. Add architectural complexity only when you've validated that the simpler approach doesn't meet your requirements. Most workflows that "need" a multi-agent architecture actually just need a better system prompt and cleaner tool descriptions.
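
For illustration, the routing pattern reduces to a classify-then-dispatch step (hypothetical categories and handlers; in n8n the classifier would be a simple LLM call or Switch node, and each specialist a sub-workflow with its own agent):

```javascript
// Routing pattern sketch: categorize the request, then hand it to the
// matching specialist, with a fallback for anything unrecognized.
function routeRequest(message, classify, specialists) {
  const category = classify(message);
  const handler = specialists[category] || specialists.fallback;
  return handler(message);
}
```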

A Real Example: Support Agent with CRM Lookup

Here's a complete, production-ready agent setup you can replicate. A customer support agent that can look up orders, check product availability, and escalate tickets.

Chat Trigger → AI Agent
  Tools: HTTP Request (Orders API) + Google Sheets (Products) + Gmail (Escalation)
  Memory: Postgres Chat Memory

System Message for this agent:

You are a customer support agent for TechStore.

## Tools
- order_lookup: Search orders by order ID or customer email. Use when a customer asks about their order status, shipping, or delivery.
- product_search: Search the product catalog. Use when a customer asks about availability, specs, or pricing.
- escalate_ticket: Send an escalation email to the support team. Use for refund requests over $200, damaged items, or any issue you cannot resolve directly.

## Rules
- Always greet the customer by name if you know it from the conversation.
- Verify the customer's email before sharing any order details.
- Never process refunds directly, always use escalate_ticket for refund requests.
- If a product is out of stock, suggest the closest available alternative.
- Keep responses under 150 words unless the customer asks for detail.

## When You Don't Know
If you cannot find an order or answer a question, say so honestly and use escalate_ticket to loop in a human agent.

The HTTP Request tool for order lookup:

// Tool description (what the agent reads to decide when to use it):
Look up a customer order by order ID or email address.
Input: order ID (format: ORD-XXXXX) or customer email.
Output: Order status, items, shipping info, and estimated delivery.

// Node configuration:
Method: GET
URL: https://api.yourstore.com/orders/search
Query Params:
  q: {{ $fromAI("searchQuery", "The order ID or customer email to search for") }}
Headers:
  Authorization: Bearer {{ $credentials.apiKey }}

Debugging Agent Workflows

When an agent behaves unexpectedly, the execution logs are your best friend. Go to Executions, click the relevant execution, click the AI Agent node, and expand the output. You'll see every step the agent took: what it received, what it decided, which tools it called, what they returned, and what it concluded.

Common failure patterns and fixes:

| Symptom | Likely Cause | Fix |
|---|---|---|
| Agent ignores tools entirely | Tool descriptions are vague or the system prompt doesn't mention when to use them | Rewrite tool descriptions to be specific about when and how to use each tool |
| Agent calls the wrong tool | Tool descriptions overlap or are ambiguous | Add clearer differentiation; reduce the number of tools |
| Agent loops without stopping | Tool returns unclear results; no completion criteria in prompt | Add explicit "you're done when..." instructions to the system prompt; lower Max Iterations |
| Agent makes up tool results | Tool is returning errors or empty responses that the agent doesn't recognize | Add error handling to tool nodes; add instructions for handling empty results in the system prompt |
| Agent forgets previous messages | No memory node connected, or Session ID is wrong | Add a memory node; verify Session ID is consistent per user |
| Structured output fails | Model doesn't reliably follow the schema | Use Auto-fixing Output Parser; simplify the schema; use a more capable model |

Keeping Costs Under Control

Agents can get expensive fast if you're not careful. Each iteration involves at least one LLM call, and each LLM call sends the full context (system prompt + memory + conversation history + tool results) to the model. Here's how to keep costs reasonable. For a full breakdown of which providers offer the best cost-to-quality ratio, see the AI Inference Providers guide.

  • Right-size your model. Use the smallest model that handles your use case. GPT-4o-mini or Claude Haiku for most tasks. Reserve GPT-4o or Claude Sonnet for tasks that genuinely need the extra capability.
  • Cap memory. Use Window Buffer Memory with a small window (5 to 10 messages) instead of unlimited Simple Memory. Every message in memory costs tokens on every turn.
  • Lower Max Iterations. If your agent rarely needs more than 3 tool calls, set Max Iterations to 5. This prevents runaway loops from burning tokens.
  • Keep system prompts tight. Every token in the system prompt is sent on every turn. A 2,000-token system prompt on a 100-message conversation is 200,000 tokens just for the prompt.
  • Cache frequent queries. If the same questions come up repeatedly, consider caching responses in a database and checking the cache before hitting the agent.
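
The caching idea in the last bullet can be sketched as a lookup in front of the agent call (an in-memory Map for illustration; a production workflow would check a database or Redis before the AI Agent node):

```javascript
// Response cache checked before invoking the agent: repeated questions
// are answered from the cache instead of paying for another agent run.
const cache = new Map();

function cachedAgentCall(question, runAgent) {
  const key = question.trim().toLowerCase(); // normalize to improve hit rate
  if (cache.has(key)) return cache.get(key); // cache hit: zero LLM cost
  const answer = runAgent(question);         // cache miss: run the agent
  cache.set(key, answer);
  return answer;
}
```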

The Full Picture

The AI Agent node is n8n's most capable and most complex node. Here's a quick reference of everything it connects to:

| Sub-node Type | Required? | Options |
|---|---|---|
| Chat Model | Required | OpenAI, Anthropic, Gemini, Groq, Mistral, Azure OpenAI, Ollama, AWS Bedrock, DeepSeek, Google Vertex AI |
| Memory | Optional | Simple Memory, Window Buffer, Postgres, Redis, Motorhead, Xata, Zep, Vector Store Memory |
| Tool | Optional* | Calculator, Wikipedia, Wolfram Alpha, SerpAPI, Code, HTTP Request, Call n8n Workflow, 100+ app integrations |
| Output Parser | Optional | Structured Output Parser, Auto-fixing Output Parser, Item List Output Parser |

* Technically optional, but an agent without tools is just an expensive chatbot.

The key settings to know:

  • System Message: Defines the agent's behavior. The most impactful thing you can configure.
  • Max Iterations: Controls how many reasoning loops the agent can run. Default 10.
  • Return Intermediate Steps: Shows every tool call and reasoning step in the output. Use for debugging.
  • Enable Streaming: Sends responses token by token for a more responsive chat experience.
  • Require Specific Output Format: Enables the Output Parser sub-node slot for structured JSON output.
  • Automatically Passthrough Binary Images: Enables multimodal input (images) for vision-capable models.
