AI Agent Workflow Patterns: Building Reliable Multi-Step AI Systems

As AI capabilities advance, single-shot prompts are giving way to multi-step agent workflows that combine LLM reasoning with structured execution patterns. At Acceli, we've implemented agent workflows for document processing, code review, customer service, and content generation—systems handling millions of requests monthly. The difference between reliable production agents and brittle prototypes lies in workflow architecture.
This guide covers five essential workflow patterns—sequential processing, parallel execution, evaluation loops, orchestration, and routing—drawn from Anthropic's agent design research and our production experience. We'll focus on when to use each pattern, implementation details using the Vercel AI SDK, and the business trade-offs that inform architectural decisions.
Choosing the Right Workflow Pattern
Before diving into specific patterns, understand the key factors that guide architectural decisions. Different patterns suit different business requirements and technical constraints.
Flexibility vs Control Trade-offs
How much autonomy should your AI agent have? This fundamental question shapes architecture:
High flexibility (autonomous agents): The LLM decides execution paths, which tools to use, and when to conclude. Best for open-ended tasks like customer service where conversations are unpredictable. Risk: agents may take unexpected paths or make costly tool calls.
High control (constrained workflows): Predefined sequences with LLM operating within strict boundaries. Best for regulated industries (finance, healthcare) or workflows requiring audit trails. Drawback: less adaptive to edge cases.
For a financial services client, we implemented high-control workflows for compliance-critical operations (KYC verification, transaction approval) while using flexible agents for customer inquiries. This hybrid approach balanced regulatory requirements with user experience—compliance workflows complete in predefined steps while support conversations adapt to user needs.
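A minimal sketch of that split, assuming a hypothetical operation type and two handlers (all names here are illustrative, not the client's actual code):

// Hypothetical dispatcher: compliance-critical operations take a fixed,
// auditable pipeline; open-ended inquiries go to a more autonomous agent.
type Operation = {
  type: 'kyc_verification' | 'transaction_approval' | 'customer_inquiry';
  payload: string;
};

// Placeholder handlers for the two execution styles described above.
async function runConstrainedWorkflow(op: Operation) {
  // ...fixed sequence of validated steps, each one logged for the audit trail
  return { op, path: 'constrained' };
}

async function runAutonomousAgent(op: Operation) {
  // ...LLM-driven tool use until the agent decides the task is resolved
  return { op, path: 'autonomous' };
}

async function handleOperation(op: Operation) {
  const complianceCritical =
    op.type === 'kyc_verification' || op.type === 'transaction_approval';

  return complianceCritical
    ? runConstrainedWorkflow(op) // high control: predefined steps
    : runAutonomousAgent(op);    // high flexibility: agent chooses the path
}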
Error Tolerance and Business Impact
What happens if your agent makes a mistake? This determines workflow complexity:
Low error tolerance: Medical diagnosis, legal analysis, and financial transactions require validation steps, human review loops, and fallback mechanisms. We implement evaluation loops (covered below) that verify outputs before acting on them. For a healthcare application, every AI-generated recommendation undergoes rule-based validation, and edge cases are flagged for human review. This reduced error-related incidents from 3.2% to 0.1%.
High error tolerance: Content generation, idea brainstorming, creative work tolerate imperfect outputs. Simpler workflows suffice—sequential or single-step patterns work well. For a marketing copy generator, we use basic sequential workflow: generate copy → format output → return. Users understand AI may need editing, making complex validation unnecessary.
Start with the simplest pattern your error tolerance allows. Add complexity only when failures carry real business cost.
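For the low-tolerance case, here is a minimal sketch of the validate-then-flag gate described above; the fields, thresholds, and rules are illustrative assumptions, not the production rules:

// Hypothetical rule-based gate: AI recommendations only pass through when
// deterministic checks succeed; anything ambiguous goes to a human queue.
interface Recommendation {
  patientAge: number;
  dosageMg: number;
  confidence: number;
}

function validateRecommendation(rec: Recommendation): 'approve' | 'human_review' {
  if (rec.dosageMg <= 0 || rec.dosageMg > 500) return 'human_review'; // out-of-range output
  if (rec.patientAge < 18) return 'human_review';                     // edge case by policy
  if (rec.confidence < 0.85) return 'human_review';                   // model is unsure
  return 'approve';
}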
Cost Considerations
More sophisticated workflows mean more LLM calls and higher costs. Real-world cost examples:
- Sequential workflow (3-4 LLM calls): $0.02-0.05 per execution with GPT-4o
- Parallel workflow (5-10 simultaneous calls): $0.08-0.15 per execution
- Evaluation loop workflow (5-15 calls depending on iterations): $0.10-0.30 per execution
- Orchestrator-worker workflow (8-20 calls): $0.15-0.40 per execution
For a document analysis system processing 50,000 documents monthly, workflow choice affects annual costs by $50,000-$180,000. We started with simple sequential workflows, added parallel processing for performance (50% latency reduction), then evaluation loops for quality-critical documents (flagged for higher-tier processing). This tiered approach balanced cost and quality—90% of documents use cheap workflows, 10% use expensive multi-iteration patterns.
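A quick sketch of the arithmetic behind that tiering, using midpoints of the per-execution ranges above and an assumed 90/10 split (the split and midpoints are illustrative):

// Rough monthly cost estimate for a tiered document pipeline.
const monthlyVolume = 50_000;
const costPerRun = { sequential: 0.035, evaluationLoop: 0.2 }; // midpoints of the ranges above

const monthlyCost =
  monthlyVolume * 0.9 * costPerRun.sequential +  // 90% on the cheap workflow
  monthlyVolume * 0.1 * costPerRun.evaluationLoop; // 10% on multi-iteration processing

console.log(monthlyCost);       // 2575, roughly $2,600/month
console.log(monthlyCost * 12);  // roughly $31,000/year for this particular split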
Maintenance and Debugging Complexity
Complex workflows are harder to debug and modify. Considerations:
- Single-step: Debug one prompt, modify in minutes
- Sequential: Trace through 3-5 steps, modify in hours
- Parallel: Debug race conditions and inconsistencies, modify in days
- Orchestrator: Understand coordination logic and worker interactions, modify in weeks
For a startup team of 3 engineers, we recommended sequential and routing patterns over complex orchestration. Maintenance burden matters—spending 40% of engineering time debugging agent workflows wasn't sustainable. We refactored to simpler patterns, reducing debugging time from 12 hours/week to 2 hours/week while maintaining 85% of functionality.
Start simple. Add complexity incrementally as you understand your domain better and can justify the maintenance cost.
Sequential Processing: The Foundation Pattern
Sequential workflows execute steps in predefined order, with each step's output feeding the next. This is the simplest reliable pattern—use it whenever tasks have clear sequential dependencies.
When to Use Sequential Workflows
Sequential patterns excel when:
- Tasks have natural ordering: Content generation → quality check → formatting → publication. Each step depends on the previous step's output.
- Requirements are well-understood: You know exactly what needs to happen and in what order. Workflows rarely need to deviate from the standard path.
- Debugging is critical: Sequential execution provides clear audit trails. When something fails, you know exactly which step caused the issue.
Real example: For a legal document generation system, we use this sequence:
- Extract requirements from user input (generateObject)
- Generate document draft (generateText)
- Check legal compliance against rules (generateObject with validation schema)
- Format with appropriate legal language (generateText with specialized prompt)
- Generate document metadata for filing system (generateObject)
This processes 5,000+ documents monthly with 99.2% success rate. Sequential execution ensures every document passes compliance checks before formatting—critical for legal defensibility.
Implementation Pattern
Here's a production-tested pattern for sequential workflows:
import { generateText, generateObject } from 'ai';
import { z } from 'zod';

async function processCustomerFeedback(feedback: string) {
  const model = 'openai/gpt-4o';

  // Step 1: Extract sentiment and key topics
  const { object: analysis } = await generateObject({
    model,
    schema: z.object({
      sentiment: z.enum(['positive', 'negative', 'neutral']),
      topics: z.array(z.string()),
      urgency: z.enum(['low', 'medium', 'high']),
      category: z.enum(['bug', 'feature_request', 'complaint', 'praise'])
    }),
    prompt: `Analyze this customer feedback: ${feedback}`
  });

  // Step 2: Generate appropriate response based on analysis
  const { text: response } = await generateText({
    model,
    system: `You are a customer service representative. Tone: ${
      analysis.sentiment === 'negative'
        ? 'empathetic and solution-focused'
        : 'friendly and appreciative'
    }`,
    prompt: `Generate response to: ${feedback}
Context: Category is ${analysis.category}, urgency is ${analysis.urgency}`
  });

  // Step 3: Quality check the response
  const { object: qualityCheck } = await generateObject({
    model,
    schema: z.object({
      addresses_issue: z.boolean(),
      appropriate_tone: z.boolean(),
      contains_next_steps: z.boolean(),
      issues: z.array(z.string())
    }),
    prompt: `Evaluate this customer service response:
Original feedback: ${feedback}
Response: ${response}
Check if it addresses the issue, has appropriate tone, and includes next steps.`
  });

  // Step 4: Regenerate if quality check fails
  if (!qualityCheck.addresses_issue || !qualityCheck.appropriate_tone) {
    const { text: improvedResponse } = await generateText({
      model: 'openai/gpt-4o', // Could swap in a stronger model for regeneration
      system: 'You are a senior customer service representative.',
      prompt: `Improve this response addressing these issues: ${qualityCheck.issues.join(', ')}
Original feedback: ${feedback}
Previous response: ${response}`
    });

    return {
      analysis,
      response: improvedResponse,
      qualityCheck,
      regenerated: true
    };
  }

  return { analysis, response, qualityCheck, regenerated: false };
}
For a customer service platform, this workflow maintains 92% first-response quality (no regeneration needed) while the 8% requiring regeneration still complete in under 5 seconds total. Sequential execution with quality gates balances speed and reliability.
Error Handling in Sequential Workflows
Sequential workflows can fail at any step. Implement robust error handling:
async function sequentialWorkflow(input: string) {
  const results = { step1: null, step2: null, step3: null, error: null };

  try {
    // Step 1 with timeout
    results.step1 = await Promise.race([
      executeStep1(input),
      timeoutPromise(10000, 'Step 1 timeout')
    ]);

    // Step 2 with retry logic
    for (let attempt = 0; attempt < 3; attempt++) {
      try {
        results.step2 = await executeStep2(results.step1);
        break;
      } catch (error) {
        if (attempt === 2) throw error;
        await delay(1000 * Math.pow(2, attempt)); // Exponential backoff
      }
    }

    // Step 3 with fallback
    try {
      results.step3 = await executeStep3(results.step2);
    } catch (error) {
      console.error('Step 3 failed, using fallback');
      results.step3 = await fallbackStep3(results.step2);
    }

    return results;
  } catch (error) {
    results.error = error;
    await logWorkflowFailure(results);
    throw error;
  }
}
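The workflow above leans on two small helpers, timeoutPromise and delay, that aren't shown; a minimal sketch of what they might look like:

// Minimal helper sketches matching the placeholders in the example above.
function timeoutPromise(ms: number, message: string): Promise<never> {
  return new Promise((_, reject) =>
    setTimeout(() => reject(new Error(message)), ms)
  );
}

function delay(ms: number): Promise<void> {
  return new Promise(resolve => setTimeout(resolve, ms));
}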
For a document processing pipeline, this error handling maintains 99.7% completion rate despite individual step failures. Timeouts prevent hung processes, retries handle transient failures, and fallbacks ensure partial results when possible.
Parallel Processing: Speed Through Concurrency
Parallel workflows execute independent tasks simultaneously, dramatically reducing total execution time. Use when tasks don't depend on each other's outputs.
When Parallel Processing Makes Sense
Parallel execution excels when:
- Tasks are independent: Analyzing different aspects of the same input (security review, performance review, code quality review) can happen simultaneously.
- Latency matters: Users waiting for results benefit from parallelization. 3 sequential tasks taking 2 seconds each = 6 seconds total. Run in parallel = 2 seconds total.
- You have sufficient resources: Each parallel LLM call costs money. Ensure ROI justifies increased costs.
Real example: For a code review agent, we analyze repositories across three dimensions simultaneously:
- Security review: Check for vulnerabilities, injection risks, authentication issues
- Performance review: Identify bottlenecks, memory leaks, optimization opportunities
- Maintainability review: Evaluate code quality, documentation, best practices
Sequential execution took 12-15 seconds. Parallel execution takes 4-5 seconds—a 3x speedup. For developers reviewing 20+ PRs daily, this saved 3+ hours per developer weekly. The business case: $40/month additional LLM costs versus $800/month in developer productivity gains.
Implementation Pattern
Parallel execution with Promise.all and intelligent aggregation:
import { generateText, generateObject } from 'ai';
import { z } from 'zod';

async function parallelCodeReview(code: string, context: string) {
  const model = 'openai/gpt-4o';

  // Execute all reviews simultaneously
  const [securityReview, performanceReview, maintainabilityReview] = await Promise.all([
    // Security review
    generateObject({
      model,
      system: 'You are a security expert. Focus on vulnerabilities, injection risks, and auth issues.',
      schema: z.object({
        vulnerabilities: z.array(z.object({
          severity: z.enum(['critical', 'high', 'medium', 'low']),
          location: z.string(),
          description: z.string(),
          recommendation: z.string()
        })),
        overall_risk: z.enum(['critical', 'high', 'medium', 'low']),
        summary: z.string()
      }),
      prompt: `Review this code for security issues:
Code: ${code}
Context: ${context}`
    }),

    // Performance review
    generateObject({
      model,
      system: 'You are a performance expert. Focus on bottlenecks, memory leaks, optimization opportunities.',
      schema: z.object({
        issues: z.array(z.object({
          impact: z.enum(['critical', 'high', 'medium', 'low']),
          location: z.string(),
          description: z.string(),
          optimization: z.string()
        })),
        overall_impact: z.enum(['critical', 'high', 'medium', 'low']),
        summary: z.string()
      }),
      prompt: `Review this code for performance issues:
Code: ${code}
Context: ${context}`
    }),

    // Maintainability review
    generateObject({
      model,
      system: 'You are a code quality expert. Focus on readability, maintainability, best practices.',
      schema: z.object({
        concerns: z.array(z.object({
          category: z.enum(['naming', 'structure', 'documentation', 'patterns']),
          location: z.string(),
          description: z.string(),
          suggestion: z.string()
        })),
        quality_score: z.number().min(1).max(10),
        summary: z.string()
      }),
      prompt: `Review this code for quality and maintainability:
Code: ${code}
Context: ${context}`
    })
  ]);

  // Aggregate results with another LLM call
  const { text: executiveSummary } = await generateText({
    model,
    system: 'You are a technical lead summarizing code reviews.',
    prompt: `Synthesize these code reviews into an executive summary with priority actions:
Security: ${JSON.stringify(securityReview.object)}
Performance: ${JSON.stringify(performanceReview.object)}
Maintainability: ${JSON.stringify(maintainabilityReview.object)}
Provide:
1. Top 3 critical issues to address immediately
2. Overall assessment (approve/needs work/block)
3. Estimated effort to address all issues`
  });

  return {
    security: securityReview.object,
    performance: performanceReview.object,
    maintainability: maintainabilityReview.object,
    executiveSummary
  };
}
For a code review platform processing 500+ PRs daily, parallel execution reduced review time from 15 seconds to 5 seconds while maintaining review quality (92% agreement with human reviewers).
Handling Partial Failures
With parallel execution, some tasks may succeed while others fail. Design for partial results:
async function parallelWorkflowWithFallbacks(input: string) {
  const tasks = [
    executeTask1(input).catch(e => ({ error: e, fallback: 'default1' })),
    executeTask2(input).catch(e => ({ error: e, fallback: 'default2' })),
    executeTask3(input).catch(e => ({ error: e, fallback: 'default3' }))
  ];

  const results = await Promise.allSettled(tasks);

  // Process results with fallbacks for failures
  const processed = results.map((result, index) => {
    if (result.status === 'fulfilled') {
      if (result.value.error) {
        console.error(`Task ${index} failed, using fallback`, result.value.error);
        return result.value.fallback;
      }
      return result.value;
    }
    console.error(`Task ${index} rejected`, result.reason);
    return null;
  });

  // Continue with available results
  return aggregatePartialResults(processed);
}
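aggregatePartialResults is left undefined above; one minimal sketch, assuming downstream consumers can work with whatever non-null results remain:

// Hypothetical aggregator: keeps whatever succeeded and records how much of
// the analysis is missing, so callers can decide how to react.
function aggregatePartialResults<T>(results: Array<T | null>) {
  const available = results.filter((r): r is T => r !== null);
  return {
    results: available,
    completeness: available.length / results.length, // e.g. 0.67 if one of three failed
    partial: available.length < results.length
  };
}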
For a multi-document analysis system, partial failure handling maintained 95% availability despite individual analysis failures. Some documents get partial analysis, but the workflow always completes.
Evaluation Loops: Self-Improving Workflows
Evaluation loops add quality control by assessing intermediate results and iteratively improving them. Use when output quality is critical and first attempts often need refinement.
When to Implement Evaluation Loops
Evaluation loops are essential when:
- Quality varies significantly: First LLM attempts succeed 60-80% of the time, leaving room for improvement.
- Quality criteria are objective: You can define clear metrics (accuracy, completeness, tone) that an evaluator LLM can assess.
- Iterations provide value: Each refinement cycle meaningfully improves output quality.
Real example: For a technical documentation generator, we found first drafts met quality standards only 68% of the time:
- 20% too technical (users confused)
- 8% too basic (experts bored)
- 4% factually incorrect or incomplete
Implementing evaluation loops improved quality to 94% while adding only 2 seconds to generation time (worth the trade-off for permanent documentation). The evaluator checks: technical accuracy, appropriate audience level, completeness, and clarity. Failed checks trigger regeneration with specific feedback.
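A sketch of what that documentation-specific evaluator schema might look like; the field names are illustrative, and the general loop pattern follows in the next section:

import { z } from 'zod';

// Illustrative evaluator schema for the documentation generator: each failed
// check carries feedback that gets fed back into the regeneration prompt.
const docEvaluationSchema = z.object({
  technical_accuracy: z.boolean(),
  audience_level: z.enum(['too_technical', 'appropriate', 'too_basic']),
  complete: z.boolean(),
  clear: z.boolean(),
  feedback: z.array(z.string()) // concrete fixes for the regeneration step
});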
Implementation Pattern
Evaluation loop with iterative refinement:
import { generateText, generateObject } from 'ai';
import { z } from 'zod';

async function generateWithQualityControl(
  prompt: string,
  qualityCriteria: { minScore: number; maxIterations: number; }
) {
  const model = 'openai/gpt-4o';
  let content = '';
  let iterations = 0;

  // Initial generation
  const { text } = await generateText({ model, prompt });
  content = text;

  // Evaluation and refinement loop
  while (iterations < qualityCriteria.maxIterations) {
    // Evaluate current content
    const { object: evaluation } = await generateObject({
      model,
      schema: z.object({
        scores: z.object({
          accuracy: z.number().min(1).max(10),
          clarity: z.number().min(1).max(10),
          completeness: z.number().min(1).max(10),
          tone: z.number().min(1).max(10)
        }),
        overall_score: z.number().min(1).max(10),
        issues: z.array(z.string()),
        suggestions: z.array(z.string()),
        passes: z.boolean()
      }),
      prompt: `Evaluate this content against quality criteria:
Content: ${content}
Original request: ${prompt}
Score accuracy, clarity, completeness, and tone (1-10).
Identify specific issues and suggestions for improvement.
Determine if content passes quality threshold (${qualityCriteria.minScore}/10).`
    });

    // Check if quality threshold met
    if (evaluation.overall_score >= qualityCriteria.minScore && evaluation.passes) {
      return {
        content,
        finalEvaluation: evaluation,
        iterations: iterations + 1
      };
    }

    // Generate improved version based on feedback
    const { text: improved } = await generateText({
      model: 'openai/gpt-4o', // Use same or better model for refinement
      prompt: `Improve this content addressing these issues:
Original request: ${prompt}
Current content: ${content}
Issues: ${evaluation.issues.join(', ')}
Suggestions: ${evaluation.suggestions.join(', ')}
Current scores: Accuracy ${evaluation.scores.accuracy}/10,
Clarity ${evaluation.scores.clarity}/10,
Completeness ${evaluation.scores.completeness}/10,
Tone ${evaluation.scores.tone}/10
Focus on areas scoring below 8/10.`
    });

    content = improved;
    iterations++;
  }

  // Max iterations reached
  return {
    content,
    finalEvaluation: null,
    iterations,
    warning: 'Max iterations reached without meeting quality threshold'
  };
}

// Usage
const result = await generateWithQualityControl(
  'Explain machine learning to a business executive',
  { minScore: 8, maxIterations: 3 }
);
For a content generation platform, evaluation loops improved customer satisfaction from 73% to 91%. Most content passes in 1-2 iterations (average 1.4 iterations), keeping costs reasonable while dramatically improving quality.
Cost Management in Evaluation Loops
Evaluation loops are expensive—each iteration includes generation + evaluation. Manage costs:
- Use smaller models for evaluation: GPT-4o-mini can evaluate as well as GPT-4 for most criteria at 1/10th the cost.
- Set iteration limits: Cap at 3-5 iterations max. Infinite loops waste money. If quality isn't achieved after 5 attempts, escalate to human review or use a more capable base model.
- Skip evaluation for simple tasks: Only use evaluation loops when quality variability justifies the cost. For simple translations or reformatting, skip evaluation.
- Batch evaluation: Instead of evaluating each piece of content separately, batch multiple pieces into a single evaluation call when possible.
For a translation service processing 100,000 documents monthly, we use evaluation loops only for high-value documents (based on length, complexity, customer tier). This reduced costs from $18,000/month (all documents) to $6,400/month (10% of documents) while maintaining 93% customer satisfaction.
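A minimal sketch of that tiering decision; the fields and thresholds are illustrative assumptions, not the production criteria:

// Hypothetical gating function: only high-value documents get the expensive
// evaluation loop; everything else takes the single-pass path.
interface TranslationJob {
  wordCount: number;
  complexity: 'low' | 'medium' | 'high';
  customerTier: 'free' | 'standard' | 'enterprise';
}

function shouldUseEvaluationLoop(job: TranslationJob): boolean {
  if (job.customerTier === 'enterprise') return true;
  if (job.complexity === 'high') return true;
  return job.wordCount > 5000; // illustrative threshold, not a measured cutoff
}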
Orchestrator-Worker Pattern: Coordinated Specialization
The orchestrator-worker pattern uses a primary LLM (orchestrator) to coordinate specialized workers. Each worker optimizes for specific subtasks while the orchestrator maintains overall context and coherence.
When to Use Orchestration
Orchestrator-worker patterns excel when:
- Tasks require different expertise: Legal review requires different knowledge than technical implementation. Specialized workers perform better than generalist models.
- You need consistent coordination: The orchestrator ensures all workers contribute to a coherent whole, preventing contradictory or disconnected outputs.
- Workflows are dynamic: The orchestrator can adapt execution based on intermediate results, calling different workers as needed.
Real example: For a contract generation system, the orchestrator plans the contract structure while specialized workers handle:
- Legal worker: Ensures compliance with jurisdiction-specific laws
- Financial worker: Calculates terms, payment schedules, penalties
- Domain worker: Incorporates industry-specific clauses (SaaS, construction, etc.)
The orchestrator maintains overall coherence, ensuring financial terms align with legal constraints and domain-specific clauses don't contradict general terms. This produced contracts with 97% attorney approval rate versus 78% for single-model generation.
Implementation Pattern
Orchestrator coordinates specialized workers:
import { generateObject, generateText } from 'ai';
import { z } from 'zod';

async function orchestratedFeatureImplementation(featureRequest: string) {
  const orchestratorModel = 'openai/gpt-4o'; // Stronger model for planning
  const workerModel = 'openai/gpt-4o'; // Workers can be same or different

  // Orchestrator: Plan implementation
  const { object: plan } = await generateObject({
    model: orchestratorModel,
    schema: z.object({
      feature_summary: z.string(),
      components: z.array(z.object({
        type: z.enum(['frontend', 'backend', 'database', 'api', 'tests']),
        description: z.string(),
        dependencies: z.array(z.string()),
        priority: z.enum(['high', 'medium', 'low'])
      })),
      implementation_order: z.array(z.string()),
      estimated_complexity: z.enum(['simple', 'moderate', 'complex'])
    }),
    system: 'You are a senior software architect planning feature implementations.',
    prompt: `Analyze this feature request and create an implementation plan:
Feature: ${featureRequest}
Break down into components, determine dependencies, and suggest implementation order.`
  });

  // Workers: Execute planned components in order
  const implementations = [];

  for (const componentName of plan.implementation_order) {
    const component = plan.components.find(c =>
      c.description.includes(componentName)
    );
    if (!component) continue;

    // Select specialized worker based on component type
    const workerSystem = {
      frontend: 'You are a senior frontend engineer specializing in React/Next.js. Focus on user experience, accessibility, and performance.',
      backend: 'You are a senior backend engineer specializing in Node.js APIs. Focus on scalability, security, and data integrity.',
      database: 'You are a database architect. Focus on schema design, indexing, and query performance.',
      api: 'You are an API designer. Focus on RESTful principles, documentation, and versioning.',
      tests: 'You are a test engineer. Focus on comprehensive test coverage, edge cases, and maintainability.'
    }[component.type];

    const { text: implementation } = await generateText({
      model: workerModel,
      system: workerSystem,
      prompt: `Implement this component:
Component: ${component.description}
Feature context: ${featureRequest}
Dependencies: ${component.dependencies.join(', ')}
Previously implemented components:
${implementations.map(i => `- ${i.component}: ${i.summary}`).join('\n')}
Provide complete implementation with inline documentation.`
    });

    implementations.push({
      component: componentName,
      type: component.type,
      code: implementation,
      summary: component.description
    });
  }

  // Orchestrator: Review coherence and integration
  const { text: integration } = await generateText({
    model: orchestratorModel,
    system: 'You are a senior architect reviewing feature implementations.',
    prompt: `Review these component implementations for coherence and integration:
Feature: ${featureRequest}
Plan: ${JSON.stringify(plan, null, 2)}
Implementations: ${JSON.stringify(implementations, null, 2)}
Provide:
1. Integration checklist
2. Potential issues or conflicts
3. Testing recommendations`
  });

  return { plan, implementations, integration };
}
For a development automation tool, orchestrated workflows improved code quality (fewer integration bugs) while reducing generation time through parallel worker execution. The orchestrator's planning prevents workers from producing incompatible implementations.
Optimizing Orchestrator-Worker Costs
Orchestrator-worker is the most expensive pattern (8-20 LLM calls). Optimize costs:
- Use smaller models for simple workers: Frontend and test workers can use GPT-4o-mini for 1/10th the cost. Reserve GPT-4o for complex backend/architecture work.
- Cache worker outputs: If multiple features need similar components, cache and reuse worker implementations. For a code generation platform, caching common components (authentication, CRUD operations) reduced costs 35%.
- Parallel worker execution: Workers often operate independently. Execute in parallel when possible (like the parallel processing pattern) to reduce latency without increasing costs.
- Limit orchestrator complexity: Simple orchestrators using generateObject with structured planning schemas work as well as complex multi-call orchestrators at a fraction of the cost.
For a feature development tool processing 1,000 features monthly, these optimizations reduced monthly costs from $12,000 to $4,800 while maintaining output quality. The 60% cost reduction justified continued use of expensive orchestrator pattern versus reverting to simpler architectures.
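A minimal sketch of the first two optimizations, per-worker model selection plus a simple output cache; the model assignments and cache shape are illustrative assumptions:

// Illustrative per-worker model selection: cheap models for simpler workers,
// the stronger model only where it earns its cost.
const workerModels: Record<string, string> = {
  frontend: 'openai/gpt-4o-mini',
  tests: 'openai/gpt-4o-mini',
  api: 'openai/gpt-4o-mini',
  backend: 'openai/gpt-4o',
  database: 'openai/gpt-4o'
};

// Illustrative in-memory cache keyed by component type plus normalized description.
const workerCache = new Map<string, string>();

async function runWorker(
  type: string,
  description: string,
  generate: (model: string) => Promise<string>
) {
  const key = `${type}:${description.trim().toLowerCase()}`;
  const cached = workerCache.get(key);
  if (cached) return cached; // reuse a previous implementation for similar requests

  const output = await generate(workerModels[type] ?? 'openai/gpt-4o');
  workerCache.set(key, output);
  return output;
}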
Routing: Context-Aware Execution Paths
Routing patterns let the model decide execution paths based on context. Unlike fixed sequential workflows, routing adapts to input characteristics, optimizing for different scenarios dynamically.
When Routing Adds Value
Routing patterns excel when:
- Inputs vary significantly: Customer service queries range from simple FAQ to complex refund disputes. Each needs different handling.
- Cost optimization matters: Route simple queries to cheap models (GPT-4o-mini), complex queries to expensive models (Claude Sonnet). This dramatically reduces average costs.
- Latency targets vary: Simple queries need instant responses; complex queries tolerate longer processing for better quality.
Real example: For a customer support chatbot handling 50,000 queries monthly:
- 60% are simple FAQ (product info, pricing, hours) → GPT-4o-mini, <1 second
- 30% are standard issues (account, billing, orders) → GPT-4o, 2-3 seconds
- 10% are complex problems (technical support, escalations) → Claude Sonnet + tool calling, 5-8 seconds
Routing reduced average query cost from $0.08 (all queries on GPT-4o) to $0.03 (routed appropriately), saving $2,500 monthly while improving response times for simple queries by 60%.
Implementation Pattern
Two-stage routing: classify then process:
import { generateObject, generateText } from 'ai';
import { z } from 'zod';

async function routedCustomerSupport(query: string, context: any) {
  // Stage 1: Classification and routing decision
  const { object: classification } = await generateObject({
    model: 'openai/gpt-4o-mini', // Use cheap model for routing
    schema: z.object({
      category: z.enum(['faq', 'account', 'technical', 'billing', 'refund', 'escalation']),
      complexity: z.enum(['simple', 'moderate', 'complex']),
      requires_tools: z.boolean(),
      reasoning: z.string()
    }),
    system: 'You are a triage specialist routing customer queries.',
    prompt: `Classify this customer query:
Query: ${query}
Customer context: ${JSON.stringify(context)}
Determine:
1. Category (faq, account, technical, billing, refund, escalation)
2. Complexity (simple, moderate, complex)
3. Whether it requires external tools (database lookup, API calls)
4. Brief reasoning`
  });

  // Stage 2: Route to appropriate handler based on classification

  // Simple FAQ - fast, cheap model
  if (classification.category === 'faq' && classification.complexity === 'simple') {
    const { text: response } = await generateText({
      model: 'openai/gpt-4o-mini',
      system: 'You are a helpful customer service agent. Provide concise, friendly answers.',
      prompt: query
    });

    return {
      response,
      classification,
      model: 'gpt-4o-mini',
      latency: 'fast'
    };
  }

  // Standard queries - balanced model
  if (classification.complexity === 'moderate' && !classification.requires_tools) {
    const { text: response } = await generateText({
      model: 'openai/gpt-4o',
      system: `You are an experienced customer service agent specializing in ${classification.category}.`,
      prompt: `Customer query: ${query}
Customer context: ${JSON.stringify(context)}`
    });

    return {
      response,
      classification,
      model: 'gpt-4o',
      latency: 'medium'
    };
  }

  // Complex queries with tools - powerful model + agent capabilities
  if (classification.complexity === 'complex' || classification.requires_tools) {
    const { text: response } = await generateText({
      model: 'anthropic/claude-3-5-sonnet',
      system: `You are a senior customer service specialist with access to tools. Specialization: ${classification.category}`,
      tools: {
        // Define relevant tools based on category, e.g.
        // lookup_account, process_refund, create_ticket (tool definitions omitted)
      },
      prompt: `Customer query: ${query}
Customer context: ${JSON.stringify(context)}
Use available tools as needed to fully resolve the query.`
    });

    return {
      response,
      classification,
      model: 'claude-3-5-sonnet',
      latency: 'slow',
      tools_used: true
    };
  }

  // Fallback to standard handling
  const { text: response } = await generateText({
    model: 'openai/gpt-4o',
    prompt: query
  });

  return { response, classification, model: 'gpt-4o' };
}
For the customer support chatbot, routing improved key metrics:
- Average response time: 3.2s → 1.8s (44% faster)
- Cost per query: $0.08 → $0.03 (62% cheaper)
- Customer satisfaction: 78% → 86% (10% higher)
The initial classification call ($0.001) pays for itself through optimized downstream routing.
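The arithmetic behind that claim, with assumed per-tier costs chosen to roughly match the figures above (the individual per-model costs are illustrative, not measured):

// Back-of-envelope check on the routed cost per query.
const routedCost =
  0.001 +        // classification call on GPT-4o-mini
  0.6 * 0.002 +  // 60% simple FAQ on GPT-4o-mini
  0.3 * 0.05 +   // 30% standard issues on GPT-4o
  0.1 * 0.12;    // 10% complex queries on Claude Sonnet with tools

console.log(routedCost.toFixed(3)); // 0.029, close to the $0.03 average above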
Advanced Routing: Multi-Dimensional Decisions
Sophisticated routing considers multiple factors:
async function advancedRouting(input: string, metadata: any) {
  const { object: routing } = await generateObject({
    model: 'openai/gpt-4o-mini',
    schema: z.object({
      complexity: z.enum(['simple', 'moderate', 'complex']),
      domain: z.enum(['technical', 'business', 'creative']),
      urgency: z.enum(['low', 'medium', 'high']),
      estimated_tokens: z.number(),
      recommended_model: z.string(),
      reasoning: z.string()
    }),
    prompt: `Analyze this input for optimal routing:
Input: ${input}
User tier: ${metadata.userTier}
History: ${metadata.previousInteractions}
Consider: complexity, domain, urgency, expected token usage
Recommend optimal model and approach.`
  });

  // Route based on multi-dimensional decision
  const modelChoice = selectModel(routing, metadata);
  const systemPrompt = selectPrompt(routing, metadata);
  const toolsEnabled = routing.complexity === 'complex';

  return { routing, modelChoice, systemPrompt, toolsEnabled };
}
This enables sophisticated optimizations:
- Free tier users → cheap models for all queries
- Premium users → powerful models for better experience
- High urgency + simple → fast model with < 1s response
- High urgency + complex → parallel processing for speed
- Low urgency + complex → thorough evaluation loops for quality
For a SaaS platform with tiered pricing, advanced routing delivered differentiated service levels while optimizing costs. Premium users received Claude Sonnet for all queries (better experience justifies cost), while free tier received GPT-4o-mini (adequate quality at sustainable cost).
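The selectModel helper is left undefined in the snippet above; one possible sketch of the tier-and-complexity mapping just described (the model names and tiers are illustrative):

// Illustrative routing table: user tier sets the ceiling, complexity picks
// within it. Adjust models and tiers to your own catalog.
function selectModel(
  routing: { complexity: 'simple' | 'moderate' | 'complex' },
  metadata: { userTier: 'free' | 'premium' }
): string {
  if (metadata.userTier === 'premium') {
    return 'anthropic/claude-3-5-sonnet'; // better experience justifies the cost
  }
  if (routing.complexity === 'simple') {
    return 'openai/gpt-4o-mini'; // adequate quality at sustainable cost
  }
  return 'openai/gpt-4o';
}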
Conclusion
AI agent workflows are essential for building reliable production systems beyond simple chatbots. The five patterns covered—sequential processing, parallel execution, evaluation loops, orchestration, and routing—provide a toolkit for different requirements:
Start simple: Sequential workflows for well-understood tasks with clear steps. Add complexity incrementally as business value justifies additional costs and maintenance burden.
Optimize for latency: Parallel processing when tasks are independent, routing when inputs vary significantly. Both reduce user-perceived latency dramatically.
Improve quality: Evaluation loops when output quality varies and refinement provides value. Worth 2-3x cost increase when quality directly impacts business outcomes.
Handle complexity: Orchestrator-worker for tasks requiring diverse expertise, routing for dynamic adaptation. Most expensive patterns—only use when simpler approaches fail.
The key to successful agent workflows: match pattern complexity to business requirements. Over-engineered workflows waste money and developer time. Under-engineered workflows produce unreliable results that erode user trust. Find the balance through iterative refinement based on production metrics: cost per query, latency, quality scores, and user satisfaction.
Budget 2-4 weeks for initial workflow implementation, 4-8 weeks for optimization based on production data. The investment pays off through reduced manual work, improved quality, and scalable AI operations that grow with your business.
Building complex AI agent workflows?
We've implemented multi-step agent workflows for clients across document processing, customer service, code review, and content generation. Our team can help you design, implement, and optimize workflows that balance quality, cost, and maintainability. Let's discuss your AI agent project.
Get in Touch