Note: “Flow” in this document refers to wxO Agentic Workflow
Agent Performance Overview
What is an Agent in wxO?
An agent in watsonx Orchestrate is an intelligent component that:
- Reasons about problems and makes decisions
- Orchestrates tools and flows to accomplish tasks
- Interacts with users through natural language
- Adapts its approach based on context and results
Agent Execution Model (ReAct Loop)
Agents use a ReAct (Reasoning and Acting) loop to solve problems.
ReAct Loop Steps:
- Thought: Analyze the situation (LLM call)
- Action: Select tool or provide answer (LLM call)
- Observation: Execute tool and observe result
- Repeat or Finish: Continue loop or provide answer
Key Characteristics:
- Each loop iteration involves multiple LLM calls
- Agents can call tools (Python, Langflow, APIs, Flows)
- Agents can invoke other agents
- Loop continues until agent determines task is complete
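The loop above can be sketched in a few lines of Python. This is an illustrative toy, not wxO's implementation: `call_llm` and the tool registry are hypothetical stand-ins, but it shows why each iteration adds multiple LLM calls.

```python
# Illustrative ReAct loop sketch. `call_llm` and `tools` are hypothetical
# stand-ins, not wxO APIs; each iteration makes at least two LLM calls.
def react_loop(query, tools, call_llm, max_iterations=5):
    context = [f"Question: {query}"]
    for _ in range(max_iterations):
        # Thought: analyze the situation (LLM call)
        thought = call_llm("Thought: what should I do next?\n" + "\n".join(context))
        context.append(f"Thought: {thought}")
        # Action: select a tool or decide to finish (LLM call)
        action = call_llm("Action: pick a tool or FINISH\n" + "\n".join(context))
        if action.startswith("FINISH"):
            return call_llm("Answer the question.\n" + "\n".join(context))
        tool_name, _, tool_input = action.partition(":")
        # Observation: execute the tool and record the result
        observation = tools[tool_name.strip()](tool_input.strip())
        context.append(f"Observation: {observation}")
    return call_llm("Provide your best answer now.\n" + "\n".join(context))
```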
Agent Performance Components
Total agent response time consists of:
- Guidelines processing: LLM call before agent loop to process guidelines (if configured)
- Number of reasoning loops: More loops = longer response time
- Tool selection complexity: More available tools = longer selection time
- Tool execution time: Slow tools slow down the entire agent
- LLM model size: Larger models are slower but more capable
- Context size: Larger context = longer processing time
- Prompt complexity: Complex prompts require more reasoning
- Plugin logic: Custom plugin code execution (if using plugins)
How to Test Agent Performance
Testing Methodology
For comprehensive testing methodology, see the main Performance Guide, which covers:
- Baseline Testing: Establish performance benchmarks under minimal load
- Load Testing: Validate behavior under expected production load
- Stress Testing: Determine system breaking points (less relevant for SaaS)
Testing Agents via API
API Documentation: Orchestrate Runs API
How to Test:
- Execute agent runs and retrieve run information using the Orchestrate Assistant Runs API
- Get detailed traces using the searchTraces API and Get Spans for Trace API
- User requests (human-in-the-loop interactions) must be obtained through the Messages API
- This API retrieves messages from a thread, including user responses to flow requests
- Use the APIs programmatically to run multiple test queries
- Collect performance metrics across runs
- Calculate statistics (average, median, percentiles)
- Compare performance across different agent configurations
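A hypothetical harness for the "run multiple test queries" step. `run_agent` stands in for a wrapper around the Runs API (create a run, poll for completion); the real endpoints and payloads are in the API documentation above, not reproduced here.

```python
import time

# Hypothetical test harness: run each query several times and collect
# wall-clock latency plus the response. `run_agent(query)` is assumed to
# wrap the Orchestrate Runs API (create run, poll until complete).
def collect_latencies(run_agent, queries, repeats=3):
    results = []
    for query in queries:
        for _ in range(repeats):
            start = time.perf_counter()
            response = run_agent(query)
            elapsed = time.perf_counter() - start
            results.append({"query": query, "latency": elapsed, "response": response})
    return results
```

The raw latency list can then be fed into whatever statistics or comparison step your test plan calls for.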
Testing Agents via Channel-Specific Interfaces
Available Testing Channels:
- Web chat UI
- Voice interfaces
- Slack
- Other integrated channels
Recommended Approach:
- Test via API first - Ensure the agent works correctly through the API before running UI automation tests
- Then test via channels - Once API testing is successful, validate the agent through channel-specific interfaces
- Use appropriate tools - wxO has no specific tool recommendations; choose tools that fit your testing needs
Key Metrics to Track
Speed Metrics
Response Time
- Total time from query to response
- Track: Average, Median, 95th percentile, 99th percentile
- Target: Define based on your SLA requirements
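The tracked statistics (average, median, 95th, 99th percentile) can be computed with the standard library; a minimal sketch:

```python
import statistics

# Compute the tracking statistics named above from a list of response
# times in seconds. quantiles(n=100) yields the 1st..99th percentile
# cut points, so index 94 is p95 and index 98 is p99.
def latency_stats(latencies):
    pct = statistics.quantiles(latencies, n=100)
    return {
        "avg": statistics.mean(latencies),
        "median": statistics.median(latencies),
        "p95": pct[94],
        "p99": pct[98],
    }
```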
Reasoning Loops
- Number of thought-action-observation cycles
- Fewer loops generally mean faster responses
- Track: Average loops per query type
Tool Calls
- Number of tools invoked per agent run
- More tool calls = longer execution time
- Track: Average tool calls per query type
LLM Calls
- Number of LLM inference calls
- Each call adds latency
- Track: Total calls per run
Quality Metrics
Accuracy
- Percentage of correct responses
- Critical: Speed without accuracy is useless
- Track: Accuracy rate per query type
Relevance
- How well the response addresses the query
- Track: User satisfaction scores
Completeness
- Whether the response fully answers the question
- Track: Follow-up question rate
Consistency
- Similar queries should get similar responses
- Track: Response similarity for equivalent queries
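One simple, assumed way to quantify "response similarity for equivalent queries" is a word-level sequence ratio; more sophisticated options (embedding similarity, LLM-as-judge) exist, but this needs only the standard library:

```python
import difflib

# Score how similar two agent responses are (1.0 = identical wording,
# 0.0 = no words in common), ignoring case. A rough consistency proxy.
def response_similarity(a, b):
    return difflib.SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()
```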
Cost Metrics
Token Usage:
- Total tokens consumed per run
- Directly impacts LLM costs
- Track: Average tokens per query type
External API Costs:
- External API calls and their costs
- Track: API call count and associated costs
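A back-of-envelope per-run cost estimate built from tracked token counts. The per-1K-token prices below are placeholders, not real watsonx rates; substitute your model's actual pricing.

```python
# Estimate LLM cost for one agent run from token counts.
# The default prices are placeholders -- plug in your model's real rates.
def estimate_run_cost(input_tokens, output_tokens,
                      price_per_1k_input=0.0005, price_per_1k_output=0.0015):
    return ((input_tokens / 1000) * price_per_1k_input
            + (output_tokens / 1000) * price_per_1k_output)
```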
Agent Optimization Strategies
1. Choose the Right Approach
Different Agent Styles:
| Use Case | Recommended Approach | Why |
|---|---|---|
| Simple text generation | Default Agent | Faster, simpler call |
| Straightforward Q&A | Default Agent | No reasoning loop needed |
| Complex reasoning | ReAct Agent | Needs multi-step thinking |
| Tool orchestration | ReAct Agent | Requires tool selection |
| Multi-step workflows | ReAct Agent | Benefits from reasoning loop |
2. Optimize Prompts
Make Prompts Clear and Specific
- ❌ Vague: “Help the user”
- ✅ Clear: “Answer customer questions about order status using the order_lookup tool”
Provide Examples
- Include few-shot examples in the prompt
- Shows the agent the expected behavior
- Reduces reasoning loops
Structure Output
- Request structured responses (JSON, specific format)
- Reduces generation time
- Easier to parse
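The three recommendations combined in one example system prompt: a specific instruction, a few-shot example, and a structured-output request. The `order_lookup` tool name is illustrative, not a real wxO tool.

```python
# Example agent prompt applying the three recommendations above.
# `order_lookup` is a hypothetical tool name used for illustration.
AGENT_PROMPT = """\
You answer customer questions about order status using the order_lookup tool.

Example:
User: Where is order 1042?
Assistant: {"order_id": "1042", "status": "shipped", "eta": "2 days"}

Always respond with JSON containing order_id, status, and eta."""
```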
3. Optimize Guidelines
What are Guidelines: Agent Guidelines are instructions that help shape agent behavior.
Optimization Strategies:
- Keep guidelines concise: Shorter guidelines = faster processing
- Use guidelines only when needed: Remove if not providing value
- Test with and without: Measure impact on response time
- Balance guidance vs speed: Guidelines improve quality but add latency
4. Limit Available Tools
Solution:
- Provide only relevant tools for the agent’s purpose
- Group related tools
- Use clear, descriptive tool names
- ❌ Agent with 25 generic tools
- ✅ Agent with 5-10 focused, relevant tools
4a. Avoid Deep Agent Hierarchies
Why It Matters:
- Each nested agent runs its own ReAct loop with multiple LLM calls
- Deep hierarchies spend more time on reasoning than actual work
- Agent A → Agent B → Agent C means 3x the reasoning overhead
- LLM inference time compounds at each level
Recommendations:
- Limit nesting depth to 2-3 levels maximum
- Consider using Flows instead of nested agents for deterministic orchestration
- Use tools (Python, API) for simple operations instead of wrapping them in agents
- Flatten agent hierarchies where possible
- ❌ Agent → Agent → Agent → Agent (4 levels of reasoning overhead)
- ✅ Agent → Flow → Tools (reasoning only at top level, deterministic execution below)
- ✅ Agent → Agent → Tools (2 levels, acceptable for complex scenarios)
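A rough illustration of the "3x the reasoning overhead" claim: if each agent level runs its own ReAct loop and delegates once to the next level, LLM calls grow linearly with depth. The loop and per-loop call counts below are illustrative assumptions.

```python
# Back-of-envelope estimate of total LLM calls for a chain of nested
# agents, each running its own ReAct loop and delegating once downward.
# loops_per_agent and calls_per_loop are illustrative assumptions.
def llm_calls(depth, loops_per_agent=3, calls_per_loop=2):
    return depth * loops_per_agent * calls_per_loop
```

With these assumptions, a 4-level chain makes four times as many reasoning calls as a single agent over tools.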
5. Choose Appropriate Model
Model Selection Trade-offs:
| Model Size | Speed | Capability | Use Case |
|---|---|---|---|
| Small | Fastest | Basic | Simple queries, high volume |
| Medium | Moderate | Good | General purpose |
| Large | Slower | Best | Complex reasoning, high accuracy needs |
Agent Quality and Evaluation
Why Quality Matters
Speed vs Quality Trade-off
- Fast but inaccurate agents provide poor user experience
- Slow but accurate agents frustrate users
- Goal: Optimize for both speed AND quality
Agent Evaluation
Evaluation Framework: Quick Evaluation of Agents and Tools
What You Can Evaluate:
- Agent accuracy on test queries
- Response quality and relevance
- Consistency across similar queries
- Comparison between different agent configurations
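A minimal sketch of the accuracy check, assuming exact-match scoring and a `run_agent` harness like the one used for API testing; real evaluations would use fuzzier matching or a judge model.

```python
# Score agent accuracy over labeled test cases using exact-match
# comparison. `run_agent` is a stand-in for your test harness.
def accuracy(run_agent, test_cases):
    correct = sum(1 for query, expected in test_cases
                  if run_agent(query).strip() == expected)
    return correct / len(test_cases)
```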
Best Practices Summary
Do’s
Measure before optimizing
- Establish baseline performance
- Identify actual bottlenecks
- Track both speed and quality metrics
Use the right approach
- Generative Prompts for simple tasks
- Agents for complex reasoning
- Don’t over-engineer
Optimize prompts
- Clear, specific instructions
- Include examples
- Request structured output
Optimize guidelines
- Keep guidelines concise
- Use only when needed
- Test impact on performance
Limit tool count
- Provide only relevant tools
- Use descriptive names
- Group related functionality
Avoid deep agent hierarchies
- Limit agent nesting to 2-3 levels maximum
- Use Flows for deterministic orchestration
- Use tools instead of agents for simple operations
- Flatten hierarchies to reduce reasoning overhead
Optimize plugin logic
- Write efficient plugin code
- Cache plugin results when appropriate
- Profile plugin performance
Validate quality
- Use evaluation framework to validate accuracy
- Ensure optimizations don’t degrade quality
- Iterate based on real metrics
Don’ts
Don’t optimize without measuring
- Guessing at bottlenecks wastes effort; establish a baseline first
Don’t sacrifice accuracy for speed
- Speed without accuracy is useless
Don’t overload agents with tools
- Large, generic toolsets slow tool selection
Don’t nest agents deeply
- Hierarchies beyond 2-3 levels multiply reasoning overhead
Resources and Links
Main Performance Guide
Comprehensive performance guide overview
Flow Performance
Flow-specific performance optimization
Tool Performance
Tool execution performance guide

