Performance Testing Approach

This guide focuses on how to measure and optimize Knowledge runtime performance. Performance varies significantly based on workload, configuration, system load, and network conditions. Always measure in your own environment.
Note: “Flow” in this document refers to a wxO Agentic Workflow.
Overview
Knowledge in watsonx Orchestrate enables agents to return accurate, relevant answers based on trusted information. Knowledge sources provide factual and contextual data that help interpret user questions and generate meaningful responses.
Knowledge Source Types:
- File Uploads: Static content stored in internal knowledge base
- External Repositories: Milvus, Elasticsearch, Astra DB, or custom services
Key Insight: Knowledge is internally implemented as a built-in tool, which means:
- It shares the same performance characteristics as other tools in wxO
- Each knowledge retrieval incurs tool invocation overhead
- Performance depends on both the tool call overhead and the actual search execution
- Understanding this helps you optimize knowledge usage patterns
Performance Components:
- Tool Invocation: Time to initialize and prepare the knowledge request
- Search Execution: Time to perform the actual search (varies by repository type)
- Result Processing: Time to format and return results to the agent
This is important because:
- Multiple knowledge calls = Multiple tool invocation costs
- The actual search time is often fast, but tool overhead exists for each call
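The per-call overhead described above can be sketched as a small accounting helper. The field names and millisecond figures below are illustrative assumptions, not wxO measurements:

```python
from dataclasses import dataclass

@dataclass
class KnowledgeTiming:
    """Hypothetical per-call timing breakdown, in milliseconds."""
    tool_invocation_ms: float    # overhead to initialize/prepare the request
    search_ms: float             # actual search execution in the repository
    result_processing_ms: float  # formatting results for the agent

    @property
    def total_ms(self) -> float:
        return self.tool_invocation_ms + self.search_ms + self.result_processing_ms

def overhead_share(calls: list[KnowledgeTiming]) -> float:
    """Fraction of total latency spent outside the search itself."""
    total = sum(c.total_ms for c in calls)
    search = sum(c.search_ms for c in calls)
    return (total - search) / total if total else 0.0

# Three knowledge calls in one agent run: the tool overhead accumulates
# even though each individual search is fast.
calls = [KnowledgeTiming(120, 80, 40), KnowledgeTiming(110, 60, 30), KnowledgeTiming(130, 90, 50)]
print(f"overhead share: {overhead_share(calls):.0%}")  # roughly two thirds in this toy example
```

A breakdown like this makes it obvious when consolidating multiple knowledge calls into one would pay off.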
Knowledge Modes: Classic vs Dynamic
Knowledge can operate in two modes, each with different performance characteristics:
Classic Mode:
- Knowledge performs query rewrite, search, and answer generation
- Returns a complete answer to the agent
- Latency: Higher (includes query rewrite + search + answer generation)
- Use case: When you want consistent, pre-formatted answers
Dynamic Mode (Recommended):
- Knowledge returns raw search results to the agent
- Agent processes the results and formulates its own response
- Latency: Lower for knowledge retrieval
- Use case: When agent needs flexibility to interpret and combine results
Performance Comparison:
| Aspect | Classic Mode | Dynamic Mode |
|---|---|---|
| Knowledge execution | Slower (query rewrite + search + answer generation) | Faster (search only) |
| Agent processing | Minimal (receives answer) | More (interprets results) |
| Total latency | Higher | Variable (depends on agent processing) |
| Flexibility | Lower (fixed answer format) | Higher (agent can combine/interpret) |
| Best for | Simple Q&A, consistent answers | Complex reasoning, multi-source synthesis |
Recommendation: Use Dynamic mode for better performance and flexibility unless you specifically need pre-formatted answers.
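As a rough illustration of the mode difference, the sketch below models the stages each mode runs inside the knowledge tool. The stage latencies are placeholder numbers, not measured values:

```python
# Illustrative latency model (numbers are assumptions, not measurements):
# Classic mode runs query rewrite + search + answer generation inside the
# knowledge tool; Dynamic mode runs search only and hands raw results to the agent.
CLASSIC_STAGES = {"query_rewrite_ms": 300, "search_ms": 150, "answer_generation_ms": 900}
DYNAMIC_STAGES = {"search_ms": 150}

def knowledge_latency(stages: dict[str, int]) -> int:
    """Total time spent inside the knowledge tool for one call."""
    return sum(stages.values())

print("classic:", knowledge_latency(CLASSIC_STAGES), "ms")  # rewrite + search + generation
print("dynamic:", knowledge_latency(DYNAMIC_STAGES), "ms")  # search only; agent does the rest
```

Note that Dynamic mode shifts work to the agent, so its total end-to-end latency depends on how much reasoning the agent then does over the raw results.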
You can use the built-in file uploads or connect to external knowledge sources like Milvus, Elasticsearch, Astra DB, or a custom service as your knowledge source.
Note: Using external knowledge sources requires creating your own vector index and ingesting your data into it.
| Repository Type | Performance | Network Dependency | Best For |
|---|---|---|---|
| File Uploads | Very fast | Managed by wxO | Static content, quick setup, small to medium datasets |
| Milvus | Fast | Yes | Large-scale vector search, high-volume queries |
| Elasticsearch | Fast | Yes | Full-text/hybrid search, structured data, complex filtering |
| Astra DB | Fast | Yes | Cloud-native deployments, managed service |
| Custom Service | Varies | Yes | Specialized integrations, custom search logic |
Repository Selection Guide
File Uploads:
- Best for: Static content, documentation, policies, FAQs
- Limits: 20 files per batch, 30MB total, 600 pages per file
- Formats: .docx, .pdf, .pptx, .xlsx (25MB max), .csv, .html, .txt (5MB max)
- Performance: Fastest option for small to medium datasets
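The upload limits above can be checked before a batch upload. The sketch below uses the limits as stated in this guide; the per-format size caps are simplified here and worth re-checking against the official documentation:

```python
# Pre-flight check against the documented file-upload limits.
# Per-format byte caps below are an assumption based on this guide's summary.
MAX_FILES_PER_BATCH = 20
MAX_BATCH_BYTES = 30 * 1024 * 1024  # 30MB total per batch
MAX_FILE_BYTES = {
    ".docx": 25 * 1024 * 1024, ".pdf": 25 * 1024 * 1024,
    ".pptx": 25 * 1024 * 1024, ".xlsx": 25 * 1024 * 1024,
    ".csv": 5 * 1024 * 1024, ".html": 5 * 1024 * 1024, ".txt": 5 * 1024 * 1024,
}

def validate_batch(files: list[tuple[str, int]]) -> list[str]:
    """files: (name, size_bytes) pairs. Returns a list of limit violations."""
    errors = []
    if len(files) > MAX_FILES_PER_BATCH:
        errors.append(f"too many files: {len(files)} > {MAX_FILES_PER_BATCH}")
    if sum(size for _, size in files) > MAX_BATCH_BYTES:
        errors.append("batch exceeds 30MB total")
    for name, size in files:
        ext = "." + name.rsplit(".", 1)[-1].lower()
        limit = MAX_FILE_BYTES.get(ext)
        if limit is None:
            errors.append(f"{name}: unsupported format {ext}")
        elif size > limit:
            errors.append(f"{name}: {size} bytes exceeds {limit}")
    return errors

print(validate_batch([("handbook.pdf", 10 * 1024 * 1024), ("faq.txt", 100_000)]))  # []
```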
Milvus:
- Best for: Large-scale semantic search, high query volumes
- Strengths: Optimized for vector similarity search, scales well
- Consideration: Requires setup and maintenance
Elasticsearch:
- Best for: Full-text search, hybrid search, structured content
- Strengths: Flexible query capabilities, good for keyword and semantic search
- Consideration: Supports custom query bodies for advanced filtering
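As an example of such a custom query body, the sketch below builds an Elasticsearch 8.x-style request that combines a keyword (BM25) leg with a vector (kNN) leg. The index field names and embedding dimension are hypothetical placeholders:

```python
# Field names ("content", "content_vector") and the 384-dim vector are
# hypothetical; substitute your own index mapping and embedding model.
def hybrid_query_body(query_text: str, query_vector: list[float], k: int = 5) -> dict:
    """Elasticsearch 8.x-style body combining BM25 full-text and kNN vector search."""
    return {
        "query": {  # keyword leg: BM25 match on the text field
            "match": {"content": {"query": query_text}}
        },
        "knn": {    # vector leg: approximate nearest-neighbour search
            "field": "content_vector",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": 10 * k,  # larger candidate pool = better recall, slower
        },
        "size": k,
    }

body = hybrid_query_body("refund policy", [0.1] * 384)
print(body["knn"]["num_candidates"])  # 50
```

This body would be passed to the search endpoint (for example, `es.search(index=..., body=body)` with the official client).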
Astra DB:
- Best for: Cloud deployments, managed service preference
- Strengths: Consistent performance, reduced operational overhead
- Consideration: Cloud-native architecture
Custom Service:
- Best for: Specialized integrations, custom search logic, unique requirements
- Strengths: Full control over search implementation, can integrate proprietary systems
- Consideration: Performance varies by implementation, requires custom development and maintenance
Retrieval Speed vs Quality Trade-offs
Understanding the trade-offs between speed and quality is crucial for optimizing knowledge performance. Different configurations impact both retrieval speed and answer accuracy.
Search Strategy Trade-offs
For Elasticsearch & Astra DB:
| Search Type | Speed | Quality | Best For |
|---|---|---|---|
| Keyword Search | Fastest | Good for exact matches | Known terminology, product codes, exact phrases |
| Vector Search | Moderate | Best for semantic understanding | Natural language queries, conceptual similarity |
| Hybrid Search | Slowest | Best overall accuracy | Complex queries requiring both precision and recall |
Keyword Search:
- Speed: Fastest - no embedding generation required
- Quality: Excellent for exact matches, limited for semantic similarity
- Use when: Users search with specific terms, product codes, or exact phrases
- Example: “Order #12345”, “Return policy”, “Product SKU-789”
Vector Search:
- Speed: Moderate - requires embedding generation for query
- Quality: Excellent for semantic understanding, handles synonyms and paraphrasing
- Use when: Natural language queries, conceptual searches, multilingual content
- Example: “How do I get my money back?” (matches “refund policy”)
Hybrid Search:
- Speed: Slowest of the three - runs both the keyword and vector legs
- Quality: Best overall - combines precision of keywords with semantic understanding
- Use when: Accuracy is critical, queries are complex, or you need both exact and semantic matches
- Example: “iPhone 15 battery life issues” (exact product + semantic problem)
Recommendation: Use hybrid search for best quality, and only optimize to keyword or vector search based on your specific use case and performance requirements.
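To make that optimization decision with data rather than intuition, a small harness can compare strategies against your own repository. The three search functions below are stand-ins; swap in your actual client calls:

```python
import time
from statistics import median

def benchmark(search_fn, queries, runs: int = 3) -> float:
    """Median latency in ms of search_fn over the given queries."""
    samples = []
    for _ in range(runs):
        for q in queries:
            start = time.perf_counter()
            search_fn(q)
            samples.append((time.perf_counter() - start) * 1000)
    return median(samples)

# Stand-ins for real search calls; replace with your repository client.
def keyword_search(q): time.sleep(0.001)  # e.g. a BM25 match query
def vector_search(q):  time.sleep(0.003)  # embedding generation + ANN search
def hybrid_search(q):  time.sleep(0.004)  # both legs combined

queries = ["refund policy", "iPhone 15 battery life issues"]
for name, fn in [("keyword", keyword_search), ("vector", vector_search), ("hybrid", hybrid_search)]:
    print(f"{name}: {benchmark(fn, queries):.1f} ms")
```

Run this with representative queries from your own workload; the ranking of strategies can differ between repositories and datasets.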
Result Count Trade-offs (Default is 5)
| Result Count | Speed | Quality | Best For |
|---|---|---|---|
| Low (1-3) | Fastest | Risk of missing relevant info | Simple queries, single-fact retrieval |
| Medium (5-10) | Moderate | Balanced coverage | General purpose, most use cases |
| High (15+) | Slowest | Comprehensive but may include noise | Complex queries, research tasks |
Performance Impact:
- More results = Longer retrieval time
- More results = Larger context for agent processing
- More results = Higher token usage
Quality Impact:
- Too few results = May miss relevant information
- Too many results = Noise and irrelevant content dilute quality
- Optimal count depends on content granularity and query complexity
Recommendation: Start with 5-10 results for most use cases. Increase for complex queries requiring comprehensive coverage; decrease for simple fact retrieval.
Index Configuration Trade-offs
For External Repositories (Milvus, Elasticsearch, Astra DB):
Proper indexing significantly impacts both speed and quality:
Vector Index Types:
- HNSW (Hierarchical Navigable Small World): Fast search, high accuracy, more memory usage
- IVF (Inverted File): Balanced speed/memory, good for large datasets
- Flat: Most accurate but slowest, only suitable for small datasets
Embedding Dimensions:
- Lower dimensions (384): Faster, less storage, slightly lower quality
- Higher dimensions (768, 1536): Slower, more storage, better semantic understanding
Recommendation:
- Use HNSW indexing for vector search in production
- Choose embedding models with 768 dimensions for balanced performance and quality
- Test with your specific data to find optimal configuration
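For Milvus, the HNSW recommendation above translates into index and search parameters like the following. This is a pymilvus-style sketch; the parameter values are common starting points to tune against your data, not fixed recommendations:

```python
# Sketch of Milvus HNSW index parameters (pymilvus-style dicts).
# Values are tuning starting points, not measured recommendations.
hnsw_index_params = {
    "index_type": "HNSW",
    "metric_type": "COSINE",  # must match the metric your embedding model expects
    "params": {
        "M": 16,               # graph connectivity: higher = better recall, more memory
        "efConstruction": 200, # build-time effort: higher = better index, slower build
    },
}

search_params = {
    "metric_type": "COSINE",
    "params": {"ef": 64},      # query-time effort: raise for recall, lower for speed
}

# With pymilvus this would be applied as, for example:
# collection.create_index(field_name="embedding", index_params=hnsw_index_params)
print(hnsw_index_params["index_type"], search_params["params"]["ef"])
```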
Optimization Strategies
1. Choose the Right Mode
Use Dynamic Mode (Recommended):
- Faster knowledge retrieval
- More flexible agent processing
- Better for complex reasoning tasks
Use Classic Mode:
- When you need consistent, pre-formatted answers
- For simple Q&A scenarios
- When agent flexibility is not required
2. Select Appropriate Repository
File Uploads:
- ✅ Static content (policies, documentation, FAQs)
- ✅ Quick setup and testing
- ✅ Small to medium datasets
External Repositories:
- ✅ Large-scale content (100K+ documents)
- ✅ Frequently updated information
- ✅ High query volumes
- ✅ Advanced search capabilities needed
3. Optimize Search Strategy
Choose Based on Your Use Case:
- Keyword search: For exact term matching, fastest performance
- Vector search: For natural language queries, semantic understanding
- Hybrid search: For best accuracy with slightly slower performance
Adjust Result Counts:
- Start with 5-10 results
- Reduce to 1-3 for simple fact retrieval
- Increase to 15+ only for complex research queries
4. Optimize Repository Configuration
For External Repositories:
- Ensure proper indexing (HNSW for vector search)
- Monitor index size and performance
- Use metadata filtering to narrow search scope
- Monitor and optimize network connectivity
5. Provide Clear Knowledge Source Descriptions
Why It Matters:
- Helps agents select the right knowledge source
- Improves query formulation
- Enhances overall agent performance
Best Practices:
- Describe what content the source contains
- Specify the types of questions it can answer
- Include relevant keywords and topics
- Keep descriptions clear and concise
Example:
- ❌ Poor: “Company documents”
- ✅ Good: “Employee handbook containing HR policies, benefits information, and workplace guidelines. Use for questions about PTO, health insurance, and company policies.”
How to Measure
Using Agent Traces:
- Execute agent runs that use knowledge
- Retrieve detailed traces using the searchTraces API
- Analyze knowledge tool execution time in traces
- Inspect the debug object in the knowledge tool response, which reports detailed timing metrics, including:
  - total_time_ms: Total time spent in the knowledge tool
  - search_time_ms: Time spent calling search, excluding embedding generation
  - answer_generation_time_ms: Time spent calling the LLM to generate the answer
- Identify bottlenecks in the knowledge pipeline
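A small helper can turn those debug fields into a percentage breakdown, which makes bottlenecks easy to spot. The sample numbers below are illustrative, and the response shape is simplified to just the fields named above:

```python
# Summarize the documented debug timing fields from a knowledge tool response.
# The sample values are illustrative, not real measurements.
def summarize_debug(debug: dict) -> dict:
    total = debug["total_time_ms"]
    search = debug.get("search_time_ms", 0)
    answer = debug.get("answer_generation_time_ms", 0)
    other = total - search - answer  # embedding generation, tool overhead, etc.
    return {
        "search_pct": round(100 * search / total, 1),
        "answer_pct": round(100 * answer / total, 1),
        "other_pct": round(100 * other / total, 1),
    }

debug = {"total_time_ms": 1200, "search_time_ms": 180, "answer_generation_time_ms": 850}
print(summarize_debug(debug))
# A breakdown dominated by answer_pct suggests trying Dynamic mode;
# one dominated by search_pct points at the repository or index configuration.
```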
Testing External Repositories Independently:
- Test repository performance directly (outside of wxO)
- Measure search latency at the repository level
- Compare with end-to-end knowledge tool performance
- Isolate network vs processing time
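The direct measurement described above can be done with a simple timing loop that bypasses wxO entirely. The `fake_search` function is a placeholder; substitute a real client call such as an Elasticsearch or Milvus search:

```python
import time
from statistics import quantiles

def measure_latency(search_fn, queries, runs_per_query: int = 5) -> list[float]:
    """Collect direct search latencies (ms) against the repository, bypassing wxO."""
    samples = []
    for q in queries:
        for _ in range(runs_per_query):
            start = time.perf_counter()
            search_fn(q)  # e.g. es.search(...) or collection.search(...)
            samples.append((time.perf_counter() - start) * 1000)
    return samples

def p50_p95(samples: list[float]) -> tuple[float, float]:
    cuts = quantiles(samples, n=100)  # 99 percentile cut points
    return cuts[49], cuts[94]

# Placeholder for a real repository client call:
def fake_search(q): time.sleep(0.002)

samples = measure_latency(fake_search, ["refund policy", "warranty terms"])
p50, p95 = p50_p95(samples)
print(f"p50={p50:.1f} ms, p95={p95:.1f} ms")
```

Comparing these direct numbers with the debug timings from agent traces isolates how much latency comes from the repository itself versus the knowledge tool pipeline and network.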
Key Metrics to Track
Speed Metrics:
- Total knowledge execution time: End-to-end retrieval time
- Embedding generation time: Time to create vector embeddings
- Search latency: Time spent in actual search operation
- Answer generation time: Time to create final response
- Network latency: Time for external repository communication (if applicable)
Measurement Best Practices:
- Establish baselines: Measure performance with different configurations
- Test with representative queries: Use real-world query patterns
- Monitor over time: Track performance trends as data grows
- Compare modes: Test Classic vs Dynamic mode for your use case
- Test different search strategies: Compare keyword, vector, and hybrid search
- Vary result counts: Find optimal balance for your use case
Summary
Key Points:
- Dynamic mode (recommended): Faster knowledge retrieval, more flexible agent processing
- Repository selection matters: File uploads for static content, external repositories for scale
- Search strategy impacts performance: Keyword (fastest) → Vector (moderate) → Hybrid (slowest, most accurate)
- Result count affects speed and quality: 5-10 results optimal for most use cases
- Proper indexing is critical: Use HNSW for vector search
Speed vs Quality Trade-offs:
- Search strategy: Balance between speed (keyword) and semantic understanding (vector/hybrid)
- Result count: More results = better coverage but slower and potentially noisier
- Index configuration: Better indexing = faster search but requires more resources
Performance Best Practices:
- Use Dynamic mode for better performance and flexibility
- Choose appropriate repository based on content size and update frequency
- Select search strategy based on query types (keyword/vector/hybrid)
- Optimize result counts (start with 5-10)
- Ensure proper indexing for external repositories
- Provide clear, detailed knowledge source descriptions
- Monitor performance metrics and adjust based on real usage patterns
- Test with representative queries before production deployment
Related Guides: