Performance Testing Approach

This guide focuses on how to measure and optimize Knowledge runtime performance. Performance varies significantly based on workload, configuration, system load, and network conditions. Always measure in your own environment.
Note: “Flow” in this document refers to a wxO Agentic Workflow.
Overview
Knowledge in watsonx Orchestrate enables agents to return accurate, relevant answers based on trusted information. Knowledge sources provide factual and contextual data that help interpret user questions and generate meaningful responses.
Knowledge Source Types:
- File Uploads: Static content stored in internal knowledge base
- External Repositories: Milvus, Elasticsearch, Astra DB, or custom services
Key Insight: Knowledge is internally implemented as a built-in tool, which means:
- It shares the same performance characteristics as other tools in wxO
- Each knowledge retrieval incurs tool invocation overhead
- Performance depends on both the tool call overhead and the actual search execution
- Understanding this helps you optimize knowledge usage patterns
Performance Components:
- Tool Invocation: Time to initialize and prepare the knowledge request
- Search Execution: Time to perform the actual search (varies by repository type)
- Result Processing: Time to format and return results to the agent
This is important because:
- Multiple knowledge calls = Multiple tool invocation costs
- The actual search time is often fast, but tool overhead exists for each call
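The per-call overhead described above can be sketched as a small accounting helper. The field names and millisecond figures below are illustrative assumptions, not wxO measurements:

```python
from dataclasses import dataclass

@dataclass
class KnowledgeTiming:
    """Hypothetical per-call timing breakdown, in milliseconds."""
    tool_invocation_ms: float    # overhead to initialize/prepare the request
    search_ms: float             # actual search execution in the repository
    result_processing_ms: float  # formatting results for the agent

    @property
    def total_ms(self) -> float:
        return self.tool_invocation_ms + self.search_ms + self.result_processing_ms

def overhead_share(calls: list[KnowledgeTiming]) -> float:
    """Fraction of total latency spent outside the search itself."""
    total = sum(c.total_ms for c in calls)
    search = sum(c.search_ms for c in calls)
    return (total - search) / total if total else 0.0

# Three knowledge calls in one agent run: the tool overhead accumulates
# even though each individual search is fast.
calls = [KnowledgeTiming(120, 80, 40), KnowledgeTiming(110, 60, 30), KnowledgeTiming(130, 90, 50)]
print(f"overhead share: {overhead_share(calls):.0%}")  # roughly two thirds in this toy example
```

A breakdown like this makes it obvious when consolidating multiple knowledge calls into one would pay off.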
Knowledge Modes: Classic vs Dynamic
Knowledge can operate in two modes, each with different performance characteristics:
Classic Mode:
- Knowledge performs query rewrite, search, and answer generation
- Returns a complete answer to the agent
- Latency: Higher (includes query rewrite + search + answer generation)
- Use case: When you want consistent, pre-formatted answers
Dynamic Mode (Recommended):
- Knowledge returns raw search results to the agent
- Agent processes the results and formulates its own response
- Latency: Lower for knowledge retrieval
- Use case: When agent needs flexibility to interpret and combine results
Performance Comparison:
| Aspect | Classic Mode | Dynamic Mode |
|---|---|---|
| Knowledge execution | Slower (query rewrite + search + answer generation) | Faster (search only) |
| Agent processing | Minimal (receives answer) | More (interprets results) |
| Total latency | Higher | Variable (depends on agent processing) |
| Flexibility | Lower (fixed answer format) | Higher (agent can combine/interpret) |
| Best for | Simple Q&A, consistent answers | Complex reasoning, multi-source synthesis |
Recommendation: Use Dynamic mode for better performance and flexibility unless you specifically need pre-formatted answers.
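As a rough illustration of the mode difference, the sketch below models the stages each mode runs inside the knowledge tool. The stage latencies are placeholder numbers, not measured values:

```python
# Illustrative latency model (numbers are assumptions, not measurements):
# Classic mode runs query rewrite + search + answer generation inside the
# knowledge tool; Dynamic mode runs search only and hands raw results to the agent.
CLASSIC_STAGES = {"query_rewrite_ms": 300, "search_ms": 150, "answer_generation_ms": 900}
DYNAMIC_STAGES = {"search_ms": 150}

def knowledge_latency(stages: dict[str, int]) -> int:
    """Total time spent inside the knowledge tool for one call."""
    return sum(stages.values())

print("classic:", knowledge_latency(CLASSIC_STAGES), "ms")  # rewrite + search + generation
print("dynamic:", knowledge_latency(DYNAMIC_STAGES), "ms")  # search only; agent does the rest
```

Note that Dynamic mode shifts work to the agent, so its total end-to-end latency depends on how much reasoning the agent then does over the raw results.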
You can use the built-in file uploads or connect to external knowledge sources like Milvus, Elasticsearch, Astra DB, or a custom service as your knowledge source.
Note: Using external knowledge sources requires creating your own vector index and ingesting your data into it.
| Repository Type | Performance | Network Dependency | Best For |
|---|---|---|---|
| File Uploads | Very fast | Managed by wxO | Static content, quick setup, small to medium datasets |
| Milvus | Fast | Yes | Large-scale vector search, high-volume queries |
| Elasticsearch | Fast | Yes | Full-text/hybrid search, structured data, complex filtering |
| Astra DB | Fast | Yes | Cloud-native deployments, managed service |
| Custom Service | Varies | Yes | Specialized integrations, custom search logic |
Repository Selection Guide
File Uploads:
- Best for: Static content, documentation, policies, FAQs
- Limits: 20 files per batch, 30MB total, 600 pages per file
- Formats: .docx, .pdf, .pptx, .xlsx (25MB max), .csv, .html, .txt (5MB max)
- Performance: Fastest option for small to medium datasets
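The upload limits above can be checked before a batch upload. The sketch below uses the limits as stated in this guide; the per-format size caps are simplified here and worth re-checking against the official documentation:

```python
# Pre-flight check against the documented file-upload limits.
# Per-format byte caps below are an assumption based on this guide's summary.
MAX_FILES_PER_BATCH = 20
MAX_BATCH_BYTES = 30 * 1024 * 1024  # 30MB total per batch
MAX_FILE_BYTES = {
    ".docx": 25 * 1024 * 1024, ".pdf": 25 * 1024 * 1024,
    ".pptx": 25 * 1024 * 1024, ".xlsx": 25 * 1024 * 1024,
    ".csv": 5 * 1024 * 1024, ".html": 5 * 1024 * 1024, ".txt": 5 * 1024 * 1024,
}

def validate_batch(files: list[tuple[str, int]]) -> list[str]:
    """files: (name, size_bytes) pairs. Returns a list of limit violations."""
    errors = []
    if len(files) > MAX_FILES_PER_BATCH:
        errors.append(f"too many files: {len(files)} > {MAX_FILES_PER_BATCH}")
    if sum(size for _, size in files) > MAX_BATCH_BYTES:
        errors.append("batch exceeds 30MB total")
    for name, size in files:
        ext = "." + name.rsplit(".", 1)[-1].lower()
        limit = MAX_FILE_BYTES.get(ext)
        if limit is None:
            errors.append(f"{name}: unsupported format {ext}")
        elif size > limit:
            errors.append(f"{name}: {size} bytes exceeds {limit}")
    return errors

print(validate_batch([("handbook.pdf", 10 * 1024 * 1024), ("faq.txt", 100_000)]))  # []
```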
Milvus:
- Best for: Large-scale semantic search, high query volumes
- Strengths: Optimized for vector similarity search, scales well
- Consideration: Requires setup and maintenance
Elasticsearch:
- Best for: Full-text search, hybrid search, structured content
- Strengths: Flexible query capabilities, good for keyword and semantic search
- Consideration: Supports custom query bodies for advanced filtering
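As an example of such a custom query body, the sketch below builds an Elasticsearch 8.x-style request that combines a keyword (BM25) leg with a vector (kNN) leg. The index field names and embedding dimension are hypothetical placeholders:

```python
# Field names ("content", "content_vector") and the 384-dim vector are
# hypothetical; substitute your own index mapping and embedding model.
def hybrid_query_body(query_text: str, query_vector: list[float], k: int = 5) -> dict:
    """Elasticsearch 8.x-style body combining BM25 full-text and kNN vector search."""
    return {
        "query": {  # keyword leg: BM25 match on the text field
            "match": {"content": {"query": query_text}}
        },
        "knn": {    # vector leg: approximate nearest-neighbour search
            "field": "content_vector",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": 10 * k,  # larger candidate pool = better recall, slower
        },
        "size": k,
    }

body = hybrid_query_body("refund policy", [0.1] * 384)
print(body["knn"]["num_candidates"])  # 50
```

This body would be passed to the search endpoint (for example, `es.search(index=..., body=body)` with the official client).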
Astra DB:
- Best for: Cloud deployments, managed service preference
- Strengths: Consistent performance, reduced operational overhead
- Consideration: Cloud-native architecture
Custom Service:
- Best for: Specialized integrations, custom search logic, unique requirements
- Strengths: Full control over search implementation, can integrate proprietary systems
- Consideration: Performance varies by implementation, requires custom development and maintenance
Retrieval Speed vs Quality Trade-offs
Understanding the trade-offs between speed and quality is crucial for optimizing knowledge performance. Different configurations impact both retrieval speed and answer accuracy.
Search Strategy Trade-offs
For Elasticsearch & Astra DB:
| Search Type | Speed | Quality | Best For |
|---|---|---|---|
| Keyword Search | Fastest | Good for exact matches | Known terminology, product codes, exact phrases |
| Vector Search | Moderate | Best for semantic understanding | Natural language queries, conceptual similarity |
| Hybrid Search | Slowest | Best overall accuracy | Complex queries requiring both precision and recall |
Keyword Search:
- Speed: Fastest - no embedding generation required
- Quality: Excellent for exact matches, limited for semantic similarity
- Use when: Users search with specific terms, product codes, or exact phrases
- Example: “Order #12345”, “Return policy”, “Product SKU-789”
Vector Search:
- Speed: Moderate - requires embedding generation for query
- Quality: Excellent for semantic understanding, handles synonyms and paraphrasing
- Use when: Natural language queries, conceptual searches, multilingual content
- Example: “How do I get my money back?” (matches “refund policy”)
Hybrid Search:
- Speed: Slowest of the three - runs both the keyword and vector legs
- Quality: Best overall - combines precision of keywords with semantic understanding
- Use when: Accuracy is critical, queries are complex, or you need both exact and semantic matches
- Example: “iPhone 15 battery life issues” (exact product + semantic problem)
Recommendation: Use hybrid search for best quality, and only optimize to keyword or vector search based on your specific use case and performance requirements.
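To make that optimization decision with data rather than intuition, a small harness can compare strategies against your own repository. The three search functions below are stand-ins; swap in your actual client calls:

```python
import time
from statistics import median

def benchmark(search_fn, queries, runs: int = 3) -> float:
    """Median latency in ms of search_fn over the given queries."""
    samples = []
    for _ in range(runs):
        for q in queries:
            start = time.perf_counter()
            search_fn(q)
            samples.append((time.perf_counter() - start) * 1000)
    return median(samples)

# Stand-ins for real search calls; replace with your repository client.
def keyword_search(q): time.sleep(0.001)  # e.g. a BM25 match query
def vector_search(q):  time.sleep(0.003)  # embedding generation + ANN search
def hybrid_search(q):  time.sleep(0.004)  # both legs combined

queries = ["refund policy", "iPhone 15 battery life issues"]
for name, fn in [("keyword", keyword_search), ("vector", vector_search), ("hybrid", hybrid_search)]:
    print(f"{name}: {benchmark(fn, queries):.1f} ms")
```

Run this with representative queries from your own workload; the ranking of strategies can differ between repositories and datasets.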
Result Count Trade-offs (Default is 5)
| Result Count | Speed | Quality | Best For |
|---|---|---|---|
| Low (1-3) | Fastest | Risk of missing relevant info | Simple queries, single-fact retrieval |
| Medium (5-10) | Moderate | Balanced coverage | General purpose, most use cases |
| High (15+) | Slowest | Comprehensive but may include noise | Complex queries, research tasks |
Performance Impact:
- More results = Longer retrieval time
- More results = Larger context for agent processing
- More results = Higher token usage
Quality Impact:
- Too few results = May miss relevant information
- Too many results = Noise and irrelevant content dilute quality
- Optimal count depends on content granularity and query complexity
Recommendation: Start with 5-10 results for most use cases. Increase for complex queries requiring comprehensive coverage; decrease for simple fact retrieval.
Index Configuration Trade-offs
For External Repositories (Milvus, Elasticsearch, Astra DB):
Proper indexing significantly impacts both speed and quality:
Vector Index Types:
- HNSW (Hierarchical Navigable Small World): Fast search, high accuracy, more memory usage
- IVF (Inverted File): Balanced speed/memory, good for large datasets
- Flat: Most accurate but slowest, only suitable for small datasets
Embedding Dimensions:
- Lower dimensions (384): Faster, less storage, slightly lower quality
- Higher dimensions (768, 1536): Slower, more storage, better semantic understanding
Recommendation:
- Use HNSW indexing for vector search in production
- Choose embedding models with 768 dimensions for balanced performance and quality
- Test with your specific data to find optimal configuration
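For Milvus, the HNSW recommendation above translates into index and search parameters like the following. This is a pymilvus-style sketch; the parameter values are common starting points to tune against your data, not fixed recommendations:

```python
# Sketch of Milvus HNSW index parameters (pymilvus-style dicts).
# Values are tuning starting points, not measured recommendations.
hnsw_index_params = {
    "index_type": "HNSW",
    "metric_type": "COSINE",  # must match the metric your embedding model expects
    "params": {
        "M": 16,               # graph connectivity: higher = better recall, more memory
        "efConstruction": 200, # build-time effort: higher = better index, slower build
    },
}

search_params = {
    "metric_type": "COSINE",
    "params": {"ef": 64},      # query-time effort: raise for recall, lower for speed
}

# With pymilvus this would be applied as, for example:
# collection.create_index(field_name="embedding", index_params=hnsw_index_params)
print(hnsw_index_params["index_type"], search_params["params"]["ef"])
```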
Optimization Strategies
1. Choose the Right Mode
Use Dynamic Mode (Recommended):
- Faster knowledge retrieval
- More flexible agent processing
- Better for complex reasoning tasks
Use Classic Mode:
- When you need consistent, pre-formatted answers
- For simple Q&A scenarios
- When agent flexibility is not required
2. Select Appropriate Repository
File Uploads:
- ✅ Static content (policies, documentation, FAQs)
- ✅ Quick setup and testing
- ✅ Small to medium datasets
External Repositories:
- ✅ Large-scale content (100K+ documents)
- ✅ Frequently updated information
- ✅ High query volumes
- ✅ Advanced search capabilities needed
3. Optimize Search Strategy
Choose Based on Your Use Case:
- Keyword search: For exact term matching, fastest performance
- Vector search: For natural language queries, semantic understanding
- Hybrid search: For best accuracy with slightly slower performance
Adjust Result Counts:
- Start with 5-10 results
- Reduce to 1-3 for simple fact retrieval
- Increase to 15+ only for complex research queries
4. Optimize Repository Configuration
For External Repositories:
- Ensure proper indexing (HNSW for vector search)
- Monitor index size and performance
- Use metadata filtering to narrow search scope
- Monitor and optimize network connectivity
5. Provide Clear Knowledge Source Descriptions
Why It Matters:
- Helps agents select the right knowledge source
- Improves query formulation
- Enhances overall agent performance
Best Practices:
- Describe what content the source contains
- Specify the types of questions it can answer
- Include relevant keywords and topics
- Keep descriptions clear and concise
Example:
- ❌ Poor: “Company documents”
- ✅ Good: “Employee handbook containing HR policies, benefits information, and workplace guidelines. Use for questions about PTO, health insurance, and company policies.”
How to Measure
Using Agent Traces:
- Execute agent runs that use knowledge
- Retrieve detailed traces using the searchTraces API
- Analyze knowledge tool execution time in traces
- Inspect the debug object in the knowledge tool response, which reports detailed timing metrics, including:
  - total_time_ms: Total time spent in the knowledge tool
  - search_time_ms: Time spent calling search, excluding embedding generation
  - answer_generation_time_ms: Time spent calling the LLM to generate the answer
- Identify bottlenecks in the knowledge pipeline
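A small helper can turn those debug fields into a percentage breakdown, which makes bottlenecks easy to spot. The sample numbers below are illustrative, and the response shape is simplified to just the fields named above:

```python
# Summarize the documented debug timing fields from a knowledge tool response.
# The sample values are illustrative, not real measurements.
def summarize_debug(debug: dict) -> dict:
    total = debug["total_time_ms"]
    search = debug.get("search_time_ms", 0)
    answer = debug.get("answer_generation_time_ms", 0)
    other = total - search - answer  # embedding generation, tool overhead, etc.
    return {
        "search_pct": round(100 * search / total, 1),
        "answer_pct": round(100 * answer / total, 1),
        "other_pct": round(100 * other / total, 1),
    }

debug = {"total_time_ms": 1200, "search_time_ms": 180, "answer_generation_time_ms": 850}
print(summarize_debug(debug))
# A breakdown dominated by answer_pct suggests trying Dynamic mode;
# one dominated by search_pct points at the repository or index configuration.
```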
Testing External Repositories Independently:
- Test repository performance directly (outside of wxO)
- Measure search latency at the repository level
- Compare with end-to-end knowledge tool performance
- Isolate network vs processing time
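The direct measurement described above can be done with a simple timing loop that bypasses wxO entirely. The `fake_search` function is a placeholder; substitute a real client call such as an Elasticsearch or Milvus search:

```python
import time
from statistics import quantiles

def measure_latency(search_fn, queries, runs_per_query: int = 5) -> list[float]:
    """Collect direct search latencies (ms) against the repository, bypassing wxO."""
    samples = []
    for q in queries:
        for _ in range(runs_per_query):
            start = time.perf_counter()
            search_fn(q)  # e.g. es.search(...) or collection.search(...)
            samples.append((time.perf_counter() - start) * 1000)
    return samples

def p50_p95(samples: list[float]) -> tuple[float, float]:
    cuts = quantiles(samples, n=100)  # 99 percentile cut points
    return cuts[49], cuts[94]

# Placeholder for a real repository client call:
def fake_search(q): time.sleep(0.002)

samples = measure_latency(fake_search, ["refund policy", "warranty terms"])
p50, p95 = p50_p95(samples)
print(f"p50={p50:.1f} ms, p95={p95:.1f} ms")
```

Comparing these direct numbers with the debug timings from agent traces isolates how much latency comes from the repository itself versus the knowledge tool pipeline and network.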
Key Metrics to Track
Speed Metrics:
- Total knowledge execution time: End-to-end retrieval time
- Embedding generation time: Time to create vector embeddings
- Search latency: Time spent in actual search operation
- Answer generation time: Time to create final response
- Network latency: Time for external repository communication (if applicable)
Measurement Best Practices:
- Establish baselines: Measure performance with different configurations
- Test with representative queries: Use real-world query patterns
- Monitor over time: Track performance trends as data grows
- Compare modes: Test Classic vs Dynamic mode for your use case
- Test different search strategies: Compare keyword, vector, and hybrid search
- Vary result counts: Find optimal balance for your use case
Summary
Key Points:
- Dynamic mode (recommended): Faster knowledge retrieval, more flexible agent processing
- Repository selection matters: File uploads for static content, external repositories for scale
- Search strategy impacts performance: Keyword (fastest) → Vector (moderate) → Hybrid (slowest, most accurate)
- Result count affects speed and quality: 5-10 results optimal for most use cases
- Proper indexing is critical: Use HNSW for vector search
Speed vs Quality Trade-offs:
- Search strategy: Balance between speed (keyword) and semantic understanding (vector/hybrid)
- Result count: More results = better coverage but slower and potentially noisier
- Index configuration: Better indexing = faster search but requires more resources
Performance Best Practices:
- Use Dynamic mode for better performance and flexibility
- Choose appropriate repository based on content size and update frequency
- Select search strategy based on query types (keyword/vector/hybrid)
- Optimize result counts (start with 5-10)
- Ensure proper indexing for external repositories
- Provide clear, detailed knowledge source descriptions
- Monitor performance metrics and adjust based on real usage patterns
- Test with representative queries before production deployment
Related Guides: