Performance Testing Approach

This guide focuses on how to measure and optimize Knowledge runtime performance. Performance varies significantly based on workload, configuration, system load, and network conditions. Always measure in your own environment.
Note: “Flow” in this document refers to wxO Agentic Workflow

Overview

Knowledge in watsonx Orchestrate enables agents to return accurate, relevant answers based on trusted information. Knowledge sources provide factual and contextual data that help interpret user questions and generate meaningful responses.
Knowledge Source Types:
  • File Uploads: Static content stored in internal knowledge base
  • External Repositories: Milvus, Elasticsearch, Astra DB, or custom services

Understanding Knowledge Performance

Knowledge as a Built-in Tool

Key Insight: Knowledge is internally implemented as a built-in tool, which means:
  • It shares the same performance characteristics as other tools in wxO
  • Each knowledge retrieval incurs tool invocation overhead
  • Performance depends on both the tool call overhead and the actual search execution
  • Understanding this helps you optimize knowledge usage patterns
Performance Components:
  1. Tool Invocation: Time to initialize and prepare the knowledge request
  2. Search Execution: Time to perform the actual search (varies by repository type)
  3. Result Processing: Time to format and return results to the agent
This is important because:
  • Multiple knowledge calls = Multiple tool invocation costs
  • The actual search time is often fast, but tool overhead exists for each call
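The cost of repeated retrievals can be illustrated with a simple back-of-the-envelope model (the overhead and search durations below are placeholder numbers for illustration only, not measured wxO values):

```python
# Illustrative model: each knowledge call pays a fixed tool-invocation
# overhead on top of the actual search time. All numbers are placeholders.
TOOL_OVERHEAD_MS = 150   # hypothetical per-call invocation overhead
SEARCH_TIME_MS = 80      # hypothetical search execution time

def total_latency_ms(num_calls: int) -> int:
    """Total knowledge latency for a run that makes num_calls retrievals."""
    return num_calls * (TOOL_OVERHEAD_MS + SEARCH_TIME_MS)

print(total_latency_ms(1))  # 230 — one consolidated call
print(total_latency_ms(3))  # 690 — three separate calls pay the overhead thrice
```

The point of the model: consolidating related questions into fewer knowledge calls avoids paying the invocation overhead repeatedly, even when each individual search is fast.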

Knowledge Modes: Classic vs Dynamic

Knowledge can operate in two modes, each with different performance characteristics:
Classic Mode:
  • Knowledge performs query rewrite, search, and answer generation
  • Returns a complete answer to the agent
  • Latency: Higher (includes query rewrite + search + answer generation)
  • Use case: When you want consistent, pre-formatted answers
Dynamic Mode (Recommended):
  • Knowledge returns raw search results to the agent
  • Agent processes the results and formulates its own response
  • Latency: Lower for knowledge retrieval
  • Use case: When agent needs flexibility to interpret and combine results
Performance Comparison:
Aspect              | Classic Mode                                        | Dynamic Mode
Knowledge execution | Slower (query rewrite + search + answer generation) | Faster (search only)
Agent processing    | Minimal (receives answer)                           | More (interprets results)
Total latency       | Higher                                              | Variable (depends on agent processing)
Flexibility         | Lower (fixed answer format)                         | Higher (agent can combine/interpret)
Best for            | Simple Q&A, consistent answers                      | Complex reasoning, multi-source synthesis
Recommendation: Use Dynamic mode for better performance and flexibility unless you specifically need pre-formatted answers.

Performance by Knowledge Sources

You can use the built-in file uploads or connect to external knowledge sources such as Milvus, Elasticsearch, Astra DB, or a custom service. Note: Using external knowledge sources requires creating your own vector index and ingesting your data into it.

Performance Characteristics by Type

Repository Type | Performance | Network Dependency | Best For
File Uploads    | Very fast   | Managed by wxO     | Static content, quick setup, small to medium datasets
Milvus          | Fast        | Yes                | Large-scale vector search, high-volume queries
Elasticsearch   | Fast        | Yes                | Full-text/hybrid search, structured data, complex filtering
Astra DB        | Fast        | Yes                | Cloud-native deployments, managed service
Custom Service  | Varies      | Yes                | Specialized integrations, custom search logic

Repository Selection Guide

File Uploads:
  • Best for: Static content, documentation, policies, FAQs
  • Limits: 20 files per batch, 30MB total, 600 pages per file
  • Formats: .docx, .pdf, .pptx, .xlsx (25MB max), .csv, .html, .txt (5MB max)
  • Performance: Fastest option for small to medium datasets
Milvus:
  • Best for: Large-scale semantic search, high query volumes
  • Strengths: Optimized for vector similarity search, scales well
  • Consideration: Requires setup and maintenance
Elasticsearch:
  • Best for: Full-text search, hybrid search, structured content
  • Strengths: Flexible query capabilities, good for keyword and semantic search
  • Consideration: Supports custom query bodies for advanced filtering
Astra DB:
  • Best for: Cloud deployments, managed service preference
  • Strengths: Consistent performance, reduced operational overhead
  • Consideration: Cloud-native architecture
Custom Service:
  • Best for: Specialized integrations, custom search logic, unique requirements
  • Strengths: Full control over search implementation, can integrate proprietary systems
  • Consideration: Performance varies by implementation, requires custom development and maintenance
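The file-upload limits listed above can be encoded in a pre-flight check before batching an upload. The sketch below is hypothetical and not part of any wxO SDK; in particular, the reading that the 25MB cap applies to all office formats and the 5MB cap to the text-like formats is an assumption about the limits list.

```python
import os

# Documented batch limits (illustrative validator, not a wxO API).
MAX_FILES_PER_BATCH = 20
MAX_BATCH_BYTES = 30 * 1024 * 1024                    # 30MB total
LARGE_FORMATS = {".docx", ".pdf", ".pptx", ".xlsx"}   # assumed 25MB cap each
SMALL_FORMATS = {".csv", ".html", ".txt"}             # assumed 5MB cap each

def validate_batch(files: list[tuple[str, int]]) -> list[str]:
    """Return a list of problems for (filename, size_bytes) pairs."""
    problems = []
    if len(files) > MAX_FILES_PER_BATCH:
        problems.append(f"batch has {len(files)} files (max {MAX_FILES_PER_BATCH})")
    if sum(size for _, size in files) > MAX_BATCH_BYTES:
        problems.append("batch exceeds 30MB total")
    for name, size in files:
        ext = os.path.splitext(name)[1].lower()
        if ext in LARGE_FORMATS and size > 25 * 1024 * 1024:
            problems.append(f"{name} exceeds 25MB")
        elif ext in SMALL_FORMATS and size > 5 * 1024 * 1024:
            problems.append(f"{name} exceeds 5MB")
        elif ext not in LARGE_FORMATS | SMALL_FORMATS:
            problems.append(f"{name} has unsupported format {ext}")
    return problems

print(validate_batch([("handbook.pdf", 10 * 1024 * 1024)]))  # []
```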

Retrieval Speed vs Quality Trade-offs

Understanding the trade-offs between speed and quality is crucial for optimizing knowledge performance. Different configurations impact both retrieval speed and answer accuracy.

Search Strategy Trade-offs

For Elasticsearch & Astra DB:
Search Type    | Speed    | Quality                         | Best For
Keyword Search | Fastest  | Good for exact matches          | Known terminology, product codes, exact phrases
Vector Search  | Moderate | Best for semantic understanding | Natural language queries, conceptual similarity
Hybrid Search  | Slowest  | Best overall accuracy           | Complex queries requiring both precision and recall
Keyword Search:
  • Speed: Fastest - no embedding generation required
  • Quality: Excellent for exact matches, limited for semantic similarity
  • Use when: Users search with specific terms, product codes, or exact phrases
  • Example: “Order #12345”, “Return policy”, “Product SKU-789”
Vector Search:
  • Speed: Moderate - requires embedding generation for query
  • Quality: Excellent for semantic understanding, handles synonyms and paraphrasing
  • Use when: Natural language queries, conceptual searches, multilingual content
  • Example: “How do I get my money back?” (matches “refund policy”)
Hybrid Search:
  • Speed: Slightly slower - combines both keyword and vector search
  • Quality: Best overall - combines precision of keywords with semantic understanding
  • Use when: Accuracy is critical, queries are complex, or you need both exact and semantic matches
  • Example: “iPhone 15 battery life issues” (exact product + semantic problem)
Recommendation: Use hybrid search for best quality, and only optimize to keyword or vector search based on your specific use case and performance requirements.
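As a sketch of what a hybrid query might look like against an Elasticsearch knowledge index (the field names `body` and `body_embedding`, the index schema, the vector dimension, and the parameter values are placeholders; check your repository's actual mapping):

```python
# Hypothetical hybrid query body combining keyword (BM25) and vector search,
# in the Elasticsearch 8.x style with a top-level "knn" option.
def hybrid_query(text: str, query_vector: list[float], k: int = 5) -> dict:
    return {
        "query": {                       # keyword side: exact/lexical matching
            "match": {"body": {"query": text}}
        },
        "knn": {                         # vector side: semantic similarity
            "field": "body_embedding",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": 10 * k,    # wider candidate pool = better recall
        },
        "size": k,
    }

body = hybrid_query("iPhone 15 battery life issues", [0.1] * 384)
```

The embedding for `query_vector` would come from the same model used at ingestion time; mismatched models produce meaningless similarity scores.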

Result Count Trade-offs (Default is 5)

Result Count  | Speed    | Quality                             | Best For
Low (1-3)     | Fastest  | Risk of missing relevant info       | Simple queries, single-fact retrieval
Medium (5-10) | Moderate | Balanced coverage                   | General purpose, most use cases
High (15+)    | Slowest  | Comprehensive but may include noise | Complex queries, research tasks
Performance Impact:
  • More results = Longer retrieval time
  • More results = Larger context for agent processing
  • More results = Higher token usage
Quality Impact:
  • Too few results = May miss relevant information
  • Too many results = Noise and irrelevant content dilute quality
  • Optimal count depends on content granularity and query complexity
Recommendation: Start with 5-10 results for most use cases. Increase for complex queries requiring comprehensive coverage; decrease for simple fact retrieval.

Index Configuration Trade-offs

For External Repositories (Milvus, Elasticsearch, Astra DB): Proper indexing significantly impacts both speed and quality.
Vector Index Types:
  • HNSW (Hierarchical Navigable Small World): Fast search, high accuracy, more memory usage
  • IVF (Inverted File): Balanced speed/memory, good for large datasets
  • Flat: Most accurate but slowest, only suitable for small datasets
Embedding Dimensions:
  • Lower dimensions (384): Faster, less storage, slightly lower quality
  • Higher dimensions (768, 1536): Slower, more storage, better semantic understanding
Recommendation:
  • Use HNSW indexing for vector search in production
  • Choose embedding models with 768 dimensions for balanced performance and quality
  • Test with your specific data to find optimal configuration
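The recommendations above might be expressed as Milvus-style index parameters like the following (the specific values of `M`, `efConstruction`, and `nlist` are common starting points, not tuned recommendations; with pymilvus you would pass such a dictionary to the collection's index-creation call):

```python
# Illustrative Milvus-style index parameters (starting points, not tuned values).
hnsw_index = {
    "index_type": "HNSW",
    "metric_type": "COSINE",
    "params": {
        "M": 16,                # graph connectivity: higher = better recall, more memory
        "efConstruction": 200,  # build-time search width: higher = better graph, slower build
    },
}

ivf_index = {
    "index_type": "IVF_FLAT",
    "metric_type": "COSINE",
    "params": {"nlist": 1024},  # number of coarse clusters to partition the data into
}

EMBEDDING_DIM = 768  # balanced quality/speed per the recommendation above
```

Testing both configurations against your own data and query mix is the only reliable way to choose between them.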

Optimization Strategies

1. Choose the Right Mode

Use Dynamic Mode (Recommended):
  • Faster knowledge retrieval
  • More flexible agent processing
  • Better for complex reasoning tasks
Use Classic Mode:
  • When you need consistent, pre-formatted answers
  • For simple Q&A scenarios
  • When agent flexibility is not required

2. Select Appropriate Repository

File Uploads:
  • ✅ Static content (policies, documentation, FAQs)
  • ✅ Quick setup and testing
  • ✅ Small to medium datasets
External Repositories:
  • ✅ Large-scale content (100K+ documents)
  • ✅ Frequently updated information
  • ✅ High query volumes
  • ✅ Advanced search capabilities needed

3. Optimize Search Strategy

Choose Based on Your Use Case:
  • Keyword search: For exact term matching, fastest performance
  • Vector search: For natural language queries, semantic understanding
  • Hybrid search: For best accuracy with slightly slower performance
Adjust Result Counts:
  • Start with 5-10 results
  • Reduce to 1-3 for simple fact retrieval
  • Increase to 15+ only for complex research queries

4. Optimize Repository Configuration

For External Repositories:
  • Ensure proper indexing (HNSW for vector search)
  • Monitor index size and performance
  • Use metadata filtering to narrow search scope
  • Monitor and optimize network connectivity
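Metadata filtering can be sketched as an Elasticsearch bool query in which a `filter` clause narrows the candidate set before scoring (the `department` field is a hypothetical metadata attribute; substitute whatever metadata your index actually carries):

```python
# Sketch: use a filter clause to narrow search scope before relevance scoring.
# Filters are cached and non-scoring, so they are cheap to apply.
def filtered_search(text: str, department: str, k: int = 5) -> dict:
    return {
        "query": {
            "bool": {
                "must": {"match": {"body": {"query": text}}},       # scored match
                "filter": {"term": {"department": department}},     # cheap pre-filter
            }
        },
        "size": k,
    }

body = filtered_search("parental leave policy", "hr", k=5)
```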

5. Provide Clear Knowledge Source Descriptions

Why It Matters:
  • Helps agents select the right knowledge source
  • Improves query formulation
  • Enhances overall agent performance
Best Practices:
  • Describe what content the source contains
  • Specify the types of questions it can answer
  • Include relevant keywords and topics
  • Keep descriptions clear and concise
Example:
  • ❌ Poor: “Company documents”
  • ✅ Good: “Employee handbook containing HR policies, benefits information, and workplace guidelines. Use for questions about PTO, health insurance, and company policies.”

Measuring Knowledge Performance

How to Measure

Using Agent Traces:
  1. Execute agent runs that use knowledge
  2. Retrieve detailed traces using the searchTraces API
  3. Analyze knowledge tool execution time in traces. The knowledge tool response includes a debug object with detailed information about processing time and other metrics, including the following:
    • total_time_ms: Total time spent in the knowledge tool
    • search_time_ms: Time spent calling search, excluding embedding generation
    • answer_generation_time_ms: Time spent calling the LLM to generate the answer
  4. Identify bottlenecks in the knowledge pipeline
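Given the debug fields above, a per-stage breakdown can be derived from a single trace. This sketch assumes only the three documented fields (answer generation may be absent in Dynamic mode, hence the default of 0):

```python
def breakdown(debug: dict) -> dict:
    """Split total knowledge time into the documented stages plus residual overhead."""
    total = debug["total_time_ms"]
    search = debug["search_time_ms"]
    answer = debug.get("answer_generation_time_ms", 0)  # may be absent in Dynamic mode
    return {
        "search_ms": search,
        "answer_generation_ms": answer,
        "overhead_ms": total - search - answer,  # embedding, formatting, tool plumbing
    }

print(breakdown({"total_time_ms": 900, "search_time_ms": 350,
                 "answer_generation_time_ms": 400}))
# {'search_ms': 350, 'answer_generation_ms': 400, 'overhead_ms': 150}
```

A large `overhead_ms` relative to `search_ms` suggests the bottleneck is not the repository itself but the surrounding pipeline.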
Testing External Repositories Independently:
  1. Test repository performance directly (outside of wxO)
  2. Measure search latency at the repository level
  3. Compare with end-to-end knowledge tool performance
  4. Isolate network vs processing time
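A direct repository benchmark can be as simple as a timing loop. In the sketch below, `run_search` is a stand-in for a call to your repository's own client (a Milvus, Elasticsearch, or Astra DB query), so the measurement excludes wxO entirely:

```python
import statistics
import time

def benchmark(run_search, queries, warmup: int = 2):
    """Time each query against the repository and report p50/p95 latency in ms."""
    for q in queries[:warmup]:        # warm connections and caches first
        run_search(q)
    samples = []
    for q in queries:
        start = time.perf_counter()
        run_search(q)
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[min(len(samples) - 1, int(len(samples) * 0.95))],
    }

# Example with a stubbed search function standing in for a real client call:
stats = benchmark(lambda q: time.sleep(0.001), ["sample query"] * 20)
```

Comparing these numbers with the `search_time_ms` seen in traces isolates how much latency the repository itself contributes versus the network path and the knowledge tool pipeline.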

Key Metrics to Track

Speed Metrics:
  • Total knowledge execution time: End-to-end retrieval time
  • Embedding generation time: Time to create vector embeddings
  • Search latency: Time spent in actual search operation
  • Answer generation time: Time to create final response
  • Network latency: Time for external repository communication (if applicable)

Performance Analysis Tips

  1. Establish baselines: Measure performance with different configurations
  2. Test with representative queries: Use real-world query patterns
  3. Monitor over time: Track performance trends as data grows
  4. Compare modes: Test Classic vs Dynamic mode for your use case
  5. Test different search strategies: Compare keyword, vector, and hybrid search
  6. Vary result counts: Find optimal balance for your use case

Summary

Key Points:
  • Dynamic mode (recommended): Faster knowledge retrieval, more flexible agent processing
  • Repository selection matters: File uploads for static content, external repositories for scale
  • Search strategy impacts performance: Keyword (fastest) → Vector (moderate) → Hybrid (slowest, most accurate)
  • Result count affects speed and quality: 5-10 results optimal for most use cases
  • Proper indexing is critical: Use HNSW for vector search
Speed vs Quality Trade-offs:
  • Search strategy: Balance between speed (keyword) and semantic understanding (vector/hybrid)
  • Result count: More results = better coverage but slower and potentially noisier
  • Index configuration: Better indexing = faster search but requires more resources
Performance Best Practices:
  1. Use Dynamic mode for better performance and flexibility
  2. Choose appropriate repository based on content size and update frequency
  3. Select search strategy based on query types (keyword/vector/hybrid)
  4. Optimize result counts (start with 5-10)
  5. Ensure proper indexing for external repositories
  6. Provide clear, detailed knowledge source descriptions
  7. Monitor performance metrics and adjust based on real usage patterns
  8. Test with representative queries before production deployment

Related Guides: