Note: “Flow” in this document refers to wxO Agentic Workflow
Overview
Tools are executable units in watsonx Orchestrate that perform specific tasks (API calls, data processing, calculations) within isolated, secure environments. They are invoked by Agents or orchestrated by Flows.
Tool Types in wxO
Choosing the Right Tool Type
- wxO Flow (first choice): Native support for connections/security, Python code snippets, orchestration of wxO agents, tools and people (e.g. confirmation before transaction, custom forms), built-in document extraction and processing, LLM support.
- Python Tools: For custom libraries not in wxO Flow code blocks.
- Langflow Tools: For Langflow-specific AI components or existing Langflow flows.
Tool Type Comparison
| Tool Type | Runtime/Base | Timeout | Key Features | Best For |
|---|---|---|---|---|
| wxO Flow | Visual workflow | None | Stateful, resumable, orchestrates Agents/Tools/People, LLM support | Multi-step workflows, API integrations, human-in-loop, simple document processing |
| Python | Python 3.12 | 2 min | Stateless, read-only FS, outbound network only | Data processing, business logic, calculations |
| Langflow | Langflow 1.7.1 | 2 min | Stateless, read-only FS, outbound network only, 2+ sec initialization | LLM processing, RAG, complex document analysis |
| API | OpenAPI spec | 2 min (sync) / None (async) | External REST APIs, network-dependent | Third-party integrations, external services |
| MCP | MCP protocol | 2 min | Extended capabilities, varies by implementation | Custom integrations, protocol-based interactions |
Understanding Tool Performance
The Granularity Principle
Key Insight: Tool performance has two distinct components:
- Tool Call Overhead: The cost of invoking a tool
  - Includes initialization, context setup, and result handling
  - Exists for every tool invocation
  - Relatively consistent per tool type
- Execution Inside the Tool: The actual work being done
  - Runs fast once inside the tool
  - Varies based on the logic and operations
  - Where optimization efforts should focus
Implications:
- Multiple small tool calls = Multiple overhead costs
- Single larger tool call = One overhead cost, fast internal execution
- Recommendation: Combine related operations into single tools when possible
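The trade-off between many small calls and one combined call can be sketched with a simple cost model. The overhead and work figures below are illustrative assumptions, not measured wxO values.

```python
def total_latency(calls: int, overhead_s: float, work_s: float) -> float:
    """Latency of `calls` sequential tool invocations that together perform
    `work_s` seconds of work, each paying the per-call overhead once."""
    return calls * overhead_s + work_s


# Illustrative numbers only: 0.5 s overhead per call, 0.5 s of real work in total.
split_into_five = total_latency(calls=5, overhead_s=0.5, work_s=0.5)
combined = total_latency(calls=1, overhead_s=0.5, work_s=0.5)

print(f"five small calls: {split_into_five:.1f} s, one combined call: {combined:.1f} s")
```

Under these assumed numbers the work is identical in both cases; only the number of times the overhead is paid changes.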
Performance Characteristics by Tool Type
Python Tools:
- Call overhead: Present but minimal
- Internal execution: Very fast for most operations
- Overall: Fast to very fast
- Best for: Deterministic logic, data processing, API calls
Langflow Tools:
- Call overhead: Includes Langflow initialization (2+ seconds)
- Internal execution: Depends on LLM operations
- Overall: Slower due to LLM inference (if using LLM)
- Best for: LLM reasoning, NLP tasks, document analysis
Python & Langflow Tool Performance
Shared Technical Constraints
Both Python and Langflow tools operate in secure sandboxes with identical constraints:
| Constraint | Impact | Best Practice |
|---|---|---|
| Isolated pod execution | Each tool instance runs in a separate pod for tenant isolation and security | Design for stateless, independent execution |
| 2 CPU cores maximum | Limited computational resources per tool instance | Optimize algorithms and avoid CPU-intensive operations |
| 2GB memory maximum | Limited memory per tool instance | Manage memory efficiently, avoid large data structures |
| 2-minute timeout | Maximum execution time per call | Design for timeout awareness, break long operations into chunks |
| No GPU access | GPU operations will fail | Use CPU-optimized algorithms and libraries only |
| Cold start penalty | First run takes longer; subsequent runs within 72 hours are faster | Expect initial latency; warm pods improve performance |
| Stateless execution | No state persists between calls | Use external storage (Redis, S3, database) |
| Read-only filesystem | Cannot write files locally | Use in-memory buffers or external storage |
| Network isolation | Outbound requests only | Design for outbound-only patterns |
Security Note: Due to tenant isolation requirements and the potentially unsecured nature of user-authored code, wxO runs each Python and Langflow tool instance as a separate pod with strict resource limits. This ensures security and prevents resource contention between tenants.
Performance Note: Tool pods experience cold start latency on first invocation. Once warmed up, the pod remains available for approximately 6 hours, with the time extended upon continuous use of the tool. This provides faster execution for subsequent calls. Plan for initial latency in performance testing and user experience design.
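The read-only filesystem constraint in the table above means that intermediate files have to live in memory. A minimal sketch, assuming the tool needs to produce a CSV payload; the function name `rows_to_csv_bytes` is hypothetical:

```python
import csv
import io


def rows_to_csv_bytes(rows: list[dict[str, object]]) -> bytes:
    """Serialize rows to CSV entirely in memory, without touching the local disk."""
    if not rows:
        return b""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


payload = rows_to_csv_bytes([{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}])
print(len(payload), "bytes ready to return or upload to external storage")
```

The resulting bytes can be returned to the caller or sent to external storage, which keeps the tool within the stateless, outbound-only model.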
Python Tools
Execution Speed: Very fast (simple operations), fast (data processing), variable (external API calls)
Optimization Strategies:
- Minimize and batch external API calls
- Use efficient algorithms (O(n) vs O(n²))
- Choose performant libraries
- Implement external caching (Redis) for expensive operations
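As one illustration of the algorithmic point above, the sketch below finds the IDs common to two lists in two ways. Both function names are hypothetical; the only difference is the lookup structure.

```python
def common_ids_quadratic(a: list[int], b: list[int]) -> list[int]:
    """O(len(a) * len(b)): every membership test scans the list `b`."""
    return [x for x in a if x in b]


def common_ids_linear(a: list[int], b: list[int]) -> list[int]:
    """O(len(a) + len(b)): build a set once, then use constant-time lookups."""
    lookup = set(b)
    return [x for x in a if x in lookup]


left = list(range(0, 2000, 2))
right = list(range(0, 2000, 3))
assert common_ids_quadratic(left, right) == common_ids_linear(left, right)
```

Both versions return the same result; the set-based version keeps CPU time roughly proportional to the input size as the lists grow.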
Overview
Tool builders in IBM watsonx Orchestrate can build their tools by using Python and deploy them as toolkits on the platform. The platform now provides a toolkit runtime architecture that offers significant improvements over the individual tool import architecture. This document provides guidelines for designing, developing, and deploying Python tools for watsonx Orchestrate.
Key changes in Python toolkits
- New toolkit runtime: Overcomes individual tool processing overheads.
- Improved performance: Improves resource utilization and reduces cold start latency.
New toolkit runtime architecture
Architecture overview
Both individual Python tools and Python toolkits run in isolated containers, but with fundamentally different execution models:
Individual Python tool approach:
- Each tool has its own `requirements.txt` with specific dependencies.
- Process-level isolation: Each tool invocation creates a new process, albeit a lightweight one.
- Thread-safety: Tools need not be thread-safe.
- Performance overhead: Creates a lightweight process for each execution and a virtual environment for tools that are not yet loaded in the worker.
- Resource allocation: 2 vCPU and 2 GB RAM per replica (2 replicas provided).
- Cold-start delays: Significant latency from process creation and dependency loading.
Python toolkit approach:
- Multiple related tools packaged together in a single toolkit.
- Single shared `requirements.txt` for all tools in the toolkit.
- Thread-level isolation: Tools run in persistent worker threads.
- Preinstalled dependencies: All dependencies are loaded once at container startup.
- Thread-safety required: All tools must be thread-safe and reentrant.
- High performance: No process creation or dependency loading overhead.
- Resource allocation: 2 vCPU and 2 GiB RAM per replica (2 replicas provided).
- FastAPI/Gunicorn workers: 5 persistent workers per replica handle concurrent requests.
Deployment architecture
Each toolkit deployment consists of:
- Kubernetes deployment: 2 replicas
- Per-replica resources:
  - 2 vCPU cores
  - 2 GiB memory
  - 5 FastAPI/Gunicorn workers per replica
- Load balancing: Automatic distribution across replicas and workers
- Health monitoring: Built-in health checks and automatic recovery
Benefits of toolkit approach
- Higher performance:
  - No process creation overhead per execution.
  - No dynamic dependency loading.
  - Persistent workers eliminate cold start delays.
  - Thread-based execution is orders of magnitude faster than process-based.
- Shared dependencies:
  - Single `requirements.txt` for all tools in the toolkit
  - Dependencies loaded once at container startup
  - Reduced memory footprint through shared libraries
- Better resource utilization:
  - Multiple tools share container resources
  - Efficient thread-based concurrency
  - Lower per-request overhead
- Improved scalability:
  - Each replica handles multiple concurrent requests through workers
  - Better throughput per unit of compute
- Enhanced reliability:
  - Built-in health checks and automatic failover
  - Persistent workers reduce failure points
  - No dependency loading failures during execution
- Cost efficiency:
  - More tools per unit of compute resource
  - Reduced infrastructure overhead
  - Better resource density
Thread safety requirements
Critical: All tools in a toolkit must be thread-safe because:
- Multiple worker threads run tools concurrently
- No process-level isolation between tool invocations
- Shared memory space within the container
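A minimal sketch of what thread safety means in practice, assuming a tool keeps a shared counter in module or object state. The `SafeCounter` class and `run_concurrently` helper are hypothetical names used only for illustration; the lock makes the read-modify-write step atomic across worker threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SafeCounter:
    """Shared mutable state protected by a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value = 0

    def increment(self) -> int:
        # Without the lock, two threads could read the same value and lose an update.
        with self._lock:
            self._value += 1
            return self._value

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def run_concurrently(counter: SafeCounter, workers: int = 8, increments: int = 1000) -> int:
    """Hammer the counter from several threads and return the final value."""

    def work(_worker_index: int) -> None:
        for _ in range(increments):
            counter.increment()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(work, range(workers)))
    return counter.value


print(run_concurrently(SafeCounter()))
```

Where possible, avoiding shared mutable state altogether (for example by keeping all data in local variables) is simpler than protecting it with locks.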
- A toolkit in draft is run by using the Python tool approach to cover a large number of tools and toolkits that might be present in a tenant in the draft environment.
- Toolkits in live run with thread safety and a separate deployment per toolkit.
Resource allocation and performance requirements
CPU computation limits
Critical requirement: Python tools must limit CPU-bound computation to approximately 50 milliseconds and use async I/O for all I/O operations to ensure performance scaling. This constraint helps ensure:
- Nonblocking execution in the event loop
- Fair resource sharing across concurrent requests
- Predictable response times
- Prevention of worker starvation
What counts as CPU time
CPU time includes:
- Data parsing and transformation logic
- JSON serialization/deserialization
- String manipulation and formatting
- Mathematical computations
- In-memory data filtering and sorting
- Object instantiation and manipulation
CPU time does not include:
- Network I/O wait time (HTTP requests, database queries)
- File I/O wait time
- Sleep/delay operations
- External API call latency
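The distinction between CPU time and waiting time can be checked locally with the standard library. This is a rough sketch, not a platform-provided profiler: `measure` and `mostly_waiting` are hypothetical helpers, and `time.process_time()` counts CPU time for the whole process, so in a multi-threaded worker `time.thread_time()` may be the closer measure.

```python
import time


def measure(func, *args, **kwargs):
    """Return (result, cpu_ms, wall_ms) for a single synchronous call."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args, **kwargs)
    cpu_ms = (time.process_time() - cpu_start) * 1000
    wall_ms = (time.perf_counter() - wall_start) * 1000
    return result, cpu_ms, wall_ms


def mostly_waiting() -> int:
    time.sleep(0.2)           # stand-in for I/O wait: not counted as CPU time
    return sum(range(1000))   # small amount of real CPU work


result, cpu_ms, wall_ms = measure(mostly_waiting)
print(f"result={result} cpu={cpu_ms:.1f} ms wall={wall_ms:.1f} ms")
```

A call like this spends about 200 ms of wall-clock time but only a small fraction of that on the CPU, which is the quantity the 50 ms guideline refers to.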
Memory constraints
Each replica has 2 GiB of memory that is shared across 5 workers:
- Per-worker budget: ~400 MB (accounting for overhead).
- Tool memory usage: Must not exceed 100-200 MB per concurrent request.
- Memory leaks: Avoid memory leaks to prevent pod crashes.
- Throttling: Implement request throttling if memory usage approaches limits.
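One way to approach the throttling point above is to cap how many memory-heavy sections run at once. The sketch below uses `asyncio.Semaphore`; the limit of 2 and the `run_throttled` helper are assumptions for illustration, not platform settings.

```python
import asyncio


async def run_throttled(payloads: list[str], limit: int = 2) -> tuple[list[int], int]:
    """Process payloads concurrently, with at most `limit` in the heavy section.

    Returns the per-payload results and the peak number of concurrent sections.
    """
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def handle(payload: str) -> int:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for the memory-heavy step
            in_flight -= 1
            return len(payload)

    results = await asyncio.gather(*(handle(p) for p in payloads))
    return list(results), peak


results, peak = asyncio.run(run_throttled(["a", "bb", "ccc", "dddd"], limit=2))
print(results, "peak concurrency:", peak)
```

Requests beyond the limit wait at the semaphore without holding large buffers, so peak memory is bounded by the limit rather than by the incoming request rate.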
Python tool design principles
1. Tools as API wrappers
Python tools must be lightweight wrappers around existing API implementations.
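A minimal sketch of the wrapper shape, assuming a hypothetical order-status API. The names `get_order_status`, `fetch_json`, and `fake_fetch_json` are illustrative and not part of any wxO SDK; the upstream call is injected so that the example runs without a network.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical signature for "perform a GET and return parsed JSON".
FetchJson = Callable[[str], Awaitable[dict[str, Any]]]


async def get_order_status(order_id: str, fetch_json: FetchJson) -> dict[str, Any]:
    """Thin wrapper: one upstream call, light field mapping, no heavy logic."""
    raw = await fetch_json(f"/orders/{order_id}")
    return {"order_id": raw["id"], "status": raw["status"]}


async def fake_fetch_json(path: str) -> dict[str, Any]:
    """Stand-in for a real async HTTP call to an existing API."""
    await asyncio.sleep(0)
    return {"id": path.rsplit("/", 1)[-1], "status": "shipped", "internal_flag": True}


print(asyncio.run(get_order_status("42", fake_fetch_json)))
```

The business logic stays in the upstream API; the tool only forwards the request and trims the response to the fields the agent needs.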
2. Mandatory async I/O
All I/O operations must use async/await patterns to prevent blocking the event loop. Using async I/O:
- Allows the worker to handle other requests while it waits for the I/O
- Enables high concurrency with limited resources
- Prevents worker starvation
- Maximizes throughput
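The effect of awaiting I/O concurrently can be sketched with `asyncio.gather`. Here `call_api` is a hypothetical stand-in that simulates latency with `asyncio.sleep` instead of a real HTTP request.

```python
import asyncio
import time


async def call_api(name: str, latency_s: float) -> str:
    """Stand-in for an awaited HTTP call; the event loop is free while it waits."""
    await asyncio.sleep(latency_s)
    return f"{name}:ok"


async def call_all() -> list[str]:
    # The three waits overlap, so total time is close to the slowest call,
    # not the sum of all three.
    results = await asyncio.gather(
        call_api("inventory", 0.1),
        call_api("pricing", 0.1),
        call_api("shipping", 0.1),
    )
    return list(results)


start = time.perf_counter()
print(asyncio.run(call_all()), f"{time.perf_counter() - start:.2f} s")
```

Replacing `asyncio.sleep` with a blocking call such as `time.sleep` would serialize the three waits and hold the worker for the full duration.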
3. Response formatting guidelines
Keep CPU-intensive formatting minimal (target: 10-50 ms).
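A sketch of formatting that stays cheap: a single pass of field selection and renaming, with no nested loops or large intermediate copies. The input shape and the `format_customer` name are assumptions for illustration.

```python
from typing import Any


def format_customer(raw: dict[str, Any]) -> dict[str, Any]:
    """Map an upstream record to the few fields the agent needs."""
    full_name = f"{raw.get('first_name', '')} {raw.get('last_name', '')}".strip()
    return {
        "name": full_name,
        "email": raw.get("email"),
        "open_tickets": len(raw.get("tickets", [])),
    }


record = {
    "first_name": "Ada",
    "last_name": "Lovelace",
    "email": "ada@example.com",
    "tickets": [{"id": 1}, {"id": 2}],
    "audit_log": ["..."],  # large upstream field that is deliberately dropped
}
print(format_customer(record))
```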
Performance analysis and capacity planning
Example scenario: high-throughput tool
The following example shows a tool that makes 4 asynchronous API calls:
Tool characteristics:
- CPU time per request: 50 ms (formatting/parsing)
- API call latency: 15 seconds each (4 calls in parallel via async)
- Total wall-clock time: ~15 seconds (parallel execution)
Platform resources:
- 2 Kubernetes replicas
- 2 vCPU per replica
- 5 workers per replica
- Total workers: 10 (2 replicas × 5 workers)
Throughput calculation
CPU capacity & practical throughput:
- Total vCPUs: 4 (2 replicas × 2 vCPU)
- CPU time per request: 50 ms = 0.05 seconds
- Typical CPU-based throughput: 50 requests/second
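The arithmetic behind these figures can be worked through as follows. This is a back-of-the-envelope sketch using only the numbers stated above; the helper names are hypothetical, and the in-flight estimate applies Little's law (concurrency = throughput × time in system).

```python
REPLICAS = 2
VCPUS_PER_REPLICA = 2
WORKERS_PER_REPLICA = 5
CPU_MS_PER_REQUEST = 50
WALL_S_PER_REQUEST = 15
TYPICAL_RPS = 50  # the typical figure quoted in the text


def cpu_bound_ceiling_rps(replicas: int, vcpus_per_replica: int, cpu_ms: int) -> float:
    """Upper bound on requests/second if CPU is the only limit."""
    return replicas * vcpus_per_replica * 1000 / cpu_ms


def in_flight_per_worker(rps: float, wall_s: float, replicas: int, workers: int) -> float:
    """Average number of requests each worker holds open at a given throughput."""
    return rps * wall_s / (replicas * workers)


ceiling = cpu_bound_ceiling_rps(REPLICAS, VCPUS_PER_REPLICA, CPU_MS_PER_REQUEST)
held = in_flight_per_worker(TYPICAL_RPS, WALL_S_PER_REQUEST, REPLICAS, WORKERS_PER_REPLICA)
print(f"CPU-bound ceiling: {ceiling:.0f} req/s, in flight per worker at 50 req/s: {held:.0f}")
```

The theoretical CPU ceiling is 80 requests/second, so the typical figure of 50 sits below it. Sustaining 50 requests/second with 15-second calls means each worker is waiting on about 75 requests at once, which is only feasible when those waits are non-blocking.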
When to use platform-hosted Python toolkits
Platform-hosted Python toolkits are good for:
✅ Suitable use cases
- API orchestration tools
  - Calling 2-5 external APIs and combining results
  - Response formatting and field mapping
  - Simple data aggregation from multiple sources
- Data transformation tools
  - JSON/XML format conversions
  - Field extraction and renaming
  - Simple filtering and sorting
  - Data validation and sanitization
- Integration wrappers
  - Wrapping REST APIs with simplified interfaces
  - Authentication token management
  - Request/response normalization
- Lightweight processing
  - Text formatting and templating
  - Date/time conversions
  - Unit conversions
  - Simple calculations (< 50 ms CPU)
Requirements checklist
Use platform-hosted toolkits when ALL of the following are true:
- CPU computation ≤ 50 ms per request
- All I/O operations use async/await
- Memory usage is less than 200 MB per request
- Heavy processing is delegated to external APIs
- Response time is acceptable at 5-30 seconds
- The tool is stateless or uses platform context management
- No special runtime dependencies (such as custom C libraries)
When to use self-hosted solutions
Consider self-hosting tools through Remote MCP Servers or OpenAPI Tools when any of the following apply:
❌ Not suitable for platform toolkits
- CPU-intensive operations
  - Complex statistical analysis
  - Cryptographic operations
  - Large-scale data transformations
- High memory requirements
  - Loading large models (> 500 MB)
  - Processing large datasets in memory
  - Caching extensive reference data
  - In-memory databases
- Special runtime requirements
  - Custom system libraries
  - Specific Python versions
  - Native code dependencies
- Long-running operations
  - Batch processing jobs
  - Report generation (> 60 seconds)
  - Data migration tasks
  - Background processing
- Stateful operations
  - Maintaining persistent connections
  - Session management
  - Transaction coordination
  - Workflow orchestration
Self-hosting options
Option 1: Remote MCP server
- Best for: Custom tool collections with complex logic
- Deployment: Your infrastructure
- Protocol: Model Context Protocol over HTTP/SSE
- Benefits: Full control, custom resources, flexible scaling
Option 2: OpenAPI tools
- Best for: Existing REST APIs
- Deployment: Your infrastructure or third-party services
- Protocol: Standard HTTP/REST
- Benefits: Use existing APIs, no code changes needed
Option 3: Hybrid approach
- Platform toolkits for orchestration and formatting
- Self-hosted services for heavy computation
- Example: Platform tool calls your ML inference API
Best practices
1. Async I/O patterns
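One async pattern that fits the 2-minute execution limit is to give each upstream call an explicit time budget. This is a sketch under assumptions: `slow_call` simulates an upstream service, and the timeout values are illustrative.

```python
import asyncio


async def slow_call(delay_s: float) -> str:
    """Stand-in for an upstream request that takes `delay_s` seconds."""
    await asyncio.sleep(delay_s)
    return "done"


async def call_with_timeout(delay_s: float, timeout_s: float) -> str:
    """Await the call, but give up once the time budget is exhausted."""
    try:
        return await asyncio.wait_for(slow_call(delay_s), timeout=timeout_s)
    except asyncio.TimeoutError:
        # The pending call is cancelled by wait_for; report a clear outcome instead.
        return "timed out"


print(asyncio.run(call_with_timeout(0.01, timeout_s=1.0)))
print(asyncio.run(call_with_timeout(1.0, timeout_s=0.05)))
```

Choosing a per-call budget well below the overall tool timeout leaves room to return a meaningful message rather than being cut off by the platform.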
2. Error handling
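A sketch of one possible error-handling shape: catch the expected upstream failure and return a structured result instead of letting the exception escape. `UpstreamError`, `fetch_profile`, and the `ok`/`error` result keys are assumptions for illustration, not a documented wxO contract.

```python
import asyncio
from typing import Any


class UpstreamError(Exception):
    """Hypothetical error raised by the stand-in upstream call."""


async def fetch_profile(user_id: str) -> dict[str, Any]:
    """Stand-in for an async API call that can fail."""
    await asyncio.sleep(0)
    if user_id == "missing":
        raise UpstreamError("user not found")
    return {"id": user_id, "name": "Ada"}


async def get_profile_tool(user_id: str) -> dict[str, Any]:
    """Return a predictable structure for both success and failure."""
    try:
        profile = await fetch_profile(user_id)
    except UpstreamError as exc:
        return {"ok": False, "error": str(exc)}
    return {"ok": True, "data": profile}


print(asyncio.run(get_profile_tool("u1")))
print(asyncio.run(get_profile_tool("missing")))
```

Catching only the specific, expected exception keeps genuine programming errors visible while still giving the calling agent something it can act on.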
3. Connection pooling
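Connection pooling usually comes down to creating one client per worker and reusing it across invocations. The sketch below shows only the reuse pattern with a thread-safe lazy initializer; `FakeClient` is a stand-in for a real pooled client (for example an `httpx.AsyncClient`), and whether a given client can be shared across threads or event loops depends on the library.

```python
import threading
from typing import Optional


class FakeClient:
    """Stand-in for a pooled HTTP client; counts how many times it is built."""

    instances = 0

    def __init__(self) -> None:
        FakeClient.instances += 1


_client: Optional[FakeClient] = None
_client_lock = threading.Lock()


def get_client() -> FakeClient:
    """Create the shared client once and reuse it on every later call."""
    global _client
    if _client is None:
        with _client_lock:
            if _client is None:  # re-check inside the lock to avoid a double build
                _client = FakeClient()
    return _client


first = get_client()
second = get_client()
print(first is second, FakeClient.instances)
```

Building a new client on every invocation would repeat connection setup each time, which is the overhead this pattern avoids.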
4. Memory management
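One way to keep per-request memory bounded is to process data in fixed-size chunks rather than materializing everything at once. A minimal sketch using a generator; `chunked` and `total_in_chunks` are hypothetical helpers.

```python
from typing import Iterable, Iterator


def chunked(items: Iterable[int], size: int) -> Iterator[list[int]]:
    """Yield lists of at most `size` items without loading the whole input."""
    chunk: list[int] = []
    for item in items:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def total_in_chunks(items: Iterable[int], size: int = 1000) -> int:
    """Aggregate chunk by chunk so only one chunk is held in memory at a time."""
    return sum(sum(chunk) for chunk in chunked(items, size))


print(total_in_chunks(range(1_000_000), size=10_000))
```

Because `range` and the generator are both lazy, peak memory here is governed by the chunk size rather than by the total number of items.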
5. Context management
Migration from individual tools to toolkits
If you have existing individual tools, consider migrating to the toolkit approach:
Migration checklist
- Group related tools:
  - Identify tools that share similar dependencies
  - Group tools by functional domain or use case
  - Aim for 10-15 tools per toolkit for optimal resource utilization
- Consolidate dependencies:
  - Merge all `requirements.txt` files into a single file
  - Resolve version conflicts between tools
  - Remove duplicate dependencies
  - Test compatibility of consolidated dependencies
- Ensure thread safety:
  - Critical: Review all tools for thread-safety issues
  - Remove or protect shared mutable state
  - Use thread-safe primitives and data structures (threading.Lock, queue.Queue)
  - Avoid global variables that are modified during execution
  - Test tools under concurrent load
- Review CPU usage:
  - Profile each tool to confirm CPU time < 50 ms
  - Identify and optimize CPU-intensive operations
  - Consider moving heavy computation to external APIs
- Convert to async:
  - Replace all blocking I/O with async/await
  - Use async-compatible libraries (httpx, aiohttp, asyncpg)
  - Ensure all I/O operations are nonblocking
- Test concurrency:
  - Load test with multiple concurrent requests
  - Verify no race conditions or deadlocks
  - Monitor memory usage under load
  - Validate thread-safety under stress
- Package as toolkit:
  - Follow toolkit packaging format
  - Create single requirements.txt
  - Document thread-safety guarantees
  - Include toolkit metadata
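The "Consolidate dependencies" step in the checklist above can be partly automated. The sketch below merges several requirements files and reports packages pinned differently across tools. It is deliberately simplified (it ignores extras, environment markers, and option lines such as `-r`), and `merge_requirements` is a hypothetical helper, not a platform utility.

```python
import re
from collections import defaultdict

# Package name followed by an optional version specifier, e.g. "requests==2.31.0".
_REQUIREMENT = re.compile(r"^([A-Za-z0-9_.\-]+)\s*(.*)$")


def merge_requirements(files: dict[str, str]) -> tuple[list[str], dict[str, list[str]]]:
    """Return (merged lines without conflicts, {package: conflicting specifiers})."""
    specs: dict[str, set[str]] = defaultdict(set)
    for content in files.values():
        for line in content.splitlines():
            line = line.split("#", 1)[0].strip()  # drop comments and whitespace
            if not line or line.startswith("-"):  # skip blanks and pip options
                continue
            match = _REQUIREMENT.match(line)
            if match is None:
                continue
            name = match.group(1).lower().replace("_", "-")
            specs[name].add(match.group(2).replace(" ", ""))
    merged = sorted(f"{name}{next(iter(s))}" for name, s in specs.items() if len(s) == 1)
    conflicts = {name: sorted(s) for name, s in specs.items() if len(s) > 1}
    return merged, conflicts


merged, conflicts = merge_requirements({
    "tool_a/requirements.txt": "requests==2.31.0\npandas>=2.0\n",
    "tool_b/requirements.txt": "requests==2.31.0  # same pin\npandas==1.5.3\n",
})
print(merged, conflicts)
```

Conflicts reported this way still need a human decision about which version all tools in the toolkit can share.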
Summary
The new toolkit runtime architecture provides a high-performance, scalable platform for Python tools that:
- Eliminates cold start delays through persistent workers
- Maximizes throughput through async I/O and efficient resource sharing
- Ensures predictability through strict CPU time limits
Langflow Tools
Execution Speed: Minimum 2+ seconds (initialization overhead per run), then variable depending on workflow complexity
Key Consideration: Due to the initialization overhead on every run, Langflow tools are most effective for operations that take longer to execute (but remain under the 2-minute timeout). The initialization penalty becomes less significant when the actual workflow processing time is substantial.
Note: LLM operations in Langflow are optional. Use Langflow when you need its specific AI components or have existing Langflow workflows, not solely for LLM capabilities.
Optimization Strategies:
- Use for longer-running operations where initialization overhead is proportionally smaller
- Minimize and combine LLM calls into single prompts (when using LLMs)
- Use concise, focused prompts with minimal context (when using LLMs)
- Choose smaller models for simple tasks, larger for complex reasoning (when using LLMs)
- Cache expensive operation results externally (Redis) for repeated queries
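The point about initialization overhead being proportionally smaller for longer runs can be made concrete with a small calculation. The 2-second initialization figure comes from the text above; the workflow durations are illustrative assumptions.

```python
def overhead_share(init_s: float, work_s: float) -> float:
    """Fraction of total run time spent on initialization."""
    return init_s / (init_s + work_s)


INIT_S = 2.0  # minimum Langflow initialization time stated above

for work_s in (0.5, 8.0, 38.0):
    share = overhead_share(INIT_S, work_s)
    print(f"workflow of {work_s:>4.1f} s: initialization is {share:.0%} of the run")
```

With these numbers, initialization dominates a half-second workflow but is a small part of one that runs for tens of seconds.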
API & MCP Tool Performance
API Tools (OpenAPI-based)
Performance: Depends entirely on external service (network latency, service speed, payload size, rate limiting, authentication)
Timeout:
- Synchronous calls: 2 minutes (same as Python/Langflow/MCP)
- Asynchronous calls: None (via OpenAPI callback syntax)
MCP Tools (Model Context Protocol)
Performance: Varies by implementation (tool design, external dependencies, protocol overhead, resource requirements)
Timeout: 2 minutes (same as Python/Langflow/synchronous API tools; MCP protocol does not support async)
Summary
Key Points:
- wxO Flow: First choice for most scenarios - supports connections, security, custom code, no timeout limit
- Python Tools: For custom libraries not supported in wxO Flow code blocks
- Langflow Tools: For Langflow-specific AI components
- API Tools: For OpenAPI-based external service integrations - performance depends on external service
- MCP Tools: For Model Context Protocol capabilities - performance varies by implementation
- Tool call overhead exists: Combine operations when possible
- 2-minute timeout: Applies to synchronous tools such as Python, Langflow, synchronous OpenAPI, and MCP tools