New in 1.11.0
The quick-eval command provides a fast, reference-less evaluation of your agents and tools.
Note: For now, only Python tools are supported.
Unlike the standard evaluate command, it does not require ground truth datasets. Instead, it runs a lightweight check to identify common issues such as schema mismatches and hallucinations in tool calls.
orchestrate evaluations quick-eval -p examples/evaluations/quick-eval/ -o results/ -t examples/evaluations/evaluate/agent_tools
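
Because quick-eval currently supports only Python tools, the directory passed with -t contains Python tool definitions. The following minimal sketch shows what one might look like; it assumes the ADK's @tool decorator import path, and the function name and return value are purely illustrative:

from ibm_watsonx_orchestrate.agent_builder.tools import tool

@tool
def get_employee_id(name: str) -> str:
    """Look up an employee ID by name.

    Args:
        name: Full name of the employee.

    Returns:
        The employee's ID.
    """
    # Illustrative stub: a real tool would query an HR system here.
    return f"EMP-{abs(hash(name)) % 10000:04d}"

Quick evaluation exercises tools like this one and reports whether each call matched the declared schema and referred to a tool that actually exists.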
You can also run the quick evaluation using a YAML config file, giving you full control over all parameters.
orchestrate evaluations quick-eval -c examples/evaluations/config.yaml
Sample config file:
config.yaml
test_paths:
  - benchmarks/wxo_domains/rel_1.8_mock/workday/data/
auth_config:
  url: http://localhost:4321
  tenant_name: local
output_dir: "test_bench_data3"
enable_verbose_logging: true
llm_user_config:
  user_response_style:
  - "Be concise in messages and confirmations"
--config (-c) (string)
Path to the configuration file with details about the evaluation settings. Required when --test-paths and --output-dir are not provided on the command line.

--test-paths (-p) (list[string])
Comma-separated list of test files or directories containing the test datasets. Required when not using a configuration file.

--tools-path (-t) (string)
Directory containing tool definitions.

--output-dir (-o) (string)
Directory where evaluation results are saved. Required when not using a configuration file.

--env-file (-e) (string)
Path to the .env file that overrides the default environment.
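
For example, a direct invocation that combines these flags and overrides the environment file might look like this (all paths here are illustrative):

orchestrate evaluations quick-eval -p tests/workday/data -t tools/ -o results/ -e .env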
You can find more examples in the Examples folder.

Understanding the Summary Metrics Table

At the end of the evaluation, you will see a summary similar to the following:

[Image: Quick evaluation results table]

Metrics explained

Quick Evaluation Summary Metrics

Metric | Description | Calculation / Type
------ | ----------- | ------------------
Dataset | Name of the dataset used for the quick evaluation | Text
Tool Calls | Total number of tool calls attempted during the evaluation | Integer (≥ 0)
Successful Tool Calls | Number of tool calls that executed successfully without errors | Integer (≥ 0)
Tool Calls Failed due to Schema Mismatch | Number of tool calls that failed because the input or output schema did not match expectations | Integer (≥ 0)
Tool Calls Failed due to Hallucination | Number of tool calls that failed because the agent invoked irrelevant or nonexistent tools | Integer (≥ 0)
If a metric's value is equal to 1.0 or True, the table omits that result.
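
As a quick sanity check when reading the table, you can derive a success rate from these counts. The sketch below assumes a hypothetical dictionary of values copied from the summary; quick-eval itself does not expose this as an API:

# Hypothetical summary values; field names mirror the table above.
summary = {
    "tool_calls": 12,
    "successful_tool_calls": 9,
    "failed_schema_mismatch": 2,
    "failed_hallucination": 1,
}

# Fraction of attempted tool calls that executed without errors.
success_rate = summary["successful_tool_calls"] / max(summary["tool_calls"], 1)
print(f"Tool call success rate: {success_rate:.0%}")  # e.g., 75%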