New in 1.11.0
The `quick-eval` command provides a fast, reference-less evaluation of your agents and tools.
Note: For now, you can use only Python tools.
Unlike the `evaluate` command, it does not require ground truth datasets. Instead, it runs a lightweight check to identify common issues such as schema mismatches and hallucinations in tool calls.
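Conceptually, a reference-less check of this kind validates each recorded tool call against the tools the agent actually has available: a call to an unknown tool counts as a hallucination, and a call with missing or unexpected parameters counts as a schema mismatch. The sketch below is illustrative only; the class and function names (`ToolCall`, `ToolSchema`, `check_tool_call`) are hypothetical and are not part of the CLI.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified representations of a recorded tool call and a tool's schema.
@dataclass
class ToolCall:
    tool_name: str
    arguments: dict

@dataclass
class ToolSchema:
    name: str
    required_params: set = field(default_factory=set)
    optional_params: set = field(default_factory=set)

def check_tool_call(call: ToolCall, registry: dict) -> str:
    """Classify a tool call as 'ok', 'hallucination', or 'schema_mismatch'."""
    schema = registry.get(call.tool_name)
    if schema is None:
        # The agent invoked a tool that does not exist: a hallucinated call.
        return "hallucination"
    supplied = set(call.arguments)
    allowed = schema.required_params | schema.optional_params
    if not schema.required_params <= supplied or not supplied <= allowed:
        # Missing required parameters or unexpected ones: a schema mismatch.
        return "schema_mismatch"
    return "ok"

# Example: one valid call, one hallucinated call, one mismatched call.
registry = {"get_weather": ToolSchema("get_weather", required_params={"city"})}
calls = [
    ToolCall("get_weather", {"city": "Boston"}),
    ToolCall("get_wether", {"city": "Boston"}),       # unknown tool name
    ToolCall("get_weather", {"location": "Boston"}),  # wrong parameter name
]
print([check_tool_call(c, registry) for c in calls])
# ['ok', 'hallucination', 'schema_mismatch']
```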
Flags
- Path to the configuration file (`config.yaml`) with details about the evaluation settings.
- Comma-separated list of test files or directories containing ground truth datasets. Required when not using a configuration file.
- Directory containing tool definitions.
- Directory where evaluation results will be saved. Required when not using a configuration file.
- Path to the `.env` file that overrides the default environment.

Understanding the Summary Metrics Table
At the end of the evaluation, you will see a summary table that reports the following metrics.
Metrics explained
Quick Evaluation Summary Metrics

| Metric | Description | Calculation / Type |
|---|---|---|
| Dataset | Name of the dataset used for quick evaluation | Text |
| Tool Calls | Total number of tool calls attempted during the evaluation | Integer (≥ 0) |
| Successful Tool Calls | Number of tool calls that executed successfully without errors | Integer (≥ 0) |
| Tool Calls Failed due to Schema Mismatch | Number of tool calls that failed because the input/output schema did not match expectations | Integer (≥ 0) |
| Tool Calls Failed due to Hallucination | Number of tool calls that failed because the agent invoked tools that were irrelevant or non-existent | Integer (≥ 0) |
If a value is equal to 1.0 or `True`, the table omits that result.
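For reference, summary counts like those above can be derived by aggregating per-call check results such as the ones produced in the earlier sketch. The aggregation below is a minimal illustration under that assumption, not the tool's actual implementation; the `summarize` function and the dataset name are hypothetical.

```python
from collections import Counter

def summarize(dataset: str, results: list) -> dict:
    """Aggregate per-call check results ('ok', 'schema_mismatch', 'hallucination')
    into the quick evaluation summary metrics."""
    counts = Counter(results)
    return {
        "Dataset": dataset,
        "Tool Calls": len(results),
        "Successful Tool Calls": counts["ok"],
        "Tool Calls Failed due to Schema Mismatch": counts["schema_mismatch"],
        "Tool Calls Failed due to Hallucination": counts["hallucination"],
    }

# Hypothetical dataset name and per-call results, for illustration only.
print(summarize("sample_dataset", ["ok", "ok", "hallucination", "schema_mismatch"]))
```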
