
Before you begin

Before you can analyze results, you must first evaluate your agent. For more information, see Evaluating agents and tools.

Analyzing

The analyze command provides a detailed breakdown of your agent evaluation results, highlighting where the agent succeeded, where it failed, and why. It generates an overview analysis for each dataset result in the specified directory and helps you quickly identify:
  • Which tool calls were expected and made
  • Which tool calls were irrelevant or incorrect
  • Missed tool calls
  • Any parameter mismatches
  • A high-level summary of the agent’s performance
The analysis includes:
  • Analysis Summary: Displays the overall evaluation type (e.g., Multi-run), total number of runs, number of runs with problems, and the overall status. This provides a quick high-level view of the evaluation results.
  • Test Case Summary: Presents key counts for each test run, including expected vs. actual tool calls, correct tool calls, text match results, and journey success status.
  • Conversation History: Step-by-step breakdown of every message exchanged, providing insight into where things went right or wrong.
  • Analysis Results: Details the specific mistakes, along with the reasoning for each error (e.g., irrelevant tool calls).
orchestrate evaluations analyze -d path/to/results
--data-path (-d)
string
required
Directory where your evaluation results are saved.
--tools-path (-t)
string
New in 1.11.0: Directory containing tool definitions.
--env-file (-e)
string
Path to the .env file that overrides the default environment.
--mode (-m)
string
Either default or enhanced. Enhanced mode optionally provides docstring enrichment recommendations for tools. By default, docstring recommendations are limited to tools with minimal descriptions. To enable recommendations for more tools, set export GATE_TOOL_ENRICHMENTS=false. Before applying a recommendation, make a copy of the tool, try the recommended docstring, and validate the performance again.
Before you run the command, enlarge your terminal window to better visualize the output. In smaller terminal windows, some of the information in the output can be truncated.
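For example, to run the analysis in enhanced mode with the broader set of docstring recommendations enabled (the results directory is a placeholder):

export GATE_TOOL_ENRICHMENTS=false
orchestrate evaluations analyze -d path/to/results -m enhanced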

Analyzing tools

New in 1.11.0
The analyze command now supports tool description quality analysis for failing tools in your workflows. This helps ensure that your tool definitions include clear and sufficient docstrings. When you provide the --tools-path flag, the analyzer will:
  • Inspect the Python source file containing your tool definitions.
  • Evaluate the quality of each tool’s description (docstring).
  • Display:
    • A warning if the description is missing or classified as poor.
    • An OK message if the description meets quality standards.
Note:
  • Description quality analysis only runs for tools that failed during evaluation.
  • Currently, only Python tools are supported.
orchestrate evaluations analyze -d data/path -t path/to/source/file/containing/tool/definitions
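To illustrate what the quality check looks for, the following is a minimal, hypothetical Python tool whose docstring would likely be classified as OK; a failing tool with a missing or one-line docstring would produce a warning instead. The function name and docstring content are illustrative assumptions, not part of the ADK:

def get_order_status(order_id: str) -> str:
    """Look up the current status of a customer order.

    Args:
        order_id: Unique identifier of the order, for example "ORD-1234".

    Returns:
        A short status string such as "shipped" or "processing".
    """
    # Stub body for illustration; a real tool would query an order system.
    return "shipped"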

Example Output

Running analyze on the evaluation results of a dataset, such as examples/evaluations/analysis/multi_run_example, produces output like the following for both runs of data_commplex.json:

Example of enhanced mode

  • Always verify that your API credentials are set before running analyze.
  • Use the analysis output to quickly identify patterns in agent errors and focus your improvement efforts.