## record command

The `record` command captures real-time chat interactions and automatically generates evaluation datasets from them.
With recording enabled, any conversation you have in the chat UI is automatically captured and annotated for evaluation.
You can create as many sessions as needed. Each session's data is stored in a separate annotated file.
To record a session:

1. Use the `record` command to capture the session.
2. Launch the Chat UI.
3. Access the Chat UI and chat with the `hr_agent` agent.

The `record` command accepts:

- `--output-dir` (optional): Directory where your recorded data will be saved. If omitted, the data is saved in your current working directory.

For every chat session, the following file is generated in your output directory:

- `<THREAD_ID>_annotated_data.json`
Annotated ground truth data based on your chat session, ready for evaluation.

In the annotated file for `hr_agent`, the `starting_sentence` field is populated directly from your inputs. However, other fields like `story` and `goals` are derived from the recorded conversation and might require validation to ensure their accuracy and relevance.

To stop recording, press `Ctrl+C` in the terminal running the `record` command. Be sure to finish your conversation before stopping to avoid generating an incomplete dataset.
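The validation of derived fields such as `story` and `goals` can be spot-checked programmatically. The sketch below assumes a flat JSON layout with those key names, which is an illustration only; the actual schema of the annotated file may differ.

```python
import json
from pathlib import Path

# Assumption: derived fields that need manual review; the exact
# schema of the annotated file may differ from this sketch.
DERIVED_FIELDS = ("story", "goals")

def fields_to_review(path: Path) -> list[str]:
    """Return the derived fields present in one annotated data file."""
    data = json.loads(path.read_text())
    return [f for f in DERIVED_FIELDS if f in data]

def sessions_needing_review(output_dir: str) -> dict[str, list[str]]:
    """Map each recorded session file to its fields needing review.

    Relies only on the documented `<THREAD_ID>_annotated_data.json`
    naming convention.
    """
    return {
        p.name: fields_to_review(p)
        for p in Path(output_dir).glob("*_annotated_data.json")
    }
```

Running this after a few sessions gives a quick checklist of which files to open and verify by hand.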
## generate command

The `generate` command transforms user stories into structured test cases using your tool definitions. It produces datasets suitable for automated evaluation and benchmarking of agents.

Before running the `generate` command, ensure the following:

- Your tools are defined with the `@tool` decorator and have proper type annotations.
- You have a `.csv` file containing user stories. Each row should include:
  - `story`: A natural language description of the user's goal
  - `agent`: The name of the agent responsible for handling the story
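A stories file with those two columns can be produced with the standard library. The filename and the example rows below are illustrative; only the `story` and `agent` column names come from the required layout.

```python
import csv

# Illustrative rows: the column names `story` and `agent` match the
# required CSV layout; the contents are made up for this example.
stories = [
    {"story": "Request two weeks of vacation starting next Monday",
     "agent": "hr_agent"},
    {"story": "Update my home address in the HR system",
     "agent": "hr_agent"},
]

with open("stories.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["story", "agent"])
    writer.writeheader()
    writer.writerows(stories)
```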
Run the `generate` command with the following options:

- `--stories-path`: Path to your CSV file of user stories
- `--tools-path`: Path to your Python file defining agent tools
- `--output-dir` (optional): Output directory for generated files; if omitted, files are saved alongside your stories file.
The `generate` command analyzes each story and generates a sequence of tool calls, which is saved as an `<AGENT_NAME>_snapshot_llm.json` file in your output directory.
The snapshot is then used to generate structured test cases that you can use to evaluate your agent(s). The generated datasets are written to an `<AGENT_NAME>_test_cases/` folder in the output directory.
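Assuming only the documented naming conventions (`<AGENT_NAME>_snapshot_llm.json` and `<AGENT_NAME>_test_cases/`), the outputs for one agent can be located like this; the helper itself is illustrative, not part of the CLI, and it assumes the test cases are JSON files.

```python
from pathlib import Path

def generated_outputs(output_dir: str, agent_name: str):
    """Return the snapshot file and test-case files for one agent.

    Only the file-naming conventions are taken from the docs; the
    assumption that test cases are `.json` files is ours.
    """
    out = Path(output_dir)
    snapshot = out / f"{agent_name}_snapshot_llm.json"
    test_cases = sorted((out / f"{agent_name}_test_cases").glob("*.json"))
    return snapshot, test_cases
```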
Before running `generate`, make sure that each tool:

- is defined with the `@tool` decorator
- has type annotations for its parameters (`str`, `int`, etc.)
- has a type annotation for its return value (`str`, `list`, `dict`, etc.)
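A minimal tool that satisfies those requirements might look like the following. The import path of the real `tool` decorator depends on your framework, so a no-op stand-in is defined here purely to keep the example self-contained; the tool name and return shape are likewise made up.

```python
from typing import Callable

# Stand-in for your framework's @tool decorator; replace this with
# the real import from your agent framework.
def tool(fn: Callable) -> Callable:
    return fn

@tool
def get_vacation_balance(employee_id: str) -> dict:
    """Return the remaining vacation days for an employee.

    What matters for `generate` is the decorator plus complete
    parameter and return annotations; the body is hypothetical.
    """
    # A real tool would query an HR system here.
    return {"employee_id": employee_id, "days_remaining": 12}
```

The fully annotated signature is what lets `generate` derive well-typed tool calls from each story.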