Run Batch Simulations - SigmaMind AI

The Simulate tab provides tools to run and evaluate AI agent behavior against predefined test cases. It is divided into two sub-sections: Test Cases and Batch Runs.

Overview

The LLM Simulation page allows you to:

Define and manage test cases for your AI agents
Trigger batch runs that execute multiple test cases simultaneously
Review results, success/failure status, and detailed remarks for each job

Navigation

Open the page from the Simulate tab in the top navigation bar, next to Build.

Build | Simulate (active)

A secondary tab bar within the page switches between:

Tab	Description
Test Cases	Create and manage individual simulation scenarios
Batch Runs	View and manage grouped execution runs

Batch Runs

Batch Run List (Left Panel)

The left panel displays all batch runs sorted by creation time. Each entry shows:

Batch ID — a unique identifier (e.g. batch_Xb1RnQ8vfbE0Ed8b)
Created At — timestamp of when the batch was created (e.g. 4/18/2026, 4:57:05 AM)

Clicking a batch run loads its details in the right panel.
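
If batch entries are copied out of the panel for offline bookkeeping, the Created At strings can be parsed and ordered as in the minimal sketch below. It assumes the timestamp always follows the month/day/year, 12-hour format of the example above. The dictionary keys, the `parse_created_at` helper, and the second batch entry are hypothetical and exist only for illustration; the source does not describe an export or API.

```python
from datetime import datetime

# Assumed display format, inferred from the example "4/18/2026, 4:57:05 AM".
CREATED_AT_FORMAT = "%m/%d/%Y, %I:%M:%S %p"

# Hand-copied entries. The second one is invented purely to demonstrate sorting.
batches = [
    {"batch_id": "batch_example_older", "created_at": "4/17/2026, 11:30:00 PM"},
    {"batch_id": "batch_Xb1RnQ8vfbE0Ed8b", "created_at": "4/18/2026, 4:57:05 AM"},
]


def parse_created_at(text: str) -> datetime:
    """Convert a Created At string into a naive datetime (no timezone info)."""
    return datetime.strptime(text, CREATED_AT_FORMAT)


# Order the entries by creation time, newest first.
newest_first = sorted(
    batches,
    key=lambda b: parse_created_at(b["created_at"]),
    reverse=True,
)

for batch in newest_first:
    print(batch["batch_id"], parse_created_at(batch["created_at"]).isoformat())
```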

Batch Run Summary (Right Panel — Top)

When a batch run is selected, three summary cards are displayed:

Metric	Description
Total Attempts	Total number of jobs executed in the batch
Successful Attempts	Number of jobs where `is_success = true`
Failed Attempts	Number of jobs where `is_success = false`

Example:

Total Attempts: 30   Successful Attempts: 17   Failed Attempts: 13
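
The relationship between the three cards can be sketched as a simple count over the jobs in a batch. This is an illustration only: the `summarize` function and the dictionary shape are assumptions, with `is_success` being the one field name taken from the table above.

```python
def summarize(jobs):
    """Derive the three summary metrics from a list of job dictionaries."""
    total = len(jobs)
    successful = sum(1 for job in jobs if job["is_success"] is True)
    failed = sum(1 for job in jobs if job["is_success"] is False)
    return {
        "total_attempts": total,
        "successful_attempts": successful,
        "failed_attempts": failed,
    }


# Reproduce the example above: 17 passing jobs and 13 failing jobs.
jobs = [{"is_success": True}] * 17 + [{"is_success": False}] * 13
summary = summarize(jobs)
print(summary)
```

With every job marked either true or false, the successful and failed counts add up to the total, as in the 17 + 13 = 30 example.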

Job Results Table

Below the summary cards, a detailed table lists every job in the selected batch run.

Columns

Column	Type	Description
`Job ID`	string	Unique identifier for the individual job (e.g. `job_h5IDmcZfPRUIGaRv`)
`Test Case ID`	string	The test case the job was run against (e.g. `tc_wlKlRIsGjKV9gFDL`)
`Chat ID`	string	Identifier of the chat session generated during simulation (e.g. `chat-r5ThYqOTUZpzk8Jl`)
`Is Success`	boolean	`true` (green) if the agent met all evaluation criteria; `false` (red) if it did not
`Status`	badge	Current state of the job — typically `Completed`
`Remarks`	text	AI-generated evaluation summary explaining why the job passed or failed
`Created Datetime`	timestamp	When the job was created
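
For readers who track results outside the UI, one row of the table could be modeled as a small record type. The sketch below is hypothetical: the `JobResult` class and its snake_case field names are assumptions that mirror the column names, not a documented schema. The sample values are the example identifiers from the table, and they are not confirmed to belong to the same job.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobResult:
    """One row of the Job Results table (hypothetical shape)."""
    job_id: str                 # e.g. job_h5IDmcZfPRUIGaRv
    test_case_id: str           # e.g. tc_wlKlRIsGjKV9gFDL
    chat_id: str                # e.g. chat-r5ThYqOTUZpzk8Jl
    is_success: bool            # True if all evaluation criteria were met
    status: str                 # e.g. "Completed"
    remarks: str                # AI-generated evaluation summary
    created_datetime: Optional[str] = None  # left unset when not recorded


example = JobResult(
    job_id="job_h5IDmcZfPRUIGaRv",
    test_case_id="tc_wlKlRIsGjKV9gFDL",
    chat_id="chat-r5ThYqOTUZpzk8Jl",
    is_success=False,
    status="Completed",
    remarks="Agent failed to complete the primary goal",
)
print(example)
```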

Success / Failure Badge

The Is Success column renders a color-coded badge:

🟢 true — Agent passed all evaluation criteria
🔴 false — Agent failed to meet one or more criteria

Status Badge

The Status column shows the current execution state. Common values:

Status	Meaning
`Completed`	Job finished execution
`Running`	Job is currently in progress
`Failed`	Job encountered an execution error
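
Status and Is Success answer different questions: Status describes whether the job ran, while Is Success describes whether the agent met the evaluation criteria. One possible way to combine the two into a single label is sketched below. The `classify` function and its labels are hypothetical, and the sketch assumes that Is Success is only meaningful once a job is `Completed`, which the source does not state explicitly.

```python
def classify(status: str, is_success: bool) -> str:
    """Combine the execution state and the evaluation result into one label."""
    if status == "Running":
        return "in progress"        # no evaluation result yet (assumption)
    if status == "Failed":
        return "execution error"    # the job itself did not finish
    if status == "Completed":
        return "passed" if is_success else "did not pass"
    return "unknown status"         # the listed values are only the common ones


print(classify("Completed", True))
print(classify("Completed", False))
print(classify("Failed", False))
```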

Remarks

The Remarks column contains a natural-language summary of agent performance. Examples:

“The agent successfully introduced themselves warmly, clearly explained the overdue balance, offered a payment plan in response to the customer’s financial difficulty, confirmed the 3-installment option, and closed the conversation by…”

“The agent acknowledged the customer’s time constraints and kept the pitch brief, but failed to effectively execute a micro-close. While the agent did mention the credit impact reminder, it was not integrated naturally into the conversation and…”

Remarks are truncated in the table view. Click a row to view the full remark.

Example Batch Run

Below is a sample from a batch run showing mixed results:

Job ID	Test Case ID	Is Success	Remarks (summary)
`job_h5IDmcZfPRUIGaRv`	`tc_wlKlRIsGjKV9gFDL`	❌ false	Agent failed to complete the primary goal — customer requested a callback
`job_nNjj6EgdIyOlqTcb`	`tc_jBuFAVeD0OsSekgW`	✅ true	Successfully introduced, explained balance, confirmed payment plan
`job_hFO6nnPvfw8mGa9c`	`tc_jBuFAVeD0OsSekgW`	✅ true	Successfully handled objection and confirmed 3-installment option
`job_7NknE7cMoPvsppko`	`tc_jBuFAVeD0OsSekgW`	✅ true	Payment plan confirmed, conversation closed appropriately
`job_FMCxaLhVViQj51Kl`	`tc_lDawKCNE05hzUaOv`	✅ true	Remained empathetic, handled objection twice with different approaches
`job_AC59MKlMCJWlLbvS`	`tc_wlKlRIsGjKV9gFDL`	❌ false	Micro-close failed; credit reminder not integrated naturally
`job_KxemQu0GeTC951ao`	`tc_lDawKCNE05hzUaOv`	❌ false	Failed to provide non-payment alternatives per customer’s primary request
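
Because several jobs share a Test Case ID, the sample rows above can be grouped to see how consistently each test case passes. The following is a minimal sketch that uses the seven rows from the table as hand-entered tuples; the `pass_rates` helper is hypothetical and not part of the product.

```python
from collections import defaultdict

# (job_id, test_case_id, is_success) copied from the sample table above.
rows = [
    ("job_h5IDmcZfPRUIGaRv", "tc_wlKlRIsGjKV9gFDL", False),
    ("job_nNjj6EgdIyOlqTcb", "tc_jBuFAVeD0OsSekgW", True),
    ("job_hFO6nnPvfw8mGa9c", "tc_jBuFAVeD0OsSekgW", True),
    ("job_7NknE7cMoPvsppko", "tc_jBuFAVeD0OsSekgW", True),
    ("job_FMCxaLhVViQj51Kl", "tc_lDawKCNE05hzUaOv", True),
    ("job_AC59MKlMCJWlLbvS", "tc_wlKlRIsGjKV9gFDL", False),
    ("job_KxemQu0GeTC951ao", "tc_lDawKCNE05hzUaOv", False),
]


def pass_rates(rows):
    """Return the fraction of successful jobs for each test case ID."""
    counts = defaultdict(lambda: [0, 0])  # test_case_id -> [passed, total]
    for _job_id, test_case_id, is_success in rows:
        counts[test_case_id][1] += 1
        if is_success:
            counts[test_case_id][0] += 1
    return {tc: passed / total for tc, (passed, total) in counts.items()}


rates = pass_rates(rows)
for test_case_id, rate in sorted(rates.items()):
    print(f"{test_case_id}: {rate:.0%}")
```

In this sample, `tc_jBuFAVeD0OsSekgW` passes 3 of 3 jobs, `tc_lDawKCNE05hzUaOv` passes 1 of 2, and `tc_wlKlRIsGjKV9gFDL` passes 0 of 2.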

Tips

Use Batch Runs to evaluate an agent across many test cases at once.
Monitor the Successful / Failed Attempts summary to quickly gauge agent quality.
Read Remarks carefully — they provide specific, actionable feedback about agent behavior.
Multiple jobs can share the same Test Case ID, allowing you to test consistency across runs.
