The Records page provides a unified view of all evaluation records across runs in your project. Use it to search, filter, analyze patterns, and bulk re-score records without navigating through individual runs.
What is a Record?
A Record is an individual test execution within a run. Each record contains:
- Inputs: The data sent to your AI system
- Outputs: The response generated by your system
- Expected (Labels): Ground truth or ideal responses for comparison
- Scores: Evaluation results from each metric
- Status: Whether scoring is pending, completed, or errored
Records are created when you run evaluations via the API, Playground, or from traces.
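Conceptually, a record can be pictured as a structured object along the lines of the sketch below. The field names are illustrative only, based on the fields described above, and do not reflect the exact API schema.

```typescript
// Illustrative shape of an evaluation record (a sketch, not the exact schema)
interface EvaluationRecord {
  id: string;
  runId: string;                           // the evaluation run this record belongs to
  source: "api" | "playground" | "kickoff" | "trace"; // how the record was created
  inputs: Record<string, unknown>;         // data sent to your AI system
  outputs: Record<string, unknown>;        // response generated by your system
  expected?: Record<string, unknown>;      // ground truth / labels, when available
  scores: Array<{
    metric: string;                        // metric name
    status: "pending" | "completed" | "errored";
    passed?: boolean;                      // pass/fail result once scoring completes
    reasoning?: string;                    // explanation attached by the metric
  }>;
  createdAt: string;                       // ISO 8601 timestamp
  createdBy: string;
}
```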
Searching and Filtering
Search trace metadata to find specific records: click the search field and enter any text that appears in your trace metadata. The search scans all metadata fields and returns records from matching traces.
Metadata search uses ClickHouse for high-performance searches across large datasets. Results may take a few seconds to load for very large projects.
Filtering Options
Use the filter dropdown to narrow results by:
- Run: Filter by specific evaluation run
- Source: How the record was created (API, Playground, Kickoff, Trace)
- Status: Scoring status (completed, pending, errored)
- Date range: Records created within a specific time period
Customizing the Table
Click Edit Table to customize which columns appear and their order. You can add, remove, and reorder columns including:
- Base columns: ID, Created By, Created At
- Data fields: Inputs, Outputs, Expected
- Source: How the record was created (API, Playground, Kickoff, Trace)
- Metrics: Score columns for each metric in your project
Your column preferences are saved per project.
History Chart
The interactive histogram shows record distribution over time. Click any bar to filter records to that time period.
Bulk Re-scoring
Select multiple records using the checkboxes, then click Re-score to re-evaluate them with your metrics. This is useful when:
- You’ve updated a metric’s guidelines
- You want to apply new metrics to existing records
- You need to re-evaluate after fixing a configuration issue
Re-scoring evaluates the stored outputs again with the latest version of your metrics; your AI system is not re-run.
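The sketch below illustrates that behavior. It is a conceptual illustration only, not the platform's internal implementation: each selected record's stored output is scored again with each metric's current definition, and the AI system is never called.

```typescript
// Conceptual sketch of bulk re-scoring (illustrative, not the platform's implementation)
type StoredRecord = {
  inputs: unknown;
  outputs: unknown;    // already-generated output; the AI system is not called again
  expected?: unknown;
};

type Metric = {
  name: string;
  // Evaluates a stored record against the metric's *latest* guidelines
  evaluate: (r: StoredRecord) => Promise<{ passed: boolean; reasoning: string }>;
};

async function rescore(records: StoredRecord[], metrics: Metric[]) {
  // Every selected record is scored again with every metric's current version.
  return Promise.all(
    records.map(async (record) => ({
      record,
      scores: await Promise.all(
        metrics.map(async (m) => ({ metric: m.name, ...(await m.evaluate(record)) }))
      ),
    }))
  );
}
```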
Record Details
Click anywhere on a table row to open the full record details, or click the record ID link directly.
Interactive elements like checkboxes, score cards, and popover buttons won't trigger navigation; only clicking an empty area of the row opens the record details.
The details view differs based on how the record was created:
Testcase-Based Records
Records created from testsets show:
- Scores: Pass/fail status, reasoning, and metric properties for each evaluation
- Test Record Details: Input fields, expected outputs, and actual outputs
Trace-Based Records
Records created from production traces show:
- Trace Overview: Duration, estimated cost, total tokens, and span count
- Spans: Individual LLM calls with timing and cost breakdown
- Model Usage: Which models were called and their token counts
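As a rough mental model, the trace details surface information along these lines. The names below are illustrative, derived from the fields listed above, and are not the exact schema.

```typescript
// Illustrative shape of the trace details shown for a trace-based record (a sketch only)
interface TraceDetails {
  durationMs: number;        // total trace duration
  estimatedCostUsd: number;  // estimated cost across all spans
  totalTokens: number;
  spanCount: number;
  spans: Array<{
    name: string;
    model?: string;          // model called, for LLM spans
    durationMs: number;      // timing breakdown per span
    costUsd: number;         // cost breakdown per span
    inputTokens: number;
    outputTokens: number;
  }>;
}
```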
Use Cases
- Cross-run analysis: Find patterns across multiple evaluation runs
- Debugging failures: Filter by metric.status:fail to investigate failing records
- Quality review: Review records from specific time periods or sources
- Metric iteration: Re-score records after updating metric guidelines