23 changes: 21 additions & 2 deletions content/en/llm_observability/experiments/advanced_runs.md
@@ -1,10 +1,29 @@
---
title: Advanced Experiment Runs
description: Run experiments multiple times to account for model variability on a subset of your dataset, and automate experiment execution in CI/CD pipelines.
---

This page discusses advanced topics in running experiments, including [multiple experiment runs](#multiple-runs) and [setting up experiments in CI/CD](#setting-up-your-experiment-in-cicd).

## Run an experiment on a subset of your dataset

First, add tags to your dataset records. These tags can be unique identifiers (for example, `name:test_use_case_1`) or represent properties of the scenario (for example, `difficulty:hard`).

Then, use the `tags` argument of `LLMObs.pull_dataset()` to filter the dataset to the relevant records and run the experiment.

Example:
```python
# Pull only the records tagged env:prod and version:1.0
prod_dataset = LLMObs.pull_dataset(dataset_name="my-dataset", tags=["env:prod", "version:1.0"])

experiment = LLMObs.experiment(
    name="example-experiment",
    dataset=prod_dataset,
    task=topic_relevance,
    evaluators=[exact_match, false_confidence]
)
experiment.run()
```

> **Review comment (P2): Use documented `pull_dataset` parameters in subset example.** The new subset example passes `tags=[...]` to `LLMObs.pull_dataset`, but the Experiments docs currently define dataset records in terms of `metadata` and document `pull_dataset` with `dataset_name` plus optional `project_name`/`version` only (see `content/en/llm_observability/experiments/datasets.md`). In that context, users following this snippet can hit an unexpected-keyword error and fail to run the workflow; please either switch to the supported filter path or add/version-gate the new parameter where the API contract is documented.

## Multiple runs

You can run the same experiment multiple times to account for model non-determinism. Use the [LLM Observability Python SDK][1] or the [Experiments API][2] to specify the number of iterations; each dataset record is then executed that many times with the same task and evaluators.
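The mechanics can be illustrated without the SDK. The sketch below is plain Python with illustrative names only (the real iteration parameter belongs to the SDK and API linked above): it executes a task several times per record and aggregates each evaluator's score, which is the point of multiple runs when the task is non-deterministic.

```python
import statistics

def run_many(task, records, evaluator, n_runs):
    """Run the task n_runs times per record and aggregate the
    evaluator score (mean and spread) for each record."""
    results = {}
    for record in records:
        scores = [
            evaluator(task(record["input"]), record["expected_output"])
            for _ in range(n_runs)
        ]
        results[record["input"]] = {
            "mean": statistics.mean(scores),
            "stdev": statistics.stdev(scores) if n_runs > 1 else 0.0,
        }
    return results

# Deterministic stand-in task and a simple exact-match evaluator for the demo.
def echo_task(text):
    return text.upper()

def exact_match(output, expected):
    return 1.0 if output == expected else 0.0

records = [{"input": "hello", "expected_output": "HELLO"}]
summary = run_many(echo_task, records, exact_match, n_runs=3)
print(summary)  # per-record mean score and standard deviation
```

With a real LLM task, a non-zero `stdev` across runs is exactly the variability that a single run would hide.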
@@ -215,4 +234,4 @@ jobs:

[1]: /llm_observability/instrumentation/sdk?tab=python
[2]: /llm_observability/experiments/api
[3]: https://app.datadoghq.com/llm/experiments