From dafdfb8433715eaafa45a1c2e8b5dc8578432ef7 Mon Sep 17 00:00:00 2001
From: Charles Jacquet
Date: Mon, 16 Mar 2026 18:01:43 -0400
Subject: [PATCH 1/6] Enhance documentation for advanced experiment runs

Updated description to specify running experiments on a subset of the dataset.
Added a new section on running experiments on a subset of the dataset with examples.
---
 .../experiments/advanced_runs.md | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/content/en/llm_observability/experiments/advanced_runs.md b/content/en/llm_observability/experiments/advanced_runs.md
index fe4e91df67e..db8b6db2671 100644
--- a/content/en/llm_observability/experiments/advanced_runs.md
+++ b/content/en/llm_observability/experiments/advanced_runs.md
@@ -1,10 +1,22 @@
 ---
 title: Advanced Experiment Runs
-description: Run experiments multiple times to account for model variability and automate experiment execution in CI/CD pipelines.
+description: Run experiments multiple times to account for model variability on a subset of the dataset, and automate experiment execution in CI/CD pipelines.
 ---
 
 This page discusses advanced topics in running experiments, including [multiple experiment runs](#multiple-runs) and [setting up experiments in CI/CD](#setting-up-your-experiment-in-cicd).
 
+## Run an Experiment on a subset of the Dataset
+
+First, add tags to your dataset records. They can be a unique identifier (e.g `name:test_use_case_1`) or represent a property of the scenario (e.g `difficulty:hard`).
+
+Then, use the `tags` argument of `LLMObs.pull_dataset()` to filter down the dataset to the records you want to run an Experiment on.
+
+Example
+```
+LLMObs.pull_dataset(dataset_name="my-dataset", tags=["env:prod", "version:1.0"])
+```
+Finally, run the Experiment as usual.
+
 ## Multiple runs
 
 You can run the same experiment multiple times to account for model non-determinism. You can use the [LLM Observability Python SDK][1] or [Experiments API][2] to specify how many iterations to run; subsequently, each dataset record is executed that many times using the same tasks and evaluators.
@@ -215,4 +227,4 @@ jobs:
 
 [1]: /llm_observability/instrumentation/sdk?tab=python
 [2]: /llm_observability/experiments/api
-[3]: https://app.datadoghq.com/llm/experiments
\ No newline at end of file
+[3]: https://app.datadoghq.com/llm/experiments
From 8f5f4d5c1480f7fc5f522b7061dd74edbef9f6a4 Mon Sep 17 00:00:00 2001
From: Charles Jacquet
Date: Tue, 17 Mar 2026 12:36:59 -0400
Subject: [PATCH 2/6] Update content/en/llm_observability/experiments/advanced_runs.md

Co-authored-by: Ida Adjivon <65119712+iadjivon@users.noreply.github.com>
---
 content/en/llm_observability/experiments/advanced_runs.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/en/llm_observability/experiments/advanced_runs.md b/content/en/llm_observability/experiments/advanced_runs.md
index db8b6db2671..a2fb844fb4b 100644
--- a/content/en/llm_observability/experiments/advanced_runs.md
+++ b/content/en/llm_observability/experiments/advanced_runs.md
@@ -5,7 +5,7 @@ description: Run experiments multiple times to account for model variability on
 
 This page discusses advanced topics in running experiments, including [multiple experiment runs](#multiple-runs) and [setting up experiments in CI/CD](#setting-up-your-experiment-in-cicd).
 
-## Run an Experiment on a subset of the Dataset
+## Run an Experiment on a subset of the dataset
 
 First, add tags to your dataset records. They can be a unique identifier (e.g `name:test_use_case_1`) or represent a property of the scenario (e.g `difficulty:hard`).
 
From d837ce563178acea686eea1b252f1a599bc31527 Mon Sep 17 00:00:00 2001
From: Charles Jacquet
Date: Tue, 17 Mar 2026 12:37:06 -0400
Subject: [PATCH 3/6] Update content/en/llm_observability/experiments/advanced_runs.md

Co-authored-by: Ida Adjivon <65119712+iadjivon@users.noreply.github.com>
---
 content/en/llm_observability/experiments/advanced_runs.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/en/llm_observability/experiments/advanced_runs.md b/content/en/llm_observability/experiments/advanced_runs.md
index a2fb844fb4b..92a93fe6795 100644
--- a/content/en/llm_observability/experiments/advanced_runs.md
+++ b/content/en/llm_observability/experiments/advanced_runs.md
@@ -7,7 +7,7 @@ This page discusses advanced topics in running experiments, including [multiple
 
 ## Run an Experiment on a subset of the dataset
 
-First, add tags to your dataset records. They can be a unique identifier (e.g `name:test_use_case_1`) or represent a property of the scenario (e.g `difficulty:hard`).
+First, add tags to your dataset records. These tags can be unique identifiers (for example, `name:test_use_case_1`) or represent properties of the scenario (for example, `difficulty:hard`).
 
 Then, use the `tags` argument of `LLMObs.pull_dataset()` to filter down the dataset to the records you want to run an Experiment on.
 
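The tag convention this patch documents (`key:value` strings such as `name:test_use_case_1` or `difficulty:hard`) can be illustrated with a small, local-only sketch. Everything below is hypothetical: `Record` and `filter_by_tags` are illustration helpers, not part of the LLM Observability SDK, and the rule that a record must carry every requested tag is an assumption, since the patch does not say how `LLMObs.pull_dataset()` combines multiple tags.

```python
# Hypothetical, local-only illustration of selecting records by key:value tags.
# These helpers are NOT part of the LLM Observability SDK.
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Record:
    """A stand-in for a dataset record: an input plus its key:value tags."""
    input_data: str
    tags: Set[str] = field(default_factory=set)


def filter_by_tags(records: List[Record], tags: List[str]) -> List[Record]:
    """Return the records that carry all requested tags (assumed AND semantics)."""
    wanted = set(tags)
    return [r for r in records if wanted <= r.tags]


records = [
    Record("q1", {"name:test_use_case_1", "difficulty:hard"}),
    Record("q2", {"name:test_use_case_2", "difficulty:easy"}),
    Record("q3", {"name:test_use_case_3", "difficulty:hard"}),
]

# A property tag selects a group of records.
hard = filter_by_tags(records, ["difficulty:hard"])
# A unique-identifier tag selects a single record.
single = filter_by_tags(records, ["name:test_use_case_1"])
```

Under these assumptions, a property tag picks out a slice of the dataset while a unique-identifier tag pins one scenario, which is the distinction the reworded sentence draws.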
From 632ba7d1d0303faefeec47eee41596c23615aa08 Mon Sep 17 00:00:00 2001
From: Charles Jacquet
Date: Tue, 17 Mar 2026 14:58:52 -0400
Subject: [PATCH 4/6] Apply suggestion from @iadjivon

Co-authored-by: Ida Adjivon <65119712+iadjivon@users.noreply.github.com>
---
 content/en/llm_observability/experiments/advanced_runs.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/en/llm_observability/experiments/advanced_runs.md b/content/en/llm_observability/experiments/advanced_runs.md
index 92a93fe6795..f375823c0b1 100644
--- a/content/en/llm_observability/experiments/advanced_runs.md
+++ b/content/en/llm_observability/experiments/advanced_runs.md
@@ -9,7 +9,7 @@ This page discusses advanced topics in running experiments, including [multiple
 
 First, add tags to your dataset records. These tags can be unique identifiers (for example, `name:test_use_case_1`) or represent properties of the scenario (for example, `difficulty:hard`).
 
-Then, use the `tags` argument of `LLMObs.pull_dataset()` to filter down the dataset to the records you want to run an Experiment on.
+Then, use the `tags` argument of `LLMObs.pull_dataset()` to filter the dataset to the records you want to run an Experiment on.
 
 Example
 ```
From 355a8ac95de59ee63912c639bf45e96545457638 Mon Sep 17 00:00:00 2001
From: Charles Jacquet
Date: Tue, 17 Mar 2026 15:03:23 -0400
Subject: [PATCH 5/6] Update experiment instructions in advanced_runs.md

Clarified instructions for filtering dataset and running experiments.
---
 .../llm_observability/experiments/advanced_runs.md | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/content/en/llm_observability/experiments/advanced_runs.md b/content/en/llm_observability/experiments/advanced_runs.md
index f375823c0b1..06b7e56598b 100644
--- a/content/en/llm_observability/experiments/advanced_runs.md
+++ b/content/en/llm_observability/experiments/advanced_runs.md
@@ -9,13 +9,20 @@ This page discusses advanced topics in running experiments, including [multiple
 
 First, add tags to your dataset records. These tags can be unique identifiers (for example, `name:test_use_case_1`) or represent properties of the scenario (for example, `difficulty:hard`).
 
-Then, use the `tags` argument of `LLMObs.pull_dataset()` to filter the dataset to the records you want to run an Experiment on.
+Then, use the `tags` argument of `LLMObs.pull_dataset()` to filter the dataset to the relevant records and run the Experiment.
 
 Example
 ```
-LLMObs.pull_dataset(dataset_name="my-dataset", tags=["env:prod", "version:1.0"])
+prod_dataset = LLMObs.pull_dataset(dataset_name="my-dataset", tags=["env:prod", "version:1.0"])
+
+experiment = LLMObs.experiment(
+    name="example-experiment",
+    dataset=prod_dataset,
+    task=topic_relevance,
+    evaluators=[exact_match, false_confidence]
+)
+experiment.run()
 ```
-Finally, run the Experiment as usual.
 
 ## Multiple runs
 
From 5af032632f7c94300fd0301fee2687d5fe6cd24b Mon Sep 17 00:00:00 2001
From: Ida Adjivon <65119712+iadjivon@users.noreply.github.com>
Date: Tue, 17 Mar 2026 16:23:36 -0400
Subject: [PATCH 6/6] Apply suggestions from code review

Co-authored-by: Ida Adjivon <65119712+iadjivon@users.noreply.github.com>
---
 content/en/llm_observability/experiments/advanced_runs.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/content/en/llm_observability/experiments/advanced_runs.md b/content/en/llm_observability/experiments/advanced_runs.md
index 06b7e56598b..511586dd26c 100644
--- a/content/en/llm_observability/experiments/advanced_runs.md
+++ b/content/en/llm_observability/experiments/advanced_runs.md
@@ -1,15 +1,15 @@
 ---
 title: Advanced Experiment Runs
-description: Run experiments multiple times to account for model variability on a subset of the dataset, and automate experiment execution in CI/CD pipelines.
+description: Run experiments multiple times to account for model variability on a subset of your dataset, and automate experiment execution in CI/CD pipelines.
 ---
 
 This page discusses advanced topics in running experiments, including [multiple experiment runs](#multiple-runs) and [setting up experiments in CI/CD](#setting-up-your-experiment-in-cicd).
 
-## Run an Experiment on a subset of the dataset
+## Run an experiment on a subset of your dataset
 
 First, add tags to your dataset records. These tags can be unique identifiers (for example, `name:test_use_case_1`) or represent properties of the scenario (for example, `difficulty:hard`).
 
-Then, use the `tags` argument of `LLMObs.pull_dataset()` to filter the dataset to the relevant records and run the Experiment.
+Then, use the `tags` argument of `LLMObs.pull_dataset()` to filter the dataset to the relevant records and run the experiment.
 
 Example
 ```