diff --git a/content/en/llm_observability/experiments/advanced_runs.md b/content/en/llm_observability/experiments/advanced_runs.md
index fe4e91df67e..511586dd26c 100644
--- a/content/en/llm_observability/experiments/advanced_runs.md
+++ b/content/en/llm_observability/experiments/advanced_runs.md
@@ -1,10 +1,33 @@
 ---
 title: Advanced Experiment Runs
-description: Run experiments multiple times to account for model variability and automate experiment execution in CI/CD pipelines.
+description: Run experiments on a subset of your dataset, run them multiple times to account for model variability, and automate experiment execution in CI/CD pipelines.
 ---
 
-This page discusses advanced topics in running experiments, including [multiple experiment runs](#multiple-runs) and [setting up experiments in CI/CD](#setting-up-your-experiment-in-cicd).
+This page discusses advanced topics in running experiments, including [running an experiment on a subset of your dataset](#run-an-experiment-on-a-subset-of-your-dataset), [multiple experiment runs](#multiple-runs), and [setting up experiments in CI/CD](#setting-up-your-experiment-in-cicd).
 
+## Run an experiment on a subset of your dataset
+
+First, add tags to your dataset records. These tags can be unique identifiers (for example, `name:test_use_case_1`) or describe properties of the scenario (for example, `difficulty:hard`).
+
+Then, pass those tags to the `tags` argument of `LLMObs.pull_dataset()` to pull only the matching records, and run the experiment on the filtered dataset.
+
+For example:
+```python
+from ddtrace.llmobs import LLMObs
+
+# Pull only the dataset records that match the given tags
+prod_dataset = LLMObs.pull_dataset(dataset_name="my-dataset", tags=["env:prod", "version:1.0"])
+
+# topic_relevance, exact_match, and false_confidence are the task and evaluator functions defined for your experiment
+experiment = LLMObs.experiment(
+    name="example-experiment",
+    dataset=prod_dataset,
+    task=topic_relevance,
+    evaluators=[exact_match, false_confidence],
+)
+experiment.run()
+```
+
 ## Multiple runs
 
 You can run the same experiment multiple times to account for model non-determinism. You can use the [LLM Observability Python SDK][1] or [Experiments API][2] to specify how many iterations to run; subsequently, each dataset record is executed that many times using the same tasks and evaluators.
@@ -215,4 +238,4 @@ jobs:
 
 [1]: /llm_observability/instrumentation/sdk?tab=python
 [2]: /llm_observability/experiments/api
-[3]: https://app.datadoghq.com/llm/experiments
\ No newline at end of file
+[3]: https://app.datadoghq.com/llm/experiments