diff --git a/develop-docs/application-architecture/dynamic-sampling/architecture.mdx b/develop-docs/application-architecture/dynamic-sampling/architecture.mdx
index 409d331e3a62c..d3b2d809fcd17 100644
--- a/develop-docs/application-architecture/dynamic-sampling/architecture.mdx
+++ b/develop-docs/application-architecture/dynamic-sampling/architecture.mdx
@@ -59,13 +59,13 @@ Dynamic sampling rules must always include a `condition` field, otherwise the en
 
 #### Fetching the Sampling Configuration
 
-The sampling configuration is fetched by Relay from Sentry by sending a request to the `/api/0/relays/projectconfigs/` endpoint periodically (defined [here](https://github.com/getsentry/sentry/blob/master/src/sentry/api/endpoints/relay/project_configs.py#L32-L32)). When this endpoint is called, the Sentry backend will attempt to retrieve the configuration from the cache, and if the configuration is not found, it will be computed and then cached in Redis.
+The sampling configuration is fetched by Relay from Sentry by sending a request to the `/api/0/relays/projectconfigs/` endpoint periodically (defined [here](https://github.com/getsentry/sentry/blob/master/src/sentry/api/endpoints/relay/project_configs.py#L32-L32)). When this endpoint is called, the Sentry backend will attempt to retrieve the configuration from the cache, and if the configuration is not found, it will be computed and then cached in Redis.
 
 ### Sampling Decision
 
 In order to arrive at a sampling decision, Relay matches the incoming event and/or DSC against the configuration, derives a sample rate from the combination of `factor` and `sampleRate` rules, and uses a random number generator to make the decision. In case there are problems during the matching process, Relay will accept the event under the assumption that it's preferable to oversample rather than drop potentially important events.
 
-In order to make the sampling decisions, Relay samples using a [SamplingConfig](https://getsentry.github.io/relay/relay_sampling/config/struct.SamplingConfig.html) that belongs to the project of the head transaction of the trace.
+In order to make the sampling decisions, Relay samples using a [SamplingConfig](https://getsentry.github.io/relay/relay_sampling/config/struct.SamplingConfig.html) that belongs to the project of the head transaction of the trace.
 The payloads inspected for matching vary based on the type of rule being matched
 - `trace`: a trace rule will match against the [Dynamic Sampling Context](https://getsentry.github.io/relay/relay_sampling/dsc/struct.DynamicSamplingContext.html), which remains consistent across all transactions of the trace.
 - `project`: a project rule will also match against the [Dynamic Sampling Context](https://getsentry.github.io/relay/relay_sampling/dsc/struct.DynamicSamplingContext.html)
@@ -127,6 +127,21 @@ In this case, the matching will happen from **top to bottom** and the following
 1. Rule `1` is matched against the DSC, since it is of type `trace`. The `samplingValue` is a `factor` with value `2.0`.
 2. Because rule `1` was a factor rule, the matching continues and rule `2` will again be matched against the DSC, since it is of type `trace`. The `samplingValue` is a `sampleRate`, thus the matching will stop and the sample rate will be computed as `2.0 * 0.5 = 1.0`, where `2.0` is the factor accumulated from the previous rule and `0.5` is the sample rate of the current rule.
 
+### Interpreting the Dynamic Sampling Context
+
+The existence of a dynamic sampling context does not necessarily mean it is valid. Relay differentiates between three cases:
+1. No dynamic sampling context.
+2. A dynamic sampling context originating in a project of the same organization.
+3. A dynamic sampling context originating in a project of a different organization or an unknown project.
+
+If an envelope received by Relay does not contain a dynamic sampling context it is always sampled, unless the payload requires a DSC to always be present.
+
+A dynamic sampling context which originates from either the same project or a project within the same organization is considered valid and Relay will apply the sampling rules from the root project as described in the previous section.
+
+DSCs originating in different organizations or unknown projects are discarded and Relay will re-compute a DSC based on the data of the payload and scoped to the current project. The computed dynamic sampling context is then used to apply the dynamic sampling rules.
+
+![Interpreting the Dynamic Sampling Context](./images/interpreteDsc.png)
+
 ## Rules Generation in Sentry
 
 Sentry is responsible for generating the rules used by Relay to perform sampling.
@@ -136,7 +151,7 @@ Sentry is responsible for generating the rules used by Relay to perform sampling
 The generation of rules is performed as part of the **project configuration recomputation**, which happens:
 
 1. When Relay requests the configuration and it is not cached in Redis.
-2. When the configuration is invalidated on demand by calling [this function](https://github.com/getsentry/sentry/blob/master/src/sentry/tasks/relay.py#L244-L244). This happens when a new release is detected, when certain project settings change, the dynamic sampling tasks for computing sample rates are finished executing, and more.
+2. When the configuration is invalidated on demand by calling [this function](https://github.com/getsentry/sentry/blob/master/src/sentry/tasks/relay.py#L244-L244). This happens when a new release is detected, when certain project settings change, the dynamic sampling tasks for computing sample rates are finished executing, and more.
 
 The rules are generated [here](https://github.com/getsentry/sentry/blob/master/src/sentry/dynamic_sampling/rules/base.py#L126-L143) by performing the following steps:
 
@@ -144,4 +159,4 @@ The rules are generated [here](https://github.com/getsentry/sentry/blob/master/s
 2. Determine the base sample rate for each project.
 3. Compute the rules for each bias.
 
-Data underlying the rules is computed asynchronously for scalability reasons. Multiple biases require data that must be computed from incoming volume data for the org in question. These biases are calculated asynchronously by background tasks that are executed by Celery and write results to Redis.
\ No newline at end of file
+Data underlying the rules is computed asynchronously for scalability reasons. Multiple biases require data that must be computed from incoming volume data for the org in question. These biases are calculated asynchronously by background tasks that are executed by Celery and write results to Redis.
diff --git a/develop-docs/application-architecture/dynamic-sampling/images/interpreteDsc.png b/develop-docs/application-architecture/dynamic-sampling/images/interpreteDsc.png
new file mode 100644
index 0000000000000..cd27cbc565847
Binary files /dev/null and b/develop-docs/application-architecture/dynamic-sampling/images/interpreteDsc.png differ
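
Review note on the new `### Interpreting the Dynamic Sampling Context` section: the three cases it describes, together with the factor and sample-rate accumulation from the unchanged example just before it, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Relay's implementation. The names `Dsc`, `DscAction`, `interpret_dsc`, `derive_sample_rate`, and the `org_by_public_key` lookup are hypothetical. The exception for payloads that require a DSC is not modelled, and returning `None` when no `sampleRate` rule matches is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Mapping, Optional, Sequence, Tuple


class DscAction(Enum):
    SAMPLE_ALWAYS = "sample_always"    # case 1: the envelope carries no DSC
    USE_ROOT_RULES = "use_root_rules"  # case 2: DSC from a project in the same organization
    RECOMPUTE_DSC = "recompute_dsc"    # case 3: DSC from another organization or an unknown project


@dataclass(frozen=True)
class Dsc:
    # Hypothetical, reduced view of a DSC: only the key used to find the root project.
    public_key: str


def interpret_dsc(
    dsc: Optional[Dsc],
    current_org_id: int,
    org_by_public_key: Mapping[str, int],  # hypothetical lookup: project key -> organization id
) -> DscAction:
    """Classify an incoming DSC into one of the three cases listed in the doc."""
    if dsc is None:
        # Not modelled: payloads that require a DSC to always be present.
        return DscAction.SAMPLE_ALWAYS
    root_org_id = org_by_public_key.get(dsc.public_key)
    if root_org_id is not None and root_org_id == current_org_id:
        return DscAction.USE_ROOT_RULES
    # Unknown project or different organization: discard the DSC and recompute it.
    return DscAction.RECOMPUTE_DSC


def derive_sample_rate(matched_rules: Sequence[Tuple[str, float]]) -> Optional[float]:
    """Walk matched rules top to bottom, multiplying factors until a sampleRate rule stops the walk."""
    factor = 1.0
    for kind, value in matched_rules:
        if kind == "factor":
            factor *= value
        elif kind == "sampleRate":
            return factor * value
    return None  # assumption: no sample rate can be derived without a sampleRate rule
```

With the values from the doc's example, `derive_sample_rate([("factor", 2.0), ("sampleRate", 0.5)])` reproduces the `2.0 * 0.5 = 1.0` computation.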