@@ -59,13 +59,13 @@ Dynamic sampling rules must always include a `condition` field, otherwise the en

#### Fetching the Sampling Configuration

The sampling configuration is fetched by Relay from Sentry by sending a request to the `/api/0/relays/projectconfigs/` endpoint periodically (defined [here](https://github.com/getsentry/sentry/blob/master/src/sentry/api/endpoints/relay/project_configs.py#L32-L32)). When this endpoint is called, the Sentry backend will attempt to retrieve the configuration from the cache, and if the configuration is not found, it will be computed and then cached in Redis.
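
The cache-then-compute behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: a plain dictionary stands in for Redis, and `get_project_config` and `compute_config` are hypothetical names, not functions from the Sentry codebase.

```python
from typing import Callable, Dict


def get_project_config(
    cache: Dict[str, dict],
    project_key: str,
    compute_config: Callable[[str], dict],
) -> dict:
    """Return the cached config, computing and caching it on a miss.

    `cache` is a dict standing in for Redis (assumption for illustration).
    """
    config = cache.get(project_key)
    if config is None:
        # Cache miss: compute the configuration and store it for later requests.
        config = compute_config(project_key)
        cache[project_key] = config
    return config
```

With this shape, repeated requests for the same project only trigger one computation until the cached entry is invalidated.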

### Sampling Decision

In order to arrive at a sampling decision, Relay matches the incoming event and/or DSC against the configuration, derives a sample rate from the combination of `factor` and `sampleRate` rules, and uses a random number generator to make the decision. In case there are problems during the matching process, Relay will accept the event under the assumption that it's preferable to oversample rather than drop potentially important events.
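
As a rough sketch of the decision step described above, assuming the sample rate has already been derived by some matching routine: `sampling_decision` and `derive_rate` are hypothetical names used for illustration, and Relay itself is written in Rust, so this is not its actual implementation.

```python
import random
from typing import Callable, Optional


def sampling_decision(
    derive_rate: Callable[[], float],
    rng: Optional[random.Random] = None,
) -> bool:
    """Return True if the event should be kept.

    `derive_rate` is a hypothetical callable that performs the matching and
    returns the derived sample rate, raising if matching fails.
    """
    rng = rng or random.Random()
    try:
        rate = derive_rate()
    except Exception:
        # Problems during matching: accept the event, since oversampling is
        # preferable to dropping potentially important events.
        return True
    # Keep the event with probability `rate`.
    return rng.random() < rate
```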

In order to make the sampling decisions, Relay samples using a [SamplingConfig](https://getsentry.github.io/relay/relay_sampling/config/struct.SamplingConfig.html) that belongs to the project of the head transaction of the trace.
The payloads inspected for matching vary based on the type of rule being matched:
- `trace`: a trace rule will match against the [Dynamic Sampling Context](https://getsentry.github.io/relay/relay_sampling/dsc/struct.DynamicSamplingContext.html), which remains consistent across all transactions of the trace.
- `project`: a project rule will also match against the [Dynamic Sampling Context](https://getsentry.github.io/relay/relay_sampling/dsc/struct.DynamicSamplingContext.html)
@@ -127,6 +127,21 @@ In this case, the matching will happen from **top to bottom** and the following
1. Rule `1` is matched against the DSC, since it is of type `trace`. The `samplingValue` is a `factor` with value `2.0`.
2. Because rule `1` was a factor rule, the matching continues and rule `2` will again be matched against the DSC, since it is of type `trace`. The `samplingValue` is a `sampleRate`, thus the matching will stop and the sample rate will be computed as `2.0 * 0.5 = 1.0`, where `2.0` is the factor accumulated from the previous rule and `0.5` is the sample rate of the current rule.
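
The two steps above can be reproduced with a small sketch. The rule dictionaries and the `matches` callables are simplified stand-ins invented for this example, not Relay's real rule format, and returning `None` when no `sampleRate` rule matches is an assumption.

```python
from typing import Any, Dict, List, Optional

Rule = Dict[str, Any]


def derive_sample_rate(rules: List[Rule], payload: dict) -> Optional[float]:
    """Walk the rules top to bottom and return the derived sample rate."""
    factor = 1.0
    for rule in rules:
        if not rule["matches"](payload):
            continue
        value = rule["samplingValue"]
        if value["type"] == "factor":
            # Factor rules accumulate and matching continues.
            factor *= value["value"]
        elif value["type"] == "sampleRate":
            # A sample rate rule stops the matching.
            return factor * value["value"]
    # Assumption: no sample rate rule matched, so no rate is derived.
    return None


rules = [
    {"id": 1, "matches": lambda dsc: True,
     "samplingValue": {"type": "factor", "value": 2.0}},
    {"id": 2, "matches": lambda dsc: True,
     "samplingValue": {"type": "sampleRate", "value": 0.5}},
]
rate = derive_sample_rate(rules, {})  # 2.0 * 0.5 = 1.0
```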

### Interpreting the Dynamic Sampling Context

The existence of a dynamic sampling context does not necessarily mean it is valid. Relay differentiates between three cases:
1. No dynamic sampling context.
2. A dynamic sampling context originating in a project of the same organization.
3. A dynamic sampling context originating in a project of a different organization or an unknown project.

If an envelope received by Relay does not contain a dynamic sampling context it is always sampled, unless the payload requires a DSC to always be present.

A dynamic sampling context which originates from either the same project or a project within the same organization is considered valid and Relay will apply the sampling rules from the root project as described in the previous section.

DSCs originating in different organizations or unknown projects are discarded and Relay will re-compute a DSC based on the data of the payload and scoped to the current project. The computed dynamic sampling context is then used to apply the dynamic sampling rules.
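
The three cases can be summarised in a short sketch. All names here (`DscHandling`, `interpret_dsc`, `project_to_org`) are hypothetical and exist only for this illustration; the exception for payloads that require a DSC is deliberately not modelled.

```python
from enum import Enum
from typing import Dict, Optional


class DscHandling(Enum):
    ALWAYS_SAMPLE = "always_sample"
    USE_ROOT_PROJECT_RULES = "use_root_project_rules"
    RECOMPUTE_FOR_CURRENT_PROJECT = "recompute_for_current_project"


def interpret_dsc(
    dsc_project_id: Optional[int],
    current_org_id: int,
    project_to_org: Dict[int, int],
) -> DscHandling:
    """Classify a DSC by where it originates (hypothetical helper)."""
    if dsc_project_id is None:
        # Case 1: no DSC on the envelope.
        return DscHandling.ALWAYS_SAMPLE
    origin_org = project_to_org.get(dsc_project_id)
    if origin_org == current_org_id:
        # Case 2: same project or another project of the same organization.
        return DscHandling.USE_ROOT_PROJECT_RULES
    # Case 3: different organization or unknown project; the DSC is discarded
    # and recomputed from the payload, scoped to the current project.
    return DscHandling.RECOMPUTE_FOR_CURRENT_PROJECT
```
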
Comment on lines +133 to +141
Member

I think this would be easier to parse if we inline the descriptions into the list, and make the headers bold.

Suggested change
1. **No dynamic sampling context**: If an envelope received by Relay does not contain a dynamic sampling context it is always sampled, unless the payload requires a DSC to always be present.
2. **A dynamic sampling context originating in a project of the same organization**: A dynamic sampling context which originates from either the same project or a project within the same organization is considered valid and Relay will apply the sampling rules from the root project as described in the previous section.
3. **A dynamic sampling context originating in a project of a different organization or an unknown project**: DSCs originating in different organizations or unknown projects are discarded and Relay will re-compute a DSC based on the data of the payload and scoped to the current project. The computed dynamic sampling context is then used to apply the dynamic sampling rules.


![Interpreting the Dynamic Sampling Context](./images/interpreteDsc.png)

## Rules Generation in Sentry

Sentry is responsible for generating the rules used by Relay to perform sampling.
@@ -136,12 +151,12 @@ Sentry is responsible for generating the rules used by Relay to perform sampling
The generation of rules is performed as part of the **project configuration recomputation**, which happens:

1. When Relay requests the configuration and it is not cached in Redis.
2. When the configuration is invalidated on demand by calling [this function](https://github.com/getsentry/sentry/blob/master/src/sentry/tasks/relay.py#L244-L244). This happens when a new release is detected, when certain project settings change, when the dynamic sampling tasks for computing sample rates finish executing, and more.

The rules are generated [here](https://github.com/getsentry/sentry/blob/master/src/sentry/dynamic_sampling/rules/base.py#L126-L143) by performing the following steps:

1. Fetch the list of active biases (since some of them can be enabled or disabled by the user in the Sentry UI)
2. Determine the base sample rate for each project.
3. Compute the rules for each bias.
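
A minimal sketch of these three steps, assuming each bias can be modelled as a function from the base sample rate to a list of rules. The names `generate_rules`, `all_biases`, and `enabled` are hypothetical and do not mirror the code linked above.

```python
from typing import Any, Callable, Dict, List

Rule = Dict[str, Any]
Bias = Callable[[float], List[Rule]]


def generate_rules(
    all_biases: Dict[str, Bias],
    enabled: Dict[str, bool],
    base_sample_rate: float,
) -> List[Rule]:
    """Combine the rules of all active biases (illustrative only)."""
    # Step 1: keep only the active biases. Treating biases that are missing
    # from `enabled` as active is an assumption of this sketch.
    active = [bias for name, bias in all_biases.items() if enabled.get(name, True)]
    # Step 2: the base sample rate is taken as an input here.
    # Step 3: compute the rules for each active bias.
    rules: List[Rule] = []
    for bias in active:
        rules.extend(bias(base_sample_rate))
    return rules
```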

Data underlying the rules is computed asynchronously for scalability reasons. Multiple biases require data that must be computed from incoming volume data for the org in question. These biases are calculated asynchronously by background tasks that are executed by Celery and write results to Redis.
Member

I'm not sure what "Requires DSC" means in this case. Can we be more specific about what the condition is here?

Member Author

Payload requires DSC, is that better? 🤔
