Skip to content

xds: Implementation of Unified Matcher and CEL Integration#12640

Open
shivaspeaks wants to merge 20 commits intogrpc:masterfrom
shivaspeaks:xds-unified-matcher-and-cel
Open

xds: Implementation of Unified Matcher and CEL Integration#12640
shivaspeaks wants to merge 20 commits intogrpc:masterfrom
shivaspeaks:xds-unified-matcher-and-cel

Conversation

@shivaspeaks
Copy link
Member

Implements gRFC A106: xDS Unified Matcher and CEL Integration
grpc/proposal#520

@shivaspeaks shivaspeaks force-pushed the xds-unified-matcher-and-cel branch from 082243d to f472c63 Compare February 3, 2026 10:11
@shivaspeaks shivaspeaks marked this pull request as ready for review February 8, 2026 09:01
Copy link

@l46kok l46kok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the drive-by -- I happened to notice this while putting together an unrelated PR in CEL-Java. I'll hold off on submitting it on our end.

# checkstyle 10.0+ requires Java 11+
# See https://checkstyle.sourceforge.io/releasenotes_old_8-35_10-26.html#Release_10.0
# checkForUpdates: checkstylejava8:9.+
cel-runtime = "dev.cel:runtime:0.11.1"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you plan on using both the compilation and runtime packages, you could consider just taking a dependency on dev.cel:cel instead, which includes both. It will likely make your dependency management much easier.

On a separate note -- let me know if you'd like us to cut a new release.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We purposefully don't want a dependency on the cel parser. Also, it seems dev.cel:cel is busted, in that it re-defines dev.cel.runtime instead of depending on the other artifact. That will cause hard-to-debug duplicate classes in the class path.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For artifact issue -- it's tracked, we can look into prioritizing it: (b/328302799).

The PR here is referencing CelCompiler, so I think you'll need the compiler dependency (unless if it shouldn't have been included here).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is pretty serious, as it prevents dependency management from working properly and it is not something regular people are able to debug. It's the sort of thing where everything looks fine, but then you make an unrelated change and somehow suddenly breaks CEL. So the concern is in support costs.

Really, I'm wary about merging this with the broken cel jars; it's best not to have them in our dependency tree. Even with the code disabled they can still harm users. Maybe it is best to make them compileOnly until things are fixed, which also means parts of testing and feature stabilization would be delayed until it is fixed.

The PR is only using CelCompiler in tests. It will need to get reorganized to avoid the dependency for production.

@shivaspeaks, you may want to glance at go/grpc-cel-integration. It was supposed to have been superseded by the gRFC, but I'm seeing the gRFC doesn't make a point of calling out some of the goals (like not depending on the CEL compiler). It's longer than you probably want to read in full, but you may just skim it to find the "interesting" parts.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into this next week. I was actually under the impression this wasn't a hard blocker. Thanks for surfacing it up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an observation @ejona86! I was trying what the doc said:
"
Production code: no need to depend on the full dev.cel:cel package.

Testing code: it's acceptable to use unstable CEL methods and depend on dev.cel:cel / dev.cel:compiler in testing code.
"

When ChannelAndServerBuilderTest runs, it uses Guava's ClassPath to scan all io.grpc.* classes. When the JVM Verifier loads CelMatcher or PredicateEvaluator, it traverses the method signatures and fields, eventually hitting CodePointStream. Because CodePointStream implements CharStream, the JVM tries to load CharStream. Since dev.cel:runtime didn't bring ANTLR transitively, and I removed dev.cel:compiler (which does bring ANTLR) from implementation, the JVM throws NoClassDefFoundError. I am being forced to add compileOnly(libraries.cel.compiler) or else we can add implementation 'org.antlr:antlr4-runtime'

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've merged in the fix for this: google/cel-java#954. Runtime pulling in the antlr dependency is unintended, which is also fixed with this PR. I'll cut a new release sometime next week.

In the interim, I've pushed a snapshot version if you'd like to locally verify the fix (and let me know if you encounter any issues): https://central.sonatype.com/repository/maven-snapshots/dev/cel/runtime/0.12.0-SNAPSHOT/runtime-0.12.0-SNAPSHOT.pom

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes its working with the snapshot version.

I assume there is no periodic release schedule for cel-java, like every 6 weeks for grpc (https://github.com/grpc/grpc-java/milestones or like go/grpc-oss-release-dates)? Is it done as and when required a new release for cel?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the most part, we cut a release whenever as needed. At a minimum, we'll cut one once per quarter.

Comment on lines 33 to 35
.enableStringConversion(false)
.enableStringConcatenation(false)
.enableListConcatenation(false)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider subsetting the environment via CelStandardDeclarations (compiler) and CelStandardFunctions (runtime) instead. Examples:

This is the preferred mechanism, as it gives you more custom control over what functions/overloads are exposed. It's the direction cel-go went too, so this is likely desirable from a consistency standpoint as well.

* Compiles the CEL expression string into a CelMatcher.
*/
@VisibleForTesting
public static CelMatcher compile(String expression)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to the testing code so we don't need a dependency on the Cel compiler. It looks unused for production.

# checkstyle 10.0+ requires Java 11+
# See https://checkstyle.sourceforge.io/releasenotes_old_8-35_10-26.html#Release_10.0
# checkForUpdates: checkstylejava8:9.+
cel-runtime = "dev.cel:runtime:0.11.1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is pretty serious, as it prevents dependency management from working properly and it is not something regular people are able to debug. It's the sort of thing where everything looks fine, but then you make an unrelated change and somehow suddenly breaks CEL. So the concern is in support costs.

Really, I'm wary about merging this with the broken cel jars; it's best not to have them in our dependency tree. Even with the code disabled they can still harm users. Maybe it is best to make them compileOnly until things are fixed, which also means parts of testing and feature stabilization would be delayed until it is fixed.

The PR is only using CelCompiler in tests. It will need to get reorganized to avoid the dependency for production.

@shivaspeaks, you may want to glance at go/grpc-cel-integration. It was supposed to have been superseded by the gRFC, but I'm seeing the gRFC doesn't make a point of calling out some of the goals (like not depending on the CEL compiler). It's longer than you probably want to read in full, but you may just skim it to find the "interesting" parts.

Comment on lines 57 to 58
String id = over.toString();
return !id.equals("ADD_STRING") && !id.equals("ADD_LIST");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to compare directly against the overload enum instead of the stringified form (ex: over != AddOverload.ADD_STRING).

copybara-service bot pushed a commit to google/cel-java that referenced this pull request Feb 18, 2026
copybara-service bot pushed a commit to google/cel-java that referenced this pull request Feb 18, 2026
copybara-service bot pushed a commit to google/cel-java that referenced this pull request Feb 18, 2026
copybara-service bot pushed a commit to google/cel-java that referenced this pull request Feb 18, 2026
Also removes antlr dependency from `dev.cel:runtime` artifact

Fixes: #566
See also grpc/grpc-java#12640
PiperOrigin-RevId: 871541021
copybara-service bot pushed a commit to google/cel-java that referenced this pull request Feb 18, 2026
Also removes antlr dependency from `dev.cel:runtime` artifact

Fixes: #566
See also grpc/grpc-java#12640
PiperOrigin-RevId: 871541021
copybara-service bot pushed a commit to google/cel-java that referenced this pull request Feb 18, 2026
Also removes antlr dependency from `dev.cel:runtime` artifact

Fixes: #566
See also grpc/grpc-java#12640
PiperOrigin-RevId: 871541021
copybara-service bot pushed a commit to google/cel-java that referenced this pull request Feb 18, 2026
Also removes antlr dependency from `dev.cel:runtime` artifact

Fixes: #566
See also grpc/grpc-java#12640
PiperOrigin-RevId: 871541021
copybara-service bot pushed a commit to google/cel-java that referenced this pull request Feb 18, 2026
Also removes antlr dependency from `dev.cel:runtime` artifact

Fixes: #566
See also grpc/grpc-java#12640
PiperOrigin-RevId: 871541021
copybara-service bot pushed a commit to google/cel-java that referenced this pull request Feb 18, 2026
Also removes antlr dependency from `dev.cel:runtime` artifact

Fixes: #566
See also grpc/grpc-java#12640
PiperOrigin-RevId: 871541021
copybara-service bot pushed a commit to google/cel-java that referenced this pull request Feb 18, 2026
Also removes antlr dependency from `dev.cel:runtime` artifact

Fixes: #566
See also grpc/grpc-java#12640
PiperOrigin-RevId: 871541021
copybara-service bot pushed a commit to google/cel-java that referenced this pull request Feb 18, 2026
Also removes antlr dependency from `dev.cel:runtime` artifact

Fixes: #566
See also grpc/grpc-java#12640
PiperOrigin-RevId: 871541021
copybara-service bot pushed a commit to google/cel-java that referenced this pull request Feb 18, 2026
Also removes antlr dependency from `dev.cel:runtime` artifact

Fixes: #566
See also grpc/grpc-java#12640
PiperOrigin-RevId: 871541021
copybara-service bot pushed a commit to google/cel-java that referenced this pull request Feb 18, 2026
Also removes antlr dependency from `dev.cel:runtime` artifact

Fixes: #566
See also grpc/grpc-java#12640
PiperOrigin-RevId: 872010923
@ejona86 ejona86 requested a review from sergiitk February 23, 2026 18:02
String id = over.toString();
return !id.equals("ADD_STRING") && !id.equals("ADD_LIST");
return !over.equals(
(Object) dev.cel.runtime.standard.AddOperator.AddOverload.ADD_STRING)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we needing to cast to Object? If you're working around ErrorProne, then something seems broken and casting to Object is just hiding it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it was an errorProne workaround.
over is statically known as a StandardOverload interface, but ADD_STRING is an AddOverload enum. ErrorProne threw an [EqualsIncompatibleType] build error so we need to cast the ADD_STRING enum specifically to (Object) before passing it into equals.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ADD_STRING is a CelStandardOverload, which isn't a StandardOverload. It seems ErrorProne is correct; these will never be equal.

You should comment cases like this. If this were a bug in ErrorProne we'd probably want to file a bug with ErrorProne, but even if not, the comment helps us know when it is no longer necessary. (We don't have to guess why it was done, so we can confirm if that is no longer the case.)

It looks like that has been adjusted on cel-java master, as the function is now passed CelStandardOverload.

@sergiitk, the interface change looks like a breaking API change. That's surprising since I thought we were using stable APIs. Is this PR using an API that it shouldn't be, was this a known change that just hadn't been released yet, or should we be concerned about the actual stability of the API?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a relatively new API that didn't exist when API stabilization doc was made (go/cel-api-stabilization). It's stable now (I'll update the doc when I get a chance). Reason why I suggested this API is that it follows the recommended pattern for how environment subsetting is supposed to be done across all CEL stacks.

As for EP -- I'm surprised it's actually flagging this (internally, it doesn't). AddOperator enum implements CelStandardOverload, so they should equal. My only guess is that CelStandardOverload is a functional interface, so theoretically it's possible to provide a lambda that's not equals comparable. Though we control the standard overload implementations, so in practice this won't happen. Checking for reference identity should likely get around the issue over == AddOverload.ADD_STRING

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, this is the new setup flow. I didn't look where this FUNCTIONS constant was used. That's fine; I'm less surprised that changed. (And yes, it's good to add it to the doc.)

I'm surprised it's actually flagging this (internally, it doesn't)

I don't think it'd get flagged on master either. It only looked to be a problem because we're still using 0.11.1; in 0.11.1 I agree with ErrorProne that the equals() will always return false.

import javax.annotation.Nullable;

/**
* Interface for extracting values from a match context (e.g. HTTP headers).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This says explicitly "from a match context", and the only two implementations use MatchContext, so why is T present? Shouldn't this just have an argument that is MatchContext?

return result.actions;
}

public interface MatchContext {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing implements this. Is this still TODO? There's no mention this is missing in the PR description.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grfc says:

Matcher Context: The input to the Matcher evaluation tree. Provided at runtime by the filter, contains the input data.

MatchContext is the representation of the "Matcher Context" described in gRFC. It is currently an empty shell (no implementations in this PR) because it is designed to be implemented by the specific xDS filters (e.g., RLQS or Composite Filter) that will be the primary consumers of this matching logic, as noted in the gRFC's Implementation section

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation section saying "as part of" is saying "it will be part of that effort." We aren't saying CEL will be part of RLQS and the Composite Filter will call the RLQS code if RLQS was implemented first.

I think the filter will construct the Matcher Context and provide its contents, but I don't think each filter will have its own implementation, or at least an implementation from scratch. Headers will be Metadata, Method is always POST, Path will be pulled from MethodDescriptor, etc. I could understand if there was a server-side vs client-side implementation, but even that doesn't seem likely; worst-case looks to be different factory methods as convenience.

/**
* Represents a compiled xDS Matcher.
*/
public abstract class UnifiedMatcher {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we have enough information to satisfy:

If set, the return type of the input must match the input type
of the matcher.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now. UnifiedMatcher has input+matcher combined together, so there's no need to check types when using it. But I think I got confused because there is not Matcher API. There should be a matcher API; in no way do we want to hard-code each matcher type.

final class OnMatch {
@Nullable private final UnifiedMatcher nestedMatcher;
@Nullable private final TypedExtensionConfig action;
final boolean keepMatching;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, it seems like keepMatching is incomplete (beyond this one instance). But that's not mentioned in the PR description. What is the current status of keepMatching?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In MatcherList: keep_matching is used to continue the loop and accumulate actions.

In MatcherTree (Prefix): It is also implemented for traversing the trie/list of prefixes. Check line 133 in MatcherTree (if (!onMatch.keepMatching))

In MatcherTree (Exact): If a key matches, it returns immediately without checking if it should "keep matching" because thats the only logical way it seems. Should it fall back to on_no_match?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keepMatching here is literally not used. Clearly that is wrong. Note that I'm most bothered that there's no TODO or mention in the PR description. There's nothing that calls out "this didn't make sense" or "this was not possible" for us to work through.

It seems you're missing the fact keep_matching means multiple completely different matchers can contribute to the result. You need to accumulate keep_matching results across matchers.

This is probably more clear after reading my comments at https://github.com/grpc/proposal/pull/520/changes#r2683391713 . Basically, the most important thing to understand about keep_matching is that it is a semantically different result. You shouldn't semantically mix the results between a terminal result with those of keep_matching; they aren't interchangeable (if C++ wants to and have an encoding for them to determine which results are what, they are free to, but I don't want the security vulnerability risk). We should either use a callback or a separate list in the result (MatchResult {TypedExecutionConfig action; List<TypedExecutionConfig> keepMatchingActions; }, where action == null means matched == false, or use Optional<TypedExecutionConfig> to track it)

try {
com.github.xds.type.matcher.v3.CelMatcher celProto = customConfig.getTypedConfig()
.unpack(com.github.xds.type.matcher.v3.CelMatcher.class);
if (celProto.hasExprMatch()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invert this condition so that we can more easily see what exception is matched with with if and also can get rid of the else. Ditto elsewhere in this class.

@Test
public void celMatcher_match() {
// Construct CelMatcher with checked expression
CelMatcher celMatcher = createCelMatcher("request.path == '/good'");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no reasons for the CEL-specific tests to be bunched with the general matcher logic. We should separate them out.

import org.junit.runners.JUnit4;

@RunWith(JUnit4.class)
public class UnifiedMatcherTest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to separate (or at least make a clear distinction between) the tests for:

  • the proto validation logic (construction the matcher object from protos for the first time)
  • the matching logic (when a concrete instance of the matcher object is applied to different inputs.

Comment on lines 52 to 53
@Nullable private final Matchers.StringMatcher stringMatcher;
@Nullable private final CelMatcher celMatcher;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class design is unsustainable. We'll be adding support for different matcher extensions, not just these two. Do we expect hardcoding a separate nullable field for each extension we add?


if (proto.hasValueMatch()) {
if (proto.getInput().getTypedConfig().getTypeUrl()
.equals(TYPE_URL_HTTP_ATTRIBUTES_CEL_INPUT)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no reason for any CEL logic to be tightly coupled with SinglePredicateEvaluator, or any other specific input/matcher extensions.

Instead, we should introduce two registries: one for inputs, one for matchers. Then we need to generalize logic, like so:

  1. Check if the input contains a type in the input registry. If so, get corresponding class
  2. Check if the matcher (either from value_match or custom_match) contains a type in the matcher registry. If so, get corresponding class
  3. Compare that the output type of the input is the same as the input type of the matcher.
  4. If yes, store the input object and the matcher object as Single Predicate object fields. These object fields types must not be concrete (like CelMatcher celMatcher right now), but interfaces/abstract classes.

* Returns the type of value extracted by this input.
*/
default Class<?> outputType() {
return Object.class;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This disables the check by default and it seems extremely doubtful any implementation would legitimately use Object. Delete the default implementation?

}

if (!input.outputType().isAssignableFrom(matcher.inputType())
&& !matcher.inputType().isAssignableFrom(input.outputType())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only way for the input to be assignable from the output and the output to be assignable from the input is for it to be the same class. This is just a verbose equals() check.

And just using equals() sounds perfect. I don't think we need any complex object hierarchy stuff going on. We just need to make sure they match. Equals() is what I'd hope we use, as it is unlikely to be broken.

this.matcher = new Matcher() {
@Override
public boolean match(Object value) {
if (value instanceof String) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the input isn't compatible, this should throw. That is a programmer error, as the class said which type it can handle and that wasn't provided. That lets us find bugs. Note that silently ignoring a failure like that can be a security vulnerability. We need to be just as precise with match and non-match; we can't favor one way or the other, as we have no knowledge of what this matching implies.

}

@Override
public Object apply(MatchContext context) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can have this method return String even when the interface returns Object. That would help guarantee we don't accidentally return any other type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants