feat: Support members parameter in @dy.filter() to apply collection filters to specific members only #284

Draft
Copilot wants to merge 2 commits into main from copilot/add-filtering-specific-schemas

Conversation

Copilot AI commented Feb 23, 2026

Collection-level filters currently apply to all non-ignored members via the collection's common primary key. This makes it impossible to filter one member based on data from another (e.g., filter diagnoses using admission dates from invoices) without affecting all members.
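The limitation can be illustrated without dataframely at all. The following is a minimal sketch in plain Python, where lists of dicts stand in for the two members and the filter logic (keep only invoices whose diagnoses are all dated on or after admission) is an assumption made for illustration. Because the result can only be expressed in terms of the common primary key `invoice_id`, a single bad diagnosis removes the whole invoice and every diagnosis attached to it.

```python
# Toy rows standing in for the two members; ISO date strings compare correctly as text.
invoices = [
    {"invoice_id": "A", "admission_date": "2025-01-10"},
    {"invoice_id": "B", "admission_date": "2025-02-01"},
]
diagnoses = [
    {"invoice_id": "A", "diagnosis_code": "X10", "diagnosis_date": "2025-01-12"},
    {"invoice_id": "A", "diagnosis_code": "Y20", "diagnosis_date": "2025-01-05"},  # before admission
    {"invoice_id": "B", "diagnosis_code": "Z30", "diagnosis_date": "2025-02-03"},
]
admission = {row["invoice_id"]: row["admission_date"] for row in invoices}

# Assumed filter logic: an invoice_id passes only if none of its diagnoses
# predates the admission date.
bad_ids = {
    d["invoice_id"]
    for d in diagnoses
    if d["diagnosis_date"] < admission[d["invoice_id"]]
}
passing_ids = set(admission) - bad_ids

# The filter result is keyed on invoice_id only, so it is applied to every member.
kept_invoices = [r for r in invoices if r["invoice_id"] in passing_ids]
kept_diagnoses = [r for r in diagnoses if r["invoice_id"] in passing_ids]

print([r["invoice_id"] for r in kept_invoices])       # ['B']
print([r["diagnosis_code"] for r in kept_diagnoses])  # ['Z30']
```

Note that the valid diagnosis `X10` is lost together with invoice `A`, which is the behavior the `members` parameter is meant to avoid.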

Changes

  • @dy.filter(members=[...]): New optional parameter restricts a filter to specific members. The join key used is the common primary key of the specified members (enabling row-level filtering when a member has a superset primary key).
  • Validation (CollectionMeta): Validates that specified members exist, are not ignored_in_filters, and share a common primary key.
  • filter() / is_valid() / validate(): Each method now evaluates per-filter applicability per member; non-applicable filters are skipped entirely (not reflected in failure info).
  • matches(): Compares members as sets in addition to filter logic.
  • Serialization: Filter entries now serialize as {"logic": ..., "members": ...}; old LazyFrame-only format is still deserializable for backward compatibility.
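The validation and serialization rules listed above can be sketched in plain Python. This is only an illustration under stated assumptions: `common_primary_key` and `normalize_filter_entry` are hypothetical helper names, not functions from dataframely, and the actual logic in `CollectionMeta` may differ.

```python
from typing import Any


def common_primary_key(
    primary_keys: dict[str, list[str]],
    members: list[str],
    ignored: frozenset[str] = frozenset(),
) -> list[str]:
    """Hypothetical helper: validate `members` and return their shared key columns."""
    if not members:
        raise ValueError("at least one member must be specified")
    unknown = [m for m in members if m not in primary_keys]
    if unknown:
        raise ValueError(f"unknown members: {unknown}")
    excluded = [m for m in members if m in ignored]
    if excluded:
        raise ValueError(f"members are ignored in filters: {excluded}")
    first, *rest = members
    # Keep the column order of the first member; require presence in all others.
    common = [c for c in primary_keys[first] if all(c in primary_keys[m] for m in rest)]
    if not common:
        raise ValueError("specified members share no common primary key")
    return common


def normalize_filter_entry(entry: Any) -> dict[str, Any]:
    """Hypothetical helper: accept both the new dict format and the old bare format."""
    if isinstance(entry, dict) and "logic" in entry:
        members = entry.get("members")
        # Sorting makes a later set-like comparison of members order-independent.
        return {"logic": entry["logic"], "members": None if members is None else sorted(members)}
    # Old format: the entry itself is the serialized filter logic, applied to all members.
    return {"logic": entry, "members": None}


pks = {"invoices": ["invoice_id"], "diagnoses": ["invoice_id", "diagnosis_code"]}
print(common_primary_key(pks, ["diagnoses"]))              # ['invoice_id', 'diagnosis_code']
print(common_primary_key(pks, ["invoices", "diagnoses"]))  # ['invoice_id']
```

Restricting a filter to `diagnoses` alone yields the full two-column key, which is what makes row-level filtering of that member possible.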

Example

class HospitalClaims(dy.Collection):
    invoices: dy.LazyFrame[InvoiceSchema]
    diagnoses: dy.LazyFrame[DiagnosisSchema]

    @dy.filter(members=["diagnoses"])  # only filters diagnoses, not invoices
    def diagnosis_date_after_admission(self) -> pl.LazyFrame:
        # Return diagnoses rows to keep; joined on DiagnosisSchema's full primary key
        return self.diagnoses.join(
            self.invoices.select(["invoice_id", "admission_date"]),
            on="invoice_id",
            how="left",
        ).filter(pl.col("diagnosis_date") >= pl.col("admission_date"))
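To make the intended effect of the example concrete, here is a plain-Python sketch of the semantics as described in this PR, not the library's implementation. `apply_member_filter` is a hypothetical helper, and the toy rows are invented for illustration. Rows returned by the filter are matched against the targeted member on its full primary key, and members not listed are left untouched.

```python
def apply_member_filter(data, kept, key, members):
    """Hypothetical helper: keep rows of the targeted members whose key is in `kept`."""
    kept_keys = {tuple(row[c] for c in key) for row in kept}
    return {
        name: [r for r in rows if tuple(r[c] for c in key) in kept_keys]
        if name in members
        else list(rows)  # non-targeted members pass through unchanged
        for name, rows in data.items()
    }


data = {
    "invoices": [
        {"invoice_id": "A", "admission_date": "2025-01-10"},
        {"invoice_id": "B", "admission_date": "2025-02-01"},
    ],
    "diagnoses": [
        {"invoice_id": "A", "diagnosis_code": "X10", "diagnosis_date": "2025-01-12"},
        {"invoice_id": "A", "diagnosis_code": "Y20", "diagnosis_date": "2025-01-05"},
        {"invoice_id": "B", "diagnosis_code": "Z30", "diagnosis_date": "2025-02-03"},
        {"invoice_id": "C", "diagnosis_code": "W40", "diagnosis_date": "2025-03-01"},  # no invoice
    ],
}
admission = {r["invoice_id"]: r["admission_date"] for r in data["invoices"]}

# Mirrors the left join followed by the comparison: a diagnosis without a matching
# invoice has no admission date, so the comparison cannot hold and the row is dropped.
kept = [
    d
    for d in data["diagnoses"]
    if admission.get(d["invoice_id"]) is not None
    and d["diagnosis_date"] >= admission[d["invoice_id"]]
]

result = apply_member_filter(
    data, kept, key=["invoice_id", "diagnosis_code"], members={"diagnoses"}
)
print([d["diagnosis_code"] for d in result["diagnoses"]])  # ['X10', 'Z30']
print(len(result["invoices"]))                             # 2
```
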
Original prompt

This section details the original issue you should resolve.

<issue_title>Feature request - Support for filtering specific schemas for collection filter</issue_title>
<issue_description>Hi,

An example of filtering a specific schema could be:

from decimal import Decimal
import dataframely as dy
import polars as pl


class InvoiceIdSchema(dy.Schema):
    invoice_id = dy.String(primary_key=True)


class InvoiceSchema(InvoiceIdSchema):
    admission_date = dy.Date(nullable=False)
    discharge_date = dy.Date(nullable=False)
    received_at = dy.Datetime(nullable=False)
    amount = dy.Decimal(nullable=False, min_exclusive=Decimal(0))

    @dy.rule()
    def discharge_after_admission(cls) -> pl.Expr:
        return pl.col("discharge_date") >= pl.col("admission_date")

    @dy.rule()
    def received_at_after_discharge(cls) -> pl.Expr:
        return pl.col("received_at").dt.date() >= pl.col("discharge_date")


class DiagnosisSchema(InvoiceIdSchema):
    diagnosis_code = dy.String(primary_key=True, regex=r"[A-Z][0-9]{2,4}")
    diagnosis_date = dy.Date(nullable=False)
    is_main = dy.Bool(nullable=False)

    @dy.rule(group_by=["invoice_id"])
    def exactly_one_main_diagnosis(cls) -> pl.Expr:
        return pl.col("is_main").sum() == 1


class HospitalClaims(dy.Collection):
    invoices: dy.LazyFrame[InvoiceSchema]
    diagnoses: dy.LazyFrame[DiagnosisSchema]

    @dy.filter(members=["diagnoses"])  # Apply filter only to diagnoses member
    def diagnosis_date_after_admission(self) -> pl.LazyFrame:
        """
        Filter diagnoses to only include those where diagnosis_date >= admission_date.
        This requires joining with invoices to get admission_date.
        
        Returns: LazyFrame with valid diagnoses rows
        """
        return self.diagnoses.join(
            self.invoices.select(["invoice_id", "admission_date"]),
            on="invoice_id",
            how="left"
        ).filter(
            pl.col("diagnosis_date") >= pl.col("admission_date")
        )

    @dy.rule()  # Collection-level validation across members
    def diagnosis_before_discharge(self) -> pl.LazyFrame:
        """
        Validate that all diagnosis dates occur before or on discharge date.
        
        Returns: LazyFrame with rows that violate this rule (diagnosis_date > discharge_date)
        """
        invalid_rows = self.diagnoses.join(
            self.invoices.select(["invoice_id", "discharge_date"]),
            on="invoice_id",
            how="inner"
        ).filter(
            pl.col("diagnosis_date") > pl.col("discharge_date")
        )
        
        return invalid_rows

Ability to filter only a specific schema when filtering a collection.</issue_description>

Comments on the Issue (you are @copilot in this section)

@AndreasAlbertQC Thanks for reaching out @ereb2impact! We currently only support member-based filtering if you define the filtering in the member schemas themselves. Necessarily, that won't work for your example because you are using multiple members at the same time in order to determine how to filter one specific member. Your use case seems reasonable to me and I can't immediately think of anything we would break with this suggestion.

…ction filters

Co-authored-by: borchero <22455425+borchero@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add support for filtering specific schemas in collection filter" to "feat: Support members parameter in @dy.filter() to apply collection filters to specific members only" Feb 23, 2026
Copilot AI requested a review from borchero February 23, 2026 12:54

Development

Successfully merging this pull request may close these issues.

Feature request - Support for filtering specific schemas for collection filter

2 participants