
feat: add event dataclasses #32

Open
cademirch wants to merge 10 commits into main from feat-event-dataclasses

Conversation


@cademirch cademirch commented Jul 24, 2025

This adds LogEvent dataclasses for conveniently parsing log records into dataclasses.

This could also be extended to provide formatting to simplify logging upstream. This would move the complex formatting out of Snakemake's current DefaultFormatter and into the LogEvent dataclasses.

Example:

# snakemake/src/snakemake/logging/events.py
from snakemake_interface_logger_plugins.common import JobInfo as BaseJobInfo

class JobInfo(BaseJobInfo):
    
    def format(self) -> str:
        lines = [f"rule {self.rule_name}:"]
        lines.append(f"    jobid: {self.jobid}")
        lines.append(f"    threads: {self.threads}")
        # ... custom formatting logic ...
        return "\n".join(lines)
# snakemake/src/snakemake/jobs.py
...
def log_info(self, indent=False, printshellcmd=True):
    from snakemake.logging.events import JobInfo, ShellCmd
    
    priority = self.priority
    benchmark = (
        fmt_iofile(self.benchmark, as_input=False, as_output=True)
        if self.benchmark is not None
        else None
    )
    
    # Create event
    job_event = JobInfo(
        jobid=self.dag.jobid(self),
        rule_name=self.rule.name,
        rule_msg=self.message,
        # ... all the fields ...
        indent=indent,
    )
    
    # Format and log - the message is now a plain string
    formatted_msg = job_event.format(quiet=self.dag.workflow.output_settings.quiet)
    logger.info(
        formatted_msg,
        extra={
            'event': LogEvent.JOB_INFO,
            'quietness': Quietness.RULES,
            **asdict(job_event),  # Unpacks all dataclass fields
        },
    )
    
    # Shell command
    if printshellcmd and self.shellcmd:
        logger.info(self.shellcmd, extra={'event': LogEvent.SHELLCMD, 'quietness': Quietness.RULES, 'indent': indent})
    
    if self.rule.is_checkpoint:
        logger.info("DAG of jobs will be updated after completion.")
    if self.rule.is_handover:
        logger.info("Handing over execution to foreign system...")

Summary by CodeRabbit

  • New Features

    • Added typed dataclass representations for logging events and a runtime mapping to reconstruct event objects from log records, enabling structured, type-safe event access.
  • Documentation

    • README updated with the dataclass-based event model, a consolidated event table, and usage examples for parsing and handling common event types (JOB_ERROR, JOB_FINISHED, PROGRESS).
  • Tests

    • New test suite validating event conversion, alias handling, mapping consistency, and run-info/stat shapes.

@coderabbitai

coderabbitai bot commented Jul 24, 2025

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Introduces a new dataclass-based event system that maps logging.LogRecord objects to typed LogEventData subclasses via from_record()/from_extra, adds a comprehensive events module, updates README examples to use dataclass instances, and replaces the previous Rich-plugin test with an event-focused test suite.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Docs**<br>`README.md` | Rewrote logger documentation to use dataclass-based event types: removed per-event field lists, added table mapping LogEvent → dataclass, added from_record() usage examples and guidance for selecting dataclasses. |
| **Event dataclasses module**<br>`src/snakemake_interface_logger_plugins/events.py` | Added LogEventData base and many concrete event dataclasses (ErrorEvent, WorkflowStartedEvent, JobInfoEvent, JobStartedEvent, JobFinishedEvent, ShellCmdEvent, JobErrorEvent, GroupInfoEvent, GroupErrorEvent, ResourcesInfoEvent, DebugDagEvent, ProgressEvent, RuleGraphEvent, RunInfoEvent), alias/default helpers, NamedList detection, extra()/from_extra/from_record flows, and LOG_EVENT_CLASSES MappingProxyType registry. |
| **Tests**<br>`tests/tests.py` | Removed prior concrete Rich-plugin test; added TestEvents suite validating dataclass ↔ extra dict ↔ LogRecord conversions, alias handling, LOG_EVENT_CLASSES mapping, and RunInfoEvent construction/compatibility; updated mock settings structure. |

Sequence Diagram(s)

sequenceDiagram
    participant Handler as LogHandler
    participant Record as LogRecord
    participant Events as EventsModule

    Handler->>Record: receive LogRecord (level, msg, event, extra)
    Handler->>Events: lookup cls = LOG_EVENT_CLASSES[record.event]
    Events->>Events: cls.from_record(record) -> from_extra(record.__dict__)
    Events-->>Handler: return typed LogEventData instance
    Handler->>Handler: inspect/use dataclass fields
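The dispatch flow in the diagram can be sketched in plain Python. The `ProgressEvent` stand-in, the string event key, and the registry below are illustrative only; the PR's actual `LOG_EVENT_CLASSES` registry maps `LogEvent` enum values to `LogEventData` subclasses.

```python
# Minimal sketch of the handler-side dispatch described in the diagram.
# The event class and registry here are stand-ins, not the shipped API.
import logging
from dataclasses import dataclass, fields


@dataclass
class ProgressEvent:
    done: int = 0
    total: int = 0

    @classmethod
    def from_record(cls, record: logging.LogRecord) -> "ProgressEvent":
        # Pull only declared dataclass fields out of the record's attributes.
        kwargs = {
            f.name: getattr(record, f.name)
            for f in fields(cls)
            if hasattr(record, f.name)
        }
        return cls(**kwargs)


# Stand-in for the MappingProxyType registry added in events.py.
LOG_EVENT_CLASSES = {"progress": ProgressEvent}


def handle(record: logging.LogRecord):
    # Lookup cls = LOG_EVENT_CLASSES[record.event], then build a typed event.
    event = getattr(record, "event", None)
    cls = LOG_EVENT_CLASSES.get(event)
    return cls.from_record(record) if cls else None


record = logging.LogRecord("snakemake", logging.INFO, __file__, 0, "progress", None, None)
record.event = "progress"
record.done, record.total = 3, 10
evt = handle(record)
assert evt == ProgressEvent(done=3, total=10)
```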

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • snakemake/snakemake-interface-logger-plugins PR #29: Overlapping README and event-handling documentation changes.
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 60.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: add event dataclasses' directly and clearly summarizes the main change: introduction of dataclass-based event data structures. It follows conventional commit syntax and is specific enough to understand the primary contribution. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (2)
src/snakemake_interface_logger_plugins/events.py (2)

28-47: Review the __post_init__ validation logic for edge cases.

The WorkflowStarted class has good type conversion handling, but there are some considerations:

  1. The __post_init__ method modifies self.snakefile in place, which could be unexpected behavior in an immutable dataclass
  2. The None check should come before the string conversion attempt

Consider this improvement:

def __post_init__(self) -> None:
-   if not isinstance(self.snakefile, str):
+   if self.snakefile is not None and not isinstance(self.snakefile, str):
        try:
            # Try to convert to string - this should work for PosixPath and other path-like objects
            self.snakefile = str(self.snakefile)
        except (TypeError, ValueError) as e:
            raise ValueError(f"Could not convert snakefile to string: {e}")

170-186: Review the mutually exclusive field logic.

The ResourcesInfo class uses conditional logic to populate only one field at a time, but this design might not be intuitive. The class could potentially have multiple resource-related fields populated simultaneously.

Consider allowing multiple fields to be populated or document the mutually exclusive behavior:

@classmethod
def from_record(cls, record: LogRecord) -> "ResourcesInfo":
-   if hasattr(record, "nodes"):
-       return cls(nodes=record.nodes)  # type: ignore
-   elif hasattr(record, "cores"):
-       return cls(cores=record.cores)  # type: ignore
-   elif hasattr(record, "provided_resources"):
-       return cls(provided_resources=record.provided_resources)  # type: ignore
-   else:
-       return cls()
+   return cls(
+       nodes=getattr(record, "nodes", None),
+       cores=getattr(record, "cores", None),
+       provided_resources=getattr(record, "provided_resources", None),
+   )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 42c55ff and 1cce5ea.

📒 Files selected for processing (2)
  • README.md (1 hunks)
  • src/snakemake_interface_logger_plugins/events.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit Configuration File

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake_interface_logger_plugins/events.py
🔇 Additional comments (12)
src/snakemake_interface_logger_plugins/events.py (11)

1-5: LGTM! Clean imports and type annotations.

The imports are well-organized and include all necessary types for the dataclasses.


7-26: LGTM! Error dataclass implementation is solid.

The Error dataclass properly handles optional fields and uses safe attribute access with getattr() and default values.


49-90: LGTM! JobInfo class handles complex resource extraction well.

The resource filtering logic correctly excludes internal fields (_cores, _nodes) and safely handles cases where the resources attribute might not exist or have the expected structure.


92-106: LGTM! Robust job ID normalization logic.

The JobStarted class properly handles different input types (None, int, list) and normalizes them to a consistent List[int] format.


132-141: LGTM! Simple and consistent implementation.

The JobError class follows the same pattern as other simple dataclasses with appropriate default values.


143-153: LGTM! Proper handling of group information.

The GroupInfo class correctly uses default_factory for mutable default values and provides safe defaults.


155-168: LGTM! Consistent with GroupInfo implementation.

The GroupError class follows good practices with default_factory for mutable fields and safe attribute access.


188-203: LGTM! Consistent implementation pattern.

The DebugDag class follows the established pattern with safe attribute access and appropriate optional types.


205-213: LGTM! Simple and effective implementation.

The Progress class correctly handles required integer fields with sensible defaults.


215-222: LGTM! Appropriate handling of complex data.

The RuleGraph class uses Dict[str, Any] for the flexible rulegraph structure and provides a safe default.


224-239: LGTM! Smart statistics parsing logic.

The RunInfo class correctly separates the "total" field from per-rule counts and handles missing data gracefully.

README.md (1)

234-277: LGTM! Excellent example demonstrating proper usage.

The example function parse_event() demonstrates best practices:

  • Proper event type checking with hasattr()
  • Complete mapping of all LogEvent types to dataclasses
  • Type-safe usage with isinstance() checks
  • Clear documentation of the pattern

This is exactly the kind of example that will help plugin developers use the new dataclasses correctly.

Comment on lines +204 to +231
The `LogEvent` enum defines particularly important Snakemake events such as workflow starting, job submission, job failure, etc.
For each event, a corresponding dataclass is defined in `snakemake_interface_logger_plugins.events`.
These dataclasses provide a typed interface for accessing event-specific fields from a `LogRecord`.

**To extract event data from a `LogRecord`, use the appropriate dataclass's `from_record()` method.**

### Event Types, Dataclasses, and Typical Fields

| LogEvent | Dataclass | Typical Fields (see class for details) |
| --------------------------- | ----------------- | -------------------------------------------------------------------------------------------------------------------- |
| `LogEvent.ERROR` | `Error` | exception, location, rule, traceback, file, line |
| `LogEvent.WORKFLOW_STARTED` | `WorkflowStarted` | workflow_id, snakefile |
| `LogEvent.JOB_INFO` | `JobInfo` | jobid, rule_name, threads, input, output, log, benchmark, rule_msg, wildcards, reason, shellcmd, priority, resources |
| `LogEvent.JOB_STARTED` | `JobStarted` | job_ids |
| `LogEvent.JOB_FINISHED` | `JobFinished` | job_id |
| `LogEvent.SHELLCMD` | `ShellCmd` | jobid, shellcmd, rule_name |
| `LogEvent.JOB_ERROR` | `JobError` | jobid |
| `LogEvent.GROUP_INFO` | `GroupInfo` | group_id, jobs |
| `LogEvent.GROUP_ERROR` | `GroupError` | groupid, aux_logs, job_error_info |
| `LogEvent.RESOURCES_INFO` | `ResourcesInfo` | nodes, cores, provided_resources |
| `LogEvent.DEBUG_DAG` | `DebugDag` | status, job, file, exception |
| `LogEvent.PROGRESS` | `Progress` | done, total |
| `LogEvent.RULEGRAPH` | `RuleGraph` | rulegraph |
| `LogEvent.RUN_INFO` | `RunInfo` | per_rule_job_counts, total_job_count |

**Note:** These field lists are guidelines only and may change between versions.
Always use defensive programming practices like `getattr()` with defaults or `hasattr()` checks when accessing fields.
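A minimal sketch of the defensive access this note recommends, assuming hypothetical record attributes (`rule_name`, `jobid`, `threads`) for illustration:

```python
# Illustrative sketch of defensive field access on a LogRecord; the
# attribute names used here are examples, not a guaranteed schema.
import logging


def describe_job(record: logging.LogRecord) -> str:
    # getattr() with a default tolerates fields removed in future versions.
    rule = getattr(record, "rule_name", "<unknown rule>")
    jobid = getattr(record, "jobid", None)
    # hasattr() guards optional fields before direct access.
    if hasattr(record, "threads"):
        return f"{rule} (job {jobid}, {record.threads} threads)"
    return f"{rule} (job {jobid})"


rec = logging.LogRecord("snakemake", logging.INFO, __file__, 0, "", None, None)
rec.rule_name = "map_reads"
rec.jobid = 7
print(describe_job(rec))  # map_reads (job 7)
```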


💡 Verification agent

🧩 Analysis chain

Verify event mapping table accuracy against actual dataclasses.

The documentation provides a comprehensive mapping table, but we should verify that all mappings are accurate and that field lists match the actual dataclass definitions.


🏁 Script executed:

#!/bin/bash
# Verify that all LogEvent enum values have corresponding dataclasses
echo "=== Checking LogEvent enum values ==="
rg -A 20 "class LogEvent" src/

echo "=== Checking dataclass names in events.py ==="
rg "^class [A-Z]" src/snakemake_interface_logger_plugins/events.py

echo "=== Verifying field names in dataclasses match documentation ==="
# Check specific examples mentioned in the table
rg -A 10 "class Error:" src/snakemake_interface_logger_plugins/events.py
rg -A 10 "class JobInfo:" src/snakemake_interface_logger_plugins/events.py
rg -A 10 "class ResourcesInfo:" src/snakemake_interface_logger_plugins/events.py

Length of output: 3084


🏁 Script executed:

#!/bin/bash
echo "=== WorkflowStarted fields ==="
rg -A10 "class WorkflowStarted" src/snakemake_interface_logger_plugins/events.py

echo "=== GroupInfo fields ==="
rg -A10 "class GroupInfo" src/snakemake_interface_logger_plugins/events.py

echo "=== GroupError fields ==="
rg -A10 "class GroupError" src/snakemake_interface_logger_plugins/events.py

echo "=== RunInfo fields ==="
rg -A10 "class RunInfo" src/snakemake_interface_logger_plugins/events.py

Length of output: 2107


#!/bin/bash
for cls in Error WorkflowStarted JobInfo JobStarted JobFinished ShellCmd JobError GroupInfo GroupError ResourcesInfo DebugDag Progress RuleGraph RunInfo; do
  echo "=== $cls fields ==="
  rg -A20 "class $cls" src/snakemake_interface_logger_plugins/events.py || echo "No class $cls found"
  echo
done

Update README event table for JobInfo fields

All LogEvent enum values map correctly to dataclasses in events.py, and the majority of field lists match. However, the JobInfo entry in the table includes three attributes that the JobInfo dataclass does not define:

  • The class defines:
    • jobid, rule_name, threads, input, output, log, benchmark, rule_msg, wildcards, reason
  • The table lists (incorrectly):
    • shellcmd, priority, resources

Please remove or replace shellcmd, priority, and resources from the LogEvent.JOB_INFO row and ensure the table reflects the actual JobInfo fields (e.g., rule_msg, wildcards, reason).

🤖 Prompt for AI Agents
In README.md between lines 204 and 231, the table listing typical fields for
LogEvent.JOB_INFO incorrectly includes shellcmd, priority, and resources, which
are not defined in the JobInfo dataclass. Remove these three fields from the
JOB_INFO row and update the list to accurately reflect the actual JobInfo
dataclass fields such as rule_msg, wildcards, and reason to ensure the
documentation matches the code.

Comment on lines +108 to +115
@dataclass
class JobFinished:
    job_id: int

    @classmethod
    def from_record(cls, record: LogRecord) -> "JobFinished":
        return cls(job_id=getattr(record, "job_id"))



⚠️ Potential issue

Missing default value for required field.

The JobFinished class accesses job_id without a default value, which could raise an AttributeError if the field is missing from the LogRecord.

Apply this fix to provide a safe default:

def from_record(cls, record: LogRecord) -> "JobFinished":
-   return cls(job_id=getattr(record, "job_id"))
+   return cls(job_id=getattr(record, "job_id", 0))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@dataclass
class JobFinished:
    job_id: int

    @classmethod
    def from_record(cls, record: LogRecord) -> "JobFinished":
        return cls(job_id=getattr(record, "job_id"))
@dataclass
class JobFinished:
    job_id: int

    @classmethod
    def from_record(cls, record: LogRecord) -> "JobFinished":
        return cls(job_id=getattr(record, "job_id", 0))
🤖 Prompt for AI Agents
In src/snakemake_interface_logger_plugins/events.py around lines 108 to 115, the
JobFinished dataclass requires a job_id field but does not provide a default
value, which can cause an AttributeError if job_id is missing in the LogRecord.
Modify the from_record method to safely access job_id using getattr with a
default value (e.g., None or a sentinel value) to avoid exceptions when the
attribute is absent.


@johanneskoester johanneskoester left a comment


I like this a lot!! Very nice design.

@johanneskoester
Contributor

But it will be a breaking change, right? Then make sure to add an exclamation mark behind the feat in the title, i.e. feat!: replace log records by event dataclasses. Or am I wrong?

@cademirch
Collaborator Author

But it will be a breaking change, right? Then make sure to add an exclamation mark behind the feat in the title, i.e. feat!: replace log records by event dataclasses. Or am I wrong?

This is not breaking because it's separate from the LogEvent enum.

@cademirch cademirch mentioned this pull request Oct 13, 2025
4 tasks
@jlumpe
Contributor

jlumpe commented Oct 16, 2025

Any plans to merge this soon?

@cademirch
Collaborator Author

cademirch commented Oct 16, 2025 via email

@jlumpe
Contributor

jlumpe commented Oct 16, 2025

No rush, I'm just excited to see this addition.

@jlumpe
Contributor

jlumpe commented Oct 18, 2025

  • I'd say it's probably not as important to add the complete set of new events to this PR as those are easy to add afterward without interfering with any of the existing stuff.
  • I feel like some attributes have default values applied in from_record() when they shouldn't - I'm not sure it makes sense to set the job ID to 0 instead of just raising an exception when not found.
  • This is somewhat of a nitpick, but I've noticed that some of the attributes are inconsistently named (jobid vs job_id and groupid vs group_id). If you wanted to change those, this is probably the best time to do it. You would also not have to break backwards compatibility - the from_record() method (and I'm also envisioning an inverse like to_extra() to get the extra argument to pass to logging functions) could also include the old name of the attribute. I think this is already being done with JOB_STARTED, jobs on the record corresponds to job_ids on the dataclass.
  • Also a nitpick, but it might be simpler for the collection-typed fields of JobInfo (like input / output or wildcards) to default to an empty collection instead of None.

I've also made some of my own additions in #45 for reference.
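The alias plus `to_extra()` round trip suggested in this comment could look roughly like the sketch below. The field-metadata convention, the class name, and the alias key are illustrative, not the shipped API:

```python
# Hedged sketch of the alias idea: from_record() accepts the legacy record
# key while the dataclass uses the normalized name, and to_extra() emits the
# legacy key again for backwards compatibility.
import logging
from dataclasses import dataclass, field, fields
from typing import Any, Dict


@dataclass
class JobFinishedEvent:
    # Normalized name on the dataclass, legacy key stored as field metadata.
    job_id: int = field(metadata={"alias": "jobid"})

    @classmethod
    def from_record(cls, record: logging.LogRecord) -> "JobFinishedEvent":
        kwargs = {}
        for f in fields(cls):
            alias = f.metadata.get("alias")
            if hasattr(record, f.name):            # canonical name wins
                kwargs[f.name] = getattr(record, f.name)
            elif alias and hasattr(record, alias):  # fall back to legacy key
                kwargs[f.name] = getattr(record, alias)
        return cls(**kwargs)

    def to_extra(self) -> Dict[str, Any]:
        # Emit the legacy key so existing handlers keep working.
        return {
            f.metadata.get("alias", f.name): getattr(self, f.name)
            for f in fields(self)
        }


rec = logging.LogRecord("snakemake", logging.INFO, __file__, 0, "", None, None)
rec.jobid = 42  # old-style key on the record
evt = JobFinishedEvent.from_record(rec)
assert evt.job_id == 42
assert evt.to_extra() == {"jobid": 42}
```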

@jlumpe
Contributor

jlumpe commented Oct 18, 2025

I'm also somewhat confused about the ERROR and JOB_ERROR events (my original motivation for contributing to the logging system was for improving the error reporting). As far as I can tell Snakemake is not currently setting any of the attributes for the ERROR event other than exception, which is set to the name of the exception class (the only use of the event I could find is in snakemake.exceptions.print_exception()). If it's for logging generic exceptions, Python's logging system already has a standard way to include the exception type and instance (and traceback) through LogRecord.exc_info.

I feel like the error-related events should only include the Snakemake-specific information. Here's a list of what I think that could be:

  • For Python exceptions that occur in a user-defined workflow file:
    • If it was a Snakefile or script file.
    • Absolute path to it (may not be the main Snakefile if an include statement was used).
    • Workflow-relative path (not always applicable because you can run scripts from anywhere).
    • Mapping between line numbers in the original source file and those in the associated traceback object (which is for some transformed version of the original source). The print_exception() function I mentioned already takes this as an argument.
  • General information on the runtime context:
    • Name of associated rule, if any.
    • ID of running job, if any.

This could all be implemented in one or multiple event types.

jlumpe and others added 2 commits March 9, 2026 16:08
- Added `LogEventData` base class.
- Added explicit mapping between `LogEvent` enum values and `LogEventData` subclasses (`LogEventData.event` class attribute, and `LOG_EVENT_CLASSES` global var).
- Added default implementation of `from_record()` class method that just inspects the dataclass fields:
  - Allows attributes to be missing from the record if the corresponding field has a default value, raises an exception otherwise.
  - Most subclasses just use the default implementation.
- This changes the previous behavior in a few places, where a default value was applied in `from_record()` that was different than the dataclass field default.

Still to be done:

- Add `data_from_record()` function which dispatches to the proper subclass based on `record.event`.
  - Should the `from_record()` class method be made private in this case?
- Add `to_extra()` method, which converts to the dict to be passed as the `extra` argument to a logging function.
  - Tests for round trip of data to/from `LogRecord` instance.
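The default `from_record()` described in this commit message could be sketched as follows. The names and the `ShellCmdEvent` example are illustrative; the point is that a missing record attribute is tolerated only when the field has a default, instead of silently substituting a value like 0:

```python
# Sketch of a default from_record() that inspects dataclass fields and
# raises when a required field (no default) is absent from the record.
import logging
from dataclasses import MISSING, dataclass, fields


@dataclass
class LogEventData:
    @classmethod
    def from_record(cls, record: logging.LogRecord):
        kwargs = {}
        for f in fields(cls):
            if hasattr(record, f.name):
                kwargs[f.name] = getattr(record, f.name)
            elif f.default is MISSING and f.default_factory is MISSING:
                # Required field with no default: fail loudly instead of
                # inventing a placeholder value.
                raise ValueError(f"record is missing required field {f.name!r}")
        return cls(**kwargs)


@dataclass
class ShellCmdEvent(LogEventData):
    shellcmd: str        # required: absent attribute raises
    rule_name: str = ""  # optional: dataclass default applies when absent


rec = logging.LogRecord("snakemake", logging.INFO, __file__, 0, "", None, None)
rec.shellcmd = "echo hi"
evt = ShellCmdEvent.from_record(rec)
assert (evt.shellcmd, evt.rule_name) == ("echo hi", "")
```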

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/tests.py (1)

178-216: Add a regression case for raw field-name and legacy-key payloads.

This only round-trips dictionaries produced by extra(), so it never exercises the two parser inputs this PR needs to support: canonical dataclass fields (job_id, job_ids, group_id) and legacy Snakemake keys (jobid, groupid, cmd). A focused test for both shapes would catch the compatibility bugs here.

Example test shape
+    @pytest.mark.parametrize(
+        ("event", "attrs", "expected_type"),
+        [
+            (
+                LogEvent.JOB_INFO,
+                {"job_id": 123, "rule_name": "example_rule", "threads": 4},
+                logevents.JobInfoEvent,
+            ),
+            (
+                LogEvent.JOB_FINISHED,
+                {"jobid": 123},
+                logevents.JobFinishedEvent,
+            ),
+            (
+                LogEvent.GROUP_INFO,
+                {"groupid": "group1"},
+                logevents.GroupInfoEvent,
+            ),
+            (
+                LogEvent.SHELLCMD,
+                {"jobid": 123, "cmd": "echo hi"},
+                logevents.ShellCmdEvent,
+            ),
+        ],
+    )
+    def test_from_extra_accepts_canonical_and_legacy_keys(
+        self, event, attrs, expected_type
+    ):
+        data = logevents.LogEventData.from_extra({"event": event, **attrs})
+        assert isinstance(data, expected_type)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/tests.py` around lines 178 - 216, Extend the test_conversion regression
to also cover raw dataclass field names and legacy Snakemake keys by creating
extra payload shapes that use canonical fields (job_id, job_ids, group_id, cmd
where applicable) and legacy keys (jobid, groupid, cmd) and asserting
LogEventData.from_extra and from_record correctly parse and round-trip them;
update the loop over self.EVENTS in test_conversion to, for each data, build two
additional extras (one using the dataclass field names and one using legacy
keys) derived from data.extra() and then assert that
logevents.LogEventData.from_extra(...) and
logevents.LogEventData.from_record(...) return the original or equal
LogEventData instances for both shapes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/snakemake_interface_logger_plugins/events.py`:
- Around line 239-243: JobFinishedEvent.job_id and GroupInfoEvent.group_id must
retain legacy key names for compatibility: update their serialization/parsing to
accept and emit both the canonical names (job_id, group_id) and the legacy short
names ("jobid", "groupid"); specifically modify the (de)serialization mapping
used by the LogEventData subclass handling (or the to_dict/from_dict helpers) so
that JobFinishedEvent and GroupInfoEvent recognize incoming "jobid"/"groupid"
when parsing and also include the legacy keys when serializing alongside the
canonical keys.
- Around line 42-49: _in _from_extra_default (used by from_extra), the loop only
checks the alias key in extra and rejects inputs that use the dataclass
canonical field names; change the lookup to accept the canonical name first
(fld.name) and fall back to the legacy alias (fld.metadata.get("alias")), so for
each fld: if fld.name in extra use that value, else if alias in extra use the
alias value, otherwise raise the existing ValueError when no default exists.
Update references to fld.name, fld.metadata.get("alias", ...),
_from_extra_default and from_extra accordingly.

---

Nitpick comments:
In `@tests/tests.py`:
- Around line 178-216: Extend the test_conversion regression to also cover raw
dataclass field names and legacy Snakemake keys by creating extra payload shapes
that use canonical fields (job_id, job_ids, group_id, cmd where applicable) and
legacy keys (jobid, groupid, cmd) and asserting LogEventData.from_extra and
from_record correctly parse and round-trip them; update the loop over
self.EVENTS in test_conversion to, for each data, build two additional extras
(one using the dataclass field names and one using legacy keys) derived from
data.extra() and then assert that logevents.LogEventData.from_extra(...) and
logevents.LogEventData.from_record(...) return the original or equal
LogEventData instances for both shapes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 626739fa-680a-4d0b-bfa9-ff42efd30e1b

📥 Commits

Reviewing files that changed from the base of the PR and between 1cce5ea and da0d407.

📒 Files selected for processing (2)
  • src/snakemake_interface_logger_plugins/events.py
  • tests/tests.py


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
README.md (1)

220-235: ⚠️ Potential issue | 🟡 Minor

Update the table to match the shipped event classes and fields.

The rest of this PR uses ErrorEvent / WorkflowStartedEvent / JobInfoEvent / ... with normalized fields like job_id and group_id (tests/tests.py Lines 82-143; src/snakemake_interface_logger_plugins/events.py:460-481), but the table still documents Error, JobInfo, jobid, groupid, and extra JOB_INFO fields like shellcmd, priority, and resources. Readers won't be able to line this up with the actual constructors/attributes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 220 - 235, The README table is out of sync: replace
the old class/event names and field names with the shipped ones (e.g.
ErrorEvent, WorkflowStartedEvent, JobInfoEvent, JobStartedEvent,
JobFinishedEvent, GroupInfoEvent/GroupErrorEvent, etc.) and update field names
to the normalized attributes used in the codebase (e.g. job_id, group_id,
per_rule_job_counts, shell_cmd if used, aux_logs, job_error_info, etc.) so the
table matches the constructors/attributes in
src/snakemake_interface_logger_plugins/events.py and the tests (tests/tests.py
lines around 82-143); ensure each LogEvent enum entry maps to the corresponding
Event class name and its actual fields from events.py.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@README.md`:
- Around line 243-284: The example re-implements event dispatching instead of
using the shared parser; replace the custom mapping in parse_event with a call
to events.LogEventData.from_record(record) (i.e., have parse_event return
events.LogEventData.from_record(record) or None) and update the emit method to
check the returned LogEventData subclasses (e.g., compare isinstance(event_data,
events.JobErrorEvent or the exact classes exposed by events.LogEventData) and
use their fields), removing the manual event_map and dataclass_type logic so the
example uses the central parsing logic in events.LogEventData.from_record.

In `@tests/tests.py`:
- Around line 1-18: The test setup refers to RichLogHandler in
TestRegistry.reset_registry() but tests/tests.py no longer imports or defines
RichLogHandler, causing a NameError; restore an import or definition for
RichLogHandler so reset_registry() can instantiate it—specifically add an import
for RichLogHandler (or reintroduce a minimal definition) at the top of
tests/tests.py so the symbol is available to TestRegistry.reset_registry() and
any other tests that instantiate RichLogHandler.

---

Duplicate comments:
In `@README.md`:
- Around line 220-235: The README table is out of sync: replace the old
class/event names and field names with the shipped ones (e.g. ErrorEvent,
WorkflowStartedEvent, JobInfoEvent, JobStartedEvent, JobFinishedEvent,
GroupInfoEvent/GroupErrorEvent, etc.) and update field names to the normalized
attributes used in the codebase (e.g. job_id, group_id, per_rule_job_counts,
shell_cmd if used, aux_logs, job_error_info, etc.) so the table matches the
constructors/attributes in src/snakemake_interface_logger_plugins/events.py and
the tests (tests/tests.py lines around 82-143); ensure each LogEvent enum entry
maps to the corresponding Event class name and its actual fields from events.py.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: beaca467-ce01-4a92-b95a-4a43e787f9b3

📥 Commits

Reviewing files that changed from the base of the PR and between da0d407 and e886d60.

📒 Files selected for processing (2)
  • README.md
  • tests/tests.py

README.md Outdated
Comment on lines +243 to +284

```python
from snakemake_interface_logger_plugins.common import LogEvent
from snakemake_interface_logger_plugins import events

def parse_event(record):
    if not hasattr(record, "event"):
        return None

    event = record.event

    # Map LogEvent to the corresponding dataclass
    event_map = {
        LogEvent.ERROR: events.Error,
        LogEvent.WORKFLOW_STARTED: events.WorkflowStarted,
        LogEvent.JOB_INFO: events.JobInfo,
        LogEvent.JOB_STARTED: events.JobStarted,
        LogEvent.JOB_FINISHED: events.JobFinished,
        LogEvent.SHELLCMD: events.ShellCmd,
        LogEvent.JOB_ERROR: events.JobError,
        LogEvent.GROUP_INFO: events.GroupInfo,
        LogEvent.GROUP_ERROR: events.GroupError,
        LogEvent.RESOURCES_INFO: events.ResourcesInfo,
        LogEvent.DEBUG_DAG: events.DebugDag,
        LogEvent.PROGRESS: events.Progress,
        LogEvent.RULEGRAPH: events.RuleGraph,
        LogEvent.RUN_INFO: events.RunInfo,
    }

    dataclass_type = event_map.get(event)
    if dataclass_type is not None:
        return dataclass_type.from_record(record)
    else:
        return None

# Usage in a log handler:
def emit(self, record):
    event_data = parse_event(record)
    if event_data:
        # Now you can access event-specific fields, e.g.:
        if isinstance(event_data, events.JobError):
            print(f"Job error for jobid: {event_data.jobid}")
        elif isinstance(event_data, events.Progress):
            print(f"Progress: {event_data.done}/{event_data.total}")
```

Always use `getattr(record, 'field_name', default_value)` or check with `hasattr(record, 'field_name')` before accessing fields, as not all fields may be present in every record.

⚠️ Potential issue | 🟡 Minor

Use LogEventData.from_record() in the example instead of a custom dispatcher.

src/snakemake_interface_logger_plugins/events.py:120-154 already exposes the parsing logic. Rebuilding the mapping here is redundant, and it has already drifted from the implementation (events.Error, events.JobInfo, etc. vs the *Event classes used elsewhere in this PR).

📌 Suggested doc update
```diff
-from snakemake_interface_logger_plugins.common import LogEvent
 from snakemake_interface_logger_plugins import events
 
-def parse_event(record):
-    if not hasattr(record, "event"):
-        return None
-
-    event = record.event
-
-    # Map LogEvent to the corresponding dataclass
-    event_map = {
-        LogEvent.ERROR: events.Error,
-        LogEvent.WORKFLOW_STARTED: events.WorkflowStarted,
-        LogEvent.JOB_INFO: events.JobInfo,
-        LogEvent.JOB_STARTED: events.JobStarted,
-        LogEvent.JOB_FINISHED: events.JobFinished,
-        LogEvent.SHELLCMD: events.ShellCmd,
-        LogEvent.JOB_ERROR: events.JobError,
-        LogEvent.GROUP_INFO: events.GroupInfo,
-        LogEvent.GROUP_ERROR: events.GroupError,
-        LogEvent.RESOURCES_INFO: events.ResourcesInfo,
-        LogEvent.DEBUG_DAG: events.DebugDag,
-        LogEvent.PROGRESS: events.Progress,
-        LogEvent.RULEGRAPH: events.RuleGraph,
-        LogEvent.RUN_INFO: events.RunInfo,
-    }
-
-    dataclass_type = event_map.get(event)
-    if dataclass_type is not None:
-        return dataclass_type.from_record(record)
-    else:
-        return None
-
-# Usage in a log handler:
 def emit(self, record):
-    event_data = parse_event(record)
-    if event_data:
-        # Now you can access event-specific fields, e.g.:
-        if isinstance(event_data, events.JobError):
-            print(f"Job error for jobid: {event_data.jobid}")
-        elif isinstance(event_data, events.Progress):
-            print(f"Progress: {event_data.done}/{event_data.total}")
+    event_data = events.LogEventData.from_record(record)
+    if event_data is None:
+        return
+
+    if isinstance(event_data, events.JobErrorEvent):
+        print(f"Job error for job_id: {event_data.job_id}")
+    elif isinstance(event_data, events.ProgressEvent):
+        print(f"Progress: {event_data.done}/{event_data.total}")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 243 - 284, The example re-implements event
dispatching instead of using the shared parser; replace the custom mapping in
parse_event with a call to events.LogEventData.from_record(record) (i.e., have
parse_event return events.LogEventData.from_record(record) or None) and update
the emit method to check the returned LogEventData subclasses (e.g., compare
isinstance(event_data, events.JobErrorEvent or the exact classes exposed by
events.LogEventData) and use their fields), removing the manual event_map and
dataclass_type logic so the example uses the central parsing logic in
events.LogEventData.from_record.

@cademirch cademirch force-pushed the feat-event-dataclasses branch from e3abc6d to 257bcf2 Compare March 9, 2026 15:37
@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 1

♻️ Duplicate comments (2)
README.md (1)

286-339: ⚠️ Potential issue | 🟠 Major

README still points to the removed pre-rename event classes.

The table and sample dispatcher use events.Error, events.JobInfo, etc., but events.py now exports ErrorEvent, JobInfoEvent, and the other *Event classes. Copying this example will currently raise AttributeError, and the hand-written event_map can drift again from LogEventData.from_record(). Please switch the docs to the shared parser and the actual exported class names.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 286 - 339, The README references removed dataclass
names (e.g., events.Error, events.JobInfo) and a manual event_map instead of the
current exported classes and shared parser; update the table and the example to
use the real exported names (e.g., ErrorEvent, JobInfoEvent, JobStartedEvent,
etc.) and replace the handwritten mapping/dispatch with the shared parser
function LogEventData.from_record() (or show how to call that helper) so the
docs match the actual symbols (LogEvent, LogEventData.from_record, and the
*Event classes) and cannot drift.
src/snakemake_interface_logger_plugins/events.py (1)

42-49: ⚠️ Potential issue | 🟠 Major

Accept canonical dataclass keys before falling back to legacy aliases.

_from_extra_default() only reads the alias key once a field defines one. That means LogEventData.from_extra({**asdict(job_event), "event": ...}) still fails for aliased fields like job_id/jobid, which is one of the integration paths this PR is documenting.

Suggested fix
```diff
     for fld in fields(cls):
         if fld.name in kw:
             continue
-        name = fld.metadata.get("alias", fld.name)
-        if name in extra:
-            kw[fld.name] = extra[name]
+        alias = fld.metadata.get("alias")
+        if fld.name in extra:
+            kw[fld.name] = extra[fld.name]
+        elif alias is not None and alias in extra:
+            kw[fld.name] = extra[alias]
         elif not field_has_default(fld):
-            raise ValueError(f"LogRecord missing required attribute {name!r}")
+            raise ValueError(
+                f"LogRecord missing required attribute {(alias or fld.name)!r}"
+            )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/snakemake_interface_logger_plugins/events.py` around lines 42 - 49, The
loop in _from_extra_default currently only checks the alias
(fld.metadata["alias"]) against extra, causing canonical dataclass keys
(fld.name) to be ignored; modify the loop in the function that iterates over
fields(cls) so it first looks for the canonical key in extra (e.g., if fld.name
in extra then kw[fld.name] = extra[fld.name]), then falls back to the alias
(name = fld.metadata.get("alias", fld.name) and check extra[name]) before
raising the ValueError; preserve the existing guard that skips fields already in
kw and retain the field_has_default() check for required fields.
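The canonical-key-then-alias lookup order can be demonstrated in isolation. This is a sketch under the assumption that fields carry an optional `"alias"` entry in their metadata, mirroring the description above; the class and field names (`JobErrorEvent`, `job_id`/`jobid`) come from the review, not from a verified reading of events.py:

```python
from dataclasses import MISSING, dataclass, field, fields

def _has_default(fld):
    # A field is optional if it has either a default or a default_factory.
    return fld.default is not MISSING or fld.default_factory is not MISSING

@dataclass
class JobErrorEvent:
    # Canonical name is job_id; "jobid" is a legacy alias read from record extras.
    job_id: int = field(metadata={"alias": "jobid"})

def from_extra(cls, extra):
    kw = {}
    for fld in fields(cls):
        alias = fld.metadata.get("alias")
        if fld.name in extra:                        # canonical key wins
            kw[fld.name] = extra[fld.name]
        elif alias is not None and alias in extra:   # then fall back to the alias
            kw[fld.name] = extra[alias]
        elif not _has_default(fld):
            raise ValueError(f"missing required attribute {(alias or fld.name)!r}")
    return cls(**kw)

# Both spellings now parse, which is the asdict() round-trip the PR documents:
canonical = from_extra(JobErrorEvent, {"job_id": 1})
legacy = from_extra(JobErrorEvent, {"jobid": 2})
```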
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/snakemake_interface_logger_plugins/events.py`:
- Line 305: ResourcesInfoEvent.nodes is incorrectly typed as Optional[list[str]]
while the docstring describes it as the number of provided remote nodes; change
the type annotation on ResourcesInfoEvent.nodes to an integer (e.g.,
Optional[int] or int as appropriate) and update any related
validation/serialization code in the ResourcesInfoEvent class to treat nodes as
a count rather than a list so consumers receive the correct numeric contract.
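A minimal sketch of the corrected field contract; the class name is taken from the review comment, the surrounding fields are omitted, and `Optional[int]` is one of the two options the finding allows:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ResourcesInfoEvent:
    # nodes is the number of provided remote nodes, so it is a count, not a list of names.
    nodes: Optional[int] = None

event = ResourcesInfoEvent(nodes=4)
```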

---

Duplicate comments:
In `@README.md`:
- Around line 286-339: The README references removed dataclass names (e.g.,
events.Error, events.JobInfo) and a manual event_map instead of the current
exported classes and shared parser; update the table and the example to use the
real exported names (e.g., ErrorEvent, JobInfoEvent, JobStartedEvent, etc.) and
replace the handwritten mapping/dispatch with the shared parser function
LogEventData.from_record() (or show how to call that helper) so the docs match
the actual symbols (LogEvent, LogEventData.from_record, and the *Event classes)
and cannot drift.

In `@src/snakemake_interface_logger_plugins/events.py`:
- Around line 42-49: The loop in _from_extra_default currently only checks the
alias (fld.metadata["alias"]) against extra, causing canonical dataclass keys
(fld.name) to be ignored; modify the loop in the function that iterates over
fields(cls) so it first looks for the canonical key in extra (e.g., if fld.name
in extra then kw[fld.name] = extra[fld.name]), then falls back to the alias
(name = fld.metadata.get("alias", fld.name) and check extra[name]) before
raising the ValueError; preserve the existing guard that skips fields already in
kw and retain the field_has_default() check for required fields.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0ec7606c-80d0-45cb-bc75-ab70dd6137fa

📥 Commits

Reviewing files that changed from the base of the PR and between e886d60 and ba9691e.

⛔ Files ignored due to path filters (1)
  • pyproject.toml is excluded by !pyproject.toml
📒 Files selected for processing (3)
  • README.md
  • src/snakemake_interface_logger_plugins/events.py
  • tests/tests.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/tests.py

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 2

🧹 Nitpick comments (1)
src/snakemake_interface_logger_plugins/events.py (1)

312-319: Consider serialization implications of storing exception objects.

The exception field stores a BaseException directly. While valid, this may cause issues if the event is serialized (e.g., pickled or JSON-encoded). If serialization is expected, consider storing the exception type and message as strings instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/snakemake_interface_logger_plugins/events.py` around lines 312 - 319, The
DebugDagEvent dataclass currently stores an exception object in the exception
field which can break serialization; change the model to store serializable
representations (e.g., exception_type: Optional[str] and exception_message:
Optional[str]) instead of BaseException, or serialize the exception into these
strings inside DebugDagEvent (and optionally keep a transient/non-serialized
raw_exception attribute if runtime access is needed); update any code paths that
construct DebugDagEvent (references to DebugDagEvent and its exception field) to
populate exception_type and exception_message from type(exc).__name__ and
str(exc) (or repr) and remove reliance on the BaseException instance for
serialization.
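The string-based shape the nitpick proposes can be sketched as follows. This is a hypothetical model, not the shipped `DebugDagEvent`; the field names `exception_type`, `exception_message`, and `raw_exception` come from the review text:

```python
import json
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DebugDagEvent:
    # Serializable representation of the exception.
    exception_type: Optional[str] = None
    exception_message: Optional[str] = None
    # Runtime-only handle; excluded from comparisons and repr, easy to drop before serializing.
    raw_exception: Optional[BaseException] = field(default=None, compare=False, repr=False)

    @classmethod
    def from_exception(cls, exc: BaseException) -> "DebugDagEvent":
        return cls(
            exception_type=type(exc).__name__,
            exception_message=str(exc),
            raw_exception=exc,
        )

event = DebugDagEvent.from_exception(ValueError("bad DAG edge"))
# JSON-encoding now works because only strings are included in the payload.
payload = json.dumps(
    {"type": event.exception_type, "message": event.exception_message}
)
```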
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/snakemake_interface_logger_plugins/events.py`:
- Around line 172-188: The WorkflowStartedEvent dataclass declares snakefile:
Optional[str] but provides no default, so _from_extra_default will treat it as
required and raise if missing; fix by giving the dataclass field a default of
None (change the declaration in WorkflowStartedEvent to snakefile: Optional[str]
= None) so missing snakefile in extra is allowed, and keep the _from_extra logic
that converts present values to str and passes snakefile=snakefile into
_from_extra_default.

In `@tests/tests.py`:
- Around line 97-105: Remove the duplicate test event: in tests.py within the
EVENTS list remove the redundant logevents.JobErrorEvent(job_id=123) so only one
JobErrorEvent for job_id=123 remains; locate the EVENTS list and delete the
second occurrence (the one after logevents.ShellCmdEvent) to prevent duplicated
test input affecting assertions that rely on unique events.

---

Nitpick comments:
In `@src/snakemake_interface_logger_plugins/events.py`:
- Around line 312-319: The DebugDagEvent dataclass currently stores an exception
object in the exception field which can break serialization; change the model to
store serializable representations (e.g., exception_type: Optional[str] and
exception_message: Optional[str]) instead of BaseException, or serialize the
exception into these strings inside DebugDagEvent (and optionally keep a
transient/non-serialized raw_exception attribute if runtime access is needed);
update any code paths that construct DebugDagEvent (references to DebugDagEvent
and its exception field) to populate exception_type and exception_message from
type(exc).__name__ and str(exc) (or repr) and remove reliance on the
BaseException instance for serialization.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ba850ec3-7741-4342-9cea-123c605eea75

📥 Commits

Reviewing files that changed from the base of the PR and between ba9691e and e5fdd66.

📒 Files selected for processing (2)
  • src/snakemake_interface_logger_plugins/events.py
  • tests/tests.py

Comment on lines +172 to +188

```python
@dataclass
class WorkflowStartedEvent(LogEventData):
    event = LogEvent.WORKFLOW_STARTED

    workflow_id: uuid.UUID
    snakefile: Optional[str]

    @classmethod
    def _from_extra(cls, extra: StrMap) -> Self:
        snakefile = extra.get("snakefile", None)
        if snakefile is not None:
            try:
                # Try to convert to string - this should work for PosixPath and other path-like objects
                snakefile = str(snakefile)
            except (TypeError, ValueError) as e:
                raise ValueError(f"Could not convert snakefile to string: {e}")
        return _from_extra_default(cls, extra, snakefile=snakefile)
```

⚠️ Potential issue | 🟡 Minor

Missing default for optional snakefile field.

The snakefile field is typed as Optional[str] but has no default value. This means _from_extra_default will raise a ValueError if snakefile is missing from the extra dict, even though None is a valid value per the type hint.

Proposed fix
```diff
 @dataclass
 class WorkflowStartedEvent(LogEventData):
     event = LogEvent.WORKFLOW_STARTED
 
     workflow_id: uuid.UUID
-    snakefile: Optional[str]
+    snakefile: Optional[str] = None
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/snakemake_interface_logger_plugins/events.py` around lines 172 - 188, The
WorkflowStartedEvent dataclass declares snakefile: Optional[str] but provides no
default, so _from_extra_default will treat it as required and raise if missing;
fix by giving the dataclass field a default of None (change the declaration in
WorkflowStartedEvent to snakefile: Optional[str] = None) so missing snakefile in
extra is allowed, and keep the _from_extra logic that converts present values to
str and passes snakefile=snakefile into _from_extra_default.
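The behavior behind this finding is plain dataclass semantics: `Optional[...]` only widens the type annotation, it does not supply a default value. A quick self-contained check, with field names mirroring the review rather than the shipped class:

```python
import uuid
from dataclasses import dataclass
from typing import Optional

@dataclass
class WithoutDefault:
    workflow_id: uuid.UUID
    snakefile: Optional[str]         # Optional in type only - still a required argument

@dataclass
class WithDefault:
    workflow_id: uuid.UUID
    snakefile: Optional[str] = None  # now genuinely optional

wid = uuid.uuid4()

missing = False
try:
    WithoutDefault(workflow_id=wid)  # raises TypeError: missing 'snakefile'
except TypeError:
    missing = True

ok = WithDefault(workflow_id=wid)    # fine; snakefile defaults to None
```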

Comment on lines +97 to +105

```python
        logevents.JobErrorEvent(job_id=123),
        logevents.JobStartedEvent(job_ids=[123, 321]),
        logevents.JobFinishedEvent(job_id=123),
        logevents.ShellCmdEvent(
            job_id=123,
            shellcmd="echo 'Hello, World!'",
            rule_name="example_rule",
        ),
        logevents.JobErrorEvent(job_id=123),
```

⚠️ Potential issue | 🟡 Minor

Duplicate JobErrorEvent in test data.

JobErrorEvent(job_id=123) appears twice in the EVENTS list (lines 97 and 105). This is likely unintentional and could mask test failures if the duplicate causes issues.

Proposed fix - remove duplicate
```diff
         logevents.JobInfoEvent(
             job_id=123,
             rule_name="example_rule",
             threads=4,
             input=["input1.txt", "input2.txt"],
             output=["output.txt"],
             log=["log.txt"],
             benchmark="benchmark.csv",
             wildcards={"foo": "bar"},
         ),
-        logevents.JobErrorEvent(job_id=123),
         logevents.JobStartedEvent(job_ids=[123, 321]),
         logevents.JobFinishedEvent(job_id=123),
         logevents.ShellCmdEvent(
             job_id=123,
             shellcmd="echo 'Hello, World!'",
             rule_name="example_rule",
         ),
         logevents.JobErrorEvent(job_id=123),
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/tests.py` around lines 97 - 105, Remove the duplicate test event: in
tests.py within the EVENTS list remove the redundant
logevents.JobErrorEvent(job_id=123) so only one JobErrorEvent for job_id=123
remains; locate the EVENTS list and delete the second occurrence (the one after
logevents.ShellCmdEvent) to prevent duplicated test input affecting assertions
that rely on unique events.
