Skip to content

Conversation

@bmchalenv
Copy link
Contributor

@bmchalenv bmchalenv commented Jan 13, 2026

Add support for ROS parameter YAML on startup and rename message diagnostics to greenwave diagnostics to match parameter yaml group name. Look at the example.yaml if you want to see how it is integrated.

Fixes a few bugs/makes the system robust also. Specifically #16 .

Signed-off-by: Blake McHale <bmchale@nvidia.com>
Signed-off-by: Blake McHale <bmchale@nvidia.com>
Signed-off-by: Blake McHale <bmchale@nvidia.com>
Signed-off-by: Blake McHale <bmchale@nvidia.com>
@bmchalenv bmchalenv changed the title Use parameter yaml on startup for configuring greenwave monitor Use parameter yaml on startup for configuring greenwave monitor + rename to greenwave diagnostics Jan 13, 2026
@greptile-apps
Copy link

greptile-apps bot commented Jan 13, 2026

Greptile Summary

This PR adds YAML-based parameter configuration for the greenwave monitor and renames message_diagnostics to greenwave_diagnostics throughout the codebase to better align with the parameter namespace.

Key Changes:

  • Parameter Loading: Topics can now be configured via YAML with expected_frequency and tolerance parameters under the greenwave_diagnostics namespace
  • Deferred Initialization: Topic discovery is now deferred 100ms after node construction to allow the ROS graph to settle, improving reliability when topics don't exist immediately
  • Destructor Cleanup: Both GreenwaveMonitor and MinimalPublisher now have explicit destructors that cancel timers and clear diagnostics before the base Node destructor runs, fixing issue Greenwave monitor sometimes exits with return code -11, causes flaky test. #16 (exit code -11 crash)
  • Namespace Rename: Complete rename from message_diagnostics to greenwave_diagnostics for consistency with the parameter group name
  • Improved Topic Discovery: Changed from get_topic_names_and_types() to get_publishers_info_by_topic() for more reliable type discovery with configurable retries

Bug Fixes:

  • Fixes Greenwave monitor sometimes exits with return code -11, causes flaky test. #16: The explicit destructors prevent accessing invalid node state during shutdown by canceling timers and clearing resources in the correct order
  • Negative tolerance values are now clamped to 0.0 with a warning
  • Topics with invalid expected_frequency (≤ 0) are monitored but without frequency validation settings
  • Empty topic strings from the default parameter value are now skipped

Testing:

The PR includes comprehensive test coverage for YAML parameter loading, including validation of integer parameter conversion, negative parameter handling, and proper monitoring behavior for various configuration scenarios.

Confidence Score: 4/5

  • This PR is safe to merge with careful attention to the destructor implementation
  • The PR successfully addresses issue Greenwave monitor sometimes exits with return code -11, causes flaky test. #16 with explicit destructors and adds valuable YAML configuration support. The implementation is well-tested with comprehensive test coverage. Score is 4 rather than 5 due to the complexity of the destructor cleanup order and the init_timer_ self-cancellation pattern, which while likely safe, could benefit from additional stress testing under rapid shutdown scenarios
  • greenwave_monitor/src/greenwave_monitor.cpp - Pay attention to the destructor cleanup order and init_timer_ callback self-cancellation pattern

Important Files Changed

Filename Overview
greenwave_monitor/src/greenwave_monitor.cpp Adds YAML parameter loading, deferred initialization, and destructor cleanup to fix exit code -11 crash
greenwave_monitor/include/greenwave_monitor.hpp Adds destructor, deferred_init method, and updates function signatures for parameter loading
greenwave_monitor/include/minimal_publisher_node.hpp Adds destructor with timer cancellation and diagnostics cleanup
greenwave_monitor/test/test_greenwave_monitor.py Adds YAML parameter loading tests and expands test coverage for configuration scenarios

Sequence Diagram

sequenceDiagram
    participant User
    participant ROS2
    participant GreenwaveMonitor
    participant InitTimer
    participant TopicDiscovery
    participant Diagnostics

    User->>ROS2: Launch with YAML config
    ROS2->>GreenwaveMonitor: Constructor(options)
    GreenwaveMonitor->>GreenwaveMonitor: allow_undeclared_parameters(true)
    GreenwaveMonitor->>GreenwaveMonitor: automatically_declare_parameters_from_overrides(true)
    GreenwaveMonitor->>GreenwaveMonitor: declare_parameter("topics", {""})
    GreenwaveMonitor->>GreenwaveMonitor: create_wall_timer(1s, timer_callback)
    GreenwaveMonitor->>InitTimer: create_wall_timer(100ms, deferred_init)
    Note over InitTimer: Wait 100ms for ROS graph to settle
    
    InitTimer->>GreenwaveMonitor: deferred_init()
    GreenwaveMonitor->>GreenwaveMonitor: add_topics_from_parameters()
    GreenwaveMonitor->>GreenwaveMonitor: list_parameters("greenwave_diagnostics", 10)
    GreenwaveMonitor->>GreenwaveMonitor: Parse topic names from params
    GreenwaveMonitor->>GreenwaveMonitor: get_parameter("topics")
    
    loop For each topic
        GreenwaveMonitor->>TopicDiscovery: find_topic_type(topic, retries=5, wait=0.5s)
        TopicDiscovery->>ROS2: get_publishers_info_by_topic(topic)
        ROS2-->>TopicDiscovery: publisher info
        TopicDiscovery-->>GreenwaveMonitor: topic type
        GreenwaveMonitor->>GreenwaveMonitor: create_subscription(topic)
        GreenwaveMonitor->>Diagnostics: create GreenwaveDiagnostics
        
        alt Has expected_frequency param
            GreenwaveMonitor->>GreenwaveMonitor: get_double_param(freq_param)
            GreenwaveMonitor->>GreenwaveMonitor: get_double_param(tol_param)
            alt expected_frequency > 0
                GreenwaveMonitor->>Diagnostics: setExpectedDt(frequency, tolerance)
            else expected_frequency <= 0
                GreenwaveMonitor->>GreenwaveMonitor: WARN: skipping parameter settings
            end
        end
    end
    
    GreenwaveMonitor->>GreenwaveMonitor: create_service(manage_topic)
    GreenwaveMonitor->>GreenwaveMonitor: create_service(set_expected_frequency)
    
    loop Every 1 second
        GreenwaveMonitor->>Diagnostics: publishDiagnostics()
        Diagnostics->>ROS2: publish to /diagnostics
    end
    
    User->>ROS2: Shutdown (SIGINT)
    ROS2->>GreenwaveMonitor: ~GreenwaveMonitor()
    GreenwaveMonitor->>GreenwaveMonitor: timer_->cancel()
    GreenwaveMonitor->>GreenwaveMonitor: init_timer_->cancel()
    GreenwaveMonitor->>Diagnostics: greenwave_diagnostics_.clear()
    GreenwaveMonitor->>GreenwaveMonitor: subscriptions_.clear()
    Note over GreenwaveMonitor: Clean shutdown prevents exit code -11
Loading

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 files reviewed, 5 comments

Edit Code Review Agent Settings | Greptile

Signed-off-by: Blake McHale <bmchale@nvidia.com>
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 files reviewed, 5 comments

Edit Code Review Agent Settings | Greptile

Signed-off-by: Blake McHale <bmchale@nvidia.com>
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 files reviewed, 5 comments

Edit Code Review Agent Settings | Greptile

Signed-off-by: Blake McHale <bmchale@nvidia.com>
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Signed-off-by: Blake McHale <bmchale@nvidia.com>
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

@sgillen
Copy link
Collaborator

sgillen commented Jan 13, 2026

This also closes #19

expected_frequency: 100.0
tolerance: 10.0
/image_topic:
expected_frequency: 0.0 # Invalid parameter settings are ignored
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's elaborate - ignored and we use the default "report topic frequency but don't check it" behavior

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should include the string topic here as well? Also with expected_frequency = 0?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We're missing the copyright header here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left out the string topic on purpose so that it gets picked up by the topics:= parameter. I still think that parameter is useful for just starting the monitor without frequency and tolerance set. Wanted to make sure there was still an example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved topics:= into the YAML so that users can reference it.

executable='greenwave_monitor',
name='greenwave_monitor',
output='log',
parameters=[
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote we have config file contain all topics, and leave a commented out version showing how to specify topic inline like before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know what you think about the new yaml. I like the idea of leaving a commented out version.

}

std::string topic_name = remainder.substr(0, last_dot);
if (!topic_name.empty() && topic_name[0] == '/') {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are basically limiting ourselves to globally namespaces topics in the params? if we relax that condition does the topic resolution handle namespaces gracefully? or is it easier to just have this restriction?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could work, I was trying to avoid complexity by not testing/implementing that in this MR. I'll try it out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nvm I recall why I didn't, it adds more complexity to all of our logic. We'd need to support it in add topic and greenwave diagnostics. There would have to be some form of discovery or way to find the namespace. For instance, "/camera0/image_raw" and "/camera1/image_raw", how do we tell which one to match the namespace too? I thought it was too much to handle for this MR and even my previous MR. My opinion is that should be done separately since it strays strongly from the point here.

if (expected_frequency > 0.0) {
greenwave_diagnostics_[topic]->setExpectedDt(expected_frequency, tolerance);
} else {
RCLCPP_WARN(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should be RCLCPP_DEBUG instead? I can imagine cases where this just causes annoying console spam.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also isn't expected_frequency == 0.0 expected behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expected_frequency == 0.0 is bad, we reject it in set expected frequency service so I reject it here. Was trying to keep our logic the same across the board. If we want to allow a topic to be specified as 0 frequency, I'm fine with doing that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theoretically the WARN message should never appear unless the user has misconfigured their YAML.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they want to not specify expected frequency they should use topics:= instead. Another alternative is to add back the enabled flag, that was part of the reason I had that parameter in the more complex parameter MR.

Signed-off-by: Blake McHale <bmchale@nvidia.com>
Signed-off-by: Blake McHale <bmchale@nvidia.com>
Signed-off-by: Blake McHale <bmchale@nvidia.com>
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

15 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Comment on lines +56 to +60
init_timer_ = this->create_wall_timer(
100ms, [this]() {
init_timer_->cancel();
deferred_init();
});
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: timer cancels itself then accesses this

the init_timer_ callback cancels itself on line 58, then immediately calls deferred_init() which accesses member variables. if the node is being destroyed during this callback, the timer cancellation in the destructor (line 48-50 in the header) may not prevent the callback from continuing to execute deferred_init(), potentially accessing destroyed node state

Suggested change
init_timer_ = this->create_wall_timer(
100ms, [this]() {
init_timer_->cancel();
deferred_init();
});
init_timer_ = this->create_wall_timer(
100ms, [this, timer_weak = std::weak_ptr<rclcpp::TimerBase>()]() mutable {
if (auto timer = timer_weak.lock()) {
timer->cancel();
deferred_init();
}
});
std::static_pointer_cast<rclcpp::WallTimer<std::function<void()>>>(init_timer_)->get_timer_handle();

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants