Skip to content

Launch Manager loads the new configuration file#248

Open
paulquiring wants to merge 9 commits into
eclipse-score:mainfrom
etas-contrib:feature/lm-use-new-configuration
Open

Launch Manager loads the new configuration file#248
paulquiring wants to merge 9 commits into
eclipse-score:mainfrom
etas-contrib:feature/lm-use-new-configuration

Conversation

@paulquiring

@paulquiring paulquiring commented Jun 15, 2026

Copy link
Copy Markdown
Contributor
  • Map user-facing json config to the new flatbuffer config
  • Load new launch manager config
  • Introduce adapters to make the rest of the code work with the new configuration content
  • Remove the legacy configuration files

Closes: #209

@github-actions

github-actions Bot commented Jun 15, 2026

Copy link
Copy Markdown

License Check Results

🚀 The license check job ran with the Bazel command:

bazel run --lockfile_mode=error //:license-check

Status: ⚠️ Needs Review

Click to expand output
[License Check Output]
Extracting Bazel installation...
Starting local Bazel server (8.4.2) and connecting to it...
INFO: Invocation ID: 87349e67-6baf-4dc4-8451-12ef92285d38
Computing main repo mapping: 
Computing main repo mapping: 
Loading: 
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
    currently loading: 
Loading: 0 packages loaded
    currently loading: 
Loading: 0 packages loaded
    currently loading: 
Analyzing: target //:license-check (1 packages loaded, 0 targets configured)
Analyzing: target //:license-check (1 packages loaded, 0 targets configured)

Analyzing: target //:license-check (30 packages loaded, 10 targets configured)

Analyzing: target //:license-check (82 packages loaded, 10 targets configured)

Analyzing: target //:license-check (86 packages loaded, 10 targets configured)

Analyzing: target //:license-check (138 packages loaded, 2721 targets configured)

Analyzing: target //:license-check (150 packages loaded, 5404 targets configured)

Analyzing: target //:license-check (155 packages loaded, 5453 targets configured)

Analyzing: target //:license-check (155 packages loaded, 5453 targets configured)

Analyzing: target //:license-check (155 packages loaded, 5453 targets configured)

Analyzing: target //:license-check (158 packages loaded, 7340 targets configured)

Analyzing: target //:license-check (160 packages loaded, 9062 targets configured)

Analyzing: target //:license-check (162 packages loaded, 10142 targets configured)

Analyzing: target //:license-check (162 packages loaded, 10142 targets configured)

Analyzing: target //:license-check (162 packages loaded, 10142 targets configured)

Analyzing: target //:license-check (162 packages loaded, 10142 targets configured)

Analyzing: target //:license-check (162 packages loaded, 10142 targets configured)

INFO: Analyzed target //:license-check (163 packages loaded, 10268 targets configured).
[13 / 16] JavaToolchainCompileClasses external/rules_java+/toolchains/platformclasspath_classes; 0s disk-cache, processwrapper-sandbox
[14 / 16] JavaToolchainCompileBootClasspath external/rules_java+/toolchains/platformclasspath.jar; 0s disk-cache, processwrapper-sandbox
INFO: Found 1 target...
Target //:license.check.license_check up-to-date:
  bazel-bin/license.check.license_check
  bazel-bin/license.check.license_check.jar
INFO: Elapsed time: 31.286s, Critical Path: 2.50s
INFO: 16 processes: 12 internal, 3 processwrapper-sandbox, 1 worker.
INFO: Build completed successfully, 16 total actions
INFO: Running command line: bazel-bin/license.check.license_check ./formatted.txt <args omitted>
usage: org.eclipse.dash.licenses.cli.Main [-batch <int>] [-cd <url>]
       [-confidence <int>] [-ef <url>] [-excludeSources <sources>] [-help] [-lic
       <url>] [-project <shortname>] [-repo <url>] [-review] [-summary <file>]
       [-timeout <seconds>] [-token <token>]

Comment thread scripts/config_mapping/lifecycle_config.py
@paulquiring paulquiring marked this pull request as ready for review June 16, 2026 08:02
Comment thread score/launch_manager/daemon/src/configuration/configuration_adapter.cpp Outdated
Comment thread score/launch_manager/daemon/src/configuration/configuration_adapter.hpp Outdated
Comment thread score/launch_manager/daemon/src/main.cpp Outdated
@S-MOHAMD

Copy link
Copy Markdown
Contributor

Please lnk corresponding issue/bug.
see: https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue

@paulquiring paulquiring force-pushed the feature/lm-use-new-configuration branch from f13cffe to eb807a1 Compare June 18, 2026 08:28
@github-actions

Copy link
Copy Markdown

The created documentation from the pull request is available at: docu-html

@NicolasFussberger NicolasFussberger moved this from Backlog to In Progress in LCM - Lifecycle & Health FT Jun 18, 2026
// strdup() returns nullptr on OOM. On this embedded target, OOM during daemon
// startup is unrecoverable — the OS will terminate the process.
size_t arg_index = 0U;
startup.argv_[arg_index++] = strdup(startup.executable_path_.c_str());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Du we need to strdup the string here or should we just point to the c_str()?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use comp.deployment_config.executable_path.c_str(), but this would create a mixed ownership between the configuration object in main and the adapter. The Adapter copies the configuration so that it owns it. Yes this costs a bit of resources but is the simplest and clearest way currently to bridge the old to new configuration.

*************

- The ``launch_manager`` binary built and available on your target.
- Python 3 with the ``jsonschema`` package installed (required for config validation).

@NicolasFussberger NicolasFussberger Jun 23, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the user does not really need to be aware of these things as this is all handled in the background. python and jsonschema dependency is automatically handled via bazel. I don't think anyone (besides LaunchManager developers) should try to invoke lifecycle_config.py themselves.

All the user needs to be aware of is:

  • to use the provided bazel abstraction
  • How to write a config file
  • start launch_manager with "-c" argument
load("//:defs.bzl", "launch_manager_config")

launch_manager_config(
    name = "examples_test_config",
    config = ":lifecycle_demo_test.json",
    flatbuffer_out_dir = "etc",
)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Step 1 — Write the JSON config
==============================

Create a file ``my_config.json``:

@NicolasFussberger NicolasFussberger Jun 23, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should just link to an example config (or include the example config) from examples/demo_verification/lifecycle_demo_test.json so that we do not duplicate a config here.

For example:

  .. dropdown:: example_launch_manager_config.json

     .. literalinclude:: /../../../examples/demo_verification/lifecycle_demo_test.json
        :language: json

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This worked as suggested and makes the getting stared much shorter!

/// @param [in] f_recoveryClient_r Interface to the launch manager for recovery
/// @param [in] f_processStateReader_r Process state reader object for PHM daemon
/// @param [in] f_bufferConfig_r Configuration settings for constructing workers
/// @return Construction is successful (true), otherwise failure (false)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can bring back some of the doxygen as that seems still valid and to some extent helpful

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

try
{
try
supervised_components_.clear();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we move the setup of supervised_components into the init method?
Other methods also assume this vector being already filled which creates a non-obvious dependency that createProcessStates must always be called first.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// Accessing the same index (vector <-> config vector) is mapping the same configuration.
// coverity[cert_exp34_c_violation] f_interfaceIpcs_r is created from PhmMonitorInterface
// coverity[dereference] f_interfaceIpcs_r is created from PhmMonitorInterface same size assured
auto refProcessIndex{flatBuffer_p->hmMonitorInterface()->Get(index)->refProcessIndex()};

@NicolasFussberger NicolasFussberger Jun 23, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously this was getting the processIndex via the config. Now we rely on the assumption that for every component in supervised_components vector, a CheckpointIpcServer and ProcessState have been created with same ordering as in supervised_components vector. This assumption should be fine now, as there is a 1:1 mapping between Process and Alive supervision (this was not true before, as there could be multiple independent checkpoint reporters per process).

I wonder if we should make this more obvious by using the same index for those vectors.

E.g.

        // before
        uint32_t index{0U};
        f_interfaces_r.reserve(f_interfaceIpcs_r.size());
        for (auto& interfaceIpc : f_interfaceIpcs_r)
        {
            f_interfaces_r.emplace_back(interfaceIpc, interfaceIpc.getPath().data());
            f_processStates_r.at(static_cast<size_t>(index)).attachObserver(f_interfaces_r.back());

            logger_r.LogDebug() << "Successfully created MonitorInterface:" << f_interfaces_r.back().getInterfaceName();
            index++;
        }
        // suggestion
        f_interfaces_r.reserve(supervised_components_.size());
        for(std::size_t compIndex = index; compIndex < supervised_components_.size(); compIndex++) {
            auto& interfaceIpc = f_interfaceIpcs_r.at(compIndex);
            f_interfaces_r.emplace_back(interfaceIpc, interfaceIpc.getPath().data());
            f_processStates_r.at(static_cast<size_t>(compIndex)).attachObserver(f_interfaces_r.back());
            logger_r.LogDebug() << "Successfully created MonitorInterface:" << f_interfaces_r.back().getInterfaceName();
        }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is much better, it's pushed.

<< std::string_view{f_exception_r.what()};
const auto* comp = supervised_components_[idx];
const std::string checkpointCfgName = comp->name + "_checkpoint";
const uint32_t checkpointId = 1U;

@NicolasFussberger NicolasFussberger Jun 23, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is worth it, but the checkpointId has to match this

static constexpr std::uint32_t kDefaultCheckpointId{1U};

I wonder if we should make this a static constant in score/launch_manager/src/daemon/src/alive_monitor/details/factory/StaticConfig.hpp which is then used in both places.

The background is, when this code was still supporting Deadline and Logical supervisions, then different kinds of checkpoints were required.
But now only Alive supervision support is left, which only needs sending "Alive notificiations" - for which he just use checkpointId 1.

@paulquiring paulquiring Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would need to reorganize the code a bit
Looking at:

score/launch_manager/src/daemon/src/alive_monitor/details/factory/StaticConfig.hpp
score/launch_manager/src/alive/src/alive.cpp

we might want to move StaticConfig.hpp into score/launch_manager/src/common or a similar location. We might want to create a ticket out if this comment to work in it later?


logger_r.LogDebug() << kLogPrefix << "Successfully created alive supervision worker object:"
<< f_alive_r.back().getConfigName();
continue;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this case should not really happen. If the component is supervised then alive supervision had to be specified.
Maybe we should have an assert here?

aliveSupCfg.checkpointBufferSize = bufferConfig_r.bufferSizeAliveSupervision;
aliveSupCfg.recoveryClient = f_recoveryClient_r;

aliveSupCfg.processIdentifier = score::lcm::IdentifierHash{comp->name};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I some other place the function getProcessId was used to calculate the process Id. Maybe we should use the same way, to avoid accidentally using a different string as a base.

Maybe change getProcessId to this, then the decision which string is hashed is in a single place:

score::lcm::IdentifierHash getProcessId(const score::mw::launch_manager::configuration::ComponentConfig* config) {
    return score::lcm::IdentifierHash{comp->name};
}

void logConfiguration() noexcept(true);

/// @brief Configured watchdog devices
/// By default, no watchdog device is configured

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should bring back some of this doxygen, thats seems rather useful

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

logConfiguration();
return true;
const auto& alive_sup = config.aliveSupervision();
if (alive_sup.evaluation_cycle_ms != 0U)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alive_sup.evaluation_cycle_ms == 0 should never happen. Maybe we should make this an assert?
Maybe SCORE_LANGUAGE_FUTURECPP_PRECONDITION_PRD_MESSAGE

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going for SCORE_LANGUAGE_FUTURECPP_PRECONDITION_PRD_MESSAGE seems to be a good idea. I chose assert to be consistent with the rest of the code base. We should change this in the code base with a separate PR & Issue.

Comment thread score/launch_manager/daemon/src/configuration/configuration_adapter.cpp Outdated
Comment thread score/launch_manager/daemon/src/configuration/configuration_adapter.hpp Outdated
Comment thread score/launch_manager/daemon/src/configuration/configuration_adapter.cpp Outdated
Comment thread score/launch_manager/daemon/src/main.cpp Outdated
Comment thread score/launch_manager/daemon/src/configuration/configuration_adapter.cpp Outdated
Comment thread score/launch_manager/daemon/src/configuration/configuration_adapter.cpp Outdated
Comment thread score/launch_manager/daemon/src/alive_monitor/details/factory/FlatCfgFactory.cpp Outdated
// strdup() returns nullptr on OOM. On this embedded target, OOM during daemon
// startup is unrecoverable — the OS will terminate the process.
size_t arg_index = 0U;
startup.argv_[arg_index++] = strdup(startup.executable_path_.c_str());

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use comp.deployment_config.executable_path.c_str(), but this would create a mixed ownership between the configuration object in main and the adapter. The Adapter copies the configuration so that it owns it. Yes this costs a bit of resources but is the simplest and clearest way currently to bridge the old to new configuration.

const auto& wd = *wd_opt;
watchdog::DeviceConfig wdConfig{};
wdConfig.fileName = wd.device_file_path;
assert(wd.max_timeout_ms <= std::numeric_limits<std::uint16_t>::max());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make this a validation in score/launch_manager/src/daemon/src/configuration/details/flatbuffer_type_converters.cpp ?

class MachineConfigFactory : public watchdog::IDeviceConfigFactory
{
public:
/// @brief Holds different buffer sizes that may be configured in the HM Machine Config

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is no longer valid.
All these buffer sizes are currently not configurable anymore.

This seems like some fine tuning we may enable with e.g. CLI flags in the future. For now the default values should be sufficient

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to "Holds buffer sizes for supervision objects (currently using compile-time defaults)"

}
dep.target_process_id_ = IdentifierHash{dep_name};
os_process.dependencies_.push_back(dep);
}

@NicolasFussberger NicolasFussberger Jun 23, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here is a bug. We need to lookup the ReadyCondition process_state for the referenced component.

Image

I think it would be good to extend the UTs with these different state combinations to catch this

for (const auto& comp_name : fallback.depends_on) {
auto it = component_to_process_index.find(comp_name);
if (it != component_to_process_index.end()) {
fallback_state.process_indexes_.push_back(it->second);

@NicolasFussberger NicolasFussberger Jun 23, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here we also need to resolve the dependencies recursively.

E.g. fallback_run_target depends_on [CompA]
and CompA depends_on [CompB] then

fallback_run_target = [CompA, CompB]

I think it would be good to extend the UT with this case

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Migrate Launch Manager from the reading the old configuration file format to the new format

4 participants