Launch Manager loads the new configuration file#248
Conversation
License Check Results🚀 The license check job ran with the Bazel command: bazel run --lockfile_mode=error //:license-checkStatus: Click to expand output |
|
Please lnk corresponding issue/bug. |
f13cffe to
eb807a1
Compare
|
The created documentation from the pull request is available at: docu-html |
| // strdup() returns nullptr on OOM. On this embedded target, OOM during daemon | ||
| // startup is unrecoverable — the OS will terminate the process. | ||
| size_t arg_index = 0U; | ||
| startup.argv_[arg_index++] = strdup(startup.executable_path_.c_str()); |
There was a problem hiding this comment.
Du we need to strdup the string here or should we just point to the c_str()?
There was a problem hiding this comment.
We could use comp.deployment_config.executable_path.c_str(), but this would create a mixed ownership between the configuration object in main and the adapter. The Adapter copies the configuration so that it owns it. Yes this costs a bit of resources but is the simplest and clearest way currently to bridge the old to new configuration.
| ************* | ||
|
|
||
| - The ``launch_manager`` binary built and available on your target. | ||
| - Python 3 with the ``jsonschema`` package installed (required for config validation). |
There was a problem hiding this comment.
I think the user does not really need to be aware of these things as this is all handled in the background. python and jsonschema dependency is automatically handled via bazel. I don't think anyone (besides LaunchManager developers) should try to invoke lifecycle_config.py themselves.
All the user needs to be aware of is:
- to use the provided bazel abstraction
- How to write a config file
- start launch_manager with "-c" argument
load("//:defs.bzl", "launch_manager_config")
launch_manager_config(
name = "examples_test_config",
config = ":lifecycle_demo_test.json",
flatbuffer_out_dir = "etc",
)There was a problem hiding this comment.
| Step 1 — Write the JSON config | ||
| ============================== | ||
|
|
||
| Create a file ``my_config.json``: |
There was a problem hiding this comment.
I wonder if we should just link to an example config (or include the example config) from examples/demo_verification/lifecycle_demo_test.json so that we do not duplicate a config here.
For example:
.. dropdown:: example_launch_manager_config.json
.. literalinclude:: /../../../examples/demo_verification/lifecycle_demo_test.json
:language: jsonThere was a problem hiding this comment.
This worked as suggested and makes the getting stared much shorter!
| /// @param [in] f_recoveryClient_r Interface to the launch manager for recovery | ||
| /// @param [in] f_processStateReader_r Process state reader object for PHM daemon | ||
| /// @param [in] f_bufferConfig_r Configuration settings for constructing workers | ||
| /// @return Construction is successful (true), otherwise failure (false) |
There was a problem hiding this comment.
Maybe we can bring back some of the doxygen as that seems still valid and to some extent helpful
| try | ||
| { | ||
| try | ||
| supervised_components_.clear(); |
There was a problem hiding this comment.
should we move the setup of supervised_components into the init method?
Other methods also assume this vector being already filled which creates a non-obvious dependency that createProcessStates must always be called first.
| // Accessing the same index (vector <-> config vector) is mapping the same configuration. | ||
| // coverity[cert_exp34_c_violation] f_interfaceIpcs_r is created from PhmMonitorInterface | ||
| // coverity[dereference] f_interfaceIpcs_r is created from PhmMonitorInterface same size assured | ||
| auto refProcessIndex{flatBuffer_p->hmMonitorInterface()->Get(index)->refProcessIndex()}; |
There was a problem hiding this comment.
Previously this was getting the processIndex via the config. Now we rely on the assumption that for every component in supervised_components vector, a CheckpointIpcServer and ProcessState have been created with same ordering as in supervised_components vector. This assumption should be fine now, as there is a 1:1 mapping between Process and Alive supervision (this was not true before, as there could be multiple independent checkpoint reporters per process).
I wonder if we should make this more obvious by using the same index for those vectors.
E.g.
// before
uint32_t index{0U};
f_interfaces_r.reserve(f_interfaceIpcs_r.size());
for (auto& interfaceIpc : f_interfaceIpcs_r)
{
f_interfaces_r.emplace_back(interfaceIpc, interfaceIpc.getPath().data());
f_processStates_r.at(static_cast<size_t>(index)).attachObserver(f_interfaces_r.back());
logger_r.LogDebug() << "Successfully created MonitorInterface:" << f_interfaces_r.back().getInterfaceName();
index++;
}
// suggestion
f_interfaces_r.reserve(supervised_components_.size());
for(std::size_t compIndex = index; compIndex < supervised_components_.size(); compIndex++) {
auto& interfaceIpc = f_interfaceIpcs_r.at(compIndex);
f_interfaces_r.emplace_back(interfaceIpc, interfaceIpc.getPath().data());
f_processStates_r.at(static_cast<size_t>(compIndex)).attachObserver(f_interfaces_r.back());
logger_r.LogDebug() << "Successfully created MonitorInterface:" << f_interfaces_r.back().getInterfaceName();
}There was a problem hiding this comment.
Yes this is much better, it's pushed.
| << std::string_view{f_exception_r.what()}; | ||
| const auto* comp = supervised_components_[idx]; | ||
| const std::string checkpointCfgName = comp->name + "_checkpoint"; | ||
| const uint32_t checkpointId = 1U; |
There was a problem hiding this comment.
Not sure if this is worth it, but the checkpointId has to match this
I wonder if we should make this a static constant in score/launch_manager/src/daemon/src/alive_monitor/details/factory/StaticConfig.hpp which is then used in both places.
The background is, when this code was still supporting Deadline and Logical supervisions, then different kinds of checkpoints were required.
But now only Alive supervision support is left, which only needs sending "Alive notificiations" - for which he just use checkpointId 1.
There was a problem hiding this comment.
We would need to reorganize the code a bit
Looking at:
score/launch_manager/src/daemon/src/alive_monitor/details/factory/StaticConfig.hpp
score/launch_manager/src/alive/src/alive.cpp
we might want to move StaticConfig.hpp into score/launch_manager/src/common or a similar location. We might want to create a ticket out if this comment to work in it later?
|
|
||
| logger_r.LogDebug() << kLogPrefix << "Successfully created alive supervision worker object:" | ||
| << f_alive_r.back().getConfigName(); | ||
| continue; |
There was a problem hiding this comment.
I think this case should not really happen. If the component is supervised then alive supervision had to be specified.
Maybe we should have an assert here?
| aliveSupCfg.checkpointBufferSize = bufferConfig_r.bufferSizeAliveSupervision; | ||
| aliveSupCfg.recoveryClient = f_recoveryClient_r; | ||
|
|
||
| aliveSupCfg.processIdentifier = score::lcm::IdentifierHash{comp->name}; |
There was a problem hiding this comment.
I some other place the function getProcessId was used to calculate the process Id. Maybe we should use the same way, to avoid accidentally using a different string as a base.
Maybe change getProcessId to this, then the decision which string is hashed is in a single place:
score::lcm::IdentifierHash getProcessId(const score::mw::launch_manager::configuration::ComponentConfig* config) {
return score::lcm::IdentifierHash{comp->name};
}
| void logConfiguration() noexcept(true); | ||
|
|
||
| /// @brief Configured watchdog devices | ||
| /// By default, no watchdog device is configured |
There was a problem hiding this comment.
Maybe we should bring back some of this doxygen, thats seems rather useful
| logConfiguration(); | ||
| return true; | ||
| const auto& alive_sup = config.aliveSupervision(); | ||
| if (alive_sup.evaluation_cycle_ms != 0U) |
There was a problem hiding this comment.
alive_sup.evaluation_cycle_ms == 0 should never happen. Maybe we should make this an assert?
Maybe SCORE_LANGUAGE_FUTURECPP_PRECONDITION_PRD_MESSAGE
There was a problem hiding this comment.
Going for SCORE_LANGUAGE_FUTURECPP_PRECONDITION_PRD_MESSAGE seems to be a good idea. I chose assert to be consistent with the rest of the code base. We should change this in the code base with a separate PR & Issue.
| // strdup() returns nullptr on OOM. On this embedded target, OOM during daemon | ||
| // startup is unrecoverable — the OS will terminate the process. | ||
| size_t arg_index = 0U; | ||
| startup.argv_[arg_index++] = strdup(startup.executable_path_.c_str()); |
There was a problem hiding this comment.
We could use comp.deployment_config.executable_path.c_str(), but this would create a mixed ownership between the configuration object in main and the adapter. The Adapter copies the configuration so that it owns it. Yes this costs a bit of resources but is the simplest and clearest way currently to bridge the old to new configuration.
| const auto& wd = *wd_opt; | ||
| watchdog::DeviceConfig wdConfig{}; | ||
| wdConfig.fileName = wd.device_file_path; | ||
| assert(wd.max_timeout_ms <= std::numeric_limits<std::uint16_t>::max()); |
There was a problem hiding this comment.
Should we make this a validation in score/launch_manager/src/daemon/src/configuration/details/flatbuffer_type_converters.cpp ?
| class MachineConfigFactory : public watchdog::IDeviceConfigFactory | ||
| { | ||
| public: | ||
| /// @brief Holds different buffer sizes that may be configured in the HM Machine Config |
There was a problem hiding this comment.
The comment is no longer valid.
All these buffer sizes are currently not configurable anymore.
This seems like some fine tuning we may enable with e.g. CLI flags in the future. For now the default values should be sufficient
There was a problem hiding this comment.
changed to "Holds buffer sizes for supervision objects (currently using compile-time defaults)"
| } | ||
| dep.target_process_id_ = IdentifierHash{dep_name}; | ||
| os_process.dependencies_.push_back(dep); | ||
| } |
| for (const auto& comp_name : fallback.depends_on) { | ||
| auto it = component_to_process_index.find(comp_name); | ||
| if (it != component_to_process_index.end()) { | ||
| fallback_state.process_indexes_.push_back(it->second); |
There was a problem hiding this comment.
I think here we also need to resolve the dependencies recursively.
E.g. fallback_run_target depends_on [CompA]
and CompA depends_on [CompB] then
fallback_run_target = [CompA, CompB]
I think it would be good to extend the UT with this case

Closes: #209