Conversation
Pull request overview
Adds LFM2 tool-call parsing support to OVMS LLM I/O processing, wiring it into the existing OutputParser tool-parser selection and providing a dedicated LFM2 output parser test suite.
Changes:
- Introduced `Lfm2ToolParser` (unary + streaming parsing) under `src/llm/io_processing/lfm2/`.
- Registered the new parser in `OutputParser` (parser name `"lfm2"`) and Bazel build targets.
- Added comprehensive LFM2 output parser tests (including streaming scenarios and malformed inputs).
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| `src/test/llm/output_parsers/lfm2_output_parser_test.cpp` | New gtest coverage for LFM2 tool-call parsing (unary + streaming). |
| `src/llm/io_processing/output_parser.cpp` | Registers the `"lfm2"` tool parser and includes its header. |
| `src/llm/io_processing/lfm2/tool_parser.hpp` | Declares the `Lfm2ToolParser` state machine and parsing helpers. |
| `src/llm/io_processing/lfm2/tool_parser.cpp` | Implements LFM2 unary + streaming parsing and argument normalization. |
| `src/llm/BUILD` | Adds a Bazel library target for the LFM2 parser and links it into `output_parsers`. |
```cpp
        this->currentState = State::Content;
    }
}
```
`parseChunk` ignores `finishReason`. If generation ends while a tool call is incomplete (e.g., a missing `<|tool_call_end|>` or `)`), the parser may never emit the final arguments/content and will be left in a stuck state. Other parsers use `finishReason != NONE` to flush/close structures; consider doing the same here to finalize whatever has been accumulated.
```cpp
if (finishReason != ov::genai::GenerationFinishReason::NONE) {
    if (this->currentState == State::ToolCallParameters || this->currentState == State::ToolCallEnded) {
        if (!this->toolCall.arguments.empty()) {
            return wrapDeltaArgs(this->toolCall.arguments, toolCallIndex);
        }
    } else if (this->currentState == State::Content) {
        if (this->streamingPosition < this->streamingContent.size()) {
            auto content = this->streamingContent.substr(this->streamingPosition);
            this->streamingPosition = this->streamingContent.size();
            if (!content.empty()) {
                return wrapDeltaContent(content);
            }
        }
    }
}
```
Consider this. We need to remain functional even if `max_tokens` is exceeded.
```cpp
bool Lfm2ToolParser::parseInContentState() {
    size_t pos = this->streamingContent.find(TOOL_CALL_START_TAG, this->streamingPosition);
    size_t toolCallEndTagPos = this->streamingContent.find(TOOL_CALL_END_TAG, this->streamingPosition);
    if (toolCallEndTagPos != std::string::npos && pos == std::string::npos) {
        SPDLOG_LOGGER_INFO(llm_calculator_logger, "Detected end of tool call at position: {}", toolCallEndTagPos);
        this->streamingPosition = toolCallEndTagPos + TOOL_CALL_END_TAG.length();
        return false;
    }
    if (pos != std::string::npos) {
        this->streamingPosition = pos + TOOL_CALL_START_TAG.length() + TOOL_LIST_START_INDICATOR.length();
        this->currentState = State::ToolCallStarted;
        SPDLOG_LOGGER_INFO(llm_calculator_logger, "Detected start of tool call at position: {}", pos);
        return false;
    }
```
Streaming: `parseInContentState` advances `streamingPosition` past `<|tool_call_start|>[` as soon as it finds the tag, but it never emits the content that may precede the tag in the same buffer. Because `OutputParser` can call `toolParser->parseChunk()` with a buffer that contains both normal content and the start tag (during the UNKNOWN→TOOL transition), this will drop user-visible content. Please return a content delta for any prefix before the start tag (or keep the state as `Content` until the prefix has been emitted).
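The fix can be sketched as a helper that surfaces the prefix before jumping past the tag. This is a minimal standalone illustration, not the actual parser code: the function name and the single-buffer handling are assumptions, and only the `<|tool_call_start|>` tag from the snippet above is reused.

```cpp
#include <cassert>
#include <optional>
#include <string>

static const std::string TOOL_CALL_START_TAG = "<|tool_call_start|>";

// Hypothetical helper: if the start tag is found, return any plain content
// preceding it (so the caller can emit a content delta first) and advance
// `position` past the tag; otherwise leave `position` untouched.
std::optional<std::string> takePrefixBeforeToolCall(const std::string& buffer, size_t& position) {
    size_t tagPos = buffer.find(TOOL_CALL_START_TAG, position);
    if (tagPos == std::string::npos) {
        return std::nullopt;  // no tag yet; caller keeps accumulating
    }
    std::string prefix = buffer.substr(position, tagPos - position);
    position = tagPos + TOOL_CALL_START_TAG.length();
    if (prefix.empty()) {
        return std::nullopt;  // tag was at the current position; nothing to emit
    }
    return prefix;
}
```

In the real parser the returned prefix would be wrapped with `wrapDeltaContent` before the state switches to `ToolCallStarted`.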
```cpp
std::optional<rapidjson::Document> Lfm2ToolParser::parseChunk(const std::string& chunk, ov::genai::GenerationFinishReason finishReason) {
    if (chunk.empty()) {
        return std::nullopt;
    }

    this->streamingContent += chunk;

    if (parseNewContent()) {
        if (this->currentState == State::ToolCallParameters) {
            return BaseOutputParser::wrapFirstDelta(this->toolCall.name, toolCallIndex);
        }
        if (this->currentState == State::ToolCallEnded) {
            return wrapDeltaArgs(this->toolCall.arguments, toolCallIndex);
        }
        if (this->currentState == State::Content) {
            auto content = this->streamingContent.substr(this->streamingPosition);
            this->streamingPosition += content.size();
            return wrapDeltaContent(content);
        }
        if (this->currentState == State::AfterToolCall) {
            this->currentState = State::Content;
        }
    }

    return std::nullopt;
}
```
Streaming: `streamingContent` is only appended to and never trimmed, while `streamingPosition` monotonically increases. For long responses this can grow without bound and increase substring/search costs over time. Consider erasing the processed prefix (or keeping only an unprocessed tail) once `streamingPosition` moves forward, to keep memory and time bounded.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
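The suggested trimming can be sketched as follows. The member names mirror the snippet above, but this is a standalone illustration with an assumed `compact()` helper, not the actual class; it is only safe to call at points where no earlier offsets into the buffer are still held.

```cpp
#include <cassert>
#include <string>

// Minimal sketch: once a chunk has been fully processed, drop the consumed
// prefix so memory and find/substr costs stay proportional to the
// unprocessed tail rather than to the whole generation so far.
struct StreamBuffer {
    std::string streamingContent;
    size_t streamingPosition = 0;

    void compact() {
        if (streamingPosition > 0) {
            streamingContent.erase(0, streamingPosition);
            streamingPosition = 0;
        }
    }
};
```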
```cpp
const std::string Lfm2ToolParser::TOOL_ARGS_END_INDICATOR = ")";
const std::string Lfm2ToolParser::TOOL_SEPARATOR_STR = ", ";

void Lfm2ToolParser::writeArgumentOfAnyType(const rapidjson::Value& arg, rapidjson::Writer<rapidjson::StringBuffer>& writer) {
```
If that's a standalone function, could it be `static`?
Also, it sounds quite generic. Can we extract it to utils? Not sure if we already have such a file in `io_processing`...
```cpp
    }
    writer.EndObject();
} else {
    writer.String("");
```
Should we write anything at all if we're logging an error?
Otherwise the key will be left dangling, like this: `{key: bad_value, another_key: value}` -> `{"key", "another_key", "value"}`
```cpp
const char last = normalized.back();
if ((first == '{' && last == '}') || (first == '[' && last == ']')) {
    std::replace(normalized.begin(), normalized.end(), '\'', '"');
    SPDLOG_LOGGER_TRACE(llm_calculator_logger, "Argument contains curly braces or square brackets, replaced single quotes with double quotes for JSON parsing. Modified string: {}", normalized);
```
Is it always safe? What if we get an argument with a list of strings:
`["hello", "it's me"]`
?
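The concern can be demonstrated concretely. The sketch below mirrors the `std::replace` call from the snippet above in a hypothetical free function and shows that a blanket single-to-double-quote replacement also rewrites an apostrophe inside a string element, producing invalid JSON:

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Naive normalization as in the snippet above: replace every single quote
// with a double quote before JSON parsing. Works for simple Python-style
// lists, but corrupts any string that itself contains an apostrophe.
std::string naiveQuoteNormalize(std::string s) {
    std::replace(s.begin(), s.end(), '\'', '"');
    return s;
}
```

A safer approach would need to track whether the replacer is currently inside a quoted string (or use a real Python-literal parser) rather than substituting globally.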
You're right, I will think about how I can change it.
```cpp
size_t pos = 0;
int main_guard = 0;

while (pos != std::string::npos && main_guard < MAX_TOOL_CALLS) {
```
Is this guard really needed?
This parser will be used with models from 2.6B to 24B. If the 2.6B model is used for multi-turn, it gets confused: it generates tool calls and doesn't follow its usual template. To avoid infinite loops I would leave it in.
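The guard pattern under discussion can be sketched in isolation. This is an illustration of the bounded-scan idea, not the parser's actual loop; the `MAX_TOOL_CALLS` value and the function name are assumptions:

```cpp
#include <cassert>
#include <string>
#include <vector>

constexpr int MAX_TOOL_CALLS = 64;  // assumed cap, mirrors the guard above

// Scan for every occurrence of `needle`, but cap the number of iterations so
// a degenerate or repetitive model output can never spin the loop forever.
std::vector<size_t> findAllBounded(const std::string& text, const std::string& needle) {
    std::vector<size_t> hits;
    size_t pos = 0;
    int guard = 0;
    while (guard < MAX_TOOL_CALLS && (pos = text.find(needle, pos)) != std::string::npos) {
        hits.push_back(pos);
        pos += needle.size();
        ++guard;
    }
    return hits;
}
```

The cost of the guard is one integer comparison per iteration, so keeping it is cheap insurance against a confused small model emitting tool-call markers indefinitely.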
🛠 Summary
CVS-182500
Adds a tool parser for the LFM2 model.
🧪 Checklist