Conversation
Pull request overview
Adds LFM2 tool-call parsing support to OVMS LLM I/O processing, wiring it into the existing OutputParser tool-parser selection and providing a dedicated LFM2 output parser test suite.
Changes:
- Introduced `Lfm2ToolParser` (unary + streaming parsing) under `src/llm/io_processing/lfm2/`.
- Registered the new parser in `OutputParser` (parser name `"lfm2"`) and Bazel build targets.
- Added comprehensive LFM2 output parser tests (including streaming scenarios and malformed inputs).
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| `src/test/llm/output_parsers/lfm2_output_parser_test.cpp` | New gtest coverage for LFM2 tool-call parsing (unary + streaming). |
| `src/llm/io_processing/output_parser.cpp` | Registers the `"lfm2"` tool parser and includes its header. |
| `src/llm/io_processing/lfm2/tool_parser.hpp` | Declares the `Lfm2ToolParser` state machine and parsing helpers. |
| `src/llm/io_processing/lfm2/tool_parser.cpp` | Implements LFM2 unary + streaming parsing and argument normalization. |
| `src/llm/BUILD` | Adds a Bazel library target for the LFM2 parser and links it into `output_parsers`. |
```cpp
        this->currentState = State::Content;
    }
}
```
`parseChunk` ignores `finishReason`. If generation ends while a tool call is incomplete (e.g., a missing `<|tool_call_end|>` or `)`), the parser may never emit the final arguments/content and will be left in a stuck state. Other parsers use `finishReason != NONE` to flush/close structures; consider doing the same here to finalize whatever has been accumulated.
```cpp
if (finishReason != ov::genai::GenerationFinishReason::NONE) {
    if (this->currentState == State::ToolCallParameters || this->currentState == State::ToolCallEnded) {
        if (!this->toolCall.arguments.empty()) {
            return wrapDeltaArgs(this->toolCall.arguments, toolCallIndex);
        }
    } else if (this->currentState == State::Content) {
        if (this->streamingPosition < this->streamingContent.size()) {
            auto content = this->streamingContent.substr(this->streamingPosition);
            this->streamingPosition = this->streamingContent.size();
            if (!content.empty()) {
                return wrapDeltaContent(content);
            }
        }
    }
}
```
Consider this. We need to remain functional even if `max_tokens` is exceeded.
```cpp
bool Lfm2ToolParser::parseInContentState() {
    size_t pos = this->streamingContent.find(TOOL_CALL_START_TAG, this->streamingPosition);
    size_t toolCallEndTagPos = this->streamingContent.find(TOOL_CALL_END_TAG, this->streamingPosition);
    if (toolCallEndTagPos != std::string::npos && pos == std::string::npos) {
        SPDLOG_LOGGER_INFO(llm_calculator_logger, "Detected end of tool call at position: {}", toolCallEndTagPos);
        this->streamingPosition = toolCallEndTagPos + TOOL_CALL_END_TAG.length();
        return false;
    }
    if (pos != std::string::npos) {
        this->streamingPosition = pos + TOOL_CALL_START_TAG.length() + TOOL_LIST_START_INDICATOR.length();
        this->currentState = State::ToolCallStarted;
        SPDLOG_LOGGER_INFO(llm_calculator_logger, "Detected start of tool call at position: {}", pos);
        return false;
    }
```
Streaming: `parseInContentState` advances `streamingPosition` past `<|tool_call_start|>[` as soon as it finds the tag, but it never emits the content that may precede the tag in the same buffer. Because `OutputParser` can call `toolParser->parseChunk()` with a buffer that contains both normal content and the start tag (during the UNKNOWN→TOOL transition), this will drop user-visible content. Please return a content delta for any prefix before the start tag (or keep the state as `Content` until the prefix has been emitted).
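The fix can be sketched as a helper that surfaces the prefix before jumping past the tag. This is a minimal standalone illustration, not the actual parser code: the function name and the single-buffer handling are assumptions, and only the `<|tool_call_start|>` tag from the snippet above is reused.

```cpp
#include <cassert>
#include <optional>
#include <string>

static const std::string TOOL_CALL_START_TAG = "<|tool_call_start|>";

// Hypothetical helper: if the start tag is found, return any plain content
// preceding it (so the caller can emit a content delta first) and advance
// `position` past the tag; otherwise leave `position` untouched.
std::optional<std::string> takePrefixBeforeToolCall(const std::string& buffer, size_t& position) {
    size_t tagPos = buffer.find(TOOL_CALL_START_TAG, position);
    if (tagPos == std::string::npos) {
        return std::nullopt;  // no tag yet; caller keeps accumulating
    }
    std::string prefix = buffer.substr(position, tagPos - position);
    position = tagPos + TOOL_CALL_START_TAG.length();
    if (prefix.empty()) {
        return std::nullopt;  // tag was at the current position; nothing to emit
    }
    return prefix;
}
```

In the real parser the returned prefix would be wrapped with `wrapDeltaContent` before the state switches to `ToolCallStarted`.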
```cpp
std::optional<rapidjson::Document> Lfm2ToolParser::parseChunk(const std::string& chunk, ov::genai::GenerationFinishReason finishReason) {
    if (chunk.empty()) {
        return std::nullopt;
    }

    this->streamingContent += chunk;

    if (parseNewContent()) {
        if (this->currentState == State::ToolCallParameters) {
            return BaseOutputParser::wrapFirstDelta(this->toolCall.name, toolCallIndex);
        }
        if (this->currentState == State::ToolCallEnded) {
            return wrapDeltaArgs(this->toolCall.arguments, toolCallIndex);
        }
        if (this->currentState == State::Content) {
            auto content = this->streamingContent.substr(this->streamingPosition);
            this->streamingPosition += content.size();
            return wrapDeltaContent(content);
        }
        if (this->currentState == State::AfterToolCall) {
            this->currentState = State::Content;
        }
    }

    return std::nullopt;
}
```
Streaming: `streamingContent` is only appended to and never trimmed, while `streamingPosition` monotonically increases. For long responses this can grow without bound and increase substring/search costs over time. Consider erasing the processed prefix (or keeping only an unprocessed tail) once `streamingPosition` moves forward, to keep memory and time bounded.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
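The suggested trimming can be sketched as follows. The member names mirror the snippet above, but this is a standalone illustration with an assumed `compact()` helper, not the actual class; it is only safe to call at points where no earlier offsets into the buffer are still held.

```cpp
#include <cassert>
#include <string>

// Minimal sketch: once a chunk has been fully processed, drop the consumed
// prefix so memory and find/substr costs stay proportional to the
// unprocessed tail rather than to the whole generation so far.
struct StreamBuffer {
    std::string streamingContent;
    size_t streamingPosition = 0;

    void compact() {
        if (streamingPosition > 0) {
            streamingContent.erase(0, streamingPosition);
            streamingPosition = 0;
        }
    }
};
```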
```cpp
const std::string Lfm2ToolParser::TOOL_ARGS_END_INDICATOR = ")";
const std::string Lfm2ToolParser::TOOL_SEPARATOR_STR = ", ";

void Lfm2ToolParser::writeArgumentOfAnyType(const rapidjson::Value& arg, rapidjson::Writer<rapidjson::StringBuffer>& writer) {
```
If that's a standalone function, could it be `static`?
Also, it sounds quite generic. Can we extract it to utils? Not sure if we already have such a file in `io_processing`...
```cpp
    }
    writer.EndObject();
} else {
    writer.String("");
```
Should we write anything at all if we're logging an error?
Otherwise the key will be left dangling, like this: `{key: bad_value, another_key: value}` -> `{"key", "another_key", "value"}`
```cpp
const char last = normalized.back();
if ((first == '{' && last == '}') || (first == '[' && last == ']')) {
    std::replace(normalized.begin(), normalized.end(), '\'', '"');
    SPDLOG_LOGGER_TRACE(llm_calculator_logger, "Argument contains curly braces or square brackets, replaced single quotes with double quotes for JSON parsing. Modified string: {}", normalized);
```
Is it always safe? What if we get an argument with a list of strings:
`["hello", "it's me"]`
?
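The concern can be demonstrated concretely. The sketch below mirrors the `std::replace` call from the snippet above in a hypothetical free function and shows that a blanket single-to-double-quote replacement also rewrites an apostrophe inside a string element, producing invalid JSON:

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Naive normalization as in the snippet above: replace every single quote
// with a double quote before JSON parsing. Works for simple Python-style
// lists, but corrupts any string that itself contains an apostrophe.
std::string naiveQuoteNormalize(std::string s) {
    std::replace(s.begin(), s.end(), '\'', '"');
    return s;
}
```

A safer approach would need to track whether the replacer is currently inside a quoted string (or use a real Python-literal parser) rather than substituting globally.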
You're right, I will think about how I can change it.
```cpp
size_t pos = 0;
int main_guard = 0;

while (pos != std::string::npos && main_guard < MAX_TOOL_CALLS) {
```
Is this guard really needed?
This parser will be used with models from 2.6B to 24B. If the 2.6B model is used for multi-turn, it gets confused: it generates tool calls and doesn't follow its usual template. To avoid infinite loops I would leave it in.
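The guard pattern under discussion can be sketched in isolation. This is an illustration of the bounded-scan idea, not the parser's actual loop; the `MAX_TOOL_CALLS` value and the function name are assumptions:

```cpp
#include <cassert>
#include <string>
#include <vector>

constexpr int MAX_TOOL_CALLS = 64;  // assumed cap, mirrors the guard above

// Scan for every occurrence of `needle`, but cap the number of iterations so
// a degenerate or repetitive model output can never spin the loop forever.
std::vector<size_t> findAllBounded(const std::string& text, const std::string& needle) {
    std::vector<size_t> hits;
    size_t pos = 0;
    int guard = 0;
    while (guard < MAX_TOOL_CALLS && (pos = text.find(needle, pos)) != std::string::npos) {
        hits.push_back(pos);
        pos += needle.size();
        ++guard;
    }
    return hits;
}
```

The cost of the guard is one integer comparison per iteration, so keeping it is cheap insurance against a confused small model emitting tool-call markers indefinitely.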
🛠 Summary
CVS-182500
Adds a tool parser for the LFM2 model.
🧪 Checklist