Feature Summary
Support for safe, per-request LoRA loading when running in HTTP Server mode
Detailed Description
I would like to propose a patch that enables per-request LoRA loading in HTTP server mode. Currently, the server ignores LoRA tags in prompts (e.g., <lora:name:0.5>) because process_and_check is called with an empty string as the LoRA directory, so the referenced files are never found.
I am unsure of the original motivation for explicitly disabling this path, but simply enabling it causes LoRA weights to accumulate across requests ("weight stacking"), which pollutes the state of the persistent server process.
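To make the tag handling concrete, here is a minimal sketch of pulling LoRA tags out of a prompt. It assumes the tag syntax is `<lora:name:multiplier>`; `LoraTag` and `extract_lora_tags` are hypothetical names used for illustration and are not the project's actual parser.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type: one entry per tag found in the prompt.
struct LoraTag {
    std::string name;
    float multiplier;
};

// Finds every <lora:name:multiplier> tag and returns the prompt with the
// tags removed, together with the parsed tags. Illustration only.
std::pair<std::string, std::vector<LoraTag>> extract_lora_tags(const std::string& prompt) {
    static const std::regex tag_re(R"(<lora:([^:>]+):([0-9]*\.?[0-9]+)>)");
    std::vector<LoraTag> tags;
    for (std::sregex_iterator it(prompt.begin(), prompt.end(), tag_re), end; it != end; ++it) {
        // Group 1 is the LoRA name, group 2 the multiplier.
        tags.push_back({(*it)[1].str(), std::stof((*it)[2].str())});
    }
    return {std::regex_replace(prompt, tag_re, ""), tags};
}
```

In a real server the parsed name would then be resolved against the configured LoRA directory, which is why an empty directory string makes the tags ineffective.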
The attached diff implements the following:
- Enables Parsing: Updates gen_params.process_and_check to use ctx_params.lora_model_dir, allowing the backend to find and load the LoRA files.
- Enables Safety: Forces ctx_params.lora_apply_mode = LORA_APPLY_AT_RUNTIME during server initialization. This ensures LoRA calculations are applied dynamically during graph execution without permanently altering the base model weights.
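The difference between the two apply strategies can be illustrated with a toy model. This is a sketch under simplifying assumptions: `ToyModel`, `apply_lora_merged`, and `effective_weights` are hypothetical names, and the real runtime mode applies LoRA during graph execution rather than by copying weights. The point is only that merging mutates persistent state while a runtime-style apply does not.

```cpp
#include <cstddef>
#include <vector>

// Toy "model": a flat weight vector that persists across requests.
struct ToyModel {
    std::vector<float> weights;
};

// Merge strategy: bakes scale * delta into the stored weights.
// Without a matching undo step, every request changes the persistent state.
void apply_lora_merged(ToyModel& model, const std::vector<float>& delta, float scale) {
    for (std::size_t i = 0; i < model.weights.size(); ++i) {
        model.weights[i] += scale * delta[i];
    }
}

// Runtime-style strategy: computes the effective weights for one request only.
// The stored weights are read but never written.
std::vector<float> effective_weights(const ToyModel& model,
                                     const std::vector<float>& delta,
                                     float scale) {
    std::vector<float> out(model.weights);
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] += scale * delta[i];
    }
    return out;
}
```

Calling `apply_lora_merged` once per request on the same `ToyModel` makes the weights drift further from the base each time, whereas `effective_weights` returns the same result for every request and leaves the base untouched.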
Proposed Change
diff --git a/examples/server/main.cpp b/examples/server/main.cpp
index 5c951c0..4fb57b0 100644
--- a/examples/server/main.cpp
+++ b/examples/server/main.cpp
@@ -282,6 +282,7 @@ int main(int argc, const char** argv) {
LOG_DEBUG("%s", default_gen_params.to_string().c_str());
+ ctx_params.lora_apply_mode = LORA_APPLY_AT_RUNTIME;
sd_ctx_params_t sd_ctx_params = ctx_params.to_sd_ctx_params_t(false, false, false);
sd_ctx_t* sd_ctx = new_sd_ctx(&sd_ctx_params);
if (sd_ctx == nullptr) {
@@ -392,7 +393,7 @@ int main(int argc, const char** argv) {
return;
}
- if (!gen_params.process_and_check(IMG_GEN, "")) {
+ if (!gen_params.process_and_check(IMG_GEN, ctx_params.lora_model_dir)) {
res.status = 400;
res.set_content(R"({"error":"invalid params"})", "application/json");
return;
@@ -570,7 +571,7 @@ int main(int argc, const char** argv) {
return;
}
- if (!gen_params.process_and_check(IMG_GEN, "")) {
+ if (!gen_params.process_and_check(IMG_GEN, ctx_params.lora_model_dir)) {
res.status = 400;
res.set_content(R"({"error":"invalid params"})", "application/json");
return;
Alternatives you considered
- Naive Implementation: Just passing the directory without changing the apply mode. This resulted in model corruption after a few requests due to weight accumulation.
- External Management: Running separate server instances for different LoRA configurations, which is resource-heavy and lacks flexibility.
Additional context
- File: examples/server/main.cpp
- Logic: common.hpp (around line 440) describes at_runtime as the mode that avoids precision issues and permanent weight modification, which makes it suitable for a long-running server process.