[Feature] Enable safe per-request LoRA loading (implementation included) #1143

### Feature Summary

Support safe, per-request LoRA loading when running in HTTP server mode.

### Detailed Description

I would like to propose a patch that enables per-request LoRA loading in HTTP server mode. Currently, the server ignores LoRA tags (e.g., `<lora:name:0.5>`) in prompts because the request handlers call `gen_params.process_and_check` with an empty string for the LoRA directory.
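
To illustrate the tag format, here is a minimal sketch of how such tags could be pulled out of a prompt. `LoraTag` and `extract_lora_tags` are hypothetical names made up for this example; this is not the project's actual parser, and the real one may handle edge cases differently.

```cpp
#include <regex>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical type for illustration: one parsed <lora:name:weight> tag.
struct LoraTag {
    std::string name;
    float weight;
};

// Hypothetical helper: collects every <lora:name:weight> tag found in
// `prompt` and removes the tags from the prompt text.
std::vector<LoraTag> extract_lora_tags(std::string& prompt) {
    static const std::regex tag_re(R"(<lora:([^:>]+):([^>]+)>)");
    std::vector<LoraTag> tags;
    for (std::sregex_iterator it(prompt.begin(), prompt.end(), tag_re), end; it != end; ++it) {
        try {
            tags.push_back({(*it)[1].str(), std::stof((*it)[2].str())});
        } catch (const std::exception&) {
            // Weight is not a valid number: skip this tag.
        }
    }
    // Strip all tags (valid or not) so they do not reach the text encoder.
    prompt = std::regex_replace(prompt, tag_re, "");
    return tags;
}
```

With an empty LoRA directory, a server has nowhere to resolve `name` to a file, which matches the behavior described above: the tags are effectively ignored.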

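I am unsure of the original motivation for explicitly disabling this path, but simply enabling it causes "weight stacking" (state pollution) on the persistent server process: LoRA weights applied for one request remain in the model and accumulate with those of later requests.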

The attached diff implements the following:

1. **Enables Parsing:** Updates `gen_params.process_and_check` to use `ctx_params.lora_model_dir`, allowing the backend to find and load the LoRA files.
2. **Enables Safety:** Forces `ctx_params.lora_apply_mode = LORA_APPLY_AT_RUNTIME` during server initialization. This ensures LoRA calculations are applied dynamically during graph execution without permanently altering the base model weights.
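
The difference between the two apply modes can be shown with a toy model. This is only a sketch under simplifying assumptions: a flat float vector stands in for the model weights, `delta` stands in for an already-scaled LoRA update, and `ToyModel`, `apply_merged`, and `apply_at_runtime` are hypothetical names. The real runtime mode applies LoRA during graph execution rather than by copying weights.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for a loaded model that lives as long as the server process.
struct ToyModel {
    std::vector<float> base;
};

// Merge-style application: the delta is written into the stored weights, so
// each request adds on top of whatever earlier requests left behind.
// Assumes delta.size() == m.base.size().
void apply_merged(ToyModel& m, const std::vector<float>& delta) {
    for (std::size_t i = 0; i < m.base.size(); ++i) {
        m.base[i] += delta[i];
    }
}

// Runtime-style application: effective weights are computed per request and
// the stored base weights are never modified.
// Assumes delta.size() == m.base.size().
std::vector<float> apply_at_runtime(const ToyModel& m, const std::vector<float>& delta) {
    std::vector<float> effective(m.base);
    for (std::size_t i = 0; i < effective.size(); ++i) {
        effective[i] += delta[i];
    }
    return effective;
}
```

Calling `apply_merged` once per request with the same delta drifts the base weights further each time, while `apply_at_runtime` returns the same effective weights on every request and leaves `base` unchanged.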

**Proposed Change**

```diff
diff --git a/examples/server/main.cpp b/examples/server/main.cpp
index 5c951c0..4fb57b0 100644
--- a/examples/server/main.cpp
+++ b/examples/server/main.cpp
@@ -282,6 +282,7 @@ int main(int argc, const char** argv) {
     LOG_DEBUG("%s", default_gen_params.to_string().c_str());
 
+    ctx_params.lora_apply_mode    = LORA_APPLY_AT_RUNTIME;
     sd_ctx_params_t sd_ctx_params = ctx_params.to_sd_ctx_params_t(false, false, false);
     sd_ctx_t* sd_ctx              = new_sd_ctx(&sd_ctx_params);
 
     if (sd_ctx == nullptr) {
@@ -392,7 +393,7 @@ int main(int argc, const char** argv) {
                 return;
             }
 
-            if (!gen_params.process_and_check(IMG_GEN, "")) {
+            if (!gen_params.process_and_check(IMG_GEN, ctx_params.lora_model_dir)) {
                 res.status = 400;
                 res.set_content(R"({"error":"invalid params"})", "application/json");
                 return;
@@ -570,7 +571,7 @@ int main(int argc, const char** argv) {
                 return;
             }
 
-            if (!gen_params.process_and_check(IMG_GEN, "")) {
+            if (!gen_params.process_and_check(IMG_GEN, ctx_params.lora_model_dir)) {
                 res.status = 400;
                 res.set_content(R"({"error":"invalid params"})", "application/json");
                 return;
```

### Alternatives you considered

* **Naive Implementation:** Just passing the directory without changing the apply mode. This resulted in model corruption after a few requests due to weight accumulation.  
* **External Management:** Running separate server instances for different LoRA configurations, which is resource-heavy and lacks flexibility.

### Additional context

* **File:** `examples/server/main.cpp`
* **Logic:** `common.hpp` (around line 440) describes `at_runtime` as the mode that avoids precision issues and permanent weight modification, making it suitable for a long-running server process.
