Kandinsky5 Image Editing model#12223

Open
xkvma wants to merge 17 commits into Comfy-Org:master from gen-ai-team:i2i_kandinsky


@xkvma xkvma commented Feb 2, 2026

1. Image Editing model support.

Adds support for the Kandinsky5 Image Editing model.

Generation examples

Prompt: Change cat to a husky.
Images: cat_in_hat, husky1
Prompt: Change this girl to cyberpunk female warrior in dynamic battle stance on neon-lit rainy rooftop ledge surrounded by towering skyscrapers and holographic ads, long silver hair flowing in wind, piercing green eyes, intricate cybernetic high-tech armor with glowing neon blue accents and red underglow, wielding radiant blue plasma lightsaber with energy trails, form-fitting tactical bodysuit under armored plates with circuit patterns, flowing red cape with digital glitches, dramatic cyberpunk sunset lighting with volumetric god rays piercing through mist and rain, puddles reflecting neon signs, highly detailed, cinematic composition, ultra-realistic sci-fi rendering, 8k resolution, sharp focus, intricate details, artstation trending.
Images: knight_girl, cyberpunk_girl
Prompt: Change this girl's t-shirt color to green
Images: korean_girl, korean_green

2. NABLA attention support (REVERTED)

(duplicates #11371)

NABLA is an efficient attention mechanism introduced by the Kandinsky Team and used in the 10-second video models. It speeds up generation by ~2.7x without a significant drop in quality.

| Configuration | Inference time |
| --- | --- |
| K5 Lite T2V 768x512 10sec (without NABLA) | 766 sec |
| K5 Lite T2V 768x512 10sec (with NABLA) | 306 sec |
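
For this particular configuration, the ratio implied by the two timings can be checked with a few lines of Python. The numbers are copied from the table; nothing else is assumed.

```python
# Timings copied from the table above, in seconds.
without_nabla_s = 766
with_nabla_s = 306

# Speedup factor and the fraction of wall-clock time saved.
speedup = without_nabla_s / with_nabla_s
time_saved = 1 - with_nabla_s / without_nabla_s

print(f"speedup: {speedup:.2f}x, time saved: {time_saved:.0%}")
```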

Generation example

Prompt: Rim light, side light, soft light, medium close-up, dusk, sunset, central composition, warm tones with low saturation, telephoto lens. A woman with fluffy brown curly hair stands elegantly in front of a magnificent stained glass window. She is wearing a flowing white dress, her hair neatly combed back, and her soft facial contours are gently illuminated by the colorful light filtering through the window from outside. The woman is talking to someone off-camera, yet there is a hint of sadness in her eyes, adding a layer of depth to her mysterious temperament. The background is dim, with a strong contrast between light and shadow, further emphasizing the tension of the character's emotions. The stained glass, under the glow of the setting sun, casts colorful light and shadows, enhancing the artistic sense and atmosphere of the overall picture.

without NABLA
https://github.com/user-attachments/assets/2e599e1e-0a09-4693-84f7-59f4a55a5748

with NABLA
https://github.com/user-attachments/assets/54633d02-0e55-45e6-a831-e21c893146c5
https://github.com/user-attachments/assets/7dc6b00c-5e1e-45bb-b1ee-c6a73601a730

3. Return typo to tokenizer

Restores the typo "promt" (instead of "prompt") in the tokenizer, since Kandinsky models were trained with this typo and are expected to perform better with it.
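
As a rough illustration of why the spelling matters, the sketch below wraps user text in a template that keeps the training-time spelling. The template wording and the `apply_template` helper are hypothetical; the actual Kandinsky5 template is not shown in this PR.

```python
# Hypothetical template text, used only for illustration.
# It keeps the misspelling "promt" that the model reportedly saw during training.
TEMPLATE_TRAINED = "Describe the image edit requested by the promt: {text}"
TEMPLATE_CORRECTED = TEMPLATE_TRAINED.replace("promt", "prompt")


def apply_template(text: str, keep_training_typo: bool = True) -> str:
    """Wrap user text in the template; default to the training-time spelling."""
    template = TEMPLATE_TRAINED if keep_training_typo else TEMPLATE_CORRECTED
    return template.format(text=text)


print(apply_template("Change cat to a husky."))
```

The two templates produce different strings, and therefore different token sequences, which is why matching the spelling used in training is expected to help.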

@xkvma
Author

xkvma commented Feb 3, 2026

@Kosinkadink @comfyanonymous @guill
Could you please leave some feedback here?

@comfyanonymous
Member

@kijai can you check this?

@kijai
Contributor

kijai commented Feb 6, 2026

The image edit model support looks fine, but I don't think we should have NABLA automatically activate like that, especially since it relies on torch.compile. I previously implemented NABLA as an attention override patch in KJNodes for testing, so if we do want it in core, it should still be a separate patch node in my opinion, similar to this:

https://github.com/kijai/ComfyUI-KJNodes/blob/fb9d5764d21d23a3f52186aeccbb259efac96f9c/nodes/model_optimization_nodes.py#L1682

And I guess they reverted the typo fix for the prompt template huh, suppose we should too then.
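
The opt-in idea described in this comment can be sketched roughly as follows. This is a plain-Python stand-in, not ComfyUI or KJNodes code: the `attention_override` key and all function names are hypothetical, and the attention functions return labels instead of tensors.

```python
from typing import Any, Callable, Dict

AttentionFn = Callable[[Any, Any, Any], str]


def default_attention(q: Any, k: Any, v: Any) -> str:
    """Stand-in for the regular dense attention path."""
    return "dense"


def sparse_attention(q: Any, k: Any, v: Any) -> str:
    """Stand-in for a sparse variant such as NABLA."""
    return "sparse"


def run_attention(model_options: Dict[str, Any], q: Any, k: Any, v: Any) -> str:
    # The override is used only if a patch node put one in the options.
    fn: AttentionFn = model_options.get("attention_override", default_attention)
    return fn(q, k, v)


def patch_attention(model_options: Dict[str, Any], override: AttentionFn) -> Dict[str, Any]:
    """Hypothetical patch node: return a copy with the override set."""
    patched = dict(model_options)
    patched["attention_override"] = override
    return patched


base_options: Dict[str, Any] = {}
patched_options = patch_attention(base_options, sparse_attention)
print(run_attention(base_options, None, None, None))
print(run_attention(patched_options, None, None, None))
```

Because the patch returns a copy, the unpatched model keeps the default path and the sparse variant is only active in workflows that explicitly include the patch node.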

@xkvma
Author

xkvma commented Feb 9, 2026

@comfyanonymous Should I replace NABLA with KJ's node then?

@Kosinkadink
Member

For this PR, would be good to strip out the NABLA stuff entirely, that should be a separate PR (which I believe you already made). We would need to discuss NABLA separately. Like kijai said, the image edit stuff here should be good so stripping out NABLA is the only blocker rn.

@xkvma
Author

xkvma commented Feb 11, 2026

Thanks! I removed NABLA, so the image editing part can be merged.
It would be good if a decision on NABLA support were made soon, as users are complaining about the model's performance and NABLA significantly improves the situation.

@xkvma xkvma changed the title from "Kandinsky5 Image Editing model and NABLA attention support" to "Kandinsky5 Image Editing model" on Feb 11, 2026
Kosinkadink previously approved these changes Feb 14, 2026
category="advanced/conditioning/kandinsky5",
inputs=[
io.Clip.Input("clip"),
io.String.Input("clip_l", multiline=True, dynamic_prompts=True),
Member


You can't break compatibility with old workflows by changing inputs like this.

Make a new node for the image edit if you only want a single prompt box in the node.

Author


Replaced with a new node, CLIPTextEncodeKandinsky5ImageToImage.
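
The compatibility concern in this thread can be illustrated with a toy registry. This is a sketch only: it does not use the ComfyUI `io` API, the existing node's id and its `second_prompt` input are placeholders, and the `prompt` input name of the new node is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class NodeSchema:
    node_id: str
    inputs: Tuple[str, ...]  # input names, in order


REGISTRY: Dict[str, NodeSchema] = {}


def register(schema: NodeSchema) -> None:
    REGISTRY[schema.node_id] = schema


def can_load(saved_node: dict) -> bool:
    """A saved node loads only if its id is registered and its inputs match."""
    schema = REGISTRY.get(saved_node["node_id"])
    return schema is not None and set(saved_node["inputs"]) == set(schema.inputs)


# Existing node, left untouched (placeholder id and placeholder second input).
register(NodeSchema("ExistingKandinsky5TextEncode", ("clip", "clip_l", "second_prompt")))
# New node for image editing with a single prompt box ("prompt" is assumed).
register(NodeSchema("CLIPTextEncodeKandinsky5ImageToImage", ("clip", "prompt")))

old_workflow_node = {
    "node_id": "ExistingKandinsky5TextEncode",
    "inputs": {"clip": "...", "clip_l": "a cat", "second_prompt": "a cat"},
}
print(can_load(old_workflow_node))
```

Had the existing node's inputs been changed in place, a workflow saved with the old input names would no longer match the schema; adding a second node avoids that.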

Comment thread comfy_extras/nodes_kandinsky5.py Outdated
def execute(cls, vae, batch_size, start_image) -> io.NodeOutput:
    height, width = start_image.shape[1:-1]
    available_res = [(1024, 1024), (640, 1408), (1408, 640), (768, 1280), (1280, 768), (896, 1152), (1152, 896)]
    nearest_index = torch.argmin(torch.Tensor([abs((h / w) - (height / width)) for (h, w) in available_res]))
Member


Author

@xkvma xkvma Feb 16, 2026


Done.
Kept torch's resize because common_upscale gives slightly different results.
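
The nearest-resolution lookup in the snippet under review can be sketched in plain Python, without torch. The resolution list and the (height, width) ordering are taken from the diff excerpt; the function name `nearest_resolution` is introduced here for illustration, and the resize step itself is left out.

```python
from typing import List, Tuple

# (height, width) pairs copied from the diff excerpt above.
AVAILABLE_RES: List[Tuple[int, int]] = [
    (1024, 1024), (640, 1408), (1408, 640), (768, 1280),
    (1280, 768), (896, 1152), (1152, 896),
]


def nearest_resolution(height: int, width: int,
                       available: List[Tuple[int, int]] = AVAILABLE_RES) -> Tuple[int, int]:
    """Pick the supported (h, w) whose aspect ratio h/w is closest to the input's."""
    target = height / width
    return min(available, key=lambda hw: abs(hw[0] / hw[1] - target))


# A 720x1280 (H x W) landscape image maps to the 768x1280 bucket.
print(nearest_resolution(720, 1280))
```

Like the original `argmin` over absolute ratio differences, this compares raw h/w ratios; on a tie, `min` returns the first matching entry in the list.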

@xkvma
Author

xkvma commented Feb 24, 2026

@Kosinkadink @comfyanonymous I think it's ready to merge
