Kandinsky5 Image Editing model #12223
@Kosinkadink @comfyanonymous @guill

@kijai can you check this?

The image edit model support looks fine, but I don't think we should have NABLA activate automatically like that, especially since it relies on torch.compile. I previously implemented NABLA as an attention override patch in KJNodes for testing, so if we do want it in core, it should still be a separate patch node in my opinion, similar to this:

And I guess they reverted the typo fix for the prompt template, huh. I suppose we should too, then.

@comfyanonymous Should I replace NABLA with KJ's node then?

For this PR, it would be good to strip out the NABLA stuff entirely; that should be a separate PR (which I believe you already made). We would need to discuss NABLA separately. Like kijai said, the image edit stuff here should be good, so stripping out NABLA is the only blocker rn.

Thanks! I removed NABLA, so image editing can be merged.
category="advanced/conditioning/kandinsky5",
inputs=[
    io.Clip.Input("clip"),
    io.String.Input("clip_l", multiline=True, dynamic_prompts=True),
You can't break compatibility with old workflows by changing inputs like this.
Make a new node for the image edit if you only want a single prompt box in the node.

Replaced with a new node, CLIPTextEncodeKandinsky5ImageToImage.
def execute(cls, vae, batch_size, start_image) -> io.NodeOutput:
    height, width = start_image.shape[1:-1]
    available_res = [(1024, 1024), (640, 1408), (1408, 640), (768, 1280), (1280, 768), (896, 1152), (1152, 896)]
    nearest_index = torch.argmin(torch.Tensor([abs((h / w) - (height / width)) for (h, w) in available_res]))
Make a node like this one instead: https://github.com/Comfy-Org/ComfyUI/blob/master/comfy_extras/nodes_flux.py#L125

Done. Kept torch's resize because common_upscale gives slightly different results.
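
For reference, the aspect-ratio matching in the `execute` snippet under review can be sketched in plain Python. This is only an illustration, not the PR's code: `nearest_resolution` and `AVAILABLE_RES` are hypothetical names, the resolution list is copied from the snippet, and the built-in `min` with a key function stands in for the `torch.argmin` call.

```python
# Supported (height, width) pairs, copied from the snippet under review.
AVAILABLE_RES = [
    (1024, 1024), (640, 1408), (1408, 640),
    (768, 1280), (1280, 768), (896, 1152), (1152, 896),
]


def nearest_resolution(height, width, available=AVAILABLE_RES):
    """Return the (height, width) pair whose aspect ratio is closest to the input's.

    Hypothetical helper for illustration; on ties, min() keeps the earliest entry.
    """
    target = height / width
    return min(available, key=lambda hw: abs(hw[0] / hw[1] - target))


if __name__ == "__main__":
    # A landscape 720x1280 (height x width) image maps to the 768x1280 bucket.
    print(nearest_resolution(720, 1280))
```

The function only picks the target size; resizing the image to that size (with torch's resize, as noted in the reply) is a separate step that this sketch does not cover.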
@Kosinkadink @comfyanonymous I think it's ready to merge.
1. Image Editing model support
Adds support for the Kandinsky5 Image Editing model.
Generation examples
Prompt
Change cat to a husky.
Prompt
Change this girl to cyberpunk female warrior in dynamic battle stance on neon-lit rainy rooftop ledge surrounded by towering skyscrapers and holographic ads, long silver hair flowing in wind, piercing green eyes, intricate cybernetic high-tech armor with glowing neon blue accents and red underglow, wielding radiant blue plasma lightsaber with energy trails, form-fitting tactical bodysuit under armored plates with circuit patterns, flowing red cape with digital glitches, dramatic cyberpunk sunset lighting with volumetric god rays piercing through mist and rain, puddles reflecting neon signs, highly detailed, cinematic composition, ultra-realistic sci-fi rendering, 8k resolution, sharp focus, intricate details, artstation trending.
Prompt
Change this girl's t-shirt color to green
2. NABLA attention support (REVERTED) (duplicates #11371)
NABLA is an efficient attention mechanism introduced by the Kandinsky team and used in the 10-second video models. It speeds up generation by ~2.7x without a significant drop in quality.
Generation example
Prompt
Rim light, side light, soft light, medium close-up, dusk, sunset, central composition, warm tones with low saturation, telephoto lens. A woman with fluffy brown curly hair stands elegantly in front of a magnificent stained glass window. She is wearing a flowing white dress, her hair neatly combed back, and her soft facial contours are gently illuminated by the colorful light filtering through the window from outside. The woman is talking to someone off-camera, yet there is a hint of sadness in her eyes, adding a layer of depth to her mysterious temperament. The background is dim, with a strong contrast between light and shadow, further emphasizing the tension of the character's emotions. The stained glass, under the glow of the setting sun, casts colorful light and shadows, enhancing the artistic sense and atmosphere of the overall picture.
without NABLA
https://github.com/user-attachments/assets/2e599e1e-0a09-4693-84f7-59f4a55a5748
with NABLA
https://github.com/user-attachments/assets/54633d02-0e55-45e6-a831-e21c893146c5
https://github.com/user-attachments/assets/7dc6b00c-5e1e-45bb-b1ee-c6a73601a730
3. Restore typo in tokenizer
Restores the typo "promt" (instead of "prompt") in the tokenizer's prompt template, since the Kandinsky models were trained with this typo and are expected to perform better with it.