Hi, thank you for your great work and for sharing the vision encoder weights.
I noticed that the released checkpoint currently includes only the vision encoder weights. Do you also plan to release the full fine-tuned VLM checkpoint, including the corresponding vision encoder, MLP/projector, and LLM weights or adapters?
I would like to run inference with the complete model to reproduce the reported results and test its performance on my own examples. Could you please let me know whether the full VLM weights are available, or whether there is a recommended way to assemble the released vision encoder with the corresponding language model and projector for inference?
Thank you very much!