High Initial Loss When Fine-Tuning Gemma Model #38

Dear authors,

Thanks for your great work!

I am currently trying to fine-tune the Gemma-7B model using PiSSA, but the initial loss and gradient norm are extremely high.

This doesn't seem to be caused by the PiSSA algorithm, since fine-tuning Gemma-7B with LoRA shows a similar problem.

Have you encountered this problem, or do you have any ideas on how to solve it? Thanks a lot!
