DoRA (Weight-Decomposed Low-Rank Adaptation)
DoRA (Weight-Decomposed Low-Rank Adaptation) is an extension of LoRA. It takes Low-Rank Adaptation a step further by decomposing each pre-trained weight matrix into two distinct components: magnitude and direction. The directional component is then updated through LoRA, which keeps the number of trainable parameters small while improving the model's learning capacity and training stability.
Decompose Pretrained Weights
The pretrained weight matrix 𝑊 is decomposed into two components:
Magnitude (𝑚): the scale of the weights, initialized as the column-wise norm of the pretrained matrix, 𝑚 = ‖𝑊‖.
Direction (𝑉): the pretrained weight matrix itself, normalized column-wise when used, so that 𝑊 = 𝑚 · 𝑉 / ‖𝑉‖.
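The decomposition can be sketched with NumPy. This is a minimal illustration of the idea, not the PEFT internals; the matrix shape is an arbitrary assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
W0 = rng.standard_normal((4, 3))  # pretrained weight (out_features x in_features)

# Magnitude: column-wise L2 norm of the pretrained weights (one scalar per column).
m = np.linalg.norm(W0, axis=0, keepdims=True)  # shape (1, 3)

# Direction: the pretrained matrix itself; it is normalized wherever it is used.
V = W0

# Sanity check: recombining magnitude and normalized direction recovers W0 exactly.
W_rec = m * (V / np.linalg.norm(V, axis=0, keepdims=True))
print(np.allclose(W_rec, W0))  # True
```

At initialization the decomposition is lossless: scaling the unit-norm direction by the stored magnitude reproduces the original weights.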
Adapt Direction
During fine-tuning, updates (Δ𝑉) are applied only to the directional component (𝑉). The update is represented with low-rank matrices 𝐴 and 𝐵, as in LoRA, which keeps the number of trainable parameters minimal. The updated directional component is 𝑉 + Δ𝑉 = 𝑉 + 𝐵𝐴.
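The directional update follows the LoRA pattern, Δ𝑉 = 𝐵𝐴, with only 𝐴 and 𝐵 trainable. A minimal NumPy sketch (rank and shapes are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
out_f, in_f, r = 4, 3, 2
V = rng.standard_normal((out_f, in_f))  # direction component (= pretrained matrix)

# LoRA-style low-rank factors: B starts at zero so the initial update is zero.
A = rng.standard_normal((r, in_f)) * 0.01
B = np.zeros((out_f, r))

delta_V = B @ A          # directional update, rank at most r
V_adapted = V + delta_V

print(np.allclose(V_adapted, V))  # True: B is zero-initialized, so no change yet
```

Zero-initializing 𝐵 means the adapted model starts out identical to the pretrained one; training then moves the direction away from 𝑉 through the low-rank product.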
Recombine Magnitude and Direction
After training, the updated weights are merged by recombining the magnitude (𝑚) with the adapted direction: 𝑊′ = 𝑚 · (𝑉 + Δ𝑉) / ‖𝑉 + Δ𝑉‖.
Generate Merged Weights
The final merged weight (𝑊′) retains the pretrained knowledge in 𝑊 while incorporating the fine-tuned updates, and is ready for downstream tasks.
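The full decompose–adapt–recombine cycle can be sketched end to end in NumPy (shapes, rank, and the "trained" factors are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
out_f, in_f, r = 4, 3, 2
W0 = rng.standard_normal((out_f, in_f))  # pretrained weight

# Decompose: magnitude (column norms) and direction (the matrix itself).
m = np.linalg.norm(W0, axis=0, keepdims=True)
V = W0

# Pretend fine-tuning produced these low-rank factors.
A = rng.standard_normal((r, in_f)) * 0.1
B = rng.standard_normal((out_f, r)) * 0.1

# Recombine: scale the normalized, adapted direction by the magnitude.
V_new = V + B @ A
W_merged = m * (V_new / np.linalg.norm(V_new, axis=0, keepdims=True))

# Each column of the merged weight carries the magnitude stored in m.
print(np.allclose(np.linalg.norm(W_merged, axis=0, keepdims=True), m))  # True
```

Note that the low-rank update changes only the direction of each column; the column scales come entirely from the magnitude component (which DoRA also trains in practice, as a small extra vector of parameters).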
The implementation of DoRA is quite similar to LoRA when using Hugging Face's PEFT library.
Both methods follow the same steps covered in the previous sections of this guide. The main distinction lies in enabling DoRA through the configuration: instead of the standard LoRA setup, you initialize the configuration with the use_dora parameter set to True.
This change activates the decomposition mechanism specific to DoRA.
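A sketch of that configuration with PEFT. The LoRA hyperparameters and target module names below are illustrative choices, not values from this guide; use_dora=True is the only DoRA-specific setting:

```python
from peft import LoraConfig, get_peft_model

# Standard LoRA hyperparameters; only use_dora=True is DoRA-specific.
config = LoraConfig(
    r=8,                                   # rank of the low-rank update (assumed)
    lora_alpha=16,                         # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],   # illustrative: attention projections
    lora_dropout=0.05,
    use_dora=True,                         # enables the magnitude/direction decomposition
)

# base_model is any Hugging Face transformer loaded beforehand, e.g. via
# AutoModelForCausalLM.from_pretrained(...):
# model = get_peft_model(base_model, config)
```

Everything else (wrapping the model, training, merging adapters) proceeds exactly as in the LoRA workflow.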