
DoRA (Decomposed Low-Rank Adaptation)



What is DoRA?

DoRA (Decomposed Low-Rank Adaptation) is an extension of LoRA. It takes Low-Rank Adaptation a step further by decomposing each pre-trained weight matrix into two distinct components: magnitude and direction. The directional component is then updated through LoRA-style low-rank matrices while the magnitude is trained as a separate parameter, which keeps the number of trainable parameters low while improving the model's learning capacity and training stability.

The DoRA Process Explained

1. Decompose Pretrained Weights

  • Magnitude (𝑚): represents the scale of the weights, initialized as the column-wise norm of the pretrained weight matrix.

  • Direction (𝑉): The normalized pretrained weight matrix.
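As a minimal sketch of this decomposition (NumPy, illustrative only; real DoRA operates on the model's weight tensors inside the PEFT library):

```python
import numpy as np

# Hypothetical pretrained weight matrix W (out_features x in_features);
# the values are random stand-ins, not real model weights
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))

# Magnitude m: the column-wise norm of the pretrained weights
m = np.linalg.norm(W, axis=0, keepdims=True)   # shape (1, 3)

# Direction V: the normalized pretrained weight matrix
V = W / m                                       # each column has unit norm

# Recombining the two components recovers the original weights
assert np.allclose(m * V, W)
```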

2. Adapt Direction

During fine-tuning, LoRA-style low-rank updates are applied to the directional component, while the magnitude is trained as its own parameter.

3. Recombine Magnitude and Direction

After training, the updated weights are merged back by recomposing the magnitude ( 𝑚 ) and the adapted direction.

4. Generate Merged Weights

The final merged weight (𝑊′) incorporates the pretrained knowledge from 𝑊 along with the fine-tuned updates, ready for downstream tasks.
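The four steps can be sketched end to end in NumPy (all names here are illustrative, not the PEFT implementation; for simplicity the magnitude 𝑚 is kept fixed, even though DoRA also trains it):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))   # stand-in for a pretrained weight matrix
r = 2                              # low rank

# Step 1: decompose into magnitude m and direction V
m = np.linalg.norm(W, axis=0, keepdims=True)   # column-wise norms, shape (1, 3)
V = W / m                                       # unit-norm columns

# Step 2: adapt the direction with low-rank matrices B and A (delta_V = B @ A)
B = 0.01 * rng.standard_normal((4, r))
A = 0.01 * rng.standard_normal((r, 3))
V_adapted = V + B @ A

# Step 3: recombine magnitude with the re-normalized adapted direction
V_adapted = V_adapted / np.linalg.norm(V_adapted, axis=0, keepdims=True)

# Step 4: merged weight W', ready for downstream tasks
W_merged = m * V_adapted
```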

The implementation of DoRA is quite similar to LoRA when using Hugging Face's PEFT library:

Both methods follow the same steps covered in the previous sections of this guide. The main distinction lies in enabling DoRA through the configuration: instead of the standard LoRA setup, you initialize the configuration with the use_dora parameter set to True:

from peft import LoraConfig, get_peft_model

# Initialize DoRA configuration (r, lora_alpha, and target_modules are
# example values; adjust them for your model and task)
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    use_dora=True,
)

This change activates the decomposition mechanism specific to DoRA.

The pretrained weight 𝑊 is decomposed into two components, magnitude and direction (with ||·|| denoting the column-wise norm):

𝑊 = 𝑚 · (𝑉 / ||𝑉||)

During fine-tuning, updates (Δ𝑉) are applied only to the directional component (𝑉), which is represented using low-rank matrices (𝐴 and 𝐵), so that Δ𝑉 = 𝐵𝐴. This enables efficient parameter adaptation while keeping the number of trainable parameters minimal. Recombining the magnitude with the updated direction yields the merged weight:

𝑊′ = 𝑚 · ((𝑉 + Δ𝑉) / ||𝑉 + Δ𝑉||)
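A rough parameter count for a single d_out × d_in layer illustrates why the trainable-parameter budget stays small (the dimensions and rank below are assumptions for illustration, not values from this guide):

```python
# Illustrative trainable-parameter counts for one weight matrix
d_out, d_in, r = 4096, 4096, 16

full_ft = d_out * d_in         # full fine-tuning: every weight is trainable
lora = r * (d_out + d_in)      # LoRA: A (r x d_in) plus B (d_out x r)
dora = lora + d_in             # DoRA: adds the magnitude vector m (one entry per column)

print(full_ft, lora, dora)
```

With these numbers, DoRA trains only slightly more parameters than LoRA (the extra magnitude vector), and both are a small fraction of full fine-tuning.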