LoRA (Low-Rank Adaptation)

What is LoRA?

LoRA, or Low-Rank Adaptation, is a technique designed to make fine-tuning large AI models more efficient. It's one of the key techniques under the umbrella of PEFT, which aim to reduce the computational overhead involved in adjusting large pre-trained models. LoRA achieves this efficiency by breaking down the model’s weight matrices into smaller, low-rank components. This reduces the number of parameters that need to be updated during fine-tuning, making the process faster and more cost-effective.

The LoRA Fine-Tuning Process

LoRA simplifies the process of model adaptation by focusing on specific parts of the model, typically the weight matrices. Here’s how the LoRA technique works:

Selecting Target Layers for Fine-Tuning

he first step in the LoRA process is identifying which layers or components of the model should be fine-tuned. These layers typically involve the weight matrices 𝑊 that define how the model processes input data. By narrowing the focus to specific layers, LoRA ensures that only the necessary parts of the model are adapted, rather than adjusting the entire network of parameters.

Decomposing Weight Matrices

Once the target layers are identified, the next step involves decomposing the large weight matrix into smaller, low-rank matrices. Instead of working with a single, large matrix, LoRA breaks it down into two smaller matrices that can be updated more easily. This decomposition allows for more efficient fine-tuning since the smaller matrices require less computational power to adjust.

The original matrix 𝑊 can be approximated by multiplying two smaller matrices 𝐴 and 𝐵 :

Updating Only the Smaller Matrices

In this step, instead of updating the entire large matrix 𝑊 , LoRA focuses on updating only the smaller, low-rank matrices 𝐴 and 𝐵 . These matrices are the only ones that change during fine-tuning, while the rest of the model’s parameters remain fixed.

Reconstructing the Weight Matrices

After the low-rank matrices are updated, they are multiplied back together to recreate the updated weight matrix. The recombined matrix is given by:

This recomposed matrix now reflects the adjustments made during fine-tuning. The recombination process allows the model to retain its original structure while incorporating the new, fine-tuned parameters.

Integrating the Updated Weights

Once the low-rank matrices are recombined, the updated weight matrix is integrated back into the model. This replacement does not alter the rest of the model’s architecture, which preserves its original capabilities and knowledge.

Implementing PEFT fine-tuning with LORA is very simple. Here's a step-by-step guide to using LoRA for fine-tuning a pre-trained model for sentiment classification:

We start by importing necessary libraries to load datasets, manage the LoRA configuration, and handle the model and training setup.

from datasets import load_dataset
from peft import PeftModel, PeftConfig, LoraConfig, TaskType, get_peft_model
from transformers import TrainingArguments, Trainer, AutoModelForSequenceClassification, AutoTokenizer

we then create a lable mapping that is used to tell the model how to interpret the target labels.This mapping ensures that the model can understand and output sentiment labels correctly during the training and evaluation process.

id_to_label = {0: "NEGATIVE", 1: "POSITIVE"}
label_to_id = {"NEGATIVE": 0, "POSITIVE": 1}

We load the pre-trained model for sequence classification. The model is pre-trained on a large corpus of text and has learned general language patterns. The model is then initialized with our mappings so that it can correctly handle the classification task.

model= AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", id2label=id2label, label2id=label2id)

We load the Rotten Tomatoes dataset using the load_dataset function. This dataset contains movie reviews labeled as either "positive" or "negative" sentiment.

dataset = load_dataset("rotten_tomatoes")
dataset

We load a tokenizer corresponding to the pre-trained model. The tokenizer is essential because it converts raw text into token IDs that the model can process. It ensures that the input text is in a form that the model understands.

checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

Next, we'll create a tokenization function, which we'll use to apply the tokenizer to the dataset to convert the text into tokenized input.

def tokenizer_func(input):
  return tokenizer(input["text"], truncation=True)
  
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
data_tokenized = data.map(tokenizer_func, batched=True)
#Let's also prepare the dataset
train_data_tokenized = data_tokenized["train"].remove_columns(["text"]).rename_column("label", "labels")
val_data_tokenized = data_tokenized["validation"].remove_columns(["text"]).rename_column("label", "labels")

This is where LoRA (Low-Rank Adaptation) comes into play. By configuring LoRA, we efficiently adapt the model with fewer trainable parameters.

lora_config = LoraConfig(
    r=8,  # rank - see hyperparameter section for more details
    lora_alpha=32,  # scaling factor - see hyperparameter section for more details
    target_modules=["q_lin", "v_lin"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_CLS
)
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()

We then set up the training configuration. This configuration controls how the model will be trained.

output_dir = f'./rotten-tomatoes-classification-training-{str(int(time.time()))}'
training_args = TrainingArguments(
    output_dir=output_dir,
    learning_rate=1e-5,
    logging_steps=1,
    max_steps=10
)

Finally, we set up the Trainer and start the training process. This step kicks off the fine-tuning process, and the model will start learning to classify the movie reviews into "positive" or "negative" categories.

trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=train_data_tokenized,
    eval_dataset=val_data_tokenized,
    data_collator=data_collator,
    tokenizer=tokenizer,
)
trainer.train()

Make sure to adjust the parameter space to optimize performance based on your specific use case.

PreviousParameter-Efficient Fine-Tuning (PEFT)NextQLoRA (Quantized LoRA)

Last updated 7 months ago