Half Fine-Tuning (HFT)
Half Fine-Tuning represents an efficient compromise between full parameter fine-tuning and feature extraction approaches for adapting pre-trained language models. The methodology strategically updates only a portion of the model's parameters while keeping the rest frozen, which mitigates catastrophic forgetting. An important technique within HFT is Selective Layer Freezing.
This approach strategically freezes certain layers of the neural network while allowing others to be updated during training. It is founded on the observation that different layers in a neural network capture different levels of abstraction: lower layers typically capture more general features, while higher layers handle task-specific features. By selectively freezing layers, we can preserve the model's foundational knowledge while adapting specific components to new tasks.
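As a minimal illustration of the underlying mechanism, freezing a layer in PyTorch simply means disabling gradient computation for its parameters. The toy two-layer network below is hypothetical and serves only to show the idea:

```python
import torch.nn as nn

# A toy two-layer network (hypothetical) used only to illustrate the freezing mechanism.
model = nn.Sequential(nn.Linear(768, 768), nn.Linear(768, 768))

# Freeze the first layer: its parameters receive no gradients during backpropagation.
for param in model[0].parameters():
    param.requires_grad = False

# Only the second layer's parameters remain trainable.
print([name for name, p in model.named_parameters() if p.requires_grad])  # ['1.weight', '1.bias']
```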
The implementation of Selective Layer Freezing requires careful consideration of which layers should be frozen. There are several strategies available, with the two most commonly used being alternate freezing and freezing the lower half of the layers. Let's dive into how to implement these strategies.
Alternate Freezing: In the alternate freezing strategy, every other layer (for example, every even-indexed layer) is frozen, while the layers in between remain trainable. Interleaving frozen and trainable layers preserves pre-trained knowledge throughout the depth of the network while still giving the model enough trainable capacity to adapt to the new task.
Freezing the Lower Half: In the lower-half freezing strategy, the first half of the model's layers are frozen, preserving their general knowledge. Meanwhile, the higher layers, which are responsible for more task-specific features, remain trainable. This allows the model to leverage its pre-existing knowledge while tailoring the higher-level features to suit the new task.
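To make the two strategies concrete, here is a small, hypothetical helper (frozen_indices is not part of any library) that computes which layer indices each strategy would freeze for a 12-layer model:

```python
def frozen_indices(num_layers, pattern):
    """Return the layer indices a given freezing strategy would freeze."""
    if pattern == "alternate":
        # Every even-indexed layer is frozen; odd-indexed layers stay trainable.
        return [i for i in range(num_layers) if i % 2 == 0]
    if pattern == "lower_half":
        # The first half of the stack is frozen; the upper half stays trainable.
        return list(range(num_layers // 2))
    raise ValueError(f"Unknown pattern: {pattern}")

print(frozen_indices(12, "alternate"))   # [0, 2, 4, 6, 8, 10]
print(frozen_indices(12, "lower_half"))  # [0, 1, 2, 3, 4, 5]
```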
The walkthrough below builds up an implementation example that demonstrates the core concepts.
To begin, define the Finetuner class, which handles Selective Layer Freezing. Its __init__ method receives two parameters: the model to fine-tune and a freeze_pattern dictating the freezing strategy. Upon creation, it invokes setup_layer_freezing to determine which of the model's layers to freeze.
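A minimal sketch of such a class is shown below. It assumes a PyTorch model whose transformer blocks are exposed as a list at model.transformer.h (the layout used by GPT-2-style Hugging Face models); other architectures expose their layers differently.

```python
import torch.nn as nn

class Finetuner:
    """Applies Selective Layer Freezing to a pre-trained model (minimal sketch)."""

    def __init__(self, model: nn.Module, freeze_pattern: str = "lower_half"):
        self.model = model
        self.freeze_pattern = freeze_pattern  # "alternate" or "lower_half"
        self.setup_layer_freezing()           # decide which layers stay frozen
```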
Next, configure the layer freezing: in setup_layer_freezing, the model's transformer layers are accessed and frozen according to the chosen strategy. The method first retrieves all layers of the transformer model. For the alternate pattern, it freezes every even-indexed layer, interleaving frozen and trainable blocks throughout the network. For the lower_half pattern, it freezes the lower half of the layers, preserving their foundational knowledge while keeping the higher layers trainable for task-specific adaptation. Each frozen layer's parameters are set to requires_grad = False, ensuring they remain unchanged during training.
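Continuing the sketch, setup_layer_freezing could be implemented as follows, again assuming the layers live at self.model.transformer.h:

```python
    def setup_layer_freezing(self):
        # Retrieve the stack of transformer blocks (assumed attribute path).
        layers = self.model.transformer.h
        num_layers = len(layers)

        for i, layer in enumerate(layers):
            if self.freeze_pattern == "alternate":
                freeze = (i % 2 == 0)           # freeze every even-indexed layer
            elif self.freeze_pattern == "lower_half":
                freeze = (i < num_layers // 2)  # freeze the first half of the stack
            else:
                raise ValueError(f"Unknown freeze_pattern: {self.freeze_pattern}")

            if freeze:
                # Frozen parameters receive no gradients and keep their pre-trained values.
                for param in layer.parameters():
                    param.requires_grad = False
```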
Once the selective layer freezing is set up, the next step is to define the training loop for fine-tuning the model. You'll need to choose an appropriate loss function and optimization algorithm, configure the learning rate, and set the number of epochs. During the training process, only the layers that are not frozen will have their parameters updated, while the frozen layers will retain their original values.
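A minimal training loop under the same assumptions might look like the sketch below; finetuner, model, train_loader, and the Hugging Face-style forward signature (labels=...) are all assumptions for illustration, not fixed APIs of this method.

```python
from torch.optim import AdamW

finetuner = Finetuner(model, freeze_pattern="alternate")  # model is a pre-trained causal LM (assumed)

# Hand only the trainable parameters to the optimizer; frozen layers keep their weights.
trainable_params = [p for p in finetuner.model.parameters() if p.requires_grad]
optimizer = AdamW(trainable_params, lr=2e-5)

num_epochs = 3
finetuner.model.train()

for epoch in range(num_epochs):
    for batch in train_loader:  # train_loader is assumed to yield tokenized batches (dicts of tensors)
        outputs = finetuner.model(**batch, labels=batch["input_ids"])
        loss = outputs.loss      # cross-entropy computed by the model for causal LM fine-tuning
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    print(f"epoch {epoch + 1}: loss = {loss.item():.4f}")
```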