
# LLM Fine-Tuning Techniques

Making foundation models your own. Fine-tuning adapts pre-trained LLMs to specific tasks, domains, or behaviors -- from lightweight LoRA adapters that train in hours on a single GPU to full alignment techniques like DPO that shape model values.

Tags: LoRA · QLoRA · PEFT · DPO · RLHF

## LoRA and Variants

| Method | Description | Paper | Code |
| --- | --- | --- | --- |
| LoRA | Low-Rank Adaptation of Large Language Models. Adds trainable low-rank matrices to the attention layers while freezing the pretrained weights. A rank of 4-16 is typically sufficient. | Paper | Code |
| LoRA+ | Uses different learning rates for matrices A and B (B gets a ~16x higher LR), yielding a ~2% accuracy improvement and up to 2x faster training. | Paper | - |
| AdaLoRA | Adaptive budget allocation for LoRA: dynamically distributes the parameter budget across layers based on importance scoring via an SVD parameterization. | Paper | Code |
| QLoRA | Quantizes the pretrained model to 4-bit, then adds trainable low-rank adapters on top. Uses the 4-bit NormalFloat data type, double quantization, and paged optimizers. | Paper | Code |
| DoRA | Weight-Decomposed Low-Rank Adaptation. Decomposes each pretrained weight into a magnitude and a direction, fine-tuning the direction with LoRA and the magnitude separately. By NVIDIA. | Paper | Code |
| PiSSA | Principal Singular values and Singular vectors Adaptation. Replaces LoRA's random initialization with the principal components from an SVD of the pretrained weight, for significantly better fine-tuning. | Paper | Code |
| MOELoRA | Combines Mixture of Experts (MoE) with LoRA for multi-task parameter-efficient fine-tuning, originally targeting medical applications. | Paper | Code |
| LoRA-FA | Freezes matrix A after initialization (as a random projection) and trains only matrix B. Halves the trainable parameter count with comparable performance. | Paper | - |
| LoRA-drop | Decides which layers need LoRA adapters and which can do without, based on evaluating the adapter outputs. | Paper | - |
| Delta-LoRA | Also updates the base weight matrix W, using the difference of the product AB between consecutive timesteps as its gradient, controlled by a hyperparameter lambda. | Paper | - |
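The idea shared by all of these variants is LoRA's frozen-weight update: the layer computes `W x + (alpha / r) * B A x`, where only the small matrices A and B are trained. A minimal NumPy sketch (shapes, initialization scales, and the `lora_forward` name are illustrative assumptions, not any library's API):

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 8, 8, 2, 4          # r is the LoRA rank (4-16 typical)
W = rng.normal(size=(d_out, d_in))          # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01       # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init -> delta starts at 0

def lora_forward(x):
    # y = W x + (alpha / r) * B A x; gradients flow only into A and B
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# Because B starts at zero, the adapted model is initially identical to the base model.
assert np.allclose(lora_forward(x), W @ x)
```

Zero-initializing B is what makes fine-tuning start exactly from the pretrained model; the variants above mostly change how A and B are initialized (PiSSA), trained (LoRA+, LoRA-FA), or budgeted across layers (AdaLoRA, LoRA-drop).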

## Other Fine-Tuning Methods

| Method | Description | Paper | Code |
| --- | --- | --- | --- |
| PEFT | Hugging Face library implementing multiple parameter-efficient fine-tuning methods (LoRA, prefix tuning, prompt tuning, and more). | - | Code |
| Instruction Tuning | Fine-tuning LLMs on (instruction, output) pairs to improve instruction-following and controllability. | Paper | Code |
| Prefix Tuning | Adds trainable continuous prefixes (task-specific virtual tokens, i.e. soft prompts) at each layer while keeping the LM frozen. | Paper | Code |
| Prompt Tuning | Simplified version of Prefix Tuning: adds soft prompt tokens only at the input layer. | Paper | Code |
| P-Tuning | Converts prompts into learnable embeddings processed by an MLP + LSTM encoder. Enabled GPT-style models to surpass BERT on SuperGLUE. | Paper | Code |
| P-Tuning v2 | Adds prompt tokens at every layer (not just the input), removes the reparameterization encoder, and uses task-specific prompt lengths. | Paper | Code |
| Adapter Tuning | Inserts small adapter modules into each Transformer layer; only the adapters and LayerNorm parameters are trained (~3.6% added parameters). | Paper | Code |
| BitFit | Sparse fine-tuning method that updates only the bias parameters. | Paper | Code |
| DPO | Direct Preference Optimization: optimizes the policy directly on preference pairs with a simple classification loss, using the LM itself as an implicit reward model and eliminating the separate reward model and RL loop of RLHF. | Paper | Code |
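DPO's loss fits in a few lines: it is a logistic loss on the margin between the implicit rewards of the chosen and rejected responses, where each implicit reward is `beta * (policy log-prob - reference log-prob)`. A hedged stdlib sketch with scalar sequence log-probabilities (the function name and the `beta=0.1` default are illustrative, not taken from any particular implementation):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair of summed sequence log-probs."""
    # Implicit reward of each response: beta * (policy logprob - reference logprob).
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    # -log sigmoid(margin): small when the policy already prefers the chosen response.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# At initialization the policy equals the reference, so the margin is 0
# and the loss is log(2) for every pair.
assert abs(dpo_loss(-1.0, -3.0, -1.0, -3.0) - math.log(2)) < 1e-9

# Once the policy favors the chosen response relative to the reference,
# the margin is positive and the loss drops below log(2).
assert dpo_loss(-1.0, -3.0, -2.0, -2.5) < math.log(2)
```

Because the loss depends only on log-probabilities, no reward model is trained and no RL rollout is needed: the preference data supervises the policy directly, with `beta` controlling how far it may drift from the reference model.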