
# LLM Fine-Tuning Techniques

Making foundation models your own. Fine-tuning adapts pre-trained LLMs to specific tasks, domains, or behaviors -- from lightweight LoRA adapters that train in hours on a single GPU to full alignment techniques like DPO that shape model values.

Tags: LoRA · QLoRA · PEFT · DPO · RLHF

## LoRA and Variants

| Method | Description | Paper | Code |
| --- | --- | --- | --- |
| LoRA | Low-Rank Adaptation of Large Language Models. Adds trainable low-rank matrices to the attention layers while freezing the pretrained weights. A rank of 4-16 is typically sufficient. | Paper | Code |
| LoRA+ | Uses different learning rates for matrices A and B (B gets a ~16x higher LR), yielding a ~2% accuracy improvement and up to 2x faster training. | Paper | - |
| AdaLoRA | Adaptive budget allocation for LoRA: dynamically distributes the parameter budget across layers based on importance scoring via an SVD parameterization. | Paper | Code |
| QLoRA | Quantizes the pretrained model to 4-bit, then adds trainable low-rank adapters on top. Uses the 4-bit NormalFloat data type, double quantization, and paged optimizers. | Paper | Code |
| DoRA | Weight-Decomposed Low-Rank Adaptation. Decomposes each pretrained weight into a magnitude and a direction, fine-tuning the direction with LoRA and the magnitude separately. By NVIDIA. | Paper | Code |
| PiSSA | Principal Singular values and Singular vectors Adaptation. Replaces LoRA's random initialization with the principal components from an SVD of the pretrained weight, for significantly better fine-tuning. | Paper | Code |
| MOELoRA | Combines Mixture of Experts (MoE) with LoRA for multi-task parameter-efficient fine-tuning, originally targeting medical applications. | Paper | Code |
| LoRA-FA | Freezes matrix A after initialization (as a random projection) and trains only matrix B. Halves the trainable parameter count with comparable performance. | Paper | - |
| LoRA-drop | Decides which layers need LoRA adapters and which can do without, based on evaluating the adapter outputs. | Paper | - |
| Delta-LoRA | Also updates the base weight matrix W, using the difference of the product AB between consecutive timesteps as its gradient, controlled by a hyperparameter lambda. | Paper | - |
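The idea shared by all of these variants is LoRA's frozen-weight update: the layer computes `W x + (alpha / r) * B A x`, where only the small matrices A and B are trained. A minimal NumPy sketch (shapes, initialization scales, and the `lora_forward` name are illustrative assumptions, not any library's API):

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 8, 8, 2, 4          # r is the LoRA rank (4-16 typical)
W = rng.normal(size=(d_out, d_in))          # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01       # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init -> delta starts at 0

def lora_forward(x):
    # y = W x + (alpha / r) * B A x; gradients flow only into A and B
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# Because B starts at zero, the adapted model is initially identical to the base model.
assert np.allclose(lora_forward(x), W @ x)
```

Zero-initializing B is what makes fine-tuning start exactly from the pretrained model; the variants above mostly change how A and B are initialized (PiSSA), trained (LoRA+, LoRA-FA), or budgeted across layers (AdaLoRA, LoRA-drop).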

## Other Fine-Tuning Methods

| Method | Description | Paper | Code |
| --- | --- | --- | --- |
| PEFT | Hugging Face library implementing multiple parameter-efficient fine-tuning methods (LoRA, prefix tuning, prompt tuning, and more). | - | Code |
| Instruction Tuning | Fine-tuning LLMs on (instruction, output) pairs to improve instruction-following and controllability. | Paper | Code |
| Prefix Tuning | Adds trainable continuous prefixes (task-specific virtual tokens, i.e. soft prompts) at each layer while keeping the LM frozen. | Paper | Code |
| Prompt Tuning | Simplified version of Prefix Tuning: adds soft prompt tokens only at the input layer. | Paper | Code |
| P-Tuning | Converts prompts into learnable embeddings processed by an MLP + LSTM encoder. Enabled GPT-style models to surpass BERT on SuperGLUE. | Paper | Code |
| P-Tuning v2 | Adds prompt tokens at every layer (not just the input), removes the reparameterization encoder, and uses task-specific prompt lengths. | Paper | Code |
| Adapter Tuning | Inserts small adapter modules into each Transformer layer; only the adapters and LayerNorm parameters are trained (~3.6% added parameters). | Paper | Code |
| BitFit | Sparse fine-tuning method that updates only the bias parameters. | Paper | Code |
| DPO | Direct Preference Optimization: optimizes the policy directly on preference pairs with a simple classification loss, using the LM itself as an implicit reward model and eliminating the separate reward model and RL loop of RLHF. | Paper | Code |
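DPO's loss fits in a few lines: it is a logistic loss on the margin between the implicit rewards of the chosen and rejected responses, where each implicit reward is `beta * (policy log-prob - reference log-prob)`. A hedged stdlib sketch with scalar sequence log-probabilities (the function name and the `beta=0.1` default are illustrative, not taken from any particular implementation):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair of summed sequence log-probs."""
    # Implicit reward of each response: beta * (policy logprob - reference logprob).
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    # -log sigmoid(margin): small when the policy already prefers the chosen response.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# At initialization the policy equals the reference, so the margin is 0
# and the loss is log(2) for every pair.
assert abs(dpo_loss(-1.0, -3.0, -1.0, -3.0) - math.log(2)) < 1e-9

# Once the policy favors the chosen response relative to the reference,
# the margin is positive and the loss drops below log(2).
assert dpo_loss(-1.0, -3.0, -2.0, -2.5) < math.log(2)
```

Because the loss depends only on log-probabilities, no reward model is trained and no RL rollout is needed: the preference data supervises the policy directly, with `beta` controlling how far it may drift from the reference model.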