# Model Loading
## FastLanguageModel.from_pretrained
By default this loads a model prepared for the LoRA workflow and uses `load_in_16bit`.
### `max_seq_length`
Sets the context window used for training and inference.
Unsloth can set `max_seq_length` above the model's original context length because it applies RoPE scaling, supporting context windows up to 4x longer than the base model's.
It computes a scaling factor and remaps positions into RoPE's frequency space so that every position falls within the frequency range the model was trained on.
Because the maximum sequence length has a large impact on training memory and speed, keep it small (512 or 1024) during initial exploration.
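The position remapping can be illustrated with a minimal linear RoPE-scaling sketch (a simplification for intuition only; Unsloth's actual implementation lives in its patched kernels, and the function below is hypothetical):

```python
import math

def scaled_rope_angles(position: int, dim: int, trained_len: int,
                       target_len: int, base: float = 10000.0) -> list[float]:
    """Map a position beyond the trained context back into the trained
    frequency range via linear RoPE scaling (position interpolation)."""
    # Scaling factor: how much longer the target window is than the trained one.
    scale = target_len / trained_len
    # Compress positions so the largest target position maps back
    # inside the range [0, trained_len).
    effective_pos = position / scale
    # Standard RoPE inverse frequencies, one per pair of dimensions.
    inv_freq = [base ** (-2 * i / dim) for i in range(dim // 2)]
    return [effective_pos * f for f in inv_freq]

# A position near the end of a 4x-extended window lands back
# inside the original trained range after scaling.
angles = scaled_rope_angles(position=8191, dim=8, trained_len=2048, target_len=8192)
```

With a 4x extension the scaling factor is 4, so position 8191 is treated like position 2047.75, which the model has effectively seen during training.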
### `dtype`
Controls the compute dtype. Pass `None` for auto-detection (recommended). Explicit options: `torch.bfloat16` (Ampere and newer GPUs) or `torch.float16` (older GPUs).
### `load_in_4bit`
Enables QLoRA by loading the model through bitsandbytes' 4-bit runtime quantization.
### `full_finetuning`
Enables full fine-tuning instead of LoRA. It can also be combined with 8-bit or 4-bit quantized loading.
By default, models are loaded in 16-bit precision.
### `fast_inference`
Set `fast_inference=False` to disable vLLM, e.g. when you want to run GRPO on a model that vLLM does not support.
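Putting these parameters together, a typical call looks like the following sketch (the model id is illustrative; this requires the `unsloth` package and a CUDA GPU, so it is not runnable as-is everywhere):

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # illustrative model id
    max_seq_length=2048,   # keep small during exploration
    dtype=None,            # auto-detect bfloat16 / float16
    load_in_4bit=True,     # QLoRA via bitsandbytes
)
```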
---
## FastLanguageModel.get_peft_model
Wraps the model with low-rank adapters for Hugging Face's parameter-efficient fine-tuning (HF PEFT).
It creates the `peft.LoraConfig` for you, injects the LoRA layers into the model, and freezes the base model (by setting `requires_grad = False` on its parameters).
Unlike normal PEFT, Unsloth also makes your model more efficient to train because it:
- patches attention kernels into fused kernels (more efficient)
- applies memory-efficient gradient logic
- enables faster backprop kernels
- ensures compatibility with 4-bit / qLoRA models
- optionally activates gradient checkpointing optimized for (q)LoRA.
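A typical invocation looks like this sketch, assuming `model` was already loaded with `FastLanguageModel.from_pretrained` (all values are illustrative defaults, not recommendations for any specific model):

```python
from unsloth import FastLanguageModel

# `model` is assumed to come from FastLanguageModel.from_pretrained(...)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                  # LoRA rank
    lora_alpha=16,         # scaling numerator (here equal to r)
    lora_dropout=0.0,      # common default
    bias="none",           # keep biases frozen
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
    random_state=3407,     # seed for reproducible adapter init
)
```

The individual parameters are explained below.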
### `r`
The rank of the adapter matrices: smaller values give lighter adapters, larger values give more expressive ones.
| r | Reason |
| ----- | --------------------- |
| 4 | extremely lightweight |
| 8 | common small adapter |
| 16 | typical default |
| 32-64 | high-capacity LoRA |
### `lora_alpha`
The scaling numerator: the LoRA update is multiplied by `lora_alpha / r`.
$W' = W + \frac{\alpha}{r} B A$
Without scaling, increasing the rank would increase update magnitude.
LoRA update scaling ensures a stable update magnitude independent of rank.
Typical settings use `lora_alpha = 2 * r` for stronger updates.
Setting `lora_alpha = r` gives a scaling factor of 1, so the update magnitude stays constant as the rank changes.
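The scaling formula can be checked numerically; a minimal NumPy sketch (shapes and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 16
W = rng.normal(size=(d, d))   # frozen base weight
A = rng.normal(size=(r, d))   # LoRA down-projection
B = rng.normal(size=(d, r))   # LoRA up-projection
lora_alpha = 2 * r            # the "strong updates" convention

# W' = W + (alpha / r) * B @ A
scale = lora_alpha / r
W_adapted = W + scale * (B @ A)
```

With `lora_alpha = r` the scale would be exactly 1, leaving the raw `B @ A` update unscaled.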
### `use_rslora`
Enables rank-stabilized LoRA (rsLoRA), which scales the update by `lora_alpha / sqrt(r)` instead of `lora_alpha / r`.
Recommended for high-rank adapters (r >= 32), as it stabilizes training and often improves quality.
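The two scaling rules can be compared directly (pure Python; α = 16 is illustrative):

```python
import math

alpha = 16
ranks = (8, 32, 128)
# Standard LoRA scale decays like 1/r, rsLoRA like 1/sqrt(r).
lora_scales = [alpha / r for r in ranks]
rslora_scales = [alpha / math.sqrt(r) for r in ranks]
```

At high ranks the standard `alpha / r` scale becomes tiny, effectively muting the adapter, while the rsLoRA scale shrinks much more slowly.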
### `lora_dropout`
Dropout applied to the LoRA adapter path only.
Purpose: regularization to prevent the adapter from overfitting.
| value | meaning |
| ----- | ----------------------- |
| 0 | common default |
| 0.05 | mild regularization |
| 0.1 | stronger regularization |
### `bias`
Controls whether bias parameters are trainable.
| option | meaning |
| ------------- | ------------------------------ |
| `"none"` | biases frozen |
| `"lora_only"` | biases in LoRA layers trainable|
| `"all"` | all biases trainable |
Typical default: `bias="none"`. Training biases increases the trainable-parameter count and often gives little benefit.
### `use_gradient_checkpointing`
Unsloth-specific convenience option (value: `True` or `"unsloth"`, the latter selecting Unsloth's optimized implementation).
Enables activation checkpointing so that:
- fewer activations are stored
- activations are recomputed during the backward pass
Effect:
- much lower VRAM usage
- slightly slower training
### `target_modules`
Defines which linear layers receive LoRA adapters; targeting fewer modules yields smaller adapters.
Typical transformer attention layers are: q_proj, k_proj, v_proj, o_proj
Sometimes also: gate_proj, up_proj, down_proj
```python
target_modules = [
"q_proj",
"k_proj",
"v_proj",
"o_proj",
]
```
Check the [Unsloth model notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks) for the best recommendation.
### `random_state`
Seed used when initializing LoRA matrices. Important for reproducibility.