# Model Choice
Always review the [Unsloth model notebook](https://unsloth.ai/docs/get-started/unsloth-notebooks) for your choice.
## Language Models
Instruct models have already been fine-tuned to follow instructions, so they are ready to use without any further fine-tuning. These models, including their GGUF and other commonly distributed variants, are optimized for direct usage and respond effectively to prompts right out of the box.
Unsloth recommends starting with **Instruct models**, as they allow direct fine-tuning using conversational chat templates (ChatML, ShareGPT etc.) and require less data than **Base models** (which use formats such as Alpaca or Vicuna). Learn more about the differences between [instruct and base models here](https://unsloth.ai/docs/get-started/fine-tuning-llms-guide/what-model-should-i-use#instruct-or-base-model).
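To make the chat-template point concrete, here is a minimal sketch of how a single training example is rendered in the ChatML format that many Instruct models expect. The helper name and the conversation content are illustrative, not part of any library API:

```python
# Illustrative sketch: render a conversation in the ChatML template
# used by many Instruct models. The function name is hypothetical.
def to_chatml(messages):
    """Render a list of {role, content} dicts as a ChatML string."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    return "\n".join(parts)

conversation = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 = 4."},
]
print(to_chatml(conversation))
```

In practice you would let the tokenizer's own chat template do this rendering rather than hand-rolling the strings; the sketch only shows what the template produces.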
If you pick a model with `-4bit` quants, it will use slightly more memory than a standard 4-bit BitsAndBytes setup, but at significantly higher accuracy.
## Quantization Levels
| Format | VRAM use | Accuracy | Use case |
| --------- | -------- | -------- | ------------------------------ |
| 16-bit | highest | best | full fine-tuning, small models |
| 4-bit bnb | low      | good     | QLoRA on consumer GPUs         |
| GGUF | varies | good | inference only (llama.cpp) |
Do not confuse training quantization (4-bit QLoRA) with inference formats (GGUF): you fine-tune in 4-bit, then export the result to GGUF for deployment.
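The VRAM column in the table above follows from simple arithmetic on bits per parameter. The sketch below estimates the memory needed for model weights alone; it deliberately ignores optimizer states, activations, and the KV cache, so real usage is higher:

```python
# Rough VRAM estimate for model weights only (illustrative).
# Excludes optimizer states, activations, and KV cache.
def weight_vram_gb(params_billion, bits_per_param):
    """Return weight memory in decimal gigabytes."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

for bits, label in [(16, "16-bit"), (4, "4-bit")]:
    print(f"{label}: ~{weight_vram_gb(7, bits):.1f} GB for a 7B model")
# 16-bit: ~14.0 GB, 4-bit: ~3.5 GB
```

This is why a 7B model that needs a data-center GPU for full 16-bit fine-tuning fits comfortably on a consumer GPU with 4-bit QLoRA.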