# Model Choice

Always review the [Unsloth model notebook](https://unsloth.ai/docs/get-started/unsloth-notebooks) for your choice.

## Language Models

Instruct models are instruction-tuned on top of a pretrained base, so they respond effectively to prompts out of the box without any fine-tuning. These models, including the GGUF conversions commonly available, are optimized for direct use.

Unsloth recommends starting with **Instruct models**, as they can be fine-tuned directly using conversational chat templates (ChatML, ShareGPT, etc.) and require less data than **Base models** (which use completion-style templates such as Alpaca and Vicuna). Learn more about the differences between [instruct and base models here](https://unsloth.ai/docs/get-started/fine-tuning-llms-guide/what-model-should-i-use#instruct-or-base-model).

If you pick a model with `-4bit` quants, it uses a bit more memory than a standard 4-bit BitsAndBytes setup, but at significantly higher accuracy.

## Quantization Levels

| Format    | VRAM use | Accuracy | Use case                       |
| --------- | -------- | -------- | ------------------------------ |
| 16-bit    | highest  | best     | full fine-tuning, small models |
| 4-bit bnb | low      | good     | QLoRA on consumer GPUs         |
| GGUF      | varies   | good     | inference only (llama.cpp)     |

Do not confuse training quantization (4-bit/QLoRA) with inference formats (GGUF): you fine-tune in 4-bit, then export to GGUF for deployment.
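To make the chat-template distinction above concrete, here is a minimal sketch of how the same exchange is rendered as a ChatML conversation (instruct models) versus an Alpaca-style completion prompt (base models). Both helper functions are illustrative, not part of Unsloth's API:

```python
# Illustrative template renderers; real training pipelines typically use
# the tokenizer's own chat template instead of hand-rolled strings.

def to_chatml(messages):
    """ChatML wraps each turn in <|im_start|>role ... <|im_end|> tags."""
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )

def to_alpaca(instruction, response=""):
    """Alpaca is a plain completion template with fixed section headers."""
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n{response}"
    )

msgs = [
    {"role": "user", "content": "What is QLoRA?"},
    {"role": "assistant", "content": "4-bit quantized LoRA fine-tuning."},
]
print(to_chatml(msgs))
print(to_alpaca("What is QLoRA?", "4-bit quantized LoRA fine-tuning."))
```

The conversational format is why instruct models need less data: the template already carries the role structure the model was tuned on.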
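The VRAM column in the table above follows from simple arithmetic: 16-bit weights cost 2 bytes per parameter, 4-bit roughly 0.5 bytes (plus some quantization overhead). A back-of-envelope sketch, counting weights only and ignoring activations, optimizer state, and LoRA adapters:

```python
def weight_memory_gb(n_params_billion, bits):
    """Rough weight footprint in GB: parameters * bits / 8 bytes each."""
    return n_params_billion * 1e9 * bits / 8 / 1e9

# A 7B model: ~14 GB of weights in 16-bit vs ~3.5 GB in 4-bit.
print(round(weight_memory_gb(7, 16), 1))  # → 14.0
print(round(weight_memory_gb(7, 4), 1))   # → 3.5
```

This is why 4-bit QLoRA fits on consumer GPUs where 16-bit full fine-tuning does not.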