# Vector DB index choices
## Flat index (brute-force)
Use Flat to start:
- Dataset is small to medium (e.g., up to a few hundred thousand vectors, or a few million if higher latency is acceptable).
- You need **exact** nearest neighbors (100% recall) for evaluation, benchmarks, or high‑precision applications.
- Implementation simplicity matters more than performance tuning.
- Memory is not a problem and queries per second (QPS) are modest.
Avoid Flat when:
- Query latency, which grows linearly with the number of vectors, is no longer acceptable.
- You need to scale to tens or hundreds of millions of vectors at high QPS.
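The "exact, simple, linear cost" trade-off above is easy to see in code. A minimal pure-NumPy sketch of a Flat index (names like `flat_search` are illustrative, not any particular library's API): every query is compared against every database vector, which guarantees 100% recall but costs O(n) per query.

```python
import numpy as np

rng = np.random.default_rng(0)
xb = rng.standard_normal((1000, 64)).astype(np.float32)               # database
xq = xb[:3] + 0.01 * rng.standard_normal((3, 64)).astype(np.float32)  # queries near known vectors

def flat_search(xb, xq, k):
    # Exact search: compute every query-to-database squared L2 distance.
    d2 = ((xq[:, None, :] - xb[None, :, :]) ** 2).sum(axis=-1)  # (nq, nb)
    idx = np.argsort(d2, axis=1)[:, :k]                         # k smallest per query
    return idx, np.take_along_axis(d2, idx, axis=1)

ids, dists = flat_search(xb, xq, k=5)
print(ids[:, 0])  # each perturbed query recovers its source vector: [0 1 2]
```

The distance computation is embarrassingly parallel and vectorizes well, which is why Flat stays practical up to surprisingly large sizes; it is the linear growth, not the constant factor, that eventually forces a move to ANN structures.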
## HNSW (graph-based ANN)
Use HNSW when:
- You want very **high recall** (often 95–99%+) with much lower query time than Flat.
- Dimensionality is moderate–high (typical LLM embeddings).
- Dataset is small to large (from tens of thousands up to many millions) and RAM is sufficient.
- You care a lot about query latency and quality, less about index build time and memory overhead.
Trade‑offs:
- High memory overhead (roughly 1.3–1.5× the raw vectors, often more).
- Slower and more complex index construction and tuning.
- Updates and deletes are supported but are costlier than in a Flat index; suitability depends on the implementation.
Avoid HNSW when:
- You are extremely memory‑constrained.
- You reindex very frequently or have very dynamic data and need fast index builds.
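The core idea behind graph-based ANN can be sketched without the full multi-layer machinery. This is a deliberately simplified single-layer toy, assuming a brute-force-built k-NN graph (real HNSW builds a hierarchical graph incrementally and uses a beam rather than pure greedy hops); it only illustrates why graph traversal beats a linear scan:

```python
import numpy as np

rng = np.random.default_rng(1)
xb = rng.standard_normal((500, 32)).astype(np.float32)

# Build a flat k-NN proximity graph by brute force. Real HNSW builds a
# multi-layer graph incrementally; this single layer only illustrates
# the greedy traversal that makes queries sublinear in practice.
M = 8                                          # neighbors per node
d2_all = ((xb[:, None, :] - xb[None, :, :]) ** 2).sum(-1)
np.fill_diagonal(d2_all, np.inf)
neighbors = np.argsort(d2_all, axis=1)[:, :M]  # (500, M)

def greedy_search(q, entry=0):
    # Hop to whichever neighbor is closest to q; stop at a local minimum.
    cur = entry
    cur_d = ((xb[cur] - q) ** 2).sum()
    while True:
        cand = neighbors[cur]
        cand_d = ((xb[cand] - q) ** 2).sum(axis=1)
        best = cand_d.argmin()
        if cand_d[best] >= cur_d:
            return cur                         # approximate nearest neighbor
        cur, cur_d = cand[best], cand_d[best]

q = xb[123] + 0.001                            # query near a known vector
print(greedy_search(q))
```

Pure greedy search can stall in a local minimum, which is exactly why HNSW adds the layer hierarchy and an `ef`-sized candidate beam to push recall into the 95–99%+ range; the stored graph (M links per node) is also where the extra memory overhead comes from.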
## IVF / IVFFlat (coarse quantizer + Flat per list)
Use IVF(-Flat) when:
- Dataset is **medium to large** (millions to hundreds of millions of vectors).
- You need a good balance of speed, memory, and recall, with tunable trade‑offs (e.g., `n_lists`, `n_probes`).
- You can tolerate approximate results (e.g., 90–97% recall).
- Index build time and memory usage must be better than with HNSW.
Characteristics:
- Clusters vectors into `n_lists`; at query time you probe only a subset of lists.
- Lower memory than HNSW if you store raw vectors, and **much** faster index build for large collections.
- Very good fit for large‑scale vector DBs, sharding, and filtered search scenarios when combined with other techniques.
Avoid IVF-Flat when:
- Dataset is very small (added complexity is not worth it).
- You need absolute best recall and have no latency/memory concerns (then Flat or HNSW may be better).
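The `n_lists`/`n_probes` trade-off is concrete enough to sketch end to end. A minimal pure-NumPy IVF-Flat, assuming a hand-rolled k-means as the coarse quantizer (production systems train it far more carefully and store lists in contiguous memory):

```python
import numpy as np

rng = np.random.default_rng(2)
xb = rng.standard_normal((2000, 32)).astype(np.float32)
n_lists, n_probes = 16, 4                      # tunable speed/recall knobs

# Train the coarse quantizer with a few Lloyd iterations of k-means.
centroids = xb[rng.choice(len(xb), n_lists, replace=False)].copy()
for _ in range(10):
    assign = ((xb[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)
    for c in range(n_lists):
        members = xb[assign == c]
        if len(members):
            centroids[c] = members.mean(0)

# Build the inverted lists: vector ids grouped by their nearest centroid.
assign = ((xb[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)
lists = [np.where(assign == c)[0] for c in range(n_lists)]

def ivf_search(q, k=5):
    # Probe only the n_probes closest lists instead of scanning everything.
    probe = ((centroids - q) ** 2).sum(1).argsort()[:n_probes]
    cand = np.concatenate([lists[c] for c in probe])
    d2 = ((xb[cand] - q) ** 2).sum(1)
    return cand[d2.argsort()[:k]]

print(ivf_search(xb[7])[0])  # 7: its own list is always the first one probed
```

With `n_probes = 4` of 16 lists, each query scans roughly a quarter of the data; raising `n_probes` trades speed for recall, which is the whole tuning story for IVF. Recall is lost only when a true neighbor's list is not among those probed.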
## PQ / IVF-PQ / HNSW+PQ (quantized/compressed)
A “PQ index” usually means some form of product quantization, often applied on top of IVF (e.g., IVF-PQ) or another structure.
Use PQ (or IVF-PQ) when:
- Dataset is **very large** (hundreds of millions to billions of vectors).
- Memory is the main bottleneck: you must compress embeddings heavily.
- You can accept a noticeable drop in recall (e.g., down to 80–90% or worse depending on code size/settings).
- You need high query throughput at scale and are okay with approximate answers.
Characteristics:
- Replaces or augments raw vectors with PQ codes (compressed representation).
- Huge memory savings and often better cache behavior.
- Lower recall and sometimes more complex tuning (codebooks, code size, training samples).
Avoid PQ when:
- You need near‑exact results or very high recall in a small/medium‑scale system.
- You have enough RAM to store raw vectors and prefer simpler behavior.
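To make the compression and lookup mechanics concrete, here is a minimal pure-NumPy product quantizer with asymmetric distance computation (ADC). It is a sketch under simplified assumptions (tiny codebooks, naive k-means training, unpacked `uint8` codes), not a production implementation:

```python
import numpy as np

rng = np.random.default_rng(3)
d, m, ks = 32, 4, 16                           # dim, subspaces, centroids each
sub = d // m                                   # 8 dims per subspace
xb = rng.standard_normal((2000, d)).astype(np.float32)

# Train one small k-means codebook per subspace.
codebooks = np.empty((m, ks, sub), dtype=np.float32)
for j in range(m):
    part = xb[:, j * sub:(j + 1) * sub]
    cent = part[rng.choice(len(part), ks, replace=False)].copy()
    for _ in range(10):
        a = ((part[:, None] - cent[None]) ** 2).sum(-1).argmin(1)
        for c in range(ks):
            members = part[a == c]
            if len(members):
                cent[c] = members.mean(0)
    codebooks[j] = cent

# Encode: each vector becomes m small codes (4 bytes here, 2 if bit-packed)
# instead of d float32 values (128 bytes) - a 32-64x compression.
codes = np.stack([
    ((xb[:, j * sub:(j + 1) * sub][:, None] - codebooks[j][None]) ** 2)
    .sum(-1).argmin(1)
    for j in range(m)
], axis=1).astype(np.uint8)                    # shape (n, m)

def adc_search(q, k=5):
    # Asymmetric distance computation: precompute query-to-centroid tables,
    # then score every code by table lookups - no decompression needed.
    tables = np.stack([
        ((codebooks[j] - q[j * sub:(j + 1) * sub]) ** 2).sum(-1)
        for j in range(m)
    ])                                         # (m, ks)
    d2 = tables[np.arange(m)[None, :], codes].sum(1)
    return d2.argsort()[:k]

print(adc_search(xb[42])[:3])
```

The recall loss mentioned above comes directly from the lossy codes: vectors that quantize to the same or nearby codes become indistinguishable, and the code size (`m`, `ks`) is the knob that trades memory against that loss. In IVF-PQ, this ADC scoring runs only over the probed inverted lists.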