# Vector DB index choices

## Flat index (brute-force)

Use Flat when:

- The dataset is small to moderate (e.g., up to a few hundred thousand vectors, sometimes a few million if latency is acceptable).
- You need **exact** nearest neighbors (100% recall) for evaluation, benchmarks, or high‑precision applications.
- Implementation simplicity matters more than performance tuning.
- Memory is not a problem and queries per second (QPS) are modest.

Avoid Flat when:

- Latency, which grows linearly with the number of vectors, is no longer acceptable.
- You need to scale to tens or hundreds of millions of vectors at high QPS.

## HNSW (graph-based ANN)

Use HNSW when:

- You want very **high recall** (often 95–99%+) with much lower query time than Flat.
- Dimensionality is moderate to high (typical LLM embeddings).
- The dataset ranges from small to large (tens of thousands up to many millions of vectors) and RAM is sufficient.
- You care a lot about query latency and quality, and less about index build time and memory overhead.

Trade‑offs:

- High memory overhead (typically at least 1.3–1.5× the raw vectors, often more).
- Slower, more complex index construction and tuning.
- Updates/deletes are supported but not as cheap as in a simple Flat index; suitability depends on the implementation.

Avoid HNSW when:

- You are extremely memory‑constrained.
- You reindex very frequently or have highly dynamic data and need very fast builds.

## IVF / IVF-Flat (coarse quantizer + Flat per list)

Use IVF(-Flat) when:

- The dataset is **medium to large** (millions to hundreds of millions of vectors).
- You need a good balance of speed, memory, and recall, with tunable trade‑offs (e.g., `n_lists`, `n_probes`).
- You can tolerate approximate results (e.g., 90–97% recall).
- Index build time and memory usage must be better than HNSW's.

Characteristics:

- Clusters vectors into `n_lists` lists; at query time you probe only a subset of them.
- Lower memory than HNSW if you store raw vectors, and **much** faster index builds for large collections.
- Very good fit for large‑scale vector DBs, sharding, and filtered-search scenarios when combined with other techniques.

Avoid IVF-Flat when:

- The dataset is very small (the added complexity is not worth it).
- You need the absolute best recall and have no latency/memory concerns (then Flat or HNSW may be better).

## PQ / IVF-PQ / HNSW+PQ (quantized/compressed)

A "PQ index" usually means some form of product quantization, often layered on top of IVF (e.g., IVF-PQ) or another structure.

Use PQ (or IVF-PQ) when:

- The dataset is **very large** (hundreds of millions to billions of vectors).
- Memory is the main bottleneck: you must compress embeddings heavily.
- You can accept a noticeable drop in recall (e.g., down to 80–90% or worse, depending on code size and settings).
- You need high query throughput at scale and are okay with approximate answers.

Characteristics:

- Replaces or augments raw vectors with PQ codes (a compressed representation).
- Huge memory savings and often better cache behavior.
- Lower recall and sometimes more complex tuning (codebooks, code size, training samples).

Avoid PQ when:

- You need near‑exact results or very high recall in a small- to medium-scale system.
- You have enough RAM to store raw vectors and prefer simpler behavior.
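The Flat vs. IVF-Flat trade-off above can be sketched in a few lines of numpy. This is illustrative only, not how production libraries implement it: the names `flat_search`, `build_ivf`, and `ivf_search` are made up for this sketch, the k-means training is a toy version, and `n_lists`/`n_probes` follow the parameter names used above.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, n_lists = 32, 5000, 64
xb = rng.standard_normal((n, d)).astype(np.float32)  # database vectors
xq = rng.standard_normal(d).astype(np.float32)       # one query

def flat_search(xq, k=5):
    """Brute-force exact search: O(n) distance computations per query."""
    dists = np.linalg.norm(xb - xq, axis=1)
    return np.argsort(dists)[:k]

def build_ivf(iters=10):
    """Toy k-means coarse quantizer; real systems train this far more carefully."""
    centroids = xb[rng.choice(n, n_lists, replace=False)].copy()
    for _ in range(iters):
        # squared distances via the expansion ||x||^2 - 2 x.c + ||c||^2
        d2 = ((xb ** 2).sum(1, keepdims=True)
              - 2 * xb @ centroids.T
              + (centroids ** 2).sum(1))
        assign = d2.argmin(axis=1)
        for c in range(n_lists):
            members = xb[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    inverted_lists = [np.flatnonzero(assign == c) for c in range(n_lists)]
    return centroids, inverted_lists

def ivf_search(centroids, inverted_lists, xq, k=5, n_probes=8):
    """Probe only the n_probes closest lists, then scan those candidates exactly."""
    order = np.argsort(np.linalg.norm(centroids - xq, axis=1))[:n_probes]
    cand = np.concatenate([inverted_lists[c] for c in order])
    dists = np.linalg.norm(xb[cand] - xq, axis=1)
    return cand[np.argsort(dists)[:k]]
```

Note the knob this exposes: with `n_probes` equal to `n_lists` every vector is scanned and the result matches Flat exactly; shrinking `n_probes` trades recall for speed, which is the core IVF bargain.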
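Product quantization itself can also be sketched in numpy to make the memory/recall trade-off concrete. Again a toy sketch under stated assumptions, not Faiss's or any library's implementation: `pq_search`, `m` (number of subspaces), and `ks` (centroids per subspace) are names chosen for this example, and the codebook training is a toy k-means.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, m, ks = 32, 2000, 4, 16      # m subspaces, ks centroids per subspace
xb = rng.standard_normal((n, d)).astype(np.float32)
xq = rng.standard_normal(d).astype(np.float32)

sub_d = d // m
codebooks = np.empty((m, ks, sub_d), dtype=np.float32)
codes = np.empty((n, m), dtype=np.uint8)  # 4 bytes/vector vs. 128 for raw float32

for j in range(m):                         # train one codebook per subspace
    sub = xb[:, j * sub_d:(j + 1) * sub_d]
    cent = sub[rng.choice(n, ks, replace=False)].copy()
    for _ in range(10):                    # toy k-means
        assign = ((sub[:, None] - cent[None]) ** 2).sum(-1).argmin(1)
        for c in range(ks):
            members = sub[assign == c]
            if len(members):
                cent[c] = members.mean(0)
    codebooks[j], codes[:, j] = cent, assign

def pq_search(xq, k=5):
    """Asymmetric distance computation (ADC): one small lookup table per subspace."""
    tables = np.stack([((codebooks[j] - xq[j * sub_d:(j + 1) * sub_d]) ** 2).sum(-1)
                       for j in range(m)])             # shape (m, ks)
    approx = sum(tables[j][codes[:, j]] for j in range(m))
    return np.argsort(approx)[:k], approx
```

The ADC distance for each database vector is exactly the distance from the query to that vector's reconstruction from the codebooks, which is why recall degrades as codes get smaller: the reconstruction error grows. In a real IVF-PQ index this scan would run only over the probed lists, combining both techniques.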