What to measure
Three numbers carry most decisions: recall@k against an exact baseline, p95 query latency at your real top-k, and write throughput at your real batch size. Anything else is downstream of these.
elips bench /tmp/bench_db --count 100000 --dim 768Tuning HNSW
| Parameter | Default | Trade-off |
|---|---|---|
M | 16 | Higher → better recall, more memory. |
ef_construction | 200 | Higher → better graph, slower writes. |
ef_search | 64 | Higher → better recall, slower reads. |
Use ExactIndex at the same query set as a ground-truth baseline when sweeping ef_search.
Plan for filters
Equality and set-membership predicates accelerate through MetadataIndex. When the candidate set is small enough the planner switches to exact_candidates — often faster than running ANN over the full population. Inspect with explain_seek.
Durability cost
paranoid calls fsync per write; throughput is bounded by your storage's sync rate. standard drops the fsync but keeps the flush. relaxed buffers and amortises at checkpoint — pick it when crash recovery to the most recent checkpoint is acceptable.
GPU batching
With ELIPS_GPU_ENABLED, the dynamic batcher coalesces concurrent queries into a single GPU launch. Increase the in-flight query window in your serving layer to give the batcher room to work; very low concurrency leaves the GPU under-fed.