What to measure

Three numbers carry most decisions: recall@k against an exact baseline, p95 query latency at your real top-k, and write throughput at your real batch size. Anything else is downstream of these.

bash

elips bench /tmp/bench_db --count 100000 --dim 768

Tuning HNSW

Parameter	Default	Trade-off
`M`	`16`	Higher → better recall, more memory.
`ef_construction`	`200`	Higher → better graph, slower writes.
`ef_search`	`64`	Higher → better recall, slower reads.

Use ExactIndex at the same query set as a ground-truth baseline when sweeping ef_search.

Plan for filters

Equality and set-membership predicates accelerate through MetadataIndex. When the candidate set is small enough the planner switches to exact_candidates — often faster than running ANN over the full population. Inspect with explain_seek.

Durability cost

paranoid calls fsync per write; throughput is bounded by your storage's sync rate. standard drops the fsync but keeps the flush. relaxed buffers and amortises at checkpoint — pick it when crash recovery to the most recent checkpoint is acceptable.

GPU batching

With ELIPS_GPU_ENABLED, the dynamic batcher coalesces concurrent queries into a single GPU launch. Increase the in-flight query window in your serving layer to give the batcher room to work; very low concurrency leaves the GPU under-fed.