Tune what you measure. Measure what you tune.

Query latency — p50 / p95 / p99 per vault, per index type
Recall@k — measured against an exact-search ground truth on a sampled subset
Candidate-set size — emit from explain_seek(); spikes here predict tail latency
WAL flush time — paranoid/standard durability shifts cost into commit

The three knobs

ef_search — query-time recall vs. latency (no rebuild)
ef_construction — build-time recall ceiling (rebuild required)
max_connections — graph density vs. memory (rebuild required)

The elips bench command builds a synthetic database at a given dimension and count, then runs warm-cache queries — use it as a sanity floor when comparing hardware or builds.