Chapter XV · GPU

DynamicBatcher and the GPU pipelines

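GPUs hate small launches: every kernel launch carries a fixed overhead, and a single query rarely supplies enough work to keep the device busy. The DynamicBatcher exists to make launches stop being small.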

[Figure: a request stream of queries Q1 to Q5 split across two windows of window_us (≤500 µs). Q1+Q2+Q3 share one batched launch; Q4+Q5 share the next. 1 kernel ≫ N kernels.]
Concurrent queries wait inside a tiny window, then ride one kernel launch as a batch.
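
The grouping in the figure can be reproduced with a few lines of host code. The sketch below is illustrative only: `group_into_windows` is a hypothetical helper, not part of elips, and it assumes a window opens when the first query of a batch arrives (the source does not say whether windows are arrival-driven or fixed-tick).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of elips.
// Groups query arrival times (microseconds, sorted ascending) into batches.
// Assumption: a window opens when the first query of a batch arrives and
// stays open for window_us; later arrivals inside it join the same batch.
std::vector<std::vector<std::size_t>> group_into_windows(
    const std::vector<std::uint64_t>& arrivals_us, std::uint64_t window_us) {
    std::vector<std::vector<std::size_t>> batches;
    std::uint64_t window_start = 0;
    for (std::size_t i = 0; i < arrivals_us.size(); ++i) {
        // Open a new window if none exists yet or the current one has expired.
        if (batches.empty() || arrivals_us[i] - window_start > window_us) {
            batches.emplace_back();
            window_start = arrivals_us[i];
        }
        batches.back().push_back(i);  // store the query's index in the stream
    }
    return batches;
}
```

With arrivals at 0, 120, 430, 900 and 1250 µs and a 500 µs window, the first three queries fall into one batch and the last two into the next, which matches the Q1+Q2+Q3 and Q4+Q5 launches in the figure.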

Configurable knobs (include/elips/gpu_engine/GpuConfig.hpp)

  • window_us — how long to wait for stragglers (microseconds)
  • max_batch — hard cap on coalesced queries per launch
  • algorithm — brute-force, IVF-Flat, IVF-PQ, graph (CAGRA), hybrid
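
To show how window_us and max_batch interact, here is a minimal host-side sketch of a window-based batcher. It is not the elips implementation: `WindowBatcher`, `submit`, and `next_batch` are hypothetical names, and the sketch assumes a single launch thread and max_batch ≥ 1.

```cpp
#include <algorithm>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <iterator>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical sketch, not the elips DynamicBatcher.
template <typename Query>
class WindowBatcher {
public:
    WindowBatcher(std::chrono::microseconds window, std::size_t max_batch)
        : window_(window), max_batch_(max_batch) {}

    // Called by request threads to enqueue a query.
    void submit(Query q) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            pending_.push_back(std::move(q));
        }
        cv_.notify_one();
    }

    // Called by the (single) launch thread. Blocks until at least one query
    // is pending, then waits up to window_ for stragglers, returning early
    // once max_batch_ queries are available.
    std::vector<Query> next_batch() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return !pending_.empty(); });
        const auto deadline = std::chrono::steady_clock::now() + window_;
        cv_.wait_until(lock, deadline,
                       [this] { return pending_.size() >= max_batch_; });
        // Take at most max_batch_ queries; the rest stay for the next window.
        const auto n = static_cast<std::ptrdiff_t>(
            std::min(pending_.size(), max_batch_));
        std::vector<Query> batch(std::make_move_iterator(pending_.begin()),
                                 std::make_move_iterator(pending_.begin() + n));
        pending_.erase(pending_.begin(), pending_.begin() + n);
        return batch;
    }

private:
    std::chrono::microseconds window_;
    std::size_t max_batch_;
    std::mutex mu_;
    std::condition_variable cv_;
    std::vector<Query> pending_;
};
```

In this sketch the window starts when the launch thread first observes a pending query, so window_us bounds the extra latency a query pays for batching, while max_batch bounds the size of any one launch. A real batcher would also need a shutdown path, which is omitted here.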

GpuIngestionPipeline streams record vectors onto the device. GpuQuantizationPipeline trains IVF-PQ centroids and pre-encodes residuals. GpuSearchPipeline orchestrates the batched distance computation and top-k selection. GpuProfiler instruments every stage.
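
For intuition about what a batched distance and top-k pass computes, the following is a CPU reference sketch of the brute-force case with cosine distance. It is not the GPU kernel and not elips code: `cosine_distance` and `batched_top_k` are hypothetical names, and the sketch assumes non-zero vectors of equal dimension.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<float>;

// Cosine distance = 1 - cos(a, b).
// Assumption: a and b are non-zero and have the same length.
float cosine_distance(const Vec& a, const Vec& b) {
    float dot = 0.f, na = 0.f, nb = 0.f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    return 1.f - dot / (std::sqrt(na) * std::sqrt(nb));
}

// For each query in the batch, returns the indices of the k nearest
// records, nearest first (brute force: every query against every record).
std::vector<std::vector<std::size_t>> batched_top_k(
    const std::vector<Vec>& queries, const std::vector<Vec>& records,
    std::size_t k) {
    std::vector<std::vector<std::size_t>> results;
    results.reserve(queries.size());
    for (const Vec& q : queries) {
        std::vector<std::pair<float, std::size_t>> scored;
        scored.reserve(records.size());
        for (std::size_t r = 0; r < records.size(); ++r)
            scored.emplace_back(cosine_distance(q, records[r]), r);
        // Only the first `keep` entries need to be in sorted order.
        const std::size_t keep = std::min(k, scored.size());
        std::partial_sort(scored.begin(),
                          scored.begin() + static_cast<std::ptrdiff_t>(keep),
                          scored.end());
        std::vector<std::size_t> ids;
        ids.reserve(keep);
        for (std::size_t i = 0; i < keep; ++i) ids.push_back(scored[i].second);
        results.push_back(std::move(ids));
    }
    return results;
}
```

On a GPU the outer loop over queries is what batching turns into a single launch: all queries in the batch are scored against the records together rather than one kernel per query.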

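```cpp
auto cfg = elips::Config{}
    .dimension(768)                                  // vector dimensionality
    .metric(elips::Metric::cosine)                   // distance metric
    .gpu(elips::gpu::GpuConfig{}
        .policy(elips::gpu::GpuPolicy::PreferGpu)
        .algorithm(elips::gpu::Algorithm::cagra)     // graph-based search (CAGRA)
        .window_us(500)                              // wait up to 500 µs for stragglers
        .max_batch(64));                             // at most 64 queries per launch
```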