elips/docs

An embedded retrieval engine
for vectors & documents.

ELIPS is the local, in-process layer beneath your application. ANN and exact indexes, first-class document lineage, hybrid retrieval, WAL recovery, segmented persistence — without running a separate service.

quickstart.py
import elips

engine = elips.connect(":memory:", dimension=128)
arena = engine.arena("documents")

arena.ingest(
    texts=["alpha design note", "beta incident runbook"],
    meta=[{"kind": "design"}, {"kind": "ops"}],
)

for hit in arena.probe_text("alpha", top=2):
    print(hit.key, hit.distance, hit.text, hit.meta)
From the notebook

The shape of ELIPS, sketched out.

Architecture decisions worth illustrating. Every diagram in the docs is rendered the same way — hand-drawn, editorial, never stock.

Vaultcore domainIndexPortWALGpuPortEmbedder
Hexagonal ports: nothing imports an engine
writeplace / eraseWALCRC32C framesMcheckpoint ↓segments — atomic rename
WAL first, segments later, recovery always
vectorANN distancelexicaloverlap scorefusedSearchResult[]
Two scores, one planner, one ranking
Inside a hybrid query

Every retrieval walks a small, inspectable pipeline.

The planner emits a QueryPlan with a strategy, candidate set, metadata acceleration flags, and any text component. You can read it directly — in Python or C++.

Plan

Resolve filters, choose strategy.

Filter

Narrow via MetadataIndex equality sets.

Probe

ANN, exact, or hybrid fusion.

Rank

Re-sort by distance or metadata.

Yield

Project requested fields and return.

Why ELIPS

A retrieval primitive,
not a service.

Embedded by design

One process. Advisory file locks coordinate readers and writers. No daemons, no sidecars.

Document-aware records

Every record may carry text, chunk coordinates, and embedding lineage — restored across restarts.

ANN and exact, behind one port

HNSW and an exact index plug into the same IndexPort. GPU indexes follow the same contract.

Hybrid retrieval

seek_text, seek_hybrid, and EQL share one planner. Lexical overlap fuses with vector distance.

Crash-safe WAL

Every mutation appends with CRC32C before the in-memory store changes. Corrupt tails truncate cleanly.

Inspectable planner

explain_seek returns the strategy, candidate set, and acceleration flags used by the query.

Two SDKs, one core

The same runtime, in the language you reach for.

Python
python_sdk.py
import elips

db = elips.open("/tmp/elips", dimension=128, metric="cosine")
docs = db.vault("documents")
docs.place_document("alpha design note", {"kind": "design"})
hit = docs.seek_text("alpha", top=1)[0]
print(hit.document.text, hit.distance)
C++23
quickstart.cpp
#include "elips/elips.hpp"

auto db = elips::open(
    ":memory:",
    elips::Config{}.dimension(128).metric(elips::Metric::cosine));

auto& docs = db->vault("documents");
docs.place_document("alpha design note",
                    {{"kind", std::string{"design"}}});

auto hits = docs.seek_text("alpha", 1);
Why a vector database

SQL asks "equals?"
Vectors ask "close to?"

Meaning lives in geometry. Embeddings turn language, code, and user behavior into points in ℝᵈ, and the only useful question becomes "what's near this point?" — the question relational databases were never built to answer.

A vector database is the missing primitive between raw embeddings and the agents, search bars, and recommenders that consume them. ELIPS makes it small enough to embed and durable enough to trust.

your datadocumentsmd · pdf · codechatsmemory · logsimages / audiomultimodaluser actionsevents · tracesembeddertext → vector ∈ ℝᵈvector space · ℝᵈquerytop-k nearest = "meaning-similar"SQL asks "equals?" — vectors ask "close to?"retrieval by similarity is the only way agents recall prior context, code, tickets, memory.
Embed once, search by proximity forever. The query is just another point in the same space.
Agentic flow

Agents need memory. ELIPS is the memory.

Every useful agent loop ends the same way — retrieve, reason, respond, remember. ELIPS lives inside the loop, not across a network boundary, so the retrieval step costs microseconds and the write-back is just another function call.

user turnprompt · tool callagent loopLLM · plannerretrieval stepseek_hybrid(q, k)LLM callgrounded replyELIPS · in-process retrievalvault · planner · index · WALepisodic memorypast turns · summariessemantic memoryfacts · entitiesdocument corpusdocs · chunks · lineagetool tracesexec resultswrite back · place_documentretrieve → reason → respond → remember
Retrieve → reason → respond → remember. The whole cycle inside one process.

Episodic memory

Every turn is embedded and placed back into a vault for the next session.

Semantic memory

Entity cards and stable facts live in their own vault and survive restarts.

Document corpus

PDFs, code, tickets — chunked with lineage so citations are exact.

Tool traces

Past tool outputs are searchable, so the agent learns from its own runs.

System design

A low-level look at an agent stack on ELIPS.

Five layers, no sidecars. The agent runtime calls one client; the client talks to vaults; vaults route through the planner and the ports; everything terminates in a WAL frame and a segment on disk.

layer 1 · orchestrationAgent runtimeloop · tool routerTool registrysearch · code · fs · httpPolicy / guardquota · safetyTelemetryspans · evalslayer 2 · retrieval boundary (RAG bus)ElipsClient — seek · seek_text · seek_hybrid · explain_seekone process · no network hoplayer 3 · vaults (per memory class)vault: episodicturn embeddings + metavault: semanticentity cardsvault: corpusdocs · chunksvault: toolstrace embeddingslayer 4 · enginePlannerstrategy · filtersIndexPorthnsw · exact · gpuMetadataIndexequality setsEmbedderlocal · hostedGpuPortbatch · streamlayer 5 · persistenceWAL (CRC32C)every mutationsegments/ (atomic rename)checkpointLOCK (flock)single writertext_embedder/rehydratableno daemons · no sidecars · agent owns the bytes
Five layers from orchestration to persistence — the agent owns the bytes the whole way down.
Algorithms

One contract, three engines.

HNSW for scale, exact for ground truth, GPU for throughput — all behind IndexPort. The planner never branches on which one is mounted.

IndexPortbuild · upsert · search · eraseHNSW (graph)M · ef_construction · ef_searchExact (flat)brute force · deterministicGPU familybrute · ivf · pq · cagralog-ish recall@k100% recall, O(N) per querybatched, async, GpuPort-boundone contract — the planner doesn't care which
The planner sees one shape — IndexPort. Recall/latency trade-offs are a config switch, not a rewrite.
GPU acceleration

Coalesce, launch once, ship.

A dynamic batcher gathers concurrent queries inside a tiny window and fires a single kernel. One HBM trip, saturated SMs, std::expected on every fallible call.

N CPU querieslatency-boundDynamicBatcherwindow_us · max_batchGpuPort kernelcos · l2 · dottop-k resultsmergedwhy this wins1 launch ≫ N launches · one HBM trip · saturated SMsbackendsCUDA · HIP / ROCm · Metal — selected by GpuSelector
DynamicBatcher turns N CPU-side queries into one GPU launch — the only honest way to amortise PCIe.
Learn ELIPS, end to end

The sixteen-lesson tutorial.

From pip install to GPU-accelerated production serving, with sketched diagrams and runnable Python + C++ on every page.

Embed it once. Forget it ships with your binary.