An embedded retrieval engine
for vectors & documents.

ELIPS is the local, in-process layer beneath your application. ANN and exact indexes, first-class document lineage, hybrid retrieval, WAL recovery, segmented persistence — without running a separate service.

quickstart.py

import elips

engine = elips.connect(":memory:", dimension=128)
arena = engine.arena("documents")

arena.ingest(
    texts=["alpha design note", "beta incident runbook"],
    meta=[{"kind": "design"}, {"kind": "ops"}],
)

for hit in arena.probe_text("alpha", top=2):
    print(hit.key, hit.distance, hit.text, hit.meta)

From the notebook

The shape of ELIPS, sketched out.

Architecture decisions worth illustrating. Every diagram in the docs is rendered the same way — hand-drawn, editorial, never stock.

Hexagonal ports: nothing imports an engine

WAL first, segments later, recovery always

Two scores, one planner, one ranking

Inside a hybrid query

Every retrieval walks a small, inspectable pipeline.

The planner emits a QueryPlan with a strategy, a candidate set, metadata acceleration flags, and any text component. You can inspect the plan directly from Python or C++.

Plan

Resolve filters, choose strategy.

Filter

Narrow via MetadataIndex equality sets.

Probe

ANN, exact, or hybrid fusion.

Rank

Re-sort by distance or metadata.

Yield

Project requested fields and return.
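
The five stages above can be sketched in plain Python. This is a toy illustration, not ELIPS code: the `Record` and `QueryPlan` shapes, the `alpha` fusion weight, and the word-overlap score are all assumptions made for the example.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class Record:
    key: int
    vector: list
    text: str
    meta: dict

@dataclass
class QueryPlan:                      # hypothetical shape, for illustration only
    strategy: str                     # "exact" or "hybrid"
    candidates: list                  # keys that survived the metadata filter
    used_metadata_index: bool
    text: Optional[str] = None

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def lexical_overlap(query, text):
    # Fraction of query words that also appear in the record text.
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def plan(records, meta_index, where, text):
    # Plan + Filter: intersect the equality sets for each metadata filter.
    keys = {r.key for r in records}
    for name, value in where.items():
        keys &= meta_index.get((name, value), set())
    return QueryPlan("hybrid" if text else "exact", sorted(keys), bool(where), text)

def execute(records, qplan, query_vector, top, alpha=0.5):
    by_key = {r.key: r for r in records}
    scored = []
    for key in qplan.candidates:                      # Probe
        rec = by_key[key]
        score = cosine_distance(query_vector, rec.vector)
        if qplan.strategy == "hybrid":
            # Fuse both signals into one "lower is better" score.
            miss = 1.0 - lexical_overlap(qplan.text, rec.text)
            score = alpha * score + (1.0 - alpha) * miss
        scored.append((score, rec))
    scored.sort(key=lambda pair: pair[0])             # Rank
    return [(rec.key, rec.text) for _, rec in scored[:top]]   # Yield

records = [
    Record(1, [1.0, 0.0], "alpha design note", {"kind": "design"}),
    Record(2, [0.0, 1.0], "beta incident runbook", {"kind": "ops"}),
    Record(3, [0.9, 0.1], "gamma design review", {"kind": "design"}),
]
meta_index = {}
for r in records:
    for name, value in r.meta.items():
        meta_index.setdefault((name, value), set()).add(r.key)

qplan = plan(records, meta_index, {"kind": "design"}, "alpha")
hits = execute(records, qplan, [1.0, 0.0], top=2)
```

Printing `qplan` before calling `execute` shows which strategy was chosen and which keys the filter kept, which is the kind of inspection the plan is meant to allow.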

Why ELIPS

A retrieval primitive,
not a service.

Embedded by design

One process. Advisory file locks coordinate readers and writers. No daemons, no sidecars.

Document-aware records

Every record may carry text, chunk coordinates, and embedding lineage — restored across restarts.

ANN and exact, behind one port

HNSW and an exact index plug into the same IndexPort. GPU indexes follow the same contract.

Hybrid retrieval

seek_text, seek_hybrid, and EQL share one planner. Lexical overlap fuses with vector distance.

Crash-safe WAL

Every mutation is appended to the WAL with a CRC32C checksum before the in-memory store changes. On recovery, a corrupt tail is truncated cleanly.

Inspectable planner

explain_seek returns the strategy, candidate set, and acceleration flags used by the query.
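
The write-ahead rule under "Crash-safe WAL" can be illustrated with a short sketch. The frame layout here (4-byte length, 4-byte CRC32C, then payload) and the names `encode_frame` and `recover` are assumptions for illustration; the page does not specify the ELIPS on-disk format.

```python
import struct

def _build_table():
    # Lookup table for CRC32C (reflected Castagnoli polynomial).
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table

_TABLE = _build_table()

def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF

HEADER = struct.Struct("<II")  # hypothetical header: payload length, CRC32C

def encode_frame(payload: bytes) -> bytes:
    return HEADER.pack(len(payload), crc32c(payload)) + payload

def recover(log: bytes):
    """Return (valid payloads, byte offset where the valid prefix ends)."""
    payloads, offset = [], 0
    while offset + HEADER.size <= len(log):
        length, checksum = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        payload = log[start:start + length]
        if len(payload) < length or crc32c(payload) != checksum:
            break  # torn or corrupt tail: stop replaying here
        payloads.append(payload)
        offset = start + length
    return payloads, offset

log = encode_frame(b"put:1") + encode_frame(b"put:2")
torn = log + encode_frame(b"put:3")[:-2]   # simulate a crash mid-write
payloads, valid_end = recover(torn)
log_after_truncate = torn[:valid_end]      # drop the corrupt tail
```

Replay stops at the first frame that fails its length or checksum test, so everything before it is restored and everything after it is discarded.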

Two SDKs, one core

The same runtime, in the language you reach for.

Python

python_sdk.py

import elips

db = elips.open("/tmp/elips", dimension=128, metric="cosine")
docs = db.vault("documents")
docs.place_document("alpha design note", {"kind": "design"})
hit = docs.seek_text("alpha", top=1)[0]
print(hit.document.text, hit.distance)

C++23

quickstart.cpp

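#include <string>

#include "elips/elips.hpp"

int main() {
    // Open an in-memory database for 128-dimensional cosine vectors.
    auto db = elips::open(
        ":memory:",
        elips::Config{}.dimension(128).metric(elips::Metric::cosine));

    auto& docs = db->vault("documents");
    docs.place_document("alpha design note",
                        {{"kind", std::string{"design"}}});

    // Fetch the single closest document for the query text.
    auto hits = docs.seek_text("alpha", 1);
}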

Why a vector database

SQL asks "equals?"
Vectors ask "close to?"

Meaning lives in geometry. Embeddings turn language, code, and user behavior into points in ℝᵈ, and the only useful question becomes "what's near this point?" — the question relational databases were never built to answer.

A vector database is the missing primitive between raw embeddings and the agents, search bars, and recommenders that consume them. ELIPS makes it small enough to embed and durable enough to trust.

Embed once, search by proximity forever. The query is just another point in the same space.
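
A minimal sketch of "what's near this point?" in plain Python. The three-dimensional vectors are hand-written stand-ins, not real embeddings, and the brute-force scan is only meant to show the idea.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors standing in for embedded documents.
corpus = {
    "alpha design note": [0.9, 0.1, 0.0],
    "beta incident runbook": [0.1, 0.9, 0.2],
    "gamma release checklist": [0.2, 0.3, 0.9],
}

# The query is just another point in the same space.
query = [0.8, 0.2, 0.1]
nearest = max(corpus, key=lambda text: cosine_similarity(query, corpus[text]))
print(nearest)
```

An equality test would find nothing here, because no stored vector equals the query; ranking by similarity still returns a sensible answer.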

Agentic flow

Agents need memory. ELIPS is the memory.

Every useful agent loop follows the same cycle: retrieve, reason, respond, remember. ELIPS lives inside the loop, not across a network boundary, so the retrieval step is an in-process call that costs microseconds and the write-back is just another function call.

Retrieve → reason → respond → remember. The whole cycle inside one process.
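
The cycle can be sketched in a few lines. Everything here is a stand-in: `Memory` is a hypothetical in-process store rather than an ELIPS vault, `embed` is a toy hashed bag-of-words rather than a real embedding model, and the "reason" step is a stub where a model call would go.

```python
import math
import zlib

DIM = 32

def embed(text):
    # Toy embedding: hash each word into one of DIM slots, then normalize.
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

class Memory:
    """Hypothetical in-process store standing in for a vault."""

    def __init__(self):
        self.items = []

    def remember(self, text):
        self.items.append((embed(text), text))

    def retrieve(self, query, top=2):
        q = embed(query)
        scored = [(sum(a * b for a, b in zip(q, v)), t) for v, t in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:top]]

def agent_turn(memory, user_message):
    context = memory.retrieve(user_message)                # retrieve
    reply = f"Answering with {len(context)} memories"      # reason + respond (stub)
    memory.remember(f"user: {user_message}")               # remember
    return reply, context

memory = Memory()
memory.remember("the deploy runbook lives in the ops wiki")
reply, context = agent_turn(memory, "where is the deploy runbook")
```

Both the read and the write-back are ordinary method calls on an object the agent already holds, which is the point of keeping memory in the same process.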

Episodic memory

Every turn is embedded and placed back into a vault for the next session.

Semantic memory

Entity cards and stable facts live in their own vault and survive restarts.

Document corpus

PDFs, code, tickets — chunked with lineage so citations are exact.

Tool traces

Past tool outputs are searchable, so the agent learns from its own runs.

System design

A low-level look at an agent stack on ELIPS.

Five layers, no sidecars. The agent runtime calls one client; the client talks to vaults; vaults route through the planner and the ports; everything terminates in a WAL frame and a segment on disk.

Five layers from orchestration to persistence — the agent owns the bytes the whole way down.

Algorithms

One contract, three engines.

HNSW for scale, exact for ground truth, GPU for throughput — all behind IndexPort. The planner never branches on which one is mounted.

The planner sees one shape — IndexPort. Recall/latency trade-offs are a config switch, not a rewrite.
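
The idea of one contract with interchangeable engines can be sketched with a Python `Protocol`. The method names `add` and `search` are assumptions, not the real IndexPort signature, and `BucketIndex` is a deliberately crude approximate index used in place of HNSW to keep the example short.

```python
import math
from typing import Protocol, Sequence

class IndexPort(Protocol):            # hypothetical contract for illustration
    def add(self, key: int, vector: Sequence[float]) -> None: ...
    def search(self, query: Sequence[float], top: int) -> list: ...

class ExactIndex:
    """Scans every vector, so it always returns the true nearest neighbours."""

    def __init__(self):
        self._rows = {}

    def add(self, key, vector):
        self._rows[key] = list(vector)

    def search(self, query, top):
        return sorted((math.dist(query, v), k) for k, v in self._rows.items())[:top]

class BucketIndex:
    """Approximate: only scans vectors that share the query's bucket."""

    def __init__(self, cell=1.0):
        self._cell = cell
        self._buckets = {}

    def _bucket(self, vector):
        return math.floor(vector[0] / self._cell)

    def add(self, key, vector):
        self._buckets.setdefault(self._bucket(vector), {})[key] = list(vector)

    def search(self, query, top):
        rows = self._buckets.get(self._bucket(query), {})
        return sorted((math.dist(query, v), k) for k, v in rows.items())[:top]

def seek(index: IndexPort, query, top=1):
    # Planner-side code depends only on the port, never on the concrete index.
    return [key for _, key in index.search(query, top)]

points = {1: [0.1, 0.1], 2: [0.9, 0.9], 3: [1.1, 0.1]}
results = {}
for index in (ExactIndex(), BucketIndex()):
    for key, vector in points.items():
        index.add(key, vector)
    results[type(index).__name__] = seek(index, [0.95, 0.1], top=1)
```

The exact index finds key 3, while the bucketed one misses it because key 3 sits just across a bucket boundary. `seek` is identical in both cases; only the mounted index changes the recall and cost.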

GPU acceleration

Coalesce, launch once, ship.

A dynamic batcher gathers concurrent queries inside a tiny window and fires a single kernel. One HBM trip, saturated SMs, std::expected on every fallible call.

DynamicBatcher turns N CPU-side queries into one GPU launch — the only honest way to amortise PCIe.
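
The coalescing idea can be sketched on the CPU with `asyncio`. This is not the ELIPS DynamicBatcher: the class below only borrows the name, `fake_kernel` stands in for a GPU launch, and the 10 ms window is an arbitrary choice for the example.

```python
import asyncio

class DynamicBatcher:
    """Toy batcher: collect queries for a short window, then run them together."""

    def __init__(self, run_batch, window=0.01):
        self._run_batch = run_batch   # called once per batch with a list of queries
        self._window = window
        self._pending = []
        self._flush_task = None
        self.launches = 0

    async def submit(self, query):
        future = asyncio.get_running_loop().create_future()
        self._pending.append((query, future))
        if self._flush_task is None:
            # The first query in a window starts the timer for the whole batch.
            self._flush_task = asyncio.create_task(self._flush_after_window())
        return await future

    async def _flush_after_window(self):
        await asyncio.sleep(self._window)
        batch, self._pending = self._pending, []
        self._flush_task = None
        self.launches += 1
        results = self._run_batch([query for query, _ in batch])
        for (_, future), result in zip(batch, results):
            future.set_result(result)

def fake_kernel(queries):
    # Stand-in for a single GPU launch over the whole batch.
    return [q * q for q in queries]

async def main():
    batcher = DynamicBatcher(fake_kernel)
    results = await asyncio.gather(*(batcher.submit(q) for q in range(8)))
    return results, batcher.launches

results, launches = asyncio.run(main())
```

Eight concurrent callers each get their own result back, but the batch function runs once. A production batcher would also cap the batch size and handle errors from the launch, which this sketch leaves out.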

Learn ELIPS, end to end

The sixteen-lesson tutorial.

From pip install to GPU-accelerated production serving, with sketched diagrams and runnable Python + C++ on every page.

Embed it once. Forget it ships with your binary.
