Storage & recovery

A persistent ELIPS database is a directory. Each file inside it has a single responsibility: identity, WAL, manifest, segments, snapshot, lock, embedder.

On disk

[Figure: layout of the /my_db directory. One directory = one database.]
One directory holds the entire database. Files are independently meaningful and atomically replaced.
/my_db/
├── LOCK                       # advisory file lock
├── IDENTITY                   # dimension, metric, index type
├── TEXT_EMBEDDER.manifest     # embedder identity
├── wal.log                    # write-ahead log
├── elips.manifest             # segmented mode root
├── text_embedder/
│   └── default_v1_<dim>.localembed
├── segments/
│   └── vault_<n>_<epoch>.segment
└── elips.snapshot             # snapshot mode (compat)

IDENTITY

The durable source of truth for dimension, metric, and index type. Existing databases reopen with this identity; passing a conflicting value raises ConfigError.
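
The reopen rule can be sketched as follows. This is a minimal illustration, not the library's implementation: the on-disk encoding of IDENTITY is not described here, so the sketch assumes JSON, and resolve_identity, the local ConfigError class, and the example values are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


class ConfigError(Exception):
    """Local stand-in so the sketch runs on its own."""


def resolve_identity(db_dir, requested):
    """Return the stored identity, creating it on first open.

    ASSUMPTION: IDENTITY is encoded as JSON purely for illustration.
    """
    path = Path(db_dir) / "IDENTITY"
    if not path.exists():
        # First open: the requested values become the durable identity.
        path.write_text(json.dumps(requested, sort_keys=True))
        return dict(requested)
    stored = json.loads(path.read_text())
    # Reopen: any explicitly passed value must agree with what is stored.
    for key, value in requested.items():
        if value is not None and stored.get(key) != value:
            raise ConfigError(
                f"{key}: database has {stored.get(key)!r}, got {value!r}"
            )
    return stored


if __name__ == "__main__":
    db = tempfile.mkdtemp()
    # Example values are made up for the demo.
    resolve_identity(db, {"dimension": 384, "metric": "cosine", "index": "flat"})
    print(resolve_identity(db, {"dimension": 384}))  # matches: stored identity wins
```

Values left unspecified on reopen simply inherit the stored identity; only an explicit mismatch is treated as an error in this sketch.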

Embedder manifest

TEXT_EMBEDDER.manifest records provider, model, revision, dimension, fingerprint, whether the embedder is rehydratable, and a relative artifact path when applicable. For the built-in local embedder this manifest plus the .localembed artifact is everything required to restore the same embedder on reopen.

WAL

Every mutation is appended to wal.log before the in-memory vault changes. Each record is framed with a CRC32C checksum. Supported ops:

  • insert — vector + payload.
  • erase — id-only.
  • insert_ex — full document attachment, chunk info, and embedding lineage.
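
To make "framed with a CRC32C" concrete, here is a small sketch of one way to frame a record. The byte layout (little-endian length, then checksum, then payload) and the name frame_record are assumptions for illustration; the actual wal.log format is not specified here. Python's standard library has no CRC32C, so the checksum (Castagnoli polynomial) is implemented inline.

```python
import struct


def _make_table():
    """Build the byte-wise lookup table for CRC32C (reflected poly 0x82F63B78)."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0x82F63B78 if c & 1 else c >> 1
        table.append(c)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    """CRC32C (Castagnoli) of data, as an unsigned 32-bit integer."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def frame_record(payload: bytes) -> bytes:
    """HYPOTHETICAL layout: u32 length | u32 crc32c(payload) | payload."""
    return struct.pack("<II", len(payload), crc32c(payload)) + payload


if __name__ == "__main__":
    # Standard CRC32C check value for the ASCII string "123456789".
    assert crc32c(b"123456789") == 0xE3069283
    print(frame_record(b"erase:42").hex())
```

Storing the length ahead of the payload lets a reader detect a truncated record before it even attempts the checksum.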

Durability controls when the log flushes:

Mode        Flush
paranoid    Flush + fsync per write.
standard    Flush per write.
relaxed     Buffer until checkpoint / close.
ephemeral   No WAL attached.
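
The four modes map naturally onto buffered file I/O. The sketch below is one possible reading of the table, not the library's code: LogWriter and attach_wal are hypothetical names, and "flush" is taken to mean pushing the user-space buffer to the OS, while "fsync" asks the OS to push it to stable storage.

```python
import os
import tempfile


class LogWriter:
    """Append-only log writer whose flush policy depends on the mode."""

    def __init__(self, path, mode="standard"):
        if mode not in ("paranoid", "standard", "relaxed"):
            raise ValueError(f"unsupported mode: {mode!r}")
        self.mode = mode
        self._file = open(path, "ab")

    def append(self, frame: bytes) -> None:
        self._file.write(frame)
        if self.mode in ("paranoid", "standard"):
            self._file.flush()              # user-space buffer -> OS
        if self.mode == "paranoid":
            os.fsync(self._file.fileno())   # OS cache -> stable storage

    def close(self) -> None:
        # relaxed mode reaches disk here (or at a checkpoint).
        self._file.flush()
        self._file.close()


def attach_wal(path, mode):
    """ephemeral attaches no WAL at all; other modes get a writer."""
    return None if mode == "ephemeral" else LogWriter(path, mode)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "wal.log")
    wal = attach_wal(path, "relaxed")
    wal.append(b"record")
    print(os.path.getsize(path))  # small writes are still buffered at this point
    wal.close()
    print(os.path.getsize(path))
```

The trade-off runs in one direction: each step down the table gives up some durability after a crash in exchange for fewer system calls per write.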

Checkpoint & compact

checkpoint() writes the current logical state and truncates the WAL. In segmented mode it writes one fresh segment per vault, rewrites elips.manifest, and removes obsolete segment files. In snapshot mode it writes elips.snapshot.tmp, then atomically renames it to elips.snapshot.
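
The snapshot-mode step is the classic write-then-rename pattern. A minimal sketch follows; write_snapshot_atomically is a hypothetical helper, and the fsync before the rename is an assumption added here (the text above only states the temporary file and the atomic rename).

```python
import os
import tempfile


def write_snapshot_atomically(db_dir: str, data: bytes) -> str:
    """Write data to elips.snapshot.tmp, then rename it over elips.snapshot."""
    final = os.path.join(db_dir, "elips.snapshot")
    tmp = final + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ASSUMPTION: make the bytes durable before the rename
    # os.replace overwrites the destination in one step on the same filesystem,
    # so readers see either the old snapshot or the new one, never a partial file.
    os.replace(tmp, final)
    return final


if __name__ == "__main__":
    db = tempfile.mkdtemp()
    write_snapshot_atomically(db, b"state v1")
    write_snapshot_atomically(db, b"state v2")
    print(sorted(os.listdir(db)))
```

If the process dies before the rename, the previous elips.snapshot is untouched and only a stray .tmp file is left behind.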

compact() rebuilds every vault index from the authoritative record store, then checkpoints — useful after large deletions or to reset graph topology.

Recovery

[Figure: open-time recovery flow. open(path, cfg) → flock LOCK (read-write: exclusive; read-only: shared) → IDENTITY (dimension, metric, index) → embedder manifest (rehydrate) → load segments via elips.manifest, or load elips.snapshot → WAL replay (valid prefix only). Read-only opens attach no WAL writer.]
Open-time recovery: acquire the lock, resolve identity & embedder, load segments or snapshot, then replay only the valid WAL prefix.

Corrupt or truncated WAL tails are tolerated: replay stops at the first invalid record and keeps the valid prefix. An ungraceful shutdown therefore costs at most the incomplete tail of the log, not the records before it.
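
Valid-prefix replay can be sketched in a few lines. The record layout below (little-endian length, CRC32C, payload) and the names frame and valid_prefix are assumptions for illustration only; the real wal.log format is not specified here.

```python
import struct

HEADER = struct.Struct("<II")  # HYPOTHETICAL header: u32 length, u32 crc32c


def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli); slow but short enough for a sketch."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def frame(payload: bytes) -> bytes:
    return HEADER.pack(len(payload), crc32c(payload)) + payload


def valid_prefix(log: bytes):
    """Return (records, end_offset) for the longest valid prefix of the log."""
    records, offset = [], 0
    while offset + HEADER.size <= len(log):
        length, expected = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        payload = log[start:start + length]
        # A short read means a truncated tail; a bad checksum means corruption.
        if len(payload) < length or crc32c(payload) != expected:
            break
        records.append(payload)
        offset = start + length
    return records, offset


if __name__ == "__main__":
    log = frame(b"insert:1") + frame(b"erase:1") + frame(b"insert:2")[:-3]
    records, end = valid_prefix(log)
    print(records, end)  # the torn third record is dropped
```

The returned offset marks where the valid prefix ends, which is the natural point for a writer to truncate to before appending new records.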

Read-only mode

Read-only opens require an existing database and take a shared lock. Multiple readers coexist; no WAL writer is attached; every mutation path raises StorageError. This is the supported mode for fan-out serving and shared-reader analytics.
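
The locking behaviour described above corresponds to shared versus exclusive advisory locks. Below is a Unix-only sketch using Python's fcntl.flock; acquire_lock is a hypothetical helper, the non-blocking behaviour is an assumption, and the error ELIPS itself raises on lock contention is not stated here, so the sketch just lets BlockingIOError propagate.

```python
import fcntl
import os
import tempfile


def acquire_lock(db_dir: str, read_only: bool = False) -> int:
    """Open LOCK and take a shared (read-only) or exclusive (writer) flock.

    Returns the file descriptor; the lock is held until it is closed.
    """
    # Read-only opens never create the file, so a missing database fails here.
    flags = os.O_RDONLY if read_only else os.O_RDWR | os.O_CREAT
    fd = os.open(os.path.join(db_dir, "LOCK"), flags, 0o644)
    operation = fcntl.LOCK_SH if read_only else fcntl.LOCK_EX
    try:
        # ASSUMPTION: fail fast instead of waiting for the current holder.
        fcntl.flock(fd, operation | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        raise
    return fd


if __name__ == "__main__":
    db = tempfile.mkdtemp()
    os.close(acquire_lock(db))                 # a writer creates LOCK, then exits
    r1 = acquire_lock(db, read_only=True)
    r2 = acquire_lock(db, read_only=True)      # readers coexist
    try:
        acquire_lock(db)                       # writer is refused while readers hold it
    except BlockingIOError:
        print("writer blocked by readers")
    os.close(r1)
    os.close(r2)
```

Because flock is advisory, it only coordinates processes that also take the lock; it does not stop an unrelated program from modifying the files.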