mkv/README.md
2026-03-07 16:12:29 +01:00

mkv

Distributed key-value store for blobs. Thin index server (Rust + SQLite) in front of nginx volume servers. Inspired by minikeyvalue.

Usage

# Start the index server (replicates to 2 of 3 volumes)
mkv -d /tmp/index.db -v http://vol1:8080,http://vol2:8080,http://vol3:8080 -r 2 serve -p 3000

# Store a value (for a file, use --data-binary @FILE so curl sends the bytes unmodified)
curl -X PUT -d "contents" http://localhost:3000/path/to/key

# Retrieve (returns 302 redirect to nginx)
curl -L http://localhost:3000/path/to/key

# Check existence and size
curl -I http://localhost:3000/path/to/key

# Delete
curl -X DELETE http://localhost:3000/path/to/key

# List keys (with optional prefix filter); quote the URL so the shell does not treat "?" as a glob
curl "http://localhost:3000/?prefix=path/to/"
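
The same calls can be made from a script. The sketch below builds the requests with Python's standard library only. The helper names are hypothetical, and it assumes the server accepts percent-encoded keys, which this README does not state.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE = "http://localhost:3000"  # index server address from the examples above


def key_url(key: str, base: str = BASE) -> str:
    # Percent-encode everything except "/" so the key keeps its path shape.
    return f"{base}/{quote(key, safe='/')}"


def put_request(key: str, body: bytes) -> Request:
    return Request(key_url(key), data=body, method="PUT")


def get_request(key: str) -> Request:
    return Request(key_url(key), method="GET")


def head_request(key: str) -> Request:
    return Request(key_url(key), method="HEAD")


def delete_request(key: str) -> Request:
    return Request(key_url(key), method="DELETE")


def list_request(prefix: str = "", base: str = BASE) -> Request:
    # safe="/" keeps the prefix readable, matching the curl example.
    return Request(f"{base}/?{urlencode({'prefix': prefix}, safe='/')}", method="GET")
```

Pass any of these to urllib.request.urlopen to send it. For GET, urlopen follows the 302 to the volume, like curl -L.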

Operations

# Rebuild index by scanning all volumes (disaster recovery)
mkv -d /tmp/index.db -v http://vol1:8080,http://vol2:8080,http://vol3:8080 -r 2 rebuild

# Rebalance after adding/removing volumes (preview with --dry-run)
mkv -d /tmp/index.db -v http://vol1:8080,http://vol2:8080,http://vol3:8080,http://vol4:8080 -r 2 rebalance --dry-run
mkv -d /tmp/index.db -v http://vol1:8080,http://vol2:8080,http://vol3:8080,http://vol4:8080 -r 2 rebalance

Volume servers

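Any nginx built with the WebDAV module (ngx_http_dav_module, which provides dav_methods) works. Set listen to the port used in your -v URLs (8080 in the examples above):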

server {
    listen 80;
    root /data;
    location / {
        dav_methods PUT DELETE;
        create_full_put_path on;
        autoindex on;
        autoindex_format json;
    }
}
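
Rebuild depends on the two autoindex directives. With autoindex_format json, nginx answers a directory request with a JSON array of entries carrying name, type ("file" or "directory"), mtime and, for files, size. The following is a minimal sketch of walking such listings to recover key locations. It is not mkv's actual code, and it assumes a caller-supplied fetch function (hypothetical) that returns the listing body for a directory path.

```python
import json
from typing import Callable, Iterator

# Hypothetical helper type: directory path (ending in "/") -> JSON listing body.
Fetch = Callable[[str], str]


def walk_volume(fetch: Fetch, path: str = "/") -> Iterator[tuple[str, int]]:
    """Yield (key, size) for every file below `path` on one volume."""
    for entry in json.loads(fetch(path)):
        child = f"{path}{entry['name']}"
        if entry["type"] == "directory":
            yield from walk_volume(fetch, child + "/")
        elif entry["type"] == "file":
            # Blobs live at /{key}, so the key is the path minus the leading "/".
            yield child.lstrip("/"), entry["size"]


def rebuild_index(volumes: dict[str, Fetch]) -> dict[str, list[str]]:
    """Map each key to the volumes that currently hold it."""
    index: dict[str, list[str]] = {}
    for volume, fetch in volumes.items():
        for key, _size in walk_volume(fetch):
            index.setdefault(key, []).append(volume)
    return index
```

In a real run, fetch would issue an HTTP GET against the volume; keeping it injectable makes the walk easy to exercise without nginx.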

What it does

  • HTTP API — PUT, GET (302 redirect), DELETE, HEAD, LIST with prefix filtering
  • Replication — fan-out writes to N volumes concurrently, all-or-nothing with rollback
  • Consistent hashing — stable volume assignment; adding/removing a volume only moves ~1/N of keys
  • Rebuild — reconstructs the SQLite index by scanning nginx autoindex on all volumes
  • Rebalance — migrates data to correct volumes after topology changes, with --dry-run preview
  • Key-as-path — blobs stored at /{key} on nginx, no content-addressing or sidecar files
  • Single binary — no config files, everything via CLI flags
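
The placement rule behind the replication and consistent-hashing bullets (score every volume for a key, sort, keep the top N) is commonly called rendezvous or highest-random-weight hashing. The sketch below illustrates it under assumptions: the score function hashes the volume URL and key with SHA-256, but the exact bytes mkv hashes are not given in this README, and the function names are hypothetical.

```python
import hashlib


def score(volume: str, key: str) -> bytes:
    # Assumed layout: volume URL, a NUL separator, then the key.
    return hashlib.sha256(f"{volume}\0{key}".encode()).digest()


def pick_volumes(key: str, volumes: list[str], replicas: int) -> list[str]:
    """Return the `replicas` highest-scoring volumes for `key`."""
    if len(volumes) < replicas:
        raise ValueError(
            f"need {replicas} volumes but only {len(volumes)} available"
        )
    ranked = sorted(volumes, key=lambda v: score(v, key), reverse=True)
    return ranked[:replicas]


def moved_copies(keys: list[str], old: list[str], new: list[str], replicas: int) -> float:
    """Fraction of stored copies that land on a different volume after a change."""
    moved = sum(
        len(set(pick_volumes(k, new, replicas)) - set(pick_volumes(k, old, replicas)))
        for k in keys
    )
    return moved / (len(keys) * replicas)
```

Because each volume's score for a key is independent of the other volumes, adding a fourth volume only displaces copies that the new volume now outranks. With -r 2, moved_copies over a large key set going from three to four volumes should come out at roughly one quarter, which matches the ~1/N figure above.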

What it doesn't do

  • Checksums — no integrity verification; bit rot goes undetected
  • Auth — no access control; anyone who can reach the server can read/write/delete
  • Encryption — blobs stored as plain files on nginx
  • Streaming / range requests — entire blob must fit in memory
  • Metadata — no EXIF, tags, or content types; key path is all you get
  • Versioning — PUT overwrites; no history
  • Compression — blobs stored as-is

Comparison to minikeyvalue

mkv is a ground-up rewrite of minikeyvalue in Rust.

| Feature | mkv | minikeyvalue |
|---|---|---|
| Language | Rust | Go |
| Index | SQLite (WAL mode) | LevelDB |
| Storage paths | key-as-path (/{key}) | content-addressed (md5 + base64) |
| GET behavior | Index lookup, 302 redirect | HEAD to volume first, then 302 redirect |
| PUT overwrite | Allowed | Forbidden (returns 403) |
| Hash function | SHA-256 per volume, sort by score | MD5 per volume, sort by score |
| MD5 of values | No | Yes (stored in index) |
| Health checker | No | No (checks per-request via HEAD) |
| Subvolumes | No | Yes (configurable fan-out directories) |
| Soft delete | No (hard delete) | Yes (UNLINK + DELETE two-phase) |
| S3 API | No | Partial (list, multipart upload) |
| App code | ~600 lines | ~1,000 lines |
| Tests | 17 (unit + integration) | 1 |

Performance (10k keys, 1 KB values, 100 concurrent requests)

Tested on the same machine with shared nginx volumes:

| Operation | mkv | minikeyvalue |
|---|---|---|
| PUT | 10,000 req/s | 10,500 req/s |
| GET (full round-trip) | 7,000 req/s | 6,500 req/s |
| GET (index only) | 15,800 req/s | 13,800 req/s |
| DELETE | 13,300 req/s | 13,600 req/s |

Both are bottlenecked by nginx volume I/O. The index layer (SQLite) can sustain 378,000 writes/sec in isolation.

Error responses

Every error returns a plain-text body with a human-readable message.

| Status | Error | When |
|---|---|---|
| 404 Not Found | not found | GET, HEAD, DELETE for a key that doesn't exist |
| 500 Internal Server Error | corrupt record for key {key}: no volumes | Key exists in index but has no volume locations (data integrity issue) |
| 500 Internal Server Error | database error: {detail} | SQLite failure (disk full, corruption, locked) |
| 502 Bad Gateway | not all volume writes succeeded | PUT where one or more volume writes failed; all volumes are rolled back |
| 503 Service Unavailable | need {n} volumes but only {m} available | PUT when fewer volumes are configured than the replication factor requires |

Failure modes

PUT writes to all target volumes concurrently, then updates the index. If any volume write fails, all volumes are rolled back (best-effort) and the client gets 502. If volume writes succeed but the index update fails, volumes are rolled back and the client gets 500.
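
The PUT sequence above can be sketched as follows. This is an illustration of the described control flow, not the Rust implementation: the volume and index operations are injected as hypothetical callables, and the function returns None on success because the README does not say which success status PUT uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def replicated_put(
    key: str,
    body: bytes,
    targets: list[str],
    put_blob: Callable[[str, str, bytes], None],   # hypothetical: write to one volume
    delete_blob: Callable[[str, str], None],       # hypothetical: delete from one volume
    index_insert: Callable[[str, list[str]], None],  # hypothetical: record key -> volumes
) -> Optional[int]:
    """Return None on success, or the error status the client would see."""

    def try_put(volume: str) -> bool:
        try:
            put_blob(volume, key, body)
            return True
        except Exception:
            return False

    def rollback() -> None:
        # Best-effort: a failed delete is ignored rather than retried.
        for volume in targets:
            try:
                delete_blob(volume, key)
            except Exception:
                pass

    # Fan out to every target volume concurrently and wait for all of them.
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        results = list(pool.map(try_put, targets))

    if not all(results):
        rollback()
        return 502  # not all volume writes succeeded
    try:
        index_insert(key, targets)
    except Exception:
        rollback()
        return 500  # volumes were written but the index update failed
    return None
```

The index is touched only after every volume write has succeeded, so a key never appears in the index without its full set of copies.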

DELETE removes the key from the index and issues best-effort deletes to all volumes. Volume delete failures are logged but do not fail the request — the client always gets 204 if the key existed. This can leave orphaned blobs on volumes; use rebuild to reconcile.
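
A small sketch of that ordering, with an in-memory dict standing in for the SQLite index and a hypothetical delete_blob callable standing in for the volume request:

```python
import logging
from typing import Callable

log = logging.getLogger("mkv-sketch")


def delete_key(
    key: str,
    index: dict[str, list[str]],
    delete_blob: Callable[[str, str], None],  # hypothetical: delete from one volume
) -> int:
    """Return the status the client would see."""
    volumes = index.pop(key, None)  # the index entry goes first
    if volumes is None:
        return 404
    for volume in volumes:
        try:
            delete_blob(volume, key)
        except Exception as exc:
            # Logged only: the blob stays behind as an orphan on this volume.
            log.warning("delete of %s on %s failed: %s", key, volume, exc)
    return 204
```

Once the index entry is gone the key is no longer reachable through the index server, even if a copy survives on a volume.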

GET looks up the key in the index and returns a 302 redirect to the first volume. If the volume is unreachable, the client sees the failure directly from nginx (the index server does not proxy the blob).
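
In code, the GET path amounts to a lookup followed by a redirect. The sketch below uses a dict in place of the index and assumes the redirect target is the volume URL plus /{key}, in line with the key-as-path layout; the function name is hypothetical.

```python
def handle_get(key: str, index: dict[str, list[str]]) -> tuple[int, dict[str, str], str]:
    """Return (status, headers, body) without contacting any volume."""
    volumes = index.get(key)
    if volumes is None:
        return 404, {}, "not found"
    if not volumes:
        return 500, {}, f"corrupt record for key {key}: no volumes"
    # Redirect to the first recorded volume; the client fetches the blob itself.
    return 302, {"Location": f"{volumes[0]}/{key}"}, ""
```

Since no volume is contacted here, an unreachable volume only surfaces when the client follows the Location header.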

Security

mkv assumes a trusted network. There is no built-in authentication, authorization, or encryption. This is the same security model as minikeyvalue — neither system is designed for direct exposure to the public internet.

Trust model

The index server and volume servers (nginx) are expected to live on the same private network. GET requests return a 302 redirect to a volume URL, so clients must be able to reach the volumes directly. Anyone who can reach the index server can read, write, and delete any key. Anyone who can reach a volume can read any blob.

Deploying with auth

Put a reverse proxy in front of the index server and handle authentication there:

  • Basic auth or API keys at the reverse proxy for simple setups
  • mTLS for machine-to-machine access
  • OAuth / JWT validation at the proxy for multi-user setups

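Protecting the index server alone is not enough, because GET redirects clients straight to a volume. Either keep the volume servers on a private network that clients cannot reach and have the reverse proxy follow the 302 on the client's behalf, or leave them reachable and use nginx's secure_link module to validate signed redirect URLs.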
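
For the secure_link option, something has to sign the volume URL before the client receives it; this README does not describe mkv doing so, so the signer would be a proxy or a patched index server. The sketch below shows how such a token could be computed, assuming the volume is configured with secure_link $arg_md5,$arg_expires and secure_link_md5 "$secure_link_expires$uri SECRET" (an example expression, not something taken from mkv). nginx expects the MD5 digest in URL-safe base64 with the padding removed.

```python
import base64
import hashlib
from urllib.parse import quote


def sign_volume_url(volume: str, key: str, secret: str, expires: int) -> str:
    """Build a volume URL carrying md5 and expires arguments for secure_link."""
    uri = f"/{key}"  # nginx's $uri is the decoded request path
    # Must mirror the assumed secure_link_md5 expression exactly, space included.
    digest = hashlib.md5(f"{expires}{uri} {secret}".encode()).digest()
    token = base64.urlsafe_b64encode(digest).decode().rstrip("=")
    return f"{volume}{quote(uri)}?md5={token}&expires={expires}"
```

expires is a Unix timestamp; once it has passed, nginx sets $secure_link to "0", and the volume config can reject the request on that basis.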

What neither mkv nor minikeyvalue protects against

  • Unauthorized reads/writes (no auth)
  • Data in transit (no TLS unless the proxy adds it)
  • Data at rest (blobs are plain files on disk)
  • Malicious keys (no input sanitization beyond what nginx enforces on paths)
  • Index tampering (SQLite file has no integrity protection)