mkv
Distributed key-value store for blobs. Thin index server (Rust + SQLite) in front of nginx volume servers. Inspired by minikeyvalue.
Usage
```bash
# Start the index server (replicates to 2 of 3 volumes)
mkv -d /tmp/index.db -v http://vol1:8080,http://vol2:8080,http://vol3:8080 -r 2 serve -p 3000

# Store a file
curl -X PUT -d "contents" http://localhost:3000/path/to/key

# Retrieve (returns 302 redirect to nginx)
curl -L http://localhost:3000/path/to/key

# Check existence and size
curl -I http://localhost:3000/path/to/key

# Delete
curl -X DELETE http://localhost:3000/path/to/key

# List keys (with optional prefix filter); quote the URL so the shell does not glob the '?'
curl "http://localhost:3000/?prefix=path/to/"
```
Operations
```bash
# Rebuild index by scanning all volumes (stop the server first)
mkv -d /tmp/index.db -v http://vol1:8080,http://vol2:8080,http://vol3:8080 -r 2 rebuild

# Rebalance after adding/removing volumes (preview with --dry-run)
mkv -d /tmp/index.db -v http://vol1:8080,http://vol2:8080,http://vol3:8080,http://vol4:8080 -r 2 rebalance --dry-run
mkv -d /tmp/index.db -v http://vol1:8080,http://vol2:8080,http://vol3:8080,http://vol4:8080 -r 2 rebalance
```
Volume servers
Any nginx with WebDAV enabled works:
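```nginx
server {
    listen 8080;  # match the port used in the -v volume URLs
    root /data;

    # nginx rejects request bodies larger than 1 MB by default; 0 disables the check
    client_max_body_size 0;

    location / {
        dav_methods PUT DELETE;
        create_full_put_path on;
        autoindex on;
        autoindex_format json;
    }
}
```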
What it does
- HTTP API: PUT, GET (302 redirect), DELETE, HEAD, and LIST with prefix filtering
- Replication: fan-out writes to N volumes concurrently; all-or-nothing, with best-effort rollback on failure
- Consistent hashing: stable volume assignment; adding or removing a volume moves only about 1/N of keys
- Rebuild: reconstructs the SQLite index by scanning nginx autoindex on all volumes
- Rebalance: migrates data to the correct volumes after topology changes, with a `--dry-run` preview
- Key-as-path: blobs are stored at `/{key}` on nginx, with no content-addressing or sidecar files
- Single binary: no config files; everything is set via CLI flags
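The "SHA-256 per volume, sort by score" scheme listed in the comparison table is rendezvous (highest-random-weight) hashing. The sketch below is a hypothetical re-implementation, not the code in `hasher.rs`: to stay dependency-free it swaps SHA-256 for FNV-1a, and the input layout (volume URL, a zero byte, then the key) is an assumption. Because each volume's score depends only on that volume and the key, adding a volume can only displace existing picks when the new volume outscores them, which is where the "moves about 1/N of keys" property comes from.

```rust
/// FNV-1a 64-bit. A dependency-free stand-in for SHA-256 in this sketch only.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Rendezvous hashing: score every volume against the key, sort by score
/// (highest first), and keep the top `replicas` volumes.
fn volumes_for_key(key: &str, volumes: &[String], replicas: usize) -> Vec<String> {
    let mut scored: Vec<(u64, &String)> = volumes
        .iter()
        .map(|v| {
            // Hash volume and key together; the zero byte keeps
            // ("ab", "c") and ("a", "bc") from producing the same input.
            let mut input = Vec::with_capacity(v.len() + 1 + key.len());
            input.extend_from_slice(v.as_bytes());
            input.push(0);
            input.extend_from_slice(key.as_bytes());
            (fnv1a64(&input), v)
        })
        .collect();
    // Highest score first; ties broken by URL so the order is always total.
    scored.sort_by(|a, b| b.0.cmp(&a.0).then_with(|| a.1.cmp(b.1)));
    scored
        .into_iter()
        .take(replicas)
        .map(|(_, v)| v.clone())
        .collect()
}
```

Since the function is pure, the stability property can be checked with plain assertions: for every key whose new replica set does not include the added volume, the set is identical to the old one.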
What it doesn't do
- Checksums — no integrity verification; bit rot goes undetected
- Auth — no access control; anyone who can reach the server can read/write/delete
- Encryption — blobs stored as plain files on nginx
- Streaming / range requests — entire blob must fit in memory
- Metadata — no EXIF, tags, or content types; key path is all you get
- Versioning — PUT overwrites; no history
- Compression — blobs stored as-is
Comparison to minikeyvalue
mkv is a ground-up rewrite of minikeyvalue in Rust.
| | mkv | minikeyvalue |
|---|---|---|
| Language | Rust | Go |
| Index | SQLite (WAL mode) | LevelDB |
| Storage paths | key-as-path (`/{key}`) | content-addressed (md5 + base64) |
| GET behavior | Index lookup, 302 redirect | HEAD to volume first, then 302 redirect |
| PUT overwrite | Allowed | Forbidden (returns 403) |
| Hash function | SHA-256 per volume, sort by score | MD5 per volume, sort by score |
| MD5 of values | No | Yes (stored in index) |
| Health checker | No | No (checks per-request via HEAD) |
| Subvolumes | No | Yes (configurable fan-out directories) |
| Soft delete | No (hard delete) | Yes (UNLINK + DELETE two-phase) |
| S3 API | No | Partial (list, multipart upload) |
| App code | ~600 lines | ~1,000 lines |
| Tests | 17 (unit + integration) | 1 |
Performance (10k keys, 1 KB values, concurrency of 100)
Tested on the same machine with shared nginx volumes:
| Operation | mkv | minikeyvalue |
|---|---|---|
| PUT | 10,000 req/s | 10,500 req/s |
| GET (full round-trip) | 7,000 req/s | 6,500 req/s |
| GET (index only) | 15,800 req/s | 13,800 req/s |
| DELETE | 13,300 req/s | 13,600 req/s |
Both are bottlenecked by nginx volume I/O. The index layer (SQLite) can sustain 378,000 writes/sec in isolation.
Error responses
Every error returns a plain-text body with a human-readable message.
| Status | Error | When |
|---|---|---|
| 404 Not Found | `not found` | GET, HEAD, or DELETE for a key that doesn't exist |
| 500 Internal Server Error | `corrupt record for key {key}: no volumes` | Key exists in the index but has no volume locations (data integrity issue) |
| 500 Internal Server Error | `database error: {detail}` | SQLite failure (disk full, corruption, locked) |
| 502 Bad Gateway | `not all volume writes succeeded` | PUT where one or more volume writes failed; all volumes are rolled back |
| 503 Service Unavailable | `need {n} volumes but only {m} available` | PUT when fewer volumes are configured than the replication factor requires |
Failure modes
PUT writes to all target volumes concurrently, then updates the index. If any volume write fails, all volumes are rolled back (best-effort) and the client gets 502. If volume writes succeed but the index update fails, volumes are rolled back and the client gets 500.
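DELETE removes the key from the index and issues best-effort deletes to all volumes. Volume delete failures are logged but do not fail the request: the client always gets 204 if the key existed. This can leave orphaned blobs on volumes. Because rebuild reconstructs the index from what is actually on the volumes, running it re-indexes those orphans rather than removing them; issue the DELETE again afterwards to clean them up.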
GET looks up the key in the index and returns a 302 redirect to the first volume. If the volume is unreachable, the client sees the failure directly from nginx (the index server does not proxy the blob).
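The GET path above is almost entirely a decision: given the index lookup, choose between a redirect, a 404, and the "corrupt record" 500. A minimal sketch of that decision as a pure function follows; `GetDecision` and `decide_get` are hypothetical names, and the sketch assumes the index record is a list of volume base URLs and that keys arrive without a leading slash.

```rust
/// What the GET handler should do, decided without any IO.
#[derive(Debug, PartialEq)]
enum GetDecision {
    /// 302 to this URL on a volume server.
    Redirect(String),
    /// 404: the key is not in the index.
    NotFound,
    /// 500: the key is in the index but lists no volumes.
    CorruptRecord,
}

/// `record` is the index lookup result: None if the key is absent,
/// otherwise the volume base URLs holding a replica.
fn decide_get(key: &str, record: Option<&[String]>) -> GetDecision {
    match record {
        None => GetDecision::NotFound,
        Some(volumes) => match volumes.first() {
            // Key-as-path: the blob lives at /{key} on the first volume.
            Some(volume) => {
                GetDecision::Redirect(format!("{}/{}", volume.trim_end_matches('/'), key))
            }
            None => GetDecision::CorruptRecord,
        },
    }
}
```

The handler's IO shrinks to "look up the record, call `decide_get`, turn the result into a response", and the three outcomes are covered by `assert_eq!` alone.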
Security
mkv assumes a trusted network. There is no built-in authentication, authorization, or encryption. This is the same security model as minikeyvalue — neither system is designed for direct exposure to the public internet.
Trust model
The index server and volume servers (nginx) are expected to live on the same private network. GET requests return a 302 redirect to a volume URL, so clients must be able to reach the volumes directly. Anyone who can reach the index server can read, write, and delete any key. Anyone who can reach a volume can read any blob.
Deploying with auth
Put a reverse proxy in front of the index server and handle authentication there:
- Basic auth or API keys at the reverse proxy for simple setups
- mTLS for machine-to-machine access
- OAuth / JWT validation at the proxy for multi-user setups
Volume servers should be on a private network that clients cannot reach directly, or use nginx's secure_link module to validate signed redirect URLs.
What neither mkv nor minikeyvalue protects against
- Unauthorized reads/writes (no auth)
- Data in transit (no TLS unless the proxy adds it)
- Data at rest (blobs are plain files on disk)
- Malicious keys (no input sanitization beyond what nginx enforces on paths)
- Index tampering (SQLite file has no integrity protection)
Development
Principles
- Explicit over clever: no magic helpers, no macros that hide control flow, no trait gymnastics. Code reads top-to-bottom. A new reader should understand what a function does without chasing through layers of indirection.
- Pure functions: isolate decision logic from IO. A function that takes data and returns data is testable, composable, and easy to reason about. Keep it that way. Don't sneak in network calls or logging.
- Linear flow: avoid callbacks, deep nesting, and async gymnastics where possible. A handler should read like a sequence of steps: look up the record, pick a volume, build the response.
- Minimize shared state: pass values explicitly. Don't hold locks across IO. Don't reach into globals.
- Minimize indirection: don't hide logic behind abstractions that exist "in case we need to swap the implementation later." We won't. A three-line function inline is better than a trait with one implementor.
Applying the principles: separate decisions from execution
Every request handler does two things: decides what should happen, then executes IO to make it happen. These should be separate functions.
A decision is a pure function. It takes data in, returns a description of what
to do. It doesn't call the network, doesn't touch the database, doesn't log.
It can be tested with assert_eq! and nothing else.
Execution is the messy part — HTTP calls, SQLite writes, error recovery. It reads the decision and carries it out. It's tested with integration tests.
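The split can be illustrated with DELETE's volume fan-out. This is a hypothetical sketch (`plan_delete` and `execute_deletes` are not names from the codebase): the decision builds the list of URLs, and the execution walks it. To keep the example free of an HTTP client, the network call is passed in as a closure, and failures are collected instead of aborting, mirroring the best-effort volume deletes described under Failure modes.

```rust
/// Decision: which URLs to DELETE for a key. Pure; testable with assert_eq!.
fn plan_delete(key: &str, volumes: &[String]) -> Vec<String> {
    volumes
        .iter()
        .map(|v| format!("{}/{}", v.trim_end_matches('/'), key))
        .collect()
}

/// Execution: carry out the plan. `send` stands in for the real HTTP DELETE.
/// Every URL is attempted; failures are returned (for logging) and never
/// abort the remaining deletes.
fn execute_deletes<F>(urls: &[String], mut send: F) -> Vec<(String, String)>
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut failures = Vec::new();
    for url in urls {
        if let Err(e) = send(url.as_str()) {
            failures.push((url.clone(), e));
        }
    }
    failures
}
```

Only `execute_deletes` ever needs a live volume in an integration test; `plan_delete` is checked with a single comparison.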
Where this applies today
Already pure
hasher.rs — the entire module is pure. volumes_for_key is a
deterministic function of its inputs. No IO, no state mutation. This is the
gold standard for the project.
rebalance.rs::plan_rebalance — takes a slice of records and returns a
list of moves. Pure decision logic, tested with unit tests.
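A sketch of what a pure rebalance planner can look like follows. The `Move` shape and the `desired_for` parameter are assumptions, not the signature of `plan_rebalance`: injecting the placement function (in practice, `volumes_for_key` with the new volume list) keeps the planner free of any hashing or IO, and comparing replica sets rather than ordered lists avoids moving data when only the ranking changed.

```rust
use std::collections::BTreeSet;

/// One key whose replica set must change (hypothetical shape).
#[derive(Debug, PartialEq)]
struct Move {
    key: String,
    add: Vec<String>,    // volumes that should receive a copy
    remove: Vec<String>, // volumes whose copy becomes surplus
}

/// `records` pairs each key with the volumes the index says hold it;
/// `desired_for` returns where the key should live under the new topology.
fn plan_rebalance<F>(records: &[(String, Vec<String>)], desired_for: F) -> Vec<Move>
where
    F: Fn(&str) -> Vec<String>,
{
    let mut moves = Vec::new();
    for (key, current) in records {
        let desired = desired_for(key.as_str());
        let have: BTreeSet<&String> = current.iter().collect();
        let want: BTreeSet<&String> = desired.iter().collect();
        if have == want {
            continue; // already placed correctly; nothing to move
        }
        moves.push(Move {
            key: key.clone(),
            add: want.difference(&have).map(|v| (*v).clone()).collect(),
            remove: have.difference(&want).map(|v| (*v).clone()).collect(),
        });
    }
    moves
}
```

A `--dry-run` then only needs to print the returned moves; the real run copies each `add` target from one of the current volumes before deleting the `remove` entries.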
db.rs encode/parse — parse_volumes and encode_volumes are pure
transformations between JSON strings and Vec<String>.
Mixed (decision + execution interleaved)
server.rs::put_key does four things in one function:
1. Decide which volumes to write to (pure: `volumes_for_key`)
2. Execute fan-out PUTs to nginx (IO)
3. Decide whether to roll back based on the results (pure: check which writes succeeded)
4. Execute rollback DELETEs and/or the index write (IO)
Steps 1 and 3 could be extracted as pure functions if they grow more complex.
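If they were extracted, the decisions might look like the hypothetical sketch below: a capacity guard that produces the 503 message from the error table, and a step-3 decision that maps per-volume results to either committing the index write or rolling back. Following the Failure modes description, the rollback targets every volume rather than only those that reported success, since a write that timed out may still have landed.

```rust
/// Step 1 guard: refuse the PUT (503) when the replication factor exceeds
/// the number of configured volumes.
fn check_capacity(configured: usize, replicas: usize) -> Result<(), String> {
    if configured < replicas {
        Err(format!("need {} volumes but only {} available", replicas, configured))
    } else {
        Ok(())
    }
}

#[derive(Debug, PartialEq)]
enum PutOutcome {
    /// Every volume accepted the blob: write the index record.
    Commit { volumes: Vec<String> },
    /// Some write failed: DELETE from all targets, skip the index, answer 502.
    Rollback { delete_from: Vec<String> },
}

/// Step 3: `results` pairs each target volume with whether its PUT succeeded.
fn decide_put(results: &[(String, bool)]) -> PutOutcome {
    let targets: Vec<String> = results.iter().map(|(volume, _)| volume.clone()).collect();
    if results.iter().all(|(_, ok)| *ok) {
        PutOutcome::Commit { volumes: targets }
    } else {
        PutOutcome::Rollback { delete_from: targets }
    }
}
```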
Intentionally impure
rebuild.rs — walks nginx autoindex and bulk-inserts into SQLite. The IO
is the whole point; there's no decision logic worth extracting.
db.rs — wraps SQLite behind Arc<Mutex<Connection>> with
spawn_blocking to avoid blocking the tokio runtime. The mutex serializes all
access; SQLITE_OPEN_NO_MUTEX disables SQLite's internal locking since the
application mutex handles it.
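The locking discipline can be sketched with the standard library alone. In this hypothetical example a `HashMap` stands in for the SQLite `Connection`, and `std::thread::spawn` followed by `join` stands in for awaiting `tokio::task::spawn_blocking`. The point it shows is that each call takes the mutex, does one unit of database work, and releases the guard when the closure ends, so the lock is never held across network IO.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Stand-in for the SQLite connection: key -> volume list.
type Index = HashMap<String, Vec<String>>;

#[derive(Clone)]
struct Db {
    conn: Arc<Mutex<Index>>,
}

impl Db {
    fn new() -> Self {
        Db { conn: Arc::new(Mutex::new(HashMap::new())) }
    }

    /// The guard lives only inside the blocking closure.
    fn put(&self, key: String, volumes: Vec<String>) {
        let conn = Arc::clone(&self.conn);
        thread::spawn(move || {
            let mut guard = conn.lock().unwrap();
            guard.insert(key, volumes);
        })
        .join()
        .unwrap();
    }

    fn get(&self, key: &str) -> Option<Vec<String>> {
        let conn = Arc::clone(&self.conn);
        let key = key.to_string();
        thread::spawn(move || {
            let guard = conn.lock().unwrap();
            guard.get(&key).cloned()
        })
        .join()
        .unwrap()
    }
}
```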
Guidelines
- If a function takes only data and returns only data, it's pure. Keep it that way. Don't sneak in logging, metrics, or "just one network call."
- If a handler has an `if` or `match` that decides between outcomes, that decision can probably be a pure function. Extract it. Name it. Test it.
- IO boundaries should be thin: format the URL, make the request, check the status, return the bytes. No business logic.
- Don't over-abstract. A three-line pure function inline in a handler is fine. Extract it when it gets complex enough to need its own tests, or when the same decision appears in multiple places (e.g., rebuild and rebalance both use `volumes_for_key`).
- Errors are data. `AppError` is a value, not an exception. Functions return `Result`, and handlers pattern-match on it. The `IntoResponse` impl is the only place where errors become HTTP responses: one place, one mapping.
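A sketch of errors-as-data, assuming the `IntoResponse` mentioned here is axum's trait. The variant names are hypothetical; the messages and status codes are taken from the Error responses table. So that it runs without a web framework, the single mapping is written as a plain function returning `(status, body)`, which is what the body of `impl IntoResponse for AppError` would compute.

```rust
use std::fmt;

/// Errors as plain values (hypothetical variant names).
#[derive(Debug, PartialEq)]
enum AppError {
    NotFound,
    CorruptRecord { key: String },
    Database(String),
    PartialWrite,
    InsufficientVolumes { need: usize, have: usize },
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::NotFound => write!(f, "not found"),
            AppError::CorruptRecord { key } => write!(f, "corrupt record for key {}: no volumes", key),
            AppError::Database(detail) => write!(f, "database error: {}", detail),
            AppError::PartialWrite => write!(f, "not all volume writes succeeded"),
            AppError::InsufficientVolumes { need, have } => {
                write!(f, "need {} volumes but only {} available", need, have)
            }
        }
    }
}

/// The one place where an error becomes an HTTP status and plain-text body.
fn to_response(err: &AppError) -> (u16, String) {
    let status = match err {
        AppError::NotFound => 404,
        AppError::CorruptRecord { .. } | AppError::Database(_) => 500,
        AppError::PartialWrite => 502,
        AppError::InsufficientVolumes { .. } => 503,
    };
    (status, err.to_string())
}
```

Because the `match` in `to_response` is exhaustive, adding a variant without deciding its status code is a compile error rather than a silent 500.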
Anti-patterns to avoid
- God handler: a 100-line async fn that reads the DB, calls volumes, makes decisions, handles errors, and formats the response. Break it up.
- Hidden state reads: if a function needs data, pass it in. Don't reach into a global or lock a mutex inside a "pure" function.
- Testing IO to test logic: if you need a Docker container running to test whether volume selection works correctly, the logic isn't separated from the IO.