Make plan

This commit is contained in:
parent 2a2afa5f69 / commit 23a075382f
3 changed files with 386 additions and 0 deletions

PHILOSOPHY.md (new file, 157 lines)
# Project Philosophy

## Principles

1. **Explicit over clever** — no magic helpers, no macros that hide control
   flow, no trait gymnastics. Code reads top-to-bottom. A new reader should
   understand what a function does without chasing through layers of
   indirection.

2. **Pure functions** — isolate decision logic from IO. A function that takes
   data and returns data is testable, composable, and easy to reason about.
   Keep it that way. Don't sneak in network calls or logging.

3. **Linear flow** — avoid callbacks, deep nesting, and async gymnastics where
   possible. A handler should read like a sequence of steps: look up the
   record, pick a volume, build the response.

4. **Minimize shared state** — pass values explicitly. The handler reads the
   healthy volume set as a snapshot, then works with that snapshot. Don't hold
   locks across IO. Don't reach into globals.

5. **Minimize indirection** — don't hide logic behind abstractions that exist
   "in case we need to swap the implementation later." We won't. A three-line
   function inline is better than a trait with one implementor.

## Applying the principles: separate decisions from execution

Every request handler does two things: **decides** what should happen, then
**executes** IO to make it happen. These should be separate functions.

A decision is a pure function. It takes data in, returns a description of what
to do. It doesn't call the network, doesn't touch the database, doesn't log.
It can be tested with `assert_eq!` and nothing else.

Execution is the messy part — HTTP calls, SQLite writes, error recovery. It
reads the decision and carries it out. It's tested with integration tests.
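As a minimal sketch of the split (the names here are illustrative, not the project's real types): the decision half returns a plain value, and the execution half pattern-matches on it to perform IO.

```rust
// Illustrative only — not the project's actual types.

/// The decision: pure data describing what IO should happen next.
#[derive(Debug, PartialEq)]
enum Action {
    Store { volumes: Vec<String> },
    Reject { reason: String },
}

/// Decide: data in, data out. Testable with `assert_eq!` alone.
fn decide(candidates: &[String], needed: usize) -> Action {
    if candidates.len() >= needed {
        Action::Store { volumes: candidates[..needed].to_vec() }
    } else {
        Action::Reject {
            reason: format!("need {needed} volumes, have {}", candidates.len()),
        }
    }
}

// The execution half would match on `Action` and do the HTTP calls /
// DB writes. It carries the decision out; it never makes it.
```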

## Where this applies today

### Already pure

**`hasher.rs`** — the entire module is pure. `Ring` is a data structure.
`get_volumes` and `key_path` are deterministic functions of their inputs. No
IO, no state mutation beyond construction. This is the gold standard for the
project.

**`db.rs` query functions** — `get`, `list_keys`, `all_records` take a
`&Connection` and return data. The connection is injected, not owned. The
functions don't decide what to do with the data — they just retrieve it.

**`db.rs` encode/parse** — `parse_volumes` and `encode_volumes` are pure
transformations between JSON strings and `Vec<String>`.

### Mixed (decision + execution interleaved)

**`server.rs::put_key`** — this handler does four things in one function:

1. *Decide* which volumes to write to (pure — ring lookup)
2. *Execute* fan-out PUTs to nginx (IO)
3. *Decide* whether to rollback based on results (pure — check which succeeded)
4. *Execute* rollback DELETEs and/or index write (IO)

Steps 1 and 3 could be extracted as pure functions:

```rust
// Pure: given a key and ring, compute the placement plan
struct PutPlan {
    path: String,
    target_volumes: Vec<String>,
}

fn plan_put(ring: &Ring, key: &str, replication: usize) -> Result<PutPlan, AppError> {
    let path = Ring::key_path(key);
    let target_volumes = ring.get_volumes(key, replication);
    if target_volumes.len() < replication {
        return Err(AppError::VolumeError(...));
    }
    Ok(PutPlan { path, target_volumes })
}

// Pure: given fan-out results, decide what to do next
enum PutOutcome {
    AllSucceeded { volumes: Vec<String> },
    NeedsRollback { succeeded: Vec<String> },
}

fn evaluate_put_results(results: &[(String, Result<(), String>)]) -> PutOutcome { ... }
```
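One possible body for `evaluate_put_results`, sketched under the assumption that any failed PUT means the write must be rolled back (the enum is repeated so the sketch stands alone):

```rust
#[derive(Debug, PartialEq)]
enum PutOutcome {
    AllSucceeded { volumes: Vec<String> },
    NeedsRollback { succeeded: Vec<String> },
}

// Pure: partition the fan-out results; any failure triggers rollback.
fn evaluate_put_results(results: &[(String, Result<(), String>)]) -> PutOutcome {
    let succeeded: Vec<String> = results
        .iter()
        .filter(|(_, r)| r.is_ok())
        .map(|(volume, _)| volume.clone())
        .collect();
    if succeeded.len() == results.len() {
        PutOutcome::AllSucceeded { volumes: succeeded }
    } else {
        PutOutcome::NeedsRollback { succeeded }
    }
}
```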

**`server.rs::get_key`** — the "pick a healthy volume" logic is a pure
function hiding inside an async handler:

```rust
// Pure: given a record's volumes and the healthy set, pick one
fn pick_healthy_volume<'a>(
    record_volumes: &'a [String],
    healthy: &HashSet<String>,
) -> Option<&'a str> {
    record_volumes.iter().find(|v| healthy.contains(*v)).map(|v| v.as_str())
}
```
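Once extracted, the decision is exercised with data and `assert_eq!` alone (the function is repeated here so the sketch is self-contained):

```rust
use std::collections::HashSet;

// Pure: given a record's volumes and the healthy set, pick one.
fn pick_healthy_volume<'a>(
    record_volumes: &'a [String],
    healthy: &HashSet<String>,
) -> Option<&'a str> {
    record_volumes.iter().find(|v| healthy.contains(*v)).map(|v| v.as_str())
}

// A unit test needs no server and no Docker, just data.
fn check_pick() {
    let record = vec!["vol2".to_string(), "vol1".to_string()];
    let healthy: HashSet<String> = ["vol1".to_string()].into_iter().collect();
    // vol2 is listed first but unhealthy, so vol1 is picked.
    assert_eq!(pick_healthy_volume(&record, &healthy), Some("vol1"));
    // No healthy replica at all: the handler would answer 503.
    assert_eq!(pick_healthy_volume(&record, &HashSet::new()), None);
}
```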

### Intentionally impure

**`volume.rs`** — this is an IO boundary. It wraps `reqwest` and talks to
nginx. There's no decision logic here to extract; it's a thin adapter. Testing
it means mocking HTTP. That's fine.

**`health.rs`** — a side-effecting loop. It polls volumes and mutates shared
state. No pure core to extract. Keep it simple.

**`db.rs` writer thread** — the batch-and-commit loop is inherently stateful.
The `execute_cmd` function is close to pure (it takes a connection and a
command, returns a result), but it mutates the database. The batching logic
(drain channel, group into transaction) is a state machine. Not worth
abstracting further.

## Guidelines

1. **If a function takes only data and returns only data, it's pure.** Keep it
   that way. Don't sneak in logging, metrics, or "just one network call."

2. **If a handler has an `if` or `match` that decides between outcomes, that
   decision can probably be a pure function.** Extract it. Name it. Test it.

3. **IO boundaries should be thin.** `volume.rs` is a good example: format URL,
   make request, check status, return bytes. No business logic.

4. **Don't over-abstract.** A three-line pure function inline in a handler is
   fine. Extract it when it gets complex enough to need its own tests, or when
   the same decision appears in multiple places (e.g., rebuild and rebalance
   both need "compute desired placement").

5. **Shared state should be read-only snapshots when possible.** The handler
   reads `healthy_volumes` and `ring` under a read lock, then releases it
   before doing IO. This keeps the critical section small and makes the
   decision logic operate on a snapshot, not live-mutating state.

6. **Errors are data.** `AppError` is a value, not an exception. Functions
   return `Result`, handlers pattern-match on it. The `IntoResponse` impl is
   the only place where errors become HTTP responses — one place, one mapping.
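The snapshot pattern in guideline 5 can be sketched with std's `RwLock` (illustrative; the project's actual state type may differ):

```rust
use std::collections::HashSet;
use std::sync::RwLock;

// Hypothetical shared state as a handler would see it.
struct AppState {
    healthy_volumes: RwLock<HashSet<String>>,
}

// Take a snapshot: clone under the read lock, then release it
// immediately. All decision logic runs on the clone, so no lock
// is ever held across IO.
fn snapshot_healthy(state: &AppState) -> HashSet<String> {
    state.healthy_volumes.read().unwrap().clone()
}
```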

## Anti-patterns to avoid

- **God handler** — a 100-line async fn that reads the DB, calls volumes, makes
  decisions, handles errors, and formats the response. Break it up.

- **Stringly-typed errors in business logic** — `volume.rs` uses `String` errors
  because it's an IO boundary and the strings are for logging. Decision
  functions should use typed errors.

- **Hidden state reads** — if a function needs the healthy volume set, pass it
  in. Don't reach into a global or lock a mutex inside a "pure" function.

- **Testing IO to test logic** — if you need a Docker container running to test
  whether "pick a healthy volume" works correctly, the logic isn't separated
  from the IO.

TESTING.md (new file, 169 lines)
# Testing Strategy

## Layers

### 1. Unit tests (no IO)

Pure logic that can be tested with `cargo test`, no containers needed.

**`hasher.rs`** (already done)

- Deterministic placement for the same key
- All volumes appear when requesting full replication
- Even distribution across volumes
- Correct key path format

**`db.rs`** (TODO)

- Open an in-memory SQLite (`:memory:`)
- Test put/get/delete/list_keys/all_records round-trip
- Test upsert behavior (put same key twice)
- Test soft delete (deleted flag)
- Test bulk_put

**Pure decision functions** (TODO, after refactor)

- Given a record and a set of healthy volumes, which volume to redirect to?
- Given fan-out results (list of Ok/Err), which volumes succeeded? Should we rollback?
- Given current vs desired volume placement, what needs to move?
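The last of these ("what needs to move?") is a natural pure function. A minimal sketch, assuming a per-key diff of current vs desired volume sets (names are hypothetical):

```rust
use std::collections::HashSet;

// Hypothetical plan for one key: where to copy the blob, where to delete it.
#[derive(Debug, PartialEq)]
struct MovePlan {
    copy_to: Vec<String>,
    delete_from: Vec<String>,
}

// Pure diff: desired minus current = copies; current minus desired = deletes.
fn plan_moves(current: &[String], desired: &[String]) -> MovePlan {
    let cur: HashSet<&str> = current.iter().map(|s| s.as_str()).collect();
    let des: HashSet<&str> = desired.iter().map(|s| s.as_str()).collect();
    let mut copy_to: Vec<String> =
        desired.iter().filter(|v| !cur.contains(v.as_str())).cloned().collect();
    let mut delete_from: Vec<String> =
        current.iter().filter(|v| !des.contains(v.as_str())).cloned().collect();
    copy_to.sort();
    delete_from.sort();
    MovePlan { copy_to, delete_from }
}
```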

### 2. Volume client tests (mock HTTP)

Use a lightweight in-process HTTP server (e.g. `axum` itself or `wiremock`) to
test `volume.rs` without real nginx.

- PUT blob + .key sidecar → verify both requests made
- GET blob → verify body returned
- DELETE blob → verify both blob and .key deleted
- DELETE non-existent → verify 404 is treated as success
- Health check → respond 200 → healthy; timeout → unhealthy

### 3. Integration tests (real nginx)

Full end-to-end with Docker containers. These are slower but catch real issues.

#### Setup

```yaml
# docker-compose.test.yml
services:
  vol1:
    image: nginx:alpine
    volumes:
      - ./tests/nginx.conf:/etc/nginx/conf.d/default.conf
      - vol1_data:/data
    ports: ["3101:80"]

  vol2:
    image: nginx:alpine
    volumes:
      - ./tests/nginx.conf:/etc/nginx/conf.d/default.conf
      - vol2_data:/data
    ports: ["3102:80"]

  vol3:
    image: nginx:alpine
    volumes:
      - ./tests/nginx.conf:/etc/nginx/conf.d/default.conf
      - vol3_data:/data
    ports: ["3103:80"]

volumes:
  vol1_data:
  vol2_data:
  vol3_data:
```

```nginx
# tests/nginx.conf
server {
    listen 80;
    root /data;

    location / {
        dav_methods PUT DELETE;
        create_full_put_path on;
        autoindex on;
        autoindex_format json;
    }
}
```

```toml
# tests/test_config.toml
[server]
port = 3100
replication_factor = 2
virtual_nodes = 100

[database]
path = "/tmp/mkv-test/index.db"

[[volumes]]
url = "http://localhost:3101"

[[volumes]]
url = "http://localhost:3102"

[[volumes]]
url = "http://localhost:3103"
```

#### Test cases

**Happy path**

1. PUT `/hello` with body `"world"` → 201
2. HEAD `/hello` → 200, Content-Length: 5
3. GET `/hello` → 302 to a volume URL
4. Follow redirect → body is `"world"`
5. GET `/` → list contains `"hello"`
6. DELETE `/hello` → 204
7. GET `/hello` → 404

**Replication verification**

1. PUT `/replicated` → 201
2. Read SQLite directly, verify 2 volumes listed
3. GET blob from both volume URLs directly, verify identical content

**Volume failure**

1. PUT `/failtest` → 201
2. Stop vol1 container
3. GET `/failtest` → should still 302 to vol2 (healthy replica)
4. PUT `/new-during-failure` → should fail if replication_factor can't be met,
   or succeed on remaining volumes depending on ring placement
5. Restart vol1

**Rebuild**

1. PUT several keys
2. Delete the SQLite database
3. Run `mkv rebuild`
4. GET all keys → should all still work

**Rebalance**

1. PUT several keys with 3 volumes
2. Add a 4th volume to config
3. Run `mkv rebalance --dry-run` → verify output
4. Run `mkv rebalance` → verify keys migrated
5. GET all keys → should all work

#### Running integration tests

```bash
# Start volumes
docker compose -f docker-compose.test.yml up -d

# Wait for nginx to be ready
sleep 2

# Run integration tests
cargo test --test integration

# Tear down
docker compose -f docker-compose.test.yml down -v
```

The integration test binary (`tests/integration.rs`) starts the mkv server
in-process on a random port, runs all test cases sequentially (shared state),
then shuts down.

### 4. What we don't test

- Performance / benchmarks (follow-up)
- TLS, auth (not implemented)
- Concurrent writers racing on the same key (SQLite serializes this correctly by design)
- Blobs > available RAM (streaming is a follow-up)

TODO.md (new file, 60 lines)
# mkv TODO

## Done

- [x] SQLite schema + pragmas + ReadPool + WriterHandle (`db.rs`)
- [x] Config parsing with validation (`config.rs`)
- [x] Error types with axum IntoResponse (`error.rs`)
- [x] Consistent hash ring with virtual nodes (`hasher.rs`)
- [x] Volume HTTP client with .key sidecar writes (`volume.rs`)
- [x] Health checker background task (`health.rs`)
- [x] HTTP handlers: GET (302 redirect), PUT (fan-out), DELETE, HEAD, list (`server.rs`)
- [x] CLI with clap subcommands (`main.rs`)
- [x] Wiring: state, routes, health checker startup

## Remaining

### Core features

- [ ] **Rebuild tool** (`tools/rebuild.rs`)
  - Scan all volumes for `.key` sidecar files (via nginx autoindex JSON)
  - Read each `.key` file to recover the original key name
  - Verify the blob exists alongside each `.key`
  - Reconstruct SQLite index, merging replicas (same key on multiple volumes)
  - Uses `db::bulk_put` for efficient batch inserts

- [ ] **Rebalance tool** (`tools/rebalance.rs`)
  - Load current index, compute desired placement via new ring config
  - Diff current vs desired volumes per key
  - Copy blobs to new volumes, delete from old ones
  - Update SQLite records
  - `--dry-run` flag to preview without acting
  - Progress output: keys moved, bytes transferred, errors

### Refactoring

- [ ] **Extract pure decision logic from handlers** (see PHILOSOPHY.md)
  - `put_key`: separate "compute placement" from "execute IO"
  - `get_key`: separate "pick healthy volume" from "build redirect"
  - Make fan-out results processing a pure function

- [ ] **Typed volume errors** — replace `String` errors in `volume.rs` with a proper enum
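One possible shape for that enum (a sketch; the variant names are guesses, not the final design):

```rust
use std::fmt;

// Hypothetical typed replacement for the String errors in volume.rs.
#[derive(Debug)]
enum VolumeError {
    Unreachable { url: String },
    UnexpectedStatus { url: String, status: u16 },
    Timeout { url: String },
}

impl fmt::Display for VolumeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            VolumeError::Unreachable { url } => write!(f, "volume {url} unreachable"),
            VolumeError::UnexpectedStatus { url, status } => {
                write!(f, "volume {url} returned {status}")
            }
            VolumeError::Timeout { url } => write!(f, "volume {url} timed out"),
        }
    }
}
```

Typed variants let the decision layer match on the failure kind, while `Display` still yields the log-friendly strings the IO boundary uses today.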

### Testing

- [ ] **Unit tests for `db.rs`** — CRUD with an in-memory SQLite
- [ ] **Integration test harness** (see TESTING.md)
  - Docker Compose with 3 nginx volume containers
  - Rust integration tests that spin up the full server
  - PUT/GET/DELETE/HEAD happy path
  - Failure test: kill a volume mid-test, verify reads still work from replica
  - Rebalance test: add 4th volume, rebalance, verify all keys accessible
- [ ] **Property-based tests for hasher** — verify ring invariants under add/remove

### Polish

- [ ] **Graceful shutdown** — drain in-flight requests, flush writer
- [ ] **Metrics** — request count, latency histograms, volume error rates
- [ ] **Request ID / tracing spans** — per-request trace context
- [ ] **Config reload** — SIGHUP to reload config without restart
- [ ] **Rate limiting on list endpoint** — prefix scan can be expensive