From e67f476ca474ec26147eef145c9bedf7447a83d6 Mon Sep 17 00:00:00 2001 From: Silas Brack Date: Sat, 7 Mar 2026 16:05:15 +0100 Subject: [PATCH] Correct philosophy document --- PHILOSOPHY.md | 101 ++++++++++++-------------------------------------- 1 file changed, 23 insertions(+), 78 deletions(-) diff --git a/PHILOSOPHY.md b/PHILOSOPHY.md index 40c90a7..683056d 100644 --- a/PHILOSOPHY.md +++ b/PHILOSOPHY.md @@ -15,9 +15,8 @@ possible. A handler should read like a sequence of steps: look up the record, pick a volume, build the response. -4. **Minimize shared state** — pass values explicitly. The handler reads the - healthy volume set as a snapshot, then works with that snapshot. Don't hold - locks across IO. Don't reach into globals. +4. **Minimize shared state** — pass values explicitly. Don't hold locks across + IO. Don't reach into globals. 5. **Minimize indirection** — don't hide logic behind abstractions that exist "in case we need to swap the implementation later." We won't. A three-line @@ -39,14 +38,12 @@ reads the decision and carries it out. It's tested with integration tests. ### Already pure -**`hasher.rs`** — the entire module is pure. `Ring` is a data structure. -`get_volumes` and `key_path` are deterministic functions of their inputs. No -IO, no state mutation beyond construction. This is the gold standard for the -project. +**`hasher.rs`** — the entire module is pure. `volumes_for_key` is a +deterministic function of its inputs. No IO, no state mutation. This is the +gold standard for the project. -**`db.rs` query functions** — `get`, `list_keys`, `all_records` take a -`&Connection` and return data. The connection is injected, not owned. The -functions don't decide what to do with the data — they just retrieve it. +**`rebalance.rs::plan_rebalance`** — takes a slice of records and returns a +list of moves. Pure decision logic, tested with unit tests. **`db.rs` encode/parse** — `parse_volumes` and `encode_volumes` are pure transformations between JSON strings and `Vec`. @@ -55,65 +52,22 @@ transformations between JSON strings and `Vec`. **`server.rs::put_key`** — this handler does three things in one function: -1. *Decide* which volumes to write to (pure — ring lookup) +1. *Decide* which volumes to write to (pure — `volumes_for_key`) 2. *Execute* fan-out PUTs to nginx (IO) 3. *Decide* whether to rollback based on results (pure — check which succeeded) 4. *Execute* rollback DELETEs and/or index write (IO) -Steps 1 and 3 could be extracted as pure functions: - -```rust -// Pure: given a key and ring, compute the placement plan -struct PutPlan { - path: String, - target_volumes: Vec, -} - -fn plan_put(ring: &Ring, key: &str, replication: usize) -> Result { - let path = Ring::key_path(key); - let target_volumes = ring.get_volumes(key, replication); - if target_volumes.len() < replication { - return Err(AppError::VolumeError(...)); - } - Ok(PutPlan { path, target_volumes }) -} - -// Pure: given fan-out results, decide what to do next -enum PutOutcome { - AllSucceeded { volumes: Vec }, - NeedsRollback { succeeded: Vec }, -} - -fn evaluate_put_results(results: &[(String, Result<(), String>)]) -> PutOutcome { ... } -``` - -**`server.rs::get_key`** — the "pick a healthy volume" logic is a pure -function hiding inside an async handler: - -```rust -// Pure: given a record's volumes and the healthy set, pick one -fn pick_healthy_volume<'a>( - record_volumes: &'a [String], - healthy: &HashSet, -) -> Option<&'a str> { - record_volumes.iter().find(|v| healthy.contains(*v)).map(|v| v.as_str()) -} -``` +Steps 1 and 3 could be extracted as pure functions if they grow more complex. ### Intentionally impure -**`volume.rs`** — this is an IO boundary. It wraps `reqwest` and talks to -nginx. There's no decision logic here to extract; it's a thin adapter. Testing -it means mocking HTTP. That's fine. +**`rebuild.rs`** — walks nginx autoindex and bulk-inserts into SQLite. The IO +is the whole point; there's no decision logic worth extracting. -**`health.rs`** — a side-effecting loop. It polls volumes and mutates shared -state. No pure core to extract. Keep it simple. - -**`db.rs` writer thread** — the batch-and-commit loop is inherently stateful. -The `execute_cmd` function is close to pure (it takes a connection and a -command, returns a result), but it mutates the database. The batching logic -(drain channel, group into transaction) is a state machine. Not worth -abstracting further. +**`db.rs`** — wraps SQLite behind `Arc>` with +`spawn_blocking` to avoid blocking the tokio runtime. The mutex serializes all +access; `SQLITE_OPEN_NO_MUTEX` disables SQLite's internal locking since the +application mutex handles it. ## Guidelines @@ -123,20 +77,15 @@ abstracting further. 2. **If a handler has an `if` or `match` that decides between outcomes, that decision can probably be a pure function.** Extract it. Name it. Test it. -3. **IO boundaries should be thin.** `volume.rs` is a good example: format URL, - make request, check status, return bytes. No business logic. +3. **IO boundaries should be thin.** Format URL, make request, check status, + return bytes. No business logic. 4. **Don't over-abstract.** A three-line pure function inline in a handler is fine. Extract it when it gets complex enough to need its own tests, or when the same decision appears in multiple places (e.g., rebuild and rebalance - both need "compute desired placement"). + both use `volumes_for_key`). -5. **Shared state should be read-only snapshots when possible.** The handler - reads `healthy_volumes` and `ring` under a read lock, then releases it - before doing IO. This keeps the critical section small and makes the - decision logic operate on a snapshot, not live-mutating state. - -6. **Errors are data.** `AppError` is a value, not an exception. Functions +5. **Errors are data.** `AppError` is a value, not an exception. Functions return `Result`, handlers pattern-match on it. The `IntoResponse` impl is the only place where errors become HTTP responses — one place, one mapping. @@ -145,13 +94,9 @@ abstracting further. - **God handler** — a 100-line async fn that reads the DB, calls volumes, makes decisions, handles errors, and formats the response. Break it up. -- **Stringly-typed errors in business logic** — `volume.rs` uses `String` errors - because it's an IO boundary and the strings are for logging. Decision - functions should use typed errors. - -- **Hidden state reads** — if a function needs the healthy volume set, pass it - in. Don't reach into a global or lock a mutex inside a "pure" function. +- **Hidden state reads** — if a function needs data, pass it in. Don't reach + into a global or lock a mutex inside a "pure" function. - **Testing IO to test logic** — if you need a Docker container running to test - whether "pick a healthy volume" works correctly, the logic isn't separated - from the IO. + whether volume selection works correctly, the logic isn't separated from the + IO.