Correct philosophy document

This commit is contained in:
Silas Brack 2026-03-07 16:05:15 +01:00
parent 984953502b
commit e67f476ca4

View file

@ -15,9 +15,8 @@
possible. A handler should read like a sequence of steps: look up the possible. A handler should read like a sequence of steps: look up the
record, pick a volume, build the response. record, pick a volume, build the response.
4. **Minimize shared state** — pass values explicitly. The handler reads the 4. **Minimize shared state** — pass values explicitly. Don't hold locks across
healthy volume set as a snapshot, then works with that snapshot. Don't hold IO. Don't reach into globals.
locks across IO. Don't reach into globals.
5. **Minimize indirection** — don't hide logic behind abstractions that exist 5. **Minimize indirection** — don't hide logic behind abstractions that exist
"in case we need to swap the implementation later." We won't. A three-line "in case we need to swap the implementation later." We won't. A three-line
@ -39,14 +38,12 @@ reads the decision and carries it out. It's tested with integration tests.
### Already pure ### Already pure
**`hasher.rs`** — the entire module is pure. `Ring` is a data structure. **`hasher.rs`** — the entire module is pure. `volumes_for_key` is a
`get_volumes` and `key_path` are deterministic functions of their inputs. No deterministic function of its inputs. No IO, no state mutation. This is the
IO, no state mutation beyond construction. This is the gold standard for the gold standard for the project.
project.
**`db.rs` query functions** — `get`, `list_keys`, `all_records` take a **`rebalance.rs::plan_rebalance`** — takes a slice of records and returns a
`&Connection` and return data. The connection is injected, not owned. The list of moves. Pure decision logic, tested with unit tests.
functions don't decide what to do with the data — they just retrieve it.
**`db.rs` encode/parse** — `parse_volumes` and `encode_volumes` are pure **`db.rs` encode/parse** — `parse_volumes` and `encode_volumes` are pure
transformations between JSON strings and `Vec<String>`. transformations between JSON strings and `Vec<String>`.
@ -55,65 +52,22 @@ transformations between JSON strings and `Vec<String>`.
**`server.rs::put_key`** — this handler does three things in one function: **`server.rs::put_key`** — this handler does three things in one function:
1. *Decide* which volumes to write to (pure — ring lookup) 1. *Decide* which volumes to write to (pure — `volumes_for_key`)
2. *Execute* fan-out PUTs to nginx (IO) 2. *Execute* fan-out PUTs to nginx (IO)
3. *Decide* whether to rollback based on results (pure — check which succeeded) 3. *Decide* whether to rollback based on results (pure — check which succeeded)
4. *Execute* rollback DELETEs and/or index write (IO) 4. *Execute* rollback DELETEs and/or index write (IO)
Steps 1 and 3 could be extracted as pure functions: Steps 1 and 3 could be extracted as pure functions if they grow more complex.
```rust
// Pure: given a key and ring, compute the placement plan
struct PutPlan {
path: String,
target_volumes: Vec<String>,
}
fn plan_put(ring: &Ring, key: &str, replication: usize) -> Result<PutPlan, AppError> {
let path = Ring::key_path(key);
let target_volumes = ring.get_volumes(key, replication);
if target_volumes.len() < replication {
return Err(AppError::VolumeError(...));
}
Ok(PutPlan { path, target_volumes })
}
// Pure: given fan-out results, decide what to do next
enum PutOutcome {
AllSucceeded { volumes: Vec<String> },
NeedsRollback { succeeded: Vec<String> },
}
fn evaluate_put_results(results: &[(String, Result<(), String>)]) -> PutOutcome { ... }
```
**`server.rs::get_key`** — the "pick a healthy volume" logic is a pure
function hiding inside an async handler:
```rust
// Pure: given a record's volumes and the healthy set, pick one
fn pick_healthy_volume<'a>(
record_volumes: &'a [String],
healthy: &HashSet<String>,
) -> Option<&'a str> {
record_volumes.iter().find(|v| healthy.contains(*v)).map(|v| v.as_str())
}
```
### Intentionally impure ### Intentionally impure
**`volume.rs`** — this is an IO boundary. It wraps `reqwest` and talks to **`rebuild.rs`** — walks nginx autoindex and bulk-inserts into SQLite. The IO
nginx. There's no decision logic here to extract; it's a thin adapter. Testing is the whole point; there's no decision logic worth extracting.
it means mocking HTTP. That's fine.
**`health.rs`** — a side-effecting loop. It polls volumes and mutates shared **`db.rs`** — wraps SQLite behind `Arc<Mutex<Connection>>` with
state. No pure core to extract. Keep it simple. `spawn_blocking` to avoid blocking the tokio runtime. The mutex serializes all
access; `SQLITE_OPEN_NO_MUTEX` disables SQLite's internal locking since the
**`db.rs` writer thread** — the batch-and-commit loop is inherently stateful. application mutex handles it.
The `execute_cmd` function is close to pure (it takes a connection and a
command, returns a result), but it mutates the database. The batching logic
(drain channel, group into transaction) is a state machine. Not worth
abstracting further.
## Guidelines ## Guidelines
@ -123,20 +77,15 @@ abstracting further.
2. **If a handler has an `if` or `match` that decides between outcomes, that 2. **If a handler has an `if` or `match` that decides between outcomes, that
decision can probably be a pure function.** Extract it. Name it. Test it. decision can probably be a pure function.** Extract it. Name it. Test it.
3. **IO boundaries should be thin.** `volume.rs` is a good example: format URL, 3. **IO boundaries should be thin.** Format URL, make request, check status,
make request, check status, return bytes. No business logic. return bytes. No business logic.
4. **Don't over-abstract.** A three-line pure function inline in a handler is 4. **Don't over-abstract.** A three-line pure function inline in a handler is
fine. Extract it when it gets complex enough to need its own tests, or when fine. Extract it when it gets complex enough to need its own tests, or when
the same decision appears in multiple places (e.g., rebuild and rebalance the same decision appears in multiple places (e.g., rebuild and rebalance
both need "compute desired placement"). both use `volumes_for_key`).
5. **Shared state should be read-only snapshots when possible.** The handler 5. **Errors are data.** `AppError` is a value, not an exception. Functions
reads `healthy_volumes` and `ring` under a read lock, then releases it
before doing IO. This keeps the critical section small and makes the
decision logic operate on a snapshot, not live-mutating state.
6. **Errors are data.** `AppError` is a value, not an exception. Functions
return `Result`, handlers pattern-match on it. The `IntoResponse` impl is return `Result`, handlers pattern-match on it. The `IntoResponse` impl is
the only place where errors become HTTP responses — one place, one mapping. the only place where errors become HTTP responses — one place, one mapping.
@ -145,13 +94,9 @@ abstracting further.
- **God handler** — a 100-line async fn that reads the DB, calls volumes, makes - **God handler** — a 100-line async fn that reads the DB, calls volumes, makes
decisions, handles errors, and formats the response. Break it up. decisions, handles errors, and formats the response. Break it up.
- **Stringly-typed errors in business logic**`volume.rs` uses `String` errors - **Hidden state reads** — if a function needs data, pass it in. Don't reach
because it's an IO boundary and the strings are for logging. Decision into a global or lock a mutex inside a "pure" function.
functions should use typed errors.
- **Hidden state reads** — if a function needs the healthy volume set, pass it
in. Don't reach into a global or lock a mutex inside a "pure" function.
- **Testing IO to test logic** — if you need a Docker container running to test - **Testing IO to test logic** — if you need a Docker container running to test
whether "pick a healthy volume" works correctly, the logic isn't separated whether volume selection works correctly, the logic isn't separated from the
from the IO. IO.