~bigbes/lethe

ab6efef20d1f15a9c5da17ded758114abb40aeb8 — Eugene Blikh 24 days ago 20e5301
docs: clarify collector plan assumptions
1 files changed, 11 insertions(+), 4 deletions(-)

M docs/tasks/lethe-collector-claude-code.md
M docs/tasks/lethe-collector-claude-code.md => docs/tasks/lethe-collector-claude-code.md +11 -4
@@ -134,10 +134,17 @@ log:
- *Parse-then-batch vs streaming POST:* batching keeps the wire protocol simple (NDJSON body, one HTTP call) and lets the server commit chunks atomically. Streaming would force the server to handle interrupted bodies — the RFC's chunked-commit response shape works because the body is bounded.
- *Synthesize missing turn_ids vs require source IDs:* Claude Code always provides UUIDs in current versions, but the parser can't assume that holds for older fixture files or future regressions. Synthesis preserves idempotency; the rare case of a `content[:64]` collision within one session at one timestamp is acceptable.

**Unknowns that remain.**
- Whether `tailscale serve` injects `Tailscale-User-Login` for daemon HTTP clients (vs only browsers). If not, I add a `lethe-token` shared-secret fallback header in the deploy step — a 5-line server change. Confirmed empirically before declaring this task done.
- True line-size distribution of Claude Code `.jsonl` events. If it exceeds `bufio.Scanner`'s default 64 KiB token buffer, the parser uses `Scanner.Buffer(buf, maxSize)` with maxSize = 16 MiB. Captured here so the test fixtures cover the long-line case.
- Whether the laptop's `~/.claude/projects/` ever contains files concurrent-written from multiple Claude Code processes. If yes, the parser still works (append-only, monotonic offset), but the test plan should cover it.
### Assumptions

- AS1 — The server-side ingest contract remains the locked `internal/shared/wire.TurnEvent` over `POST /api/v1/ingest`.
- AS2 — Claude Code transcript files are append-only for the byte ranges the collector has already read.
- AS3 — This task has one host identity and one configured Claude Code source root per collector process.

### Unknowns

- UK1 — Whether `tailscale serve` injects `Tailscale-User-Login` for daemon HTTP clients.
- UK2 — True line-size distribution of Claude Code `.jsonl` events.
- UK3 — Whether the laptop's `~/.claude/projects/` ever contains files concurrent-written from multiple Claude Code processes.

### Backwards-compatibility check