From dcc2805a6e14ca0169751edb755aeac42d086d17 Mon Sep 17 00:00:00 2001
From: Eugene Blikh
Date: Sun, 3 May 2026 20:22:57 +0300
Subject: [PATCH] docs: record collector final fixes

---
 docs/tasks/lethe-collector-claude-code.md | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/docs/tasks/lethe-collector-claude-code.md b/docs/tasks/lethe-collector-claude-code.md
index 92d28e2c8c27ea89bce4020b05355515bf89b8bc..f7d1eb8ee9d459d130bdea3eee180a137f1e225f 100644
--- a/docs/tasks/lethe-collector-claude-code.md
+++ b/docs/tasks/lethe-collector-claude-code.md
@@ -159,10 +159,6 @@ Greenfield collector. The only interface contract this task can break is the wir
 - udesign: TOML → YAML for collector config — consistent with the server's config format from #1; one parser, one mental model.
 - udesign: `parentUuid` chaining of resumed Claude Code sessions deferred — every `.jsonl` is one session in this task. Surfacing chains is a UI concern for later.
 - udesign: synthesized `turn_id` uses `sha256(session_id || seq || timestamp || content[:64])[:16]` — `content[:64]` is enough to disambiguate within a single timestamp; full-content hash would balloon for large turns.
-- ureview: fixed partial-accept offset handling — skipped server-rejected rows so one bad turn cannot stall a source file.
-- ureview: bounded daemon drain uses `http.timeout` — no separate `shutdown_grace` config exists for the collector.
-- ureview: backfill offset-0 semantics are implemented as a dedicated `RunBackfillOnce` function rather than a mode flag on `RunOnce` — explicit call sites are safer than a boolean parameter that could be misused in daemon loops.
-
 TDD: yes (reason: parser behavior on golden fixture `.jsonl` files, offset persistence/resume semantics, outbox replay, and idempotent re-POST behavior are exactly the deterministic regression-prone surfaces TDD is good for. CLI scaffolding and systemd unit are exempt.)

 ### Invariants
@@ -292,20 +288,21 @@ Positive:
 - CK1 — `go test ./... -count=1` passes.
 - CK2 — `go build ./cmd/lethe-collector` succeeds.
 - CK3 — `lethe-collector status` with a minimal config opens the state DB and reports the configured source.
+- CK4 — runner tests cover offset-0 backfill and bounded daemon drain.

 Negative:
-- CK4 — `lethe-collector status --config ./tmp/missing.yaml` exits non-zero with `CONFIG_NOT_FOUND` surfaced.
+- CK5 — `lethe-collector status --config ./tmp/missing.yaml` exits non-zero with `CONFIG_NOT_FOUND` surfaced.

 Invariants / assumptions:
-- CK5 (IV7) — `internal/collector` has no `os.Hostname` call; host flows from collector config.
-- CK6 (IV9, AS1) — sender posts only `TurnEvent` NDJSON to `/api/v1/ingest`.
-- CK7 (UK1) — Tailscale header injection remains unverifiable without the deployed Tailscale path.
+- CK6 (IV7) — `internal/collector` has no `os.Hostname` call; host flows from collector config.
+- CK7 (IV9, AS1) — sender posts only `TurnEvent` NDJSON to `/api/v1/ingest`.
+- CK8 (UK1) — Tailscale header injection remains unverifiable without the deployed Tailscale path.

 Interfaces:
-- CK8 (IF1) — `config.Load(path string) (*Config, error)` is exercised by CLI and config tests.
-- CK9 (IF2) — `state.Store` offset/outbox methods are exercised by runner and state tests.
-- CK10 (IF3) — `Sender.PostBatch(ctx, events)` is exercised by sender, outbox, and runner tests.
-- CK11 (IF4) — `RunOnce` / `RunDaemon` are exercised by CLI wiring and runner tests.
+- CK9 (IF1) — `config.Load(path string) (*Config, error)` is exercised by CLI and config tests.
+- CK10 (IF2) — `state.Store` offset/outbox methods are exercised by runner and state tests.
+- CK11 (IF3) — `Sender.PostBatch(ctx, events)` is exercised by sender, outbox, and runner tests.
+- CK12 (IF4) — `RunOnce` / `RunDaemon` are exercised by CLI wiring and runner tests.

 Smoke: `go run ./cmd/lethe-collector --config ./tmp/collector-smoke.yaml status` → prints host, state DB, outbox stats, source list, and `lag_bytes`.

@@ -320,6 +317,8 @@ Smoke: `go run ./cmd/lethe-collector --config ./tmp/collector-smoke.yaml status`
 - ureview (re-review): fixed `persistAcceptedOffset` to use the first error's `Line` (1-based within the request body) to identify the failed row, rather than assuming `result.Accepted` points to it. Valid but uncommitted rows before the failed line are re-posted as a smaller prefix before the failed row is skipped, preventing data loss when `accepted=0, errors line=2`.
 - ureview (re-review): added WARN log with dropped row count and bytes to `EnforceOutboxLimit`.
 - ureview (re-review): added `lag_bytes` per file to `status` output using `parser.SourceFile.Size` from discovery.
+- ureview (final): bounded daemon drain uses `http.timeout` — no separate `shutdown_grace` config exists for the collector.
+- ureview (final): backfill offset-0 semantics are implemented as `RunBackfillOnce` instead of a mode flag on `RunOnce` — explicit call sites are safer than a boolean parameter that could be misused in daemon loops.

 ### Deferred (needs user input)
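
Reviewer note on the `persistAcceptedOffset` re-review entry in this patch: the partial-accept rule it describes can be sketched in Go roughly as follows. This is a minimal illustration under stated assumptions, not the collector's code. `IngestError`, `IngestResult`, `RetryPlan`, and `planRetry` are hypothetical names; only `Accepted` and the 1-based error `Line` come from the notes.

```go
package main

import (
	"errors"
	"fmt"
)

// IngestError and IngestResult are hypothetical stand-ins for the server's
// partial-accept response; only Accepted and Line are named in the notes.
type IngestError struct {
	Line int // 1-based position of the rejected row within the request body
}

type IngestResult struct {
	Accepted int
	Errors   []IngestError
}

// RetryPlan (hypothetical) splits a posted batch around the first rejected row.
type RetryPlan struct {
	PrefixEnd int // rows [0, PrefixEnd) precede the rejected row and may need a re-post
	Skip      int // index of the rejected row, or -1 when nothing was rejected
	Resume    int // first index to send once the rejected row has been skipped
}

// planRetry locates the rejected row from the first error's Line instead of
// inferring it from Accepted, which may be 0 even when earlier rows are valid.
func planRetry(n int, res IngestResult) (RetryPlan, error) {
	if len(res.Errors) == 0 {
		// Nothing rejected: no prefix to re-post, resume after the batch.
		return RetryPlan{PrefixEnd: 0, Skip: -1, Resume: n}, nil
	}
	line := res.Errors[0].Line
	if line < 1 || line > n {
		return RetryPlan{}, errors.New("error line outside request body")
	}
	failed := line - 1 // convert the 1-based line to a 0-based row index
	return RetryPlan{PrefixEnd: failed, Skip: failed, Resume: failed + 1}, nil
}

func main() {
	// accepted=0 with an error on line 2: row 0 is valid but uncommitted.
	plan, err := planRetry(3, IngestResult{Accepted: 0, Errors: []IngestError{{Line: 2}}})
	if err != nil {
		fmt.Println("plan failed:", err)
		return
	}
	fmt.Printf("re-post rows [0,%d), skip row %d, resume at row %d\n",
		plan.PrefixEnd, plan.Skip, plan.Resume)
}
```

For the `accepted=0, errors line=2` case this prints `re-post rows [0,1), skip row 1, resume at row 2`: the valid first row goes out again as a smaller prefix, and only the rejected row is dropped. Whether the prefix really needs a re-post depends on the server's commit semantics, which this sketch does not model.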