~bigbes/lethe

dcc2805a6e14ca0169751edb755aeac42d086d17 — Eugene Blikh 24 days ago f167318
docs: record collector final fixes
1 files changed, 11 insertions(+), 12 deletions(-)

M docs/tasks/lethe-collector-claude-code.md
M docs/tasks/lethe-collector-claude-code.md => docs/tasks/lethe-collector-claude-code.md +11 -12
@@ 159,10 159,6 @@ Greenfield collector. The only interface contract this task can break is the wir
- udesign: TOML → YAML for collector config — consistent with the server's config format from #1; one parser, one mental model.
- udesign: `parentUuid` chaining of resumed Claude Code sessions deferred — every `.jsonl` is one session in this task. Surfacing chains is a UI concern for later.
- udesign: synthesized `turn_id` uses `sha256(session_id || seq || timestamp || content[:64])[:16]` — `content[:64]` is enough to disambiguate within a single timestamp; full-content hash would balloon for large turns.
-- ureview: fixed partial-accept offset handling — skipped server-rejected rows so one bad turn cannot stall a source file.
-- ureview: bounded daemon drain uses `http.timeout` — no separate `shutdown_grace` config exists for the collector.
-- ureview: backfill offset-0 semantics are implemented as a dedicated `RunBackfillOnce` function rather than a mode flag on `RunOnce` — explicit call sites are safer than a boolean parameter that could be misused in daemon loops.

TDD: yes (reason: parser behavior on golden fixture `.jsonl` files, offset persistence/resume semantics, outbox replay, and idempotent re-POST behavior are exactly the deterministic regression-prone surfaces TDD is good for. CLI scaffolding and systemd unit are exempt.)
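
The synthesized `turn_id` formula from the design notes above can be sketched in Go as follows. This is a minimal sketch under stated assumptions, not the collector's code: the helper name `synthTurnID`, the NUL separator between fields, the decimal encoding of `seq`, byte-based truncation of `content`, and reading `[:16]` as 16 hex characters are all hypothetical choices.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
)

// synthTurnID is a hypothetical helper illustrating
// sha256(session_id || seq || timestamp || content[:64])[:16].
func synthTurnID(sessionID string, seq int, timestamp, content string) string {
	prefix := content
	if len(prefix) > 64 {
		prefix = prefix[:64] // assumption: truncate by bytes, not runes
	}
	h := sha256.New()
	for _, part := range []string{sessionID, strconv.Itoa(seq), timestamp, prefix} {
		h.Write([]byte(part))
		h.Write([]byte{0}) // assumption: NUL separator so field boundaries stay unambiguous
	}
	// assumption: the ID is the first 16 hex characters of the digest
	return hex.EncodeToString(h.Sum(nil))[:16]
}

func main() {
	fmt.Println(synthTurnID("session-a", 7, "2025-01-01T00:00:00Z", "hello world"))
}
```

Because only the first 64 bytes of content feed the hash, two turns in the same session with the same `seq`, timestamp, and 64-byte prefix would collide; the design note accepts that trade-off to avoid hashing large turns in full.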

### Invariants


@@ 292,20 288,21 @@ Positive:
- CK1 — `go test ./... -count=1` passes.
- CK2 — `go build ./cmd/lethe-collector` succeeds.
- CK3 — `lethe-collector status` with a minimal config opens the state DB and reports the configured source.
+- CK4 — runner tests cover offset-0 backfill and bounded daemon drain.

Negative:
-- CK4 — `lethe-collector status --config ./tmp/missing.yaml` exits non-zero with `CONFIG_NOT_FOUND` surfaced.
+- CK5 — `lethe-collector status --config ./tmp/missing.yaml` exits non-zero with `CONFIG_NOT_FOUND` surfaced.

Invariants / assumptions:
-- CK5 (IV7) — `internal/collector` has no `os.Hostname` call; host flows from collector config.
-- CK6 (IV9, AS1) — sender posts only `TurnEvent` NDJSON to `/api/v1/ingest`.
-- CK7 (UK1) — Tailscale header injection remains unverifiable without the deployed Tailscale path.
+- CK6 (IV7) — `internal/collector` has no `os.Hostname` call; host flows from collector config.
+- CK7 (IV9, AS1) — sender posts only `TurnEvent` NDJSON to `/api/v1/ingest`.
+- CK8 (UK1) — Tailscale header injection remains unverifiable without the deployed Tailscale path.

Interfaces:
-- CK8 (IF1) — `config.Load(path string) (*Config, error)` is exercised by CLI and config tests.
-- CK9 (IF2) — `state.Store` offset/outbox methods are exercised by runner and state tests.
-- CK10 (IF3) — `Sender.PostBatch(ctx, events)` is exercised by sender, outbox, and runner tests.
-- CK11 (IF4) — `RunOnce` / `RunDaemon` are exercised by CLI wiring and runner tests.
+- CK9 (IF1) — `config.Load(path string) (*Config, error)` is exercised by CLI and config tests.
+- CK10 (IF2) — `state.Store` offset/outbox methods are exercised by runner and state tests.
+- CK11 (IF3) — `Sender.PostBatch(ctx, events)` is exercised by sender, outbox, and runner tests.
+- CK12 (IF4) — `RunOnce` / `RunDaemon` are exercised by CLI wiring and runner tests.

Smoke: `go run ./cmd/lethe-collector --config ./tmp/collector-smoke.yaml status` → prints host, state DB, outbox stats, source list, and `lag_bytes`.



@@ 320,6 317,8 @@ Smoke: `go run ./cmd/lethe-collector --config ./tmp/collector-smoke.yaml status`
- ureview (re-review): fixed `persistAcceptedOffset` to use the first error's `Line` (1-based within the request body) to identify the failed row, rather than assuming `result.Accepted` points to it. Valid but uncommitted rows before the failed line are re-posted as a smaller prefix before the failed row is skipped, preventing data loss when `accepted=0, errors line=2`.
- ureview (re-review): added WARN log with dropped row count and bytes to `EnforceOutboxLimit`.
- ureview (re-review): added `lag_bytes` per file to `status` output using `parser.SourceFile.Size` from discovery.
+- ureview (final): bounded daemon drain uses `http.timeout` — no separate `shutdown_grace` config exists for the collector.
+- ureview (final): backfill offset-0 semantics are implemented as `RunBackfillOnce` instead of a mode flag on `RunOnce` — explicit call sites are safer than a boolean parameter that could be misused in daemon loops.
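
The `persistAcceptedOffset` fix described above locates the failed row by the first error's 1-based `Line`, not by the accepted count. A rough sketch of that split follows. The types `ingestResult` and `ingestError` and the helper `splitAtFirstError` are hypothetical stand-ins, and the sketch assumes the whole prefix before the failed line can be re-posted safely because re-POST is idempotent.

```go
package main

import "fmt"

// ingestError and ingestResult are hypothetical shapes for the server response.
type ingestError struct {
	Line int // 1-based line within the request body
}

type ingestResult struct {
	Accepted int
	Errors   []ingestError
}

// splitAtFirstError splits a batch around the row named by the first error.
// Accepted is deliberately ignored: it is not assumed to point at the failed row.
func splitAtFirstError(rows []string, res ingestResult) (prefix []string, skipped string, rest []string, ok bool) {
	if len(res.Errors) == 0 {
		return nil, "", nil, false
	}
	line := res.Errors[0].Line
	if line < 1 || line > len(rows) {
		return nil, "", nil, false // error line does not map to a row in this batch
	}
	// prefix: rows to re-post as a smaller batch; skipped: the rejected row;
	// rest: rows after the rejected one, still to be sent.
	return rows[:line-1], rows[line-1], rows[line:], true
}

func main() {
	rows := []string{"turn-1", "turn-2", "turn-3"}
	// The case from the review note: accepted=0, first error on line 2.
	res := ingestResult{Accepted: 0, Errors: []ingestError{{Line: 2}}}
	prefix, skipped, rest, ok := splitAtFirstError(rows, res)
	fmt.Println(prefix, skipped, rest, ok)
}
```

In the `accepted=0, errors line=2` case, `turn-1` lands in the prefix and is re-posted, so it is not lost when `turn-2` is skipped.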

### Deferred (needs user input)