From 4abe63ea5d809ec3a7e87e349ccc9493a377b1bc Mon Sep 17 00:00:00 2001
From: Eugene Blikh
Date: Sun, 3 May 2026 20:47:43 +0300
Subject: [PATCH] docs: conclude lethe collector task

---
 README.md                                 | 23 +++++++++++++------
 docs/TODO.md                              |  2 +-
 docs/tasks/lethe-collector-claude-code.md | 28 +++++++++++++++++++++--
 3 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/README.md b/README.md
index d4f9d8fb89c640d65aea75d76c571689cd7a399a..fa2ebf37b77cacff421be1efe02da536a3691be8 100644
--- a/README.md
+++ b/README.md
@@ -1,13 +1,11 @@
 # lethe
 
-Personal AI assistant log aggregator. `lethe` is a small, single-binary Go
-service that ingests turn-level NDJSON from AI assistant collectors (Claude
-Code, opencode, etc.), stores it in SQLite, and exposes a JSON API for
-listing and reading sessions.
+Personal AI assistant log aggregator. `lethe` stores turn-level NDJSON from
+assistant collectors in SQLite and exposes a JSON API for listing and reading
+sessions. This repo now includes the server (`cmd/lethe`) and the first
+collector (`cmd/lethe-collector`) for Claude Code transcripts.
 
-Search and the collector binary live in sibling repos / tasks
-(`lethe-collector-claude-code`, `lethe-search-and-opencode`); this repo
-is just the server.
+Search and the opencode collector remain in `lethe-search-and-opencode`.
 
 ## Purpose
 
@@ -30,6 +28,17 @@ just dev
 
 The server reads `config.yaml` by default. Pass `-config ` to override.
 
+The Claude Code collector reads `~/.config/lethe/collector.yaml` by default and
+stores offsets/outbox rows in `~/.local/state/lethe/state.db`:
+
+```bash
+go run ./cmd/lethe-collector --config ~/.config/lethe/collector.yaml status
+go run ./cmd/lethe-collector --config ~/.config/lethe/collector.yaml backfill claude-code
+go run ./cmd/lethe-collector --config ~/.config/lethe/collector.yaml daemon
+```
+
+The systemd user unit lives at `deploy/lethe-collector.service`.
+
 Once the server is up (default bind `127.0.0.1:8080`), exercise the API:
 
 ```bash
diff --git a/docs/TODO.md b/docs/TODO.md
index 4897793c7e2415dcee118eac5c487a38180d7c82..063f75e3178c8fd130d0dcfcfc0154cfdbadb5b4 100644
--- a/docs/TODO.md
+++ b/docs/TODO.md
@@ -7,7 +7,7 @@ Index of task specs and their state. Each row points at a `docs/tasks/.md`
 | # | Slug | Status | Description |
 |---|---|---|---|
 | 1 | [`lethe-server`](tasks/lethe-server.md) | **Verified** | Backend skeleton: SQLite ingest, sessions list/detail, forward-auth, RFC 7807, deployable on phoebe behind Authelia. Shipped over 9 phases. |
-| 2 | [`lethe-collector-claude-code`](tasks/lethe-collector-claude-code.md) | **Executing** | Per-host systemd-user collector that tails `~/.claude/projects/*.jsonl` and POSTs normalized turns to ingest. Blocks #8 and #9. |
+| 2 | [`lethe-collector-claude-code`](tasks/lethe-collector-claude-code.md) | **Reviewed** | Per-host systemd-user collector that tails `~/.claude/projects/*.jsonl` and POSTs normalized turns to ingest. Blocks #8 and #9. |
 | 3 | [`lethe-search-and-opencode`](tasks/lethe-search-and-opencode.md) | Designed (deferred) | Adds `GET /api/v1/search` (FTS5) and an `opencode` collector. Blocks #7. |
 | 4 | [`lethe-web-ui-foundation`](tasks/lethe-web-ui-foundation.md) | **Reviewed** | Vite/React/TS SPA, embed pipeline, shell + Home + Session views, palette skeleton, 5 stub routes. Plus `/sessions` aggregate fields. |
 | 5 | [`lethe-web-ui-aggregates`](tasks/lethe-web-ui-aggregates.md) | **Reviewed** | Backend `/projects` + `/stats` endpoints, Projects index + Project detail + Stats screen. Replaces 3 of #4's stubs. |
diff --git a/docs/tasks/lethe-collector-claude-code.md b/docs/tasks/lethe-collector-claude-code.md
index b48ffaee1515c8d989c9c244d7918ccb079b149e..ebbb0c4b717491b14c31a73ef80c15204cba20af 100644
--- a/docs/tasks/lethe-collector-claude-code.md
+++ b/docs/tasks/lethe-collector-claude-code.md
@@ -1,6 +1,6 @@
 # lethe-collector-claude-code
 
-**Status:** executing
+**Status:** done
 **Branch:** `task/lethe-collector-claude-code`
 **Worktree:** `/Users/blikh/data/home/lethe/.worktrees/lethe-collector-claude-code`
 **Mode:** hands-off
@@ -309,6 +309,30 @@ Smoke: `go run ./cmd/lethe-collector --config ./tmp/collector-smoke.yaml status`
 
 ## Conclusion
 
+Outcome: lethe-collector shipped on `task/lethe-collector-claude-code`; final review found no Critical/Important issues.
+
+Invariants:
+- IV1 — source roots are only read through parser discovery/parse paths.
+- IV2 — runner tests cover full accept, partial server errors, valid-prefix preservation, rejected-row skipping, and skipped-only progress.
+- IV3 — parser offset tests and runner persistence use parser-returned complete-line offsets.
+- IV4 — state offset/outbox tests and runner rerun tests cover resumability boundaries.
+- IV5 — outbox cap is enforced before replay and after enqueue, with WARN logs on overflow drops.
+- IV6 — runner tests cover per-file isolation after file-level failure.
+- IV7 — host is required config and no `os.Hostname` fallback exists.
+- IV8 — Claude Code format knowledge stays under `internal/collector/parser/claudecode`.
+- IV9 — sender tests cover exact `/api/v1/ingest` path, including trailing-slash `server_url`.
+- IV10 — daemon cancellation tests cover bounded drain of active polls.
+
+### Assumptions check
+- AS1 — held — collector uses `wire.TurnEvent` NDJSON over the existing ingest endpoint.
+- AS2 — held for parser/runner logic — offsets assume append-only complete-line byte ranges.
+- AS3 — held — config has one required host identity and configured source list per process.
+
+### Unknowns outcome
+- UK1 — still-open — Tailscale header injection needs deployed-path testing.
+- UK2 — resolved enough for v1 — parser uses 1 MiB reader buffering and parser tests cover discovered real-shape records.
+- UK3 — still-open — concurrent Claude writes were not observed in a one-hour live run.
+
 ### Hands-off decisions
 
 - size: Medium — the design is complete and remaining work spans CLI, config, state, HTTP, daemon, deploy, and tests.
@@ -321,10 +345,10 @@ Smoke: `go run ./cmd/lethe-collector --config ./tmp/collector-smoke.yaml status`
 - ureview (final): bounded daemon drain uses `http.timeout` — no separate `shutdown_grace` config exists for the collector.
 - ureview (final): backfill offset-0 semantics are implemented as `RunBackfillOnce` instead of a mode flag on `RunOnce` — explicit call sites are safer than a boolean parameter that could be misused in daemon loops.
 - ureview (final): enforced the outbox size cap before replay and normalized trailing slashes in `server_url` — keeps IV5 and IV9 true for preexisting state and valid-looking URLs.
-- ureview (final): normalized sender `serverURL` and enforced outbox cap before every replay to fix IV5/IV9 violations found in review.
 - ureview (final): skipped-only parse results (no events but `newOffset > startOffset`) now persist the new offset so the file is not re-parsed forever and status lag clears.
 
 ### Deferred (needs user input)
 
 - retry/backoff: `http.retry_max` is loaded from config but exponential backoff needs a configured base/max delay; no conservative default was specified.
 - status last_error: requires extending `ingestion_state` schema and deciding retention/update semantics.
+- deployed smoke: run the collector for one hour against real `~/.claude/projects` via Tailscale to resolve UK1/UK3 and confirm the design success criterion.