M README.md => README.md +16 -7
@@ 1,13 1,11 @@
# lethe
-Personal AI assistant log aggregator. `lethe` is a small, single-binary Go
-service that ingests turn-level NDJSON from AI assistant collectors (Claude
-Code, opencode, etc.), stores it in SQLite, and exposes a JSON API for
-listing and reading sessions.
+Personal AI assistant log aggregator. `lethe` stores turn-level NDJSON from
+assistant collectors in SQLite and exposes a JSON API for listing and reading
+sessions. This repo now includes the server (`cmd/lethe`) and the first
+collector (`cmd/lethe-collector`) for Claude Code transcripts.
-Search and the collector binary live in sibling repos / tasks
-(`lethe-collector-claude-code`, `lethe-search-and-opencode`); this repo
-is just the server.
+Search and the opencode collector remain in `lethe-search-and-opencode`.
## Purpose
@@ 30,6 28,17 @@ just dev
The server reads `config.yaml` by default. Pass `-config <path>` to override.
+The Claude Code collector reads `~/.config/lethe/collector.yaml` by default and
+stores offsets and outbox rows in `~/.local/state/lethe/state.db`:
+
+```bash
+go run ./cmd/lethe-collector --config ~/.config/lethe/collector.yaml status
+go run ./cmd/lethe-collector --config ~/.config/lethe/collector.yaml backfill claude-code
+go run ./cmd/lethe-collector --config ~/.config/lethe/collector.yaml daemon
+```
+
+The systemd user unit lives at `deploy/lethe-collector.service`.
+
Once the server is up (default bind `127.0.0.1:8080`), exercise the API:
```bash
M docs/TODO.md => docs/TODO.md +1 -1
@@ 7,7 7,7 @@ Index of task specs and their state. Each row points at a `docs/tasks/<slug>.md`
| # | Slug | Status | Description |
|---|---|---|---|
| 1 | [`lethe-server`](tasks/lethe-server.md) | **Verified** | Backend skeleton: SQLite ingest, sessions list/detail, forward-auth, RFC 7807, deployable on phoebe behind Authelia. Shipped over 9 phases. |
-| 2 | [`lethe-collector-claude-code`](tasks/lethe-collector-claude-code.md) | **Executing** | Per-host systemd-user collector that tails `~/.claude/projects/*.jsonl` and POSTs normalized turns to ingest. Blocks #8 and #9. |
+| 2 | [`lethe-collector-claude-code`](tasks/lethe-collector-claude-code.md) | **Reviewed** | Per-host systemd-user collector that tails `~/.claude/projects/*.jsonl` and POSTs normalized turns to ingest. Blocks #8 and #9. |
| 3 | [`lethe-search-and-opencode`](tasks/lethe-search-and-opencode.md) | Designed (deferred) | Adds `GET /api/v1/search` (FTS5) and an `opencode` collector. Blocks #7. |
| 4 | [`lethe-web-ui-foundation`](tasks/lethe-web-ui-foundation.md) | **Reviewed** | Vite/React/TS SPA, embed pipeline, shell + Home + Session views, palette skeleton, 5 stub routes. Plus `/sessions` aggregate fields. |
| 5 | [`lethe-web-ui-aggregates`](tasks/lethe-web-ui-aggregates.md) | **Reviewed** | Backend `/projects` + `/stats` endpoints, Projects index + Project detail + Stats screen. Replaces 3 of #4's stubs. |
M docs/tasks/lethe-collector-claude-code.md => docs/tasks/lethe-collector-claude-code.md +26 -2
@@ 1,6 1,6 @@
# lethe-collector-claude-code
-**Status:** executing
+**Status:** done
**Branch:** `task/lethe-collector-claude-code`
**Worktree:** `/Users/blikh/data/home/lethe/.worktrees/lethe-collector-claude-code`
**Mode:** hands-off
@@ 309,6 309,30 @@ Smoke: `go run ./cmd/lethe-collector --config ./tmp/collector-smoke.yaml status`
## Conclusion
+Outcome: `lethe-collector` shipped on branch `task/lethe-collector-claude-code`; final review found no Critical/Important issues.
+
+Invariants:
+- IV1 — source roots are only read through parser discovery/parse paths.
+- IV2 — runner tests cover full accept, partial server errors, valid-prefix preservation, rejected-row skipping, and skipped-only progress.
+- IV3 — parser offset tests and runner persistence use parser-returned complete-line offsets.
+- IV4 — state offset/outbox tests and runner rerun tests cover resumability boundaries.
+- IV5 — outbox cap is enforced before replay and after enqueue, with WARN logs on overflow drops.
+- IV6 — runner tests cover per-file isolation after file-level failure.
+- IV7 — host is required config and no `os.Hostname` fallback exists.
+- IV8 — Claude Code format knowledge stays under `internal/collector/parser/claudecode`.
+- IV9 — sender tests cover exact `/api/v1/ingest` path, including trailing-slash `server_url`.
+- IV10 — daemon cancellation tests cover bounded drain of active polls.
+
+### Assumptions check
+- AS1 — held — collector uses `wire.TurnEvent` NDJSON over the existing ingest endpoint.
+- AS2 — held for parser/runner logic — offsets assume append-only complete-line byte ranges.
+- AS3 — held — config has one required host identity and configured source list per process.
+
+### Unknowns outcome
+- UK1 — still-open — Tailscale header injection needs deployed-path testing.
+- UK2 — resolved enough for v1 — the parser uses 1 MiB reader buffering, and parser tests cover the real-world record shapes that were discovered.
+- UK3 — still-open — concurrent Claude writes were not observed in a one-hour live run.
+
### Hands-off decisions
- size: Medium — the design is complete and remaining work spans CLI, config, state, HTTP, daemon, deploy, and tests.
@@ 321,10 345,10 @@ Smoke: `go run ./cmd/lethe-collector --config ./tmp/collector-smoke.yaml status`
- ureview (final): bounded daemon drain uses `http.timeout` — no separate `shutdown_grace` config exists for the collector.
- ureview (final): backfill offset-0 semantics are implemented as `RunBackfillOnce` instead of a mode flag on `RunOnce` — explicit call sites are safer than a boolean parameter that could be misused in daemon loops.
- ureview (final): enforced the outbox size cap before replay and normalized trailing slashes in `server_url` — keeps IV5 and IV9 true for preexisting state and valid-looking URLs.
-- ureview (final): normalized sender `serverURL` and enforced outbox cap before every replay to fix IV5/IV9 violations found in review.
- ureview (final): skipped-only parse results (no events but `newOffset > startOffset`) now persist the new offset so the file is not re-parsed forever and status lag clears.
### Deferred (needs user input)
- retry/backoff: `http.retry_max` is loaded from config but exponential backoff needs a configured base/max delay; no conservative default was specified.
- status last_error: requires extending `ingestion_state` schema and deciding retention/update semantics.
+- deployed smoke: run the collector for one hour against real `~/.claude/projects` via Tailscale to resolve UK1/UK3 and confirm the design success criterion.