~bigbes/lethe

4abe63ea5d809ec3a7e87e349ccc9493a377b1bc — Eugene Blikh 24 days ago 57c0d49
docs: conclude lethe collector task
3 files changed, 43 insertions(+), 10 deletions(-)

M README.md
M docs/TODO.md
M docs/tasks/lethe-collector-claude-code.md
M README.md => README.md +16 -7
@@ 1,13 1,11 @@
# lethe

Personal AI assistant log aggregator. `lethe` is a small, single-binary Go
service that ingests turn-level NDJSON from AI assistant collectors (Claude
Code, opencode, etc.), stores it in SQLite, and exposes a JSON API for
listing and reading sessions.
Personal AI assistant log aggregator. `lethe` stores turn-level NDJSON from
assistant collectors in SQLite and exposes a JSON API for listing and reading
sessions. This repo now includes the server (`cmd/lethe`) and the first
collector (`cmd/lethe-collector`) for Claude Code transcripts.

Search and the collector binary live in sibling repos / tasks
(`lethe-collector-claude-code`, `lethe-search-and-opencode`); this repo
is just the server.
Search and the opencode collector remain in `lethe-search-and-opencode`.

## Purpose



@@ 30,6 28,17 @@ just dev

The server reads `config.yaml` by default. Pass `-config <path>` to override.

The Claude Code collector reads `~/.config/lethe/collector.yaml` by default and
stores offsets/outbox rows in `~/.local/state/lethe/state.db`:

```bash
go run ./cmd/lethe-collector --config ~/.config/lethe/collector.yaml status
go run ./cmd/lethe-collector --config ~/.config/lethe/collector.yaml backfill claude-code
go run ./cmd/lethe-collector --config ~/.config/lethe/collector.yaml daemon
```

The systemd user unit lives at `deploy/lethe-collector.service`.

Once the server is up (default bind `127.0.0.1:8080`), exercise the API:

```bash

M docs/TODO.md => docs/TODO.md +1 -1
@@ 7,7 7,7 @@ Index of task specs and their state. Each row points at a `docs/tasks/<slug>.md`
| # | Slug | Status | Description |
|---|---|---|---|
| 1 | [`lethe-server`](tasks/lethe-server.md) | **Verified** | Backend skeleton: SQLite ingest, sessions list/detail, forward-auth, RFC 7807, deployable on phoebe behind Authelia. Shipped over 9 phases. |
| 2 | [`lethe-collector-claude-code`](tasks/lethe-collector-claude-code.md) | **Executing** | Per-host systemd-user collector that tails `~/.claude/projects/*.jsonl` and POSTs normalized turns to ingest. Blocks #8 and #9. |
| 2 | [`lethe-collector-claude-code`](tasks/lethe-collector-claude-code.md) | **Reviewed** | Per-host systemd-user collector that tails `~/.claude/projects/*.jsonl` and POSTs normalized turns to ingest. Blocks #8 and #9. |
| 3 | [`lethe-search-and-opencode`](tasks/lethe-search-and-opencode.md) | Designed (deferred) | Adds `GET /api/v1/search` (FTS5) and an `opencode` collector. Blocks #7. |
| 4 | [`lethe-web-ui-foundation`](tasks/lethe-web-ui-foundation.md) | **Reviewed** | Vite/React/TS SPA, embed pipeline, shell + Home + Session views, palette skeleton, 5 stub routes. Plus `/sessions` aggregate fields. |
| 5 | [`lethe-web-ui-aggregates`](tasks/lethe-web-ui-aggregates.md) | **Reviewed** | Backend `/projects` + `/stats` endpoints, Projects index + Project detail + Stats screen. Replaces 3 of #4's stubs. |

M docs/tasks/lethe-collector-claude-code.md => docs/tasks/lethe-collector-claude-code.md +26 -2
@@ 1,6 1,6 @@
# lethe-collector-claude-code

**Status:** executing
**Status:** done
**Branch:** `task/lethe-collector-claude-code`
**Worktree:** `/Users/blikh/data/home/lethe/.worktrees/lethe-collector-claude-code`
**Mode:** hands-off


@@ 309,6 309,30 @@ Smoke: `go run ./cmd/lethe-collector --config ./tmp/collector-smoke.yaml status`

## Conclusion

Outcome: lethe-collector shipped on `task/lethe-collector-claude-code`; final review found no Critical/Important issues.

Invariants:
- IV1 — source roots are only read through parser discovery/parse paths.
- IV2 — runner tests cover full accept, partial server errors, valid-prefix preservation, rejected-row skipping, and skipped-only progress.
- IV3 — parser offset tests and runner persistence use parser-returned complete-line offsets.
- IV4 — state offset/outbox tests and runner rerun tests cover resumability boundaries.
- IV5 — outbox cap is enforced before replay and after enqueue, with WARN logs on overflow drops.
- IV6 — runner tests cover per-file isolation after file-level failure.
- IV7 — host is required config and no `os.Hostname` fallback exists.
- IV8 — Claude Code format knowledge stays under `internal/collector/parser/claudecode`.
- IV9 — sender tests cover exact `/api/v1/ingest` path, including trailing-slash `server_url`.
- IV10 — daemon cancellation tests cover bounded drain of active polls.
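
IV9 can be illustrated with a short sketch. It assumes the sender builds its request URL from the configured `server_url`. The helper name `ingestURL` is hypothetical and is not taken from the repository; only the `/api/v1/ingest` path and the trailing-slash case come from the text above.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// ingestPath is the endpoint named in IV9.
const ingestPath = "/api/v1/ingest"

// ingestURL is a hypothetical helper (not the repository's API). It joins a
// configured server_url with the ingest path so that a trailing slash in the
// config never yields "//api/v1/ingest".
func ingestURL(serverURL string) (string, error) {
	u, err := url.Parse(serverURL)
	if err != nil {
		return "", fmt.Errorf("parse server_url: %w", err)
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("server_url %q needs a scheme and host", serverURL)
	}
	// Strip trailing slashes from any base path, then append the fixed path.
	u.Path = strings.TrimRight(u.Path, "/") + ingestPath
	u.RawPath = ""
	return u.String(), nil
}

func main() {
	for _, s := range []string{
		"http://127.0.0.1:8080",
		"http://127.0.0.1:8080/",
		"https://example.test/lethe/",
	} {
		got, err := ingestURL(s)
		fmt.Println(got, err)
	}
}
```

Normalizing once at construction time, rather than at every request, keeps the exact-path check in one place that a test can pin down.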

### Assumptions check
- AS1 — held — collector uses `wire.TurnEvent` NDJSON over the existing ingest endpoint.
- AS2 — held for parser/runner logic — offsets assume append-only complete-line byte ranges.
- AS3 — held — config has one required host identity and configured source list per process.
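
The complete-line offset assumption in AS2 (and IV3) might look roughly like the sketch below. `completeLines` is a hypothetical helper, not the parser's real signature. It assumes the file is append-only and that a line counts as complete only once its terminating newline has been written.

```go
package main

import (
	"bytes"
	"fmt"
)

// completeLines is a hypothetical helper. Given the bytes read from a file
// starting at startOffset, it returns every newline-terminated line and the
// offset just past the last complete line. A trailing partial line (for
// example a write still in progress) is left for the next poll.
func completeLines(chunk []byte, startOffset int64) (lines [][]byte, newOffset int64) {
	newOffset = startOffset
	for {
		i := bytes.IndexByte(chunk, '\n')
		if i < 0 {
			// No further newline: whatever remains is incomplete.
			return lines, newOffset
		}
		lines = append(lines, chunk[:i])
		chunk = chunk[i+1:]
		newOffset += int64(i + 1) // line bytes plus the newline
	}
}

func main() {
	chunk := []byte("{\"a\":1}\n{\"b\":2}\n{\"c\"")
	lines, off := completeLines(chunk, 100)
	fmt.Println(len(lines), off)
}
```

Because the returned offset never points into the middle of a line, a rerun that resumes from it sees the partial record again in full once its newline arrives.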

### Unknowns outcome
- UK1 — still-open — Tailscale header injection needs deployed-path testing.
- UK2 — resolved enough for v1 — parser uses 1 MiB reader buffering and parser tests cover discovered real-shape records.
- UK3 — still-open — concurrent Claude writes were not observed in a one-hour live run.

### Hands-off decisions

- size: Medium — the design is complete and remaining work spans CLI, config, state, HTTP, daemon, deploy, and tests.


@@ 321,10 345,10 @@ Smoke: `go run ./cmd/lethe-collector --config ./tmp/collector-smoke.yaml status`
- ureview (final): bounded daemon drain uses `http.timeout` — no separate `shutdown_grace` config exists for the collector.
- ureview (final): backfill offset-0 semantics are implemented as `RunBackfillOnce` instead of a mode flag on `RunOnce` — explicit call sites are safer than a boolean parameter that could be misused in daemon loops.
- ureview (final): enforced the outbox size cap before replay and normalized trailing slashes in `server_url` — keeps IV5 and IV9 true for preexisting state and valid-looking URLs.
- ureview (final): normalized sender `serverURL` and enforced outbox cap before every replay to fix IV5/IV9 violations found in review.
- ureview (final): skipped-only parse results (no events but `newOffset > startOffset`) now persist the new offset so the file is not re-parsed forever and status lag clears.
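
The skipped-only rule in the last item can be sketched as a small decision function. This is a simplified illustration under stated assumptions: `parseResult` and `offsetToPersist` are hypothetical names, and the outbox path (where undelivered events may be queued) is deliberately left out.

```go
package main

import "fmt"

// parseResult is a hypothetical stand-in for what a parser returns for one
// file: the events it produced and the complete-line offset it reached.
type parseResult struct {
	Events    []string
	NewOffset int64
}

// offsetToPersist reports which offset to store and whether to store it.
// Simplifying assumption: events are either accepted by the server or not;
// outbox handling is ignored here.
func offsetToPersist(start int64, res parseResult, accepted bool) (int64, bool) {
	if res.NewOffset <= start {
		return start, false // nothing new was consumed
	}
	if len(res.Events) == 0 {
		// Skipped-only: lines were consumed but produced no events. Persist
		// anyway so the same bytes are not re-parsed on every poll.
		return res.NewOffset, true
	}
	if accepted {
		return res.NewOffset, true
	}
	return start, false // keep the old offset so the events are retried
}

func main() {
	off, ok := offsetToPersist(10, parseResult{NewOffset: 42}, false)
	fmt.Println(off, ok)
}
```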

### Deferred (needs user input)

- retry/backoff: `http.retry_max` is loaded from config but exponential backoff needs a configured base/max delay; no conservative default was specified.
- status last_error: requires extending `ingestion_state` schema and deciding retention/update semantics.
- deployed smoke: run the collector for one hour against real `~/.claude/projects` via Tailscale to resolve UK1/UK3 and confirm the design success criterion.