# lethe-search-and-opencode

**Status:** done
**Branch:** `task/lethe-search-and-opencode`
**Worktree:** `/Users/blikh/data/home/lethe/.worktrees/lethe-search-and-opencode`
**Mode:** hands-off
**Module:** `sourcecraft.dev/bigbes/lethe`

**Depends on:**
`lethe-server.md` (#1) — FTS5 tables and triggers were created in #1; this task only adds query code.
`lethe-collector-claude-code.md` (#2) — the collector framework and Parser interface this task extends.

**Sibling tasks (deferred):** per-tool parsers (`lethe-collector-crush.md`, `lethe-collector-pi.md`, `lethe-collector-kimi.md`); RFC backlog items (cost rollups for tools that report it, tagging, JSON/Markdown export).

## Design

### Purpose

Make the archive searchable by exposing the existing FTS5 indexes through `/api/v1/search`, and prove the collector parser boundary with a second tool: opencode.

A successful end state for this task: ingested Claude Code and opencode turns can be searched through the authenticated JSON API, with ranked snippets and session anchors that #7 can render in the existing React `/search` route.

### Scope

**In:**
- `GET /api/v1/search?q=&tool=&host=&since=&until=&include_tool_outputs=&limit=&cursor=` — owner-scoped FTS5 query against `turns_fts`; opt-in union with `tool_outputs_fts`.
- `internal/domain/search/` — repository and handler matching the existing domain package shape.
- Cursor pagination over the ranked result set; cursor is opaque and invalid cursors are `400 INVALID`.
- JSON result rows include enough data for #7 to link to `/session/{tool}/{host}/{session_id}#turn-{turn_id}`.
- Collector-side:
  - `internal/collector/parser/opencode/` — new parser implementing the same Parser interface from #2.
  - Format-discovery spike captured under `docs/spikes/opencode-format.md` before the parser is written; spike output is checked in.
  - One parser registration in `cmd/lethe-collector`; otherwise the collector runner framework is untouched.
  - Golden fixtures for the opencode parser (anonymized snippets in `testdata/opencode/`).

**Out:**
- React search UI — #7 owns filling `web/src/routes/search.tsx` and any saved-search execute flow.
- Stats API and stats page — already shipped by #5 and left unchanged here.
- Server-rendered HTML and vanilla search JS — superseded by the shipped React SPA.
- Schema migrations — #1 already created `turns_fts` and `tool_outputs_fts`.
- crush, pi, kimi parsers — separate task files when the time comes.
- Tag system for manual session annotation (RFC backlog).
- JSON / Markdown export endpoints (RFC backlog).
- Faceted search UI (e.g. histogram-driven date range pickers). Filters stay as form fields and URL params.
- Saved-search CRUD, alerts, RSS, anything subscription-shaped.
- A second machine's deploy (the goal is to prove the parser interface with #2; running on the work PC is a deployment exercise, not a code change).

### Chosen approach

**Search API.** Add a `search` domain package, mount it under the existing authenticated `/api/v1` group, and register it in the steward graph beside `session`, `project`, `stats`, and `savedsearch`.

Response shape:

```json
{
  "results": [
    {
      "tool": "claude-code",
      "host": "laptop",
      "session_id": "...",
      "turn_id": "...",
      "timestamp": 1760000000,
      "role": "user",
      "working_dir": "/repo",
      "snippet": "...\u0002term\u0003...",
      "match_source": "turn",
      "rank": -1.23
    }
  ],
  "limit": 50,
  "next_cursor": "opaque-or-empty"
}
```

`snippet` uses marker runes instead of HTML so #7 can render highlights with React text nodes; `match_source` is `turn` or `tool_output`.
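
To make the marker contract concrete, here is a minimal sketch of how a consumer could split a snippet into plain and highlighted runs. It is written in Go to match the rest of the repo, although #7 would do the equivalent in the React layer. `Segment` and `SplitSnippet` are hypothetical names that do not appear in the task; the only facts taken from the design are the two marker characters, `char(2)` and `char(3)`.

```go
package main

import (
	"fmt"
	"strings"
)

// Segment is a hypothetical type: one run of snippet text plus a flag
// saying whether it sat between the start and end markers.
type Segment struct {
	Text      string
	Highlight bool
}

const (
	markStart = "\x02" // char(2), emitted before a matched term
	markEnd   = "\x03" // char(3), emitted after a matched term
)

// SplitSnippet walks the snippet once and toggles the highlight state at
// each marker. An unclosed start marker leaves the trailing text highlighted.
func SplitSnippet(s string) []Segment {
	var out []Segment
	highlight := false
	for len(s) > 0 {
		marker := markStart
		if highlight {
			marker = markEnd
		}
		i := strings.Index(s, marker)
		if i < 0 {
			out = append(out, Segment{Text: s, Highlight: highlight})
			break
		}
		if i > 0 {
			out = append(out, Segment{Text: s[:i], Highlight: highlight})
		}
		s = s[i+len(marker):]
		highlight = !highlight
	}
	return out
}

func main() {
	for _, seg := range SplitSnippet("fix the \x02parser\x03 offset") {
		fmt.Printf("%q highlight=%v\n", seg.Text, seg.Highlight)
	}
}
```

Because the segments carry plain text only, a renderer can emit them as ordinary text nodes and never needs to parse or trust HTML from the API.
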
**Search query (default — `include_tool_outputs=0`).** One FTS5 `MATCH` against `turns_fts`, joined back to `turns`/`sessions` by rowid and composite key so filters and result metadata come from canonical tables:

```sql
SELECT t.tool, t.host, t.session_id, t.turn_id, t.timestamp, t.role,
       s.working_dir,
       snippet(turns_fts, 0, char(2), char(3), '…', 32) AS snippet,
       bm25(turns_fts) AS rank,
       'turn' AS match_source
FROM turns_fts
JOIN turns AS t ON t.rowid = turns_fts.rowid
JOIN sessions AS s
  ON s.owner = t.owner
 AND s.tool = t.tool
 AND s.host = t.host
 AND s.session_id = t.session_id
WHERE turns_fts MATCH ?
  AND t.owner = ?
  AND (? IS NULL OR t.tool = ?)
  AND (? IS NULL OR t.host = ?)
  AND (? IS NULL OR t.timestamp >= ?)
  AND (? IS NULL OR t.timestamp < ?)
ORDER BY rank ASC, t.timestamp DESC, t.turn_id ASC
LIMIT ?;
```

Pagination cursor encodes `(rank, timestamp, turn_id, match_source)` of the last row and must be generated from the same normalized query/filter tuple.

**Search query (`include_tool_outputs=1`).** Two `MATCH` queries, one per FTS table, `UNION ALL`, then window-dedupe on `(tool, host, session_id, turn_id)` keeping the better-ranked match and exposing which source won.

**Query validation.** Empty `q` is `400 INVALID`; `limit` clamps to the existing 50/200 pattern; `since`/`until` parse as Unix seconds; invalid FTS syntax returns `400 INVALID` rather than a 500.

**opencode parser — discovery first.** The local install currently exposes `~/.local/share/opencode/opencode.db`, `storage/session/**/*.json`, and `tool-output/*`; the spike decides which is canonical before parser code exists. If session JSON files are canonical, implementation mirrors the Claude Code parser: discover session JSON files, parse from byte offset, and emit complete-turn events. If SQLite is canonical, implementation opens the DB read-only and uses `ingestion_state.last_offset` as a row marker. If neither source is stable, opencode leaves this task and the task still ships `/api/v1/search`.
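
The cursor rule above (opaque, and only valid for the query/filter tuple that produced it) can be sketched as follows. The function names follow the signatures listed in the plan; everything else is an assumption made for illustration: the field sets of `Filter` and `Cursor`, the base64url-wrapped JSON encoding, the truncated SHA-256 filter hash, and the `ErrInvalidCursor` sentinel that a handler would map to `400 INVALID`. The sketch also assumes the `Filter` it receives is already normalized.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"encoding/json"
	"errors"
	"fmt"
)

// Filter is a hypothetical normalized query/filter tuple.
type Filter struct {
	Query, Tool, Host  string
	Since, Until       int64
	IncludeToolOutputs bool
}

// Cursor carries the sort key of the last row plus a hash of the filter.
type Cursor struct {
	Rank        float64 `json:"r"`
	Timestamp   int64   `json:"ts"`
	TurnID      string  `json:"id"`
	MatchSource string  `json:"src"`
	FilterHash  string  `json:"fh"`
}

// ErrInvalidCursor is an assumed sentinel for any undecodable or mismatched cursor.
var ErrInvalidCursor = errors.New("invalid cursor")

// filterHash fingerprints the filter so a cursor cannot be replayed
// against a different query.
func filterHash(f Filter) string {
	b, _ := json.Marshal(f) // plain scalar fields only, so Marshal cannot fail
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:8])
}

// EncodeCursor stamps the cursor with the filter hash and wraps it in base64url.
func EncodeCursor(c Cursor, f Filter) (string, error) {
	c.FilterHash = filterHash(f)
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil
}

// DecodeCursor rejects malformed input and cursors minted for another filter.
func DecodeCursor(raw string, f Filter) (Cursor, error) {
	b, err := base64.RawURLEncoding.DecodeString(raw)
	if err != nil {
		return Cursor{}, ErrInvalidCursor
	}
	var c Cursor
	if err := json.Unmarshal(b, &c); err != nil {
		return Cursor{}, ErrInvalidCursor
	}
	if c.FilterHash != filterHash(f) {
		return Cursor{}, ErrInvalidCursor
	}
	return c, nil
}

func main() {
	f := Filter{Query: "parser", Tool: "opencode"}
	raw, _ := EncodeCursor(Cursor{Rank: -1.23, Timestamp: 1760000000, TurnID: "t1", MatchSource: "turn"}, f)
	_, err := DecodeCursor(raw, Filter{Query: "something else"})
	fmt.Println(errors.Is(err, ErrInvalidCursor))
}
```

Binding the cursor to the filter means a client that changes `q` or a filter mid-pagination gets a clear `INVALID` response instead of a silently inconsistent page.
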
**Tradeoffs that settled it.**
- *Keep #3 API-only vs include React UI:* API-only matches `docs/TODO.md`, unblocks #7 cleanly, and avoids mixing parser discovery with frontend route work.
- *Marker snippets vs HTML snippets:* markers avoid `dangerouslySetInnerHTML`; #7 can convert them to highlight elements built from normal React nodes.
- *Single-table FTS query vs always-union:* default prose search is faster and less noisy; tool outputs remain an explicit power-user toggle.
- *Cursor vs offset pagination:* cursor prevents an API break if the corpus grows; it costs one helper and a cursor validation test.
- *Discovery spike vs guess-and-iterate on opencode:* the spike is cheaper than implementing against the wrong store and creates the parser fixture map.

### Backwards-compatibility check

- Server: additive route only; existing `/api/v1/stats`, `/api/v1/sessions`, `/api/v1/projects`, `/api/v1/saved-searches`, and ingest behavior stay unchanged.
- Collector: additive parser registration only; existing `claude-code` sources keep the same config and parser behavior.
- Database: no migrations; the task reads the existing FTS tables and canonical `turns`/`sessions` tables.
- Web: the React `/search` stub remains a stub until #7.

### Hands-off decisions

- udesign-refresh: scope narrowed to `/api/v1/search` plus opencode parser — current `docs/TODO.md` assigns React search UI to #7, and stats already shipped in #5.
- udesign-refresh: server-rendered HTML/vanilla-JS search removed — the repo now serves a React SPA with an existing `/search` stub.
- udesign-refresh: snippets use non-HTML markers — future React UI can render highlights without unsafe HTML insertion.

TDD: yes (reason: FTS query behavior, cursor round-trips, owner scoping, FTS syntax errors, and opencode parser offsets are deterministic contracts where regressions should fail CI.)

### Invariants

- IV1 — This task adds no schema migration files.
- IV2 — `internal/shared/wire/` types are not modified.
- IV3 — `/api/v1/search` is read-only and executes `SELECT` only.
- IV4 — Search results are scoped through the same authenticated owner rules as sessions/projects.
- IV5 — Default search queries `turns_fts` only; `tool_outputs_fts` is read only when `include_tool_outputs=1`.
- IV6 — API snippets contain marker runes, not HTML.
- IV7 — Empty or syntactically invalid FTS queries return `400 INVALID`, not `500`.
- IV8 — The opencode parser implements `parser.Parser` unchanged.
- IV9 — The collector runner and state schema are unchanged by opencode support.
- IV10 — `docs/spikes/opencode-format.md` is committed before opencode parser implementation lands.
- IV11 — Existing `/api/v1/stats` behavior and React `/stats` page are not changed by this task.
- IV12 — `web/src/routes/search.tsx` remains a stub until #7.

### Principles

- PC1 — API first, UI later: #3 returns data; #7 decides presentation.
- PC2 — Search defaults to prose turns; tool-output search is explicit.
- PC3 — Spike before parser code when the source format is unknown.
- PC4 — New parser support is one package plus one registration, not a new collector abstraction.

### Assumptions

- AS1 — `turns_fts` and `tool_outputs_fts` are kept current by #1's triggers for every ingested turn.
- AS2 — Joining FTS rowid back to `turns.rowid` is stable for the existing regular FTS5 tables.
- AS3 — opencode local storage has a readable canonical transcript source under `~/.local/share/opencode/`.
- AS4 — The collector state's integer offset can represent the chosen opencode progress marker.

### Unknowns

- UK1 — Which opencode store is canonical: `opencode.db`, `storage/session/**/*.json`, `tool-output/*`, or a combination.
- UK2 — Whether SQLite FTS query syntax needs a stricter user-query normalizer than passing the validated `q` through to `MATCH`.
- UK3 — Whether default BM25 quality is good enough on real lethe data.
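
The request-side half of IV7 and the query-validation rules can be illustrated with a small parameter parser. This is a sketch under stated assumptions, not the shipped handler: `Params`, `ParseParams`, and `ErrInvalid` are hypothetical names, and "the existing 50/200 pattern" is read here as a default of 50 with a cap of 200. Treating an unparsable or non-positive `limit` as the default is also an assumption. Invalid FTS syntax is not covered, since that can only be detected when SQLite evaluates `MATCH`.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// ErrInvalid is an assumed sentinel that a handler would render as 400 INVALID.
var ErrInvalid = errors.New("INVALID")

const (
	defaultLimit = 50  // assumed default from the "50/200 pattern"
	maxLimit     = 200 // assumed cap from the "50/200 pattern"
)

// Params is a hypothetical parsed form of the search query string.
type Params struct {
	Query        string
	Since, Until *int64 // nil means "no bound"
	Limit        int
}

// parseUnix parses optional Unix seconds; an empty value means unset.
func parseUnix(raw string) (*int64, error) {
	if raw == "" {
		return nil, nil
	}
	v, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return nil, ErrInvalid
	}
	return &v, nil
}

// ParseParams validates q, since, until, and limit from a raw query string.
func ParseParams(rawQuery string) (Params, error) {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return Params{}, ErrInvalid
	}
	p := Params{Query: strings.TrimSpace(q.Get("q")), Limit: defaultLimit}
	if p.Query == "" {
		return Params{}, ErrInvalid
	}
	if p.Since, err = parseUnix(q.Get("since")); err != nil {
		return Params{}, err
	}
	if p.Until, err = parseUnix(q.Get("until")); err != nil {
		return Params{}, err
	}
	// Clamp rather than reject: bad or non-positive limits keep the default.
	if n, err := strconv.Atoi(q.Get("limit")); err == nil && n > 0 {
		if n > maxLimit {
			n = maxLimit
		}
		p.Limit = n
	}
	return p, nil
}

func main() {
	p, err := ParseParams("q=parser&limit=500")
	fmt.Println(p.Query, p.Limit, err)
}
```
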
## Plan

Approach: ship `/api/v1/search` as an additive read domain first, then run the opencode storage spike before writing the parser; keep #3 API/parser-only so #7 can consume the search contract without frontend churn here.

### PH1 — Search Repository

- Tier: deep — FTS5, owner scoping, dedupe, and cursor semantics are correctness-sensitive.
- **1.1** `internal/domain/search/repository.go:1-260` (create)
  - `type Result struct`, `type Row struct`, `type Filter struct`, `type Cursor struct` — API/domain shapes for JSON output, filters, and pagination.
  - `func (r *Repository) Search(ctx context.Context, f Filter) (*Result, error)` — executes default `turns_fts` search and optional `tool_outputs_fts` union with owner/tool/host/time filters.
  - `func EncodeCursor(c Cursor, f Filter) (string, error)` / `func DecodeCursor(raw string, f Filter) (Cursor, error)` — opaque cursor tied to normalized query/filter tuple.
  - Respects: IV1, IV2, IV3, IV4, IV5, IV6, IV7, IV11, IV12, PC1, PC2, AS1, AS2, UK2, UK3.
- **1.2** `internal/domain/search/repository_test.go:1-360` (create)
  - RED tests for owner isolation, tool/host/since/until filters, prose-only default, tool-output opt-in, dedupe, cursor next page, invalid cursor, marker snippets, and invalid FTS syntax mapping.
  - Respects: TDD, IV3-IV7, AS1, AS2.
- Commit: `search: add fts repository`

### PH2 — Search HTTP Wiring

- Tier: smart — follows existing handler/steward patterns but defines a new public API contract.
- **2.1** `internal/domain/search/handler.go:1-220` (create)
  - `func (h *Handler) Mount(r chi.Router)` — registers `GET /search` under `/api/v1`.
  - `func (h *Handler) List(w http.ResponseWriter, r *http.Request)` — resolves auth owner scope, parses query params, clamps limit to 50/200, renders JSON or RFC 7807 errors.
  - `func (h *Handler) resolveScope(r *http.Request) (session.OwnerScope, error)` — mirrors session/project admin owner rules.
  - Respects: IV3, IV4, IV7, PC1.
- **2.2** `internal/domain/search/handler_test.go:1-260` (create)
  - RED tests for route registration, missing/empty `q`, bad `since`, non-admin `owner`, admin `owner=*`, bad cursor, and successful response envelope.
  - Respects: TDD, IV4, IV7.
- **2.3** `internal/server/server.go:31-66,103-110` (modify)
  - Inject `*search.Handler` and mount it inside the authenticated `/api/v1` group.
  - Respects: IV4, IV11, IV12.
- **2.4** `cmd/lethe/main.go:26-137` and `cmd/lethe/main_e2e_test.go:73-92` (modify)
  - Register `search.Repository` and `search.Handler` with steward in production and e2e graph setup.
  - Respects: IV11.
- Commit: `search: expose search endpoint`

### PH3 — opencode Format Spike

- Tier: smart — exploratory but needs a durable writeup before parser code.
- **3.1** `cmd/lethe-spike-opencode/main.go:1-180` (create, then delete before phase commit)
  - Walk `~/.local/share/opencode/`, `~/.config/opencode/`, and `~/.cache/opencode/`; report structural file types, counts, sizes, and redacted samples.
  - Respects: PC3, AS3, UK1.
- **3.2** `docs/spikes/opencode-format.md:1-160` (create)
  - Record canonical source choice, session/message/tool-output shape, progress marker choice, fixture anonymization notes, and parser risks.
  - Respects: IV10, PC3, AS3, AS4, UK1.
- Commit: `collector: document opencode storage format`

### PH4 — opencode Parser

- Tier: deep — parser correctness affects resumability and archive integrity.
- **4.1** `internal/collector/parser/opencode/parser.go:1-320` (create)
  - `func New(host string) *Parser`, `func (p *Parser) Tool() string`, `func (p *Parser) Discover(root string) ([]parser.SourceFile, error)`, `func (p *Parser) Parse(path string, since int64) ([]wire.TurnEvent, int64, error)` — implement the source shape chosen in PH3 without changing `parser.Parser`.
  - `func mapRecord(...) (wire.TurnEvent, bool)` or SQLite-equivalent mapper — converts opencode session/message/tool-output records into `wire.TurnEvent`.
  - Respects: IV2, IV8, IV9, IV10, PC3, PC4, AS3, AS4.
- **4.2** `internal/collector/parser/opencode/parser_test.go:1-260` and `internal/collector/parser/opencode/testdata/*` (create)
  - RED tests for discovery, turn mapping, tool-output mapping, offset/marker resume, malformed-record fallback/skip behavior, and host/tool/source identity.
  - Respects: TDD, IV8, IV9, IV10.
- **4.3** `cmd/lethe-collector/main.go:17-221` and `cmd/lethe-collector/main_test.go:1-90` (modify)
  - Register `opencode.New(host)` in `buildParsers`; test that both `claude-code` and `opencode` are present.
  - Respects: IV8, IV9, PC4.
- Commit: `collector: add opencode parser`

### Test Strategy

- RED first: `internal/domain/search` repository tests for FTS result shape, owner scope, filters, cursor, tool-output opt-in, and invalid query handling.
- RED first: `internal/domain/search` handler tests for query parsing, auth scoping, route mount, and response envelope.
- RED first: opencode parser tests after PH3 selects the canonical source; no parser production code before fixtures exist.
- Existing safety net: `go test ./... -count=1`; collector CLI smoke with an opencode source in config once PH4 lands.

### Order & Dependencies

- PH1 blocks PH2.
- PH3 blocks PH4.
- PH1/PH2 and PH3/PH4 are otherwise independent; PH4 needs the collector branch already merged on `master`.

### Risks / Rollback

- RK1 — FTS5 `MATCH` syntax can turn user input into hard SQL errors; PH1 maps those to `400 INVALID` and keeps normalization isolated.
- RK2 — opencode may require multi-file joins between session JSON and `tool-output/*`; PH3 must choose a marker that PH4 can persist in `last_offset` without state schema changes.
- RK3 — Cursor pagination over BM25 may duplicate or skip rows if the tie-breaker is incomplete; PH1 orders by rank, timestamp, turn_id, and match_source and tests the boundary.
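
RK3 is easier to reason about with the ordering written out as a total-order comparator. The sketch below is an in-memory stand-in, not the repository's SQL: `Row`, `less`, and `pageAfter` are hypothetical names, and the direction of each key follows the default query's `ORDER BY` (rank ascending, timestamp descending, turn_id ascending) with `match_source` assumed ascending as the final tie-breaker.

```go
package main

import (
	"fmt"
	"sort"
)

// Row holds only the fields that take part in ordering.
type Row struct {
	Rank        float64
	Timestamp   int64
	TurnID      string
	MatchSource string
}

// less reports whether a sorts strictly before b. Every key is compared,
// so two distinct rows never compare as equal.
func less(a, b Row) bool {
	if a.Rank != b.Rank {
		return a.Rank < b.Rank // better (lower) BM25 rank first
	}
	if a.Timestamp != b.Timestamp {
		return a.Timestamp > b.Timestamp // newer first
	}
	if a.TurnID != b.TurnID {
		return a.TurnID < b.TurnID
	}
	return a.MatchSource < b.MatchSource
}

// pageAfter returns up to limit rows that sort strictly after the cursor row.
// A nil cursor means "first page".
func pageAfter(rows []Row, cursor *Row, limit int) []Row {
	sorted := append([]Row(nil), rows...)
	sort.Slice(sorted, func(i, j int) bool { return less(sorted[i], sorted[j]) })
	start := 0
	if cursor != nil {
		start = sort.Search(len(sorted), func(i int) bool { return less(*cursor, sorted[i]) })
	}
	end := start + limit
	if end > len(sorted) {
		end = len(sorted)
	}
	return sorted[start:end]
}

func main() {
	rows := []Row{
		{-1.0, 200, "c", "turn"},
		{-2.0, 100, "a", "turn"},
		{-1.0, 200, "b", "turn"},
	}
	first := pageAfter(rows, nil, 2)
	second := pageAfter(rows, &first[len(first)-1], 2)
	fmt.Println(len(first), len(second))
}
```

If any key were dropped from the comparator, rows tied on the remaining keys could land on either side of a page boundary, which is the duplicate-or-skip failure RK3 describes.
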
### Interfaces

- IF1 — `func (r *Repository) Search(ctx context.Context, f Filter) (*Result, error)` — search read boundary used only by the HTTP handler.
- IF2 — `func (h *Handler) Mount(r chi.Router)` — server mount contract matching other domain packages.
- IF3 — `func New(host string) *Parser` — opencode parser constructor registered by the collector CLI.
- IF4 — `func buildParsers(host string) map[string]parser.Parser` — collector parser registry remains the only dispatch point.
- IF5 — `docs/spikes/opencode-format.md` — canonical opencode source choice consumed by the parser phase.

### Interface Graph

- PH1 -> IF1 @ `internal/domain/search/`
- PH2 IF1 -> IF2 @ `internal/domain/search/`, `internal/server/`, `cmd/lethe/`
- PH3 -> IF5 @ `docs/spikes/opencode-format.md`
- PH4 IF5 -> IF3, IF4 @ `internal/collector/parser/opencode/`, `cmd/lethe-collector/`

Backwards-compat: additive route and parser registration only; PH1/PH2 do not alter existing routes or schema, and PH4 does not change the parser interface, runner, or collector state schema.

Scope check: no stats work, no React search UI, no schema migration, no saved-search changes, and no parser abstraction beyond `buildParsers`.

## Verify

**Result:** passed

Positive:
- CK1 — `/api/v1/search` repository and handler tests cover ranked prose search, tool-output opt-in, filters, cursors, and response envelope.
- CK2 — opencode parser tests cover SQLite discovery, turn mapping, tool summaries, resume marker, malformed skips, and collector registration.
- CK3 — `go build ./cmd/lethe ./cmd/lethe-collector` succeeds.
- CK4 — `go test ./... -count=1` passes.

Negative:
- CK5 — empty/invalid search query and bad cursor return `INVALID`.
- CK6 — non-admin `?owner=` on search returns `FORBIDDEN`.
- CK7 — opencode parser does not ingest external `tool-output/` blob contents.

Invariants / assumptions:
- CK8 (IV1, IV2) — no search package references schema DDL or `internal/shared/wire`.
- CK9 (IV3-IV7) — search tests verify read-path behavior, owner scoping, prose default, marker snippets, and invalid-query handling.
- CK10 (IV8-IV10, AS3, AS4) — opencode parser implements `parser.Parser`, keeps collector state schema unchanged, and consumes the committed storage spike.
- CK11 (IV11, IV12) — stats packages and React `/search` route were not changed.

Interfaces:
- CK12 (IF1) — `Repository.Search(ctx, Filter)` is called by handler and repository tests.
- CK13 (IF2) — `Handler.Mount(r chi.Router)` registers `/api/v1/search`.
- CK14 (IF3, IF4) — `opencode.New(host)` is registered through `buildParsers` and tested by `cmd/lethe-collector`.
- CK15 (IF5) — `docs/spikes/opencode-format.md` records the SQLite source and `message.rowid` marker used by PH4.

Smoke: `go test ./internal/domain/search -run TestHandler_SuccessfulResponseEnvelope -v` and `go test ./internal/collector/parser/opencode -run TestParse_MapsTurnsAndIdentity -v` both pass.

## Conclusion

Outcome: `/api/v1/search` and the opencode collector parser shipped on `task/lethe-search-and-opencode` through `5cc599d`.

Invariants:
- IV1 — no migration files were added.
- IV2 — `internal/shared/wire/` was not modified.
- IV3 — search implementation is repository/handler read-path code only.
- IV4 — search handler uses the existing authenticated owner-scope rules.
- IV5 — repository tests cover prose-only default and tool-output opt-in.
- IV6 — snippets use marker bytes, not HTML.
- IV7 — empty, malformed, and bad-cursor search inputs return `INVALID`.
- IV8 — opencode implements `parser.Parser` unchanged.
- IV9 — collector runner and state schema were unchanged.
- IV10 — `docs/spikes/opencode-format.md` landed before parser implementation.
- IV11 — stats API/page code was not changed.
- IV12 — React `/search` route was not changed.

### Assumptions check

- AS1 — held — search tests exercise FTS rows populated by existing triggers.
- AS2 — held — search joins FTS rowid back to `turns.rowid` in tests and implementation.
- AS3 — held — spike confirmed readable opencode SQLite storage under `~/.local/share/opencode/`.
- AS4 — held after review fix — collector `last_offset` stores next opencode `message.rowid`, and `TurnEvent.Seq` stores current rowid.

### Unknowns outcome

- UK1 — resolved — SQLite `opencode.db` is canonical for v1.
- UK2 — resolved for v1 — invalid FTS syntax maps to `INVALID`; no stricter normalizer was needed.
- UK3 — still-open — BM25 quality needs real archive usage after ingest.

### Review findings

- Critical: opencode offset marker changed from `message.time_created` to inclusive next-`message.rowid` after reviewer found skipped-row risk in partial-accept paths.
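
The marker semantics from AS4 and the review finding can be sketched with an in-memory stand-in for the `message` table. This is an illustration under assumptions, not the shipped parser: `Message`, `Event`, `parseFrom`, and `markerAfter` are hypothetical names, and the slice replaces the real SQLite read. What it takes from the text is that `since` is inclusive, each event's `Seq` is the current rowid, and the persisted offset is the next rowid to read.

```go
package main

import "fmt"

// Message stands in for one row of opencode's message table.
type Message struct {
	RowID int64
	Text  string
}

// Event stands in for wire.TurnEvent; Seq carries the source rowid.
type Event struct {
	Seq  int64
	Text string
}

// parseFrom emits events for every row with RowID >= since (inclusive)
// and returns the marker to persist if all of them are accepted.
func parseFrom(msgs []Message, since int64) ([]Event, int64) {
	next := since
	var out []Event
	for _, m := range msgs {
		if m.RowID < since {
			continue
		}
		out = append(out, Event{Seq: m.RowID, Text: m.Text})
		if m.RowID+1 > next {
			next = m.RowID + 1
		}
	}
	return out, next
}

// markerAfter computes the marker when only the first `accepted` events
// were stored: resume at the rowid right after the last accepted one.
func markerAfter(events []Event, accepted int, since int64) int64 {
	if accepted <= 0 || len(events) == 0 {
		return since
	}
	if accepted > len(events) {
		accepted = len(events)
	}
	return events[accepted-1].Seq + 1
}

func main() {
	msgs := []Message{{1, "hello"}, {2, "run tests"}, {3, "done"}}
	events, _ := parseFrom(msgs, 1)
	resume := markerAfter(events, 1, 1) // only the first event was accepted
	retry, next := parseFrom(msgs, resume)
	fmt.Println(resume, len(retry), next)
}
```

A rowid marker is unique per row, so a partial accept can resume at exactly the first unaccepted row. A timestamp marker cannot guarantee that when two rows share the same `time_created`, which is one plausible reading of the skipped-row risk the reviewer flagged.
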