From 72508a536dd54662af89cce0002f6f3dee1ed0f8 Mon Sep 17 00:00:00 2001 From: Eugene Blikh Date: Sun, 3 May 2026 21:09:41 +0300 Subject: [PATCH] docs: refresh search and opencode plan --- docs/tasks/lethe-search-and-opencode.md | 292 ++++++++++++++++-------- 1 file changed, 201 insertions(+), 91 deletions(-) diff --git a/docs/tasks/lethe-search-and-opencode.md b/docs/tasks/lethe-search-and-opencode.md index 27eac9360d64161605fe4bd02c9b27b0ff67e916..443c0a060cd992d776f5fecb6c7d86b28a69e0ca 100644 --- a/docs/tasks/lethe-search-and-opencode.md +++ b/docs/tasks/lethe-search-and-opencode.md @@ -9,138 +9,248 @@ ### Purpose -Make the archive answer "what did I ask any robot last Tuesday about X" in under five seconds, and add the second tool to prove the parser interface scales. End state: a search box on the timeline that returns ranked turn snippets with click-through to session, a stats page showing token rollups, and a working `opencode` parser registered in the collector. +Make the archive searchable by exposing the existing FTS5 indexes through `/api/v1/search`, and prove the collector parser boundary with a second tool: opencode. -A successful end state for this task: I can type "tt bundle lockfile" into the search box, see ranked snippets across both Claude Code and opencode sessions on either machine, click into any result and land on the matching turn. The stats page shows a per-tool, per-day token bar for the last month. +A successful end state for this task: ingested Claude Code and opencode turns can be searched through the authenticated JSON API, with ranked snippets and session anchors that #7 can render in the existing React `/search` route. ### Scope **In:** -- Server-side: - - `GET /api/v1/search?q=&tool=&host=&since=&until=&include_tool_outputs=&limit=&cursor=` — FTS5 query against `turns_fts`. With `include_tool_outputs=1`, also union `tool_outputs_fts` and merge by BM25 rank, deduped on `(tool, host, session_id, turn_id)`. - - `GET /api/v1/stats?group_by=tool|host|day&since=&until=` — token rollups (sum `tokens_in`, `tokens_out`, count of turns, count of sessions). Cost is summed only where non-NULL. - - HTML `GET /search?q=...` — server-rendered results page with snippet/highlight, filters in URL. - - HTML `GET /stats` — table view of the stats endpoint, simple and readable. - - Search-as-you-type debouncing (vanilla JS, ~300 ms) on the timeline and search pages. - - Session view gains an anchor-jump on `#turn-` and a "find on page" hint. +- `GET /api/v1/search?q=&tool=&host=&since=&until=&include_tool_outputs=&limit=&cursor=` — owner-scoped FTS5 query against `turns_fts`; opt-in union with `tool_outputs_fts`. +- `internal/domain/search/` — repository and handler matching the existing domain package shape. +- Cursor pagination over the ranked result set; cursor is opaque and invalid cursors are `400 INVALID`. +- JSON result rows include enough data for #7 to link to `/session/{tool}/{host}/{session_id}#turn-{turn_id}`. - Collector-side: - `internal/collector/parser/opencode/` — new parser implementing the same Parser interface from #2. - Format-discovery spike captured under `docs/spikes/opencode-format.md` before the parser is written; spike output is checked in. - - One config-entry change to register opencode in the dispatcher; otherwise the collector framework is untouched. + - One parser registration in `cmd/lethe-collector`; otherwise the collector runner framework is untouched. - Golden fixtures for the opencode parser (anonymized snippets in `testdata/opencode/`). **Out:** -- Cost rollups for tools that don't report cost. Stats endpoint sums what's there and reports nulls as null. +- React search UI — #7 owns filling `web/src/routes/search.tsx` and any saved-search execute flow. +- Stats API and stats page — already shipped by #5 and left unchanged here. +- Server-rendered HTML and vanilla search JS — superseded by the shipped React SPA. +- Schema migrations — #1 already created `turns_fts` and `tool_outputs_fts`. - crush, pi, kimi parsers — separate task files when the time comes. - Tag system for manual session annotation (RFC backlog). - JSON / Markdown export endpoints (RFC backlog). - Faceted search UI (e.g. histogram-driven date range pickers). Filters stay as form fields and URL params. -- Saved searches, alerts, RSS, anything subscription-shaped. +- Saved-search CRUD, alerts, RSS, anything subscription-shaped. - A second machine's deploy (the goal is to prove the parser interface with #2; running on the work PC is a deployment exercise, not a code change). ### Chosen approach -**Search query (default — `include_tool_outputs=0`).** One FTS5 `MATCH` against `turns_fts`, filtered by the optional `tool`, `host`, `since`, `until` UNINDEXED columns: +**Search API.** Add a `search` domain package, mount it under the existing authenticated `/api/v1` group, and register it in the steward graph beside `session`, `project`, `stats`, and `savedsearch`. + +Response shape: + +```json +{ + "results": [ + { + "tool": "claude-code", + "host": "laptop", + "session_id": "...", + "turn_id": "...", + "timestamp": 1760000000, + "role": "user", + "working_dir": "/repo", + "snippet": "...\u0002term\u0003...", + "match_source": "turn", + "rank": -1.23 + } + ], + "limit": 50, + "next_cursor": "opaque-or-empty" +} +``` + +`snippet` uses marker runes instead of HTML so #7 can render highlights with React text nodes; `match_source` is `turn` or `tool_output`. + +**Search query (default — `include_tool_outputs=0`).** One FTS5 `MATCH` against `turns_fts`, joined back to `turns`/`sessions` by rowid and composite key so filters and result metadata come from canonical tables: ```sql SELECT - f.tool, f.host, f.session_id, f.turn_id, f.timestamp, f.role, - snippet(turns_fts, 0, '', '', '…', 32) AS snippet, - bm25(turns_fts) AS rank -FROM turns_fts AS f + t.tool, t.host, t.session_id, t.turn_id, t.timestamp, t.role, + s.working_dir, + snippet(turns_fts, 0, char(2), char(3), '…', 32) AS snippet, + bm25(turns_fts) AS rank, + 'turn' AS match_source +FROM turns_fts +JOIN turns AS t ON t.rowid = turns_fts.rowid +JOIN sessions AS s ON s.owner = t.owner AND s.tool = t.tool AND s.host = t.host AND s.session_id = t.session_id WHERE turns_fts MATCH ? - AND (? IS NULL OR f.tool = ?) - AND (? IS NULL OR f.host = ?) - AND (? IS NULL OR f.timestamp >= ?) - AND (? IS NULL OR f.timestamp < ?) -ORDER BY rank LIMIT ?; + AND t.owner = ? + AND (? IS NULL OR t.tool = ?) + AND (? IS NULL OR t.host = ?) + AND (? IS NULL OR t.timestamp >= ?) + AND (? IS NULL OR t.timestamp < ?) +ORDER BY rank ASC, t.timestamp DESC, t.turn_id ASC +LIMIT ?; ``` -Pagination: cursor encodes `(rank, turn_id)` of the last row; next page applies `(rank, turn_id) > cursor` in lexicographic order. Offset-based pagination over FTS5 is correct but gets pathological at depth; a tuple cursor is barely more code. For the personal scale, offset+limit would also be fine — defaulting to cursor anyway because changing pagination later is an API break. - -**Search query (`include_tool_outputs=1`).** Two `MATCH` queries (one per FTS table), `UNION ALL`, then a window dedupe on `(tool, host, session_id, turn_id)` keeping the better-ranked match (a turn might appear in both indexes). The frontend sets `include_tool_outputs=1` only when the user toggles the "include tool output" checkbox; the default URL stays clean and fast. - -**Stats query.** Standard SQL aggregation over `turns`, parameterized by `group_by`: -- `group_by=tool` → `GROUP BY tool` returning per-tool totals. -- `group_by=host` → `GROUP BY host`. -- `group_by=day` → `GROUP BY date(timestamp, 'unixepoch')` returning a daily series. - -The `idx_turns_timestamp` index from #1 covers the date-range filter; for the small data scale this stays sub-second without further tuning. +Pagination cursor encodes `(rank, timestamp, turn_id, match_source)` of the last row and must be generated from the same normalized query/filter tuple. -**Search HTML.** A new `search.html` template sharing `base.html` with the timeline and session views. The result list shows: tool icon, host, working_dir, timestamp, snippet (with ``), session-link to `/session/{tool}/{host}/{session_id}#turn-`. Snippets pass through `bluemonday` UGC policy before insertion. +**Search query (`include_tool_outputs=1`).** Two `MATCH` queries, one per FTS table, `UNION ALL`, then window-dedupe on `(tool, host, session_id, turn_id)` keeping the better-ranked match and exposing which source won. -**JS (vanilla, no framework).** One small script (`internal/ui/static/lethe.js`): -- Debounced search-as-you-type: 300 ms after last keystroke, fetch `/search?q=...&format=fragment`, replace the results region's `innerHTML`. The `format=fragment` query param tells the server to return only the results partial template, no `` chrome. -- Anchor-jump highlight on session view (`#turn-` → scroll, briefly add a `.highlighted` CSS class). -- Tool-call expand/collapse uses native `
` from #1; no JS needed. +**Query validation.** Empty `q` is `400 INVALID`; `limit` clamps to the existing 50/200 pattern; `since`/`until` parse as Unix seconds; invalid FTS syntax returns `400 INVALID` rather than a 500. -Total JS budget: under 50 lines. CSP-safe; no `eval`. +**opencode parser — discovery first.** The local install currently exposes `~/.local/share/opencode/opencode.db`, `storage/session/**/*.json`, and `tool-output/*`; the spike decides which is canonical before parser code exists. -**opencode parser — discovery first.** - -The format is unknown until the spike runs. Step 0 of this task is `cmd/lethe-spike-opencode/main.go` (deleted before the task closes), which: -1. Walks `~/.local/share/opencode/` and `~/.config/opencode/` and `~/.cache/opencode/`. -2. Reports file types, sizes, sample contents. -3. Identifies whether sessions are stored as JSONL, JSON-per-session, SQLite, or something else. - -Spike output goes to `docs/spikes/opencode-format.md`. **Until that spike runs, the parser implementation below is provisional.** Two likely shapes: - -- *If JSONL or JSON-per-session:* implementation mirrors the Claude Code parser. `Parse()` reads from a byte offset, returns `wire.TurnEvent`s, persists offset. -- *If SQLite-backed:* `Parse()` opens the source DB read-only with `_busy_timeout` set, queries rows newer than a stored marker (rowid or `(session, sequence)`), maps to `wire.TurnEvent`. The `ingestion_state.last_offset` column gets reused for the marker (it's an INTEGER — works for either rowid or byte offset). - -In both cases the Parser interface from #2 is unchanged. The opencode-specific code is contained in `internal/collector/parser/opencode/`. - -If the spike reveals an opaque or encrypted format, opencode is dropped from this task and an issue is filed against opencode upstream. The task still ships search. +If session JSON files are canonical, implementation mirrors the Claude Code parser: discover session JSON files, parse from byte offset, and emit complete-turn events. If SQLite is canonical, implementation opens the DB read-only and uses `ingestion_state.last_offset` as a row marker. If neither source is stable, opencode leaves this task and the task still ships `/api/v1/search`. **Tradeoffs that settled it.** -- *Single-table FTS query vs always-union:* unioning costs nothing on small data, but it muddies BM25 scoring (tool outputs have wildly different term distributions than prose). Default to clean prose search; let the user opt in. -- *Cursor vs offset pagination:* offset+limit is fine at this scale, but changing pagination later is an API break. Cursor cost is one struct and a base64 helper. -- *Server-side fragment rendering vs client-side JSON+template:* fragment rendering keeps JS at 50 lines and templates as the single source of truth. Client-side rendering would force a parallel JS template path. -- *Discovery spike vs guess-and-iterate on opencode:* the spike is 30 minutes; iterating against a wrong assumption costs an evening. The spike's output is also documentation worth keeping. - -**Unknowns that remain.** -- opencode's actual on-disk format. Resolved by the spike before parser code is written. -- BM25 tuning. SQLite's default BM25 (`k1=1.2`, `b=0.75`) is fine for prose. If empirical search quality is bad against real data, tune via `bm25(turns_fts, 5.0, 1.0, ...)` per-column weights — that's a #3-and-a-half follow-up, not a blocker. -- Stats-page chart. v1 is a table. Adding a sparkline is a one-CSS-grid afternoon if I want it; not in scope here. +- *Keep #3 API-only vs include React UI:* API-only matches `docs/TODO.md`, unblocks #7 cleanly, and avoids mixing parser discovery with frontend route work. +- *Marker snippets vs HTML snippets:* markers avoid `dangerouslySetInnerHTML`; #7 can convert them to `` with normal React nodes. +- *Single-table FTS query vs always-union:* default prose search is faster and less noisy; tool outputs remain an explicit power-user toggle. +- *Cursor vs offset pagination:* cursor prevents an API break if the corpus grows; it costs one helper and a cursor validation test. +- *Discovery spike vs guess-and-iterate on opencode:* the spike is cheaper than implementing against the wrong store and creates the parser fixture map. ### Backwards-compatibility check -- Server: only **adds** routes. `/api/v1/sessions/...` and the timeline/session HTML are unchanged. The `internal/shared/wire/` types are untouched. No schema changes. -- Collector: only adds a new parser package and one config entry under `sources:`. The Parser interface is unchanged. Existing claude-code config keeps working. -- Database: zero migrations. The FTS tables and triggers were created in #1 and are merely queried here for the first time. +- Server: additive route only; existing `/api/v1/stats`, `/api/v1/sessions`, `/api/v1/projects`, `/api/v1/saved-searches`, and ingest behavior stay unchanged. +- Collector: additive parser registration only; existing `claude-code` sources keep the same config and parser behavior. +- Database: no migrations; the task reads the existing FTS tables and canonical `turns`/`sessions` tables. +- Web: the React `/search` stub remains a stub until #7. ### Hands-off decisions -- udesign: cursor pagination for `/api/v1/search` rather than offset — costs one helper, prevents an API break later. Stats keeps `since`/`until` only (no pagination — the result set is intrinsically small). -- udesign: `include_tool_outputs` defaults to false — keeps default search latency low and result lists uncluttered. UI surfaces a checkbox; URL param drives behavior. -- udesign: server-side fragment rendering for live-search — single template path beats a parallel JS template engine. `?format=fragment` returns the partial. -- udesign: BM25 default weights — no per-column tuning until empirical evidence forces it. -- udesign: opencode discovery spike is mandatory before parser code — protects against an evening of guessing. -- udesign: spike binary at `cmd/lethe-spike-opencode/` deleted before task closes; spike findings preserved in `docs/spikes/opencode-format.md`. Spike code is throwaway; the writeup is the artifact. -- udesign: stats UI is a table for v1 — sparkline / chart is YAGNI until I actually use the page. -- udesign: cost rollups null-pass — sum where present, ignore otherwise. No fake "estimated cost" math. Honest data. -- udesign: anchor-jump uses `#turn-` rather than scroll-into-view-via-JS — anchors are stable across renders, work with browser back/forward, and survive the page being reloaded from a bookmark. +- udesign-refresh: scope narrowed to `/api/v1/search` plus opencode parser — current `docs/TODO.md` assigns React search UI to #7, and stats already shipped in #5. +- udesign-refresh: server-rendered HTML/vanilla-JS search removed — the repo now serves a React SPA with an existing `/search` stub. +- udesign-refresh: snippets use non-HTML markers — future React UI can render highlights without unsafe HTML insertion. -TDD: yes (reason: FTS query result shape is exactly the kind of thing that silently regresses on schema or trigger changes; stats aggregation has off-by-one risks around timezone/date boundaries; opencode parser determinism mirrors the claude-code parser's TDD justification.) +TDD: yes (reason: FTS query behavior, cursor round-trips, owner scoping, FTS syntax errors, and opencode parser offsets are deterministic contracts where regressions should fail CI.) ### Invariants -- All schema migrations remain in #1's `migrations/`. This task adds **no** new migration files. -- `internal/shared/wire/` types are not modified by this task. If a need arises, it's flagged and split into a separate amendment task. -- `GET /api/v1/search` and `GET /api/v1/stats` are read-only: they execute `SELECT` only, with no transaction lower than `BEGIN DEFERRED`. -- BM25 ranking is the default for `/api/v1/search`; time-ordered results are explicitly opt-in via a future `?order=time` parameter (not implemented in this task; reserved). -- Search and stats endpoints honor the same `Tailscale-User-Login` allowlist middleware as #1's existing routes. No exemptions. -- Snippets and any user-supplied query reflected back into HTML pass through `bluemonday` before insertion. -- The opencode parser implements `parser.Parser` from #2 verbatim — no special interface, no opencode-specific hooks in the ingestion loop. -- Source-format identification for opencode is documented in `docs/spikes/opencode-format.md` and committed before the parser code lands. -- `tool_outputs_fts` is queried only when `include_tool_outputs=1`; default search path never touches it. -- All client-side JavaScript stays under 50 lines, lives in one file, and works without any build step. +- IV1 — This task adds no schema migration files. +- IV2 — `internal/shared/wire/` types are not modified. +- IV3 — `/api/v1/search` is read-only and executes `SELECT` only. +- IV4 — Search results are scoped through the same authenticated owner rules as sessions/projects. +- IV5 — Default search queries `turns_fts` only; `tool_outputs_fts` is read only when `include_tool_outputs=1`. +- IV6 — API snippets contain marker runes, not HTML. +- IV7 — Empty or syntactically invalid FTS queries return `400 INVALID`, not `500`. +- IV8 — The opencode parser implements `parser.Parser` unchanged. +- IV9 — The collector runner and state schema are unchanged by opencode support. +- IV10 — `docs/spikes/opencode-format.md` is committed before opencode parser implementation lands. +- IV11 — Existing `/api/v1/stats` behavior and React `/stats` page are not changed by this task. +- IV12 — `web/src/routes/search.tsx` remains a stub until #7. ### Principles -- Search defaults are the prose-only path. Tool-output search is a power-user toggle, not the headline. -- One template tree, server-rendered. Live updates go through `?format=fragment` partials, not a parallel JS render path. -- A new parser is one directory under `internal/collector/parser//` plus a single `register.go` import — opencode's job is to prove this and leave the path obvious for crush/pi/kimi. -- Spike before commit when the format is unknown. The spike's writeup is the artifact; the spike's code is throwaway. -- Stats are honest. Null-pass on missing fields. No estimates, no synthesis. -- Vanilla JS, no framework, no build step, no transpiler, no bundler. The day this stops working is the day the project has changed shape enough to merit its own task. +- PC1 — API first, UI later: #3 returns data; #7 decides presentation. +- PC2 — Search defaults to prose turns; tool-output search is explicit. +- PC3 — Spike before parser code when the source format is unknown. +- PC4 — New parser support is one package plus one registration, not a new collector abstraction. + +### Assumptions + +- AS1 — `turns_fts` and `tool_outputs_fts` are kept current by #1's triggers for every ingested turn. +- AS2 — Joining FTS rowid back to `turns.rowid` is stable for the existing regular FTS5 tables. +- AS3 — opencode local storage has a readable canonical transcript source under `~/.local/share/opencode/`. +- AS4 — The collector state's integer offset can represent the chosen opencode progress marker. + +### Unknowns + +- UK1 — Which opencode store is canonical: `opencode.db`, `storage/session/**/*.json`, `tool-output/*`, or a combination. +- UK2 — Whether SQLite FTS query syntax needs a stricter user-query normalizer than passing the validated `q` through to `MATCH`. +- UK3 — Whether default BM25 quality is good enough on real lethe data. + +## Plan + +Approach: ship `/api/v1/search` as an additive read domain first, then run the opencode storage spike before writing the parser; keep #3 API/parser-only so #7 can consume the search contract without frontend churn here. + +### PH1 — Search Repository + +- Tier: deep — FTS5, owner scoping, dedupe, and cursor semantics are correctness-sensitive. +- **1.1** `internal/domain/search/repository.go:1-260` (create) + - `type Result struct`, `type Row struct`, `type Filter struct`, `type Cursor struct` — API/domain shapes for JSON output, filters, and pagination. + - `func (r *Repository) Search(ctx context.Context, f Filter) (*Result, error)` — executes default `turns_fts` search and optional `tool_outputs_fts` union with owner/tool/host/time filters. + - `func EncodeCursor(c Cursor, f Filter) (string, error)` / `func DecodeCursor(raw string, f Filter) (Cursor, error)` — opaque cursor tied to normalized query/filter tuple. + - Respects: IV1, IV2, IV3, IV4, IV5, IV6, IV7, IV11, IV12, PC1, PC2, AS1, AS2, UK2, UK3. +- **1.2** `internal/domain/search/repository_test.go:1-360` (create) + - RED tests for owner isolation, tool/host/since/until filters, prose-only default, tool-output opt-in, dedupe, cursor next page, invalid cursor, marker snippets, and invalid FTS syntax mapping. + - Respects: TDD, IV3-IV7, AS1, AS2. +- Commit: `search: add fts repository` + +### PH2 — Search HTTP Wiring + +- Tier: smart — follows existing handler/steward patterns but defines a new public API contract. +- **2.1** `internal/domain/search/handler.go:1-220` (create) + - `func (h *Handler) Mount(r chi.Router)` — registers `GET /search` under `/api/v1`. + - `func (h *Handler) List(w http.ResponseWriter, r *http.Request)` — resolves auth owner scope, parses query params, clamps limit to 50/200, renders JSON or RFC 7807 errors. + - `func (h *Handler) resolveScope(r *http.Request) (session.OwnerScope, error)` — mirrors session/project admin owner rules. + - Respects: IV3, IV4, IV7, PC1. +- **2.2** `internal/domain/search/handler_test.go:1-260` (create) + - RED tests for route registration, missing/empty `q`, bad `since`, non-admin `owner`, admin `owner=*`, bad cursor, and successful response envelope. + - Respects: TDD, IV4, IV7. +- **2.3** `internal/server/server.go:31-66,103-110` (modify) + - Inject `*search.Handler` and mount it inside the authenticated `/api/v1` group. + - Respects: IV4, IV11, IV12. +- **2.4** `cmd/lethe/main.go:26-137` and `cmd/lethe/main_e2e_test.go:73-92` (modify) + - Register `search.Repository` and `search.Handler` with steward in production and e2e graph setup. + - Respects: IV11. +- Commit: `search: expose search endpoint` + +### PH3 — opencode Format Spike + +- Tier: smart — exploratory but needs a durable writeup before parser code. +- **3.1** `cmd/lethe-spike-opencode/main.go:1-180` (create, then delete before phase commit) + - Walk `~/.local/share/opencode/`, `~/.config/opencode/`, and `~/.cache/opencode/`; report structural file types, counts, sizes, and redacted samples. + - Respects: PC3, AS3, UK1. +- **3.2** `docs/spikes/opencode-format.md:1-160` (create) + - Record canonical source choice, session/message/tool-output shape, progress marker choice, fixture anonymization notes, and parser risks. + - Respects: IV10, PC3, AS3, AS4, UK1. +- Commit: `collector: document opencode storage format` + +### PH4 — opencode Parser + +- Tier: deep — parser correctness affects resumability and archive integrity. +- **4.1** `internal/collector/parser/opencode/parser.go:1-320` (create) + - `func New(host string) *Parser`, `func (p *Parser) Tool() string`, `func (p *Parser) Discover(root string) ([]parser.SourceFile, error)`, `func (p *Parser) Parse(path string, since int64) ([]wire.TurnEvent, int64, error)` — implement the source shape chosen in PH3 without changing `parser.Parser`. + - `func mapRecord(...) (wire.TurnEvent, bool)` or SQLite-equivalent mapper — converts opencode session/message/tool-output records into `wire.TurnEvent`. + - Respects: IV2, IV8, IV9, IV10, PC3, PC4, AS3, AS4. +- **4.2** `internal/collector/parser/opencode/parser_test.go:1-260` and `internal/collector/parser/opencode/testdata/*` (create) + - RED tests for discovery, turn mapping, tool-output mapping, offset/marker resume, malformed-record fallback/skip behavior, and host/tool/source identity. + - Respects: TDD, IV8, IV9, IV10. +- **4.3** `cmd/lethe-collector/main.go:17-221` and `cmd/lethe-collector/main_test.go:1-90` (modify) + - Register `opencode.New(host)` in `buildParsers`; test that both `claude-code` and `opencode` are present. + - Respects: IV8, IV9, PC4. +- Commit: `collector: add opencode parser` + +### Test Strategy + +- RED first: `internal/domain/search` repository tests for FTS result shape, owner scope, filters, cursor, tool-output opt-in, and invalid query handling. +- RED first: `internal/domain/search` handler tests for query parsing, auth scoping, route mount, and response envelope. +- RED first: opencode parser tests after PH3 selects the canonical source; no parser production code before fixtures exist. +- Existing safety net: `go test ./... -count=1`; collector CLI smoke with an opencode source in config once PH4 lands. + +### Order & Dependencies + +- PH1 blocks PH2. +- PH3 blocks PH4. +- PH1/PH2 and PH3/PH4 are otherwise independent; PH4 needs the collector branch already merged on `master`. + +### Risks / Rollback + +- RK1 — FTS5 `MATCH` syntax can turn user input into hard SQL errors; PH1 maps those to `400 INVALID` and keeps normalization isolated. +- RK2 — opencode may require multi-file joins between session JSON and `tool-output/*`; PH3 must choose a marker that PH4 can persist in `last_offset` without state schema changes. +- RK3 — Cursor pagination over BM25 may duplicate or skip rows if the tie-breaker is incomplete; PH1 orders by rank, timestamp, turn_id, and match_source and tests the boundary. + +### Interfaces + +- IF1 — `func (r *Repository) Search(ctx context.Context, f Filter) (*Result, error)` — search read boundary used only by the HTTP handler. +- IF2 — `func (h *Handler) Mount(r chi.Router)` — server mount contract matching other domain packages. +- IF3 — `func New(host string) *Parser` — opencode parser constructor registered by the collector CLI. +- IF4 — `func buildParsers(host string) map[string]parser.Parser` — collector parser registry remains the only dispatch point. + +### Interface Graph + +- PH1 -> IF1 @ `internal/domain/search/` +- PH2 IF1 -> IF2 @ `internal/domain/search/`, `internal/server/`, `cmd/lethe/` +- PH3 -> @ `docs/spikes/opencode-format.md` +- PH4 -> IF3, IF4 @ `internal/collector/parser/opencode/`, `cmd/lethe-collector/` + +Backwards-compat: additive route and parser registration only; PH1/PH2 do not alter existing routes or schema, and PH4 does not change the parser interface, runner, or collector state schema. + +Scope check: no stats work, no React search UI, no schema migration, no saved-search changes, and no parser abstraction beyond `buildParsers`.