@@ 1,6 1,9 @@
# lethe-collector-claude-code
-**Status:** Design (hands-off)
+**Status:** executing
+**Branch:** `task/lethe-collector-claude-code`
+**Worktree:** `/Users/blikh/data/home/lethe/.worktrees/lethe-collector-claude-code`
+**Mode:** hands-off
**Module:** `sourcecraft.dev/bigbes/lethe`
**Depends on:** `lethe-server.md` (#1) — locks the wire format and ingest semantics this task targets.
**Sibling tasks (deferred):** `lethe-search-and-opencode.md` (#3) and per-tool follow-ups (`lethe-collector-crush.md`, etc.) when the time comes.
@@ 174,3 177,112 @@ TDD: yes (reason: parser behavior on golden fixture `.jsonl` files, offset persi
- Permissive parsing: unknown fields → `metadata`, malformed lines → system-role turn with raw payload. Never panic, never stall.
- No background goroutines without a `context.Context` tied to shutdown.
- Test against real fixture files (anonymized snippets from `~/.claude/projects/` checked into `testdata/`), not hand-crafted minimal JSON.
+
+## Plan
+
+Approach: keep the parser as the only Claude-specific layer, then add small collector packages for config, SQLite state/outbox, HTTP sending, and orchestration; the CLI is a thin Cobra shell over those packages.
+
+### PH1 — Config And State
+
+- Tier: smart — config/state define the contracts every later phase consumes.
+- **1.1** `internal/collector/config/config.go:1-220` (create)
+ - `Load(path string) (*Config, error)` — strict Viper YAML loader with `~` expansion, defaults for every field except the required `host`, and validation.
+ - Respects: IV7, PC2, GPC1.
+- **1.2** `internal/collector/state/store.go:1-260` (create)
+ - `Open(ctx context.Context, path string) (*Store, error)` — opens SQLite, creates parent dir, applies embedded migrations.
+ - `GetOffset(ctx context.Context, tool, sourceFile string) (int64, error)` / `SaveOffset(ctx context.Context, tool, sourceFile string, offset int64) error`.
+ - `Enqueue(ctx context.Context, item OutboxItem) error`, `Oldest(ctx context.Context, limit int) ([]OutboxRow, error)`, `Delete(ctx context.Context, ids []int64) error`, `Stats(ctx context.Context) (Stats, error)`.
+ - Respects: IV2, IV4, IV5, GPC5.
+- **1.3** `internal/collector/state/migrations.go:1-80` (create)
+ - `applyMigrations(ctx context.Context, db *sqlx.DB) error` — idempotent DDL for `ingestion_state` and `outbox`.
+ - Respects: IV4, IV5.
+- Commit: `collector: add config and state store`
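+
+A minimal sketch of the `~` expansion and required-`host` check from 1.1, using only the standard library. The `Config` shape, the `StatePath` field, its default value, and the helper names are assumptions for illustration; the plan only fixes `Load` and the rule that `host` has no default.
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+	"path/filepath"
+	"strings"
+)
+
+// Config is a hypothetical subset of the collector config; only Host is named in the plan.
+type Config struct {
+	Host      string // required, deliberately has no default
+	StatePath string // hypothetical field, used here to show ~ expansion
+}
+
+// expandTilde replaces a leading "~" or "~/" with home; other paths are returned unchanged.
+func expandTilde(path, home string) string {
+	if path == "~" {
+		return home
+	}
+	if strings.HasPrefix(path, "~/") {
+		return filepath.Join(home, path[2:])
+	}
+	return path
+}
+
+// validate rejects a config without a host, then applies defaults and expands paths.
+func validate(cfg *Config, home string) error {
+	if cfg.Host == "" {
+		return errors.New("config: host is required")
+	}
+	if cfg.StatePath == "" {
+		cfg.StatePath = "~/.local/state/lethe/collector.db" // hypothetical default
+	}
+	cfg.StatePath = expandTilde(cfg.StatePath, home)
+	return nil
+}
+
+func main() {
+	cfg := &Config{Host: "laptop"}
+	if err := validate(cfg, "/home/alice"); err != nil {
+		fmt.Println("error:", err)
+		return
+	}
+	fmt.Println(cfg.StatePath)
+}
+```
+
+Passing `home` as a parameter instead of calling `os.UserHomeDir` inside the helper keeps the expansion testable without touching the environment.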
+
+### PH2 — HTTP Send And Outbox Replay
+
+- Tier: smart — partial-accept offset semantics and outbox deletion must match the server contract exactly.
+- **2.1** `internal/collector/ingest/sender.go:1-240` (create)
+ - `PostBatch(ctx context.Context, events []wire.TurnEvent) (Result, error)` — serializes NDJSON, POSTs `server_url + /api/v1/ingest`, decodes `{accepted,errors}`.
+ - `EncodeNDJSON(events []wire.TurnEvent) ([]byte, error)` — shared by sender and outbox tests.
+ - Respects: IV2, IV8, GPC4.
+- **2.2** `internal/collector/ingest/outbox.go:1-220` (create)
+ - `ReplayOutbox(ctx context.Context, store *state.Store, sender *Sender, limit int) error` — oldest-first replay, delete only fully accepted rows.
+ - `EnforceOutboxLimit(ctx context.Context, store *state.Store, maxBytes int64) error` — oldest-drop overflow.
+ - Respects: IV5, PC3, GPC5.
+- Commit: `collector: add ingest sender and outbox replay`
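+
+The NDJSON encoding in 2.1 could look roughly like the sketch below. `TurnEvent` here is a hypothetical stand-in for `wire.TurnEvent`, whose real fields live in `internal/shared/wire` and are not reproduced; the lowercase function name marks this as an illustration, not the planned `EncodeNDJSON` itself.
+
+```go
+package main
+
+import (
+	"bytes"
+	"encoding/json"
+	"fmt"
+)
+
+// TurnEvent is a hypothetical stand-in for wire.TurnEvent.
+type TurnEvent struct {
+	Tool string `json:"tool"`
+	Role string `json:"role"`
+	Text string `json:"text"`
+}
+
+// encodeNDJSON writes one compact JSON object per line, each terminated by "\n".
+func encodeNDJSON(events []TurnEvent) ([]byte, error) {
+	var buf bytes.Buffer
+	enc := json.NewEncoder(&buf) // Encode appends a newline after every value
+	for i, ev := range events {
+		if err := enc.Encode(ev); err != nil {
+			return nil, fmt.Errorf("encode event %d: %w", i, err)
+		}
+	}
+	return buf.Bytes(), nil
+}
+
+func main() {
+	body, err := encodeNDJSON([]TurnEvent{
+		{Tool: "claude-code", Role: "user", Text: "hi"},
+		{Tool: "claude-code", Role: "assistant", Text: "hello"},
+	})
+	if err != nil {
+		panic(err)
+	}
+	fmt.Print(string(body))
+}
+```
+
+Because `json.Encoder` escapes embedded newlines inside strings, each event stays on exactly one line, which is what lets the server's per-line `accepted` count be mapped back to events.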
+
+### PH3 — Source Runner
+
+- Tier: deep — this phase owns resumability, shutdown, and per-source isolation.
+- **3.1** `internal/collector/ingest/runner.go:1-320` (create)
+ - `RunOnce(ctx context.Context, cfg config.Config, src config.Source, p parser.Parser, store *state.Store, sender *Sender) error` — replay outbox, discover files, parse from persisted offset, send batches, persist accepted offsets.
+ - `RunDaemon(ctx context.Context, cfg config.Config, parsers map[string]parser.Parser, store *state.Store, sender *Sender) error` — per-source polling loops via `auxilia/async` and context-bound shutdown.
+ - Respects: IV1-IV8, PC1-PC6, AS1-AS3.
+- **3.2** `internal/collector/ingest/batch.go:1-160` (create)
+ - `BuildBatches(events []wire.TurnEvent, maxLines int, maxBytes int) ([]Batch, error)` — records event indexes so accepted counts map back to offsets.
+ - Respects: IV2, IV3.
+- Commit: `collector: add polling source runner`
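+
+One way to satisfy the line and byte caps in 3.2 while keeping event indexes is sketched below. It works on already-encoded lines to stay self-contained; the `Batch` shape, the half-open index range, and the choice to return an error for a single oversized line are assumptions, not decisions recorded in this plan.
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+)
+
+// Batch is a hypothetical shape: the half-open index range [Start, End) into the input.
+type Batch struct {
+	Start, End int
+	Bytes      int
+}
+
+// buildBatches groups encoded lines so that no batch exceeds maxLines lines or maxBytes bytes.
+// A single line larger than maxBytes is reported as an error rather than silently dropped.
+func buildBatches(lines [][]byte, maxLines, maxBytes int) ([]Batch, error) {
+	if maxLines <= 0 || maxBytes <= 0 {
+		return nil, errors.New("batch: caps must be positive")
+	}
+	var out []Batch
+	cur := Batch{}
+	for i, ln := range lines {
+		if len(ln) > maxBytes {
+			return nil, fmt.Errorf("batch: line %d is %d bytes, cap is %d", i, len(ln), maxBytes)
+		}
+		full := cur.End-cur.Start >= maxLines || cur.Bytes+len(ln) > maxBytes
+		if full && cur.End > cur.Start {
+			out = append(out, cur) // close the current batch before adding line i
+			cur = Batch{Start: i, End: i}
+		}
+		cur.End = i + 1
+		cur.Bytes += len(ln)
+	}
+	if cur.End > cur.Start {
+		out = append(out, cur)
+	}
+	return out, nil
+}
+
+func main() {
+	lines := [][]byte{[]byte("aaaa"), []byte("bbbb"), []byte("cccc")}
+	batches, err := buildBatches(lines, 2, 100)
+	if err != nil {
+		panic(err)
+	}
+	for _, b := range batches {
+		fmt.Printf("events [%d,%d) %d bytes\n", b.Start, b.End, b.Bytes)
+	}
+}
+```
+
+Storing index ranges rather than copies of the events means an `accepted` count of `n` for a batch refers directly to input indexes `Start` through `Start+n-1`.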
+
+### PH4 — CLI And Deploy
+
+- Tier: smart — command behavior is user-facing but mostly glue.
+- **4.1** `cmd/lethe-collector/main.go:1-260` (create)
+ - `newRootCmd() *cobra.Command`, `newDaemonCmd() *cobra.Command`, `newBackfillCmd() *cobra.Command`, `newStatusCmd() *cobra.Command`.
+ - Default config path is `~/.config/lethe/collector.yaml`; `host` still has no default inside config.
+ - Respects: IV6, IV7, IV9, GPC6.
+- **4.2** `deploy/lethe-collector.service:1-40` (create)
+ - systemd user unit running `lethe-collector daemon` with journald logging and restart policy.
+ - Respects: IV9.
+- **4.3** `docs/tasks/lethe-collector-claude-code.md` (modify)
+ - Record implementation decisions, deferred items, and Verify results.
+ - Respects: GPC7.
+- Commit: `collector: add lethe-collector cli`
+
+### Test strategy
+
+- RED first: `internal/collector/config` tests for strict unknown-key rejection, required `host`, YAML defaults, and `~` expansion.
+- RED first: `internal/collector/state` tests for migration idempotency, offset upsert, FIFO ordering of outbox replay rows, byte accounting, and the oldest-drop limit.
+- RED first: `internal/collector/ingest` tests for NDJSON encoding, offset persistence after a partial accepted count, outbox enqueue on network failure, replay deletion, and batch byte/line caps.
+- Existing parser tests remain the regression gate for Claude Code format handling.
+
+### Order & dependencies
+
+- PH1 blocks PH2-PH4.
+- PH2 blocks PH3.
+- PH3 blocks PH4 daemon/backfill behavior; `status` can be implemented after PH1.
+
+### Risks / rollback
+
+- RK1 — The server returns `accepted` counts but not source offsets, so PH3 must retain per-event source offsets in memory and enqueue whole batches on hard failures.
+- RK2 — `tailscale serve` header behavior remains empirical; the Verify step records the observed result and, if a token fallback turns out to be needed, defers it rather than changing the locked server in this task.
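+
+For RK1, the mapping from an `accepted` count back to a persisted offset might be sketched as follows. This assumes the accepted events always form a prefix of the batch, which is an assumption about the server contract and not something stated here; `sentEvent` and `offsetAfter` are hypothetical names.
+
+```go
+package main
+
+import "fmt"
+
+// sentEvent records the byte offset just past an event's source line (hypothetical shape).
+type sentEvent struct {
+	EndOffset int64
+}
+
+// offsetAfter returns the offset to persist once the server has accepted the first
+// `accepted` events of a batch. It assumes accepted events are a prefix of the batch
+// and returns prev unchanged when nothing was accepted.
+func offsetAfter(prev int64, batch []sentEvent, accepted int) int64 {
+	if accepted > len(batch) {
+		accepted = len(batch) // never trust a count larger than what was sent
+	}
+	if accepted <= 0 {
+		return prev
+	}
+	return batch[accepted-1].EndOffset
+}
+
+func main() {
+	batch := []sentEvent{{EndOffset: 120}, {EndOffset: 150}, {EndOffset: 190}}
+	fmt.Println(offsetAfter(100, batch, 2)) // 150: resume at the third event next run
+}
+```
+
+On the next run the parser would resume from the returned offset, so the unaccepted tail of the batch is re-read from the source file instead of being held only in memory.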
+
+### Interfaces
+
+- IF1 — `config.Load(path string) (*Config, error)` — all CLI commands load the same strict collector YAML.
+- IF2 — `state.Store` offset/outbox methods — runner and status share one SQLite boundary.
+- IF3 — `ingest.Sender.PostBatch(ctx, events)` — runner and outbox replay share one HTTP boundary.
+- IF4 — `ingest.RunOnce` / `ingest.RunDaemon` — CLI commands do not know parser, offset, or batching internals.
+
+### Interface graph
+
+- PH1 -> IF1, IF2 @ `internal/collector/config/`, `internal/collector/state/`
+- PH2 IF2 -> IF3 @ `internal/collector/ingest/sender.go`, `internal/collector/ingest/outbox.go`
+- PH3 IF1, IF2, IF3 -> IF4 @ `internal/collector/ingest/runner.go`, `internal/collector/ingest/batch.go`
+- PH4 IF1, IF2, IF4 -> @ `cmd/lethe-collector/`, `deploy/`
+
+Backwards-compat: greenfield collector; PH2 must not mutate `internal/shared/wire`, and all server interaction stays inside the existing `POST /api/v1/ingest` contract.
+
+Scope check: no server changes, no extra parser registry abstraction, and no token-auth fallback unless the Verify step proves Tailscale forwarding cannot work.
+
+## Verify
+
+## Conclusion
+
+### Hands-off decisions
+
+- size: Medium — the design is complete and remaining work spans CLI, config, state, HTTP, daemon, deploy, and tests.
+- worktree: `task/lethe-collector-claude-code` at `/Users/blikh/data/home/lethe/.worktrees/lethe-collector-claude-code` — hands-off requires isolated reversible edits.
+- worktree setup: added `.worktrees/` to `.gitignore` on `master` before creating the task worktree — `git-worktrees` requires project-local worktree directories to be ignored.
+- uplan: plan auto-approved (hands-off).
+
+### Deferred (needs user input)