136ae1f4 — Eugene Blikh docs(lethe-web-ui-foundation): record verify section, mark Verified a month ago

#lethe-server

Status: Verified
Module: sourcecraft.dev/bigbes/lethe
Branch: master
Worktree: none
Parent RFC: Personal AI Assistant Log Aggregator (2026-04-25)
Sibling tasks (deferred): lethe-collector-claude-code.md (#2), lethe-search-and-opencode.md (#3)

#Design

#Purpose

Stand up the lethe server binary: SQLite-backed ingestion endpoint, session list/detail JSON API, forward-auth header trust against an Authelia-protected reverse proxy, ready to deploy on phoebe behind Caddy/Traefik+Authelia. Search, stats, any HTML/UI, and any collector code are explicitly deferred to siblings or a later UI task.

A successful end state for this task: I can curl -X POST a fixture NDJSON file to /api/v1/ingest, then curl the sessions list and a single session detail through the reverse proxy (with either an Authelia session forwarding Remote-User or an Authelia-issued OIDC bearer) and get the expected JSON back.

#Scope

In:

  • Single Go binary lethe (cmd/lethe/main.go) — JSON API server only.
  • Full SQLite schema, including the FTS5 tables and triggers from the start (so #3 only adds query code, no migrations).
  • Embedded migrations via embed.FS + golang-migrate/v4, applied on startup.
  • POST /api/v1/ingest — NDJSON, turn-only protocol (server upserts session rows from turn data).
  • GET /api/v1/sessions — paginated list with filters (tool, host, since, until).
  • GET /api/v1/sessions/{tool}/{host}/{session_id} — full session with turns inline.
  • Two auth paths, both gated by the same auth.allowed_users allowlist:
    • Forward-auth: trust Remote-User from an upstream reverse proxy (Caddy/Traefik) gated by Authelia.
    • OIDC bearer: validate Authorization: Bearer <jwt> against Authelia's OIDC issuer (JWKS lookup, signature, iss/aud/exp); take user from preferred_username (fallback sub).
  • Each mode independently enable-able; if both enabled, bearer is checked first then header.
  • Per-user data isolation: every session and turn is owned by the user that ingested it. List and detail endpoints filter by current user. Members of the configurable auth.admins list can see all owners' data and override via ?owner=<user> (or ?owner=*) on read endpoints.
  • /healthz, /readyz, /metrics (Prometheus).
  • Graceful shutdown, structured logging via scribe, structured errors via culpa rendered as RFC 7807 application/problem+json.
  • Justfile, .air.toml, Dockerfile, docker-compose.yml, config.example.yaml, .golangci.yml per go-selfhosted-backend skill conventions.
  • Wire types live in internal/shared/wire/ so #2's collector imports them directly.
  • Daily .backup is documented in the README (cron + sqlite3 .backup); no code in this task.

Out:

  • Any HTML / web UI — timeline view, session view, templates, CSS, static assets, markdown rendering (goldmark/bluemonday). Deferred to a later UI task; the JSON API is the only consumer surface in this task.
  • Collector binary, parsers, ingestion loop, outbox, backfill — all #2.
  • /api/v1/search + search UI + tool_outputs_fts query path — #3 (table + triggers exist; queries don't).
  • /api/v1/stats + rollups view — #3.
  • /debug/pprof — defer until something forces the issue.
  • Sub-agent / parentUuid session chaining (Claude Code resume semantics) — parser concern, lands with #2.
  • Backup automation in code.

#Chosen approach

Storage: option B (locked). SQLite (modernc.org/sqlite, no CGO) with WAL, single DB file. Schema includes turns_fts (FTS5 over prose content) and tool_outputs_fts (FTS5 over tool_calls text). Both indexes populated by INSERT/UPDATE/DELETE triggers from the start — #3 only wires queries.

Wire protocol: option B (locked). Collector emits turn-only NDJSON. Server upserts the session row on first-seen turn from session_meta carried on the turn; subsequent turns extend the session's ended_at. No separate session events on the wire. Reduces collector state and makes outbox replay trivially idempotent.
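
The upsert rule above can be modelled in memory to see why replay is idempotent. This is only a sketch: the `store` map, `sessionKey`, and `upsertFromTurn` are hypothetical names standing in for the sessions table and its SQL upsert, and the key already includes the owner column that the schema section adds. It also simplifies started_at to "timestamp of the first turn seen".

```go
package main

// sessionKey mirrors the composite session key used in this document.
type sessionKey struct {
	Owner, Tool, Host, SessionID string
}

// session holds only the fields relevant to the upsert rule.
type session struct {
	StartedAt  int64
	EndedAt    int64
	SourceFile string
}

// store is a hypothetical in-memory stand-in for the sessions table.
type store map[sessionKey]session

// upsertFromTurn creates the session on the first turn seen and, on later
// turns, only moves ended_at forward. All other fields are first-write-wins.
func (s store) upsertFromTurn(k sessionKey, ts int64, sourceFile string) {
	cur, ok := s[k]
	if !ok {
		s[k] = session{StartedAt: ts, EndedAt: ts, SourceFile: sourceFile}
		return
	}
	if ts > cur.EndedAt {
		cur.EndedAt = ts
	}
	s[k] = cur
}
```

Sending the same turn twice, or replaying an older turn after a newer one, leaves the session row unchanged, which is the property outbox replay relies on.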

Repo layout: option A (locked). Monorepo. cmd/lethe/ (this task) and cmd/lethe-collector/ (placeholder dir for #2) under one go.mod. Shared types in internal/shared/wire/.

Auth model. Server binds 127.0.0.1 only. A reverse proxy on phoebe (Caddy or Traefik) terminates TLS and forwards to localhost. Two independent auth paths inside lethe, each enable-able via config:

  1. Forward-auth (header trust). Reverse proxy runs Authelia forward-auth on the lethe vhost, and on success injects Remote-User (and Remote-Email, Remote-Groups) headers. Middleware reads Remote-User (header name configurable for non-Authelia setups), checks allowlist, 403 on miss. Used by browsers and any other tool that already has an Authelia session cookie.

  2. OIDC bearer. Lethe is registered as an OIDC client of Authelia (client_id/client_secret configured at the Authelia side, client_id only at lethe — the server validates tokens, never issues them). Middleware accepts Authorization: Bearer <jwt>, validates against Authelia's published JWKS (discovered via /.well-known/openid-configuration), enforces iss, aud, exp, then resolves user from preferred_username falling back to sub. Used by the collector and any scripted client. No code-flow / callback / cookie machinery in lethe — the bearer must already be obtained out-of-band (Authelia issues it via its OIDC flow to whatever client got it).

Both paths drop the user identity into the request context so handlers see the same shape regardless of how the user was authenticated. Both paths are gated by the same auth.allowed_users allowlist as a defense-in-depth check. If both are enabled and a request carries both an Authorization header and a Remote-User header, the bearer is validated first and the proxy header is ignored; a bearer that fails validation yields 401 rather than a fallback to the header (fail closed, matching the Phase 6 tests). Health and metrics endpoints are mounted outside the auth middleware so phoebe can scrape them locally without going through the proxy. The trust model for path 1 rests on one implicit assumption, documented in the README: no other process on phoebe connects to the loopback listener and forges the header.
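
A minimal stdlib sketch of the resolution order, under stated assumptions: `verifyFunc` is a hypothetical stand-in for the OIDC verifier, the allowlist is a plain map, errors use http.Error instead of problem+json, and an invalid bearer is rejected without falling back to the header (the fail-closed behaviour the Phase 6 test list asks for). `probe` and `demoHandler` exist only to exercise the middleware.

```go
package main

import (
	"context"
	"errors"
	"net/http"
	"net/http/httptest"
	"strings"
)

type ctxKey struct{}

// verifyFunc stands in for the OIDC verifier: raw token in, user name out.
type verifyFunc func(raw string) (string, error)

// authMiddleware resolves the user from a bearer token first, then from the
// forward-auth header, and finally checks the shared allowlist.
func authMiddleware(verify verifyFunc, userHeader string, allowed map[string]bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var user string
		authz := r.Header.Get("Authorization")
		switch {
		case verify != nil && strings.HasPrefix(authz, "Bearer "):
			u, err := verify(strings.TrimPrefix(authz, "Bearer "))
			if err != nil {
				// Fail closed: an invalid bearer never falls back to the header.
				http.Error(w, "invalid bearer token", http.StatusUnauthorized)
				return
			}
			user = u
		case userHeader != "" && r.Header.Get(userHeader) != "":
			user = r.Header.Get(userHeader)
		default:
			http.Error(w, "no credentials", http.StatusUnauthorized)
			return
		}
		user = strings.ToLower(user)
		if !allowed[user] {
			http.Error(w, "user not allowed", http.StatusForbidden)
			return
		}
		ctx := context.WithValue(r.Context(), ctxKey{}, user)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// probe sends one request with the given headers and returns the status code.
func probe(h http.Handler, headers map[string]string) int {
	req := httptest.NewRequest(http.MethodGet, "/api/v1/sessions", nil)
	for k, v := range headers {
		req.Header.Set(k, v)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

// demoHandler wires the middleware with a fake verifier that accepts only "good".
func demoHandler() http.Handler {
	verify := func(raw string) (string, error) {
		if raw == "good" {
			return "Alice", nil
		}
		return "", errors.New("bad token")
	}
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	return authMiddleware(verify, "Remote-User", map[string]bool{"alice": true}, ok)
}
```

With `demoHandler()`, a request carrying `Authorization: Bearer bad` plus a valid `Remote-User` header is rejected with 401, while the same header without any bearer is accepted.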

Layered design (from go-selfhosted-backend):

  • internal/config — Viper, strict mode, validator tags, fails on unknown keys. Top-level Config struct exposes substructs (Server, Database, Auth, …) via config-section:"" tags so steward can inject them by type into individual services.
  • internal/platform/database — sqlx connection, migration runner over embed.FS, transaction helper. Implements steward Init (open + migrate) and Destroy (close).
  • internal/platform/observability — scribe logger service (Init sets slog.SetDefault), Prometheus registry singleton.
  • internal/platform/health: Checker interface, Set aggregator service with steward multi-inject ([]Checker `inject:""`), DB check service registered as a Checker. Adding a new check = registering a new asset; no edits to the set.
  • internal/server — chi router, middleware stack (request-id, logging, metrics, recovery, auth), Start (listen) and Stop (graceful shutdown). Marked steward.Root() so it's always started.
  • internal/domain/ingest — handler + service for POST /api/v1/ingest, both as steward services. Service owns the upsert-session-then-upsert-turn logic in a single transaction per batch.
  • internal/domain/session — handler + repository for the list/detail JSON API, both as steward services.
  • internal/shared/wire: TurnEvent, SessionMeta types. Imported by both server and (eventually) collector.
  • internal/pkg/httputil + internal/pkg/apierror — JSON helpers, RFC 7807 problem rendering with culpa code → status mapping. Pure libraries, not steward components.

Wiring & lifecycle. All wiring goes through steward.Manager. cmd/lethe/main.go is a thin shell: parse -config, load Config, register configuration + service assets, conditionally register OIDCVerifier only when cfg.Auth.OIDC.Enabled, call Inject → Init → Start, wait on signal, call Stop → Destroy. Dependency direction is enforced by struct tags (config:"", inject:"", inject:"" optional:"true"); the manager solves the topological order. No global state outside slog.Default (set by the logger service in Init).

Wire type (locked contract for #2):

package wire

import "encoding/json"

type TurnEvent struct {
    Tool        string          `json:"tool"`
    Host        string          `json:"host"`
    SessionID   string          `json:"session_id"`
    TurnID      string          `json:"turn_id"`
    Seq         int64           `json:"seq"`
    Role        string          `json:"role"`        // user | assistant | tool | system
    Timestamp   int64           `json:"timestamp"`   // unix epoch seconds
    Content     string          `json:"content"`
    Model       *string         `json:"model,omitempty"`
    TokensIn    *int64          `json:"tokens_in,omitempty"`
    TokensOut   *int64          `json:"tokens_out,omitempty"`
    CostUSD     *float64        `json:"cost_usd,omitempty"`
    ToolCalls   json.RawMessage `json:"tool_calls,omitempty"`
    SessionMeta SessionMeta     `json:"session_meta"`
    Metadata    json.RawMessage `json:"metadata,omitempty"`
}

type SessionMeta struct {
    WorkingDir *string         `json:"working_dir,omitempty"`
    SourceFile string          `json:"source_file"`
    StartedAt  *int64          `json:"started_at,omitempty"` // optional; server falls back to MIN(turn.timestamp)
    Metadata   json.RawMessage `json:"metadata,omitempty"`
}

tool_calls is json.RawMessage: the server doesn't interpret it, just persists it and feeds its serialized text to tool_outputs_fts. Carrying session_meta on every turn is redundant (~100 bytes/turn) and intentional: collector replay never has to ask "did I already send the session header?"
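
To illustrate the pass-through handling of tool_calls, here is a small sketch that decodes one NDJSON line and returns the raw tool_calls bytes untouched. `turnLine` is a trimmed, hypothetical copy of TurnEvent kept short for the example, and `decodeLine` is not a function from the codebase.

```go
package main

import "encoding/json"

// turnLine is a trimmed copy of TurnEvent, kept small for illustration.
type turnLine struct {
	Tool        string          `json:"tool"`
	SessionID   string          `json:"session_id"`
	TurnID      string          `json:"turn_id"`
	ToolCalls   json.RawMessage `json:"tool_calls,omitempty"`
	SessionMeta struct {
		SourceFile string `json:"source_file"`
	} `json:"session_meta"`
}

// decodeLine parses one NDJSON line and returns the turn together with the
// text that would be handed to the tool-output index: the raw tool_calls
// bytes, never interpreted by the server.
func decodeLine(line []byte) (turnLine, string, error) {
	var turn turnLine
	if err := json.Unmarshal(line, &turn); err != nil {
		return turnLine{}, "", err
	}
	return turn, string(turn.ToolCalls), nil
}
```

Because json.RawMessage keeps the original bytes, whatever structure a given tool puts in tool_calls survives ingestion unchanged.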

Ingest semantics.

  • Request body is NDJSON; one TurnEvent per line. Hard cap on body size (configurable, default 16 MiB). Per-turn content soft-capped at 4 MiB (configurable); a single oversize turn is a LineError, not a 413.
  • Required wire fields: tool, host, session_id, turn_id, seq, role, timestamp, content. role ∈ {user, assistant, tool, system}. source_file (in SessionMeta) capped at 1024 bytes. Validation runs per-line before DB; failure → LineError.
  • owner is server-derived from the authenticated user (request context), never read from the wire. The wire format in internal/shared/wire/ deliberately has no owner field — collectors cannot impersonate other owners.
  • All timestamps stored as SQLite INTEGER (unix epoch seconds). No TEXT timestamps anywhere in the schema.
  • Server processes lines in order within a single SQLite transaction.
  • For each turn:
    1. INSERT INTO sessions ... ON CONFLICT (owner, tool, host, session_id) DO UPDATE SET ended_at = MAX(ended_at, excluded.ended_at). started_at, working_dir, source_file, metadata are first-write-wins (preserved on conflict); only ended_at extends.
    2. INSERT INTO turns ... ON CONFLICT (owner, tool, host, session_id, turn_id) DO UPDATE SET <all non-key columns>. Last-write-wins on the turn row. Triggers keep both FTS tables in sync.
  • On any per-line error (malformed JSON, missing required field, FK violation), the chunk containing the failing line is rolled back and processing stops; that line and everything after it are not committed. The response is 200 {"accepted": N, "errors": [{"line": N, "error": "..."}]}, where accepted is the number of lines committed in earlier chunks. The server commits in chunks (e.g. every 500 lines) so a single bad line near the end of a batch doesn't lose all the good lines before it. Collector advances its offset by exactly accepted.
  • Hard server errors (DB down, OOM): 5xx with empty accepted, collector retries the whole batch from the same offset.
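
The chunked-commit behaviour in the list above can be sketched without a database. In this sketch `commit` is a hypothetical callback standing in for "write this chunk in one transaction", `validate` checks only a subset of the required fields, and line numbers are assumed to be 1-based.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lineError mirrors the {"line": N, "error": "..."} entries in the response.
type lineError struct {
	Line  int    `json:"line"`
	Error string `json:"error"`
}

type result struct {
	Accepted int         `json:"accepted"`
	Errors   []lineError `json:"errors"`
}

// validate checks a subset of the required wire fields.
func validate(line []byte) error {
	var v struct {
		Tool      string `json:"tool"`
		Host      string `json:"host"`
		SessionID string `json:"session_id"`
		TurnID    string `json:"turn_id"`
		Role      string `json:"role"`
	}
	if err := json.Unmarshal(line, &v); err != nil {
		return fmt.Errorf("malformed JSON: %w", err)
	}
	if v.Tool == "" || v.Host == "" || v.SessionID == "" || v.TurnID == "" {
		return fmt.Errorf("missing required field")
	}
	switch v.Role {
	case "user", "assistant", "tool", "system":
		return nil
	}
	return fmt.Errorf("invalid role %q", v.Role)
}

// ingest walks the lines in order and commits every chunkSize valid lines.
// On the first bad line the open chunk is discarded and processing stops, so
// Accepted counts only lines from chunks that were already committed.
func ingest(lines [][]byte, chunkSize int, commit func(chunk [][]byte) error) (result, error) {
	res := result{Errors: []lineError{}}
	var chunk [][]byte
	for i, line := range lines {
		if err := validate(line); err != nil {
			res.Errors = append(res.Errors, lineError{Line: i + 1, Error: err.Error()})
			return res, nil // open chunk is dropped, mimicking a rollback
		}
		chunk = append(chunk, line)
		if len(chunk) == chunkSize {
			if err := commit(chunk); err != nil {
				return res, err // hard error: the handler would answer 5xx
			}
			res.Accepted += len(chunk)
			chunk = nil
		}
	}
	if len(chunk) > 0 {
		if err := commit(chunk); err != nil {
			return res, err
		}
		res.Accepted += len(chunk)
	}
	return res, nil
}
```

With a chunk size of 2 and a malformed fourth line, lines 1 and 2 are committed, line 3 is rolled back along with line 4, and the collector resumes from offset 2.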

Schema (initial migration). As specified in the RFC §4.2, with these adjustments:

  • Add owner TEXT NOT NULL to sessions and turns. Composite PKs become sessions(owner, tool, host, session_id) and turns(owner, tool, host, session_id, turn_id) with FK (owner, tool, host, session_id) → sessions. Owner-leading PK gives a free index for "list my sessions" and makes per-user isolation a schema property.
  • Add owner UNINDEXED column to both FTS tables so #3's search can WHERE turns_fts MATCH ? AND owner = ? cheaply; triggers carry it from the source row.
  • Add turns_fts_update trigger so an UPSERT on turns keeps the FTS row current (the RFC has insert/delete only).
  • Add the parallel tool_outputs_fts table with insert/update/delete triggers, indexing the tool_calls column when non-NULL.
  • Add schema_migrations (managed by golang-migrate) — replaces the RFC's hand-rolled schema_version.
  • Index sessions(owner, started_at DESC) for the timeline/list query.

CLI. This binary has one mode (run server). flag package, no cobra. Single arg: -config <path>. (Cobra will land with the collector binary in #2.)
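
A minimal sketch of the flag handling, assuming that -config is mandatory (the text does not say whether a default path exists). `parseArgs` is a hypothetical helper; using a FlagSet rather than the global flag.CommandLine keeps it testable.

```go
package main

import (
	"errors"
	"flag"
	"io"
)

// parseArgs parses the single -config flag and returns the config path.
func parseArgs(args []string) (string, error) {
	fs := flag.NewFlagSet("lethe", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of test output
	cfg := fs.String("config", "", "path to the YAML config file")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if *cfg == "" {
		// Assumption: no default config path, so the flag is required.
		return "", errors.New("-config is required")
	}
	return *cfg, nil
}
```

In main this would be called as `parseArgs(os.Args[1:])`.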

Tradeoffs that settled it.

  • SQLite vs Postgres: Phoebe doesn't need another service, FTS5's snippet/highlight and tokenizer are better than tsvector for this workload, single-user means no write contention. Migrating to PG later is mechanical if I'm wrong; the cost of being wrong is an afternoon.
  • Turn-only wire vs explicit session events: Outbox replay correctness wins over "cleaner" two-event schema. The redundant session_meta per turn costs nothing at this scale.
  • Inline tool-call payloads vs blob/dedupe: RFC's expected scale doesn't justify the extra code. Revisit in #3 if FTS index size or query latency actually bites.
  • Defer all UI: Skipped HTML views in this task to land the JSON API + ingest pipeline first. The collector (#2) only needs the API; the UI lands once there's real data to look at.

Unknowns that remain.

  • True size distribution of tool-call payloads in real Claude Code transcripts — won't know until #2 runs against ~/.claude/projects/. If the FTS index for tool_outputs_fts grows pathologically, #3 has the option to add a size cap or move that table to a separate attached DB.
  • JWKS rotation cadence at Authelia and the right cache TTL on the lethe side (go-oidc defaults usually fine; revisit if validation latency or 401-storms appear). Not blocking #1.

#Backwards-compatibility check

Greenfield. Empty repo, no consumers, nothing to break. The only forward-compat concern is the wire format, which is locked into internal/shared/wire/ and versioned implicitly via the /api/v1/ path prefix. Future breaking changes get /api/v2/.

TDD: yes (reason: ingest idempotency, the upsert-session-from-turn semantics, the chunked-commit-with-partial-accept response shape, the auth middleware allowlist, and migration application on startup are all deterministic, regression-prone surfaces.)

#Invariants

  • Server never opens, reads, or writes any file outside its own data directory and config path. source_file from incoming turns is stored as opaque string only.
  • The host identifier on a turn is whatever the collector says it is. Server does not derive or validate it against the authenticated user (whether resolved via forward-auth header or OIDC bearer).
  • All schema changes go through embed.FS migrations applied on startup. No ad-hoc DDL, no startup-time conditional CREATE TABLE.
  • Composite primary keys: sessions keyed on (owner, tool, host, session_id); turns keyed on (owner, tool, host, session_id, turn_id). No surrogate IDs anywhere.
  • POST /api/v1/ingest is idempotent at the turn level per owner: re-POST of identical (tool, host, session_id, turn_id) by the same authenticated user produces the same final state regardless of how many times it's sent. Two different users posting the same (tool, host, session_id, turn_id) produce two distinct rows.
  • owner is set from the authenticated user on every ingest write. The wire format has no owner field; the server never reads owner from the request body.
  • Read endpoints (GET /api/v1/sessions, GET /api/v1/sessions/{tool}/{host}/{session_id}) return only rows where owner = <current user>, except when the current user is in auth.admins and supplies ?owner=<user> (specific owner) or ?owner=* (all owners). Non-admin requests with ?owner= are 403.
  • Every route under /api/v1/* resolves a user from the configured forward-auth header (default Remote-User) or from a validated OIDC bearer, and checks that user against auth.allowed_users. Only /healthz, /readyz, /metrics are unauthenticated.
  • The HTTP listener binds 127.0.0.1 only. Binding any other interface is a config error and fails fast at startup.
  • SQLite is opened in WAL mode with _busy_timeout configured. Foreign keys are enforced (PRAGMA foreign_keys = ON).
  • A turn insert/update fires both FTS triggers as appropriate; the turns_fts and tool_outputs_fts tables are never written to directly outside triggers.
  • Errors leaving any HTTP handler are rendered as RFC 7807 application/problem+json with the culpa code mapped to status; internal (5xx) errors are logged with full stacktrace via scribe.Err before being sanitized for the response.
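
The read-side isolation rule can be reduced to one pure function. This is a sketch with hypothetical names (`identity`, `ownerFilter`, `resolveOwner`, `errForbidden`); it treats an empty ?owner= value the same as an absent parameter, which is an assumption the invariants do not spell out.

```go
package main

import "errors"

// errForbidden stands in for a Forbidden-coded error (403 at the HTTP boundary).
var errForbidden = errors.New("forbidden: owner override requires admin")

type identity struct {
	User    string
	IsAdmin bool
}

// ownerFilter describes which rows a read query may return.
type ownerFilter struct {
	All   bool   // true for ?owner=* (no owner predicate at all)
	Owner string // owner to filter on when All is false
}

// resolveOwner turns the caller identity and the raw ?owner= value into a filter.
func resolveOwner(id identity, ownerParam string) (ownerFilter, error) {
	if ownerParam == "" {
		// Assumption: empty or absent parameter means "my own rows".
		return ownerFilter{Owner: id.User}, nil
	}
	if !id.IsAdmin {
		return ownerFilter{}, errForbidden
	}
	if ownerParam == "*" {
		return ownerFilter{All: true}, nil
	}
	return ownerFilter{Owner: ownerParam}, nil
}
```

Keeping the decision in one place means the list and detail handlers cannot drift apart on who may see what.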

#Principles

  • Greenfield — no backwards-compat shims, no deprecation paths, no _unused parameters. If something turns out wrong, rewrite the file.
  • Schema additions over schema changes. Tool-specific fields go into the metadata JSON column on the relevant row; new SQL columns require justification.
  • Fail fast on config and migration errors. The auth allowlist has no default — empty list means the server refuses to start.
  • Stdlib + chi + sqlx + golang-migrate + modernc.org/sqlite + go-oidc/v3 (JWT/JWKS validation only — no auth-code flow) + go.bigb.es/auxilia (steward for DI/lifecycle, culpa for errors, scribe for logs, async only if a background task surfaces). No ORM, no template engine, no UI dependencies in this task.
  • Lifecycle and dependency wiring go through steward.Manager. Adding a new component is registering an asset; main.go does not grow.
  • Each layer is a steward service that declares its deps via struct tags (config:"", inject:"", optional + multi-injection where it earns its keep). Constructors are the zero value; setup happens in Init.
  • Unit tests construct services with hand-built deps (no manager). Integration / e2e tests use steward.Manager to assemble a real graph against a :memory: DB.
  • Errors propagate as culpa errors with codes; HTTP layer translates once at the boundary.
  • Every authenticated route and every ingest semantic has a regression test.
  • internal/shared/wire/ is treated as a published API even though it isn't published — changes ripple into the collector and need to be obvious in diff.

#Plan

Approach: build bottom-up — wire types → config → DB+schema → platform → HTTP foundation → auth → ingest → read API → main. Each phase is one commit; tests land with the phase that introduces the behavior. Greenfield, so no compat shims.

#Phase 1 — Bootstrap & wire contract

  • 1.1 go.mod (create): module sourcecraft.dev/bigbes/lethe, Go 1.23+ (5.2 uses iter.Seq2, and the iter package first shipped in Go 1.23). Direct deps stub: chi/v5, sqlx, modernc.org/sqlite, golang-migrate/v4, viper, validator/v10, prometheus/client_golang, coreos/go-oidc/v3, go.bigb.es/auxilia/{steward,culpa,scribe}.
  • 1.2 Justfile, .air.toml, Dockerfile, docker-compose.yml, .golangci.yml, .gitignore, config.example.yaml (create) — per go-selfhosted-backend skill conventions; SQLite volume mount, no CGO.
  • 1.3 README.md (create) — purpose, quickstart, trust model section documenting both auth paths: (a) the 127.0.0.1 + reverse-proxy + Authelia forward-auth + Remote-User chain (with a sample Caddy forward_auth snippet), and (b) the OIDC bearer flow against Authelia (sample Authelia identity_providers.oidc.clients entry + sample lethe auth.oidc config). Backup section with the sqlite3 .backup cron snippet.
  • 1.4 cmd/lethe/main.go (create, ~30 lines) — flag.String("config", ...), prints version and exits. Real wiring in Phase 9.
  • 1.5 internal/shared/wire/wire.go (create, ~40 lines) — TurnEvent, SessionMeta exactly as specified in Design. No methods; pure data. Locked contract for #2.
  • Invariant: internal/shared/wire/ published-API-discipline (Principles).
  • Commit: feat: bootstrap lethe server skeleton + wire contract

#Phase 2 — Config

  • 2.1 internal/config/config.go (create, ~190 lines) — Config struct with Server, Database, Auth, Logging, Ingest substructs. Each substruct has mapstructure, validate, and config-section:"" tags so steward can inject them by type into individual services.
    • Database substruct: Path string (sqlite file path), BusyTimeout time.Duration (default 5s).
    • Auth substruct: AllowedUsers []string, Admins []string (subset of allowed users; may be empty), ForwardAuth ForwardAuthConfig{ Enabled bool; UserHeader string (default "Remote-User") }, OIDC OIDCConfig{ Enabled bool; Issuer string (URL); Audience string; UsernameClaim string (default "preferred_username") }.
    • Ingest substruct: MaxBodyBytes int64 (default 16 MiB), MaxTurnContentBytes int64 (default 4 MiB), ChunkSize int (default 500).
    • Server substruct: Bind string, ShutdownGrace time.Duration (default 10s).
    • func Load(path string) (*Config, error) — viper strict mode, validator, env-var overrides via viper.SetEnvPrefix("LETHE") + viper.AutomaticEnv() + viper.SetEnvKeyReplacer(NewReplacer(".", "_")), returns culpa error on failure.
    • func MustLoad(path string) *Config — wraps Load, panics on error. Used by main.go.
    • Validation: Server.Bind must equal 127.0.0.1 or 127.0.0.1:<port> (regex); Database.Path required; at least one of Auth.ForwardAuth.Enabled or Auth.OIDC.Enabled must be true (custom validator: auth_at_least_one); Auth.AllowedUsers min=1; every entry in Auth.Admins must also appear in Auth.AllowedUsers (custom validator: admins_subset_of_allowed); if OIDC enabled, Auth.OIDC.Issuer url and Auth.OIDC.Audience required; Ingest.MaxBodyBytes gt=0, Ingest.MaxTurnContentBytes gt=0,ltefield=MaxBodyBytes, Ingest.ChunkSize gt=0.
  • 2.2 internal/config/config_test.go (create, TDD) — tests for: empty allowlist rejected; non-loopback bind rejected; both auth modes disabled rejected; OIDC enabled without issuer rejected; OIDC enabled with non-URL issuer rejected; admin not in allowed_users rejected; empty admins list accepted; missing Database.Path rejected; MaxTurnContentBytes > MaxBodyBytes rejected; unknown YAML key rejected (strict mode); env override works (LETHE_AUTH_ALLOWED_USERS overrides YAML); valid forward-auth-only config loads; valid OIDC-only config loads; valid both-enabled config loads; defaults applied (UserHeader="Remote-User", UsernameClaim="preferred_username", MaxBodyBytes=16MiB, MaxTurnContentBytes=4MiB, ChunkSize=500, BusyTimeout=5s, ShutdownGrace=10s).
  • Invariants: auth allowlist has no default (Principles); listener binds 127.0.0.1 only.
  • Commit: feat(config): viper-loaded config with fail-fast validation
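
Some of the validation rules in 2.1 are easy to show as plain functions. The sketch below is not the validator/v10 wiring; `validateBind` and `validateAuth` are hypothetical helpers, the port range check is an addition of this sketch, and the admins-subset comparison is assumed to be exact-match.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

var bindRe = regexp.MustCompile(`^127\.0\.0\.1(?::(\d{1,5}))?$`)

// validateBind accepts only 127.0.0.1 or 127.0.0.1:<port>.
func validateBind(bind string) error {
	m := bindRe.FindStringSubmatch(bind)
	if m == nil {
		return fmt.Errorf("server.bind %q must be 127.0.0.1 or 127.0.0.1:<port>", bind)
	}
	if m[1] != "" {
		// Extra check added by this sketch: the port must be a valid TCP port.
		if port, _ := strconv.Atoi(m[1]); port < 1 || port > 65535 {
			return fmt.Errorf("server.bind port %q out of range", m[1])
		}
	}
	return nil
}

// validateAuth mirrors the auth_at_least_one, allowlist min=1, and
// admins_subset_of_allowed rules.
func validateAuth(forwardAuth, oidc bool, allowed, admins []string) error {
	if !forwardAuth && !oidc {
		return fmt.Errorf("at least one auth mode must be enabled")
	}
	if len(allowed) == 0 {
		return fmt.Errorf("auth.allowed_users must not be empty")
	}
	set := make(map[string]struct{}, len(allowed))
	for _, u := range allowed {
		set[u] = struct{}{}
	}
	for _, a := range admins {
		if _, ok := set[a]; !ok {
			return fmt.Errorf("admin %q is not in auth.allowed_users", a)
		}
	}
	return nil
}
```

Both functions return on the first violation, in line with the fail-fast principle for config errors.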

#Phase 3 — Database & schema

  • 3.1 internal/platform/database/database.go (create, ~110 lines) — Database is a steward service.
    • type Database struct { Cfg config.DatabaseConfig `config:""`; DB *sqlx.DB }. DB is populated in Init.
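    • func (d *Database) Init(ctx context.Context) error: opens via modernc.org/sqlite with journal_mode=WAL, busy_timeout taken from Cfg.BusyTimeout (default 5s, i.e. 5000 ms), foreign_keys=ON, synchronous=NORMAL, and cache=shared; then runs Migrate(d.DB). With modernc.org/sqlite these pragmas are passed as _pragma=name(value) DSN parameters (e.g. _pragma=busy_timeout(5000)); the _journal_mode=WAL style of parameter belongs to mattn/go-sqlite3 and should not be relied on with this driver.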
    • func (d *Database) Destroy(ctx context.Context) error — closes the DB.
    • func Migrate(db *sqlx.DB) error — runs embed.FS migrations via golang-migrate/v4 iofs source + sqlite driver. Pure function so tests can call it directly.
    • func InTx(ctx, db, fn func(*sqlx.Tx) error) error — transaction helper, rollback on error. Pure function.
    • Other services depend on this via inject:"" and read .DB.
  • 3.2 internal/platform/database/migrations/0001_init.up.sql + .down.sql (create) — sessions, turns, turns_fts (FTS5 over content + owner UNINDEXED), tool_outputs_fts (FTS5 over tool_calls + owner UNINDEXED), insert/update/delete triggers for both FTS tables (triggers carry owner from source row). Composite PKs: sessions(owner, tool, host, session_id); turns(owner, tool, host, session_id, turn_id) with FK (owner, tool, host, session_id) → sessions. All timestamps (started_at, ended_at, turns.timestamp) are INTEGER NOT NULL (unix epoch seconds). tool_calls, metadata are TEXT storing JSON. Index on sessions(owner, started_at DESC) for timeline.
  • 3.3 internal/platform/database/migrations.go (create, ~10 lines) — //go:embed migrations/*.sql var FS embed.FS.
  • 3.4 internal/platform/database/database_test.go (create, TDD) — tests with :memory: DB: migrate is idempotent on second run; turn insert populates turns_fts with correct owner; turn update updates turns_fts; turn delete removes from turns_fts; same for tool_outputs_fts when tool_calls non-NULL; FK rejects orphan turn; two owners with same (tool, host, session_id) coexist as distinct sessions; FTS query with owner = ? filter returns only that owner's rows.
  • Invariants: WAL + busy_timeout + FKs on; FTS tables only via triggers; embed.FS migrations only.
  • Commit: feat(db): SQLite schema with FTS5 + migration runner

#Phase 4 — Observability & health

  • 4.1 internal/platform/observability/logger.go (create, ~110 lines) — Logger steward service.
    • type Logger struct { Cfg config.LoggingConfig `config:""`; L *slog.Logger }
    • func (l *Logger) Init(ctx) error — builds scribe.NewTintHandler (or JSON handler per cfg), applies WithLevel, WithMaskKeys("password","token","authorization","secret","cookie"), wraps with a small contextHandler that pulls request_id and user from r.Context() and adds them to every record. Sets slog.SetDefault(l.L).
    • func WithRequestID(ctx, id string) context.Context, func RequestIDFrom(ctx) string — context helpers used by the request-id middleware in Phase 5.
  • 4.2 internal/platform/observability/metrics.go (create, ~80 lines) — Metrics steward service.
    • type Metrics struct { Registry *prometheus.Registry; HTTPRequests *prometheus.CounterVec; HTTPDuration *prometheus.HistogramVec; IngestLinesAccepted, IngestLinesErrored, IngestChunksCommitted prometheus.Counter }
    • func (m *Metrics) Init(ctx) error: prometheus.NewRegistry(); register collectors.NewProcessCollector + collectors.NewGoCollector; register HTTP histograms with labels {method, route, status} (route from chi.RouteContext(r.Context()).RoutePattern(), never the raw path, to keep cardinality bounded); register ingest counters.
    • HTTP middleware in Phase 5 reads from Metrics; ingest service in Phase 7 increments the ingest counters.
  • 4.3 internal/platform/health/health.go (create, ~90 lines)
    • type Checker interface { Name() string; Check(ctx context.Context) error }
    • type DBCheck struct { DB *database.Database `inject:""` }: implements Checker; registered as a steward service tagged for multi-injection. Check runs SELECT 1.
    • type Set struct { Checks []Checker `inject:""` }: steward multi-injects every registered Checker.
    • func (s *Set) Run(ctx) (results map[string]error, allOK bool) — applies a per-check 2s timeout via context.WithTimeout. Empty Checks slice → returns allOK = true (intentional: no checks means nothing has declared a readiness signal yet, not an error).
    • Adding new checks later = registering a new asset that implements Checker. No edits to Set.
  • 4.4 internal/platform/health/health_test.go (create, TDD) — Set returns aggregate failure when any check errors; passes when all OK; empty Checks returns allOK=true; per-check timeout enforced. Uses fake Checker implementations (no steward needed for unit test).
  • 4.5 internal/platform/steward_unwind_test.go (create, TDD, throwaway after Phase 4) — confirms steward calls Destroy on already-init'd siblings when a later component's Init errors; if it doesn't, Database.Destroy won't run on partial-init failures and we need to add an explicit guard in main. Verifies the assumption underpinning the lifecycle design.
  • Commit: feat(platform): scribe logger, prometheus registry, health checker set

#Phase 5 — HTTP foundation

  • 5.1 internal/pkg/apierror/apierror.go (create, ~80 lines)
    • type Problem struct { Type, Title, Status, Detail, Code, Instance, Errors } — RFC 7807 shape.
    • func Render(w, r, err error) — extracts culpa.Code from err, maps to HTTP status (NotFound→404, Invalid→400, Unauthorized→401, Forbidden→403, Conflict→409, Internal→500), writes application/problem+json. 5xx logs full stacktrace via scribe.Err before sanitizing.
  • 5.2 internal/pkg/httputil/httputil.go (create, ~50 lines) — ReadJSON, WriteJSON, ReadNDJSONLines(r io.Reader, maxBytes int64) iter.Seq2[[]byte, error].
  • 5.3 internal/server/server.go (create, ~150 lines) — Server is the steward root service.
    • type Server struct { Cfg config.ServerConfig `config:""`; Log *observability.Logger `inject:""`; Metrics *observability.Metrics `inject:""`; Health *health.Set `inject:""`; Auth *auth.Authenticator `inject:""`; Ingest *ingest.Handler `inject:""`; Sessions *session.Handler `inject:""`; httpSrv *http.Server }
    • func (s *Server) Init(ctx) error — builds chi router, mounts middleware stack:
      • request-id: generate ULID, set on context via observability.WithRequestID, echo as X-Request-ID response header.
      • logging: structured access log per request, picks up request-id automatically via the contextHandler from Phase 4.1. Body never logged.
      • metrics: increments Metrics.HTTPRequests and observes Metrics.HTTPDuration using chi.RouteContext(r.Context()).RoutePattern() as the route label.
      • recovery: panics → 500 problem.
    • Unauthenticated routes: GET /healthz (process up), GET /readyz (calls s.Health.Run with 5s timeout, 503 on any failure), GET /metrics (promhttp.HandlerFor(s.Metrics.Registry, promhttp.HandlerOpts{})).
    • Authed /api/v1/* group with s.Auth.Middleware then s.Ingest.Mount(r) and s.Sessions.Mount(r) (paths inside Mount are relative to the /api/v1 group).
    • Validates Cfg.Bind resolves to a loopback IP — error otherwise.
    • func (s *Server) Start(ctx) error — spawns http.Server.ListenAndServe in a goroutine; returns nil immediately. Errors propagate via stop channel.
    • func (s *Server) Stop(ctx) error: httpSrv.Shutdown(ctx) with Cfg.ShutdownGrace (default 10s) drain budget. In-flight ingest chunks finish their commit; partially-processed batches return their Accepted count truthfully.
    • steward.Root() so it's always started even if no other component injects it.
  • 5.4 internal/pkg/apierror/apierror_test.go (create, TDD) — each culpa code maps to expected status; problem JSON has all required fields; internal-error response detail is sanitized (no stack trace in body).
  • 5.5 internal/server/server_test.go (create, TDD) — non-loopback bind returns error from Server.Init; recovery middleware turns panic into 500 problem; request-id propagates to log lines. Tests construct Server directly with hand-built deps (skip steward; unit test of router behavior).
  • Invariants: errors rendered as RFC 7807; 5xx logged with stack; bind 127.0.0.1 enforced.
  • Commit: feat(http): chi server with middleware stack + RFC 7807 problem renderer
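
A stdlib sketch of the code-to-status mapping and problem rendering from 5.1. `Code` and its constants are hypothetical stand-ins for culpa codes (the real names may differ), the `type` member is fixed to "about:blank", and the 5xx logging step is omitted.

```go
package main

import (
	"encoding/json"
	"net/http"
)

// Code is a hypothetical stand-in for the culpa error code type.
type Code string

const (
	CodeNotFound     Code = "not_found"
	CodeInvalid      Code = "invalid"
	CodeUnauthorized Code = "unauthorized"
	CodeForbidden    Code = "forbidden"
	CodeConflict     Code = "conflict"
	CodeInternal     Code = "internal"
)

var statusByCode = map[Code]int{
	CodeNotFound:     http.StatusNotFound,
	CodeInvalid:      http.StatusBadRequest,
	CodeUnauthorized: http.StatusUnauthorized,
	CodeForbidden:    http.StatusForbidden,
	CodeConflict:     http.StatusConflict,
	CodeInternal:     http.StatusInternalServerError,
}

// Problem carries the RFC 7807 members used in this sketch.
type Problem struct {
	Type   string `json:"type"`
	Title  string `json:"title"`
	Status int    `json:"status"`
	Detail string `json:"detail,omitempty"`
	Code   string `json:"code,omitempty"`
}

// newProblem maps a code to a status and sanitizes the detail for 5xx.
// Unknown codes are treated as internal errors.
func newProblem(code Code, detail string) Problem {
	status, ok := statusByCode[code]
	if !ok {
		status = http.StatusInternalServerError
	}
	if status >= 500 {
		detail = "internal error" // never leak internals to the client
	}
	return Problem{
		Type:   "about:blank",
		Title:  http.StatusText(status),
		Status: status,
		Detail: detail,
		Code:   string(code),
	}
}

// writeProblem renders the problem as application/problem+json.
func writeProblem(w http.ResponseWriter, p Problem) {
	w.Header().Set("Content-Type", "application/problem+json")
	w.WriteHeader(p.Status)
	_ = json.NewEncoder(w).Encode(p)
}
```

Splitting construction from writing lets the mapping and sanitization be unit-tested without an HTTP recorder.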

#Phase 6 — Auth middleware (forward-auth + OIDC bearer)

  • 6.1 internal/server/auth/oidc.go (create, ~140 lines) — OIDCVerifier is a steward service, registered conditionally in main only when cfg.Auth.OIDC.Enabled.
    • type OIDCVerifier struct { Cfg config.OIDCConfig `config:""`; verifier *oidc.IDTokenVerifier; usernameClaim string }
    • func (v *OIDCVerifier) Init(ctx) error — builds oidc.NewProvider(ctx, Cfg.Issuer) (which fetches /.well-known/openid-configuration + JWKS) and provider.Verifier(&oidc.Config{ClientID: Cfg.Audience}). Accepts go-oidc default clock skew (no explicit option). Hard-fails at startup if Authelia unreachable; that's the chosen tradeoff (see Risks).
    • func (v *OIDCVerifier) Verify(ctx, raw string) (user string, err error) — validates JWT, extracts username via usernameClaim, falls back to sub. Returns culpa.Unauthorized-coded error on any validation failure.
  • 6.2 internal/server/auth/middleware.go (create, ~150 lines) — Authenticator is a steward service.
    • type Authenticator struct { Cfg config.AuthConfig `config:""`; Log *observability.Logger `inject:""`; Verifier *OIDCVerifier `inject:"" optional:"true"`; allowed, admins map[string]struct{} }
    • func (a *Authenticator) Init(ctx) error — builds allowed and admins lowercase sets from Cfg.AllowedUsers / Cfg.Admins; if Cfg.OIDC.Enabled && Verifier == nil → hard error (config invariant breach).
    • func (a *Authenticator) Middleware(next http.Handler) http.Handler — resolution order: (1) if OIDC enabled and Authorization: Bearer <token> present, call Verifier.Verify; (2) if forward-auth enabled and <UserHeader> non-empty, take it; (3) else 401 problem. After resolving user: lowercase, check allowed, 403 problem on miss, otherwise put Identity{User, IsAdmin} into request context via WithIdentity and call next.
    • type Identity struct { User string; IsAdmin bool }
    • func WithIdentity(ctx, Identity) context.Context, func IdentityFrom(ctx) (Identity, bool), func MustIdentity(ctx) Identity — context helpers used by handlers.
    • Caller (Server.Init) mounts middleware on /api/v1/* only; /healthz, /readyz, /metrics stay outside the auth group (Phase 5/9).
  • 6.3 internal/server/auth/middleware_test.go (create, TDD) — table-driven against an in-memory router:
    • Forward-auth path (OIDC disabled): missing header → 401; header set, user not in allowlist → 403; header set, allowed → 200; case-insensitive allowlist match → 200; configurable header name (X-Forwarded-User) honored → 200.
    • OIDC path (forward-auth disabled): missing Authorization → 401; malformed bearer → 401; valid JWT signed by test JWKS, allowed user → 200; valid JWT, user not in allowlist → 403; expired JWT → 401; wrong-audience JWT → 401; preferred_username claim used; falls back to sub when preferred_username absent.
    • Both enabled: bearer present and valid → user resolved from JWT (header ignored); bearer invalid + header present → 401 (do not silently fall back to header — fail closed); both absent → 401.
    • Admin flag: user in Auth.Admins → IdentityFrom(ctx).IsAdmin == true; user not in admins → false; admin not in AllowedUsers rejected at config load (covered in Phase 2 tests).
    • Test helper sets up a local httptest.Server serving JWKS + OIDC discovery + signs JWTs with a generated RSA key; pointed at by the verifier under test.
    • Problem JSON shape verified for 401 and 403.
  • Invariant: every /api/v1/* route validates auth; only /healthz, /readyz, /metrics exempt (enforced by mount point in Phase 9). Same auth.allowed_users allowlist applied regardless of which auth path resolved the user.
  • Commit: feat(auth): forward-auth + OIDC bearer middleware with shared allowlist
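
The resolution order in 6.2 and the fail-closed rule in 6.3 can be sketched as follows. This is an illustration under stated assumptions, not the project's Authenticator: verifyFunc stands in for OIDCVerifier.Verify, the lowercase authenticator type, demoAuthenticator, and probe are hypothetical helpers, and http.Error replaces the problem renderer to keep the example short.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

type Identity struct {
	User    string
	IsAdmin bool
}

type identityKey struct{}

// verifyFunc stands in for OIDCVerifier.Verify; nil means OIDC is disabled.
type verifyFunc func(ctx context.Context, raw string) (string, error)

type authenticator struct {
	verify          verifyFunc
	userHeader      string // empty means forward-auth is disabled
	allowed, admins map[string]struct{}
}

func (a *authenticator) middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var user string
		authz := r.Header.Get("Authorization")
		switch {
		case a.verify != nil && strings.HasPrefix(authz, "Bearer "):
			// A presented bearer is authoritative: on failure we do NOT
			// fall back to the forward-auth header (fail closed).
			u, err := a.verify(r.Context(), strings.TrimPrefix(authz, "Bearer "))
			if err != nil {
				http.Error(w, "unauthorized", http.StatusUnauthorized)
				return
			}
			user = u
		case a.userHeader != "" && r.Header.Get(a.userHeader) != "":
			user = r.Header.Get(a.userHeader)
		default:
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		user = strings.ToLower(user)
		if _, ok := a.allowed[user]; !ok {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		_, isAdmin := a.admins[user]
		ctx := context.WithValue(r.Context(), identityKey{}, Identity{User: user, IsAdmin: isAdmin})
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// demoAuthenticator accepts exactly one token and maps it to "Alice".
func demoAuthenticator() *authenticator {
	return &authenticator{
		verify: func(_ context.Context, raw string) (string, error) {
			if raw == "good-token" {
				return "Alice", nil
			}
			return "", errors.New("invalid token")
		},
		userHeader: "Remote-User",
		allowed:    map[string]struct{}{"alice": {}, "admin": {}},
		admins:     map[string]struct{}{"admin": {}},
	}
}

// probe sends one request through the middleware and reports the status
// plus the user the downstream handler saw (empty if it never ran).
func probe(a *authenticator, headers map[string]string) (int, string) {
	var seen string
	h := a.middleware(http.HandlerFunc(func(_ http.ResponseWriter, r *http.Request) {
		id, _ := r.Context().Value(identityKey{}).(Identity)
		seen = id.User
	}))
	req := httptest.NewRequest(http.MethodGet, "/api/v1/sessions", nil)
	for k, v := range headers {
		req.Header.Set(k, v)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, seen
}

func main() {
	a := demoAuthenticator()
	fmt.Println(probe(a, map[string]string{"Authorization": "Bearer bad", "Remote-User": "alice"}))
	fmt.Println(probe(a, map[string]string{"Remote-User": "ADMIN"}))
}
```

Putting the bearer check first in the switch is what makes "bearer invalid + header present" a 401: once a bearer is presented, the header branch is never reached.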

#Phase 7 — Ingest domain

  • 7.1 internal/domain/ingest/repository.go (create, ~140 lines) — Repository is a steward service.
    • type Repository struct { Database *database.Database `inject:""` }
    • func (r *Repository) UpsertChunk(ctx, tx *sqlx.Tx, owner string, turns []wire.TurnEvent) error — single tx: per turn, INSERT … ON CONFLICT (owner,tool,host,session_id) DO UPDATE SET ended_at = MAX(ended_at, excluded.ended_at) for sessions (first-write-wins on metadata); INSERT … ON CONFLICT (owner,tool,host,session_id,turn_id) DO UPDATE SET <all non-key cols> for turns. owner is bound from the parameter on every row — never sourced from the wire payload. started_at falls back to MIN(turn.timestamp) when SessionMeta.StartedAt is nil.
  • 7.2 internal/domain/ingest/service.go (create, ~160 lines) — Service is a steward service.
    • type Service struct { Cfg config.IngestConfig `config:""`; Repo *Repository `inject:""`; Log *observability.Logger `inject:""`; Metrics *observability.Metrics `inject:""` }
    • type Result struct { Accepted int; Errors []LineError }
    • func validateTurn(t wire.TurnEvent, maxContentBytes int64) error — required fields (tool, host, session_id, turn_id, seq, role, timestamp, content non-empty); role ∈ {user, assistant, tool, system}; len(content) ≤ maxContentBytes; len(SessionMeta.SourceFile) ≤ 1024. Returns culpa.Invalid-coded error on failure.
    • func (s *Service) Ingest(ctx, owner string, body io.Reader, maxBytes int64) (Result, error) — reads NDJSON via httputil.ReadNDJSONLines; per line: JSON-unmarshal then validateTurn; buffers up to Cfg.ChunkSize lines; on full chunk, opens tx and calls Repo.UpsertChunk(ctx, tx, owner, chunk); on commit success, increments Metrics.IngestChunksCommitted and IngestLinesAccepted by chunk len, adds chunk len to Accepted; on parse/validation/DB error mid-chunk, rolls back chunk, increments IngestLinesErrored, logs at WARN with line number, owner, tool/host/session_id, returns with Accepted reflecting prior chunks plus a LineError for the failing line; subsequent lines are not processed.
  • 7.3 internal/domain/ingest/handler.go (create, ~70 lines) — Handler is a steward service.
    • type Handler struct { Cfg config.IngestConfig `config:""`; Service *Service `inject:""` }
    • func (h *Handler) Post(w, r) — extract auth.MustIdentity(r.Context()).User as owner, Content-Type check, body limit reader at Cfg.MaxBodyBytes, calls Service.Ingest(ctx, owner, ...), returns 200 {"accepted": N, "errors": [...]}. Hard server errors (DB down) → 5xx problem with empty accepted.
    • func (h *Handler) Mount(r chi.Router) — r.Post("/ingest", h.Post). Called by Server.Init.
  • 7.4 internal/domain/ingest/service_test.go + repository_test.go (create, TDD) — tests against :memory: DB:
    • Wire validation: each required field missing → LineError; bad role value → LineError; content over MaxTurnContentBytes → LineError (not 413, body still under cap); source_file over 1024 → LineError. Lines in chunks committed before the bad line stay committed; the chunk containing the bad line is rolled back.
    • Idempotency: posting same NDJSON twice as the same owner produces identical row counts and identical row contents.
    • First-write-wins on session: posting turn-1 with WorkingDir=A then turn-2 with WorkingDir=B leaves session.WorkingDir=A.
    • Last-write-wins on turn: re-posting same turn_id with new content updates the row.
    • ended_at extends: MAX(existing, incoming).
    • started_at fallback to MIN turn timestamp when SessionMeta.StartedAt is nil.
    • Chunked partial accept: 3-chunk batch with bad line in chunk 3 returns Accepted = 2*chunkSize, error references correct line number.
    • FK violation produces a LineError, not a 5xx.
    • DB-down → 5xx, no Accepted.
    • Body over MaxBodyBytes → 413 problem.
    • FTS rows present after ingest with correct owner (turns_fts contains turn content; tool_outputs_fts contains tool_calls JSON when present).
    • Per-user isolation: same NDJSON ingested by user A then by user B produces two distinct sessions and two distinct sets of turns; rows have correct owner. Modifying user B's session's ended_at does not affect user A's row.
    • Wire owner is ignored: a TurnEvent JSON line that includes a stray "owner": "evil" field (not in the wire struct) does not change the stored owner; ingest still attributes to the authenticated user. (Negative test against the trust model.)
  • Invariants: ingest is idempotent at turn level per owner; owner is server-derived; chunked-commit-with-partial-accept semantics; first-write-wins session, last-write-wins turn.
  • Commit: feat(ingest): NDJSON ingest with chunked transactions and partial-accept

#Phase 8 — Sessions read API

  • 8.1 internal/domain/session/repository.go (create, ~150 lines) — Repository is a steward service.
    • type Repository struct { Database *database.Database `inject:""` }
    • type OwnerScope struct { User string; AllOwners bool; SpecificOwner *string } — resolved by handler from identity + ?owner= param.
    • type ListFilter struct { Owner OwnerScope; Tool, Host *string; Since, Until *int64; Limit, Offset int }
    • func (r *Repository) List(ctx, f ListFilter) ([]Session, error) — dynamic WHERE built from non-nil filters; owner clause: if AllOwners no clause, if SpecificOwner WHERE owner = ?, else WHERE owner = User; ORDER BY started_at DESC; LIMIT/OFFSET.
    • func (r *Repository) Get(ctx, scope OwnerScope, tool, host, sessionID string) (*SessionWithTurns, error) — owner clause same as above; joined select; returns culpa.NotFound when session missing or owner mismatch (do not leak existence by returning 403 vs 404 — a non-admin asking for someone else's session must see the same 404 as if the session didn't exist).
  • 8.2 internal/domain/session/handler.go (create, ~120 lines) — Handler is a steward service.
    • type Handler struct { Repo *Repository `inject:""` }
    • Pagination defaults: limit=50 default, limit capped at 200, negative limit/offset clamped to default/0. since > until → 400 problem.
    • func (h *Handler) resolveScope(r) (OwnerScope, error) — read auth.MustIdentity(r.Context()), parse ?owner=; if param empty → scope is current user; if param non-empty and identity is admin → scope is SpecificOwner (or AllOwners for ?owner=*); if param non-empty and identity is not admin → 403 problem.
    • func (h *Handler) List(w, r) — resolveScope, parse other query params (tool, host, since, until, limit, offset), calls Repo.List, writes JSON.
    • func (h *Handler) Get(w, r) — resolveScope, chi URL params, calls Repo.Get, writes JSON; 404 problem when not found (auto via apierror).
    • func (h *Handler) Mount(r chi.Router) — r.Get("/sessions", h.List); r.Get("/sessions/{tool}/{host}/{session_id}", h.Get). Called by Server.Init.
  • 8.3 internal/domain/session/repository_test.go + handler_test.go (create, TDD) — tests against :memory: DB:
    • Filters compose correctly (tool only, host only, time range, all combined); pagination caps limit (e.g. max 200) and clamps negatives; ordering by started_at desc; turns inline in correct order by seq.
    • Per-user isolation (List): with rows owned by A and B, a request authenticated as A returns only A's rows. B's rows do not appear regardless of filter combination.
    • Per-user isolation (Get): A asking for B's session by URL → 404 problem (not 403 — must not leak existence). A asking for own session → 200.
    • Admin override: admin with ?owner=B lists/gets B's rows. Admin with ?owner=* lists across all owners. Admin without ?owner= defaults to admin's own rows (no implicit cross-tenant view).
    • Non-admin ?owner= rejected: non-admin user A passing ?owner=A (their own user!) or ?owner=B → 403 problem. (Param is admin-only; non-admins must not pass it at all.)
  • Invariants: errors rendered as RFC 7807; composite PKs throughout; per-user isolation enforced; existence never leaked across owners.
  • Commit: feat(session): list and detail JSON API with filters

#Phase 9 — Main wiring + ops surface

  • 9.1 cmd/lethe/main.go (modify, ~70 lines) — thin shell. No business logic; everything is a steward asset.
    cfg := config.MustLoad(*configPath)
    mgr := steward.NewManager()
    mgr.AddComponent(ctx,
        steward.MustConfigurationAsset(cfg),
        steward.MustServiceAsset(&observability.Logger{}),
        steward.MustServiceAsset(&observability.Metrics{}),
        steward.MustServiceAsset(&database.Database{}),
        steward.MustServiceAsset(&health.DBCheck{}),     // registers as Checker
        steward.MustServiceAsset(&health.Set{}),
        steward.MustServiceAsset(&auth.Authenticator{}),
        steward.MustServiceAsset(&ingest.Repository{}),
        steward.MustServiceAsset(&ingest.Service{}),
        steward.MustServiceAsset(&ingest.Handler{}),
        steward.MustServiceAsset(&session.Repository{}),
        steward.MustServiceAsset(&session.Handler{}),
        steward.MustServiceAsset(&server.Server{}, steward.Root()),
    )
    if cfg.Auth.OIDC.Enabled {
        mgr.AddComponent(ctx, steward.MustServiceAsset(&auth.OIDCVerifier{}))
    }
    must(mgr.Inject(ctx)); must(mgr.Init(ctx)); must(mgr.Start(ctx))
    // wait on SIGINT/SIGTERM
    must(mgr.Stop(stopCtx)); must(mgr.Destroy(ctx))
    
    • All routes mounted by Server.Init (Phase 5.3): /healthz, /readyz, /metrics outside the auth group; /api/v1/* inside Authenticator.Middleware. Main does not touch chi.
    • Signal handling: signal.NotifyContext(ctx, SIGINT, SIGTERM) → on cancel, mgr.Stop(ctx with 15s deadline) → mgr.Destroy(ctx) → exit.
  • 9.2 cmd/lethe/main_test.go (create, TDD, light) — end-to-end smoke via steward.Manager assembled in-test with a :memory: DB and a random-port Server: POST a fixture NDJSON as user A, GET sessions list as A (sees own row), GET session detail as A. Then POST a fresh batch as user B with the same (tool, host, session_id) and confirm both rows coexist; A still only sees A's. Confirms wiring + isolation reaches all the way through the steward graph.
  • 9.3 README.md (modify) — fill in real config example with all keys (including both auth.forward_auth and auth.oidc blocks), add curl commands for ingest + list + detail (one variant with Remote-User for testing forward-auth, one variant with Authorization: Bearer … for OIDC), finalize the dual-auth trust-model section and backup section.
  • Invariant: only /healthz, /readyz, /metrics are unauthenticated; mounting layout enforces this.
  • Commit: feat(cmd): wire server with /healthz /readyz /metrics + authed /api/v1
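
The "mounting layout enforces this" invariant can be illustrated with two nested routers. The plan uses chi; the sketch below substitutes net/http's ServeMux so it runs without dependencies, and requireUser is a deliberately trivial stand-in for Authenticator.Middleware (it only checks that Remote-User is present). newRouter and status are hypothetical helpers.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// requireUser is a trivial stand-in for the real auth middleware.
func requireUser(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Remote-User") == "" {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// newRouter registers ops endpoints on the root mux and wraps the whole
// /api/v1/ subtree in the auth middleware, so a route added under the API
// mux cannot be reached without passing through requireUser.
func newRouter() http.Handler {
	ok := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "ok")
	})

	api := http.NewServeMux()
	api.Handle("/api/v1/sessions", ok)

	root := http.NewServeMux()
	for _, p := range []string{"/healthz", "/readyz", "/metrics"} {
		root.Handle(p, ok)
	}
	root.Handle("/api/v1/", requireUser(api))
	return root
}

// status performs one GET and returns the response code.
func status(h http.Handler, path, user string) int {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if user != "" {
		req.Header.Set("Remote-User", user)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := newRouter()
	fmt.Println(status(h, "/healthz", ""), status(h, "/api/v1/sessions", ""), status(h, "/api/v1/sessions", "alice"))
}
```

The design point is that exemption is structural: an endpoint is unauthenticated only if it is registered on the outer router, so there is no per-handler flag to forget.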

#Order & dependencies

Linear: each phase depends on all prior phases. Phase 4 (observability/health) and Phase 5 (HTTP foundation) could parallelize but commit-coupling makes it not worth it. No phase can land before its dependencies (config → db → platform → http → auth → handlers → main).

#Open questions / risks / rollback

  • Risk: golang-migrate/v4 + modernc.org/sqlite driver compatibility — multiple-statement migrations (FTS triggers) may need ; handling. Mitigation: Phase 3 test asserts migration applies; if it fails, swap to per-statement execution or goose.
  • Risk: FTS5 trigger SQL on UPSERT — SQLite doesn't fire UPDATE triggers from INSERT … ON CONFLICT … DO UPDATE in older versions. Mitigation: Phase 3 test specifically covers UPSERT path; if triggers don't fire, replace UPSERT with explicit SELECT then INSERT/UPDATE.
  • Risk: OIDC discovery on startup blocks if Authelia is down. Mitigation chosen: fail fast — OIDCVerifier.Init errors propagate up through main, lethe refuses to start. Acceptable because Authelia is on the same host; if Authelia is down lethe is unusable anyway. Forward-auth-only deployments are unaffected.
  • Open: Authelia OIDC client registration is a manual step on the Authelia side — not in this repo. Documented in Phase 9.3 README only.
  • Rollback: greenfield, single git revert per phase commit.

#Backwards-compat check

Greenfield — no compat surface. Wire format is the only forward-compat concern; pinned in internal/shared/wire/ and /api/v1/ URL prefix per Design. No further checks needed.

#Acknowledged out-of-scope (won't surface in this task, listed so they aren't re-discovered later)

  • Rate limiting on /api/v1/ingest — body cap + per-turn content cap are the only safeguards in v1.
  • OpenAPI spec / generated client — collector (#2) hand-writes against internal/shared/wire/.
  • CORS — JSON API behind reverse proxy on the same origin; not needed.
  • Cursor-based pagination / total-count on /api/v1/sessions — offset+limit is enough for the expected volume.
  • Per-route per-user rate limits (would need Authenticator to expose a per-identity bucket).
  • Pluggable auth backends beyond Authelia (the OIDC verifier is generic enough that pointing at a different IdP works, but only Authelia is documented).

#Verify

Date: 2026-04-26. Run against master HEAD; binary built fresh from cmd/lethe; e2e smoke driven through a real listener on 127.0.0.1:18888 with tmp/lethe.yaml (sqlite at tmp/lethe.db, forward-auth header trust, alice/bob/admin in allowlist, admin is admin).

#Positive

  • go test ./... -race -count=1 — green across all packages.
  • go build ./cmd/lethe — binary at tmp/lethe.
  • Server listens on 127.0.0.1:18888; PID file written.
  • GET /healthz → 200 ok.
  • GET /readyz → 200 with {"checks":{"database":"ok"}}.
  • GET /metrics → 200 Prometheus exposition.
  • POST /api/v1/ingest (alice, NDJSON, application/x-ndjson) → 200 {"accepted":2}; second identical post → 200 {"accepted":2} (idempotent upsert).
  • GET /api/v1/sessions (alice) → 200 with one session, ended_at extended to MAX(turn.timestamp).
  • GET /api/v1/sessions/{tool}/{host}/{sid} (alice) → 200 with turns inline in seq order.
  • Admin override: GET /api/v1/sessions?owner=alice (admin) → 200 with alice's session.

#Negative

All return application/problem+json:

  • POST /api/v1/ingest no auth → 401.
  • POST /api/v1/ingest non-allowlisted user → 403.
  • POST /api/v1/ingest wrong Content-Type → 415.
  • GET /api/v1/sessions/claude-code/phoebe/nope-id → 404.
  • GET /no-such-route → 404 (chi NotFound handler routed through apierror.Render).
  • GET /api/v1/sessions?owner=alice as non-admin → 403.
  • Per-line malformed JSON in NDJSON → 200 with errors[]; accepted reflects committed lines only (chunk aborts before commit on parse error, per spec).
  • Bind validation: bind: 0.0.0.0:18889 and bind: 127.0.0.1:999999 both rejected at config load with err.code=CONFIG_VALIDATE on loopback_bind tag; process exits 1 before any listener opens.
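
The two rejected bind values above fail for different reasons (non-loopback host, out-of-range port). A check with that behaviour might look like the sketch below. It is an assumption about what the loopback_bind validation does, not its actual code: validateLoopbackBind is a hypothetical name, and hostnames such as localhost are rejected here because only literal IPs are accepted.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// validateLoopbackBind accepts only "<loopback IP>:<port 0-65535>".
// Port 0 is allowed so tests can ask the kernel for a free port.
func validateLoopbackBind(addr string) error {
	host, portStr, err := net.SplitHostPort(addr)
	if err != nil {
		return fmt.Errorf("bind %q: %w", addr, err)
	}
	port, err := strconv.Atoi(portStr)
	if err != nil || port < 0 || port > 65535 {
		return fmt.Errorf("bind %q: port must be in 0-65535", addr)
	}
	ip := net.ParseIP(host)
	if ip == nil || !ip.IsLoopback() {
		return fmt.Errorf("bind %q: host must be a loopback IP", addr)
	}
	return nil
}

func main() {
	for _, addr := range []string{"127.0.0.1:18888", "[::1]:8080", "0.0.0.0:18889", "127.0.0.1:999999"} {
		fmt.Printf("%-18s %v\n", addr, validateLoopbackBind(addr))
	}
}
```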

#Per-user isolation (security boundary)

  • Same (tool, host, session_id) ingested by alice and bob coexist as distinct rows (owner first in PK).
  • GET /api/v1/sessions returns only the caller's sessions.
  • GET /api/v1/sessions/{tool}/{host}/{sid} for another owner's session → 404 (does not leak existence).
  • Wire-payload owner injection ignored: ingesting NDJSON whose Session/Turn JSON includes a stray "owner" key still attributes the row to the authenticated identity (confirmed via internal/shared/wire/ having no owner field — the wire types literally cannot deserialize it).
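
The last point relies on standard encoding/json behaviour: json.Unmarshal silently drops keys that have no matching struct field. The sketch below demonstrates it with a reduced, hypothetical TurnEvent (the real wire type has more fields); storedTurn and attribute are illustrative names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TurnEvent is a reduced, hypothetical version of the wire type.
// It deliberately has no owner field.
type TurnEvent struct {
	Tool      string `json:"tool"`
	Host      string `json:"host"`
	SessionID string `json:"session_id"`
	TurnID    string `json:"turn_id"`
	Content   string `json:"content"`
}

// storedTurn pairs the decoded payload with the server-derived owner.
type storedTurn struct {
	Owner string
	TurnEvent
}

// attribute decodes one NDJSON line and stamps it with the authenticated
// user. Unknown JSON keys, including "owner", are ignored by json.Unmarshal.
func attribute(authUser string, line []byte) (storedTurn, error) {
	var ev TurnEvent
	if err := json.Unmarshal(line, &ev); err != nil {
		return storedTurn{}, err
	}
	return storedTurn{Owner: authUser, TurnEvent: ev}, nil
}

func main() {
	line := []byte(`{"owner":"evil","tool":"claude-code","host":"phoebe","session_id":"s1","turn_id":"t1","content":"hi"}`)
	st, err := attribute("alice", line)
	if err != nil {
		panic(err)
	}
	fmt.Println(st.Owner, st.TurnID)
}
```

This holds only while decoding stays lenient: a json.Decoder with DisallowUnknownFields would turn the stray key into an error instead of ignoring it, which would still not let the payload choose the owner.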

#Graceful shutdown

SIGTERM to the running server → process exits within 2s. Final log lines:

"signal received; shutting down"
"stopping component (CallStop)" component=server.Server
"stopped component (CallStop)"  component=server.Server
"destroying component"          component=database.Database
"lethe stopped"

No in-flight request errors, no panic, exit code 0.

#Invariants (re-confirmed)

  • internal/shared/wire/ has no owner field anywhere (grep -rin "owner" internal/shared/wire/ empty).
  • All HTTP error paths render through internal/pkg/apierror/apierror.go:88, which sets Content-Type: application/problem+json. The chi NotFound/MethodNotAllowed handlers route through the same renderer (Phase 5 fix).
  • Composite primary keys lead with owner (internal/platform/database/migrations/0001_init.up.sql:39,60 — sessions: (owner, tool, host, session_id); turns: (owner, tool, host, session_id, turn_id)).

#Result

All positive, negative, isolation, shutdown, and invariant checks pass. No regressions. Task is verified.

#Conclusion

#Deviations from plan

  • Phase 1: go.mod directive is go 1.25.0, not go 1.22+. go get of golang-migrate/v4.19, viper 1.21, and prometheus 1.23 forced the bump (each requires ≥1.24). Plan said "Go 1.22+" so this satisfies the floor; flagging because the explicit number changed and the Dockerfile builder image was bumped to golang:1.25-alpine to match (consistency fix folded into the same Phase 1 commit).
  • Phase 1: added internal/deps/deps.go with blank imports of every direct dep so go mod tidy keeps them in go.mod until real packages start importing them. Transitional file; expected to shrink each phase and disappear by end of Phase 9. Without it, go mod tidy strips the dep stub the plan called for.
  • Phase 1: .golangci.yml uses the v2 schema (golangci-lint 2.11.4 rejects v1). Same lint set as the plan listed (errcheck, govet, staticcheck, revive, gosec, unused, gofmt, goimports).

#Notes carried forward

  • Phase 3 should add migrate-up, migrate-down, migrate-create to the Justfile alongside the migration runner so the targets aren't dead. (Done in Phase 3.)
  • Each phase from 2 onward must remove the dep it adopts from internal/deps/deps.go; Phase 9 deletes the file.
  • README's Caddy/Authelia snippets use auth.example.com placeholders; replace with phoebe-specific values when the production deploy lands (out of scope for this task).
  • Phase 4 finding (steward unwind gap): steward.Manager.Init returns on the first failing CallInit and does not iterate back over previously-initialized assets to call Destroy. The canary test TestStewardUnwindsOnInitFailure (in internal/platform/health/steward_unwind_test.go) is intentionally red on master to document this. Phase 9 main must compensate: track each component as it initializes and, on Init error, walk the list in reverse calling Destroy directly on each (don't try mgr.Stop/mgr.Destroy — those panic unless the manager has reached Started). Once Phase 9 lands the explicit unwind, either delete the canary test or convert it to assert the new compensating behavior.
  • Phase 5 consistency fix (folded into commit 3c45b48 via amend): chi's default 404/405 handlers wrote text/plain, violating the invariant "errors leaving any HTTP handler are rendered as RFC 7807". Added explicit chi.Router.NotFound/MethodNotAllowed handlers that call apierror.Render with NOT_FOUND / METHOD_NOT_ALLOWED codes. Added METHOD_NOT_ALLOWED → 405 entry to the apierror code-status map. Added two regression tests.
  • Phase 7: added UNSUPPORTED_MEDIA_TYPE → 415 entry to apierror.codeStatus (ingest handler enforces application/x-ndjson Content-Type). Repository simulates DB-down by closing the underlying *sql.DB (cleaner than service-faking, mirrors real driver-disconnect failure). Service-level FK test omitted because the schema makes it unreachable through the Service path (parent session is upserted in the same chunk); equivalent Repository-level test pins the wrap-and-classify code path.
  • Phase 8: introduced JSONText (sql.Scanner wrapper) for nullable TEXT-JSON columns — json.RawMessage cannot Scan NULL directly. External JSON shape unchanged. If task #3 (search) wants the same scan-safety, factor up to internal/pkg/sqljson.
  • Phase 9: refactored Server.Start to net.Listen first then http.Serve(listener) plus added Server.Addr() so :0 binds report the kernel-assigned port — enables the e2e smoke to bind to a random port without races. cmd/lethe/main.go uses a run() int shell so tests can drive it. Steward unwind canary internal/platform/health/steward_unwind_test.go deleted; main.go's reverse-order unwindOnError compensator is now the production guarantee. Bootstrap stderr slog handler installed before any asset registration so the unwind path always has a logger.

#Final state

  • All 9 phases committed (4ca03be..53221c9).
  • go test ./... -race -count=1 fully green; no allowed-red exception.
  • go vet, gofmt -l, go mod tidy, golangci-lint run ./... all clean.
  • Manual smoke: lethe binary built and ran against config.example.yaml; /healthz, /readyz, /metrics, unauthed /api/v1/sessions (401), authed /api/v1/sessions with Remote-User: bigbes (200) all behaved as designed; SIGTERM triggered clean shutdown via the steward graph.
  • Phase 3 → Phase 7 contract pin: INSERT … ON CONFLICT … DO UPDATE fires the UPDATE trigger on SQLite (verified by TestUpsertFiresUpdateTriggerAndKeepsFTSCoherent). Regular FTS5 (not contentless / external content) was chosen so WHERE owner = ? works on the FTS table without a join — accepted the storage cost (content duplicated in real table + FTS shadow). Composite key order is (owner, tool, host, session_id[, turn_id]) everywhere; ingest INSERT/UPDATE/ON CONFLICT clauses must match. started_at/ended_at/source_file are NOT NULL — ingest derives started_at from MIN(turn.timestamp) when SessionMeta.StartedAt is absent.