The root .gitignore had `models/`, which matches a directory named
models at any depth, so it ignored both the ML weights directory AND
the SQLAlchemy model dirs. Changed it to `/models/` (root only).
Added auth/src/models/ (User) and chat-api/src/models/ (Conversation, Message).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Auth service was crash-looping with ModuleNotFoundError for
prometheus_fastapi_instrumentator. Chat-api was also missing it.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove NextAuth and replace with token-based auth against the backend
auth service (OAuth + JWT). The frontend now redirects login to
/api/auth/google and /api/auth/github (proxied by nginx to the auth
service), captures the JWT from the redirect query param, and uses it
for all API calls.
Key changes:
- Remove next-auth dependency and all NextAuth config/routes
- Add lib/auth-client.ts (JWT token storage + auth headers)
- Add hooks/useAuth.ts (client-side auth state + token capture)
- Rewrite middleware.ts as a pass-through (auth is client-side only)
- Login page uses plain <a> links to /api/auth/{provider}
- Chat page captures access_token from OAuth redirect
- Zustand store fetches conversations from real chat-api via JWT
- API routes proxy /api/conversations/* to chat-api with auth
- chat/stream route supports conversationId + auth header forwarding
- useSSE hook accepts auth headers for authenticated streaming
- Sidebar loads conversations from API, supports delete
- Landing page (Hero, LandingNav) uses useAuth instead of useSession
- Add .env.production.example and scripts/generate-jwt-keys.sh
Mock echo fallback preserved when CHAT_API_URL is not set.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The observability PR added structlog and prometheus-fastapi-instrumentator
to the inference pyproject.toml but did not regenerate uv.lock, so the
Docker build, which uses the --locked flag, failed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces the helm/observability scaffold with a real monitoring stack
wired into the samosaChaat platform.
Helm chart (helm/observability/)
- Chart.yaml declares kube-prometheus-stack (~62.0) and loki-stack
(~2.10) as subchart dependencies.
- values.yaml configures Prometheus (15d retention, 50Gi PVC,
ServiceMonitor + rule selector on app.kubernetes.io/part-of:
samosachaat), Alertmanager (10Gi PVC), Grafana (OAuth-only via
GitHub + Google, local login disabled, Prometheus + Loki datasources,
dashboards auto-provisioned from a ConfigMap, email + Slack contact
points with a critical route to Slack), Loki (50Gi, 30d retention,
tsdb schema), and Promtail (JSON pipeline that lifts level / service
/ trace_id / user_id into labels, scrape config with pod labels).
- Alert rules: HighCPU, HighMemory, DiskSpaceLow, High5xxRate,
InferenceServiceDown, HighP99Latency.
- templates/grafana-dashboards-configmap.yaml renders every file under
dashboards/ into a single grafana_dashboard=1 ConfigMap.
- dashboards/node-health.json, app-performance.json, inference.json -
fully-formed Grafana dashboards with Prometheus datasource variable,
templated app selector, thresholded gauges, and LogQL-ready labels.
Scraping (helm/samosachaat/templates/servicemonitor.yaml)
- ServiceMonitor CRs for auth / chat-api / inference that Prometheus
picks up via the part-of=samosachaat selector; scrapes /metrics
every 15s and replaces the app label so dashboards line up.
Application instrumentation
- services/{auth,chat-api,inference} each depend on
prometheus-fastapi-instrumentator and expose /metrics (request count,
latency histograms, in-progress gauges).
- services/auth/src/logging_setup.py and
services/inference/src/logging_setup.py mirror the canonical
chat-api implementation - structlog JSON with service, trace_id,
user_id context injection.
- configure_logging() is called at create_app() in auth and inference;
inference's main.py now uses structlog via get_logger() instead of
logging.getLogger.
- log_level setting added to auth + inference config (LOG_LEVEL env).
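The context injection described above can be sketched with the standard
library alone. This is a stand-in, not the structlog-based
logging_setup.py from this change: the names trace_id_var, user_id_var,
JsonContextFormatter, and build_logger are hypothetical, and the "event"
key for the message is an assumption.

```python
import contextvars
import json
import logging

# Per-request context. A middleware would set these once per request;
# every log line emitted afterwards picks them up automatically.
trace_id_var = contextvars.ContextVar("trace_id", default=None)
user_id_var = contextvars.ContextVar("user_id", default=None)


class JsonContextFormatter(logging.Formatter):
    """Render each record as one JSON object with service and request context."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self._service = service

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname.lower(),
            "service": self._service,
            "event": record.getMessage(),
            "trace_id": trace_id_var.get(),
            "user_id": user_id_var.get(),
        }
        return json.dumps(payload)


def build_logger(service: str, stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines to `stream` (stderr by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonContextFormatter(service))
    logger = logging.getLogger(f"sketch.{service}")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Keeping trace_id and user_id in ContextVars rather than passing them to
each log call means handlers deep in the call stack do not need to know
about the request they are serving.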
Docs
- contracts/logging-standard.md defines the required JSON fields,
Python (structlog) + Node.js (pino) implementations, LogQL examples
for cross-service queries, and the x-trace-id propagation contract.
Closes #9
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- FastAPI service that manages conversations and messages in PostgreSQL
(SQLAlchemy 2.0 async + asyncpg) and streams assistant responses back
to the client via sse-starlette, forwarding the inference service SSE
contract unchanged.
- Auth guard validates every request against the auth service
/auth/validate endpoint (X-Internal-API-Key) and caches results in an
in-process TTL cache (5 min, 1024 entries) to absorb request bursts.
- Every query filters by authenticated user_id; cross-user access
returns 404. The message send flow auto-titles the conversation from
its first message, persists the streamed assistant response even after
the client disconnects, and records token_count + inference_time_ms.
- /api/models{,/swap} proxies the inference admin surface; swap
requires is_admin on the validated user.
- Structured JSON logging via structlog with trace_id + user_id
ContextVars attached to every log line.
- Test suite (pytest + aiosqlite + respx) covers CRUD, user scoping,
streaming SSE persistence, regenerate, model proxy admin gate,
and the stream proxy error path. 16/16 passing.
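The auth-guard cache mentioned above (5 min TTL, 1024 entries) can be
sketched roughly as follows. This is an illustration, not the code in
this change: the TTLCache class and its method names are hypothetical,
and oldest-first eviction is an assumption, since the eviction policy is
not stated here.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class TTLCache:
    """Bounded in-process cache whose entries expire after a fixed TTL."""

    def __init__(
        self,
        ttl_seconds: float = 300.0,  # 5 minutes
        max_entries: int = 1024,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock
        # key -> (expires_at, value), kept in insertion order
        self._data: OrderedDict = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller re-validates.
            del self._data[key]
            return None
        return value

    def put(self, key: str, value: Any) -> None:
        # Re-inserting a key refreshes both its TTL and its position.
        self._data.pop(key, None)
        while len(self._data) >= self._max:
            # Assumed policy: evict the oldest inserted entry first.
            self._data.popitem(last=False)
        self._data[key] = (self._clock() + self._ttl, value)
```

A guard built on this would look up the token with get() and call
/auth/validate only on a miss, storing the result with put(). The
trade-off is that a revoked token can stay valid for up to one TTL.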
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>