Manmohan Sharma aa0818aae2
feat(observability): Prometheus + Grafana + Loki stack for samosaChaat (#9)
Replaces the helm/observability scaffold with a real monitoring stack
wired into the samosaChaat platform.

Helm chart (helm/observability/)
- Chart.yaml declares kube-prometheus-stack (~62.0) and loki-stack
  (~2.10) as subchart dependencies.
- values.yaml configures Prometheus (15d retention, 50Gi PVC,
  ServiceMonitor + rule selector on app.kubernetes.io/part-of:
  samosachaat), Alertmanager (10Gi PVC), Grafana (OAuth-only via
  GitHub + Google, local login disabled, Prometheus + Loki datasources,
  dashboards auto-provisioned from a ConfigMap, email + Slack contact
  points with a critical route to Slack), Loki (50Gi, 30d retention,
  tsdb schema), and Promtail (JSON pipeline that lifts level / service
  / trace_id / user_id into labels, scrape config with pod labels).
- Alert rules: HighCPU, HighMemory, DiskSpaceLow, High5xxRate,
  InferenceServiceDown, HighP99Latency.
- templates/grafana-dashboards-configmap.yaml renders every file under
  dashboards/ into a single grafana_dashboard=1 ConfigMap.
- dashboards/node-health.json, app-performance.json, inference.json -
  fully-formed Grafana dashboards with Prometheus datasource variable,
  templated app selector, thresholded gauges, and LogQL-ready labels.

Scraping (helm/samosachaat/templates/servicemonitor.yaml)
- ServiceMonitor CRs for auth / chat-api / inference that Prometheus
  picks up via the part-of=samosachaat selector; scrapes /metrics
  every 15s and replaces the app label so dashboards line up.

Application instrumentation
- services/{auth,chat-api,inference} each depend on
  prometheus-fastapi-instrumentator and expose /metrics (request count,
  latency histograms, in-progress gauges).
- services/auth/src/logging_setup.py and
  services/inference/src/logging_setup.py mirror the canonical
  chat-api implementation - structlog JSON with service, trace_id,
  user_id context injection.
- configure_logging() is called at create_app() in auth and inference;
  inference's main.py now uses structlog via get_logger() instead of
  logging.getLogger.
- log_level setting added to auth + inference config (LOG_LEVEL env).
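
The instrumented `/metrics` endpoints serve the standard Prometheus text exposition format. As a rough illustration of what the latency histograms look like on the wire (metric name and buckets here are hypothetical, not the instrumentator's exact output), a minimal stdlib sketch:

```python
from bisect import bisect_left

def render_latency_histogram(name, latencies, buckets=(0.05, 0.1, 0.5, 1.0, 5.0)):
    """Render observed request latencies (seconds) in Prometheus text
    exposition format: cumulative buckets, then _sum and _count series."""
    lines = [f"# TYPE {name} histogram"]
    counts = [0] * (len(buckets) + 1)
    for v in latencies:
        # bisect_left finds the first bucket with le >= v (le is inclusive)
        counts[bisect_left(buckets, v)] += 1
    cumulative = 0
    for bound, c in zip(buckets, counts):
        cumulative += c
        lines.append(f'{name}_bucket{{le="{bound}"}} {cumulative}')
    lines.append(f'{name}_bucket{{le="+Inf"}} {len(latencies)}')
    lines.append(f"{name}_sum {sum(latencies)}")
    lines.append(f"{name}_count {len(latencies)}")
    return "\n".join(lines)
```

This cumulative-bucket shape is what the `HighP99Latency` rule queries with `histogram_quantile` over the bucket series.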

Docs
- contracts/logging-standard.md defines the required JSON fields,
  Python (structlog) + Node.js (pino) implementations, LogQL examples
  for cross-service queries, and the x-trace-id propagation contract.
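
The services use structlog; as a standalone stdlib sketch of the JSON line shape the standard calls for (field names taken from the description above, everything else illustrative):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line with the shared fields:
    level, service, trace_id, user_id, message."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        return json.dumps({
            "level": record.levelname.lower(),
            "service": self.service,
            # Context fields arrive via `extra=` and default to null
            "trace_id": getattr(record, "trace_id", None),
            "user_id": getattr(record, "user_id", None),
            "message": record.getMessage(),
        })
```

With every service emitting this shape, Promtail's JSON pipeline can lift `level`, `service`, `trace_id`, and `user_id` straight into labels.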

Closes #9

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 12:29:16 -07:00

# Chat API Service

Orchestration layer for samosaChaat conversations. Manages conversation state in PostgreSQL, authenticates every request via the auth service, and proxies streaming inference requests via Server-Sent Events.

## Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/health` | Liveness probe (unauthenticated) |
| GET | `/api/conversations` | List the authenticated user's conversations, grouped by date |
| POST | `/api/conversations` | Create a new conversation |
| GET | `/api/conversations/{id}` | Fetch a conversation with its full message history |
| PUT | `/api/conversations/{id}` | Update the conversation title |
| DELETE | `/api/conversations/{id}` | Delete a conversation (cascade-deletes its messages) |
| POST | `/api/conversations/{id}/messages` | Append a user message and stream the assistant response |
| POST | `/api/conversations/{id}/regenerate` | Delete the last assistant message and regenerate it |
| GET | `/api/models` | Proxy to inference `GET /models` |
| POST | `/api/models/swap` | Proxy to inference `POST /models/swap` (admin only) |
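
The conversation list is returned grouped by date. As a sketch of one way to do that bucketing (the `updated_at` field name and the newest-first ordering are assumptions, not confirmed API behavior):

```python
from collections import defaultdict
from datetime import datetime

def group_by_date(conversations):
    """Bucket conversations by the calendar day of their last update,
    newest day first. Each item needs an 'updated_at' ISO timestamp."""
    groups = defaultdict(list)
    for conv in conversations:
        day = datetime.fromisoformat(conv["updated_at"]).date().isoformat()
        groups[day].append(conv)
    # Rebuild as a plain dict ordered newest day first
    return {day: groups[day] for day in sorted(groups, reverse=True)}
```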

All authenticated endpoints expect an `Authorization: Bearer <jwt>` header. The chat API validates the token by calling the auth service's `POST /auth/validate` endpoint with the shared `X-Internal-API-Key` header and caches the result for 5 minutes.
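
A stdlib sketch of that 5-minute cache. The real service makes an HTTP call to the auth service; here the call is an injected function, and the class name is hypothetical:

```python
import time

class TokenCache:
    """Cache successful token validations for a fixed TTL so the
    auth service is not called on every request."""

    def __init__(self, validate_fn, ttl=300.0):
        self.validate_fn = validate_fn  # token -> user dict (e.g. an HTTP call)
        self.ttl = ttl
        self._cache = {}  # token -> (expires_at, user)

    def validate(self, token):
        entry = self._cache.get(token)
        if entry and entry[0] > time.monotonic():
            return entry[1]  # cache hit, still fresh
        user = self.validate_fn(token)  # e.g. POST /auth/validate
        self._cache[token] = (time.monotonic() + self.ttl, user)
        return user
```

The trade-off of any such cache: a revoked token stays usable until its entry expires, so the TTL bounds the revocation delay.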

## Environment

| Variable | Default | Purpose |
|----------|---------|---------|
| `DATABASE_URL` | `postgresql+asyncpg://localhost/samosachaat` | PostgreSQL connection string |
| `AUTH_SERVICE_URL` | `http://auth:8001` | Base URL of the auth service |
| `INFERENCE_SERVICE_URL` | `http://inference:8000` | Base URL of the inference service |
| `INTERNAL_API_KEY` | (no default) | Shared key for internal service auth |
| `MAX_CONVERSATION_HISTORY` | `50` | Maximum number of messages included in each inference call |
| `MAX_TOKEN_BUDGET` | `6000` | Character budget approximating a token limit on that history |
| `FRONTEND_URL` | `http://localhost:3000` | Origin allowed by CORS |
| `LOG_LEVEL` | `INFO` | Python log level |
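
A stdlib sketch of loading a subset of these variables, with the documented defaults (the service's actual config class may differ; this `Settings` dataclass is illustrative):

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    """Illustrative env-backed config for the chat API."""
    database_url: str
    auth_service_url: str
    internal_api_key: str
    max_conversation_history: int
    log_level: str

    @classmethod
    def from_env(cls, env=os.environ):
        return cls(
            database_url=env.get(
                "DATABASE_URL", "postgresql+asyncpg://localhost/samosachaat"),
            auth_service_url=env.get("AUTH_SERVICE_URL", "http://auth:8001"),
            internal_api_key=env["INTERNAL_API_KEY"],  # no default: required
            max_conversation_history=int(env.get("MAX_CONVERSATION_HISTORY", "50")),
            log_level=env.get("LOG_LEVEL", "INFO"),
        )
```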

## Running locally

```shell
uv pip install -e ".[dev]"
uvicorn src.main:app --reload --port 8002
```

## Running tests

```shell
cd services/chat-api
pytest
```

Tests use SQLite + aiosqlite for a throwaway database, respx to mock the auth service, and hand-crafted httpx mocks for the inference SSE stream.
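
Those hand-crafted SSE mocks boil down to byte chunks in `text/event-stream` framing. A stdlib sketch of assembling `data:` payloads from such a chunked stream (chunk boundaries and payloads here are made up for illustration):

```python
def parse_sse(chunks):
    """Assemble 'data:' payloads from an iterable of byte chunks,
    splitting on the blank line that terminates each SSE event."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # A complete event ends at a blank line; chunks may split anywhere
        while b"\n\n" in buffer:
            event, buffer = buffer.split(b"\n\n", 1)
            for line in event.split(b"\n"):
                if line.startswith(b"data:"):
                    yield line[5:].strip().decode()
```

Because the buffer carries partial events across chunk boundaries, the mock can split the stream at arbitrary byte offsets and still exercise the same reassembly path the proxy uses.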