Replaces the helm/observability scaffold with a real monitoring stack
wired into the samosaChaat platform.
Helm chart (helm/observability/)
- Chart.yaml declares kube-prometheus-stack (~62.0) and loki-stack
(~2.10) as subchart dependencies.
- values.yaml configures Prometheus (15d retention, 50Gi PVC,
ServiceMonitor + rule selector on app.kubernetes.io/part-of:
samosachaat), Alertmanager (10Gi PVC), Grafana (OAuth-only via
GitHub + Google, local login disabled, Prometheus + Loki datasources,
dashboards auto-provisioned from a ConfigMap, email + Slack contact
points with a critical route to Slack), Loki (50Gi, 30d retention,
tsdb schema), and Promtail (JSON pipeline that lifts level / service
/ trace_id / user_id into labels, scrape config with pod labels).
- Alert rules: HighCPU, HighMemory, DiskSpaceLow, High5xxRate,
InferenceServiceDown, HighP99Latency.
- templates/grafana-dashboards-configmap.yaml renders every file under
dashboards/ into a single grafana_dashboard=1 ConfigMap.
- dashboards/node-health.json, app-performance.json, inference.json -
fully-formed Grafana dashboards with Prometheus datasource variable,
templated app selector, thresholded gauges, and LogQL-ready labels.
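As a hedged illustration of what one of the listed alert rules could look like, here is a sketch using kube-prometheus-stack's `additionalPrometheusRulesMap` values key; the metric name, threshold, and severity label are assumptions, not the chart's actual contents:

```yaml
# Sketch only -- expression, threshold, and labels are assumed.
additionalPrometheusRulesMap:
  samosachaat-alerts:
    groups:
      - name: samosachaat.availability
        rules:
          - alert: High5xxRate
            expr: |
              sum(rate(http_requests_total{status=~"5.."}[5m]))
                / sum(rate(http_requests_total[5m])) > 0.05
            for: 5m
            labels:
              severity: critical   # routed to Slack by the critical route
            annotations:
              summary: "More than 5% of requests are returning 5xx"
```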
Scraping (helm/samosachaat/templates/servicemonitor.yaml)
- ServiceMonitor CRs for auth / chat-api / inference that Prometheus
picks up via the part-of=samosachaat selector; scrapes /metrics
every 15s and replaces the app label so dashboards line up.
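A minimal sketch of what one of these ServiceMonitor CRs could look like (the service name, port name, and exact label keys are assumptions based on the summary above):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: chat-api
  labels:
    app.kubernetes.io/part-of: samosachaat   # matched by the Prometheus selector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: chat-api
  endpoints:
    - port: http                             # assumed port name
      path: /metrics
      interval: 15s
      relabelings:
        - sourceLabels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
          targetLabel: app                   # keeps the dashboards' app label consistent
```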
Application instrumentation
- services/{auth,chat-api,inference} each depend on
prometheus-fastapi-instrumentator and expose /metrics (request count,
latency histograms, in-progress gauges).
- services/auth/src/logging_setup.py and
services/inference/src/logging_setup.py mirror the canonical
chat-api implementation - structlog JSON with service, trace_id,
user_id context injection.
- configure_logging() is called from create_app() in auth and inference;
inference's main.py now uses structlog via get_logger() instead of
logging.getLogger.
- log_level setting added to auth + inference config (LOG_LEVEL env).
Docs
- contracts/logging-standard.md defines the required JSON fields,
Python (structlog) + Node.js (pino) implementations, LogQL examples
for cross-service queries, and the x-trace-id propagation contract.
Closes #9
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Chat API Service
Orchestration layer for samosaChaat conversations. Manages conversation state in PostgreSQL, authenticates every request via the auth service, and proxies streaming inference requests via Server-Sent Events.
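The Server-Sent Events wire format used for streaming can be sketched as follows; the event names (`token`, `done`) and payload shape are hypothetical, not the service's documented protocol:

```python
import json
from typing import Dict, Iterator, Optional


def sse_event(data: Dict, event: Optional[str] = None) -> str:
    """Serialize one SSE frame: an optional `event:` line, a `data:`
    line carrying a JSON payload, and the blank-line terminator."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def stream_tokens(tokens: Iterator[str]) -> Iterator[str]:
    # Hypothetical framing: one `token` event per chunk, then a `done` event.
    for tok in tokens:
        yield sse_event({"token": tok}, event="token")
    yield sse_event({}, event="done")
```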
Endpoints
| Method | Path | Description |
|---|---|---|
| GET | `/api/health` | Liveness probe (unauthenticated) |
| GET | `/api/conversations` | List the authenticated user's conversations, grouped by date |
| POST | `/api/conversations` | Create a new conversation |
| GET | `/api/conversations/{id}` | Fetch a conversation + full message history |
| PUT | `/api/conversations/{id}` | Update the conversation title |
| DELETE | `/api/conversations/{id}` | Delete a conversation (cascade-deletes its messages) |
| POST | `/api/conversations/{id}/messages` | Append a user message and stream the assistant response |
| POST | `/api/conversations/{id}/regenerate` | Delete the last assistant message and regenerate it |
| GET | `/api/models` | Proxy to inference GET `/models` |
| POST | `/api/models/swap` | Proxy to inference POST `/models/swap` (admin only) |
All authenticated endpoints expect an `Authorization: Bearer <jwt>` header. The
chat API validates the token by calling the auth service's `POST /auth/validate`
endpoint with the shared `X-Internal-API-Key` header and caches the result for
5 minutes.
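A minimal sketch of such a time-bounded cache (the real chat-api implementation may differ; the 300-second TTL matches the 5-minute figure above):

```python
import time
from typing import Any, Dict, Optional, Tuple


class TTLCache:
    """Tiny time-based cache: entries expire ttl_seconds after insertion.
    The optional `now` parameter makes the expiry logic easy to test."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, Any]] = {}

    def put(self, key: str, value: Any, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._store[key] = (now + self.ttl, value)

    def get(self, key: str, now: Optional[float] = None) -> Optional[Any]:
        now = time.monotonic() if now is None else now
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if now >= expires_at:
            # Expired: drop it so the next request re-validates the token.
            del self._store[key]
            return None
        return value
```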
Environment
| Variable | Default | Purpose |
|---|---|---|
| `DATABASE_URL` | `postgresql+asyncpg://localhost/samosachaat` | PostgreSQL connection string |
| `AUTH_SERVICE_URL` | `http://auth:8001` | Base URL of the auth service |
| `INFERENCE_SERVICE_URL` | `http://inference:8000` | Base URL of the inference service |
| `INTERNAL_API_KEY` | — | Shared key for internal service auth |
| `MAX_CONVERSATION_HISTORY` | `50` | Max messages included in each inference call |
| `MAX_TOKEN_BUDGET` | `6000` | Character budget proxy for the above |
| `FRONTEND_URL` | `http://localhost:3000` | Origin allowed by CORS |
| `LOG_LEVEL` | `INFO` | Python log level |
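How the two history limits might interact can be sketched as follows; the actual selection logic in chat-api is not shown here, only a plausible reading of the two settings:

```python
from typing import Dict, List


def trim_history(
    messages: List[Dict[str, str]],
    max_messages: int = 50,      # MAX_CONVERSATION_HISTORY default
    char_budget: int = 6000,     # MAX_TOKEN_BUDGET default
) -> List[Dict[str, str]]:
    """Keep the most recent messages, subject to both limits: a hard
    message count and a total-character budget used as a rough token
    proxy. The newest message is always kept, even if it alone exceeds
    the budget."""
    recent = messages[-max_messages:]
    kept: List[Dict[str, str]] = []
    used = 0
    # Walk backwards from the newest message, accumulating content
    # length until the character budget would be exceeded.
    for msg in reversed(recent):
        cost = len(msg.get("content", ""))
        if kept and used + cost > char_budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()
    return kept
```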
Running locally
```bash
uv pip install -e ".[dev]"
uvicorn src.main:app --reload --port 8002
```
Running tests
```bash
cd services/chat-api
pytest
```
Tests use SQLite + aiosqlite for a throwaway database, respx to mock the auth service, and hand-crafted httpx mocks for the inference SSE stream.