Replaces the helm/observability scaffold with a real monitoring stack
wired into the samosaChaat platform.
Helm chart (helm/observability/)
- Chart.yaml declares kube-prometheus-stack (~62.0) and loki-stack
(~2.10) as subchart dependencies.
- values.yaml configures Prometheus (15d retention, 50Gi PVC,
ServiceMonitor + rule selector on app.kubernetes.io/part-of:
samosachaat), Alertmanager (10Gi PVC), Grafana (OAuth-only via
GitHub + Google, local login disabled, Prometheus + Loki datasources,
dashboards auto-provisioned from a ConfigMap, email + Slack contact
points with a critical route to Slack), Loki (50Gi, 30d retention,
tsdb schema), and Promtail (JSON pipeline that lifts level / service
/ trace_id / user_id into labels, scrape config with pod labels).
- Alert rules: HighCPU, HighMemory, DiskSpaceLow, High5xxRate,
InferenceServiceDown, HighP99Latency.
- templates/grafana-dashboards-configmap.yaml renders every file under
dashboards/ into a single grafana_dashboard=1 ConfigMap.
- dashboards/node-health.json, app-performance.json, inference.json -
fully-formed Grafana dashboards with Prometheus datasource variable,
templated app selector, thresholded gauges, and LogQL-ready labels.
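As a hedged illustration of what one of the listed alert rules could look like, here is a sketch using kube-prometheus-stack's `additionalPrometheusRulesMap` values key; the metric name, threshold, and severity label are assumptions, not the chart's actual contents:

```yaml
# Sketch only -- expression, threshold, and labels are assumed.
additionalPrometheusRulesMap:
  samosachaat-alerts:
    groups:
      - name: samosachaat.availability
        rules:
          - alert: High5xxRate
            expr: |
              sum(rate(http_requests_total{status=~"5.."}[5m]))
                / sum(rate(http_requests_total[5m])) > 0.05
            for: 5m
            labels:
              severity: critical   # routed to Slack by the critical route
            annotations:
              summary: "More than 5% of requests are returning 5xx"
```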
Scraping (helm/samosachaat/templates/servicemonitor.yaml)
- ServiceMonitor CRs for auth / chat-api / inference that Prometheus
picks up via the part-of=samosachaat selector; scrapes /metrics
every 15s and replaces the app label so dashboards line up.
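A minimal sketch of what one of these ServiceMonitor CRs could look like (the service name, port name, and exact label keys are assumptions based on the summary above):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: chat-api
  labels:
    app.kubernetes.io/part-of: samosachaat   # matched by the Prometheus selector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: chat-api
  endpoints:
    - port: http                             # assumed port name
      path: /metrics
      interval: 15s
      relabelings:
        - sourceLabels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
          targetLabel: app                   # keeps the dashboards' app label consistent
```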
Application instrumentation
- services/{auth,chat-api,inference} each depend on
prometheus-fastapi-instrumentator and expose /metrics (request count,
latency histograms, in-progress gauges).
- services/auth/src/logging_setup.py and
services/inference/src/logging_setup.py mirror the canonical
chat-api implementation - structlog JSON with service, trace_id,
user_id context injection.
- configure_logging() is called from create_app() in auth and inference;
inference's main.py now uses structlog via get_logger() instead of
logging.getLogger.
- log_level setting added to auth + inference config (LOG_LEVEL env).
Docs
- contracts/logging-standard.md defines the required JSON fields,
Python (structlog) + Node.js (pino) implementations, LogQL examples
for cross-service queries, and the x-trace-id propagation contract.
Closes #9
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Chat API Service
Orchestration layer for samosaChaat conversations. Manages conversation state in PostgreSQL, authenticates every request via the auth service, and proxies streaming inference requests via Server-Sent Events.
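The Server-Sent Events wire format used for streaming can be sketched as follows; the event names (`token`, `done`) and payload shape are hypothetical, not the service's documented protocol:

```python
import json
from typing import Dict, Iterator, Optional


def sse_event(data: Dict, event: Optional[str] = None) -> str:
    """Serialize one SSE frame: an optional `event:` line, a `data:`
    line carrying a JSON payload, and the blank-line terminator."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def stream_tokens(tokens: Iterator[str]) -> Iterator[str]:
    # Hypothetical framing: one `token` event per chunk, then a `done` event.
    for tok in tokens:
        yield sse_event({"token": tok}, event="token")
    yield sse_event({}, event="done")
```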
Endpoints
| Method | Path | Description |
|---|---|---|
| GET | `/api/health` | Liveness probe (unauthenticated) |
| GET | `/api/conversations` | List the authenticated user's conversations, grouped by date |
| POST | `/api/conversations` | Create a new conversation |
| GET | `/api/conversations/{id}` | Fetch a conversation + full message history |
| PUT | `/api/conversations/{id}` | Update the conversation title |
| DELETE | `/api/conversations/{id}` | Delete a conversation (cascade-deletes its messages) |
| POST | `/api/conversations/{id}/messages` | Append a user message and stream the assistant response |
| POST | `/api/conversations/{id}/regenerate` | Delete the last assistant message and regenerate it |
| GET | `/api/models` | Proxy to inference GET `/models` |
| POST | `/api/models/swap` | Proxy to inference POST `/models/swap` (admin only) |
All authenticated endpoints expect an `Authorization: Bearer <jwt>` header. The
chat API validates the token by calling the auth service's `POST /auth/validate`
endpoint with the shared `X-Internal-API-Key` header and caches the result for
5 minutes.
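A minimal sketch of such a time-bounded cache (the real chat-api implementation may differ; the 300-second TTL matches the 5-minute figure above):

```python
import time
from typing import Any, Dict, Optional, Tuple


class TTLCache:
    """Tiny time-based cache: entries expire ttl_seconds after insertion.
    The optional `now` parameter makes the expiry logic easy to test."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, Any]] = {}

    def put(self, key: str, value: Any, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._store[key] = (now + self.ttl, value)

    def get(self, key: str, now: Optional[float] = None) -> Optional[Any]:
        now = time.monotonic() if now is None else now
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if now >= expires_at:
            # Expired: drop it so the next request re-validates the token.
            del self._store[key]
            return None
        return value
```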
Environment
| Variable | Default | Purpose |
|---|---|---|
| `DATABASE_URL` | `postgresql+asyncpg://localhost/samosachaat` | PostgreSQL connection string |
| `AUTH_SERVICE_URL` | `http://auth:8001` | Base URL of the auth service |
| `INFERENCE_SERVICE_URL` | `http://inference:8000` | Base URL of the inference service |
| `INTERNAL_API_KEY` | — | Shared key for internal service auth |
| `MAX_CONVERSATION_HISTORY` | `50` | Max messages included in each inference call |
| `MAX_TOKEN_BUDGET` | `6000` | Character budget proxy for the above |
| `FRONTEND_URL` | `http://localhost:3000` | Origin allowed by CORS |
| `LOG_LEVEL` | `INFO` | Python log level |
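How the two history limits might interact can be sketched as follows; the actual selection logic in chat-api is not shown here, only a plausible reading of the two settings:

```python
from typing import Dict, List


def trim_history(
    messages: List[Dict[str, str]],
    max_messages: int = 50,      # MAX_CONVERSATION_HISTORY default
    char_budget: int = 6000,     # MAX_TOKEN_BUDGET default
) -> List[Dict[str, str]]:
    """Keep the most recent messages, subject to both limits: a hard
    message count and a total-character budget used as a rough token
    proxy. The newest message is always kept, even if it alone exceeds
    the budget."""
    recent = messages[-max_messages:]
    kept: List[Dict[str, str]] = []
    used = 0
    # Walk backwards from the newest message, accumulating content
    # length until the character budget would be exceeded.
    for msg in reversed(recent):
        cost = len(msg.get("content", ""))
        if kept and used + cost > char_budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()
    return kept
```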
Running locally
```bash
uv pip install -e ".[dev]"
uvicorn src.main:app --reload --port 8002
```
Running tests
```bash
cd services/chat-api
pytest
```
Tests use SQLite + aiosqlite for a throwaway database, respx to mock the auth service, and hand-crafted httpx mocks for the inference SSE stream.