nanochat/services/chat-api
Manmohan 3ab89e7890
feat: deploy d24-sft-r6 with full reasoning mode + live tool use (Tavily)
Model R6 (97% pass rate on 33-probe eval, val_bpb 0.2635):
- modal/serve.py + modal/_tools.py: tool-aware streaming with
  TavilySearchBackend auto-detect, a python_start/end state machine,
  and output_start/end forcing; mount the Tavily secret
- modal/serve.py: MODEL_TAG=d24-sft-r6, model path points at new SFT r6
- services/chat-api/routes/messages.py: accept thinking_mode flag,
  inject samosaChaat system prompt (direct or <think> variant) into
  first user message before streaming to Modal
- services/frontend/components/chat/ChatInput.tsx: Brain toggle
  'Think' button next to send; when active, model uses think mode
- services/frontend/components/chat/ChatWindow.tsx: track
  thinkingMode state, pass through to API body as thinking_mode
- services/frontend/components/chat/MessageBubble.tsx: parse and
  render <think>...</think> as collapsible italic blocks;
  <|python_start|>...<|python_end|> as tool-call cards with icons
  per tool name; <|output_start|>...<|output_end|> as result cards
  with expandable JSON
- nanochat/tools.py: TavilySearchBackend class + auto-detect
- nanochat/ui.html: legacy UI reasoning toggle (kept for parity)

Tool execution verified live: a query about Macron routed to web_search
via Tavily and came back with a grounded answer.
2026-04-22 13:43:43 -07:00
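The special-token handling described above can be sketched as a segmenter over generated text. This is an illustrative reconstruction, not the actual `modal/_tools.py` code: the token names come from the commit, but the function name and shape are hypothetical, and it buffers the full text rather than running incrementally over the stream.

```python
import re

# Hypothetical sketch: split generated text into plain-text, tool-call,
# and tool-output segments based on the special tokens named above.
SEGMENT_RE = re.compile(
    r"<\|python_start\|>(?P<tool>.*?)<\|python_end\|>"
    r"|<\|output_start\|>(?P<out>.*?)<\|output_end\|>",
    re.DOTALL,
)

def segment(text: str) -> list[tuple[str, str]]:
    """Return (kind, content) pairs: kind is 'text', 'tool', or 'output'."""
    parts: list[tuple[str, str]] = []
    pos = 0
    for m in SEGMENT_RE.finditer(text):
        if m.start() > pos:                      # plain text before the marker
            parts.append(("text", text[pos:m.start()]))
        if m.group("tool") is not None:
            parts.append(("tool", m.group("tool")))
        else:
            parts.append(("output", m.group("out")))
        pos = m.end()
    if pos < len(text):                          # trailing plain text
        parts.append(("text", text[pos:]))
    return parts
```

A UI like MessageBubble.tsx could render `tool` segments as tool-call cards and `output` segments as result cards, as the commit describes.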

Chat API Service

Orchestration layer for samosaChaat conversations. It manages conversation state in PostgreSQL, authenticates every request through the auth service, and proxies streaming inference responses to clients via Server-Sent Events (SSE).
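The read side of that SSE proxy can be sketched minimally. This assumes the upstream emits `data:` lines carrying JSON payloads and a `[DONE]` sentinel; both are assumptions here, and a real SSE parser would also handle multi-line `data` fields, `event:`/`id:` fields, and reconnection.

```python
import json
from typing import Iterable, Iterator

def iter_sse_data(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each `data:` line, stopping at [DONE].

    Minimal sketch of the proxy's read side, not the service's code.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue                              # skip blanks and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return                                # assumed end-of-stream sentinel
        yield json.loads(payload)
```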

Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/health | Liveness probe (unauthenticated) |
| GET | /api/conversations | List the authenticated user's conversations, grouped by date |
| POST | /api/conversations | Create a new conversation |
| GET | /api/conversations/{id} | Fetch a conversation + full message history |
| PUT | /api/conversations/{id} | Update the conversation title |
| DELETE | /api/conversations/{id} | Delete a conversation (cascade deletes messages) |
| POST | /api/conversations/{id}/messages | Append a user message and stream the assistant response |
| POST | /api/conversations/{id}/regenerate | Delete the last assistant message and regenerate it |
| GET | /api/models | Proxy to inference GET /models |
| POST | /api/models/swap | Proxy to inference POST /models/swap (admin only) |
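Per the deploy notes above, the messages endpoint accepts a `thinking_mode` flag and injects a samosaChaat system prompt into the first user message before streaming to Modal. A hypothetical sketch of that injection; the function name and prompt texts are placeholders, not the actual `routes/messages.py` code:

```python
# Placeholder prompts: the real direct and <think>-variant prompts are
# not shown in the source.
DIRECT_PROMPT = "You are samosaChaat. Answer directly."
THINK_PROMPT = "You are samosaChaat. Reason step by step inside <think> tags."

def inject_system_prompt(messages: list[dict], thinking_mode: bool) -> list[dict]:
    """Prepend the chosen system prompt to the first user message
    before forwarding the conversation to the inference backend."""
    prompt = THINK_PROMPT if thinking_mode else DIRECT_PROMPT
    out = [dict(m) for m in messages]            # don't mutate the caller's list
    for m in out:
        if m["role"] == "user":
            m["content"] = f"{prompt}\n\n{m['content']}"
            break
    return out
```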

All authenticated endpoints expect Authorization: Bearer <jwt>. The chat API validates the token by calling the auth service POST /auth/validate with the shared X-Internal-API-Key header and caches the result for 5 minutes.

Environment

| Variable | Default | Purpose |
| --- | --- | --- |
| DATABASE_URL | postgresql+asyncpg://localhost/samosachaat | PostgreSQL connection string |
| AUTH_SERVICE_URL | http://auth:8001 | Base URL of the auth service |
| INFERENCE_SERVICE_URL | http://inference:8000 | Base URL of the inference service |
| INTERNAL_API_KEY | (none) | Shared key for internal service auth |
| MAX_CONVERSATION_HISTORY | 50 | Max messages included in each inference call |
| MAX_TOKEN_BUDGET | 6000 | Character budget proxy for the above |
| FRONTEND_URL | http://localhost:3000 | Origin allowed by CORS |
| LOG_LEVEL | INFO | Python log level |
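One way the two history limits might interact is sketched below: keep the newest messages that fit both the message cap and the character budget. The strategy and function name are assumptions for illustration, not necessarily what the service does.

```python
def truncate_history(messages: list[dict], max_messages: int = 50,
                     char_budget: int = 6000) -> list[dict]:
    """Keep the most recent messages that fit both limits (sketch).

    MAX_CONVERSATION_HISTORY caps the count; MAX_TOKEN_BUDGET is
    approximated in characters, per the table above. Walk backwards
    so the newest messages survive.
    """
    kept: list[dict] = []
    used = 0
    for m in reversed(messages):
        cost = len(m["content"])
        if len(kept) >= max_messages or used + cost > char_budget:
            break
        kept.append(m)
        used += cost
    kept.reverse()                               # restore chronological order
    return kept
```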

Running locally

```shell
uv pip install -e ".[dev]"
uvicorn src.main:app --reload --port 8002
```

Running tests

```shell
cd services/chat-api
pytest
```

Tests use SQLite + aiosqlite for a throwaway database, respx to mock the auth service, and hand-crafted httpx mocks for the inference SSE stream.