mirror of https://github.com/karpathy/nanochat.git synced 2026-05-08 16:59:59 +00:00

History

Manmohan 3ab89e7890 feat: deploy d24-sft-r6 with full reasoning mode + live tool use (Tavily) Model R6 (97% pass rate on 33-probe eval, val_bpb 0.2635): - modal/serve.py + modal/_tools.py: tool-aware streaming with TavilySearchBackend auto-detect, python_start/end state machine, output_start/end forcing; mount tavily secret - modal/serve.py: MODEL_TAG=d24-sft-r6, model path points at new SFT r6 - services/chat-api/routes/messages.py: accept thinking_mode flag, inject samosaChaat system prompt (direct or <think> variant) into first user message before streaming to Modal - services/frontend/components/chat/ChatInput.tsx: Brain toggle 'Think' button next to send; when active, model uses think mode - services/frontend/components/chat/ChatWindow.tsx: track thinkingMode state, pass through to API body as thinking_mode - services/frontend/components/chat/MessageBubble.tsx: parse and render <think>...</think> as collapsible italic blocks; <\|python_start\|>...<\|python_end\|> as tool-call cards with icons per tool name; <\|output_start\|>...<\|output_end\|> as result cards with expandable JSON - nanochat/tools.py: TavilySearchBackend class + auto-detect - nanochat/ui.html: legacy UI reasoning toggle (kept for parity) Tool execution verified live: query -> web_search via Tavily -> Macron returned with grounded answer.		2026-04-22 13:43:43 -07:00
..
src	feat: deploy d24-sft-r6 with full reasoning mode + live tool use (Tavily)	2026-04-22 13:43:43 -07:00
Dockerfile	fix(docker): add structlog + prometheus deps to auth and chat-api Dockerfiles	2026-04-16 13:46:53 -07:00
pyproject.toml	feat(observability): Prometheus + Grafana + Loki stack for samosaChaat (#9 )	2026-04-16 12:29:16 -07:00
README.md	feat(chat-api): conversation orchestration + SSE streaming proxy (#6 )	2026-04-16 11:49:51 -07:00

README.md

Chat API Service

Orchestration layer for samosaChaat conversations. Manages conversation state in PostgreSQL, authenticates every request via the auth service, and proxies streaming inference requests via Server-Sent Events.

Endpoints

Method	Path	Description
GET	`/api/health`	Liveness probe (unauthenticated)
GET	`/api/conversations`	List the authenticated user's conversations, grouped by date
POST	`/api/conversations`	Create a new conversation
GET	`/api/conversations/{id}`	Fetch a conversation + full message history
PUT	`/api/conversations/{id}`	Update the conversation title
DELETE	`/api/conversations/{id}`	Delete a conversation (cascade deletes messages)
POST	`/api/conversations/{id}/messages`	Append a user message and stream the assistant response
POST	`/api/conversations/{id}/regenerate`	Delete the last assistant message and regenerate it
GET	`/api/models`	Proxy to inference `GET /models`
POST	`/api/models/swap`	Proxy to inference `POST /models/swap` (admin only)

All authenticated endpoints expect Authorization: Bearer <jwt>. The chat API validates the token by calling the auth service POST /auth/validate with the shared X-Internal-API-Key header and caches the result for 5 minutes.

Environment

Variable	Default	Purpose
`DATABASE_URL`	`postgresql+asyncpg://localhost/samosachaat`	PostgreSQL connection string
`AUTH_SERVICE_URL`	`http://auth:8001`	Base URL of the auth service
`INFERENCE_SERVICE_URL`	`http://inference:8000`	Base URL of the inference service
`INTERNAL_API_KEY`	—	Shared key for internal service auth
`MAX_CONVERSATION_HISTORY`	`50`	Max messages included in each inference call
`MAX_TOKEN_BUDGET`	`6000`	Character budget proxy for the above
`FRONTEND_URL`	`http://localhost:3000`	Origin allowed by CORS
`LOG_LEVEL`	`INFO`	Python log level

Running locally

uv pip install -e ".[dev]"
uvicorn src.main:app --reload --port 8002

Running tests

cd services/chat-api
pytest

Tests use SQLite + aiosqlite for a throwaway database, respx to mock the auth service, and hand-crafted httpx mocks for the inference SSE stream.