nanochat

mirror of https://github.com/karpathy/nanochat.git synced 2026-05-07 16:30:11 +00:00

Author	SHA1	Message	Date
Manmohan Sharma	7a92f5b016	fix(serve): detect tool markers in text stream not token ids The SFT loader tokenizes assistant content with .encode() (ordinary), not .encode_special(), so the model was trained to emit <\|python_start\|> / <\|python_end\|> as the 7-token ordinary sequence [60, 124, 25145, 95, 17104, 124, 62] rather than as special token id 32764. My prior state-machine matched token_id == python_start_id, which never fired — so tool calls were never executed and the model just hallucinated fake tool results (Official leadership page etc). Fix: detect markers in the decoded text stream, parse the payload between <\|python_start\|> and <\|python_end\|>, execute the tool, inject the real <\|output_start\|>…<\|output_end\|> tokens into both the SSE stream and the model's input_ids. Next-token prediction is now grounded on real Tavily output.	2026-04-22 14:39:36 -07:00
Manmohan Sharma	f642cb2eb6	feat(sft): add r7 think+tool prep scripts and compose cleanup - allow assistant list-shaped content in CustomJSON for joint think+tool JSONL - add gen_joint_think_tool, filter_reasoning_jsonl, eval_suite_v2 (think_plus_tool probes) - fix CI: uv sync --no-install-workspace; uv run pytest - remove unused local inference service from compose; document Modal URL in env examples Made-with: Cursor	2026-04-22 14:22:47 -07:00
Manmohan	38cb7f7596	Merge pull request #45 from manmohan659/fix/tavily-direct-answer fix(tools): Tavily include_answer + tool-card overflow	2026-04-22 17:21:08 -04:00
Manmohan Sharma	f70be25212	fix(tools): enable Tavily include_answer and fix UI overflow	2026-04-22 14:20:47 -07:00
Manmohan	d747bcf3e3	Merge pull request #44 from manmohan659/feat/r6-reasoning-tools feat: R6 deploy + reasoning mode toggle + live tool execution	2026-04-22 16:46:25 -04:00
Manmohan	3ab89e7890	feat: deploy d24-sft-r6 with full reasoning mode + live tool use (Tavily) Model R6 (97% pass rate on 33-probe eval, val_bpb 0.2635): - modal/serve.py + modal/_tools.py: tool-aware streaming with TavilySearchBackend auto-detect, python_start/end state machine, output_start/end forcing; mount tavily secret - modal/serve.py: MODEL_TAG=d24-sft-r6, model path points at new SFT r6 - services/chat-api/routes/messages.py: accept thinking_mode flag, inject samosaChaat system prompt (direct or <think> variant) into first user message before streaming to Modal - services/frontend/components/chat/ChatInput.tsx: Brain toggle 'Think' button next to send; when active, model uses think mode - services/frontend/components/chat/ChatWindow.tsx: track thinkingMode state, pass through to API body as thinking_mode - services/frontend/components/chat/MessageBubble.tsx: parse and render <think>...</think> as collapsible italic blocks; <\|python_start\|>...<\|python_end\|> as tool-call cards with icons per tool name; <\|output_start\|>...<\|output_end\|> as result cards with expandable JSON - nanochat/tools.py: TavilySearchBackend class + auto-detect - nanochat/ui.html: legacy UI reasoning toggle (kept for parity) Tool execution verified live: query -> web_search via Tavily -> Macron returned with grounded answer.	2026-04-22 13:43:43 -07:00
Manmohan	67f568a4f2	fix(nginx): re-resolve upstream IPs so deploys don't break auth (#43 ) When docker compose recreates a service, it gets a new internal IP. nginx was resolving upstream hostnames once at startup and serving 502 until someone manually restarted it — which is what broke /api/auth after the last deploy. Uses Docker Compose's embedded DNS (127.0.0.11) and moves each proxy_pass onto a variable so nginx re-resolves every request. Rewrites replace the path-stripping behavior that variable-form proxy_pass doesn't provide out of the box. Also adds a `nginx -t && nginx -s reload` step in the deploy workflow so future nginx.conf edits land without manual ssh. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-16 20:41:01 -04:00
Manmohan	94bec5f2a0	fix(frontend): assistant messages fill the chat column (#42 ) Assistant responses were capped at max-w-[75%] of the column, so long replies broke into a narrow block with dead space on the right. Cap only applies to user bubbles now; assistant messages use w-full of the max-w-3xl content column, matching how ChatGPT/Claude render replies. Also bumps message vertical spacing from mb-3 to mb-5. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-16 20:23:56 -04:00
Manmohan	748d2e561c	fix(frontend): widen nav pill, default to dark theme (#41 ) LandingNav was max-w-3xl which forced "How it works" and "Try samosaChaat" to wrap on two lines. Bumps the pill to 1100px, tightens the link padding, demotes the @ handle to lg+, and adds whitespace-nowrap to every chip so nothing wraps again. Default theme is now dark — the no-flash init script adds .dark unless the user has explicitly stored 'light', and the useTheme hook seeds from the same logic. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-16 20:08:55 -04:00
Manmohan	9a45f0924d	fix(ci): grant id-token write so EC2 deploy can assume the OIDC role (#40 ) aws-actions/configure-aws-credentials needs id-token: write to mint the OIDC JWT and assume AWS_ROLE_ARN. Without it the deploy-ec2 workflow fails at the credentials step. Add the permission at workflow scope. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-16 19:59:14 -04:00
Manmohan	1d2a76eec4	feat: deploy d24 SFT + polished UI redesign with dark mode (#39 ) * feat(inference): deploy d24 SFT weights to Modal Repoint Modal inference app from the broken d20 checkpoint to our own ManmohanSharma/nanochat-d24 SFT step 484. Rewrites the standalone model as an inference-only port of nanochat/gpt.py so the modern architecture (smear gate, per-layer value embeddings, ve_gate, backout, sliding window attention via SDPA, rotary base 100000, padded vocab, logit softcap) loads cleanly from the checkpoint. Tokenizer loads the pickled tiktoken encoding directly so special tokens end up at their true IDs (32759-32767), and the stop check uses that set instead of hardcoded 0-8. GPU bumped to L4 for headroom. HF token sourced from the 'huggingface' Modal secret. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(frontend): polished redesign with serif display + dark mode Lifts the craft level of the landing and chat UI without changing the desi identity. Adds Fraunces for display headlines, a floating pill LandingNav, a saffron-glow hero with a large serif headline and black pill CTAs, and three gradient-tiled feature cards with inline SVG glyphs replacing the emoji cards. The chat empty state is now a serif greeting with pill-chip prompt starters, and ChatInput is a single rounded pod so the send button sits inside the input (fixes the misaligned floating button). Adds a class-based dark mode across the chat surfaces with a sun/moon toggle in the sidebar footer, powered by a small useTheme hook and a no-flash init script in the root layout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(frontend): add ESLint config so CI lint step passes next lint was failing with an interactive prompt because the repo had no ESLint config. Adds a minimal next/core-web-vitals extends and drops the now-unloadable @typescript-eslint/no-explicit-any disable directive in the stream proxy by narrowing the body type to unknown. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-16 19:55:16 -04:00
Manmohan	272086d2c0	Merge pull request #38 from manmohan659/fix/modal-url-detection fix(frontend): fix stale closure - tokens now render	2026-04-16 18:16:07 -04:00
Manmohan Sharma	16f40ceb54	fix(frontend): pass assistantMsgId directly to fix stale closure bug	2026-04-16 15:15:53 -07:00
Manmohan	7387f7c1d1	Merge pull request #37 from manmohan659/fix/modal-url-detection fix: direct SSE streaming from chat-api (bypass Next.js proxy)	2026-04-16 18:08:59 -04:00
Manmohan Sharma	a873b6ad46	fix: stream directly from chat-api, bypass Next.js proxy Replaced the double-proxy (browser→Next.js→chat-api→Modal) with direct streaming (browser→nginx→chat-api→Modal). Added nginx route for /api/conversations → chat-api. Inlined SSE parsing in ChatWindow instead of useSSE hook going through /api/chat/stream. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 15:08:46 -07:00
Manmohan	e9885b2583	Merge pull request #36 from manmohan659/fix/modal-url-detection fix(chat-api): detect Modal URL by domain not path	2026-04-16 17:59:34 -04:00
Manmohan Sharma	df0584b861	fix(chat-api): detect Modal URL by domain not path suffix	2026-04-16 14:59:20 -07:00
Manmohan	2dd914a69d	Merge pull request #35 from manmohan659/fix/stream-body-format fix(frontend): type fix for proxyUpstream	2026-04-16 17:53:02 -04:00
Manmohan Sharma	7ecd8a928c	fix(frontend): use any type for proxyUpstream body param	2026-04-16 14:52:50 -07:00
Manmohan	15bb2324e2	Merge pull request #34 from manmohan659/fix/stream-body-format fix(frontend): add maxTokens to StreamBody type	2026-04-16 17:51:15 -04:00
Manmohan Sharma	fe34250900	fix(frontend): add maxTokens to StreamBody interface	2026-04-16 14:51:03 -07:00
Manmohan	c5d4d17650	Merge pull request #33 from manmohan659/fix/stream-body-format fix(frontend): correct body format for chat-api messages	2026-04-16 17:49:33 -04:00
Manmohan Sharma	faf4810696	fix(frontend): send correct body format to chat-api messages endpoint Chat-api expects {content, temperature, max_tokens, top_k} but frontend was sending {messages: [...]}. Now extracts last user message as content when proxying to /api/conversations/:id/messages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 14:49:22 -07:00
Manmohan	9071685d85	Merge pull request #32 from manmohan659/fix/nginx-routing fix(nginx): route /api/* through frontend not chat-api	2026-04-16 17:46:09 -04:00
Manmohan Sharma	3f7a7da30b	fix(nginx): route all /api/* through frontend, not directly to chat-api Nginx was catching /api/chat/stream and /api/conversations and sending them to chat-api:8002, bypassing the frontend's Next.js API routes. Now only /api/auth/* goes directly to auth service. Everything else goes to frontend, which proxies internally to backend services. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 14:45:49 -07:00
Manmohan	129553b215	Merge pull request #31 from manmohan659/fix/chat-api-fk fix(chat-api): defer users FK to avoid startup crash	2026-04-16 17:41:05 -04:00
Manmohan Sharma	e8222011d9	fix(chat-api): use_alter on users FK to avoid metadata resolution error Chat-api doesn't define the users model (owned by auth service), so SQLAlchemy can't resolve the FK. use_alter=True defers the constraint to ALTER TABLE, avoiding the NoReferencedTableError at startup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 14:40:45 -07:00
Manmohan	5b6eff82e8	Merge pull request #30 from manmohan659/feat/modal-inference fix(chat-api): support Modal inference URL pattern	2026-04-16 17:36:56 -04:00
Manmohan Sharma	6d3e1f0afd	fix(chat-api): support Modal inference URL in inference client The inference client now auto-detects if the URL already ends with /generate (Modal's endpoint URL pattern) and skips appending the path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 14:36:36 -07:00
Manmohan	95b1ffc0fd	Merge pull request #29 from manmohan659/feat/modal-inference feat(modal): Modal GPU inference endpoint	2026-04-16 17:32:22 -04:00
Manmohan Sharma	e5b4db1eee	feat(modal): add Modal GPU inference endpoint for samosaChaat - modal/serve.py: FastAPI endpoint on Modal T4 GPU, streams SSE tokens - modal/_model.py: Standalone GPT model (auto-detects architecture from checkpoint) - modal/_tokenizer.py: Standalone BPE tokenizer (tiktoken-based) - Downloads nanochat-students/base-d20 weights from HuggingFace - Deployed at: https://manmohan659--samosachaat-inference-inference-generate.modal.run Deploy: modal deploy modal/serve.py Dev: modal serve modal/serve.py Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 14:32:09 -07:00
Manmohan	40ce6c1a89	Merge pull request #28 from manmohan659/fix/ui-redesign fix(ui): redesign landing page + chat UI	2026-04-16 17:05:16 -04:00
Manmohan Sharma	36debd8502	fix(frontend): redesign landing and chat pages for warm, premium look Landing page: warm gradient background, illustrations flanking hero text (180-220px), new tagline, features section with 3 cards, footer updated to "Built by Manmohan", gold CTA and nav buttons, toran moved to hero. Chat page: removed "Chat Completions" header, added samosa logo and bigger suggestion cards to empty state, sidebar empty state message, input area top border/shadow, more prominent new chat button. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 14:03:55 -07:00
Manmohan	b0df1dca2e	Merge pull request #27 from manmohan659/fix/docker-compose-env fix(docker): pass missing auth env vars in docker-compose	2026-04-16 16:54:03 -04:00
Manmohan Sharma	b7971313ba	fix(docker): pass missing env vars to auth service AUTH_BASE_URL, FRONTEND_URL, INTERNAL_API_KEY, SESSION_SECRET, COOKIE_SECURE, COOKIE_DOMAIN, REFRESH_COOKIE_NAME were in .env but not passed to auth container. OAuth callbacks were using localhost:8001 instead of the public URL. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 13:53:52 -07:00
Manmohan	b5fbebb63f	Merge pull request #26 from manmohan659/fix/missing-models fix: add missing SQLAlchemy models to auth and chat-api	2026-04-16 16:50:22 -04:00
Manmohan Sharma	8a95a76522	fix: add missing models/ dirs to auth and chat-api services Root .gitignore had `models/` which matched both ML weights AND SQLAlchemy model dirs. Changed to `/models/` (root only). Added auth/src/models/ (User) and chat-api/src/models/ (Conversation, Message). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 13:50:08 -07:00
Manmohan	8885f52ba1	Merge pull request #25 from manmohan659/fix/docker-deps fix(docker): add missing deps to auth and chat-api Dockerfiles	2026-04-16 16:47:06 -04:00
Manmohan Sharma	2061f8848b	fix(docker): add structlog + prometheus deps to auth and chat-api Dockerfiles Auth service was crash-looping with ModuleNotFoundError for prometheus_fastapi_instrumentator. Chat-api was also missing it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 13:46:53 -07:00
Manmohan	bfa34a8a0e	Merge pull request #24 from manmohan659/feat/e2e-integration feat(frontend): wire frontend to real backend auth + chat-api	2026-04-16 16:23:05 -04:00
Manmohan Sharma	aa7a907063	feat(frontend): wire frontend to real backend auth + chat-api services Remove NextAuth and replace with token-based auth against the backend auth service (OAuth + JWT). The frontend now redirects login to /api/auth/google and /api/auth/github (proxied by nginx to the auth service), captures the JWT from the redirect query param, and uses it for all API calls. Key changes: - Remove next-auth dependency and all NextAuth config/routes - Add lib/auth-client.ts (JWT token storage + auth headers) - Add hooks/useAuth.ts (client-side auth state + token capture) - Rewrite middleware.ts to pass-through (client-side auth only) - Login page uses plain <a> links to /api/auth/{provider} - Chat page captures access_token from OAuth redirect - Zustand store fetches conversations from real chat-api via JWT - API routes proxy /api/conversations/* to chat-api with auth - chat/stream route supports conversationId + auth header forwarding - useSSE hook accepts auth headers for authenticated streaming - Sidebar loads conversations from API, supports delete - Landing page (Hero, LandingNav) uses useAuth instead of useSession - Add .env.production.example and scripts/generate-jwt-keys.sh Mock echo fallback preserved when CHAT_API_URL is not set. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 13:21:38 -07:00
Manmohan	7e6ecc1d43	Merge pull request #23 from manmohan659/feat/dual-deploy feat(deploy): dual-mode deploy switch (EC2 monolith + EKS)	2026-04-16 15:58:10 -04:00
Manmohan Sharma	b766dcf703	feat(deploy): add dual-mode deploy switch (EC2 monolith + EKS) - deploy.sh: single script to switch between EC2 and EKS modes - ec2: docker-compose with ECR images + nginx SSL reverse proxy - eks: terraform apply + helm install (for demos/grading) - eks-down: terraform destroy (stop costs) - docker-compose.prod.yml: ECR image overrides + nginx service - nginx/nginx.conf: reverse proxy with SSL, SSE streaming support - deploy-ec2.yml: auto-deploy to EC2 after images are built - Remove old single-server deploy.yml Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 12:57:57 -07:00
Manmohan	9095cf01a8	Merge pull request #22 from manmohan659/fix/inference-lockfile fix(inference): regenerate uv.lock for new deps	2026-04-16 15:49:19 -04:00
Manmohan Sharma	07892c0f00	fix(inference): regenerate uv.lock after structlog/prometheus deps added The observability PR added structlog and prometheus-fastapi-instrumentator to inference pyproject.toml but did not regenerate uv.lock, causing Docker build to fail with --locked flag. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 12:49:05 -07:00
Manmohan	3c0b1ae16b	Merge pull request #21 from manmohan659/fix/ci-and-frontend fix(ci): use setup-uv and --no-workspace for service tests	2026-04-16 15:36:27 -04:00
Manmohan Sharma	66bac1aa5f	fix(ci): use astral-sh/setup-uv and --no-workspace for service tests Root pyproject.toml uses uv features (extra in sources, conflicts) that caused uv sync to fail in CI. Fix by: 1. Replace pip install uv==0.4.30 with astral-sh/setup-uv@v4 (latest) 2. Add --no-workspace flag so services don't inherit root config Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 12:35:41 -07:00
Manmohan	6f19a7c28c	Merge pull request #20 from manmohan659/feat/observability-stack feat(observability): Prometheus + Grafana + Loki stack (#9)	2026-04-16 15:32:22 -04:00
Manmohan	8a113d4757	Merge pull request #19 from manmohan659/feat/day2-operations feat(ops): Day 2 operations automation and chaos readiness (#10)	2026-04-16 15:32:19 -04:00
Manmohan Sharma	aa0818aae2	feat(observability): Prometheus + Grafana + Loki stack for samosaChaat (#9 ) Replaces the helm/observability scaffold with a real monitoring stack wired into the samosaChaat platform. Helm chart (helm/observability/) - Chart.yaml declares kube-prometheus-stack (~62.0) and loki-stack (~2.10) as subchart dependencies. - values.yaml configures Prometheus (15d retention, 50Gi PVC, ServiceMonitor + rule selector on app.kubernetes.io/part-of: samosachaat), Alertmanager (10Gi PVC), Grafana (OAuth-only via GitHub + Google, local login disabled, Prometheus + Loki datasources, dashboards auto-provisioned from a ConfigMap, email + Slack contact points with a critical route to Slack), Loki (50Gi, 30d retention, tsdb schema), and Promtail (JSON pipeline that lifts level / service / trace_id / user_id into labels, scrape config with pod labels). - Alert rules: HighCPU, HighMemory, DiskSpaceLow, High5xxRate, InferenceServiceDown, HighP99Latency. - templates/grafana-dashboards-configmap.yaml renders every file under dashboards/ into a single grafana_dashboard=1 ConfigMap. - dashboards/node-health.json, app-performance.json, inference.json - fully-formed Grafana dashboards with Prometheus datasource variable, templated app selector, thresholded gauges, and LogQL-ready labels. Scraping (helm/samosachaat/templates/servicemonitor.yaml) - ServiceMonitor CRs for auth / chat-api / inference that Prometheus picks up via the part-of=samosachaat selector; scrapes /metrics every 15s and replaces the app label so dashboards line up. Application instrumentation - services/{auth,chat-api,inference} each depend on prometheus-fastapi-instrumentator and expose /metrics (request count, latency histograms, in-progress gauges). - services/auth/src/logging_setup.py and services/inference/src/logging_setup.py mirror the canonical chat-api implementation - structlog JSON with service, trace_id, user_id context injection. - configure_logging() is called at create_app() in auth and inference; inference's main.py now uses structlog via get_logger() instead of logging.getLogger. - log_level setting added to auth + inference config (LOG_LEVEL env). Docs - contracts/logging-standard.md defines the required JSON fields, Python (structlog) + Node.js (pino) implementations, LogQL examples for cross-service queries, and the x-trace-id propagation contract. Closes #9 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-16 12:29:16 -07:00

1 2 3 4 5 ...

425 Commits