nanochat

mirror of https://github.com/karpathy/nanochat.git synced 2026-05-10 09:50:25 +00:00

Author	SHA1	Message	Date
Manmohan Sharma	8b360f5bc8	fix: veto matches shorthand 'u' and 'r' for you/are	2026-04-22 16:10:59 -07:00
Manmohan Sharma	6069a7329b	fix: search veto for identity+greetings, grounding suffix for tool results Two bugs: (1) force-web-search toggle bypassed identity veto: 'who are u' with Search on hit Tavily and got personality-quiz garbage. Now we always check _is_identity_or_meta() which covers identity, creator, samosaChaat references AND greetings (hi/hello/hey/what's up) before honoring the force toggle. (2) Model ignored injected Tavily result and answered from training priors (e.g. generic VP bio instead of specific Armenia/Iran facts). Added a grounding suffix after <|output_end|> ('Based on the search results above, ' for web_search, 'The result is ' for calculator) so the model's next tokens condition on the fresh tool output instead of spinning up memory.	2026-04-22 16:08:59 -07:00
Manmohan Sharma	bd37163138	fix: auto-inject calculator tool call on arithmetic in user message needs_calculator now extracts the actual expression from: bare arithmetic (900+100), verbal math (900 plus 100), percentage (17% tip on 45), with comma-stripping and whitespace normalization. serve.py wires it into the force-prefix path parallel to web_search — if no web-search trigger, check calculator, pre-seed real tool call + result so the model sees the grounded answer in context.	2026-04-22 16:04:26 -07:00
Manmohan Sharma	2e5cf45f86	fix(classifier): resolve pronouns from conversation history + roadmap Adds needs_web_search_contextual(messages) that picks the subject from the most recent user turn and replaces him/her/it in the current query. Vetoes when prior turns were about identity. Also adds TRAINING_ROADMAP.md — six-phase plan (tokens redacted).	2026-04-22 15:43:57 -07:00
Manmohan Sharma	fd8e10a820	fix(classifier): expand identity veto to cover all self-introspection queries Added patterns for: tell me about yourself / you / about you, what do/can you do, what are your capabilities / skills, how do you work, what are you good at, what's your purpose / story / mission, where did you come from, how were you built, are you an AI / chatbot / language model, model meta (model/version/context/training cutoff), creator socials (github/linkedin/twitter), and more writing tasks (song, joke). All 27 identity cases now short-circuit without hitting Tavily.	2026-04-22 15:25:33 -07:00
Manmohan Sharma	5e3b17e990	fix(classifier): veto identity/meta/greeting/writing queries from web_search The heuristic classifier was triggering web_search on 'who is your creator', 'who is manmohan sharma', 'who created you' etc — which returned irrelevant Tavily results (Tyler the Creator, Waaree CFO) when the model's SFT training already has the correct grounded identity answer. Added _IDENTITY_VETO_PATTERNS covering: self-referential questions, creator/maker/developer queries, competitor/provenance attacks (are you chatgpt/made by openai), samosaChaat/Manmohan name references, meta-questions (parameters/architecture/training/open source), greetings (hi/hello/hey), small talk, and writing/reasoning tasks that the model answers from memory. Veto runs before all positive classification.	2026-04-22 15:24:08 -07:00
Manmohan Sharma	215e8bd8c3	feat(ui): add Search toggle that forces web_search every message New Globe/'Search' toggle next to the Brain/'Think' button. When ON, every message sent pushes force_web_search=true through: frontend -> chat-api -> Modal. Modal bypasses the heuristic classifier and always pre-seeds the assistant turn with a real Tavily-grounded tool call + result. Toggle is independent of Think — use either or both. Classifier still runs when toggle is OFF, so auto-detection of 'current president' / 'latest weather' / etc still works without any user action.	2026-04-22 15:20:45 -07:00
Manmohan Sharma	57be688fdc	fix(serve): don't scan our own injected tokens for the loop-break check Bug: after runtime tool injection, the post-injection break scanned gen_ids[pre_injection_len:] which included our own injected <|output_start|>…<|output_end|>, so the loop-break fired IMMEDIATELY and stopped the turn before the model could write its final answer. Visible on multi-turn queries like a follow-up 'tell me more about him' where the model naturally issued a tool call, got real Tavily output, and then got cut off. Fix: track post_injection_start (the index AFTER injected tokens) and only scan from there for stray markers.	2026-04-22 15:15:34 -07:00
Manmohan Sharma	297bc4bfb9	fix(serve): strip system-prompt prefix before classifying user query	2026-04-22 15:05:52 -07:00
Manmohan Sharma	4628d53d67	fix(tools): force web_search on tool-worthy queries + strip orphan markers in UI Adds modal/_query_classifier.py with regex patterns covering time-sensitive queries (current/present/latest/today/weather/CEO/president/stock/news/sports/etc). Modal serve.py classifies each user message and, when it matches, pre-seeds the assistant turn with a real Tavily-backed tool call + result, so 'whos the present president' now triggers web_search the same as 'current president'. Also tightens the post-injection break to fire on any leaked tool marker. UI: MessageBubble.tsx now strips orphan close-tags (<|output_end|> without an open), dedupes consecutive identical tool-result blocks, and removes fragment markers from text segments so they don't leak into the message body.	2026-04-22 15:01:07 -07:00
Manmohan Sharma	d49de1575b	fix(serve): decode-tail text match for tool markers Token-id sequence match failed because BPE has multiple valid tokenizations of the same text, so the greedy encoder output didn't match the model's sampled path. Instead decode gen_ids directly and search for the marker text. Batch-decoding produces complete text even if single-token decodes return empty strings.	2026-04-22 14:48:51 -07:00
Manmohan Sharma	544ab89c04	fix(serve): stop turn when model emits second output block after injection Training data taught the model to echo another <|output_start|>…<|output_end|> after our injected real tool result. Detect that second sequence and break the turn; the grounded answer has already streamed to the client.	2026-04-22 14:44:56 -07:00
Manmohan Sharma	ba727cb4d5	fix(serve): match tool markers on token-id sequences not decoded text Previous text-stream approach lost markers because BPE partial-byte tokens decode to empty strings, so assistant_text never accumulated the full marker. Switch to matching the ordinary-token id sequence directly (python_start = [60,124,25145,95,17104,124,62]).	2026-04-22 14:42:07 -07:00
Manmohan Sharma	7a92f5b016	fix(serve): detect tool markers in text stream not token ids The SFT loader tokenizes assistant content with .encode() (ordinary), not .encode_special(), so the model was trained to emit <|python_start|> / <|python_end|> as the 7-token ordinary sequence [60, 124, 25145, 95, 17104, 124, 62] rather than as special token id 32764. My prior state-machine matched token_id == python_start_id, which never fired, so tool calls were never executed and the model just hallucinated fake tool results (Official leadership page etc). Fix: detect markers in the decoded text stream, parse the payload between <|python_start|> and <|python_end|>, execute the tool, inject the real <|output_start|>…<|output_end|> tokens into both the SSE stream and the model's input_ids. Next-token prediction is now grounded on real Tavily output.	2026-04-22 14:39:36 -07:00
Manmohan Sharma	f70be25212	fix(tools): enable Tavily include_answer and fix UI overflow	2026-04-22 14:20:47 -07:00
Manmohan	3ab89e7890	feat: deploy d24-sft-r6 with full reasoning mode + live tool use (Tavily) Model R6 (97% pass rate on 33-probe eval, val_bpb 0.2635): - modal/serve.py + modal/_tools.py: tool-aware streaming with TavilySearchBackend auto-detect, python_start/end state machine, output_start/end forcing; mount tavily secret - modal/serve.py: MODEL_TAG=d24-sft-r6, model path points at new SFT r6 - services/chat-api/routes/messages.py: accept thinking_mode flag, inject samosaChaat system prompt (direct or <think> variant) into first user message before streaming to Modal - services/frontend/components/chat/ChatInput.tsx: Brain toggle 'Think' button next to send; when active, model uses think mode - services/frontend/components/chat/ChatWindow.tsx: track thinkingMode state, pass through to API body as thinking_mode - services/frontend/components/chat/MessageBubble.tsx: parse and render <think>...</think> as collapsible italic blocks; <|python_start|>...<|python_end|> as tool-call cards with icons per tool name; <|output_start|>...<|output_end|> as result cards with expandable JSON - nanochat/tools.py: TavilySearchBackend class + auto-detect - nanochat/ui.html: legacy UI reasoning toggle (kept for parity) Tool execution verified live: query -> web_search via Tavily -> Macron returned with grounded answer.	2026-04-22 13:43:43 -07:00
Manmohan	1d2a76eec4	feat: deploy d24 SFT + polished UI redesign with dark mode (#39) * feat(inference): deploy d24 SFT weights to Modal Repoint Modal inference app from the broken d20 checkpoint to our own ManmohanSharma/nanochat-d24 SFT step 484. Rewrites the standalone model as an inference-only port of nanochat/gpt.py so the modern architecture (smear gate, per-layer value embeddings, ve_gate, backout, sliding window attention via SDPA, rotary base 100000, padded vocab, logit softcap) loads cleanly from the checkpoint. Tokenizer loads the pickled tiktoken encoding directly so special tokens end up at their true IDs (32759-32767), and the stop check uses that set instead of hardcoded 0-8. GPU bumped to L4 for headroom. HF token sourced from the 'huggingface' Modal secret. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(frontend): polished redesign with serif display + dark mode Lifts the craft level of the landing and chat UI without changing the desi identity. Adds Fraunces for display headlines, a floating pill LandingNav, a saffron-glow hero with a large serif headline and black pill CTAs, and three gradient-tiled feature cards with inline SVG glyphs replacing the emoji cards. The chat empty state is now a serif greeting with pill-chip prompt starters, and ChatInput is a single rounded pod so the send button sits inside the input (fixes the misaligned floating button). Adds a class-based dark mode across the chat surfaces with a sun/moon toggle in the sidebar footer, powered by a small useTheme hook and a no-flash init script in the root layout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(frontend): add ESLint config so CI lint step passes next lint was failing with an interactive prompt because the repo had no ESLint config. Adds a minimal next/core-web-vitals extends and drops the now-unloadable @typescript-eslint/no-explicit-any disable directive in the stream proxy by narrowing the body type to unknown. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-16 19:55:16 -04:00
Manmohan Sharma	e5b4db1eee	feat(modal): add Modal GPU inference endpoint for samosaChaat - modal/serve.py: FastAPI endpoint on Modal T4 GPU, streams SSE tokens - modal/_model.py: Standalone GPT model (auto-detects architecture from checkpoint) - modal/_tokenizer.py: Standalone BPE tokenizer (tiktoken-based) - Downloads nanochat-students/base-d20 weights from HuggingFace - Deployed at: https://manmohan659--samosachaat-inference-inference-generate.modal.run Deploy: modal deploy modal/serve.py Dev: modal serve modal/serve.py Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 14:32:09 -07:00
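
Commits 5e3b17e990, 6069a7329b and 8b360f5bc8 describe a veto that runs before any positive classification and before the force-search toggle is honored. The ordering might look roughly like the sketch below. The patterns, the function names and the `classifier_hit` parameter are hypothetical illustrations; the repository's actual `_IDENTITY_VETO_PATTERNS` and `_is_identity_or_meta()` are not shown in this log and are likely much broader.

```python
import re

# Hypothetical veto patterns, for illustration only. They cover a few of the
# cases named in the log: identity questions (including the 'u'/'r'
# shorthand), creator questions, and greetings.
_VETO_PATTERNS = [
    re.compile(r"\b(who|what)\s+(are|r)\s+(you|u)\b", re.IGNORECASE),
    re.compile(r"\bwho\s+(made|created|built)\s+(you|u)\b", re.IGNORECASE),
    re.compile(r"^\s*(hi|hello|hey|what'?s up)\b", re.IGNORECASE),
]


def is_identity_or_meta(query: str) -> bool:
    """Return True if the query looks like identity, creator, or greeting talk."""
    return any(p.search(query) for p in _VETO_PATTERNS)


def should_web_search(query: str, force_web_search: bool, classifier_hit: bool) -> bool:
    """Decide whether to pre-seed a web_search tool call.

    The veto is checked first, so it overrides both the UI force toggle
    and a positive result from the heuristic classifier.
    """
    if is_identity_or_meta(query):
        return False
    return force_web_search or classifier_hit
```

With this ordering, `should_web_search("who are u", True, False)` is vetoed even though the toggle is on, while a query such as "current president of France" still goes through when either the toggle or the classifier asks for a search.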

18 Commits