Training data taught the model to echo another <|output_start|>…<|output_end|> after our injected real tool result. Detect that second sequence and break the turn; the grounded answer has already streamed to the client.
The previous text-stream approach lost markers because BPE partial-byte tokens decode to empty strings, so assistant_text never accumulated the full marker. Switch to matching the ordinary-token id sequence directly (python_start = [60, 124, 25145, 95, 17104, 124, 62]).
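The id-sequence matching described above could look roughly like the sketch below. It keeps a sliding window of the most recent token ids and compares it to the marker's ordinary-token ids. The class and helper names (TokenSequenceMatcher, find_marker_ends) are hypothetical and are not taken from modal/serve.py; only the id list comes from this message.

```python
from collections import deque
from typing import Deque, Iterable, List, Sequence

# Ordinary-token ids for "<|python_start|>", as quoted in this commit message.
PYTHON_START_IDS = [60, 124, 25145, 95, 17104, 124, 62]


class TokenSequenceMatcher:
    """Hypothetical helper: reports when the latest tokens equal a fixed id sequence."""

    def __init__(self, pattern: Sequence[int]) -> None:
        if not pattern:
            raise ValueError("pattern must be non-empty")
        self.pattern = tuple(pattern)
        # A bounded deque drops the oldest id automatically once it is full.
        self.window: Deque[int] = deque(maxlen=len(self.pattern))

    def feed(self, token_id: int) -> bool:
        """Add one sampled token id; return True if the marker just completed."""
        self.window.append(token_id)
        return tuple(self.window) == self.pattern


def find_marker_ends(token_ids: Iterable[int], pattern: Sequence[int]) -> List[int]:
    """Return the index of the last token of every occurrence of pattern."""
    matcher = TokenSequenceMatcher(pattern)
    return [i for i, tok in enumerate(token_ids) if matcher.feed(tok)]
```

Because the comparison works on ids rather than decoded text, a token that decodes to an empty string still advances the window.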
The SFT loader tokenizes assistant content with .encode() (ordinary), not .encode_special(), so the model was trained to emit <|python_start|> / <|python_end|> as ordinary-token sequences (python_start is the 7-token sequence [60, 124, 25145, 95, 17104, 124, 62]) rather than as special token ids such as 32764. My prior state machine matched token_id == python_start_id, which never fired, so tool calls were never executed and the model hallucinated fake tool results (e.g. "Official leadership page"). Fix: detect markers in the decoded text stream, parse the payload between <|python_start|> and <|python_end|>, execute the tool, and inject the real <|output_start|>…<|output_end|> tokens into both the SSE stream and the model's input_ids. Next-token prediction is now grounded on real Tavily output.
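As an illustration of the text-stream detection step in this paragraph, the sketch below pulls the raw payload out of the accumulated assistant text. The function name extract_tool_payload and its return shape are assumptions for illustration; the payload is returned as an unparsed string because its format is not specified here.

```python
from typing import Optional, Tuple

PY_START = "<|python_start|>"
PY_END = "<|python_end|>"


def extract_tool_payload(text: str, search_from: int = 0) -> Optional[Tuple[str, int]]:
    """Hypothetical helper: find the first complete tool call at or after search_from.

    Returns (payload, end_offset), where end_offset is the position just past
    <|python_end|>, or None while no complete call is present in the text.
    """
    start = text.find(PY_START, search_from)
    if start == -1:
        return None
    payload_start = start + len(PY_START)
    end = text.find(PY_END, payload_start)
    if end == -1:
        # The opening marker has arrived but the call is still streaming.
        return None
    return text[payload_start:end].strip(), end + len(PY_END)
```

A caller would pass the returned end_offset back in as search_from so the same call is not executed twice.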
Model R6 (97% pass rate on 33-probe eval, val_bpb 0.2635):
- modal/serve.py + modal/_tools.py: tool-aware streaming with
  TavilySearchBackend auto-detect, python_start/end state machine,
  output_start/end forcing; mount tavily secret
- modal/serve.py: MODEL_TAG=d24-sft-r6, model path points at new SFT r6
- services/chat-api/routes/messages.py: accept thinking_mode flag,
  inject samosaChaat system prompt (direct or <think> variant) into
  first user message before streaming to Modal
- services/frontend/components/chat/ChatInput.tsx: Brain toggle
  'Think' button next to send; when active, model uses think mode
- services/frontend/components/chat/ChatWindow.tsx: track
  thinkingMode state, pass through to API body as thinking_mode
- services/frontend/components/chat/MessageBubble.tsx: parse and
  render <think>...</think> as collapsible italic blocks;
  <|python_start|>...<|python_end|> as tool-call cards with icons
  per tool name; <|output_start|>...<|output_end|> as result cards
  with expandable JSON
- nanochat/tools.py: TavilySearchBackend class + auto-detect
- nanochat/ui.html: legacy UI reasoning toggle (kept for parity)
Tool execution verified live: query -> web_search via Tavily ->
Macron returned with grounded answer.
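The thinking_mode handling listed for services/chat-api/routes/messages.py might be sketched as follows. This assumes chat messages are dicts with "role" and "content" keys; the prompt strings are placeholders, since the real samosaChaat prompts are not shown in this message, and inject_system_prompt is a hypothetical name.

```python
from typing import Dict, List

# Placeholders only: the actual prompt text is not part of this commit message.
DIRECT_PROMPT = "<direct system prompt placeholder>"
THINK_PROMPT = "<think-variant system prompt placeholder>"


def inject_system_prompt(
    messages: List[Dict[str, str]], thinking_mode: bool
) -> List[Dict[str, str]]:
    """Hypothetical helper: prepend the chosen prompt to the first user message.

    Works on a copy so the stored conversation is not modified.
    """
    prompt = THINK_PROMPT if thinking_mode else DIRECT_PROMPT
    out = [dict(m) for m in messages]
    for message in out:
        if message.get("role") == "user":
            message["content"] = f"{prompt}\n\n{message['content']}"
            break  # only the first user message carries the prompt
    return out
```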
* feat(inference): deploy d24 SFT weights to Modal
Repoint Modal inference app from the broken d20 checkpoint to our own
ManmohanSharma/nanochat-d24 SFT step 484. Rewrites the standalone model
as an inference-only port of nanochat/gpt.py so the modern architecture
(smear gate, per-layer value embeddings, ve_gate, backout, sliding
window attention via SDPA, rotary base 100000, padded vocab, logit
softcap) loads cleanly from the checkpoint. Tokenizer loads the pickled
tiktoken encoding directly so special tokens end up at their true IDs
(32759-32767), and the stop check uses that set instead of hardcoded
0-8. GPU bumped to L4 for headroom. HF token sourced from the
'huggingface' Modal secret.
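A minimal sketch of the stop check described above, assuming every id in the 32759-32767 special-token range ends generation. The names SPECIAL_TOKEN_IDS and truncate_at_stop are illustrative and not taken from the serving code.

```python
from typing import FrozenSet, Iterable, List

# Special tokens sit at the top of the vocab (32759..32767 inclusive),
# per this commit message, rather than at ids 0..8.
SPECIAL_TOKEN_IDS: FrozenSet[int] = frozenset(range(32759, 32768))


def truncate_at_stop(
    token_ids: Iterable[int], stop_ids: FrozenSet[int] = SPECIAL_TOKEN_IDS
) -> List[int]:
    """Hypothetical helper: keep tokens up to, but not including, the first stop id."""
    kept: List[int] = []
    for token_id in token_ids:
        if token_id in stop_ids:
            break
        kept.append(token_id)
    return kept
```

With this set, low ids such as 0 or 8 are treated as ordinary tokens and no longer cut the response short.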
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(frontend): polished redesign with serif display + dark mode
Lifts the craft level of the landing and chat UI without changing the
desi identity. Adds Fraunces for display headlines, a floating pill
LandingNav, a saffron-glow hero with a large serif headline and black
pill CTAs, and three gradient-tiled feature cards with inline SVG
glyphs replacing the emoji cards. The chat empty state is now a serif
greeting with pill-chip prompt starters, and ChatInput is a single
rounded pod so the send button sits inside the input (fixes the
misaligned floating button). Adds a class-based dark mode across the
chat surfaces with a sun/moon toggle in the sidebar footer, powered by
a small useTheme hook and a no-flash init script in the root layout.
* chore(frontend): add ESLint config so CI lint step passes
next lint was failing with an interactive prompt because the repo had
no ESLint config. Adds a minimal next/core-web-vitals extends and
drops the now-unloadable @typescript-eslint/no-explicit-any disable
directive in the stream proxy by narrowing the body type to unknown.
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>