nanochat

mirror of https://github.com/karpathy/nanochat.git synced 2026-05-07 16:30:11 +00:00


Manmohan Sharma 7a92f5b016 fix(serve): detect tool markers in text stream not token ids		2026-04-22 14:39:36 -07:00

The SFT loader tokenizes assistant content with .encode() (ordinary), not .encode_special(), so the model was trained to emit <|python_start|> / <|python_end|> as the 7-token ordinary sequence [60, 124, 25145, 95, 17104, 124, 62] rather than as the special token id 32764. My prior state machine matched token_id == python_start_id, which never fired, so tool calls were never executed and the model just hallucinated fake tool results (Official leadership page etc).

Fix: detect markers in the decoded text stream, parse the payload between <|python_start|> and <|python_end|>, execute the tool, and inject the real <|output_start|>…<|output_end|> tokens into both the SSE stream and the model's input_ids. Next-token prediction is now grounded on real Tavily output.
..
_model.py	feat: deploy d24 SFT + polished UI redesign with dark mode (#39)	2026-04-16 19:55:16 -04:00
_tokenizer.py	feat: deploy d24 SFT + polished UI redesign with dark mode (#39)	2026-04-16 19:55:16 -04:00
_tools.py	fix(tools): enable Tavily include_answer and fix UI overflow	2026-04-22 14:20:47 -07:00
serve.py	fix(serve): detect tool markers in text stream not token ids	2026-04-22 14:39:36 -07:00
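
The fix described in the latest commit message (matching tool markers in decoded text rather than by token id) might look roughly like the sketch below. This is not the code in serve.py: the class name `ToolMarkerDetector`, the `feed` method, and the `run_tool` callback are hypothetical names chosen for illustration, and the sketch assumes the caller passes in text that has already been decoded from the generated tokens. Re-encoding the returned block and appending it to the model's input_ids is left out.

```python
from typing import Callable, List

# Marker strings as they appear in the decoded text stream.
PY_START = "<|python_start|>"
PY_END = "<|python_end|>"
OUT_START = "<|output_start|>"
OUT_END = "<|output_end|>"


class ToolMarkerDetector:
    """Finds tool calls in decoded text, even when a marker spans several chunks.

    Hypothetical helper for illustration; not taken from serve.py.
    """

    def __init__(self, run_tool: Callable[[str], str]) -> None:
        self.run_tool = run_tool  # assumed tool executor: payload in, result text out
        self.buffer = ""          # decoded text that may still contain a partial marker

    def feed(self, text: str) -> List[str]:
        """Consume newly decoded text and return output blocks to inject."""
        self.buffer += text
        injected: List[str] = []
        while True:
            start = self.buffer.find(PY_START)
            if start == -1:
                # No start marker yet: keep only a tail short enough that it
                # could still be the beginning of a split start marker.
                self.buffer = self.buffer[-(len(PY_START) - 1):]
                break
            end = self.buffer.find(PY_END, start + len(PY_START))
            if end == -1:
                # Start marker seen but the call is incomplete: wait for more text.
                self.buffer = self.buffer[start:]
                break
            payload = self.buffer[start + len(PY_START):end]
            injected.append(OUT_START + self.run_tool(payload) + OUT_END)
            self.buffer = self.buffer[end + len(PY_END):]
        return injected
```

A caller would feed each decoded piece of the generation into `feed` and, whenever the returned list is non-empty, send those blocks to the client and append their tokens to the model's context before sampling continues. Because matching happens on text, it works the same whether the model emits the marker as one special token or as a run of ordinary tokens.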