nanochat/modal
Manmohan Sharma 5bd773ef13
feat: double default and max generation budget
Raise the inference_default_max_tokens default from 512 to 1024 in
chat-api and in modal/serve.py. Raise the hard cap in modal from
2048 to 4096. Fixes mid-sentence cutoffs on longer (especially
thinking-mode) answers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 22:20:05 -07:00
_model.py feat: deploy d24 SFT + polished UI redesign with dark mode (#39) 2026-04-16 19:55:16 -04:00
_query_classifier.py fix: veto matches shorthand 'u' and 'r' for you/are 2026-04-22 16:10:59 -07:00
_tokenizer.py feat: deploy d24 SFT + polished UI redesign with dark mode (#39) 2026-04-16 19:55:16 -04:00
_tools.py fix(tools): enable Tavily include_answer and fix UI overflow 2026-04-22 14:20:47 -07:00
serve.py feat: double default and max generation budget 2026-04-22 22:20:05 -07:00