nanochat/services
Manmohan Sharma 5bd773ef13
feat: double default and max generation budget
Raise the default inference_default_max_tokens from 512 to 1024 in chat-api
and in the modal/serve.py default. Raise the hard cap in modal from 2048 to 4096.
Fixes mid-sentence cutoffs on longer (especially thinking-mode) answers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 22:20:05 -07:00
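
A minimal sketch of the budget change this commit describes, assuming the usual pattern of a per-request default plus a serving-layer hard cap; the constant names, the resolve_max_tokens helper, and its placement are illustrative assumptions, not identifiers taken from chat-api or modal/serve.py (the commit itself only names inference_default_max_tokens).

    # Sketch only: hypothetical constants mirroring the values in the commit message.
    # Default generation budget used when a request omits max_tokens (512 -> 1024).
    INFERENCE_DEFAULT_MAX_TOKENS = 1024

    # Hard cap enforced by the serving layer (2048 -> 4096) so long,
    # thinking-mode answers are not truncated mid-sentence.
    INFERENCE_MAX_TOKENS_CAP = 4096


    def resolve_max_tokens(requested: int | None) -> int:
        """Pick the generation budget for a request: fall back to the
        default when unspecified, never exceed the hard cap."""
        if requested is None:
            return INFERENCE_DEFAULT_MAX_TOKENS
        return min(requested, INFERENCE_MAX_TOKENS_CAP)
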
auth        Merge pull request #26 from manmohan659/fix/missing-models                 2026-04-16 16:50:22 -04:00
chat-api    feat: double default and max generation budget                             2026-04-22 22:20:05 -07:00
frontend    feat(ui): cleaner input layout + sanitize model-output artifacts           2026-04-22 15:31:00 -07:00
inference   fix(inference): regenerate uv.lock after structlog/prometheus deps added   2026-04-16 12:49:05 -07:00