files-to-prompt was including untracked files (knowledge/, dev scripts, etc.) which inflated the bloat metrics. now we use git ls-files to only count tracked source files, which is more accurate and removes an external dependency.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Update README: switch hosted model description from d32 to d34 per discussion #314
* link to discussion thread
* parameter in quotes
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Previously, when generating multiple samples (num_samples > 1), the first
token after prefill was sampled once and broadcast to all rows, causing
all samples to start identically. Now the prefill logits are expanded to
num_samples and sampled independently for each row.
Also simplified the generation loop by moving the forward pass to the end
of the loop, eliminating the first_iteration flag and if/else branching.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Performance varies by machine and load, making hard assertions flaky.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Refactor ChatRequest model to use Pydantic Field for default values
- Update documentation for system prompt exclusion.
- Remove duplicate await in exception
This change adds OpenAI-compatible API endpoints to the nanochat web server, enabling seamless integration with existing OpenAI SDK clients and tools while maintaining backward compatibility with the original chat UI.
- Adds /v1/chat/completions route - supporting both streaming and non-streaming mode
- Adds /v1/models route - returns single model, "nanochat"
This change ensures that the logits softcapping operation (tanh) is performed in float32 precision rather than bfloat16. Previously, the code cast to float32 after the tanh operation, which meant the non-linearity was computed with bfloat16 precision