New architectural features:
- Smear: mix previous token embedding into current position via learned
gate, providing cheap bigram-like info (works in training + KV cache)
- Backout: subtract learned fraction of mid-layer residual before logit
projection to remove low-level features
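The two mechanisms can be sketched as follows. This is an illustrative sketch only, assuming a standard pre-norm residual stack; the names (`smear_gate`, `backout_frac`, the mid-layer index) and the exact gating form are assumptions, not the actual implementation.

```python
import torch
import torch.nn as nn

class SmearBackoutSketch(nn.Module):
    """Illustrative sketch of Smear and Backout (names/shapes are assumptions)."""

    def __init__(self, n_embd, n_layer, backout_layer=None):
        super().__init__()
        # Smear: learned gate deciding how much previous-token embedding to mix in
        self.smear_gate = nn.Linear(n_embd, 1)
        # Backout: learned fraction of a mid-layer residual to subtract at the end
        self.backout_frac = nn.Parameter(torch.zeros(1))
        self.backout_layer = backout_layer if backout_layer is not None else n_layer // 2
        # stand-in for transformer blocks
        self.blocks = nn.ModuleList(nn.Linear(n_embd, n_embd) for _ in range(n_layer))

    def forward(self, x):  # x: (B, T, n_embd) token embeddings
        # Smear: shift embeddings right by one position and gate them in,
        # giving cheap bigram-like features. It only looks one step back,
        # so it behaves identically under KV-cached inference.
        prev = torch.cat([x[:, :1], x[:, :-1]], dim=1)
        g = torch.sigmoid(self.smear_gate(x))
        x = x + g * prev

        mid = None
        for i, block in enumerate(self.blocks):
            x = x + block(x)            # residual update
            if i == self.backout_layer:
                mid = x                 # snapshot the mid-layer residual
        # Backout: remove a learned fraction of low-level mid-layer features
        # from the residual stream before the logit projection
        x = x - self.backout_frac * mid
        return x
```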
Hyperparameter tuning:
- Muon momentum warmdown 0.97→0.90 during LR warmdown phase
- Non-uniform per-layer init: resid_lambdas 1.15→1.05, x0_lambdas 0.20→0.05
- c_fc init scale 0.4x, QK norm scale 1.2, sliding window seq_len/4
- Speedrun data:params ratio reduced to 8
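For reference, the tuned values above can be summarized as a flat config fragment. The key names here are illustrative shorthand, not the actual hyperparameter names in the codebase.

```python
# Hedged summary of the tuning bullets above (key names are illustrative)
tuning = dict(
    muon_momentum=(0.97, 0.90),      # warmed down during the LR warmdown phase
    resid_lambda_init=(1.15, 1.05),  # non-uniform per-layer init range
    x0_lambda_init=(0.20, 0.05),
    c_fc_init_scale=0.4,             # multiplier on the default init
    qk_norm_scale=1.2,
    sliding_window_frac=0.25,        # window = seq_len / 4
    data_params_ratio=8,             # speedrun data:params ratio
)
```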
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
H-2 (High) — scripts/chat_web.py
Fix CORS misconfiguration: remove allow_credentials=True (incompatible with
wildcard origin) and restrict allow_methods/allow_headers to the minimum
required set (GET, POST / Content-Type, X-Stats-Key).
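The resulting configuration can be sketched as the keyword arguments one would pass to Starlette/FastAPI's `CORSMiddleware`; the variable name is illustrative. The Fetch spec forbids credentialed requests against a wildcard origin, which is why `allow_credentials=True` is dropped rather than kept alongside `"*"`.

```python
# Sketch of the tightened CORS settings (variable name is illustrative).
# allow_credentials is omitted entirely: credentials + wildcard origin
# is an invalid combination under the Fetch spec.
cors_settings = dict(
    allow_origins=["*"],                            # wildcard origin is kept
    allow_methods=["GET", "POST"],                  # minimum required methods
    allow_headers=["Content-Type", "X-Stats-Key"],  # minimum required headers
)
# app.add_middleware(CORSMiddleware, **cors_settings)
```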
M-5 (Medium) — scripts/chat_web.py
Add sliding-window rate limiter on /chat/completions keyed by client IP.
Implemented without additional dependencies using asyncio + defaultdict.
Configurable via NANOCHAT_RATE_LIMIT and NANOCHAT_RATE_WINDOW env vars
(defaults: 10 requests per 60 seconds).
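A limiter along these lines can be built from the stdlib alone. The class and method names below are illustrative, not the actual implementation; only the described ingredients (asyncio, defaultdict, the two env vars, the 10/60 defaults) come from the fix itself.

```python
import asyncio
import os
import time
from collections import defaultdict

RATE_LIMIT = int(os.environ.get("NANOCHAT_RATE_LIMIT", "10"))
RATE_WINDOW = float(os.environ.get("NANOCHAT_RATE_WINDOW", "60"))

class SlidingWindowLimiter:
    """Per-key sliding-window rate limiter (illustrative sketch)."""

    def __init__(self, limit=RATE_LIMIT, window=RATE_WINDOW):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(list)  # key (e.g. client IP) -> timestamps
        self.lock = asyncio.Lock()     # serialize concurrent request handlers

    async def allow(self, key):
        async with self.lock:
            now = time.monotonic()
            cutoff = now - self.window
            # drop timestamps that have slid out of the window
            self.hits[key] = [t for t in self.hits[key] if t > cutoff]
            if len(self.hits[key]) >= self.limit:
                return False           # caller would respond with HTTP 429
            self.hits[key].append(now)
            return True
```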
M-1 (Medium) — scripts/chat_web.py
Protect /health and /stats with an optional API key dependency.
When NANOCHAT_STATS_KEY env var is set, both endpoints require the value
in the X-Stats-Key header. Uses secrets.compare_digest to prevent timing
attacks. No-op when env var is unset (backwards compatible).
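The check reduces to a small guard; the function name below is illustrative, but the env var, header semantics, constant-time comparison, and unset-is-no-op behaviour are as described above.

```python
import os
import secrets

def check_stats_key(provided):
    """Return True iff the request may proceed (illustrative sketch).

    No-op when NANOCHAT_STATS_KEY is unset; otherwise the X-Stats-Key
    header value must match, compared in constant time."""
    expected = os.environ.get("NANOCHAT_STATS_KEY")
    if expected is None:
        return True                 # backwards compatible: no key configured
    if provided is None:
        return False                # key required but header missing
    # compare_digest runs in time independent of where the strings differ
    return secrets.compare_digest(provided, expected)
```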
M-4 (Medium) — scripts/chat_web.py
Redact full conversation content from server logs.
User message bodies are no longer logged at INFO level; only message count
and a 120-char preview at DEBUG level. Assistant response logs now record
character count only, not content.
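The logging policy can be sketched as below; the helper names and logger name are illustrative, while the level split (counts at INFO, 120-char preview at DEBUG, length-only for responses) is the behaviour described above.

```python
import logging

log = logging.getLogger("chat_web")
PREVIEW_CHARS = 120  # illustrative constant matching the described preview length

def preview(text, limit=PREVIEW_CHARS):
    """Truncate content for DEBUG-only logging."""
    return text[:limit]

def log_request(messages):
    # INFO carries only the message count; content never appears above DEBUG
    log.info("chat request: %d message(s)", len(messages))
    if messages:
        log.debug("last message preview: %s", preview(messages[-1]["content"]))

def log_response(text):
    log.info("assistant response: %d chars", len(text))  # length only, no content
```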
L-2 (Low) — nanochat/execution.py
Enforce memory limits on macOS in the code execution sandbox.
Previously the entire resource limit block was skipped on Darwin with a
comment 'seem to fail'. RLIMIT_AS is indeed unsupported on macOS, but
RLIMIT_DATA is. Linux now uses both RLIMIT_AS and RLIMIT_DATA; macOS uses
RLIMIT_DATA. Both paths are guarded by a None check.
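The platform split can be sketched as follows; the function names are illustrative, but the limit selection mirrors the description above (RLIMIT_DATA everywhere, RLIMIT_AS only off macOS, and a None guard meaning no limit).

```python
import resource
import sys

def memory_limits(max_bytes):
    """Return the rlimits to apply in the sandboxed child process (sketch).

    RLIMIT_DATA works on both Linux and macOS; RLIMIT_AS is unsupported
    on Darwin, so it is only added on other platforms."""
    if max_bytes is None:          # None guard: no limit requested
        return []
    limits = [(resource.RLIMIT_DATA, max_bytes)]
    if sys.platform != "darwin":
        limits.append((resource.RLIMIT_AS, max_bytes))
    return limits

def apply_memory_limits(max_bytes):
    # intended to run in the child (e.g. via a preexec hook) before exec
    for res, lim in memory_limits(max_bytes):
        resource.setrlimit(res, (lim, lim))
```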
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Five targeted security fixes — all non-breaking, no behaviour change on the happy path.
H-1 (High) — nanochat/checkpoint_manager.py
Add weights_only=True to all three torch.load() calls.
torch.load() uses pickle by default; loading a malicious .pt file from an
untrusted source allows arbitrary code execution. weights_only=True restricts
deserialization to tensors and primitives, blocking this attack surface.
Refs: https://pytorch.org/docs/stable/generated/torch.load.html
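The hardened call in isolation, as a minimal round-trip sketch (the checkpoint contents here are invented for illustration):

```python
import tempfile
import torch

# Save a plain state-dict-like payload, then load it back with the safe flag.
with tempfile.NamedTemporaryFile(suffix=".pt", delete=False) as f:
    torch.save({"step": 100, "model": {"w": torch.zeros(2, 3)}}, f.name)

# weights_only=True restricts the unpickler to tensors and primitive
# containers; arbitrary pickled objects (and thus code execution) are rejected.
state = torch.load(f.name, weights_only=True)
```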
H-3 (High) — nanochat/ui.html
Replace innerHTML injection with createElement + textContent for error display.
error.message was interpolated directly into innerHTML, creating an XSS sink:
a crafted server error response could inject and execute arbitrary JavaScript.
textContent escapes all HTML entities, closing the injection path.
L-1 (Low) — scripts/chat_web.py
Fix misleading role validation error message.
The error string claimed 'system' was a valid role, but the guard only accepts
'user' and 'assistant'. Corrected to reflect the actual allowed values.
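A guard consistent with the corrected message looks like this; the function name and exact wording are illustrative, the allowed set is the one stated above.

```python
ALLOWED_ROLES = ("user", "assistant")  # 'system' is intentionally not accepted

def validate_role(role):
    """Raise with a message that matches what the guard actually allows (sketch)."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"invalid role {role!r}; must be one of {ALLOWED_ROLES}")
```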
M-3 (Medium) — nanochat/common.py
Reject non-HTTPS URLs in download_file_with_lock().
urlopen() follows redirects including HTTPS->HTTP downgrades, enabling MITM
attacks on downloaded model/tokenizer files. Added an explicit scheme check
that raises ValueError for any non-HTTPS URL before the request is made.
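The scheme check itself is a few lines; the function name is illustrative, the behaviour (ValueError for any non-HTTPS URL, checked before any network I/O) is as described.

```python
from urllib.parse import urlparse

def require_https(url):
    """Reject non-HTTPS URLs before any request is made (sketch)."""
    if urlparse(url).scheme != "https":
        raise ValueError(f"refusing non-HTTPS download URL: {url!r}")
    return url
```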
L-3 (Low) — nanochat/dataset.py
Replace predictable .tmp suffix with tempfile.NamedTemporaryFile.
The previous filepath + '.tmp' naming caused a TOCTOU race when multiple
worker processes downloaded the same shard concurrently, and is vulnerable
to symlink attacks on shared filesystems. NamedTemporaryFile generates a
unique path; os.replace() provides an atomic rename on POSIX.
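The safe write pattern can be sketched as below; the function name is illustrative. Creating the temp file in the destination directory keeps `os.replace()` on one filesystem, which is what makes the final rename atomic.

```python
import os
import tempfile

def atomic_download_write(dest_path, data):
    """Write to a uniquely named temp file, then atomically rename (sketch)."""
    dirpath = os.path.dirname(dest_path) or "."
    # NamedTemporaryFile picks an unpredictable name, closing both the
    # TOCTOU race between concurrent workers and the symlink attack window
    with tempfile.NamedTemporaryFile(dir=dirpath, delete=False) as f:
        f.write(data)
        tmp_path = f.name
    os.replace(tmp_path, dest_path)  # atomic rename on POSIX
```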
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* printing step count
* adding reply-only loss for chat
* using the mask returned by the tokeniser's render_conversation function
* undoing some changes
* putting back a comment that was accidentally removed; no functionality change
Store quantized input/weight and their inverse scales in _Float8Matmul ctx to avoid re-quantization in backward and reduce saved-activation memory without changing numerics.
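The caching pattern can be sketched with a custom autograd Function. This is a simplified stand-in, not the real `_Float8Matmul`: int8 substitutes for float8 (float8 dtype support varies across torch versions), and scaling is per-tensor for brevity. The point illustrated is the one in the message: quantize once in forward, save the quantized tensors plus inverse scales in `ctx`, and reuse them in backward so numerics match and saved-activation memory shrinks.

```python
import torch

class QuantizedMatmulSketch(torch.autograd.Function):
    """Sketch of the ctx-caching pattern (int8 stands in for float8)."""

    @staticmethod
    def forward(ctx, x, w):  # computes y = x @ w.T
        x_inv_scale = x.abs().amax().clamp(min=1e-12) / 127.0
        w_inv_scale = w.abs().amax().clamp(min=1e-12) / 127.0
        x_q = (x / x_inv_scale).round().clamp(-127, 127).to(torch.int8)
        w_q = (w / w_inv_scale).round().clamp(-127, 127).to(torch.int8)
        # save the quantized tensors (1 byte/elem) and inverse scales instead
        # of the full-precision inputs: backward needs no re-quantization and
        # the saved-activation footprint shrinks
        ctx.save_for_backward(x_q, w_q, x_inv_scale, w_inv_scale)
        return (x_q.float() * x_inv_scale) @ (w_q.float() * w_inv_scale).t()

    @staticmethod
    def backward(ctx, grad_out):
        x_q, w_q, x_inv_scale, w_inv_scale = ctx.saved_tensors
        x_deq = x_q.float() * x_inv_scale  # same dequantized values the
        w_deq = w_q.float() * w_inv_scale  # forward used, so numerics match
        return grad_out @ w_deq, grad_out.t() @ x_deq
```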