13 Commits

Author SHA1 Message Date
santhoshravindran7
3fa394c93f security: fix unsafe deserialization, XSS, HTTPS enforcement, and temp file race
Five targeted security fixes — all non-breaking, no behaviour change on the happy path.

H-1 (High) — nanochat/checkpoint_manager.py
  Add weights_only=True to all three torch.load() calls.
  torch.load() uses pickle by default; loading a malicious .pt file from an
  untrusted source allows arbitrary code execution. weights_only=True restricts
  deserialization to tensors and primitives, blocking this attack surface.
  Refs: https://pytorch.org/docs/stable/generated/torch.load.html
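
  The pickle hazard described above can be illustrated with the standard
  library alone. This is a minimal sketch, not PyTorch's implementation of
  weights_only=True: Payload, RestrictedUnpickler, and restricted_loads are
  hypothetical names, and the harmless int("42") call stands in for whatever
  callable an attacker would choose.

```python
import io
import pickle


class Payload:
    """Hypothetical stand-in for a malicious object inside a checkpoint."""

    def __reduce__(self):
        # pickle stores "call int('42')" and runs that call at load time.
        # An attacker would name a dangerous callable instead of int.
        return (int, ("42",))


class RestrictedUnpickler(pickle.Unpickler):
    """Refuses every global lookup, so only primitive containers load."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def restricted_loads(data: bytes):
    # Same interface as pickle.loads, but with global lookups blocked.
    return RestrictedUnpickler(io.BytesIO(data)).load()


if __name__ == "__main__":
    blob = pickle.dumps(Payload())
    # Plain loading invokes the callable chosen by whoever wrote the bytes.
    print(pickle.loads(blob))
    try:
        restricted_loads(blob)
    except pickle.UnpicklingError as exc:
        print("blocked:", exc)
```

  Lists, numbers, and strings need no global lookup, so they still load
  through restricted_loads; anything that names a callable is rejected.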

H-3 (High) — nanochat/ui.html
  Replace innerHTML injection with createElement + textContent for error display.
  error.message was interpolated directly into innerHTML, creating an XSS sink:
  a crafted server error response could inject and execute arbitrary JavaScript.
  textContent assigns the string as literal text, with no HTML parsing,
  closing the injection path.

L-1 (Low) — scripts/chat_web.py
  Fix misleading role validation error message.
  The error string claimed 'system' was a valid role, but the guard only accepts
  'user' and 'assistant'. Corrected to reflect the actual allowed values.

M-3 (Medium) — nanochat/common.py
  Reject non-HTTPS URLs in download_file_with_lock().
  urlopen() follows redirects including HTTPS->HTTP downgrades, enabling MITM
  attacks on downloaded model/tokenizer files. Added an explicit scheme check
  that raises ValueError for any non-HTTPS URL before the request is made.
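
  A scheme check of the kind described above might look like the sketch
  below. require_https is a hypothetical helper, not the code in
  download_file_with_lock(). It inspects only the URL it is given, so on its
  own it does not constrain where a later redirect leads.

```python
from urllib.parse import urlparse


def require_https(url: str) -> str:
    """Return url unchanged if its scheme is https, else raise ValueError."""
    # urlparse lowercases the scheme, so "HTTPS://..." is accepted too.
    scheme = urlparse(url).scheme
    if scheme != "https":
        raise ValueError(f"refusing non-HTTPS URL (scheme={scheme!r}): {url}")
    return url


if __name__ == "__main__":
    print(require_https("https://example.com/tokenizer.pkl"))
    try:
        require_https("http://example.com/tokenizer.pkl")
    except ValueError as exc:
        print(exc)
```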

L-3 (Low) — nanochat/dataset.py
  Replace predictable .tmp suffix with tempfile.NamedTemporaryFile.
  The previous filepath + '.tmp' naming caused a TOCTOU race when multiple
  worker processes downloaded the same shard concurrently, and was vulnerable
  to symlink attacks on shared filesystems. NamedTemporaryFile generates a
  unique path; os.replace() provides an atomic rename on POSIX.
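
  The write-then-rename pattern described above can be sketched as follows.
  atomic_write is a hypothetical helper and not the code in
  nanochat/dataset.py. The temporary file is created in the destination
  directory on the assumption that both paths must share a filesystem for
  the rename to be atomic.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write data to a uniquely named temp file, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # delete=False: the file must outlive the handle so it can be renamed.
    tmp = tempfile.NamedTemporaryFile(dir=directory, suffix=".tmp", delete=False)
    try:
        with tmp:
            tmp.write(data)
        # Readers see either the old file or the complete new one.
        os.replace(tmp.name, path)
    except BaseException:
        # Clean up the partial temp file if the write or rename failed.
        try:
            os.unlink(tmp.name)
        except FileNotFoundError:
            pass
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "shard.bin")
        atomic_write(target, b"example bytes")
        print(os.listdir(d))
```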

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-08 23:12:50 -07:00
Andrej Karpathy
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly 2026-03-04 23:55:30 +00:00
Sofie Van Landeghem
72b9064f9d remove leftover mid references (#491) 2026-02-02 08:33:46 -08:00
Aarushi Singh
ace6740bdd feat: allow top_k=0 in web api to disable filtering (#458)
* allow top_k=0 in web api to disable filtering

* adding a comment for clear reasoning

* adding change to docstring
2026-01-30 09:21:41 -08:00
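
The top_k=0 convention from the commit above can be illustrated with a small pure-Python sketch. filter_top_k is a hypothetical function operating on a plain list of logits; it is not the web API's code, and the handling of negative values and ties is an assumption made here.

```python
import math


def filter_top_k(logits, top_k):
    """Keep the top_k largest logits and mask the rest; top_k=0 disables filtering."""
    if top_k < 0:
        raise ValueError("top_k must be >= 0")
    if top_k == 0 or top_k >= len(logits):
        # Nothing to filter: return an unmodified copy.
        return list(logits)
    # The k-th largest value is the cutoff; ties at the cutoff are all kept.
    cutoff = sorted(logits, reverse=True)[top_k - 1]
    return [x if x >= cutoff else -math.inf for x in logits]


if __name__ == "__main__":
    print(filter_top_k([1.0, 3.0, 2.0], 0))
    print(filter_top_k([1.0, 3.0, 2.0], 2))
```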
svlandeg
2ce62ec076 ensure consistency of quotes within each statement 2025-11-03 21:52:02 +01:00
svlandeg
c72b8b2309 add explicit UTF-8 encoding 2025-11-03 21:27:12 +01:00
karpathy
2e9669e03a upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming 2025-10-20 10:15:17 -07:00
Andrej Karpathy
4346536ab2 also allow regenerating assistant message by clicking it, and make sure to feed good seed to generate 2025-10-16 01:28:37 +00:00
Andrej Karpathy
4c3590c499 fix subtle issue in token decoding in cases where multiple utf8 bytes need to be emitted into a single codepoint. examples are emoji or foreign languages. basically we have to accumulate token sequences/text and only emit when we get full codepoints 2025-10-15 20:29:54 +00:00
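
The idea of emitting text only once full codepoints are available can be sketched with the standard library's incremental decoder. This is a swapped-in technique for illustration, not the approach taken in the commit above: stream_text is a hypothetical function, and each byte chunk stands in for the bytes produced by one token.

```python
import codecs


def stream_text(chunks):
    """Yield decoded text, holding back bytes of incomplete UTF-8 sequences."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    for chunk in chunks:
        # Partial multi-byte sequences stay buffered inside the decoder.
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush: leftover incomplete bytes become U+FFFD replacement characters.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


if __name__ == "__main__":
    # A four-byte emoji split across two chunks.
    emoji = "\U0001F600".encode("utf-8")
    pieces = [b"hi " + emoji[:2], emoji[2:] + b"!"]
    for part in stream_text(pieces):
        print(repr(part))
```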
Andrej Karpathy
03fa673b7d add basic logging to chat_web, which i think might be fun 2025-10-15 19:51:06 +00:00
Andrej Karpathy
52bfeea8bd add very basic abuse prevention limits to chat_web so it's ok to host endpoints 2025-10-15 19:42:54 +00:00
Andrej Karpathy
01fb290f53 allow multiple GPUs to do inference in a data parallel way 2025-10-15 19:12:19 +00:00
karpathy
3a5e0bc50b initial commit 2025-10-13 06:49:24 -07:00