nanochat

mirror of https://github.com/karpathy/nanochat.git synced 2025-12-06 04:12:13 +00:00

Author	SHA1	Message	Date
Fred Bliss	e6ce7a06b0	Merge branch 'master' into master	2025-11-24 07:00:46 -06:00
google-labs-jules[bot]	51927a9e60	feat: Add comprehensive end-to-end documentation This commit introduces extensive documentation across the entire nanochat codebase. The goal is to make the project more accessible, educational, and easier for new contributors to understand. Key additions include: - A new "Codebase Overview and Data Flow" section in the main README.md, providing a high-level guide to the project structure and training pipeline. - Detailed, educational docstrings and inline comments in all Python modules within the `nanochat/`, `scripts/`, and `tasks/` directories. - Explanations of the rationale and implementation details for key components, including Python equivalents for non-Python code where applicable. - A new `README.md` in the `rustbpe/` directory explaining the BPE algorithm and the decision to use Rust. - Comprehensive comments in shell scripts and development scripts in the `dev/` directory, clarifying their purpose and usage.	2025-11-24 12:57:49 +00:00
water-vapor	a9de4b1038	Fix tok/sec metrics for base_train and mid_train when gradient accumulation is not 1	2025-10-26 01:43:49 -05:00
Andrej Karpathy	8892470f29	add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think	2025-10-24 14:02:48 +00:00
Luke Stanley	defd1246aa	Fix Torch crash caused by pinning on CPU	2025-10-21 20:28:10 +00:00
Andrej Karpathy	5bdc99abfb	merge and resolve conflict	2025-10-21 17:19:10 +00:00
Andrej Karpathy	fe5aed940b	add personality to nanochat. breaks previous code on git pull and requires download of a new file from s3, but there is a helpful error message so hopefully its ok	2025-10-21 15:04:58 +00:00
karpathy	2e9669e03a	upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes ,e.g. changing max_iterations to num_iterations in sft script for consistency in naming	2025-10-20 10:15:17 -07:00
karpathy	ae02650afe	update the midtraining script too	2025-10-16 16:33:17 -07:00
Andrej Karpathy	b8076dd367	fix bug in learning rate multiplier, it was ramping up instead of ramping down. see more in Issue #68 . also add --dry_run option useful for experimentation	2025-10-15 16:35:04 +00:00
karpathy	3a5e0bc50b	initial commit	2025-10-13 06:49:24 -07:00

11 Commits