nanochat

mirror of https://github.com/karpathy/nanochat.git synced 2026-05-28 02:28:11 +00:00

Sofie Van Landeghem a3ca42a678 add comment		2026-04-13 14:17:23 +02:00
__init__.py
checkpoint_manager.py
common.py	delete autocast, an unnecessary thorn in my side, manage dtypes directly	2026-03-04 23:55:30 +00:00
core_eval.py
dataloader.py	big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise	2026-03-04 19:47:12 +00:00
dataset.py	big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise	2026-03-04 19:47:12 +00:00
engine.py	Autoresearch round 2: smear, backout, and hyperparameter tuning	2026-03-14 17:03:06 +00:00
execution.py
flash_attention.py	delete autocast, an unnecessary thorn in my side, manage dtypes directly	2026-03-04 23:55:30 +00:00
fp8.py	delete autocast, an unnecessary thorn in my side, manage dtypes directly	2026-03-04 23:55:30 +00:00
gpt.py	add comment	2026-04-13 14:17:23 +02:00
logo.svg
loss_eval.py
optim.py	use COMPUTE_DTYPE-aware cast in Muon polar express step	2026-03-25 20:19:14 +00:00
report.py
tokenizer.py
ui.html