nanochat

mirror of https://github.com/karpathy/nanochat.git synced 2026-03-10 11:15:30 +00:00

Andrej Karpathy 43c29dd9d5	2026-01-13 20:05:47 +00:00
Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training

The new DataLoader ensures that every token sequence in train/val batches has a BOS token at the beginning. Therefore, no token streams start abruptly in the middle of a document, which could be confusing for the model. Note that this changes the loss scale because there are fewer confusing tokens in the train/val batches. The main downside is that we now waste about 35% of tokens due to cropping. This is ok because we have a lot of data. See dev/LOG.md entry for this change for a lot more information.

base_eval.py	bugfix	2025-12-26 19:02:12 +08:00
base_loss.py	allow base_loss to report the loss of any arbitrary huggingface model similar to base_eval. had to change dataloader to be a lot better and just take tokenizer, not load the nanochat one. much better this way anyway	2026-01-12 03:10:13 +00:00
base_train.py	Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training	2026-01-13 20:05:47 +00:00
chat_cli.py	upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming	2025-10-20 10:15:17 -07:00
chat_eval.py	fix typos	2025-11-14 11:20:25 +01:00
chat_rl.py	Fix undefined variable in chat_rl after recent refactor	2026-01-07 09:08:57 -08:00
chat_sft.py	delete the configurator in favor of argparse and clean up a lot of kwarg details to make them more consistent across all scripts	2026-01-04 19:14:23 +00:00
chat_web.py	ensure consistency of quotes within each statement	2025-11-03 21:52:02 +01:00
mid_train.py	Big DataLoader refactor: BOS-aligned dataloaders with epoch tracking for pre/mid-training	2026-01-13 20:05:47 +00:00
tok_eval.py	initial commit	2025-10-13 06:49:24 -07:00
tok_train.py	nudge hyperparameters of the base script with the results of the sweeps and miniseries. vocab size down to 32K. D:N ratio from 20 to 8. add miniseries script	2026-01-07 22:11:59 +00:00
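
The BOS-aligned batching described in the latest commit message can be illustrated with a small sketch. This is not the nanochat implementation (the dev/LOG.md entry mentioned in the commit is the authoritative description); the function name `bos_aligned_rows`, the `BOS_ID` value, and the simple pack-then-crop policy are all assumptions made for illustration. It shows only the core idea: every row starts with a BOS token, and whatever part of a document does not fit in the current row is discarded rather than carried over into the next row.

```python
from typing import Iterable, Iterator, List

BOS_ID = 0  # hypothetical BOS token id, chosen only for this example


def bos_aligned_rows(
    docs: Iterable[List[int]], seq_len: int, bos_id: int = BOS_ID
) -> Iterator[List[int]]:
    """Yield rows of exactly seq_len tokens, each beginning with BOS.

    Assumed policy (not taken from the repository): documents are packed
    back to back, and the document that overflows a row is cropped. Its
    tail is thrown away so the next row starts on a fresh document.
    """
    row: List[int] = []
    for doc in docs:
        tokens = [bos_id] + list(doc)   # every document is prefixed with BOS
        space = seq_len - len(row)      # always >= 1, since full rows are flushed
        row.extend(tokens[:space])      # crop whatever does not fit
        if len(row) == seq_len:
            yield row
            row = []                    # next row starts at a document boundary
    # An incomplete final row is dropped in this sketch.


if __name__ == "__main__":
    docs = [[1, 2, 3], [4, 5, 6, 7, 8], [9]]
    rows = list(bos_aligned_rows(docs, seq_len=6))
    print(rows)  # [[0, 1, 2, 3, 0, 4]]
```

In this toy run the second document loses its last four tokens and the third document never fills a row, which is the kind of cropping waste the commit message refers to. The trade-off is that no row ever begins in the middle of a document.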