mirror of https://github.com/karpathy/nanochat.git, synced 2026-04-04 06:35:23 +00:00
three independent improvements to the cached CORE evaluation path:
1. GPU-resident data: all base-4 collated batches (i.e. batches collated at
the base batch size of 4; ~144MB for the full CORE eval) are moved to GPU
upfront via .to(device), eliminating all CPU→GPU transfers from the forward
loop. _forward_all_cached replaces the double-buffered prefetch with a simple
upfront bulk transfer — .to() is a no-op when the caller has already preloaded
tensors to GPU (as bench_core_eval now does).
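A minimal sketch of the upfront bulk transfer; `preload_batches` and the dict-of-tensors batch layout are hypothetical illustrations, not the actual nanochat code. The key property is that `.to(device)` returns the tensor itself when it already lives on that device, so a second preload (or a preloading caller) costs nothing:

```python
import torch

def preload_batches(batches, device):
    # move every collated batch to the device once, up front;
    # .to(device) is a no-op (returns self) for tensors already
    # on that device, so double-preloading is free
    return [{k: v.to(device) for k, v in b.items()} for b in batches]
```

With batches already resident, the forward loop then touches only device memory.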
2. continuous cross-task pipeline: _forward_all_cached flattens all tasks'
batches into one stream. the last batch of task N flows directly into the
first batch of task N+1 with no pipeline restart. GPU-side composition via
merge (pad+cat for bs > base) and split (row-slice for bs < base) avoids
the CPU-side compose_collated bottleneck that made bs=8 slower than bs=4.
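The merge/split composition can be sketched as below; these are illustrative stand-ins for the GPU-side operations described above (the real nanochat helpers may differ), assuming each base batch is a 2-D token tensor:

```python
import torch
import torch.nn.functional as F

def merge(batches, pad_id=0):
    # bs > base: right-pad each base batch to the longest sequence,
    # then cat along the batch dim -- all on-device, no CPU round trip
    max_len = max(b.size(1) for b in batches)
    padded = [F.pad(b, (0, max_len - b.size(1)), value=pad_id) for b in batches]
    return torch.cat(padded, dim=0)

def split(batch, bs):
    # bs < base: row-slice a base batch into smaller chunks (views, no copy)
    return [batch[i:i + bs] for i in range(0, batch.size(0), bs)]
```

Because both operations run on tensors that are already resident on the GPU, composing a bs=8 batch from two bs=4 batches no longer pays the CPU-side cost that made bs=8 slower than bs=4.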
3. progress bars + per-task result printing: both cached and first-run paths
in evaluate_model now show a tqdm progress bar with the current task label.
on_task_done callback in _forward_all_cached prints each task's accuracy
as soon as its last batch is processed (single-GPU). DDP falls back to
printing after all_reduce. both paths print total elapsed time at the end.
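The per-task callback mechanics can be sketched as a plain loop over the flattened batch stream (the tqdm wrapping is omitted here; `forward_all` and `run_batch` are hypothetical names, not the actual `_forward_all_cached` signature):

```python
def forward_all(flat_batches, run_batch, on_task_done=None):
    # flat_batches: list of (task_name, batch) -- consecutive tasks
    # flow through one continuous loop with no pipeline restart
    correct, total = {}, {}
    for i, (task, batch) in enumerate(flat_batches):
        c, n = run_batch(batch)  # (num correct, num examples) for this batch
        correct[task] = correct.get(task, 0) + c
        total[task] = total.get(task, 0) + n
        # fire the callback the moment a task's last batch is processed
        is_last = i + 1 == len(flat_batches) or flat_batches[i + 1][0] != task
        if is_last and on_task_done is not None:
            on_task_done(task, correct[task] / total[task])
    return {t: correct[t] / total[t] for t in correct}
```

Under DDP the per-task accuracies are only final after the all_reduce, which is why the early-print path is single-GPU only.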
bench_core_eval: preloads ALL base-4 batches to GPU once before the batch-size
sweep. all sweep iterations compose from GPU-resident tensors with zero
CPU→GPU transfers in the hot loop.
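The sweep structure might look like the following sketch (hypothetical names; assumes equal sequence lengths across base batches so plain `cat` suffices): the base batches are preloaded once, and every sweep iteration composes from those resident tensors, so the timed loop performs no host-to-device copies.

```python
import time
import torch

def bench_sweep(base_batches, base_bs, sizes, forward):
    # base_batches: already preloaded to the device, one list for the
    # whole sweep; each iteration composes batches of size bs from them
    for bs in sizes:
        if bs >= base_bs:
            g = bs // base_bs
            batches = [torch.cat(base_batches[i:i + g], dim=0)
                       for i in range(0, len(base_batches), g)]
        else:
            batches = [b[i:i + bs] for b in base_batches
                       for i in range(0, b.size(0), bs)]
        t0 = time.perf_counter()
        for b in batches:  # hot loop: device-resident tensors only
            forward(b)
        print(f"bs={bs}: {time.perf_counter() - t0:.3f}s")
```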