mirror of https://github.com/karpathy/nanochat.git, synced 2026-04-04 06:35:23 +00:00
the main idea: tokenization + collation for CORE eval only needs to happen once per tokenizer.
collated batches at base batch_size=4 are saved to disk (core_token_cache/), keyed by SHA-256
of the tokenizer file. any batch_size can be served from these base-4 batches: larger sizes merge
consecutive batches (right-pad shorter ones, cat along dim=0), smaller sizes split along example
boundaries (trim trailing padding). this means prepare_task_data is truly a one-time cost.
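a minimal sketch of this re-batching, assuming base batches are right-padded 2-D LongTensors with 0 as the pad id (names here are illustrative, not the repo's actual API; the real split uses batch_meta row boundaries, while this sketch slices fixed-size row chunks):

```python
import torch
import torch.nn.functional as F

def compose(base_batches, target_bs, base_bs=4):
    """Serve any target batch size from base-size-4 collated batches."""
    if target_bs == base_bs:
        return base_batches
    out = []
    if target_bs > base_bs:
        k = target_bs // base_bs  # merge k consecutive base batches
        for i in range(0, len(base_batches), k):
            group = base_batches[i:i + k]
            max_len = max(b.shape[1] for b in group)
            # right-pad shorter batches to the group's max_len, then cat rows
            padded = [F.pad(b, (0, max_len - b.shape[1])) for b in group]
            out.append(torch.cat(padded, dim=0))
    else:
        for b in base_batches:
            for i in range(0, b.shape[0], target_bs):
                chunk = b[i:i + target_bs]
                # trim columns that are padding (0) for every row in the chunk
                non_pad = chunk != 0
                keep = int(non_pad.any(dim=0).nonzero().max()) + 1 if non_pad.any() else 1
                out.append(chunk[:, :keep])
    return out
```

merging pads up (never loses tokens) and splitting only trims columns that are all-pad within the chunk, so the composed batches carry exactly the same examples as the base-4 cache.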
core_eval.py:
- double-buffered CPU->GPU transfers in both forward paths (_forward_batches and evaluate_task's
pipelined path). while GPU runs forward_model on batch N, batch N+1 is pin_memory()'d and
DMA-transferred via non_blocking=True. the DMA copy engine and the GPU's compute units are
separate hardware, so transfer and compute overlap. previously the GPU idled during every transfer.
- compose_collated(): merge base batches for larger batch_size (cat after right-padding to
max_len), or split for smaller batch_size (slice along row boundaries from batch_meta,
trim trailing padding via vectorized non_pad.any(dim=0)). works because examples are sorted
by seq_len, so consecutive base batches have monotonically increasing lengths.
- evaluate_task and _forward_batches accept optional pbar for progress tracking.
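the double-buffered transfer loop reduces to something like this sketch (illustrative names, not the repo's actual API; falls back to plain synchronous copies when no CUDA device is present):

```python
import torch

def forward_batches(model, batches, device):
    # double-buffered host->device transfers: kick off the copy of batch N+1
    # before running compute on batch N, so its data is resident by the time
    # the model needs it.
    use_cuda = device.type == "cuda"

    def to_dev(b):
        if use_cuda:
            b = b.pin_memory()  # page-locked memory enables async DMA copies
        return b.to(device, non_blocking=use_cuda)

    results = []
    nxt = to_dev(batches[0]) if batches else None
    for i in range(len(batches)):
        cur = nxt
        # issue the next transfer before computing on the current batch
        nxt = to_dev(batches[i + 1]) if i + 1 < len(batches) else None
        results.append(model(cur))
    return results
```

note that with everything on one stream the win is that each copy is enqueued ahead of the compute that needs it; full device-side copy/compute overlap would issue the copies on a side CUDA stream.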
base_eval.py:
- evaluate_model now has 3-tier caching: in-memory (_batch_cache, across calls within same
process), disk load (core_token_cache/, on first call when in-memory is empty), disk save
(after first run's prepare+collate+forward, writes collated batches so future training runs
and the benchmark skip tokenization entirely). keyed by tokenizer file hash + max_per_task.
bench_core_eval.py:
- cached sweep no longer re-runs the full first-run sweep to build collated data (was 2x the
work for no reason). instead loads/builds base-4 cache once, then compose_collated serves
any target batch_size. cached sweep only varies batch_size (no queue_size — no collation thread).
- --skip-first: skip the first-run sweep entirely if disk cache exists. if cache is missing,
runs a single bs=4 eval in minimal time to create it, then proceeds to cached sweep.
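the --skip-first control flow, as a sketch (all function parameters here are hypothetical stand-ins for the benchmark's internals):

```python
def run_sweeps(skip_first, cache_on_disk, run_eval, first_run_sweep, cached_sweep):
    """If --skip-first is set and the disk cache exists, go straight to the
    cached sweep; if the cache is missing, do one minimal bs=4 eval whose
    only job is to write it. Otherwise run the full first-run sweep."""
    if skip_first:
        if not cache_on_disk:
            run_eval(batch_size=4)   # cheapest run that materializes the cache
    else:
        first_run_sweep()            # also builds and saves the cache
    cached_sweep()
```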
- tqdm progress bars everywhere: old sequential baseline (per-example with task name),
first-run sweep (double bar: outer=combo progress, inner=per-example), cache building
(per-task), cached sweep (double bar). task names left-padded to max label length so the
bar doesn't shift.
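the fixed-width labels are a one-liner (hypothetical helper; the point is that a tqdm description of varying width makes the bar redraw at a shifting offset):

```python
def pad_labels(task_names):
    # right-align every task name to the longest label so the tqdm
    # description, and therefore the bar itself, keeps a fixed width
    width = max(len(name) for name in task_names)
    return {name: name.rjust(width) for name in task_names}
```

usage would be something like `pbar.set_description(labels[task])` as the sweep moves between tasks.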
- tokenizer identity via file_checksum (SHA-256 of tokenizer.pkl/tokenizer.json on disk),
not encode-output hashing. HF models fall back to hashing the repo name.
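hashing the file bytes directly is cheap and deterministic; a streamed SHA-256 looks like this (sketch, not the repo's exact helper):

```python
import hashlib

def file_checksum(path, chunk_size=1 << 20):
    """SHA-256 of a file's bytes, streamed in 1 MiB chunks so large
    tokenizer files aren't read into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

unlike hashing encode() outputs, this needs no probe strings and can't be fooled by two tokenizers agreeing on the probes while differing elsewhere.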