nanochat/dev
2026-03-24 20:54:03 +01:00
attention_residuals_d12_500_1000.png Add gated attention residuals 2026-03-17 08:30:40 +11:00
attention_residuals_d12_0_500.png Add gated attention residuals 2026-03-17 08:30:40 +11:00
attention_residuals_local_results.md Add gated attention residuals 2026-03-17 08:30:40 +11:00
estimate_gpt3_core.ipynb
gen_synthetic_data.py tune the synthetic data generation script. delete the king andrej stuff lol. also, upgrade to gemini 3 2026-02-02 01:45:59 +00:00
generate_logo.html
LEADERBOARD.md submit new time to GPT-2 leaderboard entry: 99 minutes 2026-03-14 17:15:01 +00:00
LOG.md delete autocast, an unnecessary thorn in my side, manage dtypes directly 2026-03-04 23:55:30 +00:00
nanochat.png
repackage_data_reference.py document the legacy fineweb100b dataset and the new climbmix400b dataset 2026-03-03 17:24:31 +00:00
scaling_analysis.ipynb fix scaling laws scripts after the bigram embeddings were removed 2026-03-17 16:55:56 +00:00
scaling_laws_jan26.png nuke midtraining from orbit, it's not as needed now that we have a BOS-aligned dataloader. Also change the README a lot. midtraining is not yet fully properly erased across the board, but good enough for step 1 2026-01-31 19:12:25 +00:00