Mirror of https://github.com/karpathy/nanochat.git, synced 2026-01-20 18:34:14 +00:00

The new DataLoader ensures that every token sequence in train/val batches has a BOS token at the beginning. Therefore, no token streams start abruptly in the middle of a document, which could be confusing for the model. Note that this changes the loss scale because there are fewer confusing tokens in the train/val batches. The main downside is that we now waste about 35% of tokens due to cropping. This is ok because we have a lot of data. See dev/LOG.md entry for this change for a lot more information.

| Name | | |
|---|---|---|
| estimate_gpt3_core.ipynb | ||
| gen_synthetic_data.py | ||
| generate_logo.html | ||
| LOG.md | ||
| nanochat.png | ||
| repackage_data_reference.py | ||
| runcpu.sh | ||
| scaling_analysis.ipynb | ||
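
The commit message above describes BOS-aligned batching with cropping. The following is a minimal sketch of that idea, not the repository's actual DataLoader: the names `BOS`, `bos_aligned_rows`, and `wasted_fraction` are hypothetical, and the packing strategy (fill each row with documents in order, crop whatever does not fit) is an assumption made for illustration. See dev/LOG.md for the real design.

```python
from typing import Iterator, List

BOS = 0  # hypothetical BOS token id, chosen only for this example


def bos_aligned_rows(docs: List[List[int]], row_len: int) -> Iterator[List[int]]:
    """Yield rows of exactly row_len tokens, each beginning with BOS.

    Assumes documents are tokenized without a BOS, so one is prepended here.
    A document that does not fit in the remaining space of the current row
    is cropped and its tail is discarded (this is where tokens are wasted).
    """
    row: List[int] = []
    for doc in docs:
        tokens = [BOS] + doc
        space = row_len - len(row)
        row.extend(tokens[:space])  # crop to the remaining space, drop the rest
        if len(row) == row_len:
            yield row
            row = []  # the next row starts fresh, so it begins with a BOS
    # A trailing partial row is dropped as well.


def wasted_fraction(docs: List[List[int]], row_len: int) -> float:
    """Fraction of tokens (including prepended BOS tokens) that never reach a row."""
    total = sum(len(doc) + 1 for doc in docs)
    kept = sum(len(row) for row in bos_aligned_rows(docs, row_len))
    return 1.0 - kept / total


docs = [[1, 2, 3], [4, 5, 6, 7, 8], [9]]
rows = list(bos_aligned_rows(docs, row_len=4))
```

In this toy input the second document is cropped and the third never fills a row, so a third of the tokens are discarded. The exact waste depends on the document length distribution and the row length, which is why the roughly 35% figure in the commit message is specific to that dataset and configuration.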