nanochat/log/report/tokenizer-training.md
2026-02-02 08:18:14 -08:00

## Tokenizer training
timestamp: 2026-02-01 14:40:20
- max_chars: 2,000,000,000
- doc_cap: 10,000
- vocab_size: 32,768
- train_time: 87.9820 (seconds)
- num_special_tokens: 9
- token_bytes_min: 1
- token_bytes_max: 19
- token_bytes_mean: 6.6029
- token_bytes_std: 2.8250
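
The `token_bytes_*` entries summarize how many bytes each vocabulary token covers. As a rough illustration of how such statistics could be derived, here is a minimal sketch. The function name `token_byte_stats` and the toy vocabulary are hypothetical, and two details are assumptions rather than facts from this report: that zero-length entries (for example, special tokens) are excluded, and that the standard deviation is the population form rather than the sample form.

```python
import statistics


def token_byte_stats(token_bytes_list):
    """Summarize token lengths in bytes (hypothetical helper, not the report's own code)."""
    # Assumption: zero-length entries (e.g. special tokens) are skipped.
    lengths = [len(tok) for tok in token_bytes_list if len(tok) > 0]
    return {
        "token_bytes_min": min(lengths),
        "token_bytes_max": max(lengths),
        "token_bytes_mean": statistics.fmean(lengths),
        # Assumption: population standard deviation; the report may use the sample form.
        "token_bytes_std": statistics.pstdev(lengths),
    }


# Toy vocabulary for illustration only; the real one has 32,768 entries.
toy_vocab = [b"a", b"the", b" token", b"ization", b""]
for name, value in token_byte_stats(toy_vocab).items():
    print(f"- {name}: {value}")
```

Under these assumptions, a mean of roughly 6.6 bytes says that a typical vocabulary entry spans several characters, which is a property of the vocabulary itself and not the compression ratio measured on real text.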