Mirror of https://github.com/karpathy/nanochat.git (synced 2026-04-23 01:08:40 +00:00)
## Tokenizer training
timestamp: 2026-02-01 14:40:20

- max_chars: 2,000,000,000
- doc_cap: 10,000
- vocab_size: 32,768
- train_time: 87.9820
- num_special_tokens: 9
- token_bytes_min: 1
- token_bytes_max: 19
- token_bytes_mean: 6.6029
- token_bytes_std: 2.8250
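
The `token_bytes_*` fields summarize how many bytes each vocabulary token spans. As a rough sketch of how such a summary could be produced, the snippet below computes the four statistics over a list of token byte strings. The function name `token_byte_stats`, the toy vocabulary, and the choice of population standard deviation are all assumptions made for illustration; the report does not say whether special tokens are excluded or which standard deviation convention was used.

```python
import statistics


def token_byte_stats(token_byte_strings):
    """Summarize the byte lengths of a collection of tokens.

    `token_byte_strings` is a hypothetical input: one `bytes` object per
    vocabulary token. Population standard deviation is an assumption here.
    """
    lengths = [len(b) for b in token_byte_strings]
    return {
        "token_bytes_min": min(lengths),
        "token_bytes_max": max(lengths),
        "token_bytes_mean": statistics.fmean(lengths),
        "token_bytes_std": statistics.pstdev(lengths),
    }


# Toy vocabulary for illustration only (not taken from the real tokenizer).
toy_vocab = [b"a", b"the", b" hello", "é".encode("utf-8")]
stats = token_byte_stats(toy_vocab)

for name, value in stats.items():
    print(f"- {name}: {value}")
```

Lengths are measured in bytes rather than characters, so a single non-ASCII character such as `é` counts as two bytes under UTF-8.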