## Tokenizer training

- timestamp: 2026-02-01 14:40:20
- max_chars: 2,000,000,000
- doc_cap: 10,000
- vocab_size: 32,768
- train_time: 87.9820
- num_special_tokens: 9
- token_bytes_min: 1
- token_bytes_max: 19
- token_bytes_mean: 6.6029
- token_bytes_std: 2.8250
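
The token byte-length statistics above summarize the distribution of per-token lengths (in bytes) across the learned vocabulary. As a rough illustration only, the sketch below shows one way such numbers could be computed from a trained vocabulary; `vocab` and `token_byte_stats` are hypothetical names, not part of the report generator, and whether special tokens are included in the statistics is an assumption.

```python
# Minimal sketch: derive token_bytes_min/max/mean/std from a vocabulary.
# `vocab` is a hypothetical mapping of token id -> token bytes; special
# tokens are assumed to be excluded from these byte-length statistics.
import statistics

def token_byte_stats(vocab: dict[int, bytes]) -> dict[str, float]:
    lengths = [len(tok) for tok in vocab.values()]
    return {
        "token_bytes_min": min(lengths),
        "token_bytes_max": max(lengths),
        "token_bytes_mean": statistics.mean(lengths),
        # population std; use statistics.stdev() if a sample std is intended
        "token_bytes_std": statistics.pstdev(lengths),
    }

# Toy example: 256 raw byte tokens plus a couple of multi-byte merges.
toy_vocab = {i: bytes([i]) for i in range(256)}
toy_vocab[256] = b"th"
toy_vocab[257] = b"the"
print(token_byte_stats(toy_vocab))
```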