2025-12-01 19:59:58 -05:00


Midtraining

timestamp: 2025-11-30 21:47:41

  • wandb_run_name: dummy
  • vertex_experiment: nanochat-experiment
  • vertex_tensorboard: projects/247010501180/locations/us-central1/tensorboards/8180826106513850368
  • device_type:
  • dtype: bfloat16
  • num_iterations: -1
  • max_seq_len: 2048
  • device_batch_size: 8
  • unembedding_lr: 0.0040
  • embedding_lr: 0.2000
  • matrix_lr: 0.0200
  • init_lr_frac: 1.0000
  • weight_decay: 0.0000
  • eval_every: 150
  • eval_tokens: 10,485,760
  • total_batch_size: 524,288
  • dry_run: 0
  • Number of iterations: 813
  • DDP world size: 1
  • Minimum validation bpb: 0.4203
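The report lists only raw settings. Several derived quantities follow from them if we assume the usual data-parallel relationship total_batch_size = device_batch_size × max_seq_len × world_size × grad_accum_steps. That relationship is an assumption here, since the report does not state it. The sketch below computes those quantities from the logged values. The helper `derived_quantities` is hypothetical and is not part of nanochat.

```python
# Hypothetical helper (not part of nanochat): derives batch arithmetic from the
# logged midtraining config. ASSUMES that
#   total_batch_size = device_batch_size * max_seq_len * world_size * grad_accum_steps
# and that evaluation consumes eval_tokens in micro-batches of the same shape.

def derived_quantities(device_batch_size: int, max_seq_len: int, world_size: int,
                       total_batch_size: int, num_steps: int, eval_tokens: int) -> dict:
    # Tokens processed by one forward/backward pass on a single device.
    tokens_per_micro_batch = device_batch_size * max_seq_len
    # Tokens processed by one forward/backward pass across all DDP ranks.
    tokens_per_world_micro_batch = tokens_per_micro_batch * world_size
    if total_batch_size % tokens_per_world_micro_batch != 0:
        raise ValueError("total_batch_size must be a multiple of "
                         "device_batch_size * max_seq_len * world_size")
    return {
        "tokens_per_micro_batch": tokens_per_micro_batch,
        # Micro-batches accumulated before each optimizer step.
        "grad_accum_steps": total_batch_size // tokens_per_world_micro_batch,
        # Tokens seen over the whole run (optimizer steps * tokens per step).
        "total_training_tokens": total_batch_size * num_steps,
        # Forward passes needed to cover eval_tokens during one evaluation.
        "eval_micro_batches": eval_tokens // tokens_per_world_micro_batch,
    }


# Values taken from the report above (DDP world size 1, 813 iterations).
q = derived_quantities(device_batch_size=8, max_seq_len=2048, world_size=1,
                       total_batch_size=524_288, num_steps=813,
                       eval_tokens=10_485_760)
for name, value in q.items():
    print(f"{name}: {value:,}")
```

Under this assumption, each of the 813 optimizer steps accumulates gradients over 32 micro-batches of 16,384 tokens on the single GPU. That amounts to roughly 426M tokens in total, and each evaluation runs 640 forward passes. On 8 GPUs with the same total_batch_size, the accumulation factor would drop to 4.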