nanochat/report/chat-evaluation-mid.md
2025-12-01 19:59:58 -05:00

Chat evaluation mid

timestamp: 2025-11-30 23:51:33

  • source: mid
  • task_name: None
  • dtype: bfloat16
  • temperature: 0.0000
  • max_new_tokens: 512
  • num_samples: 1
  • top_k: 50
  • batch_size: 8
  • model_tag: None
  • step: None
  • max_problems: None
  • device_type:
  • ARC-Easy: 0.3847
  • ARC-Challenge: 0.2944
  • MMLU: 0.3079
  • GSM8K: 0.0303
  • HumanEval: 0.0610
  • SpellingBee: 0.9688
  • ChatCORE metric: 0.2293
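
The ChatCORE value is not a plain average of the six accuracies above (that would be about 0.34). A minimal sketch of one aggregation that is consistent with the reported number: rescale each accuracy so that random guessing maps to 0 and a perfect score maps to 1, then take the mean. The baselines here are assumptions, not values stated in this report: 0.25 for the three multiple-choice tasks (four answer options) and 0.0 for the open-ended ones. The names `centered` and `chatcore` are hypothetical helpers for illustration.

```python
# Accuracies copied from the report above.
accuracies = {
    "ARC-Easy": 0.3847,
    "ARC-Challenge": 0.2944,
    "MMLU": 0.3079,
    "GSM8K": 0.0303,
    "HumanEval": 0.0610,
    "SpellingBee": 0.9688,
}

# ASSUMPTION: random-guess baselines. 0.25 for four-option multiple choice,
# 0.0 for open-ended generation tasks. These are not listed in the report.
baselines = {
    "ARC-Easy": 0.25,
    "ARC-Challenge": 0.25,
    "MMLU": 0.25,
    "GSM8K": 0.0,
    "HumanEval": 0.0,
    "SpellingBee": 0.0,
}


def centered(acc: float, baseline: float) -> float:
    """Rescale an accuracy so the baseline maps to 0 and a perfect score to 1."""
    return (acc - baseline) / (1.0 - baseline)


def chatcore(accs: dict, bases: dict) -> float:
    """Mean of the baseline-centered accuracies over all tasks."""
    scores = [centered(accs[task], bases[task]) for task in accs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for task, acc in accuracies.items():
        print(f"{task}: {centered(acc, baselines[task]):.4f}")
    print(f"ChatCORE (sketch): {chatcore(accuracies, baselines):.4f}")
```

Under these assumed baselines the result agrees with the reported ChatCORE metric to within rounding of the four-decimal inputs. Read this way, the multiple-choice tasks contribute little because they sit only slightly above chance, and SpellingBee accounts for most of the aggregate.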