nanochat/log/report/chat-evaluation-sft.md
2026-02-02 08:18:14 -08:00

Chat evaluation sft

timestamp: 2026-02-02 01:24:46

  • source: sft
  • task_name: None
  • dtype: bfloat16
  • temperature: 0.0000
  • max_new_tokens: 512
  • num_samples: 1
  • top_k: 50
  • batch_size: 8
  • model_tag: None
  • step: None
  • max_problems: None
  • device_type:
  • ARC-Easy: 0.4903
  • ARC-Challenge: 0.3848
  • MMLU: 0.3480
  • GSM8K: 0.0470
  • HumanEval: 0.1463
  • SpellingBee: 0.9883
  • ChatCORE metric: 0.3021
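
The report does not say how the ChatCORE metric is derived from the per-task scores. One reading that reproduces the reported value is a mean of baseline-centered accuracies, where each accuracy is rescaled so that random guessing maps to 0 and a perfect score maps to 1. The sketch below works under that assumption. The baselines (0.25 for the three multiple-choice tasks, 0.0 for the generative tasks) and the helper names `centered` and `chatcore` are illustrative assumptions, not values or functions taken from this report.

```python
# Accuracies copied from the report above.
accuracies = {
    "ARC-Easy": 0.4903,
    "ARC-Challenge": 0.3848,
    "MMLU": 0.3480,
    "GSM8K": 0.0470,
    "HumanEval": 0.1463,
    "SpellingBee": 0.9883,
}

# ASSUMPTION: random-guess baselines, not stated in the report.
# 0.25 assumes four-way multiple choice; 0.0 assumes open-ended generation.
baselines = {
    "ARC-Easy": 0.25,
    "ARC-Challenge": 0.25,
    "MMLU": 0.25,
    "GSM8K": 0.0,
    "HumanEval": 0.0,
    "SpellingBee": 0.0,
}


def centered(acc: float, baseline: float) -> float:
    """Rescale accuracy so the baseline maps to 0 and a perfect score to 1."""
    return (acc - baseline) / (1.0 - baseline)


def chatcore(accs: dict[str, float], bases: dict[str, float]) -> float:
    """Mean of baseline-centered accuracies over all tasks (hypothetical helper)."""
    scores = [centered(accs[task], bases[task]) for task in accs]
    return sum(scores) / len(scores)


print(f"{chatcore(accuracies, baselines):.4f}")  # prints 0.3021
```

Under these assumptions the result matches the reported 0.3021. Centering also shows that SpellingBee (0.9883) contributes most of the aggregate, while the three multiple-choice tasks sit only modestly above chance.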