mirror of https://github.com/karpathy/nanochat.git
synced 2026-01-23 20:04:22 +00:00
humaneval done
This commit is contained in:
parent c026e6f63d
commit a1f836bbeb
lm_eval.md (13 changed lines)
@@ -33,28 +33,27 @@ uv run lm-eval run --model hf \
 --tasks mmlu \
 --batch_size 1

-# A small suite similar to nanochat chat_eval coverage
+# A small suite similar to nanochat chat_eval coverage (vanilla HF backend)
+# HumanEval requires both flags below to allow executing generated code.
 HF_ALLOW_CODE_EVAL=1 uv run lm-eval run --confirm_run_unsafe_code --model hf \
 --model_args pretrained=hf-export/sft,trust_remote_code=True \
---tasks arc_easy,arc_challenge,mmlu \
+--tasks arc_easy,arc_challenge,gsm8k,mmlu,humaneval \
 --apply_chat_template \
 --batch_size 1 > log.log 2>&1

 # Nanochat-aligned tool-use backend (matches nanochat eval formatting)
 HF_ALLOW_CODE_EVAL=1 uv run lm-eval run \
 --include_path tools/lm-eval/lm_eval/tasks \
 --confirm_run_unsafe_code \
---model hf \
+--model hf-nanochat-tool \
 --model_args pretrained=hf-export/sft,trust_remote_code=True,tokenizer=hf-export/sft \
 --tasks gsm8k_nanochat,humaneval_nanochat \
---apply_chat_template \
 --batch_size 1 \
 --log_samples \
---output_path lm_eval_sample_nanochat.json > log.log 2>&1
+--output_path lm_eval_sample_nanochat > log.log 2>&1
 ```

 Notes:
 - If you exported to a different folder, change `pretrained=...` accordingly. You can also point to a remote HF repo name.
 - If you must stay offline, add `HF_DATASETS_OFFLINE=1 HF_HUB_OFFLINE=1 TRANSFORMERS_OFFLINE=1`, **but** ensure the datasets are already cached locally (e.g., `allenai/ai2_arc`, `openai_humaneval`, `gsm8k`, `cais/mmlu`). Otherwise, leave them unset so the harness can download once.
 - `--batch_size auto` can help find the largest batch that fits GPU RAM. On CPU, keep it small.
-- No KV cache is implemented in the HF wrapper; generation is standard `AutoModelForCausalLM` style.
+- No KV cache is implemented in the HF wrapper; generation is standard `AutoModelForCausalLM` style. The `hf-nanochat-tool` wrapper runs a nanochat-style tool loop (greedy, batch=1) and does not need `--apply_chat_template` because the prompts already contain special tokens.
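As a small sketch of the offline note above: the three switches can be exported once for the whole session rather than prefixed onto each command (the actual `lm-eval` invocation is only indicated in a comment):

```shell
# Offline switches from the Notes, exported for the whole session.
# Only set these once the datasets listed in the Notes are cached locally.
export HF_DATASETS_OFFLINE=1
export HF_HUB_OFFLINE=1
export TRANSFORMERS_OFFLINE=1

# Subsequent harness runs will refuse to touch the network, e.g.:
#   uv run lm-eval run --model hf --model_args pretrained=hf-export/sft ...
echo "offline mode: $HF_DATASETS_OFFLINE $HF_HUB_OFFLINE $TRANSFORMERS_OFFLINE"
```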
File diff suppressed because one or more lines are too long
File diff suppressed because it is too large
@@ -1 +1 @@
-Subproject commit 5628f98f0c387366f18964e3d34b614e5600f83b
+Subproject commit 32c4b74696a41586712a8a8b7906591833ba1a78