Same bug as scaling_laws.sh: TOKENS_TRAINED was computed as NUM_ITERS * 524288,
hardcoding the default total batch size. When base_train auto-computes a different
batch size, the value is wrong. Fix by reading "Total number of training tokens:"
directly from the training log.

Two bugs caused all parameter columns and tokens_trained to be silently
empty/wrong in the results CSV:
1. Parameter grep patterns did not account for the padded key format.
   base_train.py prints parameters as `{key:24s}: {value:,}`, e.g.
   `wte                     : 33,554,432`, so patterns like `grep "wte:"`
   never matched. Fixed by using `grep -P "wte\s+:"` to handle the padding.
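   A minimal sketch of the pattern fix, assuming GNU grep (for `-P`); the
   log line is reconstructed here with printf from the `{key:24s}: {value:,}`
   format, and the value is made up for illustration:

   ```shell
   # Reproduce the padded key format that base_train.py prints.
   line=$(printf '%-24s: %s' "wte" "33,554,432")

   # Old pattern: "wte:" never matches because of the padding spaces.
   printf '%s\n' "$line" | grep -c 'wte:'    # prints 0

   # Fixed pattern: allow whitespace between the key and the colon,
   # then strip the thousands separators from the value.
   value=$(printf '%s\n' "$line" | grep -P 'wte\s+:' | awk -F': ' '{print $2}' | tr -d ',')
   echo "$value"                             # prints 33554432
   ```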
2. tokens_trained was hardcoded as `NUM_ITERS * 524288`, but the batch
size is auto-computed by base_train.py and may differ from 524288
depending on the FLOPs budget and model size. Fixed by extracting the
actual value from the log line "Total number of training tokens: X".
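A minimal sketch of the extraction, assuming the log line format quoted
above; the log contents and temp-file handling here are made up for
illustration (and `tr -d ','` is defensive in case the value is printed
with thousands separators):

```shell
# Fake training log containing the line base_train emits.
log=$(mktemp)
cat > "$log" <<'EOF'
Total number of training tokens: 498073600
EOF

# Read the actual token count from the log instead of recomputing
# NUM_ITERS * 524288 with a hardcoded batch size.
tokens=$(grep 'Total number of training tokens:' "$log" | awk '{print $NF}' | tr -d ',')
echo "$tokens"    # prints 498073600
rm -f "$log"
```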