Commit Graph

  • fca2b8cd07 harden eval: prevent the calc tool from accessing globals and locals. By passing empty globals() and locals() to eval() we can prevent simple malicious cases where the user gets the model to output something like Marius Wachtler 2025-10-24 14:29:35 -0500
  • 05a051dbe9 fix tokenization bug, there should be no space before first letter. sigh Andrej Karpathy 2025-10-24 15:06:06 +0000
  • 8892470f29 add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think Andrej Karpathy 2025-10-24 14:02:48 +0000
  • 81597cd616 move the lr schedule args up in base_train so they are tunable in configurator Andrej Karpathy 2025-10-24 13:27:31 +0000
  • cc3636b01c allow the tokenizer visualize_tokenization to also print the exact token id. you can never be paranoid enough Andrej Karpathy 2025-10-24 13:27:05 +0000
  • 9841848a2f Add Adaptive Gradient Clipping (AGC) to pretraining ulanch 2025-10-23 23:27:31 -0700
  • ca17179ec2 export wandb run name works ob1 2025-10-24 13:54:28 +0800
  • 41c8b8dbde removed buffer approuch MadMax129 2025-10-23 20:23:59 -0400
  • e02938c0aa cleanup MadMax129 2025-10-23 17:55:33 -0400
  • 12f418f0a1 faster regex in C MadMax129 2025-10-23 16:59:10 -0400
  • ed8d73a154 aling scripts with speedrun.sh Shizhe Diao 2025-10-23 06:33:23 -0700
  • a5f3ccc3ca improve scripts to preprocess climbmix Shizhe Diao 2025-10-23 06:33:10 -0700
  • 33ddc13ed4 Improve configurator: add testable parse_args() and ConfigManager class Sermet Pekin 2025-10-23 09:56:52 +0300
  • e237116095 rename bash script Shizhe Diao 2025-10-22 22:36:56 -0700
  • 2a6276bfcb restore speedrun.sh Shizhe Diao 2025-10-22 22:36:12 -0700
  • 29b94f35ec track speedrun.sh Shizhe Diao 2025-10-22 22:33:19 -0700
  • 1d34a19b87 remove pretrain.sh and midtrain.sh Shizhe Diao 2025-10-22 22:30:21 -0700
  • dd8310c3d4 clean comments Shizhe Diao 2025-10-22 22:22:28 -0700
  • f384c16ba5 Update comments Shizhe Diao 2025-10-22 22:19:20 -0700
  • 55fed15421 remove redundant configs in base_eval.py Shizhe Diao 2025-10-22 22:09:24 -0700
  • 3525e6d5b7 remove redundancy Shizhe Diao 2025-10-22 22:05:37 -0700
  • de3ef20e20 rename files Shizhe Diao 2025-10-22 22:00:26 -0700
  • 66a92fc293 rename Shizhe Diao 2025-10-22 21:13:34 -0700
  • 4b62a8b00c add nemotron data processing script Shizhe Diao 2025-10-22 21:09:12 -0700
  • fc534f5f41 add script for nemotron recipe Shizhe Diao 2025-10-22 21:02:22 -0700
  • cf3b8ca20e fixed a bug in base_eval.py Shizhe Diao 2025-10-22 20:59:23 -0700
  • b939be0372 update script Shizhe Diao 2025-10-21 06:27:05 -0700
  • f3f069519d improve tokenizer and report in midtrain and sft Shizhe Diao 2025-10-20 22:04:27 -0700
  • 169022fec0 fixed a bug in base_eval Shizhe Diao 2025-10-20 22:03:26 -0700
  • 78611b9983 upload midtrain_sft_submit.sh Shizhe Diao 2025-10-20 11:51:34 -0700
  • 370de99bbf get report right Shizhe Diao 2025-10-20 11:51:01 -0700
  • e872e798c4 improve script Shizhe Diao 2025-10-20 11:48:24 -0700
  • defcef6587 edit report generation Shizhe Diao 2025-10-20 11:12:05 -0700
  • fc23c1aa71 use the same tokenizer Shizhe Diao 2025-10-20 11:11:31 -0700
  • cee6a17d9e support nemotron posttraining data Shizhe Diao 2025-10-19 15:11:54 -0700
  • 7690b82d4b support nemotron posttraining data in mid-train and sft Shizhe Diao 2025-10-19 15:10:53 -0700
  • 646647c776 support custom tokenizer by adding tokenizer_name Shizhe Diao 2025-10-19 15:08:01 -0700
  • 2085e6637a support custom training data, train tokenizer Shizhe Diao 2025-10-19 07:55:41 -0700
  • 15e7a22a41 support custom training data Shizhe Diao 2025-10-19 07:53:44 -0700
  • 21d8b9994f multinode slurm submit Shizhe Diao 2025-10-18 07:30:09 -0700
  • be1e6c3592 add exp_name as unique id Shizhe Diao 2025-10-18 07:29:34 -0700
  • 0de778a75b update wandb Shizhe Diao 2025-10-17 16:14:47 -0700
  • d5cda11ab8 Export the base dir variable Tancrède Lepoint 2025-10-22 18:13:04 -0400
  • cd6cebcbe0 Export the base dir variable Tancrède Lepoint 2025-10-22 18:13:04 -0400
  • 7715d0d425 reset tests/test_rustbpe.py file to upstream master Sermet Pekin 2025-10-22 21:25:52 +0300
  • a7d130f015 workflow remove windows from matrix Sermet Pekin 2025-10-22 20:52:05 +0300
  • c4efcafaa8 wf uv sync --extra cpu Sermet Pekin 2025-10-22 20:46:39 +0300
  • 63e4691357 Specify UTF-8 encoding for on test_rustbpe.py while enwik8 file reads Sermet Pekin 2025-10-22 13:57:45 +0300
  • 4b45dfee97 Rename base.yml to .github/workflows/base.yml Sermet Pekin 2025-10-22 13:51:16 +0300
  • 46659d1009 Refactor CI workflow to use 'uv' commands Sermet Pekin 2025-10-22 13:50:35 +0300
  • 5eeb2b6ef9 experiment: looking to 'hire' a nanochat repo czar to help the repo, mentioning in readme Andrej Karpathy 2025-10-22 16:55:54 +0000
  • 2dda5c4c8d Merge branch 'ulanch-fix/ios-safari-input-overlap' Andrej Karpathy 2025-10-22 16:26:35 +0000
  • 80b203ea59 also bump run1000.sh to new uv sync Andrej Karpathy 2025-10-22 16:08:31 +0000
  • 917c858136 Updates lockfile with CPU package support without overwriting other architectures Luke Stanley 2025-10-21 20:53:18 +0000
  • db1d5b595d Git ignore eval_bundle Luke Stanley 2025-10-21 20:39:31 +0000
  • dd9387b362 Fix GPU-less CPU use on Linux with specific Torch indexes Luke Stanley 2025-10-21 19:52:21 +0000
  • 32571664b1 Fix Torch crash caused by pinning on CPU Luke Stanley 2025-10-21 19:43:38 +0000
  • 51e70f0d3c Merge branch 'lukestanley-fix-cpu-support-with-extras' Andrej Karpathy 2025-10-22 16:11:15 +0000
  • 48387cd895 also bump run1000.sh to new uv sync Andrej Karpathy 2025-10-22 16:08:31 +0000
  • ca632b2fd9 FEAT: Allow CPU-only execution in compute_init SyedaAnshrahGillani 2025-10-22 18:12:18 +0500
  • 529d1b9cf9 docs: add comprehensive developer onboarding guide Claude 2025-10-22 13:07:55 +0000
  • 6641aeed1d FEAT: Allow CPU-only execution in compute_init SyedaAnshrahGillani 2025-10-22 18:07:30 +0500
  • 3fbe7cd2b9 Merge b70da6d907 into 2e938530ce SyedaAnshrahGillani 2025-10-22 18:05:50 +0530
  • 67d76b834a tidy up and doc simplification Jason Kneen 2025-10-22 11:07:22 +0100
  • e83d633179 Add training continuation script and update MacOS guide Jason Kneen 2025-10-22 09:37:31 +0100
  • b81d789992 Pass device batch size to base_loss script Jason Kneen 2025-10-22 09:29:46 +0100
  • 53cc5c9bd0 Implement tests for configurator.py functionality Sermet Pekin 2025-10-22 11:03:12 +0300
  • 1225ddf00e Add macOS memory-optimized training and documentation Jason Kneen 2025-10-22 07:35:26 +0100
  • 32017e831a fix gb200 tflops Qubitium 2025-10-22 02:03:23 +0000
  • 4e6f5eb8b9 Merge branch 'master' into fix-mfu-a100 Qubitium-ModelCloud 2025-10-22 09:59:55 +0800
  • ff0605c372 fix mfu statically keyed to h100 max tflops Qubitium 2025-10-21 05:08:49 +0000
  • 5a3d8b6b5e Update nanochat/gpt.py Jason Kneen 2025-10-22 02:37:32 +0100
  • 3e184d343e Improve Mac/MPS compatibility and device handling Jason Kneen 2025-10-22 01:55:38 +0100
  • 796f84527f fix(ui): prevent iOS Safari toolbar from covering input on initial load ulanch 2025-10-21 17:34:40 -0700
  • 7a52f9bfbb Updates lockfile with CPU package support without overwriting other architectures Luke Stanley 2025-10-21 20:53:18 +0000
  • 760af62e11 Git ignore eval_bundle Luke Stanley 2025-10-21 20:39:31 +0000
  • 901b075605 Fix GPU-less CPU use on Linux with specific Torch indexes Luke Stanley 2025-10-21 19:52:21 +0000
  • 067298c51b reverted back Murali Chandran 2025-10-21 23:45:09 +0100
  • defd1246aa Fix Torch crash caused by pinning on CPU Luke Stanley 2025-10-21 19:43:38 +0000
  • 2e938530ce delete spurious torch.empty allocation in adamw Andrej 2025-10-21 11:35:17 -0700
  • a088b7a6ec use enable_gqa of pytorch sdpa, allows us to delete some code, didnt realize it's available Andrej Karpathy 2025-10-21 18:07:33 +0000
  • 94ee507054 quick fix base eval due to fewshot requirement Andrej Karpathy 2025-10-21 17:56:08 +0000
  • cc8d819286 fix runcpu when num_fewshot is greater than data burtenshaw 2025-10-21 19:37:46 +0200
  • 33e8a27f91 Merge karpathy/cpu-mps-dev , adding the ability to run on CPU, on MPS, or on CUDA, with autodetect. Gnarly PR, nonzero chance I broke something. Andrej 2025-10-21 10:26:04 -0700
  • 50bea28ef9 also add readme mention of the cpu mps changes Andrej Karpathy 2025-10-21 17:24:48 +0000
  • 5bdc99abfb merge and resolve conflict Andrej Karpathy 2025-10-21 17:19:10 +0000
  • dfcb1c16f1 Merge branch 'master' into cpu-mps-dev Andrej Karpathy 2025-10-21 17:15:53 +0000
  • bb71c64579 fix silly issue in dataloader, this version is much faster and more portable to mps too Andrej Karpathy 2025-10-21 17:12:50 +0000
  • bb786c5560 i shouldnt have committed the lock file, i missed that. revert to the flagship build which is linux. sorry to pollute the repo history... karpathy 2025-10-21 10:07:40 -0700
  • 2bd578fb80 Merge 7a337f3d5d into c9ea7a91e2 mnehete32 2025-10-21 17:57:46 +0200
  • c9ea7a91e2 Add customization instructions to README Andrej 2025-10-21 08:57:10 -0700
  • 03cddd9878 actually let's not brick code on git pull. change error to warning Andrej Karpathy 2025-10-21 15:13:25 +0000
  • fe5aed940b add personality to nanochat. breaks previous code on git pull and requires download of a new file from s3, but there is a helpful error message so hopefully its ok Andrej Karpathy 2025-10-21 15:04:58 +0000
  • a3b701af8d change in code block display Goderr 2025-10-21 20:17:03 +0530
  • e9c5e911f6 change in linespacing of response Goderr 2025-10-21 20:14:15 +0530
  • c5ef68cea2 Add comprehensive educational guide for nanochat Matt Suiche 2025-10-21 18:36:26 +0400
  • 5617ce8d69 optimize print0 function on common.py Sermet Pekin 2025-10-21 16:44:02 +0300
  • 033c9be7e4 update script Shizhe Diao 2025-10-21 06:27:05 -0700
  • ce64059d65 Optimize print0 function: Cache DDP rank evaluation for better performance Sermet Pekin 2025-10-21 16:07:56 +0300
  • 82e7cbe611 update structure of sft notebook Ben Burtenshaw 2025-10-21 12:38:31 +0000
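
Commit fca2b8cd07 above describes hardening the calculator tool by calling eval() with empty globals and locals. The following is a minimal sketch of that idea, not the repository's actual code: the helper name restricted_eval and the SECRET variable are hypothetical, and clearing "__builtins__" is an added assumption (Python otherwise inserts the builtins module into an empty globals dict). As the commit message says, this only blocks simple cases; attribute access on literals still works, so it is not a full sandbox.

```python
def restricted_eval(expr: str):
    """Evaluate expr without access to the caller's globals, locals, or builtins.

    Hypothetical helper for illustration only; not taken from the repository.
    """
    # An explicit empty "__builtins__" stops Python from injecting the real
    # builtins module, so names like __import__ or open are not resolvable.
    return eval(expr, {"__builtins__": {}}, {})


SECRET = "not-for-the-model"  # stands in for any module-level state

if __name__ == "__main__":
    print(restricted_eval("2 + 3 * 4"))
    # String methods still work, e.g. the count() use mentioned in 8892470f29.
    print(restricted_eval("'strawberry'.count('r')"))
    for attempt in ("SECRET", "__import__('os')"):
        try:
            restricted_eval(attempt)
        except NameError as exc:
            print("blocked:", exc)
```

Plain arithmetic and string-method expressions still evaluate, while lookups of module-level names or builtins raise NameError.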