Commit Graph

  • d1ac0b2d07 when loading models on CPU, convert tensors from bfloat16 to float Andrej 2025-11-02 07:58:56 -0800
  • 5bfcd31b73 revert more formatting changes svlandeg 2025-11-02 14:17:10 +0100
  • 036a3c5881 revert formatting changes to facilitate review svlandeg 2025-11-02 14:16:43 +0100
  • 52e85aaf80 Merge branch 'master' into fix/typo svlandeg 2025-11-02 13:41:13 +0100
  • 1688ba9597 add head_dim, num_heads, num_kv_heads, depth_to_width_ratio as arguments to base_train.py to allow modeling flexibility Nitish Pandey 2025-11-02 16:26:47 +0530
  • ba4f40bf58 Update run1000.sh to add missing --run=$WANDB_RUN Jing Zhang 2025-11-01 21:27:00 -0700
  • d54c9cbf8c CPU Support, as bfloat16 params breaks inference Manuel Saelices 2025-11-01 23:38:50 +0100
  • cf587acb1a move eval bundle download to be lazy and inside the python code so that we can substantially simplify the run bash scripts Andrej Karpathy 2025-11-01 16:04:38 +0000
  • 7d2c4a3d95 delete pandas dep in base_eval use csv instead Andrej Karpathy 2025-11-01 15:28:30 +0000
  • ad39db5a23 tiny fix to comment Andrej 2025-11-01 07:43:57 -0700
  • 630f54ae5a use empty locals and globals in call to eval() in engine tool use Andrej 2025-11-01 07:22:59 -0700
  • 6fe0b41dd2 Merge 96c8e82324 into f15732524a Ruhollah Majdoddin 2025-11-01 07:14:41 -0700
  • f15732524a make deepwiki link better Andrej Karpathy 2025-11-01 14:13:29 +0000
  • e100cdf7f1 Illustrated architecture Nda-jiya Suberu 2025-11-01 11:07:21 +0000
  • 7312d36e58 link update for lazy people 😭 ayaan sharif 2025-10-31 23:23:19 +0530
  • 3b372875c1 Manage the Python module with maturin konstin 2025-10-31 15:58:05 +0100
  • 7f3154f025 Update .github/workflows/test.yml Sermet Pekin 2025-10-31 13:34:57 +0300
  • 76ecece5f3 rename base.yml as test.yml Sermet Pekin 2025-10-31 09:18:38 +0300
  • b8d0c7f391 Update .github/workflows/base.yml Sermet Pekin 2025-10-31 09:17:08 +0300
  • c98648d0a9 Update .github/workflows/base.yml Sermet Pekin 2025-10-31 09:16:46 +0300
  • 887e68409f Update .github/workflows/base.yml Sermet Pekin 2025-10-31 09:16:23 +0300
  • 5cfcbaa4cd Update .github/workflows/base.yml Sermet Pekin 2025-10-31 09:16:01 +0300
  • 876da692c6 Update .github/workflows/base.yml Sermet Pekin 2025-10-31 09:15:40 +0300
  • 317e4b65df hypercube-abacus Nda-jiya Suberu 2025-10-31 03:27:16 +0000
  • 96c8e82324 Use compatible release operator for rustbpe dependency Ruhollah Majdoddin 2025-10-30 21:41:22 +0100
  • 158b2b707b Update uv.lock after a rebase to master, which had changed pyproject.toml Ruhollah Majdoddin 2025-10-30 21:17:16 +0100
  • f35b27c4a2 amending the runscripts to use the new installation Ruhollah Majdoddin 2025-10-30 19:36:22 +0100
  • bdeacffdae fixing rebuild of rustbpe, if its is changed Ruhollah Majdoddin 2025-10-30 19:34:47 +0100
  • 47960bdbf2 uv workspace Ruhollah Majdoddin 2025-10-25 18:05:16 +0200
  • d60b720213 fix(eval): drop pandas in base_eval, harden metadata lookup, fix typo Dipesh Babu 2025-10-30 13:37:58 -0400
  • dfc88334b6 fix tok/sec calculation bug when grad accum steps > 1 Andrej 2025-10-30 08:36:32 -0700
  • eb11bb0e2e remove numpy as dep Andrej 2025-10-30 08:28:14 -0700
  • a3e1352f6b fix: inference_mode, csv metadata, typo, DDP comment Dipesh Babu 2025-10-30 02:04:26 -0400
  • 69b7cc9ac5 remove .DS_Store Kian Kyars 2025-10-29 17:20:46 -0600
  • 4a1104ed1c merge master Kian Kyars 2025-10-29 17:18:28 -0600
  • 70319851fc fix typo svlandeg 2025-10-29 19:48:34 +0100
  • 7bd999ba02 feat(engine.py): Sample unique initial tokens for each sequence in a batch Azekowka 2025-10-29 22:02:28 +0500
  • 0a784e25de Merge 557b2d5840 into 1ccbaf4416 Abdulaziz Gabitov 2025-10-29 21:41:29 +0500
  • 1ccbaf4416 nit delete redundant catch/raise in execute Andrej 2025-10-29 08:10:03 -0700
  • 29ff38d94b Merge pull request #35 from bhaskar0210s/master Andrej 2025-10-29 08:06:24 -0700
  • 09f1f4283e Merge branch 'master' into cleanup Alex Gaynor 2025-10-29 07:41:54 -0400
  • b996131570 Merge branch 'master' into logo/kerning-update svlandeg 2025-10-29 11:45:40 +0100
  • 3fa974f93c few more reverts svlandeg 2025-10-29 11:45:02 +0100
  • cbd560a83d revert formatting changes to minimize diff and merge conflicts svlandeg 2025-10-29 11:42:56 +0100
  • c93f90d161 clean up Sofie Van Landeghem 2025-10-29 09:57:35 +0100
  • 4715fdcf52 Merge branch 'master' into master_hoslak svlandeg 2025-10-29 09:40:24 +0100
  • 964d459d9b remove type annotations Matthew Murphy 2025-10-29 01:28:19 -0700
  • 69e3cd410d Merge branch 'fuse-attn-proj-layers' of github.com:murphymatt/nanochat into fuse-attn-proj-layers Matthew Murphy 2025-10-29 01:23:30 -0700
  • ae6dd06489 supporting multi-turn for spelling bee tasks Richard Hsu 2025-10-28 23:14:09 -0700
  • 134f9b7a8f remove kvcache import Matt Murphy 2025-10-14 07:40:05 +0000
  • 7e87fa8a71 fuse qkv linear and qk rotary + norm Matthew Murphy 2025-10-13 22:55:55 -0700
  • a1de1f46ad Merge pull request #156 from tlepoint/fix/export-base-dir Andrej 2025-10-28 15:19:08 -0700
  • ee00f523d0 fixing all the typos to make the pull requests stop Andrej 2025-10-28 13:36:07 -0700
  • 5e0987a431 numpy isnt acting as a dependency for nanochat, so isnt it better to remove numpy from dependencies list Ajeesh Sunil 2025-10-28 20:05:38 +0000
  • 8c9b004c99 typo fixes in scripts svlandeg 2025-10-28 20:17:31 +0100
  • 0a3ce7b0ff typo fixes in readme svlandeg 2025-10-28 20:11:00 +0100
  • f314f1dc59 Remove need for pandas in the base eval script Ádám Vajda 2025-10-28 18:30:28 +0100
  • fdda5826e3 Merge branch 'haowei01-fix_kv_cache_due_to_resize' Andrej Karpathy 2025-10-28 16:54:30 +0000
  • baf0b3fdda also add a test that failed before the fix and passes now with the fix for kv cache resize Andrej Karpathy 2025-10-28 16:54:17 +0000
  • f1db6b4712 delete czar call for help, i'm working through the inbound on that now. add current LLM policy which just asks for disclosure atm Andrej Karpathy 2025-10-28 15:17:43 +0000
  • 9415931f85 delete czar call for help, i'm working through the inbound on that now. add current LLM policy which just asks for disclosure atm Andrej Karpathy 2025-10-28 15:17:43 +0000
  • 093bab4e50 fix: adjust logits shape for liger_cross_entropy loss calculation Eddie Tsai 2025-10-28 15:14:39 +0800
  • 60611fe877 add support for liger_cross_entropy in loss calculation when available Eddie Tsai 2025-10-28 15:08:58 +0800
  • d179dd6918 Merge branch 'master' into master ong brandon (ob1) 2025-10-28 14:26:51 +0800
  • 111f7cf9de r4 ob1 2025-10-28 14:25:24 +0800
  • 2b9c085559 update the kv_shape Haowei Zhang 2025-10-27 02:47:13 -0700
  • b062b422ac Fix kv cache, given resize will destroys the logical structure Haowei Zhang 2025-10-27 02:23:08 -0700
  • 436b5d7e74 Fix CUDA out of memory error for 14.56GB GPUs with multiple optimizations Claude 2025-10-27 02:46:40 +0000
  • 5e5adaf76e Refactor run_t4_quick_test.sh to check for existing installations of uv and rustc before attempting to install. Update download logic for evaluation data and identity conversations to skip if already present. Modify model tag naming conventions in training scripts for consistency. z 2025-10-27 10:24:47 +0800
  • 2020eb9973 Refactor run_t4_quick_test.sh to check for existing installations of uv and rustc before installation, and improve data download logic to skip existing files. Update checkpoint directory naming conventions in training scripts for consistency. z 2025-10-27 10:24:24 +0800
  • 7456de29b9 update NANOCHAT_BASE_DIR path in run_t4_quick_test.sh to use a relative directory z 2025-10-27 10:07:03 +0800
  • f5a8e8e3f0 midtrain_onwards script Richard Hsu 2025-10-26 21:35:52 +0000
  • 58b38fcd81 minor Richard Hsu 2025-10-26 12:55:33 -0700
  • 6da078295a Extension - supporting Spelling Bee but for digits (e.g. how many 2 in 6789022?) Richard Hsu 2025-10-26 12:34:44 -0700
  • 7eac69487b formatting svilupp 2025-10-26 19:28:04 +0000
  • 6bfc1f8f53 update synth-data-pipe svilupp 2025-10-26 19:24:22 +0000
  • e7f6062ea7 deleted doc for proposal to submit a PR Ethan Silverthorne 2025-10-26 13:51:09 -0400
  • 72f803ff58 multilingual file added and readME updated Ethan Silverthorne 2025-10-26 13:50:42 -0400
  • a7d4b0045e create proposal for issue Ethan Silverthorne 2025-10-26 13:45:08 -0400
  • c739d3bd79 Fixed a typo Eugene Trifonov 2025-10-26 18:25:29 +0400
  • 725599b86d Merge branch 'master' into master tillo 2025-10-26 15:12:17 +0100
  • 1c301a374c format tillo beffa 2025-10-26 15:10:26 +0100
  • c77fbb010b update BASE_URL in dataset.py to new modelscope link for data access z 2025-10-26 17:18:39 +0800
  • a9de4b1038 Fix tok/sec metrics for base_train and mid_train when gradient accumulation is not 1 water-vapor 2025-10-26 01:43:49 -0500
  • c964f74ed7 Rename 'gpu' dependency to 'cuda' to properly reflect the dependencies being loaded tkreindler 2025-10-26 03:37:51 +0000
  • 9d4556ba95 Make GPU optional tkreindler 2025-10-26 03:28:07 +0000
  • e72efbbf14 Updated runcpu.sh to use the gpu sometimes tkreindler 2025-10-26 03:18:59 +0000
  • d513db792e Format a bit tkreindler 2025-10-26 02:42:50 +0000
  • 13b97b6088 Merge branch 'karpathy:master' into claude/nanochat-sae-interpretability-011CUT2TocZpFerXthoW9LMf Caleb DeLeeuw 2025-10-25 19:11:19 -0700
  • 6a3fdf1422 Got it working tkreindler 2025-10-25 23:18:24 +0000
  • 0d8973a53d Merge branch 'karpathy:master' into add-agc-gradient-clipping Alex 2025-10-25 15:55:42 -0700
  • 80adb78ff4 docs(dev): fix typos in gen_synthetic_data.py comments/prompts hasan 2025-10-26 00:06:57 +0200
  • 4add4ff246 docs: fix typos and clarify README hasan 2025-10-25 23:43:38 +0200
  • f31faa5dfe wip add dev container tkreindler 2025-10-25 21:09:29 +0000
  • c75fe54aa7 readme tweak, link to new discussion and add file structure Andrej Karpathy 2025-10-25 19:39:16 +0000
  • 0197c1cd3c up svilupp 2025-10-25 18:38:28 +0100
  • ffe22fd4d5 up svilupp 2025-10-25 13:28:46 +0100
  • 558e949ddd Add SAE-based interpretability extension for nanochat Claude 2025-10-25 01:22:51 +0000
  • 0a1059d571 add into rustbpe MadMax129 2025-10-24 18:52:26 -0400
  • 851810c7d5 remove string allocations MadMax129 2025-10-24 17:06:06 -0400