Commit Graph

  • 2dc85662c3 fix: safe DDP cleanup (check initialized PG, not just env) Dipesh Babu 2025-11-05 21:22:35 -0500
  • b399e43168 fix engine test bug howardgao@outlook.com 2025-11-06 08:56:45 +0800
  • c6b7ab7440 grad clip logging and printing and cosmetics Andrej Karpathy 2025-11-05 21:08:30 +0000
  • a6efa53b92 optimisations fixed diana 2025-11-05 22:07:29 +0330
  • 890d1af779 Merge pull request #19 from Dianababaei/test/auto-discovery-comprehensive-test-suite Dianababaei 2025-11-05 20:25:22 +0330
  • ffdbb9c247 test: add comprehensive test suite for auto-batch-size discovery with unit and integration tests, pytest framework, stability validation, and updated documentation Artemis Git Integration 2025-11-05 16:52:29 +0000
  • 04e66eacfa Merge pull request #18 from Dianababaei/feat/auto-batch-size-discovery-integration Dianababaei 2025-11-05 20:20:47 +0330
  • 09f5420fab feat: add auto-batch-size discovery to base_train, mid_train, and chat_sft with fallback defaults and manual override support Artemis Git Integration 2025-11-05 16:50:27 +0000
  • fa14cba28e Merge pull request #17 from Dianababaei/feat/train-batch-sample-functions-memory-testing Dianababaei 2025-11-05 20:19:35 +0330
  • a8aad26041 feat(train): add batch sample functions for memory testing in auto-discovery Artemis Git Integration 2025-11-05 16:48:55 +0000
  • 38801c983d Merge pull request #16 from Dianababaei/feat/auto-batch-size-discovery-config Dianababaei 2025-11-05 20:18:26 +0330
  • cba76ef8ef feat(config): add auto batch size discovery with configurable parameters and CLI overrides Artemis Git Integration 2025-11-05 16:47:32 +0000
  • 747f3a82ef Merge pull request #7 from Dianababaei/feat/auto-batch-size-discovery Dianababaei 2025-11-05 20:04:42 +0330
  • 9d525655e2 Merge pull request #15 from Dianababaei/test/comprehensive-sampling-edge-cases-73bf1317 Dianababaei 2025-11-05 20:03:39 +0330
  • 8c8f08955a test: add comprehensive edge case test suite for sampling with deterministic and stochastic validation Artemis Git Integration 2025-11-05 16:32:21 +0000
  • 737165ce44 Merge pull request #14 from Dianababaei/refactor/engine-remove-token-broadcasting-first-iteration Dianababaei 2025-11-05 20:01:54 +0330
  • bacfe0f453 refactor(engine): remove token broadcasting in first iteration Artemis Git Integration 2025-11-05 16:31:19 +0000
  • ad2f5c8c2f Merge pull request #13 from Dianababaei/feat/engine-independent-token-sampling-prefill-multi-sample Dianababaei 2025-11-05 19:58:51 +0330
  • eadcbc2d8f feat(engine): enable independent token sampling in prefill for multi-sample generation Artemis Git Integration 2025-11-05 16:28:22 +0000
  • 73bf1317ff Merge pull request #12 from Dianababaei/test/engine-multi-sample-token-diversity-validation Dianababaei 2025-11-05 19:57:54 +0330
  • c63107f51c test(engine): add multi-sample token diversity validation test Artemis Git Integration 2025-11-05 16:27:02 +0000
  • 717a2d443f Merge pull request #11 from Dianababaei/test/torch-compile-validation-logging Dianababaei 2025-11-05 19:50:37 +0330
  • 47935c69d5 test: add torch.compile performance validation logging with multi-GPU compatibility checks Artemis Git Integration 2025-11-05 16:19:59 +0000
  • 49d29417f1 Merge pull request #10 from Dianababaei/refactor/chat-sft-use-orig-model-for-eval-and-checkpointing Dianababaei 2025-11-05 19:42:13 +0330
  • a381fc406d refactor(chat_sft): use uncompiled model for eval and checkpointing to prevent recompilation Artemis Git Integration 2025-11-05 16:09:43 +0000
  • 0af8c8af68 Merge pull request #9 from Dianababaei/feat/enable-torch-compile-chat-sft-fixed-shapes Dianababaei 2025-11-05 19:39:05 +0330
  • 5cd79225c4 feat(train): enable torch.compile for chat_sft with fixed shapes for 30-50% speedup Artemis Git Integration 2025-11-05 16:07:54 +0000
  • 072d49ab3c Merge pull request #8 from Dianababaei/feat/chat-sft-fixed-length-padding-torch-compile Dianababaei 2025-11-05 19:36:36 +0330
  • d8be015b20 feat(chat_sft): add fixed-length padding for torch.compile compatibility Artemis Git Integration 2025-11-05 16:04:26 +0000
  • 507b230565 feat(training): implement automatic batch size discovery module Artemis Git Integration 2025-11-05 15:59:49 +0000
  • dd52b95fde Merge 32017e831a into 885a4f25e7 Qubitium-ModelCloud 2025-11-05 18:43:52 +0530
  • 545bb8e772 Refactor wandb logging initialization Sermet Pekin 2025-11-05 15:58:41 +0300
  • b9f01eedd9 Refactor wandb initialization in chat_sft.py Sermet Pekin 2025-11-05 15:58:02 +0300
  • 523714b5c8 Replace wandb initialization with get_wandb function Sermet Pekin 2025-11-05 15:56:49 +0300
  • 679ac96efe Refactor wandb logging initialization Sermet Pekin 2025-11-05 15:55:53 +0300
  • d9be7d4f14 add get_wandb function that will either return DummyWandb or real wandb initalized Sermet Pekin 2025-11-05 15:54:48 +0300
  • 59487556ce Add pyproject for rustbpe standalone TensorTemplar 2025-11-05 13:58:12 +0200
  • a2d61393ee Make NPROC_PER_NODE customizable in run1000.sh and speedrun.sh vinjn 2025-11-04 22:16:08 -0800
  • 1671e5cf1e readability changes to f-string, remove extra .item() Nitish Pandey 2025-11-05 10:19:49 +0530
  • 885a4f25e7 Replace fcntl with filelock for Windows compatibility Andrej 2025-11-04 16:35:39 -0800
  • 3a2ae631c4 Merge branch 'master' into master Andrej 2025-11-04 16:35:02 -0800
  • 12d995f58c Add NPROC_PER_NODE var to speedrun.sh and run1000.sh Andrej 2025-11-04 16:26:33 -0800
  • f1683c5b16 set nproc_per_node as var in speedrun and run1000 scripts svlandeg 2025-11-04 21:36:10 +0100
  • 3c43ef370c handle case when grad_clip is 0.0, call .item() once only Nitish Pandey 2025-11-04 23:54:09 +0530
  • d1558c7873 handle bf16 on MPS by casting to fp32 during load checkpoint Andrej 2025-11-04 09:42:50 -0800
  • df25293087 Add explicit UTF-8 encoding on open Andrej 2025-11-04 09:38:18 -0800
  • a37fd2d37f Merge 04722913b3 into a83646e098 Mert Cobanov 2025-11-04 12:40:33 +0100
  • 1e89af9862 Replace fcntl with filelock for Windows compatibility Yasser Makram 2025-11-04 07:22:34 +0000
  • 0bd2b19b1b fix: guard fcntl import/usage for non-POSIX (Windows-safe import) Dipesh Babu 2025-11-04 01:54:29 -0500
  • a88e7ec21f fix: Correct Docker build for rustbpe tokenizer google-labs-jules[bot] 2025-11-04 02:24:08 +0000
  • fa04262889 fix: Correct Docker build for rustbpe tokenizer google-labs-jules[bot] 2025-11-04 02:05:34 +0000
  • a2189d20d0 feat: Use Cloud Build for Vertex AI pipeline image creation google-labs-jules[bot] 2025-11-04 01:47:20 +0000
  • 2781d216c6 feat: Refactor nanochat to run on Vertex AI Pipelines google-labs-jules[bot] 2025-11-04 01:26:51 +0000
  • 04b7c85353 making default value as 8 Sachin Agrawal 2025-11-03 22:25:13 +0100
  • d1fc8c5d05 fixing deleted text issue Sachin Agrawal 2025-11-03 22:09:10 +0100
  • 7a40ee77b4 fix: cast bf16 to fp32 on MPS (like CPU) to avoid dtype issues Dipesh Babu 2025-11-03 16:00:56 -0500
  • 2ce62ec076 ensure consistency of quotes within each statement svlandeg 2025-11-03 21:52:02 +0100
  • e22fc6f2fa few more explicit UTF-8 encodings svlandeg 2025-11-03 21:46:39 +0100
  • c72b8b2309 add explicit UTF-8 encoding svlandeg 2025-11-03 21:27:12 +0100
  • 03939756bc log grad norm during training Nitish Pandey 2025-11-04 00:47:31 +0530
  • a83646e098 fix(eval): use UTF-8 when reading CORE JSONL and writing CSV Andrej 2025-11-03 06:38:33 -0800
  • 8681922328 fix lstrip bug, make it removeprefix, TIL. Andrej 2025-11-03 06:37:48 -0800
  • 5be33bbb78 Add support for multilingual training with Turkish added aleynahukmet 2025-11-03 14:28:58 +0000
  • 878d8bbdfa Merge pull request #6 from Dianababaei/docs/update-generate-docstring-kv-cache-optimization Dianababaei 2025-11-03 16:07:15 +0330
  • 807a56bdfc nit Salman Mohammadi 2025-11-03 12:32:59 +0000
  • 15a782453f docs: update generate() docstring to reflect KV cache optimization Artemis Git Integration 2025-11-03 12:30:21 +0000
  • e243767cc3 cleanup Salman Mohammadi 2025-11-03 12:28:15 +0000
  • 827e608492 cleaning up speedrun Sachin Agrawal 2025-11-03 13:12:31 +0100
  • 4163c648c6 cleaning up speedrun.sh Sachin Agrawal 2025-11-03 13:10:04 +0100
  • e0e168dacd cleanup Salman Mohammadi 2025-11-03 12:07:59 +0000
  • 5cf2bca56a cleanup Salman Mohammadi 2025-11-03 12:07:23 +0000
  • cf5e213613 updating nproc to 8 Sachin Agrawal 2025-11-03 13:06:38 +0100
  • e42ac0f428 updating Readme Sachin Agrawal 2025-11-03 12:59:25 +0100
  • fe9885d20a remove excess logging Salman Mohammadi 2025-11-03 11:45:46 +0000
  • 957a1f4394 compile eval model also Salman Mohammadi 2025-11-03 11:42:34 +0000
  • 83ce1af08e Update speedrun.sh Sachin Agrawal 2025-11-03 12:01:18 +0100
  • 3a3cd20690 Merge pull request #5 from Dianababaei/feat/kv-cache-benchmark-script Dianababaei 2025-11-03 13:37:28 +0330
  • 4d9d10abb0 feat(benchmark): add performance benchmark script for KV-cache optimizations with CLI args, GPU memory tracking, and statistical measurement across iterations Artemis Git Integration 2025-11-03 10:06:02 +0000
  • 333919d764 Merge pull request #4 from Dianababaei/feat/kv-cached-generation-loop-o-t-optimization Dianababaei 2025-11-03 13:35:41 +0330
  • b78bc3fd9f perf: optimize generation loop from O(T²) to O(T) using KV-cache Artemis Git Integration 2025-11-03 10:04:43 +0000
  • 8927ec79c8 Merge pull request #3 from Dianababaei/feat/gpt-prefill-phase-kv-caching Dianababaei 2025-11-03 13:33:16 +0330
  • 1131c37a62 feat(gpt): implement prefill phase for efficient prompt processing with KV-caching Artemis Git Integration 2025-11-03 10:01:59 +0000
  • d0383978df Merge pull request #2 from Dianababaei/feat/gpt-initialize-kvcache-efficient-generation Dianababaei 2025-11-03 13:30:46 +0330
  • dd1f606c52 feat(gpt): initialize KVCache for efficient generation with MQA support Artemis Git Integration 2025-11-03 10:00:19 +0000
  • d44a3e090f Merge pull request #1 from Dianababaei/feat/gpt-add-kvcache-import Dianababaei 2025-11-03 13:26:17 +0330
  • 1703f181b9 feat(gpt): add KVCache import from engine module for efficient autoregressive generation Artemis Git Integration 2025-11-03 09:55:48 +0000
  • 9b8c4c8849 chore(sdd): init workflow skeleton 赵建新 2025-11-03 16:28:24 +0800
  • e86f8fc030 fix merge conflict Quanyi Mo 2025-11-02 22:49:30 -0800
  • 5ca0950c9c update README.md Quanyi Mo 2025-11-02 22:03:02 -0800
  • de6597533f change to allow 24GB VRAM gpu(3090/4090) to run training/eval Quanyi Mo 2025-11-02 21:49:51 -0800
  • 226953b841 fix: open JSONL and results CSV with UTF-8 encoding for portability Dipesh Babu 2025-11-03 01:20:56 -0500
  • 620c5f468c Create SECURITY.md Rittikrai kirikan 2025-11-03 12:43:26 +0700
  • f1e15f5f4d Fixing subtle bug: lstrip removes all matching characters, including potentially required ones. Use removeprefix instead. Josh Odom 2025-11-02 23:40:37 -0600
  • 34da6e1fa8 Update README.md-Ging Rittikrai kirikan 2025-11-03 11:25:11 +0700
  • 984dfa69e2 Update README.md-Ging Rittikrai kirikan 2025-11-03 02:22:29 +0700
  • b6da6982f6 fix nanochat logo: the t was placed too far to the right Andrej 2025-11-02 08:17:00 -0800
  • c2c4f77e22 oops small bugfix to run1000.sh missing kwarg Andrej 2025-11-02 08:14:41 -0800
  • 29c46065f6 aligned with latest changes willhama 2025-11-02 17:07:01 +0100
  • da2a597c61 Merge branch 'master' into added-tinyrun-for-minimal-configuration willhama 2025-11-02 17:04:50 +0100
  • 1a0b93d8a2 added tinyrun to run with single gpu willhama 2025-11-02 17:02:37 +0100