nanochat

tacit/nanochat

mirror of https://github.com/karpathy/nanochat.git synced 2025-12-06 04:12:13 +00:00

Commit Graph

2dc85662c3 fix: safe DDP cleanup (check initialized PG, not just env) Dipesh Babu 2025-11-05 21:22:35 -0500
b399e43168 fix engine test bug howardgao@outlook.com 2025-11-06 08:56:45 +0800
c6b7ab7440 grad clip logging and printing and cosmetics Andrej Karpathy 2025-11-05 21:08:30 +0000
a6efa53b92 optimisations fixed diana 2025-11-05 22:07:29 +0330
890d1af779 Merge pull request #19 from Dianababaei/test/auto-discovery-comprehensive-test-suite Dianababaei 2025-11-05 20:25:22 +0330
ffdbb9c247 test: add comprehensive test suite for auto-batch-size discovery with unit and integration tests, pytest framework, stability validation, and updated documentation Artemis Git Integration 2025-11-05 16:52:29 +0000
04e66eacfa Merge pull request #18 from Dianababaei/feat/auto-batch-size-discovery-integration Dianababaei 2025-11-05 20:20:47 +0330
09f5420fab feat: add auto-batch-size discovery to base_train, mid_train, and chat_sft with fallback defaults and manual override support Artemis Git Integration 2025-11-05 16:50:27 +0000
fa14cba28e Merge pull request #17 from Dianababaei/feat/train-batch-sample-functions-memory-testing Dianababaei 2025-11-05 20:19:35 +0330
a8aad26041 feat(train): add batch sample functions for memory testing in auto-discovery Artemis Git Integration 2025-11-05 16:48:55 +0000
38801c983d Merge pull request #16 from Dianababaei/feat/auto-batch-size-discovery-config Dianababaei 2025-11-05 20:18:26 +0330
cba76ef8ef feat(config): add auto batch size discovery with configurable parameters and CLI overrides Artemis Git Integration 2025-11-05 16:47:32 +0000
747f3a82ef Merge pull request #7 from Dianababaei/feat/auto-batch-size-discovery Dianababaei 2025-11-05 20:04:42 +0330
9d525655e2 Merge pull request #15 from Dianababaei/test/comprehensive-sampling-edge-cases-73bf1317 Dianababaei 2025-11-05 20:03:39 +0330
8c8f08955a test: add comprehensive edge case test suite for sampling with deterministic and stochastic validation Artemis Git Integration 2025-11-05 16:32:21 +0000
737165ce44 Merge pull request #14 from Dianababaei/refactor/engine-remove-token-broadcasting-first-iteration Dianababaei 2025-11-05 20:01:54 +0330
bacfe0f453 refactor(engine): remove token broadcasting in first iteration Artemis Git Integration 2025-11-05 16:31:19 +0000
ad2f5c8c2f Merge pull request #13 from Dianababaei/feat/engine-independent-token-sampling-prefill-multi-sample Dianababaei 2025-11-05 19:58:51 +0330
eadcbc2d8f feat(engine): enable independent token sampling in prefill for multi-sample generation Artemis Git Integration 2025-11-05 16:28:22 +0000
73bf1317ff Merge pull request #12 from Dianababaei/test/engine-multi-sample-token-diversity-validation Dianababaei 2025-11-05 19:57:54 +0330
c63107f51c test(engine): add multi-sample token diversity validation test Artemis Git Integration 2025-11-05 16:27:02 +0000
717a2d443f Merge pull request #11 from Dianababaei/test/torch-compile-validation-logging Dianababaei 2025-11-05 19:50:37 +0330
47935c69d5 test: add torch.compile performance validation logging with multi-GPU compatibility checks Artemis Git Integration 2025-11-05 16:19:59 +0000
49d29417f1 Merge pull request #10 from Dianababaei/refactor/chat-sft-use-orig-model-for-eval-and-checkpointing Dianababaei 2025-11-05 19:42:13 +0330
a381fc406d refactor(chat_sft): use uncompiled model for eval and checkpointing to prevent recompilation Artemis Git Integration 2025-11-05 16:09:43 +0000
0af8c8af68 Merge pull request #9 from Dianababaei/feat/enable-torch-compile-chat-sft-fixed-shapes Dianababaei 2025-11-05 19:39:05 +0330
5cd79225c4 feat(train): enable torch.compile for chat_sft with fixed shapes for 30-50% speedup Artemis Git Integration 2025-11-05 16:07:54 +0000
072d49ab3c Merge pull request #8 from Dianababaei/feat/chat-sft-fixed-length-padding-torch-compile Dianababaei 2025-11-05 19:36:36 +0330
d8be015b20 feat(chat_sft): add fixed-length padding for torch.compile compatibility Artemis Git Integration 2025-11-05 16:04:26 +0000
507b230565 feat(training): implement automatic batch size discovery module Artemis Git Integration 2025-11-05 15:59:49 +0000
dd52b95fde Merge 32017e831a into 885a4f25e7 Qubitium-ModelCloud 2025-11-05 18:43:52 +0530
545bb8e772 Refactor wandb logging initialization Sermet Pekin 2025-11-05 15:58:41 +0300
b9f01eedd9 Refactor wandb initialization in chat_sft.py Sermet Pekin 2025-11-05 15:58:02 +0300
523714b5c8 Replace wandb initialization with get_wandb function Sermet Pekin 2025-11-05 15:56:49 +0300
679ac96efe Refactor wandb logging initialization Sermet Pekin 2025-11-05 15:55:53 +0300
d9be7d4f14 add get_wandb function that will either return DummyWandb or real wandb initalized Sermet Pekin 2025-11-05 15:54:48 +0300
59487556ce Add pyproject for rustbpe standalone TensorTemplar 2025-11-05 13:58:12 +0200
a2d61393ee Make NPROC_PER_NODE customizable in run1000.sh and speedrun.sh vinjn 2025-11-04 22:16:08 -0800
1671e5cf1e readability changes to f-string, remove extra .item() Nitish Pandey 2025-11-05 10:19:49 +0530
885a4f25e7 Replace fcntl with filelock for Windows compatibility Andrej 2025-11-04 16:35:39 -0800
3a2ae631c4 Merge branch 'master' into master Andrej 2025-11-04 16:35:02 -0800
12d995f58c Add NPROC_PER_NODE var to speedrun.sh and run1000.sh Andrej 2025-11-04 16:26:33 -0800
f1683c5b16 set nproc_per_node as var in speedrun and run1000 scripts svlandeg 2025-11-04 21:36:10 +0100
3c43ef370c handle case when grad_clip is 0.0, call .item() once only Nitish Pandey 2025-11-04 23:54:09 +0530
d1558c7873 handle bf16 on MPS by casting to fp32 during load checkpoint Andrej 2025-11-04 09:42:50 -0800
df25293087 Add explicit UTF-8 encoding on open Andrej 2025-11-04 09:38:18 -0800
a37fd2d37f Merge 04722913b3 into a83646e098 Mert Cobanov 2025-11-04 12:40:33 +0100
1e89af9862 Replace fcntl with filelock for Windows compatibility Yasser Makram 2025-11-04 07:22:34 +0000
0bd2b19b1b fix: guard fcntl import/usage for non-POSIX (Windows-safe import) Dipesh Babu 2025-11-04 01:54:29 -0500
a88e7ec21f fix: Correct Docker build for rustbpe tokenizer google-labs-jules[bot] 2025-11-04 02:24:08 +0000
fa04262889 fix: Correct Docker build for rustbpe tokenizer google-labs-jules[bot] 2025-11-04 02:05:34 +0000
a2189d20d0 feat: Use Cloud Build for Vertex AI pipeline image creation google-labs-jules[bot] 2025-11-04 01:47:20 +0000
2781d216c6 feat: Refactor nanochat to run on Vertex AI Pipelines google-labs-jules[bot] 2025-11-04 01:26:51 +0000
04b7c85353 making default value as 8 Sachin Agrawal 2025-11-03 22:25:13 +0100
d1fc8c5d05 fixing deleted text issue Sachin Agrawal 2025-11-03 22:09:10 +0100
7a40ee77b4 fix: cast bf16 to fp32 on MPS (like CPU) to avoid dtype issues Dipesh Babu 2025-11-03 16:00:56 -0500
2ce62ec076 ensure consistency of quotes within each statement svlandeg 2025-11-03 21:52:02 +0100
e22fc6f2fa few more explicit UTF-8 encodings svlandeg 2025-11-03 21:46:39 +0100
c72b8b2309 add explicit UTF-8 encoding svlandeg 2025-11-03 21:27:12 +0100
03939756bc log grad norm during training Nitish Pandey 2025-11-04 00:47:31 +0530
a83646e098 fix(eval): use UTF-8 when reading CORE JSONL and writing CSV Andrej 2025-11-03 06:38:33 -0800
8681922328 fix lstrip bug, make it removeprefix, TIL. Andrej 2025-11-03 06:37:48 -0800
5be33bbb78 Add support for multilingual training with Turkish added aleynahukmet 2025-11-03 14:28:58 +0000
878d8bbdfa Merge pull request #6 from Dianababaei/docs/update-generate-docstring-kv-cache-optimization Dianababaei 2025-11-03 16:07:15 +0330
807a56bdfc nit Salman Mohammadi 2025-11-03 12:32:59 +0000
15a782453f docs: update generate() docstring to reflect KV cache optimization Artemis Git Integration 2025-11-03 12:30:21 +0000
e243767cc3 cleanup Salman Mohammadi 2025-11-03 12:28:15 +0000
827e608492 cleaning up speedrun Sachin Agrawal 2025-11-03 13:12:31 +0100
4163c648c6 cleaning up speedrun.sh Sachin Agrawal 2025-11-03 13:10:04 +0100
e0e168dacd cleanup Salman Mohammadi 2025-11-03 12:07:59 +0000
5cf2bca56a cleanup Salman Mohammadi 2025-11-03 12:07:23 +0000
cf5e213613 updating nproc to 8 Sachin Agrawal 2025-11-03 13:06:38 +0100
e42ac0f428 updating Readme Sachin Agrawal 2025-11-03 12:59:25 +0100
fe9885d20a remove excess logging Salman Mohammadi 2025-11-03 11:45:46 +0000
957a1f4394 compile eval model also Salman Mohammadi 2025-11-03 11:42:34 +0000
83ce1af08e Update speedrun.sh Sachin Agrawal 2025-11-03 12:01:18 +0100
3a3cd20690 Merge pull request #5 from Dianababaei/feat/kv-cache-benchmark-script Dianababaei 2025-11-03 13:37:28 +0330
4d9d10abb0 feat(benchmark): add performance benchmark script for KV-cache optimizations with CLI args, GPU memory tracking, and statistical measurement across iterations Artemis Git Integration 2025-11-03 10:06:02 +0000
333919d764 Merge pull request #4 from Dianababaei/feat/kv-cached-generation-loop-o-t-optimization Dianababaei 2025-11-03 13:35:41 +0330
b78bc3fd9f perf: optimize generation loop from O(T²) to O(T) using KV-cache Artemis Git Integration 2025-11-03 10:04:43 +0000
8927ec79c8 Merge pull request #3 from Dianababaei/feat/gpt-prefill-phase-kv-caching Dianababaei 2025-11-03 13:33:16 +0330
1131c37a62 feat(gpt): implement prefill phase for efficient prompt processing with KV-caching Artemis Git Integration 2025-11-03 10:01:59 +0000
d0383978df Merge pull request #2 from Dianababaei/feat/gpt-initialize-kvcache-efficient-generation Dianababaei 2025-11-03 13:30:46 +0330
dd1f606c52 feat(gpt): initialize KVCache for efficient generation with MQA support Artemis Git Integration 2025-11-03 10:00:19 +0000
d44a3e090f Merge pull request #1 from Dianababaei/feat/gpt-add-kvcache-import Dianababaei 2025-11-03 13:26:17 +0330
1703f181b9 feat(gpt): add KVCache import from engine module for efficient autoregressive generation Artemis Git Integration 2025-11-03 09:55:48 +0000
9b8c4c8849 chore(sdd): init workflow skeleton 赵建新 2025-11-03 16:28:24 +0800
e86f8fc030 fix merge conflict Quanyi Mo 2025-11-02 22:49:30 -0800
5ca0950c9c update README.md Quanyi Mo 2025-11-02 22:03:02 -0800
de6597533f change to allow 24GB VRAM gpu(3090/4090) to run training/eval Quanyi Mo 2025-11-02 21:49:51 -0800
226953b841 fix: open JSONL and results CSV with UTF-8 encoding for portability Dipesh Babu 2025-11-03 01:20:56 -0500
620c5f468c Create SECURITY.md Rittikrai kirikan 2025-11-03 12:43:26 +0700
f1e15f5f4d Fixing subtle bug: lstrip removes all matching characters, including potentially required ones. Use removeprefix instead. Josh Odom 2025-11-02 23:40:37 -0600
34da6e1fa8 Update README.md-Ging Rittikrai kirikan 2025-11-03 11:25:11 +0700
984dfa69e2 Update README.md-Ging Rittikrai kirikan 2025-11-03 02:22:29 +0700
b6da6982f6 fix nanochat logo: the t was placed too far to the right Andrej 2025-11-02 08:17:00 -0800
c2c4f77e22 oops small bugfix to run1000.sh missing kwarg Andrej 2025-11-02 08:14:41 -0800
29c46065f6 aligned with latest changes willhama 2025-11-02 17:07:01 +0100
da2a597c61 Merge branch 'master' into added-tinyrun-for-minimal-configuration willhama 2025-11-02 17:04:50 +0100
1a0b93d8a2 added tinyrun to run with single gpu willhama 2025-11-02 17:02:37 +0100