Commit Graph

372 Commits

Author SHA1 Message Date
Manmohan
1e2fc09ca6
Merge pull request #17 from manmohan659/feat/chat-api-service
feat(chat-api): conversation orchestration + SSE streaming proxy (#6)
2026-04-16 14:57:10 -04:00
Manmohan
4297817cfb
Merge pull request #16 from manmohan659/feat/auth-service
feat(auth): OAuth2 + JWT auth service with Alembic migrations (#5 #7)
2026-04-16 14:56:51 -04:00
Manmohan Sharma
8153a4fadf
feat(chat-api): conversation orchestration + SSE streaming proxy (#6)
- FastAPI service that manages conversations and messages in PostgreSQL
  (SQLAlchemy 2.0 async + asyncpg) and streams assistant responses back
  to the client via sse-starlette, forwarding the inference service SSE
  contract unchanged.
- Auth guard validates every request against the auth service
  /auth/validate endpoint (X-Internal-API-Key) and caches results in an
  in-process TTL cache (5 min, 1024 entries) to absorb request bursts.
- Every query filters by authenticated user_id; cross-user access
  returns 404. Message send flow auto-titles the first message,
  persists the streamed assistant response after the client disconnects,
  and records token_count + inference_time_ms.
- /api/models{,/swap} proxies the inference admin surface; swap
  requires is_admin on the validated user.
- Structured JSON logging via structlog with trace_id + user_id
  ContextVars attached to every log line.
- Test suite (pytest + aiosqlite + respx) covers CRUD, user scoping,
  streaming SSE persistence, regenerate, model proxy admin gate,
  and the stream proxy error path. 16/16 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 11:49:51 -07:00
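Note on 8153a4fadf: the auth guard's in-process TTL cache (5 min, 1024 entries) could look roughly like the sketch below. This is not the code from the commit. The class name `TTLCache`, its methods, the injectable `clock` parameter, and the evict-oldest policy when full are all assumptions made for illustration; only the 5-minute TTL and the 1024-entry cap come from the commit message.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class TTLCache:
    """Hypothetical in-process cache with a per-entry TTL and a size cap."""

    def __init__(
        self,
        ttl_seconds: float = 300.0,   # 5 min, as stated in the commit message
        max_entries: int = 1024,      # 1024 entries, as stated in the commit message
        clock: Callable[[], float] = time.monotonic,  # injectable for tests (assumption)
    ) -> None:
        self.ttl_seconds = ttl_seconds
        self.max_entries = max_entries
        self._clock = clock
        self._data = OrderedDict()  # key -> (expires_at, value), oldest first

    def get(self, key: Hashable) -> Optional[Any]:
        """Return the cached value, or None if it is missing or expired."""
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._data[key]  # drop the stale entry lazily on read
            return None
        return value

    def set(self, key: Hashable, value: Any) -> None:
        """Store a value; evict the oldest entry when the cache is full."""
        if key in self._data:
            del self._data[key]  # re-inserting moves the key to the newest slot
        elif len(self._data) >= self.max_entries:
            self._data.popitem(last=False)  # assumed policy: evict oldest insert
        self._data[key] = (self._clock() + self.ttl_seconds, value)
```

A guard built on this would call `get(token)` first and only hit /auth/validate on a miss, then `set(token, result)`.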
Manmohan Sharma
4b4aca642a
feat(auth): OAuth2 + JWT auth service with Alembic migrations (#5 #7)
- Alembic async migrations: users, conversations, messages, is_favorited
- FastAPI auth service: Google + GitHub OAuth, RS256 JWT, refresh cookie
- /auth/me, /auth/refresh, /auth/validate (service-to-service)
- rate limiting 10/min on OAuth routes, CORS locked to FRONTEND_URL

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 11:47:00 -07:00
Manmohan
9bd0c907cc
Merge pull request #15 from manmohan659/feat/frontend-service
feat(frontend): Next.js 14 frontend service for samosaChaat (Workstream A, #2)
2026-04-16 14:27:18 -04:00
Manmohan Sharma
634be4080b
feat(frontend): Next.js 14 frontend service for samosaChaat (#2)
Build services/frontend/ replacing the legacy nanochat/ui.html single-file UI.
Landing, login, and chat pages ported with full design system: Devanagari +
Great Vibes hero, samosa/chai/toran SVG animations, gold/cream palette.

- App Router pages: / (hero + floating illustrations), /login (split-screen
  OAuth with mandala motif), /chat (260px collapsible sidebar, suggestion
  chips, markdown + code-copy, auto-expanding input, slash commands)
- SSE streaming via useSSE hook and /api/chat/stream BFF route (proxies to
  CHAT_API_URL when set, falls back to mock echo for local dev)
- NextAuth.js v5 with Google + GitHub providers; middleware gates /chat/*
- Zustand store with localStorage persistence for conversations/settings
- Tailwind theme carries all ui.html tokens + keyframes (pendulum, float,
  wobble, steamFloat, steamType); SVG assets componentized under components/svg
- Multi-stage node:20-alpine Dockerfile with Next standalone output

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 11:26:57 -07:00
Manmohan
2be82fe731
Merge pull request #13 from manmohan659/feat/terraform-infra
feat(terraform): provision full AWS stack for samosaChaat
2026-04-16 14:26:20 -04:00
Manmohan
ba6988ee32
Merge pull request #14 from manmohan659/feat/inference-service
[codex] Extract standalone inference service
2026-04-16 14:25:55 -04:00
Manmohan
a0533f2199
Merge pull request #12 from manmohan659/feat/monorepo-scaffold
[codex] Scaffold monorepo platform layout
2026-04-16 14:25:35 -04:00
Manmohan Sharma
577771b890
extract standalone inference service 2026-04-16 11:19:18 -07:00
Manmohan Sharma
b381933c3b
feat(terraform): provision full AWS stack for samosaChaat (issue #4)
Add reusable Terraform modules and per-environment configs (dev/uat/prod)
in us-west-2 covering: VPC (3 AZ public/private), EKS 1.29 with IRSA and
ALB/EBS/EFS CSI add-ons, RDS PostgreSQL 15, four ECR repos, IAM roles
(EKS node, ALB controller IRSA, GitHub Actions OIDC), Route53 + ACM for
samosachaat.art, and EFS for model weights. State backend on S3
(samosachaat-terraform-state) with DynamoDB lock table.

terraform validate passes for dev, uat, and prod.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 11:11:02 -07:00
Manmohan Sharma
957f66181d
scaffold monorepo platform layout 2026-04-16 11:06:29 -07:00
Manmohan
baef0a3d66
Merge pull request #1 from manmohan659/codex/pre-gpu-readiness
Add pre-GPU tool training and checkpoint plumbing
2026-03-24 20:53:39 -04:00
Manmohan Sharma
9f3973c677
Add pre-GPU tool training and checkpoint plumbing 2026-03-24 20:52:36 -04:00
Manmohan Sharma
e159c1cf9e
improve mobile responsiveness: proper scaling for phone/tablet
- Tablet (700px): shrink illustrations, hide steam, wider message bubbles
- Mobile (480px): smaller hero text, compact input bar, stacked footer,
  tighter spacing, scaled-down toran, properly sized illustrations
- Small phones (360px): further reduced hero/illustration sizes
- Safe area insets for notched phones

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 12:46:57 -04:00
Manmohan Sharma
5b1b9632fc
UI fixes: single-page layout, input near illustrations, remove cart, fix naming
- Merged landing + chat into single page (samosa/chai slide out on first message)
- Positioned input bar between samosa and chai illustrations
- Footer at very bottom with Karpathy credit
- Removed cart icon, fixed "Aachaat" → "Chaat" everywhere
- Improved lemon SVG with stem/nub
- "Explore" → "Samosa" label

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 12:22:53 -04:00
Manmohan Sharma
ee34586e77
redesign UI: artisan landing page + warm chat theme + ONNX export script
Landing page with desi street-food aesthetic: lemon-mirchi toran with
pendulum animation, dual-script hero (Devanagari + English cursive),
samosa illustration with floating animation, brass chai kettle with
steam wisps, ambient chilli/lemon doodles.

Chat page carries the warm samosa-chaat palette with cream/gold user
bubbles, steam-wisp typing indicator, and WebGPU integration hooks
(window.samosaChaat API for local inference mode switching).

Added scripts/export_onnx.py for ONNX model export with KV cache
support, targeting WebGPU browser inference.

Credit to Andrej Karpathy's nanochat in footer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 11:54:07 -04:00
Manmohan Sharma
40586713bd
fix KV cache dtype mismatch on CPU: use COMPUTE_DTYPE instead of hardcoded logic
The KV cache was hardcoded to float32 on non-CUDA devices, but the model
weights are loaded in bfloat16 via NANOCHAT_DTYPE env var. This caused a
RuntimeError in scaled_dot_product_attention. Now uses COMPUTE_DTYPE from
common.py which respects the env var.

Also broadened CI/CD path triggers to nanochat/**.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 10:04:33 -04:00
Manmohan Sharma
c3f683f3e3
add CI/CD auto-deploy workflow for samosaChaat
Deploys to EC2 on push to master when UI/server files change.
Uses appleboy/ssh-action with stored secrets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 10:00:25 -04:00
Manmohan Sharma
c767741b42
rebrand to samosaChaat: UI, logo, and server messages
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 09:58:12 -04:00
Andrej Karpathy
5019accc5b fix scaling laws scripts after the bigram embeddings were removed 2026-03-17 16:55:56 +00:00
Andrej Karpathy
1b1cc3c599 submit new time to GPT-2 leaderboard entry: 99 minutes 2026-03-14 17:15:01 +00:00
Andrej Karpathy
a825e63f81 Autoresearch round 2: smear, backout, and hyperparameter tuning
New architectural features:
- Smear: mix previous token embedding into current position via learned
  gate, providing cheap bigram-like info (works in training + KV cache)
- Backout: subtract learned fraction of mid-layer residual before logit
  projection to remove low-level features

Hyperparameter tuning:
- Muon momentum warmdown 0.97→0.90 during LR warmdown phase
- Non-uniform per-layer init: resid_lambdas 1.15→1.05, x0_lambdas 0.20→0.05
- c_fc init scale 0.4x, QK norm scale 1.2, sliding window seq_len/4
- Speedrun data:params ratio reduced to 8

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 17:03:06 +00:00
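Note on a825e63f81: the "Smear" idea (mix the previous token's embedding into the current position through a gate) can be sketched in plain Python as below. The exact formulation in the commit is not given in the message, so the additive form, the sigmoid gate, and the function name `smear` are assumptions. In the real model the gate is learned; here the gate logits are simply passed in.

```python
import math
from typing import List


def smear(embeddings: List[List[float]], gate_logits: List[float]) -> List[List[float]]:
    """Assumed form: out[t] = x[t] + sigmoid(g[t]) * x[t-1]; position 0 is unchanged.

    embeddings:  one vector per token position
    gate_logits: one scalar per position (stand-in for a learned gate)
    """
    if not embeddings:
        return []
    out = [list(embeddings[0])]  # the first token has no predecessor
    for t in range(1, len(embeddings)):
        gate = 1.0 / (1.0 + math.exp(-gate_logits[t]))  # squash logit to (0, 1)
        prev, cur = embeddings[t - 1], embeddings[t]
        # Mix in the *original* previous embedding, not the already-smeared one,
        # so each position only needs its immediate predecessor (KV-cache friendly).
        out.append([c + gate * p for c, p in zip(cur, prev)])
    return out
```

Because each output depends only on positions t and t-1, the same rule can be applied one token at a time during cached decoding.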
Andrej Karpathy
f068604948 new leaderboard entry coming from improvements of autoresearch round 1, time to gpt-2 from 2.02 hours to 1.80 hours 2026-03-10 06:26:39 +00:00
Andrej Karpathy
6ed7d1d82c All of these improvements were developed by Claude running autonomously over ~2 days using autoresearch. I didn't touch anything - incredible. All tuning was done on d12 but generalized easily to larger models (e.g. d24 in particular). This means we will also get a new "Time to GPT-2" Leaderboard entry, which I will push separately.
Optimizer & schedule changes:
- Increase unembedding LR 0.004 -> 0.008, weight decay 0.2 -> 0.28
- Per-group Adam betas and weight decay (instead of shared global betas)
- Muon beta2 0.95 -> 0.9, momentum warmup target 0.95 -> 0.97 over 400 steps
- Warmup: ratio-based -> absolute steps (default 40)
- Warmdown ratio 0.5 -> 0.65, final LR fraction 0.0 -> 0.05
- Weight decay schedule: linear -> cosine decay
- Polar express norm factor 1.02 -> 1.01

Architecture & init changes:
- VE gate: channels 32 -> 12, scale range 2x -> 3x, init small positive
- Add post-QK-norm scaling (q,k *= 1.15) for sharper attention
- Embedding init std 1.0 -> 0.8, MLP c_fc init 0.5x smaller
- RoPE base theta 10K -> 100K
- Short attention window: seq_len/2 -> ~seq_len/3 (ceil to 128 tile)
- Logit softcap 20 -> 15
2026-03-09 20:45:17 +00:00
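Note on 6ed7d1d82c: the schedule changes (absolute warmup of 40 steps, warmdown ratio 0.65, final LR fraction 0.05) can be illustrated with an LR multiplier like the one below. The linear ramp shapes and the function name `lr_multiplier` are assumptions; the commit message gives only the three numbers.

```python
def lr_multiplier(
    step: int,
    total_steps: int,
    warmup_steps: int = 40,        # absolute steps, per the commit message
    warmdown_ratio: float = 0.65,  # fraction of the run spent decaying
    final_lr_frac: float = 0.05,   # LR floor as a fraction of the peak
) -> float:
    """Return the factor to multiply the peak learning rate by at `step`."""
    warmdown_steps = round(warmdown_ratio * total_steps)
    if step < warmup_steps:
        # Assumed linear warmup from 1/warmup_steps up to 1.0.
        return (step + 1) / warmup_steps
    if step <= total_steps - warmdown_steps:
        # Plateau at the peak learning rate.
        return 1.0
    # Assumed linear warmdown from 1.0 to final_lr_frac.
    progress = (total_steps - step) / warmdown_steps  # goes 1 -> 0
    return progress * 1.0 + (1.0 - progress) * final_lr_frac
```

With `total_steps=1000`, the multiplier reaches 1.0 at step 39, holds until step 350, and ends at 0.05 on the final step.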
Andrej Karpathy
1076f97059 delete autocast, an unnecessary thorn in my side, manage dtypes directly 2026-03-04 23:55:30 +00:00
Sofie Van Landeghem
752abc836e
Ensure that inputs and targets are contiguous (#569)
* call reshape instead of view in case the tensors are not contiguous

* fix directly in data loader instead
2026-03-04 13:58:27 -08:00
Andrej Karpathy
4b4077425b Document new Leaderboard entry congrats @ddudek for pointing out ClimbMix, time to GPT-2 is now 2.01 hours, down from 2.76 previously 2026-03-04 20:02:07 +00:00
Andrej Karpathy
324e69c45d big, breaking change but large upside: swap previous FineWeb-EDU dataset to NVIDIA ClimbMix dataset. Requires people to download the data shards. The upside is that training GPT-2 capability model now only takes ~2 hours, down from 2.76 hours, so this is a huge win data-wise 2026-03-04 19:47:12 +00:00
Andrej Karpathy
b07604ebaa document the legacy fineweb100b dataset and the new climbmix400b dataset 2026-03-03 17:24:31 +00:00
Andrej Karpathy
aba30cb037 tune logit softcap? 2026-03-03 00:38:53 +00:00
Anish
83dccc20ae
Restore completion-only loss masking in SFT dataloader (#582)
* printing steps count

* adding reply only loss for chat

* using the mask by render_conversation function of tokeniser

* undoing some changes

* putting back the comment which got removed accidentally, no functionality change
2026-03-02 16:37:47 -08:00
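Note on 83dccc20ae: completion-only loss masking means only assistant-reply tokens contribute to the loss. A minimal sketch follows, assuming the tokenizer yields a token-id list plus a parallel 0/1 mask (1 = assistant token) and that the loss ignores targets equal to -1. The helper name `build_inputs_and_targets` and the `IGNORE_INDEX` value are illustrative, not taken from the commit.

```python
from typing import List, Tuple

IGNORE_INDEX = -1  # assumed sentinel that the loss function skips


def build_inputs_and_targets(
    ids: List[int], mask: List[int]
) -> Tuple[List[int], List[int]]:
    """Shift by one for next-token prediction and hide non-reply targets.

    ids:  token ids of one rendered conversation
    mask: 1 where the token belongs to an assistant reply, else 0
    """
    if len(ids) != len(mask):
        raise ValueError("ids and mask must have the same length")
    inputs = ids[:-1]  # the model sees tokens 0..n-2
    # The target at position t is token t+1; keep it only if it is a reply token.
    targets = [
        tok if m == 1 else IGNORE_INDEX
        for tok, m in zip(ids[1:], mask[1:])
    ]
    return inputs, targets
```

For example, `build_inputs_and_targets([10, 11, 12, 13, 14], [0, 0, 1, 1, 0])` keeps only 12 and 13 as training targets.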
Dipesh Babu
c7ba252142
docs: fix typos in experiment log (#547) 2026-02-20 08:03:45 -08:00
Andrej Karpathy
2dffdc8cf6 document MoE exploration 2026-02-19 02:53:47 +00:00
Andrej Karpathy
48804bff3a report negative result on fineweb dataset 2026-02-18 23:45:31 +00:00
Andrej Karpathy
bb5137860e fix comment 2026-02-18 23:26:22 +00:00
Andrej Karpathy
458555117b Merge branch 'Chetter2-patch-1' 2026-02-18 23:17:39 +00:00
Andrej Karpathy
bac5a35dd7 fix minor bug in fp8 application to skip tiny matmuls 2026-02-18 23:17:29 +00:00
George Shakan
ad55575326 Fix bug in setting precision (#538) 2026-02-18 15:49:18 +00:00
Sofie Van Landeghem
cac43e8511 Fix MockModel's device definition (#535)
* fix MockModel's device definition

* cleanup
2026-02-18 15:49:18 +00:00
Andrej Karpathy
f5fe7925ed update dev log with recent 2026-02-18 15:49:18 +00:00
Andrej Karpathy
1415fb7617 tune the data mixture a bit, load optimizer by default when SFT. These were confirmed to be best settings from sweeps of sft 2026-02-18 15:49:18 +00:00
Andrej Karpathy
77f8fb8303 a number of upgrades to SFT script to bring it up to date w.r.t. pretraining and tuning some of its kwargs based on sweeps 2026-02-18 15:49:18 +00:00
George Shakan
0a23f87643
Fix bug in setting precision (#538) 2026-02-18 07:42:11 -08:00
Sofie Van Landeghem
4800c62f6e
Fix MockModel's device definition (#535)
* fix MockModel's device definition

* cleanup
2026-02-17 16:03:46 -08:00
Andrej Karpathy
4a6e47b0c6 update dev log with recent 2026-02-17 15:44:54 +00:00
Andrej Karpathy
8180e1d8c1 tune the data mixture a bit, load optimizer by default when SFT. These were confirmed to be best settings from sweeps of sft 2026-02-16 20:23:04 +00:00
Andrej Karpathy
788dadeb88 a number of upgrades to SFT script to bring it up to date w.r.t. pretraining and tuning some of its kwargs based on sweeps 2026-02-16 14:41:53 +00:00
Alan
124f49be98
Removed redundant quantization of gradients 2026-02-15 15:41:33 +00:00
Alan
d9678ff0f9
Save FP8 tensors in autograd ctx instead of full-precision inputs
Store quantized input/weight and their inverse scales in _Float8Matmul ctx to avoid re-quantization in backward and reduce saved-activation memory without changing numerics.
2026-02-15 14:31:54 +00:00