- deploy.sh: single script to switch between EC2 and EKS modes
- ec2: docker-compose with ECR images + nginx SSL reverse proxy
- eks: terraform apply + helm install (for demos/grading)
- eks-down: terraform destroy (stop costs)
- docker-compose.prod.yml: ECR image overrides + nginx service
- nginx/nginx.conf: reverse proxy with SSL, SSE streaming support
- deploy-ec2.yml: auto-deploy to EC2 after images are built
- Remove old single-server deploy.yml
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The observability PR added structlog and prometheus-fastapi-instrumentator
to inference pyproject.toml but did not regenerate uv.lock, causing the
Docker build to fail under the --locked flag.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root pyproject.toml uses uv features (extra in sources, conflicts)
that caused uv sync to fail in CI. Fix by:
1. Replace pip install uv==0.4.30 with astral-sh/setup-uv@v4 (latest)
2. Add --no-workspace flag so services don't inherit root config
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces the helm/observability scaffold with a real monitoring stack
wired into the samosaChaat platform.
Helm chart (helm/observability/)
- Chart.yaml declares kube-prometheus-stack (~62.0) and loki-stack
(~2.10) as subchart dependencies.
- values.yaml configures Prometheus (15d retention, 50Gi PVC,
ServiceMonitor + rule selector on app.kubernetes.io/part-of:
samosachaat), Alertmanager (10Gi PVC), Grafana (OAuth-only via
GitHub + Google, local login disabled, Prometheus + Loki datasources,
dashboards auto-provisioned from a ConfigMap, email + Slack contact
points with a critical route to Slack), Loki (50Gi, 30d retention,
tsdb schema), and Promtail (JSON pipeline that lifts level / service
/ trace_id / user_id into labels, scrape config with pod labels).
- Alert rules: HighCPU, HighMemory, DiskSpaceLow, High5xxRate,
InferenceServiceDown, HighP99Latency.
- templates/grafana-dashboards-configmap.yaml renders every file under
dashboards/ into a single grafana_dashboard=1 ConfigMap.
- dashboards/node-health.json, app-performance.json, inference.json -
fully-formed Grafana dashboards with Prometheus datasource variable,
templated app selector, thresholded gauges, and LogQL-ready labels.
Scraping (helm/samosachaat/templates/servicemonitor.yaml)
- ServiceMonitor CRs for auth / chat-api / inference that Prometheus
picks up via the part-of=samosachaat selector; scrapes /metrics
every 15s and replaces the app label so dashboards line up.
Application instrumentation
- services/{auth,chat-api,inference} each depend on
prometheus-fastapi-instrumentator and expose /metrics (request count,
latency histograms, in-progress gauges).
- services/auth/src/logging_setup.py and
services/inference/src/logging_setup.py mirror the canonical
chat-api implementation - structlog JSON with service, trace_id,
user_id context injection.
- configure_logging() is called from create_app() in auth and inference;
inference's main.py now uses structlog via get_logger() instead of
logging.getLogger.
- log_level setting added to auth + inference config (LOG_LEVEL env).
Docs
- contracts/logging-standard.md defines the required JSON fields,
Python (structlog) + Node.js (pino) implementations, LogQL examples
for cross-service queries, and the x-trace-id propagation contract.
Closes #9
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds tooling and documentation for Day 2 cluster operations:
- scripts/rotate-nodes.sh: interactive node-rotation driver that applies
terraform to pick up the latest SSM-resolved EKS AMI and watches the
rolling replacement.
- scripts/demo-schema-change.sh: end-to-end demo of the zero-downtime
is_favorited column migration via helm upgrade + migration hook.
- scripts/verify-deployment.sh: post-deploy health check across pods,
per-service HTTP health endpoints, rollout status, and PDBs.
- docs/chaos-runbook.md: failure-mode playbook with simulate / Grafana /
Loki / recovery steps for six scenarios (pod kill, node failure, DB
pool exhaustion, inference OOM, high latency, SSL issues) plus a
Loki quick-reference.
- terraform/modules/eks: expose current_node_ami_id output, add
update_config.max_unavailable_percentage (configurable, default 33)
so node-group rolls are controlled.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds GitHub Actions workflows for per-service CI (paths-filter gated),
dev image builds to ECR via OIDC, RC*-tag UAT promotion with image
re-tagging and Helm deploy, v*-tag blue/green prod release with smoke
test + ingress swap, and a nightly docker-compose integration suite.
Ships a Helm umbrella chart (dev/uat/prod values) with Deployments,
ClusterIP Services, ALB Ingress (samosachaat.art + grafana host), HPAs
for chat-api/inference in prod, PDBs, ConfigMap/Secret wiring, and an
alembic db-migrate Helm hook job.
Wires commitlint + husky for Conventional Commits at the repo root.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- FastAPI service that manages conversations and messages in PostgreSQL
(SQLAlchemy 2.0 async + asyncpg) and streams assistant responses back
to the client via sse-starlette, forwarding the inference service SSE
contract unchanged.
- Auth guard validates every request against the auth service
/auth/validate endpoint (X-Internal-API-Key) and caches results in an
in-process TTL cache (5 min, 1024 entries) to absorb request bursts.
- Every query filters by authenticated user_id; cross-user access
returns 404. Message send flow auto-titles the first message,
persists the streamed assistant response after the client disconnects,
and records token_count + inference_time_ms.
- /api/models{,/swap} proxies the inference admin surface; swap
requires is_admin on the validated user.
- Structured JSON logging via structlog with trace_id + user_id
ContextVars attached to every log line.
- Test suite (pytest + aiosqlite + respx) covers CRUD, user scoping,
streaming SSE persistence, regenerate, model proxy admin gate,
and the stream proxy error path. 16/16 passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add reusable Terraform modules and per-environment configs (dev/uat/prod)
in us-west-2 covering: VPC (3 AZ public/private), EKS 1.29 with IRSA and
ALB/EBS/EFS CSI add-ons, RDS PostgreSQL 15, four ECR repos, IAM roles
(EKS node, ALB controller IRSA, GitHub Actions OIDC), Route53 + ACM for
samosachaat.art, and EFS for model weights. State backend on S3
(samosachaat-terraform-state) with DynamoDB lock table.
terraform validate passes for dev, uat, and prod.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Merged landing + chat into single page (samosa/chai slide out on first message)
- Positioned input bar between samosa and chai illustrations
- Footer at very bottom with Karpathy credit
- Removed cart icon, fixed "Aachaat" → "Chaat" everywhere
- Improved lemon SVG with stem/nub
- "Explore" → "Samosa" label
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Landing page with desi street-food aesthetic: lemon-mirchi toran with
pendulum animation, dual-script hero (Devanagari + English cursive),
samosa illustration with floating animation, brass chai kettle with
steam wisps, ambient chilli/lemon doodles.
Chat page carries the warm samosa-chaat palette with cream/gold user
bubbles, steam-wisp typing indicator, and WebGPU integration hooks
(window.samosaChaat API for local inference mode switching).
Added scripts/export_onnx.py for ONNX model export with KV cache
support, targeting WebGPU browser inference.
Credit to Andrej Karpathy's nanochat in footer.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The KV cache was hardcoded to float32 on non-CUDA devices, but the model
weights are loaded in bfloat16 via NANOCHAT_DTYPE env var. This caused a
RuntimeError in scaled_dot_product_attention. Now uses COMPUTE_DTYPE from
common.py, which respects the env var.
Also broadened CI/CD path triggers to nanochat/**.
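The shape of the fix, sketched under assumptions (COMPUTE_DTYPE and NANOCHAT_DTYPE are from the commit; the lookup table, function name, and cache shapes are illustrative):

```python
# Resolve the compute dtype once from the NANOCHAT_DTYPE env var (as common.py
# does, per the commit) and allocate the KV cache with it instead of float32.
import os
import torch

COMPUTE_DTYPE = {"bfloat16": torch.bfloat16, "float32": torch.float32}[
    os.environ.get("NANOCHAT_DTYPE", "bfloat16")
]

def make_kv_cache(batch, n_heads, max_seq, head_dim, device="cpu"):
    # Before the fix this hardcoded dtype=torch.float32 on non-CUDA devices,
    # mismatching bfloat16 weights inside scaled_dot_product_attention.
    k = torch.zeros(batch, n_heads, max_seq, head_dim,
                    dtype=COMPUTE_DTYPE, device=device)
    v = torch.zeros_like(k)
    return k, v
```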
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Deploys to EC2 on push to master when UI/server files change.
Uses appleboy/ssh-action with stored secrets.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New architectural features:
- Smear: mix previous token embedding into current position via learned
gate, providing cheap bigram-like info (works in training + KV cache)
- Backout: subtract learned fraction of mid-layer residual before logit
projection to remove low-level features
Hyperparameter tuning:
- Muon momentum warmdown 0.97→0.90 during LR warmdown phase
- Non-uniform per-layer init: resid_lambdas 1.15→1.05, x0_lambdas 0.20→0.05
- c_fc init scale 0.4x, QK norm scale 1.2, sliding window seq_len/4
- Speedrun data:params ratio reduced to 8
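One plausible reading of the smear gate, as a hedged sketch (the previous-token blend and learned gate are from the commit; the module shape and sigmoid-lerp form are assumptions, not the exact implementation):

```python
# Illustrative "smear": blend each position's token embedding with the
# previous position's via a learned sigmoid gate, giving cheap bigram-like
# information. At decode time only the previous token's embedding is needed,
# which is why this composes with a KV cache.
import torch
import torch.nn as nn

class Smear(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, 1)  # learned per-position gate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim). Shift right by one so position t sees token
        # t-1; position 0 keeps its own embedding.
        prev = torch.cat([x[:, :1], x[:, :-1]], dim=1)
        g = torch.sigmoid(self.gate(x))   # (batch, seq, 1)
        return x + g * (prev - x)         # lerp toward the previous token
```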
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Print step count
* Add reply-only loss for chat
* Use the mask from the tokenizer's render_conversation function
* Undo some changes
* Restore a comment that was accidentally removed; no functionality change
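The reply-only loss can be sketched as follows; in the real code the mask comes from the tokenizer's render_conversation, so the mask argument and ignore_index plumbing here are illustrative assumptions:

```python
# Sketch of reply-only loss for chat fine-tuning: positions outside assistant
# replies are set to ignore_index so cross-entropy skips them entirely.
import torch
import torch.nn.functional as F

def reply_only_loss(logits, targets, reply_mask, ignore_index=-100):
    """logits: (B, T, V); targets: (B, T); reply_mask: (B, T) bool,
    True where the token belongs to an assistant reply."""
    masked = targets.masked_fill(~reply_mask, ignore_index)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        masked.reshape(-1),
        ignore_index=ignore_index,
    )
```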