Remove NextAuth and replace with token-based auth against the backend
auth service (OAuth + JWT). The frontend now redirects login to
/api/auth/google and /api/auth/github (proxied by nginx to the auth
service), captures the JWT from the redirect query param, and uses it
for all API calls.
Key changes:
- Remove next-auth dependency and all NextAuth config/routes
- Add lib/auth-client.ts (JWT token storage + auth headers)
- Add hooks/useAuth.ts (client-side auth state + token capture)
- Rewrite middleware.ts to pass-through (client-side auth only)
- Login page uses plain <a> links to /api/auth/{provider}
- Chat page captures access_token from OAuth redirect
- Zustand store fetches conversations from real chat-api via JWT
- API routes proxy /api/conversations/* to chat-api with auth
- chat/stream route supports conversationId + auth header forwarding
- useSSE hook accepts auth headers for authenticated streaming
- Sidebar loads conversations from API, supports delete
- Landing page (Hero, LandingNav) uses useAuth instead of useSession
- Add .env.production.example and scripts/generate-jwt-keys.sh
Mock echo fallback preserved when CHAT_API_URL is not set.
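The token flow above (capture access_token from the OAuth redirect, attach it to every API call) can be sketched in Python; this mirrors what lib/auth-client.ts does on the frontend. The `access_token` query param name comes from the commit; the Bearer scheme is an assumption.

```python
# Hedged sketch of the JWT capture + auth-header flow (not the actual
# TypeScript implementation in lib/auth-client.ts).
from urllib.parse import urlparse, parse_qs

def capture_token(redirect_url: str):
    """Extract the access_token query param from the OAuth redirect."""
    qs = parse_qs(urlparse(redirect_url).query)
    return qs.get("access_token", [None])[0]

def auth_headers(token: str) -> dict:
    """Headers attached to every chat-api request (Bearer scheme assumed)."""
    return {"Authorization": f"Bearer {token}"}
```

Usage: `auth_headers(capture_token("https://app.example/chat?access_token=eyJ..."))`.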
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds tooling and documentation for Day 2 cluster operations:
- scripts/rotate-nodes.sh: interactive node-rotation driver that applies
terraform to pick up the latest SSM-resolved EKS AMI and watches the
rolling replacement.
- scripts/demo-schema-change.sh: end-to-end demo of the zero-downtime
is_favorited column migration via helm upgrade + migration hook.
- scripts/verify-deployment.sh: post-deploy health check across pods,
per-service HTTP health endpoints, rollout status, and PDBs.
- docs/chaos-runbook.md: failure-mode playbook with simulate / Grafana /
Loki / recovery steps for six scenarios (pod kill, node failure, DB
pool exhaustion, inference OOM, high latency, SSL issues) plus a
Loki quick-reference.
- terraform/modules/eks: expose current_node_ami_id output, add
update_config.max_unavailable_percentage (configurable, default 33)
so node-group rolls are controlled.
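The per-service health sweep in scripts/verify-deployment.sh boils down to probing each service's health endpoint and collecting failures; a minimal sketch, assuming a `/healthz` path and hypothetical service names (neither is taken from the repo):

```python
# Hedged sketch of the post-deploy health sweep; service names and the
# /healthz path are illustrative assumptions.
from urllib.request import urlopen
from urllib.error import URLError

SERVICES = ["chat-api", "auth", "inference"]  # hypothetical names

def health_status(base_url: str, services: list) -> dict:
    """Return {service: healthy?} by probing each health endpoint."""
    status = {}
    for svc in services:
        try:
            with urlopen(f"{base_url}/{svc}/healthz", timeout=5) as resp:
                status[svc] = resp.status == 200
        except URLError:
            status[svc] = False
    return status

def unhealthy(status: dict) -> list:
    """Services that failed the probe, for the script's exit report."""
    return [svc for svc, ok in status.items() if not ok]
```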
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Landing page with desi street-food aesthetic: lemon-mirchi toran with
pendulum animation, dual-script hero (Devanagari + English cursive),
samosa illustration with floating animation, brass chai kettle with
steam wisps, ambient chilli/lemon doodles.
Chat page carries the warm samosa-chaat palette with cream/gold user
bubbles, steam-wisp typing indicator, and WebGPU integration hooks
(window.samosaChaat API for local inference mode switching).
Added scripts/export_onnx.py for ONNX model export with KV cache
support, targeting WebGPU browser inference.
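The KV-cache contract that export_onnx.py has to bake into the exported graph can be sketched with plain lists standing in for tensors: each decode step receives the cached keys/values from prior steps as inputs and emits the extended cache as outputs, so the browser runtime can feed it back in. The real export wires this up as past/present key-value inputs and outputs on the ONNX model (the exact tensor names are assumptions).

```python
# Toy sketch of one autoregressive decode step with a KV cache;
# plain lists stand in for tensors.
def decode_step(new_key, new_value, past_keys=None, past_values=None):
    """Append this token's K/V to the cache and return the new cache."""
    present_keys = (past_keys or []) + [new_key]
    present_values = (past_values or []) + [new_value]
    # attention for this step would read over all of present_keys
    return present_keys, present_values
```

After two steps the cache holds both tokens' keys and values, so attention never recomputes earlier positions.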
Credit to Andrej Karpathy's nanochat in footer.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New architectural features:
- Smear: mix previous token embedding into current position via learned
gate, providing cheap bigram-like info (works in training + KV cache)
- Backout: subtract learned fraction of mid-layer residual before logit
projection to remove low-level features
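A toy sketch of the two features, with plain lists standing in for tensors and a single scalar where the real model learns parameters (the scalar gate and fraction here are illustrative assumptions, not the actual parameterization):

```python
# Hedged sketches of Smear and Backout; lists stand in for tensors.
def smear(embs, gate):
    """Mix gate * (previous token's raw embedding) into each position.
    Position 0 has no predecessor; only the previous token is needed,
    so this is causal and works during KV-cache decoding too."""
    out, prev = [], None
    for e in embs:
        out.append(e if prev is None
                   else [x + gate * p for x, p in zip(e, prev)])
        prev = e  # keep the raw embedding, not the mixed one
    return out

def backout(final_resid, mid_resid, frac):
    """Subtract a learned fraction of the mid-layer residual before the
    logit projection, removing low-level features from the final state."""
    return [f - frac * m for f, m in zip(final_resid, mid_resid)]
```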
Hyperparameter tuning:
- Muon momentum warmdown 0.97→0.90 during LR warmdown phase
- Non-uniform per-layer init: resid_lambdas 1.15→1.05, x0_lambdas 0.20→0.05
- c_fc init scale 0.4x, QK norm scale 1.2, sliding window seq_len/4
- Speedrun data:params ratio reduced to 8
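The momentum warmdown can be read as a schedule coupled to the LR warmdown phase; a linear ramp from 0.97 to 0.90 is one plausible shape (the linearity and the step bookkeeping here are assumptions for illustration):

```python
# Hedged sketch: Muon momentum held at 0.97, then eased to 0.90 across
# the LR warmdown phase (linear shape assumed).
def muon_momentum(step, warmdown_start, total_steps, hi=0.97, lo=0.90):
    if step <= warmdown_start:
        return hi
    frac = (step - warmdown_start) / (total_steps - warmdown_start)
    return hi + frac * (lo - hi)
```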
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* print the step count
* add reply-only loss for chat
* use the mask returned by the tokenizer's render_conversation function
* undo some changes
* restore a comment that was accidentally removed; no functionality change