Commit Graph

415 Commits

Author SHA1 Message Date
Manmohan
1d2a76eec4
feat: deploy d24 SFT + polished UI redesign with dark mode (#39)
* feat(inference): deploy d24 SFT weights to Modal

Repoint Modal inference app from the broken d20 checkpoint to our own
ManmohanSharma/nanochat-d24 SFT step 484. Rewrites the standalone model
as an inference-only port of nanochat/gpt.py so the modern architecture
(smear gate, per-layer value embeddings, ve_gate, backout, sliding
window attention via SDPA, rotary base 100000, padded vocab, logit
softcap) loads cleanly from the checkpoint. Tokenizer loads the pickled
tiktoken encoding directly so special tokens end up at their true IDs
(32759-32767), and the stop check uses that set instead of hardcoded
0-8. GPU bumped to L4 for headroom. HF token sourced from the
'huggingface' Modal secret.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(frontend): polished redesign with serif display + dark mode

Lifts the craft level of the landing and chat UI without changing the
desi identity. Adds Fraunces for display headlines, a floating pill
LandingNav, a saffron-glow hero with a large serif headline and black
pill CTAs, and three gradient-tiled feature cards with inline SVG
glyphs replacing the emoji cards. The chat empty state is now a serif
greeting with pill-chip prompt starters, and ChatInput is a single
rounded pod so the send button sits inside the input (fixes the
misaligned floating button). Adds a class-based dark mode across the
chat surfaces with a sun/moon toggle in the sidebar footer, powered by
a small useTheme hook and a no-flash init script in the root layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(frontend): add ESLint config so CI lint step passes

next lint was failing with an interactive prompt because the repo had
no ESLint config. Adds a minimal next/core-web-vitals extends and
drops the now-unloadable @typescript-eslint/no-explicit-any disable
directive in the stream proxy by narrowing the body type to unknown.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 19:55:16 -04:00
Manmohan
272086d2c0
Merge pull request #38 from manmohan659/fix/modal-url-detection
fix(frontend): fix stale closure - tokens now render
2026-04-16 18:16:07 -04:00
Manmohan Sharma
16f40ceb54
fix(frontend): pass assistantMsgId directly to fix stale closure bug 2026-04-16 15:15:53 -07:00
Manmohan
7387f7c1d1
Merge pull request #37 from manmohan659/fix/modal-url-detection
fix: direct SSE streaming from chat-api (bypass Next.js proxy)
2026-04-16 18:08:59 -04:00
Manmohan Sharma
a873b6ad46
fix: stream directly from chat-api, bypass Next.js proxy
Replaced the double-proxy (browser→Next.js→chat-api→Modal) with
direct streaming (browser→nginx→chat-api→Modal). Added nginx route
for /api/conversations → chat-api. Inlined SSE parsing in ChatWindow
instead of useSSE hook going through /api/chat/stream.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 15:08:46 -07:00
Manmohan
e9885b2583
Merge pull request #36 from manmohan659/fix/modal-url-detection
fix(chat-api): detect Modal URL by domain not path
2026-04-16 17:59:34 -04:00
Manmohan Sharma
df0584b861
fix(chat-api): detect Modal URL by domain not path suffix 2026-04-16 14:59:20 -07:00
Manmohan
2dd914a69d
Merge pull request #35 from manmohan659/fix/stream-body-format
fix(frontend): type fix for proxyUpstream
2026-04-16 17:53:02 -04:00
Manmohan Sharma
7ecd8a928c
fix(frontend): use any type for proxyUpstream body param 2026-04-16 14:52:50 -07:00
Manmohan
15bb2324e2
Merge pull request #34 from manmohan659/fix/stream-body-format
fix(frontend): add maxTokens to StreamBody type
2026-04-16 17:51:15 -04:00
Manmohan Sharma
fe34250900
fix(frontend): add maxTokens to StreamBody interface 2026-04-16 14:51:03 -07:00
Manmohan
c5d4d17650
Merge pull request #33 from manmohan659/fix/stream-body-format
fix(frontend): correct body format for chat-api messages
2026-04-16 17:49:33 -04:00
Manmohan Sharma
faf4810696
fix(frontend): send correct body format to chat-api messages endpoint
Chat-api expects {content, temperature, max_tokens, top_k} but frontend
was sending {messages: [...]}. Now extracts last user message as content
when proxying to /api/conversations/:id/messages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 14:49:22 -07:00
Manmohan
9071685d85
Merge pull request #32 from manmohan659/fix/nginx-routing
fix(nginx): route /api/* through frontend not chat-api
2026-04-16 17:46:09 -04:00
Manmohan Sharma
3f7a7da30b
fix(nginx): route all /api/* through frontend, not directly to chat-api
Nginx was catching /api/chat/stream and /api/conversations and sending
them to chat-api:8002, bypassing the frontend's Next.js API routes.
Now only /api/auth/* goes directly to auth service. Everything else
goes to frontend, which proxies internally to backend services.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 14:45:49 -07:00
Manmohan
129553b215
Merge pull request #31 from manmohan659/fix/chat-api-fk
fix(chat-api): defer users FK to avoid startup crash
2026-04-16 17:41:05 -04:00
Manmohan Sharma
e8222011d9
fix(chat-api): use_alter on users FK to avoid metadata resolution error
Chat-api doesn't define the users model (owned by auth service), so
SQLAlchemy can't resolve the FK. use_alter=True defers the constraint
to ALTER TABLE, avoiding the NoReferencedTableError at startup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 14:40:45 -07:00
Manmohan
5b6eff82e8
Merge pull request #30 from manmohan659/feat/modal-inference
fix(chat-api): support Modal inference URL pattern
2026-04-16 17:36:56 -04:00
Manmohan Sharma
6d3e1f0afd
fix(chat-api): support Modal inference URL in inference client
The inference client now auto-detects if the URL already ends with
/generate (Modal's endpoint URL pattern) and skips appending the path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 14:36:36 -07:00
Manmohan
95b1ffc0fd
Merge pull request #29 from manmohan659/feat/modal-inference
feat(modal): Modal GPU inference endpoint
2026-04-16 17:32:22 -04:00
Manmohan Sharma
e5b4db1eee
feat(modal): add Modal GPU inference endpoint for samosaChaat
- modal/serve.py: FastAPI endpoint on Modal T4 GPU, streams SSE tokens
- modal/_model.py: Standalone GPT model (auto-detects architecture from checkpoint)
- modal/_tokenizer.py: Standalone BPE tokenizer (tiktoken-based)
- Downloads nanochat-students/base-d20 weights from HuggingFace
- Deployed at: https://manmohan659--samosachaat-inference-inference-generate.modal.run

Deploy: modal deploy modal/serve.py
Dev:    modal serve modal/serve.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 14:32:09 -07:00
Manmohan
40ce6c1a89
Merge pull request #28 from manmohan659/fix/ui-redesign
fix(ui): redesign landing page + chat UI
2026-04-16 17:05:16 -04:00
Manmohan Sharma
36debd8502
fix(frontend): redesign landing and chat pages for warm, premium look
Landing page: warm gradient background, illustrations flanking hero text
(180-220px), new tagline, features section with 3 cards, footer updated
to "Built by Manmohan", gold CTA and nav buttons, toran moved to hero.

Chat page: removed "Chat Completions" header, added samosa logo and
bigger suggestion cards to empty state, sidebar empty state message,
input area top border/shadow, more prominent new chat button.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 14:03:55 -07:00
Manmohan
b0df1dca2e
Merge pull request #27 from manmohan659/fix/docker-compose-env
fix(docker): pass missing auth env vars in docker-compose
2026-04-16 16:54:03 -04:00
Manmohan Sharma
b7971313ba
fix(docker): pass missing env vars to auth service
AUTH_BASE_URL, FRONTEND_URL, INTERNAL_API_KEY, SESSION_SECRET,
COOKIE_SECURE, COOKIE_DOMAIN, REFRESH_COOKIE_NAME were in .env
but not passed to auth container. OAuth callbacks were using
localhost:8001 instead of the public URL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 13:53:52 -07:00
Manmohan
b5fbebb63f
Merge pull request #26 from manmohan659/fix/missing-models
fix: add missing SQLAlchemy models to auth and chat-api
2026-04-16 16:50:22 -04:00
Manmohan Sharma
8a95a76522
fix: add missing models/ dirs to auth and chat-api services
Root .gitignore had `models/` which matched both ML weights AND
SQLAlchemy model dirs. Changed to `/models/` (root only).
Added auth/src/models/ (User) and chat-api/src/models/ (Conversation, Message).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 13:50:08 -07:00
Manmohan
8885f52ba1
Merge pull request #25 from manmohan659/fix/docker-deps
fix(docker): add missing deps to auth and chat-api Dockerfiles
2026-04-16 16:47:06 -04:00
Manmohan Sharma
2061f8848b
fix(docker): add structlog + prometheus deps to auth and chat-api Dockerfiles
Auth service was crash-looping with ModuleNotFoundError for
prometheus_fastapi_instrumentator. Chat-api was also missing it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 13:46:53 -07:00
Manmohan
bfa34a8a0e
Merge pull request #24 from manmohan659/feat/e2e-integration
feat(frontend): wire frontend to real backend auth + chat-api
2026-04-16 16:23:05 -04:00
Manmohan Sharma
aa7a907063
feat(frontend): wire frontend to real backend auth + chat-api services
Remove NextAuth and replace with token-based auth against the backend
auth service (OAuth + JWT). The frontend now redirects login to
/api/auth/google and /api/auth/github (proxied by nginx to the auth
service), captures the JWT from the redirect query param, and uses it
for all API calls.

Key changes:
- Remove next-auth dependency and all NextAuth config/routes
- Add lib/auth-client.ts (JWT token storage + auth headers)
- Add hooks/useAuth.ts (client-side auth state + token capture)
- Rewrite middleware.ts to pass-through (client-side auth only)
- Login page uses plain <a> links to /api/auth/{provider}
- Chat page captures access_token from OAuth redirect
- Zustand store fetches conversations from real chat-api via JWT
- API routes proxy /api/conversations/* to chat-api with auth
- chat/stream route supports conversationId + auth header forwarding
- useSSE hook accepts auth headers for authenticated streaming
- Sidebar loads conversations from API, supports delete
- Landing page (Hero, LandingNav) uses useAuth instead of useSession
- Add .env.production.example and scripts/generate-jwt-keys.sh

Mock echo fallback preserved when CHAT_API_URL is not set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 13:21:38 -07:00
Manmohan
7e6ecc1d43
Merge pull request #23 from manmohan659/feat/dual-deploy
feat(deploy): dual-mode deploy switch (EC2 monolith + EKS)
2026-04-16 15:58:10 -04:00
Manmohan Sharma
b766dcf703
feat(deploy): add dual-mode deploy switch (EC2 monolith + EKS)
- deploy.sh: single script to switch between EC2 and EKS modes
  - ec2: docker-compose with ECR images + nginx SSL reverse proxy
  - eks: terraform apply + helm install (for demos/grading)
  - eks-down: terraform destroy (stop costs)
- docker-compose.prod.yml: ECR image overrides + nginx service
- nginx/nginx.conf: reverse proxy with SSL, SSE streaming support
- deploy-ec2.yml: auto-deploy to EC2 after images are built
- Remove old single-server deploy.yml

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 12:57:57 -07:00
Manmohan
9095cf01a8
Merge pull request #22 from manmohan659/fix/inference-lockfile
fix(inference): regenerate uv.lock for new deps
2026-04-16 15:49:19 -04:00
Manmohan Sharma
07892c0f00
fix(inference): regenerate uv.lock after structlog/prometheus deps added
The observability PR added structlog and prometheus-fastapi-instrumentator
to inference pyproject.toml but did not regenerate uv.lock, causing
Docker build to fail with --locked flag.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 12:49:05 -07:00
Manmohan
3c0b1ae16b
Merge pull request #21 from manmohan659/fix/ci-and-frontend
fix(ci): use setup-uv and --no-workspace for service tests
2026-04-16 15:36:27 -04:00
Manmohan Sharma
66bac1aa5f
fix(ci): use astral-sh/setup-uv and --no-workspace for service tests
Root pyproject.toml uses uv features (extra in sources, conflicts)
that caused uv sync to fail in CI. Fix by:
1. Replace pip install uv==0.4.30 with astral-sh/setup-uv@v4 (latest)
2. Add --no-workspace flag so services don't inherit root config

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 12:35:41 -07:00
Manmohan
6f19a7c28c
Merge pull request #20 from manmohan659/feat/observability-stack
feat(observability): Prometheus + Grafana + Loki stack (#9)
2026-04-16 15:32:22 -04:00
Manmohan
8a113d4757
Merge pull request #19 from manmohan659/feat/day2-operations
feat(ops): Day 2 operations automation and chaos readiness (#10)
2026-04-16 15:32:19 -04:00
Manmohan Sharma
aa0818aae2
feat(observability): Prometheus + Grafana + Loki stack for samosaChaat (#9)
Replaces the helm/observability scaffold with a real monitoring stack
wired into the samosaChaat platform.

Helm chart (helm/observability/)
- Chart.yaml declares kube-prometheus-stack (~62.0) and loki-stack
  (~2.10) as subchart dependencies.
- values.yaml configures Prometheus (15d retention, 50Gi PVC,
  ServiceMonitor + rule selector on app.kubernetes.io/part-of:
  samosachaat), Alertmanager (10Gi PVC), Grafana (OAuth-only via
  GitHub + Google, local login disabled, Prometheus + Loki datasources,
  dashboards auto-provisioned from a ConfigMap, email + Slack contact
  points with a critical route to Slack), Loki (50Gi, 30d retention,
  tsdb schema), and Promtail (JSON pipeline that lifts level / service
  / trace_id / user_id into labels, scrape config with pod labels).
- Alert rules: HighCPU, HighMemory, DiskSpaceLow, High5xxRate,
  InferenceServiceDown, HighP99Latency.
- templates/grafana-dashboards-configmap.yaml renders every file under
  dashboards/ into a single grafana_dashboard=1 ConfigMap.
- dashboards/node-health.json, app-performance.json, inference.json -
  fully-formed Grafana dashboards with Prometheus datasource variable,
  templated app selector, thresholded gauges, and LogQL-ready labels.

Scraping (helm/samosachaat/templates/servicemonitor.yaml)
- ServiceMonitor CRs for auth / chat-api / inference that Prometheus
  picks up via the part-of=samosachaat selector; scrapes /metrics
  every 15s and replaces the app label so dashboards line up.

Application instrumentation
- services/{auth,chat-api,inference} each depend on
  prometheus-fastapi-instrumentator and expose /metrics (request count,
  latency histograms, in-progress gauges).
- services/auth/src/logging_setup.py and
  services/inference/src/logging_setup.py mirror the canonical
  chat-api implementation - structlog JSON with service, trace_id,
  user_id context injection.
- configure_logging() is called at create_app() in auth and inference;
  inference's main.py now uses structlog via get_logger() instead of
  logging.getLogger.
- log_level setting added to auth + inference config (LOG_LEVEL env).

Docs
- contracts/logging-standard.md defines the required JSON fields,
  Python (structlog) + Node.js (pino) implementations, LogQL examples
  for cross-service queries, and the x-trace-id propagation contract.

Closes #9

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 12:29:16 -07:00
Manmohan Sharma
0b8f9f0a5f
feat(ops): Day 2 operations automation and chaos runbook (#10)
Adds tooling and documentation for Day 2 cluster operations:

- scripts/rotate-nodes.sh: interactive node-rotation driver that applies
  terraform to pick up the latest SSM-resolved EKS AMI and watches the
  rolling replacement.
- scripts/demo-schema-change.sh: end-to-end demo of the zero-downtime
  is_favorited column migration via helm upgrade + migration hook.
- scripts/verify-deployment.sh: post-deploy health check across pods,
  per-service HTTP health endpoints, rollout status, and PDBs.
- docs/chaos-runbook.md: failure-mode playbook with simulate / Grafana /
  Loki / recovery steps for six scenarios (pod kill, node failure, DB
  pool exhaustion, inference OOM, high latency, SSL issues) plus a
  Loki quick-reference.
- terraform/modules/eks: expose current_node_ami_id output, add
  update_config.max_unavailable_percentage (configurable, default 33)
  so node-group rolls are controlled.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 12:25:47 -07:00
Manmohan
d98f50f64e
Merge pull request #18 from manmohan659/feat/cicd-pipeline
feat(ci): CI/CD pipeline + Helm umbrella chart for samosaChaat (#8)
2026-04-16 15:12:45 -04:00
Manmohan Sharma
53f547fdef
feat(ci): CI/CD pipeline and Helm umbrella chart for samosaChaat (#8)
Adds GitHub Actions workflows for per-service CI (paths-filter gated),
dev image builds to ECR via OIDC, RC*-tag UAT promotion with image
re-tagging and Helm deploy, v*-tag blue/green prod release with smoke
test + ingress swap, and a nightly docker-compose integration suite.

Ships a Helm umbrella chart (dev/uat/prod values) with Deployments,
ClusterIP Services, ALB Ingress (samosachaat.art + grafana host), HPAs
for chat-api/inference in prod, PDBs, ConfigMap/Secret wiring, and an
alembic db-migrate Helm hook job.

Wires commitlint + husky for Conventional Commits at the repo root.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 12:09:43 -07:00
Manmohan
1e2fc09ca6
Merge pull request #17 from manmohan659/feat/chat-api-service
feat(chat-api): conversation orchestration + SSE streaming proxy (#6)
2026-04-16 14:57:10 -04:00
Manmohan
4297817cfb
Merge pull request #16 from manmohan659/feat/auth-service
feat(auth): OAuth2 + JWT auth service with Alembic migrations (#5 #7)
2026-04-16 14:56:51 -04:00
Manmohan Sharma
8153a4fadf
feat(chat-api): conversation orchestration + SSE streaming proxy (#6)
- FastAPI service that manages conversations and messages in PostgreSQL
  (SQLAlchemy 2.0 async + asyncpg) and streams assistant responses back
  to the client via sse-starlette, forwarding the inference service SSE
  contract unchanged.
- Auth guard validates every request against the auth service
  /auth/validate endpoint (X-Internal-API-Key) and caches results in an
  in-process TTL cache (5 min, 1024 entries) to absorb request bursts.
- Every query filters by authenticated user_id; cross-user access
  returns 404. Message send flow auto-titles the first message,
  persists the streamed assistant response after the client disconnects,
  and records token_count + inference_time_ms.
- /api/models{,/swap} proxies the inference admin surface; swap
  requires is_admin on the validated user.
- Structured JSON logging via structlog with trace_id + user_id
  ContextVars attached to every log line.
- Test suite (pytest + aiosqlite + respx) covers CRUD, user scoping,
  streaming SSE persistence, regenerate, model proxy admin gate,
  and the stream proxy error path. 16/16 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 11:49:51 -07:00
Manmohan Sharma
4b4aca642a
feat(auth): OAuth2 + JWT auth service with Alembic migrations (#5 #7)
- Alembic async migrations: users, conversations, messages, is_favorited
- FastAPI auth service: Google + GitHub OAuth, RS256 JWT, refresh cookie
- /auth/me, /auth/refresh, /auth/validate (service-to-service)
- rate limiting 10/min on OAuth routes, CORS locked to FRONTEND_URL

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 11:47:00 -07:00
Manmohan
9bd0c907cc
Merge pull request #15 from manmohan659/feat/frontend-service
feat(frontend): Next.js 14 frontend service for samosaChaat (Workstream A, #2)
2026-04-16 14:27:18 -04:00
Manmohan Sharma
634be4080b
feat(frontend): Next.js 14 frontend service for samosaChaat (#2)
Build services/frontend/ replacing the legacy nanochat/ui.html single-file UI.
Landing, login, and chat pages ported with full design system: Devanagari +
Great Vibes hero, samosa/chai/toran SVG animations, gold/cream palette.

- App Router pages: / (hero + floating illustrations), /login (split-screen
  OAuth with mandala motif), /chat (260px collapsible sidebar, suggestion
  chips, markdown + code-copy, auto-expanding input, slash commands)
- SSE streaming via useSSE hook and /api/chat/stream BFF route (proxies to
  CHAT_API_URL when set, falls back to mock echo for local dev)
- NextAuth.js v5 with Google + GitHub providers; middleware gates /chat/*
- Zustand store with localStorage persistence for conversations/settings
- Tailwind theme carries all ui.html tokens + keyframes (pendulum, float,
  wobble, steamFloat, steamType); SVG assets componentized under components/svg
- Multi-stage node:20-alpine Dockerfile with Next standalone output

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 11:26:57 -07:00
Manmohan
2be82fe731
Merge pull request #13 from manmohan659/feat/terraform-infra
feat(terraform): provision full AWS stack for samosaChaat
2026-04-16 14:26:20 -04:00