- deploy.sh: single script to switch between EC2 and EKS modes
- ec2: docker-compose with ECR images + nginx SSL reverse proxy
- eks: terraform apply + helm install (for demos/grading)
- eks-down: terraform destroy (stop costs)
- docker-compose.prod.yml: ECR image overrides + nginx service
- nginx/nginx.conf: reverse proxy with SSL, SSE streaming support
- deploy-ec2.yml: auto-deploy to EC2 after images are built
- Remove old single-server deploy.yml
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The observability PR added structlog and prometheus-fastapi-instrumentator
to inference pyproject.toml but did not regenerate uv.lock, causing the
Docker build to fail under the --locked flag.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root pyproject.toml uses uv features (extra in sources, conflicts)
that caused uv sync to fail in CI. Fix by:
1. Replace pip install uv==0.4.30 with astral-sh/setup-uv@v4 (latest)
2. Add --no-workspace flag so services don't inherit root config
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces the helm/observability scaffold with a real monitoring stack
wired into the samosaChaat platform.
Helm chart (helm/observability/)
- Chart.yaml declares kube-prometheus-stack (~62.0) and loki-stack
(~2.10) as subchart dependencies.
- values.yaml configures Prometheus (15d retention, 50Gi PVC,
ServiceMonitor + rule selector on app.kubernetes.io/part-of:
samosachaat), Alertmanager (10Gi PVC), Grafana (OAuth-only via
GitHub + Google, local login disabled, Prometheus + Loki datasources,
dashboards auto-provisioned from a ConfigMap, email + Slack contact
points with a critical route to Slack), Loki (50Gi, 30d retention,
tsdb schema), and Promtail (JSON pipeline that lifts level / service
/ trace_id / user_id into labels, scrape config with pod labels).
- Alert rules: HighCPU, HighMemory, DiskSpaceLow, High5xxRate,
InferenceServiceDown, HighP99Latency.
- templates/grafana-dashboards-configmap.yaml renders every file under
dashboards/ into a single grafana_dashboard=1 ConfigMap.
- dashboards/node-health.json, app-performance.json, inference.json -
fully-formed Grafana dashboards with Prometheus datasource variable,
templated app selector, thresholded gauges, and LogQL-ready labels.
Scraping (helm/samosachaat/templates/servicemonitor.yaml)
- ServiceMonitor CRs for auth / chat-api / inference that Prometheus
picks up via the part-of=samosachaat selector; scrapes /metrics
every 15s and replaces the app label so dashboards line up.
Application instrumentation
- services/{auth,chat-api,inference} each depend on
prometheus-fastapi-instrumentator and expose /metrics (request count,
latency histograms, in-progress gauges).
- services/auth/src/logging_setup.py and
services/inference/src/logging_setup.py mirror the canonical
chat-api implementation - structlog JSON with service, trace_id,
user_id context injection.
- configure_logging() is called from create_app() in auth and inference;
inference's main.py now uses structlog via get_logger() instead of
logging.getLogger.
- log_level setting added to auth + inference config (LOG_LEVEL env).
Docs
- contracts/logging-standard.md defines the required JSON fields,
Python (structlog) + Node.js (pino) implementations, LogQL examples
for cross-service queries, and the x-trace-id propagation contract.
Closes #9
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds tooling and documentation for Day 2 cluster operations:
- scripts/rotate-nodes.sh: interactive node-rotation driver that applies
terraform to pick up the latest SSM-resolved EKS AMI and watches the
rolling replacement.
- scripts/demo-schema-change.sh: end-to-end demo of the zero-downtime
is_favorited column migration via helm upgrade + migration hook.
- scripts/verify-deployment.sh: post-deploy health check across pods,
per-service HTTP health endpoints, rollout status, and PDBs.
- docs/chaos-runbook.md: failure-mode playbook with simulate / Grafana /
Loki / recovery steps for six scenarios (pod kill, node failure, DB
pool exhaustion, inference OOM, high latency, SSL issues) plus a
Loki quick-reference.
- terraform/modules/eks: expose current_node_ami_id output, add
update_config.max_unavailable_percentage (configurable, default 33)
so node-group rolls are controlled.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds GitHub Actions workflows for per-service CI (paths-filter gated),
dev image builds to ECR via OIDC, RC*-tag UAT promotion with image
re-tagging and Helm deploy, v*-tag blue/green prod release with smoke
test + ingress swap, and a nightly docker-compose integration suite.
Ships a Helm umbrella chart (dev/uat/prod values) with Deployments,
ClusterIP Services, ALB Ingress (samosachaat.art + grafana host), HPAs
for chat-api/inference in prod, PDBs, ConfigMap/Secret wiring, and an
alembic db-migrate Helm hook job.
Wires commitlint + husky for Conventional Commits at the repo root.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- FastAPI service that manages conversations and messages in PostgreSQL
(SQLAlchemy 2.0 async + asyncpg) and streams assistant responses back
to the client via sse-starlette, forwarding the inference service SSE
contract unchanged.
- Auth guard validates every request against the auth service
/auth/validate endpoint (X-Internal-API-Key) and caches results in an
in-process TTL cache (5 min, 1024 entries) to absorb request bursts.
- Every query filters by authenticated user_id; cross-user access
returns 404. Message send flow auto-titles the first message,
persists the streamed assistant response after the client disconnects,
and records token_count + inference_time_ms.
- /api/models{,/swap} proxies the inference admin surface; swap
requires is_admin on the validated user.
- Structured JSON logging via structlog with trace_id + user_id
ContextVars attached to every log line.
- Test suite (pytest + aiosqlite + respx) covers CRUD, user scoping,
streaming SSE persistence, regenerate, model proxy admin gate,
and the stream proxy error path. 16/16 passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add reusable Terraform modules and per-environment configs (dev/uat/prod)
in us-west-2 covering: VPC (3 AZ public/private), EKS 1.29 with IRSA and
ALB/EBS/EFS CSI add-ons, RDS PostgreSQL 15, four ECR repos, IAM roles
(EKS node, ALB controller IRSA, GitHub Actions OIDC), Route53 + ACM for
samosachaat.art, and EFS for model weights. State backend on S3
(samosachaat-terraform-state) with DynamoDB lock table.
terraform validate passes for dev, uat, and prod.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Merged landing + chat into single page (samosa/chai slide out on first message)
- Positioned input bar between samosa and chai illustrations
- Footer at very bottom with Karpathy credit
- Removed cart icon, fixed "Aachaat" → "Chaat" everywhere
- Improved lemon SVG with stem/nub
- "Explore" → "Samosa" label
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Landing page with desi street-food aesthetic: lemon-mirchi toran with
pendulum animation, dual-script hero (Devanagari + English cursive),
samosa illustration with floating animation, brass chai kettle with
steam wisps, ambient chilli/lemon doodles.
Chat page carries the warm samosa-chaat palette with cream/gold user
bubbles, steam-wisp typing indicator, and WebGPU integration hooks
(window.samosaChaat API for local inference mode switching).
Added scripts/export_onnx.py for ONNX model export with KV cache
support, targeting WebGPU browser inference.
Credit to Andrej Karpathy's nanochat in footer.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The KV cache was hardcoded to float32 on non-CUDA devices, but the model
weights are loaded in bfloat16 via NANOCHAT_DTYPE env var. This caused a
RuntimeError in scaled_dot_product_attention. Now uses COMPUTE_DTYPE from
common.py, which respects the env var.
Also broadened CI/CD path triggers to nanochat/**.
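The shape of the fix, sketched under assumptions (COMPUTE_DTYPE and NANOCHAT_DTYPE are from the commit; the lookup table, function name, and cache shapes are illustrative):

```python
# Resolve the compute dtype once from the NANOCHAT_DTYPE env var (as common.py
# does, per the commit) and allocate the KV cache with it instead of float32.
import os
import torch

COMPUTE_DTYPE = {"bfloat16": torch.bfloat16, "float32": torch.float32}[
    os.environ.get("NANOCHAT_DTYPE", "bfloat16")
]

def make_kv_cache(batch, n_heads, max_seq, head_dim, device="cpu"):
    # Before the fix this hardcoded dtype=torch.float32 on non-CUDA devices,
    # mismatching bfloat16 weights inside scaled_dot_product_attention.
    k = torch.zeros(batch, n_heads, max_seq, head_dim,
                    dtype=COMPUTE_DTYPE, device=device)
    v = torch.zeros_like(k)
    return k, v
```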
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Deploys to EC2 on push to master when UI/server files change.
Uses appleboy/ssh-action with stored secrets.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New architectural features:
- Smear: mix previous token embedding into current position via learned
gate, providing cheap bigram-like info (works in training + KV cache)
- Backout: subtract learned fraction of mid-layer residual before logit
projection to remove low-level features
Hyperparameter tuning:
- Muon momentum warmdown 0.97→0.90 during LR warmdown phase
- Non-uniform per-layer init: resid_lambdas 1.15→1.05, x0_lambdas 0.20→0.05
- c_fc init scale 0.4x, QK norm scale 1.2, sliding window seq_len/4
- Speedrun data:params ratio reduced to 8
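One plausible reading of the smear gate, as a hedged sketch (the previous-token blend and learned gate are from the commit; the module shape and sigmoid-lerp form are assumptions, not the exact implementation):

```python
# Illustrative "smear": blend each position's token embedding with the
# previous position's via a learned sigmoid gate, giving cheap bigram-like
# information. At decode time only the previous token's embedding is needed,
# which is why this composes with a KV cache.
import torch
import torch.nn as nn

class Smear(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, 1)  # learned per-position gate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim). Shift right by one so position t sees token
        # t-1; position 0 keeps its own embedding.
        prev = torch.cat([x[:, :1], x[:, :-1]], dim=1)
        g = torch.sigmoid(self.gate(x))   # (batch, seq, 1)
        return x + g * (prev - x)         # lerp toward the previous token
```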
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Print step count
* Add reply-only loss for chat
* Use the mask from the tokenizer's render_conversation function
* Undo some changes
* Restore a comment that was accidentally removed; no functionality change
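The reply-only loss can be sketched as follows; in the real code the mask comes from the tokenizer's render_conversation, so the mask argument and ignore_index plumbing here are illustrative assumptions:

```python
# Sketch of reply-only loss for chat fine-tuning: positions outside assistant
# replies are set to ignore_index so cross-entropy skips them entirely.
import torch
import torch.nn.functional as F

def reply_only_loss(logits, targets, reply_mask, ignore_index=-100):
    """logits: (B, T, V); targets: (B, T); reply_mask: (B, T) bool,
    True where the token belongs to an assistant reply."""
    masked = targets.masked_fill(~reply_mask, ignore_index)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        masked.reshape(-1),
        ignore_index=ignore_index,
    )
```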