Commit Graph

12 Commits

Author SHA1 Message Date
Manmohan Sharma
215e8bd8c3
feat(ui): add Search toggle that forces web_search every message
New Globe/'Search' toggle next to the Brain/'Think' button. When ON, every
message sent pushes force_web_search=true through:
frontend -> chat-api -> Modal. Modal bypasses the heuristic classifier and
always pre-seeds the assistant turn with a real Tavily-grounded tool call +
result. The toggle is independent of Think; use either or both. The classifier
still runs when the toggle is OFF, so auto-detection of 'current president' /
'latest weather' / etc. still works without any user action.
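The bypass logic described above can be sketched as follows; the function and
trigger names (should_search, needs_search) are illustrative, not the actual
code:

```python
# Hypothetical sketch: force_web_search short-circuits the heuristic classifier.

def needs_search(message: str) -> bool:
    # Toy stand-in for the classifier: match time-sensitive phrasings.
    triggers = ("current president", "latest weather", "latest news", "today")
    lowered = message.lower()
    return any(t in lowered for t in triggers)

def should_search(message: str, force_web_search: bool) -> bool:
    """Decide whether to pre-seed the turn with a web_search tool call."""
    if force_web_search:          # Search toggle ON: always search
        return True
    return needs_search(message)  # toggle OFF: heuristic classifier decides
```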
2026-04-22 15:20:45 -07:00
Manmohan
3ab89e7890
feat: deploy d24-sft-r6 with full reasoning mode + live tool use (Tavily)
Model R6 (97% pass rate on 33-probe eval, val_bpb 0.2635):
- modal/serve.py + modal/_tools.py: tool-aware streaming with
  TavilySearchBackend auto-detect, python_start/end state machine,
  output_start/end forcing; mount tavily secret
- modal/serve.py: MODEL_TAG=d24-sft-r6, model path points at new SFT r6
- services/chat-api/routes/messages.py: accept thinking_mode flag,
  inject samosaChaat system prompt (direct or <think> variant) into
  first user message before streaming to Modal
- services/frontend/components/chat/ChatInput.tsx: Brain toggle
  'Think' button next to send; when active, model uses think mode
- services/frontend/components/chat/ChatWindow.tsx: track
  thinkingMode state, pass through to API body as thinking_mode
- services/frontend/components/chat/MessageBubble.tsx: parse and
  render <think>...</think> as collapsible italic blocks;
  <|python_start|>...<|python_end|> as tool-call cards with icons
  per tool name; <|output_start|>...<|output_end|> as result cards
  with expandable JSON
- nanochat/tools.py: TavilySearchBackend class + auto-detect
- nanochat/ui.html: legacy UI reasoning toggle (kept for parity)

Tool execution verified live: query -> web_search via Tavily ->
Macron returned with grounded answer.
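The python_start/end and output_start/end handling above amounts to a small
state machine over the token stream. A minimal sketch of the idea (not the
actual serve.py or MessageBubble code):

```python
# Split a detokenized stream into text / tool-call / tool-result segments
# based on the <|python_start|>-style marker tokens.

MARKERS = {
    "<|python_start|>": "tool_call",
    "<|python_end|>": "text",
    "<|output_start|>": "tool_result",
    "<|output_end|>": "text",
}

def segment(tokens):
    """Yield (state, chunk) pairs; markers switch state and are not emitted."""
    state = "text"
    for tok in tokens:
        if tok in MARKERS:
            state = MARKERS[tok]
            continue
        yield state, tok

stream = ["Hi ", "<|python_start|>", "web_search('x')", "<|python_end|>",
          "<|output_start|>", '{"answer": 1}', "<|output_end|>", " done"]
```

The UI can then render tool_call chunks as cards and tool_result chunks as
expandable JSON, as the commit describes.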
2026-04-22 13:43:43 -07:00
Manmohan Sharma
df0584b861
fix(chat-api): detect Modal URL by domain not path suffix
2026-04-16 14:59:20 -07:00
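A sketch of the idea behind this fix (hypothetical helpers, not the actual
client code), assuming Modal web endpoints are served from *.modal.run
subdomains:

```python
# Decide whether a base URL is a Modal web endpoint by hostname,
# not by whether the path happens to end in /generate.
from urllib.parse import urlparse

def is_modal_url(base_url: str) -> bool:
    host = urlparse(base_url).hostname or ""
    return host == "modal.run" or host.endswith(".modal.run")

def generate_endpoint(base_url: str) -> str:
    # A Modal endpoint URL is already complete; others get /generate appended.
    if is_modal_url(base_url):
        return base_url
    return base_url.rstrip("/") + "/generate"
```

The domain check is robust to Modal endpoints whose paths do not end in
/generate, which is where the earlier path-suffix heuristic broke.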
Manmohan
129553b215
Merge pull request #31 from manmohan659/fix/chat-api-fk
fix(chat-api): defer users FK to avoid startup crash
2026-04-16 17:41:05 -04:00
Manmohan Sharma
e8222011d9
fix(chat-api): use_alter on users FK to avoid metadata resolution error
Chat-api doesn't define the users model (owned by auth service), so
SQLAlchemy can't resolve the FK. use_alter=True defers the constraint
to ALTER TABLE, avoiding the NoReferencedTableError at startup.
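A minimal sketch of the use_alter pattern (table and column names are
illustrative, not copied from the service):

```python
# The users table lives in the auth service's metadata, so the FK cannot be
# resolved inline at CREATE TABLE time; use_alter=True defers the constraint
# to a separate ALTER TABLE statement.
import sqlalchemy as sa
from sqlalchemy.schema import CreateTable

metadata = sa.MetaData()

conversations = sa.Table(
    "conversations",
    metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column(
        "user_id",
        sa.Integer,
        sa.ForeignKey("users.id", use_alter=True, name="fk_conversations_user"),
        nullable=False,
    ),
)

# With use_alter, the FK is omitted from the inline CREATE TABLE DDL,
# so this compiles even though "users" is not present in metadata.
ddl = str(CreateTable(conversations).compile())
```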

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 14:40:45 -07:00
Manmohan Sharma
6d3e1f0afd
fix(chat-api): support Modal inference URL in inference client
The inference client now auto-detects if the URL already ends with
/generate (Modal's endpoint URL pattern) and skips appending the path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 14:36:36 -07:00
Manmohan
b5fbebb63f
Merge pull request #26 from manmohan659/fix/missing-models
fix: add missing SQLAlchemy models to auth and chat-api
2026-04-16 16:50:22 -04:00
Manmohan Sharma
8a95a76522
fix: add missing models/ dirs to auth and chat-api services
Root .gitignore had `models/` which matched both ML weights AND
SQLAlchemy model dirs. Changed to `/models/` (root only).
Added auth/src/models/ (User) and chat-api/src/models/ (Conversation, Message).
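The distinction the fix relies on: in .gitignore, a pattern without a leading
slash matches at any depth, while a leading slash anchors it to the repo root.
A sketch of the change (paths illustrative):

```gitignore
# Before: ignored every models/ directory at any depth, including
# services/auth/src/models/ and services/chat-api/src/models/
models/

# After: only the root-level ML weights directory is ignored
/models/
```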

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 13:50:08 -07:00
Manmohan Sharma
2061f8848b
fix(docker): add structlog + prometheus deps to auth and chat-api Dockerfiles
Auth service was crash-looping with ModuleNotFoundError for
prometheus_fastapi_instrumentator. Chat-api was also missing it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 13:46:53 -07:00
Manmohan Sharma
aa0818aae2
feat(observability): Prometheus + Grafana + Loki stack for samosaChaat (#9)
Replaces the helm/observability scaffold with a real monitoring stack
wired into the samosaChaat platform.

Helm chart (helm/observability/)
- Chart.yaml declares kube-prometheus-stack (~62.0) and loki-stack
  (~2.10) as subchart dependencies.
- values.yaml configures Prometheus (15d retention, 50Gi PVC,
  ServiceMonitor + rule selector on app.kubernetes.io/part-of:
  samosachaat), Alertmanager (10Gi PVC), Grafana (OAuth-only via
  GitHub + Google, local login disabled, Prometheus + Loki datasources,
  dashboards auto-provisioned from a ConfigMap, email + Slack contact
  points with a critical route to Slack), Loki (50Gi, 30d retention,
  tsdb schema), and Promtail (JSON pipeline that lifts level / service
  / trace_id / user_id into labels, scrape config with pod labels).
- Alert rules: HighCPU, HighMemory, DiskSpaceLow, High5xxRate,
  InferenceServiceDown, HighP99Latency.
- templates/grafana-dashboards-configmap.yaml renders every file under
  dashboards/ into a single grafana_dashboard=1 ConfigMap.
- dashboards/node-health.json, app-performance.json, inference.json -
  fully-formed Grafana dashboards with Prometheus datasource variable,
  templated app selector, thresholded gauges, and LogQL-ready labels.

Scraping (helm/samosachaat/templates/servicemonitor.yaml)
- ServiceMonitor CRs for auth / chat-api / inference that Prometheus
  picks up via the part-of=samosachaat selector; scrapes /metrics
  every 15s and replaces the app label so dashboards line up.
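An illustrative ServiceMonitor shape (names and labels assumed, not copied
from the chart) showing how Prometheus would discover a service's /metrics
endpoint via the part-of selector:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: chat-api
  labels:
    app.kubernetes.io/part-of: samosachaat   # matched by the Prometheus selector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: chat-api
  endpoints:
    - port: http
      path: /metrics
      interval: 15s
      relabelings:
        - sourceLabels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
          targetLabel: app                    # keeps dashboards' app label stable
```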

Application instrumentation
- services/{auth,chat-api,inference} each depend on
  prometheus-fastapi-instrumentator and expose /metrics (request count,
  latency histograms, in-progress gauges).
- services/auth/src/logging_setup.py and
  services/inference/src/logging_setup.py mirror the canonical
  chat-api implementation - structlog JSON with service, trace_id,
  user_id context injection.
- configure_logging() is called at create_app() in auth and inference;
  inference's main.py now uses structlog via get_logger() instead of
  logging.getLogger.
- log_level setting added to auth + inference config (LOG_LEVEL env).
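The logging contract above can be sketched with only the stdlib (the services
use structlog; this mirrors the same JSON fields with logging + contextvars,
and the service name is a placeholder):

```python
import contextvars
import json
import logging

# Request-scoped context injected into every log line.
trace_id_var = contextvars.ContextVar("trace_id", default=None)
user_id_var = contextvars.ContextVar("user_id", default=None)

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname.lower(),
            "service": "inference",           # per-service constant
            "event": record.getMessage(),
            "trace_id": trace_id_var.get(),
            "user_id": user_id_var.get(),
        })

def configure_logging(level: str = "INFO") -> logging.Logger:
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("samosachaat")
    logger.handlers = [handler]
    logger.setLevel(level)
    return logger
```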

Docs
- contracts/logging-standard.md defines the required JSON fields,
  Python (structlog) + Node.js (pino) implementations, LogQL examples
  for cross-service queries, and the x-trace-id propagation contract.

Closes #9

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 12:29:16 -07:00
Manmohan Sharma
8153a4fadf
feat(chat-api): conversation orchestration + SSE streaming proxy (#6)
- FastAPI service that manages conversations and messages in PostgreSQL
  (SQLAlchemy 2.0 async + asyncpg) and streams assistant responses back
  to the client via sse-starlette, forwarding the inference service SSE
  contract unchanged.
- Auth guard validates every request against the auth service
  /auth/validate endpoint (X-Internal-API-Key) and caches results in an
  in-process TTL cache (5 min, 1024 entries) to absorb request bursts.
- Every query filters by authenticated user_id; cross-user access
  returns 404. Message send flow auto-titles the first message,
  persists the streamed assistant response after the client disconnects,
  and records token_count + inference_time_ms.
- /api/models{,/swap} proxies the inference admin surface; swap
  requires is_admin on the validated user.
- Structured JSON logging via structlog with trace_id + user_id
  ContextVars attached to every log line.
- Test suite (pytest + aiosqlite + respx) covers CRUD, user scoping,
  streaming SSE persistence, regenerate, model proxy admin gate,
  and the stream proxy error path. 16/16 passing.
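The in-process TTL cache described above (5 min, 1024 entries) can be sketched
as follows; this is an illustrative implementation, not the service's actual
code:

```python
import time
from collections import OrderedDict

class TTLCache:
    """Insertion-ordered cache with per-entry expiry and a size cap."""

    def __init__(self, max_entries: int = 1024, ttl: float = 300.0):
        self.max_entries = max_entries
        self.ttl = ttl
        self._store: OrderedDict = OrderedDict()

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:   # expired: drop and report a miss
            del self._store[key]
            return None
        return value

    def set(self, key, value) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)
        self._store.move_to_end(key)
        while len(self._store) > self.max_entries:  # evict oldest insertions
            self._store.popitem(last=False)
```

Keyed on the bearer token with the validated user payload as the value, this
lets bursts of requests skip repeated /auth/validate round trips.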

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 11:49:51 -07:00
Manmohan Sharma
957f66181d
scaffold monorepo platform layout
2026-04-16 11:06:29 -07:00