Commit Graph

456 Commits

Author SHA1 Message Date
Manmohan
8eaa96ce02
Merge pull request #62 from manmohan659/fix/shorthand-u-veto
fix: veto covers 'who r u' / 'who are u' shorthand
2026-04-22 19:11:05 -04:00
Manmohan Sharma
8b360f5bc8
fix: veto matches shorthand 'u' and 'r' for you/are 2026-04-22 16:10:59 -07:00
Manmohan
979493eb00
Merge pull request #61 from manmohan659/fix/search-veto-and-grounding-suffix
fix: search veto identity+greetings + grounding suffix
2026-04-22 19:09:05 -04:00
Manmohan Sharma
6069a7329b
fix: search veto for identity+greetings, grounding suffix for tool results
Two bugs:
(1) The force-web-search toggle bypassed the identity veto — 'who are u' with Search on still hit Tavily and got personality-quiz garbage. Now we always check _is_identity_or_meta(), which covers identity, creator, samosaChaat references AND greetings (hi/hello/hey/what's up), before honoring the force toggle.
(2) The model ignored the injected Tavily result and answered from training priors (e.g. a generic VP bio instead of the specific Armenia/Iran facts). Added a grounding suffix after <|output_end|> ('Based on the search results above, ' for web_search, 'The result is ' for calculator) so the model's next tokens condition on the fresh tool output instead of spinning up memory.
2026-04-22 16:08:59 -07:00
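The two fixes can be sketched as control flow. Names like `_is_identity_or_meta` follow the commit message, but the bodies below are illustrative stand-ins, not the repo's actual serve.py:

```python
import re

def _is_identity_or_meta(q: str) -> bool:
    # Minimal stand-in for the real veto (identity, creator, greetings).
    return bool(re.search(r"\b(who (are|r) (you|u)|your creator|hi|hello|hey)\b",
                          q, re.IGNORECASE))

GROUNDING_SUFFIX = {
    "web_search": "Based on the search results above, ",
    "calculator": "The result is ",
}

def should_force_search(query: str, force_toggle: bool) -> bool:
    # Fix 1: the veto always runs first, even when the UI Search toggle is on.
    if _is_identity_or_meta(query):
        return False
    return force_toggle  # (with the toggle off, the normal classifier would run)

def grounded_continuation(tool_name: str, tool_output: str) -> str:
    # Fix 2: seed the tokens right after <|output_end|> so the model's next
    # tokens condition on the fresh tool output instead of training priors.
    return f"<|output_start|>{tool_output}<|output_end|>" + GROUNDING_SUFFIX.get(tool_name, "")
```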
Manmohan
64067a5edd
Merge pull request #60 from manmohan659/fix/wire-calculator-force
fix: auto-inject calculator force-path
2026-04-22 19:04:33 -04:00
Manmohan Sharma
bd37163138
fix: auto-inject calculator tool call on arithmetic in user message
needs_calculator now extracts the actual expression from: bare arithmetic (900+100), verbal math (900 plus 100), percentage (17% tip on 45), with comma-stripping and whitespace normalization. serve.py wires it into the force-prefix path parallel to web_search — if no web-search trigger, check calculator, pre-seed real tool call + result so the model sees the grounded answer in context.
2026-04-22 16:04:26 -07:00
Manmohan
27d77afb38
Merge pull request #59 from manmohan659/fix/classifier-context-pronouns
fix(classifier): resolve pronouns from conversation history + roadmap
2026-04-22 18:44:20 -04:00
Manmohan Sharma
2e5cf45f86
fix(classifier): resolve pronouns from conversation history + roadmap
Adds needs_web_search_contextual(messages) that picks the subject from the most recent user turn and replaces him/her/it in the current query. Vetoes when prior turns were about identity. Also adds TRAINING_ROADMAP.md — six-phase plan (tokens redacted).
2026-04-22 15:43:57 -07:00
Manmohan
56988b8e06
Merge pull request #58 from manmohan659/feat/ui-input-redesign-and-sanitize
feat(ui): cleaner input + sanitize model-output artifacts
2026-04-22 18:31:07 -04:00
Manmohan Sharma
2b6b7186d3
feat(ui): cleaner input layout + sanitize model-output artifacts
ChatInput: textarea on top, inline tool pills (Think, Search) on the left and send button on the right — single rounded pod, no more bolted-on feel. Smaller pill buttons with subtle ring instead of heavy borders. MessageBubble: add sanitizeModelOutput() that strips training-artifact leaks: <b>/<i>/<strong>/<em> HTML tags, stray standalone '<' markers, leading 'Answer:/Response:' labels, placeholder image markdown. Applied before tool-marker parsing so cleaned text also feeds the <think> card renderer.
2026-04-22 15:31:00 -07:00
Manmohan
dffda1aa15
Merge pull request #57 from manmohan659/feat/remove-model-selector
feat(ui): remove model selector dropdown
2026-04-22 18:27:44 -04:00
Manmohan Sharma
43ad35f73b
feat(ui): remove model selector dropdown - single model only
There's only one deployed model (samosaChaat). Drop the 'nanochat · base' select dropdown from the Sidebar and replace the header model badge with a static 'samosaChaat' label. Removes unused MODEL_OPTIONS / setModel / ChevronDown imports.
2026-04-22 15:27:38 -07:00
Manmohan
93b530c028
Merge pull request #56 from manmohan659/fix/classifier-identity-veto-expanded
fix(classifier): expand identity veto (tell me about yourself etc.)
2026-04-22 18:25:39 -04:00
Manmohan Sharma
fd8e10a820
fix(classifier): expand identity veto to cover all self-introspection queries
Added patterns for: tell me about yourself / you / about you, what do/can you do, what are your capabilities / skills, how do you work, what are you good at, what's your purpose / story / mission, where did you come from, how were you built, are you an AI / chatbot / language model, model meta (model/version/context/training cutoff), creator socials (github/linkedin/twitter), and more writing tasks (song, joke). All 27 identity cases now short-circuit without hitting Tavily.
2026-04-22 15:25:33 -07:00
Manmohan
9ed58c4813
Merge pull request #55 from manmohan659/fix/classifier-identity-veto
fix(classifier): veto identity/creator/meta queries from web_search
2026-04-22 18:24:15 -04:00
Manmohan Sharma
5e3b17e990
fix(classifier): veto identity/meta/greeting/writing queries from web_search
The heuristic classifier was triggering web_search on 'who is your creator', 'who is manmohan sharma', 'who created you' etc — which returned irrelevant Tavily results (Tyler the Creator, Waaree CFO) when the model's SFT training already has the correct grounded identity answer. Added _IDENTITY_VETO_PATTERNS covering: self-referential questions, creator/maker/developer queries, competitor/provenance attacks (are you chatgpt/made by openai), samosaChaat/Manmohan name references, meta-questions (parameters/architecture/training/open source), greetings (hi/hello/hey), small talk, and writing/reasoning tasks that the model answers from memory. Veto runs before all positive classification.
2026-04-22 15:24:08 -07:00
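A veto-first classifier in the spirit of _IDENTITY_VETO_PATTERNS can be sketched like this — the patterns below are a small illustrative subset, not the repo's actual list:

```python
import re

_IDENTITY_VETO = [
    re.compile(p, re.IGNORECASE) for p in (
        r"\bwho (is|are|made|created) (you|your|u)\b",   # self-referential / creator
        r"\bare you (chatgpt|made by openai)\b",          # competitor/provenance attacks
        r"\b(samosachaat|manmohan)\b",                    # name references
        r"^\s*(hi|hello|hey)\b",                          # greetings
        r"\bwrite (me )?(a|an) (poem|story|song|joke)\b", # writing tasks
    )
]

def needs_web_search(query: str) -> bool:
    # The veto runs before any positive classification, so identity /
    # greeting / writing queries never reach the search backend.
    if any(p.search(query) for p in _IDENTITY_VETO):
        return False
    # Positive time-sensitive triggers, also illustrative.
    return bool(re.search(r"\b(current|latest|today|news|weather)\b",
                          query, re.IGNORECASE))
```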
Manmohan
31823b632a
Merge pull request #54 from manmohan659/feat/force-web-search-toggle
feat(ui): Search toggle — force web_search on every message
2026-04-22 18:20:51 -04:00
Manmohan Sharma
215e8bd8c3
feat(ui): add Search toggle that forces web_search every message
New Globe/'Search' toggle next to the Brain/'Think' button. When ON, every message sent pushes force_web_search=true through: frontend -> chat-api -> Modal. Modal bypasses the heuristic classifier and always pre-seeds the assistant turn with a real Tavily-grounded tool call + result. Toggle is independent of Think — use either or both. Classifier still runs when toggle is OFF, so auto-detection of 'current president' / 'latest weather' / etc still works without any user action.
2026-04-22 15:20:45 -07:00
Manmohan
7575e4f22e
Merge pull request #53 from manmohan659/fix/post-injection-scan-window
fix(serve): post-injection scan window
2026-04-22 18:15:41 -04:00
Manmohan Sharma
57be688fdc
fix(serve): don't scan our own injected tokens for the loop-break check
Bug: after runtime tool injection, the post-injection break scanned gen_ids[pre_injection_len:] which included our own injected <|output_start|>…<|output_end|> — so the loop-break fired IMMEDIATELY and stopped the turn before the model could write its final answer. Visible on multi-turn queries like a follow-up 'tell me more about him' where the model naturally issued a tool call, got real Tavily output, and then got cut off. Fix: track post_injection_start (the index AFTER injected tokens) and only scan from there for stray markers.
2026-04-22 15:15:34 -07:00
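The scan-window fix reduces to tracking where the model's own tokens resume. A sketch under assumed names (the real code scans for tool-marker token runs inside serve.py):

```python
def inject_tool_result(gen_ids: list[int], injected: list[int]) -> int:
    """Append the injected <|output_start|>...<|output_end|> tokens and return
    the index where the model's own post-injection tokens will begin."""
    gen_ids.extend(injected)
    return len(gen_ids)          # post_injection_start

def has_stray_marker(gen_ids: list[int], post_injection_start: int,
                     marker: list[int]) -> bool:
    # The buggy version scanned gen_ids[pre_injection_len:], which included
    # the injected markers themselves and tripped the loop-break immediately.
    # Scanning from post_injection_start sees only newly sampled tokens.
    tail = gen_ids[post_injection_start:]
    n = len(marker)
    return any(tail[i:i + n] == marker for i in range(len(tail) - n + 1))
```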
Manmohan
07b7629ba7
Merge pull request #52 from manmohan659/fix/classifier-strip-sysprompt
fix(serve): clean query classifier input
2026-04-22 18:05:58 -04:00
Manmohan Sharma
297bc4bfb9
fix(serve): strip system-prompt prefix before classifying user query 2026-04-22 15:05:52 -07:00
Manmohan
65e681add5
Merge pull request #51 from manmohan659/fix/forced-web-search-and-ui-cleanup
fix: forced web_search classifier + UI orphan-marker cleanup
2026-04-22 18:01:13 -04:00
Manmohan Sharma
4628d53d67
fix(tools): force web_search on tool-worthy queries + strip orphan markers in UI
Adds modal/_query_classifier.py with regex patterns covering time-sensitive queries (current/present/latest/today/weather/CEO/president/stock/news/sports/etc). Modal serve.py classifies each user message and, when it matches, pre-seeds the assistant turn with a real Tavily-backed tool call + result — so 'whos the present president' now triggers web_search the same as 'current president'. Also tightens the post-injection break to fire on any leaked tool marker. UI: MessageBubble.tsx now strips orphan close-tags (<|output_end|> without an open), dedupes consecutive identical tool-result blocks, and removes fragment markers from text segments so they don't leak into the message body.
2026-04-22 15:01:07 -07:00
Manmohan
f41da418ab
Merge pull request #50 from manmohan659/fix/tool-decode-text-match
fix(serve): decode-tail text match for tool markers
2026-04-22 17:48:58 -04:00
Manmohan Sharma
d49de1575b
fix(serve): decode-tail text match for tool markers
Token-id sequence match failed because BPE has multiple valid tokenizations of the same text, so the greedy encoder output didn't match the model's sampled path. Instead decode gen_ids directly and search for the marker text. Batch-decoding produces complete text even if single-token decodes return empty strings.
2026-04-22 14:48:51 -07:00
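Why the id-sequence match fails and the text match works: BPE admits multiple tokenizations of the same string, so greedy-encoding the marker text need not equal the id path the model sampled. Decoding the whole tail and searching the text sidesteps that. The toy vocabulary below stands in for the real tokenizer:

```python
# Toy vocab with two different tokenizations of the same marker text.
VOCAB = {1: "<|", 2: "python", 3: "_", 4: "start", 5: "|>", 6: "<|py", 7: "thon_start|>"}

def decode(ids: list[int]) -> str:
    # Stand-in for the tokenizer's batch decode: decoding the whole run
    # yields complete text even where a single token alone would be partial.
    return "".join(VOCAB[i] for i in ids)

def marker_seen(gen_ids: list[int], marker_text: str = "<|python_start|>") -> bool:
    return marker_text in decode(gen_ids)
```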
Manmohan
eabfbd6d49
Merge pull request #49 from manmohan659/fix/tool-loop-suppression
fix(serve): suppress post-injection tool-call loop
2026-04-22 17:45:03 -04:00
Manmohan Sharma
544ab89c04
fix(serve): stop turn when model emits second output block after injection
Training data taught the model to echo another <|output_start|>…<|output_end|> after our injected real tool result. Detect that second sequence and break the turn; the grounded answer has already streamed to the client.
2026-04-22 14:44:56 -07:00
Manmohan
fd43d6399b
Merge pull request #48 from manmohan659/fix/tool-id-sequence-match
fix(serve): tool-marker detection via token-id sequence
2026-04-22 17:42:23 -04:00
Manmohan Sharma
ba727cb4d5
fix(serve): match tool markers on token-id sequences not decoded text
Previous text-stream approach lost markers because BPE partial-byte tokens decode to empty strings, so assistant_text never accumulated the full marker. Switch to matching the ordinary-token id sequence directly (python_start = [60,124,25145,95,17104,124,62]).
2026-04-22 14:42:07 -07:00
Manmohan
45570b51c5
Merge pull request #47 from manmohan659/fix/tool-text-stream-detection
fix(serve): detect tool markers in text stream, not token ids
2026-04-22 17:39:50 -04:00
Manmohan Sharma
7a92f5b016
fix(serve): detect tool markers in text stream not token ids
The SFT loader tokenizes assistant content with .encode() (ordinary), not .encode_special(), so the model was trained to emit <|python_start|> / <|python_end|> as the 7-token ordinary sequence [60, 124, 25145, 95, 17104, 124, 62] rather than as special token id 32764. My prior state-machine matched token_id == python_start_id, which never fired — so tool calls were never executed and the model just hallucinated fake tool results (Official leadership page etc). Fix: detect markers in the decoded text stream, parse the payload between <|python_start|> and <|python_end|>, execute the tool, inject the real <|output_start|>…<|output_end|> tokens into both the SSE stream and the model's input_ids. Next-token prediction is now grounded on real Tavily output.
2026-04-22 14:39:36 -07:00
Manmohan Sharma
f642cb2eb6
feat(sft): add r7 think+tool prep scripts and compose cleanup
- allow assistant list-shaped content in CustomJSON for joint think+tool JSONL
- add gen_joint_think_tool, filter_reasoning_jsonl, eval_suite_v2 (think_plus_tool probes)
- fix CI: uv sync --no-install-workspace; uv run pytest
- remove unused local inference service from compose; document Modal URL in env examples

Made-with: Cursor
2026-04-22 14:22:47 -07:00
Manmohan
38cb7f7596
Merge pull request #45 from manmohan659/fix/tavily-direct-answer
fix(tools): Tavily include_answer + tool-card overflow
2026-04-22 17:21:08 -04:00
Manmohan Sharma
f70be25212
fix(tools): enable Tavily include_answer and fix UI overflow 2026-04-22 14:20:47 -07:00
Manmohan
d747bcf3e3
Merge pull request #44 from manmohan659/feat/r6-reasoning-tools
feat: R6 deploy + reasoning mode toggle + live tool execution
2026-04-22 16:46:25 -04:00
Manmohan
3ab89e7890
feat: deploy d24-sft-r6 with full reasoning mode + live tool use (Tavily)
Model R6 (97% pass rate on 33-probe eval, val_bpb 0.2635):
- modal/serve.py + modal/_tools.py: tool-aware streaming with
  TavilySearchBackend auto-detect, python_start/end state machine,
  output_start/end forcing; mount tavily secret
- modal/serve.py: MODEL_TAG=d24-sft-r6, model path points at new SFT r6
- services/chat-api/routes/messages.py: accept thinking_mode flag,
  inject samosaChaat system prompt (direct or <think> variant) into
  first user message before streaming to Modal
- services/frontend/components/chat/ChatInput.tsx: Brain toggle
  'Think' button next to send; when active, model uses think mode
- services/frontend/components/chat/ChatWindow.tsx: track
  thinkingMode state, pass through to API body as thinking_mode
- services/frontend/components/chat/MessageBubble.tsx: parse and
  render <think>...</think> as collapsible italic blocks;
  <|python_start|>...<|python_end|> as tool-call cards with icons
  per tool name; <|output_start|>...<|output_end|> as result cards
  with expandable JSON
- nanochat/tools.py: TavilySearchBackend class + auto-detect
- nanochat/ui.html: legacy UI reasoning toggle (kept for parity)

Tool execution verified live: query -> web_search via Tavily ->
Macron returned with grounded answer.
2026-04-22 13:43:43 -07:00
Manmohan
67f568a4f2
fix(nginx): re-resolve upstream IPs so deploys don't break auth (#43)
When docker compose recreates a service, it gets a new internal IP.
nginx was resolving upstream hostnames once at startup and serving 502
until someone manually restarted it — which is what broke /api/auth
after the last deploy.

Uses Docker Compose's embedded DNS (127.0.0.11) and moves each
proxy_pass onto a variable so nginx re-resolves every request.
Rewrites replace the path-stripping behavior that variable-form
proxy_pass doesn't provide out of the box.

Also adds a `nginx -t && nginx -s reload` step in the deploy workflow
so future nginx.conf edits land without manual ssh.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 20:41:01 -04:00
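The pattern this commit describes looks roughly like the fragment below — upstream names, ports, and paths here are assumptions, not the repo's actual nginx.conf:

```nginx
# Docker Compose's embedded DNS; re-resolve upstreams instead of caching
# the IP nginx saw at startup.
resolver 127.0.0.11 valid=10s;

location /api/auth/ {
    # Putting the upstream in a variable forces per-request resolution.
    set $chat_api http://chat-api:8000;
    # Variable-form proxy_pass doesn't strip the location prefix, so
    # restore that behavior with an explicit rewrite.
    rewrite ^/api/auth/(.*)$ /$1 break;
    proxy_pass $chat_api;
}
```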
Manmohan
94bec5f2a0
fix(frontend): assistant messages fill the chat column (#42)
Assistant responses were capped at max-w-[75%] of the column, so long
replies broke into a narrow block with dead space on the right. Cap
only applies to user bubbles now; assistant messages use w-full of the
max-w-3xl content column, matching how ChatGPT/Claude render replies.
Also bumps message vertical spacing from mb-3 to mb-5.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 20:23:56 -04:00
Manmohan
748d2e561c
fix(frontend): widen nav pill, default to dark theme (#41)
LandingNav was max-w-3xl, which forced "How it works" and "Try
samosaChaat" to wrap onto two lines. Bumps the pill to 1100px,
tightens the link padding, demotes the @ handle to lg+, and adds
whitespace-nowrap to every chip so nothing wraps again. Default
theme is now dark — the no-flash init script adds .dark unless the
user has explicitly stored 'light', and the useTheme hook seeds
from the same logic.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 20:08:55 -04:00
Manmohan
9a45f0924d
fix(ci): grant id-token write so EC2 deploy can assume the OIDC role (#40)
aws-actions/configure-aws-credentials needs id-token: write to mint the
OIDC JWT and assume AWS_ROLE_ARN. Without it the deploy-ec2 workflow
fails at the credentials step. Add the permission at workflow scope.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 19:59:14 -04:00
Manmohan
1d2a76eec4
feat: deploy d24 SFT + polished UI redesign with dark mode (#39)
* feat(inference): deploy d24 SFT weights to Modal

Repoint Modal inference app from the broken d20 checkpoint to our own
ManmohanSharma/nanochat-d24 SFT step 484. Rewrites the standalone model
as an inference-only port of nanochat/gpt.py so the modern architecture
(smear gate, per-layer value embeddings, ve_gate, backout, sliding
window attention via SDPA, rotary base 100000, padded vocab, logit
softcap) loads cleanly from the checkpoint. Tokenizer loads the pickled
tiktoken encoding directly so special tokens end up at their true IDs
(32759-32767), and the stop check uses that set instead of hardcoded
0-8. GPU bumped to L4 for headroom. HF token sourced from the
'huggingface' Modal secret.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(frontend): polished redesign with serif display + dark mode

Lifts the craft level of the landing and chat UI without changing the
desi identity. Adds Fraunces for display headlines, a floating pill
LandingNav, a saffron-glow hero with a large serif headline and black
pill CTAs, and three gradient-tiled feature cards with inline SVG
glyphs replacing the emoji cards. The chat empty state is now a serif
greeting with pill-chip prompt starters, and ChatInput is a single
rounded pod so the send button sits inside the input (fixes the
misaligned floating button). Adds a class-based dark mode across the
chat surfaces with a sun/moon toggle in the sidebar footer, powered by
a small useTheme hook and a no-flash init script in the root layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(frontend): add ESLint config so CI lint step passes

next lint was failing with an interactive prompt because the repo had
no ESLint config. Adds a minimal next/core-web-vitals extends and
drops the now-unloadable @typescript-eslint/no-explicit-any disable
directive in the stream proxy by narrowing the body type to unknown.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 19:55:16 -04:00
Manmohan
272086d2c0
Merge pull request #38 from manmohan659/fix/modal-url-detection
fix(frontend): fix stale closure - tokens now render
2026-04-16 18:16:07 -04:00
Manmohan Sharma
16f40ceb54
fix(frontend): pass assistantMsgId directly to fix stale closure bug 2026-04-16 15:15:53 -07:00
Manmohan
7387f7c1d1
Merge pull request #37 from manmohan659/fix/modal-url-detection
fix: direct SSE streaming from chat-api (bypass Next.js proxy)
2026-04-16 18:08:59 -04:00
Manmohan Sharma
a873b6ad46
fix: stream directly from chat-api, bypass Next.js proxy
Replaced the double-proxy (browser→Next.js→chat-api→Modal) with
direct streaming (browser→nginx→chat-api→Modal). Added nginx route
for /api/conversations → chat-api. Inlined SSE parsing in ChatWindow
instead of useSSE hook going through /api/chat/stream.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 15:08:46 -07:00
Manmohan
e9885b2583
Merge pull request #36 from manmohan659/fix/modal-url-detection
fix(chat-api): detect Modal URL by domain not path
2026-04-16 17:59:34 -04:00
Manmohan Sharma
df0584b861
fix(chat-api): detect Modal URL by domain not path suffix 2026-04-16 14:59:20 -07:00
Manmohan
2dd914a69d
Merge pull request #35 from manmohan659/fix/stream-body-format
fix(frontend): type fix for proxyUpstream
2026-04-16 17:53:02 -04:00
Manmohan Sharma
7ecd8a928c
fix(frontend): use any type for proxyUpstream body param 2026-04-16 14:52:50 -07:00