
Inference Service

Standalone FastAPI microservice for nanochat model serving.

Endpoints

  • POST /generate streams model output as server-sent events (SSE)
  • GET /models lists registered and loaded weights
  • POST /models/swap drains workers and hot-swaps the active weights
  • GET /health reports readiness
  • GET /stats reports worker pool state
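
A client for the /generate endpoint above might look like the sketch below. It is a minimal illustration, not the service's documented contract: the request body field prompt, the one-JSON-object-per-data:-line payload shape, and the [DONE] end-of-stream sentinel are all assumptions.

```python
import json
import urllib.request


def parse_sse_line(line: str):
    """Return the JSON payload of one SSE 'data:' line, or None for
    comments, blank keep-alive lines, and other SSE fields."""
    if line.startswith("data:"):
        payload = line[len("data:"):].strip()
        # The "[DONE]" sentinel is an assumption about this service
        if payload and payload != "[DONE]":
            return json.loads(payload)
    return None


def stream_generate(prompt: str, base_url: str = "http://localhost:8003"):
    """Yield decoded events from POST /generate (body field name assumed)."""
    req = urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps({"prompt": prompt}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # requires the service running
        for raw in resp:
            event = parse_sse_line(raw.decode("utf-8").rstrip("\n"))
            if event is not None:
                yield event
```

Only stdlib is used here; in practice an SSE-aware client such as httpx with httpx-sse would handle reconnects and multi-line data fields more robustly.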

Environment

  • MODEL_STORAGE_PATH
  • DEFAULT_MODEL_TAG
  • HF_TOKEN
  • INTERNAL_API_KEY
  • NANOCHAT_DTYPE
  • NUM_WORKERS
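
An illustrative local configuration, for example. The values are placeholders and the per-variable comments are plausible readings of the names, not documented semantics:

```shell
export MODEL_STORAGE_PATH=./models   # where model weights live on disk
export DEFAULT_MODEL_TAG=base        # weights to load at startup
export HF_TOKEN=hf_xxx               # Hugging Face token for gated downloads
export INTERNAL_API_KEY=dev-secret   # shared secret for internal callers
export NANOCHAT_DTYPE=bfloat16       # torch dtype used for inference
export NUM_WORKERS=2                 # size of the worker pool
```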

Run locally with:

uv run --project services/inference uvicorn main:app --app-dir services/inference/src --reload --port 8003
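
Once the server is up, triggering a hot swap via POST /models/swap could be sketched as follows. The body field model_tag and the Bearer-token use of INTERNAL_API_KEY are assumptions; the service may expect a different payload or auth header.

```python
import json
import os
import urllib.request


def build_swap_request(model_tag: str, base_url: str = "http://localhost:8003"):
    """Build a POST /models/swap request. The 'model_tag' field and the
    Authorization scheme are illustrative guesses, not the documented API."""
    return urllib.request.Request(
        f"{base_url}/models/swap",
        data=json.dumps({"model_tag": model_tag}).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('INTERNAL_API_KEY', '')}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_swap_request("base")
    with urllib.request.urlopen(req) as resp:  # requires the service running
        print(resp.read().decode())
```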