Inference Service

Standalone FastAPI microservice for nanochat model serving.

Endpoints

  • POST /generate streams model output as Server-Sent Events (SSE); see the client sketch after this list
  • GET /models lists registered and loaded weights
  • POST /models/swap drains workers and hot-swaps the active weights
  • GET /health reports readiness
  • GET /stats reports worker pool state
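
A minimal Python client sketch for the two POST endpoints, using httpx. The request fields (prompt, max_tokens, tag), the X-API-Key header, the example tag d32, and the data: framing of the stream are assumptions about the schema, not confirmed by this README; the actual request models live under services/inference/src.

import httpx

# Hypothetical request body; the real field names live in the service's
# request schema under services/inference/src.
payload = {"prompt": "Hello, nanochat!", "max_tokens": 64}
# Assumption: the service checks INTERNAL_API_KEY via a header like this.
headers = {"X-API-Key": "change-me"}

with httpx.stream(
    "POST",
    "http://localhost:8003/generate",
    json=payload,
    headers=headers,
    timeout=None,  # SSE streams stay open; disable the read timeout
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        # SSE frames each event as a "data: ..." line (assumption: one
        # chunk of generated text per event).
        if line.startswith("data: "):
            print(line[len("data: "):], end="", flush=True)

# Hot-swap the active weights (the "tag" field and "d32" value are
# illustrative assumptions).
httpx.post("http://localhost:8003/models/swap", json={"tag": "d32"}, headers=headers)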

Environment

  • MODEL_STORAGE_PATH
  • DEFAULT_MODEL_TAG
  • HF_TOKEN
  • INTERNAL_API_KEY
  • NANOCHAT_DTYPE
  • NUM_WORKERS
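
For reference, a sketch of how these variables might be read at startup. The defaults and per-variable comments below are illustrative guesses, not the service's actual configuration.

import os

# Illustrative defaults and comments only; the real values and semantics
# are defined in the service code.
MODEL_STORAGE_PATH = os.environ.get("MODEL_STORAGE_PATH", "/models")  # where weights live on disk
DEFAULT_MODEL_TAG = os.environ.get("DEFAULT_MODEL_TAG", "")  # weights tag loaded at startup
HF_TOKEN = os.environ.get("HF_TOKEN")  # Hugging Face access token
INTERNAL_API_KEY = os.environ.get("INTERNAL_API_KEY")  # shared secret for internal calls
NANOCHAT_DTYPE = os.environ.get("NANOCHAT_DTYPE", "bfloat16")  # dtype for model weights
NUM_WORKERS = int(os.environ.get("NUM_WORKERS", "1"))  # inference worker pool size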

Run locally from the repository root with:

uv run --project services/inference uvicorn main:app --app-dir services/inference/src --reload --port 8003
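
Once the server is up, a quick smoke test against the read-only endpoints (assuming both return JSON):

import httpx

base = "http://localhost:8003"
print(httpx.get(f"{base}/health").json())  # readiness
print(httpx.get(f"{base}/stats").json())   # worker pool state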