# Inference Service

A standalone FastAPI microservice that serves nanochat models.

## Endpoints

- `POST /generate` streams model output as Server-Sent Events (SSE)
- `GET /models` lists registered and loaded weights
- `POST /models/swap` drains workers and hot-swaps the active weights
- `GET /health` reports readiness
- `GET /stats` reports worker pool state
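
Because `POST /generate` responds with an SSE stream, a client has to split the response into events before it can use the output. The sketch below shows one way to do that with the standard library only. It assumes the service emits plain `data:` fields in the standard SSE wire format; the payload inside each event (raw text, JSON, or something else) is not specified here, so the helper `iter_sse_data` (a hypothetical name, not part of the service) returns it as an unparsed string.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the `data` payload of each complete SSE event.

    `lines` is any iterable of decoded text lines, for example the lines
    of a streaming HTTP response body. Only the `data` field is handled;
    `event`, `id`, and `retry` fields are ignored in this sketch.
    """
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield "\n".join(data_parts)
                data_parts = []
            continue
        if line.startswith(":"):
            # Comment line, often used as a keep-alive ping.
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            # The SSE format strips one optional leading space.
            value = value[1:]
        if field == "data":
            data_parts.append(value)
    # An event that is not closed by a blank line is dropped,
    # matching how the SSE format treats a truncated stream.


# Illustrative input only; real output depends on the model and service.
sample = ["data: Hel\n", "\n", ": keep-alive\n", "data: lo\n", "\n"]
print(list(iter_sse_data(sample)))
```

In practice the `lines` argument would come from a streaming HTTP client pointed at `/generate`; the request body and any authentication header depend on the service and are not shown.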

## Environment

The service reads the following environment variables:

- `MODEL_STORAGE_PATH`
- `DEFAULT_MODEL_TAG`
- `HF_TOKEN`
- `INTERNAL_API_KEY`
- `NANOCHAT_DTYPE`
- `NUM_WORKERS`
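
As a rough illustration of how these variables could be collected in one place, here is a minimal sketch of a settings loader. It is not the service's actual configuration code: the `InferenceSettings` class, the `load_settings` function, the choice of which variables are optional, and both default values (`./models` and `1`) are assumptions made for the example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class InferenceSettings:
    """Hypothetical container for the variables listed above."""

    model_storage_path: str
    default_model_tag: Optional[str]
    hf_token: Optional[str]
    internal_api_key: Optional[str]
    nanochat_dtype: Optional[str]
    num_workers: int


def load_settings(env: Mapping[str, str] = os.environ) -> InferenceSettings:
    """Build settings from a mapping of environment variables.

    Passing `env` explicitly makes the loader easy to exercise without
    touching the real process environment.
    """
    return InferenceSettings(
        # "./models" is an assumed default, not taken from the service.
        model_storage_path=env.get("MODEL_STORAGE_PATH", "./models"),
        default_model_tag=env.get("DEFAULT_MODEL_TAG"),
        hf_token=env.get("HF_TOKEN"),
        internal_api_key=env.get("INTERNAL_API_KEY"),
        nanochat_dtype=env.get("NANOCHAT_DTYPE"),
        # "1" is an assumed default; int() raises ValueError on bad input.
        num_workers=int(env.get("NUM_WORKERS", "1")),
    )
```

Failing fast on a non-numeric `NUM_WORKERS` is a deliberate choice in this sketch: a misconfigured worker count surfaces at startup rather than when the pool is first used.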

Run locally with:

```bash
uv run --project services/inference uvicorn main:app --app-dir services/inference/src --reload --port 8003
```
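
Once the server is starting up, it can be handy to wait for `GET /health` before sending traffic. The following sketch polls an arbitrary probe function until it succeeds. It assumes that `/health` answers with HTTP 200 when the service is ready, which is not confirmed here; `http_ok` and `wait_until_ready` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status.
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    attempts: int = 30,
    delay: float = 1.0,
) -> bool:
    """Call `probe` up to `attempts` times, sleeping `delay` seconds between tries."""
    for i in range(attempts):
        if probe():
            return True
        if i < attempts - 1:
            time.sleep(delay)
    return False
```

With the command above running on port 8003, `wait_until_ready(lambda: http_ok("http://localhost:8003/health"))` would return `True` as soon as the endpoint responds with 200, or `False` after roughly 30 seconds of failed attempts.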