nanochat/services/chat-api/src/main.py
Manmohan Sharma aa0818aae2
feat(observability): Prometheus + Grafana + Loki stack for samosaChaat (#9)
Replaces the helm/observability scaffold with a real monitoring stack
wired into the samosaChaat platform.

Helm chart (helm/observability/)
- Chart.yaml declares kube-prometheus-stack (~62.0) and loki-stack
  (~2.10) as subchart dependencies.
- values.yaml configures Prometheus (15d retention, 50Gi PVC,
  ServiceMonitor + rule selector on app.kubernetes.io/part-of:
  samosachaat), Alertmanager (10Gi PVC), Grafana (OAuth-only via
  GitHub + Google, local login disabled, Prometheus + Loki datasources,
  dashboards auto-provisioned from a ConfigMap, email + Slack contact
  points with a critical route to Slack), Loki (50Gi, 30d retention,
  tsdb schema), and Promtail (JSON pipeline that lifts level / service
  / trace_id / user_id into labels, scrape config with pod labels).
- Alert rules: HighCPU, HighMemory, DiskSpaceLow, High5xxRate,
  InferenceServiceDown, HighP99Latency.
- templates/grafana-dashboards-configmap.yaml renders every file under
  dashboards/ into a single grafana_dashboard=1 ConfigMap.
- dashboards/node-health.json, app-performance.json, inference.json -
  fully-formed Grafana dashboards with Prometheus datasource variable,
  templated app selector, thresholded gauges, and LogQL-ready labels.

Scraping (helm/samosachaat/templates/servicemonitor.yaml)
- ServiceMonitor CRs for auth / chat-api / inference that Prometheus
  picks up via the part-of=samosachaat selector; scrapes /metrics
  every 15s and replaces the app label so dashboards line up.

Application instrumentation
- services/{auth,chat-api,inference} each depend on
  prometheus-fastapi-instrumentator and expose /metrics (request count,
  latency histograms, in-progress gauges).
- services/auth/src/logging_setup.py and
  services/inference/src/logging_setup.py mirror the canonical
  chat-api implementation - structlog JSON with service, trace_id,
  user_id context injection.
- configure_logging() is called in create_app() for auth and inference;
  inference's main.py now uses structlog via get_logger() instead of
  logging.getLogger.
- log_level setting added to auth + inference config (LOG_LEVEL env).
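The logging_setup module itself is not shown in this commit view; a stdlib-only approximation of the contract it describes (JSON lines carrying service, trace_id, and user_id context, with the same function names the commit mentions) might look like the sketch below. The real implementation uses structlog, and the `SERVICE` constant here is illustrative.

```python
# Stdlib approximation of logging_setup: JSON logs with service /
# trace_id / user_id context. The actual code uses structlog.
import contextvars
import json
import logging
import sys
import uuid

_trace_id: contextvars.ContextVar = contextvars.ContextVar("trace_id", default=None)
_user_id: contextvars.ContextVar = contextvars.ContextVar("user_id", default=None)

SERVICE = "chat-api"  # hypothetical; each service would set its own name


def new_trace_id() -> str:
    return uuid.uuid4().hex


def set_trace_id(value) -> None:
    _trace_id.set(value)


def set_user_id(value) -> None:
    _user_id.set(value)


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, injecting context vars."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname.lower(),
            "event": record.getMessage(),
            "service": SERVICE,
            "trace_id": _trace_id.get(),
            "user_id": _user_id.get(),
        })


def configure_logging(level: str = "INFO") -> None:
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logging.basicConfig(level=level, handlers=[handler], force=True)


def get_logger(name: str) -> logging.Logger:
    return logging.getLogger(name)
```

Because the context lives in contextvars, each in-flight request keeps its own trace_id even under concurrent async handlers.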

Docs
- contracts/logging-standard.md defines the required JSON fields,
  Python (structlog) + Node.js (pino) implementations, LogQL examples
  for cross-service queries, and the x-trace-id propagation contract.

Closes #9

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 12:29:16 -07:00

"""FastAPI entrypoint for the samosaChaat chat API service."""
from __future__ import annotations
from contextlib import asynccontextmanager
import httpx
from fastapi import FastAPI, Request, Response
from fastapi.middleware.cors import CORSMiddleware
from prometheus_fastapi_instrumentator import Instrumentator
from .config import get_settings
from .logging_setup import (
configure_logging,
get_logger,
new_trace_id,
set_trace_id,
set_user_id,
)
from .routes import conversations, messages, models
@asynccontextmanager
async def lifespan(app: FastAPI):
app.state.auth_http_client = httpx.AsyncClient(
timeout=httpx.Timeout(5.0, connect=2.0)
)
app.state.inference_http_client = httpx.AsyncClient(
timeout=httpx.Timeout(60.0, connect=5.0)
)
try:
yield
finally:
await app.state.auth_http_client.aclose()
await app.state.inference_http_client.aclose()
def create_app() -> FastAPI:
configure_logging()
settings = get_settings()
logger = get_logger(__name__)
app = FastAPI(title="samosaChaat Chat API", version="0.1.0", lifespan=lifespan)
app.add_middleware(
CORSMiddleware,
allow_origins=[settings.frontend_url],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
@app.middleware("http")
async def request_context(request: Request, call_next) -> Response:
incoming = request.headers.get("x-trace-id") or request.headers.get("x-request-id")
trace_id = incoming or new_trace_id()
set_trace_id(trace_id)
set_user_id(None)
logger.info(
"request_start",
method=request.method,
path=request.url.path,
)
response = await call_next(request)
response.headers["x-trace-id"] = trace_id
logger.info(
"request_end",
method=request.method,
path=request.url.path,
status_code=response.status_code,
)
return response
app.include_router(conversations.router)
app.include_router(messages.router)
app.include_router(models.router)
@app.get("/api/health")
async def health():
return {"status": "ok", "ready": True, "service": "chat-api"}
Instrumentator().instrument(app).expose(app, endpoint="/metrics", include_in_schema=False)
return app
app = create_app()