mirror of
https://github.com/karpathy/nanochat.git
synced 2026-05-14 11:47:34 +00:00
Replaces the helm/observability scaffold with a real monitoring stack
wired into the samosaChaat platform.
Helm chart (helm/observability/)
- Chart.yaml declares kube-prometheus-stack (~62.0) and loki-stack
(~2.10) as subchart dependencies.
- values.yaml configures Prometheus (15d retention, 50Gi PVC,
ServiceMonitor + rule selector on app.kubernetes.io/part-of:
samosachaat), Alertmanager (10Gi PVC), Grafana (OAuth-only via
GitHub + Google, local login disabled, Prometheus + Loki datasources,
dashboards auto-provisioned from a ConfigMap, email + Slack contact
points with a critical route to Slack), Loki (50Gi, 30d retention,
tsdb schema), and Promtail (JSON pipeline that lifts level / service
/ trace_id / user_id into labels, scrape config with pod labels).
- Alert rules: HighCPU, HighMemory, DiskSpaceLow, High5xxRate,
InferenceServiceDown, HighP99Latency.
- templates/grafana-dashboards-configmap.yaml renders every file under
dashboards/ into a single grafana_dashboard=1 ConfigMap.
- dashboards/node-health.json, app-performance.json, inference.json -
fully-formed Grafana dashboards with Prometheus datasource variable,
templated app selector, thresholded gauges, and LogQL-ready labels.
Scraping (helm/samosachaat/templates/servicemonitor.yaml)
- ServiceMonitor CRs for auth / chat-api / inference that Prometheus
picks up via the part-of=samosachaat selector; scrapes /metrics
every 15s and replaces the app label so dashboards line up.
Application instrumentation
- services/{auth,chat-api,inference} each depend on
prometheus-fastapi-instrumentator and expose /metrics (request count,
latency histograms, in-progress gauges).
- services/auth/src/logging_setup.py and
services/inference/src/logging_setup.py mirror the canonical
chat-api implementation - structlog JSON with service, trace_id,
user_id context injection.
- configure_logging() is called at create_app() in auth and inference;
inference's main.py now uses structlog via get_logger() instead of
logging.getLogger.
- log_level setting added to auth + inference config (LOG_LEVEL env).
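The logging bullets above describe JSON output with service, trace_id, and user_id injected from context. A minimal sketch of that idea follows. It uses only the standard library as a stand-in for structlog, so it is not the code in logging_setup.py. The configure_logging() signature, the context variable names, and the "message" field are assumptions made for illustration.

```python
import contextvars
import io
import json
import logging

# Hypothetical context variables; the real services bind these via structlog.
trace_id_var = contextvars.ContextVar("trace_id", default=None)
user_id_var = contextvars.ContextVar("user_id", default=None)


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object carrying service/trace/user context."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname.lower(),
            "service": self.service,
            "message": record.getMessage(),
            "trace_id": trace_id_var.get(),
            "user_id": user_id_var.get(),
        }
        return json.dumps(payload)


def configure_logging(service: str, level: str = "INFO", stream=None) -> logging.Logger:
    """Attach a single JSON handler to a per-service logger and return it."""
    logger = logging.getLogger(service)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)  # stream=None falls back to stderr
    handler.setFormatter(JsonFormatter(service))
    logger.addHandler(handler)
    logger.setLevel(level.upper())  # accepts level names such as "INFO"
    logger.propagate = False
    return logger


# Usage: capture output in a buffer so the emitted JSON line can be inspected.
buffer = io.StringIO()
log = configure_logging("auth", level="INFO", stream=buffer)
trace_id_var.set("abc123")
log.info("login ok")
```

Keeping level, service, trace_id, and user_id as top-level JSON keys is what lets a log pipeline lift them into labels without parsing the message text.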
Docs
- contracts/logging-standard.md defines the required JSON fields,
Python (structlog) + Node.js (pino) implementations, LogQL examples
for cross-service queries, and the x-trace-id propagation contract.
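The contents of contracts/logging-standard.md are not shown here, so the following is only a sketch of one common way to propagate x-trace-id. It assumes a service reuses an inbound id when present, mints a new one otherwise, and forwards it on outbound calls. The helper names resolve_trace_id and outbound_headers are hypothetical.

```python
import uuid
from typing import Optional

TRACE_HEADER = "x-trace-id"


def resolve_trace_id(headers: dict) -> str:
    """Return the inbound trace id, or generate a new one if none was sent."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive, so compare in lowercase.
        if name.lower() == TRACE_HEADER and value.strip():
            return value.strip()
    return uuid.uuid4().hex


def outbound_headers(trace_id: str, extra: Optional[dict] = None) -> dict:
    """Copy the caller's headers and attach the trace id for a downstream call."""
    headers = dict(extra or {})
    headers[TRACE_HEADER] = trace_id
    return headers


# Usage: an inbound request already carrying a trace id keeps it end to end.
incoming = {"X-Trace-Id": "abc123"}
trace_id = resolve_trace_id(incoming)
downstream = outbound_headers(trace_id, {"accept": "application/json"})
```

Reusing the inbound id is what makes a single trace_id value match log lines from auth, chat-api, and inference in one cross-service query.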
Closes #9
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
50 lines
1.5 KiB
Python
"""Runtime configuration for the auth service.

All configuration is loaded from environment variables using pydantic-settings.
Private/public keys are PEM-encoded RSA material used for RS256 JWTs.
"""
from __future__ import annotations

from functools import lru_cache

from pydantic import Field
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    database_url: str = Field(default="postgresql+asyncpg://localhost/samosachaat")

    google_client_id: str = Field(default="")
    google_client_secret: str = Field(default="")

    github_client_id: str = Field(default="")
    github_client_secret: str = Field(default="")

    jwt_private_key: str = Field(default="")
    jwt_public_key: str = Field(default="")
    jwt_issuer: str = Field(default="samosachaat-auth")
    jwt_access_ttl_seconds: int = Field(default=3600)
    jwt_refresh_ttl_seconds: int = Field(default=7 * 24 * 3600)

    frontend_url: str = Field(default="http://localhost:3000")
    internal_api_key: str = Field(default="")

    auth_base_url: str = Field(default="http://localhost:8001")
    session_secret: str = Field(default="dev-session-secret-change-me")

    cookie_secure: bool = Field(default=False)
    cookie_domain: str | None = Field(default=None)

    log_level: str = Field(default="INFO")

    @property
    def refresh_cookie_name(self) -> str:
        return "samosachaat_refresh"


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    return Settings()
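Because get_settings() is wrapped in lru_cache, the environment is read once and the same object is returned afterwards. The sketch below illustrates that behaviour with a standard-library dataclass standing in for the pydantic Settings class. DemoSettings and get_demo_settings are hypothetical names, and the JWT_ACCESS_TTL_SECONDS variable name is assumed from the field name.

```python
import os
from dataclasses import dataclass, field
from functools import lru_cache


@dataclass(frozen=True)
class DemoSettings:
    """Stdlib stand-in for the pydantic Settings class (illustration only)."""

    # Each default is read from the environment when the instance is created.
    log_level: str = field(
        default_factory=lambda: os.environ.get("LOG_LEVEL", "INFO")
    )
    jwt_access_ttl_seconds: int = field(
        default_factory=lambda: int(os.environ.get("JWT_ACCESS_TTL_SECONDS", "3600"))
    )


@lru_cache(maxsize=1)
def get_demo_settings() -> DemoSettings:
    """Build the settings once; later calls return the cached instance."""
    return DemoSettings()


# Usage: a change to the environment is invisible until the cache is cleared.
os.environ["LOG_LEVEL"] = "DEBUG"
get_demo_settings.cache_clear()
first = get_demo_settings()

os.environ["LOG_LEVEL"] = "ERROR"
second = get_demo_settings()  # still the cached object built with DEBUG

get_demo_settings.cache_clear()
third = get_demo_settings()  # rebuilt, now sees ERROR
```

In tests that override environment variables, calling cache_clear() on the cached accessor before each case avoids reusing settings from an earlier test.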