Observability (AI Factory on HM) v1.3

Hub quick links: Model Serving observability | Gen AI hub


Observability in Hybrid Manager

When you deploy AI Factory components on Hybrid Manager, the platform gives you integrated observability for both Model Serving (KServe) and Gen AI Builder workloads.

Model Serving (KServe)

  • Use the platform dashboards and logs to monitor InferenceServices and GPU workloads (see the quick checks after this list).
  • Metrics include request latency, throughput, GPU/CPU utilization, and error rates.
  • See: Monitor InferenceService
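
For a quick check outside the dashboards, you can inspect the KServe resources directly with kubectl. The namespace and service names below are placeholders; substitute your own:

# List InferenceServices and their readiness
kubectl get inferenceservices -n <project-namespace>

# Inspect conditions and events for a single model service
kubectl describe inferenceservice <model-service-name> -n <project-namespace>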

Gen AI Builder (Assistants & threads)

  • Each assistant thread is recorded in the system so you can trace inputs, retrieval context, and outputs.
  • Thread data can be viewed in the Gen AI Builder UI inside Hybrid Manager.
  • Usage metrics (number of runs, latency, errors) are integrated into the platform’s observability stack (see the pod check after this list).
  • See: View threads (hub)
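
To correlate a thread run with backend activity, you can check the Gen AI Builder pods directly. This is a minimal sketch: the label selector is an assumption, so adjust it to match the labels your deployment actually uses:

# List Gen AI Builder pods (label selector is an assumed example)
kubectl get pods -n <project-namespace> -l app.kubernetes.io/name=genai-builder

# Tail the last few minutes of logs from one pod
kubectl logs -n <project-namespace> <genai-builder-pod> --since=10m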

Accessing metrics and logs

  • Dashboards. Hybrid Manager surfaces AI Factory metrics through the built-in observability dashboards (Grafana).
  • Logs. Use kubectl logs or the HM UI log viewer to inspect Gen AI Builder pods, KServe InferenceServices, and retriever jobs.
  • Tracing. OpenTelemetry drivers are included in Gen AI Builder for deeper tracing; see Gen AI observability drivers (hub).
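
As a sketch of how tracing export is typically wired up, the standard OpenTelemetry environment variables can point the SDK at a collector. The deployment and collector names below are assumptions; the settings actually supported are documented in Gen AI observability drivers (hub):

# Point the OpenTelemetry SDK at an OTLP collector (names are assumed examples)
kubectl set env deployment/<genai-builder-deployment> -n <project-namespace> \
  OTEL_EXPORTER_OTLP_ENDPOINT=http://<otel-collector>.observability:4317 \
  OTEL_SERVICE_NAME=genai-builder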

Typical commands

For cluster-level troubleshooting or custom dashboards, you can also pull metrics and logs directly:

# Check logs for a running InferenceService
kubectl logs -n <project-namespace> svc/<model-service-name>

# Port-forward Grafana for local access to the dashboards
kubectl port-forward -n observability svc/grafana 3000:3000
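
For custom dashboards, you can also query the metrics store directly over the Prometheus HTTP API. The service name and port below are assumptions and may differ in your installation:

# Port-forward the Prometheus service (name/namespace assumed) and run a test query
kubectl port-forward -n observability svc/prometheus 9090:9090
curl -s 'http://localhost:9090/api/v1/query?query=up'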

Key takeaway

  • Model Serving → monitored as KServe services with full metrics.
  • Gen AI Builder → assistants and threads tracked for usage and auditing.
  • Hybrid Manager provides a unified observability layer — no separate monitoring stack to configure.