Observability (AI Factory on HM) v1.3
Hub quick links: Model Serving observability — Gen AI hub
Observability in Hybrid Manager
When you deploy AI Factory components on Hybrid Manager, the platform gives you integrated observability for both Model Serving (KServe) and Gen AI Builder workloads.
Model Serving (KServe)
- Use the platform dashboards and logs to monitor InferenceServices and GPU workloads; a quick kubectl sketch follows this list.
- Metrics include request latency, throughput, GPU/CPU utilization, and error rates.
- See: Monitor InferenceService
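For a quick health check outside the dashboards, you can query the InferenceService resources directly with kubectl. This is a minimal sketch; the namespace and service name are placeholders for your own deployment:

```
# List InferenceServices in a project namespace and check their READY status
kubectl get inferenceservices -n <project-namespace>

# Inspect conditions, traffic split, and recent events for one model
kubectl describe inferenceservice <model-service-name> -n <project-namespace>
```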
Gen AI Builder (Assistants & threads)
- Each assistant thread is recorded in the system so you can trace inputs, retrieval context, and outputs; a log-search sketch follows this list.
- Thread data can be viewed in the Gen AI Builder UI inside Hybrid Manager.
- Usage metrics (number of runs, latency, errors) are integrated into the platform’s observability stack.
- See: View threads (hub)
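If you need to cross-check a specific thread outside the UI, for example during an audit, you can search the Gen AI Builder pod logs for its thread ID. The label selector, pod name, and thread ID below are placeholders and assumptions; substitute whatever your Hybrid Manager installation uses:

```
# Find the Gen AI Builder pods (label selector is an assumption; adjust to your install)
kubectl get pods -n <project-namespace> -l app=genai-builder

# Search one pod's logs for a specific thread ID
kubectl logs -n <project-namespace> <genai-builder-pod> | grep <thread-id>
```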
Accessing metrics and logs
- Dashboards. Hybrid Manager surfaces AI Factory metrics through the built-in observability dashboards (Grafana).
- Logs. Use `kubectl logs` or the HM UI log viewer to inspect Gen AI Builder pods, KServe InferenceServices, and retriever jobs.
- Tracing. OpenTelemetry drivers are included in Gen AI Builder for deeper tracing; see Gen AI observability drivers (hub) and the sketch after this list.
Typical commands
For cluster-level troubleshooting or custom dashboards, you can also pull metrics and logs directly:
```
# Check logs for a running InferenceService
kubectl logs -n <project-namespace> svc/<model-service-name>

# Port-forward Grafana if running locally
kubectl port-forward -n observability svc/grafana 3000:3000
```
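After the port-forward is running, open http://localhost:3000 in a browser to reach the Grafana dashboards.

Because `kubectl logs svc/...` resolves to a single pod, you may also want to list every pod backing an InferenceService and pull logs per pod. The label shown is the one KServe applies to predictor pods by default; verify it against your cluster:

```
# List the pods behind an InferenceService (label is KServe's default; verify on your cluster)
kubectl get pods -n <project-namespace> -l serving.kserve.io/inferenceservice=<model-service-name>

# Tail logs from a specific predictor pod
kubectl logs -n <project-namespace> <predictor-pod-name> -f
```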
Key takeaway
- Model Serving → monitored as KServe services with full metrics.
- Gen AI Builder → assistants and threads tracked for usage and auditing.
- Hybrid Manager provides a unified observability layer — no separate monitoring stack to configure.